One of the basic tenets of SEO is the handling of non-canonical URLs. Countless posts have been written on the topic, and it’s discussed at probably every Internet marketing conference. Yet when it comes to our analytics, my experience has been that little to no attention is paid to keeping our content reports clean. As a result, many sites’ content reports are mishandled, and skewed data is passed up the food chain.
First things first: what’s a canonical URL? Simply put, canonical URLs are the URLs you want search engines and visitors to actually discover on your site. The search engines have made it increasingly easy for sites to tell them which URLs are canonical, so that non-canonical URLs don’t sabotage a site’s money pages.
A typical scenario looks something like this: a client pulls its top 10 landing pages for analysis and looks at metrics like bounce rate, conversion rate, and revenue. But because of rogue query parameters, each of those pages may have multiple duplicates, and none of those duplicates are factored in because no one knows they even exist.
The Primary Culprit
When non-canonical URLs wind up in content reports (Behavior > Site Content), it can be difficult, if not downright impossible, to measure the effectiveness of your most important pages. Query parameters (e.g., http://www.mysite.com/widgets?sort=asc&color=blue&sid=153678) can create many duplicates of a single page. On one large ecommerce site that used a gaggle of query parameters, I saw a single page split into more than a hundred rows.
Typically, the more query parameters a site uses, the more permutations a URL can take on, and the more the potential for duplication grows.
The Solution
Google Analytics has a little-known view setting, Exclude URL Query Parameters, that lets you dictate which parameters you want to exclude. The key litmus test for whether a parameter should be excluded is this: does the parameter determine unique content?
For example, if your site used WordPress’s default permalink structure, which uses the p query parameter to set page URLs, you wouldn’t want to exclude the p parameter, because the site uses it to determine unique content. In other words, http://www.mysite.com/?p=123 is a different page from http://www.mysite.com/?p=179.
However, query parameters that merely rearrange, filter, or manipulate the information on a page (such as a sort option or size filter for a retailer) should be excluded from content reports. Excluding query parameters won’t filter out visits to these pages the way view filters do (views were previously known as profiles). Instead, it simply consolidates pages by removing the query parameters from them.
So, for example, if you set your view to exclude the parameters sort and color, /jackets/?color=red&sort=asc, /jackets/?color=lime&sort=desc, and /jackets/?color=red&sort=desc would all become /jackets/, and the data for all three would be consolidated into one line item.
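To make the consolidation concrete, here is a small Python simulation of its effect. It is not Google Analytics’ own code, just a sketch under a few assumptions: page paths are plain strings as they’d appear in the Page dimension, the excluded names are the sort and color examples from this post, and the pageview counts are made up. The WordPress-style p parameter is left alone because it determines unique content.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters that only sort/filter a page (they fail the litmus test).
EXCLUDED = {"sort", "color"}

def strip_excluded(path: str) -> str:
    """Drop excluded query parameters, keeping any that determine unique content."""
    parts = urlsplit(path)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in EXCLUDED]
    return urlunsplit(parts._replace(query=urlencode(kept)))

def consolidate(pageviews: dict[str, int]) -> Counter:
    """Sum pageviews per cleaned page, the way the content report would group them."""
    totals = Counter()
    for path, views in pageviews.items():
        totals[strip_excluded(path)] += views
    return totals

# Made-up report rows.
report = {
    "/jackets/?color=red&sort=asc": 40,
    "/jackets/?color=lime&sort=desc": 25,
    "/jackets/?color=red&sort=desc": 35,
    "/?p=123": 10,
}
print(consolidate(report))  # Counter({'/jackets/': 100, '/?p=123': 10})
```

Because pageviews are only regrouped, the total across all rows stays the same; only the number of rows shrinks.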
How To Find Your Site’s Query Parameters
There are two different ways to find your site’s query parameters:
Google Webmaster Tools’ URL Parameters Report
The URL Parameters report (under Crawl) lists all the query parameters Googlebot found while crawling the site. It’s a great place to start. Go through that list and apply the same litmus test you would use to determine whether a page is the canonical URL.
Google Analytics’ Line Item Filter
If you pull up the All Pages report (Behavior > Site Content) and use the filter above the report to search for an equal sign, you’ll get a list of pages that contain query parameters.
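If you copy or export the rows that filter returns, a few lines of Python can tally which parameter names show up and on how many pages, which gives you the list to run through the litmus test. The page list below is hypothetical sample data.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

def parameter_counts(pages: list[str]) -> Counter:
    """Count how many page paths use each query parameter name."""
    counts = Counter()
    for page in pages:
        query = urlsplit(page).query
        # Use a set so a name repeated within one URL counts once for that URL.
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts

# Hypothetical rows from the All Pages report after filtering on "=".
pages = [
    "/widgets?sort=asc&color=blue&sid=153678",
    "/widgets?sort=desc&sid=99",
    "/?p=123",
]
for name, n in parameter_counts(pages).most_common():
    print(name, n)
```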
To exclude the query parameters you’ve already identified, so only the unfamiliar ones remain, take these steps.
Step 1. Click the Advanced link to the right of the filter box.
Step 2. Click the Add a dimension or metric button below the filter you already have. Select Page as your dimension, and set the drop-down to its left to Exclude.
Step 3. Set the drop-down to the right of the dimension drop-down to Matching RegExp, then enter the parameters you’ve already identified, separated by pipe characters (|, found above the backslash key).
Step 4. Click the Apply button and analyze away.
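The two report filters (include rows containing "=", then Exclude rows matching the piped regex) can be sketched in Python to see which rows survive. The parameter list and pages are hypothetical. The stricter pattern is an assumption of this sketch, not part of the steps above: a bare pipe list like sort|color|sid also matches those words anywhere in a path.

```python
import re

# Hypothetical list of parameters you've already identified.
known = ["sort", "color", "sid"]

# The report filter just pipes the names together: sort|color|sid.
naive = re.compile("|".join(map(re.escape, known)))

# Assumption: a stricter variant that only matches a name right after "?" or "&"
# and right before "=", so "/colorful-scarves" isn't caught by "color".
strict = re.compile(r"[?&](?:" + "|".join(map(re.escape, known)) + r")=")

def remaining(pages: list[str], pattern: re.Pattern) -> list[str]:
    """Keep rows that contain "=" and that the Exclude regex does NOT match."""
    return [p for p in pages if "=" in p and not pattern.search(p)]

pages = ["/widgets?sort=asc", "/colorful-scarves?ref=tw", "/about", "/?p=123"]
print(remaining(pages, naive))   # ['/?p=123']
print(remaining(pages, strict))  # ['/colorful-scarves?ref=tw', '/?p=123']
```

One limitation holds for either pattern: a row that carries a known parameter alongside an unknown one (say ?sort=asc&ref=tw) is excluded, so an unfamiliar parameter can hide behind one you’ve already listed.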
Set Up Exclude
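When you have your list of parameters you want to exclude, simply drop them into the Exclude URL Query Parameters box in your view settings, separated by commas. Using the examples in this post, yours would look something like this: sort,color,sid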
Learn More
You can learn how to clean up more than just your content reports in Google Analytics with my Analytics Audit Template, a self-guided, 147-page audit template that is regularly updated and will teach you how to do detailed analytics audits like a pro.
Photo by stavos52093
Oscar says
That filter works when your query parameters are terms such as color, size, and sort, but how do you find out which parameters a big website uses without going page by page?
Thanks
Annie Cushing says
If you’re working with a large site, query a list of parameters from your database.
Martijn Scheijbeler says
Great post, as always, Annie.
I’d always stress the importance of having two extra filters in most cases:
– One filter to change the case of all your URIs (to either lowercase or uppercase) so that /Index and /index are logged in your reports as the same page.
– A second filter to remove the trailing slash. That keeps /index.html and /index.html/ from showing up as two different pages in your Google Analytics view. You can create one with this setup: http://take.ms/H5HZ7
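The first of those filters can be pictured as a function applied to each Request URI before rows are grouped. A minimal sketch, assuming the filter lowercases the whole URI, query string included; that is also why it could merge values that are genuinely case-sensitive, such as a case-sensitive ID in a parameter. The hits are made up.

```python
from collections import Counter

def lowercase_filter(request_uri: str) -> str:
    """Lowercase the entire Request URI, path and query string alike."""
    return request_uri.lower()

# Hypothetical hits as recorded before any filter runs.
hits = ["/Index", "/index", "/INDEX", "/about?ID=AbC"]
rows = Counter(lowercase_filter(uri) for uri in hits)
print(rows)  # Counter({'/index': 3, '/about?id=abc': 1})
```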
Annie Cushing says
Can I see examples of the URIs that resulted from the filter to remove the trailing slash? I’m not following that filter.
Martijn Scheijbeler says
Let me make it a bit clearer why it’s valuable to fix the trailing slash in Google Analytics. By default, most servers don’t have a redirect set up to remove the trailing slash. That could cause SEO issues, since /index and /index/ have the same content, but it also causes issues in your Google Analytics account. By now most SEOs are convinced that Google can tell they’re the same page and that it won’t harm your SEO. That leaves the issue with Google Analytics content reporting.
The filter I mentioned above acts on the Request URI that’s recorded when somebody visits a page. The regular expression checks whether the URI ends with a /; if it does, the slash is removed, and the output field is set to the complete URI without the trailing slash. This makes sure that /index and /index/ both end up in your reporting as /index.
You could take a similar approach to always add a trailing slash and keep your URLs consistent, but that one is a bit more dangerous, since parameters in your URL could turn /index?q=test into /index?q=test/, and in my opinion that doesn’t make reporting any easier.
Annie Cushing says
I was actually hoping you would explain the regex, for the sake of other readers, not the rationale for needing one URL for each page. I think the case for canonical URLs is pretty common knowledge. Thanks!
Sam says
A simpler solution for trailing slash is (.*)\/$
Look for anything (.*) that ends in an escaped slash \/$, and store it as Extract A.
The only caveat: if you have (as I do) another filter to rename the home page, which has a Request URI of just “/”, it must precede the trailing slash rule.
If you so desired you could also then append “.html” to the output but depending on your site structure, this could be counter-intuitive.
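Sam’s pattern and his ordering caveat can be checked with Python’s re module, which treats (.*)\/$ the same way for a pattern this simple. A sketch, assuming the filter outputs Extract A (the first capture group) when the pattern matches; the "/home" label for the renamed home page is hypothetical.

```python
import re

# Sam's pattern: everything before a final "/" is captured as Extract A.
TRAILING_SLASH = re.compile(r"(.*)\/$")

def strip_trailing_slash(uri: str) -> str:
    """Output Extract A when the URI ends in "/", else leave the URI unchanged."""
    match = TRAILING_SLASH.match(uri)
    return match.group(1) if match else uri

def rename_home(uri: str) -> str:
    """Hypothetical home-page rename filter; "/home" is only an illustrative label."""
    return "/home" if uri == "/" else uri

print(strip_trailing_slash("/index/"))         # /index
print(repr(strip_trailing_slash("/")))         # '' (the home page loses its URI)
print(strip_trailing_slash(rename_home("/")))  # /home (rename runs first)
```

Filters in a view run in order, which is the point of Sam’s caveat: renaming the home page first means the slash rule never sees a bare “/” to empty out.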
cgrant says
Maybe you can clear something up for me. My understanding is that excluding URL query parameters causes the parameters to be ignored for any further processing. In other words, the parameters are not going to be available for building segments, dimensions, and so on. The content reports are definitely cleaned up, but you also run the risk of not being able to do certain things later on that involve the parameters you suppressed. For this reason, I keep one view that’s completely unfiltered.
Am I mistaken in how I think it works?
Annie Cushing says
Yes, that’s correct. This is the most common concern people raise about removing them. But in my experience, not a single site has ever segmented by a URL parameter; they leave the parameters in their URLs and never do anything with them.
David Ross says
Annie, do you never use URL parameters to inform AdWords campaigns for ecommerce clients?
I’ve found them useful for building tight ad groups and landing pages on the PPC end. Sending users to a pre-built (filtered) landing page saves them a click or two applying filters.
David Ross says
Just to clarify: I meant that by not excluding cart filters from content reports, you get an indication of possibly good or useful PPC landing page ideas.
Annie Cushing says
No, that’s what your AdWords reports are for. The parameters you use are funneled to reports, which is more effective than trying to parse these parameter values out of URLs. If there’s additional information you want/need, that’s what custom dimensions are for.
Roxi B says
Hi Annie,
Will this be applied to previous data or will it only be applied to future visits?
I’m just wondering about comparing YoY data for certain page types. I’m assuming I’d have to export the past year’s data, dedupe the URLs manually, and then compare that to the new, cleaned-up URLs?
Thanks,
Roxi
Annie Cushing says
No, this won’t apply to historical data, so you’d have to clean up historical reports. Sorry to be the bearer of bad news!
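For the manual cleanup Roxi describes, a short script can apply the same exclusions to an exported report. A sketch, assuming a plain two-column CSV with hypothetical Page and Pageviews headers (real exports may carry extra header lines you’d strip first) and the same made-up sort/color exclude list.

```python
import csv
import io
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical: the same parameters you added to the view's exclude list.
EXCLUDED = {"sort", "color"}

def clean(page: str) -> str:
    """Remove excluded query parameters, keeping the rest in order."""
    parts = urlsplit(page)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in EXCLUDED]
    return urlunsplit(parts._replace(query=urlencode(kept)))

def dedupe_export(csv_text: str) -> dict[str, int]:
    """Sum an additive metric (pageviews) per cleaned page from a two-column CSV."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumed column names; strip thousands separators like "1,024".
        totals[clean(row["Page"])] += int(row["Pageviews"].replace(",", ""))
    return dict(totals)

export = """Page,Pageviews
/jackets/?color=red&sort=asc,"1,024"
/jackets/?sort=desc,310
/?p=123,57
"""
print(dedupe_export(export))  # {'/jackets/': 1334, '/?p=123': 57}
```

Only additive metrics such as pageviews or sessions can be summed this way; rates like bounce rate or conversion rate have to be recomputed from their underlying counts rather than added or averaged row by row.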
Brian H says
Thank you for all of the great information in this post. I’m currently working on eliminating our query parameters and did the filtering you outlined. When I apply these in the View Settings, will our pageviews decrease, like they do when I’m just using the filter you outlined above to identify our query parameters? My understanding is that they shouldn’t, since the pageviews are combined into one row; however, our unique pageviews would decrease as a result of excluding query parameters. Is that correct? I’m currently trying to exclude our search, translator, and print style sheets for web pages.
Thanks so much for any further insight.
Annie Cushing says
Filtering out query parameters won’t cause a decrease in your pageviews or unique visitors. It will only consolidate the data in your content reports. Hope this puts you at ease. 🙂
matthew says
Hi, just a quick question.
I am trying to run a report through Excel, and what I want to show is the number of visits to a landing page. But these landing pages come in groups.
For example:
http://www.myurl.com/all-pdfs
http://www.myurl.com/all-pdfs/pdf1
http://www.myurl.com/all-pdfs/pdf2
http://www.myurl.com/all-pdfs/pdf3
http://www.myurl.com/all-pdfs/pdf4
http://www.myurl.com/all-pdfs/pdf5
http://www.myurl.com/all-pdfs/pdf6
http://www.myurl.com/all-pdfs/pdf7
If I wanted to show one report (in one Excel tab) for all the traffic to these pages, how would this be done?
Thanks in advance
Annie Cushing says
I would use content grouping for this kind of analysis. What I can’t picture is ultimately what you’re looking to show. You said you want to show visits to a landing page but then said they come in groups. So do you want to show the number of sessions for the entire group? I’m just having a tough time following. But content groupings. They’re your friend. https://support.google.com/analytics/answer/2853423?hl=en
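Content grouping does this rollup inside the reports; the underlying logic is just mapping each page to a group and summing. A rough sketch, assuming a hypothetical prefix rule for Matthew’s /all-pdfs pages (with the http://www.myurl.com host dropped) and made-up landing-page session counts. Because each session has exactly one landing page, summing landing-page sessions across the group doesn’t double count.

```python
from collections import Counter

# Hypothetical rule set: (path prefix, content group name). First match wins.
GROUP_RULES = [("/all-pdfs", "All PDFs")]

def content_group(page: str) -> str:
    """Return the group for a page, or "(not set)" when no rule matches."""
    for prefix, name in GROUP_RULES:
        # Match the section page itself or anything beneath it.
        if page == prefix or page.startswith(prefix + "/"):
            return name
    return "(not set)"

def rollup(landing_sessions: dict[str, int]) -> Counter:
    """Sum landing-page sessions per content group."""
    totals = Counter()
    for page, sessions in landing_sessions.items():
        totals[content_group(page)] += sessions
    return totals

landing_sessions = {
    "/all-pdfs": 120,
    "/all-pdfs/pdf1": 30,
    "/all-pdfs/pdf2": 45,
    "/contact": 10,
}
print(rollup(landing_sessions))  # Counter({'All PDFs': 195, '(not set)': 10})
```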
Brittney says
Will this work with a trailing ?pp=1?
Annie Cushing says
Yes, it works for all query parameters.
Dan B says
Thank you for the article. We have a few parameters causing (other) to account for 75% of pageviews. How can I go backward after adding the exclude? How do you clean up old reports?
-Dan
Annie Cushing says
Daily processed tables store a maximum of 50k rows for standard Google Analytics. If you surpass that number, Google will drop your remaining traffic into the Other bucket. It’s the quintessential black box, meaning you can’t recover that data. 🙁
Michael Brown says
Excellent post! Another great way to clean up Google Analytics data. This is especially so for businesses running eCommerce sites. I would be interested in seeing a follow-up post on how to query out the parameters in a databases. Thanks again for sharing, Annie 🙂
Annie Cushing says
I’m not sure what you mean by “query out the parameters in a databases.”