Bot Filtering


On September 30, 2014, Google Analytics launched an automatic bot-removal feature: a setting at the View level in the admin area that excludes traffic from known bots and spiders. Almost everything I'd read on the topic simply mirrors the announcement and doesn't explain why it matters. What's the best reason people do nothing? Generally speaking, you should check this box, ideally after trying it on a test view first. And to be clear about the terms: Spider-Man and the Transformers are neither spiders nor bots in the sense used here.

Bot

Bots are software applications that run automated tasks over the Internet. There are two types of bots: crawlers and spiders. Crawlers follow links on a page to find more pages to index, while spiders read the content of a page and add it to a database for later retrieval. In other words, they help search engines like Google understand what your website is about so they can rank you higher in search results.
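To make that distinction concrete, here is a minimal sketch, in Python, of the two jobs described above: reading a page's content and collecting its links to find more pages. The start URL is a placeholder, and a real crawler would also respect robots.txt, rate limits, and URL deduplication.

from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Collect every href on the page; these are the links a crawler follows.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl_once(url):
    # Fetch one page and return (content, outgoing links): the spider's job
    # and the crawler's job, respectively, as described above.
    html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
    parser = LinkCollector()
    parser.feed(html)
    return html, [urljoin(url, link) for link in parser.links]

content, links = crawl_once("https://example.com/")  # placeholder URL
print(f"read {len(content)} characters, found {len(links)} links")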

If you want your site indexed by major search engines like Google, Bing, or Yahoo, then you need bots crawling your site regularly! Geolance provides an easy way for anyone with a website to start using our service without requiring any technical knowledge. Our team will make sure all websites we crawl have been properly tagged with metadata so that when we reach out to them, they know who we are and why we're there. We'll also provide monthly reports detailing how many times each URL was crawled and which were missed during the month (if any). This information is invaluable when determining whether changes made on your site caused specific URLs to stop being crawled or whether something is wrong with those URLs themselves that prevents us from reaching them at all. And finally, our system will automatically notify you via email whenever one of your sites has been updated, so you don't have to check each one yourself. This makes Geolance an indispensable tool for web admins looking for better insight into how their sites are crawled.

Excluding unknown bot traffic in 6 easy steps

Do you see visitors with unknown IPs or known bots and spiders?

Step 1: Open the Google Analytics View Settings page.

Step 2: Click on Admin to open your Admin area of Analytics. On the left-hand side, click on Filters. This will bring up the Filter screen.

At this point, it's not obvious that there is a Bot Filtering option after clicking the "Manage Filters" link in Step 2 above; it would be better placed somewhere more visible at first glance. Don't let that bother you: if you don't see it under "Manage Filters," click "Add Filter" instead, and you'll find Bot Filtering there.

Step 3: Paste in the IPs you want to remove from your data and apply the filter (100.100.100.1 and 100.100.100.3 in this example). Recommended: if you have a range of IP addresses, use a comma-separated list between square brackets [] (see the pattern check after these steps).

Step 4: Set the filter type to "Exclude" rather than "Include".

Step 5: When the filter has been created, click Apply in the bottom-right corner of the screen to save it.

Step 6: Verify that the filter has been applied by going back into the Admin panel > View Settings > Bot Filtering.

You should see your new entry in the box, filtering bots and spiders out of Google Analytics! By applying this setting, traffic from the specified bots will be blocked. The setting is applied in real time to incoming data and reflected in the relevant metrics/reports; it does not rewrite historical data.

Does your website receive any traffic coming from bots or spiders? If so, you should apply this filter to remove them. This will ensure that all of your organic traffic comes from legitimate sources, not unknown bots and spiders.
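Before saving a filter like the one in Step 3, it helps to know that Google Analytics custom filters match fields such as the IP address against a regular expression, so a set of addresses is usually expressed as one pattern. Here is a quick local sanity check of such a pattern in Python; the pattern itself is my own construction covering the two example IPs, not something GA generates.

import re

# Hypothetical exclude pattern covering the two example IPs from Step 3.
EXCLUDE_IPS = re.compile(r"^100\.100\.100\.(1|3)$")

for ip in ["100.100.100.1", "100.100.100.2", "100.100.100.3"]:
    verdict = "excluded" if EXCLUDE_IPS.match(ip) else "kept"
    print(ip, "->", verdict)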

How long the bot filter takes to start working.

The filters usually affect new visits within two hours at most, but keep in mind that visitors who arrived before the filter was applied will continue to be counted until those two hours have passed. So, first, I created a script (see below) that counts all unique visitors per day; then, I set up a custom dashboard that looks at the number of increments per hour. As you can see, there are no plateaus or anomalies in the reporting data after two hours have passed since the filter was applied.
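The author's original script isn't reproduced in this copy, so the following is a minimal stand-in that does the same job: it counts unique visitors (ga:users) per day through the Google Analytics Reporting API v4. KEY_FILE and VIEW_ID are placeholders you must supply, and the google-api-python-client and google-auth packages are assumed to be installed.

from google.oauth2 import service_account
from googleapiclient.discovery import build

KEY_FILE = "service-account.json"  # placeholder: your service account key
VIEW_ID = "123456789"              # placeholder: your GA view ID

creds = service_account.Credentials.from_service_account_file(
    KEY_FILE, scopes=["https://www.googleapis.com/auth/analytics.readonly"])
analytics = build("analyticsreporting", "v4", credentials=creds)

# One report request: unique users per day for the last week.
response = analytics.reports().batchGet(body={
    "reportRequests": [{
        "viewId": VIEW_ID,
        "dateRanges": [{"startDate": "7daysAgo", "endDate": "today"}],
        "metrics": [{"expression": "ga:users"}],
        "dimensions": [{"name": "ga:date"}],
    }]
}).execute()

for row in response["reports"][0]["data"].get("rows", []):
    date = row["dimensions"][0]
    users = row["metrics"][0]["values"][0]
    print(date, users)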

The problem with intelligent bots

It's pretty standard to see more than one bot at work; for example, Google itself runs several crawlers. These are sometimes called "smart" bots because they follow crawling rules set by the search engine.

If you're seeing more than one bot, you should exclude both of them at once using a comma-separated list between square brackets in the Custom Filter Instructions area (Step 4 above). Your custom filter pattern would combine both entries into a single expression, much like the IP pattern sketched earlier.

You can also modify reports to compare how they appear before and after adding the filter. If you have seen irregularities in your analytics data lately, start applying filters as soon as possible, especially if your website receives a high number of visits from bots and spiders.

A good bot

Google is not the only search engine out there, and search engines are not the only bots: any program that visits a website to gather information about its content is a bot too. Over the years, several companies have developed classifications of bot characteristics to help identify what bots do on your website.

There are several actions a web crawler can take to follow specific rules on how it will function before reaching your site; for example, ignoring internal links on websites or following JavaScript links. In addition, by adding new parameters to their automated methods, crawlers can be trained to identify destructive HTML code and report errors back to Google on the fly. This makes it easier for them to identify bot traffic and bad websites that users flag in the search indexes.

In most documentation, such an automated visitor is usually referred to as a "spider." Still, there is no absolute distinction between robots and spiders, because robots are just programs that mimic human actions; a well-built bot can pass for an average person. However, since bots have been around for many years, the differences in how each type of bot does its job have become more apparent.

For example, Yahoo's Slurp crawler follows JavaScript links while Googlebot doesn't, since Google doesn't want people trying to manipulate search results via custom or handmade URLs. This means that if you have lots of internal JavaScript links pointing to the same destination, the two engines will see your site differently: Yahoo's crawler follows all of those custom links while Google's doesn't, so Yahoo might think you have more pages than you actually do.

That's enough about bots for now. I hope that answers your question! Feel free to contact me if I didn't explain something correctly or if you'd like help setting up any filter in Analytics.

Make sure to set up a notification on my profile so you don't miss out on new articles! Also, if any other filters should be included in this list, let us know using the comments section below.

Best Practices For Implementing New Filters For Your Analytics Data

Step 1: Set up a custom profile


"If you're using more than one bot, then you should exclude both of them at once using a comma-separated bounce rate list between square brackets in the Custom Filter Instructions area on Step 4 above." 

Bot traffic and reasons it shows up in Google Analytics

Bot traffic in Google Analytics is a collection of hits that arise from automated processes and tools such as search engine crawlers or web scrapers. Bots are also referred to as "spiders," "web robots," and "scrapers." While most bots mimic human visitors, they can cause errors when crawling websites. A famous example is a robot entering incorrect information into website forms, resulting in inaccurate data collection in your analytics account.

The traffic data collected by bots may seem like random characters appearing on the source/medium report if you have not set up filters to help manage bot activity reporting. In addition, as time goes by, you may experience an increase in bot-related reporting across multiple reports within your account if you don't employ specific filters that help reduce unnecessary reporting.

Because of the impact on your account's data, you should consider implementing bot filters if source/medium combinations such as the following frequently appear in your reports:

* Direct/none


Step 2: Set up a new profile

If you have determined that a bot is causing errors in your Google Analytics account, then it's time to create a custom filter to exclude those pesky bots. Follow the steps below to set up a profile specifically for bot traffic.

You can set up as many profiles as you'd like from within the Settings section of your analytics account. Just click into Profiles and select Create New Profile. It will be helpful to name your profile something that would describe what traffic it will filter out. For example, you could create a new profile called "Bot Traffic" or "Exclude Robots."

Once created, select the newly made profile and access the settings for Bot Filters. You'll see an option called Filter Name, where you can input "bots" as the value of this setting. Next, click on Create Filter. This brings you to Step 3 of 4 in creating your bot filter:

Step 3: Edit your Custom Filter


Here's where you can start entering specific HTTP referrer strings and choosing a Match Type from the drop-down menu. It's a good idea to start by including all of the traffic you've determined to be "bots" and later work on removing any source/medium combinations that aren't bots from this filter.
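If it helps to prototype which referrers belong in this filter before touching GA, here is a small Python sketch that applies the same idea locally. The two domains are examples commonly cited as referral spam, but treat the list as hypothetical rather than vetted.

import re

# Hypothetical referrer patterns you've identified as bots.
BOT_REFERRERS = [re.compile(p, re.IGNORECASE) for p in (
    r"(^|\.)semalt\.com$",
    r"(^|\.)buttons-for-website\.com$",
)]

def is_bot_referrer(hostname):
    # True if the referring hostname matches any blocklisted pattern.
    return any(p.search(hostname) for p in BOT_REFERRERS)

for host in ["semalt.com", "www.semalt.com", "example.org"]:
    print(host, "->", "filter out" if is_bot_referrer(host) else "keep")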


Step 4: Select Exclude All Query Parameters

In the example below, only visits from doubleclick.net will be filtered out of your reports because it is the only hostname included in this filter rule. In addition, Google Analytics ignores URLs with multiple query parameters unless they are added to this list. This means that if doubleclick.net were also using a query parameter for tracking purposes, that traffic might not be filtered even though doubleclick.net is included in the list below.

Query parameters are not case-sensitive. To start, click on the Add Filter Field drop-down menu and select Hostname. Then click on Apply (All Visits) to add this filter field to your profile. After that, match only the hosts you'd like to exclude from your reports, using leading and trailing asterisks (*). For example, if you would like to exclude doubleclick.net while keeping all other visitors, then *doubleclick.net* should be written here without quotation marks.
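That asterisk syntax behaves like a shell-style wildcard: the leading and trailing * match anything before and after the domain. GA's own matching details may differ, but the semantics can be illustrated in a couple of lines of Python.

from fnmatch import fnmatch

PATTERN = "*doubleclick.net*"  # the wildcard form from the step above

for host in ["ad.doubleclick.net", "doubleclick.net", "example.com"]:
    print(host, "->", "matches" if fnmatch(host, PATTERN) else "no match")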


Once you have added all of your bots to this list, select Apply again to save this setting within your profile. Now, you have completed your filter!

Once the new profile is created, verify that you no longer see traffic from these bots reported in your account. If you still do, delete the bot filter and recreate it to rule out a misconfigured rule before applying it to any other profiles within your Google Analytics account. Alternatively, you can create another profile specifically for reporting on bot activity if it serves a purpose in your organization. Then, repeat Step 2 through Step 4 above using a different Filter Name and Query Parameter list.

Tell me the difference between spiders and bots.

Spider bots are not inherently evil. These crawlers are usually created by search engines, social media sites, or similar organizations that gather publicly available information for various purposes. Google Analytics treats any such non-human visitor as a spider bot, including search engines and robots. You can see which specific spiders are present on your site within the Traffic Sources > Search Engine report.

You'll need to re-create your filter if you decide to change these settings later or wish to create another profile to filter out different sets of bots.


Signs your filters work

Once you have saved this setting within your profile, you can look for changes in bot reporting within the Traffic Sources > Sources > Referrals report. Once your filters are in place, you will no longer see traffic attributed to robots or spiders on the Referrals (or any other) report.

Steps to take if filters don't work

If you have checked that this profile is being applied correctly and are still unable to see any of these bots on your reports after deleting the original filter rule, please check out our Troubleshooting section below.

If you are interested in more information about a particular non-bot referral

Once you have deleted the new profile, you can follow these steps to re-create it and add additional filters specific to that website. For example, if your goal is to determine which of your referrals are coming from Google.com, you can adjust your bots filter to match that domain.


Now apply this change within the same profile used for filtering out bots only, and delete the bot filter rule from this new profile before saving your changes. All Google visits will show up on your reports again without any other referrals included in that data set. In addition, you will now see Google (and any other specific domains) counted in your Referrals report.

I don't want to see traffic attributed to a particular domain.

You will need to delete the bot filter within your new profile and create another profile specifically for this set of referrals. To get started, highlight the desired campaign (or website) by clicking on it in your list of Excluded Domains. Then click on Add Filter Field > Campaign Source Parameter, and match only the particular domains you wish to exclude from your reports, using leading asterisks (*). For example, if you'd like all visits coming from doubleclick.net excluded from your reports, write *DoubleClick* here.


Now apply these changes within the same profile used for filtering out bot activity, and delete the bot filter from this new profile before saving your changes. You will now see all referring domains except those you have expressly excluded from your reports.

I want to combine both my website exceptions with my Other Traffic exception.

You can add a few more filters based on URL query parameters (such as order numbers) to establish what other exceptions you would like to include in this final set of referrals that are not considered spam or bots. For example, if you wish to exclude all traffic coming from links containing a particular query parameter, add a filter for that parameter to your new filter profile. Any traffic with that URL query parameter will no longer be included in this final set of referrals.


I have mistakenly excluded a website

If you notice that you suddenly see referral spam within your reports after removing the bot filters from your profiles, check to make sure you haven't accidentally added any of these sites to your exception lists. Of course, you can add them back in and re-apply the bot filter again if necessary, but consider taking some time to verify which domains should be filtered out since this will help ensure they remain unlisted in the future.

If you have checked that this profile is being applied correctly and are still unable to see any of these referrals after deleting the original filter rule, please check out our Troubleshooting section below.


I want to see all referral traffic except for a few sites.

You can adjust your bots filter as described below to exclude particular websites from your reports without affecting any other referrals in that data set. Replace each URL with one or more specific domains that should be filtered out of your referral reports (you can use the *domain*.com format). For example, if you don't want to see any referrals from amazon.com, write *amazon.com* here.


Now apply this change within your bot filtering profile and delete the old bot filter rule from this new profile before saving your changes. All visits except those from Amazon will show up in your reports, with no other referrals affected. Amazon (and any other domains you list) will no longer be counted in your Referrals report, while Other Traffic and Direct traffic recognized by Google Analytics remain.


Also, while we're talking about bots, some webmasters who have been using the Web Analytics Solution Gallery for a while may notice that referral spam from specific websites they have added is no longer accounted for in their reports. This is because, starting with Google Analytics version 3.0, Bot Filtering was introduced into GA to help reduce the amount of referral spam you see in your reports. For example, if you were to add http://www.amazon.com/gp/browse.html/?ie=UTF8&marketplaceID=ATVPDKIKX0DER&me=A38N9U4NBLQ5ZJ&merchant=A38N9U4NBLQ5ZJ&redirect from http://buy.com, it would show up under referral spam in your regular reports. However, if you add that site to your excluded sites list using the Solution Gallery, it will no longer appear as referral spam in your reports. While this may sound like a great way to filter out referrals coming from Amazon or any other particular domain, there are some important reasons not to use this feature until the next release.

The reason is that even though you may exclude these sources, they can still affect your metrics when combined with the Web Analytics bot filtering features. This is because GA has several systems that deliver data to users' reports. The first is what Google calls Real-Time reporting, which delivers information directly from GA's server side (and Cloudflare). In addition, Google Analytics uses a mechanism called Data Import to periodically pull data from the server-side logs and send it directly to the user's browser so they can view their results in near real time. This is how most users see GDN or AdWords activity regularly. Finally, Google also offers a feature called Include/Exclude Filtering, which allows site owners who have been tracking their site for some time to specify additional source/medium combinations that should not be imported into Google Analytics from specific domains.

Since all of these systems are used by GA to provide your reports with data, there may be situations where you exclude a particular source that would typically show up as referral spam in one report but instead appears in another system. For example, if a site owner were to exclude http://buy.com from all of their reports using the Solution Gallery, all traffic from buy.com would stop being included in data sent for Google services such as AdSense or DoubleClick Campaign Manager (DCM or DART). However, since that site is still tracking activity on its domain, it will continue to deliver referral spam when you filter out one system but not another. Because the bot filtering is currently used only to remove bot traffic referrals from sources you have intentionally added in your profiles and filtered out in a profile directly related to Analytics, the recommendation is to wait until GA 3.1 has been released before adding any new sites to your excluded sites list.

Other types of bots

Not all bots are created equal, so we treat them differently for bot filtering. We start by identifying each referral's source and looking at our information about its behaviour in Google Analytics reports over time before deciding how to filter out that particular type of traffic. For example, we know that most search engine crawlers (such as Googlebot) always visit one page and then leave. If they ever visit more pages on your site (and GA is recording those visits), this usually means there is a problem with our filters, or something has changed with the way your website operates such that activity from these bots can no longer be tracked accurately. Therefore, any web-based crawler (Google, BingBot, AskJeeves, etc.) that has visited more than one page in the past 30 days (in any profile) will be filtered out automatically. In addition, since we know there is a high probability that AdSense crawler traffic comes from mobile devices, and we can't track the activity on those devices accurately, all traffic from IP addresses owned by Google and assigned to mobile devices (and only Google-assigned IP addresses for mobile devices worldwide) will be removed.
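A sketch of that "more than one page in 30 days" heuristic is below; the log format and crawler names are hypothetical, and this is an illustration of the rule as described, not GA's actual implementation.

from collections import defaultdict
from datetime import datetime, timedelta

KNOWN_CRAWLERS = ("Googlebot", "bingbot", "AskJeeves")

def crawlers_to_filter(hits, now=None):
    # hits: iterable of (user_agent, page, timestamp) records (hypothetical
    # log format). Returns crawlers that viewed more than one distinct page
    # in the past 30 days, per the heuristic described above.
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=30)
    pages_seen = defaultdict(set)
    for user_agent, page, ts in hits:
        if ts >= cutoff and any(bot in user_agent for bot in KNOWN_CRAWLERS):
            pages_seen[user_agent].add(page)
    return [agent for agent, pages in pages_seen.items() if len(pages) > 1]

now = datetime.utcnow()
sample = [
    ("Googlebot/2.1", "/", now),
    ("Googlebot/2.1", "/about", now),
    ("bingbot/2.0", "/", now),
]
print(crawlers_to_filter(sample, now))  # ['Googlebot/2.1']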

Detecting suspicious activity in GA

The last category of traffic that we filter out is any suspicious activity. What exactly does this mean? Well, Google Analytics interprets suspicious activity as any behaviour that is either highly unusual or violates the Terms of Service you agreed to when you signed up for your Analytics account. This can include bots/scripts, which usually generate spam on websites (such as spam comments). It may also occasionally represent real visitors with JavaScript disabled or malfunctioning browsers, so they cannot load GA properly.

When these referral entries are detected in your reports, Analytics automatically filters them out. It then keeps track of the number it has filtered over time per source by adding their values together in a new column called "Filtered Referrals" (the column is only visible in reports with "Unsampled Data"). So, for example, if Analytics identified 12 spam referrals from http://spam.com over the past 30 days and filtered them out, the new Filtered value for that particular source would be 12.

Currently, when a referral entry is removed due to suspicious activity detection, it will still count towards your total number of visitors for that date range, because Google Analytics cannot reliably determine whether these spammers have been blocked permanently or just temporarily until their next visit; counting them this way better predicts your actual visitor numbers. However, since the filters are constantly updated to catch more bots/scripts, you may notice temporary drops in traffic as each type of bot is filtered out.

"Blocked Referrals" column

As you may know, GA has had a similar reporting column named "Blocked Referrals" for quite some time now, which counts all referral entries that were blocked by your filters (including bot filtering) and also kept track of the number over time per source. As with suspicious activity detection, we treat blocked referrals as a temporary condition until their next visit because it is impossible to determine if a visitor was a spammer or not without additional information from the crawler itself. Therefore, once a referral entry is blocked this way, Analytics adds its value to the Filtered value instead of Blocked for that particular source/profile combo to better account for temporary drops in traffic.

Over time, once the frequency of these blocked referrals drops to a point where they no longer substantially impact your actual visitor numbers, their values will stop being added together, and only the Filtered value will be shown, which does not include any referral entries blocked by your filters. As a result, you will see a change in how this data is reported, but it should still correspond roughly with the Blocked value before this update. This change will be invisible to most users unless you notice that one or more sources suddenly have larger Filtered values than Blocked ones. In other words, if both blocking and filtering are going on for some reason, look at the reporting columns for each source/profile pair to determine which referral entries are being filtered out.

Identify and exclude Bot traffic.

Even though the Analytics team has been hard at work to improve our ability to detect spam traffic, we still can't always catch everything. In those cases where a bot manages to slip through the cracks, and your reports show a significant number of referrals from a single source/profile combo over a short timeframe (especially if they are spammers known for producing referral spam), you may see something like this:

This particular message indicates that Google Auto Update was blocked because spammers likely use it to help spread their malware or drive-by downloads. If you don't recognize any of these sources as legitimate referring websites for your site activity, there's a good chance that one or more of them were blocked because they are bots attempting to post fake content on your site.

The easiest way to differentiate between a valid referral and one that was blocked is to click on the Full Referrer value reported under the Details section of the notification message.

Suppose you do not recognize this referrer as a legitimate source of traffic, such as another website, a newsletter subscription, or someone sharing your content with others via email. In that case, it's likely spam, so take steps to remove these entries from Analytics by blocking them in your configuration file.

By default, the filters intended for suspicious activity detection look at whether an IP address has made more than 50 requests over the past 10 minutes. This means that if you have bot filtering activated along with other restrictions, either globally or locally (e.g., enabling your Referral Exclusion List), you may want to check whether the referrer's IP address is in one of these lists. If not, we suggest blocking it by adding it to your configuration file and adjusting the number in the filters_bulk_action filter accordingly.
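That 50-requests-in-10-minutes threshold amounts to a sliding-window counter per IP. Here is a minimal Python sketch of the logic; the class, the sample IP, and the thresholds are illustrative, not GA internals.

from collections import defaultdict, deque

WINDOW_SECONDS = 10 * 60  # 10-minute window
MAX_REQUESTS = 50         # threshold described above

class SuspiciousIPDetector:
    def __init__(self):
        self.requests = defaultdict(deque)  # ip -> recent request timestamps

    def record(self, ip, timestamp):
        # Record one request; return True if this IP now looks suspicious.
        window = self.requests[ip]
        window.append(timestamp)
        while window and window[0] < timestamp - WINDOW_SECONDS:
            window.popleft()  # drop requests older than the window
        return len(window) > MAX_REQUESTS

detector = SuspiciousIPDetector()
for second in range(60):
    flagged = detector.record("203.0.113.7", second)  # 60 requests in 1 minute
print("suspicious:", flagged)  # True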

Steps you should take

The best way to block bot traffic is through your access control mechanism, where you manage which bots are allowed or blocked from accessing your site content. On Apache servers this can be done through .htaccess files using mod_rewrite directives, while Nginx users place equivalent rules within their server configuration blocks. Either way, consult your hosting administrator about any changes that need to be made to your site's access control mechanism.
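The exact .htaccess or Nginx rules depend on your server setup, so they aren't shown here; as an application-level illustration of the same idea, this is a minimal WSGI middleware in Python that returns 403 for blocklisted user agents. The agent strings are placeholders, not a vetted bot list.

BLOCKED_AGENTS = ("BadBot", "EvilScraper")  # placeholder user-agent fragments

class BlockBotsMiddleware:
    # Wraps any WSGI app and rejects requests from blocklisted user agents.
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if any(bot in user_agent for bot in BLOCKED_AGENTS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return self.app(environ, start_response)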

The most direct way to block traffic from specific bots you are seeing is to add them as Referral Exclusion List entries in Google Analytics. This list can be found under Admin, in the property settings for the site you want to filter. Click on Referral Exclusion List.

From there, you have two options: either enter each bot you want to block manually or use a list of known referrers provided by Google. The latter option is beneficial because known patterns make it easier to identify common bots and block them more efficiently with your filters.

Updating Filters

Editing Configuration File

On an ongoing basis, we recommend that you monitor Analytics reports for new spam and spider traffic and add these suspicious referrers to your Referral Exclusion List accordingly. While it's possible to block them manually on an ad-hoc basis, keep in mind that manual blocking requires updating the bots list with every change, while adding them through the interface lets you make the update only once.

If you decide to use Google's list of known bots, we highly recommend enabling the option labelled Validate Entries Using: Common Known Bots. This will help ensure that unidentified referrers (i.e., those not present in our customer or user lists) are given a more accurate assessment by re-checking against our latest historical data set. Again, be sure you add new bots that fall into this category to your Referral Exclusion List or use the other blocking methods described above. Allowing unidentified bots to pass unblocked may cause fairly significant fluctuations in your web traffic statistics.

How to tell which referrers were blocked and how effective it was

The easiest way is to monitor the percentage of referrals reported as blocked under Admin > View Settings while comparing it with the total number of all referrals.

Also, while we do not list specific entities within the Referral Exclusion List (due to privacy concerns), you can find out more about what happened by looking at our Error Reports for Google Analytics, which includes information about any spambots that our filters could not adequately identify.

The New Bot and Spider Filtering Features

In addition to the new bot blocking capabilities, we have also added a set of filters that can help improve your statistics even more by removing all traffic from bots that are actively crawling your site.

Of course, this is not necessarily an indication that you should block these referrers since some crawlers might be accessing public pages while others are only following links within content accessible only to authenticated users. Thus, if you want to prevent Google Bot or any other known entity from sending you referral data through Analytics, adding them manually under Admin > View Settings would be the way to go.

For example, disabling Googlebot may help filter out stats caused by indexed search results while still allowing logged-in users to view content on your site, keeping reports as accurate as possible.

On the other hand, if you would like to remove all international spiders and bots that are not included in the list of known crawlers and that do not pass a certain threshold for time spent on your site, use the configuration file method described above, or add their referrers directly by clicking on Referral Exclusion List under Admin within any Profile > Edit Settings.

Note: be sure to restart your application after updating your configuration file so the changes take effect. Also, keep in mind that some bots crawl only public pages while others may require tokens provided by Google to access authenticated content. We highly recommend adding them manually or using one of the pre-configured lists, which can filter automatically, since blocking all "unknown" referrers may cause significant fluctuations in your traffic statistics.
