Empty User-agent

1. How to start working with us.

Geolance is a marketplace for remote freelancers who are looking for freelance work from clients around the world.

2. Create an account.

Simply sign up on our website and get started finding the perfect project or posting your own request!

3. Fill in the forms with information about you.

Let us know what type of professional you're looking for, your budget, deadline, and any other requirements you may have!

4. Choose a professional or post your own request.

Browse through our online directory of professionals and find someone who matches your needs perfectly, or post your own request if you don't see anything that fits!

Why am I getting a blank user agent? My analysis code processes only the traffic stored in the database. Some user-agent strings turn out to represent both human traffic and bot traffic, but the empty user-agent string is the real problem: roughly 10% of my traffic arrives with no user agent at all. I have also built a database of user-agent strings from my log analysis, split into humans and robots, though I may have missed some entries there. Do user-agent strings reliably distinguish bots from humans?

User-agent

If your application needs to identify whether the originator of a request is a human or a bot, create an algorithm that collects this data and makes that decision directly. This is much easier than maintaining two separate databases for one protocol.
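As a rough illustration of that idea, here is a minimal Python sketch: one lookup table of known user-agent strings and a single decision function, instead of two separate databases. The table entries and the "unknown" label are illustrative assumptions, not part of any particular tool.

```python
# A minimal sketch: one table of known user-agent strings with a label,
# and one decision function. The entries and labels are illustrative.
KNOWN_AGENTS = {
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36": "human",
    "Googlebot/2.1 (+http://www.google.com/bot.html)": "bot",
}

def classify_request(user_agent: str) -> str:
    """Return 'human', 'bot', or 'unknown' for a raw User-Agent header value."""
    if not user_agent or not user_agent.strip():
        return "unknown"  # empty user agent: no basis for a decision
    return KNOWN_AGENTS.get(user_agent, "unknown")
```

Anything the table does not cover, including the empty string, ends up in the "unknown" bucket rather than being forced into one of the two databases.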

The interpretation of empty user agents is difficult - if not impossible - because bots may send many record types other than GET requests, with no user agent specified at all. For example, if you have 10,000 requests but only ten arrive with an empty user agent, that is just 0.1% of traffic - far too small a sample to say anything reliable about how much of it is human.

Suppose you would like to establish whether there is a risk factor associated with the originator of request traffic (i.e. humans vs bots). In that case, you can use the following algorithm:

1. First, identify all GET requests by filtering out other record types (POSTs, etc.) - this gives you the total number of GET requests.

2. Then, filter out sessions where no request parameters were passed (this removes any remaining POST and PUT traffic). This step leaves a list of entries that indicate a page view or a download identified by a URL parameter. You can split them up per second, minute, or hour if you want.

3. Finally, calculate the percentage of requests sent with no User-Agent header at all.

This gives you a measure of how many requests come from bots that do not identify themselves - which may be indicative of a security risk.
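Here is a minimal Python sketch of steps 1 and 3, assuming an access log in the common combined format; the regular expression, the field layout, and the treatment of "-" as an empty agent are assumptions about your setup, and the per-interval bucketing from step 2 is left out for brevity.

```python
import re

# Matches the quoted request line, status, size, referer, and user agent of a
# "combined" format access log entry. This layout is an assumption about your logs.
LINE_RE = re.compile(
    r'"(?P<request>[^"]*)"\s+\d+\s+\S+\s+"(?P<referer>[^"]*)"\s+"(?P<agent>[^"]*)"'
)

def empty_agent_share(log_path: str) -> float:
    """Percentage of GET requests that carry no User-Agent header."""
    total_gets = 0
    empty_agent_gets = 0
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            match = LINE_RE.search(line)
            if not match:
                continue
            # Step 1: keep only GET requests, dropping POST, PUT, etc.
            if not match.group("request").startswith("GET "):
                continue
            total_gets += 1
            # Step 3: count requests whose user-agent field is empty (logged as "" or "-").
            if match.group("agent") in ("", "-"):
                empty_agent_gets += 1
    return 100.0 * empty_agent_gets / total_gets if total_gets else 0.0

# Example usage: print(empty_agent_share("access.log"))
```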

Empty user agent vs. robot

It is pretty much impossible to make such a generalization. Your in-house definition of what constitutes a bot may differ from my definition of a bot, and from the interpretation used by the tool developers.

What we can say with certainty is that a request with a missing user-agent string is, by definition, associated with sessions that are not contained in the robot side of the database. Therefore, if you are getting traffic from empty user agents, it is not being detected by your "bot" detection algorithm - i.e. you are probably missing some types of bots.
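To see why, consider a minimal sketch in which the robot side of the database is just a set of known strings (ROBOT_AGENTS below is an illustrative name): an empty user agent can never match any stored entry, so those requests always fall outside the bot-detection path.

```python
# A minimal sketch: the robot side of the database as a set of known strings.
# ROBOT_AGENTS and its single entry are illustrative assumptions.
ROBOT_AGENTS = {"Googlebot/2.1 (+http://www.google.com/bot.html)"}

def is_detected_bot(user_agent: str) -> bool:
    """True only when the user agent exactly matches a known robot entry."""
    return user_agent in ROBOT_AGENTS

# An empty string cannot match any stored entry, so empty-user-agent
# requests always slip past this style of detection.
assert is_detected_bot("") is False
```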

Signs you have no human traffic

A better question might be: how do you know that your detection mechanism works 100% of the time? Unfortunately, you don't - because an unknown number of bots will likely exist that are not covered by the detection mechanism you use.

Be aware, though, that most (if not all) "bots" out there are simple programs that analyze web pages and download data - nothing as complicated as a separate application layer or a multi-threaded network connection working on multiple requests at once. In the scenario described above, about 10% of traffic arrives with an empty user-agent string. Bot traffic also includes robots such as Googlebot, which was developed for indexing purposes and not for testing your web application's security.

Subscribe to RSS

One way to mitigate the risk of traffic from unknown or unidentified user agents is to subscribe to an RSS feed that notifies you of any changes (new entries) in blank user-agent data, including newly registered bots and new browser types and versions. You could then create a rule to automatically block and monitor bad bot traffic from these sources.
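As one possible shape for such a rule, here is a minimal WSGI-style Python sketch that rejects requests arriving without a User-Agent header; the 403 policy and the middleware approach are illustrative choices, not a recommendation for any particular server or WAF.

```python
# A minimal sketch of a blocking rule as WSGI middleware: requests with an
# empty or missing User-Agent header receive a 403. The policy is illustrative.
def block_empty_user_agent(app):
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").strip()
        if not user_agent:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Requests without a User-Agent header are not accepted.\n"]
        return app(environ, start_response)
    return middleware
```

In practice you would usually log and monitor these requests before enabling an outright block, since some legitimate internal tools also omit the header.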

Alternatively, you can use a commercial WAF that includes a user-agent database built from user contributions for identifying known bots and malicious activity. These databases are updated regularly, so you can be confident that you have the latest information about bot behaviour.

In conclusion, understanding how web application firewalls work is essential to protecting your applications, whether you are using open-source software or a commercial WAF. You need to understand how it works and what it can and cannot do. You should always expect some false positives and a small amount of unknown activity - this is simply the nature of this type of technology.

Geolance is an on-demand staffing platform

We're a new kind of staffing platform that makes it simple for professionals to find work. No more tedious job boards; we've done all the hard work for you.


