So the unique visitor tracking test is running. At the time of writing, I have 159 unique visitors in my visitor table. The excerpt of the results below makes it clear that I need to flag the bots and crawlers and exclude them from the page tracking.
id | User agent
82 | Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_7; en-us) AppleWeb [...]
83 | Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/530.5 [...]
84 | msnbot/2.0b (+http://search.msn.com/msnbot.htm)
85 | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; [...]
86 | Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.2) Gec[...]
id | Visitor | Page
341 | 83 | /
342 | 83 | /tour.html
343 | 56 | /refund/
344 | 56 | /privacy/
345 | 84 | /robots.txt
346 | 84 | /
347 | 85 | /robots.txt
348 | 85 | /
349 | 84 | /refund/
The good thing is that the robots and crawlers are good Internet citizens: as you can see for the MSN bot with id 84, they always request the robots.txt file first. This means that one can directly flag a new visitor as a bot if its first action is to grab the robots.txt file.
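As a minimal sketch of that rule, assuming the page views are available as (id, visitor, path) tuples and that a smaller id means an earlier request, the flagging could look like this in Python. The function name and the tuple layout are hypothetical; the rows are the ones from the excerpt above.

```python
def flag_bots_by_first_request(page_views):
    """Return the ids of visitors whose first logged request is /robots.txt.

    page_views: iterable of (view_id, visitor_id, path) tuples. A smaller
    view_id is assumed to mean an earlier request.
    """
    first_path = {}
    for _view_id, visitor_id, path in sorted(page_views):
        # setdefault keeps only the first path seen for each visitor.
        first_path.setdefault(visitor_id, path)
    return {v for v, path in first_path.items() if path == "/robots.txt"}


# Rows taken from the page tracking excerpt.
views = [
    (341, 83, "/"), (342, 83, "/tour.html"),
    (343, 56, "/refund/"), (344, 56, "/privacy/"),
    (345, 84, "/robots.txt"), (346, 84, "/"),
    (347, 85, "/robots.txt"), (348, 85, "/"),
    (349, 84, "/refund/"),
]
bots = flag_bots_by_first_request(views)  # visitors 84 and 85
```

Only the very first request of each visitor is checked, so a human who happens to open robots.txt later in a session is not flagged.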
Now, this will kick out most of the bots and crawlers, but not ones like this:
70 | Mozilla/5.0
A very minimal user agent string.
303 | 70 | /doc.html//?_SERVER[DOCUMENT_ROOT]=http://www.[...]
304 | 70 | /
305 | 70 | //?_SERVER[DOCUMENT_ROOT]=http://www.[...]
306 | 70 | /doc.html//?_SERVER[DOCUMENT_ROOT]=http://www.[...]
307 | 70 | /
308 | 70 | /doc.html//?_SERVER[DOCUMENT_ROOT]=http://www.[...]
309 | 70 | //?_SERVER[DOCUMENT_ROOT]=http://www.[...]
310 | 70 | /
And it is looking to trash my site. I already skip logging visitors without a user agent string, but it looks like I will need to use the heuristics of AWStats to mark more of the visitors as bots.
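To give an idea of what such heuristics could look like, here is a rough Python sketch. It is not the AWStats robot list: the substrings, the regular expression, and the function name are my own assumptions, picked only to cover the cases seen in the excerpts above.

```python
import re

# Hypothetical hints, NOT the actual AWStats robot list.
BOT_UA_HINTS = ("bot", "crawler", "spider", "slurp")

# Requests trying to override a server variable or pull in a remote URL.
# The second alternative is broad and may hit legitimate redirect parameters.
INJECTION_RE = re.compile(r"_SERVER\[|=https?://", re.IGNORECASE)


def looks_like_bot(user_agent, paths):
    """Guess whether a visitor is a bot from its user agent and requests."""
    ua = (user_agent or "").strip().lower()
    if not ua:
        # No user agent string at all.
        return True
    if any(hint in ua for hint in BOT_UA_HINTS):
        return True
    if " " not in ua and "(" not in ua:
        # A bare product token such as "Mozilla/5.0", with no platform details.
        return True
    # Any request that looks like a remote file inclusion attempt.
    return any(INJECTION_RE.search(path) for path in paths)


looks_like_bot("msnbot/2.0b (+http://search.msn.com/msnbot.htm)", ["/"])
looks_like_bot("Mozilla/5.0", ["//?_SERVER[DOCUMENT_ROOT]=http://www.[...]"])
```

Both calls return True. A regular browser user agent with ordinary page requests passes through unflagged, but rules this simple will produce some false positives and should be tuned against real logs.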
/robots.txt as crawler/bot. I am going to work on that this afternoon and will report the results to you.
Loïc d'Anterroches,
Aug 26, 2009
© Céondo Ltd, 2007-2013. All rights reserved.