Block Bots by User Agent String

From Brian Nelson Ramblings

Latest revision as of 01:26, 27 October 2020

How to block a bot by User Agent String

Do you have as many bandwidth-hogging bots as Phil and I do? Did you know you can block them in your .htaccess file?

Block the bot

Let's block one of the most annoying bots on the internet: the Baidu spider.

vim .htaccess

Now Add the following to block bad bots

Via mod_rewrite

Note: <Directory> sections are not allowed inside .htaccess files, so the directives go in the file directly:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Baidu [NC]
RewriteRule ^.*$ - [F,L]
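If you want to block more bots than Baidu with mod_rewrite, a single condition with alternation keeps the ruleset short. A sketch (the bot list here is an example, not exhaustive):

```apache
RewriteEngine On
# Match any of the listed user-agent substrings, case-insensitively ([NC])
RewriteCond %{HTTP_USER_AGENT} (Baiduspider|AhrefsBot|DotBot|PetalBot) [NC]
# Respond 403 Forbidden ([F]) and stop processing further rules ([L])
RewriteRule ^ - [F,L]
```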

Via BrowserMatch

BrowserMatchNoCase "Baiduspider" bots
BrowserMatchNoCase "HTTrack" bots
BrowserMatchNoCase "Yandex" bots
BrowserMatchNoCase "AhrefsBot" bots
BrowserMatchNoCase "Pinterestbot" bots
BrowserMatchNoCase "YandexImages" bots
BrowserMatchNoCase "YandexBot" bots
BrowserMatchNoCase "Facebot" bots
BrowserMatchNoCase "DotBot" bots
BrowserMatchNoCase "PetalBot" bots

Order Allow,Deny
Allow from all
Deny from env=bots
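Note that Order/Allow/Deny is Apache 2.2 syntax; on Apache 2.4 it only works if mod_access_compat is loaded. A sketch of the native 2.4 equivalent, reusing the same bots environment variable:

```apache
# Apache 2.4 access control: grant everyone, then subtract flagged bots
<RequireAll>
    Require all granted
    Require not env bots
</RequireAll>
```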

That's it. Save the file, and you are now blocking the Baidu spider.

Testing to see if it's blocked

One way to do this is to use curl:

curl -I http://www.briansnelson.com -A "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

Now you will get a 403 Forbidden

HTTP/1.1 403 Forbidden
Date: Mon, 06 Jan 2014 19:18:11 GMT
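If you'd rather sanity-check the pattern without hitting the live site, you can test a user-agent string against the bot list locally. A minimal sketch using grep, where -Ei approximates Apache's case-insensitive substring matching (the pattern mirrors the BrowserMatch list above):

```shell
#!/bin/sh
# Bot substrings from the BrowserMatch rules above, joined into one regex
pattern='Baiduspider|HTTrack|Yandex|AhrefsBot|Pinterestbot|Facebot|DotBot|PetalBot'

# The same user-agent string the curl test sends
ua='Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)'

# grep -Eiq: extended regex, case-insensitive, quiet (exit status only)
if printf '%s' "$ua" | grep -Eiq "$pattern"; then
  result=blocked
else
  result=allowed
fi
echo "$result"
```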