Comments
Are you using any WP caching plugins?
It's possible Cloudflare could help you with this.
Install Quick Cache ~ http://wordpress.org/extend/plugins/quick-cache/
or limit crawlers with robots.txt
Install Varnish. Move Apache to listen on a different port, then configure Varnish to listen on port 80 and proxy traffic to the Apache port.
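A minimal sketch of that setup, assuming a Debian-style box with Varnish 3 and Apache moved to port 8080 (the port, paths and cache size are just examples):

# /etc/apache2/ports.conf -- move Apache off port 80
Listen 8080

# /etc/varnish/default.vcl -- point Varnish at Apache
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

# /etc/default/varnish -- have varnishd answer on port 80
DAEMON_OPTS="-a :80 -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -S /etc/varnish/secret \
             -s malloc,256m"

Remember to change your <VirtualHost *:80> entries to *:8080 as well, then restart both services.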
I use fastcgi cache plus w3tc. It looks like they were crawling individual comments.
Are you using nginx?
Yes, nginx with fastcgi cache and W3TC.
Do you know which bot? I have a feeling it's Bingbot.
PHP-FPM?
Yes, FPM. The pages they requested weren't cached. Bingbot was looking at unique comment IDs or something.
Bingbot is known for rushes, where it does a day's worth of queries in a few hours.
Do you have a forced timeout set?
Hmm... I've never seen official bingbot doing this. I've seen a fake bingbot do this to me, but that's the extent of the damages.
I'm going to try robots.txt, as Bingbot will apparently honour the Crawl-delay rate.
User-agent: *
Crawl-delay: 15
Disallow: /wp-admin/
Disallow: /wp-includes/
Don't add disallows, as malicious bots/script kiddies search for those Disallow entries.
Try changing the folder/dir names to non-standard ones.
You can set the crawl delay for Bing only, or for all bots; just google it.
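For example, something like this in robots.txt (the delay values are arbitrary; Bing honours Crawl-delay, Google ignores it):

User-agent: bingbot
Crawl-delay: 15

User-agent: *
Crawl-delay: 5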
Log into Bing's webmaster interface. It'll let you slow the crawl rate from there, as well as identify the best time to crawl.
And have some fun with it.
Oh yeah.
I once had a site where I put hidden links (both text and images) and robots.txt entries pointing at the same honeypot page. Around 150 bots hit that page.
@Gien I think the 20-plus wp-content references every page has pretty much give WP away.
@bdtech yeah, you can always tell if it's a WP site.. but there are ways to hide it a bit.
Also, you don't want extra attention on your admin panel..
Also add another layer with htaccess in your wp-admin, wp-includes, etc.
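A minimal sketch of that extra layer with Apache basic auth (the .htpasswd path and user are placeholders):

# wp-admin/.htaccess -- second password in front of the admin area
AuthType Basic
AuthName "Restricted"
AuthUserFile /home/youruser/.htpasswd
Require valid-user

Create the password file first with: htpasswd -c /home/youruser/.htpasswd youruser. Note that wp-admin/admin-ajax.php is used by some themes/plugins on the front end, so you may need to whitelist it.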
If you can edit your iptables rules / hosts.deny file, you can add the IPs of the malicious bot(s).
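For example (1.2.3.4 standing in for the bot's IP):

# drop everything from the offending IP
iptables -A INPUT -s 1.2.3.4 -j DROP

# or in /etc/hosts.deny -- note this only covers TCP-wrapper-aware
# services, so it won't help with nginx/Apache themselves
ALL: 1.2.3.4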
If it's really an issue, use fail2ban. Then it's managed automatically.
Which fail2ban rule do you use?
I had a similar issue once. Register for Bing Webmaster Tools (www.bing.com/toolbox/webmaster) and adjust the crawl speed from their controls; you can reduce it so that instead of crawling your site in an hour, it does so gradually and keeps your load down.
I don't. I was just following up on the honeypot concept for dealing with malicious bots that abuse "Disallow" entries in robots.txt. Legitimate bots like Google and Bing can be rate-limited as discussed. For malicious bots:
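A rough sketch of the trap, with fail2ban doing the banning (the /bot-trap/ path, filter name and log path are made-up placeholders): disallow a hidden page in robots.txt so legitimate crawlers skip it, then ban any IP that fetches it anyway.

# robots.txt -- well-behaved crawlers will stay away
User-agent: *
Disallow: /bot-trap/

# hidden link somewhere in your template
<a href="/bot-trap/" style="display:none">&nbsp;</a>

# /etc/fail2ban/filter.d/bot-trap.conf -- match any hit on the trap
[Definition]
failregex = ^<HOST> .*"GET /bot-trap/

# /etc/fail2ban/jail.local -- ban on the first hit, for a day
[bot-trap]
enabled  = true
port     = http,https
filter   = bot-trap
logpath  = /var/log/nginx/access.log
maxretry = 1
bantime  = 86400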
Yeah, @sleddog's suggestion should keep the bad ones out most of the time.
Found the culprit: W3TC does not cache if there's a query string, and these requests from Bingbot and others peg MySQL:
/preview-science-20/?replytocom=117365
Any idea how I can translate this to nginx?
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "msnbot|Googlebot|bingbot|Slurp|ScoutJet|MJ12bot|Baiduspider|Ezooms|YandexBot|Exabot"
RewriteCond %{QUERY_STRING} ^replytocom=\d+$
RewriteRule ^(.*)$ http://%{HTTP_HOST}%{REQUEST_URI}? [redirect=301,last]
http://support.tigertech.net/wordpress-performance
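For what it's worth, an untested nginx sketch of the same idea (nginx forbids nested ifs, hence the two-flag trick; this goes in the server block):

# flag requests from the listed crawlers
set $bot_reply "";
if ($http_user_agent ~* "msnbot|Googlebot|bingbot|Slurp|ScoutJet|MJ12bot|Baiduspider|Ezooms|YandexBot|Exabot") {
    set $bot_reply 1;
}
# flag requests carrying a replytocom query arg
if ($arg_replytocom) {
    set $bot_reply "${bot_reply}1";
}
# both flags set: 301 to the same URI with the query string
# stripped ($uri deliberately excludes the query string)
if ($bot_reply = 11) {
    return 301 $scheme://$host$uri;
}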