Anomie said: When I had spiders allowed on a previous WordPress page, they just hammered the daylights out of it, no doubt wasting a lot of bandwidth to no good purpose.
Is there a plugin or a robots.txt tweak that will allow crawlers access to the site, but just occasionally?
Somewhere, I've bookmarked a page that explains creative uses of robots.txt, and it has been very useful in the past, but I'm not sure it will do exactly what I want here... though it might. I seem to recall that the delay had a fairly brief upper limit. Anyway, I was somewhat surprised to see that all the SEs actually honored all the other settings as I had them at that time.
Tobymaro said: Unfortunately, as far as I know, there is no such documented option. robots.txt is used for denying or allowing bots. But you may try
Crawl-delay: X
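For illustration, here is a minimal sketch of how a well-behaved crawler might read that directive, using Python's standard urllib.robotparser module. The 10-second value, the ExampleBot name, and the helper function are assumptions made up for this example, and whether a given search engine actually honors Crawl-delay is up to that engine.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow everything, but ask crawlers to pause
# 10 seconds between requests. The value 10 is an assumption.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow:
"""


def read_crawl_delay(robots_txt, user_agent="ExampleBot"):
    """Return the Crawl-delay in seconds that applies to user_agent, or None."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print(read_crawl_delay(ROBOTS_TXT))
```

A crawler that respects the directive would sleep for that many seconds between requests. Nothing on the server side enforces it.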
DJB said: Spiders and crawlers indexing your pages are a good thing; you should embrace them, not boot them. It's an indication your website(s) are growing.
[...]
Be careful: blocking access to search engine robots could result in a loss of traffic to your website. Not only that, these techniques are only instructions to robots; they can ignore them if they choose (for example, ahrefs.com).
There are two problems for me:
xdude said: Hmm, I have never bothered about spiders, and the amount of bandwidth they use is not really a big deal since we have huge amounts of bandwidth these days.
strokerace said: Ummm, I would be blocking some of these bots, as they are taking way too much bandwidth. They should be using a few megs, not gigs of bandwidth.
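Before blocking anything, it may help to check how much each bot actually uses. The following is a rough sketch, assuming the server writes its access log in the common "combined" format (request, status, bytes, referer, user agent); the function name and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Assumed combined log format tail: "request" status bytes "referer" "user-agent"
LOG_TAIL = re.compile(r'"[^"]*" (\d{3}) (\d+|-) "[^"]*" "([^"]*)"\s*$')


def bytes_per_agent(lines):
    """Sum the response bytes per user-agent string; unparseable lines are skipped."""
    totals = Counter()
    for line in lines:
        match = LOG_TAIL.search(line)
        if match is None:
            continue
        size = match.group(2)
        # A "-" in the bytes field means no body was sent.
        totals[match.group(3)] += 0 if size == "-" else int(size)
    return totals


if __name__ == "__main__":
    # Made-up sample lines, not real traffic.
    sample = [
        '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /a HTTP/1.1" 200 2048 "-" "ExampleBot/1.0"',
        '203.0.113.5 - - [10/Oct/2023:13:55:40 +0000] "GET /b HTTP/1.1" 200 1024 "-" "ExampleBot/1.0"',
        '198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET /a HTTP/1.1" 304 - "-" "Mozilla/5.0"',
    ]
    for agent, total in bytes_per_agent(sample).most_common():
        print(agent, total)
```

Sorting by total makes it easy to see whether a particular crawler is really into the gigs or only a few megs.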