More ways to stop spammers and unwanted traffic

Comment spammers, trackback spam, stupid bots and AVG linkscanner eating into your bandwidth and server resources? Here’s how to put a dent in their activities with a few mod_rewrite rules.

I hate those blogs that send me fake trackbacks and pingbacks. Unfortunately it’s impossible to stop them all, but this morning I figured out a way of stopping some of them.

Look through the log files of your web server for the string ‘ "-" "-"’. Lots of requests there, aren’t there? I found 914 requests yesterday. Those are requests without a USER_AGENT or HTTP_REFERER and almost all of them are suspicious because they weren’t followed by requests for images, stylesheets, or JavaScript files. Unfortunately WordPress cron requests also fall into this category, so you need to filter out requests from your own server’s IP address.
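
As a rough sketch of that log check, the Python below counts requests per IP address that carry neither a referrer nor a user agent, and skips the server's own address. It assumes the standard Apache "combined" log format with straight quotes; the function name `anonymous_requests`, the `own_ip` parameter and the wp-cron.php sample line are made up for illustration.

```python
import re
from collections import Counter

# Apache "combined" format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def anonymous_requests(lines, own_ip):
    """Count requests per IP that sent neither a referrer nor a user agent.

    `own_ip` is the server's own address (hypothetical parameter name),
    skipped so that WordPress cron requests are not counted.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # not in combined format (or has escaped quotes); skip
        if match["ip"] == own_ip:
            continue
        # Apache logs a missing header as a single dash.
        if match["referer"] == "-" and match["agent"] == "-":
            counts[match["ip"]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '85.177.33.196 - - [03/Jul/2008:08:09:40 +0000] "GET /2005/03/01/i-am-bored-sites-for-when-youre-bored/all-comments/ HTTP/1.0" 200 410423 "-" "-"',
        '85.177.33.196 - - [03/Jul/2008:08:09:44 +0000] "POST /xmlrpc.php HTTP/1.0" 200 249 "-" "XML-RPC for PHP 2.2.1"',
        # Hypothetical cron request from the server itself, for illustration only.
        '64.22.71.36 - - [03/Jul/2008:08:10:00 +0000] "GET /wp-cron.php HTTP/1.0" 200 0 "-" "-"',
    ]
    for ip, hits in anonymous_requests(sample, own_ip="64.22.71.36").most_common():
        print(ip, hits)
```

To run it against a real log, pass an open file object instead of the sample list.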

This morning I checked up on a spam trackback that came in. This one came from 85.177.33.196:

URL: /xmlrpc.php
HTTP_RAW_POST_DATA: <?xml version="1.0"?>
<methodCall>
<methodName>pingback.ping</methodName>
<params>
<param>
<value><string>http://7wins. eu/cbprod/detail_10347/cure+your+tight+foreskin.html</string></value>
</param>
<param>
<value><string>http://ocaoimh.ie/2005/03/01/i-am-bored-sites-for-when-youre-bored/all-comments/</string></value>
</param>
</params>
</methodCall>

I looked through my log files for that IP address and discovered the following:

85.177.33.196 - - [03/Jul/2008:06:40:01 +0000] "GET /2005/02/18/10-more-ways-to-make-money-with-your-digital-cameras/ HTTP/1.0" 200 36151 "-" "-"
85.177.33.196 - - [03/Jul/2008:07:04:18 +0000] "GET /2007/06/07/im-not-the-only-one-to-love-the-alfa-147/ HTTP/1.0" 200 44967 "-" "-"
85.177.33.196 - - [03/Jul/2008:08:09:40 +0000] "GET /2005/03/01/i-am-bored-sites-for-when-youre-bored/all-comments/ HTTP/1.0" 200 410423 "-" "-"
85.177.33.196 - - [03/Jul/2008:08:09:44 +0000] "POST /xmlrpc.php HTTP/1.0" 200 249 "-" "XML-RPC for PHP 2.2.1"
85.177.33.196 - - [03/Jul/2008:09:00:09 +0000] "GET /2007/10/28/what-time-is-it-wordpress/ HTTP/1.0" 200 63332 "-" "-"

So, the spammer grabs “/2005/03/01/i-am-bored-sites-for-when-youre-bored/all-comments/” at 8:09am and 4 seconds later sends a trackback spam to the same blog post. Annoying, isn’t it?
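
The same pattern can be hunted for across a whole log. The sketch below is only an illustration: it assumes combined-format lines in chronological order with English month names, and the function name `fetch_then_ping` and the 30-second `window` are arbitrary choices rather than anything measured. It reports every POST to /xmlrpc.php that follows a GET from the same IP address within the window.

```python
import re
from datetime import datetime, timedelta

# Combined log format, with the request split into method and path.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>[^\s"]+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "[^"]*"$'
)


def fetch_then_ping(lines, window=timedelta(seconds=30)):
    """Yield (ip, fetched_path, gap) for each POST to /xmlrpc.php that
    follows a GET from the same IP within `window`."""
    last_get = {}  # ip -> (timestamp, path) of that IP's most recent GET
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # unparseable line; skip
        when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        if match["method"] == "GET":
            last_get[match["ip"]] = (when, match["path"])
        elif match["method"] == "POST" and match["path"] == "/xmlrpc.php":
            previous = last_get.get(match["ip"])
            if previous is not None and when - previous[0] <= window:
                yield match["ip"], previous[1], when - previous[0]


if __name__ == "__main__":
    sample = [
        '85.177.33.196 - - [03/Jul/2008:08:09:40 +0000] "GET /2005/03/01/i-am-bored-sites-for-when-youre-bored/all-comments/ HTTP/1.0" 200 410423 "-" "-"',
        '85.177.33.196 - - [03/Jul/2008:08:09:44 +0000] "POST /xmlrpc.php HTTP/1.0" 200 249 "-" "XML-RPC for PHP 2.2.1"',
    ]
    for ip, path, gap in fetch_then_ping(sample):
        print(ip, path, gap.total_seconds())
```

A legitimate pingback sender may also fetch the target page first, so a hit here is a hint worth checking rather than proof of spam.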

The following mod_rewrite rules will kill those fake GET requests dead.

# stop requests with no UA or referrer
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_USER_AGENT} ^$
# let this server's own requests (e.g. wp-cron.php) through
RewriteCond %{REMOTE_ADDR} !^64\.22\.71\.36$
RewriteRule ^(.*) - [F]

Replace "64\.22\.71\.36" with the IP address of your own server. If you don’t know what it is, look through your logs for requests for wp-cron.php, run ifconfig from the command line, or check with your hosting company.

Here are a few of the requests already stopped this morning:

72.21.40.122 - - [03/Jul/2008:09:59:59 +0000] "GET /2005/04/02/photo-matt-a-response-to-the-noise/ HTTP/1.1" 403 248 "-" "-"
216.32.81.66 - - [03/Jul/2008:10:00:11 +0000] "GET /2006/12/14/bupa-to-leave-irish-market/ HTTP/1.1" 403 240 "-" "-"
66.228.208.166 - - [03/Jul/2008:10:03:18 +0000] "GET /2008/05/23/youre-looking-so-silly-wii-fit HTTP/1.1" 403 212 "-" "-"
216.32.81.74 - - [03/Jul/2008:10:04:52 +0000] "GET /1998/03/22/for-the-next-month-o/ HTTP/1.1" 403 234 "-" "-"
69.46.20.87 - - [03/Jul/2008:10:06:06 +0000] "GET /2006/10/01/killing-off-php/ HTTP/1.1" 403 229 "-" "-"
72.21.58.74 - - [03/Jul/2008:10:07:54 +0000] "GET /2005/08/12/thunderbird-feeds-and-messages-duplicates/ HTTP/1.1" 403 255 "-" "-"

Some spam bots are stupid. They don’t know where your wp-comments-post.php is. That’s the file your comment form feeds when a comment is made. If your blog is installed in the root, “/”, of your domain you can add this one line to stop the 404 requests generated:

RewriteRule ^(.*)/wp-comments-post.php - [F,L]

Legitimate trackbacks and pingbacks almost always come from sane-looking user agents that identify the blog or forum software sending them. Look for “/trackback/” POSTs in your logs. Notice how 99% of them have browser names in their user agents? Those are spam. Here’s how to stop them, and this has been documented for a long time:

RewriteCond %{HTTP_USER_AGENT} ^.*(Opera|Mozilla|MSIE).*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteCond %{REQUEST_METHOD} ^POST$
RewriteRule ^(.*)/trackback/ - [F,L]

I’ve been using that chunk of code for ages and it works exceptionally well. I added it after this site received a deluge of 40,000 spam trackbacks in one day a few months ago.
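
Before enabling a rule like this, it can help to see which logged requests it would catch. The Python below is only an approximation of the conditions above, not a mod_rewrite emulator. The function name `would_block_trackback` and the sample user agent strings are illustrative assumptions, and it assumes the rules live in an .htaccess file, where the leading slash is stripped from the path before matching.

```python
import re

# Same alternation as the RewriteCond, applied as an unanchored search.
BROWSER_AGENT = re.compile(r"Opera|Mozilla|MSIE")

# Same pattern as the RewriteRule.
TRACKBACK_PATH = re.compile(r"^(.*)/trackback/")


def would_block_trackback(method, path, user_agent):
    """Return True if a request looks like one the trackback rule would refuse.

    `user_agent` may be "" or "-" (how Apache logs a missing header);
    both are treated as an empty user agent.
    """
    if method != "POST":
        return False
    # In .htaccess context the leading "/" is removed before matching.
    if TRACKBACK_PATH.match(path.lstrip("/")) is None:
        return False
    if user_agent in ("", "-"):
        return True
    return BROWSER_AGENT.search(user_agent) is not None


if __name__ == "__main__":
    url = "/2006/10/01/killing-off-php/trackback/"
    # Illustrative user agent strings, not taken from real logs.
    for agent in ("WordPress/2.5.1", "Mozilla/4.0 (compatible; MSIE 6.0)", "-"):
        print(agent, would_block_trackback("POST", url, agent))
```

Any blog software whose own user agent happens to contain “Mozilla” or “Opera” would be refused too, which is the trade-off this rule accepts.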

If you use my Cookies for Comments plugin, check your browser for the cookie it leaves and use the following code to block almost all of your comment spam:

RewriteCond %{HTTP_COOKIE} !^.*put_cookie_value_here.*$
RewriteRule ^wp-comments-post.php - [F,L]

That will block the spammers even before they hit any PHP script. Your server will breeze through the worst spam attempts. It blocked 2308 comment spam attempts yesterday. Unfortunately it also stops the occasional human visitor leaving a comment but I think it’s worth it.

Do something different. That’s what you have to do. Place a hurdle before the spammers and they’ll fall. On that note, I shouldn’t really be blogging all this, but almost all these ideas can be found elsewhere already and the spammers still haven’t adapted.

Unwanted traffic? What’s that? Surely all visitors are good? Nope, unfortunately not. Robert alerted me to the fact that AVG anti-virus now includes an AJAX powered browser plugin called “Linkscanner” that scans all the links on search engine result pages for viruses and malicious code. Unfortunately that generates a huge number of requests for pages that are never even seen by the visitor. I counted over 7,000 hits yesterday.

Thankfully Padraig Brady has a solution. I hope he doesn’t mind if I reprint his mod_rewrite rules here. (WordPress tends to convert straight " characters into curly quotes, so make sure the quotes are straight ones before using the rules, or grab the code from Padraig’s page.)

# Here we assume certain MSIE 6.0 agents are from linkscanner
# redirect these requests back to avg in the hope they'll see their silliness
RewriteCond %{HTTP_USER_AGENT} ".*MSIE 6.0; Windows NT 5.1; SV1.$" [OR]
RewriteCond %{HTTP_USER_AGENT} ".*MSIE 6.0; Windows NT 5.1;1813.$"
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP:Accept-Encoding} ^$
RewriteRule ^.* http://www.avg.com/?LinkScannerSucks [R=307,L]


67 thoughts on “More ways to stop spammers and unwanted traffic”


  1. Outstanding, will add to our increasing bundle of ‘Donncha’s spam stopping solutions’… seriously, you should roll this and your other neat htaccess blocks up together and do a release of them!

    Anyway, erm, thanks :)


  2. Very nice. Thank you for this information as it was exactly what I was looking for this morning. Over the past 3 days my comment spam has shot up exponentially so I needed a good solution.

    Thanks again!


  3. “Unfortunately it also stops the occasional human visitor leaving a comment but I think it’s worth it.”

    What no bold font for this? It’s pretty serious: in trying to lock out spammers, you are locking out humans too. At the very least, you should mention in your comments form that cookies are required, else merely having high security settings will block someone.

    Stop helping spammers to break the web and find a better way to do this.

    I’ll probably unsub from your site soon because it’s too fiddly to interact with it.


  4. Are you sure this “SV1” blocking wouldn’t block regular users actually visiting the search result they found?

    I just checked my logs, finding a *lot* of these SV1 accesses, and most really just get the page and all script files referenced in it (so they certainly are from this Linkscanner), but some actually get all images, too, just like for actual human users, and some of these have the exact(!) UA string that’d be blocked, without any “.NET…”. So, well, I’m worried I’d block real visitors (who just happen to have this Linkscanner installed) this way, too…


  5. MJ – occasional meaning 1 person, maybe 2 had problems. The first person enabled cookies again and unfortunately I couldn’t track down why the second person couldn’t post a comment.
    Since I installed the cookies protection over 1,600 comments have been left on my site so it works for most people and it’s stopped tens of thousands of spam comments. I’m really sorry to see you go, but it doesn’t affect feed readers.

    cimddwc – SV1 plus no referrer is a pretty good sign of that AVG link scanner. Since I added those rules my traffic according to Analytics or my own Javascript counter hasn’t gone down.


  6. All sound like great ideas. I’m not so keen on cookies, but I’m going to give that tip/plugin a try nevertheless.

    I don’t seem to get trackback/ping spam…maybe I’ve got those disabled. I ought to doublecheck.


  7. Donncha, seems you’re right about the empty referrer. Linkscanner does not just grab the scripts, it also gets one particular GIF that way, and I didn’t cross-check that with the other requests before.


  8. cimddwc – I was hesitant about applying those linkscanner rules myself at first but as they didn’t affect the number of real visitors I got I left them in!


  9. Thx for the tip with the linkscanners.

    I was wondering why my traffic from MSN had tremendously increased for search terms where I am not even listed on the first 50 pages :-)

    Now I’ll be off to implement your suggested solution.


  10. The lack of user agents was something I noticed long ago however my reverse proxy setup won’t allow .htaccess to deny the request if no user agent is specified due to my sites being in a DMZ with private IP addresses. :|

    I’m sure there is probably some way to configure my squid acls to do so which is something I must look into.

    As it is, my squid filter ACL and some .htaccess goodness mean that, from a high of several hundred spam attempts per day, I am now down to less than half a dozen and they are trapped by Akismet :)


  11. Not to be nitpicky or anything, but isn’t “ifconfig” supposed to be “ipconfig”, or am I just wrong?


  12. I, too have been plagued by spam (100,000 in one day), and tried to use this stuff. However, I put some of this stuff in my htaccess file (assuming that is where it is supposed to go–it never says in the post) and was blocked from my own site. Can you tell us where to actually put this stuff?


  13. Requirement of Accept-Encoding may block legitimate traffic via proxies that strip this header.
    I recommend allowing AVG-like traffic that has Via header.

    just add:
    RewriteCond %{HTTP_VIA} ^$


  14. Man, you just keep making WP fun to use.

    Any thoughts on “kL” suggestion?

    I went ahead gave everything except that last cookie thing a try.

    Nothing is breaking yet. ;)


  15. I’m not afraid to admit you’re one giant step ahead of me with your clever spam fighting techniques.

    Do you think your approach will someday make it in a plug-in or, even better, in the WordPress code itself?

    Many thanks for providing us with this valuable info.


  16. Excellent ideas, thanks! I use a plug-in that has managed to wipe out all my comment spam. I haven’t had to moderate comments in over 8 months. It is outstanding and I highly recommend it. It is WP-Spamfree.


  17. hi, about this:

    If your blog is installed in the root, “/”, of your domain you can add this one line to stop the 404 requests generated:

    RewriteRule ^(.*)/wp-comments-post.php - [F,L]

    what if I installed into a subdirectory, i.e. wordpress ?

    same issue here:

    Check your browser for the cookie it leaves and use the following code to block almost all of your comment spam:

    RewriteCond %{HTTP_COOKIE} !^.*put_cookie_value_here.*$
    RewriteRule ^wp-comments-post.php - [F,L]

    do I need to change the path, and if yes, how?

    And how do I know which one is the cookie set by your cookie plugin?


  18. Thanks for the tip on AVG. I too noticed a sluggishness on my machine after installing the latest version of AVG. Hopefully your advice will assist. Thanks.


  19. ovidiu – I think you add the directory after the “^” character. ie:

    RewriteRule ^wordpress/wp-comments-post.php …

    But if the .htaccess file is in the wordpress/ directory that may not be necessary. Anyone else know?

    Keishon – you’re doing something different. That’s what stops the spammers!

    KL & BlakkJakk – When I copied Padraig’s rules they didn’t have the Accept-Encoding line, but the rules work well and haven’t had a disastrous effect on my traffic stats.


  20. Hi for all this stupid spam bots I’m using
    bot-trap.de/home/.

    Ok the community is in German language. But I’m sure they speak english too.

    Use it – say WP here it is and spam bots are a things of the past.

    This file from bot-trap is the best I know. ;)

    regards
    Monika


  21. I can never understand the mindset of people who think that sending garbage is a proper way to make a profit. I use Akismet, as I believe most of us here do, but something like this helps to block attempts by people who believe the Internet is a space where they can behave as if there were no code of conduct.
    Thanks for this useful tool.


  22. Following is AVG’s official response to LinkScanner concerns:

    We’d like to thank our web community for bringing these challenges to our attention, as building community trust and protecting all of our users is critical to us. We have modified the Search-Shield component of LinkScanner to only notify users of malicious sites; this modified version will be rolled out on July 9th 2008. As of this date, Search-Shield will no longer scan each search result online for new exploits, which was causing the spikes that webmasters addressed with us. However, it is important to note that AVG still offers full protection against potential exploits through the Active Surf-Shield component of our product, which checks every page for malicious content as it is visited but before it is opened.


  23. This has become a little more important for me since the move to my own dedicated and every bit helps! Thanks for the tips Donncha!


  24. @dagoberhr Some people are desperate. Some people are lazy. Some people are greedy. Choose any one of those reasons or any combination to explain why they spam.


  25. Great information, My blog is new but still I get lots of unwanted bots… thank you for this.


  26. good job. Been looking for something like this. Looks very promising. Thanks for the info and will come back and let yah know how it goes.


  27. @donncha:

    I am having problems with your cookie for comments plugin. I activated it and now no comment works anymore. are there any settings I need to adjust?

    you can test it here: http://pacura.ru/ it’s activated, you can’t comment and I have no clue how to identify the cookie dropped by your plugin :-(


  28. ovidiu – thanks for reporting that. It’s because you have WordPress in a /wordpress/ directory. I’ll change the plugin so it’s more flexible. Check for an update in the next hour or so!


  29. like I already stated on another plugins page:

    I am still wondering why a lot of plugins break because of this small change I have. And installing into a folder, I found on the wordpress codex. There are even installation instructions for installing into a subfolder: http://codex.wordpress.org/Installing_WordPress#In_a_Subdirectory

    If I had known this would cause me so much trouble, I’d have avoided it long ago.

    I guess it’s just something like get_bloginfo('wpurl') instead of get_bloginfo('siteurl') right?


  30. ovidiu – just bad luck I guess. I would guess that the vast majority of WordPress installs are in the directory where they’re used.


  31. Donncha, I like your solution to the AVG issue. Seems AVG has gotten the message. I use Avast! myself when I run Windows.


  32. I’ve got most of the best anti-spam tricks in place. Including the old standby Spam Karma 2. One thing that always got me though.
    With SK2, even if the karma was like -50, it still was kept on the site waiting for me to manually purge it.
    Finally wrote my own SK2 plugin that lets you block spammy comments. Not just quarantine.
    http://pirate-king.com/episode/2441
    Since then, I am no longer greeted with “You have 87 spam comments in your queue” which makes me very happy.


  33. Good post, I’ll be using a few of these myself because of several hundred spam comments on one of my blogs(linked from my name) which is just 3 months old and has little or no SERPs.

    >> “RewriteRule ^.* http://www.avg.com/?LinkScannerSucks [R=307,L]”

    Classic, I really liked that.


  34. Thank you so much for this post, while offtopic it has shown me how blog pings (and such) work – leading me somewhere new and giving me something to research.

    This has been the new thing that I learned today.

    (of course the AVG element is now redundant as they should have now stopped doing this (as of yesterday AFAIK, but that’s still a worthwhile example)


  35. James – it should be safe enough to disable those rules now anyway as AVG have released an update that removes their link scanner!


  36. So, is this the ultimate list of ‘donncha’s rules for stopping spammers’ (skipping Cookies for Comments and AVG LinkScanner stuff):

    `
    # stop requests with no UA or referrer
    RewriteCond %{HTTP_REFERER} ^$
    RewriteCond %{HTTP_USER_AGENT} ^$
    RewriteCond %{REMOTE_ADDR} !^64\.22\.71\.36$
    RewriteRule ^(.*) - [F]

    # stop requests to wp-comments-post.php outside of /
    RewriteRule ^(.*)/wp-comments-post.php - [F,L]

    # bad user agents for Trackbacks
    RewriteCond %{HTTP_USER_AGENT} ^.*(Opera|Mozilla|MSIE).*$ [OR]
    RewriteCond %{HTTP_USER_AGENT} ^$
    RewriteCond %{REQUEST_METHOD} ^POST$
    RewriteRule ^(.*)/trackback/ - [F,L]
    `


  37. That’s interesting. Unfortunately I don’t think I have access to that info on my host, and I’d not know exactly what I’d be doing anyway :P


  38. I’ve really learned a lot from this article (it’s an oldie). Thanks for doing all this research and highlighting that a preferred way to combat spam is at the HTTP protocol level, not the PHP application level.

    I re-found this article from you moving your url.. BTW.


    1. Thanks, I was trying to debug an obscure problem with wp-super-cache, static homepages and a permalink structure with the category and postname. After 20 hours of testing my server worked perfectly. Grrr.

      I’ve reverted everything back, unfortunately that means these posts will be seen as new again by WordPress Planet. :(
