
Cuil.com Twiceler rogue bot!

  • 18-02-2010 4:18pm
    #1
    Registered Users Posts: 3,140 ✭✭✭


    Has anyone heard about cuil.com?

    A site I have been working on started logging dozens of 404s every hour a few days back. They were all from the same bot, called Twiceler, which turns out to be the crawler used to index www.cuil.com.

    In just a few days the bot sent 24,000 requests. It's indexing dynamic pages with every single combination of parameters, and doing so at an incredible speed. It also doesn't remember which sub-domain it is on, which is what triggered all the 404s (it was requesting http://www.mysite.com/forumdisplay.php when it should have been http://forum.mysite.com/forumdisplay.php).

    Bandwidth on the site has increased threefold.

    I've now blocked this bot and the IP address block it came from in my .htaccess file:

    [PHP]
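    # Deny the crawler's range; a partial IP ends with a dot (67.218.* is not matched as an IP)
    order allow,deny
    allow from all
    deny from 67.218.

    RewriteEngine on
    # Match "Twiceler" anywhere in the User-Agent, case-insensitively
    RewriteCond %{HTTP_USER_AGENT} Twiceler [NC]
    RewriteRule ^.* - [F,L]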
    [/PHP]

    Has anyone come across this before? Does anyone know whether cuil.com is actually used, and is it a mistake to block robots like this?


Comments

  • Registered Users Posts: 515 ✭✭✭NeverSayDie


    Yeah, it was a new search engine launched by some ex-Google folks back in mid-2008, amid a great deal of hype. It fizzled out spectacularly as I recall, mainly by dint of not being very good at all, let alone as good as Google. Don't know what's happened to it since, but I very much doubt it has the slightest share of the search market. I'd say you're safe enough blocking them if they're up to that kind of nonsense, though you could try contacting them.

    http://www.belfasttelegraph.co.uk/lifestyle/technology-gadgets/google-old-boys-launch-their-own-engine-in-global-search-for-wealth-13923276.html

    Edit: see here, seems like you're not the only one they may have caused problems for:
    http://techcrunch.com/2008/09/01/is-cuil-killing-websites/


  • Registered Users Posts: 3,140 ✭✭✭ocallagh


    Cheers, that article was really good and linked to this page, http://www.cuil.com/info/webmaster_info/ , which has their entire list of IP addresses.

    Upon further research it seems the bot even guesses URLs. For example, one person reported the bot as indexing:

    topic/b
    topic/bl
    topic/blo
    topic/blog

    I haven't seen that yet... If anyone else has a problem with this (some have called it a DoS attack), here is the updated .htaccess. I was going to just filter on user agent, but apparently Twiceler has been known to mask that.

    
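    [PHP]
    # Deny the three address ranges listed on the Cuil webmaster page
    order allow,deny
    allow from all
    deny from 38.99.13.
    deny from 67.218.116.
    deny from 216.129.119.

    RewriteEngine on
    # Match "Twiceler" anywhere in the User-Agent, case-insensitively
    RewriteCond %{HTTP_USER_AGENT} Twiceler [NC]
    RewriteRule ^.* - [F,L]
    [/PHP]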
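
    A possible variant of the above, as a rough sketch only: since the user agent can apparently be masked, both checks can sit in one access-control block, with mod_setenvif flagging the user agent instead of mod_rewrite. This assumes Apache 2.2-style access directives with mod_setenvif enabled, and the variable name block_bot is made up for illustration.

    ```apache
    # Flag any request whose User-Agent contains "Twiceler" (case-insensitive).
    # block_bot is a hypothetical variable name chosen for this example.
    SetEnvIfNoCase User-Agent "Twiceler" block_bot

    # Allow everyone, then deny flagged requests and the listed ranges.
    # With "Order Allow,Deny" a matching Deny wins over "Allow from all".
    Order Allow,Deny
    Allow from all
    Deny from env=block_bot
    Deny from 38.99.13.
    Deny from 67.218.116.
    Deny from 216.129.119.
    ```

    Either check on its own is enough to get a 403, so a request from one of those ranges is still refused even if it hides the Twiceler name.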
    

    I'll keep an eye on Alexa and see how cuil does over the next few months. I might give it access again if it gains any market share.

    It currently has 0.011% of global users visiting it each day. I would guess at least half of those users are network admins wondering what is taking their servers down!

