
msn bots and how to stop them?

  • 07-03-2011 4:46pm
    #1
    Hosted Moderators Posts: 4,948 ✭✭✭


    Sorry folks if this is in the wrong forum.

    Can anyone tell me how to stop msn bots from crawling a website? They are eating up bandwidth and seem to ignore the Disallow rules in my robots.txt file.

    BTW I'm a complete novice at this whole web stuff so don't blind me with jargon :)


Comments

  • Registered Users Posts: 3,772 ✭✭✭Scotty #


    You could block them with .htaccess but you'll need their IP addresses.

    Bots generally only look at HTML, so you are talking about a minuscule amount of bandwidth.


  • Hosted Moderators Posts: 4,948 ✭✭✭pullandbang


    Scotty # wrote: »
    You could block them with .htaccess but you'll need their IP addresses.

    Bots generally only look at HTML, so you are talking about a minuscule amount of bandwidth.

    Looking at the site stats (Webalizer), the msn bot has accounted for over 80% of my bandwidth since February.


  • Registered Users Posts: 1,801 ✭✭✭cormee


    80% of your allowed bandwidth or 80% of your used bandwidth? As Scotty said, bots only use minimal amounts of bandwidth, so it's probably your used bandwidth (especially if you don't get much traffic to the site).

    Cutting off a bot's access to your site is a pretty drastic step to take. Be sure of what you're doing before you proceed.


  • Hosted Moderators Posts: 4,948 ✭✭✭pullandbang


    Hits.pdf


    See the attached.....

    Out of 3008 total sites visiting, msn accounts for over 84% of kbytes downloaded, yet it only accounts for 4% of total visits. With over 424,000 hits in one month from this bot, it's way ahead of anything else by a long shot.
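One way to double-check a figure like this outside Webalizer is to add up the bytes served per user agent straight from the raw access log. The following is only a rough sketch: it assumes the server writes the Apache "combined" log format, and the three sample lines (addresses, paths, sizes) are made up for illustration, not taken from the attached report.

```python
import re
from collections import defaultdict

# Assumption: Apache "combined" log format. Other formats need another pattern.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"$'
)


def bytes_by_agent(lines):
    """Sum response sizes, split into 'msnbot' and 'other' user agents."""
    totals = defaultdict(int)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed format
        size = 0 if match.group("size") == "-" else int(match.group("size"))
        agent = "msnbot" if "msnbot" in match.group("agent").lower() else "other"
        totals[agent] += size
    return dict(totals)


# Made-up sample lines, for illustration only.
sample = [
    '192.0.2.1 - - [07/Mar/2011:16:46:00 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.1 - - [07/Mar/2011:16:46:01 +0000] "GET /img/a.jpg HTTP/1.1" '
    '200 204800 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '198.51.100.7 - - [07/Mar/2011:16:47:00 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0"',
]

totals = bytes_by_agent(sample)
for agent, size in totals.items():
    print(f"{agent}: {size} bytes")
```

Run against the real log file instead of `sample`, this would show whether the bot's share comes from a few large files or from a very large number of small ones.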


  • Registered Users Posts: 1,801 ✭✭✭cormee


    Hits don't equal page visits; they're a useless metric. If there are a million images on a page and the bot visits it once, that's a million and one hits.

    The MSN bot visited 156 times and downloaded 395523 files and one of the Googlebots visited 42 times and downloaded 6511 files?
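To put those two lines of the report side by side, here is a small sketch that works out files downloaded per visit for each crawler. The visit and file counts are the ones quoted above; the variable and function names are just for illustration.

```python
# Visit and file counts as quoted from the stats report.
crawlers = {
    "MSN bot": (156, 395523),
    "Googlebot": (42, 6511),
}


def files_per_visit(visits, files):
    """Average number of files fetched in a single visit."""
    return files / visits


rates = {name: files_per_visit(v, f) for name, (v, f) in crawlers.items()}

for name, rate in rates.items():
    print(f"{name}: about {rate:.0f} files per visit")

# How many times more files per visit the MSN bot fetches than Googlebot.
ratio = rates["MSN bot"] / rates["Googlebot"]
print(f"Ratio: about {ratio:.0f}x")
```

That works out to roughly 2,535 files per visit for the MSN bot against roughly 155 for the Googlebot, about 16 times as many.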

    By the look of it the issue isn't with bots. You should be looking at your robots.txt file - the disparity between the amount each bot is downloading per visit would indicate the MSN bot is being given free rein on your server (whereas the Googlebot is behaving a little better). Do you have a storage directory on the server or anything like that? In your robots.txt, exclude bots from everything except your html/content files and see how that goes.
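As a rough illustration of what such an exclusion might look like, the sketch below feeds a sample robots.txt to Python's standard `urllib.robotparser` and checks which URLs a well-behaved crawler would be allowed to fetch. The `/storage/` and `/images/` directories and the `example.com` URLs are assumptions made up for the example; the real paths depend entirely on how the site is laid out, and a crawler that ignores robots.txt will not be stopped by it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep msnbot out of two made-up directories,
# leave every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /storage/
Disallow: /images/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

checks = [
    ("msnbot", "http://example.com/storage/big.zip"),
    ("msnbot", "http://example.com/index.html"),
    ("Googlebot", "http://example.com/storage/big.zip"),
]

# Evaluate each (crawler, URL) pair against the rules above.
results = {(agent, url): parser.can_fetch(agent, url) for agent, url in checks}

for (agent, url), allowed in results.items():
    print(f"{agent} -> {url}: {'allowed' if allowed else 'blocked'}")
```

Testing the rules this way before uploading the file is a cheap way to confirm they say what was intended.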


  • Hosted Moderators Posts: 4,948 ✭✭✭pullandbang


    cormee wrote: »
    Do you have a storage directory on the server or anything like that?

    No idea I'm afraid. I just inherited this site from the guy who was looking after it and I'm swimming in the deep end with no rubber ring :)

    cormee wrote: »
    In your robots.txt, exclude bots from everything except your html/content files and see how that goes.

    How would that be worded?


  • Registered Users Posts: 1,801 ✭✭✭cormee


    pullandbang wrote: »
    No idea I'm afraid. I just inherited this site from the guy who was looking after it and I'm swimming in the deep end with no rubber ring :)

    How would that be worded?

    The wording would depend on the site/server setup, so I wouldn't hazard a guess. Without technical knowledge you're better off leaving things as they are.

