
Bandwidth usage by Googlebot?

  • 08-11-2005 1:40am
    #1
    Closed Accounts Posts: 519 ✭✭✭


    Just wondering if anyone else has noticed this lately? Googlebot has been eating bandwidth like crazy, and even though I banned its ass in robots.txt, it seems to be just ignoring it...

    Really starting to P*** me off - just today it used 18 MB on one site, and last week I had it register something like 80 MB in one day (according to my stats, that is).

    Thinking I should send them a bill for every MB over the average, which other bots are currently setting at about 2-3 MB maximum...

    Just wondering if anyone else is having problems in this area - especially with it ignoring the robots.txt, which I know is valid and in the right place...


Comments

  • Closed Accounts Posts: 12,382 ✭✭✭✭AARRRGH


    Post your robots file here so we can take a look.


  • Closed Accounts Posts: 519 ✭✭✭smeggle


    User-Agent: *
    Disallow: /
    
    User-Agent: googlebot
    Disallow: /firemonger/
    
    User-Agent: googlebot
    Disallow: /firefoxcd/
    

    According to the validator http://www.searchengineworld.com/cgi-bin/robotcheck.cgi

    it's correct but could be written better? Something to do with lower/upper case?
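One way to see how a crawler would read the file above is to feed it to a parser. The sketch below uses Python's standard `urllib.robotparser`; the test paths and the `SomeOtherBot` name are made up for illustration. Under the usual robots.txt matching rules, a crawler obeys the group that names it and only falls back to `*` when no group does, so this file appears to shut out every other bot while leaving Googlebot free to fetch anything outside the listed directories.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt as posted above.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /

User-Agent: googlebot
Disallow: /firemonger/

User-Agent: googlebot
Disallow: /firefoxcd/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical agents and paths, chosen only to exercise the rules.
for agent, path in [
    ("Googlebot", "/index.html"),
    ("Googlebot", "/firemonger/index.html"),
    ("SomeOtherBot", "/index.html"),
]:
    print(agent, path, parser.can_fetch(agent, path))
```

Parsers differ in how they treat a second group for the same agent (some merge the groups, some only read the first), so putting both `Disallow` lines under a single `User-Agent: googlebot` group is the safer layout. If the aim really is to keep Googlebot out entirely, its group would need `Disallow: /`.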


  • Registered Users Posts: 3,886 ✭✭✭cgarvey


    A robots.txt file I put up 3 weeks ago is being adhered to now (it took 2.5 weeks). Is the origin IP definitely a Google one? If it is, and it's ignoring your robots.txt, then just ban the IPs!

    .cg


  • Closed Accounts Posts: 12,382 ✭✭✭✭AARRRGH


    Alright Cathal. We were in DIT CS together :D

    I know about Avatarworks...

    Guess who? :)


  • Closed Accounts Posts: 519 ✭✭✭smeggle


    cgarvey wrote:
    A robots.txt file I put up 3 weeks ago is being adhered to now (it took 2.5 weeks). Is the origin IP definitely a Google one? If it is, and it's ignoring your robots.txt, then just ban the IPs!

    .cg

    Yeah, I would, but have you seen how many IPs they use? It's near enough a full block - you can't really block at that level...

    Anyway, it's one to keep an eye on, as the next closest high user is only 1-2 MB maximum, and then only weekly...


  • Registered Users Posts: 7,739 ✭✭✭mneylon


    smeggle wrote:
    Really starting to P*** me off - just today it used 18 MB on one site, and last week I had it register something like 80 MB in one day (according to my stats, that is).

    That's hardly a lot of bandwidth.


  • Closed Accounts Posts: 519 ✭✭✭smeggle


    blacknight wrote:
    That's hardly a lot of bandwidth.

    True, it's not, but only when that's looked at on its own. The day before that, or a day or so prior, it had used over 100 MB. Over a given month, at an average of say 100 MB per day, that would work out at close on 3 GB of bandwidth.
    If you're then on a fixed amount of bandwidth across multiple domains, i.e. as a small-business webmaster on an e-commerce hosting account, you come quite close to running out of your allotted 15 GB of bandwidth per month.

    In your side of the business it wouldn't really be seen as cause for concern, but as I say, when managing a small-business web hosting service it can cause undesirable effects.
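The arithmetic behind that estimate, as a rough sketch. The 100 MB per day and 15 GB per month figures come from the post; the 30-day month and the 1 GB = 1000 MB conversion are assumptions made here for simplicity.

```python
# Figures from the post above; 1 GB is taken as 1000 MB (an assumption).
MB_PER_DAY = 100
DAYS_PER_MONTH = 30
QUOTA_GB = 15


def monthly_gb(mb_per_day: float, days: int = DAYS_PER_MONTH) -> float:
    """Monthly transfer in GB for a steady daily rate in MB."""
    return mb_per_day * days / 1000


def quota_share(mb_per_day: float, quota_gb: float = QUOTA_GB) -> float:
    """Fraction of the monthly quota used up by that daily rate."""
    return monthly_gb(mb_per_day) / quota_gb


print(f"{monthly_gb(MB_PER_DAY):.1f} GB per month")
print(f"{quota_share(MB_PER_DAY):.0%} of a {QUOTA_GB} GB quota")
```

On those numbers one crawler on one domain takes about a fifth of the allowance, so the quota only comes under pressure if several domains on the account are crawled at a similar rate.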


  • Registered Users Posts: 7,412 ✭✭✭jmcc


    smeggle wrote:
    Just wondering if anyone else has noticed this lately? Googlebot has been eating bandwidth like crazy, and even though I banned its ass in robots.txt, it seems to be just ignoring it...
    Have you verified that it is really Google that is downloading pages and not some maggots with webscrapers and webminer programs?

    Google typically deep spiders around the start and end of each month. Also if any clients are running Google Adsense ads on their sites, then this spidering would be separate to the main Google spidering - the bot for this would also have a Media Partners identification.

    Regards...jmcc


  • Closed Accounts Posts: 519 ✭✭✭smeggle


    jmcc wrote:
    Have you verified that it is really Google that is downloading pages and not some maggots with webscrapers and webminer programs?

    Google typically deep spiders around the start and end of each month. Also if any clients are running Google Adsense ads on their sites, then this spidering would be separate to the main Google spidering - the bot for this would also have a Media Partners identification.

    Regards...jmcc

    Well, I can rule out AdSense, as the site that brought it to my attention doesn't carry the ads - that's the 'Open Source' project I look after - but I'll look into the other stuff. Any tips on what to look for more specifically?


  • Registered Users Posts: 7,412 ✭✭✭jmcc


    smeggle wrote:
    Well, I can rule out AdSense, as the site that brought it to my attention doesn't carry the ads - that's the 'Open Source' project I look after - but I'll look into the other stuff. Any tips on what to look for more specifically?
    Check that the IPs the Googlebot is coming from are actually Google IPs and not random ISP IPs. Sometimes host resolution is not on by default on web servers, and the stats may be generated from user agents only.

    Regards...jmcc
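A minimal sketch of that check, assuming the server writes the Apache "combined" log format: total the bytes served to anything calling itself Googlebot, grouped by client IP, so the addresses can then be looked up. The function name and the sample log lines are invented for illustration.

```python
import re
from collections import defaultdict

# Apache "combined" log format; adjust the pattern if your server logs differently.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?P<size>\S+) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def bytes_by_ip(lines, agent_substring="googlebot"):
    """Sum response bytes per client IP for requests whose user agent
    contains agent_substring (case-insensitive)."""
    totals = defaultdict(int)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or agent_substring not in match["agent"].lower():
            continue
        size = match["size"]
        # A size of "-" means no response body was sent.
        totals[match["ip"]] += int(size) if size.isdigit() else 0
    return dict(totals)


# Invented sample lines, for illustration only.
SAMPLE = [
    '66.249.65.42 - - [08/Nov/2005:01:40:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.65.42 - - [08/Nov/2005:01:41:00 +0000] "GET /a.zip HTTP/1.1" 200 1048576 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.7 - - [08/Nov/2005:01:42:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '192.0.2.8 - - [08/Nov/2005:01:43:00 +0000] "GET / HTTP/1.1" 304 - "-" "Mozilla/5.0"',
]

for ip, total in sorted(bytes_by_ip(SAMPLE).items()):
    print(ip, total)
```

Any IP in the result that does not belong to Google is a scraper borrowing the Googlebot user agent, and robots.txt will not stop it.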


  • Closed Accounts Posts: 1,033 ✭✭✭beller b


    I have this IP following me around my site. Any ideas if it's from Google?
    66.249.65.42


  • Registered Users Posts: 7,739 ✭✭✭mneylon


    beller b wrote:
    I have this IP following me around my site. Any ideas if it's from Google?
    66.249.65.42

    It is:

    whois 66.249.65.42

    OrgName: Google Inc.
    OrgID: GOGL
    Address: 1600 Amphitheatre Parkway
    City: Mountain View
    StateProv: CA
    PostalCode: 94043
    Country: US

    NetRange: 66.249.64.0 - 66.249.95.255
    CIDR: 66.249.64.0/19
    NetName: GOOGLE
    NetHandle: NET-66-249-64-0-1
    Parent: NET-66-0-0-0-0
    NetType: Direct Allocation
    NameServer: NS1.GOOGLE.COM
    NameServer: NS2.GOOGLE.COM
    Comment:
    RegDate: 2004-03-05
    Updated: 2004-11-10

    OrgTechHandle: ZG39-ARIN
    OrgTechName: Google Inc.
    OrgTechPhone: +1-650-318-0200
    OrgTechEmail: arin-contact@google.com
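The same check can be scripted once the range is known. This sketch uses Python's standard `ipaddress` module with the CIDR block from the whois output above; the helper name is made up, and the block is only a snapshot, since allocations change and Google uses other ranges as well.

```python
import ipaddress

# CIDR block taken from the whois output above.
GOOGLE_BLOCK = ipaddress.ip_network("66.249.64.0/19")


def in_block(ip: str, block=GOOGLE_BLOCK) -> bool:
    """Return True if the address falls inside the given network block."""
    return ipaddress.ip_address(ip) in block


print(in_block("66.249.65.42"))  # the address asked about in the thread
print(in_block("192.0.2.7"))     # a documentation-range address, for contrast
```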


  • Closed Accounts Posts: 1,033 ✭✭✭beller b


    For the amount of time it spends, Google doesn't show many of my pages:
    http://www.boards.ie/vbulletin/showthread.php?t=319607

    Don't mind about bandwidth, I'm on 500 GB per month. Just wish it would do its job.


  • Closed Accounts Posts: 12,382 ✭✭✭✭AARRRGH


    Have you created a Google sitemap? Makes a big difference...

    http://www.google.com/webmasters/sitemaps/login?sourceid=gsm&subid=us-et-about2


  • Registered Users Posts: 7,739 ✭✭✭mneylon


    smeggle wrote:
    True, it's not, but only when that's looked at on its own. The day before that, or a day or so prior, it had used over 100 MB.

    The only time I've seen that kind of issue with Google was when it was downloading archives (i.e. software packages) instead of just indexing their existence.
    I checked the Googlebot's activity on our main site and it's used about 300 MB per month, whereas on my personal blog it's using around 200 MB per month. MSN's bot is usually a much worse offender than Google.

