
robots.txt/flash on Irish websites

  • 17-07-2003 2:41am
    #1
    Registered Users Posts: 7,412 ✭✭✭


    I've been running some new search engines here over the Irish owned webspace and some of the results have been less than helpful for the websites as they serve to decrease the ranking of websites in search engine results.

    The robots.txt file seems to be an optional file with Irish webdesigners. Most of the Irish sites that the search engines here index have no robots.txt file. For a small brochureware site of less than ten pages, this is not really problematic but when it gets to a large database driven website (a few Irish ones exist) then all the engines will begin to index the whole site.

    Another rather disturbing trend I've seen over this search engine run is the muppet webdeveloper who decides to do index pages, or even worse a doorway page, completely in flash. Search engines do not index it and these days, sites live or die by search engine traffic.

    Regards...jmcc


Comments

  • Registered Users Posts: 258 ✭✭peterd


    Originally posted by jmcc
    The robots.txt file seems to be an optional file with Irish webdesigners.
    Should the index depth not be limited by the search engine? Surely big sites would want all their content spidered if it was left to them.


  • Banned (with Prison Access) Posts: 16,659 ✭✭✭✭dahamsta


    Should the index depth not be limited by the search engine?

    Have been for years, since the early days when dynamic sites brought down both the target and indexing machines. TBH I was a little confused by John's post too. What exactly is the problem John? Foot.ie is pretty big and I don't want to restrict spiders, in fact I use mod_rewrite to specifically open it up to them. Am I a bad person?

    adam


  • Closed Accounts Posts: 5,025 ✭✭✭yellum


    If a search engine relied on robots.txt to find content on a site then the search engine is crap.

    robots.txt is there generally to keep engines out of all or some areas. It contains other info too but I don't think robots.txt is something that will knock your ranking down.
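
    As a rough illustration of the "keep engines out of some areas" point, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the paths and the `ExampleBot` user agent are all made up for the example; they are not taken from any site mentioned in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a large database-driven site: keep crawlers
# out of internal areas while leaving the public pages open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /manual/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is an invented crawler name; it falls under the "*" rules.
for path in ("/", "/products/widget.html", "/admin/login", "/manual/index.html"):
    verdict = "allowed" if parser.can_fetch("ExampleBot", path) else "blocked"
    print(f"{path}: {verdict}")
```

    A site with no robots.txt at all is treated by a well-behaved crawler as "everything allowed", which is the default behaviour being discussed here.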


  • Registered Users Posts: 1,452 ✭✭✭tomED


    Originally posted by jmcc
    The robots.txt file seems to be an optional file with Irish webdesigners. Most of the Irish sites that the search engines here index have no robots.txt file. For a small brochureware site of less than ten pages, this is not really problematic but when it gets to a large database driven website (a few Irish ones exist) then all the engines will begin to index the whole site.

    I don't see why it's a problem if the spider does index the whole site... if it's a large site there's more than likely lots of really good content that people would be interested in.


    And the robots.txt file is, as you say, "optional". It is only useful for blocking pages that you don't want indexed... but realistically, as an internet marketer I would certainly want to make sure each and every page is indexed... and even that doesn't always happen with some of the larger search engines.
    Originally posted by jmcc
    Another rather disturbing trend I've seen over this search engine run is the muppet webdeveloper who decides to do index pages, or even worse a doorway page, completely in flash. Search engines do not index it and these days, sites live or die by search engine traffic.

    Regards...jmcc

    Although I have to agree with you that sites live or die by search engines nowadays, Google (which handles 90% of the web's search queries) can follow links within a flash movie. There are plenty of techniques to get you to the top of search engines using flash, albeit difficult and sometimes classed as spamming...


  • Registered Users Posts: 258 ✭✭peterd


    Originally posted by tomED
    ...Google [...] can follow links within a flash movie.
    Can you prove this? (examples)


  • Registered Users Posts: 7,412 ✭✭✭jmcc


    Originally posted by peterd
    Should the index depth not be limited by the search engine? Surely big sites would want all their content spidered if it was left to them.

    The sites would like all pages to be indexed on each run but the search engines have different priorities. The logistics of a search engine with a few thousand pages is markedly different to that of one with over 100K pages. You've got to determine what is worth indexing and what is not. The robots.txt file becomes very important as it reduces wasted time and bandwidth.

    Regards...jmcc


  • Registered Users Posts: 7,412 ✭✭✭jmcc


    Originally posted by dahamsta
    Should the index depth not be limited by the search engine?

    Have been for years, since the early days when dynamic sites brought down both the target and indexing machines.

    Most SEs now use a more distributed and aperiodic approach to indexing though I've seen Google indexing pages at the rate of one or two a second in bursts over a period of about 5 hours.

    TBH I was a little confused by John's post too. What exactly is the problem John? Foot.ie is pretty big and I don't want to restrict spiders, in fact I use mod_rewrite to specifically open it up to them. Am I a bad person?

    The problem with sites not having robots.txt is that SEs will index everything by default. Sometimes the spiders can pick up a lot of internal stuff that was supposedly not meant to be indexed. (I've found copies of the Apache user manual in the data.)

    Some webdevs will include a robots statement in the meta data but some spiders and SEs do not respect this on a page by page basis and the Pragma no cache statement also tends to be a bit flakey. I think that one of the reasons that spiders have become more picky about the meta robots statement is that smart webdevs will use it to get their sites indexed on a more frequent basis.
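
    For reference, the page-level "robots statement" is a `<meta name="robots">` tag in the page head. A minimal sketch of how a spider might read it, using only Python's standard `html.parser`, is below. The sample page and the helper names are hypothetical, and real spiders differ in which directives they honour, as noted above.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list, e.g. "noindex, follow".
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of meta robots directives declared in the page."""
    parser = MetaRobotsParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical page that asks not to be indexed but lets links be followed.
PAGE = (
    '<html><head><meta name="robots" content="noindex, follow">'
    "</head><body>Internal notes</body></html>"
)

directives = robots_directives(PAGE)
print("may index:", "noindex" not in directives)
print("may follow links:", "nofollow" not in directives)
```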

    Having a robots.txt file is generally helpful for search engines.

    Regards...jmcc


  • Registered Users Posts: 7,412 ✭✭✭jmcc


    Originally posted by dahamsta
    TBH I was a little confused by John's post too. What exactly is the problem John?

    I just reread my post and see the problem. Ironically, I had deleted the critical line while editing it.

    What I should have included was the fact that a hell of a lot of Irish websites have no page titles, no page descriptions and no keywords. As a result their standings in the search engine results pages are affected. There is a lot of work out there for good SEOs but there is a problem convincing most Irish site owners to use these services. Then again, most Irish websites are simply brochureware so the cost may not be warranted.
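
    A quick way to check a page for the three items mentioned above (title, description, keywords) is sketched below with Python's standard `html.parser`. The sample pages and the `missing_fields` helper are invented for illustration; this only reports what is absent or empty and says nothing about how any particular search engine weights these fields.

```python
from html.parser import HTMLParser


class HeadAudit(HTMLParser):
    """Record the <title> text and the description/keywords meta tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = (attr.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def missing_fields(html: str) -> list:
    """Return which of title, description and keywords are absent or empty."""
    audit = HeadAudit()
    audit.feed(html)
    audit.close()
    found = {
        "title": audit.title.strip(),
        "description": audit.meta.get("description", ""),
        "keywords": audit.meta.get("keywords", ""),
    }
    return [name for name, value in found.items() if not value]


# Hypothetical pages: one bare brochureware page, one with the basics filled in.
BARE = "<html><head></head><body>Welcome</body></html>"
GOOD = (
    "<html><head><title>Acme Widgets, Dublin</title>"
    '<meta name="description" content="Hand-made widgets.">'
    '<meta name="keywords" content="widgets, dublin">'
    "</head><body>Welcome</body></html>"
)

print("bare page is missing:", missing_fields(BARE))
print("good page is missing:", missing_fields(GOOD))
```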

    Regards...jmcc


  • Registered Users Posts: 7,412 ✭✭✭jmcc


    Originally posted by tomED

    but realistically, as an internet marketer I would certainly want to make sure each and every page is indexed... and even that doesn't always happen with some of the larger search engines.

    Yep, it is the SE operator/SEO dilemma. The SE operator wants to maintain a good index of usable material and the SEO needs to get the biggest and most effective SE footprint for the client. As an operator, the main concern would be in providing content that is useful. This means that the index would have to be pruned and this is a fairly time-intensive task, even with SQL that is written for the purpose.

    The Google PR links idea was a good one but it is largely obsolete. In an isolated, academic world, the idea is brilliant; however, it is flawed in that it is easily exploited. At its core was the premise that people would inevitably link to good websites. It was in effect relying on people to decide what should and should not count as a good link.
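
    To make the "easily exploited" point concrete, here is a toy sketch of the textbook link-ranking iteration, where a page's score is built from the scores of the pages linking to it. This is the simplified published form of the idea, not whatever Google actually runs; the graph, the page names and the 0.85 damping value are assumptions for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration ranking over a dict of page -> outbound links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly across its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: two independent sites link to "good", while three
# throwaway pages controlled by one owner all link to "spam".
LINKS = {
    "a": ["good"],
    "b": ["good"],
    "farm1": ["spam"],
    "farm2": ["spam"],
    "farm3": ["spam"],
    "good": [],
    "spam": [],
}

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.4f}")
```

    In this toy graph "spam" outranks "good" simply because its owner made one more linking page, which is the weakness described above: the scheme trusts that links are honest votes.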

    The inbound and outbound links may contribute to the depth to which a site is indexed. Indexing a site that is merely a linkswamp is not a good thing for a search engine and I've seen a lot of this type of site cropping up recently though I've not seen many Irish variants.

    I think that this has made directory style websites all the more important for SEs as they provide some elementary filtering for search engines. Google uses ODP for its directory and most SEs tend to use the ODP dataset in some way.
    Google (which handles 90% of the webs search queries) can follow links within a flash movie.

    I haven't looked at the flash spec but I am not sure that Google have deployed this kind of parser yet.

    Regards...jmcc


  • Registered Users Posts: 7,739 ✭✭✭mneylon


    Originally posted by jmcc
    I just reread my post and see the problem. Ironically, I had deleted the critical line while editing it.

    What I should have included was the fact that a hell of a lot of Irish websites have no page titles, no page descriptions and no keywords. As a result their standings in the search engine results pages are affected. There is a lot of work out there for good SEOs but there is a problem convincing most Irish site owners to use these services. Then again, most Irish websites are simply brochureware so the cost may not be warranted.

    Regards...jmcc

    This I have to agree with John on.
    One of the functions we offer on irishsearch.net is the 'spider link' option. It's a silly little something I added one wet weekend, but the results are quite scary.
    Although some people will pontificate about how Google and others do not rely on Meta content for ranking it definitely makes a difference if the spidered link has a meaningful title and description. A LOT of Irish sites fail miserably on both counts and this is without even going into the entire ALT etc., area.


  • Registered Users Posts: 1,452 ✭✭✭tomED


    Originally posted by peterd
    Can you prove this? (examples)

    There are a few examples - do a search for google indexing flash and you will get a few posts on the likes of webmasterworld with information and the infamous GoogleGuy has a few posts about it.

    SEOchat.com also has a few articles on it.

    It follows links, which means it will only index the content that is in your title and meta description (or whatever other text you may have on the page).

    They have not yet implemented the indexing of content within a flash movie, although they plan to release this in the near future. A lot of SEOs expected it in the last algo update, but it didn't seem to happen.

    On the note about Irish websites and developing their sites properly for search engine rankings...
    YES, it's a major issue. But I don't blame the people that own the websites. I think it's more the crappy service that a lot of Irish web development companies give.

    They think it's enough just to put up a website for their client. They don't care if it succeeds or fails. This is why so many people have lost faith in the power of the internet... hopefully I can help change that! :D


  • Registered Users Posts: 7,739 ✭✭✭mneylon


    Originally posted by tomED

    hopefully I can help change that! :D

    ROFL


  • Registered Users Posts: 1,452 ✭✭✭tomED


    Originally posted by blacknight
    ROFL

    I know - ultimate pimping... I should really ask a mod to delete that line eh...

