
Spider Friendly CMS

  • 17-06-2004 10:18am
    #1
    Closed Accounts Posts: 237 ✭✭


    Does anybody know of any spider-friendly CMS? In the past I have recommended systems like typo3 and PHPnuke; however, it seems that most spiders will not crawl into database content.
    On the other hand, spiders will crawl forums like invisionboard, vbulletin etc., which basically use the same technology as most CMSs.

    I googled the term “Spider Friendly CMS” and there are a few that claim to be, but does anybody have any experience with this?


Comments

  • Registered Users Posts: 7,739 ✭✭✭mneylon


    Typo3 is spider friendly if you use mod_rewrite properly.


  • Registered Users Posts: 7,412 ✭✭✭jmcc


    Most spiders are sophisticated enough to crawl almost any CMS. However, the depth of the crawl depends on the importance of the site. I've seen a few spiders that had problems with ? in a URL, but that was down to bad coding. The CMSes that produce what appear to be simple static URLs are the best. I think that Midgard was one of these.

    The reason that spiders don't deep crawl on sites that look like forum sites is because the content on a forum is continually changing and to keep the data current would require continual spidering. From the SE point of view, it would be a waste of bandwidth if the site is not important enough and from the site owner's point of view it would be an unnecessary strain on resources.

    Regards...jmcc


  • Registered Users Posts: 7,739 ✭✭✭mneylon


    Originally posted by jmcc

    The reason that spiders don't deep crawl on sites that look like forum sites is because the content on a forum is continually changing and to keep the data current would require continual spidering. From the SE point of view, it would be a waste of bandwidth if the site is not important enough and from the site owner's point of view it would be an unnecessary strain on resources.

    Maybe it depends on the site :)

    One of our sites is constantly being crawled by Google. As soon as there is a new thread you can see the Googlebot grabbing it :)


  • Closed Accounts Posts: 237 ✭✭FreeHost


    Thanks for the replies

    Michelle

    The mod_rewrite in a .htaccess file should do the trick, but one client has an invisionboard forum on his site without the mod_rewrite rule, and it gets crawled regularly by Google. I was wondering how that worked. Does typo3 specifically need the rewrite rule?

    John

    I’ll have a look at Midgard. The queries I’m getting are for a CMS for articles posted by guests, “but he wants the whole site crawled”.


  • Registered Users Posts: 7,739 ✭✭✭mneylon


    Freehost

    I'm not a typo3 guru, but afaik you need the mod_rewrite for it to produce .htm* extensions on the files.
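
To illustrate what a rewrite rule does in general, here is a minimal Python sketch of the mapping from a static-looking path to the query-string URL behind it. The `/<id>.html` scheme and the `rewrite` helper are assumptions made up for this example, not typo3's actual rule set; only the `index.php?id=` form is typo3's usual dynamic URL.

```python
import re

# Hypothetical scheme: "/<page_id>.html" stands in for "/index.php?id=<page_id>".
STATIC_URL = re.compile(r"^/(\d+)\.html$")


def rewrite(path: str) -> str:
    """Translate a static-looking path into the dynamic URL the CMS serves."""
    match = STATIC_URL.match(path)
    if match is None:
        # Anything that does not fit the scheme is passed through unchanged.
        return path
    return f"/index.php?id={match.group(1)}"


if __name__ == "__main__":
    for path in ("/42.html", "/about"):
        print(path, "->", rewrite(path))
```

The spider only ever sees the left-hand form, so the site looks like a set of plain .html files even though the content still comes out of the database.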

    Google seems to index and reindex one of our sites which is primarily vbulletin, so I'm not sure whether that's the vbulletin archive feature or Google's fixation with the site...


  • Registered Users Posts: 1,569 ✭✭✭maxheadroom


    How about xaraya with short URLs turned on?

    For an example (by no means finished) look at http://eypireland.com/xaraya . I have no idea how often, or even if, this gets spidered, but I can't see why it would be spider unfriendly...

    EDIT: spelling


  • Registered Users Posts: 1,452 ✭✭✭tomED


    Originally posted by FreeHost
    Does typo3 specifically need the rewrite rule?

    No, typo3 will work without the rewrite rule, but it will only produce search-engine-unfriendly URLs.

    However, the latest version of typo3 has been launched, which automatically generates search-engine-friendly URLs.

    I haven't used it yet - but it's apparently a much needed upgrade. :)

    EDIT: Search engines / spiders - Google follows URLs with query strings, but it tends not to crawl every single one. If you want better results, try to make the URL as friendly as possible.
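
A rough way to audit a site for this is to count the query-string parameters in each URL. The sketch below uses Python's standard `urllib.parse`; the helper names and the zero-parameter threshold are assumptions for illustration, not a rule any search engine publishes.

```python
from urllib.parse import parse_qsl, urlsplit


def query_param_count(url: str) -> int:
    """Return the number of query-string parameters in a URL."""
    query = urlsplit(url).query
    return len(parse_qsl(query, keep_blank_values=True))


def looks_spider_friendly(url: str, max_params: int = 0) -> bool:
    """Hypothetical heuristic: a URL is 'friendly' if it has few parameters."""
    return query_param_count(url) <= max_params


if __name__ == "__main__":
    for url in (
        "http://example.com/articles/spiders.html",
        "http://example.com/index.php?id=42&type=1",
    ):
        print(url, looks_spider_friendly(url))
```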

