App to crawl website for certain text strings?

  • 30-01-2013 11:31am
    #1
    Registered Users Posts: 81,220 ✭✭✭✭


    Like a link checker application (e.g. Xenu) that doesn't just look for a "200 OK" response, but actually reads each page and looks for messages like "Page error" or similar (the search strings should be customizable).

    Perhaps some customizable sort of web scraper?


Comments

  • Registered Users Posts: 7,157 ✭✭✭srsly78


    There are browser plugins that can do this. Chrome has one called "Auto Refresh Plus", which many people have been using to scan the Google store to check whether the Nexus 4 is back in stock.


  • Registered Users Posts: 81,220 ✭✭✭✭biko


    Thanks, that seems to be a manual process though? I'll download and test now.
    Edit: yeah that seems to just refresh the page I am currently on, doesn't have any crawling capabilities unless I missed something.

    I'm looking for something that will start at www.something.com, crawl the whole website and look for a certain text string.
    For instance "page error", or whether a particular phone number is displayed. I can then move on to fix the page, or change the phone number on the pages that have it.
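
A minimal sketch of that kind of crawler, using only the Python standard library. It is an illustration under assumptions, not a tool anyone in the thread used: the names `LinkParser`, `fetch_page` and `crawl` are made up for this example, it only follows `<a href>` links on the starting host, it matches case-insensitively against the raw HTML (so text inside markup counts too), and it ignores robots.txt and rate limiting.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href values of <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_page(url):
    """Download a page as text; return an empty string if the request fails."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return ""


def crawl(start_url, needles, fetch=fetch_page, max_pages=500):
    """Breadth-first crawl of one host; returns {url: [search strings found]}."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    hits = {}
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        html = fetch(url)
        lowered = html.lower()
        found = [n for n in needles if n.lower() in lowered]
        if found:
            hits[url] = found
        # Queue every same-host link that has not been seen yet.
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return hits
```

Calling `crawl("http://www.something.com/", ["page error", "a phone number"])` would return the URLs that need attention, each with the strings found on it. The `fetch` parameter exists so the crawl logic can be tried against canned pages before pointing it at a live site.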


  • Registered Users Posts: 7,157 ✭✭✭srsly78


    It has a monitor feature: you can tell it to check for certain strings and then execute a custom action if one is detected (like popping up an alarm or sending a mail or whatever).

    The only manual step is to set it running.


  • Registered Users Posts: 81,220 ✭✭✭✭biko


    I'm looking at "Auto Refresh Plus - Options" and it doesn't seem to have an automatic crawler feature.
    It will only detect a text string if I'm already on the page?

    Preferably the crawler would collect the URLs of the pages that need attention and present them after scanning the whole site automatically.
    Like a text mining tool, but instead of mining files on a server it visits webpages.


  • Moderators, Society & Culture Moderators Posts: 17,642 Mod ✭✭✭✭Graham


    Do you have a list of URLs to be scraped, or are you hoping that the crawler will build that list based on the links it comes across?


  • Registered Users Posts: 7,157 ✭✭✭srsly78


    You are looking for some automated testing tool for websites?

    Maybe info here: http://stackoverflow.com/questions/9120098/daily-check-all-webpages-from-a-list-of-websites


  • Registered Users Posts: 81,220 ✭✭✭✭biko


    Ok, I figured it out.

    I first use Xenu to build a file with all the pages, then simply use iMacros to visit each page and check for the text string.

    Goddammit, I love iMacros.
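
For comparison, the same two-step idea (a pre-built list of pages, then a check of each one) can be sketched in plain Python instead of iMacros. This is an assumption-laden illustration: it supposes the page list has been saved as a text file with one URL per line, which may not match Xenu's actual export layout, and `load_urls`, `pages_containing` and `fetch_page` are names invented here.

```python
from urllib.request import urlopen


def fetch_page(url):
    """Download a page as text; return an empty string if the request fails."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return ""


def load_urls(lines):
    """Keep lines that look like http(s) URLs; blank or other lines are skipped."""
    urls = []
    for line in lines:
        candidate = line.strip()
        if candidate.startswith(("http://", "https://")):
            urls.append(candidate)
    return urls


def pages_containing(urls, needle, fetch=fetch_page):
    """Return the URLs whose body contains the needle (case-insensitive)."""
    needle = needle.lower()
    return [url for url in urls if needle in fetch(url).lower()]
```

With a hypothetical `urls.txt`, usage would look like `pages_containing(load_urls(open("urls.txt")), "page error")`, which gives back the list of pages to fix.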


  • Registered Users Posts: 2,021 ✭✭✭ChRoMe


    I usually use Grinder for this http://grinder.sourceforge.net/

    You can write your scripts in Jython or Clojure. It's very fast and you can simulate 1000s of simultaneous connections.

