
Viewing html form POST data

  • 25-07-2005 2:15pm
    #1
    Closed Accounts Posts: 913 ✭✭✭


    Hi Guys,

    Does anyone know if it's possible to view the data that is submitted
    via the POST method in an HTML form?
    For example, if I searched for a flight on ryanair website,
    several things are submitted when I click 'search'.
    I'd like to view these details.

    Thx
    Ray


Comments

  • Registered Users Posts: 68,317 ✭✭✭✭seamus


    You can view the source of the file, and if you know a little about HTML, you can see exactly what's being submitted in the form. Or you can get the webmaster toolbar for Firefox, which will allow you to see form details. I find it very handy.
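To illustrate the "view the source" approach, here is a minimal Python sketch that lists the fields a form would submit. The markup, field names and values are made up for the example and are not taken from any real site; a real page would need its own HTML fed in.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect the action, method and named fields of each <form>."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._in_form = True
            self.forms.append({
                "action": attrs.get("action") or "",
                # Browsers default to GET when no method is given.
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            })
        elif tag in ("input", "select", "textarea") and self._in_form:
            name = attrs.get("name")
            if name:  # controls without a name are not submitted
                self.forms[-1]["fields"][name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


# Hypothetical markup, for illustration only.
SAMPLE_HTML = """
<form action="/search" method="post">
  <input type="text" name="from" value="DUB">
  <input type="text" name="to" value="STN">
  <input type="hidden" name="token" value="abc123">
  <input type="submit" value="Search">
</form>
"""

parser = FormFieldParser()
parser.feed(SAMPLE_HTML)
form = parser.forms[0]
print(form["method"], form["action"], form["fields"])
```

Note that this only shows the defaults present in the page source. Values typed in by the user or filled in by JavaScript will not appear, which is why a header-capturing tool is often easier.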


  • Registered Users Posts: 3,886 ✭✭✭cgarvey


    Probably the easiest way is to install the LiveHeaders extension in Firefox, which shows you the POST content.

    You can use all sorts of other things like tcpdump (which will give you lots of verbose details about the connection, including POSTed content), or Google for HTTP Tunnel, most of which will allow you to monitor the content as well. LiveHeaders is the easiest way I know of anyway!
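For anyone wondering what the captured POST content looks like: for an ordinary form it is usually a URL-encoded string of name=value pairs. A small Python sketch, assuming the standard application/x-www-form-urlencoded encoding and using invented field names:

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical field names and values, only for illustration.
fields = {"from": "DUB", "to": "STN", "date": "25/07/2005", "adults": "1"}

# Roughly what the browser sends as the request body for such a form.
body = urlencode(fields)
print(body)

# Going the other way: turning a captured body back into readable fields.
captured = "from=DUB&to=STN&date=25%2F07%2F2005&adults=1"
decoded = {name: values[0] for name, values in parse_qs(captured).items()}
print(decoded)
```

Forms that upload files use multipart/form-data instead, which this sketch does not cover.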


  • Closed Accounts Posts: 913 ✭✭✭HarryD


    Thanks guys, LiveHeaders did the trick.

    Cheers,
    Ray


  • Closed Accounts Posts: 913 ✭✭✭HarryD


    Follow on question:

    After submitting a form to a 3rd party, a page gets returned.
    What's the best way to filter this page and take certain information,
    without displaying the page ?
    There's probably more to it than meets the eye.

    Thx
    R


  • Closed Accounts Posts: 19,777 ✭✭✭✭The Corinthian


    HarryD wrote:
    After submitting a form to a 3rd party, a page gets returned.
    What's the best way to filter this page and take certain information,
    without displaying the page ?
    There's probably more to it than meets the eye.
    Trying to deep link or automate the search for free flights I see. What you're trying to do is called 'screen scraping'.

    To begin with you’re almost certainly dealing with sessions and multiple POSTs, so you would have to be able to both emulate the process of looking for a flight over multiple pages and fake session cookies in your request headers. Picking out your data is very much dependent on your being able to find ‘flags’ in the HTML to identify where it begins and ends, but remember that you would always be dependent on the site not updating their code or even blocking your IP at any stage.

    Finally, if this is for Ryanair you should know that they don’t like people doing this (it was tried at least once AFAIK) and they will take steps to stop you if they notice you doing so.


  • Closed Accounts Posts: 913 ✭✭✭HarryD


    Trying to deep link or automate the search for free flights I see. What you're trying to do is called 'screen scraping'.

    To begin with you’re almost certainly dealing with sessions and multiple POSTs, so you would have to be able to both emulate the process of looking for a flight over multiple pages and fake session cookies in your request headers. Picking out your data is very much dependent on your being able to find ‘flags’ in the HTML to identify where it begins and ends, but remember that you would always be dependent on the site not updating their code or even blocking your IP at any stage.

    Finally, if this is for Ryanair you should know that they don’t like people doing this (it was tried at least once AFAIK) and they will take steps to stop you if they notice you doing so.

    I don't see why any airline would have a problem with this..
    If it brings them more business then I'm sure they're happy.
    Surely openjet.com or skyscanner.net use this ?

    Also I don't see why fake session cookies would be required.
    Emulating the process of looking for a flight and filtering the data
    is not a problem. The main problem is dealing with sessions,
    and grabbing the returned html as opposed to displaying it.

    Thanks for the input,
    Ray


  • Registered Users Posts: 3,886 ✭✭✭cgarvey


    Those sites mentioned pay heavily for an interconnect. You'll need to be session cookie aware, otherwise it won't work for all the airline sites I've ever visited at least.

    If I were you, I'd be using Perl (which has the ability to send/receive requests, including a cookie manager), but you might be more comfortable with another scripting language like PHP?

    I don't know of any easy way to do it, but maybe there is.. Google for scraping (with another keyword like HTML or HTTP.. otherwise I'd hate to see the results!) and see if there's any tool that makes it easier than having to write your own script.

    Basically, you'll want to make a GET request to the starting page, grab the session cookie and store it for subsequent requests, make a POST with your search criteria (whose fields you know the values of from your LiveHeaders extension), and then parse the response (which will be one big string of HTML) for a known pattern before and after your required data.
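The steps in the paragraph above can be sketched roughly as follows. This is in Python rather than the Perl or PHP suggested in the thread, uses only the standard library, and every name in it (`make_session`, `extract_between`, `fetch_quote`, the URLs and markers you would pass in) is a placeholder for illustration, not anything a particular site provides.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores cookies and resends them automatically."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def extract_between(html, before, after):
    """Return the text between two known markers, or None if either is missing."""
    start = html.find(before)
    if start == -1:
        return None
    start += len(before)
    end = html.find(after, start)
    if end == -1:
        return None
    return html[start:end].strip()


def fetch_quote(start_url, search_url, fields, before, after):
    """GET the start page, POST the search, then cut out the wanted text."""
    opener, _jar = make_session()
    # 1. GET the starting page so the server can set its session cookie.
    opener.open(start_url, timeout=30).close()
    # 2. POST the search criteria; supplying data makes this a POST.
    body = urllib.parse.urlencode(fields).encode("ascii")
    with opener.open(search_url, data=body, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    # 3. The response is one big string; look for the known markers.
    return extract_between(html, before, after)


# Offline demonstration of the parsing step on a made-up fragment.
sample = '<td class="price">Total: <b>EUR 49.99</b></td>'
print(extract_between(sample, "Total: <b>", "</b>"))
```

`fetch_quote` is not called here because it needs a real site and that site's permission. The marker-based parsing is deliberately simple, and it will return None as soon as the page layout changes.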

    Of course, every time they update a small thing on their website it'll probably break your script straight away. Most websites will have a T&C clause preventing you from doing this, and some may simply block your IP, as The C has already mentioned above.

    So check out Perl or PHP if you're familiar with either.

    .cg


  • Closed Accounts Posts: 913 ✭✭✭HarryD


    Yeah I'm using perl.
    I found a handy module called Mechanize, which seems to do the trick.
    Thanks for your help guys
    R


  • Closed Accounts Posts: 19,777 ✭✭✭✭The Corinthian


    HarryD wrote:
    I don't see why any airline would have a problem with this..
    If it brings them more business then I'm sure they're happy.
    Ryanair do a lot more than sell cheap flights on their site. They sell more expensive flights too. And insurance and hotel bookings. Then there are the marketing considerations, such as brand integrity and competition (are you looking to include quotes from competitors, and if so why should they want such public comparisons?). Finally they can sell the data, as has been already pointed out, for a lot of money.

    All this adds up to the fact that if they do notice you deep linking they will take action - at the very least technical (changing their output slightly or blocking your IP) and possibly also legal.
    Also I don't see why fake session cookies would be required.
    Emulating the process of looking for a flight and filtering the data
    is not a problem. The main problem is dealing with sessions,
    and grabbing the returned html as opposed to displaying it.
    However you are likely to be going through multiple pages and not just one. Thus you need to be able to emulate the current session as you go from the initial query through to the return flight and the final page that includes taxes. This requires multiple HTTP requests before outputting your final quote and would also require faking the session cookie headers in the HTTP request.


  • Closed Accounts Posts: 913 ✭✭✭HarryD


    if they do notice you deep linking they will take action - at the very least technical (changing their output slightly or blocking your IP) and possibly also legal.
    Interesting..
    Any idea how skyscanner and openjet get away with it ?
    it was tried at least once AFAIK.
    Can you remember any more info on that ?
    you are likely to be going through multiple pages and not just one. Thus you need to be able to emulate the current session as you go from the initial query through to the return flight and the final page that includes taxes. This requires multiple HTTP requests before outputting your final quote and would also require faking the session cookie headers in the HTTP request.
    Anytime I need a quote I just POST the form with the relevant field data,
    and I get the quote returned. <end of that session>
    No Cookies req'd..
    Mechanize can handle cookies if req'd anyway..


  • Closed Accounts Posts: 19,777 ✭✭✭✭The Corinthian


    HarryD wrote:
    Interesting..
    Any idea how skyscanner and openjet get away with it ?
    They most likely pay for the data.
    Can you remember any more info on that ?
    I believe they decided it wasn’t viable and dropped the venture.
    Mechanize can handle cookies if req'd anyway..
    I can’t comment on Mechanize as the only time I’ve ever done anything similar I coded it from scratch.


  • Registered Users Posts: 83 ✭✭fatlog


    most of the stumbling points have been answered above but here's my take...

    to automate it you will need to store session variables somewhere. that's a major one.

    the return page could be picked apart using regular expressions. you know what info you want, so look for a certain format using a regular expression search. this needs to be very tight, as you could pick up other data that will throw you off.
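A small Python sketch of why the pattern needs to be tight. The HTML fragment, labels and class name are invented for the example; a real page would need its own pattern.

```python
import re

# Made-up response fragment; real pages will differ.
SAMPLE = """
<p>Call us for 10.00 reasons to fly!</p>
<tr><td>Fare</td><td class="price">&euro; 39.99</td></tr>
<tr><td>Taxes</td><td class="price">&euro; 12.50</td></tr>
"""

# Loose: any number with two decimals, which also hits the marketing text.
LOOSE = re.compile(r"\d+\.\d{2}")

# Tight: only amounts inside a price cell that follows a known label.
TIGHT = re.compile(
    r"<td>(?P<label>Fare|Taxes)</td>\s*"
    r'<td class="price">&euro;\s*(?P<amount>\d+\.\d{2})</td>'
)

loose_hits = LOOSE.findall(SAMPLE)
prices = {m.group("label"): m.group("amount") for m in TIGHT.finditer(SAMPLE)}
print(loose_hits)
print(prices)
```

The loose pattern picks up the stray "10.00" as well as the two prices, while the tight one returns only the labelled amounts. The trade-off is that the tight pattern stops matching the moment the site changes its markup.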

    sites do not like this generally and can block it. for example, online bookies have lots of betting bots hitting them. some say that they have more bots scraping their site than genuine users browsing it, and have started to ban bots. others have started to create APIs specifically for bots. either way, you could eventually face either being shut down or being charged for use of the site.


  • Registered Users Posts: 131 ✭✭theexis


    They most likely pay for the data.

    Actually this isn't the case - an ex colleague of mine has recently joined them to work on their screen scraping automation as they want to expand to more markets.


  • Registered Users Posts: 9,579 ✭✭✭Webmonkey


    Done something like this for O2 and Vodafone for automating the process, but unfortunately Vodafone have changed their whole system with random fields etc. I did get around them, but I'm still having problems sending.

    Anyways, I used the cURL module in PHP. Will probably do the trick; it handles cookies, the whole lot. Got handy source here if you're interested.

    I also got the "Amount of texts left this month" figure, by catching the page and doing simple ereg search with PHP and exploding the lines to get pieces of information.
    Just to let you know, if ryanair keep changing layout of pages it could drive you insane.


  • Closed Accounts Posts: 19,777 ✭✭✭✭The Corinthian


    theexis wrote:
    Actually this isn't the case - an ex colleague of mine has recently joined them to work on their screen scraping automation as they want to expand to more markets.
    He / she joined both Web sites?

    As I said, I would believe that most serious sites would ultimately pay for the content, because otherwise this is the sort of thing that happens:
    Webmonkey wrote:
    Done something like this for O2 and Vodafone for automating the process, but unfortunately Vodafone have changed their whole system with random fields etc. I did get around them, but I'm still having problems sending.
    Companies that value their content / services for whatever reason don’t like people stealing it (and whether you think it free advertising or not they consider it stealing). If you’re a minor irritation they’ll probably simply change the code to screw up your screen scrapes or block your server IP. If you become more than that they will sue you. It’s happened.

    So the reason I said I would believe that most serious sites would ultimately pay for the content is that otherwise you’re basing your entire business model on hacking someone else, so you’ve got a decidedly unstable proposition that’s not likely to last in the long run. One way or another.


  • Registered Users Posts: 9,579 ✭✭✭Webmonkey


    Yeah, I'd have to agree with you man. It was a hobby to me though; I'm sick of trying to find ways around their system now.


  • Registered Users Posts: 4,003 ✭✭✭rsynnott


    All this adds up to the fact that if they do notice you deep linking they will take action - at the very least technical (changing their output slightly or blocking your IP) and possibly also legal.

    Unless they have some sort of "agree to this before you enter this site" thing, they have no legal recourse, provided you don't publish the material extracted. It is certainly quite likely that they'll attempt to block it tho.

    As mentioned before, Mechanize is a good tool for this sort of thing.


  • Closed Accounts Posts: 19,777 ✭✭✭✭The Corinthian


    rsynnott wrote:
    Unless they have some sort of "agree to this before you enter this site" thing, they have no legal recourse, provided you don't publish the material extracted.
    If you use their content for your own site, even if you do not use it verbatim you are in fact publishing part of their content and are in breach of copyright and they do indeed have legal recourse.


  • Registered Users Posts: 4,003 ✭✭✭rsynnott


    If you use their content for your own site, even if you do not use it verbatim you are in fact publishing part of their content and are in breach of copyright and they do indeed have legal recourse.

    Did he say specifically that he'd be publishing it? Sorry, missed that.


  • Closed Accounts Posts: 19,777 ✭✭✭✭The Corinthian


    rsynnott wrote:
    Did he say specifically that he'd be publishing it? Sorry, missed that.
    No, but the discussion has grown far beyond the original poster, encompassing a number of sites that do exactly that. It’s also not an unfair assumption that this is most likely his intention, regardless of whether it is for a commercial venture or not.

    Nonetheless, I was simply adding to your point on the issue of copyright. Breach of copyright need not mean reproducing material verbatim, which was not clear in your comment.


  • Registered Users Posts: 4,003 ✭✭✭rsynnott



    Nonetheless, I was simply adding to your point on the issue of copyright. Breach of copyright need not mean reproducing material verbatim, which was not clear in your comment.

    No, it needn't. However, information extracted that way could be used indirectly, in statistics; for example, one could say that 90% of Ryanair's tickets are under €300 (or whatever).


  • Closed Accounts Posts: 19,777 ✭✭✭✭The Corinthian


    rsynnott wrote:
    for example, one could say that 90% of Ryanair's tickets are under €300 (or whatever).
    I don't think you can do that with the Ryanair (or any other budget airline) site - you can find out if tickets under €300 are available, but not how many there are.


  • Registered Users Posts: 131 ✭✭theexis


    He / she joined both Web sites?

    No, he joined the company that develops Skyscanner (based in Scotland). http://www.skyscanner.net/static/aboutus.html even suggests the fact.
    If you use their content for your own site, even if you do not use it verbatim you are in fact publishing part of their content and are in breach of copyright and they do indeed have legal recourse.

    I'm not sure why you see this as a different situation to any other search engine. Are you suggesting that Google cached pages are illegal? Also, a recent article in Business Week suggested that most airlines explicitly leave copyright notices off their sites to attract this kind of traffic (until they get their RSS story together), since it's ultimately bringing more business.


  • Registered Users Posts: 4,003 ✭✭✭rsynnott


    I don't think you can do that with the Ryanair (or any other budget airline) site - you can find out if tickets under €300 are available, but not how many there are.

    I'm not sure of the details of the site in question; I was just giving an example of a legal application.

    And Google cached pages are fairly dubious...

