Releasing an app with data scraped from website

  • 21-11-2013 1:24pm
    #1
    Registered Users Posts: 8,022 ✭✭✭


    Does anyone know about the legality of scraping data from a website for use in an application? The core features of my app rely on this scraped data, so without it I have very little. I've contacted the owners of the website but they haven't responded. I've told them I'm not looking for money and I won't be charging for the app. They don't even have a mobile version of their website, and this app offers some really useful features for their customers with Android devices. Also, from what I can see, there are unofficial apps out there like Next Dublin Bus and Leap Card Balance which both scrape their respective parent websites for data.

    I'd really like to be able to refer to it on my CV, so I reckon I'm just going to go ahead and release it to the Play Store, and if they ask me to remove it then fine.


Comments

  • Registered Users Posts: 18,272 ✭✭✭✭Atomic Pineapple


    Not sure of the exact legal stance on it but if you credit where the data is coming from you should be OK, especially if you are not making any money off it.

    I do this with one of my applications: I scrape data from a website but give that website credit for the data, and I haven't had any issues, even after contacting them to let them know about the app.

    I guess it depends on what you are scraping and if the owner of the data wants to protect it from being used by 3rd parties or not.


  • Registered Users Posts: 6,392 ✭✭✭AnCatDubh


    Assuming it's original content, and unless otherwise stated (or agreed to), the information published on any website is owned by, and is the copyright of, whoever created or published it. So you may be on the wrong side of the law if it ever came to it.

    It would be best to pursue the direct contact and see if they are OK with it. If it's transport information, as you alluded to in your post, then they may have it covered by the provisions of a public sector data re-use policy, or they may consider the information to have commercial value, guard it with their lives and tell you to get stuffed. If you aren't getting a response to your emails, do the ring up thing, and if they don't talk to you then, do the drop by to the office thing with an "oh, I was just in the area" intro.

    It's also best to have an agreement because, if you are scraping, they could simply block you if they don't like the sound of it or the volume of traffic that your app starts sending to their website. Worse still, they could change their page, which won't make any difference to them but could feic your app up something desperate. How badly depends on the extent of their change and the tolerance of your app.

    Anyhow, best to get permission, I'd say.


  • Moderators, Society & Culture Moderators Posts: 17,642 Mod ✭✭✭✭Graham


    Is the source of your data a public body? If so you may be able to access the data under the guise of re-use of public information.

    Have you done everything you can to limit the amount of additional traffic your app is going to generate for the source website?

    As Atomic Pineapple and AnCatDubh stated, it's always better to get this kind of thing sanctioned in advance. Wasn't there a case a while back where access to Dublin Bikes data was blocked?


  • Registered Users Posts: 8,022 ✭✭✭youcancallmeal


    Graham wrote: »
    Is the source of your data a public body? If so you may be able to access the data under the guise of re-use of public information.

    I suppose I'll just mention the site: it is the Irish Blood Transfusion Service, www.giveblood.ie. In my app I make it clear that the info is coming from there, and I even have a link to the site on my main screen. I don't think IBTS is a public body, but possibly a not-for-profit organisation? Funnily enough, they have an iPhone app which looks like it was developed professionally. I downloaded it to try it out but I couldn't get past the splash screen. I can't remember now, but it was some sort of error. I think the guys who developed it may have folded, as their website (www.agency.com) is gone.
    Graham wrote: »
    Have you done everything you can to limit the amount of additional traffic your app is going to generate for the source website?

    I could get really advanced with it and start using my own intermediary server which pulls the data down once a day. It would definitely speed up parts of my app and would not cause excessive traffic on their servers. I think this is overkill though and something I don't really have the time or resources to implement.
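
    For what it's worth, the once-a-day idea doesn't need much code. Here is a minimal Python sketch, assuming the scrape is wrapped in a function you pass in. `DailyCache`, `fetch` and `clock` are made-up names for illustration, not from any library or from the app discussed here. It calls the source at most once per 24 hours and serves the stored copy the rest of the time.

```python
import time
from typing import Callable, Optional


class DailyCache:
    """Hold one fetched value and refresh it at most once per max_age_seconds."""

    def __init__(
        self,
        fetch: Callable[[], str],                 # hypothetical: does the actual scrape
        max_age_seconds: float = 24 * 60 * 60,    # assumption: once a day is fresh enough
        clock: Callable[[], float] = time.time,   # injectable so it can be tested
    ) -> None:
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._fetched_at: Optional[float] = None

    def get(self) -> str:
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._max_age
        if stale:
            try:
                self._value = self._fetch()
                self._fetched_at = now
            except Exception:
                # If the source is down, keep serving the old copy when there is one.
                # The timestamp is not updated, so the next call tries again.
                if self._value is None:
                    raise
        return self._value
```

    The same wrapper could sit inside the app itself instead of on a separate server, which would at least stop each device from hitting the site more than once a day.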
    Graham wrote: »
    As Atomic Pineapple and AnCatDubh stated, it's always better to get this kind of thing sanctioned in advance. Wasn't there a case a while back where access to Dublin Bikes data was blocked?

    Never heard about that. I'm not surprised though, as the Dublin Bikes app is absolutely awful! If there was some sort of public API exposing the live bike information I would definitely have a go at doing something better!


  • Subscribers Posts: 1,911 ✭✭✭Draco


    Next Dublin Bus doesn't screen scrape - it uses the RTPI API to get information.

    There was a 3rd party Dublin Bus app but I believe legal letters were sent out and it was pulled from the app store. A similar thing happened with an unofficial Golden Pages app a few years ago.

    Apart from anything else, scraping websites is risky in that if the website changes your app breaks.


  • Moderators, Society & Culture Moderators Posts: 17,642 Mod ✭✭✭✭Graham


    youcancallmeal wrote: »
    I could get really advanced with it and start using my own intermediary server which pulls the data down once a day. It would definitely speed up parts of my app and would not cause excessive traffic on their servers. I think this is overkill though and something I don't really have the time or resources to implement.

    It could be considered a bit cheeky to be scraping their data for the benefit of your app while making them shoulder the burden of your app's traffic. They probably won't mind though if it's because you're too busy.


  • Registered Users Posts: 8,022 ✭✭✭youcancallmeal


    Draco wrote: »
    Apart from anything else, scraping websites is risky in that if the website changes your app breaks.

    Yeah true, even if they change the name of a div element which contains info I'm scraping I end up with errors or blank pages. There is loads more I could do with this app but think I'll probably just release it as is and move on to something else more reliable!
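
    One way to soften that is to make the scraper return "nothing found" instead of blowing up. Below is a minimal Python sketch using only the standard library's `html.parser`. It assumes the value lives in an element with a known `id`, and it tries a list of candidate ids in order. The ids in the usage note are invented for illustration and are not the real markup of any site mentioned in this thread.

```python
from html.parser import HTMLParser
from typing import Optional, Sequence

# Tags that never have a closing tag, so they must not affect nesting depth.
_VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
              "input", "link", "meta", "source", "track", "wbr"}


class _TextById(HTMLParser):
    """Collect the text inside the first element whose id equals target_id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self._target_id = target_id
        self._depth = 0       # greater than 0 while inside the target element
        self._done = False    # set once the target element has been closed
        self.found = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self._done or tag in _VOID_TAGS:
            return
        if self._depth > 0:
            self._depth += 1
        elif dict(attrs).get("id") == self._target_id:
            self.found = True
            self._depth = 1

    def handle_endtag(self, tag):
        if tag in _VOID_TAGS or self._depth == 0:
            return
        self._depth -= 1
        if self._depth == 0:
            self._done = True

    def handle_data(self, data):
        if self._depth > 0:
            self.chunks.append(data)


def extract_text(html: str, candidate_ids: Sequence[str]) -> Optional[str]:
    """Return the text of the first candidate id present, or None if none match."""
    for element_id in candidate_ids:
        parser = _TextById(element_id)
        parser.feed(html)
        parser.close()
        if parser.found:
            text = " ".join("".join(parser.chunks).split())
            if text:
                return text
    return None
```

    Calling something like `extract_text(page, ["stock-level", "stock"])` gives the app a `None` to check for when the page layout changes, so it can show a "data unavailable" message rather than an error or a blank page.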


  • Moderators, Society & Culture Moderators Posts: 17,642 Mod ✭✭✭✭Graham


    Luckily for you the Irish Blood Transfusion Service comes under the remit of the Office of the Ombudsman and its re-use of public sector information policy & strategy.

    http://www.giveblood.ie/Re-use_of_Public_Sector_Information/

    http://www.ombudsman.gov.ie/en/About-Us/Policies-and-Strategies/Reuse-of-Public-Sector-Information/

    http://psi.gov.ie

    It's probably a good idea to have this sorted out before you start developing an app, especially when all it often takes is 5 minutes alone with Google.


  • Moderators, Society & Culture Moderators Posts: 17,642 Mod ✭✭✭✭Graham


    youcancallmeal wrote: »
    Yeah true, even if they change the name of a div element which contains info I'm scraping I end up with errors or blank pages. There is loads more I could do with this app but think I'll probably just release it as is and move on to something else more reliable!

    There are pros and cons to both approaches.

    Using an existing API means the data is easily accessible in a consistent format and is likely to remain so for the foreseeable future. On the downside, as soon as an API appears you're almost guaranteed that a few new developers will throw together an app to present that data.

    Scraping data and/or generating your own API is much harder work and requires ongoing maintenance BUT it means your product is likely to remain unique for longer.

    Personally, I think the best apps take the data from either approach and add something unique/useful to it rather than just regurgitating it on a mobile device.


  • Registered Users Posts: 8,022 ✭✭✭youcancallmeal


    Thanks for the advice. I'll post it up here in the next few weeks for a critique. It's my first app, so there's probably going to be loads wrong with it!


  • Registered Users Posts: 2,739 ✭✭✭MyPeopleDrankTheSoup


    ask for forgiveness, not for permission, in the general case. worst that will happen is you'll get a legal letter to take it down. no organisation will try to clean you out without giving you warning first.

    in this specific case, i'd say the IBTS would be delighted with you doing an app.


  • Moderators, Society & Culture Moderators Posts: 17,642 Mod ✭✭✭✭Graham


    MyPeopleDrankTheSoup wrote: »
    ask for forgiveness, not for permission

    Unless you're just churning out boilerplate (cr)apps by the dozen I wouldn't recommend this approach. It would be very easy to find yourself with nothing to show for days/weeks/months of work.


  • Registered Users Posts: 72 ✭✭shanard


    Graham wrote: »
    Is the source of your data a public body? If so you may be able to access the data under the guise of re-use of public information.

    Would this cover the Garda Síochána site?


  • Moderators, Society & Culture Moderators Posts: 17,642 Mod ✭✭✭✭Graham


    shanard wrote: »
    Would this cover the Garda Síochána site?

    If you go to the Garda website, there is a link at the bottom of the page "Reuse of information". That would probably be a good place to start.

    The "Re-Use of Public Sector Information Portal" also has some information on re-using public sector information:

    http://psi.gov.ie

