
Where can I get a database of people's CVs?

  • 02-02-2017 5:55pm
    #1
    Registered Users Posts: 25


    Hey all,

    Longtime lurker and my first post here (not ideal for here, but I'm not sure where else would work...). I'm looking at something in recruitment and need a reasonably large sample of CVs to build an MVP.

    Is anyone aware of where I could get access to this info online?

    Thanks in advance & apologies if this is not relevant for the thread


Comments

  • Registered Users Posts: 7,157 ✭✭✭srsly78


    I believe the usual method is to spam people with various fake jobs, then build up your database from the replies.

    Databases of candidates are the most important info that recruitment agencies have, not something to be given away.


  • Registered Users Posts: 25 Paddys Wigwam


    That's what I understand. I was hoping there might be an open-access one, but my search has only found restricted data sources. Thanks for the help.


  • Registered Users Posts: 586 ✭✭✭Aswerty


    This isn't exactly an answer to your question, since I don't know where you'd get a dump like that unless you find something black hat or, as you say yourself, find one with access restricted to research purposes and the like. I'm assuming you need the CVs in order to parse data out of the various CV formats, rather than needing up-to-date CVs of real people? And on that assumption...

    Depending on the sample size you need, you might be able to do something like google "resume download filetype:pdf" and manually download anything that has a person's name in the page title (the best indicator of a personal CV). That specific query brings up a lot of how-to-write-a-CV results, which you don't want. You might find another query that gives better results: maybe add a common name to the query (e.g. David) so you only get CVs for named people, then repeat for other common names. I imagine a few hours of this type of approach should net you a few hundred CVs. Swap pdf for doc and/or docx in the query to suit whatever file format requirements you have.
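
    As a rough illustration of the "repeat for other common names" idea, here is a minimal sketch that only generates the query strings to paste into a search engine. The function name, the default terms and the example names are all assumptions made for illustration; nothing here talks to Google.

```python
from itertools import product


def build_queries(names, terms=("resume", "cv"), filetypes=("pdf",)):
    """Return one search string per (term, name, filetype) combination.

    Hypothetical helper: it builds text such as "resume David filetype:pdf"
    and performs no network requests.
    """
    queries = []
    for term, name, filetype in product(terms, names, filetypes):
        queries.append(f"{term} {name} filetype:{filetype}")
    return queries


if __name__ == "__main__":
    # Example names are placeholders; swap in whatever common names you like.
    for query in build_queries(["David", "Mary"], filetypes=("pdf", "docx")):
        print(query)
```

    Passing filetypes=("pdf", "doc", "docx") covers the other file formats mentioned above.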

    Hell, you could probably automate the above very easily, though your problem then would be filtering the legit CVs from whatever other crap you get.
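
    One possible first pass at that filtering, sketched under the assumption that you have the page or document title for each hit: reject titles that read like how-to guides, and keep titles that contain a CV word plus at least two capitalised words that could be a name. The hint lists and the two-word threshold are guesses, not tested rules, and will need tuning against real results.

```python
import re

# Assumed phrases that suggest a guide or template rather than a personal CV.
GUIDE_HINTS = ("how to", "template", "example", "sample", "tips", "guide", "writing")
# Assumed words that mark a document as a CV.
CV_WORDS = {"cv", "resume", "résumé", "curriculum", "vitae"}


def looks_like_personal_cv(title: str) -> bool:
    """Heuristic: a CV word plus at least two name-like words, and no guide hints."""
    lowered = title.lower()
    if any(hint in lowered for hint in GUIDE_HINTS):
        return False
    # Runs of letters only (digits, underscores and punctuation are skipped).
    words = re.findall(r"[^\W\d_]+", title)
    has_cv_word = any(word.lower() in CV_WORDS for word in words)
    name_like = [w for w in words if w[0].isupper() and w.lower() not in CV_WORDS]
    return has_cv_word and len(name_like) >= 2


if __name__ == "__main__":
    for title in ("David Murphy - CV", "How to Write a Resume", "Resume Download"):
        print(title, "->", looks_like_personal_cv(title))
```

    A title check like this is cheap, so it can run before anything is downloaded; checking the document text itself would be a separate, heavier step.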

    I actually met a guy recently who was talking about a product related to CVs and recruitment. I know he was looking at partnering with someone with an existing database (e.g. a recruitment agency).

    I imagine any approach you take other than going to a black hat site or partnering with an agency will mean that on some CVs some personal info will be redacted: phone numbers, emails, addresses, etc.


  • Closed Accounts Posts: 8,015 ✭✭✭CreepingDeath


    Paddys Wigwam wrote: »
    I'm looking at something in recruitment and need a reasonably large sample of CVs to build an MVP.

    Is anyone aware of where I could get access to this info online?

    Recruitment agencies would be very secretive about that sort of thing.
    They would lose their recruitment fees if companies could just search for candidates directly.

    LinkedIn is probably as open as it gets, with plenty of companies directly contacting people who have the skillset they are looking for if they publish their CV/details in their profile.


  • Registered Users Posts: 23,212 ✭✭✭✭Tom Dunne


    Paddys Wigwam wrote: »
    That's what I understand. I was hoping there might be an open-access one, but my search has only found restricted data sources. Thanks for the help.

    You are looking at exceptional Data Protection issues - who in their right mind would give away their personal information and make it freely available?


  • Closed Accounts Posts: 8,015 ✭✭✭CreepingDeath


    Tom Dunne wrote: »
    You are looking at exceptional Data Protection issues - who in their right mind would give away their personal information and make it freely available?

    Facebook and LinkedIn members spring to mind!


  • Registered Users Posts: 23,212 ✭✭✭✭Tom Dunne


    CreepingDeath wrote: »
    Facebook and LinkedIn members spring to mind!

    Yes, but that's you putting your details out there. Big difference between that and a large database of CVs.


  • Registered Users Posts: 768 ✭✭✭14ned


    Paddys Wigwam wrote: »
    Longtime lurker and my first post here (not ideal for here, but I'm not sure where else would work...). I'm looking at something in recruitment and need a reasonably large sample of CVs to build an MVP.

    I had a startup idea in this area a few years ago that I actually wrote some prototype code for (though mine was to eliminate the worth of CVs, which are pretty low in information value).

    It used to be that you could ask LinkedIn for CVs via a RESTful API, but now you must screen-scrape. I also wouldn't rate the quality of what's on there: it was bad in 2011 and it's likely worse now. People just make stuff up on their profile and pat each other on the back with further lies. The only value there might be the network of relationships, but even that's pretty low quality, much worse than even Facebook.

    I found much higher calibre CVs on Stack Overflow Careers, where they keep "developer stories", e.g. here's mine: https://stackoverflow.com/users/story/805579. People appear much less likely to lie on that because it cross-references your "developer story" with your GitHub, SourceForge, and the questions and answers you have posted on Stack Overflow. I think they've got a RESTful API and you may be able to pull people's resumes legally. You can also validate a CV easily by matching their GitHub repos to the commit identification metadata on Open Hub, which tracks all contributions to all open source on a per-individual basis. Here's me on Open Hub, for example: https://www.openhub.net/accounts/ned14.

    Finally, I wouldn't undervalue writing a bot which uses Google to find CVs to download. Lots of people put their CVs on a personal website. Back in 2011 I had a Python script which pulled a few thousand CVs in just a few hours simply by asking Google for a mix of keywords and yanking any linked PDFs. I'm sure the same would work now, though be aware that the CVs found will be for all professions, not a single profession. You'll find particular density in Physics/Maths/IT.
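
    The "yanking any linked PDFs" step could look something like the sketch below. This is not the 2011 script, just a standard-library illustration that assumes you already have a page's HTML as a string; the class and function names and the example URLs are made up, and fetching pages or downloading files is deliberately left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> links whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page they were found on.
        absolute = urljoin(self.base_url, href)
        is_pdf = urlparse(absolute).path.lower().endswith(".pdf")
        if is_pdf and absolute not in self.pdf_links:
            self.pdf_links.append(absolute)


def extract_pdf_links(html, base_url):
    """Hypothetical helper: return de-duplicated PDF links found in html."""
    collector = PdfLinkCollector(base_url)
    collector.feed(html)
    return collector.pdf_links


if __name__ == "__main__":
    page = '<a href="/files/cv.pdf">My CV</a> <a href="about.html">About</a>'
    print(extract_pdf_links(page, "https://example.com/home/"))
```

    Checking the URL path rather than the whole URL means links with query strings (e.g. ?dl=1) are still recognised as PDFs.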

    Anyway without knowing what you're doing, it's hard to say any more that's useful. Good luck with whatever you're doing.

    Niall


  • Closed Accounts Posts: 22,649 ✭✭✭✭beauf


    Tom Dunne wrote: »
    You are looking at exceptional Data Protection issues - who in their right mind would give away their personal information and make it freely available?

    ...in addition, there's the problem of holding and using the information without again running into data protection issues.

