Advertisement
If you have a new account but are having problems posting or verifying your account, please email us on hello@boards.ie for help. Thanks :)
Hello all! Please ensure that you are posting a new thread or question in the appropriate forum. The Feedback forum is overwhelmed with questions that are having to be moved elsewhere. If you need help to verify your account contact hello@boards.ie

Assistence With Data Analysis / AI: Java

Options
  • 12-02-2014 3:36pm
    #1
    Registered Users Posts: 2,815 ✭✭✭


    I've been writing VBA solutions for years and I am currently in third year of an evening computer science course where I have also written numerous Java projects.

    I am in the early stages of another Java web application for college (Servlets, not with Spring). To be honest, I could develop a simple CRUD application easily and quickly and I'd probably get decent marks but I want a challange and to learn new skills.

    I have an idea for an application that would require a level of data analysis that I have previously not done, possibly involving the graph data structure.

    I am looking for assistence in pointing me in the right direction. I am not looking for exact code to be provided for me, but rather help with choosing the right methodology / data structures / libraries. Once I know the theory, I can then go ahead and write the actual code itself.

    I'd rather not put the spec of my application on this forum publically , so I would be very grateful if someone could provide this help via PM. If anyone is willing to provide this guidance with choosing the right methodology (perferably with experience with complex data analysis / AI), please PM me and I'll send you a description of the data analysis I want to do.

    Cheers

    (I looked at the charter before posting this and I couldn't see anything prohibiting seeking help via PM. I hope this is ok)


Comments

  • Registered Users Posts: 2,021 ✭✭✭ChRoMe


    I've been writing VBA solutions for years and I am currently in third year of an evening computer science course where I have also written numerous Java projects.

    I am in the early stages of another Java web application for college (Servlets, not with Spring). To be honest, I could develop a simple CRUD application easily and quickly and I'd probably get decent marks but I want a challange and to learn new skills.

    I have an idea for an application that would require a level of data analysis that I have previously not done, possibly involving the graph data structure.

    I am looking for assistence in pointing me in the right direction. I am not looking for exact code to be provided for me, but rather help with choosing the right methodology / data structures / libraries. Once I know the theory, I can then go ahead and write the actual code itself.

    I'd rather not put the spec of my application on this forum publically , so I would be very grateful if someone could provide this help via PM. If anyone is willing to provide this guidance with choosing the right methodology (perferably with experience with complex data analysis / AI), please PM me and I'll send you a description of the data analysis I want to do.

    Cheers

    (I looked at the charter before posting this and I couldn't see anything prohibiting seeking help via PM. I hope this is ok)

    Yeah, it mightnt be against the charter, but you will get a lot more people willing to contribute and will get better answers if you do it publicly on the thread. Its a university application, I don't see the need to be so secretive.


  • Registered Users Posts: 1,922 ✭✭✭fergalr


    What about this bit of the charter?
    Answering/Discussing a Question
    1. Keep all answers and suggestions on the thread. Asking users to pm you is not allowed. This is so anybody who has a similar problem in the future can view the solution suggested.

    OP:
    I've a level of expertise in data analysis, especially if it involves graph data structures. Surely you can ask in general terms about your query on the thread?

    I'd rather answer particular questions, rather than look through a detailed spec, and I'd rather do it on the thread so others here (and people coming from google in future) can use it. (Like the charter intends, which is pretty reasonable, imo)

    What sort of stuff are you trying to do?


  • Registered Users Posts: 2,815 ✭✭✭SimonTemplar


    Fair enough guys (can't believe I missed that part of the charter)

    The user inputs a date range and selects a number of locations from a list of possible locations. So an example set of input to the application could be:
    Start_date = 01/03/2014
    End_date = 03/03/2014
    Locations = [loc1, loc3, loc7, loc9, loc21, loc43, loc11, loc13, loc17, loc19, loc22, loc44]

    In the database, each location has a duration, a weighting, and its position (GPS).

    I would like the application to calculate the optimum schedule to visit each location while staying at that location for the required duration as specified by the value in the db.

    The application should take into account locations that are near each other when deciding the schedule. The user can visit locations between 09:00 and 18:00.

    Using Dublin as an example, if there are locations in Dundrum, Rathfarnham and Howth, the application should calculate that order of visits should not put Howth between Rathfarnham and Dundrum becuase it wouldn't make sense to leave the south side for a north side location only to return again.

    If there isn't enough time to visit all the locations within the specified date range, the application uses the location's weighting in the db to decide which locations can be omitted from the schedule.

    At the moment, I'm not taking into account travel time between locations. But please note that if I get the above working and I have enough time, I would like to try and incorporate travel times into the schedule calculation.

    I hope I've explained myself correctly and please ask questions if unclear.

    Cheers


  • Moderators, Society & Culture Moderators Posts: 9,689 Mod ✭✭✭✭stevenmu


    I know you said you wanted a challenge, but you've basically picked one of the hardest problems known to computer science :)

    What you've got is basically a variation on the Travelling salesman problem. I don't have any specific advice implementing a solution in Java, but what I would suggest is that you spend a little time researching the TSP and seeing what approaches to it match your interests, it is commonly represented as a graph problem for e.g. Coming up with a novel solution is probably beyond what you want to do for the moment, but you should be able to find some example solutions that you can then implement the code for yourself.


  • Registered Users Posts: 2,815 ✭✭✭SimonTemplar


    stevenmu wrote: »
    I know you said you wanted a challenge, but you've basically picked one of the hardest problems known to computer science :)

    What you've got is basically a variation on the Travelling salesman problem. I don't have any specific advice implementing a solution in Java, but what I would suggest is that you spend a little time researching the TSP and seeing what approaches to it match your interests, it is commonly represented as a graph problem for e.g. Coming up with a novel solution is probably beyond what you want to do for the moment, but you should be able to find some example solutions that you can then implement the code for yourself.

    Great. I like a challange though.

    I am aware of the TSP and I'd probably be able to implement it with enough research.

    Any ideas how I would incorporate the weighting if there isn't sufficient time to visit all locations?


  • Advertisement
  • Moderators, Society & Culture Moderators Posts: 9,689 Mod ✭✭✭✭stevenmu


    The first thing that comes to mind is to try and find a solution for all locations, then if that fails try and find a solution for all locations minus the lowest weighted one, and repeat until you find a solution that fits the time.

    That's not totally optimal, for e.g. you might end up with a solution with 5 low weighted locations removed, whereas there might be a solution with only one medium weighted location removed which would give a higher total weighting (if all location weightings were added).

    Another option is to run through every possible permutation of locations, finding all the solutions that work with the time frame, and adding up the weightings of the locations to find the solution with the highest total weighting. This would give you the optimal solution (in terms of location weighting at least), but scales badly for larger numbers of locations.


  • Closed Accounts Posts: 2,256 ✭✭✭Molly


    Look into using Linear Programming for doing this. CPLEX and Gurobi are two that have java interfaces and I think are free for students


  • Registered Users Posts: 2,021 ✭✭✭ChRoMe


    Great. I like a challange though.

    I am aware of the TSP and I'd probably be able to implement it with enough research.

    Any ideas how I would incorporate the weighting if there isn't sufficient time to visit all locations?

    Have a look at the "A Star" pathfinding algo


  • Registered Users Posts: 1,922 ✭✭✭fergalr


    ChRoMe wrote: »
    Have a look at the "A Star" pathfinding algo

    No; don't - its not really relevant here, its about finding the single source shortest path in a graph.


  • Registered Users Posts: 2,021 ✭✭✭ChRoMe


    fergalr wrote: »
    No; don't - its not really relevant here, its about finding the single source shortest path in a graph.

    I'm not suggesting it would work out of the box, with a decent heuristic it would be a reasonable place to start?


  • Advertisement
  • Registered Users Posts: 1,922 ✭✭✭fergalr


    ChRoMe wrote: »
    I'm not suggesting it would work out of the box, with a decent heuristic it would be a reasonable place to start?

    No, its not really applicable.

    A* is an algorithm that will find the shortest path from a starting location to a goal location (searching efficiently by using a heuristic).

    There's not really what the OP wants here. If you just wanted the best path from Bray to Howth, in a static weighted road graph, then A* would be a good choice; but that's not what the OP is trying to do at all.
    (FWIW, I gave a lecture on pathfinding algorithms yesterday that covered A* :pac: )


    What the OP wants to do does sound like its more in the family of combinatorial optimisation problems that the TSP is in (as stevenmu mentioned).

    Its not exactly like the TSP though.
    However, it sounds like a variant of the TSP, that has probably been already 'solved'. (By which I mean, there's probably a really good heuristic solution out there; not that anyone has a constant time algorithm that outputs the solution).

    One place to look is the broad field of operations research - its sometimes ignored by CS people, but most of the sort of routing problems you are going to run into in real-world applications were probably thought about a long time ago, and if you go looking hard enough, you'll find some paper written by some genius working for the RAND corporation in the 60s that solves your problem.


    If I could make one observation on the OPs problem:
    I would like the application to calculate the optimum schedule to visit each location while staying at that location for the required duration as specified by the value in the db.

    There might be a really easy way of dealing with this additional complexity you have:

    Specifically: Take your distances (which are currently going to be your 'edge-weights' between the 'cities' (locations)) and convert them into times instead.

    Then everything is expressed in terms of time. (And the problem hasn't changed).

    Now, take your 'duration' value that you have for each city, and just add the duration value to the weight of the edge coming into the city.

    You've lost nothing, and suddenly your problem is a lot easier, and is looking just like the 'orienteering problem'. (including having weights to get as many locations as you can, if you can't get them all).

    Orienteering is one of my favourite sports, so that's a good thing. :pac:

    Here's a survey paper on solutions to the orienteering problem:
    http://www.inf.unibz.it/dis/teaching/SDB/papers/batch1a.pdf


  • Registered Users Posts: 2,021 ✭✭✭ChRoMe


    fergalr wrote: »
    No, its not really applicable.

    [lots of good stuff]

    I stand corrected!


  • Registered Users Posts: 2,815 ✭✭✭SimonTemplar


    Thanks all for your replies, especially fergalr. Your suggestion of merging the distance and duration properties into a single weight is very interesting. Would Google Maps API be the best place to get the time between two GPS points in order to convert distances to times, or would I do that calc myself based on a nominal distance per minute formula?

    I probably won't start developing this in earnest for a few weeks but I'll come back if I run into any issues.

    Thanks again.


  • Registered Users Posts: 1,922 ✭✭✭fergalr


    Thanks all for your replies, especially fergalr. Your suggestion of merging the distance and duration properties into a single weight is very interesting.

    Its a good suggestion - if you just think of the duration property as an 'extra time' you have to go for, in order to get the reward, it makes perfect sense, and greatly simplifies the problem.

    And its not really merging duration and distance - its really merging the duration at the time, with the duration to get to the time.

    You never really cared about distance, really. All you cared about was the duration it takes to get from A to B. That happens to be correlated with distance, sure - but your constraints (how much reward can we get between 09:00 and 18:00) were always duration.

    Would Google Maps API be the best place to get the time between two GPS points in order to convert distances to times, or would I do that calc myself based on a nominal distance per minute formula?

    That's just an implementation detail :p

    Joking aside, that really depends on your application needs - what's this for?

    Probably the best thing to do is to have a measure of how long it actually takes to get from A to B, based on a summary of past journeys from A to B, extracted from GPS routes or something. In fact, you probably want data on many such journeys, and then take the 90th percentile, so that you can be conservative in your planning. (Maybe - depends on the application - how big a deal is it to be late for a delivery, vs how much is it worth to squeeze more deliveries into the data?)

    In fact, really, you probably want it so that the projected travel time varies depending on the time of day, taking into account traditional traffic patterns. (Although, if you do that, you can't use the Orienteering problem solutions, as you've got a more complex problem, where the time between two destinations depends on the order you visit destinations in).
    (Although they'll still provide good inspiration for this more complex problem)

    Anyway, in general, if you want to do some complex AI stuff like this, take the simplest possible approach to start with - something like the travel time between points according to google maps, +30% margin of error - or something like the crow-flies (euclidean) distance between points, but with a really conservative travel speed. (Like you suggest).


    When you get that up and running and implemented, then you can start gathering data, and refine the problem.

    And then you'll discover that no one is using your super cool algorithm because the UI of the interface you wrote is the wrong colour, or it doesn't have a big picture of a padlock on it to show that its secure, etc. etc. The super cool algorithmic stuff not being quite sophisticated enough, is rarely the limiting factor in the real world...
    I probably won't start developing this in earnest for a few weeks but I'll come back if I run into any issues.

    Thanks again.
    Welcome, good luck.


Advertisement