
Programming help please

  • 16-06-2010 6:20pm
    #1
    Registered Users Posts: 13,746 ✭✭✭✭Misticles


    Ok, I know what I need to do in plain English, but when I try to work out what I need programming-wise I get stuck:

    What I need to do:

    I have a massive amount of files (on DVD)

    --- they need to be searched and a line extracted from each file
    --- from the extracted line, two entries need to be further extracted and put into something that will use them to plot on a map of Ireland.

    Also

    --- need to search the same files for one thing, e.g. search how many were x and how many were y.

    Basically I don't have a programming background and I am a bit conflustered as to where to begin with this. Also I don't know where to store all these files; there's 1000's.

    All help appreciated.


Comments

  • Closed Accounts Posts: 26 TinfoilRaver


    might be better off posting in the software dev forum


  • Registered Users Posts: 13,746 ✭✭✭✭Misticles


    I can't find it :(


  • Registered Users Posts: 1,170 ✭✭✭deep1


    Misticles wrote: »
    Ok, I know what I need to do in plain English, but when I try to work out what I need programming-wise I get stuck:

    What I need to do:

    I have a massive amount of files (on DVD)

    --- they need to be searched and a line extracted from each file
    --- from the extracted line, two entries need to be further extracted and put into something that will use them to plot on a map of Ireland.

    Also

    --- need to search the same files for one thing, e.g. search how many were x and how many were y.

    Basically I don't have a programming background and I am a bit conflustered as to where to begin with this. Also I don't know where to store all these files; there's 1000's.

    All help appreciated.

    What language are you trying to write the code in?
    The files on the DVD, what extension are they?
    Are they some kind of database file which you need to connect to in order to extract information?
    Please provide more info.


  • Registered Users Posts: 13,746 ✭✭✭✭Misticles


    I don't know what language I need to write in... I know a little C, that's about it :D

    They're ASCII files.

    I was thinking of loading them all into one big giant database and making two search functions.

    I need the extracted information to go into another database also, I think.
    Then the map plotting from the first search -- I presume I need some tool for that.
    I need to be able to feed off where the two lots of info are for statistics use.


  • Registered Users Posts: 1,170 ✭✭✭deep1


    Misticles wrote: »
    I don't know what language I need to write in... I know a little C, that's about it :D

    They're ASCII files.

    I was thinking of loading them all into one big giant database and making two search functions.

    I need the extracted information to go into another database also, I think.
    Then the map plotting from the first search -- I presume I need some tool for that.
    I need to be able to feed off where the two lots of info are for statistics use.

    Well, to get started, first decide what language you want to use. It all depends on what type of application you are trying to build; some might be better suited to Java and others to C++.
    Then start writing the database (a huge task).
    To extract the info you will need SQL commands in your program (hope you are good at that).
    It's very hard to explain as I have no idea what the application is or what database it is.


  • Registered Users Posts: 907 ✭✭✭bandit197


    You could do this in Linux relatively easily using the terminal and grep to find the lines you are looking for. It is then possible to redirect the output of your search to a separate file. Do you have any experience with Linux? I'm not 100% sure how to search a range of files on a DVD though, so there may be an easier way to do this.
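    For illustration, here's a small self-contained sketch of that grep-and-redirect idea. The file names, directory and search terms below are all made up for the demo; on the real data you would point grep at the DVD's mount point instead.

```shell
# Sketch of the grep-and-redirect approach: pull matching lines out of many
# files into one output file, then count occurrences. All paths and
# patterns here are invented examples.
dir=$(mktemp -d)
printf 'launch ok status x\nnoise line\n' > "$dir/file1.txt"
printf 'launch ok status y\n' > "$dir/file2.txt"

# Pull every matching line from all files into one file (-h hides filenames):
grep -h "status" "$dir"/*.txt > "$dir/matches.txt"

# Count how many were x and how many were y:
grep -c "status x" "$dir/matches.txt"   # prints 1
grep -c "status y" "$dir/matches.txt"   # prints 1
```

    On a mounted DVD the first grep would look more like `grep -h "pattern" /media/cdrom/*.txt > matches.txt` (mount point is an assumption; yours may differ).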


  • Registered Users Posts: 13,746 ✭✭✭✭Misticles


    I have a little experience with Linux.

    I have used SQL and phpMyAdmin; dunno if that's relevant.

    Can I put all of the data into a big database?


  • Registered Users Posts: 1,064 ✭✭✭Snowbat


    You could extract the data of interest and load it into a database, or even load the complete files (with appropriate handling) plus extra fields for the extracted data of interest.

    If you can provide some sample files and specifics of what you want to do, you'll get better suggestions from us and maybe a BASH script to do it.


  • Registered Users Posts: 339 ✭✭duffman85


    Misticles wrote: »
    Ok, I know what I need to do in plain English, but when I try to work out what I need programming-wise I get stuck:

    What I need to do:

    I have a massive amount of files (on DVD)

    --- they need to be searched and a line extracted from each file
    --- from the extracted line, two entries need to be further extracted and put into something that will use them to plot on a map of Ireland.

    Also

    --- need to search the same files for one thing, e.g. search how many were x and how many were y.


    Basically I don't have a programming background and I am a bit conflustered as to where to begin with this. Also I don't know where to store all these files; there's 1000's.

    All help appreciated.
    Not sure what you're looking to do in the bold part above.
    Misticles wrote: »
    I dont know what type of code I need to write in.. I know a little C thats about it :D

    they're ASCII files.

    I was thinking of loading them all into one big giant data base and making two search functions..

    I need the extracted information to go into another database also I think.
    Then the map plotting from the first search-- I presume I need some tool for that.
    I need to be able to feed off where the two lots of info are for statistic use.

    Development Forum is here

    [screenshot: the phpMyAdmin Import tab, circled in red]
    Similarly to Excel, phpMyAdmin can import from a text file - just click on the Import tab I've circled in red above and pick the text file. It will import the data into a table you create to match your data, e.g. the same number of columns etc.

    For this to work, the entries on each line need to be separated by tabs or commas so phpMyAdmin can figure out which column an entry is in.

    If it imports it all into a table for you, write your SQL query to extract the information, though you'd still have to import each file one at a time.

    Excel would be OK for one file, but for 1000s of files a database and some SQL, or using grep on Linux, will work better.

    As Snowbat says a sample text file would help us help you more.
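    Since the import wants tab- or comma-separated input, here's a hedged sketch of converting whitespace-separated columns into a CSV first. The sample numbers and the four-column layout are invented; real files would have more columns.

```shell
# Convert whitespace-separated columns into comma-separated values so a
# tool like phpMyAdmin (or MySQL's LOAD DATA) can import them.
# The data below is made up for the demo.
dir=$(mktemp -d)
printf '1013.2 5.4 53.35 -6.26\n1012.8 5.6 53.36 -6.27\n' > "$dir/raw.txt"

awk '{ print $1 "," $2 "," $3 "," $4 }' "$dir/raw.txt" > "$dir/import.csv"
cat "$dir/import.csv"
# 1013.2,5.4,53.35,-6.26
# 1012.8,5.6,53.36,-6.27
```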


  • Registered Users Posts: 13,746 ✭✭✭✭Misticles


    I'll try to get a sample file up on Monday for you guys. Thanks for the help.
    The files are like 7GB each, so 1000's of them is a hell of a lot! :D


  • Closed Accounts Posts: 1,397 ✭✭✭Herbal Deity


    bandit197 wrote: »
    You could do this in Linux relatively easily using the terminal and grep to find the lines you are looking for. It is then possible to redirect the output of your search to a separate file. Do you have any experience with Linux? I'm not 100% sure how to search a range of files on a DVD though, so there may be an easier way to do this.
    You don't need Linux at all. The "Select-String" Cmdlet in Windows Powershell does the same thing as grep :)


  • Registered Users Posts: 1,109 ✭✭✭Skrynesaver


    You don't need Linux at all. The "Select-String" Cmdlet in Windows Powershell does the same thing as grep :)
    No holy wars, let's imagine all the Linux advice is happening in Cygwin ;)

    To OP: If you can read C and have some familiarity with Linux, I'd suggest Perl as the best solution for your problem set.

    You can use something like the following assuming you have a db built, but a single table with two fields in MySQL shouldn't take too long to set up ;)
    #!/usr/bin/perl
    # CODE IS UNTESTED AND WRITTEN WHILE WATCHING FOOTBALL
    use strict;
    use warnings;
    use DBI;    # DBI loads the MySQL driver (DBD::mysql) on connect
    my $path     = "DIRECTORY_DVD_IS_MOUNTED_ON";
    my $database = "DB_NAME";
    my $host     = "DB_HOST";
    my $port     = 3306;    # The default port for MySQL
    my $user     = "DB_USER";
    my $password = "DB_PASSWORD";
    # List the regular files on the DVD
    opendir(my $dh, $path) or die "Couldn't open $path";
    my @files = grep { -f "$path/$_" } readdir($dh);
    closedir($dh);
    my $dbh = DBI->connect("DBI:mysql:database=$database;host=$host;port=$port",
        $user,
        $password
    );
    for my $file (@files) {
        open(my $fh, "<", "$path/$file") or do { warn "Couldn't open $file"; next };
        while (<$fh>) {
            if (/condition to match for line/) {
                my ($value_1, $value_2) = split(/regex to split on/);
                $dbh->do("INSERT INTO foo VALUES ("
                    . $dbh->quote($value_1)
                    . ","
                    . $dbh->quote($value_2)
                    . ")"
                );
            }
        }
        close($fh);
    }
    After that, I'm unsure if you're doing a Google Maps mashup or adding to an image (Perl has prebuilt modules for the latter; see CPAN for details)

    Hope that helps


  • Registered Users Posts: 7,412 ✭✭✭jmcc


    Putting the data in a database without preprocessing it is a mistake. There are good text-handling tools available in Linux (Perl, grep, fgrep) that would be far more effective at handling this kind of task. It would also be a good idea to copy the data from the DVD to a hard drive, as this will speed up access time. Also look at breaking the files up into subdirectories to improve access.

    Regards...jmcc


  • Registered Users Posts: 2,781 ✭✭✭amen


    If you are going down the database route and the data is already somewhat structured, then have a look at the Microsoft Log Parser.

    It lets you provide a list of files and then write SQL-like queries against the files, so you can search for characters and strings, add WHERE clauses, do grouping etc., and output the results to a file.

    You could automate it using batch files.


  • Registered Users Posts: 3,721 ✭✭✭E39MSport


    I'm with Skrynesaver and a few others here.

    *nix is your best bet by far imo.


  • Registered Users Posts: 13,746 ✭✭✭✭Misticles


    :(
    Ok, I finally uploaded a file.

    I need to extract information from columns 7, 8 and 14, 15.

    14 and 15 are the lat/long, which have to be plotted on a map of Ireland.

    Now I have 3.5 years' worth of these files at 4 per day, so there's a lot.

    What are your thoughts now after seeing the file?

    I also need to search the audit trail file and extract the reason for termination and burst height; that has to go into something like a spreadsheet to get statistics on reasons for termination.

    Thanks


  • Registered Users Posts: 1,228 ✭✭✭carveone


    Skrynesaver's little piece of Perl is a nice starting point. Perl is handy for inserting into databases. But what you are asking isn't particularly hard. Given that it is text and very specifically formatted text at that, there's a myriad of ways to do it.

    Most of us here are Unix inclined, but something like awk*, which is a simple and fast language designed explicitly for text processing, might be a handy way to parse these files. Perl is based on many of the ideas started by awk and other Unix utilities, so you can move to Perl later if you need to. Perl and awk (or the GNU version, gawk) are available for Windows of course.

    I'll do a quick example in a sec..

    *I use Awk (or gawk) for Windows from the UnxUtils package. A handy place to download it is:

    http://unxutils.sourceforge.net/

    You need the UnxUpdates.zip file and just the binary on its own (ie: there's no installation).


  • Registered Users Posts: 1,228 ✭✭✭carveone


    Firstly, with any scripting language, one should remember that Windows has a tendency to be difficult when using command line arguments, especially quoting. So many unix people will type things like:

    cat file.txt | awk '{ print "Field one is: " $1 }'

    which is harder to do under Windows, where the shell doesn't treat single quotes specially. I'd recommend putting the script into a file and then doing:

    gawk -f scriptfile.awk file_to_process.txt

    which is clearer. Using your example, here's a quicky to print out the fields you were interested in:
    scriptfile.awk:
    
    ($1 ~ /^[0-9]/) && (NF > 10)  { print $7, $8, $14, $15 }
    

    Then do: gawk -f scriptfile.awk EDT20100101_00z.txt

    Ok. So what on earth does that mean?

    Well, every line that gawk encounters is split into fields based (by default) on spaces or tabs. Every line is then run against the script which is generally in the form:
    pattern { action }

    The gawk manual is at:
    http://www.gnu.org/manual/gawk/gawk.html

    So:
    Hello there world
    would split into three fields: $1 to $3. The number of fields encountered is stored in NF. The tilde (~) is a regular expression match (google it!) and square brackets mean range.

    Note that your file has leading information which makes things a little tricky... We just want the numbers. There is a string of stars across the file which we could use as a flag to say "start here!", but instead a little pattern match might be just as good.

    Breaking up that script we have:

    The pattern which is:
    ($1 ~ /^[0-9]/) && (NF > 10)

    which means the first field must start (that's the ^ symbol) with a character in the range ([ ]) 0 to 9. And (&&) the number of fields is greater than 10 (a number I pulled out of my ass, but 10 sounds good). As an aside, Perl is littered with these kinds of pattern-matching expressions.

    And the action is: { print $7, $8, $14, $15 }

    which means print fields 7, 8, 14, 15. Easy.
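    To make the pattern { action } idea concrete, here's a tiny self-contained run with an invented header line and one invented 15-field data line:

```shell
# awk runs every input line against "pattern { action }". The header line
# fails the pattern; the data line (first field starts with a digit, more
# than 10 fields) matches, so fields 7, 8, 14 and 15 are printed.
printf 'SOME HEADER TEXT\n10 b c d e f G7 H8 i j k l m N14 O15\n' |
awk '($1 ~ /^[0-9]/) && (NF > 10) { print $7, $8, $14, $15 }'
# prints: G7 H8 N14 O15
```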

    What next?


  • Registered Users Posts: 1,228 ✭✭✭carveone


    Here's the audit trail script:
    /^Reason for termination/ { print $0 }
    /^Radiosonde burst height/ { print $0 }
    

    Of course $0 is the whole line, so that might be a bit naff. Looking up the manual to get some string manipulation routines:
    /^Reason for termination/ {
       str = $0
       sub(/^.*:[\t ]*/, "", str)
       print str
    }
    

    That expression in the sub looks a bit nasty. Essentially I'm looking for the colon and then doing a replace with "" (nothing):

    ^ Beginning of string
    . Any character
    * Any number of previous expressions, including 0 (any number of chars)
    : The colon (I hope I got that right and : isn't something funny)
    [ Open range
    \t Tab
    Space
    ] Close range
    * Any number of previous expressions (ie: 0 or more tabs/spaces)

    Scary huh...
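    Running that sub() on an invented audit line shows the effect:

```shell
# sub() replaces the (greedy) match of "anything up to a colon, plus any
# trailing tabs/spaces" with nothing, leaving just the value after the
# colon. The input line here is made up.
printf 'Reason for termination: balloon burst\n' |
awk '/^Reason for termination/ { str = $0; sub(/^.*:[\t ]*/, "", str); print str }'
# prints: balloon burst
```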


  • Registered Users Posts: 13,746 ✭✭✭✭Misticles


    Oh my god! I'm terrified and regretting I didn't start this earlier :(

    Erm... I only need the final values from columns 7, 8, 14 and 15.

    I am actually at a loss here! :( I could cry; why I end up picking computery ones is beyond me!


  • Registered Users Posts: 1,228 ✭✭✭carveone


    Erm... I only need the final values from columns 7, 8, 14 and 15.

    That's not too bad, just ignore the data you pick up until the last line. The thing is, Unix is so full of utilities to do stuff like this, most of us here can help in some way. I picked Awk because it is the simplest to pick up and run with. One executable and very simple scripts. Most of us would run with Perl but it's a bit more of a deal to install and get working (but not too much).

    Here's a quicky to get the last entry. I'm assuming the file could have more lines after the last entry, otherwise it would be really easy. There might be nicer ways to do this, but:

    scriptfile.awk:
    ($1 ~ /^[0-9]/) && (NF > 14)  {
        field7 = $7
        field8 = $8
        field14 = $14
        field15 = $15
    }
    
    END {
        # this is run after awk runs out of lines to read
    
        print field7, field8, field14, field15
    }
    

    I've a bit of time at the moment so I can help ;)
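    Trying that END-block script on two invented data lines confirms that only the last matching line's fields survive:

```shell
# Each matching line overwrites the saved fields; the END block runs once
# after all input, so only the final entry's values are printed.
# The script and data below mirror the post above; data is invented.
dir=$(mktemp -d)
cat > "$dir/scriptfile.awk" <<'EOF'
($1 ~ /^[0-9]/) && (NF > 14) {
    field7 = $7
    field8 = $8
    field14 = $14
    field15 = $15
}
END {
    print field7, field8, field14, field15
}
EOF
printf '1 a b c d e F7 F8 f g h i j L14 L15 x\n2 a b c d e G7 G8 f g h i j M14 M15 x\n' > "$dir/data.txt"
awk -f "$dir/scriptfile.awk" "$dir/data.txt"
# prints: G7 G8 M14 M15
```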


  • Registered Users Posts: 1,228 ✭✭✭carveone


    By the way, this is all for preprocessing the data before hitting the database like jmcc suggested.

    Oh and thanks "amen", I didn't know about the Log Parser. Looks interesting.


  • Registered Users Posts: 26,579 ✭✭✭✭Creamy Goodness


    What carveone is doing is a great start.

    The most difficult part of this project is getting the data (the correct data, too :)) into the database. Once it's there, plotting the lat/long as a Google Maps marker (etc.) will take you no time. Remember, in programming, if you can do something once, you can do it N times while only having written it once.

    I remember doing awk in college, but only for about a week, and I never have to handle huge amounts of textual data in work, so I'd never use it. That being said, awk is the way to go I think.

    Someone said it would be best for you to copy all the files off the DVD onto a hard drive or similar. That's a great idea; just make sure to make two copies: one fresh copy and one you can screw around with and work on, so in the event you screw things up you won't have to fish out the DVDs to get all the files again.

    I haven't tried carveone's bit of awk up there, but if it works (not doubting it btw ;)) you can run that script as a command in a Perl file and get your Perl file to insert those values into your database.


    Once you get this far (getting all your data into your database) you pretty much have an open playing field as to which languages you can use to extract the data from the database and display it in a web page. I'd suggest PHP, mainly because it's freely available, there are 1000's of tutorials online, it's easy to pick up, and plus I've been using it now for about 3 years (but that's me :pac:).


  • Registered Users Posts: 13,746 ✭✭✭✭Misticles


    If I wanted to pay someone to do the computer programming side of it, into the database, how much would it be, given my time frame and the fact I ain't that good at C, never mind these fancy-pants languages!

    I can't copy all of the files onto a hard drive cos all I have is my laptop, no external ones or anything.


  • Registered Users Posts: 1,228 ✭✭✭carveone


    you can run that script as a command in a perl file and get your perl file to insert those values into your database.

    Well, I wouldn't go that far. Perl is awk+sed+sh+drugs so anything awk can do Perl can do too.
    Misticles wrote: »
    If I wanted to pay someone to do the computer programming side of it, into the database- how much would it be-- given my time frame

    Misticles throws in the towel ;)
    What's the time frame? If it's by tomorrow you may have a problem! Given that I personally would take what Skrynesaver did and the ideas that I presented and then club some bits onto it, I can't imagine it would be much more than a few hours. Mostly trying to figure out exactly what you wanted.

    Extracting bits from text and then barfing out comma-separated values (that Excel will read no problem) is 20 mins' work.
    I can't copy all of the files onto a hard drive cos all I have is my laptop no external ones or anything

    From what you said, I wouldn't do that. You were saying that you have thousands of big files consisting of GBs of info on DVDs. And that you just want a single line from each file. Preprocessing them to fetch just that one line is what you want in order to give you 1000s of lines, which is nothing to store. Even Excel could probably do the whole lot in one go if you play nicely.

    Once you've got manageable data, then you can play with it, reduce it, plot it, store it, analyse it. You simply cannot possibly deal with 100s of GB without some sort of intermediate process (unless you are Google).
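    A hedged sketch of that intermediate step: loop over every file, reduce each to the fields of its last matching line, and collect the results in one small CSV. The file names, field positions and match pattern below are all placeholders for the demo.

```shell
# Reduce many big files to one small CSV: keep only the last data line's
# fields from each file. File names, fields and the pattern are made up;
# on the real data the loop would run over the DVD's files instead.
dir=$(mktemp -d)
printf '1 a b c d e A7 A8 f g h i j A14 A15 x\n2 a b c d e B7 B8 f g h i j B14 B15 x\n' > "$dir/EDT1.txt"
printf '3 a b c d e C7 C8 f g h i j C14 C15 x\n' > "$dir/EDT2.txt"

for f in "$dir"/EDT*.txt; do
    awk '($1 ~ /^[0-9]/) && (NF > 14) { last = $7 "," $8 "," $14 "," $15 }
         END { print last }' "$f"
done > "$dir/summary.csv"

cat "$dir/summary.csv"
# B7,B8,B14,B15
# C7,C8,C14,C15
```

    The resulting summary.csv is small enough to open in Excel or import into a database in one go.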


  • Registered Users Posts: 13,746 ✭✭✭✭Misticles


    Misticles most definitely throws in the towel. I've a lot to do as well as this bit, so grrrr.
    There's 1000's of files!!
    3 years' worth, 4 per day!!

