Advertisement
If you have a new account but are having problems posting or verifying your account, please email us on hello@boards.ie for help. Thanks :)
Hello all! Please ensure that you are posting a new thread or question in the appropriate forum. The Feedback forum is overwhelmed with questions that are having to be moved elsewhere. If you need help to verify your account contact hello@boards.ie

How to split a large data file to use as an csv file?

Options
  • 12-03-2013 10:57pm
    #1
    Registered Users Posts: 263 ✭✭


    The file is made up of longitude and latitude coordinates of each county of Ireland and the source file does not exist any more. I need to be able to split them up so i can put them in my database. eg-6.788764,52.564522 -6.801274,52.56532 .
    These are the coordinates for the bases of a game i am doing for a project.

    Can they be split up with java script?

    I can do it manually by insert them into excel and importing it to a csv file and then import them into my database but would take a very long time


    Help needed.


Comments

  • Closed Accounts Posts: 22,479 ✭✭✭✭philologos


    degzs wrote: »
    The file is made up of longitude and latitude coordinates of each county of Ireland and the source file does not exist any more. I need to be able to split them up so i can put them in my database. eg-6.788764,52.564522 -6.801274,52.56532 .
    These are the coordinates for the bases of a game i am doing for a project.

    Can they be split up with java script?

    I can do it manually by insert them into excel and importing it to a csv file and then import them into my database but would take a very long time


    Help needed.

    Can you give me a sample row of the data?

    Are you talking about parsing CSV to insert into a database? Also how are you importing to the database (what language, or is it all in Javascript?).

    What type of database also?


  • Registered Users Posts: 263 ✭✭degzs


    This is a sample row of the data that i am using -6.788764,52.564522 -6.801274,52.56532 -6.806076,52.557723 -6.806557,52.548049. This represents 4 points ona Google map of Ireland which gives me an outline of county Carlow on the map.The minus number are the longitude coordinates and after the comma are the the latitude coordinates.

    I am using phpMyAdmin as my database.

    I need some way to split my data after the comma for each longitude and latitude coordinate and input it into excel. i will have a cell for longitude and a cell for latitude. once its in excel i can import into my database as an csv file.

    Thanks


  • Registered Users Posts: 2,426 ✭✭✭ressem


    You're not quite clear.

    Should the output look like
    -6.788764,52.564522
    -6.801274,52.56532
    -6.806076,52.557723
    -6.806557,52.548049


    Using javascript would be a bit over the top for a huge file.

    How about Microsoft Powershell, built into win 7 and 8.
    Search for and run powershell console.

    Replacing the two filenames, type/copy in
    (Get-Content ("C:\temp\inputfile.csv")) -split ' ' | Set-Content ("C:\temp\outputfile.csv")
    
    that is a space between the '' after -split. This is the delimiter at which the command is splitting the line.


  • Registered Users Posts: 263 ✭✭degzs


    Thanks, this how i want all my data
    -6.788764,52.564522
    -6.801274,52.56532
    -6.806076,52.557723
    -6.806557,52.548049

    I do not know Microsoft Powershell and what way i should go about using it?

    I will wait for the morning and go at it with a clear head.


  • Closed Accounts Posts: 350 ✭✭mickgotsick


    degzs wrote: »
    Thanks, this how i want all my data
    -6.788764,52.564522
    -6.801274,52.56532
    -6.806076,52.557723
    -6.806557,52.548049

    I do not know Microsoft Powershell and what way i should go about using it?

    I will wait for the morning and go at it with a clear head.

    What operating system are you using? Windows? You should be able to do the above with a decent text editor e.g. http://www.ultraedit.com/downloads.html

    It'll have a Find All/Replace All command, so you would find all spaces (" ") and replace them with a new line ("\n").


  • Advertisement
  • Registered Users Posts: 9,605 ✭✭✭gctest50


    degzs wrote: »
    The file is made up of longitude and latitude coordinates of each county of Ireland and the source file does not exist any more.

    Make a copy of the file and save it somewhere safe first
    degzs wrote: »
    I need to be able to split them up so i can put them in my database. eg-6.788764,52.564522 -6.801274,52.56532 .
    These are the coordinates for the bases of a game i am doing for a project.

    If you look at the numbers for a minute - you'll see they are already split up

    1. By commas -6.788764,52.564522

    2, By spaces -6.788764,52.564522 :) -6.801274,52.56532

    You just need to tell it to go to a new line when it sees a space

    Bring your data in :

    http://www.fatcow.com/knowledgebase/read_article.bml?kbid=2591


  • Registered Users Posts: 78,371 ✭✭✭✭Victor


    Open the database in Notepad and do a search and replace on the comma and change it to a tab character. You may need to remove spurious space characters.

    What is the current file format?

    Are we only talking about 32 rows (32 counties) of data? If there is genuinely a cast amount of data (thousands of lines) using MS Word will do it quicker than Notepad.

    No need to make a simple problem difficult.


  • Registered Users Posts: 2,021 ✭✭✭ChRoMe


    Victor wrote: »
    using MS Word

    For the love of god, don't use MS Word as a text editor for files that are going to be run through a parser... use a decent text editor like sublime or notepad++

    Word is just wrong on multiple levels.


  • Moderators, Society & Culture Moderators Posts: 9,689 Mod ✭✭✭✭stevenmu


    How large is large? Are we talking kilobytes, megabytes, gigabytes, terabytes etc?

    If you have excel you can start a new worksheet, then import your file from the Data tab on the ribbon with the From Text button. This will give you a wizard that will let you select format of the file, specify what characters are used as a delimiter etc. That should parse it for you and you can then just save it as a CSV.

    If you have SQL Server (I think the free Express versions allow this too), you can import to your database from the management tools. Again they'll give you a wizard to specify the format of the file, what the delimiters are and so on, and what fields in your database table to map the columns in the file to.

    Edit: Oh and if it's only thousands or tens of thousands of rows you can just paste it all straight into Excel, select it and then use the Text to Columns function on the Data tab to parse it into the right format. That's a really powerful tool for manipulating data, but can get a little unwieldy when you go beyond 10s of thousands of rows of data.


  • Registered Users Posts: 9,605 ✭✭✭gctest50




  • Advertisement
  • Registered Users Posts: 2,426 ✭✭✭ressem


    degzs wrote: »
    Thanks, this how i want all my data
    -6.788764,52.564522
    -6.801274,52.56532
    -6.806076,52.557723
    -6.806557,52.548049

    I do not know Microsoft Powershell and what way i should go about using it?

    I will wait for the morning and go at it with a clear head.

    For powershell, you need to start the program "powershell.exe".
    Then you can just type in commands into the powershell console and hit enter to run them.

    In windows 7 or 8 hold 'Windows Key' + R to bring up the run dialog.
    Type in powershell and press enter. Then type in the command listed above.

    On XP or Vista you need to download and install powershell first.
    XP version at http://www.microsoft.com/en-us/download/details.aspx?id=16818

    Vista 32bit version at http://www.microsoft.com/en-us/download/details.aspx?id=9864


  • Registered Users Posts: 263 ✭✭degzs


    I have tried using ultra edit and i cannot seem to get it to do what i want.

    I have county Carlow manually done so this can give a good example of what i want
    Carlow.xlsx
    , I know at every space i want a new line but seem to be finding it difficult to master.

    My data is in word doc


  • Closed Accounts Posts: 350 ✭✭mickgotsick


    You're first step is to convert the word doc into plain text, so load the file into word and save as a text file.

    I don't have windows but I downloaded UltraEdit for the Mac, so I will give instructions based on that and hopefully you can find a similar method in the windows version.

    Load in the converted text file. Go to the search menu item and then replace. In the "Find what:" box put in a space and in the "Replace with:" box put a \n

    There's some extra options, you might need to click on something so they appear, which I have to do but then I get a check box with "Use regular expressions". Check that box and beside that there'll be a drop down menu and select "Perl".

    Then hit "Replace All" and that should be it.

    EDIT: I had a look at your Excel file and there's an extra column with a 3 in it, how does that fit into the scheme of things? I presume that means Carlow?


  • Registered Users Posts: 263 ✭✭degzs


    Thanks for all the help. got my little problem fixed.


Advertisement