Advertisement
If you have a new account but are having problems posting or verifying your account, please email us on hello@boards.ie for help. Thanks :)
Hello all! Please ensure that you are posting a new thread or question in the appropriate forum. The Feedback forum is overwhelmed with questions that are having to be moved elsewhere. If you need help to verify your account contact hello@boards.ie

Converting XML to Excel or vice versa

Options
  • 28-09-2018 4:54am
    #1
    Registered Users Posts: 78,369 ✭✭✭✭


    If I have XML-style data or Excel (or any other table) data, is there an easy / automated way to convert between the two?

    The particular data may not be wholly consistent, e.g. the XML-style data might be missing some fields. In the attached, bus stops in the Dublin 6 area are detailed. Some stops might be missing, for example, the <Data name="route_ref"> data, as this has not been surveyed / recorded. Meanwhile, another stop might be missing <Data name="tactile_paving">. Simply copying and pasting into a table might result in data ending up in the wrong columns / rows and might need substantial editing.


Comments

  • Registered Users Posts: 1,741 ✭✭✭ShatterProof


    As in the existing xml import and export functions in excel?


  • Registered Users Posts: 36,167 ✭✭✭✭ED E


    Would CSV do? Excel is MS proprietary so less libraries will support it than CSV/other.

    NB: Excel can open/import CSVs no problem.


  • Registered Users Posts: 78,369 ✭✭✭✭Victor


    Please appreciate the [n00b] tab on this thread. :) I'm quite comfortable with Excel, XML is somewhat alien.
    As in the existing xml import and export functions in excel?
    After a bit of poking, I found the Import function under the (hidden) Developer tab. It suggests I need a data map, which I don't have. My attempt resulted in garbage.
    ED E wrote: »
    Would CSV do?
    Well the original data is XML. How do I get it from XML to CSV?
    Excel is MS proprietary so less libraries will support it than CSV/other.
    Fair enough. Whether it is CSV, Excel or some other table is somewhat academic.
    NB: Excel can open/import CSVs no problem.
    I understand this. Although sometimes I find either the data or Excel can be finicky.


  • Closed Accounts Posts: 22,649 ✭✭✭✭beauf


    I've never used the XML style sheet route but that would seem to be the way to do it.

    Alternatively write some VBA.

    Or convert to text and replace the tags with , then , , to , then to csv and fix it up. Be long winded though.


  • Registered Users Posts: 78,369 ✭✭✭✭Victor


    beauf wrote: »
    Alternatively write some VBA.
    I r teh n00b.
    Or convert to text and replace the tags with , then , , to , then to csv and fix it up. Be long winded though.
    You then end up with data in the wrong columns / rows and might need substantial editing. Data is thousands of lines long.


  • Advertisement
  • Registered Users Posts: 768 ✭✭✭14ned


    Victor wrote: »
    If I have XML-style data or Excel (or any other table) data, is there an easy / automated way to convert between the two?

    Me personally: I'd write a bit of Python script which uses lxml to convert the XML into well structured CSV. I'd then import the CSV into Excel.

    Niall


  • Closed Accounts Posts: 22,649 ✭✭✭✭beauf


    Victor wrote: »
    I r teh n00b.

    You then end up with data in the wrong columns / rows and might need substantial editing. Data is thousands of lines long.

    I thought this...
    Victor wrote: »
    ... I'm quite comfortable with Excel....

    ...meant you knew excel VBA. I though it was an odd question.

    CSV is just comma seperated values. Can be easier to trouble shoot and fix up than other formats. At least starting out.


  • Closed Accounts Posts: 22,649 ✭✭✭✭beauf


    You can use this aswell. http://www.convertcsv.com/xml-to-csv.htm

    Its only about 112 rows of data. So not much to fix up.

    The main issue is they didn't pad the empty values so any import won't align 100% without some fixing up.


  • Registered Users Posts: 78,369 ✭✭✭✭Victor


    beauf wrote: »
    Its only about 112 rows of data. So not much to fix up.
    I've another 12,000 rows of data to go with that. :D


  • Closed Accounts Posts: 22,649 ✭✭✭✭beauf


    A) I think you need to ask the source can they give it in another format (like Excel)
    B) Find someone who can convert it for you. Because there's a bit of work in it. Either in VBA or Python etc.


  • Advertisement
  • Registered Users Posts: 9,605 ✭✭✭gctest50


    Use power shell


    Get-Content dublin6busstops.xml


    and keep playing with it till you get what you need


  • Registered Users Posts: 78,369 ✭✭✭✭Victor


    beauf wrote: »
    A) I think you need to ask the source can they give it in another format (like Excel)
    In a way, I am the source for 50-75% of the data.

    Me -> OpenStreetMap.org -> overpass-turbo.eu -> XML

    The issue for me is that the website stores the data is in XML (or something very similar), whereas I am much more comfortable dealing with tables.


  • Registered Users Posts: 1,275 ✭✭✭bpmurray


    I loaded that file into Excel with no problems using the standard open dialog and selecting XML as the format. Of course, since Excel is rather brain-dead in this area, you have to rename it with an extension of "xml" or it tries to import it as text. So your solution is a two-step process:
    • ren "dublin 6 bus stops.txt" "dublin 6 bus stops.xml"
      
    • Load Excel and open the file as xml - you can then import is as an XML table or as a new workbook.

    The result is a bit ugly with the first two columns talking about OSM and overpass-turbo, but otherwise it looks fine. The column titles are also a bit ugly too, but that's purely cosmetic.


  • Closed Accounts Posts: 22,649 ✭✭✭✭beauf


    The problem is it doesn't align the columns as each stop has a different number of properties and the excel import is too simplistic to pad them out.

    Also the data needs to be transposed so the data name is the column name and not repeated for every stop as data.


  • Registered Users Posts: 6,501 ✭✭✭daymobrew


    I think that you need to read all records and note the attributes (bench, name, network, operator etc). This will get all possible attributes.
    Then go back through each record and print out the data, with blanks for columns when that record doesn't have a specific attribute.
    Make sense?

    For me I'd use php SimpleXMLElement() (note example #5 Using attributes) or perl, but the language of choice is irrelevant.


  • Closed Accounts Posts: 22,649 ✭✭✭✭beauf


    Loop though all the elements, create them as columns (or similar) then loop through a second time and populate with data.

    When converting between different formats of data you need the number of columns to be the same each time.

    Its likely there are columns not in the sample that are in the full data set. There may be something new in the very last line of the dataset.


  • Registered Users Posts: 6,501 ✭✭✭daymobrew


    beauf wrote: »
    Its likely there are columns not in the sample that are in the full data set. There may be something new in the very last line of the dataset.
    This is my concern too. If you get the foundation code right then it'll work correctly for the full 12k data set.


  • Registered Users Posts: 6,150 ✭✭✭Talisman


    ED E wrote: »
    Would CSV do? Excel is MS proprietary so less libraries will support it than CSV/other.
    The .XLSX file format is part of the Office Open XML standard - it has been an open standard for over 10 years.

    The .XLS file format is proprietary but Microsoft has been using .XLSX format since Office 2007.


  • Closed Accounts Posts: 22,649 ✭✭✭✭beauf


    I think it's worth mentioning that the problem here isn't the data format, file types.

    But rather the information/data is not organised in structural that is useful for the OP.


  • Registered Users Posts: 6,501 ✭✭✭daymobrew


    This little bit of code displays the data names.
    I've never used SimpleXMLElement and could not figure out how to parse a file with multiple XML elements.
    [PHP]<!DOCTYPE html>
    <html>
    <body>
    <pre>
    <?php
    $xml=<<<XML
    <Placemark><name>Bushy Park Road</name><ExtendedData>
    <Data name="@id&quot;><value>node/309051599</value></Data>
    <Data name="bench"><value>no</value></Data>
    <Data name="bus"><value>yes</value></Data>
    <Data name="highway"><value>bus_stop</value></Data>
    <Data name="lit"><value>yes</value></Data>
    <Data name="name"><value>Bushy Park Road</value></Data>
    <Data name="network"><value>Dublin Bus</value></Data>
    <Data name="operator"><value>Dublin Bus</value></Data>
    <Data name="passenger_information_display"><value>no</value></Data>
    <Data name="public_transport"><value>platform</value></Data>
    <Data name="ref"><value>2916</value></Data>
    <Data name="route_ref"><value>15b;15d</value></Data>
    <Data name="shelter"><value>no</value></Data>
    <Data name="tactile_paving"><value>no</value></Data>
    <Data name="website"><value>http://www.dublinbus.ie/en/RTPI/Sources-of-Real-Time-Information/?searchtype=view&searchquery=2916</value></Data&gt;
    <Data name="wheelchair"><value>limited</value></Data></ExtendedData><Point><coordinates>-6.2749285,53.3076343</coordinates></Point></Placemark>
    XML;

    $sxe=new SimpleXMLElement($xml);
    $extended_data_keys = array();
    foreach ($sxe->ExtendedData->Data as $data) {
    //echo $data, ' - ', $data->value, PHP_EOL;
    $extended_data_keys[] = strval( $data[ 'name' ] );
    }

    $extended_data_keys = array_unique( $extended_data_keys );
    print_r( $extended_data_keys );
    ?>
    </pre>
    </body>
    </html>[/PHP]


  • Advertisement
  • Registered Users Posts: 6,150 ✭✭✭Talisman


    Here's the list of field names that appear in the file, sorted alphabetically.

    '@id', 'bench', 'bus', 'covered', 'disused', 'fixme', 'highway', 'lit', 'name', 'name:en', 'name:ga', 'network', 'note', 'operator', 'passenger_information_display', 'public_transport', 'ref', 'route_ref', 'shelter', 'shelter_ref', 'source:ref', 'tactile_paving', 'tactile_writing:braille:en', 'website', 'wheelchair', 'wheelchair:description'

    This is the Python script that generated it:
    """
        Displays list of field names used in XML file generated from OSM data.
    """
    import xml.sax
    
    data_names = []
    
    
    class OSMContentHandler(xml.sax.ContentHandler):
        def __init__(self):
            xml.sax.ContentHandler.__init__(self)
    
        def startElement(self, name, attrs):
            if name == 'Data':
                attr_name = attrs.getValue('name')
                if attr_name not in data_names:
                    data_names.append(attr_name)
    
        def endElement(self, name):
            pass
    
        def characters(self, content):
            pass
    
    
    def main(sourceFileName):
        source = open(sourceFileName)
        xml.sax.parse(source, OSMContentHandler())
        data_names.sort()
        print(data_names)
    
    
    if __name__ == '__main__':
        main('dublin-6-bus-stops.txt')
    

    Given the list of fields you will want a script to iterate through your XML file and insert empty values for any records that are missing fields. This will give you content that you can safely import to Excel.


  • Registered Users Posts: 78,369 ✭✭✭✭Victor


    Talisman wrote: »
    This is the Python script that generated it:
    Thank you very much. What would I run this on?

    An aside: for full data set (6,664 bus stops currently of about 12,500 known, 4579 last edited by me), there are 74 field names. There are websites that does a similar listing / counting tasks

    http://taginfo.openstreetmap.ie/tags/highway=bus_stop#combinations
    http://stat.latlon.org/ie/latest/tags-h.html


  • Registered Users Posts: 6,501 ✭✭✭daymobrew


    Victor wrote: »
    Thank you very much. What would I run this on?

    An aside: for full data set (6,664 bus stops currently of about 12,500 known, 4579 last edited by me), there are 74 field names.
    Could you export the full data set into a file and run the script against it?


  • Registered Users Posts: 6,150 ✭✭✭Talisman


    The script was written in Python 3 so you will need to install Python or Anaconda on your computer in order to run it. I have since created two more scripts which perform the transforms you originally asked about and also include the Point coordinates - Gist.

    I've commented almost every line of the code so you should have no problem following how the scripts work - all you need to do is change file name strings passed in the main() function calls before executing the script.

    Console command to transform XML to CSV:
    python osm_xml_to_csv.py
    

    Console command to transform CSV to XML:
    python csv_to_osm_xml.py
    

    I've also attached the CSV file generated from the data you provided, I had to append the '.txt' extension in order to be able to upload it.

    NOTE: Don't open the generated CSV files directly in Microsoft Office. The reason being that it will corrupt the Irish language text.

    Instead create a new worksheet and import the file as UTF-8:
    Data > Get External Data... > Import Text File > Set "File origin" as UTF-8

    Google Sheets doesn't have any such issues with the generated file.


Advertisement