
Is this possible (Word to xml)

  • 20-01-2007 9:31pm
    #1
    Registered Users Posts: 4,276 ✭✭✭


    Greetings,

    A few of us are trying to make something that will let us share our various notes and hopefully search them for relevant material.

    Anyhow, as our notes are in Word, I suggested we try to convert them to XML. In any case, is it possible?

    I'm looking into it at the moment. Could I get C# to open the document, or somehow parse the document while it's in .doc format, and put it into XML?

    I figure it's somehow possible, as there is the lovely Word schema and so on. Am I just wrong?

    Any pointers or links would be great :D


Comments

  • Moderators, Politics Moderators Posts: 39,821 Mod ✭✭✭✭Seth Brundle


    It sure is (on my Word 2003).
    Go to File > Save As; XML is the 2nd option in the 'Save as type' drop-down!
    Much easier than looking for some C# alternative!


  • Registered Users Posts: 4,276 ✭✭✭damnyanks


    Yup, that's also possible, but we want to scan the network as well, since all our lecturers put their notes there as opposed to something like Blackboard. We also want to compile our stuff together.


  • Moderators, Politics Moderators Posts: 39,821 Mod ✭✭✭✭Seth Brundle


    That's different (& not quite what you initially asked!).


  • Closed Accounts Posts: 362 ✭✭information


    Converting a document to XML is very difficult.
    Even more so if you have 100s of documents written by different people.

    Each person has their own style of note taking.
    How are you going to decide on the XML schema?

    Each Word document could contain lots of information in no particular format.
    You would need to parse the Word doc and try to break it up into sections that would fit into your XML schema.

    Before you even start worrying about how to code it, how are you going to decide what information to put into the XML file and what information is useless?

    If you find keyword X, put that paragraph in a section called <x></x>, etc.?
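
The keyword idea above can be sketched roughly as follows. This is only an illustration in Python (not the C# the thread asks about), and the function name, the "para" and "unsorted" tags, and the "notes" root element are all made up for the example. It assumes the paragraphs have already been pulled out of the Word file as plain strings and that each keyword is also a valid XML element name.

```python
import xml.etree.ElementTree as ET


def paragraphs_to_xml(paragraphs, keywords, root_tag="notes"):
    """Group paragraphs under an element named after the first keyword they mention.

    Hypothetical helper: paragraphs that match no keyword go under <unsorted>.
    """
    root = ET.Element(root_tag)
    sections = {}  # tag name -> section element, created on first use
    for text in paragraphs:
        lowered = text.lower()
        # First keyword found in the paragraph wins (case-insensitive match).
        tag = next((k for k in keywords if k.lower() in lowered), "unsorted")
        if tag not in sections:
            sections[tag] = ET.SubElement(root, tag)
        ET.SubElement(sections[tag], "para").text = text
    return root


if __name__ == "__main__":
    notes = ["Recursion is defined in terms of itself.", "Bring a pen."]
    print(ET.tostring(paragraphs_to_xml(notes, ["recursion"]), encoding="unicode"))
```

Even this toy version shows the problem raised above: anything that does not match a keyword ends up in a catch-all section, so the schema and keyword list have to be agreed on before the coding starts.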


  • Registered Users Posts: 4,276 ✭✭✭damnyanks


    Yeah, I've found out how to get hold of the information through the Office interop library. It's a case of converting the info into something useful now. Am considering just opening each document and converting it into the Office XML automatically.


  • Registered Users Posts: 2,931 ✭✭✭Ginger


    Here is a VB.NET example of what you want to do

    http://www.devx.com/dotnet/Article/17358?trk=DXRSS_XML

    Then after that it's a case of looping through a folder collection and files collection, checking whether each file is a .doc file, and processing it

    Converting to C# should be no probs
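
The folder-walking step described above might look something like this. It is a minimal sketch in Python rather than VB.NET or C#, the function name is hypothetical, and it only finds the files; the actual conversion of each document is left out.

```python
from pathlib import Path


def find_word_documents(root_folder):
    """Return every .doc file under root_folder, including subfolders, sorted by path."""
    root = Path(root_folder)
    return sorted(
        p for p in root.rglob("*")
        # Compare the extension case-insensitively so NOTES.DOC is found too.
        if p.is_file() and p.suffix.lower() == ".doc"
    )


if __name__ == "__main__":
    for doc in find_word_documents("."):
        print(doc)  # each path would be handed to the conversion step here
```

The extension check is deliberately strict, so .docx or .txt files are skipped; widening it is a one-line change if other formats need processing.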


  • Closed Accounts Posts: 18,056 ✭✭✭✭BostonB


    Word to custom XML is simple using VBA. AFAIK you've been able to do that since Word 97 (VBA) and the MSXML Object Library.

    From experience, unless you have a very tightly controlled format to begin with for the documents (all the content enclosed in tables, for example), it's a nightmare trying to code for all the types of formatting and structures that are possible within Word, no matter how you try to lock it down.

    If you keep the document structure as simple and robust as possible, it's doable. But why use Word at all? My thinking is that what you want to achieve is formatting separate from content, and everything in a standard structure that can be searched. Then use a database with a web front end with limited WYSIWYG functionality. It's a lot simpler and more robust.

