
Can xml pull do this?

Options
  • 29-11-2004 8:48pm
    #1
    Registered Users Posts: 261 ✭✭


    I have been doing some research for a project I’m doing. I have to create a dynamic pipeline over HTTP.

    So I came across XML pull, and it seems like it should fit the job nicely:
    represent each process as an element containing a URL and any additional info.

    So basically, what I see happening is that to change the pipeline you add/edit/remove elements in the document representing the pipeline.

    But what if the file is being saved at the same moment that you go to look for the next process?
    Or can you even open the file if XML pull has it open?
    I haven’t come up against this kind of file handling before, so any help would be cool.


Comments

  • Closed Accounts Posts: 92 ✭✭tempest


    XML pull is a strategy rather than anything else.
    It _is_ an appropriate strategy for what you are trying to achieve.

    What I don't understand is your focus on files. In an infrastructure like this (and I am in part referring to your original post some weeks ago, where there is a more in-depth discussion of the problem), there is no need for files.

    It would be more appropriate to focus on the data being passed around. It sounds like you are passing file handles and/or file locations around where this does not make sense. It's far more appropriate to pass the data from one process to the next in memory or as a string. If you do need to store it somewhere, then you could use URIs to indicate the location, but the input and output URIs should be different. The input should not be modified; potentially archived, but not modified.


  • Registered Users Posts: 261 ✭✭HaVoC


    Sorry, my description is a bit vague, so here is a better one.

    The system is a distributed system.
    It has to conform to REST, so I'm using HTTP.
    Each process in the pipeline is represented by a URL.
    So an HTTP server is constantly processing requests through the pipeline:
    the pipeline server receives the results from the first process, then parses the file to find the address of the second process.
    Changing the pipeline in memory would involve a lot of synchronisation and has potential for lots of errors, so that's why I'm thinking of using an XML file to represent the pipeline structure.

    So it's the structure of the pipeline I want to represent, not the data.
    Thanks for replying, tempest.

    It's basically the complexities of opening and editing the file while the server is live and using the same file that gets me :confused: .
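    As a rough sketch of the kind of pipeline file being described here, it might look something like this (the element and attribute names are invented for illustration, not taken from the project):

```xml
<pipeline>
  <!-- Each process is an element carrying its URL and any extra info -->
  <process url="http://host-a/validate" timeout="30"/>
  <process url="http://host-b/transform"/>
  <process url="http://host-c/store"/>
</pipeline>
```

    Changing the pipeline then means adding, editing, or removing `process` elements in this one document.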


  • Closed Accounts Posts: 92 ✭✭tempest


    Sorry about that. I should have identified what you were saying from your first post, but I misread it.

    OK, so your question is: how do I (the process) parse an XML file if someone else (an administrator) is updating that file at the same time? There are numerous ways to approach this, and each of them has its own complexities.
    Ultimately what you require is a transaction on the file. I have to say that in this situation my first thought is to store the data in a relational database. The database can guarantee that you get your required data back without worrying about whether someone else is editing it at the same time.

    Accessing a file on a filesystem each time means you will run the risk of getting a corrupt data stream. You could introduce a retry mechanism which tries, say, three times to do the parse, and if it fails three times returns an error as a result.
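    A minimal sketch of that retry mechanism might look like this. The `parsePipeline` helper is hypothetical; here it just reads the file's contents, where a real version would run the XML pull parser over the stream:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RetryingReader {

    // Hypothetical parse step: just read the file here; a real version
    // would run the pull parser and throw on a corrupt stream.
    static String parsePipeline(Path file) throws IOException {
        return Files.readString(file);
    }

    // Try the parse up to maxAttempts times before giving up, as
    // suggested above for the case where the file is mid-write.
    static String parseWithRetry(Path file, int maxAttempts) throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return parsePipeline(file);
            } catch (IOException e) {
                last = e; // file unreadable this time round; retry
            }
        }
        throw last; // all attempts failed: surface the error as a result
    }
}
```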


  • Registered Users Posts: 261 ✭✭HaVoC


    Thanks.
    I partly guessed using XML pull would be too easy a solution :D
    I'll probably stay away from a database, as I don't have one in the project at the moment. It would be overkill to use one just for the pipeline.


  • Closed Accounts Posts: 92 ✭✭tempest


    Unfortunately I don't think that it's as simple as not using XML pull...

    You have to store your pipeline definition somewhere. If that happens to be in a file on a filesystem, then you run the risk of somebody else modifying the data at the same time as you and invalidating an open stream.

    Unless, of course, you decide that the system has to be taken down for the pipeline definition to change, and state that the definition file should not be modified while the server is up.

    This gets around the problem, but in most industries it would be unacceptable: the downtime and risks would be too much.

    But then again, it's a college project, and stating the limitations of the system in order to reduce the scope to an achievable level, or to focus the project on more pertinent issues, is usually acceptable. In fact it is probably a good thing, as it shows you have at least identified and acknowledged the limitations (as long as it's documented as such).


  • Registered Users Posts: 27,163 ✭✭✭✭GreeBo


    Read from the file once and store the DOM.
    Then I'd have some logic around the accessor to the object to decide if it just returns the DOM or rereads it from disk.
    You could say that the cache goes stale after an hour, or you could just compare timestamps and only read from the file when it has been updated.
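    A rough sketch of that timestamp-checked cache, using the standard `javax.xml.parsers` DOM API (class and field names are made up for illustration):

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class PipelineCache {
    private final File file;
    private Document dom;          // cached parse of the pipeline file
    private long lastLoaded = -1;  // file.lastModified() when we last parsed

    public PipelineCache(File file) {
        this.file = file;
    }

    // Return the cached DOM, rereading from disk only when the file's
    // timestamp shows it has changed since the last parse.
    public synchronized Document get() throws Exception {
        long modified = file.lastModified();
        if (dom == null || modified != lastLoaded) {
            dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(file);
            lastLoaded = modified;
        }
        return dom;
    }
}
```

    Note this narrows the race but doesn't eliminate it: a reread can still land mid-save, so it combines naturally with the retry idea above.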


  • Registered Users Posts: 261 ✭✭HaVoC


    I think I’ll just end up doing a data structure with the `synchronized` keyword,
    and store it on an RMI server.
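    The in-memory structure could be as small as this: an ordered list of stage URLs with `synchronized` accessors, so the server can look up the next stage while an administrator edits the list. (The class and method names are a sketch, not from the project; the RMI wrapping is omitted.)

```java
import java.util.ArrayList;
import java.util.List;

// In-memory pipeline definition guarded by synchronization.
public class Pipeline {
    private final List<String> stages = new ArrayList<>(); // stage URLs, in order

    public synchronized void add(int index, String url) {
        stages.add(index, url);
    }

    public synchronized void remove(String url) {
        stages.remove(url);
    }

    // Address of the stage that follows the given one, or null at the end.
    public synchronized String next(String currentUrl) {
        int i = stages.indexOf(currentUrl);
        return (i >= 0 && i + 1 < stages.size()) ? stages.get(i + 1) : null;
    }

    // Defensive copy, so callers never iterate the live list unsynchronized.
    public synchronized List<String> snapshot() {
        return new ArrayList<>(stages);
    }
}
```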

