
Can xml pull do this?

Options
  • 29-11-2004 8:48pm
    #1
    Registered Users Posts: 261 ✭✭


    I have been doing some research for a project I’m doing. I have to create a dynamic pipeline over HTTP.

    So I came across XML pull, and it seems like it should fit the job nicely:
    represent each process as an element containing a URL and any additional info.

    So basically, what I see happening is that to change the pipeline you add/edit/remove elements in the document representing the pipeline.

    But what if the file is being saved at the same moment that you go to look for the next process?
    Or can you even open the file if XML pull has it open?
    I haven’t come up against this kind of file handling before, so any help would be cool.


Comments

  • Closed Accounts Posts: 92 ✭✭tempest


    XML pull is a strategy rather than anything else.
    It _is_ an appropriate strategy for what you are trying to achieve.

    What I don't understand is your focus on files. In an infrastructure like this (and I am in part referring to your original post some weeks ago, where there is a more in-depth discussion of the problem), there is no need for files.

    It would be more appropriate to focus on the data being passed around. It sounds like you are passing file handles and/or file locations around where this does not make sense. It's far more appropriate to pass the data from one process to the next in memory or as a string. If you do need to store it somewhere, then you could use URIs to indicate the location, but the input and output URIs should be different. The input should not be modified; potentially archived, but not modified.


  • Registered Users Posts: 261 ✭✭HaVoC


    Sorry, my description is a bit vague, so here is a better one.

    The system is a distributed system.
    It has to conform to REST, so I'm using HTTP.
    Each process in the pipeline is represented by a URL.
    So an HTTP server is constantly processing requests through the pipeline:
    the pipeline server receives the results from the first process, then parses the file to find the address of the second process.
    Changing the pipeline in memory would involve a lot of synchronisation and has potential for lots of errors, so that's why I'm thinking of using an XML file to represent the pipeline structure.

    So it's the structure of the pipeline I want to represent, not the data.
    Thanks for replying, tempest.

    It's basically the complexities of opening and editing the file while the server is live and using the same file that gets me :confused: .
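    As a rough sketch of the kind of pipeline file being described here, it might look something like this (the element and attribute names are invented for illustration, not taken from the project):

```xml
<pipeline>
  <!-- Each process is an element carrying its URL and any extra info -->
  <process url="http://host-a/validate" timeout="30"/>
  <process url="http://host-b/transform"/>
  <process url="http://host-c/store"/>
</pipeline>
```

    Changing the pipeline then means adding, editing, or removing `process` elements in this one document.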


  • Closed Accounts Posts: 92 ✭✭tempest


    Sorry about that. I should have identified what you were saying from your first post, but I misread it.

    OK, so your question is: how do I (the process) parse an XML file if someone else (an administrator) is updating that file at the same time? There are numerous ways to approach this, and each of them has its own complexities.
    Ultimately what you require is a transaction on the file. I have to say that in this situation my first thought is to store the data in a relational database. The database can guarantee that you get your required data back without worrying about whether someone else is editing it at the same time.

    Accessing a file on a filesystem each time means you will run the risk of getting a corrupt data stream. You could introduce a retry mechanism which tries, say, three times to do the parse, and if it fails three times returns an error as a result.
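    A minimal sketch of that retry mechanism might look like this. The `parsePipeline` helper is hypothetical; here it just reads the file's contents, where a real version would run the XML pull parser over the stream:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RetryingReader {

    // Hypothetical parse step: just read the file here; a real version
    // would run the pull parser and throw on a corrupt stream.
    static String parsePipeline(Path file) throws IOException {
        return Files.readString(file);
    }

    // Try the parse up to maxAttempts times before giving up, as
    // suggested above for the case where the file is mid-write.
    static String parseWithRetry(Path file, int maxAttempts) throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return parsePipeline(file);
            } catch (IOException e) {
                last = e; // file unreadable this time round; retry
            }
        }
        throw last; // all attempts failed: surface the error as a result
    }
}
```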


  • Registered Users Posts: 261 ✭✭HaVoC


    Thanks.
    I partly guessed using XML pull would be too easy a solution :D
    I'll probably stay away from a database, as I don't have one in the project at the moment. It would be overkill to use one just for the pipeline.


  • Closed Accounts Posts: 92 ✭✭tempest


    Unfortunately I don't think that it's as simple as not using XML pull...

    You have to store your pipeline definition somewhere. If that happens to be in a file on a filesystem, then you run the risk of somebody else modifying the data at the same time as you and invalidating an open stream.

    Unless, of course, you decide that the system has to be taken down for the pipeline definition to change, and state that the definition file should not be modified while the server is up.

    This gets around the problem, but in most industries it would be unacceptable: the downtime and risks would be too much.

    But then again, it's a college project, and stating the limitations of the system in order to reduce the scope to an achievable level, or to focus the project on more pertinent issues, is usually acceptable. In fact it is probably a good thing, as it shows you have at least identified and acknowledged the limitations (as long as it's documented as such).


  • Registered Users Posts: 27,163 ✭✭✭✭GreeBo


    Read from the file once and store the DOM.
    Then I'd have some logic around the accessor to the object to decide if it just returns the DOM or rereads it from disk.
    You could say that the cache goes stale after an hour, or you could just compare timestamps and only read from the file when it has been updated.
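    A rough sketch of that timestamp-checked cache, using the standard `javax.xml.parsers` DOM API (class and field names are made up for illustration):

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class PipelineCache {
    private final File file;
    private Document dom;          // cached parse of the pipeline file
    private long lastLoaded = -1;  // file.lastModified() when we last parsed

    public PipelineCache(File file) {
        this.file = file;
    }

    // Return the cached DOM, rereading from disk only when the file's
    // timestamp shows it has changed since the last parse.
    public synchronized Document get() throws Exception {
        long modified = file.lastModified();
        if (dom == null || modified != lastLoaded) {
            dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(file);
            lastLoaded = modified;
        }
        return dom;
    }
}
```

    Note this narrows the race but doesn't eliminate it: a reread can still land mid-save, so it combines naturally with the retry idea above.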


  • Registered Users Posts: 261 ✭✭HaVoC


    I think I’ll just end up doing a data structure with the `synchronized` keyword,
    and store it on an RMI server.
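    The in-memory structure could be as small as this: an ordered list of stage URLs with `synchronized` accessors, so the server can look up the next stage while an administrator edits the list. (The class and method names are a sketch, not from the project; the RMI wrapping is omitted.)

```java
import java.util.ArrayList;
import java.util.List;

// In-memory pipeline definition guarded by synchronization.
public class Pipeline {
    private final List<String> stages = new ArrayList<>(); // stage URLs, in order

    public synchronized void add(int index, String url) {
        stages.add(index, url);
    }

    public synchronized void remove(String url) {
        stages.remove(url);
    }

    // Address of the stage that follows the given one, or null at the end.
    public synchronized String next(String currentUrl) {
        int i = stages.indexOf(currentUrl);
        return (i >= 0 && i + 1 < stages.size()) ? stages.get(i + 1) : null;
    }

    // Defensive copy, so callers never iterate the live list unsynchronized.
    public synchronized List<String> snapshot() {
        return new ArrayList<>(stages);
    }
}
```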

