Advertisement
If you have a new account but are having problems posting or verifying your account, please email us on hello@boards.ie for help. Thanks :)
Hello all! Please ensure that you are posting a new thread or question in the appropriate forum. The Feedback forum is overwhelmed with questions that are having to be moved elsewhere. If you need help to verify your account contact hello@boards.ie

SQL Query Help - Count of Offline Minutes

Options
  • 17-10-2006 5:19pm
    #1
    Registered Users Posts: 604 ✭✭✭


    Hi,

    I have 2 tables. a Machine table and a log table. When a machine goes offline or comes back online a log is left in the log table with the machineID and what happened (eg. machine has come back online). I need to write a query that will tell me how many minutes each machine spent offline between 2 specified dates.

    So say the tables are:
    MACHINE
    machineid int,
    machinenumber int,
    .... other machine related stuff

    LOG
    logid int
    machineid int
    log varchar

    I need to find out what the status of the machine was at the start of the timerange and if it was offline add that onto however long it was until the machine came back online within that same range.

    Anyone got any pointers, im sure its a commonish type of query but i havent found much help online so far.


Comments

  • Registered Users Posts: 15,443 ✭✭✭✭bonkey


    Ummm....there's no time-based information in the table-structures you posted.

    How are you supposed to retrieve data between two dates, and where is there information which will tell you that the machine came online / went offline?

    Is it all just encoded in the varchar field called log?


  • Registered Users Posts: 1,466 ✭✭✭Smoggy


    More info is needed , such as data in the log field.
    I presume you can rip out the date from this field and do a comparison.


  • Registered Users Posts: 604 ✭✭✭Kai


    Sorry guys, forgot the Date column.

    LOG
    logid int
    machineid int
    log varchar
    DateCreated datetime


  • Registered Users Posts: 15,443 ✭✭✭✭bonkey


    Also need to know how to identify an "offline" log entry from an "online" one.

    And whether or not it is possible that a machine goes offline without a log entry?


  • Registered Users Posts: 604 ✭✭✭Kai


    In the LOG table the field log will contain "Going Offline" or "Coming Online" accordingly. The machine always leaves a log.


  • Advertisement
  • Registered Users Posts: 15,443 ✭✭✭✭bonkey


    Sorry Kai....

    one more question...

    what "flavour" of SQL is this? MSSQL? Oracle?

    The query isn't actually that simple, and I don't want to use anything thats not supported in whatever DB you're playing with....


  • Registered Users Posts: 4,188 ✭✭✭pH


    It may be possible to do this using a SQL query, by joining multiple times to the log table and attempting to pair start and stop log entries.

    However. by far the simplest way to do it is to use a procedural language select the log records in order, open a cursor and process them.

    Attempting to do it in a SQL statement is not the way to go IMHO.


  • Registered Users Posts: 604 ✭✭✭Kai


    I got it eventually, i used sub queries to pull back the last log and the date it happened, so i had 4 columns, currentLog, CurrentDate, PreviousLog and PreviousDate. i used a case statement to check if the currentLog was a "Coming Online" event, if it was i datediffed the Previous date to the current.

    Its working fine but its a bit slow. The main reason is that im using a query with TOP 1 in it to pull back the previous log and date. Is there an alternative to using TOP 1? Im pulling back the latest field based on the date it was created.


  • Registered Users Posts: 15,443 ✭✭✭✭bonkey


    Kai wrote:
    Is there an alternative to using TOP 1?

    Yes. There is.

    Its still a question of what flavour of SQL you are using.

    There's no point in me telling you how to solve it with Oracle if you're using MSSQL.

    Similarly, if you're using MySQL, then MSSQL syntax may not be much help...


  • Closed Accounts Posts: 3,357 ✭✭✭Beano


    Well given that he's looking for an alternative to top 1 then presumably he's using MSSQL as neither Oracle nor MySQL support that.

    Post up the query you have now and we'll take a look at it. If it is MSSQL then PH is right and you're probably better off using a cursor.


  • Advertisement
  • Registered Users Posts: 15,443 ✭✭✭✭bonkey


    Beano wrote:
    Well given that he's looking for an alternative to top 1 then presumably he's using MSSQL as neither Oracle nor MySQL support that.
    Fair point.
    Post up the query you have now and we'll take a look at it.
    Gosh thats nice of you ;)

    I've no db to hand, so I've syntax-checked none of this, but it should be pretty close.
    SELECT  SUM( DATEDIFF( mi
                         , CASE
                             WHEN off_and_on.went_offline < @some_start_date
                             THEN @some_start_date
                             ELSE off_and_on.went_offline
                           END
                         , CASE 
                             WHEN off_and_on.came_online  > @some_end_date>
                             THEN @some_end_date
                             ELSE off_and_on.came_online
                           END
                 ) -- end of datediff
            ) -- end of sum
    FROM    ( SELECT  online.datecreated came_online 
                   ,  ( SELECT  max(offline.datecreated)
                        FROM    log offline
                        WHERE   offline.log          = 'Going Offline'
                          AND   offline.machineid    = @some_machineid
                          AND   offline.datecreated <= @some_end_date 
                          AND   offline.datecreated <  online.datecreated
                      ) went_offline
              FROM    log online
              WHERE   online.log          = 'Coming Online'
                AND   online.machineid    = @some_machineid
                AND   online.datecreated >= @some_start_date
            ) off_and_on
    WHERE   off_and_on.went_offline is not null
    


  • Registered Users Posts: 604 ✭✭✭Kai


    I see what your doing here bonkey :
    ( SELECT  max(offline.datecreated)
                        FROM    log offline
                        WHERE   offline.log          = 'Going Offline'
                          AND   offline.machineid    = @some_machineid
                          AND   offline.datecreated <= @some_end_date 
                          AND   offline.datecreated <  online.datecreated
                      ) went_offline
    

    Which pulls back the maximum date, this is fine in this particular query but what if you wanted to pull back say the latest log. Using your example in my query im writing it roughly as :
                      ( SELECT  TOP 1 offline.log
                        FROM    log offline
                        WHERE  offline.machineid    = @some_machineid
                          AND   offline.datecreated <= @some_end_date 
                          AND   offline.datecreated <  online.datecreated
                         ORDER BY offline.datecreated desc
                      ) went_offline
    

    Which is going to be slower as it has to do an extra order before returning the log. Because its not a datefield you cant use an aggregating function like MAX.

    Anyone have a better solution ?


  • Registered Users Posts: 15,443 ✭✭✭✭bonkey


    Kai wrote:
    this is fine in this particular query but what if you wanted to pull back say the latest log.
    Then you've changed the requirements and need to redefine the question.
    Anyone have a better solution ?
    A better solution to what? You seem to have changed what it is you're trying to do. You initially talked about finding the total minutes of offline time for a machine in a time-range - thats what my query addressed.

    Now you're asking for a better solution...but to a different problem, which you haven't clearly defined yet.

    jc


  • Registered Users Posts: 604 ✭✭✭Kai


    I belive its the same question ive been asking all along but granted the details ive given havent been as clear as they chould have been.

    Working from an extract of the example you gave:
                      ( SELECT  max(offline.datecreated)
                        FROM    log offline
                        WHERE   offline.log          = 'Going Offline'
                          AND   offline.machineid    = @some_machineid
                          AND   offline.datecreated <= @some_end_date 
                          AND   offline.datecreated <  online.datecreated
                      ) went_offline
    

    this works fine, if the query returns multiple values then we merely select the MAXimum one and all is hunky dory. Now in my query :
                      ( SELECT  TOP 1 offline.log
                        FROM    log offline
                        WHERE  offline.machineid    = @some_machineid
                          AND   offline.datecreated <= @some_end_date 
                          AND   offline.datecreated <  online.datecreated
                         ORDER BY offline.datecreated desc
                      ) went_offline
    

    Im trying to select the latest log, i cant use a MAX statement as the Log is a varchar, so i order it by DateCreated and then use TOP 1 to grab the latest log. Ill more than likely always have to use the ORDER BY clause but is there an alternative to the TOP 1 statement? Is it a satisfactory way of writing the query in a development environment?


  • Closed Accounts Posts: 3,357 ✭✭✭Beano


    Kai wrote:
    I belive its the same question ive been asking all along but granted the details ive given havent been as clear as they chould have been.

    Working from an extract of the example you gave:
                      ( SELECT  max(offline.datecreated)
                        FROM    log offline
                        WHERE   offline.log          = 'Going Offline'
                          AND   offline.machineid    = @some_machineid
                          AND   offline.datecreated <= @some_end_date 
                          AND   offline.datecreated <  online.datecreated
                      ) went_offline
    

    this works fine, if the query returns multiple values then we merely select the MAXimum one and all is hunky dory. Now in my query :
                      ( SELECT  TOP 1 offline.log
                        FROM    log offline
                        WHERE  offline.machineid    = @some_machineid
                          AND   offline.datecreated <= @some_end_date 
                          AND   offline.datecreated <  online.datecreated
                         ORDER BY offline.datecreated desc
                      ) went_offline
    

    Im trying to select the latest log, i cant use a MAX statement as the Log is a varchar, so i order it by DateCreated and then use TOP 1 to grab the latest log. Ill more than likely always have to use the ORDER BY clause but is there an alternative to the TOP 1 statement? Is it a satisfactory way of writing the query in a development environment?


    Now i'm totally confused. I dont understand why you cant use the Max function. The datecreated field is a datetime so there are no problems using the max function with that column. Your query makes no sense in the context of what the subquery is trying to achieve. The subquery is returning the latest offline time before current online time. Your query returns the log column which just stores a string description of the event in the log. Bonkeys is correct.

    There is a slight hole in Bonkeys query. if a machine goes offline before the end of the period you are searching on but doesnt come back online then those offline minutes will not be counted. But thats a trivial thing to fix and general the approach is sound. I'm also a bit bemused that he aliased a tablename with a name that is longer than the table name. Seems an odd thing to do but perfectly valid syntax wise.


  • Registered Users Posts: 15,443 ✭✭✭✭bonkey


    Beano wrote:
    There is a slight hole in Bonkeys query. if a machine goes offline before the end of the period you are searching on but doesnt come back online then those offline minutes will not be counted.
    True. I deliberately left that out though, because while Kai mentioned machines offline at start being counted until they came online, he didn't mention machines that went offline before or during the timeframe that didn't come back online.
    But thats a trivial thing to fix and general the approach is sound.
    I don't think its quite so trivial in that it would involve fundamentally rewriting the query. At the moment, it takes all onlines after the window-start and finds matches for their offlines. If there is no online, the offline can never be included.

    I had thought about swapping it round originally - taking all offlines before the window-end and finding their matching onlines. I can see additional things to be wary of coming from that side as well though. Maybe its not so bad...I might have a tinker...or just union in a query which tests for that specific case :)
    I'm also a bit bemused that he aliased a tablename with a name that is longer than the table name. Seems an odd thing to do but perfectly valid syntax wise.

    I alias names to follow the content of what the resultset contains, rather than for brevity. When dealing with the same table multiple times, or using multi-nesting like this, It helps me keep track of what I'm dealing with at any level and may do the same for others when reading / figuring out the code.


  • Registered Users Posts: 15,443 ✭✭✭✭bonkey


    Kai wrote:
    this works fine, if the query returns multiple values then we merely select the MAXimum one and all is hunky dory.
    Yup. Its important to note, though that this is a special type of nested SELECT, in that it is not a "virtual table" used in the FROM clause, but rather a "virtual field" in the WHERE clause. I'm using it specifically to find the matching offline time for any given entry from the SELECT where log is aliased to online which - as the name suggests - contains only log entries of the machine coming back online.
    Now in my query :

    <code snipped>

    Im trying to select the latest log,
    Yup. But this is why I said that it seems you've changed the requirements...or you're trying to fit a bit of my query into the rest of yours...which may not work.

    Let me explain...

    In my query, the log varchar is only needed to filter SELECT statements for either online or offline methods. The result requested - the total minutes offline - doesn't require a log entry. So - and this was the point behind my last post - I don't understand why you need to select a log entry at all in this manner.

    I do notice that you don't have my WHERE clause which is stripping for only the "machine offline" messages. This is why I don't need to select the log, but I can't figure out how this will work if you don't filter like that here. You are, in effect, retrieving the newest log for some machine from some timerange, but not caring what it says until after you've retrieved it and only it.

    I can't see that working in more than a small number of situations. Maybe I'm missing something, or have guesed wrongly about how you're using this in the rest of your query, or something.....but that's what threw me....I just can't see why you'd need to do this.

    Leaving that aside...lets still look at the question about whether or not there's a better way to solve this type of problem.

    If you had to retrieve multiple pieces of data from a record that is found by max/min techniques, then you either need to be retrieving against a unique value (which the datetime field may not be - its possible to have successive INSERTS with the exact same time, especially dependant on how and where the time is generated as well as the frequency of inserts) or you've got a problem.

    My first instinct is to rewrite the query to try and avoid the problem. Me being me, I'd even consider using a temporary table to get around it, if it could offer a neat solution.

    If that doesn't work, then TOP may indeed be the only man for the job. Its ugly, but its a valid workaround to a real problem. I've often seen SQL written for Oracle systems where there's an ORDER BY and a WHERE ROWNUM = 1 in there. It looks slightly more elegant, and may be slightly more optimised in how it works, but it is effectively the exact same solution. TOP is the best SQL Server can do (that I'm aware of) for dealing with certain problems.

    The good news is that TOP may be an optimisable. If the data can be read using an index then it may be possible to require a single data read operation for each TOP iteration.

    Think about it...you ask for the TOP 1 of data for a given machine, within a date-range. A smart optimiser should be able to use a machineid / datecreated index to get straight to the highest valid record in that range. THen it can read backwards until it has the requested number of inedx-entries for records. In your case, that would be one, so it has it immediately. This index-entry can then be used to read the record directly.

    OK - you've got to figure in index-reads as well, but its still a heck of a lot more efficient than the "retrieve all, sort, and return best" that will happen if there isn't an index up to the task.

    So the short answer is to try everything to avoid it, use it if you really have to, and then try your damndest to optimise any unacceptable slowness away. Remember - if its just for a report or something, being slow may not be an issue worth spending time on.

    God be with the web. It has so lowered people's expectations in terms of what constitutes responsiveness ;)


  • Closed Accounts Posts: 3,357 ✭✭✭Beano


    bonkey wrote:
    True. I deliberately left that out though, because while Kai mentioned machines offline at start being counted until they came online, he didn't mention machines that went offline before or during the timeframe that didn't come back online.


    I don't think its quite so trivial in that it would involve fundamentally rewriting the query. At the moment, it takes all onlines after the window-start and finds matches for their offlines. If there is no online, the offline can never be included.

    I had thought about swapping it round originally - taking all offlines before the window-end and finding their matching onlines. I can see additional things to be wary of coming from that side as well though. Maybe its not so bad...I might have a tinker...or just union in a query which tests for that specific case :)

    A union was exactly what I was thinking off
    bonkey wrote:
    I alias names to follow the content of what the resultset contains, rather than for brevity. When dealing with the same table multiple times, or using multi-nesting like this, It helps me keep track of what I'm dealing with at any level and may do the same for others when reading / figuring out the code.

    fair point.


Advertisement