Optimising a database table for SUM() statements

seamus · 21-04-2008 5:26pm #1

OK, some of the bods here might be able to help.

Problem is, I have a single statement which is causing me some grief. It takes a fairly hefty amount of time (relatively speaking) to execute. I can't stop the query from being run and I can't alter it. So I need to find a way of optimising the table which it queries.

Basically the table contains timesheet entries - one row per entry.

Every time a user enters an "activity type" (i.e. "I did this today for 3 hours"), the program goes and gets the sum of all of the hours that the user has ever done under that activity and displays it to the screen.

The query looks like this:

SELECT SUM(QUANTITY) WHERE PERSONNELLNO = 123456 AND COL2 = 'SOMEVAL' AND COL3 = 'OTHERVAL'

When we're under particularly high load, this query causes the whole screen to freeze while the program goes back and tots up everything done over the last five years.

Since there's no "GROUP BY" in this query, I can't think of any indexes which could help this.

Any ideas?

Tks

pH · 21-04-2008 5:51pm

Not sure what GROUP BY has to do with it but an index on PERSONNELLNO, COL2 , COL3 would help if it doesn't already exist.

Martyr · 21-04-2008 6:58pm

seamus wrote:

...tots up everything done over the last five years.

would put the total hours for each day, week, month, year into separate tables - pre-computed, at end of every day,month,year.

then just add totals for years, months, days - when you need it.

Tom Dunne · 21-04-2008 7:35pm

pH wrote: »

Not sure what GROUP BY has to do with it but an index on PERSONNELLNO, COL2 , COL3 would help if it doesn't already exist.

Agreed. Out of curiosity, what database?

Average Joe wrote: »

would put the total hours for each day, week, month, year into separate tables - pre-computed, at end of every day,month,year.

then just add totals for years, months, days - when you need it.

Doing something like this depends on a few factors, such as how often these type of queries would be run.

Martyr · 21-04-2008 9:26pm

seamus wrote:

Basically the table contains timesheet entries - one row per entry.

Tom Dunne wrote:

Doing something like this depends on a few factors, such as how often these type of queries would be run.

its time tracking software of employees? - so i would guess it gets queried for each employee at the end of every week, month and year for management or payroll purposes.

when its run once at end of day for each employee, why bother re-calculating it again unless there is mistake? seems pointless waste of cpu and time.

GoneShootin · 21-04-2008 11:52pm

seamus wrote: »

Any ideas?

I wonder could you have a running total stored on your DB someplace in a new table? Each time something is done it adds the time. Once you had it populated from say Wednesday, then your code each time after that would simply add the X time spent on an activity and add it to whats there. Say something like

- SELECT total_hours FROM tbl_total_hours WHERE PERSONNELLNO = 123456
- $total_hours=X
- On INSERT of new activity time details from user do:
- $update_total_hours = $total_hours + $_POST[hours] //or whatever
- UPDATE tbl_total_hours SET total_hours = $update_total_hours

With this kind of way, when you run the query, your only looking for one record from the DB. It might not be flexible enough for your needs but it would surely cut down on any DB searching done by your query.

seamus · 22-04-2008 9:43am

Tom Dunne wrote: »

Agreed. Out of curiosity, what database?
Doing something like this depends on a few factors, such as how often these type of queries would be run.

Average Joe wrote: »

its time tracking software of employees? - so i would guess it gets queried for each employee at the end of every week, month and year for management or payroll purposes.
when its run once at end of day for each employee, why bother re-calculating it again unless there is mistake? seems pointless waste of cpu and time.

Agreed completely. The database is an SAP database. So the programs doing the querying are SAP ones. I *could* alter the query in the program and add a computed column (as Average Joe and GS suggest), but then our multi-million euro system would be rendered unsupportable by the vendor.

So I'm stuck with the query as it is.

The query is run pretty much every time an employee enters a single time value into one of their timesheets. So for each employee, it happens 10-12 times per timesheet and we consistenly have 10-15 people entering timesheets into the system. So I would imagine that this query is being run probably once every 2 minutes or so during the day.

We even have a nightly job which already does computations, I have no idea why they put this query into the standard functionality (and don't let you remove it). The most irritating thing is that we don't even want this value. The field has been hidden from the users' form.

pH wrote: »

Not sure what GROUP BY has to do with it but an index on PERSONNELLNO, COL2 , COL3 would help if it doesn't already exist.

I always assumed that indexes on the GROUP BY columns were in use in this kind of situation, but I'll have a look at what indexes are on the columns in the query.

Cheers for the responses so far guys.

Tom Dunne · 22-04-2008 10:21am

seamus wrote: »

Agreed completely. The database is an SAP database. So the programs doing the querying are SAP ones. I *could* alter the query in the program and add a computed column (as Average Joe and GS suggest), but then our multi-million euro system would be rendered unsupportable by the vendor.
So I'm stuck with the query as it is.

SAP database? Yuck.

If you are stuck with the query, can you use materialized views?

You won't be altering the query, but you will be altering what the query, em, queries.

bonkey · 22-04-2008 12:03pm

pH wrote: »

Not sure what GROUP BY has to do with it but an index on PERSONNELLNO, COL2 , COL3 would help if it doesn't already exist.

I'd go one step further...after checking if SAP DB is smart enough to do the kind of optimisation that I'm thinking of.

Create an index on PersonnelNo, Col2, Col3 and quantity.

This creates what is called a 'spanning' index - an index which "spans" the query (as it contains all referenced columns)

What this means is that queries which are spanned by this index shouldn't need to read the data-pages of the table at all. Rather, the leaf-nodes of the index will contain all the information necessary to complete the query.

As with any such index, it (obviously) has about as bad an impact on the performance data-modification queries as you can get, but if you're not talking about seriously-high-volume writes it shouldn't be a problem.

seamus · 22-04-2008 4:04pm

Tom Dunne wrote: »

If you are stuck with the query, can you use materialized views?

You won't be altering the query, but you will be altering what the query, em, queries.

Cheers for that. Set me on a very interesting course today

I built an indexed view which did exactly what I wanted it to do and performed great in queries......except that the database refused to use it :mad:
It continues to opt for one of the other indexes and I'm loath to interfere much with them, so back to square one.

I'd go one step further...after checking if SAP DB is smart enough to do the kind of optimisation that I'm thinking of.

Create an index on PersonnelNo, Col2, Col3 and quantity.

This creates what is called a 'spanning' index - an index which "spans" the query (as it contains all referenced columns)

What this means is that queries which are spanned by this index shouldn't need to read the data-pages of the table at all. Rather, the leaf-nodes of the index will contain all the information necessary to complete the query.

As with any such index, it (obviously) has about as bad an impact on the performance data-modification queries as you can get, but if you're not talking about seriously-high-volume writes it shouldn't be a problem.

It's running on MSSQL, which should be well able to handle spanning indexes. The index space is already 1.5GB on a 2.5GB table, but I'm going to give this a go anyway in the test system. If it yields good results, it's a step in the right direction. I suspect there are too many indexes on this table...like too many cooks.

seamus · 22-04-2008 5:05pm

That's it bonkey. Speeds up the query by a factor of roughly 100

I even found an existing index which only needed one column added to it, so there shouldn't be much impact on write performance.

That was an interesting 5 hour foray into indexed views though

Cheers for your help guys.

bonkey · 23-04-2008 10:38am

seamus wrote: »

That's it bonkey. Speeds up the query by a factor of roughly 100

Nice.

I even found an existing index which only needed one column added to it, so there shouldn't be much impact on write performance.

Keep an eye on it...just in case it comes back to bite you in the ass.

bonkey · 23-04-2008 10:41am

seamus wrote: »

It's running on MSSQL, which should be well able to handle spanning indexes.

Ah...when you said "a SAP DB" I thought you meant the product SAPDB. MSSQL does great with spanning indexes....MSSQL is what I learned most of my optimisation stuff for in the first place

Stating the obvious (hopefully), ensure that the index is non-clustered. Clustering it will not only give you no benefit, it can reduce performance...as well as removing whatever benefit you might get having a different clustered index.

The index space is already 1.5GB on a 2.5GB table, but I'm going to give this a go anyway in the test system. If it yields good results, it's a step in the right direction. I suspect there are too many indexes on this table...like too many cooks.

Definitely keep an eye on your space requirements....

Optimising a database table for SUM() statements

Comments