Advertisement
If you have a new account but are having problems posting or verifying your account, please email us on hello@boards.ie for help. Thanks :)
Hello all! Please ensure that you are posting a new thread or question in the appropriate forum. The Feedback forum is overwhelmed with questions that are having to be moved elsewhere. If you need help to verify your account contact hello@boards.ie

SQL Hell!

Options
  • 25-01-2007 7:18pm
    #1
    Closed Accounts Posts: 108 ✭✭


    Hi all,

    I'm trying to write an SQL query that will clean up data in a given table.

    Its difficult to describe so hopefully the SQL I've come up with so far will help:

    (select grouping_id from tableX where
    charges_id = 5 and tax_id = 2 and commissions_id = 24)
    intersect
    (select grouping_id from tableX where
    charges_id = 2 and tax_id = 0 and commissions_id = 0)

    (these are multiple records and there can be more than two!)

    This works for the case where the values are 5,2,24 and 2,0,0. The problem is that over time different users entered data in different compinations/ permutations so data which may look different can actually be a duplicate entry. (I hope that makes sense!)

    So is there a way to return the group_id without getting all the permutations manually and writting a very long sql statement. There may be far more than two rows so this could get very complicated.

    Thanks in advance,

    Les


Comments

  • Registered Users Posts: 15,443 ✭✭✭✭bonkey


    I'm not quite seeing what it is you want to do.

    Are you trying to identify all records for all grouping_ids where the grouping_id occurs more than once in the table?

    I think thats what you're trying to do, but it might not be.

    Also...what flavour of SQL are you using (i.e. for which DB). It could be useful in terms of making sure that whatever suggestions you get don't use some technique not supported on your database.


  • Registered Users Posts: 2,781 ✭✭✭amen


    The problem is that over time different users entered data in different compinations/ permutations so data which may look different can actually be a duplicate entry

    what caused this to happend and has it been rectified so it can't happen again?

    If not waste of time cleaning up the data as you will be back doing it again.

    Make sure there won't be any more crappy data before fixing the data
    I'd also suggest correcting the data on a backup copy first and ensure it is correct there. If there are any reports on the data run them on the backup and make sure they are correct.

    Just curious since you have TaxIDs and Commissions what impact does this have on historical data? Will someone be owed/owe you money?
    Does this data appear on a legal document or a document a customer have?
    If you clean it up and for some reason your company ends up in dispute/court what happens if the document is rerun with data now missing and it doesn't match the customers?


  • Closed Accounts Posts: 108 ✭✭les_steaks


    Thanks for the replies, hopefully I'll explain it a bit better this time!

    Firstly, I'm using SQL Server 2005. Its all in a development environment so there will not be any implications legally etc.

    The inconsistency occurred over time as different people entered data manually. Go forward I've automated the process so consistency should prevail but I use nned to identify duplicate data and eliminate it.

    Take two sample records:

    GRP_ID CHAR_ID TAX_ID COMM_ID etc.
    1 2 5 20
    1 5 0 0

    These may also appear as:

    GRP_ID CHAR_ID TAX_ID COMM_ID etc.
    10 5 5 20
    10 2 0 0

    The table is used to populate a front end on an application and will can correctly work using either record so neither is technically incorrect but for redundancy etc. I want to eliminate the duplicates in the future.

    There could be 10 rows in a group so is without having to query using each permutation can anybody recommend a simpler way?

    Thanks,

    Les


  • Registered Users Posts: 15,443 ✭✭✭✭bonkey


    What you're describing are rows where TAX_ID, COMM_ID etc. are duplicated, but *not* with GRP_ID CHAR_ID the same. Funnily...I've just been solving exactly the same problem in an Oracle database :)

    OK...here's one way of doing it. I'm going to assume that TAX_ID and COMM_ID and ETC are the full set of duplicated fields. If there are more fields (which your etc. suggests) you need to add them....

    There might be some syntax problems in how I nested the SELECT as a table, but I don't have MSSQL to hand so I'm not going to be easily able to check it up for you.

    But the idea is right...the SELECT / HAVING COUNT(*) > 1 gives you a unique set of the duplicated details. Join this back to the table on every field in the set, and it will give you all the records which hit those duplicates.

    The ORDER BY at the end puts them in a sensible order.

    The next trick comes when you want to delete all but one copy of each. Let me know if you can't get that bit either....

    SELECT details.*
    FROM myTable AS details
    INNER JOIN (
    SELECT tax_id, comm_id, etc, count(*)
    FROM mytable
    GROUP BY tax_id, comm_id, etc
    HAVING count(*) > 1
    ) nested_table
    ON details.tax_id = nested_table.tax_id
    AND details.comm_id = nested_table.comm_id
    AND details.etc = nested_table.etc

    ORDER BY tax_id, comm_id, etc, grp_id, char_id


  • Closed Accounts Posts: 108 ✭✭les_steaks


    Thanks bonkey, but I'm not sure that will do it.

    There may be two groups of records say:

    GRP_ID CHG_ID TAX_ID COMM_ID
    1 2 5 10
    1 5 0 0

    and

    GRP_ID CHG_ID TAX_ID COMM_ID
    10 5 5 10
    10 2 0 0

    I need a query that will return both grp_ids, i.e., to me (in this circumstance) the two groups are the same so I need to identify them, delete one and reuse the remaining one.

    They have different grp_ids and the chg_id's are reversed but the front end's query gets all charges, taxes and comms for a particular group so both groups above would return the same.

    What I need is a query like eg.

    (((SELECT GRP_ID FROM
    TABLE_NAME WHERE
    CHR_ID = 2 AND TAX_ID = 5 AND COMM_ID =10)
    INTERSECT
    (SELECT GRP_ID FROM
    TABLE_NAME WHERE
    CHR_ID = 5 AND TAX_ID = 0 AND COMM_ID =0))
    UNION
    ((SELECT GRP_ID FROM
    TABLE_NAME WHERE
    CHR_ID = 5 AND TAX_ID = 5 AND COMM_ID =10)
    INTERSECT
    (SELECT GRP_ID FROM
    TABLE_NAME WHERE
    CHR_ID = 2 AND TAX_ID = 0 AND COMM_ID =0)))


    As I could have ten rows in a particular group to explicitly include all permutations in the query in not practical?

    Thanks,

    Les


  • Advertisement
  • Registered Users Posts: 2,781 ✭✭✭amen


    when deleting these is a nasty little hack you can do in MS SQL
    if you have 5 duplicated rows you could use
    SET ROWCOUNT 1
    and run the delete statement 4 times
    each time only one row will be delete as ROWCOUNT is 1

    or try
    SET ROWCOUNT 4 and run delete once
    remember to
    SET ROWCOUNT 0 afterwards


  • Closed Accounts Posts: 108 ✭✭les_steaks


    Thanks Amen, had that in mind alright, but its actaully identifying the groups to delete that the major headache! Any thoughts????


  • Registered Users Posts: 15,443 ✭✭✭✭bonkey


    IF you look at my query, it finds rows which are the same except for the grp_id / chg_id. Its trivial to change that to find rows which are the same except for just the group_id.

    Then it identifies where two seperate grp_ids have the same content in the rest of the row. This doesn't verify that *every* row in one group matches *every* row in another group, but it does get you close.

    To be honest, I don't think you can easily reduce this down to a piece of SQL.

    Using a cursor, some temporary tables (or cursors) etc. you could do it alright, but I can't think of any easier way.

    To do it with a cursor, you'd do something like the following (written in pseudocode)
    Open cursor1 on : select distinct grp_id from MainTable.
    For each grp_id X in cursor1
      - select * from MainTable where grp_id = X into temptable1
      - open cursor2 on select distinct grp_id from MainTable
      - for each grp_id Y in cursor2
          - select * from MainTable where grp_id = Y into temptable2
          - run a select, joining the contents of temptable1 to temptable2 
            on all fields except grp_id.
          - if the rowcount of this select == rowcount of temptable1
            and rowcount of this select == rowcount of temptable2 then you have a matching group.
              - delete from the main table where grp_id = Y
            end if
        loop
    loop
    

    You can make this smarter by writing a first version which instead of doing deletes simply writes out messages saying Grp X is a clone of Grp Y...and potentially even writing out the contents of each group.

    You can then verify this as you like, and when you're happy, put in the delete and off you go.


  • Closed Accounts Posts: 108 ✭✭les_steaks


    Think I've cracked it. Seems to do what I need in anyhow!

    SELECT grouping_id,count(grouping_id) as counter
    FROM table_name
    WHERE(charges_id IN (0,12,14,11)) AND (tax_id IN (1,0)) AND
    (commissions_id IN (6,7,0))
    group by grouping_id
    order by counter desc

    Now anything in this case with a count of 4 (i.e., (0,12,14,11))) are the groups I'm looking for.

    Thanks all for your help,

    Les


Advertisement