Advertisement
If you have a new account but are having problems posting or verifying your account, please email us on hello@boards.ie for help. Thanks :)
Hello all! Please ensure that you are posting a new thread or question in the appropriate forum. The Feedback forum is overwhelmed with questions that are having to be moved elsewhere. If you need help to verify your account contact hello@boards.ie

Large SQL database update

Options
  • 01-06-2006 12:11am
    #1
    Closed Accounts Posts: 8,866 ✭✭✭


    I'm going to tear my hair out if I don't find a decent solution to this! I have two .xls files containing customer details, i.e. names, addresses, contacts etc.

    Now there's about 2000 entries there, and I need to compare and update my current database from these depending on whether there's an existing entry for a certain customer existing already. In simple terms, there might be a Joe Bloggs in the xls list and in my database, but say his phone number isn't in my database, it needs to be, and if there is no entry for Joe Bloggs at all I need to put one in.

    I need a simpler way to do this, at the moment I'm getting NOWHERE!!

    Save me...


Comments

  • Registered Users Posts: 1,456 ✭✭✭FSL


    Use opendatasoure to treat the two excel spreadsheets as sql tables. You can then use standard sql commands to perform the update.


  • Closed Accounts Posts: 8,866 ✭✭✭Adam


    If thats as easy as it sounds it would be nice! But I've never delved into that before, any pointers or links to decent how to's?


  • Registered Users Posts: 1,456 ✭✭✭FSL


    you use the opendatasouce function as the database/table name. The format is:-

    OpenDataSource(Microsoft.Jet.OLEDB.4.0',
    'Data Source="full path to the excel file";Extended properties=Excel 8.0')...[Sheet name in the worksheet$]

    e.g.

    OpenDataSource(Microsoft.Jet.OLEDB.4.0',
    'Data Source="c:\book1.xls";Extended properties=Excel 8.0')...[Sheet1$]
    You have to add a $ sign to the end of the worksheet name.

    Also the data in the columns must be consistent or the row is not read. i.e. if the column contains numeric data then each row must have a numeric value or a zero not a blank. if a column contains character data then if there are any numeric values you must precede them with a ' to indicate character. If you have any such columns then write a simple macro to precede each value with a '


  • Closed Accounts Posts: 8,866 ✭✭✭Adam


    FSL wrote:

    OpenDataSource(Microsoft.Jet.OLEDB.4.0',
    'Data Source="c:\book1.xls";Extended properties=Excel 8.0')...[Sheet1$]
    You have to add a $ sign to the end of the worksheet name.

    Forgive the probably stupid question but what do I write this in? A SQL interface? For my SQL work I use SQLyog, its just a basic gui really...


  • Registered Users Posts: 1,456 ✭✭✭FSL


    You just use a standard SQL update query The opendatasource function is equivalent to the four part name i.e. servername.databasename.owner.table

    If the excel spreadsheet has column headers then they are the field names. if it does not have headers then it calls the fields f1 f2 etc.

    Suppose the the database table is called table1 with a field called telephone and the key field is customerid, the excel spreadsheet is called telnos in newdata.xls in folder fred on drive c it has a column called id and another called tnum. id is the customerid and tnum is the telephone number.

    The sql update query, assuming you are connected to your database, would be

    Update table1 set telephone =(Select tnum from OpenDataSource(Microsoft.Jet.OLEDB.4.0',
    'Data Source="c:\fred\newdata.xls";Extended properties=Excel 8.0')...[telnos$] where id=table1.customerid) where customerid in (select id from OpenDataSource(Microsoft.Jet.OLEDB.4.0',
    'Data Source="c:\fred\newdata.xls";Extended properties=Excel 8.0')...[telnos$])

    You need the final where clause to limit the updates to those records which are in the Excel Spreadsheet.


  • Advertisement
  • Closed Accounts Posts: 80 ✭✭Torak


    FSL wrote:
    You just use a standard SQL update query The opendatasource function is equivalent to the four part name i.e. servername.databasename.owner.table

    If the excel spreadsheet has column headers then they are the field names. if it does not have headers then it calls the fields f1 f2 etc.

    Suppose the the database table is called table1 with a field called telephone and the key field is customerid, the excel spreadsheet is called telnos in newdata.xls in folder fred on drive c it has a column called id and another called tnum. id is the customerid and tnum is the telephone number.

    The sql update query, assuming you are connected to your database, would be

    Update table1 set telephone =(Select tnum from OpenDataSource(Microsoft.Jet.OLEDB.4.0',
    'Data Source="c:\fred\newdata.xls";Extended properties=Excel 8.0')...[telnos$] where id=table1.customerid) where customerid in (select id from OpenDataSource(Microsoft.Jet.OLEDB.4.0',
    'Data Source="c:\fred\newdata.xls";Extended properties=Excel 8.0')...[telnos$])

    You need the final where clause to limit the updates to those records which are in the Excel Spreadsheet.

    As this is probably going to happen on a production database (after testing against a copy and creating a backup first of course), I would recommend giving pointers to information, rather than giving haphazard bits of information which essentially confuse. The information you are giving is worse than bad, it is dangerous...

    You need to arm this, obviously intelligent, individual with the strategy he needs to use to succeed at what he is doing. The information is in there but it is lost in a hazy example which is next to useless without clearly outlining the steps that you took.

    1. Identification of the key fields in the spreadsheet.
    - This is the where clause, where keyfield1 = valuefromspreadsheet
    2. Identification of the fields to be updated
    - This is the set clause, update <tablename> set valfield1 = valuefromspreadsheet, valfield2=valuefromspreadsheet where keyfield1 = valuefromspreadsheet
    3. Is it possible to do a mass update regardless of whether or not the details have changed?
    - This is important because it makes the problem simpler.
    - If, so then generate the sql script using a spreadsheet like the one attached
    - If not then modify the generation technique to restrict the update to only those rows where one of the valuefields identified in step 2 has changed. The sql would become: update <tablename> set valfield1 = valuefromspreadsheet, valfield2=valuefromspreadsheet where keyfield1 = valuefromspreadsheet and (valfield1 != (select valfield1 from database) or valfield2 != (select valfield2 from database)).

    If you have any questions post and I'll try to help


  • Registered Users Posts: 15,443 ✭✭✭✭bonkey


    Forgive the probably stupid question...

    No problem...just as soon as you forgive me mine...

    What database (system) are you using? MySQL? MSSQL? Oracle? DB2?

    I'm guessing its not MSSQL, or you'd have the Enterprise Manager which would give you DTS which would make it all pretty simple.

    In general I'd recommend one of two basic strategies:

    1) Load data from XLS into a "staging" table.
    Insert "missing" records (easy to achieve using INSERT INTO / SELECT FROM / WHERE NOT IN)
    Update all / non-missing records (should also be easy to achieve)

    2) Load data from XLS into a "staging" table
    Drop any constraints referencing customer table
    Drop existing customer table
    Rename staging table to customer table / create new customer table from staging table.
    Recreate constraints.

    This second approach assumes the XLS data covers all records in your db, and not just a subset.


    No matter which way you do it, make sure you back up your db before you start.


  • Registered Users Posts: 640 ✭✭✭Kernel32


    If it is MSSQL then load into staging table first.

    Then go buy SQL Data Compare. I use Red Gate tools all the time. They are fantastic. It won't magically do everything but it will present the data so you can make decisions and then it will generate the scripts.


  • Closed Accounts Posts: 2,046 ✭✭✭democrates


    I've used torak and bonkeys methods before and they are a lifesaver.
    (torak's method can also be used for generating snap html - while you're waiting for the proper cms solution that is)

    As bonkey pointed out be careful about dropping tables and reloading, as autonum indexes will be regenerated. If your insert data and sequence is always the same ad there have been no inserts or deletes no problem, but if not, new id's will now have different data. That is a problem if the key is referenced on another table, or another database eg ldap (or even in code which should never be the case but my shocked eyes have beheld it).

    Eg customer table: abacus solutions id=1, orders table: order id=5 and id=26 refer to customer id=1. You drop the customer table and reload, but now customer id = 1 has aardvark solutions instead of abacus solutions, and hence abacus orders are related to aardvark in error.


Advertisement