I have a massive text file with consultant data for my company that I have to synchronize with records in SQL 2005. The text file uses ~ as the column delimiter and carriage returns for the row delimiter. There are no header columns in the text file. This file has around 180,000 records in it, but I only need about 40,000 of them that meet certain criteria for import. I'll share all of the fields in the flat file if completely necessary, but in the meantime I'll tell you that it basically has all pertinent info for a person's billing and shipping information (Name, Address, City, State, Zip, Phone, etc.) as well as some internal information such as an ID number, signup dates, and the like.
I only need to pull the records from the flat-file that have all of the following things supplied: Internal ID, First Name, Last Name, Phone, and Email. The address information is only partly necessary in the overall scheme of things.
The two tables in the database that I need to import into are a Customer table which holds the necessary info above, and an address table that will hold both of the addresses for that record assuming they are present, but in two separate records. One record will be for shipping, and the other for billing.
So, here's what has to happen:
For each record that qualifies for import we check and see if a record exists for them in the Customer table. The fields to match are [Column 1] from the flat file (remember, no headers), and [ConsultantID] in the Customer table. If they exist, the record is updated with whatever information is in the flat file. If they don't exist a new record is inserted. We then do the same thing for the Address table. The records in the Address table are attached to the Customer table by a FK [CustomerID] which is the PK of the Customer table.
Now, what I've tried so far is to use SSIS to import the flat file into a table in the database and do the synch for just the user data only (no addresses) using a stored procedure. The problem is that the stored procedure takes over half an hour to run which seems excessive considering the relatively small number of records. When I throw in the address data as well, the whole process could very well take over an hour. I don't know if I need to continue down my current path and just accept the fact that it's a long-running operation, or if the whole thing should be done in SSIS (hopefully speeding things up), which I would need your help with. I'm pretty confident in my abilities with SQL Server overall, but SSIS is still relatively new to me beyond simple imports and exports.
So, let's get started. Fire away.
Start Free Trial