This article describes when, how, and why to use Insert/Update rather than Update/Insert, and vice versa. For more information about available operations, see
Operations in the Help.
Insert/UpdateThe Insert/Update operation:
- Tries to insert a record.
- If the insert fails, tries to update a record in the target.
An insert to a target can fail for a number of reasons:
- Violation of a unique index on the table (attempting to insert a duplicate row)
- Attempting to insert a NULL value into a field that does not allow nulls
- Attempting to insert a value that would violate a field’s data type, for example trying to insert the word “One” into a field that is type integer and only allows numeric characters.
For this discussion, we assume that Insert failures occur due to Reason #1, violation of a unique index on the table, attempting to insert a duplicate row.
If you decide to use the Insert/Update operation to prevent duplicate records in the database table, the database is effectively in control. You are relying on the unique index on the table in the database to prevent a duplicate row from being inserted.
If the insert fails, due to the violation of the database unique index, the control passes back to your DTS file, and TIBCO Scribe® Insight attempts to find and update the existing record based on lookup criteria defined in the DTS file.
When this happens, an Insert/Update operation prevents duplicates only if the target has a unique index that supports our criteria for uniqueness.
For example, if you have a table in the target named company with the following fields:
CompId |
Company_Name (unique index field)
| Address_Line_1 (unique index field) | Address_Line_2 | City | Zip_Code (unique index field) |
---|
You decide that the criteria for uniqueness is the combination of Company Name, Address Line 1, and Zip Code. If this unique index does not already exist in the company table for the fields Company_ Name, Address_Line_1, and Zip_Code, you need to create it.
Advantages
Insert/Update is usually faster for an initial data load than Update/Insert. An initial data load usually loads to a database with few, if any, existing records, which means the first attempted operation, the Insert, is successful more often. Therefore, most of the Inserts succeed and TIBCO Scribe® Insight does not have to attempt many Updates.
Disadvantages
Since using Insert/Update requires creating a special unique index, additional overhead is added to the database table. For this reason, best practice is not to use Insert/Update for ongoing data integration tasks.
MS SQL Server “Update Statistics” Command
With the Insert/Update operation, there are usually some Updates performed. The lookup links for the Update typically match the fields included in the “special” unique index. Using the previous table as an example, if an attempt to Insert a record such as the following fails:
Company_Name | Address Line 1 | Zip Code |
---|
ABC Software, Corp. | 1 Main Street | 02034 |
Then the Update would need to use the following as its Lookup Criteria:
- Company = “ABC Software, Corp.” AND
- Address_Line_1 = “1 Main Street” AND
- Zip_code = “02034”
In addition to enforcing uniqueness, a unique index can make lookups and updates against the database faster. The MS SQL Server Update Statistics command makes using an index more efficient by causing that index to be smarter at finding the data. This is especially true when doing an initial load to an empty table. As more data is added to the table, the indexes still “think” the table is empty. Running the Update Statistics command forces the index to review the table and recognize the data, resulting in quicker updating and retrieval of data.
Special Considerations: Relational Tables Don’t Always Support Insert/Update Strategy
The previous example shows a company table with company and address data fields in one table. However, a relational database could store company and address data in two distinct tables: company and address.
For example, the table layouts for a company / address table relationship might be:
Company Table |
---|
CompId | AddressId | Company_Name |
Address Table |
---|
AddressID | Address_Line_1 | Address_Line_2 | City | Zip_Code |
An index cannot span more than one table, therefore, if the criteria for uniqueness is still Company_ Name, \Address_Line_1, and Zip_Code, then you cannot use the Insert/Update strategy because not all the fields are in the same table. However, if the above condition exists, you can use Company_ Name, Address_Line_1, and Zip_Code as your criteria for uniqueness with the Update/Insert operation and a database view instead.
For information on table relationships in the target database, see
Maintaining Table Relationships In The Target Database.
Update/Insert
An Update/Insert operation:
- Tries to update a record using the lookup links in the DTS file.
- Inserts the new record if, and only if, the update fails.
An Update might fail for several reasons:
- The record does not exist in the table.
- The attempted update would violate a field’s datatype, for example trying to update a field that is of type integer with a string of characters like “One”.
For this discussion, assume that the update fails because the record does not exist in the table. If you decide to use the Update/Insert operation, you have control. You are specifying the criteria for uniqueness rather than relying on a database index to prevent a duplicate from being added to the table. You direct TIBCO Scribe® Insight to look for a specific record, and if found, to update that record. If the record is not found, then perform an insert.
Advantages
If a data migration or integration results in more updated target records than inserted records, the Update/Insert operation performs better. If an update is successful, TIBCO Scribe® Insight Workbench continues processing without trying an insert.
Using A Database View
As described above in “Special Considerations: Relational Tables Don’t Always Support Insert/Update Strategy,” sometimes the Update/Insert strategy is the only option. When the fields that make up the criteria for uniqueness are in two different tables, a database view must be created.
A database view acts as a filter for one or more database tables. A view is a precompiled select statement that can join two or more tables and combine fields from two or more tables into a single virtual table. To facilitate the lookup that is needed for this example, a database view could be created using the following syntax:
Create View company_lookup as Select c.company_name, a.address_line_1, a.zip_code
From company c, Address a, Where a.addressid = c.companyid
This creates a view named company_lookup, whose layout is:
Company_Name | Address_Line_1 | Zip_Code |
The company_lookup view joins all the fields that are needed to satisfy our criteria for uniqueness.
Disadvantages
Database views add another layer of abstraction over database tables. When working with a view, the database must constantly evaluate the view definition or precompiled select statement. Thus database activity against a view, or virtual table, is slower than using a real table.