The Sysadmin Notebook  

Sitemap

SQL Server Integration Services

SSIS for Extraction, Transform, Load Operations

Contents

SQL Server Integration Services is Microsoft's Extract, Transform and Load (ETL) platform, to manage the movement of data from one data source to another. SSIS in SQL Server 2008 replaces DTS (Data Transformation Services) that came with earlier versions of SQL Server.

SSIS Packages are collections of tasks with a defined workflow and are normally built using the Business Intelligence Development Studio (BIDS). Packages are built by combining tasks, Data Flows, transformations, event handlers, containers and variables, and are stored in XML format with a '.dtsx' extension. The Import and Export wizard can also be used to build rudimentary SSIS packages.

Import and Export Wizard

Top Bottom

The Import and Export Wizard provides the quickest way to get started building SSIS packages. The Wizard can be accessed:

  1. from SQL Server Management Studio, by right-clicking a database and choosing the Import or Export options
  2. from BIDS, by right-clicking the SSIS Packages and selecting the 'SSIS Import and Export Wizard'
  3. from the Start Menu, and choosing the Import and Export data shortcut under the SQL Server 2008 menu
  4. from the command prompt, by typing dtswizard.exe

The Import and Export Wizard will take you through the steps of defining a data source and destination and managing the mapping of fields between the two, to create and execute an SSIS package. The created package can be dropped into a BIDS project for further refinement

BIDS

Top Bottom

BIDS is the Visual Studio development environment for SQL Server Projects, and provides templates for creating projects for Analysis Services, Reporting Services and Integration Services. There are two Integration Services templates available: Integration Services Project and Integration Services Connections. The latter template creates a blank project with just the connections defined.

Various projects can be combined in a solution meeting specific requirements. The Solution Explorer window in BIDS displays the projects associated to the solution. New projects can be added to the solution by right-clicking the solution name and selecting the 'Add' option. New items can be added to projects from the project's context menu.

File extensions for the files associated with solutions and projects are:

The Package Designer window in BIDS contains four tabs used in creating an SSIS Package for Control Flow, Data Flow, Event Handlers and Package Explorer.

Management Studio

Top Bottom

Management Studio is the tool for administrators to monitor, execute, secure and update packages. To connect to Integration Services, open Management Studio and select Connect -> Integration Services from the Object Browser

Typical ETL Package

Top Bottom

A typical ETL package might begin with an 'Execute SQL Task' in the Control Flow, to empty or drop a table in the target database. The data flow task can be set as a dependancy of this initial Control Flow task. The Data Flow would begin by defining a connection manager for the source data. A Data Flow Source can then be configured to use this connection.

Various transformations might be needed before attempting to load the data into the target database. A Derived Column Transformation can be used to tidy up data in a particular column using expressions. The derived column can either be added to the data stream as a new column or as a replacement for the existing column. A Conditional Split transformation can be used to redirect data that matches a certain conditions, for instance where a column contains NULL data. The redirected data might be fixed using a Lookup transformation or a Derived Column transformation. Those rows that cannot be fixed can be sent to an error log defined as a Flat File Destination Connection. The errors can be passed through an Audit transformation to add any desired audit fields. Those rows that were fixed successfully can be joined back to the main data flow using a Union All transformation. Conversion tasks can be applied to ensure that data in the data flow are the correct type for the target table, before connecting to a Data Flow Destination.

If loading multiple source files of the same format to a single table, the whole data flow can be placed in a Foreach Loop container. In this instance, the Foreach Loop will be configured with an enumerator of 'Foreach File'. Path and filename wildcard will be set in the Collection page. Add a new variable on the variables page to hold the value of the file's name for each iteration.

With the Foreach Loop container defined, the source file connection manager will need to be updated to use this variable for the source file. Use the properties page for the source file connection, and select the Expressions property. The resulting dialogue will allow you to map the filename to the variable defined in the Foreach Loop.

Having successfully processed all the files in a particular directory, you may want to add a File System task to the Foreach Loop container to move all successfully processed files to an archive directory, so they do not get processed next time the package runs. The File System task operation should be set to 'Move File', and the SourceConnection property will be set to the Source File Connection Manager, to pick-up the current filename in the Foreach Loop. A new Connection Manager will need to be defined for the destination folder.