The Sysadmin Notebook  

Sitemap

Data Flow Tasks

Transforming Data from Source to Destination

Contents

Each task in the Control Flow of an SSIS package has a task editor, except the Data Flow task, which has its own tab, where various Data Flow tasks can be combined to configure the actual data flow. Data Flows consist of:

  1. Sources
  2. Transformations
  3. Destinations

The tasks used to move from data from a source to a destination are linked by paths connecting the output from one task to the input of another. Each path can be configured with a Data Viewer, that can be used in the development environment to visualise data at each stage of the Data Flow. To add a Data Viewer to a path, right-click the path and choose 'Configure Data Flow Path Editor'. Then click 'Add' to open the 'Configure Data Viewer' dialogue.

The Connection Manager pane shows active connections to sources and destination data containers. Creating a Connection via the solution explorer makes the connection shared across all packages in the solution. Connections for the current package only, are created via the Connection Manager Pane in the current packages Data Flow tab.

Data Flow Sources

Top Bottom

Define the source for data for the data flow. There are a variety of Data Flow source types, with configuration options appropriate to the source of the data:

OLE DB Source

Top Bottom

Provides access to data sources from any OLE DB compliant database. Once a connection is defined for an OLE DB Source task, the data access mode can be set to:

Using an SQL command instead of a Table or View, allows you to filter the selected columns and rows that are retrieved from the datasource, and thus reduces network traffic and filtering overhead on the client-side. Using a Table or View as the data source, returns all the data to the the client which can then be filtered on the columns page of the data source task editor. Using SQL Commands also allows the use of parameter holders, and the data source task editor will allow you to map the parameter place holders to package/task variables. The Error Output page on the task editor allows you to specify how to handle rows with errors or truncated values. The options are:

Redirect
Will send the row in error through a different output path
Ignore
Will ignore the error and set the field value to NULL
Fail
Will cause the task to return a FAIL execution status

Excel Spreadsheet Source

Top Bottom

Treats an Excel Workbook as a database and each sheet as a table. Because of the mismatch in datatypes between Excel columns and SQL Server datatypes, conversion transformation may be necessary for Excel data sources

Flat File Source

Top Bottom

Can handle fixed-width, delimited or ragged-right data-sources. The flat file source is configured from the Connection Manager, and any changes to the source definition will require the Data Flow task to be refreshed to reflect the changes.

Other File Source

Top Bottom
Raw File sources
a specialised version of Flat File sources, in a format specially optimised for use with SSIS
XML
for handling XML Data sources
ADO.NET
an ADO.NET data source handler

Data Flow Destinations

Top Bottom

Data Flow Destinations are configured similarly to Data Flow Sources, with the additions of a Mapping page used to map input (External Columns) to output (Output Columns). Data Flow Destinations can not be configured until an input path has been assigned to the task. Data Flow Destinations include:

Data Flow Transformations

Top Bottom

Transformations tasks sit between source and destination tasks, and are used to apply transformations to the data flow.

Aggregate Transformation

Top Bottom

Aggregates data by applying a GROUP BY expression to an input column, and then applying aggregate functions to other columns to define the output stream for the transformation.

Available aggregate functions include:

Audit Transformation

Top Bottom

Adds auditing information regarding SSIS package to the output data flow. Available auditing fields to add are:

Cache Transformation

Top Bottom

Used to load a cache file into the data flow for later use in a lookup transformation

Character Map Transformation

Top Bottom

Performs character transformation in the Data Flow

Conditional Split

Top Bottom

Used to send data from a single input path to multiple output paths based on one or more conditions written in SSIS expression language. Each condition or case is given a name, corresponding to the name of the transformations output paths. A default output path is also created, for any rows in the input flow that do not match any of the defined conditions. The various output paths can be brought together again with a Union All transformation or sent to different Data Flow Destinations.

Example Expressions for Conditional Split Transformation
Expression Description
customer_category == "NHS" Selects records where the customer_category field value is "NHS"
customer_category == "NHS" || customer_category == "PP" Selects records where the customer_category field value is either "NHS" or "PP"
customer_category == "NHS" || org_category == "NHS" Selects records where the customer_category field value is "NHS" or the org_category field value is "NHS"
(customer_category == "NHS" || customer_category == "PP") && org_category == "NHS" Selects records where the customer_category field value is either "NHS" or "PP" and the org_category field value is "NHS"

Copy Column Transformation

Top Bottom

Adds a copy of a column from the input path with a new column name to the output path

Data Conversion Transformation

Top Bottom

Allows you to convert column data from one type to another. Similar in functionality to the CAST or CONVERT TSQL functions. The Data Conversion transformation results in the creation of an additional field in the pipeline representing the data after conversion.

Data Mining Query

Top Bottom

Used to fill gaps or predict values for new columns based on the input data stream

Derived Column Transformation

Top Bottom

Creates a new column based on the value from an existing column using SSIS expressions. For example, the ISNULL operator can be used with the ternary operator to fill in blank fields. Alternatively SUBSTRING or TRIM functions can be used to remove blank spaces or truncate large fields. The derived column can be used either to replace the original column in the output stream, or added as a new column. Cast operators are available in the expression, to ensure the data type of the derived column will be able to match its final destination type.

Example Expressions for Derived Column Transformation
Expression Description
(DT_STR,5,1252)"NHS" Adds the string value "NHS" to the defined column in the dataflow as a 5 character string value
(DT_STR,1,1252)3 Adds the string value "3" to the defined column in the dataflow as a 1 character string value
"B" + SUBSTRING(ordercat,1,1) Returns a string consisting of a "B" followed by the first character of the ordercat field value
(DT_DBDATE)GETDATE() Adds the current date value to the defined column in the dataflow as a database date type
LEN(ordertime) == 5 ? ordertime : (DT_STR,5,1252)NULL(DT_STR,5,1252) Checks if the ordertime field value is 5 characters long. If it is then this value is used. If not, then a NULL string is used. Note that although the expression "NULL(DT_STR, 5, 1252)" is supposed to return a NULL string, the return value still needs to be cast as a string using "(DT_STR, 5, 1252)"
LEN(ordertime) == 4 ? SUBSTRING(ordertime,1,2) + ":" + SUBSTRING(ordertime,3,4) : (DT_STR,5,1252)NULL(DT_STR,5,1252) If the ordertime field value is only 4 characters long, then insert a colon between the 2 and 3 character. Otherwise return a NULL string
RTRIM(SUBSTRING(category,1,4)) Selects the first four characters of the "category" field, then removes any trailing spaces

Export Column Transformation

Top Bottom

Exports data from the data flow to a file. Can be used to create files from BLOB data in the input flow.

Fuzzy Grouping Transformation

Top Bottom

Used to find redundant data in dimension tables for data cleanup

Fuzzy Lookup Transformation

Top Bottom

Used to correct poor data quality during an ETL load

Import Column Transformation

Top Bottom

Used to import files into BLOB fields

Lookup Transformation

Top Bottom

Performs a lookup using columns from the input flow and a lookup table to retrieve additional columns from the lookup table for the output flow, either replacing input columns or adding new columns to the output.

The lookup table used can be specified as a table, view or SQL query. A join is specified on the columns tab between the input flow and the lookup table. Then the additional columns to add to the pipeline are selected. Failed lookups can force the component to fail, the row to be redirected or the lookup columns to be set to NULL.

Rows that fail the Lookup can be sent along a different output path by setting the value of the 'Specify how to handle rows with no matching entries' property. Thus it is possible to use the lookup to split the input data flow, without actually adding new columns of data, based on whether a matching record is found in the lookup or not.

Merge Transformation

Top Bottom

Merges data from two input paths to a single output path. Useful for joining error paths back to the main flow once the errors have been dealth with or for loading data from two distinct sources. Data from both sources must be sorted beforehand, must have matching metadata and only two input paths can be specified

Merge Join Transformation

Top Bottom

Performs a join on two inputs to produce a single output stream

Multicast

Top Bottom

Sends a single input data flow to multiple output paths. No filtering is available for the output data flow and each output path contains the same data flow content.

OLE DB Command Transformation

Top Bottom

Used to execute an SQL statement for each row in an input stream. The statement can be a stored procedure call or a prepared SQL statement. The statement will typically contain one or more parameter markings, and the parameters are then mapped to the appropriate fields in the input flow and executed for each row in the data flow.

Percentage and Row Sampling Transformation

Top Bottom

Allows selection of a sample of an input stream to produce two output streams corresponding to selected and non-selected data

Pivot Transformation

Top Bottom

Produces cross-tabulated column data by summarising row data from the input stream

Unpivot Transformation

Top Bottom

Unpivots columns into rows

Row Count Transformation

Top Bottom

Used to count rows from the input stream

Script Component Transformation

Top Bottom

Use custom scripts for sources, transformations and destinations. The first step in configuring a Script Component task is to identify which type of component it is:

Slowly Changing Dimension Transformation

Top Bottom

Designed to handle propagation of attribute changes in lookup tables

Sort Transformation

Top Bottom

Used to sort data stream by specified columns

Term Extraction Transformation

Top Bottom

Used to mine text data for word or phrase frequency

Term Lookup Transformation

Top Bottom

Does Term Extraction combined with mapping from a lookup table

Union All Transformation

Top Bottom

Takes output from multiple sources and combines them into a single resultset. Used to bring together different paths in the data flow.