Flat File Connector

Version 20.0.7488


Flat File Connector



Flat File Connectors convert flat file formats into XML and vice versa.

Overview

Each Flat File Connector is configured with a specific flat file format to convert to XML or to generate out of XML. The Flat File Connector has two primary modes:

  • Position Delimited
  • Character Delimited

For Position Delimited flat files, the connector is configured with an arbitrary set of field names, indices (i.e. positions), and lengths indicating where data appears in each line of the flat file.

For Character Delimited flat files, the connector is configured with the character that separates field values in the flat file.

More details on configuring the flat file format can be found in the Defining the Flat File Format section.

The Flat File Connector supports defining multiple different types of lines in the flat file. For example, a flat file may have a ‘header’ line representing Purchase Order data, and multiple ‘item’ lines representing the Line Items in the order.

The key to defining multiple line types is to specify the Control Field; the control field value is what determines the type of a specific line in the flat file data (for example, a header line might have a control field value of ‘HEAD’ while an item line has a control field value of ‘ITEM’). More details on configuring multiple line types can be found in the Multiple Line Types section.

After the flat file format is configured, the connector will convert files that match this format into XML. The resulting XML structure is explained in the XML Format section. The Flat File Connector can also translates XML that fits this structure into a flat file according to the defined flat file format.

Some flat files have implied hierarchical relationships between different line types. For information on preserving these hierarchical relationships when converting the flat file to XML, please see the Multi-Line Hierarchy section.

Connector Configuration

Settings Tab

Configuration

Settings related to the core configuration of the connector.

  • Connector Id The static name of the connector. All connector-specific files are held in a folder by the same name within the Data Directory.
  • Connector Description An optional field to provide free-form description of the connector and its role in the flow.
  • File Type Position Delimited - the fields in flat file appear at a specific position within each line.
    Character Delimited - the fields in the flat file are separated by a specific character, for example comma-separated values.
  • Delimiter If File Type is set to Character Delimited, this is the character that separates individual fields in the flat file.

Control Field

Settings related to the Control Field, which determines the different types of lines defined in the flat file format.

  • Multi-line Mode Whether the flat file format includes multiple types of lines. More information can be found in the Multiple Line Types section.
  • Start Index If File Type is Position Delimited and Multi-line Mode is enabled, this value is the index in the line where the Control Field begins. For example, if the first field in a line defines the type of line (i.e. the first field is the control field), then the Start Index would be 0.
  • Field Index If File Type is Character Delimited and Multi-line Mode is enabled, this value is the index of the Control Field in the line (indices start at 0). For example, if if the second field of a line defines the type of line (i.e. the second field is the control field), then the Field Index would be 1.
  • Generate Field/Line Type Names If File Type is Character Delimited and Multi-line Mode is enabled, this setting provides the option to not specify the names and indexes of the fields and line types within the flat file. When this setting is enabled, the connector will auto-generate XML element names for fields and line types that are not explicitly defined in the Line Types section below.
  • Column Headers Present If Multi-line Mode is disabled, this determines whether the first line of a flat file should be interpreted as column headers (i.e. the names of each field rather than actual data). Similarly, enabling this setting causes the connector to generate a header row when converting from XML to flat file.
  • Use Auto Generated Column Names If File Type is Character Delimited and Multi-line Mode is disabled, this determines whether the connector will generate generic field names. Keep this setting disabled to manually specify the field names in the Line Type section.

Line Types

This section allows for defining the field names and positions (if Line Type is Position Delimited) in the flat file format. More details on defining the flat file format can be found in the Defining the Flat File Format section.

Multiple line types can be defined if Multi-line Mode is enabled by using the Add Line Type button. Each line type has a Control Field Value, which is used to identify the line type. For example, a header line may have the control field value of ‘HEAD’ while an item line has the control field value of ‘ITEM’.

If File Type is set to Character Delimited, Multi-line Mode is enabled, and Generate Field/Line Type Names is enabled, it is not necessary to provide names and indexes for all of the fields or line types present in the flat file. In this case, the connector will auto-generate XML element names for any unspecified fields or line types.

Advanced

  • Padding Character When creating a flat file and the field value does not fill the entire field length, this character will be used to fill the rest of the field.
  • Invalid XML Name Prefix Some field names are not valid names for XML elements (for example, fields that begin with a number like ‘123ABC’), so a prefix must be applied to generate XML from the flat file. Similarly, when translating from XML to flat file, the connector will look for this prefix and remove it.
  • Log Messages Whether the log entry for a processed file will include a copy of the file itself.
  • Save to Sent Folder Whether files processed by the connector should be copied to the Sent folder for the connector.

  • Nest Line Types This setting is only relevant when converting flat files to XML, and when there are multiple line types in the flat file. When enabled, the connector will add hierarchy to the resulting XML based on the control fields of the flat file lines (i.e. the fields that identify the type of line). More information on this automatic hierarchy can be found in Multi-Line Hierarchy section.
  • Pad on EOL By default, the connector will throw an error if it encounters an unexpected end-of-line. When enabled, this setting tells the connector to pad out the remainder of the line instead of throwing an error.

Local Folders

Settings that determine the folder on disk that files will be processed from, and where they will be placed after processing.

  • Input Folder (Send) The connector can process files placed in this folder. If Send Automation is enabled, the connector will automatically poll this location for files to process.
  • Output Folder (Receive) After the connector finishes processing a file, the result will be placed in this folder. If the connector is connected to another connector in the flow, files will not remain here and will instead be passed along to the Input/Send folder for the connected connector.
  • Processed Folder (Sent) After processing a file, the connector will place a copy of the processed file in this folder if Save to Sent Folder is enabled. This copy of the file will not be passed along to the next connector in the flow.

Automation Settings

  • Send Whether files arriving at the connector will automatically be processed.

Defining the Flat File Format

The first step in configuring a Flat File Connector is defining the format of the flat file. This section describes formats that have a single line type, or in other words, each line in the flat file will have the same set of fields. For flat files with multiple different line types, please see the Multiple Line Types section.

Character Delimited Single-line Format

For Character Delimited flat files, defining the format is simple: specify the character that separates different fields in the flat file via the Delimiter property.

The Column headers present field indicates whether the first line of the flat file is a header line; in other words, it contains the names of fields rather than actual data. If these column headers are present, the connector will use the header names as the names of XML elements in the resulting translated XML. Similarly, the connector will use the XML element names to generate a header row when translating from XML to flat file.

If column headers are not present, the connector supports manually specifying the names of each field by adding the fields in the Line Type section. These field names are applied in index order, meaning that the first entry in Line Type will be the name of the first field in the flat file line, and so on.

The connector can also auto-generate generic field names by enabling Use auto generated field names.

Position Delimited Single-line Format

For Position Delimited flat files, defining the format requires specifying the position of each field in the format. The Line Type section in the connector settings provides an interface for adding an arbitrary number of fields that are present in each line of the flat file. Each field must be identified by a name and the position in the file it appears.

The Column headers present field indicates whether the first line of the flat file is a header line; in other words, it contains the names of fields rather than actual data. The field names must still be configured in the Line Type section, and this setting simply helps ensure that a header row is not interpreted as real data.

Multiple Line Types

If the flat file format contains multiple types of lines, the Multi-line Mode property should be enabled. The field in the flat file that identifies the line type is known as the Control Field.

Character Delimited Multi-line Format

When the File Type is Character Delimited, the Field Index setting determines where the Control Field appears in each line of the flat file. This index starts at 0, meaning that if the Control Field is the 5th value in the line, the Field Index should be 4.

For each possible value that might appear in the Control Field, define a new line type by clicking the Add Line Type button within the Line Types section of the connector settings. The value that identifies the line type should be set in the Control Field Value for that line.

Once each possible line type has been added and identified via a specific Control Field Value, the fields that will appear in each line type should be specified in index order.

If Generate Field/Line Type Names is enabled, it is not necessary to provide names and indexes for all of the fields or line types present in the flat file (only the Control Field is required). In this case, the connector will auto-generate XML element names for any unspecified fields or line types.

Position Delimited Multi-line Format

When the File Type is Position Delimited, the Start Index setting determines the position where the Control Field appears (begins) in each line of the flat file. This index starts at 0, meaning that if the Control Field begins at the 15th character in the line, the Field Index should be 14.

For each possible value that might appear in the Control Field, define a new line type by clicking the Add Line Type button. The value that identifies the line type should be set in the Control Field Value for that line.

Once each possible line type has been added and identified via a specific Control Field Value, the fields that will appear in each line type should be specified along with the position in the line that they appear (begin).

Multi-line Example

For example, say that a flat file contains two types of lines, a shipment line type and a package line type. The shipment line contains information on the date, time, and addresses of shipment, and the package line type contains information on the items being shipped.

The shipment line type may have a Control Field Value of ‘SHIP’, and the package line type may have a Control Field Value of ‘PCKG’. The first field in each line is either ‘SHIP’ or ‘PCKG’ to indicate what type of line it is.

To handle this case, the Multi-line mode should be enabled, and the Field Index (or Start Index) should be set to 0 to indicate that the Control Field is the first field in the line. Then, there should be two line types configured in the Line Types section; one with a Control Field Value of ‘SHIP’, which contains each field in a shipment line (e.g. ShipDate, DeliveryDate, ShipToAddress, etc), and one with a Control Field Value of ‘PCGK’, which contains each field in a package line (e.g. ItemName, ItemWeight, etc).

XML Format

After a flat file is converted to XML, the result has the following XML structure:

  • An Items element at the root of the document
  • Each line in the flat file has an element named the same as the Control Field Value for that line (or ‘row’ if no Control Field Value is defined)
  • Each field in the row is a child element of the Control Field Value element

For example, if a flat file has ‘SHIP’ and ‘PCKG’ lines, then the XML output would look similar to this:

<Items>
  <SHIP>
    <ShipmentId>12B992</ShipmentId>
    <Date>20200228</Date>
    <ShipTo>14 Wallaby Way</ShipTo>
  </SHIP>
  <PCKG>
    <ShipmentId>12B992</ShipmentId>
    <ItemName>Goggles</ItemName>
    <ItemWeight>3.98</ItemWeight>
  </PCKG>
  <PCKG>
    <ShipmentId>12B992</ShipmentId>
    <ItemName>Fins</ItemName>
    <ItemWeight>1.07</ItemWeight>
  </PCKG>
</Items>

To convert an XML file into a flat file, the XML input must match the above structure (this includes the restriction that the field names must match the defined fields in the connector configuration).

When converting XML into a flat file, the connector will create a new line in the resulting flat file for each row element in the input XML. For each child of the row element, the connector will match the child to a Field Name in the connector configuration, and place the value for that element at the appropriate Field Index.

Multi-Line Hierarchy

Flat files with multiple line types often have an implied hierarchical relationship between the line types. For example, one line might contain data for a Purchase Order (e.g. customer name, order date, etc), while the following lines represent each line item in that order (e.g. item name, item quantity, etc). The line items “belong” to the order, creating a hierarchical relationship.

These relationships are sometimes referred to as “Master-Detail” relationships; meaning the first line type (Purchase Order) is the “master,” and the following lines (line items) are “details” about that master line. In XML, this is often represented as a “Parent-Child” relationship.

The Flat File Connector can preserve these hierarchical relationships when translating flat files to XML. If the Nest Line Types setting in the Advanced tab is enabled, the connector will automatically indent “detail” line types so that they are children of “master” line types.

The following is a simple example of translating a hierarchical flat file with Nest Line Types set to true. The section after the example will explain the logic the connector uses to determine the hierarchical relationships.

Multi-Line Hierarchy Example

Take as example input the following flat file:

M,CustomerA,05022020,true
D,itemA,4,2.09
D,itemB,9,15.23
M,CustomerB,05032020,false
D,itemC,1,5.99
D,itemE,1,3.99

There are two line types in this flat file, M and D:

  • M, or Master, contains information for an order: customer name, order date, and a “high value customer” flag.
  • D, or Detail, contains line item information for the preceeding order: item name, item quantity, item price.

Since the D line type “belongs” to the M line type in a hierarchical relationship, it may be desirable to preserve this hierarchy when translating the flat file to XML. To see what this looks like, take the following example output when the Nest Line Types setting is enabled and the above flat file is translated to XML:

<Items>
  <M>
    <CustomerName>CustomerA</CustomerName>
    <OrderDate>05022020</OrderDate>
    <HighValueCustomer>true</HighValueCustomer>
    <D>
      <ItemName>itemA</ItemName>
      <ItemQuantity>4</ItemQuantity>
      <ItemPrice>2.09</ItemPrice>
    </D>
    <D>
      <ItemName>itemB</ItemName>
      <ItemQuantity>9</ItemQuantity>
      <ItemPrice>15.23</ItemPrice>
    </D>
  </M>
  <M>
    <CustomerName>CustomerB</CustomerName>
    <OrderDate>05032020</OrderDate>
    <HighValueCustomer>false</HighValueCustomer>
    <D>
      <ItemName>itemC</ItemName>
      <ItemQuantity>1</ItemQuantity>
      <ItemPrice>5.99</ItemPrice>
    </D>
    <D>
      <ItemName>itemE</ItemName>
      <ItemQuantity>1</ItemQuantity>
      <ItemPrice>3.99</ItemPrice>
    </D>
  </M>
</Items>

Notice that the D records are children of the M records, which accurately reflects the hierarchy present in the source data.

Multi-Line Hierarchy Logic

The Flat File Connector uses the order of lines in the source flat file to determine the hierarchy between line types. When Nest Line Types is enabled, the connector follows the following logic:

  • The first line type the connector encounters is always treated as the top level of hierarchy (i.e. it is not indented in the resulting XML)
  • After the first line type, when the connector encounters a new line type it assumes this line type is one level of hierarchy below the previous line type (i.e. it “belongs” to the previous line and is therefore indented in the resulting XML)
  • When the connector encounters a line type it has encountered before, it returns to that line type’s level of hierarchy, closing all XML elements at hierarchy levels equal-to or below this level.

Thus, the source flat file must have lines in order according to the desired hierarchy.