XML Map Connector
The XML Map Connector provides a visual designer-driven way to transform XML data from one structure to another.
XML is the primary format that ArcESB uses to retrieve and manipulate data. Files are often converted to XML when they enter the Arc flow, and generated from XML when they leave the Arc flow. Transforming between XML structures is therefore critical to many use cases.
The XML Map connector provides a flexible and intuitive interface for transforming XML structures. First, a source template file (a file containing the XML structure for all input documents) and a destination template file (a file containing the XML structure that should be output by the connector) should be uploaded to the connector. Please see the Generating Template Files section for details.
After the source and destination template files have been uploaded, the Mapping designer will be populated with the source and destination XML structures. Then, elements from the source can be dragged and dropped onto elements in the destination to establish the mapping relationship. Please see the Using the Designer section for details.
The How-To Guides section of the documentation includes several guides for EDI mapping flows. These guides cover more than the XML Map Connector alone, but they can serve as additional mapping examples.
This section contains all of the configurable connector properties.
Settings related to the core operation of the connector.
- Connector Id: The static name of the connector. All connector-specific files are held in a folder by the same name within the Data Directory.
- Connector Description: An optional field to provide a free-form description of the connector and its role in the flow.
- Source File: A file that represents the XML structure of input documents. Any files processed by the connector should have a matching XML structure. Elements in input documents can be repeated in ways that differ from the Source File as long as the structure of nested elements is the same.
- Destination File: A file that represents the structure of output documents. Files produced by the connector will have a matching XML structure. Elements in the output document may be repeated or omitted in ways that differ from the Destination File, according to the mapping; however, the structure of nested elements will remain the same.
Defines the mapping relationship between input and output files. Please see Using the Designer for more details.
Settings related to the automatic processing of files by the connector.
- Send: Whether messages arriving at the connector will automatically be processed.
Settings that determine the folder on disk that files will be processed from, and where they will be placed after processing.
- Input Folder (Send): The connector can process files placed in this folder. If Send Automation is enabled, the connector will automatically poll this location for files to process.
- Output Folder (Receive): After the connector finishes processing a file, the result will be placed in this folder. If the connector is connected to another connector in the flow, files will not remain here and will instead be passed along to the Input/Send folder for the connected connector.
- Processed Folder (Sent): After processing a file, the connector will place a copy of the processed file in this folder if Save to Sent Folder is enabled. This copy of the file will not be passed along to the next connector in the flow.
Settings related to the allocation of resources to the connector.
- Max Workers: The maximum number of worker threads that will be consumed from the threadpool to process files on this connector. If set, overrides the default setting from the Profile tab.
- Max Files: The maximum number of files that will be processed by the connector each time worker threads are assigned to the connector. If set, overrides the default setting from the Profile tab.
Settings not included in the previous categories.
- Send Filter: A glob pattern filter that determines which files in the Send directory should be processed by the connector. Patterns will exclude matching files if the pattern is preceded by a minus sign (-).
Multiple patterns can be specified, comma-delimited, with later filters taking priority.
- Local File Scheme: A filemask for determining local file names as they are downloaded by the connector. The following macros may be used to reference contextual information:
%ConnectorId%, %Filename%, %FilenameNoExt%, %Ext%, %ShortDate%, %LongDate%, %RegexFilename:%, %DateFormat:%.
As an example: %FilenameNoExt%_%ShortDate%%Ext%
- Log Messages: Whether the log entry for a processed file will include a copy of the file itself.
- Save to Sent Folder: Whether files processed by the connector should be copied to the Sent folder for the connector.
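The Send Filter and Local File Scheme settings above can be modeled in a short Python sketch. This is not Arc's implementation: it uses Python's fnmatch as a stand-in for Arc's glob matching, and it assumes that the last matching pattern decides whether a file is processed, and that a file matching no pattern is processed only when every configured pattern is an exclusion. Only a subset of the macros is expanded, and the %ShortDate% value is passed in by the caller because its exact format is not described here.

```python
import os
from fnmatch import fnmatch


def passes_send_filter(filename: str, send_filter: str) -> bool:
    """Model of a comma-delimited Send Filter where later patterns take priority."""
    patterns = [p.strip() for p in send_filter.split(",") if p.strip()]
    # Assumption: with only exclusion patterns (or none), unmatched files are processed.
    include = all(p.startswith("-") for p in patterns)
    for pattern in patterns:
        negate = pattern.startswith("-")
        glob = pattern[1:] if negate else pattern
        if fnmatch(filename, glob):
            include = not negate  # the last matching pattern wins
    return include


def expand_local_file_scheme(scheme: str, connector_id: str,
                             filename: str, short_date: str) -> str:
    """Expand a subset of the Local File Scheme macros (no Regex/DateFormat)."""
    name_no_ext, ext = os.path.splitext(filename)  # ext keeps its leading dot
    macros = {
        "%ConnectorId%": connector_id,
        "%Filename%": filename,
        "%FilenameNoExt%": name_no_ext,
        "%Ext%": ext,
        "%ShortDate%": short_date,  # format chosen by the caller (assumption)
    }
    for macro, value in macros.items():
        scheme = scheme.replace(macro, value)
    return scheme


print(passes_send_filter("order_test.xml", "*.xml,-*_test.xml"))  # False
print(expand_local_file_scheme("%FilenameNoExt%_%ShortDate%%Ext%",
                               "XMLMap1", "invoice.xml", "2018-12-21"))
```

With the filter *.xml,-*_test.xml, order.xml is processed but order_test.xml is not, because the later exclusion pattern takes priority. The connector Id XMLMap1 and the date string are placeholders.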
Generating Template Files
The first step in any XML mapping is to upload template files representing the source and destination XML structures. These templates can be generated within Arc in several ways.
Template Files from Transformation Connectors
Transformation connectors like X12, EDIFACT, and CSV automatically convert documents into XML, so they can easily generate template XML files.
Upload Test File
EDI Connectors (X12, EDIFACT, etc) and CSV Connectors include a feature that automatically generates an XML representation of input files. In the Input tab of these connectors, click the More dropdown and select ‘Upload Test File’. Navigate to a sample file on disk that should be modeled as XML, and click OK.
Once a test file has been uploaded, connect the current connector to an XML Map Connector in the Flow (in either direction; inbound to the XML Map Connector or outbound from the XML Map Connector). The XML Map Connector will automatically detect this Test File structure and include it as an available Source or Destination File in the settings dropdown.
Manually Create a Test File
Test files can also be manually created. Simply send a sample file through the transformation connector (EDI Connectors should be explicitly configured as ‘EDI-to-XML’ mode) to generate an XML output file. Download that output file and upload it again as the Source or Destination File in an XML Map Connector.
Template Files from Database Connectors
Database Connectors like the SQL Server Connector, MySQL Connector, and CData Connector automatically generate XML representations of their Input and Output mappings. Similar to EDI and CSV Connectors, after an Input/Output Mapping is saved in a Database Connector, any connected XML Map Connectors can automatically detect these XML structures.
To use a database Input/Output Mapping as a Source/Destination File, simply follow these steps:
- In the Database Connector, establish a working connection to the database
- Create an Input or Output mapping and save changes (please see the documentation for the specific database connector for details)
- Connect an XML Map Connector to the Database Connector in the Flow, and save the Flow changes (the blue save icon in the bottom right)
- Find the Input or Output mapping as a Source or Destination File in the XML Map Connector settings
XML Map connectors typically sit between two other connectors in an Arc flow, and it is usually best to generate template files with the two connectors surrounding the XML Map connector. For example, consider a flow in which an XML Map connector maps X12 files to a database insert.
In this example, the X12 connector and the SQL Server connector should be used to generate the XML template files for the XML Map connector between them. More details on generating a template file from an X12 connector or SQL Server connector can be found in the subsections above.
Using the Designer
Once a template file has been set for both the Source File and the Destination File, the visual designer will populate with the complete document model for both the source and destination XML. These models can be traversed as XML trees.
Parent and Leaf Nodes
The XML tree in the visual designer has two types of nodes: Parent nodes (nodes with children but no value) and Leaf nodes (nodes with a value but no children). Parent nodes in the source can be dragged onto Parent nodes in the destination, and Leaf nodes in the source can be dragged onto Leaf nodes in the destination.
Dragging a source Parent onto a destination Parent will establish a Foreach relationship between the source and destination nodes: each occurrence of the source element will produce a corresponding destination element (including all of the destination element’s children). Once a Foreach relationship is established, a green xpath will appear in the destination; elements at this path in input XML will result in a new instance of the output node (and its children). In a more technical sense, a Foreach relationship instructs the connector to loop over a given xpath in the source and produce the mapped destination structure for each element it finds. The green xpath in the destination tree view is the xpath over which the connector will loop.
Dragging a source Leaf onto a destination Leaf will instruct the connector to populate the destination element with the value from the source element. After dragging-and-dropping onto the destination node, the xpath from which values will be read is displayed in the destination tree view.
The xpaths displayed in the destination tree view are either Absolute or Relative. Absolute xpaths begin with a slash (/) and describe the entire xpath in the source, beginning from the root of the document. Relative xpaths do not begin with a slash, and are relative to a Foreach loop set in a parent node. A relative xpath can be relative to multiple Foreach loops (as long as the element has multiple parents that each have a Foreach relationship mapped). To find the absolute xpath for any given relative xpath, simply concatenate each parent’s Foreach xpath, starting from the top of the document, until reaching the current node.
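The concatenation rule for relative xpaths can be sketched in Python. This only illustrates the rule described above and is not Arc code; it assumes the outermost Foreach xpath is absolute and each inner Foreach xpath is relative to the loop around it.

```python
def absolute_xpath(foreach_xpaths: list[str], relative_leaf: str) -> str:
    """Resolve a relative leaf xpath against its parents' Foreach xpaths.

    foreach_xpaths lists each parent's Foreach xpath from the top of the
    document down to the inner-most loop containing the leaf.
    """
    if relative_leaf.startswith("/"):
        return relative_leaf  # already absolute, nothing to concatenate
    parts = [p.strip("/") for p in foreach_xpaths] + [relative_leaf]
    return "/" + "/".join(parts)


print(absolute_xpath(["/OrderReport/WebOrder", "Line"], "ItemName"))
# /OrderReport/WebOrder/Line/ItemName
```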
Parent nodes (Foreach loops) should be mapped before Leaf nodes are mapped. Establishing the loop relationships requires an understanding of the source and destination XML structures: whenever a repeated element in the source should result in a repeated element in the destination, those elements should be mapped together in a Foreach relationship.
Within a Foreach loop, Leaf element xpaths are relative to the mapped Foreach xpath.
As a very simple example, consider the following source and destination XML, where a nested XML structure should be converted into a flat XML structure:
<Source>
  <customer>
    <name>
      <first></first>
      <last></last>
    </name>
    <address>
      <streetLine1></streetLine1>
      <streetLine2></streetLine2>
      <city></city>
      <zip></zip>
    </address>
  </customer>
</Source>

<Destination>
  <customerInfo>
    <firstName></firstName>
    <lastName></lastName>
    <addressLine1></addressLine1>
    <addressLine2></addressLine2>
    <city></city>
    <zip></zip>
  </customerInfo>
</Destination>
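Each repetition of the customer element in the source should result in a customerInfo element in the destination, so these Parent elements should be mapped together to form a Foreach relationship. Then, the mapping for each Leaf element is simple: first maps to firstName, last to lastName, streetLine1 and streetLine2 to addressLine1 and addressLine2, and city and zip to the destination elements of the same name.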
Note that the xpaths for the Leaf elements are relative to the xpath in the Parent element (the xpath that defines the Foreach relationship).
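To make the Foreach semantics concrete, the following Python sketch uses the standard library's xml.etree.ElementTree to reproduce what this mapping does: it loops over each customer element, builds one customerInfo element per iteration, and reads each Leaf value through an xpath relative to the current customer. It models the behavior only; it is not how the connector is implemented, and the sample names and addresses are made up.

```python
import xml.etree.ElementTree as ET

# Destination Leaf -> source xpath relative to the current <customer>.
LEAF_MAP = {
    "firstName": "name/first",
    "lastName": "name/last",
    "addressLine1": "address/streetLine1",
    "addressLine2": "address/streetLine2",
    "city": "address/city",
    "zip": "address/zip",
}


def map_customers(source_xml: str) -> ET.Element:
    source = ET.fromstring(source_xml)  # root element is <Source>
    destination = ET.Element("Destination")
    # Foreach: each /Source/customer produces one <customerInfo>.
    for customer in source.findall("customer"):
        info = ET.SubElement(destination, "customerInfo")
        for dest_tag, rel_xpath in LEAF_MAP.items():
            # Relative xpath evaluated against the current loop element;
            # findtext returns "" for an empty element.
            ET.SubElement(info, dest_tag).text = customer.findtext(rel_xpath, "")
    return destination


# Hypothetical input with two customers.
SAMPLE = """<Source>
  <customer><name><first>Ada</first><last>Lovelace</last></name>
    <address><streetLine1>12 Main St</streetLine1><streetLine2></streetLine2>
      <city>London</city><zip>10001</zip></address></customer>
  <customer><name><first>Alan</first><last>Turing</last></name>
    <address><streetLine1>1 Park Rd</streetLine1><streetLine2>Flat 2</streetLine2>
      <city>Wilmslow</city><zip>20002</zip></address></customer>
</Source>"""

result = map_customers(SAMPLE)
print(ET.tostring(result, encoding="unicode"))
```

Because the loop creates customerInfo once per customer, the output always has as many customerInfo groups as the input has customer groups, regardless of how many appear in the templates.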
Unnecessary Parent Nodes
When establishing a Foreach relationship, only a single instance of the mapped elements needs to exist in the Source and Destination file. In other words, the Foreach relationship takes care of ensuring that the number of output elements matches the number of corresponding input elements.
To make this clear, imagine in the above example that the Source File (i.e. the template for the input XML) had multiple sets of customer element groups. Establishing a single Foreach relationship between customer (Source) and customerInfo (Destination) ensures that the number of customerInfo element groups matches the number of customer element groups, for any input file. Since it only takes one customer element to establish this Foreach relationship, all other customer elements are irrelevant to the mapping and can be ignored (or deleted).
Similarly, if the Destination File in the above example had multiple customerInfo elements, all but one should be deleted. The Foreach relationship between customer and customerInfo would still ensure that the appropriate number of customerInfo element groups appeared in the XML output.
Mapping Multiple Loops
XML Mappings will often require multiple Foreach loop relationships within the same document. The principle for mapping loops remains the same: any repeated Parent elements in the source XML that should generate repeated elements in the destination XML should be mapped as a loop. Outer loops should be mapped before inner loops (in other words, start at the top of the XML structure and work down). All looping relationships should be mapped before any Leaf elements are mapped.
As a common example, consider mapping an incoming Purchase Order report to a destination database. Such a mapping contains two distinct element structures that may repeat (and thus each requires a Foreach relationship): (1) a single report may contain multiple individual orders, and (2) an individual order may contain multiple line items.
The source template structure may look like this:
<OrderReport>
  <WebOrder>
    <CustomerName>John Doe</CustomerName>
    <PurchaseDate>12/21/18</PurchaseDate>
    <Line>
      <ItemName>Hammer</ItemName>
      <ItemCost>1500</ItemCost>
      <ItemQuantity>1</ItemQuantity>
      <ItemDescription>Standard claw hammer</ItemDescription>
    </Line>
    <Line>
      <ItemName>Nail</ItemName>
      <ItemCost>10</ItemCost>
      <ItemQuantity>20</ItemQuantity>
      <ItemDescription>Ten penny nails</ItemDescription>
    </Line>
    <Subtotal>1700</Subtotal>
    <TaxPercent>4</TaxPercent>
  </WebOrder>
</OrderReport>
This example includes only one WebOrder element for brevity, but the mapping should handle cases where multiple WebOrder sections are included in the same OrderReport.
The output of this mapping should match the XML model of a database insert. The XML model of a database insert is created automatically by a Database Connector (like the MySQL Connector, SQLite Connector, CData Connector, etc), and the Template Files from Database Connectors section discusses how these XML models can be used as template files in the XML Map Connector.
Proper database design suggests that the data should be inserted into two separate tables, one for Orders and one for Line Items. Generating an appropriate Input Mapping for this approach may result in a template structure like this:
<Items>
  <Order>
    <FirstName></FirstName>
    <LastName></LastName>
    <Date></Date>
    <OrderLine>
      <Name></Name>
      <Desc></Desc>
      <Price></Price>
      <Amount></Amount>
    </OrderLine>
  </Order>
</Items>
In this example, the Order element represents an insertion into an Orders table, and the OrderLine element represents an insert into a Line Items table. Templates are generated with only one element representing each table, but the Foreach relationships established during the mapping will ensure that the appropriate number of inserts are created.
The other children of Order (FirstName, LastName, Date) represent columns in the Orders table, and the children of OrderLine (Name, Desc, Price, Amount) represent columns in the Line Items table.
Establishing the Looping Relationships
A WebOrder element in the Source should result in a new insert into the Orders table, so it should be dragged onto the Order element in the Destination. Similarly, a Line element in the Source should result in a new insert into the Line Items table, so it should be dragged onto the OrderLine element in the Destination.
After these two Foreach relationships are established, the destination tree view shows the WebOrder xpath on the Order element and the Line xpath (relative to WebOrder) on the OrderLine element.
Note that the second occurrence of the Line element in the Source can be ignored. This is explained in more detail in the Unnecessary Parent Nodes section.
After the loop relationships are established and any unnecessary destination elements are removed, the Leaf elements can be mapped to fill in the destination values.
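The nested loops can be modeled the same way in Python with xml.etree.ElementTree. In this sketch the outer loop over WebOrder produces one Order per order, and the inner loop over Line (relative to the current WebOrder) produces one OrderLine per line item. The Leaf mappings are assumptions chosen for illustration: CustomerName is split on its first space into FirstName and LastName, PurchaseDate fills Date, and ItemName, ItemDescription, ItemCost, and ItemQuantity fill Name, Desc, Price, and Amount.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<OrderReport><WebOrder>
  <CustomerName>John Doe</CustomerName><PurchaseDate>12/21/18</PurchaseDate>
  <Line><ItemName>Hammer</ItemName><ItemCost>1500</ItemCost>
    <ItemQuantity>1</ItemQuantity><ItemDescription>Standard claw hammer</ItemDescription></Line>
  <Line><ItemName>Nail</ItemName><ItemCost>10</ItemCost>
    <ItemQuantity>20</ItemQuantity><ItemDescription>Ten penny nails</ItemDescription></Line>
  <Subtotal>1700</Subtotal><TaxPercent>4</TaxPercent>
</WebOrder></OrderReport>"""

# Assumed Leaf mappings: destination tag -> source xpath relative to <Line>.
LINE_MAP = (("Name", "ItemName"), ("Desc", "ItemDescription"),
            ("Price", "ItemCost"), ("Amount", "ItemQuantity"))


def map_order_report(source_xml: str) -> ET.Element:
    report = ET.fromstring(source_xml)  # root element is <OrderReport>
    items = ET.Element("Items")
    # Outer Foreach (/OrderReport/WebOrder -> Order): one Orders insert per order.
    for web_order in report.findall("WebOrder"):
        order = ET.SubElement(items, "Order")
        # Assumption: split CustomerName on its first space.
        first, _, last = web_order.findtext("CustomerName", "").partition(" ")
        ET.SubElement(order, "FirstName").text = first
        ET.SubElement(order, "LastName").text = last
        ET.SubElement(order, "Date").text = web_order.findtext("PurchaseDate", "")
        # Inner Foreach (Line, relative to WebOrder -> OrderLine).
        for line in web_order.findall("Line"):
            order_line = ET.SubElement(order, "OrderLine")
            for dest_tag, rel_xpath in LINE_MAP:
                ET.SubElement(order_line, dest_tag).text = line.findtext(rel_xpath, "")
    return items


result = map_order_report(SAMPLE)
print(ET.tostring(result, encoding="unicode"))
```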
Using the Expression Editor
The expression editor supports modifying values as they are mapped from the source to the destination. This editor makes use of the powerful ArcScript language to format and dynamically generate content. To access the Expression Editor, select a node in the destination document and select the tablet and pencil icon to display the editor.
The editor displays a black panel containing the ArcScript expression used to render the result. When editing a node that already has been mapped to an element from the source XML, the expression will display the xpath representing this mapping. From here, edit the expression to manipulate the value, or include references to additional nodes in the source XML.
Any expression in square brackets is evaluated as a variable in ArcScript. In most situations, variable expressions include an xpath() evaluation of an element in the source document. Multiple bracketed expressions can be used to express multiple variables, either back-to-back or interspersed with literal characters (outside of square brackets).
For example, to combine the values at two different paths:
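<Customer>
  <First>Bruce</First>
  <Last>Wayne</Last>
</Customer>
A single expression can join the two values by placing two bracketed expressions side by side, separated by a literal space: [xpath('First')] [xpath('Last')]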
Formatters support manipulating the values returned at different xpaths. Formatters are separated by a pipe character (|) in the expression, and evaluated from left to right. For example:
[xpath('City') | toupper | substring(0,3)]
In this example, before the value at the City xpath is returned, all characters are converted to upper case, and then a substring containing the first three characters is returned as the result.
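Left-to-right formatter evaluation can be modeled in Python as a list of functions, each applied to the previous result. This is a sketch, not ArcScript: Raleigh is a hypothetical City value, and substring is modeled as (start, length), an assumption that makes no difference for substring(0,3) because a (start, end) reading also keeps the first three characters.

```python
from typing import Callable

Formatter = Callable[[str], str]


def toupper(value: str) -> str:
    return value.upper()


def substring(start: int, length: int) -> Formatter:
    # Modeled as (start, length); see the note above.
    def apply(value: str) -> str:
        return value[start:start + length]
    return apply


def evaluate(value: str, formatters: list[Formatter]) -> str:
    """Apply each formatter to the previous result, left to right."""
    for formatter in formatters:
        value = formatter(value)
    return value


# Mirrors [xpath('City') | toupper | substring(0,3)] for a hypothetical value.
print(evaluate("Raleigh", [toupper, substring(0, 3)]))  # RAL
```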
After selecting the Formatters tab in the Expression Editor, each formatter is displayed in a searchable list. A formatter can be added to the expression directly by clicking on the formatter from the list.
String manipulation is a common use case for the Expression Editor. Common string formatters include toupper, substring, and split.
For example, it may be necessary to split a name value from the input XML into two separate fields of the output XML.
In this case, the split formatter should be used. Its parameters are the character around which to split the string and the index of the resulting array that should be returned (indexes begin at 1):
[xpath(CustomerName) | split(' ', 1)]
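The 1-based indexing of split can be illustrated with a small Python function. It models the behavior described above rather than Arc's implementation; returning an empty string for an out-of-range index is an assumption, not documented Arc behavior.

```python
def split_formatter(value: str, separator: str, index: int) -> str:
    """Mimic split(separator, index), where index counts from 1."""
    parts = value.split(separator)
    if not 1 <= index <= len(parts):
        return ""  # assumption: out-of-range indexes yield an empty value
    return parts[index - 1]


print(split_formatter("John Doe", " ", 1))  # John
print(split_formatter("John Doe", " ", 2))  # Doe
```

Mapping FirstName with index 1 and LastName with index 2 splits a single CustomerName value into the two destination fields.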
The full list of string formatters can be found in the Formatters section.
Another common use case involves reformatting dates from the source document to the destination. This is supported by the todate formatter, which accepts two arguments: the format of the output and input dates. The following example converts a date in the form of 12/21/18 to a date in the form of Friday, 21 December, 2018:
[xpath(PurchaseDate) | todate(D, "MM/dd/yy")]
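Python's datetime.strptime and strftime can stand in for todate to show the same conversion. The patterns below are Python's, not Arc's: %m/%d/%y reads month/day/two-digit year, and the output layout is chosen to match the example in the text rather than derived from Arc's D format. Python's %A and %B names depend on the locale; the default C locale gives English names.

```python
from datetime import datetime


def todate_long(value: str) -> str:
    # Parse month/day/two-digit year, e.g. 12/21/18.
    parsed = datetime.strptime(value, "%m/%d/%y")
    # Lay out the result like the example: weekday, day, month name, year.
    return parsed.strftime("%A, %d %B, %Y")


print(todate_long("12/21/18"))  # Friday, 21 December, 2018
```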
Two additional formatters that are useful for date calculations are dateadd and datecompare, which can be used to add fixed periods of time to (or subtract them from) a date and to compare dates, respectively.
The full list of date formatters can be found in the Formatters section.
Math operations are useful for performing calculations on numerical values from the source XML. The following example converts cents to dollars, and ensures that the resulting value is a decimal value with two decimal places:
[xpath(ItemCost) | divide(100) | decimal(2)]
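The same calculation can be checked in Python with the decimal module, which avoids binary floating-point error. This is a model of the expression, not Arc's code; the half-up rounding used for decimal(2) is an assumption.

```python
from decimal import ROUND_HALF_UP, Decimal


def cents_to_dollars(value: str) -> Decimal:
    # divide(100), then decimal(2): keep two decimal places (rounding assumed).
    return (Decimal(value) / 100).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(cents_to_dollars("1500"))  # 15.00 (the Hammer ItemCost from the earlier example)
```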
Math formatters can be used to calculate tax and add the tax value to a total. The following example includes a nested set of math formatter expressions; each expression is evaluated from left to right, and a nested expression is evaluated in its entirety before returning to outer expressions:
[xpath(Subtotal) | divide(100) | multiply([xpath(TaxPercent) | divide(100) | add(1)]) | decimal(2)]
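The nesting order can be traced in Python: the inner expression (TaxPercent divided by 100, plus 1) is evaluated first to produce a multiplier, and the outer expression then uses it. This sketch uses the Subtotal (1700) and TaxPercent (4) values from the earlier order example; as before, half-up rounding for decimal(2) is an assumption.

```python
from decimal import ROUND_HALF_UP, Decimal


def total_with_tax(subtotal_cents: str, tax_percent: str) -> Decimal:
    # Inner expression first: TaxPercent | divide(100) | add(1).
    multiplier = Decimal(tax_percent) / 100 + 1
    # Outer expression: Subtotal | divide(100) | multiply(multiplier) | decimal(2).
    total = Decimal(subtotal_cents) / 100 * multiplier
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(total_with_tax("1700", "4"))  # 17.68
```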
The full list of math formatters can be found in the Formatters section.
The xpath() formatter supports lookahead syntax to further specify which values from the source document should be mapped to the destination document. Lookaheads can help target a specific value in the midst of repeated XML element structures.
For example, the input XML may have multiple line items, only one of which contains the desired value. Each line item has the same xpath, so Lookahead syntax is required to retrieve the desired value from among the values at the same xpath.
The following XML demonstrates this situation, as the LineItem elements have matching XML structure:
<LineItem>
  <ItemType>Goods</ItemType>
  <ItemName>Widgets</ItemName>
  <ItemAmount>20.00</ItemAmount>
</LineItem>
<LineItem>
  <ItemType>Tax</ItemType>
  <ItemName>Sales Tax</ItemName>
  <ItemAmount>1.38</ItemAmount>
</LineItem>
Imagine that the amount for the ‘Tax’ line item (1.38) needs to be mapped to the destination document, but not the amount for the ‘Goods’ item (20.00). Since both line items have the same XML structure, an xpath alone is not enough to select the ‘Sales Tax’ line item amount. For example, an expression that uses only the xpath LineItem/ItemAmount retrieves the ‘Goods’ item amount instead of the ‘Tax’ item amount, since the ‘Goods’ item amount is the first value that satisfies the xpath.
In order to specify the ‘Tax’ line item, the expression needs to look into the LineItem element for the ItemType element, which identifies the line item as a ‘Tax’ item. The LineItem element is thus the ‘parent’ of the Lookahead, and the ItemType element is the ‘target’ of the Lookahead.
Lookahead syntax is as follows: inside the xpath expression, add square brackets directly after the ‘parent’ element of the Lookahead. Inside the square brackets, provide the xpath to the ‘target’ element of the Lookahead and use an equals expression to check the target value. Note that these square brackets must be escaped with backslashes, since unescaped square brackets are otherwise treated as an ArcScript variable expression.
For the XML above, such an expression translates to: “find the value from ‘LineItem/ItemAmount’ for the ‘LineItem’ element where ‘LineItem/ItemType’ is ‘Tax’.” The expression would return the value 1.38.
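Python's xml.etree.ElementTree supports a small subset of XPath that includes the same kind of predicate, so it can illustrate what a Lookahead selects. This is an analogy, not Arc syntax (in Arc the brackets must be escaped as described above), and the Items root element is added only so the sample parses as one document.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Items>
  <LineItem><ItemType>Goods</ItemType><ItemName>Widgets</ItemName><ItemAmount>20.00</ItemAmount></LineItem>
  <LineItem><ItemType>Tax</ItemType><ItemName>Sales Tax</ItemName><ItemAmount>1.38</ItemAmount></LineItem>
</Items>"""

root = ET.fromstring(SAMPLE)

# Plain path: the first match wins, which is the 'Goods' amount.
plain = root.findtext("LineItem/ItemAmount")
# Predicate on the parent: only the LineItem whose ItemType child is 'Tax'.
tax = root.findtext("LineItem[ItemType='Tax']/ItemAmount")

print(plain, tax)  # 20.00 1.38
```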
After saving changes in the expression editor, the expression displayed in the Destination mapping should appear in green text to indicate a valid expression. If the expression appears in bold or italic black text, a syntax issue is causing it to be evaluated as a literal or as an invalid expression. Typically this is caused by not escaping reserved characters like square brackets, parentheses, or slashes.
Treat Empty As Null
The Expression Editor includes the option to treat empty input values (e.g. a string with a length of 0) as NULL output values. By default, this option is disabled, and empty input values are output as the empty string: “”
As an example of when this setting may be useful, some mappings interface with a database table that includes columns that do not accept NULL values. In these cases, keeping the default empty string values prevents errors while inserting into the database. Conversely, empty string values pulled from a database may need to be converted to NULL to better reflect the dataset.
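The effect of the option on a single mapped value can be sketched in Python, with None standing in for NULL. This models the behavior described above; it is not the connector's code.

```python
from typing import Optional


def map_value(value: Optional[str], treat_empty_as_null: bool = False) -> Optional[str]:
    """Model the Treat Empty As Null option for one mapped value."""
    if value == "" and treat_empty_as_null:
        return None  # emitted as NULL
    return value  # default: an empty input stays the empty string ""


print(repr(map_value("")), repr(map_value("", treat_empty_as_null=True)))  # '' None
```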
Conditionals are added to Destination nodes so that they are only included in the output document if a certain condition is true.
Select the Filter icon next to a mapped element to add a conditional to the mapping. The Conditional editor allows for creating logical rules and groups of rules that determine whether a destination element should appear in the output document. Each rule uses a configurable boolean operator to compare an input element to a specified value.
For example, when mapping input XML from a Purchase Order, the tax associated with the purchase may be included as one of the Line Items. When mapping the tax information output element, it may make sense to add a conditional that excludes Line Items with an ItemName that is not equal to Tax.
Conditional logic can also be accomplished using Lookahead syntax, described in the section above. Often, parent elements are qualified by a child element that provides context to the values inside the parent. In these cases it may be easier to use Lookahead syntax than to create multiple conditionals to exclude unwanted values.
The Conditional editor allows for specifying a custom condition using the syntax of ArcScript. One common use of this custom condition editor is to compare two dynamic values from the Source document (rather than comparing a single dynamic value against a static value).
For example, the following custom conditional could be used to check to see whether two values in the Source XML are equal:
'[xpath(element1)]' == '[xpath(data/element2)]'
Note that the single quotes in the above example are required.
Boolean logic in custom conditionals can also be performed entirely using ArcScript formatters, like the following example:
[xpath(element1) | equals([xpath(data/element2)])]
Note that this syntax does not require single quotes. All of ArcScript’s formatters are documented in the dedicated Formatters section.
The XML Map connector is designed to perform as much of the mapping as possible through the designer; however, there may be situations where scripting is required to handle custom use cases.
To access additional operations beyond the formatters provided in the Expression Editor, click on the angle brackets adjacent to a destination element (</>) to open a Custom Script for that element.
The custom script editor supports all of the features of ArcScript found in the Scripting section. As with other ArcScript code in Arc, the code block contains an arc:info section where the input parameters available to the script are defined. Additionally, there is an output element, result.text, that can be set to return the results of the custom script. The value assigned to result.text is the value used to populate the destination element.
As an example, a script can determine the SKU for an item based on the item’s name. A simple way of accomplishing this is to use a select/case statement in combination with the xpath formatter to check the ItemName element of the input XML.
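As a Python analogue of that select/case logic, the sketch below reads ItemName from a Line element and looks up a SKU, with a fallback for names that match no case. The SKU values and the UNKNOWN fallback are hypothetical; in Arc this logic would live in the element's custom script and be returned through result.text.

```python
import xml.etree.ElementTree as ET

# Hypothetical name -> SKU cases.
SKU_BY_NAME = {
    "Hammer": "HW-001",
    "Nail": "HW-002",
}


def sku_for_line(line: ET.Element) -> str:
    """Choose a SKU from the ItemName child, like a select/case block."""
    item_name = line.findtext("ItemName", "")
    # Fallback plays the role of a default case (assumed behavior).
    return SKU_BY_NAME.get(item_name, "UNKNOWN")


print(sku_for_line(ET.fromstring("<Line><ItemName>Nail</ItemName></Line>")))  # HW-002
```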
ArcScript is fully available within code view, including powerful ArcScript Operations. For example, the source XML may contain an ID of an item, and the SKU for that item must be retrieved from a database; in this case the dbQuery operation can look up the SKU for the corresponding ID.
ArcScript also supports performing conditional logic within a mapping template. The arc:if keyword is one of many keywords available to assist in performing conditional logic within templates. For example, if the source file contained information about customers within QuickBooks, it may be desirable to perform different business logic for customers with an outstanding balance as opposed to customers that have paid in full. A simple example of this use case might look like the following:
<arc:set attr="paidInFull" val="true" />
<arc:if exp="[xpath('balance')] > 0">
  <arc:set attr="paidInFull" val="false" />
</arc:if>
<arc:set attr='result.text'>[paidInFull]</arc:set>
Foreach Loop Index
When an element mapping is within a Foreach loop, the index of the current loop is always available within a custom script. The _index attribute is reserved for the current index of the inner-most Foreach loop that contains the current element.
As an example, imagine mapping a LineItem element that exists within two Foreach loops: one loop for each order in the document, and another loop (within the first loop) for each item in an order. Referencing ‘[_index]’ within the LineItem element mapping will return the number of times the inner loop (the ‘item’ loop) has looped so far. This value is 1-indexed.
Map Item
It may be useful to set scripting variables at one point in the mapping and then reference those variables later in the mapping. This is supported via the _map item.
The _map item is just like any other ArcScript item, except that its scope encompasses an entire document processed by the XML Map Connector. In other words, any attributes of the _map item will persist throughout the mapping and are only cleared when the XML Map Connector finishes processing a file.
For example, a mapping project may require tallying up the total cost of multiple line items in a purchase order (i.e. the mapping includes some number of LineItemCost elements and also a TotalCost element). ArcScript could be used to keep track of the running total by setting an attribute of the _map item like the following:
<arc:set attr="_map.sum_cost" value="[_map.sum_cost | def(0) | add([xpath(LineItemCost)])]" />
The above line adds the value of the LineItemCost element to the current value of _map.sum_cost (with a default value of 0 if _map.sum_cost does not yet exist). If this code is included in an element within the Foreach loop that loops over all of the line items, the value of _map.sum_cost will be the TotalCost when the Foreach loop exits.
Since the attributes of _map are preserved, this same _map.sum_cost value can be referenced later in the TotalCost element mapping, e.g. with the expression [_map.sum_cost].
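The running-total pattern can be sketched in Python with a dictionary standing in for the document-scoped _map item. It is a model of the behavior, not ArcScript; the line item costs are assumed to be decimal strings.

```python
from decimal import Decimal


def total_cost(line_item_costs: list[str]) -> Decimal:
    # A dict stands in for _map: it lives for the whole processed document.
    _map: dict[str, Decimal] = {}
    # Foreach over the line items.
    for cost in line_item_costs:
        # Same idea as [_map.sum_cost | def(0) | add([xpath(LineItemCost)])].
        _map["sum_cost"] = _map.get("sum_cost", Decimal(0)) + Decimal(cost)
    # Outside the loop, the TotalCost element reads the persisted value.
    return _map.get("sum_cost", Decimal(0))


print(total_cost(["15.00", "2.00"]))  # 17.00
```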
Virtual nodes are special nodes added to the Destination structure that do not directly appear in the output XML. Instead, these virtual nodes provide an opportunity to implement logic that affects the appearance/values of other (non-virtual) nodes in the output.
The XML Map Connector supports three types of virtual nodes:
- Code Script
- Condition
- Loop
A code script virtual node provides an opportunity to write custom ArcScript that does not need to return an output value. Often, these nodes will use the special _map item to store values that need to be referenced later in the mapping, but do not need to be output in the current context.
For example, the scenario described in the Map Item section is a good candidate for a virtual code script node. The sum of the line item costs needs to be returned as output outside of the Foreach loop where it is calculated. So, a virtual code script node within the Foreach loop can calculate the value (and not output it), then this value can be referenced (as output) in a non-virtual node outside the loop.
A condition virtual node groups output elements together based on a shared conditional. All children of the condition node will appear in the output if the condition is true, and not appear if the condition is false.
This is functionally equivalent to adding the same conditional to each of the individual nodes independently. For conditions that affect many different nodes, it is likely more convenient to create a single condition node, and then make all of the relevant output nodes a child of the condition node.
A loop virtual node functions the same as a Foreach mapping between parent nodes, except that the parent node will not actually appear in the output XML. This allows for ‘flattening’ repeated elements in the Source into a non-hierarchical structure in the Destination. This is easiest to understand via an example.
Take the following input XML:
<!-- example input -->
<Items>
  <DataReading>
    <Temperature>212.5</Temperature>
  </DataReading>
  <DataReading>
    <Temperature>9.2</Temperature>
  </DataReading>
  <DataReading>
    <Temperature>5.1</Temperature>
  </DataReading>
</Items>
This needs to be mapped to a flat structure that includes all of the DataReading data:
<!-- desired output -->
<Items>
  <OutputData>
    <Temperature>212.5</Temperature>
    <Temperature>9.2</Temperature>
    <Temperature>5.1</Temperature>
  </OutputData>
</Items>
This can be accomplished by establishing a Foreach relationship between the DataReading element in the Source and a Loop node placed within the OutputData element in the Destination.
If the Foreach relationship were established between DataReading and OutputData, the OutputData element would be repeated in the result. The Loop node avoids this repetition of hierarchy and flattens the values into the single OutputData element.
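The difference can be modeled in Python with xml.etree.ElementTree: OutputData is created once, outside the loop, and the loop over DataReading emits only Temperature elements, which is what a Loop virtual node does. This is a sketch of the behavior, not the connector's implementation.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Items>
  <DataReading><Temperature>212.5</Temperature></DataReading>
  <DataReading><Temperature>9.2</Temperature></DataReading>
  <DataReading><Temperature>5.1</Temperature></DataReading>
</Items>"""


def flatten_readings(source_xml: str) -> ET.Element:
    source = ET.fromstring(source_xml)
    items = ET.Element("Items")
    # OutputData is created once, outside the loop, so it is not repeated.
    output_data = ET.SubElement(items, "OutputData")
    # Loop node: iterate over DataReading without emitting a wrapper element.
    for reading in source.findall("DataReading"):
        ET.SubElement(output_data, "Temperature").text = reading.findtext("Temperature", "")
    return items


result = flatten_readings(SAMPLE)
print(ET.tostring(result, encoding="unicode"))
```

Moving the SubElement call for OutputData inside the loop would reproduce the repeated-hierarchy result described above for a Foreach between DataReading and OutputData.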