PostgreSQL Advanced Configuration
PostgreSQL Advanced Configuration
The Advanced tab of the PostgreSQL connector allows you to configure advanced properties of the connector and its data handling methods. You can highlight any field on this tab to open a tooltip with more information about the functionality of that field.
The sections below outline the functionality and configuration of XML models, child tables, and advanced record selection. This is not a full list of configuration options available in the Advanced tab.
The PostgreSQL connector models PostgreSQL tables in ArcESB as XML. The modelling follows this structure:
- A parent element identifies the table to insert into.
- Each child element corresponds to a column in the target table.
This establishes a connection between XML elements and PostgreSQL values, which allows the connector to read values from incoming XML documents and use them in INSERT or UPDATE queries.
The XML Model for a table can be viewed by clicking </> Code above the configuration pane. The following XML is a simplified example model for a table called Account:
<Items> <Account action="upsert"> <Id key="true" /> <account_c /> <AccountNumber /> <AccountSource /> </Account> </Items>
In the above example,Id, account_c, AccountNumber, and AccountSource are each columns of the Account table. When the connector processes an XML file that matches this structure, it automatically inserts the values from that XML into the Account table. When the connector queries the Account table, it outputs XML that matches this structure.
The PostgreSQL connector supports batching to improve performance when inserting large sets of data. Batching is configured via two fields under the Advanced Settings section labelled TransactionSize and BatchInputSize. When these fields are set to positive integers, the connector inserts [BatchInputSize] records in a single operation (batch), and it inserts [TransactionSize] records in a single connection (transaction).
An XML input file can contain more records than a single transaction size, in which case the connector will insert the data over multiple transactions.
If an error occurs during a batch insert, the connector will rollback the transaction and cache the index of the first record in the failed transaction. When the input file is retried, only records after the cached index will be inserted into PostgreSQL.
When querying for output, the connector may be configured to support batching records together into a single document. The Max Records setting may be used to govern how many records should be included in a single message, and the Batch Size setting may be used to indicate how many Batch Messages should be included in a single Batch Group. More information on Batch Groups and Batch Messages can be found here.
The PostgreSQL connector supports Upserting into and Selecting multiple tables with a single action. When creating an table model with multiple tables, the tables should have a foreign key relationship to each other such that one table is the child of another table.
To interface with child tables, first create a model that targets the parent table. Then add child tables from within the configuration panel using the +Add button. Select the child table from the list of tables and the designer will automatically populate with the hierarchical structure of tables. Select the columns of the child table that should be included in the model.
Child Table XML
The Input Mapping XML structure will be multi-layered when inserting into parent and child tables. The following XML is an example Input Mapping for two tables related by a foreign key:
<Account action="upsert"> <Id key="true" /> <account_c /> <AccountNumber /> <AccountSource /> <AccountPartner> <Id key="true" /> <AccountFromId ref="@film_id" /> <IsPrimary /> <OpportunityId /> <Role /> </AccountPartner> </Account>
In the above example, Account is the parent table and AccountPartner is the child table. When the PostgreSQL connector processes an input XML file that matches this structure, it updates both the Account and AccountPartner tables and ensures that the keys linking the two records are the same.
Some columns within child tables are marked with a blue REF tag. This tag indicates that these columns correspond to the primary key in the parent table. After making the insert into the parent table, and before making the insert into the child table, the connector queries the parent table for the primary key of the record it just inserted. The connector then uses this value when inserting into the child table.
This is useful when the primary key of the parent table is generated by PostgreSQL. In this case, this column is not passed in as input XML, so the connector does not know the value of the column until after it has inserted the record into the parent table.
In the XML model, the ref attribute indicates that a column can be referenced later or that the value for a column should be read from another column in the mapping. In order to reference the value of columns that are generated by PostgreSQL, the column element should include a ref attribute set to the reserved value
<Item action="upsert"> <Id key="true" upsert="id" ref="@@LAST_INSERT_ID" />
This indicates that the id value will be retrieved after the insert and can be referenced later (usually in a child table). To reference this value later, set the ref element to the name of the relevant column:
<id key="true" ref="@id" />
In the above example, id is the name of the primary key in the parent table, and it is also the name of a column in the child table. The ref syntax instructs the connector to retrieve the id from the parent table after inserting, and then use that retrieved value as the id column in the child table.
Selecting From Child Tables
Typically, when Selecting from a parent table and child table, records should only be pulled from the child table if a foreign key matches the primary key from the parent table.
As an example, imagine a Select that pulls purchase order data from a parent table, PurchaseOrders, and a child table, PurchaseOrderLineItems. Each record (order) pulled from PurchaseOrders should also pull the line items associated with that order. The two tables are related by a shared column, for example a PONumber column that is the primary key of the PurchaseOrders table and a foreign key in the PurchaseOrdersLineItems table. Line item records should only be pulled from the child table if the line item’s PONumber matches the PONumber of the order being pulled from the parent table.
To establish this relationship in Arc, add a new Filter to the Select and configure it as follows:
- In the first (left-hand) dropdown, select the column of the child table that is the foreign key to the parent table. In the above example, this would be the PONumber column of the PurchaseOrderLineItems table.
- In the middle (operator) dropdown, select equal
- In the last (right-hand) input box, first use the dropdown arrow to the right to select REF. Afterwards, the input box should have the $ prefix, indicating that it is referencing the parent table
- Use the last (right-hand) dropdown to select the parent table column that should match the child table’s foreign key. In the above example, this would be the PONumber column of the PurchaseOrders table.
Only Process New or Changed Records
Select actions can be configured to only retrieve records that have been updated or added recently. The connector can use two approaches for detecting which records are new or modified.
The first approach requires a DateTime column that represents the last-modified time for records, and the second requires a column that explicitly stores a value representing whether a record should be processed or not.
Using a Last Modified Column
Find the Advanced settings at the bottom of the configuration panel. If a DateTime column is detected, the option appears to Use column for processing new or changed records. Set this to the appropriate DateTime column that represents the last-modified time for records.
Using a Custom Column
Select actions can be configured to update a column in the table for records that are successfully pulled from PostgreSQL. This can be combined with a Filter rule that only selects records with a column value that indicates they have not yet been pulled (or should be pulled again).
To enable column updating after processing, find Advanced settings at the bottom of the Mapping Editor. Enable Update column with value when rows are processed successfully, then specify which column should be updated and the value that should be set in this column.
For example, an Output Mapping might be configured to update the Processed column to 1 after pulling a record. Then, a Filter rule could be applied that restricts records to only those where Processed not equal 1.
In this example, when the connector processes the Select template, it creates a SELECT statement based on the template. The template would execute
SELECT Id, AccountId, Amount FROM INVOICE, create an XML file for every record in the result set based on the template, and place those files in the Output folder for this connector.