Sources

Introduction

The sources page provides an overview of all connected data sources to the Actuals platform. At the top right of the page it is possible to use the magnifying glass to search for a specific source by description and to add a new source by clicking the button. Apart from the description, the status for each source is shown as well. When data for a source is being collected, the status is Live. For some source the data for that source is populated by collecting data from another source. In that case the status is “Processed by other source”. This also means that the source cannot be changed as there is no source data to configure. Every live source has multiple action buttons (from left to right): uploading a file, editing the source and viewing the processing details. Editing the source uses mostly the same wizard/setup as adding a new source, hence we will explain both at the same time.

Source configuration page

When adding a new source, the first step is to choose one of the integration connection that are available on the Actuals platform. The Integrations section contains a list of the specific setup for the specific connection. When working with files, the CSV upload integration is typically the best place to start. Once a connection is chosen for a source, it cannot be changed.

The next step is to setup the general source configuration. While some of the options are specific to the connection, several general settings are always required

Source name : the name of the source, used throughout the platform

Data processing : definition of how the files should be processed. Incremental means that all new files/data are added to the existing transactions, Full means that only the data from the most recent file is used (and data from all other files are removed)

Schedule : when is the data for the source collected and processed into transactions. Apart from a schedule to collect the data daily (at a specific time in GMT+1) or weekly, manually specifies that no data is actively collected. This is typically the case for sources where data is uploaded manually

Header : Expected column names/properties when data is collected for a source

Separator : When processing files, how are the different values separated. When this is unclear and a sample data is available, we recommend opening it with a text editor and check how the values are split in the file

File filter : What type of files should be processed.

Typically when selecting a specific connection (that does not uses manual files), we recommend to leave these settings as is.

Source mapping page

The final step in adding/editing a source is the data mapping. On this page, the data from the source is mapped to the properties of Actuals transactions. All Actuals transactions have transactionid, timestamp and amount as properties as well as many many more. This page not only allows users to map the columns from the input data to these properties but also apply formulas to further transfrom and customize the values. To use the value of a specific property the value __columnXX__ should be used corresponding to the column number (XX is the number of the column) in the input data. Additionally functions can be use to further process the values. An example is the CONCAT function that concatenates the values of a list of properties. If for instance for the transactionid property the formula CONCAT(__column1__,”-“,__column2__) is used, the the transactionid property will be populated for this source by concatenating the values of the first column in the data with a - followed by the value in the second column. Below is a list of common functions and how to use them

TO_TIMESTAMP to convert input timestamps to Actuals timestamps.

CONCAT Concatenate several strings into one string.

REPLACE Replace a specific value in a string with another string

LEFT Get a fixed number of characters from a string starting from the left

RIGHT Get a fixed number of characters from a string starting from the right

SUBSTRING Extract a substring from a string

SUBSTRING_INDEX Return a substring of a string before a specified number of delimiter occurs

UPPER Convert the text to upper-case

TRIM Remove leading and trailing spaces from a string

FROM_UNIXTIME Returns a date /datetime from a version of unix_timestamp

STR_TO_DATE Return a date based on a string and a format

CONVERT Convert an expression to an integer

CAST Convert a value to an int datatype

ABS Return the absolute value of a number

IF Use specific value per row based on conditions that are met

CASE Go through conditions and return a value when the first condition is met

OBJECT_CONSTRUCT Returns an OBJECT constructed from the arguments (to work with variant field types).

OBJECT_CONSTRUCT_KEEP_NULL Returns an OBJECT constructed from the arguments that retains key-values pairs with NULL values (to work with variant field types).

OBJECT_INSERT Returns an OBJECT value consisting of the input OBJECT value with a new key-value pair inserted (or an existing key updated with a new value).

OBJECT_PICK Returns a new OBJECT containing some of the key-value pairs from an existing object.

TO_JSON Converts a VARIANT value to a string containing the JSON representation of the value.

When applying these formula’s, the output should be strings / text. The only exception to that are the following parameters

timestamp has type datetime and therefore is typically defined using either STR_TO_DATE or FROM_UNITIME

amount , vatamount , quantity , source have type int and therefore is typically defined CAST. Note that amount and vatamount are specified in cents so to store 12.50 euro, the integer value 1250 should be provided

Apart from the mapping itself, filters can be applied as well. This means that specific rows are never imported as transactions. As an example a data set can have an column that contains the order status. It makes sense to filter the data for this source to only include rows where the order state column has the value “successful”. The filters applied are combined, so data is only imported if all filter conditions are met,

As getting the right columns, using the right formulas and applying the correct filters can be challenging, the bottom of the source configuration page contains a Test mapping section. After selecting a file, the Test mapping button can be used to apply the formulas and filters on the data in that specific file. This allows users to verify that the formulas and filter are correct in terms of syntax and also that the output is as expect. Note that this option is not available when initially creating the source as no data has been collected then. We highly recommend to always test the mapping before storing the source by clicking on Add at the bottom right.

We have specific guides available on how to work with JSON and XML data. You can check them beneath here.

Working with JSON data in Actuals [GUIDE-2]

Actuals allows two functionalities related to JSON structured data.

Extract data from a JSON string

Create a JSON string

1. Extract data from a JSON string

The Actuals platform allows to extract data from a JSON string inside a column. Imagine a file is provided that contains a column with a string of JSON data, for example a csv file that looks like this:

header name: ID header number: __column1__	header name: JSON_DATA header number: __column2__
Item 1	{“orderId”:”order123”}
Item 2	{“orderId”:”order124”}
Item 3	{“orderId”:”order125”}

If we want extract the orderId from the json string we can use this formula:

sql
JSON_EXTRACT_PATH_TEXT(<column_identifier>, '<path_name>')

In this example the JSON_DATA column is the column we want to look for. The column is located as second column. We need to define this with the column number, in this case this is the second column so you should use __column2__. The path is orderId, to extract the orderId we will create this formula:

sql
JSON_EXTRACT_PATH_TEXT(__column2__, '"orderId"')

It is also possible to extract nested data, for a more comprehensive guide on how to use this formula see: Snowflake JSON_EXTRACT_PATH_TEXT documentation.

2. Construct a JSON String

Actuals also supports the use case to construct a JSON object from source data. Imagine a file that provided that contains a column with a string of JSON data, for example a csv file that looks like this:

ID	payment_method	payment_method_variant
Order 1	Creditcard	VISA
Order 2	Creditcard	VISA
Order 3	Creditcard	Mastercard

If we want construct a json object containing all information about the payment method used for a orders we can use this formula:

OBJECT_CONSTRUCT('<column_key>', '<column_value>', '<column_key2>', '<column_value2>')

In this example we want to construct a object containing columns payment_method and payment_method_variant The columns are located as second and third column in the source data. We need to define this with the column number, in this case this is the second and third column so you should use __column2__ and __column3__.

To construct the object we will create this formula: OBJECT_CONSTRUCT('payment_method', __column2__,'payment_method_variant',__column3__).

For a more comprehensive guide on how to construct objects see: Snowflake OBJECT_CONSTRUCT documentation.

Working with XML data in Actuals [GUIDE-3]

Actuals contains functionality to work with XML structured data. The available functions are:

Extract properties from a XML column

1. Extract properties from a XML column

The Actuals platform allows to extract data from XML data inside a column. Imagine a file is provided that contains a column with a string of XML data, for example a csv file that looks like this:

ID	XML_DATA
Item 1	<order><orderId>order123</orderId></order>
Item 2	<order><orderId>order124</orderId></order>
Item 3	<order><orderId>order125</orderId></order>

If we want to extract the orderId from the XML string we can use this formula:

sql
XML_EXTRACT_XPATH_TEXT(<column_identifier>, <xpath>)

In this example the XML_DATA column is the column we want to look for. The column is located as the second column. We need to define this with the column number, in this case this is the second column so you should use is __column2__. The path is order --> orderId, to extract the orderId we will create this formula:

sql
XML_EXTRACT_XPATH_TEXT(__column3__, '/order/orderId')

To specify the path of the data you are looking for, we are using the XPATH standard. More information on what possibilities you have to extract data can be found in the XPath specification.

ℹ️

The XML_EXTRACT_XPATH_TEXT function will return NULL if the provided XPATH doesn’t exist in the XML string (similar to the JSON_EXTRACT_PATH_TEXT function).

When the XPATH returns multiple results it will return a value error. Make sure you always provide an explicit XPATH function that returns a maximum of 1 result. If you want to test the result of a XPATH formula on a certain XML string you can use an online XML XPATH evaluation tool like Xpather.

File upload page

The upload file button creates a popup that can be used to upload a file with data for that specific source. A file can be provided by either dragging the file from explorer to the web page or by clicking the link in the popup and browsing to the file. Important is to check the delimiter set for the source as the uploaded file should have the same delimiter in order for the values to be split properly. There is a limit on the file size of files that can be upload through the web app. If the files are bigger than 20MB, then an sFTP client should be used to upload the files.

Processing details page

The Source processing details shows al files/data batches that have been collected for a source. The search box can be used to search on specific file names and dates. The check box can be used to select any number of files and apply any of the actions as the appear in the Action drop down at the top left corner. The list shows the following properties per item

Timestamp : the date / time at which the data was collected in GMT+1

Filename : the name of the file including the location on the fileserver

#rows : the number of rows that have been stored for this item

Status : either Complete, Deleted or Not processed. The status can be changed by applying the actions in the Action dropdown

Processed Timestamp : The date / time at which the rows of data from the file were processed into transactions

Transaction Count : The number of transactions that were processed. Typically this is equal to or lower that the number of rows as several rows are filtered out

Error Count : The number of rows with an error

Action : For the item, several specific links can be clicked. View raw data allows users to see some rows that were imported for this item. This give a good impression of the data stored and any basic processing issues. Test raw data applies the current mapping the the data in this specific item, similar to the functionality on the mapping page. View transactions shows several transaction when the rows of the item have been processed into transactions

Input timestamp	Formula
31/12/2024 23:59:58	TO_TIMESTAMP(__yourcolumn__, 'dd/mm/yyyy hh24:mi:ss')
2024-12-31T23:59:58+02:00	TO_TIMESTAMP(__yourcolumn__, 'YYYY-MM-DDTHH24:MI:SS+TZH:TZM')

Input column 1	Input column 2	Formula
Order	123	CONCAT(__yourcolumn1__, __yourcolumn2__)
Order	124	CONCAT(__yourcolumn1__, __yourcolumn2__, ,’additionalfixedpieceoftext’)