Deep Dive into Record Files

This article contains the following sections:

  • Sections of the Record File
  • Sample Record File
  • Converting Data into the Record Format
  • Understanding the Schema

FactoryTX encrypts data at the edge device before transmitting it to Sight Machine. We transmit the data in a consistent, human-readable record format that contains one JSON document per point in time. In other words, the records are customer data that have gone through a first layer of pre-processing to convert them into JSON.

Sections of the Record File

The record file consists of two sections:

  • Metadata: The top-level reserved keywords that route the data payload to the correct place. In the sample below, these run from timestamp through sslog_type.
  • Field values: The data payload from the original data source, translated into field names and values. In the sample below, from OuterBodyWidth to Size.

Sample Record File
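
A minimal record might look like the following. The keys timestamp, capturetime, and sslog_type and the units/value form are taken from the schema described below; the fieldvalues key name and all values shown are illustrative assumptions, not actual customer data:

```json
{
  "timestamp": "2023-04-01T08:00:05Z",
  "capturetime": "2023-04-01T08:00:00Z",
  "sslog_type": "cycle",
  "fieldvalues": {
    "OuterBodyWidth": { "units": "mm", "value": 118.4 },
    "Size": { "units": "oz", "value": 12 }
  }
}
```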

Converting Data into the Record Format

The conversion process involves formatting and renaming fields. The core of FactoryTX's functionality is taking in data in many different formats and producing a standardized, human-readable output. Existing fields can be mapped to reserved keywords, which makes Extract, Transform, and Load (ETL) configuration easier in the Sight Machine platform. Most of these can be set simply by naming the field in the receiver configuration or by renaming it as part of a transform operation.
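
As a rough sketch of the renaming step, the snippet below maps raw source column names onto record field names. The source column names and the mapping itself are hypothetical (capturetime is a reserved keyword described in the schema below; OuterBodyWidth matches the sample field values); in practice this is done declaratively in the receiver or transform configuration rather than in code:

```python
# Hypothetical raw row keyed by the data source's original column names.
raw_row = {"CycleTime": "2023-04-01T08:00:00Z", "Width_mm": 118.4}

# Illustrative mapping from source columns to record field names. The source
# column names here are made up for the example.
rename_map = {
    "CycleTime": "capturetime",
    "Width_mm": "OuterBodyWidth",
}

# Rename known columns; pass any unmapped columns through unchanged.
record = {rename_map.get(column, column): value for column, value in raw_row.items()}
print(record)
```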

Understanding the Schema

The following describes the schema that we use to transmit data from plant floor assets:

  • timestamp: The time at which the record was received on the destination machine. This is used to determine the order of message arrival.


  • An array of objects in the form of host: capturetime, used to trace records that are transmitted through multiple endpoints.


  • An integer value that increments or changes at the machine level with each cycle. This is used for cycle bundling during Extract, Transform, and Load (ETL). This is an optional field and can be computed in the AI Data Pipeline.


  • The status of the machine: running, idle, or off. This is an optional field and can be computed in ETL.


  • The ID of the machine that transmitted the information. This is a required field.


  • The customer ID of the machine. This is an optional field.


  • The plant-level location information. This is an optional field.


  • The type of machine from which the data came. This is an optional field.


  • sslog_type: The type of log being generated, typically something like "cycle", "PLC", or "CSV". This information is used to create rules for modeling when multiple sources describe a single asset.


  • capturetime: The time at which the event was logged at the point of origin. For best results, this should come from an NTP-synchronized source. This is a required field.


  • A key-value set of alarm codes from the machine. Typically, a non-zero value indicates an alarm state. This is an optional field.


  • A key-value set of codes that describe any inline inspections performed by the machine. Typically, a non-zero value indicates an inspection failure.


  • An array of serial numbers used to associate data in the process. Currently, this can handle either systems that have multiple serialization schemes or multiple serials being produced at the same time, but not both. This is an optional field and can be assigned in ETL.


  • An array of batch_type and batch_id pairs that can represent different levels of group serialization (e.g., carton, pallet, or raw material). This is an optional field.


  • The target polling rate for this asset. This is an optional field.


  • Any bookkeeping fields used for encryption. This is an optional field.


  • An array of attachments, each with a full path reference on the local file system and a MIME type. This is an optional field.


  • A free-form document that includes all the data extracted from the asset (PLC or equivalent). Typically, each entry takes the form FIELDNAME { units "mm", value 12345 }, but it can be altered to include arrays, key-value pairs, etc. Values are intended to be human-readable (i.e., in engineering units, not raw voltages). This is a required field.
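
To make these shapes concrete, the fragment below assembles illustrative alarm, inspection, serial, batch, and field-value entries. The key names alarms, inspections, serials, and batches are assumptions inferred from the descriptions above (only batch_type, batch_id, and the units/value form are named in the schema), and every value is made up:

```python
import json

fragment = {
    "alarms": {"E_STOP": 0, "OVERTEMP": 3},           # non-zero means alarm state
    "inspections": {"WeldCheck": 0, "FillLevel": 2},  # non-zero means failed inspection
    "serials": ["SN-0001", "SN-0002"],                # multiple serials in one cycle
    "batches": [
        {"batch_type": "carton", "batch_id": "C-1042"},
        {"batch_type": "pallet", "batch_id": "P-0007"},
    ],
    # One payload entry in the FIELDNAME { units, value } form:
    "OuterBodyWidth": {"units": "mm", "value": 118.4},
}
print(json.dumps(fragment, indent=2))
```

Because records are plain JSON, fragments like this round-trip cleanly through serialization on the edge device.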