5. Deep Dive into Record Files

This article contains the following sections:

  • Introduction
  • Sections of the Record File
  • Sample Record File
  • Converting Data into the Record Format
  • Understanding the Schema

Introduction

FactoryTX encrypts data at the edge device before transmitting it to Sight Machine. We transmit the data in a consistent, human-readable record format containing one JSON document per point in time. In other words, records are customer data that have gone through a first layer of pre-processing to convert them into JSON.

Sections of the Record File

The record file consists of two sections:

  • Metadata: The top-level reserved keywords that route the data payload to the correct place. In the sample below, these span from timestamp through sslog_type.
  • Field values: The data payload from the original data source, translated into field names and values. In the sample below, these span from OuterBodyWidth to Size.

Sample Record File
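A minimal sketch of what such a record might look like, expressed as a Python dictionary. The field names OuterBodyWidth and Size come from the sample referenced in this article, and the metadata keys follow the schema described later in this article; all values, and the exact selection and ordering of keys, are invented for illustration.

```python
import json

record = {
    # --- Metadata: reserved keywords that route the payload ---
    "timestamp": "2023-05-01T12:00:00.000Z",    # logged at the point of origin (required)
    "capturetime": "2023-05-01T12:00:01.250Z",  # received on the destination machine
    "source": "Press_03",                       # ID of the transmitting machine (required)
    "source_logtype": "cycle",                  # type of log being generated
    # --- Field values: the data payload, translated into names/values ---
    "fieldvalues": {
        "OuterBodyWidth": {"units": "mm", "value": 120.5},
        "Size": {"units": "mm", "value": 42.0},
    },
}

print(json.dumps(record, indent=2))
```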

Converting Data into the Record Format

The conversion process involves formatting and renaming fields. The core of FactoryTX’s functionality is taking in data in many different formats and producing a standardized, human-readable output. Existing fields can be mapped to reserved keywords, which makes Extract, Transform, and Load (ETL) configuration easier in the Sight Machine platform. Most of these can be set simply by naming the field in the receiver configuration or renaming it as part of a transform operation.
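The mapping step can be sketched as follows. Note that the rename map and helper function below are hypothetical illustrations, not FactoryTX’s actual configuration or API; they only show the idea of mapping raw source fields onto reserved keywords while routing everything else into the data payload.

```python
# Hypothetical raw-field -> reserved-keyword map, as might be expressed
# in a receiver configuration or transform operation.
RENAME_MAP = {
    "MachineID": "source",
    "EventTime": "timestamp",
    "LogKind": "source_logtype",
}

def to_record(raw: dict) -> dict:
    """Split a raw row into reserved metadata keys and fieldvalues."""
    record = {"fieldvalues": {}}
    for name, value in raw.items():
        if name in RENAME_MAP:
            # Mapped fields become top-level reserved keywords.
            record[RENAME_MAP[name]] = value
        else:
            # Everything else becomes part of the data payload.
            record["fieldvalues"][name] = {"value": value}
    return record

row = {"MachineID": "Press_03", "EventTime": "2023-05-01T12:00:00Z",
       "LogKind": "cycle", "OuterBodyWidth": 120.5}
print(to_record(row))
```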

Understanding the Schema

The following lists the schema fields that we use to transmit data from plant floor assets:

  • capturetime: The timestamp indicating when the record was received on the destination machine. This is used to determine the order of message arrival.
  • capturetime_src: An array of objects in the form host: capturetime, used to trace records that are transmitted through multiple endpoints.
  • counter: An integer value that increments or changes at the machine level with each cycle. This is used for cycle bundling during Extract, Transform, and Load (ETL). This is an optional field and can be computed in the AI Data Pipeline.
  • running: The status of the machine: running, idle, or off. This is an optional field and can be computed in ETL.
  • source: The ID of the machine that transmitted the information. This is a required field.
  • source_customer: The customer ID of the machine. This is an optional field.
  • source_location: The plant-level location information. This is an optional field.
  • source_machine_type: The type of machine from which the data came. This is an optional field.
  • source_logtype: The type of log being generated. This information is used to create rules for modeling when multiple sources describe a single asset. Typically a value like "cycle," "PLC," or "CSV."
  • timestamp: The time at which the event was logged at the point of origin. For best results, this should come from an NTP-synchronized source. This is a required field.
  • alarm_codes: A key-value set of alarm codes from the machine. Typically, a non-zero value means an alarm state. This is an optional field.
  • defect_codes: A key-value set of codes that describe any inline inspections done by the machine. Typically, a non-zero value means an inspection failure.
  • serial: An array of serial numbers used to associate data in the process. Currently, this can handle either systems that have multiple serialization schemes or multiple serials being produced at the same time. This is an optional field and can be assigned in ETL.
  • batch: An array of batch_type and batch_id pairs that can handle different levels of group serialization (e.g., carton/pallet/raw material). This is an optional field.
  • poll_rate: The target polling rate for this asset. This is an optional field.
  • encryption: Any bookkeeping fields used for encryption. This is an optional field.
  • attachments: An array of attachments, each with a full path reference on the local file system and a MIME type. This is an optional field.
  • fieldvalues: A free-form document that includes all the data extracted from the asset (PLC or equivalent). Typically, each entry takes the form FIELDNAME { units "mm", value 12345 }, but it can be altered to include arrays, key-value pairs, etc. These are intended to be human-readable values (i.e., in engineering units, not voltages, etc.). This is a required field.