Overview: Building Data Extraction Pipelines
Sight Machine's data modeling and analysis depend on getting data out of factory environments. FactoryTX (or FTX, where TX stands for “Transmit”) is Sight Machine's data acquisition product: a continuous, automatic tool that extracts data from edge devices or the machines where it originates and efficiently imports it into the Sight Machine platform. Typically, FTX runs on an edge device on the factory floor, where it collects data from a variety of sources in real time, conditions the data, and then transmits it to the Sight Machine platform.
To build a data extraction pipeline, FactoryTX:
- Coerces machine-specific data, which arrives from a variety of sources and in proprietary formats, into discrete time-series data.
- Labels all data with meaningful metadata, such as the data source, type, and a timestamp designated by a set of predefined rules, to assist data analysis.
- Builds a consistent, human-readable record format, which is a JSON document per point in time. For more information about the record format, see Deep Dive into Record Files.
- Streams data to Sight Machine in near real time using:
  - Polling: FTX is typically a polling pipeline rather than a strictly real-time one. Each receiver has its own independent polling rate, expressed in seconds (sub-second values are allowed). You can adjust the polling rates as necessary, based on the applications or sources being polled. For more information, see About Polling in Configurations in FactoryTX.
  - Microbatching: While basic message queuing sends each piece of data individually, FactoryTX can employ microbatching, which groups messages by a specific count or time interval and sends them together. This approach lets you trade latency against efficiency. Highly redundant manufacturing data compresses well when batched, so the messages use less bandwidth and fewer resources.
  - Store and forward: In the store and forward data transmission method, a device receives a complete message and temporarily stores it in a buffer before forwarding it to the final destination. This is useful in locations with network connectivity issues: FactoryTX does not lose data when networks go down.
- Uses a single configuration file. Typically, this file is kept under version control in Git, an open-source version control system that tracks changes and preserves file history. The configuration file includes all credentials and configuration information.
- Performs basic transforms on the data using Python Pandas. (A transform is the manipulation of data inside FactoryTX’s built-in data pipeline.)
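The microbatching and store and forward behavior described in the list above can be illustrated with a minimal sketch. This is not FactoryTX's implementation: the names `make_record`, `MicroBatcher`, and `StoreAndForward`, the record layout, and the `send` callable are all hypothetical, and the buffer is held in memory only, whereas a real edge deployment would presumably persist it to disk.

```python
import gzip
import json
from collections import deque


def make_record(source, timestamp, fields):
    """Build one record for a single point in time (hypothetical layout)."""
    return {"source": source, "timestamp": timestamp, "fields": fields}


class MicroBatcher:
    """Group records by count or elapsed time, whichever limit is hit first."""

    def __init__(self, max_count=3, max_age_s=5.0):
        self.max_count = max_count
        self.max_age_s = max_age_s
        self._pending = []
        self._first_ts = None

    def add(self, record, now):
        """Add a record; return a compressed batch if a limit is reached, else None.

        The age limit is only checked when a record arrives, so an idle
        batcher must be flushed explicitly.
        """
        if not self._pending:
            self._first_ts = now
        self._pending.append(record)
        full = len(self._pending) >= self.max_count
        stale = now - self._first_ts >= self.max_age_s
        return self.flush() if (full or stale) else None

    def flush(self):
        """Serialize pending records as gzip-compressed JSON and reset."""
        if not self._pending:
            return None
        payload = gzip.compress(json.dumps(self._pending).encode("utf-8"))
        self._pending = []
        self._first_ts = None
        return payload


class StoreAndForward:
    """Buffer batches locally and forward them in order once sending succeeds."""

    def __init__(self, send):
        self._send = send  # hypothetical callable: returns True on success
        self._buffer = deque()

    def submit(self, payload):
        """Store the batch first, then try to forward everything buffered."""
        self._buffer.append(payload)
        self.drain()

    def drain(self):
        """Forward buffered batches oldest first; stop at the first failure."""
        while self._buffer:
            if not self._send(self._buffer[0]):
                break  # network is down: keep the batch for a later retry
            self._buffer.popleft()

    def pending(self):
        return len(self._buffer)
```

In this sketch, a larger `max_count` or `max_age_s` yields bigger, better-compressed batches at the cost of higher latency, and calling `drain()` again after connectivity returns delivers any batches that accumulated during the outage.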
This guide is a field manual that provides Sight Machine partners and customers with the basic knowledge needed to install and implement FactoryTX. You do not need to be a full-time data engineer to follow the instructions in this guide.
FactoryTX is also extensible: developers can write new receivers that allow it to connect to different data sources. For more information about receivers or assistance with your FactoryTX implementation, please contact your Sight Machine sales partner.