Pipeline Builder

    Overview

    Pipeline Builder is a feature within Factory BUILD designed to automatically transform raw data, regardless of its format or source, into a real-time data stream.

    This real-time processing is enabled by Stateful Processing, which is critical for accurate calculations. It retains knowledge of previous records and recalculates the existing state as new data arrives. As a result, data models are refreshed in real time (hot path), which removes the need for a separate, time-consuming batch process (cold path) when late or out-of-order data enters the system.
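
    To make the idea of retained state concrete, the following is a minimal sketch and not Pipeline Builder's actual implementation. It assumes a simple per-minute average; the names StatefulAverager, WindowState, and window_seconds are hypothetical. Each record is assigned to a window by its event time, so a late record only updates the state of the window it belongs to.

```python
from dataclasses import dataclass, field


@dataclass
class WindowState:
    """Retained state for one time window: a running count and sum."""
    count: int = 0
    total: float = 0.0

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


@dataclass
class StatefulAverager:
    """Hypothetical operator that keeps per-window state between records."""
    window_seconds: int = 60
    windows: dict = field(default_factory=dict)

    def process(self, event_time: int, value: float) -> tuple:
        # Assign the record to a window by its event time, not its arrival time.
        window_start = event_time - (event_time % self.window_seconds)
        state = self.windows.setdefault(window_start, WindowState())
        # Update only the affected window; earlier records are not reprocessed.
        state.count += 1
        state.total += value
        return window_start, state.mean


averager = StatefulAverager(window_seconds=60)
averager.process(event_time=5, value=10.0)    # belongs to window 0
averager.process(event_time=65, value=30.0)   # belongs to window 60
# A late record for window 0 arrives after window 60 has already been seen.
late_window, late_mean = averager.process(event_time=20, value=20.0)
```

    In this sketch the late record changes the average of window 0 from 10.0 to 15.0 while window 60 is left untouched, which is the hot-path behavior described above in miniature.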

    Before You Begin: Accessing and Navigating

    This section outlines how to access the Pipeline Builder and covers its basic navigation features.

    Accessing Pipeline Builder 

    Step 1: Open the top navigation and select Factory BUILD.

    Step 2: From the displayed workspaces, select the one containing the desired pipeline.

    Step 3: Open the pipeline. Note: If a deployed version exists, it is displayed by default. If the pipeline has never been deployed, the draft version appears.

    Basic Navigation

    The following controls are available in the Pipeline Builder.

    Factory BUILD Homepage: Select the Workspace Icon to navigate back to the main Factory BUILD page. 

    Auto-Save: The system saves your work automatically. As a precaution, also save manually after making changes to ensure they persist.

    Workspace Navigation: Use the provided dropdown to switch between the Pipeline Builder and the Environment Builder. 

    Draft vs Deployed: If available, use the dropdown to access the deployed version of the pipeline. 



    Options: Switch to JSON mode, manage Extensions (to upload new operators), and access Download.


    Workflow: Central Error Console & Advanced Operator Use

    Navigating Errors with the Central Error Console

    This feature is crucial for managing and understanding potential errors, especially in large pipelines.

    1. Locate and select the Error Console in the upper right corner of the screen. This action reveals all errors within the context of the pipeline.

    2. Search for a specific operator name or error description, or simply scroll through the list. 

    3. Select an error from the list.

    • The corresponding operator on the canvas is automatically selected.

    • The operator configuration drawer opens.

    • If the operator is part of a template, the template will also expand.
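
    The search in step 2 can be pictured as a filter over the error list. The sketch below is illustrative only: PipelineError, search_errors, and the sample operator names are hypothetical, and a case-insensitive substring match is an assumption, since the console's actual matching rules are not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineError:
    operator: str      # name of the operator that reported the error
    description: str   # human-readable error message


def search_errors(errors, query):
    """Return errors whose operator name or description contains the query."""
    needle = query.strip().lower()
    if not needle:
        return list(errors)  # an empty search shows the full list
    return [
        e for e in errors
        if needle in e.operator.lower() or needle in e.description.lower()
    ]


# Hypothetical sample data for illustration.
errors = [
    PipelineError("parse_sensor_json", "Missing required field 'timestamp'"),
    PipelineError("join_work_orders", "Unknown table reference"),
    PipelineError("compute_oee", "Division by zero in expression"),
]
matches = search_errors(errors, "table")
```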

    Advanced Operator Workflow (JINJA, JSON, Data Dictionary)

    This workflow illustrates the dynamic configuration capabilities of the pipeline.

    1. Open an operator that uses JINJA syntax.


    2. Open the JSON configuration and evaluate it in JSON mode.

     

    3. Access the tables using the Data Dictionary option. Within the Data Dictionary option, navigate to the appropriate Table name using the dropdown. 
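
    As a rough illustration of how a JINJA-style placeholder inside a JSON configuration might resolve, consider the sketch below. It is not the Pipeline Builder's rendering engine: the render function is a simplified stand-in for Jinja that handles plain {{ name }} variables only, and the configuration fields (operator, table, condition) and variable names are hypothetical.

```python
import json
import re

# Hypothetical operator configuration; field names are illustrative only.
config_template = """
{
  "operator": "filter_rows",
  "table": "{{ table_name }}",
  "condition": "temperature > {{ threshold }}"
}
"""

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict) -> str:
    """Replace each {{ name }} placeholder with its value.

    Simplified stand-in for Jinja: supports plain variables only and raises
    KeyError when a placeholder has no matching value.
    """
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)


rendered = render(config_template, {"table_name": "machine_events", "threshold": 75})
config = json.loads(rendered)  # raises an error if the result is not valid JSON
```

    Parsing the rendered text with json.loads is a quick way to confirm that the substituted values still produce valid JSON before the configuration is used.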

    Feature Benefits

    The Pipeline Builder offers significant advantages for real-time data ingestion and processing:

    • Real-Time Data Streams: Automatically converts various raw data formats/sources into a live, flowing data stream.

    • Data Foundation Robustness: Effectively handles and processes late, missing, and out-of-order data without crashing the system.

    • Stateful Accuracy: Stateful Processing ensures calculations are accurate by retaining knowledge of previous records and dynamically making recalculations as new data arrives.

    • No Cold Path Reruns: Data models are refreshed in real time (hot path), eliminating the need to rerun extensive batch processes (cold path) when historical or out-of-order data comes in.

    • Streamlined Error Resolution: The Central Error Console provides a unified view and immediate navigation to the error-causing operator on the canvas, significantly speeding up debugging.

    Summary

    Pipeline Builder in Factory BUILD is a powerful, real-time data processing tool that uses Stateful Processing to efficiently handle imperfect, streaming data. Its integrated features, like the Central Error Console and support for advanced configuration via JINJA/JSON, ensure that data models are always current and accurate, transforming complex data ingestion into a simple, automated workflow.

