3. Evaluating the Project Landscape

An important part of the pre-engagement process is evaluating the customer’s current data landscape (i.e., the various machines, databases, and systems that the customer has in place and from which Sight Machine can extract data). Understanding where individual plants or factories sit in terms of readiness allows the customer to select the products that will have the greatest impact.

Examining the Physical Process

Start your plant examination by gathering existing diagrams—or creating new diagrams—of the customer’s manufacturing process at each applicable factory, plant, or division. The diagrams should include details such as the following:

  • A graphical flowchart showing each step of the physical process at each plant
  • Lists of all the data sources generated during each process at each plant
  • A description of what type of product is being produced by each machine

Sample Physical Process Diagram

Data Evaluation Meeting Resources

As a reference, please see General Data Requirements and Process Questions. You can copy and customize this document for the customer and give it to stakeholders as part of the data evaluation meeting.

NOTE: The document is still in draft form; final version is pending.

Questions to Ask the Customer

During the data evaluation meeting, you and the implementation team should ask the customer the following questions:

  • What is the overarching initiative with which this effort is associated?
  • What is the driving use case for the datasets and site(s)?
  • Are there any assumptions about the primary driver of the issues to explore?
  • Is there a process flow available?
  • How long does it take to move from raw material to finished product?
  • How long does each process step take?
  • Can the start and end of a process step be identified in the data provided? How?
  • What are the data sources that will be used?
  • Can we receive sample data? What period of time will it cover?
  • How do these data sources relate to the various assets?
  • Are there manual data capture steps? Where are those in the process?
  • In what format will the data be provided?
  • How can the data be joined/assembled together? Is this being done now? How?
  • Is there an SME/process/automation engineer available to assist with any questions that may arise regarding the data or the process?
  • What are identifiers that can track the batch throughout the process?
  • Is there an identifier that can be used to track back to the supplier?
  • Are there different types of quality issues that arise? If so, are these identified in the data?
  • Are there particular downtime or stop issues that are notable? How are they captured and coded? How are they currently used?
  • Is there supplemental information that exists outside of the line that might be relevant in diagnosing the cause of the downtime and quality issues (e.g., environmental factors)?

Analyzing the Organizational Functions

Understanding the customer’s physical manufacturing process helps you catalog the available stakeholders and personnel resources that will work with the implementation teams. You will want to convey to the customer the resources’ expected roles and responsibilities during the engagement process. Some of the resources use the platform as part of their workflow, while others need to know about the process so they can provide appropriate organizational support.

Organizational Functions


Interact with Data Systems?

Roles and Responsibilities




· Act as a point of contact during initial setup and connectivity

· Facilitate network configuration and troubleshooting


· Maintain network connectivity post-deployment

Data Engineers



· Have deep historical knowledge of the data systems

· Execute data-centric components of the project, such as: data definition, collection, acquisition, management, and analysis


· Build, customize, and maintain many of the customer’s current applications and systems

Data Scientists



· Have access to cleaned, conditioned data, transformed into tables and intuitive objects

· Acquire data from a variety of raw data streams, parse and verify manual sources to produce consistently structured and formatted data, and keep the data updated


· Evaluate historical data to determine the efficiency of the company’s manufacturing

· Interpret data and draw insights using analytic tools

Production and Manufacturing Engineering



· Use data to help support the production lines

· Are the closest to overall equipment effectiveness (OEE) calculations and use the metric for root cause problem solving on their individual lines

Quality Control



· Use data to ensure production quality

· Identify quality issues on the manufacturing lines and point out where efficiency is lacking

Machine Operators



· Use key sensor information and related key performance indicators (KPIs) for the lines and machines

· Want to know the causes of machine downtime so they can improve machine performance

Plant-Level Management



· Have real-time visibility into every machine and line throughout the facility for immediate and actionable insights

Corporate-Level Management



· Use standard key performance indicators (KPIs) to compare across divisions, plants, and suppliers and identify problems and best practices

Creating a Stakeholder List

You should create a list of stakeholders who will act as the project implementation or roll-out team. Include both Sight Machine and customer personnel, their roles, and their contact information.

Sample Stakeholder List




Contact Information

Sight Machine

Transformation Lead

John Smith



Data Architect

Sri Patel



Customer Success Manager

Hans Jensen



Data Expert

Thomas Atkins



Data Scientist

Ivana Horvat




Executive Sponsor

Rajwinder Kaur



Manufacturing Lead

Maria Rossi



Manufacturing Lead

Juan Perez



IT Lead

Jean Dupont



IT Lead

Yamada Hanako




Peter Schmidt




Mary Jones



Data Scientist

David Green



Project Manager

James King



Cataloging the Different Types of Data Sources

You will work with the customer’s internal resources to determine the various data sources used throughout the manufacturing process. Inquire about all of the following:

  • Real-Time Data Sources: Examples include data directly from sensors and programmable logic controllers (PLCs), images from cameras, worker IDs and shift codes gathered at time of operation from scheduling systems, supplier data, etc.
  • Archive Data Sources: Examples include historians and databases. Computer-based data sources may store their data here, in a time series format or organized by material or asset.
  • File-Based Data Sources: Examples include Excel spreadsheet outputs, logfiles from computer programs, unstructured data sources like images or G-code files, etc. This is often data that is produced as part of the assembly line process but is not currently being interpreted.
  • Application Programming Interfaces (APIs): Examples include product data such as serial codes pulled from legacy Manufacturing Execution Systems (MES)/Factory Information Systems (FIS) and enterprise resource planning (ERP) solutions, and other systems of record that we can integrate with directly.
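
One way to keep the catalog consistent across plants is to record each source in a small structured inventory. The following Python sketch is illustrative only: the DataSource fields, the type labels, and the example entries are hypothetical and are not part of any Sight Machine tooling.

```python
from collections import defaultdict
from dataclasses import dataclass

# The four categories mirror the list above; the labels themselves are assumptions.
SOURCE_TYPES = ("real-time", "archive", "file-based", "api")


@dataclass
class DataSource:
    name: str          # e.g. "Line 1 PLC"
    source_type: str   # one of SOURCE_TYPES
    plant: str         # plant or factory where the source lives
    owner: str = ""    # customer contact who can answer questions about it

    def __post_init__(self):
        if self.source_type not in SOURCE_TYPES:
            raise ValueError(f"unknown source type: {self.source_type!r}")


def group_by_type(sources):
    """Return {source_type: [source names]} for a quick coverage check."""
    grouped = defaultdict(list)
    for source in sources:
        grouped[source.source_type].append(source.name)
    return dict(grouped)


# Hypothetical entries for a single plant.
catalog = [
    DataSource("Line 1 PLC", "real-time", "Plant A"),
    DataSource("Process historian", "archive", "Plant A"),
    DataSource("Shift report spreadsheet", "file-based", "Plant A"),
    DataSource("MES serial number lookup", "api", "Plant A"),
]
summary = group_by_type(catalog)
missing = [t for t in SOURCE_TYPES if t not in summary]
print(summary)
print("Types with no source recorded yet:", missing)
```

A category that shows up in the missing list is a prompt to ask the customer whether such a source exists, not proof that it does not.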

Working with Sample Data

After cataloging the data sources, you will work with the customer to acquire sample data from each source that you identified so the implementation team can analyze it. You should familiarize yourself with the data formats and types of data that you might see.

Typical Data Formats

Most frequently, the customer delivers sample data as CSV or Excel files. In addition, you may receive database dumps or file backups.

CSV File Sample

A comma-separated values (CSV) file contains the values in a table as a series of ASCII text lines organized so that each column value is separated by a comma from the next column's value and each row starts a new line. The file extension is .csv.
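
As a rough illustration of that structure, the sketch below reads a small CSV sample with Python's standard csv module. The column names and values are made up for the example; a real customer file would be opened from disk instead of from an in-memory string.

```python
import csv
import io

# Hypothetical sample; a real file would be opened with open(path, newline="").
sample = io.StringIO(
    "timestamp,machine_id,temperature_c\n"
    "2023-01-01T00:00:00,press_01,71.5\n"
    "2023-01-01T00:01:00,press_01,72.1\n"
)

reader = csv.DictReader(sample)   # the first line supplies the column names
rows = list(reader)               # each later line becomes one dict per row

print(reader.fieldnames)
print(len(rows), "rows; first row:", rows[0])
```

Every value arrives as a string, so numeric and timestamp columns need to be converted before analysis.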

Excel File Sample

A Microsoft Excel spreadsheet contains various columns and rows of data. There may be multiple worksheets in each spreadsheet. Possible file extensions include .xlsx, .xls, .xlsm (macro-enabled), .xltx (template), etc.

Database Dump File Sample

A database dump file contains a record of the table structure and/or the data from a database and is most often used for backing up a database so that its contents can be restored in the event of data loss. A MySQL dump file is usually in the form of a list of SQL statements.

The file extension depends on the type of database from which the file was exported. For example, a MySQL dump file typically has a .sql extension (although MySQL data may also be exported as .csv or .xml), a Microsoft SQL Server backup typically uses .bak, and an Oracle database dump uses the .dmp extension.

A database dump file often has the date of backup in the file name, and it must be imported back into a native database to be accessed. Because this type of file can be very large, it is usually compressed for delivery (using Zip or gzip) or stored on a customer file system.
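
The sketch below shows the general idea of decompressing a dump and replaying its SQL statements into a database. It is a simplified stand-in: the table and values are hypothetical, and it uses Python's built-in SQLite as a scratch database, whereas a real MySQL or Oracle dump would have to be restored into its own database engine.

```python
import gzip
import sqlite3

# Hypothetical, heavily simplified dump. Real MySQL dumps contain
# MySQL-specific syntax and should be restored into MySQL itself.
dump_sql = """
CREATE TABLE downtime (machine_id TEXT, reason_code INTEGER, minutes REAL);
INSERT INTO downtime VALUES ('press_01', 12, 4.5);
INSERT INTO downtime VALUES ('press_02', 7, 11.0);
"""

# Simulate the compressed file a customer might deliver, then decompress it.
compressed = gzip.compress(dump_sql.encode("utf-8"))
restored_sql = gzip.decompress(compressed).decode("utf-8")

# Replay the statements into a scratch in-memory database and query it.
conn = sqlite3.connect(":memory:")
conn.executescript(restored_sql)
rows = conn.execute(
    "SELECT machine_id, minutes FROM downtime ORDER BY minutes DESC"
).fetchall()
conn.close()
print(rows)
```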

File Backup Sample

Many programs use proprietary file formats that have customized extensions and require a parser to be written. If these are text files, you can view them in a text editor; if they are binary, you may need to use a hex editor.

Text File Sample (modified INI format):

Binary File Sample:
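
To illustrate both cases without a customer file, the sketch below parses an INI-style text sample with Python's configparser module and prints a hex-editor-style view of a few binary bytes. The file contents and the hex_dump helper are hypothetical; real proprietary formats often deviate from plain INI and usually need a purpose-built parser.

```python
import configparser

# Hypothetical file contents; real vendor formats often deviate from plain INI.
text_sample = """\
[machine]
id = press_01
cycle_time_s = 4.2

[recipe]
name = bracket_a
"""

parser = configparser.ConfigParser()
parser.read_string(text_sample)
cycle_time = parser.getfloat("machine", "cycle_time_s")
print(parser.sections(), cycle_time)


def hex_dump(data, width=8):
    """Format bytes as offset, hex values, and printable ASCII, like a hex editor."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3}} {text_part}")
    return "\n".join(lines)


binary_sample = b"SMF1\x00\x01\x02\xffdata"   # made-up bytes, not a real format
dump = hex_dump(binary_sample)
print(dump)
```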

Types of Data

You may see any of the following data types from the customer:

  • Rich time series data: This data contains a complete snapshot of state for a part of a process at a fixed time interval.

  • Sparse time series data: This data contains a single tag or subset of tags at each timestamp, and may be at a variable time interval.

  • One record per output data: This data is organized around the parts/materials as they go through a process, and may be a record for an individual serial number or a single record for a batch of raw or finished material.

  • Event log data: This data is a record of machine events, sometimes with mixed variables, event descriptions, and event codes. A typical format is TIMESTAMP, EVENT_ID, EVENT_DATA.

  • Logfile data: This data contains free-form logfiles from computer-based applications. The structure can vary significantly, but you can usually expect a timestamp and a free-form message.
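
The difference between sparse and rich time series is easiest to see by converting one into the other. The following sketch uses pandas and assumes a hypothetical sparse export with timestamp, tag, and value columns. It pivots the tags into columns and carries the last known value forward to build a snapshot at a fixed one-minute interval. Forward filling is only one possible choice; whether it suits a given signal is a question for the customer's process experts.

```python
import io

import pandas as pd

# Hypothetical sparse export: one tag value per row, irregular timestamps.
sparse_csv = io.StringIO(
    "timestamp,tag,value\n"
    "2023-01-01 00:00:00,temperature,71.5\n"
    "2023-01-01 00:00:20,pressure,2.1\n"
    "2023-01-01 00:01:10,temperature,72.0\n"
    "2023-01-01 00:02:05,pressure,2.3\n"
)
sparse = pd.read_csv(sparse_csv, parse_dates=["timestamp"])

# One column per tag, one row per timestamp; most cells start out empty.
wide = sparse.pivot(index="timestamp", columns="tag", values="value")

# Carry the last known value of each tag forward, then keep the last
# reading seen within each one-minute bin as that bin's snapshot.
rich = wide.ffill().resample("1min").last().ffill()
print(rich)
```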

Tools for Working with Data

You will want to use different software and programming tools to open and read the various data sources. Your choice may depend on the platform on which you are working and the file type that you are trying to access, as well as personal preference.

The following list of tools is by no means exhaustive.




Use to open CSV or Excel files.

Text/Source Code Editors

Use to open text or source code files.

· Use Notepad++ for Windows.

· Use Sublime Text for Mac.

Hex Editors

Use to inspect binary data files, which a hex editor displays as hexadecimal byte values.

· There are several good hex editors available for Windows.

· Hex Fiend is an open-source hex editor for Mac.

Python Notebooks

Use different software modules to open different file formats.

· For CSV and Excel files, use pandas (a Python library for data manipulation and analysis).

· For SQL, use SQLAlchemy.

· For binary files, use Python's built-in file reading by opening the file in binary mode.
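
As a sketch of the binary case, the example below decodes fixed-size records with Python's standard struct module. The record layout (timestamp, sensor ID, reading) is an assumption made up for illustration; the actual layout of a customer's binary file has to come from the vendor documentation or from inspection in a hex editor.

```python
import struct

# Hypothetical record layout (an assumption for illustration only):
# little-endian uint32 timestamp, uint16 sensor ID, float32 reading.
RECORD = struct.Struct("<IHf")

# Build two fake records in memory; a real file would be read with
# open(path, "rb").read().
raw = RECORD.pack(1672531200, 7, 71.5) + RECORD.pack(1672531260, 7, 72.25)

# Step through the buffer one record at a time and decode each record.
records = [RECORD.unpack_from(raw, offset)
           for offset in range(0, len(raw), RECORD.size)]
for timestamp, sensor_id, reading in records:
    print(timestamp, sensor_id, reading)
```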

ELK Stack

Use to read and search through logfiles (especially large volumes).

ELK consists of three powerful tools (which are reflected in the tool’s acronym):

· Elasticsearch: A log search tool.

· Logstash: A tool for log data intake, processing, and output, including system logs, Web server logs, error logs, and application logs.

· Kibana: A log-data dashboard that contains point-and-click pie charts, bar graphs, trendlines, maps, and scatter plots.




HeidiSQL

Use to browse and edit SQL data, as well as create and edit tables, views, procedures, triggers, and scheduled events. HeidiSQL is a powerful and easy-to-use client for MySQL, MariaDB, Microsoft SQL Server, and PostgreSQL.



Building a Customer Value Map

An important step in evaluating the project landscape is creating a customer value map. This valuable tool helps you set project milestones and informs the order of the implementation plan. For instance, how far do you have to get through the implementation plan in order to provide value to the customer?

The map contains the following columns to help you visualize the value:

  • Task: This column lists items that will be in the implementation plan, prioritized from easiest/fastest to accomplish to most difficult/time consuming.
  • Customer Value: This column lists improvements or changes that the customer’s stakeholders view as important, listed from most to least important.

You can draw a connection from each task to each customer value to determine when the most value will be provided during the implementation process.
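
The same connections can be worked out in a few lines of Python. In the sketch below, the task names, customer values, and links are all hypothetical. It assumes that a value is delivered once the last task it depends on is complete, which is one simple way to read the map and not a prescribed method.

```python
# Hypothetical tasks, listed in planned implementation order.
tasks = [
    "Connect PLC data",
    "Ingest historian archive",
    "Model line OEE",
    "Add quality data",
]

# Hypothetical connections: the tasks that each customer value depends on.
value_links = {
    "Downtime visibility": ["Connect PLC data"],
    "Plant OEE comparison": ["Connect PLC data", "Model line OEE"],
    "Scrap root-cause analysis": ["Ingest historian archive", "Add quality data"],
}


def delivery_step(tasks, value_links):
    """Return {value: 1-based position of the last task it depends on}."""
    position = {task: i for i, task in enumerate(tasks, start=1)}
    return {value: max(position[t] for t in needed)
            for value, needed in value_links.items()}


steps = delivery_step(tasks, value_links)
for value, step in sorted(steps.items(), key=lambda item: item[1]):
    print(f"after task {step} ({tasks[step - 1]}): {value}")
```

If the values that matter most to the customer only appear late in this ordering, that is a signal to revisit either the task order or the customer's priorities.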

Consider the two examples below, which list identical tasks and customer values. The tasks are listed in the order that the implementation team prefers, but depending on which tasks are attainable (yellow) and which are not (grey), you may have to guide the customer toward reprioritizing their value list.

In this example, it is clear that customer values 3 and 4 are the most realistic.

Customer Value Map Example 1

In this example, customer values 1 and 2 are more appropriate.

Customer Value Map Example 2