4. Developing a Project Description

You are responsible for creating a detailed project description for the implementation team to follow. Later, the Sight Machine teams will use this description to develop the project’s statement of work.

The project description should include sections such as:

  • Introduction/Executive Overview
  • Objectives
  • Risk Factors or Concerns
  • Key Contacts
  • Process Details (e.g., Industry Overview, Line/Process Mapping)
  • Information About the Data (e.g., Connectivity and Data Sources)
  • Sample Data Exploration (e.g., File Exploration, Data Model Alignment)

You can refer to the Sample Project Description section below for more information.

Gathering Materials

To develop an accurate and useful project description, you will collect a variety of materials from the customer, some of which you should already have after evaluating the project landscape. You can use the checklist below to mark your progress.

Materials Describing the Physical Process

  • Gather or create factory, plant, and division flowcharts and diagrams. See Examining the Physical Process in Evaluating the Project Landscape.
  • Take pertinent photographs/videos.
  • Collect verbal descriptions from stakeholders.

Materials Describing the Data from the Process

  • Identify all in-scope data sources.
  • Identify the data payloads (i.e., connectivity information) of the data sources.
  • Build a network data flow diagram to illustrate the customer data infrastructure, showing which machines can access the cloud, and so on.
  • Build a matrix of current data assets in relation to the AI Data Pipeline manufacturing model.

Materials Describing the Goals in the Data Project

  • Indicate the users of the data, as well as their roles and responsibilities. See Analyzing the Organizational Functions in Evaluating the Project Landscape.
  • Describe the types of data problems that the customer wants to solve. These customer value propositions are user stories based on issues or questions the customer has about the machines’ real-time telemetry.
  • Find examples of data problem statements. These are the criteria by which the customer will judge the success of the project (i.e., the customer return on investment, or ROI). For example:
      • Collect data from the process historian (30-second fidelity), quality, and APA separator file.
      • Baseline should account for seasonal variability (via historical data) and include the major mT of production per day (calculated from separator flow in m³/h).
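
The second example statement derives daily tonnage from a separator flow reading. A minimal sketch of that arithmetic follows, assuming the historian delivers one flow sample (in m³/h) every 30 seconds. The function name `daily_tonnes`, the sample data, and the density value are hypothetical placeholders; the actual density must come from the customer.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SAMPLE_INTERVAL_S = 30      # historian fidelity named in the example statement
DENSITY_T_PER_M3 = 1.0      # ASSUMPTION: placeholder density, replace with the customer's value


def daily_tonnes(samples, interval_s=SAMPLE_INTERVAL_S, density=DENSITY_T_PER_M3):
    """Sum (timestamp, flow in m3/h) samples into metric tons per calendar day."""
    totals = defaultdict(float)
    for ts, flow_m3_per_h in samples:
        # Volume that passed during one sample interval, in m3
        volume_m3 = flow_m3_per_h * interval_s / 3600.0
        totals[ts.date()] += volume_m3 * density
    return dict(totals)


# Hypothetical data: one hour of constant 120 m3/h flow, sampled every 30 seconds
start = datetime(2024, 1, 1)
samples = [(start + timedelta(seconds=30 * i), 120.0) for i in range(120)]
result = daily_tonnes(samples)
print(result)
```

Running the same calculation over historical data for each season gives a set of daily totals from which a baseline can be chosen.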

Sample Project Description

The following sample shows all of the sections that a project description should contain.


Executive Overview

Multinational company headquartered in Mumbai, India. Business activities include manufacture, sales, and distribution of paints, coatings, and household decor products.

Prospect Reference Links

Include links to corporate homepage, product information, etc.


Customer Objectives

This list should be numbered:

  1. Reduce Waste
  2. Optimize Line Efficiency

Phase 1 Deliverables (relating back to Customer Objectives)

  1. Analysis of root cause of waste on line 3 (Objective #1)
  2. Analysis of different lines at different plants to compare efficiencies (Objective #2)

Risk Factors or Concerns

Identify any risk factors or concerns that have arisen during discovery and qualification. Tie them back to the deliverables.

Risk Factor | Affects Deliverable
Machine data not available. | 1, 2

Key Contacts

For both Customer and Sight Machine contacts, include the following in a table:

  • Name, title, email, and phone
  • Role in project: SME, project manager, day-to-day technical contact, plant manager, executive leadership, extrusion expert, etc.
  • Pertinent notes: critical to project funding, be sure to attend meetings where this contact is in attendance, etc.

Process Details

Industry Overview/Supplementary Material

Include general industry information found during the discovery process. YouTube videos can be helpful here, as can videos, slides, and images from the customer or facility.

Line/Process Mapping

Add a block diagram of the line. Highlight the portion where we will focus.

Information About the Data

Raw Data Connectivity

For each data source, record the data name, its connectivity details, and links to sample data:

  • MySQL Data Warehouse: every min; GMP network. Missing key table, but will be added.
  • Defect Spreadsheet: Corp LAN

NOTE: If PLC, we need tag mappings and information from customer.

  • Please note any specifics for tables, or anything else you find during the presales process.
  • Protocols available and supported by DE team:
      • OPC UA
      • SQL
      • Spreadsheet

Process Area Data

For each process area, record the data name and the table or sheet:

  • MySQL Data Warehouse: EBRS Table; CPV Table. Potential cycle boundary: every row = 1 cycle.
  • MySQL Data Warehouse: Parameter Tables (Categorical additional info)

Legend:

  • Contains information or supporting information for the given category.
  • Potentially relevant to the given category.
  • No relevant information for the given category.
  • Reviewed and determined it is not applicable to Sight Machine's use cases.

Sample Data Exploration

File Exploration

Example questions to ask of the data:

  1. What is the general structure of the data? (show with Jupyter)
  2. How many rows? Columns?
  3. Any noticeable issues with the data?
  4. Is there a cycle boundary field? How would we identify it?
  5. Output count definition
  6. Serial Numbers/Batch Numbers
  7. Downtime Indicators
  8. Defect Codes
  9. Downtime Codes
  10. Ideal/Max Cycle Time
  11. Recipe or Product Indicator
  12. OEE Equations
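
The first three questions (structure, row and column counts, noticeable issues) can be answered with a few lines in a notebook. The sketch below uses only the Python standard library. The sample CSV text, its column names, and the `summarize` helper are hypothetical stand-ins for a customer file.

```python
import csv
import io

# Hypothetical sample contents; in practice, read the customer's file instead.
SAMPLE_CSV = """timestamp,serial_number,cycle_time_s,defect_code
2024-01-01T00:00:00,SN001,12.5,
2024-01-01T00:00:30,SN002,,D17
2024-01-01T00:01:00,SN003,13.1,
"""


def summarize(csv_text):
    """Return the row count, column names, and empty-cell count per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = list(reader.fieldnames or [])
    missing = {name: 0 for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            # Treat absent or whitespace-only cells as missing
            if not (row[name] or "").strip():
                missing[name] += 1
    return {"rows": rows, "columns": columns, "missing": missing}


summary = summarize(SAMPLE_CSV)
print(summary)
```

Columns with many empty cells are a first hint of data issues, and a column such as a serial number that changes on every row is a candidate cycle boundary field.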

EXAMPLE: {File 1} Data Exploration

  • Link to Jupyter Notebook Exploration (HTML version)
  • 00-DownSample-Dataset.ipynb

EXAMPLE: {File 2} Data Exploration

  • Link to Jupyter Notebook Exploration (HTML version)
  • 01-Next-logical-step.ipynb

Note the sequentially numbered notebook names in the exploration (e.g., 01-Next-logical-step.ipynb). Numbering the notebooks ensures that another person can rerun them in a logical order and reproduce your findings.
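
As an illustration of why the numeric prefix matters, the following sketch sorts notebook file names into run order and rejects any name without a prefix. The `ordered_notebooks` helper is hypothetical and not part of any Sight Machine tooling.

```python
import re

# A notebook name is expected to start with digits followed by a hyphen
PREFIX = re.compile(r"^(\d+)-")


def ordered_notebooks(names):
    """Return notebook names sorted by numeric prefix; reject unnumbered names."""
    numbered = []
    for name in names:
        match = PREFIX.match(name)
        if match is None:
            raise ValueError(f"notebook lacks a numeric prefix: {name}")
        numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


names = ["01-Next-logical-step.ipynb", "00-DownSample-Dataset.ipynb"]
run_order = ordered_notebooks(names)
print(run_order)
```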