Correlation Heatmap
Overview
The Correlation Heatmap is a multivariate analysis tool that helps you identify unexpected correlations among parameters and narrow down potential root causes that affect output fields. It can be used on the Cycle model for high-granularity comparisons and on the Part model for lower-granularity comparisons, making it a valuable first step in data analysis.
Before You Begin
Ensure you have:
- Access to the Application tab in Sight Machine
- Permission to view selected assets or part types
- At least 3–4 numeric parameters for correlation
- Minimum 30 observations in your selected date range
📝 Note: The more complete your dataset, the more reliable the correlations.
How to Create a Correlation Heatmap
1. Open the Correlation Heatmap
- Navigate to Application
- Select Correlation
- Select Correlation Heatmap

2. Select Your Data Model
Choose between:
- Cycles – analyze process parameters within the same asset or asset type
- Parts – analyze part-level characteristics across multiple machines
💡 Tip:
- Use Cycles when comparing readings from similar or identical machines.
- Use Parts when comparing finished product features across lines.

3. Choose Assets or Part Types
- Select Asset (Cycles) or Part Type (Parts)
- Choose one or multiple similar assets
📝 Note: Selecting multiple similar assets increases sample size, improving correlation reliability.

4. Set the Time Range
Choose from:
- Relative Ranges: Last 7/30/90 days
- Absolute Range: Specific start/end dates
⚠️ Warning: If the date range includes major process changes or downtime, correlations may be distorted.

5. Select Parameters
- Select 3–10 numeric parameters
- Categorical fields are automatically filtered out
💡 Tip: Start with 5–7 parameters for clearer visual results.

6. Configure Carry-Forwards
Choose how to handle forward-filled values:
- Keep All (default)
- First
- Last

7. Generate the Heatmap
- Select Update
- The heatmap calculates all correlations
- Results display in the matrix
8. Interpret the Heatmap
- Dark blue → strong positive correlation
- Dark red → strong negative correlation
- White/gray → weak or no correlation
💡 Tip:
Start by reviewing the darkest cells to identify the strongest parameter relationships.

Tips, Notes, and Warnings
💡 Tip:
Use correlation findings to guide deeper analysis with Curve Fit or Time-Series Correlation.
📝 Note:
Parameters with very low variability often show weak correlations simply because they don’t change much.
⚠️ Warning:
Correlation ≠ causation. Always validate findings with process knowledge.
Practical Examples
Example 1
Process Parameter Exploration
Scenario: A food processing plant wants to understand relationships between temperature, pressure, humidity, cycle time, and energy consumption.
Configuration:
- Model: Cycles
- Assets: Sterilization_Unit_1, Sterilization_Unit_2
- Time Range: Last 60 days
- Parameters: Temperature, Pressure, Humidity, Cycle_Time, Energy_Consumption
Results (Correlation Matrix):
| | Temperature | Pressure | Humidity | Cycle_Time | Energy |
| Temperature | 1.00 | 0.88 | -0.12 | -0.74 | 0.92 |
| Pressure | 0.88 | 1.00 | -0.08 | -0.69 | 0.85 |
| Humidity | -0.12 | -0.08 | 1.00 | 0.15 | -0.18 |
| Cycle_Time | -0.74 | -0.69 | 0.15 | 1.00 | -0.71 |
| Energy | 0.92 | 0.85 | -0.18 | -0.71 | 1.00 |
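A matrix like the one above can also be scanned programmatically. The following is a minimal Python sketch, not part of Sight Machine, that copies the values from the table and ranks the unique parameter pairs by absolute correlation. The function name `rank_pairs` is hypothetical.

```python
# Correlation matrix copied from the Example 1 results table.
params = ["Temperature", "Pressure", "Humidity", "Cycle_Time", "Energy"]
matrix = [
    [1.00, 0.88, -0.12, -0.74, 0.92],
    [0.88, 1.00, -0.08, -0.69, 0.85],
    [-0.12, -0.08, 1.00, 0.15, -0.18],
    [-0.74, -0.69, 0.15, 1.00, -0.71],
    [0.92, 0.85, -0.18, -0.71, 1.00],
]


def rank_pairs(names, corr):
    """Return (name_a, name_b, r) for the upper triangle, strongest |r| first."""
    pairs = [
        (names[i], names[j], corr[i][j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    ]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


ranked = rank_pairs(params, matrix)
for a, b, r in ranked[:3]:
    print(f"{a} vs {b}: {r:+.2f}")
```

Only the upper triangle is read because the matrix is symmetric. Sorting by absolute value keeps strong negative pairs, such as Temperature and Cycle_Time, near the top of the list.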
Key Insights:
- Temperature and Energy are strongly correlated (0.92) → Higher temperatures are associated with higher energy consumption.
- Temperature and Pressure move together (0.88).
- Temperature and Cycle Time show a strong negative correlation (-0.74) → Higher temperatures are associated with shorter cycles.
- Humidity has weak relationships across the board.
Actions:
- Use Curve Fit Analysis to quantify temperature–energy impacts.
- Examine the temperature–cycle time relationship to optimize throughput.
- Confirm that humidity can be safely deprioritized.
Example 2
Quality Investigation
Scenario: An automotive parts manufacturer wants to understand which parameters are most associated with defect rate.
Configuration:
- Model: Parts
- Part Type: Brake_Rotor_A
- Time Range: Last 90 days
- Parameters: Defect_Rate, Material_Hardness, Pour_Temperature, Cooling_Rate, Mold_Pressure, Machine_Speed
Results (Top Correlations with Defect_Rate):
| Parameter Pair | Correlation |
| Defect_Rate ↔ Cooling_Rate | -0.81 |
| Defect_Rate ↔ Material_Hardness | -0.67 |
| Defect_Rate ↔ Mold_Pressure | 0.54 |
| Defect_Rate ↔ Pour_Temperature | -0.42 |
| Defect_Rate ↔ Machine_Speed | 0.38 |
Key Insights:
- Cooling Rate shows the strongest correlation with Defect_Rate (-0.81) → Faster cooling is associated with fewer defects.
- Material Hardness also shows a notable negative correlation (-0.67).
- Mold Pressure shows a moderate positive correlation (0.54), which is unexpected and worth investigating.
- Speed and Pour Temperature show weaker relationships.
Actions:
- Immediately review mold pressure settings.
- Prioritize improvements in cooling rate control.
- Confirm material hardness targets with suppliers.
- Explore time-based correlation changes using Time-Series Correlation.
Example 3
Multicollinearity Detection for Model Building
Scenario: A data scientist wants to build a predictive model for yield and must avoid redundant predictors.
Configuration:
- Model: Cycles
- Assets: Reactor_A, Reactor_B, Reactor_C
- Time Range: Last 120 days
- Parameters: Yield, Temp_Sensor_1, Temp_Sensor_2, Pressure_A, Pressure_B, Flow_Rate, pH_Level, Catalyst_Amount
Results (Identifying Multicollinearity):
Highly correlated predictors:
- Temp_Sensor_1 ↔ Temp_Sensor_2: 0.97
- Pressure_A ↔ Pressure_B: 0.89
- Flow_Rate ↔ Catalyst_Amount: 0.78
Correlations with Yield:
- Temp_Sensor_1: 0.72
- Pressure_A: 0.65
- pH_Level: 0.58
- Catalyst_Amount: 0.51
Key Insights:
- Temperature sensors are nearly identical → Keep only one.
- Pressure measurements are also highly linked → Include only one.
- Flow_Rate and Catalyst_Amount overlap → Consider selecting one or creating a ratio.
Actions:
- Use Temp_Sensor_1, Pressure_A, pH_Level, and Catalyst_Amount (exclude their correlated counterparts).
- This reduces multicollinearity without losing meaningful information.
- Build and validate the predictive model with the selected inputs.
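The selection logic in this example can be sketched in Python as a greedy filter. This is an illustration under stated assumptions, not the tool's behavior: the 0.75 threshold, the function name `select_predictors`, and the choice to treat unlisted correlations as 0.0 are all hypothetical. The coefficients are the ones reported above.

```python
# Predictor-to-predictor correlations reported in Example 3.
pair_corr = {
    ("Temp_Sensor_1", "Temp_Sensor_2"): 0.97,
    ("Pressure_A", "Pressure_B"): 0.89,
    ("Flow_Rate", "Catalyst_Amount"): 0.78,
}
# Correlations with Yield reported in Example 3 (the others were not listed).
yield_corr = {
    "Temp_Sensor_1": 0.72,
    "Pressure_A": 0.65,
    "pH_Level": 0.58,
    "Catalyst_Amount": 0.51,
}
candidates = [
    "Temp_Sensor_1", "Temp_Sensor_2", "Pressure_A", "Pressure_B",
    "Flow_Rate", "pH_Level", "Catalyst_Amount",
]


def select_predictors(candidates, pair_corr, target_corr, threshold=0.75):
    """Visit candidates by |correlation with the target|, strongest first,
    and keep each one unless it is highly correlated with a kept predictor.
    Assumption: any correlation missing from the inputs is treated as 0.0."""
    def redundant(a, b):
        r = pair_corr.get((a, b), pair_corr.get((b, a), 0.0))
        return abs(r) >= threshold

    ordered = sorted(
        candidates, key=lambda p: abs(target_corr.get(p, 0.0)), reverse=True
    )
    kept = []
    for p in ordered:
        if not any(redundant(p, k) for k in kept):
            kept.append(p)
    return kept


selected = select_predictors(candidates, pair_corr, yield_corr)
print(selected)
```

With these inputs the sketch keeps the same four predictors named in the Actions list. A different threshold would change the result. For example, a threshold above 0.78 would also keep Flow_Rate.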
Calculation Method
Sight Machine uses the Pearson-R correlation coefficient to quantify linear relationships between pairs of continuous variables.
Pearson-R Correlation Coefficient
Range: -1.0 to +1.0
Interpretation:
- +1.0: Perfect positive relationship
- 0.0: No linear relationship
- -1.0: Perfect negative relationship
Mathematical Formula
r = (n × Σ(xy) - Σx × Σy) / √[(n × Σ(x²) - (Σx)²) × (n × Σ(y²) - (Σy)²)]
Definitions:
- n = number of observations
- Σ(xy) = sum of products
- Σx, Σy = sums
- Σ(x²), Σ(y²) = sums of squares
Alternative Formula (Mean-Centered)
r = Σ[(xi - x̄)(yi - ȳ)] / √[Σ(xi - x̄)² × Σ(yi - ȳ)²]
Both formulas produce the same result.
Covariance-Based Formula
r = Cov(X,Y) / (σx × σy)
Pearson-R normalizes covariance so results are always between −1 and +1.
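To see that the three formulas agree, the sketch below implements each one in plain Python and compares the results. The function names and the sample readings are made up for illustration. The covariance version uses population (divide by n) statistics, which is an assumption; the result is the same with sample statistics because the divisor cancels.

```python
import math


def pearson_raw_sums(x, y):
    """Formula based on n, Σx, Σy, Σ(xy), Σ(x²), Σ(y²)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx ** 2) * (n * syy - sy ** 2)
    )


def pearson_mean_centered(x, y):
    """Formula based on deviations from the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(
        sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    )
    return num / den


def pearson_covariance(x, y):
    """Formula r = Cov(X,Y) / (σx × σy), using population statistics."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sd_x = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sd_x * sd_y)


# Made-up readings, for illustration only.
temperature = [70.0, 72.0, 75.0, 78.0, 80.0]
energy = [10.1, 10.9, 11.8, 13.2, 13.9]

r_raw = pearson_raw_sums(temperature, energy)
r_centered = pearson_mean_centered(temperature, energy)
r_cov = pearson_covariance(temperature, energy)
print(r_raw, r_centered, r_cov)
```

The three values match up to floating-point rounding. In practice the mean-centered form tends to be more numerically stable than the raw-sums form when values are large relative to their spread.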
How Correlation Heatmap Calculations Work
Step 1: Parameter Selection and Filtering
The tool identifies eligible fields:
- Includes: Continuous numeric fields
- Excludes: Categorical, textual, timestamp fields, and parameters marked excluded in the model
Users select 3–10 numeric parameters.
For N parameters, the heatmap computes N × (N−1) / 2 unique correlations.
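As a quick check of the pair count, this short Python sketch enumerates the unique pairs with `itertools.combinations`. The helper name `unique_pairs` is hypothetical, and the parameter names are taken from Example 1.

```python
from itertools import combinations


def unique_pairs(parameters):
    """Return every unordered pair of distinct parameters."""
    return list(combinations(parameters, 2))


params = ["Temperature", "Pressure", "Humidity", "Cycle_Time", "Energy_Consumption"]
pairs = unique_pairs(params)
print(len(pairs))  # 5 × 4 / 2 = 10
```

At the 10-parameter maximum, the same formula gives 10 × 9 / 2 = 45 unique correlations.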
Step 2: Data Retrieval and Preparation
The tool collects all cycles or parts matching:
- Selected assets or part types
- Date range
- Carry-forward configuration
Data is loaded into an observations × parameters table, missing values are flagged, and carry-forward rules are applied.
Step 3: Pairwise Correlation Calculation
For each parameter pair:
- Identify observations where both values exist.
- Compute Σx, Σy, Σ(xy), Σ(x²), Σ(y²).
- Apply the Pearson-R formula.
- Store the coefficient and sample size.
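The steps above can be sketched in Python as follows. This is an illustration, not Sight Machine's implementation: missing values are represented as `None`, the function name `pairwise_pearson` is hypothetical, and returning `None` when the coefficient is undefined is an assumption.

```python
import math


def pairwise_pearson(xs, ys):
    """Return (r, n) using only observations where both values exist."""
    # Keep only rows where both parameters have a value.
    valid = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(valid)

    # Compute the sums used by the Pearson-R formula.
    sx = sum(x for x, _ in valid)
    sy = sum(y for _, y in valid)
    sxy = sum(x * y for x, y in valid)
    sxx = sum(x * x for x, _ in valid)
    syy = sum(y * y for _, y in valid)

    denom = (n * sxx - sx ** 2) * (n * syy - sy ** 2)
    if n < 2 or denom <= 0:
        # Undefined: too few valid pairs, or one parameter is constant.
        return None, n
    return (n * sxy - sx * sy) / math.sqrt(denom), n


# Made-up readings with gaps, for illustration only.
pressure = [1.0, 2.0, None, 4.0, 5.0]
temperature = [2.0, None, 6.0, 8.0, 10.0]

r, n = pairwise_pearson(pressure, temperature)
print(r, n)
```

Only three of the five observations have both values, so the stored sample size is 3. Keeping the sample size next to each coefficient makes it easier to spot correlations that rest on very few valid pairs.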
Step 4: Matrix Population
The tool constructs a symmetric matrix with:
- Diagonal values = 1.00
- Off-diagonal values = Pearson-R coefficients
- Rows and columns aligned to selected parameter order
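A minimal Python sketch of this step is shown below. It assumes the pairwise coefficients are already available in a dictionary keyed by parameter pair. The function name `build_matrix` is hypothetical, and the values come from Example 1.

```python
def build_matrix(order, pair_r):
    """Build a symmetric correlation matrix in the given parameter order."""
    index = {name: i for i, name in enumerate(order)}
    size = len(order)
    # Diagonal is 1.0; other cells stay None until a coefficient is stored.
    matrix = [
        [1.0 if i == j else None for j in range(size)] for i in range(size)
    ]
    for (a, b), r in pair_r.items():
        i, j = index[a], index[b]
        matrix[i][j] = r  # cell for (a, b)
        matrix[j][i] = r  # mirrored cell for (b, a)
    return matrix


order = ["Temperature", "Pressure", "Humidity"]
pair_r = {
    ("Temperature", "Pressure"): 0.88,
    ("Temperature", "Humidity"): -0.12,
    ("Pressure", "Humidity"): -0.08,
}
matrix = build_matrix(order, pair_r)
for name, row in zip(order, matrix):
    print(name, row)
```

Each coefficient is computed once and written to two cells, which is why only N × (N−1) / 2 calculations are needed for an N × N matrix.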
Step 5: Color Encoding
Color scale:
- Dark blue = strong positive
- Light blue = weak positive
- White/gray = near zero
- Light red = weak negative
- Dark red = strong negative
Color intensity increases with |r|.
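One way to picture this encoding is a diverging scale that fades from white at r = 0 to full blue at r = +1 or full red at r = −1. The Python sketch below is a hypothetical illustration: the RGB values and the linear interpolation are assumptions, and the product's actual palette may differ.

```python
def correlation_to_rgb(r):
    """Map a coefficient in [-1, 1] to an (R, G, B) tuple.
    Assumed palette: white at 0, pure blue at +1, pure red at -1."""
    r = max(-1.0, min(1.0, r))         # clamp to the valid range
    fade = round(255 * (1 - abs(r)))   # smaller value = more intense color
    if r >= 0:
        return (fade, fade, 255)       # white -> blue for positive values
    return (255, fade, fade)           # white -> red for negative values


for value in (0.92, 0.15, 0.0, -0.18, -0.74):
    print(value, correlation_to_rgb(value))
```

Because the fade depends only on |r|, a correlation of −0.74 appears as intense as one of +0.74, just in the opposite hue.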
Step 6: Summary Statistics
For each parameter, the tool computes:
- Count
- Standard deviation
- Minimum and maximum
- (Mean is used internally)
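These statistics can be reproduced for a single parameter with Python's standard library. The sketch below is illustrative only. It assumes missing values are stored as `None` and uses the sample standard deviation (`statistics.stdev`); the article does not state whether the tool uses the sample or population form.

```python
import statistics


def summarize(values):
    """Return count, standard deviation, minimum, and maximum,
    ignoring missing (None) values."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        # Sample standard deviation needs at least two values.
        "std": statistics.stdev(present) if len(present) > 1 else None,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


# Made-up readings, for illustration only.
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0, None]
summary = summarize(readings)
print(summary)
```

A standard deviation close to zero indicates a parameter that barely changes, which often explains weak correlations in the matrix.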
Step 7: Results Presentation
The interface displays:
- The color-coded matrix
- Numeric coefficients
- Summary statistics
- Interactive tooltips
Calculation Performance and Efficiency
Correlation count grows quadratically with parameter count.
Data is held in memory for performance (modern systems handle millions of rows easily).
Interpreting Correlation Strength
| Absolute value of r | Interpretation |
| ≥0.7 | Strong |
| 0.5–0.7 | Moderate - Strong |
| 0.3–0.5 | Moderate |
| 0.1–0.3 | Weak |
| <0.1 | Very weak / None |
With a positive correlation, the two parameters increase together; with a negative correlation, one parameter decreases as the other increases.
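The table can be expressed as a small lookup function. In the Python sketch below, the function name `describe_correlation` is hypothetical, and treating each lower bound as inclusive (for example, exactly 0.5 counts as "Moderate - Strong") is an assumption, since the table does not say how boundary values are assigned.

```python
def describe_correlation(r):
    """Return (strength, direction) for a Pearson-R coefficient."""
    magnitude = abs(r)
    if magnitude >= 0.7:
        strength = "Strong"
    elif magnitude >= 0.5:
        strength = "Moderate - Strong"
    elif magnitude >= 0.3:
        strength = "Moderate"
    elif magnitude >= 0.1:
        strength = "Weak"
    else:
        strength = "Very weak / None"

    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction


# Coefficients taken from the Practical Examples section.
for value in (0.92, -0.74, 0.54, -0.42, -0.18):
    print(value, describe_correlation(value))
```

Strength is based on the absolute value, so −0.74 is classified as strong even though its direction is negative.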
Data Requirements
Parameters must be continuous numeric fields.
Up to 10 parameters can be analyzed at a time.
- Cycles model: Use when parameters come from the same asset type.
- Parts model: Use for part-level parameters across machines.
Date ranges can be absolute or relative.
Carry-forward handling is configurable: Keep All (default), First, or Last.
Understanding the Heatmap Matrix
The matrix is symmetric, with rows and columns representing parameters and each cell showing the pairwise correlation.
Summary Statistics
Each parameter includes:
- Count
- Standard deviation
- Minimum and maximum
These help assess data quality and variability.
Calculation Considerations
Pearson-R:
- Uses only valid pairs
- Ignores missing values
- Is scale-independent
- Captures linear relationships only
- May understate non-linear relationships
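Two of these properties are easy to demonstrate with made-up numbers. The Python sketch below, which is illustrative only, shows that rescaling a parameter leaves the coefficient unchanged, and that a perfectly dependent but non-linear relationship (y = x²) can still produce a coefficient of zero.

```python
import math


def pearson(x, y):
    """Mean-centered Pearson-R for two equal-length lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(
        sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    )
    return num / den


x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]

# Scale independence: changing units does not change r.
linear = [2.0 * v + 1.0 for v in x]
rescaled = [1000.0 * v for v in linear]
r_linear = pearson(x, linear)
r_rescaled = pearson(x, rescaled)

# Non-linear relationship: y depends entirely on x, yet r is zero.
squared = [v * v for v in x]
r_squared = pearson(x, squared)

print(r_linear, r_rescaled, r_squared)
```

A near-zero coefficient therefore does not rule out a relationship. It only indicates that no linear relationship was detected, which is one reason to follow up with a scatter plot or Curve Fit.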
Common Use Cases
- Exploratory analysis
- Selecting parameters for modeling
- Understanding process behavior
- Root cause investigation
- Comparing time periods
Feature Benefits
- Simultaneous Multi-Parameter Exploration: Enables the exploration of relationships among many parameters at the same time, accelerating the discovery of complex or unexpected correlations within the dataset.
- Visual Root Cause Discovery: Helps efficiently pinpoint potential root causes by visually identifying how changes in input fields may influence output fields, supporting early-stage diagnostic analysis.
- Pearson-R Coefficient Matrix: Computes the Pearson-R correlation coefficient for up to 10 selected parameters, quantifying the strength and direction of every pairwise relationship in a clear matrix format.
- Multivariate Model Flexibility: Supports multivariate analysis on Cycle and Part data models, allowing users to tailor the scope of the analysis based on their data structure needs.
- High-Granularity Cycle Analysis: Allows investigation of parameters from the same machine type with high granularity when using the Cycle model, ensuring detailed operational insights.
- Cross-Machine Comparison Capability: Utilizes the Part model to facilitate comparison of fields across different machines, which is essential for understanding interactions in end-to-end production processes.
- Integrated Summary Statistics: Provides an immediate table of summary statistics (Count, Standard Deviation, Min, Max) for each selected parameter, offering essential contextual data alongside the correlation matrix.
- Foundation for Detailed Follow-Up: Acts as an ideal exploratory starting point that logically leads to deeper investigation using other tools, such as the Curve Fit Analysis, for validating identified pairs of interest.
Summary
The Correlation Heatmap tool displays a heatmap of the correlation coefficients between all possible pairs of the parameters you select for the chosen assets or part types. Color shows the direction of each correlation (blue for positive, red for negative), and color intensity shows the magnitude of the relationship.
The value of each correlation is the Pearson-R correlation coefficient displayed in its cell.
This makes the heatmap an efficient way to discover which parameters are correlated with each other.
