Curve Fit Analysis

    Overview

    Curve Fit Analysis is an exploratory tool for investigating the relationship between two parameters. It fits a linear least-squares regression, plots the result on a scatterplot, and reports metrics such as the Pearson-R correlation coefficient and R-squared. It can be run on the Cycle and Part data models, enabling high-granularity analysis within a machine (Cycle) or cross-machine comparison (Part). The results can support predictions and process optimization decisions in manufacturing operations.

    Before You Begin

    Ensure you have:

    • Access to the Application tab
    • Permission to view selected assets or part types
    • Two numeric parameters available (X and Y)
    • At least 30 observations in the selected date range (100+ recommended)

    Create a Curve Fit Analysis

    1. Open Curve Fit Analysis

    • Navigate to the Application tab.
    • Select Curve Fit Analysis.
    • The configuration panel opens.

    2. Select a Data Model

    Select how you want the tool to group data:

    • Cycles – cycle-level parameters from the same asset type
    • Parts – part-level data across machines producing the same part type

    💡 Tip: Use Cycles for equipment behavior and Parts for final product characteristics.

    3. Select Assets

    • Choose an Asset (Cycles) or Part Type (Parts).
    • Select one or more assets of the same type.
    • Confirm selections in the left panel.

    📝 Note: Using multiple similar assets increases data volume and regression stability.

    4. Set the Time Range

    Choose:

    • Relative ranges (Last 7/30/90 days)
    • Absolute ranges (manual dates)

    📝 Note: Larger ranges provide more data but may include process changes.

    5. Select X and Y Parameters

    • Choose the X-axis parameter (independent variable).
    • Choose the Y-axis parameter (dependent variable).

    Both must be numeric fields.

    Example:

    • X: Oven Temperature
    • Y: Cure Time

    6. Add Stratification (Optional)

    To compare different conditions:

    • Select a categorical field (e.g., Shift, Product Type).
    • Each category generates its own regression line.
    • Leave blank for a single-line regression.
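
    To illustrate what "each category generates its own regression line" means, the sketch below groups rows by a categorical field and fits a separate least-squares line to each group. It is a minimal illustration, not the tool's implementation; the function names (fit_line, fit_by_category), the Shift values, and the sample data are all hypothetical.

```python
from collections import defaultdict


def fit_line(points):
    """Return (slope, intercept) of the least-squares line through (x, y) points."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("X values are all identical; slope is undefined")
    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


def fit_by_category(rows):
    """Group (category, x, y) rows by category and fit one line per group."""
    groups = defaultdict(list)
    for category, x, y in rows:
        groups[category].append((x, y))
    return {category: fit_line(points) for category, points in groups.items()}


# Hypothetical rows: (Shift, X, Y). Values are made up for illustration.
rows = [
    ("Day", 1.0, 3.0), ("Day", 2.0, 5.0), ("Day", 3.0, 7.0),
    ("Night", 1.0, 3.0), ("Night", 2.0, 6.0), ("Night", 3.0, 9.0),
]
lines = fit_by_category(rows)
for shift, (slope, intercept) in lines.items():
    print(f"{shift}: y = {slope:.2f}x + {intercept:.2f}")
```

    If the slopes or intercepts differ noticeably between categories, the stratification field may be influencing the relationship.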

    7. Configure Carry-Forwards

    Choose how forward-filled values should be handled:

    • Keep All (default)
    • First
    • Last

    8. Generate Results

    • Select Update.
    • Wait for the scatter plot and regression line to load.
    • Review statistical metrics and visuals.

    How Linear Regression Works

    Least-Squares Method

    Curve Fit Analysis uses linear least-squares regression, which fits the line that minimizes the sum of the squared vertical distances between the data points and the regression line:

    • y = mx + b
    • m = slope
    • b = intercept

    Calculation Formulas

    Slope (m):

    m = (n × Σ(xy) - Σx × Σy) / (n × Σ(x²) - (Σx)²)

    Intercept (b):

    b = (Σy - m × Σx) / n
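
    The two formulas above can be sketched in Python as follows. This is a minimal illustration rather than the tool's actual implementation; the function name fit_line and the sample temperature and cure-time values are hypothetical.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all X values are identical; slope is undefined")
    m = (n * sum_xy - sum_x * sum_y) / denom   # slope formula
    b = (sum_y - m * sum_x) / n                # intercept formula
    return m, b


# Hypothetical sample data: oven temperature (X) and cure time (Y).
temps = [150.0, 160.0, 170.0, 180.0, 190.0]
cure_times = [42.0, 40.0, 38.0, 36.0, 34.0]

m, b = fit_line(temps, cure_times)
predicted = m * 175.0 + b  # estimate Y for a new X using the fitted line
print(f"y = {m:.3f}x + {b:.3f}; predicted Y at X=175: {predicted:.2f}")
```

    Once m and b are known, the same equation can be used to estimate Y for any X inside the range of the observed data.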

    Statistical Metrics Provided

    1. Pearson-R

    Measures linear correlation strength.

    Range: –1.0 to +1.0

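    • |r| ≥ 0.7 strong
    • 0.4 ≤ |r| < 0.7 moderate
    • |r| < 0.4 weak

    The sign gives the direction: positive values mean Y rises with X, negative values mean Y falls as X rises.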

    💡 Tip: Pearson-R only measures linear patterns.
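
    The sketch below computes Pearson-R with the standard textbook formula and shows why the tip matters: a perfectly predictable but curved relationship can still score near zero. The function names and data are hypothetical, and applying the strength bands to the magnitude of r is an assumption made here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strength(r):
    """Classify r using the bands in this article (ASSUMPTION: applied to |r|)."""
    magnitude = abs(r)
    if magnitude >= 0.7:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    return "weak"


# Roughly linear, made-up data: r is close to +1.
linear_x = [1.0, 2.0, 3.0, 4.0, 5.0]
linear_y = [2.1, 3.9, 6.2, 7.8, 10.1]
r_linear = pearson_r(linear_x, linear_y)

# Y depends entirely on X (y = x^2), but the pattern is not a straight line.
curve_x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
curve_y = [x * x for x in curve_x]
r_curve = pearson_r(curve_x, curve_y)

print(strength(r_linear), strength(r_curve))
```

    A weak Pearson-R therefore does not rule out a relationship; check the scatter plot for curved patterns.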

    2. R-squared (R²)

    Indicates the proportion of the variance in Y that is explained by X.

    Range: 0.0–1.0

    • ≥ 0.7 strong predictor
    • 0.4–0.7 moderate
    • < 0.4 weak

    Formula: R² = r², where r is the Pearson-R value (this holds for a straight-line fit with a single X parameter).
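
    As a hedged check of this identity, the sketch below computes R² in two ways for a single-predictor linear fit: from the residuals of the fitted line, and as the square of Pearson-R. The function name and data are hypothetical; the slope is computed with a centered form that is algebraically equivalent to the slope formula in Calculation Formulas.

```python
def r_squared_two_ways(xs, ys):
    """Return (R² from residuals, R² as squared Pearson-R) for a linear fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Least-squares line (centered form of the slope/intercept formulas).
    m = sxy / sxx
    b = mean_y - m * mean_x

    # R² = 1 - (unexplained variation / total variation).
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    from_residuals = 1.0 - ss_res / syy

    # Square of Pearson-R.
    from_pearson = sxy ** 2 / (sxx * syy)
    return from_residuals, from_pearson


# Made-up data for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.5, 5.5, 8.5, 9.0, 12.5]
r2_resid, r2_pearson = r_squared_two_ways(xs, ys)
print(f"{r2_resid:.6f} {r2_pearson:.6f}")
```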

    3. P-value

    Indicates statistical significance.

    p < 0.05 → significant

    p < 0.01 → highly significant

    p ≥ 0.05 → not significant

    ⚠️ Warning: Significance ≠ causation.
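
    This article does not state how the tool computes the p-value. A common choice for simple linear regression is a t-test on the slope with n - 2 degrees of freedom. The sketch below computes that t statistic from r and n, then approximates the two-sided p-value with a normal distribution, because Python's standard library has no Student's t distribution. The approximation is only reasonable for larger samples, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def approx_p_value(r, n):
    """Approximate two-sided p-value for the hypothesis of no linear relationship."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1.0:
        return 0.0  # perfect fit: the test statistic is unbounded
    # t statistic for testing slope = 0 in simple linear regression.
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    # ASSUMPTION: normal approximation instead of Student's t with n - 2 df.
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def significance(p):
    """Label a p-value using the thresholds listed in this article."""
    if p < 0.01:
        return "highly significant"
    if p < 0.05:
        return "significant"
    return "not significant"


p_moderate = approx_p_value(0.5, 100)  # moderate correlation, many points
p_weak = approx_p_value(0.1, 30)       # weak correlation, few points
print(significance(p_moderate), significance(p_weak))
```

    The same r can be significant with many observations and not significant with few, which is one reason the sample-size guidance matters.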

    4. Standard Error

    Measures how far points deviate from the regression line.

    Lower values = better fit.
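
    The article does not say which standard error the tool reports. The description above matches the standard error of the estimate (also called the residual standard error), so the sketch below computes that quantity under this assumption. The function name and data are hypothetical.

```python
import math


def residual_standard_error(xs, ys):
    """Standard error of the estimate: sqrt(sum of squared residuals / (n - 2))."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx             # least-squares slope
    b = mean_y - m * mean_x   # least-squares intercept
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    # Two degrees of freedom are used up by estimating m and b.
    return math.sqrt(ss_res / (n - 2))


xs = [1.0, 2.0, 3.0, 4.0]
se_perfect = residual_standard_error(xs, [3.0, 5.0, 7.0, 9.0])  # points lie on a line
se_noisy = residual_standard_error(xs, [3.2, 4.7, 7.1, 9.0])    # points scatter around it
print(se_perfect, se_noisy)
```

    Under this definition the value is expressed in the units of Y, so it is only comparable between analyses that share the same Y parameter.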

    Data Requirements

    Numeric Fields Only

    • X and Y must be continuous numeric fields.
    • Categorical fields can only be used for stratification.

    Cycles vs. Parts

    • Cycles → machine cycle data
    • Parts → part-level comparisons

    Date Ranges: Use relative or absolute windows.

    💡 Tip: Be cautious of long ranges that include process or equipment changes.

    Interpreting Results

    Scatter Plot: Shows the following:

    • Distribution of data points
    • Regression line alignment
    • Concentration or spread
    • Outliers

    Histograms: Display the distributions of X and Y. Use them to:

    • Identify skew
    • Spot outliers
    • Check data ranges

    Common Use Cases

    1. Predict Outcomes: Use regression equations to estimate results (e.g., predict Cycle Time from Temperature).

    2. Identify Drivers: Understand how inputs relate to outputs.

    3. Validate Expectations: Confirm whether expected relationships hold true.

    4. Compare Conditions: Use stratification to test differences across shifts, operators, products, or machines.

    Limitations & Considerations

    Linearity Requirement: The tool fits a straight line, so it only describes linear relationships. Non-linear patterns require other methods.

    Outliers: Large deviations can distort the regression line. Always inspect visually.
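
    The following sketch shows how much a single outlier can move a least-squares line. It fits the same made-up data with and without one extreme point; the function name and values are hypothetical and only meant to illustrate the effect.

```python
def slope_of(xs, ys):
    """Least-squares slope for paired sequences xs and ys."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


# Eight clean points that lie exactly on y = 2x.
clean_x = [float(i) for i in range(1, 9)]
clean_y = [2.0 * x for x in clean_x]

# The same data plus one extreme reading at the edge of the X range.
outlier_x = clean_x + [9.0]
outlier_y = clean_y + [60.0]

slope_clean = slope_of(clean_x, clean_y)
slope_with_outlier = slope_of(outlier_x, outlier_y)
print(f"clean slope: {slope_clean:.2f}, with outlier: {slope_with_outlier:.2f}")
```

    In this example one point out of nine more than doubles the slope, which is why a visual check of the scatter plot is recommended before trusting the metrics.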

    Causation: Regression describes correlation; it does not prove cause-and-effect.

    Sample Size

    • Minimum: 30 points
    • Recommended: 100+

    ⚠️ Warning: Small datasets can produce misleading slopes or R² values.

    Feature Benefits

    • Detailed Relationship Validation: Allows for in-depth investigation of the relationship between two specific parameters, confirming whether a high correlation value is meaningful or just a data artifact.
    • Linear Regression Computation: Computes and visualizes a linear least-squares regression between the two parameters, providing a precise, quantifiable fit line.
    • Comprehensive Scatterplot Visualization: Provides a clear visual representation via a scatterplot, complete with histograms along each axis to simultaneously show the individual distributions of the two parameters.
    • Stratification Capability: Allows users to select a categorical field for stratification, enabling the breakdown of data points into separate traces to analyze how the relationship differs across various categories (e.g., product type).
    • Validation Metrics (R-squared & Pearson-R): Delivers a table of key regression metrics, including the Pearson-R correlation coefficient and the R-squared value (see note below). The R-squared value provides a simple estimate of how well the independent variable determines the dependent variable (closer to 1.0 indicates a stronger fit).
      NOTE: Statisticians can speak volumes on the finer points of interpreting the r-squared value, but it is basically an estimate of how well the independent variable (the variable on the X axis of your chart) determines the dependent variable (the variable on the Y axis of your chart). Small numbers close to zero indicate a weak relationship. Bigger numbers close to one indicate a stronger relationship.
    • Strategic Exploratory Tool: Most helpful later in the exploratory process; serves as an excellent follow-up tool after identifying key pairs using the Correlation Heatmap or Time-Series Correlation.
    • Model Flexibility (Cycle & Part): Supports multivariate analysis on Cycle and Part data models, enabling high-granularity analysis within a machine (Cycle model) or cross-machine comparison (Part model).
    • Integrated Summary Statistics: Includes a table of summary statistics (Count, Standard Deviation, Min, Max) for both selected parameters, providing immediate contextual data for the analysis.

    Summary

    The Curve Fit tool outputs a scatterplot with a regression line and two histograms, one for each parameter. This enables you to gain a deeper understanding of the relationship between the two parameters.

    For example, if you stratify by machine, the data points for each machine are drawn in a different color, so you can see whether the machine affects the relationship.

    The strength of the correlation is represented by the Pearson-R correlation coefficient, and other useful statistical metrics are also displayed, such as R-squared, p-value, and standard error. For guidance on interpreting R-squared, see the note under Feature Benefits.