Aggregate
    • Dark
      Light

    Aggregate

    • Dark
      Light

    Article summary

    Stateful Operator that aggregates multiple records into one record based on an explicit end of window / bin flag, boundary_field. A function is applied to the values of each record field to determine the value in the output record.

    This operator is a variation of the Reduce Operator. The difference between the Reduce Operator and the Aggregate Operator is the Reduce Operator groups output by changes of the values of a set of primary keys, primary_key_fields, where as the Aggregate Operator groups based on a boolean boundary flag, boundary_flag, becoming true. Additionally, the Aggregate Operator has functionality to emit output a window with retractions before it closes if the emit_window option is set to each_update.

    Input Data

    A single field on the input stream must be of data type boolean for use as a boundary flag to indicate the end of a window / bin.

    Configuration Settings

    The following settings are supported:

    • boundary_field (required): The name of the boolean field indicating the end of a window / bin.

    • emit_window (required): Flag indicating if operator should emit an output each watermark update, or when a window is marked closed on a watermark update. The field must be one of the following values:

      • when_complete: Output window result on boundary field being true and the pipeline state is being synchronized. This option only outputs a record for the window once all the data up to the boundary flag for the window is processed. This is useful if working state is not needed. Complete cycle records vs partial cycle records.

      • each_update: Output window current result on change each pipeline state synchronization. This option outputs partial window states and may issue retractions and updates each state synchronization. This may impact the pipeline performance downstream. This option is useful for things like KPIs where we group the data by date, but want a latest to date update.

    • fields (required): (ONE_OR_MORE) A list of window function transforms and its parameters for the input record to the per window output record. See each of the window function descriptions below for details on their output behavior. Each field configuration consists of at least:

      • function (required): Name of the function to use to calculate the output value in the window, e.g. mean or first. Acceptable functions are listed below.

      • to_field (required): Name of the Output-Record field that will hold the result of the function. Values of to_field must be unique in the output record.

      • *: Each window function can define additional required and optional parameters. See the function descriptions below for details.

    Output Data

    The format of the output record will primarily be decided by the fields-window-function description. The record annotations of the input fields are copied to the output fields.

    A default-aggregation annotation is added to any fields calculated from an aggregation function that has an equivalent function in the data visualization application.

    Examples

    PER-MACHINE CYCLES

    The operator reports the minimum and maximum temperature for each cycle, where cycles are terminated when the cycle_end field is true. The timezone and cycle_end fields are not included in the output. Since cycle_end is not used as the input to a function, it will be added to the configuration when auto-draft is run.

    Input

    Data

    timestamp

    next_timestamp

    timezone

    machine

    temperature

    cycle_end

    2019-01-01 11:00:00

    2019-01-01 11:00:10

    America/Detroit

    can

    72.0

    false

    2019-01-01 11:00:10

    2019-01-01 11:00:20

    America/Detroit

    can

    73.0

    false

    2019-01-01 11:00:20

    2019-01-01 11:00:30

    America/Detroit

    can

    72.0

    false

    2019-01-01 11:00:30

    2019-01-01 11:00:40

    America/Detroit

    can

    73.0

    true

    2019-01-01 13:00:00

    2019-01-01 13:00:10

    America/Detroit

    can

    73.0

    false

    2019-01-01 13:00:10

    2019-01-01 13:00:20

    America/Detroit

    can

    85.0

    false

    2019-01-01 13:00:20

    2019-01-01 13:00:30

    America/Detroit

    can

    93.0

    false

    2019-01-01 13:00:30

    2019-01-01 13:00:40

    America/Detroit

    can

    86.0

    true

    Record Schema

    Field Name

    Field Data Type

    Field Annotations

    original-name

    stream-type

    stat-type

    measurement-scale

    unit-of-measure

    timestamp

    Instant

    next_timestamp

    Instant

    timezone

    ZoneId

    machine

    Text

    temperature

    double

    temperature

    sensor_values

    Continuous

    thermodynamic_temperature

    celsius

    cycle_end

    boolean

    Configuration
    {
      "partition_by": ["machine"],
      "boundary_field": "cycle_end",
      "emit_window": "when_complete",
      "fields": [
        {
          "function": "min",
          "from_field": "temperature",
          "to_field": "min_temperature"
        },
        {
          "function": "max",
          "from_field": "temperature",
          "to_field": "max_temperature"
        },
        {
          "function": "first",
          "from_field": "timestamp",
          "to_field": "start_time"
        },
        {
          "function": "last",
          "from_field": "next_timestamp",
          "to_field": "end_time"
        },
        {
          "function": "ignore",
          "from_field": "timezone",
          "to_field": "timezone"
        }
      ]
    }

    Output

    Data

    timestamp

    min_temperature

    max_temperature

    start_time

    end_time

    2019-01-01 11:00:00

    72.0

    73.0

    2019-01-01 11:00:00

    2019-01-01 11:00:40

    2019-01-01 13:00:00

    73.0

    93.0

    2019-01-01 13:00:00

    2019-01-01 13:00:40

    Record Schema

    Field Name

    Field Data Type

    Field Annotations

    original-name

    stream-type

    stat-type

    measurement-scale

    unit-of-measure

    default-aggregation

    timestamp

    Instant

    min_temperature

    double

    temperature

    sensor_values

    Continuous

    thermodynamic_temperature

    celsius

    min

    max_temperature

    double

    temperature

    sensor_values

    Continuous

    thermodynamic_temperature

    celsius

    max

    start_time

    Instant

    first

    end_time

    Instant

    last

    Aggregate Window Functions

    SUM

    Description

    Sum of non-null values in the window. Set to 0 if no non-null values present.

    Field Configuration Settings

    The following settings are supported:

    • function (required): This must be set to sum.

    • to_field (required): Name of the Output-Record field that will hold the result of the function.

    • from_field (required): Name of the Input-Record field that the function will read for inputs. The data type of the named field must be double.

    FIRST

    Description

    Returns the first value in the window. Ignores null values by default.

    Field Configuration Settings

    The following settings are supported:

    • function (required): This must be set to first.

    • to_field (required): Name of the Output-Record field that will hold the result of the function.

    • from_field (required): Name of the Input-Record field that the function will read for inputs.

    • include_nulls (optional): Boolean setting that causes the function to include nulls when choosing the first value in the window. Defaults to false if not provided.

    LAST

    Description

    Returns the last value in the window. Ignores null values by default.

    Field Configuration Settings

    The following settings are supported:

    • function (required): This must be set to last.

    • to_field (required): Name of the Output-Record field that will hold the result of the function.

    • from_field (required): Name of the Input-Record field that the function will read for inputs.

    • include_nulls (optional): Boolean setting that causes the function to include nulls when choosing the last value in the window. Defaults to false if not provided.

    MIN

    Description

    Minimum value in the window. Nulls sort low.

    Field Configuration Settings

    The following settings are supported:

    • function (required): This must be set to min.

    • to_field (required): Name of the Output-Record field that will hold the result of the function.

    • from_field (required): Name of the Input-Record field that the function will read for inputs.

    MAX

    Description

    Maximum value in the window. Nulls sort low.

    Field Configuration Settings

    The following settings are supported:

    • function (required): This must be set to max.

    • to_field (required): Name of the Output-Record field that will hold the result of the function.

    • from_field (required): Name of the Input-Record field that the function will read for inputs.

    MEAN

    Description

    Mean of non-null values in the window. Set to null if no non-null values present.

    Field Configuration Settings

    The following settings are supported:

    • function (required): This must be set to mean.

    • to_field (required): Name of the Output-Record field that will hold the result of the function.

    • from_field (required): Name of the Input-Record field that the function will read for inputs. The data type of the named field must be double.

    WEIGHTED MEAN

    Description

    Weighted mean of the non-null values in the window. To compute this value, the product of each value and its corresponding weight is summed and then divided by the sum of all the weights. Weights must be 0 or greater, or the function raises a warning marker indicating that a negative weight was used, then treats that weight as 0. If any weight is infinite or if all the weights are 0, the result is null. If one or more values are positive infinity and there are no negative infinity values, the result is positive infinity. The same is true for negative infinities. If there are both positive infinity and negative infinity values in the window (with an associated weight that is positive), the result is null.

    Field Configuration Settings

    The following settings are supported:

    • function (required): This must be set to weighted_mean.

    • to_field (required): Name of the Output-Record field that will hold the result of the function.

    • value_field (required): Name of the Input-Record field that the function will read for value inputs. The data type of the named field must be double.

    • weight_field (required): Name of the Input-Record field that the function will read for weight inputs. The data type of the named field must be double. Weights should be >= 0.

    STANDARD DEVIATION

    Description

    Estimate of population standard deviation from non-null values in the window. Set to null if no non-null values are present, or if one or more values are too large to store.

    This function applies Bessel's correction to the result to account for bias in the sample. As a special case, if there is exactly one non-null input value then the result will be zero instead of infinity.

    The standard deviation function restricts the absolute value of its input to be less than 1012. Any window containing a value larger than 1012 will produce a null result. Additionally, this function cannot store values smaller than 10-15. Such values will be treated as if they were zero instead.

    Field Configuration Settings

    The following settings are supported:

    • function (required): This must be set to standard_deviation.

    • to_field (required): Name of the Output-Record field that will hold the result of the function.

    • from_field (required): Name of the Input-Record field that the function will read for inputs. The data type of the named field must be double.

    UNION

    Description

    Union of values in all the arrays in the window, in ascending order. The result is an array of distinct non-null elements found in all the arrays in the window.

    Field Configuration Settings

    The following settings are supported:

    • function (required): This must be set to union.

    • to_field (required): Name of the Output-Record field that will hold the result of the function.

    • from_field (required): Name of the Input-Record field that the function will read for inputs. This field must be an array type.

    UNIQUE

    Description

    All distinct values in the window in ascending order. This function was previously known as set.

    Field Configuration Settings

    The following settings are supported:

    • function (required): This must be set to unique.

    • to_field (required): Name of the Output-Record field that will hold the result of the function.

    • from_field (required): Name of the Input-Record field that the function will read for inputs.

    IGNORE

    Description

    Produces no output. The from_field specified for this field will be ignored by autodraft.

    Field Configuration Settings

    The following settings are supported:

    • function (required): This must be set to ignore.

    • to_field (required): Unused, since this function does not produce any output, but must be unique. Setting the value equal to from_field is recommended.

    • from_field (required): Name of the field on the input record to ignore.