User Defined Functions (UDF)
Introduction
This tutorial guides you through the process of creating user-defined functions (UDFs). These functions let you implement custom business logic and store it as an API, which can then serve as a building block for an application or as a tool for integrating information with other enterprise systems.
Creating A User Defined Function
Creating from a Template
Step 1: Open the Hamburger Menu in the upper right corner.
Step 2: Select the Dev Panel option from the menu.
Please note, this option is only accessible to users with a developer role or higher.
Step 3: Locate and select the User Defined Notebook API. This is where the list of user-defined functions is housed.
In a fresh environment, no saved user-defined functions are present; the page instead shows instructions for accessing ODBC from within the notebook and for executing it.
Step 4: Download the template file. This file, a Jupyter notebook, includes an example of creating a user-defined function.
Saving a New UDF
This guide explains how to download a Jupyter notebook file, modify it, and upload it as a user-defined function.
Downloading and Accessing the Notebook File
Step 1: Download the notebook file as a template.
Step 2: Open the terminal and start your Jupyter notebook.
Step 3: Navigate to the location of the downloaded file. For this example, the file is in the Downloads folder.
Modifying the Notebook File
Step 1: After opening the template, you will notice three distinct sections.
Section 1: Initialize Parameters
The first section initializes parameters and contains a parameter tag. These can be modified by passing in new values to the API, such as date parameters or different machines. In the example below, a, b, and c are all parameters initialized in the first cell.
Section 2: Business Logic
The next section contains the business logic. This is a simple example; in practice, the logic can span as many cells as necessary.
Section 3: API Response
The final cell returns the output via the API response.
Step 1: Modify the template as needed.
Step 2: Execute the code to see the expected result. In this example, without passing any parameters, the API response should be 2.
Step 3: Save your notebook.
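The three sections above can be sketched as a minimal notebook (cell boundaries shown as comments). The parameter names a, b, c come from the template walkthrough, but the specific values and the addition logic here are hypothetical stand-ins, chosen so that running without parameters yields the response of 2 mentioned in Step 2:

```python
# Cell 1 (tagged "parameters"): default values, overridden by values passed to the API
a = 1
b = 1
c = 0

# Cell 2: business logic -- a hypothetical stand-in; real logic may span many cells
result = a + b + c

# Cell 3: the value produced by the final cell is returned as the API response
response = result
response
```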
Uploading the Modified Notebook File
Step 1: Return to the developer page and choose the saved file from the appropriate location.
Step 2: Provide a name for the UDF. This will be the name of the API that's being called.
Step 3: Provide a description for the function.
Step 4: Select "upload". A confirmation message will appear indicating a successful upload.
The new function is now visible, complete with its description and the time it was last saved, and ready to run!
Running Your UDF
After uploading, an API request can be POSTed to
{environment_name}.sightmachine.io/v1/udf/task/async
with the following body format (parameters are optional):
{
  "name": <notebook name>,
  "parameters": { <param>: <value>, ... }
}
You will receive an acknowledgement of a successful submission along with a task_id. You can then poll for your results at
{environment_name}.sightmachine.io/v1/udf/task/async/{task_id}
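As a sketch, the submit-and-poll flow might look like the following. The environment name is a placeholder, authentication headers are omitted because the scheme is environment-specific, and the shape of the responses (beyond task_id) is an assumption; consult your environment's API documentation for the details:

```python
import json
import urllib.request

# Hypothetical environment name; substitute your own
BASE = "https://example-env.sightmachine.io/v1/udf/task/async"

def build_body(name, parameters=None):
    """Build the POST body described above; parameters are optional."""
    body = {"name": name}
    if parameters:
        body["parameters"] = parameters
    return body

def submit_udf(name, parameters=None):
    """Submit the UDF task; the response acknowledges submission with a task_id."""
    req = urllib.request.Request(
        BASE,
        data=json.dumps(build_body(name, parameters)).encode(),
        # Auth headers omitted: the scheme is environment-specific
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def poll_result(task_id):
    """Poll for results at .../async/{task_id}."""
    with urllib.request.urlopen(f"{BASE}/{task_id}") as resp:
        return json.load(resp)
```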
Example UDF
Data Quality
The following UDF analyzes a given machine_type to find fields which have recently been all null or only a single static non-null value.
Default Parameters
The first executable cell of the notebook, containing the parameters tag, sets the default parameter values for the API.
API_KEY = ""
API_SECRET = ""
machine_type = ""
days = 5
include_last_non_null = False
schema_name = ""
Imports
Import packages required for your UDF.
import sqlalchemy as sqla
from smsdk import client
from IPython.display import JSON
NOTE:
Only some packages are available for User Defined Functions. See the list below for currently available packages. If new packages are needed, please reach out to your customer success representative.
Available Packages
UDFs only support the following packages:
asttokens==2.4.1
attrs==21.4.0
backcall==0.2.0
boto3==1.26.67
botocore==1.29.67
cerberus==1.3.4
certifi==2023.7.22
charset-normalizer==3.0.1
comm==0.2.2
coverage==7.1.0
debugpy==1.5.1
decorator==5.1.1
deepdiff==6.3.0
dill==0.3.8
entrypoints==0.4
exceptiongroup==1.2.2
executing==2.0.1
git+https://github.com/sightmachine/sightmachine-sdk.git@0fa400d0158cbc76b678802590abf9471c71ea0e
google-api-core==1.22.2
google-cloud-core==1.4.1
google-cloud-storage==1.31.2
google-crc32c==1.1.0
google-resumable-media==1.2.0
googleapis-common-protos==1.52.0
google-auth==1.21.3
greenlet==0.4.14
gunicorn==22.0.0
idna==3.7
importlib-resources==5.10.2
iniconfig==2.0.0
ipykernel==6.9.2
ipython==7.32.0
ipython-genutils==0.2.0
ipywidgets==8.1.1
jedi==0.18.2
joblib==1.2.0
jupyter-client==7.1.2
jupyter-core==4.11.2
jupyterlab-widgets==3.0.11
Levenshtein==0.23.0
matplotlib==3.7.1
matplotlib-inline==0.1.3
mock==4.0.3
multiprocess==0.70.16
nest-asyncio==1.5.4
numpy==1.21.6
openai==0.27.2
packaging==23.0
pandas==1.1.5
parso==0.8.3
pathos==0.3.2
pexpect==4.8.0
pickleshare==0.7.5
platformdirs==4.2.2
plotly==5.18.0
pluggy==1.0.0
pox==0.3.4
ppft==1.7.6.8
prompt-toolkit==3.0.28
psutil==5.9.4
psycopg2==2.9.5
ptyprocess==0.7.0
pure-eval==0.2.3
pyOpenSSL==24.1.0
pyparsing==3.1.2
PyYAML==6.0
pyzmq==25.0.0
Pygments==2.11.2
pytest==7.2.1
python-dateutil==2.8.2
pytz==2022.7.1
requests==2.28.2
ruamel.yaml==0.18.6
scikit-learn==1.0.2
scipy==1.9.3
six==1.16.0
SQLAlchemy==1.4.32
stack-data==0.6.3
swifter==1.2.0
tenacity==8.5.0
tiktoken==0.3.2
tomli==2.0.1
tornado==6.1
traitlets==5.1.1
typing-extensions==4.4.0
urllib3==1.26.14
wcwidth==0.2.5
widgetsnbextension==4.0.11
zipp==3.7.0
Validate Parameters
Validate the parameters including machine_type.
tenant = ""
cli = client.Client(tenant)
cli.login('apikey', secret_id=API_SECRET, key_id=API_KEY)
cli.select_db_schema(schema_name=schema_name)
machine_types = cli.get_machine_type_names(clean_strings_out=False)
if machine_type not in machine_types:
    raise RuntimeError(f"machine_type not found: {machine_type}")
Setup ODBC Connection
Connect to the DB using the provided API Key and Secret.
connection_string = f"postgresql+psycopg2://{API_KEY}:{API_SECRET}@pgbouncer:6432/tenant_storage"
engine = sqla.create_engine(connection_string)
Perform Analysis
Run the query and identify columns with null and static fields.
table_name = f"cycle_{machine_type.lower()}"
query = sqla.text(f"""
SELECT column_analysis(:s_name, :t_name, :days) AS column_analysis, count(*) AS total_rows_checked
FROM {engine.dialect.identifier_preparer.quote_schema(schema_name)}.{engine.dialect.identifier_preparer.quote_identifier(table_name)}
WHERE "Cycle End Time" > NOW() - INTERVAL ':days days'
""")
with engine.connect() as conn:
    res = conn.execute(query, s_name=schema_name, t_name=table_name, days=days)
    data = [dict(row) for row in res]
null_fields = []
static_fields = []
column_analysis = data[0]["column_analysis"][0]
for column_name, distinct_count in column_analysis.items():
    if distinct_count == 0:
        null_fields.append(column_name)
    elif distinct_count == 1:
        static_fields.append(column_name)
Format Response
The last executable cell of the notebook contains the response of the UDF.
response = {
    schema_name: {
        machine_type: {
            "days": days,
            "total_rows_checked": data[0]["total_rows_checked"],
            "null_field_count": len(null_fields),
            "null_fields": null_fields,
            "static_field_count": len(static_fields),
            "static_fields": static_fields
        }
    }
}
JSON(response)