Example Transform pipeline

What is this?

This example data pipeline creates a fake DataFrame, runs a basic transformation on it, and saves the result to the Microsoft SQL database (the feature store). It shows how to write, test, and deploy a basic data transformation using FlowEHR, package the transformation code into a Python wheel, and send metrics and logs to Azure Monitor.
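As a rough illustration of the logging side of this, here is a minimal sketch that wraps a pipeline step so its start, end, and duration are logged. It uses only Python's standard `logging` module. The logger name `example_transform` and the helper `timed_step` are hypothetical and are not taken from this repository; the handler that forwards records to Azure Monitor is not shown.

```python
import logging
import time
from contextlib import contextmanager

# Hypothetical logger name; the real pipeline may use a different one.
logger = logging.getLogger("example_transform")


@contextmanager
def timed_step(name, log=logger):
    """Log the start of a pipeline step, then its duration or its failure."""
    start = time.perf_counter()
    log.info("step started", extra={"step": name})
    try:
        yield
    except Exception:
        # Record the traceback, then let the error propagate to the caller.
        log.exception("step failed", extra={"step": name})
        raise
    else:
        elapsed = time.perf_counter() - start
        log.info(
            "step finished",
            extra={"step": name, "duration_s": round(elapsed, 3)},
        )
```

A step would then be wrapped as `with timed_step("transform"): ...`. In a deployed pipeline, a handler that exports to Azure Monitor would be attached to the logger so these records leave the cluster.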

Quick start

Make sure to follow the quick start guide on working with data pipelines in FlowEHR.

Code Structure

These are the files that are useful to explore:

entrypoint.py: Entrypoint for the pipeline; this is where the pipeline starts running.
transform.py: File that defines transformations.
Tests: Tests for the above transformations.
Test configuration: Helper fixture using [] for writing unit tests with PySpark
db.py: Helpers for working with the Microsoft SQL database.
monitoring.py: Helpers for sending logs and metrics to Azure Monitor.
Makefile: Provides command shortcuts. Certain commands must be defined for the pipeline to deploy successfully to Azure.
pyproject.toml: Defines how the Python wheel containing all of the pipeline's code is built.
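To give a feel for what a database helper of this kind might contain, here is a minimal sketch that builds a JDBC URL and connection properties for Microsoft SQL Server. It is not the contents of db.py: the names `SqlServerTarget` and `jdbc_properties` are hypothetical, and writing over JDBC is an assumption about how the result reaches the feature store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SqlServerTarget:
    """Hypothetical connection settings for the feature store database."""

    host: str
    database: str
    port: int = 1433  # default SQL Server port

    def jdbc_url(self) -> str:
        # SQL Server JDBC URLs list properties after host:port,
        # separated by semicolons.
        return (
            f"jdbc:sqlserver://{self.host}:{self.port};"
            f"databaseName={self.database};encrypt=true"
        )


def jdbc_properties(user: str, password: str) -> dict:
    """Build the properties dict expected by a JDBC writer."""
    return {
        "user": user,
        "password": password,
        # Driver class shipped with Microsoft's JDBC driver for SQL Server.
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }
```

With PySpark, the two values could be passed to `df.write.jdbc(url=target.jdbc_url(), table="my_table", mode="append", properties=jdbc_properties(user, password))`, where `my_table` is a placeholder table name.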
