Commit

cli added config support and validate command (#729)
* load config from .ini, .json, .py
* added `hamilton validate`

---------

Co-authored-by: zilto <tjean@DESKTOP-V6JDCS2>
zilto and zilto authored Mar 7, 2024
1 parent 973739c commit 427f523
Showing 7 changed files with 328 additions and 175 deletions.
19 changes: 17 additions & 2 deletions examples/cli/README.md
@@ -13,13 +13,28 @@ Test the installation with

# Features

Currently 4 commands:

## Commands
- `build`: creates a Hamilton `Driver` from the specified modules. It's useful for validating the dataflow definition.
- `validate`: calls `Driver.validate_execution()` for a set of `inputs` and `overrides` passed through the `--context` option.
- `view`: calls `dr.display_all_functions()` on the built `Driver`
- `version`: generates node hashes based on their source code, and a dataflow hash from the collection of node hashes.
- `diff`: gets a diff of added/deleted/edited nodes between the current version of the Python modules and another git reference (`default=HEAD`, i.e., the last committed version). You can get a visualization of the diffs.
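
The `version` and `diff` descriptions above can be illustrated with a minimal sketch. It assumes SHA-256 over each node's source text and a name-sorted join of the node hashes; the helpers `hash_source`, `hash_dataflow`, and `diff_nodes` are hypothetical names for illustration, and Hamilton's own hashing scheme may differ.

```python
import hashlib
from typing import Dict, List, Mapping


def hash_source(source: str) -> str:
    """Hash one node's source code (hypothetical helper; SHA-256 is an assumption)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def hash_dataflow(node_hashes: Mapping[str, str]) -> str:
    """Combine node hashes into one dataflow hash, independent of dict order."""
    joined = "\n".join(f"{name}:{node_hashes[name]}" for name in sorted(node_hashes))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def diff_nodes(current: Mapping[str, str], reference: Mapping[str, str]) -> Dict[str, List[str]]:
    """Classify nodes as added, deleted, or edited relative to a reference version."""
    shared = set(current) & set(reference)
    return {
        "added": sorted(set(current) - set(reference)),
        "deleted": sorted(set(reference) - set(current)),
        "edited": sorted(name for name in shared if current[name] != reference[name]),
    }


# Two toy versions of a dataflow, keyed by node name (made-up source strings).
v1 = {"customers_df": "def customers_df(): ...", "orders_df": "def orders_df(): ..."}
v2 = {"customers_df": "def customers_df(path): ...", "summary": "def summary(): ..."}

v1_hashes = {name: hash_source(src) for name, src in v1.items()}
v2_hashes = {name: hash_source(src) for name, src in v2.items()}

changes = diff_nodes(current=v2_hashes, reference=v1_hashes)
print(changes)
# {'added': ['summary'], 'deleted': ['orders_df'], 'edited': ['customers_df']}
```

Sorting node names before hashing keeps the dataflow hash stable when the modules are listed in a different order.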

## Options
- all commands receive `MODULES`, a list of paths to Python modules to assemble into a single dataflow
- all commands receive `--context` (`-ctx`), a file (`.py` or `.json`) that includes top-level headers (see `config.py` and `config.json` in this repo for examples):
    - `HAMILTON_CONFIG`: `typing.Mapping` passed to `driver.Builder.with_config()`
    - `HAMILTON_FINAL_VARS`: `typing.Sequence` passed to `driver.validate_execution(final_vars=...)`
    - `HAMILTON_INPUTS`: `typing.Mapping` passed to `driver.validate_execution(inputs=...)`
    - `HAMILTON_OVERRIDES`: `typing.Mapping` passed to `driver.validate_execution(overrides=...)`
    - Using a `.py` context file provides more flexibility than `.json` to define inputs and overrides objects.
- all commands receive `--name` (`-n`), which is used to name the output file (when the command produces a file). If `None`, a file name will be derived from the `MODULES` argument.
- When using a command that generates a file, the output location (`-o` in the example below) is interpreted as follows:
    - passing a file path: will output the file with this name at this location
    - passing a directory: will output the file with the `--name` value (either explicit or the default derived from `MODULES`) at this location
    - passing a file path with the name `default`: will output the file with the name replaced by the `--name` value at this location. This is useful when you need to specify a type via the filename. For example, `hamilton view -o /path/to/default.pdf my_dataflow.py` will create the file `/path/to/my_dataflow.pdf`. (This behavior may change.)
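
The three output-path cases above can be sketched as follows. This is an illustration under stated assumptions, not the CLI's actual code: `resolve_output` and `default_name` are hypothetical helpers, a path without an extension is treated as a directory, and the default name is assumed to be the module file stems joined with `_`.

```python
from pathlib import Path
from typing import Optional, Sequence


def default_name(modules: Sequence[str]) -> str:
    """Derive a name from the module paths (assumption: file stems joined by '_')."""
    return "_".join(Path(module).stem for module in modules)


def resolve_output(output: str, modules: Sequence[str], name: Optional[str] = None) -> Path:
    """Return the output path for the three cases described in the README."""
    name = name or default_name(modules)
    path = Path(output)
    if path.suffix == "":
        # Assumption: no extension means a directory; place the named file inside it.
        return path / name
    if path.stem == "default":
        # Keep the extension (the requested file type) and swap in the name.
        return path.with_name(f"{name}{path.suffix}")
    # A regular file path is used as given.
    return path


print(resolve_output("/path/to/default.pdf", ["my_dataflow.py"]))
print(resolve_output("/path/to/graph.png", ["my_dataflow.py"]))
print(resolve_output("/path/to", ["my_dataflow.py"], name="custom"))
```

The first call reproduces the README example: the `default` stem is replaced by `my_dataflow` while the `.pdf` extension is preserved.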


See [DOCS.md](./DOCS.md) for the full reference.

# Usage
6 changes: 6 additions & 0 deletions examples/cli/config.json
@@ -0,0 +1,6 @@
{
    "HAMILTON_CONFIG": {
        "holiday": "halloween"
    },
    "HAMILTON_FINAL_VARS": ["customers_df", "customer_summary_table"]
}
3 changes: 3 additions & 0 deletions examples/cli/config.py
@@ -0,0 +1,3 @@
HAMILTON_CONFIG = dict(config_exists="true")

HAMILTON_FINAL_VARS = ["config_when", "customer_summary_table"]
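
Both context files above expose the same top-level `HAMILTON_*` names, so a loader can treat them uniformly. The following is a minimal sketch of that idea, not the CLI's implementation: `load_context` is a hypothetical helper, it relies on the standard library's `json` and `runpy`, and dropping absent keys (rather than defaulting them) is an assumption.

```python
import json
import runpy
import tempfile
from pathlib import Path
from typing import Any, Dict

CONTEXT_KEYS = (
    "HAMILTON_CONFIG",
    "HAMILTON_FINAL_VARS",
    "HAMILTON_INPUTS",
    "HAMILTON_OVERRIDES",
)


def load_context(path: str) -> Dict[str, Any]:
    """Read the HAMILTON_* headers from a .json or .py context file."""
    file = Path(path)
    if file.suffix == ".json":
        raw = json.loads(file.read_text())
    elif file.suffix == ".py":
        # Execute the file and collect its top-level variables.
        raw = runpy.run_path(str(file))
    else:
        raise ValueError(f"Unsupported context file type: {file.suffix!r}")
    # Keep only the recognized headers; absent keys are simply omitted.
    return {key: raw[key] for key in CONTEXT_KEYS if key in raw}


# Recreate the two example files in a temporary directory and load them.
with tempfile.TemporaryDirectory() as tmp:
    json_file = Path(tmp) / "config.json"
    json_file.write_text(
        json.dumps(
            {
                "HAMILTON_CONFIG": {"holiday": "halloween"},
                "HAMILTON_FINAL_VARS": ["customers_df", "customer_summary_table"],
            }
        )
    )
    py_file = Path(tmp) / "config.py"
    py_file.write_text(
        'HAMILTON_CONFIG = dict(config_exists="true")\n'
        'HAMILTON_FINAL_VARS = ["config_when", "customer_summary_table"]\n'
    )
    json_context = load_context(str(json_file))
    py_context = load_context(str(py_file))

print(json_context)
print(py_context)
```

Because a `.py` file is executed, it can build arbitrary Python objects for inputs and overrides, which a `.json` file cannot express.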
11 changes: 9 additions & 2 deletions examples/cli/module_v1.py
@@ -1,9 +1,16 @@
 import pandas as pd

-from hamilton.function_modifiers import extract_columns
+from hamilton.function_modifiers import config, extract_columns


-def customers_df(customers_path: str = "customers.csv") -> pd.DataFrame:
+@config.when(holiday="halloween")
+def customers_df__halloween() -> pd.DataFrame:
+    """Example of using @config.when function modifier"""
+    return pd.read_csv("/path/to/halloween/customers.csv")
+
+
+@config.when_not(holiday="halloween")
+def customers_df__default(customers_path: str = "customers.csv") -> pd.DataFrame:
     """Load the customer dataset."""
     return pd.read_csv(customers_path)
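
To show how the two `customers_df__*` variants in this hunk collapse into a single `customers_df` node depending on the config, here is a toy resolver. It is not Hamilton's implementation: `when`, `when_not`, and `resolve` are simplified stand-ins written for this illustration, and the functions return file paths instead of DataFrames so the sketch runs without pandas.

```python
from typing import Any, Callable, Dict, Iterable, Mapping


def when(**conditions: Any) -> Callable:
    """Toy stand-in for @config.when: keep the function if every key matches."""
    def decorator(fn: Callable) -> Callable:
        fn.applies = lambda cfg: all(cfg.get(k) == v for k, v in conditions.items())
        return fn
    return decorator


def when_not(**conditions: Any) -> Callable:
    """Toy stand-in for @config.when_not: keep the function if no key matches."""
    def decorator(fn: Callable) -> Callable:
        fn.applies = lambda cfg: all(cfg.get(k) != v for k, v in conditions.items())
        return fn
    return decorator


def resolve(functions: Iterable[Callable], cfg: Mapping[str, Any]) -> Dict[str, Callable]:
    """Pick the variants that apply and name each node by the text before '__'."""
    nodes: Dict[str, Callable] = {}
    for fn in functions:
        if fn.applies(cfg):
            nodes[fn.__name__.split("__")[0]] = fn
    return nodes


@when(holiday="halloween")
def customers_df__halloween() -> str:
    return "/path/to/halloween/customers.csv"


@when_not(holiday="halloween")
def customers_df__default(customers_path: str = "customers.csv") -> str:
    return customers_path


candidates = [customers_df__halloween, customers_df__default]
halloween = resolve(candidates, {"holiday": "halloween"})
regular = resolve(candidates, {})

print(halloween["customers_df"]())  # /path/to/halloween/customers.csv
print(regular["customers_df"]())    # customers.csv
```

With `{"holiday": "halloween"}`, as in `config.json`, only the Halloween variant is selected; with any other config, the default variant is.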
