Add documentation for Settings and Constants management #1521

Open
wants to merge 4 commits into
base: main

elronbandel
Member

Closes: #1517

.. _settings:

=====================================
Library Settings and Constants
Member

Will any user need to access constants? Judging from the list, they seem internal only.
These could be part of the code documentation. Putting them here adds complexity in the explanation for most users.

- Simplify debugging and testing.
- Enable dynamic configuration using environment variables or runtime contexts.
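As a rough illustration of the environment-variable side of this, the sketch below shows how a ``UNITXT_``-prefixed variable could be mapped to a typed setting. ``read_setting`` is a hypothetical helper written for this example, not the library's API, and the set of strings it treats as true is an assumption; only the variable names and defaults come from the table in this PR.

```python
import os


def read_setting(name, default, cast, prefix="UNITXT_"):
    """Read `<prefix><NAME>` from the environment and cast it to the expected type.

    Hypothetical helper for illustration; not the library's API.
    """
    raw = os.environ.get(prefix + name.upper())
    if raw is None:
        return default  # variable not set: fall back to the documented default
    if cast is bool:
        # bool("False") is True in Python, so parse the text explicitly.
        return raw.strip().lower() in ("1", "true", "yes")
    return cast(raw)


# Simulate a user exporting variables before starting the program.
os.environ["UNITXT_LOADER_CACHE_SIZE"] = "4"
os.environ["UNITXT_ALLOW_UNVERIFIED_CODE"] = "True"
os.environ.pop("UNITXT_USE_EAGER_EXECUTION", None)

cache_size = read_setting("loader_cache_size", 1, int)
allow_unverified = read_setting("allow_unverified_code", False, bool)
eager = read_setting("use_eager_execution", False, bool)
```

Environment values are always strings, so the explicit cast (and the special case for booleans) is the part most worth documenting for users.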

Adding New Settings
Member

This is not for users, only for contributors. I don't think we should document it in this tutorial, only in the code (or, at most, as a last step: "for developers").

- Use a clear and descriptive name for the setting.
- Always specify the type as one of `int`, `float`, or `bool`.

Adding New Constants
Member

Even more so here: constants are not meant to be added by users.

Using Settings Context
======================

The :class:`Settings <settings_utils.Settings>` class provides a `context` manager to temporarily override settings within a specific block of code. After exiting the block, the settings revert to their original values.
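The override-then-restore behavior described above can be sketched with a small stand-in class. ``SketchSettings`` is hypothetical and exists only for this example; it is not the library's ``Settings`` implementation, and the setting names are borrowed from the table in this PR.

```python
from contextlib import contextmanager


class SketchSettings:
    """Hypothetical stand-in for the Settings class; illustration only."""

    def __init__(self, **values):
        # Store each setting as a plain instance attribute.
        self.__dict__.update(values)

    @contextmanager
    def context(self, **overrides):
        saved = dict(self.__dict__)  # snapshot of the current values
        self.__dict__.update(overrides)
        try:
            yield self
        finally:
            # Restore the snapshot even if the block raised an exception.
            self.__dict__.clear()
            self.__dict__.update(saved)


settings = SketchSettings(allow_unverified_code=False, loader_cache_size=1)

with settings.context(allow_unverified_code=True):
    inside = settings.allow_unverified_code  # overridden inside the block

after = settings.allow_unverified_code  # original value is back
```

Restoring in a ``finally`` clause is what makes the override safe to use around code that may fail.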
Member

This is important.

- bool
- False
- UNITXT_ALLOW_UNVERIFIED_CODE
- Enables or disables execution of unverified code.
Member

Suggested change
- Enables or disables execution of unverified code.
- Enables or disables execution of unverified code. Unverified code includes executable code from HF datasets and calls to ExecuteExpressions or other operators that run user code. This ensures only trusted code is executed.

- bool
- False
- UNITXT_USE_ONLY_LOCAL_CATALOGS
- Restricts operations to use only local catalogs.
Member

Suggested change
- Restricts operations to use only local catalogs.
- Restricts loading of artifacts to only use local catalogs on local filesystems (and not remote GitHub repos).

- int
- None
- UNITXT_GLOBAL_LOADER_LIMIT
- Sets a limit on the number of global data loaders.
Member

Is it the default value for "loader_limit"?

- None
- None
- UNITXT_CATALOGS
- Specifies the catalogs configuration.
Member

Not clear.

- None
- None
- UNITXT_ARTIFACTORIES
- Defines the artifact storage configuration.
Member

Also not clear.

- str
- "dataset_recipe"
- UNITXT_DEFAULT_RECIPE
- Specifies the default recipe for datasets.
Member

Is it needed? What can it be set to?

- bool
- False
- UNITXT_USE_EAGER_EXECUTION
- Enables eager execution for tasks.
Member

Describe what it is.

- list
- []
- UNITXT_REMOTE_METRICS
- Defines a list of configurations for remote metrics.
Member

This is not really checked. Should we keep it?

- bool
- False
- UNITXT_TEST_CARD_DISABLE
- Disables the use of test cards when enabled.
Member

Why use it?

- bool
- False
- UNITXT_TEST_METRIC_DISABLE
- Disables the use of test metrics when enabled.
Member

Why use it?

- bool
- False
- UNITXT_SKIP_ARTIFACTS_PREPARE_AND_VERIFY
- Skips preparation and verification of artifacts.
Member

Why use it?

- bool
- True
- UNITXT_DISABLE_HF_DATASETS_CACHE
- Disables caching for Hugging Face datasets.
Member

This is an important one. We need to describe the behavior, why caching is disabled by default, and what changing it means.

- int
- 1
- UNITXT_LOADER_CACHE_SIZE
- Sets the cache size for data loaders.
Member

When is it used?

- bool
- True
- UNITXT_TASK_DATA_AS_TEXT
- Enables representation of task data as plain text.
Member

Why set it?

- None
- None
- UNITXT_DEFAULT_FORMAT
- Defines the default format for data processing.
Member

This is important.

- str
- "watsonx"
- UNITXT_DEFAULT_PROVIDER
- Specifies the default provider for tasks.
Member

Suggested change
- Specifies the default provider for tasks.
- Defines the default provider used by CrossProviderInferenceEngine. Used to change the platform (OpenAI, HF, Watson) used for inference calls and LLM as Judges without changing code.

Successfully merging this pull request may close these issues.

Document all env variables
2 participants