-
Notifications
You must be signed in to change notification settings - Fork 10
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #37 from boozallen/5-migrate-documentation
#5 📝 Tranche 7 of documentation migration
- Loading branch information
Showing
12 changed files
with
990 additions
and
23 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
61 changes: 61 additions & 0 deletions
61
docs/modules/ROOT/pages/guides/guides-configuration-store.adoc
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
= Leveraging the Configuration Store | ||
|
||
== Overview | ||
The Configuration Store is a tool that enables the various configurations for a project to be centrally defined and | ||
managed, while also standardizing access to the configurations. The Configuration Store dynamically provides | ||
environment-specific configurations, based on the project's development lifecycle phase. Usage limitations and | ||
regeneration strategies can also be set, allowing configurations to automatically refreshed, thereby bolstering | ||
security of sensitive properties. | ||
|
||
=== Setup | ||
The Configuration Store tool is currently under development. Stay tuned for its release! | ||
|
||
=== Usage | ||
Via the configuration store's helm chart, project teams can specify environment variables the URIs that house the project's | ||
various configurations. It is required to specify a base URI that houses the base/default configurations. Optionally, | ||
one can specify a secondary URI that houses environment-specific configurations that will override and/or augment the | ||
base configurations. Multiple configuration files can be stored at the URI. Configuration files are expected to be | ||
in `YAML` format. Further guidance is covered below. | ||
|
||
Because it is common practice to define a separate Helm chart for each of a project's development lifecycle phase's | ||
environment, it is encouraged to define one shared base URI and respective environment-specific URIs, each housing | ||
the relevant overrides and augmentations. | ||
|
||
The following example Configuration Store Helm chart demonstrates a URI specifications for a CI deployment: | ||
[source,yaml] | ||
---- | ||
env: | ||
baseURI: <URI housing base configurations> | ||
envURI: <URI housing CI-specific overrides/augmentations> | ||
---- | ||
|
||
Suppose we defined the following config at the `baseURI`: | ||
[source,yaml] | ||
---- | ||
groupName: exampleGroup | ||
properties: | ||
- name: connector | ||
value: smallrye-kafka | ||
- name: topic | ||
value: baseTopic | ||
---- | ||
|
||
Next, suppose we defined the following config at the `envURI`: | ||
[source,yaml] | ||
---- | ||
groupName: messaging | ||
properties: | ||
- name: topic | ||
value: ciTopic | ||
- name: newProperty | ||
value: newValue | ||
---- | ||
|
||
Then the following calls to the tool would provide the following configurations: | ||
[source,java] | ||
---- | ||
ConfigServiceClient client = new ConfigServiceClient(); | ||
client.getProperty("messaging", "connector") //smallrye-kafka | ||
client.getProperty("messaging", "topic") //ciTopic | ||
client.getProperty("messaging", "newProperty") //newValue | ||
---- |
250 changes: 250 additions & 0 deletions
250
docs/modules/ROOT/pages/guides/guides-efficient-debugging.adoc
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,250 @@ | ||
= Leveraging Unit Testing and Live Updates: Recommendations for Efficient Development | ||
|
||
== Introduction | ||
During the development process, it's essential to employ effective strategies for testing and debugging to ensure the | ||
quality and reliability of software. This page will outline how and when to leverage unit testing and local live | ||
code deployment for efficient development and debugging while ensuring code longevity. | ||
|
||
=== Unit Test | ||
|
||
*What is a Unit Test* | ||
|
||
Unit testing involves writing and running tests to verify the correctness of individual components or units of code. | ||
It focuses on isolating and testing specific functions, methods, or classes independently of the larger system. Unit | ||
testing makes development more efficient by identifying issues early, promoting modular and reusable code, enabling | ||
faster refactoring, and facilitating collaboration among developers and teams. Therefore, it is advisable to | ||
incorporate unit tests whenever possible in the development process. | ||
|
||
More information on unit testing within aiSSEMBLE(TM) can be found in the | ||
xref:testing.adoc#_unit_testing_the_pipeline[Testing the | ||
Project page of the Getting Started guide] | ||
|
||
|
||
*Benefits of Unit Testing* | ||
|
||
Unit testing offers several benefits that enhance the development process and code quality. Here, we introduce some | ||
commonly seen examples of using unit testing to improve debugging efficiency. | ||
|
||
1. Critical functionality and edge cases: unit tests help verify that the code behaves as expected, especially | ||
important in edge cases or corner scenarios. | ||
2. Refactoring: During refactoring, unit tests act as a safety net, ensuring that existing functionality is not | ||
unintentionally altered or broken. | ||
3. Modular components: When developing modular components or libraries that will be reused across different projects, | ||
unit tests provide confidence in their functionality and compatibility. | ||
|
||
Please note that while unit testing should be regarded as a fundamental practice in software development and employed | ||
whenever possible, it can be enhanced by complementary techniques like live update. Live update provides rapid | ||
feedback during development, facilitating faster iterations and enabling immediate visual verification of code changes. | ||
Developers should use unit tests and live update when possible to enhance code quality and efficiency | ||
|
||
*Unit Testing Examples* | ||
|
||
1. Data access and processing: Unit testing is particularly useful in scenarios such as data access and processing | ||
within Python pipelines. For example, consider a data processing module that handles scaling and normalization of | ||
input features. Unit testing for this module would involve writing tests specifically targeting the scaling and | ||
normalization functions or classes. | ||
* Example: To ensure the functional correctness of PySpark file ingestion for structured data, we can utilize unit | ||
tests which read a CSV file into a file store and verify that the ingestion process was successful. | ||
** Generate a `data_ingestion.feature` file within the `pipeline_name/pyspark_file_pipeline_name/tests/features` | ||
directory, with a path that's tailored to the specific pipeline name you've created, for creating a scenario | ||
** Create a file named `data_ingest_steps.py` in the `pipeline_name/pyspark_file_pipeline_name/tests/features/steps | ||
directory`( the path should be tailored to pipeline you have created). Within this file, utilize the Ingest class to | ||
import the CSV file titled "data to ingest" into the file store. Then the unit test runs to make sure that the data | ||
file is stored in the file store. | ||
**** | ||
`data_ingest.feature` | ||
[source] | ||
---- | ||
@data_ingest | ||
Feature: Read a CSV file into a filestore | ||
Scenario: Read a CSV file into files tore | ||
Given Ingest csv file exist | ||
Then the csv file data is read into the file store | ||
---- | ||
`data_ingest.py` | ||
[source,python] | ||
---- | ||
import os | ||
from behave import * | ||
import nose.tools as nt | ||
from src.pyspark_file_data_ingest.step.ingest import Ingest | ||
@given("Ingest csv file exist") | ||
def step_impl(context): | ||
return | ||
@then("the csv file data is read into the file store") | ||
def step_impl(context): | ||
container_name = "movie-data" | ||
data_folder = os.getcwd() | ||
file_name = "data-to-ingest.csv" | ||
file_path = data_folder + "/" + file_name | ||
context.ingest = Ingest() | ||
context.file_store = context.ingest.file_stores["LocalTest"] | ||
if not os.path.exists(container_name): | ||
os.makedirs(container_name) | ||
context.container = context.file_store.get_container(container_name=container_name) | ||
context.file_store.upload_object(file_path, context.container, file_name) | ||
context.ingest.execute_step() | ||
dataframe = context.test_spark_session.sql( | ||
""" | ||
SELECT _c0 as title, _c1 as year, _c2 as certificate, _c3 as duration, _c4 as genre, _c5 as rating, | ||
_c6 as description, _c7 as stars, _c8 as votes from netflix_movies""" | ||
) | ||
dataframe.collect() | ||
nt.ok_(dataframe) | ||
---- | ||
**** | ||
2. API alls and connection | ||
|
||
If your Python pipeline interacts with external APIs to fetch data or make predictions, unit tests can be employed to | ||
validate the API integration. This includes testing the request and response handling, verifying the correctness of | ||
data mapping, or parsing, and ensuring the pipeline behaves as expected when interacting with different API endpoints | ||
or handling different response scenarios. It's important to note that to perform unit tests rather than integration | ||
tests, mocks should be used. Mocking allows you to simulate the behavior of external API endpoints without making real | ||
network requests. | ||
|
||
* Example: In an aiSSEMBLE project, if you are interested in testing the connection of a Python pipeline to a | ||
data store using the `create_engine` function, you can employ unit tests that use mocks to verify the successful | ||
establishment of this connection. | ||
** Create a scenario file named `create_engine_test.feature` in the | ||
`pipeline_name/pipeline_name-pipelines/example-data-delivery-py-spark-pipeline/tests/features` directory. The path | ||
should be tailored to the pipeline you've created. | ||
** Create a python file named `create_engine_test_steps.py` in the | ||
`pipeline_name/pipeline_name-pipelines/example-data-delivery-py-spark-pipeline/tests/features/steps` directory. The | ||
path should be tailored to the pipeline you've created. The unit test uses mock to simulate the return value when | ||
the user calls `create_engine` function. | ||
|
||
**** | ||
`create_engine_test.feature` | ||
[source,python] | ||
---- | ||
@create_engine_test | ||
Feature: Placeholder test | ||
Scenario: python pipelines can be connected to data store via create_engine function | ||
Given Pyspark pipeline exists | ||
Then User can connect to data store via create_engine function | ||
---- | ||
`create_engine_test_steps.py` | ||
[source,python] | ||
---- | ||
from unittest.mock import patch | ||
import sqlalchemy | ||
from sqlalchemy.pool import QueuePool | ||
import nose.tools as nt | ||
@given("Pyspark pipeline exists") | ||
def step_impl(context): | ||
return | ||
@then("User can connect to data store via create_engine function") | ||
@patch("sqlalchemy.create_engine") | ||
def step_impl(context, mock_create_engine): | ||
mock_create_engine.return_value = { | ||
"url": "postgresql://username:***@host:1001/database" | ||
} | ||
sqlalchemy.create_engine( | ||
"postgresql://username:password@host:1001/database", | ||
poolclass=QueuePool, | ||
pool_size=5, | ||
) | ||
expected_url = "postgresql://username:***@host:1001/database" | ||
nt.eq_(mock_create_engine.return_value["url"], expected_url) | ||
---- | ||
**** | ||
|
||
=== Live Updates | ||
|
||
*What are Live Updates* | ||
|
||
Live updates, facilitated by tools like Tilt, allow developers to make changes to the code and see the results | ||
immediately without the need for a full rebuild or redeployment. | ||
|
||
*Benefits of Live Updates* | ||
|
||
1. Rapid prototyping: When rapidly iterating on a feature or exploring different approaches, live updates enable quick | ||
feedback by instantly reflecting code changes in a running application. | ||
2. Debugging and small code changes: Live updates are effective for debugging scenarios where developers need to | ||
quickly iterate on small code changes and observe the impact in real-time. | ||
|
||
*Example of How to Implement Live Updates and How They are Used* | ||
|
||
An example of live update is the automatic updating of the inference code in the local deployment, making testing | ||
easier during the development process. The code in this example is generated as a manual action blob during the | ||
project build to enable live updates. This code automates several tasks involved in the development and deployment | ||
process of a machine learning component for an AI system. It enables developers to make changes to the code, sync | ||
those changes with the running Docker container, and observe the results immediately using the live update feature. | ||
|
||
[source] | ||
---- | ||
# Add deployment resources here | ||
load('ext://restart_process', docker_build_with_restart') | ||
# quick-inference-compiler | ||
local_resource( | ||
name='compile-quick-inference', | ||
cmd='cd project-name-pipelines/aissemble-machine-learning-inference/quick-inference && poetry run behave tests/features && poetry build && cd ../../.. && \ | ||
cp -r project-name-pipelines/aissemble-machine-learning-inference/quick-inference/dist project-name-docker/project-name-quick-inference-docker/target/quick-inference', | ||
deps=['project-name-pipelines/aissemble-machine-learning-inference/quick-inference'], | ||
auto_init=False, | ||
ignore=['**/dist/'] | ||
) | ||
sync_properties = sync( | ||
local_path='project-name-docker/project-name-quick-inference-docker/target/quick-inference/dist', | ||
remote_path='/modules/quick-inference' | ||
) | ||
# project-name-quick-inference-docker | ||
docker_build_with_restart( | ||
ref='boozallen/project-name-quick-inference-docker', | ||
context='project-name-docker/project-name-quick-inference-docker', | ||
live_update=[sync_properties, | ||
run('cd /modules/quick-inference; for x in *.whl; do pip install $x --no-cache-dir --no-deps --force-reinstall; done') | ||
], | ||
entrypoint='python -m quick_inference.inference_api_driver "fastAPI" & python -m quick_inference.inference_api_driver "grpc"', | ||
build_args=build_args, | ||
dockerfile='project-name-docker/project-name-quick-inference-docker/src/main/resources/docker/Dockerfile' | ||
) | ||
---- | ||
|
||
*Code Explanation* | ||
|
||
The code loads a module called `restart_process` and a function called `docker_build_with_restart`. It then defines | ||
a local resource named `compile-quick-inference` with specific commands and dependencies. A synchronization property | ||
is created to sync a local path with a remote path. Finally, the code builds a docker image with live update | ||
capabilities using the provided parameters, including the reference, context, synchronization properties, entrypoint, | ||
build arguments, and Dockerfile location. | ||
|
||
* `load('ext://restart_process', 'docker_build_with_restart')`: Loads the external extension called | ||
`restart_process`, specifically the `docker_build_with_restart` function, which is referenced later in the code and | ||
enables the live update functionality for the Docker container. | ||
* `local_resource( name='compile-quick-inference', cmd='cd project-name-pipelines/aissemble-machine-learning-inference/...)`: | ||
Defines a local resource named `compile-quick-inference` with a set of commands to be executed locally. It builds and | ||
tests a module called `quick inference` and copies the resulting `dist` directory to a specific location | ||
* `sync( | ||
local_path='project-name-docker/project-name-quick-inference-docker/target/quick-inference/dist', | ||
remote_path='/modules/quick-inference')`: This specifies the locations that need to be synchronized. It ensures that | ||
the `dist` directory from the previous step is kept in sync with a specific directory on the remote target. | ||
* `docker_build_with_restart( | ||
ref='boozallen/project-name-quick-inference-docker', | ||
context='project-name-docker/project-name-quick-inference-docker',...)`: This section is referenced earlier in the | ||
code in `load('ext://restart_process', 'docker_build_with_restart')`. The configuration includes the image reference, | ||
file location, and additional options and defines the setup of a Docker container with live update functionality. | ||
|
||
*How Live Updates Enable Debugging* | ||
|
||
Live update functionality can be used to facilitate debugging inference steps within aiSSEMBLE projects. Here's a | ||
step-by-step guide on how the live update can help you quickly visualize changes when modifying an endpoint response | ||
in this case: | ||
|
||
1. Open the file you'd like to modify within your pipeline step, such as `inference/rest/inference_api_rest.py` | ||
which defines the REST API logic, and locate the endpoint you wish to modify. | ||
2. Modify the return statement of the endpoint to a different response. | ||
** In the case of the `/healthcheck` endpoint, you can change the return statement to a custom message. | ||
3. Save the changes. | ||
** Now, when you trigger the curl command using: | ||
`curl --location 'http://0.0.0.0:7080/healthcheck' --header 'Content-Type: application/json'`. | ||
The response you will receive depends on the modifications made to the `/healthcheck` endpoint. By default, the | ||
endpoint returns the string `Inference service for `"InferencePipeline is running"`. If you modify the return | ||
statement in the script, for example, to change the response message to `"Health check passed!"`, the curl command | ||
will return the updated response to `"Health check passed!"` | ||
|
||
By following this step-by-step guide and utilizing the live update feature to modify the endpoint response, you can | ||
quickly visualize the changes and significantly improve your debugging efficiency. |
Oops, something went wrong.