-
Notifications
You must be signed in to change notification settings - Fork 4.3k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #33630 Expand documentation about YAML providers.
- Loading branch information
Showing
3 changed files
with
110 additions
and
68 deletions.
There are no files selected for viewing
109 changes: 109 additions & 0 deletions
109
website/www/site/content/en/documentation/sdks/yaml-providers.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,109 @@ | ||
--- | ||
type: languages | ||
title: "Apache Beam YAML Providers" | ||
--- | ||
<!-- | ||
Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
--> | ||
|
||
# Providers | ||
|
||
Though we aim to offer a large suite of built-in transforms, it is inevitable | ||
that people will need to author their own. This is made possible | ||
through the notion of Providers which leverage expansion services and | ||
vend catalogues of schema transforms. | ||
|
||
## Java | ||
|
||
For example, you could build a jar that vends a | ||
[cross language transform](https://beam.apache.org/documentation/sdks/python-multi-language-pipelines/) | ||
or [schema transform](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.html) | ||
and then use it in a transform as follows | ||
|
||
``` | ||
pipeline: | ||
type: chain | ||
source: | ||
type: ReadFromCsv | ||
config: | ||
path: /path/to/input*.csv | ||
transforms: | ||
- type: MyCustomTransform | ||
config: | ||
arg: whatever | ||
sink: | ||
type: WriteToJson | ||
config: | ||
path: /path/to/output.json | ||
providers: | ||
- type: javaJar | ||
config: | ||
jar: /path/or/url/to/myExpansionService.jar | ||
transforms: | ||
MyCustomTransform: "urn:registered:in:expansion:service" | ||
``` | ||
|
||
A full example of how to build a java provider can be found | ||
[here](https://github.com/apache/beam-starter-java-provider). | ||
|
||
## Python | ||
|
||
Arbitrary Python transforms can be provided as well, using the syntax | ||
|
||
``` | ||
providers: | ||
- type: pythonPackage | ||
config: | ||
packages: | ||
- my_pypi_package>=version | ||
- /path/to/local/package.zip | ||
transforms: | ||
MyCustomTransform: "pkg.module.PTransformClassOrCallable" | ||
``` | ||
|
||
We offer a [python provider starter project](https://github.com/apache/beam-starter-python-provider) | ||
that serves as a complete example for how to do this. | ||
|
||
## YAML Provider listing files | ||
|
||
One can reference an external listings of providers in the yaml pipeline file | ||
via the syntax | ||
|
||
``` | ||
providers: | ||
- include: "file:///path/to/local/providers.yaml" | ||
- include: "gs://path/to/remote/providers.yaml" | ||
- include: "https://example.com/hosted/providers.yaml" | ||
... | ||
``` | ||
|
||
where `providers.yaml` is simply a yaml file containing a list of providers | ||
in the same format as those inlined in this providers block. | ||
See, for example, the provider listing [here]( | ||
https://github.com/apache/beam-starter-python-provider/blob/main/examples/provider_listing.yaml). | ||
|
||
In fact, this is how many of the the built in transforms are declared, | ||
see for example the [builtin io listing file]( | ||
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/yaml/standard_io.yaml). | ||
|
||
Hosting these listing files (together with their required artifacts) allows | ||
one to easily share catalogues of transforms that can be directly used | ||
by others in their YAML pipelines. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters