Releases: apache/beam
Beam 2.62.0 release
We are happy to present the new 2.62.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.62.0, check out the detailed release notes.
New Features / Improvements
- Added support for stateful processing in Spark Runner for streaming pipelines. Timer functionality is not yet supported and will be implemented in a future release (#33237).
- The datetime module is now available for use in jinja templatization for yaml.
- Improved batch performance of SparkRunner's GroupByKey (#20943).
- Support OnWindowExpiration in Prism (#32211).
- This enables initial Java GroupIntoBatches support.
- Support OrderedListState in Prism (#32929).
I/Os
- gcs-connector config options can be set via GcsOptions (Java) (#32769).
- [Managed Iceberg] Support partitioning by time (year, month, day, hour) for types date, time, timestamp, and timestamp(tz) (#32939)
- Upgraded the default version of Hadoop dependencies to 3.4.1. Hadoop 2.10.2 is still supported (Java) (#33011).
- [BigQueryIO] Create managed BigLake tables dynamically (#33125)
Breaking Changes
- Upgraded ZetaSQL to 2024.11.1 (#32902). Java11+ is now needed if Beam's ZetaSQL component is used.
Bugfixes
- Fixed EventTimeTimer ordering in Prism. (#32222).
- [Managed Iceberg] Fixed a bug where DataFile metadata was assigned incorrect partition values (#33549).
Security Fixes
- Fixed CVE-2024-47561 (https://www.cve.org/CVERecord?id=CVE-2024-47561) (Java) by upgrading Avro version to 1.11.4
For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md
List of Contributors
According to git shortlog, the following people contributed to the 2.62.0 release. Thank you to all contributors!
Ahmed Abualsaud, Ahmet Altay, Alex Merose, Andrew Crites, Arnout Engelen, Attila Doroszlai, Bartosz Zablocki, Chamikara Jayalath, Claire McGinty, Claude van der Merwe, Damon Douglas, Danny McCormick, Gabija Balvociute, Hai Joey Tran, Hakampreet Singh Pandher, Ian Sullivan, Jack McCluskey, Jan Lukavský, Jeff Kinard, Jeffrey Kinard, Laura Detmer, Kenneth Knowles, Martin Trieu, Mattie Fu, Michel Davit, Naireen Hussain, Nick Anikin, Radosław Stankiewicz, Ravi Magham, Reeba Qureshi, Robert Bradshaw, Robert Burke, Rohit Sinha, S. Veyrié, Sam Whittle, Shingo Furuyama, Shunping Huang, Svetak Sundhar, Valentyn Tymofieiev, Vlado Djerek, XQ Hu, Yi Hu, twosom
Beam 2.61.0 release
We are happy to present the new 2.61.0 release of Beam.
This release includes both improvements and new functionality.
For more information on changes in 2.61.0, check out the detailed release notes.
Highlights
I/Os
- [Managed Iceberg] Support creating tables if needed (#32686)
- [Managed Iceberg] Now available in Python SDK (#31495)
- [Managed Iceberg] Add support for TIMESTAMP, TIME, and DATE types (#32688)
- BigQuery CDC writes are now available in Python SDK, only supported when using StorageWrite API at least once mode (#32527)
- [Managed Iceberg] Allow updating table partition specs during pipeline runtime (#32879)
- Added BigQueryIO as a Managed IO (#31486)
- Support for writing to Solace message queues (SolaceIO.Write) added (Java) (#31905).
New Features / Improvements
- Added support for read with metadata in MqttIO (Java) (#32195)
- Added support for processing events which use a global sequence to "ordered" extension (Java) #32540
- Added new meta-transforms FlattenWith and Tee that allow one to introduce branching without breaking the linear/chaining style of pipeline construction.
- Use Prism as a fallback to the Python Portable runner when running a pipeline with the Python Direct runner (#32876)
Deprecations
- Removed support for Flink 1.15 and 1.16
- Removed support for Python 3.8
Bugfixes
- (Java) Fixed tearDown not invoked when DoFn throws on Portable Runners (#18592, #31381).
- (Java) Fixed protobuf error with MapState.remove() in Dataflow Streaming Java Legacy Runner without Streaming Engine (#32892).
- Adding flag to support conditionally disabling auto-commit in JdbcIO ReadFn (#31111)
Known Issues
N/A
For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md
List of Contributors
According to git shortlog, the following people contributed to the 2.61.0 release. Thank you to all contributors!
Ahmed Abualsaud, Ahmet Altay, Arun Pandian, Ayush Pandey, Chamikara Jayalath, Chris Ashcraft, Christoph Grotz, DKPHUONG, Damon, Danny Mccormick, Dmitry Ulyumdzhiev, Ferran Fernández Garrido, Hai Joey Tran, Hyeonho Kim, Idan Attias, Israel Herraiz, Jack McCluskey, Jan Lukavský, Jeff Kinard, Jeremy Edwards, Joey Tran, Kenneth Knowles, Maciej Szwaja, Manit Gupta, Mattie Fu, Michel Davit, Minbo Bae, Mohamed Awnallah, Naireen Hussain, Rebecca Szper, Reeba Qureshi, Reuven Lax, Robert Bradshaw, Robert Burke, S. Veyrié, Sam Whittle, Sergei Lilichenko, Shunping Huang, Steven van Rossum, Tan Le, Thiago Nunes, Vitaly Terentyev, Vlado Djerek, Yi Hu, claudevdm, fozzie15, johnjcasey, kushmiD, liferoad, martin trieu, pablo rodriguez defino, razvanculea, s21lee, tvalentyn, twosom
Beam 2.60.0 release
We are happy to present the new 2.60.0 release of Beam.
This release includes both improvements and new functionality.
For more information on changes in 2.60.0, check out the detailed release notes.
Highlights
- Added support for using vLLM in the RunInference transform (Python) (#32528)
- [Managed Iceberg] Added support for streaming writes (#32451)
- [Managed Iceberg] Added auto-sharding for streaming writes (#32612)
- [Managed Iceberg] Added support for writing to dynamic destinations (#32565)
New Features / Improvements
- Dataflow worker can install packages from Google Artifact Registry Python repositories (Python) (#32123).
- Added support for Zstd codec in SerializableAvroCodecFactory (Java) (#32349)
- Added support for using vLLM in the RunInference transform (Python) (#32528)
- Prism release binaries and container bootloaders are now being built with the latest Go 1.23 patch. (#32575)
- Prism
  - Prism now supports Bundle Finalization. (#32425)
- Significantly improved performance of Kafka IO reads that enable commitOffsetsInFinalize by removing the data reshuffle from SDF implementation. (#31682).
- Added support for dynamic writing in MqttIO (Java) (#19376)
- Optimized Spark Runner parDo transform evaluator (Java) (#32537)
- [Managed Iceberg] More efficient manifest file writes/commits (#32666)
Breaking Changes
- In Python, assert_that now throws if it is not in a pipeline context instead of silently succeeding (#30771)
- In Python and YAML, ReadFromJson now overrides the dtype from None to an explicit False. Most notably, string values like "123" are preserved as strings rather than silently coerced (and possibly truncated) to numeric values. To retain the old behavior, pass dtype=True (or any other value accepted by pandas.read_json).
- Users of KafkaIO Read transform that enable commitOffsetsInFinalize might encounter pipeline graph compatibility issues when updating the pipeline. To mitigate, set the updateCompatibilityVersion option to the SDK version used for the original pipeline, for example --updateCompatibilityVersion=2.58.1
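To illustrate why the ReadFromJson change above matters, the following pure-Python sketch contrasts keeping JSON strings as they are with a naive numeric coercion. It does not call Beam or pandas: naive_numeric_coercion is a hypothetical stand-in for dtype inference, and the rules pandas actually applies differ in detail.

```python
import json


def naive_numeric_coercion(value):
    """Hypothetical stand-in for dtype inference: try int, then float, else keep the value."""
    if not isinstance(value, str):
        return value
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


rows = json.loads('[{"id": "00123"}, {"id": "123"}, {"id": "abc"}]')

# Analogous to dtype=False: values stay exactly as they appear in the JSON.
preserved = [row["id"] for row in rows]

# Analogous to inferring dtypes: numeric-looking strings become numbers,
# so the leading zeros of "00123" are lost.
coerced = [naive_numeric_coercion(row["id"]) for row in rows]

print(preserved)
print(coerced)
```

In this sketch the first two identifiers become indistinguishable after coercion, which is the kind of silent data change the new default is meant to avoid.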
Deprecations
- Python 3.8 is reaching EOL and support is being removed in Beam 2.61.0. The 2.60.0 release will warn users
when running on 3.8. (#31192)
Bugfixes
- (Java) Fixed custom delimiter issues in TextIO (#32249, #32251).
- (Java, Python, Go) Fixed PeriodicSequence backlog bytes reporting, which was preventing Dataflow Runner autoscaling from functioning properly (#32506).
- (Java) Fix improper decoding of rows with schemas containing nullable fields when encoded with a schema with equal encoding positions but modified field order. (#32388).
Known Issues
N/A
For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md
List of Contributors
According to git shortlog, the following people contributed to the 2.60.0 release. Thank you to all contributors!
Ahmed Abualsaud, Aiden Grossman, Arun Pandian, Bartosz Zablocki, Chamikara Jayalath, Claire McGinty, DKPHUONG, Damon Douglass, Danny McCormick, Dip Patel, Ferran Fernández Garrido, Hai Joey Tran, Hyeonho Kim, Igor Bernstein, Israel Herraiz, Jack McCluskey, Jaehyeon Kim, Jeff Kinard, Jeffrey Kinard, Joey Tran, Kenneth Knowles, Kirill Berezin, Michel Davit, Minbo Bae, Naireen Hussain, Niel Markwick, Nito Buendia, Reeba Qureshi, Reuven Lax, Robert Bradshaw, Robert Burke, Rohit Sinha, Ryan Fu, Sam Whittle, Shunping Huang, Svetak Sundhar, Udaya Chathuranga, Vitaly Terentyev, Vlado Djerek, Yi Hu, Claude van der Merwe, XQ Hu, Martin Trieu, Valentyn Tymofieiev, twosom
Beam 2.59.0 release
We are happy to present the new 2.59.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.59.0, check out the detailed release notes.
Highlights
- Added support for setting a configurable timeout when loading a model and performing inference in the RunInference transform using with_exception_handling (#32137)
- Initial experimental support for using Prism with the Java and Python SDKs
  - Prism is presently targeting local testing usage, or other small scale execution.
  - For Java, use 'PrismRunner', or 'TestPrismRunner' as an argument to the --runner flag.
  - For Python, use 'PrismRunner' as an argument to the --runner flag.
  - Go already uses Prism as the default local runner.
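As a minimal sketch of the Python usage described above, the following passes PrismRunner through the --runner flag to a tiny pipeline. The function name run_on_prism and the sample transforms are illustrative assumptions; running it requires the apache-beam package (2.59.0 or later), which is imported lazily so the module itself loads without Beam installed.

```python
PRISM_ARGS = ["--runner=PrismRunner"]


def run_on_prism(argv=None):
    """Run a tiny pipeline on Prism; requires the apache-beam package."""
    # Imported lazily so this module can be loaded without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Append the runner flag to any caller-supplied pipeline arguments.
    options = PipelineOptions(list(argv or []) + PRISM_ARGS)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Create" >> beam.Create([1, 2, 3])
            | "Double" >> beam.Map(lambda x: x * 2)
            | "Print" >> beam.Map(print)
        )
```

Calling run_on_prism() executes the pipeline locally. Since Prism support is described as experimental in this release, pipelines using the features listed under Known Issues may not run.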
I/Os
- Improvements to the performance of BigQueryIO when using withPropagateSuccessfulStorageApiWrites(true) method (Java) (#31840).
- [Managed Iceberg] Added support for writing to partitioned tables (#32102)
- Update ClickHouseIO to use the latest version of the ClickHouse JDBC driver (#32228).
- Add ClickHouseIO dedicated User-Agent (#32252).
New Features / Improvements
- BigQuery endpoint can be overridden via PipelineOptions; this enables BigQuery emulators (Java) (#28149).
- Go SDK Minimum Go Version updated to 1.21 (#32092).
- [BigQueryIO] Added support for withFormatRecordOnFailureFunction() for STORAGE_WRITE_API and STORAGE_API_AT_LEAST_ONCE methods (Java) (#31354).
- Updated Go protobuf package to new version (Go) (#21515).
- Added support for setting a configurable timeout when loading a model and performing inference in the RunInference transform using with_exception_handling (#32137)
- Adds OrderedListState support for Java SDK via FnApi.
- Initial support for using Prism from the Python and Java SDKs.
Bugfixes
- Fixed incorrect service account impersonation flow for Python pipelines using BigQuery IOs (#32030).
- Auto-disable broken and meaningless upload_graph feature when using Dataflow Runner V2 (#32159).
- (Python) Upgraded google-cloud-storage to version 2.18.2 to fix a data corruption issue (#32135).
- (Go) Fix corruption on State API writes. (#32245).
Known Issues
- Prism is under active development and does not yet support all pipelines. See #29650 for progress.
  - In the 2.59.0 release, Prism passes most runner validation tests, with the exception of pipelines using the following features: OrderedListState, OnWindowExpiry (e.g. GroupIntoBatches), CustomWindows, MergingWindowFns, Trigger and WindowingStrategy associated features, Bundle Finalization, Looping Timers, and some Coder-related issues such as with Python combiner packing, Java Schema transforms, and heterogeneous flatten coders. Processing Time timers do not yet have real time support.
  - If your pipeline is having difficulty with the Python or Java direct runners, but runs well on Prism, please let us know.
For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md
List of Contributors
According to git shortlog, the following people contributed to the 2.59.0 release. Thank you to all contributors!
Ahmed Abualsaud, Ahmet Altay, Andrew Crites, atask-g, Axel Magnuson, Ayush Pandey, Bartosz Zablocki, Chamikara Jayalath, cutiepie-10, Damon, Danny McCormick, dependabot[bot], Eddie Phillips, Francis O'Hara, Hyeonho Kim, Israel Herraiz, Jack McCluskey, Jaehyeon Kim, Jan Lukavský, Jeff Kinard, Jeffrey Kinard, jonathan-lemos, jrmccluskey, Kirill Berezin, Kiruphasankaran Nataraj, lahariguduru, liferoad, lostluck, Maciej Szwaja, Manit Gupta, Mark Zitnik, martin trieu, Naireen Hussain, Prerit Chandok, Radosław Stankiewicz, Rebecca Szper, Robert Bradshaw, Robert Burke, ron-gal, Sam Whittle, Sergei Lilichenko, Shunping Huang, Svetak Sundhar, Thiago Nunes, Timothy Itodo, tvalentyn, twosom, Vatsal, Vitaly Terentyev, Vlado Djerek, Yifan Ye, Yi Hu
Beam 2.58.1 release
We are happy to present the new 2.58.1 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.
New Features / Improvements
- Fixed issue where KafkaIO Records read with ReadFromKafkaViaSDF are redistributed and may contain duplicates regardless of the configuration. This affects Java pipelines with Dataflow v2 runner and xlang pipelines reading from Kafka (#32196).
Known Issues
- Large Dataflow graphs using runner v2, or pipelines explicitly enabling the upload_graph experiment, will fail at construction time (#32159).
- Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue (#32169). The issue will be fixed in 2.59.0 (#32135). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.
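One way to confirm that an environment already has the google-cloud-storage workaround applied is to compare the installed version against 2.18.2. The sketch below uses only the standard library; the helper names are hypothetical, and the comparison deliberately ignores pre-release suffixes, which is a simplifying assumption.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_PATCHED = (2, 18, 2)


def parse_release(version_string):
    """Return the leading numeric components of a version string as a tuple of ints."""
    match = re.match(r"(\d+(?:\.\d+)*)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version: {version_string!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_patched(version_string, minimum=MIN_PATCHED):
    """True if the version is at or above the minimum (pre-release tags are ignored)."""
    return parse_release(version_string) >= minimum


def installed_gcs_client_is_patched():
    """Check the installed google-cloud-storage package; None if it is not installed."""
    try:
        return is_patched(version("google-cloud-storage"))
    except PackageNotFoundError:
        return None
```

A launcher script could call installed_gcs_client_is_patched() before submitting a job and warn when it returns False.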
For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md
List of Contributors
According to git shortlog, the following people contributed to the 2.58.1 release. Thank you to all contributors!
Danny McCormick
Sam Whittle
Beam 2.58.0 release
We are happy to present the new 2.58.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.
For more information about changes in 2.58.0, check out the detailed release notes.
I/Os
New Features / Improvements
- Multiple RunInference instances can now share the same model instance by setting the model_identifier parameter (Python) (#31665).
- Added options to control the number of Storage API multiplexing connections (#31721)
- [BigQueryIO] Better handling for batch Storage Write API when it hits AppendRows throughput quota (#31837)
- [IcebergIO] All specified catalog properties are passed through to the connector (#31726)
- Removed a third-party LGPL dependency from the Go SDK (#31765).
- Support for MapState and SetState when using Dataflow Runner v1 with Streaming Engine (Java) (#18200)
Breaking Changes
- [IcebergIO] IcebergCatalogConfig was changed to support specifying catalog properties in a key-store fashion (#31726)
- [SpannerIO] Added validation that query and table cannot be specified at the same time for SpannerIO.read(). Previously, withQuery, if set, overrode withTable (#24956).
Bug fixes
- [BigQueryIO] Fixed a bug in batch Storage Write API that frequently exhausted concurrent connections quota (#31710)
List of Contributors
According to git shortlog, the following people contributed to the 2.58.0 release. Thank you to all contributors!
Ahmed Abualsaud
Ahmet Altay
Alexandre Moueddene
Alexey Romanenko
Andrew Crites
Bartosz Zablocki
Celeste Zeng
Chamikara Jayalath
Clay Johnson
Damon Douglass
Danny McCormick
Dilnaz Amanzholova
Florian Bernard
Francis O'Hara
George Ma
Israel Herraiz
Jack McCluskey
Jaehyeon Kim
James Roseman
Kenneth Knowles
Maciej Szwaja
Michel Davit
Minh Son Nguyen
Naireen
Niel Markwick
Oliver Cardoza
Robert Bradshaw
Robert Burke
Rohit Sinha
S. Veyrié
Sam Whittle
Shunping Huang
Svetak Sundhar
TongruiLi
Tony Tang
Valentyn Tymofieiev
Vitaly Terentyev
Yi Hu
Beam 2.57.0 Release
We are happy to present the new 2.57.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.57.0, check out the detailed release notes.
Highlights
I/Os
- Ensure that BigtableIO closes the reader streams (#31477).
New Features / Improvements
- Added Feast feature store handler for enrichment transform (Python) (#30957).
- BigQuery per-worker metrics are reported by default for Streaming Dataflow Jobs (Java) (#31015)
- Adds inMemory() variant of Java List and Map side inputs for more efficient lookups when the entire side input fits into memory.
- Beam YAML now supports the jinja templating syntax. Template variables can be passed with the (json-formatted) --jinja_variables flag.
- DataFrame API now supports pandas 2.1.x and adds 12 more string functions for Series (#31185).
- Added BigQuery handler for enrichment transform (Python) (#31295)
- Disable soft delete policy when creating the default bucket for a project (Java) (#31324).
- Added DoFn.SetupContextParam and DoFn.BundleContextParam, which can be used as a Python DoFn.process, Map, or FlatMap parameter to invoke a context manager per DoFn setup or bundle (analogous to using setup/teardown or start_bundle/finish_bundle respectively).
- Go SDK Prism Runner
- Pre-built Prism binaries are now part of the release and are available via the Github release page. (#29697).
- Some pipelines will work on Java and Python, but this is in part to prepare for real runner wrappers in 2.58.0
- ProcessingTime is now handled synthetically with TestStream pipelines and Non-TestStream pipelines, for fast test pipeline execution by default. (#30083).
- Prism does NOT yet support "real time" execution for this release.
- Improve processing for large elements to reduce the chances for exceeding 2GB protobuf limits (Python) (#31607).
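The setup/teardown bookkeeping that DoFn.SetupContextParam is described above as being analogous to can be sketched in plain Python. The sketch does not import Beam: ManualLifecycleFn and open_resource are hypothetical names, and the calls at the bottom stand in for what a runner would do around a DoFn.

```python
from contextlib import ExitStack, contextmanager

events = []


@contextmanager
def open_resource(name):
    """Hypothetical resource; records when it is opened and closed."""
    events.append(f"open {name}")
    try:
        yield name.upper()
    finally:
        events.append(f"close {name}")


class ManualLifecycleFn:
    """Mimics the shape of a DoFn (setup / process / teardown) without importing Beam."""

    def setup(self):
        # Enter the context manager once and keep it open across elements.
        self._stack = ExitStack()
        self._resource = self._stack.enter_context(open_resource("db"))

    def process(self, element):
        yield f"{self._resource}:{element}"

    def teardown(self):
        # Exit the context manager when the instance is discarded.
        self._stack.close()


# Stand-in for the runner: one setup, several elements, one teardown.
fn = ManualLifecycleFn()
fn.setup()
outputs = [out for element in (1, 2) for out in fn.process(element)]
fn.teardown()
```

With the new parameters, the enter and exit steps are handled by the SDK, so the DoFn only declares the context manager as a parameter default instead of managing an ExitStack itself.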
Breaking Changes
- Java's View.asList() side inputs are now optimized for iterating rather than indexing when in the global window. This new implementation still supports all (immutable) List methods as before, but some of the random access methods like get() and size() will be slower. To use the old implementation one can use View.asList().withRandomAccess().
- SchemaTransforms implemented with TypedSchemaTransformProvider now produce a configuration Schema with snake_case naming convention (#31374). This will make the following cases problematic:
  - Running a pre-2.57.0 remote SDK pipeline containing a 2.57.0+ Java SchemaTransform, and vice versa:
  - Running a 2.57.0+ remote SDK pipeline containing a pre-2.57.0 Java SchemaTransform
  - All direct uses of Python's SchemaAwareExternalTransform should be updated to use new snake_case parameter names.
- Upgraded Jackson Databind to 2.15.4 (Java) (#26743).
jackson-2.15 has known breaking changes. An important one is it imposed a buffer limit for parser.
If your custom PTransform/DoFn are affected, refer to #31580 for mitigation.
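For the snake_case change above, callers that built camelCase keyword arguments for SchemaAwareExternalTransform need to rename them. The helper below is an illustrative sketch of such a rename, not Beam's own conversion logic, and the parameter names in the example are hypothetical.

```python
import re


def camel_to_snake(name):
    """Convert a camelCase identifier to snake_case (illustrative; not Beam's own converter)."""
    # Insert an underscore before each capital that follows a lowercase letter or digit.
    with_breaks = re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name)
    return with_breaks.lower()


def snake_case_kwargs(kwargs):
    """Return a copy of kwargs with every key converted to snake_case."""
    return {camel_to_snake(key): value for key, value in kwargs.items()}


# Hypothetical configuration written against a pre-2.57.0 transform.
old_style = {"tableName": "events", "batchSize": 500, "already_snake": True}
new_style = snake_case_kwargs(old_style)
print(new_style)
```

Names that a given transform actually expects should be checked against its configuration schema, since an automatic rename may not match every field.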
For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md
List of Contributors
According to git shortlog, the following people contributed to the 2.57.0 release. Thank you to all contributors!
Ahmed Abualsaud
Ahmet Altay
Alexey Romanenko
Andrey Devyatkin
Anody Zhang
Arvind Ram
Ben Konz
Bruno Volpato
Celeste Zeng
Chamikara Jayalath
Claire McGinty
Colm O hEigeartaigh
Damon
Danny McCormick
Evan Galpin
Ferran Fernández Garrido
Florent Biville
Jack Dingilian
Jack McCluskey
Jan Lukavský
JayajP
Jeff Kinard
Jeffrey Kinard
John Casey
Justin Uang
Kenneth Knowles
Kevin Zhou
Liam Miller-Cushon
Maarten Vercruysse
Maciej Szwaja
Maja Kontrec Rönn
Marc hurabielle
Martin Trieu
Mattie Fu
Min Zhu
Naireen Hussain
Nick Anikin
Pablo Rodriguez Defino
Paul King
Priyans Desai
Radosław Stankiewicz
Rebecca Szper
Ritesh Ghorse
Robert Bradshaw
Robert Burke
Rodrigo Bozzolo
RyuSA
Sam Rohde
Sam Whittle
Sergei Lilichenko
Shahar Epstein
Shunping Huang
Svetak Sundhar
Tomo Suzuki
Tony Tang
Valentyn Tymofieiev
Vincent Stollenwerk
Vineet Kumar
Vitaly Terentyev
Vlado Djerek
XQ Hu
Yi Hu
akashorabek
bzablocki
kberezin
Beam 2.56.0 release
We are happy to present the new 2.56.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.56.0, check out the detailed release notes.
Highlights
- Added FlinkRunner for Flink 1.17 and removed support for Flink 1.12 and 1.13. Pipelines running on Flink 1.16 and below can be upgraded to Flink 1.17 if the pipeline is first updated to Beam 2.56.0 with the same Flink version. After the pipeline runs with Beam 2.56.0, it should be possible to upgrade to FlinkRunner with Flink 1.17. (#29939)
- New Managed I/O Java API (#30830).
- New Ordered Processing PTransform added for processing order-sensitive stateful data (#30735).
I/Os
- Upgraded Avro version to 1.11.3, kafka-avro-serializer and kafka-schema-registry-client versions to 7.6.0 (Java) (#30638). The newer Avro package is known to have breaking changes. If you are affected, you can keep pinned to older Avro versions which are also tested with Beam.
- Iceberg read/write support is available through the new Managed I/O Java API (#30830).
New Features / Improvements
- Profiling of Cythonized code has been disabled by default. This might improve performance for some Python pipelines (#30938).
- Bigtable enrichment handler now accepts a custom function to build a composite row key. (Python) (#30974).
Breaking Changes
- Default consumer polling timeout for KafkaIO.Read was increased from 1 second to 2 seconds. Use KafkaIO.read().withConsumerPollingTimeout(Duration duration) to configure this timeout value when necessary (#30870).
- Python Dataflow users no longer need to manually specify --streaming for pipelines using unbounded sources such as ReadFromPubSub.
Bugfixes
- Fixed locking issue when shutting down inactive bundle processors. Symptoms of this issue include slowness or stuckness in long-running jobs (Python) (#30679).
- Fixed a logging issue that silenced the pip output when installing dependencies provided in --requirements_file (Python).
List of Contributors
According to git shortlog, the following people contributed to the 2.56.0 release. Thank you to all contributors!
Abacn
Ahmed Abualsaud
Andrei Gurau
Andrey Devyatkin
Aravind Pedapudi
Arun Pandian
Arvind Ram
Bartosz Zablocki
Brachi Packter
Byron Ellis
Chamikara Jayalath
Clement DAL PALU
Damon
Danny McCormick
Daria Bezkorovaina
Dip Patel
Evan Burrell
Hai Joey Tran
Jack McCluskey
Jan Lukavský
JayajP
Jeff Kinard
Julien Tournay
Kenneth Knowles
Luís Bianchin
Maciej Szwaja
Melody Shen
Oleh Borysevych
Pablo Estrada
Rebecca Szper
Ritesh Ghorse
Robert Bradshaw
Sam Whittle
Sergei Lilichenko
Shahar Epstein
Shunping Huang
Svetak Sundhar
Timothy Itodo
Veronica Wasson
Vitaly Terentyev
Vlado Djerek
Yi Hu
akashorabek
bzablocki
clmccart
damccorm
dependabot[bot]
dmitryor
github-actions[bot]
liferoad
martin trieu
tvalentyn
xianhualiu
Beam 2.55.1 release
Bugfixes
- Fixed issue that broke WriteToJson in languages other than Java (X-lang) (#30776).
Beam 2.55.0 release
We are happy to present the new 2.55.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.55.0, check out the detailed release notes.
Highlights
- The Python SDK will now include automatically generated wrappers for external Java transforms! (#29834)
I/Os
- Added support for handling bad records to BigQueryIO (#30081).
- Full Support for Storage Read and Write APIs
- Partial Support for File Loads (Failures writing to files supported, failures loading files to BQ unsupported)
- No Support for Extract or Streaming Inserts
- Added support for handling bad records to PubSubIO (#30372).
- Support is not available for handling schema mismatches, and enabling error handling for writing to Pub/Sub topics with schemas is not recommended
- The --enableBundling pipeline option for BigQueryIO DIRECT_READ is replaced by --enableStorageReadApiV2. Both were considered experimental and subject to change (Java) (#26354).
New Features / Improvements
- Allow writing clustered and not time-partitioned BigQuery tables (Java) (#30094).
- Redis cache support added to RequestResponseIO and Enrichment transform (Python) (#30307)
- Merged sdks/java/fn-execution and runners/core-construction-java into the main SDK. These artifacts were never meant for users, but noting that they no longer exist. These are steps to bring portability into the core SDK alongside all other core functionality.
- Added Vertex AI Feature Store handler for Enrichment transform (Python) (#30388)
Breaking Changes
- Arrow version was bumped to 15.0.0 from 5.0.0 (#30181).
- Go SDK users who build custom worker containers may run into issues with the move to distroless containers as a base (see Security Fixes).
- The issue stems from distroless containers lacking additional tools, which current custom container processes may rely on.
- See https://beam.apache.org/documentation/runtime/environments/#from-scratch-go for instructions on building and using a custom container.
- Python SDK has changed the default value for the --max_cache_memory_usage_mb pipeline option from 100 to 0. This option was first introduced in the 2.52.0 SDK version. This change restores the behavior of the 2.51.0 SDK, which does not use the state cache. If your pipeline uses iterable side inputs views, consider increasing the cache size by setting the option manually. (#30360).
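Setting the cache size manually, as suggested above, amounts to passing the --max_cache_memory_usage_mb option explicitly. The sketch below shows one way to do that from Python; the helper names and the value of 100 are illustrative assumptions, and build_options requires the apache-beam package, which is imported lazily.

```python
def state_cache_args(size_mb):
    """Build the pipeline argument that sets the state cache size in megabytes."""
    if size_mb < 0:
        raise ValueError("cache size must be non-negative")
    return [f"--max_cache_memory_usage_mb={int(size_mb)}"]


def build_options(extra_args=(), cache_mb=100):
    """Create PipelineOptions with an explicit cache size; requires apache-beam."""
    # Imported lazily so the helper above is usable without Beam installed.
    from apache_beam.options.pipeline_options import PipelineOptions

    return PipelineOptions(list(extra_args) + state_cache_args(cache_mb))
```

The same flag can be given on the command line when launching the pipeline. A suitable size depends on the side inputs involved, so 100 is only the earlier default, not a recommendation.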
Deprecations
- N/A
Bug fixes
- Fixed SpannerIO.readChangeStream to support propagating credentials from pipeline options to the getDialect calls for authenticating with Spanner (Java) (#30361).
- Reduced the number of HTTP requests in GCSIO function calls (Python) (#30205)
Security Fixes
- Go SDK base container image moved to distroless/base-nossl-debian12, reducing vulnerable container surface to kernel and glibc (#30011).
Known Issues
- In Python pipelines, when shutting down inactive bundle processors, shutdown logic can overaggressively hold the lock, blocking acceptance of new work. Symptoms of this issue include slowness or stuckness in long-running jobs. Fixed in 2.56.0 (#30679).
List of Contributors
According to git shortlog, the following people contributed to the 2.55.0 release. Thank you to all contributors!
Ahmed Abualsaud
Anand Inguva
Andrew Crites
Andrey Devyatkin
Arun Pandian
Arvind Ram
Chamikara Jayalath
Chris Gray
Claire McGinty
Damon Douglas
Dan Ellis
Danny McCormick
Daria Bezkorovaina
Dima I
Edward Cui
Ferran Fernández Garrido
GStravinsky
Jan Lukavský
Jason Mitchell
JayajP
Jeff Kinard
Jeffrey Kinard
Kenneth Knowles
Mattie Fu
Michel Davit
Oleh Borysevych
Ritesh Ghorse
Ritesh Tarway
Robert Bradshaw
Robert Burke
Sam Whittle
Scott Strong
Shunping Huang
Steven van Rossum
Svetak Sundhar
Talat UYARER
Ukjae Jeong (Jay)
Vitaly Terentyev
Vlado Djerek
Yi Hu
akashorabek
case-k
clmccart
dengwe1
dhruvdua
hardshah
johnjcasey
liferoad
martin trieu
tvalentyn