
Releases: apache/beam

Beam 2.62.0 release

13 Jan 15:35

We are happy to present the new 2.62.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.62.0, check out the detailed release notes.

New Features / Improvements

  • Added support for stateful processing in Spark Runner for streaming pipelines. Timer functionality is not yet supported and will be implemented in a future release (#33237).
  • The datetime module is now available for use in Jinja templatization for YAML.
  • Improved batch performance of SparkRunner's GroupByKey (#20943).
  • Support OnWindowExpiration in Prism (#32211).
    • This enables initial Java GroupIntoBatches support.
  • Support OrderedListState in Prism (#32929).
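
For readers new to OnWindowExpiration, here is a minimal, hypothetical model in plain Python (not Beam's API) of why GroupIntoBatches depends on it. Elements are buffered per key and emitted in fixed-size batches. A window-expiration hook then flushes whatever partial batch remains. The class and method names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


class BatchingSketch:
    """Simplified model of GroupIntoBatches-style buffering (hypothetical, not Beam code)."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        # Per-key buffers stand in for per-key state.
        self.buffers: Dict[Hashable, List[object]] = defaultdict(list)
        self.output: List[Tuple[Hashable, List[object]]] = []

    def process(self, key: Hashable, value: object) -> None:
        buf = self.buffers[key]
        buf.append(value)
        if len(buf) >= self.batch_size:
            self.output.append((key, list(buf)))
            buf.clear()

    def on_window_expiration(self) -> None:
        # In this model, partially filled batches would otherwise never be emitted.
        for key, buf in self.buffers.items():
            if buf:
                self.output.append((key, list(buf)))
        self.buffers.clear()


batcher = BatchingSketch(batch_size=2)
for key, value in [("a", 1), ("a", 2), ("a", 3), ("b", 4)]:
    batcher.process(key, value)
batcher.on_window_expiration()
print(batcher.output)  # [('a', [1, 2]), ('a', [3]), ('b', [4])]
```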

I/Os

  • gcs-connector config options can be set via GcsOptions (Java) (#32769).
  • [Managed Iceberg] Support partitioning by time (year, month, day, hour) for types date, time, timestamp, and timestamp(tz) (#32939)
  • Upgraded the default version of Hadoop dependencies to 3.4.1. Hadoop 2.10.2 is still supported (Java) (#33011).
  • [BigQueryIO] Create managed BigLake tables dynamically (#33125)
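
The Iceberg table spec defines its time partition transforms as the whole number of years, months, days, or hours elapsed since the Unix epoch. The sketch below is plain Python, not the Managed Iceberg API, and the function name is hypothetical. It shows the partition values for a single UTC timestamp, which can help when checking the partition layout a pipeline produces.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iceberg_time_partitions(ts: datetime) -> dict:
    """Compute year/month/day/hour partition values for a timezone-aware timestamp.

    Each value counts whole units elapsed since 1970-01-01T00:00:00 UTC,
    rounding toward negative infinity for pre-epoch timestamps.
    """
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    ts = ts.astimezone(timezone.utc)
    years = ts.year - 1970
    months = years * 12 + (ts.month - 1)
    delta = ts - EPOCH
    days = delta.days  # timedelta normalizes so that days is floored
    hours = days * 24 + delta.seconds // 3600
    return {"year": years, "month": months, "day": days, "hour": hours}


print(iceberg_time_partitions(datetime(2024, 11, 15, 10, 30, tzinfo=timezone.utc)))
# {'year': 54, 'month': 658, 'day': 20042, 'hour': 481018}
```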

Breaking Changes

  • Upgraded ZetaSQL to 2024.11.1 (#32902). Java 11+ is now needed if Beam's ZetaSQL component is used.

Bugfixes

  • Fixed EventTimeTimer ordering in Prism (#32222).
  • [Managed Iceberg] Fixed a bug where DataFile metadata was assigned incorrect partition values (#33549).

Security Fixes

For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

List of Contributors

According to git shortlog, the following people contributed to the 2.62.0 release. Thank you to all contributors!

Ahmed Abualsaud, Ahmet Altay, Alex Merose, Andrew Crites, Arnout Engelen, Attila Doroszlai, Bartosz Zablocki, Chamikara Jayalath, Claire McGinty, Claude van der Merwe, Damon Douglas, Danny McCormick, Gabija Balvociute, Hai Joey Tran, Hakampreet Singh Pandher, Ian Sullivan, Jack McCluskey, Jan Lukavský, Jeff Kinard, Jeffrey Kinard, Laura Detmer, Kenneth Knowles, Martin Trieu, Mattie Fu, Michel Davit, Naireen Hussain, Nick Anikin, Radosław Stankiewicz, Ravi Magham, Reeba Qureshi, Robert Bradshaw, Robert Burke, Rohit Sinha, S. Veyrié, Sam Whittle, Shingo Furuyama, Shunping Huang, Svetak Sundhar, Valentyn Tymofieiev, Vlado Djerek, XQ Hu, Yi Hu, twosom

Beam 2.61.0 release

14 Nov 14:46

We are happy to present the new 2.61.0 release of Beam.
This release includes both improvements and new functionality.

For more information on changes in 2.61.0, check out the detailed release notes.

Highlights

  • [Python] Introduce Managed Transforms API (#31495)
  • Flink 1.19 support added (#32648)

I/Os

  • [Managed Iceberg] Support creating tables if needed (#32686)
  • [Managed Iceberg] Now available in Python SDK (#31495)
  • [Managed Iceberg] Add support for TIMESTAMP, TIME, and DATE types (#32688)
  • BigQuery CDC writes are now available in the Python SDK; they are supported only when using the Storage Write API in at-least-once mode (#32527)
  • [Managed Iceberg] Allow updating table partition specs during pipeline runtime (#32879)
  • Added BigQueryIO as a Managed IO (#31486)
  • Support for writing to Solace message queues (SolaceIO.Write) added (Java) (#31905).

New Features / Improvements

  • Added support for read with metadata in MqttIO (Java) (#32195)
  • Added support to the "ordered" extension for processing events that use a global sequence (Java) (#32540)
  • Added new meta-transforms FlattenWith and Tee, which allow one to introduce branching
    without breaking the linear/chaining style of pipeline construction.
  • Use Prism as a fallback to the Python Portable runner when running a pipeline with the Python Direct runner (#32876)

Deprecations

  • Removed support for Flink 1.15 and 1.16
  • Removed support for Python 3.8

Bugfixes

  • (Java) Fixed tearDown not being invoked when a DoFn throws on Portable Runners (#18592, #31381).
  • (Java) Fixed protobuf error with MapState.remove() in Dataflow Streaming Java Legacy Runner without Streaming Engine (#32892).
  • Added a flag to support conditionally disabling auto-commit in JdbcIO ReadFn (#31111)

Known Issues

N/A

For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

List of Contributors

According to git shortlog, the following people contributed to the 2.61.0 release. Thank you to all contributors!

Ahmed Abualsaud, Ahmet Altay, Arun Pandian, Ayush Pandey, Chamikara Jayalath, Chris Ashcraft, Christoph Grotz, DKPHUONG, Damon, Danny Mccormick, Dmitry Ulyumdzhiev, Ferran Fernández Garrido, Hai Joey Tran, Hyeonho Kim, Idan Attias, Israel Herraiz, Jack McCluskey, Jan Lukavský, Jeff Kinard, Jeremy Edwards, Joey Tran, Kenneth Knowles, Maciej Szwaja, Manit Gupta, Mattie Fu, Michel Davit, Minbo Bae, Mohamed Awnallah, Naireen Hussain, Rebecca Szper, Reeba Qureshi, Reuven Lax, Robert Bradshaw, Robert Burke, S. Veyrié, Sam Whittle, Sergei Lilichenko, Shunping Huang, Steven van Rossum, Tan Le, Thiago Nunes, Vitaly Terentyev, Vlado Djerek, Yi Hu, claudevdm, fozzie15, johnjcasey, kushmiD, liferoad, martin trieu, pablo rodriguez defino, razvanculea, s21lee, tvalentyn, twosom

Beam 2.60.0 release

16 Oct 17:47

We are happy to present the new 2.60.0 release of Beam.
This release includes both improvements and new functionality.

For more information on changes in 2.60.0, check out the detailed release notes.

Highlights

  • Added support for using vLLM in the RunInference transform (Python) (#32528)
  • [Managed Iceberg] Added support for streaming writes (#32451)
  • [Managed Iceberg] Added auto-sharding for streaming writes (#32612)
  • [Managed Iceberg] Added support for writing to dynamic destinations (#32565)

New Features / Improvements

  • Dataflow worker can install packages from Google Artifact Registry Python repositories (Python) (#32123).
  • Added support for Zstd codec in SerializableAvroCodecFactory (Java) (#32349)
  • Added support for using vLLM in the RunInference transform (Python) (#32528)
  • Prism release binaries and container bootloaders are now being built with the latest Go 1.23 patch. (#32575)
  • Prism
    • Prism now supports Bundle Finalization. (#32425)
  • Significantly improved performance of KafkaIO reads that enable commitOffsetsInFinalize by removing the data reshuffle from the SDF implementation (#31682).
  • Added support for dynamic writing in MqttIO (Java) (#19376)
  • Optimized Spark Runner parDo transform evaluator (Java) (#32537)
  • [Managed Iceberg] More efficient manifest file writes/commits (#32666)

Breaking Changes

  • In Python, assert_that now throws if it is not in a pipeline context instead of silently succeeding (#30771)
  • In Python and YAML, ReadFromJson now overrides the dtype from None to
    an explicit False. Most notably, string values like "123" are preserved
    as strings rather than silently coerced (and possibly truncated) to numeric
    values. To retain the old behavior, pass dtype=True (or any other value
    accepted by pandas.read_json).
  • Users of the KafkaIO Read transform that enable commitOffsetsInFinalize might encounter pipeline graph compatibility issues when updating the pipeline. To mitigate, set the updateCompatibilityVersion option to the SDK version used for the original pipeline, for example --updateCompatibilityVersion=2.58.1
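
The plain-Python sketch below shows why the ReadFromJson dtype change matters by contrasting naive numeric inference with keeping strings as-is. The infer_numeric helper is hypothetical and only approximates what type inference can do; the actual rules in pandas.read_json are more involved. Note that the leading zero is lost once "02134" becomes a number.

```python
import json


def infer_numeric(value):
    """Hypothetical stand-in for dtype inference: turn numeric-looking strings into numbers."""
    if not isinstance(value, str):
        return value
    for convert in (int, float):
        try:
            return convert(value)
        except ValueError:
            pass
    return value


raw = json.loads('{"zip_code": "02134", "order_id": "123", "city": "Boston"}')
coerced = {k: infer_numeric(v) for k, v in raw.items()}  # roughly the old, inferred behavior
preserved = dict(raw)                                    # roughly the new dtype=False behavior

print(coerced)    # {'zip_code': 2134, 'order_id': 123, 'city': 'Boston'}
print(preserved)  # {'zip_code': '02134', 'order_id': '123', 'city': 'Boston'}
```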

Deprecations

  • Python 3.8 is reaching EOL and support is being removed in Beam 2.61.0. The 2.60.0 release will warn users
    when running on 3.8. (#31192)

Bugfixes

  • (Java) Fixed custom delimiter issues in TextIO (#32249, #32251).
  • (Java, Python, Go) Fixed PeriodicSequence backlog bytes reporting, which was preventing Dataflow Runner autoscaling from functioning properly (#32506).
  • (Java) Fixed improper decoding of rows with schemas containing nullable fields when encoded with a schema with equal encoding positions but modified field order (#32388).

Known Issues

N/A

For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

List of Contributors

According to git shortlog, the following people contributed to the 2.60.0 release. Thank you to all contributors!

Ahmed Abualsaud, Aiden Grossman, Arun Pandian, Bartosz Zablocki, Chamikara Jayalath, Claire McGinty, DKPHUONG, Damon Douglass, Danny McCormick, Dip Patel, Ferran Fernández Garrido, Hai Joey Tran, Hyeonho Kim, Igor Bernstein, Israel Herraiz, Jack McCluskey, Jaehyeon Kim, Jeff Kinard, Jeffrey Kinard, Joey Tran, Kenneth Knowles, Kirill Berezin, Michel Davit, Minbo Bae, Naireen Hussain, Niel Markwick, Nito Buendia, Reeba Qureshi, Reuven Lax, Robert Bradshaw, Robert Burke, Rohit Sinha, Ryan Fu, Sam Whittle, Shunping Huang, Svetak Sundhar, Udaya Chathuranga, Vitaly Terentyev, Vlado Djerek, Yi Hu, Claude van der Merwe, XQ Hu, Martin Trieu, Valentyn Tymofieiev, twosom

Beam 2.59.0 release

24 Aug 17:16

We are happy to present the new 2.59.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.59.0, check out the detailed release notes.

Highlights

  • Added support for setting a configurable timeout when loading a model and performing inference in the RunInference transform using with_exception_handling (#32137)
  • Initial experimental support for using Prism with the Java and Python SDKs
    • Prism is presently targeting local testing usage, or other small scale execution.
    • For Java, use 'PrismRunner', or 'TestPrismRunner' as an argument to the --runner flag.
    • For Python, use 'PrismRunner' as an argument to the --runner flag.
    • Go already uses Prism as the default local runner.

I/Os

  • Improvements to the performance of BigQueryIO when using withPropagateSuccessfulStorageApiWrites(true) method (Java) (#31840).
  • [Managed Iceberg] Added support for writing to partitioned tables (#32102)
  • Update ClickHouseIO to use the latest version of the ClickHouse JDBC driver (#32228).
  • Added a dedicated User-Agent for ClickHouseIO (#32252).

New Features / Improvements

  • The BigQuery endpoint can be overridden via PipelineOptions, which enables the use of BigQuery emulators (Java) (#28149).
  • Go SDK Minimum Go Version updated to 1.21 (#32092).
  • [BigQueryIO] Added support for withFormatRecordOnFailureFunction() for STORAGE_WRITE_API and STORAGE_API_AT_LEAST_ONCE methods (Java) (#31354).
  • Updated Go protobuf package to new version (Go) (#21515).
  • Added support for setting a configurable timeout when loading a model and performing inference in the RunInference transform using with_exception_handling (#32137)
  • Added OrderedListState support to the Java SDK via the FnApi.
  • Initial support for using Prism from the Python and Java SDKs.
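
OrderedListState keeps values sorted by timestamp so that a DoFn can read or clear a time range without loading everything. As a rough, hypothetical illustration, the core operations are sketched below in plain Python with the standard bisect module. The class and method names are invented for this sketch and are not Beam's API, and the half-open [start, end) range convention is a choice made here.

```python
import bisect
from typing import List, Tuple


class OrderedListStateSketch:
    """In-memory model of timestamp-ordered state with range reads and range clears."""

    def __init__(self) -> None:
        # Kept sorted by (timestamp, value) at all times.
        self._items: List[Tuple[int, str]] = []

    def add(self, timestamp: int, value: str) -> None:
        bisect.insort(self._items, (timestamp, value))

    def _bounds(self, start: int, end: int) -> Tuple[int, int]:
        # (t,) sorts before any (t, value), so bisect_left finds the first entry with timestamp >= t.
        lo = bisect.bisect_left(self._items, (start,))
        hi = bisect.bisect_left(self._items, (end,))
        return lo, hi

    def read_range(self, start: int, end: int) -> List[Tuple[int, str]]:
        """Return entries with start <= timestamp < end, in timestamp order."""
        lo, hi = self._bounds(start, end)
        return self._items[lo:hi]

    def clear_range(self, start: int, end: int) -> None:
        """Remove entries with start <= timestamp < end."""
        lo, hi = self._bounds(start, end)
        del self._items[lo:hi]


state = OrderedListStateSketch()
for ts, v in [(5, "e"), (1, "a"), (3, "c")]:
    state.add(ts, v)
print(state.read_range(1, 5))   # [(1, 'a'), (3, 'c')]
state.clear_range(0, 4)
print(state.read_range(0, 10))  # [(5, 'e')]
```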

Bugfixes

  • Fixed incorrect service account impersonation flow for Python pipelines using BigQuery IOs (#32030).
  • Auto-disable broken and meaningless upload_graph feature when using Dataflow Runner V2 (#32159).
  • (Python) Upgraded google-cloud-storage to version 2.18.2 to fix a data corruption issue (#32135).
  • (Go) Fix corruption on State API writes. (#32245).

Known Issues

  • Prism is under active development and does not yet support all pipelines. See #29650 for progress.
    • In the 2.59.0 release, Prism passes most runner validation tests, with the exception of pipelines using the following features:
      OrderedListState, OnWindowExpiry (e.g. GroupIntoBatches), CustomWindows, MergingWindowFns, Trigger and WindowingStrategy associated features, Bundle Finalization, Looping Timers, and some Coder-related issues such as with Python combiner packing, Java Schema transforms, and heterogeneous flatten coders. Processing Time timers do not yet have real time support.
    • If your pipeline is having difficulty with the Python or Java direct runners, but runs well on Prism, please let us know.

For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

List of Contributors

According to git shortlog, the following people contributed to the 2.59.0 release. Thank you to all contributors!

Ahmed Abualsaud, Ahmet Altay, Andrew Crites, atask-g, Axel Magnuson, Ayush Pandey, Bartosz Zablocki, Chamikara Jayalath, cutiepie-10, Damon, Danny McCormick, dependabot[bot], Eddie Phillips, Francis O'Hara, Hyeonho Kim, Israel Herraiz, Jack McCluskey, Jaehyeon Kim, Jan Lukavský, Jeff Kinard, Jeffrey Kinard, jonathan-lemos, jrmccluskey, Kirill Berezin, Kiruphasankaran Nataraj, lahariguduru, liferoad, lostluck, Maciej Szwaja, Manit Gupta, Mark Zitnik, martin trieu, Naireen Hussain, Prerit Chandok, Radosław Stankiewicz, Rebecca Szper, Robert Bradshaw, Robert Burke, ron-gal, Sam Whittle, Sergei Lilichenko, Shunping Huang, Svetak Sundhar, Thiago Nunes, Timothy Itodo, tvalentyn, twosom, Vatsal, Vitaly Terentyev, Vlado Djerek, Yifan Ye, Yi Hu

Beam 2.58.1 release

16 Aug 18:44

We are happy to present the new 2.58.1 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

New Features / Improvements

  • Fixed an issue where KafkaIO records read with ReadFromKafkaViaSDF were redistributed and could contain duplicates regardless of the configuration. This affects Java pipelines using the Dataflow v2 runner and cross-language (xlang) pipelines reading from Kafka (#32196).

Known Issues

  • Large Dataflow graphs using runner v2, or pipelines explicitly enabling the upload_graph experiment, will fail at construction time (#32159).
  • Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue (#32169). The issue will be fixed in 2.59.0 (#32135). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.

For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

List of Contributors

According to git shortlog, the following people contributed to the 2.58.1 release. Thank you to all contributors!

Danny McCormick

Sam Whittle

Beam 2.58.0 release

06 Aug 13:49

We are happy to present the new 2.58.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information about changes in 2.58.0, check out the detailed release notes.

I/Os

  • Support for Solace source (SolaceIO.Read) added (Java) (#31440).

New Features / Improvements

  • Multiple RunInference instances can now share the same model instance by setting the model_identifier parameter (Python) (#31665).
  • Added options to control the number of Storage API multiplexing connections (#31721)
  • [BigQueryIO] Better handling for batch Storage Write API when it hits AppendRows throughput quota (#31837)
  • [IcebergIO] All specified catalog properties are passed through to the connector (#31726)
  • Removed a third-party LGPL dependency from the Go SDK (#31765).
  • Support for MapState and SetState when using Dataflow Runner v1 with Streaming Engine (Java) (#18200)
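
At its core, sharing one model across several RunInference instances via model_identifier is a keyed, load-once cache. The sketch below is plain Python and shows that idea within a single process. The function name and the identifier string are hypothetical. This is not how RunInference implements sharing, which may work differently and across processes.

```python
import threading
from typing import Callable, Dict, List

_shared_models: Dict[str, object] = {}
_lock = threading.Lock()


def get_shared_model(model_identifier: str, loader: Callable[[], object]) -> object:
    """Return a single model instance per identifier, calling loader at most once."""
    with _lock:
        if model_identifier not in _shared_models:
            _shared_models[model_identifier] = loader()
        return _shared_models[model_identifier]


load_calls: List[int] = []


def load_model() -> object:
    load_calls.append(1)
    return object()  # placeholder for an expensive model


first = get_shared_model("sentiment-v1", load_model)
second = get_shared_model("sentiment-v1", load_model)
print(first is second, len(load_calls))  # True 1
```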

Breaking Changes

  • [IcebergIO] IcebergCatalogConfig was changed to support specifying catalog properties in a key-store fashion (#31726)
  • [SpannerIO] Added validation that query and table cannot be specified at the same time for SpannerIO.read(). Previously, withQuery overrode withTable if both were set (#24956).

Bug fixes

  • [BigQueryIO] Fixed a bug in batch Storage Write API that frequently exhausted concurrent connections quota (#31710)

List of Contributors

According to git shortlog, the following people contributed to the 2.58.0 release. Thank you to all contributors!

Ahmed Abualsaud

Ahmet Altay

Alexandre Moueddene

Alexey Romanenko

Andrew Crites

Bartosz Zablocki

Celeste Zeng

Chamikara Jayalath

Clay Johnson

Damon Douglass

Danny McCormick

Dilnaz Amanzholova

Florian Bernard

Francis O'Hara

George Ma

Israel Herraiz

Jack McCluskey

Jaehyeon Kim

James Roseman

Kenneth Knowles

Maciej Szwaja

Michel Davit

Minh Son Nguyen

Naireen

Niel Markwick

Oliver Cardoza

Robert Bradshaw

Robert Burke

Rohit Sinha

S. Veyrié

Sam Whittle

Shunping Huang

Svetak Sundhar

TongruiLi

Tony Tang

Valentyn Tymofieiev

Vitaly Terentyev

Yi Hu

Beam 2.57.0 release

26 Jun 20:00

We are happy to present the new 2.57.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.57.0, check out the detailed release notes.

Highlights

  • Apache Beam adds Python 3.12 support (#29149).
  • Added FlinkRunner for Flink 1.18 (#30789).

I/Os

  • Ensure that BigtableIO closes the reader streams (#31477).

New Features / Improvements

  • Added Feast feature store handler for enrichment transform (Python) (#30957).
  • BigQuery per-worker metrics are reported by default for Streaming Dataflow Jobs (Java) (#31015)
  • Added inMemory() variants of Java List and Map side inputs for more efficient lookups when the entire side input fits into memory.
  • Beam YAML now supports the Jinja templating syntax.
    Template variables can be passed with the (json-formatted) --jinja_variables flag.
  • DataFrame API now supports pandas 2.1.x and adds 12 more string functions for Series (#31185).
  • Added BigQuery handler for enrichment transform (Python) (#31295)
  • Disable soft delete policy when creating the default bucket for a project (Java) (#31324).
  • Added DoFn.SetupContextParam and DoFn.BundleContextParam which can be used
    as a python DoFn.process, Map, or FlatMap parameter to invoke a context
    manager per DoFn setup or bundle (analogous to using setup/teardown
    or start_bundle/finish_bundle respectively.)
  • Go SDK Prism Runner
    • Pre-built Prism binaries are now part of the release and are available via the GitHub release page (#29697).
    • Some Java and Python pipelines will work on Prism; this is partly in preparation for real runner wrappers in 2.58.0.
    • ProcessingTime is now handled synthetically with TestStream pipelines and Non-TestStream pipelines, for fast test pipeline execution by default. (#30083).
      • Prism does NOT yet support "real time" execution for this release.
  • Improved processing of large elements to reduce the chance of exceeding the 2GB protobuf limit (Python) (#31607).
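
DoFn.SetupContextParam and DoFn.BundleContextParam tie a context manager's lifetime to DoFn setup/teardown or to a bundle. The sketch below does not use Beam at all. It is a hypothetical, plain-Python model of the setup-scoped lifecycle, built on contextlib.ExitStack: the context manager is entered once in setup, reused for every element, and exited in teardown. Class and function names are illustrative.

```python
import contextlib
from typing import Callable, ContextManager, List


class SetupScopedResource:
    """Model of a DoFn-like object holding one context-managed resource per setup."""

    def __init__(self, factory: Callable[[], ContextManager[str]]) -> None:
        self._factory = factory
        self._stack = contextlib.ExitStack()
        self.resource = None

    def setup(self) -> None:
        # Enter the context manager once; it stays open across many elements.
        self.resource = self._stack.enter_context(self._factory())

    def process(self, element: int) -> str:
        return f"{self.resource}:{element}"

    def teardown(self) -> None:
        # Exit the context manager (running its cleanup) exactly once.
        self._stack.close()
        self.resource = None


events: List[str] = []


@contextlib.contextmanager
def fake_connection():
    events.append("open")
    try:
        yield "conn"
    finally:
        events.append("close")


fn = SetupScopedResource(fake_connection)
fn.setup()
outputs = [fn.process(x) for x in (1, 2, 3)]
fn.teardown()
print(outputs, events)  # ['conn:1', 'conn:2', 'conn:3'] ['open', 'close']
```

A bundle-scoped variant would perform the same enter and exit in start_bundle and finish_bundle instead.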

Breaking Changes

  • Java's View.asList() side inputs are now optimized for iterating rather than
    indexing when in the global window.
    This new implementation still supports all (immutable) List methods as before,
    but some of the random access methods like get() and size() will be slower.
    To use the old implementation one can use View.asList().withRandomAccess().
  • SchemaTransforms implemented with TypedSchemaTransformProvider now produce a
    configuration Schema that uses the snake_case naming convention
    (#31374). This makes the following
    cases problematic:
    • Running a pre-2.57.0 remote SDK pipeline containing a 2.57.0+ Java SchemaTransform,
      and vice versa:
      • Running a 2.57.0+ remote SDK pipeline containing a pre-2.57.0 Java SchemaTransform
    • All direct uses of Python's SchemaAwareExternalTransform
      should be updated to use the new snake_case parameter names.
  • Upgraded Jackson Databind to 2.15.4 (Java) (#26743).
    jackson-2.15 has known breaking changes. An important one is that it imposes a buffer limit on the parser.
    If your custom PTransforms/DoFns are affected, refer to #31580 for mitigation.
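
After this change, callers that construct SchemaAwareExternalTransform directly need snake_case parameter names. When migrating many keyword arguments, a small helper like the hypothetical one below can translate camelCase names. It is a plain-Python sketch, and the parameter names in the example are made up. The exact conversion Beam applies may differ for edge cases such as acronyms or digits, so check the transform's configuration schema.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a camelCase identifier to snake_case, e.g. 'bootstrapServers' -> 'bootstrap_servers'."""
    # Underscore before an uppercase letter that follows a lowercase letter or digit.
    name = re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name)
    # Underscore between an acronym and a following capitalized word ("URLPath" -> "URL_Path").
    name = re.sub(r"(?<=[A-Z])([A-Z][a-z])", r"_\1", name)
    return name.lower()


old_kwargs = {"bootstrapServers": "localhost:9092", "maxNumRecords": 10}
new_kwargs = {camel_to_snake(k): v for k, v in old_kwargs.items()}
print(new_kwargs)  # {'bootstrap_servers': 'localhost:9092', 'max_num_records': 10}
```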

For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

List of Contributors

According to git shortlog, the following people contributed to the 2.57.0 release. Thank you to all contributors!

Ahmed Abualsaud

Ahmet Altay

Alexey Romanenko

Andrey Devyatkin

Anody Zhang

Arvind Ram

Ben Konz

Bruno Volpato

Celeste Zeng

Chamikara Jayalath

Claire McGinty

Colm O hEigeartaigh

Damon

Danny McCormick

Evan Galpin

Ferran Fernández Garrido

Florent Biville

Jack Dingilian

Jack McCluskey

Jan Lukavský

JayajP

Jeff Kinard

Jeffrey Kinard

John Casey

Justin Uang

Kenneth Knowles

Kevin Zhou

Liam Miller-Cushon

Maarten Vercruysse

Maciej Szwaja

Maja Kontrec Rönn

Marc hurabielle

Martin Trieu

Mattie Fu

Min Zhu

Naireen Hussain

Nick Anikin

Pablo Rodriguez Defino

Paul King

Priyans Desai

Radosław Stankiewicz

Rebecca Szper

Ritesh Ghorse

Robert Bradshaw

Robert Burke

Rodrigo Bozzolo

RyuSA

Sam Rohde

Sam Whittle

Sergei Lilichenko

Shahar Epstein

Shunping Huang

Svetak Sundhar

Tomo Suzuki

Tony Tang

Valentyn Tymofieiev

Vincent Stollenwerk

Vineet Kumar

Vitaly Terentyev

Vlado Djerek

XQ Hu

Yi Hu

akashorabek

bzablocki

kberezin

Beam 2.56.0 release

02 May 01:14

We are happy to present the new 2.56.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.56.0, check out the detailed release notes.

Highlights

  • Added FlinkRunner for Flink 1.17 and removed support for Flink 1.12 and 1.13. Pipelines running on Flink 1.16 and below can be upgraded to Flink 1.17 if they are first updated to Beam 2.56.0 with the same Flink version. Once the pipeline runs with Beam 2.56.0, it should be possible to upgrade to the FlinkRunner for Flink 1.17 (#29939).
  • New Managed I/O Java API (#30830).
  • New Ordered Processing PTransform added for processing order-sensitive stateful data (#30735).
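
Order-sensitive stateful processing rests on one idea: buffer events that arrive early, and release them only once every earlier sequence number has been seen. Below is a minimal plain-Python sketch. It assumes that events for one key carry contiguous integer sequence numbers starting at 1. It models only the buffering logic; it is not the new PTransform's API, and the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def process_in_order(events: Iterable[Tuple[int, str]], start: int = 1) -> List[str]:
    """Emit event payloads strictly in sequence-number order.

    Out-of-order arrivals are buffered (standing in for per-key state) until
    every earlier sequence number has been seen.
    """
    pending: Dict[int, str] = {}
    expected = start
    emitted: List[str] = []
    for seq, payload in events:
        if seq < expected or seq in pending:
            continue  # duplicate or already processed; drop it
        pending[seq] = payload
        while expected in pending:
            emitted.append(pending.pop(expected))
            expected += 1
    return emitted


print(process_in_order([(2, "b"), (1, "a"), (4, "d"), (3, "c")]))  # ['a', 'b', 'c', 'd']
print(process_in_order([(1, "a"), (3, "c")]))  # ['a']  (3 waits for 2)
```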

I/Os

  • Upgraded Avro version to 1.11.3, kafka-avro-serializer and kafka-schema-registry-client versions to 7.6.0 (Java) (#30638).
    The newer Avro package is known to have breaking changes. If you are affected, you can stay pinned to older Avro versions, which are also tested with Beam.
  • Iceberg read/write support is available through the new Managed I/O Java API (#30830).

New Features / Improvements

  • Profiling of Cythonized code has been disabled by default. This might improve performance for some Python pipelines (#30938).
  • Bigtable enrichment handler now accepts a custom function to build a composite row key. (Python) (#30974).

Breaking Changes

  • Default consumer polling timeout for KafkaIO.Read was increased from 1 second to 2 seconds. Use KafkaIO.read().withConsumerPollingTimeout(Duration duration) to configure this timeout value when necessary (#30870).
  • Python Dataflow users no longer need to manually specify --streaming for pipelines using unbounded sources such as ReadFromPubSub.

Bugfixes

  • Fixed locking issue when shutting down inactive bundle processors. Symptoms of this issue include slowness or stuckness in long-running jobs (Python) (#30679).
  • Fixed a logging issue that silenced the pip output when installing dependencies provided in --requirements_file (Python).

List of Contributors

According to git shortlog, the following people contributed to the 2.56.0 release. Thank you to all contributors!

Abacn

Ahmed Abualsaud

Andrei Gurau

Andrey Devyatkin

Aravind Pedapudi

Arun Pandian

Arvind Ram

Bartosz Zablocki

Brachi Packter

Byron Ellis

Chamikara Jayalath

Clement DAL PALU

Damon

Danny McCormick

Daria Bezkorovaina

Dip Patel

Evan Burrell

Hai Joey Tran

Jack McCluskey

Jan Lukavský

JayajP

Jeff Kinard

Julien Tournay

Kenneth Knowles

Luís Bianchin

Maciej Szwaja

Melody Shen

Oleh Borysevych

Pablo Estrada

Rebecca Szper

Ritesh Ghorse

Robert Bradshaw

Sam Whittle

Sergei Lilichenko

Shahar Epstein

Shunping Huang

Svetak Sundhar

Timothy Itodo

Veronica Wasson

Vitaly Terentyev

Vlado Djerek

Yi Hu

akashorabek

bzablocki

clmccart

damccorm

dependabot[bot]

dmitryor

github-actions[bot]

liferoad

martin trieu

tvalentyn

xianhualiu

Beam 2.55.1 release

08 Apr 13:09

Bugfixes

  • Fixed issue that broke WriteToJson in languages other than Java (X-lang) (#30776).

Beam 2.55.0 release

25 Mar 19:54

We are happy to present the new 2.55.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.55.0, check out the detailed release notes.

Highlights

  • The Python SDK will now include automatically generated wrappers for external Java transforms! (#29834)

I/Os

  • Added support for handling bad records to BigQueryIO (#30081).
    • Full Support for Storage Read and Write APIs
    • Partial Support for File Loads (Failures writing to files supported, failures loading files to BQ unsupported)
    • No Support for Extract or Streaming Inserts
  • Added support for handling bad records to PubSubIO (#30372).
    • Support is not available for handling schema mismatches, and enabling error handling for writing to Pub/Sub topics with schemas is not recommended
  • --enableBundling pipeline option for BigQueryIO DIRECT_READ is replaced by --enableStorageReadApiV2. Both were considered experimental and subject to change (Java) (#26354).
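
Bad-record handling follows the familiar dead-letter pattern. Records that fail processing are routed, together with the error, to a separate output instead of failing the pipeline. The plain-Python sketch below illustrates only the pattern, and the function name is hypothetical. It is not the BigQueryIO or PubSubIO error-handling API.

```python
import json
from typing import Iterable, List, Tuple


def parse_with_dead_letter(lines: Iterable[str]) -> Tuple[List[dict], List[Tuple[str, str]]]:
    """Split input into parsed records and (raw_input, error_message) bad records."""
    good: List[dict] = []
    bad: List[Tuple[str, str]] = []
    for line in lines:
        try:
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError("expected a JSON object")
            good.append(record)
        except ValueError as err:  # json.JSONDecodeError is a subclass of ValueError
            bad.append((line, str(err)))
    return good, bad


good, bad = parse_with_dead_letter(['{"a": 1}', "not json", "[1, 2]"])
print(good)                      # [{'a': 1}]
print([raw for raw, _ in bad])   # ['not json', '[1, 2]']
```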

New Features / Improvements

  • Allow writing clustered and not time-partitioned BigQuery tables (Java) (#30094).
  • Redis cache support added to RequestResponseIO and Enrichment transform (Python) (#30307)
  • Merged sdks/java/fn-execution and runners/core-construction-java into the main SDK. These artifacts were never meant for users, but noting
    that they no longer exist. These are steps to bring portability into the core SDK alongside all other core functionality.
  • Added Vertex AI Feature Store handler for Enrichment transform (Python) (#30388)

Breaking Changes

  • Arrow version was bumped to 15.0.0 from 5.0.0 (#30181).
  • Go SDK users who build custom worker containers may run into issues with the move to distroless containers as a base (see Security Fixes).
  • The Python SDK has changed the default value for the --max_cache_memory_usage_mb pipeline option from 100 to 0. This option was first introduced in the 2.52.0 SDK version. This change restores the behavior of the 2.51.0 SDK, which does not use the state cache. If your pipeline uses iterable side input views, consider increasing the cache size by setting the option manually (#30360).

Deprecations

  • N/A

Bug fixes

  • Fixed SpannerIO.readChangeStream to support propagating credentials from pipeline options
    to the getDialect calls for authenticating with Spanner (Java) (#30361).
  • Reduced the number of HTTP requests in GCSIO function calls (Python) (#30205)

Security Fixes

  • Go SDK base container image moved to distroless/base-nossl-debian12, reducing vulnerable container surface to kernel and glibc (#30011).

Known Issues

  • In Python pipelines, when shutting down inactive bundle processors, shutdown logic can overaggressively hold the lock, blocking acceptance of new work. Symptoms of this issue include slowness or stuckness in long-running jobs. Fixed in 2.56.0 (#30679).

List of Contributors

According to git shortlog, the following people contributed to the 2.55.0 release. Thank you to all contributors!

Ahmed Abualsaud

Anand Inguva

Andrew Crites

Andrey Devyatkin

Arun Pandian

Arvind Ram

Chamikara Jayalath

Chris Gray

Claire McGinty

Damon Douglas

Dan Ellis

Danny McCormick

Daria Bezkorovaina

Dima I

Edward Cui

Ferran Fernández Garrido

GStravinsky

Jan Lukavský

Jason Mitchell

JayajP

Jeff Kinard

Jeffrey Kinard

Kenneth Knowles

Mattie Fu

Michel Davit

Oleh Borysevych

Ritesh Ghorse

Ritesh Tarway

Robert Bradshaw

Robert Burke

Sam Whittle

Scott Strong

Shunping Huang

Steven van Rossum

Svetak Sundhar

Talat UYARER

Ukjae Jeong (Jay)

Vitaly Terentyev

Vlado Djerek

Yi Hu

akashorabek

case-k

clmccart

dengwe1

dhruvdua

hardshah

johnjcasey

liferoad

martin trieu

tvalentyn