Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Website] Version 15.0.0 blog post #464

Merged
merged 12 commits into from
Jan 22, 2024
188 changes: 188 additions & 0 deletions _posts/2024-01-10-15.0.0-release.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,188 @@
---
layout: post
title: "Apache Arrow 15.0.0 Release"
date: "2024-01-10 00:00:00"
author: pmc
categories: [release]
---
<!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
-->


The Apache Arrow team is pleased to announce the 15.0.0 release. This covers
over 3 months of development work and includes [**XXX resolved issues**][1]
from [**YYY distinct contributors**][2]. See the [Install Page](https://arrow.apache.org/install/)
to learn how to get the libraries for your platform.

The release notes below are not exhaustive and only expose selected highlights
of the release. Many other bugfixes and improvements have been made: we refer
you to the [complete changelog][3].

## Community

Since the 14.0.0 release, Curt Hagenlocher, Xuwei Fu, James Duong and Felipe Oliveira Carvalho
have been invited to be committers.
Jonathan Keane and Raúl Cumplido have joined the Project Management Committee (PMC).

As per our tradition of rotating the PMC chair once a year
Andy Grove was elected as the new PMC chair and VP.

Thanks for your contributions and participation in the project!

## C Data Interface notes

New format strings have been added for ListView, LargeListView, BinaryView and StringView array types.


## Arrow Flight RPC notes
raulcd marked this conversation as resolved.
Show resolved Hide resolved

Flight SQL is now considered stable (GH-39037). The Flight SQL specification was clarified regarding how the result set schema of a prepared statement is affected by bound parameters (GH-37061).

The JDBC Arrow Flight SQL driver now supports mTLS authentication (GH-38460) and bind parameters (GH-33475), follows the Flight RPC spec when fetching data (GH-34532), and can reuse credentials across metadata and data connections (GH-38576). On macOS it will also use the system keychain to be consistent with other platforms (GH-39014). Applications can also retrieve the underlying Flight RPC metadata from the JDBC driver (GH-38024, GH-38022).

## C++ notes
raulcd marked this conversation as resolved.
Show resolved Hide resolved

### Compute layer

### Datasets

### Gandiva

### Parquet

### Substrait

### Miscellaneous

## C# notes
raulcd marked this conversation as resolved.
Show resolved Hide resolved

## Go Notes
raulcd marked this conversation as resolved.
Show resolved Hide resolved

### Bug Fixes

#### Arrow

* Ensured reliability of AuthenticateBasicToken behind proxies ([GH-38198](https://github.com/apache/arrow/issues/38198))
* Ensured release callback is properly called on C Data imported arrays/batches ([GH-38281](https://github.com/apache/arrow/issues/38281))
* Fixed rounding errors in decimal256 string functions ([GH-38395](https://github.com/apache/arrow/issues/38395))
* Added `ValueLen` to Binary and String array interface ([GH-38458](https://github.com/apache/arrow/issues/38458))
* Fixed Decimal128 rounding issues ([GH-38477](https://github.com/apache/arrow/issues/38477))
* Fixed memory leak in IPC LZ4 decompressor ([GH-38728](https://github.com/apache/arrow/issues/38728))
* Addressed Data race in `GetToTimeFunc` for fixed timestamp data types ([GH-38795](https://github.com/apache/arrow/issues/38795))
* Fixed "index out of range" error for empty resultsets of FlightSQL driver ([GH-39238](https://github.com/apache/arrow/issues/39238))

#### Parquet

* Fixed issue with max definition levels when writing a Parquet file under certain circumstances ([GH-38503](https://github.com/apache/arrow/issues/38503))
* File writer now properly tracks the number of rows written beyond the last row group ([GH-38516](https://github.com/apache/arrow/issues/38516))

### Enhancements

#### Arrow

* Added an Avro OCF reader for converting Avro files directly to Arrow record batches ([GH-36760](https://github.com/apache/arrow/issues/36760))
* Added support for StringView ([GH-38718](https://github.com/apache/arrow/issues/38718)) and C Data ABI StringViews ([GH-39013](https://github.com/apache/arrow/issues/39013))
* GC Checks were enabled for CI running integration tests ([GH-38824](https://github.com/apache/arrow/issues/38824))

#### Parquet

* Implemented Float16 logical type for Parquet files ([GH-37582](https://github.com/apache/arrow/issues/37582))
* Added proper boolean RLE encoding/decoding ([GH-38462](https://github.com/apache/arrow/issues/38462))

### Bug Fixes

### Enhancements

## Java notes
raulcd marked this conversation as resolved.
Show resolved Hide resolved

**We expect a breaking change in the next release, Arrow 16.0.0.** Support for Java 9 modules is coming, but that will require changing the JVM flags used to launch your application (GH-38998). Arrow 15.0.0 is not affected.

A bill-of-materials (BOM) package was added to make it easier to depend on multiple Arrow libraries (GH-38264).

The JDBC adapter (separate from the JDBC driver) now supports 256-bit decimals (GH-39484) and throws more informative exceptions (GH-39355).

Various improvements were made to utilities for working with vectors (GH-38662, GH-38614, GH-38511, GH-38254, GH-38246).

## JavaScript notes
raulcd marked this conversation as resolved.
Show resolved Hide resolved

This release comes with new features and APIs. We also removed `getByteLength` to reduce bundle sizes.

New Features with API changes
* [GH-39017: [JS] Add typeId as attribute](https://github.com/apache/arrow/pull/39018)
* [GH-39257: [JS] LargeBinary](https://github.com/apache/arrow/pull/39258)
* [GH-15060: [JS] Add LargeUtf8 type](https://github.com/apache/arrow/pull/35780)
* [GH-39259: [JS] Remove getByteLength](https://github.com/apache/arrow/pull/39260)
* [GH-39435: [JS] Add Vector.nullable](https://github.com/apache/arrow/pull/39436)
* [GH-39255: [JS] Allow customization of schema when passing vectors to table constructor](https://github.com/apache/arrow/pull/39256)
* [GH-37983: [JS] Allow nullable fields in table when constructed from vector with nulls](https://github.com/apache/arrow/pull/39254)

Package changes
* [GH-39289: [JS] Add types to exports](https://github.com/apache/arrow/pull/39475)

## Python notes
raulcd marked this conversation as resolved.
Show resolved Hide resolved

Compatibility notes:
* Legacy `ParquetDataset` custom implementation has been removed and only the new dataset API is now in use [GH-31303](https://github.com/apache/arrow/issues/31303).

New features:
* PyArrow version 14.0.0 included a new specification for Arrow PyCapsules and related dunder methods [GH-35531](https://github.com/apache/arrow/pull/37797) and now a public `RecordBatchReader` constructor from stream object implementing the PyCapsule Protocol has been added [GH-[39217](https://github.com/apache/arrow/issues/39217)](https://github.com/apache/arrow/issues/39217) together with some additional documentation [GH-[39196](https://github.com/apache/arrow/issues/39196)](https://github.com/apache/arrow/issues/39196).
* DLPack protocol support (producer) was added to the Arrow C++ and is exposed in Python through `__dlpack__` and `__dlpack_device__` dunder methods [GH-33984](https://github.com/apache/arrow/issues/33984).
* Python now exposes enabling CRC checksum for read and write operations in Paquet [GH-37242](https://github.com/apache/arrow/issues/37242). CRC checksum are optional and can detect data corruption.
* `CacheOptions` are now configurable from Python as part of the `pyarrow.dataset.ParquetFragmentScanOptions` [GH-36441](https://github.com/apache/arrow/issues/36441).
* Parquet metadata to indicate sort order of the data are now exposed in `RowGroupMetaData` [GH-35331](https://github.com/apache/arrow/issues/35331).

Other improvements:
* Append parameter from `FileOutputStream` is exposed for the `OSFile` class [GH-38857](https://github.com/apache/arrow/issues/38857).
* File size can be passed to `make_fragment` in the pyarrow datasets (`pyarrow.dataset.FileFormat`and `pyarrow.dataset.ParquetFileFormat`) [GH-37857](https://github.com/apache/arrow/issues/37857).
* Support for mask parameter is added to `FixedSizeListArray.from_arrays` [GH-34316](https://github.com/apache/arrow/issues/34316)
* `to/from_struct_array` are added to the `pyarrow.Table` class [GH-33500](https://github.com/apache/arrow/issues/33500).
* GIL is released in `.nbytes` which is improving performance when calculating the data size [GH-39096](https://github.com/apache/arrow/issues/39096).
* Usage of pandas internals `DatetimeTZBlock` has been removed [GH-38341](https://github.com/apache/arrow/issues/38341).
* `DataType` instance can be passed to `MapType.from_arrays` constructor [GH-39515](https://github.com/apache/arrow/issues/39515).

Relevant bug fixes:
* `S3FileSystem` equals `None` segfault has been fixed [GH-38535](https://github.com/apache/arrow/issues/38535).
* No-op kernel is added for `dictionary_encode(dictionary)` [GH-34890](https://github.com/apache/arrow/issues/34890).
* PrettyPrint for Timestamp type now adds "Z" at the end of the print string when tz is defined in order to add minimum information about the values being stored in UTC [GH-30117](https://github.com/apache/arrow/issues/30117).

## R notes
raulcd marked this conversation as resolved.
Show resolved Hide resolved

raulcd marked this conversation as resolved.
Show resolved Hide resolved
For more on what’s in the 15.0.0 R package, see the [R changelog][4].

## Ruby and C GLib notes
raulcd marked this conversation as resolved.
Show resolved Hide resolved

### Ruby

- Add `Arrow::Table#each_raw_record` and `Arrow::RecordBatch#each_raw_record` ([GH-37137](https://github.com/apache/arrow/issues/37137), [GH-37600](https://github.com/apache/arrow/issues/37600))

### C GLib

- Follow C++ changes.

## Rust notes

The Rust projects have moved to separate repositories outside the
main Arrow monorepo. For notes on the latest release of the Rust
implementation, see the latest [Arrow Rust changelog][5].

[1]: https://github.com/apache/arrow/milestone/56?closed=1
[2]: {{ site.baseurl }}/release/15.0.0.html#contributors
[3]: {{ site.baseurl }}/release/15.0.0.html#changelog
[4]: {{ site.baseurl }}/docs/r/news/
[5]: https://github.com/apache/arrow-rs/tags