Releases · lancedb/lance

06 Feb 04:52

changhiskhan

v0.3.1

38ea579

v0.3.1 Index creation tool

We added an index creation tool that's 2x faster than FAISS.
It is accessible in Python via Dataset.create_index.
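These notes do not describe the internals, but the changes below (#519, #525, #526) indicate that IVF_PQ index creation relies on k-means. Here is a minimal pure-Python sketch of the IVF half of that process, offered only as a rough illustration. It assumes squared L2 distance, plain Lloyd's k-means with a fixed iteration count, and random initialization. It is not Lance's Rust implementation, and the helper names (`l2_sq`, `kmeans`, `build_ivf`) are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2_sq(a: Vector, b: Vector) -> float:
    """Squared L2 distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids: List[List[float]], v: Vector) -> int:
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: l2_sq(centroids[i], v))


def kmeans(vectors: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Plain Lloyd's k-means: start from k random vectors, then alternate assign/update."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        # Assignment step: bucket each vector under its nearest centroid.
        buckets: List[List[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            buckets[nearest(centroids, v)].append(v)
        # Update step: move each centroid to the mean of its bucket
        # (an empty bucket keeps its previous centroid).
        for i, bucket in enumerate(buckets):
            if bucket:
                dim = len(bucket[0])
                centroids[i] = [sum(v[d] for v in bucket) / len(bucket) for d in range(dim)]
    return centroids


def build_ivf(vectors: List[Vector], num_partitions: int) -> Tuple[List[List[float]], Dict[int, List[int]]]:
    """Train centroids, then map each partition id to the row ids assigned to it."""
    centroids = kmeans(vectors, num_partitions)
    partitions: Dict[int, List[int]] = {i: [] for i in range(num_partitions)}
    for row_id, v in enumerate(vectors):
        partitions[nearest(centroids, v)].append(row_id)
    return centroids, partitions
```

For example, `build_ivf([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]], num_partitions=2)` should place rows 0 and 1 in one partition and rows 2 and 3 in the other. The PQ half (compressing each vector into short codes) is left out to keep the sketch short.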

What's Changed

Add unit test for Dataset::take_rows by @eddyxu in #523
Create index API from Python by @eddyxu in #524
Implement a kmean optimized for Arrow-backed vectors in pure rust. by @eddyxu in #525
Reimplement IVF_PQ index. by @eddyxu in #519
Use pure rust kmean in IVF and PQ by @eddyxu in #526
update release actions by @changhiskhan in #527
[rust] make scanning in order configurable by @changhiskhan in #528

Full Changelog: v0.3.0...v0.3.1

Contributors

eddyxu and changhiskhan

02 Feb 23:13

changhiskhan

v0.3.0

ba87905

v0.3.0 Rusty Lances and Friendly Neighbors

Sayonara C++, bonjour Rust

What started out as a holiday hack has become a full-blown Rust rewrite.
As we say farewell to our much beloved C++ implementation, we welcome a major new feature to Lance: the vector index.

Lance's vector index is fast and has a small memory footprint. Reading from disk, we benchmarked average latencies of 1 ms for 1M vectors on a vanilla MacBook Air.
Your data, vectors, and index can live in harmony under one roof so you don't need to manage a separate index or service.
You can also store and retrieve additional features alongside the vectors with very little performance impact.
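The changes below add an `nprobes` parameter (#480) and a refine stage with a refine factor (#488, #489). The following hypothetical sketch shows how such parameters typically shape an IVF search. It probes only the `nprobes` partitions whose centroids are nearest to the query and ranks the candidates by an approximate distance. When a refine factor is set, it re-ranks the top `k * refine_factor` candidates with exact distances. The exact semantics of both parameters in Lance are assumed here. Rounding each coordinate to the nearest integer stands in for PQ codes. This is plain Python for illustration, not Lance's implementation.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def l2_sq(a: Vector, b: Vector) -> float:
    """Squared L2 distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(v: Vector) -> List[float]:
    """Crude stand-in for PQ codes (assumption): round each coordinate to an integer."""
    return [float(round(x)) for x in v]


def ivf_search(
    query: Vector,
    k: int,
    vectors: List[Vector],
    centroids: List[Vector],
    partitions: Dict[int, List[int]],
    nprobes: int = 1,
    refine_factor: Optional[int] = None,
) -> List[Tuple[int, float]]:
    """Return up to k (row_id, distance) pairs, nearest first."""
    # 1. Probe only the nprobes partitions whose centroids are closest to the query.
    probed = sorted(range(len(centroids)), key=lambda p: l2_sq(centroids[p], query))[:nprobes]
    candidates = [row for p in probed for row in partitions[p]]
    # 2. Rank candidates by the approximate (quantized) distance.
    approx = sorted(candidates, key=lambda r: l2_sq(quantize(vectors[r]), query))
    if refine_factor is None:
        # No refine stage: return the approximate ranking and distances as-is.
        return [(r, l2_sq(quantize(vectors[r]), query)) for r in approx[:k]]
    # 3. Refine: re-rank the top k * refine_factor candidates by exact distance.
    pool = approx[: k * refine_factor]
    exact = sorted(((r, l2_sq(vectors[r], query)) for r in pool), key=lambda t: t[1])
    return exact[:k]
```

With a query of `[0.6, 0.0]` and rows `[0.9, 0.0]` and `[0.4, 0.0]` in the probed partition, the approximate ranking prefers the first row, while `refine_factor=2` recovers the true nearest neighbor, the second row. Refining trades extra exact distance computations for better recall; raising `nprobes` likewise scans more partitions in exchange for recall.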

What's Changed

Only increase cursor if file success to write by @eddyxu in #435
GHA to add python 3.11 (and upgrade to duckdb 0.6.1) by @changhiskhan in #434
ScannerStream accepts early stop by @eddyxu in #437
upgrade arrow-rs to 31.0 by @eddyxu in #438
L2 distance by @eddyxu in #439
Create DataFragment and DataFile during Dataset write process by @eddyxu in #440
Rust Dataset Write API by @eddyxu in #441
[Rust] Read Partially from a plain encoded batch by @eddyxu in #443
Get range in var-binary encoding by @eddyxu in #444
Productionlize Flat Index by @eddyxu in #442
Make Scan an ExecNode by @eddyxu in #445
Take record by Row ID by @eddyxu in #446
Implement Take for dictionary decoder. by @eddyxu in #447
Merge two RecordBatch by @eddyxu in #449
Integrate flat index by @eddyxu in #448
Support limit offset as ExecNode by @changhiskhan in #450
Read IVF_PQ index by @eddyxu in #451
Cli to operate on dataset indices by @eddyxu in #452
[RUST] python (re)integration v1 by @changhiskhan in #436
Support writing dictionary values (at the dataset level). by @eddyxu in #454
Replace ObjectReader as a pub trait. by @eddyxu in #459
[Rust] Implement LocalObjectReader that holds an open file to improve performance. by @eddyxu in #460
inherit from pyarrow Dataset/Scanner by @changhiskhan in #462
[RUST] Flat index benchmark by @eddyxu in #461
Generate spotify dataset with embeddings. by @eddyxu in #453
Fix pylance typo and float32 array conversion. by @eddyxu in #463
Write index metadata with a new version by @eddyxu in #466
[rust] fix projection in Dataset:take_rows by @changhiskhan in #464
blas feature flag by @changhiskhan in #467
Sift dataset generation by @eddyxu in #472
Improve scan perf by re-enable prefetching in ScanNode by @eddyxu in #473
Changhiskhan/new docs by @changhiskhan in #474
Fix AVX and NEON L2 distance computation. by @eddyxu in #476
add recall metric computation by @changhiskhan in #475
Fix reader assertion on manifest buffer size by @eddyxu in #478
[Rust] Minimal dataset append support by @eddyxu in #482
Pass nprobes parameter from python by @changhiskhan in #480
add a test_dataset function to compute the recall for lance by @changhiskhan in #479
Split sparse index read into chunks based on optimal I/O size for the media by @eddyxu in #483
Fix codespace prebuild by @eddyxu in #485
Make ObjectReader prefetch size configurable by @eddyxu in #486
Add a refine stage for vector search by @eddyxu in #488
add nprobes as parameter to benchmark by @changhiskhan in #484
refine factor by @changhiskhan in #489
Use ordered buffer in plain decoder by @eddyxu in #493
New rust+pyo3 based pylance by @eddyxu in #494
Fast count rows by @eddyxu in #490
Count rows in python dataset, and setup GHA again by @eddyxu in #495
Sayonara C++ by @eddyxu in #497
[Rust] Dataset Overwrite, and Version Checkout by @eddyxu in #496
Load S3 credentials using default credentials chain by @eddyxu in #498
Fix doc build by @eddyxu in #499
File format spec by @eddyxu in #500
Doc build fix by @eddyxu in #501
Schema evolution document by @eddyxu in #503
update the python readme for pypi by @changhiskhan in #504
Handle null strings for both cases where nullability is set or not. by @eddyxu in #509
update main github readme by @changhiskhan in #508
[python] write_dataset returns new dataset by @changhiskhan in #517
Changhiskhan/list versions by @changhiskhan in #516
Refine Factor is None by default by @eddyxu in #518

Full Changelog: v0.2.9...v0.3.0

Contributors

eddyxu and changhiskhan

16 Jan 00:24

changhiskhan

v0.2.9

434ad65

v0.2.9 pandas extension type for inline images

We've also started to implement Lance in Rust. A new kickass vector indexing feature is coming soon, once we finish some cleanup and hook the Rust module back into Python.

What's Changed

[DuckDB] Add macro to check window size by @eddyxu in #395
[pandas] Add pandas extension type for ImageBinary by @changhiskhan in #398
python 3.11 is updating and causing error by @changhiskhan in #397
[RUST] Initialize read support in Rust. by @eddyxu in #401
Add missing logical type conversions by @eddyxu in #404
[RUST] Schema projection by @eddyxu in #403
[RUST] Data file reader by @eddyxu in #402
[Rust] Decoder for dictionary encoding by @eddyxu in #406
[Rust] Support full scan for BooleanArray by @changhiskhan in #407
[Rust] Basic reading support for nested fields. by @eddyxu in #408
Add unit tests for all supported primitive types by @changhiskhan in #409
[RUST] Binary encoder and null support. by @eddyxu in #411
[Rust] Fix Cargo publish by @eddyxu in #410
[RUST] Large binary support by @eddyxu in #412
Add support for fixed size list by @changhiskhan in #413
Jaichopra/nuscenes converter by @jaichopra in #364
Add Support for Fixed Size Binary Full scan by @changhiskhan in #414
Bare minimal scanner in Rust by @eddyxu in #415
Set field IDs. by @eddyxu in #417
[Rust] Read/Write Protobuf-backed struct directly from file or buffers. by @eddyxu in #418
[Rust] Lance File Writer by @eddyxu in #419
[Rust] Write dictionary data by @eddyxu in #420
[RUST] Write List/LargeList/FixedSizeList/FixedSizeBinary by @eddyxu in #421
fix byte range and iterator bug by @changhiskhan in #422
Fix dict order in logical type to be consistent with C++ by @eddyxu in #425
Limits notebook GHA to only run when C++ / Python changes. by @eddyxu in #427
Implement futures::Stream for Scanner by @eddyxu in #426
Append column to RecordBatch by @eddyxu in #429
[Rust] Read batch with rowid as a meta column. by @eddyxu in #430
[RUST] argmin and argmax kernel for numeric array by @eddyxu in #432

Full Changelog: v0.2.8...v0.2.9

Contributors

eddyxu, changhiskhan, and jaichopra

24 Dec 03:37

changhiskhan

v0.2.8

b8a949b

v0.2.8 Happy Holidays!

This release contains the following:

A full-fledged ML data quality improvement workflow using Lance that surfaces model performance insights, detects mislabeled samples, and performs active learning. An experimental integration with Label Studio is demonstrated as well.
Critical bug fix affecting read/write of dictionary columns
ImageNet dataset converter

What's Changed

[BUG] Fix reading version aux data reading and writing by @eddyxu in #384
[Benchmark] upload scripts for coco / imagenet benchmark dataset by @eddyxu in #385
Closes #387 by @changhiskhan in #388
Data quality notebook and associated code by @changhiskhan in #389
[DUCKDB] Do not build PyTorch by default by @eddyxu in #392
brew pin python by @changhiskhan in #391
fix off by one error using negative indices for diff'ing by @changhiskhan in #383
Fix GHA for duckdb extension by @changhiskhan in #394
[DUCKDB] Add a Derivative macro by @eddyxu in #393
[Benchmark] Create imagenet from raw dataset by @eddyxu in #386
Various fixes for imagenet and fmt changes by @changhiskhan in #396

Full Changelog: v0.2.7...v0.2.8

Contributors

eddyxu and changhiskhan

19 Dec 00:48

eddyxu

v0.2.7

173ac9d

v0.2.7 Dataset Diff and Metrics computation, and Dataset Version Metadata

What's Changed

create and update tarball for pets by @changhiskhan in #372
[C++] Sanity check to verify column does not overlap when merging a new table by @eddyxu in #375
update notebooks so s3 credentials are not required by @changhiskhan in #376
Add function to get version as of a certain date. Also formatting by @changhiskhan in #378
convenience for comparing metrics across versions by @changhiskhan in #379
Changhiskhan/datadiff by @changhiskhan in #380
Refactor dataset diff and compute metric by @changhiskhan in #381
[C++] Attach new schema update when update dataset by @eddyxu in #374

Full Changelog: v0.2.6...v0.2.7

Contributors

eddyxu and changhiskhan

13 Dec 02:41

eddyxu

v0.2.6

00102dc

v0.2.6 Schema evolution bug fixes, Google Colab support, and more datasets

What's Changed

[C++] Remove unused Reader APIs by @eddyxu in #344
[Python] fix timezone issue with version timestamp by @changhiskhan in #345
[C++] add Dataset::Make(string) API by @eddyxu in #346
[DUCKDB] Native duckdb lance reader by @eddyxu in #347
[DUCKDB] Read a special version of dataset by @eddyxu in #350
[DUCKDB] Fix duckdb manylinux build by @eddyxu in #351
[Python] Add colab badge to notebooks by @eddyxu in #354
[Notebook] ML dev cycle for DINO by @eddyxu in #355
[DUCKDB] fix type mapping for other int types by @changhiskhan in #359
[Python] Fix lance.dataset open local related path by @eddyxu in #365
[C++] Store relative path for data files by @eddyxu in #368
[C++] Add RAII util (defer) to auto cleanup / close resources after exiting the scope by @eddyxu in #369
[Python] Convert of ImageNet 1K into Lance dataset by @eddyxu in #366
[Python] Imagenet data quality analytics notebook by @eddyxu in #370

Full Changelog: v0.2.5...v0.2.6

Contributors

eddyxu and changhiskhan

02 Dec 06:15

eddyxu

v0.2.5

ceb65ae

v0.2.5 Schema evolution, support merging with arrow Table

What's Changed

[DOC] Fix notebook build by @eddyxu in #339
[Python] lance.write_dataset takes pandas DataFrame by @eddyxu in #342
[DOC] update readme docs to cater for import pathways from df/parquet by @jaichopra in #340
[Python] Improve PyTorch dataset ergonomic by @eddyxu in #336
[C++] Add columns from in-memory table by @eddyxu in #337
[Python] append column with a in-memory Pyarrow Table by @eddyxu in #338
[C++][Python] Add timestamp to each manifest version. by @eddyxu in #343

Full Changelog: v0.2.4...v0.2.5

Contributors

eddyxu and jaichopra

28 Nov 21:25

eddyxu

v0.2.4

b6ba75f

v0.2.4: Schema Evolution and Append Column

Support Schema Evolution via Append Column.

What's Changed

[Notebook] fixes for notebook backing the blog post by @changhiskhan in #316
[C++] Append column by @eddyxu in #299
[Python] Append columns by @eddyxu in #318
Use column projection during update by @eddyxu in #322
update to duckdb 0.6 by @changhiskhan in #312
[Python] Support add column via Expression. by @eddyxu in #324
[Python] Expose projection for append column by @eddyxu in #325
[C++] Support column projection during add_columns via expression by @eddyxu in #326
[Python] Pytorch Dataset uses Fragment instead of files and support versions by @eddyxu in #327
[C++] Move writer API a private API by @eddyxu in #329
[C++] Refectory Metadata class to eliminate protobuf reference. by @eddyxu in #328
[C++] Performance profiling and improvement by @eddyxu in #333
[C++] Upgrade lq cmd tool to be able to inspect new versioned format by @eddyxu in #334

Full Changelog: v0.2.3...v0.2.4

Contributors

eddyxu and changhiskhan

16 Nov 04:23

changhiskhan

v0.2.3

a55f929

v0.2.3 Bugfix release; breaks dataset proto schema

What's Changed

[C++] Project schema via field Ids and Schema intersection by @eddyxu in #305
when writing in batches, handle all na arrays properly by @changhiskhan in #306
[C++] Use LanceFragment to build I/O exec plan by @eddyxu in #307
[CI] Fix Github Action warning to upgrade nodejs 12 based actions by @eddyxu in #309
Update README.md by @changhiskhan in #310
Temporarily pin duckdb to 0.5.1 by @changhiskhan in #313
Notebook for new blog post on versioning by @changhiskhan in #311
[C++] Fix reading dictionary values from manifest files by @eddyxu in #314

Full Changelog: v0.2.2...v0.2.3

Contributors

eddyxu and changhiskhan

09 Nov 17:25

eddyxu

v0.2.2

8a9d736

v0.2.2 Python notebooks and CV dataset conversion.

What's Changed

[DOC] Update README.md by @jaichopra in #294
[DUCKDB] Script to upload lance extension zip by @changhiskhan in #295
[C++] Scan Node reads multiple files by @eddyxu in #300
[Python] Add lance.util.duckdb to help install the extension transparently by @changhiskhan in #301
[Python] Notebook fixes by @changhiskhan in #303
[Python] Make dataset conversion a feature by @changhiskhan in #304

Full Changelog: v0.2.1...v0.2.2

Contributors

eddyxu, changhiskhan, and jaichopra
