diff --git a/docs/autodoc/index.rst b/docs/autodoc/index.rst
index 42681c19e..da5e6d944 100644
--- a/docs/autodoc/index.rst
+++ b/docs/autodoc/index.rst
@@ -24,6 +24,7 @@ training frameworks.
    benchmarks_tutorial_include
    troubleshoot_include
    release_notes_include
+   migration_guide_include
 
 
 Indices and tables
diff --git a/docs/autodoc/migration_guide_include.rst b/docs/autodoc/migration_guide_include.rst
new file mode 100644
index 000000000..7f8c274ff
--- /dev/null
+++ b/docs/autodoc/migration_guide_include.rst
@@ -0,0 +1,3 @@
+Migration notes
+===============
+.. include:: ../migrating-0.5.0.rst
diff --git a/docs/migrating-0.5.0.rst b/docs/migrating-0.5.0.rst
new file mode 100644
index 000000000..a66a06432
--- /dev/null
+++ b/docs/migrating-0.5.0.rst
@@ -0,0 +1,45 @@
+.. inclusion-marker-start-do-not-remove
+
+==================
+To petastorm 0.5.0
+==================
+
+Petastorm 0.5.0 has some breaking changes from previous versions. These include:
+
+- Users should use :func:`~petastorm.reader.make_reader` instead of instantiating :class:`~petastorm.reader.Reader`
+  directly to create new instances.
+- It is still possible (although discouraged in most cases) to instantiate :class:`~petastorm.reader.Reader`. Some of
+  its arguments have changed.
+
+Use :func:`~petastorm.reader.make_reader` to instantiate a reader instance
+--------------------------------------------------------------------------
+
+Use :func:`~petastorm.reader.make_reader` to create a new instance of a reader. :func:`~petastorm.reader.make_reader`
+takes arguments that are almost identical to the constructor arguments of :class:`~petastorm.reader.Reader`. The following
+list enumerates the differences:
+
+- ``reader_pool``: takes one of the strings: ``'thread'``, ``'process'``, ``'dummy'``
+  (instead of ``ThreadPool()``, ``ProcessPool()`` and ``DummyPool()`` object instances). Pass the number of workers using
+  the ``workers_count`` argument.
+- ``training_partition`` and ``num_training_partitions`` were renamed to ``cur_shard`` and ``shard_count``.
+- ``shuffle`` and ``shuffle_options`` were replaced by ``shuffle_row_groups=True, shuffle_row_drop_partitions=1``.
+
+.. code-block:: python
+
+    from petastorm.reader import Reader
+    reader = Reader(dataset_url,
+                    reader_pool=ThreadPool(5),
+                    training_partition=1, num_training_partitions=5,
+                    shuffle_options=ShuffleOptions(shuffle_row_groups=False))
+
+To:
+
+.. code-block:: python
+
+    from petastorm import make_reader
+    reader = make_reader(dataset_url,
+                         reader_pool='thread',
+                         workers_count=5,
+                         cur_shard=1, shard_count=5,
+                         shuffle_row_groups=False)
+
diff --git a/docs/release-notes.rst b/docs/release-notes.rst
index 892b4bd1f..f96358ca8 100644
--- a/docs/release-notes.rst
+++ b/docs/release-notes.rst
@@ -4,6 +4,30 @@
 Release notes
 =============
 
+Release 0.5.0
+=============
+
+Breaking changes
+----------------
+- :func:`~petastorm.reader.make_reader` should be used to create a new reader instance.
+- It is still possible, but in most cases not recommended, to use :class:`~petastorm.reader.Reader` directly. Its constructor arguments
+  have changed:
+  -- ``training_partition`` and ``num_training_partitions`` were renamed to ``cur_shard`` and ``shard_count``.
+  -- ``shuffle`` and ``shuffle_options`` were replaced by ``shuffle_row_groups=True, shuffle_row_drop_partitions=1``
+  -- ``sequence`` argument was removed
+
+
+New features and bug fixes
+--------------------------
+- It is possible to read non-Petastorm Parquet datasets (created externally to Petastorm). Currently, most
+  scalar types are supported.
+- Support ``s3`` as the protocol in dataset URL strings (e.g. ``'s3://...'``)
+- PyTorch: support collating decimal scalars
+- PyTorch: promote integer types that are not supported by PyTorch to the next larger supported integer type
+  (e.g. int8 -> int16). Booleans are promoted to uint8.
+- Support running ``petastorm-generate-metadata.py`` on datasets created by Hive.
+- Fix incorrect dataset sharding when using Python 3.
+
 Release 0.4.3
 =============
 
diff --git a/petastorm/__init__.py b/petastorm/__init__.py
index 145161985..61aabaa8d 100644
--- a/petastorm/__init__.py
+++ b/petastorm/__init__.py
@@ -14,4 +14,4 @@
 
 from petastorm.reader import make_reader, make_batch_reader  # noqa: F401
 
-__version__ = '0.4.3'
+__version__ = '0.5.0rc0'
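
The argument renames listed in the migration guide can be sketched as a small keyword-translation helper. This is only an illustration under stated assumptions: ``translate_reader_kwargs`` is a hypothetical name and is not part of petastorm, it covers only the two renames the guide lists, and it rejects the replaced or removed arguments instead of guessing their new values. Converting ``reader_pool`` objects to the ``'thread'``/``'process'``/``'dummy'`` strings is left to the caller.

```python
# Hypothetical helper, not part of petastorm: translates pre-0.5.0 Reader
# keyword arguments into the names the migration guide gives for make_reader.

# Renames taken from the migration guide.
RENAMED = {
    'training_partition': 'cur_shard',
    'num_training_partitions': 'shard_count',
}

# Arguments the guide and release notes list as replaced or removed.
# They have no one-to-one successor, so they are flagged for manual attention.
NEEDS_MANUAL_MIGRATION = {'shuffle', 'shuffle_options', 'sequence'}


def translate_reader_kwargs(old_kwargs):
    """Return a new dict with renamed keys; raise on replaced/removed keys."""
    new_kwargs = {}
    for name, value in old_kwargs.items():
        if name in NEEDS_MANUAL_MIGRATION:
            raise ValueError(
                "'%s' has no direct equivalent; set shuffle_row_groups and "
                "shuffle_row_drop_partitions explicitly where relevant" % name)
        # Fall back to the original name when the argument was not renamed.
        new_kwargs[RENAMED.get(name, name)] = value
    return new_kwargs


if __name__ == '__main__':
    print(translate_reader_kwargs(
        {'training_partition': 1, 'num_training_partitions': 5}))
```

The translated dictionary could then be unpacked into a ``make_reader(dataset_url, **kwargs)`` call alongside ``reader_pool`` and ``workers_count``. Raising on ``shuffle``, ``shuffle_options`` and ``sequence`` is a deliberate choice here: silently dropping them could change shuffling behaviour without the caller noticing.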