Merge branch 'main' into code_cleanup
sanjanag authored Nov 20, 2023
2 parents c228a12 + 9de8f33 commit 37b0bd8
Showing 9 changed files with 170 additions and 61 deletions.
19 changes: 19 additions & 0 deletions DEVELOPMENT.md
Expand Up @@ -110,6 +110,25 @@ You can install the git hook scripts with:
pre-commit install
```

## How to build `cleanvision` docs locally?

1. Install the required packages to build the docs:
```shell
pip install -r docs/requirements.txt
```
2. Install [pandoc](https://pandoc.org/installing.html)

3. Build the docs using `sphinx-build`
```shell
sphinx-build docs/source cleanvision-docs
```

**Note for faster build**: Executing the Jupyter Notebooks (i.e., the .ipynb files) that make up some portion of the docs, such as the tutorials, takes a long time. If you want to skip rendering these, set the environment variable `SKIP_NOTEBOOKS=1`, for example with `export SKIP_NOTEBOOKS=1`.

4. To view the docs, open the file `cleanvision-docs/index.html` in a browser.
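
The build steps above can also be scripted. The following is a minimal sketch, not part of the repository: the helper names `docs_build_invocation` and `build_docs` are hypothetical, and the only things taken from the instructions above are the `sphinx-build docs/source cleanvision-docs` command and the `SKIP_NOTEBOOKS=1` environment variable.

```python
import os
import subprocess


def docs_build_invocation(skip_notebooks=True,
                          source="docs/source",
                          out="cleanvision-docs"):
    """Return the (command, environment) pair for a local docs build.

    Hypothetical helper: it only assembles the sphinx-build call
    described in the steps above and does not run anything itself.
    """
    env = dict(os.environ)
    if skip_notebooks:
        # Same effect as `export SKIP_NOTEBOOKS=1` in the shell.
        env["SKIP_NOTEBOOKS"] = "1"
    else:
        env.pop("SKIP_NOTEBOOKS", None)
    command = ["sphinx-build", source, out]
    return command, env


def build_docs(skip_notebooks=True):
    """Run the build; assumes the docs requirements and pandoc are installed."""
    command, env = docs_build_invocation(skip_notebooks=skip_notebooks)
    subprocess.run(command, env=env, check=True)
```

Calling `build_docs()` from the repository root would then run the same command as step 3 with notebook execution skipped; pass `skip_notebooks=False` for a full build.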



### EditorConfig

This repo uses [EditorConfig](https://editorconfig.org/) to keep code style
1 change: 1 addition & 0 deletions README.md
Expand Up @@ -66,6 +66,7 @@ imagelab.report(issue_types=issue_types)
- [Additional example notebooks](https://github.com/cleanlab/cleanvision-examples)
- [Documentation](https://cleanvision.readthedocs.io/)
- [Blog Post](https://cleanlab.ai/blog/cleanvision/)
- [FAQ](https://cleanvision.readthedocs.io/en/latest/faq.html)

## *Clean* your data for better Computer *Vision*

@@ -1,7 +1,7 @@
Folder Dataset
Fsspec Dataset
==============

.. automodule:: cleanvision.dataset.folder_dataset
.. automodule:: cleanvision.dataset.fsspec_dataset
:autosummary:
:members:
:undoc-members:
2 changes: 1 addition & 1 deletion docs/source/cleanvision/dataset/index.rst
Expand Up @@ -10,7 +10,7 @@ Dataset

.. toctree::
base_dataset
folder_dataset
fsspec_dataset
hf_dataset
torch_dataset
utils
68 changes: 68 additions & 0 deletions docs/source/faq.rst
@@ -0,0 +1,68 @@
Frequently Asked Questions
==========================

Answers to frequently asked questions about the `cleanvision <https://github.com/cleanlab/cleanvision/>`_ open-source package.

1. **What kind of machine learning tasks can I use CleanVision for?**

CleanVision is independent of any particular machine learning task: it works directly on images and does not require any labels or metadata to detect issues in the dataset. The issues it detects are helpful for all kinds of machine learning tasks.

2. **Can I check for specific issues in my dataset?**


Yes, you can specify issues like ``light`` or ``blurry`` in the ``issue_types`` argument when calling ``Imagelab.find_issues``:

.. code-block:: python3

    imagelab.find_issues(issue_types={"light": {}, "blurry": {}})

3. **What dataset formats does CleanVision support?**


Apart from plain image files stored locally or in the cloud, CleanVision also works with HuggingFace and Torchvision datasets. You can use the dataset objects as is with the ``image_key`` argument.

.. code-block:: python3

    imagelab = Imagelab(hf_dataset=dataset, image_key="image")

For more detailed usage instructions and examples, check the :ref:`tutorials`.

Commonly encountered errors
---------------------------

- **RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.**

.. code-block:: console

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

    To fix this issue, refer to the "Safe importing of main module"
    section in https://docs.python.org/3/library/multiprocessing.html

The above issue is caused by the multiprocessing module behaving differently on macOS and Windows platforms. A detailed discussion of the issue can be found `here <https://github.com/cleanlab/cleanlab/issues/159>`_.
A workaround for this issue is to run CleanVision in the main namespace like this:

.. code-block:: python3

    if __name__ == "__main__":
        imagelab = Imagelab(data_path)
        imagelab.find_issues()
        imagelab.report()

OR use ``n_jobs=1`` to disable parallel processing:

.. code-block:: python3

    imagelab.find_issues(n_jobs=1)
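
The two workarounds can be illustrated without CleanVision itself. The sketch below uses only the standard-library ``multiprocessing`` module; ``image_score`` and ``score_images`` are hypothetical stand-ins for per-image work and are not part of the CleanVision API. It shows why the main-module guard matters when worker processes are started, and why a serial path (the analogue of ``n_jobs=1``) sidesteps the problem.

```python
from multiprocessing import Pool


def image_score(image_id):
    """Stand-in for per-image work; a real audit would open and analyse an image."""
    return image_id * image_id


def score_images(image_ids, n_jobs=1):
    """Score images serially when n_jobs == 1, otherwise in worker processes."""
    if n_jobs == 1:
        # No child processes are created, so no main-module guard is needed.
        return [image_score(i) for i in image_ids]
    with Pool(processes=n_jobs) as pool:
        return pool.map(image_score, image_ids)


if __name__ == "__main__":
    # Under the spawn start method, each worker re-imports this module.
    # The guard keeps workers from re-running this block and trying to
    # start their own pools, which is what triggers the RuntimeError.
    print(score_images(range(5), n_jobs=2))
```

Without the guard, the final call would sit at module level and run again in every spawned worker.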
71 changes: 41 additions & 30 deletions docs/source/index.rst
Expand Up @@ -4,47 +4,50 @@

Documentation
=======================================

CleanVision automatically detects various issues in image datasets, such as images that are: (near) duplicates, blurry,
over/under-exposed, etc. This data-centric AI package is designed as a quick first step for any computer vision project
to find problems in your dataset, which you may want to address before applying machine learning.


Installation
============

To install the latest stable version (recommended):
------------

.. code-block:: console
.. tabs::

$ pip install cleanvision
.. tab:: pip

.. code-block:: bash
To install the bleeding-edge developer version:
pip install cleanvision
.. code-block:: console
To install the package with all optional dependencies:

$ pip install git+https://github.com/cleanlab/cleanvision.git
.. code-block:: bash
To install with HuggingFace optional dependencies
pip install "cleanvision[all]"
.. code-block:: console
.. tab:: source

$ pip install "cleanvision[huggingface]"
.. code-block:: bash
To install with Torchvision optional dependencies
pip install git+https://github.com/cleanlab/cleanvision.git
.. code-block:: console
To install the package with all optional dependencies:

$ pip install "cleanvision[pytorch]"
.. code-block:: bash
pip install "git+https://github.com/cleanlab/cleanvision.git#egg=cleanvision[all]"
Quickstart
===========
How to Use CleanVision
----------------------

1. Using CleanVision to audit your image data is as simple as running the code below:
Basic Usage
^^^^^^^^^^^
Here's how to quickly audit your image data:


.. code-block:: python3
Expand All @@ -60,8 +63,9 @@ Quickstart
# Produce a neat report of the issues found in your dataset
imagelab.report()
2. CleanVision diagnoses many types of issues, but you can also check for only specific issues:

Targeted Issue Detection
^^^^^^^^^^^^^^^^^^^^^^^^
You can also focus on specific issues:

.. code-block:: python3
Expand All @@ -72,8 +76,9 @@ Quickstart
# Produce a report with only the specified issue_types
imagelab.report(issue_types.keys())
3. Run CleanVision on a Hugging Face dataset

Integration with Hugging Face Dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Easily use CleanVision with a Hugging Face dataset:

.. code-block:: python3
Expand All @@ -90,7 +95,9 @@ Quickstart
imagelab.report()
4. Run CleanVision on a Torchvision dataset
Integration with Torchvision Dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
CleanVision works smoothly with Torchvision datasets too:


.. code-block:: python3
Expand All @@ -111,29 +118,32 @@ Quickstart
imagelab.report()
More on how to get started with CleanVision:
- `Example Python script <https://github.com/cleanlab/cleanvision/blob/main/docs/source/tutorials/run.py>`_
- `Example Notebooks <https://github.com/cleanlab/cleanvision-examples>`_
- `How To Contribute <https://github.com/cleanlab/cleanvision/blob/main/CONTRIBUTING.md>`_
Additional Resources
--------------------
- Get started with our `Example Notebook <https://cleanvision.readthedocs.io/en/latest/tutorials/tutorial.html>`_
- Explore more `Example Notebooks <https://github.com/cleanlab/cleanvision-examples>`_
- Learn how to contribute in the `Contribution Guide <https://github.com/cleanlab/cleanvision/blob/main/CONTRIBUTING.md>`_


.. toctree::
:hidden:
:maxdepth: 1
:caption: Getting Started

Quickstart <self>
.. _api-reference:


.. _tutorials:
.. toctree::
:hidden:
:maxdepth: 3
:caption: Tutorials
:name: _tutorials

tutorials/tutorial.ipynb
How to Use CleanVision <tutorials/tutorial.ipynb>
tutorials/torchvision_dataset.ipynb
tutorials/huggingface_dataset.ipynb
Frequently Asked Questions <faq>

.. _api-reference:
.. toctree::
:hidden:
:maxdepth: 3
Expand All @@ -153,3 +163,4 @@ More on how to get started with CleanVision:
GitHub <https://github.com/cleanlab/cleanvision.git>
PyPI <https://pypi.org/project/cleanvision/>
Cleanlab Studio <https://cleanlab.ai/studio/?utm_source=cleanvision&utm_medium=docs&utm_campaign=clostostudio>

40 changes: 14 additions & 26 deletions docs/source/tutorials/tutorial.ipynb
Expand Up @@ -5,7 +5,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Overview"
"# How to Use CleanVision"
]
},
{
Expand All @@ -30,13 +30,13 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"nbsphinx": "hidden",
"tags": []
"nbsphinx": "hidden"
},
"source": [
"Use `pip install cleanvision` to install a stable release of the package.\n",
"\n",
"**After you install these packages, you may need to restart your notebook runtime before running the rest of this notebook.**"
]
},
Expand Down Expand Up @@ -72,38 +72,26 @@
"This notebook uses an example dataset that you can download using these commands."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"wget -nc 'https://cleanlab-public.s3.amazonaws.com/CleanVision/image_files.zip'\n",
"\n",
"unzip -q image_files.zip"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"nbsphinx": "hidden",
"tags": []
"nbsphinx": "hidden"
},
"outputs": [],
"source": [
"!wget -nc 'https://cleanlab-public.s3.amazonaws.com/CleanVision/image_files.zip'"
"!wget -nc 'https://cleanlab-public.s3.amazonaws.com/CleanVision/image_files.zip'\n",
"!unzip -q image_files.zip"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"nbsphinx": "hidden",
"tags": []
},
"outputs": [],
"cell_type": "markdown",
"metadata": {},
"source": [
"!unzip -q image_files.zip"
"```shell\n",
"wget -nc 'https://cleanlab-public.s3.amazonaws.com/CleanVision/image_files.zip'\n",
"unzip -q image_files.zip\n",
"```"
]
},
{
Expand Down Expand Up @@ -804,7 +792,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**Beyond the collection of image files demonstrated here, you can alternatively run CleanVision on: [Hugging Face datasets](https://github.com/cleanlab/cleanvision/blob/main/docs/source/tutorials/huggingface_dataset.ipynb) and [torchvision datasets](https://github.com/cleanlab/cleanvision/blob/main/docs/source/tutorials/torchvision_dataset.ipynb).**"
"Beyond the collection of image files demonstrated here, you can alternatively run CleanVision on: [Hugging Face datasets](https://github.com/cleanlab/cleanvision/blob/main/docs/source/tutorials/huggingface_dataset.ipynb), [torchvision datasets](https://github.com/cleanlab/cleanvision/blob/main/docs/source/tutorials/torchvision_dataset.ipynb), as well as [files in cloud storage buckets like S3, GCS, or Azure](https://github.com/cleanlab/cleanvision-examples/blob/main/cloud_dataset.ipynb)."
]
}
],
2 changes: 1 addition & 1 deletion src/cleanvision/utils/viz_manager.py
Expand Up @@ -113,4 +113,4 @@ def plot_image_grid(
set_image_on_axes(images[i], axes[i], titles[i])
else:
set_image_on_axes(images[0], axes, titles[0])
plt.show() # type: ignore
plt.show()
24 changes: 23 additions & 1 deletion tests/test_viz_manager.py
@@ -1,7 +1,7 @@
import pytest
from PIL import Image

from cleanvision.utils.viz_manager import VizManager
from cleanvision.utils.viz_manager import VizManager, truncate_titles


class TestVizManager:
Expand Down Expand Up @@ -30,3 +30,25 @@ def test_individual_images(self, images, title_info):
)
def test_image_sets(self, image_sets, title_info_sets):
VizManager.image_sets(image_sets, title_info_sets, 4, (2, 2))


def test_truncate_titles():
assert truncate_titles(
2,
[
"/home/usr/proj/dev/product/dataset/images/image_0001.img",
"/home/usr/proj/dev/product/dataset/images/image_0002.img",
],
) == ["...es/image_0001.img", "...es/image_0002.img"]

assert truncate_titles(2, ["image.jpeg", "image2.jpeg"]) == [
"image.jpeg",
"image2.jpeg",
]
assert truncate_titles(
2,
[
"/pictures/mount/image_0001.img",
"/home/usr/proj/dev/product/dataset/images/image_0002.img",
],
) == ["/pictures/mount/i...", "/home/usr/proj/de..."]
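
The assertions above pin down the observable behaviour of `truncate_titles` but not how it works. The following is one possible sketch that reproduces these three cases; it is not the implementation in `cleanvision.utils.viz_manager`. The name `truncate_titles_sketch`, the character budget of `40 // ncols`, and the common-prefix rule for choosing which end to cut are all assumptions made for illustration.

```python
import os
from typing import List


def truncate_titles_sketch(ncols: int, titles: List[str]) -> List[str]:
    """Shorten long titles so they fit in a grid with `ncols` columns."""
    max_len = 40 // ncols  # assumed per-title character budget
    if all(len(title) <= max_len for title in titles):
        return list(titles)

    keep = max_len - 3  # room left after the "..." marker
    prefix_len = len(os.path.commonprefix(titles))

    # If every title differs from the others only within its last `keep`
    # characters, keep the tails, since the heads carry no information.
    if all(len(title) - prefix_len <= keep for title in titles):
        return [
            "..." + title[-keep:] if len(title) > max_len else title
            for title in titles
        ]

    # Otherwise keep the heads and cut the tails.
    return [
        title[:keep] + "..." if len(title) > max_len else title
        for title in titles
    ]
```

Under these assumptions, two long paths that share a directory keep their distinguishing file names, while unrelated paths keep their leading directories.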
