Added faq page (#235)
* Added faq page

* Updated soln

* Docs update
- Added faq page
- fixed broken links
- Changed wording for some sections

* Updated faq link

* Removed unnecessary edit

* Update docs/source/faq.rst

Co-authored-by: Jonas Mueller <[email protected]>

* Corrected quotes

---------

Co-authored-by: Jonas Mueller <[email protected]>
sanjanag and jwmueller authored Nov 15, 2023
1 parent 2814ed8 commit 1c6fa9d
Showing 6 changed files with 122 additions and 43 deletions.
18 changes: 9 additions & 9 deletions README.md
@@ -35,15 +35,14 @@ wget -nc 'https://cleanlab-public.s3.amazonaws.com/CleanVision/image_files.zip'
```python
from cleanvision import Imagelab

if __name__ == '__main__':
    # Specify path to folder containing the image files in your dataset
    imagelab = Imagelab(data_path="FOLDER_WITH_IMAGES/")

    # Automatically check for a predefined list of issues within your dataset
    imagelab.find_issues()

    # Produce a neat report of the issues found in your dataset
    imagelab.report()
# Specify path to folder containing the image files in your dataset
imagelab = Imagelab(data_path="FOLDER_WITH_IMAGES/")

# Automatically check for a predefined list of issues within your dataset
imagelab.find_issues()

# Produce a neat report of the issues found in your dataset
imagelab.report()
```

2. CleanVision diagnoses many types of issues, but you can also check for only specific issues.
@@ -67,6 +66,7 @@ imagelab.report(issue_types=issue_types)
- [Additional example notebooks](https://github.com/cleanlab/cleanvision-examples)
- [Documentation](https://cleanvision.readthedocs.io/)
- [Blog Post](https://cleanlab.ai/blog/cleanvision/)
- [FAQ](https://cleanvision.readthedocs.io/en/latest/faq.html)

## *Clean* your data for better Computer *Vision*

@@ -1,7 +1,7 @@
Folder Dataset
Fsspec Dataset
==============

.. automodule:: cleanvision.dataset.folder_dataset
.. automodule:: cleanvision.dataset.fsspec_dataset
:autosummary:
:members:
:undoc-members:
2 changes: 1 addition & 1 deletion docs/source/cleanvision/dataset/index.rst
@@ -10,7 +10,7 @@ Dataset

.. toctree::
base_dataset
folder_dataset
fsspec_dataset
hf_dataset
torch_dataset
utils
68 changes: 68 additions & 0 deletions docs/source/faq.rst
@@ -0,0 +1,68 @@
Frequently Asked Questions
==========================

Answers to frequently asked questions about the `cleanvision <https://github.com/cleanlab/cleanvision/>`_ open-source package.

1. **What kind of machine learning tasks can I use CleanVision for?**

CleanVision is independent of any machine learning task: it works directly on images and does not require any labels or metadata to detect issues in the dataset. The issues it detects are relevant to all kinds of machine learning tasks.

2. **Can I check for specific issues in my dataset?**


Yes, you can specify issues like ``light`` or ``blurry`` in the ``issue_types`` argument when calling ``Imagelab.find_issues``:

.. code-block:: python3

    imagelab.find_issues(issue_types={"light": {}, "blurry": {}})

3. **What dataset formats does CleanVision support?**


Apart from plain image files stored locally or in the cloud, CleanVision also works with HuggingFace and Torchvision datasets. You can use the dataset objects as is with the ``image_key`` argument.

.. code-block:: python3

    imagelab = Imagelab(hf_dataset=dataset, image_key="image")

For more detailed usage instructions and examples, check the :ref:`tutorials`.
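
The ``issue_types`` argument from question 2 is a plain dictionary that maps each issue name to a dictionary of options, where an empty dictionary means "use the defaults". The sketch below shows one way to build it from a list of names. ``build_issue_types`` is a hypothetical helper written for this illustration and is not part of CleanVision; the override mechanism is likewise an assumption about how per-issue options might be supplied.

```python
from typing import Any, Dict, Iterable, Optional


def build_issue_types(
    names: Iterable[str],
    overrides: Optional[Dict[str, Dict[str, Any]]] = None,
) -> Dict[str, Dict[str, Any]]:
    """Map each requested issue name to its options dict.

    Hypothetical helper for illustration; not part of the CleanVision API.
    An empty options dict means "check this issue with default settings".
    """
    names = list(names)
    overrides = overrides or {}
    # Reject options given for issues that were never requested.
    unknown = sorted(set(overrides) - set(names))
    if unknown:
        raise ValueError(f"Options given for issues not requested: {unknown}")
    # Copy each options dict so callers cannot mutate the inputs by accident.
    return {name: dict(overrides.get(name, {})) for name in names}


# Same dictionary as the literal used in question 2.
issue_types = build_issue_types(["light", "blurry"])

# With CleanVision installed, the result would be passed on like this:
# imagelab.find_issues(issue_types=issue_types)
```

Building the dictionary in one place keeps the list of checked issues easy to reuse, for example when the same names are later passed to ``imagelab.report``.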

Commonly encountered errors
---------------------------

- **RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.**

.. code-block:: console

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

    To fix this issue, refer to the "Safe importing of main module"
    section in https://docs.python.org/3/library/multiprocessing.html

This error arises because the ``multiprocessing`` module works differently on macOS and Windows, where child processes are not started with fork by default. A detailed discussion of the issue can be found `here <https://github.com/cleanlab/cleanlab/issues/159>`_.
One workaround is to run CleanVision under the ``if __name__ == "__main__":`` guard, like this:

.. code-block:: python3

    if __name__ == "__main__":
        imagelab = Imagelab(data_path)
        imagelab.find_issues()
        imagelab.report()

Alternatively, use ``n_jobs=1`` to disable parallel processing:

.. code-block:: python3

    imagelab.find_issues(n_jobs=1)
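
The two workarounds can also be combined by choosing ``n_jobs`` from the start method that ``multiprocessing`` reports for the current interpreter. The following is a minimal sketch rather than a recommended recipe: ``choose_n_jobs`` is a hypothetical helper, not part of CleanVision, and the commented-out call assumes that ``find_issues`` treats ``n_jobs=None`` as "use the default".

```python
import multiprocessing
from typing import Optional


def choose_n_jobs(start_method: Optional[str] = None) -> Optional[int]:
    """Pick a conservative n_jobs value for the current platform.

    Returns None (keep the library's default parallelism) when child
    processes are started with fork, and 1 (no parallel processing) otherwise.
    Hypothetical helper for illustration; not part of the CleanVision API.
    """
    if start_method is None:
        # Reports "fork", "spawn" or "forkserver" for this interpreter.
        start_method = multiprocessing.get_start_method()
    return None if start_method == "fork" else 1


n_jobs = choose_n_jobs()

# With CleanVision installed, the value would be passed on like this
# (assumes n_jobs=None means "use the default number of workers"):
# imagelab.find_issues(n_jobs=n_jobs)
```

This choice is deliberately cautious: a script that already wraps its CleanVision calls in the ``if __name__ == "__main__":`` guard can keep parallel processing enabled on any platform.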
71 changes: 41 additions & 30 deletions docs/source/index.rst
@@ -4,47 +4,50 @@

Documentation
=======================================

CleanVision automatically detects various issues in image datasets, such as images that are: (near) duplicates, blurry,
over/under-exposed, etc. This data-centric AI package is designed as a quick first step for any computer vision project
to find problems in your dataset, which you may want to address before applying machine learning.


Installation
============

To install the latest stable version (recommended):
------------

.. code-block:: console
.. tabs::

$ pip install cleanvision
.. tab:: pip

.. code-block:: bash
To install the bleeding-edge developer version:
pip install cleanvision
.. code-block:: console
To install the package with all optional dependencies:

$ pip install git+https://github.com/cleanlab/cleanvision.git
.. code-block:: bash
To install with HuggingFace optional dependencies
pip install "cleanvision[all]"
.. code-block:: console
.. tab:: source

$ pip install "cleanvision[huggingface]"
.. code-block:: bash
To install with Torchvision optional dependencies
pip install git+https://github.com/cleanlab/cleanvision.git
.. code-block:: console
To install the package with all optional dependencies:

$ pip install "cleanvision[pytorch]"
.. code-block:: bash
pip install "git+https://github.com/cleanlab/cleanvision.git#egg=cleanvision[all]"
Quickstart
===========
How to Use CleanVision
----------------------

1. Using CleanVision to audit your image data is as simple as running the code below:
Basic Usage
^^^^^^^^^^^
Here's how to quickly audit your image data:


.. code-block:: python3
@@ -60,8 +63,9 @@ Quickstart
# Produce a neat report of the issues found in your dataset
imagelab.report()
2. CleanVision diagnoses many types of issues, but you can also check for only specific issues:

Targeted Issue Detection
^^^^^^^^^^^^^^^^^^^^^^^^
You can also focus on specific issues:

.. code-block:: python3
@@ -72,8 +76,9 @@ Quickstart
# Produce a report with only the specified issue_types
imagelab.report(issue_types.keys())
3. Run CleanVision on a Hugging Face dataset

Integration with Hugging Face Dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Easily use CleanVision with a Hugging Face dataset:

.. code-block:: python3
@@ -90,7 +95,9 @@ Quickstart
imagelab.report()
4. Run CleanVision on a Torchvision dataset
Integration with Torchvision Dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
CleanVision works smoothly with Torchvision datasets too:


.. code-block:: python3
@@ -111,29 +118,32 @@ Quickstart
imagelab.report()
More on how to get started with CleanVision:
- `Example Python script <https://github.com/cleanlab/cleanvision/blob/main/docs/source/tutorials/run.py>`_
- `Example Notebooks <https://github.com/cleanlab/cleanvision-examples>`_
- `How To Contribute <https://github.com/cleanlab/cleanvision/blob/main/CONTRIBUTING.md>`_
Additional Resources
--------------------
- Get started with our `Example Notebook <https://cleanvision.readthedocs.io/en/latest/tutorials/tutorial.html>`_
- Explore more `Example Notebooks <https://github.com/cleanlab/cleanvision-examples>`_
- Learn how to contribute in the `Contribution Guide <https://github.com/cleanlab/cleanvision/blob/main/CONTRIBUTING.md>`_


.. toctree::
:hidden:
:maxdepth: 1
:caption: Getting Started

Quickstart <self>
.. _api-reference:


.. _tutorials:
.. toctree::
:hidden:
:maxdepth: 3
:caption: Tutorials
:name: _tutorials

tutorials/tutorial.ipynb
How to Use CleanVision <tutorials/tutorial.ipynb>
tutorials/torchvision_dataset.ipynb
tutorials/huggingface_dataset.ipynb
Frequently Asked Questions <faq>

.. _api-reference:
.. toctree::
:hidden:
:maxdepth: 3
@@ -153,3 +163,4 @@ More on how to get started with CleanVision:
GitHub <https://github.com/cleanlab/cleanvision.git>
PyPI <https://pypi.org/project/cleanvision/>
Cleanlab Studio <https://cleanlab.ai/studio/?utm_source=cleanvision&utm_medium=docs&utm_campaign=clostostudio>

2 changes: 1 addition & 1 deletion docs/source/tutorials/tutorial.ipynb
@@ -5,7 +5,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Overview"
"# How to Use CleanVision"
]
},
{