Wrongly labelled when using dataset.ImageFolder #10

Open
chaeunl opened this issue Oct 21, 2022 · 2 comments

Comments

chaeunl commented Oct 21, 2022

Hello.

I found that on some operating systems (my environment is Ubuntu 20.04), the class_to_idx property of datasets.ImageFolder does not match the directory names, which leads to wrongly labelled samples.

For instance, the directory 100 (str) is assigned the class index 2 (int). The easiest way to resolve this is to modify the find_classes function in the ImageFolder source code (https://pytorch.org/vision/stable/_modules/torchvision/datasets/folder.html#ImageFolder): replace the line class_to_idx = {cls_name: i for i, cls_name in enumerate(classes)} with class_to_idx = {cls_name: int(cls_name) for cls_name in classes}.
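If patching the torchvision source is not an option, one possible alternative is to remap the enumerated index back to the integer folder name after the dataset has been built. The sketch below is only an illustration: `make_label_remap` is a hypothetical helper, and the toy mapping stands in for a real `class_to_idx`.

```python
from typing import Callable, Dict


def make_label_remap(class_to_idx: Dict[str, int]) -> Callable[[int], int]:
    """Build a function that maps an enumerated index to int(folder name)."""
    idx_to_label = {idx: int(name) for name, idx in class_to_idx.items()}
    # A bound dict method is used instead of a lambda so the callable can be pickled.
    return idx_to_label.__getitem__


# Toy mapping standing in for dataset.class_to_idx (assumed values, not real data).
remap = make_label_remap({"0": 0, "1": 1, "100": 2, "2": 3})
print(remap(2), remap(3))
```

With a real dataset, the returned callable could presumably be assigned as `dataset.target_transform = make_label_remap(dataset.class_to_idx)`. This would only affect the labels returned when indexing the dataset; attributes such as `dataset.targets` would still hold the enumerated indices.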


psandovalsegura commented May 30, 2024

Following @chaeunl's suggestion, you can use this dataset class:

import os
from typing import Dict, List, Tuple

from torchvision import datasets


class ImageNetV2Folder(datasets.ImageFolder):
    def find_classes(self, directory: str) -> Tuple[List[str], Dict[str, int]]:
        """Finds the class folders in a dataset.

        See :class:`DatasetFolder` for details.
        """
        classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
        if not classes:
            raise FileNotFoundError(f"Couldn't find any class folder in {directory}.")

        # Use the integer folder name as the label instead of its position in
        # the string-sorted list. int() raises ValueError for non-numeric names.
        class_to_idx = {cls_name: int(cls_name) for cls_name in classes}
        return classes, class_to_idx

and initialize it with dataset = ImageNetV2Folder(root="imagenetv2-matched-frequency-format-val").
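To try the mapping without downloading the dataset, here is a minimal sketch that runs the same logic on a temporary directory. `find_numeric_classes` is a hypothetical free-function copy of the override above, and the five folder names are made up for illustration.

```python
import os
import tempfile
from typing import Dict, List, Tuple


def find_numeric_classes(directory: str) -> Tuple[List[str], Dict[str, int]]:
    """Hypothetical free-function version of the find_classes override."""
    classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
    if not classes:
        raise FileNotFoundError(f"Couldn't find any class folder in {directory}.")
    # Label each folder with the integer it is named after.
    return classes, {name: int(name) for name in classes}


with tempfile.TemporaryDirectory() as root:
    # Create a few empty class folders named like the ImageNetV2 ones.
    for name in ["0", "1", "2", "10", "100"]:
        os.mkdir(os.path.join(root, name))
    classes, class_to_idx = find_numeric_classes(root)

print(classes)
print(class_to_idx)
```

Note that `classes` is still sorted as strings; only the values in `class_to_idx` follow the numeric folder names.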

Sanity Check

Then you can check that the class indices point to the correct folder:

index_to_class = {v: k for k, v in dataset.class_to_idx.items()}
for i in range(len(dataset.classes)):
    print(f'Class idx {i} corresponds to folder name {index_to_class[i]}')
# Expected output (first lines):
# Class idx 0 corresponds to folder name 0
# Class idx 1 corresponds to folder name 1
# Class idx 2 corresponds to folder name 2
# Class idx 3 corresponds to folder name 3

whereas the current implementation in the repo (using dataset = torchvision.datasets.ImageFolder(root="imagenetv2-matched-frequency-format-val")) produces:

Class idx 0 corresponds to folder name 0
Class idx 1 corresponds to folder name 1
Class idx 2 corresponds to folder name 10
Class idx 3 corresponds to folder name 100
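Rather than reading the printout by eye, the same check can be sketched as a small helper that returns every folder whose assigned index differs from its numeric name. `mismatched_labels` is a hypothetical name, and the mapping below only copies the four entries printed above.

```python
from typing import Dict


def mismatched_labels(class_to_idx: Dict[str, int]) -> Dict[str, int]:
    """Return folders whose assigned index differs from their numeric name."""
    return {name: idx for name, idx in class_to_idx.items() if int(name) != idx}


# Mapping copied from the ImageFolder output above (first four entries only).
default_mapping = {"0": 0, "1": 1, "10": 2, "100": 3}
print(mismatched_labels(default_mapping))
```

Calling `mismatched_labels(dataset.class_to_idx)` on a correctly labelled dataset should return an empty dict.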

The problem stems from the class folders being named with integer strings without leading zeros, so sorting the names as strings (as find_classes does) places "10" and "100" before "2".
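The effect can be reproduced with plain Python, no torchvision needed. This sketch assumes the folders are named "0" through "999"; the zero-padded variant is a hypothetical layout shown only for comparison.

```python
# Folder names assumed for illustration: "0" through "999", no leading zeros.
folder_names = [str(i) for i in range(1000)]

# ImageFolder-style indexing: sort the names as strings, then enumerate.
lexicographic = sorted(folder_names)
enumerated = {name: i for i, name in enumerate(lexicographic)}

# The fix from this thread: use the integer value of the folder name.
by_value = {name: int(name) for name in folder_names}

# Zero-padded names (hypothetical layout) sort correctly even as strings.
padded = sorted(f"{i:03d}" for i in range(1000))
padded_enumerated = {name: i for i, name in enumerate(padded)}

print(lexicographic[:4])
print(enumerated["2"], by_value["2"], padded_enumerated["002"])
```

With string sorting, every name starting with "1" comes before "2", which is why the enumerated index of folder "2" ends up far from 2.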

Vaishaal (Collaborator) commented

Hi, @psandovalsegura is right: ImageFolder is not natively compatible because our class ids are actually uuids. Just use the class in https://github.com/modestyachts/ImageNetV2_pytorch (it should be the same class that @psandovalsegura shows above).
