GoogleVisionFormatter returns OpenPechaFS pecha object with incorrect is_private value #239

Open
ta4tsering opened this issue Jan 26, 2023 · 0 comments

Contributor

ta4tsering commented Jan 26, 2023

Describe the bug
When I use the recently updated GoogleVisionFormatter class to create an OPF from OCR output, the is_private value in the returned OpenPechaFS pecha object is True even when the copyright status of the work_id is Public domain. As a result, the published OPF is private when it should be public.

To Reproduce
Steps to reproduce the behavior:

  1. Use the script below:
     from pathlib import Path

     from openpecha.core.ids import get_initial_pecha_id
     from openpecha.core.pecha import OpenPechaGitRepo
     from openpecha.formatters.ocr.google_vision import (
         GoogleVisionBDRCFileProvider,
         GoogleVisionFormatter,
     )


     def make_opf(ocr_import_info, ocr_path):
         work_id = "W3CN18530"
         data_provider = GoogleVisionBDRCFileProvider(
             bdrc_scan_id=work_id,
             ocr_import_info=ocr_import_info,
             ocr_disk_path=ocr_path,
         )
         pecha_id = get_initial_pecha_id()
         formatter = GoogleVisionFormatter(f"./pechas/{pecha_id}/{pecha_id}.opf")
         pecha = formatter.create_opf(data_provider, pecha_id, {}, ocr_import_info)
         # Switch the returned pecha to the Git-backed class before publishing.
         pecha.__class__ = OpenPechaGitRepo
         pecha.storage = None
         pecha.meta.id = pecha.pecha_id
         pecha.save_meta()
         pecha.publish(asset_path=ocr_path, asset_name="ocr_output")
         # Return the pecha so the caller can inspect it.
         return pecha


     if __name__ == "__main__":
         ocr_import_info = {
             "source": "bdrc",
             "software": "vision",
             "batch": "batch-G8E3G",
             "expected_default_language": "bo",
             "bdrc_scan_id": "W3CN18530",
             "ocr_info": {
                 "timestamp": "2023-01-20T17:42:00",
                 "imagesfolder": "images",
             },
         }
         ocr_path = Path("./ocrs/W3CN18530")
         pecha = make_opf(ocr_import_info, ocr_path)
    
    
  2. The OCR output used for this run is linked below.
    OCR output of W3CN18530

Expected behavior
The is_private value of the pecha object returned by OpenPechaFS should be False.
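
The expectation can be written down as a small standalone check. This is only a sketch: `expected_is_private` is a hypothetical helper and not part of the OpenPecha toolkit, and treating every status other than "Public domain" as private is an assumption made for illustration.

```python
def expected_is_private(copyright_status: str) -> bool:
    """Hypothetical helper: return the is_private value this report expects.

    Assumption: only a "Public domain" status makes a pecha public;
    any other status is treated as private.
    """
    # Normalise whitespace and case so "Public domain" and "public domain" match.
    return copyright_status.strip().lower() != "public domain"


if __name__ == "__main__":
    # W3CN18530 is reported as Public domain, so the expected value is False.
    print(expected_is_private("Public domain"))
```

Comparing this value with the is_private value on the pecha returned by the formatter would show the mismatch described above.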

Screenshots
The screenshot below shows what GoogleVisionFormatter returns.
Screenshot 2023-01-24 at 9 49 43 AM

Desktop (please complete the following information):
OpenPecha toolkit

  • Version 0.9.23

Additional context
None
