PyLaia model issue #26

Open

johnlockejrr opened this issue Jan 18, 2025 · 6 comments

Labels: bug (Something isn't working)

@johnlockejrr

What happened?

A bug happened!

Steps To Reproduce

Environment:

Python 3.10

jinja2 >= 3.1.3
numpy
opencv-python >= 4.6.0
tqdm >= 4.66.2,<5
xmlschema >= 3.0.2,<4
typer >= 0.12.0
rich >= 13.7.1
jiwer >= 3.0.4
pandas
pagexml-tools >= 0.5.0
transformers[torch] >= 4.44.1
huggingface-hub[cli] >= 0.24.6
ultralytics >= 8.0.225
pydantic >= 2.9.2
pylaia == 1.1.2
mmcv @ https://github.com/Swedish-National-Archives-AI-lab/openmim_install/raw/main/mmcv-2.0.0-cp310-cp310-manylinux1_x86_64.whl
mmdet==3.1.0
mmengine==0.7.2
mmocr==1.0.1
yapf==0.40.1

Pipeline Config:

steps:
- step: Segmentation
  settings:
    model: yolo
    model_settings:
      model: Riksarkivet/yolov9-lines-within-regions-1
- step: TextRecognition
  settings:
    model: PyLaia
    model_settings:
      model: Teklia/pylaia-belfort
      device: cuda
      #revision: d35f921605314afc7324310081bee55a805a0b9f
    generation_settings:
      batch_size: 8
      temperature: 1
- step: OrderLines
- step: Export
  settings:
    format: page
    dest: outputs

Some problem with the image size, I think. PyLaia by default expects an image height of 128:

DESKTOP-NHKR7QL - 2025-01-17 23:50:51 UTC - INFO - Include schema from 'file:///home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/xmlschema/schemas/XSD_1.1/xsd11-extra.xsd'
DESKTOP-NHKR7QL - 2025-01-17 23:50:51 UTC - INFO - Importing 1 input images with batch size 1
DESKTOP-NHKR7QL - 2025-01-17 23:50:51 UTC - INFO - Initialized collection 'None' with 1 pages
DESKTOP-NHKR7QL - 2025-01-17 23:50:51 UTC - INFO - Running step Segmentation (step 1 / 4)
model.pt: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 122M/122M [00:05<00:00, 23.2MB/s]
DESKTOP-NHKR7QL - 2025-01-17 23:51:01 UTC - INFO - Initialized YOLO model 'Riksarkivet/yolov9-lines-within-regions-1' from .cache/models--Riksarkivet--yolov9-lines-within-regions-1/snapshots/f3cbc6afb021ac5d7d997bbb5585166a0f24da0e/model.pt on device cuda:0
DESKTOP-NHKR7QL - 2025-01-17 23:51:01 UTC - INFO - Model 'YOLO' on device 'cuda' received 1 images in batches of 1 images per batch (1 batches)
YOLO: Running inference (batch size 1):   0%|          | 0/1 [00:00<?, ?it/s]
DESKTOP-NHKR7QL - 2025-01-17 23:51:01 UTC - INFO - YOLO: Running inference on 1 images (batch 1 of 1)
YOLO: Running inference (batch size 1): 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.94s/it]
DESKTOP-NHKR7QL - 2025-01-17 23:51:04 UTC - INFO - Running step TextRecognition (step 2 / 4)
DESKTOP-NHKR7QL - 2025-01-17 23:51:04 UTC - INFO - Downloading/loading PyLaia model 'Teklia/pylaia-belfort' from the Hugging Face Hub...
model: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.51k/1.51k [00:00<00:00, 12.7MB/s]
tokens.txt: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 277/277 [00:00<00:00, 2.31MB/s]
lexicon.txt: 100%|████████████████████████████████| 554/554 [00:00<00:00, 4.89MB/s]
README.md: 100%|████████████████████████████████| 2.57k/2.57k [00:00<00:00, 28.7MB/s]
.gitattributes: 100%|████████████████████████████████| 1.52k/1.52k [00:00<00:00, 17.5MB/s]
syms.txt: 100%|████████████████████████████████| 635/635 [00:00<00:00, 8.35MB/s]
language_model.arpa.gz: 100%|████████████████████████████████| 11.3M/11.3M [00:00<00:00, 16.7MB/s]
weights.ckpt: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 42.8M/42.8M [00:01<00:00, 33.0MB/s]
Fetching 8 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:01<00:00,  4.65it/s]
DESKTOP-NHKR7QL - 2025-01-17 23:51:06 UTC - INFO - Initialized PyLaiaModel from 'Teklia/pylaia-belfort' on device 'cuda'.
DESKTOP-NHKR7QL - 2025-01-17 23:51:06 UTC - INFO - Model 'PyLaia' on device 'cuda' received 41 images in batches of 8 images per batch (6 batches)
PyLaia: Running inference (batch size 8):   0%|          | 0/6 [00:00<?, ?it/s]
DESKTOP-NHKR7QL - 2025-01-17 23:51:06 UTC - INFO - PyLaia: Running inference on 8 images (batch 1 of 6)
DESKTOP-NHKR7QL - 2025-01-17 23:51:06 UTC - INFO - Using checkpoint ".cache/models--Teklia--pylaia-belfort/snapshots/d35f921605314afc7324310081bee55a805a0b9f/weights.ckpt"
DESKTOP-NHKR7QL - 2025-01-17 23:51:06 UTC - WARNING - The key 'use_masks' is not supported anymore and will be removed.
DESKTOP-NHKR7QL - 2025-01-17 23:51:06 UTC - INFO - Loaded model .cache/models--Teklia--pylaia-belfort/snapshots/d35f921605314afc7324310081bee55a805a0b9f/model
Loading the LM will be faster if you build a binary file.
Reading .cache/models--Teklia--pylaia-belfort/snapshots/d35f921605314afc7324310081bee55a805a0b9f/language_model.arpa.gz
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
DESKTOP-NHKR7QL - 2025-01-17 23:51:06 UTC - INFO - GPU available: True, used: True
DESKTOP-NHKR7QL - 2025-01-17 23:51:06 UTC - INFO - TPU available: False, using: 0 TPU cores
DESKTOP-NHKR7QL - 2025-01-17 23:51:06 UTC - INFO - IPU available: False, using: 0 IPUs
DESKTOP-NHKR7QL - 2025-01-17 23:51:06 UTC - INFO - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
PyLaia: Running inference (batch size 8):   0%|                                                                                                                                                                                                         | 0/6 [00:00<?, ?it/s]
DESKTOP-NHKR7QL - 2025-01-17 23:51:07 UTC - ERROR - Pipeline failed on step TextRecognition (step 2 / 4)
Traceback (most recent call last):
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/laia/engine/engine_exception.py", line 27, in exception_catcher
    yield
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/laia/engine/evaluator_module.py", line 28, in test_step
    return self.model(batch_x)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/laia/models/htr/laia_crnn.py", line 117, in forward
    x = self.sequencer(x)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/laia/nn/image_pooling_sequencer.py", line 51, in forward
    raise ValueError(
ValueError: Input images must have a fixed height of 16 pixels, found [19]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/incognito/htrflow-pylaia/src/htrflow/pipeline/pipeline.py", line 30, in run
    collection = step.run(collection)
  File "/home/incognito/htrflow-pylaia/src/htrflow/pipeline/steps.py", line 105, in run
    result = self.model(collection.segments(), **self.generation_kwargs)
  File "/home/incognito/htrflow-pylaia/src/htrflow/models/base_model.py", line 121, in __call__
    return self.predict(images, **kwargs)
  File "/home/incognito/htrflow-pylaia/src/htrflow/models/base_model.py", line 109, in predict
    batch_results = self._predict(scaled_batch, **kwargs)
  File "/home/incognito/htrflow-pylaia/src/htrflow/models/teklia/pylaia.py", line 152, in _predict
    decode(
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/laia/scripts/htr/decode_ctc.py", line 124, in run
    trainer.test(evaluator_module, datamodule=data_module, verbose=False)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 706, in test
    results = self._run(model)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 918, in _run
    self._dispatch()
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 982, in _dispatch
    self.accelerator.start_evaluating(self)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/accelerators/accelerator.py", line 95, in start_evaluating
    self.training_type_plugin.start_evaluating(trainer)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 165, in start_evaluating
    self._results = trainer.run_stage()
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 993, in run_stage
    return self._run_evaluate()
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1079, in _run_evaluate
    eval_loop_results = self._evaluation_loop.run()
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
    dl_outputs = self.epoch_loop.run(
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 110, in advance
    output = self.evaluation_step(batch, batch_idx, dataloader_idx)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 150, in evaluation_step
    output = self.trainer.accelerator.test_step(step_kwargs)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/accelerators/accelerator.py", line 226, in test_step
    return self.training_type_plugin.test_step(*step_kwargs.values())
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 181, in test_step
    return self.model.test_step(*args, **kwargs)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/laia/engine/evaluator_module.py", line 23, in test_step
    with exception_catcher(
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/laia/engine/engine_exception.py", line 29, in exception_catcher
    raise EngineException(
laia.engine.engine_exception.EngineException: Exception "ValueError('Input images must have a fixed height of 16 pixels, found [19]')" raised during epoch=0, global_step=0 with batch=['fc00fe3b-cd7f-4307-9dbf-5b1ac75d0eae']
Traceback (most recent call last):
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/laia/engine/engine_exception.py", line 27, in exception_catcher
    yield
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/laia/engine/evaluator_module.py", line 28, in test_step
    return self.model(batch_x)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/laia/models/htr/laia_crnn.py", line 117, in forward
    x = self.sequencer(x)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/laia/nn/image_pooling_sequencer.py", line 51, in forward
    raise ValueError(
ValueError: Input images must have a fixed height of 16 pixels, found [19]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/bin/htrflow", line 8, in <module>
    sys.exit(app())
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/typer/main.py", line 340, in __call__
    raise e
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/typer/main.py", line 323, in __call__
    return get_command(self)(*args, **kwargs)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/typer/core.py", line 743, in main
    return _main(
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/typer/core.py", line 198, in _main
    rv = self.invoke(ctx)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/typer/main.py", line 698, in wrapper
    return callback(**use_params)
  File "/home/incognito/htrflow-pylaia/src/htrflow/cli.py", line 111, in run_pipeline
    collection = pipe.run(collection)
  File "/home/incognito/htrflow-pylaia/src/htrflow/pipeline/pipeline.py", line 30, in run
    collection = step.run(collection)
  File "/home/incognito/htrflow-pylaia/src/htrflow/pipeline/steps.py", line 105, in run
    result = self.model(collection.segments(), **self.generation_kwargs)
  File "/home/incognito/htrflow-pylaia/src/htrflow/models/base_model.py", line 121, in __call__
    return self.predict(images, **kwargs)
  File "/home/incognito/htrflow-pylaia/src/htrflow/models/base_model.py", line 109, in predict
    batch_results = self._predict(scaled_batch, **kwargs)
  File "/home/incognito/htrflow-pylaia/src/htrflow/models/teklia/pylaia.py", line 152, in _predict
    decode(
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/laia/scripts/htr/decode_ctc.py", line 124, in run
    trainer.test(evaluator_module, datamodule=data_module, verbose=False)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 706, in test
    results = self._run(model)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 918, in _run
    self._dispatch()
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 982, in _dispatch
    self.accelerator.start_evaluating(self)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/accelerators/accelerator.py", line 95, in start_evaluating
    self.training_type_plugin.start_evaluating(trainer)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 165, in start_evaluating
    self._results = trainer.run_stage()
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 993, in run_stage
    return self._run_evaluate()
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1079, in _run_evaluate
    eval_loop_results = self._evaluation_loop.run()
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
    dl_outputs = self.epoch_loop.run(
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 110, in advance
    output = self.evaluation_step(batch, batch_idx, dataloader_idx)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 150, in evaluation_step
    output = self.trainer.accelerator.test_step(step_kwargs)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/accelerators/accelerator.py", line 226, in test_step
    return self.training_type_plugin.test_step(*step_kwargs.values())
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 181, in test_step
    return self.model.test_step(*args, **kwargs)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/laia/engine/evaluator_module.py", line 23, in test_step
    with exception_catcher(
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/incognito/htrflow-pylaia/htrflow-pylaia-py3.10/lib/python3.10/site-packages/laia/engine/engine_exception.py", line 29, in exception_catcher
    raise EngineException(
laia.engine.engine_exception.EngineException: Exception "ValueError('Input images must have a fixed height of 16 pixels, found [19]')" raised during epoch=0, global_step=0 with batch=['fc00fe3b-cd7f-4307-9dbf-5b1ac75d0eae']
Decoding:   0%|          | 0/8 [00:00<?, ?it/s]
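
Reading the error: the sequencer expects a feature height of 16, and since the default fixed input height is 128, the convolutional stack evidently downsamples height by a factor of 8. Under that assumption (a sketch of the arithmetic; the exact pooling layout is not shown in the logs), the reported value maps back to a line crop that was never resized to 128 px:

```
128 / 8 = 16  ->  the feature height the sequencer expects
152 / 8 = 19  ->  "found [19]" corresponds to a roughly 152 px tall input line
```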

Here is how they run PyLaia with the API rather than the CLI:

https://huggingface.co/spaces/Teklia/PyLaia/blob/main/app.py

johnlockejrr added the bug label on Jan 18, 2025
@johnlockejrr (Author)

I think this will be addressed here: #25

@Borg93 (Contributor)

Borg93 commented Jan 22, 2025

Fixed in v0.2.1

Borg93 closed this as completed Jan 22, 2025
@johnlockejrr (Author)

Still not solved:

(htrflow-py3.10) incognito@DESKTOP-H1BS9PO:~/htrflow$ htrflow pipeline yolo2pylaia_RTL.yaml CBL_Ms._Heb_751_235.jpg
DESKTOP-H1BS9PO - 2025-01-23 09:17:38 UTC - INFO - Include schema from 'file:///home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/xmlschema/schemas/XSD_1.1/xsd11-extra.xsd'
DESKTOP-H1BS9PO - 2025-01-23 09:17:38 UTC - INFO - Importing 1 input images with batch size 1
DESKTOP-H1BS9PO - 2025-01-23 09:17:38 UTC - INFO - Initialized collection 'None' with 1 pages
DESKTOP-H1BS9PO - 2025-01-23 09:17:38 UTC - INFO - Running step Segmentation (step 1 / 4)
model.pt: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 54.8M/54.8M [00:01<00:00, 34.8MB/s]
DESKTOP-H1BS9PO - 2025-01-23 09:17:42 UTC - INFO - Initialized YOLO model 'johnlockejrr/yolov8-samaritan-segmentation' from .cache/models--johnlockejrr--yolov8-samaritan-segmentation/snapshots/72a460b68b240d2195457167a427b3f76c04417d/model.pt on device cuda:0
DESKTOP-H1BS9PO - 2025-01-23 09:17:42 UTC - INFO - Model 'YOLO' on device 'cuda' received 1 images in batches of 1 images per batch (1 batches)
YOLO: Running inference (batch size 1):   0%|          | 0/1 [00:00<?, ?it/s]
DESKTOP-H1BS9PO - 2025-01-23 09:17:42 UTC - INFO - YOLO: Running inference on 1 images (batch 1 of 1)
YOLO: Running inference (batch size 1): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.25s/it]
DESKTOP-H1BS9PO - 2025-01-23 09:17:43 UTC - INFO - Running step TextRecognition (step 2 / 4)
DESKTOP-H1BS9PO - 2025-01-23 09:17:43 UTC - INFO - Downloading/loading PyLaia model 'johnlockejrr/pylaia-samaritan_v1' from the Hugging Face Hub...
lexicon.txt: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 242/242 [00:00<00:00, 1.02MB/s]
statistics.md: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.51k/4.51k [00:00<00:00, 19.6MB/s]
README.md: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 821/821 [00:00<00:00, 6.75MB/s]
model: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.51k/1.51k [00:00<00:00, 15.7MB/s]
metrics.csv: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19.5k/19.5k [00:00<00:00, 102MB/s]
.gitattributes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.58k/1.58k [00:00<00:00, 19.4MB/s]
language_model.arpa.gz: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 772k/772k [00:00<00:00, 6.23MB/s]
syms.txt: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 225/225 [00:00<00:00, 1.60MB/s]
language_model.binary: 100%|████████████████████████████████| 1.79M/1.79M [00:00<00:00, 11.1MB/s]
tokens.txt: 100%|████████████████████████████████| 121/121 [00:00<00:00, 1.36MB/s]
weights.ckpt: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 42.5M/42.5M [00:01<00:00, 33.4MB/s]
Fetching 11 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:02<00:00,  5.15it/s]
DESKTOP-H1BS9PO - 2025-01-23 09:17:46 UTC - INFO - Initialized PyLaiaModel from 'johnlockejrr/pylaia-samaritan_v1' on device 'cuda'.
DESKTOP-H1BS9PO - 2025-01-23 09:17:46 UTC - INFO - Model 'PyLaia' on device 'cuda' received 26 images in batches of 8 images per batch (4 batches)
PyLaia: Running inference (batch size 8):   0%|          | 0/4 [00:00<?, ?it/s]
DESKTOP-H1BS9PO - 2025-01-23 09:17:46 UTC - INFO - PyLaia: Running inference on 8 images (batch 1 of 4)
DESKTOP-H1BS9PO - 2025-01-23 09:17:46 UTC - INFO - Using checkpoint ".cache/models--johnlockejrr--pylaia-samaritan_v1/snapshots/844c5e4fadb1a0f5c82bfde97408592425bef747/weights.ckpt"
DESKTOP-H1BS9PO - 2025-01-23 09:17:46 UTC - INFO - Loaded model .cache/models--johnlockejrr--pylaia-samaritan_v1/snapshots/844c5e4fadb1a0f5c82bfde97408592425bef747/model
Loading the LM will be faster if you build a binary file.
Reading .cache/models--johnlockejrr--pylaia-samaritan_v1/snapshots/844c5e4fadb1a0f5c82bfde97408592425bef747/language_model.arpa.gz
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
DESKTOP-H1BS9PO - 2025-01-23 09:17:46 UTC - INFO - GPU available: True, used: True
DESKTOP-H1BS9PO - 2025-01-23 09:17:46 UTC - INFO - TPU available: False, using: 0 TPU cores
DESKTOP-H1BS9PO - 2025-01-23 09:17:46 UTC - INFO - IPU available: False, using: 0 IPUs
DESKTOP-H1BS9PO - 2025-01-23 09:17:46 UTC - INFO - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
PyLaia: Running inference (batch size 8):   0%|                                                                                                                                                                                                                   | 0/4 [00:00<?, ?it/s]
DESKTOP-H1BS9PO - 2025-01-23 09:17:46 UTC - ERROR - Pipeline failed on step TextRecognition (step 2 / 4)
Traceback (most recent call last):
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/laia/engine/engine_exception.py", line 27, in exception_catcher
    yield
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/laia/engine/evaluator_module.py", line 28, in test_step
    return self.model(batch_x)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/laia/models/htr/laia_crnn.py", line 117, in forward
    x = self.sequencer(x)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/laia/nn/image_pooling_sequencer.py", line 51, in forward
    raise ValueError(
ValueError: Input images must have a fixed height of 16 pixels, found [23]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/incognito/htrflow/src/htrflow/pipeline/pipeline.py", line 30, in run
    collection = step.run(collection)
  File "/home/incognito/htrflow/src/htrflow/pipeline/steps.py", line 105, in run
    result = self.model(collection.segments(), **self.generation_kwargs)
  File "/home/incognito/htrflow/src/htrflow/models/base_model.py", line 121, in __call__
    return self.predict(images, **kwargs)
  File "/home/incognito/htrflow/src/htrflow/models/base_model.py", line 109, in predict
    batch_results = self._predict(scaled_batch, **kwargs)
  File "/home/incognito/htrflow/src/htrflow/models/teklia/pylaia.py", line 156, in _predict
    decode(
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/laia/scripts/htr/decode_ctc.py", line 124, in run
    trainer.test(evaluator_module, datamodule=data_module, verbose=False)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 706, in test
    results = self._run(model)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 918, in _run
    self._dispatch()
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 982, in _dispatch
    self.accelerator.start_evaluating(self)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/accelerators/accelerator.py", line 95, in start_evaluating
    self.training_type_plugin.start_evaluating(trainer)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 165, in start_evaluating
    self._results = trainer.run_stage()
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 993, in run_stage
    return self._run_evaluate()
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1079, in _run_evaluate
    eval_loop_results = self._evaluation_loop.run()
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
    dl_outputs = self.epoch_loop.run(
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 110, in advance
    output = self.evaluation_step(batch, batch_idx, dataloader_idx)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 150, in evaluation_step
    output = self.trainer.accelerator.test_step(step_kwargs)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/accelerators/accelerator.py", line 226, in test_step
    return self.training_type_plugin.test_step(*step_kwargs.values())
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 181, in test_step
    return self.model.test_step(*args, **kwargs)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/laia/engine/evaluator_module.py", line 23, in test_step
    with exception_catcher(
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/laia/engine/engine_exception.py", line 29, in exception_catcher
    raise EngineException(
laia.engine.engine_exception.EngineException: Exception "ValueError('Input images must have a fixed height of 16 pixels, found [23]')" raised during epoch=0, global_step=0 with batch=['0e6d077e-987b-40a8-9969-3fc8f2b12a5d']
Traceback (most recent call last):
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/laia/engine/engine_exception.py", line 27, in exception_catcher
    yield
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/laia/engine/evaluator_module.py", line 28, in test_step
    return self.model(batch_x)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/laia/models/htr/laia_crnn.py", line 117, in forward
    x = self.sequencer(x)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/laia/nn/image_pooling_sequencer.py", line 51, in forward
    raise ValueError(
ValueError: Input images must have a fixed height of 16 pixels, found [23]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/incognito/htrflow/htrflow-py3.10/bin/htrflow", line 8, in <module>
    sys.exit(app())
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/typer/main.py", line 340, in __call__
    raise e
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/typer/main.py", line 323, in __call__
    return get_command(self)(*args, **kwargs)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/typer/core.py", line 743, in main
    return _main(
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/typer/core.py", line 198, in _main
    rv = self.invoke(ctx)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/typer/main.py", line 698, in wrapper
    return callback(**use_params)
  File "/home/incognito/htrflow/src/htrflow/cli.py", line 111, in run_pipeline
    collection = pipe.run(collection)
  File "/home/incognito/htrflow/src/htrflow/pipeline/pipeline.py", line 30, in run
    collection = step.run(collection)
  File "/home/incognito/htrflow/src/htrflow/pipeline/steps.py", line 105, in run
    result = self.model(collection.segments(), **self.generation_kwargs)
  File "/home/incognito/htrflow/src/htrflow/models/base_model.py", line 121, in __call__
    return self.predict(images, **kwargs)
  File "/home/incognito/htrflow/src/htrflow/models/base_model.py", line 109, in predict
    batch_results = self._predict(scaled_batch, **kwargs)
  File "/home/incognito/htrflow/src/htrflow/models/teklia/pylaia.py", line 156, in _predict
    decode(
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/laia/scripts/htr/decode_ctc.py", line 124, in run
    trainer.test(evaluator_module, datamodule=data_module, verbose=False)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 706, in test
    results = self._run(model)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 918, in _run
    self._dispatch()
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 982, in _dispatch
    self.accelerator.start_evaluating(self)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/accelerators/accelerator.py", line 95, in start_evaluating
    self.training_type_plugin.start_evaluating(trainer)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 165, in start_evaluating
    self._results = trainer.run_stage()
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 993, in run_stage
    return self._run_evaluate()
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1079, in _run_evaluate
    eval_loop_results = self._evaluation_loop.run()
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
    dl_outputs = self.epoch_loop.run(
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 110, in advance
    output = self.evaluation_step(batch, batch_idx, dataloader_idx)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 150, in evaluation_step
    output = self.trainer.accelerator.test_step(step_kwargs)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/accelerators/accelerator.py", line 226, in test_step
    return self.training_type_plugin.test_step(*step_kwargs.values())
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 181, in test_step
    return self.model.test_step(*args, **kwargs)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/laia/engine/evaluator_module.py", line 23, in test_step
    with exception_catcher(
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/incognito/htrflow/htrflow-py3.10/lib/python3.10/site-packages/laia/engine/engine_exception.py", line 29, in exception_catcher
    raise EngineException(
laia.engine.engine_exception.EngineException: Exception "ValueError('Input images must have a fixed height of 16 pixels, found [23]')" raised during epoch=0, global_step=0 with batch=['0e6d077e-987b-40a8-9969-3fc8f2b12a5d']
Decoding:   0%|          | 0/8 [00:00<?, ?it/s]****

v0.2.1

@johnlockejrr
Copy link
Author

johnlockejrr commented Jan 23, 2025

Should I somehow specify min_height?
I learned the default value is 128, so the padding isn't enough; we need to resize (see my code in #25).
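
To see why padding alone fails, here is a toy comparison (a hypothetical 30 px tall crop; the numbers and OpenCV calls are illustrative, not htrflow code):

```python
import cv2
import numpy as np

# Hypothetical 30 x 400 grayscale line crop
line = np.zeros((30, 400), dtype=np.uint8)

# Padding to 128 px keeps the strokes at 30 px -- unlike what the model saw in training:
padded = cv2.copyMakeBorder(line, 0, 128 - 30, 0, 0, cv2.BORDER_CONSTANT, value=255)

# Resizing scales the strokes up to the 128 px height the model was trained on:
resized = cv2.resize(line, (round(400 * 128 / 30), 128))
```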

My rough version of htrflow-pylaia/src/htrflow/models/teklia/pylaia.py that works:

import logging
import multiprocessing
import re
import sys
from contextlib import redirect_stdout
from pathlib import Path
from tempfile import NamedTemporaryFile, mkdtemp
from uuid import uuid4

import cv2
import numpy as np
import pydantic
from huggingface_hub import model_info, snapshot_download
from laia.common.arguments import CommonArgs, DataArgs, DecodeArgs, TrainerArgs
from laia.scripts.htr.decode_ctc import run as decode

from htrflow.models.base_model import BaseModel
from htrflow.results import Result
from htrflow.utils.imgproc import pad_image


logger = logging.getLogger(__name__)


class PyLaia(BaseModel):
    """
    A minimal HTRflow-style model wrapper around PyLaia.

    Uses Teklia's implementation of PyLaia. For further
    information, see:
    https://atr.pages.teklia.com/pylaia/usage/prediction/#decode-arguments

    Example usage with the `TextRecognition` step:
    ```yaml
    - step: TextRecognition
      settings:
        model: PyLaia
        model_settings:
          model: Teklia/pylaia-belfort
          device: cuda
          revision: d35f921605314afc7324310081bee55a805a0b9f
        generation_settings:
          batch_size: 8
          temperature: 1
    ```
    """

    IMAGE_ID_PATTERN = r"(?P<image_id>[-a-z0-9]{36})"
    CONFIDENCE_PATTERN = r"(?P<confidence>[0-9.]+)"
    TEXT_PATTERN = r"\s*(?P<text>.*)\s*"
    LINE_PREDICTION = re.compile(rf"{IMAGE_ID_PATTERN} {CONFIDENCE_PATTERN} {TEXT_PATTERN}")

    def __init__(
        self,
        model: str,
        revision: str | None = None,
        **kwargs,
    ):
        """
        Arguments:
            model (str):
                The Hugging Face Hub repository ID or a local path with PyLaia artifacts:
                - weights.ckpt
                - syms.txt
                - (optionally) language_model.arpa.gz, lexicon.txt, tokens.txt
            revision: Optional revision of the Huggingface repository.
            kwargs:
                Additional kwargs passed to BaseModel.__init__ (e.g., 'device').
        """
        super().__init__(**kwargs)

        model_info_dict = get_pylaia_model(model, revision=revision)
        self.model_dir = model_info_dict.model_dir
        model_version = model_info_dict.model_version
        self.use_language_model = model_info_dict.use_language_model
        self.language_model_params = model_info_dict.language_model_params

        self.metadata.update(
            {
                "model": model,
                "model_version": model_version,
            }
        )

        logger.info(f"Initialized PyLaiaModel from '{model}' on device '{self.device}'.")

    def _predict(self, images: list[np.ndarray], **decode_kwargs) -> list[Result]:
        """
        PyLaia-specific prediction method: runs text recognition.

        Args:
            images (list[np.ndarray]):
                List of images as NumPy arrays (e.g., shape [H, W, C]).
            batch_size (int, optional):
                Batch size for decoding. Defaults to 1.
            reading_order (str, optional):
                Reading order for text recognition. Defaults to "LTR".
            num_workers (int, optional):
                Number of workers for parallel processing. Defaults to `multiprocessing.cpu_count()`.

        Returns:
            list[Result]:
                A list of Result objects containing recognized text and
                optionally confidence scores.
        """

        temperature = decode_kwargs.get("temperature", 1.0)
        batch_size = decode_kwargs.get("batch_size", 1)
        reading_order = decode_kwargs.get("reading_order", "LTR")
        num_workers = decode_kwargs.get("num_workers", multiprocessing.cpu_count())

        common_args = CommonArgs(
            checkpoint="weights.ckpt",
            train_path=str(self.model_dir),
            experiment_dirname="",
        )

        data_args = DataArgs(
            batch_size=batch_size, color_mode="L", reading_order=reading_order, num_workers=num_workers
        )

        gpus_flag = 1 if self.device.type == "cuda" else 0
        trainer_args = TrainerArgs(gpus=gpus_flag)

        decode_args = DecodeArgs(
            include_img_ids=True,
            join_string="",
            convert_spaces=True,
            print_line_confidence_scores=True,
            print_word_confidence_scores=False,
            temperature=temperature,
            use_language_model=self.use_language_model,
            **self.language_model_params.model_dump(),
        )

        # Note: PyLaia's 'decode' function expects disk-based file paths rather than in-memory data.
        # Because it is tightly integrated as a CLI tool, we must create temporary image files
        # and pass their paths to the PyLaia decoder. Otherwise, PyLaia cannot process these images.
        tmp_images_dir = Path(mkdtemp())
        logger.debug(f"Created temp folder for images: {tmp_images_dir}")

        image_ids = [str(uuid4()) for _ in images]

        DEFAULT_HEIGHT = 128

        def get_width(image, height=DEFAULT_HEIGHT):
            # Get the aspect ratio of the image
            aspect_ratio = image.shape[1] / image.shape[0]  # width / height
            return int(height * aspect_ratio)

        for img_id, np_img in zip(image_ids, images):
            #padded_img = _ensure_min_width(np_img, 120, 125)  # Just to fix the min pixel width issue
            # Resize the image to the desired height while maintaining the aspect ratio
            new_width = get_width(np_img)
            resized_img = cv2.resize(np_img, (new_width, DEFAULT_HEIGHT))
            cv2.imwrite(str(tmp_images_dir / f"{img_id}.jpg"), resized_img)

        with NamedTemporaryFile() as pred_stdout, NamedTemporaryFile() as img_list:
            Path(img_list.name).write_text("\n".join(image_ids))

            with redirect_stdout(open(pred_stdout.name, mode="w")):
                decode(
                    syms=str(self.model_dir / "syms.txt"),
                    img_list=img_list.name,
                    img_dirs=[str(tmp_images_dir)],
                    common=common_args,
                    data=data_args,
                    trainer=trainer_args,
                    decode=decode_args,
                    num_workers=num_workers,
                )
                sys.stdout.flush()

            decode_output_lines = Path(pred_stdout.name).read_text().strip().splitlines()

        results = []
        metadata = self.metadata | {"decode_kwargs": decode_kwargs}

        for line in decode_output_lines:
            match = self.LINE_PREDICTION.match(line)
            if not match:
                logger.warning("Could not parse line: %s", line)
                continue
            _, score_str, text = match.groups()  # _ = image_id

            try:
                score_val = float(score_str)
            except ValueError:
                score_val = 0.0

            result = Result.text_recognition_result(metadata, [text], [score_val])
            results.append(result)

        logger.debug(f"PyLaia recognized {len(results)} lines of text.")

        return results


class LanguageModelParams(pydantic.BaseModel):
    """Pydantic model for language model parameters."""

    language_model_weight: float = 1.0
    language_model_path: str = ""
    lexicon_path: str = ""
    tokens_path: str = ""



class PyLaiaModelInfo(pydantic.BaseModel):
    """
    Pydantic model specifying what `get_pylaia_model` should return.
    """

    model_config = pydantic.ConfigDict(protected_namespaces=())

    model_dir: Path
    model_version: str
    use_language_model: bool
    language_model_params: LanguageModelParams


def get_pylaia_model(
    model: str,
    revision: str | None = None,
    cache_dir: str | None = ".cache",
) -> PyLaiaModelInfo:
    """
    Encapsulates logic for retrieving a PyLaia model (from either a local path
    or by downloading from the Hugging Face Hub), and detecting whether a
    language model is available.

    Args:
        model (str):
            - If this is a valid local directory path, we assume it contains
              the necessary PyLaia files and use that directly.
            - Otherwise, we treat `model` as a Hugging Face Hub repo_id and
              download from the HF Hub.
        revision (str | None, optional):
            Git branch, tag, or commit SHA to download from the HF Hub.
            If None, the default branch or tag is used.
        cache_dir (str | None, optional):
            Path to the folder where cached files are stored. Defaults to ".cache".

    Returns:
        PyLaiaModelInfo: A data class with these fields:
            - model_dir (Path): Local path to the model directory
            - model_version (str): "local" if loaded from directory, or commit SHA if from HF
            - use_language_model (bool): Whether a language model is present
            - language_model_params (LanguageModelParams): Additional arguments for the LM
    """

    model_dir, model_version = _download_or_local_path(model, revision, cache_dir)
    use_language_model, language_model_params = _detect_language_model(model_dir)

    logger.debug(f"Model directory: {model_dir}")
    logger.debug(f"Model version: {model_version}")
    logger.debug(f"Use language model: {use_language_model}")

    return PyLaiaModelInfo(
        model_dir=model_dir,
        model_version=model_version,
        use_language_model=use_language_model,
        language_model_params=language_model_params,
    )


def _download_or_local_path(
    model: str,
    revision: str | None = None,
    cache_dir: str | None = None,
) -> tuple[Path, str]:
    """
    If 'model' is a local directory, model_version = "local".
    Otherwise, fetch from HF, and model_version = commit SHA (optionally
    at a specific `revision`).
    """
    model_path = Path(model)
    if model_path.is_dir():
        logger.info(f"Using local PyLaia model from: {model_path}")
        return model_path, "local"
    else:
        logger.info(f"Downloading/loading PyLaia model '{model}' from the Hugging Face Hub...")
        downloaded_dir = Path(snapshot_download(repo_id=model, revision=revision, cache_dir=cache_dir))
        if revision:
            version_sha = model_info(model, revision=revision).sha
        else:
            version_sha = model_info(model).sha

        return downloaded_dir, version_sha


def _detect_language_model(model_dir: Path) -> tuple[bool, LanguageModelParams]:
    """
    Checks if 'tokens.txt' is present in the model_dir, and if so,
    updates language model parameters accordingly.
    """
    tokens_file = model_dir / "tokens.txt"
    use_language_model = tokens_file.exists()
    language_model_params = {"language_model_weight": 1.0}

    if use_language_model:
        arpa_file = model_dir / "language_model.arpa.gz"
        lexicon_file = model_dir / "lexicon.txt"

        language_model_params.update(
            {
                "language_model_path": str(arpa_file) if arpa_file.exists() else "",
                "lexicon_path": str(lexicon_file) if lexicon_file.exists() else "",
                "tokens_path": str(tokens_file),
            }
        )

    return use_language_model, LanguageModelParams(**language_model_params)


def _ensure_min_width(img: np.ndarray, min_width: int, target_width: int) -> np.ndarray:
    """
    Ensures an image meets a minimum width by padding if necessary.

    Args:
        img (np.ndarray): Input image.
        min_width (int): Minimum width before padding.
        target_width (int): Final width after padding.

    Returns:
        np.ndarray: The padded image (if needed).
    """
    _, width = img.shape[:2]
    if width < min_width:
        total_pad = target_width - width
        left_pad = total_pad // 2
        right_pad = total_pad - left_pad
        return pad_image(img, pad_left=left_pad, pad_right=right_pad)
    return img
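
For completeness: a later comment references an `_ensure_min_height` helper that is not shown above. A minimal sketch of what it might look like, reconstructed by analogy with `_ensure_min_width` (my guess, not the actual htrflow code; it uses OpenCV directly since I haven't verified pad_image's vertical-padding signature):

```python
def _ensure_min_height(img: np.ndarray, min_height: int) -> np.ndarray:
    """Pad an image vertically up to a minimum height.

    Hypothetical reconstruction of the helper quoted later in this thread.
    Assumes a light (255) background, as is typical for line crops.
    """
    height = img.shape[0]
    if height < min_height:
        total_pad = min_height - height
        top_pad = total_pad // 2
        bottom_pad = total_pad - top_pad
        return cv2.copyMakeBorder(
            img, top_pad, bottom_pad, 0, 0, cv2.BORDER_CONSTANT, value=255
        )
    return img
```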

@Borg93 (Contributor)

Borg93 commented Jan 23, 2025

> Should I somehow specify min_height? I learned the default value is 128, so the padding isn't enough; we need to resize (see my code in #25).

Hi, just to be sure, have you updated htrflow to v0.2.1?

@johnlockejrr (Author)

johnlockejrr commented Jan 23, 2025

git clone -b v0.2.1 https://github.com/AI-Riksarkivet/htrflow.git

Yes, I did. From the src:

padded_img = _ensure_min_height(np_img, 128)  # Just to fix the min pixel height (defaults to 128)

Is the v0.2.1 version updated on pip, so I can directly install it with pip install htrflow[teklia]?

I installed it in another env with pip install htrflow[teklia], same thing.

laia.engine.engine_exception.EngineException: Exception "ValueError('Input images must have a fixed height of 16 pixels, found [23]')" raised during epoch=0, global_step=0 with batch=['d044ee5e-a09c-4390-a807-2a77f5a1085b']

Borg93 reopened this Jan 24, 2025