How do I correct line recognition? #33

zabak · 2020-05-13T12:53:54Z

When correcting OCR for the National Museum, I found that the regions were mostly recognized without problems. However, there were cases where part of a word was not detected as part of its line. Sometimes it was not detected at all:

Other times, there were overlapping or duplicate detections:

How should I treat these cases when correcting?

Should I write the whole line, including the letters that fall outside the line as marked on the image? Should I write the same text twice when correcting overlapping lines?

zabak · 2022-03-17T21:01:18Z

This is really a problem - please implement support for fixing the text line boundaries and use them to make a better model.

zabak · 2022-03-17T21:01:42Z

see also https://pero-ocr.fit.vutbr.cz/document/collaborators/13e60e04-734b-464a-b63d-eceb8d0a2563

michal-hradis · 2022-03-18T08:16:05Z

We are debugging the layout editor. Normally, you edit text at /ocr/show_results/... The new interface is running at /ocr/show_results_new/... Just change the URL manually. I'll probably have to explain how the interface works.
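For anyone switching between the two views often, the URL change can be scripted. Below is a minimal sketch in Python, assuming the rest of the path after the prefix stays the same in the new interface. The function name `to_new_editor_url` and the `DOCUMENT_ID` placeholder are hypothetical, and the host is only assumed from the link earlier in this thread.

```python
from urllib.parse import urlsplit, urlunsplit

# Path prefixes quoted in the comment above.
OLD_PREFIX = "/ocr/show_results/"
NEW_PREFIX = "/ocr/show_results_new/"


def to_new_editor_url(url: str) -> str:
    """Swap the old results path prefix for the new layout editor prefix.

    URLs that do not start with the old prefix are returned unchanged.
    """
    parts = urlsplit(url)
    if not parts.path.startswith(OLD_PREFIX):
        return url
    # Keep everything after the prefix (assumed to be the document identifier).
    new_path = NEW_PREFIX + parts.path[len(OLD_PREFIX):]
    return urlunsplit(parts._replace(path=new_path))


if __name__ == "__main__":
    # DOCUMENT_ID is a placeholder, not a real document.
    print(to_new_editor_url("https://pero-ocr.fit.vutbr.cz/ocr/show_results/DOCUMENT_ID"))
```

A URL that already points at /ocr/show_results_new/ is left untouched, so the helper is safe to apply twice.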

zabak · 2022-03-20T12:33:28Z

Yes please, I will need some explanation; the interface is not intuitive. The default zoom should fill the window width or height, not zoom out as far as it does now. There is no way to delete a row. I have no idea how to select two rows to join them, how to resize a row, or how to edit the shape of a region.

zabak · 2023-02-15T07:55:07Z

@michal-hradis please add the explanation here. Also, how can I revert OCR without losing the manually edited baselines?

michal-hradis · 2023-02-15T09:06:17Z

Text transcriptions can be regenerated any number of times without losing manual text corrections. Text line detection cannot be repeated without losing manual corrections.

How to edit text lines:

Select a line by left click.
Press CTRL to show control points.
Drag control points with the left mouse button. You can delete control points by moving the line end over them. You can't add control points.
Change the line height by dragging the "top" and "bottom" control points at the end and beginning of a line.

To delete lines:

Select a line.
Press ALT-B or press the "Delete line" button.
This option can be rolled back, but the line disappears only after the document page is reloaded.

An alternative way to delete lines, which cannot be rolled back and which currently tends to delete the whole text region if you are not careful:

Right click on a line.
Select delete in the context menu, but check carefully that it shows "Delete row" and not "Delete region"; the second one appears by mistake if you move the mouse while clicking.

Add lines:

Select a region.
Select the tool "Create new row (baseline)".
Create a baseline by left clicking. Finish the baseline by pressing Enter or right clicking.
Set the line height, top part and bottom part separately. Use left click to set the heights.
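To illustrate why the top and bottom heights are set separately, here is a rough sketch in Python of how a text line outline could be derived from a baseline plus two heights. It is not the editor's actual geometry: the function name `line_polygon` is hypothetical, and it assumes image coordinates with y growing downward and heights applied as plain vertical offsets.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def line_polygon(baseline: List[Point],
                 height_above: float,
                 height_below: float) -> List[Point]:
    """Build a closed outline around a baseline.

    Simplifying assumption: offsets are purely vertical, which is only
    reasonable for lines that are close to horizontal.
    """
    # Upper edge, left to right: shift each baseline point up.
    top = [(x, y - height_above) for x, y in baseline]
    # Lower edge, right to left: shift each baseline point down.
    bottom = [(x, y + height_below) for x, y in reversed(baseline)]
    return top + bottom


if __name__ == "__main__":
    # A flat two-point baseline at y=100, with 30 px above and 10 px below.
    print(line_polygon([(0, 100), (50, 100)], 30, 10))
```

The part above the baseline usually has to cover ascenders and capitals, while the part below only has to cover descenders, so the two values typically differ.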

Edit regions:

Select a region.
Press ALT to show the region control points.
Drag the control points around.

You can further:

Add regions
Delete regions (deletes all associated text lines)
Merge text lines (does not work properly at the moment)

michal-hradis assigned kohuthonza Feb 15, 2023

michal-hradis added the enhancement label Feb 15, 2023
