add native text rendering to muPDF backend #1159

mbway · 2024-09-02T22:41:45Z

This is an investigation into the feasibility of rendering text to PDF in such a way that a viewer application can understand the text information (i.e. the text can be selected and copied). This was requested in #1158.

This PR introduces the TextPolicy.NATIVE setting, which passes text information directly to the backend rather than 'baking' it into a series of glyph shapes, since baking loses the original text information.

If the backend supports text, it can implement the draw_text method, which gets called when TextPolicy.NATIVE is used. The PDF backend supports loading fonts and rendering text with arbitrary 2D transformations defined by a matrix.
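The dispatch described above can be sketched roughly as follows. This is a toy model, not the code in this PR: RecordingBackend, render_text, supports_text, draw_filled_paths and the FILLING member are hypothetical names made up for illustration, and the fallback to baked glyphs for backends without text support is an assumption. Only TextPolicy.NATIVE and draw_text are named in the description.

```python
from enum import Enum, auto

# PDF-style affine matrix (a, b, c, d, e, f); this one is the identity.
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


class TextPolicy(Enum):
    # Hypothetical two-member enum; only NATIVE is taken from the PR text.
    FILLING = auto()  # bake the text into filled glyph shapes
    NATIVE = auto()   # pass the text string through to the backend


class RecordingBackend:
    """Toy backend that records which drawing method was called."""

    def __init__(self, supports_text):
        self.supports_text = supports_text  # hypothetical capability flag
        self.calls = []

    def draw_filled_paths(self, text):
        # Stand-in for drawing the baked glyph outlines of `text`.
        self.calls.append(("glyphs", text))

    def draw_text(self, text, matrix):
        # Stand-in for a native text call that keeps the string intact.
        self.calls.append(("text", text, matrix))


def render_text(backend, policy, text, matrix=IDENTITY):
    """Send text to draw_text() only for NATIVE and a capable backend."""
    if policy is TextPolicy.NATIVE and backend.supports_text:
        backend.draw_text(text, matrix)
    else:
        # Assumed fallback: bake to glyphs, as the other policies do.
        backend.draw_filled_paths(text)


pdf_like = RecordingBackend(supports_text=True)
render_text(pdf_like, TextPolicy.NATIVE, "A1")
render_text(pdf_like, TextPolicy.FILLING, "A2")
print(pdf_like.calls)
```

In this sketch the text string only survives to the backend on the NATIVE path; on every other path the backend sees geometry alone.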

The current implementation relies on a magic number to scale the font size so that it matches the baked text from the frontend (which I am treating as ground truth). For the files I have access to, a value of 1.375 results in almost identical text; however, other fonts may require a different scale factor to get exact results.
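A minimal sketch of how such a factor might be applied and where a per-font value could come from. It assumes the frontend measures text by cap height while the PDF font size is the em size; that reading is not confirmed in this PR. Both function names and the metric values in the usage lines are hypothetical.

```python
FONT_SCALE = 1.375  # the empirical factor quoted above


def native_font_size(cap_height, scale=FONT_SCALE):
    """Convert a cap height (assumed frontend measure) to a font size."""
    return cap_height * scale


def scale_from_metrics(units_per_em, cap_height_units):
    """Derive a per-font factor as em size divided by cap height.

    Both arguments are in font design units, as found in a font's
    metrics tables.
    """
    return units_per_em / cap_height_units


# A cap height of 2.5 drawing units with the empirical factor:
print(native_font_size(2.5))

# Made-up metrics for illustration: 2048 units per em, cap height 1490.
print(round(scale_from_metrics(2048, 1490), 4))
```

Under this assumption, 1.375 corresponds to a font whose cap height is about 0.727 em (1 / 1.375), which would explain why other fonts need a different value.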

In the following screenshot, white is baked text and red is 'native' PDF text.

mbway · 2024-09-02T22:47:29Z

Another possibility for the text rendering is to use render_mode=3, which makes the text invisible but still allows the area it occupies to be selected. This is typically used for overlaying OCR text on scanned documents. This approach could be used if you prefer to always use the baked glyphs (because you can do advanced clipping etc.) but still want the option to select the text. I think using insert_text instead of baking glyphs should result in a smaller file size though, so it may be desirable in some situations even if it is slightly less accurate.

mozman · 2024-09-03T05:21:19Z

Sorry, but I will not add more complexity to the rendering process.

mbway · 2024-09-03T08:57:35Z

Are you able to elaborate on what you don't like about this solution? Maybe I can find something that you like better?

mozman · 2024-09-03T14:46:41Z

src/ezdxf/addons/drawing/pipeline.py

+            if bbox is None:
+                return
+            abstract_font = self.text_engine.get_font(font_face)
+            self.backend.draw_text(


This seems to bypass the clipping stage, so text in viewports and clipped INSERTs will always be drawn?

Is clipping done in the pipeline? If so, I think you are correct. I didn't really handle clipping, so I can't comment on whether it would be difficult to add.

mozman · 2024-09-03T14:48:38Z

src/ezdxf/addons/drawing/pymupdf.py

@@ -424,6 +429,46 @@ def draw_image(self, image_data: ImageData, properties: BackendProperties) -> No
            oc=self.get_optional_content_group(properties.layer),
        )

+    def register_font(self, font: AbstractFont) -> int:


I guess this implementation cannot handle SHX fonts.

I would assume so. I wonder what AutoCAD and other CAD programs do when exporting if non-TTF fonts are used?

If both exact rendering and selectable/interpretable text were desirable, then invisible text could hypothetically be layered on top of the baked glyphs.

mozman · 2024-09-03T14:58:07Z

I liked the fact that the burden of rendering text in backends was removed. This feature is only optional, but questions will still arise as to why the text looks different with different backends.

This implementation skips the clipping stage, so it renders text outside of VIEWPORTs and clipped INSERT entities, and of course it cannot render SHX fonts.

I think this feature causes more problems than it solves.

mbway · 2024-09-03T15:05:12Z

I definitely see your viewpoint that users may be confused by the edge cases and that discarding text information before the backend results in simpler backends. For that reason I wouldn't suggest making this text policy the default. But I would think that in some cases the ability to further process/analyze the output outweighs inaccuracies in rendering. I am not a user with this requirement though, so I don't mind if we skip this feature.

It is a shame that, without giving the backend more access, a user with this requirement cannot easily maintain a custom backend that handles text differently. I think a large restructuring would be required to allow this flexibility.

For now I suppose anyone who needs text information in the resulting PDF can use this branch and let me know if they want it rebased in future.

mozman · 2024-09-03T18:43:07Z

I created this tool with my needs in mind (I have been working with CAD for civil engineers for over 25 years) and I don't understand why users would want to create, edit and extract text in DXF files when that's what CAD applications are for. I have always been interested only in the geometry stored in DXF files and in automating geometry creation: an application-independent scripting tool.

However, if someone wants to select/extract text from a DXF file, I recommend a tool called ezdxf😄:

import ezdxf

doc = ezdxf.readfile("your.dxf")
for entity in doc.query("MTEXT TEXT"):
    print(entity.dxf.text)  # or write it into a file

mbway · 2024-09-03T19:05:41Z

There is some benefit in being able to access the text in its rendered form (i.e. position, colour etc.), as DXF does not store this information plainly (hence the need for the rendering frontend). However, the use case for this may be niche, and I'll let users who have an actual use case advocate for it, if there are any.

Another benefit of letting PDF handle the text is smaller file sizes, but again that may not be a priority.

mbway force-pushed the pdf_text branch from 1c64eed to 63c340a · September 2, 2024 22:43

mbway force-pushed the pdf_text branch from 63c340a to dd62ba5 · September 2, 2024 22:58

add native text rendering to muPDF backend

f4ebd43

mbway force-pushed the pdf_text branch from dd62ba5 to f4ebd43 · September 2, 2024 23:00

mozman reviewed Sep 3, 2024

mbway closed this Sep 3, 2024
