-
Notifications
You must be signed in to change notification settings - Fork 1.5k
ENH: FontDescriptor: Initiate from embedded font resource #3551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3551 +/- ##
==========================================
- Coverage 97.15% 97.15% -0.01%
==========================================
Files 56 56
Lines 9783 9802 +19
Branches 1784 1789 +5
==========================================
+ Hits 9505 9523 +18
Misses 167 167
- Partials 111 112 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| } | ||
|
|
||
| # CID fonts have a /W array mapping character codes to widths stashed in /DescendantFonts | ||
| if "/DescendantFonts" in pdf_font_dict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we able to move this whole conditional block into a dedicated method to keep methods short and readable where possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite, at least, not very elegantly. If I did, then the code for finding the widths should also return the font descriptor dictionary, which would be a rather ugly solution.
Conceptually, it would be a lot different if the FontDescriptor here would also be a subclass of a Font class. More on that below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... I can make this work, of course, will try later.
|
@stefan6419846 I've done some more thinking about this, also in light of trying to fix creating appearance streams for arabic text. I think we took a bit of a wrong turn when we added the character widths in the CORE_FONT_METRICS dict to the FontDescriptor class. In a PDF document, widths are defined immediately in the font dictionary, at the same level of depths as the font descriptor. So, in keeping with the logic of a pdf font resource, a Font class should consist of:
Now, conceptually, our font metrics consist of character widths and a font descriptor, but our FontDescriptor class includes the character widths as well. Furthermore, in order to solve the arabic text case, we would at least need to update the document's font resource, for which it would be really nice to have an associated Font class. I'd like to:
|
This patch copies over the logic for acquiring character widths from a pdf font dictionary fron the Font class in pypdf/_text_extraction/_layout_mode/_font.py. Later on, this makes it possible to initiate a FontDescriptor from an embedded font.
Replace the width_map in the Font class with the FontDescriptor class.
|
I have to admit that I can not completely follow you completely here due to my limited knowledge of how all the font aspects work and relate. If you plan another round of refactoring, please keep in mind to keep it understandable, maintainable and reviewable in a straightforward matter. |
This PR enables initiating a font descriptor based on a font resource for an embedded font. To that end, it takes the code for collecting character widths from the Font class in pypdf/_text_extraction/_layout_mode/_font.py and adds some lines to collect the
other font metrics. This would be necessary for forms that do not use the 14 Adobe standard fonts.
To reduce code duplication, I replaced the existing width_map in the Font class with the FontDescriptor class, which keeps all functionality intact, because it uses the same code for collecting font widths.
I did notice that character widths are in fact not collected for type1 and truetype fonts that specify their encoding as a string.