
pchinery commented Oct 8, 2019

We came across a PDF file in which every page references the same resource dictionary, and that dictionary contains all fonts and images of the document. As a result, extracting a single page produces a very large file, because all fonts and images are embedded along with it. We can provide this file for tests, if desired.

The code changes now treat cloning the resource dictionary differently from cloning other objects: the resources are reduced to those actually used in the page content.
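The reduction described above could look roughly like the following. This is a minimal, self-contained Java sketch, not the code in this PR. It assumes the page content stream is already decoded into a `String` and models the resource dictionary as nested maps (category -> name -> object). `UsedResources`, `tokenize`, `findUsedResources` and `prune` are hypothetical names. The sketch tokenizes the content, records the last name operand seen before each resource-consuming operator (`Tf`, `Do`, `gs`, `sh`, `cs`/`CS`, `scn`/`SCN`), and then keeps only the matching entries.

```java
import java.util.*;

public class UsedResources {
    // Hypothetical mapping: operator -> resource category its name operand refers to.
    private static final Map<String, String> OPERATOR_CATEGORY = Map.of(
            "Tf", "Font", "Do", "XObject", "gs", "ExtGState", "sh", "Shading",
            "cs", "ColorSpace", "CS", "ColorSpace", "scn", "Pattern", "SCN", "Pattern");
    private static final String DELIMITERS = "()<>[]{}/%";

    // Splits decoded content into tokens. Strings become placeholders so that
    // text such as "(a /F9 Tf)" is never mistaken for names or operators.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0, n = s.length();
        while (i < n) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '%') { // comment: skip to end of line
                while (i < n && s.charAt(i) != '\n' && s.charAt(i) != '\r') i++;
            } else if (c == '(') { // literal string with nested parentheses and escapes
                int depth = 0;
                do {
                    char d = s.charAt(i);
                    if (d == '\\') i++; // skip the escaped character
                    else if (d == '(') depth++;
                    else if (d == ')') depth--;
                    i++;
                } while (i < n && depth > 0);
                tokens.add("(string)");
            } else if (s.startsWith("<<", i) || s.startsWith(">>", i)) {
                tokens.add(s.substring(i, i + 2));
                i += 2;
            } else if (c == '<') { // hex string
                while (i < n && s.charAt(i) != '>') i++;
                i++;
                tokens.add("<hex>");
            } else if (c == '[' || c == ']' || c == '{' || c == '}') {
                tokens.add(String.valueOf(c));
                i++;
            } else { // name (leading '/'), number or operator
                int start = i++;
                while (i < n && !Character.isWhitespace(s.charAt(i))
                        && DELIMITERS.indexOf(s.charAt(i)) < 0) i++;
                tokens.add(s.substring(start, i));
            }
        }
        return tokens;
    }

    static boolean isOperator(String tok) {
        char c = tok.charAt(0);
        boolean looksLikeOperator = Character.isLetter(c) || c == '\'' || c == '"';
        return looksLikeOperator && !Set.of("true", "false", "null").contains(tok);
    }

    // Returns category -> set of resource names referenced by the content.
    static Map<String, Set<String>> findUsedResources(String content) {
        Map<String, Set<String>> used = new TreeMap<>();
        String lastName = null; // most recent name operand since the last operator
        for (String tok : tokenize(content)) {
            if (tok.startsWith("/")) {
                lastName = tok.substring(1);
            } else if (isOperator(tok)) {
                String category = OPERATOR_CATEGORY.get(tok);
                if (category != null && lastName != null) {
                    used.computeIfAbsent(category, k -> new TreeSet<>()).add(lastName);
                }
                lastName = null; // every operator consumes its operands
            }
        }
        return used;
    }

    // Copies the resources, keeping only referenced entries; empty categories are dropped.
    static <T> Map<String, Map<String, T>> prune(Map<String, Map<String, T>> resources,
                                                 Map<String, Set<String>> used) {
        Map<String, Map<String, T>> result = new TreeMap<>();
        resources.forEach((category, entries) -> {
            Set<String> names = used.getOrDefault(category, Set.of());
            Map<String, T> kept = new TreeMap<>();
            entries.forEach((name, obj) -> { if (names.contains(name)) kept.put(name, obj); });
            if (!kept.isEmpty()) result.put(category, kept);
        });
        return result;
    }
}
```

For example, `findUsedResources("q /GS0 gs BT /F1 12 Tf (Hi) Tj ET /Im1 Do Q")` would report `GS0`, `F1` and `Im1`, and `prune` would then drop every other font, image and graphics state. The sketch deliberately leaves out several cases: it does not descend into form XObjects, tiling patterns, Type 3 fonts or annotation appearance streams. It also ignores marked-content property lists (`BDC`/`DP`) and inline image data, and it does not decode `#xx` escapes in names.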

There are a few questions open:

  • Are there (maybe indirect) ways to reference a resource from the content that are not considered here?
  • Is there a way to reuse the lexer/parser to identify used resources? (The current implementation is rather hacky.)
  • Are there any points that we have not considered properly here?

Any feedback is greatly appreciated and we'd love to see this ability in the main branch at some point.
