Magick.NET version
14.6.0 Q16-AnyCPU
Environment (Operating system, version and so on)
Both Windows 10 and Ubuntu 22, .NET 6
Description
I work with schools, and we get a lot of scanned PDFs that defy all reason and logic. Incorrect metadata is common, among other problems. I have a few such files, but unfortunately I can't upload them because they contain confidential client data.
According to the ImageMagick CLI, this particular PDF contains 612x792 pixel pages at 72 DPI. We have to read all images at a minimum of 300 DPI for OCR. However, when I read the document's metadata in Magick.NET using MagickImageInfo, it reports the page height as an absurd 4294966504 pixels.
This breaks our downstream conversion and produces poor-quality images. To choose the read density, we have an algorithm that estimates the print size by dividing the largest pixel dimension by the DPI. If that print size is huge, it assumes the image is already very high resolution and keeps the original 72 DPI instead of re-reading at 300 DPI, which is exactly what happens with this bogus height. We need that check because we have seen very high-resolution images tagged as "72 DPI", and reading those at 300 DPI produced absurdly large images. So, unfortunately, we cannot simply read every image at 300 DPI.
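To make the failure mode concrete, here is a minimal C# sketch of the kind of density heuristic described above. Only the print-size formula (largest dimension divided by DPI) comes from the description; the function name ChooseReadDensity and the 40-inch cutoff are hypothetical placeholders, not our actual code.

```csharp
using System;

// Hypothetical cutoff: a page that would print larger than this many inches
// at its reported density is assumed to be a high-resolution scan tagged as 72 DPI.
const double MaxPlausiblePrintInches = 40.0;
// Minimum density needed for OCR.
const double OcrDensity = 300.0;

// Print size in inches = largest pixel dimension / reported DPI.
double ChooseReadDensity(uint width, uint height, double reportedDpi)
{
    double printInches = Math.Max(width, height) / reportedDpi;

    // A huge print size suggests the pixels are already dense, so keep the
    // reported density; otherwise re-read the page at the OCR density.
    return printInches > MaxPlausiblePrintInches ? reportedDpi : OcrDensity;
}

// Values from identify: 792 / 72 = 11 inches, so the page is re-read at 300 DPI.
Console.WriteLine(ChooseReadDensity(612, 792, 72));

// Height reported by MagickImageInfo: the print size explodes, so 72 DPI is kept.
Console.WriteLine(ChooseReadDensity(612, 4294966504, 72));
```

With the identify values the page prints at 11 inches and is re-read at 300 DPI; with the wrapped height the computed print size is about 59.65 million inches, so any cutoff of this kind keeps 72 DPI.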
Looking at that value again, I realized that 4294966504 is awfully familiar. I think the issue is related to this line from the identify output below:
pdf:HiResBoundingBox: 612x-792+0+0
For whatever ridiculous reason, the height of the bounding box is negative. The height reported by MagickImageInfo, 4294966504, is exactly the maximum value of an unsigned 32-bit integer (4294967295) minus 791, i.e. 2^32 - 792: the value -792 wrapped around as an unsigned integer. This looks like an integer underflow, and it seems the library needs to take the absolute value of the bounding box dimensions somewhere.
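The arithmetic can be checked with a small C# snippet. It reproduces the wraparound and sketches a hypothetical client-side workaround, RecoverDimension, which assumes that no genuine page dimension exceeds int.MaxValue and therefore treats anything larger as a wrapped negative value. This is a stopgap for callers, not a fix in Magick.NET itself.

```csharp
using System;

// Height of the bounding box as stored in the PDF metadata (612x-792+0+0).
int boundingBoxHeight = -792;

// Converting a negative 32-bit value to uint wraps it modulo 2^32:
// 2^32 - 792 = 4294966504 = uint.MaxValue (4294967295) - 791.
uint wrappedHeight = unchecked((uint)boundingBoxHeight);
Console.WriteLine(wrappedHeight); // 4294966504

// Hypothetical workaround: assume no real page dimension exceeds int.MaxValue,
// so a larger value must be a wrapped negative number. Two's-complement
// negation (0 - value, unchecked) recovers its magnitude without overflow.
uint RecoverDimension(uint value) =>
    value > int.MaxValue ? unchecked(0u - value) : value;

Console.WriteLine(RecoverDimension(wrappedHeight)); // 792
Console.WriteLine(RecoverDimension(612));           // 612
```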
The server this application normally runs on uses Ubuntu 22, but I have also written a unit test on a Windows 10 machine that produces the same height value when reading the file with MagickImageInfo.
When running identify -verbose file.pdf, I get a completely reasonable analysis:
Image:
  Filename: file.pdf
  Format: PDF (Portable Document Format)
  Mime type: application/pdf
  Class: DirectClass
  Geometry: 612x792+0+0
  Resolution: 72x72
  Print size: 8.5x11
  Units: Undefined
  Colorspace: sRGB
  Type: PaletteAlpha
  Base type: Undefined
  Endianness: Undefined
  Depth: 16/8-bit
  Channel depth:
    red: 8-bit
    green: 8-bit
    blue: 8-bit
    alpha: 8-bit
  ... color data snip ...
  Rendering intent: Perceptual
  Gamma: 0.454545
  Chromaticity:
    red primary: (0.64,0.33)
    green primary: (0.3,0.6)
    blue primary: (0.15,0.06)
    white point: (0.3127,0.329)
  Background color: white
  Border color: srgba(223,223,223,1)
  Matte color: grey74
  Transparent color: none
  Interlace: None
  Intensity: Undefined
  Compose: Over
  Page geometry: 612x792+0+0
  Dispose: Undefined
  Iterations: 0
  Scene: 1 of 2
  Compression: Undefined
  Orientation: Undefined
  Profiles:
    Profile-xmp: 3395 bytes
  Properties:
    date:create: 2025-04-21T13:09:58+00:00
    date:modify: 2025-04-21T13:09:58+00:00
    dc:format: application/pdf
    pdf:HiResBoundingBox: 612x-792+0+0
    pdf:Producer: PDF Engine win32 - (11.1)
    pdf:Version: PDF-1.6
    signature: 5be3212e46ab6ab4f9d3f1c1f8d47c1628761cdda2c829833af68f33e268fa12
    xmp:CreateDate: 2023-09-19T13:36:29-04:00
    xmp:MetadataDate: 2025-02-25T09:15:14-05:00
    xmp:ModifyDate: 2025-02-25T09:15:14-05:00
    xmpMM:DocumentID: uuid:5fb8d54f-d934-4679-b439-6955d7156afc
    xmpMM:InstanceID: uuid:4cb3c648-e710-42a3-832e-6179dde181c8
  Artifacts:
    filename: file.pdf
    verbose: true
  Tainted: False
  Filesize: 49710B
  Number pixels: 484704
  Pixels per second: 2.34479MB
  User time: 0.170u
  Elapsed time: 0:01.206
  Version: ImageMagick 6.9.11-60 Q16 x86_64 2021-01-25 https://imagemagick.org
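For comparison, the pdf:HiResBoundingBox value above already contains the correct page size once the sign is ignored. The sketch below parses that geometry string and takes the magnitude of each extent. ParseBoundingBox is a hypothetical helper, and its regular expression only covers the plain WxH+X+Y form shown here. Dividing by 72 points per inch (the PDF default user space unit) gives the same 8.5x11 print size that identify reports.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: parse a WxH+X+Y geometry such as the pdf:HiResBoundingBox
// value "612x-792+0+0". Width and height may be signed or fractional; the
// returned page size (in PDF points) is their magnitude.
(double Width, double Height)? ParseBoundingBox(string geometry)
{
    Match match = Regex.Match(geometry,
        @"^(-?\d+(?:\.\d+)?)x(-?\d+(?:\.\d+)?)[+-]\d+(?:\.\d+)?[+-]\d+(?:\.\d+)?$");
    if (!match.Success)
    {
        return null;
    }

    double width = double.Parse(match.Groups[1].Value, CultureInfo.InvariantCulture);
    double height = double.Parse(match.Groups[2].Value, CultureInfo.InvariantCulture);

    // Ignore the sign of each extent; only the page size matters here.
    return (Math.Abs(width), Math.Abs(height));
}

(double Width, double Height)? box = ParseBoundingBox("612x-792+0+0");
if (box.HasValue)
{
    // 72 points per inch: 612x792 pt is 8.5x11 in.
    double widthInches = box.Value.Width / 72;
    double heightInches = box.Value.Height / 72;
    Console.WriteLine(FormattableString.Invariant(
        $"{box.Value.Width}x{box.Value.Height} pt = {widthInches}x{heightInches} in"));
}
```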
Steps to Reproduce
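using System;
using ImageMagick;
var filePath = "/path/to/file.pdf";
// identify reports 612x792 pages; for this file MagickImageInfo reports Height: 4294966504.
var info = new MagickImageInfo(filePath);
Console.WriteLine($"Read {filePath}: [Width: {info.Width} Height: {info.Height} Orientation: {info.Orientation} Format: {info.Format}]");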
Images
Here is a sample document which exhibits the described issue.
math-3-8-score-report-english-2024.pdf