
MagickImageInfo reads dimensions incorrectly for some PDFs that have negative bounding box values #1831

@36PopTarts


Magick.NET version

14.6.0 Q16-AnyCPU

Environment (Operating system, version and so on)

Both Windows 10 and Ubuntu 22, .NET 6

Description

I work with schools, and we receive a lot of scanned PDFs that defy all reason and logic. Incorrect metadata is common, among other problems. I have several such files, but unfortunately I can't upload them because they contain confidential client data.

According to the ImageMagick CLI, this particular PDF contains 72 DPI, 612x792 pixel pages. We have to read all images at a minimum of 300 DPI for OCR. However, when I read the document's metadata in Magick.NET using MagickImageInfo, it reports the page height as an absurd 4294966504 pixels.

This breaks our downstream conversion and produces poor-quality images. Our pipeline estimates the print size by dividing the largest dimension by the DPI. With this height the estimated print size is enormous, so the algorithm assumes the image is already very high resolution and keeps the original 72 DPI instead of reading it at 300 DPI. We have seen genuinely high-resolution images labeled "72 DPI" in the past, and reading those at 300 DPI produced absurdly huge images, so we cannot simply read every image at 300 DPI.
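A minimal sketch of the sizing heuristic described above, to show why the wrapped height flips its decision. The class and method names (`DpiHeuristic`, `ChooseReadDpi`) and the 17-inch cut-off for an "implausibly large" print size are hypothetical, chosen only for illustration; the 300 DPI OCR minimum and the divide-the-largest-dimension-by-the-DPI rule come from the description.

```csharp
using System;

static class DpiHeuristic
{
    // Hypothetical cut-off: a print size above this many inches is treated as
    // "really a high-resolution scan mislabeled as 72 DPI".
    const double MaxPlausibleInches = 17.0;

    // Minimum resolution required for OCR (from the issue description).
    const double OcrDpi = 300.0;

    // Returns the density at which the page should be read.
    public static double ChooseReadDpi(uint width, uint height, double reportedDpi)
    {
        // Estimated print size in inches: largest dimension divided by the DPI.
        double printInches = Math.Max(width, height) / reportedDpi;

        // Implausibly large print size: assume the pixels are already high
        // resolution and keep the reported density; otherwise read at 300 DPI.
        return printInches > MaxPlausibleInches ? reportedDpi : OcrDpi;
    }

    static void Main()
    {
        // Dimensions reported by identify: 612x792 at 72 DPI (8.5x11 in).
        Console.WriteLine(ChooseReadDpi(612, 792, 72));        // 300
        // Height as reported by MagickImageInfo for the same file.
        Console.WriteLine(ChooseReadDpi(612, 4294966504, 72)); // 72
    }
}
```

With the 612x792 dimensions from identify, the estimated print size is 11 inches and the page is read at 300 DPI. With the 4294966504-pixel height from MagickImageInfo, the estimate is roughly 59.65 million inches, so the page stays at 72 DPI.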

Looking at the value 4294966504 again, I realized it is awfully familiar. I think the issue is related to this line from the identify output below:
pdf:HiResBoundingBox: 612x-792+0+0
For whatever ridiculous reason, the height of the bounding box is negative. The height reported by MagickImageInfo, 4294966504, is exactly the maximum value of an unsigned 32-bit integer (4294967295) minus 791, i.e. 2^32 - 792, which is what -792 becomes when it is stored in an unsigned 32-bit integer. This looks like an integer underflow (wraparound) issue, and it seems the library needs to take the absolute value of the bounding box height somewhere.
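The arithmetic can be checked in plain C#: reinterpreting the 32-bit pattern of -792 as unsigned yields 2^32 - 792, the same number as `uint.MaxValue - 791`. This only confirms that the numbers match. That a signed-to-unsigned conversion of the bounding box height is where the value comes from is an assumption, not something verified in the Magick.NET or ImageMagick source; the `WrapDemo` class is purely illustrative.

```csharp
using System;

static class WrapDemo
{
    // Reinterpret the bits of a signed 32-bit value as unsigned (no overflow check).
    public static uint AsUInt32(int value) => unchecked((uint)value);

    // The inverse: reinterpret an unsigned 32-bit value as signed.
    public static int AsInt32(uint value) => unchecked((int)value);

    static void Main()
    {
        Console.WriteLine(AsUInt32(-792));       // 4294966504
        Console.WriteLine(uint.MaxValue - 791);  // 4294966504
        Console.WriteLine(AsInt32(4294966504));  // -792
    }
}
```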

This application normally runs on an Ubuntu 22 server, but I also wrote a unit test on a Windows 10 machine, and reading the file with MagickImageInfo there produces the same height value.

When running identify -verbose file.pdf, I get a completely reasonable analysis:

Image:
  Filename: file.pdf
  Format: PDF (Portable Document Format)
  Mime type: application/pdf
  Class: DirectClass
  Geometry: 612x792+0+0
  Resolution: 72x72
  Print size: 8.5x11
  Units: Undefined
  Colorspace: sRGB
  Type: PaletteAlpha
  Base type: Undefined
  Endianness: Undefined
  Depth: 16/8-bit
  Channel depth:
    red: 8-bit
    green: 8-bit
    blue: 8-bit
    alpha: 8-bit
  ... color data snip ...
  Rendering intent: Perceptual
  Gamma: 0.454545
  Chromaticity:
    red primary: (0.64,0.33)
    green primary: (0.3,0.6)
    blue primary: (0.15,0.06)
    white point: (0.3127,0.329)
  Background color: white
  Border color: srgba(223,223,223,1)
  Matte color: grey74
  Transparent color: none
  Interlace: None
  Intensity: Undefined
  Compose: Over
  Page geometry: 612x792+0+0
  Dispose: Undefined
  Iterations: 0
  Scene: 1 of 2
  Compression: Undefined
  Orientation: Undefined
  Profiles:
    Profile-xmp: 3395 bytes
  Properties:
    date:create: 2025-04-21T13:09:58+00:00
    date:modify: 2025-04-21T13:09:58+00:00
    dc:format: application/pdf
    pdf:HiResBoundingBox: 612x-792+0+0
    pdf:Producer: PDF Engine win32 - (11.1)
    pdf:Version: PDF-1.6
    signature: 5be3212e46ab6ab4f9d3f1c1f8d47c1628761cdda2c829833af68f33e268fa12
    xmp:CreateDate: 2023-09-19T13:36:29-04:00
    xmp:MetadataDate: 2025-02-25T09:15:14-05:00
    xmp:ModifyDate: 2025-02-25T09:15:14-05:00
    xmpMM:DocumentID: uuid:5fb8d54f-d934-4679-b439-6955d7156afc
    xmpMM:InstanceID: uuid:4cb3c648-e710-42a3-832e-6179dde181c8
  Artifacts:
    filename: file.pdf
    verbose: true
  Tainted: False
  Filesize: 49710B
  Number pixels: 484704
  Pixels per second: 2.34479MB
  User time: 0.170u
  Elapsed time: 0:01.206
  Version: ImageMagick 6.9.11-60 Q16 x86_64 2021-01-25 https://imagemagick.org

Steps to Reproduce

using System;
using ImageMagick;

var filePath = "/path/to/file.pdf";
var info = new MagickImageInfo(filePath);
Console.WriteLine($"Read {filePath}: [Width: {info.Width} Height: {info.Height} Orientation: {info.Orientation} Format: {info.Format}]");
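Until the root cause is fixed, a caller could defend against this with a guard like the one below. This is a hypothetical workaround (`DimensionGuard.Normalize` is not part of Magick.NET): it assumes that any dimension above int.MaxValue is a wrapped negative value and recovers its magnitude. It takes plain `uint` values, assuming `MagickImageInfo.Height` is a `uint`, as the reported 4294966504 suggests.

```csharp
using System;

static class DimensionGuard
{
    // Hypothetical workaround: a page dimension above int.MaxValue is far more
    // likely to be a wrapped negative value than a real size, so recover its
    // magnitude. Values in the normal range are returned unchanged.
    public static uint Normalize(uint value)
    {
        if (value <= int.MaxValue)
            return value;

        // Reinterpret as signed, then widen to long so Math.Abs cannot overflow
        // (int.MinValue has no positive int counterpart).
        long signed = unchecked((int)value);
        return (uint)Math.Abs(signed);
    }

    static void Main()
    {
        Console.WriteLine(Normalize(612));         // 612
        Console.WriteLine(Normalize(4294966504));  // 792
    }
}
```

In the reproduction above, this would be applied as `DimensionGuard.Normalize(info.Height)` before the print-size calculation. It masks the symptom rather than fixing how the negative bounding box is handled.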

Images

Here is a sample document which exhibits the described issue.
math-3-8-score-report-english-2024.pdf
