[BUG MODEL]: Mistral OCR-latest drops column of PDF at default DPI

### Model

mistral-small-latest

### Request Payload

OCR models tested: `mistral-ocr-latest` and `mistral-ocr-4-0` (both affected).
Document: a digital (non-scanned) A4 invoice PDF with a real text layer, two-column header layout.

```python
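import os
from mistralai import Mistral  # assumes the mistralai v1 Python SDK

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
# <PDF> stands for the base64-encoded bytes of the invoice PDF.
response = client.ocr.process(
    model="mistral-ocr-4-0",
    document={"type": "document_url", "document_url": "data:application/pdf;base64,<PDF>"},
    include_image_base64=False,
)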
```

Toggling `extract_header=True` / `extract_footer=True` (or leaving them off) makes no difference.

### Output

The API silently omits an entire content region: the top-right header column
(invoice number, dates, customer ID, subscription block). The omitted text is
NOT in `markdown`, NOT in `header`/`footer`, and is NOT captured as an image
region. It is simply absent. No error, no warning, no confidence signal.

The dropped text is present in the PDF text layer (extractable with any PDF text
tool), so the content is unambiguously in the document.
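
Because the omission is silent, a rough coverage check against the text layer is one way to notice it. The sketch below is only an illustration: `missing_tokens` is a hypothetical helper, the sample strings are made up (not taken from the affected invoice), and it assumes the text layer has already been extracted to a plain string with a PDF text tool.

```python
import re


def missing_tokens(text_layer: str, ocr_markdown: str, min_len: int = 3) -> list[str]:
    """Return text-layer tokens that never appear in the OCR markdown.

    Tokens are lowercased runs of word characters; tokens shorter than
    min_len are ignored to reduce noise. Order of first appearance is kept.
    """
    def tokens(s: str) -> list[str]:
        return re.findall(r"\w+", s.lower())

    seen = set(tokens(ocr_markdown))
    missing: list[str] = []
    for tok in tokens(text_layer):
        if len(tok) >= min_len and tok not in seen and tok not in missing:
            missing.append(tok)
    return missing


# Made-up sample data, for illustration only.
text_layer = "ACME GmbH Invoice number INV-2024-001 Customer ID 4711 Total 100.00 EUR"
ocr_markdown = "# ACME GmbH\n\nTotal 100.00 EUR"
print(missing_tokens(text_layer, ocr_markdown))
```

A non-empty result does not prove a dropped region (OCR may legitimately reformat some text), but a long list concentrated in one area of the page is a strong hint.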

What I narrowed it down to: the `document_url` path renders the page internally
to 719x1018 px (`dimensions.dpi = 87`). I rasterized the same page at several
DPIs and submitted each as `image_url`:

| Input | Raster (px) | Region returned? |
|-------|-------------|------------------|
| PDF via `document_url` | 719x1018 (87 dpi, internal) | NO |
| `image_url` PNG @ 72 dpi | 595x842 | yes |
| `image_url` PNG @ 87 dpi | 719x1018 | NO |
| `image_url` PNG @ 96 dpi | 794x1123 | yes |
| `image_url` PNG @ 150/200/300 dpi | up to 2480x3508 | yes |

The failure reproduces exactly at 719x1018, which is the raster the
`document_url` path produces. Both lower (72 dpi) and higher (96+ dpi) rasters
return the full content. This looks like a fragile layout/segmentation failure
at that specific raster size for a two-column header, and the PDF path renders
right into it.
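
As a sanity check on the table, the DPI implied by each raster can be recomputed from the pixel size. This is a small sketch that assumes the page is exactly ISO A4 (210 x 297 mm); `effective_dpi` is a hypothetical helper name.

```python
A4_MM = (210.0, 297.0)  # ISO A4 page size in millimetres (assumed page size)
MM_PER_INCH = 25.4


def effective_dpi(width_px: int, height_px: int, page_mm=A4_MM) -> tuple[float, float]:
    """Return the (horizontal, vertical) DPI implied by a full-page raster."""
    width_in = page_mm[0] / MM_PER_INCH
    height_in = page_mm[1] / MM_PER_INCH
    return width_px / width_in, height_px / height_in


# Raster sizes taken from the table above.
for size in [(595, 842), (719, 1018), (794, 1123), (2480, 3508)]:
    horizontal, vertical = effective_dpi(*size)
    print(size, round(horizontal, 1), round(vertical, 1))
```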

### Expected Behavior

The right-column content should be returned in `markdown` regardless of input
modality, the same way it is when the identical page is submitted as `image_url`
at any other resolution.

At minimum, dropping an entire detected region should not happen silently. There
is currently no parameter on `ocr.process` to control the internal render DPI
for the `document_url` path, so callers cannot work around this without
rasterizing PDFs to images themselves.
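
For reference, the client-side workaround looks roughly like the sketch below. It assumes PyMuPDF (`fitz`) for rendering and the `mistralai` Python client from the request above; the helper names are hypothetical, and 150 dpi is chosen only because it returned the full region for this one document, not because it is known to be safe in general.

```python
import base64


def rasterize_pdf(pdf_bytes: bytes, dpi: int = 150) -> list[bytes]:
    """Render every PDF page to PNG bytes at the given DPI (requires PyMuPDF)."""
    import fitz  # PyMuPDF; imported lazily so the other helpers work without it

    pages = []
    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        for page in doc:
            pages.append(page.get_pixmap(dpi=dpi).tobytes("png"))
    return pages


def image_document(png_bytes: bytes) -> dict:
    """Wrap PNG bytes as an `image_url` document with a base64 data URL."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {"type": "image_url", "image_url": f"data:image/png;base64,{encoded}"}


def ocr_pdf_as_images(client, pdf_bytes: bytes, model: str = "mistral-ocr-latest", dpi: int = 150):
    """Submit each page as an image instead of using the `document_url` path."""
    return [
        client.ocr.process(
            model=model,
            document=image_document(png),
            include_image_base64=False,
        )
        for png in rasterize_pdf(pdf_bytes, dpi)
    ]
```

This sends one request per page, so multi-page PDFs cost more round trips than a single `document_url` call, and the per-page results have to be stitched back together by the caller.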

### Additional Context

I'd rather not post the affected PDF publicly. If you need the original PDF to reproduce the issue, please let me know how I can share it with you privately.

### Suggested Solutions

_No response_
