Mistral OCR

Overview

The Mistral OCR server uses Mistral’s Pixtral model to extract text from images and PDF documents. It accepts documents either by URL or as base64-encoded content, which suits it to processing invoices, receipts, contracts, and other scanned documents.

How to Add Mistral OCR

Get Mistral API Key

Go to https://console.mistral.ai/
Sign up or log in to your Mistral account
Navigate to API Keys in your account settings
Create a new API key or copy an existing one

Connect to Nexus

Add the Mistral OCR server to your Nexus environment through the server directory
Enter your Mistral API key when prompted

Test Connection

Start with a simple command like “Extract text from this PDF: [URL]” to verify the connection works properly.

What You Can Do

PDF Text Extraction

Extract text content from PDF documents via URL or base64 encoding

Image OCR

Extract text from images including PNG, JPEG, WebP, and AVIF formats

Invoice Processing

Process invoices, receipts, and bills to extract amounts, dates, and vendor info

Document Analysis

Extract and analyze text from contracts, forms, and scanned documents

Available Tools (2)

OCR from URL

ocr_url

Extract text from an image or PDF document via URL

Input:
- url (required) - Public URL of the image or PDF document
- type (required) - Document type: image or pdf
- includeImages (optional) - Include base64-encoded images in response (default: false)

Use Cases: Process publicly accessible documents, extract text from hosted files

If the URL has no clear file extension (.pdf, .png, .jpg, etc.), the document type cannot be inferred from it and must be specified explicitly. Do not guess; ask the user.
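
As a minimal sketch of the rule above, the snippet below builds the input for ocr_url and refuses to guess when the URL has no recognizable extension. The field names url, type, and includeImages come from the input list; the helper names, the extension table, and the use of an exception to signal "ask the user" are assumptions made for illustration, not part of the server.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical mapping from file extension to the two values `type` accepts.
EXTENSION_TO_TYPE = {
    ".pdf": "pdf",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
    ".webp": "image",
    ".avif": "image",
}


def infer_document_type(url: str) -> Optional[str]:
    """Return "pdf" or "image" if the URL path ends in a known extension, else None."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return EXTENSION_TO_TYPE.get(suffix)


def build_ocr_url_arguments(
    url: str, doc_type: Optional[str] = None, include_images: bool = False
) -> dict:
    """Assemble the ocr_url input; raise instead of guessing an unclear type."""
    resolved = doc_type or infer_document_type(url)
    if resolved not in ("image", "pdf"):
        # No usable extension and no explicit type: the caller should ask the user.
        raise ValueError("Document type is unclear; ask the user for 'image' or 'pdf'.")
    return {"url": url, "type": resolved, "includeImages": include_images}
```

For example, build_ocr_url_arguments("https://example.com/invoice.pdf") resolves the type to pdf, while a URL such as https://example.com/download?id=42 raises until a type is passed explicitly.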

OCR from Base64

ocr_base64

Extract text from a base64-encoded image or PDF document

Input:
- data (required) - Base64-encoded image or PDF data (without data URI prefix)
- mimeType (required) - MIME type of the document
- includeImages (optional) - Include base64-encoded images in response (default: false)

Supported MIME Types:
- image/png
- image/jpeg
- image/webp
- image/avif
- application/pdf

Use Cases: Process documents from file uploads, extract text from embedded content
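
The following sketch shows one way to prepare the input for ocr_base64 from raw file bytes. The field names data, mimeType, and includeImages and the MIME type list come from the section above; the helper names and the client-side MIME check are illustrative assumptions, and the server may validate differently.

```python
import base64

# MIME types listed as supported for ocr_base64.
SUPPORTED_MIME_TYPES = {
    "image/png",
    "image/jpeg",
    "image/webp",
    "image/avif",
    "application/pdf",
}


def strip_data_uri_prefix(data: str) -> str:
    """Drop a leading "data:<mime>;base64," prefix, since `data` must not include it."""
    if data.startswith("data:") and "," in data:
        return data.split(",", 1)[1]
    return data


def build_ocr_base64_arguments(
    raw: bytes, mime_type: str, include_images: bool = False
) -> dict:
    """Encode raw file bytes and assemble the ocr_base64 input (hypothetical helper)."""
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"Unsupported MIME type: {mime_type}")
    encoded = base64.b64encode(raw).decode("ascii")
    return {"data": encoded, "mimeType": mime_type, "includeImages": include_images}
```

If the content arrives as a data URI (for instance from a browser upload), strip_data_uri_prefix removes the prefix so that only the bare base64 string is sent.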

Use Cases

Invoice & Receipt Processing

Extract Totals: “What is the total amount on this invoice?”
Get Line Items: “Extract all the line items from this receipt”
Vendor Info: “Pull out the vendor name, date, and amount from this bill”
Batch Processing: Process multiple invoices to extract payment details

Document Digitization

PDF Conversion: “Extract the text from this scanned PDF”
Form Processing: “What are the values filled in on this form?”
Contract Analysis: “Extract the key dates and parties from this contract”
Archive Search: Enable searching through scanned document archives

Data Extraction

Contact Info: “Find all phone numbers and email addresses in this document”
Dates & Deadlines: “What are the dates mentioned in this document?”
Financial Data: “Extract all monetary amounts from this statement”
Structured Output: Convert unstructured documents into usable data

Image Text Extraction

Screenshots: “What does this screenshot say?”
Photos of Documents: “Read the text from this photo of the receipt”
Signage & Labels: “What text is visible in this image?”
Handwritten Notes: Extract text from photos of handwritten content

Sample Prompts

Basic OCR

“Extract the text from this PDF: https://example.com/invoice.pdf”
“What does this image say? https://example.com/receipt.png”
“OCR this document and give me the contents”

Invoice Processing

“Extract all the line items and totals from this invoice”
“What is the invoice number, date, and total amount?”
“Pull out the vendor details and payment terms”

Document Analysis

“Extract the text and summarize the key terms in this contract”
“What are all the dates mentioned in this document?”
“Find all contact information (emails, phone numbers) in this PDF”

With Embedded Images

“Extract the text and include any images from this PDF”
“OCR this document and return the embedded charts as images”

Security Guardrails

The Mistral OCR server has built-in security constraints:

PII Redaction

Personally identifiable information (SSN, credit cards, phone numbers, emails) detected in OCR output is automatically redacted to prevent data leakage.
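
To give a rough idea of what redaction of OCR output can look like, here is a simplified, pattern-based sketch. It is not the server's implementation: the regular expressions, the placeholder format, and the function name are all assumptions, and real PII detection is typically more thorough than a few patterns.

```python
import re

# Illustrative patterns only; order matters so that longer digit runs
# (card numbers) are replaced before shorter ones (phone numbers).
PII_PATTERNS = [
    ("email", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("credit_card", re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")),
    ("ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("phone", re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b")),
]


def redact_pii(text: str) -> str:
    """Replace each match with a labeled placeholder such as [REDACTED:email]."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text
```

One consequence worth keeping in mind: if a workflow needs a phone number or email address from a document, redacted output will not contain it.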

Input Validation

Constraint	Description
Base64 Size Limit	Blocks payloads over approximately 10 MB to prevent memory exhaustion
Block Internal URLs	Prevents SSRF attacks by blocking localhost and private IP ranges (10.x.x.x, 192.168.x.x, 172.16.x.x through 172.31.x.x)
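
The two constraints above can be approximated on the client side before a request is sent. The sketch below is an assumption-laden illustration rather than the server's code: the exact byte limit, the function names, and the decision to treat unresolved DNS names as allowed are all hypothetical. It uses Python's standard ipaddress module to recognize private and loopback addresses.

```python
import ipaddress
from urllib.parse import urlparse

# Assumption: the "~10MB" limit is read here as 10 MiB of encoded text.
MAX_BASE64_CHARS = 10 * 1024 * 1024


def is_internal_url(url: str) -> bool:
    """Return True for localhost and literal private or loopback IP addresses."""
    host = urlparse(url).hostname
    if host is None:
        return True  # no host to check; treat as not allowed
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: a stricter guard would resolve it and check the result.
        return False
    return address.is_private or address.is_loopback


def is_payload_too_large(data: str) -> bool:
    """Return True when the base64 string exceeds the assumed size limit."""
    return len(data) > MAX_BASE64_CHARS
```

Checking up front avoids a round trip for requests the server would reject anyway, such as http://192.168.1.10/scan.pdf or an oversized upload.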
