Overview
The Mistral OCR server uses Mistral’s Pixtral model to extract text from images and PDF documents. It supports both URL-based documents and base64-encoded content, making it ideal for processing invoices, receipts, contracts, and other scanned documents.
How to Add Mistral OCR
1. Get Mistral API Key
- Go to https://console.mistral.ai/
- Sign up or log in to your Mistral account
- Navigate to API Keys in your account settings
- Create a new API key or copy an existing one
2. Connect to Nexus
- Add the Mistral OCR server to your Nexus environment through the server directory
- Enter your Mistral API key when prompted
3. Test Connection
Start with a simple command like “Extract text from this PDF: [URL]” to verify the connection works properly.
What You Can Do
PDF Text Extraction
Extract text content from PDF documents via URL or base64 encoding
Image OCR
Extract text from images including PNG, JPEG, WebP, and AVIF formats
Invoice Processing
Process invoices, receipts, and bills to extract amounts, dates, and vendor info
Document Analysis
Extract and analyze text from contracts, forms, and scanned documents
Available Tools (2)
OCR from URL
ocr_url
Extract text from an image or PDF document via URL
- Input:
  - url (required) - Public URL of the image or PDF document
  - type (required) - Document type: image or pdf
  - includeImages (optional) - Include base64-encoded images in response (default: false)
- Use Cases: Process publicly accessible documents, extract text from hosted files
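As a rough illustration of the inputs listed above, the following sketch assembles an argument object for ocr_url and checks it before it is sent. The helper name build_ocr_url_args is hypothetical and not part of the server; only the parameter names url, type, and includeImages come from this page. How the arguments are actually delivered to the tool depends on your Nexus client and is not shown.

```python
import json
from urllib.parse import urlparse


def build_ocr_url_args(url: str, doc_type: str, include_images: bool = False) -> dict:
    """Assemble ocr_url arguments (hypothetical helper, not part of the server)."""
    # The tool documents exactly two document types.
    if doc_type not in ("image", "pdf"):
        raise ValueError("type must be 'image' or 'pdf'")
    # Assumption: only absolute http(s) URLs make sense for a public document.
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("url must be an absolute http(s) URL")
    return {"url": url, "type": doc_type, "includeImages": include_images}


args = build_ocr_url_args("https://example.com/invoice.pdf", "pdf")
print(json.dumps(args, indent=2))
```

Validating on the client side is optional, but it surfaces a wrong type value before a request is made.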
OCR from Base64
ocr_base64
Extract text from a base64-encoded image or PDF document
- Input:
  - data (required) - Base64-encoded image or PDF data (without data URI prefix)
  - mimeType (required) - MIME type of the document
  - includeImages (optional) - Include base64-encoded images in response (default: false)
- Supported MIME Types:
  - image/png
  - image/jpeg
  - image/webp
  - image/avif
  - application/pdf
- Use Cases: Process documents from file uploads, extract text from embedded content
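A minimal sketch of preparing ocr_base64 arguments from raw file bytes might look like the following. The helper name build_ocr_base64_args and the constant MAX_BASE64_BYTES are hypothetical. The limit is an assumption: this page says only "~10MB", so the sketch treats it as 10 MiB of encoded text, and the server's exact threshold may differ.

```python
import base64

# Assumption: "~10MB" is read as 10 MiB of base64 text; the real limit may differ.
MAX_BASE64_BYTES = 10 * 1024 * 1024

SUPPORTED_MIME_TYPES = {
    "image/png",
    "image/jpeg",
    "image/webp",
    "image/avif",
    "application/pdf",
}


def build_ocr_base64_args(
    raw: bytes,
    mime_type: str,
    include_images: bool = False,
    max_bytes: int = MAX_BASE64_BYTES,
) -> dict:
    """Encode raw bytes and assemble ocr_base64 arguments (hypothetical helper)."""
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported MIME type: {mime_type}")
    # b64encode returns the bare payload, with no "data:...;base64," prefix.
    data = base64.b64encode(raw).decode("ascii")
    if len(data) > max_bytes:
        raise ValueError("payload too large; consider ocr_url with a public URL")
    return {"data": data, "mimeType": mime_type, "includeImages": include_images}


sample = build_ocr_base64_args(b"%PDF-1.4 sample bytes", "application/pdf")
print(sample["mimeType"], len(sample["data"]))
```

Base64 encoding grows the payload by roughly a third, so a file somewhat under 10MB on disk can still exceed the limit once encoded.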
Use Cases
Invoice & Receipt Processing
- Extract Totals: “What is the total amount on this invoice?”
- Get Line Items: “Extract all the line items from this receipt”
- Vendor Info: “Pull out the vendor name, date, and amount from this bill”
- Batch Processing: Process multiple invoices to extract payment details
Document Digitization
- PDF Conversion: “Extract the text from this scanned PDF”
- Form Processing: “What are the values filled in on this form?”
- Contract Analysis: “Extract the key dates and parties from this contract”
- Archive Search: Enable searching through scanned document archives
Data Extraction
- Contact Info: “Find all phone numbers and email addresses in this document”
- Dates & Deadlines: “What are the dates mentioned in this document?”
- Financial Data: “Extract all monetary amounts from this statement”
- Structured Output: Convert unstructured documents into usable data
Image Text Extraction
- Screenshots: “What does this screenshot say?”
- Photos of Documents: “Read the text from this photo of the receipt”
- Signage & Labels: “What text is visible in this image?”
- Handwritten Notes: Extract text from photos of handwritten content
Sample Prompts
Basic OCR
- “Extract the text from this PDF: https://example.com/invoice.pdf”
- “What does this image say? https://example.com/receipt.png”
- “OCR this document and give me the contents”
Invoice Processing
- “Extract all the line items and totals from this invoice”
- “What is the invoice number, date, and total amount?”
- “Pull out the vendor details and payment terms”
Document Analysis
- “Extract the text and summarize the key terms in this contract”
- “What are all the dates mentioned in this document?”
- “Find all contact information (emails, phone numbers) in this PDF”
With Embedded Images
- “Extract the text and include any images from this PDF”
- “OCR this document and return the embedded charts as images”
Security Guardrails
The Mistral OCR server has built-in security constraints:
PII Redaction
Personally identifiable information (SSN, credit cards, phone numbers, emails) detected in OCR output is automatically redacted to prevent data leakage.
Input Validation
| Constraint | Description |
|---|---|
| Base64 Size Limit | Blocks payloads over ~10MB to prevent memory exhaustion |
| Block Internal URLs | Prevents SSRF attacks by blocking localhost and private IP ranges (10.x.x.x, 192.168.x.x, 172.16-31.x.x) |
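To make the internal-URL constraint concrete, here is a sketch of how such a check could be written with Python's standard ipaddress module. It is not the server's actual implementation, and the function name is_blocked_host is hypothetical. It only inspects the literal host in the URL; a production guard would also need to resolve hostnames and check the resulting addresses.

```python
import ipaddress
from urllib.parse import urlparse


def is_blocked_host(url: str) -> bool:
    """Return True if the URL points at localhost or a private IP (illustrative only)."""
    host = urlparse(url).hostname
    if host is None:
        return True  # no host to check, so treat the URL as unusable
    if host.lower() == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. A real check would resolve DNS here as well.
        return False
    # is_private covers 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16.
    return ip.is_private or ip.is_loopback


for candidate in (
    "http://localhost/file.pdf",
    "http://192.168.0.5/scan.png",
    "https://example.com/invoice.pdf",
):
    print(candidate, "blocked" if is_blocked_host(candidate) else "allowed")
```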
Known Limitations
URL Requirements
- URLs must be publicly accessible
- Internal/private network URLs are blocked for security
- URLs without a clear file extension require an explicit type parameter
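One way a caller might decide which type value to pass is to look at the file extension in the URL path and fall back to asking the user when none is present. The sketch below assumes the extension-to-type mapping shown, which is inferred from the supported formats listed on this page; guess_type is a hypothetical helper, not something the server provides.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Assumption: extensions inferred from the formats this page lists as supported.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".avif"}


def guess_type(url: str) -> Optional[str]:
    """Guess the ocr_url type from the URL path, or None if it is unclear."""
    # Only the path is inspected, so query strings do not affect the result.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    return None  # caller must supply type explicitly


print(guess_type("https://example.com/invoice.pdf"))
print(guess_type("https://example.com/download?id=42"))
```

When the helper returns None, the type has to come from somewhere else, for example from the user or from a Content-Type header the caller already knows.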
File Size
- Base64-encoded documents are limited to ~10MB
- For larger documents, use the ocr_url tool with a public URL
Document Types
- Only supports images (PNG, JPEG, WebP, AVIF) and PDFs
- Scanned documents with poor quality may have reduced accuracy
- Handwritten text recognition may be limited
PII Handling
- PII is automatically redacted from responses
- If you need unredacted data, contact your administrator to adjust guardrails

