Paperless-ngx on Elestio: Automating Document Ingestion, OCR Processing, and Smart Tagging
Paperless-ngx can ingest your documents. You already knew that. But most people stop at "drop PDF in folder, let OCR do its thing" and never touch the automation engine sitting right underneath.
That's a mistake. The workflow system, matching algorithms, and storage path templates can turn Paperless-ngx from a searchable filing cabinet into something that sorts, tags, names, and files every document without you lifting a finger. Here's how to set it up on Elestio.
The Matching Algorithms You're Not Using
Every tag, correspondent, and document type in Paperless-ngx has a matching algorithm attached to it. Most people leave it on "Auto" and hope for the best. Here's what each one actually does:
| Algorithm | How It Works | Best For |
|---|---|---|
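| Any | Matches if at least one of the specified words appears anywhere in the document | Broad categories ("invoice", "receipt") |
| All | Matches only if every specified word appears somewhere in the document, in any order | Categories that need several keywords together |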
| Exact | Matches only if the exact phrase appears in order | Specific senders ("Acme Corp Invoice") |
| Regex | Full regular expression matching against content | Pattern-based matching (invoice numbers, dates) |
| Fuzzy | Approximate string matching for OCR errors | Scanned documents with inconsistent quality |
| Auto | Machine learning classifier trained on your existing assignments | Large document libraries with established patterns |
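To make the rows above concrete, here is a minimal Python sketch of how the text-based algorithms differ, including an "every word" variant for contrast with Any. It uses only the standard library and is not Paperless-ngx's actual matching code; the function names, the sliding-window approach, and the 0.8 fuzzy threshold are assumptions made for this illustration.

```python
import re
from difflib import SequenceMatcher


def _words(text: str) -> set[str]:
    """Lowercased set of words in a piece of text."""
    return set(re.findall(r"\w+", text.lower()))


def match_any(pattern: str, content: str) -> bool:
    """True if at least one pattern word occurs in the content."""
    return bool(_words(pattern) & _words(content))


def match_all(pattern: str, content: str) -> bool:
    """True only if every pattern word occurs somewhere in the content."""
    return _words(pattern) <= _words(content)


def match_exact(phrase: str, content: str) -> bool:
    """True if the whole phrase occurs in order, on word boundaries."""
    return re.search(rf"\b{re.escape(phrase)}\b", content, re.IGNORECASE) is not None


def match_regex(pattern: str, content: str) -> bool:
    """True if the regular expression matches anywhere in the content."""
    return re.search(pattern, content, re.IGNORECASE) is not None


def match_fuzzy(phrase: str, content: str, threshold: float = 0.8) -> bool:
    """True if some window of the content is similar enough to the phrase.

    Slides a window with the phrase's word count across the content and
    compares each window to the phrase with difflib's similarity ratio.
    """
    target = phrase.lower()
    tokens = content.lower().split()
    size = len(target.split())
    for i in range(max(len(tokens) - size + 1, 1)):
        window = " ".join(tokens[i:i + size])
        if SequenceMatcher(None, target, window).ratio() >= threshold:
            return True
    return False


# OCR misread "Invoice" as "lnvoice" in this sample text.
ocr_text = "Acme Corp lnvoice No. INV-2024-0042, total due 120.00 EUR"

print(match_exact("Acme Corp Invoice", ocr_text))   # False: the OCR typo breaks it
print(match_fuzzy("Acme Corp Invoice", ocr_text))   # True: close enough
print(match_regex(r"INV-\d{4}-\d{4}", ocr_text))    # True: pattern found
```

The sample shows why Fuzzy earns its place for scanned documents: a single misread character defeats Exact, while the approximate comparison still lands.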
The Auto algorithm deserves special attention. It trains a neural network on documents you've already tagged, then applies those patterns to new arrivals. The catch: it only trains on documents that are NOT in your inbox, meaning documents that carry no inbox tag. So if you've been tagging documents but leaving the inbox tag on them, the classifier never learns from them. Clear the inbox tag, retrain, and watch the accuracy jump.
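The inbox rule is easy to picture as a filter over your archive. The sketch below is a simplified model, not the classifier's real code: the `Doc` class, the `training_set` helper, and the tag named "inbox" are all hypothetical stand-ins for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    title: str
    tags: set[str] = field(default_factory=set)


def training_set(docs: list[Doc], inbox_tags: set[str]) -> list[Doc]:
    """Keep only documents that carry none of the inbox tags."""
    return [d for d in docs if not (d.tags & inbox_tags)]


docs = [
    Doc("Acme invoice March", {"invoice", "inbox"}),  # tagged, but still in the inbox
    Doc("Acme invoice April", {"invoice"}),           # filed: usable for training
    Doc("Lease agreement", {"legal"}),                # filed: usable for training
]

usable = training_set(docs, inbox_tags={"inbox"})
print([d.title for d in usable])
```

The March invoice is correctly tagged "invoice", yet it contributes nothing until its inbox tag is removed.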
Building Workflows That Actually Work
Workflows are where Paperless-ngx gets powerful. They fire on four triggers:
Consumption Started fires before OCR even runs. Use it to pre-route documents based on source. If a document arrives via your accounting email rule, tag it "finance" before processing begins.
Document Added fires after content extraction. This is where you build classification logic. Match on content patterns and assign correspondents, document types, and storage paths automatically.
Document Updated fires when metadata changes. Chain workflows together: when a document gets tagged "invoice", a second workflow can set the document type and assign a storage path.
Scheduled runs on a cron schedule (default: hourly). Use it for cleanup tasks, like auto-tagging documents that have been in the inbox for more than 7 days.
Each workflow combines filters (source type, content matching, tag presence) with actions (assign tags, set correspondent, set document type, configure storage path). Actions execute sequentially, so order matters.
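Here is a small Python model of that filters-plus-ordered-actions idea, including the chaining pattern described under Document Updated. It is a sketch for reasoning about workflow design, not Paperless-ngx's engine; every class and helper name below is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Document:
    content: str
    source: str
    tags: set[str] = field(default_factory=set)
    document_type: Optional[str] = None
    storage_path: Optional[str] = None


Filter = Callable[[Document], bool]
Action = Callable[[Document], None]


@dataclass
class Workflow:
    name: str
    filters: list[Filter]
    actions: list[Action]

    def run(self, doc: Document) -> bool:
        """Apply all actions in order if every filter passes."""
        if not all(f(doc) for f in self.filters):
            return False
        for action in self.actions:  # sequential: later actions see earlier changes
            action(doc)
        return True


def content_contains(needle: str) -> Filter:
    return lambda d: needle.lower() in d.content.lower()


def has_tag(tag: str) -> Filter:
    return lambda d: tag in d.tags


def add_tag(tag: str) -> Action:
    return lambda d: d.tags.add(tag)


def set_type(name: str) -> Action:
    return lambda d: setattr(d, "document_type", name)


def set_storage_path(name: str) -> Action:
    return lambda d: setattr(d, "storage_path", name)


tag_invoices = Workflow("Tag invoices", [content_contains("invoice")], [add_tag("invoice")])
file_invoices = Workflow("File invoices", [has_tag("invoice")],
                         [set_type("Invoice"), set_storage_path("Finance")])

doc = Document(content="Invoice 12345 from Acme Corp", source="mail")
fired = [wf.name for wf in (tag_invoices, file_invoices) if wf.run(doc)]
print(fired, doc.tags, doc.document_type, doc.storage_path)
```

The second workflow only fires because the first one already added the tag. Run `file_invoices` on a fresh document first and it does nothing, which is the practical meaning of "order matters".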
Custom Fields for Structured Data
Tags and correspondents only get you so far. Custom fields let you attach structured data to any document, with nine data types to choose from:
- String/URL for reference numbers and links
- Date for due dates, payment dates, contract expiration
- Integer/Float/Monetary for amounts, quantities, totals (monetary uses ISO 4217 currency codes)
- Boolean for simple flags ("paid", "reviewed", "archived")
- Document Link for creating relationships between documents (creates symmetrical reverse links)
- Select for predefined dropdown lists
The real power is in combining custom fields with workflows. When a document matches your "invoice" pattern, a workflow can automatically assign the custom fields "Amount", "Due Date", and "Invoice Number" to it. You fill in the values once, and they become searchable, filterable, and sortable across your entire archive.
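If you would rather fill in those values from a script than by hand, the REST API accepts custom field values on a document. The sketch below only builds the request body. It assumes the body is a `custom_fields` list of `field`/`value` pairs sent with a PATCH to the document's endpoint, so verify that against the API documentation for your version. The field IDs are hypothetical placeholders, and the helper name is made up for this example.

```python
import json
from datetime import date

# Hypothetical IDs: look up the real ones for your instance before using this.
FIELD_IDS = {"Due Date": 2, "Invoice Number": 3, "Paid": 4}


def custom_fields_payload(values: dict) -> dict:
    """Build a request body that sets custom field values on one document."""
    entries = []
    for name, value in values.items():
        if name not in FIELD_IDS:
            raise KeyError(f"unknown custom field: {name}")
        if isinstance(value, date):
            value = value.isoformat()  # send dates as YYYY-MM-DD strings
        entries.append({"field": FIELD_IDS[name], "value": value})
    return {"custom_fields": entries}


payload = custom_fields_payload({
    "Invoice Number": "INV-12345",
    "Due Date": date(2026, 3, 31),
    "Paid": False,
})
print(json.dumps(payload, indent=2))
```

Monetary values are left out on purpose: their exact wire format is worth checking in the API docs rather than guessing.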
Storage Path Templates with Jinja
By default, Paperless-ngx dumps everything into a flat directory. Storage path templates let you organize files on disk using Jinja2 syntax:
{{ document.created|datetime('%Y/%m') }}/{{ document.correspondent.name }}/{{ document.title }}
This creates a folder structure like 2026/03/Acme Corp/Invoice 12345.pdf. Available template variables include:
- document.title, document.correspondent.name, document.document_type.name
- document.created, document.added (with datetime filters)
- document.page_count, document.archive_serial_number
- Custom field values via {{ custom_fields|get_cf_value('Invoice Number') }}
Assign different storage paths to different document types through workflows, and your file system mirrors your organizational logic automatically.
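To preview what a template like the one above would produce before you commit to it, you can mimic it in plain Python. This is a rough stand-in, not the Jinja engine Paperless-ngx runs: the function names, the slash-to-dash cleanup, and the "no-correspondent" fallback are assumptions for illustration only.

```python
from datetime import date
from typing import Optional


def clean(part: str) -> str:
    """Keep a value from creating extra directory levels."""
    return part.replace("/", "-").strip()


def render_storage_path(created: date, correspondent: Optional[str], title: str) -> str:
    """Mimic a year/month/correspondent/title storage path template."""
    # Hypothetical fallback for documents without a correspondent.
    folder = correspondent if correspondent else "no-correspondent"
    return "/".join([created.strftime("%Y/%m"), clean(folder), clean(title)])


print(render_storage_path(date(2026, 3, 14), "Acme Corp", "Invoice 12345"))
# prints: 2026/03/Acme Corp/Invoice 12345
```

Running a handful of real titles and correspondents through a preview like this is a cheap way to spot awkward folder names, such as titles containing slashes, before files start moving on disk.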
Mail Fetching Rules
Paperless-ngx can pull documents directly from IMAP email accounts. Configure mail rules that filter by sender, recipient, subject line, body content, or attachment type. After processing, choose whether to mark the email as read, move it to a folder, or delete it.
The filter chain matters. Rules execute in defined order, so put specific rules first and catch-all rules last. A practical setup:
- Rule 1: Sender contains "billing@" → Tag "invoice", assign correspondent
- Rule 2: Subject contains "contract" → Tag "legal", set document type
- Rule 3: Catch-all → Tag "inbox" for manual review
Each rule is evaluated independently for every email. One message with multiple attachments creates multiple documents, each tagged according to the rule that matched.
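The ordering advice is easiest to see in a simplified first-match model of the three rules above. This is a sketch of the reasoning, not how Paperless-ngx fetches mail: it assumes a message handled by one rule is no longer picked up by later ones, and the `Mail`, `Rule`, and `first_match` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Mail:
    sender: str
    subject: str


@dataclass
class Rule:
    name: str
    matches: Callable[[Mail], bool]
    tag: str


# Specific rules first, catch-all last.
RULES = [
    Rule("billing", lambda m: "billing@" in m.sender.lower(), "invoice"),
    Rule("contracts", lambda m: "contract" in m.subject.lower(), "legal"),
    Rule("catch-all", lambda m: True, "inbox"),
]


def first_match(mail: Mail, rules: list[Rule]) -> Optional[Rule]:
    """Return the first rule, in list order, whose filter accepts the mail."""
    for rule in rules:
        if rule.matches(mail):
            return rule
    return None


mail = Mail("legal@acme.example", "Contract renewal")
print(first_match(mail, RULES).tag)                         # legal
print(first_match(mail, [RULES[2], *RULES[:2]]).tag)        # inbox: catch-all shadows the rest
```

Put the catch-all first and every message lands in manual review, no matter how carefully the specific rules were written.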
Deploy on Elestio
Paperless-ngx on Elestio comes with the full stack pre-configured: the web application, PostgreSQL for metadata, Redis as the task broker, plus Tika and Gotenberg for Office document conversion.
- Select Paperless-ngx from the Elestio marketplace
- Choose your provider (2 CPU / 4 GB RAM minimum, starting at $16/month on Netcup)
- Click "Deploy"
All five services (webserver, PostgreSQL, Redis, Gotenberg, Tika) start automatically. Access the web interface, create your admin account, and start uploading documents.
For custom domain setup with automated SSL, follow the official Elestio documentation.
Troubleshooting
Auto-matching not working? The classifier only trains on documents outside your inbox. Remove the inbox tag from documents you have finished tagging, then let the scheduled training task pick them up or retrain manually with the document_create_classifier management command.
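OCR producing garbage? Check your language setting. The default is English only. Select the languages to OCR with through PAPERLESS_OCR_LANGUAGE, joining Tesseract 3-letter codes with a plus sign (deu+eng for German and English, fra for French). On Docker installs, PAPERLESS_OCR_LANGUAGES installs additional language packs that the image does not already include.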
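The language settings are easy to mix up, so here is a small helper that spells out the two values. It is a sketch under stated assumptions: as far as the configuration docs describe it, PAPERLESS_OCR_LANGUAGE selects the languages used for recognition (joined with a plus sign) and PAPERLESS_OCR_LANGUAGES lists extra language packs for a Docker install (separated by spaces). Check both against the documentation for your version. The `ocr_env` function itself is hypothetical.

```python
from typing import Optional


def ocr_env(use: list[str], install: Optional[list[str]] = None) -> dict[str, str]:
    """Build OCR-related environment variables for a Docker-based install.

    use:     language codes to run OCR with
    install: extra language packs the container should install first
    """
    if not use:
        raise ValueError("at least one OCR language is required")
    env = {"PAPERLESS_OCR_LANGUAGE": "+".join(use)}
    if install:
        env["PAPERLESS_OCR_LANGUAGES"] = " ".join(install)
    return env


env = ocr_env(use=["deu", "eng", "tur"], install=["tur"])
for key, value in env.items():
    print(f"{key}={value}")
```

Copy the printed lines into your environment configuration, then restart the stack so the new languages take effect.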
Mail fetching not picking up emails? Verify your IMAP credentials and ensure the mail rule filters actually match. Check the Celery task log for connection errors: docker-compose logs -f webserver | grep mail.
Storage paths not applying? Matching assigns storage paths only when a document is consumed. Existing documents keep their current paths unless you assign a storage path to them yourself or re-run matching, for example with the document_retagger management command.
Wrapping Up
Paperless-ngx, with 37,000+ GitHub stars, has earned its reputation as the go-to self-hosted document management system. But the difference between "I use Paperless" and "Paperless runs my entire document workflow" comes down to whether you've configured the automation layer.
Set up your matching algorithms, build a few workflows, define your storage path templates, and let it run. You'll wonder why you ever sorted a document manually.
Thanks for reading. See you in the next one.