Inside the RealScan.cy Document Engine: From Scanned Invoice to Posted Ledger Entry
Why Accounting Automation Is Harder Than It Looks
Accounting documents look simple and behave like anything but. Every supplier formats an invoice differently, receipts arrive as crumpled photographs taken in bad light, credit notes hide among the invoices they reverse, and a bank statement is a dense table that means nothing until each line is matched to something. Underneath all of it sits a hard rule: a single wrong number does not stay on the screen; it lands straight in the books of a real client.
RealScan exists to take that work off a firm. It reads invoices, receipts, credit notes and bank statements, extracts what matters, checks it, and posts the result straight into the accounting platforms a firm already uses. For an accountant running many client books at once, that is the difference between an evening of manual entry and a few minutes of review.
Making that reliable is far harder than running text through an optical character reader. This is how the engine behind RealScan actually approaches the problem.
Beyond OCR: Reading a Document the Way a Bookkeeper Does
Optical character recognition turns pixels into characters. It does not tell you that the number in the bottom right is the total, that the small print near the top is a VAT registration, or that the figure beside the word "balance" is the one that has to reconcile. Characters are not meaning, and accounting runs on meaning.
The engine is layout-aware. It reads a page the way an experienced bookkeeper does, using position, structure and the relationship between a label and the value next to it, not just the words in isolation. A total is understood as a total because of where it sits, what surrounds it, and how it relates to the lines above, not because someone wrote a rule that the total lives in a fixed spot.
That shift, from reading text to understanding a document, is what lets the system cope with the endless variety of real paperwork instead of breaking on the first invoice that does not match a template.
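As a rough illustration of using position rather than a fixed spot, the sketch below pairs a label with the nearest number to its right or below it. It is a toy heuristic under stated assumptions, not the engine's actual method: the Token type, the is_number and value_for_label helpers, and the coordinate convention (y grows downward, as in image coordinates) are all hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Token:
    """A piece of OCR text with the top-left corner of its bounding box."""
    text: str
    x: float
    y: float


def is_number(text):
    """True if the text parses as a number once thousands commas are removed."""
    try:
        float(text.replace(",", ""))
        return True
    except ValueError:
        return False


def value_for_label(tokens, label):
    """Return the numeric token closest to the label, looking right or below it."""
    wanted = label.lower()
    anchor = next(
        (t for t in tokens if t.text.lower().rstrip(":") == wanted), None
    )
    if anchor is None:
        return None
    # Assumed convention: y grows downward, so "below" means a larger y.
    candidates = [
        t for t in tokens
        if is_number(t.text) and (t.x > anchor.x or t.y > anchor.y)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda t: hypot(t.x - anchor.x, t.y - anchor.y))
```

The same function finds the total whether the value sits beside the label or underneath it, which is the point: the rule is about the relationship between label and value, not about absolute page coordinates. A production system would need far more than nearest-neighbour distance.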
Template-Free Extraction Across Thousands of Layouts
The instinct with document automation is to build a template for each supplier: this vendor puts the invoice number here, that one puts it there. It works in a demo and collapses in production. A busy firm deals with hundreds of suppliers, layouts change without warning, and every new client arrives with a fresh stack of formats nobody has seen.
RealScan does not depend on per-supplier templates. It extracts the same set of fields (the supplier, the date, the document number, the net, the VAT, the total and the individual line items) whether or not it has ever seen that particular layout before. New formats are handled on arrival rather than after someone stops to teach the system what they look like.
Template-free extraction is harder to build, because the model has to generalize rather than memorize. It is also the only approach that survives contact with the real world, where the next document is almost never the same as the last one.
Four Document Types, Four Different Problems
Reading invoices, receipts, credit notes and bank statements sounds like one job. It is four. An invoice is relatively structured but endlessly varied. A receipt is often a photograph, faded, folded and partial, with the important numbers half lost to a thermal printer. A credit note looks like an invoice but reverses one, and getting the sign wrong quietly doubles an error. A bank statement is a long table where the work is less about reading a single field and more about making sense of many rows at once.
The engine treats each type on its own terms rather than forcing them through one pipeline. What a good extraction means for a receipt is not what it means for a bank statement, and the system is built around that difference instead of pretending it away.
Handling all four well is part of why RealScan replaces a real share of the work a firm does, rather than only the easy and tidy documents.
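The credit note case shows how a small sign mistake compounds. The sketch below is a minimal illustration, assuming a hypothetical DocType enum and signed_total helper: it normalizes the printed total so that a credit note always reduces the balance, however the supplier chose to print the figure.

```python
from decimal import Decimal
from enum import Enum


class DocType(Enum):
    INVOICE = "invoice"
    CREDIT_NOTE = "credit_note"


def signed_total(doc_type, printed_total):
    """Return the total with the sign the ledger needs.

    Suppliers print credit notes both as "100.00" and as "-100.00",
    so the magnitude is taken first and the sign is set by document type.
    """
    magnitude = abs(Decimal(printed_total))
    if doc_type is DocType.CREDIT_NOTE:
        return -magnitude
    return magnitude
```

An invoice for 100.00 and the credit note that reverses it should net to zero. If the credit note is posted with a positive sign, the books show 200.00 instead, which is how one wrong sign quietly doubles the error.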
Getting the Numbers Right: Validation and Confidence
Extracting a number is not the same as trusting it. Accounting carries its own built-in checks, and the engine uses them. Line items are expected to sum to the subtotal, VAT is expected to follow the rate, the totals are expected to reconcile, and a figure that breaks that arithmetic is flagged rather than waved through. The history of a supplier provides another sanity check: an amount or a tax treatment that looks nothing like the last fifty documents from the same vendor deserves a second look.
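The arithmetic checks can be sketched in a few lines. This is an illustration under stated assumptions rather than the engine's validation logic: the arithmetic_issues function, the one-cent TOLERANCE and the single VAT rate per document are all hypothetical simplifications, and real invoices can mix rates.

```python
from decimal import Decimal

# Assumed rounding tolerance of one cent; a real system may need per-currency rules.
TOLERANCE = Decimal("0.01")


def arithmetic_issues(lines, net, vat, total, vat_rate):
    """Return a list of human-readable problems; an empty list means the sums agree."""
    issues = []
    if abs(sum(lines, Decimal("0")) - net) > TOLERANCE:
        issues.append("line items do not sum to net")
    if abs(net * vat_rate - vat) > TOLERANCE:
        issues.append("VAT does not match rate")
    if abs(net + vat - total) > TOLERANCE:
        issues.append("net plus VAT does not equal total")
    return issues
```

Decimal is used instead of float so that cent-level comparisons are exact. A document with a non-empty issue list would be flagged for review rather than posted.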
Every extracted field carries a measure of confidence. Where the engine is sure, the document flows through on its own. Where it is not, the document is routed to a person for a quick check rather than guessed at. That human-in-the-loop design is deliberate. The goal is not to remove people from the process but to spend their attention only where it actually adds value.
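A minimal sketch of that routing decision follows. The route function, the 0.95 threshold and the field names are assumptions made for illustration; the source does not say how confidence is computed or what cut-off is used.

```python
# Hypothetical cut-off; in practice this would be tuned, possibly per field.
REVIEW_THRESHOLD = 0.95


def route(field_confidences, threshold=REVIEW_THRESHOLD):
    """Decide whether a document can post on its own or needs a reviewer.

    Returns a (decision, fields_to_check) pair so the reviewer sees only
    the fields the engine was unsure about.
    """
    uncertain = sorted(
        name for name, score in field_confidences.items() if score < threshold
    )
    if uncertain:
        return ("review", uncertain)
    return ("auto_post", [])
```

Returning the list of uncertain fields, not just a yes or no, is what keeps the review quick: the person checks one date or one amount instead of re-keying the whole document.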
The result is a system that is fast on the easy documents and careful on the hard ones, which is exactly the balance a firm needs when its name is on the accounts.
Posting Into the Books, Not Just Reading Them
Most document tools stop at extraction. They hand back a tidy set of fields and leave the real work, getting that document into the accounts, to a person. RealScan goes the rest of the way. It maps each document to the right supplier, the right ledger accounts and the right tax codes, and posts it straight into the accounting platform the firm already uses.
That last step is deceptively deep. The same supplier might belong to different accounts for different clients, tax treatment varies, and a credit note has to land against the right invoice. Posting correctly means understanding not just what a document says but where it belongs in a particular set of books, and doing it the way a careful bookkeeper would.
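To make the per-client point concrete, here is a small sketch in which the same supplier resolves to different ledger accounts depending on whose books are being posted. Everything in it is hypothetical: the Posting type, the resolve_posting function, the client and supplier identifiers, and the account and tax codes are invented for illustration, and a plain lookup table stands in for whatever mapping logic the engine really uses.

```python
from typing import NamedTuple, Optional


class Posting(NamedTuple):
    ledger_account: str
    tax_code: str


# Rules are keyed by (client_id, supplier_id), never by supplier alone.
EXAMPLE_RULES = {
    ("client_a", "acme"): Posting("6000-materials", "VAT-STD"),
    ("client_b", "acme"): Posting("6400-office", "VAT-STD"),
}


def resolve_posting(rules, client_id, supplier_id) -> Optional[Posting]:
    """Look up where a supplier's documents belong in one client's books.

    Returns None when this client has no rule for the supplier, which a
    caller would treat as a reason to ask a person instead of guessing.
    """
    return rules.get((client_id, supplier_id))
```

Keying on the client as well as the supplier means a rule set up for one set of books can never be applied to another by accident.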
Reading a document is the part that demos well. Posting it correctly, every time, across many clients, is the part that actually saves a firm its evenings.
Built for Firms That Run Many Sets of Books
RealScan is built for multi-company accountants, and that shapes everything. A firm is not one set of books; it is dozens, each with its own suppliers, its own chart of accounts and its own quirks. The engine keeps those worlds strictly separate, so what it learns about one client never leaks into another.
Within each of those walls, the system adapts. It learns the suppliers a particular client deals with, the accounts they tend to map to, and the patterns specific to that set of books, so the work gets lighter the longer a firm uses it. Strict separation and quiet personalization at the same time is a large part of what makes the tool fit the way accountants actually operate.
Data Protection Is Engineered In, Not Bolted On
Financial documents are some of the most sensitive data a business holds, and a firm hands RealScan the books of its clients. That trust is treated as a design requirement, not a marketing line. Data is isolated and encrypted, access is controlled, retention is kept to what is needed, and the whole system is built to handle client information the way European data protection rules expect.
For an accountant in Greece, in Cyprus or anywhere in the European Union, this is not optional, and the same care follows RealScan to firms working further afield. The value of automating document entry disappears the moment it puts client confidentiality at risk, so protecting that information is built into the foundations rather than added at the end.
It Gets Better the More It Works
When a reviewer corrects a field, that correction is not thrown away. It feeds back into the system, so the same supplier, the same document type and the same client need less attention next time. The engine that handles the hundredth invoice from a given supplier is quietly better at it than the one that handled the first.
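One very simple way to picture that feedback loop is a per-client tally of reviewer corrections. The CorrectionMemory class below is a hypothetical stand-in, assuming a plain frequency count rather than any model retraining; how RealScan actually learns from corrections is not described in this article.

```python
from collections import Counter, defaultdict


class CorrectionMemory:
    """Remembers which ledger account reviewers chose, per client and supplier."""

    def __init__(self):
        # (client_id, supplier_id) -> Counter of ledger accounts chosen by reviewers
        self._seen = defaultdict(Counter)

    def record(self, client_id, supplier_id, ledger_account):
        """Store one reviewer correction."""
        self._seen[(client_id, supplier_id)][ledger_account] += 1

    def suggest(self, client_id, supplier_id):
        """Return the most frequently chosen account, or None if nothing is known."""
        counts = self._seen.get((client_id, supplier_id))
        if not counts:
            return None
        return counts.most_common(1)[0][0]
```

Because the tally is keyed by client as well as supplier, corrections made in one set of books improve suggestions only for that set of books.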
This is what separates a static tool from a system that earns its place over time. RealScan is designed to keep getting lighter to use, because the work it has already done makes the work ahead easier.
What We Keep Behind the Curtain
We have described the shape of the engine: layout-aware document understanding, template-free extraction across invoices, receipts, credit notes and bank statements, arithmetic and history-based validation, confidence scoring with human review where it matters, posting straight into existing accounting platforms, strict separation for multi-company firms, and a feedback loop that makes it sharper over time. That is genuinely how serious document automation is built.
What stays proprietary is everything that turns that architecture into the results RealScan delivers: the models and how they are trained, the exact extraction and validation logic, the way documents are mapped and posted, the data, and the tuning gathered across a great many real documents. The approach is open to talk about. The edge is not.
If you want to see it in action, RealScan reads invoices, receipts, credit notes and bank statements and posts them straight into the platforms accounting firms already use, with a free companion app and a serious focus on protecting client data. Open the live platform from the link on this page.