Financial Services

AI Document Processing for Financial Services

A mid-market lending institution processing over 50,000 loan applications annually across personal loans, business lending, and mortgage products, with operations spanning three European countries.

85%

Faster Processing

97.3%

Extraction Accuracy

60%

Cost Reduction

12x

Throughput Increase

The Problem

Challenge

The lending institution received loan applications through multiple channels: its website, broker portals, branch offices, and a white-label lending platform it operated for two partner banks. Each application included between 8 and 35 supporting documents depending on the product type, including payslips, bank statements, tax returns, proof of address, identification documents, company financial statements, and property valuations for mortgage applications.

The manual review process was the primary bottleneck in the lending pipeline. A team of 24 document processors worked in shifts to review incoming applications. Each processor manually opened every document, identified the document type, extracted the relevant data fields, entered the information into the loan origination system, and flagged any discrepancies or missing information. The average time from application submission to document review completion was 48 hours, though complex mortgage applications with multiple applicants could take up to five days.

Manual data entry had an error rate of approximately 4.2%, and the errors were typically discovered only during the subsequent underwriting stage. Each error required the application to be sent back for re-review, adding further delays and creating friction in the customer experience. The institution estimated that data entry errors were contributing to a 12% abandonment rate on applications that had already been conditionally approved.

Document format variability was the core technical challenge. Bank statements arrived from over 200 different banks, each with its own format. Payslips came from thousands of employers with no standardization. Tax returns varied by country and year. A previous attempt at automation using template-based OCR had failed because it required a separate template for every document format, which was not maintainable at the scale of variation the institution encountered.

Our Approach

Solution

We built an intelligent document processing pipeline using Azure AI Document Intelligence as the extraction foundation, augmented with custom ML.NET models trained on the institution's specific document corpus. The system processes documents through five stages: intake, classification, extraction, validation, and integration with the loan origination system.

The intake stage normalizes incoming documents regardless of source channel. Documents arrive as PDFs, scanned images, email attachments, and photographs taken in mobile applications. The system standardizes all inputs into a consistent format, applying image enhancement for low-quality scans and automatic rotation correction for mobile photographs.

The classification stage uses a custom-trained model that identifies the document type with 99.1% accuracy across 42 document categories. This model was trained on 180,000 labelled documents from the institution's historical archives. When the classifier confidence falls below 92%, the document is routed to a human reviewer through a purpose-built review interface.

The extraction stage is where the core intelligence operates. Azure AI Document Intelligence handles the foundational OCR and layout analysis, while our custom extraction models target specific data fields for each document type. For bank statements, the system extracts account holder details, transaction histories, running balances, and income patterns. For payslips, it identifies gross pay, deductions, net pay, employer details, and pay period.
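The confidence-based routing in the classification stage can be illustrated with a minimal Python sketch. Only the 92% threshold comes from the text above; the `Classification` type, the `route` function, and the queue names `human_review` and `extraction` are hypothetical names chosen for illustration, and the production models run on ML.NET rather than Python.

```python
from dataclasses import dataclass

# Threshold from the case study: below 92% confidence, a person reviews the document.
REVIEW_THRESHOLD = 0.92


@dataclass
class Classification:
    document_id: str
    label: str         # predicted document category, e.g. "payslip"
    confidence: float  # classifier confidence in the range [0, 1]


def route(result: Classification, threshold: float = REVIEW_THRESHOLD) -> str:
    """Return the name of the next queue for a classified document."""
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {result.confidence}")
    # Strictly below the threshold goes to a reviewer; everything else continues.
    if result.confidence < threshold:
        return "human_review"
    return "extraction"
```

Keeping the threshold as a parameter rather than a hard-coded constant makes it straightforward to recalibrate it later, for example against shadow-processing results.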

The validation engine applies business rules specific to the lending products. It cross-references extracted data across documents within the same application, checking for consistency in names, addresses, and income figures. It calculates affordability ratios, identifies potential fraud indicators such as altered documents, and generates a confidence score for each application.
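As a rough illustration of the cross-referencing step, the Python sketch below compares applicant names and income figures across the documents of one application and computes one possible affordability ratio. The field names, the 5% income tolerance, and the use of a debt-to-income ratio are assumptions made for the example; the institution's actual rules are not described in this case study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExtractedDocument:
    doc_type: str
    applicant_name: str
    monthly_net_income: Optional[float] = None  # None if the document has no income figure


def normalize_name(name: str) -> str:
    """Lower-case and collapse whitespace so formatting differences are ignored."""
    return " ".join(name.lower().split())


def consistency_issues(docs: List[ExtractedDocument],
                       income_tolerance: float = 0.05) -> List[str]:
    """Return readable discrepancies found across one application's documents."""
    issues = []
    names = {normalize_name(d.applicant_name) for d in docs}
    if len(names) > 1:
        issues.append(f"name mismatch: {sorted(names)}")
    incomes = [d.monthly_net_income for d in docs if d.monthly_net_income is not None]
    if incomes and max(incomes) > 0:
        # Relative gap between the highest and lowest reported income.
        spread = (max(incomes) - min(incomes)) / max(incomes)
        if spread > income_tolerance:
            issues.append(f"income mismatch: spread {spread:.1%}")
    return issues


def debt_to_income(monthly_debt: float, monthly_income: float) -> float:
    """One common affordability ratio: monthly debt payments over monthly income."""
    if monthly_income <= 0:
        raise ValueError("monthly_income must be positive")
    return monthly_debt / monthly_income
```

A real engine would need fuzzier name matching (initials, diacritics, maiden names) than the exact comparison used here.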

An Azure Functions-based orchestration layer manages the pipeline, handling retries for failed extractions, routing edge cases to human reviewers, and tracking processing status in real time through a SignalR-powered dashboard. The human-in-the-loop interface was designed for speed, presenting reviewers with the extracted data alongside the source document, with discrepancies highlighted. Reviewers correct and confirm with minimal keystrokes, and their corrections feed back into the training pipeline to continuously improve model accuracy.
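The retry-then-escalate behaviour of the orchestration layer might look something like the following Python sketch. It is not the Azure Functions implementation: `ExtractionError`, `extract_with_retries`, the three-attempt limit, and the exponential backoff schedule are all assumptions made for illustration.

```python
import time
from typing import Callable


class ExtractionError(Exception):
    """Raised by an extraction call that may succeed if tried again."""


def extract_with_retries(extract: Callable[[str], dict],
                         document_id: str,
                         max_attempts: int = 3,
                         base_delay: float = 0.5,
                         sleep: Callable[[float], None] = time.sleep) -> dict:
    """Try an extraction several times, then hand the document to a reviewer."""
    for attempt in range(1, max_attempts + 1):
        try:
            fields = extract(document_id)
            return {"status": "extracted", "attempts": attempt, "fields": fields}
        except ExtractionError:
            if attempt < max_attempts:
                # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                sleep(base_delay * 2 ** (attempt - 1))
    # Every attempt failed, so route the document to the human review queue.
    return {"status": "human_review", "attempts": max_attempts, "fields": {}}
```

Passing `sleep` in as a parameter keeps the function testable without real delays; a caller can substitute a function that simply records the requested wait times.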

Delivery

Implementation Phases

01

Document Corpus Analysis and Model Training

We catalogued all 42 document types the institution receives, built the training dataset from 180,000 historical documents, and trained the initial classification and extraction models. This phase included establishing the accuracy baselines and defining the confidence thresholds that would determine human review routing.

02

Core Pipeline Development

The document intake, classification, and extraction pipeline was built using Azure Functions for orchestration and Cosmos DB for document state management. We integrated Azure AI Document Intelligence and deployed the custom ML.NET models, running parallel processing tests against historical applications to validate accuracy.

03

Validation Engine and Review Interface

The business rules validation engine was developed in close collaboration with the underwriting team. The human-in-the-loop review interface was built as a Blazor WebAssembly application, optimized for keyboard-driven workflows to maximize reviewer throughput.

04

Integration and Shadow Processing

We integrated the pipeline with the existing loan origination system via REST APIs and ran four weeks of shadow processing where both the manual team and the automated system processed the same applications. Results were compared daily to identify extraction gaps and calibrate confidence thresholds.
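A daily shadow-processing comparison could be computed along these lines. This is a minimal Python sketch under assumed data shapes: each side is a mapping from application ID to extracted field values, and `field_agreement` is a hypothetical helper, not part of the delivered system.

```python
from typing import Dict


def field_agreement(manual: Dict[str, Dict[str, str]],
                    automated: Dict[str, Dict[str, str]]) -> Dict[str, float]:
    """Per-field share of applications where both paths produced the same value."""
    matches: Dict[str, int] = {}
    totals: Dict[str, int] = {}
    for app_id, manual_fields in manual.items():
        # An application missing from the automated run counts as a disagreement.
        auto_fields = automated.get(app_id, {})
        for field, manual_value in manual_fields.items():
            totals[field] = totals.get(field, 0) + 1
            if auto_fields.get(field) == manual_value:
                matches[field] = matches.get(field, 0) + 1
    return {field: matches.get(field, 0) / totals[field] for field in totals}
```

A low agreement rate on a field shows only that the two paths differ, not which one is wrong, so disagreements still need to be checked against the source document before a threshold is adjusted.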

05

Production Rollout and Continuous Improvement

The system went live with personal loan applications first, expanding to business lending after two weeks and mortgage products after six weeks. The feedback loop from human reviewers was activated, with model retraining running weekly to incorporate corrections and improve accuracy on edge cases.

Outcomes

Results

  • Average document processing time reduced from 48 hours to under 6 hours per application, a reduction of more than 85%
  • Extraction accuracy reached 97.3% across all document types, compared to 95.8% for manual processing
  • Document processing team reduced from 24 to 9 staff, with the remaining team focused on complex edge cases and quality assurance
  • Operational cost reduced by 60%, saving approximately EUR 1.1 million annually
  • Application throughput increased 12x, from 200 to 2,400 applications processed per day at peak capacity
  • Data entry error rate dropped from 4.2% to 0.8%, significantly reducing underwriting rework
  • Application abandonment rate decreased from 12% to 4.5% due to faster processing and fewer re-review cycles
  • The system processes documents in three languages (English, German, Dutch) with consistent accuracy across all three

"The previous automation attempt failed because it could not handle the sheer variety of documents we receive. This system handles bank statements from 200 different banks and payslips from thousands of employers without needing a template for each one. The accuracy actually exceeds what our manual team was achieving, and applications that used to sit in a queue for two days are now processed in hours."

Elena

Head of Lending Operations

Engineering

Technology Stack

  • Azure AI Document Intelligence
  • .NET 8
  • C#
  • Azure Functions
  • Cosmos DB
  • ML.NET
  • Blazor WebAssembly
  • SignalR
  • Azure Blob Storage
  • REST APIs
  • Python
  • Docker
