Building an OCR System Using RunPod Serverless
Learn how to build an Optical Character Recognition (OCR) system using RunPod Serverless and pre-trained models from Hugging Face to automate the processing of receipts and invoices.
Introduction
Processing receipts and invoices manually is both time-consuming and prone to errors. Optical Character Recognition (OCR) systems can automate this task by extracting text from images and converting it into structured data. In this tutorial, you will learn how to build your own OCR system using RunPod Serverless and pre-trained models from Hugging Face. This system will enable you to efficiently convert images of receipts into digital invoices, streamlining your workflow and reducing manual data entry.
Prerequisites
To complete this tutorial, you will need:
-
A RunPod account with access to RunPod Serverless.
-
Basic knowledge of Python programming.
-
Familiarity with RESTful APIs.
-
Python 3 installed on your local machine.
-
The following Python libraries installed:
pip install requests pillow pdf2image pillow_heif argparse
Step 1 — Setting Up the RunPod Serverless Environment
First, you'll set up a serverless endpoint on RunPod. RunPod Serverless allows you to deploy and run machine learning models without managing the underlying infrastructure.
Deploying the OCR Model
After deployment, you'll receive an Endpoint ID, which you'll use to interact with the model.
OPENAI BASE URL https://api.runpod.ai/v2/vllm-xxxxxxxxxxx/openai/v1
RUNSYNC https://api.runpod.ai/v2/vllm-xxxxxxxxxxx/runsync
RUN https://api.runpod.ai/v2/vllm-xxxxxxxxxxx/run
STATUS https://api.runpod.ai/v2/vllm-xxxxxxxxxxx/status/:id
CANCEL https://api.runpod.ai/v2/vllm-xxxxxxxxxxx/cancel/:id
HEALTH https://api.runpod.ai/v2/vllm-xxxxxxxxxxx/health
Writing the InvoiceProcessor
Class
Create a file named invoice_processor.py
and add the following code snippets.
Importing Required Libraries
import os
import base64
import requests
from PIL import Image
import io
import pdf2image
import pillow_heif
from pathlib import Path
Converting Images to Base64
Takes in images and turns them into base64 encoded schemes that the model is able to ingeest and run inference on.
def convert_to_base64(self, file_path):
"""
Convert various file formats to base64.
Supports JPEG, JPG, PDF, HEIC/HEIF formats.
"""
file_path = str(file_path)
if not os.path.exists(file_path):
raise FileNotFoundError(f"File not found: {file_path}")
file_extension = Path(file_path).suffix.lower().replace('.', '')
if file_extension not in self.supported_formats:
raise ValueError(f"Unsupported file format: {file_extension}")
# Handle PDF files
if file_extension == 'pdf':
images = pdf2image.convert_from_path(file_path, first_page=1, last_page=1)
if not images:
raise ValueError("Failed to convert PDF to image")
img = images[0]
# Handle HEIC/HEIF files
elif file_extension in ['heic', 'heif']:
heif_file = pillow_heif.read_heif(file_path)
img = Image.frombytes(
heif_file.mode,
heif_file.size,
heif_file.data,
"raw",
)
# Handle JPEG/JPG files
else:
img = Image.open(file_path)
# Convert image to RGB if necessary
if img.mode != 'RGB':
img = img.convert('RGB')
# Convert to JPEG format in memory
buffer = io.BytesIO()
img.save(buffer, format='JPEG')
img_bytes = buffer.getvalue()
return base64.b64encode(img_bytes).decode('utf-8')
Processing the Invoice Image
payload = {
"input": {
"prompt": prompt,
"image": encoded_image,
"temperature": 0.1,
"max_tokens": 1000
}
}
response = requests.post(
self.endpoint_url.format(endpoint_id=endpoint_id),
headers=headers,
json=payload
)
if response.status_code != 200:
raise Exception(f"RunPod API request failed: {response.text}")
return response.json()
Batch Processing of Invoices
Option to batch process multiple receipts into a single invoice.
def batch_process_invoices(self, input_directory, endpoint_id):
"""
Process all supported invoice images in a directory.
Args:
input_directory: Directory containing invoice images
endpoint_id: RunPod endpoint ID
Returns:
List of processed invoice data
"""
results = []
errors = []
for file_path in Path(input_directory).iterdir():
if file_path.suffix.lower().replace('.', '') in self.supported_formats:
try:
result = self.process_invoice_image(str(file_path), endpoint_id)
results.append({
'file_name': file_path.name,
'data': result
})
except Exception as e:
errors.append({
'file_name': file_path.name,
'error': str(e)
})
if errors:
print("Errors occurred during batch processing:")
for error in errors:
print(f"File: {error['file_name']} - Error: {error['error']}")
return results
Creating a Runpod API Key
Processing a Single Image
Run the following command to process a single image:
python run_processor.py --api-key "your-runpod-api-key" --endpoint-id "your-endpoint-id" --input path/to/invoice.jpg
Replace:
"your-runpod-api-key"
with your actual RunPod API key."your-endpoint-id"
with your endpoint ID from RunPod.path/to/invoice.jpg
with the path to your invoice image.
Processing a Batch of Images
To process all supported images in a directory:
python run_processor.py --api-key "your-runpod-api-key" --endpoint-id "your-endpoint-id" --input path/to/invoice_directory
Step 5 — Examining the Output
After running the script, you'll find JSON files in the ./output
directory.
Example Output (invoice_processed.json
):
{
"type": "Invoice",
"invoice_number": "12345",
"date": "2023-10-01",
"vendor_name": "ABC Supplies",
"line_items": [
{
"description": "Item A",
"quantity": 2,
"unit_price": 50.00,
"total": 100.00
},
{
"description": "Item B",
"quantity": 1,
"unit_price": 150.00,
"total": 150.00
}
],
"subtotal": 250.00,
"tax_amount": 25.00,
"total_amount": 275.00
}
Changing Serverless Template to New Model
If you want to change the model from hugging-faces you can update the model URL.
Step 6 — Generating Invoices from Extracted Data
Now, you can use the extracted JSON data to generate formatted invoices.
Generating a PDF Invoice
Use the ReportLab
library to create a PDF invoice.
Installing ReportLab
pip install reportlab
Writing the PDF Generation Script
Create a file named generate_invoice.py
and add the following code:
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
import json
def create_invoice(json_data, output_pdf_path):
c = canvas.Canvas(output_pdf_path, pagesize=letter)
width, height = letter
# Extract data from JSON
invoice_number = json_data.get('invoice_number', 'N/A')
date = json_data.get('date', 'N/A')
vendor_name = json_data.get('vendor_name', 'N/A')
total_amount = json_data.get('total_amount', 'N/A')
# Add text to PDF
c.drawString(50, height - 50, f"Invoice Number: {invoice_number}")
c.drawString(50, height - 70, f"Date: {date}")
c.drawString(50, height - 90, f"Vendor: {vendor_name}")
c.drawString(50, height - 110, f"Total Amount: ${total_amount}")
# Save PDF
c.save()
# Usage example
with open('output/invoice_processed.json') as f:
json_data = json.load(f)
create_invoice(json_data, 'output/invoice.pdf')
Running the PDF Generation Script
Run the following command:
python generate_invoice.py
This script will generate a PDF invoice based on the data extracted from your image.
Conclusion
In this tutorial, you built an OCR system using RunPod Serverless and a pre-trained model from Hugging Face. By automating the extraction of text from images and converting it into structured data, you've streamlined the process of generating invoices from receipts. This solution saves time and reduces the potential for errors associated with manual data entry.
Next Steps
Consider enhancing your OCR system by:
- Improving Error Handling: Add more robust exception handling to manage edge cases.
- Customizing Invoice Templates: Use advanced PDF generation libraries to create professional invoice layouts.
- Integrating with Accounting Software: Connect your system to accounting platforms like QuickBooks or Xero for seamless workflow integration.
Additional Resources
By following this tutorial, you've gained hands-on experience in building an OCR system, processing images, and generating digital invoices using RunPod. This foundation opens up opportunities to explore more complex data extraction and document processing tasks.