A from-scratch implementation of the CNN-RNN-CTC handwriting recognition stack — written to understand the pipeline rather than to chase a leaderboard.
Problem
OCR is usually treated as a closed API call. This project takes it apart: image preprocessing → convolutional feature extractor → recurrent sequence model → CTC alignment → decoded string, with no high-level OCR library in the path.
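The final arrow in that pipeline, CTC output to decoded string, is the easiest stage to show in isolation. Below is a minimal sketch of greedy (best-path) CTC decoding in plain Python. The function names, the `alphabet` list, and the choice of index 0 as the blank label are assumptions made for illustration, not taken from the repository.

```python
def argmax_ids(frames):
    """Pick the highest-scoring label index for each time step.

    `frames` is a list of per-frame score lists (one score per label).
    """
    return [max(range(len(frame)), key=frame.__getitem__) for frame in frames]


def ctc_greedy_decode(frame_ids, alphabet, blank=0):
    """Collapse a best-path label sequence into a string.

    CTC rule: merge consecutive repeats first, then drop blanks.
    `alphabet[i]` is the character for label i; the entry at `blank`
    is never emitted (blank index 0 is an assumption of this sketch).
    """
    chars = []
    prev = None
    for idx in frame_ids:
        # Emit only when the label changes and is not the blank symbol.
        if idx != prev and idx != blank:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)


# Hypothetical 5-label alphabet: blank, h, e, l, o
alphabet = ["", "h", "e", "l", "o"]
# The blank between the two runs of label 3 is what keeps the double "l".
print(ctc_greedy_decode([1, 1, 2, 0, 3, 3, 0, 3, 4], alphabet))  # hello
```

Without the blank between repeated labels, the two runs would merge into one character, which is why CTC needs the blank symbol at all.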
What I built
A PyTorch training pipeline on the line-level IAM handwriting dataset, evaluated with Character Error Rate (CER) and Word Error Rate (WER), plus a small inference web UI under OCR_WebApp/ for visual inspection of predictions.
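Both metrics are edit distance normalised by reference length: Character Error Rate counts character edits, Word Error Rate counts word edits. The sketch below shows one common per-line formulation in plain Python. The function names are hypothetical, and the repository's evaluation harness may aggregate differently (for example, summing edits and reference lengths over the whole split before dividing).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from ref
                            curr[j - 1] + 1,      # insert h into ref
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# One wrong character out of 11, one wrong word out of 2.
print(cer("hello world", "hallo world"))
print(wer("hello world", "hallo world"))  # 0.5
```

A single character slip costs little in CER but invalidates the whole word in WER, so WER is normally the higher of the two on the same predictions.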
Technical components
- Framework: PyTorch
- Architecture: CNN feature extractor → BiLSTM → CTC head
- Data: IAM handwriting database, line-level
- Metrics: Character Error Rate, Word Error Rate (held-out splits)
- Decoding: Greedy + beam CTC decoding
- Inference UI: Web app under OCR_WebApp/ for visual prediction inspection
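To illustrate the beam CTC decoding listed above, here is a minimal sketch of CTC prefix beam search without a language model. It is written in plain Python over per-frame probabilities. The function name, the `beam_width` default, and blank index 0 are assumptions for illustration. It also multiplies raw probabilities, which would underflow on long lines, so a practical version would more likely work in log space.

```python
from collections import defaultdict


def ctc_beam_decode(probs, alphabet, beam_width=3, blank=0):
    """CTC prefix beam search (no language model).

    `probs` is a list of per-frame probability lists over the labels.
    Each prefix tracks two masses: paths ending in blank (p_b) and
    paths ending in the prefix's last label (p_nb).
    """
    beams = {(): (1.0, 0.0)}
    for frame in probs:
        nxt = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for idx, p in enumerate(frame):
                if idx == blank:
                    # Blank keeps the prefix unchanged.
                    nxt[prefix][0] += (p_b + p_nb) * p
                elif prefix and idx == prefix[-1]:
                    # Repeat without a blank merges into the same prefix...
                    nxt[prefix][1] += p_nb * p
                    # ...and only extends it if a blank came in between.
                    nxt[prefix + (idx,)][1] += p_b * p
                else:
                    nxt[prefix + (idx,)][1] += (p_b + p_nb) * p
        # Keep the beam_width prefixes with the largest total mass.
        ranked = sorted(nxt.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(mass) for prefix, mass in ranked[:beam_width]}
    best = max(beams, key=lambda prefix: sum(beams[prefix]))
    return "".join(alphabet[i] for i in best)


# Two frames, labels: blank (0) and "a" (1). Blank wins each frame,
# so the best single path is empty, but the paths "a-", "-a" and "aa"
# together give the prefix "a" more total probability (0.64 vs 0.36).
frames = [[0.6, 0.4], [0.6, 0.4]]
print(ctc_beam_decode(frames, ["", "a"]))  # a
```

The toy input shows why the two decoders can disagree: greedy decoding follows the single most likely path, while beam search sums over all paths that collapse to the same string.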
Evidence / outputs
TODO: publish concrete CER / WER numbers on the IAM test split, a comparison row against a CRNN baseline, model size, and average per-line inference latency. The current README ships the architecture and evaluation harness but not a results table — this page intentionally does not invent numbers.
Current status
Experimental. The architecture and training loop are stable; the results table in the repo is the remaining open item.
Limitations
- Line-level only — no full-page layout analysis.
- No language model fusion; outputs come from greedy or plain beam search over the raw model predictions.
- No published baseline comparison yet (see Evidence).