handwritten-ocr-system

A from-scratch implementation of the CNN-RNN-CTC handwriting recognition stack — written to understand the pipeline rather than to chase a leaderboard.

Problem

OCR is usually treated as a closed API call. This project takes it apart: image preprocessing → convolutional feature extractor → recurrent sequence model → CTC alignment → decoded string, with no high-level OCR library in the path.

What I built

A PyTorch training pipeline on the IAM handwriting dataset (line-level), evaluated with Character Error Rate (CER) and Word Error Rate (WER). A small inference web UI under OCR_WebApp/ supports visual inspection of predictions.
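
For readers unfamiliar with the two metrics, here is a minimal sketch of how CER and WER are commonly computed: Levenshtein edit distance over characters or whitespace-separated words, divided by the reference length. The function names (edit_distance, error_rate, cer, wer) and the corpus-level aggregation (total edits over total reference length) are assumptions for illustration; the repo's evaluation harness may normalize or aggregate differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference items to reach the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, tokenize):
    """Total edits divided by total reference length (assumed corpus-level)."""
    total_edits = 0
    total_len = 0
    for ref, hyp in zip(refs, hyps):
        r, h = tokenize(ref), tokenize(hyp)
        total_edits += edit_distance(r, h)
        total_len += len(r)
    return total_edits / total_len if total_len else 0.0


def cer(refs, hyps):
    # Character Error Rate: tokens are individual characters, spaces included.
    return error_rate(refs, hyps, list)


def wer(refs, hyps):
    # Word Error Rate: tokens are whitespace-separated words.
    return error_rate(refs, hyps, str.split)
```

With the reference "hello world" and the hypothesis "hallo world", this gives one character edit over 11 reference characters for CER and one word edit over 2 reference words for WER.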

Technical components

Framework: PyTorch
Architecture: CNN feature extractor → BiLSTM → CTC head
Data: IAM handwriting database, line-level
Metrics: Character Error Rate, Word Error Rate (held-out splits)
Decoding: Greedy and beam-search CTC decoding
Inference UI: Web app under OCR_WebApp/ for visual prediction inspection
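
To make the decoding step in the list above concrete, here is a minimal sketch of greedy (best-path) CTC decoding in plain Python: take the highest-scoring class per frame, collapse consecutive repeats, then drop blanks. The function name ctc_greedy_decode, the charset argument, and the convention that class 0 is the blank (the default in PyTorch's nn.CTCLoss) are assumptions for illustration, not the repo's actual code.

```python
def ctc_greedy_decode(scores, charset, blank=0):
    """Best-path CTC decoding.

    scores:  list of T frames, each a list of per-class scores
             (probabilities or log-probabilities; only the argmax matters).
    charset: string of non-blank characters; class index i + 1 maps to
             charset[i] (assumed layout, with the blank at index 0).
    """
    # Step 1: pick the most likely class in every frame.
    best_path = [max(range(len(row)), key=row.__getitem__) for row in scores]

    # Step 2: collapse repeated labels, then remove blanks.
    decoded = []
    prev = None
    for idx in best_path:
        if idx != prev and idx != blank:
            decoded.append(charset[idx - 1])
        prev = idx
    return "".join(decoded)
```

Note that a blank between two identical labels is what lets CTC emit doubled letters: the frame path a, a, blank, a, b, b decodes to "aab", not "ab".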

Evidence / outputs

TODO: publish concrete CER / WER numbers on the IAM test split, a comparison row against a CRNN baseline, model size, and average per-line inference latency. The current README ships the architecture and evaluation harness but not a results table — this page intentionally does not invent numbers.

Current status

Experimental. The architecture and training loop are stable; publishing the results table in the repo is the remaining open item.

Limitations

Line-level only — no full-page layout analysis.
No language model fusion; decoded outputs are raw model predictions plus simple beam search.
No published baseline comparison yet (see Evidence).
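
As an illustration of beam search without language model fusion, here is a minimal sketch of CTC prefix beam search in plain Python. It tracks, for each candidate prefix, the probability of ending in a blank and of ending in a non-blank, and merges all frame paths that collapse to the same string. The function name ctc_prefix_beam_search, the beam_width default, the charset layout (blank at class 0), and the use of linear probabilities are assumptions for illustration; a practical implementation would work in log space to avoid underflow on long lines, and the repo's decoder may differ.

```python
from collections import defaultdict


def ctc_prefix_beam_search(probs, charset, beam_width=4, blank=0):
    """CTC prefix beam search with no language model.

    probs:   list of T frames, each a list of per-class probabilities.
    charset: string of non-blank characters; class i + 1 maps to charset[i].
    """
    # prefix (tuple of class indices) -> (p_ends_in_blank, p_ends_in_non_blank)
    beams = {(): (1.0, 0.0)}
    for row in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for c, p in enumerate(row):
                if c == blank:
                    # A blank keeps the prefix unchanged.
                    next_beams[prefix][0] += (p_b + p_nb) * p
                elif prefix and prefix[-1] == c:
                    # Repeat directly after the same label: collapses into it.
                    next_beams[prefix][1] += p_nb * p
                    # Repeat after a blank: a genuinely new character.
                    next_beams[prefix + (c,)][1] += p_b * p
                else:
                    # A different label always extends the prefix.
                    next_beams[prefix + (c,)][1] += (p_b + p_nb) * p
        # Keep only the beam_width most probable prefixes.
        ranked = sorted(next_beams.items(),
                        key=lambda kv: kv[1][0] + kv[1][1], reverse=True)
        beams = {pref: (pb, pnb) for pref, (pb, pnb) in ranked[:beam_width]}
    best = max(beams.items(), key=lambda kv: kv[1][0] + kv[1][1])[0]
    return "".join(charset[c - 1] for c in best)
```

A two-frame example shows why this can beat greedy decoding: with per-frame probabilities of 0.6 for blank and 0.4 for "a", the single best path is blank, blank (probability 0.36, decoding to the empty string), but the three paths that collapse to "a" sum to 0.64, so the prefix search returns "a".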

Repo

github.com/GioiaZheng/handwritten-ocr-system