A Python-based desktop automation system that extracts text from an image using OCR, types the result into Notepad, and saves it as a .txt file.
This project combines:
- Computer Vision (OCR)
- Text reconstruction using spatial logic
- Desktop GUI automation
- Window management
- Extracts text from images using EasyOCR
- Reconstructs text lines based on bounding box coordinates
- Automatically opens Notepad
- Simulates human typing
- Saves the output as a .txt file
- Prevents overwriting existing files (auto-increment naming)
- Closes unexpected popup windows during execution
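
The auto-increment naming feature could be sketched roughly as below. This is an illustrative sketch rather than the project's actual code: the helper name `next_free_path`, the default stem `output`, and the `_1`, `_2`, ... suffix pattern are all assumptions.

```python
from pathlib import Path


def next_free_path(directory: Path, stem: str = "output", suffix: str = ".txt") -> Path:
    """Return a path inside `directory` that does not exist yet.

    Hypothetical helper: tries output.txt first, then output_1.txt,
    output_2.txt, and so on, so an existing file is never overwritten.
    """
    # Create the output directory if it is missing.
    directory.mkdir(parents=True, exist_ok=True)

    candidate = directory / f"{stem}{suffix}"
    counter = 1
    # Keep incrementing the counter until an unused name is found.
    while candidate.exists():
        candidate = directory / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate
```

Calling `next_free_path(Path("output"))` before each save gives a fresh file name every run, at the cost of a small race window if two processes save at the same moment.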
The system follows this pipeline:
- Load OCR model
- Detect and recognize text from image
- Sort detected words vertically (top → bottom)
- Group words into lines using Y-distance threshold
- Sort words horizontally inside each line (left → right)
- Open Notepad automatically
- Type extracted text
- Save as a .txt file in the output directory
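
The sorting and grouping steps of this pipeline can be illustrated with a short sketch. It assumes detections in the shape EasyOCR's `readtext` returns, a list of `(bbox, text, confidence)` tuples where `bbox` is four `[x, y]` corner points. The function name `reconstruct_lines` and the default threshold of 15 pixels are assumptions, not values taken from the project.

```python
def reconstruct_lines(detections, y_threshold=15):
    """Rebuild text lines from OCR detections using bounding box positions.

    detections: iterable of (bbox, text, confidence), where bbox is a
    sequence of four [x, y] corner points.
    y_threshold: assumed maximum vertical distance (in pixels) between a
    word and the first word of a line for both to share that line.
    """
    # Reduce each box to its top edge (y) and left edge (x).
    words = []
    for bbox, text, _confidence in detections:
        top = min(point[1] for point in bbox)
        left = min(point[0] for point in bbox)
        words.append((top, left, text))

    # Sort words vertically, top to bottom.
    words.sort(key=lambda word: word[0])

    # Group words into lines by comparing against the line's first word.
    lines = []
    for top, left, text in words:
        if lines and abs(top - lines[-1]["y"]) <= y_threshold:
            lines[-1]["words"].append((left, text))
        else:
            lines.append({"y": top, "words": [(left, text)]})

    # Sort each line horizontally, left to right, and join the words.
    return "\n".join(
        " ".join(text for _, text in sorted(line["words"], key=lambda w: w[0]))
        for line in lines
    )
```

A fixed pixel threshold is simple but depends on image resolution and font size; scaling it by the median box height would be one way to make it more robust.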
pip install easyocr pyautogui pygetwindow
EasyOCR (Deep Learning OCR)
PyAutoGUI (Desktop Automation)
PyGetWindow (Window Management)
Python Pathlib (Modern File Handling)
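
As a rough illustration of how these libraries might fit together, the sketch below reads detections with EasyOCR and then drives Notepad with PyAutoGUI and PyGetWindow. The function names, the wait times, the typing interval, and the "Notepad" title match are assumptions; the actual project may open, focus, and save differently. It is Windows-only and takes over the keyboard while running.

```python
import subprocess
import time
from pathlib import Path


def read_detections(image_path, languages=("en",)):
    """Run EasyOCR on an image and return (bbox, text, confidence) tuples."""
    import easyocr  # imported lazily: loading the OCR model is slow

    reader = easyocr.Reader(list(languages))
    return reader.readtext(str(image_path))


def type_and_save(text, save_path: Path, typing_interval=0.03, launch_wait=2.0):
    """Open Notepad, type `text`, and save it to `save_path` (Windows only).

    The sleep durations are guesses; slower machines may need longer waits.
    """
    import pyautogui
    import pygetwindow as gw

    # Launch Notepad and give its window time to appear.
    subprocess.Popen(["notepad.exe"])
    time.sleep(launch_wait)

    # Bring a window whose title contains "Notepad" to the foreground.
    windows = gw.getWindowsWithTitle("Notepad")
    if windows:
        windows[0].activate()

    # Type character by character; the interval imitates human typing.
    # Note: pyautogui.write may skip characters it cannot map to keys.
    pyautogui.write(text, interval=typing_interval)

    # Open the Save dialog, enter the full path, and confirm.
    pyautogui.hotkey("ctrl", "s")
    time.sleep(1.0)
    pyautogui.write(str(save_path))
    pyautogui.press("enter")
```

Passing an absolute `save_path` avoids depending on whichever folder the Save dialog happens to open in.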