University of Colombo School of Computing
Software Development / Network Design & Implementation
Web Publishing / System Design & Implementation / Conformal Testing
  • Optical Character Recognition for Sinhala

This is a beta version of an Optical Character Recognition (OCR) system for printed Sinhala documents. It is based on Tesseract 3.01, trained by the Software
Development Unit of the University of Colombo School of Computing (UCSC).

The OCR process is divided into two stages: input and processing, handled by the Tesseract OCR engine, and post-processing, handled by an engine developed at UCSC.
Please note that uploaded files are retained, together with their output, for future development of the OCR.
At present, the OCR supports only single-column, printed documents supplied as JPEG files. More image file formats and features will follow.
A standalone OCR will be available soon, together with the Document Management System developed by UCSC.
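
The two-stage process described above can be sketched roughly as follows. This is only an illustration, not the UCSC implementation: it assumes the `tesseract` command-line tool is installed and that a Sinhala trained-data file is available under the language code `sin` (the name of the UCSC-trained data is not stated here). The function names are hypothetical, and the post-processing step is a simple placeholder (Unicode normalization and whitespace cleanup), since the actual UCSC post-processing engine is not described.

```python
import subprocess
import tempfile
import unicodedata
from pathlib import Path
from typing import Callable


def run_tesseract(image_path: str, lang: str = "sin") -> str:
    """Stage 1: recognize text with the tesseract CLI.

    Assumption: `tesseract` is on PATH and trained data for `lang` exists.
    """
    with tempfile.TemporaryDirectory() as tmp:
        out_base = Path(tmp) / "page"
        # CLI form: tesseract <image> <outputbase> -l <lang>
        subprocess.run(
            ["tesseract", image_path, str(out_base), "-l", lang],
            check=True,
        )
        # Tesseract writes its result to <outputbase>.txt
        return out_base.with_suffix(".txt").read_text(encoding="utf-8")


def post_process(raw: str) -> str:
    """Stage 2 (placeholder): tidy the raw recognizer output.

    Normalizes to Unicode NFC, collapses runs of whitespace inside each
    line, and drops empty lines. A real engine would do much more.
    """
    text = unicodedata.normalize("NFC", raw)
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


def ocr_pipeline(
    image_path: str,
    recognize: Callable[[str], str] = run_tesseract,
    post: Callable[[str], str] = post_process,
) -> str:
    """Run both stages in order: recognition, then post-processing."""
    return post(recognize(image_path))
```

Calling `ocr_pipeline("scan.jpg")` on a single-column JPEG scan would run both stages. Passing a different `recognize` function makes it possible to exercise the post-processing stage without Tesseract installed.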
© Copyright 2007- Software Development Unit
University of Colombo School of Computing, No 35, Reid Avenue, Colombo 7, Sri Lanka.