
Tesseract code unearthed from the HP crypt
Google has re-released an open source version of optical character recognition (OCR) software originally produced by HP.
The Tesseract program was developed by HP between 1985 and 1995, and in its final year it ranked among the top three OCR packages in a competition organised by the University of Nevada, Las Vegas (UNLV).
Google said in a statement that, although some people might wonder why the search giant was interested in OCR technology, it fitted in with the company's plans to make information available online.
"We are all about making information available to users, and when this information is in a paper document, OCR is the process by which we can convert the pages of this document into text that can then be used for indexing," said Eric Case on the official Google Code blog.
Read more: vnunet.com

