Chinese researchers have developed an ethnic-language optical
character recognition (OCR) system, which can convert images of
text written in ethnic languages, such as scanned paper
documents, into computer-editable text.
The system works on documents written in major Chinese ethnic
languages, including Mongolian, Tibetan, Uygur, Kazak, Korean and
Kirgiz, said Ding Xiaoqing, a professor at Tsinghua University
who headed the team responsible for the project.
It can also be used to recognize and convert materials written
in Arabic, Ding added.
Most OCR technologies in China are applicable only to materials
written in Chinese and English, and cannot be used to process
characters in ethnic languages, said Ding.
"The system is designed with capacities to handle multiple
ethnic languages and it can recognize up to 96.2 percent of the
text content," said Ding.
The technology passed an appraisal by several academicians from
the Chinese Academy of Sciences and the Chinese Academy of
Engineering on Monday.
It will be used to preserve documents written in ethnic
languages and promote the application of information technology
among China's ethnic groups, said Ni Guangnan, an academician from
the Chinese Academy of Engineering who was on the appraisal
team.
More than 40 researchers from Tsinghua University, Inner Mongolia
University and northwest China's Xinjiang University spent eight
years developing and improving the technology.
OCR (optical character recognition) technology involves the
translation of optically scanned images of printed or written text
characters into character codes that can be manipulated by
computers.
(Xinhua News Agency January 30, 2007)