The History of ICR & OCR
OCR & ICR - IMT Magazine
(Issue 2 - 06)
By: Parascript
The overwhelming volume of paper-based data outstrips the ability of
corporations and government entities to manage documents and records.
Computers – working faster and more efficiently than human operators – now
perform many of the tasks required for efficient document and content
management. Computers best manage two distinct types of documents:
electronic documents or data files created originally on a computer and
paper documents scanned and recognized as images.
Computers represent alphanumeric characters as ASCII codes, where each
character typed on a keyboard maps to a recognizable numeric code.
However, computers cannot discern characters and words contained in scanned
images of paper documents. Therefore, where alphanumeric information must be
retrieved from images such as commercial or government applications, credit
card applications, tax returns or passport applications, characters (or
objects) must first be converted to their ASCII equivalents before they can
be recognized as readable text. Translating characters and words contained
in images is achieved by three key technologies: 1) OCR (Optical Character
Recognition), where characters are converted from machine print to ASCII
text, 2) ICR (Intelligent Character Recognition), where human handprint is
converted to ASCII text, and 3) Intelligent Recognition, the new generation
of technology based on neural networks used to convert handwritten, hand
printed or machine-printed data to ASCII text.
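The idea that each character corresponds to a numeric ASCII code can be illustrated with a few lines of Python. This is only a small sketch: the sample string is made up for illustration, and `ord` and `chr` are standard Python built-ins, not part of any recognition product.

```python
# Illustrative sample text (an assumption, not taken from the article).
text = "OCR 2006"

# ord() gives the numeric code of each character; for these
# characters the value is the ASCII code.
codes = [ord(ch) for ch in text]

# chr() maps each code back to its character, restoring the text.
restored = "".join(chr(code) for code in codes)

print(codes)
print(restored)
```

A scanned image contains only pixels, so none of these codes exist until a recognition engine produces them.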
OCR & ICR Technology
OCR and ICR technologies are analytical artificial intelligence systems that
consider only sequences of characters rather than whole words or phrases and
do not cross-validate data during the recognition process. Based on the
analyses of sequential lines and curves, OCR and ICR make "best guesses" at
characters using database look-up tables to closely associate or match the
strings of characters that form words. For these systems to effectively
recognize hand printed or machine printed forms, words must be separated
into individual characters. That is why most typical administrative forms
require people to either hand print into neatly spaced boxes or use combs
(tick marks) at the bottom of input lines to force spaces between letters
entered on a form. When people do not follow this structure while filling
out forms, conventional technologies reject the affected fields, resulting
in significant administrative overhead and costs to forms processing
organizations.
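The look-up step described above, where a string of individually recognized characters is matched against known words, might be sketched roughly as follows. This is not how any particular OCR or ICR product works. The lexicon, the `best_guess` helper and the 0.6 cutoff are assumptions made for illustration, and Python's standard `difflib` module stands in for a real look-up table.

```python
import difflib

# Hypothetical lexicon standing in for a database look-up table.
LEXICON = ["MONTANA", "MONTEREY", "MANHATTAN"]

def best_guess(recognized, lexicon=LEXICON, cutoff=0.6):
    """Return the closest lexicon word, or None to reject the field.

    `recognized` is the string produced one character at a time,
    so it may contain misread characters (e.g. '0' for 'O').
    """
    matches = difflib.get_close_matches(recognized, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# A zero misread in place of the letter O is still matched.
print(best_guess("M0NTANA"))

# Nothing in the lexicon is close enough, so the field is rejected.
print(best_guess("XXXXXXX"))
```

Because the match works on a character string, it depends on the characters having been separated cleanly in the first place, which is what the boxes and combs are for.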
In summary:
OCR technology recognizes only
machine print and rejects inputs that contain non-machine print characters.
Though most advanced systems are able to recognize multiple fonts (some
systems even claim to read any font), they deal only with standard fonts
found in mainstream applications, such as Times Roman and Arial. Ultimately,
human handwriting is too diverse and unstructured to be recognized by
OCR systems.
ICR technology recognizes machine print
and handprint. However, it rejects any letter shapes formed as cursive
script.
Intelligent Recognition technology
Unlike ICR and OCR systems, Intelligent Recognition technology combines
engines that can read words character by character with
engines that read whole words or phrases. Therefore, Intelligent Recognition
can recognize cursive handwriting, hand print and machine print --
individually or in any combination. Another important distinction of
Intelligent Recognition is that it does not require the use of constraining
boxes or combs. Accuracy and recognition speed are further improved by
cross-validation of data during the recognition process.
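One way to picture a character-level engine and a whole-word engine working together is to let each propose candidates with a confidence score and then merge the scores. The sketch below is a simplified illustration under that assumption. The engine outputs, the scores and the `combine_candidates` helper are all hypothetical, and the article does not describe the actual combination or cross-validation method.

```python
from collections import defaultdict

def combine_candidates(char_engine, word_engine):
    """Merge two {candidate: confidence} dicts by summing confidences.

    Returns the candidate with the highest combined confidence.
    """
    combined = defaultdict(float)
    for candidates in (char_engine, word_engine):
        for text, confidence in candidates.items():
            combined[text] += confidence
    return max(combined, key=combined.get)

# Hypothetical outputs for one handwritten field.
char_engine = {"Denver": 0.40, "Denier": 0.45}  # reads letter by letter
word_engine = {"Denver": 0.70, "Dancer": 0.20}  # reads the word as a whole

best = combine_candidates(char_engine, word_engine)
print(best)
```

In this made-up case the character-level engine on its own would prefer "Denier", while the combined evidence favors "Denver".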