FEATURED ARTICLES
IMT Magazine - OCR - Optical Character Recognition (Issue 2 - 06)
By Parascript - www.parascript.com
The overwhelming volume of paper-based data
outstrips the ability of corporations and government entities to manage
documents and records. Computers – working faster and more efficiently than
human operators – now perform many of the tasks required for efficient
document and content management. Computers best manage two distinct types of
documents: electronic documents or data files created originally on a
computer, and paper documents scanned and recognized as images.
By Parascript - www.parascript.com
The basic principle of Intelligent Recognition
states that handwriting, when reduced to its most basic components, is
essentially motion, or a series of movements, made by a writing instrument.
According to this theory, any handwriting can be described using elements of
a special description language. The eight elements that make up the
trajectories of all cursive letters (Figure 1 below) form a ring that
illustrates the possible transitions between neighboring elements.
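
The ring of elements described above can be pictured as a small data structure. The sketch below is illustrative only and is not Parascript's description language: the excerpt does not name the eight elements, so they are stood in for by the integers 0 to 7, and the rule that a transition is possible only between elements adjacent on the ring is an assumed reading of the sentence above. The function names are hypothetical.

```python
# Placeholder model: the source says there are eight elements but does not
# name them, so integers 0..7 stand in for them here.
ELEMENT_COUNT = 8


def ring_neighbors(element):
    """Return the two elements adjacent to `element` on the ring."""
    previous_element = (element - 1) % ELEMENT_COUNT
    next_element = (element + 1) % ELEMENT_COUNT
    return (previous_element, next_element)


def is_allowed_transition(current, following):
    """Assumed rule: a transition is possible only between ring neighbors."""
    return following in ring_neighbors(current)


def is_valid_trajectory(elements):
    """Check that every consecutive pair in a sequence is an allowed transition."""
    return all(
        is_allowed_transition(current, following)
        for current, following in zip(elements, elements[1:])
    )
```

Under these assumptions, a sequence such as 6, 7, 0, 1 wraps around the ring and is accepted, while a jump from 0 to 2 is rejected.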
Optical character recognition is an uphill battle for open source
By: Nathan Willis
If you use
Linux, or another free operating system, and need optical character
recognition (OCR) software, be prepared for a challenge. OCR is a tricky
problem on any computing platform -- both because it is conceptually hard,
and because the task does not lend itself to simple, easy-to-use interfaces.
OCR is the use of visual pattern matching to extract text from an image --
usually a scanned paper document, but it could be a digital photo, a frame
of video, or a screenshot just as easily.
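
To make "visual pattern matching" concrete, here is a minimal sketch of the simplest form of the idea: comparing a small bitmap against stored glyph templates and picking the closest one. It is a toy under stated assumptions, not how any particular OCR package works. The 3x3 templates, the three-letter alphabet, and the function names are all hypothetical, and the input is assumed to be already segmented into one bitmap per character.

```python
# Hypothetical 3x3 glyph templates ("1" = ink, "0" = background).
# Real OCR engines use much larger images and far richer models.
TEMPLATES = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}


def pixel_distance(bitmap_a, bitmap_b):
    """Count the pixels at which two same-sized bitmaps differ."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(bitmap_a, bitmap_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def recognize_glyph(bitmap):
    """Return the label of the template with the fewest mismatched pixels."""
    return min(TEMPLATES, key=lambda label: pixel_distance(bitmap, TEMPLATES[label]))


def recognize_line(glyph_bitmaps):
    """Recognize a sequence of pre-segmented glyph bitmaps as a string."""
    return "".join(recognize_glyph(bitmap) for bitmap in glyph_bitmaps)
```

Because the match is by nearest template rather than exact equality, a slightly damaged glyph such as `("100", "100", "110")` is still read as "L" in this sketch. Segmenting a page into individual glyphs, which this example assumes has already happened, is a large part of what makes real OCR hard.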
By: Sami Lais (Computerworld)
Suppose you wanted to digitize the novel Moby Dick overnight. You
could stay up all night typing and still not finish. Or you could use a
high-end scanner and in minutes scan all of author Herman Melville's works
into a computer using optical character recognition (OCR) technology.
This is the
technology long used by libraries and government agencies to make lengthy
documents quickly available electronically. Advances in OCR technology have
spurred its increasing use by enterprises. For many document-input tasks,
OCR is the most cost-effective and speedy method available. And each year,
the technology frees acres of storage space once given over to file cabinets
and boxes full of paper documents.
OCR Technology
Optical Character Recognition (OCR) – used extensively
throughout business and government – examines scanned bitmap images of
machine-printed text and translates the characters into ASCII text files
that can be edited. For instance, paper checks contain number series written
in machine print designed to minimize recognition errors. These codes
contain bank routing numbers, the holder’s account numbers and other
information required to process paper transactions. Machine print conversion
is largely a solved problem in this application, as OCR was included in the
first commercial systems that automated machine print text recognition.
OCR (Optical Character
Recognition) is the process of turning a picture of words (such as a scan of
a typed letter) into an editable document that you can open and use in your
desktop publishing software, word processor, or other text editor.
While the technology has been around for
years, it has also been a hit-or-miss process. Some software does the job
better than others. Some of the newest packages offer better support for
less-than-perfect originals and documents with elaborate formatting
including columns, tables, numerous font changes, and graphics. See the
sidebar links for a round-up of some of the top OCR solutions out there
right now for your Windows and Macintosh desktop systems.
What Is OCR?
Optical Character
Recognition (OCR) is a process of converting printed materials into text or
word processing files that can be easily edited and stored. The technology
lets such materials be kept in far less space than the hard copies
occupy. OCR technology has made a huge impact on the way
information is stored, shared and edited. Prior to Optical Character
Recognition, if someone wanted to turn a book into a word processing file,
each page would have to be typed word for word.