TURNING IMAGES INTO EDITABLE TEXT
OCR & ICR
IMT Magazine (Issue 2 - 06)
By: IMT Staff
OCR (Optical Character
Recognition) is the process of turning a picture of words (such as a scan of
a typed letter) into an editable document that you can open and use in your
desktop publishing software, word processor, or other text editor.
While the technology has been around for years,
it has also been a hit-or-miss process. Some software does the job better
than others. Some of the newest packages offer better support for
less-than-perfect originals and documents with elaborate formatting,
including columns, tables, numerous font changes, and graphics. See the
sidebar links for a round-up of some of the top OCR solutions out there
right now for your Windows and Macintosh desktop systems.
Tips for better OCR results
Scanners often come with a limited edition or "stripped down" version of
OCR software. Other types of programs also have OCR modules included. The
CorelDRAW suite has a utility called OCR-Trace that has always yielded a
fairly acceptable level of OCR accuracy for me. If your OCR needs are
modest, these solutions may be adequate.
Whatever type of program you use (and no matter
what accuracy rate the program claims), there are things you can do to ensure
the best possible results from your OCR software:
-
Start with a good original.
Is the paper wrinkled? Try ironing it (with a warm, not hot, iron) or
pressing it between heavy books. Erase smudges.
-
Make the scan the best you can.
Make sure the scanner bed/glass is clean and smudge-free. Keep the document
straight and even so you don't end up with a "skewed" image. Adjust the
color/contrast/brightness so the background is light/white and free of
"artifacts" (such as a pattern in the paper) and the text is dark. Scan
at 300 dpi or better.
-
Turn one document into many.
With older or stripped-down software, graphics, lines on forms, columns
of text, and other formatting will cause problems. Try breaking the
scanned original down into smaller chunks (crop out non-text elements or
save columns of text as individual images) and run your OCR software on
each part separately. You'll lose formatting but gain a more accurate
text document. Newer OCR software, however, keeps getting better at
retaining the formatting of forms and tables, so it may be worth
replacing an older package with a newer one.
-
Try different settings.
Experiment with different options in your software. If your first
attempt is less than usable, adjust the controls.
-
Proofread.
No matter how accurate the program, all are fallible. Proofread,
proofread, proofread the finished document.
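
The "turn one document into many" tip can also be automated. The sketch below is only an illustration, not how any particular OCR package works: it treats a scan as a plain grid of grayscale values (0 is black, 255 is white), marks dark pixels as ink, and then looks for vertical strips with no ink at all, which it takes to be the gutters between columns. The function names, the threshold of 128, and the `min_gap` parameter are all assumptions made for this example. A real scan would first be loaded with an imaging library, and a skewed or noisy page would need cleanup before this approach has a chance.

```python
from typing import List, Tuple

Grid = List[List[int]]  # rows of pixel values


def binarize(image: Grid, threshold: int = 128) -> Grid:
    """Mark each pixel as ink (1) if darker than `threshold`, else background (0)."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def find_columns(ink: Grid, min_gap: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) x-ranges of text columns, with `end` exclusive.

    A run of at least `min_gap` ink-free pixel columns counts as a gutter.
    Narrower gaps (such as the space between letters) stay inside a column.
    """
    if not ink:
        return []
    width = len(ink[0])
    # True for every x position where at least one row contains ink.
    has_ink = [any(row[x] for row in ink) for x in range(width)]

    spans: List[Tuple[int, int]] = []
    start = None  # x where the current text column began, if any
    gap = 0       # length of the current run of empty pixel columns
    for x, filled in enumerate(has_ink):
        if filled:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The column ended just before this run of empty pixels.
                spans.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # The last column runs to the edge, minus any trailing empty pixels.
        spans.append((start, width - gap))
    return spans


def crop_columns(image: Grid, spans: List[Tuple[int, int]]) -> List[Grid]:
    """Cut the original image into one smaller image per detected column."""
    return [[row[a:b] for row in image] for a, b in spans]
```

Each grid returned by `crop_columns` could then be saved as its own image and run through the OCR software separately, as the tip suggests. Raising `min_gap` makes the split less eager, which helps when wide letter spacing would otherwise be mistaken for a gutter.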