WHAT IS OCR (OPTICAL CHARACTER RECOGNITION)

OCR & ICR - IMT Magazine (Issue 2 - 06)

By:  IMT Staff

 

What Is OCR?

Optical Character Recognition (OCR) is the process of converting printed material into text or word-processing files that can be easily edited, searched, and stored. The resulting files take up far less space than the hard copies they replace. OCR has changed the way information is stored, shared, and edited: before it existed, turning a book into a word-processing file meant retyping every page, word for word.

 

OCR requires both hardware and software. Some sophisticated OCR systems also use a dedicated circuit board installed in the computer to complete the process. An optical scanner captures the page as a grid of dots called a bitmap. The software then analyzes the bitmap, works out where lines and characters start and stop, and translates the shapes it finds into computer text. Most OCR software can read the common fonts.
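
The bitmap-to-text step can be illustrated with a deliberately tiny sketch. It assumes glyphs that are exactly 5 rows by 3 columns, separated by one blank column, and it picks the stored template that differs from the scanned glyph in the fewest dots. The templates and the function names (`hamming`, `recognize`, `split_glyphs`, `read_line`) are invented for this illustration; real OCR engines use far more elaborate methods.

```python
# Toy templates: each character is a 5x3 bitmap, '#' = dark dot, '.' = white.
TEMPLATES = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}


def hamming(a, b):
    """Count the dots that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(glyph):
    """Return the template character whose bitmap is closest to the glyph."""
    return min(TEMPLATES, key=lambda ch: hamming(glyph, TEMPLATES[ch]))


def split_glyphs(line_bitmap, width=3):
    """Cut a one-line bitmap into glyphs separated by one blank column."""
    step = width + 1
    count = (len(line_bitmap[0]) + 1) // step
    return [
        [row[i * step:i * step + width] for row in line_bitmap]
        for i in range(count)
    ]


def read_line(line_bitmap):
    """Translate a one-line bitmap into text, glyph by glyph."""
    return "".join(recognize(g) for g in split_glyphs(line_bitmap))


# A "scanned" line containing the characters 7, 1 and 0.
scanned = [
    "###..#..###",
    "..#.##..#.#",
    ".#...#..#.#",
    ".#...#..#.#",
    ".#..###.###",
]
print(read_line(scanned))
```

Because the match is "closest template" rather than "exact template", a glyph with one or two stray or missing dots is still recognized, which hints at why clean originals matter: the more dots are wrong, the more likely a different template wins.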

 

Although Optical Character Recognition has advanced greatly in recent years, it still performs poorly on handwriting and on fonts that resemble handwriting. Some systems in the banking industry nevertheless use OCR to try to read the amounts on handwritten checks, complementing the computer's existing ability to read the routing and account numbers printed on them.
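
One reason machine-read routing numbers are dependable is that a misread can often be caught after recognition. The sketch below is not taken from any banking system described here; it applies the standard checksum for nine-digit U.S. routing numbers (weights 3, 7, 1 repeated, total divisible by 10) to a recognized string. The function name `is_valid_routing_number` is an assumption made for this example.

```python
def is_valid_routing_number(digits: str) -> bool:
    """Check a recognized 9-digit routing number against its checksum."""
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        return False  # wrong length, or a character that is not 0-9
    weights = (3, 7, 1) * 3
    total = sum(w * int(d) for w, d in zip(weights, digits))
    return total % 10 == 0


print(is_valid_routing_number("021000021"))  # sample value that satisfies the checksum
print(is_valid_routing_number("021000027"))  # last digit misread
print(is_valid_routing_number("O21000021"))  # letter O read instead of zero
```

A handwritten amount has no such built-in check, which is part of why it is the harder problem.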

 

To give an idea of the power of OCR, consider a real-world example. Imagine a police department that has all its criminal records stored in vast file cabinets. Although scanning millions of pages would be an expensive and time-consuming undertaking, the benefits are huge. Once the OCR system has converted the pages into computer-readable text, a detective could search the entire archive in a few seconds. Finding one particular record by hand might not be too difficult, but imagine a detective trying to find every crime committed at a certain intersection between 8:00 and 8:30. This example only scratches the surface of what searchable text makes possible, and it is one reason that many companies and institutions are spending millions of dollars to OCR their legacy data.
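
The detective's query is easy to express once the records exist as text. The following is a minimal sketch under stated assumptions: the `Record` fields, the `search` function, and the sample data are all invented for illustration, and the time window is assumed to mean 8:00 to 8:30 in the morning.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Record:
    case_id: str        # hypothetical identifier
    occurred: datetime  # when the incident took place
    text: str           # text produced by OCR for this record


def search(records, phrase, start, end):
    """Return records mentioning `phrase` whose time of day is in [start, end]."""
    needle = phrase.lower()
    return [
        r for r in records
        if needle in r.text.lower() and start <= r.occurred.time() <= end
    ]


# Invented sample data, for illustration only.
records = [
    Record("A-101", datetime(2004, 3, 2, 8, 10), "Theft reported at Main St and 5th Ave."),
    Record("A-102", datetime(2004, 3, 9, 14, 45), "Collision at Main St and 5th Ave."),
    Record("A-103", datetime(2004, 4, 1, 8, 25), "Vandalism near Oak St and 2nd Ave."),
    Record("A-104", datetime(2005, 1, 17, 8, 30), "Assault at MAIN ST AND 5TH AVE."),
]

hits = search(records, "main st and 5th ave", time(8, 0), time(8, 30))
print([r.case_id for r in hits])
```

A plain substring match like this only finds text that was recognized correctly, so OCR errors in a street name would hide a record from the search.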

 

Ideal Source Material for OCR     

OCR works best with originals or very clear copies, and with mono-spaced fonts like Courier. If you have a choice, use source material with the following characteristics:

  • 12 point or greater font size.

  • Black text on a white background.

  • A clean copy; not a fuzzy multi-generation copy from a copy machine.

  • Standard type font (Times New Roman, etc.). Fancy fonts may not be recognized.

  • Single column layout.
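
The checklist above can be turned into a simple pre-scan check. This is only a sketch: the `SourcePage` fields and the `ocr_warnings` function are hypothetical, and treating a second-generation or later copy as "multi-generation" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SourcePage:
    font_size_pt: float
    black_on_white: bool
    copy_generation: int  # 0 = original, 1 = first copy, 2+ = copy of a copy
    standard_font: bool
    columns: int


def ocr_warnings(page):
    """List the ways a page departs from the ideal source material."""
    warnings = []
    if page.font_size_pt < 12:
        warnings.append("font smaller than 12 points")
    if not page.black_on_white:
        warnings.append("not black text on a white background")
    if page.copy_generation > 1:
        warnings.append("multi-generation copy")
    if not page.standard_font:
        warnings.append("non-standard font")
    if page.columns != 1:
        warnings.append("not a single-column layout")
    return warnings


ideal = SourcePage(12, True, 0, True, 1)
newsletter = SourcePage(9, True, 2, True, 3)
print(ocr_warnings(ideal))
print(ocr_warnings(newsletter))
```

An empty list means the page meets every item on the checklist; each warning marks a likely source of extra recognition errors.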

 

OCR Limitations     

  • Text from a source with a font size smaller than 12 points, or from a fuzzy copy, will produce more errors.

  • Except for tab stops and paragraph marks, most document formatting is lost during text scanning, although bold, italic, and underline are sometimes recognized.

  • The output of a finished text scan is a single-column editable text file. This file will always require spellchecking and proofreading, as well as reformatting to the desired final layout.

  • Scanning plain text files or printouts from a spreadsheet usually works, but the text must be imported into a spreadsheet and reformatted to match the original.
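
As an illustration of the last point, the sketch below rebuilds rows and columns from the plain text of a scanned spreadsheet printout so that it can be imported as CSV. It assumes that columns are separated by a tab or by two or more spaces, which will not hold for every scan; the function names and sample text are invented for this example.

```python
import csv
import io
import re

# A column boundary: a run of two or more spaces/tabs, or a single tab.
COLUMN_GAP = re.compile(r"[ \t]{2,}|\t")


def ocr_text_to_rows(text):
    """Split each non-empty line of OCR output into spreadsheet cells."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(COLUMN_GAP.split(line))
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text ready for import into a spreadsheet."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Invented OCR output from a printout; note the mix of spaces and tabs.
ocr_output = (
    "Item         Qty   Price\n"
    "\n"
    "Paper clips  100   2.50\n"
    "Stapler\t3\t12.00\n"
)

rows = ocr_text_to_rows(ocr_output)
print(rows_to_csv(rows))
```

Single spaces inside a cell, as in "Paper clips", are preserved. The rows still need proofreading, and any fonts, borders, and formulas from the original must be restored by hand.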

 

What Source Material Doesn't Work Well for OCR?     

  • Forms (especially with boxes and check boxes)

  • Very small text

  • Multi-generation fuzzy or blurry copies from a copy machine

  • Mathematical formulas

  • Draft copies of documents with hand-written revisions

  • Fancy text and unusual fonts

  • Handwritten text

      (C) Copyright 2005.  Horizon Dynamics, LLC.  All rights reserved.
