THE TRUTH ABOUT IMAGING
Digital Imaging - IMT Magazine (Issue 1 - 06)
By: Kurt W. Stevenson (Executive Director - IMT Media Group)
CAN IT WATCH NETWORK FOLDERS?
The backend OCR system should let you specify which network folders to watch for processing. This way you can program your MFP and high capacity devices to scan to a single network folder. If the OCR system is watching that folder, it will automatically process any digital document dropped into it. This is another automation feature your OCR system should definitely have; treat it as a requirement for purchase.
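A folder watcher can be as simple as a polling loop. The sketch below is one pass of such a loop, not any vendor's implementation: `poll_watch_folder` and the `process` callback (standing in for the hand-off to the OCR engine) are hypothetical names.

```python
import os
from typing import Callable, Set


def poll_watch_folder(folder: str, seen: Set[str],
                      process: Callable[[str], None]) -> int:
    """One polling pass over a watched network folder.

    Hands every file not seen on an earlier pass to `process` and records
    it in `seen` (the caller keeps `seen` between passes). Returns the
    number of files handed off on this pass.
    """
    with os.scandir(folder) as it:
        # Sort by name so files are handled in a stable, predictable order.
        entries = sorted(it, key=lambda e: e.name)
    handed_off = 0
    for entry in entries:
        if not entry.is_file() or entry.path in seen:
            continue
        process(entry.path)  # hypothetical hand-off to the OCR engine
        seen.add(entry.path)
        handed_off += 1
    return handed_off
```

In practice you would call this in a loop with `time.sleep()` between passes. A real system should also wait until the MFP has finished writing a file (for example, until its size stops changing) before processing it.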
HOW MANY DOCUMENTS WILL IT PROCESS AT ONE TIME?
Make sure you know how many documents can be processed at one time. What are the limitations of the software? How many instances of the software will load on each server? The answers will give you an idea of how large and customizable your OCR network can be, so make sure there is room for future growth in processing. For instance, the system may require one server to act as a scheduler. This server only handles the requests: it watches the predefined network folder and directs each document to a processing server. If each processing server can run up to four instances of OCR conversion at once, then four processing servers give you 16 concurrent instances of your OCR platform. That is sufficient for most organizations.
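The capacity arithmetic above (processing servers times instances per server, with the scheduler running no OCR itself) can be sketched as follows. The round-robin hand-out is an assumption for illustration; the server names are hypothetical.

```python
from itertools import cycle
from typing import Dict, List


def total_instances(processing_servers: int, instances_per_server: int = 4) -> int:
    """Concurrent OCR jobs the farm can run (the scheduler runs none)."""
    return processing_servers * instances_per_server


def schedule(documents: List[str], servers: List[str],
             instances_per_server: int = 4) -> Dict[str, List[str]]:
    """Hand out one wave of documents round-robin across processing servers.

    Only as many documents as the farm has instances are assigned; the
    rest are left for the next wave (they are not returned here).
    """
    capacity = len(servers) * instances_per_server
    assignment: Dict[str, List[str]] = {s: [] for s in servers}
    for doc, server in zip(documents[:capacity], cycle(servers)):
        assignment[server].append(doc)
    return assignment
```

With four processing servers, `total_instances(4)` gives 16, and a queue of 20 documents fills all 16 slots (four per server) while the remaining four wait.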
CAN IT CONTROL SERVER LOADS?
This is also a very important feature! For example, if you have a batch of small documents (10 pages or fewer) to convert, the OCR system should recognize this and pick the best server in the farm to perform the processing. If you also have five large documents (100 pages or more), the OCR system should distribute them to the servers in the farm that have the most CPU power available. This balances the load across your server farm and keeps processing times short.
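One common way to get this behavior is a greedy "largest job first" heuristic, which is a swapped-in technique and not necessarily what any OCR product does. In the minimal sketch below, each server's `cpu_weight` is a hypothetical relative measure of available CPU power. Large documents are placed first on the servers that can finish them soonest, and small documents then fill in the gaps.

```python
from typing import Dict, List, Tuple


def balance(documents: List[Tuple[str, int]],
            cpu_weight: Dict[str, float]) -> Dict[str, List[str]]:
    """Greedy size-aware assignment (largest documents first).

    documents: (name, page_count) pairs.
    cpu_weight: relative processing power per server (hypothetical metric).
    Each document goes to the server whose estimated finish time,
    pages_assigned / cpu_weight, would be lowest after taking it.
    """
    load = {server: 0 for server in cpu_weight}
    plan: Dict[str, List[str]] = {server: [] for server in cpu_weight}
    for name, pages in sorted(documents, key=lambda d: d[1], reverse=True):
        best = min(cpu_weight, key=lambda s: (load[s] + pages) / cpu_weight[s])
        load[best] += pages
        plan[best].append(name)
    return plan
```

For example, with a server three times as powerful as another, two 100-page documents land on the powerful server and two 5-page documents go to the smaller one.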
DOES IT SUPPORT SERVER FARMS?
If you try to perform mass processing on one server, it will almost certainly fail! Most likely the server will peak at 100% CPU usage and stay there until it overheats or crashes. The OCR system has to be able to use multiple servers. The ideal scenario is one server for scheduling and request handling plus up to four additional servers dedicated to processing.
WHAT FILE FORMATS ARE ACCEPTED?
Make sure you define in advance which file formats your OCR system will accept. The most common file type for fast processing is TIFF. At my firm, I convert from TIFF to OCR'd, text-searchable PDF. Our OCR system can, however, convert to and from many other file formats.
ARE COVERSHEET SPLITS PROCESSED?
Most third-party ERMS, EDMS and backend systems use coversheets for indexing and processing. Make sure your OCR system will recognize these pages, remove the coversheet from the document, and retain the indexing metadata. This lets you scan multiple documents in one pass on the scanner. For instance, if you have ten 20-page documents, you would have 10 coversheets. If you scan them all at once through the scanner's feeder, they will come out as one huge document. With coversheet recognition and splitting, the system will automatically split it into 10 separate documents, each with its respective indexing metadata attached.
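The splitting step itself is straightforward once coversheets have been detected. This sketch assumes detection (barcode or form recognition) has already happened and that each coversheet page carries its index fields. `Page`, `Document`, and `split_on_coversheets` are hypothetical names, not a vendor API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Page:
    image: str                                        # e.g. a page file name
    coversheet_meta: Optional[Dict[str, str]] = None  # set only on coversheets


@dataclass
class Document:
    metadata: Dict[str, str]
    pages: List[str] = field(default_factory=list)


def split_on_coversheets(batch: List[Page]) -> List[Document]:
    """Split one scanned batch into documents at each coversheet.

    The coversheet itself is dropped; its metadata travels with the pages
    that follow it. Pages before the first coversheet are rejected.
    """
    docs: List[Document] = []
    for page in batch:
        if page.coversheet_meta is not None:
            docs.append(Document(metadata=dict(page.coversheet_meta)))
        elif not docs:
            raise ValueError(f"page {page.image!r} precedes any coversheet")
        else:
            docs[-1].pages.append(page.image)
    return docs
```

Feeding it the article's example (ten coversheets, each followed by 20 pages) yields 10 documents of 20 pages each, with no coversheet pages left inside them.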
Capture Technology For Your High Capacity Scanners
Most high capacity scanners also require third-party capture software. These packages have an enormous number of features. Unfortunately, most of those features are specific to certain parts of the business and only apply when you are performing those tasks. However, there are some basic features that you want every package to have.
NETWORK SCANNING
Just as with your MFP devices, you want the option to scan to network drives. This lets you scan mass document sets directly into your backend ERMS, EDMS and accounting applications.
LOCAL SCANNING
Although local scanning is not usually preferred, it is sometimes necessary. Local scanning is a great backup if your backend imaging and OCR network fails. Being able to scan mass documents locally and still declare them as records is a huge plus.
COVERSHEET RECOGNITION / SEPARATOR SHEET RECOGNITION
As stated before, coversheets contain metadata, which is transferred to a data file along with the document. The capture software must read and recognize this data for the digital document to be indexed properly into your backend systems.
AUTO INSERT / MOVE / DELETE
Another crucial feature for your capture program is the ability to insert missed documents, move incorrectly placed images, and delete bad image captures. These features let you re-scan bad images and place them into the main document before committing it to your backend systems.
SIMPLE USER INTERFACE
One of the biggest mistakes capture vendors make is forgetting that we are not all software programmers. Make sure the capture software you choose not only has all the features you want but is also easy to use.
CAN YOU BATCH SCAN?
With feeder capacity normally at around 500-1000 pages, batch scanning shouldn't be an issue. Make sure your capture software can accept batch scans and split the documents appropriately after scanning.
SCANNING TO MULTIPLE FILE TYPES
Make sure your capture software can scan to multiple file types, preferably TIFF or PDF, as either multi-page or single-page files.
INTEGRATED OCR & QUALITY CONTROL
Although we have pretty much conquered OCR on the backend, it is sometimes necessary to OCR documents on the fly! Make sure this feature is, or can be, integrated into your product. Quality control is also a huge issue! Make sure your capture product includes a very good quality control module. The ability to see thumbnail views of all of your documents to check scan quality is very important. After all, one bad page in a 50-page document makes the digital document useless.
WHAT IS YOUR NORMAL SCAN RESOLUTION?
Most providers and consultants will suggest 200dpi resolution with a level 4 compression factor. This is a great setting for minimizing file sizes on the backend. If you are more worried about image quality than file size, you might want to increase to 300dpi resolution.
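To see why resolution drives file size, the sketch below estimates the uncompressed size of a black-and-white (1 bit per pixel) scan. It assumes a letter-size page of 8.5 x 11 inches and does not model compression, including any vendor-specific "level 4" setting. Because the pixel count grows with the square of the resolution, a 300dpi page carries (300/200)^2 = 2.25 times the raw data of a 200dpi page.

```python
def raw_bitonal_bytes(dpi: int, width_in: float = 8.5, height_in: float = 11.0) -> int:
    """Approximate uncompressed size in bytes of a 1-bit-per-pixel page scan.

    Assumes a letter-size page by default; compression and row padding
    are not modeled.
    """
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels // 8  # 8 bitonal pixels per byte


for dpi in (200, 300):
    print(dpi, raw_bitonal_bytes(dpi))
# 200 467500
# 300 1051875
```

Compression shrinks both figures considerably, but the 2.25x ratio between the two resolutions is a reasonable first guess at the storage impact of moving to 300dpi.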
WHAT WILL YOUR IMAGED DOCUMENT LOOK LIKE IF REPRINTED?
Remember that many organizations now reprint their digital documents for review and/or for drafting new documents. If these printed documents go to clients, they must look as if they were created in Word. Images scanned at 200dpi and squeezed with heavy compression will not look like a word processor printout.
Want A Paperless Environment?
If your end goal is to be a primarily paperless office, you have to define criteria that ensure your digital documents are 100% as accurate as the paper originals in your hand. Some other factors to consider are:
YOU CAN’T DESTROY PAPER WITHOUT PROOF OF 100% QUALITY CONTROL CHECKS
At my firm we stamp all QC'd documents with a stamp of approval. The stamp reads "100% Quality control checked by (the document processor's name)." The coversheet is printed on blue paper and kept as the last page of the hardcopy document until we are ready to destroy it.
HOW LONG WILL YOU RETAIN YOUR HARDCOPY DOCUMENTS AFTER IMAGING?
With the system described above, purging your hardcopy files becomes much easier. A good policy is to retain the paper document for at least 30 days. Use a staging area for documents that have already been imaged and are awaiting destruction, and keep them in chronological order so they are easy to find if needed.
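The staging rule above (QC stamp present, at least 30 days held, oldest first) can be expressed as a simple check. This is a minimal sketch under those assumptions. The `(file_id, imaged_on, qc_stamped)` record layout and the function name are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple


def ready_to_destroy(staged: List[Tuple[str, date, bool]], today: date,
                     hold_days: int = 30) -> List[str]:
    """Return staged paper files that may be destroyed, oldest first.

    staged: (file_id, imaged_on, qc_stamped) tuples for the staging area.
    A file qualifies only if it passed QC and has been held `hold_days`.
    """
    cutoff = today - timedelta(days=hold_days)
    eligible = [(imaged_on, file_id) for file_id, imaged_on, qc in staged
                if qc and imaged_on <= cutoff]
    return [file_id for _, file_id in sorted(eligible)]
```

Sorting by imaging date mirrors the chronological staging area, so the list matches the physical order in which files come off the shelf.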
HOW LONG WILL YOU RETAIN YOUR IMAGED DOCUMENTS?
Because hard drive space is so cheap these days, many organizations choose to retain their digital documents forever. I, however, do not agree with this approach. Digital documents should be subject to the same retention policies as your paper. When their lifecycle is up, they should be deleted from your system.
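Applying the paper retention schedule to digital records amounts to an expiry check per record series. In this sketch, the series names and retention periods in the usage are hypothetical placeholders for your organization's own schedule. Periods are counted in years from the date a record was closed.

```python
from datetime import date
from typing import Dict


def add_years(d: date, years: int) -> date:
    """Same month/day `years` later; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def is_expired(series: str, closed_on: date, today: date,
               schedule: Dict[str, int]) -> bool:
    """True once a record has outlived its series' retention period."""
    return today >= add_years(closed_on, schedule[series])
```

For example, with a hypothetical schedule of `{"client-file": 7, "invoice": 3}`, an invoice closed on 1 June 2003 becomes eligible for deletion on 1 June 2006, while a client file closed the same day is kept until 2010.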
DISASTER RECOVERY
Having a digital records system is one thing; making that system reliable is another. Remember that your digital ERMS relies on your organization's network. With this in mind, you need to replicate all of your images to an offsite location. Real-time replication is the most effective approach: any time you add a document to your ERMS or EDMS, it is automatically copied to your offsite location. In case of a disaster, all of your digital records are available immediately with no interruption to the work process.
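Commercial replication usually works at the storage or database level. The file-level sketch below only illustrates the per-document "copy, then verify" idea. It assumes the offsite location is reachable as a mounted path and that something in your ERMS/EDMS calls `replicate` (a hypothetical hook) whenever a document is committed.

```python
import hashlib
import os
import shutil


def sha256_of(path: str) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def replicate(src_root: str, rel_path: str, offsite_root: str) -> str:
    """Copy one newly added document to the offsite mirror and verify it.

    Returns the checksum; raises if the copy does not match the source.
    """
    src = os.path.join(src_root, rel_path)
    dst = os.path.join(offsite_root, rel_path)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy2(src, dst)  # copies contents and timestamps
    digest = sha256_of(src)
    if sha256_of(dst) != digest:
        raise IOError(f"replica of {rel_path} failed verification")
    return digest
```

Verifying a checksum after each copy is what lets you trust the offsite set during a disaster, rather than discovering a bad copy at that point.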
What ERMS & EDMS Vendors Forget
GOOD SEARCH TOOLS
In most cases ERMS and EDMS search tools are inadequate and VERY slow! They also overlook the need to see related works from multiple repositories in a single search tool. I suggest you look at purchasing a global search tool. If your documents have good profiling and/or metadata capture and are OCR’d with 95% accuracy, your search results will come back quickly and be very accurate.
CLEAN EXPORTS OF DIGITAL DOCUMENTS WITH CLEAN META EXTRACTION
With that in mind, make sure your ERMS and EDMS vendors can give you good exports of your digital documents. Most ERMS systems hide and encode your documents beneath multiple layers of folders on the backend, which will slow your indexers and searches to a crawl! Before you implement a third-party search tool, make sure you get a quality export of your digital documents from your ERMS and EDMS vendors.
DAILY DELTA EXPORTS TO SECONDARY INDEX
To keep the most recent footprint of available digital documents, your ERMS and EDMS systems must be able to provide your search indexers with a daily delta of new documents. This way you can set your indexers to capture only a small amount of new data instead of re-indexing terabytes of documents at a time.
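A daily delta is typically driven by a "watermark": the newest modification time already exported. This minimal sketch assumes each record exposes an id and a last-modified timestamp. `Record` and `daily_delta` are hypothetical names, not an ERMS/EDMS API.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple, Tuple


class Record(NamedTuple):
    doc_id: str
    modified: datetime


def daily_delta(records: Iterable[Record],
                last_export: datetime) -> Tuple[List[str], datetime]:
    """Return ids changed since `last_export` plus the new watermark.

    The watermark is the newest modification time seen, so the next
    run picks up where this one stopped.
    """
    changed = sorted((r for r in records if r.modified > last_export),
                     key=lambda r: r.modified)
    watermark = changed[-1].modified if changed else last_export
    return [r.doc_id for r in changed], watermark
```

The indexer stores the returned watermark and passes it back the next day, so only new or changed documents are re-indexed. One caveat: a document saved later with a timestamp equal to the watermark would be skipped, so real systems often overlap the window slightly.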
What Should Your Global Search Engine Include?
Although global search technology is fairly new, it should be able to perform these standard tasks seamlessly.
THE USER INTERFACE
The user interface should include several views.
PANE VIEW
– this view should look and feel much like Microsoft Outlook, allowing a split view of documents matching the search criteria across EDMS, ERMS, LOCAL EMAIL, ARCHIVED EMAIL, and LOCAL DOCUMENTS. Each frame should sort on click. For instance, the frame fields may be DATE, DESCRIPTION, AUTHOR or others; clicking DATE should re-sort the result set by date.
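Click-to-sort behavior usually toggles direction when the same header is clicked twice. The sketch below assumes result rows are dictionaries keyed by the field names shown (DATE, DESCRIPTION, AUTHOR) and that dates are ISO strings, which sort correctly as text. The function name is hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def click_header(rows: List[Row], field: str,
                 state: Tuple[str, bool]) -> Tuple[List[Row], Tuple[str, bool]]:
    """Re-sort a result pane when a column header is clicked.

    state is (current_field, descending). Clicking the same header again
    reverses the order; clicking a new header sorts it ascending.
    """
    current, descending = state
    descending = (not descending) if field == current else False
    ordered = sorted(rows, key=lambda r: r.get(field, ""), reverse=descending)
    return ordered, (field, descending)
```

The pane keeps the returned state and passes it back on the next click, which is all that is needed for the familiar Outlook-style toggle.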
GLOBAL VIEW
– this view should also look and feel much like Microsoft Outlook. The only difference is that it combines the search results from the EDMS, ERMS, LOCAL EMAIL, ARCHIVED EMAIL, and LOCAL DOCUMENTS into a single chronological list.
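If each repository already returns its hits in date order, the combined chronological list is a k-way merge rather than a full re-sort. This sketch uses Python's `heapq.merge` with a `key` argument. The `(ISO date, title)` hit format and repository labels are assumptions for illustration.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[str, str]  # (ISO date, title)


def global_view(results: Dict[str, List[Hit]]) -> List[Tuple[str, str, str]]:
    """Merge per-repository hit lists into one chronological list.

    Each repository's list must already be sorted by date (ascending);
    heapq.merge interleaves them by date without re-sorting everything.
    Output rows are (date, repository, title).
    """
    streams = [
        [(day, repo, title) for day, title in hits]
        for repo, hits in results.items()
    ]
    return list(heapq.merge(*streams, key=lambda row: row[0]))
```

Tagging each row with its repository lets the Global View show where a document came from while still presenting one timeline.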