Records &
Information Management Article - The Truth About Imaging
by: Kurt W. Stevenson -
Executive Director & COO - Horizon Dynamics, LLC
August 15,
2006
Most organizations focus their digital imaging
purchases on the reduction of paper, but this is by no means the
most important reason for converting paper to digital. In my
opinion, there are many other reasons for converting. Some of
the most prominent other reasons would be:
IMMEDIATE
ACCESS TO DOCUMENTS
Digital
imaging combined with Electronic Records Management Systems (ERMS)
allows users immediate access to critical documents without the
need for a hard copy file. With digital documents a user can
search, retrieve and execute a document in seconds. This can be
done in a fraction of the time it takes to request and retrieve
a hard copy file. Of course, saving time equates to saving
money in many organizations. Therefore immediate access to
documents is a critical reason for conversion to a digital
environment.
MULTIPLE
USER ACCESS TO DOCUMENTS
In many
instances, employees have the need to view and execute documents
simultaneously. In a hard copy world this is a significant
obstacle unless the employees are in the same place at the same
time. The best way two or more employees can view and execute a
document simultaneously is to have the document in a digital
format. Digital documents which reside in ERMS or Electronic
Document Management Systems (EDMS) can be viewed, edited and
executed by multiple employees no matter the employee’s location
or time zone.
REMOTE
ACCESS TO DOCUMENTS
Having your
documents converted to a digital environment can allow immediate
and multiple user access to documents via the web from remote
locations. If your files reside only in paper format,
productivity may be negatively impacted because multiple
employees must be at the remote location at the same time in
order to view the file content.
GLOBAL
SHARING OF DOCUMENTS
Having
electronic or digital versions of records will also enable
document sharing via the web in a global environment. Most
ERMS and EDMS set-ups will allow publishing the digital
documents to an E-Portal via encrypted secure user access. This
permits users located anywhere in the world to access the
published documents. This is also a huge ROI because now
organizations may execute multi-billion dollar deals via the web
across country boundaries and time zones.
DISASTER
RECOVERY
How do you
recover paper documents or files after a disaster occurs? In
the past, everyone always said, “This will never happen.” In
today’s environment, disaster recovery plans should be in place
at nearly every organization or firm in the country. The answer
to recovering paper documents in most cases is – YOU DON’T!
Having your records converted to a digital environment allows
you to replicate to remote locations in real time. In layman’s
terms, it allows you to copy the entire file or record as it is
added to the system. Having a quality replication of your
digital document repository to an offsite location is crucial
when you are thinking of converting to digital and prepares your
organization for the time that “will never happen”.
REDUCTION
OF PAPER ONSITE
Of course
everyone would love to reduce the amount of paper onsite.
Remember this important fact: the quality of your images is the
main determinant of the amount of paper you can destroy. In a
digital imaging environment, quality control checking every
document scanned is a must. If you destroy the paper right
after scanning and you have not performed appropriate quality
control checks, the imaged document may be useless. Although
most capture technologies in operation today have built-in
processes for QC, none of them can offer 100% QC checks. At my
facility each and every page of every document scanned is QC
checked and then stamped 100% QC. We not only check that each
page is scanned, but we also check for resolution, de-skewing,
de-speckling and overall image quality. The reason for
this is that all of our images must be print quality! This in
turn will allow our attorneys to re-print an imaged document
that looks like it was just typed via word processing software!
If these processes take place you can destroy the hard copy
document knowing you have a 100% quality, replicated electronic
version of the original.
REDUCTION
OF PAPER OFFSITE
If you take
the necessary measures to ensure that your images are excellent
quality and a 100% accurate replication of the original, it is
easy to reduce the amount of paper offsite. At our site, we simply
look for the stamp of approval on the document. When we send
the records to storage we go through a purging process which
allows us to destroy all the documents with the stamp of
approval. We never destroy ORIGINAL EXECUTED DOCUMENTS.
Although courts in the US accept an electronic version of the
signed document, we feel it is necessary to retain original
documents for their full retention period. Even retaining these
documents can reduce your offsite storage by up to 70%. This
converts into real dollar savings when you are storing 50,000+
boxes of useless paper!
Setting Up Your Digital Imaging Network – Hardware – MFP’s
In beginning
the process of setting-up a global network for digital imaging,
the purchase of hardware is crucial. Purchasing the right
Multi-Function Printers (MFP) can make or break your
environment. One good practice is to ask each vendor to provide
you with a specifications sheet on their latest and most
productive MFP. Take these documents and build a comparison
spreadsheet (in MS Excel). Through this process, comparison of
all the machines may be made in a more efficient way. Don’t
take what’s on paper as reality! Most MFP’s perform at around
75-80% of what the specs say. You must consider the image
resolution and any other processes involved in the scanning
process. With this said, on the copy and print side most MFP’s
perform at around 90% spec ratio. Remember that the size of
the file and the file format that you are printing will directly
affect the PPM rate of most MFP’s. Following are some important
factors to consider when purchasing Multi-function printer
devices.
WHAT IS
THE FEEDER CAPACITY?
Feeder tray
capacities on most common MFP units range between
139 and 150 pages maximum. Some have the option to increase this
capacity a bit, but it’s not going to reach 500 pages.
CAN YOU
BATCH SCAN?
With the
feeder capacity being limited to around 150 pages the ability to
batch scan is crucial. This means that you can take a 1000 page
document and continually feed the document 150 pages at a time
until you are finished. When all 1000 pages are complete you
can then simply push a button and commit the entire document for
processing. Without this option, you are restricted to scanning
only small capacity documents. Some vendors provide third party
proprietary software platforms which will allow batch scanning
with a cover sheet of sorts, but now you are adding another step
to an already NEW practice for most users. Adding this step may
build another potential problem into the process.
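As a rough illustration of the batch-scanning arithmetic above, the sketch below works out how many feeder loads the 1000 page example needs. The function names are invented for this example, and the 150 page capacity is simply the figure quoted above.

```python
import math


def feeder_loads(total_pages: int, feeder_capacity: int) -> int:
    """Number of times the feeder must be loaded to scan the whole document."""
    if total_pages < 0 or feeder_capacity <= 0:
        raise ValueError("pages must be >= 0 and capacity must be > 0")
    return math.ceil(total_pages / feeder_capacity)


def load_sizes(total_pages: int, feeder_capacity: int) -> list[int]:
    """Page count of each individual feeder load, in order."""
    full, remainder = divmod(total_pages, feeder_capacity)
    return [feeder_capacity] * full + ([remainder] if remainder else [])


print(feeder_loads(1000, 150))  # number of loads for the 1000 page example
print(load_sizes(1000, 150))    # size of each load, in feeding order
```

With batch scanning, all of those loads are committed as one document. Without it, each load would become its own file.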
WHAT IS
THE PPM ON PRINT SCAN AND COPY?
This is a
crucial factor in purchasing MFP devices. Most MFP's start at
around 35ppm with 200dpi and level 4 compression. These
machines will usually meet the minimum requirements of your
organization, but are usually limited by functionality and mass
usage. I would suggest you purchase a machine with a minimum of
50ppm. Remember once again that your ppm will be directly
affected by the size and format of the document you are trying
to print. Scanning speed will be affected by the level of
compression and resolution you are scanning at.
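To put numbers on this, here is a minimal sketch that applies the spec-sheet discount mentioned earlier (scanning at roughly 75-80% of the rated speed) to a 35ppm and a 50ppm machine. The function names and the 1000 page job size are assumptions chosen for illustration, not vendor figures.

```python
def effective_ppm(rated_ppm: float, derate: float) -> float:
    """Real-world pages per minute for a rated speed and a derating factor."""
    if not 0 < derate <= 1:
        raise ValueError("derate must be greater than 0 and at most 1")
    return rated_ppm * derate


def job_minutes(pages: int, rated_ppm: float, derate: float) -> float:
    """Minutes needed to push a job through at the derated speed."""
    return pages / effective_ppm(rated_ppm, derate)


for rated in (35, 50):
    minutes = job_minutes(1000, rated, 0.75)
    print(f"{rated}ppm rated, 75% of spec: about {minutes:.0f} minutes per 1000 pages")
```

Run the same calculation with your own page volumes before you compare machines on the spreadsheet.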
ARE
PROGRAMMABLE INTERFACES POSSIBLE?
Although
this may not be an immediate need you should always purchase for
the future. Having the ability to program the user interface on
the machine to meet your organization’s needs will definitely
become a crucial factor in the future.
SCAN TO
EMAIL – SCAN TO NETWORK DRIVES
Two features
which the MFP must have are the ability to scan to email and scan
to network drives. This will allow users to convert paper
documents quickly throughout your organization. Scanning to
network drives will allow you to program interactive buttons on
the panel that allow direct scan to Records Management, Document
Management, and Accounting Software Systems.
IS COST
RECOVERY CONTROLLED ON THE PANEL?
Most
organizations control cost recovery for print and copy by using
a third party piece of hardware attached to the machine. Each
of these machines has a cost of around $2500 -$3000. Most
organizations are unaware of the option to control cost recovery
available on many MFP’s through Nqueue. Third party companies
charge up to $0.12 per image. With a backend cost recovery system
you can control what is called “Digital Media Conversion”. So a
bill-back will not say copy or print cost. It will say Digital
Media Conversion, which will include print, scan, copy, phone
(If your system is on VOIP), and fax. There is a huge ROI
involved in cost recovery on the panel. Don’t forget to ask
about this option from your MFP provider.
IS
BARCODE/COVER SHEET RECOGNITION INCLUDED?
This option
is crucial when scanning directly to your ERMS system. Cover
sheets contain meta data which is transferred to a data file
along with the document. The scanner must read and recognize
this data for the digital document to properly index into your
backend systems. If your new MFP’s don’t include this out of
the box, don’t buy them!
DEFINE
YOUR INTERFACES BETWEEN ERMS, EDMS and BACKEND SYSTEMS
Defining how
your MFP devices will interact with your backend systems is also
a crucial point. Make sure you know what file formats your
ERMS, EDMS and Accounting systems will accept. Contact your
ERMS, EDMS and other backend systems providers for this
information before you purchase your MFP devices. With the
ability to program the user interface on the machine, you can add
simple one-button solutions to move your images into backend
systems.
Setting Up Your Digital Imaging Network – High Capacity Scanners
MFP devices
are only a portion of your imaging network. If you are going to
perform FULL SERVICE imaging onsite you will need to purchase
High Capacity Scanners. These scanners normally have feeder
capacities of 1000 pages or more. These scanners normally need
stand alone PC’s with massive amounts of RAM to operate
correctly. Some valuable features to look for are below.
WHAT IS
THE FEEDER CAPACITY AND MAX PPM?
Most high
capacity scanners will allow at least 1000 pages at a time in
the feeder. Do not consider high capacity scanners which don’t
provide at least 500 page feeder trays!
WHAT ARE
THE PC SPECS?
As I stated
earlier, most high capacity scanners require a stand alone PC
for processing. Make sure you know what hardware is needed for
maximum results in processing. Make sure you account for the
cost of these PC’s when you are choosing high capacity scanners.
IS IT
UPGRADEABLE?
In most
cases the need to scan 250 pages per minute duplex (both sides
of a document) is not necessary at the outset. Make sure you can
purchase at least 80 PPM duplex capacity, and then upgrade the
same scanner to 150 PPM duplex when the need arises.
CAN YOU
SCAN AT MAX PPM DUPLEX IN ONE PASS?
Most MFP
devices don’t have dual cameras. This means that to scan duplex
the feeder must pull the document through the scanner twice.
High capacity scanners should have two cameras and the ability
to scan a two sided document at max PPM in one pass. This will
save countless man hours when processing large volumes of paper
documents.
Optical
Character Recognition (OCR)
Ok, what
does that mean? OCR, in layman’s terms, means that a backend
system will read the content of the image and place a text file
behind the picture. This will in turn make any imaged document
text searchable. This functionality allows users to not only
search for the document description, but they can also search
across the body of the document for keywords. A few features
necessary for OCR systems are below.
WILL IT
RUN AS AN NT SERVICE?
The question
is this: if any server in the farm fails, will it reboot and continue
processing where it left off, or do you have to manually restart
all of the services to begin processing? Make sure this feature
exists. You do not want to have to monitor your backend OCR
system every minute of every day. The ability to restart
itself gives an OCR system a “hands-off” structure. The
system should know that it failed, at what point during
conversion it failed, and restart at that point at server
reboot. Make sure it will auto capture the latest footprint of
processing progress at server failure.
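The restart behavior described above can be sketched as a simple checkpoint file. This is only an illustration under stated assumptions, not how any particular OCR product works: the file layout, the function names and the simulated failure are all hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def load_checkpoint(path: Path) -> int:
    """Index of the next page to process, or 0 when no checkpoint exists."""
    if path.exists():
        return json.loads(path.read_text())["next_page"]
    return 0


def save_checkpoint(path: Path, next_page: int) -> None:
    """Write the checkpoint to a temp file, then rename it over the old one."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps({"next_page": next_page}))
    os.replace(tmp, path)


def ocr_document(pages, checkpoint: Path, ocr_page) -> None:
    """OCR pages in order, recording progress after every page."""
    for index in range(load_checkpoint(checkpoint), len(pages)):
        ocr_page(pages[index])
        save_checkpoint(checkpoint, index + 1)
    checkpoint.unlink(missing_ok=True)  # finished, so clear the checkpoint


# Demo: simulate a server failure on the third page, then a restart.
processed = []
crashed = {"done": False}


def flaky_ocr(page):
    if page == "page-3" and not crashed["done"]:
        crashed["done"] = True
        raise RuntimeError("simulated server failure")
    processed.append(page)


pages = [f"page-{n}" for n in range(1, 6)]
with tempfile.TemporaryDirectory() as tmpdir:
    ckpt = Path(tmpdir) / "job.ckpt"
    try:
        ocr_document(pages, ckpt, flaky_ocr)
    except RuntimeError:
        pass  # a service manager would restart the process here
    ocr_document(pages, ckpt, flaky_ocr)  # picks up at the failed page

print(processed)
```

Each page is processed exactly once even though the job was interrupted, which is the behavior to ask your OCR vendor to demonstrate.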
CAN IT
WATCH NETWORK FOLDERS?
The backend
OCR should allow you to tell it what folders to watch for
processing. This way you can program your MFP and High Capacity
devices to scan to one network folder. If the OCR system is
watching this folder it will auto process any digital document
dropped in the folder. This is another automation process that
your OCR system should definitely have. This feature is a must
and a requirement for purchase.
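A watched folder can be pictured as a loop that repeatedly lists the folder and hands any new file to the OCR engine. The sketch below shows one pass of such a loop. The suffix list and function name are assumptions, and a temporary directory stands in for the network share.

```python
import tempfile
from pathlib import Path

SCAN_SUFFIXES = {".tif", ".tiff", ".pdf"}  # assumed input formats


def new_documents(watch_dir: Path, seen: set[str]) -> list[Path]:
    """Return scan files that have appeared in watch_dir since the last call."""
    fresh = []
    for path in sorted(watch_dir.iterdir()):
        is_scan = path.is_file() and path.suffix.lower() in SCAN_SUFFIXES
        if is_scan and path.name not in seen:
            seen.add(path.name)
            fresh.append(path)
    return fresh


# Demo with a temporary directory standing in for the network share.
with tempfile.TemporaryDirectory() as tmpdir:
    share = Path(tmpdir)
    seen: set[str] = set()
    (share / "contract.tif").touch()
    (share / "notes.txt").touch()  # ignored: not a scan format
    first_pass = [p.name for p in new_documents(share, seen)]
    (share / "invoice.pdf").touch()
    second_pass = [p.name for p in new_documents(share, seen)]

print(first_pass, second_pass)
```

A production watcher would also need to wait until the scanner has finished writing a file before picking it up. This sketch leaves that out.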
HOW MANY
DOCUMENTS WILL IT PROCESS AT ONE TIME?
Make sure
you know how many documents may be processed at one time. What
are the limitations of the software? How many instances of the
software will load on each server? This will give you an idea
of how big and customizable your OCR network can be. Make sure
you have room for future growth in processing. For
instance, the system may require you to have one server as a
scheduler. This server just processes the requests. In other
words, it watches the pre-defined network folder for processing
and then directs the document to another processing server. The
processing servers can run up to four instances of OCR
conversion at once. So with 4 servers you can process 16
instances of your OCR platform. This is plenty sufficient for
most organizations.
CAN IT
CONTROL SERVER LOADS?
This is also
a very important feature! For example, if you have a bunch of
small documents (10 pages or less) to convert, the OCR system
should recognize this and pick the best server in the farm to
perform the processing. If you have 5 other large
documents (100 pages or more) the OCR system should parse
these documents to servers in the farm that have the most CPU
power available. This will balance the load on your server farm
and create quick processing times.
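One simple way to picture load control is a dispatcher that sends each job to the server with the most idle CPU and a free OCR slot. This is a simplified sketch rather than any vendor's algorithm. The server names and CPU figures are invented, and the limit of four instances per server is borrowed from the example earlier in this article.

```python
from dataclasses import dataclass

MAX_INSTANCES = 4  # per-server limit, taken from the earlier example


@dataclass
class Server:
    name: str
    cpu_free: float   # fraction of CPU currently idle, 0.0 to 1.0
    running: int = 0  # OCR instances currently running


def pick_server(servers):
    """Server with the most idle CPU among those with a free OCR slot."""
    candidates = [s for s in servers if s.running < MAX_INSTANCES]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.cpu_free)


def dispatch(servers):
    """Assign one job and return the chosen server name (None if farm is full)."""
    server = pick_server(servers)
    if server is None:
        return None
    server.running += 1
    return server.name


farm = [
    Server("ocr-1", cpu_free=0.20, running=1),
    Server("ocr-2", cpu_free=0.65, running=4),  # idle CPU, but no free slot
    Server("ocr-3", cpu_free=0.50, running=2),
]
chosen = dispatch(farm)
print(chosen)
```

A real scheduler would refresh the CPU readings continuously and would probably weigh document size as well. The point here is only that the choice is made per job, not fixed in advance.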
DOES IT
SUPPORT SERVER FARMS?
If you try
to perform mass processing on one server it will definitely
crash! Most likely the server will peak at 100% CPU usage and
will get hot and die! The OCR system has to be able to accept
multiple servers. The golden scenario is one server for
scheduling/processing and up to four additional servers for just
processing.
WHAT FILE
FORMATS ARE ACCEPTED?
Make sure
you pre-define what file formats your OCR system will accept.
The most common file type for processing quickly is TIFF. At my
firm, I convert from TIFF to OCR – TEXT SEARCHABLE .PDF. The
OCR system does however have the ability to convert to and from
many other file formats.
ARE
COVERSHEET SPLITS PROCESSED?
Most third
party ERMS, EDMS and backend systems use coversheets
for indexing and processing. Make sure your OCR system will
recognize these pages, remove the coversheet from the document,
and retain the indexing metadata. This will allow you to scan
multiple documents in one pass on the scanner. For instance, if
you have ten 20-page documents, you would have 10 coversheets.
If you scan these all at once in the feeder of a scanner, they will
come out as one huge document. With coversheet recognition and
splits, the system will automatically split it into 10 separate documents
with the respective indexing metadata attached.
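The split itself is easy to picture in code. The sketch below assumes the capture software has already flagged which pages are coversheets and read their metadata. The dictionary layout is hypothetical and stands in for whatever your capture product actually emits.

```python
def split_on_coversheets(pages):
    """Group a scanned page stream into documents, one per coversheet.

    Each page is a dict with 'is_coversheet' (bool), plus 'metadata' on
    coversheets and 'image' on ordinary pages. The coversheet itself is
    dropped from the output, but its metadata is kept with the document.
    """
    documents = []
    current = None
    for page in pages:
        if page["is_coversheet"]:
            current = {"metadata": page["metadata"], "pages": []}
            documents.append(current)
        elif current is None:
            raise ValueError("page found before the first coversheet")
        else:
            current["pages"].append(page["image"])
    return documents


# Demo: ten 20-page documents scanned in a single pass.
batch = []
for doc_no in range(1, 11):
    batch.append({"is_coversheet": True, "metadata": {"doc": doc_no}})
    for page_no in range(1, 21):
        batch.append({"is_coversheet": False, "image": f"doc{doc_no}-p{page_no}"})

documents = split_on_coversheets(batch)
print(len(batch), "pages scanned,", len(documents), "documents produced")
```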
Capture Technology For Your High Capacity Scanners
Most high
capacity scanners also require third party capture software.
These software systems have an enormous quantity of features.
Unfortunately, most of the features are proprietary to certain
aspects of the business and only apply when you are performing
those tasks. However, there are some basic features that you
want it to have.
NETWORK
SCANNING
Just like
your MFP devices you want to be able to have the option to scan
to network drives. This will allow you to scan mass document
sets directly into your backend ERMS, EDMS and Accounting
applications.
LOCAL
SCANNING
Although
local scanning is not usually preferred it is sometimes
necessary. Local scan is a great backup if your backend imaging
and OCR network fails. Having the ability to still scan mass
documents locally and declare them as records is a huge plus.
COVERSHEET
RECOGNITION / SEPARATOR SHEET RECOGNITION
As stated
before, cover sheets contain meta data which is transferred to a
data file along with the document. The scanner must read and
recognize this data for the digital document to properly index
into your backend systems.
AUTO
INSERT / MOVE / DELETE
Another
crucial feature for your capture program is the ability to
insert missed documents, move incorrectly placed images, and
delete bad image captures. These features essentially allow
you to re-scan bad images and place these images into the main
document before committing to your backend systems.
SIMPLE
USER INTERFACE
One of the
biggest mistakes capture vendors make is forgetting that we are
not all software programmers. Make sure the capture software
you choose not only has all the features that you want, but also
is easy to use.
CAN YOU
BATCH SCAN?
With the
feeder capacity normally residing at around 500-1000 pages,
batch scanning shouldn’t be an issue. Make sure your capture
software can accept batch scanning and split the documents,
after scan, appropriately.
SCANNING
TO MULTIPLE FILE TYPES
Make sure
your capture software can accept and scan to multiple file
types. Preferably TIFF or PDF direct multi-page or single page
scans.
INTEGRATED
OCR & QUALITY CONTROL
Although we
have pretty much conquered OCR on the backend it is sometimes
necessary to OCR documents on the fly! Make sure this feature
is or can be integrated into your product. Quality control is a
huge issue! Make sure your capture product includes a very good
quality control module. The ability to see thumbnail views of
all of your documents to check for quality of scan is very
important. After all, one bad page in a 50 page document makes
the digital document useless.
Defining Document Types – TIFF – Tagged Image File Format
Deciding
which file format is most effective for your organization is
critical! The most commonly used file formats are TIFF and
PDF. Both are great and have many benefits. Some benefits of
choosing TIFF as your file format follow.
CHOOSING
TIFF (TAGGED IMAGE FILE FORMAT)
· Quick load on large files, as TIFF files normally load pages as needed.
· Fairly easy conversion from TIFF to WORD
· Viewable with Microsoft Office Image Viewer (FREE but not recommended)
· Normally TIFF files are small in size (based on scanning resolution)
· Instant OCR available (Usually OCR’s at 90% accuracy)
Defining Document Types – PDF – Portable Document Format
More and
more organizations are choosing PDF as their file format of
choice. With the most recent version of Adobe Acrobat
Professional you can literally edit a digital document like it
was a word doc. Below are some other benefits of choosing PDF
as your file format.
• Slower Load Based on File Size as PDF files load the entire document at once
• Seamless Integration from Word To PDF
• Easy Binder Creation
• Many Editing Tools Available
• Digital Signature Technology
• Security Protocols
• Views with FREE Viewer From Adobe
• Edits like Word Doc With Acrobat Professional
• Instant OCR Available (OCR’s at 95% based on image quality)
Indexing & Document Profiling
How you
index or profile your documents at scanning will definitely
affect how quickly the document returns in a search result
query. Here are some important factors to consider.
ELECTRONIC
RECORDS MANAGEMENT SYSTEM INTEGRATION
As stated
before most ERMS systems use coversheet recognition programs.
These programs essentially allow you to put a place holder in
your ERMS system with a select set of meta data. These cover
sheets will determine what criteria you can later search your
document by. Many capture programs of today will also allow you
to create templates that scan a particular spot on a document to
retrieve metadata. For instance, EDMS systems usually create a
header or footer that contains a good amount of metadata. If
you create a template that scans this section of the document it
will automatically define the metadata capture without any
manual intervention.
GARBAGE IN
= GARBAGE OUT
As stated
before how you index and/or profile your digital documents will
directly affect your search return results. Some valuable things
to remember are:
· Your meta capture or document profiling is crucial to search
· Make sure you select a minimum meta capture or document profiling criteria
· How you index inside of your ERMS and EDMS systems will directly affect your global search result sets
Sample Meta Capture Criteria - LEGAL
Here is a
sample of what criteria a law firm might want to capture
• Client #
• Matter #
• Document Date
• Document Author
• Document Description
• Document Type
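For instance, an attorney can now search for the following:
• Client # – Matter #: 19123 – 00005
• Document Date: 02/05/05
• Document Author: Kurt Stevenson
• Document Description: Response to judgment
• Document Type: Pleadings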
If the document was previously profiled with even some of this
data, the search indexer can narrow the criteria and reduce the
number of documents it has to query to return a result set.
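To show why profiling narrows a search, here is a minimal sketch that stores the six legal criteria as a record and filters on whichever fields the user supplies. The field names are assumptions, and every record other than the sample search above is hypothetical filler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    client: str
    matter: str
    doc_date: str
    author: str
    description: str
    doc_type: str


def search(profiles, **criteria):
    """Return the profiles whose fields match every supplied criterion."""
    return [
        p for p in profiles
        if all(getattr(p, field) == value for field, value in criteria.items())
    ]


repository = [
    Profile("19123", "00005", "02/05/05", "Kurt Stevenson",
            "Response to judgment", "Pleadings"),
    # The two records below are hypothetical, added only to have something to filter.
    Profile("19123", "00005", "03/11/05", "A. Example",
            "Engagement letter", "Correspondence"),
    Profile("20456", "00001", "02/05/05", "B. Example",
            "Lease agreement", "Contracts"),
]

by_matter = search(repository, client="19123", matter="00005")
hits = search(repository, client="19123", matter="00005", doc_type="Pleadings")
print(len(by_matter), len(hits))
```

Every extra profiled field shrinks the candidate set before any full-text search has to run.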
Quality Control
As mentioned
before, the quality of your digital images will decide how much
your users rely on your system. If your images are garbage or
missing pages the system becomes unreliable and will not be
used. If you ever want to get close to a paperless environment
you need to perform quality control checks at 100% accuracy
level. In order for your digital documents to be 100% accurate
you must view each imaged document for quality and accuracy.
Some other factors to consider follow.
WHAT IS
YOUR NORMAL SCAN RESOLUTION?
Most
providers and consultants will suggest 200dpi resolution with a
level 4 compression factor. This is a great setting for
keeping your file sizes small on the backend. If you are more
worried about the quality of the image than the size, you might
want to increase to 300dpi resolution.
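The size cost of the higher resolution is easy to estimate. The sketch below assumes a US Letter page scanned in black and white at 1 bit per pixel and reports the size before any compression is applied, so real files will be much smaller. The ratio between the two resolutions is the useful part.

```python
def raw_bitonal_bytes(dpi: int, width_in: float = 8.5, height_in: float = 11.0) -> int:
    """Approximate uncompressed size of a 1-bit-per-pixel page, in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px // 8  # 8 pixels per byte, row padding ignored


size_200 = raw_bitonal_bytes(200)
size_300 = raw_bitonal_bytes(300)
ratio = size_300 / size_200
print(f"200dpi: {size_200:,} bytes, 300dpi: {size_300:,} bytes, ratio {ratio:.2f}")
```

Moving from 200dpi to 300dpi multiplies the pixel count by 2.25, so plan your storage and OCR processing capacity with that in mind.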
WHAT WILL
YOUR IMAGED DOCUMENT LOOK LIKE IF REPRINTED?
You have to
remember that many organizations are now reprinting their
digital documents for review and/or drafting new documents. If
they are giving these printed documents to clients they must
look like they were created in Word. Images scanned at 200dpi
and smashed with compression factors will not look like a word
processor print.
Want A Paperless Environment?
If your end
goal is to be a primarily paperless office, you have to define
criteria that ensure your digital documents are 100% as
accurate as the paper your users would otherwise hold in their hands. Some
other factors to consider are:
YOU CAN’T
DESTROY PAPER WITHOUT PROOF OF 100% QUALITY CONTROL CHECKS
At my firm
we stamp all QC’d documents with a stamp of approval. These
stamps say – “100% Quality control checked by (the document
processor’s name)”. The coversheet is printed on blue paper and
retained as the last page of the hardcopy document until we are
ready to destroy it.
HOW LONG WILL
YOU RETAIN YOUR HARDCOPY DOCUMENTS AFTER IMAGING?
With the
system I mentioned above, purging your hardcopy files becomes
much easier. A good policy is to retain the paper document for
at least 30 days. Use a staging area for documents that have
already been imaged and are awaiting destruction. Keep these
documents in a chronological order so that they are easy to find
if needed.
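The purge rules described in this article fit in a few lines: the paper needs the QC stamp, it must have sat in staging for at least 30 days, and original executed documents are never destroyed. The function and parameter names in this sketch are assumptions made for illustration.

```python
from datetime import date, timedelta

HOLD_DAYS = 30  # minimum staging period suggested above


def may_destroy(imaged_on: date, qc_stamped: bool,
                original_executed: bool, today: date) -> bool:
    """True when a staged hardcopy meets every condition for destruction."""
    if original_executed or not qc_stamped:
        return False
    return today >= imaged_on + timedelta(days=HOLD_DAYS)


today = date(2006, 8, 15)
print(may_destroy(date(2006, 7, 1), True, False, today))  # stamped, held long enough
print(may_destroy(date(2006, 8, 1), True, False, today))  # stamped, still in the hold period
print(may_destroy(date(2006, 7, 1), True, True, today))   # original executed document
```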
HOW LONG
WILL YOU RETAIN YOUR IMAGED DOCUMENTS?
Because hard
drive space is so cheap these days, many organizations are
choosing to retain their digital documents forever. I, however,
do not agree with this scenario. Digital documents should be
treated with the same retention policies as your paper. When
their lifecycle is up, they should be deleted from your system.
DISASTER
RECOVERY
Having a
digital records system is one thing, but making that system
reliable is another. One thing which must be remembered is that
your digital ERMS relies on your organization's network. With
this in mind, it is necessary that you replicate all of your
images offsite to a remote location. Real time replication is
the most effective. This way, anytime you add a document to
your ERMS or EDMS systems it will auto copy to your offsite
location. In case of a disaster you can have all of your
digital records available immediately with no lag in the work
process.
What ERMS & EDMS Vendors Forget
GOOD
SEARCH TOOLS
In most
cases ERMS and EDMS search tools are inadequate and VERY slow!
They also forget the need to see collective works from multiple
repositories in one unique search tool. I suggest you look at
purchasing a global search tool. If your documents have good
profiling and/or meta capture and are OCR’d at 95%, your search
results will come back very quickly and be very accurate.
CLEAN
EXPORTS OF DIGITAL DOCUMENTS WITH CLEAN META EXTRACTION
With the
above said, make sure your ERMS and EDMS vendors can give you
good exports of your digital documents. Most ERMS systems hide
and encode your documents below multiple layers of folder sets
on the backend. This will cause your indexers and search
results to crawl to a halt! Before you implement a third party
search tool make sure you get a quality export of your digital
documents from your ERMS and EDMS vendors.
DAILY
DELTA EXPORTS TO SECONDARY INDEX
In order to
create the most recent footprint of available digital documents
your ERMS and EDMS systems must be able to provide your search
indexers with a daily delta grab of new documents. This way you
can set your indexers to only capture a small amount of new data
instead of re-indexing terabytes of documents at a time.
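A daily delta amounts to asking for everything changed since the last export ran. The sketch below assumes each exported document carries a last-modified timestamp. The record layout and sample documents are hypothetical.

```python
from datetime import datetime


def delta_export(documents, last_export: datetime):
    """Documents added or changed since the previous export, oldest first."""
    changed = [d for d in documents if d["modified"] > last_export]
    return sorted(changed, key=lambda d: d["modified"])


# Hypothetical repository contents.
documents = [
    {"id": "A-1", "modified": datetime(2006, 8, 13, 9, 30)},
    {"id": "A-2", "modified": datetime(2006, 8, 14, 16, 5)},
    {"id": "A-3", "modified": datetime(2006, 8, 14, 11, 45)},
]
last_export = datetime(2006, 8, 14, 0, 0)

delta_ids = [d["id"] for d in delta_export(documents, last_export)]
print(delta_ids)
```

The indexer then only has to process the documents in the delta, and the export time is saved as the starting point for the next run.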
What Should Your Global Search Engine Include
Although
global search technology is fairly new it should be able to
perform these standard tasks fairly seamlessly.
THE USER
INTERFACE
The user
interface should include several views.
PANE VIEW
– this view should look and feel much like Microsoft Outlook,
allowing a split view of documents pertaining to the search
criteria across EDMS, ERMS, LOCAL EMAIL, ARCHIVED EMAIL, and
LOCAL DOCUMENTS. Each frame should have the ability to search
on click. For instance, the frame fields may be DATE,
DESCRIPTION, AUTHOR or others. You should be able to click on
DATE and it should re-sort the result set by date sequence.
GLOBAL
VIEW
– this view should look and feel much like Microsoft Outlook as
well. The only difference is that it would combine the search
results from the EDMS, ERMS, LOCAL EMAIL, ARCHIVED EMAIL, and
LOCAL DOCUMENTS into a single list in chronological order.
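The difference between the two views comes down to whether the per-repository result lists are shown side by side or merged. A minimal sketch of the merge follows. The repository names come from the list above, while the hit records are invented for illustration.

```python
from datetime import date


def global_view(results_by_repository):
    """Merge per-repository hits into one list sorted by date, oldest first."""
    merged = []
    for source, hits in results_by_repository.items():
        for hit in hits:
            merged.append({**hit, "source": source})  # remember where it came from
    return sorted(merged, key=lambda hit: hit["date"])


# Hypothetical search results, one list per repository.
results = {
    "EDMS": [{"description": "Draft agreement", "date": date(2006, 3, 2)}],
    "ERMS": [{"description": "Signed agreement", "date": date(2006, 4, 18)}],
    "LOCAL EMAIL": [{"description": "Re: agreement terms", "date": date(2006, 3, 20)}],
}

combined = global_view(results)
for hit in combined:
    print(hit["date"], hit["source"], hit["description"])
```

Sorting on a different key, such as author or description, gives the re-sort on click behavior described for the pane view.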
Global Search Engine Functionality After Result Set
After you
receive your results you should have the ability to do the
following with each and every document:
• Quick View The Document
• Email The Document
• Print The Document
• Extract Multiple Documents
• Local Burn To CD
• Track The Lifecycle Of A Document (Who’s touched the document)
• Is There A Hardcopy/Paper Available?
About Kurt W.
Stevenson
Kurt W. Stevenson has
been involved in the Records & Information
Management field for over 13 years. He currently
serves as the Director of Records & Information
Management at Thacher, Proffitt, & Wood LLP in NY.
Kurt has spent the last three years developing IIMT
Magazine as an outlet for vendors to reach Records &
Information Management Technology Professionals with
viable solutions. He also was a key developer in
the industry’s largest online directory of Records &
Information Management Vendors & Pro's -
http://www.RIMdirectories.com .
Kurt is also a public
speaker on both the National and Local levels. He
also serves as a Records & Information Management
Technology consultant. Kurt has developed one of
the most intelligent teams of RIM Technology Vendors
in the Industry, and can now offer turn key
solutions to any RIM Center worldwide. Mr.
Stevenson has extensive experience in the LEGAL
MARKETPLACE.
If you are ready to
jump your Records & Information Management into the
next century - Kurt W. Stevenson is the right man
for the job!