

Text Recognition and Optical Character Recognition (OCR)

Text recognition refers to technologies that translate paper information into electronic data without any additional manual data input. When a document is uploaded into SpringCM, our software determines whether it is in a supported text format or needs optical character recognition (OCR) applied. Images and faxes are the most common forms of electronic documents that require OCR. Once OCR has been applied, you can search for any snippet of text in your documents and find it quickly.
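The upload decision described above can be pictured with a small sketch. This is an illustration only, not SpringCM's implementation: the function name `choose_processing`, the extension lists, and the `has_text_layer` flag are all hypothetical, and the formats SpringCM actually supports are not listed on this page.

```python
from pathlib import Path

# Hypothetical extension lists, chosen for illustration only.
TEXT_FORMATS = {".txt", ".doc", ".docx", ".html", ".rtf"}
IMAGE_FORMATS = {".tif", ".tiff", ".png", ".jpg", ".jpeg", ".bmp"}


def choose_processing(filename: str, has_text_layer: bool = False) -> str:
    """Return 'text_extraction', 'ocr', or 'unsupported' for an uploaded file."""
    suffix = Path(filename).suffix.lower()
    if suffix in TEXT_FORMATS:
        # The text is already machine-readable, so it can be read directly.
        return "text_extraction"
    if suffix == ".pdf":
        # A PDF may carry real text or only scanned page images.
        return "text_extraction" if has_text_layer else "ocr"
    if suffix in IMAGE_FORMATS:
        # Images and faxes hold pixels, not characters, so they need OCR.
        return "ocr"
    return "unsupported"
```

Under these assumptions, a faxed TIFF is routed to OCR, a Word document goes straight to text extraction, and a scanned PDF with no text layer is treated like an image.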

Using either text extraction or OCR, SpringCM converts paper documents, forms, and unstructured data into searchable information. As a result, all document content (faxes, images, PDFs, electronic documents) can be made searchable and usable by all SpringCM users. Depending on the application, Smart Rules can be applied to facilitate routing, approval, or other workflows.

Embedded OCR

OCR offers much more than making your document repository fully text searchable. Two additional OCR technologies further refine the process. Zone OCR instructs the solution to look at a particular place in the document for the specific information needed, for example a patient ID in a medical record or a purchase order number on an invoice. The specified data is then automatically extracted from the document. Intelligent Data Capture (IDC) takes the process even further by searching through the entire document to find the specific data you require. For example, if incoming forms such as invoices or contracts are faxed into a specific folder, SpringCM can read the most important fields and extract them into metadata for use in workflow processes, document retrieval, and other tasks. Both IDC and Zone OCR eliminate manual data input, save time, and reduce the possibility of errors.
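To make the difference between the two approaches concrete, here is a minimal sketch that assumes the OCR step has already produced words with page positions. The `Word` type, the `zone_extract` and `find_po_number` functions, the fractional coordinates, and the purchase order pattern are all hypothetical and are not part of SpringCM's product or API.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge, as a fraction of page width (0 to 1)
    y: float  # top edge, as a fraction of page height (0 to 1)


def zone_extract(words, left, top, right, bottom):
    """Zone-style extraction: join the words whose corner lies in a fixed rectangle."""
    inside = [w for w in words if left <= w.x <= right and top <= w.y <= bottom]
    inside.sort(key=lambda w: (w.y, w.x))  # reading order: top to bottom, left to right
    return " ".join(w.text for w in inside)


# Assumed pattern: "PO", optional separators, then at least four digits.
PO_PATTERN = re.compile(r"\bPO[-\s#:]*(\d{4,})\b", re.IGNORECASE)


def find_po_number(full_text):
    """IDC-style extraction: search the whole text for a PO-like pattern."""
    match = PO_PATTERN.search(full_text)
    return match.group(1) if match else None
```

The zone approach depends on the field always appearing in the same place, which suits standardized forms. The pattern search does not care where the value appears, which suits documents whose layout varies from sender to sender.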

Text Recognition as Part of the Indexing Process

SpringCM offers many tools to describe files so that they can more easily be classified and located at a later time. The data that is created during this process is called "metadata" – literally "data about data." The process of attaching additional metadata to a document is called "indexing" and constitutes an essential part of the capture/search process.
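As a rough picture of what indexing makes possible, the sketch below attaches metadata to document IDs and looks documents up by field values. The `MetadataIndex` class, its methods, and the field names are hypothetical and say nothing about how SpringCM stores metadata internally.

```python
from collections import defaultdict


class MetadataIndex:
    """Toy in-memory index from metadata values to document IDs."""

    def __init__(self):
        self._docs = {}                    # doc_id -> metadata dict
        self._by_field = defaultdict(set)  # (field, value) -> set of doc IDs

    def index(self, doc_id, metadata):
        """Attach metadata to a document, replacing any earlier metadata."""
        for field, value in self._docs.get(doc_id, {}).items():
            self._by_field[(field, value)].discard(doc_id)
        self._docs[doc_id] = dict(metadata)
        for field, value in metadata.items():
            self._by_field[(field, value)].add(doc_id)

    def find(self, **criteria):
        """Return sorted IDs of documents that match every field=value pair."""
        if not criteria:
            return sorted(self._docs)
        matches = [self._by_field.get((f, v), set()) for f, v in criteria.items()]
        return sorted(set.intersection(*matches))
```

With two invoices indexed under a hypothetical `doc_type` and `po_number`, a call such as `find(doc_type="invoice", po_number="48213")` returns only the document carrying that purchase order number.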

SpringCM can be configured to automatically extract as much relevant information as possible about a document or file when it enters the system. As mentioned above, SpringCM’s Zone OCR and IDC capabilities automate the indexing process so that important information, such as customer numbers or purchase order numbers, is captured when documents enter the system. This extraction can effectively eliminate the need for manual data entry.

SpringCM’s Client Services team will help set up specialized text extraction so that your standardized electronic forms are easily recognized by SpringCM and the document indexing process is nearly transparent to your users. The specific fields on your forms that you want to index are identified, and a map of these fields is created. From there, the SpringCM application does the rest: your scanned or faxed documents, as well as other electronic forms, are indexed automatically.
