Optical character recognition (OCR) helps speed up your payable creation process. OCR accuracy may be low at first, but training OCR can quickly and dramatically improve its results.
📚 To learn more about using OCR to speed up your payable creation, see the Routable Help Center article, Using OCR to speed up payment creation →
When you are coding a scanned bill from your inbox and notice that a prefilled field is inaccurate, train OCR to correctly populate the field in the bill that you are currently coding and to help improve future OCR results.
Training OCR
To train OCR as you are creating a payable:
- Click the Train OCR on attachment button at the top of the file viewer.
From Routable, you will be redirected to the OCR training module.
- In the OCR training module, capture each field by:
- Verifying the suggested data ✅
- Selecting different data by moving the bounding box 🟦 associated with the field
- Clearing out the suggested data 🗑️
You may need to do more than one of these actions per field.
- Once you are done with your training session in the OCR training module, click the Confirm button to save your training session.
- From the OCR training module, you will be returned to Routable and the payable you were coding will be updated with the training data.
Training fields
Field | Definition |
---|---|
Basic Information | |
Document Type | Always select Tax Invoice |
Document ID | Bill number. Whitespaces are stripped. |
Purchase Order Number | Purchase order number. Whitespaces are stripped. |
Issue Date | Bill date |
Due Date | Bill due date |
Payment Instructions | |
Account Number | Bank account number. Whitespaces are stripped. |
Bank Code | Bank/sort code |
IBAN | Bank account number in IBAN format |
Terms | Payment terms |
Totals and Subtotals | |
Subtotal | Base bill amount used for tax calculation |
Total Tax | Total tax amount |
Amount Due | Final amount to be paid including tax after deducting all discounts |
Currency | Bill currency |
Vendor & Customer | |
Vendor Name | Vendor name |
Vendor Address | Vendor address |
Customer Name | Recipient/customer name |
Customer Address | Recipient/customer address |
Other | |
Notes | N/A. Skip confirming this field. |
Line Items | |
Code | Line item product/service item code |
Description | Line item description. Can be multiple lines. |
Quantity | Line item quantity |
Amount Base | Line item unit price without tax |
Unit Price | Line item unit price with tax. Likely you will leave this field blank. |
Total Base | Line item total without tax. Amount Total Base = Amount Base * Quantity |
Total Amount | Line item total with tax. Likely you will leave this field blank. Amount Total = Amount * Quantity |
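The two line item formulas in the table can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function and parameter names are hypothetical, and it assumes that "Amount" in the second formula refers to the unit price with tax (the Unit Price field). It is not part of Routable or the OCR training module.

```python
from decimal import Decimal


def line_totals(quantity, amount_base, amount=None):
    """Return (total_base, total_amount) for one line item.

    quantity    -- line item quantity
    amount_base -- unit price without tax (the Amount Base field)
    amount      -- unit price with tax (assumed to be the Unit Price field);
                   None when the field is left blank
    """
    qty = Decimal(str(quantity))
    # Amount Total Base = Amount Base * Quantity
    total_base = Decimal(str(amount_base)) * qty
    # Amount Total = Amount * Quantity, only when a price with tax was captured
    total_amount = Decimal(str(amount)) * qty if amount is not None else None
    return total_base, total_amount


# Hypothetical line item: 3 units at 19.99 without tax, price with tax left blank
total_base, total_amount = line_totals(quantity=3, amount_base="19.99")
print(total_base)    # 59.97
print(total_amount)  # None
```

`Decimal` is used instead of floats so that currency amounts multiply without binary rounding error. If a captured Total Base does not match Amount Base * Quantity on the bill, it is worth rechecking which values were captured for that line item.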
Training Best Practices
OCR accuracy depends on high-quality training data. To capture the highest-quality data possible for training OCR, follow these best practices.
💡 Only annotate data values - keep data capture locations consistent and precise
- Capture a value from the same place on documents with the same layout and/or from the same vendor every time.
- Bounding box borders should go around the data, not through it.
- Check that the captured data is correct for the associated field.
- Do not capture data labels.
- Avoid overlapping captured data.
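The guidance on bounding boxes above can be pictured as simple rectangle geometry. The sketch below is a rough illustration only: the `Box` type, the coordinates, and both helper functions are hypothetical and do not reflect how the OCR training module stores or validates boxes.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in page coordinates (hypothetical units)."""
    left: float
    top: float
    right: float
    bottom: float


def overlaps(a: Box, b: Box) -> bool:
    """True when the two boxes share any interior area."""
    return (a.left < b.right and b.left < a.right
            and a.top < b.bottom and b.top < a.bottom)


def encloses(outer: Box, inner: Box) -> bool:
    """True when outer goes fully around inner, not through it."""
    return (outer.left <= inner.left and outer.top <= inner.top
            and outer.right >= inner.right and outer.bottom >= inner.bottom)


# Hypothetical captures on one bill
vendor_name = Box(50, 40, 250, 60)
vendor_address = Box(50, 65, 250, 110)
sloppy_address = Box(50, 55, 250, 110)   # cuts into the vendor name area
name_text = Box(55, 43, 240, 58)         # where the name is actually printed

print(overlaps(vendor_name, vendor_address))  # False
print(overlaps(vendor_name, sloppy_address))  # True
print(encloses(vendor_name, name_text))       # True
```

In these terms, a good capture is one where the box for a field encloses the printed value and does not overlap the box for any other field.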
💡 Always capture all available data on a document
- If the data for an OCR field appears on a document, capture it, even if the value is “0.”
- If the data for an OCR field does not appear on a document, capture the data for the field as blank data.
- Do not manually input data capture values if they do not appear on the document.
💡 Capture data from preferred locations and capture related data from the same location
- Typically, this means that you should capture all data possible on the first page of a document, the first time it appears in a document, and in the header of the document (rather than the footer).
- Avoid capturing document totals in line items—capture them in the header or footer instead.
- Especially capture all tax data from the same area (typically in a table below the line items).
- For example, capture all of the vendor fields from the same area in the header. Do not capture the vendor name from the logo and the vendor address from the header.
📚 For more information on OCR training best practices, refer to Rossum's Annotations Guide ↗