Tabscanner - Docs

Tabscanner API

The Tabscanner API enables you to upload an image of a receipt or an invoice and get back the result in JSON format.

The document formats supported for upload are jpg and png.

You can

Upload a photograph of a receipt taken from a smartphone and extract all the data fields.
Upload a screenshot of an e-receipt and extract all the data fields.
Upload documents from multiple languages and regions.
Upload documents containing multiple character sets.

Out of the Box

The standard results provided by Tabscanner are very good on most receipts and invoices. For extreme levels of accuracy, however, custom configuration and training may be required. Please get in touch with our team to discuss your individual requirements.

Calling the API

Tabscanner uses a short polling system. First, the request is submitted via the process endpoint, which places it in a queue for processing and returns a token. This token is then used to short poll the result endpoint for the result. An image normally takes ~5 seconds to process, so we recommend waiting around this long before polling at one-second intervals for the result.
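The wait-then-poll pattern described above could be sketched in Python as follows. This is a minimal sketch, not an official client: `poll_for_result`, its `fetch_result` callback, and the `max_attempts` cap are hypothetical names introduced for illustration. The callback is assumed to call the result endpoint and return the decoded JSON body.

```python
import time


def poll_for_result(fetch_result, token, initial_wait=5.0, interval=1.0,
                    max_attempts=30, sleep=time.sleep):
    """Wait for the typical processing time, then poll until the result is ready.

    fetch_result is a caller-supplied (hypothetical) function that takes a
    token and returns the decoded JSON body from the result endpoint.
    max_attempts is an assumed safety cap, not a documented limit.
    """
    sleep(initial_wait)  # an image normally takes around 5 seconds to process
    for attempt in range(max_attempts):
        body = fetch_result(token)
        if body.get("status") != "pending":
            return body  # "done" or "failed": stop polling either way
        if attempt < max_attempts - 1:
            sleep(interval)  # poll at a one-second interval by default
    raise TimeoutError(f"no result for token {token!r} after {max_attempts} polls")
```

Passing `sleep` as a parameter keeps the function easy to exercise in unit tests without real delays.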

Your API key allows access to make calls on your account. This must be kept secret at all times. Tabscanner must be called from a back-end application server and not a client side application directly. For this reason you won't find any API references for Swift/Android/Objective-C in our documentation.

All calls to Tabscanner are made over SSL; we do not provide a non-SSL version of the API. All data uploaded to Tabscanner is encrypted at rest and deleted after 90 days. The results from your calls to Tabscanner will be available for 90 days after the initial call. It is therefore important to manage the persistent storage of images and results in your own application.

API Reference

https://api.tabscanner.com

Authentication

Tabscanner authenticates via an API key passed as an HTTP header named apikey.

Your API key can be found by logging into your Tabscanner account and locating it under the API details section.

Authentication

200 - OK	Request authenticated successfully

Errors

Types

400 - Error	ERROR_CLIENT: API key not found

Errors

When an error occurs during a call to the API, Tabscanner returns a JSON object with details of the error. The attribute code holds the error code for the error that occurred.

ATTRIBUTES

message	A message describing the error.
status	The status for the request. `success` or `failed`
status_code	An integer. The status code of the request
success	A boolean. If the request was successful or not.
code	An integer, the error code of the request.

API RESPONSE CODES
200 - Process request submitted successfully
202 - Result available
300 - Image uploaded, but did not meet the recommended dimension of 720x1280 (WxH)
301 - Result not yet available
400 - API key not found
401 - Not enough credit
402 - Token not found
403 - No file detected
404 - Multiple files detected, can only upload 1 file per API call
405 - Unsupported mimetype
406 - Form parser error
407 - Unsupported file extension
408 - File system error
500 - OCR Failure
510 - Server error
520 - Database Connection Error
521 - Database Query Error
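
The response codes above can be turned into a lookup table in the calling application. The Python sketch below copies the codes and messages from the list; the `next_action` grouping (treating 4xx codes as request problems and everything else unrecognised as something to retry or report) is an assumption made for illustration, not documented behaviour.

```python
RESPONSE_CODES = {
    200: "Process request submitted successfully",
    202: "Result available",
    300: "Image uploaded, but did not meet the recommended dimension of 720x1280 (WxH)",
    301: "Result not yet available",
    400: "API key not found",
    401: "Not enough credit",
    402: "Token not found",
    403: "No file detected",
    404: "Multiple files detected, can only upload 1 file per API call",
    405: "Unsupported mimetype",
    406: "Form parser error",
    407: "Unsupported file extension",
    408: "File system error",
    500: "OCR Failure",
    510: "Server error",
    520: "Database Connection Error",
    521: "Database Query Error",
}


def describe_code(code):
    """Return the documented message for a response code."""
    return RESPONSE_CODES.get(code, f"Unknown code {code}")


def next_action(code):
    """Suggest what the caller might do next (hypothetical grouping)."""
    if code == 301:
        return "poll_again"        # result not ready yet
    if code in (200, 202, 300):
        return "ok"                # 300 still uploads, but the image was small
    if 400 <= code < 500:
        return "fix_request"       # key, credit, token or file problem
    return "retry_or_report"       # 5xx and anything unrecognised
```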

Endpoints

Tabscanner aims to be a very simple API and has only 2 endpoints for processing: process and result.

The version of the API to use is passed in the URL after /api/. The current version is 2. The result endpoint does not require a version number as it is implied by the call to process.

We have provided code samples for the following languages to help fast-track your integration:

.NET
Ruby
Python
PHP
Java
Node
Go

As Tabscanner is not for use directly on phone apps we provide no documentation for Swift, Objective-C or Android.

https://api.tabscanner.com/api/2/process
https://api.tabscanner.com/api/result
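
As a small illustration of how the version appears in the process URL but not in the result URL, the two addresses could be built like this. The helper names are hypothetical, and URL-encoding the token is a defensive assumption rather than a documented requirement.

```python
from urllib.parse import quote

BASE_URL = "https://api.tabscanner.com"


def process_url(version=2):
    """URL of the process endpoint; the API version follows /api/."""
    return f"{BASE_URL}/api/{version}/process"


def result_url(token):
    """URL of the result endpoint; no version, the token goes in the path."""
    return f"{BASE_URL}/api/result/{quote(token, safe='')}"
```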

process

The process endpoint allows the submission of an image. It is a multipart/form-data POST request. The request should contain the file you would like to upload, as well as the parameters for processing the image. All arguments are passed as form-data.

ARGUMENTS

file REQUIRED	The image file. Can accept JPG and PNG file formats.
decimalPlaces optional	Accepts an integer value, which should be 0, 1 or 3. A hint for what to look for on the receipt. It can improve accuracy if you know the number of decimal places in advance. This is not related to number formatting.
cents optional	Accepts a boolean value. Convert numbers without decimal places to cents. Only works with receipts set to 3 decimal places. (e.g. 1.574 = 1.574, 245 = 0.245)
documentType optional	Accepts a string value. Must be receipt, invoice or auto. The default is receipt. Specify the type of document to be processed. If set to auto Tabscanner will attempt to auto-detect the document type.
defaultDateParsing optional	Accepts a string value. Must be m/d or d/m. In the case of an ambiguous date, e.g. 02/03/2019, this parameter determines whether the date is understood as day followed by month or month followed by day.
region optional	The 2-alpha ISO country code of the supported country. This takes into consideration number and date formats, language configurations and other settings to improve the accuracy of the results. Listed below are the codes, along with any custom fields that are available for the given region.

Argentina - ar
Australia - au (custom fields: ABNNumber)
Belgium - be
Brazil - br (custom fields: ReceiptNumber, CNPJ)
Canada - ca
Chile - cl (custom fields: ReceiptNumber)
Colombia - co (custom fields: ReceiptNumber)
France - fr
Germany - de (custom fields: VATNumber)
Greece - gr
Hong Kong - hk
India - in (custom fields: CINNO, GSTIN)
Indonesia - id
Ireland - ie
Italy - it
Japan - ja
Kenya - ke
Malaysia - my
Mexico - mx (custom fields: ReceiptNumber)
New Zealand - nz
Paraguay - pa
Peru - pe (custom fields: ReceiptNumber)
Philippines - ph (custom fields: TransactionInformationNumber)
Singapore - sg (custom fields: ReceiptNumber)
South Africa - za
Spain - es
Sweden - se
Switzerland - ch
Tonga - to
UAE - ae (custom fields: TRNNumber, InvoiceNo, InvoiceType)
United Kingdom - gb (custom fields: ReceiptID, StoreID, VATNumber)
Uruguay - uy
USA - us
Vietnam - vn (custom fields: ReceiptNumber)
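
The optional arguments above could be assembled and checked before the upload with a helper along these lines. This is a sketch under stated assumptions: `build_process_params` is a hypothetical name, sending booleans as the strings "true" and "false" is a guess at the form encoding, and only documentType and defaultDateParsing are validated locally, with the remaining values left for the service to check.

```python
DOCUMENT_TYPES = ("receipt", "invoice", "auto")
DATE_PARSING = ("m/d", "d/m")


def build_process_params(decimal_places=None, cents=None, document_type=None,
                         default_date_parsing=None, region=None):
    """Build the form-data fields for a process call, skipping unset options."""
    params = {}
    if decimal_places is not None:
        params["decimalPlaces"] = str(int(decimal_places))
    if cents is not None:
        # Assumption: booleans are sent as lowercase strings in form data.
        params["cents"] = "true" if cents else "false"
    if document_type is not None:
        if document_type not in DOCUMENT_TYPES:
            raise ValueError(f"documentType must be one of {DOCUMENT_TYPES}")
        params["documentType"] = document_type
    if default_date_parsing is not None:
        if default_date_parsing not in DATE_PARSING:
            raise ValueError(f"defaultDateParsing must be one of {DATE_PARSING}")
        params["defaultDateParsing"] = default_date_parsing
    if region is not None:
        if len(region) != 2 or not region.isalpha():
            raise ValueError("region must be a 2-letter country code")
        params["region"] = region.lower()  # codes are listed in lowercase
    return params
```

The returned dictionary can be passed as the form-data payload alongside the file in whichever HTTP client you use.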

ATTRIBUTES

token	A string. The token used to poll for the result
duplicate	A boolean describing if the same image has previously been uploaded.
duplicateToken	A string that is the token of the first seen duplicate of the upload.
message	A message describing the status of the request.
status	The status for the request `success` or `failed`.
status_code	An integer. The status code of the request
success	A boolean. If the request was successful or not.
code	An integer, the error code of the request.
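
A caller might unpack these attributes as shown in the sketch below. The `Submission` tuple, `ProcessError` exception and `parse_process_response` function are hypothetical names used for illustration; how an application should treat a duplicate upload is left to the caller.

```python
from typing import NamedTuple, Optional


class ProcessError(Exception):
    """Raised when the process call reports a failure."""


class Submission(NamedTuple):
    token: str
    duplicate: bool
    duplicate_token: Optional[str]


def parse_process_response(body):
    """Turn the decoded JSON body of a process call into a Submission."""
    if not body.get("success"):
        raise ProcessError(
            f"process call failed with code {body.get('code')}: {body.get('message')}"
        )
    return Submission(
        token=body["token"],                      # used to poll for the result
        duplicate=bool(body.get("duplicate", False)),
        duplicate_token=body.get("duplicateToken"),
    )
```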

string url = "https://api.tabscanner.com/api/2/process";
string fileName = "path/to/imageFile/image.jpg";
string key = "yourapikey";

var client = new RestClient(url);
var request = new RestRequest(Method.POST);
request.AddFile("file", fileName, "image/jpeg");
request.AddHeader("apikey", key);

IRestResponse response = client.Execute(request);
var content = response.Content; // raw content as string

require 'rest_client'
require 'json'

API_KEY = 'yourapikey'

endpoint = 'https://api.tabscanner.com/api/2/process'

form = {
  :file => File.new('app/path/to/imageFile/image.jpg', 'rb')
}

headers = {apikey: API_KEY}

response = RestClient.post(endpoint,form,headers)

json = JSON.parse(response)

token = json["token"]

import requests
import json
from dotenv import load_dotenv
import os

# load your environment containing the api key
BASEDIR = os.path.abspath(os.path.dirname(__file__))
load_dotenv(os.path.join(BASEDIR, '.env'))
API_KEY = os.getenv("API_KEY")

def callProcess():

    endpoint = "https://api.tabscanner.com/api/2/process"
    receipt_image = "app/path/to/imageFile/image.jpg"

    payload = {"documentType":"receipt"}
    files = {'file': open(receipt_image, 'rb')}  # open in binary mode for upload
    headers = {'apikey':API_KEY}

    response = requests.post( endpoint,
                              files=files,
                              data=payload,
                              headers=headers)
    result = json.loads(response.text)

    return result

$url = 'https://api.tabscanner.com/api/2/process';

$cFile = curl_file_create('app/path/to/imageFile/image.jpg');
$post = array('file' => $cFile);

$apikey = 'yourapikey';
$headers = array(
                    "apikey:" . $apikey
                );

$cSession = curl_init();

curl_setopt($cSession, CURLOPT_URL, $url);
curl_setopt($cSession, CURLOPT_POST, 1);
curl_setopt($cSession, CURLOPT_POSTFIELDS, $post);
curl_setopt($cSession, CURLOPT_RETURNTRANSFER, true);
curl_setopt($cSession, CURLOPT_HEADER, false);
curl_setopt($cSession, CURLOPT_HTTPHEADER, $headers);

$result = curl_exec($cSession);

if (curl_errno($cSession)) {
	$result = curl_error($cSession);
}

curl_close($cSession);
$json = json_decode( $result );

$token =  $json->token;

String APIKEY = "yourapikey";
HttpResponse<JsonNode> jsonResponse = Unirest.post("https://api.tabscanner.com/api/2/process")
.header("accept", "application/json")
.header("apikey", APIKEY)
.field("file", new File("path/to/imageFile/image.jpg"))
.asJson();

'use strict';
// load your environment containing the secret API key
require('dotenv').config()

const API_KEY = process.env.API_KEY
const fs = require("fs");
const rp = require("request-promise");

async function callProcess(files, params) {

  let formData = {
    file: []
  }

  for (var i = 0; i < files.length; i++) {
    const file = files[i]
    formData.file.push({
      value: fs.createReadStream(file),
      options: {
        filename: file,
        contentType: 'image/jpeg'
      }
    })
  }

  formData = Object.assign({}, formData, params);

  const options = {
    method: 'POST',
    formData: formData,
    uri: `https://api.tabscanner.com/api/2/process`,
    headers: {
      'apikey': API_KEY
    }
  };

  const result = await rp(options)
  return JSON.parse(result)
}

(async () => {
  try {

    const imageFile = 'app/path/to/imageFile/image.jpg'
    let result = await callProcess([imageFile], {})
    // this token is used later to request the result
    const token = result.token
    console.log(token)

  } catch (e) {
    console.log(e)
  }
})();

package main

import (
  "net/http"
  "os"
  "bytes"
  "path/filepath"
  "mime/multipart"
  "io/ioutil"
  "io"
  "log"
  "github.com/buger/jsonparser"
)

func main() {

  filePath := "./app/path/to/imageFile/image.jpg"
  apikey := "yourapikey"

  file, _ := os.Open(filePath)
  defer file.Close()

  body := &bytes.Buffer{}
  writer := multipart.NewWriter(body)
  part, _ := writer.CreateFormFile("file", filepath.Base(file.Name()))
  io.Copy(part, file)
  writer.Close()

  r, _ := http.NewRequest("POST", "https://api.tabscanner.com/api/2/process", body)
  r.Header.Add("Content-Type", writer.FormDataContentType())
  r.Header.Add("apikey", apikey)
  client := &http.Client{}
  response, _ := client.Do(r)

  processBody, _ := ioutil.ReadAll(response.Body)

  token,_ := jsonparser.GetString(processBody, "token")
  log.Println(token)

}

result

The result endpoint returns the result of the processed document. It is a GET request. The path of the request should contain the token returned in the related process call.

ARGUMENTS

token REQUIRED	A string. The token returned by the process call.

ATTRIBUTES

status	The status for the request `done`, `pending` or `failed`.
status_code	An integer. The status code of the request
success	A boolean. If the request was successful or not.
code	An integer, the error code of the request.
result	An object containing the result data

Result Object

GENERAL DATA
establishment	A string. The establishment name detected on the receipt. This works by using a combination of machine learning and custom configurations. If you process a finite set of establishments, then accuracy can be dramatically improved via custom configurations.
date	A string. The purchase date and time in the format YYYY-MM-DD hh:mm:ss
dateISO	A string. The purchase date and time in ISO format YYYY-MM-DDThh:mm:ss
total	A float representing the total amount.
subTotal	A float representing the subtotal.
cash	A float representing the amount of cash paid.
change	A float representing the amount of change returned to the purchaser.
tax	A float representing the total amount of tax.
taxes	An array with a breakdown of all the tax amounts discovered.
serviceCharges	An array with a breakdown of all the service charges discovered.
tip	A float representing the total amount of the tip.
discount	A float representing the total discount applied to the receipt.
discounts	An array with a breakdown of all the discounts applied to the receipt.
rounding	A float representing an amount of rounding applied to the receipt.
address	A string containing the address of the establishment found on the receipt. This string is not normalized and contains all address info that was extracted.
addressNorm	The normalized version of the address attribute, broken down as follows: city, state, number, street, suburb, country, building, postcode.
url	A string representing the website address extracted from the receipt.
phoneNumber	A string representing the phone number extracted from the receipt. Phone number formats are not normalized and are extracted in the form found in the receipt.
paymentMethod	A string representing the payment method found in the receipt: VISA, Mastercard, American Express, Discover, ALIPAY, WE CHAT, CASH, Debit.
barcodes	An array of barcodes extracted from the receipt. Each item in the array is itself an array with 2 indexes. The first index represents the barcode data and the second index represents the barcode type. Supported types are: EAN-13/UPC-A, UPC-E, EAN-8, Code 128, Code 39, Interleaved 2 of 5, QR Code.
currency	A string. The detected currency extracted from the receipt. Currently supported currencies: USD, EUR, GBP, AED, CHF, AUD, HKD, JPY, KRW, RMB, BRL, CAD, ZAR.
expenseType BETA	A string representing the expense classification of the receipt. Supported types are: Transportation-Rideshare/Uber/Lyft/Taxi, Meals/Individual Meals while Traveling, Travel Expenses/Hotel.
customFields	Country CardLast4Digits
documentType	A string representing the type of document detected when auto is passed as the documentType parameter to the process endpoint. Supported values are: Receipt, Invoice.
LINE DATA
lineItems	An array of `LineItem` objects representing the products found in the receipt.
summaryItems	An array of `LineItem` objects representing lines that were not products, e.g. Total, Cash, Change etc.
CONFIDENCES
totalConfidence	A float value ranging from 0 to 1 representing how confident the system is that the total field is correct and is in fact the total.
subTotalConfidence	A float value ranging from 0 to 1 representing how confident the system is that the subTotal field is correct and is in fact the subtotal.
taxesConfidence	An array containing float values ranging from 0 to 1 representing how confident the system is that the taxes fields are correct and are in fact the taxes.
serviceChargeConfidences	An array containing float values ranging from 0 to 1 representing how confident the system is that the service charge fields are correct and are in fact service charges.
tipConfidence	A float value ranging from 0 to 1 representing how confident the system is that the tip field is correct and is in fact a tip.
discountConfidences	An array containing float values ranging from 0 to 1 representing how confident the system is that the discount fields are correct and are in fact the discounts.
cashConfidence	A float value ranging from 0 to 1 representing how confident the system is that the cash field is correct and is in fact cash.
changeConfidence	A float value ranging from 0 to 1 representing how confident the system is that the change field is correct and is in fact change.
roundingConfidence	A float value ranging from 0 to 1 representing how confident the system is that the rounding field is correct and is in fact rounding.
dateConfidence	A float value ranging from 0 to 1 representing how confident the system is that the date field is correct.
establishmentConfidence	A float value ranging from 0 to 1 representing how confident the system is that the establishment field is correct.
validatedEstablishment	A boolean indicating that the establishment has been cross-referenced with the phoneNumber or address on the receipt and confirmed in our database.
validatedTotal	A boolean indicating a very high confidence score for total. (0.99)
validatedSubTotal	A boolean indicating a very high confidence score for subtotal. (0.99)
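
One way an application might use the confidence scores is to flag low-confidence fields for manual review. The sketch below covers only the scalar confidence attributes listed above; the function name and the 0.8 threshold are assumptions chosen for illustration, not recommended values.

```python
# Maps each result field to its scalar confidence attribute.
CONFIDENCE_FIELDS = {
    "total": "totalConfidence",
    "subTotal": "subTotalConfidence",
    "tip": "tipConfidence",
    "cash": "cashConfidence",
    "change": "changeConfidence",
    "rounding": "roundingConfidence",
    "date": "dateConfidence",
    "establishment": "establishmentConfidence",
}


def fields_needing_review(result, threshold=0.8):
    """Return the names of fields whose confidence is below the threshold.

    result is the decoded result object. Fields whose confidence attribute
    is absent are skipped rather than flagged.
    """
    flagged = []
    for field, confidence_key in CONFIDENCE_FIELDS.items():
        confidence = result.get(confidence_key)
        if confidence is not None and confidence < threshold:
            flagged.append(field)
    return flagged
```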

LineItem Object

lineTotal	A float representing the total extracted from the line.
descClean	A string containing the consolidated and cleaned product description found in the line. This will include any supplemental descriptions found on lines adjacent to the lineTotal. It will also be cleaned of any price or discount information found in the description.
desc	A string containing the text found on the same line as the lineTotal.
qty	A float representing a quantity of a product found on the line. This will default to 0 rather than 1 and will only return 1 if it finds a 1.
price	A float representing the price extracted from the line.
unit	A float representing the unit extracted from the line.
productCode	A string representing a productCode found in the line.
symbols	An array of strings representing any symbols found in the line. Typically these are tax codes assigned to each line after the lineTotal.
supplementaryLineItems	In the event the system was unable to resolve text above and below lineTotals to a single line, a dictionary containing an array of text found above and below the line item will be available. Note: the system always attempts to resolve this automatically; however, in the event of a failure, this dictionary is returned as a fallback.
lineType	If the line item is a summary item, the system will attempt to classify the line item as one of the following: Total, SubTotal, Tax, TotalTax, Cash, Change, Tip, ServiceCharge.
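
As a simple consistency check, the lineTotal values in lineItems can be summed and compared with one of the receipt-level amounts. This is a hedged sketch: the helper names and the 0.01 tolerance are assumptions, and whether line totals should add up to subTotal or total depends on the receipt (for example, on whether tax is included per line).

```python
def line_items_sum(result):
    """Sum the lineTotal of every item in lineItems, rounded to 2 places."""
    items = result.get("lineItems") or []
    return round(sum((item.get("lineTotal") or 0.0) for item in items), 2)


def line_items_match(result, field="subTotal", tolerance=0.01):
    """Compare the summed line totals with a receipt-level amount.

    Returns None when the chosen field is missing from the result.
    """
    expected = result.get(field)
    if expected is None:
        return None
    return abs(line_items_sum(result) - expected) <= tolerance + 1e-9
```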

Invalid Images

Tabscanner does not attempt to detect whether an image is valid, but a number of fields can be used in conjunction to achieve this. For example, a total of zero with a confidence score of zero, combined with an empty date and an empty establishment field, would strongly indicate that the image was unreadable as a receipt. We leave the implementation of this up to the calling application, as use cases vary and no one algorithm will suffice.
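
The example heuristic above could be written roughly as follows. This is only one possible check, not a built-in feature: `looks_unreadable` is a hypothetical name, and treating missing fields the same as zero or empty values is an assumption.

```python
def looks_unreadable(result):
    """Heuristic: zero total, zero confidence, no date and no establishment.

    result is the decoded result object. Missing or None values are
    treated like zero or empty ones.
    """
    total_is_zero = not result.get("total")
    confidence_is_zero = not result.get("totalConfidence")
    no_date = not result.get("date")
    no_establishment = not result.get("establishment")
    return total_is_zero and confidence_is_zero and no_date and no_establishment
```

Applications with stricter or looser requirements can combine other fields and confidence scores in the same way.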

string token = "yourtoken";
string url = "https://api.tabscanner.com/api/result/" + token;
string key = "yourapikey";

var client = new RestClient(url);
var request = new RestRequest(Method.GET);
request.AddHeader("apikey", key);

IRestResponse response = client.Execute(request);
var content = response.Content; // raw content as string

require 'rest_client'
require 'json'

API_KEY = 'yourapikey'
token = 'yourtoken'
headers = {apikey: API_KEY}
endpoint = 'https://api.tabscanner.com/api/result/'

response = RestClient.get(endpoint + token, headers)
json = JSON.parse(response)

puts json

import requests
import json
from dotenv import load_dotenv
import os

# load your environment containing the api key
BASEDIR = os.path.abspath(os.path.dirname(__file__))
load_dotenv(os.path.join(BASEDIR, '.env'))
API_KEY = os.getenv("API_KEY")

def callResult(token):

    url = "https://api.tabscanner.com/api/result/{0}"
    endpoint = url.format(token)

    headers = {'apikey':API_KEY}

    response = requests.get(endpoint,headers=headers)
    result = json.loads(response.text)

    return result

$token = 'yourtoken';
$url = 'https://api.tabscanner.com/api/result/' . $token;

$apikey = 'yourapikey';
$headers = array(
                    "apikey:" . $apikey
                );

$cSession = curl_init();

curl_setopt($cSession, CURLOPT_URL, $url);
curl_setopt($cSession, CURLOPT_RETURNTRANSFER, true);
curl_setopt($cSession, CURLOPT_HEADER, false);
curl_setopt($cSession, CURLOPT_HTTPHEADER, $headers);

$result = curl_exec($cSession);
curl_close($cSession);

$json = json_decode( $result );

String APIKEY = "yourapikey";
String token = "yourtoken";
String endpoint = "https://api.tabscanner.com/api";
HttpResponse<JsonNode> jsonResponse = Unirest.get(endpoint + "/result/{token}")
.routeParam("token", token)
.header("apikey", APIKEY).asJson();

'use strict';

require('dotenv').config()

const API_KEY = process.env.API_KEY
const fs = require("fs");
const rp = require("request-promise");

async function callResult(token) {

  const options = {
    method: 'GET',
    uri: `https://api.tabscanner.com/api/result/${token}`,
    headers: {
      'apikey': API_KEY
    }
  };

  const result = await rp(options)
  return JSON.parse(result)

}
(async () => {
  try {

    // your token from the previous process call
    const token = 'yourtoken'

    let result = await callResult(token)
    console.log(result)


  } catch (e) {
    console.log(e)
  }
})();

endpoint := "https://api.tabscanner.com/api/result/" + token
rr, _ := http.NewRequest("GET", endpoint, nil)
rr.Header.Add("apikey", apikey)
clientr := &http.Client{}
resp, _ := clientr.Do(rr)

result, _ := ioutil.ReadAll(resp.Body)

establishment,_ := jsonparser.GetString(result, "result","establishment")

log.Println(establishment)

credit

The credit endpoint returns the number of credits left on the account.

It is a GET request.

It returns a single JSON number.

Simply call the /credit endpoint with a header named apikey containing the API key.
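
A credit check might look like the Python sketch below, using only the standard library. The function names are hypothetical, and the full URL of the credit endpoint is not given in this document, so it is passed in by the caller rather than hard-coded.

```python
import json
import urllib.request


def build_credit_request(api_key, url):
    """Build a GET request carrying the API key in the apikey header."""
    return urllib.request.Request(url, headers={"apikey": api_key}, method="GET")


def parse_credit(body):
    """Parse the response body, which should be a single JSON number."""
    value = json.loads(body)
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"expected a JSON number, got {value!r}")
    return value


def fetch_credit(api_key, url):
    """Call the credit endpoint at the given URL and return the credit count."""
    with urllib.request.urlopen(build_credit_request(api_key, url)) as response:
        return parse_credit(response.read().decode("utf-8"))
```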

Advanced Features

Advanced features may be available to you depending on the status of your account. They include features such as:

Custom Fields
Line Item Resolution
Currency Detection

Custom Fields

Custom fields can be enabled on your account where Tabscanner will implement special rules to extract specific data from a receipt or invoice. Some examples of custom fields we currently extract are:

Receipt Number
Merchant ID
VAT Number
Document Number
Club Card Number

Resolving Line Items

When capturing detailed line items from receipts, one major challenge arises when a single item spans multiple lines of the receipt. For example, the line total may be on a different line from the description, with the price and discount on yet another. When a receipt has many lines, it can become difficult to tell which description belongs to which line total or price.

Tabscanner will attempt to resolve these relationships automatically, but in some cases custom training of the system may be required to achieve very accurate results.

Improving Results

In general, there are 3 ways to improve the results returned by the API:

Improve image quality
Improve photography
Custom training and configuration of the system

Image Guidance

Our technology has been engineered to handle many different image anomalies. However, following the guidance below will give your images the highest possible accuracy.

General Photography

Things to Avoid

Technical Support

If you have any technical questions not answered by this document or need technical support, please email your issue to support@tabscanner.com.