Compare commits

2 Commits

Commit 59649a6b83
Author: Louis Baudoux
Date: 2025-03-03 17:20:50 +01:00

[IMP] extract_api: few minor improvements/corrections
- Add links to the IAP documentation.
- Remove BMP from supported file format (it never was supported ?).
- Rewording of some descriptions in the `/parse` documentation.
- More consistent abbreviations for "IAP" and "OCR".

Commit eea076745e
Author: Louis Baudoux
Date: 2025-03-03 15:46:17 +01:00

[IMP] extract_api: improve implementation example
- Since the example only works for invoices, references to the other
  document types supported by the OCR have been removed.
- Handle the case where library `requests` isn't available.
- Show additional fields detected by the OCR.
- Properly set the ID of the JSON-RPC request.
- Commit 8c93ff7 should have adapted the implementation example with the
  latest API version.

2 changed files with 38 additions and 32 deletions

@@ -2,16 +2,20 @@
 Extract API
 ===========
+.. |IAP| replace:: :abbr:`IAP (In-app purchases)`
+.. |OCR| replace:: :abbr:`OCR (Optical Character Recognition)`
 Odoo provides a service to automate the processing of documents of type **invoices**, **bank statements**,
 **expenses** or **resumes**.
-The service scans documents using an :abbr:`OCR (Optical Character Recognition)` engine and then
+The service scans documents using an |OCR| engine and then
 uses :abbr:`AI(Artificial Intelligence)`-based algorithms to extract fields of interest such as the
 total, due date, or invoice lines for *invoices*, the initial and final balances, the date for
 *bank statements*, the total, date for *expenses*, or the name, email, phone number for *resumes*.
-This service is a paid service. Each document processing will cost you one credit.
-Credits can be bought on `iap.odoo.com <https://iap.odoo.com/iap/in-app-services/259?sortby=date>`_.
+This service is a paid service. Each document processing will cost you one credit from your
+document digitization |IAP| account. More information about |IAP| accounts can be found
+:doc:`here </applications/essentials/in_app_purchase>`.
 You can either use this service directly in the Accounting, Expense, or Recruitment App or through
 the API. The Extract API, which is detailed in the next section, allows you to integrate our
@@ -59,8 +63,8 @@ testing is provided in the
 Parse
 =====
-Request the processing of a document from the OCR. The route will return a `document_token`,
-you can use it to obtain the result of your request.
+Request the digitization of a document. The route will return a `document_token` that you can use
+to fetch the result of your request.
 .. _extract_api/parse:
@@ -87,17 +91,15 @@ Request
 .. rst-class:: o-definition-list
 ``account_token`` (required)
-The token of the account from which credits will be taken. Each successful call costs one
-token.
+The token of the :doc:`IAP </applications/essentials/in_app_purchase>` account from which
+credits will be charged. Each successful call costs one credit.
 ``version`` (required)
 The version will determine the format of your requests and the format of the server response.
 You should use the :ref:`latest version available <extract_api/version>`.
 ``documents`` (required)
-The document must be provided as a string in the ASCII encoding. The list should contain
-only one string. If multiple strings are provided only the first string corresponding to a
-pdf will be processed. If no pdf is found, the first string will be processed. This field
-is a list only for legacy reasons. The supported extensions are *pdf*, *png*, *jpg* and
-*bmp*.
+The document must be provided as a Base64 string in the ASCII encoding.
+The list should contain only one document. This field is a list only for legacy reasons.
+The supported formats are *pdf*, *png* and *jpg*.
 ``dbuuid`` (optional)
 Unique identifier of the Odoo database.
 ``webhook_url`` (optional)
@@ -238,7 +240,7 @@ Request
 ``document_token`` (required)
 The ``document_token`` for which you want to get the current parsing status.
 ``account_token`` (required)
-The token of the account that was used to submit the document.
+The token of the |IAP| account that was used to submit the document.
 .. code-block:: js
@@ -279,7 +281,7 @@ are the name of the field and the value is the value of the field.
 .. rst-class:: o-definition-list
 ``full_text_annotation``
-Contains the unprocessed full result from the OCR for the document
+Contains the unprocessed full result from the |OCR| for the document.
 ================================ =============================================================
 status status_msg
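
Note on the revised ``documents`` description: the following is a minimal sketch, not part of the patch, of how a file's bytes could be Base64-encoded into an ASCII string and wrapped in a JSON-RPC payload with a unique ID. The names `build_parse_payload` and `EXTRACT_API_VERSION` are hypothetical; the value 123 is simply the version used in the updated implementation example and may not be the latest one.

```python
import base64
import uuid

# Assumption: 123 is the version used in the updated example of this diff.
EXTRACT_API_VERSION = 123


def build_parse_payload(document_bytes: bytes, account_token: str) -> dict:
    """Build a JSON-RPC payload for a /parse call (hypothetical helper)."""
    # Base64 output only contains ASCII characters, so decoding as ASCII is safe.
    encoded_doc = base64.b64encode(document_bytes).decode("ascii")
    return {
        "jsonrpc": "2.0",
        "method": "call",
        "params": {
            "account_token": account_token,
            "version": EXTRACT_API_VERSION,
            # The field is a list for legacy reasons; send exactly one document.
            "documents": [encoded_doc],
        },
        # A fresh identifier for each request instead of a constant 0.
        "id": uuid.uuid4().hex,
    }
```

The resulting dictionary could then be passed as the `json` argument of `requests.post`, as `extract_jsonrpc_call` does in the implementation example.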

@@ -2,27 +2,25 @@ import base64
 import json
 import sys
 import time
+import uuid
-import requests
+try:
+    import requests
+except ImportError:
+    print("The 'requests' library is required to run this script. More information at https://pypi.org/project/requests.")
+    exit()
 account_token = "integration_token" # Use your token
 domain_name = "https://extract.api.odoo.com"
 path_to_pdf = "/path/to/your/pdf"
-doc_type = "invoice" # invoice, expense or applicant
-# Do not change
-API_VERSION = {
-    'invoice': 122,
-    'expense': 132,
-    'applicant': 102,
-}
 def extract_jsonrpc_call(path: str, params: dict):
     payload = {
         'jsonrpc': '2.0',
         'method': 'call',
         'params': params,
-        'id': 0, # This should be unique for each call
+        'id': uuid.uuid4().hex, # This should be unique for each call
     }
     response = requests.post(domain_name + path, json=payload, timeout=10)
     response.raise_for_status()
@@ -35,20 +33,20 @@ def send_document_to_extract(doc_path: str):
         encoded_doc = base64.b64encode(f.read()).decode()
     params = {
         'account_token': account_token,
-        'version': API_VERSION[doc_type],
+        'version': 123,
         'documents': [encoded_doc],
     }
-    response = extract_jsonrpc_call(f"/api/extract/{doc_type}/2/parse", params)
+    response = extract_jsonrpc_call(f"/api/extract/invoice/2/parse", params)
     return response
 def get_result_from_extract(document_token: str):
     params = {
-        'version': API_VERSION[doc_type],
+        'version': 123,
         'document_token': document_token,
         'account_token': account_token,
     }
-    endpoint = f"/api/extract/{doc_type}/2/get_result"
+    endpoint = f"/api/extract/invoice/2/get_result"
     response = extract_jsonrpc_call(endpoint, params)
     while response['result']['status'] == 'processing':
         print("Still processing... Retrying in 5 seconds")
@@ -83,8 +81,14 @@ if __name__ == '__main__':
     document_results = response['result']['results'][0]
-    print("\nTotal:", document_results['total']['selected_value']['content'])
-    print("Subtotal:", document_results['subtotal']['selected_value']['content'])
-    print("Invoice id:", document_results['invoice_id']['selected_value']['content'])
-    print("Date:", document_results['date']['selected_value']['content'])
-    print("...\n")
+    def _get_selected_value(field):
+        return document_results.get(field, {}).get('selected_value', {}).get('content', '')
+    print("\nTotal:", _get_selected_value('total'))
+    print("Subtotal:", _get_selected_value('subtotal'))
+    print("Reference:", _get_selected_value('invoice_id'))
+    print("Date:", _get_selected_value('date'))
+    print("Due date:", _get_selected_value('due_date'))
+    print("Currency:", _get_selected_value('currency'))
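
Note on the new lookup helper: the chained `.get()` calls replace direct indexing so that a field the OCR did not detect yields an empty string instead of raising `KeyError`. The sketch below reproduces that behavior outside the script. The standalone `get_selected_value` function and the `sample_results` data are hypothetical and only illustrate the shape of the response assumed by the example.

```python
def get_selected_value(results: dict, field: str):
    """Return the selected content of a field, or '' when any level is missing."""
    return results.get(field, {}).get("selected_value", {}).get("content", "")


# Hypothetical, hand-written data shaped like the response used in the example.
sample_results = {
    "total": {"selected_value": {"content": 330.35}},
    "currency": {"selected_value": {"content": "EUR"}},
    "date": {},  # field present, but nothing was selected
}

print("Total:", get_selected_value(sample_results, "total"))        # 330.35
print("Currency:", get_selected_value(sample_results, "currency"))  # EUR
print("Date:", repr(get_selected_value(sample_results, "date")))    # ''
print("Due date:", repr(get_selected_value(sample_results, "due_date")))  # ''
```

One caveat under this assumption: the default values only apply when a key is absent. If the service were to return an explicit `null` for a field or for its `selected_value`, the chained call would still fail with `AttributeError`, so a stricter version might need an explicit `None` check.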