I'm working on customizing the invoice template for the invoice extraction process, but I'm unable to extract the expected fields from the attached PDF using the custom template.
issuer: Impex
fields:
  invoice_number: "PROFORMA INVOICE NUMBER AND DATE\\s+PAGE NUMBER\\s+([A-Za-z0-9]+)"
  date: "26/12/2024"
  amount: "1,404.00"
  sales_order_number: "SALES ORDER NUMBER AND DATE\\s+([A-Za-z0-9]+)"
  sales_order_date: "SALES ORDER NUMBER AND DATE\\s+[A-Za-z0-9]+\\s*(\\d{2}/\\d{2}/\\d{4})"
  port_of_loading: "PORT OF LOADING\\s+(.*?)\\s+COUNTRY OF ORIGIN"
  port_of_discharge: "PORT OF DISCHARGE\\s+(.*?)\\s+BUYER"
  country_of_origin: "COUNTRY OF ORIGIN\\s+([A-Za-z]+)"
  country_of_final_destination: "COUNTRY OF FINAL DESTINATION\\s+([A-Za-z]+)"
  buyer_name: "BUYER\\s+(.*?)\\s+Email"
  buyer_email: "BUYER\\s+.*?Email:\\s*(\\S+)"
  seller_name: "SELLER\\s+(.*?)\\s+ABN"
  seller_email: "SELLER\\s+.*?Email:\\s*(\\S+)"
  seller_abn: "ABN:\\s*([0-9]+)"
  amount: "TOTAL\\s+.*?AUD\\s*([\\d,]+\\.\\d{2})"
  tolerance: "TOLERANCE\\s*:\\s*(.*?)\\s*(?=PAYMENT TERMS)"
  payment_terms: "PAYMENT TERMS\\s*:\\s*(.*?)\\s*(?=SPECIFICATION)"
  specification: "SPECIFICATION\\s*:\\s*(.*?)\\s*(?=WE CONFIRM)"
  origin_confirmation: "WE CONFIRM GOODS ARE OF AUSTRALIAN ORIGIN"
  signatory: "Signed for and on behalf of AGROMIN AUSTRALIA PTY LTD"
tables:
  - start: "DESCRIPTION OF GOODS"
    end: "TOTAL"
    body: "1\\s+This product.*?\\s+(?P<qty>\\d+\\.\\d+)\\s+(?P<description>This is a text)\\s+(?P<hs_code>57021000)\\s+(?P<unit_price>234\\.00)\\s+(?P<line_total>702\\.00)"
options:
  currency: AUD
  decimal_separator: "."
keywords:
  - "PROFORMA INVOICE"
  - "Impex"
  - "AUSTRALIA"
  - "INDIA"
Current Output
{ 'amount': 1404.0,
'country_of_final_destination': 'NOIDA',
'country_of_origin': 'COUNTRY',
'currency': 'AUD',
'date': datetime.datetime(2024, 12, 26, 0, 0),
'desc': 'Invoice from Impex',
'invoice_number': 'Impex',
'issuer': 'Impex',
'origin_confirmation': 'WE CONFIRM GOODS ARE OF AUSTRALIAN ORIGIN',
'payment_terms': 'This product belongs to impex docs',
'sales_order_number': 'Noida',
'seller_abn': '4589',
'signatory': 'Signed for and on behalf of AGROMIN AUSTRALIA PTY LTD',
'specification': 'This product belongs to impex docs',
'tolerance': 'This product belongs to impex docs'}
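A note on the output above: a value such as 'COUNTRY' for country_of_origin suggests that, in the extracted text, the labels sit side by side on one row and the values follow on the next row, so the capture group picks up the next label instead of the value. This can be checked outside the library with Python's re module. The sketch below is only an illustration: sample_text is a made-up layout, not the content of the attached PDF, and next_row_pattern is one possible workaround under that layout assumption.

```python
import re

# Hypothetical extract (NOT the real PDF text): two labels share one row,
# and their values are on the following row.
sample_text = (
    "COUNTRY OF ORIGIN        COUNTRY OF FINAL DESTINATION\n"
    "AUSTRALIA                INDIA\n"
)

# Pattern from the template, written with single backslashes,
# which is how the regex engine sees it after YAML unescaping.
template_pattern = r"COUNTRY OF ORIGIN\s+([A-Za-z]+)"

# Assumed alternative: skip the rest of the label row, then capture
# the first word on the next row.
next_row_pattern = r"COUNTRY OF ORIGIN[^\n]*\n\s*([A-Za-z]+)"


def first_group(pattern, text):
    """Return the first capture group of the first match, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


captured_by_template = first_group(template_pattern, sample_text)
captured_by_next_row = first_group(next_row_pattern, sample_text)

print(captured_by_template)  # the next label, not the value
print(captured_by_next_row)  # the value on the following row
```

Running each field regex this way against the actual extracted text of the PDF should show which patterns need to account for the row layout.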
Thanks so much for creating this fantastic library!
I'm currently struggling with creating a custom template as described above. Despite my efforts, I'm not able to achieve the expected output from the invoice extraction process. Would you mind guiding me on how to adjust the template to correctly extract the fields I need?
You may have more luck here using the lines parser:
lines:
  - start: "DESCRIPTION OF GOODS"
    end: "TOTAL"
    body: "1\\s+This product.*?\\s+(?P<qty>\\d+\\.\\d+)\\s+(?P<description>This is a text)\\s+(?P<hs_code>57021000)\\s+(?P<unit_price>234\\.00)\\s+(?P<line_total>702\\.00)"
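Before putting the row pattern into a template, it may help to test it on its own. The sketch below uses only Python's re module. The sample_row string is a hypothetical line item invented for illustration, since the real extracted text of the PDF is not shown in this thread. Note that the pattern hard-codes the description, HS code, and prices, so it can only ever match this one specific row.

```python
import re

# Row pattern from the template, with single backslashes as the
# regex engine sees it after YAML unescaping.
line_pattern = re.compile(
    r"1\s+This product.*?\s+(?P<qty>\d+\.\d+)\s+(?P<description>This is a text)"
    r"\s+(?P<hs_code>57021000)\s+(?P<unit_price>234\.00)\s+(?P<line_total>702\.00)"
)

# Hypothetical line item (NOT taken from the real PDF).
sample_row = (
    "1   This product belongs to impex docs   3.00   This is a text"
    "   57021000   234.00   702.00"
)

# groupdict() returns the named groups as a dict, which mirrors the
# per-line fields a line-based parser would produce.
match = line_pattern.search(sample_row)
row = match.groupdict() if match else {}
print(row)
```

If the dictionary comes back empty on the real text, the row layout differs from what the pattern expects, and loosening the literals (for example `\d+` for the HS code) is a reasonable next step.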
Input PDF
sample.pdf