Unable to Convert Custom Invoice Template #603

Open
rupesh881 opened this issue Jan 26, 2025 · 2 comments

Comments

@rupesh881

I'm working on customizing the invoice template for the invoice extraction process, but I'm unable to extract the expected fields from the attached PDF using the custom template.

Input PDF

sample.pdf

Expected output, something like:

{
	"fields": {
		"invoice_number": "AUTOSAL4406",
		"proforma_invoice_date": "26/12/2024",
		"order_number": "AUTOSAL4406",
		"order_date": "26/12/2024",
		"seller_details": {
			"name": "Impex",
			"address": "Noida, Uttar Pradesh, INDIA",
			"ABN": "4589",
			"email": "[email protected]"
		},
		"buyer_details": {
			"name": "Tester Edit",
			"email": "[email protected]",
			"address": "ABU DHABI"
		},
		"country_of_origin": "NOIDA, INDIA",
		"country_of_final_destination": "AUSTRALIA",
		"port_of_loading": "Noida, Uttar Pradesh",
		"port_of_discharge": "BRISBANE",
		"items": [
			{
				"container_number": "1",
				"packing_qty": "3.00",
				"description": "This is a text",
				"HS_code": "57021000",
				"unit_price": "234.00",
				"total_price": "702.00"
			},
			{
				"container_number": "1",
				"packing_qty": "3.00",
				"description": "This is a text",
				"HS_code": "57021000",
				"unit_price": "234.00",
				"total_price": "702.00"
			}
		],
		"total_price_AUD": "1,404.00",
		"tolerance": "This product belongs to impex docs"
	}
}
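
A note on the shape of this output: the template fields further down are flat keys (seller_name, seller_email, and so on), so the nested seller_details / buyer_details / items layout would most likely have to be assembled in a post-processing step. The following is only a minimal sketch of that step, assuming the extraction result is a flat dict and that line items arrive as a list under a "lines" key. The function name nest_fields and the sample dict are hypothetical, not part of the library.

```python
def nest_fields(flat):
    """Regroup a flat result dict into the nested layout shown in the issue.

    Assumption: `flat` uses the key names from the template below
    (seller_name, buyer_email, ...) and an optional "lines" list.
    """
    def pick(prefix, keys):
        # Collect e.g. seller_name and seller_abn into {"name": ..., "abn": ...},
        # skipping any key that was not extracted.
        return {k: flat[f"{prefix}_{k}"] for k in keys if f"{prefix}_{k}" in flat}

    return {
        "fields": {
            "invoice_number": flat.get("invoice_number"),
            "order_number": flat.get("sales_order_number"),
            "seller_details": pick("seller", ["name", "address", "abn", "email"]),
            "buyer_details": pick("buyer", ["name", "email", "address"]),
            "items": list(flat.get("lines", [])),
        }
    }


# Hypothetical flat result, using values from the expected output above.
flat = {
    "invoice_number": "AUTOSAL4406",
    "sales_order_number": "AUTOSAL4406",
    "seller_name": "Impex",
    "seller_abn": "4589",
    "buyer_name": "Tester Edit",
    "lines": [{"description": "This is a text", "unit_price": "234.00"}],
}
nested = nest_fields(flat)
```

Fields that were not extracted are simply left out of the nested groups, which makes gaps in the template easy to spot.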

My Current Template is

issuer: Impex
fields:
  invoice_number: "PROFORMA INVOICE NUMBER AND DATE\\s+PAGE NUMBER\\s+([A-Za-z0-9]+)"
  date: "26/12/2024"  
  amount: "1,404.00" 
  sales_order_number: "SALES ORDER NUMBER AND DATE\\s+([A-Za-z0-9]+)"
  sales_order_date: "SALES ORDER NUMBER AND DATE\\s+[A-Za-z0-9]+\\s*(\\d{2}/\\d{2}/\\d{4})"
  port_of_loading: "PORT OF LOADING\\s+(.*?)\\s+COUNTRY OF ORIGIN"
  port_of_discharge: "PORT OF DISCHARGE\\s+(.*?)\\s+BUYER"
  country_of_origin: "COUNTRY OF ORIGIN\\s+([A-Za-z]+)"
  country_of_final_destination: "COUNTRY OF FINAL DESTINATION\\s+([A-Za-z]+)"
  buyer_name: "BUYER\\s+(.*?)\\s+Email"
  buyer_email: "BUYER\\s+.*?Email:\\s*(\\S+)"
  seller_name: "SELLER\\s+(.*?)\\s+ABN"
  seller_email: "SELLER\\s+.*?Email:\\s*(\\S+)"
  seller_abn: "ABN:\\s*([0-9]+)"
  amount: "TOTAL\\s+.*?AUD\\s*([\\d,]+\\.\\d{2})"
  tolerance: "TOLERANCE\\s*:\\s*(.*?)\\s*(?=PAYMENT TERMS)"
  payment_terms: "PAYMENT TERMS\\s*:\\s*(.*?)\\s*(?=SPECIFICATION)"
  specification: "SPECIFICATION\\s*:\\s*(.*?)\\s*(?=WE CONFIRM)"
  origin_confirmation: "WE CONFIRM GOODS ARE OF AUSTRALIAN ORIGIN"
  signatory: "Signed for and on behalf of AGROMIN AUSTRALIA PTY LTD"
tables:
  - start: "DESCRIPTION OF GOODS"
    end: "TOTAL"
    body: "1\\s+This product.*?\\s+(?P<qty>\\d+\\.\\d+)\\s+(?P<description>This is a text)\\s+(?P<hs_code>57021000)\\s+(?P<unit_price>234\\.00)\\s+(?P<line_total>702\\.00)"
options:
  currency: AUD
  decimal_separator: "."
keywords:
  - "PROFORMA INVOICE"
  - "Impex"
  - "AUSTRALIA"
  - "INDIA"

Current Output

 { 'amount': 1404.0,
  'country_of_final_destination': 'NOIDA',
  'country_of_origin': 'COUNTRY',
  'currency': 'AUD',
  'date': datetime.datetime(2024, 12, 26, 0, 0),
  'desc': 'Invoice from Impex',
  'invoice_number': 'Impex',
  'issuer': 'Impex',
  'origin_confirmation': 'WE CONFIRM GOODS ARE OF AUSTRALIAN ORIGIN',
  'payment_terms': 'This product belongs to impex docs',
  'sales_order_number': 'Noida',
  'seller_abn': '4589',
  'signatory': 'Signed for and on behalf of AGROMIN AUSTRALIA PTY LTD',
  'specification': 'This product belongs to impex docs',
  'tolerance': 'This product belongs to impex docs'}
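
The wrong values above ('COUNTRY' for country_of_origin, 'NOIDA' for country_of_final_destination) suggest that the extracted text puts the labels side by side on one row and the values on the next row, so "label, whitespace, value" patterns capture the neighbouring label or the wrong column. The sketch below reproduces this with plain Python `re`. The sample text is an assumed layout, not the real text of sample.pdf, and the row-aware pattern is just one possible way to handle it.

```python
import re

# Assumed layout: labels on one row, values on the next.
# The real extracted text of sample.pdf may differ.
text = (
    "COUNTRY OF ORIGIN          COUNTRY OF FINAL DESTINATION\n"
    "NOIDA, INDIA               AUSTRALIA\n"
)

# Patterns as written in the template (YAML "\\s" is the regex \s).
origin = re.search(r"COUNTRY OF ORIGIN\s+([A-Za-z]+)", text)
destination = re.search(r"COUNTRY OF FINAL DESTINATION\s+([A-Za-z]+)", text)

# Row-aware alternative: match the whole label row, then split the value
# row on a run of two or more whitespace characters.
row = re.search(
    r"COUNTRY OF ORIGIN\s+COUNTRY OF FINAL DESTINATION\s*\n\s*(.+?)\s{2,}(\S.*)",
    text,
)

print(origin.group(1))       # captures the next label word
print(destination.group(1))  # captures the first value on the next row
print(row.groups())          # both values, in column order
```

Running the template's patterns against the actual extracted text in the same way (for example, text dumped with the same PDF-to-text backend) should show which labels share a row and need this treatment.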
@rupesh881
Author

Hey @bosd / @m3nu / @alexis-via,

Thanks so much for creating this fantastic library!

I'm currently struggling with creating a custom template as described above. Despite my efforts, I'm not able to achieve the expected output from the invoice extraction process. Would you mind guiding me on how to adjust the template to correctly extract the fields I need?

Your help would be greatly appreciated!

Thanks in advance!

@bosd
Collaborator

bosd commented Jan 27, 2025

You may have more luck here using the lines parser:

lines:
  - start: "DESCRIPTION OF GOODS"
    end: "TOTAL"
    body: "1\\s+This product.*?\\s+(?P<qty>\\d+\\.\\d+)\\s+(?P<description>This is a text)\\s+(?P<hs_code>57021000)\\s+(?P<unit_price>234\\.00)\\s+(?P<line_total>702\\.00)"

(Quick reply from phone, untested.)
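
To illustrate the idea behind a line-based parser: cut the text between a start and an end marker, then apply one regex with named groups to each row. The sketch below emulates that with plain Python `re`. It is not invoice2data's implementation, and the sample rows, the column order, and the group names are assumptions based on the expected output in the issue. Note also that, as far as the invoice2data tutorial describes it, the lines plugin reads its per-row regex from a `line` key, while `body` belongs to the tables plugin, so the key name is worth checking against the docs.

```python
import re

# Assumed text of the goods table; the real extracted rows may differ.
text = """DESCRIPTION OF GOODS
1   3.00   This is a text   57021000   234.00   702.00
1   3.00   This is a text   57021000   234.00   702.00
TOTAL   AUD 1,404.00
"""

START = re.compile(r"DESCRIPTION OF GOODS")
END = re.compile(r"TOTAL")
# Generic row pattern: no hard-coded values, one named group per column.
LINE = re.compile(
    r"^\s*(?P<container_number>\d+)\s+(?P<packing_qty>\d+\.\d+)\s+"
    r"(?P<description>.+?)\s+(?P<hs_code>\d{8})\s+"
    r"(?P<unit_price>[\d,]+\.\d{2})\s+(?P<total_price>[\d,]+\.\d{2})\s*$"
)


def extract_lines(text):
    """Return one dict per row found between the START and END markers."""
    start = START.search(text)
    if start is None:
        return []
    end = END.search(text, start.end())
    if end is None:
        return []
    block = text[start.end():end.start()]
    items = []
    for raw in block.splitlines():
        match = LINE.match(raw)
        if match:  # rows that do not fit the pattern are skipped
            items.append(match.groupdict())
    return items


items = extract_lines(text)
```

Matching on column shapes (digits, eight-digit HS code, two-decimal amounts) rather than on literal values such as 57021000 or 234.00 keeps the pattern usable for other invoices from the same issuer.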
