This repository has been archived by the owner on Jan 6, 2025. It is now read-only.

Bad encoding of Hindi Text #503

Open
nikkiBot opened this issue Feb 7, 2024 · 0 comments

Comments


nikkiBot commented Feb 7, 2024

  • I have a PDF from which I want to extract the tables. Camelot has worked well on most of the PDFs I've used it with before, but this time the extracted cells contain Latin-alphabet gibberish instead of Hindi text.
  • All dependencies are installed correctly, so I don't think that's the cause. Here is my code:
    import camelot
    import pandas as pd

    pdf = "./Pradhanjee.pdf"
    # Extract every table on every page with the lattice (ruled-lines) parser
    tables = camelot.read_pdf(pdf, pages="all", flavor="lattice")
    # Stack the per-table DataFrames vertically into a single sheet
    new_df = pd.concat([t.df for t in tables], axis=0)
    # `title` is set earlier in the script (not shown)
    new_df.to_excel(f"{title}.xlsx", index=False)
    new_df
    I'm not sure why this is happening. Any help would be appreciated :')
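A quick way to narrow this down (a suggested diagnostic, not something confirmed in this thread) is to check whether the extracted cells contain any Devanagari code points (U+0900 to U+097F) at all. If they contain none, one likely explanation is that the PDF uses a legacy, non-Unicode Hindi font (Kruti Dev is a common example, mapping Devanagari glyphs onto Latin code points) or a font without a Unicode mapping, so the text layer itself stores Latin characters and any extractor would return gibberish. The helper names `devanagari_ratio` and `looks_like_hindi` and the 0.5 threshold are hypothetical choices for illustration.

```python
from collections.abc import Iterable

# Devanagari Unicode block, U+0900..U+097F inclusive
DEVANAGARI = range(0x0900, 0x0980)


def devanagari_ratio(text: str) -> float:
    """Fraction of letter-like characters in `text` that are Devanagari.

    Devanagari vowel signs and the virama are combining marks, for which
    str.isalpha() is False, so anything in the Devanagari block is counted
    explicitly as a letter.
    """
    devanagari = sum(1 for ch in text if ord(ch) in DEVANAGARI)
    letters = sum(1 for ch in text if ch.isalpha() or ord(ch) in DEVANAGARI)
    # Avoid division by zero for empty or purely numeric/punctuation text
    return devanagari / letters if letters else 0.0


def looks_like_hindi(cells: Iterable[object], threshold: float = 0.5) -> bool:
    """Hypothetical check: True if most letters across `cells` are Devanagari."""
    text = " ".join(str(cell) for cell in cells)
    return devanagari_ratio(text) >= threshold
```

For a `Table` object `t` returned by `camelot.read_pdf`, `looks_like_hindi(t.df.to_numpy().ravel())` would report whether that table came through as Devanagari. If it returns `False`, comparing against the raw text from pdfminer.six (the PDF parser Camelot builds on), for example `pdfminer.high_level.extract_text(pdf, page_numbers=[0])`, should show whether the garbling is already present in the PDF's text layer rather than introduced by Camelot.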