Some jobs report UNSUPPORTED_FILE_TYPE Exception but Status is Good According to API #576

Open
adreichert opened this issue Jan 3, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@adreichert
Contributor

Describe the bug
I have a directory containing several hundred HTML files. When I parse them with the load_data method in the Python llama_parse package, the code fails with the UNSUPPORTED_FILE_TYPE exception.

  • Each time, it happens with different files
  • If I check the status in the web UI within a few seconds, I see "failed" and the error. But if I refresh the page, it lists success. The results seem correct.
  • One can also see successful job status using the API
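import os
import llama_parse

lp = llama_parse.LlamaParse(
    api_key=os.environ["LLAMA_CLOUD_API_KEY"],  # assumed: key read from the same environment variable as the curl call
    result_type='markdown',
    verbose=False,
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name='openai-gpt4o',
    [... other options..]
)
lp.load_data(files)  # files is a list of paths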


File ".../llama_parse/base.py", line 654, in _get_job_result
    raise Exception(exception_str)
Exception: Job ID: 3e3b9c18-8059-4c4d-bf84-a138d73e2208 failed with status: ERROR, Error code: UNSUPPORTED_FILE_TYPE, Error message: Unsupported file type.
[...]
% curl -X 'GET' \
  'https://api.cloud.llamaindex.ai/api/parsing/job/3e3b9c18-8059-4c4d-bf84-a138d73e2208' \
  -H 'accept: application/json' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY"
{"id":"3e3b9c18-8059-4c4d-bf84-a138d73e2208","status":"SUCCESS"}%      
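The same status check can be scripted. The following is a minimal, unofficial sketch that re-polls the job endpoint from the curl command a few times before treating an ERROR status as final. It assumes, based only on the behaviour described in this issue and not on any documented guarantee, that a transient ERROR may turn into SUCCESS shortly afterwards. The helper names `fetch_status` and `confirm_status`, the attempt count, and the delay are illustrative and are not part of llama_parse.

```python
import json
import os
import time
import urllib.request

# Endpoint taken from the curl command in this issue.
BASE_URL = "https://api.cloud.llamaindex.ai/api/parsing/job"


def fetch_status(job_id, api_key=None):
    """GET the job record and return its "status" field."""
    api_key = api_key or os.environ["LLAMA_CLOUD_API_KEY"]
    req = urllib.request.Request(
        f"{BASE_URL}/{job_id}",
        headers={
            "accept": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["status"]


def confirm_status(job_id, fetch=fetch_status, attempts=5, delay=2.0):
    """Re-poll until the job reports SUCCESS or the attempts run out.

    Returns the last status seen. `fetch` is injectable so the polling
    logic can be exercised without network access.
    """
    status = None
    for attempt in range(attempts):
        status = fetch(job_id)
        if status == "SUCCESS":
            break
        if attempt < attempts - 1:
            time.sleep(delay)  # hypothetical delay; tune as needed
    return status
```

Calling `confirm_status("3e3b9c18-8059-4c4d-bf84-a138d73e2208")` with `LLAMA_CLOUD_API_KEY` set would poll the job from this report; whether a few seconds is long enough for the status to settle is an open question here.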

Job ID
3e3b9c18-8059-4c4d-bf84-a138d73e2208

Client:
Please remove untested options:

  • Python Library
@adreichert adreichert added the bug Something isn't working label Jan 3, 2025
@adreichert
Contributor Author

One more thing. I parsed these files a few days ago without issue. I don't believe the contents have changed, but it is possible.

@adreichert
Contributor Author

adreichert commented Jan 4, 2025

The workaround was to keep retrying while relying on cached results. I ran it a few more times. Each time, a different file would cause the error and then be marked as successful. After this, the process completed.
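A rough sketch of that retry loop, for anyone who wants to automate it: files that raise are collected and retried in later rounds until everything succeeds or the rounds run out. `load_with_retries` and its parameters are hypothetical names for illustration. `load` can be any callable that takes one path, for example `lambda p: lp.load_data(p)`, assuming `load_data` accepts a single path as well as a list. The broad `except Exception` mirrors the bare `Exception` raised in the traceback above.

```python
import time


def load_with_retries(load, files, max_rounds=5, delay=1.0):
    """Call load(path) for each file, retrying failures in later rounds.

    Returns (results, errors): results maps path -> whatever load returned,
    errors maps each path that still failed in the last round to its exception.
    """
    results = {}
    errors = {}
    pending = list(files)
    for round_no in range(max_rounds):
        if not pending:
            break
        if round_no > 0:
            time.sleep(delay)  # brief pause before retrying the failures
        errors = {}
        for path in pending:
            try:
                results[path] = load(path)
            except Exception as exc:  # llama_parse raises a bare Exception here
                errors[path] = exc
        pending = list(errors)  # only the failed paths are retried
    return results, errors
```

Files that already succeeded are not re-submitted, so this does not depend on cached results the way re-running the whole batch does.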

@edanweis

Any updates on this? I'm getting the same problem intermittently via the REST API.
