IndexError: list index out of range running sample code "To get XBRL data" #20

Open
compusaurusrex opened this issue Jan 2, 2021 · 7 comments

Comments

@compusaurusrex

compusaurusrex commented Jan 2, 2021

Hello Joey!

Thanks so much for working on this project; it is exactly what I hoped to find.

I have been working with the package and the sample code, and I am getting an IndexError when running your sample code from the section "To get XBRL data".

Specifically, "results" is a list with no elements after running the following line:

results = company.get_data_files_from_10K("EX-101.INS", isxml=True)

The error occurs during execution of the next line:

xbrl = XBRL(results[0])

I originally tried this in a Jupyter notebook, but also tried in an interactive interpreter session to make sure it wasn't just related to the environment.
[Screenshots attached: edgar-index-error-interpreter, edgar-index-error-jupyter]

Could you please let me know whether this is a known issue or something I am doing wrong?

I specifically would like to collect "facts" from the 10-K reports like number of common shares outstanding, for instance.

Thanks, sincerely

P.S. - I followed the BuyMeACoffee.com link with the intent of providing support, and found you don't have a support button on your page...

@joeyism
Owner

joeyism commented Jan 27, 2021

There's no EX-101.INS document in the latest Oracle filing, so results comes back empty: https://www.sec.gov/Archives/edgar/data/1341439/000156459020030125/0001564590-20-030125-index.htm
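
One way to avoid the IndexError in the sample code is to check that the list is non-empty before indexing it. The sketch below is only an illustration: first_result is a hypothetical helper, not part of the edgar package, and the empty list stands in for what get_data_files_from_10K returned in the report above.

```python
def first_result(results, doc_type):
    """Return the first document, or raise a descriptive error if none was found."""
    if not results:
        raise ValueError(f"No {doc_type} document found in the selected filing")
    return results[0]


# Stand-in for the empty list returned when the filing has no EX-101.INS.
results = []

try:
    doc = first_result(results, "EX-101.INS")
except ValueError as err:
    # Report the missing exhibit instead of failing with a bare IndexError.
    print(err)
```

With a guard like this, a filing that lacks the requested exhibit produces a message naming the missing document type rather than "list index out of range".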

[Screenshot from 2021-01-27 22-00-10]

Also, thanks for trying to donate, and letting me know. I fixed the link now. If it still doesn't work, let me know too :)

@dumrich

dumrich commented Aug 9, 2021

@joeyism Many of the examples still give me this error:

Traceback (most recent call last):
  File "/home/abhinav/Code/BAM_scripts/test_edgar.py", line 2, in <module>
    edgar = Edgar()
  File "/home/abhinav/.local/lib/python3.9/site-packages/edgar/edgar.py", line 18, in __init__
    _name, _cik = Edgar.split_raw_string_to_cik_name(item)
  File "/home/abhinav/.local/lib/python3.9/site-packages/edgar/edgar.py", line 47, in split_raw_string_to_cik_name
    return ":".join(item_arr[:-1]), item_arr[-1]
IndexError: list index out of range

@zkoppert

@joeyism I have also run into this error intermittently. Any pointers on how to debug this?

@joeyism
Owner

joeyism commented Aug 26, 2021

@dumrich @zkoppert Can you give me the code you're using to run this? If it only happens sometimes, it may be related to specific companies, so I'll need as much information as possible to reproduce it.

@fireball147

@joeyism Hi Joey, I also noticed this error when I tried to use the edgar module.
The error is not company-specific, as you can see in @dumrich's traceback, which shows that the error occurred at line 2:
edgar = Edgar()

Luckily, I was able to pinpoint the reason thanks to one of my other personal projects.
Recently, due to some changes at sec.gov, when you try to do web scraping with Python against the http://www.sec.gov domain, the website usually returns an error saying that you are using an unregistered automated tool.

This can be solved by adding headers to the requests.get calls in the code.

I manually added the following to all of the edgar module files, and my issue was resolved.

Step 1: Add this at the beginning of the module files (company.py, edgar.py, etc.):
hdr = {'user-agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0'}

Step 2: In the module files, search for requests.get, and in every requests.get call add headers=hdr after the URL.

My issue was resolved after this.
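
A minimal sketch of the same idea, attaching the header in one place instead of editing each call by hand. It uses the standard library's urllib so that it runs without extra packages; build_request and fetch are hypothetical names that do not exist in the edgar package. With requests, the equivalent is requests.get(url, headers=hdr) as described in Step 2.

```python
import urllib.request

# Same header value as in Step 1.
hdr = {'user-agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0'}


def build_request(url, headers=hdr):
    """Return a Request object that carries the custom headers."""
    return urllib.request.Request(url, headers=headers)


def fetch(url, timeout=10):
    """Download url and return the response body as bytes (makes a network call)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()


# Building the request does not touch the network, so the header can be inspected.
# urllib stores header names capitalized, hence "User-agent".
req = build_request("https://www.sec.gov/")
print(req.get_header("User-agent"))
```

Routing every download through one function like fetch means the header only has to be changed in a single place later.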

@eabase
Contributor

eabase commented Feb 7, 2022

related to #29

@fireball147

And no, those are not the correct headers to use.

@fireball147

> related to #29
>
> @fireball147
>
> And no, those are not the correct headers to use.

You might be correct. I'm not good with programming; I don't have any computer science or coding background... just a newbie here.

I tried these headers based on some other website, and they worked for me, so I mentioned them here.
I believe the SEC website has information on what the header should look like when using an automated tool, but I didn't look into it in detail because this solution was working fine for me.
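
The SEC's guidance on automated access appears to ask tools to declare a User-Agent made up of an organization name and a contact email address, rather than a browser string. A minimal sketch under that assumption: make_sec_headers is a hypothetical helper, and the name and address below are placeholders to replace with your own. The SEC's own developer pages should be checked for the current requirements.

```python
def make_sec_headers(org_name, contact_email):
    """Build a headers dict whose User-Agent declares who is making the requests.

    Assumed format: "<organization name> <contact email>".
    """
    org_name = org_name.strip()
    contact_email = contact_email.strip()
    if not org_name or "@" not in contact_email:
        raise ValueError("an organization name and a contact email are required")
    return {"User-Agent": f"{org_name} {contact_email}"}


# Placeholder identity, for illustration only.
hdr = make_sec_headers("Example Corp", "admin@example.com")
print(hdr)
```

The resulting dict can be passed wherever the earlier hdr value was used, for example as headers=hdr in a requests.get call.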
