Fil profiler fails for fairly large dataset #492
Try with
Oh, except the option to disable OOM detection isn't currently available in Jupyter. I'll try to fix that (or just disable detection by default; we'll see).
Thanks for getting back to me. I don't think the 1+1 situation is urgent, since there's rarely a need to profile something like that (though a warning would be worthwhile). Using Fil with Jupyter is valuable, especially for data scientists, and the graph Fil produces is easy for data scientists to interpret, so I think having this option disabled in Jupyter will help us.
I'm now somewhat confused about OOM. My laptop has 16 GB of RAM. If my dataset uses around 23 GB from …, is it the case that 23 GB of virtual memory (allocated memory) is used, with around 17 GB swapped to disk and around 6 GB in RAM? If so, what is the maximum amount of data I can handle on my machine with pandas? Is it limited by free disk space?
So, yes. One thing to keep in mind is that I'm told macOS will keep writing to disk until disk usage reaches twice the amount of physical memory, and only then will it start failing memory allocations or killing your process. In your case that's 32 GB on disk, but that would be pretty slow. The out-of-memory detection Fil does (which I should probably just disable on macOS) uses heuristics to guess when OOM is approaching, and sometimes it triggers too soon. You can try running Fil with most other applications closed, or with only a couple of browser tabs open, and see if that gets you further.
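To make the "heuristics" point concrete, here is a minimal sketch of what a heuristic OOM check *could* look like. This is a hypothetical illustration, not Fil's actual algorithm: `MemoryState`, `looks_close_to_oom`, `macos_swap_ceiling_bytes`, and the 5% threshold are all assumptions made up for this example. It flags "OOM is near" when available RAM drops below a fixed fraction of total RAM, which shows how such a rule could fire early on macOS, where the OS may keep swapping long after RAM looks full. The 2× swap figure is only the rule of thumb quoted above.

```python
from dataclasses import dataclass

GB = 1024 ** 3

@dataclass
class MemoryState:
    total_bytes: int       # physical RAM
    available_bytes: int   # RAM the OS could hand out without swapping

def looks_close_to_oom(state: MemoryState, min_free_fraction: float = 0.05) -> bool:
    """Hypothetical heuristic (not Fil's): True if available memory is
    below min_free_fraction of total physical RAM."""
    return state.available_bytes < min_free_fraction * state.total_bytes

def macos_swap_ceiling_bytes(ram_bytes: int) -> int:
    """Rule of thumb quoted in this thread (unverified): macOS keeps swapping
    until on-disk usage is roughly twice the size of physical memory."""
    return 2 * ram_bytes

if __name__ == "__main__":
    try:
        import psutil  # third-party and optional; only used to read live numbers
        vm = psutil.virtual_memory()
        state = MemoryState(total_bytes=vm.total, available_bytes=vm.available)
        print("close to OOM?", looks_close_to_oom(state))
    except ImportError:
        pass
    # The 16 GB laptop in this thread:
    print(macos_swap_ceiling_bytes(16 * GB) / GB)  # 32.0
```

A fixed-fraction rule like this only looks at RAM, so on a system that is willing to swap heavily it can report "close to OOM" well before allocations would actually fail.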
Hopefully I'll have a release with OOM detection disabled in an hour or two, or tomorrow if tests fail. As another option, I also work on a commercial Python profiler for data science: https://sciagraph.com
Pros compared to Fil:
Cons compared to Fil:
Oh, and re
>>> x = 1_000_000
>>> y = 1_000_000
>>> x == y
True
>>> x is y
False
>>> x = 1
>>> y = 1
>>> x is y
True
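The REPL session above shows CPython's small-integer cache: small values are preallocated and reused, so two names bound to `1` point at the same object, while large values like `1_000_000` are separate objects. The sketch below reproduces this without relying on literals, since the compiler can share identical constants within one code object and blur the effect. `small_int_cached` is a hypothetical helper for illustration; the cached range (-5 to 256) is a CPython implementation detail, not a language guarantee.

```python
import sys

def small_int_cached(n: int) -> bool:
    """True if two ints with value n, each built at runtime, are the same object.
    int(str(n)) constructs the value at runtime, avoiding constant sharing."""
    return int(str(n)) is int(str(n))

if sys.implementation.name == "cpython":
    # CPython preallocates the integers -5..256 and reuses them.
    print(small_int_cached(1))          # True
    print(small_int_cached(256))        # True
    print(small_int_cached(257))        # False
    print(small_int_cached(1_000_000))  # False
```

Use `==` to compare values; `is` only tells you whether two names refer to the same object, and the answer for integers depends on this implementation detail.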
@itamarst Thanks for fixing both the OOM and 1+1 issues. I was able to get Fil working in Jupyter. It works fine, and the graph shows up in Jupyter when profiling small tasks. But with big datasets (in my case, loading a 10 GB CSV file) the graph doesn't show, even though it is being generated, as below. I checked those folders, the files exist, and Fil reports a peak of 53745 MB. Is there any reason the graph isn't showing in my Jupyter notebook? Also, the number above (53745 MB) is allocated memory, right? Where can I find peak resident memory in Fil? It isn't shown in the graph above, which only shows peak tracked memory usage. https://sciagraph.com/ looks interesting; is there a ticket I can track to see when Mac support becomes available?
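On the "allocated vs. resident" question: peak *tracked* memory counts bytes the program allocated, while peak *resident* memory (RSS) is what the OS actually kept in RAM, which can be lower (pages swapped out or never touched) or higher (interpreter overhead, fragmentation). The sketch below is not Fil's API; it uses the stdlib `tracemalloc` module as a stand-in for allocation tracking and `resource.getrusage` for peak RSS, so it only works on Unix-like systems. `peak_rss_bytes`, `measure`, and `allocate_temporary` are hypothetical helpers for this example.

```python
import resource
import sys
import tracemalloc

def peak_rss_bytes() -> int:
    """Peak resident set size of this process so far (Unix only).
    ru_maxrss is reported in bytes on macOS but in kilobytes on Linux."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak if sys.platform == "darwin" else peak * 1024

def measure(fn):
    """Run fn and return (peak tracked allocation bytes, peak RSS bytes)."""
    tracemalloc.start()
    try:
        fn()
        _, tracked_peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return tracked_peak, peak_rss_bytes()

def allocate_temporary() -> int:
    buf = bytearray(50 * 1024 * 1024)  # 50 MiB, freed when the function returns
    return len(buf)

tracked, rss = measure(allocate_temporary)
print(f"peak tracked: {tracked / 2**20:.0f} MiB, peak RSS: {rss / 2**20:.0f} MiB")
```

The two numbers answer different questions, so neither is "wrong"; tracked peak points at which code allocated memory, while RSS is closer to what the OS sees when it decides to swap or kill the process.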
I was reading https://pythonspeed.com/articles/python-out-of-memory/. Does Fil also report how a process failed, as described in the article?
In the case above, loading the 10 GB file succeeded with no issues. Loading the same 10 GB file again into a different variable also works with no memory issues. I was curious why it doesn't fail, so I checked whether the second 10 GB load uses the same address space as the first (as in the simple copy case), but it uses a different address space. However, when I load a 20 GB file (a CSV made by concatenating the rows of this 10 GB file twice), the Jupyter kernel crashes. It would be great to get the reason for the failure, as described in the article.
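The address-space observation is expected: plain assignment binds a second name to the same object, while each call to a loader builds fresh objects, so two loads of the same file occupy separate memory. A minimal stdlib sketch of that distinction follows; `load` is a hypothetical stand-in for a CSV reader such as pandas', and the temporary file is made up for the example. It does not explain why the 20 GB load fails; it only shows why the second 10 GB load lives at a different address.

```python
import os
import tempfile

def load(path: str) -> bytes:
    """Hypothetical stand-in for a loader: every call builds new objects."""
    with open(path, "rb") as f:
        return f.read()

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"a,b\n1,2\n" * 1000)
    path = tmp.name

try:
    first = load(path)
    alias = first          # plain assignment: a second name for the same object
    second = load(path)    # a second load: equal contents, separate memory
finally:
    os.remove(path)

print(alias is first)    # True
print(second is first)   # False
print(second == first)   # True
```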
Hi, just FYI: I've just released the macOS version of Sciagraph.
I am doing some benchmarking and thought about using fil.
I am loading a 3 GB CSV file on my Mac with 16 GB of RAM. I can load the file completely fine in python3 with the Fil kernel, but when I use %%filprofile the kernel dies. Upon checking the logs, I found an out-of-memory.svg file in a temp folder showing around 6000 MB. This shouldn't happen, since I can load the data fine without Fil, and other profilers work with no issues (like below).
I'm wondering why this happens only with %%filprofile. All other profilers work fine; for example, I tried memory_profiler and it shows the following.
I don't get any issues when I test Fil by profiling 1+1.
Is Fil designed only for profiling small datasets? What can I try on my end to get Fil working?
As a side note, how does Fil differ from memory_profiler, which also reports peak memory and increments? Are there strong reasons Fil is better, other than the nice graphical view? I do understand from your article why it is better than sys.getsizeof() and memory_usage().
filprofiler==2023.1.0
Python 3.11.0
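One concrete difference between per-line "increment" reporting and peak-oriented tracking: a measurement taken before and after a line can miss a large temporary allocation that is created and freed within that line, while a tool that records the allocation peak (which, per its documentation, is what Fil focuses on) still sees it. The sketch below uses the stdlib `tracemalloc` module as a stand-in for both styles; it is not how memory_profiler or Fil work internally, and `transient_work` is a hypothetical function made up for the example.

```python
import tracemalloc

def transient_work() -> int:
    # Build a large temporary list; it is freed when the function returns.
    temp = [0] * 5_000_000    # roughly 40 MB of pointers on 64-bit CPython
    return len(temp)

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()
transient_work()
after, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

increment = after - before
print(f"increment: {increment / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
```

The "increment" comes out near zero even though roughly 40 MB was briefly in use; only the peak reveals it. When a job dies from running out of memory, that transient peak is usually what matters.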