Bug: <Memory Leak persist after update 1.13.1 > #1142
Comments
For more information, I am referencing these discussions from Twitter.
I have investigated this, and it appears to work as expected. However, as a user you need to be aware of the caveat. In short, this is not a memory leak! RepoDB requires this caching to perform batch insertions more efficiently.
Explanation
Below is the screenshot and a very small project that we used for simulation and replication. The project requires SQL Server, as we ran the simulation there. Unfortunately, SQL Server and MDS allow a maximum of only 2100 parameters. Therefore, you will see that I can only hit the max of
Project: This project is good for small simulation on this kind of issue.
Screenshot:
What the program does?
Observation
Fluctuations: In the first few seconds, the memory fluctuates a lot. This is because when the
Truth: If you insert 1 row, it will create 1
The Flat-Lines: You will notice the flat lines after those 20
Behavior Extent
This kind of behavior is expected and is also present in both
Conclusion
The number of cached statements will vary based on the number of
Optimizations
Currently, RepoDB creates multiple INSERT statements for a batch insertion of a given number of rows. See below.
The statements above are verbose and do not use the more optimal bulk insert. This can be optimized as below.
With that approach, it will eliminate so many characters from memory.
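To make the size difference concrete, here is a minimal sketch in Python. It is not RepoDB's actual statement builder: the table name `MyTable`, the 50 `ColumnN` names, the `@Column_i` parameter naming, and both helper functions are hypothetical. It only compares the text length of one INSERT per row against a single INSERT with one VALUES tuple per row.

```python
def per_row_inserts(table, columns, rows):
    """Build one INSERT statement per row, joined into a single command text."""
    cols = ", ".join(columns)
    statements = []
    for i in range(rows):
        params = ", ".join(f"@{c}_{i}" for c in columns)
        statements.append(f"INSERT INTO {table} ({cols}) VALUES ({params});")
    return " ".join(statements)


def multi_row_insert(table, columns, rows):
    """Build a single INSERT with one VALUES tuple per row."""
    cols = ", ".join(columns)
    tuples = ", ".join(
        "(" + ", ".join(f"@{c}_{i}" for c in columns) + ")" for i in range(rows)
    )
    return f"INSERT INTO {table} ({cols}) VALUES {tuples};"


# Hypothetical shape: a 50-column table inserted 20 rows at a time.
columns = [f"Column{n}" for n in range(1, 51)]
verbose = per_row_inserts("MyTable", columns, rows=20)
compact = multi_row_insert("MyTable", columns, rows=20)

# Both texts carry the same parameters; the compact one lists the columns once.
print(len(verbose), len(compact))
```

The saving comes from the column list, which the per-row form repeats for every row. The number of parameters is the same in both forms, so this would not change any parameter limit.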
Referencing: #380
We will create the simulation in a PGSQL database. We are hopeful that the 2100-parameter limit is not present there, so that we can simulate your use case. We will post the result here once done.
If you use
You will also notice that the higher the
Note: If you bulk insert 1000 rows and set the batchSize to 1, it will iterate 1000 times (this is the behavior of SqlBulkCopy itself). Conversely, if you bulk insert 1000 rows and set the batchSize to 1000, it will send all the data at once.
EDIT: Attaching. InsertAllMemoryLeaks-BulkInsert.zip
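The arithmetic in the note can be sketched as follows. This is only an illustration of the iteration count described above, assuming one iteration per batch of batchSize rows; the function name `bulk_insert_iterations` is hypothetical and nothing here calls SqlBulkCopy.

```python
import math


def bulk_insert_iterations(total_rows, batch_size):
    """Number of batches needed to send total_rows in chunks of batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(total_rows / batch_size)


print(bulk_insert_iterations(1000, 1))     # 1000 iterations, one row each
print(bulk_insert_iterations(1000, 1000))  # 1 iteration, all rows at once
```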
Hmmm, interestingly, it seems I can replicate your issue in PGSQL. I modified the program and enabled a max batch size of 100 with 120 columns in a table.
Below is a screenshot from the first few seconds of the run. Memory first burst up to 900 MB, went down suddenly and then slowly climbed, exactly as you explained on Twitter.
A few minutes later, the memory was not going down. It even reached 2 GB and was still climbing.
But once the library had cached all the row-batch statements (100 statements in total), execution suddenly became faster and the memory allocations flat-lined.
After 5 minutes, the line is flat and is no longer climbing.
Here is the project that replicates your case.
In conclusion, this is not a memory leak. Because your large data entity requires a large statement cache, the library needs that memory to execute your insertions quickly.
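The climb-then-flat-line pattern described above can be sketched with a toy cache. This is an assumption-laden illustration, not RepoDB's internal code: the `StatementCache` class, the keying by row count, and the table and column names are all hypothetical. It shows only that a cache with one entry per distinct row count stops growing once every row count up to the maximum batch size has been seen.

```python
import random


class StatementCache:
    """Hypothetical cache holding one generated SQL text per distinct row count."""

    def __init__(self, table, columns):
        self.table = table
        self.columns = columns
        self._by_row_count = {}

    def get(self, row_count):
        """Return the cached statement for row_count, building it on first use."""
        sql = self._by_row_count.get(row_count)
        if sql is None:
            sql = self._build(row_count)
            self._by_row_count[row_count] = sql
        return sql

    def _build(self, row_count):
        cols = ", ".join(self.columns)
        tuples = ", ".join(
            "(" + ", ".join(f"@{c}_{i}" for c in self.columns) + ")"
            for i in range(row_count)
        )
        return f"INSERT INTO {self.table} ({cols}) VALUES {tuples};"

    def size(self):
        return len(self._by_row_count)

    def total_chars(self):
        return sum(len(sql) for sql in self._by_row_count.values())


# Shape taken from the comment above: 120 columns, max batch size of 100.
cache = StatementCache("MyTable", [f"Column{n}" for n in range(1, 121)])
rng = random.Random(0)
for step in range(1, 2001):
    cache.get(rng.randint(1, 100))  # each insert call has some row count <= 100
    if step % 500 == 0:
        print(step, cache.size(), cache.total_chars())
```

The cache can never hold more than 100 entries here, so its total size is bounded. That matches a flat line rather than a leak, although the bound can be large when the entity has many columns.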
The Project: InsertAllMemoryLeaksPostgreSql-BinaryBulkInsert.zip
Bug Description
Hi Michael,
I'm using RepoDb.PostgreSql.BulkOperations version 1.13.1. This application parses a lot of CDR files and inserts them into a PostgreSQL database, but when my application executes this line:
Memory usage increases and never decreases, no matter whether I force the GC to collect and clean up.
This application runs on a Debian Linux server with 32 GB of RAM and PostgreSQL 13.x. I'm using the latest version of RepoDb.PostgreSql.BulkOperations and .NET 6.0.
When the application's memory usage reaches the 32 GB of RAM, the server starts to swap and the application stops working.
It also seems to be incremental: for example, it starts at 100 MB, and with each file it processes it increases exponentially: 200 MB, 400 MB ... 3 GB, etc.
Images
Library Version:
Version: RepoDb.PostgreSql.BulkOperations version 1.13.1