-
Notifications
You must be signed in to change notification settings - Fork 127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhancement: Refactor Batch Operations SQL Statements #1143
Comments
Referencing: #380 |
Yeah 3GB is unacceptably high…but it also feels odd that an otherwise simple string cache would be so large in memory. Have you tried a POC to read the byte (array) size of the cached string to just to confirm the actual string size approaches that… while not apples to apples it can help grasp the actual size of the cached string because 3GB would hold more than the entire encyclopedia which used to fit (with images) on a CD-ROM… |
Yes, we will do this. In addition, as of writing this, we are now working and investigating on which approach is best to do. Since we cannot eliminate the required packed-statement caching when executing the batch operations, we should somehow put focus into the packed-statements composition. Initial resolution of us would be (ideally), we should create the following SQL in every batch operation Row-N insertion. INSERT INTO [TableName] (Col1, ..., Col112)
VALUES
(@p1, ..., @p112),
(@p113, ..., @p224),
(@p225, ..., @p336); But, renaming from the INSERT INTO [TableName] (Col1, ..., Col112)
RETURNING INSERTED.[PK/IK]
VALUES
(@Col1_1, ..., @Col112_1),
(@Col1_2, ..., @Col112_2),
(@Col1_3, ..., @Col112_3); It would eliminate more than 75% of the size of the packed-statement as we analyzed. |
See these 2 packed statements. The new one is only around 50% of size of the old one. We only have eliminated around 55% of size as we are only using small table on our Regression Test. The bigger the table and Row-N the higher the bytes it will be eliminated. The challenge to us here is - how do we ensure that InsertAll-New-Packed-Statement.txt |
Hey @cajuncoding , there is a little bit of complexity atleast for SQL Server. When introducing the INSERT BULK approach in the packed-statement like below. INSERT INTO [Table] ([Column1])
OUTPUT [INSERTED].[Id]
VALUES
(@Column1_0),
(@Column1_1),
(@Column1_2); There is no way for me to order the Ofcourse, unless I hacked it like below. WITH CTE AS
(
SELECT @Column1_0 AS [Column1], @__RepoDb_OrderColumn_0 AS [__RepoDb_OrderColumn]
UNION SELECT @Column1_1, @__RepoDb_OrderColumn_1
UNION SELECT @Column1_2, @__RepoDb_OrderColumn_2
)
INSERT INTO [Table] ([Column1])
OUTPUT [INSERTED].[Id]
SELECT [Column1] FROM CTE ORDER BY [__RepoDb_OrderColumn]; Since you were the one who had requested this capability, would you be able to validate that the 1st statement above is really contaminating the ordering of the insertion? I tried it many times, but it sounds to me it is not affecting the order. In short, the result is the same with 2nd statement and with the old approach. Would you be of help? |
@mikependon , I can’t remember exactly how I was validating it at the time but it was definitely an issue…but as you alluded to that was related to using However the issue still stands because SQL Server makes no guarantee at all of the returned item order unless you tell it exactly how to order (via A little google foo yielded this excellent Q&A on StackOverflow with a great response…it may be that the SQL Server implementation will need to use a merge statement vs simple insert, but it’s just a tweak on the script because all values can still be packed and passed using the same https://dba.stackexchange.com/a/155737/168595 Thoughts on this? |
@cajuncoding yeah, I read the link you shared and it is such a goos comment feom the community to not rely on the default order of the output based on the input. So, it is still recommendable to add that extra steps, which I would also do. Thanks mate 🚀 |
There seems to have a challenge in the PGSQL, a working query in SQL Server is failing in PGSQL. Reference: https://twitter.com/mike_pendon/status/1638654274237497346 |
Describe the enhancement
As a response to the #1142, it is important for the library to limit the usage of the memory while it is caching the packed-statements per entity/model when calling the batch operations. This is to ensure give more resource for the application to utilize the memory on other purpose.
The rationale
What? In the mentioned issue above, in the PGSQL extended library, a user has reported a memory leaks. They have an entity model that corresponds to the table with 112 columns. During the call to the
InsertAll
operation withbatchSize
of 500, the library suddenly spike the memory to 3GB.Why? Given with this parameter, RepoDB will/might possibly cache the max of 500 big packed-statement per row-size for such big table. This is only for a single entity and on this operation, not yet for other enties for other batch operation (i.e.: MergeAll, UpdateAll)
In the current version of the library, when you call the
InsertAll
operation, it caches a packed-statement based on the number of rows passed in the operation. (This is also true toMergeAll
andUpdateAll
)For a better elaboration, if a user had passed 3 entities in the operation, the statement below will be created.
Such statement will be cached into a host memory for the operation with 3 rows. It will be reused when the same entity model (or table) is passed again in the future, but is only for 3 rows.
Then, if a user had passed 5 entities in the operation, the statement below will be cached as well into the host memory.
If the user had set the
batchSize
to 500, a possible 500 packed-statements of the above statements will be saved into the host memory, resulting to a bigger memory requirements/consumptions.Conclusion
As per the report and also to the screenshots we provided during the investigation, it has utilized to the max of 3GB memory for such single entity. This alarming as it will require the application to issue high resource limit in case they need to utilize the batch capability of the library.
Though, this is not an issue, but this is something requires an attention and revisits to optimize the memory usage.
The text was updated successfully, but these errors were encountered: