DB->Snowflake replication processes data in 3 steps:

1. export from the DB to local CSV files
2. PUT the CSV files to an internal stage in Snowflake
3. insert/upsert of the records from the CSV files

It works fine; however, the CSV files can get pretty large and the file transfer might take significant time. Suggest adding an option to compress the files and/or store them in Parquet format.
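For context, a minimal sketch of steps 2 and 3 via the Python connector; the table name, file path, and credentials are placeholders, not the tool's actual code:

```python
import snowflake.connector

# Minimal sketch, assuming a target table MY_TABLE and an exported file
# /tmp/export.csv (illustrative names only).
conn = snowflake.connector.connect(
    account="...", user="...", password="...",
    warehouse="...", database="...", schema="...",
)
cur = conn.cursor()

# Step 2: PUT the local CSV into the table's internal stage (@%MY_TABLE).
cur.execute("PUT file:///tmp/export.csv @%MY_TABLE")

# Step 3 (insert case): COPY the staged file into the table; an upsert would
# instead COPY into a temp table and MERGE from there.
cur.execute(
    "COPY INTO MY_TABLE FROM @%MY_TABLE "
    "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
)
```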
The issue with Parquet is that it can take a lot of memory, while CSV streams; it's worth an experiment.
The temp CSVs should be compressed, though (with zstd). Can you confirm that?
Do you have any non-Parquet suggestions? This is the fastest approach I can think of for loading into Snowflake. You could use S3 as temp storage, but that's not going to speed it up. The other route could be Snowpipe Streaming, but that's a non-starter as it requires a lot of setup.
parquet vs csv streaming: it would be nice to let the user decide whether they are willing to allocate more resources to the EL process.
compression:
Snowflake supports multiple `COMPRESSION` formatTypeOptions: `COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE`
So GZIP or ZSTD would work.
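To make the compression step concrete, a small sketch that stream-compresses an exported CSV before the PUT; gzip here is from the standard library, and zstd would work the same way via the third-party `zstandard` package. Paths are placeholders:

```python
import gzip
import shutil

def gzip_csv(src_path: str, dst_path: str, chunk_bytes: int = 1 << 20) -> None:
    """Stream-compress an exported CSV so the PUT transfer is smaller.

    COPY INTO with COMPRESSION = AUTO (or GZIP) can ingest the result
    directly; Snowflake decompresses it server-side.
    """
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        # Copy in fixed-size chunks so memory stays flat regardless of file size.
        shutil.copyfileobj(src, dst, length=chunk_bytes)
    # zstd alternative (third-party `zstandard` package):
    # zstandard.ZstdCompressor().copy_stream(src, dst)

gzip_csv("/tmp/export.csv", "/tmp/export.csv.gz")
```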
resource utilization:
I'm wondering whether total resource utilization would differ between writing a CSV to disk and compressing it vs. writing Parquet. Parquet is essentially columnar data compressed column by column, so producing Parquet should take roughly the same resources as producing a compressed CSV. Limiting the number of records per Parquet file would help cap memory/CPU utilization, the same way a Parquet target does.
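To illustrate the "limit records per Parquet file" idea, a sketch of incremental Parquet writing with pyarrow, where memory is bounded by the batch size rather than the full result set; the batch iterator is a hypothetical stand-in for the DB export step:

```python
import pyarrow as pa
import pyarrow.parquet as pq

def rows_to_parquet(row_batches, schema: pa.Schema, path: str) -> None:
    """Write Parquet one batch at a time so memory is bounded by the batch
    size, not by the whole result set."""
    with pq.ParquetWriter(path, schema) as writer:
        for batch in row_batches:
            writer.write_table(pa.Table.from_pylist(batch, schema=schema))

# Hypothetical stand-in for the DB export: an iterator of row batches.
schema = pa.schema([("id", pa.int64()), ("name", pa.string())])
batches = ([{"id": i, "name": f"row-{i}"} for i in range(start, start + 50_000)]
           for start in range(0, 200_000, 50_000))
rows_to_parquet(batches, schema, "/tmp/export.parquet")
```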
> it would be nice to let the user decide whether they are willing to allocate more resources to the EL process.
Agreed.
> I'm wondering whether total resource utilization would differ between writing a CSV to disk and compressing it vs. writing Parquet.
Perhaps... but yeah, I just read a thread, and once the files are in the internal stage, Parquet will be faster for the Snowflake engine to load due to its columnar/binary nature.
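If the staged files were Parquet, the load side could look roughly like this (a sketch only; table and stage names are placeholders, and MATCH_BY_COLUMN_NAME assumes the Parquet column names match the table's):

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="...", user="...", password="...",
    warehouse="...", database="...", schema="...",
)
cur = conn.cursor()

# Parquet is already compressed internally, so skip the PUT-side gzip.
cur.execute("PUT file:///tmp/export.parquet @%MY_TABLE AUTO_COMPRESS = FALSE")

# Columnar/binary load: map Parquet columns to table columns by name.
cur.execute(
    "COPY INTO MY_TABLE FROM @%MY_TABLE "
    "FILE_FORMAT = (TYPE = PARQUET) "
    "MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE"
)
```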