Allow parquet and compressed csv files in DB->Snowflake replication #480

Open · nixent opened this issue Jan 7, 2025 · 3 comments
Labels
enhancement New feature or request

Comments

nixent commented Jan 7, 2025

Feature Description

DB->Snowflake replication processes data in 3 steps:

  • export from the DB to local CSV files
  • PUT the CSV files to an internal stage in Snowflake
  • insert/upsert of the records from the staged CSV files into the target table

This works fine; however, the CSV files can get pretty large, and the file transfer can take significant time. I suggest adding an option to compress the files and/or store them in parquet format.
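
For illustration, here is a rough sketch of what the current CSV flow looks like on the Snowflake side (stage, table, and file names here are made up, not what sling actually generates):

```sql
-- Step 2: upload the exported CSV to an internal stage.
-- AUTO_COMPRESS = TRUE gzips the file during the PUT, which already reduces transfer size.
PUT file:///tmp/sling/public_orders.csv @~/sling_staging AUTO_COMPRESS = TRUE;

-- Step 3: load the staged file into the target table.
COPY INTO public.orders
FROM @~/sling_staging/public_orders.csv.gz
FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1 COMPRESSION = GZIP);
```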

nixent added the enhancement (New feature or request) label on Jan 7, 2025
flarco (Collaborator) commented Jan 7, 2025

The issue with parquet is that it can take a lot of memory, while CSV streams. It's worth an experiment.
The temp CSVs should be compressed, though (with zstd). Can you confirm that?
Do you have any non-parquet suggestions? This is the fastest approach I can think of for loading into Snowflake. You can use S3 as temp storage, but it's not going to speed things up. The other route could be Snowpipe Streaming, but that's a non-starter as it requires a lot of setup.
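
As a rough sketch (hypothetical file and stage names), a temp CSV that is already zstd-compressed on disk could be uploaded like this, skipping the extra gzip pass that AUTO_COMPRESS would otherwise apply:

```sql
-- Tell PUT the file is already zstd-compressed instead of gzipping it again.
PUT file:///tmp/sling/public_orders.csv.zst @~/sling_staging
  SOURCE_COMPRESSION = ZSTD
  AUTO_COMPRESS = FALSE;
```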

nixent (Author) commented Jan 8, 2025

Parquet vs CSV streaming: it would be nice to let the user decide whether they are willing to allocate more resources to the EL process.

Compression:
Snowflake supports multiple COMPRESSION formatTypeOptions:
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
So either GZIP or ZSTD would work.
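
For example (hypothetical table/stage names), the COPY side would just declare the matching compression in the file format:

```sql
-- Load a zstd-compressed CSV from the internal stage.
COPY INTO public.orders
FROM @~/sling_staging/public_orders.csv.zst
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 COMPRESSION = ZSTD);
```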

Resource utilization:
I'm wondering whether total resource utilization would actually differ between writing a CSV to disk and compressing it vs writing parquet. Parquet is essentially data compressed column by column, so producing a parquet file should take roughly the same resources as producing a compressed CSV. Limiting the number of records per parquet file would help cap memory/CPU utilization, the same way the parquet file target does.

flarco (Collaborator) commented Jan 8, 2025

> it would be nice to let the user decide whether they are willing to allocate more resources to the EL process.

Agreed.

> I'm wondering whether total resource utilization would actually differ between writing a CSV to disk and compressing it vs writing parquet.

Perhaps... But yeah, I just read a thread on this: once the files are in the internal stage, parquet will be faster for the Snowflake engine to load due to its columnar/binary nature.
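
A hypothetical parquet variant of the load (made-up names again, not sling's generated SQL) would look roughly like:

```sql
-- Upload the parquet file as-is (it is already compressed internally).
PUT file:///tmp/sling/public_orders.parquet @~/sling_staging AUTO_COMPRESS = FALSE;

-- Load it, mapping parquet columns to table columns by name.
COPY INTO public.orders
FROM @~/sling_staging/public_orders.parquet
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```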
