No columns found on reading parquet files from ADLS Gen2 #491

Open
chadhorne opened this issue Jan 22, 2025 · 1 comment

Issue Description

  • Description of the issue: As of Sling version 1.3.5, the sling run command fails to read Parquet files from ADLS Gen2, throwing the error "no columns found for: select * from read_parquet(['my_file.parquet'])". The same command succeeds with Sling version 1.3.4.

  • Sling version (sling --version): 1.3.5

  • Operating System (linux, mac, windows): windows

  • Replication Configuration:

connections:
  DATA_LAKE_RAW:
    type: azure
    account: <STORAGE_ACCOUNT>
    container: raw
    conn_str: DefaultEndpointsProtocol=https;AccountName=<STORAGE_ACCOUNT>;AccountKey=<ACCOUNT_KEY>;EndpointSuffix=core.windows.net

  DATA_WAREHOUSE_STAGING:
    type: sqlserver
    host: <SERVER>
    port: 1433
    database: Staging
    schema: dbo
    user: <USERNAME>
    password: <PASSWORD>
    encrypt: 'true'
    trust_server_certificate: 'true'

sling run --src-conn DATA_LAKE_RAW --src-stream 'my_file.parquet' --tgt-conn DATA_WAREHOUSE_STAGING --tgt-object 'my_schema.my_table' --mode full-refresh
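Judging from the debug line "reading datastream from https://<STORAGE_ACCOUNT>.blob.core.windows.net/raw/my_file.parquet", the source URL handed to DuckDB appears to combine the connection's account and container with the --src-stream value. The sketch below reproduces that mapping; blob_url is a hypothetical helper inferred from the log, not Sling's actual code, and it only targets the Blob endpoint seen there (not the ADLS Gen2 dfs endpoint). The percent-encoding of path segments is an assumption about how such a URL should be built, not confirmed Sling behavior.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, stream: str) -> str:
    """Build an Azure Blob Storage URL for an object in a container.

    Hypothetical helper: mirrors the URL shape seen in the Sling debug log,
    not Sling's internal implementation.
    """
    # Strip leading slashes so "/my_file.parquet" and "my_file.parquet" match.
    path = stream.lstrip("/")
    # Percent-encode each path segment but keep "/" separators intact
    # (assumption: Sling's own encoding rules are not shown in the issue).
    encoded = quote(path, safe="/")
    return f"https://{account}.blob.core.windows.net/{container}/{encoded}"


if __name__ == "__main__":
    # Values mirror the issue's config; the account name is a placeholder.
    print(blob_url("mystorageaccount", "raw", "my_file.parquet"))
```

Comparing this output against the URL in the log is a quick way to confirm the connection resolves to the expected account and container.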
  • Log Output (please run command with -d):
2025-01-22 10:19:42 INF Sling CLI | https://slingdata.io
2025-01-22 10:19:42 DBG opened "azure" connection (conn-azure-LaE)
2025-01-22 10:19:42 DBG Sling version: 1.3.5 (windows amd64)
2025-01-22 10:19:42 DBG type is file-db
2025-01-22 10:19:42 DBG using: {"columns":null,"mode":"full-refresh","transforms":null}
2025-01-22 10:19:42 DBG using source options: {"empty_as_null":true,"header":true,"fields_per_rec":-1,"compression":"auto","null_if":"NULL","datetime_format":"AUTO","skip_blank_lines":false,"max_decimals":-1}
2025-01-22 10:19:42 DBG using target options: {"datetime_format":"auto","file_max_rows":0,"max_decimals":-1,"use_bulk":true,"add_new_columns":true,"adjust_column_type":false,"column_casing":"source"}
2025-01-22 10:19:42 INF connecting to target database (sqlserver)
2025-01-22 10:19:42 DBG opened "sqlserver" connection (conn-sqlserver-cN1)
2025-01-22 10:19:42 INF reading from source file system (azure)
2025-01-22 10:19:42 DBG opened "azure" connection (conn-azure-e3E)
2025-01-22 10:19:42 DBG reading datastream from https://<STORAGE_ACCOUNT>.blob.core.windows.net/raw/my_file.parquet [format=parquet, nodes=1]
2025-01-22 10:19:43 DBG closed "azuresql" connection (conn-sqlserver-cN1)
2025-01-22 10:19:43 INF execution failed
fatal:
--- task_run.go:133 func2 ---
~ could not read from file
--- task_run.go:440 runFileToDB ---
~ Could not FileSysReadDataflow for azure
--- task_run_read.go:258 ReadFromFile ---
~ error getting dataflow
--- fs.go:588 ReadDataflow ---
--- fs.go:1155 GetDataflowViaDuckDB ---
~ dataflow error while waiting for ready state
--- dataflow.go:654 WaitReady ---

~ datastream error
--- dataflow.go:583 PushStreamChan ---

--- fs.go:1134 1 ---
--- datastream.go:1528 ConsumeParquetReaderDuckDb ---
~ Error consuming reader for https://<STORAGE_ACCOUNT>.blob.core.windows.net/raw/my_file.parquet
--- duckdb.go:566 Stream ---
~ could not read parquet rows
--- duckdb.go:584 StreamContext ---
~ could not get columns
--- duckdb.go:961 Describe ---
no columns found for: select * from read_parquet(['https://<STORAGE_ACCOUNT>.blob.core.windows.net/raw/my_file.parquet'])
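Since the same file loads under 1.3.4, a regression in the reader seems more likely than a bad file, but the trace ends in DuckDB's Describe step, so it can help to rule out the file itself. Per the Apache Parquet format specification, a file begins and ends with the 4-byte magic "PAR1", and the 4 bytes just before the trailing magic hold the little-endian length of the footer metadata. A minimal stdlib-only sketch of that structural check on a locally downloaded copy follows; parquet_footer_length is a hypothetical helper, it reads the whole file into memory, and it does not decode the schema (so it cannot prove the file has columns, only that its layout is sane).

```python
import struct
import sys

MAGIC = b"PAR1"


def parquet_footer_length(data: bytes) -> int:
    """Return the footer (file metadata) length of a Parquet file's bytes.

    Raises ValueError if the bytes do not have the Parquet layout:
    'PAR1' <row groups> <footer> <4-byte LE footer length> 'PAR1'.
    """
    # Smallest possible layout: leading magic + length field + trailing magic.
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic at start or end")
    # Footer length is a little-endian 32-bit integer just before the trailing magic.
    (footer_len,) = struct.unpack("<i", data[-8:-4])
    # The footer must fit between the leading magic and the length field.
    if footer_len <= 0 or footer_len > len(data) - 12:
        raise ValueError(f"implausible footer length: {footer_len}")
    return footer_len


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_parquet.py my_file.parquet
    with open(sys.argv[1], "rb") as fh:
        print("footer bytes:", parquet_footer_length(fh.read()))
```

If this check passes, the file's framing is intact and the "no columns found" error more likely comes from how 1.3.5 reads the remote file than from the file's contents.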
flarco (Collaborator) commented Jan 22, 2025

Hey, can you try the dev build for the upcoming release?
This should be fixed there.
