[BUG] Error when opening file with list field (System.InvalidCastException) #119

Open
0xSheller opened this issue Sep 23, 2024 · 7 comments
Labels
bug Something isn't working

Comments

0xSheller commented Sep 23, 2024

Parquet Viewer Version
Latest (3.1.0)

Where was the parquet file created?
pyarrow

Sample File
Generic Array

Describe the bug
Upon loading a file, I get the following error:

---------------------------
Specified cast is not valid.
---------------------------
Something went wrong (CTRL+C to copy):

System.InvalidCastException: Specified cast is not valid.

   at ParquetViewer.Engine.ParquetEngine.ReadListField(DataTableLite dataTable, ParquetRowGroupReader groupReader, Int32 rowBeginIndex, ParquetSchemaElement itemField, Int32 fieldIndex, Int64 skipRecords, Int64 readRecords, Boolean isFirstColumn, CancellationToken cancellationToken, IProgress`1 progress)

   at ParquetViewer.Engine.ParquetEngine.ProcessRowGroup(DataTableLite dataTable, ParquetRowGroupReader groupReader, Int64 skipRecords, Int64 readRecords, CancellationToken cancellationToken, IProgress`1 progress)

   at ParquetViewer.Engine.ParquetEngine.PopulateDataTable(DataTableLite dataTable, ParquetReader parquetReader, Int64 offset, Int64 recordCount, CancellationToken cancellationToken, IProgress`1 progress)

   at ParquetViewer.Engine.ParquetEngine.ReadRowsAsync(List`1 selectedFields, Int32 offset, Int32 recordCount, CancellationToken cancellationToken, IProgress`1 progress)

   at ParquetViewer.MainForm.<>c__DisplayClass33_0.<<LoadFileToGridview>b__1>d.MoveNext()

--- End of stack trace from previous location ---

   at ParquetViewer.MainForm.LoadFileToGridview()

   at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_0(Object state)

   at InvokeStub_SendOrPostCallback.Invoke(Object, Span`1)

   at System.Reflection.MethodBaseInvoker.InvokeWithOneArg(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
---------------------------
OK   
---------------------------
0xSheller added the bug label Sep 23, 2024
mukunku changed the title from "[BUG]" to "[BUG] Error when opening file with list field" Dec 22, 2024
mukunku mentioned this issue Dec 23, 2024
Owner

mukunku commented Dec 23, 2024

Can you share a sample file please? You can zip and upload it here. Otherwise I might not be able to help much.

Also give version 3.2.0.0 a try. I doubt it addresses your issue, but I wanted to suggest it just in case 🤞🏼.

mukunku changed the title from "[BUG] Error when opening file with list field" to "[BUG] Error when opening file with list field (System.InvalidCastException)" Jan 9, 2025
Owner

mukunku commented Jan 9, 2025

I believe this is due to list fields with nested struct values. Implementing support for them has proven challenging.

mukunku mentioned this issue Feb 3, 2025

toomyem commented Feb 4, 2025

Maybe related: I'm getting the following error. And yes, it is a parquet file with a nested struct.

---------------------------
Something went wrong
---------------------------
Could not load parquet file.

If the problem persists please consider opening a bug ticket in the project repo: Help → About

ParquetViewer.Engine.Exceptions.FileReadException: Encountered an error reading file.
 ---> System.InvalidOperationException: don't know how to skip type Set
   at Parquet.Meta.Proto.ThriftCompactProtocolReader.SkipField(CompactType compactType)
   at Parquet.Meta.ColumnChunk.Read(ThriftCompactProtocolReader proto)
   at Parquet.Meta.RowGroup.Read(ThriftCompactProtocolReader proto)
   at Parquet.Meta.FileMetaData.Read(ThriftCompactProtocolReader proto)
   at Parquet.ParquetActor.ReadMetadataAsync(CancellationToken cancellationToken)
   at Parquet.ParquetReader.InitialiseAsync(CancellationToken cancellationToken)
   at Parquet.ParquetReader.CreateAsync(String filePath, ParquetOptions parquetOptions, CancellationToken cancellationToken)
   at ParquetViewer.Engine.ParquetEngine.OpenFileAsync(String parquetFilePath, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at ParquetViewer.Engine.ParquetEngine.OpenFileAsync(String parquetFilePath, CancellationToken cancellationToken)
   at ParquetViewer.MainForm.OpenFieldSelectionDialog(Boolean forceOpenDialog)
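
The inner exception above appears to come from a Thrift compact-protocol reader that has no skip branch for the set type. As a rough illustration only, here is what skipping a value involves under the published compact-protocol encoding, where a set uses the same header layout as a list. The class and method names are hypothetical, and this is not the Parquet.Net or ParquetViewer code:

```java
import java.nio.ByteBuffer;

class CompactSkipSketch {
    // Type codes from the Thrift compact protocol specification.
    static final int BOOL_TRUE = 1, BOOL_FALSE = 2, BYTE = 3, I16 = 4, I32 = 5, I64 = 6,
            DOUBLE = 7, BINARY = 8, LIST = 9, SET = 10, MAP = 11, STRUCT = 12;

    /** Reads an unsigned base-128 varint (zigzag decoding is not needed just to skip). */
    static long readVarint(ByteBuffer in) {
        long result = 0;
        int shift = 0;
        while (true) {
            int b = in.get() & 0xFF;
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
    }

    /** Skips one value; isElement is true for values inside a list, set, or map. */
    static void skip(ByteBuffer in, int type, boolean isElement) {
        switch (type) {
            case BOOL_TRUE:
            case BOOL_FALSE:
                // As a struct field, the boolean is carried by the field header itself.
                if (isElement) in.get();
                break;
            case BYTE:
                in.get();
                break;
            case I16:
            case I32:
            case I64:
                readVarint(in);
                break;
            case DOUBLE:
                in.position(in.position() + 8);
                break;
            case BINARY: {
                int length = (int) readVarint(in);
                in.position(in.position() + length);
                break;
            }
            case LIST:
            case SET: {
                // Header byte: high nibble is the size (15 means a varint size follows),
                // low nibble is the element type.
                int header = in.get() & 0xFF;
                int size = header >>> 4;
                if (size == 15) size = (int) readVarint(in);
                int elementType = header & 0x0F;
                for (int i = 0; i < size; i++) skip(in, elementType, true);
                break;
            }
            case MAP: {
                int size = (int) readVarint(in);
                if (size > 0) {
                    int kv = in.get() & 0xFF; // key type in high nibble, value type in low nibble
                    for (int i = 0; i < size; i++) {
                        skip(in, kv >>> 4, true);
                        skip(in, kv & 0x0F, true);
                    }
                }
                break;
            }
            case STRUCT:
                while (true) {
                    int header = in.get() & 0xFF;
                    if (header == 0) break;                  // stop field
                    if ((header >>> 4) == 0) readVarint(in); // long form: explicit field id follows
                    skip(in, header & 0x0F, false);
                }
                break;
            default:
                throw new IllegalStateException("don't know how to skip type " + type);
        }
    }

    public static void main(String[] args) {
        // A set of three i32 values followed by one unrelated marker byte.
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {0x35, 0x02, 0x04, 0x06, 0x7F});
        skip(buf, SET, false);
        System.out.println(buf.position()); // prints 4
    }
}
```

Treating SET exactly like LIST is the key point of the sketch: a reader that handles one but not the other would fail on any metadata struct that contains a set-typed field it does not recognize.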

Owner

mukunku commented Feb 4, 2025

@toomyem Which version of the app are you using? Have you tried v3.2.1.0?


toomyem commented Feb 4, 2025

Nope, still on 3.2.0, as it is the latest release.


toomyem commented Feb 4, 2025

I just checked 3.2.1.0 - works much better and opens my file correctly 👍
Thank you.


toomyem commented Feb 5, 2025

After some testing, it looks like it is not handled 100% correctly.

Example schema:

message test-msg {
  required int32 data_required32;
  required int64 data_required64;
  optional binary data_optional_missing (UTF8);
  optional binary data_optional_existing (UTF8);
  repeated int32 data_repeated;
  repeated group data_group {
    required binary nested (UTF8);
  }

  required binary uuid (UTF8);

  required group data_list (LIST) {
    repeated group list {
      required binary element (UTF8);
    }
  }
}

Test parquet file: test.zip
Parquet file without repetition: test-no-repeat.zip

Error while opening:

---------------------------
Specified cast is not valid.
---------------------------
Something went wrong (CTRL+C to copy):

System.InvalidCastException: Specified cast is not valid.
   at System.Data.Common.Int32Storage.Set(Int32 record, Object value)
   at System.Data.DataColumn.set_Item(Int32 record, Object value)

The problem occurs when there is more than one repetition of the field repeated int32 data_repeated;

Works:

 record.add("data_repeated", 10);

Fails:

 record.add("data_repeated", 10);
 record.add("data_repeated", 20);
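
The Int32Storage.Set frame in the trace is consistent with a multi-valued row being written into a scalar int32 cell, although that is an inference from the stack trace rather than something confirmed in the thread. As a minimal sketch of how a reader could group such values, assuming the standard Parquet level encoding for a top-level repeated primitive (maximum repetition level 1, maximum definition level 1), here is a hypothetical helper; it is not ParquetViewer's code:

```java
import java.util.ArrayList;
import java.util.List;

class RepeatedFieldSketch {
    /**
     * Groups a flat column into one list per row.
     * repLevels[i] == 0 starts a new row; defLevels[i] == 1 means a value is present.
     * values holds only the present values, in order.
     */
    static List<List<Integer>> assemble(int[] repLevels, int[] defLevels, int[] values) {
        List<List<Integer>> rows = new ArrayList<>();
        int next = 0; // index of the next unread entry in values
        for (int i = 0; i < repLevels.length; i++) {
            if (repLevels[i] == 0) {
                rows.add(new ArrayList<>());
            }
            if (defLevels[i] == 1) {
                rows.get(rows.size() - 1).add(values[next++]);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Row 1 has two repetitions (10, 20), row 2 has none, row 3 has one (30).
        int[] rep = {0, 1, 0, 0};
        int[] def = {1, 1, 0, 1};
        int[] values = {10, 20, 30};
        System.out.println(assemble(rep, def, values)); // prints [[10, 20], [], [30]]
    }
}
```

With this grouping, every row of data_repeated is a list, so the single-value and two-value cases take the same path and the column never needs to hold a bare integer.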

Moreover, required group data_list (LIST) is being read as an empty list ([]):
[Screenshot: data_list column shown as an empty list]
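
For reference, the leaf column data_list.list.element in the schema above has the same maximum levels as the bare data_repeated field, because only the repeated "list" group contributes levels. A small sketch of that calculation, following the usual Parquet rule (each optional or repeated node adds a definition level; each repeated node adds a repetition level). The enum and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.List;

class LevelSketch {
    enum Repetition { REQUIRED, OPTIONAL, REPEATED }

    /** Returns {maxDefinitionLevel, maxRepetitionLevel} for a root-to-leaf path. */
    static int[] maxLevels(List<Repetition> path) {
        int def = 0;
        int rep = 0;
        for (Repetition r : path) {
            if (r != Repetition.REQUIRED) def++; // optional and repeated nodes can be absent
            if (r == Repetition.REPEATED) rep++; // only repeated nodes can occur more than once
        }
        return new int[] {def, rep};
    }

    public static void main(String[] args) {
        // data_list (required) -> list (repeated) -> element (required)
        List<Repetition> listElement =
                List.of(Repetition.REQUIRED, Repetition.REPEATED, Repetition.REQUIRED);
        System.out.println(Arrays.toString(maxLevels(listElement))); // prints [1, 1]
    }
}
```

Since both columns end up with levels of 1 and 1, a reader that assembles one of them correctly should in principle be able to assemble the other the same way.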

Snippet used to create parquet file:

// Load the schema text from the classpath resource and parse it.
try (InputStream inputStream = MessageTypeProvider.class.getResourceAsStream("/test-msg.message")) {
    String schema = new String(inputStream.readAllBytes(), StandardCharsets.UTF_8);
    MessageType msgType = MessageTypeParser.parseMessageType(schema);
    try (ParquetWriter<Group> writer = new ParquetWriterBuilder(Path.of("test.parquet"), msgType).withWriteMode(OVERWRITE).build()) {
        Group record = new SimpleGroupFactory(msgType).newGroup();
        record.add("uuid", UUID.randomUUID().toString());
        record.add("data_required32", 32);
        record.add("data_required64", 64L);
        // data_optional_missing is intentionally left unset.
        record.add("data_optional_existing", "hello");
        // Two values for the repeated field: this is the case that fails to open.
        record.add("data_repeated", 10);
        record.add("data_repeated", 20);
        Group data = record.addGroup("data_group");
        data.add("nested", "nested!");
        // Two entries in the LIST-annotated group.
        Group list = record.addGroup("data_list");
        Group el1 = list.addGroup("list");
        el1.add("element", "element1");
        Group el2 = list.addGroup("list");
        el2.add("element", "element2");
        writer.write(record);
    }
}
