This repository has been archived by the owner on Jun 14, 2024. It is now read-only.
Problem Statement
Hyperspace does NOT support indexing over columns/fields of type array of struct.
Background and Motivation
There are use cases where queries like the ones below can benefit greatly, performance-wise, from Hyperspace's indexing.
Proposed Solution
This design is based on the work of proposals #347 and #365.
Given the following dataset
Anyone should be able to create an index with:
Alternatives
N/A
Known/Potential Compatibility Issues
N/A
Design
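Adding the support for nested fields impacts the following areas:
Validate nested column names
Modify the create index action
Modify the filter and rule index functions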
Creating the index
As seen above, the index may be created with the already existing Hyperspace APIs.
⚠️Important⚠️
It will NOT be possible to support multiple fields from a nested array because of how Spark currently works: arrays need to be exploded to create the proper index, and Spark allows only one generator per select clause.
Under the hood the index may be created with something like this:
The resulting data frame will be:
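As a rough illustration of the explode step and the shape of the rows it yields, here is a minimal pure-Python sketch. It does not use Spark or Hyperspace. The dataset, the column names (`id`, `items`, `sku`), the helper names, and the `__` separator used to flatten the dotted name are all hypothetical assumptions made for this example; the actual renaming pattern is the one defined in #365.

```python
from typing import Any, Dict, Iterator, List

# Hypothetical separator used to avoid "." in stored column names.
# The real pattern is whatever #365 implements; this is only a stand-in.
SEPARATOR = "__"


def to_index_name(nested_name: str) -> str:
    """Flatten a dotted field path into a dot-free column name."""
    return nested_name.replace(".", SEPARATOR)


def to_query_name(index_name: str) -> str:
    """Restore the dotted field path when projecting at query time."""
    return index_name.replace(SEPARATOR, ".")


def explode_field(
    records: List[Dict[str, Any]],
    array_col: str,
    field: str,
    included: List[str],
) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per array element, keeping a single struct field.

    This mirrors the semantics of Spark's `explode`: records whose array
    is empty produce no output rows.
    """
    column = to_index_name(f"{array_col}.{field}")
    for record in records:
        for element in record[array_col]:
            flat = {column: element[field]}
            for col in included:
                flat[col] = record[col]
            yield flat


# Hypothetical source rows: `items` is an array of structs.
rows = [
    {"id": 1, "items": [{"sku": "a", "qty": 2}, {"sku": "b", "qty": 1}]},
    {"id": 2, "items": [{"sku": "a", "qty": 5}]},
    {"id": 3, "items": []},
]

# Index on the nested field items.sku, with id as an included column.
index_rows = list(explode_field(rows, "items", "sku", ["id"]))

# Project the stored names back to the names the query used.
projected = [{to_query_name(k): v for k, v in r.items()} for r in index_rows]
```

Only one array is exploded per pass, which matches the single-generator limitation noted above. The simple string replacement is also lossy if an original column name already contains the separator, so a real implementation would need a pattern that cannot collide.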
It is important to understand that the index column is stored as a non-nested column. Because of Parquet's quirks with a . (dot) in a field name, the column has to be renamed when the index is written and projected back under its original name at query time. This will be based on the renaming pattern already implemented in #365.
Search query
Given the following search/filter query
More to come...
Join Queries
TBD
Implementation
Performance Implications (if applicable)
N/A