PPL: Add `json_set` and `json_extend` command to spark-ppl #1038
base: main
Conversation
Signed-off-by: Andrew Carbonetto <[email protected]>
Could you add some ITs that test integration with multiple JSON functions?
@@ -223,10 +223,6 @@ public enum BuiltinFunctionName {
    JSON_EXTRACT(FunctionName.of("json_extract")),
    JSON_KEYS(FunctionName.of("json_keys")),
    JSON_VALID(FunctionName.of("json_valid")),
-   // JSON_DELETE(FunctionName.of("json_delete")),
Hmm, how did `json_delete` work if we didn't include it here?
It's not a built-in function, but a user-defined function.
Please add an IT for this PR, and make sure the examples you added to the documentation are all correct.
val jsonObjExp =
  Literal("""{"a":[{"b":1},{"c":2}]}""")
val jsonFunc =
  Alias(visit("json_delete", util.List.of(jsonObjExp, keysExpression)), "result")()
Typo: `json_delete` -> `json_set`
@@ -278,12 +301,57 @@ Example:
|{"teacher":["Alice","Tom","Walt"],"student":[{"name":"Bob","rank":1},{"name":"Charlie","rank":2}]} |
+-----------------------------------------------------------------------------------------------------------------------------------+

-os> source=people | eval append = json_append(`{"school":{"teacher":["Alice"],"student":[{"name":"Bob","rank":1},{"name":"Charlie","rank":2}]}}`,array('school.teacher', 'Tom', 'Walt')) | head 1 | fields append
+os> source=people | eval append = json_append(`{"school":{"teacher":["Alice"],"student":[{"name":"Bob","rank":1},{"name":"Charlie","rank":2}]}}`,array('school.teacher', array('Tom', 'Walt'))) | head 1 | fields append
Have you checked this PPL? Does it work? As far as I know, the `array` function in Spark won't accept elements of different types.
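For context, Spark SQL's built-in `array` function resolves all arguments to a least common type; a plain string and a nested array have none, so the call fails analysis. A minimal spark-sql illustration of the assumed behavior (not taken from this PR):

```sql
-- OK: all elements share the type STRING
SELECT array('school.teacher', 'Tom', 'Walt');

-- Fails analysis: STRING and ARRAY<STRING> have no common type,
-- so array() cannot build a homogeneous ARRAY from these arguments
SELECT array('school.teacher', array('Tom', 'Walt'));
```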
Example:

os> source=people | eval extend = json_extend(`{"teacher":["Alice"],"student":[{"name":"Bob","rank":1},{"name":"Charlie","rank":2}]}`, 'student', '{"name":"Tommy","rank":5}') | head 1 | fields extend
The same problem as in `json_set`. These two functions take only 2 parameters as defined in the code, but 3 are given here.
Example:

os> source=people | eval updated = json_set('{"a":[{"b":1},{"b":2}]}', '$.a[*].b', 3) | head 1 | fields updated
The syntax of `json_set` in this PPL doesn't match that in the UT. Will this PPL really work? Please double-check this, and add a similar IT for the newly added functions.
String currentKey = pathParts[depth];

if (depth == pathParts.length - 1) {
    // If it's the last key, append to the array
The comment here isn't suitable: it describes appending, but this function sets the value.
 * @param depth - current traversal depth
 * @param valueToUpdate - value to update
 */
static void updateNestedValue(Object currentObj, String[] pathParts, int depth, Object valueToUpdate) {
Can we reuse the code from appendNestedValue? These two functions seem to have mostly duplicated code, differing only in their final operation -- set or append.
I was thinking the same thing. I will try and consolidate today - maybe with another functional argument.
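The consolidation discussed above could be sketched roughly as follows: one shared traversal that walks the path, with the leaf operation (set vs. append) injected as a functional argument. This is a hypothetical sketch over a plain nested-Map JSON model; the helper name `traverseNestedValue` and the map-based representation are assumptions, not the PR's actual code.

```java
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

public class NestedUpdateSketch {

    /**
     * Walk pathParts down nested Maps starting at depth; apply leafOp
     * to the enclosing map and final key once the path is consumed.
     */
    static void traverseNestedValue(Object currentObj, String[] pathParts, int depth,
                                    BiConsumer<Map<String, Object>, String> leafOp) {
        if (!(currentObj instanceof Map)) {
            return; // path does not resolve to a JSON object; nothing to do
        }
        @SuppressWarnings("unchecked")
        Map<String, Object> map = (Map<String, Object>) currentObj;
        String currentKey = pathParts[depth];
        if (depth == pathParts.length - 1) {
            leafOp.accept(map, currentKey); // set or append, decided by the caller
        } else {
            traverseNestedValue(map.get(currentKey), pathParts, depth + 1, leafOp);
        }
    }

    /** json_set-style leaf operation: overwrite the value at the final key. */
    static void updateNestedValue(Object obj, String[] pathParts, Object valueToUpdate) {
        traverseNestedValue(obj, pathParts, 0, (map, key) -> map.put(key, valueToUpdate));
    }

    /** json_append-style leaf operation: add to an existing list at the final key. */
    @SuppressWarnings("unchecked")
    static void appendNestedValue(Object obj, String[] pathParts, Object valueToAppend) {
        traverseNestedValue(obj, pathParts, 0, (map, key) -> {
            Object existing = map.get(key);
            if (existing instanceof List) {
                ((List<Object>) existing).add(valueToAppend);
            }
        });
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new java.util.HashMap<>();
        doc.put("a", new java.util.HashMap<>(Map.of("b", 1)));
        updateNestedValue(doc, new String[] {"a", "b"}, 3);
        System.out.println(doc); // the nested value at a.b is now 3
    }
}
```

With this shape, each public function is a one-line wrapper, and any shared error handling or path parsing lives in the traversal alone.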
Description
Adds `json_set` and `json_extend` functions to the Spark PPL UDFs.

Related Issues
Resolves #996
Check List
--signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.