PPL: Add json_set and json_extend command to spark-ppl #1038

Open
wants to merge 4 commits into
base: main

@acarbonetto acarbonetto commented Feb 7, 2025

Description

Adds json_set and json_extend functions to the spark PPL UDF.

Related Issues

Resolves #996

Check List

  • Updated documentation (docs/ppl-lang/README.md)
  • Implemented unit tests
  • Implemented tests for combination with other commands
  • New added source code should include a copyright header
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Andrew Carbonetto <[email protected]>
@YANG-DB YANG-DB enabled auto-merge (squash) February 7, 2025 04:35
@YANG-DB YANG-DB disabled auto-merge February 7, 2025 04:35
@acarbonetto acarbonetto changed the title PPL: Add json_extend command to spark-ppl PPL: Add json_set and json_extend command to spark-ppl Feb 7, 2025
Signed-off-by: Andrew Carbonetto <[email protected]>
Member

@LantaoJin LantaoJin left a comment

Could you add some ITs that test these functions in combination with other JSON functions?

@@ -223,10 +223,6 @@ public enum BuiltinFunctionName {
JSON_EXTRACT(FunctionName.of("json_extract")),
JSON_KEYS(FunctionName.of("json_keys")),
JSON_VALID(FunctionName.of("json_valid")),
// JSON_DELETE(FunctionName.of("json_delete")),
Member

Hmm, how did json_delete work if we didn't include it here?

Author

It's not a built-in function; it's a user-defined function.

@LantaoJin
Member

cc @qianheng-aws

Contributor

@qianheng-aws qianheng-aws left a comment

Please add ITs for this PR and make sure the examples you add to the documentation are all correct.

val jsonObjExp =
Literal("""{"a":[{"b":1},{"c":2}]}""")
val jsonFunc =
Alias(visit("json_delete", util.List.of(jsonObjExp, keysExpression)), "result")()
Contributor

typo: json_delete -> json_set

@@ -278,12 +301,57 @@ Example:
|{"teacher":["Alice","Tom","Walt"],"student":[{"name":"Bob","rank":1},{"name":"Charlie","rank":2}]} |
+-----------------------------------------------------------------------------------------------------------------------------------+


os> source=people | eval append = json_append(`{"school":{"teacher":["Alice"],"student":[{"name":"Bob","rank":1},{"name":"Charlie","rank":2}]}}`,array('school.teacher', 'Tom', 'Walt')) | head 1 | fields append
os> source=people | eval append = json_append(`{"school":{"teacher":["Alice"],"student":[{"name":"Bob","rank":1},{"name":"Charlie","rank":2}]}}`,array('school.teacher', array('Tom', 'Walt'))) | head 1 | fields append
Contributor

Have you checked this PPL? Does it work?

As far as I know, Spark's array function won't accept elements of different types.


Example:

os> source=people | eval extend = json_extend(`{"teacher":["Alice"],"student":[{"name":"Bob","rank":1},{"name":"Charlie","rank":2}]}`, 'student', '{"name":"Tommy","rank":5}') | head 1 | fields extend
Contributor

Same problem as in json_set: as defined in the code, these two functions take only 2 parameters, but 3 are passed here.


Example:

os> source=people | eval updated = json_set('{"a":[{"b":1},{"b":2}]}', '$.a[*].b', 3) | head 1 | fields updated
Contributor

The json_set syntax in this PPL doesn't match the syntax used in the UT. Will this PPL really work? Please double-check it and add similar ITs for the newly added functions.

String currentKey = pathParts[depth];

if (depth == pathParts.length - 1) {
// If it's the last key, append to the array
Contributor

This comment isn't suitable here.

* @param depth - current traversal depth
* @param valueToUpdate - value to update
*/
static void updateNestedValue(Object currentObj, String[] pathParts, int depth, Object valueToUpdate) {
Contributor

Can we reuse the code from appendNestedValue? These two functions seem to be mostly duplicated code, differing only in the final operation: set or append.

Author

I was thinking the same thing. I will try and consolidate today - maybe with another functional argument.
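
For illustration, a minimal sketch of what such a consolidation could look like, with a single traversal that takes the final operation as a functional argument. This is not the code from this PR: the names NestedJsonUpdater, LeafOperation, applyAtPath, SET and APPEND are hypothetical, and the traversal rules (fanning out over arrays, silently skipping missing keys, creating the key on set) are assumptions made only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one shared traversal, with the leaf operation passed in. */
class NestedJsonUpdater {

    /** Operation applied to the parent map once the last path key is reached. */
    @FunctionalInterface
    interface LeafOperation {
        void apply(Map<String, Object> parent, String key, Object value);
    }

    /** Assumed "set" semantics: overwrite (or create) the value under the key. */
    static final LeafOperation SET = (parent, key, value) -> parent.put(key, value);

    /** Assumed "append" semantics: add to the list under the key, if there is one. */
    static final LeafOperation APPEND = (parent, key, value) -> {
        Object existing = parent.get(key);
        if (existing instanceof List) {
            @SuppressWarnings("unchecked")
            List<Object> list = (List<Object>) existing;
            list.add(value);
        }
    };

    /** Walks pathParts from the given depth and applies op at the final key. */
    @SuppressWarnings("unchecked")
    static void applyAtPath(Object current, String[] pathParts, int depth,
                            Object value, LeafOperation op) {
        if (current instanceof List) {
            // Fan out over array elements without consuming a path part.
            for (Object element : (List<Object>) current) {
                applyAtPath(element, pathParts, depth, value, op);
            }
            return;
        }
        if (!(current instanceof Map) || depth >= pathParts.length) {
            return; // Scalar value or exhausted path: nothing to traverse.
        }
        Map<String, Object> map = (Map<String, Object>) current;
        String currentKey = pathParts[depth];
        if (depth == pathParts.length - 1) {
            // Last key: delegate to the caller-supplied operation.
            op.apply(map, currentKey, value);
        } else if (map.containsKey(currentKey)) {
            applyAtPath(map.get(currentKey), pathParts, depth + 1, value, op);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("b", 1);
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("b", 2);
        List<Object> items = new ArrayList<>();
        items.add(first);
        items.add(second);
        List<Object> teachers = new ArrayList<>();
        teachers.add("Alice");

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("a", items);
        doc.put("teacher", teachers);

        applyAtPath(doc, new String[] {"a", "b"}, 0, 3, SET);
        applyAtPath(doc, new String[] {"teacher"}, 0, "Tom", APPEND);
        System.out.println(doc); // prints {a=[{b=3}, {b=3}], teacher=[Alice, Tom]}
    }
}
```

With this shape, a set-style and an append-style helper would each reduce to a one-line call into the shared traversal, so path handling lives in one place and only the leaf behaviour differs.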

Successfully merging this pull request may close these issues.

[PPL-Lang] PPL support json_set, json_extend functions
3 participants