Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ZEPPELIN-6135] Fix clean goal not removing interpreter/ output(Spark,Flink) #4881

Merged
merged 5 commits into from
Nov 6, 2024

Conversation

tbonelee
Copy link
Contributor

What is this PR for?

The Maven clean goal does not remove some interpreter files in interpreter/ directory as expected.
Specifically, the spark and flink interpreters are not cleaned.

This happens because when extending the pom.xml in the pom.xml of these interpreters, they lack the interpreter.name property.
As a result, the clean target directory configured in zeppelion-interpreter-parent was not properly resolved before.

What type of PR is it?

Bug Fix

Todos

  • - Task

What is the Jira issue?

How should this be tested?

  • Build Zeppelin
  • Clean Zeppelin and check if all interpreters are removed in interpreter/

Screenshots (if appropriate)

Questions:

  • Does the license files need to update? No
  • Is there breaking changes for older versions? No
  • Does this needs documentation? No

@tbonelee tbonelee force-pushed the fix-interparent-clean branch from 044c344 to 98df2ee Compare October 21, 2024 12:40
@Reamer
Copy link
Contributor

Reamer commented Oct 21, 2024

Shouldn't the clean be done from the sub-module e.g. spark/interpreter?

@tbonelee
Copy link
Contributor Author

@Reamer
Thanks for pointing that out. Just to confirm, you're saying that it makes sense for the clean goal to run in the module where the output is created, right? I updated it to run from spark/interpreter based on that.

Also, quick question, is it okay to delete the entire interpreter/spark directory, including the spark-scala interpreter JARs? I used a wildcard for simplicity, but if those files need to stay, I'll adjust it to only delete specific directories.

@Reamer
Copy link
Contributor

Reamer commented Oct 30, 2024

In general, this interpreter jar cleanup should already work.
The special clean goal in the interpreter parent should be triggered by the Flink submodule flink-scala-2.12 and the Spark submodule spark-interpreter
You can find the cleanup goal here.

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-clean-plugin</artifactId>
<configuration>
<filesets>
<fileset>
<directory>${project.basedir}/../interpreter/${interpreter.name}</directory>
<followSymlinks>false</followSymlinks>
</fileset>
</filesets>
</configuration>
</plugin>

Maybe it doesn't work because of the special submodule structure. I will check this out.

@Reamer
Copy link
Contributor

Reamer commented Oct 30, 2024

The path is not correct due to the submodule structure in Flink and Spark. Not to make it too complicated, I would be in favor of your first contribution suggestion. Put the interpreter name in the parent pom.xml of Spark and Flink. Theoretically you could then remove it in the corresponding submodules.
It is desirable that the complete folder is removed.

In general, I don't think it's good that all interpreters store files outside. But this can be adjusted in the future.

Thank you for finding the bug.

@tbonelee
Copy link
Contributor Author

tbonelee commented Nov 3, 2024

@Reamer Thanks for checking things out!
I've reverted the second commit to go with the simpler solution you suggested. I also removed the unnecessary clean tasks in the Spark interpreter submodules.
Could you please check if these changes align with your suggestions?

@Reamer
Copy link
Contributor

Reamer commented Nov 5, 2024

Just two minor lines.
I think we can delete the following lines

<interpreter.name>flink</interpreter.name>

<interpreter.name>spark</interpreter.name>

What do you think?

@tbonelee
Copy link
Contributor Author

tbonelee commented Nov 5, 2024

@Reamer I agree with you; it's better to keep a single source of truth. I've updated these as you suggested.

Copy link
Contributor

@Reamer Reamer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Reamer Reamer merged commit f172b91 into apache:master Nov 6, 2024
16 of 17 checks passed
asfgit pushed a commit that referenced this pull request Nov 6, 2024
…,Flink)

### What is this PR for?
The Maven clean goal does not remove some interpreter files in `interpreter/` directory as expected.
Specifically, the `spark` and `flink` interpreters are not cleaned.

This happens because when extending the `pom.xml` in the `pom.xml` of these interpreters, they lack the `interpreter.name` property.
As a result, the clean target directory configured in `zeppelion-interpreter-parent` was not properly resolved before.

### What type of PR is it?
Bug Fix

### Todos
* [ ] - Task

### What is the Jira issue?
* Open an issue on Jira https://issues.apache.org/jira/browse/ZEPPELIN-6135

### How should this be tested?
- Build Zeppelin
- Clean Zeppelin and check if all interpreters are removed in `interpreter/`

### Screenshots (if appropriate)

### Questions:
* Does the license files need to update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Closes #4881 from tbonelee/fix-interparent-clean.

Signed-off-by: Philipp Dallig <[email protected]>
(cherry picked from commit f172b91)
Signed-off-by: Philipp Dallig <[email protected]>
@Reamer
Copy link
Contributor

Reamer commented Nov 6, 2024

Merge to master/branch-0.12

@tbonelee tbonelee deleted the fix-interparent-clean branch November 9, 2024 12:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants