Update Dockerfile for alpine v3 / liberica v21 #8111

Closed
wants to merge 1 commit into from

Conversation

hawko2600
Contributor

Alter Dockerfile to rebase on Alpine Linux v3 with Liberica OpenJDK v21 for NiFi 2.0.0

Summary

NIFI-00000

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 21

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

Contributor

@ChrisSamo632 ChrisSamo632 left a comment

Thanks for proposing this change @hawko2600. I think we'd want to keep all docker builds consistent with one another.

As a minimum, you'll want to update the docker maven build for nifi, along with the docker hub file.

I think it would be best to also update the docker images for:

Each component has a Docker Hub setup (used to build images for Docker Hub, based on released component binaries on the Apache distribution servers) and a Docker Maven setup (used to build an image from a local build of the repo, and also used for testing the Docker builds).

Contributor

@exceptionfactory exceptionfactory left a comment

Thanks @hawko2600, I concur with @ChrisSamo632 that it would be helpful to apply this change to all Dockerfiles for consistency.

@exceptionfactory
Contributor

@hawko2600 Please review the pull request checklist and follow the steps to create an Apache NiFi Jira issue for tracking these changes as well. Thanks!

@exceptionfactory
Contributor

@hawko2600 On further consideration, moving to Alpine does not look like a feasible approach for the standard container images.

The primary reason is the difference in platforms with musl for Alpine versus glibc for Debian, as described in the following article:

https://megamorf.gitlab.io/2020/05/06/why-it-s-better-not-to-use-alpine-linux-for-python-projects/

With NiFi 2.0.0 supporting native Python Processors, having general compatibility with Python C libraries is important.

In terms of actual container size, the current difference between the Debian and Alpine images is around 75 MB. Although size optimization is important, NiFi binaries make up the primary share of container size.

It is also worth highlighting that the current container images are targeted for general use cases, and particular deployment environments may have other requirements.

There are other potential improvements for the current container configuration, so some of this discussion may be worth continuing in Jira.

In light of the platform concerns with Alpine and Python, I am closing the pull request for now. Thanks for proposing this alternative approach.
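
The compatibility concern above shows up most concretely in binary wheel platform tags: `manylinux` wheels are built against glibc, `musllinux` wheels against musl, and pip on a musl-based system cannot install `manylinux` wheels, so it falls back to a `musllinux` wheel or a source build. The following is a minimal sketch of that distinction, not part of this pull request; `wheel_libc_target` is a hypothetical helper and the filenames in the usage note are illustrative only.

```python
def wheel_libc_target(filename: str) -> str:
    """Classify a wheel filename by the C library its platform tag targets."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")

    # Wheel names end with -{python tag}-{abi tag}-{platform tag}.whl,
    # so the platform tag is the last dash-separated field.
    platform_tag = filename[: -len(".whl")].split("-")[-1]

    targets = set()
    # A compressed tag set lists several platform tags separated by dots.
    for tag in platform_tag.split("."):
        if tag.startswith("manylinux"):
            targets.add("glibc")
        elif tag.startswith("musllinux"):
            targets.add("musl")
        elif tag == "any":
            targets.add("pure-python")
        else:
            targets.add("other")
    return "+".join(sorted(targets))


if __name__ == "__main__":
    for name in (
        "example-1.0.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "example-1.0.0-cp311-cp311-musllinux_1_1_x86_64.whl",
        "example-1.0.0-py3-none-any.whl",
    ):
        print(name, "->", wheel_libc_target(name))
```

Packages that publish only `manylinux` wheels are the ones that would need compiling on a musl-based image, which is where general compatibility with Python C libraries becomes a practical question.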

@joewitt
Contributor

joewitt commented Dec 4, 2023

Ahhh that is a great point @exceptionfactory. It might make sense to support/include a non python Java 2.0 based image on Alpine. That would end up a good bit smaller too. Something to consider as an alternative JIRA/PR

@hawko2600
Contributor Author

> @hawko2600 On further consideration, moving to Alpine does not look like a feasible approach for the standard container images.
>
> The primary reason is the difference in platforms with musl for Alpine versus glibc for Debian, as described in the following article:
>
> https://megamorf.gitlab.io/2020/05/06/why-it-s-better-not-to-use-alpine-linux-for-python-projects/
>
> With NiFi 2.0.0 supporting native Python Processors, having general compatibility with Python C libraries is important.
>
> In terms of actual container size, the current difference between the Debian and Alpine images is around 75 MB. Although size optimization is important, NiFi binaries make up the primary share of container size.
>
> It is also worth highlighting that the current container images are targeted for general use cases, and particular deployment environments may have other requirements.
>
> There are other potential improvements for the current container configuration, so some of this discussion may be worth continuing in Jira.
>
> In light of the platform concerns with Alpine and Python, I am closing the pull request for now. Thanks for proposing this alternative approach.

It's unfortunate that that website from 3 years ago contains verifiable lies. For example, it states that there is no CVE list for Alpine, whereas anyone can see from the alpinelinux.org homepage that security.alpinelinux.org is linked directly and contains a comprehensive list.

I also find the performance issue contrived: the example given compares installing pre-compiled apks against compiling and installing multiple packages from source. It claims the pre-compiled version took over 16 minutes whereas the needs-compiling version took only 12 seconds. There's no way that stacks up in the real world! The only way to achieve this would be to pre-compile the wheel files and include them in the Docker image so that the pip install only takes I/O unpacking time, which would amount to about ... 12 seconds. Obviously, 16 minutes to install apks is because they set the network interface to 2Kb/s and let it download from an Alpine mirror on the other side of the planet. Absolutely zero evidence was supplied, as a normal shootout would supply, to show that the premise is being tested rather than external factors.

The main complaint about using MUSL stdlib is that "it might" be different to glibc. Well, it better be, or they failed in their mission to remove the bloat. No evidence was produced for the claim. Presumably, if I am on a Linux system with the debug version of glibc and the debug version of musl, then the difference in debugging is that with one I run the app through gdb and the other I ... run the app through gdb? There's absolutely zero difference. It's just FUD.

And in any case, it's a moot point because the images here are based on the glibc version of Alpine, which I did deliberately because I'm well aware that some libraries insist on being glibc-compatible only and we have an open-ended functionality presented through the new Python API interface.
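
One way to check which C library an image actually provides is to ask a Python interpreter running inside the container. This is a minimal sketch, assuming CPython is available in the image; `describe_libc` is a hypothetical helper name. An empty result from `platform.libc_ver()` only means glibc was not detected, so the fallback message is a hint rather than a reliable musl detector.

```python
import platform


def describe_libc() -> str:
    """Report the C library the running interpreter appears to be linked against."""
    # libc_ver() returns a (lib, version) tuple; both are empty strings
    # when it cannot determine the library.
    lib, version = platform.libc_ver()
    if lib:
        return f"{lib} {version}".strip()
    return "unknown (glibc not detected; possibly musl or a non-Linux platform)"


if __name__ == "__main__":
    print(describe_libc())
```

Running this in both the Debian-based and the Alpine-based image would give a quick, if rough, confirmation of the glibc base before testing Python processors that depend on native libraries.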

@joewitt
Contributor

joewitt commented Dec 5, 2023

I am still interested in this happening. If the claims aren't valid, we just need to run this thing. Anything which makes the builds smaller is a win, but we should be consistent. We need a JIRA, and the various docker images updated.

@exceptionfactory
Contributor

Thanks for the thoughtful reply @hawko2600. On further review, referencing that post was short-sighted on my part. The list of vulnerabilities claim caught my eye as unsubstantiated, but I should have taken a closer look at the other claims.

I also should have taken a closer look at the liberica-openjdk-alpine image description, noting the glibc base instead of musl, which I expected from other Alpine containers. I see from the image tag details that this does indeed use the glibc base.

This is a good example of where having more thorough background on the change would have helped provide a better initial evaluation.

If you are still willing to put time into this, I would be glad to revisit the changes. As mentioned initially, if you can create a Jira issue for tracking, that would also help capture the background rationale and the importance of the fact that this Alpine image is based on glibc as opposed to musl.

4 participants