Merge pull request #551 from boozallen/540-harden-hive-image
[#540] update jars/libs in hive image
ewilkins-csi authored Jan 29, 2025
2 parents 91f5cf4 + 01a61ca commit 4f59edc
Showing 15 changed files with 434 additions and 74 deletions.
10 changes: 9 additions & 1 deletion DRAFT_RELEASE_NOTES.md
@@ -24,6 +24,7 @@ _Note: instructions for adapting to these changes are outlined in the upgrade in
| `AIOpsModelInstanceRepostory` | `AissembleModelInstanceRepository` |
| `AiopsMdaJsonUtils` | `AissembleMdaJsonUtils` |
- To improve the development cycle and Docker build consistency, we have deprecated the `docker_build()` and `local_resources()` functions in Tilt and enabled the Maven Docker build for the Docker modules. Follow the instructions in `Finalizing the Upgrade` to avoid duplicate Docker image builds.
- To harden the `aissemble-hive-service` image, several changes were made that may impact projects with Hive customizations.


# Known Issues
@@ -104,7 +105,14 @@ To avoid duplicate docker builds, remove all the related `docker_build()` and `l

## Conditional Steps

## AWS IRSA (IAM Roles Service Account) Authentication
### For projects that have customized the Hive service
Several changes were made to both the Hive service Docker image and the Hive service chart included as part of a project's Spark Infrastructure chart. The defaults have been adjusted so that these changes should be transparent; however, depending on the nature of your customizations, this may not always hold true. The following changes may affect your customizations and may need to be accounted for:
- The image is now only the Hive Standalone Metastore service and cannot function as a full [Hive Server](https://hive.apache.org/development/quickstart/)
- The Java installation at `/opt/java` is no longer symlinked to `/opt/jre` -- `JAVA_HOME` has been adjusted accordingly by default
- The default working directory for the `aissemble-hive-service` image was changed from `/opt` to `/opt/hive`
- Schema initialization is no longer done as part of an `initContainer` in the `aissemble-hive-service-chart` and is instead done in a new `entrypoint` script. This is consistent with the [official `apache/hive` Docker image](https://hub.docker.com/r/apache/hive).
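
As a rough aid for the changes listed above, the following sketch scans a customization file for references to the removed `/opt/jre` symlink. The helper name `find_legacy_jre_refs`, the sample file contents, and the `my-registry` image name are hypothetical and not part of aiSSEMBLE; the other listed changes (working directory, schema initialization) would need their own checks.

```shell
#!/bin/sh
# Hypothetical helper (not part of aiSSEMBLE): print, with line numbers, every
# line of the given file that still references the removed /opt/jre symlink.
find_legacy_jre_refs() {
  # "|| true" keeps the helper from failing when nothing matches.
  grep -n '/opt/jre' "$1" || true
}

# Demonstration against a throwaway file standing in for a custom Dockerfile.
tmpfile=$(mktemp)
printf '%s\n' \
  'FROM my-registry/aissemble-hive-service:latest' \
  'ENV JAVA_HOME=/opt/jre' \
  'RUN ls /opt/hive/lib' > "$tmpfile"

find_legacy_jre_refs "$tmpfile"   # prints 2:ENV JAVA_HOME=/opt/jre
rm -f "$tmpfile"
```

Any reported line is a candidate for review against the new default `JAVA_HOME`.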

### AWS IRSA (IAM Roles Service Account) Authentication
This step is not required, but it is the recommended way to authenticate with AWS services.
1. [Create an IAM OIDC provider for your cluster](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html)
2. Follow the [Assign IAM roles to Kubernetes service accounts](https://docs.aws.amazon.com/eks/latest/userguide/associate-service-account-role.html) document but **skip** the step that creates the service account
5 changes: 3 additions & 2 deletions build-parent/pom.xml
@@ -35,6 +35,7 @@
<version.fermenter>2.10.5</version.fermenter>
<version.fermenter.legacy.tools>2.8.0</version.fermenter.legacy.tools>
<version.habushu.plugin>2.17.0</version.habushu.plugin>
<version.hive>4.0.0</version.hive>
<version.python>3.11.4</version.python>
<version.help.plugin>3.5.0</version.help.plugin>
<version.krausening>20</version.krausening>
@@ -87,9 +88,9 @@
<version.sedona>1.4.0</version.sedona>
<version.geotools.wrapper>1.4.0-28.2</version.geotools.wrapper>
<version.mysql-connector>8.0.30</version.mysql-connector>
<version.hadoop>3.3.4</version.hadoop>
<version.hadoop>3.4.1</version.hadoop>
<version.neo4j>4.1.5_for_spark_3</version.neo4j>
<version.aws.sdk.bundle>1.12.262</version.aws.sdk.bundle>
<version.aws.sdk.bundle>1.12.780</version.aws.sdk.bundle>
<version.baton>1.1.1</version.baton>
<version.quarkus.cucumber>1.0.0</version.quarkus.cucumber>
<version.quarkus.cucumber.java>${version.cucumber}</version.quarkus.cucumber.java>
152 changes: 141 additions & 11 deletions extensions/extensions-docker/aissemble-hive-service/pom.xml
@@ -15,8 +15,9 @@
<packaging>docker-build</packaging>

<properties>
<dockerbuild.jars.directory>target/dockerbuild/jars</dockerbuild.jars.directory>
<version.hive.metastore>4.0.0</version.hive.metastore>
<dockerbuild.jars.directory>target/dockerbuild/lib</dockerbuild.jars.directory>
<dockerbuild.bin.directory>target/dockerbuild/bin</dockerbuild.bin.directory>
<dockerbuild.patch.directory>target/dockerbuild/patch</dockerbuild.patch.directory>
</properties>

<dependencies>
@@ -26,9 +27,73 @@
<version>3.2.0</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>${version.mysql-connector}</version>
<groupId>org.apache.hive</groupId>
<artifactId>hive-standalone-metastore-server</artifactId>
<version>${version.hive}</version>
<classifier>bin</classifier>
<type>tar.gz</type>
</dependency>

<!-- Patch JARs -->
<dependency>
<groupId>io.airlift</groupId>
<artifactId>aircompressor</artifactId>
<version>0.27</version>
</dependency>
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro</artifactId>
<version>1.11.4</version>
</dependency>
<dependency>
<groupId>dnsjava</groupId>
<artifactId>dnsjava</artifactId>
<version>3.6.2</version>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-protobuf</artifactId>
<version>1.70.0</version>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>33.4.0-jre</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>2.18.2</version>
</dependency>
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-io</artifactId>
<version>9.4.57.v20241219</version>
</dependency>
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-codec-http2</artifactId>
<version>4.1.117.Final</version>
</dependency>
<dependency>
<groupId>com.nimbusds</groupId>
<artifactId>nimbus-jose-jwt</artifactId>
<version>9.48</version>
</dependency>
<dependency>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java</artifactId>
<version>3.25.5</version>
</dependency>
<dependency>
<groupId>org.xerial.snappy</groupId>
<artifactId>snappy-java</artifactId>
<version>1.1.10.7</version>
</dependency>
<dependency>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
<version>3.9.3</version>
</dependency>
</dependencies>

@@ -42,34 +107,99 @@
<image>
<build>
<args>
<METASTORE_VERSION>${version.hive.metastore}</METASTORE_VERSION>
<METASTORE_VERSION>${version.hive}</METASTORE_VERSION>
<HADOOP_VERSION>${version.hadoop}</HADOOP_VERSION>
<JARS_DIR>${dockerbuild.jars.directory}</JARS_DIR>
<BIN_DIR>${dockerbuild.bin.directory}</BIN_DIR>
<PATCH_DIR>${dockerbuild.patch.directory}</PATCH_DIR>
</args>
</build>
</image>
</images>
</configuration>
<executions>
<execution>
<id>test-image</id>
<phase>integration-test</phase>
<configuration>
<images>
<image>
<run>
<network>
<!-- bridged mode (the default) does not work reliably for direct tcp verification.
See https://github.com/fabric8io/docker-maven-plugin/issues/1234#issuecomment-2609924715 -->
<mode>host</mode>
</network>
<wait>
<tcp>
<ports>
<port>9083</port>
</ports>
</tcp>
</wait>
</run>
</image>
</images>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<executions>
<execution>
<id>test-image-cleanup</id>
<phase>post-integration-test</phase>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<configuration>
<outputDirectory>${dockerbuild.jars.directory}</outputDirectory>
<includeTypes>jar</includeTypes>
<excludeTransitive>true</excludeTransitive>
<includeArtifactIds>delta-hive_2.12,mysql-connector-java</includeArtifactIds>
</configuration>
<executions>
<execution>
<id>unpack</id>
<id>get-extra-libs</id>
<phase>prepare-package</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<outputDirectory>${dockerbuild.jars.directory}</outputDirectory>
<includeTypes>jar</includeTypes>
<includeArtifactIds>delta-hive_2.12</includeArtifactIds>
</configuration>
</execution>
<execution>
<id>get-patch-jars</id>
<phase>prepare-package</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<outputDirectory>${dockerbuild.patch.directory}</outputDirectory>
<includeTypes>jar</includeTypes>
<excludeArtifactIds>delta-hive_2.12,fermenter-mda,baton-maven-plugin,habushu-maven-plugin</excludeArtifactIds>
<useRepositoryLayout>true</useRepositoryLayout>
</configuration>
</execution>
<execution>
<id>get-metastore</id>
<phase>prepare-package</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<outputDirectory>${dockerbuild.bin.directory}</outputDirectory>
<includeArtifactIds>hive-standalone-metastore-server</includeArtifactIds>
<stripVersion>true</stripVersion>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>

</project>
@@ -1,32 +1,75 @@
ARG METASTORE_VERSION
FROM docker.io/apache/hive:${METASTORE_VERSION} AS appsource
FROM docker.io/eclipse-temurin:17-jre AS builder

ARG METASTORE_VERSION
ARG HADOOP_VERSION
ARG JARS_DIR
ARG BIN_DIR
ARG PATCH_DIR

ENV HADOOP_HOME=/opt/hadoop
ENV HIVE_HOME=/opt/hive
ENV HIVE_VER=$METASTORE_VERSION

# Install hadoop
RUN cd /tmp \
    && wget https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}-lean.tar.gz -O - \
    | tar -xzf - \
    && mv hadoop-${HADOOP_VERSION} $HADOOP_HOME

# Use standalone binary instead of full-service version on original image
COPY ${BIN_DIR}/hive-standalone-metastore-server-bin.tar.gz /tmp
RUN cd /tmp \
    && tar -xf /tmp/hive-standalone-metastore-server-bin.tar.gz \
    && mv apache-hive-metastore-${METASTORE_VERSION}-bin $HIVE_HOME \
    && rm /tmp/hive-standalone-metastore-server-bin.tar.gz

# Update library jars used by Hive and Hadoop
COPY --chmod=755 ./src/main/resources/scripts/setup.sh $HIVE_HOME/setup.sh
ADD ${PATCH_DIR}/* /tmp/patch-jars
RUN $HIVE_HOME/setup.sh /tmp/patch-jars && rm -rf /tmp/patch-jars

# Hadoop ships with source jars, which aren't necessary and bloat the image, also remove YARN support as we only support K8s
RUN rm -rf $HADOOP_HOME/share/hadoop/common/sources \
    $HADOOP_HOME/share/hadoop/hdfs/sources \
    $HADOOP_HOME/share/hadoop/mapreduce/sources \
    $HADOOP_HOME/share/hadoop/tools/sources \
    $HADOOP_HOME/share/hadoop/yarn/*

# Patch for HIVE-28487: https://github.com/apache/hive/pull/5419
RUN sed -i \
    's/org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool/org.apache.hadoop.hive.metastore.tools.schematool.MetastoreSchemaTool/' \
    "$HIVE_HOME/bin/ext/schemaTool.sh"

FROM docker.io/eclipse-temurin:17-jre AS final

LABEL org.opencontainers.image.source="https://github.com/boozallen/aissemble"

WORKDIR /opt

ARG METASTORE_VERSION
ARG HADOOP_VERSION
ARG JARS_DIR
ARG BIN_DIR
ARG PATCH_DIR

ENV HADOOP_HOME=/opt/hadoop
ENV HIVE_HOME=/opt/hive
ENV HIVE_VER=$METASTORE_VERSION

COPY --from=builder $HADOOP_HOME $HADOOP_HOME
COPY --from=builder $HIVE_HOME $HIVE_HOME

COPY --from=appsource /opt/hadoop $HADOOP_HOME
COPY --from=appsource /opt/hive $HIVE_HOME
COPY --chmod=755 src/main/resources/scripts/entrypoint.sh /entrypoint.sh

RUN groupadd -rf hive --gid=1000 && \
useradd --home $HIVE_HOME -g hive --shell /usr/sbin/nologin --uid 1000 hive -o && \
chown hive:hive -R $HIVE_HOME && \
ln -s $JAVA_HOME /opt/jre
chown hive:hive -R $HIVE_HOME

ADD ${JARS_DIR}/* $HIVE_HOME/lib/

# Remove jars with open vulnerabilities. These jars are included in the apache hive image but not necessary
# when running the hive metastore only
RUN rm ${HIVE_HOME}/lib/avatica-1.12.0.jar ${HIVE_HOME}/lib/htrace-core-3.1.0-incubating.jar \
${HADOOP_HOME}/share/hadoop/yarn/timelineservice/lib/htrace-core-3.1.0-incubating.jar

USER hive
WORKDIR $HIVE_HOME

ENTRYPOINT ["/opt/hive/bin/hive", "--skiphadoopversion", "--skiphbasecp", "--verbose", "--service", "metastore"]
ENV VERBOSE=true
ENTRYPOINT ["/entrypoint.sh"]
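
The HIVE-28487 patch step in the builder stage above rewrites a class name inside `schemaTool.sh`. A minimal sketch of the same substitution, applied to a sample line instead of the real file, might look like the following. The helper name `patch_schematool_class` and the `CLASS=...` sample line are assumptions made for illustration; the actual contents of `schemaTool.sh` are not shown in this commit. The sketch uses `sed` as a filter rather than `sed -i` so it does not modify any file.

```shell
#!/bin/sh
# Hypothetical helper: applies the same substitution as the Dockerfile's
# HIVE-28487 patch, but as a stdin/stdout filter instead of an in-place edit.
patch_schematool_class() {
  sed 's/org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool/org.apache.hadoop.hive.metastore.tools.schematool.MetastoreSchemaTool/'
}

# Assumed sample line; the real schemaTool.sh may be laid out differently.
sample='CLASS=org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool'

echo "$sample" | patch_schematool_class
# prints CLASS=org.apache.hadoop.hive.metastore.tools.schematool.MetastoreSchemaTool
```

Running the filter a second time on already-patched text leaves it unchanged, because the old fully qualified name no longer appears in it.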
@@ -0,0 +1,65 @@
#!/bin/bash

###
# #%L
# aiSSEMBLE::Extensions::Docker::Hive Service
# %%
# Copyright (C) 2021 Booz Allen
# %%
# This software package is licensed under the Booz Allen Public License. All Rights Reserved.
# #L%
###


# DERIVED FROM apache/spark image entrypoint script
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0

set -x

DB_DRIVER=${DB_DRIVER:-derby}
if [[ $VERBOSE = "true" ]]; then
  VERBOSE_MODE="--verbose"
else
  VERBOSE_MODE=""
fi

function initialize_hive {
  COMMAND="-initOrUpgradeSchema"
  if [ "$(echo "$HIVE_VER" | cut -d '.' -f1)" -lt "4" ]; then
    COMMAND="-initSchema"
  fi
  # Don't honor verbose mode and dump errors because the 4.0.0 mysql schema generates a ton of deprecation warnings
  if "$HIVE_HOME/bin/schematool" -dbType $DB_DRIVER $COMMAND; then
    echo "Initialized schema successfully.."
  else
    echo "Schema initialization failed!"
    exit 1
  fi
}

export HIVE_CONF_DIR=$HIVE_HOME/conf
if [ -d "${HIVE_CUSTOM_CONF_DIR:-}" ]; then
  find "${HIVE_CUSTOM_CONF_DIR}" -type f -exec \
    ln -sfn {} "${HIVE_CONF_DIR}"/ \;
  export HADOOP_CONF_DIR=$HIVE_CONF_DIR
  export TEZ_CONF_DIR=$HIVE_CONF_DIR
fi

export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Xmx1G $SERVICE_OPTS"

if [ -z "$IS_RESUME" ]; then
  echo "Initializing (or upgrading) schema"
  initialize_hive
else
  echo "Skip schema initialization ($IS_RESUME)"
fi

export METASTORE_PORT=${METASTORE_PORT:-9083}
exec "$HIVE_HOME/bin/base" --skiphadoopversion $VERBOSE_MODE --service metastore
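
The version check inside `initialize_hive` above can be exercised on its own. The sketch below isolates that logic in a hypothetical helper, `schema_command_for_version` (a name introduced here for illustration, not part of the script), which takes a version string in place of `$HIVE_VER` and echoes the schematool flag the entrypoint would choose.

```shell
#!/bin/sh
# Hypothetical helper mirroring the major-version check in initialize_hive:
# versions below 4 use -initSchema, everything else uses -initOrUpgradeSchema.
schema_command_for_version() {
  # Take the text before the first '.' as the major version.
  major=$(echo "$1" | cut -d '.' -f1)
  if [ "$major" -lt 4 ]; then
    echo "-initSchema"
  else
    echo "-initOrUpgradeSchema"
  fi
}

schema_command_for_version "3.1.3"   # prints -initSchema
schema_command_for_version "4.0.0"   # prints -initOrUpgradeSchema
```

With `version.hive` set to `4.0.0` in this commit, the entrypoint would take the `-initOrUpgradeSchema` branch.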