Skip to content

hum4n0id/testflinger-docker

Repository files navigation

Architecture

This is a microservices architecture to run Canonical's Testflinger service. The core stack is comprised of the the below containers. The base container infrastructure is defined via a master Docker compose file, which will deploy the entire environment automatically upon docker-compose deploy invokation.

Testflinger agents run in individual, discrete containers denoted by the associated SUT name in the Needham lab environment. These containers are created by the the tf-agent master container, via the Docker Python API, which is running pass through Docker in Docker. This allows for dynamic container creation, which is triggered by the presence of Testflinger configuration files in the associated base directory.

Healthchecks run on each agent container and report through an MQTT broker. This MQTT broker facilitates agent logging, said healthchecks and allows for initiating Testflinger jobs via MQTT publish. These healthchecks are exposed via Docker native healthchecking, making it available for standard Docker reporting.

Container management is facilitated via Portainer, which is deployed as part of the base Docker compose configuration. Full container management and health checking is availble from the Portainer web interface.

(this secion in prgoress),

Docker Containers

All container dockerfiles are located in the project root.

tf-agen

Creates, starts and coordinated SUT agent containers.

  • Runs a customized Phusion/Baseimage (Docker optimized Ubuntu) image.
  • Runs nested Docker to enable containers to be created and spun up via the Docker API at tf-agent startup.
  • Performs agent creation and management.
  • On first startup, will create agent image and create each agent container. Uses the Docker daemon from the base Docker host (passes thru /var/run/ docker.sock).

<sut_name_>

Discrete container for each SUT agent. Runs testflinger-agent.

  • One container per sut agent.
  • Runs a customized Phusion/Baseimage (Docker optimized Ubuntu) image.
  • Runs custom threaded subprocess pipe instrumentation (agent_entrypoint), logging to stdout, file and MQTT
  • Runs a healthcheck script to facilitate Docker healthchecks via MQTT.

tf-cli

Standard testflinger cli build, using Docker. • Runs a customized Phusion/Baseimage (Docker optimized Ubuntu) image.

tf-api

Gunicorn, with gevent.

tf-db

Redis database container.

tf-mqtt

Stack MQTT broker.

tf-portainer

Portainer container for managing the stack and related agents.

  • Needham Portainer details below.

nh-jenkins

Runs the Jenkins instance to facilitate regression testing via Jenkins pipelines. Pipelines are multi-stages to

Files of interest and project notes

  • ./code/init_agent_cntnrs

    • Utilizes the Python Docker sdk to enable containers to be created and spun up via the dockerd api at tf-agent startup.
    • Creates all sut container properties and attributes.
    • Image creation (streams build).
    • Network and network config.
    • Host config with defined mounts for all files.
    • Mounts are used for consistent scaling across many containers.
    • Agent healthcheck (using MQTT).
    • Containers instantiated via iterating over the sut conf dir in the project root.
    • Agent container image is created ifit doesn’t exist.
    • Agent containers are created if they do not exist.
  • ./code/agent_entrypoint

    • Loaded at agent container startup.
    • Runs custom threaded subprocess pipe instrumentation.
    • Logging to stdout, file and MQTT.
    • Transmits agent output to stack MQTT broker.
    • Runs agent_healthcheck.py (as below).
    • Published MQTT topics: agent : agent/logger status c3 : current status of REST comms from agent to C3 output: current agent output (broker retained) submit_status : when active, lists topic to publish test cmd last_job : last job seen by the sut agent (broker retained)
  • ./code/agent_healthcheck

    • Uses a simple MQTT subscribe poll on the (sut)/agent topic.
      • Agent status is a looping timer thread that runs parallel to agent logging.
      • If agent_entrypoint hangs or terminates/crashes, the container health will report as unhealthy.
    • Runs at a set interval as defined within init_agent_cntnrs
      • This is directly relative to the agent status timer period.
      • Status timer period should overlap with healthcheck interval and timeout.
    • Notes on the agent status reporting for the healthcheck:
      • Relies on the fact that the logging thread is the parent of the status thread.
        • Status child thread sets daemon=True, so if/when the parent thread fails, the child will follow suit and ‘ok’ messages will no longer be sent.
      • Healthcheck parameters in init_agent_cntnrs detail:
        • timeout should be => than status timer tperiod (or)
        • retries should be increased if timeout donesn’t cover status timer tperiod.
  • ./code/start_submit_agents • Runs on testflinger-cli.

    • Starts lightweight sut agents, similar to start_sut_agents.
    • The “submit_status” topic will list instructions if the submit agent is ready.
    • These agents listen for test submissions via MQTT, will then start the appropriate job. See MQTT notes below for useage.
    • Subscribed MQTT topics: submit : listen for mqtt test_cmd message, initiate job.
  • ./code/01_run_sut_agents

    • Starts init_agent_cntrs via init on tf-agent.
  • ./code/01_run_submit_agents

    • Starts submit agents on testflinger-cli boot.
  • ./code/export_ssh_pubkey (and export_ssh_pubkey_agnt)

    • Pushes ssh keys to specified stack MAAS host for maas-cli api.
    • Runs on container boot, will not push key if it already exists.
  • ./reference.yaml

    • Reference device agent file to facilitate job pushing via MQTT.
  • ./tf-entrypoint

    • Runs on testflinger-agent.
    • Exports ssh keys and starts init.
  • ./container-entrypoint

    • Starts appropriate microservices, see below for more info.
  • ./tools/*

    • Contains convenience scripts for streamlined Docker ops.
    • Most used scripts:: deploy_stack.sh : complete, clean deployment of stack orb_nuke.sh : completely destroy stack and all associated files and data reroll.sh : completely destroy stack, git pull and redeploy (git repo optional)
  • ./code/start_sut_agents (depreciated)

    • Runs on testflinger-agent.
    • Starts all agents, and performs logging.
    • Transmits agent output to stack MQTT broker.
    • Logs agent output /var/log/sut-agent within testflinger-agent.
    • Developer notes and aspirations within source file.
    • Published MQTT topics:: agent : agent/logger status c3 : current status of REST comms from agent to C3 output: current agent output (broker retained) submit_status : when active, lists topic to publish test cmd last_job : last job seen by the sut agent (broker retained)

Stack Operations

  • Starting and stopping the entire stack (including SUT agents):

    • Log into the Docker host and execute (as appropriate): docker-compose start (container name) docker-compose restart (container name) docker-compose stop (container name)
    • Alernatively, reboot the Docker host.
      • Stack will stop cleanly on shutdown and start on boot.
  • Starting and stopping SUT agents and/or individual stack containers:

    • Log into the Docker host OR tf-agent (for SUT agents) and execute:
      • Starting/Restarting/Stopping from shell:: docker start (container name) docker restart (container name) docker stop (container name)
      • Starting/Restarting/Stopping from Portainer works as well.
        • Done via GUI; refer to Portainer notes below.
  • Checking container logs:

    • On the shell of the Docker host:: docker logs (container name) docker-compose logs (for contiguous view of stack container logs)
    • In Portainer (in the containers context): • Click on the ‘(page icon)’ (leftmost icon) under ‘Quick actions.’ OR
    • Click on the container name and select ‘(page icon) Logs.’
  • Entering a container’s console/shell:

    • On the shell of Docker host:: docker exec -it (container name) bash
    • In Portainer (in the containers context):
      • Click on the console prompt icon.
  • Changing/updating an agent’s config:

    • Enter the agent container’s console (either method as above):
      • Agent conf file path is:: /data/testflinger-agent/sut/(sut_name).conf
      • Edit the file, save and restart the agent container (as above).
      • Conf file will reload on restart.
  • Adding an agent container:

    • On the Docker host, in the path:: /opt/testflinger-docker/sut
    • Create (or copy existing and change) the following files within this dir:
      • Agent SUT conf:: sut_name.conf
      • Agent SUT yaml:: sut_name.yaml
      • Agent snappy yaml:: sut_name_snappy.yaml
    • Next, run the following command to sort these files into the relative subdirs to load in the appropriate stack containers RUN FROM DOCKER ROOT (/opt/testflinger-docker):: ./tools/parse_tf_files.sh
    • Finally:
      • Restart the ‘tf-agent’ and ‘tf-cli’ containers.
      • This will create the container(s) using the sut name(s) (tf-agent) • Abdon statrup.
  • Handling a healthcheck event: SUT agent containers are exclusively running healthcheck functions.

    • If the healthy flag changes to unhealthy, simply restart the flagged container.
    • The current healthcheck process uses MQTT to publish from within the LogAgent

Portainer notes

This setup yields an HTML5 web interface with realtime log viewing, console and shell access along with start/restart/stop for all containers. This interface checks nearly all of the boxes for container and agent specific management and information.

  • Portainer access and config, as in Needham:
    • Access via http/s
    • https://10.245.128.15 (Needham)
    • Login with admin (request temp PW).
    • Will move to LDAP in the future.
    • Allows for centralized:
      • Start/stop/restarting of containers.
      • Accessing of console, logs (sut output) and shell.
      • Also reports container healthchecks.
      • Facilitated via agent_entrypoint and agent_healthcheck.

MQTT notes and useage

  • Grab a MQTT client, MQTT Explorer recommended.

    • This provides an excellent top-level view of all MQTT clients and topics within the MQTT broker. This means you can see all Testflinger agents running in the lab and their respective output and auxillary topics such as C3 status relative to the agent.
  • Point the client MQTT broker, as in Needham (stack broker settings):

    • Protocol: mqtt://
    • Host: 10.245.128.14
    • Port: 1883
    • Leave username and password blank.
    • Keep ‘validate certificate’ and ‘encryption’ unchecked
  • To submit a test via MQTT, publish to (sut)/submit.

    • The “submit_status” topic indicates if the submit agent is ready.
    • If using MQTT explorer (or similar clients):
      • Use the “publish” field and use (sut)/submit as the topic.
      • Raw text mode suggested, but other modes should work.
      • Publish the test cmd as in the same field ('test cmd') in the sut tf-cli yaml file. Note: when using MQTT explorer, breaking up long lines is recommended.
  • A web based MQTT client running within the lab, as a part of larger monitoring/ automation/CI is the next natural step here.

  • As a suppliment to MQTT, one could integrate REST calls via CoAP. Called inline in the same fashion as MQTT publish. A ‘testflinger-rest’ container could be a CoAP server (if necessary).

Deploying Stack

Utilizes single-shot deployment after installing some pre-reqs on the host system. • Will create and start all containers using Docker Compose (for base) followed by the API (for agents).

Deploy and configure Docker host

Deploy 18.04+ host via MAAS. After host is deployed, setup prerequisites: • Much of these steps will be moved to a conveince bash script.

Customize source and config files for environment:

All work is done in the Git cloned Docker root dir (testflinger-docker/).

Update relevant files are to match local environment:

Files that need to be updated: * Required updates:: ./docker-compose.yaml ./code/tf-entrypoint.sh ./code/testflinger.conf

  • Deployment optional updates (can be added post-deployment):: ./sut/*

  • Optional updates (uses default parameters):: ./tools/deploy_stack.sh

Edit docker-compose.yaml file to match environment:

  • Change the parent network parameters to match the environment. Keeping the default bridge parameters will work in any standard environment.

  • Likewise, update container IPs to match said networks.

Edit the testflinger entrypoint file (tf-entrypoint):

File location: ./code/tf-entrypoint.sh (ref*). This shell script is exec’d upon container boot/start.

  • Update the top-level variables to match your environment:

    • TF_MAAS_ACT is the MAAS Testflinger account (create one if it doesn’t exist).
    • MAAS API key is located in the MAAS dashboard for the testflinger account’s settings (create one if it doesn’t exist).
  • Add them to the file as follows (w/ real values):: TF_MAAS_ACT=testflinger_a MAAS_SERVER=10.245.128.4 MAAS_PORT=5240 MAAS_API_KEY=’<api_key>’

Edit ./code/testflinger.conf (ref *):

  • Update the REDIS_HOST field to the db container ip address:: REDIS_HOST = ‘10.172.10.13’

Modify/Create SUT files:

  • Update any testflinger-agent *.conf files with the api server IP:: server_address: http://10.245.128.10:8000 (use actual api ip)

  • Make sure the snappy-device-agents yaml files are appended with _snappy if you want the deployment to automatically transfer them from the sut directory to the containers. You can alternatively create the config files inside the container post-deployment.

Populate SUT conf dirs for deployment (required):

  • Run:: ./tools/parse_tf_files.sh

Deploy Compose Stack:

  • Execute the deploy-stack script to start deployment:: bash ./tools/deploy-stack.sh

  • If you want to start over from scratch, execute the orb_nuke (orbital nuke) script.:: bash ./tools/orb_nuke.sh

  • The rest of the deployment should be handled by the Docker code included in the source directory.

Validate Deployment:

When successfully deployed and running, you can check the output of the stack.

  • Show containers:: docker-compose ps

  • Validate logs:: docker logs <containter_name>

Once the deployment is complete, no other steps should be required to start executing Testflinger tests on SUTs outside of ensuring the appropriate configuration files are in the agent and cli containers.

References (incomplete):

Docker Compose Specification: https://github.com/compose-spec/compose-spec/blob/master/spec.md

Docker Build Ref (Dockerfile): https://docs.docker.com/engine/reference/builder/

Docker Python SDK: https://docker-py.readthedocs.io/en/stable/#

Phusion Baseimage: https://github.com/phusion/baseimage-docker

Portainer: https://www.portainer.io

MQTT Eclipse Mosquitto: https://github.com/eclipse/mosquitto https://hub.docker.com/\_/eclipse-mosquitto/

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published