# Add scripts for testing the project using testflinger #234

**Open** · wants to merge 6 commits into `main`
3 changes: 3 additions & 0 deletions scripts/tf-tests/.gitignore
```text
hwctl
job_outputs/*
machines.txt
```
129 changes: 129 additions & 0 deletions scripts/tf-tests/README.md
# Manual integration tests using Testflinger

The `run-jobs.py` script tests the hardware-api client and server on
machines that are accessible via
[Testflinger](https://github.com/canonical/testflinger), which lets us
test the project components on multiple machines automatically.

## Requirements

These scripts use [the testflinger
snap](https://snapcraft.io/testflinger-cli), so make sure you have it
installed on your system:

```sh
sudo snap install testflinger-cli
```

The following files must also be present in this directory:

- `machines.txt`: This file lists the Canonical IDs of the machines on
which jobs will be run. Each Canonical ID should be on a separate
line, formatted as follows:

```text
202101-28595
202012-28526
...
```

- `tf-job.yaml`: This YAML template defines the job parameters. The
template should include the placeholder `$CANONICAL_ID`, which will
be replaced with each actual Canonical ID from `machines.txt` when
jobs are submitted. You can modify the existing `tf-job.yaml` file
to run other commands or use a different Ubuntu distro.

- `hwctl`: The client executable. You can create it by running `cargo
  build --release` in the project root directory. Build the binary on
  the same Ubuntu release that is specified in `tf-job.yaml` (a binary
  built on a different release may fail to find the right glibc version
  on the target machines) and for the same architecture as the
  machines.
> **Contributor:** What happens if you build it under a different release? "You're supposed to..." is a little vague there, I think.

> **Contributor:** Looks like you get an error finding the right version of glibc. For testing purposes only, I wonder if it would be worth mentioning how to compile statically so that you can run it on something even if you aren't sure of the version. This worked for me:
>
> ```sh
> RUSTFLAGS='-C target-feature=+crt-static' cargo build --release --target x86_64-unknown-linux-gnu
> ```


Then copy the created file to this directory:

```sh
cp target/release/hwctl scripts/tf-tests/
```

> **Contributor:** I copied the hwctl binary there but it didn't work; I got an error saying that it expected it to be in `scripts/tf-tests/job_outputs` instead:
>
> ```text
> FileNotFoundError: [Errno 2] No such file or directory: '/home/plars/work/cert/hardware-api/scripts/tf-tests/job_outputs/hwctl'
> ```
>
> That's because of a recent change in testflinger-cli that defaults to assuming that attachments will be in the same directory as your job YAML file. There's an override for it, though, that I'll propose below as a diff.

## Running the script

Once the requirements above are met, make sure you have access to
[https://testflinger.canonical.com](https://testflinger.canonical.com).

The script `run-jobs.py` can be used to submit jobs, monitor their
status, or both, depending on the options you provide.

```sh
./run-jobs.py [options]
```

Examples:

* Submit jobs and monitor statuses sequentially: `./run-jobs.py`
* Only submit jobs: `./run-jobs.py --send-jobs`
* Only monitor job statuses: `./run-jobs.py --check-status`
* Custom machines file and poll interval: `./run-jobs.py
  --machines-file custom_machines.txt --poll-interval 60`


## Script overview

The script performs two main functions:

- Job Submission
- Job Monitoring

### Job Submission

When submitting jobs, the script:

1. Reads each Canonical ID from `machines.txt` (or the file specified
with `--machines-file`).
2. Replaces `$CANONICAL_ID` in `tf-job.yaml` (or the file specified
with `--template-file`) with the actual ID.
3. Submits the job with `testflinger submit --attachments-relative-to . <job.yaml>`.
4. Captures the job UUID returned after submission.
5. Creates a directory for each Canonical ID in `job_outputs/` (or the
directory specified with `--output-dir`) and saves the job UUID in
a file named `tf_job_id.txt` within that directory.

Example directory structure after job submission:

```
job_outputs/
├── 202101-28595/
│   └── tf_job_id.txt   # Contains the job UUID
├── 202012-28526/
│   └── tf_job_id.txt
```

### Job Monitoring

When monitoring jobs, the script:

1. Reads `tf_job_id.txt` files in `job_outputs/` to get the job UUIDs.
2. Enters a loop, checking the status of each job using `testflinger
status <job_id>`.
3. For jobs with status "complete", retrieves results using
`testflinger-cli results <job_id>`.
4. Saves the test output to `output.txt` within the respective
Canonical ID’s directory.
5. Extracts the status field from the test output and writes it to
`hw_status.txt`.
6. Continues monitoring until all jobs are completed.

Example directory structure after job monitoring:

```
job_outputs/
├── 202101-28595/
│   ├── tf_job_id.txt
│   ├── output.txt      # Contains test output
│   └── hw_status.txt   # Contains the hardware API status
├── 202012-28526/
│   ├── tf_job_id.txt
│   ├── output.txt
│   └── hw_status.txt
```
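After a run, you can summarize the collected statuses at a glance. A hypothetical convenience snippet (not part of these scripts), assuming the `job_outputs/` layout shown above:

```python
from pathlib import Path

# Print each machine's hardware-api status once monitoring has finished.
for status_file in sorted(Path("job_outputs").glob("*/hw_status.txt")):
    print(f"{status_file.parent.name}: {status_file.read_text().strip()}")
```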
219 changes: 219 additions & 0 deletions scripts/tf-tests/run-jobs.py
```python
#!/usr/bin/env python3

import re
import argparse
import subprocess
import json
import logging
from time import sleep
from pathlib import Path
from typing import Dict, Optional


def parse_arguments():
    parser = argparse.ArgumentParser(
        description="Submit jobs and monitor their status on Testflinger."
    )
    parser.add_argument(
        "--machines-file",
        type=Path,
        default="machines.txt",
        help="Path to the file with machines' Canonical IDs",
    )
    parser.add_argument(
        "--template-file",
        type=Path,
        default="tf-job.yaml",
        help="Path to Testflinger job template file",
    )
    parser.add_argument(
        "--output-dir",
        type=Path,
        default="job_outputs",
        help="Path to job outputs directory",
    )
    parser.add_argument(
        "--poll-interval",
        type=int,
        default=30,
        help="Time delay between status checks in seconds",
    )
    group = parser.add_mutually_exclusive_group()
    group.add_argument(
        "--send-jobs",
        action="store_true",
        help="Only submit jobs without monitoring their statuses",
    )
    group.add_argument(
        "--check-status",
        action="store_true",
        help="Only check job statuses without submitting new jobs",
    )
    return parser.parse_args()


# Job Submission Functions
def load_canonical_ids(filename: Path) -> list:
    """Reads the Canonical IDs from a file."""
    with open(filename, "r", encoding="utf8") as file:
        return file.read().strip().splitlines()


def create_job_yaml(template_file: Path, canonical_id: str) -> str:
    """Creates a modified job YAML for a specific Canonical ID."""
    with open(template_file, "r", encoding="utf8") as file:
        job_yaml = file.read()
    return job_yaml.replace("$CANONICAL_ID", canonical_id)


def write_temp_job_file(job_yaml: str, output_dir: Path, canonical_id: str) -> Path:
    """Writes the modified job YAML to a temporary file."""
    temp_job_file = output_dir / f"{canonical_id}_tf-job.yaml"
    temp_job_file.write_text(job_yaml)
    return temp_job_file
```

> **boukeas commented on lines +62 to +73 (Nov 18, 2024):** This is not major, but the split into two functions here seems artificial. You can even see proof of that in the docstring for `write_temp_job_file`: it is not easy to describe what this function does without referring to the output of the other function, i.e. "the modified job YAML", and you probably wouldn't use the second function without the first one. I would suggest merging these two into one function: what you are really doing here is instantiating the template with a value for `$CANONICAL_ID`.
>
> I would also suggest accepting the path for the output (instantiated) file as an argument, rather than hardcoding `temp_job_file` in the function itself. So the signature for the merged function would be something like:
>
> ```python
> def instantiate_job_yaml(template: Path, output: Path, canonical_id: str):
> ```
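A minimal sketch of that suggested merge (hypothetical; `instantiate_job_yaml` is the reviewer's proposed name, not part of this PR):

```python
from pathlib import Path


# Hypothetical merged helper along the lines suggested above: instantiate
# the template with a Canonical ID and write the result to `output`.
def instantiate_job_yaml(template: Path, output: Path, canonical_id: str) -> Path:
    """Fill in $CANONICAL_ID in the template and write the result to output."""
    job_yaml = template.read_text(encoding="utf8")
    output.write_text(job_yaml.replace("$CANONICAL_ID", canonical_id))
    return output
```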



```python
def submit_job(temp_job_file: Path, canonical_id: str) -> Optional[str]:
    """Submits the job and returns the job UUID."""
    try:
        result = subprocess.run(
            [
                "testflinger",
                "submit",
                "--attachments-relative-to",
                ".",
                str(temp_job_file),
            ],
            capture_output=True,
            text=True,
            check=True,
        )
        for line in result.stdout.splitlines():
            if line.startswith("job_id:"):
                return line.split(": ")[1].strip()
        logging.warning("Failed to retrieve job_id for %s", canonical_id)
    except subprocess.CalledProcessError as e:
        logging.error("Error submitting job for %s: %s", canonical_id, e.stderr)
    return None


def save_job_uuid(job_uuid: str, output_dir: Path, canonical_id: str):
    """Creates a directory for the Canonical ID and saves the job UUID."""
    id_dir = output_dir / canonical_id
    id_dir.mkdir(exist_ok=True)
    (id_dir / "tf_job_id.txt").write_text(job_uuid)
    logging.info("Job submitted for %s with job_id: %s", canonical_id, job_uuid)


def submit_all_jobs(
    machines_file: Path, template_file: Path, output_dir: Path
) -> Dict[str, Path]:
    """Submit all jobs for the given machines."""
    canonical_ids = load_canonical_ids(machines_file)
    job_ids = {}

    for canonical_id in canonical_ids:
        job_yaml = create_job_yaml(template_file, canonical_id)
        temp_job_file = write_temp_job_file(job_yaml, output_dir, canonical_id)
        job_uuid = submit_job(temp_job_file, canonical_id)

        if job_uuid:
            save_job_uuid(job_uuid, output_dir, canonical_id)
            job_ids[job_uuid] = output_dir / canonical_id

        temp_job_file.unlink()  # Clean up temporary YAML file

    return job_ids
```
> **Contributor (on `temp_job_file.unlink()`):** Up to you if you use it or not, but whenever there's this pattern of creating and deleting temporary files, you can also use the context managers in `tempfile`.
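A sketch of that alternative (assuming the surrounding loop from `submit_all_jobs`; `--attachments-relative-to .` still resolves the `hwctl` attachment against the current directory, so the job YAML itself can live anywhere):

```python
import tempfile

# Hypothetical variant of the submission step: the temporary job YAML is
# deleted automatically when the `with` block exits, replacing the manual
# unlink() call.
with tempfile.NamedTemporaryFile(mode="w", suffix="_tf-job.yaml") as tmp:
    tmp.write(create_job_yaml(template_file, canonical_id))
    tmp.flush()  # ensure the YAML is on disk before submitting
    job_uuid = submit_job(Path(tmp.name), canonical_id)
```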


```python
# Job Monitoring Functions
def load_jobs(output_dir: Path) -> Dict[str, Path]:
    """Load job IDs and directories from the job outputs directory."""
    jobs = {}
    for id_dir in output_dir.iterdir():
        if id_dir.is_dir():
            job_id_file = id_dir / "tf_job_id.txt"
            if job_id_file.exists():
                job_id = job_id_file.read_text().strip()
                jobs[job_id] = id_dir
    return jobs


def check_job_status(job_id: str) -> Optional[str]:
    """Check the status of a job by its job ID."""
    try:
        result = subprocess.run(
            ["testflinger", "status", job_id],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()
    except subprocess.CalledProcessError as e:
        logging.error("Error checking status for job %s: %s", job_id, e.stderr)
        return None


def extract_status_from_output(test_output):
    """Extracts the status value from the test output JSON."""
    match_ = re.search(r'"status":\s*"([^"]+)"', test_output)
    return match_.group(1) if match_ else "Unknown"


def retrieve_job_results(job_id: str, id_dir: Path):
    """Retrieve and save the results of a completed job."""
    try:
        results_result = subprocess.run(
            ["testflinger-cli", "results", job_id],
            capture_output=True,
            text=True,
            check=True,
        )
        results_data = json.loads(results_result.stdout)
        test_output = results_data.get("test_output", "")
        (id_dir / "output.txt").write_text(test_output)
        status = extract_status_from_output(test_output)
        (id_dir / "hw_status.txt").write_text(status)
        logging.info("Results and status saved for job %s in %s", job_id, id_dir)
    except (subprocess.CalledProcessError, json.JSONDecodeError) as e:
        logging.error("Error fetching results for job %s: %s", job_id, str(e))
```

> **Contributor (on `status = extract_status_from_output(test_output)`):** I am a little confused here: can't we get the exit status of the test phase in the same way we get its output, i.e. `status = results_data.get("test_status")`?
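If the results JSON does expose the test phase's status directly (an assumption, prompted by the comment above), the regex-based extraction could be replaced with a direct lookup:

```python
# Hypothetical simplification, assuming the Testflinger results JSON
# carries a top-level "test_status" field as the reviewer suggests:
status = str(results_data.get("test_status", "Unknown"))
```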


```python
def monitor_jobs(remaining_jobs: Dict[str, Path], poll_interval: int):
    """Monitor jobs until all are completed, fetching results as jobs finish."""
    while remaining_jobs:
        for job_id, id_dir in list(remaining_jobs.items()):
            job_status = check_job_status(job_id)
            logging.info(
                "Status for job %s (Canonical ID: %s): %s",
                job_id,
                id_dir.name,
                job_status,
            )
            if job_status == "cancelled":
                logging.error("The job %s got cancelled.", job_id)
                del remaining_jobs[job_id]
            elif job_status == "complete":
                retrieve_job_results(job_id, id_dir)
                del remaining_jobs[job_id]

        if remaining_jobs:
            logging.info("Waiting %d seconds before checking again...", poll_interval)
            sleep(poll_interval)

    logging.info("All jobs complete and results retrieved.")


def main():
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    args = parse_arguments()
    args.output_dir.mkdir(exist_ok=True)

    if args.send_jobs:
        submit_all_jobs(args.machines_file, args.template_file, args.output_dir)
    elif args.check_status:
        remaining_jobs = load_jobs(args.output_dir)
        monitor_jobs(remaining_jobs, args.poll_interval)
    else:
        job_ids = submit_all_jobs(
            args.machines_file, args.template_file, args.output_dir
        )
        monitor_jobs(job_ids, args.poll_interval)


if __name__ == "__main__":
    main()
```
11 changes: 11 additions & 0 deletions scripts/tf-tests/tf-job.yaml
```yaml
job_queue: "$CANONICAL_ID"
provision_data:
  distro: jammy
test_data:
  attachments:
    - local: hwctl
  test_cmds: |
    scp attachments/test/hwctl $DEVICE_IP:
    ssh $DEVICE_IP "sudo apt-get update && sudo apt install -y pkgconf libssl-dev"
    ssh $DEVICE_IP "sudo dmidecode"
    ssh $DEVICE_IP "sudo HW_API_URL=https://hw.staging.ubuntu.com ./hwctl"
```

> **Contributor (on `provision_data`):** It may be worth mentioning in the README that this `tf-job.yaml` will ONLY work on MAAS-provisioned machines, and not on things like muxpi or oemscript-provisioned systems.