Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adapt charm to support COS integration #127

Merged
merged 65 commits into from
Oct 10, 2023
Merged

Adapt charm to support COS integration #127

merged 65 commits into from
Oct 10, 2023

Conversation

cbartz
Copy link
Collaborator

@cbartz cbartz commented Sep 19, 2023

Applicable spec: ISD-075

Overview

The PR adapts the charm to send metric logs to Loki by using the cos-agent integration.

  • The metrics are logged into a file and then picked up by the Grafana agent and transferred to Loki.
  • A simple metric event (RunnerInstalled) is fired to test the integration.
  • A Grafana dashboard has also been added, which can be transferred to Grafana using the cos-integration-k8s charm.

image

Rationale

To implement the COS integration as specified in the spec, the GitHub Runner charm needs to be adapted to send the metrics to Loki.

Juju Events Changes

Library Changes

New libraries fetched:

  • charms/grafana_agent/v0/cos_agent.py

Checklist

@cbartz cbartz added the complex label Sep 19, 2023
@cbartz
Copy link
Collaborator Author

cbartz commented Sep 19, 2023

Marked the PR as complex due to significant new feature added. @canonical/is-devops-leadership please provide feedback on the public interface.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 88 files.

Valid Invalid Ignored Fixed
38 2 48 0
Click to see the invalid file list
  • lib/charms/loki_k8s/v0/loki_push_api.py
  • lib/charms/observability_libs/v0/juju_topology.py

lib/charms/loki_k8s/v0/loki_push_api.py Outdated Show resolved Hide resolved
lib/charms/observability_libs/v0/juju_topology.py Outdated Show resolved Hide resolved
Copy link
Contributor

@jdkandersson jdkandersson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, thanks for putting this together!

src/promtail.py Outdated Show resolved Hide resolved
@jdkandersson
Copy link
Contributor

jdkandersson commented Sep 20, 2023

Do you think any other modifications to public interfaces are required? I'm thinking about potentially:

  • creating the logs temporary file system/ directory
  • Reading the logs after the runner finished
  • Submitting the logs to promtail

Also, do you think it is worthwhile to introduce CharmState with just the data required for the COS integration? That way we make a start and can gradually move the rest of the architecture to be compliant

@cbartz
Copy link
Collaborator Author

cbartz commented Sep 20, 2023

Do you think any other modifications to public interfaces are required? I'm thinking about potentially:

  • creating the logs temporary file system/ directory
  • Reading the logs after the runner finished
  • Submitting the logs to promtail

Also, do you think it is worthwhile to introduce CharmState with just the data required for the COS integration? That way we make a start and can gradually move the rest of the architecture to be compliant

I have updated the draft according to your comments and the outcome of the GH Runner Roadmap meeting (story has been updated to include Grafana Dashboard and to send Runner installation event).
I think we need to add methods to handle logs from crashed runners and methods to handle the interaction between charm and runners (shared file system), but I think this is out of scope for this PR.

Copy link
Contributor

@jdkandersson jdkandersson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the interfaces look good now, minor comments

src/metrics.py Outdated Show resolved Hide resolved
src/metrics.py Outdated Show resolved Hide resolved
.pylintrc Outdated Show resolved Hide resolved
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 100 files.

Valid Invalid Ignored Fixed
46 1 53 0
Click to see the invalid file list
  • .woke.yaml

.woke.yaml Show resolved Hide resolved
@cbartz cbartz marked this pull request as ready for review September 28, 2023 05:22
@cbartz cbartz requested a review from a team as a code owner September 28, 2023 05:22
@cbartz cbartz marked this pull request as ready for review October 6, 2023 12:56
@cbartz
Copy link
Collaborator Author

cbartz commented Oct 6, 2023

After feedback from the Observability team, I changed the implementation to use the cos-agent integration. I have updated the PR description.

metadata.yaml Outdated Show resolved Hide resolved
requirements.txt Outdated Show resolved Hide resolved
src/charm_state.py Outdated Show resolved Hide resolved
src/metrics.py Outdated Show resolved Hide resolved
src-docs/runner_manager.py.md Outdated Show resolved Hide resolved
src/metrics.py Outdated Show resolved Hide resolved
src/metrics.py Outdated Show resolved Hide resolved
src/metrics.py Outdated Show resolved Hide resolved
tests/integration/test_charm_metrics.py Outdated Show resolved Hide resolved
tests/integration/test_charm_metrics.py Show resolved Hide resolved
@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2023

Test coverage for 05595ff

Name                    Stmts   Miss Branch BrPart  Cover   Missing
-------------------------------------------------------------------
src/charm.py              413     95     99     20    75%   80-82, 108-110, 142-144, 204-212, 224-228, 328-333, 354, 371-377, 386-409, 427-429, 437-440, 447, 450-451, 467, 472-477, 522->525, 538-539, 541-542, 576-583, 607-608, 620-622, 637-638, 663-664, 666-667, 669-670, 744->746, 806-807, 823, 841-843, 847
src/charm_state.py         12      0      2      0   100%
src/errors.py              24      0      0      0   100%
src/event_timer.py         42      8      4      0    74%   94-97, 117-120
src/firewall.py            43     25     10      0    38%   38-42, 64-67, 75-149
src/github_type.py         36      0      0      0   100%
src/lxd_type.py            37      0      2      0   100%
src/metrics.py             46      0     10      1    98%   60->63
src/runner.py             274     41     84     22    80%   38->42, 96->98, 192-199, 205-211, 254-263, 283-288, 293, 313, 317-327, 376->379, 382-384, 391, 405, 415, 419, 421, 436, 482-487, 497, 586, 621, 647, 652-664, 678, 698
src/runner_manager.py     235     26     90      9    88%   159, 216-218, 231-232, 244-246, 252-257, 261-262, 272-273, 292, 386, 406-410, 429, 447, 590, 610
src/runner_type.py         52      0     12      0   100%
src/utilities.py           68      6     20      8    84%   74->76, 78->84, 89-91, 117, 131, 169, 222
-------------------------------------------------------------------
TOTAL                    1282    201    333     60    81%

Static code analysis report

Run started:2023-10-09 16:29:17.688261

Test results:
  No issues identified.

Code scanned:
  Total lines of code: 2947
  Total lines skipped (#nosec): 0
  Total potential issues skipped due to specifically being disabled (e.g., #nosec BXXX): 9

Run metrics:
  Total issues (by severity):
  	Undefined: 0
  	Low: 0
  	Medium: 0
  	High: 0
  Total issues (by confidence):
  	Undefined: 0
  	Low: 0
  	Medium: 0
  	High: 0
Files skipped (0):

Copy link
Contributor

@jdkandersson jdkandersson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Copy link

@gregory-schiano gregory-schiano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thx

@cbartz cbartz merged commit 2462aca into main Oct 10, 2023
@cbartz cbartz deleted the ISD-1086 branch October 10, 2023 10:23
yhaliaw pushed a commit that referenced this pull request Oct 11, 2023
* Add public interface

* Remove promtail prefix from fcts

* Fix license header

* Merge install and config into start

* Add interface to log metrics

* Add grafana_dashboard lib

* Move logging of event to metrics module

* Introduce charm state

* Update attrs doc

* Implement COS integration

* Remove grafana_dashboard

* Change proxy server settings in test_charm

* Ignore whitelist in pylintrc

* Add missing newlines

* Exclude woke.yaml from license check

* Add final newline to .woke.yaml

* Remove usage of requests.session in metrics

* Pin pydantic more specifically

* Fix type hint/doc string in ProxyConfig

* Pin pydantic

* Rename NotCompleteError

* Remove hardcoding of promtail arch

* Remove hardcoding of promtail arch

* Rename -> download_info

* promtail.start -> promtail.setup

* handle non-happy case first in metrics

* Remove group

* Adapt promtail.yaml.j2

* Introduce constant in test_charm

* Move issue_metrics to RunnerManagerConfig

* Fix promtail.service.j2

* Capture time only when needed

* Narrow the exception catch

* Introduce constants in Promtail

* Retry Promtail health check and raise error

* Use Path.write_bytes in promtail

* set unit status to Blocked for unhealthy promtail

* Fix integration test

Breaking change in latest Pygithub version

* Lint

* Catch RequestException

* Drop Promtail and call Loki directly

* Add unit test for charm state

* Switch to cos_agent integration

* Adapt integration test

* Fix integration test

* Simplify integration test

* Fix issue_event

* Add code to set up logrotate

* Update grafana dashboard

* Cleanup

* Remove status in wait_for_idle

* Small fixes

* Dont keep charm inside state

* Update src-docs

* Dont use mutable default value

* Compute event name on instantiation

* Raise LogrotateSetupError

* Reuse _create_runner func

* Check that metrics log is empty

* Fix docstring in Event

* Lint

* app_no_runner -> app
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants