From af190da659232c5cbe6d88db09d38804f7573d20 Mon Sep 17 00:00:00 2001
From: akash1810
Date: Fri, 10 Nov 2023 10:04:40 +0000
Subject: [PATCH 1/3] docs: Add ADR for checking our data accuracy

---
 ADR/03-data-accuracy.md | 63 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)
 create mode 100644 ADR/03-data-accuracy.md

diff --git a/ADR/03-data-accuracy.md b/ADR/03-data-accuracy.md
new file mode 100644
index 000000000..0ca6908d2
--- /dev/null
+++ b/ADR/03-data-accuracy.md
@@ -0,0 +1,63 @@
+# Data Accuracy
+
+## Status
+Proposed.
+
+## Context
+The Guardian Service Catalogue sources data from AWS, GitHub, Snyk etc. using [CloudQuery](02-cloudquery.md).
+
+The vast corpus of data this has provided has enabled the department to answer questions very quickly.
+
+A previous attempt to answer the question "which account is this S3 bucket in?" has highlighted some missing data[^1].
+
+With Service Catalogue now driving business decisions through RepoCop or SLO dashboards, we want to verify the accuracy of the data it holds.
+
+We plan to do this by asking AWS to count the number of resources, and compare that to the count in Service Catalogue:
+1. Count the resources in Service Catalogue's database
+2. Count the resources AWS reports
+3. Alarm when these numbers differ
+
+This ADR outlines options for implementing the second item.
+
+## Positions
+### 1. [AWS Resource Explorer](https://docs.aws.amazon.com/resource-explorer/latest/userguide/welcome.html)
+AWS Resource Explorer allows one to search resources in an AWS account.
+It works in all regions, and provides an aggregated view of all regions.
+That is, it provides a simple way to search for resources.
+
+Although it is a free service, AWS Resource Explorer is a little tricky to roll out.
+It needs to be [enabled in each region](https://docs.aws.amazon.com/resource-explorer/latest/userguide/manage-aggregator-region.html).
+
+AWS does provide recommendations for rolling Resource Explorer out to an organisation.
+However, it requires [Stack Sets](https://docs.aws.amazon.com/resource-explorer/latest/userguide/manage-service-all-org-with-stacksets.html).
+Our departmental tooling doesn't yet have Stack Set support, so setup would be manual.
+
+AWS Resource Explorer also [supports only a subset of AWS resources](https://docs.aws.amazon.com/resource-explorer/latest/userguide/supported-resource-types.html).
+That is, we would not be able to verify all the data in Service Catalogue.
+Notably, support for AWS CloudFormation is missing.
+
+### 2. [AWS Config](https://docs.aws.amazon.com/config/latest/developerguide/WhatIsConfig.html)
+AWS Config is primarily a resource compliance tool. However, it does provide:
+
+> fine-grained visibility into what resources exist
+> – https://docs.aws.amazon.com/config/latest/developerguide/WhatIsConfig.html#common-scenarios
+
+We have AWS Config deployed to a few of our AWS accounts already.
+
+Similar to AWS Resource Explorer, AWS Config [supports only a subset of AWS resources](https://docs.aws.amazon.com/config/latest/developerguide/resource-config-reference.html#supported-resources).
+That is, we would not be able to verify all the data in Service Catalogue.
+
+### 3. Use the SDKs directly
+Whilst it is true that Service Catalogue's corpus is vast, we are currently only using a subset to drive business decisions.
+Given this, we could interact with the AWS SDKs directly for the specific resources that we need.
+
+This solution would likely require more code than the others.
+However:
+- Setup cost is reduced
+- The data we verify can also be more targeted
+- It provides a framework to verify non-AWS data too
+
+## Decision
+Use the AWS SDKs directly.
+
+[^1]: This has since been patched in https://github.com/cloudquery/cloudquery/pull/14476.
From 73f4438d1907c2dc1769ba047ff713b68e27fc25 Mon Sep 17 00:00:00 2001
From: akash1810
Date: Fri, 10 Nov 2023 10:20:13 +0000
Subject: [PATCH 2/3] docs: Describe other mechanisms for checking data accuracy

---
 ADR/03-data-accuracy.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/ADR/03-data-accuracy.md b/ADR/03-data-accuracy.md
index 0ca6908d2..9eabb15a6 100644
--- a/ADR/03-data-accuracy.md
+++ b/ADR/03-data-accuracy.md
@@ -19,6 +19,9 @@ We plan to do this by asking AWS to count the number of resources, and compare t
 
 This ADR outlines options for implementing the second item.
 
+It is also worth noting that this is only part of the data accuracy checks we'll perform.
+We will also look at the age of our data, and the volume of errors in the logs produced by CloudQuery.
+
 ## Positions
 ### 1. [AWS Resource Explorer](https://docs.aws.amazon.com/resource-explorer/latest/userguide/welcome.html)
 AWS Resource Explorer allows one to search resources in an AWS account.

From 293f9cbd022c9fb1eaa2cff93fdb7de38ba8ad2d Mon Sep 17 00:00:00 2001
From: akash1810
Date: Fri, 10 Nov 2023 10:20:56 +0000
Subject: [PATCH 3/3] docs: Mark ADR as accepted

---
 ADR/03-data-accuracy.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ADR/03-data-accuracy.md b/ADR/03-data-accuracy.md
index 9eabb15a6..0d33cf751 100644
--- a/ADR/03-data-accuracy.md
+++ b/ADR/03-data-accuracy.md
@@ -1,7 +1,7 @@
 # Data Accuracy
 
 ## Status
-Proposed.
+Accepted.
 
 ## Context
 The Guardian Service Catalogue sources data from AWS, GitHub, Snyk etc. using [CloudQuery](02-cloudquery.md).
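The ADR's plan (count the resources in Service Catalogue's database, count what AWS reports, alarm when the numbers differ) can be sketched as follows. This is a minimal, hypothetical Python sketch, not the implementation Service Catalogue adopted. `count_from_pages`, `find_discrepancies` and `Discrepancy` are illustrative names, and the sketch assumes both sets of counts are keyed by a (CloudQuery table name, AWS account ID) pair. Fetching the counts themselves (a `count(*)` per table in the database, and the SDK calls) is deliberately left out so the comparison logic stays self-contained.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping
from dataclasses import dataclass
from typing import Any

# Assumed key shape: (CloudQuery table name, AWS account ID). Hypothetical.
Key = tuple[str, str]


def count_from_pages(pages: Iterable[Mapping[str, Any]], items_key: str) -> int:
    """Step 2 helper: total the items across every page of a paginated AWS list call.

    AWS list APIs return results a page at a time, so counting only the
    first page would under-report. Pages missing `items_key` count as empty.
    """
    return sum(len(page.get(items_key, [])) for page in pages)


@dataclass(frozen=True)
class Discrepancy:
    table: str             # e.g. "aws_s3_buckets" (hypothetical table name)
    account_id: str
    catalogue_count: int   # step 1: rows in Service Catalogue's database
    aws_count: int         # step 2: what AWS reports


def find_discrepancies(
    catalogue_counts: Mapping[Key, int],
    aws_counts: Mapping[Key, int],
) -> list[Discrepancy]:
    """Step 3: list every (table, account) pair whose counts differ.

    A key present on only one side counts as zero on the other, so a
    resource type that was never collected is still reported.
    """
    discrepancies = []
    for key in sorted(set(catalogue_counts) | set(aws_counts)):
        ours = catalogue_counts.get(key, 0)
        theirs = aws_counts.get(key, 0)
        if ours != theirs:
            table, account_id = key
            discrepancies.append(Discrepancy(table, account_id, ours, theirs))
    return discrepancies


if __name__ == "__main__":
    catalogue = {("aws_s3_buckets", "111111111111"): 5}
    aws = {("aws_s3_buckets", "111111111111"): 6}
    # In practice a non-empty result would raise an alarm rather than print.
    for d in find_discrepancies(catalogue, aws):
        print(f"{d.table} in {d.account_id}: catalogue={d.catalogue_count}, aws={d.aws_count}")
```

With boto3, for example, `pages` could be `boto3.client("lambda").get_paginator("list_functions").paginate()` with `items_key="Functions"`. Keying by account as well as resource type means a mismatch points directly at where data is missing, which is the kind of question ("which account is this S3 bucket in?") that exposed the gap in the first place.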