MVP and Design Considerations
App developers need a security-compliant reference solution that provides an enterprise-grade production architecture for deploying line-of-business applications to Azure App Service for Containers.
We have identified the following high-level requirements:
- Break out the reference applications, including the existing Spring Boot app and a forthcoming Node.js app, into separate projects. In other words, these apps are outside the scope of this project.
- Create an HA/DR architecture with a reference solution that provides high availability through multi-data-center deployments, geo-based routing, robust health probes, and automated failover. Some aspects of disaster recovery, such as data backup, will be documented but not implemented.
- Provide bash scripts to query existing shared services (see below) and generate IaC files.
- Provide a solid DevOps solution that includes continuous integration (CI) builds and tests for new pull requests, and a simple continuous delivery (CD) solution in which new PRs are automatically deployed to a development environment, with manual promotion available to QA, staging and production environments.
- Follow security best practices to protect the infrastructure from attacks and to secure data with secrets stored in Key Vault.
- Load balancing is done by App Service, not by App Gateway
- Standard DDoS protection
- WAF (cross-site scripting, SQL injection, etc.)
- Reverse proxy API and app traffic
- App/container hosting
- Public IP mapped to container port
- Load balancing
- Reverse proxy for API calls
- Log API traffic to App Insights
- SSL termination
- Service authentication (via JWT tokens)
- Service versioning
- Single API endpoint across the service fleet
- DNS routing (defaults to the nearest region to ensure the lowest latency for end users)
- Regional failover based on service health
- Canary testing
- API consumption
The solution depends on several shared services.
We expect the app developer to provide the following:
- Azure Subscriptions
- Azure Active Directory (AAD)
- Azure Container Registry (ACR)
- Azure Key Vault specific to Cobalt
- Resource groups
- Virtual networks (VNets)
- Azure DevOps instance
For development purposes, we will use our own subscription and AAD.
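The requirements call for scripts that query the existing shared services and generate IaC files. The sketch below covers only the last step, turning identifiers already collected for the developer-provided services into `terraform.tfvars` content, and it is written in Python rather than bash. The variable names (`subscription_id`, `acr_name`, `key_vault_name`, and so on) and the `render_tfvars` helper are hypothetical; the real templates define their own variables, and the querying step (for example with the Azure CLI) is not shown.

```python
# Hypothetical sketch: render Terraform variable assignments (tfvars syntax)
# from identifiers of the shared services the app developer provides.
# Variable names are illustrative, not taken from the actual templates.
import json


def render_tfvars(values: dict) -> str:
    """Render a flat dict of strings / string lists as terraform.tfvars text."""
    lines = []
    for name in sorted(values):
        # json.dumps yields double-quoted strings and ["a", "b"] lists, which
        # are also valid HCL literals for these simple types. Values containing
        # "${" or "%{" would need extra escaping, which is not handled here.
        lines.append(f"{name} = {json.dumps(values[name])}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    shared_services = {
        "subscription_id": "00000000-0000-0000-0000-000000000000",
        "acr_name": "exampleacr",
        "key_vault_name": "cobalt-example-kv",
        "resource_group_name": "cobalt-shared-rg",
        "vnet_names": ["cobalt-vnet-eastus", "cobalt-vnet-westus"],
    }
    print(render_tfvars(shared_services), end="")
```

Sorting the keys keeps the generated file stable between runs, which makes diffs of regenerated IaC files easier to review.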
We plan to implement a simple but reasonably robust CI/CD solution that provides basic functionality that developers can build on.
CI/CD is implemented using Azure DevOps and Azure Container Registry.
Our solution provides four environments:
- Integration
- QA
- Pre-Production (with slots for staging pre-production instances)
- Production (with slots for staging pre-production instances)
This solution focuses only on production infrastructure, so we do not plan to provide CI testing for application builds. Instead, we run simple CI tests to validate that new or modified Terraform templates work correctly when deployed to the development environment.
To keep things simple, the solution requires engineers/operators to manually promote builds from dev to QA, and from QA to pre-production/staging. A manual App Service slot swap promotes a pre-production app to production.
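The promotion rules above (automatic deployment of new builds, manual promotion for every later step, and a slot swap into production) can be pictured as a small state machine. This Python sketch assumes that Integration is the development environment that CD deploys to; the `Build` class, the `promote` function, and the `approved_by` field are hypothetical and not part of Azure DevOps, where this gating would more likely be configured as approvals on environments or release stages.

```python
# Hypothetical model of the promotion rules: new builds land in Integration
# automatically; every later step requires a named human approver.
from dataclasses import dataclass, field
from typing import List, Optional

# Ordered pipeline; the last step stands in for the pre-prod -> prod slot swap.
STAGES = ["Integration", "QA", "Pre-Production", "Production"]


@dataclass
class Build:
    build_id: str
    stage: str = "Integration"  # CD deploys each new build here automatically
    history: List[str] = field(default_factory=lambda: ["Integration"])


def promote(build: Build, approved_by: Optional[str]) -> Build:
    """Advance a build one stage; refuse without a manual approval."""
    idx = STAGES.index(build.stage)
    if idx == len(STAGES) - 1:
        raise ValueError(f"build {build.build_id} is already in Production")
    if not approved_by:
        raise PermissionError("promotion beyond Integration is manual")
    build.stage = STAGES[idx + 1]
    build.history.append(build.stage)
    return build


if __name__ == "__main__":
    b = Build("1234")
    promote(b, approved_by="qa-lead")
    promote(b, approved_by="release-manager")
    promote(b, approved_by="release-manager")  # models the slot swap
    print(b.history)
```

Keeping promotion strictly one stage at a time means a build cannot skip QA or pre-production on its way to production.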
Although Project Jackson (the project on which we are building) deploys infrastructure using ARM templates, we are using Terraform for this solution, for several reasons:
- Terraform uses HashiCorp Configuration Language (HCL), which is significantly more human-readable than the JSON templates used by ARM.
- Terraform provides excellent state management, supports idempotent updates, and minimizes the surface area it touches when updating deployments.
- We prefer to keep the number of technologies small. Since we recommend Terraform for deploying AKS clusters, it makes sense to use it for App Services as well.
- Terraform promotes modular reuse, which reduces redundant infrastructure components and improves maintainability and readability.
We use multiple Terraform templates to deploy the different pieces of the solution. Initially, these include separate templates to create the following:
- Infrastructure for a single DC (resource group, App Service Plan, App Service, static IP, App Gateway, etc.)
- Global infrastructure (Azure Traffic Manager, CDN, App Analytics, Monitor, etc.)
- Placeholder shared services (Key Vault, ACR, Cosmos DB for reference app, etc.)
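The three template groups above imply a deployment order, assuming the dependencies run the way they appear to: the single-DC stacks consume shared services such as ACR and Key Vault, and the global Traffic Manager profile needs the regional App Services as endpoints. That ordering is our reading, not something the templates are stated to enforce. The sketch derives it with Python's standard `graphlib.TopologicalSorter` (Python 3.9+); the region names are placeholders.

```python
# Sketch: derive a deployment order for the Terraform template groups.
# The dependency edges are assumptions about which outputs each group consumes.
from graphlib import TopologicalSorter

REGIONS = ["eastus", "westus"]  # placeholder data centers (at least two)


def deployment_order(regions):
    # Map each template group to the set of groups it depends on.
    graph = {"shared-services": set()}  # Key Vault, ACR, Cosmos DB placeholders
    for region in regions:
        # Each single-DC stack pulls images from ACR and secrets from Key Vault.
        graph[f"dc-{region}"] = {"shared-services"}
    # Traffic Manager needs every regional App Service as an endpoint.
    graph["global"] = {f"dc-{r}" for r in regions}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(deployment_order(REGIONS))
```

The regional stacks do not depend on each other, so a pipeline could apply them in parallel once the shared services exist.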
The reference solution is built around hosting simple containerized applications on Azure App Service for Containers (aka Web Apps for Containers/WAC). The following components are provisioned in each data center:
- App Service Plan
- App Service for Containers
- Static IP that serves as the public IP for the App Service
- App Gateway to protect the App Service from attack
Azure Traffic Manager (TM) is configured to connect app users to the geographically closest App Service. (For validation purposes, we intend to deploy the solution to at least two data centers.) TM also uses health probes to monitor the application stack in each data center. When a data center becomes unhealthy, either because the health probes time out or because they return errors, TM fails traffic over to the remaining healthy data center(s).
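The routing decision described above reduces to "send each user to the lowest-latency endpoint that is currently healthy". Traffic Manager makes this decision itself; its Performance routing method selects endpoints by measured network latency rather than strict geographic distance. The Python sketch below only models the decision, and the endpoint names and latency values are made-up sample data. It mirrors Traffic Manager's documented behavior of treating all endpoints as healthy when every endpoint in a profile is degraded.

```python
# Illustrative model of latency-based routing with health-based failover.
# Endpoint names and latency values are hypothetical sample data.
from typing import Dict


def pick_endpoint(latency_ms: Dict[str, float], healthy: Dict[str, bool]) -> str:
    """Return the lowest-latency healthy endpoint."""
    candidates = [ep for ep in latency_ms if healthy.get(ep, False)]
    if not candidates:
        # If every endpoint is degraded, Traffic Manager treats all of them as
        # healthy, so fall back to the full set rather than returning nothing.
        candidates = list(latency_ms)
    return min(candidates, key=lambda ep: latency_ms[ep])


if __name__ == "__main__":
    latency = {"app-eastus": 20.0, "app-westus": 75.0}
    print(pick_endpoint(latency, {"app-eastus": True, "app-westus": True}))
    # East US fails its probes, so traffic moves to the remaining healthy DC.
    print(pick_endpoint(latency, {"app-eastus": False, "app-westus": True}))
```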
TM is configured as follows:
- Configure DNS responses to specify a 30-second client TTL
- Run probes every 5 seconds
- Use a 10-second timeout on each probe
- Fail over after two successive probes fail
- Restore traffic after the failed data center has been healthy for 1 minute
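Before these values are encoded in Terraform, it may help to check them against Traffic Manager's documented monitoring limits: the probing interval can only be 30 seconds or 10 seconds (fast probing), the probe timeout must be 5 to 10 seconds with a 30-second interval or 5 to 9 seconds with a 10-second interval, and the tolerated number of failures ranges from 0 to 9. By those limits, the 5-second probing interval above is not an accepted value. The sketch below encodes the limits and adds a rough, assumed estimate of worst-case failover time: (tolerated failures + 1) probe intervals to mark the endpoint degraded, plus one DNS TTL for cached answers to expire. It also assumes "two successive probes fail" maps to one tolerated failure. We are not aware of a Traffic Manager setting matching the 1-minute restore rule, so that rule is not modeled.

```python
# Sketch: validate a Traffic Manager monitor configuration against the
# documented limits and estimate worst-case failover time.
# The failover estimate is a rough approximation, not an Azure formula.
from dataclasses import dataclass
from typing import List


@dataclass
class MonitorConfig:
    ttl_s: int               # DNS TTL returned to clients
    interval_s: int          # probing interval
    timeout_s: int           # per-probe timeout
    tolerated_failures: int  # failed probes tolerated before marking degraded


def validate(cfg: MonitorConfig) -> List[str]:
    """Return a list of limit violations (empty if the config is accepted)."""
    errors = []
    if cfg.interval_s not in (10, 30):
        errors.append("interval must be 10 or 30 seconds")
    max_timeout = 9 if cfg.interval_s == 10 else 10
    if not 5 <= cfg.timeout_s <= max_timeout:
        errors.append(f"timeout must be 5-{max_timeout} seconds")
    if not 0 <= cfg.tolerated_failures <= 9:
        errors.append("tolerated failures must be 0-9")
    return errors


def worst_case_failover_s(cfg: MonitorConfig) -> int:
    # Assumed model: degraded after (tolerated_failures + 1) failed probes,
    # then clients may keep the cached DNS answer for up to one TTL.
    return (cfg.tolerated_failures + 1) * cfg.interval_s + cfg.ttl_s


if __name__ == "__main__":
    as_written = MonitorConfig(ttl_s=30, interval_s=5, timeout_s=10, tolerated_failures=1)
    print(validate(as_written))
    fast = MonitorConfig(ttl_s=30, interval_s=10, timeout_s=9, tolerated_failures=1)
    print(validate(fast), worst_case_failover_s(fast))
```

With fast probing, the 30-second TTL accounts for a large share of the estimated failover time, so the TTL is worth tuning alongside the probe settings.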
The reference solution provides simple monitoring and reporting, including simple dashboards, using appropriate Azure services, such as:
- Application Insights
- Azure Monitor