Bugfixes
Demo for GSF
Refactoring MCC to organise code a bit better & Bug fixes
Added the ability to assign expressions to clusters that will act as a toleration.
Extended Multi Cluster Controller to be capable of combining multiple metrics to give a remote cluster a score.
Adjusted NetworkPolicy to allow the new openshift-operators namespace, which is now called openshift-operators-redhat
Changed/added labels to the ScaledJobs and ScaledObjects and updated the namespace replace job to filter on those labels.
Updated Architecture Visuals
Expanding Dispatcher to use Prometheus Queries to determine the Best destination for a Job.
Adding an example of an exporter deployment that is able to export metrics into Prometheus from a Grid Intensity Provider.
Adding Sailfish-Dispatcher ScaledJob to the Multi Cluster Component. This ScaledJob enables you to schedule workloads on either the bridge queue or the local queue.
Fixed Operator Sync waves
Adding a Multi Cluster Controller Component. This Controller enables you to create Bridge Queues with ease using a SailfishCluster CRD.
Small documentation improvements and refactored naming convention:
- sailfish-manager -> previously known as 'run-manager' / 'job-manager' / 'CUSTOM JOB' / 'run-manager-job.yaml'
- sailfish-worker -> previously known as 'runner' / 'tasks' / 'task runner' / 'runner-job.yaml'
- sailfish-managers execute jobs from the sailfishJob queue
- sailfish-workers execute tasks from the sailfishTask queue
Refactored the sailfish-py demo application to follow the Job Paradigm as described in docs/the-job-paradigm.md
The approach described in the docs is the preferred way to set up your ScaledJobs.
Fixed an issue with the sailfish-gateway path of the broker-authentication component.
Fixed issue where the Route generated from a Knative Service would be constantly pruned by a Cluster-Scoped ArgoCD
Improved robustness of the SyncJob to prevent failed ScaledJobs and ScaledObjects
Console and Authentication have been disabled by default.
To enable them, use the broker-console or broker-authentication Components. With this change you can remove all references to and usage of sailfish-broker-credentials-secret.
The motivation for disabling Authentication is that the authentication protocol is not supported/gets blocked by s2i images.
It is recommended that you keep both components disabled. However, broker-console can come in handy for debugging/testing.
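As a rough sketch, opting back in to the console for a debugging session could look like the kustomization below. The component path is an assumption modelled on the ephemeral-broker component URL used elsewhere in these notes, and `<version>` is a placeholder; check the repository for the actual location before using it.

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
components:
  # Assumed path and placeholder ref; not confirmed by the release notes.
  - https://github.com/Ortec-Finance/rdlabs-sailfish-hpc//k8s/sailfish/components/broker-console/?timeout=120&ref=<version>
```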
Adding Machineset Azure Tags for Owner and Application
Starting with this version, you should set the owner parameter in your MachineSets to improve cost management in Azure.
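A minimal sketch of setting this value, assuming owner is exposed as a Helm parameter on the machineset ArgoCD app in the same way as maxMachinesPerZone. The value shown is a hypothetical placeholder.

```yaml
helm:
  parameters:
    # Assumption: owner is passed as a Helm parameter; 'your-team' is a placeholder.
    - name: owner
      value: 'your-team'
```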
- Fixing issues with sailfish instances missing logs
- Adding ability to use Spot Instances!
Removing application/product specific tolerations on MachineSets. By default, the Worker will always schedule on the Sailfish Machines, whereas the Manager schedules on any worker. More information on how this works can be found in docs/sailfish-machines.md, and docs/features/spot-machinesets.md explains how to implement Spot Machines.
Adding Overlays for Demo
Added a new component that allows you to run the Sailfish Broker in High Availability. This makes your Queue Messages zone-redundant: if one zone goes down, the messages are migrated to another.
Fixes to observability dashboard out-of-sync issues
Removed activeDeadlineSeconds from base configuration as it does not comply with the Job Paradigm as we intend it.
Added Documentation that explains how the Job Paradigm is used in Sailfish.
Added kustomization.yaml in k8s/observability so it works with a kustomize remote ref
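For illustration, a remote reference to the observability folder might look like the sketch below. The URL form mirrors the component URL used elsewhere in these notes; `<version>` is a placeholder for the release tag you pin to.

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # Placeholder ref; pin this to the release you actually deploy.
  - https://github.com/Ortec-Finance/rdlabs-sailfish-hpc//k8s/observability?ref=<version>
```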
Restructured Folders
To use this version you must change sailfish-k8s to k8s in all of your ArgoCD Apps and AppSets!
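A sketch of what the change could look like in an ArgoCD Application. Only the path rename comes from this release note; the metadata, namespaces, revision, and the sailfish subfolder are assumptions for illustration.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: sailfish                  # assumed name
  namespace: openshift-gitops     # assumed ArgoCD namespace
spec:
  project: default
  source:
    repoURL: https://github.com/Ortec-Finance/rdlabs-sailfish-hpc
    targetRevision: main          # assumed; pin to the release you use
    path: k8s/sailfish            # previously sailfish-k8s/sailfish (subfolder assumed)
  destination:
    server: https://kubernetes.default.svc
    namespace: sailfish           # assumed target namespace
```

The same substitution applies to any path templated inside an ApplicationSet.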
Implemented GH Action that creates Releases
Added ability to set the maximum number of machines that your MachineSets are allowed to scale to.
To upgrade to this version you must update the machineset ArgoCD app to include the parameter:
helm:
  parameters:
    - name: maxMachinesPerZone
      value: '3'
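For context, Helm parameters in an ArgoCD Application sit under spec.source.helm. The sketch below shows that placement; the application name and chart path are hypothetical placeholders, not taken from the repository.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: sailfish-machinesets      # hypothetical name
spec:
  source:
    repoURL: https://github.com/Ortec-Finance/rdlabs-sailfish-hpc
    path: <path-to-machineset-chart>   # placeholder; use the chart path from the repo
    helm:
      parameters:
        - name: maxMachinesPerZone
          value: '3'
```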
Fixed some Readme inconsistencies
Simplified image replacements in your overlays: you can now reference images by name!
In your overlay you can use the kustomize images field like so:
images:
  - name: sailfish-manager
    newName: your-registry/your-manager-image
  - name: sailfish-worker
    newName: your-registry/your-worker-image
  - name: sailfish-gateway
    newName: your-registry/your-gateway-image
Fixed issue where this repo did not work with ApplicationSets due to the namespace field in the ScaledJob/ScaledObject Triggers.
These fields no longer need to be overridden; an ArgoCD Sync job is deployed to fix all the triggers.
You must update your overlays to no longer replace namespaces of ScaledJob and ScaledObject triggers.
You must update your ArgoCD Application that deploys sailfish with these ignores:
## There is a Sync job that automates the replacement of the namespace in the triggers
- group: keda.sh
  kind: ScaledJob
  jsonPointers:
    - /spec/triggers/0/metadata/namespace
    - /spec/triggers/1/metadata/namespace
    - /spec/triggers/2/metadata/namespace
    - /spec/triggers/3/metadata/namespace
    - /spec/triggers/4/metadata/namespace
    - /spec/triggers/5/metadata/namespace
- group: keda.sh
  kind: ScaledObject
  jsonPointers:
    - /spec/triggers/0/metadata/namespace
    - /spec/triggers/1/metadata/namespace
    - /spec/triggers/2/metadata/namespace
    - /spec/triggers/3/metadata/namespace
    - /spec/triggers/4/metadata/namespace
    - /spec/triggers/5/metadata/namespace
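The entries above belong under spec.ignoreDifferences of the ArgoCD Application. The abbreviated sketch below shows only that placement, with a single pointer per kind; the application name is a hypothetical placeholder.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: sailfish                  # hypothetical name
spec:
  ignoreDifferences:
    - group: keda.sh
      kind: ScaledJob
      jsonPointers:
        - /spec/triggers/0/metadata/namespace
        # ...repeat for each trigger index, as listed above
    - group: keda.sh
      kind: ScaledObject
      jsonPointers:
        - /spec/triggers/0/metadata/namespace
        # ...repeat for each trigger index, as listed above
```

JSON pointers address one array index at a time, which is why each trigger position is listed explicitly.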
Added Ability to Scale the Sailfish Broker to Zero when there is no traffic! This Version also introduces the use of kustomize Components.
The current components are:
- sailfish-gateway - A Knative Service that handles the trigger of a Job/Simulation
- AMQ Broker scaling to zero - This requires the sailfish-gateway to be enabled
- Ephemeral Broker - Use this if you wish to remove persistence of the queue
To activate one of these features, simply add it to your kustomization.yaml like so:
components:
  - https://github.com/Ortec-Finance/rdlabs-sailfish-hpc//k8s/sailfish/components/ephemeral-broker/?timeout=120&ref=v0.2.0
If you intend to use the AMQ Broker scaling to zero component you must update your ArgoCD Application that deploys sailfish-hpc with these ignores:
ignoreDifferences:
  - group: broker.amq.io
    kind: ActiveMQArtemis
    jsonPointers:
      - /spec/deploymentPlan/size
Updated Cluster Configuration to properly deploy operators
Updated Machine Configuration to support OCP 4.12 and parameterized taints and labels