update fluence scheduler build to use newer API
Problem: we made changes to the flux-sched reapi Go module and did not update here.
Solution: update the client interface here, ensuring that we also install exactly
the version of flux-sched that we are cloning. I am also cleaning up the README to
not use $ before commands (easier to copy-paste), tidying go.mod/go.sum, and updating
the utils function to print an error rather than an integer code (part of the refactor).

Signed-off-by: vsoch <[email protected]>
vsoch committed Dec 14, 2023
1 parent 22b3bfe commit 105562e
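
At its core, this commit swaps the old `ReapiCtx` free-function bindings for the new `ReapiClient` interface. Below is a minimal sketch of the new usage, assembled only from calls that appear in the fluxion.go diff further down; the standalone program, the hard-coded paths, and the error printing are illustrative assumptions, not code from this commit:

```go
package main

import (
	"fmt"
	"os"

	"github.com/flux-framework/flux-sched/resource/reapi/bindings/go/src/fluxcli"
)

func main() {
	// Old API: free functions over a *ReapiCtx with integer return codes, e.g.
	//   fctx := fluxcli.NewReapiCli()
	//   fluxcli.ReapiCliInit(fctx, jgf, opts)
	//   rc := fluxcli.ReapiCliCancel(fctx, jobid, true) // rc < 0 meant failure

	// New API: methods on a *ReapiClient that return Go errors.
	cli := fluxcli.NewReapiClient()

	// Initialize fluxion with a JSON graph of cluster resources (JGF),
	// as InitFluxion does in the diff below.
	jgf, err := os.ReadFile("/home/data/jgf/kubecluster.json")
	if err != nil {
		fmt.Println("Error reading JGF")
		return
	}
	cli.InitContext(string(jgf), `{"matcher_policy": "lonode"}`)

	// Request an allocation for a jobspec, as Match does in the diff below.
	spec, err := os.ReadFile("/home/data/jobspecs/jobspec.yaml")
	if err != nil {
		fmt.Println("Error reading jobspec")
		return
	}
	reserved, allocated, at, overhead, jobid, fluxerr := cli.MatchAllocate(false, string(spec))
	if fluxerr != nil {
		fmt.Println(cli.GetErrMsg())
		return
	}
	fmt.Println(reserved, allocated, at, overhead, jobid)

	// Cancel now returns an error instead of a negative integer.
	if err := cli.Cancel(int64(jobid), true); err != nil {
		fmt.Println(cli.GetErrMsg())
	}
}
```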
Showing 7 changed files with 106 additions and 106 deletions.
8 changes: 6 additions & 2 deletions .github/workflows/build-deploy.yaml
@@ -23,7 +23,9 @@ jobs:
          go-version: ^1.19
 
       - name: Build Containers
-        run: make build REGISTRY=ghcr.io/flux-framework SCHEDULER_IMAGE=fluence
+        run: |
+          make prepare
+          make build REGISTRY=ghcr.io/flux-framework SCHEDULER_IMAGE=fluence
       - name: Tag Release Image
         if: (github.event_name == 'release')
@@ -58,7 +60,9 @@ jobs:
          go-version: ^1.19
 
       - name: Build Container
-        run: make build-sidecar REGISTRY=ghcr.io/flux-framework SIDECAR_IMAGE=fluence-sidecar
+        run: |
+          make prepare
+          make build-sidecar REGISTRY=ghcr.io/flux-framework SIDECAR_IMAGE=fluence-sidecar
       - name: Tag Release Image
         if: (github.event_name == 'release')
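
(Aside: the added `make prepare` step clones the upstream scheduler-plugins tree that the fluence assets are copied into, which is presumably why both CI jobs now need it before `make build` / `make build-sidecar`; see the README changes below.)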
44 changes: 22 additions & 22 deletions README.md
@@ -22,11 +22,11 @@ We provide helper commands to do that.
 
 ```bash
 # This clones the upstream scheduler plugins code, we will add fluence to it!
-$ make prepare
+make prepare
 
 # Add fluence assets
-$ cd upstream/manifests/install/charts
-$ helm install \
+cd upstream/manifests/install/charts
+helm install \
     --set scheduler.image=ghcr.io/flux-framework/fluence:latest \
     --set scheduler.sidecarimage=ghcr.io/flux-framework/fluence-sidecar \
     schedscheduler-plugins as-a-second-scheduler/
@@ -138,7 +138,7 @@ make build REGISTRY=vanessa CONTROLLER_IMAGE=fluence-controller SCHEDULER_IMAGE=
 To walk through it manually, first, clone the upstream scheduler-plugins repository:
 
 ```bash
-$ git clone https://github.com/kubernetes-sigs/scheduler-plugins ./upstream
+git clone https://github.com/kubernetes-sigs/scheduler-plugins ./upstream
 ```
 
 We need to add our fluence package to the scheduler plugins to build. You can do that manually as follows:
@@ -151,7 +151,7 @@ cp -R sig-scheduler-plugins/manifests/fluence ./upstream/manifests/fluence
 # These are files with subtle changes to add fluence
 cp sig-scheduler-plugins/cmd/scheduler/main.go ./upstream/cmd/scheduler/main.go
 cp sig-scheduler-plugins/manifests/install/charts/as-a-second-scheduler/templates/deployment.yaml ./upstream/manifests/install/charts/as-a-second-scheduler/templates/deployment.yaml
-cp sig-scheduler-plugins/manifests/install/charts/as-a-second-scheduler/templates/values.yaml ./upstream/manifests/install/charts/as-a-second-scheduler/templates/values.yaml
+cp sig-scheduler-plugins/manifests/install/charts/as-a-second-scheduler/values.yaml ./upstream/manifests/install/charts/as-a-second-scheduler/values.yaml
 ```

Then change directory to the scheduler plugins repository.
@@ -164,10 +164,10 @@ And build! You'll most likely want to set a custom registry and image name again
 
 ```bash
 # This will build to localhost
-$ make local-image
+make local-image
 
 # this will build to docker.io/vanessa/fluence
-$ make local-image REGISTRY=vanessa CONTROLLER_IMAGE=fluence
+make local-image REGISTRY=vanessa CONTROLLER_IMAGE=fluence
 ```
 
 </details>
@@ -177,7 +177,7 @@ $ make local-image REGISTRY=vanessa CONTROLLER_IMAGE=fluence
 Whatever build approach you use, you'll want to push to your registry for later discovery!
 
 ```bash
-$ docker push docker.io/vanessa/fluence
+docker push docker.io/vanessa/fluence
 ```
 
### Prepare Cluster
@@ -188,7 +188,7 @@ These steps will require a Kubernetes cluster to install to, and having pushed t
 create a local one with `kind`:
 
 ```bash
-$ kind create cluster
+kind create cluster
 ```

### Install Fluence
@@ -199,14 +199,14 @@ under [deploy](#deploy) to ensure you have cloned the upstream kubernetes-sigs/s
 more details to inspect attributes available to you. Let's say that you ran
 
 ```bash
-$ make prepare
+make prepare
 ```
 
 You could then inspect values with helm:
 
 ```bash
-$ cd upstream/manifests/install/charts
-$ helm show values as-a-second-scheduler/
+cd upstream/manifests/install/charts
+helm show values as-a-second-scheduler/
 ```
 
 <details>
@@ -249,8 +249,8 @@ The `helm install` shown under [deploy](#deploy) is how you can install to your 
 Here would be an example using custom images:
 
 ```bash
-$ cd upstream/manifests/install/charts
-$ helm install \
+cd upstream/manifests/install/charts
+helm install \
     --set scheduler.image=vanessa/fluence:latest \
     --set scheduler.sidecarimage=vanessa/fluence-sidecar \
     schedscheduler-plugins as-a-second-scheduler/
@@ -264,7 +264,7 @@ The installation process will run one scheduler and one controller pod for the S
 You can double check that everything is running as follows:
 
 ```bash
-$ kubectl get pods
+kubectl get pods
 ```
 ```console
 NAME                                           READY   STATUS    RESTARTS   AGE
@@ -277,26 +277,26 @@ Let's now check logs for containers to check that everything is OK.
 First, let's look at logs for the sidecar container:
 
 ```bash
-$ kubectl logs fluence-6bbcbc6bbf-xjfx6
+kubectl logs fluence-6bbcbc6bbf-xjfx6
 ```
 ```console
 Defaulted container "sidecar" out of: sidecar, scheduler-plugins-scheduler
 This is the fluxion grpc server
-Created cli context &{}
-&{}
+Created flux resource client &{0x3bd33d0}
+&{ctx:0x3bd33d0}
 Number nodes 1
 node in flux group kind-control-plane
-Node kind-control-plane flux cpu 6
-Node kind-control-plane total mem 16132255744
-Can request at most 6 exclusive cpu
+Node kind-control-plane flux cpu 10
+Node kind-control-plane total mem 32992821248
+Can request at most 10 exclusive cpu
 Match policy: {"matcher_policy": "lonode"}
 [GRPCServer] gRPC Listening on [::]:4242
 ```
 
 And for the fluence container:
 
 ```bash
-$ kubectl logs fluence-6bbcbc6bbf-xjfx6 -c scheduler-plugins-scheduler
+kubectl logs fluence-6bbcbc6bbf-xjfx6 -c scheduler-plugins-scheduler
 ```
 
 If you haven't done anything, you'll likely just see health checks.
7 changes: 5 additions & 2 deletions src/build/scheduler/Dockerfile
@@ -128,7 +128,10 @@ RUN apt -y update && apt -y upgrade && apt install --no-install-recommends -y pr
 ENV LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/lib:/home/flux-sched/resource:/home/flux-sched/resource/libjobspec:/home/flux-sched/resource/reapi/bindings"
 COPY fluence Makefile /go/src/fluence/
 WORKDIR /go/src/fluence/
-RUN go mod tidy && \
+
+# This is the 0.31.0 tag of flux-sched (same as we install above)
+RUN go get -u github.com/flux-framework/flux-sched/resource/reapi/bindings/go/src/fluxcli@250eac78a6753253fc8353a3504d7e843d1b6b24 && \
+    go mod tidy && \
     make server FLUX_SCHED_ROOT=/home/flux-sched INSTALL_PREFIX=${INSTALL_PREFIX} && \
     mkdir -p /home/data/jobspecs /home/data/jgf && \
-    chmod -R ugo+rwx /home/data
+    chmod -R ugo+rwx /home/data
60 changes: 28 additions & 32 deletions src/fluence/fluxion/fluxion.go
@@ -1,68 +1,64 @@
 package fluxion
 
 import (
-	"github.com/cmisale/flux-sched/resource/hlapi/bindings/go/src/fluxcli"
-	"github.com/flux-framework/flux-k8s/flux-plugin/fluence/utils"
-	"github.com/flux-framework/flux-k8s/flux-plugin/fluence/jobspec"
+	"os"
+
 	pb "github.com/flux-framework/flux-k8s/flux-plugin/fluence/fluxcli-grpc"
+	"github.com/flux-framework/flux-k8s/flux-plugin/fluence/jobspec"
+	"github.com/flux-framework/flux-k8s/flux-plugin/fluence/utils"
+	"github.com/flux-framework/flux-sched/resource/reapi/bindings/go/src/fluxcli"
 
 	"context"
 	"errors"
 	"fmt"
-	"io/ioutil"
 )
 
 type Fluxion struct {
-	fctx *fluxcli.ReapiCtx
+	cli *fluxcli.ReapiClient
 	pb.UnimplementedFluxcliServiceServer
 }
 
-func (f *Fluxion) Context() *fluxcli.ReapiCtx {
-	return f.fctx
-}
-
 func (f *Fluxion) InitFluxion(policy *string, label *string) {
-	f.fctx = fluxcli.NewReapiCli()
+	f.cli = fluxcli.NewReapiClient()
 
-	fmt.Println("Created cli context ", f.fctx)
-	fmt.Printf("%+v\n", f.fctx)
+	fmt.Println("Created flux resource client ", f.cli)
+	fmt.Printf("%+v\n", f.cli)
 	filename := "/home/data/jgf/kubecluster.json"
 	err := utils.CreateJGF(filename, label)
 	if err != nil {
 		return
 	}
-	jgf, err := ioutil.ReadFile(filename)
+
+	jgf, err := os.ReadFile(filename)
 	if err != nil {
 		fmt.Println("Error reading JGF")
 		return
 	}
 
 	p := "{}"
	if *policy != "" {
 		p = string("{\"matcher_policy\": \"" + *policy + "\"}")
 		fmt.Println("Match policy: ", p)
 	}
 
-	fluxcli.ReapiCliInit(f.fctx, string(jgf), p)
-}
-
+	f.cli.InitContext(string(jgf), p)
+}
+
 func (s *Fluxion) Cancel(ctx context.Context, in *pb.CancelRequest) (*pb.CancelResponse, error) {
 	fmt.Printf("[GRPCServer] Received Cancel request %v\n", in)
-	err := fluxcli.ReapiCliCancel(s.fctx, int64(in.JobID), true)
-	if err < 0 {
+	err := s.cli.Cancel(int64(in.JobID), true)
+	if err != nil {
 		return nil, errors.New("Error in Cancel")
 	}
 
-	dr := &pb.CancelResponse{JobID: in.JobID, Error: int32(err)}
+	// Why would we have an error code here if we check above?
+	// This (I think) should be an error code for the specific job
+	dr := &pb.CancelResponse{JobID: in.JobID}
 	fmt.Printf("[GRPCServer] Sending Cancel response %v\n", dr)
 
-	fmt.Printf("[CancelRPC] Errors so far: %s\n", fluxcli.ReapiCliGetErrMsg(s.fctx))
-
-	reserved, at, overhead, mode, fluxerr := fluxcli.ReapiCliInfo(s.fctx, int64(in.JobID))
+	fmt.Printf("[CancelRPC] Errors so far: %s\n", s.cli.GetErrMsg())
+
+	reserved, at, overhead, mode, fluxerr := s.cli.Info(int64(in.JobID))
 	fmt.Println("\n\t----Job Info output---")
 	fmt.Printf("jobid: %d\nreserved: %t\nat: %d\noverhead: %f\nmode: %s\nerror: %d\n", in.JobID, reserved, at, overhead, mode, fluxerr)
 
@@ -74,17 +70,17 @@ func (s *Fluxion) Match(ctx context.Context, in *pb.MatchRequest) (*pb.MatchResp
 	filename := "/home/data/jobspecs/jobspec.yaml"
 	jobspec.CreateJobSpecYaml(in.Ps, in.Count, filename)
 
-	spec, err := ioutil.ReadFile(filename)
+	spec, err := os.ReadFile(filename)
 	if err != nil {
 		return nil, errors.New("Error reading jobspec")
 	}
 
 	fmt.Printf("[GRPCServer] Received Match request %v\n", in)
-	reserved, allocated, at, overhead, jobid, fluxerr := fluxcli.ReapiCliMatchAllocate(s.fctx, false, string(spec))
+	reserved, allocated, at, overhead, jobid, fluxerr := s.cli.MatchAllocate(false, string(spec))
 	utils.PrintOutput(reserved, allocated, at, overhead, jobid, fluxerr)
 
-	fmt.Printf("[MatchRPC] Errors so far: %s\n", fluxcli.ReapiCliGetErrMsg(s.fctx))
-	if fluxerr != 0 {
+	fmt.Printf("[MatchRPC] Errors so far: %s\n", s.cli.GetErrMsg())
+	if fluxerr != nil {
 		return nil, errors.New("Error in ReapiCliMatchAllocate")
 	}
 
@@ -93,12 +89,12 @@ func (s *Fluxion) Match(ctx context.Context, in *pb.MatchRequest) (*pb.MatchResp
 	}
 
 	nodetasks := utils.ParseAllocResult(allocated)
 
 	nodetaskslist := make([]*pb.NodeAlloc, len(nodetasks))
 	for i, result := range nodetasks {
-		nodetaskslist[i] = &pb.NodeAlloc {
+		nodetaskslist[i] = &pb.NodeAlloc{
 			NodeID: result.Basename,
-			Tasks:  int32(result.CoreCount)/in.Ps.Cpu,
+			Tasks:  int32(result.CoreCount) / in.Ps.Cpu,
 		}
 	}
 	mr := &pb.MatchResponse{PodID: in.Ps.Id, Nodelist: nodetaskslist, JobID: int64(jobid)}
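
The utils refactor mentioned in the commit message (printing an error rather than an integer code) is the seventh changed file and is not rendered in this view. A hypothetical `PrintOutput` consistent with the call in `Match` above might look like the following; the parameter types are inferred from the diff and are assumptions, not the repository's actual code:

```go
package utils

import "fmt"

// PrintOutput is a hypothetical sketch: the signature is inferred from the call
// utils.PrintOutput(reserved, allocated, at, overhead, jobid, fluxerr) in Match,
// with fluxerr now an error (previously an integer code, hence %v instead of %d).
func PrintOutput(reserved bool, allocated string, at int64, overhead float64, jobid uint64, err error) {
	fmt.Printf("\n\t----Match Allocate output---\njobid: %d\nreserved: %t\nallocated: %s\nat: %d\noverhead: %f\nerror: %v\n",
		jobid, reserved, allocated, at, overhead, err)
}
```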
2 changes: 1 addition & 1 deletion src/fluence/go.mod
@@ -3,7 +3,7 @@ module github.com/flux-framework/flux-k8s/flux-plugin/fluence
 go 1.16
 
 require (
-	github.com/cmisale/flux-sched/resource/hlapi/bindings/go/src/fluxcli v0.0.0-20220921153849-5b631bfccecf
+	github.com/flux-framework/flux-sched/resource/reapi/bindings/go v0.0.0-20231213021445-250eac78a675
 	google.golang.org/grpc v1.38.0
 	google.golang.org/protobuf v1.26.0
 	gopkg.in/yaml.v2 v2.4.0
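
Note: `v0.0.0-20231213021445-250eac78a675` is a standard Go pseudo-version: a `v0.0.0` base, a UTC commit timestamp, and the first twelve characters of the commit SHA. That suffix matches `250eac78a6753253fc8353a3504d7e843d1b6b24`, the commit pinned by `go get` in the Dockerfile above, which is what keeps the Go bindings in lockstep with the flux-sched being cloned.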
4 changes: 2 additions & 2 deletions src/fluence/go.sum
@@ -64,8 +64,6 @@ github.com/chzyer/logex v1.1.10/go.mod h1:+Ywpsq7O8HXn0nuIou7OrIPyXbp3wmkHB+jjWR
 github.com/chzyer/readline v0.0.0-20180603132655-2972be24d48e/go.mod h1:nSuG5e5PlCu98SY8svDHJxuZscDgtXS6KTTbou5AhLI=
 github.com/chzyer/test v0.0.0-20180213035817-a1ea475d72b1/go.mod h1:Q3SI9o4m/ZMnBNeIyt5eFwwo7qiLfzFZmjNmxjkiQlU=
 github.com/client9/misspell v0.3.4/go.mod h1:qj6jICC3Q7zFZvVWo7KLAzC3yx5G7kyvSDkc90ppPyw=
-github.com/cmisale/flux-sched/resource/hlapi/bindings/go/src/fluxcli v0.0.0-20220915175752-563ad82d6cb3 h1:iyAYjW/7RZeGXViInC6xKXalgwXdfat/iwcXlGpO42c=
-github.com/cmisale/flux-sched/resource/hlapi/bindings/go/src/fluxcli v0.0.0-20220915175752-563ad82d6cb3/go.mod h1:YECCCsYTFtX4YCub9nBnvEmNj319mPNS8ERpvpKH1nU=
 github.com/cncf/udpa/go v0.0.0-20191209042840-269d4d468f6f/go.mod h1:M8M6+tZqaGXZJjfX53e64911xZQV5JYwmTeXPW+k8Sc=
 github.com/cncf/udpa/go v0.0.0-20201120205902-5459f2c99403/go.mod h1:WmhPx2Nbnhtbo57+VJT5O0JRkEi1Wbu0z5j0R8u5Hbk=
 github.com/coreos/bbolt v1.3.2/go.mod h1:iRUV2dpdMOn7Bo10OQBFzIJO9kkE559Wcmn+qkEiiKk=
@@ -100,6 +98,8 @@ github.com/exponent-io/jsonpath v0.0.0-20151013193312-d6023ce2651d/go.mod h1:ZZM
 github.com/fatih/camelcase v1.0.0/go.mod h1:yN2Sb0lFhZJUdVvtELVWefmrXpuZESvPmqwoZc+/fpc=
 github.com/fatih/color v1.7.0/go.mod h1:Zm6kSWBoL9eyXnKyktHP6abPY2pDugNf5KwzbycvMj4=
 github.com/felixge/httpsnoop v1.0.1/go.mod h1:m8KPJKqk1gH5J9DgRY2ASl2lWCfGKXixSwevea8zH2U=
+github.com/flux-framework/flux-sched/resource/reapi/bindings/go v0.0.0-20231213021445-250eac78a675 h1:FgEA3pnL/kDoLaVOUDa401yainApQJaow9jeBPg4dek=
+github.com/flux-framework/flux-sched/resource/reapi/bindings/go v0.0.0-20231213021445-250eac78a675/go.mod h1:yhmzNyn45YhoxEohh1Sl3h3izLMqL7qpcvmYTRpv7eY=
 github.com/form3tech-oss/jwt-go v3.2.2+incompatible/go.mod h1:pbq4aXjuKjdthFRnoDwaVPLA+WlJuPGy+QneDUgJi2k=
 github.com/form3tech-oss/jwt-go v3.2.3+incompatible/go.mod h1:pbq4aXjuKjdthFRnoDwaVPLA+WlJuPGy+QneDUgJi2k=
 github.com/fsnotify/fsnotify v1.4.7/go.mod h1:jwhsz4b93w/PPRr/qN1Yymfu8t87LnFCMoQvtojpjFo=
