# Deploying OSDFIR Infrastructure on Google Kubernetes Engine (GKE)

In this tutorial you will learn how to deploy and configure OSDFIR Infrastructure
on Google Kubernetes Engine (GKE). You will then learn how to configure dfTimewolf
to process a Google Cloud disk using Turbinia and then import any created timelines
into Timesketch.

GRR is not currently supported in this Helm chart deployment. We are working to add GRR support in a future release. In the meantime, you can find a dedicated guide for deploying GRR on GKE [here](https://github.com/google/osdfir-infrastructure/tree/main/cloud).

## Step 1: Set up Environment Variables

Before creating the Kubernetes (K8s) cluster, define the following environment
variables in your terminal. Replace the placeholders with your actual values:

```bash
export PROJECT_ID="your-gcp-project" # Your Google Cloud project ID
export PROJECT_NUMBER="your-gcp-number" # Your Google Cloud project number
export REGION="us-central1" # The region of your cluster
export ZONE="us-central1-f" # The zone where you want to create the cluster
export CLUSTER="osdfir-cluster" # The name you choose for your K8s cluster
export NAMESPACE="default" # Your K8s namespace (can be left as 'default')
export KSA_NAME="turbinia" # Your Turbinia K8s service account (defaults to 'turbinia' if not set)
```

> *Note*: You can find the GCP project number by running `gcloud projects describe $PROJECT_ID`.

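If you prefer, you can populate the project number directly from that command (a small
convenience sketch using `gcloud`'s standard `--format` flag):

```bash
export PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")
```
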
## Step 2: Create a Kubernetes Cluster

Now, create the Kubernetes cluster with the specified configurations:

```bash
gcloud container clusters create $CLUSTER \
  --num-nodes=1 \
  --machine-type "e2-standard-4" \
  --zone $ZONE \
  --workload-pool=$PROJECT_ID.svc.id.goog \
  --addons GcpFilestoreCsiDriver
```

> *Note*: It will take 4-5 minutes to create the cluster.

This command creates a single-node cluster with the necessary resources and addons
for running OSDFIR Infrastructure.

Important Considerations:

* GKE Autopilot is not currently supported because Turbinia workers require
  elevated privileges for disk processing.
* You need a machine type of at least `e2-standard-4` (or equivalent with at
  least 4 CPUs) when deploying to GKE.
* For clusters with more than one node, you'll need to set up a shared filesystem
  like GCP Filestore. In Kubernetes, this translates to using a Persistent Volume
  Claim (PVC) with `ReadWriteMany` access (see the sketch after this list).
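For reference only, the snippet below sketches what such a claim could look like if created
by hand, assuming the `standard-rwx` storage class provided by the Filestore CSI addon
enabled above; the claim name `osdfir-shared` is purely hypothetical, and in practice the
Helm chart can provision an equivalent claim for you (see the optional file storage step
later in this guide):

```bash
# Illustrative only: a ReadWriteMany claim backed by GCP Filestore (standard-rwx).
# The OSDFIR Helm chart can create an equivalent claim via its persistence values.
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: osdfir-shared   # hypothetical name, not used by the chart
  namespace: $NAMESPACE
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: standard-rwx
  resources:
    requests:
      storage: 1Ti      # Filestore basic-tier instances start at 1 TiB
EOF
```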

### Configure kubectl to Access the Cluster

Once the cluster has been created, set up the Google Kubernetes Engine auth
plugin for kubectl:

```bash
gcloud components install gke-gcloud-auth-plugin
# Fetch cluster credentials so kubectl can connect
gcloud container clusters get-credentials $CLUSTER --zone $ZONE
```

Now check that you can connect to the cluster:

```bash
kubectl get nodes -o wide
```

## Step 3: Create the Turbinia GCP Service Account

To process virtual machine disks in Google Cloud Platform (GCP) with Turbinia,
you need a dedicated GCP service account with the necessary permissions to attach
and detach disks.

```bash
# Grant the Compute Instance Admin role for attaching and detaching disks
gcloud projects add-iam-policy-binding projects/$PROJECT_ID \
  --role=roles/compute.instanceAdmin \
  --member=principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$PROJECT_ID.svc.id.goog/subject/ns/$NAMESPACE/sa/$KSA_NAME
```

```bash
# Grant the Service Account user role to allow the account to act as a service account
gcloud projects add-iam-policy-binding projects/$PROJECT_ID \
  --role=roles/compute.serviceAccountUser \
  --member=principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$PROJECT_ID.svc.id.goog/subject/ns/$NAMESPACE/sa/$KSA_NAME
```

These commands grant the following roles to the service account (a verification sketch
follows this list):

* Compute Instance Admin: Allows Turbinia to attach and detach disks from GCP
VMs for processing.
* Service Account User: Allows the Turbinia service account to act as this newly
created service account.
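
If you want to confirm the bindings landed, one option is to filter the project's IAM
policy with standard `gcloud` flags (shown here for the Compute Instance Admin role;
adjust the role to check the other binding):

```bash
gcloud projects get-iam-policy $PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.role:roles/compute.instanceAdmin" \
  --format="value(bindings.members)"
```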

## Step 4: Deploy the OSDFIR Infrastructure Helm Chart

Now it is time to deploy the OSDFIR Infrastructure Helm chart.

First, add the Helm repository and update it to pick up any new changes.

```console
helm repo add osdfir-charts https://google.github.io/osdfir-infrastructure
helm repo update
```

To install the chart, specify any release name of your choice.
For example, using `my-release` as the release name, run:

```bash
helm install my-release osdfir-charts/osdfir-infrastructure \
  --set turbinia.gcp.enabled=true \
  --set turbinia.gcp.projectID=$PROJECT_ID \
  --set turbinia.gcp.projectRegion=$REGION \
  --set turbinia.gcp.projectZone=$ZONE \
  --set turbinia.serviceaccount.name=$KSA_NAME \
  --set turbinia.worker.autoscaling.enabled=true
```

The command deploys OSDFIR Infrastructure on the Kubernetes cluster while enabling
the Turbinia GCP integration and autoscaling for Turbinia workers. Autoscaling
allows Turbinia to automatically adjust the number of worker pods based on CPU
utilization.

Verify the deployment:

```bash
kubectl get pods -n $NAMESPACE
```

You should see pods for Timesketch, Turbinia, and Yeti in a Running state.
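
Since worker autoscaling was enabled, the worker replica count should be managed by a
HorizontalPodAutoscaler (assuming the chart exposes autoscaling through a standard HPA
resource); you can inspect it with:

```bash
kubectl get hpa -n $NAMESPACE
```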

### Provisioning Shared File Storage (Optional)

While this example uses a single node, multi-node deployments need shared persistent
storage. To enable persistent storage with a `ReadWriteMany` PVC, add the following
`--set` flags to your `helm install` command:

```bash
--set persistence.storageClass="standard-rwx" \
--set persistence.accessModes[0]="ReadWriteMany"
```

This configures OSDFIR Infrastructure to provision a GCP Filestore instance with
`ReadWriteMany` access mode, which is suitable for multi-node clusters where
shared storage is required.
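
After installing with these flags, you can sanity-check the provisioned storage; the
`standard-rwx` class comes from the Filestore CSI addon enabled in Step 2:

```bash
# List the PersistentVolumeClaims created by the release and their access modes
kubectl get pvc -n $NAMESPACE

# Confirm the Filestore-backed storage class exists on the cluster
kubectl get storageclass standard-rwx
```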

## Step 5: Setup dfTimewolf and CLI configs

OSDFIR Infrastructure utilizes dfTimewolf for orchestrating forensic collection
and processing. dfTimewolf allows you to define "recipes" that specify how data
should be collected, processed by tools like Turbinia, and exported to platforms
like Timesketch.

To install dfTimewolf, you'll need to have Python 3.11 or greater, `git`, and
`pip` installed on your machine. dfTimewolf uses Poetry for simplified dependency
management. Then, follow these steps:

```bash
git clone https://github.com/log2timeline/dftimewolf.git && cd dftimewolf
pip install poetry
poetry install && poetry shell
```

Retrieve the Timesketch password from your deployment. For example, to grab
it from a release named `my-release`, run:

```bash
kubectl get secret --namespace default my-release-timesketch-secret -o jsonpath="{.data.timesketch-user}" | base64 -d
```

dfTimewolf uses a configuration file called `.dftimewolfrc` to store settings
such as your Timesketch credentials and endpoint. This allows you to avoid
entering these details every time you run a recipe.

Now, create a `.dftimewolfrc` file in your HOME directory, replacing
`$TIMESKETCH_PASSWORD` with the Timesketch password retrieved in the previous step:

```bash
cat >> ~/.dftimewolfrc << EOF
{
  "timesketch_username": "timesketch",
  "timesketch_password": "$TIMESKETCH_PASSWORD",
  "timesketch_endpoint": "http://127.0.0.1:5000",
  "turbinia_api": "http://127.0.0.1:8000"
}
EOF
```
Now you have dfTimewolf installed and configured to interact with your OSDFIR
Infrastructure deployment.
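
As a quick sanity check, you can confirm the `dftimewolf` entry point works from within
the Poetry shell; it should print usage information:

```bash
dftimewolf --help
```
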
## Step 6: Process a Google Cloud Disk

With OSDFIR Infrastructure deployed and dfTimewolf installed and configured,
you're ready to process a GCP disk.

This example uses the dfTimewolf `gcp_turbinia_ts` recipe, which processes an
existing GCP persistent disk with Turbinia and sends the resulting Plaso timeline
to Timesketch.

First, create a disk to process using a name such as `test-disk`:

```bash
gcloud compute disks create test-disk --zone $ZONE
```

> *Important*: The recipe requires that the disk being processed is in the same
> zone Turbinia is deployed to.
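
Since the zones must match, it can be worth confirming where the disk lives before
running the recipe:

```bash
gcloud compute disks list --filter="name=test-disk" --format="value(name,zone)"
```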
You'll need to use `kubectl port-forward` to forward the Turbinia and Timesketch services
locally to your machine. This allows you to access the Turbinia UI and the Timesketch API
from your local machine.

For example, to port-forward from a release named `my-release`, run the following commands
(the trailing `&` keeps each forward running in the background; you can also run them in
separate terminals):

```bash
kubectl --namespace default port-forward service/my-release-turbinia 8000:8000 &
kubectl --namespace default port-forward service/my-release-timesketch 5000:5000 &
```

Then run the recipe:

```bash
dftimewolf gcp_turbinia_ts $PROJECT_ID --disk_names test-disk
```
This command will:

* Process the disk with Turbinia, performing various forensic tasks such as running
  Plaso and looking for prevalent anomalies.
* Export any generated Plaso files to Timesketch.

You can monitor the progress of the processing in the Turbinia UI (`http://localhost:8000`)
and in the dfTimewolf output. Once the processing is complete, log in to Timesketch
(`http://localhost:5000`) and verify that a new timeline has been created. You can then
explore the timeline to analyze the processed artifacts.

Congratulations on completing the setup and processing your first disk! Please feel free to
see the optional workflows below for more examples.
### Additional Workflows

#### Processing Disks from a Different Project

In a real-world scenario, you may need to process a GCP instance or disk belonging to a
different project. To do this, you can use the dfTimewolf recipe `gcp_turbinia_disk_copy_ts`.
This recipe copies the disk from the source project to your analysis project running OSDFIR
Infrastructure, then processes it with Turbinia and sends the Plaso results to Timesketch.
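
A hypothetical invocation is sketched below; the positional arguments and the
`--disk_names` flag are assumptions carried over from the `gcp_turbinia_ts` example above,
so check `dftimewolf gcp_turbinia_disk_copy_ts --help` for the recipe's actual arguments:

```bash
# Hypothetical sketch: copy test-disk from a source project into $PROJECT_ID for processing
dftimewolf gcp_turbinia_disk_copy_ts source-project-id $PROJECT_ID --disk_names test-disk
```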
#### Processing Files and Directories with Turbinia

This method is useful when you have evidence that is not located on a GCP disk (e.g.,
evidence from a local machine).

To copy evidence data into the Turbinia pod, first identify a Turbinia worker pod by
running `kubectl get pods`. Then, use the `kubectl cp` command to copy the evidence file
to the desired location within the pod. For example, to copy `my_evidence.dd` from your
current directory to the `/mnt/turbiniavolume` directory in the turbinia-server-0 pod, run:

```bash
kubectl cp ./my_evidence.dd turbinia-server-0:/mnt/turbiniavolume/my_evidence.dd
```
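
You can verify that the copy landed where Turbinia expects it; the pod name and path below
mirror the `kubectl cp` command above:

```bash
kubectl exec turbinia-server-0 -- ls -lh /mnt/turbiniavolume/
```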
To interact with Turbinia and submit processing jobs, you'll need to install the Turbinia
client and configure it to connect to your Turbinia server.

```bash
pip3 install turbinia-client
```

Create a configuration file named `.turbinia_api_config.json` in your home directory with
the following content:

```bash
cat >> ~/.turbinia_api_config.json << EOF
{
  "default": {
    "API_SERVER_ADDRESS": "http://localhost",
    "API_SERVER_PORT": 8000,
    "API_AUTHENTICATION_ENABLED": false
  }
}
EOF
```
Then, submit a Turbinia request for the evidence:

```bash
turbinia-client submit directory --source_path /mnt/turbiniavolume/my_evidence.dd
```

This command submits a Turbinia request for the evidence you copied. The `--source_path`
parameter specifies the path to the evidence within the Turbinia pod.

You can monitor the progress of the processing in the Turbinia UI (accessible by
port-forwarding). Any Plaso jobs that run can have their output directly downloaded from
the Turbinia UI and imported into Timesketch.
#### Searching for IoCs with Yeti

Yeti enhances your Timesketch investigations by enabling you to search for Yeti
Intelligence (IoCs, threat data, etc.) across Timesketch timelines.

Learn how to use Yeti with Timesketch by following this
[guide](https://yeti-platform.io/guides/indicators-timesketch/investigation/).
