# Deploying OSDFIR Infrastructure on Google Kubernetes Engine (GKE)

In this tutorial you will learn how to deploy and configure OSDFIR Infrastructure
on Google Kubernetes Engine (GKE). You will then learn how to configure dfTimewolf
to process a Google Cloud disk using Turbinia and then import any created timelines
into Timesketch.

GRR is not currently supported in this Helm chart deployment. We are working to add GRR support in a future release. In the meantime, you can find a dedicated guide for deploying GRR on GKE [here](https://github.com/google/osdfir-infrastructure/tree/main/cloud).

## Step 1: Set up Environment Variables

Before creating the Kubernetes (K8s) cluster, define the following environment
variables in your terminal. Replace the placeholders with your actual values:

```bash
export PROJECT_ID="your-gcp-project" # Your Google Cloud project ID
export PROJECT_NUMBER="your-gcp-number" # Your Google Cloud project number
export REGION="us-central1" # The region of your cluster
export ZONE="us-central1-f" # The zone where you want to create the cluster
export CLUSTER="osdfir-cluster" # The name you choose for your K8s cluster
export NAMESPACE="default" # Your K8s namespace (can be left as 'default')
export KSA_NAME="turbinia" # Your Turbinia K8s service account (defaults to 'turbinia' if not set)
```

> *Note*: You can find the GCP project number by running `gcloud projects describe $PROJECT_ID`.

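If you prefer, you can populate the project number directly from that command (a small
convenience sketch using `gcloud`'s standard `--format` flag):

```bash
export PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")
```
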
## Step 2: Create a Kubernetes Cluster

Now, create the Kubernetes cluster with the specified configurations:

```bash
gcloud container clusters create $CLUSTER \
  --num-nodes=1 \
  --machine-type "e2-standard-4" \
  --zone $ZONE \
  --workload-pool=$PROJECT_ID.svc.id.goog \
  --addons GcpFilestoreCsiDriver
```

> *Note*: It will take 4-5 minutes to create the cluster.

This command creates a single-node cluster with the necessary resources and addons
for running OSDFIR Infrastructure.

Important Considerations:

* GKE Autopilot is not currently supported because Turbinia workers require
  elevated privileges for disk processing.
* You need a machine type of at least `e2-standard-4` (or equivalent with at
  least 4 CPUs) when deploying to GKE.
* For clusters with more than one node, you'll need to set up a shared filesystem
  like GCP Filestore. In Kubernetes, this translates to using a Persistent Volume
  Claim (PVC) with `ReadWriteMany` access (see the sketch after this list).
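For reference only, the snippet below sketches what such a claim could look like if created
by hand, assuming the `standard-rwx` storage class provided by the Filestore CSI addon
enabled above; the claim name `osdfir-shared` is purely hypothetical, and in practice the
Helm chart can provision an equivalent claim for you (see the optional file storage step
later in this guide):

```bash
# Illustrative only: a ReadWriteMany claim backed by GCP Filestore (standard-rwx).
# The OSDFIR Helm chart can create an equivalent claim via its persistence values.
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: osdfir-shared   # hypothetical name, not used by the chart
  namespace: $NAMESPACE
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: standard-rwx
  resources:
    requests:
      storage: 1Ti      # Filestore basic-tier instances start at 1 TiB
EOF
```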

### Configure kubectl to Access the Cluster

Once the cluster has been created, set up the Google Kubernetes Engine auth
plugin for kubectl:

```bash
gcloud components install gke-gcloud-auth-plugin
# Fetch cluster credentials so kubectl can connect
gcloud container clusters get-credentials $CLUSTER --zone $ZONE
```

Now check that you can connect to the cluster:

```bash
kubectl get nodes -o wide
```

## Step 3: Create the Turbinia GCP Service Account

To process virtual machine disks in Google Cloud Platform (GCP) with Turbinia,
you need a dedicated GCP service account with the necessary permissions to attach
and detach disks.

```bash
# Grant the Compute Instance Admin role for attaching and detaching disks
gcloud projects add-iam-policy-binding projects/$PROJECT_ID \
  --role=roles/compute.instanceAdmin \
  --member=principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$PROJECT_ID.svc.id.goog/subject/ns/$NAMESPACE/sa/$KSA_NAME
```

```bash
# Grant the Service Account user role to allow the account to act as a service account
gcloud projects add-iam-policy-binding projects/$PROJECT_ID \
  --role=roles/compute.serviceAccountUser \
  --member=principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$PROJECT_ID.svc.id.goog/subject/ns/$NAMESPACE/sa/$KSA_NAME
```

These commands grant the following roles to the service account (a verification sketch
follows this list):

* Compute Instance Admin: Allows Turbinia to attach and detach disks from GCP
VMs for processing.
* Service Account User: Allows the Turbinia service account to act as this newly
created service account.
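
If you want to confirm the bindings landed, one option is to filter the project's IAM
policy with standard `gcloud` flags (shown here for the Compute Instance Admin role;
adjust the role to check the other binding):

```bash
gcloud projects get-iam-policy $PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.role:roles/compute.instanceAdmin" \
  --format="value(bindings.members)"
```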

## Step 4: Deploy the OSDFIR Infrastructure Helm Chart

Now it is time to deploy the OSDFIR Infrastructure Helm chart.

First, add the Helm repository and update it to pick up any new changes.

```console
helm repo add osdfir-charts https://google.github.io/osdfir-infrastructure
helm repo update
```

To install the chart, specify any release name of your choice.
For example, using `my-release` as the release name, run:

```bash
helm install my-release osdfir-charts/osdfir-infrastructure \
  --set turbinia.gcp.enabled=true \
  --set turbinia.gcp.projectID=$PROJECT_ID \
  --set turbinia.gcp.projectRegion=$REGION \
  --set turbinia.gcp.projectZone=$ZONE \
  --set turbinia.serviceaccount.name=$KSA_NAME \
  --set turbinia.worker.autoscaling.enabled=true
```

The command deploys OSDFIR Infrastructure on the Kubernetes cluster while enabling
the Turbinia GCP integration and autoscaling for Turbinia workers. Autoscaling
allows Turbinia to automatically adjust the number of worker pods based on CPU
utilization.

Verify the deployment:

```bash
kubectl get pods -n $NAMESPACE
```

You should see pods for Timesketch, Turbinia, and Yeti in a Running state.
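
Since worker autoscaling was enabled, the worker replica count should be managed by a
HorizontalPodAutoscaler (assuming the chart exposes autoscaling through a standard HPA
resource); you can inspect it with:

```bash
kubectl get hpa -n $NAMESPACE
```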

### Provisioning Shared File Storage (Optional)

While this example uses a single node, multi-node deployments need shared persistent
storage. To enable persistent storage with a `ReadWriteMany` PVC, add the following
`--set` flags to your `helm install` command:

```bash
--set persistence.storageClass="standard-rwx" \
--set persistence.accessModes[0]="ReadWriteMany"
```

This configures OSDFIR Infrastructure to provision a GCP Filestore instance with
`ReadWriteMany` access mode, which is suitable for multi-node clusters where
shared storage is required.
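
After installing with these flags, you can sanity-check the provisioned storage; the
`standard-rwx` class comes from the Filestore CSI addon enabled in Step 2:

```bash
# List the PersistentVolumeClaims created by the release and their access modes
kubectl get pvc -n $NAMESPACE

# Confirm the Filestore-backed storage class exists on the cluster
kubectl get storageclass standard-rwx
```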

## Step 5: Setup dfTimewolf and CLI configs

OSDFIR Infrastructure utilizes dfTimewolf for orchestrating forensic collection
and processing. dfTimewolf allows you to define "recipes" that specify how data
should be collected, processed by tools like Turbinia, and exported to platforms
like Timesketch.

To install dfTimewolf, you'll need to have Python 3.11 or greater, `git`, and
`pip` installed on your machine. dfTimewolf uses Poetry for simplified dependency
management. Then, follow these steps:

```bash
git clone https://github.com/log2timeline/dftimewolf.git && cd dftimewolf
pip install poetry
poetry install && poetry shell
```

Retrieve the Timesketch password from your deployment. For example, to grab
it from a release named `my-release`, run:

```bash
kubectl get secret --namespace default my-release-timesketch-secret -o jsonpath="{.data.timesketch-user}" | base64 -d
```

dfTimewolf uses a configuration file called `.dftimewolfrc` to store settings
such as your Timesketch credentials and endpoint. This allows you to avoid
entering these details every time you run a recipe.

Now, create a `.dftimewolfrc` file in your HOME directory, replacing
`$TIMESKETCH_PASSWORD` with the Timesketch password retrieved in the previous step:

```bash
cat >> ~/.dftimewolfrc << EOF
{
  "timesketch_username": "timesketch",
  "timesketch_password": "$TIMESKETCH_PASSWORD",
  "timesketch_endpoint": "http://127.0.0.1:5000",
  "turbinia_api": "http://127.0.0.1:8000"
}
EOF
```
Now you have dfTimewolf installed and configured to interact with your OSDFIR
Infrastructure deployment.
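
As a quick sanity check, you can confirm the `dftimewolf` entry point works from within
the Poetry shell; it should print usage information:

```bash
dftimewolf --help
```
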
## Step 6: Process a Google Cloud Disk

With OSDFIR Infrastructure deployed and dfTimewolf installed and configured,
you're ready to process a GCP disk.

This example uses the dfTimewolf `gcp_turbinia_ts` recipe, which processes an
existing GCP persistent disk with Turbinia and sends the resulting Plaso timeline
to Timesketch.

First, create a disk to process using a name such as `test-disk`:

```bash
gcloud compute disks create test-disk --zone $ZONE
```

> *Important*: The recipe requires that the disk being processed is in the same
> zone Turbinia is deployed to.
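
Since the zones must match, it can be worth confirming where the disk lives before
running the recipe:

```bash
gcloud compute disks list --filter="name=test-disk" --format="value(name,zone)"
```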
You'll need to use `kubectl port-forward` to forward the Turbinia and Timesketch services
locally to your machine. This allows you to access the Turbinia UI and the Timesketch API
from your local machine.

For example, to port-forward from a release named `my-release`, run the following commands
(the trailing `&` keeps each forward running in the background; you can also run them in
separate terminals):

```bash
kubectl --namespace default port-forward service/my-release-turbinia 8000:8000 &
kubectl --namespace default port-forward service/my-release-timesketch 5000:5000 &
```

Then run the recipe:

```bash
dftimewolf gcp_turbinia_ts $PROJECT_ID --disk_names test-disk
```
This command will:

* Process the disk with Turbinia, performing various forensic tasks such as running
  Plaso and looking for prevalent anomalies.
* Export any generated Plaso files to Timesketch.

You can monitor the progress of the processing in the Turbinia UI (`http://localhost:8000`)
and in the dfTimewolf output. Once the processing is complete, log in to Timesketch
(`http://localhost:5000`) and verify that a new timeline has been created. You can then
explore the timeline to analyze the processed artifacts.

Congratulations on completing the setup and processing your first disk! Please feel free to
see the optional workflows below for more examples.
### Additional Workflows

#### Processing Disks from a Different Project

In a real-world scenario, you may need to process a GCP instance or disk belonging to a
different project. To do this, you can use the dfTimewolf recipe `gcp_turbinia_disk_copy_ts`.
This recipe copies the disk from the source project to your analysis project running OSDFIR
Infrastructure, then processes it with Turbinia and sends the Plaso results to Timesketch.
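
A hypothetical invocation is sketched below; the positional arguments and the
`--disk_names` flag are assumptions carried over from the `gcp_turbinia_ts` example above,
so check `dftimewolf gcp_turbinia_disk_copy_ts --help` for the recipe's actual arguments:

```bash
# Hypothetical sketch: copy test-disk from a source project into $PROJECT_ID for processing
dftimewolf gcp_turbinia_disk_copy_ts source-project-id $PROJECT_ID --disk_names test-disk
```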
#### Processing Files and Directories with Turbinia

This method is useful when you have evidence that is not located on a GCP disk (e.g.,
evidence from a local machine).

To copy evidence data into the Turbinia pod, first identify a Turbinia worker pod by
running `kubectl get pods`. Then, use the `kubectl cp` command to copy the evidence file
to the desired location within the pod. For example, to copy `my_evidence.dd` from your
current directory to the `/mnt/turbiniavolume` directory in the turbinia-server-0 pod, run:

```bash
kubectl cp ./my_evidence.dd turbinia-server-0:/mnt/turbiniavolume/my_evidence.dd
```
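
You can verify that the copy landed where Turbinia expects it; the pod name and path below
mirror the `kubectl cp` command above:

```bash
kubectl exec turbinia-server-0 -- ls -lh /mnt/turbiniavolume/
```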
To interact with Turbinia and submit processing jobs, you'll need to install the Turbinia
client and configure it to connect to your Turbinia server.

```bash
pip3 install turbinia-client
```

Create a configuration file named `.turbinia_api_config.json` in your home directory with
the following content:

```bash
cat >> ~/.turbinia_api_config.json << EOF
{
  "default": {
    "API_SERVER_ADDRESS": "http://localhost",
    "API_SERVER_PORT": 8000,
    "API_AUTHENTICATION_ENABLED": false
  }
}
EOF
```
Then, submit a Turbinia request for the evidence:

```bash
turbinia-client submit directory --source_path /mnt/turbiniavolume/my_evidence.dd
```

This command submits a Turbinia request for the evidence you copied. The `--source_path`
parameter specifies the path to the evidence within the Turbinia pod.

You can monitor the progress of the processing in the Turbinia UI (accessible by
port-forwarding). Any Plaso jobs that run can have their output directly downloaded from
the Turbinia UI and imported into Timesketch.
#### Searching for IoCs with Yeti

Yeti enhances your Timesketch investigations by enabling you to search for Yeti
Intelligence (IoCs, threat data, etc.) across Timesketch timelines.

Learn how to use Yeti with Timesketch by following this
[guide](https://yeti-platform.io/guides/indicators-timesketch/investigation/).
