Merge pull request #444 from haiwen/k8s_log
K8s log
freeplant authored Jan 21, 2025
2 parents cdf6e5f + 12cd7dc commit 2e70b7b
Showing 5 changed files with 240 additions and 4 deletions.
2 changes: 1 addition & 1 deletion manual/repo/k8s/ce/seafile-env.yaml
@@ -5,7 +5,7 @@ metadata:
 data:
   # for Seafile server
   TIME_ZONE: "UTC"
-  SEAFILE_LOG_TO_STDOUT: "true"
+  SEAFILE_LOG_TO_STDOUT: "false"
   SITE_ROOT: "/"
   ENABLE_SEADOC: "false"
   SEADOC_SERVER_URL: "https://seafile.example.com/sdoc-server" # only valid in ENABLE_SEADOC = true
2 changes: 1 addition & 1 deletion manual/repo/k8s/pro/seafile-env.yaml
@@ -5,7 +5,7 @@ metadata:
 data:
   # for Seafile server
   TIME_ZONE: "UTC"
-  SEAFILE_LOG_TO_STDOUT: "true"
+  SEAFILE_LOG_TO_STDOUT: "false"
   SITE_ROOT: "/"
   ENABLE_SEADOC: "false"
   SEADOC_SERVER_URL: "https://seafile.example.com/sdoc-server" # only valid in ENABLE_SEADOC = true
229 changes: 229 additions & 0 deletions manual/setup/cluster_deploy_with_k8s.md
@@ -241,3 +241,232 @@ Finally, you should modify the related URLs in `seahub_settings.py`, from `http:
SERVICE_URL = "https://seafile.example.com"
FILE_SERVER_ROOT = 'https://seafile.example.com/seafhttp'
```
## Log routing and aggregation system
Similar to [Single-pod Seafile](./k8s_single_node.md), you can browse the log files of Seafile directly in the persistent volume directory. The difference is that when using K8S to deploy a Seafile cluster (especially in a cloud environment), the persistent volume is usually shared and synchronized across all nodes, while ***the logs generated by the Seafile services do not record which node they come from***. Browsing the files in that directory therefore makes it difficult to identify the node a given log entry was generated on. The solution proposed here is:
1. Record the generated logs to standard output. This way, the logs can be distinguished per node via `kubectl logs` (although all types of logs are output together). You can enable this feature (**it should be enabled by default in a K8S Seafile cluster, but not in K8S single-pod Seafile**) by setting `SEAFILE_LOG_TO_STDOUT` to `true` in `seafile-env.yaml`:

    ```yaml
    ...
    data:
      ...
      SEAFILE_LOG_TO_STDOUT: "true"
      ...
    ```

    Then restart the Seafile server:

    ```sh
    kubectl delete -f /opt/seafile-k8s-yaml/
    kubectl apply -f /opt/seafile-k8s-yaml/
    ```
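
    Afterwards, the per-node logs can be inspected pod by pod. For example (the pod name below is illustrative):

    ```sh
    # list the Seafile pods and the nodes they are running on
    kubectl get pods -o wide | grep seafile
    # follow the standard output of one pod
    kubectl logs -f seafile-frontend-748b695648-d6l4g
    ```
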
2. The logs from step 1 can be distinguished between nodes, but they are aggregated and output together, which is inconvenient for log retrieval. You therefore have to route the standard-output logs (i.e., distinguish them by component name) and re-record them in new files or upload them to a log aggregation system (e.g., [*Loki*](https://grafana.com/oss/loki/)).
Currently in the K8S environment, the commonly used log routing plugins are:
- [*Fluent Bit*](https://fluentbit.io/)
- [*Fluentd*](https://www.fluentd.org/)
- [*Logstash*](https://www.elastic.co/logstash/)
- [*Promtail*](https://grafana.com/loki/docs/sources/promtail/) (also a part of Loki)
***Fluent Bit*** and ***Promtail*** are more lightweight (i.e., they consume fewer system resources), but *Promtail* only supports sending logs to *Loki*. Therefore, this document mainly introduces log routing through ***Fluent Bit***, a fast, lightweight logs and metrics agent. It is a CNCF graduated sub-project under the umbrella of *Fluentd*, and is licensed under the terms of the Apache License v2.0. First, deploy *Fluent Bit* in your K8S cluster by following the [official document](https://docs.fluentbit.io/manual/installation/kubernetes). Then modify the Fluent-Bit DaemonSet to mount an additional directory from which to load the configuration files:
```yaml
# kubectl edit ds fluent-bit
...
spec:
  ...
  template:
    ...
    spec:
      ...
      containers:
      - name: fluent-bit
        volumeMounts:
        ...
        - mountPath: /fluent-bit/etc/seafile
          name: fluent-bit-seafile
        ...
      volumes:
      ...
      - hostPath:
          path: /opt/fluent-bit
        name: fluent-bit-seafile
```
and
```yaml
# kubectl edit cm fluent-bit
data:
  ...
  fluent-bit.conf: |
    [SERVICE]
        ...
        Parsers_File /fluent-bit/etc/seafile/confs/parsers.conf
    ...
    @INCLUDE /fluent-bit/etc/seafile/confs/*-log.conf
```
For example, here we use `/opt/fluent-bit/confs` (**it has to be a non-shared, node-local directory**). The parsers will be defined in `/opt/fluent-bit/confs/parsers.conf`, and each type of log (e.g., *seahub*'s logs, *seafevents*' logs) will be configured in a `/opt/fluent-bit/confs/*-log.conf` file. Each `.conf` file defines several Fluent-Bit data pipeline components:

| **Pipeline** | **Description** | **Required/Optional** |
| ------------ | --------------- | --------------------- |
| **INPUT**    | Specifies where and how Fluent-Bit reads the original log records, and assigns a tag to each record after it is read. | Required |
| **PARSER**   | Parses the read log records. Logs from the K8S Docker runtime are usually in JSON format. | Required |
| **FILTER**   | Filters and selects log records with a specified tag, and can assign a new tag to the resulting records. | Optional |
| **OUTPUT**   | Tells Fluent-Bit in what format and to which destination (e.g., file, *Elasticsearch*, *Loki*) the log records with the specified tag will be written. | Required |
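
To summarize, the configuration layout used in the rest of this section can be prepared on each node as follows (the file names match the ones created below):

```sh
mkdir -p /opt/fluent-bit/confs
# files created in the following sections:
#   parsers.conf                - all PARSER definitions
#   seafile-log.conf            - INPUT and FILTER definitions
#   seaf-server-log.conf        - OUTPUT for seaf-server logs
#   seahub-log.conf             - OUTPUT for seahub logs
#   seafevents-log.conf         - OUTPUT for seafevents logs
#   seafile-slow-rpc-log.conf   - OUTPUT for seafile-slow-rpc logs
```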

!!! warning
    ***PARSER*** definitions can only be stored in `/opt/fluent-bit/confs/parsers.conf`; otherwise, Fluent-Bit cannot start up normally.

### Input

As described above, each container generates a log file (usually `/var/log/containers/<container-name>-xxxxxx.log`), so you need to define an input for it. Add the following to `/opt/fluent-bit/confs/seafile-log.conf` (for more details, please refer to the official document about the [*tail* input](https://docs.fluentbit.io/manual/pipeline/inputs/tail)):

```conf
# for the definition of the "Docker" parser, see the next section
[INPUT]
    Name               tail
    Path               /var/log/containers/seafile-frontend-*.log
    Buffer_Chunk_Size  2MB
    Buffer_Max_Size    10MB
    Docker_Mode        On
    Docker_Mode_Flush  5
    Tag                seafile.*
    Parser             Docker

[INPUT]
    Name               tail
    Path               /var/log/containers/seafile-backend-*.log
    Buffer_Chunk_Size  2MB
    Buffer_Max_Size    10MB
    Docker_Mode        On
    Docker_Mode_Flush  5
    Tag                seafile.*
    Parser             Docker
```

The above defines two inputs, which monitor the seafile-frontend and seafile-backend services respectively. They are written together here because, for a given node, you may not know in advance when it will run the frontend service and when it will run the backend service; both inputs share the same tag prefix `seafile.`.
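
To check that the inputs have picked up the container log files, you can inspect Fluent-Bit's own output, which normally reports the files watched by its *tail* inputs (assuming the DaemonSet is named `fluent-bit`):

```sh
kubectl logs ds/fluent-bit | grep -i 'input:tail'
```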

### Parser

Each input has to use a parser to parse the logs and pass them on to the filters. Here, a parser named `Docker` is created to parse the logs generated by containers under the *K8S Docker runtime*. The parser is placed in `/opt/fluent-bit/confs/parsers.conf` (for more details, please refer to the official document about the [JSON parser](https://docs.fluentbit.io/manual/pipeline/parsers/json)):

```conf
[PARSER]
    Name         Docker
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%LZ
```

!!! tip "Log records after parsing"
The logs of the Docker container are saved in /var/log/containers in **Json** format (see the sample below), which is why we use the `Json` format in the above parser.

```json
{"log":"[seaf-server] [2025-01-17 07:43:48] [INFO] seafile-session.c(86): fileserver: web_token_expire_time = 3600\n","stream":"stdout","time":"2025-01-17T07:43:48.294638442Z"}
{"log":"[seaf-server] [2025-01-17 07:43:48] [INFO] seafile-session.c(98): fileserver: max_index_processing_threads= 3\n","stream":"stdout","time":"2025-01-17T07:43:48.294810145Z"}
{"log":"[seaf-server] [2025-01-17 07:43:48] [INFO] seafile-session.c(111): fileserver: fixed_block_size = 8388608\n","stream":"stdout","time":"2025-01-17T07:43:48.294879777Z"}
{"log":"[seaf-server] [2025-01-17 07:43:48] [INFO] seafile-session.c(123): fileserver: max_indexing_threads = 1\n","stream":"stdout","time":"2025-01-17T07:43:48.295002479Z"}
{"log":"[seaf-server] [2025-01-17 07:43:48] [INFO] seafile-session.c(138): fileserver: put_head_commit_request_timeout = 10\n","stream":"stdout","time":"2025-01-17T07:43:48.295082733Z"}
{"log":"[seaf-server] [2025-01-17 07:43:48] [INFO] seafile-session.c(150): fileserver: skip_block_hash = 0\n","stream":"stdout","time":"2025-01-17T07:43:48.295195843Z"}
{"log":"[seaf-server] [2025-01-17 07:43:48] [INFO] ../common/seaf-utils.c(553): Use database Mysql\n","stream":"stdout","time":"2025-01-17T07:43:48.29704895Z"}
```

When these logs are read by the inputs and parsed by the parser, they become independent log records with the following fields:

- `log`: The original log content (i.e., the same as you see in `kubectl logs seafile-xxx`) plus an extra line break (`\n`) at the end. **This is also the field we need to save or upload to the log aggregation system in the end**.
- `stream`: The stream the log came from; `stdout` means *standard output*.
- `time`: The time when the log was recorded in the corresponding stream (ISO 8601 format).


### Filter

Add two filters in `/opt/fluent-bit/confs/seafile-log.conf` for record filtering and routing. The [*record_modifier* filter](https://docs.fluentbit.io/manual/pipeline/filters/record-modifier) selects the useful keys in the log records (see the *tip* above; only the `log` field is needed), and the [*rewrite_tag* filter](https://docs.fluentbit.io/manual/pipeline/filters/rewrite-tag) routes logs according to specific rules:

```conf
[FILTER]
    Name           record_modifier
    Match          seafile.*
    Allowlist_key  log

# route each record to a new tag according to the component name in the log line
# Rule format: $KEY  REGEX  NEW_TAG  KEEP
[FILTER]
    Name   rewrite_tag
    Match  seafile.*
    Rule   $log  ^.*\[seaf-server\].*$        seaf-server        false
    Rule   $log  ^.*\[seahub\].*$             seahub             false
    Rule   $log  ^.*\[seafevents\].*$         seafevents         false
    Rule   $log  ^.*\[seafile-slow-rpc\].*$   seafile-slow-rpc   false
```
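
Alternatively, if you only want to re-record the routed logs into per-component files on the node (rather than uploading them to a log aggregation system), a minimal sketch using Fluent-Bit's built-in [*file* output plugin](https://docs.fluentbit.io/manual/pipeline/outputs/file) could look like this (the target directory is an example and must exist on the node):

```conf
[OUTPUT]
    Name   file
    Match  seaf-server
    Path   /var/log/seafile
    File   seaf-server.log
```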

### Output logs to *Loki*

Loki is a multi-tenant log aggregation system inspired by *Prometheus*. It is designed to be very cost-effective and easy to operate. The Fluent-Bit built-in *loki* output plugin allows you to send your logs or events to a *Loki* service. It supports data enrichment with Kubernetes labels, custom label keys and Tenant IDs, among others.

!!! tip "Alternative Fluent-Bit Loki plugin by *Grafana*"
For sending logs to Loki, there are [two plugins](https://grafana.com/docs/loki/latest/send-data/fluentbit/) for Fluent-Bit:

- The [built-in *Loki* plugin](https://docs.fluentbit.io/manual/pipeline/outputs/loki) maintained by the Fluent-Bit officially, and we will use it in this part because it provides the most complete features.
- [*Grafana-loki* plugin](https://grafana.com/docs/loki/latest/send-data/fluentbit/community-plugin/) maintained by *Grafana Labs*.


Since each output is distinguished only by the tag it matches (Fluent-Bit organizes its pipeline as per-tag workflows), define one output per component, each in its own `.conf` file:

- ***seaf-server logs***: Add an output to `/opt/fluent-bit/confs/seaf-server-log.conf`:

    ```conf
    # node_name and node_id are optional, but recommended for identifying the source node
    [OUTPUT]
        Name    loki
        Match   seaf-server
        Host    <your Loki's host>
        Port    <your Loki's port>
        labels  job=fluentbit, node_name=<your-node-name>, node_id=<your-node-id>
    ```

- ***seahub logs***: Add an output to `/opt/fluent-bit/confs/seahub-log.conf`:

    ```conf
    # node_name and node_id are optional, but recommended for identifying the source node
    [OUTPUT]
        Name    loki
        Match   seahub
        Host    <your Loki's host>
        Port    <your Loki's port>
        labels  job=fluentbit, node_name=<your-node-name>, node_id=<your-node-id>
    ```

- ***seafevents logs***: Add an output to `/opt/fluent-bit/confs/seafevents-log.conf`:

    ```conf
    # node_name and node_id are optional, but recommended for identifying the source node
    [OUTPUT]
        Name    loki
        Match   seafevents
        Host    <your Loki's host>
        Port    <your Loki's port>
        labels  job=fluentbit, node_name=<your-node-name>, node_id=<your-node-id>
    ```

- ***seafile-slow-rpc logs***: Add an output to `/opt/fluent-bit/confs/seafile-slow-rpc-log.conf`:

    ```conf
    # node_name and node_id are optional, but recommended for identifying the source node
    [OUTPUT]
        Name    loki
        Match   seafile-slow-rpc
        Host    <your Loki's host>
        Port    <your Loki's port>
        labels  job=fluentbit, node_name=<your-node-name>, node_id=<your-node-id>
    ```
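
Once the outputs are in place and Fluent-Bit has been restarted, you can verify that the log records arrive in Loki by querying its HTTP API, for example (host and port are placeholders; `3100` is Loki's default HTTP port):

```sh
curl -G -s "http://<your Loki's host>:3100/loki/api/v1/query_range" \
    --data-urlencode 'query={job="fluentbit"}' \
    --data-urlencode 'limit=10'
```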

!!! tip "Cloud Loki instance"
If you are using a cloud Loki instance, you can follow the [Fluent-Bit Loki plugin document](https://docs.fluentbit.io/manual/pipeline/outputs/loki) to fill up all necessary fields. Usually, the following fields are **additional needs** in cloud Loki service:

- `tls`
- `tls.verify`
- `http_user`
- `http_passwd`
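
    For example, a sketch of an output for a TLS-protected cloud Loki instance might look like this (host, user and password are placeholders):

    ```conf
    [OUTPUT]
        Name         loki
        Match        seaf-server
        Host         <your cloud Loki's host>
        Port         443
        tls          on
        tls.verify   on
        http_user    <your Loki user>
        http_passwd  <your Loki API key or password>
        labels       job=fluentbit
    ```
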
9 changes: 8 additions & 1 deletion manual/setup/k8s_single_node.md
@@ -133,4 +133,11 @@ kubectl exec -it seafile-748b695648-d6l4g -- bash

## HTTPS

-Please refer [here](./cluster_deploy_with_k8s.md#load-balance-and-https) about suggestions of enabling HTTPS in K8S.
+Please refer to [here](./cluster_deploy_with_k8s.md#load-balance-and-https) for suggestions on enabling HTTPS in K8S.
+
+## Seafile directory structure
+
+Please refer to [here](./setup_pro_by_docker.md#seafile-directory-structure) for the details.
+
+!!! tip "Send logs to Loki"
+    You can directly view the log files of single-pod Seafile in the persistent volume directory, as the log files remain identifiable even if the pod's node changes (there will only ever be one node running Seafile), so by default single-pod Seafile logs are not output to standard output. If you need to record these log files to a log server (e.g., [*Loki*](https://grafana.com/oss/loki/)), please refer to [here](./cluster_deploy_with_k8s.md#log-routing-and-aggregation-system) for more information.
2 changes: 1 addition & 1 deletion manual/setup/setup_pro_by_docker.md
@@ -228,7 +228,7 @@ docker compose up -d

Placeholder spot for shared volumes. You may elect to store certain persistent information outside of a container, in our case we keep various log files and upload directory outside. This allows you to rebuild containers easily without losing important information.

-* /opt/seafile-data/seafile: This is the directory for seafile server configurationlogs and data.
+* /opt/seafile-data/seafile: This is the directory for seafile server configuration, logs and data.
* /opt/seafile-data/seafile/logs: This is the directory that would contain the log files of seafile server processes. For example, you can find seaf-server logs in `/opt/seafile-data/seafile/logs/seafile.log`.
* /opt/seafile-data/logs: This is the directory for operating system and Nginx logs.
* /opt/seafile-data/logs/var-log: This is the directory that would be mounted as `/var/log` inside the container. For example, you can find the nginx logs in `/opt/seafile-data/logs/var-log/nginx/`.
