As a korifi operator I want to be able to use my own log cache endpoint #3668

georgethebeatle · 2024-12-17T15:22:00Z

Dev Notes

Currently Korifi implements the api/v1/read log cache endpoint in order to get logs for an app
Currently Korifi queries the metrics server CRDs for pod stats in order to satisfy the processes/stats endpoint
In CF for VMs, process stats are also fetched from the log cache API.
We should implement api/v1/query in the log cache handler and make the process stats repository loop back to this endpoint
Furthermore, we should also make the log cache API endpoint configurable in the helm chart so that operators can override it
If no value is provided, we will keep looping back to the naive implementation by default
Log cache api spec
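
The fallback described in the notes above could look roughly like the sketch below. It is written in Python for brevity and is not Korifi's actual code; the helper name `resolve_log_cache_url` and both of its parameters are hypothetical, and the name of the helm value is not specified in this issue.

```python
from typing import Optional


def resolve_log_cache_url(configured_url: Optional[str], korifi_api_url: str) -> str:
    """Pick the Log Cache base URL (hypothetical helper, not Korifi code).

    An operator-supplied URL wins; an empty or missing value loops back to
    the built-in implementation served by the Korifi API itself.
    """
    if configured_url and configured_url.strip():
        return configured_url.strip().rstrip("/")
    return korifi_api_url.rstrip("/")
```

With no override, the process stats code would keep talking to the Korifi API; with an override, the same code path would reach the operator's own log cache.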


chombium · 2024-12-18T11:04:28Z

Hi,

CF for VMs supports various log types, which in practice are the app logs themselves combined with the logs written by the platform components that are related to an app. Examples include access logs, staging logs, app restarts, rescheduling and similar. All these logs are stored in Log Cache and are available through the Log Cache API. It would be nice to have all these app and app lifecycle logs available, but we should find out how and where to get them from.

I've tested and compared what kind of logs are printed by cf logs in CF-for-VMs and Korifi during and after different cf lifecycle events. Note: cf logs essentially calls the Log Cache v1/read API for the given app with the limit=200 parameter.
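
As an illustration of the request mentioned in the note, here is a minimal Python sketch that builds such a read URL. It assumes the app GUID is used as the Log Cache source ID in the path; the helper name `build_read_url` and the host in the usage example are hypothetical.

```python
from urllib.parse import quote, urlencode


def build_read_url(log_cache_url: str, app_guid: str, limit: int = 200) -> str:
    """Build a Log Cache read URL for one app (assumes source ID == app GUID)."""
    query = urlencode({"limit": limit})
    base = log_cache_url.rstrip("/")
    return f"{base}/api/v1/read/{quote(app_guid, safe='')}?{query}"
```

For example, `build_read_url("https://log-cache.example", "abc-123")` yields a URL ending in `/api/v1/read/abc-123?limit=200`.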

In my tests so far, the cf push output in Korifi is pretty similar, if not identical, to the one on CF-for-VMs. For other commands like cf restage and cf restart, Korifi only outputs the app logs, and the other platform-related logs have to be fetched from the pod events with kubectl get events <pod>. In Korifi we essentially have to combine data from various places to get a "cf logs" output as close as possible to what we have on CF-for-VMs.

The other aspect to think about is that CF for VMs follows the Loggregator API format and observability metadata. We need to decide whether we want to use the same format and, if so, where and how we get the data from.

With regard to the api/v1/query endpoint, it is a Prometheus-compatible API, so if we can make the metrics available via Prometheus, we only need to map the API endpoint somewhere.

I guess /api/v1/meta would be out of scope if we simply read the logs from the k8s API server. We could add something afterwards if needed.
Update: The meta endpoint shows the state of Log Cache and how many data entries (logs and metrics) for apps and platform components are stored inside it. We could add something similar once we decide what to use as a caching layer, if a cache is needed at all. If we don't need to cache anything, it is safe to leave out this endpoint.

The one important thing to think about is how to collect and merge the logs and metrics in a single output(api) in case an app has multiple instances (a workload with multiple pods).
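
One possible way to merge the output of several instances is a timestamp-ordered merge of the per-pod streams. The Python sketch below assumes each pod's log lines are already sorted by time; the `LogLine` type and `merge_instance_logs` function are hypothetical names used only for illustration.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class LogLine(NamedTuple):
    timestamp_ns: int  # nanoseconds since the Unix epoch
    instance: str      # pod name or instance index
    message: str


def merge_instance_logs(streams: Iterable[Iterable[LogLine]]) -> Iterator[LogLine]:
    """Merge per-instance streams, each already sorted by time, into one."""
    return heapq.merge(*streams, key=lambda line: line.timestamp_ns)
```

Because `heapq.merge` is lazy, the merged stream can be consumed without loading every pod's logs into memory at once.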

For the implementation of the API we could take a look at the log-cache-release and the log-cache-cf-cli plugin. It will be interesting to check and decide whether a simple API facade on top of the k8s API would be enough, or whether we need a central cache component which will collect everything and serve the data with the Log Cache API.

Update: Log Cache uses Syslog to ingest logs and metrics, but in k8s Syslog is barely used, so if we need short-term storage to serve the Log Cache API, we could take one of the off-the-shelf observability backends and put an API facade in front of it. It is suggested that Log Cache stores the data for at least 15 minutes.
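
To make the retention idea concrete, here is a minimal in-memory Python sketch that keeps entries only for a fixed window. It is a toy illustration and not how Log Cache is implemented; the class name `RecentEntries` is hypothetical, and entries are assumed to arrive in time order.

```python
from collections import deque
from typing import Deque, List, Tuple

RETENTION_SECONDS = 15 * 60  # the 15-minute minimum mentioned above


class RecentEntries:
    """Keep entries only for a fixed retention window (hypothetical sketch)."""

    def __init__(self, retention_seconds: float = RETENTION_SECONDS) -> None:
        self._retention = retention_seconds
        self._entries: Deque[Tuple[float, str]] = deque()

    def add(self, timestamp: float, entry: str) -> None:
        """Append an entry; callers are assumed to add in time order."""
        self._entries.append((timestamp, entry))
        self._prune(now=timestamp)

    def read(self, now: float) -> List[str]:
        """Return the entries that are still inside the retention window."""
        self._prune(now)
        return [entry for _, entry in self._entries]

    def _prune(self, now: float) -> None:
        # Drop entries from the front while they are older than the window.
        cutoff = now - self._retention
        while self._entries and self._entries[0][0] < cutoff:
            self._entries.popleft()
```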

Btw, I'm one of the maintainers of Loggregator, CF's logging and metrics stack, and I want to help ;)

georgethebeatle · 2025-01-23T15:06:20Z

@chombium thank you for your input! All your observations are correct, however we are thinking about a very minimal "batteries-included" implementation of the log-cache api to get the most basic behaviour. This of course leaves the door open for anyone to bring their own log cache that will have to do all the things you talk about in your comment.

As of today Korifi's API already provides a minimal GET /api/v1/read implementation that can only read container logs for an app in order to support the needs of the cf push and cf logs commands.

The problematic part is the fetching of app metrics. Towards the end of the cf push operation the cli starts querying /v3/processes/<process-guid>/stats. Korifi implements this endpoint by directly querying the metrics server custom resources. This implementation makes it impossible to bring your own log cache implementation as of today.

Our idea is to push the metrics fetching code behind the /api/v1/read interface. This, by the way, is exactly what the cloud controller does in classic cf.
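
A rough Python sketch of what the consuming side could do once metrics sit behind the read interface: collapse gauge envelopes into the newest value per instance and metric. The envelope shape used here (`timestamp`, `instance_id`, `gauge.metrics.<name>.value`) is an assumption based on the Loggregator envelope JSON and should be checked against the Log Cache API spec; `stats_from_envelopes` is a hypothetical name.

```python
from typing import Any, Dict, List


def stats_from_envelopes(envelopes: List[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Collapse gauge envelopes into the newest value per instance and metric."""
    newest: Dict[str, Dict[str, int]] = {}    # instance -> metric -> timestamp
    stats: Dict[str, Dict[str, float]] = {}   # instance -> metric -> value
    for envelope in envelopes:
        instance = envelope.get("instance_id", "0")
        timestamp = int(envelope.get("timestamp", 0))
        # Non-gauge envelopes (for example logs) have no metrics and are skipped.
        metrics = envelope.get("gauge", {}).get("metrics", {})
        for name, metric in metrics.items():
            if timestamp >= newest.setdefault(instance, {}).get(name, -1):
                newest[instance][name] = timestamp
                stats.setdefault(instance, {})[name] = metric["value"]
    return stats
```

A processes stats handler could then map the per-instance values onto the fields the cli expects, regardless of whether the envelopes came from the built-in implementation or an operator's own log cache.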

TL;DR We do not have the capacity to implement log cache properly, but we want to enable Korifi users to bring their own.

chombium · 2025-01-24T11:20:16Z

Hi @georgethebeatle,

If we implement the Log Cache API facade properly it will be easy for someone to get their own backend and plug it in. I see that the Log Cache API handler for Logs has a defined interface which is good. I guess we should do something for logs as well.

The problematic part is the fetching of app metrics. Towards the end of the cf push operation the cli starts querying /v3/processes/<process-guid>/stats. Korifi implements this endpoint by directly querying the metrics server custom resources. This implementation makes it impossible to bring your own log cache implementation as of today.

This is indeed a problem. The Metrics Server README states explicitly that it should not be used for forwarding metrics to monitoring solutions, and Log Cache is one API for that.

Metrics Server is meant only for autoscaling purposes. For example, don't use it to forward metrics to monitoring solutions, or as a source of monitoring solution metrics. In such cases please collect metrics from Kubelet /metrics/resource endpoint directly.

I guess we shouldn't use the metrics server to get the app metrics. It's sort of a gray zone for our use case, as we have a custom controller which monitors the cf apps, but it feels wrong to me.

I would propose checking what others are doing to monitor workloads running on k8s before implementing something. On the other hand, we could also check if there is already something implemented in the OpenTelemetry Collector and its receivers for collecting resource consumption metrics from the workloads running in the cluster. I'll take a look at this.

danail-branekov · 2025-01-24T11:29:26Z

Hi @chombium

If we implement the Log Cache API facade properly it will be easy for someone to get their own backend

The log cache REST API IS the facade. The way we envision it is that we would have a helm value where users would be able to specify where (the url) their real, fancy, open-telemetry-based, whatever logcache implementation is. As @georgethebeatle said, we do not have the capacity to implement logcache properly. We also do not want to support a pluggable logcache implementation for k8s - that is a completely unrelated project.

Furthermore, Korifi is a CF API implementation, therefore logcache should not be implemented at all in the first place. We have this very basic and naive "batteries-included" implementation to only facilitate cases where users just want to play with Korifi (and do not want to install additional stuff), or are happy to accept abusing the metrics server.

chombium · 2025-01-24T12:46:31Z

Hi @danail-branekov,

I completely agree with what you've written. We should add the basic functionality so that cf push and cf logs work decently, and we can think about and discuss production-ready things later.

georgethebeatle · 2025-01-24T14:56:09Z

Closing in favour of the sub-issues

korifi-bot added this to Korifi - Backlog Dec 17, 2024

github-project-automation bot moved this to 🧊 Icebox in Korifi - Backlog Dec 17, 2024

georgethebeatle added the Basic System Observability label Dec 17, 2024

chombium mentioned this issue Dec 18, 2024

[Feature]: As a Korifi Operator I want to be able to understand the health of my Korifi System #3665

Open

georgethebeatle closed this as completed Jan 24, 2025

github-project-automation bot moved this from 🧊 Icebox to ✅ Done in Korifi - Backlog Jan 24, 2025

github-actions bot added this to the release-candidate milestone Jan 24, 2025
