forum Healthcheck isn't reliable #2558
Labels
bug
Something isn't working
product:infrastructure
Issues related to application and operations infrastructure
project:open-edx
Expected Behavior
If forum can't talk to its mongodb or opensearch backends, the app should crash / stop outright. It should not enter a state where the ASG / LB healthcheck passes but the app itself isn't working.
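To make the expected behavior concrete, here is a minimal sketch of a fail-fast check. It is not forum's actual startup code: the function names, the TCP-connect probe, and the poll interval are assumptions for illustration. The 600 second deadline mirrors the 10 minutes mentioned in this issue. A TCP connect only shows that the port is open, so a real check would probably want to issue an actual query against each backend.

```python
import socket
import time


def backend_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def exit_if_backends_unreachable(
    backends,
    deadline_seconds=600.0,  # assumption: matches the 10 minutes described in this issue
    poll_seconds=5.0,        # assumption: arbitrary poll interval
    probe=backend_reachable,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Poll every backend; return once all are up, exit non-zero after the deadline.

    `backends` maps a name to a (host, port) tuple.
    """
    start = clock()
    while True:
        down = [
            name for name, (host, port) in backends.items()
            if not probe(host, port)
        ]
        if not down:
            return
        if clock() - start >= deadline_seconds:
            # SystemExit with a string prints it to stderr and exits with status 1,
            # so the container stops instead of lingering in a half-alive state.
            raise SystemExit(
                f"backends unreachable after {deadline_seconds}s: {', '.join(down)}"
            )
        sleep(poll_seconds)
```

With something like this in the entrypoint, the process exits and the orchestrator / ASG can replace it, rather than leaving a container that still passes the LB check.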
Current Behavior
If forum can't find its mongodb or opensearch instances for 10 minutes, it stops looking for them and enters a catatonic state: it is still 'running' well enough for the LB healthchecks to pass, but it isn't actually working because it won't answer any requests, and the container is possibly stopped / not listening.
Possible Solution
Put traefik in front of the container to create a healthcheck endpoint that works?
Figure out the behavior of forum and adjust the healthcheck status matcher appropriately.
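As a rough illustration of what a healthcheck endpoint "that works" could look like, here is a sketch of a standalone check that returns 200 only when both backends are reachable and 503 otherwise, so the LB status matcher can simply require 200. This is not part of forum or traefik. The hostnames, the `/healthz` path, and port 8080 are hypothetical; 27017 and 9200 are the default MongoDB and OpenSearch ports. The probe is a plain TCP connect, which is a weak signal compared to a real query.

```python
import json
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical backend addresses; replace with the real mongodb / opensearch hosts.
BACKENDS = {
    "mongodb": ("mongodb.internal", 27017),
    "opensearch": ("opensearch.internal", 9200),
}


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def health_status(backends=BACKENDS, probe=tcp_probe):
    """Return (http_status, per-backend results); 503 if any backend is down."""
    checks = {name: probe(host, port) for name, (host, port) in backends.items()}
    code = 200 if all(checks.values()) else 503
    return code, checks


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":  # hypothetical path
            self.send_error(404)
            return
        code, checks = health_status()
        body = json.dumps(checks).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Serve the healthcheck until interrupted (port is an assumption)."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling `serve()` from a sidecar or small wrapper would expose `/healthz`. The point of the design is that the LB check reflects backend connectivity instead of only whether a process is listening.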
Additional Details
Discussion starts here and runs until about 4pm that day: https://mitodl.slack.com/archives/C02QLTAE05S/p1721329113019089