
[Relay White Noise Error] Enhance JSON RPC Relay Stability and Error Resilience [NEEDS MONITORING AND GLOBAL MAPPING ReView] #3441

Open
quiet-node opened this issue Feb 3, 2025 · 0 comments
Labels
enhancement New feature or request Epic

Comments

@quiet-node
Member

quiet-node commented Feb 3, 2025

Overview

The JSON RPC Relay service experiences intermittent errors due to issues from remote endpoints, primarily the Mirror Node. These "white noise errors" result in various failure responses, leading to degraded reliability. The goal of this epic is to analyze these errors, improve handling mechanisms, and enhance the overall stability of the Relay service.

Key Findings

a. eth_call

  • Frequent eth_call Failures Due to Mirror Node 5xx Errors
    Many eth_call requests fail with a Relay 500 error caused by HTTP 5xx errors from the Mirror Node. The most common codes are 500, 502, 503, and 504, with 502 accounting for almost 99% of all 5xx responses.

  • Invalid Data Causing 501 Errors
    Some eth_call requests return 501 Not Implemented from the Mirror Node due to invalid hex string parameters (e.g., 0x01ffc9a7d9b67a26...). The Relay then retries with the Consensus Node, which also rejects the request, ultimately leading to a 500 error in the Relay. Example txRequestId: 443ca4b5-5a19-459e-aa4e-59ee4420a49f.

  • Incorrect Mapping of 429 Errors
    eth_call requests receiving 429 Too Many Requests from the Mirror Node are currently mapped to 409 Conflict in the Relay. We need to evaluate whether the Relay should return 429 instead of 409 for better alignment with rate-limiting semantics.
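The issue does not say exactly what makes the example parameter invalid (the value is elided above), so the following is only a minimal TypeScript sketch of earlier validation. It assumes that a well-formed `data` value is `0x` followed by an even number of hex digits. `validateCallData` and `checkCallData` are hypothetical names, not part of the Relay's code; -32602 is the JSON-RPC 2.0 "Invalid params" error code.

```typescript
// Hypothetical pre-flight check for the `data` field of an eth_call request.
// Assumption: valid data is "0x" followed by an even number of hex digits.
const HEX_DATA = /^0x([0-9a-fA-F]{2})*$/;

type ValidationResult = { ok: true } | { ok: false; reason: string };

interface JsonRpcError {
  code: number;
  message: string;
}

function validateCallData(data: unknown): ValidationResult {
  if (data === undefined || data === null) {
    return { ok: true }; // a call without input data is allowed
  }
  if (typeof data !== "string") {
    return { ok: false, reason: "data must be a hex string" };
  }
  if (!HEX_DATA.test(data)) {
    return { ok: false, reason: "data must be 0x-prefixed, even-length hex" };
  }
  return { ok: true };
}

// Returns a JSON-RPC "Invalid params" error, or null if the data looks valid.
function checkCallData(data: unknown): JsonRpcError | null {
  const result = validateCallData(data);
  return result.ok ? null : { code: -32602, message: `Invalid params: ${result.reason}` };
}
```

If the Mirror Node's rejection matches a rule like this one, failing fast in the Relay would avoid both the Mirror Node 501 and the Consensus Node fallback that currently ends in a 500.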

b. eth_getBalance

  • Frequent eth_getBalance Failures Due to Mirror Node 504 Errors
    Many eth_getBalance requests fail with a 504 Gateway Timeout from the Mirror Node, meaning the request did not complete within the allowed time. This suggests performance issues or bottlenecks in how the Mirror Node handles balance queries. Further investigation is needed to determine whether these failures are transient or point to a deeper infrastructure limitation.

c. eth_getBlockByNumber

  • Frequent eth_getBlockByNumber Failures Due to Mirror Node 504 Errors
    As with eth_getBalance, many eth_getBlockByNumber requests fail with a 504 Gateway Timeout from the Mirror Node, pointing to delays or timeouts when the Mirror Node processes block data. The root cause needs investigation to determine whether these timeouts stem from high load or from other performance constraints in the node infrastructure.

d. eth_getLogs

  • Frequent eth_getLogs Failures Due to Mirror Node 502 Errors
    Many eth_getLogs requests fail with a 502 Bad Gateway from the Mirror Node, indicating that it cannot reliably process and return log data. The high rate of 502 errors suggests upstream failures or instability when the Mirror Node handles log queries. Further investigation is required to determine the root cause and possible mitigations.
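Findings b through d, and the 99% figure for eth_call, all depend on knowing which upstream status codes each method receives and how often. Below is a minimal sketch of a per-method tally, assuming every failed Mirror Node call is reported to it. `UpstreamErrorTally` is a hypothetical name; in practice these counts would more likely be exported through whatever metrics system the Relay already uses.

```typescript
// Hypothetical in-memory tally of Mirror Node status codes per JSON-RPC method.
// Assumption: each failed upstream call is reported via record(method, status).
class UpstreamErrorTally {
  private counts = new Map<string, Map<number, number>>();

  record(method: string, status: number): void {
    let byStatus = this.counts.get(method);
    if (!byStatus) {
      byStatus = new Map<number, number>();
      this.counts.set(method, byStatus);
    }
    byStatus.set(status, (byStatus.get(status) ?? 0) + 1);
  }

  count(method: string, status: number): number {
    return this.counts.get(method)?.get(status) ?? 0;
  }

  // Fraction of all 5xx responses for `method` that carried `status` (0 if no 5xx seen).
  shareOf5xx(method: string, status: number): number {
    const byStatus = this.counts.get(method);
    if (!byStatus) return 0;
    let total5xx = 0;
    byStatus.forEach((n, code) => {
      if (code >= 500 && code <= 599) total5xx += n;
    });
    return total5xx === 0 ? 0 : this.count(method, status) / total5xx;
  }
}
```

A cumulative tally only shows the overall mix; telling transient spikes from a persistent limit would additionally require bucketing these counts by time window.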

e. Unexpected HTTP 567 Errors

  • Intermittent 567 Errors in the Relay
    Occasionally, the Relay logs show HTTP 567 errors, which are not consistently reproducible. Because 567 is not a standard HTTP status code, its cause is unclear: it may indicate an upstream service issue, misconfigured proxy behavior, or an internal Relay anomaly. Further logging and investigation are needed to determine when and why these errors occur.
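One way to narrow down where the 567s originate is to record every status code outside an expected set, together with the request id and the layer that produced it. The sketch below assumes an illustrative expected set that is not taken from the Relay; `detectStatusAnomaly` and its types are hypothetical.

```typescript
// Hypothetical anomaly detector for upstream or Relay status codes.
// Assumption: this expected set is illustrative, not the Relay's actual list.
const EXPECTED_STATUSES = new Set<number>([200, 400, 404, 429, 500, 501, 502, 503, 504]);

type StatusSource = "mirror-node" | "consensus-node" | "relay";

interface StatusContext {
  requestId: string; // request id, e.g. the txRequestId seen in logs
  method: string; // JSON-RPC method, e.g. "eth_call"
  source: StatusSource; // which layer produced the status code
}

interface StatusAnomaly extends StatusContext {
  status: number;
  observedAt: string; // ISO-8601 timestamp
}

// Returns a structured record for unexpected codes (such as 567), or null otherwise.
function detectStatusAnomaly(status: number, ctx: StatusContext): StatusAnomaly | null {
  if (EXPECTED_STATUSES.has(status)) return null;
  return { ...ctx, status, observedAt: new Date().toISOString() };
}

// Example: log a structured record so 567s can later be grouped by source and method.
const anomaly = detectStatusAnomaly(567, { requestId: "example-id", method: "eth_call", source: "mirror-node" });
if (anomaly !== null) {
  console.warn(JSON.stringify(anomaly));
}
```

Recording `source` separately is the point of the design: it distinguishes a 567 relayed from a remote endpoint from one generated inside the Relay itself.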

Core Issues

The key findings highlight four main issues affecting the Relay’s error handling and stability:

  1. Inconsistent Error Code Mapping for 5xx Failures

    • Mirror Node errors (500, 502, 503, 504) are currently mapped to 500 in the Relay, losing specific failure context. A better mapping mechanism is needed to provide more precise error feedback to end clients.
  2. Invalid Data Causing 501 Errors

    • Some eth_call requests return 501 Not Implemented from the Mirror Node due to invalid hex string parameters. The Relay retries with the Consensus Node, which also fails, ultimately leading to a 500 error. Improved validation is needed to detect invalid requests earlier.
  3. Incorrect Mapping of 429 Errors

    • eth_call requests receiving 429 Too Many Requests from the Mirror Node are incorrectly mapped to 409 Conflict in the Relay. This needs correction to ensure proper rate-limiting responses.
  4. Intermittent HTTP 567 Errors

    • The Relay occasionally logs HTTP 567 errors, but their cause is unclear. Further investigation is required to determine their origin and potential mitigation.
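Below is a minimal sketch of the more precise mapping called for in issues 1 and 3. It assumes the Relay keeps the Mirror Node's 5xx code instead of collapsing it to 500 and reports rate limiting as 429 instead of 409. The table and `mapMirrorNodeStatus` are hypothetical, and the reason strings are illustrative.

```typescript
// Hypothetical mapping from a Mirror Node HTTP status to the Relay's response status.
// Assumption: upstream 5xx codes are preserved and 429 stays 429 (not 409).
interface RelayFailure {
  httpStatus: number;
  reason: string;
}

const MIRROR_NODE_STATUS_MAP: Record<number, RelayFailure> = {
  429: { httpStatus: 429, reason: "Mirror Node rate limit exceeded" },
  500: { httpStatus: 500, reason: "Mirror Node internal error" },
  502: { httpStatus: 502, reason: "Mirror Node bad gateway" },
  503: { httpStatus: 503, reason: "Mirror Node unavailable" },
  504: { httpStatus: 504, reason: "Mirror Node timed out" },
};

function mapMirrorNodeStatus(status: number): RelayFailure {
  // Codes not in the table fall back to today's behavior: a generic 500.
  return MIRROR_NODE_STATUS_MAP[status] ?? { httpStatus: 500, reason: `Unexpected Mirror Node status ${status}` };
}
```

Because unlisted codes still fall back to 500, an unexpected value such as 567 keeps the current behavior while the mapped codes give clients a more specific failure reason.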

Objectives

This epic aims to enhance the Relay’s stability and resilience by addressing the core issues identified above. The main objectives include:

  • Implementing better error code mapping to ensure that Mirror Node failures provide accurate failure reasons to end clients.
  • Improving request validation to prevent invalid eth_call parameters from propagating and causing unnecessary retries.
  • Correcting the mapping of rate-limiting errors to align with HTTP semantics and improve client handling of 429 responses.
  • Investigating and mitigating the cause of HTTP 567 errors to eliminate unexplained anomalies in the Relay.

Alternatives

No response

@quiet-node quiet-node added enhancement New feature or request Epic labels Feb 3, 2025
@quiet-node quiet-node modified the milestones: 0.66.0, 0.67.0 Feb 3, 2025
@Ferparishuertas Ferparishuertas changed the title [Relay White Noise Error] Enhance JSON RPC Relay Stability and Error Resilience [Relay White Noise Error] Enhance JSON RPC Relay Stability and Error Resilience [NEEDS MONITORING AND GLOBAL MAPPING ReView] Feb 4, 2025
Projects
Status: Backlog
Development

No branches or pull requests
