The JSON RPC Relay service experiences intermittent errors due to issues from remote endpoints, primarily the Mirror Node. These "white noise errors" result in various failure responses, leading to degraded reliability. The goal of this epic is to analyze these errors, improve handling mechanisms, and enhance the overall stability of the Relay service.
Key Findings
a. eth_call
Frequent eth_call Failures Due to Mirror Node 5xx Errors
Many eth_call requests are rejected with a 500 Relay error caused by Mirror Node HTTP 5xx series errors. The most common error codes are 500, 502, 503, and 504, with 502 accounting for almost 99% of all 5xx responses.
Invalid Data Causing 501 Errors
Some eth_call requests return 501 Not Implemented from the Mirror Node due to invalid hex string parameters (e.g., 0x01ffc9a7d9b67a26...). The Relay then retries with the Consensus Node, which also rejects the request, ultimately leading to a 500 error in the Relay. Example txRequestId: 443ca4b5-5a19-459e-aa4e-59ee4420a49f.
Incorrect Mapping of 429 Errors
eth_call requests receiving 429 Too Many Requests from the Mirror Node are currently mapped to 409 Conflict in the Relay. We need to evaluate whether the Relay should return 429 instead of 409 for better alignment with rate-limiting semantics.
b. eth_getBalance
Frequent eth_getBalance Failures Due to Mirror Node 504 Errors
Many eth_getBalance requests fail with a 504 Gateway Timeout from the Mirror Node, indicating that the request is not completing within the expected timeframe. This suggests potential performance issues or bottlenecks in the Mirror Node's handling of balance queries. Further investigation is needed to determine if these failures are transient or indicative of a deeper infrastructure limitation.
c. eth_getBlockByNumber
Frequent eth_getBlockByNumber Failures Due to Mirror Node 504 Errors
Similar to eth_getBalance, many eth_getBlockByNumber requests are failing with a 504 Gateway Timeout error from the Mirror Node. This points to possible delays or timeouts in processing block data at the Mirror Node level. Investigating the root cause of these timeouts is crucial to understand if the issue is related to high load or other performance constraints within the node infrastructure.
d. eth_getLogs
Frequent eth_getLogs Failures Due to Mirror Node 502 Errors
Many eth_getLogs requests are failing with a 502 Bad Gateway error from the Mirror Node. This suggests an issue with the Mirror Node's ability to process and return log data reliably. The high occurrence of 502 errors indicates that the node may be experiencing upstream failures or instability when handling log queries. Further investigation is required to determine the root cause and potential mitigations.
e. Unexpected HTTP 567 Errors
Intermittent 567 Errors in the Relay
Occasionally, the Relay logs show HTTP 567 errors, but these are not consistently reproducible. The cause is unclear, since 567 is not a registered HTTP status code. It may indicate an upstream service issue, misconfigured proxy behavior, or an internal Relay anomaly. Further logging and investigation are needed to determine when and why these errors occur.
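One way to support that investigation is to flag any 5xx status that is not in the registered set and record the request context alongside it. The sketch below is only an illustration: `isUnregistered5xx`, `buildStatusLogRecord`, and the record fields are hypothetical names, and the list of registered codes is hand-maintained and should be checked against the IANA registry.

```typescript
// Hypothetical sketch, not the Relay's actual code.
// Hand-maintained list of registered 5xx codes (assumption: verify against the IANA registry).
const REGISTERED_5XX: number[] = [500, 501, 502, 503, 504, 505, 506, 507, 508, 510, 511];

// True only for codes in the 5xx range that are missing from the list above.
function isUnregistered5xx(status: number): boolean {
  return status >= 500 && status <= 599 && REGISTERED_5XX.indexOf(status) === -1;
}

type StatusLogRecord = {
  status: number;
  method: string;      // JSON-RPC method, e.g. "eth_call"
  requestId: string;   // request identifier used to correlate log lines
  unregistered: boolean;
};

// Builds a structured record so that unusual codes such as 567 can be
// filtered and grouped by method after the fact.
function buildStatusLogRecord(status: number, method: string, requestId: string): StatusLogRecord {
  return { status, method, requestId, unregistered: isUnregistered5xx(status) };
}
```

Filtering logs on `unregistered` would show which methods and requests coincide with the 567 responses, which is the information currently missing.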
Core Issues
The key findings highlight four main issues affecting the Relay’s error handling and stability:
Inconsistent Error Code Mapping for 5xx Failures
Mirror Node errors (500, 502, 503, 504) are currently mapped to 500 in the Relay, losing specific failure context. A better mapping mechanism is needed to provide more precise error feedback to end clients.
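A minimal sketch of what a more specific mapping could look like follows. The policy shown (pass 502, 503, and 504 through unchanged, collapse everything else to 500, and always keep the upstream status for diagnostics) is an assumption for illustration, not an agreed design, and `RelayError` and `mapMirrorNode5xx` are hypothetical names.

```typescript
// Hypothetical sketch, not the Relay's actual code.
type RelayError = {
  httpStatus: number;      // status the Relay would return to the client
  upstreamStatus: number;  // status the Mirror Node returned, kept for diagnostics
  message: string;
};

function mapMirrorNode5xx(upstreamStatus: number): RelayError {
  if (upstreamStatus < 500 || upstreamStatus > 599) {
    throw new RangeError("not a 5xx status: " + upstreamStatus);
  }
  let httpStatus: number;
  switch (upstreamStatus) {
    // Assumed policy: gateway-style failures keep their own status code.
    case 502:
    case 503:
    case 504:
      httpStatus = upstreamStatus;
      break;
    // Everything else, including non-standard codes, falls back to 500.
    default:
      httpStatus = 500;
  }
  return {
    httpStatus,
    upstreamStatus,
    message: "Mirror Node responded with HTTP " + upstreamStatus,
  };
}
```

Keeping `upstreamStatus` in the result means the original failure context survives even when the client-facing status is collapsed to 500.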
Invalid Data Causing 501 Errors
Some eth_call requests return 501 Not Implemented from the Mirror Node due to invalid hex string parameters. The Relay retries with the Consensus Node, which also fails, ultimately leading to a 500 error. Improved validation is needed to detect invalid requests earlier.
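Early validation could be as simple as checking the call data before any upstream request is made. The sketch below assumes that valid data is `0x` followed by an even number of hex digits. That rule, and the names `isValidCallData` and `validateCallData`, are illustrative assumptions, since the issue does not state exactly why the example parameter was rejected. The error code -32602 is the standard JSON-RPC "Invalid params" code.

```typescript
// Hypothetical sketch, not the Relay's actual code.
type JsonRpcError = { code: number; message: string };

// Assumed rule: "0x" followed by zero or more whole bytes (pairs of hex digits).
const HEX_DATA = /^0x(?:[0-9a-fA-F]{2})*$/;

function isValidCallData(data: unknown): data is string {
  return typeof data === "string" && HEX_DATA.test(data);
}

// Returns null when the data is acceptable, otherwise a JSON-RPC error
// that can be sent to the client without contacting the Mirror Node
// or the Consensus Node.
function validateCallData(data: unknown): JsonRpcError | null {
  if (isValidCallData(data)) {
    return null;
  }
  return {
    code: -32602, // JSON-RPC 2.0 "Invalid params"
    message: "Invalid params: data must be a 0x-prefixed, even-length hex string",
  };
}
```

Rejecting the request at this point would avoid both the 501 from the Mirror Node and the subsequent retry against the Consensus Node.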
Incorrect Mapping of 429 Errors
eth_call requests receiving 429 Too Many Requests from the Mirror Node are incorrectly mapped to 409 Conflict in the Relay. This needs correction to ensure proper rate-limiting responses.
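As a rough illustration of the corrected behavior, the sketch below passes a 429 through as a 429 and forwards a `Retry-After` value when one is available. Whether the Mirror Node actually sends `Retry-After` is an assumption, and `UpstreamResponse`, `ClientResponse`, and `mapRateLimited` are hypothetical names.

```typescript
// Hypothetical sketch, not the Relay's actual code.
type UpstreamResponse = { status: number; retryAfter?: string };
type ClientResponse = { status: number; headers: { [name: string]: string } };

// Returns the client-facing response for a rate-limited upstream call,
// or null when the upstream response is not a 429.
function mapRateLimited(upstream: UpstreamResponse): ClientResponse | null {
  if (upstream.status !== 429) {
    return null;
  }
  const headers: { [name: string]: string } = {};
  if (upstream.retryAfter !== undefined) {
    // Forward the hint so clients can back off for the suggested interval.
    headers["Retry-After"] = upstream.retryAfter;
  }
  return { status: 429, headers };
}
```

Returning 429 rather than 409 lets standard client libraries recognize the response as rate limiting and apply their usual backoff logic.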
Intermittent HTTP 567 Errors
The Relay occasionally logs HTTP 567 errors, but their cause is unclear. Further investigation is required to determine their origin and potential mitigation.
Objectives
This epic aims to enhance the Relay’s stability and resilience by addressing the core issues identified above. The main objectives include:
Implementing better error code mapping to ensure that Mirror Node failures provide accurate failure reasons to end clients.
Improving request validation to prevent invalid eth_call parameters from propagating and causing unnecessary retries.
Correcting the mapping of rate-limiting errors to align with HTTP semantics and improve client handling of 429 responses.
Investigating and mitigating the cause of HTTP 567 errors to eliminate unexplained anomalies in the Relay.
Alternatives
No response
Ferparishuertas changed the title from "[Relay White Noise Error] Enhance JSON RPC Relay Stability and Error Resilience" to "[Relay White Noise Error] Enhance JSON RPC Relay Stability and Error Resilience [NEEDS MONITORING AND GLOBAL MAPPING ReView]" on Feb 4, 2025.