Optimistic cache may return wrong result once it meets ServFail #436

BusyJay · 2025-02-12T10:39:04Z

Optimistic cache will return stale cache and trigger a background resolve to refresh the cache. Consider following timeline:

receive query -> resolve -> ServFail
receive query -> cache hit and return ServFail
                        |
                       -> resolve -> MXDOMAIN -> skip updating cache

If a query always returns MXDOMAIN generally, and somehow upstream returns a ServFail, it will be cached for 30 seconds and then become stale. However, when background retry sees MXDOMAIN, it will skip refresh cache, so all further request will be always responded by a stale cahched ServFail. Clients may retry ServFail and flood adguardhome with meaningless requests.

A simple fix is don't return any stale ServFail response from cache.

The text was updated successfully, but these errors were encountered:

Close AdguardTeam#436. Signed-off-by: Jay <[email protected]>

EugeneOne1 · 2025-02-12T15:05:17Z

Hello, @BusyJay, and thanks for the report. Could you please share some additional information about the failed query? We're interested in the following details:

the upstream responding with NXDOMAIN;
the domain requested;
an example of a dig/nslookup command that reproduces the behaviour.

BusyJay · 2025-02-14T09:48:01Z

The reason why I suspected this behavior was because I saw a lot of same failed query logs in a short time (50+ per second, 1m per day) that had the same response code ServFail and all of them had green check that indicates response are from cache.

But after further analysis, I find my assumption is not what's actually happening, though the assumption is still a valid bug. The real reason why logs are identical may be due to upstream always return ServFail. And because of optimistic refresh cache in the background, so all of those response seems from cache. And the reason why so many logs in a short time may due to no backoff in client application and adguardhome caches ServFail response.

The queries that will always return ServFail are querying https record of cmcos.albatross.10086.cn.

The topology of my network is devices -> linksys router -> dnsmasq -> adguardhome. After a lot of failed requests, dnsmasq stops accepting any dns queries for a while. I'm not sure whether the bad retries come from dnsmasq, linksys router or one of the devices. For now, I will add some sleep when response code is ServFail to my fork of adguardhome to get around this problem.

EugeneOne1 · 2025-02-14T17:32:40Z

@BusyJay, hello again. As for the original problem, we haven't been able to reproduce it. The only reason why new NXDOMAIN responses may not rewrite the cached ones is the lack of SOA records defining their TTL. Such responses aren't considered cacheable at all, so current behaviour is actually a cache poisoning mitigation.

FIY, SERVFAIL responses are cached for no more than 30 seconds, as per RFC 2308 Section 7.1, and we also have an issue about stale optimistic cache records (#318), so please consider upvoting it with "👍".

We'll close the issue for now, as it doesn't appear to be related to dnsproxy. Please consider opening another issue if the real problem appears to be within dnsproxy, and please add some steps for reproducing it to the report.

BusyJay added a commit to BusyJay/dnsproxy that referenced this issue Feb 12, 2025

avoid returning stale ServFail response

8fb6266

Close AdguardTeam#436. Signed-off-by: Jay <[email protected]>

BusyJay mentioned this issue Feb 12, 2025

avoid returning stale ServFail response #437

Open

ainar-g assigned EugeneOne1 Feb 12, 2025

ainar-g added the needs investigation Should be reproduced and analysed label Feb 12, 2025

EugeneOne1 closed this as not planned Won't fix, can't repro, duplicate, stale Feb 14, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimistic cache may return wrong result once it meets ServFail #436

Optimistic cache may return wrong result once it meets ServFail #436

BusyJay commented Feb 12, 2025

EugeneOne1 commented Feb 12, 2025

BusyJay commented Feb 14, 2025

EugeneOne1 commented Feb 14, 2025

Optimistic cache may return wrong result once it meets ServFail #436

Optimistic cache may return wrong result once it meets ServFail #436

Comments

BusyJay commented Feb 12, 2025

EugeneOne1 commented Feb 12, 2025

BusyJay commented Feb 14, 2025

EugeneOne1 commented Feb 14, 2025