-
Notifications
You must be signed in to change notification settings - Fork 258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimistic cache may return wrong result once it meets ServFail #436
Comments
Close AdguardTeam#436. Signed-off-by: Jay <[email protected]>
Hello, @BusyJay, and thanks for the report. Could you please share some additional information about the failed query? We're interested in the following details:
|
The reason why I suspected this behavior was because I saw a lot of same failed query logs in a short time (50+ per second, 1m per day) that had the same response code ServFail and all of them had green check that indicates response are from cache. But after further analysis, I find my assumption is not what's actually happening, though the assumption is still a valid bug. The real reason why logs are identical may be due to upstream always return ServFail. And because of optimistic refresh cache in the background, so all of those response seems from cache. And the reason why so many logs in a short time may due to no backoff in client application and adguardhome caches ServFail response. The queries that will always return ServFail are querying https record of cmcos.albatross.10086.cn. The topology of my network is devices -> linksys router -> dnsmasq -> adguardhome. After a lot of failed requests, dnsmasq stops accepting any dns queries for a while. I'm not sure whether the bad retries come from dnsmasq, linksys router or one of the devices. For now, I will add some sleep when response code is ServFail to my fork of adguardhome to get around this problem. |
@BusyJay, hello again. As for the original problem, we haven't been able to reproduce it. The only reason why new NXDOMAIN responses may not rewrite the cached ones is the lack of SOA records defining their TTL. Such responses aren't considered cacheable at all, so current behaviour is actually a cache poisoning mitigation. FIY, SERVFAIL responses are cached for no more than 30 seconds, as per RFC 2308 Section 7.1, and we also have an issue about stale optimistic cache records (#318), so please consider upvoting it with "👍". We'll close the issue for now, as it doesn't appear to be related to dnsproxy. Please consider opening another issue if the real problem appears to be within dnsproxy, and please add some steps for reproducing it to the report. |
Optimistic cache will return stale cache and trigger a background resolve to refresh the cache. Consider following timeline:
If a query always returns
MXDOMAIN
generally, and somehow upstream returns a ServFail, it will be cached for 30 seconds and then become stale. However, when background retry seesMXDOMAIN
, it will skip refresh cache, so all further request will be always responded by a stale cahchedServFail
. Clients may retryServFail
and flood adguardhome with meaningless requests.A simple fix is don't return any stale
ServFail
response from cache.The text was updated successfully, but these errors were encountered: