PhenoAge Calculation Bug Disclosure: Missing U-Shaped Curves for Biomarkers #136
Comments
I'm stuck on this issue. No solution really seems to be worth it.
I guess sticking with the PhenoAge model, flawed as it is, might be the least bad solution?
I'm gonna stop the results from improving beyond the best possible physiological values. Here I'm trying to figure out what those should be. A quick and dirty AI questionnaire resulted in the following table:
At this time o1 and o3-mini-high are considered the most advanced models. Conveniently, they are in perfect agreement on the results, so I'll take their numbers as caps.
Or maybe I should go with reference ranges instead of best possible values? Reference ranges are less contentious than ideal values. That would give less optimal results, but they are less debatable and stay more on the safe side.
Or maybe I can have the best of both worlds: I can go with ideal ranges instead of typical reference ranges?
Ok, this is what's happening. For age and CRP there are no known lower caps, so I'm not capping them. For the rest, based on o1 and o3 again, falling on the safer side when they disagree: Albumin: Upper cap: 50 g/L
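A minimal Python sketch of that capping step, assuming the inputs are already in the units the model expects. The `CAPS` table and the `apply_caps` helper are hypothetical names, not code from the repository. Only the albumin upper cap of 50 g/L comes from this comment; the other biomarkers are left out because their caps are not listed here.

```python
from typing import Dict, Optional, Tuple

# Hypothetical cap table: biomarker -> (lower cap, upper cap); None means "no cap".
# Only the albumin value is taken from the comment above.
CAPS: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "albumin_g_per_l": (None, 50.0),
    "age_years": (None, None),  # no known lower cap, so left uncapped
    "crp": (None, None),        # no known lower cap, so left uncapped
}


def apply_caps(values: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of `values` with each biomarker limited to its caps."""
    capped = {}
    for name, value in values.items():
        lower, upper = CAPS.get(name, (None, None))
        if lower is not None:
            value = max(value, lower)
        if upper is not None:
            value = min(value, upper)
        capped[name] = value
    return capped
```

With this table, an albumin reading of 100 g/L is scored as 50 g/L, so the result stops improving at the cap, while age and CRP pass through unchanged.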
Together with Dave Pascoe, we’ve discovered that the DNAm PhenoAge calculation method in the Longevity World Cup repository, as well as all existing online PhenoAge calculators, seems to have a critical oversight. Specifically, the PhenoAge biomarkers (albumin, creatinine, glucose, C-reactive protein, lymphocyte percentage, mean cell volume, red cell distribution width, white blood cell count, and alkaline phosphatase) in reality follow a U-shaped curve (values that are too high or too low are both associated with increased mortality), but the current linear regression model rewards pushing these values to unrealistic extremes.
For instance, the algorithm will yield impossibly large negative ages if albumin is inflated to 100 g/dL or if glucose is taken down to extremely low, unhealthy levels. These “optimizations” don’t reflect better health but rather an artifact of the underlying linear model.
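To make the artifact concrete, here is a rough Python sketch of the calculation. It assumes the commonly cited coefficients and units of the published PhenoAge model (albumin in g/L, creatinine in µmol/L, glucose in mmol/L, CRP in mg/dL). None of these numbers appear in this issue, so they should be checked against the repository's implementation. The function name and the baseline person are hypothetical. The intermediate mortality-score step is folded in algebraically, which shows that the final age is linear in every input (in ln CRP for CRP).

```python
import math

# Assumed coefficients (commonly cited for the published model; verify before use).
INTERCEPT = -19.907
COEF = {
    "albumin_g_l": -0.0336,
    "creatinine_umol_l": 0.0095,
    "glucose_mmol_l": 0.1953,
    "crp_mg_dl": 0.0954,  # applied to ln(CRP)
    "lymphocyte_pct": -0.0120,
    "mcv_fl": 0.0268,
    "rdw_pct": 0.3306,
    "alp_u_l": 0.00188,
    "wbc_1000_per_ul": 0.0554,
    "age_years": 0.0804,
}


def pheno_age(m):
    """PhenoAge sketch; `m` maps the keys of COEF to measured values."""
    xb = INTERCEPT
    for key, coef in COEF.items():
        value = math.log(m[key]) if key == "crp_mg_dl" else m[key]
        xb += coef * value
    # The usual form is M = 1 - exp(-1.51714 * exp(xb) / 0.0076927), then
    # 141.50225 + ln(-0.00553 * ln(1 - M)) / 0.090165. Substituting M gives
    # the equivalent expression below, which is linear in xb.
    const = math.log(0.00553 * 1.51714 / 0.0076927)
    return 141.50225 + (const + xb) / 0.090165


# Hypothetical baseline person, for illustration only.
baseline = {
    "albumin_g_l": 45.0, "creatinine_umol_l": 80.0, "glucose_mmol_l": 5.0,
    "crp_mg_dl": 0.1, "lymphocyte_pct": 30.0, "mcv_fl": 90.0,
    "rdw_pct": 13.0, "alp_u_l": 70.0, "wbc_1000_per_ul": 6.0,
    "age_years": 40.0,
}
extreme = dict(baseline, albumin_g_l=1000.0)  # 100 g/dL expressed in g/L

print(pheno_age(baseline))
print(pheno_age(extreme))  # far below zero under these assumptions
```

Under these assumed coefficients, every extra g/L of albumin lowers the result by the same fixed amount (0.0336 / 0.090165 years), with no point at which more albumin stops counting as healthier.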
Below is a snippet from our discussion:
We see similar distortions with albumin, C-reactive protein, and likely other markers. Simply capping values at a plausible upper or lower limit won’t solve it either, because some biomarkers legitimately have a U-shape (like glucose), and limiting them linearly doesn't capture the genuine mortality curve.
Question:
How might we fix or improve the DNAm PhenoAge calculations to respect actual physiological realities (e.g., U-shaped relationships) rather than purely linear extrapolations?
Some ideas to consider:
Apply min() and max() constraints for each biomarker so extreme values won’t skew results. For example, cap albumin at 5.0 g/dL and glucose at 250 mg/dL to prevent nonsensical outputs.
Let me know your thoughts and any other potential strategies!
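The min()/max() idea could look roughly like the sketch below, using the two example caps from the text (albumin 5.0 g/dL, glucose 250 mg/dL). The `clamp` and `cap_inputs` helpers are hypothetical names, and the absence of lower bounds is an assumption made only to keep the example small.

```python
# Example upper caps taken from the issue text.
ALBUMIN_MAX_G_DL = 5.0
GLUCOSE_MAX_MG_DL = 250.0


def clamp(value, lower=float("-inf"), upper=float("inf")):
    """Limit `value` to the closed interval [lower, upper]."""
    return max(lower, min(value, upper))


def cap_inputs(albumin_g_dl, glucose_mg_dl):
    """Apply the example upper caps before the values reach the model."""
    return (
        clamp(albumin_g_dl, upper=ALBUMIN_MAX_G_DL),
        clamp(glucose_mg_dl, upper=GLUCOSE_MAX_MG_DL),
    )


print(cap_inputs(100.0, 90.0))  # (5.0, 90.0): albumin is clamped
print(cap_inputs(4.5, 20.0))    # (4.5, 20.0): very low glucose passes through
```

The second call shows the limitation raised earlier in this issue: an upper cap alone does nothing about dangerously low glucose. Even a lower cap would only flatten the curve rather than penalize the value, as a U-shaped relationship would require.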