PhenoAge Calculation Bug Disclosure: Missing U-Shaped Curves for Biomarkers #136

nopara73 · 2025-01-02T07:22:24Z

With Dave Pascoe we’ve discovered that the DNAm PhenoAge calculation method in the Longevity World Cup repository – as well as all existing online PhenoAge calculators – seems to have a critical oversight. Specifically, the PhenoAge biomarkers (albumin, creatinine, glucose, C-reactive protein, lymphocyte percentage, mean cell volume, red cell distribution width, white blood cell count, and alkaline phosphatase) follow a U-shaped curve in reality (values that are either too high or too low are both associated with increased mortality), but the current linear regression model rewards pushing these values to unrealistic extremes.

For instance, the algorithm will yield impossibly large negative ages if albumin is inflated to 100 g/dL or if glucose is taken down to extremely low, unhealthy levels. These “optimizations” don’t reflect better health but rather an artifact of the underlying linear model.
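To make the artifact concrete, here is a minimal sketch of the blood-biomarker PhenoAge formula. The coefficients and constants are the ones commonly reproduced from Levine et al. (2018) in open-source calculators. They are an assumption here, not taken from this repository, and should be verified before use. The function name, dictionary keys, and baseline values are hypothetical. After simplification, the age estimate is a straight line in the weighted sum `xb`, so each marker can be pushed without limit in its "good" direction.

```python
import math

# Assumed coefficients (Levine et al. 2018 as commonly reproduced); verify before relying on them.
COEFFS = {
    "albumin_g_per_l": -0.0336,
    "creatinine_umol_per_l": 0.0095,
    "glucose_mmol_per_l": 0.1953,
    "ln_crp_mg_per_dl": 0.0954,
    "lymphocyte_pct": -0.0120,
    "mcv_fl": 0.0268,
    "rdw_pct": 0.3306,
    "alp_u_per_l": 0.00188,
    "wbc_1000_per_ul": 0.0554,
    "age_years": 0.0804,
}
INTERCEPT = -19.9067


def pheno_age(markers):
    """Return the PhenoAge estimate for a dict of biomarker values."""
    terms = dict(markers)
    # CRP enters the model as a natural log (CRP in mg/dL).
    terms["ln_crp_mg_per_dl"] = math.log(terms.pop("crp_mg_per_dl"))
    xb = INTERCEPT + sum(COEFFS[name] * terms[name] for name in COEFFS)
    # The usual form is M = 1 - exp(-h) with h = 1.51714 * exp(xb) / 0.0076927,
    # then age = 141.50225 + ln(-0.00553 * ln(1 - M)) / 0.09165.
    # Because ln(1 - M) = -h, this collapses to a linear function of xb:
    offset = math.log(0.00553 * 1.51714 / 0.0076927)
    return 141.50225 + (offset + xb) / 0.09165


# Hypothetical, roughly mid-range values for a 40-year-old.
BASELINE = {
    "albumin_g_per_l": 45.0,
    "creatinine_umol_per_l": 80.0,
    "glucose_mmol_per_l": 5.0,
    "crp_mg_per_dl": 0.1,
    "lymphocyte_pct": 30.0,
    "mcv_fl": 90.0,
    "rdw_pct": 13.0,
    "alp_u_per_l": 70.0,
    "wbc_1000_per_ul": 6.0,
    "age_years": 40.0,
}

# Glucose of about 20 mg/dL (1.1 mmol/L) is dangerous, yet scores "younger".
hypoglycemic = {**BASELINE, "glucose_mmol_per_l": 1.1}
# Albumin of 100 g/dL is 1000 g/L, which is physically impossible.
absurd_albumin = {**BASELINE, "albumin_g_per_l": 1000.0}

print(pheno_age(BASELINE), pheno_age(hypoglycemic), pheno_age(absurd_albumin))
```

Under these assumed coefficients, the hypoglycemic profile scores younger than the baseline, and the impossible albumin value drives the result far below zero.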

Below is a snippet from our discussion:

“Glucose, for example, would be near perfect biologically at 80, but PhenoAge will reward much more unhealthy people (e.g., glucose of 60, 40, or even 20) with much younger PhenoAges! That’s a big problem!”
— Dave Pascoe

We see similar distortions with albumin, C-reactive protein, and likely other markers. Simply capping values at a plausible upper or lower limit won’t solve it either, because some biomarkers legitimately have a U-shape (like glucose), and limiting them linearly doesn't capture the genuine mortality curve.

Question:

How might we fix or improve the DNAm PhenoAge calculations to respect actual physiological realities (e.g., U-shaped relationships) rather than purely linear extrapolations?

Some ideas to consider:

Clamp Out-of-Range Values: Implement quick min() and max() constraints for each biomarker so extreme values won’t skew results. For example, cap albumin at 5.0 g/dL and glucose at 250 mg/dL to prevent nonsensical outputs.
Add Simple Penalties for Out-of-Bounds Values: If a biomarker is below or above a known physiologically healthy range, add a penalty to push the result toward typical U-shaped mortality curves. This can be a single conditional block per biomarker.
Use a Two-Point Slope Adjustment: For clearly U-shaped markers, define two slopes: one for the “low range” and one for the “high range.” This is still a quick linear approach but mimics a U-shape without a full-blown polynomial or spline.
Piecewise Functions or Non-Linear Models: Implement piecewise functions or spline-based approaches to model truly U-shaped biomarker relationships.
Physiological Constraints: Set upper and lower limits based on best-known population studies, ensuring that values beyond these bounds do not produce unrealistic results.
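As a rough illustration of the two-slope idea, the sketch below replaces a single linear term with a V-shaped one that is zero at an assumed optimum and grows on either side. The function name, the optimum of 80 mg/dL (borrowed from the glucose remark quoted above), and both slopes are hypothetical placeholders, not fitted values.

```python
def two_slope_term(value, optimum, low_slope, high_slope):
    """V-shaped contribution: zero at the optimum, rising on both sides.

    low_slope applies below the optimum, high_slope above it.
    Both slopes are expected to be non-negative.
    """
    if value < optimum:
        return low_slope * (optimum - value)
    return high_slope * (value - optimum)


# Hypothetical glucose parameters in mg/dL; real slopes would need mortality data.
GLUCOSE_OPTIMUM = 80.0
LOW_SLOPE = 0.02
HIGH_SLOPE = 0.01

for glucose in (20.0, 40.0, 60.0, 80.0, 120.0, 200.0):
    penalty = two_slope_term(glucose, GLUCOSE_OPTIMUM, LOW_SLOPE, HIGH_SLOPE)
    print(glucose, round(penalty, 3))
```

With this shape, a glucose of 20 mg/dL is penalized more than 60 mg/dL instead of being rewarded, which is the behavior the linear term cannot express. Choosing the optimum and the slopes remains the hard, subjective part.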

Let me know your thoughts and any other potential strategies!


nopara73 · 2025-01-15T01:43:09Z

I'm stuck on this issue. No solution really seems to be worth it.

If I go quick and dirty and add a clean cutoff at the end to correct for the obvious problems, it's very subjective: without proper research, what score should people get, and how fast should I worsen it? And it barely even solves the issue; it just handles cases that won't really happen.
If I try to be more sophisticated and, let's say, cut it off at the best possible value, the result could be fairly correct, but the same subjectivity issue arises, and much worse, because now people are expected to fall into these scores, so it'd have real-world consequences.

I guess sticking with the PhenoAge model, flawed as it is, might be the least bad solution?

nopara73 · 2025-02-03T15:50:23Z

I'm gonna stop the improvement of the results at the best possible physiological values. Here I'm attempting to figure out what those should be. A quick and dirty AI questionnaire resulted in the following table:

Model	Albumin (g/L)	Creatinine (µmol/L)	Glucose (mmol/L)	C-Reactive Protein (mg/L)	Lymphocytes (%)	Mean Corpuscular Volume (fL)	Red Cell Distribution Width (%)	Alkaline Phosphatase (U/L)	White Blood Cell Count (1000 cells/μL)
4o	45.0	80	4.8	0.0	40	90	12	70	6.50
o1	40.0	80	5.0	1.0	30	90	13	80	7.00
o3-mini-high	40.0	80	5.0	1.0	30	90	13	80	7.00
deepseek	40.0	80	5.0	3.0	30	90	12	70	7.00
claude	42.5	90	5.0	2.5	30	90	13	75	7.75

At this time o1 and o3-mini-high are considered to be the most advanced models. Coincidentally they are even in perfect agreement regarding the results, which is convenient, so I'll take their numbers as caps.

nopara73 · 2025-02-03T16:55:24Z

Or maybe I should go with reference ranges instead of best possible values? Reference ranges are less contentious than ideal values. That would give less optimal results, but they are less debatable and stay more on the safe side.

nopara73 · 2025-02-03T17:17:29Z

Or maybe I can have the best of both worlds: I can go with ideal ranges instead of typical reference ranges?

nopara73 · 2025-02-03T17:36:32Z

Ok, this is what's happening.

For age and CRP there are no known lower caps, so I'm not capping them.

For the rest based on o1 and o3 again, falling on the safer side when they disagree:

Albumin: Upper cap: 50 g/L
Creatinine: Lower cap: 60 µmol/L
Glucose: Lower cap: 4.0 mmol/L
White Blood Cell Count (WBC): Lower cap: 4.5 × 1000 cells/µL
Lymphocytes: Upper cap: 40 %
Mean Corpuscular Volume (MCV): Lower cap: 85 fL
Red Cell Distribution Width (RDW): Lower cap: 11.5 %
Alkaline Phosphatase (AP): Lower cap: 50 U/L
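A minimal sketch of how the caps listed above could be applied before the values are fed into the PhenoAge formula. The cap values and directions come from the list; the function name and dictionary keys are hypothetical, and this is not the code from the commit referenced in this thread. An "upper cap" means values above the limit are treated as the limit, and a "lower cap" means values below it are treated as the limit, so improvement stops at that point.

```python
# Caps from the list above: (direction, limit). Key names are illustrative.
CAPS = {
    "albumin_g_per_l": ("upper", 50.0),
    "creatinine_umol_per_l": ("lower", 60.0),
    "glucose_mmol_per_l": ("lower", 4.0),
    "wbc_1000_per_ul": ("lower", 4.5),
    "lymphocyte_pct": ("upper", 40.0),
    "mcv_fl": ("lower", 85.0),
    "rdw_pct": ("lower", 11.5),
    "alp_u_per_l": ("lower", 50.0),
}


def apply_caps(markers):
    """Return a copy of markers with each capped biomarker clamped.

    Markers without an entry in CAPS (for example age and CRP) pass through unchanged.
    """
    capped = dict(markers)
    for name, (direction, limit) in CAPS.items():
        if name not in capped:
            continue
        if direction == "upper":
            capped[name] = min(capped[name], limit)
        else:
            capped[name] = max(capped[name], limit)
    return capped


sample = {"albumin_g_per_l": 60.0, "glucose_mmol_per_l": 2.0, "mcv_fl": 90.0, "age_years": 40.0}
print(apply_caps(sample))
```

Note that clamping only stops the score from improving past the cap. It does not penalize unhealthy extremes, so it is a flat floor rather than a true U-shape.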

nopara73 · 2025-02-03T18:00:18Z

97dbdf8

nopara73 mentioned this issue Jan 2, 2025

PhenoAge Calculation Bug Disclosure: Missing U-Shaped Curves for Biomarkers ajsteele/bioage#2

Open
