xWidthAvg: Update character frequency weightings data source #167

Merged
merged 8 commits into from
Feb 15, 2024
Changes from 7 commits
21 changes: 21 additions & 0 deletions .changeset/strong-kangaroos-tease.md
@@ -0,0 +1,21 @@
---
'@capsizecss/metrics': minor
---

xWidthAvg: Update character frequency weightings data source

The character frequency weightings used to calculate the `xWidthAvg` metrics were previously hard-coded internally and were adapted from a [frequency table] on Wikipedia.

We now generate these weightings from the abstracts of [WikiNews] articles.
This makes it possible to add support for languages that use non-Latin [Unicode subsets], e.g. Thai, by adding the relevant abstract and generating the `xWidthAvg` based on the corresponding Unicode subset range.
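
As a rough illustration of the idea above, the sketch below counts how often each character of a Unicode range occurs in a sample text and normalises the counts into weightings. It is not the generation script added in this PR: `buildWeightings`, `isBasicLatin`, and the sample string are hypothetical names and data chosen for the example.

```typescript
// Hypothetical helper, for illustration only (not the script from this PR).
type Weightings = Record<string, number>;

function buildWeightings(
  text: string,
  isInSubset: (codePoint: number) => boolean,
): Weightings {
  const counts = new Map<string, number>();
  let total = 0;

  // Spreading a string yields whole code points rather than UTF-16 code units.
  for (const char of Array.from(text)) {
    const codePoint = char.codePointAt(0);
    if (codePoint === undefined || !isInSubset(codePoint)) continue;
    counts.set(char, (counts.get(char) ?? 0) + 1);
    total += 1;
  }

  // Normalise counts so the weightings of the subset sum to 1.
  const weightings: Weightings = {};
  counts.forEach((count, char) => {
    weightings[char] = count / total;
  });
  return weightings;
}

// Assumed example subset: printable Basic Latin (U+0020 to U+007E).
const isBasicLatin = (codePoint: number): boolean =>
  codePoint >= 0x20 && codePoint <= 0x7e;

const sampleWeightings = buildWeightings('aab b', isBasicLatin);
console.log(sampleWeightings);
```

Supporting another language would then, under the same assumptions, only require a different sample text and a predicate for that language's Unicode range (for Thai, U+0E00 to U+0E7F).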

### Will this change anything for consumers?

Given the updated `xWidthAvg` metrics are very close to the original hard-coded values, we do not foresee any impact on consumers.
Even our CSS snapshot tests were unchanged, and they contain values rounded to 4 decimal places!

The result is either no change or extremely minor changes to the generated fallback font CSS, and this lays the groundwork for supporting additional language subsets in the near future.

[frequency table]: https://en.wikipedia.org/wiki/Letter_frequency#Relative_frequencies_of_letters_in_other_languages
[WikiNews]: https://wikinews.org/
[unicode subsets]: https://www.utf8icons.com/subsets
1 change: 1 addition & 0 deletions .prettierignore
@@ -8,5 +8,6 @@ site/public
output
packages/metrics/scripts/googleFontsApi.json
packages/metrics/entireMetricsCollection/*
packages/unpack/src/weightings.ts
CHANGELOG.md
pnpm-lock.yaml
8 changes: 5 additions & 3 deletions package.json
@@ -10,8 +10,9 @@
"test": "jest",
"format": "prettier --write .",
"lint": "manypkg check && prettier --check . && tsc",
"dev": "pnpm %packages dev",
"build": "pnpm %packages build && pnpm metrics:generate",
"dev": "pnpm %packages dev && pnpm generate",
"build": "pnpm generate && pnpm %packages build && pnpm metrics:generate",
"generate": "pnpm unpack:generate && pnpm metrics:generate",
"copy-readme": "node scripts/copy-readme",
"version": "changeset version && pnpm install --lockfile-only",
"prepare-release": "pnpm copy-readme && pnpm build",
@@ -23,11 +24,12 @@
"site:build": "pnpm %site build",
"site:serve": "pnpm %site serve",
"site:deploy": "pnpm %site run deploy",
"site:deploy-preview": "pnpm metrics:generate && pnpm %site deploy-preview",
"site:deploy-preview": "pnpm generate && pnpm %site deploy-preview",
"metrics:extract-system": "pnpm %metrics extract-system-metrics",
"metrics:generate": "pnpm %metrics generate",
"metrics:clean": "pnpm %metrics clean",
"metrics:download": "pnpm %metrics download",
"unpack:generate": "pnpm --filter=@capsizecss/unpack generate",
"prepare": "pnpm dev && (is-ci || husky install)"
},
"author": {
40 changes: 27 additions & 13 deletions packages/metrics/README.md
@@ -41,20 +41,34 @@ const capsizeStyles = createStyleObject({

The font metrics object returned contains the following properties if available:

| Property | Type | Description |
| ---------- | ------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| familyName | string | The font family name as authored by font creator |
| category | string | The style of the font: serif, sans-serif, monospace, display, or handwriting. |
| capHeight | number | The height of capital letters above the baseline |
| ascent | number | The height of the ascenders above baseline |
| descent | number | The descent of the descenders below baseline |
| lineGap | number | The amount of space included between lines |
| unitsPerEm | number | The size of the font’s internal coordinate grid |
| xHeight | number | The height of the main body of lower case letters above baseline |
| xWidthAvg | number | The average width of lowercase characters.<br/><br/>Currently derived from latin [character frequencies] in English language, falling back to the built in [xAvgCharWidth] from the OS/2 table. |

[character frequencies]: https://en.wikipedia.org/wiki/Letter_frequency#Relative_frequencies_of_letters_in_other_languages
| Property | Type | Description |
| ---------- | ------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| familyName | string | The font family name as authored by font creator |
| category | string | The style of the font: serif, sans-serif, monospace, display, or handwriting. |
| capHeight | number | The height of capital letters above the baseline |
| ascent | number | The height of the ascenders above baseline |
| descent | number | The descent of the descenders below baseline |
| lineGap | number | The amount of space included between lines |
| unitsPerEm | number | The size of the font’s internal coordinate grid |
| xHeight | number | The height of the main body of lower case letters above baseline |
| xWidthAvg | number | The average width of character glyphs in the font. Calculated based on character frequencies in written text ([see below]), falling back to the built in [xAvgCharWidth] from the OS/2 table. |

#### How `xWidthAvg` is calculated

The `xWidthAvg` metric is derived from character frequencies in written language.
The value takes a weighted average of character glyph widths in the font, falling back to the built in [xAvgCharWidth] from the OS/2 table if the glyph width is not available.
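
The weighted average described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `weightedAverageWidth` is a hypothetical function, the glyph widths are made-up numbers, and it assumes that a character without a glyph width contributes the `xAvgCharWidth` value in its place.

```typescript
// Hypothetical sketch of a frequency-weighted average glyph width.
function weightedAverageWidth(
  weightings: Record<string, number>,
  glyphWidths: Record<string, number | undefined>,
  xAvgCharWidth: number,
): number {
  let weightedSum = 0;
  let totalWeight = 0;

  for (const [char, weight] of Object.entries(weightings)) {
    // Assumption: fall back to the OS/2 average when a glyph width is missing.
    const width = glyphWidths[char] ?? xAvgCharWidth;
    weightedSum += width * weight;
    totalWeight += weight;
  }

  // With no weightings at all, the OS/2 average is the only value available.
  return totalWeight > 0 ? weightedSum / totalWeight : xAvgCharWidth;
}

// Made-up data: "c" has no glyph width, so it uses the fallback of 400.
const averageWidth = weightedAverageWidth(
  { a: 0.5, b: 0.25, c: 0.25 },
  { a: 500, b: 600 },
  400,
);
console.log(averageWidth);
```

Frequent characters dominate the result, which is why the choice of weightings matters more than the width of any rarely used glyph.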

The purpose of this metric is to support generating CSS metric overrides (e.g. [`ascent-override`], [`size-adjust`]) for fallback fonts, enabling inference of average line lengths so that a fallback font can be scaled to better align with a web font. This can be done either manually or using [`createFontStack`].
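
To make the scaling idea concrete, here is a hedged sketch of deriving a `size-adjust` value manually. It assumes the adjustment is the ratio of the two fonts' average character widths, each normalised by its own `unitsPerEm`; `sizeAdjustFor`, `toPercent`, and the metric values are hypothetical and are not taken from [`createFontStack`].

```typescript
// Only the two metrics this sketch needs.
interface WidthMetrics {
  unitsPerEm: number;
  xWidthAvg: number;
}

// Assumption: scale the fallback so its em-normalised average character
// width matches that of the web font.
function sizeAdjustFor(webFont: WidthMetrics, fallback: WidthMetrics): number {
  const webAvg = webFont.xWidthAvg / webFont.unitsPerEm;
  const fallbackAvg = fallback.xWidthAvg / fallback.unitsPerEm;
  return webAvg / fallbackAvg;
}

// Format a ratio as a CSS percentage, rounded to 4 decimal places.
const toPercent = (value: number): string => `${(value * 100).toFixed(4)}%`;

// Made-up metrics for illustration, not real font data.
const webFontMetrics: WidthMetrics = { unitsPerEm: 1000, xWidthAvg: 450 };
const fallbackMetrics: WidthMetrics = { unitsPerEm: 2048, xWidthAvg: 1024 };

const sizeAdjust = sizeAdjustFor(webFontMetrics, fallbackMetrics);
console.log(`size-adjust: ${toPercent(sizeAdjust)};`);
```

A value below 100% shrinks the fallback font, which suits a web font that is narrower on average than the fallback.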

For this technique to be effective, the metric factors in character frequency weightings as observed in written language, using “abstracts” from [Wikinews] articles as a data source.
Currently, only English is supported ([source](https://en.wikinews.org/)).

[see below]: #how-xwidthavg-is-calculated
[xavgcharwidth]: https://learn.microsoft.com/en-us/typography/opentype/spec/os2#xavgcharwidth
[`ascent-override`]: https://developer.mozilla.org/en-US/docs/Web/CSS/@font-face/ascent-override
[`size-adjust`]: https://developer.mozilla.org/en-US/docs/Web/CSS/@font-face/size-adjust
[`createfontstack`]: ../core/README.md#createfontstack
[wikinews]: https://www.wikinews.org/

## Supporting APIs

41 changes: 41 additions & 0 deletions packages/metrics/scripts/buildMetrics.ts
@@ -0,0 +1,41 @@
import { Font, fromFile, fromUrl } from '@capsizecss/unpack';

type FontCategory =
  | 'serif'
  | 'sans-serif'
  | 'monospace'
  | 'display'
  | 'handwriting';
export interface MetricsFont extends Font {
  category: FontCategory;
}

interface Options {
  fontSource: string;
  sourceType: 'file' | 'url';
  category: FontCategory;
  overrides?: Partial<Font>;
}

const extractor: Record<
  Options['sourceType'],
  typeof fromFile | typeof fromUrl
> = {
  file: fromFile,
  url: fromUrl,
};

export const buildMetrics = async ({
  fontSource,
  sourceType,
  category,
  overrides = {},
}: Options): Promise<MetricsFont> => {
  const metrics = await extractor[sourceType](fontSource);

  return {
    ...metrics,
    ...overrides,
    category,
  };
};
187 changes: 124 additions & 63 deletions packages/metrics/scripts/extractSystemFontMetrics.ts
@@ -1,6 +1,6 @@
import fs from 'fs/promises';
import path from 'path';
import { fromFile } from '@capsizecss/unpack';
import { buildMetrics } from './buildMetrics';

(async () => {
const fontDirectory = process.env.FONT_DIRECTORY;
@@ -11,73 +11,134 @@ import { fromFile } from '@capsizecss/unpack';
);
}

const arial = await fromFile(`${fontDirectory}/Arial.ttf`);
const sfPro = await fromFile(`${fontDirectory}/SF-Pro.ttf`);
const roboto = await fromFile(`${fontDirectory}/Roboto.ttf`);
const segoeui = await fromFile(`${fontDirectory}/SegoeUI.ttf`);
const oxygen = await fromFile(`${fontDirectory}/Oxygen.ttf`);
const helvetica = await fromFile(`${fontDirectory}/Helvetica.ttf`);
const helveticaNeue = await fromFile(`${fontDirectory}/HelveticaNeue.ttf`);
const timesNewRoman = await fromFile(`${fontDirectory}/Times New Roman.ttf`);
const tahoma = await fromFile(`${fontDirectory}/Tahoma.ttf`);
const lucidaGrande = await fromFile(`${fontDirectory}/LucidaGrande.ttf`);
const verdana = await fromFile(`${fontDirectory}/Verdana.ttf`);
const trebuchetMS = await fromFile(`${fontDirectory}/Trebuchet MS.ttf`);
const georgia = await fromFile(`${fontDirectory}/Georgia.ttf`);
const courierNew = await fromFile(`${fontDirectory}/Courier New.ttf`);
const brushScript = await fromFile(`${fontDirectory}/Brush Script.ttf`);
const arial = await buildMetrics({
fontSource: `${fontDirectory}/Arial.ttf`,
sourceType: 'file',
category: 'sans-serif',
});
const appleSystem = await buildMetrics({
fontSource: `${fontDirectory}/SF-Pro.ttf`,
sourceType: 'file',
category: 'sans-serif',
overrides: {
familyName: '-apple-system',
descent: -420,
},
});
const blinkMacSystemFont = await buildMetrics({
fontSource: `${fontDirectory}/SF-Pro.ttf`,
sourceType: 'file',
category: 'sans-serif',
overrides: {
familyName: 'BlinkMacSystemFont',
descent: -420,
},
});
const roboto = await buildMetrics({
fontSource: `${fontDirectory}/Roboto.ttf`,
sourceType: 'file',
category: 'sans-serif',
});
const segoeui = await buildMetrics({
fontSource: `${fontDirectory}/SegoeUI.ttf`,
sourceType: 'file',
category: 'sans-serif',
});
const oxygen = await buildMetrics({
fontSource: `${fontDirectory}/Oxygen.ttf`,
sourceType: 'file',
category: 'sans-serif',
overrides: {
capHeight: 1479,
xHeight: 1097,
},
});
const helvetica = await buildMetrics({
fontSource: `${fontDirectory}/Helvetica.ttf`,
sourceType: 'file',
category: 'sans-serif',
});
const helveticaNeue = await buildMetrics({
fontSource: `${fontDirectory}/HelveticaNeue.ttf`,
sourceType: 'file',
category: 'sans-serif',
});
const timesNewRoman = await buildMetrics({
fontSource: `${fontDirectory}/Times New Roman.ttf`,
sourceType: 'file',
category: 'serif',
});
const tahoma = await buildMetrics({
fontSource: `${fontDirectory}/Tahoma.ttf`,
sourceType: 'file',
category: 'sans-serif',
});
const lucidaGrande = await buildMetrics({
fontSource: `${fontDirectory}/LucidaGrande.ttf`,
sourceType: 'file',
category: 'sans-serif',
});
const verdana = await buildMetrics({
fontSource: `${fontDirectory}/Verdana.ttf`,
sourceType: 'file',
category: 'sans-serif',
});
const trebuchetMS = await buildMetrics({
fontSource: `${fontDirectory}/Trebuchet MS.ttf`,
sourceType: 'file',
category: 'sans-serif',
overrides: {
capHeight: 1465,
xHeight: 1071,
},
});
const georgia = await buildMetrics({
fontSource: `${fontDirectory}/Georgia.ttf`,
sourceType: 'file',
category: 'serif',
});
const courierNew = await buildMetrics({
fontSource: `${fontDirectory}/Courier New.ttf`,
sourceType: 'file',
category: 'monospace',
});
const brushScript = await buildMetrics({
fontSource: `${fontDirectory}/Brush Script.ttf`,
sourceType: 'file',
category: 'handwriting',
overrides: {
capHeight: 1230,
xHeight: 709,
},
});

const content = JSON.stringify(
[
{ ...arial, category: 'sans-serif' },
{
...sfPro,
familyName: '-apple-system',
descent: -420,
category: 'sans-serif',
},
{
...sfPro,
familyName: 'BlinkMacSystemFont',
descent: -420,
category: 'sans-serif',
},
{ ...roboto, category: 'sans-serif' },
{ ...segoeui, category: 'sans-serif' },
{ ...oxygen, capHeight: 1479, xHeight: 1097, category: 'sans-serif' },
{ ...helvetica, category: 'sans-serif' },
{ ...helveticaNeue, category: 'sans-serif' },
{ ...timesNewRoman, category: 'serif' },
{ ...tahoma, category: 'sans-serif' },
{ ...lucidaGrande, category: 'sans-serif' },
{ ...verdana, category: 'sans-serif' },
{
...trebuchetMS,
capHeight: 1465,
xHeight: 1071,
category: 'sans-serif',
},
{ ...georgia, category: 'serif' },
{ ...courierNew, category: 'monospace' },
{
...brushScript,
capHeight: 1230,
xHeight: 709,
category: 'handwriting',
},
].sort((a, b) => {
const fontA = a.familyName.toUpperCase();
const fontB = b.familyName.toUpperCase();
const content = [
arial,
appleSystem,
blinkMacSystemFont,
roboto,
segoeui,
oxygen,
helvetica,
helveticaNeue,
timesNewRoman,
tahoma,
lucidaGrande,
verdana,
trebuchetMS,
georgia,
courierNew,
brushScript,
].sort((a, b) => {
const fontA = a.familyName.toUpperCase();
const fontB = b.familyName.toUpperCase();

return fontA < fontB ? -1 : fontA > fontB ? 1 : 0;
}),
null,
2,
);
return fontA < fontB ? -1 : fontA > fontB ? 1 : 0;
});

await fs.writeFile(
path.join(__dirname, 'systemFonts.json'),
`${content}\n`,
`${JSON.stringify(content, null, 2)}\n`,
'utf-8',
);
})();