[ Back to README.md ]

TinyUTF8 Developer Documentation

Development Environment

You will need the following tools for development:

The following tools are not strictly required but very much recommended:

GNU/Linux users can usually install Git and Clang directly via their distribution's package manager and have a full POSIX-conforming environment pre-installed by default.

Users of BSD derivatives (e.g. FreeBSD, OpenBSD, NetBSD) can either install Git and Clang via their system package manager or may even have both pre-installed by default, and always have a POSIX-conforming environment.

macOS users can get Clang and Git by installing Xcode. Older versions are also sufficient and are available from the developer download portal; just select "Downloads" and log in (an Apple ID is required, but it is free of charge, and a developer subscription is not needed to download and use Xcode). Alternatively, they can install the command line developer tools for their current version of macOS by running xcode-select --install in Terminal and following the prompts. macOS always comes with a POSIX-conforming shell environment.

Windows users can either use Linux on Windows and then install a Linux distribution, e.g. Debian or Ubuntu, or install a MinGW-w64 development environment such as Cygwin (with the Clang package added) or w64devkit.

On any system for which Docker (or a Docker-compatible container environment) is available, you can also set up a Docker image for development. A simple GNU/Linux environment like Alpine Linux (basically a Linux kernel + BusyBox) is already sufficient and allows installing Git and Clang via the apk package manager.

Generating Test Data

Creating NFC/NFD forms

https://dencode.com/en/string/unicode-normalization
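If you prefer to generate both forms locally, the following is a minimal sketch using the standard `String.prototype.normalize` method. The helper names `codePointsHex` and `normalizedForms` are hypothetical and only serve this illustration; they are not part of TinyUTF8.

```javascript
// Hypothetical helper: list the code points of a string as hex numbers.
function codePointsHex(text) {
	return Array.from(text, ch => ch.codePointAt(0).toString(16)).join(' ');
}

// Hypothetical helper: return both normalization forms of a string.
function normalizedForms(text) {
	return {
		nfc: text.normalize('NFC'), // composed form
		nfd: text.normalize('NFD')  // decomposed form
	};
}

// U+00E9 is the precomposed letter "e with acute".
const forms = normalizedForms('\u00E9');
console.log(codePointsHex(forms.nfc)); // e9
console.log(codePointsHex(forms.nfd)); // 65 301
```

The decomposed form splits the letter into the base character U+0065 and the combining acute accent U+0301, which makes it a convenient input for normalization test cases.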

Create Unicode data

https://www.coderstool.com/unicode-text-converter
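As a local alternative, the sketch below turns a text sample into the `u+..` notation that the snippets further down consume. The function name `toUPlusNotation` is an assumption made for this example, and padding to at least two hex digits is chosen only to match the `u+00` style used in this document.

```javascript
// Hypothetical helper: convert text into 'u+XX' notation, one entry per
// code point (not per UTF-16 code unit).
function toUPlusNotation(text) {
	return Array.from(text, ch =>
		'u+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(2, '0')
	).join('');
}

// 'A', U+00E9 and the non-BMP code point U+1F600
console.log(toUPlusNotation('A\u00E9\u{1F600}')); // u+41u+E9u+1F600
```

`Array.from` iterates a string by code point, so characters outside the Basic Multilingual Plane come out as a single entry rather than as two surrogates.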

Create hex dumps from UTF-8 Hex

printf '\x..\x..\x..' | xxd -u -i | sed 's/0X/0x/g' | sed 's/^../\t/'

Count bytes from UTF-8 Hex

printf '%s' '\x..\x..\x..' | tr '\\' '\n' | wc -l
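The same count can be sketched in JavaScript if no POSIX shell is at hand. The function name `countHexEscapes` is hypothetical; it simply counts `\x..` escapes with two hex digits each, on the assumption that the input uses exactly that spelling.

```javascript
// Hypothetical helper: count '\xNN' escapes in a hex string.
function countHexEscapes(hexString) {
	const matches = hexString.match(/\\x[0-9A-Fa-f]{2}/g);
	return matches ? matches.length : 0;
}

// The JavaScript literal below is the text \xE2\x82\xAC
// (the UTF-8 encoding of U+20AC).
console.log(countHexEscapes('\\xE2\\x82\\xAC')); // 3
```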

Count code points from UTF-32

printf 'u+..u+..u+...' | tr 'u' '\n' | wc -l
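A JavaScript equivalent, as a rough sketch: it splits on the `u+` prefix in the same way as the conversion script in this document. The name `countCodePoints` is an assumption made for this example.

```javascript
// Hypothetical helper: count the 'u+..' entries in a UTF-32 notation string.
function countCodePoints(uPlusString) {
	// split() yields an empty first element before the leading 'u+',
	// which filter(Boolean) removes.
	return uPlusString.split('u+').filter(Boolean).length;
}

console.log(countCodePoints('u+DFFFu+00')); // 2
```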

Create UTF-8 from UTF-32

You can use the following JavaScript code for that:

const inputValues = [
	// One string per test case, code points written as 'u+..', e.g.:
	// 'u+DFFFu+00'
];
for (const utf32String of inputValues) {
	// Split 'u+DFFFu+00' into ['DFFF', '00']
	const utf32Array = utf32String.split('u+').filter(Boolean);
	// No validation is done: surrogates such as u+DFFF are encoded as-is.
	const utf8Array = utf32Array.map(utf32 => {
		const codePoint = parseInt(utf32, 16);
		if (codePoint < 0x80) {
			// 1 byte: 0xxxxxxx
			return codePoint;
		} else if (codePoint < 0x800) {
			// 2 bytes: 110xxxxx 10xxxxxx
			return [0xc0 | (codePoint >> 6),
				0x80 | (codePoint & 0x3f)];
		} else if (codePoint < 0x10000) {
			// 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
			return [0xe0 | (codePoint >> 12),
				0x80 | ((codePoint >> 6) & 0x3f),
				0x80 | (codePoint & 0x3f)];
		} else {
			// 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
			return [0xf0 | (codePoint >> 18),
				0x80 | ((codePoint >> 12) & 0x3f),
				0x80 | ((codePoint >> 6) & 0x3f),
				0x80 | (codePoint & 0x3f)];
		}
	});
	// Flatten to one byte list and print it as C-style hex literals
	const utf8Result = utf8Array.flat().map(
		code => `0x${code.toString(16).padStart(2, '0')}`);
	console.log(utf8Result.join(', '));
}

You can run it, for example, at: https://reqbin.com/code/javascript
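To cross-check generated bytes for valid code points, one option is to compare them against the platform's own encoder. The sketch below uses the standard `TextEncoder` API, which is available in browsers and in current Node.js versions. The function name `encodeWithTextEncoder` is hypothetical.

```javascript
// Hypothetical helper: encode one code point with the built-in encoder and
// format the bytes as C-style hex literals.
// Caveat: TextEncoder replaces lone surrogates (U+D800..U+DFFF) with U+FFFD,
// so this check is only meaningful for valid Unicode scalar values.
function encodeWithTextEncoder(codePoint) {
	const bytes = new TextEncoder().encode(String.fromCodePoint(codePoint));
	return Array.from(bytes, b => `0x${b.toString(16).padStart(2, '0')}`).join(', ');
}

console.log(encodeWithTextEncoder(0x20AC));  // 0xe2, 0x82, 0xac
console.log(encodeWithTextEncoder(0x1F600)); // 0xf0, 0x9f, 0x98, 0x80
```

Deliberately invalid test inputs, such as encoded surrogates, cannot be verified this way and still need the manual conversion.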