- Dark corners of Unicode
- Let’s Stop Ascribing Meaning to Code Points
- UTF-8 – “The most elegant hack” (2013)
- UTF-8 Everywhere (HN)
- Not everything is UTF-8 (2020) (Lobsters)
- Archie Markup Language (ArchieML) - Structured text format optimized for human writability.
- Unidecode - Lossy ASCII transliterations of Unicode text.
- BARE Message Encoding - Simple binary representation for structured application data. (Lobsters)
- Explaining text representation in computers (2020)
- Text Encoding: The Thing You Were Trying to Avoid (2020)
- Amazon Ion - Richly-typed, self-describing, hierarchical data serialization format offering interchangeable binary and text representations. (HN) (HN)
- Unicode In Five Minutes (2013)
- Unicode support. What does that actually mean? (2020)
- Fast UTF-8 validation (2020) (HN) (Code)
- UTF-8 Illustrator
- Coded Character Sets, History and Development (1980)
- Awesome Code Points - Curated list of Unicode characters that have interesting (and perhaps not widely known) features or are awesome in some other way.
- Text rendering tests - Unicode’s test suite for text rendering engines.
- Executable PNGs (2020) (Lobsters) (HN)
- Unicode Proposal to Encode Subscripts/Superscripts for Mathematical Programming
- Emoji Under the Hood (HN)
- The history of UTF-8 as told by Rob Pike (Lobsters) (HN)
- Concise Encoding - Friendly data format for human and machine. Ad-hoc, secure, with 1:1 compatible twin binary and text formats and rich type support. (Code)
- Practical Reed-Solomon for Programmers (2021)
- Apache Avro - Data serialization system. (Web)
- Unicode sorting is hard & why browsers added special emoji matching to regexp (2021)
- Any Encoding, Ever (2021) (HN)
- casync - Content Addressable Data Synchronizer.
- ANSI Escape Codes (HN)
- Entropy coding in Oodle Data: Huffman coding (2021)
- Fun with Morse Code
- Latinendian vs Arabendian (2020)
- ICU - International Components for Unicode (Code)
- ruststep - STEP toolkit for Rust.
- Overview of Serialization Technologies (2018)
- Substrait - Cross-Language Serialization for Relational Algebra. (Code) (substrait-rs)
- OpenH264 - Open Source H.264 Codec.
- Unicode Normalization Forms: When ö ≠ ö (2021) (HN)
- Planus - Alternative FlatBuffers implementation.
- TinyCBOR - Concise Binary Object Representation (CBOR) Library.
- Cheat sheets for the Portable Document Format
- Understanding UUIDs, ULIDs and String Representations (Lobsters) (HN)
- How UTF-8 works (2022) (HN) (Lobsters)
- What Every Programmer Absolutely, Positively Needs To Know About Encodings (2011) (HN)
- Why I invented “dash encoding”, a new encoding scheme for URL paths (2022) (Tweet)
- You Don't Know GIF – An analysis of a GIF file and some weird GIF features (2022)
- DeGauss - Avro schema compatibility checker.
- Hex: A Strategy Guide (HN)
- trrs - CLI tool to transform data between different encodings.
- Ask HN: Is there a tool to generate binary protocol figures out of a spec? (2022)
- Plain Text - Dylan Beattie (2021)
- Low Complexity Communication Codec (LC3)
- ltp - High-performance, readable, and maintainable in-place encoding format.
- Identity Crisis: Sequence v. UUID as Primary Key (Lobsters)
- Concise Encoding - Secure data format for a modern world. (HN)
- BIPF (Binary In-Place Format) Spec - Binary format designed for in-place reads (without parsing), with schemaless JSON-like semantics.
- New UUID Formats (2022) (HN)
- UTF8.XYZ - Quick web app for fetching Unicode characters without extra fluff. (Code)
- How to estimate disk space
- muon - Compact and simple binary object notation. (Lobsters) (Doc)
- Character encoding and UTF-8 (2022) (HN)
- Type of Barcodes and Their Usage (HN)
- Unicode Character Search - Search for Unicode characters by name, code point, or text. (Code)
- Understanding Big Data File Formats (2022)
- Free Lossless Audio Codec (FLAC)
- How QR codes work (2022) (HN)