
ST::utf_conversion

Michael Hansen edited this page Nov 22, 2019 · 1 revision

String Encoding Conversion functions

Since string_theory 3.0.

Headers

#include <string_theory/utf_conversion>

Public Types

Name              Summary
utf_validation_t  Behavior for handling UTF validation in conversions

Public Functions

Name              Summary
utf16_to_utf8     Convert UTF-16 text to UTF-8
utf32_to_utf8     Convert UTF-32 text to UTF-8
wchar_to_utf8     Convert wide text to UTF-8
latin_1_to_utf8   Convert Latin-1 text to UTF-8
utf8_to_utf16     Convert UTF-8 text to UTF-16
utf32_to_utf16    Convert UTF-32 text to UTF-16
wchar_to_utf16    Convert wide text to UTF-16
latin_1_to_utf16  Convert Latin-1 text to UTF-16
utf8_to_utf32     Convert UTF-8 text to UTF-32
utf16_to_utf32    Convert UTF-16 text to UTF-32
wchar_to_utf32    Convert wide text to UTF-32
latin_1_to_utf32  Convert Latin-1 text to UTF-32
utf8_to_wchar     Convert UTF-8 text to wide text
utf16_to_wchar    Convert UTF-16 text to wide text
utf32_to_wchar    Convert UTF-32 text to wide text
latin_1_to_wchar  Convert Latin-1 text to wide text
utf8_to_latin_1   Convert UTF-8 text to Latin-1
utf16_to_latin_1  Convert UTF-16 text to Latin-1
utf32_to_latin_1  Convert UTF-32 text to Latin-1
wchar_to_latin_1  Convert wide text to Latin-1

Macros

Name                   Summary
ST_DEFAULT_VALIDATION  Default value for utf_validation_t values

Details

These functions provide a standalone way to convert between string_theory's supported character encodings without going through ST::string. The supported encodings are:

  • UTF-8
  • UTF-16
  • UTF-32 (or UCS4)
  • Latin-1
  • "wide" strings using the platform's native wchar_t type. These are assumed to be encoded as either UTF-16 or UTF-32, depending on the size of the wchar_t type. Other wide character encodings are not currently supported.
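
Which encoding a wide string carries is a compile-time property of the platform. The following is a minimal sketch of that selection using only the standard library; the helper name wide_encoding is hypothetical and is not part of string_theory.

```cpp
#include <string>

// Hypothetical helper (not part of string_theory): reports which Unicode
// encoding a wchar_t string is assumed to carry on the current platform.
inline std::string wide_encoding()
{
    static_assert(sizeof(wchar_t) == 2 || sizeof(wchar_t) == 4,
                  "this sketch only handles 16-bit and 32-bit wchar_t");
    // A 16-bit wchar_t (for example on Windows) is treated as UTF-16;
    // a 32-bit wchar_t (for example on Linux and macOS) is treated as UTF-32.
    return sizeof(wchar_t) == 2 ? "UTF-16" : "UTF-32";
}
```

Because the choice depends only on sizeof(wchar_t), the same source code can produce differently sized wide buffers on different platforms.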

Since string_theory 3.0.


Member Type Documentation

ST::utf_validation_t

enum utf_validation_t
{
    assume_valid,
    substitute_invalid,
    check_validity
};

Options for dealing with invalid character sequences in encoding/decoding operations.

  • assume_valid: Don't do any checking or substitution. Only use this value if you are certain the data is already correct for the target encoding.
  • substitute_invalid: Replace invalid sequences with a substitute. For conversions to Unicode encodings, this will use the Unicode replacement character (U+FFFD). For conversions to Latin-1, this will use '?'.
  • check_validity: Throw an ST::unicode_error exception if any invalid sequences are encountered in the source data. This is the default for most conversions (see ST_DEFAULT_VALIDATION).
  • assert_validity (removed in 3.0): Formerly called the string_theory assert handler if any invalid sequences were encountered in the source data. It is no longer a member of the enum shown above.

Changed in 3.0: Removed assert_validity.
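
To make the difference between the three modes concrete, here is a small standalone sketch that applies each policy to a sequence of UTF-32 code points. It is not string_theory's implementation: the enum validation_mode and the function apply_validation are hypothetical stand-ins, std::runtime_error stands in for ST::unicode_error, and the rule used for "invalid" (above U+10FFFF or in the surrogate range) is an assumption made for the example.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in that mirrors the names in ST::utf_validation_t.
enum class validation_mode { assume_valid, substitute_invalid, check_validity };

// Illustration only: applies a validation policy to UTF-32 code points.
inline std::u32string apply_validation(const std::u32string &in, validation_mode mode)
{
    std::u32string out;
    out.reserve(in.size());
    for (char32_t cp : in) {
        // Assumed rule for this sketch: surrogates and values past U+10FFFF are invalid.
        const bool invalid = cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF);
        if (invalid && mode == validation_mode::check_validity)
            throw std::runtime_error("invalid code point in source data");
        if (invalid && mode == validation_mode::substitute_invalid)
            cp = 0xFFFD;            // Unicode replacement character
        out.push_back(cp);          // assume_valid copies everything through unchanged
    }
    return out;
}
```

With an input containing the value 0x110000, assume_valid returns it untouched, substitute_invalid returns U+FFFD in its place, and check_validity throws.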


Member Documentation

ST::latin_1_to_utf8

Signature
ST::char_buffer ST::latin_1_to_utf8(const char *astr, size_t size) (1)
ST::char_buffer ST::latin_1_to_utf8(const char_buffer &astr) (2)

Convert Latin-1 text to UTF-8.

Since string_theory 3.0.
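
As an illustration of what this conversion involves, the sketch below performs a Latin-1 to UTF-8 conversion by hand with the standard library only. It is not the library's implementation, and latin1_to_utf8_sketch is a hypothetical name. Every Latin-1 byte maps directly to a code point in U+0000..U+00FF, which is likely why the Latin-1 source functions take no validation argument.

```cpp
#include <string>

// Illustration only; not string_theory's implementation.
inline std::string latin1_to_utf8_sketch(const std::string &latin1)
{
    std::string utf8;
    utf8.reserve(latin1.size() * 2);    // worst case: every byte becomes two
    for (char ch : latin1) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            utf8.push_back(static_cast<char>(c));                  // ASCII is unchanged
        } else {
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));    // lead byte
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation byte
        }
    }
    return utf8;
}
```

For example, the Latin-1 byte 0xE9 (é) becomes the two UTF-8 bytes 0xC3 0xA9, so the output can be up to twice as long as the input.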


ST::latin_1_to_utf16

Signature
ST::utf16_buffer ST::latin_1_to_utf16(const char *astr, size_t size) (1)
ST::utf16_buffer ST::latin_1_to_utf16(const char_buffer &astr) (2)

Convert Latin-1 text to UTF-16.

Since string_theory 3.0.


ST::latin_1_to_utf32

Signature
ST::utf32_buffer ST::latin_1_to_utf32(const char *astr, size_t size) (1)
ST::utf32_buffer ST::latin_1_to_utf32(const char_buffer &astr) (2)

Convert Latin-1 text to UTF-32.

Since string_theory 3.0.


ST::latin_1_to_wchar

Signature
ST::wchar_buffer ST::latin_1_to_wchar(const char *astr, size_t size) (1)
ST::wchar_buffer ST::latin_1_to_wchar(const char_buffer &astr) (2)

Convert Latin-1 text to wide text. The returned buffer will be encoded as either UTF-16 or UTF-32, depending on the size of the wchar_t type.

Since string_theory 3.0.


ST::utf8_to_latin_1

Signature
ST::char_buffer ST::utf8_to_latin_1(const char *utf8, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION, bool substitute_out_of_range = true) (1)
ST::char_buffer ST::utf8_to_latin_1(const char8_t *utf8, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION, bool substitute_out_of_range = true) (2)
ST::char_buffer ST::utf8_to_latin_1(const char_buffer &utf8, utf_validation_t validation = ST_DEFAULT_VALIDATION, bool substitute_out_of_range = true) (3)

Convert UTF-8 text to Latin-1. Characters outside the Latin-1 range are replaced by '?' if substitute_out_of_range is true; otherwise an ST::unicode_error is thrown.

Since string_theory 3.0.


ST::utf8_to_utf16

Signature
ST::utf16_buffer ST::utf8_to_utf16(const char *utf8, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) (1)
ST::utf16_buffer ST::utf8_to_utf16(const char8_t *utf8, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) (2)
ST::utf16_buffer ST::utf8_to_utf16(const char_buffer &utf8, utf_validation_t validation = ST_DEFAULT_VALIDATION) (3)

Convert UTF-8 text to UTF-16.

Since string_theory 3.0.


ST::utf8_to_utf32

Signature
ST::utf32_buffer ST::utf8_to_utf32(const char *utf8, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) (1)
ST::utf32_buffer ST::utf8_to_utf32(const char8_t *utf8, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) (2)
ST::utf32_buffer ST::utf8_to_utf32(const char_buffer &utf8, utf_validation_t validation = ST_DEFAULT_VALIDATION) (3)

Convert UTF-8 text to UTF-32.

Since string_theory 3.0.


ST::utf8_to_wchar

Signature
ST::wchar_buffer ST::utf8_to_wchar(const char *utf8, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) (1)
ST::wchar_buffer ST::utf8_to_wchar(const char8_t *utf8, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) (2)
ST::wchar_buffer ST::utf8_to_wchar(const char_buffer &utf8, utf_validation_t validation = ST_DEFAULT_VALIDATION) (3)

Convert UTF-8 text to wide text. The returned buffer will be encoded as either UTF-16 or UTF-32, depending on the size of the wchar_t type.

Since string_theory 3.0.


ST::utf16_to_latin_1

Signature
ST::char_buffer ST::utf16_to_latin_1(const char16_t *utf16, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION, bool substitute_out_of_range = true) (1)
ST::char_buffer ST::utf16_to_latin_1(const utf16_buffer &utf16, utf_validation_t validation = ST_DEFAULT_VALIDATION, bool substitute_out_of_range = true) (2)

Convert UTF-16 text to Latin-1. Characters outside the Latin-1 range are replaced by '?' if substitute_out_of_range is true; otherwise an ST::unicode_error is thrown.

Since string_theory 3.0.


ST::utf16_to_utf8

Signature
ST::char_buffer ST::utf16_to_utf8(const char16_t *utf16, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) (1)
ST::char_buffer ST::utf16_to_utf8(const utf16_buffer &utf16, utf_validation_t validation = ST_DEFAULT_VALIDATION) (2)

Convert UTF-16 text to UTF-8.

Since string_theory 3.0.


ST::utf16_to_utf32

Signature
ST::utf32_buffer ST::utf16_to_utf32(const char16_t *utf16, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) (1)
ST::utf32_buffer ST::utf16_to_utf32(const utf16_buffer &utf16, utf_validation_t validation = ST_DEFAULT_VALIDATION) (2)

Convert UTF-16 text to UTF-32.

Since string_theory 3.0.
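
The core of a UTF-16 to UTF-32 conversion is combining surrogate pairs into single code points. The standalone sketch below shows that step; it is not the library's implementation, utf16_to_utf32_sketch is a hypothetical name, and it always replaces unpaired surrogates with U+FFFD, which corresponds roughly to the substitute_invalid mode.

```cpp
#include <cstddef>
#include <string>

// Illustration only; not string_theory's implementation.
inline std::u32string utf16_to_utf32_sketch(const std::u16string &utf16)
{
    std::u32string out;
    for (std::size_t i = 0; i < utf16.size(); ++i) {
        const char32_t unit = utf16[i];
        const bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
        const bool is_low  = unit >= 0xDC00 && unit <= 0xDFFF;
        if (is_high && i + 1 < utf16.size()
                && utf16[i + 1] >= 0xDC00 && utf16[i + 1] <= 0xDFFF) {
            // Valid pair: 10 bits from each unit, offset by 0x10000.
            const char32_t low = utf16[i + 1];
            out.push_back(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
            ++i;                        // the low surrogate has been consumed
        } else if (is_high || is_low) {
            out.push_back(0xFFFD);      // unpaired surrogate: substitute
        } else {
            out.push_back(unit);        // BMP code point: copied as-is
        }
    }
    return out;
}
```

The pair 0xD83D 0xDE00 decodes to U+1F600, while a lone 0xDC00 is replaced.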


ST::utf16_to_wchar

Signature
ST::wchar_buffer ST::utf16_to_wchar(const char16_t *utf16, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) (1)
ST::wchar_buffer ST::utf16_to_wchar(const utf16_buffer &utf16, utf_validation_t validation = ST_DEFAULT_VALIDATION) (2)

Convert UTF-16 text to wide text. The returned buffer will be encoded as either UTF-16 or UTF-32, depending on the size of the wchar_t type. When wchar_t is 16 bits, this will return an unmodified copy of the buffer.

Since string_theory 3.0.


ST::utf32_to_latin_1

Signature
ST::char_buffer ST::utf32_to_latin_1(const char32_t *utf32, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION, bool substitute_out_of_range = true) (1)
ST::char_buffer ST::utf32_to_latin_1(const utf32_buffer &utf32, utf_validation_t validation = ST_DEFAULT_VALIDATION, bool substitute_out_of_range = true) (2)

Convert UTF-32 text to Latin-1. Characters outside the Latin-1 range are replaced by '?' if substitute_out_of_range is true; otherwise an ST::unicode_error is thrown.

Since string_theory 3.0.
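
The effect of substitute_out_of_range can be sketched without the library as follows. This is an illustration under assumptions, not string_theory's code: utf32_to_latin1_sketch is a hypothetical name, std::out_of_range stands in for ST::unicode_error, and the validation argument is left out for brevity.

```cpp
#include <stdexcept>
#include <string>

// Illustration only; not string_theory's implementation.
inline std::string utf32_to_latin1_sketch(const std::u32string &utf32,
                                          bool substitute_out_of_range = true)
{
    std::string latin1;
    latin1.reserve(utf32.size());
    for (char32_t cp : utf32) {
        if (cp <= 0xFF) {
            // Code points U+0000..U+00FF map one-to-one onto Latin-1 bytes.
            latin1.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
        } else if (substitute_out_of_range) {
            latin1.push_back('?');
        } else {
            throw std::out_of_range("code point does not fit in Latin-1");
        }
    }
    return latin1;
}
```

With the default flag, the euro sign (U+20AC) becomes '?'; with the flag set to false, the same input raises an exception instead.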


ST::utf32_to_utf8

Signature
ST::char_buffer ST::utf32_to_utf8(const char32_t *utf32, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) (1)
ST::char_buffer ST::utf32_to_utf8(const utf32_buffer &utf32, utf_validation_t validation = ST_DEFAULT_VALIDATION) (2)

Convert UTF-32 text to UTF-8.

Since string_theory 3.0.


ST::utf32_to_utf16

Signature
ST::utf16_buffer ST::utf32_to_utf16(const char32_t *utf32, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) (1)
ST::utf16_buffer ST::utf32_to_utf16(const utf32_buffer &utf32, utf_validation_t validation = ST_DEFAULT_VALIDATION) (2)

Convert UTF-32 text to UTF-16.

Since string_theory 3.0.
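
Going from UTF-32 to UTF-16 means splitting code points above U+FFFF into surrogate pairs. The sketch below shows that step with the standard library only. It is not the library's implementation, utf32_to_utf16_sketch is a hypothetical name, and it performs no validation, so it behaves like a conversion run under assume_valid.

```cpp
#include <string>

// Illustration only; not string_theory's implementation.
// Assumes every input value is a valid Unicode code point.
inline std::u16string utf32_to_utf16_sketch(const std::u32string &utf32)
{
    std::u16string out;
    out.reserve(utf32.size());
    for (char32_t cp : utf32) {
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));       // fits in one unit
        } else {
            const char32_t v = cp - 0x10000;                // 20 significant bits
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
        }
    }
    return out;
}
```

U+1F600 is emitted as the pair 0xD83D 0xDE00, so the output can hold up to twice as many units as the input has code points.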


ST::utf32_to_wchar

Signature
ST::wchar_buffer ST::utf32_to_wchar(const char32_t *utf32, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) (1)
ST::wchar_buffer ST::utf32_to_wchar(const utf32_buffer &utf32, utf_validation_t validation = ST_DEFAULT_VALIDATION) (2)

Convert UTF-32 text to wide text. The returned buffer will be encoded as either UTF-16 or UTF-32, depending on the size of the wchar_t type. When wchar_t is 32 bits, this will return an unmodified copy of the buffer.

Since string_theory 3.0.


ST::wchar_to_latin_1

Signature
ST::char_buffer ST::wchar_to_latin_1(const wchar_t *wstr, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION, bool substitute_out_of_range = true) (1)
ST::char_buffer ST::wchar_to_latin_1(const wchar_buffer &wstr, utf_validation_t validation = ST_DEFAULT_VALIDATION, bool substitute_out_of_range = true) (2)

Convert wide text to Latin-1. wstr is assumed to be encoded as either UTF-16 or UTF-32, depending on the size of the wchar_t type. Characters outside the Latin-1 range are replaced by '?' if substitute_out_of_range is true; otherwise an ST::unicode_error is thrown.

Since string_theory 3.0.


ST::wchar_to_utf8

Signature
ST::char_buffer ST::wchar_to_utf8(const wchar_t *wstr, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) (1)
ST::char_buffer ST::wchar_to_utf8(const wchar_buffer &wstr, utf_validation_t validation = ST_DEFAULT_VALIDATION) (2)

Convert wide text to UTF-8. wstr is assumed to be encoded as either UTF-16 or UTF-32, depending on the size of the wchar_t type.

Since string_theory 3.0.


ST::wchar_to_utf16

Signature
ST::utf16_buffer ST::wchar_to_utf16(const wchar_t *wstr, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) (1)
ST::utf16_buffer ST::wchar_to_utf16(const wchar_buffer &wstr, utf_validation_t validation = ST_DEFAULT_VALIDATION) (2)

Convert wide text to UTF-16. wstr is assumed to be encoded as either UTF-16 or UTF-32, depending on the size of the wchar_t type. When wchar_t is 16 bits, this will return an unmodified copy of the buffer.

Since string_theory 3.0.


ST::wchar_to_utf32

Signature
ST::utf32_buffer ST::wchar_to_utf32(const wchar_t *wstr, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) (1)
ST::utf32_buffer ST::wchar_to_utf32(const wchar_buffer &wstr, utf_validation_t validation = ST_DEFAULT_VALIDATION) (2)

Convert wide text to UTF-32. wstr is assumed to be encoded as either UTF-16 or UTF-32, depending on the size of the wchar_t type. When wchar_t is 32 bits, this will return an unmodified copy of the buffer.

Since string_theory 3.0.


Macro Documentation

ST_DEFAULT_VALIDATION

#ifndef ST_DEFAULT_VALIDATION
#   define ST_DEFAULT_VALIDATION ST::check_validity
#endif

The default validation mode for functions that perform validity checking. You can override it by defining ST_DEFAULT_VALIDATION before including any string_theory headers; however, it is generally recommended to leave the default in place and pass a different value explicitly to the calls that need different behavior.
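
The override works because of the #ifndef guard shown above. The following standalone sketch reproduces the mechanism with hypothetical stand-ins (demo_validation, DEMO_DEFAULT_VALIDATION and effective_validation are not part of string_theory), so it can be compiled without the library.

```cpp
// Hypothetical stand-ins for illustration; the real names are
// ST::utf_validation_t and ST_DEFAULT_VALIDATION.
enum class demo_validation { assume_valid, substitute_invalid, check_validity };

// 1. The application defines its preferred default first...
#define DEMO_DEFAULT_VALIDATION demo_validation::substitute_invalid

// 2. ...so the guarded fallback, as a header would provide it, is skipped.
#ifndef DEMO_DEFAULT_VALIDATION
#   define DEMO_DEFAULT_VALIDATION demo_validation::check_validity
#endif

// 3. Functions declared afterwards pick up the overridden default argument.
inline demo_validation effective_validation(
        demo_validation mode = DEMO_DEFAULT_VALIDATION)
{
    return mode;
}
```

In real code the equivalent would be a definition of ST_DEFAULT_VALIDATION placed ahead of the first string_theory include. The definition has to be identical in every translation unit, which is one reason passing the mode explicitly per call is the safer choice.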

See also utf_validation_t
