ST::utf_conversion
Since string_theory 3.0.
#include <string_theory/utf_conversion>
Name | Summary |
---|---|
utf_validation_t | Behavior for handling UTF validation in conversions |
Name | Summary |
---|---|
utf16_to_utf8 | Convert UTF-16 text to UTF-8 |
utf32_to_utf8 | Convert UTF-32 text to UTF-8 |
wchar_to_utf8 | Convert wide text to UTF-8 |
latin_1_to_utf8 | Convert Latin-1 text to UTF-8 |
utf8_to_utf16 | Convert UTF-8 text to UTF-16 |
utf32_to_utf16 | Convert UTF-32 text to UTF-16 |
wchar_to_utf16 | Convert wide text to UTF-16 |
latin_1_to_utf16 | Convert Latin-1 text to UTF-16 |
utf8_to_utf32 | Convert UTF-8 text to UTF-32 |
utf16_to_utf32 | Convert UTF-16 text to UTF-32 |
wchar_to_utf32 | Convert wide text to UTF-32 |
latin_1_to_utf32 | Convert Latin-1 text to UTF-32 |
utf8_to_wchar | Convert UTF-8 text to wide text |
utf16_to_wchar | Convert UTF-16 text to wide text |
utf32_to_wchar | Convert UTF-32 text to wide text |
latin_1_to_wchar | Convert Latin-1 text to wide text |
utf8_to_latin_1 | Convert UTF-8 text to Latin-1 |
utf16_to_latin_1 | Convert UTF-16 text to Latin-1 |
utf32_to_latin_1 | Convert UTF-32 text to Latin-1 |
wchar_to_latin_1 | Convert wide text to Latin-1 |
Name | Summary |
---|---|
ST_DEFAULT_VALIDATION | Default value for utf_validation_t values |
These functions provide a standalone way to convert between string_theory's supported character encodings without having to go through `ST::string`. The supported encodings are:
- UTF-8
- UTF-16
- UTF-32 (or UCS4)
- Latin-1
- "wide" strings using the platform's native `wchar_t` type. These are assumed to be encoded as either UTF-16 or UTF-32, depending on the size of the `wchar_t` type. Other wide character encodings are not currently supported.
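Because the wide-string encoding depends only on `sizeof(wchar_t)`, it can be decided at compile time. The sketch below is a hypothetical helper (`native_wide_encoding` is not part of string_theory) that applies the same rule. On Windows `wchar_t` is typically 16 bits, while on Linux and macOS it is typically 32 bits.

```cpp
// Hypothetical helper, for illustration only: reports which Unicode encoding
// a wchar_t buffer is assumed to hold, based solely on the size of wchar_t.
enum class wide_encoding { utf16, utf32, unsupported };

constexpr wide_encoding native_wide_encoding()
{
    return sizeof(wchar_t) == 2 ? wide_encoding::utf16
         : sizeof(wchar_t) == 4 ? wide_encoding::utf32
         : wide_encoding::unsupported;  // other sizes are not supported
}
```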
Since string_theory 3.0.
```cpp
enum utf_validation_t
{
    assume_valid,
    substitute_invalid,
    check_validity
};
```
Options for dealing with invalid character sequences in encoding/decoding operations.
- `assume_valid`: Don't do any checking or substitution. Only use this value if you are certain the data is already correct for the target encoding.
- `substitute_invalid`: Replace invalid sequences with a substitute. For conversions to Unicode encodings, this uses the Unicode replacement character (U+FFFD). For conversions to Latin-1, this uses `'?'`.
- `check_validity`: Throw a `ST::unicode_error` exception if any invalid sequences are encountered in the source data. This is the default for most conversions.
- `assert_validity`: Called the string_theory assert handler if any invalid sequences were encountered in the source data. No longer available.

Changed in 3.0: Removed `assert_validity`.
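The three remaining policies can be illustrated with a standalone sketch that checks UTF-32 input, where a code unit is invalid if it is a surrogate (U+D800 to U+DFFF) or greater than U+10FFFF. The `validation` enum and `apply_validation` function are hypothetical stand-ins written for this example, and `std::runtime_error` takes the place of `ST::unicode_error`; this is not how string_theory implements its checks.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical mirror of ST::utf_validation_t, for illustration only.
enum class validation { assume_valid, substitute_invalid, check_validity };

// A UTF-32 code unit is invalid if it is a surrogate or above U+10FFFF.
inline bool is_valid_code_point(char32_t ch)
{
    return ch <= 0x10FFFF && !(ch >= 0xD800 && ch <= 0xDFFF);
}

// Applies one of the three policies to every code unit of a UTF-32 string.
inline std::u32string apply_validation(const std::u32string &in, validation mode)
{
    if (mode == validation::assume_valid)
        return in;  // no checking at all: the caller guarantees correctness

    std::u32string out;
    out.reserve(in.size());
    for (char32_t ch : in) {
        if (is_valid_code_point(ch)) {
            out.push_back(ch);
        } else if (mode == validation::substitute_invalid) {
            out.push_back(U'\uFFFD');  // Unicode replacement character
        } else {
            // std::runtime_error stands in for ST::unicode_error here.
            throw std::runtime_error("invalid UTF-32 code unit");
        }
    }
    return out;
}
```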
Signature | |
---|---|
ST::char_buffer ST::latin_1_to_utf8(const char *astr, size_t size) | (1) |
ST::char_buffer ST::latin_1_to_utf8(const char_buffer &astr) | (2) |
Convert Latin-1 text to UTF-8.
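Latin-1 maps directly onto U+0000 to U+00FF, so each byte becomes one or two UTF-8 bytes and there is nothing to validate, which is consistent with these functions taking no `utf_validation_t` argument. A minimal sketch of the transformation, using a hypothetical `latin1_to_utf8_sketch` over `std::string` rather than string_theory's buffer types:

```cpp
#include <string>

// Hypothetical standalone sketch; not string_theory's implementation.
// Bytes below 0x80 are ASCII and copy through; 0x80..0xFF need two bytes.
inline std::string latin1_to_utf8_sketch(const std::string &latin1)
{
    std::string out;
    out.reserve(latin1.size() * 2);  // worst case: every byte needs two
    for (unsigned char ch : latin1) {
        if (ch < 0x80) {
            out.push_back(static_cast<char>(ch));
        } else {
            out.push_back(static_cast<char>(0xC0 | (ch >> 6)));    // 110000xx
            out.push_back(static_cast<char>(0x80 | (ch & 0x3F)));  // 10xxxxxx
        }
    }
    return out;
}
```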
Since string_theory 3.0.
Signature | |
---|---|
ST::utf16_buffer ST::latin_1_to_utf16(const char *astr, size_t size) | (1) |
ST::utf16_buffer ST::latin_1_to_utf16(const char_buffer &astr) | (2) |
Convert Latin-1 text to UTF-16.
Since string_theory 3.0.
Signature | |
---|---|
ST::utf32_buffer ST::latin_1_to_utf32(const char *astr, size_t size) | (1) |
ST::utf32_buffer ST::latin_1_to_utf32(const char_buffer &astr) | (2) |
Convert Latin-1 text to UTF-32.
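For UTF-16 and UTF-32 targets the conversion is simpler still: every Latin-1 byte is zero-extended to one code unit, since all of U+0000 to U+00FF lies in the Basic Multilingual Plane. A hypothetical template sketch covering both cases, using `std::basic_string` rather than string_theory's buffer types:

```cpp
#include <string>

// Hypothetical standalone sketch; not string_theory's implementation.
// Works for CharT = char16_t or char32_t, because every Latin-1 character
// fits in a single code unit of either encoding.
template <typename CharT>
std::basic_string<CharT> latin1_widen_sketch(const std::string &latin1)
{
    std::basic_string<CharT> out;
    out.reserve(latin1.size());
    for (unsigned char ch : latin1)
        out.push_back(static_cast<CharT>(ch));  // going via unsigned char avoids sign extension
    return out;
}
```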
Since string_theory 3.0.
Signature | |
---|---|
ST::wchar_buffer ST::latin_1_to_wchar(const char *astr, size_t size) | (1) |
ST::wchar_buffer ST::latin_1_to_wchar(const char_buffer &astr) | (2) |
Convert Latin-1 text to wide text. The returned buffer will be encoded as either UTF-16 or UTF-32, depending on the size of the `wchar_t` type.
Since string_theory 3.0.
Signature | |
---|---|
ST::char_buffer ST::utf8_to_latin_1(const char *utf8, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION, bool substitute_out_of_range = true) | (1) |
ST::char_buffer ST::utf8_to_latin_1(const char8_t *utf8, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION, bool substitute_out_of_range = true) | (2) |
ST::char_buffer ST::utf8_to_latin_1(const char_buffer &utf8, utf_validation_t validation = ST_DEFAULT_VALIDATION, bool substitute_out_of_range = true) | (3) |
Convert UTF-8 text to Latin-1. Any characters outside of the Latin-1 range will be replaced by `?` if `substitute_out_of_range` is `true`, or will cause a `ST::unicode_error` to be thrown otherwise.
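The out-of-range rule can be sketched as follows. This hypothetical `utf8_to_latin1_sketch` assumes the input is already valid UTF-8 (it only checks for truncation) and uses `std::out_of_range` in place of `ST::unicode_error`; it illustrates the behavior described above and is not string_theory's implementation.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper for illustration. Assumes valid UTF-8 input.
inline std::string utf8_to_latin1_sketch(const std::string &utf8,
                                         bool substitute_out_of_range = true)
{
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        // The lead byte gives the sequence length and the top payload bits.
        std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        if (i + len > utf8.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        char32_t cp = len == 1 ? lead
                    : len == 2 ? (lead & 0x1F)
                    : len == 3 ? (lead & 0x0F)
                    : (lead & 0x07);
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        i += len;

        if (cp <= 0xFF)
            out.push_back(static_cast<char>(cp));  // U+0000..U+00FF map 1:1 to Latin-1
        else if (substitute_out_of_range)
            out.push_back('?');
        else  // std::out_of_range stands in for ST::unicode_error
            throw std::out_of_range("character outside the Latin-1 range");
    }
    return out;
}
```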
Since string_theory 3.0.
Signature | |
---|---|
ST::utf16_buffer ST::utf8_to_utf16(const char *utf8, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (1) |
ST::utf16_buffer ST::utf8_to_utf16(const char8_t *utf8, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (2) |
ST::utf16_buffer ST::utf8_to_utf16(const char_buffer &utf8, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (3) |
Convert UTF-8 text to UTF-16.
Since string_theory 3.0.
Signature | |
---|---|
ST::utf32_buffer ST::utf8_to_utf32(const char *utf8, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (1) |
ST::utf32_buffer ST::utf8_to_utf32(const char8_t *utf8, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (2) |
ST::utf32_buffer ST::utf8_to_utf32(const char_buffer &utf8, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (3) |
Convert UTF-8 text to UTF-32.
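Decoding UTF-8 with validation involves checking the lead byte, the continuation bytes, and the decoded value. The sketch below is a hypothetical standalone decoder showing the `substitute_invalid` and `check_validity` behaviors, with `std::runtime_error` standing in for `ST::unicode_error`. It emits one U+FFFD per offending byte, which is one common resynchronization policy; string_theory's exact substitution granularity may differ.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical standalone sketch; not string_theory's implementation.
inline std::u32string utf8_to_utf32_sketch(const std::string &utf8, bool substitute)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::size_t len;
        char32_t cp, min_cp;  // min_cp is used to reject overlong encodings
        if (lead < 0x80)                { len = 1; cp = lead;        min_cp = 0; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min_cp = 0x10000; }
        else                            { len = 0; cp = 0;           min_cp = 0; }  // stray continuation or bad lead

        bool ok = len != 0 && i + len <= utf8.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char cont = static_cast<unsigned char>(utf8[i + k]);
            ok = (cont & 0xC0) == 0x80;  // continuation bytes look like 10xxxxxx
            cp = (cp << 6) | (cont & 0x3F);
        }
        // Reject overlong forms, UTF-16 surrogates, and values past U+10FFFF.
        ok = ok && cp >= min_cp && cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);

        if (ok) {
            out.push_back(cp);
            i += len;
        } else if (substitute) {
            out.push_back(U'\uFFFD');
            i += 1;  // resynchronize on the next byte
        } else {
            throw std::runtime_error("invalid UTF-8 sequence");
        }
    }
    return out;
}
```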
Since string_theory 3.0.
Signature | |
---|---|
ST::wchar_buffer ST::utf8_to_wchar(const char *utf8, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (1) |
ST::wchar_buffer ST::utf8_to_wchar(const char8_t *utf8, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (2) |
ST::wchar_buffer ST::utf8_to_wchar(const char_buffer &utf8, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (3) |
Convert UTF-8 text to wide text. The returned buffer will be encoded as either UTF-16 or UTF-32, depending on the size of the `wchar_t` type.
Since string_theory 3.0.
Signature | |
---|---|
ST::char_buffer ST::utf16_to_latin_1(const char16_t *utf16, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION, bool substitute_out_of_range = true) | (1) |
ST::char_buffer ST::utf16_to_latin_1(const utf16_buffer &utf16, utf_validation_t validation = ST_DEFAULT_VALIDATION, bool substitute_out_of_range = true) | (2) |
Convert UTF-16 text to Latin-1. Any characters outside of the Latin-1 range will be replaced by `?` if `substitute_out_of_range` is `true`, or will cause a `ST::unicode_error` to be thrown otherwise.
Since string_theory 3.0.
Signature | |
---|---|
ST::char_buffer ST::utf16_to_utf8(const char16_t *utf16, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (1) |
ST::char_buffer ST::utf16_to_utf8(const utf16_buffer &utf16, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (2) |
Convert UTF-16 text to UTF-8.
Since string_theory 3.0.
Signature | |
---|---|
ST::utf32_buffer ST::utf16_to_utf32(const char16_t *utf16, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (1) |
ST::utf32_buffer ST::utf16_to_utf32(const utf16_buffer &utf16, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (2) |
Convert UTF-16 text to UTF-32.
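UTF-16 decoding is mostly a matter of combining surrogate pairs: a high surrogate (U+D800 to U+DBFF) followed by a low surrogate (U+DC00 to U+DFFF) encodes one code point above U+FFFF, and any unpaired surrogate is invalid. A minimal sketch, with a hypothetical function name and `std::runtime_error` standing in for `ST::unicode_error`:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical standalone sketch; not string_theory's implementation.
// Unpaired surrogates become U+FFFD when `substitute` is true; otherwise
// std::runtime_error is thrown.
inline std::u32string utf16_to_utf32_sketch(const std::u16string &utf16, bool substitute)
{
    std::u32string out;
    for (std::size_t i = 0; i < utf16.size(); ++i) {
        char16_t unit = utf16[i];
        bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
        bool next_is_low = i + 1 < utf16.size()
                           && utf16[i + 1] >= 0xDC00 && utf16[i + 1] <= 0xDFFF;
        if (unit < 0xD800 || unit > 0xDFFF) {
            out.push_back(unit);                   // BMP character: copied as-is
        } else if (is_high && next_is_low) {
            char32_t high = unit - 0xD800;         // top 10 bits
            char32_t low = utf16[i + 1] - 0xDC00;  // bottom 10 bits
            out.push_back(0x10000 + ((high << 10) | low));
            ++i;                                   // the low surrogate is consumed too
        } else if (substitute) {
            out.push_back(U'\uFFFD');              // lone high or low surrogate
        } else {
            throw std::runtime_error("unpaired UTF-16 surrogate");
        }
    }
    return out;
}
```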
Since string_theory 3.0.
Signature | |
---|---|
ST::wchar_buffer ST::utf16_to_wchar(const char16_t *utf16, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (1) |
ST::wchar_buffer ST::utf16_to_wchar(const utf16_buffer &utf16, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (2) |
Convert UTF-16 text to wide text. The returned buffer will be encoded as either UTF-16 or UTF-32, depending on the size of the `wchar_t` type. When `wchar_t` is 16 bits, this will return an unmodified copy of the buffer.
Since string_theory 3.0.
Signature | |
---|---|
ST::char_buffer ST::utf32_to_latin_1(const char32_t *utf32, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION, bool substitute_out_of_range = true) | (1) |
ST::char_buffer ST::utf32_to_latin_1(const utf32_buffer &utf32, utf_validation_t validation = ST_DEFAULT_VALIDATION, bool substitute_out_of_range = true) | (2) |
Convert UTF-32 text to Latin-1. Any characters outside of the Latin-1 range will be replaced by `?` if `substitute_out_of_range` is `true`, or will cause a `ST::unicode_error` to be thrown otherwise.
Since string_theory 3.0.
Signature | |
---|---|
ST::char_buffer ST::utf32_to_utf8(const char32_t *utf32, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (1) |
ST::char_buffer ST::utf32_to_utf8(const utf32_buffer &utf32, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (2) |
Convert UTF-32 text to UTF-8.
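Encoding in this direction needs no decoding step, only bit packing into one to four bytes. This sketch assumes the code points have already been validated (no surrogates, nothing above U+10FFFF); the function name is hypothetical and the code is not string_theory's implementation.

```cpp
#include <string>

// Hypothetical standalone sketch of UTF-8 encoding for validated input.
inline std::string utf32_to_utf8_sketch(const std::u32string &utf32)
{
    std::string out;
    for (char32_t cp : utf32) {
        if (cp < 0x80) {                // 1 byte:  0xxxxxxx
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {        // 2 bytes: 110xxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {      // 3 bytes: 1110xxxx + 2 continuation bytes
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                        // 4 bytes: 11110xxx + 3 continuation bytes
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```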
Since string_theory 3.0.
Signature | |
---|---|
ST::utf16_buffer ST::utf32_to_utf16(const char32_t *utf32, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (1) |
ST::utf16_buffer ST::utf32_to_utf16(const utf32_buffer &utf32, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (2) |
Convert UTF-32 text to UTF-16.
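Likewise, encoding UTF-32 as UTF-16 only requires splitting code points above U+FFFF into surrogate pairs. A sketch under the same assumption of already-validated input, again with a hypothetical function name:

```cpp
#include <string>

// Hypothetical standalone sketch of UTF-16 encoding for validated input.
inline std::u16string utf32_to_utf16_sketch(const std::u32string &utf32)
{
    std::u16string out;
    for (char32_t cp : utf32) {
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));                   // single unit
        } else {
            char32_t v = cp - 0x10000;                                  // 20 bits remain
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));   // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF))); // low surrogate
        }
    }
    return out;
}
```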
Since string_theory 3.0.
Signature | |
---|---|
ST::wchar_buffer ST::utf32_to_wchar(const char32_t *utf32, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (1) |
ST::wchar_buffer ST::utf32_to_wchar(const utf32_buffer &utf32, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (2) |
Convert UTF-32 text to wide text. The returned buffer will be encoded as either UTF-16 or UTF-32, depending on the size of the `wchar_t` type. When `wchar_t` is 32 bits, this will return an unmodified copy of the buffer.
Since string_theory 3.0.
Signature | |
---|---|
ST::char_buffer ST::wchar_to_latin_1(const wchar_t *wstr, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION, bool substitute_out_of_range = true) | (1) |
ST::char_buffer ST::wchar_to_latin_1(const wchar_buffer &wstr, utf_validation_t validation = ST_DEFAULT_VALIDATION, bool substitute_out_of_range = true) | (2) |
Convert wide text to Latin-1. `wstr` is assumed to be encoded as either UTF-16 or UTF-32, depending on the size of the `wchar_t` type. Any characters outside of the Latin-1 range will be replaced by `?` if `substitute_out_of_range` is `true`, or will cause a `ST::unicode_error` to be thrown otherwise.
Since string_theory 3.0.
Signature | |
---|---|
ST::char_buffer ST::wchar_to_utf8(const wchar_t *wstr, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (1) |
ST::char_buffer ST::wchar_to_utf8(const wchar_buffer &wstr, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (2) |
Convert wide text to UTF-8. `wstr` is assumed to be encoded as either UTF-16 or UTF-32, depending on the size of the `wchar_t` type.
Since string_theory 3.0.
Signature | |
---|---|
ST::utf16_buffer ST::wchar_to_utf16(const wchar_t *wstr, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (1) |
ST::utf16_buffer ST::wchar_to_utf16(const wchar_buffer &wstr, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (2) |
Convert wide text to UTF-16. `wstr` is assumed to be encoded as either UTF-16 or UTF-32, depending on the size of the `wchar_t` type. When `wchar_t` is 16 bits, this will return an unmodified copy of the buffer.
Since string_theory 3.0.
Signature | |
---|---|
ST::utf32_buffer ST::wchar_to_utf32(const wchar_t *wstr, size_t size, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (1) |
ST::utf32_buffer ST::wchar_to_utf32(const wchar_buffer &wstr, utf_validation_t validation = ST_DEFAULT_VALIDATION) | (2) |
Convert wide text to UTF-32. `wstr` is assumed to be encoded as either UTF-16 or UTF-32, depending on the size of the `wchar_t` type. When `wchar_t` is 32 bits, this will return an unmodified copy of the buffer.
Since string_theory 3.0.
```cpp
#ifndef ST_DEFAULT_VALIDATION
#   define ST_DEFAULT_VALIDATION ST::check_validity
#endif
```
The default validation mode for functions that perform validity checking. You can override the default by defining an alternate `ST_DEFAULT_VALIDATION` before including any string_theory headers; however, it is generally recommended to leave the default in place and explicitly pass a different value to the calls that need different behavior.
See also utf_validation_t
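The override mechanism relies on the `#ifndef` guard around the default definition. The self-contained sketch below reproduces that pattern with hypothetical names (`MYLIB_DEFAULT_VALIDATION`, `mylib`, `effective_validation`) so that it compiles without string_theory; with the real library, the `#define` of `ST_DEFAULT_VALIDATION` would go before the first string_theory `#include`. It also shows why the recommended approach works: an explicit argument always takes precedence over whatever default was chosen.

```cpp
// With string_theory itself, the equivalent would be:
//   #define ST_DEFAULT_VALIDATION ST::substitute_invalid
//   #include <string_theory/utf_conversion>
// The names below are hypothetical so that the sketch compiles on its own.

// 1. The application picks a different default before the "header" is seen.
#define MYLIB_DEFAULT_VALIDATION mylib::substitute_invalid

// 2. What the library header contains: the enum and a guarded default.
namespace mylib {
enum utf_validation_t { assume_valid, substitute_invalid, check_validity };
}

#ifndef MYLIB_DEFAULT_VALIDATION
#   define MYLIB_DEFAULT_VALIDATION mylib::check_validity
#endif

namespace mylib {
// Stand-in for a conversion function: reports which policy it would apply.
inline utf_validation_t effective_validation(utf_validation_t validation = MYLIB_DEFAULT_VALIDATION)
{
    return validation;
}
}
```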