ST::string
#include <string_theory/string>
Member Type | Definition |
---|---|
size_type | size_t |
difference_type | ptrdiff_t |
value_type | ST::char_buffer::value_type |
const_pointer | const value_type * |
const_reference | const value_type & |
const_iterator | ST::char_buffer::const_iterator |
const_reverse_iterator | std::reverse_iterator<const_iterator> |
Name | Summary |
---|---|
case_sensitivity_t | Enumeration for case sensitivity selection |
Name | Summary |
---|---|
(constructor) | ST::string constructors |
set | Set a string's content |
set_validated | Set a string's content from pre-validated UTF-8 data |
operator= | Set a string's content with an overloaded = operator |
operator+= | Append additional content to a string |
c_str | Get a C-style string pointer to the string's internal UTF-8 data |
u8_str | Get a C-style string pointer to the string's internal UTF-8 data |
data | Return a pointer to the string's internal UTF-8 data |
at | Return a reference to a specific byte in the string |
operator[] | Return a reference to a specific byte in the string |
Return a specific byte in the string | |
front | Return a reference to the first character in the string |
back | Return a reference to the last character in the string |
begin cbegin | Return an iterator to the front of the string |
end cend | Return an iterator to the end of the string |
rbegin crbegin | Return a reverse iterator to the start of the string |
rend crend | Return a reverse iterator to the end of the string |
to_utf8 | Get a ST::char_buffer copy of the string's UTF-8 data |
to_utf16 | Convert a string to UTF-16 |
to_utf32 | Convert a string to UTF-32 |
to_wchar | Convert a string to a platform's wchar_t buffer type |
to_latin_1 | Convert a string to Latin-1 |
to_buffer | Convert a string to an ST::buffer<T> |
to_std_string to_std_wstring to_std_u16string to_std_u32string to_std_u8string | Get a std::string copy of the string data |
to_path | Convert a string to a std::filesystem::path object |
view | Create a std::string_view of all or part of this string |
Implicit cast to std::string_view of this string |
|
size | Return the number of UTF-8 bytes contained in the string |
empty | Return whether the string is empty |
clear | Reset the string to the empty state |
to_short to_int to_long to_long_long to_ushort to_uint to_ulong to_ulong_long | Convert a string to an integer |
to_float to_double | Convert a string to a floating point number |
to_uint64 | Convert a string to a 64-bit integer |
to_bool | Convert a string to a boolean |
compare compare_i | Compare string content lexicographically |
compare_n compare_ni | Compare N characters of string content lexicographically |
operator== operator!= operator< | Overloaded operators for string comparison |
find | Find characters or substrings within this string |
find_last | Find the last instance of characters or substrings within this string |
contains | Determine whether a string contains a character or substring |
trim_left | Trim characters from the left side of the string |
trim_right | Trim characters from the right side of the string |
trim | Trim characters from both sides of the string |
substr | Extract a substring from part or all of this string |
left | Extract a substring from the left side of this string |
right | Extract a substring from the right side of this string |
starts_with | Determine whether a string starts with a given prefix |
ends_with | Determine whether a string ends with a given suffix |
before_first | Extract part of a string before the first match |
after_first | Extract part of a string after the first match |
before_last | Extract part of a string before the last match |
after_last | Extract part of a string after the last match |
replace | Replace instances of some search text within a string |
to_upper | Convert a string to upper-case |
to_lower | Convert a string to lower-case |
split | Split a string based on a given separator, preserving empty parts |
tokenize | Split a string into tokens separated by one or more delimiters |
Name | Summary |
---|---|
from_validated | Convert pre-validated UTF-8 data to an ST::string |
from_utf8 | Convert UTF-8 data to an ST::string |
from_utf16 | Convert UTF-16 data to an ST::string |
from_utf32 | Convert UTF-32 data to an ST::string |
from_wchar | Convert wchar_t data to an ST::string |
from_latin_1 | Convert Latin-1 data to an ST::string |
from_std_string from_std_wstring | Convert std::string types to an ST::string |
from_path | Convert std::filesystem::path objects to an ST::string |
from_int from_uint | Create a string representation of an integer |
from_float from_double | Create a string representation of floating point numbers |
from_uint64 | Create a string representation of a 64-bit integer |
from_bool | Create a string representation of a boolean value |
fill | Create a string whose contents are filled with a given character |
Name | Summary |
---|---|
hash | Hash functor for use in hashing containers |
hash_i | Case-insensitive hash functor for use in hashing containers |
less_i | Case-insensitive less functor for use in sorted containers |
equal_i | Case-insensitive equal functor for use in containers |
operator+ | String concatenation operator |
operator== operator!= | String comparison operators |
operator"" _st | ST::string user-defined literal operator |
Name | Summary |
---|---|
ST_LITERAL | Efficient construction of ST::string literals |
ST_WHITESPACE | Default collection of characters to treat as whitespace for tokenize |
ST::string provides storage and manipulation tools for Unicode strings. The string data is stored internally as UTF-8 (in an ST::char_buffer object). This makes it easier to deal with streams and files already in UTF-8 encoding, but means that many Unicode characters may take up more than one code unit (byte) in the string object.
ST::string objects can be easily converted to/from a few other encodings, including UTF-16, UTF-32 (UCS4), and wchar_t arrays (assumed to be either UTF-16 or UTF-32 depending on the platform).
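Because the data is stored as UTF-8, the number of bytes and the number of Unicode characters can differ. The following is a minimal sketch of that distinction using plain std::string rather than ST::string. The helper count_code_points is hypothetical (not part of string_theory) and assumes its input is already valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: count Unicode code points in valid UTF-8 data by
// skipping continuation bytes (those of the form 10xxxxxx).
std::size_t count_code_points(const std::string &utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// Usage check: "naïve" has 5 characters but occupies 6 bytes, because
// 'ï' (U+00EF) is encoded as the two bytes C3 AF.
inline void check_count_code_points()
{
    const std::string text = std::string("na") + "\xC3\xAF" + "ve";
    assert(text.size() == 6);
    assert(count_code_points(text) == 5);
}
```

Methods such as size(), at() and find() are documented in terms of bytes, so positions should be read as byte offsets rather than character indices.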
With the exception of operator= and operator+= overloads and the set() method, ST::string objects are immutable. All operations which manipulate the string data (including operator= and operator+=) will create a new string buffer internally with a copy of the necessary data. This means that all ST::string members are re-entrant. Furthermore, the buffers returned by to_utf8() and c_str() are accessors to the string's internal storage, meaning they do not have to be stored externally in order to remain valid.
Although it's perfectly valid to create strings with normal C string literals, ST::string provides some helpers which skip the normal validation and checks when you know the input data is a string literal already encoded as valid UTF-8 bytes.
```cpp
// The following are equivalent, but the second line will generally be faster
ST::string greeting = "Hello";
ST::string greeting_2 = ST_LITERAL("Hello");

// If your compiler supports user-defined literals, the second greeting can
// also be written more concisely as
ST::string greeting_3 = "Hello"_st;
```
ST::string will by default check that its data is valid UTF-8. If it finds any input which it can't decode, it will either report the error or replace it with the Unicode replacement character (U+FFFD), depending on the conversion options. It is also possible to tell ST::string to skip its checks, if you know the input data is already valid UTF-8. However, passing invalid UTF-8 data into an ST::string with ST::assume_valid is undefined behavior, and may cause unexpected results and bugs.
Finally, it is possible to treat input as Latin-1 (ISO-8859-1) data, which always succeeds. However, passing UTF-8 data as Latin-1 may result in the individual UTF-8 bytes showing up as Latin-1 character sequences.
```cpp
const char *bad_input = "...";

// This will throw ST::unicode_error with a message about what was wrong if
// the conversion encounters any sequences it can't decode.
ST::string str1(bad_input, ST_AUTO_SIZE, ST::check_validity);

// This will replace any character sequences it can't decode with the Unicode
// replacement character (U+FFFD).
ST::string str3(bad_input, ST_AUTO_SIZE, ST::substitute_invalid);

// This will assume the input is already valid UTF-8. Passing invalid UTF-8
// data with ST::assume_valid is undefined behavior, and may have unexpected
// results and bugs.
ST::string str4(bad_input, ST_AUTO_SIZE, ST::assume_valid);

// Conversion always succeeds; treat data as Latin-1
ST::string str5 = ST::string::from_latin_1(bad_input);
```
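To illustrate why a Latin-1 conversion always succeeds, and what happens when UTF-8 bytes are passed in as Latin-1, here is a minimal sketch of a Latin-1 to UTF-8 re-encoding. It is written against std::string; latin1_to_utf8_sketch is a hypothetical stand-in and is not claimed to match how from_latin_1() is implemented.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of string_theory): re-encode Latin-1 bytes
// as UTF-8. Every Latin-1 byte maps directly to a code point in the range
// U+0000..U+00FF, so no input byte can be invalid.
std::string latin1_to_utf8_sketch(const std::string &latin1)
{
    std::string utf8;
    utf8.reserve(latin1.size());
    for (unsigned char byte : latin1) {
        if (byte < 0x80) {
            // ASCII range: one byte, unchanged.
            utf8.push_back(static_cast<char>(byte));
        } else {
            // U+0080..U+00FF: two-byte UTF-8 sequence.
            utf8.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }
    return utf8;
}

inline void check_latin1_sketch()
{
    // A single Latin-1 'é' (0xE9) becomes the UTF-8 sequence C3 A9.
    assert(latin1_to_utf8_sketch("\xE9") == "\xC3\xA9");
    // Feeding those UTF-8 bytes back in as Latin-1 double-encodes them,
    // so they would display as the two characters "Ã©".
    assert(latin1_to_utf8_sketch("\xC3\xA9") == "\xC3\x83\xC2\xA9");
}
```

The second check shows the failure mode described above: the conversion succeeds, but each UTF-8 byte is treated as its own Latin-1 character.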
```cpp
enum case_sensitivity_t
{
    case_sensitive,
    case_insensitive
};
```
Indicates the case sensitivity for various find and comparison operations.
- case_sensitive: Consider upper- and lower-case characters as different when doing string comparisons and searches.
- case_insensitive: Consider upper- and lower-case characters as equal when doing string comparisons and searches.
- Default constructor for strings. Creates an empty string.
- Shortcut constructor for empty strings. This is equivalent to the empty constructor (1).
- Construct a string from the first size bytes of the string pointed to by cstr. The data is expected to be encoded as UTF-8.
- Construct a string from the first size bytes of the UTF-8 string pointed to by cstr.
- Construct a string from the first size wide characters of the string pointed to by wstr. The data is expected to be encoded as either UTF-16 or UTF-32, depending on your platform's wchar_t support.
- Construct a string from the first size characters of the UTF-16 string pointed to by cstr.
- Construct a string from the first size characters of the UTF-32 string pointed to by cstr.
- Construct a string whose contents are a copy of copy.
- Move the contents of move into this string object.
- Construct a string from the contents of init. The data stored in init is expected to be encoded as UTF-8.
- Move the contents of init into this string's internal UTF-8 buffer. The data stored in init will still be checked according to validation, and is expected to be encoded as UTF-8.
- Construct a string from the wide character data provided in init. The data provided in init is expected to be encoded as either UTF-16 or UTF-32, depending on your platform's wchar_t support.
- Construct a string from the UTF-16 data provided in init.
- Construct a string from the UTF-32 data provided in init.
- Construct a string from the contents of init. The string data is expected to be encoded as UTF-8.
- Construct a string from the contents of init. The string data in init is expected to be encoded as either UTF-16 or UTF-32, depending on your platform's wchar_t support.
- Construct a string from the contents of the UTF-16 string in init.
- Construct a string from the contents of the UTF-32 string in init.
- Construct a string from the contents of the UTF-8 string in init.
- Construct a string from the string view captured by view. The string data is expected to be encoded as UTF-8.
- Construct a string from the string view captured by view. The string data in view is expected to be encoded as either UTF-16 or UTF-32, depending on your platform's wchar_t support.
- Construct a string from the UTF-16 string view captured by view.
- Construct a string from the UTF-32 string view captured by view.
- Construct a string from the UTF-8 string view captured by view.
- Construct a string from the filesystem path in init.
For the variants which take a size, if size is ST_AUTO_SIZE, the length of the input will be determined as if with strlen() or equivalent.
Changed in 1.1: Added std::string, std::wstring, and std::filesystem::path constructors.
Changed in 1.7: Added const char16_t *, const char32_t *, std::u16string, and std::u32string constructors.
Changed in 2.0: Added std::*string_view constructors.
Changed in 2.2: Added const char8_t * and std::u8string* overloads.
Changed in 3.4: Deprecated null_t overload.
See also operator=(), set(), from_utf8(), from_utf16(), from_utf32(), from_wchar(), from_std_string(), from_path()
Signature | |
---|---|
string after_first(char sep, ST::case_sensitivity_t cs = case_sensitive) const | (1) |
string after_first(const char *sep, ST::case_sensitivity_t cs = case_sensitive) const | (2) |
string after_first(const char8_t *sep, ST::case_sensitivity_t cs = case_sensitive) const | (3) |
string after_first(const string &sep, ST::case_sensitivity_t cs = case_sensitive) const | (4) |
Returns the part of this string after the first instance of sep found within the string. If sep is not found in the string, an empty string is returned.
Changed in 2.2: Added const char8_t * overload.
Signature | |
---|---|
string after_last(char sep, ST::case_sensitivity_t cs = case_sensitive) const | (1) |
string after_last(const char *sep, ST::case_sensitivity_t cs = case_sensitive) const | (2) |
string after_last(const char8_t *sep, ST::case_sensitivity_t cs = case_sensitive) const | (3) |
string after_last(const string &sep, ST::case_sensitivity_t cs = case_sensitive) const | (4) |
Returns the part of this string after the last instance of sep found within the string. If sep is not found in the string, the whole string is returned.
Changed in 2.2: Added const char8_t * overload.
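Note that after_first() and after_last() differ in what they return when sep is absent. The sketch below restates the documented behavior using std::string so it can be run without the library. The *_sketch functions are hypothetical stand-ins, cover only the case-sensitive string-separator form, and are not the library's implementation.

```cpp
#include <cassert>
#include <string>

// Stand-in for after_first(): an empty string when sep is absent.
std::string after_first_sketch(const std::string &s, const std::string &sep)
{
    const std::string::size_type pos = s.find(sep);
    return pos == std::string::npos ? std::string()
                                    : s.substr(pos + sep.size());
}

// Stand-in for after_last(): the whole string when sep is absent.
std::string after_last_sketch(const std::string &s, const std::string &sep)
{
    const std::string::size_type pos = s.rfind(sep);
    return pos == std::string::npos ? s : s.substr(pos + sep.size());
}

inline void check_after_sketches()
{
    assert(after_first_sketch("a/b/c", "/") == "b/c");
    assert(after_last_sketch("a/b/c", "/") == "c");
    // No separator present: the two functions disagree by design.
    assert(after_first_sketch("abc", "/") == "");
    assert(after_last_sketch("abc", "/") == "abc");
}
```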
Signature |
---|
const char &at(size_t position) const |
Returns a reference to the UTF-8 code unit (byte) at the specified position. Like ST::char_buffer::at(), this is bounds checked and may throw std::out_of_range if the provided position is outside the string's boundaries.
Since string_theory 2.0.
See also c_str(), operator[]()
Signature |
---|
const char &back() const noexcept |
Return a reference to the last character in the string. If the string is empty, this returns a reference to the terminating nul character.
Since string_theory 2.0.
Signature | |
---|---|
string before_first(char sep, ST::case_sensitivity_t cs = case_sensitive) const | (1) |
string before_first(const char *sep, ST::case_sensitivity_t cs = case_sensitive) const | (2) |
string before_first(const char8_t *sep, ST::case_sensitivity_t cs = case_sensitive) const | (3) |
string before_first(const string &sep, ST::case_sensitivity_t cs = case_sensitive) const | (4) |
Returns the part of this string before the first instance of sep found within the string. If sep is not found in the string, the whole string is returned.
Changed in 2.2: Added const char8_t * overload.
Signature | |
---|---|
string before_last(char sep, ST::case_sensitivity_t cs = case_sensitive) const | (1) |
string before_last(const char *sep, ST::case_sensitivity_t cs = case_sensitive) const | (2) |
string before_last(const char8_t *sep, ST::case_sensitivity_t cs = case_sensitive) const | (3) |
string before_last(const string &sep, ST::case_sensitivity_t cs = case_sensitive) const | (4) |
Returns the part of this string before the last instance of sep found within the string. If sep is not found in the string, an empty string is returned.
Changed in 2.2: Added const char8_t * overload.
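As with the after_* pair, before_first() and before_last() return different results when sep is absent. The following sketch mirrors the documented behavior with std::string. The *_sketch names are hypothetical, only the case-sensitive string-separator form is modeled, and this is not the library's implementation.

```cpp
#include <cassert>
#include <string>

// Stand-in for before_first(): the whole string when sep is absent.
std::string before_first_sketch(const std::string &s, const std::string &sep)
{
    const std::string::size_type pos = s.find(sep);
    return pos == std::string::npos ? s : s.substr(0, pos);
}

// Stand-in for before_last(): an empty string when sep is absent.
std::string before_last_sketch(const std::string &s, const std::string &sep)
{
    const std::string::size_type pos = s.rfind(sep);
    return pos == std::string::npos ? std::string() : s.substr(0, pos);
}

inline void check_before_sketches()
{
    assert(before_first_sketch("a/b/c", "/") == "a");
    assert(before_last_sketch("a/b/c", "/") == "a/b");
    // No separator present: the two functions disagree by design.
    assert(before_first_sketch("abc", "/") == "abc");
    assert(before_last_sketch("abc", "/") == "");
}
```

One way to remember the pairing: before_first() and after_last() fall back to the whole string, while after_first() and before_last() fall back to an empty string.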
Signature | |
---|---|
const_iterator begin() const noexcept | (1) |
const_iterator cbegin() const noexcept | (2) |
Return an iterator to the beginning of the string.
Since string_theory 2.0.
See also end()
Signature | |
---|---|
const char *c_str() const noexcept | (1) |
const char *c_str(const char *substitute) const noexcept | (2) |
Returns a pointer to the stored UTF-8 string data. This buffer should always be nul-terminated, so it's safe to use in functions which require C-style string buffers.
For variant #2, if this string is empty, the pointer provided in substitute will be returned instead.
Changed in 1.7: Added the variant without a substitute string (#1). Previously, the second form would default its parameter to "", which was less efficient than the first variant.
Signature |
---|
Note: This method is deprecated in string_theory 2.0. New code should switch to using either at() or operator[]().
Returns the UTF-8 code unit (byte) at the specified position. Note that this may return a byte in the middle of a UTF-8 multi-byte sequence! The position is not bounds-checked, so accessing positions outside the range [0, size()+1] will result in undefined behavior.
Deprecated in string_theory 2.0.
Removed in string_theory 3.0.
See also at(), operator[]()
Signature |
---|
void clear() noexcept |
Clear the string data, resetting this string to the empty state.
Since string_theory 3.4
See also empty()
Signature | |
---|---|
int compare(const string &str, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (1) |
int compare(const char *str, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (2) |
int compare(const char8_t *str, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (3) |
int compare_i(const string &str) const noexcept | (4) |
int compare_i(const char *str) const noexcept | (5) |
int compare_i(const char8_t *str) const noexcept | (6) |
Compare this string to str lexicographically, in a manner similar to strcmp. If this string is "less than" (or "before") str, this function returns a negative value. If this string is "greater than" str, this function returns a positive value. If this string and str are considered equal, this returns 0.
Set cs to case_insensitive in order to perform a case-insensitive comparison.
The compare_i() variants (4-6) are provided as convenience shortcuts for compare(str, case_insensitive).
Changed in 2.2: Added the const char8_t * overloads.
See also operator==(), operator<()
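The strcmp-style return convention can be sketched as follows. compare_i_sketch is a hypothetical helper written against std::string, and it assumes ASCII-only case folding via std::tolower in the default "C" locale. How the library folds non-ASCII characters is not described here, so that part is an assumption of the sketch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical case-insensitive comparison: negative if lhs sorts before
// rhs, positive if it sorts after, and 0 if the two are considered equal.
int compare_i_sketch(const std::string &lhs, const std::string &rhs)
{
    const std::size_t common = lhs.size() < rhs.size() ? lhs.size() : rhs.size();
    for (std::size_t i = 0; i < common; ++i) {
        const int l = std::tolower(static_cast<unsigned char>(lhs[i]));
        const int r = std::tolower(static_cast<unsigned char>(rhs[i]));
        if (l != r)
            return l < r ? -1 : 1;
    }
    // All shared bytes match: the shorter string sorts first.
    if (lhs.size() == rhs.size())
        return 0;
    return lhs.size() < rhs.size() ? -1 : 1;
}

inline void check_compare_i_sketch()
{
    assert(compare_i_sketch("Hello", "hello") == 0);
    assert(compare_i_sketch("apple", "Banana") < 0);
    assert(compare_i_sketch("abc", "ab") > 0);
}
```

Callers should test only the sign of the result (less than, equal to, or greater than zero), as with strcmp.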
Signature | |
---|---|
int compare_n(const string &str, size_t count, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (1) |
int compare_n(const char *str, size_t count, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (2) |
int compare_n(const char8_t *str, size_t count, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (3) |
int compare_ni(const string &str, size_t count) const noexcept | (4) |
int compare_ni(const char *str, size_t count) const noexcept | (5) |
int compare_ni(const char8_t *str, size_t count) const noexcept | (6) |
Compare up to the first count bytes of this string to str, in a manner similar to strncmp.
Set cs to case_insensitive in order to perform a case-insensitive comparison.
The compare_ni() variants (4-6) are provided as convenience shortcuts for compare_n(str, count, case_insensitive).
Changed in 2.2: Added the const char8_t * overloads.
See also compare()
Signature | |
---|---|
bool contains(char ch, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (1) |
bool contains(const char *substr, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (2) |
bool contains(const char8_t *substr, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (3) |
bool contains(const string &substr, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (4) |
bool contains(const char *substr, size_t count, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (5) |
bool contains(const char8_t *substr, size_t count, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (6) |
Returns true if ch (1) or substr (2-4) is contained anywhere in this string.
For variants 5 and 6, no more than count bytes are matched from the substr.
Changed in 2.2: Added const char8_t * overload.
Changed in 3.5: Added overloads 5-6 with an explicit count.
See also find()
Signature |
---|
const char *data() const noexcept |
Returns a pointer to the stored UTF-8 string data. This buffer should always be nul-terminated, so it's safe to use in functions which require C-style string buffers.
Since string_theory 3.5.
Signature | |
---|---|
bool empty() const noexcept | (1) |
(2) |
Returns true if this string is empty (i.e. its size is 0). Note that even for an empty string, the first character pointed to by c_str() can be accessed, and should be the nul character ('\0').
Changed in 2.0: Added empty() and deprecated is_empty().
Changed in 3.0: Removed is_empty().
Signature | |
---|---|
const_iterator end() const noexcept | (1) |
const_iterator cend() const noexcept | (2) |
Return an iterator to the end of the string.
Since string_theory 2.0.
See also begin()
Signature | |
---|---|
bool ends_with(const string &suffix, ST::case_sensitivity_t cs = case_sensitive) const | (1) |
bool ends_with(const char *suffix, ST::case_sensitivity_t cs = case_sensitive) const | (2) |
bool ends_with(const char8_t *suffix, ST::case_sensitivity_t cs = case_sensitive) const | (3) |
Return true if this string ends with suffix.
Changed in 2.0: Marked these functions noexcept.
Changed in 2.2: Added const char8_t * overload.
Signature |
---|
static string fill(size_t count, char c) |
Create a string which is pre-populated with count copies of the ASCII character in c.
Signature | |
---|---|
ST_ssize_t find(char ch, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (1) |
ST_ssize_t find(const char *substr, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (2) |
ST_ssize_t find(const char8_t *substr, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (3) |
ST_ssize_t find(const string &substr, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (4) |
ST_ssize_t find(size_t start, char ch, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (5) |
ST_ssize_t find(size_t start, const char *substr, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (6) |
ST_ssize_t find(size_t start, const char8_t *substr, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (7) |
ST_ssize_t find(size_t start, const string &substr, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (8) |
ST_ssize_t find(const char *substr, size_t count, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (9) |
ST_ssize_t find(const char8_t *substr, size_t count, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (10) |
ST_ssize_t find(size_t start, const char *substr, size_t count, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (11) |
ST_ssize_t find(size_t start, const char8_t *substr, size_t count, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (12) |
Find the first instance of ch (1, 5) or substr (2-4, 6-8) within the string, and return its byte position. If ch or substr isn't found, returns -1.
For variants 5-8, searching starts at byte position start.
For variants 9-12, no more than count bytes are matched from the substr.
Changed in 1.6: Added overloads 5, 6 and 8 to start at a specific position.
Changed in 2.2: Added const char8_t * overloads.
Changed in 3.5: Added overloads 9-12 with an explicit count.
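The signed return value and the start parameter can be illustrated with a small stand-in built on std::string::find. find_sketch is hypothetical, and std::ptrdiff_t is used here as an assumed equivalent of ST_ssize_t. Only the case-sensitive substring form is modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for find(): returns the byte position of the first
// match at or after `start`, or -1 when there is no match.
std::ptrdiff_t find_sketch(const std::string &s, const std::string &substr,
                           std::size_t start = 0)
{
    const std::string::size_type pos = s.find(substr, start);
    return pos == std::string::npos ? -1 : static_cast<std::ptrdiff_t>(pos);
}

inline void check_find_sketch()
{
    assert(find_sketch("hello world", "o") == 4);
    // Starting at byte 5 skips the first 'o' and finds the one in "world".
    assert(find_sketch("hello world", "o", 5) == 7);
    assert(find_sketch("hello", "z") == -1);
}
```

Because the result is signed, a check such as `pos >= 0` or `pos != -1` distinguishes a match from a miss, unlike the std::string::npos convention.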
Signature | |
---|---|
ST_ssize_t find_last(char ch, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (1) |
ST_ssize_t find_last(const char *substr, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (2) |
ST_ssize_t find_last(const char8_t *substr, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (3) |
ST_ssize_t find_last(const string &substr, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (4) |
ST_ssize_t find_last(size_t max, char ch, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (5) |
ST_ssize_t find_last(size_t max, const char *substr, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (6) |
ST_ssize_t find_last(size_t max, const char8_t *substr, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (7) |
ST_ssize_t find_last(size_t max, const string &substr, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (8) |
ST_ssize_t find_last(const char *substr, size_t count, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (9) |
ST_ssize_t find_last(const char8_t *substr, size_t count, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (10) |
ST_ssize_t find_last(size_t max, const char *substr, size_t count, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (11) |
ST_ssize_t find_last(size_t max, const char8_t *substr, size_t count, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (12) |
Find the last instance of ch (1, 5) or substr (2-4, 6-8) within the string, and return its byte position. If ch or substr isn't found, returns -1.
For variants 5-8, searching looks only within the first max bytes of the string.
For variants 9-12, no more than count bytes are matched from the substr.
Changed in 1.6: Added overloads 5, 6 and 8 to search within the first max bytes.
Changed in 2.2: Added const char8_t * overloads.
Changed in 3.5: Added overloads 9-12 with an explicit count.
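The effect of the max parameter can be sketched with std::string::rfind over a truncated window. find_last_sketch is a hypothetical stand-in, std::ptrdiff_t is an assumed equivalent of ST_ssize_t, and the sketch assumes a match must lie entirely within the first max bytes. That last point is an interpretation of the description above, not confirmed library behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for find_last(): search backwards within the first
// `max` bytes of `s`, returning the byte position of the match or -1.
std::ptrdiff_t find_last_sketch(const std::string &s, const std::string &substr,
                                std::size_t max = std::string::npos)
{
    // Restrict the search window, then look for the last occurrence in it.
    const std::string window = s.substr(0, max);
    const std::string::size_type pos = window.rfind(substr);
    return pos == std::string::npos ? -1 : static_cast<std::ptrdiff_t>(pos);
}

inline void check_find_last_sketch()
{
    assert(find_last_sketch("a.b.c", ".") == 3);
    // Only the first 3 bytes ("a.b") are searched, so the earlier '.' wins.
    assert(find_last_sketch("a.b.c", ".", 3) == 1);
    assert(find_last_sketch("abc", ".") == -1);
}
```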
Signature |
---|
static string from_bool(bool value) |
Creates the string literal "true" or "false", depending on value.
Signature | |
---|---|
static string from_float(float value, char format = 'g') | (1) |
static string from_float(double value, char format = 'g') | (2) |
static string from_double(double value, char format = 'g') | (3) |
Create a string representation of the floating-point number in value.
The format character has the same meaning as printf's floating point formats, and should be one of 'e', 'f' or 'g'.
Changed in 3.2: Added from_float(double, char) overload.
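Since the format character follows printf's conventions, its effect can be previewed with std::snprintf. The helper below is a hypothetical stand-in (from_double_sketch is not a library function) that maps each documented format character to the corresponding printf conversion. It is not claimed to produce byte-identical output to from_double() in every case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-in: format a double with printf's %e, %f or %g.
// Any other format character falls back to %g in this sketch.
std::string from_double_sketch(double value, char format = 'g')
{
    char buffer[512];
    int length = 0;
    switch (format) {
    case 'e': length = std::snprintf(buffer, sizeof(buffer), "%e", value); break;
    case 'f': length = std::snprintf(buffer, sizeof(buffer), "%f", value); break;
    default:  length = std::snprintf(buffer, sizeof(buffer), "%g", value); break;
    }
    // Treat encoding errors or truncation as failure in this sketch.
    if (length < 0 || static_cast<std::size_t>(length) >= sizeof(buffer))
        return std::string();
    return std::string(buffer, static_cast<std::size_t>(length));
}

inline void check_from_double_sketch()
{
    assert(from_double_sketch(1.5) == "1.5");            // 'g': shortest form
    assert(from_double_sketch(1.5, 'f') == "1.500000");  // 'f': fixed, 6 digits
    assert(from_double_sketch(1.5, 'e') == "1.500000e+00");  // 'e': scientific
}
```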
Signature | |
---|---|
static string from_int(short value, int base = 10, bool upper_case = false) | (1) |
static string from_int(int value, int base = 10, bool upper_case = false) | (2) |
static string from_int(long value, int base = 10, bool upper_case = false) | (3) |
static string from_int(long long value, int base = 10, bool upper_case = false) | (4) |
static string from_uint(unsigned short value, int base = 10, bool upper_case = false) | (5) |
static string from_uint(unsigned int value, int base = 10, bool upper_case = false) | (6) |
static string from_uint(unsigned long value, int base = 10, bool upper_case = false) | (7) |
static string from_uint(unsigned long long value, int base = 10, bool upper_case = false) | (8) |
Create a string representation of the integer in value.
Changed in 3.2: Added (unsigned) short, long, long long overloads.
Signature | |
---|---|
(1) | |
(2) |
Create a string representation of the 64-bit integer in value.
Deprecated in string_theory 4.0
Signature | |
---|---|
static string from_latin_1(const char *astr, size_t size = ST_AUTO_SIZE) | (1) |
static string from_latin_1(const ST::char_buffer &astr) | (2) |
- Construct a string from the first size bytes of the Latin-1 / ISO-8859-1 string data in astr. If size is ST_AUTO_SIZE, the length of the input will be determined with ::strlen().
- Construct a string from the Latin-1 / ISO-8859-1 string data in astr.
Signature | |
---|---|
static string from_std_string(const std::string &sstr, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) | (1) |
static string from_std_string(const std::wstring &wstr, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) | (2) |
static string from_std_wstring(const std::wstring &wstr, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) | (3) |
static string from_std_string(const std::u16string &ustr, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) | (4) |
static string from_std_string(const std::u32string &ustr, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) | (5) |
static string from_std_string(const std::u8string &ustr, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) | (6) |
static string from_std_string(const std::string_view &view, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) | (7) |
static string from_std_string(const std::wstring_view &view, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) | (8) |
static string from_std_wstring(const std::wstring_view &view, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) | (9) |
static string from_std_string(const std::u16string_view &view, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) | (10) |
static string from_std_string(const std::u32string_view &view, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) | (11) |
static string from_std_string(const std::u8string_view &view, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) | (12) |
Construct a string from a std::string or std::string_view. The string is expected to be encoded as the appropriate UTF encoding for the string type (see set() for details).
Since string_theory 1.1.
Changed in 1.7: Added std::u16string and std::u32string overloads.
Changed in 2.0: Added std::*string_view overloads.
Changed in 2.2: Added std::u8string* overloads.
See also (constructor), set()
Signature |
---|
static string from_path(const std::filesystem::path &path) |
Construct a string from a filesystem path, using the system's default encoding.
Since string_theory 1.1.
Signature | |
---|---|
static string from_utf8(const char *utf8, size_t size = ST_AUTO_SIZE, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) | (1) |
static string from_utf8(const char8_t *utf8, size_t size = ST_AUTO_SIZE, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) | (2) |
static string from_utf8(const ST::char_buffer &utf8, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) | (3) |
- Construct a string from the first size bytes of the UTF-8 string data in utf8. If size is ST_AUTO_SIZE, the length of the input will be determined as if with strlen().
- Construct a string from the first size bytes of the UTF-8 string data in utf8. If size is ST_AUTO_SIZE, the length of the input will be determined as if with strlen().
- Construct a string from the UTF-8 string data in utf8.
Changed in 2.2: Added const char8_t * overload.
Signature | |
---|---|
static string from_utf16(const char16_t *utf16, size_t size = ST_AUTO_SIZE, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) | (1) |
static string from_utf16(const ST::utf16_buffer &utf16, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) | (2) |
- Construct a string from the first size characters of the UTF-16 string data in utf16. If size is ST_AUTO_SIZE, the length of the input will be determined with the equivalent of strlen().
- Construct a string from the UTF-16 string data in utf16.
Signature | |
---|---|
static string from_utf32(const char32_t *utf32, size_t size = ST_AUTO_SIZE, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) | (1) |
static string from_utf32(const ST::utf32_buffer &utf32, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) | (2) |
- Construct a string from the first size characters of the UTF-32 string data in utf32. If size is ST_AUTO_SIZE, the length of the input will be determined with the equivalent of strlen().
- Construct a string from the UTF-32 string data in utf32.
Signature | |
---|---|
static string from_validated(const char *text, size_t size) | (1) |
static string from_validated(const char8_t *text, size_t size) | (2) |
static string from_validated(const ST::char_buffer &buffer) | (3) |
static string from_validated(ST::char_buffer &&buffer) | (4) |
- Construct a string from the validated UTF-8 data pointed to by text.
- Construct a string from the validated UTF-8 data pointed to by text.
- Construct a string from the validated UTF-8 buffer in buffer.
- Construct a string owning the validated UTF-8 buffer in buffer.
Since string_theory 2.2.
See also set_validated()
Signature | |
---|---|
static string from_wchar(const wchar_t *wstr, size_t size = ST_AUTO_SIZE, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) | (1) |
static string from_wchar(const ST::wchar_buffer &wstr, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) | (2) |
- Construct a string from the first size wide characters of the wide string data in wstr. If size is ST_AUTO_SIZE, the length of the input will be determined with wcslen().
- Construct a string from the wide character string data in wstr.
Note that the data is expected to be either UTF-16 or UTF-32 encoded, depending on your platform's wchar_t support.
Signature |
---|
const char &front() const noexcept |
Return a reference to the first character in the string. If the string is empty, this returns a reference to the terminating nul character.
Since string_theory 2.0.
Signature |
---|
string left(size_t size) const |
Convenience function to extract a substring from the left side of the string. This is equivalent to substr(0, size).
See also substr()
Signature | |
---|---|
(1) | |
string &operator=(const char *cstr) | (2) |
string &operator=(const char8_t *cstr) | (3) |
string &operator=(const wchar_t *wstr) | (4) |
string &operator=(const char16_t *cstr) | (5) |
string &operator=(const char32_t *cstr) | (6) |
string &operator=(const string &copy) | (7) |
string &operator=(string &&move) noexcept | (8) |
string &operator=(const ST::char_buffer &init) | (9) |
string &operator=(ST::char_buffer &&init) | (10) |
string &operator=(const ST::wchar_buffer &init) | (11) |
string &operator=(const ST::utf16_buffer &init) | (12) |
string &operator=(const ST::utf32_buffer &init) | (13) |
string &operator=(const std::string &init) | (14) |
string &operator=(const std::wstring &init) | (15) |
string &operator=(const std::u16string &init) | (16) |
string &operator=(const std::u32string &init) | (17) |
string &operator=(const std::u8string &init) | (18) |
string &operator=(const std::string_view &init) | (19) |
string &operator=(const std::wstring_view &init) | (20) |
string &operator=(const std::u16string_view &init) | (21) |
string &operator=(const std::u32string_view &init) | (22) |
string &operator=(const std::u8string_view &init) | (23) |
string &operator=(const std::filesystem::path &init) | (24) |
- Shortcut operator=() overload to reset the string to the empty string. Equivalent to calling clear().
- Set the string content to the contents of the string pointed to by cstr. This is equivalent to set(cstr).
- Set the string content to the contents of the string pointed to by cstr. This is equivalent to set(cstr).
- Set the string content from the wide string pointed to by wstr. This is equivalent to set(wstr).
- Set the string content from the UTF-16 string pointed to by cstr. This is equivalent to set(cstr).
- Set the string content from the UTF-32 string pointed to by cstr. This is equivalent to set(cstr).
- Set the string to the same value as copy.
- Move the contents of move into this string object.
- Set the string from the contents of init. The data stored in init is expected to be encoded as UTF-8.
- Move the contents of init into this string's internal UTF-8 buffer. The data stored in init will still be validated, and is expected to be encoded as UTF-8.
- Set the string content from the wide character data provided in init. The data provided in init is expected to be encoded as either UTF-16 or UTF-32, depending on your platform's wchar_t support.
- Set the string content from the UTF-16 data provided in init.
- Set the string content from the UTF-32 data provided in init.
- Set the string content from the string in init. The string data is expected to be encoded as UTF-8.
- Set the string content from the wide string in init. The string data is expected to be encoded as either UTF-16 or UTF-32, depending on your platform's wchar_t support.
- Set the string content from the UTF-16 string in init.
- Set the string content from the UTF-32 string in init.
- Set the string content from the UTF-8 string in init.
- Set the string content from the string view contained in init. The string data is expected to be encoded as UTF-8.
- Set the string content from the wide string view contained in init. The string data is expected to be encoded as either UTF-16 or UTF-32, depending on your platform's wchar_t support.
- Set the string content from the UTF-16 string view contained in init.
- Set the string content from the UTF-32 string view contained in init.
- Set the string content from the UTF-8 string view contained in init.
- Set the string content from the filesystem path in init.
Changed in 1.1: Added std::string, std::wstring, and std::filesystem::path overloads.
Changed in 1.7: Added const char16_t *, const char32_t *, std::u16string, and std::u32string overloads.
Changed in 2.0: Added std::*string_view overloads.
Changed in 2.2: Added const char8_t * and std::u8string* overloads.
Changed in 3.4: Deprecated null_t overload.
See also (constructor), set()
Signature | |
---|---|
string &operator+=(const char *cstr) | (1) |
string &operator+=(const wchar_t *wstr) | (2) |
string &operator+=(const char16_t *cstr) | (3) |
string &operator+=(const char32_t *cstr) | (4) |
string &operator+=(const char8_t *cstr) | (5) |
string &operator+=(const string &other) | (6) |
string &operator+=(char ch) | (7) |
string &operator+=(wchar_t ch) | (8) |
string &operator+=(char16_t ch) | (9) |
string &operator+=(char32_t ch) | (10) |
- Append the contents of cstr to the end of this string. The input is expected to be encoded as UTF-8.
- Append the contents of wstr to the end of this string. The input is converted to UTF-8 in the same manner as from_wchar().
- Append the contents of cstr to the end of this string. The input is converted to UTF-8 in the same manner as from_utf16().
- Append the contents of cstr to the end of this string. The input is converted to UTF-8 in the same manner as from_utf32().
- Append the contents of cstr to the end of this string.
- Append the contents of other to the end of this string.
- Append the ASCII character ch to the end of this string.
- Append the wide character ch to the end of this string.
- Append the Unicode character ch to the end of this string.
- Append the Unicode character ch to the end of this string.
Changed in 1.6: Added overloads 7-10 to append individual characters.
Changed in 1.7: Added const char16_t * and const char32_t * overloads.
Changed in 2.2: Added const char8_t * and std::u8string* overloads.
Signature | |
---|---|
(1) | |
bool operator==(const string &other) const noexcept | (2) |
bool operator==(const char *other) const noexcept | (3) |
bool operator==(const char8_t *other) const noexcept | (4) |
(5) | |
bool operator!=(const string &other) const noexcept | (6) |
bool operator!=(const char *other) const noexcept | (7) |
bool operator!=(const char8_t *other) const noexcept | (8) |
bool operator<(const string &other) const noexcept | (9) |
- Returns empty().
- Convenience operator. This is equivalent to checking compare(other, ST::case_sensitive) == 0.
- Convenience operator. This is equivalent to checking compare(other, ST::case_sensitive) == 0.
- Convenience operator. This is equivalent to checking compare(other, ST::case_sensitive) == 0.
- Returns !empty().
- Convenience operator. This is equivalent to checking compare(other, ST::case_sensitive) != 0.
- Convenience operator. This is equivalent to checking compare(other, ST::case_sensitive) != 0.
- Convenience operator. This is equivalent to checking compare(other, ST::case_sensitive) != 0.
- Convenience operator. This is provided to work with std::less for STL-style containers.
For more control over string comparisons, see the compare() family of functions.
Changed in 2.2: Added const char8_t * overloads.
Changed in 3.4: Deprecated null_t overloads.
See also compare(), struct less_i
Signature |
---|
const char &operator[](size_t position) const noexcept |
Returns a reference to the UTF-8 code unit (byte) at the specified position. Like ST::char_buffer::operator[](), this is not bounds checked, so accessing a position beyond the string's terminating nul character results in undefined behavior.
Since string_theory 2.0.
Signature |
---|
This operator overload allows implicit conversion of an ST::string to a std::string_view covering the entire contents of the string.
Since string_theory 2.0.
Removed in string_theory 3.0.
Signature | |
---|---|
const_reverse_iterator rbegin() const noexcept | (1) |
const_reverse_iterator crbegin() const noexcept | (2) |
Return a reverse iterator to the reverse-start of the string.
Since string_theory 2.0.
See also rend()
Signature | |
---|---|
const_reverse_iterator rend() const noexcept | (1) |
const_reverse_iterator crend() const noexcept | (2) |
Return a reverse iterator to the reverse-end of the string.
Since string_theory 2.0.
See also rbegin()
Signature | |
---|---|
string replace(const char *from, const char *to, ST::case_sensitivity_t cs = case_sensitive, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) const | (1) |
string replace(const string &from, const char *to, ST::case_sensitivity_t cs = case_sensitive, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) const | (2) |
string replace(const char *from, const string &to, ST::case_sensitivity_t cs = case_sensitive, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) const | (3) |
(4) | |
string replace(const string &from, const string &to, ST::case_sensitivity_t cs = case_sensitive) const | (5) |
string replace(const char8_t *from, const char8_t *to, ST::case_sensitivity_t cs = case_sensitive, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) const | (6) |
string replace(const string &from, const char8_t *to, ST::case_sensitivity_t cs = case_sensitive, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) const | (7) |
string replace(const char8_t *from, const string &to, ST::case_sensitivity_t cs = case_sensitive, ST::utf_validation_t validation = ST_DEFAULT_VALIDATION) const | (8) |
Return a string which has all instances of the string from replaced with the string in to.
Changed in 2.2: Added const char8_t * overloads.
Changed in 3.0: Added overload #5 and deprecated #4.
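The replace-all behavior described above can be modeled on a plain std::string as follows. This is only a sketch of the semantics (case-sensitive, non-overlapping matches scanned left to right), not the library's implementation. The name replace_all and the handling of an empty from string are assumptions made for the example.

```cpp
#include <cstddef>
#include <string>

// Return a copy of `text` with every occurrence of `from` replaced by `to`.
// Case-sensitive only. An empty `from` returns the input unchanged, which is
// an assumption of this sketch rather than documented library behavior.
inline std::string replace_all(const std::string &text, const std::string &from,
                               const std::string &to)
{
    if (from.empty())
        return text;

    std::string result;
    std::size_t pos = 0;
    for (;;) {
        const std::size_t hit = text.find(from, pos);
        if (hit == std::string::npos)
            break;
        result.append(text, pos, hit - pos);  // copy the text before the match
        result += to;                         // substitute the match
        pos = hit + from.size();              // continue after the match
    }
    result.append(text, pos, std::string::npos);  // copy the unmatched tail
    return result;
}
```

Like ST::string::replace(), the function returns a new string and leaves its input untouched.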
Signature |
---|
string right(size_t size) const |
Convenience function to extract a substring from the right side of the string. This is equivalent to substr(-size).
See also substr()
- Reset the string to the empty string. Equivalent to calling clear().
- Set the string content to the first size bytes of the string pointed to by cstr. The data pointed to by cstr is expected to be encoded as UTF-8.
- Set the string content from the first size wide characters of the string pointed to by wstr. The data pointed to by wstr is expected to be encoded as either UTF-16 or UTF-32, depending on your platform's wchar_t support.
- Set the string content from the first size characters of the UTF-16 string pointed to by cstr.
- Set the string content from the first size characters of the UTF-32 string pointed to by cstr.
- Set the string content from the first size bytes of the UTF-8 string pointed to by cstr.
- Set the string to the same value as copy.
- Move the contents of move into this string object.
- Set the string from the contents of init. The data stored in init is expected to be encoded as UTF-8.
- Move the contents of init into this string's internal UTF-8 buffer. The data stored in init will still be checked according to validation, and is expected to be encoded as UTF-8.
- Set the string content from the wide character data provided in init. The data provided in init is expected to be encoded as either UTF-16 or UTF-32, depending on your platform's wchar_t support.
- Set the string content from the UTF-16 data provided in init.
- Set the string content from the UTF-32 data provided in init.
- Set the string content from the string provided in init. The string is expected to be encoded as UTF-8.
- Set the string content from the wide string data provided in init. The string is expected to be encoded as either UTF-16 or UTF-32, depending on your platform's wchar_t support.
- Set the string content from the UTF-16 string provided in init.
- Set the string content from the UTF-32 string provided in init.
- Set the string content from the UTF-8 string provided in init.
- Set the string content from the string view captured in view. The string is expected to be encoded as UTF-8.
- Set the string content from the wide string view captured in view. The string is expected to be encoded as either UTF-16 or UTF-32, depending on your platform's wchar_t support.
- Set the string content from the UTF-16 string view captured in view.
- Set the string content from the UTF-32 string view captured in view.
- Set the string content from the UTF-8 string view captured in view.
- Set the string content from the filesystem path in init.
For the variants which take a size, if size is ST_AUTO_SIZE, the length of the input will be determined as if with strlen() or equivalent.
Changed in 1.1: Added std::string, std::wstring, and std::filesystem::path overloads.
Changed in 1.7: Added const char16_t *, const char32_t *, std::u16string, and std::u32string overloads.
Changed in 2.0: Added std::*string_view overloads.
Changed in 2.2: Added const char8_t * and std::u8string* overloads.
Changed in 3.4: Deprecated null_t overload.
See also (constructor), set_validated(), operator=(), from_utf8(), from_utf16(), from_utf32(), from_std_string(), from_path()
Signature | |
---|---|
void set_validated(const char *text, size_t size) | (1) |
void set_validated(const char8_t *text, size_t size) | (2) |
void set_validated(const ST::char_buffer &buffer) | (3) |
void set_validated(ST::char_buffer &&buffer) | (4) |
- Set the string content from the validated UTF-8 data pointed to by text.
- Set the string content from the validated UTF-8 data pointed to by text.
- Set the string content from the validated UTF-8 buffer in buffer.
- Move the validated UTF-8 buffer in buffer into the string's internal buffer.
Since string_theory 2.2.
See also set(), from_validated()
Signature |
---|
size_t size() const noexcept |
Returns the size (in bytes) of the string data, not including the nul-terminator.
See also ST::buffer::size()
Signature | |
---|---|
std::vector<string> split(char split_char, size_t max_splits = ST_AUTO_SIZE, ST::case_sensitivity_t cs = case_sensitive) const | (1) |
std::vector<string> split(const char *splitter, size_t max_splits = ST_AUTO_SIZE, ST::case_sensitivity_t cs = case_sensitive) const | (2) |
std::vector<string> split(const string &splitter, size_t max_splits = ST_AUTO_SIZE, ST::case_sensitivity_t cs = case_sensitive) const | (3) |
std::vector<string> split(const char8_t *splitter, size_t max_splits = ST_AUTO_SIZE, ST::case_sensitivity_t cs = case_sensitive) const | (4) |
Split the string into pieces separated by split_char or splitter. If max_splits is not ST_AUTO_SIZE and there are more than max_splits separators in the string, the extras will be preserved in the final element of the returned vector. Specifically, the maximum size of the returned vector is max_splits + 1 elements.
Note that adjacent separators are treated individually: two instances of split_char or splitter next to each other will result in an empty string in the result. If this string is empty, a vector with a single empty string element will be returned.
Changed in 2.2: Added const char8_t * overload.
See also tokenize()
Signature |
---|
std::vector<string> tokenize(const char *delims = ST_WHITESPACE) const |
Split the string into pieces separated by any of the characters in delims. Any sequence of adjacent delimiters will be treated as a single separator, meaning that no elements of the returned vector will be empty. If this string is empty, an empty vector will be returned.
See also split()
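The difference between split() and tokenize() is easiest to see side by side. The following sketch models the documented rules on std::string using only the standard library. The names split_like, tokenize_like, and kNoLimit are hypothetical, the single-character separator is a simplification, and the real implementations may differ in detail.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for ST_AUTO_SIZE, meaning "no limit on splits".
constexpr std::size_t kNoLimit = static_cast<std::size_t>(-1);

// split(): every separator counts, so adjacent separators yield empty pieces.
inline std::vector<std::string> split_like(const std::string &text, char sep,
                                           std::size_t max_splits = kNoLimit)
{
    std::vector<std::string> parts;
    std::size_t pos = 0;
    while (max_splits == kNoLimit || parts.size() < max_splits) {
        const std::size_t hit = text.find(sep, pos);
        if (hit == std::string::npos)
            break;
        parts.push_back(text.substr(pos, hit - pos));
        pos = hit + 1;
    }
    // The remainder (including any separators past max_splits) is the last piece.
    parts.push_back(text.substr(pos));
    return parts;
}

// tokenize(): runs of delimiters collapse, so no piece is ever empty.
inline std::vector<std::string> tokenize_like(const std::string &text,
                                              const std::string &delims = " \t\r\n")
{
    std::vector<std::string> parts;
    std::size_t pos = text.find_first_not_of(delims);
    while (pos != std::string::npos) {
        const std::size_t end = text.find_first_of(delims, pos);
        parts.push_back(text.substr(pos, end - pos));
        pos = text.find_first_not_of(delims, end);
    }
    return parts;
}
```

In this model, splitting "a,,b" on ',' yields three pieces with an empty one in the middle, while tokenizing the same text with "," as the delimiter set yields only "a" and "b".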
Signature | |
---|---|
bool starts_with(const string &prefix, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (1) |
bool starts_with(const char *prefix, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (2) |
bool starts_with(const char8_t *prefix, ST::case_sensitivity_t cs = case_sensitive) const noexcept | (3) |
Return true if this string starts with prefix. Equivalent to compare_n(prefix, prefix.size(), cs) == 0.
Changed in 2.0: Marked these functions noexcept.
Changed in 2.2: Added const char8_t * overload.
See also compare_n()
Signature |
---|
string substr(ST_ssize_t start, size_t size = ST_AUTO_SIZE) const |
Return a string whose contents are a copy of at most size bytes from this string, starting at position start. If size is ST_AUTO_SIZE or start + size is greater than the size of the string, this will return the remainder of the string from the starting position. If start is negative, then the starting position is relative to the end of the string.
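The start and size rules above, together with the left() and right() equivalences, can be modeled on std::string as in the sketch below. The names substr_like, left_like, right_like, and kAutoSize are hypothetical. Clamping a negative start that reaches before the beginning, and returning an empty string for a start past the end, are assumptions of this sketch that the text above does not specify.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for ST_AUTO_SIZE, meaning "to the end of the string".
constexpr std::size_t kAutoSize = static_cast<std::size_t>(-1);

// Model of the documented substr() rules on a std::string.
inline std::string substr_like(const std::string &text, std::ptrdiff_t start,
                               std::size_t size = kAutoSize)
{
    const std::ptrdiff_t length = static_cast<std::ptrdiff_t>(text.size());
    if (start < 0) {
        // Negative start counts back from the end (clamped to 0: an assumption).
        start = (start + length < 0) ? 0 : start + length;
    }
    if (start >= length)
        return std::string();  // assumption: nothing to copy past the end
    // std::string::substr already clamps an over-long size to the remainder.
    return text.substr(static_cast<std::size_t>(start), size);
}

// left(size) is documented as substr(0, size).
inline std::string left_like(const std::string &text, std::size_t size)
{
    return substr_like(text, 0, size);
}

// right(size) is documented as substr(-size).
inline std::string right_like(const std::string &text, std::size_t size)
{
    return substr_like(text, -static_cast<std::ptrdiff_t>(size));
}
```

Note that positions and sizes are byte counts, so with real UTF-8 data a cut can land inside a multi-byte sequence.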
Signature | |
---|---|
bool to_bool() const noexcept | (1) |
bool to_bool(ST::conversion_result &result) const noexcept | (2) |
Convert the string to a boolean. If the string is either "true" or "false" (case insensitive), those values are converted to the respective boolean values. Otherwise, this behaves like to_int(), where a non-zero result is treated as true.
Note that variant #1 has no way of reporting errors. An empty string will return false, as will any string which cannot be converted based on the rules described above. For variant #2, the result of the conversion is stored in result.
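A minimal model of the rules above, using std::strtol as a stand-in for to_int(), might look like this. The name to_bool_like is hypothetical, and the sketch covers only variant #1 (no error reporting).

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Model of the documented to_bool() rules on a std::string.
inline bool to_bool_like(const std::string &text)
{
    // Lower-case a copy so "true"/"false" match case-insensitively.
    std::string lowered;
    for (unsigned char ch : text)
        lowered += static_cast<char>(std::tolower(ch));

    if (lowered == "true")
        return true;
    if (lowered == "false")
        return false;

    // Otherwise fall back to integer parsing: any non-zero value is true.
    // Unparseable or empty input yields 0, and therefore false.
    return std::strtol(text.c_str(), nullptr, 0) != 0;
}
```

Because unparseable input and a literal "0" both produce false here, callers who need to tell them apart would use the variant that takes a conversion result.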
Signature | |
---|---|
(1) | |
void to_buffer(ST::char_buffer &result, bool utf8 = true, bool substitute_out_of_range = true) const | (2) |
void to_buffer(ST::wchar_buffer &result) const | (3) |
void to_buffer(ST::utf16_buffer &result) const | (4) |
void to_buffer(ST::utf32_buffer &result) const | (5) |
Conversion helpers for easier use in templates:
- Deprecated alias for #2.
- Convert the string content to a char_buffer. If utf8 is true, the buffer will be a copy of the string's internal UTF-8 buffer; otherwise, it will be converted to Latin-1.
- Convert the string content to a wchar_buffer. The string will be encoded as either UTF-16 or UTF-32 depending on your platform's wchar_t support. This is equivalent to result = to_wchar().
- Convert the string content to a ST::utf16_buffer. This is equivalent to result = to_utf16().
- Convert the string content to a ST::utf32_buffer. This is equivalent to result = to_utf32().
Since string_theory 2.4.
Changed in 3.0: Deprecated the char_buffer overload taking a utf_validation_t and added the bool overload as a replacement.
See also to_utf8(), to_latin_1(), to_wchar(), to_utf16(), to_utf32()
Signature | |
---|---|
float to_float() const noexcept | (1) |
float to_float(ST::conversion_result &result) const noexcept | (2) |
double to_double() const noexcept | (3) |
double to_double(ST::conversion_result &result) const noexcept | (4) |
Convert the string to a floating-point number, in a manner similar to strtod.
Note that variants #1 and #3 have no way of reporting errors. An empty string will return 0, and a string with other characters not parseable as a number will get partially converted, up to the first invalid character. For variants #2 and #4, the result of the conversion is stored in result.
Signature | |
---|---|
short to_short(int base = 0) const noexcept | (1) |
short to_short(ST::conversion_result &result, int base = 0) const noexcept | (2) |
int to_int(int base = 0) const noexcept | (3) |
int to_int(ST::conversion_result &result, int base = 0) const noexcept | (4) |
long to_long(int base = 0) const noexcept | (5) |
long to_long(ST::conversion_result &result, int base = 0) const noexcept | (6) |
long long to_long_long(int base = 0) const noexcept | (7) |
long long to_long_long(ST::conversion_result &result, int base = 0) const noexcept | (8) |
unsigned short to_ushort(int base = 0) const noexcept | (9) |
unsigned short to_ushort(ST::conversion_result &result, int base = 0) const noexcept | (10) |
unsigned int to_uint(int base = 0) const noexcept | (11) |
unsigned int to_uint(ST::conversion_result &result, int base = 0) const noexcept | (12) |
unsigned long to_ulong(int base = 0) const noexcept | (13) |
unsigned long to_ulong(ST::conversion_result &result, int base = 0) const noexcept | (14) |
unsigned long long to_ulong_long(int base = 0) const noexcept | (15) |
unsigned long long to_ulong_long(ST::conversion_result &result, int base = 0) const noexcept | (16) |
Convert the string to an integer. If base is 0, this function will try to guess the base in a similar manner to strtol or strtoul.
Note that variants without an ST::conversion_result & parameter have no way of reporting errors. An empty string will return 0, and a string with other characters not in the specified base will get partially converted, up to the first invalid character. For variants with a conversion_result parameter, the result of the conversion is stored in result.
Changed in 3.2: Added (unsigned) short, long, long long conversions.
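Since the conversions are described in terms of strtol, the base guessing and partial-conversion behavior can be demonstrated with std::strtol directly. The parse_result struct and parse_long function below are hypothetical illustrations. They do not reproduce the interface of ST::conversion_result, which is not described in this section.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical result record for this sketch only.
struct parse_result
{
    long value;       // converted value (0 if nothing could be parsed)
    bool any_digits;  // true if at least one character was consumed
    bool full_match;  // true if the entire input was consumed
};

// Parse `text` the way strtol does: base 0 guesses from a 0x or 0 prefix,
// and conversion stops at the first character invalid for the base.
inline parse_result parse_long(const std::string &text, int base = 0)
{
    char *end = nullptr;
    const long value = std::strtol(text.c_str(), &end, base);
    const bool any = (end != text.c_str());
    return { value, any, any && *end == '\0' };
}
```

For example, parse_long("42abc") converts the leading 42 and reports that the match was not complete, which is the information the result-less variants cannot convey.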
Signature | |
---|---|
(1) | |
(2) | |
(3) | |
(4) |
Convert the string to a 64-bit integer. If base is 0, this function will try to guess the base in a similar manner to strtoll or strtoull.
Note that variants without an ST::conversion_result & parameter have no way of reporting errors. An empty string will return 0, and a string with other characters not in the specified base will get partially converted, up to the first invalid character. For variants with a conversion_result parameter, the result of the conversion is stored in result.
Deprecated in string_theory 4.0.
Signature | |
---|---|
(1) | |
ST::char_buffer to_latin_1(bool substitute_out_of_range = true) const | (2) |
Convert the string content to Latin-1 / ISO-8859-1. Any characters outside of the Latin-1 range will be replaced by ? if substitute_out_of_range is true; otherwise, an ST::unicode_error is thrown.
Changed in 3.0: Deprecated the overload taking utf_validation_t and added the bool overload as a replacement.
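The substitution rule can be sketched over already-decoded code points as shown below. The name to_latin1_like is hypothetical, the input is a std::u32string rather than an ST::string, and std::range_error is used purely as a stand-in for ST::unicode_error.

```cpp
#include <stdexcept>
#include <string>

// Map Unicode code points to Latin-1 bytes. Code points above U+00FF are
// either replaced with '?' or rejected, depending on the flag.
inline std::string to_latin1_like(const std::u32string &code_points,
                                  bool substitute_out_of_range = true)
{
    std::string result;
    result.reserve(code_points.size());
    for (char32_t cp : code_points) {
        if (cp <= 0xFF) {
            // Latin-1 maps code points 0x00-0xFF directly to single bytes.
            result += static_cast<char>(static_cast<unsigned char>(cp));
        } else if (substitute_out_of_range) {
            result += '?';
        } else {
            // Stand-in for ST::unicode_error in this sketch.
            throw std::range_error("code point outside the Latin-1 range");
        }
    }
    return result;
}
```

In this model, U+00E9 (é) becomes the single byte 0xE9, while U+20AC (the euro sign) is either replaced or rejected.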
Signature |
---|
string to_lower() const |
Returns a copy of this string with all 7-bit ASCII characters converted to lower-case.
Signature | |
---|---|
(1) | |
std::string to_std_string(bool utf8 = true, bool substitute_out_of_range = true) const | (2) |
std::wstring to_std_wstring() const | (3) |
std::u16string to_std_u16string() const | (4) |
std::u32string to_std_u32string() const | (5) |
std::u8string to_std_u8string() const | (6) |
(7) | |
void to_std_string(std::string &result, bool utf8 = true, bool substitute_out_of_range = true) const | (8) |
void to_std_string(std::wstring &result) const | (9) |
void to_std_string(std::u16string &result) const | (10) |
void to_std_string(std::u32string &result) const | (11) |
void to_std_string(std::u8string &result) const | (12) |
- Deprecated alias for #2.
- Convert the string content to a std::string. If utf8 is true, the data will be converted as UTF-8; otherwise, it will be converted as Latin-1.
- Convert the string content to a std::wstring. The string will be encoded as either UTF-16 or UTF-32 depending on your platform's wchar_t support.
- Convert the string content to a UTF-16 encoded std::u16string.
- Convert the string content to a UTF-32 encoded std::u32string.
- Convert the string content to a UTF-8 encoded std::u8string.
. - Deprecated alias for #8.
- Convenience wrapper for #2 for easier use in templates.
- Convenience wrapper for #3 for easier use in templates.
- Convenience wrapper for #4 for easier use in templates.
- Convenience wrapper for #5 for easier use in templates.
- Convenience wrapper for #6 for easier use in templates.
Since string_theory 1.1.
Changed in 1.7: Added std::u16string and std::u32string variants.
Changed in 2.0: Added variants taking an output reference for easier use within templates and other generic code.
Changed in 2.2: Added std::u8string variants.
Changed in 3.0: Deprecated std::string overloads taking a utf_validation_t and added the bool overloads as a replacement.
Signature |
---|
std::filesystem::path to_path() const |
Convert the string to a filesystem path using the system's default path encoding.
Since string_theory 1.1.
Signature |
---|
string to_upper() const |
Returns a copy of this string with all 7-bit ASCII characters converted to upper-case.
Signature |
---|
ST::char_buffer to_utf8() const noexcept |
Return a copy of the internal UTF-8 string data buffer.
Signature |
---|
ST::utf16_buffer to_utf16() const |
Convert the string content to UTF-16.
Signature |
---|
ST::utf32_buffer to_utf32() const |
Convert the string content to UTF-32.
Signature |
---|
ST::wchar_buffer to_wchar() const |
Convert the string content to a wchar_t buffer. The buffer is encoded either as UTF-16 or UTF-32, depending on your platform's wchar_t support.
Signature |
---|
string trim(const char *charset = ST_WHITESPACE) const |
Return a string which has any characters found in charset removed from both the left and right sides of the string.
See also trim_left(), trim_right()
Signature |
---|
string trim_left(const char *charset = ST_WHITESPACE) const |
Return a string which has any characters found in charset removed from the left side of the string.
See also trim()
Signature |
---|
string trim_right(const char *charset = ST_WHITESPACE) const |
Return a string which has any characters found in charset removed from the right side of the string.
See also trim()
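The three trimming functions can be modeled with the standard find_first_not_of and find_last_not_of searches, as in the sketch below. The *_like names are hypothetical, and the default character set mirrors the ST_WHITESPACE definition shown on this page.

```cpp
#include <cstddef>
#include <string>

// Remove leading characters that appear in `charset`.
inline std::string trim_left_like(const std::string &text,
                                  const std::string &charset = " \t\r\n")
{
    const std::size_t first = text.find_first_not_of(charset);
    return first == std::string::npos ? std::string() : text.substr(first);
}

// Remove trailing characters that appear in `charset`.
inline std::string trim_right_like(const std::string &text,
                                   const std::string &charset = " \t\r\n")
{
    const std::size_t last = text.find_last_not_of(charset);
    return last == std::string::npos ? std::string() : text.substr(0, last + 1);
}

// Remove characters in `charset` from both ends.
inline std::string trim_like(const std::string &text,
                             const std::string &charset = " \t\r\n")
{
    return trim_right_like(trim_left_like(text, charset), charset);
}
```

Characters from the set that appear in the middle of the string are never removed; only the two ends are affected.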
Signature | |
---|---|
const char8_t *u8_str() const noexcept | (1) |
const char8_t *u8_str(const char8_t *substitute) const noexcept | (2) |
Returns a pointer to the stored UTF-8 string data. This buffer should always be nul-terminated, so it's safe to use in functions which require C-style string buffers.
For variant #2, if this string is empty, the pointer provided in substitute will be returned instead.
Since string_theory 2.2.
Signature |
---|
std::string_view view(size_t start = 0, size_t length = ST_AUTO_SIZE) const |
Return a std::string_view into part or all of the string's content.
Since string_theory 2.0.
struct equal_i
{
bool operator()(const string &left, const string &right) const noexcept;
};
Functor object which returns true for case-insensitive string comparisons where the left string is equal to the right string. This is designed for STL-style containers which need case-insensitive key equality.
See also struct hash_i, struct less_i, operator==()
struct hash
{
size_t operator()(const string &str) const noexcept;
};
namespace std
{
template <>
struct hash<ST::string>
{
size_t operator()(const ST::string &str) const noexcept;
};
}
Functor object which returns a reasonable hash for the provided string. This is designed for STL-style containers which use hashing for indexes, e.g. std::unordered_map.
The std::hash specialization is also provided so that ST::string objects can be used in hash containers without explicitly specifying the ST::hash object for hashing.
Changed in 1.7: Added std::hash specialization.
See also struct hash_i
struct hash_i
{
size_t operator()(const string &str) const noexcept;
};
Functor object which returns a reasonable case-insensitive hash for the provided string. This is designed for STL-style containers which use hashing for indexes, e.g. std::unordered_map.
See also struct hash, struct less_i, struct equal_i
struct less_i
{
bool operator()(const string &left, const string &right) const noexcept;
};
Functor object which returns true for case-insensitive string comparisons where the left string is less than the right string. This is designed for STL-style containers which need case-insensitive ordering.
See also struct hash_i, struct equal_i, operator<()
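To show how functors of this kind plug into standard containers, the sketch below defines std::string analogues named hash_ci, equal_ci, and less_ci. These names and the ascii_lower helper are hypothetical. ASCII-only case folding is an assumption made for the example and is not a statement about how ST::hash_i, ST::equal_i, or ST::less_i fold case.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <unordered_map>

// Lower-case a copy of the string (ASCII-only in the default "C" locale).
inline std::string ascii_lower(const std::string &text)
{
    std::string out = text;
    for (char &ch : out)
        ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    return out;
}

// Strings that compare equal under equal_ci must hash to the same value,
// so the hash is computed on the case-folded form as well.
struct hash_ci
{
    std::size_t operator()(const std::string &str) const
    {
        return std::hash<std::string>()(ascii_lower(str));
    }
};

struct equal_ci
{
    bool operator()(const std::string &left, const std::string &right) const
    {
        return ascii_lower(left) == ascii_lower(right);
    }
};

struct less_ci
{
    bool operator()(const std::string &left, const std::string &right) const
    {
        return ascii_lower(left) < ascii_lower(right);
    }
};

// Hash containers take the hash and equality functors; ordered ones take less.
using ci_unordered_map = std::unordered_map<std::string, int, hash_ci, equal_ci>;
using ci_map = std::map<std::string, int, less_ci>;
```

With string_theory itself, the corresponding functors would be passed in the same template parameter positions, with hash_i and equal_i used together for unordered containers and less_i for ordered ones.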
Signature | |
---|---|
string operator+(const string &left, const string &right) | (1) |
string operator+(const string &left, const char *right) | (2) |
string operator+(const char *left, const string &right) | (3) |
string operator+(const string &left, const wchar_t *right) | (4) |
string operator+(const wchar_t *left, const string &right) | (5) |
string operator+(const string &left, const char16_t *right) | (6) |
string operator+(const char16_t *left, const string &right) | (7) |
string operator+(const string &left, const char32_t *right) | (8) |
string operator+(const char32_t *left, const string &right) | (9) |
string operator+(const string &left, const char8_t *right) | (10) |
string operator+(const char8_t *left, const string &right) | (11) |
string operator+(const string &left, char right) | (12) |
string operator+(const string &left, wchar_t right) | (13) |
string operator+(const string &left, char16_t right) | (14) |
string operator+(const string &left, char32_t right) | (15) |
string operator+(char left, const string &right) | (16) |
string operator+(wchar_t left, const string &right) | (17) |
string operator+(char16_t left, const string &right) | (18) |
string operator+(char32_t left, const string &right) | (19) |
Returns a string which is the concatenation of left and right.
Changed in 1.6: Added overloads 12-19 to concatenate individual characters.
Changed in 1.7: Added const char16_t * and const char32_t * overloads.
Changed in 2.2: Added const char8_t * overloads.
See also operator+=()
Signature | |
---|---|
(1) | |
(2) |
- Returns right.empty().
- Returns !right.empty().
Changed in 3.4: Deprecated null_t overloads.
See also operator==()
using namespace ST::literals;
Signature | |
---|---|
ST::string operator"" _st(const char *str, size_t size) | (1) |
ST::string operator"" _st(const wchar_t *str, size_t size) | (2) |
ST::string operator"" _st(const char16_t *str, size_t size) | (3) |
ST::string operator"" _st(const char32_t *str, size_t size) | (4) |
ST::string operator"" _st(const char8_t *str, size_t size) | (5) |
User-defined literal operator to convert string literals to ST::string objects efficiently.
- The string literal should be encoded as UTF-8. Because this operator assumes UTF-8 data, it is as efficient as using ST_LITERAL() to construct the string.
- The string literal should be encoded as either UTF-16 or UTF-32, depending on your platform's wchar_t support.
- The string literal should be encoded as UTF-16.
- The string literal should be encoded as UTF-32.
- The string literal should be encoded as UTF-8.
Changed in 2.2: Added const char8_t * overload.
Changed in 3.0: Moved these to the ST::literals namespace to avoid polluting the global namespace.
See also ST_LITERAL(), from_utf8(), from_wchar(), from_utf16(), from_utf32()
#define ST_LITERAL(str) ...
Construct an ST::string from static UTF-8 data in an efficient manner. This bypasses the normal constructor size and validity checks to construct the string object directly (at compile time), making it much faster than the string("...") or from_utf8("...") constructors.
See also operator"" _st()
#define ST_WHITESPACE " \t\r\n"
A default set of whitespace characters, useful for trimming and tokenizing strings.