Functions for Working with Strings
Functions for searching in strings and for replacing in strings are described separately.
empty
Checks whether the input string is empty. A string is considered non-empty if it contains at least one byte, even if this byte is a space or the null byte.
The function is also available for arrays and UUIDs.
Syntax
empty(x)
Arguments
- x — Input value. String.
Returned value
- Returns 1 for an empty string or 0 for a non-empty string. UInt8.
Example
SELECT empty('');
Result:
┌─empty('')─┐
│ 1 │
└───────────┘
notEmpty
Checks whether the input string is non-empty. A string is considered non-empty if it contains at least one byte, even if this byte is a space or the null byte.
The function is also available for arrays and UUIDs.
Syntax
notEmpty(x)
Arguments
- x — Input value. String.
Returned value
- Returns 1 for a non-empty string or 0 for an empty string. UInt8.
Example
SELECT notEmpty('text');
Result:
┌─notEmpty('text')─┐
│ 1 │
└──────────────────┘
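A minimal Python model of the byte-based rule used by empty and notEmpty, assuming the input is given as raw bytes. The names is_empty and is_not_empty are hypothetical helpers, not ClickHouse functions, and the array and UUID variants are not modeled.

```python
def is_empty(data: bytes) -> int:
    """Model of empty(): 1 only when the string contains zero bytes."""
    return 1 if len(data) == 0 else 0


def is_not_empty(data: bytes) -> int:
    """Model of notEmpty(): the inverse of empty()."""
    return 1 - is_empty(data)


# A single space or a single null byte still counts as content.
print(is_empty(b""), is_empty(b" "), is_empty(b"\x00"), is_not_empty(b"text"))
# 1 0 0 1
```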
length
Returns the length of a string in bytes rather than in characters or Unicode code points. The function also works for arrays.
Alias: OCTET_LENGTH
Syntax
length(s)
Parameters
- s — Input string or array. String or Array.
Returned value
- Length of the string s in bytes, or the number of elements if s is an array. UInt64.
Example
Query:
SELECT length('Hello, world!');
Result:
┌─length('Hello, world!')─┐
│ 13 │
└─────────────────────────┘
Query:
SELECT length([1, 2, 3, 4]);
Result:
┌─length([1, 2, 3, 4])─┐
│ 4 │
└──────────────────────┘
lengthUTF8
Returns the length of a string in Unicode code points rather than in bytes or characters. It assumes that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
Aliases:
CHAR_LENGTH
CHARACTER_LENGTH
Syntax
lengthUTF8(s)
Parameters
- s — String containing valid UTF-8 encoded text. String.
Returned value
- Length of the string s in Unicode code points. UInt64.
Example
Query:
SELECT lengthUTF8('Здравствуй, мир!');
Result:
┌─lengthUTF8('Здравствуй, мир!')─┐
│ 16 │
└────────────────────────────────┘
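The difference between length and lengthUTF8 can be sketched in Python by counting the same UTF-8 text as bytes and as code points. The helpers length and length_utf8 are hypothetical models; note that Python raises on invalid UTF-8, whereas lengthUTF8 is documented to return an undefined result instead.

```python
def length(s) -> int:
    """Model of length(): bytes for a byte string, elements for a list."""
    return len(s)


def length_utf8(s: bytes) -> int:
    """Model of lengthUTF8(): number of code points, assuming valid UTF-8."""
    return len(s.decode("utf-8"))


text = "Здравствуй, мир!".encode("utf-8")
print(length(text), length_utf8(text))  # 29 16 (each Cyrillic letter takes 2 bytes)
print(length(b"Hello, world!"), length([1, 2, 3, 4]))  # 13 4
```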
left
Returns a substring of string s with a specified offset starting from the left.
Syntax
left(s, offset)
Parameters
- s — The string to calculate a substring from. String or FixedString.
- offset — The number of bytes of the offset. (U)Int*.
Returned value
- For positive offset: a substring of s with offset many bytes, starting from the left of the string.
- For negative offset: a substring of s with length(s) - |offset| bytes, starting from the left of the string.
- An empty string if offset is 0.
Example
Query:
SELECT left('Hello', 3);
Result:
Hel
Query:
SELECT left('Hello', -3);
Result:
He
leftUTF8
Returns a substring of a UTF-8 encoded string s with a specified offset starting from the left.
Syntax
leftUTF8(s, offset)
Parameters
- s — The UTF-8 encoded string to calculate a substring from. String or FixedString.
- offset — The number of Unicode code points of the offset. (U)Int*.
Returned value
- For positive offset: a substring of s with offset many code points, starting from the left of the string.
- For negative offset: a substring of s with length(s) - |offset| code points, starting from the left of the string.
- An empty string if offset is 0.
Example
Query:
SELECT leftUTF8('Привет', 4);
Result:
Прив
Query:
SELECT leftUTF8('Привет', -4);
Result:
Пр
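A minimal Python model of left and leftUTF8: the same slicing logic works on bytes (for left) and on str (for leftUTF8), which makes the byte versus code point distinction visible. The function name left is a hypothetical helper, and clamping to an empty result when |offset| exceeds the length is an assumption not stated above.

```python
def left(s, offset: int):
    """Model of left()/leftUTF8(): pass bytes for left(), str for leftUTF8()."""
    if offset >= 0:
        return s[:offset]
    # Negative offset: keep everything except the last |offset| units.
    # Clamping at 0 when |offset| > len(s) is an assumption.
    return s[:max(len(s) + offset, 0)]


print(left(b"Hello", 3), left(b"Hello", -3))  # b'Hel' b'He'
print(left("Привет", 4), left("Привет", -4))  # Прив Пр
# Counting bytes instead of code points: 4 bytes hold only 2 Cyrillic letters.
print(left("Привет".encode("utf-8"), 4).decode("utf-8"))  # Пр
```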
leftPad
Pads a string from the left with spaces or with a specified string (multiple times, if needed) until the resulting string reaches the specified length.
Syntax
leftPad(string, length[, pad_string])
Alias: LPAD
Arguments
- string — Input string that should be padded. String.
- length — The length of the resulting string. UInt or Int. If the value is smaller than the input string length, the input string is shortened to length bytes.
- pad_string — The string to pad the input string with. String. Optional. If not specified, the input string is padded with spaces.
Returned value
- A left-padded string of the given length. String.
Example
SELECT leftPad('abc', 7, '*'), leftPad('def', 7);
Result:
┌─leftPad('abc', 7, '*')─┬─leftPad('def', 7)─┐
│ ****abc                │     def           │
└────────────────────────┴───────────────────┘
leftPadUTF8
Pads the string from the left with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length. Unlike leftPad, which measures the string length in bytes, leftPadUTF8 measures it in code points.
Syntax
leftPadUTF8(string, length[, pad_string])
Arguments
- string — Input string that should be padded. String.
- length — The length of the resulting string. UInt or Int. If the value is smaller than the input string length, the input string is shortened to length code points.
- pad_string — The string to pad the input string with. String. Optional. If not specified, the input string is padded with spaces.
Returned value
- A left-padded string of the given length. String.
Example
SELECT leftPadUTF8('абвг', 7, '*'), leftPadUTF8('дежз', 7);
Result:
┌─leftPadUTF8('абвг', 7, '*')─┬─leftPadUTF8('дежз', 7)─┐
│ ***абвг                     │    дежз                │
└─────────────────────────────┴────────────────────────┘
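A minimal Python model of leftPad (on bytes) and leftPadUTF8 (on str). Several details are assumptions rather than documented behavior and are marked in the comments: which end is kept when the input is too long, how an empty pad string behaves, and negative lengths. The name left_pad is hypothetical.

```python
def left_pad(s, length: int, pad=None):
    """Model of leftPad() (pass bytes) and leftPadUTF8() (pass str)."""
    if length < 0:
        raise ValueError("negative lengths are not modeled here")
    if pad is None:
        pad = b" " if isinstance(s, bytes) else " "
    if len(s) >= length:
        return s[:length]  # assumption: the first `length` units are kept
    if not pad:
        return s  # assumption: an empty pad string adds nothing
    missing = length - len(s)
    # Repeat the pad string as often as needed, then cut it to the missing size.
    fill = (pad * (missing // len(pad) + 1))[:missing]
    return fill + s


print(left_pad(b"abc", 7, b"*"), left_pad(b"def", 7))  # b'****abc' b'    def'
print(left_pad("абвг", 7, "*"))                        # ***абвг
```

Under the truncation assumption above, passing the 8 UTF-8 bytes of 'абвг' to the byte-oriented model with length 7 cuts a Cyrillic letter in half, which illustrates why a code-point-aware variant is useful.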
right
Returns a substring of string s with a specified offset starting from the right.
Syntax
right(s, offset)
Parameters
- s — The string to calculate a substring from. String or FixedString.
- offset — The number of bytes of the offset. (U)Int*.
Returned value
- For positive offset: a substring of s with offset many bytes, starting from the right of the string.
- For negative offset: a substring of s with length(s) - |offset| bytes, starting from the right of the string.
- An empty string if offset is 0.
Example
Query:
SELECT right('Hello', 3);
Result:
llo
Query:
SELECT right('Hello', -3);
Result:
lo
rightUTF8
Returns a substring of a UTF-8 encoded string s with a specified offset starting from the right.
Syntax
rightUTF8(s, offset)
Parameters
- s — The UTF-8 encoded string to calculate a substring from. String or FixedString.
- offset — The number of Unicode code points of the offset. (U)Int*.
Returned value
- For positive offset: a substring of s with offset many code points, starting from the right of the string.
- For negative offset: a substring of s with length(s) - |offset| code points, starting from the right of the string.
- An empty string if offset is 0.
Example
Query:
SELECT rightUTF8('Привет', 4);
Result:
ивет
Query:
SELECT rightUTF8('Привет', -4);
Result:
ет
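The right-hand counterparts can be modeled the same way: bytes stand in for right and str for rightUTF8. The name right is a hypothetical helper, and clamping when |offset| exceeds the string length is an assumption. The start index is computed explicitly because Python's s[-0:] would return the whole string rather than an empty one.

```python
def right(s, offset: int):
    """Model of right()/rightUTF8(): pass bytes for right(), str for rightUTF8()."""
    if offset >= 0:
        # Take the last `offset` units; offset 0 yields an empty result.
        return s[max(len(s) - offset, 0):]
    # Negative offset: keep everything except the first |offset| units.
    keep = max(len(s) + offset, 0)  # clamping at 0 is an assumption
    return s[len(s) - keep:]


print(right(b"Hello", 3), right(b"Hello", -3))  # b'llo' b'lo'
print(right("Привет", 4), right("Привет", -4))  # ивет ет
```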
rightPad
Pads a string from the right with spaces or with a specified string (multiple times, if needed) until the resulting string reaches the specified length.
Syntax
rightPad(string, length[, pad_string])
Alias: RPAD
Arguments
- string — Input string that should be padded. String.
- length — The length of the resulting string. UInt or Int. If the value is smaller than the input string length, the input string is shortened to length bytes.
- pad_string — The string to pad the input string with. String. Optional. If not specified, the input string is padded with spaces.
Returned value
- A right-padded string of the given length. String.
Example
SELECT rightPad('abc', 7, '*'), rightPad('abc', 7);
Result:
┌─rightPad('abc', 7, '*')─┬─rightPad('abc', 7)─┐
│ abc**** │ abc │
└─────────────────────────┴────────────────────┘
rightPadUTF8
Pads the string from the right with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length. Unlike rightPad, which measures the string length in bytes, rightPadUTF8 measures it in code points.
Syntax
rightPadUTF8(string, length[, pad_string])
Arguments
- string — Input string that should be padded. String.
- length — The length of the resulting string. UInt or Int. If the value is smaller than the input string length, the input string is shortened to length code points.
- pad_string — The string to pad the input string with. String. Optional. If not specified, the input string is padded with spaces.
Returned value
- A right-padded string of the given length. String.
Example
SELECT rightPadUTF8('абвг', 7, '*'), rightPadUTF8('абвг', 7);
Result:
┌─rightPadUTF8('абвг', 7, '*')─┬─rightPadUTF8('абвг', 7)─┐
│ абвг*** │ абвг │
└──────────────────────────────┴─────────────────────────┘
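A Python model of rightPad (on bytes) and rightPadUTF8 (on str), mirroring the left-padding logic but appending the fill. As before, the truncation side, the empty pad string case, and the rejection of negative lengths are assumptions, and right_pad is a hypothetical name.

```python
def right_pad(s, length: int, pad=None):
    """Model of rightPad() (pass bytes) and rightPadUTF8() (pass str)."""
    if length < 0:
        raise ValueError("negative lengths are not modeled here")
    if pad is None:
        pad = b" " if isinstance(s, bytes) else " "
    if len(s) >= length:
        return s[:length]  # assumption: the first `length` units are kept
    if not pad:
        return s  # assumption: an empty pad string adds nothing
    missing = length - len(s)
    # Repeat the pad string as often as needed, then cut it to the missing size.
    fill = (pad * (missing // len(pad) + 1))[:missing]
    return s + fill


print(right_pad(b"abc", 7, b"*"), right_pad(b"abc", 7))  # b'abc****' b'abc    '
print(right_pad("абвг", 7, "*"))                         # абвг***
```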
lower
Converts the ASCII Latin symbols in a string to lowercase.
Syntax
lower(input)
Alias: lcase
Parameters
- input — A string of type String.
Returned value
- A String data type value.
Example
Query:
SELECT lower('CLICKHOUSE');
Result:
┌─lower('CLICKHOUSE')─┐
│ clickhouse │
└─────────────────────┘
upper
Converts the ASCII Latin symbols in a string to uppercase.
Syntax
upper(input)
Alias: ucase
Parameters
- input — A string of type String.
Returned value
- A String data type value.
Example
Query:
SELECT upper('clickhouse');
Result:
┌─upper('clickhouse')─┐
│ CLICKHOUSE │
└─────────────────────┘
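Python's bytes.lower() and bytes.upper() only change the ASCII letters, which makes them a convenient model of the ASCII-only behavior described for lower and upper. The wrappers lower_ascii and upper_ascii are hypothetical names; str.lower() is shown only for contrast as a Unicode-aware conversion, not as an exact model of lowerUTF8.

```python
def lower_ascii(data: bytes) -> bytes:
    """Model of lower(): only the ASCII letters A-Z are changed."""
    return data.lower()  # bytes.lower() touches ASCII bytes only


def upper_ascii(data: bytes) -> bytes:
    """Model of upper(): only the ASCII letters a-z are changed."""
    return data.upper()  # bytes.upper() touches ASCII bytes only


word = "MÜNCHEN".encode("utf-8")
print(lower_ascii(word).decode("utf-8"))  # mÜnchen (Ü is not ASCII, so it stays)
print("MÜNCHEN".lower())                  # münchen (Unicode-aware conversion)
print(upper_ascii(b"clickhouse"))         # b'CLICKHOUSE'
```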
lowerUTF8
Converts a string to lowercase, assuming that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
The function does not detect the language; for example, for Turkish the result might not be exactly correct (i/İ vs. i/I). If the UTF-8 byte sequences of the uppercase and lowercase forms of a code point differ in length (such as ẞ and ß), the result may be incorrect for this code point.
Syntax
lowerUTF8(input)
Parameters
- input — A string of type String.
Returned value
- A String data type value.
Example
Query:
SELECT lowerUTF8('MÜNCHEN') as Lowerutf8;
Result:
┌─Lowerutf8─┐
│ münchen │
└───────────┘
upperUTF8
Converts a string to uppercase, assuming that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
The function does not detect the language; for example, for Turkish the result might not be exactly correct (i/İ vs. i/I). If the UTF-8 byte sequences of the uppercase and lowercase forms of a code point differ in length (such as ẞ and ß), the result may be incorrect for this code point.
Syntax
upperUTF8(input)
Parameters
- input — A string of type String.
Returned value
- A String data type value.
Example
Query:
SELECT upperUTF8('München') as Upperutf8;
Result:
┌─Upperutf8─┐
│ MÜNCHEN │
└───────────┘
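The byte-length caveat for lowerUTF8 and upperUTF8 can be checked directly in Python: the two forms of the German sharp s need a different number of bytes in UTF-8, so a conversion that keeps every code point's byte length unchanged has no room to map one onto the other. This only illustrates encoding sizes; utf8_len is a hypothetical helper, not part of ClickHouse.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes needed to encode a string in UTF-8."""
    return len(ch.encode("utf-8"))


# ẞ (U+1E9E) and ß (U+00DF) are an uppercase/lowercase pair.
for ch in ("ẞ", "ß"):
    print(f"U+{ord(ch):04X}", utf8_len(ch))
# U+1E9E 3
# U+00DF 2
print("ẞ".lower() == "ß")  # True
```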
isValidUTF8
Returns 1 if the set of bytes constitutes valid UTF-8 encoded text, otherwise 0.
Syntax
isValidUTF8(input)
Parameters
- input — A string of type String.
Returned value
- Returns 1 if the set of bytes constitutes valid UTF-8 encoded text, otherwise 0.
Example
Query:
SELECT isValidUTF8('\xc3\xb1') AS valid, isValidUTF8('\xc3\x28') AS invalid;
Result:
┌─valid─┬─invalid─┐
│ 1 │ 0 │
└───────┴─────────┘
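A minimal Python model of isValidUTF8 uses a strict UTF-8 decode and reports whether it succeeds. Python's strict decoder also rejects overlong forms and encoded surrogates; whether ClickHouse decides every such edge case identically is an assumption. The name is_valid_utf8 is hypothetical.

```python
def is_valid_utf8(data: bytes) -> int:
    """Model of isValidUTF8(): 1 if the bytes decode as strict UTF-8, else 0."""
    try:
        data.decode("utf-8")  # strict mode raises on any malformed sequence
    except UnicodeDecodeError:
        return 0
    return 1


print(is_valid_utf8(b"\xc3\xb1"), is_valid_utf8(b"\xc3\x28"))  # 1 0
```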
toValidUTF8
Replaces invalid UTF-8 characters with the � (U+FFFD) character. Consecutive invalid characters are collapsed into a single replacement character.
Syntax
toValidUTF8(input_string)
Arguments
input_string
— Any set of bytes represented as the String data type object.
Returned value
- A valid UTF-8 string.
Example
SELECT toValidUTF8('\x61\xF0\x80\x80\x80b');
Result:
┌─toValidUTF8('a����b')─┐
│ a�b │
└───────────────────────┘
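The collapsing rule can be sketched in Python, assuming that "invalid characters" means the bytes Python's UTF-8 decoder rejects. The surrogateescape error handler maps each undecodable byte to a lone surrogate in the range U+DC80 to U+DCFF, and each run of such surrogates is then replaced by a single U+FFFD. The name to_valid_utf8 is hypothetical and this is not ClickHouse's implementation.

```python
def to_valid_utf8(data: bytes) -> str:
    """Model of toValidUTF8(): each run of invalid bytes becomes one U+FFFD."""
    # Every undecodable byte 0xXY becomes the lone surrogate U+DCXY.
    text = data.decode("utf-8", errors="surrogateescape")
    out = []
    prev_invalid = False
    for ch in text:
        invalid = "\udc80" <= ch <= "\udcff"
        if invalid and not prev_invalid:
            out.append("\ufffd")  # first byte of an invalid run
        elif not invalid:
            out.append(ch)        # valid character, copied as-is
        prev_invalid = invalid
    return "".join(out)


print(to_valid_utf8(b"\x61\xF0\x80\x80\x80b") == "a\ufffdb")  # True
```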
repeat
Concatenates a string with itself the specified number of times.
Syntax
repeat(s, n)
Alias: REPEAT
Arguments
- s — The string to repeat. String.
- n — The number of times to repeat the string. UInt* or Int*.
Returned value
A string containing string s repeated n times. If n <= 0, the function returns the empty string. String.
Example
SELECT repeat('abc', 10);
Result:
┌─repeat('abc', 10)──────────────┐
│ abcabcabcabcabcabcabcabcabcabc │
└────────────────────────────────┘
space
Concatenates a space (' ') with itself the specified number of times.
Syntax
space(n)
Alias: SPACE
Arguments
- n — The number of times to repeat the space. UInt* or Int*.
Returned value
A string containing the space character repeated n times. If n <= 0, the function returns the empty string. String.
Example
Query:
SELECT space(3);
Result:
┌─space(3)─┐
│          │
└──────────┘
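Both repeat and space reduce to string repetition with an explicit rule for non-positive counts. The Python helpers repeat and space below are hypothetical models of the documented behavior.

```python
def repeat(s: str, n: int) -> str:
    """Model of repeat(): s concatenated n times, or '' when n <= 0."""
    return s * n if n > 0 else ""


def space(n: int) -> str:
    """Model of space(): repeat() applied to a single space."""
    return repeat(" ", n)


print(repeat("abc", 10))         # abcabcabcabcabcabcabcabcabcabc
print(repr(space(3)))            # '   '
print(repr(repeat("abc", -1)))   # ''
```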
reverse
Reverses the sequence of bytes in a string.
reverseUTF8
Reverses a sequence of Unicode code points in a string. Assumes that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
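The difference between reverse and reverseUTF8 shows up as soon as a character occupies more than one byte. In this Python sketch, reverse flips raw bytes while reverse_utf8 decodes, flips code points, and re-encodes; both names are hypothetical models, and reverse_utf8 assumes valid UTF-8 input.

```python
def reverse(data: bytes) -> bytes:
    """Model of reverse(): reverses the byte sequence."""
    return data[::-1]


def reverse_utf8(data: bytes) -> bytes:
    """Model of reverseUTF8(): reverses code points, assuming valid UTF-8."""
    return data.decode("utf-8")[::-1].encode("utf-8")


word = "Привет".encode("utf-8")
print(reverse_utf8(word).decode("utf-8"))  # тевирП
try:
    reverse(word).decode("utf-8")
except UnicodeDecodeError:
    print("byte-reversed text is no longer valid UTF-8")
```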
concat
Concatenates the given arguments.
Syntax
concat(s1, s2, ...)
Arguments
Values of arbitrary type.
Arguments which are not of types String or FixedString are converted to strings using their default serialization. As this decreases performance, it is not recommended to use non-String/FixedString arguments.
Returned value
The String created by concatenating the arguments.
If any of the arguments is NULL, the function returns NULL.
Example
Query:
SELECT concat('Hello, ', 'World!');
Result:
┌─concat('Hello, ', 'World!')─┐
│ Hello, World! │
└─────────────────────────────┘
Query:
SELECT concat(42, 144);
Result:
┌─concat(42, 144)─┐
│ 42144 │
└─────────────────┘
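A minimal Python model of concat, with None standing in for NULL. Using str() for non-string arguments is an assumption that matches the integer example above but may differ from ClickHouse's default serialization for other types; concat here is a hypothetical helper.

```python
def concat(*args):
    """Model of concat(): None (NULL) if any argument is None, else joined text."""
    if any(arg is None for arg in args):
        return None
    # str() stands in for ClickHouse's default text serialization of non-strings.
    return "".join(arg if isinstance(arg, str) else str(arg) for arg in args)


print(concat("Hello, ", "World!"))  # Hello, World!
print(concat(42, 144))              # 42144
print(concat("Hello", None))        # None
```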
|| operator
Use the || operator for string concatenation as a concise alternative to concat(). For example, 'Hello, ' || 'World!' is equivalent to concat('Hello, ', 'World!').
concatAssumeInjective
Like concat but assumes that concat(s1, s2, ...) → sn is injective. It can be used for optimization of GROUP BY.
A function is called injective if it returns different results for different arguments. In other words, different arguments never produce an identical result.
Syntax
concatAssumeInjective(s1, s2, ...)
Arguments
Values of type String or FixedString.
Returned value
The String created by concatenating the arguments.
If any of the argument values is NULL, the function returns NULL.
Example
Input table:
CREATE TABLE key_val(`key1` String, `key2` String, `value` UInt32) ENGINE = TinyLog;
INSERT INTO key_val VALUES ('Hello, ','World',1), ('Hello, ','World',2), ('Hello, ','World!',3), ('Hello',', World!',2);
SELECT * from key_val;
┌─key1────┬─key2─────┬─value─┐
│ Hello, │ World │ 1 │
│ Hello, │ World │ 2 │
│ Hello, │ World! │ 3 │
│ Hello │ , World! │ 2 │
└─────────┴──────────┴───────┘
SELECT concat(key1, key2), sum(value) FROM key_val GROUP BY concatAssumeInjective(key1, key2);
Result:
┌─concat(key1, key2)─┬─sum(value)─┐
│ Hello, World! │ 3 │
│ Hello, World! │ 2 │
│ Hello, World │ 3 │
└────────────────────┴────────────┘
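Plain concatenation is not actually injective: the pairs ('Hello, ', 'World!') and ('Hello', ', World!') produce the same string. The Python sketch below groups the example rows both ways. The three result rows above are consistent with grouping by the argument pair rather than by the concatenated value, which is the kind of shortcut the injectivity assumption permits; the variable names are illustrative only.

```python
from collections import defaultdict

rows = [
    ("Hello, ", "World", 1),
    ("Hello, ", "World", 2),
    ("Hello, ", "World!", 3),
    ("Hello", ", World!", 2),
]

by_value = defaultdict(int)  # grouping by the concatenated string
by_args = defaultdict(int)   # grouping by the (key1, key2) pair
for key1, key2, value in rows:
    by_value[key1 + key2] += value
    by_args[(key1, key2)] += value

print(dict(by_value))  # {'Hello, World': 3, 'Hello, World!': 5}
print(len(by_args))    # 3, matching the three result rows above
```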
concatWithSeparator
Concatenates the given strings with a given separator.
Syntax
concatWithSeparator(sep, expr1, expr2, expr3...)
Alias: concat_ws
Arguments
- sep — separator. Const String or FixedString.
- exprN — expression to be concatenated. Arguments which are not of types String or FixedString are converted to strings using their default serialization. As this decreases performance, it is not recommended to use non-String/FixedString arguments.
Returned value
The String created by concatenating the arguments.
If any of the argument values is NULL, the function returns NULL.
Example
SELECT concatWithSeparator('a', '1', '2', '3', '4')
Result:
┌─concatWithSeparator('a', '1', '2', '3', '4')─┐
│ 1a2a3a4 │
└──────────────────────────────────────────────┘
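A minimal Python model of concatWithSeparator, again using None for NULL and str() as an assumed stand-in for the default serialization of non-string values. The name concat_with_separator is hypothetical.

```python
def concat_with_separator(sep: str, *exprs):
    """Model of concatWithSeparator(): NULL-propagating join with a separator."""
    if any(e is None for e in exprs):
        return None
    # str() stands in for ClickHouse's default serialization of non-string values.
    return sep.join(e if isinstance(e, str) else str(e) for e in exprs)


print(concat_with_separator("a", "1", "2", "3", "4"))  # 1a2a3a4
print(concat_with_separator("-", 1, "x"))              # 1-x
print(concat_with_separator("-", "x", None))           # None
```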
concatWithSeparatorAssumeInjective
Like concatWithSeparator but assumes that concatWithSeparator(sep, expr1, expr2, expr3...) → result is injective. It can be used for optimization of GROUP BY.
A function is called injective if it returns different results for different arguments. In other words, different arguments never produce an identical result.
substring
Returns the substring of a string s which starts at the specified byte index offset. Byte counting starts from 1. If offset is 0, an empty string is returned. If offset is negative, the substring starts |offset| bytes from the end of the string, rather than from the beginning. An optional argument length specifies the maximum number of bytes the returned substring may have.
Syntax
substring(s, offset[, length])
Aliases:
substr
mid
byteSlice
Arguments
s
— The string to calculate a substring from. String, FixedString or Enumoffset
— The starting position of the substring ins
. (U)Int*.length
— The maximum length of the substring. (U)Int*. Optional.
Returned value
A substring of s with at most length bytes, starting at index offset. String.
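A minimal Python model of substring on bytes with 1-based offsets. Clamping when a negative offset reaches past the start of the string and rejecting negative lengths are assumptions of this sketch, not documented behavior; substring here is a hypothetical helper.

```python
from typing import Optional


def substring(s: bytes, offset: int, length: Optional[int] = None) -> bytes:
    """Model of substring(s, offset[, length]) with 1-based byte offsets."""
    if offset == 0:
        return b""
    if offset > 0:
        start = offset - 1  # byte counting starts from 1
    else:
        # Start |offset| bytes from the end; clamping at 0 is an assumption.
        start = max(len(s) + offset, 0)
    tail = s[start:]
    if length is None:
        return tail
    if length < 0:
        raise ValueError("negative length is not modeled here")
    return tail[:length]  # at most `length` bytes


print(substring(b"database", 5), substring(b"database", 5, 1))  # b'base' b'b'
print(substring(b"Hello", -3))                                   # b'llo'
```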
Example
SELECT 'database' AS db, substr(db, 5), substr(db, 5, 1)
Result:
┌─db───────┬─substring('database', 5)─┬─substring('database', 5, 1)─┐