
character literal

From cppreference.com

Syntax

' c-char ' (1)
u8 ' c-char ' (2) (since C++17)
u ' c-char ' (3) (since C++11)
U ' c-char ' (4) (since C++11)
L ' c-char ' (5)
' c-char-sequence ' (6)

where

  • c-char is one of:
      • a character from the source character set other than the single quote ('), the backslash (\), or the newline character,
      • an escape sequence, as defined in escape sequences,
      • a universal character name, as defined in escape sequences.
  • c-char-sequence is a sequence of two or more c-chars.
1) Narrow character literal or ordinary character literal, e.g. 'a', '\n', or '\13'. Such a literal has type char and a value equal to the representation of c-char in the execution character set. If c-char is not representable as a single byte in the execution character set, the literal has type int and an implementation-defined value.
2) UTF-8 character literal, e.g. u8'a'. Such a literal has type char (until C++20) or char8_t (since C++20) and a value equal to the ISO 10646 code point value of c-char, provided that the code point value is representable with a single UTF-8 code unit (that is, c-char is in the range 0x0-0x7F, inclusive). If c-char is not representable with a single UTF-8 code unit, the program is ill-formed.
3) UTF-16 character literal, e.g. u'貓', but not u'🍌' (u'\U0001f34c'). Such a literal has type char16_t and a value equal to the ISO 10646 code point value of c-char, provided that the code point value is representable with a single UTF-16 code unit (that is, c-char is in the range 0x0-0xFFFF, inclusive). If c-char is not representable with a single UTF-16 code unit, the program is ill-formed.
4) UTF-32 character literal, e.g. U'貓' or U'🍌'. Such a literal has type char32_t and a value equal to the ISO 10646 code point value of c-char.
5) Wide character literal, e.g. L'β' or L'貓'. Such a literal has type wchar_t and a value equal to the value of c-char in the execution wide character set. If c-char is not representable in the execution wide character set (e.g. a non-BMP value on Windows, where wchar_t is 16-bit), the value of the literal is implementation-defined.
6) Multicharacter literal, e.g. 'AB'. Such a literal has type int and an implementation-defined value.
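
The types and values described in forms (1) to (5) can be checked at compile time. The following is a minimal sketch, assuming a compiler in C++17 mode or later (needed for u8 character literals); code_of is a hypothetical helper introduced here for illustration, not a standard facility. Universal character names are used instead of the raw characters so that the result does not depend on the source file encoding.

```cpp
#include <type_traits>

// Hypothetical helper (not part of the standard library): converts any
// character literal to unsigned long so values can be compared uniformly.
template <class CharT>
constexpr unsigned long code_of(CharT c) {
    return static_cast<unsigned long>(c);
}

// The prefix determines the type of a single-c-char literal.
static_assert(std::is_same<decltype('a'), char>::value, "(1) ordinary literal: char");
static_assert(std::is_same<decltype(u'a'), char16_t>::value, "(3) UTF-16 literal: char16_t");
static_assert(std::is_same<decltype(U'a'), char32_t>::value, "(4) UTF-32 literal: char32_t");
static_assert(std::is_same<decltype(L'a'), wchar_t>::value, "(5) wide literal: wchar_t");

// (2) The type of a UTF-8 literal changed in C++20.
#if defined(__cpp_char8_t)
static_assert(std::is_same<decltype(u8'a'), char8_t>::value, "(2) char8_t since C++20");
#else
static_assert(std::is_same<decltype(u8'a'), char>::value, "(2) char until C++20");
#endif

// Values: an octal escape, then ISO 10646 code points for the UTF forms.
static_assert(code_of('\13') == 11, "octal 13 is decimal 11");
static_assert(code_of(u8'a') == 0x61, "U+0061 fits in one UTF-8 code unit");
static_assert(code_of(u'\u732B') == 0x732B, "U+732B fits in one UTF-16 code unit");
static_assert(code_of(U'\U0001F34C') == 0x1F34C, "any code point fits in char32_t");

// Per forms (2) and (3), the following would be ill-formed if uncommented:
// auto bad8  = u8'\u00E9';     // needs two UTF-8 code units
// auto bad16 = u'\U0001F34C';  // needs two UTF-16 code units
```

The block has no main function on purpose: every check is a static_assert, so a successful compilation is the whole test. The value of a wide literal such as L'β' is not asserted because it depends on the execution wide character set.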

Notes

Multicharacter literals were inherited by C from the B programming language. Although this is not specified by the C or C++ standard, most compilers (MSVC is a notable exception) implement multicharacter literals as specified in B: the value of each char in the literal initializes successive bytes of the resulting integer, in big-endian, zero-padded, right-adjusted order. For example, the value of '\1' is 0x00000001 and the value of '\1\2\3\4' is 0x01020304.
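
The B-style byte layout described above can be modeled as a shift-and-or over the bytes. The following is a minimal sketch, assuming C++14 or later and a 32-bit result; pack_b_style is a hypothetical function written for illustration, and it models the convention rather than the behavior of any particular compiler. It deliberately avoids using an actual multicharacter literal, whose value is implementation-defined and which some compilers warn about.

```cpp
#include <cstdint>
#include <initializer_list>

// Hypothetical model of the B-style layout: each byte is appended at the
// low end, so the first byte ends up most significant (big-endian) and
// short sequences are zero-padded on the left (right-adjusted).
constexpr std::uint32_t pack_b_style(std::initializer_list<unsigned char> bytes) {
    std::uint32_t value = 0;
    for (unsigned char b : bytes) {
        value = (value << 8) | b;
    }
    return value;
}

// The two examples from the text.
static_assert(pack_b_style({1}) == 0x00000001u, "one byte is right-adjusted");
static_assert(pack_b_style({1, 2, 3, 4}) == 0x01020304u, "first byte is most significant");

// 'A' is 0x41 and 'B' is 0x42 in ASCII, so this models 'AB' as 0x4142.
static_assert(pack_b_style({0x41, 0x42}) == 0x4142u, "two bytes, zero-padded on the left");
```

On a compiler that follows the B convention, comparing pack_b_style({0x41, 0x42}) with 'AB' would be expected to succeed on an ASCII-based execution character set, but that comparison is left out here because the standard does not guarantee it.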

In C, character constants such as 'a' or '\n' have type int, rather than char.
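
On the C++ side, this difference is observable through sizeof and overload resolution. The following is a minimal sketch; is_char_overload is a hypothetical pair of functions introduced for illustration. It shows only the C++ behavior: in C, per the note above, sizeof 'a' would equal sizeof(int) instead.

```cpp
// Hypothetical overload pair: reports which overload a given argument selects.
constexpr bool is_char_overload(char) { return true; }
constexpr bool is_char_overload(int) { return false; }

// In C++, 'a' has type char, so it is exactly one byte wide...
static_assert(sizeof('a') == sizeof(char), "character literal has type char in C++");

// ...and it selects the char overload without any conversion.
static_assert(is_char_overload('a'), "'a' binds to the char overload");
static_assert(is_char_overload('\n'), "escape sequences are char as well");

// An integer literal with the same value selects the int overload.
static_assert(!is_char_overload(97), "97 binds to the int overload");
```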

See also

user-defined literals: literals with a user-defined suffix (C++11)
C documentation for character constant