Escape sequences

From cppreference.com

Escape sequences are used to represent certain special characters within string literals and character literals.

The following escape sequences are available:

Escape sequence         Description                                                Representation

Simple escape sequences
\'                      single quote                                               byte 0x27 in ASCII encoding
\"                      double quote                                               byte 0x22 in ASCII encoding
\?                      question mark                                              byte 0x3f in ASCII encoding
\\                      backslash                                                  byte 0x5c in ASCII encoding
\a                      audible bell                                               byte 0x07 in ASCII encoding
\b                      backspace                                                  byte 0x08 in ASCII encoding
\f                      form feed - new page                                       byte 0x0c in ASCII encoding
\n                      line feed - new line                                       byte 0x0a in ASCII encoding
\r                      carriage return                                            byte 0x0d in ASCII encoding
\t                      horizontal tab                                             byte 0x09 in ASCII encoding
\v                      vertical tab                                               byte 0x0b in ASCII encoding

Numeric escape sequences
\nnn                    arbitrary octal value                                      byte nnn (1 to 3 octal digits)
\o{n...} (since C++23)  arbitrary octal value                                      byte n... (arbitrary number of octal digits)
\xn...                  arbitrary hexadecimal value                                byte n... (arbitrary number of hexadecimal digits)
\x{n...} (since C++23)  arbitrary hexadecimal value                                byte n... (arbitrary number of hexadecimal digits)

Conditional escape sequences[1]
\c                      implementation-defined                                     implementation-defined

Universal character names
\unnnn                  arbitrary Unicode value; may result in several code units  code point U+nnnn (4 hexadecimal digits)
\u{n...} (since C++23)  arbitrary Unicode value; may result in several code units  code point U+n... (arbitrary number of hexadecimal digits)
\Unnnnnnnn              arbitrary Unicode value; may result in several code units  code point U+nnnnnnnn (8 hexadecimal digits)
\N{name} (since C++23)  arbitrary character listed in ISO/IEC 10646                character named by name (see below)

  1. Conditional escape sequences are conditionally-supported. The character c in each conditional escape sequence is a member of the basic source character set (until C++23) or the basic character set (since C++23) that is not the character following the \ in any other escape sequence.
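
As an illustration of the table above, the following sketch checks a few simple and numeric escape sequences against the listed values at compile time. It assumes an ASCII-compatible ordinary literal encoding, and `literal_length` is a hypothetical helper introduced here for illustration, not a standard facility.

```cpp
#include <cstddef>

// Assumption: the ordinary literal encoding is ASCII-compatible.
static_assert('\'' == 0x27, "single quote");
static_assert('\"' == 0x22, "double quote");
static_assert('\?' == 0x3f, "question mark");
static_assert('\\' == 0x5c, "backslash");
static_assert('\a' == 0x07, "audible bell");
static_assert('\n' == 0x0a, "line feed");
static_assert('\t' == 0x09, "horizontal tab");

// Numeric escapes: octal 101 and hexadecimal 41 both denote the value 65.
static_assert('\101' == '\x41', "octal and hexadecimal spell the same value");
static_assert('\101' == 'A', "value 65 is 'A' in ASCII");

// Hypothetical helper: number of characters in a string literal,
// not counting the terminating null character.
template <std::size_t N>
constexpr std::size_t literal_length(const char (&)[N])
{
    return N - 1;
}

// Each escape sequence contributes exactly one character here.
static_assert(literal_length("a\tb\n") == 4, "a, tab, b, line feed");
```

The checks are evaluated during compilation, so the translation unit compiling without errors is the whole demonstration.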

Range of universal character names

If a universal character name corresponds to a code point that is less than 0xA0 and is not 0x24 ($), 0x40 (@), or 0x60 (`), the program is ill-formed. In other words, members of the basic source character set and control characters (in the ranges 0x0-0x1F and 0x7F-0x9F) cannot be expressed as universal character names.

(until C++11)

If a universal character name corresponding to a code point of a member of the basic source character set or of a control character appears outside a character or string literal, the program is ill-formed.

If a universal character name corresponds to a surrogate code point (the range 0xD800-0xDFFF, inclusive), the program is ill-formed.

If a universal character name used in a UTF-16/32 string literal does not correspond to a code point in ISO/IEC 10646 (the range 0x0-0x10FFFF, inclusive), the program is ill-formed.

(since C++11)
(until C++20)

If a universal character name corresponding to a code point of a member of the basic source character set or of a control character appears outside a character or string literal, the program is ill-formed.

If a universal character name does not correspond to a code point in ISO/IEC 10646 (the range 0x0-0x10FFFF, inclusive) or corresponds to a surrogate code point (the range 0xD800-0xDFFF, inclusive), the program is ill-formed.

(since C++20)
(until C++23)

If a universal character name corresponding to a scalar value of a character in the basic character set or a control character appears outside a character or string literal, the program is ill-formed.

If a universal character name does not correspond to a scalar value of a character in the translation character set, the program is ill-formed.

(since C++23)
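
The rules above can be probed with a small compile-time sketch. It assumes a compiler in C++11 mode or later, where a universal character name inside a literal may designate a basic character; the ill-formed cases are described in comments only, because writing them out would stop compilation.

```cpp
// Inside a character or string literal, a universal character name may
// designate a member of the basic character set (C++11 and later).
static_assert(U'\u0041' == U'A', "U+0041 is LATIN CAPITAL LETTER A");
static_assert(U'\u0024' == U'$', "U+0024 is DOLLAR SIGN");

// Code points outside the surrogate range, up to 0x10FFFF, are accepted.
static_assert(U'\uE000' == 0xE000, "first code point after the surrogate range");
static_assert(U'\U0010FFFF' == 0x10FFFF, "largest code point in ISO/IEC 10646");

// Ill-formed, so not written out as code: a universal character name for a
// surrogate code point (D800 through DFFF) or for a value above 10FFFF.
```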


Named universal character escapes

\N{ n-char-sequence }
n-char-sequence - one or more n-chars
n-char - a character from the translation character set, except the right curly bracket } or new-line character

A universal character name of the syntax above is a named universal character. It designates the character named by its n-char-sequence. A character is so named if the n-char-sequence is equal to

  • the associated character name or associated character name alias specified in ISO/IEC 10646 subclause “Code charts and lists of character names” or
  • the control code alias given below.

None of the associated character names, associated character name aliases, or control code aliases have leading or trailing spaces.

Code unit  Control code aliases
U+0000     NULL
U+0001     START OF HEADING
U+0002     START OF TEXT
U+0003     END OF TEXT
U+0004     END OF TRANSMISSION
U+0005     ENQUIRY
U+0006     ACKNOWLEDGE
U+0007     ALERT
U+0008     BACKSPACE
U+0009     CHARACTER TABULATION, HORIZONTAL TABULATION
U+000A     LINE FEED, NEW LINE, END OF LINE
U+000B     LINE TABULATION, VERTICAL TABULATION
U+000C     FORM FEED
U+000D     CARRIAGE RETURN
U+000E     SHIFT OUT, LOCKING-SHIFT ONE
U+000F     SHIFT IN, LOCKING-SHIFT ZERO
U+0010     DATA LINK ESCAPE
U+0011     DEVICE CONTROL ONE
U+0012     DEVICE CONTROL TWO
U+0013     DEVICE CONTROL THREE
U+0014     DEVICE CONTROL FOUR
U+0015     NEGATIVE ACKNOWLEDGE
U+0016     SYNCHRONOUS IDLE
U+0017     END OF TRANSMISSION BLOCK
U+0018     CANCEL
U+0019     END OF MEDIUM
U+001A     SUBSTITUTE
U+001B     ESCAPE
U+001C     INFORMATION SEPARATOR FOUR, FILE SEPARATOR
U+001D     INFORMATION SEPARATOR THREE, GROUP SEPARATOR
U+001E     INFORMATION SEPARATOR TWO, RECORD SEPARATOR
U+001F     INFORMATION SEPARATOR ONE, UNIT SEPARATOR
U+007F     DELETE
U+0082     BREAK PERMITTED HERE
U+0083     NO BREAK HERE
U+0084     INDEX
U+0085     NEXT LINE
U+0086     START OF SELECTED AREA
U+0087     END OF SELECTED AREA
U+0088     CHARACTER TABULATION SET, HORIZONTAL TABULATION SET
U+0089     CHARACTER TABULATION WITH JUSTIFICATION, HORIZONTAL TABULATION WITH JUSTIFICATION
U+008A     LINE TABULATION SET, VERTICAL TABULATION SET
U+008B     PARTIAL LINE FORWARD, PARTIAL LINE DOWN
U+008C     PARTIAL LINE BACKWARD, PARTIAL LINE UP
U+008D     REVERSE LINE FEED, REVERSE INDEX
U+008E     SINGLE SHIFT TWO, SINGLE-SHIFT-2
U+008F     SINGLE SHIFT THREE, SINGLE-SHIFT-3
U+0090     DEVICE CONTROL STRING
U+0091     PRIVATE USE ONE, PRIVATE USE-1
U+0092     PRIVATE USE TWO, PRIVATE USE-2
U+0093     SET TRANSMIT STATE
U+0094     CANCEL CHARACTER
U+0095     MESSAGE WAITING
U+0096     START OF GUARDED AREA, START OF PROTECTED AREA
U+0097     END OF GUARDED AREA, END OF PROTECTED AREA
U+0098     START OF STRING
U+009A     SINGLE CHARACTER INTRODUCER
U+009B     CONTROL SEQUENCE INTRODUCER
U+009C     STRING TERMINATOR
U+009D     OPERATING SYSTEM COMMAND
U+009E     PRIVACY MESSAGE
U+009F     APPLICATION PROGRAM COMMAND
(since C++23)
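
A minimal sketch of named universal character escapes follows. Because the syntax needs C++23 support, the example is guarded by the feature-test macro and falls back to plain escapes otherwise; the variable names `e_acute` and `line_feed` are chosen here for illustration only.

```cpp
// A named escape designates the same character as the matching numeric form.
#if defined(__cpp_named_character_escapes) && __cpp_named_character_escapes >= 202207L
// Associated character name from ISO/IEC 10646.
constexpr char32_t e_acute = U'\N{LATIN SMALL LETTER E WITH ACUTE}';
// Control code alias for U+000A, taken from the table above.
constexpr char32_t line_feed = U'\N{LINE FEED}';
#else
// Fallback for compilers without named escapes: spell the values directly.
constexpr char32_t e_acute = U'\u00E9';
constexpr char32_t line_feed = U'\n';
#endif

static_assert(e_acute == 0xE9, "U+00E9");
static_assert(line_feed == 0x0A, "U+000A");
```

Either branch yields the same values, so code using these constants does not need to know which spelling was compiled.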

Notes

\0 is the most commonly used octal escape sequence, because it represents the terminating null character in null-terminated strings.

The new-line character \n has special meaning when used in text mode I/O: it is converted to the OS-specific newline representation, usually a byte or byte sequence. Some systems mark their lines with length fields instead.

Octal escape sequences have a limit of three octal digits, but terminate at the first character that is not a valid octal digit if encountered sooner.
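
The three-digit limit and the early termination rule can be observed through array sizes. The following is a small sketch with hypothetical array names, assuming an ASCII-compatible literal encoding for the comparisons against '4' and '8'.

```cpp
// Four digits after the backslash: the escape takes the first three (123),
// and the trailing 4 is an ordinary character.
constexpr char three_digit_limit[] = "\1234";
static_assert(sizeof(three_digit_limit) == 3, "two characters plus the terminating null");
static_assert(three_digit_limit[0] == 0123, "octal 123 is decimal 83");
static_assert(three_digit_limit[1] == '4', "the fourth digit is a separate character");

// The escape ends at 8, which is not an octal digit.
constexpr char early_stop[] = "\18";
static_assert(sizeof(early_stop) == 3, "value 1, then '8', then the terminating null");
static_assert(early_stop[0] == 1 && early_stop[1] == '8', "escape stopped after one digit");

// An embedded null character does not replace the literal's own terminator.
static_assert(sizeof("a\0b") == 4, "a, null, b, terminating null");
```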

Hexadecimal escape sequences have no length limit and terminate at the first character that is not a valid hexadecimal digit. If the value represented by a single hexadecimal escape sequence does not fit the range of values represented by the character type used in this string literal (char, char8_t (since C++20), char16_t, char32_t (since C++11), or wchar_t), the result is unspecified.
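
Because a hexadecimal escape absorbs every following hexadecimal digit, text such as "BC" placed directly after \x41 would become part of the escape. One common workaround, sketched below with hypothetical array names, is to end the escape by closing the literal and relying on adjacent string literal concatenation; a three-digit octal escape is another option.

```cpp
// The escape ends with the first literal; the adjacent literals are
// concatenated afterwards, so B and C remain ordinary characters.
constexpr char split[] = "\x41" "BC";
static_assert(sizeof(split) == 4, "three characters plus the terminating null");
static_assert(split[0] == 0x41 && split[1] == 'B' && split[2] == 'C',
              "value 0x41 followed by B and C");

// An octal escape stops after three digits, so no split is needed.
constexpr char octal_form[] = "\101BC";
static_assert(sizeof(octal_form) == 4, "three characters plus the terminating null");
static_assert(octal_form[0] == split[0], "octal 101 equals hexadecimal 41");
```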

A universal character name in a narrow string literal or a 16-bit string literal may map to more than one code unit, e.g. \U0001f34c is 4 char code units in UTF-8 (\xF0\x9F\x8D\x8C) and 2 char16_t code units in UTF-16 (\xD83C\xDF4C).

(since C++11)
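
The code unit counts quoted above can be checked at compile time. In this sketch each size includes the terminating null code unit, and `banana16` is a hypothetical name used for illustration.

```cpp
// U+1F34C needs 4 code units in UTF-8, 2 in UTF-16, and 1 in UTF-32.
static_assert(sizeof(u8"\U0001f34c") == 5, "4 UTF-8 code units plus null");
static_assert(sizeof(u"\U0001f34c") / sizeof(char16_t) == 3, "2 UTF-16 code units plus null");
static_assert(sizeof(U"\U0001f34c") / sizeof(char32_t) == 2, "1 UTF-32 code unit plus null");

// The two UTF-16 code units form a surrogate pair.
constexpr char16_t banana16[] = u"\U0001f34c";
static_assert(banana16[0] == 0xD83C, "high surrogate");
static_assert(banana16[1] == 0xDF4C, "low surrogate");
```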

The question mark escape sequence \? is used to prevent trigraphs from being interpreted inside string literals: a string such as "??/" is compiled as "\", but if the second question mark is escaped, as in "?\?/", it becomes "??/". As trigraphs have been removed from C++, the question mark escape sequence is no longer necessary. It is preserved for compatibility with C++14 (and former revisions) and C. (since C++17)

Feature-test macro             Value    Std      Comment
__cpp_named_character_escapes  202207L  (C++23)  Named universal character escapes

Example

#include <iostream>
 
int main()
{
    std::cout << "This\nis\na\ntest\n\n";
    std::cout << "She said, \"Sells she seashells on the seashore?\"\n";
}

Output:

This
is
a
test
 
She said, "Sells she seashells on the seashore?"

Defect Reports

The following behavior-changing defect reports were applied retroactively to previously published C++ standards.

DR       Applied to  Behavior as published                        Correct behavior
CWG 505  C++98       the behavior was undefined if the character  made conditionally supported
                     following a backslash was not one of those   (semantic is implementation-defined)
                     specified in the table

See also

C documentation for Escape sequence