Phases of translation

Revision as of 08:28, 10 May 2013 by P12

The C++ source file is processed by the compiler as if the following phases take place, in this exact order:


Phase 1

1) The individual bytes of the source code file are mapped (in an implementation-defined manner) to the characters of the basic source character set. In particular, OS-dependent end-of-line indicators are replaced by newline characters.
The basic source character set consists of 96 characters:
a) 5 whitespace characters (space, horizontal tab, vertical tab, form feed, new-line)
b) 10 digit characters from '0' to '9'
c) 52 letters from 'a' to 'z' and from 'A' to 'Z'
d) 29 punctuation characters: _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
2) Trigraph sequences are replaced by corresponding single-character representations.
3) Any source file character that cannot be mapped to a character in the basic source character set is replaced by its universal character name (\uXXXX or \UXXXXXXXX) or by some internal form that is handled equivalently.
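The count of 96 can be checked mechanically. The following is a minimal sketch that spells out each of the four groups listed above as a string literal and adds up their lengths at compile time. The variable names are illustrative and are not part of any library.

```cpp
#include <cstddef>

// Each array holds one group of the basic source character set.
// sizeof includes the terminating null, hence the "- 1" below.
constexpr char whitespace[]  = " \t\v\f\n";
constexpr char digits[]      = "0123456789";
constexpr char letters[]     = "abcdefghijklmnopqrstuvwxyz"
                               "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
constexpr char punctuation[] = "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";

constexpr std::size_t basic_source_character_count =
    (sizeof whitespace - 1) + (sizeof digits - 1) +
    (sizeof letters - 1) + (sizeof punctuation - 1);

// 5 + 10 + 52 + 29
static_assert(basic_source_character_count == 96,
              "the four groups should add up to 96 characters");
```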

Phase 2

1) Whenever a backslash appears at the end of a line (immediately followed by the newline character), both the backslash and the newline are deleted, combining two physical source lines into one logical source line. This is a single-pass operation: a line ending in two backslashes followed by an empty line does not combine three lines into one. If a universal character name (\uXXXX) is formed in this phase, the behavior is undefined.
2) If a non-empty source file does not end with a newline character after this step (whether it had no newline originally, or it ended with a backslash):
  • the behavior is undefined (until C++11)
  • a terminating newline character is added (since C++11)
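As an illustration of line splicing, the sketch below splits an identifier, a macro definition, and a string literal across physical lines. The names (spliced_value, ADD, message) are made up for this example. Note that the continuation lines of the identifier and the string start in the first column, because any leading whitespace would become part of the spliced line.

```cpp
#include <string>

// The backslash-newline pair is deleted in phase 2, so the identifier
// below is a single token even though it spans two physical lines.
int spliced_va\
lue = 42;

// The usual use of splicing: a macro definition continued on the next line.
#define ADD(a, b) \
    ((a) + (b))

// Splicing inside a string literal: no newline ends up in the string.
const char* message = "abc\
def";
```

After phase 2 the declarations read `int spliced_value = 42;` and `const char* message = "abcdef";`.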

Phase 3

1) The source file is decomposed into comments, sequences of whitespace characters (space, horizontal tab, new-line, vertical tab, and form-feed), and preprocessing tokens, which are the following:
a) header names: <iostream> or "myfile.h"
b) identifiers
c) preprocessing numbers
d) character and string literals, including user-defined
e) operators and punctuators (including alternative tokens), such as +, <<=, new, <%, ##, and the alternative token and
f) individual non-whitespace characters that do not fit in any other category
2) Each comment is replaced by one space character.
3) Newlines are kept, and it's implementation-defined whether non-newline whitespace sequences may be collapsed into single space characters.
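Two effects of this phase are visible in ordinary code: a comment separates tokens exactly like a space would, and alternative tokens are recognized as operators and punctuators. The sketch below uses made-up names (STRINGIZE, answer, both, spelled). Some compilers accept the `and` keyword only in a conforming mode.

```cpp
#include <string>

#define STRINGIZE(x) #x

// The comment is replaced by one space, so `int` and `answer`
// remain two separate tokens.
int/* replaced by one space */answer = 42;

// <% and %> are alternative tokens for { and }, and `and` stands for &&.
bool both(bool a, bool b) <% return a and b; %>

// Stringizing makes the space visible: the empty comment between
// a and b turns into whitespace, which # renders as a single space.
const std::string spelled = STRINGIZE(a/**/b);  // "a b"
```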

Phase 4

1) The preprocessor is executed.
2) Each file introduced with the #include directive goes through phases 1 through 4, recursively.
3) At the end of this phase, all preprocessor directives are removed from the source.
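A small sketch of what this phase does: macros are expanded, conditional sections are selected, and the directives themselves disappear. FEATURE_LEVEL and SQUARE are hypothetical macros chosen for this illustration.

```cpp
#define FEATURE_LEVEL 2           // hypothetical configuration macro
#define SQUARE(x) ((x) * (x))     // function-like macro

// Only one branch survives phase 4. The #if/#else/#endif lines
// themselves are removed from the source.
#if FEATURE_LEVEL >= 2
constexpr int level_squared = SQUARE(FEATURE_LEVEL);
#else
constexpr int level_squared = 0;
#endif

static_assert(level_squared == 4, "SQUARE(2) expands to ((2) * (2))");
```

What reaches the later phases is just `constexpr int level_squared = ((2) * (2));` followed by the static_assert.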

Phase 5

1) All characters in character literals and string literals are converted from the source character set to the execution character set.
2) Escape sequences and universal character names in character literals and non-raw string literals are expanded and converted to the execution character set. If the character specified by a universal character name isn't a member of the execution character set, the result is implementation-defined, but is guaranteed not to be a null (wide) character.
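The effect of escape sequence expansion can be observed through literal sizes. This is a sketch that assumes a C++11 or later compiler. The checks themselves do not depend on a particular execution character set, since they only compare sizes and equivalent escapes.

```cpp
// "a\tb" holds three characters plus the terminating null:
// the two-character escape \t became one tab character.
static_assert(sizeof("a\tb") == 4, "escape sequence expands to one char");

// Hexadecimal and octal escapes can name the same value (0x41 == 0101).
static_assert('\x41' == '\101', "hex and octal escapes agree");

// In a UTF-8 literal, U+00E9 is encoded as two code units.
static_assert(sizeof(u8"\u00e9") == 3, "two code units plus terminator");

// Raw string literals are excluded: backslash and 'n' stay as they are.
const char raw[] = R"(\n)";
static_assert(sizeof(raw) == 3, "backslash, 'n', terminator");
```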

Phase 6

Adjacent string literals are concatenated.
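For illustration, the sketch below concatenates literals, one of which comes from a macro (APP_NAME and banner are made-up names). It also shows a consequence of the phase order: because escape sequences are expanded in phase 5, before concatenation, "\x4" "1" is not the same as "\x41".

```cpp
#include <string>

#define APP_NAME "demo"   // hypothetical macro expanding to a string literal

// Four adjacent literals become the single literal "Running demo v1.0".
const std::string banner = "Running " APP_NAME " v" "1.0";

// Escapes are expanded first, so this is '\x04' followed by '1'
// (two characters plus the terminator), not the single character '\x41'.
static_assert(sizeof("\x4" "1") == 3, "two characters after concatenation");
static_assert(sizeof("\x41") == 2, "one character");
```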

Phase 7

Compilation takes place: the tokens are syntactically and semantically analyzed and translated as a translation unit.

Phase 8

Each translation unit is examined to produce a list of required template instantiations, including the ones requested by explicit instantiations. The definitions of the templates are located, and the required instantiations are performed to produce instantiation units.
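A minimal sketch of the two ways an instantiation can end up on that list: an explicit instantiation definition and an implicit instantiation caused by a use. The template twice and the function twice_of_half are hypothetical names used only here.

```cpp
template <typename T>
T twice(T value) { return value + value; }

// Explicit instantiation definition: requests twice<int> even if
// nothing in this translation unit calls it.
template int twice<int>(int);

// Implicit instantiation: the call below requires twice<double>.
double twice_of_half() { return twice(0.5); }
```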

Phase 9

Translation units, instantiation units, and library components needed to satisfy external references are collected into a program image which contains information needed for execution in its execution environment.