\input texinfo

@c Copyright (C) 2022 Richard Stallman and Free Software Foundation, Inc.
@c (The work of Trevis Rothwell and Nelson Beebe has been assigned or
@c licensed to the FSF.)

@c move alignment later?

@setfilename ./c
@settitle GNU C Language Manual
@documentencoding UTF-8

@c Merge variable index into the function index.
@synindex vr fn

@copying
Copyright @copyright{} 2022 Richard Stallman and Free Software Foundation, Inc.

(The work of Trevis Rothwell and Nelson Beebe has been assigned or
licensed to the FSF.)

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``GNU General Public License,'' with the
Front-Cover Texts being ``A GNU Manual,'' and with the Back-Cover
Texts as in (a) below. A copy of the license is included in the
section entitled ``GNU Free Documentation License.''

(a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
modify this GNU manual.''
@end quotation
@end copying

@dircategory Programming
@direntry
* C: (c).       GNU C Language Intro and Reference Manual
@end direntry

@titlepage
@sp 6
@center @titlefont{GNU C Language Introduction}
@center @titlefont{and Reference Manual}
@sp 4
@c @center @value{EDITION} Edition
@sp 5
@center Richard Stallman
@center and
@center Trevis Rothwell
@center plus Nelson Beebe
@center on floating point
@page
@vskip 0pt plus 1filll
@insertcopying
@sp 2
@ignore
WILL BE Published by the Free Software Foundation @*
51 Franklin Street, Fifth Floor @*
Boston, MA 02110-1301 USA @*
ISBN ?-??????-??-?
@end ignore

@ignore
@sp 1
Cover art by J. Random Artist
@end ignore
@end titlepage

@summarycontents
@contents

@node Top
@ifnottex
@top GNU C Manual
@end ifnottex
@iftex
@top Preface
@end iftex

This manual explains the C language for use with the GNU Compiler
Collection (GCC) on the GNU/Linux system and other systems.
We refer to this dialect as GNU C. If you already know C, you can
use this as a reference manual.

If you understand basic concepts of programming but know nothing about
C, you can read this manual sequentially from the beginning to learn
the C language.

If you are a beginner in programming, we recommend you first learn a
language with automatic garbage collection and no explicit pointers,
rather than starting with C@. Good choices include Lisp, Scheme,
Python and Java. C's explicit pointers mean that programmers must be
careful to avoid certain kinds of errors.

C is a venerable language; it was first used in 1973. The GNU C
Compiler, which was subsequently extended into the GNU Compiler
Collection, was first released in 1987. Other important languages
were designed based on C: once you know C, it gives you a useful base
for learning C@t{++}, C#, Java, Scala, D, Go, and more.

The special advantage of C is that it is fairly simple while allowing
close access to the computer's hardware, which previously required
writing in assembler language to describe the individual machine
instructions. Some have called C a ``high-level assembler language''
because of its explicit pointers and lack of automatic management of
storage. As one wag put it, ``C combines the power of assembler
language with the convenience of assembler language.'' However, C is
far more portable, and much easier to read and write, than assembler
language.

This manual describes the GNU C language supported by the GNU Compiler
Collection, as of roughly 2017. Please inform us of any changes
needed to match the current version of GNU C.

When a construct may be absent or work differently in other C
compilers, we say so. When it is not part of ISO standard C, we say
it is a ``GNU C extension,'' because it is useful to know that.
However, standards and other dialects are secondary topics for this
manual. For simplicity's sake, we keep those notes short, unless it
is vital to say more.
Likewise, we hardly mention C@t{++} or other languages that the GNU
Compiler Collection supports. We hope this manual will serve as a
base for writing manuals for those languages, but languages so
different can't share one common manual.

Some aspects of the meaning of C programs depend on the target
platform: which computer, and which operating system, the compiled
code will run on. Where this is the case, we say so.

The C language provides no built-in facilities for performing such
common operations as input/output, memory management, string
manipulation, and the like. Instead, these facilities are provided by
functions defined in the standard library, which is automatically
available in every C program. @xref{Top, The GNU C Library, , libc,
The GNU C Library Reference Manual}.

GNU/Linux systems use the GNU C Library to do this job. It is itself
a C program, so once you know C you can read its source code and see
how its library functions do their jobs. Some fraction of the
functions are implemented as @dfn{system calls}, which means they
contain a special instruction that asks the system kernel (Linux) to
do a specific task. To understand how those are implemented, you'd
need to read Linux source code instead. Whether a library function is
a system call is an internal implementation detail that makes no
difference for how to call the function.

This manual incorporates the former GNU C Preprocessor Manual, which
was among the earliest GNU manuals. It also uses some text from the
earlier GNU C Manual that was written by Trevis Rothwell and James
Youngman.

GNU C has many obscure features, each one either for historical
compatibility or meant for very special situations. We have left them
to a companion manual, the GNU C Obscurities Manual, which will be
published digitally later.

Please report errors and suggestions to c-manual@@gnu.org.

@menu
* The First Example::             Getting started with basic C code.
* Complete Program::              A whole example program that can be compiled and run.
* Storage::                       Basic layout of storage; bytes.
* Beyond Integers::               Exploring different numeric types.
* Lexical Syntax::                The various lexical components of C programs.
* Arithmetic::                    Numeric computations.
* Assignment Expressions::        Storing values in variables.
* Execution Control Expressions:: Expressions combining values in various ways.
* Binary Operator Grammar::       An overview of operator precedence.
* Order of Execution::            The order of program execution.
* Primitive Types::               More details about primitive data types.
* Constants::                     Explicit constant values: details and examples.
* Type Size::                     The memory space occupied by a type.
* Pointers::                      Creating and manipulating memory pointers.
* Structures::                    Compound data types built by grouping other types.
* Arrays::                        Creating and manipulating arrays.
* Enumeration Types::             Sets of integers with named values.
* Defining Typedef Names::        Using @code{typedef} to define type names.
* Statements::                    Controlling program flow.
* Variables::                     Details about declaring, initializing, and using variables.
* Type Qualifiers::               Mark variables for certain intended uses.
* Functions::                     Declaring, defining, and calling functions.
* Compatible Types::              How to tell if two types are compatible with each other.
* Type Conversions::              Converting between types.
* Scope::                         Different categories of identifier scope.
* Preprocessing::                 Using the GNU C preprocessor.
* Integers in Depth::             How integer numbers are represented.
* Floating Point in Depth::       How floating-point numbers are represented.
* Compilation::                   How to compile multi-file programs.
* Directing Compilation::         Operations that affect compilation but don't change the program.

Appendices

* Type Alignment::                Where in memory a type can validly start.
* Aliasing::                      Accessing the same data in two types.
* Digraphs::                      Two-character aliases for some characters.
* Attributes::                    Specifying additional information in a declaration.
* Signals::                       Fatal errors triggered in various scenarios.
* GNU Free Documentation License:: The license for this manual.
* Symbol Index::                  Keyword and symbol index.
* Concept Index::                 Detailed topical index.

@detailmenu
 --- The Detailed Node Listing ---

* Recursive Fibonacci::           Writing a simple function recursively.
* Stack::                         Each function call uses space in the stack.
* Iterative Fibonacci::           Writing the same function iteratively.

* Complete Example::              Turn the simple function into a full program.
* Complete Explanation::          Explanation of each part of the example.
* Complete Line-by-Line::         Explaining each line of the example.
* Compile Example::               Using GCC to compile the example.

* Float Example::                 A function that uses floating-point numbers.
* Array Example::                 A function that works with arrays.
* Array Example Call::            How to call that function.
* Array Example Variations::      Different ways to write the call example.

Lexical Syntax

* English::                       Write programs in English!
* Characters::                    The characters allowed in C programs.
* Whitespace::                    The particulars of whitespace characters.
* Comments::                      How to include comments in C code.
* Identifiers::                   How to form identifiers (names).
* Operators/Punctuation::         Characters used as operators or punctuation.
* Line Continuation::             Splitting one line into multiple lines.
* Digraphs::                      Two-character substitutes for some characters.

Arithmetic

* Basic Arithmetic::              Addition, subtraction, multiplication, and division.
* Integer Arithmetic::            How C performs arithmetic with integer values.
* Integer Overflow::              When an integer value exceeds the range of its type.
* Mixed Mode::                    Calculating with both integer values and floating-point values.
* Division and Remainder::        How integer division works.
* Numeric Comparisons::           Comparing numeric values for equality or order.
* Shift Operations::              Shift integer bits left or right.
* Bitwise Operations::            Bitwise conjunction, disjunction, negation.

Assignment Expressions

* Simple Assignment::             The basics of storing a value.
* Lvalues::                       Expressions into which a value can be stored.
* Modifying Assignment::          Shorthand for changing an lvalue's contents.
* Increment/Decrement::           Shorthand for incrementing and decrementing an lvalue's contents.
* Postincrement/Postdecrement::   Accessing then incrementing or decrementing.
* Assignment in Subexpressions::  How to avoid ambiguity.
* Write Assignments Separately::  Write assignments as separate statements.

Execution Control Expressions

* Logical Operators::             Logical conjunction, disjunction, negation.
* Logicals and Comparison::       Logical operators with comparison operators.
* Logicals and Assignments::      Assignments with logical operators.
* Conditional Expression::        An if/else construct inside expressions.
* Comma Operator::                Build a sequence of subexpressions.

Order of Execution

* Reordering of Operands::        Operations in C are not necessarily computed in the order they are written.
* Associativity and Ordering::    Some associative operations are performed in a particular order; others are not.
* Sequence Points::               Some guarantees about the order of operations.
* Postincrement and Ordering::    Ambiguous execution order with postincrement.
* Ordering of Operands::          Evaluation order of operands and function arguments.
* Optimization and Ordering::     Compiler optimizations can reorder operations only if doing so has no impact on program results.

Primitive Data Types

* Integer Types::                 Description of integer types.
* Floating-Point Data Types::     Description of floating-point types.
* Complex Data Types::            Description of complex number types.
* The Void Type::                 A type indicating no value at all.
* Other Data Types::              A brief summary of other types.

Constants

* Integer Constants::             Literal integer values.
* Integer Const Type::            Types of literal integer values.
* Floating Constants::            Literal floating-point values.
* Imaginary Constants::           Literal imaginary number values.
* Invalid Numbers::               Avoiding preprocessing number misconceptions.
* Character Constants::           Literal character values.
* Unicode Character Codes::       Unicode characters represented in either UTF-16 or UTF-32.
* Wide Character Constants::      Literal character values larger than 8 bits.
* String Constants::              Literal string values.
* UTF-8 String Constants::        Literal UTF-8 string values.
* Wide String Constants::         Literal string values made up of 16- or 32-bit characters.

Pointers

* Address of Data::               Using the ``address-of'' operator.
* Pointer Types::                 For each type, there is a pointer type.
* Pointer Declarations::          Declaring variables with pointer types.
* Pointer Type Designators::      Designators for pointer types.
* Pointer Dereference::           Accessing what a pointer points at.
* Null Pointers::                 Pointers which do not point to any object.
* Invalid Dereference::           Dereferencing null or invalid pointers.
* Void Pointers::                 Totally generic pointers, can cast to any.
* Pointer Comparison::            Comparing memory address values.
* Pointer Arithmetic::            Computing memory address values.
* Pointers and Arrays::           Using pointer syntax instead of array syntax.
* Low-Level Pointer Arithmetic::  More about computing memory address values.
* Pointer Increment/Decrement::   Incrementing and decrementing pointers.
* Pointer Arithmetic Drawbacks::  A common pointer bug to watch out for.
* Pointer-Integer Conversion::    Converting pointer types to integer types.
* Printing Pointers::             Using @code{printf} for a pointer's value.

Structures

* Referencing Fields::            Accessing field values in a structure object.
* Arrays as Fields::              Accessing array fields in a structure object.
* Dynamic Memory Allocation::     Allocating space for objects while the program is running.
* Field Offset::                  Memory layout of fields within a structure.
* Structure Layout::              Planning the memory layout of fields.
* Packed Structures::             Packing structure fields as close as possible.
* Bit Fields::                    Dividing integer fields into fields with fewer bits.
* Bit Field Packing::             How bit fields pack together in integers.
* const Fields::                  Making structure fields immutable.
* Zero Length::                   Zero-length array as a variable-length object.
* Flexible Array Fields::         Another approach to variable-length objects.
* Overlaying Structures::         Casting one structure type over an object of another structure type.
* Structure Assignment::          Assigning values to structure objects.
* Unions::                        Viewing the same object in different types.
* Packing With Unions::           Using a union type to pack various types into the same memory space.
* Cast to Union::                 Casting a value of one of the union's alternative types to the type of the union itself.
* Structure Constructors::        Building new structure objects.
* Unnamed Types as Fields::       Fields' types do not always need names.
* Incomplete Types::              Types which have not been fully defined.
* Intertwined Incomplete Types::  Defining mutually-recursive structure types.
* Type Tags::                     Scope of structure and union type tags.

Arrays

* Accessing Array Elements::      How to access individual elements of an array.
* Declaring an Array::            How to name and reserve space for a new array.
* Strings::                       A string in C is a special case of array.
* Incomplete Array Types::        Naming, but not allocating, a new array.
* Limitations of C Arrays::       Arrays are not first-class objects.
* Multidimensional Arrays::       Arrays of arrays.
* Constructing Array Values::     Assigning values to an entire array at once.
* Arrays of Variable Length::     Declaring arrays of non-constant size.

Statements

* Expression Statement::          Evaluate an expression, as a statement, usually done for a side effect.
* if Statement::                  Basic conditional execution.
* if-else Statement::             Multiple branches for conditional execution.
* Blocks::                        Grouping multiple statements together.
* return Statement::              Return a value from a function.
* Loop Statements::               Repeatedly executing a statement or block.
* switch Statement::              Multi-way conditional choices.
* switch Example::                A plausible example of using @code{switch}.
* Duffs Device::                  A special way to use @code{switch}.
* Case Ranges::                   Ranges of values for @code{switch} cases.
* Null Statement::                A statement that does nothing.
* goto Statement::                Jump to another point in the source code, identified by a label.
* Local Labels::                  Labels with limited scope.
* Labels as Values::              Getting the address of a label.
* Statement Exprs::               A series of statements used as an expression.

Variables

* Variable Declarations::         Name a variable and reserve space for it.
* Initializers::                  Assigning initial values to variables.
* Designated Inits::              Assigning initial values to array elements at particular array indices.
* Auto Type::                     Obtaining the type of a variable.
* Local Variables::               Variables declared in function definitions.
* File-Scope Variables::          Variables declared outside of function definitions.
* Static Local Variables::        Variables declared within functions, but with permanent storage allocation.
* Extern Declarations::           Declaring a variable which is allocated somewhere else.
* Allocating File-Scope::         When is space allocated for file-scope variables?
* auto and register::             Historically used storage directions.
* Omitting Types::                The bad practice of declaring variables with implicit type.

Type Qualifiers

* const::                         Variables whose values don't change.
* volatile::                      Variables whose values may be accessed or changed outside of the control of this program.
* restrict Pointers::             Restricted pointers for code optimization.
* restrict Pointer Example::      Example of how that works.

Functions

* Function Definitions::          Writing the body of a function.
* Function Declarations::         Declaring the interface of a function.
* Function Calls::                Using functions.
* Function Call Semantics::       Call-by-value argument passing.
* Function Pointers::             Using references to functions.
* The main Function::             Where execution of a GNU C program begins.

Type Conversions

* Explicit Type Conversion::      Casting a value from one type to another.
* Assignment Type Conversions::   Automatic conversion by assignment operation.
* Argument Promotions::           Automatic conversion of function parameters.
* Operand Promotions::            Automatic conversion of arithmetic operands.
* Common Type::                   When operand types differ, which one is used?

Scope

* Scope::                         Different categories of identifier scope.

Preprocessing

* Preproc Overview::              Introduction to the C preprocessor.
* Directives::                    The form of preprocessor directives.
* Preprocessing Tokens::          The lexical elements of preprocessing.
* Header Files::                  Including one source file in another.
* Macros::                        Macro expansion by the preprocessor.
* Conditionals::                  Controlling whether to compile some lines or ignore them.
* Diagnostics::                   Reporting warnings and errors.
* Line Control::                  Reporting source line numbers.
* Null Directive::                A preprocessing no-op.

Integers in Depth

* Integer Representations::       How integer values appear in memory.
* Maximum and Minimum Values::    Value ranges of integer types.

Floating Point in Depth

* Floating Representations::      How floating-point values appear in memory.
* Floating Type Specs::           Precise details of memory representations.
* Special Float Values::          Infinity, Not a Number, and Subnormal Numbers.
* Invalid Optimizations::         Don't mess up non-numbers and signed zeros.
* Exception Flags::               Handling certain conditions in floating point.
* Exact Floating-Point::          Not all floating calculations lose precision.
* Rounding::                      When a floating result can't be represented exactly in the floating-point type in use.
* Rounding Issues::               Avoid magnifying rounding errors.
* Significance Loss::             Subtracting numbers that are almost equal.
* Fused Multiply-Add::            Taking advantage of a special floating-point instruction for faster execution.
* Error Recovery::                Determining rounding errors.
* Exact Floating Constants::      Precisely specified floating-point numbers.
* Handling Infinity::             When floating calculation is out of range.
* Handling NaN::                  When floating calculation is undefined.
* Signed Zeros::                  Positive zero vs. negative zero.
* Scaling by the Base::           A useful exact floating-point operation.
* Rounding Control::              Specifying some rounding behaviors.
* Machine Epsilon::               The smallest number you can add to 1.0 and get a sum which is larger than 1.0.
* Complex Arithmetic::            Details of arithmetic with complex numbers.
* Round-Trip Base Conversion::    What happens between base-2 and base-10.
* Further Reading::               References for floating-point numbers.

Directing Compilation

* Pragmas::                       Controlling compilation of some constructs.
* Static Assertions::             Compile-time tests for conditions.

@end detailmenu
@end menu

@node The First Example
@chapter The First Example

This chapter presents the source code for a very simple C program and
uses it to explain a few features of the language. If you already
know the basic points of C presented in this chapter, you can skim it
or skip it.

We present examples of C source code (other than comments) using a
fixed-width typeface, since that's the way they look when you edit
them in an editor such as GNU Emacs.

@menu
* Recursive Fibonacci::           Writing a simple function recursively.
* Stack::                         Each function call uses space in the stack.
* Iterative Fibonacci::           Writing the same function iteratively.
@end menu

@node Recursive Fibonacci
@section Example: Recursive Fibonacci
@cindex recursive Fibonacci function
@cindex Fibonacci function, recursive

To introduce the most basic features of C, let's look at code for a
simple mathematical function that does calculations on integers. This
function calculates the @var{n}th number in the Fibonacci series, in
which each number is the sum of the previous two: 1, 1, 2, 3, 5, 8,
13, 21, 34, 55, @dots{}.
@example
int
fib (int n)
@{
  if (n <= 2)  /* @r{This avoids infinite recursion.} */
    return 1;
  else
    return fib (n - 1) + fib (n - 2);
@}
@end example

This very simple program illustrates several features of C:

@itemize @bullet
@item
A function definition, whose first two lines constitute the function
header. @xref{Function Definitions}.

@item
A function parameter @code{n}, referred to as the variable @code{n}
inside the function body. @xref{Function Parameter Variables}.
A function definition uses parameters to refer to the argument
values provided in a call to that function.

@item
Arithmetic. C programs add with @samp{+} and subtract with
@samp{-}. @xref{Arithmetic}.

@item
Numeric comparisons. The operator @samp{<=} tests for ``less than or
equal.'' @xref{Numeric Comparisons}.

@item
Integer constants written in base 10.
@xref{Integer Constants}.

@item
A function call. The function call @code{fib (n - 1)} calls the
function @code{fib}, passing as its argument the value @code{n - 1}.
@xref{Function Calls}.

@item
A comment, which starts with @samp{/*} and ends with @samp{*/}. The
comment has no effect on the execution of the program. Its purpose is
to provide explanations to people reading the source code. Including
comments in the code is tremendously important---they provide
background information so others can understand the code more quickly.
@xref{Comments}.

In this manual, we present comment text in the variable-width typeface
used for the text of the chapters, not in the fixed-width typeface
used for the rest of the code. That is to make comments easier to
read. This distinction of typeface does not exist in a real file of C
source code.

@item
Two kinds of statements, the @code{return} statement and the
@code{if}@dots{}@code{else} statement. @xref{Statements}.

@item
Recursion. The function @code{fib} calls itself; that is called a
@dfn{recursive call}. These are valid in C, and quite common.

The @code{fib} function would not be useful if it didn't return.
Thus, recursive definitions, to be of any use, must avoid
@dfn{infinite recursion}.

This function definition prevents infinite recursion by specially
handling the case where @code{n} is two or less. Thus the maximum
depth of recursive calls is less than @code{n}.
@end itemize

@menu
* Function Header::               The function's name and how it is called.
* Function Body::                 Declarations and statements that implement the function.
@end menu

@node Function Header
@subsection Function Header
@cindex function header

In our example, the first two lines of the function definition are the
@dfn{header}. Its purpose is to state the function's name and say how
it is called:

@example
int
fib (int n)
@end example

@noindent
says that the function returns an integer (type @code{int}), its name is
@code{fib}, and it takes one argument named @code{n} which is also an
integer. (Data types will be explained later, in @ref{Primitive Types}.)

@node Function Body
@subsection Function Body
@cindex function body
@cindex recursion

The rest of the function definition is called the @dfn{function body}.
Like every function body, this one starts with @samp{@{}, ends with
@samp{@}}, and contains zero or more @dfn{statements} and
@dfn{declarations}. Statements specify actions to take, whereas
declarations define names of variables, functions, and so on. Each
statement and each declaration ends with a semicolon (@samp{;}).

Statements and declarations often contain @dfn{expressions}; an
expression is a construct whose execution produces a @dfn{value} of
some data type, but may also take actions through ``side effects''
that alter subsequent execution. A statement, by contrast, does not
have a value; it affects further execution of the program only through
the actions it takes.

This function body contains no declarations, and just one statement,
but that one is a complex statement in that it contains nested
statements.
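
To make the distinction between declarations, statements and
expressions concrete, here is a small sketch that is not part of the
Fibonacci example. The function name @code{twice_plus_one} and the
variable @code{doubled} are hypothetical, chosen only for
illustration. Its body contains one declaration and two statements,
and each statement contains an expression.

```c
/* Hypothetical helper for illustration; not part of the fib example.  */
int
twice_plus_one (int n)
{
  int doubled;          /* A declaration: it defines the name doubled.  */

  doubled = n * 2;      /* A statement built from an expression;
                           storing into doubled is its side effect.  */
  return doubled + 1;   /* A statement that contains the expression
                           doubled + 1 and returns its value.  */
}
```

Under these assumptions, calling @code{twice_plus_one (5)} computes
5 * 2 + 1. The body of @code{fib}, by contrast, has no declarations
at all, and its single statement is the @code{if}@dots{}@code{else}
that encloses the two @code{return} statements.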

This function uses two kinds of statements:

@table @code
@item return
The @code{return} statement makes the function return immediately.
It looks like this:

@example
return @var{value};
@end example

Its meaning is to compute the expression @var{value} and exit the
function, making it return whatever value that expression produced.
For instance,

@example
return 1;
@end example

@noindent
returns the integer 1 from the function, and

@example
return fib (n - 1) + fib (n - 2);
@end example

@noindent
returns a value computed by performing two function calls
as specified and adding their results.

@item @code{if}@dots{}@code{else}
The @code{if}@dots{}@code{else} statement is a @dfn{conditional}.
Each time it executes, it chooses one of its two substatements to execute
and ignores the other. It looks like this:

@example
if (@var{condition})
  @var{if-true-statement}
else
  @var{if-false-statement}
@end example

Its meaning is to compute the expression @var{condition} and, if it's
``true,'' execute @var{if-true-statement}. Otherwise, execute
@var{if-false-statement}. @xref{if-else Statement}.

Inside the @code{if}@dots{}@code{else} statement, @var{condition} is
simply an expression. It's considered ``true'' if its value is
nonzero. (A comparison operation, such as @code{n <= 2}, produces the
value 1 if it's ``true'' and 0 if it's ``false.'' @xref{Numeric
Comparisons}.) Thus,

@example
if (n <= 2)
  return 1;
else
  return fib (n - 1) + fib (n - 2);
@end example

@noindent
first tests whether the value of @code{n} is less than or equal to 2.
If so, the expression @code{n <= 2} has the value 1. So execution
continues with the statement

@example
return 1;
@end example

@noindent
Otherwise, execution continues with this statement:

@example
return fib (n - 1) + fib (n - 2);
@end example

Each of these statements ends the execution of the function and
provides a value for it to return. @xref{return Statement}.
@end table

Calculating @code{fib} using ordinary integers in C works only for
@var{n} < 47, because the value of @code{fib (47)} is too large to fit
in type @code{int}, which is typically 32 bits wide. The addition
operation that tries to add @code{fib (46)} and @code{fib (45)} cannot
deliver the correct result. This occurrence is called @dfn{integer
overflow}.

Overflow can manifest itself in various ways, but one thing that can't
possibly happen is to produce the correct value, since that can't fit
in the space for the value. @xref{Integer Overflow}.

@xref{Functions}, for a full explanation about functions.

@node Stack
@section The Stack, And Stack Overflow
@cindex stack
@cindex stack frame
@cindex stack overflow
@cindex recursion, drawbacks of

Recursion has a drawback: there are limits to how many nested levels of
function calls a program can make. In C, each function call allocates
a block of memory which it uses until the call returns. C allocates
these blocks consecutively within a large area of memory known as the
@dfn{stack}, so we refer to the blocks as @dfn{stack frames}.

The size of the stack is limited; if the program tries to use too
much, that causes the program to fail because the stack is full. This
is called @dfn{stack overflow}.

@cindex crash
@cindex segmentation fault
Stack overflow on GNU/Linux typically manifests itself as the
@dfn{signal} named @code{SIGSEGV}, also known as a ``segmentation
fault.'' By default, this signal terminates the program immediately,
rather than letting the program try to recover, or reach an expected
ending point. (We commonly say in this case that the program
``crashes.'') @xref{Signals}.

It is inconvenient to observe a crash by passing too large
an argument to recursive Fibonacci, because the program would run a
long time before it crashes. This algorithm is simple but
ridiculously slow: in calculating @code{fib (@var{n})}, the number of
(recursive) calls @code{fib (1)} or @code{fib (2)} that it makes equals
the final result.
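
If you would like to check that claim, one possible sketch follows.
It is a variant written for illustration, not the manual's own code:
the names @code{counting_fib}, @code{count_base_calls} and
@code{base_calls} are hypothetical, and it relies on a file-scope
variable and an internal block, features that later sections explain.
It counts how many calls reach the @code{n <= 2} case.

```c
/* Hypothetical counter, introduced only for this illustration.  */
int base_calls;

/* Same recursion as fib, but it also counts the calls
   that end the recursion.  */
int
counting_fib (int n)
{
  if (n <= 2)
    {
      ++base_calls;   /* One more call with n equal to 1 or 2.  */
      return 1;
    }
  else
    return counting_fib (n - 1) + counting_fib (n - 2);
}

/* Reset the counter, run the recursion for n, and
   return the number of base-case calls it made.  */
int
count_base_calls (int n)
{
  base_calls = 0;
  counting_fib (n);
  return base_calls;
}
```

Under these assumptions, @code{count_base_calls (10)} and
@code{counting_fib (10)} both yield 55. The count grows as fast as
the Fibonacci numbers themselves, which is why a large argument keeps
the program busy for so long.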

However, you can observe stack overflow very quickly if you use
this function instead:

@example
int
fill_stack (int n)
@{
  if (n <= 1)  /* @r{This limits the depth of recursion.} */
    return 1;
  else
    return fill_stack (n - 1);
@}
@end example

Under gNewSense GNU/Linux on the Lemote Yeeloong, without optimization
and using the default configuration, an experiment showed there is
enough stack space to do 261906 nested calls to that function. One
more, and the stack overflows and the program crashes. On another
platform, with a different configuration, or with a different
function, the limit might be bigger or smaller.

@node Iterative Fibonacci
@section Example: Iterative Fibonacci
@cindex iterative Fibonacci function
@cindex Fibonacci function, iterative

Here's a much faster algorithm for computing the same Fibonacci
series. It is faster for two reasons. First, it uses @dfn{iteration}
(that is, repetition or looping) rather than recursion, so it doesn't
take time for a large number of function calls. But mainly, it is
faster because the number of repetitions is small---only @code{@var{n}}.

@c If you change this, change the duplicate in node Example of for.
@example
int
fib (int n)
@{
  int last = 1;   /* @r{Initial value is @code{fib (1)}.} */
  int prev = 0;   /* @r{Initial value controls @code{fib (2)}.} */
  int i;

  for (i = 1; i < n; ++i)
    /* @r{If @code{n} is 1 or less, the loop runs zero times,} */
    /* @r{since @code{i < n} is false the first time.} */
    @{
      /* @r{Now @code{last} is @code{fib (@code{i})}}
         @r{and @code{prev} is @code{fib (@code{i} @minus{} 1)}.} */
      /* @r{Compute @code{fib (@code{i} + 1)}.} */
      int next = prev + last;
      /* @r{Shift the values down.} */
      prev = last;
      last = next;
      /* @r{Now @code{last} is @code{fib (@code{i} + 1)}}
         @r{and @code{prev} is @code{fib (@code{i})}.}
         @r{But that won't stay true for long,}
         @r{because we are about to increment @code{i}.} */
    @}

  return last;
@}
@end example

This definition computes @code{fib (@var{n})} in a time proportional
to @code{@var{n}}. The comments in the definition explain how it works: it
advances through the series, always keeps the last two values in
@code{last} and @code{prev}, and adds them to get the next value.

Here are the additional C features that this definition uses:

@table @asis
@item Internal blocks
Within a function, wherever a statement is called for, you can write a
@dfn{block}. It looks like @code{@{ @r{@dots{}} @}} and contains zero or
more statements and declarations. (You can also use additional
blocks as statements in a block.)

The function body also counts as a block, which is why it can contain
statements and declarations.

@xref{Blocks}.

@item Declarations of local variables
This function body contains declarations as well as statements. There
are three declarations directly in the function body, as well as a
fourth declaration in an internal block. Each starts with @code{int}
because it declares a variable whose type is integer. One declaration
can declare several variables, but each of these declarations is
simple and declares just one variable.
Variables declared inside a block (either a function body or an
internal block) are @dfn{local variables}.  These variables exist only
within that block; their names are not defined outside the block, and
exiting the block deallocates their storage.  This example declares
four local variables: @code{last}, @code{prev}, @code{i}, and
@code{next}.

The most basic local variable declaration looks like this:

@example
@var{type} @var{variablename};
@end example

For instance,

@example
int i;
@end example

@noindent
declares the local variable @code{i} as an integer.
@xref{Variable Declarations}.

@item Initializers
When you declare a variable, you can also specify its initial value,
like this:

@example
@var{type} @var{variablename} = @var{value};
@end example

For instance,

@example
int last = 1;
@end example

@noindent
declares the local variable @code{last} as an integer (type
@code{int}) and starts it off with the value 1.  @xref{Initializers}.

@item Assignment
An assignment is a specific kind of expression, written with the
@samp{=} operator, that stores a new value in a variable or other
place.  Thus,

@example
@var{variable} = @var{value}
@end example

@noindent
is an expression that computes @code{@var{value}} and stores the value in
@code{@var{variable}}.  @xref{Assignment Expressions}.

@item Expression statements
An expression statement is an expression followed by a semicolon.
That computes the value of the expression, then ignores the value.

An expression statement is useful when the expression changes some
data or has other side effects---for instance, with function calls, or
with assignments as in this example.  @xref{Expression Statement}.

Using an expression with no side effects in an expression statement is
pointless except in very special cases.  For instance, the expression
statement @code{x;} would examine the value of @code{x} and ignore it.
That is not useful.

@item Increment operator
The increment operator is @samp{++}.
@code{++i} is an expression that is short for @code{i = i + 1}.
@xref{Increment/Decrement}.

@item @code{for} statements
A @code{for} statement is a clean way of executing a statement
repeatedly---a @dfn{loop} (@pxref{Loop Statements}).  Specifically,

@example
for (i = 1; i < n; ++i)
  @var{body}
@end example

@noindent
means to start by doing @code{i = 1} (set @code{i} to one) to prepare
for the loop.  The loop itself consists of

@itemize @bullet
@item
Testing @code{i < n} and exiting the loop if that's false.

@item
Executing @var{body}.

@item
Advancing the loop (executing @code{++i}, which increments @code{i}).
@end itemize

The net result is to execute @var{body} with 1 in @code{i},
then with 2 in @code{i}, and so on, stopping just before the repetition
where @code{i} would equal @code{n}.  If @code{n} is 1 or less,
the loop will execute the body zero times, because the test
@code{i < n} is already false the first time.

The body of the @code{for} statement must be one and only one
statement.  You can't write two statements in a row there; if you try
to, only the first of them will be treated as part of the loop.

The way to put multiple statements in such a place is to group them
with a block, and that's what we do in this example.
@end table

@node Complete Program
@chapter A Complete Program
@cindex complete example program
@cindex example program, complete

It's all very well to write a Fibonacci function, but you cannot run
it by itself.  It is a useful program, but it is not a complete
program.

In this chapter we present a complete program that contains the
@code{fib} function.  This example shows how to make the program
start, how to make it finish, how to do computation, and how to print
a result.

@menu
* Complete Example::            Turn the simple function into a full program.
* Complete Explanation::        Explanation of each part of the example.
* Complete Line-by-Line::       Explaining each line of the example.
* Compile Example::             Using GCC to compile the example.
@end menu

@node Complete Example
@section Complete Program Example

Here is the complete program that uses the simple, recursive version
of the @code{fib} function (@pxref{Recursive Fibonacci}):

@example
#include <stdio.h>

int
fib (int n)
@{
  if (n <= 2)  /* @r{This avoids infinite recursion.} */
    return 1;
  else
    return fib (n - 1) + fib (n - 2);
@}

int
main (void)
@{
  printf ("Fibonacci series item %d is %d\n",
          20, fib (20));
  return 0;
@}
@end example

@noindent
This program prints a message that shows the value of @code{fib (20)}.

Now for an explanation of what that code means.

@node Complete Explanation
@section Complete Program Explanation

@ifnottex
Here's the explanation of the code of the example in the
previous section.
@end ifnottex

This sample program prints a message that shows the value of @code{fib
(20)}, and exits with code 0 (which stands for successful execution).

Every C program is started by running the function named @code{main}.
Therefore, the example program defines a function named @code{main} to
provide a way to start it.  Whatever that function does is what the
program does.  @xref{The main Function}.

The @code{main} function is the first one called when the program
runs, but it doesn't come first in the example code.  The order of the
function definitions in the source code makes no difference to the
program's meaning.

The initial call to @code{main} always passes certain arguments, but
@code{main} does not have to pay attention to them.  To ignore those
arguments, define @code{main} with @code{void} as the parameter list.
(@code{void} as a function's parameter list normally means ``call with
no arguments,'' but @code{main} is a special case.)

The function @code{main} returns 0 because that is
the conventional way for @code{main} to indicate successful execution.
It could instead return a positive integer to indicate failure, and
some utility programs have specific conventions for the meaning of
certain numeric @dfn{failure codes}.  @xref{Values from main}.
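
As a rough sketch of how a program might choose between success and
failure, the helper below compares a computed value against 6765, the
expected value of @code{fib (20)}.  The name @code{status_for_fib20}
is hypothetical and is not part of the example program;
@code{EXIT_SUCCESS} and @code{EXIT_FAILURE} are the standard status
macros declared in @file{stdlib.h}.

```c
#include <stdlib.h>   /* Declares EXIT_SUCCESS and EXIT_FAILURE.  */

/* Hypothetical helper, for illustration only: return the status
   that main could pass back, given the value computed for fib (20).  */
int
status_for_fib20 (int computed)
{
  if (computed == 6765)    /* The expected value of fib (20).  */
    return EXIT_SUCCESS;   /* Reports successful execution.  */
  else
    return EXIT_FAILURE;   /* Reports failure.  */
}
```

With such a helper, @code{main} could end with
@code{return status_for_fib20 (fib (20));} instead of always
returning 0.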
@cindex @code{printf} The simplest way to print text in C is by calling the @code{printf} function, so here we explain very briefly what that function does. For a full explanation of @code{printf} and the other standard I/O functions, see @ref{I/O on Streams, The GNU C Library, , libc, The GNU C Library Reference Manual}. @cindex standard output The first argument to @code{printf} is a @dfn{string constant} (@pxref{String Constants}) that is a template for output. The function @code{printf} copies most of that string directly as output, including the newline character at the end of the string, which is written as @samp{\n}. The output goes to the program's @dfn{standard output} destination, which in the usual case is the terminal. @samp{%} in the template introduces a code that substitutes other text into the output. Specifically, @samp{%d} means to take the next argument to @code{printf} and substitute it into the text as a decimal number. (The argument for @samp{%d} must be of type @code{int}; if it isn't, @code{printf} will malfunction.) So the output is a line that looks like this: @example Fibonacci series item 20 is 6765 @end example This program does not contain a definition for @code{printf} because it is defined by the C library, which makes it available in all C programs. However, each program does need to @dfn{declare} @code{printf} so it will be called correctly. The @code{#include} line takes care of that; it includes a @dfn{header file} called @file{stdio.h} into the program's code. That file is provided by the operating system and it contains declarations for the many standard input/output functions in the C library, one of which is @code{printf}. Don't worry about header files for now; we'll explain them later in @ref{Header Files}. The first argument of @code{printf} does not have to be a string constant; it can be any string (@pxref{Strings}). However, using a constant is the most common case. 
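
To illustrate the last point, here is a minimal sketch in which the
template reaches @code{printf} through a parameter rather than as a
string constant written in the call.  The function name
@code{print_item} and its parameter names are assumptions made for
this illustration.  The caller must supply a template containing
exactly two @samp{%d} codes, since the function passes two @code{int}
arguments.

```c
#include <stdio.h>

/* Hypothetical helper: print INDEX and VALUE using FORMAT,
   which must contain exactly two %d codes.
   Returns what printf returns: the number of characters printed.  */
int
print_item (const char *format, int index, int value)
{
  return printf (format, index, value);
}
```

For instance, @code{print_item ("Fibonacci series item %d is %d\n",
20, 6765)} prints the same line as the example program.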
@node Complete Line-by-Line
@section Complete Program, Line by Line

Here's the same example, explained line by line.
@strong{Beginners, do you find this helpful or not?
Would you prefer a different layout for the example?
Please tell rms@@gnu.org.}

@example
#include <stdio.h>      /* @r{Include declaration of usual} */
                        /*   @r{I/O functions such as @code{printf}.}  */
                        /* @r{Most programs need these.}  */

int                     /* @r{This function returns an @code{int}.}  */
fib (int n)             /* @r{Its name is @code{fib};}  */
                        /*   @r{its argument is called @code{n}.}  */
@{                       /* @r{Start of function body.}  */
  /* @r{This stops the recursion from being infinite.}  */
  if (n <= 2)           /* @r{If @code{n} is 1 or 2,}  */
    return 1;           /*   @r{make @code{fib} return 1.}  */
  else                  /* @r{otherwise, add the two previous}  */
                        /*   @r{Fibonacci numbers.}  */
    return fib (n - 1) + fib (n - 2);
@}

int                     /* @r{This function returns an @code{int}.}  */
main (void)             /* @r{Start here; ignore arguments.}  */
@{                       /* @r{Print message with numbers in it.}  */
  printf ("Fibonacci series item %d is %d\n",
          20, fib (20));
  return 0;             /* @r{Terminate program, report success.}  */
@}
@end example

@node Compile Example
@section Compiling the Example Program
@cindex compiling
@cindex executable file

Running a C program requires converting the source code into an
@dfn{executable file}.  This is called @dfn{compiling} the program,
and the command to do that using GNU C is @command{gcc}.

This example program consists of a single source file.  If we
call that file @file{fib1.c}, the complete command to compile it is
this:

@example
gcc -g -O -o fib1 fib1.c
@end example

@noindent
Here, @option{-g} says to generate debugging information, @option{-O}
says to optimize at the basic level, and @option{-o fib1} says to put
the executable program in the file @file{fib1}.

To run the program, use its file name as a shell command.
For instance,

@example
./fib1
@end example

@noindent
However, unless you are sure the program is correct, you should
expect to need to debug it.
So use this command, @example gdb fib1 @end example @noindent which starts the GDB debugger (@pxref{Sample Session, Sample Session, A Sample GDB Session, gdb, Debugging with GDB}) so you can run and debug the executable program @code{fib1}. Richard Stallman's advice, from personal experience, is to turn to the debugger as soon as you can reproduce the problem. Don't try to avoid it by using other methods instead---occasionally they are shortcuts, but usually they waste an unbounded amount of time. With the debugger, you will surely find the bug in a reasonable time; overall, you will get your work done faster. The sooner you get serious and start the debugger, the sooner you are likely to find the bug. @xref{Compilation}, for an introduction to compiling more complex programs which consist of more than one source file. @node Storage @chapter Storage and Data @cindex bytes @cindex storage organization @cindex memory organization Storage in C programs is made up of units called @dfn{bytes}. On nearly all computers, a byte consists of 8 bits, but there are a few peculiar computers (mostly ``embedded controllers'' for very small systems) where a byte is longer than that. This manual does not try to explain the peculiarity of those computers; we assume that a byte is 8 bits. Every C data type is made up of a certain number of bytes; that number is the data type's @dfn{size}. @xref{Type Size}, for details. The types @code{signed char} and @code{unsigned char} are one byte long; use those types to operate on data byte by byte. @xref{Signed and Unsigned Types}. You can refer to a series of consecutive bytes as an array of @code{char} elements; that's what an ASCII string looks like in memory. @xref{String Constants}. @node Beyond Integers @chapter Beyond Integers So far we've presented programs that operate on integers. In this chapter we'll present examples of handling non-integral numbers and arrays of numbers. 
@menu
* Float Example::               A function that uses floating-point numbers.
* Array Example::               A function that works with arrays.
* Array Example Call::          How to call that function.
* Array Example Variations::    Different ways to write the call example.
@end menu

@node Float Example
@section An Example with Non-Integer Numbers
@cindex floating point example

Here's a function that operates on and returns @dfn{floating point}
numbers that don't have to be integers.  Floating point represents a
number as a fraction together with a power of 2.  (For more detail,
@pxref{Floating-Point Data Types}.)  This example calculates the
average of three floating point numbers that are passed to it as
arguments:

@example
double
average_of_three (double a, double b, double c)
@{
  return (a + b + c) / 3;
@}
@end example

The values of the parameters @var{a}, @var{b} and @var{c} do not have to be
integers, and even when they happen to be integers, most likely their
average is not an integer.

@code{double} is the usual data type in C for calculations on
floating-point numbers.

To print a @code{double} with @code{printf}, we must use @samp{%f}
instead of @samp{%d}:

@example
printf ("Average is %f\n",
        average_of_three (1.1, 9.8, 3.62));
@end example

The code that calls @code{printf} must pass a @code{double} for
printing with @samp{%f} and an @code{int} for printing with @samp{%d}.
If the argument has the wrong type, @code{printf} will produce garbage
output.

Here's a complete program that computes the average of three
specific numbers and prints the result:

@example
#include <stdio.h>

double
average_of_three (double a, double b, double c)
@{
  return (a + b + c) / 3;
@}

int
main (void)
@{
  printf ("Average is %f\n",
          average_of_three (1.1, 9.8, 3.62));
  return 0;
@}
@end example

From now on we will not show a @code{main} function with each
example.  Instead we encourage you to write one for yourself when you
want to test executing some code.
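
The following sketch shows one way to average values that are held as
@code{int} rather than @code{double}.  The function name
@code{average_of_three_ints} is an assumption made for this
illustration.  Dividing by the floating-point constant @code{3.0}
makes the division a floating-point operation, so the fractional part
of the average is kept.

```c
/* Hypothetical variant of average_of_three that accepts int values.
   The sum is computed in int; dividing it by 3.0 converts it to
   double first, so the result can have a fractional part.  */
double
average_of_three_ints (int a, int b, int c)
{
  return (a + b + c) / 3.0;
}
```

For example, @code{average_of_three_ints (1, 2, 3)} is exactly 2.0,
while @code{average_of_three_ints (1, 1, 2)} is approximately 1.33.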
@node Array Example @section An Example with Arrays @cindex array example A function to take the average of three numbers is very specific and limited. A more general function would take the average of any number of numbers. That requires passing the numbers in an array. An array is an object in memory that contains a series of values of the same data type. This chapter presents the basic concepts and use of arrays through an example; for the full explanation, see @ref{Arrays}. Here's a function definition to take the average of several floating-point numbers, passed as type @code{double}. The first parameter, @code{length}, specifies how many numbers are passed. The second parameter, @code{input_data}, is an array that holds those numbers. @example double avg_of_double (int length, double input_data[]) @{ double sum = 0; int i; for (i = 0; i < length; i++) sum = sum + input_data[i]; return sum / length; @} @end example This introduces the expression to refer to an element of an array: @code{input_data[i]} means the element at index @code{i} in @code{input_data}. The index of the element can be any expression with an integer value; in this case, the expression is @code{i}. @xref{Accessing Array Elements}. @cindex zero-origin indexing The lowest valid index in an array is 0, @emph{not} 1, and the highest valid index is one less than the number of elements. (This is known as @dfn{zero-origin indexing}.) This example also introduces the way to declare that a function parameter is an array. Such declarations are modeled after the syntax for an element of the array. Just as @code{double foo} declares that @code{foo} is of type @code{double}, @code{double input_data[]} declares that each element of @code{input_data} is of type @code{double}. Therefore, @code{input_data} itself has type ``array of @code{double}.'' When declaring an array parameter, it's not necessary to say how long the array is. In this case, the parameter @code{input_data} has no length information. 
That's why the function needs another parameter, @code{length}, for the caller to provide that information to the function @code{avg_of_double}. @node Array Example Call @section Calling the Array Example To call the function @code{avg_of_double} requires making an array and then passing it as an argument. Here is an example. @example @{ /* @r{The array of values to average.} */ double nums_to_average[5]; /* @r{The average, once we compute it.} */ double average; /* @r{Fill in elements of @code{nums_to_average}.} */ nums_to_average[0] = 58.7; nums_to_average[1] = 5.1; nums_to_average[2] = 7.7; nums_to_average[3] = 105.2; nums_to_average[4] = -3.14159; average = avg_of_double (5, nums_to_average); /* @r{@dots{}now make use of @code{average}@dots{}} */ @} @end example This shows an array subscripting expression again, this time on the left side of an assignment, storing a value into an element of an array. It also shows how to declare a local variable that is an array: @code{double nums_to_average[5];}. Since this declaration allocates the space for the array, it needs to know the array's length. You can specify the length with any expression whose value is an integer, but in this declaration the length is a constant, the integer 5. The name of the array, when used by itself as an expression, stands for the address of the array's data, and that's what gets passed to the function @code{avg_of_double} in @code{avg_of_double (5, nums_to_average)}. We can make the code easier to maintain by avoiding the need to write 5, the array length, when calling @code{avg_of_double}. That way, if we change the array to include more elements, we won't have to change that call. One way to do this is with the @code{sizeof} operator: @example average = avg_of_double ((sizeof (nums_to_average) / sizeof (nums_to_average[0])), nums_to_average); @end example This computes the number of elements in @code{nums_to_average} by dividing its total size by the size of one element. 
@xref{Type Size}, for more details of using @code{sizeof}.

We don't show in this example what happens after storing the result of
@code{avg_of_double} in the variable @code{average}.  Presumably
more code would follow that uses that result somehow.  (Why compute
the average and not use it?)  But that isn't part of this topic.

@node Array Example Variations
@section Variations for Array Example

The code to call @code{avg_of_double} has two declarations that
start with the same data type:

@example
  /* @r{The array of values to average.}  */
  double nums_to_average[5];
  /* @r{The average, once we compute it.}  */
  double average;
@end example

In C, you can combine the two, like this:

@example
  double nums_to_average[5], average;
@end example

This declares @code{nums_to_average} so each of its elements is a
@code{double}, and @code{average} so that it simply is a
@code{double}.

However, while you @emph{can} combine them, that doesn't mean you
@emph{should}.  If it is useful to write comments about the variables,
and usually it is, then it's clearer to keep the declarations separate
so you can put a comment on each one.

We set all of the elements of the array @code{nums_to_average} with
assignments, but it is more convenient to use an initializer in the
declaration:

@example
@{
  /* @r{The array of values to average.}  */
  double nums_to_average[]
    = @{ 58.7, 5.1, 7.7, 105.2, -3.14159 @};

  /* @r{The average, once we compute it.}  */
  double average
    = avg_of_double ((sizeof (nums_to_average)
                      / sizeof (nums_to_average[0])),
                     nums_to_average);

  /* @r{@dots{}now make use of @code{average}@dots{}}  */
@}
@end example

The array initializer is a comma-separated list of values, delimited
by braces.  @xref{Initializers}.

Note that the declaration does not specify a size for
@code{nums_to_average}, so the size is determined from the
initializer.  There are five values in the initializer, so
@code{nums_to_average} gets length 5.
If we add another element to the initializer, @code{nums_to_average} will have six elements. Because the code computes the number of elements from the size of the array, using @code{sizeof}, the program will operate on all the elements in the initializer, regardless of how many those are. @node Lexical Syntax @chapter Lexical Syntax @cindex lexical syntax @cindex token To start the full description of the C language, we explain the lexical syntax and lexical units of C code. The lexical units of a programming language are known as @dfn{tokens}. This chapter covers all the tokens of C except for constants, which are covered in a later chapter (@pxref{Constants}). One vital kind of token is the @dfn{identifier} (@pxref{Identifiers}), which is used for names of any kind. @menu * English:: Write programs in English! * Characters:: The characters allowed in C programs. * Whitespace:: The particulars of whitespace characters. * Comments:: How to include comments in C code. * Identifiers:: How to form identifiers (names). * Operators/Punctuation:: Characters used as operators or punctuation. * Line Continuation:: Splitting one line into multiple lines. @end menu @node English @section Write Programs in English! In principle, you can write the function and variable names in a program, and the comments, in any human language. C allows any kinds of characters in comments, and you can put non-ASCII characters into identifiers with a special prefix. However, to enable programmers in all countries to understand and develop the program, it is best given today's circumstances to write identifiers and comments in English. English is the one language that programmers in all countries generally study. If a program's names are in English, most programmers in Bangladesh, Belgium, Bolivia, Brazil, and Bulgaria can understand them. Most programmers in those countries can speak English, or at least read it, but they do not read each other's languages at all. 
In India, with so many languages, two programmers may have no common language other than English. If you don't feel confident in writing English, do the best you can, and follow each English comment with a version in a language you write better; add a note asking others to translate that to English. Someone will eventually do that. The program's user interface is a different matter. We don't need to choose one language for that; it is easy to support multiple languages and let each user choose the language to use. This requires writing the program to support localization of its interface. (The @code{gettext} package exists to support this; @pxref{Message Translation, The GNU C Library, , libc, The GNU C Library Reference Manual}.) Then a community-based translation effort can provide support for all the languages users want to use. @node Characters @section Characters @cindex character set @cindex Unicode @c ??? How to express ¶? GNU C source files are usually written in the @url{https://en.wikipedia.org/wiki/ASCII,,ASCII} character set, which was defined in the 1960s for English. However, they can also include Unicode characters represented in the @url{https://en.wikipedia.org/wiki/UTF-8,,UTF-8} multibyte encoding. This makes it possible to represent accented letters such as @samp{á}, as well as other scripts such as Arabic, Chinese, Cyrillic, Hebrew, Japanese, and Korean.@footnote{On some obscure systems, GNU C uses UTF-EBCDIC instead of UTF-8, but that is not worth describing in this manual.} In C source code, non-ASCII characters are valid in comments, in wide character constants (@pxref{Wide Character Constants}), and in string constants (@pxref{String Constants}). @c ??? valid in identifiers? Another way to specify non-ASCII characters in constants (character or string) and identifiers is with an escape sequence starting with backslash, specifying the intended Unicode character. (@xref{Unicode Character Codes}.) 
This specifies non-ASCII characters without putting a real non-ASCII character in the source file itself. C accepts two-character aliases called @dfn{digraphs} for certain characters. @xref{Digraphs}. @node Whitespace @section Whitespace @cindex whitespace characters in source files @cindex space character in source @cindex tab character in source @cindex formfeed in source @cindex linefeed in source @cindex newline in source @cindex carriage return in source @cindex vertical tab in source Whitespace means characters that exist in a file but appear blank in a printed listing of a file (or traditionally did appear blank, several decades ago). The C language requires whitespace in order to separate two consecutive identifiers, or to separate an identifier from a numeric constant. Other than that, and a few special situations described later, whitespace is optional; you can put it in when you wish, to make the code easier to read. Space and tab in C code are treated as whitespace characters. So are line breaks. You can represent a line break with the newline character (also called @dfn{linefeed} or LF), CR (carriage return), or the CRLF sequence (two characters: carriage return followed by a newline character). The @dfn{formfeed} character, Control-L, was traditionally used to divide a file into pages. It is still used this way in source code, and the tools that generate nice printouts of source code still start a new page after each ``formfeed'' character. Dividing code into pages separated by formfeed characters is a good way to break it up into comprehensible pieces and show other programmers where they start and end. The @dfn{vertical tab} character, Control-K, was traditionally used to make printing advance down to the next section of a page. We know of no particular reason to use it in source code, but it is still accepted as whitespace in C. Comments are also syntactically equivalent to whitespace. @ifinfo @xref{Comments}. 
@end ifinfo

@node Comments
@section Comments
@cindex comments

A comment encapsulates text that has no effect on the program's
execution or meaning.

The purpose of comments is to explain the code to people that read it.
Writing good comments for your code is tremendously important---they
should provide background information that helps programmers
understand the reasons why the code is written the way it is.  You,
returning to the code six months from now, will need the help of these
comments to remember why you wrote it this way.

Outdated comments that become incorrect are counterproductive, so part
of the software developer's responsibility is to update comments as
needed to correspond with changes to the program code.

C allows two kinds of comment syntax, the traditional style and the
C@t{++} style.  A traditional C comment starts with @samp{/*} and ends
with @samp{*/}.  For instance,

@example
/* @r{This is a comment in traditional C syntax.} */
@end example

A traditional comment can contain @samp{/*}, but these delimiters do
not nest as pairs.  The first @samp{*/} ends the comment regardless of
whether it contains @samp{/*} sequences.

@example
/* @r{This} /* @r{is a comment} */ But this is not! */
@end example

A @dfn{line comment} starts with @samp{//} and ends at the end of the line.
For instance,

@example
// @r{This is a comment in C@t{++} style.}
@end example

Line comments do nest, in effect, because @samp{//} inside a line
comment is part of that comment:

@example
// @r{this whole line is} // @r{one comment}
This is code, not comment.
@end example

It is safe to put line comments inside block comments, or vice versa.

@example
@group
/* @r{traditional comment}
   // @r{contains line comment}
   @r{more traditional comment}
 */ text here is not a comment

// @r{line comment} /* @r{contains traditional comment} */
@end group
@end example

But beware of commenting out one end of a traditional comment with a line
comment.
The delimiter @samp{/*} doesn't start a comment if it occurs inside an
already-started comment.

@example
@group
 // @r{line comment}  /* @r{That would ordinarily begin a block comment.}
    Oops! The line comment has ended;
    this isn't a comment any more.  */
@end group
@end example

Comments are not recognized within string constants.
@t{@w{"/* blah */"}} is the string constant @samp{@w{/* blah */}}, not
an empty string.

In this manual we show the text in comments in a variable-width font,
for readability, but this font distinction does not exist in source
files.

A comment is syntactically equivalent to whitespace, so it always
separates tokens.  Thus,

@example
@group
int/* @r{comment} */foo;
  @r{is equivalent to}
int foo;
@end group
@end example

@noindent
but clean code always uses real whitespace to separate the comment
visually from surrounding code.

@node Identifiers
@section Identifiers
@cindex identifiers

An @dfn{identifier} (name) in C is a sequence of letters and digits,
as well as @samp{_}, that does not start with a digit.  Most compilers
also allow @samp{$}.  An identifier can be as long as you like; for
example,

@example
int anti_dis_establishment_arian_ism;
@end example

@cindex case of letters in identifiers
Letters in identifiers are case-sensitive in C; thus, @code{a}
and @code{A} are two different identifiers.

@cindex keyword
@cindex reserved words
Identifiers in C are used as variable names, function names, typedef
names, enumeration constants, type tags, field names, and labels.
Certain identifiers in C are @dfn{keywords}, which means they have
specific syntactic meanings.  Keywords in C are @dfn{reserved words},
meaning you cannot use them in any other way.  For instance, you can't
define a variable or function named @code{return} or @code{if}.

You can also include other characters, even non-ASCII characters, in
identifiers by writing their Unicode character names, which start with
@samp{\u} or @samp{\U}, in the identifier name.  @xref{Unicode
Character Codes}.
However, it is usually a bad idea to use non-ASCII characters in identifiers, and when they are written in English, they never need non-ASCII characters. @xref{English}. Whitespace is required to separate two consecutive identifiers, or to separate an identifier from a preceding or following numeric constant. @node Operators/Punctuation @section Operators and Punctuation @cindex operators @cindex punctuation Here we describe the lexical syntax of operators and punctuation in C. The specific operators of C and their meanings are presented in subsequent chapters. Most operators in C consist of one or two characters that can't be used in identifiers. The characters used for operators in C are @samp{!~^&|*/%+-=<>,.?:}. Some operators are a single character. For instance, @samp{-} is the operator for negation (with one operand) and the operator for subtraction (with two operands). Some operators are two characters. For example, @samp{++} is the increment operator. Recognition of multicharacter operators works by grouping together as many consecutive characters as can constitute one operator. For instance, the character sequence @samp{++} is always interpreted as the increment operator; therefore, if we want to write two consecutive instances of the operator @samp{+}, we must separate them with a space so that they do not combine as one token. Applying the same rule, @code{a+++++b} is always tokenized as @code{@w{a++ ++ + b}}, not as @code{@w{a++ + ++b}}, even though the latter could be part of a valid C program and the former could not (since @code{a++} is not an lvalue and thus can't be the operand of @code{++}). A few C operators are keywords rather than special characters. They include @code{sizeof} (@pxref{Type Size}) and @code{_Alignof} (@pxref{Type Alignment}). The characters @samp{;@{@}[]()} are used for punctuation and grouping. Semicolon (@samp{;}) ends a statement. 
Braces (@samp{@{} and @samp{@}}) begin and end a block at the statement level (@pxref{Blocks}), and surround the initializer (@pxref{Initializers}) for a variable with multiple elements or components (such as arrays or structures). Square brackets (@samp{[} and @samp{]}) do array indexing, as in @code{array[5]}. Parentheses are used in expressions for explicit nesting of expressions (@pxref{Basic Arithmetic}), around the parameter declarations in a function declaration or definition, and around the arguments in a function call, as in @code{printf ("Foo %d\n", i)} (@pxref{Function Calls}). Several kinds of statements also use parentheses as part of their syntax---for instance, @code{if} statements, @code{for} statements, @code{while} statements, and @code{switch} statements. @xref{if Statement}, and following sections. Parentheses are also required around the operand of the operator keywords @code{sizeof} and @code{_Alignof} when the operand is a data type rather than a value. @xref{Type Size}. @node Line Continuation @section Line Continuation @cindex line continuation @cindex continuation of lines The sequence of a backslash and a newline is ignored absolutely anywhere in a C program. This makes it possible to split a single source line into multiple lines in the source file. GNU C tolerates and ignores other whitespace between the backslash and the newline. In particular, it always ignores a CR (carriage return) character there, in case some text editor decided to end the line with the CRLF sequence. The main use of line continuation in C is for macro definitions that would be inconveniently long for a single line (@pxref{Macros}). It is possible to continue a line comment onto another line with backslash-newline. You can put backslash-newline in the middle of an identifier, even a keyword, or an operator. You can even split @samp{/*}, @samp{*/}, and @samp{//} onto multiple lines with backslash-newline. 
Here's an ugly example: @example @group /\ * */ fo\ o +\ = 1\ 0; @end group @end example @noindent That's equivalent to @samp{/* */ foo += 10;}. Don't do those things in real programs, since they make code hard to read. @strong{Note:} For the sake of using certain tools on the source code, it is wise to end every source file with a newline character which is not preceded by a backslash, so that it really ends the last line. @node Arithmetic @chapter Arithmetic @cindex arithmetic operators @cindex operators, arithmetic @c ??? Duplication with other sections -- get rid of that? Arithmetic operators in C attempt to be as similar as possible to the abstract arithmetic operations, but it is impossible to do this perfectly. Numbers in a computer have a finite range of possible values, and non-integer values have a limit on their possible accuracy. Nonetheless, except when results are out of range, you will encounter no surprises in using @samp{+} for addition, @samp{-} for subtraction, and @samp{*} for multiplication. Each C operator has a @dfn{precedence}, which is its rank in the grammatical order of the various operators. The operators with the highest precedence grab adjoining operands first; these expressions then become operands for operators of lower precedence. We give some information about precedence of operators in this chapter where we describe the operators; for the full explanation, see @ref{Binary Operator Grammar}. The arithmetic operators always @dfn{promote} their operands before operating on them. This means converting narrow integer data types to a wider data type (@pxref{Operand Promotions}). If you are just learning C, don't worry about this yet. Given two operands that have different types, most arithmetic operations convert them both to their @dfn{common type}. For instance, if one is @code{int} and the other is @code{double}, the common type is @code{double}. 
(That's because @code{double} can represent all the values that an @code{int} can hold, but not vice versa.) For the full details, see @ref{Common Type}. @menu * Basic Arithmetic:: Addition, subtraction, multiplication, and division. * Integer Arithmetic:: How C performs arithmetic with integer values. * Integer Overflow:: When an integer value exceeds the range of its type. * Mixed Mode:: Calculating with both integer values and floating-point values. * Division and Remainder:: How integer division works. * Numeric Comparisons:: Comparing numeric values for equality or order. * Shift Operations:: Shift integer bits left or right. * Bitwise Operations:: Bitwise conjunction, disjunction, negation. @end menu @node Basic Arithmetic @section Basic Arithmetic @cindex addition operator @cindex subtraction operator @cindex multiplication operator @cindex division operator @cindex negation operator @cindex operator, addition @cindex operator, subtraction @cindex operator, multiplication @cindex operator, division @cindex operator, negation Basic arithmetic in C is done with the usual binary operators of algebra: addition (@samp{+}), subtraction (@samp{-}), multiplication (@samp{*}) and division (@samp{/}). The unary operator @samp{-} is used to change the sign of a number. The unary @code{+} operator also exists; it yields its operand unaltered. @samp{/} is the division operator, but dividing integers may not give the result you expect. Its value is an integer, which is not equal to the mathematical quotient when that is a fraction. Use @samp{%} to get the corresponding integer remainder when necessary. @xref{Division and Remainder}. Floating-point division yields a value as close as possible to the mathematical quotient.
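As a minimal sketch of the difference between integer division and floating-point division, here are three small functions that divide the same pair of @code{int} operands in different ways. The names @code{int_quotient}, @code{int_remainder} and @code{float_quotient} are hypothetical, chosen only for this illustration.

```c
/* Hypothetical helper: integer division, which discards the
   fractional part of the mathematical quotient.  */
int
int_quotient (int a, int b)
{
  return a / b;
}

/* Hypothetical helper: the integer remainder that corresponds
   to the quotient computed by int_quotient.  */
int
int_remainder (int a, int b)
{
  return a % b;
}

/* Hypothetical helper: convert one operand to double first, so
   that the division is done in floating point.  */
double
float_quotient (int a, int b)
{
  return (double) a / b;
}
```

With the operands 7 and 2, the first function returns 3, the second returns 1, and the third returns 3.5. All three functions assume the divisor is not zero.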
These operators use algebraic syntax with the usual algebraic precedence rule (@pxref{Binary Operator Grammar}) that multiplication and division are done before addition and subtraction, but you can use parentheses to explicitly specify how the operators nest. They are left-associative (@pxref{Associativity and Ordering}). Thus, @example -a + b - c + d * e / f @end example @noindent is equivalent to @example (((-a) + b) - c) + ((d * e) / f) @end example @node Integer Arithmetic @section Integer Arithmetic @cindex integer arithmetic Each of the basic arithmetic operations in C has two variants for integers: @dfn{signed} and @dfn{unsigned}. The choice is determined by the data types of their operands. Each integer data type in C is either @dfn{signed} or @dfn{unsigned}. A signed type can hold a range of positive and negative numbers, with zero near the middle of the range. An unsigned type can hold only nonnegative numbers; its range starts with zero and runs upward. The most basic integer types are @code{int}, which normally can hold numbers from @minus{}2,147,483,648 to 2,147,483,647, and @code{unsigned int}, which normally can hold numbers from 0 to 4,294,967,295. (This assumes @code{int} is 32 bits wide, always true for GNU C on real computers but not always on embedded controllers.) @xref{Integer Types}, for full information about integer types. When a basic arithmetic operation is given two signed operands, it does signed arithmetic. Given two unsigned operands, it does unsigned arithmetic. If one operand is @code{unsigned int} and the other is @code{int}, the operator treats them both as unsigned. More generally, the common type of the operands determines whether the operation is signed or not. @xref{Common Type}. Printing the results of unsigned arithmetic with @code{printf} using @samp{%d} can produce surprising results for values far away from zero. 
Even though the rules above say that the computation was done with unsigned arithmetic, the printed result may appear to be signed! The explanation is that the bit pattern resulting from addition, subtraction or multiplication is actually the same for signed and unsigned operations. The difference is only in the data type of the result, which affects the @emph{interpretation} of the result bit pattern, and whether the arithmetic operation can overflow (see the next section). But @samp{%d} doesn't know its argument's data type. It sees only the value's bit pattern, and it is defined to interpret that as @code{signed int}. To print it as unsigned requires using @samp{%u} instead of @samp{%d}. @xref{Formatted Output, The GNU C Library, , libc, The GNU C Library Reference Manual}. Arithmetic in C never operates directly on narrow integer types (those with fewer bits than @code{int}; @ref{Narrow Integers}). Instead it ``promotes'' them to @code{int}. @xref{Operand Promotions}. @node Integer Overflow @section Integer Overflow @cindex integer overflow @cindex overflow, integer When the mathematical value of an arithmetic operation doesn't fit in the range of the data type in use, that's called @dfn{overflow}. When it happens in integer arithmetic, it is @dfn{integer overflow}. Integer overflow happens only in arithmetic operations. Type conversion operations, by definition, do not cause overflow, not even when the result can't fit in its new type. @xref{Integer Conversion}. Signed numbers use two's-complement representation, in which the most negative number lacks a positive counterpart (@pxref{Integers in Depth}). Thus, the unary @samp{-} operator on a signed integer can overflow. @menu * Unsigned Overflow:: Overflow in unsigned integer arithmetic. * Signed Overflow:: Overflow in signed integer arithmetic. 
@end menu @node Unsigned Overflow @subsection Overflow with Unsigned Integers Unsigned arithmetic in C ignores overflow; it produces the true result modulo the @var{n}th power of 2, where @var{n} is the number of bits in the data type. We say it ``truncates'' the true result to the lowest @var{n} bits. A true result that is negative, when taken modulo the @var{n}th power of 2, yields a positive number. For instance, @example unsigned int x = 1; unsigned int y; y = -x; @end example @noindent causes overflow because the negative number @minus{}1 can't be stored in an unsigned type. The actual result, which is @minus{}1 modulo the @var{n}th power of 2, is one less than the @var{n}th power of 2. That is the largest value that the unsigned data type can store. For a 32-bit @code{unsigned int}, the value is 4,294,967,295. @xref{Maximum and Minimum Values}. Adding that number to itself, as here, @example unsigned int z; z = y + y; @end example @noindent ought to yield 8,589,934,590; however, that is again too large to fit, so overflow truncates the value to 4,294,967,294. If that were a signed integer, it would mean @minus{}2, which (not by coincidence) equals @minus{}1 + @minus{}1. @node Signed Overflow @subsection Overflow with Signed Integers @cindex compiler options for integer overflow @cindex integer overflow, compiler options @cindex overflow, compiler options For signed integers, the result of overflow in C is @emph{in principle} undefined, meaning that anything whatsoever could happen. Therefore, C compilers can do optimizations that treat the overflow case with total unconcern. (Since the result of overflow is undefined in principle, one cannot claim that these optimizations are erroneous.) @strong{Watch out:} These optimizations can do surprising things.
For instance, @example int i; @r{@dots{}} if (i < i + 1) x = 5; @end example @noindent could be optimized to do the assignment unconditionally, because the @code{if}-condition is always true if @code{i + 1} does not overflow. GCC offers compiler options to control handling signed integer overflow. These options operate per module; that is, each module behaves according to the options it was compiled with. These two options specify particular ways to handle signed integer overflow, other than the default way: @table @option @item -fwrapv Make signed integer operations well-defined, like unsigned integer operations: they produce the @var{n} low-order bits of the true result. The highest of those @var{n} bits is the sign bit of the result. With @option{-fwrapv}, these out-of-range operations are not considered overflow, so (strictly speaking) integer overflow never happens. The option @option{-fwrapv} enables some optimizations based on the defined values of out-of-range results. In GCC 8, it disables optimizations that are based on assuming signed integer operations will not overflow. @item -ftrapv Generate a signal @code{SIGFPE} when signed integer overflow occurs. This terminates the program unless the program handles the signal. @xref{Signals}. @end table One other option is useful for finding where overflow occurs: @ignore @item -fno-strict-overflow Disable optimizations that are based on assuming signed integer operations will not overflow. @end ignore @table @option @item -fsanitize=signed-integer-overflow Output a warning message at run time when signed integer overflow occurs. This checks the @samp{+}, @samp{*}, and @samp{-} operators. This takes priority over @option{-ftrapv}. @end table @node Mixed Mode @section Mixed-Mode Arithmetic Mixing integers and floating-point numbers in a basic arithmetic operation converts the integers automatically to floating point. In most cases, this gives exactly the desired results. 
But sometimes it matters precisely where the conversion occurs. If @code{i} and @code{j} are integers, @code{(i + j) * 2.0} adds them as integers, then converts the sum to floating point for the multiplication. If the addition causes an overflow, that is not equivalent to converting each integer to floating point and then adding the two floating-point numbers. You can get the latter result by explicitly converting the integers, as in @code{((double) i + (double) j) * 2.0}. @xref{Explicit Type Conversion}. @c Eggert's report Adding or multiplying several values, including some integers and some floating point, performs the operations left to right. Thus, @code{3.0 + i + j} converts @code{i} to floating point, then adds 3.0, then converts @code{j} to floating point and adds that. You can specify a different order using parentheses: @code{3.0 + (i + j)} adds @code{i} and @code{j} first and then adds that sum (converted to floating point) to 3.0. In this respect, C differs from other languages, such as Fortran. @node Division and Remainder @section Division and Remainder @cindex remainder operator @cindex modulus @cindex operator, remainder Division of integers in C rounds the result to an integer. The result is always rounded towards zero. @example 16 / 3 @result{} 5 -16 / 3 @result{} -5 16 / -3 @result{} -5 -16 / -3 @result{} 5 @end example @noindent To get the corresponding remainder, use the @samp{%} operator: @example 16 % 3 @result{} 1 -16 % 3 @result{} -1 16 % -3 @result{} 1 -16 % -3 @result{} -1 @end example @noindent @samp{%} has the same operator precedence as @samp{/} and @samp{*}. From the rounded quotient and the remainder, you can reconstruct the dividend, like this: @example int original_dividend (int divisor, int quotient, int remainder) @{ return divisor * quotient + remainder; @} @end example To do unrounded division, use floating point. If only one operand is floating point, @samp{/} converts the other operand to floating point.
@example 16.0 / 3 @result{} 5.333333333333333 16 / 3.0 @result{} 5.333333333333333 16.0 / 3.0 @result{} 5.333333333333333 16 / 3 @result{} 5 @end example The remainder operator @samp{%} is not allowed for floating-point operands, because it is not needed. The concept of remainder makes sense for integers because the result of division of integers has to be an integer. For floating point, the result of division is a floating-point number, in other words a fraction, which will differ from the exact result only by a very small amount. There are functions in the standard C library to calculate remainders from integer-valued division of floating-point numbers. @xref{Remainder Functions, The GNU C Library, , libc, The GNU C Library Reference Manual}. Integer division overflows in one specific case: dividing the smallest negative value for the data type (@pxref{Maximum and Minimum Values}) by @minus{}1. That's because the correct result, which is the corresponding positive number, does not fit (@pxref{Integer Overflow}) in the same number of bits. On some computers now in use, this always causes a signal @code{SIGFPE} (@pxref{Signals}), the same behavior that the option @option{-ftrapv} specifies (@pxref{Signed Overflow}). Division by zero leads to unpredictable results---depending on the type of computer, it might cause a signal @code{SIGFPE}, or it might produce a numeric result. @cindex division by zero @cindex zero, division by @strong{Watch out:} Make sure the program does not divide by zero. If you can't prove that the divisor is not zero, test whether it is zero, and skip the division if so.
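The two hazards described above, a zero divisor and the smallest negative value divided by @minus{}1, can both be tested before dividing. Here is one possible sketch of such a test for type @code{int}. The function name @code{checked_divide} and its interface are hypothetical, invented for this illustration; they are not part of any library.

```c
#include <limits.h>   /* Declares INT_MIN.  */
#include <stdbool.h>  /* Declares bool, true and false.  */

/* Hypothetical helper: store a / b in *quotient and a % b in
   *remainder, and return true.  Return false, storing nothing,
   in the two cases where the division is unsafe: b is zero, or
   a is INT_MIN and b is -1.  */
bool
checked_divide (int a, int b, int *quotient, int *remainder)
{
  if (b == 0)
    return false;       /* Division by zero.  */
  if (a == INT_MIN && b == -1)
    return false;       /* The true quotient does not fit in int.  */
  *quotient = a / b;
  *remainder = a % b;
  return true;
}
```

A caller tests the return value and uses the quotient and remainder only when it is true. For instance, dividing @minus{}16 by 3 this way stores @minus{}5 and @minus{}1, which satisfy the reconstruction rule shown earlier: 3 * @minus{}5 + @minus{}1 equals @minus{}16.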
@node Numeric Comparisons @section Numeric Comparisons @cindex numeric comparisons @cindex comparisons @cindex operators, comparison @cindex equal operator @cindex not-equal operator @cindex less-than operator @cindex greater-than operator @cindex less-or-equal operator @cindex greater-or-equal operator @cindex operator, equal @cindex operator, not-equal @cindex operator, less-than @cindex operator, greater-than @cindex operator, less-or-equal @cindex operator, greater-or-equal @cindex truth value There are two kinds of comparison operators: @dfn{equality} and @dfn{ordering}. Equality comparisons test whether two expressions have the same value. The result is a @dfn{truth value}: a number that is 1 for ``true'' and 0 for ``false.'' @example a == b /* @r{Test for equal.} */ a != b /* @r{Test for not equal.} */ @end example The equality comparison is written @code{==} because plain @code{=} is the assignment operator. Ordering comparisons test which operand is greater or less. Their results are truth values. These are the ordering comparisons of C: @example a < b /* @r{Test for less-than.} */ a > b /* @r{Test for greater-than.} */ a <= b /* @r{Test for less-than-or-equal.} */ a >= b /* @r{Test for greater-than-or-equal.} */ @end example For any integers @code{a} and @code{b}, exactly one of the comparisons @code{a < b}, @code{a == b} and @code{a > b} is true, just as in mathematics. However, if @code{a} and @code{b} are special floating point values (not ordinary numbers), all three can be false. @xref{Special Float Values}, and @ref{Invalid Optimizations}. @node Shift Operations @section Shift Operations @cindex shift operators @cindex operators, shift @cindex operators, shift @cindex shift count @dfn{Shifting} an integer means moving the bit values to the left or right within the bits of the data type. Shifting is defined only for integers. 
Here's the way to write it: @example /* @r{Left shift.} */ 5 << 2 @result{} 20 /* @r{Right shift.} */ 5 >> 2 @result{} 1 @end example @noindent The left operand is the value to be shifted, and the right operand says how many bits to shift it (the @dfn{shift count}). The left operand is promoted (@pxref{Operand Promotions}), so shifting never operates on a narrow integer type; it's always either @code{int} or wider. The result of the shift operation has the same type as the promoted left operand. @menu * Bits Shifted In:: How shifting makes new bits to shift in. * Shift Caveats:: Caveats of shift operations. * Shift Hacks:: Clever tricks with shift operations. @end menu @node Bits Shifted In @subsection Shifting Makes New Bits A shift operation shifts towards one end of the number and has to generate new bits at the other end. Shifting left one bit must generate a new least significant bit. It always brings in zero there. It is equivalent to multiplying by the appropriate power of 2. For example, @example 5 << 3 @r{is equivalent to} 5 * 2*2*2 -10 << 4 @r{is equivalent to} -10 * 2*2*2*2 @end example The meaning of shifting right depends on whether the data type is signed or unsigned (@pxref{Signed and Unsigned Types}). For a signed data type, it performs ``arithmetic shift,'' which keeps the number's sign unchanged by duplicating the sign bit. For an unsigned data type, it performs ``logical shift,'' which always shifts in zeros at the most significant bit. In both cases, shifting right one bit is division by two, rounding towards negative infinity. For example, @example (unsigned) 19 >> 2 @result{} 4 (unsigned) 20 >> 2 @result{} 5 (unsigned) 21 >> 2 @result{} 5 @end example For negative left operand @code{a}, @code{a >> 1} is not equivalent to @code{a / 2}. They both divide by 2, but @samp{/} rounds toward zero. The shift count must be zero or greater. Shifting by a negative number of bits gives machine-dependent results. 
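To make the difference in rounding concrete, here is a small sketch that halves the same value in both ways. The function names are hypothetical. The first function relies on GNU C's arithmetic right shift for signed types, as described above; other compilers might not handle a negative left operand the same way.

```c
/* Hypothetical helper: halve by shifting.  In GNU C, right shift
   of a signed value duplicates the sign bit, so the result is
   rounded toward negative infinity.  */
int
halve_by_shift (int a)
{
  return a >> 1;
}

/* Hypothetical helper: halve by dividing, which rounds the
   result toward zero.  */
int
halve_by_division (int a)
{
  return a / 2;
}
```

For @minus{}7, shifting gives @minus{}4 while dividing gives @minus{}3. For nonnegative values the two functions give the same result.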
@node Shift Caveats @subsection Caveats for Shift Operations @strong{Warning:} If the shift count is greater than or equal to the width in bits of the promoted first operand, the results are machine-dependent. Logically speaking, the ``correct'' value would be either @minus{}1 (for right shift of a negative number) or 0 (in all other cases), but the actual result is whatever the machine's shift instruction does in that case. So unless you can prove that the second operand is not too large, write code to check it at run time. @strong{Warning:} Never rely on how the shift operators relate in precedence to other arithmetic binary operators. Programmers don't remember these precedences, and won't understand the code. Always use parentheses to explicitly specify the nesting, like this: @example a + (b << 5) /* @r{Shift first, then add.} */ (a + b) << 5 /* @r{Add first, then shift.} */ @end example Note: according to the C standard, shifting of signed values isn't guaranteed to work properly when the value shifted is negative, or becomes negative during the operation of shifting left. However, only pedants have a reason to be concerned about this; only computers with strange shift instructions could plausibly do this wrong. In GNU C, the operation always works as expected. @node Shift Hacks @subsection Shift Hacks You can use the shift operators for various useful hacks. For example, given a date specified by day of the month @code{d}, month @code{m}, and year @code{y}, you can store the entire date in a single integer @code{date}: @example unsigned int d = 12; unsigned int m = 6; unsigned int y = 1983; unsigned int date = (((y << 4) + m) << 5) + d; @end example @noindent To extract the original day, month, and year out of @code{date}, use a combination of shift and remainder. @example d = date % 32; m = (date >> 5) % 16; y = date >> 9; @end example @code{-1 << LOWBITS} is a clever way to make an integer whose @code{LOWBITS} lowest bits are all 0 and the rest are all 1.
@code{-(1 << LOWBITS)} is equivalent to that, due to associativity of multiplication, since negating a value is equivalent to multiplying it by @minus{}1. @node Bitwise Operations @section Bitwise Operations @cindex bitwise operators @cindex operators, bitwise @cindex negation, bitwise @cindex conjunction, bitwise @cindex disjunction, bitwise Bitwise operators operate on integers, treating each bit independently. They are not allowed for floating-point types. The examples in this section use binary constants, starting with @samp{0b} (@pxref{Integer Constants}). They stand for 32-bit integers of type @code{int}. @table @code @item ~@code{a} Unary operator for bitwise negation; this changes each bit of @code{a} from 1 to 0 or from 0 to 1. @example ~0b10101000 @result{} 0b11111111111111111111111101010111 ~0 @result{} 0b11111111111111111111111111111111 ~0b11111111111111111111111111111111 @result{} 0 ~ (-1) @result{} 0 @end example It is useful to remember that @code{~@var{x} + 1} equals @code{-@var{x}}, for integers, and @code{~@var{x}} equals @code{-@var{x} - 1}. The last example above shows this with @minus{}1 as @var{x}. @item @code{a} & @code{b} Binary operator for bitwise ``and'' or ``conjunction.'' Each bit in the result is 1 if that bit is 1 in both @code{a} and @code{b}. @example 0b10101010 & 0b11001100 @result{} 0b10001000 @end example @item @code{a} | @code{b} Binary operator for bitwise ``or'' (``inclusive or'' or ``disjunction''). Each bit in the result is 1 if that bit is 1 in either @code{a} or @code{b}. @example 0b10101010 | 0b11001100 @result{} 0b11101110 @end example @item @code{a} ^ @code{b} Binary operator for bitwise ``xor'' (``exclusive or''). Each bit in the result is 1 if that bit is 1 in exactly one of @code{a} and @code{b}. 
@example 0b10101010 ^ 0b11001100 @result{} 0b01100110 @end example @end table To understand the effect of these operators on signed integers, keep in mind that all modern computers use two's-complement representation (@pxref{Integer Representations}) for negative integers. This means that the highest bit of the number indicates the sign; it is 1 for a negative number and 0 for a positive number. In a negative number, the value in the other bits @emph{increases} as the number gets closer to zero, so that @code{0b111@r{@dots{}}111} is @minus{}1 and @code{0b100@r{@dots{}}000} is the most negative possible integer. @strong{Warning:} C defines a precedence ordering for the bitwise binary operators, but you should never rely on it. You should never rely on how bitwise binary operators relate in precedence to the arithmetic and shift binary operators. Other programmers don't remember this precedence ordering, so always use parentheses to explicitly specify the nesting. For example, suppose @code{offset} is an integer that specifies the offset within shared memory of a table, except that its bottom few bits (@code{LOWBITS} says how many) are special flags. Here's how to get just that offset and add it to the base address. @example shared_mem_base + (offset & (-1 << LOWBITS)) @end example Thanks to the outer set of parentheses, we don't need to know whether @samp{&} has higher precedence than @samp{+}. Thanks to the inner set, we don't need to know whether @samp{&} has higher precedence than @samp{<<}. But we can rely on all unary operators to have higher precedence than any binary operator, so we don't need parentheses around the left operand of @samp{<<}. @node Assignment Expressions @chapter Assignment Expressions @cindex assignment expressions @cindex operators, assignment As a general concept in programming, an @dfn{assignment} is a construct that stores a new value into a place where values can be stored---for instance, in a variable. 
Such places are called @dfn{lvalues} (@pxref{Lvalues}) because they are locations that hold a value. An assignment in C is an expression because it has a value; we call it an @dfn{assignment expression}. A simple assignment looks like @example @var{lvalue} = @var{value-to-store} @end example @noindent We say it assigns the value of the expression @var{value-to-store} to the location @var{lvalue}, or that it stores @var{value-to-store} there. You can think of the ``l'' in ``lvalue'' as standing for ``left,'' since that's what you put on the left side of the assignment operator. However, that's not the only way to use an lvalue, and not all lvalues can be assigned to. To use the lvalue in the left side of an assignment, it has to be @dfn{modifiable}. In C, that means it was not declared with the type qualifier @code{const} (@pxref{const}). The value of the assignment expression is that of @var{lvalue} after the new value is stored in it. This means you can use an assignment inside other expressions. Assignment operators are right-associative so that @example x = y = z = 0; @end example @noindent is equivalent to @example x = (y = (z = 0)); @end example This is the only useful way for them to associate; the other way, @example ((x = y) = z) = 0; @end example @noindent would be invalid since an assignment expression such as @code{x = y} is not valid as an lvalue. @strong{Warning:} Write parentheses around an assignment if you nest it inside another expression, unless that is a conditional expression, or comma-separated series, or another assignment. @menu * Simple Assignment:: The basics of storing a value. * Lvalues:: Expressions into which a value can be stored. * Modifying Assignment:: Shorthand for changing an lvalue's contents. * Increment/Decrement:: Shorthand for incrementing and decrementing an lvalue's contents. * Postincrement/Postdecrement:: Accessing then incrementing or decrementing. * Assignment in Subexpressions:: How to avoid ambiguity. 
* Write Assignments Separately:: Write assignments as separate statements. @end menu @node Simple Assignment @section Simple Assignment @cindex simple assignment @cindex assignment, simple A @dfn{simple assignment expression} computes the value of the right operand and stores it into the lvalue on the left. Here is a simple assignment expression that stores 5 in @code{i}: @example i = 5 @end example @noindent We say that this is an @dfn{assignment to} the variable @code{i} and that it @dfn{assigns} @code{i} the value 5. It has no semicolon because it is an expression (so it has a value). Adding a semicolon at the end would make it a statement (@pxref{Expression Statement}). Here is another example of a simple assignment expression. Its operands are not simple, but the kind of assignment done here is simple assignment. @example x[foo ()] = y + 6 @end example A simple assignment with two different numeric data types converts the right operand value to the lvalue's type, if possible. It can convert any numeric type to any other numeric type. Simple assignment is also allowed on some non-numeric types: pointers (@pxref{Pointers}), structures (@pxref{Structure Assignment}), and unions (@pxref{Unions}). @strong{Warning:} Assignment is not allowed on arrays because there are no array values in C; C variables can be arrays, but these arrays cannot be manipulated as wholes. @xref{Limitations of C Arrays}. @xref{Assignment Type Conversions}, for the complete rules about data types used in assignments. @node Lvalues @section Lvalues @cindex lvalues An expression that identifies a memory space that holds a value is called an @dfn{lvalue}, because it is a location that can hold a value. The standard kinds of lvalues are: @itemize @bullet @item A variable. @item A pointer-dereference expression (@pxref{Pointer Dereference}) using unary @samp{*}. @item A structure field reference (@pxref{Structures}) using @samp{.}, if the structure value is an lvalue. 
@item A structure field reference using @samp{->}. This is always an lvalue since @samp{->} implies pointer dereference. @item A union alternative reference (@pxref{Unions}), on the same conditions as for structure fields. @item An array-element reference using @samp{[@r{@dots{}}]}, if the array is an lvalue. @end itemize If an expression's outermost operation is any other operator, that expression is not an lvalue. Thus, the variable @code{x} is an lvalue, but @code{x + 0} is not, even though these two expressions compute the same value (assuming @code{x} is a number). An array can be an lvalue (the rules above determine whether it is one), but using the array in an expression converts it automatically to a pointer to the first element. The result of this conversion is not an lvalue. Thus, if the variable @code{a} is an array, you can't use @code{a} by itself as the left operand of an assignment. But you can assign to an element of @code{a}, such as @code{a[0]}. That is an lvalue since @code{a} is an lvalue. @node Modifying Assignment @section Modifying Assignment @cindex modifying assignment @cindex assignment, modifying You can abbreviate the common construct @example @var{lvalue} = @var{lvalue} + @var{expression} @end example @noindent as @example @var{lvalue} += @var{expression} @end example This is known as a @dfn{modifying assignment}. For instance, @example i = i + 5; i += 5; @end example @noindent shows two statements that are equivalent. The first uses simple assignment; the second uses modifying assignment. Modifying assignment works with any binary arithmetic operator. For instance, you can subtract something from an lvalue like this, @example @var{lvalue} -= @var{expression} @end example @noindent or multiply it by a certain amount like this, @example @var{lvalue} *= @var{expression} @end example @noindent or shift it by a certain amount like this. 
@example @var{lvalue} <<= @var{expression} @var{lvalue} >>= @var{expression} @end example In most cases, this feature adds no power to the language, but it provides substantial convenience. Also, when @var{lvalue} contains code that has side effects, the simple assignment performs those side effects twice, while the modifying assignment performs them once. For instance, @example x[foo ()] = x[foo ()] + 5; @end example @noindent calls @code{foo} twice, and it could return different values each time. If @code{foo ()} returns 1 the first time and 3 the second time, then the effect could be to add @code{x[3]} and 5 and store the result in @code{x[1]}, or to add @code{x[1]} and 5 and store the result in @code{x[3]}. We don't know which of the two it will do, because C does not specify which call to @code{foo} is computed first. Such a statement is not well defined, and shouldn't be used. By contrast, @example x[foo ()] += 5; @end example @noindent is well defined: it calls @code{foo} only once to determine which element of @code{x} to adjust, and it adjusts that element by adding 5 to it. @node Increment/Decrement @section Increment and Decrement Operators @cindex increment operator @cindex decrement operator @cindex operator, increment @cindex operator, decrement @cindex preincrement expression @cindex predecrement expression The operators @samp{++} and @samp{--} are the @dfn{increment} and @dfn{decrement} operators. When used on a numeric value, they add or subtract 1. We don't consider them assignments, but they are equivalent to assignments. Using @samp{++} or @samp{--} as a prefix, before an lvalue, is called @dfn{preincrement} or @dfn{predecrement}. This adds or subtracts 1 and the result becomes the expression's value. For instance, @example #include <stdio.h> /* @r{Declares @code{printf}.} */ int main (void) @{ int i = 5; printf ("%d\n", i); printf ("%d\n", ++i); printf ("%d\n", i); return 0; @} @end example @noindent prints lines containing 5, 6, and 6 again.
The expression @code{++i} increments @code{i} from 5 to 6, and has
the value 6, so the output from @code{printf} on that line says
@samp{6}.

Using @samp{--} instead, for predecrement,

@example
#include <stdio.h>   /* @r{Declares @code{printf}.} */

int
main (void)
@{
  int i = 5;
  printf ("%d\n", i);
  printf ("%d\n", --i);
  printf ("%d\n", i);
  return 0;
@}
@end example

@noindent
prints three lines that contain (respectively) @samp{5}, @samp{4}, and
again @samp{4}.

@node Postincrement/Postdecrement
@section Postincrement and Postdecrement
@cindex postincrement expression
@cindex postdecrement expression
@cindex operator, postincrement
@cindex operator, postdecrement

Using @samp{++} or @samp{--} @emph{after} an lvalue does something
peculiar: it gets the value directly out of the lvalue and @emph{then}
increments or decrements it.  Thus, the value of @code{i++} is the same
as the value of @code{i}, but @code{i++} also increments @code{i} ``a
little later.''  This is called @dfn{postincrement} or
@dfn{postdecrement}.

For example,

@example
#include <stdio.h>   /* @r{Declares @code{printf}.} */

int
main (void)
@{
  int i = 5;
  printf ("%d\n", i);
  printf ("%d\n", i++);
  printf ("%d\n", i);
  return 0;
@}
@end example

@noindent
prints lines containing 5, again 5, and 6.  The expression @code{i++}
has the value 5, which is the value of @code{i} at the time,
but it increments @code{i} from 5 to 6 just a little later.

How much later is ``just a little later''?  That is flexible.  The
increment has to happen by the next @dfn{sequence point}.  In simple
cases, that means by the end of the statement.  @xref{Sequence Points}.

If a unary operator precedes a postincrement or postdecrement
expression, the increment or decrement nests inside:

@example
-a++   @r{is equivalent to}   -(a++)
@end example

That's the only order that makes sense; @code{-a} is not an lvalue, so
it can't be incremented.

The most common use of postincrement is with arrays.
Here's an example of using postincrement to access one element of an array and advance the index for the next access. Compare this with the example @code{avg_of_double}, which is almost the same but doesn't use postincrement (@pxref{Array Example}). @example double avg_of_double_alt (int length, double input_data[]) @{ double sum = 0; int i; /* @r{Fetch each element and add it into @code{sum}.} */ for (i = 0; i < length;) /* @r{Use the index @code{i}, then increment it.} */ sum += input_data[i++]; return sum / length; @} @end example @node Assignment in Subexpressions @section Pitfall: Assignment in Subexpressions @cindex assignment in subexpressions @cindex subexpressions, assignment in In C, the order of computing parts of an expression is not fixed. Aside from a few special cases, the operations can be computed in any order. If one part of the expression has an assignment to @code{x} and another part of the expression uses @code{x}, the result is unpredictable because that use might be computed before or after the assignment. Here's an example of ambiguous code: @example x = 20; printf ("%d %d\n", x, x = 4); @end example @noindent If the second argument, @code{x}, is computed before the third argument, @code{x = 4}, the second argument's value will be 20. If they are computed in the other order, the second argument's value will be 4. Here's one way to make that code unambiguous: @example y = 20; printf ("%d %d\n", y, x = 4); @end example Here's another way, with the other meaning: @example x = 4; printf ("%d %d\n", x, x); @end example This issue applies to all kinds of assignments, and to the increment and decrement operators, which are equivalent to assignments. @xref{Order of Execution}, for more information about this. However, it can be useful to write assignments inside an @code{if}-condition or @code{while}-test along with logical operators. @xref{Logicals and Assignments}. 
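
As a further illustration of the pitfall described in this section,
the following sketch reads the old value, assigns, and formats the
output in three separate steps, so that no argument list both uses and
assigns the same variable.  The function @code{format_old_and_new} and
its parameters are hypothetical names introduced only for this sketch.

```c
#include <stdio.h>

/* Hypothetical helper, not from the manual: store NEW_VALUE in *X and
   write the old and new values of *X into BUF as "old new".
   Returns whatever snprintf returns.  */
int
format_old_and_new (char *buf, size_t size, int *x, int new_value)
{
  int old_value = *x;   /* Read the old value first.  */

  *x = new_value;       /* Assign in a statement of its own.  */

  /* No argument below both reads and assigns *x, so the order in
     which the arguments are computed does not matter.  */
  return snprintf (buf, size, "%d %d", old_value, *x);
}
```

Calling it with @code{*x} equal to 20 and @code{new_value} equal to 4
should always produce the text @samp{20 4}, whichever order the
compiler chooses for computing the arguments.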
@node Write Assignments Separately @section Write Assignments in Separate Statements It is often convenient to write an assignment inside an @code{if}-condition, but that can reduce the readability of the program. Here's an example of what to avoid: @example if (x = advance (x)) @r{@dots{}} @end example The idea here is to advance @code{x} and test if the value is nonzero. However, readers might miss the fact that it uses @samp{=} and not @samp{==}. In fact, writing @samp{=} where @samp{==} was intended inside a condition is a common error, so GNU C can give warnings when @samp{=} appears in a way that suggests it's an error. It is much clearer to write the assignment as a separate statement, like this: @example x = advance (x); if (x != 0) @r{@dots{}} @end example @noindent This makes it unmistakably clear that @code{x} is assigned a new value. Another method is to use the comma operator (@pxref{Comma Operator}), like this: @example if (x = advance (x), x != 0) @r{@dots{}} @end example @noindent However, putting the assignment in a separate statement is usually clearer unless the assignment is very short, because it reduces nesting. @node Execution Control Expressions @chapter Execution Control Expressions @cindex execution control expressions @cindex expressions, execution control This chapter describes the C operators that combine expressions to control which of those expressions execute, or in which order. @menu * Logical Operators:: Logical conjunction, disjunction, negation. * Logicals and Comparison:: Logical operators with comparison operators. * Logicals and Assignments:: Assignments with logical operators. * Conditional Expression:: An if/else construct inside expressions. * Comma Operator:: Build a sequence of subexpressions. 
@end menu @node Logical Operators @section Logical Operators @cindex logical operators @cindex operators, logical @cindex conjunction operator @cindex disjunction operator @cindex negation operator, logical The @dfn{logical operators} combine truth values, which are normally represented in C as numbers. Any expression with a numeric value is a valid truth value: zero means false, and any other value means true. A pointer type is also meaningful as a truth value; a null pointer (which is zero) means false, and a non-null pointer means true (@pxref{Pointer Types}). The value of a logical operator is always 1 or 0 and has type @code{int} (@pxref{Integer Types}). The logical operators are used mainly in the condition of an @code{if} statement, or in the end test in a @code{for} statement or @code{while} statement (@pxref{Statements}). However, they are valid in any context where an integer-valued expression is allowed. @table @samp @item ! @var{exp} Unary operator for logical ``not.'' The value is 1 (true) if @var{exp} is 0 (false), and 0 (false) if @var{exp} is nonzero (true). @strong{Warning:} if @code{exp} is anything but an lvalue or a function call, you should write parentheses around it. @item @var{left} && @var{right} The logical ``and'' binary operator computes @var{left} and, if necessary, @var{right}. If both of the operands are true, the @samp{&&} expression gives the value 1 (which is true). Otherwise, the @samp{&&} expression gives the value 0 (false). If @var{left} yields a false value, that determines the overall result, so @var{right} is not computed. @item @var{left} || @var{right} The logical ``or'' binary operator computes @var{left} and, if necessary, @var{right}. If at least one of the operands is true, the @samp{||} expression gives the value 1 (which is true). Otherwise, the @samp{||} expression gives the value 0 (false). If @var{left} yields a true value, that determines the overall result, so @var{right} is not computed. 
@end table

@strong{Warning:} never rely on the relative precedence of @samp{&&}
and @samp{||}.  When you use them together, always use parentheses to
specify explicitly how they nest, as shown here:

@example
if ((r != 0 && x % r == 0)
    || (s != 0 && x % s == 0))
@end example

@node Logicals and Comparison
@section Logical Operators and Comparisons

The most common thing to use inside the logical operators is a
comparison.  Conveniently, @samp{&&} and @samp{||} have lower
precedence than comparison operators and arithmetic operators, so we
can write expressions like this without parentheses and get the
nesting that is natural: two comparison operations that must both be
true.

@example
if (r != 0 && x % r == 0)
@end example

@noindent
This example also shows how it is useful that @samp{&&} guarantees to
skip the right operand if the left one turns out false.  Because of
that, this code never tries to divide by zero.

This is equivalent:

@example
if (r && x % r == 0)
@end example

@noindent
A truth value is simply a number, so using @code{r} as a truth value
tests whether it is nonzero.  But @code{r}'s meaning as an expression
is not a truth value---it is a number to divide by.  So it is better
style to write the explicit @code{!= 0}.

Here's another equivalent way to write it:

@example
if (!(r == 0) && x % r == 0)
@end example

@noindent
This illustrates the unary @samp{!} operator, and the need to
write parentheses around its operand.

@node Logicals and Assignments
@section Logical Operators and Assignments

There are cases where assignments nested inside the condition can
actually make a program @emph{easier} to read.  Here is an example
using a hypothetical type @code{list} which represents a list; it
tests whether the list has at least two links, using hypothetical
functions, @code{nonempty} which is true if the argument is a nonempty
list, and @code{list_next} which advances from one list link to the
next.
We assume that a list is never a null pointer, so that the assignment expressions are always ``true.'' @example if (nonempty (list) && (temp1 = list_next (list)) && nonempty (temp1) && (temp2 = list_next (temp1))) @r{@dots{}} /* @r{use @code{temp1} and @code{temp2}} */ @end example @noindent Here we take advantage of the @samp{&&} operator to avoid executing the rest of the code if a call to @code{nonempty} returns ``false.'' The only natural place to put the assignments is among those calls. It would be possible to rewrite this as several statements, but that could make it much more cumbersome. On the other hand, when the test is even more complex than this one, splitting it into multiple statements might be necessary for clarity. If an empty list is a null pointer, we can dispense with calling @code{nonempty}: @example if ((temp1 = list_next (list)) && (temp2 = list_next (temp1))) @r{@dots{}} @end example @node Conditional Expression @section Conditional Expression @cindex conditional expression @cindex expression, conditional C has a conditional expression that selects one of two expressions to compute and get the value from. It looks like this: @example @var{condition} ? @var{iftrue} : @var{iffalse} @end example @menu * Conditional Rules:: Rules for the conditional operator. * Conditional Branches:: About the two branches in a conditional. @end menu @node Conditional Rules @subsection Rules for the Conditional Operator The first operand, @var{condition}, should be a value that can be compared with zero---a number or a pointer. If it is true (nonzero), then the conditional expression computes @var{iftrue} and its value becomes the value of the conditional expression. Otherwise the conditional expression computes @var{iffalse} and its value becomes the value of the conditional expression. The conditional expression always computes just one of @var{iftrue} and @var{iffalse}, never both of them. 
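
The rule that only one branch is computed can be observed with a small
sketch.  The counters and the functions @code{compute_iftrue},
@code{compute_iffalse} and @code{choose} are hypothetical names used
only for illustration.

```c
/* Hypothetical counters and functions, introduced only for this
   sketch.  Each counter records how often its branch was computed.  */
int iftrue_calls = 0;
int iffalse_calls = 0;

int
compute_iftrue (void)
{
  iftrue_calls++;
  return 10;
}

int
compute_iffalse (void)
{
  iffalse_calls++;
  return 20;
}

/* Exactly one of the two functions is called each time
   this conditional expression is evaluated.  */
int
choose (int condition)
{
  return (condition ? compute_iftrue () : compute_iffalse ());
}
```

After one call with a nonzero argument, only @code{iftrue_calls} has
been incremented; the branch that was not selected has no side effects
at all.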
Here's an example: the absolute value of a number @code{x} can be written as @code{(x >= 0 ? x : -x)}. @strong{Warning:} The conditional expression operators have rather low syntactic precedence. Except when the conditional expression is used as an argument in a function call, write parentheses around it. For clarity, always write parentheses around it if it extends across more than one line. Assignment operators and the comma operator (@pxref{Comma Operator}) have lower precedence than conditional expression operators, so write parentheses around those when they appear inside a conditional expression. @xref{Order of Execution}. @node Conditional Branches @subsection Conditional Operator Branches @cindex branches of conditional expression We call @var{iftrue} and @var{iffalse} the @dfn{branches} of the conditional. The two branches should normally have the same type, but a few exceptions are allowed. If they are both numeric types, the conditional converts both to their common type (@pxref{Common Type}). With pointers (@pxref{Pointers}), the two values can be pointers to nearly compatible types (@pxref{Compatible Types}). In this case, the result type is a similar pointer whose target type combines all the type qualifiers (@pxref{Type Qualifiers}) of both branches. If one branch has type @code{void *} and the other is a pointer to an object (not to a function), the conditional converts the @code{void *} branch to the type of the other. If one branch is an integer constant with value zero and the other is a pointer, the conditional converts zero to the pointer's type. In GNU C, you can omit @var{iftrue} in a conditional expression. In that case, if @var{condition} is nonzero, its value becomes the value of the conditional expression, after conversion to the common type. Thus, @example x ? : y @end example @noindent has the value of @code{x} if that is nonzero; otherwise, the value of @code{y}. 
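
Here is a minimal sketch of the omitted-operand form in complete
functions.  It relies on the GNU C extension just described, so other
compilers may reject it or warn about it.  The function names
@code{value_or_default} and @code{name_or_default} are hypothetical.

```c
/* Both functions use the GNU C extension that omits the middle
   operand of the conditional expression.  The names are
   hypothetical, chosen for this sketch.  */

/* Return X if it is nonzero; otherwise return Y.  */
int
value_or_default (int x, int y)
{
  return (x ? : y);
}

/* Return NAME if it is a non-null pointer; otherwise return
   a fallback string.  */
const char *
name_or_default (const char *name)
{
  return (name ? : "(none)");
}
```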
@cindex side effect in ?: @cindex ?: side effect Omitting @var{iftrue} is useful when @var{condition} has side effects. In that case, writing that expression twice would carry out the side effects twice, but writing it once does them just once. For example, if we suppose that the function @code{next_element} advances a pointer variable to point to the next element in a list and returns the new pointer, @example next_element () ? : default_pointer @end example @noindent is a way to advance the pointer and use its new value if it isn't null, but use @code{default_pointer} if that is null. We cannot do it this way, @example next_element () ? next_element () : default_pointer @end example @noindent because that would advance the pointer a second time. @node Comma Operator @section Comma Operator @cindex comma operator @cindex operator, comma The comma operator stands for sequential execution of expressions. The value of the comma expression comes from the last expression in the sequence; the previous expressions are computed only for their side effects. It looks like this: @example @var{exp1}, @var{exp2} @r{@dots{}} @end example @noindent You can bundle any number of expressions together this way, by putting commas between them. @menu * Uses of Comma:: When to use the comma operator. * Clean Comma:: Clean use of the comma operator. * Avoid Comma:: When to not use the comma operator. @end menu @node Uses of Comma @subsection The Uses of the Comma Operator With commas, you can put several expressions into a place that requires just one expression---for example, in the header of a @code{for} statement. This statement @example for (i = 0, j = 10, k = 20; i < n; i++) @end example @noindent contains three assignment expressions, to initialize @code{i}, @code{j} and @code{k}. The syntax of @code{for} requires just one expression for initialization; to include three assignments, we use commas to bundle them into a single larger expression, @code{i = 0, j = 10, k = 20}. 
This technique is also useful in the loop-advance expression, the last of the three inside the @code{for} parentheses. In the @code{for} statement and the @code{while} statement (@pxref{Loop Statements}), a comma provides a way to perform some side effect before the loop-exit test. For example, @example while (printf ("At the test, x = %d\n", x), x != 0) @end example @node Clean Comma @subsection Clean Use of the Comma Operator Always write parentheses around a series of comma operators, except when it is at top level in an expression statement, or within the parentheses of an @code{if}, @code{for}, @code{while}, or @code{switch} statement (@pxref{Statements}). For instance, in @example for (i = 0, j = 10, k = 20; i < n; i++) @end example @noindent the commas between the assignments are clear because they are between a parenthesis and a semicolon. The arguments in a function call are also separated by commas, but that is not an instance of the comma operator. Note the difference between @example foo (4, 5, 6) @end example @noindent which passes three arguments to @code{foo} and @example foo ((4, 5, 6)) @end example @noindent which uses the comma operator and passes just one argument (with value 6). @strong{Warning:} don't use the comma operator around an argument of a function unless it makes the code more readable. When you do so, don't put part of another argument on the same line. Instead, add a line break to make the parentheses around the comma operator easier to see, like this. @example foo ((mumble (x, y), frob (z)), *p) @end example @node Avoid Comma @subsection When Not to Use the Comma Operator You can use a comma in any subexpression, but in most cases it only makes the code confusing, and it is clearer to raise all but the last of the comma-separated expressions to a higher level. 
Thus, instead of this: @example x = (y += 4, 8); @end example @noindent it is much clearer to write this: @example y += 4, x = 8; @end example @noindent or this: @example y += 4; x = 8; @end example Use commas only in the cases where there is no clearer alternative involving multiple statements. By contrast, don't hesitate to use commas in the expansion in a macro definition. The trade-offs of code clarity are different in that case, because the @emph{use} of the macro may improve overall clarity so much that the ugliness of the macro's @emph{definition} is a small price to pay. @xref{Macros}. @node Binary Operator Grammar @chapter Binary Operator Grammar @cindex binary operator grammar @cindex grammar, binary operator @cindex operator precedence @cindex precedence, operator @cindex left-associative @dfn{Binary operators} are those that take two operands, one on the left and one on the right. All the binary operators in C are syntactically left-associative. This means that @w{@code{a @var{op} b @var{op} c}} means @w{@code{(a @var{op} b) @var{op} c}}. However, the only operators you should repeat in this way without parentheses are @samp{+}, @samp{-}, @samp{*} and @samp{/}, because those cases are clear from algebra. So it is OK to write @code{a + b + c} or @code{a - b - c}, but never @code{a == b == c} or @code{a % b % c}. For those operators, use explicit parentheses to show how the operations nest. Each C operator has a @dfn{precedence}, which is its rank in the grammatical order of the various operators. The operators with the highest precedence grab adjoining operands first; these expressions then become operands for operators of lower precedence. The precedence order of operators in C is fully specified, so any combination of operations leads to a well-defined nesting. We state only part of the full precedence ordering here because it is bad practice for C code to depend on the other cases. 
For cases not specified in this chapter, always use parentheses to make the nesting explicit.@footnote{Personal note from Richard Stallman: I wrote GCC without remembering anything about the C precedence order beyond what's stated here. I studied the full precedence table to write the parser, and promptly forgot it again. If you need to look up the full precedence order to understand some C code, fix the code with parentheses so nobody else needs to do that.} You can depend on this subsequence of the precedence ordering (stated from highest precedence to lowest): @enumerate @item Component access (@samp{.} and @samp{->}). @item Unary prefix operators. @item Unary postfix operators. @item Multiplication, division, and remainder (they have the same precedence). @item Addition and subtraction (they have the same precedence). @item Comparisons---but watch out! @item Logical operators @samp{&&} and @samp{||}---but watch out! @item Conditional expression with @samp{?} and @samp{:}. @item Assignments. @item Sequential execution (the comma operator, @samp{,}). @end enumerate Two of the lines in the above list say ``but watch out!'' That means that the line covers operators with subtly different precedence. Never depend on the grammar of C to decide how two comparisons nest; instead, always use parentheses to specify their nesting. You can let several @samp{&&} operators associate, or several @samp{||} operators, but always use parentheses to show how @samp{&&} and @samp{||} nest with each other. @xref{Logical Operators}. There is one other precedence ordering that code can depend on: @enumerate @item Unary postfix operators. @item Bitwise and shift operators---but watch out! @item Conditional expression with @samp{?} and @samp{:}. @end enumerate The caveat for bitwise and shift operators is like that for logical operators: you can let multiple uses of one bitwise operator associate, but always use parentheses to control nesting of dissimilar operators. 
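
The following sketch shows why parentheses matter when dissimilar
bitwise operators meet: the two nestings of @samp{&} and @samp{|} give
different results for the same operands.  The function names are
hypothetical, introduced only for this illustration.

```c
/* Hypothetical functions illustrating explicit nesting of
   dissimilar bitwise operators.  */

/* First mask A with B, then set the bits of C.  */
unsigned int
and_then_or (unsigned int a, unsigned int b, unsigned int c)
{
  return ((a & b) | c);
}

/* First combine B and C, then mask A with the result.  */
unsigned int
or_then_and (unsigned int a, unsigned int b, unsigned int c)
{
  return (a & (b | c));
}

/* Repeating one bitwise operator needs no extra parentheses.  */
unsigned int
all_bits (unsigned int a, unsigned int b, unsigned int c)
{
  return (a | b | c);
}
```

With @code{a} equal to 0, @code{b} equal to 0xF and @code{c} equal to
1, the first function returns 1 and the second returns 0, so the
parentheses are what tell the reader which one was intended.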
These lists do not specify any precedence ordering between the bitwise and shift operators of the second list and the binary operators above conditional expressions in the first list. When they come together, parenthesize them. @xref{Bitwise Operations}. @node Order of Execution @chapter Order of Execution @cindex order of execution The order of execution of a C program is not always obvious, and not necessarily predictable. This chapter describes what you can count on. @menu * Reordering of Operands:: Operations in C are not necessarily computed in the order they are written. * Associativity and Ordering:: Some associative operations are performed in a particular order; others are not. * Sequence Points:: Some guarantees about the order of operations. * Postincrement and Ordering:: Ambiguous execution order with postincrement. * Ordering of Operands:: Evaluation order of operands and function arguments. * Optimization and Ordering:: Compiler optimizations can reorder operations only if it has no impact on program results. @end menu @node Reordering of Operands @section Reordering of Operands @cindex ordering of operands @cindex reordering of operands @cindex operand execution ordering The C language does not necessarily carry out operations within an expression in the order they appear in the code. For instance, in this expression, @example foo () + bar () @end example @noindent @code{foo} might be called first or @code{bar} might be called first. If @code{foo} updates a datum and @code{bar} uses that datum, the results can be unpredictable. The unpredictable order of computation of subexpressions also makes a difference when one of them contains an assignment. We already saw this example of bad code, @example x = 20; printf ("%d %d\n", x, x = 4); @end example @noindent in which the second argument, @code{x}, has a different value depending on whether it is computed before or after the assignment in the third argument. 
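
As a sketch of the @code{foo () + bar ()} case above, suppose
@code{foo} updates a datum and @code{bar} uses it.  The variable
@code{datum} and the function bodies are assumptions made for this
illustration.  Writing the calls as separate statements is one way to
fix the order.

```c
/* Hypothetical datum shared by foo and bar.  */
int datum;

/* Updates the datum.  */
int
foo (void)
{
  datum = 1;
  return 10;
}

/* Uses the datum.  */
int
bar (void)
{
  return datum;
}

/* The result is 11 if foo is called first and 10 if bar is
   called first; C does not say which happens.  */
int
sum_unordered (void)
{
  datum = 0;
  return foo () + bar ();
}

/* Separate statements force foo to be called before bar,
   so the result is always 11.  */
int
sum_ordered (void)
{
  int f, b;

  datum = 0;
  f = foo ();
  b = bar ();
  return f + b;
}
```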
@node Associativity and Ordering @section Associativity and Ordering @cindex associativity and ordering An associative binary operator, such as @code{+}, when used repeatedly can combine any number of operands. The operands' values may be computed in any order. If the values are integers and overflow can be ignored, they may be combined in any order. Thus, given four functions that return @code{unsigned int}, calling them and adding their results as here @example (foo () + bar ()) + (baz () + quux ()) @end example @noindent may add up the results in any order. By contrast, arithmetic on signed integers, in which overflow is significant, is not always associative (@pxref{Integer Overflow}). Thus, the additions must be done in the order specified, obeying parentheses and left-association. That means computing @code{(foo () + bar ())} and @code{(baz () + quux ())} first (in either order), then adding the two. The same applies to arithmetic on floating-point values, since that too is not really associative. However, the GCC option @option{-funsafe-math-optimizations} allows the compiler to change the order of calculation when an associative operation (associative in exact mathematics) combines several operands. The option takes effect when compiling a module (@pxref{Compilation}). Changing the order of association can enable the program to pipeline the floating point operations. In all these cases, the four function calls can be done in any order. There is no right or wrong about that. @node Sequence Points @section Sequence Points @cindex sequence points @cindex full expression There are some points in the code where C makes limited guarantees about the order of operations. These are called @dfn{sequence points}. Here is where they occur: @itemize @bullet @item At the end of a @dfn{full expression}; that is to say, an expression that is not part of a larger expression. 
All side effects specified by that expression are carried out before execution moves on to subsequent code. @item At the end of the first operand of certain operators: @samp{,}, @samp{&&}, @samp{||}, and @samp{?:}. All side effects specified by that expression are carried out before any execution of the next operand. The commas that separate arguments in a function call are @emph{not} comma operators, and they do not create sequence points. The rule for function arguments and the rule for operands are different (@pxref{Ordering of Operands}). @item Just before calling a function. All side effects specified by the argument expressions are carried out before calling the function. If the function to be called is not constant---that is, if it is computed by an expression---all side effects in that expression are carried out before calling the function. @end itemize The ordering imposed by a sequence point applies locally to a limited range of code, as stated above in each case. For instance, the ordering imposed by the comma operator does not apply to code outside the operands of that comma operator. Thus, in this code, @example (x = 5, foo (x)) + x * x @end example @noindent the sequence point of the comma operator orders @code{x = 5} before @code{foo (x)}, but @code{x * x} could be computed before or after them. @node Postincrement and Ordering @section Postincrement and Ordering @cindex postincrement and ordering @cindex ordering and postincrement The ordering requirements for the postincrement and postdecrement operations (@pxref{Postincrement/Postdecrement}) are loose: those side effects must happen ``a little later,'' before the next sequence point. That still leaves room for various orders that give different results. In this expression, @example z = x++ - foo () @end example @noindent it's unpredictable whether @code{x} gets incremented before or after calling the function @code{foo}. 
If @code{foo} refers to @code{x}, it might see the old value or it
might see the incremented value.

In this perverse expression,

@example
x = x++
@end example

@noindent
@code{x} will certainly be incremented but the incremented value may
be replaced with the old value.  That's because the incrementation and
the assignment may occur in either order.  If the incrementation of
@code{x} occurs after the assignment to @code{x}, the incremented
value will remain in place.  But if the incrementation happens first,
the assignment will put the not-yet-incremented value back into
@code{x}, so the expression as a whole will leave @code{x} unchanged.

The conclusion: @strong{avoid such expressions}.  Take care, when you
use postincrement and postdecrement, that the specific expression you
use is not ambiguous as to order of execution.

@node Ordering of Operands
@section Ordering of Operands
@cindex ordering of operands
@cindex operand ordering

Operands and arguments can be computed in any order, but there are
limits to this intermixing in GNU C:

@itemize @bullet
@item
The operands of a binary arithmetic operator can be computed in either
order, but they can't be intermixed: one of them has to come first,
followed by the other.  Any side effects in the operand that's computed
first are executed before the other operand is computed.

@item
That applies to assignment operators too, except that, in simple
assignment, the previous value of the left operand is unused.

@item
The arguments in a function call can be computed in any order, but
they can't be intermixed.  Thus, one argument is fully computed, then
another, and so on until they have all been done.  Any side effects in
one argument are executed before computation of another argument
begins.
@end itemize

These rules don't cover side effects caused by postincrement and
postdecrement operators---those can be deferred up to the next
sequence point.
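
Because a postincrement can be deferred in this way, one way to remove
the doubt is to write the increment as a statement of its own, before
or after the use, whichever is intended.  The sketch below shows both
choices.  The variable @code{counter} and the function names are
hypothetical.

```c
/* Hypothetical variable that is both incremented and read
   by a function call.  */
int counter;

/* Reads the current value of counter.  */
int
read_counter (void)
{
  return counter;
}

/* The call sees the old value; the increment happens afterward.  */
int
subtract_then_increment (void)
{
  int z = counter - read_counter ();

  counter++;
  return z;
}

/* The increment happens first; the call sees the new value.  */
int
increment_then_subtract (void)
{
  int old = counter;

  counter++;
  return old - read_counter ();
}
```

Starting from a @code{counter} of 5, the first function returns 0 and
the second returns @minus{}1; each leaves @code{counter} at 6.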
If you want to get pedantic, the fact is that GCC can reorder the computations in many other ways provided that it doesn't alter the result of running the program. However, because it doesn't alter the result of running the program, it is negligible, unless you are concerned with the values in certain variables at various times as seen by other processes. In those cases, you should use @code{volatile} to prevent optimizations that would make them behave strangely. @xref{volatile}. @node Optimization and Ordering @section Optimization and Ordering @cindex optimization and ordering @cindex ordering and optimization Sequence points limit the compiler's freedom to reorder operations arbitrarily, but optimizations can still reorder them if the compiler concludes that this won't alter the results. Thus, in this code, @example x++; y = z; x++; @end example @noindent there is a sequence point after each statement, so the code is supposed to increment @code{x} once before the assignment to @code{y} and once after. However, incrementing @code{x} has no effect on @code{y} or @code{z}, and setting @code{y} can't affect @code{x}, so the code could be optimized into this: @example y = z; x += 2; @end example Normally that has no effect except to make the program faster. But there are special situations where it can cause trouble due to things that the compiler cannot know about, such as shared memory. To limit optimization in those places, use the @code{volatile} type qualifier (@pxref{volatile}). @node Primitive Types @chapter Primitive Data Types @cindex primitive types @cindex types, primitive This chapter describes all the primitive data types of C---that is, all the data types that aren't built up from other types. They include the types @code{int} and @code{double} that we've already covered. @menu * Integer Types:: Description of integer types. * Floating-Point Data Types:: Description of floating-point types. * Complex Data Types:: Description of complex number types. 
* The Void Type:: A type indicating no value at all. * Other Data Types:: A brief summary of other types. * Type Designators:: Referring to a data type abstractly. @end menu These types are all made up of bytes (@pxref{Storage}). @node Integer Types @section Integer Data Types @cindex integer types @cindex types, integer Here we describe all the integer types and their basic characteristics. @xref{Integers in Depth}, for more information about the bit-level integer data representations and arithmetic. @menu * Basic Integers:: Overview of the various kinds of integers. * Signed and Unsigned Types:: Integers can either hold both negative and non-negative values, or only non-negative. * Narrow Integers:: When to use smaller integer types. * Integer Conversion:: Casting a value from one integer type to another. * Boolean Type:: An integer type for boolean values. * Integer Variations:: Sizes of integer types can vary across platforms. @end menu @node Basic Integers @subsection Basic Integers @findex char @findex int @findex short int @findex long int @findex long long int Integer data types in C can be signed or unsigned. An unsigned type can represent only positive numbers and zero. A signed type can represent both positive and negative numbers, in a range spread almost equally on both sides of zero. Aside from signedness, the integer data types vary in size: how many bytes long they are. The size determines the range of integer values the type can hold. Here's a list of the signed integer data types, with the sizes they have on most computers. Each has a corresponding unsigned type; see @ref{Signed and Unsigned Types}. @table @code @item signed char One byte (8 bits). This integer type is used mainly for integers that represent characters, usually as elements of arrays or fields of other data structures. @item short @itemx short int Two bytes (16 bits). @item int Four bytes (32 bits). 
@item long @itemx long int Four bytes (32 bits) or eight bytes (64 bits), depending on the platform. Typically it is 32 bits on 32-bit computers and 64 bits on 64-bit computers, but there are exceptions. @item long long @itemx long long int Eight bytes (64 bits). Supported in GNU C in the 1980s, and incorporated into standard C as of ISO C99. @end table You can omit @code{int} when you use @code{long} or @code{short}. This is harmless and customary. @node Signed and Unsigned Types @subsection Signed and Unsigned Types @cindex signed types @cindex unsigned types @cindex types, signed @cindex types, unsigned @findex signed @findex unsigned An unsigned integer type can represent only positive numbers and zero. A signed type can represent both positive and negative numbers, in a range spread almost equally on both sides of zero. For instance, @code{unsigned char} holds numbers from 0 to 255 (on most computers), while @code{signed char} holds numbers from @minus{}128 to 127. Each of these types holds 256 different possible values, since they are both 8 bits wide. Write @code{signed} or @code{unsigned} before the type keyword to specify a signed or an unsigned type. However, the integer types other than @code{char} are signed by default; with them, @code{signed} is a no-op. Plain @code{char} may be signed or unsigned; this depends on the compiler, the machine in use, and its operating system. In many programs, it makes no difference whether @code{char} is signed. When it does matter, don't leave it to chance; write @code{signed char} or @code{unsigned char}.@footnote{Personal note from Richard Stallman: Eating with hackers at a fish restaurant, I ordered Arctic Char. When my meal arrived, I noted that the chef had not signed it.
So I complained, ``This char is unsigned---I wanted a signed char!'' Or rather, I would have said this if I had thought of it fast enough.} @node Narrow Integers @subsection Narrow Integers The types that are narrower than @code{int} are rarely used for ordinary variables---we declare them @code{int} instead. This is because C converts those narrower types to @code{int} for any arithmetic. There is literally no reason to declare a local variable @code{char}, for instance. In particular, if the value is really a character, you should declare the variable @code{int}. Not @code{char}! Using that narrow type can force the compiler to truncate values for conversion, which is a waste. Furthermore, some functions return either a character value, or @minus{}1 for ``no character.'' Using @code{int} makes it possible to distinguish @minus{}1 from a character by sign. The narrow integer types are useful as parts of other objects, such as arrays and structures. Compare these array declarations, whose sizes on 32-bit processors are shown: @example signed char ac[1000]; /* @r{1000 bytes} */ short as[1000]; /* @r{2000 bytes} */ int ai[1000]; /* @r{4000 bytes} */ long long all[1000]; /* @r{8000 bytes} */ @end example In addition, character strings must be made up of @code{char}s, because that's what all the standard library string functions expect. Thus, array @code{ac} could be used as a character string, but the others could not be. @node Integer Conversion @subsection Conversion among Integer Types C converts between integer types implicitly in many situations. It converts the narrow integer types, @code{char} and @code{short}, to @code{int} whenever they are used in arithmetic. Assigning a new value to an integer variable (or other lvalue) converts the value to the variable's type. You can also convert one integer type to another explicitly with a @dfn{cast} operator. @xref{Explicit Type Conversion}. 
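The implicit conversions described above can be seen in a small sketch. It assumes that @code{char} is 8 bits wide, and the function names are hypothetical, chosen only for illustration.

```c
/* A sketch assuming 8-bit char; all function names here are
   hypothetical.  */

/* Both operands are converted to int before the addition,
   so the result is not limited to the range of unsigned char.  */
int
sum_as_int (unsigned char a, unsigned char b)
{
  return a + b;
}

/* Storing the int sum in an unsigned char variable converts it
   to that type, which can change the value.  */
unsigned char
sum_as_uchar (unsigned char a, unsigned char b)
{
  unsigned char result = a + b;
  return result;
}

/* A cast requests the same conversion explicitly.  */
unsigned char
cast_to_uchar (int value)
{
  return (unsigned char) value;
}
```

Under that assumption, @code{sum_as_int (200, 100)} returns 300, while @code{sum_as_uchar (200, 100)} and @code{cast_to_uchar (300)} both return 44, since only the low 8 bits of 300 fit in the narrower type.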
The process of conversion to a wider type is straightforward: the value is unchanged. The only exception is when converting a negative value (in a signed type, obviously) to a wider unsigned type. In that case, the result is a positive value with the same bits (@pxref{Integers in Depth}). @cindex truncation Converting to a narrower type, also called @dfn{truncation}, involves discarding some of the value's bits. This is not considered overflow (@pxref{Integer Overflow}) because loss of significant bits is a normal consequence of truncation. Likewise for conversion between signed and unsigned types of the same width. More information about conversion for assignment is in @ref{Assignment Type Conversions}. For conversion for arithmetic, see @ref{Argument Promotions}. @node Boolean Type @subsection Boolean Type @cindex boolean type @cindex type, boolean @findex bool The unsigned integer type @code{bool} holds truth values: its possible values are 0 and 1. Converting any nonzero value to @code{bool} results in 1. For example: @example bool a = 0; bool b = 1; bool c = 4; /* @r{Stores the value 1 in @code{c}.} */ @end example Unlike @code{int}, @code{bool} is not a keyword. It is defined in the header file @file{stdbool.h}. @node Integer Variations @subsection Integer Variations The integer types of C have standard @emph{names}, but what they @emph{mean} varies depending on the kind of platform in use: which kind of computer, which operating system, and which compiler. It may even depend on the compiler options used. Plain @code{char} may be signed or unsigned; this depends on the platform, too. Even for GNU C, there is no general rule. In theory, all of the integer types' sizes can vary. @code{char} is always considered one ``byte'' for C, but it is not necessarily an 8-bit byte; on some platforms it may be more than 8 bits. 
ISO C specifies only that none of these types is narrower than the ones above it in the list in @ref{Basic Integers}, and that @code{short} has at least 16 bits. It is possible that in the future GNU C will support platforms where @code{int} is 64 bits long. In practice, however, on today's real computers, there is little variation; you can rely on the table given previously (@pxref{Basic Integers}). To be completely sure of the size of an integer type, use the types @code{int16_t}, @code{int32_t} and @code{int64_t}. Their corresponding unsigned types add @samp{u} at the front: @code{uint16_t}, @code{uint32_t} and @code{uint64_t}. To define all these types, include the header file @file{stdint.h}. The GNU C Compiler can compile for some embedded controllers that use two bytes for @code{int}. On some, @code{int} is just one ``byte,'' and so is @code{short int}---but that ``byte'' may contain 16 bits or even 32 bits. These processors can't support an ordinary operating system (they may have their own specialized operating systems), and most C programs do not try to support them. @node Floating-Point Data Types @section Floating-Point Data Types @cindex floating-point types @cindex types, floating-point @findex double @findex float @findex long double @dfn{Floating point} is the binary analogue of scientific notation: internally it represents a number as a fraction and a binary exponent; the value is that fraction multiplied by the specified power of 2. (The C standard nominally permits other bases, but in GNU C the base is always 2.) @c ??? For instance, to represent 6, the fraction would be 0.75 and the exponent would be 3; together they stand for the value @math{0.75 * 2@sup{3}}, meaning 0.75 * 8. The value 1.5 would use 0.75 as the fraction and 1 as the exponent. The value 0.75 would use 0.75 as the fraction and 0 as the exponent. The value 0.375 would use 0.75 as the fraction and @minus{}1 as the exponent. These binary exponents are used by machine instructions. 
You can write a floating-point constant this way if you wish, using hexadecimal; but normally we write floating-point numbers in decimal (base 10). @xref{Floating Constants}. C has three floating-point data types: @table @code @item double ``Double-precision'' floating point, which uses 64 bits. This is the normal floating-point type, and modern computers normally do their floating-point computations in this type, or some wider type. Except when there is a special reason to do otherwise, this is the type to use for floating-point values. @item float ``Single-precision'' floating point, which uses 32 bits. It is useful for floating-point values stored in structures and arrays, to save space when the full precision of @code{double} is not needed. In addition, single-precision arithmetic is faster on some computers, and occasionally that is useful. But not often---most programs don't use the type @code{float}. C would be cleaner if @code{float} were the name of the type we use for most floating-point values; however, for historical reasons, that's not so. @item long double ``Extended-precision'' floating point is either 80-bit or 128-bit precision, depending on the machine in use. On some machines, which have no floating-point format wider than @code{double}, this is equivalent to @code{double}. @end table Floating-point arithmetic raises many subtle issues. @xref{Floating Point in Depth}, for more information. @node Complex Data Types @section Complex Data Types @cindex complex numbers @cindex types, complex @cindex @code{_Complex} keyword @cindex @code{__complex__} keyword @findex _Complex @findex __complex__ Complex numbers can include both a real part and an imaginary part. The numeric constants covered above have real-numbered values. An imaginary-valued constant is an ordinary real-valued constant followed by @samp{i}. 
To declare numeric variables as complex, use the @code{_Complex} keyword.@footnote{For compatibility with older versions of GNU C, the keyword @code{__complex__} is also allowed. Going forward, however, use the new @code{_Complex} keyword as defined in ISO C11.} The standard C complex data types are floating point, @example _Complex float foo; _Complex double bar; _Complex long double quux; @end example @noindent but GNU C supports integer complex types as well. Since @code{_Complex} is a keyword just like @code{float} and @code{double} and @code{long}, the keywords can appear in any order, but the order shown above seems most logical. GNU C supports constants for complex values; for instance, @code{4.0 + 3.0i} has the value 4 + 3i as type @code{_Complex double}. @xref{Imaginary Constants}. To pull the real and imaginary parts of the number back out, GNU C provides the keywords @code{__real__} and @code{__imag__}: @example _Complex double foo = 4.0 + 3.0i; double a = __real__ foo; /* @r{@code{a} is now 4.0.} */ double b = __imag__ foo; /* @r{@code{b} is now 3.0.} */ @end example @noindent Standard C does not include these keywords, and instead relies on functions defined in @file{complex.h} for accessing the real and imaginary parts of a complex number: @code{crealf}, @code{creal}, and @code{creall} extract the real part of a float, double, or long double complex number, respectively; @code{cimagf}, @code{cimag}, and @code{cimagl} extract the imaginary part. @cindex complex conjugation GNU C also defines @samp{~} as an operator for complex conjugation, which means negating the imaginary part of a complex number: @example _Complex double foo = 4.0 + 3.0i; _Complex double bar = ~foo; /* @r{@code{bar} is now 4 @minus{} 3i.} */ @end example @noindent For standard C compatibility, you can use the appropriate library function: @code{conjf}, @code{conj}, or @code{conjl}.
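For comparison, here is a minimal sketch that uses only the standard C facilities named above: @code{_Complex_I}, @code{creal}, @code{cimag} and @code{conj}. The wrapper function names are hypothetical. On some systems, linking a program that calls these functions may require the math library.

```c
#include <complex.h>

/* Hypothetical helper: build a complex value without the GNU C
   `i' suffix, by multiplying the imaginary part by _Complex_I.  */
_Complex double
make_complex (double re, double im)
{
  return re + im * _Complex_I;
}

/* Standard C counterpart of __real__.  */
double
real_part (_Complex double z)
{
  return creal (z);
}

/* Standard C counterpart of __imag__.  */
double
imag_part (_Complex double z)
{
  return cimag (z);
}

/* Standard C counterpart of the GNU C `~' operator.  */
_Complex double
conjugate (_Complex double z)
{
  return conj (z);
}
```

With these definitions, @code{conjugate (make_complex (4.0, 3.0))} has real part 4.0 and imaginary part @minus{}3.0.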
@node The Void Type @section The Void Type @cindex void type @cindex type, void @findex void The data type @code{void} is a dummy---it allows no operations. It really means ``no value at all.'' When a function is meant to return no value, we write @code{void} for its return type. Then @code{return} statements in that function should not specify a value (@pxref{return Statement}). Here's an example: @example void print_if_positive (double x, double y) @{ if (x <= 0) return; if (y <= 0) return; printf ("Next point is (%f,%f)\n", x, y); @} @end example A @code{void}-returning function is comparable to what some other languages (for instance, Fortran and Pascal) call a ``procedure'' instead of a ``function.'' @c ??? Already presented @c @samp{%f} in an output template specifies to format a @code{double} value @c as a decimal number, using a decimal point if needed. @node Other Data Types @section Other Data Types Beyond the primitive types, C provides several ways to construct new data types. For instance, you can define @dfn{pointers}, values that represent the addresses of other data (@pxref{Pointers}). You can define @dfn{structures}, as in many other languages (@pxref{Structures}), and @dfn{unions}, which define multiple ways to interpret the contents of the same memory space (@pxref{Unions}). @dfn{Enumerations} are collections of named integer codes (@pxref{Enumeration Types}). @dfn{Array types} in C are used for allocating space for objects, but C does not permit operating on an array value as a whole. @xref{Arrays}. @node Type Designators @section Type Designators @cindex type designator Some C constructs require a way to designate a specific data type independent of any particular variable or expression which has that type. The way to do this is with a @dfn{type designator}. The constructs that need one include casts (@pxref{Explicit Type Conversion}) and @code{sizeof} (@pxref{Type Size}). 
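As a brief sketch of those two constructs, the hypothetical functions here use the type designators @code{unsigned long int} and @code{int}, first as the operand of @code{sizeof} and then in a cast.

```c
#include <stddef.h>

/* The operand of sizeof here is a type designator in
   parentheses, not an expression.  */
size_t
size_of_unsigned_long (void)
{
  return sizeof (unsigned long int);
}

/* A cast names its target type with a type designator.
   Converting a double to int discards the fraction.  */
int
whole_part (double x)
{
  return (int) x;
}
```

For instance, @code{whole_part (3.9)} returns 3.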
We also use type designators to talk about the type of a value in C, so you will see many type designators in this manual. When we say, ``The value has type @code{int},'' @code{int} is a type designator. To make the designator for any type, imagine a variable declaration for a variable of that type and delete the variable name and the final semicolon. For example, to designate the type of full-word integers, we start with the declaration for a variable @code{foo} with that type, which is this: @example int foo; @end example @noindent Then we delete the variable name @code{foo} and the semicolon, leaving @code{int}---exactly the keyword used in such a declaration. Therefore, the type designator for this type is @code{int}. What about long unsigned integers? From the declaration @example unsigned long int foo; @end example @noindent we determine that the designator is @code{unsigned long int}. Following this procedure, the designator for any primitive type is simply the set of keywords which specifies that type in a declaration. The same is true for compound types such as structures, unions, and enumerations. Designators for pointer types do follow the rule of deleting the variable name and semicolon, but the result is not so simple. @xref{Pointer Type Designators}, as part of the chapter about pointers. @xref{Array Type Designators}, for designators for array types. To understand what type a designator stands for, imagine a variable name inserted into the right place in the designator to make a valid declaration. What type would that variable be declared as? That is the type the designator designates. @node Constants @chapter Constants @cindex constants A @dfn{constant} is an expression that stands for a specific value by explicitly representing the desired value. C allows constants for numbers, characters, and strings. We have already seen numeric and string constants in the examples. @menu * Integer Constants:: Literal integer values.
* Integer Const Type:: Types of literal integer values. * Floating Constants:: Literal floating-point values. * Imaginary Constants:: Literal imaginary number values. * Invalid Numbers:: Avoiding preprocessing number misconceptions. * Character Constants:: Literal character values. * String Constants:: Literal string values. * UTF-8 String Constants:: Literal UTF-8 string values. * Unicode Character Codes:: Unicode characters represented in either UTF-16 or UTF-32. * Wide Character Constants:: Literal character values larger than 8 bits. * Wide String Constants:: Literal string values made up of 16- or 32-bit characters. @end menu @node Integer Constants @section Integer Constants @cindex integer constants @cindex constants, integer An integer constant consists of a number to specify the value, followed optionally by suffix letters to specify the data type. The simplest integer constants are numbers written in base 10 (decimal), such as @code{5}, @code{77}, and @code{403}. A decimal constant cannot start with the character @samp{0} (zero) because that makes the constant octal. You can get the effect of a negative integer constant by putting a minus sign at the beginning. In grammatical terms, that is an arithmetic expression rather than a constant, but it behaves just like a true constant. Integer constants can also be written in octal (base 8), hexadecimal (base 16), or binary (base 2). An octal constant starts with the character @samp{0} (zero), followed by any number of octal digits (@samp{0} to @samp{7}): @example 0 // @r{zero} 077 // @r{63} 0403 // @r{259} @end example @noindent Pedantically speaking, the constant @code{0} is an octal constant, but we can think of it as decimal; it has the same value either way.
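A leading zero is easy to add by accident, for instance when padding numbers to line them up in a table. This small sketch, with hypothetical function names, contrasts the same digits written with and without it.

```c
/* The leading zero makes this constant octal: 7*8 + 7.  */
int
with_leading_zero (void)
{
  return 077;
}

/* Without the leading zero, the same digits are decimal.  */
int
without_leading_zero (void)
{
  return 77;
}

/* 4*64 + 0*8 + 3, matching the octal example above.  */
int
octal_0403 (void)
{
  return 0403;
}
```

Here @code{with_leading_zero} returns 63, @code{without_leading_zero} returns 77, and @code{octal_0403} returns 259.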
A hexadecimal constant starts with @samp{0x} (upper or lower case) followed by hex digits (@samp{0} to @samp{9}, as well as @samp{a} through @samp{f} in upper or lower case): @example 0xff // @r{255} 0XA0 // @r{160} 0xffFF // @r{65535} @end example @cindex binary integer constants A binary constant starts with @samp{0b} (upper or lower case) followed by bits (each represented by the characters @samp{0} or @samp{1}): @example 0b101 // @r{5} @end example @noindent Binary constants are a GNU C extension, not part of the C standard. Sometimes a space is needed after an integer constant to avoid lexical confusion with the following tokens. @xref{Invalid Numbers}. @node Integer Const Type @section Integer Constant Data Types @cindex integer constant data types @cindex constant data types, integer @cindex types of integer constants The type of an integer constant is normally @code{int}, if the value fits in that type, but here are the complete rules. The type of an integer constant is the first one in this sequence that can properly represent the value, @enumerate @item @code{int} @item @code{unsigned int} @item @code{long int} @item @code{unsigned long int} @item @code{long long int} @item @code{unsigned long long int} @end enumerate @noindent and that isn't excluded by the following rules. If the constant has @samp{l} or @samp{L} as a suffix, that excludes the first two types (non-@code{long}). If the constant has @samp{ll} or @samp{LL} as a suffix, that excludes the first four types (non-@code{long long}). If the constant has @samp{u} or @samp{U} as a suffix, that excludes the signed types. Otherwise, if the constant is decimal (not binary, octal, or hexadecimal), that excludes the unsigned types. @c ### This said @code{unsigned int} is excluded. @c ### See 17 April 2016 Here are some examples of the suffixes.
@example 3000000000u // @r{three billion as @code{unsigned int}.} 0LL // @r{zero as a @code{long long int}.} 0403l // @r{259 as a @code{long int}.} @end example Suffixes in integer constants are rarely used. When the precise type is important, it is cleaner to convert explicitly (@pxref{Explicit Type Conversion}). @xref{Integer Types}. @node Floating Constants @section Floating-Point Constants @cindex floating-point constants @cindex constants, floating-point A floating-point constant must have either a decimal point, an exponent-of-ten, or both; they distinguish it from an integer constant. To indicate an exponent, write @samp{e} or @samp{E}. The exponent value follows. It is always written as a decimal number; it can optionally start with a sign. The exponent @var{n} means to multiply the constant's value by ten to the @var{n}th power. Thus, @samp{1500.0}, @samp{15e2}, @samp{15e+2}, @samp{15.0e2}, @samp{1.5e+3}, @samp{.15e4}, and @samp{15000e-1} are seven ways of writing a floating-point number whose value is 1500. They are all equivalent. Here are more examples with decimal points: @example 1.0 1000. 3.14159 .05 .0005 @end example For each of them, here are some equivalent constants written with exponents: @example 1e0, 1.0000e0 100e1, 100e+1, 100E+1, 1e3, 10000e-1 3.14159e0 5e-2, .0005e+2, 5E-2, .0005E2 .05e-2 @end example A floating-point constant normally has type @code{double}. You can force it to type @code{float} by adding @samp{f} or @samp{F} at the end. For example, @example 3.14159f 3.14159e0f 1000.f 100E1F .0005f .05e-2f @end example Likewise, @samp{l} or @samp{L} at the end forces the constant to type @code{long double}. You can use exponents in hexadecimal floating constants, but since @samp{e} would be interpreted as a hexadecimal digit, the character @samp{p} or @samp{P} (for ``power'') indicates an exponent.
The exponent in a hexadecimal floating constant is an optionally signed decimal integer that specifies a power of 2 (@emph{not} 10 or 16) to multiply into the number. Here are some examples: @example @group 0xAp2 // @r{40 in decimal} 0xAp-1 // @r{5 in decimal} 0x2.0Bp4 // @r{32.6875 in decimal} 0xE.2p3 // @r{113 in decimal} 0x123.ABCp0 // @r{291.6708984375 in decimal} 0x123.ABCp4 // @r{4666.734375 in decimal} 0x100p-8 // @r{1} 0x10p-4 // @r{1} 0x1p+4 // @r{16} 0x1p+8 // @r{256} @end group @end example @xref{Floating-Point Data Types}. @node Imaginary Constants @section Imaginary Constants @cindex imaginary constants @cindex complex constants @cindex constants, imaginary A complex number consists of a real part plus an imaginary part. (You may omit one part if it is zero.) This section explains how to write numeric constants with imaginary values. By adding these to ordinary real-valued numeric constants, we can make constants with complex values. The simple way to write an imaginary-number constant is to attach the suffix @samp{i} or @samp{I}, or @samp{j} or @samp{J}, to an integer or floating-point constant. For example, @code{2.5fi} has type @code{_Complex float} and @code{3i} has type @code{_Complex int}. The four alternative suffix letters are all equivalent. @cindex _Complex_I The other way to write an imaginary constant is to multiply a real constant by @code{_Complex_I}, which represents the imaginary number i. Standard C doesn't support suffixing with @samp{i} or @samp{j}, so this clunky method is needed. To write a complex constant with a nonzero real part and a nonzero imaginary part, write the two separately and add them, like this: @example 4.0 + 3.0i @end example @noindent That gives the value 4 + 3i, with type @code{_Complex double}. Such a sum can include multiple real constants, or none. Likewise, it can include multiple imaginary constants, or none.
For example: @example _Complex double foo, bar, quux; foo = 2.0i + 4.0 + 3.0i; /* @r{Imaginary part is 5.0.} */ bar = 4.0 + 12.0; /* @r{Imaginary part is 0.0.} */ quux = 3.0i + 15.0i; /* @r{Real part is 0.0.} */ @end example @xref{Complex Data Types}. @node Invalid Numbers @section Invalid Numbers Some number-like constructs which are not really valid as numeric constants are treated as numbers in preprocessing directives. If these constructs appear outside of preprocessing, they are erroneous. @xref{Preprocessing Tokens}. Sometimes we need to insert spaces to separate tokens so that they won't be combined into a single number-like construct. For example, @code{0xE+12} is a preprocessing number that is not a valid numeric constant, so it is a syntax error. If what we want is the three tokens @code{@w{0xE + 12}}, we have to insert two spaces as separators. @node Character Constants @section Character Constants @cindex character constants @cindex constants, character @cindex escape sequence A @dfn{character constant} is written with single quotes, as in @code{'@var{c}'}. In the simplest case, @var{c} is a single ASCII character that the constant should represent. The constant has type @code{int}, and its value is the character code of that character. For instance, @code{'a'} represents the character code for the letter @samp{a}: 97, that is. To put the @samp{'} character (single quote) in the character constant, @dfn{escape} it with a backslash (@samp{\}). This character constant looks like @code{'\''}. The backslash character here functions as an @dfn{escape character}, and such a sequence, starting with @samp{\}, is called an @dfn{escape sequence}. To put the @samp{\} character (backslash) in the character constant, escape it with @samp{\} (another backslash). This character constant looks like @code{'\\'}. 
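The points made so far about character constants can be checked with a short sketch. It assumes the ASCII character set, and the function names are hypothetical.

```c
#include <stddef.h>

/* A character constant has type int, so its size is that of
   int rather than that of char.  */
size_t
char_constant_size (void)
{
  return sizeof 'a';
}

/* Character code of the letter a: 97 in ASCII.  */
int
letter_a (void)
{
  return 'a';
}

/* The escaped single quote: 39 in ASCII.  */
int
single_quote (void)
{
  return '\'';
}

/* The escaped backslash: 92 in ASCII.  */
int
backslash (void)
{
  return '\\';
}
```

In particular, @code{char_constant_size ()} equals @code{sizeof (int)} in C.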
@cindex bell character @cindex @samp{\a} @cindex backspace @cindex @samp{\b} @cindex tab (ASCII character) @cindex @samp{\t} @cindex vertical tab @cindex @samp{\v} @cindex formfeed @cindex @samp{\f} @cindex newline @cindex @samp{\n} @cindex return (ASCII character) @cindex @samp{\r} @cindex escape (ASCII character) @cindex @samp{\e} Here are all the escape sequences that represent specific characters in a character constant. The numeric values shown are the corresponding ASCII character codes, as decimal numbers. @example '\a' @result{} 7 /* @r{alarm, @kbd{CTRL-g}} */ '\b' @result{} 8 /* @r{backspace, @key{BS}, @kbd{CTRL-h}} */ '\t' @result{} 9 /* @r{tab, @key{TAB}, @kbd{CTRL-i}} */ '\n' @result{} 10 /* @r{newline, @kbd{CTRL-j}} */ '\v' @result{} 11 /* @r{vertical tab, @kbd{CTRL-k}} */ '\f' @result{} 12 /* @r{formfeed, @kbd{CTRL-l}} */ '\r' @result{} 13 /* @r{carriage return, @key{RET}, @kbd{CTRL-m}} */ '\e' @result{} 27 /* @r{escape character, @key{ESC}, @kbd{CTRL-[}} */ '\\' @result{} 92 /* @r{backslash character, @kbd{\}} */ '\'' @result{} 39 /* @r{single quote character, @kbd{'}} */ '\"' @result{} 34 /* @r{double quote character, @kbd{"}} */ '\?' @result{} 63 /* @r{question mark, @kbd{?}} */ @end example @samp{\e} is a GNU C extension; to stick to standard C, write @samp{\33}. (The number after the backslash is octal.) To specify a character constant using decimal, use a cast; for instance, @code{(unsigned char) 27}. You can also write octal and hex character codes as @samp{\@var{octalcode}} or @samp{\x@var{hexcode}}. Decimal is not an option here, so octal codes do not need to start with @samp{0}. The character constant's value has type @code{int}. However, the character code is treated initially as a @code{char} value, which is then converted to @code{int}. If the character code is greater than 127 (@code{0177} in octal), the resulting @code{int} may be negative on a platform where the type @code{char} is 8 bits long and signed.
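Here is a small sketch of numeric escape sequences and of the sign issue just described. It assumes 8-bit @code{char}; the function names are hypothetical.

```c
/* Octal escape for the ASCII escape character, code 27.  */
int
escape_octal (void)
{
  return '\33';
}

/* The same character code written in hexadecimal.  */
int
escape_hex (void)
{
  return '\x1b';
}

/* Code 255 is first treated as a char value.  With 8-bit char
   the result is -1 if char is signed, 255 if it is unsigned.  */
int
high_code (void)
{
  return '\377';
}

/* Casting to unsigned char gives 255 in either case.  */
int
high_code_unsigned (void)
{
  return (unsigned char) '\377';
}
```

When the sign of a character code above 127 matters, a cast to @code{unsigned char} as in the last function gives the same result whether or not plain @code{char} is signed.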
@node String Constants @section String Constants @cindex string constants @cindex constants, string A @dfn{string constant} represents a series of characters. It starts with @samp{"} and ends with @samp{"}; in between are the contents of the string. Quoting special characters such as @samp{"}, @samp{\} and newline in the contents works in string constants as in character constants. In a string constant, @samp{'} does not need to be quoted. A string constant defines an array of characters which contains the specified characters followed by the null character (code 0). Using the string constant is equivalent to using the name of an array with those contents. In simple cases, where there are no backslash escape sequences, the length in bytes of the string constant is one greater than the number of characters written in it. As with any array in C, using the string constant in an expression converts the array to a pointer (@pxref{Pointers}) to the array's first element (@pxref{Accessing Array Elements}). This pointer will have type @code{char *} because it points to an element of type @code{char}. @code{char *} is an example of a type designator for a pointer type (@pxref{Pointer Type Designators}). That type is used for strings generally, not just the strings expressed as constants in a program. Thus, the string constant @code{"Foo!"} is almost equivalent to declaring an array like this @example char string_array_1[] = @{'F', 'o', 'o', '!', '\0' @}; @end example @noindent and then using @code{string_array_1} in the program. There are two differences, however: @itemize @bullet @item The string constant doesn't define a name for the array. @item The string constant is probably stored in a read-only area of memory. @end itemize Newlines are not allowed in the text of a string constant. The motive for this prohibition is to catch the error of omitting the closing @samp{"}. To put a newline in a constant string, write it as @samp{\n} in the string constant. 
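The relation between a string constant and its underlying array can be sketched as follows, using @code{strlen} from @file{string.h}. The function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* As the operand of sizeof, the string constant is still an
   array: four characters plus the terminating null.  */
size_t
foo_size (void)
{
  return sizeof "Foo!";
}

/* strlen counts the characters before the null character.  */
size_t
foo_length (void)
{
  return strlen ("Foo!");
}

/* In most expressions the constant becomes a pointer to its
   first element, so it can be indexed like any array.  */
char
foo_first (void)
{
  const char *p = "Foo!";
  return p[0];
}
```

Thus @code{foo_size} returns 5, @code{foo_length} returns 4, and @code{foo_first} returns @code{'F'}.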
A real null character in the source code inside a string constant causes a warning. To put a null character in the middle of a string constant, write @samp{\0} or @samp{\000}. Consecutive string constants are effectively concatenated. Thus, @example "Fo" "o!" @r{is equivalent to} "Foo!" @end example This is useful for writing a string containing multiple lines, like this: @example "This message is so long that it needs more than\n" "a single line of text. C does not allow a newline\n" "to represent itself in a string constant, so we have to\n" "write \\n to put it in the string. For readability of\n" "the source code, it is advisable to put line breaks in\n" "the source where they occur in the contents of the\n" "constant.\n" @end example The sequence of a backslash and a newline is ignored anywhere in a C program, and that includes inside a string constant. Thus, you can write multi-line string constants this way: @example "This is another way to put newlines in a string constant\n\ and break the line after them in the source code." @end example @noindent However, concatenation is the recommended way to do this. You can also write perverse string constants like this, @example "Fo\ o!" @end example @noindent but don't do that---write it like this instead: @example "Foo!" @end example Be careful to avoid passing a string constant to a function that modifies the string it receives. The memory where the string constant is stored may be read-only, which would cause a fatal @code{SIGSEGV} signal that normally terminates the program (@pxref{Signals}). Even worse, the memory may not be read-only. Then the function might modify the string constant, thus spoiling the contents of other string constants that are supposed to contain the same value and are unified by the compiler.
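One way to stay safe is to give the function a modifiable copy instead of the constant itself. The following is a minimal sketch; @code{capitalize_first} is a hypothetical function standing in for any function that modifies its argument.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical function that modifies the string it receives.  */
void
capitalize_first (char *s)
{
  s[0] = (char) toupper ((unsigned char) s[0]);
}

/* Initializing a char array from a string constant copies the
   characters into storage the program is allowed to modify.
   Returns 1 if the modified copy equals "Foo!".  */
int
capitalize_copy_matches (void)
{
  char buffer[] = "foo!";
  capitalize_first (buffer);
  return strcmp (buffer, "Foo!") == 0;
}

/* Consecutive string constants are concatenated, so these two
   spellings denote the same contents.  */
int
concatenation_matches (void)
{
  return strcmp ("Fo" "o!", "Foo!") == 0;
}
```

Calling @code{capitalize_first ("foo!")} directly would be the mistake described above; passing @code{buffer} avoids it because the array is the program's own storage.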
@node UTF-8 String Constants @section UTF-8 String Constants @cindex UTF-8 String Constants Writing @samp{u8} immediately before a string constant, with no intervening space, means to represent that string in UTF-8 encoding as a sequence of bytes. UTF-8 represents ASCII characters with a single byte, and represents non-ASCII Unicode characters (codes 128 and up) as multibyte sequences. Here is an example of a UTF-8 constant: @example u8"A cónstàñt" @end example This constant occupies 13 bytes plus the terminating null, because each of the accented letters is a two-byte sequence. Concatenating an ordinary string with a UTF-8 string conceptually produces another UTF-8 string. However, if the ordinary string contains character codes 128 and up, the results cannot be relied on. @node Unicode Character Codes @section Unicode Character Codes @cindex Unicode character codes @cindex universal character names You can specify Unicode characters, for individual character constants or as part of string constants (@pxref{String Constants}), using escape sequences. Use the @samp{\u} escape sequence with a 16-bit hexadecimal Unicode character code. If the code value is too big for 16 bits, use the @samp{\U} escape sequence with a 32-bit hexadecimal Unicode character code. (These codes are called @dfn{universal character names}.) For example, @example \u6C34 /* @r{16-bit code (UTF-16)} */ \U0010ABCD /* @r{32-bit code (UTF-32)} */ @end example @noindent One way to use these is in UTF-8 string constants (@pxref{UTF-8 String Constants}). 
For instance, @example u8"fóó \u6C34 \U0010ABCD" @end example You can also use them in wide character constants (@pxref{Wide Character Constants}), like this: @example u'\u6C34' /* @r{16-bit code} */ U'\U0010ABCD' /* @r{32-bit code} */ @end example @noindent and in wide string constants (@pxref{Wide String Constants}), like this: @example u"\u6C34\u6C33" /* @r{16-bit code} */ U"\U0010ABCD" /* @r{32-bit code} */ @end example Codes in the range of @code{D800} through @code{DFFF} are not valid in Unicode. Codes less than @code{00A0} are also forbidden, except for @code{0024}, @code{0040}, and @code{0060}; the forbidden codes stand for ASCII characters and control characters, which you can write directly or specify with other escape sequences (@pxref{Character Constants}). @node Wide Character Constants @section Wide Character Constants @cindex wide character constants @cindex constants, wide character A @dfn{wide character constant} represents characters with more than 8 bits of character code. This is an obscure feature that we need to document but that you probably won't ever use. If you're just learning C, you may as well skip this section. The original C wide character constant looks like @samp{L} (upper case!) followed immediately by an ordinary character constant (with no intervening space). Its data type is @code{wchar_t}, which is an alias defined in @file{stddef.h} for one of the standard integer types. Depending on the platform, it could be 16 bits or 32 bits. If it is 16 bits, these character constants use the UTF-16 form of Unicode; if 32 bits, UTF-32. There are also Unicode wide character constants which explicitly specify the width. These constants start with @samp{u} or @samp{U} instead of @samp{L}. @samp{u} specifies a 16-bit Unicode wide character constant, and @samp{U} a 32-bit Unicode wide character constant. Their types are, respectively, @code{char16_t} and @w{@code{char32_t}}; they are declared in the header file @file{uchar.h}.
These character constants are valid even if @file{uchar.h} is not included, but some uses of them may be inconvenient without including it to declare those type names. The character represented in a wide character constant can be an ordinary ASCII character. @code{L'a'}, @code{u'a'} and @code{U'a'} are all valid, and they are all equal to @code{'a'}. In all three kinds of wide character constants, you can write a non-ASCII Unicode character in the constant itself; the constant's value is the character's Unicode character code. Or you can specify the Unicode character with an escape sequence (@pxref{Unicode Character Codes}). @node Wide String Constants @section Wide String Constants @cindex wide string constants @cindex constants, wide string A @dfn{wide string constant} stands for an array of 16-bit or 32-bit characters. They are rarely used; if you're just learning C, you may as well skip this section. There are three kinds of wide string constants, which differ in the data type used for each character in the string. Each wide string constant is equivalent to an array of integers, but the data type of those integers depends on the kind of wide string. Using the constant in an expression will convert the array to a pointer to its first element, as usual for arrays in C (@pxref{Accessing Array Elements}). For each kind of wide string constant, we state here what type that pointer will be. @table @code @item char16_t This is a 16-bit Unicode wide string constant: each element is a 16-bit Unicode character code with type @code{char16_t}, so the string has the pointer type @code{char16_t@ *}. (That is a type designator; @pxref{Pointer Type Designators}.) The constant is written as @samp{u} (which must be lower case) followed (with no intervening space) by a string constant with the usual syntax. @item char32_t This is a 32-bit Unicode wide string constant: each element is a 32-bit Unicode character code, and the string has type @code{char32_t@ *}. 
It's written as @samp{U} (which must be upper case) followed (with no intervening space) by a string constant with the usual syntax.

@item wchar_t
This is the original kind of wide string constant. It's written as @samp{L} (which must be upper case) followed (with no intervening space) by a string constant with the usual syntax, and the string has type @code{wchar_t@ *}.

The width of the data type @code{wchar_t} depends on the target platform, which makes this kind of wide string somewhat less useful than the newer kinds.
@end table

@code{char16_t} and @code{char32_t} are declared in the header file @file{uchar.h}. @code{wchar_t} is declared in @file{stddef.h}.

Consecutive wide string constants of the same kind concatenate, just like ordinary string constants. A wide string constant concatenated with an ordinary string constant results in a wide string constant. You can't concatenate two wide string constants of different kinds. In addition, you can't concatenate a wide string constant (of any kind) with a UTF-8 string constant.

@node Type Size
@chapter Type Size
@cindex type size
@cindex size of type
@findex sizeof

Each data type has a @dfn{size}, which is the number of bytes (@pxref{Storage}) that it occupies in memory. To refer to the size in a C program, use @code{sizeof}. There are two ways to use it:

@table @code
@item sizeof @var{expression}
This gives the size of @var{expression}, based on its data type. It does not calculate the value of @var{expression}, only its size, so if @var{expression} includes side effects or function calls, they do not happen. Therefore, @code{sizeof} is normally a compile-time operation that has zero run-time cost. (The exception is an operand whose type is a variable-length array: its size is not known until the program runs.)

A value that is a bit field (@pxref{Bit Fields}) is not allowed as an operand of @code{sizeof}.

For example,

@example
double a;

i = sizeof a + 10;
@end example

@noindent
sets @code{i} to 18 on most computers because @code{a} occupies 8 bytes.
Here's how to determine the number of elements in an array @code{array}: @example (sizeof array / sizeof array[0]) @end example @noindent The expression @code{sizeof array} gives the size of the array, not the size of a pointer to an element. However, if @var{expression} is a function parameter that was declared as an array, that variable really has a pointer type (@pxref{Array Parm Pointer}), so the result is the size of that pointer. @item sizeof (@var{type}) This gives the size of @var{type}. For example, @example i = sizeof (double) + 10; @end example @noindent is equivalent to the previous example. You can't apply @code{sizeof} to an incomplete type (@pxref{Incomplete Types}), nor @code{void}. Using it on a function type gives 1 in GNU C, which makes adding an integer to a function pointer work as desired (@pxref{Pointer Arithmetic}). @end table @strong{Warning}: When you use @code{sizeof} with a type instead of an expression, you must write parentheses around the type. @strong{Warning}: When applying @code{sizeof} to the result of a cast (@pxref{Explicit Type Conversion}), you must write parentheses around the cast expression to avoid an ambiguity in the grammar of C@. Specifically, @example sizeof (int) -x @end example @noindent parses as @example (sizeof (int)) - x @end example @noindent If what you want is @example sizeof ((int) -x) @end example @noindent you must write it that way, with parentheses. The data type of the value of the @code{sizeof} operator is always one of the unsigned integer types; which one of those types depends on the machine. The header file @code{stddef.h} defines the typedef name @code{size_t} as an alias for this type. @xref{Defining Typedef Names}. @node Pointers @chapter Pointers @cindex pointers Among high-level languages, C is rather low-level, close to the machine. This is mainly because it has explicit @dfn{pointers}. A pointer value is the numeric address of data in memory. 
The type of data to be found at that address is specified by the data type of the pointer itself. Nothing in C can determine the ``correct'' data type of data in memory; it can only blindly follow the data type of the pointer you use to access the data. The unary operator @samp{*} gets the data that a pointer points to---this is called @dfn{dereferencing the pointer}. Its value always has the type that the pointer points to. C also allows pointers to functions, but since there are some differences in how they work, we treat them later. @xref{Function Pointers}. @menu * Address of Data:: Using the ``address-of'' operator. * Pointer Types:: For each type, there is a pointer type. * Pointer Declarations:: Declaring variables with pointer types. * Pointer Type Designators:: Designators for pointer types. * Pointer Dereference:: Accessing what a pointer points at. * Null Pointers:: Pointers which do not point to any object. * Invalid Dereference:: Dereferencing null or invalid pointers. * Void Pointers:: Totally generic pointers, can cast to any. * Pointer Comparison:: Comparing memory address values. * Pointer Arithmetic:: Computing memory address values. * Pointers and Arrays:: Using pointer syntax instead of array syntax. * Low-Level Pointer Arithmetic:: More about computing memory address values. * Pointer Increment/Decrement:: Incrementing and decrementing pointers. * Pointer Arithmetic Drawbacks:: A common pointer bug to watch out for. * Pointer-Integer Conversion:: Converting pointer types to integer types. * Printing Pointers:: Using @code{printf} for a pointer's value. @end menu @node Address of Data @section Address of Data @cindex address-of operator The most basic way to make a pointer is with the ``address-of'' operator, @samp{&}. 
Let's suppose we have these variables available: @example int i; double a[5]; @end example Now, @code{&i} gives the address of the variable @code{i}---a pointer value that points to @code{i}'s location---and @code{&a[3]} gives the address of the element 3 of @code{a}. (It is actually the fourth element in the array, since the first element has index 0.) The address-of operator is unusual because it operates on a place to store a value (an lvalue, @pxref{Lvalues}), not on the value currently stored there. (The left argument of a simple assignment is unusual in the same way.) You can use it on any lvalue except a bit field (@pxref{Bit Fields}) or a constructor (@pxref{Structure Constructors}). @node Pointer Types @section Pointer Types For each data type @var{t}, there is a type for pointers to type @var{t}. For these variables, @example int i; double a[5]; @end example @itemize @bullet @item @code{i} has type @code{int}; we say @code{&i} is a ``pointer to @code{int}.'' @item @code{a} has type @code{double[5]}; we say @code{&a} is a ``pointer to arrays of five @code{double}s.'' @item @code{a[3]} has type @code{double}; we say @code{&a[3]} is a ``pointer to @code{double}.'' @end itemize @node Pointer Declarations @section Pointer-Variable Declarations The way to declare that a variable @code{foo} points to type @var{t} is @example @var{t} *foo; @end example To remember this syntax, think ``if you dereference @code{foo}, using the @samp{*} operator, what you get is type @var{t}. 
Thus, @code{foo} points to type @var{t}.''

Thus, we can declare variables that hold pointers to these three types, like this:

@example
int *ptri;            /* @r{Pointer to @code{int}.} */
double *ptrd;         /* @r{Pointer to @code{double}.} */
double (*ptrda)[5];   /* @r{Pointer to @code{double[5]}.} */
@end example

@samp{int *ptri;} means, ``if you dereference @code{ptri}, you get an @code{int}.'' @samp{double (*ptrda)[5];} means, ``if you dereference @code{ptrda}, then subscript it by an integer less than 5, you get a @code{double}.'' The parentheses express the point that you would dereference it first, then subscript it.

Contrast the last one with this:

@example
double *aptrd[5];     /* @r{Array of five pointers to @code{double}.} */
@end example

@noindent
Because @samp{*} has lower syntactic precedence than subscripting, @samp{double *aptrd[5]} means, ``if you subscript @code{aptrd} by an integer less than 5, then dereference it, you get a @code{double}.'' Therefore, @code{*aptrd[5]} declares an array of pointers, not a pointer to an array.

@node Pointer Type Designators
@section Pointer-Type Designators

Every type in C has a designator; you make it by deleting the variable name and the semicolon from a declaration (@pxref{Type Designators}). Here are the designators for the pointer types of the example declarations in the previous section:

@example
int *           /* @r{Pointer to @code{int}.} */
double *        /* @r{Pointer to @code{double}.} */
double (*)[5]   /* @r{Pointer to @code{double[5]}.} */
@end example

Remember, to understand what type a designator stands for, imagine the corresponding variable declaration with a variable name in it, and figure out what type that variable would have. Thus, the type designator @code{double (*)[5]} corresponds to the variable declaration @code{double (*@var{variable})[5]}. That declares a pointer variable which, when dereferenced, gives an array of 5 @code{double}s.
So the type designator means, ``pointer to an array of 5 @code{double}s.'' @node Pointer Dereference @section Dereferencing Pointers @cindex dereferencing pointers @cindex pointer dereferencing The main use of a pointer value is to @dfn{dereference it} (access the data it points at) with the unary @samp{*} operator. For instance, @code{*&i} is the value at @code{i}'s address---which is just @code{i}. The two expressions are equivalent, provided @code{&i} is valid. A pointer-dereference expression whose type is data (not a function) is an lvalue. Pointers become really useful when we store them somewhere and use them later. Here's a simple example to illustrate the practice: @example @{ int i; int *ptr; ptr = &i; i = 5; @r{@dots{}} return *ptr; /* @r{Returns 5, fetched from @code{i}.} */ @} @end example This shows how to declare the variable @code{ptr} as type @code{int *} (pointer to @code{int}), store a pointer value into it (pointing at @code{i}), and use it later to get the value of the object it points at (the value in @code{i}). If anyone can provide a useful example which is this basic, I would be grateful. @node Null Pointers @section Null Pointers @cindex null pointers @cindex pointers, null @c ???stdio loads sttddef A pointer value can be @dfn{null}, which means it does not point to any object. The cleanest way to get a null pointer is by writing @code{NULL}, a standard macro defined in @file{stddef.h}. You can also do it by casting 0 to the desired pointer type, as in @code{(char *) 0}. (The cast operator performs explicit type conversion; @xref{Explicit Type Conversion}.) You can store a null pointer in any lvalue whose data type is a pointer type: @example char *foo; foo = NULL; @end example These two, if consecutive, can be combined into a declaration with initializer, @example char *foo = NULL; @end example You can also explicitly cast @code{NULL} to the specific pointer type you want---it makes no difference. 
@example char *foo; foo = (char *) NULL; @end example To test whether a pointer is null, compare it with zero or @code{NULL}, as shown here: @example if (p != NULL) /* @r{@code{p} is not null.} */ operate (p); @end example Since testing a pointer for not being null is basic and frequent, all but beginners in C will understand the conditional without need for @code{!= NULL}: @example if (p) /* @r{@code{p} is not null.} */ operate (p); @end example @node Invalid Dereference @section Dereferencing Null or Invalid Pointers Trying to dereference a null pointer is an error. On most platforms, it generally causes a signal, usually @code{SIGSEGV} (@pxref{Signals}). @example char *foo = NULL; c = *foo; /* @r{This causes a signal and terminates.} */ @end example @noindent Likewise a pointer that has the wrong alignment for the target data type (on most types of computer), or points to a part of memory that has not been allocated in the process's address space. The signal terminates the program, unless the program has arranged to handle the signal (@pxref{Signal Handling, The GNU C Library, , libc, The GNU C Library Reference Manual}). However, the signal might not happen if the dereference is optimized away. In the example above, if you don't subsequently use the value of @code{c}, GCC might optimize away the code for @code{*foo}. You can prevent such optimization using the @code{volatile} qualifier, as shown here: @example volatile char *p; volatile char c; c = *p; @end example You can use this to test whether @code{p} points to unallocated memory. Set up a signal handler first, so the signal won't terminate the program. @node Void Pointers @section Void Pointers @cindex void pointers @cindex pointers, void The peculiar type @code{void *}, a pointer whose target type is @code{void}, is used often in C@. It represents a pointer to we-don't-say-what. 
Thus,

@example
void *numbered_slot_pointer (int);
@end example

@noindent
declares a function @code{numbered_slot_pointer} that takes an integer parameter and returns a pointer, but we don't say what type of data it points to.

With type @code{void *}, you can pass the pointer around and test whether it is null. However, dereferencing it gives a @code{void} value that can't be used (@pxref{The Void Type}). To dereference the pointer, first convert it to some other pointer type.

Assignments convert @code{void *} automatically to any other pointer type, if the left operand has a pointer type; for instance,

@example
@{
  int *p;
  /* @r{Converts return value to @code{int *}.} */
  p = numbered_slot_pointer (5);
  @r{@dots{}}
@}
@end example

Passing an argument of type @code{void *} for a parameter that has a pointer type also converts. For example, supposing the function @code{hack} is declared to require type @code{float *} for its argument, this call converts the pointer returned by @code{numbered_slot_pointer} to that type.

@example
/* @r{Declare @code{hack} that way.}
   @r{We assume it is defined somewhere else.} */
void hack (float *);
@dots{}
/* @r{Now call @code{hack}.} */
@{
  /* @r{Converts return value of @code{numbered_slot_pointer}}
     @r{to @code{float *} to pass it to @code{hack}.} */
  hack (numbered_slot_pointer (5));
  @r{@dots{}}
@}
@end example

You can also convert to another pointer type with an explicit cast (@pxref{Explicit Type Conversion}), like this:

@example
(int *) numbered_slot_pointer (5)
@end example

Here is an example which decides at run time which pointer type to convert to:

@example
void
extract_int_or_double (void *ptr, bool its_an_int)
@{
  if (its_an_int)
    handle_an_int (*(int *)ptr);
  else
    handle_a_double (*(double *)ptr);
@}
@end example

The expression @code{*(int *)ptr} means to convert @code{ptr} to type @code{int *}, then dereference it.
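As a further illustration, here is a minimal sketch of a complete function that chooses the pointer type at run time. The name @code{fetch_as_double} is hypothetical, invented for this example; it assumes the caller truthfully reports which type of object the pointer refers to.

```c
#include <stdbool.h>

/* Hypothetical helper: fetch the number that PTR points to and
   return it as a double.  ITS_AN_INT tells us which type of
   object is really stored at that address; C cannot check this.  */
double
fetch_as_double (void *ptr, bool its_an_int)
{
  if (its_an_int)
    /* Convert to int *, dereference, then convert the int value.  */
    return (double) *(int *) ptr;
  else
    /* Convert to double * and dereference.  */
    return *(double *) ptr;
}
```

Passing @code{&i} for an @code{int} variable @code{i}, or @code{&d} for a @code{double} variable @code{d}, needs no cast, because any data pointer converts automatically to @code{void *} when passed as an argument.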
@node Pointer Comparison @section Pointer Comparison @cindex pointer comparison @cindex comparison, pointer Two pointer values are equal if they point to the same location, or if they are both null. You can test for this with @code{==} and @code{!=}. Here's a trivial example: @example @{ int i; int *p, *q; p = &i; q = &i; if (p == q) printf ("This will be printed.\n"); if (p != q) printf ("This won't be printed.\n"); @} @end example Ordering comparisons such as @code{>} and @code{>=} operate on pointers by converting them to unsigned integers. The C standard says the two pointers must point within the same object in memory, but on GNU/Linux systems these operations simply compare the numeric values of the pointers. The pointer values to be compared should in principle have the same type, but they are allowed to differ in limited cases. First of all, if the two pointers' target types are nearly compatible (@pxref{Compatible Types}), the comparison is allowed. If one of the operands is @code{void *} (@pxref{Void Pointers}) and the other is another pointer type, the comparison operator converts the @code{void *} pointer to the other type so as to compare them. (In standard C, this is not allowed if the other type is a function pointer type, but it works in GNU C@.) Comparison operators also allow comparing the integer 0 with a pointer value. This works by converting 0 to a null pointer of the same type as the other operand. @node Pointer Arithmetic @section Pointer Arithmetic @cindex pointer arithmetic @cindex arithmetic, pointer Adding an integer (positive or negative) to a pointer is valid in C@. It assumes that the pointer points to an element in an array, and advances or retracts the pointer across as many array elements as the integer specifies. Here is an example, in which adding a positive integer advances the pointer to a later element in the same array. 
@example void incrementing_pointers () @{ int array[5] = @{ 45, 29, 104, -3, 123456 @}; int elt0, elt1, elt4; int *p = &array[0]; /* @r{Now @code{p} points at element 0. Fetch it.} */ elt0 = *p; ++p; /* @r{Now @code{p} points at element 1. Fetch it.} */ elt1 = *p; p += 3; /* @r{Now @code{p} points at element 4 (the last). Fetch it.} */ elt4 = *p; printf ("elt0 %d elt1 %d elt4 %d.\n", elt0, elt1, elt4); /* @r{Prints elt0 45 elt1 29 elt4 123456.} */ @} @end example Here's an example where adding a negative integer retracts the pointer to an earlier element in the same array. @example void decrementing_pointers () @{ int array[5] = @{ 45, 29, 104, -3, 123456 @}; int elt0, elt3, elt4; int *p = &array[4]; /* @r{Now @code{p} points at element 4 (the last). Fetch it.} */ elt4 = *p; --p; /* @r{Now @code{p} points at element 3. Fetch it.} */ elt3 = *p; p -= 3; /* @r{Now @code{p} points at element 0. Fetch it.} */ elt0 = *p; printf ("elt0 %d elt3 %d elt4 %d.\n", elt0, elt3, elt4); /* @r{Prints elt0 45 elt3 -3 elt4 123456.} */ @} @end example If one pointer value was made by adding an integer to another pointer value, it should be possible to subtract the pointer values and recover that integer. That works too in C@. @example void subtract_pointers () @{ int array[5] = @{ 45, 29, 104, -3, 123456 @}; int *p0, *p3, *p4; int *p = &array[4]; /* @r{Now @code{p} points at element 4 (the last). Save the value.} */ p4 = p; --p; /* @r{Now @code{p} points at element 3. Save the value.} */ p3 = p; p -= 3; /* @r{Now @code{p} points at element 0. Save the value.} */ p0 = p; printf ("%d, %d, %d, %d\n", p4 - p0, p0 - p0, p3 - p0, p0 - p3); /* @r{Prints 4, 0, 3, -3.} */ @} @end example The addition operation does not know where arrays begin or end in memory. All it does is add the integer (multiplied by target object size) to the numeric value of the pointer. When the initial pointer and the result point into the same array, the result is well-defined. 
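To see the multiplication by the target object size, one can measure how many bytes separate a pointer from the result of adding an integer to it. The following is a small sketch; the function name @code{byte_distance} is made up for this illustration, and it assumes that @code{p} and @code{p + n} point into the same array.

```c
#include <stddef.h>

/* Hypothetical helper: return the distance in bytes between P and
   P + N.  Both must point into (or just past) the same array.  */
ptrdiff_t
byte_distance (int *p, int n)
{
  /* Viewing the addresses as char * makes the unit one byte.  */
  char *start = (char *) p;
  char *end = (char *) (p + n);
  return end - start;
}
```

For an array of @code{int}, @code{byte_distance (array, 3)} equals @code{3 * sizeof (int)}.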
@strong{Warning:} Only experts should do pointer arithmetic involving pointers into different memory objects. The difference between two pointers has type @code{int}, or @code{long} if necessary (@pxref{Integer Types}). The clean way to declare it is to use the typedef name @code{ptrdiff_t} defined in the file @file{stddef.h}. C defines pointer subtraction to be consistent with pointer-integer addition, so that @code{(p3 - p1) + p1} equals @code{p3}, as in ordinary algebra. Pointer subtraction works by subtracting @code{p1}'s numeric value from @code{p3}'s, and dividing by target object size. The two pointer arguments should point into the same array. In standard C, addition and subtraction are not allowed on @code{void *}, since the target type's size is not defined in that case. Likewise, they are not allowed on pointers to function types. However, these operations work in GNU C, and the ``size of the target type'' is taken as 1 byte. @node Pointers and Arrays @section Pointers and Arrays @cindex pointers and arrays @cindex arrays and pointers The clean way to refer to an array element is @code{@var{array}[@var{index}]}. Another, complicated way to do the same job is to get the address of that element as a pointer, then dereference it: @code{* (&@var{array}[0] + @var{index})} (or equivalently @code{* (@var{array} + @var{index})}). This first gets a pointer to element zero, then increments it with @code{+} to point to the desired element, then gets the value from there. That pointer-arithmetic construct is the @emph{definition} of square brackets in C@. @code{@var{a}[@var{b}]} means, by definition, @code{*(@var{a} + @var{b})}. This definition uses @var{a} and @var{b} symmetrically, so one must be a pointer and the other an integer; it does not matter which comes first. Since indexing with square brackets is defined in terms of addition and dereferencing, that too is symmetrical. Thus, you can write @code{3[array]} and it is equivalent to @code{array[3]}. 
However, it would be foolish to write @code{3[array]}, since it has no advantage and could confuse people who read the code. It may seem like a discrepancy that the definition @code{*(@var{a} + @var{b})} requires a pointer, while @code{array[3]} uses an array value instead. Why is this valid? The name of the array, when used by itself as an expression (other than in @code{sizeof}), stands for a pointer to the array's zeroth element. Thus, @code{array + 3} converts @code{array} implicitly to @code{&array[0]}, and the result is a pointer to element 3, equivalent to @code{&array[3]}. Since square brackets are defined in terms of such an addition, @code{array[3]} first converts @code{array} to a pointer. That's why it works to use an array directly in that construct. @node Low-Level Pointer Arithmetic @section Pointer Arithmetic at Low-Level @cindex pointer arithmetic, low-level @cindex low level pointer arithmetic The behavior of pointer arithmetic is theoretically defined only when the pointer values all point within one object allocated in memory. But the addition and subtraction operators can't tell whether the pointer values are all within one object. They don't know where objects start and end. So what do they really do? Adding pointer @var{p} to integer @var{i} treats @var{p} as a memory address, which is in fact an integer---call it @var{pint}. It treats @var{i} as a number of elements of the type that @var{p} points to. These elements' sizes add up to @code{@var{i} * sizeof (*@var{p})}. So the sum, as an integer, is @code{@var{pint} + @var{i} * sizeof (*@var{p})}. This value is reinterpreted as a pointer of the same type as @var{p}. If the starting pointer value @var{p} and the result do not point at parts of the same object, the operation is not officially legitimate, and C code is not ``supposed'' to do it. But you can do it anyway, and it gives precisely the results described by the procedure above. 
In some special situations it can do something useful, but non-wizards should avoid it.

Here's a function to offset a pointer value @emph{as if} it pointed to an object of any given size, by explicitly performing that calculation:

@example
#include <stdint.h>

void *
ptr_add (void *p, int i, int objsize)
@{
  intptr_t p_address = (intptr_t) p;
  intptr_t totalsize = i * objsize;
  intptr_t new_address = p_address + totalsize;
  return (void *) new_address;
@}
@end example

@noindent
@cindex @code{intptr_t}
This does the same job as @code{@var{p} + @var{i}} with the proper pointer type for @var{p}. It uses the type @code{intptr_t}, which is defined in the header file @file{stdint.h}. (In practice, @code{long long} would always work, but it is cleaner to use @code{intptr_t}.)

@node Pointer Increment/Decrement
@section Pointer Increment and Decrement
@cindex pointer increment and decrement
@cindex incrementing pointers
@cindex decrementing pointers

The @samp{++} operator adds 1 to a variable. We have seen it for integers (@pxref{Increment/Decrement}), but it works for pointers too. For instance, suppose we have a series of positive integers, terminated by a zero, and we want to add them up. Here is a simple way to step forward through the array by advancing a pointer.

@example
int
sum_array_till_0 (int *p)
@{
  int sum = 0;

  for (;;)
    @{
      /* @r{Fetch the next integer.} */
      int next = *p++;
      /* @r{Exit the loop if it's 0.} */
      if (next == 0)
        break;
      /* @r{Add it into running total.} */
      sum += next;
    @}

  return sum;
@}
@end example

@noindent
The statement @samp{break;} will be explained further on (@pxref{break Statement}). Used in this way, it immediately exits the surrounding @code{for} statement.

@code{*p++} parses as @code{*(p++)}, because a postfix operator always takes precedence over a prefix operator. Therefore, it dereferences the entering value of @code{p}, then increments @code{p} afterwards.

Incrementing a variable means adding 1 to it, as in @code{p = p + 1}.
Since @code{p} is a pointer, adding 1 to it advances it by the width of the datum it points to---in this case, @code{sizeof (int)}. Therefore, each iteration of the loop picks up the next integer from the series and puts it into @code{next}. This @code{for}-loop has no initialization expression since @code{p} and @code{sum} are already initialized, has no end-test since the @samp{break;} statement will exit it, and needs no expression to advance it since that's done within the loop by incrementing @code{p} and @code{sum}. Thus, those three expressions after @code{for} are left empty. Another way to write this function is by keeping the parameter value unchanged and using indexing to access the integers in the table. @example int sum_array_till_0_indexing (int *p) @{ int i; int sum = 0; for (i = 0; ; i++) @{ /* @r{Fetch the next integer.} */ int next = p[i]; /* @r{Exit the loop if it's 0.} */ if (next == 0) break; /* @r{Add it into running total.} */ sum += next; @} return sum; @} @end example In this program, instead of advancing @code{p}, we advance @code{i} and add it to @code{p}. (Recall that @code{p[i]} means @code{*(p + i)}.) Either way, it uses the same address to get the next integer. It makes no difference in this program whether we write @code{i++} or @code{++i}, because the value @emph{of that expression} is not used. We use it for its effect, to increment @code{i}. 
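For comparison, here is a sketch of a third variant, not one of the versions above, that puts the end test and the pointer increment into the @code{for} header instead of the loop body. The function name is ours. Like the other versions, it assumes the series really is terminated by a zero.

```c
/* Hypothetical variant: add up the integers starting at P,
   stopping at the terminating zero.  */
int
sum_array_till_0_for (int *p)
{
  int sum = 0;

  /* No initialization is needed; the test looks at the current
     element, and p++ advances to the next one.  */
  for (; *p != 0; p++)
    sum += *p;

  return sum;
}
```

Here too the value of the expression @code{p++} is not used, so @code{++p} would do the same job.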
The @samp{--} operator also works on pointers; it can be used to step backwards through an array, like this:

@example
int
after_last_nonzero (int *p, int len)
@{
  /* @r{Set up @code{q} to point just after the last array element.} */
  int *q = p + len;

  while (q != p)
    /* @r{Step @code{q} back until it reaches a nonzero element.} */
    if (*--q != 0)
      /* @r{Return the index of the element after that nonzero.} */
      return q - p + 1;

  return 0;
@}
@end example

That function returns the length of the nonzero part of the array specified by its arguments; that is, the index of the first zero of the run of zeros at the end.

@node Pointer Arithmetic Drawbacks
@section Drawbacks of Pointer Arithmetic
@cindex drawbacks of pointer arithmetic
@cindex pointer arithmetic, drawbacks

Pointer arithmetic is clean and elegant, but it is also the cause of a major security flaw in the C language. Theoretically, it is only valid to adjust a pointer within one object allocated as a unit in memory. However, if you unintentionally adjust a pointer across the bounds of the object and into some other object, the system has no way to detect this error.

A bug which does that can easily result in clobbering (overwriting) part of another object. For example, with @code{array[-1]} you can read or write the nonexistent element before the beginning of an array---probably part of some other data.

Combining pointer arithmetic with casts between pointer types, you can create a pointer that fails to be properly aligned for its type. For example,

@example
int a[2];
char *pa = (char *)a;
int *p = (int *)(pa + 1);
@end example

@noindent
gives @code{p} a value pointing to an ``integer'' that includes part of @code{a[0]} and part of @code{a[1]}. Dereferencing that with @code{*p} can cause a fatal @code{SIGSEGV} signal or it can return the contents of that badly aligned @code{int} (@pxref{Signals}). If it ``works,'' it may be quite slow. It can also cause aliasing confusions (@pxref{Aliasing}).
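When a program really must fetch an @code{int} from an address that may not be properly aligned, one common way to sidestep the problem is to copy the bytes into a properly aligned variable with the standard library function @code{memcpy}, declared in @file{string.h}. The sketch below shows the idea; the function name @code{read_int_unaligned} is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: fetch an int stored at ADDR, which need
   not be aligned for type int.  We never dereference an int *
   made from ADDR; we copy the bytes instead.  */
int
read_int_unaligned (const char *addr)
{
  int value;
  memcpy (&value, addr, sizeof value);
  return value;
}
```

This relies only on byte-by-byte copying, so it works wherever the bytes at @code{addr} were originally stored from an @code{int} on the same machine.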
@strong{Warning:} Using improperly aligned pointers is risky---don't do it unless it is really necessary.

@node Pointer-Integer Conversion
@section Pointer-Integer Conversion
@cindex pointer-integer conversion
@cindex conversion between pointers and integers
@cindex @code{uintptr_t}

On modern computers, an address is simply a number. It occupies the same space as some size of integer. In C, you can convert a pointer to the appropriate integer types and vice versa, without losing information. The appropriate integer types are @code{uintptr_t} (an unsigned type) and @code{intptr_t} (a signed type). Both are defined in @file{stdint.h}.

For instance,

@example
#include <stdint.h>
#include <stdio.h>

void
print_pointer (void *ptr)
@{
  uintptr_t converted = (uintptr_t) ptr;

  printf ("Pointer value is 0x%x\n",
          (unsigned int) converted);
@}
@end example

@noindent
The specification @samp{%x} in the template (the first argument) for @code{printf} means to represent this argument using hexadecimal notation. It's cleaner to use @code{uintptr_t}, since hexadecimal printing treats the number as unsigned, but it won't actually matter: all @code{printf} gets to see is the series of bits in the number.

@strong{Warning:} Converting pointers to integers is risky---don't do it unless it is really necessary.

@node Printing Pointers
@section Printing Pointers

To print the numeric value of a pointer, use the @samp{%p} specifier. For example:

@example
void
print_pointer (void *ptr)
@{
  printf ("Pointer value is %p\n", ptr);
@}
@end example

The specification @samp{%p} works with any pointer type. It prints @samp{0x} followed by the address in hexadecimal, printed as the appropriate unsigned integer type.

@node Structures
@chapter Structures
@cindex structures
@findex struct
@cindex fields in structures

A @dfn{structure} is a user-defined data type that holds various @dfn{fields} of data. Each field has a name and a data type specified in the structure's definition.
Here we define a structure suitable for storing a linked list of integers. Each list item will hold one integer, plus a pointer to the next item. @example struct intlistlink @{ int datum; struct intlistlink *next; @}; @end example The structure definition has a @dfn{type tag} so that the code can refer to this structure. The type tag here is @code{intlistlink}. The definition refers recursively to the same structure through that tag. You can define a structure without a type tag, but then you can't refer to it again. That is useful only in some special contexts, such as inside a @code{typedef} or a @code{union}. The contents of the structure are specified by the @dfn{field declarations} inside the braces. Each field in the structure needs a declaration there. The fields in one structure definition must have distinct names, but these names do not conflict with any other names in the program. A field declaration looks just like a variable declaration. You can combine field declarations with the same beginning, just as you can combine variable declarations. This structure has two fields. One, named @code{datum}, has type @code{int} and will hold one integer in the list. The other, named @code{next}, is a pointer to another @code{struct intlistlink} which would be the rest of the list. In the last list item, it would be @code{NULL}. This structure definition is recursive, since the type of the @code{next} field refers to the structure type. Such recursion is not a problem; in fact, you can use the type @code{struct intlistlink *} before the definition of the type @code{struct intlistlink} itself. That works because pointers to all kinds of structures really look the same at the machine level. 
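The following sketch illustrates that last point: it declares a pointer variable of type @code{struct intlistlink *} before the structure itself is defined. The names @code{list_head} and @code{list_is_empty} are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>

/* At this point struct intlistlink has not been defined, yet a
   pointer to it can already be declared.  As a file-scope
   variable with no initializer, it starts out null.  */
struct intlistlink *list_head;

/* Now define the structure type itself.  */
struct intlistlink
{
  int datum;
  struct intlistlink *next;
};

/* Hypothetical helper: return 1 if the list is empty, else 0.  */
int
list_is_empty (void)
{
  return list_head == NULL;
}
```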
After defining the structure, you can declare a variable of type
@code{struct intlistlink} like this:

@example
struct intlistlink foo;
@end example

The structure definition itself can serve as the beginning of a
variable declaration, so you can declare variables immediately after,
like this:

@example
struct intlistlink
  @{
    int datum;
    struct intlistlink *next;
  @} foo;
@end example

@noindent
But that is ugly.  It is almost always clearer to separate the
definition of the structure from its uses.

Declaring a structure type inside a block (@pxref{Blocks}) limits
the scope of the structure type name to that block.  That means the
structure type is recognized only within that block.  Declaring it in
a function parameter list, as here,

@example
int f (struct foo @{int a, b;@} parm);
@end example

@noindent
(assuming that @code{struct foo} is not already defined) limits the
scope of the structure type @code{struct foo} to that parameter list;
that is basically useless, so it triggers a warning.

Standard C requires at least one field in a structure.
GNU C does not require this.

@menu
* Referencing Fields::          Accessing field values in a structure object.
* Arrays as Fields::            Fields whose type is an array.
* Dynamic Memory Allocation::   Allocating space for objects
                                  while the program is running.
* Field Offset::                Memory layout of fields within a structure.
* Structure Layout::            Planning the memory layout of fields.
* Packed Structures::           Packing structure fields as close as possible.
* Bit Fields::                  Dividing integer fields
                                  into fields with fewer bits.
* Bit Field Packing::           How bit fields pack together in integers.
* const Fields::                Making structure fields immutable.
* Zero Length::                 Zero-length array as a variable-length object.
* Flexible Array Fields::       Another approach to variable-length objects.
* Overlaying Structures::       Casting one structure type
                                  over an object of another structure type.
* Structure Assignment::        Assigning values to structure objects.
* Unions::                      Viewing the same object in different types.
* Packing With Unions::         Using a union type to pack various types into
                                  the same memory space.
* Cast to Union::               Casting a value of one of the union's alternative
                                  types to the type of the union itself.
* Structure Constructors::      Building new structure objects.
* Unnamed Types as Fields::     Fields' types do not always need names.
* Incomplete Types::            Types which have not been fully defined.
* Intertwined Incomplete Types::  Defining mutually-recursive structure types.
* Type Tags::                   Scope of structure and union type tags.
@end menu

@node Referencing Fields
@section Referencing Structure Fields
@cindex referencing structure fields
@cindex structure fields, referencing

To make a structure useful, there has to be a way to examine and store
its fields.  The @samp{.} (period) operator does that; its use looks
like @code{@var{object}.@var{field}}.

Given this structure and variable,

@example
struct intlistlink
  @{
    int datum;
    struct intlistlink *next;
  @};

struct intlistlink foo;
@end example

@noindent
you can write @code{foo.datum} and @code{foo.next} to refer to the two
fields in the value of @code{foo}.  These fields are lvalues, so you
can store values into them, and read the values out again.

Most often, structures are dynamically allocated (@pxref{Dynamic
Memory Allocation}), and we refer to the objects via pointers.
@code{(*p).@var{field}} is somewhat cumbersome, so there is an
abbreviation: @code{p->@var{field}}.  For instance, assume the program
contains this declaration:

@example
struct intlistlink *ptr;
@end example

@noindent
You can write @code{ptr->datum} and @code{ptr->next} to refer
to the two fields in the object that @code{ptr} points to.
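
Here is a minimal sketch that uses both operators together.  The
function name @code{sum_two_links} is hypothetical; it builds a
two-item list out of ordinary local variables and adds up the data.

@example
#include <stddef.h>    /* @r{Defines @code{NULL}.} */

struct intlistlink
  @{
    int datum;
    struct intlistlink *next;
  @};

int
sum_two_links (void)
@{
  struct intlistlink first, second;
  struct intlistlink *ptr;
  int sum = 0;

  /* @r{Store into the fields of each object with @samp{.}.} */
  second.datum = 20;
  second.next = NULL;
  first.datum = 10;
  first.next = &second;

  /* @r{Read the fields through a pointer with @samp{->}.} */
  for (ptr = &first; ptr != NULL; ptr = ptr->next)
    sum += ptr->datum;

  return sum;
@}
@end example
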
If a unary operator precedes an expression using @samp{->},
the @samp{->} nests inside:

@example
-ptr->datum   @r{is equivalent to}   -(ptr->datum)
@end example

You can intermix @samp{->} and @samp{.} without parentheses,
as shown here:

@example
struct @{ double d; struct intlistlink l; @} foo;

@r{@dots{}}foo.l.next->next->datum@r{@dots{}}
@end example

@node Arrays as Fields
@section Arrays as Fields

When you declare a field in a structure as an array, as here:

@example
struct record
  @{
    char *name;
    int data[4];
  @};
@end example

@noindent
each @code{struct record} object holds one string (a pointer, of
course) and four integers, all part of a field called @code{data}.  If
@code{recptr} is a pointer of type @code{struct record *}, then it
points to a @code{struct record} which contains those things; you can
access the second integer in that record with @code{recptr->data[1]}.

If you have two objects of type @code{struct record}, each one contains
an array.  With this declaration,

@example
struct record r1, r2;
@end example

@noindent
@code{r1.data} holds space for 4 @code{int}s, and @code{r2.data} holds
space for another 4 @code{int}s.

@node Dynamic Memory Allocation
@section Dynamic Memory Allocation
@cindex dynamic memory allocation
@cindex memory allocation, dynamic
@cindex allocating memory dynamically

To allocate an object dynamically, call the library function
@code{malloc} (@pxref{Basic Allocation, The GNU C Library,, libc, The GNU C Library
Reference Manual}).  Here is how to allocate an object of type
@code{struct intlistlink}.
To make this code work, include the file
@file{stdlib.h}, like this:

@example
#include <stddef.h>    /* @r{Defines @code{NULL}.} */
#include <stdlib.h>    /* @r{Declares @code{malloc}.} */
@dots{}

struct intlistlink *
alloc_intlistlink ()
@{
  struct intlistlink *p;

  p = malloc (sizeof (struct intlistlink));

  if (p == NULL)
    fatal ("Ran out of storage");

  /* @r{Initialize the contents.} */
  p->datum = 0;
  p->next = NULL;

  return p;
@}
@end example

@noindent
@code{malloc} returns @code{void *}, so the assignment to @code{p}
will automatically convert it to type @code{struct intlistlink *}.
The return value of @code{malloc} is always sufficiently aligned
(@pxref{Type Alignment}) that it is valid for any data type.

The test for @code{p == NULL} is necessary because @code{malloc}
returns a null pointer if it cannot get any storage.  We assume that
the program defines the function @code{fatal} to report a fatal error
to the user.

Here's how to add one more integer to the front of such a list:

@example
struct intlistlink *mylist = NULL;

void
add_to_mylist (int my_int)
@{
  struct intlistlink *p = alloc_intlistlink ();

  p->datum = my_int;
  p->next = mylist;
  mylist = p;
@}
@end example

The way to free the objects is by calling @code{free}.  Here's
a function to free all the links in one of these lists:

@example
void
free_intlist (struct intlistlink *p)
@{
  while (p)
    @{
      struct intlistlink *q = p;
      p = p->next;
      free (q);
    @}
@}
@end example

We must extract the @code{next} pointer from the object before freeing
it, because @code{free} can clobber the data that was in the object.
For the same reason, the program must not use the list any more after
freeing its elements.
To make sure it won't, it is best to clear out the variable where the list was stored, like this: @example free_intlist (mylist); mylist = NULL; @end example @node Field Offset @section Field Offset @cindex field offset @cindex structure field offset @cindex offset of structure fields To determine the offset of a given field @var{field} in a structure type @var{type}, use the macro @code{offsetof}, which is defined in the file @file{stddef.h}. It is used like this: @example offsetof (@var{type}, @var{field}) @end example Here is an example: @example struct foo @{ int element; struct foo *next; @}; offsetof (struct foo, next) /* @r{On most machines that is 4. It may be 8.} */ @end example @node Structure Layout @section Structure Layout @cindex structure layout @cindex layout of structures The rest of this chapter covers advanced topics about structures. If you are just learning C, you can skip it. The precise layout of a @code{struct} type is crucial when using it to overlay hardware registers, to access data structures in shared memory, or to assemble and disassemble packets for network communication. It is also important for avoiding memory waste when the program makes many objects of that type. However, the layout depends on the target platform. Each platform has conventions for structure layout, which compilers need to follow. Here are the conventions used on most platforms. The structure's fields appear in the structure layout in the order they are declared. When possible, consecutive fields occupy consecutive bytes within the structure. However, if a field's type demands more alignment than it would get that way, C gives it the alignment it requires by leaving a gap after the previous field. Once all the fields have been laid out, it is possible to determine the structure's alignment and size. The structure's alignment is the maximum alignment of any of the fields in it. Then the structure's size is rounded up to a multiple of its alignment. 
That may require leaving a gap at the end of the structure. Here are some examples, where we assume that @code{char} has size and alignment 1 (always true), and @code{int} has size and alignment 4 (true on most kinds of computers): @example struct foo @{ char a, b; int c; @}; @end example @noindent This structure occupies 8 bytes, with an alignment of 4. @code{a} is at offset 0, @code{b} is at offset 1, and @code{c} is at offset 4. There is a gap of 2 bytes before @code{c}. Contrast that with this structure: @example struct foo @{ char a; int c; char b; @}; @end example This structure has size 12 and alignment 4. @code{a} is at offset 0, @code{c} is at offset 4, and @code{b} is at offset 8. There are two gaps: three bytes before @code{c}, and three bytes at the end. These two structures have the same contents at the C level, but one takes 8 bytes and the other takes 12 bytes due to the ordering of the fields. A reliable way to avoid this sort of wastage is to order the fields by size, biggest fields first. @node Packed Structures @section Packed Structures @cindex packed structures @cindex @code{__attribute__((packed))} In GNU C you can force a structure to be laid out with no gaps by adding @code{__attribute__((packed))} after @code{struct} (or at the end of the structure type declaration). Here's an example: @example struct __attribute__((packed)) foo @{ char a; int c; char b; @}; @end example Without @code{__attribute__((packed))}, this structure occupies 12 bytes (as described in the previous section), assuming 4-byte alignment for @code{int}. With @code{__attribute__((packed))}, it is only 6 bytes long---the sum of the lengths of its fields. Use of @code{__attribute__((packed))} often results in fields that don't have the normal alignment for their types. Taking the address of such a field can result in an invalid pointer because of its improper alignment. 
Dereferencing such a pointer can cause a @code{SIGSEGV} signal on a machine that doesn't, in general, allow unaligned pointers. @xref{Attributes}. @node Bit Fields @section Bit Fields @cindex bit fields A structure field declaration with an integer type can specify the number of bits the field should occupy. We call that a @dfn{bit field}. These are useful because consecutive bit fields are packed into a larger storage unit. For instance, @example unsigned char opcode: 4; @end example @noindent specifies that this field takes just 4 bits. Since it is unsigned, its possible values range from 0 to 15. A signed field with 4 bits, such as this, @example signed char small: 4; @end example @noindent can hold values from -8 to 7. You can subdivide a single byte into those two parts by writing @example unsigned char opcode: 4; signed char small: 4; @end example @noindent in the structure. With bit fields, these two numbers fit into a single @code{char}. Here's how to declare a one-bit field that can hold either 0 or 1: @example unsigned char special_flag: 1; @end example You can also use the @code{bool} type for bit fields: @example bool special_flag: 1; @end example Except when using @code{bool} (which is always unsigned, @pxref{Boolean Type}), always specify @code{signed} or @code{unsigned} for a bit field. There is a default, if that's not specified: the bit field is signed if plain @code{char} is signed, except that the option @option{-funsigned-bitfields} forces unsigned as the default. But it is cleaner not to depend on this default. Bit fields are special in that you cannot take their address with @samp{&}. They are not stored with the size and alignment appropriate for the specified type, so they cannot be addressed through pointers to that type. @node Bit Field Packing @section Bit Field Packing Programs to communicate with low-level hardware interfaces need to define bit fields laid out to match the hardware data. This section explains how to do that. 
Consecutive bit fields are packed together, but each bit field must fit within a single object of its specified type. In this example, @example unsigned short a : 3, b : 3, c : 3, d : 3, e : 3; @end example @noindent all five fields fit consecutively into one two-byte @code{short}. They need 15 bits, and one @code{short} provides 16. By contrast, @example unsigned char a : 3, b : 3, c : 3, d : 3, e : 3; @end example @noindent needs three bytes. It fits @code{a} and @code{b} into one @code{char}, but @code{c} won't fit in that @code{char} (they would add up to 9 bits). So @code{c} and @code{d} go into a second @code{char}, leaving a gap of two bits between @code{b} and @code{c}. Then @code{e} needs a third @code{char}. By contrast, @example unsigned char a : 3, b : 3; unsigned int c : 3; unsigned char d : 3, e : 3; @end example @noindent needs only two bytes: the type @code{unsigned int} allows @code{c} to straddle bytes that are in the same word. You can leave a gap of a specified number of bits by defining a nameless bit field. This looks like @code{@var{type} : @var{nbits};}. It is allocated space in the structure just as a named bit field would be allocated. You can force the following bit field to advance to the following aligned memory object with @code{@var{type} : 0;}. Both of these constructs can syntactically share @var{type} with ordinary bit fields. This example illustrates both: @example unsigned int a : 5, : 3, b : 5, : 0, c : 5, : 3, d : 5; @end example @noindent It puts @code{a} and @code{b} into one @code{int}, with a 3-bit gap between them. Then @code{: 0} advances to the next @code{int}, so @code{c} and @code{d} fit into that one. These rules for packing bit fields apply to most target platforms, including all the usual real computers. A few embedded controllers have special layout rules. @node const Fields @section @code{const} Fields @cindex const fields @cindex structure fields, constant @c ??? Is this a C standard feature? 
A structure field declared @code{const} cannot be assigned to (@pxref{const}). For instance, let's define this modified version of @code{struct intlistlink}: @example struct intlistlink_ro /* @r{``ro'' for read-only.} */ @{ const int datum; struct intlistlink *next; @}; @end example This structure can be used to prevent part of the code from modifying the @code{datum} field: @example /* @r{@code{p} has type @code{struct intlistlink *}.} @r{Convert it to @code{struct intlistlink_ro *}.} */ struct intlistlink_ro *q = (struct intlistlink_ro *) p; q->datum = 5; /* @r{Error!} */ p->datum = 5; /* @r{Valid since @code{*p} is} @r{not a @code{struct intlistlink_ro}.} */ @end example A @code{const} field can get a value in two ways: by initialization of the whole structure, and by making a pointer-to-structure point to an object in which that field already has a value. Any @code{const} field in a structure type makes assignment impossible for structures of that type (@pxref{Structure Assignment}). That is because structure assignment works by assigning the structure's fields, one by one. @node Zero Length @section Arrays of Length Zero @cindex array of length zero @cindex zero-length arrays @cindex length-zero arrays GNU C allows zero-length arrays. They are useful as the last element of a structure that is really a header for a variable-length object. Here's an example, where we construct a variable-size structure to hold a line which is @code{this_length} characters long: @example struct line @{ int length; char contents[0]; @}; struct line *thisline = ((struct line *) malloc (sizeof (struct line) + this_length)); thisline->length = this_length; @end example In ISO C90, we would have to give @code{contents} a length of 1, which means either wasting space or complicating the argument to @code{malloc}. 
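
The fragment above can be filled out into a complete function as
follows.  This is only a sketch: the function name @code{make_line}
and its parameter @code{text} are hypothetical, and the code relies on
the GNU C zero-length array extension.

@example
#include <stdlib.h>    /* @r{Declares @code{malloc} and @code{free}.} */
#include <string.h>    /* @r{Declares @code{memcpy}.} */

struct line
  @{
    int length;
    char contents[0];
  @};

/* @r{Allocate a @code{struct line} holding a copy of the first}
   @r{@code{this_length} bytes of @code{text}.  Return a null}
   @r{pointer if @code{malloc} cannot get the storage.} */
struct line *
make_line (const char *text, int this_length)
@{
  struct line *thisline
    = malloc (sizeof (struct line) + this_length);

  if (thisline == NULL)
    return NULL;

  thisline->length = this_length;
  memcpy (thisline->contents, text, this_length);
  return thisline;
@}
@end example

@noindent
The caller is responsible for passing the result to @code{free} when
the line is no longer needed.
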
@node Flexible Array Fields
@section Flexible Array Fields
@cindex flexible array fields
@cindex array fields, flexible

The C99 standard adopted a more complex equivalent of zero-length
array fields.  It's called a @dfn{flexible array}, and it's indicated
by omitting the length, like this:

@example
struct line
  @{
    int length;
    char contents[];
  @};
@end example

The flexible array has to be the last field in the structure, and there
must be other fields before it.

Under the C standard, a structure with a flexible array can't be part
of another structure, and can't be an element of an array.

GNU C allows static initialization of flexible array fields.  The effect
is to ``make the array long enough'' for the initializer.

@example
struct f1 @{ int x; int y[]; @} f1
  = @{ 1, @{ 2, 3, 4 @} @};
@end example

@noindent
This defines a structure variable named @code{f1}
whose type is @code{struct f1}.  In C, a variable name or function name
never conflicts with a structure type tag.

Omitting the flexible array field's size lets the initializer
determine it.  This is allowed only when the flexible array is defined
in the outermost structure and you declare a variable of that
structure type.  For example:

@example
struct foo @{ int x; int y[]; @};
struct bar @{ struct foo z; @};

struct foo a = @{ 1, @{ 2, 3, 4 @} @};        // @r{Valid.}
struct bar b = @{ @{ 1, @{ 2, 3, 4 @} @} @};    // @r{Invalid.}
struct bar c = @{ @{ 1, @{ @} @} @};            // @r{Valid.}
struct foo d[1] = @{ @{ 1, @{ 2, 3, 4 @} @} @};  // @r{Invalid.}
@end example

@node Overlaying Structures
@section Overlaying Different Structures
@cindex overlaying structures
@cindex structures, overlaying

Be careful about using different structure types to refer to the same
memory within one function, because GNU C can optimize code assuming
it never does that.  @xref{Aliasing}.
Here's an example of the kind of
aliasing that can cause the problem:

@example
struct a @{ int size; char *data; @};
struct b @{ int size; char *data; @};
struct a foo;
struct a *p = &foo;
struct b *q = (struct b *) &foo;
@end example

Here @code{p} and @code{q} both point to the memory that the variable
@code{foo} occupies, but they have two different types.  The two types
@code{struct a} and @code{struct b} are defined alike, but they are
not the same type.  Interspersing references using the two types,
like this,

@example
p->size = 0;
q->size = 1;
x = p->size;
@end example

@noindent
allows GNU C to assume that @code{p->size} is still zero when it is
copied into @code{x}.  The compiler ``knows'' that @code{q} points to
a @code{struct b} and this cannot overlap with a @code{struct a}.
Other compilers might also do this optimization.  The ISO C standard
considers such code erroneous, precisely so that this optimization
will be valid.

@node Structure Assignment
@section Structure Assignment
@cindex structure assignment
@cindex assigning structures

Assignment operating on a structure type copies the structure.  The
left and right operands must have the same type.  Here is an example:

@example
#include <stddef.h>    /* @r{Defines @code{NULL}.} */
#include <stdlib.h>    /* @r{Declares @code{malloc}.} */
@r{@dots{}}

struct point @{ double x, y; @};

struct point *
copy_point (struct point point)
@{
  struct point *p
    = (struct point *) malloc (sizeof (struct point));
  if (p == NULL)
    fatal ("Out of memory");
  *p = point;
  return p;
@}
@end example

Notionally, assignment on a structure type works by copying each of
the fields.  Thus, if any of the fields has the @code{const}
qualifier, that structure type does not allow assignment:

@example
struct point @{ const double x, y; @};

struct point a, b;

a = b;            /* @r{Error!} */
@end example

@xref{Assignment Expressions}.
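
To illustrate that assignment copies the whole structure and leaves
the source untouched, here is a short sketch.  It uses the first
definition of @code{struct point} above (the one without
@code{const}); the function name @code{copy_and_shift} is
hypothetical.

@example
struct point @{ double x, y; @};

/* @r{Copy @code{*src} into @code{*dest}, then move the copy}
   @r{horizontally by @code{dx}.} */
void
copy_and_shift (struct point *dest,
                const struct point *src, double dx)
@{
  *dest = *src;     /* @r{Copies both fields.} */
  dest->x += dx;    /* @r{Changes only the copy.} */
@}
@end example
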
When a structure type has a field which is an array, as here,

@example
struct record
  @{
    char *name;
    int data[4];
  @};

struct record r1, r2;
@end example

@noindent
structure assignment such as @code{r1 = r2} copies array fields'
contents just as it copies all the other fields.

This is the only way in C that you can operate on the whole contents
of an array with one operation: when the array is contained in a
@code{struct}.  You can't copy the contents of the @code{data} field
as an array, because

@example
r1.data = r2.data;
@end example

@noindent
would convert the array objects (as always) to pointers to the initial
elements of the arrays (of type @code{int *}), and the
assignment would be invalid because the left operand is not an lvalue.

@node Unions
@section Unions
@cindex unions
@findex union

A @dfn{union type} defines alternative ways of looking at the same
piece of memory.  Each alternative view is defined with a data type,
and identified by a name.  A union definition looks like this:

@example
union @var{name}
  @{
    @var{alternative declarations}@r{@dots{}}
  @};
@end example

Each alternative declaration looks like a structure field declaration,
except that it can't be a bit field.  For instance,

@example
union number
  @{
    long int integer;
    double floating_point;
  @};
@end example

@noindent
lets you store either an integer (type @code{long int}) or a floating
point number (type @code{double}) in the same place in memory.  The
length and alignment of the union type are the maximum of all the
alternatives---they do not have to be the same.  In this union
example, @code{double} probably takes more space than @code{long int},
but that doesn't cause a problem in programs that use the union in the
normal way.

The members don't have to be different in data type.  Sometimes
each member pertains to a way the data will be used.
For instance,

@example
union datum
  @{
    double latitude;
    double longitude;
    double height;
    double weight;
    int continent;
  @};
@end example

This union holds one of several kinds of data; most kinds are
floating-point numbers, but the value can also be a code for a
continent, which is an integer.  You @emph{could} use one member of
type @code{double} to access all the values which have that type, but
the different member names will make the program clearer.

The alignment of a union type is the maximum of the alignments of the
alternatives.  The size of the union type is the maximum of the sizes
of the alternatives, rounded up to a multiple of the alignment
(because every type's size must be a multiple of its alignment).

All the union alternatives start at the address of the union itself.
If an alternative is shorter than the union as a whole, it occupies
the first part of the union's storage, leaving the last part unused
@emph{for that alternative}.

@strong{Warning:} if the code stores data using one union alternative
and accesses it with another, the results depend on the kind of
computer in use.  Only wizards should try to do this.  However, when
you need to do this, a union is a clean way to do it.

Assignment works on any union type by copying the entire value.

@node Packing With Unions
@section Packing With Unions

Sometimes we design a union with the intention of packing various
kinds of objects into a certain amount of memory space.  For example:

@example
union bytes8
  @{
    long long big_int_elt;
    double double_elt;
    struct @{ int first, second; @} two_ints;
    struct @{ void *first, *second; @} two_ptrs;
  @};

union bytes8 *p;
@end example

This union makes it possible to look at 8 bytes of data that @code{p}
points to as a single 8-byte integer (@code{p->big_int_elt}), as a
single floating-point number (@code{p->double_elt}), as a pair of
integers (@code{p->two_ints.first} and @code{p->two_ints.second}), or
as a pair of pointers (@code{p->two_ptrs.first} and
@code{p->two_ptrs.second}).
To pack storage with such a union makes assumptions about the sizes of all the types involved. This particular union was written expecting a pointer to have the same size as @code{int}. On a machine where one pointer takes 8 bytes, the code using this union probably won't work as expected. The union, as such, will function correctly---if you store two values through @code{two_ints} and extract them through @code{two_ints}, you will get the same integers back---but the part of the program that expects the union to be 8 bytes long could malfunction, or at least use too much space. The above example shows one case where a @code{struct} type with no tag can be useful. Another way to get effectively the same result is with arrays as members of the union: @example union eight_bytes @{ long long big_int_elt; double double_elt; int two_ints[2]; void *two_ptrs[2]; @}; @end example @node Cast to Union @section Cast to a Union Type @cindex cast to a union @cindex union, casting to a In GNU C, you can explicitly cast any of the alternative types to the union type; for instance, @example (union eight_bytes) (long long) 5 @end example @noindent makes a value of type @code{union eight_bytes} which gets its contents through the alternative named @code{big_int_elt}. The value being cast must exactly match the type of the alternative, so this is not valid: @example (union eight_bytes) 5 /* @r{Error! 5 is @code{int}.} */ @end example A cast to union type looks like any other cast, except that the type specified is a union type. You can specify the type either with @code{union @var{tag}} or with a typedef name (@pxref{Defining Typedef Names}). 
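
As a sketch of how such a cast might be used, the following function
wraps a @code{long long} value in the @code{union eight_bytes} type
defined in the previous section.  The function name
@code{from_long_long} is hypothetical, and the code relies on the GNU
C extension, so other compilers may reject it.

@example
union eight_bytes
  @{
    long long big_int_elt;
    double double_elt;
    int two_ints[2];
    void *two_ptrs[2];
  @};

union eight_bytes
from_long_long (long long value)
@{
  /* @r{@code{value} has type @code{long long}, which matches}
     @r{the alternative @code{big_int_elt} exactly.} */
  return (union eight_bytes) value;
@}
@end example
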
Using the cast as the right-hand side of an assignment to a variable of
union type is equivalent to storing in an alternative of the union:

@example
union foo u;

u = (union foo) x   @r{means}   u.i = x

u = (union foo) y   @r{means}   u.d = y
@end example

@noindent
This assumes that @code{union foo} has an alternative @code{i} whose
type is the type of @code{x}, and an alternative @code{d} whose type
is the type of @code{y}.

You can also use the union cast as a function argument:

@example
void hack (union foo);
@r{@dots{}}
hack ((union foo) x);
@end example

@node Structure Constructors
@section Structure Constructors
@cindex structure constructors
@cindex constructors, structure

You can construct a structure value by writing its type in
parentheses, followed by an initializer that would be valid in a
declaration for that type.  For instance, given this declaration,

@example
struct foo @{int a; char b[2];@} structure;
@end example

@noindent
you can create a @code{struct foo} value as follows:

@example
((struct foo) @{x + y, 'a', 0@})
@end example

@noindent
This specifies @code{x + y} for field @code{a},
the character @samp{a} for field @code{b}'s element 0,
and the null character for field @code{b}'s element 1.

The parentheses around that constructor are not necessary, but we
recommend writing them to make the nesting of the containing
expression clearer.

You can also show the nesting of the two by writing it like
this:

@example
((struct foo) @{x + y, @{'a', 0@} @})
@end example

Each of those is equivalent to writing the following statement
expression (@pxref{Statement Exprs}):

@example
(@{
  struct foo temp = @{x + y, 'a', 0@};
  temp;
@})
@end example

You can also create a union value this way, but it is not especially
useful since that is equivalent to doing a cast:

@example
  ((union whosis) @{@var{value}@})
@r{is equivalent to}
  ((union whosis) (@var{value}))
@end example

@node Unnamed Types as Fields
@section Unnamed Types as Fields
@cindex unnamed structures
@cindex unnamed unions
@cindex structures, unnamed
@cindex unions, unnamed

A structure or a union can contain, as fields,
unnamed structures and unions.
Here's an example: @example struct @{ int a; union @{ int b; float c; @}; int d; @} foo; @end example @noindent You can access the fields of the unnamed union within @code{foo} as if they were individual fields at the same level as the union definition: @example foo.a = 42; foo.b = 47; foo.c = 5.25; // @r{Overwrites the value in @code{foo.b}}. foo.d = 314; @end example Avoid using field names that could cause ambiguity. For example, with this definition: @example struct @{ int a; struct @{ int a; float b; @}; @} foo; @end example @noindent it is impossible to tell what @code{foo.a} refers to. GNU C reports an error when a definition is ambiguous in this way. @node Incomplete Types @section Incomplete Types @cindex incomplete types @cindex types, incomplete A type that has not been fully defined is called an @dfn{incomplete type}. Structure and union types are incomplete when the code makes a forward reference, such as @code{struct foo}, before defining the type. An array type is incomplete when its length is unspecified. You can't use an incomplete type to declare a variable or field, or use it for a function parameter or return type. The operators @code{sizeof} and @code{_Alignof} give errors when used on an incomplete type. However, you can define a pointer to an incomplete type, and declare a variable or field with such a pointer type. In general, you can do everything with such pointers except dereference them. For example: @example extern void bar (struct mysterious_value *); void foo (struct mysterious_value *arg) @{ bar (arg); @} @r{@dots{}} @{ struct mysterious_value *p, **q; p = *q; foo (p); @} @end example @noindent These examples are valid because the code doesn't try to understand what @code{p} points to; it just passes the pointer around. (Presumably @code{bar} is defined in some other file that really does have a definition for @code{struct mysterious_value}.) 
However, dereferencing the pointer would get an error; that requires
a definition for the structure type.

@node Intertwined Incomplete Types
@section Intertwined Incomplete Types

When several structure types contain pointers to each other, you can
define the types in any order because pointers to types that come
later are pointers to incomplete types.  Here is an example.

@example
/* @r{An employee record points to a group.} */
struct employee
  @{
    char *name;
    @r{@dots{}}
    struct group *group;   /* @r{incomplete type.} */
    @r{@dots{}}
  @};

/* @r{An employee list points to employees.} */
struct employee_list
  @{
    struct employee *this_one;
    struct employee_list *next;   /* @r{incomplete type.} */
    @r{@dots{}}
  @};

/* @r{A group points to one employee_list.} */
struct group
  @{
    char *name;
    @r{@dots{}}
    struct employee_list *employees;
    @r{@dots{}}
  @};
@end example

@node Type Tags
@section Type Tags
@cindex type tags

The name that follows @code{struct} (@pxref{Structures}), @code{union}
(@pxref{Unions}), or @code{enum} (@pxref{Enumeration Types}) is called
a @dfn{type tag}.  In C, a type tag never conflicts with a variable
name or function name; the type tags have a separate @dfn{name space}.
Thus, there is no name conflict in this code:

@example
struct pair @{ int a, b; @};
int pair = 1;
@end example

@noindent
nor in this one:

@example
struct pair @{ int a, b; @} pair;
@end example

@noindent
where @code{pair} is both a structure type tag and a variable name.

However, @code{struct}, @code{union}, and @code{enum} share the same
name space of tags, so this is a conflict:

@example
struct pair @{ int a, b; @};
enum pair @{ c, d @};
@end example

@noindent
and so is this:

@example
struct pair @{ int a, b; @};
struct pair @{ int c, d; @};
@end example

When the code defines a type tag inside a block, the tag's scope is
limited to that block (as for local variables).  Two definitions for
one type tag do not conflict if they are in different scopes; rather,
each is valid in its scope.
For example, @example struct pair @{ int a, b; @}; void pair_up_doubles (int len, double array[]) @{ struct pair @{ double a, b; @}; @r{@dots{}} @} @end example @noindent has two definitions for @code{struct pair} which do not conflict. The one inside the function applies only within the definition of @code{pair_up_doubles}. Within its scope, that definition @dfn{shadows} the outer definition. If @code{struct pair} appears inside the function body, before the inner definition, it refers to the outer definition---the only one that has been seen at that point. Thus, in this code, @example struct pair @{ int a, b; @}; void pair_up_doubles (int len, double array[]) @{ struct two_pairs @{ struct pair *p, *q; @}; struct pair @{ double a, b; @}; @r{@dots{}} @} @end example @noindent the structure @code{two_pairs} has pointers to the outer definition of @code{struct pair}, which is probably not desirable. To prevent that, you can write @code{struct pair;} inside the function body as a variable declaration with no variables. This is a @dfn{forward declaration} of the type tag @code{pair}: it makes the type tag local to the current block, with the details of the type to come later. Here's an example: @example void pair_up_doubles (int len, double array[]) @{ /* @r{Forward declaration for @code{pair}.} */ struct pair; struct two_pairs @{ struct pair *p, *q; @}; /* @r{Give the details.} */ struct pair @{ double a, b; @}; @r{@dots{}} @} @end example However, the cleanest practice is to avoid shadowing type tags. @node Arrays @chapter Arrays @cindex array @cindex elements of arrays An @dfn{array} is a data object that holds a series of @dfn{elements}, all of the same data type. Each element is identified by its numeric @var{index} within the array. We presented arrays of numbers in the sample programs early in this manual (@pxref{Array Example}). However, arrays can have elements of any data type, including pointers, structures, unions, and other arrays. 
If you know another programming language, you may suppose that you know all about arrays, but C arrays have special quirks, so in this chapter we collect all the information about arrays in C@. The elements of a C array are allocated consecutively in memory, with no gaps between them. Each element is aligned as required for its data type (@pxref{Type Alignment}). @menu * Accessing Array Elements:: How to access individual elements of an array. * Declaring an Array:: How to name and reserve space for a new array. * Strings:: A string in C is a special case of array. * Array Type Designators:: Referring to a specific array type. * Incomplete Array Types:: Naming, but not allocating, a new array. * Limitations of C Arrays:: Arrays are not first-class objects. * Multidimensional Arrays:: Arrays of arrays. * Constructing Array Values:: Assigning values to an entire array at once. * Arrays of Variable Length:: Declaring arrays of non-constant size. @end menu @node Accessing Array Elements @section Accessing Array Elements @cindex accessing array elements @cindex array elements, accessing If the variable @code{a} is an array, the @var{n}th element of @code{a} is @code{a[@var{n}]}. You can use that expression to access an element's value or to assign to it: @example x = a[5]; a[6] = 1; @end example @noindent Since the variable @code{a} is an lvalue, @code{a[@var{n}]} is also an lvalue. The lowest valid index in an array is 0, @emph{not} 1, and the highest valid index is one less than the number of elements. The C language does not check whether array indices are in bounds, so if the code uses an out-of-range index, it will access memory outside the array. @strong{Warning:} Using only valid index values in C is the programmer's responsibility. Array indexing in C is not a primitive operation: it is defined in terms of pointer arithmetic and dereferencing. Now that we know @emph{what} @code{a[i]} does, we can ask @emph{how} @code{a[i]} does its job. 
In C, @code{@var{x}[@var{y}]} is an abbreviation for @code{*(@var{x}+@var{y})}. Thus, @code{a[i]} really means @code{*(a+i)}. @xref{Pointers and Arrays}. When an expression with array type (such as @code{a}) appears as part of a larger C expression, it is converted automatically to a pointer to element zero of that array. For instance, @code{a} in an expression is equivalent to @code{&a[0]}. Thus, @code{*(a+i)} is computed as @code{*(&a[0]+i)}. Now we can analyze how that expression gives us the desired element of the array. It makes a pointer to element 0 of @code{a}, advances it by the value of @code{i}, and dereferences that pointer. Another equivalent way to write the expression is @code{(&a[0])[i]}. @node Declaring an Array @section Declaring an Array @cindex declaring an array @cindex array, declaring To make an array declaration, write @code{[@var{length}]} after the name being declared. This construct is valid in the declaration of a variable, a function parameter, a function value type (the value can't be an array, but it can be a pointer to one), a structure field, or a union alternative. The surrounding declaration specifies the element type of the array; that can be any type of data, but not @code{void} or a function type. For instance, @example double a[5]; @end example @noindent declares @code{a} as an array of 5 @code{double}s. @example struct foo bstruct[length]; @end example @noindent declares @code{bstruct} as an array of @code{length} objects of type @code{struct foo}. A variable array size like this is allowed when the array is not file-scope. Other declaration constructs can nest within the array declaration construct. For instance: @example struct foo *b[length]; @end example @noindent declares @code{b} as an array of @code{length} pointers to @code{struct foo}. This shows that the length need not be a constant (@pxref{Arrays of Variable Length}). 
@example double (*c)[5]; @end example @noindent declares @code{c} as a pointer to an array of 5 @code{double}s, and @example char *(*f (int))[5]; @end example @noindent declares @code{f} as a function taking an @code{int} argument and returning a pointer to an array of 5 strings (pointers to @code{char}s). @example double aa[5][10]; @end example @noindent declares @code{aa} as an array of 5 elements, each of which is an array of 10 @code{double}s. This shows how to declare a multidimensional array in C (@pxref{Multidimensional Arrays}). All these declarations specify the array's length, which is needed in these cases in order to allocate storage for the array. @node Strings @section Strings @cindex string A string in C is a sequence of elements of type @code{char}, terminated with the null character, the character with code zero. Programs often need to use strings with specific, fixed contents. To write one in a C program, use a @dfn{string constant} such as @code{"Take me to your leader!"}. The data type of a string constant is @code{char *}. For the full syntactic details of writing string constants, @ref{String Constants}. To declare a place to store a non-constant string, declare an array of @code{char}. Keep in mind that it must include one extra @code{char} for the terminating null. For instance, @example char text[] = @{ 'H', 'e', 'l', 'l', 'o', 0 @}; @end example @noindent declares an array named @samp{text} with six elements---five letters and the terminating null character. An equivalent way to get the same result is this, @example char text[] = "Hello"; @end example @noindent which copies the elements of the string constant, including @emph{its} terminating null character. @example char message[200]; @end example @noindent declares an array long enough to hold a string of 199 ASCII characters plus the terminating null character. When you store a string into @code{message} be sure to check or prove that the length does not exceed its size. 
For example, @example void set_message (char *text) @{ int i; for (i = 0; i < sizeof (message); i++) @{ message[i] = text[i]; if (text[i] == 0) return; @} fatal_error ("Message is too long for `message'"); @} @end example It's easy to do this with the standard library function @code{strncpy}, which fills out the whole destination array (up to a specified length) with null characters. Thus, if the last character of the destination is not null, the string did not fit. Many system libraries, including the GNU C library, hand-optimize @code{strncpy} to run faster than an explicit @code{for}-loop. Here's what the code looks like: @example void set_message (char *text) @{ strncpy (message, text, sizeof (message)); if (message[sizeof (message) - 1] != 0) fatal_error ("Message is too long for `message'"); @} @end example @xref{String and Array Utilities, The GNU C Library, , libc, The GNU C Library Reference Manual}, for more information about the standard library functions for operating on strings. You can avoid putting a fixed length limit on strings you construct or operate on by allocating the space for them dynamically. @xref{Dynamic Memory Allocation}. @node Array Type Designators @section Array Type Designators Every C type has a type designator, which you make by deleting the variable name and the semicolon from a declaration (@pxref{Type Designators}). The designators for array types follow this rule, but they may appear surprising. @example @r{type} int a[5]; @r{designator} int [5] @r{type} double a[5][3]; @r{designator} double [5][3] @r{type} struct foo *a[5]; @r{designator} struct foo *[5] @end example @node Incomplete Array Types @section Incomplete Array Types @cindex incomplete array types @cindex array types, incomplete An array is equivalent, for most purposes, to a pointer to its zeroth element. When that is true, the length of the array is irrelevant.
The length needs to be known only for allocating space for the array, or for @code{sizeof} and @code{typeof} (@pxref{Auto Type}). Thus, in some contexts C allows you to omit the length: @itemize @bullet @item An @code{extern} declaration says how to refer to a variable allocated elsewhere. It does not need to allocate space for the variable, so if it is an array, you can omit the length. For example, @example extern int foo[]; @end example @item When declaring a function parameter as an array, the argument value passed to the function is really a pointer to the array's zeroth element. Since this value does not say how long the array really is, there is no need to declare the length. For example, @example int func (int foo[]) @end example @end itemize These declarations are examples of @dfn{incomplete} array types, types that are not fully specified. The incompleteness makes no difference for accessing elements of the array, but it matters for some other things. For instance, @code{sizeof} is not allowed on an incomplete type. With multidimensional arrays, only the first dimension can be omitted: @example extern struct chesspiece *funnyboard[][8]; @end example In other words, the code doesn't have to say how many rows there are, but it must state how big each row is. @node Limitations of C Arrays @section Limitations of C Arrays @cindex limitations of C arrays @cindex first-class object Arrays have quirks in C because they are not ``first-class objects'': there is no way in C to operate on an array as a unit. The other composite objects in C, structures and unions, are first-class objects: a C program can copy a structure or union value in an assignment, or pass one as an argument to a function, or make a function return one. You can't do those things with an array in C@. That is because a value you can operate on never has an array type. An expression in C can have an array type, but that doesn't produce the array as a value.
Instead it is converted automatically to a pointer to the array's element at index zero. The code can operate on the pointer, and through that on individual elements of the array, but it can't get and operate on the array as a unit. There are three exceptions to this conversion rule, but none of them offers a way to operate on the array as a whole. First, @samp{&} applied to an expression with array type gives you the address of the array, as a pointer-to-array type. However, you can't operate on the whole array that way---if you apply @samp{*} to get the array back, that expression converts, as usual, to a pointer to its zeroth element. Second, the operators @code{sizeof}, @code{_Alignof}, and @code{typeof} do not convert the array to a pointer; they leave it as an array. But they don't operate on the array's data---they only give information about its type. Third, a string constant used as an initializer for an array is not converted to a pointer---rather, the declaration copies the @emph{contents} of that string in that one special case. You @emph{can} copy the contents of an array, just not with an assignment operator. You can do it by calling the library function @code{memcpy} or @code{memmove} (@pxref{Copying and Concatenation, The GNU C Library, , libc, The GNU C Library Reference Manual}). Also, when a structure contains just an array, you can copy that structure. An array itself is an lvalue if it is a declared variable, or part of a structure or union that is an lvalue. When you construct an array from elements (@pxref{Constructing Array Values}), that array is not an lvalue. @node Multidimensional Arrays @section Multidimensional Arrays @cindex multidimensional arrays @cindex array, multidimensional Strictly speaking, all arrays in C are unidimensional. However, you can create an array of arrays, which is more or less equivalent to a multidimensional array.
For example, @example struct chesspiece *board[8][8]; @end example @noindent declares an array of 8 arrays of 8 pointers to @code{struct chesspiece}. This data type could represent the state of a chess game. To access one square's contents requires two array index operations, one for each dimension. For instance, you can write @code{board[row][column]}, assuming @code{row} and @code{column} are variables with integer values in the proper range. How does C understand @code{board[row][column]}? First of all, @code{board} is converted automatically to a pointer to the zeroth element (at index zero) of @code{board}. Adding @code{row} to that makes it point to the desired element. Thus, @code{board[row]}'s value is an element of @code{board}---an array of 8 pointers. However, as an expression with array type, it is converted automatically to a pointer to the array's zeroth element. The second array index operation, @code{[column]}, accesses the chosen element from that array. As this shows, pointer-to-array types are meaningful in C@. You can declare a variable that points to a row in a chess board like this: @example struct chesspiece *(*rowptr)[8]; @end example @noindent This points to an array of 8 pointers to @code{struct chesspiece}. You can assign to it as follows: @example rowptr = &board[5]; @end example The dimensions don't have to be equal in length. Here we declare @code{statepop} as an array to hold the population of each state in the United States for each year since 1900: @example #define NSTATES 50 @{ int nyears = current_year - 1900 + 1; int statepop[NSTATES][nyears]; @r{@dots{}} @} @end example The variable @code{statepop} is an array of @code{NSTATES} subarrays, each indexed by the year (counting from 1900). 
Thus, to get the element for a particular state and year, we must subscript it first by the number that indicates the state, and second by the index for the year: @example statepop[state][year - 1900] @end example @cindex array, layout in memory The subarrays within the multidimensional array are allocated consecutively in memory, and within each subarray, its elements are allocated consecutively in memory. The most efficient way to process all the elements in the array is to scan the last subscript in the innermost loop. This means consecutive accesses go to consecutive memory locations, which optimizes use of the processor's memory cache. For example: @example int total = 0; float average; for (int state = 0; state < NSTATES; ++state) @{ for (int year = 0; year < nyears; ++year) @{ total += statepop[state][year]; @} @} average = (float) total / nyears; @end example C's layout for multidimensional arrays is different from Fortran's layout. In Fortran, a multidimensional array is not an array of arrays; rather, multidimensional arrays are a primitive feature, and it is the first index that varies most rapidly between consecutive memory locations. Thus, the memory layout of a 50x114 array in C matches that of a 114x50 array in Fortran. @node Constructing Array Values @section Constructing Array Values @cindex constructing array values @cindex array values, constructing You can construct an array from elements by writing them inside braces, and preceding all that with the array type's designator in parentheses. There is no need to specify the array length, since the number of elements determines that.
The constructor looks like this: @example (@var{elttype}[]) @{ @var{elements} @}; @end example Here is an example, which constructs an array of string pointers: @example (char *[]) @{ "x", "y", "z" @}; @end example That's equivalent in effect to declaring an array with the same initializer, like this: @example char *array[] = @{ "x", "y", "z" @}; @end example and then using the array. If all the elements are simple constant expressions, or made up of such, then the compound literal can be coerced to a pointer to its zeroth element and used to initialize a file-scope variable (@pxref{File-Scope Variables}), as shown here: @example char **foo = (char *[]) @{ "x", "y", "z" @}; @end example @noindent The data type of @code{foo} is @code{char **}, which is a pointer type, not an array type. The declaration is equivalent to defining and then using an array-type variable: @example char *nameless_array[] = @{ "x", "y", "z" @}; char **foo = &nameless_array[0]; @end example @node Arrays of Variable Length @section Arrays of Variable Length @cindex array of variable length @cindex variable-length arrays In GNU C, you can declare variable-length arrays like any other arrays, but with a length that is not a constant expression. The storage is allocated at the point of declaration and deallocated when the block scope containing the declaration exits. For example: @example #include <stdio.h> /* @r{Defines @code{FILE}.} */ #include <string.h> /* @r{Declares @code{strlen}, @code{strcpy}, @code{strcat}.} */ FILE * concat_fopen (char *s1, char *s2, char *mode) @{ char str[strlen (s1) + strlen (s2) + 1]; strcpy (str, s1); strcat (str, s2); return fopen (str, mode); @} @end example @noindent (This uses some standard library functions; see @ref{String and Array Utilities, , , libc, The GNU C Library Reference Manual}.) The length of an array is computed once when the storage is allocated and is remembered for the scope of the array in case it is used in @code{sizeof}.
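As a minimal sketch of how the remembered length works with @code{sizeof}, the function below declares a variable-length array and recovers its element count from @code{sizeof}. The name @code{vla_length} is hypothetical, chosen only for this illustration; for a variable-length array, @code{sizeof} is computed when the program runs rather than when it is compiled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only: declare a
   variable-length array of N doubles and return the number
   of elements, as recovered from sizeof.  */
size_t
vla_length (int n)
{
  assert (n > 0);        /* The length of the array must be positive.  */
  double values[n];      /* The length is computed here, once.  */

  /* For a variable-length array, sizeof is evaluated at run
     time, using the length remembered from the declaration.  */
  return sizeof (values) / sizeof (values[0]);
}
```

Calling @code{vla_length (7)} returns 7, whatever the size of @code{double} is on the machine.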
@strong{Warning:} don't allocate a variable-length array if the size might be very large (more than 100,000), or in a recursive function, because that is likely to cause stack overflow. Allocate the array dynamically instead (@pxref{Dynamic Memory Allocation}). Jumping or breaking out of the scope of the array name deallocates the storage. Jumping into the scope is not allowed; that gives an error message. You can also use variable-length arrays as arguments to functions: @example struct entry tester (int len, char data[len][len]) @{ @r{@dots{}} @} @end example As usual, a function argument declared with an array type is really a pointer to an array that already exists. Calling the function does not allocate the array, so there's no particular danger of stack overflow in using this construct. To pass the array first and the length afterward, use a forward declaration in the function's parameter list (another GNU extension). For example, @example struct entry tester (int len; char data[len][len], int len) @{ @r{@dots{}} @} @end example The @code{int len} before the semicolon is a @dfn{parameter forward declaration}, and it serves the purpose of making the name @code{len} known when the declaration of @code{data} is parsed. You can write any number of such parameter forward declarations in the parameter list. They can be separated by commas or semicolons, but the last one must end with a semicolon, which is followed by the ``real'' parameter declarations. Each forward declaration must match a ``real'' declaration in parameter name and data type. ISO C11 does not support parameter forward declarations. @node Enumeration Types @chapter Enumeration Types @cindex enumeration types @cindex types, enumeration @cindex enumerator An @dfn{enumeration type} represents a limited set of integer values, each with a name. It is effectively equivalent to a primitive integer type. Suppose we have a list of possible emotional states to store in an integer variable. 
We can give names to these alternative values with an enumeration: @example enum emotion_state @{ neutral, happy, sad, worried, calm, nervous @}; @end example @noindent (Never mind that this is a simplistic way to classify emotional states; it's just a code example.) The names inside the enumeration are called @dfn{enumerators}. The enumeration type defines them as constants, and their values are consecutive integers; @code{neutral} is 0, @code{happy} is 1, @code{sad} is 2, and so on. Alternatively, you can specify values for the enumerators explicitly like this: @example enum emotion_state @{ neutral = 2, happy = 5, sad = 20, worried = 10, calm = -5, nervous = -300 @}; @end example Each enumerator which does not specify a value gets value zero (if it is at the beginning) or the next consecutive integer after the preceding enumerator's value. @example /* @r{@code{neutral} is 0 by default,} @r{and @code{worried} is 21 by default.} */ enum emotion_state @{ neutral, happy = 5, sad = 20, worried, calm = -5, nervous = -300 @}; @end example If an enumerator is obsolete, you can specify that using it should cause a warning, by including an attribute in the enumerator's declaration. Here is how @code{happy} would look with this attribute: @example happy __attribute__ ((deprecated ("impossible under plutocratic rule"))) = 5, @end example @xref{Attributes}. You can declare variables with the enumeration type: @example enum emotion_state feelings_now; @end example In the C code itself, this is equivalent to declaring the variable @code{int}. (If all the enumeration values are nonnegative, it is equivalent to @code{unsigned int}.) However, declaring it with the enumeration type has an advantage in debugging, because GDB knows it should display the current value of the variable using the corresponding name. If the variable's type is @code{int}, GDB can only show the value as a number. The identifier that follows @code{enum} is called a @dfn{type tag} since it distinguishes different enumeration types.
Type tags are in a separate name space and belong to scopes like most other names in C@. @xref{Type Tags}, for explanation. You can predeclare an @code{enum} type tag like a structure or union type tag, like this: @example enum foo; @end example @noindent The @code{enum} type is incomplete until you finish defining it. You can optionally include a trailing comma at the end of a list of enumeration values: @example enum emotion_state @{ neutral, happy, sad, worried, calm, nervous, @}; @end example @noindent This is useful in some macro definitions, since it enables you to assemble the list of enumerators without knowing which one is last. The extra comma does not change the meaning of the enumeration in any way. @node Defining Typedef Names @chapter Defining Typedef Names @cindex typedef names @findex typedef You can define a data type keyword as an alias for any type, and then use the alias syntactically like a built-in type keyword such as @code{int}. You do this using @code{typedef}, so these aliases are also called @dfn{typedef names}. @code{typedef} is followed by text that looks just like a variable declaration, but instead of declaring variables it defines data type keywords. Here's how to define @code{fooptr} as a typedef alias for the type @code{struct foo *}, then declare @code{x} and @code{y} as variables with that type: @example typedef struct foo *fooptr; fooptr x, y; @end example @noindent That declaration is equivalent to the following one: @example struct foo *x, *y; @end example You can define a typedef alias for any type. For instance, this makes @code{frobcount} an alias for type @code{int}: @example typedef int frobcount; @end example @noindent This doesn't define a new type distinct from @code{int}. Rather, @code{frobcount} is another name for the type @code{int}. Once the variable is declared, it makes no difference which name the declaration used. 
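The following sketch illustrates that a typedef name is only another name for an existing type. The function names @code{bump} and @code{next_frobcount} are hypothetical, introduced just for this example. Since @code{frobcount} and @code{int} are the same type, a pointer to a @code{frobcount} can be passed where an @code{int *} is expected, with no cast.

```c
#include <assert.h>

typedef int frobcount;

/* Hypothetical helper: add one to the int that P points to.  */
void
bump (int *p)
{
  assert (p != 0);
  ++*p;
}

/* Hypothetical helper: return the count that follows START.  */
frobcount
next_frobcount (frobcount start)
{
  frobcount count = start;
  bump (&count);   /* &count has type int *; no cast is needed.  */
  return count;
}
```

If @code{frobcount} were a distinct type, the call to @code{bump} would need a conversion; as it is, the compiler sees no difference between the two names.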
There is a syntactic difference, however, between @code{frobcount} and @code{int}: A typedef name cannot be used with @code{signed}, @code{unsigned}, @code{long} or @code{short}. It has to specify the type all by itself. So you can't write this: @example unsigned frobcount f1; /* @r{Error!} */ @end example But you can write this: @example typedef unsigned int unsigned_frobcount; unsigned_frobcount f1; @end example In other words, a typedef name is not an alias for @emph{a keyword} such as @code{int}. It stands for a @emph{type}, and that could be the type @code{int}. Typedef names are in the same namespace as functions and variables, so you can't use the same name for a typedef and a function, or a typedef and a variable. When a typedef is declared inside a code block, it is in scope only in that block. @strong{Warning:} Avoid defining typedef names that end in @samp{_t}, because many of these have standard meanings. You can redefine a typedef name to the exact same type as its first definition, but you cannot redefine a typedef name to a different type, even if the two types are compatible. For example, this is valid: @example typedef int frobcount; typedef int frotzcount; typedef frotzcount frobcount; typedef frobcount frotzcount; @end example @noindent because each typedef name is always defined with the same type (@code{int}), but this is not valid: @example enum foo @{f1, f2, f3@}; typedef enum foo frobcount; typedef int frobcount; @end example @noindent Even though the type @code{enum foo} is compatible with @code{int}, they are not the @emph{same} type. @node Statements @chapter Statements @cindex statements A @dfn{statement} specifies computations to be done for effect; it does not produce a value, as an expression would. In general a statement ends with a semicolon (@samp{;}), but blocks (which are statements, more or less) are an exception to that rule. @ifnottex @xref{Blocks}. 
@end ifnottex The places to use statements are inside a block, and inside a complex statement. A @dfn{complex statement} contains one or two components that are nested statements. Each such component must consist of one and only one statement. The way to put multiple statements in such a component is to group them into a @dfn{block} (@pxref{Blocks}), which counts as one statement. The following sections describe the various kinds of statement. @menu * Expression Statement:: Evaluate an expression, as a statement, usually done for a side effect. * if Statement:: Basic conditional execution. * if-else Statement:: Multiple branches for conditional execution. * Blocks:: Grouping multiple statements together. * return Statement:: Return a value from a function. * Loop Statements:: Repeatedly executing a statement or block. * switch Statement:: Multi-way conditional choices. * switch Example:: A plausible example of using @code{switch}. * Duffs Device:: A special way to use @code{switch}. * Case Ranges:: Ranges of values for @code{switch} cases. * Null Statement:: A statement that does nothing. * goto Statement:: Jump to another point in the source code, identified by a label. * Local Labels:: Labels with limited scope. * Labels as Values:: Getting the address of a label. * Statement Exprs:: A series of statements used as an expression. @end menu @node Expression Statement @section Expression Statement @cindex expression statement @cindex statement, expression The most common kind of statement in C is an @dfn{expression statement}. It consists of an expression followed by a semicolon. The expression's value is discarded, so the expressions that are useful are those that have side effects: assignment expressions, increment and decrement expressions, and function calls. 
Here are examples of expression statements: @smallexample x = 5; /* @r{Assignment expression.} */ p++; /* @r{Increment expression.} */ printf ("Done\n"); /* @r{Function call expression.} */ *p; /* @r{Cause @code{SIGSEGV} signal if @code{p} is null.} */ x + y; /* @r{Useless statement without effect.} */ @end smallexample In very unusual circumstances we use an expression statement whose purpose is to get a fault if an address is invalid: @smallexample volatile char *p; @r{@dots{}} *p; /* @r{Cause signal if @code{p} is null.} */ @end smallexample If the target of @code{p} is not declared @code{volatile}, the compiler might optimize away the memory access, since it knows that the value isn't really used. @xref{volatile}. @node if Statement @section @code{if} Statement @cindex @code{if} statement @cindex statement, @code{if} @findex if An @code{if} statement computes an expression to decide whether to execute the following statement or not. It looks like this: @example if (@var{condition}) @var{execute-if-true} @end example The first thing this does is compute the value of @var{condition}. If that is true (nonzero), then it executes the statement @var{execute-if-true}. If the value of @var{condition} is false (zero), it doesn't execute @var{execute-if-true}; instead, it does nothing. This is a @dfn{complex statement} because it contains a component @var{execute-if-true} that is a nested statement. It must be one and only one statement. The way to put multiple statements there is to group them into a @dfn{block} (@pxref{Blocks}). @node if-else Statement @section @code{if-else} Statement @cindex @code{if}@dots{}@code{else} statement @cindex statement, @code{if}@dots{}@code{else} @findex else An @code{if}-@code{else} statement computes an expression to decide which of two nested statements to execute.
It looks like this: @example if (@var{condition}) @var{if-true-substatement} else @var{if-false-substatement} @end example The first thing this does is compute the value of @var{condition}. If that is true (nonzero), then it executes the statement @var{if-true-substatement}. If the value of @var{condition} is false (zero), then it executes the statement @var{if-false-substatement} instead. This is a @dfn{complex statement} because it contains components @var{if-true-substatement} and @var{if-false-substatement} that are nested statements. Each must be one and only one statement. The way to put multiple statements in such a component is to group them into a @dfn{block} (@pxref{Blocks}). @node Blocks @section Blocks @cindex block @cindex compound statement A @dfn{block} is a construct that contains multiple statements of any kind. It begins with @samp{@{} and ends with @samp{@}}, and has a series of statements and declarations in between. Another name for blocks is @dfn{compound statements}. Is a block a statement? Yes and no. It doesn't @emph{look} like a normal statement---it does not end with a semicolon. But you can @emph{use} it like a statement; anywhere that a statement is required or allowed, you can write a block and consider that block a statement. So far it seems that a block is a kind of statement with an unusual syntax. But that is not entirely true: a function body is also a block, and that block is definitely not a statement. The text after a function header is not treated as a statement; only a function body is allowed there, and nothing else would be meaningful there. In a formal grammar we would have to choose---either a block is a kind of statement or it is not. But this manual is meant for humans, not for parser generators. The clearest answer for humans is, ``a block is a statement, in some ways.'' @cindex nested block @cindex internal block A block that isn't a function body is called an @dfn{internal block} or a @dfn{nested block}.
You can put a nested block directly inside another block, but more often the nested block is inside some complex statement, such as a @code{for} statement or an @code{if} statement. There are two uses for nested blocks in C: @itemize @bullet @item To specify the scope for local declarations. For instance, a local variable's scope is the rest of the innermost containing block. @item To write a series of statements where, syntactically, one statement is called for. For instance, the @var{execute-if-true} of an @code{if} statement is one statement. To put multiple statements there, they have to be wrapped in a block, like this: @example if (x < 0) @{ printf ("x was negative\n"); x = -x; @} @end example @end itemize This example (repeated from above) shows a nested block which serves both purposes: it includes two statements (plus a declaration) in the body of a @code{while} statement, and it provides the scope for the declaration of @code{q}. @example void free_intlist (struct intlistlink *p) @{ while (p) @{ struct intlistlink *q = p; p = p->next; free (q); @} @} @end example @node return Statement @section @code{return} Statement @cindex @code{return} statement @cindex statement, @code{return} @findex return The @code{return} statement makes the containing function return immediately. It has two forms. This one specifies no value to return: @example return; @end example @noindent That form is meant for functions whose return type is @code{void} (@pxref{The Void Type}). You can also use it in a function that returns nonvoid data, but that's a bad idea, since it makes the function return garbage. The form that specifies a value looks like this: @example return @var{value}; @end example @noindent which computes the expression @var{value} and makes the function return that. If necessary, the value undergoes type conversion to the function's declared return value type, which works like assigning the value to a variable of that type. 
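Here is a small sketch of the conversion that @code{return} performs. The function name @code{half_of} is hypothetical, used only for illustration. The expression after @code{return} has type @code{double}, but the declared return type is @code{int}, so the value is converted just as if it had been assigned to an @code{int} variable: the fractional part is discarded.

```c
#include <assert.h>

/* Hypothetical example: return half of N as an int.  */
int
half_of (int n)
{
  assert (n >= 0);

  /* n * 0.5 has type double.  Since the function is declared
     to return int, the value is converted to int on return,
     which discards the fractional part.  */
  return n * 0.5;
}
```

Thus @code{half_of (7)} computes 3.5 and returns 3.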
@node Loop Statements @section Loop Statements @cindex loop statements @cindex statements, loop @cindex iteration You can use a loop statement when you need to execute a series of statements repeatedly, making an @dfn{iteration}. C provides several different kinds of loop statements, described in the following subsections. Every kind of loop statement is a complex statement because it contains a component, here called @var{body}, which is a nested statement. Most often the body is a block. @menu * while Statement:: Loop as long as a test expression is true. * do-while Statement:: Execute a loop once, with further looping as long as a test expression is true. * break Statement:: End a loop immediately. * for Statement:: Iterative looping. * Example of for:: An example of iterative looping. * Omitted for-Expressions:: for-loop expression options. * for-Index Declarations:: for-loop declaration options. * continue Statement:: Begin the next cycle of a loop. @end menu @node while Statement @subsection @code{while} Statement @cindex @code{while} statement @cindex statement, @code{while} @findex while The @code{while} statement is the simplest loop construct. It looks like this: @example while (@var{test}) @var{body} @end example Here, @var{body} is a statement (often a nested block) to repeat, and @var{test} is the test expression that controls whether to repeat it again. Each iteration of the loop starts by computing @var{test} and, if it is true (nonzero), that means the loop should execute @var{body} again and then start over. If @var{test} is false (zero), the loop stops repeating and execution moves on past it.
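As a small illustration of how the test controls a @code{while} loop, here is a sketch of a function whose body may run zero times. The name @code{count_halvings} is hypothetical.

```c
/* Hypothetical function: count how many times n can be halved
   before it reaches zero.  */
int
count_halvings (unsigned int n)
{
  int count = 0;

  /* The test is computed before each iteration, so when n is
     zero to begin with, the body never runs.  */
  while (n != 0)
    {
      n = n / 2;
      count++;
    }
  return count;
}
```

With an argument of 8, the body runs four times (8, 4, 2, 1), and with an argument of 0 it does not run at all.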
Here's an example of advancing to the last structure in a chain of structures chained through the @code{next} field: @example #include <stddef.h> /* @r{Defines @code{NULL}.} */ @r{@dots{}} while (chain->next != NULL) chain = chain->next; @end example @noindent This code assumes the chain isn't empty to start with; if the chain is empty (that is, if @code{chain} is a null pointer), the code gets a @code{SIGSEGV} signal trying to dereference that null pointer (@pxref{Signals}). @node do-while Statement @subsection @code{do-while} Statement @cindex @code{do}--@code{while} statement @cindex statement, @code{do}--@code{while} @findex do The @code{do}--@code{while} statement is a simple loop construct that performs the test at the end of the iteration. @example do @var{body} while (@var{test}); @end example Here, @var{body} is a statement (possibly a block) to repeat, and @var{test} is an expression that controls whether to repeat it again. Each iteration of the loop starts by executing @var{body}. Then it computes @var{test} and, if it is true (nonzero), that means to go back and start over with @var{body}. If @var{test} is false (zero), then the loop stops repeating and execution moves on past it. @node break Statement @subsection @code{break} Statement @cindex @code{break} statement @cindex statement, @code{break} @findex break The @code{break} statement looks like @samp{break;}. Its effect is to exit immediately from the innermost loop construct or @code{switch} statement (@pxref{switch Statement}). For example, this loop advances @code{p} until the next null character or newline. @example while (*p) @{ /* @r{End loop if we have reached a newline.} */ if (*p == '\n') break; p++; @} @end example When there are nested loops, the @code{break} statement exits from the innermost loop containing it.
@example struct list_if_tuples @{ struct list_if_tuples *next; int length; data **contents; @}; void process_all_elements (struct list_if_tuples *list) @{ int i; while (list) @{ /* @r{Process all the elements in this node's vector,} @r{stopping when we reach one that is null.} */ for (i = 0; i < list->length; i++) @{ /* @r{Null element terminates this node's vector.} */ if (list->contents[i] == NULL) /* @r{Exit the @code{for} loop.} */ break; /* @r{Operate on the next element.} */ process_element (list->contents[i]); @} list = list->next; @} @} @end example The only way in C to exit from an outer loop from inside an inner loop, short of returning from the function, is with @code{goto} (@pxref{goto Statement}). @node for Statement @subsection @code{for} Statement @cindex @code{for} statement @cindex statement, @code{for} @findex for A @code{for} statement uses three expressions written inside a parenthetical group to define the repetition of the loop. The first expression says how to prepare to start the loop. The second says how to test, before each iteration, whether to continue looping. The third says how to advance, at the end of an iteration, for the next iteration. All together, it looks like this: @example for (@var{start}; @var{continue-test}; @var{advance}) @var{body} @end example The first thing the @code{for} statement does is compute @var{start}. The next thing it does is compute the expression @var{continue-test}. If that expression is false (zero), the @code{for} statement finishes immediately, so @var{body} is executed zero times. However, if @var{continue-test} is true (nonzero), the @code{for} statement executes @var{body}, then @var{advance}. Then it loops back to the not-quite-top to test @var{continue-test} again. But it does not compute @var{start} again.
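The order of evaluation described above can be sketched by writing the same loop twice, once with @code{for} and once with @code{while}. The function names here are hypothetical, chosen only for this illustration.

```c
/* Hypothetical function: add up the integers from 1 to n
   using a for statement.  */
int
sum_with_for (int n)
{
  int sum = 0;
  int i;

  for (i = 1; i <= n; ++i)
    sum += i;
  return sum;
}

/* The same computation with each part of the for statement
   written out where it takes effect.  */
int
sum_with_while (int n)
{
  int sum = 0;
  int i;

  i = 1;                /* start: computed once.  */
  while (i <= n)        /* continue-test: before each iteration.  */
    {
      sum += i;         /* body.  */
      ++i;              /* advance: after the body.  */
    }
  return sum;
}
```

Both functions return 10 when @code{n} is 4, and both return 0 when @code{n} is 0, since the test is false the first time and the body runs zero times.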
@node Example of for @subsection Example of @code{for} Here is the @code{for} statement from the iterative Fibonacci function: @example int i; for (i = 1; i < n; ++i) /* @r{If @code{n} is 1 or less, the loop runs zero times,} */ /* @r{since @code{i < n} is false the first time.} */ @{ /* @r{Now @var{last} is @code{fib (@var{i})}} @r{and @var{prev} is @code{fib (@var{i} @minus{} 1)}.} */ /* @r{Compute @code{fib (@var{i} + 1)}.} */ int next = prev + last; /* @r{Shift the values down.} */ prev = last; last = next; /* @r{Now @var{last} is @code{fib (@var{i} + 1)}} @r{and @var{prev} is @code{fib (@var{i})}.} @r{But that won't stay true for long,} @r{because we are about to increment @var{i}.} */ @} @end example In this example, @var{start} is @code{i = 1}, meaning set @code{i} to 1. @var{continue-test} is @code{i < n}, meaning keep repeating the loop as long as @code{i} is less than @code{n}. @var{advance} is @code{++i}, meaning increment @code{i} by 1. The body is a block that contains a declaration and two statements. @node Omitted for-Expressions @subsection Omitted @code{for}-Expressions A fully-fleshed @code{for} statement contains all these parts, @example for (@var{start}; @var{continue-test}; @var{advance}) @var{body} @end example @noindent but you can omit any of the three expressions inside the parentheses. The parentheses and the two semicolons are required syntactically, but the expressions between them may be missing. A missing expression means this loop doesn't use that particular feature of the @code{for} statement. @c ??? You can't do this if START is a declaration. Instead of using @var{start}, you can do the loop preparation before the @code{for} statement: the effect is the same.
So we could have written the beginning of the previous example this way: @example int i = 1; for (; i < n; ++i) @end example @noindent instead of this way: @example int i; for (i = 1; i < n; ++i) @end example Omitting @var{continue-test} means the loop runs forever (or until something else causes exit from it). Statements inside the loop can test conditions for termination and use @samp{break;} to exit. This is more flexible since you can put those tests anywhere in the loop, not solely at the beginning. Putting an expression in @var{advance} is almost equivalent to writing it at the end of the loop body. The only difference is for the @code{continue} statement (@pxref{continue Statement}). So we could have written this: @example for (i = 0; i < n;) @{ @r{@dots{}} ++i; @} @end example @noindent instead of this: @example for (i = 0; i < n; ++i) @{ @r{@dots{}} @} @end example The choice is mainly a matter of what is more readable for programmers. However, there is also a syntactic difference: @var{advance} is an expression, not a statement. It can't include loops, blocks, declarations, etc. @node for-Index Declarations @subsection @code{for}-Index Declarations You can declare loop-index variables directly in the @var{start} portion of the @code{for}-loop, like this: @example for (int i = 0; i < n; ++i) @{ @r{@dots{}} @} @end example This kind of @var{start} is limited to a single declaration; it can declare one or more variables, separated by commas, all of which are the same @var{basetype} (@code{int}, in this example): @example for (int i = 0, j = 1, *p = NULL; i < n; ++i, ++j, ++p) @{ @r{@dots{}} @} @end example @noindent The scope of these variables is the @code{for} statement as a whole. See @ref{Variable Declarations} for an explanation of @var{basetype}. Variables declared in @code{for} statements should have initializers. Omitting the initialization gives the variables unpredictable initial values, so this code is erroneous.
@example for (int i; i < n; ++i) @{ @r{@dots{}} @} @end example @node continue Statement @subsection @code{continue} Statement @cindex @code{continue} statement @cindex statement, @code{continue} @findex continue The @code{continue} statement looks like @samp{continue;}, and its effect is to jump immediately to the end of the innermost loop construct. If it is a @code{for}-loop, the next thing that happens is to execute the loop's @var{advance} expression. For example, this loop increments @code{p} until the next null character or newline, and operates (in some way not shown) on all the characters in the line except for spaces. All it does with spaces is skip them. @example for (;*p; ++p) @{ /* @r{End loop if we have reached a newline.} */ if (*p == '\n') break; /* @r{Pay no attention to spaces.} */ if (*p == ' ') continue; /* @r{Operate on the next character.} */ @r{@dots{}} @} @end example @noindent Executing @samp{continue;} skips the rest of the loop body but it does not skip the @var{advance} expression, @code{++p}. We could also write it like this: @example for (;*p; ++p) @{ /* @r{Exit if we have reached a newline.} */ if (*p == '\n') break; /* @r{Pay no attention to spaces.} */ if (*p != ' ') @{ /* @r{Operate on the next character.} */ @r{@dots{}} @} @} @end example The advantage of using @code{continue} is that it reduces the depth of nesting. Contrast @code{continue} with the @code{break} statement. @xref{break Statement}. @node switch Statement @section @code{switch} Statement @cindex @code{switch} statement @cindex statement, @code{switch} @findex switch @findex case @findex default The @code{switch} statement selects code to run according to the value of an expression. The expression, in parentheses, follows the keyword @code{switch}. After that come all the cases to select among, inside braces.
It looks like this: @example switch (@var{selector}) @{ @var{cases}@r{@dots{}} @} @end example A case can look like this: @example case @var{value}: @var{statements} break; @end example @noindent which means ``come here if @var{selector} happens to have the value @var{value},'' or like this (a GNU C extension): @example case @var{rangestart} ... @var{rangeend}: @var{statements} break; @end example @noindent which means ``come here if @var{selector} happens to have a value between @var{rangestart} and @var{rangeend} (inclusive).'' @xref{Case Ranges}. The values in @code{case} labels must reduce to integer constants. They can use arithmetic, and @code{enum} constants, but they cannot refer to data in memory, because they have to be computed at compile time. It is an error if two @code{case} labels specify the same value, or ranges that overlap, or if one is a range and the other is a value in that range. You can also define a default case to handle ``any other value,'' like this: @example default: @var{statements} break; @end example If the @code{switch} statement has no @code{default:} label, then it does nothing when the value matches none of the cases. The brace-group inside the @code{switch} statement is a block, and you can declare variables with that scope just as in any other block (@pxref{Blocks}). However, initializers in these declarations won't necessarily be executed every time the @code{switch} statement runs, so it is best to avoid giving them initializers. @code{break;} inside a @code{switch} statement exits immediately from the @code{switch} statement. @xref{break Statement}. If there is no @code{break;} at the end of the code for a case, execution continues into the code for the following case. This happens more often by mistake than intentionally, but since this feature is used in real code, we cannot eliminate it. 
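The kinds of cases described above can be combined in one @code{switch} statement. The following is a minimal sketch; the function name @code{classify} and the result codes it returns are hypothetical, and the case range requires GNU C.

```c
/* Hypothetical function: classify a character code.
   Returns 'd' for a decimal digit, 's' for a space or tab,
   and '?' for any other value.  */
char
classify (int c)
{
  char kind;

  switch (c)
    {
    case '0' ... '9':   /* Case range: a GNU C extension.  */
      kind = 'd';
      break;

    case ' ':           /* Two labels that select the same code.  */
    case '\t':
      kind = 's';
      break;

    default:            /* Any other value.  */
      kind = '?';
      break;
    }
  return kind;
}
```

Each case ends with @samp{break;}, so control never continues from the code for one case into the code for the next.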
@strong{Warning:} When one case is intended to fall through to the next, write a comment like @samp{falls through} to say it's intentional. That way, other programmers won't assume it was an error and ``fix'' it erroneously. Consecutive @code{case} statements could, pedantically, be considered an instance of falling through, but we don't consider or treat them that way because they won't confuse anyone. @node switch Example @section Example of @code{switch} Here's an example of using the @code{switch} statement to distinguish among characters: @cindex counting vowels and punctuation @example struct vp @{ int vowels, punct; @}; struct vp count_vowels_and_punct (char *string) @{ int c; int vowels = 0; int punct = 0; /* @r{Don't change the parameter itself.} */ /* @r{That helps in debugging.} */ char *p = string; struct vp value; while (c = *p++) switch (c) @{ case 'y': case 'Y': /* @r{We assume @code{y_is_consonant} will check surrounding letters to determine whether this y is a vowel.} */ if (y_is_consonant (p - 1)) break; /* @r{Falls through} */ case 'a': case 'e': case 'i': case 'o': case 'u': case 'A': case 'E': case 'I': case 'O': case 'U': vowels++; break; case '.': case ',': case ':': case ';': case '?': case '!': case '\"': case '\'': punct++; break; @} value.vowels = vowels; value.punct = punct; return value; @} @end example @node Duffs Device @section Duff's Device @cindex Duff's device The cases in a @code{switch} statement can be inside other control constructs. For instance, we can use a technique known as @dfn{Duff's device} to optimize this simple function, @example void copy (char *to, char *from, int count) @{ while (count > 0) *to++ = *from++, count--; @} @end example @noindent which copies memory starting at @var{from} to memory starting at @var{to}. 
Duff's device involves unrolling the loop so that it copies several characters each time around, and using a @code{switch} statement to enter the loop body at the proper point: @example void copy (char *to, char *from, int count) @{ if (count <= 0) return; int n = (count + 7) / 8; switch (count % 8) @{ do @{ case 0: *to++ = *from++; case 7: *to++ = *from++; case 6: *to++ = *from++; case 5: *to++ = *from++; case 4: *to++ = *from++; case 3: *to++ = *from++; case 2: *to++ = *from++; case 1: *to++ = *from++; @} while (--n > 0); @} @} @end example @node Case Ranges @section Case Ranges @cindex case ranges @cindex ranges in case statements You can specify a range of consecutive values in a single @code{case} label, like this: @example case @var{low} ... @var{high}: @end example @noindent This has the same effect as the proper number of individual @code{case} labels, one for each integer value from @var{low} to @var{high}, inclusive. This feature is especially useful for ranges of ASCII character codes: @example case 'A' ... 'Z': @end example @strong{Be careful:} with integers, write spaces around the @code{...} to prevent it from being parsed wrong. For example, write this: @example case 1 ... 5: @end example @noindent rather than this: @example case 1...5: @end example @node Null Statement @section Null Statement @cindex null statement @cindex statement, null A @dfn{null statement} is just a semicolon. It does nothing. A null statement is a placeholder for use where a statement is grammatically required, but there is nothing to be done. For instance, sometimes all the work of a @code{for}-loop is done in the @code{for}-header itself, leaving no work for the body. 
Here is an example that searches for the first newline in @code{array}: @example for (p = array; *p != '\n'; p++) ; @end example @node goto Statement @section @code{goto} Statement and Labels @cindex @code{goto} statement @cindex statement, @code{goto} @cindex label @findex goto The @code{goto} statement looks like this: @example goto @var{label}; @end example @noindent Its effect is to transfer control immediately to another part of the current function---where the label named @var{label} is defined. An ordinary label definition looks like this: @example @var{label}: @end example @noindent and it can appear before any statement. You can't use @code{default} as a label, since that has a special meaning for @code{switch} statements. An ordinary label doesn't need a separate declaration; defining it is enough. Here's an example of using @code{goto} to implement a loop equivalent to @code{do}--@code{while}: @example @{ loop_restart: @var{body} if (@var{condition}) goto loop_restart; @} @end example The name space of labels is separate from that of variables and functions. Thus, there is no error in using a single name in both ways: @example @{ int foo; // @r{Variable @code{foo}.} foo: // @r{Label @code{foo}.} @var{body} if (foo > 0) // @r{Variable @code{foo}.} goto foo; // @r{Label @code{foo}.} @} @end example Blocks have no effect on ordinary labels; each label name is defined throughout the whole of the function it appears in. It looks strange to jump into a block with @code{goto}, but it works. For example, @example if (x < 0) goto negative; if (y < 0) @{ negative: printf ("Negative\n"); return; @} @end example If the goto jumps into the scope of a variable, it does not initialize the variable. For example, if @code{x} is negative, @example if (x < 0) goto negative; if (y < 0) @{ int i = 5; negative: printf ("Negative, and i is %d\n", i); return; @} @end example @noindent prints junk because @code{i} was not initialized. 
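The loop built from a label and @code{goto} earlier in this section can be fleshed out into a complete function. This is only a sketch to show the control flow; the function name @code{count_digits} is hypothetical, and a @code{do}--@code{while} statement would be the natural way to write it.

```c
/* Hypothetical function: count the decimal digits of n using a
   loop built from a label and goto.  The loop body always runs
   at least once, as in do-while, so 0 has one digit.  */
int
count_digits (unsigned int n)
{
  int digits = 0;

 loop_restart:
  digits++;
  n /= 10;
  if (n != 0)
    goto loop_restart;

  return digits;
}
```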
If the block declares a variable-length automatic array, jumping into it gives a compilation error. However, jumping out of the scope of a variable-length array works fine, and deallocates its storage. A label can't come directly before a declaration, so the code can't jump directly to one. For example, this is not allowed: @example @{ goto foo; foo: int x = 5; bar(&x); @} @end example @noindent The workaround is to add a statement, even an empty statement, directly after the label. For example: @example @{ goto foo; foo: ; int x = 5; bar(&x); @} @end example Likewise, a label can't be the last thing in a block. The workaround is the same: add a semicolon after the label. These unnecessary restrictions on labels make no sense, and ought in principle to be removed; but they do only a little harm since labels and @code{goto} are rarely the best way to write a program. These examples are all artificial; it would be more natural to write them in other ways, without @code{goto}. For instance, the clean way to write the example that prints @samp{Negative} is this: @example if (x < 0 || y < 0) @{ printf ("Negative\n"); return; @} @end example @noindent It is hard to construct simple examples where @code{goto} is actually the best way to write a program. Its rare good uses tend to be in complex code, thus not apt for the purpose of explaining the meaning of @code{goto}. The only good time to use @code{goto} is when it makes the code simpler than any alternative. Jumping backward is rarely desirable, because usually the other looping and control constructs give simpler code. Using @code{goto} to jump forward is more often desirable, for instance when a function needs to do some processing in an error case and errors can occur at various different places within the function. @node Local Labels @section Locally Declared Labels @cindex local labels @cindex macros, local labels @findex __label__ In GNU C you can declare @dfn{local labels} in any nested block scope.
A local label is used in a @code{goto} statement just like an ordinary label, but you can only reference it within the block in which it was declared. A local label declaration looks like this: @example __label__ @var{label}; @end example @noindent or @example __label__ @var{label1}, @var{label2}, @r{@dots{}}; @end example Local label declarations must come at the beginning of the block, before any ordinary declarations or statements. The label declaration declares the label @emph{name}, but does not define the label itself. That's done in the usual way, with @code{@var{label}:}, before one of the statements in the block. The local label feature is useful for complex macros. If a macro contains nested loops, a @code{goto} can be useful for breaking out of them. However, an ordinary label whose scope is the whole function cannot be used: if the macro can be expanded several times in one function, the label will be multiply defined in that function. A local label avoids this problem. For example: @example #define SEARCH(value, array, target) \ do @{ \ __label__ found; \ __auto_type _SEARCH_target = (target); \ __auto_type _SEARCH_array = (array); \ int i, j; \ for (i = 0; i < max; i++) \ for (j = 0; j < max; j++) \ if (_SEARCH_array[i][j] == _SEARCH_target) \ @{ (value) = i; goto found; @} \ (value) = -1; \ found:; \ @} while (0) @end example This could also be written using a statement expression (@pxref{Statement Exprs}): @example #define SEARCH(array, target) \ (@{ \ __label__ found; \ __auto_type _SEARCH_target = (target); \ __auto_type _SEARCH_array = (array); \ int i, j; \ int value; \ for (i = 0; i < max; i++) \ for (j = 0; j < max; j++) \ if (_SEARCH_array[i][j] == _SEARCH_target) \ @{ value = i; goto found; @} \ value = -1; \ found: \ value; \ @}) @end example Ordinary labels are visible throughout the function where they are defined, and only in that function.
However, explicitly declared local labels of a block are visible in nested function definitions inside that block. @xref{Nested Functions}, for details. @xref{goto Statement}. @node Labels as Values @section Labels as Values @cindex labels as values @cindex computed gotos @cindex goto with computed label @cindex address of a label In GNU C, you can get the address of a label defined in the current function (or a local label defined in the containing function) with the unary operator @samp{&&}. The value has type @code{void *}. This value is a constant and can be used wherever a constant of that type is valid. For example: @example void *ptr; @r{@dots{}} ptr = &&foo; @end example To use these values requires a way to jump to one. This is done with the computed goto statement@footnote{The analogous feature in Fortran is called an assigned goto, but that name seems inappropriate in C, since you can do more with label addresses than store them in special label variables.}, @code{goto *@var{exp};}. For example, @example goto *ptr; @end example @noindent Any expression of type @code{void *} is allowed. @xref{goto Statement}. @menu * Label Value Uses:: Examples of using label values. * Label Value Caveats:: Limitations of label values. @end menu @node Label Value Uses @subsection Label Value Uses One use for label-valued constants is to initialize a static array to serve as a jump table: @example static void *array[] = @{ &&foo, &&bar, &&hack @}; @end example Then you can select a label with indexing, like this: @example goto *array[i]; @end example @noindent Note that this does not check whether the subscript is in bounds---array indexing in C never checks that. You can make the table entries offsets instead of addresses by subtracting one label from the others. 
Here is an example: @example static const int array[] = @{ &&foo - &&foo, &&bar - &&foo, &&hack - &&foo @}; goto *(&&foo + array[i]); @end example @noindent Using offsets is preferable in shared libraries, as it avoids the need for dynamic relocation of the array elements; therefore, the array can be read-only. An array of label values or offsets serves a purpose much like that of the @code{switch} statement. The @code{switch} statement is cleaner, so use @code{switch} by preference when feasible. Another use of label values is in an interpreter for threaded code. The labels within the interpreter function can be stored in the threaded code for super-fast dispatching. @node Label Value Caveats @subsection Label Value Caveats Jumping to a label defined in another function does not work. It can cause unpredictable results. The best way to avoid this is to store label values only in automatic variables, or static variables whose names are declared within the function. Never pass them as arguments. @cindex cloning An optimization known as @dfn{cloning} generates multiple simplified variants of a function's code, for use with specific fixed arguments. Using label values in certain ways, such as saving the address in one call to the function and using it again in another call, would make cloning give incorrect results. These functions must disable cloning. Inlining calls to the function would also result in multiple copies of the code, each with its own value of the same label. Using the label in a computed goto is no problem, because the computed goto inhibits inlining. However, using the label value in some other way, such as an indication of where an error occurred, would be optimized wrong. These functions must disable inlining. To prevent inlining or cloning of a function, specify @code{__attribute__((__noinline__,__noclone__))} in its definition. @xref{Attributes}. 
When a function uses a label value in a static variable initializer, that automatically prevents inlining or cloning the function. @node Statement Exprs @section Statements and Declarations in Expressions @cindex statements inside expressions @cindex declarations inside expressions @cindex expressions containing statements @c the above section title wrapped and causes an underfull hbox.. i @c changed it from "within" to "in". --mew 4feb93 A block enclosed in parentheses can be used as an expression in GNU C@. This provides a way to use local variables, loops and switches within an expression. We call it a @dfn{statement expression}. Recall that a block is a sequence of statements surrounded by braces. In this construct, parentheses go around the braces. For example: @example (@{ int y = foo (); int z; if (y > 0) z = y; else z = - y; z; @}) @end example @noindent is a valid (though slightly more complex than necessary) expression for the absolute value of @code{foo ()}. The last statement in the block should be an expression statement; an expression followed by a semicolon, that is. The value of this expression serves as the value of the statement expression. If the last statement is anything else, the statement expression has type @code{void} and so has no value. This feature is mainly useful in making macro definitions compute each operand exactly once. @xref{Macros and Auto Type}. Statement expressions are not allowed in expressions that must be constant, such as the value for an enumerator, the width of a bit-field, or the initial value of a static variable. Jumping into a statement expression---with @code{goto}, or using a @code{switch} statement outside the statement expression---is an error. With a computed @code{goto} (@pxref{Labels as Values}), the compiler can't detect the error, but it still won't work.
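To illustrate the use of statement expressions in macros that compute each operand exactly once, here is a minimal sketch. It relies on two GNU C extensions, the statement expression and @code{__auto_type}. The names @code{MAX_ONCE}, @code{next_value}, @code{calls} and @code{max_demo} are hypothetical, introduced only for this illustration.

```c
/* Hypothetical macro: the larger of two values, computing each
   operand exactly once.  The last expression statement in the
   block supplies the value of the statement expression.  */
#define MAX_ONCE(a, b)                          \
  ({ __auto_type _max_a = (a);                  \
     __auto_type _max_b = (b);                  \
     _max_a > _max_b ? _max_a : _max_b; })

/* Hypothetical helper with a side effect, to make the number
   of evaluations visible.  */
static int calls = 0;

static int
next_value (void)
{
  return ++calls;
}

int
max_demo (void)
{
  /* The operand next_value () is computed once per use of the
     macro, so the first call of max_demo returns 1.  */
  return MAX_ONCE (next_value (), 0);
}
```

A macro written as @code{((a) > (b) ? (a) : (b))} would compute the larger operand twice, so with the same arguments it would call @code{next_value} a second time.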
Jumping out of a statement expression is permitted, but since subexpressions in C are not computed in a strict order, it is unpredictable which other subexpressions will have been computed by then. For example, @example foo (), ((@{ bar1 (); goto a; 0; @}) + bar2 ()), baz (); @end example @noindent calls @code{foo} and @code{bar1} before it jumps, and never calls @code{baz}, but may or may not call @code{bar2}. If @code{bar2} does get called, that occurs after @code{foo} and before @code{bar1}. @node Variables @chapter Variables @cindex variables Every variable used in a C program needs to be made known by a @dfn{declaration}. It can be used only after it has been declared. It is an error to declare a variable name more than once in the same scope; an exception is that @code{extern} declarations and tentative definitions can coexist with another declaration of the same variable. Variables can be declared anywhere within a block or file. (Older versions of C required that all variable declarations within a block occur before any statements.) Variables declared within a function or block are @dfn{local} to it. This means that the variable name is visible only until the end of that function or block, and the memory space is allocated only while control is within it. Variables declared at the top level in a file are called @dfn{file-scope}. They are assigned fixed, distinct memory locations, so they retain their values for the whole execution of the program. @menu * Variable Declarations:: Name a variable and reserve space for it. * Initializers:: Assigning initial values to variables. * Designated Inits:: Assigning initial values to array elements at particular array indices. * Auto Type:: Obtaining the type of a variable. * Local Variables:: Variables declared in function definitions. * File-Scope Variables:: Variables declared outside of function definitions. * Static Local Variables:: Variables declared within functions, but with permanent storage allocation.
* Extern Declarations:: Declaring a variable which is allocated somewhere else. * Allocating File-Scope:: When is space allocated for file-scope variables? * auto and register:: Historically used storage directions. * Omitting Types:: The bad practice of declaring variables with implicit type. @end menu @node Variable Declarations @section Variable Declarations @cindex variable declarations @cindex declaration of variables Here's what a variable declaration looks like: @example @var{keywords} @var{basetype} @var{decorated-variable} @r{[}= @var{init}@r{]}; @end example The @var{keywords} specify how to handle the scope of the variable name and the allocation of its storage. Most declarations have no keywords because the defaults are right for them. C allows these keywords to come before or after @var{basetype}, or even in the middle of it as in @code{unsigned static int}, but don't do that---it would surprise other programmers. Always write the keywords first. The @var{basetype} can be any of the predefined types of C, or a type keyword defined with @code{typedef}. It can also be @code{struct @var{tag}}, @code{union @var{tag}}, or @code{enum @var{tag}}. In addition, it can include type qualifiers such as @code{const} and @code{volatile} (@pxref{Type Qualifiers}). In the simplest case, @var{decorated-variable} is just the variable name. That declares the variable with the type specified by @var{basetype}. For instance, @example int foo; @end example @noindent uses @code{int} as the @var{basetype} and @code{foo} as the @var{decorated-variable}. It declares @code{foo} with type @code{int}. @example struct tree_node foo; @end example @noindent declares @code{foo} with type @code{struct tree_node}. @menu * Declaring Arrays and Pointers:: Declaration syntax for variables of array and pointer types. * Combining Variable Declarations:: More than one variable declaration in a single statement. 
@end menu @node Declaring Arrays and Pointers @subsection Declaring Arrays and Pointers @cindex declaring arrays and pointers @cindex array, declaring @cindex pointers, declaring To declare a variable that is an array, write @code{@var{variable}[@var{length}]} for @var{decorated-variable}: @example int foo[5]; @end example To declare a variable that has a pointer type, write @code{*@var{variable}} for @var{decorated-variable}: @example struct list_elt *foo; @end example These constructs nest. For instance, @example int foo[3][5]; @end example @noindent declares @code{foo} as an array of 3 arrays of 5 integers each, @example struct list_elt *foo[5]; @end example @noindent declares @code{foo} as an array of 5 pointers to structures, and @example struct list_elt **foo; @end example @noindent declares @code{foo} as a pointer to a pointer to a structure. @example int **(*foo[30])(int, double); @end example @noindent declares @code{foo} as an array of 30 pointers to functions (@pxref{Function Pointers}), each of which must accept two arguments (one @code{int} and one @code{double}) and return type @code{int **}. @example void bar (int size) @{ int foo[size]; @r{@dots{}} @} @end example @noindent declares @code{foo} as an array of integers with a size specified at run time when the function @code{bar} is called. @node Combining Variable Declarations @subsection Combining Variable Declarations @cindex combining variable declarations @cindex variable declarations, combining @cindex declarations, combining When multiple declarations have the same @var{keywords} and @var{basetype}, you can combine them using commas. 
Thus, @example @var{keywords} @var{basetype} @var{decorated-variable-1} @r{[}= @var{init1}@r{]}, @var{decorated-variable-2} @r{[}= @var{init2}@r{]}; @end example @noindent is equivalent to @example @var{keywords} @var{basetype} @var{decorated-variable-1} @r{[}= @var{init1}@r{]}; @var{keywords} @var{basetype} @var{decorated-variable-2} @r{[}= @var{init2}@r{]}; @end example Here are some simple examples: @example int a, b; int a = 1, b = 2; int a, *p, array[5]; int a = 0, *p = &a, array[5] = @{1, 2@}; @end example @noindent In the last two examples, @code{a} is an @code{int}, @code{p} is a pointer to @code{int}, and @code{array} is an array of 5 @code{int}s. Since the initializer for @code{array} specifies only two elements, the other three elements are initialized to zero. @node Initializers @section Initializers @cindex initializers A variable's declaration, unless it is @code{extern}, should also specify its initial value. For numeric and pointer-type variables, the initializer is an expression for the value. If necessary, it is converted to the variable's type, just as in an assignment. You can also initialize a local structure-type (@pxref{Structures}) or local union-type (@pxref{Unions}) variable this way, from an expression whose value has the same type. But you can't initialize an array this way (@pxref{Arrays}), since arrays are not first-class objects in C (@pxref{Limitations of C Arrays}) and there is no array assignment. You can initialize arrays and structures componentwise, with a list of the elements or components. You can initialize a union with any one of its alternatives. @itemize @bullet @item A component-wise initializer for an array consists of element values surrounded by @samp{@{@r{@dots{}}@}}. If the values in the initializer don't cover all the elements in the array, the remaining elements are initialized to zero. 
You can omit the size of the array when you declare it, and let the initializer specify the size: @example int array[] = @{ 3, 9, 12 @}; @end example @item A component-wise initializer for a structure consists of field values surrounded by @samp{@{@r{@dots{}}@}}. Write the field values in the same order as the fields are declared in the structure. If the values in the initializer don't cover all the fields in the structure, the remaining fields are initialized to zero. @item The initializer for a union-type variable has the form @code{@{ @var{value} @}}, where @var{value} initializes the @emph{first alternative} in the union definition. @end itemize For an array of arrays, a structure containing arrays, an array of structures, etc., you can nest these constructs. For example, @example struct point @{ double x, y; @}; struct point series[] = @{ @{0, 0@}, @{1.5, 2.8@}, @{99, 100.0004@} @}; @end example You can omit a pair of inner braces if they contain the right number of elements for the sub-value they initialize, so that no elements or fields need to be filled in with zeros. But don't do that very much, as it gets confusing. An array of @code{char} can be initialized using a string constant. Recall that the string constant includes an implicit null character at the end (@pxref{String Constants}). Using a string constant as initializer means to use its contents as the initial values of the array elements. Here are examples: @example char text[6] = "text!"; /* @r{Includes the null.} */ char text[5] = "text!"; /* @r{Excludes the null.} */ char text[] = "text!"; /* @r{Gets length 6.} */ char text[] = @{ 't', 'e', 'x', 't', '!', 0 @}; /* @r{same as above.} */ char text[] = @{ "text!" @}; /* @r{Braces are optional.} */ @end example @noindent and this kind of initializer can be nested inside braces to initialize structures or arrays that contain a @code{char}-array. In like manner, you can use a wide string constant to initialize an array of @code{wchar_t}. 
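As a minimal sketch of how these rules combine, the following fragment nests brace initializers for an array of structures, each of which contains a numeric array and a @code{char} array. The names @code{struct reading}, @code{readings} and @code{reading_count} are hypothetical, chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical record type used only for this illustration.  */
struct reading
{
  int id;
  double samples[3];
  char label[8];
};

/* The initializer determines the array length (2).  Elements and
   fields that are not mentioned are initialized to zero.  */
struct reading readings[] =
  {
    { 1, { 0.5, 1.5 }, "first" },   /* samples[2] becomes 0.  */
    { 2, { 2.0 }, "second" }        /* samples[1], samples[2] become 0.  */
  };

/* Return the number of elements the initializer gave `readings'.  */
size_t
reading_count (void)
{
  return sizeof readings / sizeof readings[0];
}
```

Each string constant fills the start of @code{label}, and the unused trailing characters are zero like any other omitted elements.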
@node Designated Inits @section Designated Initializers @cindex initializers with labeled elements @cindex labeled elements in initializers @cindex case labels in initializers @cindex designated initializers In a complex structure or long array, it's useful to indicate which field or element we are initializing. To designate specific array elements during initialization, include the array index in brackets, and an assignment operator, for each element: @example int foo[10] = @{ [3] = 42, [7] = 58 @}; @end example @noindent This does the same thing as: @example int foo[10] = @{ 0, 0, 0, 42, 0, 0, 0, 58, 0, 0 @}; @end example The array initialization can include non-designated element values alongside designated indices; these follow the expected ordering of the array initialization, so that @example int foo[10] = @{ [3] = 42, 43, 44, [7] = 58 @}; @end example @noindent does the same thing as: @example int foo[10] = @{ 0, 0, 0, 42, 43, 44, 0, 58, 0, 0 @}; @end example Note that you can only use constant expressions as array index values, not variables. If you need to initialize a subsequence of sequential array elements to the same value, you can specify a range: @example int foo[100] = @{ [0 ... 19] = 42, [20 ... 99] = 43 @}; @end example @noindent Using a range this way is a GNU C extension. When subsequence ranges overlap, each element is initialized by the last specification that applies to it. Thus, this initialization is equivalent to the previous one. @example int foo[100] = @{ [0 ... 99] = 43, [0 ... 19] = 42 @}; @end example @noindent as the second overrides the first for elements 0 through 19. The value used to initialize a range of elements is evaluated only once, for the first element in the range. So for example, this code @example int random_values[100] = @{ [0 ... 99] = get_random_number() @}; @end example @noindent would initialize all 100 elements of the array @code{random_values} to the same value---probably not what is intended. 
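One place where array designators tend to help is a lookup table indexed by an enumeration. The sketch below uses only the ISO C form, not the GNU range extension. The type @code{enum weekday}, the table @code{weekend_flags} and the function @code{is_weekend} are hypothetical names introduced for illustration.

```c
#include <assert.h>

enum weekday
{
  MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY,
  WEEKDAY_COUNT
};

/* Enumeration constants are constant expressions, so they can serve
   as designators.  Elements not designated are initialized to zero.  */
static const int weekend_flags[WEEKDAY_COUNT] =
  {
    [SATURDAY] = 1,
    [SUNDAY] = 1
  };

/* Return 1 if DAY falls on the weekend, 0 otherwise.  */
int
is_weekend (enum weekday day)
{
  /* Reject values that lie outside the table.  */
  assert ((unsigned) day < WEEKDAY_COUNT);
  return weekend_flags[day];
}
```

Because each entry names its index, reordering the enumeration constants would not silently misalign the table.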
Similarly, you can initialize specific fields of a structure variable by specifying the field name prefixed with a dot: @example struct point @{ int x; int y; @}; struct point foo = @{ .y = 42 @}; @end example @noindent The same syntax works for union variables as well: @example union int_double @{ int i; double d; @}; union int_double foo = @{ .d = 34 @}; @end example @noindent This converts the integer value 34 to type @code{double} and stores it in the union variable @code{foo}. You can designate both array elements and structure elements in the same initialization; for example, here's an array of point structures: @example struct point point_array[10] = @{ [4].y = 32, [6].y = 39 @}; @end example Along with the capability to specify particular array and structure elements to initialize comes the possibility of initializing the same element more than once: @example int foo[10] = @{ [4] = 42, [4] = 98 @}; @end example @noindent In such a case, the last initialization value is retained. @node Auto Type @section Referring to a Type with @code{__auto_type} @findex __auto_type @findex typeof @cindex macros, types of arguments You can declare a variable copying the type from the initializer by using @code{__auto_type} instead of a particular type. Here's an example: @example #define max(a,b) \ (@{ __auto_type _a = (a); \ __auto_type _b = (b); \ _a > _b ? _a : _b @}) @end example This defines @code{_a} to be of the same type as @code{a}, and @code{_b} to be of the same type as @code{b}. This is a useful thing to do in a macro that ought to be able to handle any type of data (@pxref{Macros and Auto Type}). The original GNU C method for obtaining the type of a value is to use @code{typeof}, which takes as an argument either a value or the name of a type. The previous example could also be written as: @example #define max(a,b) \ (@{ typeof(a) _a = (a); \ typeof(b) _b = (b); \ _a > _b ?
_a : _b @}) @end example @code{typeof} is more flexible than @code{__auto_type}; however, the principal use case for @code{typeof} is in variable declarations with initialization, which is exactly what @code{__auto_type} handles. @node Local Variables @section Local Variables @cindex local variables @cindex variables, local Declaring a variable inside a function definition (@pxref{Function Definitions}) makes the variable name @dfn{local} to the containing block---that is, the containing pair of braces. More precisely, the variable's name is visible starting just after where it appears in the declaration, and its visibility continues until the end of the block. Local variables in C are generally @dfn{automatic} variables: each variable's storage exists only from the declaration to the end of the block. Execution of the declaration allocates the storage, computes the initial value, and stores it in the variable. The end of the block deallocates the storage.@footnote{Due to compiler optimizations, allocation and deallocation don't necessarily really happen at those times.} @strong{Warning:} Two declarations for the same local variable in the same scope are an error. @strong{Warning:} Automatic variables are stored in the run-time stack. The total space for the program's stack may be limited; therefore, in using very large arrays, it may be necessary to allocate them in some other way to stop the program from crashing. @strong{Warning:} If the declaration of an automatic variable does not specify an initial value, the variable starts out containing garbage. In this example, the value printed could be anything at all: @example @{ int i; printf ("Print junk %d\n", i); @} @end example In a simple test program, that statement is likely to print 0, simply because every process starts with memory zeroed. But don't rely on it to be zero---that is erroneous. 
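By contrast with the uninitialized case, a declaration that does specify an initial value stores that value every time execution reaches it. The following sketch illustrates this under the assumption of a hypothetical function @code{count_fresh}; the variable names are likewise invented for the example.

```c
/* Hypothetical helper: returns how many times `local' was observed
   holding its initial value at the top of the loop body.  */
int
count_fresh (int iterations)
{
  int fresh = 0;
  int i;

  for (i = 0; i < iterations; i++)
    {
      int local = 0;   /* Executed, and so initialized, every time.  */

      if (local == 0)
        fresh++;
      local = 42;      /* Discarded when the block ends.  */
    }
  return fresh;
}
```

The value 42 stored at the end of one iteration is never seen by the next one, since the variable's storage conceptually ends with the block and the declaration runs again.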
@strong{Note:} Make sure to store a value into each local variable (by assignment, or by initialization) before referring to its value. @node File-Scope Variables @section File-Scope Variables @cindex file-scope variables @cindex global variables @cindex variables, file-scope @cindex variables, global A variable declaration at the top level in a file (not inside a function definition) declares a @dfn{file-scope variable}. Loading a program allocates the storage for all the file-scope variables in it, and initializes them too. Each file-scope variable is either @dfn{static} (limited to one compilation module) or @dfn{global} (shared with all compilation modules in the program). To make the variable static, write the keyword @code{static} at the start of the declaration. Omitting @code{static} makes the variable global. The initial value for a file-scope variable can't depend on the contents of storage, and can't call any functions. @example int foo = 5; /* @r{Valid.} */ int bar = foo; /* @r{Invalid!} */ int bar = sin (1.0); /* @r{Invalid!} */ @end example But it can use the address of another file-scope variable: @example int foo; int *bar = &foo; /* @r{Valid.} */ int arr[5]; int *bar3 = &arr[3]; /* @r{Valid.} */ int *bar4 = arr + 4; /* @r{Valid.} */ @end example It is valid for a module to have multiple declarations for a file-scope variable, as long as they are all global or all static, but at most one declaration can specify an initial value for it. @node Static Local Variables @section Static Local Variables @cindex static local variables @cindex variables, static local @findex static The keyword @code{static} in a local variable declaration says to allocate the storage for the variable permanently, just like a file-scope variable, even if the declaration is within a function. 
Here's an example: @example int increment_counter () @{ static int counter = 0; return ++counter; @} @end example The scope of the name @code{counter} runs from the declaration to the end of the containing block, just like an automatic local variable, but its storage is permanent, so the value persists from one call to the next. As a result, each call to @code{increment_counter} returns a different, unique value. The initial value of a static local variable has the same limitations as for file-scope variables: it can't depend on the contents of storage or call any functions. It can use the address of a file-scope variable or a static local variable, because those addresses are determined before the program runs. @node Extern Declarations @section @code{extern} Declarations @cindex @code{extern} declarations @cindex declarations, @code{extern} @findex extern An @code{extern} declaration is used to refer to a global variable whose principal declaration comes elsewhere---in the same module, or in another compilation module. It looks like this: @example extern @var{basetype} @var{decorated-variable}; @end example Its meaning is that, in the current scope, the variable name refers to the file-scope variable of that name---which needs to be declared in a non-@code{extern}, non-@code{static} way somewhere else. For instance, if one compilation module has this global variable declaration @example int error_count = 0; @end example @noindent then other compilation modules can specify this @example extern int error_count; @end example @noindent to allow reference to the same variable. The usual place to write an @code{extern} declaration is at top level in a source file, but you can write an @code{extern} declaration inside a block to make a global or static file-scope variable accessible in that block. 
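As a small sketch of the block-level case, the function below refers to a global variable whose principal declaration appears later in the same file. The function name @code{report_error} is hypothetical; @code{error_count} follows the example above.

```c
/* Hypothetical function that needs the counter before its
   principal declaration appears in the file.  */
int
report_error (void)
{
  extern int error_count;   /* Refers to the file-scope variable.  */

  return ++error_count;
}

/* The principal declaration, which allocates the variable.  */
int error_count = 0;
```

The name @code{error_count} introduced by the @code{extern} declaration is visible only inside that block, but it denotes the same variable that the rest of the program shares.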
Since an @code{extern} declaration does not allocate space for the variable, it can omit the size of an array: @example extern int array[]; @end example You can use @code{array} normally in all contexts where it is converted automatically to a pointer. However, using it as the operand of @code{sizeof} is an error, since the size is unknown. It is valid to have multiple @code{extern} declarations for the same variable, even in the same scope, if they give the same type. They do not conflict---they agree. For an array, it is legitimate for some @code{extern} declarations to specify the size while others omit it. However, if two declarations give different sizes, that is an error. Likewise, you can use @code{extern} declarations at file scope (@pxref{File-Scope Variables}) followed by an ordinary global (non-static) declaration of the same variable. They do not conflict, because they say compatible things about the same meaning of the variable. @node Allocating File-Scope @section Allocating File-Scope Variables @cindex allocation file-scope variables @cindex file-scope variables, allocating Some file-scope declarations allocate space for the variable, and some don't. A file-scope declaration with an initial value @emph{must} allocate space for the variable; if there are two such declarations for the same variable, even in different compilation modules, they conflict. An @code{extern} declaration @emph{never} allocates space for the variable. If all the top-level declarations of a certain variable are @code{extern}, the variable never gets memory space. If that variable is used anywhere in the program, the use will be reported as an error, saying that the variable is not defined. @cindex tentative definition A file-scope declaration without @code{extern} and without an initial value is called a @dfn{tentative definition}. This is a strange hybrid: it @emph{can} allocate space for the variable, but does not insist.
So it causes no conflict, no error, if the variable has another declaration that allocates space for it, perhaps in another compilation module. But if nothing else allocates space for the variable, the tentative definition will do it. Any number of compilation modules can declare the same variable in this way, and that is sufficient for all of them to use the variable. In programs that are very large or have many contributors, it may be wise to adopt the convention of never using tentative definitions. You can use the compilation option @option{-fno-common} to make them an error, or the linker option @option{--warn-common} to warn about them. (GCC 10 and later enable @option{-fno-common} by default.) If a file-scope variable gets its space through a tentative definition, it starts out containing all zeros. @node auto and register @section @code{auto} and @code{register} @cindex @code{auto} declarations @cindex @code{register} declarations @findex auto @findex register For historical reasons, you can write @code{auto} or @code{register} before a local variable declaration. @code{auto} merely emphasizes that the variable isn't static; it changes nothing. @code{register} suggests that the compiler store the variable in a register. However, GNU C ignores this suggestion, since it can choose the best variables to store in registers without any hints. It is an error to take the address of a variable declared @code{register}, so you cannot use the unary @samp{&} operator on it. If the variable is an array, you can't use it at all (other than as the operand of @code{sizeof}), which makes it rather useless. @node Omitting Types @section Omitting Types in Declarations @cindex omitting types in declarations The syntax of C traditionally allows omitting the data type in a declaration if it specifies a storage class, a type qualifier (see the next chapter), or @code{auto} or @code{register}. Then the type defaults to @code{int}.
For example: @example auto foo = 42; @end example This is bad practice; if you see it, fix it. @node Type Qualifiers @chapter Type Qualifiers A declaration can include type qualifiers to advise the compiler about how the variable will be used. There are three different qualifiers, @code{const}, @code{volatile} and @code{restrict}. They pertain to different issues, so you can use more than one together. For instance, @code{const volatile} describes a value that the program is not allowed to change, but might have a different value each time the program examines it. (This might perhaps be a special hardware register, or part of shared memory.) If you are just learning C, you can skip this chapter. @menu * const:: Variables whose values don't change. * volatile:: Variables whose values may be accessed or changed outside of the control of this program. * restrict Pointers:: Restricted pointers for code optimization. * restrict Pointer Example:: Example of how that works. @end menu @node const @section @code{const} Variables and Fields @cindex @code{const} variables and fields @cindex variables, @code{const} @findex const You can mark a variable as ``constant'' by writing @code{const} in front of the declaration. This says to treat any assignment to that variable as an error. It may also permit some compiler optimizations---for instance, to fetch the value only once to satisfy multiple references to it. The construct looks like this: @example const double pi = 3.14159; @end example After this definition, the code can use the variable @code{pi} but cannot assign a different value to it. @example pi = 3.0; /* @r{Error!} */ @end example Simple variables that are constant can be used for the same purposes as enumeration constants, and they are not limited to integers. The constantness of the variable propagates into pointers, too. A pointer type can specify that the @emph{target} is constant. 
For example, the pointer type @code{const double *} stands for a pointer to a constant @code{double}. That's the type that results from taking the address of @code{pi}. Such a pointer can't be dereferenced on the left side of an assignment. @example *(&pi) = 3.0; /* @r{Error!} */ @end example Nonconstant pointers can be converted automatically to constant pointers, but not vice versa. For instance, @example const double *cptr; double *ptr; cptr = &pi; /* @r{Valid.} */ cptr = ptr; /* @r{Valid.} */ ptr = cptr; /* @r{Error!} */ ptr = &pi; /* @r{Error!} */ @end example This is not an ironclad protection against modifying the value. You can always cast the constant pointer to a nonconstant pointer type: @example ptr = (double *)cptr; /* @r{Valid.} */ ptr = (double *)&pi; /* @r{Valid.} */ @end example However, @code{const} provides a way to show that a certain function won't modify the data structure whose address is passed to it. Here's an example: @example int string_length (const char *string) @{ int count = 0; while (*string++) count++; return count; @} @end example @noindent Using @code{const char *} for the parameter is a way of saying this function never modifies the memory of the string itself. In calling @code{string_length}, you can specify an ordinary @code{char *} since that can be converted automatically to @code{const char *}. @node volatile @section @code{volatile} Variables and Fields @cindex @code{volatile} variables and fields @cindex variables, @code{volatile} @findex volatile The GNU C compiler often performs optimizations that eliminate the need to write or read a variable. For instance, @example int foo; foo = 1; foo++; @end example @noindent might simply store the value 2 into @code{foo}, without ever storing 1. These optimizations can also apply to structure fields in some cases. If the memory containing @code{foo} is shared with another program, or if it is examined asynchronously by hardware, such optimizations could confuse the communication.
Using @code{volatile} is one way to prevent them. Writing @code{volatile} with the type in a variable or field declaration says that the value may be examined or changed for reasons outside the control of the program at any moment. Therefore, the program must execute in a careful way to assure correct interaction with those accesses, whenever they may occur. The simplest use looks like this: @example volatile int lock; @end example This directs the compiler not to do certain common optimizations on use of the variable @code{lock}. All the reads and writes for a volatile variable or field are really done, and done in the order specified by the source code. Thus, this code: @example lock = 1; list = list->next; if (lock) lock_broken (&lock); lock = 0; @end example @noindent really stores the value 1 in @code{lock}, even though there is no sign it is really used, and the @code{if} statement reads and checks the value of @code{lock}, rather than assuming it is still 1. A limited amount of optimization can be done, in principle, on @code{volatile} variables and fields: multiple references between two sequence points (@pxref{Sequence Points}) can be simplified together. Use of @code{volatile} does not eliminate the flexibility in ordering the computation of the operands of most operators. For instance, in @code{lock + foo ()}, the order of accessing @code{lock} and calling @code{foo} is not specified, so they may be done in either order; the fact that @code{lock} is @code{volatile} has no effect on that. @node restrict Pointers @section @code{restrict}-Qualified Pointers @cindex @code{restrict} pointers @cindex pointers, @code{restrict}-qualified @findex restrict You can declare a pointer as ``restricted'' using the @code{restrict} type qualifier, like this: @example int *restrict p = x; @end example @noindent This enables better optimization of code that uses the pointer. 
If @code{p} is declared with @code{restrict}, and then the code references the object that @code{p} points to (using @code{*p} or @code{p[@var{i}]}), the @code{restrict} declaration promises that the code will not access that object in any other way---only through @code{p}. For instance, it means the code must not use another pointer to access the same space, as shown here: @example int *restrict p = @var{whatever}; int *q = p; foo (*p, *q); @end example @noindent That contradicts the @code{restrict} promise by accessing the object that @code{p} points to using @code{q}, which bypasses @code{p}. Likewise, it must not do this: @example int *restrict p = @var{whatever}; struct @{ int *a, *b; @} s; s.a = p; foo (*p, *s.a); @end example @noindent This example uses a structure field instead of the variable @code{q} to hold the other pointer, and that contradicts the promise just the same. The keyword @code{restrict} also promises that @code{p} won't point to the allocated space of any automatic or static variable. So the code must not do this: @example int a; int *restrict p = &a; foo (*p, a); @end example @noindent because that does direct access to the object (@code{a}) that @code{p} points to, which bypasses @code{p}. If the code makes such promises with @code{restrict} and then breaks them, execution is unpredictable. @node restrict Pointer Example @section @code{restrict} Pointer Example Here are examples where @code{restrict} enables real optimization. In this example, @code{restrict} assures GCC that the array @code{out} points to does not overlap with the array @code{in} points to. @example void process_data (const char *in, char *restrict out, size_t size) @{ size_t i; for (i = 0; i < size; i++) out[i] = in[i] + in[i + 1]; @} @end example Here's a simple tree structure, where each tree node holds data of type @code{PAYLOAD} plus two subtrees.
@example struct foo @{ PAYLOAD payload; struct foo *left; struct foo *right; @}; @end example Now here's a function to null out both pointers in the @code{left} subtree. @example void null_left (struct foo *a) @{ a->left->left = NULL; a->left->right = NULL; @} @end example Since @code{*a} and @code{*a->left} have the same data type, they could legitimately alias (@pxref{Aliasing}). Therefore, the compiled code for @code{null_left} must read @code{a->left} again from memory when executing the second assignment statement. We can enable optimization, so that it does not need to read @code{a->left} again, by writing @code{null_left} in a less obvious way. @example void null_left (struct foo *a) @{ struct foo *b = a->left; b->left = NULL; b->right = NULL; @} @end example A more elegant way to fix this is with @code{restrict}. @example void null_left (struct foo *restrict a) @{ a->left->left = NULL; a->left->right = NULL; @} @end example Declaring @code{a} as @code{restrict} asserts that other pointers such as @code{a->left} will not point to the same memory space as @code{a}. Therefore, the memory location @code{a->left->left} cannot be the same memory as @code{a->left}. Knowing this, the compiled code may avoid reloading @code{a->left} for the second statement. @node Functions @chapter Functions @cindex functions We have already presented many examples of functions, so if you've read this far, you basically understand the concept of a function. It is vital, nonetheless, to have a chapter in the manual that collects all the information about functions. @menu * Function Definitions:: Writing the body of a function. * Function Declarations:: Declaring the interface of a function. * Function Calls:: Using functions. * Function Call Semantics:: Call-by-value argument passing. * Function Pointers:: Using references to functions. * The main Function:: Where execution of a GNU C program begins. * Advanced Definitions:: Advanced features of function definitions. 
* Obsolete Definitions:: Obsolete features still used in function definitions in old code. @end menu @node Function Definitions @section Function Definitions @cindex function definitions @cindex defining functions We have already presented many examples of function definitions. To summarize the rules, a function definition looks like this: @example @var{returntype} @var{functionname} (@var{parm_declarations}@r{@dots{}}) @{ @var{body} @} @end example The part before the open-brace is called the @dfn{function header}. Write @code{void} as the @var{returntype} if the function does not return a value. @menu * Function Parameter Variables:: Syntax and semantics of function parameters. * Forward Function Declarations:: Functions can only be called after they have been defined or declared. * Static Functions:: Limiting visibility of a function. * Arrays as Parameters:: Functions that accept array arguments. * Structs as Parameters:: Functions that accept structure arguments. @end menu @node Function Parameter Variables @subsection Function Parameter Variables @cindex function parameter variables @cindex parameter variables in functions @cindex parameter list A function parameter variable is a local variable (@pxref{Local Variables}) used within the function to store the value passed as an argument in a call to the function. Usually we say ``function parameter'' or ``parameter'' for short, not mentioning the fact that it's a variable. We declare these variables in the beginning of the function definition, in the @dfn{parameter list}. For example, @example fib (int n) @end example @noindent has a parameter list with one function parameter @code{n}, which has type @code{int}. Function parameter declarations differ from ordinary variable declarations in several ways: @itemize @bullet @item Inside the function definition header, commas separate parameter declarations, and each parameter needs a complete declaration including the type. 
For instance, if a function @code{foo} has two @code{int} parameters, write this: @example foo (int a, int b) @end example You can't share the common @code{int} between the two declarations: @example foo (int a, b) /* @r{Invalid!} */ @end example @item A function parameter variable is initialized to whatever value is passed in the function call, so its declaration cannot specify an initial value. @item Writing an array type in a function parameter declaration has the effect of declaring it as a pointer. The size specified for the array has no effect at all, and we normally omit the size. Thus, @example foo (int a[5]) foo (int a[]) foo (int *a) @end example @noindent are equivalent. @item The scope of the parameter variables is the entire function body, notwithstanding the fact that they are written in the function header, which is just outside the function body. @end itemize If a function has no parameters, it would be most natural for the list of parameters in its definition to be empty. But that, in C, has a special meaning for historical reasons: ``Do not check that calls to this function have the right number of arguments.'' Thus, @example int foo () @{ return 5; @} int bar (int x) @{ return foo (x); @} @end example @noindent would not report a compilation error in passing @code{x} as an argument to @code{foo}. By contrast, @example int foo (void) @{ return 5; @} int bar (int x) @{ return foo (x); @} @end example @noindent would report an error because @code{foo} is supposed to receive no arguments. @node Forward Function Declarations @subsection Forward Function Declarations @cindex forward function declarations @cindex function declarations, forward The order of the function definitions in the source code makes no difference, except that each function needs to be defined or declared before code uses it. The definition of a function also declares its name for the rest of the containing scope. But what if you want to call the function before its definition? 
To permit that, write a compatible declaration of the same function, before the first call. A declaration that prefigures a subsequent definition in this way is called a @dfn{forward declaration}. The function declaration can be at top level or within a block, and it applies until the end of the containing scope. @xref{Function Declarations}, for more information about these declarations. @node Static Functions @subsection Static Functions @cindex static functions @cindex functions, static @findex static The keyword @code{static} in a function definition limits the visibility of the name to the current compilation module. (That's the same thing @code{static} does in variable declarations; @pxref{File-Scope Variables}.) For instance, if one compilation module contains this code: @example static int foo (void) @{ @r{@dots{}} @} @end example @noindent then the code of that compilation module can call @code{foo} anywhere after the definition, but other compilation modules cannot refer to it at all. @cindex forward declaration @cindex static function, declaration To call @code{foo} before its definition, you need a forward declaration, which should use @code{static} since the function definition does. For this function, it looks like this: @example static int foo (void); @end example It is generally wise to use @code{static} on the definitions of functions that won't be called from outside the same compilation module. This makes sure that calls are not added in other modules. If programmers decide to change the function's calling convention, or want to understand all the consequences of its use, they will only have to check for calls in the same compilation module. @node Arrays as Parameters @subsection Arrays as Parameters @cindex array as parameters @cindex functions with array parameters Arrays in C are not first-class objects: it is impossible to copy them. So they cannot be passed as arguments like other values. @xref{Limitations of C Arrays}.
Rather, array parameters work in a special way. @menu * Array Parm Pointer:: * Passing Array Args:: * Array Parm Qualifiers:: @end menu @node Array Parm Pointer @subsubsection Array parameters are pointers Declaring a function parameter variable as an array really gives it a pointer type. C does this because an expression with array type, if used as an argument in a function call, is converted automatically to a pointer (to the zeroth element of the array). If you declare the corresponding parameter as an ``array'', it will work correctly with the pointer value that really gets passed. This relates to the fact that C does not check array bounds in access to elements of the array (@pxref{Accessing Array Elements}). For example, in this function, @example void clobber4 (int array[20]) @{ array[4] = 0; @} @end example @noindent the parameter @code{array}'s real type is @code{int *}; the specified length, 20, has no effect on the program. You can leave out the length and write this: @example void clobber4 (int array[]) @{ array[4] = 0; @} @end example @noindent or write the parameter declaration explicitly as a pointer: @example void clobber4 (int *array) @{ array[4] = 0; @} @end example They are all equivalent. @node Passing Array Args @subsubsection Passing array arguments The function call passes this pointer by value, like all argument values in C@. However, the result is paradoxical in that the array itself is passed by reference: its contents are treated as shared memory---shared between the caller and the called function, that is. When @code{clobber4} assigns to element 4 of @code{array}, the effect is to alter element 4 of the array specified in the call. 
@example
#include <stddef.h>  /* @r{Defines @code{NULL}.}  */
#include <stdio.h>   /* @r{Declares @code{printf}.}  */
#include <stdlib.h>  /* @r{Declares @code{malloc},}  */
                     /* @r{Defines @code{EXIT_SUCCESS}.}  */

int
main (void)
@{
  int data[] = @{1, 2, 3, 4, 5, 6@};
  int i;

  /* @r{Show the initial value of element 4.}  */
  for (i = 0; i < 6; i++)
    printf ("data[%d] = %d\n", i, data[i]);

  printf ("\n");

  clobber4 (data);

  /* @r{Show that element 4 has been changed.}  */
  for (i = 0; i < 6; i++)
    printf ("data[%d] = %d\n", i, data[i]);

  printf ("\n");

  return EXIT_SUCCESS;
@}
@end example

@noindent
Running this program shows that @code{data[4]} has become zero after
the call to @code{clobber4}.

The array @code{data} has 6 elements, but passing it to a function
whose argument type is written as @code{int [20]} is not an error,
because that really stands for @code{int *}.  The pointer that is the
real argument carries no indication of the length of the array it
points into.  It is not required to point to the beginning of the
array, either.  For instance,

@example
clobber4 (data+1);
@end example

@noindent
passes an ``array'' that starts at element 1 of @code{data}, and the
effect is to zero @code{data[5]} instead of @code{data[4]}.

If all calls to the function will provide an array of a particular
size, you can specify the size of the array to be @code{static}:

@example
void
clobber4 (int array[static 20])
@r{@dots{}}
@end example

@noindent
This is a promise to the compiler that the function will always be
called with an array of at least 20 elements, so that the compiler can
optimize code accordingly.  If the code breaks this promise and calls
the function with, for example, a shorter array, unpredictable things
may happen.

@node Array Parm Qualifiers
@subsubsection Type qualifiers on array parameters

You can use the type qualifiers @code{const}, @code{restrict}, and
@code{volatile} with array parameters; for example:

@example
void
clobber4 (volatile int array[20])
@r{@dots{}}
@end example

@noindent
denotes that @code{array} is equivalent to a pointer to a volatile
@code{int}.
Alternatively: @example void clobber4 (int array[const 20]) @r{@dots{}} @end example @noindent makes the array parameter equivalent to a constant pointer to an @code{int}. If we want the @code{clobber4} function to succeed, it would not make sense to write @example void clobber4 (const int array[20]) @r{@dots{}} @end example @noindent as this would tell the compiler that the parameter should point to an array of constant @code{int} values, and then we would not be able to store zeros in them. In a function with multiple array parameters, you can use @code{restrict} to tell the compiler that each array parameter passed in will be distinct: @example void foo (int array1[restrict 10], int array2[restrict 10]) @r{@dots{}} @end example @noindent Using @code{restrict} promises the compiler that callers will not pass in the same array for more than one @code{restrict} array parameter. Knowing this enables the compiler to perform better code optimization. This is the same effect as using @code{restrict} pointers (@pxref{restrict Pointers}), but makes it clear when reading the code that an array of a specific size is expected. @node Structs as Parameters @subsection Functions That Accept Structure Arguments Structures in GNU C are first-class objects, so using them as function parameters and arguments works in the natural way. This function @code{swapfoo} takes a @code{struct foo} with two fields as argument, and returns a structure of the same type but with the fields exchanged. 
@example
struct foo @{ int a, b; @};

struct foo x;

struct foo
swapfoo (struct foo inval)
@{
  struct foo outval;

  outval.a = inval.b;
  outval.b = inval.a;
  return outval;
@}
@end example

This simpler definition of @code{swapfoo} avoids using a local
variable to hold the result about to be returned, by using a structure
constructor (@pxref{Structure Constructors}), like this:

@example
struct foo
swapfoo (struct foo inval)
@{
  return (struct foo) @{ inval.b, inval.a @};
@}
@end example

It is valid to define a structure type in a function's parameter list,
as in

@example
int
frob_bar (struct bar @{ int a, b; @} inval)
@{
  @var{body}
@}
@end example

@noindent
and @var{body} can access the fields of @code{inval} since the
structure type @code{struct bar} is defined for the whole function
body.  However, there is no way to create a @code{struct bar} argument
to pass to @code{frob_bar}, except with kludges.  As a result,
defining a structure type in a parameter list is useless in practice.

@node Function Declarations
@section Function Declarations
@cindex function declarations
@cindex declaring functions

To call a function, or use its name as a pointer, a @dfn{function
declaration} for the function name must be in effect at that point in
the code.  The function's definition serves as a declaration of that
function for the rest of the containing scope, but to use the function
in code before the definition, or from another compilation module, a
separate function declaration must precede the use.

A function declaration looks like the start of a function definition.
It begins with the return value type (@code{void} if none) and the
function name, followed by argument declarations in parentheses
(though these can sometimes be omitted).  But that's as far as the
similarity goes: instead of the function body, the declaration uses a
semicolon.

@cindex function prototype
@cindex prototype of a function
A declaration that specifies argument types is called a @dfn{function
prototype}.
You can include the argument names or omit them.  The names, if
included in the declaration, have no effect, but they may serve as
documentation.

This form of prototype specifies fixed argument types:

@example
@var{rettype} @var{function} (@var{argtypes}@r{@dots{}});
@end example

@noindent
This form says the function takes no arguments:

@example
@var{rettype} @var{function} (void);
@end example

@noindent
This form declares types for some arguments, and allows additional
arguments whose types are not specified:

@example
@var{rettype} @var{function} (@var{argtypes}@r{@dots{}}, ...);
@end example

For a parameter that's an array of variable length, you can write its
declaration with @samp{*} where the ``length'' of the array would
normally go; for example, these are all equivalent:

@example
double maximum (int n, int m, double a[n][m]);
double maximum (int n, int m, double a[*][*]);
double maximum (int n, int m, double a[ ][*]);
double maximum (int n, int m, double a[ ][m]);
@end example

@noindent
The old-fashioned form of declaration, which is not a prototype, says
nothing about the types of arguments or how many they should be:

@example
@var{rettype} @var{function} ();
@end example

@strong{Warning:} Arguments passed to a function declared without a
prototype are converted with the default argument promotions
(@pxref{Argument Promotions}).  Likewise for additional arguments whose
types are unspecified.

Function declarations are usually written at the top level in a source
file, but you can also put them inside code blocks.  Then the function
name is visible for the rest of the containing scope.  For example:

@example
void
foo (char *file_name)
@{
  void save_file (char *);
  save_file (file_name);
@}
@end example

If another part of the code tries to call the function
@code{save_file}, this declaration won't be in effect there.  So the
function will get an implicit declaration of the form @code{extern int
save_file ();}.
That conflicts with the explicit declaration here, and the discrepancy
generates a warning.

The syntax of C traditionally allows omitting the data type in a
function declaration if it specifies a storage class or a qualifier.
Then the type defaults to @code{int}.  For example:

@example
static foo (double x);
@end example

@noindent
defaults the return type to @code{int}.  This is bad practice; if you
see it, fix it.

Calling a function that is undeclared has the effect of creating an
@dfn{implicit declaration} in the innermost containing scope,
equivalent to this:

@example
extern int @var{function} ();
@end example

@noindent
This declaration says that the function returns @code{int} but leaves
its argument types unspecified.  If that does not accurately fit the
function, then the program @strong{needs} an explicit declaration of
the function with argument types in order to call it correctly.

Implicit declarations are deprecated, and a function call that creates
one causes a warning.

@node Function Calls
@section Function Calls
@cindex function calls
@cindex calling functions

Starting a program automatically calls the function named @code{main}
(@pxref{The main Function}).  Aside from that, a function does nothing
except when it is @dfn{called}.  That occurs during the execution of a
function-call expression specifying that function.

A function-call expression looks like this:

@example
@var{function} (@var{arguments}@r{@dots{}})
@end example

Most of the time, @var{function} is a function name.  However, it can
also be an expression with a function pointer value; that way, the
program can determine at run time which function to call.

The @var{arguments} are a series of expressions separated by commas.
Each expression specifies one argument to pass to the function.

The list of arguments in a function call looks just like use of the
comma operator (@pxref{Comma Operator}), but the fact that it fills the
parentheses of a function call gives it a different meaning.
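
As a minimal sketch of that distinction, an extra pair of parentheses
around the argument list turns its comma back into the comma operator,
so the function receives a single argument.  The function names
@code{sum2} and @code{twice} below are hypothetical, chosen only for
this illustration.

@example
/* @r{Hypothetical helper: takes two arguments, returns their sum.}  */
int
sum2 (int a, int b)
@{
  return a + b;
@}

/* @r{Hypothetical helper: takes one argument, returns its double.}  */
int
twice (int a)
@{
  return 2 * a;
@}

int
call_with_two_arguments (void)
@{
  /* @r{The comma separates two arguments, 3 and 4.}  */
  return sum2 (3, 4);
@}

int
call_with_comma_operator (void)
@{
  int n = 3;

  /* @r{The inner parentheses make this one argument.  The comma}
     @r{operator increments @code{n} to 4, then yields @code{n + 1},}
     @r{so @code{twice} receives 5.}  */
  return twice ((n++, n + 1));
@}
@end example

@noindent
In this sketch, @code{call_with_two_arguments} returns 7 and
@code{call_with_comma_operator} returns 10.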
Here's an example of a function call, taken from an example near the
beginning (@pxref{Complete Program}).

@example
printf ("Fibonacci series item %d is %d\n",
        19, fib (19));
@end example

The three arguments given to @code{printf} are a constant string, the
integer 19, and the integer returned by @code{fib (19)}.

@node Function Call Semantics
@section Function Call Semantics
@cindex function call semantics
@cindex semantics of function calls
@cindex call-by-value

The meaning of a function call is to compute the specified argument
expressions, convert their values according to the function's
declaration, then run the function giving it copies of the converted
values.  (This method of argument passing is known as
@dfn{call-by-value}.)  When the function finishes, the value it
returns becomes the value of the function-call expression.

Call-by-value implies that an assignment to the function argument
variable has no direct effect on the caller.  For instance,

@example
#include <stdlib.h>  /* @r{Defines @code{EXIT_SUCCESS}.}  */
#include <stdio.h>   /* @r{Declares @code{printf}.}  */

void
subroutine (int x)
@{
  x = 5;
@}

int
main (void)
@{
  int y = 20;
  subroutine (y);
  printf ("y is %d\n", y);
  return EXIT_SUCCESS;
@}
@end example

@noindent
prints @samp{y is 20}.  Calling @code{subroutine} initializes @code{x}
from the value of @code{y}, but this does not establish any other
relationship between the two variables.  Thus, the assignment to
@code{x}, inside @code{subroutine}, changes only @emph{that} @code{x}.

If an argument's type is specified by the function's declaration, the
function call converts the argument expression to that type if
possible.  If the conversion is impossible, that is an error.

If the function's declaration doesn't specify the type of that
argument, then the @emph{default argument promotions} apply.
@xref{Argument Promotions}.

@node Function Pointers
@section Function Pointers
@cindex function pointers
@cindex pointers to functions

A function name refers to a fixed function.
Sometimes it is useful to call a function to be determined at run time; to do this, you can use a @dfn{function pointer value} that points to the chosen function (@pxref{Pointers}). Pointer-to-function types can be used to declare variables and other data, including array elements, structure fields, and union alternatives. They can also be used for function arguments and return values. These types have the peculiarity that they are never converted automatically to @code{void *} or vice versa. However, you can do that conversion with a cast. @menu * Declaring Function Pointers:: How to declare a pointer to a function. * Assigning Function Pointers:: How to assign values to function pointers. * Calling Function Pointers:: How to call functions through pointers. @end menu @node Declaring Function Pointers @subsection Declaring Function Pointers @cindex declaring function pointers @cindex function pointers, declaring The declaration of a function pointer variable (or structure field) looks almost like a function declaration, except it has an additional @samp{*} just before the variable name. Proper nesting requires a pair of parentheses around the two of them. For instance, @code{int (*a) ();} says, ``Declare @code{a} as a pointer such that @code{*a} is an @code{int}-returning function.'' Contrast these three declarations: @example /* @r{Declare a function returning @code{char *}.} */ char *a (char *); /* @r{Declare a pointer to a function returning @code{char}.} */ char (*a) (char *); /* @r{Declare a pointer to a function returning @code{char *}.} */ char *(*a) (char *); @end example The possible argument types of the function pointed to are the same as in a function declaration. 
You can write a prototype that specifies all the argument types: @example @var{rettype} (*@var{function}) (@var{arguments}@r{@dots{}}); @end example @noindent or one that specifies some and leaves the rest unspecified: @example @var{rettype} (*@var{function}) (@var{arguments}@r{@dots{}}, ...); @end example @noindent or one that says there are no arguments: @example @var{rettype} (*@var{function}) (void); @end example You can also write a non-prototype declaration that says nothing about the argument types: @example @var{rettype} (*@var{function}) (); @end example For example, here's a declaration for a variable that should point to some arithmetic function that operates on two @code{double}s: @example double (*binary_op) (double, double); @end example Structure fields, union alternatives, and array elements can be function pointers; so can parameter variables. The function pointer declaration construct can also be combined with other operators allowed in declarations. For instance, @example int **(*foo)(); @end example @noindent declares @code{foo} as a pointer to a function that returns type @code{int **}, and @example int **(*foo[30])(); @end example @noindent declares @code{foo} as an array of 30 pointers to functions that return type @code{int **}. @example int **(**foo)(); @end example @noindent declares @code{foo} as a pointer to a pointer to a function that returns type @code{int **}. @node Assigning Function Pointers @subsection Assigning Function Pointers @cindex assigning function pointers @cindex function pointers, assigning Assuming we have declared the variable @code{binary_op} as in the previous section, giving it a value requires a suitable function to use. So let's define a function suitable for the variable to point to. 
Here's one: @example double double_add (double a, double b) @{ return a+b; @} @end example Now we can give it a value: @example binary_op = double_add; @end example The target type of the function pointer must be upward compatible with the type of the function (@pxref{Compatible Types}). There is no need for @samp{&} in front of @code{double_add}. Using a function name such as @code{double_add} as an expression automatically converts it to the function's address, with the appropriate function pointer type. However, it is ok to use @samp{&} if you feel that is clearer: @example binary_op = &double_add; @end example @node Calling Function Pointers @subsection Calling Function Pointers @cindex calling function pointers @cindex function pointers, calling To call the function specified by a function pointer, just write the function pointer value in a function call. For instance, here's a call to the function @code{binary_op} points to: @example binary_op (x, 5) @end example Since the data type of @code{binary_op} explicitly specifies type @code{double} for the arguments, the call converts @code{x} and 5 to @code{double}. The call conceptually dereferences the pointer @code{binary_op} to ``get'' the function it points to, and calls that function. If you wish, you can explicitly represent the dereference by writing the @code{*} operator: @example (*binary_op) (x, 5) @end example The @samp{*} reminds people reading the code that @code{binary_op} is a function pointer rather than the name of a specific function. @node The main Function @section The @code{main} Function @cindex @code{main} function @findex main Every complete executable program requires at least one function, called @code{main}, which is where execution begins. You do not have to explicitly declare @code{main}, though GNU C permits you to do so. 
Conventionally, @code{main} should be defined to follow one of these calling conventions: @example int main (void) @{@r{@dots{}}@} int main (int argc, char *argv[]) @{@r{@dots{}}@} int main (int argc, char *argv[], char *envp[]) @{@r{@dots{}}@} @end example @noindent Using @code{void} as the parameter list means that @code{main} does not use the arguments. You can write @code{char **argv} instead of @code{char *argv[]}, and likewise for @code{envp}, as the two constructs are equivalent. @ignore @c Not so at present Defining @code{main} in any other way generates a warning. Your program will still compile, but you may get unexpected results when executing it. @end ignore You can call @code{main} from C code, as you can call any other function, though that is an unusual thing to do. When you do that, you must write the call to pass arguments that match the parameters in the definition of @code{main}. The @code{main} function is not actually the first code that runs when a program starts. In fact, the first code that runs is system code from the file @file{crt0.o}. In Unix, this was hand-written assembler code, but in GNU we replaced it with C code. Its job is to find the arguments for @code{main} and call that. @menu * Values from main:: Returning values from the main function. * Command-line Parameters:: Accessing command-line parameters provided to the program. * Environment Variables:: Accessing system environment variables. @end menu @node Values from main @subsection Returning Values from @code{main} @cindex returning values from @code{main} @cindex success @cindex failure @cindex exit status When @code{main} returns, the process terminates. Whatever value @code{main} returns becomes the exit status which is reported to the parent process. While nominally the return value is of type @code{int}, in fact the exit status gets truncated to eight bits; if @code{main} returns the value 256, the exit status is 0. 
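
The truncation can be modeled with a short sketch.  The helper name
@code{reported_status} is hypothetical, and the code assumes a
POSIX-like system on which only the low eight bits of the value
returned by @code{main} reach the parent process.

@example
/* @r{Hypothetical model of the exit status the parent process sees:}
   @r{keep only the low eight bits of the returned value.}  */
int
reported_status (int value_returned_by_main)
@{
  return value_returned_by_main & 0xFF;
@}
@end example

@noindent
With this model, @code{reported_status (256)} is 0 and
@code{reported_status (257)} is 1, so a parent process cannot tell
those return values apart from 0 and 1.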
Normally, programs return only one of two values: 0 for success, and 1
for failure.  For maximum portability, use the macro values
@code{EXIT_SUCCESS} and @code{EXIT_FAILURE} defined in
@file{stdlib.h}.  Here's an example:

@cindex @code{EXIT_FAILURE}
@cindex @code{EXIT_SUCCESS}
@example
#include <stdlib.h>  /* @r{Defines @code{EXIT_SUCCESS}}  */
                     /*   @r{and @code{EXIT_FAILURE}.}  */

int
main (void)
@{
  @r{@dots{}}
  if (foo)
    return EXIT_SUCCESS;
  else
    return EXIT_FAILURE;
@}
@end example

Some types of programs maintain special conventions for various return
values; for example, comparison programs including @code{cmp} and
@code{diff} return 1 to indicate a mismatch, and 2 to indicate that
the comparison couldn't be performed.

@node Command-line Parameters
@subsection Accessing Command-line Parameters
@cindex command-line parameters
@cindex parameters, command-line

If the program was invoked with any command-line arguments, it can
access them through the arguments of @code{main}, @code{argc} and
@code{argv}.  (You can give these arguments any names, but the names
@code{argc} and @code{argv} are customary.)

The value of @code{argv} is an array containing all of the
command-line arguments as strings, with the name of the command
invoked as the first string.  @code{argc} is an integer that says how
many strings @code{argv} contains.
Here is an example of accessing the command-line parameters,
retrieving the program's name and checking for the standard
@option{--version} and @option{--help} options:

@example
#include <string.h>  /* @r{Declares @code{strcmp}.}  */

int
main (int argc, char *argv[])
@{
  char *program_name = argv[0];

  for (int i = 1; i < argc; i++)
    @{
      if (!strcmp (argv[i], "--version"))
        @{
          /* @r{Print version information and exit.}  */
          @r{@dots{}}
        @}
      else if (!strcmp (argv[i], "--help"))
        @{
          /* @r{Print help information and exit.}  */
          @r{@dots{}}
        @}
    @}
  @r{@dots{}}
@}
@end example

@node Environment Variables
@subsection Accessing Environment Variables
@cindex environment variables

You can optionally include a third parameter to @code{main}, another
array of strings, to capture the environment variables available to
the program.  Unlike what happens with @code{argv}, there is no
additional parameter for the count of environment variables; rather,
the array of environment variables concludes with a null pointer.

@example
#include <stdio.h>   /* @r{Declares @code{printf}.}  */

int
main (int argc, char *argv[], char *envp[])
@{
  /* @r{Print out all environment variables.}  */
  int i = 0;
  while (envp[i])
    @{
      printf ("%s\n", envp[i]);
      i++;
    @}
@}
@end example

Another method of retrieving environment variables is to use the
library function @code{getenv}, which is declared in
@file{stdlib.h}.  Using @code{getenv} does not require defining
@code{main} to accept the @code{envp} pointer.  For example, here is a
program that fetches and prints the user's home directory (if
defined):

@example
#include <stdlib.h>  /* @r{Declares @code{getenv}.}  */
#include <stdio.h>   /* @r{Declares @code{printf}.}  */

int
main (void)
@{
  char *home_directory = getenv ("HOME");
  if (home_directory)
    printf ("My home directory is: %s\n", home_directory);
  else
    printf ("My home directory is not defined!\n");
@}
@end example

@node Advanced Definitions
@section Advanced Function Features

This section describes some advanced or obscure features for GNU C
function definitions.
If you are just learning C, you can skip the rest of this chapter. @menu * Variable-Length Array Parameters:: Functions that accept arrays of variable length. * Variable Number of Arguments:: Variadic functions. * Nested Functions:: Defining functions within functions. * Inline Function Definitions:: A function call optimization technique. @end menu @node Variable-Length Array Parameters @subsection Variable-Length Array Parameters @cindex variable-length array parameters @cindex array parameters, variable-length @cindex functions that accept variable-length arrays An array parameter can have variable length: simply declare the array type with a size that isn't constant. In a nested function, the length can refer to a variable defined in a containing scope. In any function, it can refer to a previous parameter, like this: @example struct entry tester (int len, char data[len][len]) @{ @r{@dots{}} @} @end example Alternatively, in function declarations (but not in function definitions), you can use @code{[*]} to denote that the array parameter is of a variable length, such that these two declarations mean the same thing: @example struct entry tester (int len, char data[len][len]); @end example @example struct entry tester (int len, char data[*][*]); @end example @noindent The two forms of input are equivalent in GNU C, but emphasizing that the array parameter is variable-length may be helpful to those studying the code. You can also omit the length parameter, and instead use some other in-scope variable for the length in the function definition: @example struct entry tester (char data[*][*]); @r{@dots{}} int dataLength = 20; @r{@dots{}} struct entry tester (char data[dataLength][dataLength]) @{ @r{@dots{}} @} @end example @c ??? 
check text above @cindex parameter forward declaration In GNU C, to pass the array first and the length afterward, you can use a @dfn{parameter forward declaration}, like this: @example struct entry tester (int len; char data[len][len], int len) @{ @r{@dots{}} @} @end example The @samp{int len} before the semicolon is the parameter forward declaration; it serves the purpose of making the name @code{len} known when the declaration of @code{data} is parsed. You can write any number of such parameter forward declarations in the parameter list. They can be separated by commas or semicolons, but the last one must end with a semicolon, which is followed by the ``real'' parameter declarations. Each forward declaration must match a subsequent ``real'' declaration in parameter name and data type. Standard C does not support parameter forward declarations. @node Variable Number of Arguments @subsection Variable-Length Parameter Lists @cindex variable-length parameter lists @cindex parameters lists, variable length @cindex function parameter lists, variable length @cindex variadic function A function that takes a variable number of arguments is called a @dfn{variadic function}. In C, a variadic function must specify at least one fixed argument with an explicitly declared data type. Additional arguments can follow, and can vary in both quantity and data type. In the function header, declare the fixed parameters in the normal way, then write a comma and an ellipsis: @samp{, ...}. Here is an example of a variadic function header: @example int add_multiple_values (int number, ...) @end example @cindex @code{va_list} @cindex @code{va_start} @cindex @code{va_end} The function body can refer to fixed arguments by their parameter names, but the additional arguments have no names. Accessing them in the function body uses certain standard macros. They are defined in the library header file @file{stdarg.h}, so the code must @code{#include} that file. 
In the body, write

@example
va_list ap;
va_start (ap, @var{last_fixed_parameter});
@end example

@noindent
This declares the variable @code{ap} (you can use any name for it)
and then sets it up to point before the first additional argument.

Then, to fetch the next consecutive additional argument, write this:

@example
va_arg (ap, @var{type})
@end example

After fetching all the additional arguments (or as many as need to be
used), write this:

@example
va_end (ap);
@end example

Here's an example of a variadic function definition that adds any
number of @code{int} arguments.  The first (fixed) argument says how
many more arguments follow.

@example
#include <stdarg.h>  /* @r{Defines @code{va}@r{@dots{}} macros.}  */
@r{@dots{}}

int
add_multiple_values (int argcount, ...)
@{
  int counter, total = 0;

  /* @r{Declare a variable of type @code{va_list}.}  */
  va_list argptr;

  /* @r{Initialize that variable.}  */
  va_start (argptr, argcount);

  for (counter = 0; counter < argcount; counter++)
    @{
      /* @r{Get the next additional argument.}  */
      total += va_arg (argptr, int);
    @}

  /* @r{End use of the @code{argptr} variable.}  */
  va_end (argptr);

  return total;
@}
@end example

With GNU C, @code{va_end} is superfluous, but some other compilers
might make @code{va_start} allocate memory so that calling
@code{va_end} is necessary to avoid a memory leak.  Before doing
@code{va_start} again with the same variable, do @code{va_end}
first.

@cindex @code{va_copy}
Because of this possible memory allocation, it is risky (in principle)
to copy one @code{va_list} variable to another with assignment.
Instead, use @code{va_copy}, which copies the substance but allocates
separate memory in the variable you copy to.  The call looks like
@code{va_copy (@var{to}, @var{from})}, where both @var{to} and
@var{from} should be variables of type @code{va_list}.  In principle,
do @code{va_end} on each of these variables before its scope ends.
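
To illustrate @code{va_copy}, here is a sketch of a function that
traverses its additional arguments twice.  The function name
@code{count_above_average} is hypothetical, and the code assumes that
all the additional arguments have type @code{int}.

@example
#include <stdarg.h>  /* @r{Defines @code{va}@r{@dots{}} macros.}  */

/* @r{Return how many of the @code{count} additional @code{int}}
   @r{arguments are greater than their average.}  */
int
count_above_average (int count, ...)
@{
  va_list first_pass, second_pass;
  int i, total = 0, above = 0;

  if (count <= 0)
    return 0;

  va_start (first_pass, count);

  /* @r{Save an independent copy before fetching any arguments.}  */
  va_copy (second_pass, first_pass);

  /* @r{First pass: add up all the additional arguments.}  */
  for (i = 0; i < count; i++)
    total += va_arg (first_pass, int);
  va_end (first_pass);

  /* @r{Second pass: compare each value with the average,}
     @r{multiplying by @code{count} instead of dividing.}  */
  for (i = 0; i < count; i++)
    if (va_arg (second_pass, int) * count > total)
      above++;
  va_end (second_pass);

  return above;
@}
@end example

@noindent
For instance, @code{count_above_average (4, 1, 2, 3, 10)} returns 1,
since the average is 4 and only 10 exceeds it.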
Since the additional arguments' types are not specified in the
function's definition, the default argument promotions
(@pxref{Argument Promotions}) apply to them in function calls.  The
function definition must take account of this; thus, if an argument
was passed as @code{short}, the function should get it as @code{int}.
If an argument was passed as @code{float}, the function should get it
as @code{double}.

C has no mechanism to tell the variadic function how many arguments
were passed to it, so its calling convention must give it a way to
determine this.  That's why @code{add_multiple_values} takes a fixed
argument that says how many more arguments follow.  Thus, you can
call the function like this:

@example
sum = add_multiple_values (3, 12, 34, 190);
/* @r{Value is 12+34+190.}  */
@end example

In GNU C, there is no actual need to use the @code{va_end} macro.
In fact, it does nothing.  It's used for compatibility with other
compilers, when that matters.

It is a mistake to access variables declared as @code{va_list} except
in the specific ways described here.  Just what that type consists of
is an implementation detail, which could vary from one platform to
another.

@node Nested Functions
@subsection Nested Functions
@cindex nested functions
@cindex functions, nested
@cindex downward funargs
@cindex thunks

A @dfn{nested function} is a function defined inside another function.
(The ability to do this is indispensable for automatic translation of
certain programming languages into C.)  The nested function's name is
local to the block where it is defined.  For example, here we define a
nested function named @code{square}, then call it twice:

@example
@group
double
foo (double a, double b)
@{
  double square (double z) @{ return z * z; @}

  return square (a) + square (b);
@}
@end group
@end example

The nested function definition can access all the variables of the
containing function that are visible at the point of its definition.
This is called @dfn{lexical scoping}.
For example, here we show a nested function that uses an inherited variable named @code{offset}: @example @group bar (int *array, int offset, int size) @{ int access (int *array, int index) @{ return array[index + offset]; @} int i; @r{@dots{}} for (i = 0; i < size; i++) @r{@dots{}} access (array, i) @r{@dots{}} @} @end group @end example Nested function definitions can appear wherever automatic variable declarations are allowed; that is, in any block, interspersed with the other declarations and statements in the block. The nested function's name is visible only within the parent block; the name's scope starts from its definition and continues to the end of the containing block. If the nested function's name is the same as the parent function's name, there will be no way to refer to the parent function inside the scope of the name of the nested function. Using @code{extern} or @code{static} on a nested function definition is an error. It is possible to call the nested function from outside the scope of its name by storing its address or passing the address to another function. You can do this safely, but you must be careful: @example @group hack (int *array, int size, int addition) @{ void store (int index, int value) @{ array[index] = value + addition; @} intermediate (store, size); @} @end group @end example Here, the function @code{intermediate} receives the address of @code{store} as an argument. If @code{intermediate} calls @code{store}, the arguments given to @code{store} are used to store into @code{array}. @code{store} also accesses @code{hack}'s local variable @code{addition}. It is safe for @code{intermediate} to call @code{store} because @code{hack}'s stack frame, with its arguments and local variables, continues to exist during the call to @code{intermediate}. Calling the nested function through its address after the containing function has exited is asking for trouble. 
If it is called after a containing scope level has exited, and if it
refers to some of the variables that are no longer in scope, it will
refer to memory containing junk or other data. It's not wise to take
the risk.

The GNU C Compiler implements taking the address of a nested function
using a technique called @dfn{trampolines}. This technique was
described in @cite{Lexical Closures for C@t{++}} (Thomas M. Breuel,
USENIX C@t{++} Conference Proceedings, October 17--21, 1988).

A nested function can jump to a label inherited from a containing
function, provided the label was explicitly declared in the containing
function (@pxref{Local Labels}). Such a jump returns instantly to the
containing function, exiting the nested function that did the
@code{goto} and any intermediate function invocations as well. Here
is an example:

@example
@group
int
bar (int *array, int offset, int size)
@{
  /* @r{Explicitly declare the label @code{failure}.} */
  __label__ failure;
  int access (int *array, int index)
    @{
      if (index > size)
        /* @r{Exit this function,}
           @r{and return to @code{bar}.} */
        goto failure;
      return array[index + offset];
    @}
@end group

@group
  int i;
  @r{@dots{}}
  for (i = 0; i < size; i++)
    @r{@dots{}} access (array, i) @r{@dots{}}
  @r{@dots{}}
  return 0;

  /* @r{Control comes here from @code{access} if it does the @code{goto}.} */
 failure:
  return -1;
@}
@end group
@end example

To declare the nested function before its definition, use
@code{auto} (which is otherwise meaningless for function declarations;
@pxref{auto and register}).
For example,

@example
int
bar (int *array, int offset, int size)
@{
  auto int access (int *, int);
  @r{@dots{}}
  @r{@dots{}} access (array, i) @r{@dots{}}
  @r{@dots{}}
  int access (int *array, int index)
    @{
      @r{@dots{}}
    @}
  @r{@dots{}}
@}
@end example

@node Inline Function Definitions
@subsection Inline Function Definitions
@cindex inline function definitions
@cindex function definitions, inline
@findex inline

To declare a function inline, use the @code{inline} keyword in its
definition. Here are two simple functions, both declared inline, that
each take a pointer to a @code{struct list} and return one of its
fields.

@example
struct list
@{
  struct list *first, *second;
@};

inline struct list *
list_first (struct list *p)
@{
  return p->first;
@}

inline struct list *
list_second (struct list *p)
@{
  return p->second;
@}
@end example

Optimized compilation can substitute the inline function's body for
any call to it. This is called @emph{inlining} the function. It
makes the code that contains the call run faster, significantly so if
the inline function is small.

Here's a function that uses @code{list_second}:

@example
int
pairlist_length (struct list *l)
@{
  int length = 0;
  while (l)
    @{
      length++;
      l = list_second (l);
    @}
  return length;
@}
@end example

Substituting the code of @code{list_second} into the definition of
@code{pairlist_length} results in this code, in effect:

@example
int
pairlist_length (struct list *l)
@{
  int length = 0;
  while (l)
    @{
      length++;
      l = l->second;
    @}
  return length;
@}
@end example

Since the definition of @code{list_second} does not say @code{extern}
or @code{static}, that definition is used only for inlining. It
doesn't generate code that can be called at run time. If not all the
calls to the function are inlined, there must be a definition of the
same function name in another module for them to call.
@cindex inline functions, omission of
@c @opindex fkeep-inline-functions

Adding @code{static} to an inline function definition means the
function definition is limited to this compilation module. Also, it
generates run-time code if necessary for the sake of any calls that
were not inlined. If all calls are inlined then the function
definition does not generate run-time code, but you can force
generation of run-time code with the option
@option{-fkeep-inline-functions}.

@cindex extern inline function
Specifying @code{extern} along with @code{inline} means the function is
external and generates run-time code to be called from other
separately compiled modules, as well as inlined. You can define the
function as @code{inline} without @code{extern} in other modules so as
to inline calls to the same function in those modules.

Why are some calls not inlined? First of all, inlining is an
optimization, so non-optimized compilation does not inline.

Some calls cannot be inlined for technical reasons. Also, certain
usages in a function definition can make it unsuitable for inline
substitution. Among these usages are: variadic functions, use of
@code{alloca}, use of computed goto (@pxref{Labels as Values}), and
use of nonlocal goto. The option @option{-Winline} requests a warning
when a function marked @code{inline} is unsuitable to be inlined. The
warning explains what obstacle makes it unsuitable.

Just because a call @emph{can} be inlined does not mean it
@emph{should} be inlined. The GNU C compiler weighs costs and
benefits to decide whether inlining a particular call is
advantageous.

You can force inlining of all calls to a given function that can be
inlined, even in a non-optimized compilation, by specifying the
@samp{always_inline} attribute for the function, like this:

@example
/* @r{Prototype.} */
inline void foo (const char) __attribute__((always_inline));
@end example

@noindent
This is a GNU C extension. @xref{Attributes}.
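
As a minimal sketch of the cases above, the following module defines
one @code{static inline} function and one function that carries the
@samp{always_inline} attribute. The names @code{clamp_to_byte},
@code{twice} and @code{brighten} are invented for this illustration
and do not come from any library.

@example
/* @r{Private to this module; run-time code is generated only}
   @r{if some call to it is not inlined.} */
static inline int
clamp_to_byte (int v)
@{
  if (v < 0)
    return 0;
  return v > 255 ? 255 : v;
@}

/* @r{GNU C extension: inline calls even without optimization.} */
static inline int twice (int) __attribute__((always_inline));

static inline int
twice (int v)
@{
  return 2 * v;
@}

int
brighten (int pixel)
@{
  return clamp_to_byte (twice (pixel));
@}
@end example

@noindent
Because both helpers are @code{static}, no other module needs to
supply a definition for calls that are not inlined.
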
A function call may be inlined even if not declared @code{inline} in
special cases where the compiler can determine this is correct and
desirable. For instance, when a static function is called only once,
it will very likely be inlined. With @option{-flto}, link-time
optimization, any function might be inlined. To absolutely prevent
inlining of a specific function, specify
@code{__attribute__((__noinline__))} in the function's definition.

@node Obsolete Definitions
@section Obsolete Function Features

These features of function definitions are still used in old
programs, but you shouldn't write code this way today.
If you are just learning C, you can skip this section.

@menu
* Old GNU Inlining::               An older inlining technique.
* Old-Style Function Definitions:: Original K&R style functions.
@end menu

@node Old GNU Inlining
@subsection Older GNU C Inlining

The GNU C spec for inline functions, before GCC version 5, defined
@code{extern inline} on a function definition to mean to inline calls
to it but @emph{not} generate code for the function that could be
called at run time. By contrast, @code{inline} without @code{extern}
meant to generate run-time code for the function. In effect, ISO
incompatibly flipped the meanings of these two cases. We changed GCC
in version 5 to adopt the ISO specification.

Many programs still use these cases with the previous GNU C meanings.
You can specify use of those meanings with the option
@option{-fgnu89-inline}. You can also specify this for a single
function with @code{__attribute__ ((gnu_inline))}. Here's an example:

@example
inline __attribute__ ((gnu_inline))
void
inc (int *a)
@{
  (*a)++;
@}
@end example

@node Old-Style Function Definitions
@subsection Old-Style Function Definitions
@cindex old-style function definitions
@cindex function definitions, old-style
@cindex K&R-style function definitions

The syntax of C traditionally allows omitting the data type in a
function declaration if it specifies a storage class or a qualifier.
Then the type defaults to @code{int}. For example:

@example
static foo (double x);
@end example

@noindent
defaults the return type to @code{int}. This is bad practice; if you
see it, fix it.

An @dfn{old-style} (or ``K&R'') function definition is the way
function definitions were written in the 1980s. It looks like this:

@example
@var{rettype}
@var{function} (@var{parmnames})
  @var{parm_declarations}
@{
  @var{body}
@}
@end example

In @var{parmnames}, only the parameter names are listed, separated by
commas. Then @var{parm_declarations} declares their data types; these
declarations look just like variable declarations. If a parameter is
listed in @var{parmnames} but has no declaration, it is implicitly
declared @code{int}.

There is no reason to write a definition this way nowadays, but such
definitions can still be seen in older GNU programs.

An old-style variadic function definition looks like this:

@example
#include <varargs.h>

int
add_multiple_values (va_alist)
    va_dcl
@{
  int argcount;
  int counter, total = 0;

  /* @r{Declare a variable of type @code{va_list}.} */
  va_list argptr;

  /* @r{Initialize that variable.} */
  va_start (argptr);

  /* @r{Get the first argument (fixed).} */
  argcount = va_arg (argptr, int);

  for (counter = 0; counter < argcount; counter++)
    @{
      /* @r{Get the next additional argument.} */
      total += va_arg (argptr, int);
    @}

  /* @r{End use of the @code{argptr} variable.} */
  va_end (argptr);

  return total;
@}
@end example

Note that the old-style variadic function definition has no fixed
parameter variables; all arguments must be obtained with
@code{va_arg}.

@node Compatible Types
@chapter Compatible Types
@cindex compatible types
@cindex types, compatible

Declaring a function or variable twice is valid in C only if the two
declarations specify @dfn{compatible} types. In addition, some
operations on pointers require operands to have compatible target
types.

In C, two different primitive types are never compatible.
Likewise for the defined types @code{struct}, @code{union} and
@code{enum}: two separately defined types are incompatible unless they
are defined exactly the same way.

However, there are a few cases where different types can be
compatible:

@itemize @bullet
@item
Every enumeration type is compatible with some integer type. In GNU
C, the choice of integer type depends on the largest enumeration
value.

@c ??? Which one, in GCC?
@c ??? ... it varies, depending on the enum values. Testing on
@c ??? fencepost, it appears to use a 4-byte signed integer first,
@c ??? then moves on to an 8-byte signed integer. These details
@c ??? might be platform-dependent, as the C standard says that even
@c ??? char could be used as an enum type, but it's at least true
@c ??? that GCC chooses a type that is at least large enough to
@c ??? hold the largest enum value.

@item
Array types are compatible if the element types are compatible
and the sizes (when specified) match.

@item
Pointer types are compatible if the pointer target types are
compatible.

@item
Function types that specify argument types are compatible if the
return types are compatible and the argument types are compatible,
argument by argument. In addition, they must all agree in whether
they use @code{...} to allow additional arguments.

@item
Function types that don't specify argument types are compatible if the
return types are.

@item
Function types that specify the argument types are compatible with
function types that omit them, if the return types are compatible and
the specified argument types are unaltered by the argument promotions
(@pxref{Argument Promotions}).
@end itemize

In order for types to be compatible, they must agree in their type
qualifiers. Thus, @code{const int} and @code{int} are incompatible.
It follows that @code{const int *} and @code{int *} are incompatible
too (they are pointers to types that are not compatible).
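
Some of these rules can be sketched with a few repeated declarations.
This is only an illustration; the names @code{table}, @code{lookup}
and @code{cursor} are hypothetical.

@example
/* @r{Same element type, and only one declaration gives a size,}
   @r{so the two array types are compatible.} */
extern int table[];
int table[3] = @{ 10, 20, 30 @};

/* @r{Return types and argument types match, argument by argument.} */
int lookup (int index);

int
lookup (int index)
@{
  return table[index];
@}

/* @r{Redeclaring @code{cursor} as @code{const int *cursor;} would be}
   @r{rejected, because the target types differ in qualifiers.} */
extern int *cursor;
int *cursor = table;
@end example
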
If two types are compatible ignoring the qualifiers, we call them
@dfn{nearly compatible}. (If they are array types, we ignore
qualifiers on the element types.@footnote{This is a GNU C extension.})
Comparison of pointers is valid if the pointers' target types are
nearly compatible. Likewise, the two branches of a conditional
expression may be pointers to nearly compatible target types.

If two types are compatible ignoring the qualifiers, and the first
type has all the qualifiers of the second type, we say the first is
@dfn{upward compatible} with the second. Assignment of pointers
requires the assigned pointer's target type to be upward compatible
with the target type of the right operand (the new value).

@node Type Conversions
@chapter Type Conversions
@cindex type conversions
@cindex conversions, type

C converts between data types automatically when that seems clearly
necessary. In addition, you can convert explicitly with a @dfn{cast}.

@menu
* Explicit Type Conversion::      Casting a value from one type to another.
* Assignment Type Conversions::   Automatic conversion by assignment operation.
* Argument Promotions::           Automatic conversion of function parameters.
* Operand Promotions::            Automatic conversion of arithmetic operands.
* Common Type::                   When operand types differ, which one is used?
@end menu

@node Explicit Type Conversion
@section Explicit Type Conversion
@cindex cast
@cindex explicit type conversion

You can do explicit conversions using the unary @dfn{cast} operator,
which is written as a type designator (@pxref{Type Designators}) in
parentheses. For example, @code{(int)} is the operator to cast to
type @code{int}. Here's an example of using it:

@example
@{
  double d = 5.5;

  printf ("Floating point value: %f\n", d);
  printf ("Rounded to integer: %d\n", (int) d);
@}
@end example

Using @code{(int) d} passes an @code{int} value as argument to
@code{printf}, so you can print it with @samp{%d}. Using just
@code{d} without the cast would pass the value as @code{double}.
That won't work at all with @samp{%d}; the results would be gibberish.

To divide one integer by another without rounding,
cast either of the integers to @code{double} first:

@example
(double) @var{dividend} / @var{divisor}
@var{dividend} / (double) @var{divisor}
@end example

It is enough to cast one of them, because that forces the common type
to @code{double} so the other will be converted automatically.

The valid cast conversions are:

@itemize @bullet
@item
One numerical type to another.

@item
One pointer type to another.
(Converting between pointers that point to functions
and pointers that point to data is not standard C.)

@item
A pointer type to an integer type.

@item
An integer type to a pointer type.

@item
To a union type, from the type of any alternative in the union
(@pxref{Unions}). (This is a GNU extension.)

@item
Anything, to @code{void}.
@end itemize

@node Assignment Type Conversions
@section Assignment Type Conversions
@cindex assignment type conversions

Certain type conversions occur automatically in assignments
and certain other contexts. These are the conversions
assignments can do:

@itemize @bullet
@item
Converting any numeric type to any other numeric type.

@item
Converting @code{void *} to any other pointer type
(except pointer-to-function types).

@item
Converting any other pointer type to @code{void *}
(except pointer-to-function types).

@item
Converting 0 (a null pointer constant) to any pointer type.

@item
Converting any pointer type to @code{bool}. (The result is
1 if the pointer is not null.)

@item
Converting between pointer types when the left-hand target type is
upward compatible with the right-hand target type.
@xref{Compatible Types}.
@end itemize

These type conversions occur automatically in certain contexts,
which are:

@itemize @bullet
@item
An assignment converts the type of the right-hand expression
to the type wanted by the left-hand expression.
For example,

@example
double i;
i = 5;
@end example

@noindent
converts 5 to @code{double}.

@item
A function call, when the function specifies the type for that
argument, converts the argument value to that type. For example,

@example
void foo (double);
foo (5);
@end example

@noindent
converts 5 to @code{double}.

@item
A @code{return} statement converts the specified value to the type
that the function is declared to return. For example,

@example
double
foo ()
@{
  return 5;
@}
@end example

@noindent
also converts 5 to @code{double}.
@end itemize

In all three contexts, if the conversion is impossible, that
constitutes an error.

@node Argument Promotions
@section Argument Promotions
@cindex argument promotions
@cindex promotion of arguments

When a function's definition or declaration does not specify the type
of an argument, that argument is passed without conversion in whatever
type it has, with these exceptions:

@itemize @bullet
@item
Some narrow numeric values are @dfn{promoted} to a wider type. If the
expression is a narrow integer, such as @code{char} or @code{short},
the call converts it automatically to @code{int} (@pxref{Integer
Types}).@footnote{On an embedded controller where @code{char}
or @code{short} is the same width as @code{int}, @code{unsigned char}
or @code{unsigned short} promotes to @code{unsigned int}, but that
never occurs in GNU C on real computers.}

In this example, the expression @code{c} is passed as an @code{int}:

@example
char c = '$';

printf ("Character c is '%c'\n", c);
@end example

@item
If the expression
has type @code{float}, the call converts it automatically to
@code{double}.

@item
An array as argument is converted to a pointer to its zeroth element.

@item
A function name as argument is converted to a pointer to that function.
@end itemize

@node Operand Promotions
@section Operand Promotions
@cindex operand promotions

The operands in arithmetic operations undergo type conversion automatically.
These @dfn{operand promotions} are the same as the argument promotions
except without converting @code{float} to @code{double}. In other words,
the operand promotions convert

@itemize @bullet
@item
@code{char} or @code{short} (whether signed or not) to @code{int},

@item
an array to a pointer to its zeroth element, and

@item
a function name to a pointer to that function.
@end itemize

@node Common Type
@section Common Type
@cindex common type

Arithmetic binary operators (except the shift operators) convert their
operands to the @dfn{common type} before operating on them.
Conditional expressions also convert the two possible results to their
common type. Here are the rules for determining the common type.

If one of the numbers has a floating-point type and the other is an
integer, the common type is that floating-point type. For instance,

@example
5.6 * 2   @result{} 11.2 /* @r{a @code{double} value} */
@end example

If both are floating point, the type with the larger range is the
common type.

If both are integers but of different widths, the common type
is the wider of the two.

If they are integer types of the same width, the common type is
unsigned if either operand is unsigned, and it's @code{long} if either
operand is @code{long}. It's @code{long long} if either operand is
@code{long long}.

These rules apply to addition, subtraction, multiplication, division,
remainder, comparisons, and bitwise operations. They also apply to
the two branches of a conditional expression, and to the arithmetic
done in a modifying assignment operation.

@node Scope
@chapter Scope
@cindex scope
@cindex block scope
@cindex function scope
@cindex function prototype scope

Each definition or declaration of an identifier is visible
in certain parts of the program, which is typically less than the whole
of the program. The parts where it is visible are called its @dfn{scope}.
Normally, declarations made at the top level in the source (that is,
not within any blocks or function definitions) are visible for the
entire contents of the source file after that point. This is called
@dfn{file scope} (@pxref{File-Scope Variables}).

Declarations made within blocks of code, including within function
definitions, are visible only within those blocks. This is called
@dfn{block scope}. Here is an example:

@example
@group
void
foo (void)
@{
  int x = 42;
@}
@end group
@end example

@noindent
In this example, the variable @code{x} has block scope; it is visible
only within the @code{foo} function definition block. Thus, other
blocks could have their own variables, also named @code{x}, without
any conflict between those variables.

A variable declared inside a subblock has a scope limited to
that subblock:

@example
@group
void
foo (void)
@{
  @{
    int x = 42;
  @}
  // @r{@code{x} is out of scope here.}
@}
@end group
@end example

If a variable declared within a block has the same name as a variable
declared outside of that block, the definition within the block
takes precedence during its scope:

@example
@group
int x = 42;

void
foo (void)
@{
  int x = 17;
  printf ("%d\n", x);
@}
@end group
@end example

@noindent
This prints 17, the value of the variable @code{x} declared in the
function body block, rather than the value of the variable @code{x} at
file scope. We say that the inner declaration of @code{x}
@dfn{shadows} the outer declaration, for the extent of the inner
declaration's scope.

A declaration with block scope can be shadowed by another declaration
with the same name in a subblock.

@example
@group
void
foo (void)
@{
  char *x = "foo";
  @{
    int x = 42;
    @r{@dots{}}
    exit (x / 6);
  @}
@}
@end group
@end example

A function parameter's scope is the entire function body, but it can
be shadowed.
For example:

@example
@group
int x = 42;

void
foo (int x)
@{
  printf ("%d\n", x);
@}
@end group
@end example

@noindent
This prints the value of @code{x} the function parameter, rather than
the value of the file-scope variable @code{x}.

Labels (@pxref{goto Statement}) have @dfn{function} scope: each label
is visible for the whole of the containing function body, both before
and after the label declaration:

@example
@group
void
foo (void)
@{
  @r{@dots{}}
  goto bar;
  @r{@dots{}}
  @{ // @r{Subblock does not affect labels.}
    bar:
    @r{@dots{}}
  @}
  goto bar;
@}
@end group
@end example

Except for labels, a declared identifier is not
visible to code before its declaration. For example:

@example
@group
int x = 5;
int y = x + 10;
@end group
@end example

@noindent
will work, but:

@example
@group
int x = y + 10;
int y = 5;
@end group
@end example

@noindent
cannot refer to the variable @code{y} before its declaration.

@include cpp.texi

@node Integers in Depth
@chapter Integers in Depth

This chapter explains the machine-level details of integer types: how
they are represented as bits in memory, and the range of possible
values for each integer type.

@menu
* Integer Representations::       How integer values appear in memory.
* Maximum and Minimum Values::    Value ranges of integer types.
@end menu

@node Integer Representations
@section Integer Representations
@cindex integer representations
@cindex representation of integers

Modern computers store integer values as binary (base-2) numbers that
occupy a single unit of storage, typically either as an 8-bit
@code{char}, a 16-bit @code{short int}, a 32-bit @code{int}, or
possibly, a 64-bit @code{long long int}. Whether a @code{long int} is
a 32-bit or a 64-bit value is system dependent.@footnote{In theory,
any of these types could have some other size, but it's not worth even
a minute to cater to that possibility.
It never happens on GNU/Linux.}

@cindex @code{CHAR_BIT}
The macro @code{CHAR_BIT}, defined in @file{limits.h}, gives the number
of bits in type @code{char}. On any real operating system, the value
is 8.

The fixed sizes of numeric types necessarily limit their @dfn{range
of values}, and the particular encoding of integers decides what that
range is.

@cindex two's-complement representation
For unsigned integers, the entire space is used to represent a
nonnegative value. Signed integers are stored using
@dfn{two's-complement representation}: a signed integer with @var{n}
bits has a range from @math{-2@sup{(@var{n} - 1)}} to @minus{}1 to 0
to 1 to @math{+2@sup{(@var{n} - 1)} - 1}, inclusive. The leftmost, or
high-order, bit is called the @dfn{sign bit}.

@c ??? Needs correcting

There is only one value that means zero, and the most negative number
lacks a positive counterpart. For example, a two's-complement signed
8-bit integer can represent all integers from @minus{}128 to +127.
As a result, negating the most negative number causes overflow; in
practice, its result is that number back again. We will revisit that
peculiarity shortly.

Decades ago, there were computers that didn't use two's-complement
representation for integers (@pxref{Integers in Depth}), but they are
long gone and not worth any effort to support.

@c ??? Is this duplicate?

When an arithmetic operation produces a value that is too big to
represent, the operation is said to @dfn{overflow}. In C, integer
overflow does not interrupt the control flow or signal an error.
What it does depends on signedness.

For unsigned arithmetic, the result of an operation that overflows is
the @var{n} low-order bits of the correct value.
If the correct value is representable in @var{n} bits, that is always
the result; thus we often say that ``integer arithmetic is exact,''
omitting the crucial qualifying phrase ``as long as the exact result
is representable.''

In principle, a C program should be written so that overflow never
occurs for signed integers, but in GNU C you can specify various ways
of handling such overflow (@pxref{Integer Overflow}).

Integer representations are best understood by looking at a table for
a tiny integer size; here are the possible values for an integer with
three bits:

@multitable @columnfractions .25 .25 .25 .25
@headitem Unsigned @tab Signed @tab Bits @tab 2s Complement
@item 0 @tab 0 @tab 000 @tab 000 (0)
@item 1 @tab 1 @tab 001 @tab 111 (-1)
@item 2 @tab 2 @tab 010 @tab 110 (-2)
@item 3 @tab 3 @tab 011 @tab 101 (-3)
@item 4 @tab -4 @tab 100 @tab 100 (-4)
@item 5 @tab -3 @tab 101 @tab 011 (3)
@item 6 @tab -2 @tab 110 @tab 010 (2)
@item 7 @tab -1 @tab 111 @tab 001 (1)
@end multitable

The parenthesized decimal numbers in the last column represent the
signed meanings of the two's-complement of the line's value. Recall
that, in two's-complement encoding, the high-order bit is 0 when
the number is nonnegative.

We can now understand the peculiar behavior of negation of the
most negative two's-complement integer: start with 0b100,
invert the bits to get 0b011, and add 1: we get
0b100, the value we started with.

We can also see overflow behavior in two's-complement:

@example
3 + 1 = 0b011 + 0b001 = 0b100 = (-4)
3 + 2 = 0b011 + 0b010 = 0b101 = (-3)
3 + 3 = 0b011 + 0b011 = 0b110 = (-2)
@end example

@noindent
A sum of two nonnegative signed values that overflows has a 1 in the
sign bit, so the exact positive result is truncated to a negative
value.
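
The three-bit table can be reproduced with a short sketch. The helper
names @code{wrap3}, @code{as_signed3} and @code{negate3} are invented
here. The code does all its arithmetic in @code{unsigned int} and
masks the result to three bits, so it models the wraparound without
itself causing signed overflow.

@example
/* @r{Keep only the 3 low-order bits, as unsigned overflow would.} */
unsigned int
wrap3 (unsigned int value)
@{
  return value & 7u;
@}

/* @r{Interpret a 3-bit pattern as a two's-complement signed value.} */
int
as_signed3 (unsigned int bits)
@{
  bits = wrap3 (bits);
  return (bits & 4u) ? (int) bits - 8 : (int) bits;
@}

/* @r{Two's-complement negation: invert the bits and add 1.} */
unsigned int
negate3 (unsigned int bits)
@{
  return wrap3 (~bits + 1u);
@}
@end example

@noindent
With these definitions, @code{as_signed3 (3 + 1)} yields @minus{}4 and
@code{negate3 (4)} yields 4 again, matching the table.
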
@c =====================================================================

@node Maximum and Minimum Values
@section Maximum and Minimum Values
@cindex maximum integer values
@cindex minimum integer values
@cindex integer ranges
@cindex ranges of integer types

@findex INT_MAX
@findex UINT_MAX
@findex SHRT_MAX
@findex LONG_MAX
@findex LLONG_MAX
@findex USHRT_MAX
@findex ULONG_MAX
@findex ULLONG_MAX
@findex CHAR_MAX
@findex SCHAR_MAX
@findex UCHAR_MAX
For each primitive integer type, there is a standard macro defined in
@file{limits.h} that gives the largest value that type can hold. For
instance, for type @code{int}, the maximum value is @code{INT_MAX}.
On a 32-bit computer, that is equal to 2,147,483,647. The
maximum value for @code{unsigned int} is @code{UINT_MAX}, which on a
32-bit computer is equal to 4,294,967,295. Likewise, there are
@code{SHRT_MAX}, @code{LONG_MAX}, and @code{LLONG_MAX}, and
corresponding unsigned limits @code{USHRT_MAX}, @code{ULONG_MAX}, and
@code{ULLONG_MAX}.

Since there are three ways to specify a @code{char} type, there are
also three limits: @code{CHAR_MAX}, @code{SCHAR_MAX}, and
@code{UCHAR_MAX}.

For each type that is or might be signed, there is another symbol that
gives the minimum value it can hold. (Just replace @code{MAX} with
@code{MIN} in the names listed above.) There is no minimum limit
symbol for types specified with @code{unsigned} because the
minimum for them is universally zero.

@code{INT_MIN} is not the negative of @code{INT_MAX}. In
two's-complement representation, the most negative number is 1 less
than the negative of the most positive number. Thus, @code{INT_MIN}
on a 32-bit computer has the value @minus{}2,147,483,648. You can't
actually write the value that way in C, since it would overflow.
That's a good reason to use @code{INT_MIN} to specify
that value. Its definition is written to avoid overflow.
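
As a small illustration, the relationships described above can be
checked at compile time with static assertions (@pxref{Static
Assertions}), and the limit macros can guard an addition against
overflow. The function name @code{add_would_overflow} is hypothetical;
this is a sketch, not a library routine.

@example
#include <limits.h>

/* @r{The most negative int is one less than the negative of INT_MAX.} */
_Static_assert (INT_MIN == -INT_MAX - 1, "two's-complement range");

/* @r{Unsigned arithmetic wraps around past its maximum.} */
_Static_assert (UINT_MAX + 1u == 0u, "unsigned wraparound");

/* @r{Return nonzero if @code{a + b} would overflow type @code{int}.}
   @r{The tests themselves never overflow.} */
int
add_would_overflow (int a, int b)
@{
  if (b > 0)
    return a > INT_MAX - b;
  return a < INT_MIN - b;
@}
@end example
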
@include fp.texi

@node Compilation
@chapter Compilation
@cindex object file
@cindex compilation module
@cindex make rules
@cindex link

Early in the manual we explained how to compile a simple C program
that consists of a single source file (@pxref{Compile Example}).
However, we handle only short programs that way. A typical C program
consists of many source files, each of which is usually a separate
@dfn{compilation module}, meaning that it has to be compiled
separately. (The source files that are not separate compilation
modules are those that are used via @code{#include}; see @ref{Header
Files}.)

To compile a multi-module program, you compile each of the program's
compilation modules, making an @dfn{object file} for that module. The
last step is to @dfn{link} the many object files together into a
single executable for the whole program.

The full details of how to compile C programs (and other programs)
with GCC are documented in xxxx.
@c ??? ref
Here we give only a simple introduction.

These commands compile two compilation modules, @file{foo.c} and
@file{bar.c}, running the compiler for each module:

@example
gcc -c -O -g foo.c
gcc -c -O -g bar.c
@end example

@noindent
In these commands, @option{-g} says to generate debugging information,
@option{-O} says to do some optimization, and @option{-c} says to put
the compiled code for that module into a corresponding object file and
go no further. The object file for @file{foo.c} is automatically
called @file{foo.o}, and so on.

If you wish, you can specify additional compilation options. For
instance, @option{-Wformat -Wparentheses -Wstrict-prototypes} request
additional warnings.

@cindex linking object files
After you compile all the program's modules, you link the object files
into a combined executable, like this:

@example
gcc -o foo foo.o bar.o
@end example

@noindent
In this command, @option{-o foo} specifies the file name for the
executable file, and the other arguments are the object files to link.
Always specify the executable file name in a command that generates
one.

One reason to divide a large program into multiple compilation modules
is to control how each module can access the internals of the others.
When a module declares a function or variable @code{extern}, other
modules can access it. The other functions and variables defined in a
module can't be accessed from outside that module.

The other reason for using multiple modules is so that changing one
source file does not require recompiling all of them in order to try
the modified program. It is sufficient to recompile the source file
that you changed, then link them all again. Dividing a large program
into many substantial modules in this way typically makes
recompilation much faster.

Normally we don't run any of these commands directly. Instead we
write a set of @dfn{make rules} for the program, then use the
@command{make} program to recompile only the source files that need to
be recompiled, by following those rules. @xref{Top, The GNU Make
Manual, , make, The GNU Make Manual}.

@node Directing Compilation
@chapter Directing Compilation

This chapter describes C constructs that don't alter the program's
meaning @emph{as such}, but rather direct the compiler how to treat
some aspects of the program.

@menu
* Pragmas::                     Controlling compilation of some constructs.
* Static Assertions::           Compile-time tests for conditions.
@end menu

@node Pragmas
@section Pragmas

A @dfn{pragma} is an annotation in a program that gives direction to
the compiler.

@menu
* Pragma Basics::         Pragma syntax and usage.
* Severity Pragmas::      Settings for compile-time pragma output.
* Optimization Pragmas::  Controlling optimizations.
@end menu

@c See also @ref{Macro Pragmas}, which save and restore macro definitions.

@node Pragma Basics
@subsection Pragma Basics

C defines two syntactical forms for pragmas, the line form and the
token form. You can write any pragma in either form, with the same
meaning.
The line form is a line in the source code, like this:

@example
#pragma @var{line}
@end example

@noindent
The line pragma has no effect on the parsing of the lines around it.
This form has the drawback that it can't be generated by a macro expansion.

The token form is a series of tokens; it can appear anywhere in the
program between the other tokens.

@example
_Pragma (@var{stringconstant})
@end example

@noindent
The pragma has no effect on the syntax of the tokens that surround it;
thus, here's a pragma in the middle of an @code{if} statement:

@example
if _Pragma ("hello") (x > 1)
@end example

@noindent
However, that's an unclear thing to do; for the sake of
understandability, it is better to put a pragma on a line by itself
and not embedded in the middle of another construct.

Both forms of pragma have a textual argument. In a line pragma, the
text is the rest of the line. The textual argument to @code{_Pragma}
uses the same syntax as a C string constant: surround the text with
two @samp{"} characters, and add a backslash before each @samp{"} or
@samp{\} character in it.

With either syntax, the textual argument specifies what to do.
It begins with one or several words that specify the operation.
If the compiler does not recognize them, it ignores the pragma.

Here are the pragma operations supported in GNU C@.

@c ??? Verify font for []
@table @code
@item #pragma GCC dependency "@var{file}" [@var{message}]
@itemx _Pragma ("GCC dependency \"@var{file}\" [@var{message}]")
Declares that the current source file depends on @var{file}, so GNU C
compares the file times and gives a warning if @var{file} is newer
than the current source file.

This directive searches for @var{file} the way @code{#include}
searches for a non-system header file.

If @var{message} is given, the warning message includes that text.

Examples:

@example
#pragma GCC dependency "parse.y"
_Pragma ("GCC dependency \"/usr/include/time.h\" \
rerun fixincludes")
@end example

@item #pragma GCC poison @var{identifiers}
@itemx _Pragma ("GCC poison @var{identifiers}")
Poisons the identifiers listed in @var{identifiers}.

This is useful to make sure all mentions of @var{identifiers} have been deleted from the program and that no reference to them creeps back in. If any of those identifiers appears anywhere in the source after the directive, it causes a compilation error. For example,

@example
#pragma GCC poison printf sprintf fprintf
sprintf(some_string, "hello");
@end example

@noindent
generates an error.

If a poisoned identifier appears as part of the expansion of a macro that was defined before the identifier was poisoned, it will @emph{not} cause an error. Thus, system headers that define macros that use the identifier will not cause errors.

For example,

@example
#define strrchr rindex
_Pragma ("GCC poison rindex")
strrchr(some_string, 'h');
@end example

@noindent
does not cause a compilation error.

@item #pragma GCC system_header
@itemx _Pragma ("GCC system_header")
Specify treating the rest of the current source file as if it came from a system header file. @xref{System Headers, System Headers, System Headers, gcc, Using the GNU Compiler Collection}.

@item #pragma GCC warning @var{message}
@itemx _Pragma ("GCC warning @var{message}")
Equivalent to @code{#warning}. Its advantage is that the @code{_Pragma} form can be included in a macro definition.

@item #pragma GCC error @var{message}
@itemx _Pragma ("GCC error @var{message}")
Equivalent to @code{#error}. Its advantage is that the @code{_Pragma} form can be included in a macro definition.

@item #pragma message @var{message}
@itemx _Pragma ("message @var{message}")
Similar to @samp{GCC warning} and @samp{GCC error}, this simply prints an informational message, and could be used to include additional warning or error text without triggering more warnings or errors. (Note that unlike @samp{warning} and @samp{error}, @samp{message} does not include @samp{GCC} as part of the pragma.)
@end table

@node Severity Pragmas
@subsection Severity Pragmas

These pragmas control the severity of classes of diagnostics. You can specify the class of diagnostic with the GCC option that causes those diagnostics to be generated, written as a string constant.

@table @code
@item #pragma GCC diagnostic error @var{option}
@itemx _Pragma ("GCC diagnostic error @var{option}")
For code following this pragma, treat diagnostics of the variety specified by @var{option} as errors. For example:

@example
_Pragma ("GCC diagnostic error \"-Wformat\"")
@end example

@noindent
specifies to treat diagnostics enabled by the @option{-Wformat} option as errors rather than warnings.

@item #pragma GCC diagnostic warning @var{option}
@itemx _Pragma ("GCC diagnostic warning @var{option}")
For code following this pragma, treat diagnostics of the variety specified by @var{option} as warnings. This overrides the @option{-Werror} option, which says to treat warnings as errors.

@item #pragma GCC diagnostic ignored @var{option}
@itemx _Pragma ("GCC diagnostic ignored @var{option}")
For code following this pragma, refrain from reporting any diagnostics of the variety specified by @var{option}.

@item #pragma GCC diagnostic push
@itemx _Pragma ("GCC diagnostic push")
@itemx #pragma GCC diagnostic pop
@itemx _Pragma ("GCC diagnostic pop")
These pragmas maintain a stack of states for severity settings. @samp{GCC diagnostic push} saves the current settings on the stack, and @samp{GCC diagnostic pop} pops the last stack item and restores the current settings from that.

When the severity setting stack is empty, @samp{GCC diagnostic pop} restores the settings to what they were at the start of compilation.

Here is an example:

@example
_Pragma ("GCC diagnostic error \"-Wformat\"")

/* @r{@option{-Wformat} messages treated as errors.} */

_Pragma ("GCC diagnostic push")
_Pragma ("GCC diagnostic warning \"-Wformat\"")

/* @r{@option{-Wformat} messages treated as warnings.} */

_Pragma ("GCC diagnostic push")
_Pragma ("GCC diagnostic ignored \"-Wformat\"")

/* @r{@option{-Wformat} messages suppressed.} */

_Pragma ("GCC diagnostic pop")

/* @r{@option{-Wformat} messages treated as warnings again.} */

_Pragma ("GCC diagnostic pop")

/* @r{@option{-Wformat} messages treated as errors again.} */

/* @r{This is an excess @samp{pop} that matches no @samp{push}.} */
_Pragma ("GCC diagnostic pop")

/* @r{@option{-Wformat} messages treated once again}
   @r{as specified by the GCC command-line options.} */
@end example
@end table

@node Optimization Pragmas
@subsection Optimization Pragmas

These pragmas enable a particular optimization for specific function definitions. The settings take effect at the end of a function definition, so the clean place to use these pragmas is between function definitions.

@table @code
@item #pragma GCC optimize @var{optimization}
@itemx _Pragma ("GCC optimize @var{optimization}")
These pragmas enable the optimization @var{optimization} for the following functions. Write @var{optimization} as a string constant. For example,

@example
_Pragma ("GCC optimize \"-fforward-propagate\"")
@end example

@noindent
says to apply the @samp{forward-propagate} optimization to all following function definitions. Specifying optimizations for individual functions, rather than for the entire program, is rare but can be useful for getting around a bug in the compiler.

If @var{optimization} does not correspond to a defined optimization option, the pragma is erroneous. To turn off an optimization, use the corresponding @samp{-fno-} option, such as @samp{-fno-forward-propagate}.

@item #pragma GCC target @var{optimizations}
@itemx _Pragma ("GCC target @var{optimizations}")
The pragma @samp{GCC target} is similar to @samp{GCC optimize} but is used for platform-specific optimizations. Thus,

@example
_Pragma ("GCC target \"popcnt\"")
@end example

@noindent
activates the optimization @samp{popcnt} for all following function definitions. This optimization is supported on a few common targets but not on others.

@item #pragma GCC push_options
@itemx _Pragma ("GCC push_options")
The @samp{push_options} pragma saves on a stack the current settings specified with the @samp{target} and @samp{optimize} pragmas.

@item #pragma GCC pop_options
@itemx _Pragma ("GCC pop_options")
The @samp{pop_options} pragma pops saved settings from that stack.

Here's an example of using this stack.

@example
_Pragma ("GCC push_options")
_Pragma ("GCC optimize \"forward-propagate\"")

/* @r{Functions to compile}
   @r{with the @code{forward-propagate} optimization.} */

_Pragma ("GCC pop_options")
/* @r{Ends enablement of @code{forward-propagate}.} */
@end example

@item #pragma GCC reset_options
@itemx _Pragma ("GCC reset_options")
Clears all pragma-defined @samp{target} and @samp{optimize} optimization settings.
@end table

@node Static Assertions
@section Static Assertions
@cindex static assertions
@findex _Static_assert

You can add compile-time tests for necessary conditions into your code using @code{_Static_assert}. This can be useful, for example, to check that the compilation target platform supports the type sizes that the code expects. For example,

@example
_Static_assert ((sizeof (long int) >= 8),
                "long int needs to be at least 8 bytes");
@end example

@noindent
reports a compile-time error if compiled on a system with long integers smaller than 8 bytes, with @samp{long int needs to be at least 8 bytes} as the error message.

Since uses of @code{_Static_assert} are processed at compile time, the expression must be computable at compile time and the error message must be a literal string. The expression can refer to the sizes of variables, but can't refer to their values. For example, the following static assertion is invalid for two reasons:

@example
char *error_message
  = "long int needs to be at least 8 bytes";
int size_of_long_int = sizeof (long int);

_Static_assert (size_of_long_int == 8, error_message);
@end example

@noindent
The expression @code{size_of_long_int == 8} isn't computable at compile time, and the error message isn't a literal string.

You can, though, use preprocessor definition values with @code{_Static_assert}:

@example
#define LONG_INT_ERROR_MESSAGE "long int needs to be \
at least 8 bytes"

_Static_assert ((sizeof (long int) >= 8),
                LONG_INT_ERROR_MESSAGE);
@end example

Static assertions are permitted wherever a statement or declaration is permitted, including at top level in the file, and also inside the definition of a type.

@example
union y
@{
  int i;
  int *ptr;
  _Static_assert (sizeof (int *) == sizeof (int),
                  "Pointer and int not same size");
@};
@end example

@node Type Alignment
@appendix Type Alignment
@cindex type alignment
@cindex alignment of type
@findex _Alignof
@findex __alignof__

Code for device drivers and other communication with low-level hardware sometimes needs to be concerned with the alignment of data objects in memory.

Each data type has a required @dfn{alignment}, always a power of 2, that says at which memory addresses an object of that type can validly start. A valid address for the type must be a multiple of its alignment. If a type's alignment is 1, that means it can validly start at any address. If a type's alignment is 2, that means it can only start at an even address. If a type's alignment is 4, that means it can only start at an address that is a multiple of 4.

The alignment of a type (except @code{char}) can vary depending on the kind of computer in use. To refer to the alignment of a type in a C program, use @code{_Alignof}, whose syntax parallels that of @code{sizeof}. Like @code{sizeof}, @code{_Alignof} is a compile-time operation, and it doesn't compute the value of the expression used as its argument.

Nominally, each integer and floating-point type has an alignment equal to the largest power of 2 that divides its size. Thus, @code{int} with size 4 has a nominal alignment of 4, and @code{long long int} with size 8 has a nominal alignment of 8.

However, each kind of computer generally has a maximum alignment, and no type needs more alignment than that. If the computer's maximum alignment is 4 (which is common), then no type's alignment is more than 4.

The size of any type is always a multiple of its alignment; that way, in an array whose elements have that type, all the elements are properly aligned if the first one is.

These rules apply to all real computers today, but some embedded controllers have odd exceptions. We don't have references to cite for them.
@c We can't cite a nonfree manual as documentation.

Ordinary C code guarantees that every object of a given type is in fact aligned as that type requires.

If the operand of @code{_Alignof} is a structure field, the value is the alignment it requires. It may have a greater alignment by coincidence, due to the other fields, but @code{_Alignof} is not concerned about that. @xref{Structures}.

Older versions of GNU C used the keyword @code{__alignof__} for this, but now that the feature has been standardized, it is better to use the standard keyword @code{_Alignof}.

@findex _Alignas
@findex __aligned__
You can explicitly specify an alignment requirement for a particular variable or structure field by adding @code{_Alignas (@var{alignment})} to the declaration, where @var{alignment} is a power of 2 or a type name.
For instance:

@example
char _Alignas (8) x;
@end example

@noindent
or

@example
char _Alignas (double) x;
@end example

@noindent
specifies that @code{x} must start on an address that is a multiple of 8 (assuming, in the second case, that the alignment of @code{double} is 8). However, if @var{alignment} exceeds the maximum alignment for the machine, that maximum is how much alignment @code{x} will get.

The older GNU C syntax for this feature was to add @code{__attribute__ ((__aligned__ (@var{alignment})))} to the declaration, after the variable name. For instance:

@example
char x __attribute__ ((__aligned__ (8)));
@end example

@xref{Attributes}.

@node Aliasing
@appendix Aliasing
@cindex aliasing (of storage)
@cindex pointer type conversion
@cindex type conversion, pointer

We have already presented examples of casting a @code{void *} pointer to another pointer type, and casting another pointer type to @code{void *}.

One common kind of pointer cast is guaranteed safe: casting the value returned by @code{malloc} and related functions (@pxref{Dynamic Memory Allocation}). It is safe because these functions do not save the pointer anywhere else; the only way the program will access the newly allocated memory is via the pointer just returned.

In fact, C allows casting any pointer type to any other pointer type. Using this to access the same place in memory using two different data types is called @dfn{aliasing}.

Aliasing is necessary in some programs that do sophisticated memory management, such as GNU Emacs, but most C programs don't need to do aliasing. When it isn't needed, @strong{stay away from it!} To do aliasing correctly requires following the rules stated below. Otherwise, the aliasing may result in malfunctions when the program runs.

The rest of this appendix explains the pitfalls and rules of aliasing.

@menu
* Aliasing Alignment::   Memory alignment considerations for casting between pointer types.
* Aliasing Length::      Type size considerations for casting between pointer types.
* Aliasing Type Rules::  Even when type alignment and size match, aliasing can still have surprising results.
@end menu

@node Aliasing Alignment
@appendixsection Aliasing and Alignment

In order for a type-converted pointer to be valid, it must have the alignment that the new pointer type requires. For instance, on most computers, @code{int} has alignment 4; the address of an @code{int} must be a multiple of 4. However, @code{char} has alignment 1, so the address of a @code{char} is usually not a multiple of 4. Taking the address of such a @code{char} and casting it to @code{int *} probably results in an invalid pointer. Trying to dereference it may cause a @code{SIGBUS} signal, depending on the platform in use (@pxref{Signals}).

@example
int
foo ()
@{
  char i[4];
  int *p = (int *) &i[1]; /* @r{Misaligned pointer!} */
  return *p;              /* @r{Crash!} */
@}
@end example

This requirement is never a problem when casting the return value of @code{malloc} because that function always returns a pointer with as much alignment as any type can require.

@node Aliasing Length
@appendixsection Aliasing and Length

When converting a pointer to a different pointer type, make sure the object it really points to is at least as long as the target of the converted pointer. For instance, suppose @code{p} has type @code{int *} and it's cast as follows:

@example
int *p;

struct foo @{ double d, e, f; @};

struct foo *q = (struct foo *) p;

q->f = 5.14159;
@end example

@noindent
Then the field @code{q->f} will run past the end of the @code{int} that @code{p} points to. If @code{p} was initialized to the start of an array of type @code{int[6]}, the object is long enough for three @code{double}s. But if @code{p} points to something shorter, @code{q->f} will run on beyond the end of that, overlaying some other data. Storing that will garble that other data. Or it could extend past the end of memory space and cause a @code{SIGSEGV} signal (@pxref{Signals}).

@node Aliasing Type Rules
@appendixsection Type Rules for Aliasing

C code that converts a pointer to a different pointer type can use the pointers to access the same memory locations with two different data types. If the same address is accessed with different types in a single control thread, optimization can make the code do surprising things (in effect, make it malfunction).

Here's a concrete example where aliasing can change the code's behavior when it is optimized. We assume that @code{float} is 4 bytes long, like @code{int}, and so is every pointer. Thus, the structures @code{struct a} and @code{struct b} are both 8 bytes.

@example
#include <stdio.h>

struct a @{ int size; char *data; @};
struct b @{ float size; char *data; @};

void
sub (struct a *p, struct b *q)
@{
  int x;
  p->size = 0;
  q->size = 1;
  x = p->size;
  printf("x       =%d\n", x);
  printf("p->size =%d\n", (int)p->size);
  printf("q->size =%d\n", (int)q->size);
@}

int
main(void)
@{
  struct a foo;
  struct a *p = &foo;
  struct b *q = (struct b *) &foo;

  sub (p, q);
@}
@end example

This code works as intended when compiled without optimization. All the operations are carried out sequentially as written. The code sets @code{x} to @code{p->size}, but what it actually gets is the bits of the floating point number 1, as type @code{int}.

However, when optimizing, the compiler is allowed to assume (mistakenly, here) that @code{q} does not point to the same storage as @code{p}, because their data types are not allowed to alias.

From this assumption, the compiler can deduce (falsely, here) that the assignment into @code{q->size} has no effect on the value of @code{p->size}, which must therefore still be 0. Thus, @code{x} will be set to 0.

GNU C, following the C standard, @emph{defines} this optimization as legitimate. Code that misbehaves when optimized following these rules is, by definition, incorrect C code.

The rules for storage aliasing in C are based on the two data types: the type of the object, and the type it is accessed through. The rules permit accessing part of a storage object of type @var{t} using only these types:

@itemize @bullet
@item
@var{t}.

@item
A type compatible with @var{t}. @xref{Compatible Types}.

@item
A signed or unsigned version of one of the above.

@item
A qualified version of one of the above. @xref{Type Qualifiers}.

@item
An array, structure (@pxref{Structures}), or union type (@pxref{Unions}) that contains one of the above, either directly as a field or through multiple levels of fields. If @var{t} is @code{double}, this would include @code{struct s @{ union @{ double d[2]; int i[4]; @} u; int i; @};} because there's a @code{double} inside it somewhere.

@item
A character type.
@end itemize

What do these rules say about the example in this section?

For @code{foo.size} (equivalently, @code{p->size}), @var{t} is @code{int}. The type @code{float} is not allowed as an aliasing type by those rules, so @code{q->size} is not supposed to alias with @code{foo.size}. Based on that assumption, GNU C makes a permitted optimization that was not, in this case, consistent with what the programmer intended the program to do.

Whether GCC actually performs type-based aliasing analysis depends on the details of the code. GCC has other ways to determine (in some cases) whether objects alias, and if it gets a reliable answer that way, it won't fall back on type-based heuristics.

@c @opindex -fno-strict-aliasing
The importance of knowing the type-based aliasing rules is not so as to ensure that the optimization is done where it would be safe, but so as to ensure it is @emph{not} done in a way that would break the program. You can turn off type-based aliasing analysis by giving GCC the option @option{-fno-strict-aliasing}.

@node Digraphs
@appendix Digraphs
@cindex digraphs

C accepts aliases for certain characters.
Apparently in the 1990s some computer systems had trouble inputting these characters, or trouble displaying them. These digraphs almost never appear in C programs nowadays, but we mention them for completeness.

@table @samp
@item <:
An alias for @samp{[}.

@item :>
An alias for @samp{]}.

@item <%
An alias for @samp{@{}.

@item %>
An alias for @samp{@}}.

@item %:
An alias for @samp{#}, used for preprocessing directives (@pxref{Directives}) and macros (@pxref{Macros}).
@end table

@node Attributes
@appendix Attributes in Declarations
@cindex attributes
@findex __attribute__

You can specify certain additional requirements in a declaration, to get fine-grained control over code generation, and helpful informational messages during compilation. We use a few attributes in code examples throughout this manual, including

@table @code
@item aligned
The @code{aligned} attribute specifies a minimum alignment for a variable or structure field, measured in bytes:

@example
int foo __attribute__ ((aligned (8))) = 0;
@end example

@noindent
This directs GNU C to allocate @code{foo} at an address that is a multiple of 8 bytes. However, you can't force an alignment bigger than the computer's maximum meaningful alignment.

@item packed
The @code{packed} attribute specifies to compact the fields of a structure by not leaving gaps between fields. For example,

@example
struct __attribute__ ((packed)) bar
@{
  char a;
  int b;
@};
@end example

@noindent
allocates the integer field @code{b} at byte 1 in the structure, immediately after the character field @code{a}. The packed structure is just 5 bytes long (assuming @code{int} is 4 bytes) and its alignment is 1, that of @code{char}.

@item deprecated
Applicable to both variables and functions, the @code{deprecated} attribute tells the compiler to issue a warning if the variable or function is ever used in the source file.

@example
int old_foo __attribute__ ((deprecated));

int old_quux () __attribute__ ((deprecated));
@end example

@item __noinline__
The @code{__noinline__} attribute, in a function's declaration or definition, specifies never to inline calls to that function. All calls to that function, in a compilation unit where it has this attribute, will be compiled to invoke the separately compiled function. @xref{Inline Function Definitions}.

@item __noclone__
The @code{__noclone__} attribute, in a function's declaration or definition, specifies never to clone that function. Thus, there will be only one compiled version of the function. @xref{Label Value Caveats}, for more information about cloning.

@item always_inline
The @code{always_inline} attribute, in a function's declaration or definition, specifies to inline all calls to that function (unless something about the function makes inlining impossible). This applies to all calls to that function in a compilation unit where it has this attribute. @xref{Inline Function Definitions}.

@item gnu_inline
The @code{gnu_inline} attribute, in a function's declaration or definition, specifies to handle the @code{inline} keyword the way GNU C originally implemented it, many years before ISO C said anything about inlining. @xref{Inline Function Definitions}.
@end table

For full documentation of attributes, see the GCC manual. @xref{Attribute Syntax, Attribute Syntax, Attribute Syntax, gcc, Using the GNU Compiler Collection}.

@node Signals
@appendix Signals
@cindex signal
@cindex handler (for signal)
@cindex @code{SIGSEGV}
@cindex @code{SIGFPE}
@cindex @code{SIGBUS}

Some program operations bring about an error condition called a @dfn{signal}. These signals terminate the program, by default.

There are various different kinds of signals, each with a name.
We have seen several such error conditions throughout this manual:

@table @code
@item SIGSEGV
This signal is generated when a program tries to read or write outside the memory that is allocated for it, or to write memory that can only be read. The name is an abbreviation for ``segmentation violation''.

@item SIGFPE
This signal indicates a fatal arithmetic error. The name is an abbreviation for ``floating-point exception'', but covers all types of arithmetic errors, including division by zero and overflow.

@item SIGBUS
This signal is generated when an invalid pointer is dereferenced, typically the result of dereferencing an uninitialized pointer. It is similar to @code{SIGSEGV}, except that @code{SIGSEGV} indicates invalid access to valid memory, while @code{SIGBUS} indicates an attempt to access an invalid address.
@end table

These kinds of signal allow the program to specify a function as a @dfn{signal handler}. When a signal has a handler, it doesn't terminate the program; instead it calls the handler.

There are many other kinds of signal; here we list only those that come from run-time errors in C operations. The rest have to do with the functioning of the operating system. The GNU C Library Reference Manual gives more explanation about signals (@pxref{Program Signal Handling, The GNU C Library, , libc, The GNU C Library Reference Manual}).

@node GNU Free Documentation License
@appendix GNU Free Documentation License

@include fdl.texi

@node Symbol Index
@unnumbered Index of Symbols and Keywords

@printindex fn

@node Concept Index
@unnumbered Concept Index

@printindex cp

@bye