- Copyright (C) 2000-2015 Free Software Foundation, Inc.
- This file is intended to contain a few notes about writing C code
- within GCC so that it compiles without error on the full range of
- compilers GCC needs to be able to compile on.
- The problem is that many ISO-standard constructs are not accepted by
- either old or buggy compilers, and we keep getting bitten by them.
- Until now this knowledge has been scattered, so I thought I'd collect
- it in one useful place. Please add to it, and correct any problems,
- as you come across them.
- I'm going to start from a base of the ISO C90 standard, since that is
- probably what most people code to naturally. Obviously using
- constructs introduced after that is not a good idea.
- For the complete coding style conventions used in GCC, please read
- http://gcc.gnu.org/codingconventions.html
- String literals
- ---------------
- Irix6 "cc -n32" and OSF4 "cc" have problems with constant string
- initializers with parens around them, e.g.
- const char string[] = ("A string");
- This is unfortunate since this is what the GNU gettext macro N_
- produces. You need to find a different way to code it.
- Some compilers like MSVC++ have fairly low limits on the maximum
- length of a string literal; 509 is the lowest we've come across. You
- may need to break up a long printf statement into many smaller ones.
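- As a minimal sketch of breaking up a long message, the text can be
- kept as an array of short literals and written piece by piece. The
- help text and the name print_help are hypothetical, chosen only for
- illustration.

```c
#include <stdio.h>

/* Hypothetical help text, split into separate short literals so that no
   single literal approaches the 509-character limit mentioned above.  */
static const char *const help_lines[] = {
  "Usage: frob [options] file...\n",
  "  -o FILE   write output to FILE\n",
  "  -v        be verbose\n"
};

/* Write every piece to STREAM; return the number of pieces written,
   or -1 on a write error.  */
static int
print_help (FILE *stream)
{
  size_t i;
  size_t n = sizeof help_lines / sizeof help_lines[0];

  for (i = 0; i < n; i++)
    if (fputs (help_lines[i], stream) == EOF)
      return -1;
  return (int) n;
}
```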
- Empty macro arguments
- ---------------------
- ISO C (6.8.3 in the 1990 standard) specifies the following:
- If (before argument substitution) any argument consists of no
- preprocessing tokens, the behavior is undefined.
- This was relaxed by ISO C99, but some older compilers emit an error,
- so code like
- #define foo(x, y) x y
- foo (bar, )
- needs to be coded in some other way.
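- One possible workaround, sketched here as an assumption rather than
- a rule from this file, is to pass a macro that expands to nothing.
- The argument then consists of one preprocessing token before
- substitution, so the rule quoted above should not apply. EMPTY and
- DECLARE are hypothetical names.

```c
/* EMPTY expands to nothing, but as a macro argument it is still one
   preprocessing token before argument substitution takes place.  */
#define EMPTY

/* Hypothetical macro for illustration: QUALS may be "nothing".  */
#define DECLARE(quals, type, name) quals type name

DECLARE (static, int, counter_a) = 1;
DECLARE (EMPTY, int, counter_b) = 2;  /* instead of DECLARE (, int, counter_b) */

/* Use both objects so the declarations above are exercised.  */
int
sum_counters (void)
{
  return counter_a + counter_b;
}
```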
- Avoid unnecessary test before free
- ----------------------------------
- Since SunOS 4 stopped being a reasonable portability target (which
- happened around 2007), there has been no need to guard against
- "free (NULL)". Thus, any guard like the following constitutes a
- redundant test:
- if (P)
- free (P);
- It is better to avoid the test.[*]
- Instead, simply free P, regardless of whether it is NULL.
- [*] However, if your profiling exposes a test like this in a
- performance-critical loop, say where P is nearly always NULL, and
- the cost of calling free on a NULL pointer would be prohibitively
- high, consider using __builtin_expect, e.g., like this:
- if (__builtin_expect (ptr != NULL, 0))
- free (ptr);
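- A small sketch of the unguarded style follows. The struct and the
- function record_free are hypothetical. The only NULL test left is
- the one that protects the dereference of R itself, not the calls to
- free.

```c
#include <stdlib.h>

/* Hypothetical record type used only for this illustration.  */
struct record
{
  char *name;
  char *value;
};

/* Release everything owned by R.  Either field may be NULL; free (NULL)
   does nothing, so the fields need no guards.  Returns NULL so callers
   can write r = record_free (r).  */
struct record *
record_free (struct record *r)
{
  if (r == NULL)        /* guards r->name below, not free itself */
    return NULL;
  free (r->name);
  free (r->value);
  free (r);
  return NULL;
}
```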
- Trigraphs
- ---------
- You weren't going to use them anyway, but some otherwise ISO C
- compliant compilers do not accept trigraphs.
- Suffixes on Integer Constants
- -----------------------------
- You should never use a 'l' suffix on integer constants ('L' is fine),
- since it can easily be confused with the number '1'.
- Common Coding Pitfalls
- ======================
- errno
- -----
- errno might be declared as a macro.
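- In practice this means errno should come from <errno.h> and never
- from a hand-written declaration. The sketch below assumes a
- hypothetical helper, parse_long, built on strtol.

```c
#include <errno.h>   /* the portable way to get errno */
#include <stdlib.h>

/* Do NOT write "extern int errno;" here: if errno is a macro, that
   declaration may fail to compile or may name the wrong object.  */

/* Parse TEXT as a decimal long.  Return 0 and store the value in *OUT
   on success, or -1 if the value is out of range or no digits were
   found.  parse_long is a hypothetical helper.  */
int
parse_long (const char *text, long *out)
{
  char *end;
  long value;

  errno = 0;                        /* assignment works macro or not */
  value = strtol (text, &end, 10);
  if (errno == ERANGE || end == text)
    return -1;
  *out = value;
  return 0;
}
```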
- Implicit int
- ------------
- In C, the 'int' keyword can often be omitted from type declarations.
- For instance, you can write
- unsigned variable;
- as shorthand for
- unsigned int variable;
- There are several places where this can cause trouble. First, suppose
- 'variable' is a long; then you might think
- (unsigned) variable
- would convert it to unsigned long. It does not. It converts to
- unsigned int. This mostly causes problems on 64-bit platforms, where
- long and int are not the same size.
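- The effect can be checked with a short sketch. The function names
- are hypothetical, and whether any bits are lost depends on whether
- long is wider than int on the host.

```c
#include <limits.h>

/* Return nonzero if casting V with a bare "(unsigned)" loses bits.
   Where long is wider than int this is true for large values; where
   the two types have the same size it is always false.  */
int
bare_unsigned_cast_truncates (unsigned long v)
{
  return (unsigned long) (unsigned) v != v;
}

/* Spelling out the full type keeps every bit; for example,
   to_unsigned_long (-1L) yields ULONG_MAX.  */
unsigned long
to_unsigned_long (long v)
{
  return (unsigned long) v;
}
```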
- Second, if you write a function definition with no return type at
- all:
- operate (int a, int b)
- {
- ...
- }
- that function is expected to return int, *not* void. GCC will warn
- about this.
- Implicit function declarations always have return type int. So if you
- correct the above definition to
- void
- operate (int a, int b)
- ...
- but operate() is called above its definition, you will get an error
- about a "type mismatch with previous implicit declaration". The cure
- is to prototype all functions at the top of the file, or in an
- appropriate header.
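- A minimal sketch of the cure, with the prototype placed ahead of
- the first call. The caller run and the variable last_result are
- hypothetical and exist only to make the example complete.

```c
/* Prototype first, so the call inside run () sees the real return type
   instead of an implicit declaration returning int.  */
static void operate (int a, int b);

static int last_result;   /* hypothetical state, for illustration only */

int
run (void)
{
  operate (2, 3);         /* called above its definition */
  return last_result;
}

static void
operate (int a, int b)
{
  last_result = a + b;
}
```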
- Char vs unsigned char vs int
- ----------------------------
- In C, unqualified 'char' may be either signed or unsigned; it is the
- implementation's choice. When you are processing 7-bit ASCII, it does
- not matter. But when your program must handle arbitrary binary data,
- or fully 8-bit character sets, you have a problem. The most obvious
- issue is if you have a look-up table indexed by characters.
- For instance, the character '\341' in ISO Latin 1 is SMALL LETTER A
- WITH ACUTE ACCENT. In the proper locale, isalpha('\341') will be
- true. But if you read '\341' from a file and store it in a plain
- char, isalpha(c) may look up character 225, or it may look up
- character -31. And the ctype table has no entry at offset -31, so
- your program will crash. (If you're lucky.)
- It is wise to use unsigned char everywhere you possibly can. This
- avoids all these problems. Unfortunately, the routines in <string.h>
- take plain char arguments, so you have to remember to cast them back
- and forth - or avoid the use of strxxx() functions, which is probably
- a good idea anyway.
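- When the data must stay in plain char, for instance because it came
- from a <string.h> routine, the cast can be applied at the point of
- use. The sketch below assumes a hypothetical helper, count_alpha.

```c
#include <ctype.h>
#include <stddef.h>

/* Count alphabetic characters in S.  Each plain char is converted to
   unsigned char before it reaches isalpha, so the argument is never
   negative, whichever signedness the implementation chose for char.  */
size_t
count_alpha (const char *s)
{
  size_t n = 0;

  for (; *s != '\0'; s++)
    if (isalpha ((unsigned char) *s))
      n++;
  return n;
}
```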
- Another common mistake is to use either char or unsigned char to
- receive the result of getc() or related stdio functions. They may
- return EOF, which is outside the range of values representable by
- char. If you use char, some legal character value may be confused
- with EOF, such as '\377' (SMALL LETTER Y WITH UMLAUT, in Latin-1).
- If you use unsigned char, on typical hosts the stored value never
- compares equal to EOF, so end of file is never detected.
- The correct choice is int.
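- A minimal sketch of a read loop that keeps the result in an int.
- The function name count_bytes is hypothetical.

```c
#include <stdio.h>

/* Count the bytes remaining in FP.  C is an int, so EOF stays distinct
   from every byte value, including '\377'.  */
long
count_bytes (FILE *fp)
{
  int c;
  long n = 0;

  while ((c = getc (fp)) != EOF)
    n++;
  return n;
}
```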
- A more subtle version of the same mistake might look like this:
- unsigned char pushback[NPUSHBACK];
- int pbidx;
- #define unget(c) (assert(pbidx < NPUSHBACK), pushback[pbidx++] = (c))
- #define get(c) (pbidx ? pushback[--pbidx] : getchar())
- ...
- unget(EOF);
- which will mysteriously turn a pushed-back EOF into a SMALL LETTER Y
- WITH UMLAUT.
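- One way to avoid this, sketched under the assumption that functions
- are acceptable in place of the macros, is to store pushed-back values
- as int. The names unget_ch and get_ch and the buffer size are
- hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define NPUSHBACK 4                 /* hypothetical size */

static int pushback[NPUSHBACK];     /* int, not unsigned char */
static int pbidx;

/* Push C back; C may be any character value or EOF.  */
void
unget_ch (int c)
{
  assert (pbidx < NPUSHBACK);
  pushback[pbidx++] = c;
}

/* Return the most recently pushed-back value, or read from stdin
   when nothing has been pushed back.  */
int
get_ch (void)
{
  return pbidx ? pushback[--pbidx] : getchar ();
}
```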
- Other common pitfalls
- ---------------------
- o Expecting 'plain' char to be either sign-extending or zero-extending.
- o Shifting an item by a negative amount or by greater than or equal to
- the number of bits in a type (expecting shifts by 32 to be sensible
- has caused quite a number of bugs at least in the early days).
- o Expecting ints shifted right to be sign extended.
- o Modifying the same object twice between two consecutive sequence points.
- o Host vs. target floating point representation, including emitting NaNs
- and Infinities in a form that the assembler handles.
- o qsort being an unstable sort function (unstable in the sense that
- multiple items that sort the same may be sorted in different orders
- by different qsort functions).
- o Passing incorrect types to fprintf and friends.
- o Adding a declaration for a function defined in another file to a
- .c file instead of to a .h file.
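- For the qsort item above, one common remedy is to make the
- comparison function a total order, so that no two elements ever
- compare equal and every qsort implementation produces the same
- result. The sketch below assumes a hypothetical element type with a
- sequence number used as a tie-breaker.

```c
#include <stdlib.h>

/* Hypothetical element type: KEY is what we sort on, SEQ records the
   original position and serves only as a tie-breaker, TAG is payload.  */
struct item
{
  int key;
  int seq;
  char tag;
};

static int
item_cmp (const void *pa, const void *pb)
{
  const struct item *a = (const struct item *) pa;
  const struct item *b = (const struct item *) pb;

  if (a->key != b->key)
    return a->key < b->key ? -1 : 1;
  /* Equal keys: fall back to the original order.  */
  return (a->seq > b->seq) - (a->seq < b->seq);
}

/* Sort ITEMS by key, keeping equal keys in their original order
   regardless of how the host's qsort is implemented.  */
void
sort_items (struct item *items, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    items[i].seq = (int) i;
  qsort (items, n, sizeof items[0], item_cmp);
}
```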