The C and C++ standards allow the character type char to be signed or unsigned, depending on the platform and compiler. Most systems, including x86 GNU/Linux and Microsoft Windows, use signed char, but those based on PowerPC and ARM processors typically use unsigned char.[35] This can lead to unexpected results when porting programs between platforms which have different defaults for the type of char.
The following code demonstrates the difference between platforms with signed and unsigned char types:
#include <stdio.h>

int
main (void)
{
  char c = 255;
  if (c > 128)
    {
      printf ("char is unsigned (c = %d)\n", c);
    }
  else
    {
      printf ("char is signed (c = %d)\n", c);
    }
  return 0;
}
With an unsigned char, the variable c takes the value 255, but with a signed char it becomes -1.
The correct way to manipulate char variables in C is through the portable functions declared in ctype.h, such as isalpha, isdigit and isblank, rather than by their numerical values. The behavior of non-portable conditional expressions such as c > 'a' depends on the signedness of the char type. If the signed or unsigned version of char is explicitly required at certain points in a program, it can be specified using the declarations signed char or unsigned char.
For existing programs which assume that char is signed or unsigned, GCC provides the options -fsigned-char and -funsigned-char to set the default type of char. Using these options, the example code above compiles cleanly when char is unsigned:
$ gcc -Wall -funsigned-char signed.c
$ ./a.out
char is unsigned (c = 255)
However, when char is signed the value 255 wraps around to -1, giving a warning:
$ gcc -Wall -fsigned-char signed.c
signed.c: In function `main':
signed.c:7: warning: comparison is always false due to limited range of data type
$ ./a.out
char is signed (c = -1)
The warning message “comparison is always true/false due to limited range of data type” is one symptom of code which assumes a definition of char which is different from the actual type.
The most common problem with code written assuming signed char types occurs with the functions getc, fgetc and getchar (which read a character from a file). They have a return type of int, not char, and this allows them to use the special value -1 (defined as EOF) to indicate an end-of-file condition or error. Unfortunately, many programs have been written which incorrectly store this return value straight into a char variable. Here is a typical example:
#include <stdio.h>

int
main (void)
{
  char c;
  while ((c = getchar()) != EOF) /* not portable */
    {
      printf ("read c = '%c'\n", c);
    }
  return 0;
}
This only works on platforms which default to a signed char type.[36] On platforms which use an unsigned char the same code will fail, because the value -1 becomes 255 when stored in an unsigned char. This usually causes an infinite loop because the end of the file cannot be recognized.[37] To be portable, the program should test the return value as an integer before coercing it to a char, as follows:
#include <stdio.h>

int
main (void)
{
  int i;
  while ((i = getchar()) != EOF)
    {
      unsigned char c = i;
      printf ("read c = '%c'\n", c);
    }
  return 0;
}
The same considerations described in this section apply to the definitions of bitfields in structs, which can be signed or unsigned by default. In GCC, the default type of bitfields can be controlled using the options -fsigned-bitfields and -funsigned-bitfields.
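A similar test can be written for bitfields. The following program is a rough sketch along the lines of the char example above; the struct and field names are invented for illustration, and the value printed in the signed case assumes the compiler wraps the out-of-range assignment to -1, as GCC typically does (possibly with an overflow warning):

#include <stdio.h>

struct flags
{
  int f : 3;   /* plain "int" bitfield: signedness is the default */
};

int
main (void)
{
  struct flags x;
  x.f = 7;     /* sets all three bits of the field */

  if (x.f > 3)
    printf ("bitfields are unsigned (f = %d)\n", x.f);
  else
    printf ("bitfields are signed (f = %d)\n", x.f);

  return 0;
}

Compiling this sketch with -funsigned-bitfields should select the first branch, and with -fsigned-bitfields the second; an explicit signed or unsigned keyword in the field declaration overrides both options.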
[35] MacOS X (Darwin) on PowerPC uses signed char, for consistency with other Darwin architectures.
[36] There is also a subtle error even on platforms with signed char: the character code 255 is spuriously interpreted as an end-of-file condition.
[37] If displayed, character code 255 often appears as ÿ.