Before there was C++ there was C. Developed at Bell Laboratories in the early 1970s, C was used for over 95% of the code in the UNIX operating system kernel. By the end of the 1970s, C compilers were available for most minicomputers and microcomputers. Superior to BASIC for applications requiring efficient code generation, C became the de facto standard for application software development on minicomputers and that new phenomenon, the PC. Meanwhile, Bjarne Stroustrup, a Bell Labs research computer scientist, released a prototype language called ``C With Classes.'' Originally a preprocessor that generated C code, this effort eventually produced C++. All the while, C itself continued to dominate application development for personal computer software and real-time control. Both C and C++ are used today for application software, systems software, and embedded systems.
C++ can be viewed as C with richer support for object oriented development. Conversely, C can be viewed as a primitive variant of C++. In any event, most legal C programs are also legal C++, and the C++ compiler is able to translate them. The few exceptions use features considered obsolete in C; these will not be discussed.
Learning C means stripping away some of the comfortable assumptions under which C++ programmers work: while C is simpler than C++, C also makes it easier to defeat type checking, scramble memory, and use pointers in an undisciplined way. The remainder of this document assumes you know C++ moderately well, and discusses C in this context.
In C++, one uses classes to structure the system into components. Most of the functions in C++ are members of
a class, and thus tightly related to the class's objects. However, C++ also supports free functions:
functions that are not part of a class. The free function you are most familiar with is main,
which gets things going by creating, initializing, and activating one or more objects. Other examples are numerical
functions such as sin and log, which really don't belong
to any class.
In C, all functions are free functions. There is no way in C to bind a function to a data structure or object. Any such relationships depend on the discipline of the programming team. The data structures manipulated by C functions must be either global variables or explicit parameters: C has no implicit argument like the this pointer in C++.
C does not have references or reference parameters. All arguments to C's free functions are passed by value. To achieve the effect of call by reference, pointers are used. Consider the following examples:
C++ version:
void swap(int& x, int& y) {
    int t ;
    t = x ; x = y ; y = t ;
}
. . .
swap(a, b) ;
C version:
void swap(int *x, int *y) {
    int t ;
    t = *x ; *x = *y ; *y = t ;
}
. . .
swap(&a, &b) ;
The first version is standard C++, using reference parameters to exchange two integers. The second
is the equivalent C version. As C always passes by value, we have to pass pointers
to the two integers being swapped, and use the indirection operator * to access
the values. In the C call, we use the ``address of'' operator &
to create the necessary pointers to a and b. In this
case, the C++ code is much cleaner, as the pointers are manipulated by the compiler ``under the hood.''
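The same pointer technique is how a C function hands back more than one result. The following is a minimal sketch; the function name divide and its parameter names are our own illustration, not part of any library:

```c
/* Hypothetical helper: delivers a quotient and a remainder through two
   pointer parameters. Returns 1 on success, or 0 (leaving the outputs
   untouched) when the denominator is zero. */
int divide(int numerator, int denominator, int *quotient, int *remainder) {
    if (denominator == 0) {
        return 0;
    }
    *quotient = numerator / denominator;    /* store through the pointers */
    *remainder = numerator % denominator;
    return 1;
}
```

A caller would write something like divide(17, 5, &q, &r), passing the addresses of its own variables q and r just as in the swap call.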
C functions are severely constrained as to the types of values they can return:
There is no way in C to overload a function (i.e., to have two or more functions with the same name but different argument lists). In a similar vein, there are no template functions in C.
The sequencing, iteration, and decision control structures in C are the same as in C++. That is, the while
and for loops, as well as the if and switch
statements, are identical. However, the types of values and structures you can use in decisions and iterations
are much simpler than in C++.
C data structures are fewer and simpler than those of C++. Most significantly, there are no classes in
C -- the closest we can come to a class is a struct , which is a collection of related
data items. The following subsections describe the basic C types and data structures.
The predefined scalar types are the same in C and C++, namely:
Floating point numbers: double (double precision) and float
(single precision). For most computations, double precision is preferable.
- Example constants: 4.0 3.14159 6.023e23
Integers: int, along with its short and long variants. Characters (char) are just very small (1 byte) integers. Though they usually
hold a single character, there is nothing to prevent their use as small numbers (other than a desire for sanity).
- Example constants: 1234 'x'
Character constants like 'x' are really small integers whose value
is the character's ASCII code. Some non-printing characters have special escape sequences:
\n      Newline
\r      Return
\t      Tab
\ooo    Character with octal ASCII code "ooo"
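Because characters are small integers, ordinary arithmetic works on them. Here is a short sketch of two common idioms; the helper names are hypothetical, and the upper-case conversion assumes an ASCII-compatible character set in which the letters are contiguous:

```c
/* Hypothetical helper: converts a digit character to its numeric value.
   C guarantees that '0' through '9' have consecutive codes. */
int digit_value(char c) {
    return c - '0';
}

/* Hypothetical helper: maps 'a'..'z' to 'A'..'Z' and leaves every other
   character alone. Assumes ASCII, where the letters are contiguous. */
char to_upper_ascii(char c) {
    if (c >= 'a' && c <= 'z') {
        return (char)(c - 'a' + 'A');
    }
    return c;
}
```

In real programs the standard header ctype.h offers toupper and related functions, which do not depend on the ASCII assumption.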
Arrays in C are declared by giving their size, where the subscripts range from 0 to (size-1). For example, the fragment:
double vector[20] ;
char name[16] ;
shows the declaration of a 20-element array of doubles, vector, and a 16-element
array of characters, name. The legal subscripts for the two arrays are 0 to 19
and 0 to 15, respectively.
Multiple dimension arrays (matrices) are treated as arrays of arrays in C. The following declaration defines a table of 10 character arrays, where each character array contains 16 characters:
char table[10][16] ;
To access the jth character in the ith row, you would write:
table[i][j]
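As a small illustration of double subscripts, the sketch below fills every row of such a table with a different letter. The names ROWS, COLS, and fill_table are ours, and the letter arithmetic assumes an ASCII-compatible character set:

```c
#define ROWS 10
#define COLS 16

char table[ROWS][COLS] ;   /* 10 rows of 16 characters each */

/* Hypothetical helper: fills row i with the letter 'a' + i, so row 0
   holds all 'a', row 1 all 'b', and so on. */
void fill_table(void) {
    int i, j ;
    for ( i = 0 ; i < ROWS ; i++ ) {
        for ( j = 0 ; j < COLS ; j++ ) {
            table[i][j] = (char)('a' + i) ;
        }
    }
}
```

The rows are stored one after another in memory, so the whole table occupies ROWS * COLS bytes.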
In C, an array's name does not refer to the array's contents! Instead, in almost every expression the name stands for the address of the first array element (element 0)! It is hard to over-emphasize this point, as it is the source of many subtle C errors. Consider the following declarations:
char nameA[16] ;
char nameB[16] ;
It is perfectly legal in C to write:
if ( nameA == nameB )
However, the comparison is not between the two 16-character strings, but between the addresses
of the two arrays. Given that ``nameA'' and ``nameB'' are different arrays, this comparison will always be false.
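To compare what the arrays actually hold, the bytes themselves must be examined. One possible sketch uses the standard memcmp function from string.h; the wrapper same_bytes is a hypothetical helper of our own:

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the first size bytes of a and b are
   identical, and 0 otherwise. memcmp returns 0 when the regions match. */
int same_bytes(const char *a, const char *b, size_t size) {
    return memcmp(a, b, size) == 0 ;
}
```

With the declarations above, same_bytes(nameA, nameB, 16) compares all 16 characters of each array, whatever their addresses.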
The rationale for this seemingly strange behavior is that it supports passing arrays by reference. When an
array is named as an argument, a pointer is actually passed. Consider a function to change all lower-case
letters in a string to 'X', and a call to that function using nameA
above:
void letter_to_X( char *string, int length ) {
    int i ;
    for ( i = 0 ; i < length ; i++ ) {
        if ( string[i] >= 'a' && string[i] <= 'z' )
            string[i] = 'X' ;
    }
}
. . .
letter_to_X( nameA, 16 ) ;
There are several things to note:
- The statement string[i] = 'X' uses a subscript with a pointer. This is
perfectly normal in C -- the compiler simply uses the pointer as the base address for its subscript calculations.
- If the length argument is incorrect,
or if the algorithm is wrong, it is possible to access and modify data that is not part of the array. Unlike
subscript errors in the lists, sequences, and strings of the RogueWave library, subscript errors in C are not caught.
Such errors are the source of many subtle failures in C programs.
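Since C will not check subscripts for us, any checking has to be written by hand. The following is a minimal sketch of one way to do that; put_char_checked is a hypothetical helper, not a library function:

```c
#include <stddef.h>

/* Hypothetical helper: stores c at position i only if i lies inside the
   array of the given size. Returns 1 on success and 0, without touching
   memory, if the subscript is out of range. */
int put_char_checked(char *array, size_t size, size_t i, char c) {
    if ( i >= size ) {
        return 0 ;
    }
    array[i] = c ;
    return 1 ;
}
```

The caller must still pass the correct size; the function can only be as reliable as that argument.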
In C, character string constants are constant arrays of characters. As in C++, such strings are enclosed in double quotes. They are constant because the text of the string is clearly spelled out between the quotes. They are arrays, as they occupy contiguous memory, with the characters numbered from zero. It is even possible (if a bit silly) to subscript strings; for example:
"abcdef"[3]
is the character 'd'.
The C (and C++) convention is that all strings are terminated by a null character
(character '\0'), which this document calls NULL. Strictly speaking, this character is named NUL,
while NULL is the null pointer constant; the two should not be confused. All constant strings have a NULL
appended by the compiler. All strings that are constructed in character array variables should have a NULL
appended. Most string manipulation functions use the NULL as a marker to terminate
processing. If the NULL is missing, unrelated areas of memory may be modified and
corrupted.
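When we build a string by hand in a character array, appending the terminator is our job. A small sketch follows, using a hypothetical helper named fill_string:

```c
#include <stddef.h>

/* Hypothetical helper: writes count copies of c into buffer and then the
   terminating '\0'. Returns 0, writing nothing, if the buffer of the
   given size cannot hold count characters plus the terminator. */
int fill_string(char *buffer, size_t size, char c, size_t count) {
    size_t i ;
    if ( size == 0 || count > size - 1 ) {
        return 0 ;
    }
    for ( i = 0 ; i < count ; i++ ) {
        buffer[i] = c ;
    }
    buffer[count] = '\0' ;   /* without this, the "string" never ends */
    return 1 ;
}
```

Note that an array of 8 characters can hold a string of at most 7 characters, because one element is needed for the terminator.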
Because string constants are arrays, the constraints from the previous section apply. In particular, it is impossible in C to assign a string constant to a character array directly, as the ``value'' of a string constant is the address of its first character. Instead, there are "library functions" to handle string copying, etc., the interface to which is available via:
#include <string.h>
Here are a few of the common functions:
char *strcpy(char *to, const char *from)
Copies the string from into to, and returns to. It assumes that:
- The from string is properly terminated with a NULL. The compiler ensures this for string constants.
- The to string has enough space allocated to hold all the characters in from plus the terminating NULL!
Example
(void) strcpy(nameA, "Joe Blow") ;
Note that the string is copied to an array, but we already know the array name is a pointer to its first element. Also, the cast of the return value to (void) says that we know there is a return value but we're ignoring it.
char *strcat(char *to, const char *from)
Appends the string from to the end of to, and returns to. It assumes that:
- Both from and to are properly terminated with a NULL.
- The to string has enough space allocated to hold all the characters in the combined strings plus the terminating NULL.
Example
(void) strcat(nameA, ", Jr.") ;
Adds the string ", Jr." to the end of nameA. Again, casting the return value to (void) says we're ignoring the return value.
size_t strlen(const char *s)
Returns the length of the string s; the result type, size_t, is an unsigned integer type. Note that this is the number of characters up to
but not including the NULL byte. This may be less than the space actually allocated
(for example, if a string does not fill up a character array). If the string was not terminated with a NULL,
then strlen will continue scanning through memory until either it finds a NULL
by coincidence or it causes a memory violation.
Example
if ( strlen(nameA) < 8 ) {
    (void) strcpy( nameB, nameA ) ;
}
The code above copies the string in nameA to nameB if the string in nameA is less than eight characters long.
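None of these functions knows how large the destination array is, so the space check has to come from us. The sketch below combines strlen and strcat for that purpose; append_checked is a hypothetical helper, and it assumes the destination already holds a properly terminated string:

```c
#include <string.h>

/* Hypothetical helper: appends suffix to the string already in dest, but
   only if the combined string plus its terminating '\0' fits in
   dest_size bytes. Returns 1 on success, 0 if there is not enough room
   (dest is then left unchanged). */
int append_checked(char *dest, size_t dest_size, const char *suffix) {
    size_t used = strlen(dest) ;
    size_t extra = strlen(suffix) ;
    if ( used + extra + 1 > dest_size ) {
        return 0 ;
    }
    (void) strcat(dest, suffix) ;
    return 1 ;
}
```

For the 16-character array nameA holding "Joe Blow", appending ", Jr." succeeds (13 characters plus the terminator), while a further three characters would not fit.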
As mentioned previously, a C structure is essentially a C++ class with no member functions and with all the data members public. Indeed, if you look in the C++ reference manual, you'll see that this is exactly how C++ defines the meaning of a struct.
Most C programmers use structures to provide an approximation to objects. The object data is stored in the structure's elements, and a set of C functions is developed to manipulate such structures. The difference between C and C++ is that the latter can enforce the access rules. That is, in C++ only member functions can get at the object's data. In C, it is a matter of convention and discipline as to which functions can manipulate a structure's contents.
Example
struct name {
    char first[16] ;
    char mi ;
    char last[20] ;
} ;
struct student {
    struct name stu_name ;
    int stu_number ;
    double stu_debt ;
} ;
struct student rit[15000] ;
In the example above, structure name has three components: a 16-character array for the first name, a single character for the middle initial, and a 20-character array for the last name. The second structure, student, uses the previous structure to define a component for the student's name, as well as two other components for the student number and amount of money owed. Finally, an array of 15000 student structures is defined to hold the overall RIT enrollment.
As with classes, dot notation is used to select components:
rit[6]
- The student structure at index 6 of the array.
rit[6].stu_name
- The name structure of that student.
rit[6].stu_name.mi
- The middle initial within that name.
Unlike arrays, a structure variable refers to the whole structure, not its address. When structures are passed to functions, a copy is passed, and changes to the copy are not reflected in the original structure. To affect a structure, the function argument must be a pointer.
Example
void clear_debt( struct student *p_student ) {
    p_student->stu_debt = 0.0 ;
    return ;
}
. . .
clear_debt( &rit[6] ) ;
The function clear_debt expects a pointer to a struct student,
and uses this to set the stu_debt component to zero. In the call to clear_debt,
the address-of operator & is used to create
a pointer to the student at index 6 (the seventh element of rit).
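The difference between passing a structure and passing a pointer to it can be seen in the following sketch. The struct account type is a hypothetical, cut-down stand-in for struct student, and both function names are ours:

```c
struct account {          /* hypothetical simplified record */
    int    number ;
    double debt ;
} ;

/* Receives a copy of the structure: the assignment changes only the
   copy, so the caller's structure is unaffected. */
void clear_copy( struct account a ) {
    a.debt = 0.0 ;
}

/* Receives a pointer: the assignment changes the caller's structure. */
void clear_in_place( struct account *p ) {
    p->debt = 0.0 ;
}
```

After clear_copy(acct) the caller's debt is unchanged; only clear_in_place(&acct) actually zeroes it.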
The need to use the struct keyword, possibly with an asterisk for a pointer, can
clutter up a C program's declarations. The typedef construct lets us create
suitable aliases for any type we wish.
Examples
typedef struct name Name ;
typedef struct student Student ;
typedef struct student *StudentPtr ;
After these declarations, we can use the identifiers Name, Student,
and StudentPtr instead of the longer forms using keywords and asterisks. For example,
the array of records can be declared as:
Student rit[15000] ;
and the function header becomes:
void clear_debt( StudentPtr p_student ) {
Both C and C++ support pointers to data in memory. We've already seen a couple of uses of pointers in previous examples. What follows are some more simple examples of pointers in C, using the student record declarations above:
Declaration
struct student *p_stu ;
or
StudentPtr p_stu ;
Assignment
p_stu = &rit[6] ;
Access
p_stu->stu_debt = p_stu->stu_debt * 1.02 ;
In both C and C++, pointers are used primarily to create dynamic data structures like lists and trees. C++ allocates
new objects and recycles existing ones with new and delete,
respectively. In addition, well-designed class libraries hide much of the complexity behind simpler class interfaces.
The RogueWave RWCString class, for example, provides strings that can grow and shrink.
The allocation and deallocation needed to support these strings is hidden in the implementation.
In C, many more of these details are visible to clients of a package built from pointers. What is more, the burden of allocating and freeing memory at the right time is on the programmer's shoulders. There is nothing like a C++ destructor in C. The following simple example will demonstrate the key memory management issues in C:
The Problem
Assume we decide to replace the array implementation of RIT's student database with one based on singly linked lists.
Solution Part 1
To do this, we'll define a new structure type StuNode which contains a student
structure and a pointer to the next StuNode in the list:
typedef struct stu_node *StuNodePtr ;
typedef struct stu_node {
    Student sn_student ;
    StuNodePtr sn_next ;
} StuNode ;
Next we'll define a global pointer rit_head to point to the first student in the list. As the list is initially empty, we'll set this pointer to zero (or NULL, a symbolic constant available in file stddef.h):
#include <stddef.h>
. . .
StuNodePtr rit_head = NULL ;
Solution Part 2
Now we need a function to add a new student to the list (the argument is the student to add). This will require access to the malloc memory allocation function, which is declared in stdlib.h. We'll assume the student is added at the list head:
#include <stdlib.h>
. . .
void add_student( Student new_stu ) {
    StuNodePtr new_node ;
    new_node =
        (StuNodePtr) malloc( sizeof(StuNode) ) ;
    new_node->sn_student = new_stu ;
    new_node->sn_next = rit_head ;
    rit_head = new_node ;
}
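The first assignment calls the system malloc function with the size (in bytes) of a StuNode structure. The return value is a pointer to newly allocated memory at least as large as that requested. Unfortunately, this pointer is not of type "StuNodePtr": ANSI C declares malloc as returning "void *" (older compilers used "char *"). To remedy this, we cast the pointer from its real type to the type we want: that's the purpose of "(StuNodePtr)" in front of malloc. A C++ compiler requires this cast. The remainder of the code copies the new student into the node, links the node to the current head of the list, and makes rit_head point to the new node. Note that malloc returns NULL when no memory is available; for brevity, the code above does not check for this.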
Solution Part 3
Finally, we need a way to dispose of a node structure when a student is removed from the list. We won't give the details of finding the node in the list and properly unlinking it; we'll assume we simply need to delete the space associated with the node:
void free_node( StuNodePtr p_stu ) {
    free( p_stu ) ;
}
The free function is also in the standard library, and it simply releases the
space associated with pointer p_stu. In ANSI C, free accepts a pointer of type (void
*), to which any data pointer converts automatically, and returns no value, so no casts
are needed. Older, pre-ANSI code often casts the argument to (char *) and the
result to (void). Once a memory region
is freed, no further reference to the region is allowed. If such pointer access occurs by accident, the program
will behave unpredictably.
NOTE: This can happen in C++ as well if there are any pointers to deleted objects!
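The allocation and release steps can be put together in one small sketch. To keep it self-contained, the node below carries a plain int rather than a Student, and the list head is passed as a parameter instead of being a global; the names Node, push_front, and free_list are our own illustration:

```c
#include <stdlib.h>

typedef struct node *NodePtr ;
typedef struct node {
    int     value ;     /* hypothetical payload standing in for Student */
    NodePtr next ;
} Node ;

/* Adds value at the head of the list. Returns 1 on success, or 0 if
   malloc could not supply memory (the list is then left unchanged). */
int push_front( NodePtr *head, int value ) {
    NodePtr new_node = (NodePtr) malloc( sizeof(Node) ) ;
    if ( new_node == NULL ) {
        return 0 ;
    }
    new_node->value = value ;
    new_node->next = *head ;
    *head = new_node ;
    return 1 ;
}

/* Frees every node and resets the head to NULL. The next pointer is
   saved before calling free because a freed node must not be read. */
void free_list( NodePtr *head ) {
    NodePtr p = *head ;
    while ( p != NULL ) {
        NodePtr next = p->next ;
        free( p ) ;
        p = next ;
    }
    *head = NULL ;
}
```

Resetting the head to NULL after freeing is one simple guard against the dangling pointer problem described above.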
Two other keywords that can be prepended to a declaration are extern and static.
An extern declaration gives the name and type of a variable or function, but does
not allocate any space. Such statements are typically found in header files defining the interface to a C module.
Typically the space is allocated when the variable or function is defined in the .c or .C
file that implements the module.
A static function or global variable is one whose scope is restricted to the current
implementation file. These declarations occur at the top of the source file, before any references to the associated
functions or variables. No other module can refer to these names, so they are effectively "hidden" within
the implementation file where they are defined. In essence, static provides a crude
form of private data and operations.
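As a sketch of how the two keywords work together, consider a tiny module that hands out identification numbers. Everything here (the names next_id, last_id, and bump, and the idea of a header called ids.h) is hypothetical; in a real program the extern line would live in the header and the rest in the implementation file:

```c
/* What a header such as ids.h would contain: a declaration only. */
extern int next_id(void) ;

/* What the implementation file would contain. */
static int last_id = 0 ;      /* hidden: no other file can name last_id */

static int bump(void) {       /* hidden helper function */
    last_id = last_id + 1 ;
    return last_id ;
}

int next_id(void) {           /* the one function visible to clients */
    return bump() ;
}
```

Clients can call next_id, but they cannot read or reset last_id, which is roughly what a private data member provides in C++.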
In summary, C's data structuring mechanisms are both more primitive and more unstructured than those of C++. They are primitive in that they do not support member functions, protected or private information, or generic (template) structures. They are more unstructured in that neither array subscripts nor pointers are checked for legality. The flexibility of this free-wheeling approach must be balanced against the increased probability of subtle, hard to locate bugs. The smart approach is to do as much as possible in C++, using stable, well-designed library classes.
C input/output is also primitive compared to C++. There are no input or output streams, nor are the >>
and << I/O operators available. As with string manipulation and memory allocation,
I/O is supported by a standard library, stdio.h. Access to the library interface is gained
by including its header file:
#include <stdio.h>
Here are a few of the functions provided.
getchar()
Returns the next character from standard input as an int. On end-of-file, the
special indicator EOF (a negative value) is returned.
Example
int c ;
for ( c = getchar() ; c != EOF ; c = getchar() ) {
    process_char( c ) ;
}
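The same loop shape works for any input stream. The sketch below counts lines using getc, the standard function that behaves like getchar but reads from a stream passed as an argument; count_lines is a hypothetical name of our own:

```c
#include <stdio.h>

/* Hypothetical helper: counts the newline characters on an input stream.
   The variable c is an int, not a char, so that EOF can be told apart
   from every valid character value. */
long count_lines( FILE *in ) {
    long lines = 0 ;
    int c ;
    while ( (c = getc(in)) != EOF ) {
        if ( c == '\n' ) {
            lines++ ;
        }
    }
    return lines ;
}
```

Calling count_lines(stdin) counts the lines on standard input, which is what a loop around getchar would do.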
printf(format [,args] )
printf has a variable length argument list. The first argument is a string specifying
the output format. Embedded in this string can be formatting specifiers, each preceded
by a '%'. The specifiers tell how to format the arguments that follow, from left to
right.
Example
printf( "String %s integer %d float %5.2f\n", s, i, f ) ;
The first argument after the format string, s, is handled by the %s specifier. This assumes the argument points to a string of characters, which are printed until the terminating NULL is encountered. The second, i, is printed as a decimal integer by the %d specifier.
Finally, the last argument, f, is printed as a floating point number by the %5.2f specifier. The "5.2" portion says to use a field 5 characters wide, and to print 2 digits to the right of the decimal point. There are many other output formats, and many other I/O functions as well. For all the details, consult the manual page for the standard I/O library:
man stdio
Please note that printf has no way of verifying the argument types. If you pass an integer where the format requires a pointer, you'll get weird output and possibly a core dump. You just have to be very careful.
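The same format specifiers are understood by snprintf, a standard function (since C99) that writes into a character array instead of standard output and never writes more than the stated number of bytes. The wrapper format_line below is a hypothetical helper for illustration:

```c
#include <stdio.h>

/* Hypothetical helper: formats a string, an integer, and a floating
   point number into buffer using the specifiers discussed above.
   Returns the number of characters the full result needs, not counting
   the terminating '\0'. */
int format_line( char *buffer, size_t size, const char *s, int i, double f ) {
    return snprintf( buffer, size,
                     "String %s integer %d float %5.2f", s, i, f ) ;
}
```

Formatting "abc", 42, and 3.14159 this way produces a 33-character string in which the number appears as 3.14 preceded by one padding space, since %5.2f asks for a field five characters wide.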
The Unix convention is that files ending in .c (lower case c) contain C code,
while those ending in .C (upper case C) are for C++. While we have a compiler that
handles just the C language, you are probably better off using the C++ compiler, CC,
for both C and C++ files. All the programs we'll develop should compile this way, though you may get warning messages
from C code because the C++ compiler is pickier about type checking.
NOTE: You may have to define a special .c.o rule in your
Makefiles to have C code compiled by the C++ compiler.