C programming techniques
modular programming / data encapsulation
The term module in modular programming refers to a data entity together with a programming interface for accessing and modifying the data. Unlike a plain structure, a module’s data is never modified by accessing structure fields directly from outside the files that define the module.
This permits a programmer to maintain tight control of who can alter data.
If later a change is needed in the way the data is stored in the structure, the change can be made, and files using the module may not even need to be re-compiled!
In C, a module can be created by careful use of the typedef
declaration. A type for the module is declared (but not defined) in a
header file, together with functions for creating and deleting, accessing,
and modifying the module. The module is referred to by pointers
exclusively (this is the only way to refer to it, since the type hasn’t
been defined).
module.h
    typedef struct Module_struct Module;           /* module declaration */

    Module *Module_create( void );                 /* constructor */
    void    Module_delete( Module * );             /* destructor */
    int     Module_get_element( const Module * );  /* accessor */
    void    Module_set_element( Module *, int );   /* modifier */
The corresponding source file defines the structure behind the module, as well as the functions.
module.c
    #include <stdlib.h>

    struct Module_struct {   /* module structure definition */
        int element;
    };

    #include "module.h"      /* include type declaration after structure definition */

    Module *Module_create()
    {
        return (Module *)calloc( 1, sizeof( Module ) );
    }

    void Module_delete( Module * m )
    {
        free( m );
    }

    int Module_get_element( const Module * m )
    {
        return m->element;
    }

    void Module_set_element( Module * m, int e )
    {
        m->element = e;
    }
Now files that use the functionality provided by Module simply include module.h, and then create, refer to and manipulate Module instances by pointers using the functions provided.
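As a rough illustration of what such a client looks like, the sketch below squeezes the header, a client function, and the module source into one file so it can be compiled on its own. The helper client_roundtrip is hypothetical and not part of the original interface. The client is deliberately placed before the structure definition to show that it compiles while Module is still an incomplete type.

```c
#include <stdlib.h>

/* --- what module.h exposes: an incomplete type plus functions --- */
typedef struct Module_struct Module;
Module *Module_create(void);
void    Module_delete(Module *);
int     Module_get_element(const Module *);
void    Module_set_element(Module *, int);

/* --- client code: compiled while Module is still incomplete --- */
/* Hypothetical helper: stores a value in a fresh Module and reads it back.
   Returns -1 if the module could not be allocated. */
int client_roundtrip(int value)
{
    Module *m = Module_create();
    int result;

    if (m == NULL)
        return -1;
    Module_set_element(m, value);
    result = Module_get_element(m);
    Module_delete(m);
    return result;
}

/* --- what module.c hides: the structure and the function bodies --- */
struct Module_struct {
    int element;
};

Module *Module_create(void)                 { return calloc(1, sizeof(Module)); }
void    Module_delete(Module *m)            { free(m); }
int     Module_get_element(const Module *m) { return m->element; }
void    Module_set_element(Module *m, int e) { m->element = e; }
```

In a real project the three parts would live in module.h, a client source file, and module.c respectively; only the last would ever need to change if the storage changed.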
Outside the file module.c, the exact form of the
data in a Module instance is unknown. Also, at compile time, the size
of a Module instance is unknown outside module.c, so you
can’t use the sizeof
operator on Module.
In this example, the identifier Module_struct is never used by client code. It cannot be omitted, however: the tag is what ties the incomplete type declared in module.h to the structure definition in module.c.
multiple aspects of same data
Consider the simple case of a rectangle. You often want to think of a
rectangle as four coordinate numbers, and often, as a thing having the
properties of position and size. The C union
declaration
lets you use both aspects.
    typedef struct {
        int x, y;
        unsigned int width, height;
    } Box;

    typedef struct {
        int x, y;
    } Position;

    typedef struct {
        unsigned int width, height;
    } Size;

    typedef struct {
        Position position;
        Size size;
    } PosSize;

    typedef union Rect_tag {
        Box coords;
        PosSize props;
    } Rectangle;
Now, to refer directly to one of the coordinates of a Rectangle, use its coords aspect:
Rectangle r; r.coords.x = 0;
Using the props aspect, you can alter the size of the Rectangle in a single statement that copies structures:

    Size d = { 640, 480 };
    Rectangle r;
    r.props.size = d;   /* structure copy */
Notice that to initialize a Rectangle union, you use nested braces (K&R A8.7):

    Rectangle r = { { 0, 0, 0, 0 } };

Because coords is the first member of the union, the items in the initializer refer to Box elements.
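The following sketch writes a Rectangle through one aspect and reads it through the other. The functions rect_layouts_match, rect_place, and rect_area are hypothetical names introduced for illustration. The technique assumes that Box and PosSize have the same layout, which holds on common platforms but is not guaranteed by the language, so the first function checks it.

```c
#include <stddef.h>

typedef struct { int x, y; unsigned int width, height; } Box;
typedef struct { int x, y; } Position;
typedef struct { unsigned int width, height; } Size;
typedef struct { Position position; Size size; } PosSize;
typedef union Rect_tag { Box coords; PosSize props; } Rectangle;

/* Returns 1 if both aspects place their fields at the same offsets. */
int rect_layouts_match(void)
{
    return sizeof(Box) == sizeof(PosSize)
        && offsetof(Box, y) == offsetof(PosSize, position) + offsetof(Position, y)
        && offsetof(Box, width) == offsetof(PosSize, size)
        && offsetof(Box, height) == offsetof(PosSize, size) + offsetof(Size, height);
}

/* Moves and resizes through the props aspect, using two structure copies. */
void rect_place(Rectangle *r, Position p, Size s)
{
    r->props.position = p;
    r->props.size = s;
}

/* Computes the area through the coords aspect. */
unsigned int rect_area(const Rectangle *r)
{
    return r->coords.width * r->coords.height;
}
```

A caller would typically check rect_layouts_match() once, for example in a start-up assertion, before relying on the two aspects being interchangeable.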
string initialization
Here is an efficient way to initialize an automatic string variable:
    #define EMPTY_STRING { '\0' }

    char s[256] = EMPTY_STRING;
The string is a proper string of length 0 immediately after the declaration.
An alternative technique is to pass the string to memset. While it could be argued that this is more thorough, it is subject to its own errors (passing an incorrect buffer size), it is typically no faster (the initializer above already zero-fills the remainder of the array), and it leaves the string in an uninitialized state between the declaration and the memset call.
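To make the comparison concrete, here is a small sketch of both approaches. The two checking functions are hypothetical and exist only to show that each approach ends with an empty string in a fully zeroed buffer.

```c
#include <string.h>

#define EMPTY_STRING { '\0' }

/* Returns 1 if every byte of buf is zero. */
static int all_zero(const char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (buf[i] != '\0')
            return 0;
    return 1;
}

/* Initializer approach: s is a valid empty string from the declaration on. */
int initializer_gives_empty_string(void)
{
    char s[256] = EMPTY_STRING;

    return strlen(s) == 0 && all_zero(s, sizeof s);
}

/* memset approach: s is indeterminate until the call completes, and the
   size argument has to be kept in step with the declaration by hand. */
int memset_gives_empty_string(void)
{
    char s[256];

    memset(s, 0, sizeof s);
    return strlen(s) == 0 && all_zero(s, sizeof s);
}
```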
application of structure copy
The structure copying feature of C is under-used, but very useful for several purposes. This kind of copy is also called a shallow copy, because if the structure contains pointers, the memory pointed to is not copied.
A particularly nice example is a structure that holds values for a dialog box. Code for a program with such a box might involve lines like these:
    typedef struct {
        int x, y;
        char name[256];
    } Values;

    static const Values Values_ZERO = { 0, 0, { '\0' } };

    Values currentValues;
    currentValues = Values_ZERO;   /* structure copy */
Here, we have created a constant Values structure, Values_ZERO, that contains default values (with the string field initialized by the preceding technique), and a working structure that is set to the default value by a structure copy.
Typical use of structure copy in a dialog box runs like this:
- On startup,
  - Create current values structure and initialize to default values
  - If values were saved, read them into current values
- When the dialog is opened
  - A temporary structure is created and initialized to the current values
  - User makes changes to the temporary structure
  - If user clicks the Cancel button, just close the dialog and forget the temporary values
  - If user clicks the OK or Apply button, copy the temporary structure to the current structure
  - If user clicks the Defaults button, copy the default values to the temporary values
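The steps above can be sketched as follows. The Dialog structure and the Dialog_* functions are hypothetical names chosen for this illustration, and no real GUI toolkit is involved. Each button handler reduces to a single structure copy.

```c
#include <string.h>

typedef struct { int x, y; char name[256]; } Values;

static const Values Values_ZERO = { 0, 0, { '\0' } };

/* Hypothetical dialog state: the live values plus a scratch copy. */
typedef struct {
    Values current;   /* values the program actually uses */
    Values temp;      /* copy edited while the dialog is open */
} Dialog;

/* On startup: begin with the defaults. */
void Dialog_startup(Dialog *d)  { d->current = Values_ZERO; }

/* Dialog opened: work on a copy of the current values. */
void Dialog_open(Dialog *d)     { d->temp = d->current; }

/* User edits touch only the temporary structure. */
void Dialog_edit(Dialog *d, int x, int y, const char *name)
{
    d->temp.x = x;
    d->temp.y = y;
    strncpy(d->temp.name, name, sizeof d->temp.name - 1);
    d->temp.name[sizeof d->temp.name - 1] = '\0';
}

/* OK or Apply: commit the temporary values. */
void Dialog_apply(Dialog *d)    { d->current = d->temp; }

/* Defaults: reset the temporary values only. */
void Dialog_defaults(Dialog *d) { d->temp = Values_ZERO; }

/* Cancel: nothing to copy, the temporary values are simply forgotten. */
void Dialog_cancel(Dialog *d)   { (void)d; }
```

Adding a field to Values requires no change to any of these functions, which is the main attraction of the approach.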
Often code for such things does not employ structure copy, and instead, explicitly copies structure fields individually, or else uses individual variables rather than using a structure. Of course, such code can work, but is much more prone to errors of omission, as well as being complicated and messy.
This technique is best for structures that don’t contain pointers, but can be used for any structure. For complicated structures, however, it is best to present the structure as a module as described above and do the structure copying in the module source file.
It would have been nice if C had a way to compare two structures for equality (like a == b). However, because a structure can contain things like strings, which can be considered identical although their buffers are different, and because of padding issues, the meaning of such an operation isn’t clear.
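In practice, equality has to be spelled out per structure. The sketch below uses a hypothetical Values_equal function for the Values structure from the dialog example, and a second hypothetical function that shows why a byte-wise memcmp is not a substitute: two buffers can hold the same string while differing in the bytes after the terminator.

```c
#include <string.h>

typedef struct { int x, y; char name[256]; } Values;

/* Field-by-field comparison: strings are compared as strings. */
int Values_equal(const Values *a, const Values *b)
{
    return a->x == b->x
        && a->y == b->y
        && strcmp(a->name, b->name) == 0;
}

/* Returns 1 if two structures are equal field by field
   although their raw bytes differ. */
int equal_but_bytes_differ(void)
{
    Values a = { 0, 0, { '\0' } };
    Values b = { 0, 0, { '\0' } };

    strcpy(b.name, "stale");   /* leave old characters in the buffer */
    b.name[0] = '\0';          /* b.name is again the empty string */

    return Values_equal(&a, &b)
        && memcmp(&a, &b, sizeof a) != 0;
}
```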
alternatives to “magic numbers”
One common programming no-no is known as the “magic number”, a number that appears in the code explicitly and has some special interpretation (to the programmer) besides its value.
There are several alternatives to using magic numbers that will result in your program being much more comprehensible to another programmer (or to yourself next year), and probably more robust and much easier to modify.
The most common technique is to use preprocessor defines

    #define MY_SPECIAL_FIRST  1
    #define MY_SPECIAL_SECOND 2
    /* etc */

then to use symbols such as MY_SPECIAL_FIRST exclusively in place of the magic numbers. Besides making the code much easier to understand, it makes it much easier to change and add to these special values.
This technique has the weakness that it isn’t type-safe: a user of a function that takes one of these values as an argument might mistakenly pass it some other integer value that causes weird problems down the line.
A much better technique is to define your numbers as a special type, and give the values names in an enum:

    typedef enum {
        MY_SPECIAL_FIRST = 1,
        MY_SPECIAL_SECOND = 2
    } my_special;

    void foo( int arg1, my_special arg2 );  /* function takes my_special arg */

    /* ... */

    foo( 1, MY_SPECIAL_FIRST );             /* call must take my_special value */

This way, it is clear from the declaration that the function can only take arguments whose values are listed in your definition of the enum.
Unfortunately, C allows any integer to be converted implicitly to an enum type, so most C compilers accept such a call without a peep. In C++, however, an attempt to pass a plain integer to foo is a compile-time error.
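Since a C compiler will not reject an out-of-range value, one option is to check the argument at run time. The sketch below is one possible way to do that. The names my_special_is_valid and foo_checked are hypothetical, and foo_checked returns a status code, unlike the foo declared above.

```c
typedef enum {
    MY_SPECIAL_FIRST  = 1,
    MY_SPECIAL_SECOND = 2
} my_special;

/* Returns 1 if v is one of the enumerators of my_special, 0 otherwise. */
int my_special_is_valid(int v)
{
    switch (v) {
    case MY_SPECIAL_FIRST:
    case MY_SPECIAL_SECOND:
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical variant of foo: returns 0 on success,
   -1 if arg2 is not a listed my_special value. */
int foo_checked(int arg1, my_special arg2)
{
    if (!my_special_is_valid(arg2))
        return -1;
    (void)arg1;   /* the real work would go here */
    return 0;
}
```

The switch has to be extended whenever an enumerator is added, so the check is best kept next to the enum definition.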
proper use of header files
The purpose of header files in C is to communicate type information between code modules (.c files). Type information includes types of variables, structures, classes, as well as function prototypes. In C++, header files also contain inline source and templates.
So, what goes in a header file is precisely that information that needs to be communicated. Do not include definitions of structures or prototypes of functions that are only used in one source file; those should go within the source file itself. Do not rely on the compiler’s ability to find a function of the right name without prototypes. This leads to all sorts of confusions and failures.
Regarding #include lines in header files: A header file should include precisely those header files that are needed to define its contents. That is, if the header file is included in an otherwise empty source file, the source file should compile, and if any of the includes in the header file is then deleted, the source file should fail to compile.
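Finally, a header file should protect itself from multiple (including circular) inclusion via the usual mechanism

    #ifndef MY_FILENAME_H
    #define MY_FILENAME_H
    ...
    #endif

The guard name is written without a leading underscore because identifiers that begin with an underscore followed by a capital letter are reserved for the implementation.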
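The effect of the guard can be seen by simulating a double inclusion in a single file. In this sketch the contents of a hypothetical header point.h are pasted twice, as the preprocessor would do if two other headers both included it. The Point type and Point_make function are illustrative only.

```c
/* ---- first #include "point.h" ---- */
#ifndef POINT_H
#define POINT_H
typedef struct { int x, y; } Point;
Point Point_make(int x, int y);
#endif /* POINT_H */

/* ---- second #include "point.h", e.g. via another header ---- */
#ifndef POINT_H   /* already defined, so everything up to #endif is skipped */
#define POINT_H
typedef struct { int x, y; } Point;   /* without the guard: conflicting types */
Point Point_make(int x, int y);
#endif /* POINT_H */

/* ---- point.c ---- */
Point Point_make(int x, int y)
{
    Point p;

    p.x = x;
    p.y = y;
    return p;
}
```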
Failure to follow these policies results in such things as: header files that only compile when included in a certain order, slow compilation, situations where any change to any header file results in a forced re-compile of the whole source tree, and other inefficiencies and confusions.
In C++, a common technique for controlling the size of the include tree is forward declarations. For example, if class A is associated with class B in such a way that A can be defined using only references and pointers to B objects, then before the definition of class A, one can put the forward declaration
class B;
and in the source file for A, one can include the header for class B. In this way, other source files that need info about A but don’t refer to details of B don’t have to include the header file for B.
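The same idea carries over to plain C with structure tags. The sketch below is an analogue under that assumption, not the C++ code the paragraph describes. The file names in the comments and all A_* functions are hypothetical.

```c
#include <stddef.h>

/* ---- a.h: needs only pointers to B, so a forward declaration suffices ---- */
struct B;                          /* forward declaration, no b.h needed */

typedef struct A {
    int id;
    struct B *partner;             /* pointer to an incomplete type is fine */
} A;

void A_init(A *a, int id);
void A_attach(A *a, struct B *b);
int  A_has_partner(const A *a);
int  A_partner_weight(const A *a);

/* ---- b.h: the full definition, included only where details are used ---- */
struct B {
    int weight;
};

/* ---- a.c: includes both a.h and b.h ---- */
void A_init(A *a, int id)         { a->id = id; a->partner = NULL; }
void A_attach(A *a, struct B *b)  { a->partner = b; }
int  A_has_partner(const A *a)    { return a->partner != NULL; }

/* Only this function looks inside B, so only a.c needs b.h. */
int A_partner_weight(const A *a)
{
    return a->partner != NULL ? a->partner->weight : 0;
}
```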