C programming techniques

modular programming / data encapsulation

In modular programming, a module is a data entity together with a programming interface for accessing and modifying the data. In contrast to a plain structure, code outside the module's own files never modifies the data by accessing structure fields directly.

This permits a programmer to maintain tight control of who can alter data.

If the way the data is stored in the structure later needs to change, the change can be made in the module's source file, and files using the module may not even need to be re-compiled!

In C, a module can be created by careful use of the typedef declaration. A type for the module is declared (but not defined) in a header file, together with functions for creating and deleting, accessing, and modifying the module. The module is referred to by pointers exclusively (this is the only way to refer to it, since the type hasn’t been defined).

`module.h`

typedef struct Module_struct Module;          /* module declaration */

Module *Module_create( void );                /* constructor */
void    Module_delete( Module * );            /* destructor */
int     Module_get_element( const Module * ); /* accessor */
void    Module_set_element( Module *, int );  /* modifier */

The corresponding source file defines the structure behind the module, as well as the functions.

`module.c`

#include <stdlib.h>

struct Module_struct {    /* module structure definition */
        int element;
};

#include "module.h"    /* include type declaration after structure definition */

Module *Module_create( void ) { return (Module *)calloc( 1, sizeof( Module ) ); }
void    Module_delete( Module * m ) { free( m ); }
int     Module_get_element( const Module * m ) { return m->element; }
void    Module_set_element( Module * m, int e ) { m->element = e; }

Now files that use the functionality provided by Module simply include module.h, and then create, refer to and manipulate Module instances by pointers using the functions provided.

Outside the file module.c, the exact form of the data in a Module instance is unknown. Also, at compile time, the size of a Module instance is unknown outside module.c, so you can’t use the sizeof operator on Module.
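As a rough illustration of the client side, the sketch below folds the header, a client function, and the module source into one file so that it compiles on its own. The name client_demo is hypothetical and stands in for any code that includes module.h; it sits before the structure definition on purpose, so it sees Module only as an incomplete type.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what a client sees (contents of module.h) --- */
typedef struct Module_struct Module;

Module *Module_create( void );
void    Module_delete( Module * );
int     Module_get_element( const Module * );
void    Module_set_element( Module *, int );

/* --- hypothetical client code: pointers and interface functions only --- */
void client_demo( void )
{
        Module *m = Module_create();

        if ( m == NULL )
                return;         /* allocation failed; nothing to demonstrate */

        Module_set_element( m, 42 );
        assert( Module_get_element( m ) == 42 );

        /* sizeof( Module ) or m->element would not compile here:
           Module is still an incomplete type at this point. */
        Module_delete( m );
}

/* --- what only module.c sees --- */
struct Module_struct {
        int element;
};

Module *Module_create( void ) { return calloc( 1, sizeof( Module ) ); }
void    Module_delete( Module *m ) { free( m ); }
int     Module_get_element( const Module *m ) { return m->element; }
void    Module_set_element( Module *m, int e ) { m->element = e; }
```

In a real project the three parts would live in module.h, a client source file, and module.c respectively.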

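In this example, the structure tag Module_struct never appears in client code, which refers to the type only as Module. It cannot be omitted, however: the tag is what ties the incomplete type declared in module.h to the structure defined in module.c.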

multiple aspects of data

Consider the simple case of a rectangle. You often want to think of a rectangle as four coordinate numbers, and often, as a thing having the properties of position and size. The C union declaration lets you use both aspects.

typedef struct {
        int          x, y;
        unsigned int width, height;
} Box;

typedef struct {
        int x, y;
} Position;

typedef struct {
        unsigned int width, height;
} Size;

typedef struct {
        Position   position;
        Size       size;
} PosSize;

typedef union Rect_tag {
        Box        coords;
        PosSize    props;
} Rectangle;

Now, to refer directly to one of the coordinates of a Rectangle, use its coords aspect:

Rectangle r;
r.coords.x = 0;

Using the props aspect, you can alter the size of the Rectangle in a single statement that copies structures:

Size d;
Rectangle r;
r.props.size = d;

Notice that to initialize a Rectangle union, you use nested braces (K&R A8.7):

Rectangle r = { { 0, 0, 0, 0 } };

Because coords is the first member of the union, the items in the initializer refer to Box elements.
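The two aspects only describe the same rectangle if Box and PosSize are laid out identically in memory. That holds on common compilers, but it is an assumption rather than something the declarations spell out. The sketch below checks it with offsetof before writing through one aspect and reading through the other. The helper names Rectangle_layout_matches and Rectangle_resize_and_read_width are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int x, y; unsigned int width, height; } Box;
typedef struct { int x, y; } Position;
typedef struct { unsigned int width, height; } Size;
typedef struct { Position position; Size size; } PosSize;
typedef union Rect_tag { Box coords; PosSize props; } Rectangle;

/* Hypothetical helper: returns 1 if both aspects overlay field for field. */
int Rectangle_layout_matches( void )
{
        return sizeof( Box ) == sizeof( PosSize )
            && offsetof( Box, x )      == offsetof( PosSize, position ) + offsetof( Position, x )
            && offsetof( Box, y )      == offsetof( PosSize, position ) + offsetof( Position, y )
            && offsetof( Box, width )  == offsetof( PosSize, size ) + offsetof( Size, width )
            && offsetof( Box, height ) == offsetof( PosSize, size ) + offsetof( Size, height );
}

/* Hypothetical helper: set the size through props, read width through coords. */
unsigned int Rectangle_resize_and_read_width( unsigned int w, unsigned int h )
{
        Rectangle r = { { 0, 0, 0, 0 } };
        Size      d;

        assert( Rectangle_layout_matches() );   /* guard the layout assumption */

        d.width  = w;
        d.height = h;
        r.props.size = d;        /* one structure copy sets both fields */
        return r.coords.width;   /* read back through the other aspect */
}
```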

string initialization

Here is an efficient way to initialize an automatic string variable:

#define EMPTY_STRING { '\0' }
char s[256] = EMPTY_STRING;

The string is a proper string of length 0 immediately after the declaration.

An alternative technique is to pass the string to memset. While it could be argued that this is more thorough, the initializer above already sets every remaining element of the array to zero, so memset adds nothing. It is also subject to its own errors (passing an incorrect buffer size), and it leaves the string in an uninitialized state between the declaration and the memset call.
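A small check of what the initializer does, assuming a conforming C compiler: elements not named in a brace initializer are set to zero, so the whole buffer is cleared, not just the first character. The function name empty_string_demo is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define EMPTY_STRING { '\0' }

/* Hypothetical check: the buffer is an empty string, and every byte
   after the first one is zero as well. */
void empty_string_demo( void )
{
        char   s[256] = EMPTY_STRING;
        size_t i;

        assert( strlen( s ) == 0 );
        for ( i = 0; i < sizeof s; i++ )
                assert( s[i] == '\0' );
}
```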

structure copy

The structure copying feature of C is under-used, but very useful for several purposes. This kind of copy is also called a shallow copy, because if the structure contains pointers, the memory pointed to is not copied.

A particularly nice example is a structure that holds values for a dialog box. Code for a program with such a box might involve lines like these:

typedef struct {
        int  x, y;
        char name[256];
} Values;

static const Values Values_ZERO = { 0, 0, { '\0' } };

Values	currentValues;

currentValues = Values_ZERO;	/* structure copy */

Here, we have created a constant Values structure, Values_ZERO, that contains default values (with the string field initialized by the preceding technique), and a working structure that is set to the default value by a structure copy.

Typical use of structure copy in a dialog box runs like this:

On startup:
1. Create the current values structure and initialize it to the default values
2. If values were saved, read them into the current values
When the dialog is opened:
1. Create a temporary structure and initialize it to the current values
2. The user makes changes to the temporary structure
3. If the user clicks the Cancel button, just close the dialog and forget the temporary values
4. If the user clicks the OK or Apply button, copy the temporary structure to the current structure
5. If the user clicks the Defaults button, copy the default values to the temporary values
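The steps above can be sketched as follows. All of the function names (Values_startup, Dialog_open, Dialog_edit and the Dialog_on_* handlers) are hypothetical stand-ins for whatever the GUI toolkit calls, and reading saved values on startup is left out.

```c
#include <assert.h>
#include <string.h>

typedef struct {
        int  x, y;
        char name[256];
} Values;

static const Values Values_ZERO = { 0, 0, { '\0' } };

Values currentValues;   /* settings the program actually uses */
Values tempValues;      /* scratch copy edited by the dialog */

void Values_startup( void )     { currentValues = Values_ZERO; }
void Dialog_open( void )        { tempValues = currentValues; }
void Dialog_on_defaults( void ) { tempValues = Values_ZERO; }
void Dialog_on_apply( void )    { currentValues = tempValues; }
void Dialog_on_cancel( void )   { /* nothing to undo: tempValues is abandoned */ }

/* Hypothetical stand-in for the user typing into the dialog. */
void Dialog_edit( int x, int y, const char *name )
{
        assert( name != NULL );
        tempValues.x = x;
        tempValues.y = y;
        strncpy( tempValues.name, name, sizeof tempValues.name - 1 );
        tempValues.name[sizeof tempValues.name - 1] = '\0';
}
```

Every transition is a single assignment, so adding a field to Values requires no change to any of these functions.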

Often code for such things does not employ structure copy, and instead, explicitly copies structure fields individually, or else uses individual variables rather than using a structure. Of course, such code can work, but is much more prone to errors of omission, as well as being complicated and messy.

This technique is best for structures that don’t contain pointers, but can be used for any structure. For complicated structures, however, it is best to present the structure as a module as described above and do the structure copying in the module source file.

It would have been nice if C had a way to compare two structures for identity (like a == b). However, because a structure can contain things like strings which can be considered identical although their buffers are different, and because of padding issues, the meaning of such an operation isn’t clear.
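One common substitute is an explicit comparison function that states what equality means for the type. The sketch below does this for the Values structure. The name Values_equal is hypothetical, and comparing the name field as a string (rather than byte for byte) is a design choice made for this example.

```c
#include <assert.h>
#include <string.h>

typedef struct {
        int  x, y;
        char name[256];
} Values;

/* Hypothetical helper: returns 1 if a and b hold the same values.
   The name field is compared as a string, so bytes after the
   terminating '\0' are ignored. */
int Values_equal( const Values *a, const Values *b )
{
        assert( a != NULL && b != NULL );
        return a->x == b->x
            && a->y == b->y
            && strcmp( a->name, b->name ) == 0;
}
```

A memcmp over the whole structure would be shorter, but it would also compare padding bytes and whatever follows the string terminator, which is exactly the ambiguity described above.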

“magic numbers”

One common programming no-no is known as the “magic number”, a number that appears in the code explicitly and has some special interpretation (to the programmer) besides its value.

There are several alternatives to using magic numbers that will result in your program being much more comprehensible to another programmer (or to yourself next year), and probably more robust and much easier to modify.

The most common technique is to use preprocessor defines,

#define	MY_SPECIAL_FIRST	1
#define	MY_SPECIAL_SECOND	2
	/* etc */

then to use the symbols such as MY_SPECIAL_FIRST exclusively in place of the magic numbers. Besides making the code much easier to understand, it makes it much easier to change and add to these special values.

This technique has the weakness that it isn’t type-safe: a user of a function that takes one of these values as an argument might mistakenly pass it some other integer value that causes weird problems down the line.

A much better technique is to define your numbers as a special type, and give the values names in an enum:

typedef enum {
	MY_SPECIAL_FIRST  = 1,
	MY_SPECIAL_SECOND = 2
} my_special;

void foo( int arg1, my_special arg2 );	/* function takes my_special arg */

/* ... */
	foo( 1, MY_SPECIAL_FIRST );	/* call must take my_special value */

This way, it is clear from the declaration that the function is meant to take only the values listed in your definition of the enum. Unfortunately, most C compilers implicitly convert integers to enums without a peep. In C++, however, an attempt to pass a plain integer to foo is a compile-time error.
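Since a C compiler will not reject a stray integer, one option is to check the value at run time. The sketch below is one way to do that. The helper my_special_is_valid and the body given to foo are hypothetical, and the assert is active only in builds without NDEBUG.

```c
#include <assert.h>

typedef enum {
        MY_SPECIAL_FIRST  = 1,
        MY_SPECIAL_SECOND = 2
} my_special;

/* Hypothetical helper: returns 1 if v is one of the named values. */
int my_special_is_valid( my_special v )
{
        switch ( v ) {
        case MY_SPECIAL_FIRST:
        case MY_SPECIAL_SECOND:
                return 1;
        }
        return 0;
}

/* Hypothetical body for foo: catch out-of-range values in debug builds. */
void foo( int arg1, my_special arg2 )
{
        assert( my_special_is_valid( arg2 ) );
        (void)arg1;     /* placeholder for the real work */
}
```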

use of header files

The main purpose of header files in C is to communicate type information between code modules (.c files). Type information includes types of variables, structures, classes, as well as function prototypes. Header files may also contain preprocessor symbols and macros (the latter are nowadays strongly discouraged). In C++, header files also contain inline source and templates.

So, what goes in a header file is precisely that information that needs to be communicated. Do not include definitions of structures or prototypes of functions that are only used in one source file; those should go within the source file itself. Do not rely on the compiler accepting a call to a function that has no prototype in scope: it cannot check the argument types, and implicit function declarations were removed from the language in C99. This leads to all sorts of confusions and failures.

Regarding #include lines in header files: A header file should include precisely those header files that are needed to define its contents. That is, if the header file is included in an otherwise empty source file, the source file should compile, and if any of the includes in the header file is then deleted, the source file should fail to compile.

It is extremely bad practice for compilation behavior to depend upon order of #include lines. It isn’t asking for trouble — it is stomping your feet and holding your breath for trouble.

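Finally, a header file should protect itself from multiple (and circular) inclusion via the usual mechanism. Avoid guard names that begin with an underscore followed by a capital letter, since such identifiers are reserved for the implementation:

#ifndef MY_FILENAME_H
#define MY_FILENAME_H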
...
#endif

Failure to follow these policies results in such things as: header files that only compile when included in a certain order, slow compilation, situations where any change to any header file results in a forced re-compile of the whole source tree, and other inefficiencies and confusions.

In C++, a common technique for controlling the size of the include tree is forward declarations. For example, if class A is associated with class B in such a way that A can be defined using only references and pointers to B objects, then before the definition of class A, one can put the forward declaration

class B;

and in the source file for A, one can include the header for class B. In this way, other source files that need info about A but don’t refer to details of B don’t have to include the header file for B.

The only excuse for function-like preprocessor macros is maintenance of code for legacy compilers. If you are still writing or using them, you probably need to read up on inline functions in the C99 standard, and just stop doing that.
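One concrete reason to prefer inline functions is that a macro argument may be evaluated more than once. The sketch below assumes a C99 compiler and counts the evaluations in each case. SQUARE_MACRO, square, next_value and the two counting functions are hypothetical names made up for this illustration.

```c
#include <assert.h>

/* Macro version: the argument expression appears twice in the expansion. */
#define SQUARE_MACRO( x ) ( (x) * (x) )

/* Inline-function version: the argument is evaluated exactly once. */
static inline int square( int x ) { return x * x; }

/* Hypothetical counter used to expose repeated evaluation. */
static int calls = 0;
static int next_value( void ) { return ++calls; }

/* Each function returns how many times next_value ran. */
int evaluations_with_macro( void )
{
        calls = 0;
        (void)SQUARE_MACRO( next_value() );
        return calls;
}

int evaluations_with_inline( void )
{
        calls = 0;
        (void)square( next_value() );
        return calls;
}

void macro_vs_inline_demo( void )
{
        assert( evaluations_with_macro()  == 2 );
        assert( evaluations_with_inline() == 1 );
}
```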