Creating and Passing Objects by Value in C

Martin ISDN

Jul 9, 2021

GPL3

C, pure functions, multiple files, the stack and the heap

Download source code - 29.3 KB

Introduction

What is the optimum size of a function? How many things should a function do? What should be the desired minimum size? Does one line make sense? By which criteria do I sort functions into files?

This article is aimed at readers with a more academic C background: familiar with moderately complex algorithmic tasks, but lacking blue-collar, day-to-day production experience.

Pure Functions

Functions are meaningless without data. Some deal with a datatype as a whole and some with each of its elementary aspects. If you have a Person datatype, the former operate on Persons, the latter on a Person's properties, like mass, age, name, address, picture...

I found a more indirect solution to my 'size of a function' problem in pure functions. Pure functions don't have side effects. They receive copies of their data through parameters, perform whatever calculations they need and return a result. A pure function always returns the same result given the same arguments.

Having only one value to return to the outside world, a pure function tells you all about its size without any mention of quantities like lines of code: it is supposed to have only one effect. A function should not use pointers to modify five outer objects of different types.

Some elementary values are bound to others and are meaningless alone. Together they represent a logical entity. For instance, a 3D point could represent a Person's location. It is more practical to have the object's location returned as a point than to have three different functions that return its x, y and z coordinates.

Languages like Python can return multiple values from a function, and I think C's approach is more appropriate. Why would you want to return a Person's mass, picture and address from one function? What you want is to return, as one thing, a few elementary values that together represent a logical entity, in a struct: for example, some bio info like weight, height and age.
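A minimal sketch of that idea, assuming a hypothetical struct bio and two pure functions that are not part of the article's source. The three related values travel together as one return value, copied out of the callee's stack frame, and the argument passed in is never modified.

```c
/* Hypothetical type: elementary values that only make sense together. */
struct bio {
    unsigned weight_kg;
    unsigned height_cm;
    unsigned age;
};

/* Pure function: builds the struct from its arguments and returns it by value. */
struct bio bio_new(unsigned weight_kg, unsigned height_cm, unsigned age) {
    struct bio b;
    b.weight_kg = weight_kg;
    b.height_cm = height_cm;
    b.age = age;
    return b;
}

/* Pure function: works on its own copy and returns it; the caller's value is untouched. */
struct bio bio_older(struct bio b, unsigned years) {
    b.age += years;
    return b;
}
```

A caller gets all three values at once, e.g. struct bio b = bio_older(bio_new(70, 180, 30), 1);, with no output parameters involved.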

The Heap and the Stack

I avoid malloc and the heap whenever possible. It makes me happy to think that the default way of passing arguments in C is by copy, that is, by value. Sorry, not the default: the only way. Unlike C++ and Pascal, C has no pass-by-reference. What you have is a pointer datatype that you can pass by value, and that value is good at representing references to objects.
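A small sketch of what "pass by value only" means in practice, using a hypothetical struct box: the first function gets a copy of the struct, the second gets a copy of a pointer, and the third shows that even the pointer itself is a copy.

```c
/* Hypothetical type used only for this illustration. */
struct box {
    int value;
};

/* Receives a copy of the struct: the increment is lost when the function returns. */
int box_bump_copy(struct box b) {
    b.value += 1;
    return b.value;
}

/* Receives a copy of a pointer: the copy still points at the caller's object,
   so the increment is visible to the caller. */
void box_bump_ptr(struct box *b) {
    b->value += 1;
}

/* The pointer is passed by value too: reassigning it changes only the callee's
   copy, so the write below lands in *other, while the caller's pointer is unchanged. */
void box_retarget(struct box *b, struct box *other) {
    b = other;
    b->value = -1;
}
```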

The code examples here are without practical meaning, just to illustrate points in the discussion.

Example 1

point.h

struct point {
    unsigned x;
    unsigned y;
};

struct point point_new(unsigned, unsigned);
struct point point_move(struct point, int, int);

point.c

#include "point.h"

struct point point_new(unsigned x, unsigned y) {
    struct point t;
    t.x = x;
    t.y = y;
    return t;
}

struct point point_move(struct point t, int dx, int dy) {
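    /* the cast to int makes a move past 0 clamp to 0 instead of wrapping
       around, as unsigned arithmetic would (assumes coordinates fit in an int) */
    t.x = (int)t.x + dx > 0 ? t.x + dx : 0;
    t.y = (int)t.y + dy > 0 ? t.y + dy : 0;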
    return t;
}

rect.h

struct rect {
    struct point p;
    unsigned w;
    unsigned h;
};

struct rect rect_new(struct point, unsigned, unsigned);
struct rect rect_move(struct rect, int, int);
struct rect rect_size(struct rect, unsigned, unsigned);

rect.c

#include "point.h"
#include "rect.h"

struct rect rect_new(struct point p, unsigned w, unsigned h) {
    struct rect t;
    t.p.x = p.x;
    t.p.y = p.y;
    t.w = w;
    t.h = h;
    return t;
}

struct rect rect_move(struct rect t, int dx, int dy) {
    return rect_new(point_move(t.p, dx, dy), t.w, t.h);
}

struct rect rect_size(struct rect t, unsigned w, unsigned h) {
    t.w = w;
    t.h = h;
    return t;
}

example1.c

#include "point.h"
#include "rect.h"

#include <stdio.h>

int main(void) {
    struct rect a = rect_new(point_new(5, 10), 20, 30);
    printf("rectangle at (%u, %u), width %u, height %u\n", a.p.x, a.p.y, a.w, a.h);

    a = rect_size(rect_move(a, 10, 5), 40, 50);
    printf("rectangle at (%u, %u), width %u, height %u\n", a.p.x, a.p.y, a.w, a.h);
    
    return 0;
}

compile:

cl example1.c point.c rect.c

output:

rectangle at (5, 10), width 20, height 30
rectangle at (15, 15), width 40, height 50

In the Appendix at the end of this article, there is information on how to get some free compilers and compile the examples on Windows.

Let's get the notions of caller and callee straight. If function A is called from inside the body of function B, we say that B is the caller and A is the callee. On the call stack, the callee's frame is always on top of the caller's.

The crucial point here is that objects of type point and rect are created as local variables of the callee and returned to the caller, where, again, they are just local variables in the caller's stack frame. No pointers, no allocation on the heap, and when the main function exits, everything is nicely cleaned up.

You would also get everything cleaned up even if you used the heap and forgot to free the allocated memory: on a typical desktop operating system, when the program terminates, the process that runs it terminates too, and no memory can get lost. It was virtual memory to begin with. Still, you wouldn't want to send an embedded device into space with memory leaks like that.

Every newcomer to C eventually notices that, instead of passing the entire object to the callee, the pros pass only a pointer to it. This has always been justified as faster and more efficient: the larger the structure, the more it pays to pass its pointer instead of the entire structure. There is no reason you cannot do that right here as well.
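If a function only needs to read a large structure, a common idiom is to pass a pointer to const: only an address is copied, and the compiler rejects accidental writes through it. A minimal sketch, assuming hypothetical rect_area and rect_areap helpers and a standalone copy of Example 1's point and rect layouts:

```c
struct point {
    unsigned x;
    unsigned y;
};

struct rect {
    struct point p;
    unsigned w;
    unsigned h;
};

/* By value: the whole struct is copied into the callee's stack frame. */
unsigned long rect_area(struct rect r) {
    return (unsigned long)r.w * r.h;
}

/* By pointer to const: only an address is copied; assigning to r->w here
   would be a compile error. */
unsigned long rect_areap(const struct rect *r) {
    return (unsigned long)r->w * r->h;
}
```

For a struct of four unsigned values the difference is negligible; the pointer version starts to pay off as structures grow.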

The stack is the most used data structure in all of computing. It is so important that much has been done to make it efficient. CPUs have hardware support for it: a stack pointer register, special instructions that move data to and from the stack... If any memory is likely to be in the CPU cache, it is probably the top of the stack. Allocating each object separately on the heap is an invitation to a cache miss when you later dereference that object.

If others are willing to sacrifice efficiency for the benefits of a virtual machine, I'm willing to sacrifice some efficiency by copying entire structures on the stack, and I bet C can do it faster than whatever the virtual machine does. But large objects (a kilobyte or more) on the stack defeat its purpose, as you cannot fit many of them in the CPU cache.

Function calls pile up on the stack in the same First-In-Last-Out manner as everything else, which means the caller's local variables remain valid for as long as the callee runs, so you can safely pass their addresses to the callee. We are going to add some functions to the point and rect classes that use this. It's a bastardization of the idea of nested functions found in some languages. For instance, Pascal uses a hidden parameter for nested functions that enables them to access the local variables of the enclosing function: a pointer to that function's stack frame.

Much like that other bastardization: a hidden parameter that points to the object, which makes y.f(x) look like something other than f(&y, x)...
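To make the comparison concrete, a sketch with hypothetical names: a struct that carries a function pointer makes a call look like a method call, but the object's address is still passed explicitly, and calling through the member is the same as calling the function directly.

```c
struct tally;

/* The "method" is a plain function; self is the parameter C++ would hide as this. */
typedef void (*tally_add_fn)(struct tally *self, int n);

struct tally {
    int value;
    tally_add_fn add;
};

void tally_add(struct tally *self, int n) {
    self->value += n;
}

struct tally tally_new(int value) {
    struct tally t;
    t.value = value;
    t.add = tally_add;
    return t;
}
```

With struct tally y = tally_new(0);, the calls y.add(&y, 5) and tally_add(&y, 5) do exactly the same thing.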

The venerable master Qc Na once told his student, Anton - "objects are a poor man's closures" and "closures are a poor man's objects".
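In the spirit of that koan, here is a minimal sketch, with hypothetical names, of passing the address of the caller's locals down to a helper. The helper plays the part of a nested function, the explicit pointer stands in for Pascal's hidden frame pointer, and the struct of shared locals is roughly what a closure would carry around. The opposite direction is not safe: returning the address of a callee's local leaves a dangling pointer once the callee's frame is popped.

```c
/* Locals of the "enclosing" function that the helper is allowed to touch. */
struct stats {
    long total;
    int max;
};

/* "Nested" helper: env is an explicit stand-in for the hidden frame pointer. */
static void visit(struct stats *env, int value) {
    env->total += value;
    if (value > env->max)
        env->max = value;
}

/* Enclosing function (assumes n > 0): env lives in this frame, below the
   helper's frame, so passing &env down is safe for the duration of every call.
   The result is then returned by value, not by address. */
struct stats stats_of(const int *a, unsigned n) {
    struct stats env;
    unsigned i;
    env.total = 0;
    env.max = a[0];
    for (i = 0; i < n; i++)
        visit(&env, a[i]);
    return env;
}
```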

Decomposing to Multiple Files in C

There are two datatypes, point and rect (rectangle). Each of them lives in two files: a header file with the .h extension, and a file with the .c extension, which is called a source file in C jargon.

The first file is the interface, the second is the implementation. This is a convention rather than a rule, and it is up to you: #include will paste in a file with any name, so you could switch the extensions of the files, provided that you also switch them in the include directives and make sure the compiler treats the renamed implementation files as C source.

In C, a header file is inserted into the source file as plain text, at the place where the #include directive is. Conceptually, a new temporary text is produced from the source file with all the header files pasted into it. That is what the compiler translates into an object file, which is why it is called a translation unit.

The code in those five files could easily fit in one .c file, so why did I split it into multiple files?
People seem to split programs to separate and group things logically, creating some sort of utopia, but the most powerful reason is that work can then be split among multiple people, each responsible for one or more source files.

What criteria did I use to split them the way I did, and why didn't I put all the _move functions in one file and the _new functions in another?
The criterion is the data. Data and code are equal parts of a program, but the data is more equal.

You organize your functions and source files around the datatypes, and you can think of every datatype, together with its functionality, as a programming library. Whatever new functionality you add to a type (like point here), it is considered good practice to put it into point.c rather than implementing it in place in whatever other file you happen to be coding at the moment.

A datatype together with all the functions that operate on its data is called a class. Working with an object only through its type's interface is called encapsulation.

Example 1 has poor encapsulation. Not just because you can peek and poke into the datatype directly, but because the functions do not provide enough support to avoid doing so. I have also meddled with the encapsulation of point: in rect_new, I set the rect's point with t.p.x = p.x and t.p.y = p.y. Here be dragons. I should have written t.p = p.

When a project grows to 1500 files, 500 of which deal with points in this manner, and the way point works changes, then instead of making that change only in point's implementation, I'll have to make changes in 500 files.

There is something fishy in the way I use the header files here. If you want to make a rect datatype for other people to use, you don't want them to have to get the order of the include directives right in their own source files.

The rect type depends on the point type, so it makes sense to include point.h directly in rect.h; that way, whoever uses rectangles doesn't have to include rect's dependencies in their own source file. But there is a trap here: C does not allow the same type to be defined more than once in a translation unit.

Imagine a scenario where someone uses rect but also wants to use point, so he includes point.h in his source file; then the point.h included by rect.h kicks in, and you have two definitions of the same thing.

Another scenario: someone uses a rect and a circle, so she includes both rect.h and circle.h in her source file. Now rect.h and circle.h each include point.h on their own. ERR: multiple declaration, earlier declaration or redefinition of the thing called point.

The way to prevent this is to use macro guards, also called include guards or header guards. I firmly believe it is beneficial to know how to include all the header files correctly without inclusion guards, but in practice inclusion guards are a must.

It is also good to know how to use a bare struct declaration, although the typedef keyword makes things more convenient.
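A minimal sketch of both styles, using a hypothetical vec2 type that is not part of the article's source:

```c
/* Bare declaration: the type's name is "struct vec2", and the keyword
   struct has to be repeated at every use. */
struct vec2 {
    int x;
    int y;
};

struct vec2 vec2_add(struct vec2 a, struct vec2 b) {
    a.x += b.x;
    a.y += b.y;
    return a;
}

/* typedef adds a plain alias for the same type; struct vec2 and vec2
   are interchangeable from here on. */
typedef struct vec2 vec2;

vec2 vec2_scale(vec2 a, int k) {
    a.x *= k;
    a.y *= k;
    return a;
}
```

Because struct tags and typedef names live in separate namespaces, the alias can reuse the tag name, which is what Example 2 does with point and rect.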

Example 2

point.h

#ifndef POINT_H
#define POINT_H

typedef struct point {
    unsigned x;
    unsigned y;
} point;

point point_new(unsigned, unsigned);
point point_move(point, int, int);
void point_movep(point *, int, int);

#endif

point.c

#include "point.h"

point point_new(unsigned x, unsigned y) {
    struct point t;
    t.x = x;
    t.y = y;
    return t;
}

point point_move(point t, int dx, int dy) {
    t.x = (int)t.x + dx > 0 ? t.x + dx : 0;
    t.y = (int)t.y + dy > 0 ? t.y + dy : 0;
    return t;
}

void point_movep(point * t, int dx, int dy) {
    t->x = (int)t->x + dx > 0 ? t->x + dx : 0;
    t->y = (int)t->y + dy > 0 ? t->y + dy : 0;
}
}

rect.h

#include "point.h"

#ifndef RECT_H
#define RECT_H

typedef struct rect {
    point p;
    unsigned w;
    unsigned h;
} rect;

rect rect_new(point, unsigned, unsigned);
rect rect_move(rect, int, int);
rect rect_size(rect, unsigned, unsigned);
void rect_movep(rect *, int dx, int dy);
void rect_sizep(rect *, unsigned w, unsigned h);

#endif

rect.c

#include "rect.h"

rect rect_new(point p, unsigned w, unsigned h) {
    rect t;
    t.p.x = p.x;
    t.p.y = p.y;
    t.w = w;
    t.h = h;
    return t;
}

rect rect_move(rect t, int dx, int dy) {
    return rect_new(point_move(t.p, dx, dy), t.w, t.h);
}

rect rect_size(rect t, unsigned w, unsigned h) {
    t.w = w;
    t.h = h;
    return t;
}

void rect_movep(rect * t, int dx, int dy) {
    point_movep(&t->p, dx, dy);
}

void rect_sizep(rect * t, unsigned w, unsigned h) {
    t->w = w;
    t->h = h;
}

example2.c

#include "rect.h"
#include "point.h"

#include <stdio.h>

int main(void) {
    rect a = rect_new(point_new(5, 10), 20, 30);
    printf("rectangle at (%u, %u), width %u, height %u\n", a.p.x, a.p.y, a.w, a.h);

    a = rect_size(rect_move(a, 10, 5), 40, 50);
    printf("rectangle at (%u, %u), width %u, height %u\n", a.p.x, a.p.y, a.w, a.h);

    rect_movep(&a, -5, -5);
    rect_sizep(&a, 60, 70);
    printf("rectangle at (%u, %u), width %u, height %u\n", a.p.x, a.p.y, a.w, a.h);
    
    return 0;
}

output:

rectangle at (5, 10), width 20, height 30
rectangle at (15, 15), width 40, height 50
rectangle at (10, 10), width 60, height 70

The #ifndef preprocessor directive works much like an ordinary if statement. If its condition is false (here, if POINT_H is already defined), the preprocessor skips everything up to the matching #endif, so that text does not enter the translation unit. Otherwise, if the macro POINT_H is not yet defined, it immediately gets defined, and the text with the point type definition enters the translation units of point.c, rect.c and example2.c.

Just to check things, I have reversed the order of the include directives for point.h and rect.h in example2.c, so it may seem as if rect is defined before point. However, rect.h includes point.h on its very first line. Since everything is guarded, when the C preprocessor gets to the line #include "point.h" in example2.c, it includes an empty string, a void, a nothing...
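The same effect can be reproduced in a single file by pasting the guarded header text twice, which is roughly what a translation unit looks like after point.h has arrived through two different #include directives. In this sketch the second copy is skipped because POINT_H is already defined; without the #ifndef/#define/#endif lines, the compiler would reject the second definition of struct point. The point_sum function is a hypothetical helper added only to use the type.

```c
/* First copy of the guarded header text: POINT_H is not defined yet,
   so the definition enters the translation unit and POINT_H gets defined. */
#ifndef POINT_H
#define POINT_H
typedef struct point {
    unsigned x;
    unsigned y;
} point;
#endif

/* Second copy, as if point.h were included again (for example via rect.h):
   POINT_H is already defined, so everything up to #endif is skipped. */
#ifndef POINT_H
#define POINT_H
typedef struct point {
    unsigned x;
    unsigned y;
} point;
#endif

unsigned point_sum(point p) {
    return p.x + p.y;
}
```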

Strings

There is room in the market for an entire book on C strings alone. Here, we are interested in the special case of passing strings by value.

To be able to pass strings by value in C, you need two things:

The string must have a known, fixed size.
It must be embedded in a struct.

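The former is needed because, to pass something by value to a function, it has to be a complete object: the compiler has to know the object's size beforehand to reserve the required space in the function's stack frame. The latter is needed because an array passed to a function decays into a pointer to its first element, so only an address is copied; a struct, on the other hand, is copied in full, including any arrays inside it.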
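A sketch of the difference, with hypothetical names (name40, mark_array, mark_struct): a parameter written as char s[40] is really a char *, so the callee sees and modifies the caller's array, while a struct that embeds the array is copied in full.

```c
#include <string.h>

/* Hypothetical wrapper: a fixed-size string embedded in a struct. */
typedef struct {
    char s[40];
} name40;

/* Despite the [40], s is really a char *: the write reaches the caller's array,
   and sizeof gives the size of a pointer (compilers may warn about this). */
size_t mark_array(char s[40]) {
    s[0] = 'X';
    return sizeof s;
}

/* The whole struct, array included, is copied into this function's frame,
   so the write only changes the local copy. */
size_t mark_struct(name40 n) {
    n.s[0] = 'X';
    return sizeof n.s;
}
```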

To make it look more like a real-world example, I have decided that objects like point and rect have a UUID string. What could be more fixed in size than that? If I decided to store something like a Person's full name instead of an ID, no problem: I'd just pick an arbitrarily large string to represent it, something like 80 characters.

There are names with fewer than 10 characters, so I'd be wasting 70 bytes. There are names that require more than 80 characters; who cares? You can never get it right. It's all a compromise, especially the way you handle strings in C. The most common way is to have a char pointer to a null-terminated string allocated on the heap.

I will use this UUID implementation, with all due respect to its authors, Paul J. Leach and Rich Salz. It depends on this document with the RSA Data Security, Inc. MD5 Message-Digest Algorithm by Ronald L. Rivest.

The source files, with their copyright notices, are included in the article's archive. I will build them into a static library, uuid.lib. Here, only the usage of their header files will be shown.

Example 3

point.h

#ifndef POINT_H
#define POINT_H

typedef struct {
    char id[40];
    unsigned x;
    unsigned y;
} point;

point point_new(unsigned, unsigned);
point point_move(point, int, int);

#endif

point.c

#include "point.h"
#include "sysdep.h"
#include "uuid.h"

#include <stdio.h>

point point_new(unsigned x, unsigned y) {
    long i;
    uuid_t u;
    point t;
    
    uuid_create(&u);
    sprintf(t.id, "%8.8x-%4.4x-%4.4x-%2.2x%2.2x-", u.time_low, u.time_mid,
        u.time_hi_and_version, u.clock_seq_hi_and_reserved,
        u.clock_seq_low);
    for (i = 0; i < 6; i++)
        sprintf(&t.id[24 + 2 * i], "%2.2x", u.node[i]);

    t.x = x;
    t.y = y;
    return t;
}

point point_move(point t, int dx, int dy) {
    t.x = (int)t.x + dx > 0 ? t.x + dx : 0;
    t.y = (int)t.y + dy > 0 ? t.y + dy : 0;
    return t;
}

rect.h

#include "point.h"

#ifndef RECT_H
#define RECT_H

typedef struct {
    char id[40];
    point p;
    unsigned w;
    unsigned h;
} rect;

rect rect_new(point, unsigned, unsigned);
rect rect_move(rect, int, int);
rect rect_size(rect, unsigned, unsigned);

#endif

rect.c

#include "rect.h"
#include "sysdep.h"
#include "uuid.h"

#include <stdio.h>

rect rect_new(point p, unsigned w, unsigned h) {
    long i;
    uuid_t u;
    rect t;

    uuid_create(&u);
    sprintf(t.id, "%8.8x-%4.4x-%4.4x-%2.2x%2.2x-", u.time_low, u.time_mid,
        u.time_hi_and_version, u.clock_seq_hi_and_reserved,
        u.clock_seq_low);
    for (i = 0; i < 6; i++)
        sprintf(&t.id[24 + 2 * i], "%2.2x", u.node[i]);

    t.p.x = p.x;
    t.p.y = p.y;
    t.w = w;
    t.h = h;
    return t;
}

rect rect_move(rect t, int dx, int dy) {
    return rect_new(point_move(t.p, dx, dy), t.w, t.h);
}

rect rect_size(rect t, unsigned w, unsigned h) {
    t.w = w;
    t.h = h;
    return t;
}

example3.c

#include "rect.h"
#include "point.h"

#include <stdio.h>

int main(void) {
    rect R = rect_new(point_new(5, 10), 20, 30);
    printf("rectangle R at (%u, %u), width %u, height %u\n", R.p.x, R.p.y, R.w, R.h);
    printf("rectangle R id %s\n", R.id);
    printf("point of rect R id %s\n", R.p.id);

    R = rect_size(rect_move(R, 10, 5), 40, 50);
    printf("rectangle R at (%u, %u), width %u, height %u\n", R.p.x, R.p.y, R.w, R.h);
    printf("rectangle R id %s\n", R.id);
    printf("point of rect R id %s\n", R.p.id);

    return 0;
}

compile:

cl example3.c point.c rect.c ..\uuid\uuid.lib wsock32.lib

output:

rectangle R at (5, 10), width 20, height 30
rectangle R id 2afc6b63-e09a-11eb-babc-e837ef16bb95
point of rect R id Ю№
rectangle R at (15, 15), width 40, height 50
rectangle R id 2afeccc2-e09a-11eb-babc-e837ef16bb95
point of rect R id ╝№

It has at least two issues.

The point id in the rect is garbage: I forgot to copy the id. If I keep adding new properties to point, I will have to remember to add code for them in rect. That's bad! My disrespect for point's otherwise permissive encapsulation bit me.

To mend this, one doesn't write:

t.p.id = p.id;
t.p.x = p.x;
t.p.y = p.y;

But simply:

t.p = p;

Always code what to do, not how to do it. Let the point take care of itself.

Notice that t.p.id = p.id would not work as expected; in fact, it is a compile error. In C you cannot assign an array by value the way you would an int. You have to embed the array in a struct, or cast it to one.
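Two common ways around this restriction, sketched with hypothetical names: copy the bytes with memcpy from string.h, or wrap the array in a struct (here str40w, similar in spirit to str40 in Example 4) so that plain assignment copies it.

```c
#include <string.h>

/* Hypothetical wrapper: the array becomes a struct member, so it can be assigned. */
typedef struct {
    char s[40];
} str40w;

/* Option 1: copy the bytes explicitly; "dst = src;" would not compile for arrays. */
void id_copy(char dst[40], const char src[40]) {
    memcpy(dst, src, 40);
}

/* Option 2: assigning the wrapper struct copies the embedded array. */
str40w id_copy_struct(str40w src) {
    str40w dst;
    dst = src;
    return dst;
}
```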

The second problem is that the id of the rect changes: rect_move calls rect_new, which generates a brand-new uuid. I chose to over-engineer the rect_move function. In my desire to make it one line, I reused point_move and rect_new, but that's overkill. Over-engineering is a weakness. Things should be done with as little work as possible.

Here, I can use one of those unholy procedures that take the address of a point and mutate its x and y, playing the game as if I were using a nested function in a higher-level language. Because the point_movep procedure's stack frame would be on top of rect_move's stack frame, it's safe.

Or I can use the function point_move to copy the entire point from the rect, modify the copy's x and y, then return the copy to replace the old point in the rect. Sounds more functional.

Or... I can do something more evil, later.

As for the question of the desired minimum size of a function: in Example 3, point_move is two lines and rect_move is one line... What justifies a function being only one or two lines? Another saying that goes well with "always code what to do, not how to do it" is "always program in the language of the domain". Meaning, it's better to see something like "move the point" than t->x = t->x + dx > 0 ? t->x + dx : 0

I'm a bit bored by the ugliness of the code inside point_new and rect_new, the code right after the call to uuid_create. It repeats itself, and it will keep repeating itself in every datatype I create that uses uuid.lib. Maybe it's time to move it into its own procedure and put that procedure into a new source file, or maybe into one of the source files of the uuid library's original authors, which feels a bit unethical.

I use the term procedure for the special case of a function that returns nothing and mutates an object in place, like the void procedures in Example 2. They may be more useful if I turn them into functions that take the address of the object to mutate and, when done, return that address back to the caller, so you can chain function calls.
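A sketch of the chaining this enables, in a self-contained file with simplified stand-ins for point, rect, rect_movep and rect_sizep (Example 4 below has the article's versions): each call returns the address it received, so the next call can take it as its argument. The (int) cast in the move is an assumption of this sketch: it keeps a move past the origin clamped at 0, provided coordinates fit in an int.

```c
typedef struct {
    unsigned x;
    unsigned y;
} point;

typedef struct {
    point p;
    unsigned w;
    unsigned h;
} rect;

/* Mutates in place and hands the same address back for chaining. */
rect *rect_movep(rect *t, int dx, int dy) {
    t->p.x = (int)t->p.x + dx > 0 ? t->p.x + dx : 0;
    t->p.y = (int)t->p.y + dy > 0 ? t->p.y + dy : 0;
    return t;
}

rect *rect_sizep(rect *t, unsigned w, unsigned h) {
    t->w = w;
    t->h = h;
    return t;
}
```

Usage: rect_sizep(rect_movep(&a, 10, 5), 40, 50); moves and resizes a in one expression, much like the value-based rect_size(rect_move(a, 10, 5), 40, 50), but without copying the rect.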

Example 4

id40.h

#ifndef ID40_H
#define ID40_H

typedef char id40[40];
typedef struct {
    id40 x;
} str40;

void id40_set(char *);

#endif

id40.c

#include "id40.h"
#include "sysdep.h"
#include "uuid.h"

#include <stdio.h>

void id40_set(id40 t) {
    long i;
    uuid_t u;
    
    uuid_create(&u);
    sprintf(t, "%8.8x-%4.4x-%4.4x-%2.2x%2.2x-", u.time_low, u.time_mid,
        u.time_hi_and_version, u.clock_seq_hi_and_reserved,
        u.clock_seq_low);
    for (i = 0; i < 6; i++)
        sprintf(&t[24 + 2 * i], "%2.2x", u.node[i]);
}

point.h

#include "id40.h"

#ifndef POINT_H
#define POINT_H

typedef struct {
    id40 id;
    unsigned x;
    unsigned y;
} point;

point point_new(unsigned, unsigned);

int point_equals(point, point);
int point_equalsp(point *, point *);

point point_move(point, int, int);
point * point_movep(point *, int, int);

#endif

point.c

#include "point.h"

point point_new(unsigned x, unsigned y) {
    point t;
    
    id40_set(t.id);
    t.x = x;
    t.y = y;
    return t;
}

int point_equals(point a, point b) {
    return a.x == b.x && a.y == b.y;
}

int point_equalsp(point * a, point * b) {
    return a == b;
}

point point_move(point t, int dx, int dy) {
    t.x = (int)t.x + dx > 0 ? t.x + dx : 0;
    t.y = (int)t.y + dy > 0 ? t.y + dy : 0;
    return t;
}

point * point_movep(point * t, int dx, int dy) {
    t->x = (int)t->x + dx > 0 ? t->x + dx : 0;
    t->y = (int)t->y + dy > 0 ? t->y + dy : 0;
    return t;
}

rect.h

#include "id40.h"
#include "point.h"

#ifndef RECT_H
#define RECT_H

typedef struct {
    char id[40];
    point p;
    unsigned w;
    unsigned h;
} rect;

rect rect_new(point, unsigned, unsigned);

int rect_equals(rect, rect);
int rect_equalsp(rect *, rect *);

rect rect_move(rect, int, int);
rect * rect_movep(rect *, int, int);

rect rect_size(rect, unsigned, unsigned);
rect * rect_sizep(rect *, unsigned, unsigned);

#endif

rect.c

#include "rect.h"

rect rect_new(point p, unsigned w, unsigned h) {
    rect t;

    id40_set(t.id);
    t.p = p;
    t.w = w;
    t.h = h;
    return t;
}

int rect_equals(rect a, rect b) {
    return point_equals(a.p, b.p) && a.w == b.w && a.h == b.h;
}

int rect_equalsp(rect * a, rect * b) {
    return a == b;
}

rect rect_move(rect t, int dx, int dy) {
    t.p = point_move(t.p, dx, dy);
    return t;
}

rect * rect_movep(rect * t, int dx, int dy) {
    point_movep(&t->p, dx, dy);
    return t;
}

rect rect_size(rect t, unsigned w, unsigned h) {
    t.w = w;
    t.h = h;
    return t;
}

rect * rect_sizep(rect * t, unsigned w, unsigned h) {
    t->w = w;
    t->h = h;
    return t;
}

example4.c

#include "rect.h"
#include "point.h"

#include <stdio.h>

int main(void) {
    rect R = rect_new(point_new(5, 10), 20, 30);
    printf("rectangle R at (%u, %u), width %u, height %u\n", R.p.x, R.p.y, R.w, R.h);
    printf("rectangle R id %s\n", R.id);
    printf("point of rectangle R id %s\n", R.p.id);

    R = rect_size(rect_move(R, 10, 5), 40, 50);
    printf("rectangle R at (%u, %u), width %u, height %u\n", R.p.x, R.p.y, R.w, R.h);
    printf("rectangle R id %s\n", R.id);
    printf("point of rectangle R id %s\n", R.p.id);
    
    {
        rect S = rect_new(R.p, 88, 77);
        rect * T = &S;
        printf("rectangle S at (%u, %u), width %u, height %u\n", S.p.x, S.p.y, S.w, S.h);
        printf("rectangle S id %s\n", S.id);
        printf("point of rect S id %s\n", S.p.id);
        printf("rectangle R and rectangle S are %s\n", 
                rect_equals(R, S) ? "equal" : "unequal");
        printf("points of rectangle R and S are %s\n", 
                point_equals(R.p, S.p) ? "equal" : "unequal");
        printf("rect R and rect S are %s object\n", 
                rect_equalsp(&R, &S) ? "the same" : "not the same");
        printf("points of rect R and S are %s object\n", 
                point_equalsp(&R.p, &S.p) ? "the same" : "not the same");
        *T = rect_new(R.p, R.w, R.h);
        printf("rectangle R and rectangle *T are %s\n", 
                rect_equals(R, *T) ? "equal" : "unequal");
        printf("rect R and rect *T are %s object\n", 
                rect_equalsp(&R, T) ? "the same" : "not the same");
        printf("rect S and rect *T are %s object\n", 
                rect_equalsp(&S, T) ? "the same" : "not the same");        
    }
    
    return 0;
}

compile:

cl example4.c id40.c point.c rect.c ..\uuid\uuid.lib wsock32.lib

output:

rectangle R at (5, 10), width 20, height 30
rectangle R id 86122912-bf91-11eb-bddc-e1995e751a4c
point of rectangle R id 860fc623-bf91-11eb-bddc-e1995e751a4c
rectangle R at (15, 15), width 40, height 50
rectangle R id 86122912-bf91-11eb-bddc-e1995e751a4c
point of rectangle R id 860fc623-bf91-11eb-bddc-e1995e751a4c
rectangle S at (15, 15), width 88, height 77
rectangle S id 86122913-bf91-11eb-bddc-e1995e751a4c
point of rect S id 860fc623-bf91-11eb-bddc-e1995e751a4c
rectangle R and rectangle S are unequal
points of rectangle R and S are equal
rect R and rect S are not the same object
points of rect R and S are not the same object
rectangle R and rectangle *T are equal
rect R and rect *T are not the same object
rect S and rect *T are the same object

typedef only introduces an alias for an existing type; it does not create a new type. To illustrate this, rect.h uses char id[40] instead of id40, and the compiler does not complain.

The most interesting thing in Example 4 is the declaration of the str40 type, which is not actually used anywhere in the code. We need to declare this struct, which embeds an array of 40 chars, just to be able to assign one array to another. For instance:

*(str40 *)t.p.id = *(str40 *)p.id;

Instead of working with a real array inside a struct, we type cast the array: first to a pointer to the aforementioned struct, and then we dereference that pointer to get the struct itself. Now we have copied an array by value.

To be precise, in most expressions the identifiers t.p.id and p.id do not denote an array as a whole: they decay into the address of the array's first element, which cannot be assigned to. That is why we first convert that address to a struct pointer and then dereference it to get to the real meat of the struct/array.
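A self-contained sketch of the cast-based copy, reusing Example 4's id40 and str40 declarations with a hypothetical id_assign helper. It relies on str40 containing nothing but the char array, so on common compilers its size is exactly 40 and its alignment is that of char; memcpy(dst, src, sizeof(id40)) is the more conventional way to get the same result.

```c
typedef char id40[40];
typedef struct {
    id40 x;
} str40;

/* dst and src arrive as char pointers (array parameters decay); the casts
   reinterpret them as pointers to str40, and dereferencing both sides turns
   the copy into an ordinary struct assignment of all 40 chars. */
void id_assign(id40 dst, const id40 src) {
    *(str40 *)dst = *(const str40 *)src;
}
```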

Appendix

Let's finish the job with some command line tools. Be warned: whenever you see a build error mentioning something like htons, a library called ws2_32 or wsock32 has to be added to the build.

Embarcadero Free C++ Compiler

This is a modern 32bit C/C++ Clang-based compiler with C11 support. It can be downloaded [here].

Unzip it into some folder, say C:\LANG, so that the directory structure is C:\LANG\BCC102\bin. Next, we need to add it to the PATH environment variable. Right-click on My Computer, select Properties, then Advanced system settings. In the System Properties window, on the Advanced tab, there is an Environment Variables button near the bottom; click it.

If you see a Path entry in the User variables list, click Edit and add a new value, C:\LANG\BCC102\bin. If there is no Path variable, click New and enter Variable name: Path, Variable value: C:\LANG\BCC102\bin.

Now you can open a Command Prompt and enter bcc32x. If everything is OK, the compiler will greet you with its version, 7.30 for Win32. Don't close the window yet. Download the examples' source code from this article and unzip it into, say, the C:\Source folder. It contains five folders, and the directory structure is C:\Source\Example3 and so on.

Change the directory in the console to C:\Source\uuid.

cd \Source\uuid

Now you'll have to compile the source and then create the uuid library.

bcc32c -c md5c.c sysdep.c uuid.c
tlib uuid.lib /u /a /C +md5c +sysdep +uuid

Let's get into a directory of an example and create one of the executables.

cd ..\example4
bcc32c example4.c id40.c point.c rect.c ..\uuid\uuid.lib

That's it. The Borland/Embarcadero compilers don't complain about an undefined reference to htons.

Mingw-w64

The "Minimalist GNU for Windows" is a free and open-source software development environment, a port of the GNU Compiler Collection. For those who want to compile in 64bit, the standalone version of the compiler packaged with some handy tools and libraries can be found on [TCL's page].

Download, execute and tell the self-extracting archive to unzip at the C:\LANG folder. It will create a MinGW subfolder. Add the C:\LANG\MinGW\bin directory to the PATH environment variable, just like we did before with the Embarcadero compiler.

Open the Command Prompt. To test it, enter gcc -v. This shows the version of the GNU C compiler; at the time of writing, it is 9.2.0.

Switch to the \Source\uuid directory and let's compile and create the uuid library.

gcc -c uuid.c md5c.c sysdep.c
ar ru libuuid.a uuid.o md5c.o sysdep.o

Now as we did before, we either need to explicitly include the libuuid.a static library in the build:

gcc example4.c id40.c point.c rect.c ..\uuid\libuuid.a -o example4.exe

or the object files that created it:

gcc example4.c id40.c point.c rect.c ..\uuid\uuid.o ..\uuid\md5c.o ..\uuid\sysdep.o

And here, the linker is complaining about some undefined reference called __imp_ntohs...

To fix this, we'll have to include the compiler's own ws2_32 or wsock32 library in the build process.

gcc example4.c id40.c point.c rect.c ..\uuid\libuuid.a -lwsock32 -o example4

or:

gcc example4.c id40.c point.c rect.c ..\uuid\libuuid.a -lws2_32 -o example4

Somewhere in the folder hierarchy of the MinGW compiler toolset, there is a file libwsock32.a that does the job.

Visual C++ Toolkit 2003

Last but not least, the same C/C++ compiler that shipped with Visual Studio .NET 2003, which Microsoft made freely available without the IDE. It's a basic C89 compiler. To build Win32 applications (which work on anything from Windows 95 to the latest Windows 10), you need the Platform SDK. Borland, MinGW, LCC, Pelles C and other Windows compilers include platform files of their own.

Let's arm this tool so it can do damage. First, get the compiler. Search Google for a file VCToolkitSetup.exe.

Tell the installation wizard to install into C:\LANG\VS2003\. This alone sets an environment variable called VCToolkitInstallDir, but you have to add C:\LANG\VS2003\BIN to the PATH yourself, so do it.

Now you can compile academic C code that creates: linked lists, binary trees, multiply matrices, writes stuff to the console... but you cannot make win32 GUI apps. Download the Windows Server 2003 R2 Platform SDK in IMG format from [CNET].

Unpack it with something like 7z and start the Setup. Choose a custom installation. Although the Platform SDK is big by Y2K standards, we actually need very little from it (header files like windows.h and some build tools that are missing from the VCToolkit). Tell the installation wizard to put the files in the same directory as the compiler (C:\LANG\VS2003), so we won't have to set additional environment variables in Windows.

When it gets to the "Check the options below to select and deselect individual features" window, deselect everything by clicking on the main feature box. That will mark it with a red cross. Now open "Microsoft Windows Core SDK" box, choose: "Build Environment (x86 32-bit)" and "Tools (AMD 64-bit)". Finish the installation.

Technically, you now have two compilers: the Visual C++ 2003 32-bit compiler and the Visual C++ 2005 Express 64-bit compiler. We'll stick to the former. Add the AMD64 tools to the PATH as well; they should be in C:\LANG\VS2003\Bin\win64\x86\AMD64, but make sure that entry comes after C:\LANG\VS2003\BIN in the list.

You need to add two new environment variables to the Windows OS. INCLUDE and LIB. Their values should be respectively: C:\LANG\VS2003\INCLUDE and C:\LANG\VS2003\LIB.

Once again, open the Command Prompt. Go to the uuid directory where you have extracted the examples.

cl -c uuid.c md5c.c sysdep.c
lib -nologo -out:uuid.lib uuid.obj md5c.obj sysdep.obj

Let's go to the example4 folder and build it, this time including the wsock32.lib without waiting for the linker to complain.

cl example4.c id40.c point.c rect.c ..\uuid\uuid.lib wsock32.lib

Happy coding!

History

9th July, 2021: Initial version
10th July, 2021: Update