Linux, PHP, and Git are popular projects developed in C; on the other side, OpenOffice, Firefox, Clang, and Photoshop are developed in C++. Both languages have therefore proven to be good candidates for developing complex applications. Trying to prove that one language is better than the other is not a productive debate. However, we can discuss the motivations behind choosing one of them.

Two major arguments are quoted each time the choice of C is discussed:

Best performance
Compiler support

These arguments are controversial, and discussing them is not the goal of this article; many web resources already cover them. The idea here is to focus on the impact of the chosen language on the application design.

For this purpose, we will analyze the Git source code with CppDepend and discover some design facts. Git is a distributed revision control and source code management (SCM) system with an emphasis on speed. Git was initially designed and developed by Linus Torvalds for Linux kernel development; it has since been adopted by many other projects.

On the Git website, they argue that C was chosen to increase performance, but that is not the opinion of Linus Torvalds, the initiator of the project, who said about C++:

“inefficient abstracted programming models where two years down the road you notice that some abstraction wasn’t very efficient, but now all your code depends on all the nice object models around it, and you cannot fix it without rewriting your app.”

and:

“So I’m sorry, but for something like git, where efficiency was a primary objective, the “advantages” of C++ is just a huge mistake.”

Here, we can find Linus’s entire point of view about choosing C over C++.

Let’s try to understand Linus’s opinion by comparing the impact of C and C++ on the design.

Modularity: Physical vs Logical

Modularity is a software design technique that increases the extent to which software is composed of separate parts. Modular code is easier to manage and maintain.

We can modularize a project with two approaches:

Physically: by using directories and files. This modularity is provided by the operating system and can be applied to any language.
Logically: by using namespaces, components, classes, structs, and functions. This technique depends on the language’s capabilities.

When we develop in C, we package our code essentially through physical modularity: the code is structured with directories that isolate modules. Here is the dependency graph between some of Git’s directories.

In C++, however, we can use namespaces and classes to modularize the code. These constructs are provided by the language, so for the previous graph we could use namespaces instead of directories to modularize the code.
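As a rough illustration of the two approaches, here is a minimal sketch. The function names, the scm::diff namespace, and the idea of a diff/ directory are assumptions made up for this example; they are not taken from the Git sources.

```cpp
#include <string>

// Physical modularity, C style: the module is the file or directory
// (say, a hypothetical diff/diff.c). The "diff_" prefix is only a naming
// convention; nothing in the language ties the function to a module.
std::string diff_format_header(const std::string& path) {
    return "diff --git a/" + path + " b/" + path;
}

// Logical modularity, C++ style: the module is stated in the code itself,
// whatever file or directory the function happens to live in.
namespace scm {
namespace diff {

std::string format_header(const std::string& path) {
    return "diff --git a/" + path + " b/" + path;
}

}  // namespace diff
}  // namespace scm
```

A caller writes diff_format_header("a.c") in the first case and scm::diff::format_header("a.c") in the second. Only the second call says which module the function belongs to, but it is also the one that has to be edited if the function later moves to another namespace.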

Impact of Choosing One of the Two Approaches

Easy to understand: The logical approach is better because the modularity is defined explicitly by language artifacts, and just by reading the code we can know in which module a code element exists.
Managing changes: A good design generally needs many iterations, and with the physical approach the impact of design changes can be much more limited than with the logical one. Indeed, we only need to move a function or variable from one file to another, or move a file from one directory to another.

In C++, however, a design change can impact a lot of code, because the logical modularity is implemented by language artifacts and the code itself must be modified.

Encapsulation: Class vs File

In C++, encapsulation is defined as the process of combining data and functions into a single unit called a class. With encapsulation, client code cannot directly access private data; the data is only accessible through the functions of the class.

In C, we can also have encapsulation, but through a physical approach like the one described in the modularity section: a “class” can be a file containing functions and the data they use, and we can limit the accessibility of functions and variables with the “static” keyword.
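The two forms of encapsulation can be sketched side by side. All names below (object_count, next_object_id, register_object, ObjectRegistry) are hypothetical and exist only for this illustration. The first half uses file-scope “static”, which behaves the same way in C and C++.

```cpp
// File-level encapsulation: "static" gives these names internal linkage,
// so they are invisible outside this translation unit.
static int object_count = 0;      // hypothetical private state

static int next_object_id() {     // hypothetical private helper
    return ++object_count;
}

// Public entry point: the only name a header for this file would declare.
int register_object() {
    return next_object_id();
}

// Class-level encapsulation (C++ only): access is enforced per class,
// not per file.
class ObjectRegistry {
public:
    int register_object() { return ++count_; }
    int size() const { return count_; }

private:
    int count_ = 0;  // not reachable from outside the class
};
```

In the file-based version there is a single hidden counter per translation unit, while each ObjectRegistry instance carries its own. Moving next_object_id to another file is a cut and paste; moving count_ to another class means touching every member function that uses it.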

Git uses this technique to hide functions and variables. To see that, let’s search for static functions:

from m in Methods where m.IsStatic select m

The treemap is very useful for getting a good idea of the code elements matched by a CQLinq query; the blue rectangles represent the result.

Almost all functions are declared static, so that they are visible only in the translation unit where they are declared. The same remark applies to variables:

from f in Fields where f.IsStatic select f

Easy to understand: The C++ encapsulation mechanism improves the understanding and visibility of the code; C is lower level and relies on the physical approach rather than the logical one.
Managing changes: If we have to change the place where variables or functions are encapsulated, it can be very easy in C, but in C++ it can impact a lot of code.

Polymorphism vs Selection Idiom

Polymorphism means that some code or operations or objects behave differently in different contexts.

This technique is used a lot in C++ projects, but what about C?

In procedural languages, selection with the keywords “switch”, “if”, or even “goto” can simulate polymorphic behavior, but this technique tends to increase the cyclomatic complexity of the code.

Let’s search for complex functions in the Git source code.

Even though Git is well developed, many functions could be considered complex. This is due to heavy use of control flow instructions like “if”, “switch”, or “goto”. With C++, however, we can use polymorphism to reduce the complexity of the code.
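To make the comparison concrete, here is a small sketch of the same behavior written with the selection idiom and with polymorphism. The ObjectKind enum and the Object, Blob, Tree, and Commit types are simplified, hypothetical stand-ins and do not reproduce Git’s actual definitions.

```cpp
#include <string>

// Selection idiom: one function branches on a type tag.
// Each new kind adds a branch, so the function's cyclomatic complexity grows.
enum ObjectKind { KIND_BLOB, KIND_TREE, KIND_COMMIT };

std::string describe_by_switch(ObjectKind kind) {
    switch (kind) {
    case KIND_BLOB:   return "blob";
    case KIND_TREE:   return "tree";
    case KIND_COMMIT: return "commit";
    }
    return "unknown";
}

// Polymorphism: each behavior lives in its own class, and every
// describe() implementation has a single execution path.
struct Object {
    virtual ~Object() = default;
    virtual std::string describe() const = 0;
};

struct Blob : Object {
    std::string describe() const override { return "blob"; }
};

struct Tree : Object {
    std::string describe() const override { return "tree"; }
};

struct Commit : Object {
    std::string describe() const override { return "commit"; }
};

// The caller no longer selects anything; the dispatch is done by the language.
std::string describe_polymorphic(const Object& object) {
    return object.describe();
}
```

Supporting a new kind means one more case in describe_by_switch, or one more class next to Blob, Tree, and Commit.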

Easy to understand: Polymorphism isolates a specific behavior in a class, which improves the visibility and cohesion of the code.
Managing changes: Adding another behavior with polymorphism can mean adding another class, whereas with the selection idiom you only add another case under the switch statement.

Inheritance vs Composition

Git essentially uses structs to define the data manipulated by functions. Let’s search for all the structs used:

from t in Types where t.IsStructure select t

What’s interesting is that almost all data is isolated inside structs. To verify that, we can search for all non-const public variables that are primitives and not inside a struct:

from f in Fields where f.IsPublic && f.IsPrimitiveType
&& !f.IsStatic && !f.IsConst
select f

Only a few variables are concerned, which is a good point for Git’s design.

So what about extending a struct? In C, we can use composition, as with the “remote” struct, which many structs reference.

In C++, however, we can also use inheritance to extend structs; for example, the known_remote struct could inherit from the remote one.
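The difference can be sketched as follows. The fields shown here are simplified assumptions and do not match the real Git structs; known_remote_composed and remote_label are hypothetical names introduced for the example.

```cpp
#include <string>

// Hypothetical, simplified fields; the real Git struct differs.
struct remote {
    std::string name;
    std::string url;
};

// Composition (available in C and C++): the extended struct holds a remote.
struct known_remote_composed {
    remote base;
    bool configured;
};

// Inheritance (C++ only): a known_remote "is a" remote.
struct known_remote : remote {
    bool configured = false;
};

// A function written against the base struct.
std::string remote_label(const remote& r) {
    return r.name + " (" + r.url + ")";
}
```

With composition the caller passes the member explicitly, as in remote_label(kc.base). With inheritance a known_remote can be passed directly, which is convenient but also ties known_remote to every future change of remote.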

Easy to understand: Inheritance can improve the understanding of the data, but we have to be careful with it: it should be used only for the “is-a” relation.
Managing changes: Inheritance implies high coupling, so any change can impact a lot of code.

Conclusion

C++ provides better possibilities for writing beautiful, well-structured code, but it comes at a price: any change or refactoring could be difficult.

But refactoring requires understanding the existing code before making changes. C programs are more difficult to understand but easy to change; a C++ project can be more structured than a C one, but it needs more effort when making changes.

How can we limit the impact of changes in C++?

A good solution for limiting the impact of changes is to use patterns, especially the low coupling and high cohesion concepts, to isolate changes in one specific place. Irrlicht, as explained in the previous post, is a good example of using low coupling.
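One common way to get low coupling in C++ is to make high-level code depend on an abstract interface rather than on a concrete class. The sketch below only illustrates that idea; Storage, MemoryStorage, and import_all are hypothetical names and are not taken from Git or Irrlicht.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical interface: callers depend only on this abstraction.
class Storage {
public:
    virtual ~Storage() = default;
    virtual void put(const std::string& value) = 0;
    virtual std::size_t count() const = 0;
};

// One concrete implementation; its internals can change freely.
class MemoryStorage : public Storage {
public:
    void put(const std::string& value) override { items_.push_back(value); }
    std::size_t count() const override { return items_.size(); }

private:
    std::vector<std::string> items_;
};

// High-level code is coupled to Storage, not to MemoryStorage, so replacing
// or refactoring the implementation does not touch this function.
std::size_t import_all(Storage& storage, const std::vector<std::string>& values) {
    for (const std::string& value : values) {
        storage.put(value);
    }
    return storage.count();
}
```

A change to how MemoryStorage keeps its items stays inside that class, as long as the Storage interface is unchanged.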


Motivations of Choosing C: Git Case Study