Using C++ Templates for Startup Validation

AlexanderGorobets


3 Nov 2010, CPOL, 17 min read

The article describes a technique for using C++ templates for general startup validation purposes.

Download source code - 7.65 KB

The Problem

The C++ compiler is my friend. It always surprises me how effective it is in finding the bugs in my code. Very often, after I fix all the compile-time errors, there are no more errors left. I explain this fact by assuming that most of the errors in the code are made accidentally. This is, of course, only if the initial algorithm is correct. Just as an accidental error on a network line almost certainly breaks the CRC of the message, an accidental error in C++ code almost certainly breaks the C++ syntax and is hence detectable by the compiler.

However, there are some other types of errors that the C++ compiler cannot detect even if they are theoretically detectable at compile time. These bugs may happen when strings, formatted by certain rules, are embedded in C++ code. Examples are the printf() family of functions, XML expressions, regular expressions, and SQL queries. The problem with these errors is that C++ usually doesn’t know anything about a particular formatting rule. In this series of two articles, I will present a technique that can be used to detect these kinds of errors. How can this be done? We cannot change the C++ compiler; however, I will show how it is possible to detect and report errors using our own C++ code in a manner very similar to that of the C++ compiler. I call my approach "startup validation". It has the following distinct properties:

The code with the error does not need to run to detect the error.
All functions and files where the error can happen are analyzed automatically. Even if the function with the error is never called, the error is detected as soon as the file that contains the function compiles.
The whole validation procedure is started by a simple C++ function that should be called explicitly as soon as the application starts, probably after some initialization.
If running under Visual Studio, the validation procedure reports errors in the Output window. By double-clicking on the error line, the Visual Studio C++ editor opens the file with the error and positions the cursor on the line that contains the error.

As you can see, while you still need to run the application to launch startup validation, it gives almost the same experience as the C++ compiler’s own error detection.

All the code in this article was tested on the Windows OS and with the Visual Studio 2005 C++ compiler.

Now let’s start. The problem that I am solving in my first article is validation of the printf() family of functions. The function printf() has many relatives – sprintf(), fprintf(), etc. All of them have a format string parameter and a variable list of optional arguments. The number of arguments and the argument types should match the format specifications in the format string. If they don’t, different bad things could happen when the function is called.

Let’s look at an example. The function in the code snippet below has an invalid argument type – double instead of string. It causes the application to crash.

C++

double d=13.3;
printf("Value of decimal = %s", d);

Even if the application does not crash, other types of errors are possible with printf(). In the next example, the function has only two arguments instead of the three required. It displays garbage in place of the missing parameter.

C++

printf("Param1 = %s,  param2 = %s, param3 = %s", "one", "two");

The errors could be more subtle. The following function call has an argument of string type that matches the format specification, but the size prefix – "l" - is incorrect here. As a result, the string argument is not displayed at all.

C++

printf("String=%ls", "string");

The Boost library provides a format() function as a replacement for printf(). It is more stable; however, it still lacks the ability to detect errors at compile time. The following code snippet has two arguments instead of three, and it throws an exception at runtime. If this exception is not properly caught, it becomes an unhandled exception and again crashes the application.

C++

boost::format("Param1 = %s,  param2 = %s, param3 = %s") % "one" % "two"

It is very hard to test all the places in the code where printf() is called. To make things worse, if printf() is used to report other errors, you have to reproduce all those possible errors in your tests and carefully check the output of each error report. If the format string is a constant, it is known at compile time, and therefore the errors could potentially be detected at compile time. The GCC compiler, in fact, detects this kind of error when run with the warning switch -Wformat (which -Wall enables). Unfortunately, Visual C++ compilers up to VS2010 do not have an equivalent switch. The format string can also be dynamically constructed at runtime; however, I rarely use this feature. The described technique is applicable only to constant format strings.
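For readers who do build with GCC or Clang, the same compile-time check can be extended to your own printf-style wrappers through the `format` function attribute, which is a GCC extension. The sketch below is only an illustration: the wrapper FormatText() and the macro CHECK_PRINTF_ARGS are hypothetical names, not part of the article's source, and the macro expands to nothing on other compilers.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// CHECK_PRINTF_ARGS is a hypothetical helper macro (assumption, not article code).
#if defined(__GNUC__)
#define CHECK_PRINTF_ARGS(fmt_index, first_arg) \
  __attribute__((format(printf, fmt_index, first_arg)))
#else
#define CHECK_PRINTF_ARGS(fmt_index, first_arg)
#endif

// The declaration carries the attribute: parameter 1 is the format string,
// and the variadic arguments start at position 2.
std::string FormatText(const char* format, ...) CHECK_PRINTF_ARGS(1, 2);

std::string FormatText(const char* format, ...)
{
  char buffer[256];
  va_list args;
  va_start(args, format);
  std::vsnprintf(buffer, sizeof(buffer), format, args);  // always null-terminates
  va_end(args);
  return std::string(buffer);
}

// Usage example: a correct call. A mismatched call such as
// FormatText("%s", 13.3) would draw a -Wformat warning on GCC or Clang.
void FormatTextExample()
{
  assert(FormatText("%s=%d", "x", 5) == "x=5");
}
```

This covers only compilers that understand the attribute, which is why the rest of the article takes a different route for Visual C++.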

Building a Replacement for printf()

The key problem was that I had to design a system that would capture all the printf() calls with all format strings, all optional argument types used with each call, and the file and line number of the call. It became clear to me very soon that I had to use a macro. I cannot imagine how I could capture the file name and the line number of the printf() call without a macro. If I had to use a macro anyway, I wanted to design a macro as close to the original syntax as possible. I wanted to change each printf() call to some snippet of C++ code and wrap it into a macro. Presumably, this snippet would capture the information I needed and keep it in some common storage, but I had to find a way to launch those snippets automatically. All of them should be called at application startup time.

Fortunately, there is a method that leads in the right direction. If you have global variables or static class member variables in your code and you link your application with the CRT library, the CRT will automatically construct the objects for those variables before the main() function is called. The order of creation is not defined, however. That was what I needed, but there was still a syntax challenge. How can I generate different variable or class names if I am restricted to the printf()-like call syntax? The idea of building names from the predefined macros __FILE__ and __LINE__ was tried and quickly rejected. One of the problems with it was that there might be two calls in the same file on the same line. Then, suddenly the idea came.

If I use a template class, the C++ compiler will instantiate this class for each different template parameter.
If this class has a static member variable, then for every concrete class type, instantiated by the compiler, one object of this member variable will be created by the CRT at startup time. If the type of this variable has a non-trivial constructor, the constructor code will run at startup time for every object created.
C++ allows local classes – the ones that are declared inside a function body and are invisible outside the function.
If then I use a local class as a template parameter for a template non-local class, I will have a separate instantiation for each local class. I can then write the same local class declaration, but in a separate dedicated scope, and not bother with its name – there would be no name conflicts – the C++ compiler will take care of this. Then, I will wrap this local class/structure declaration in a macro and the thing is done!

Let’s look at the code snippet. Most details are stripped out for simplicity.

C++

template <typename Place>
class PrintfGlobal_0{
public:
  PrintfGlobal_0()
  {}

  PrintfGlobal_0(bool CreateInstance)
  {
    // Execute this code, please
  }
  static const char* Format()
  {
    return s_Instance.m_Format.c_str();
  }

private:
  static PrintfGlobal_0 s_Instance;
};
// Need to define static member, otherwise I will have a linker error
template <typename Place>
PrintfGlobal_0<Place> PrintfGlobal_0<Place>::s_Instance(true);

In this piece of code, the template class PrintfGlobal_0 has a template parameter called Place. It also has a static member s_Instance. For each different concrete type of Place, the C++ compiler will instantiate a separate PrintfGlobal_0. And if its member s_Instance is used somewhere in the code, the object of PrintfGlobal_0<Place> will be created at startup time by the CRT. Whatever code I put in the PrintfGlobal_0 constructor will be executed automatically. Now you might notice a boolean parameter – I use it just for convenience, to distinguish the startup construction from the construction of any other instances that I might use in my code. (The m_Format member returned by Format() is omitted from this stripped version; it comes from a base class shown later.) So far, so good.
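To see the mechanism in isolation, here is a minimal, self-contained sketch. It is not the article's code: Registrar, Registry(), Touch(), and the NeverCalled functions are hypothetical names chosen for the illustration. It assumes a compiler that accepts local classes as template arguments, which is standard since C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the common storage; a function-local static
// is constructed on first use, so it is ready during startup construction.
std::vector<std::string>& Registry()
{
  static std::vector<std::string> entries;
  return entries;
}

template <typename Place>
class Registrar
{
public:
  explicit Registrar(bool) { Registry().push_back(Place::Name()); }
  // Taking the address of s_Instance forces the compiler to instantiate it.
  static void Touch() { (void)&s_Instance; }
private:
  static Registrar s_Instance;
};

template <typename Place>
Registrar<Place> Registrar<Place>::s_Instance(true);

// Neither function is ever called, yet each one registers its local class.
void NeverCalledA()
{
  struct Place { static const char* Name() { return "A"; } };
  Registrar<Place>::Touch();
}

void NeverCalledB()
{
  struct Place { static const char* Name() { return "B"; } };  // same name, different type
  Registrar<Place>::Touch();
}

bool RegistryContains(const std::string& name)
{
  for (std::size_t i = 0; i < Registry().size(); ++i)
    if (Registry()[i] == name) return true;
  return false;
}

// Call from main(): both entries are expected to be present already.
void RegistryExample()
{
  assert(Registry().size() == 2);
  assert(RegistryContains("A") && RegistryContains("B"));
}
```

The order of the two entries is not guaranteed, which is why the example checks membership and not position.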

Now, let’s look at the macro that replaces the sprintf() function. I chose the sprintf() function because I use it more often in my code than printf(). Let’s start first with the variant without the optional arguments. I always use capital letters with macros, and consider this as a good style to avoid confusion with functions.

C++

#define SPRINTF_0(buffer, format)  \
{  \
  typedef struct  \
  {  \
    static const char* Format()  /* Capture format string */  \
    {  \
      return (format);  \
    }  \
    static const char* File()  /* Capture file name */  \
    {  \
      return (__FILE__);  \
    }  \
    static int Line(int line=__LINE__)  /* Capture line number */  \
    {  \
      return line;  \
    }  \
  } Placement;  \
  /* Invokes PrintfGlobal_0<Placement> instantiation, you will see in a minute how */  \
  ChkdPrintf::Snprintf<Placement>((buffer), (sizeof(buffer)));  \
}

Here I use a local structure, typedef-ed as Placement. It captures three essential things: the format string that I provide as a macro parameter, a file name, and a line number. All these parameters are captured as static member functions so I can easily access them when needed. Note the trick with the line: the simpler approach in the code below does not work with the Visual C++ compiler.

C++

static int Line()  /* This does not work */  \
{  \
  return __LINE__;  \
}  \

The function Snprintf() is doing two things. At runtime, it calls the real _snprintf() function. At compile time, it causes the C++ compiler to instantiate PrintfGlobal_0<Place>, because it calls its static member function Format(). Moreover, the Format() function references the s_Instance static member, therefore this member will be compiled and the object of it will be created at startup time by the CRT.

C++

template <typename Place>
int Snprintf(char *buffer, size_t len)
{
  buffer[len-1]='\0';
  // Real function that is doing the job at runtime
  return _snprintf(buffer, len-1, PrintfGlobal_0<Place>::Format());
}
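One caveat of the macro is that sizeof(buffer) yields the buffer size only when buffer is a real array; for a plain pointer it yields the size of the pointer. A possible way to make that mistake a compile-time error is to take the buffer by reference to an array. The sketch below is an assumption-laden illustration, not the article's code: BoundedCopy is a hypothetical name, and it uses the standard std::snprintf in place of the Microsoft-specific _snprintf.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// N is deduced from the array type, so passing a char* does not compile.
template <std::size_t N>
int BoundedCopy(char (&buffer)[N], const char* text)
{
  // std::snprintf writes at most N-1 characters plus the terminator and
  // returns the length the full output would have had.
  return std::snprintf(buffer, N, "%s", text);
}

void BoundedCopyExample()
{
  char small[4];
  int needed = BoundedCopy(small, "abcdef");
  assert(needed == 6);                     // six characters were requested
  assert(std::strcmp(small, "abc") == 0);  // three fit, plus the terminator
}
```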

Now, let’s see how the capture itself works. Here is a prototype of the PrintfValidated class. This class will be a workhorse that performs all the validation.

C++

class PrintfValidated
{
public:
  PrintfValidated()
  {}
protected:
  PrintfValidated(bool)
  {  // Stores pointer to itself
    GetAllPrintfs().push_back(this);
  }
  static std::vector<PrintfValidated*>& GetAllPrintfs()
  {
    static std::vector<PrintfValidated*> pPrintfs;
    return pPrintfs;
  }
  template <typename Place>
  void SetLocation()
  {
    m_Format=Place::Format();
    //
    char buffer[1024];
    memset(buffer, 0, sizeof(buffer));
    // Stores file+line as a string: \SomeFile.cpp(N)
    _snprintf(buffer, sizeof(buffer)-1, "%s(%d)", Place::File(), Place::Line());
    m_Location=buffer;
  }

  std::string m_Format; // Formatting string
  std::string m_Location; // File and line number
};

PrintfValidated stores the formatting string in a non-static member m_Format. Both the file name and the line number are stored as a single string in a non-static member m_Location. To populate these members, PrintfValidated provides a template function SetLocation() that gets the formatting string, file name, and line number by calling the corresponding static member functions of the Place template parameter. As you might have already seen, our local Placement structures will serve as these parameters. When created by the overloaded protected constructor accepting a boolean parameter, PrintfValidated pushes a pointer to itself into a vector of pointers. This vector of pointers is stored as a local static variable pPrintfs, and a reference to it is returned by the static member function GetAllPrintfs().

The code below shows the full version of the PrintfGlobal_0 class. It derives from PrintfValidated and calls its template SetLocation() member function in the overloaded constructor with the template parameter of Place. For each SPRINTF_0() call in the code, a unique local Placement class is created, a distinct instantiation of the PrintfGlobal_0 class is compiled, an object of the static member s_Instance is created by the CRT at startup time, an object of the PrintfValidated base class is created by the overloaded constructor, and the pointer to it is added to the static vector. Wow!

C++

template <typename Place>
class PrintfGlobal_0 : public PrintfValidated
{
public:
  PrintfGlobal_0()
  {}

  PrintfGlobal_0(bool CreateInstance)
    : PrintfValidated(CreateInstance)
  {
    SetLocation<Place>();
  }

  static const char* Format()
  {
    return s_Instance.m_Format.c_str();
  }

private:
  static PrintfGlobal_0 s_Instance;
};

template <typename Place>
PrintfGlobal_0<Place> PrintfGlobal_0<Place>::s_Instance(true);

The next task is to add optional arguments to our macro. I will add one argument at a time. For each number of arguments, a separate macro will be created. I cannot avoid this, because a macro cannot hold an arbitrary number of parameters. Also, each template class should contain a certain number of template parameters, at least for now when variadic templates are not yet supported by the Visual C++ compiler. I need to capture the types for all arguments, because I need to validate them against the formatting string. Here is a macro replacing the sprintf() function with an optional argument – arg1.

C++

#define SPRINTF_1(buffer, format, arg1)  \
{  \
  typedef struct  \
  {  \
    static const char* Format()  \
    {  \
      return (format);  \
    }  \
    static const char* File()  \
    {  \
      return (__FILE__);  \
    }  \
    static int Line(int line=__LINE__)  \
    {  \
      return line;  \
    }  \
  } Placement;  \
  ChkdPrintf::Snprintf<Placement>((buffer), (sizeof(buffer)), (arg1));  \
}

template <typename Place, typename A>
int Snprintf(char *buffer, size_t len, const A& a)
{
  buffer[len-1]='\0';
  // Optional argument passed to _snprintf
  return _snprintf(buffer, len-1, PrintfGlobal_1<Place, A>::Format(), a);
}

As you can see, it’s almost the same as the SPRINTF_0 macro. The difference is that it calls the Snprintf() template function with an extra template parameter – A. Note the template argument deduction used here: the type of the template parameter A is determined by the type of the argument arg1. The argument arg1 is passed to the real _snprintf() function, and its type A is used for instantiation of the template class PrintfGlobal_1.

C++

template <typename Place, typename A>
class PrintfGlobal_1 : public PrintfValidated
{
public:
  PrintfGlobal_1()
  {}

  PrintfGlobal_1(bool CreateInstance)
    : PrintfValidated(CreateInstance)
  {
    SetLocation<Place>();
    // Captures parameter type
    AddType(TypeID<A>::Get());
  }

  static const char* Format()
  {
    return s_Instance.m_Format.c_str();
  }

private:
  static PrintfGlobal_1 s_Instance;
};

template <typename Place, typename A>
PrintfGlobal_1<Place, A> PrintfGlobal_1<Place, A>::s_Instance(true);

The difference from the PrintfGlobal_0 class is one extra line in the constructor:

C++

AddType(TypeID<A>::Get());

This line captures the type of the optional argument. First, I had to somehow enumerate all the valid types that can be used as optional printf() arguments. Then I needed to find a way to determine the enum member by the real C++ type.

C++

enum KnownTypes
{
  CHAR,
  UCHAR,
  STR,
  WCHAR,
  WSTR,
  SHORT,
  USHORT,
  INT,
  UINT,
  LONG,
  ULONG,
  LONGLONG,
  ULONGLONG,
  DOUBLE,
  FLOAT,
  VOID_PTR,
  LAST_TYPE=VOID_PTR
};

template <typename T>
struct TypeID
{
  static KnownTypes Get();
};

template <>
struct TypeID<char>
{
  static KnownTypes Get() { return CHAR; }
};

template <>
struct TypeID<const char*>
{
  static KnownTypes Get() { return STR; }
};

template <int N>
struct TypeID<char[N]>
{
  static KnownTypes Get() { return STR; }
};

template <>
struct TypeID<short>
{
  static KnownTypes Get() { return SHORT; }
};

template <typename T>
struct TypeID<T*>
{
  static KnownTypes Get() { return VOID_PTR; }
};
// More types, omitted for simplicity

The only types that can be used as optional printf() arguments are built-in C++ arithmetic types (integral and floating-point), C-strings, and a pointer to void. Here, I used the technique called template specialization. If you are not fully familiar with it, I will explain it with this example. I have a general template struct TypeID with a static function Get(). I also have separate declarations of the same struct, called specializations, each with its own concrete template parameter. For each C++ type that I want to use, I provide a separate specialization. Every specialization has its own version of the Get() function that just returns the KnownTypes enum member corresponding to the type used for the specialization. Now, several important details. First, the Get() function of the general template class is declared but does not have any implementation whatsoever. I could provide a general implementation, and it would work as a default for all other types that I had not specialized. But in this case, I don’t want a default implementation, because I don’t know how to format an object of an arbitrary type. If I try to call SPRINTF_N() with some other type, the call resolves to the undefined general Get(), the linker reports an unresolved symbol, and my program does not build, which is exactly what I want. Because I did not provide a general implementation, errors like the one in the code snippet below are detected at build time:

C++

std::string my_string;
// Error, should be my_string.c_str(). Causes crash at runtime
sprintf(buffer, "String=%s", my_string);
// Does not even build: TypeID<std::string>::Get() has no definition
SPRINTF_1(buffer, "String=%s", my_string);

The second important detail here is the use of partial specialization. The following template structure specializes TypeID for every type that is a pointer.

C++

template <typename T>
struct TypeID<T*>
{
  static KnownTypes Get() { return VOID_PTR; }
};

Another template structure specializes TypeID for a static array of chars of every length. This is also partial specialization, because static arrays of different sizes are different types in C++.

C++

template <int N>
struct TypeID<char[N]>
{
  static KnownTypes Get() { return STR; }
};

Note also that I provide separate specializations for pointer to char and for array of chars. Although the C++ compiler implicitly converts an array to a pointer in many contexts, it does not do so when matching template specializations. Looking at the code, you might ask – why use a template class with a static member function? Why not use just a template function? The reason is that while C++ allows full template function specialization, it does not allow partial template function specialization. Only a template class can be specialized partially. Every time I need a partial template function specialization, I replace my template function with a helper template class with a static non-template function.

The last thing I want to bring to your attention is specialization precedence. As you can see, I created a full specialization for a pointer to char and a partial specialization for any pointer. Would it cause an ambiguity? No, a full specialization always takes precedence over a partial one, and a more specialized declaration takes precedence over a less specialized one.
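The specialization rules described above can be checked with a small self-contained sketch. It uses a trimmed copy of the enum and the TypeID specializations; the Classify() helper is a hypothetical addition that deduces its template parameter the same way Snprintf() does, from a const reference.

```cpp
#include <cassert>

enum KnownTypes { CHAR, STR, INT, VOID_PTR };  // trimmed copy of the article's enum

template <typename T>
struct TypeID
{
  static KnownTypes Get();  // declared only, as in the article
};

template <> struct TypeID<char>        { static KnownTypes Get() { return CHAR; } };
template <> struct TypeID<int>         { static KnownTypes Get() { return INT; } };
template <> struct TypeID<const char*> { static KnownTypes Get() { return STR; } };

// Partial specialization: character arrays of any length
template <int N>
struct TypeID<char[N]> { static KnownTypes Get() { return STR; } };

// Partial specialization: any pointer
template <typename T>
struct TypeID<T*> { static KnownTypes Get() { return VOID_PTR; } };

// Hypothetical helper: A is deduced from the argument, with no array-to-pointer decay
template <typename A>
KnownTypes Classify(const A&) { return TypeID<A>::Get(); }

void ClassifyExample()
{
  int value = 0;
  const char* text = "text";
  assert(Classify('c') == CHAR);
  assert(Classify(text) == STR);         // full specialization wins over T*
  assert(Classify("literal") == STR);    // A is deduced as char[8]
  assert(Classify(&value) == VOID_PTR);  // falls back to the T* specialization
}
```

Passing the argument by const reference is what keeps a string literal an array type here; a by-value parameter would decay it to a pointer before TypeID ever saw it.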

We are ready now to see the full declaration of the PrintfValidated class:

C++

class PrintfValidated
{
public:
  PrintfValidated()
  {}

  bool Validate();

  static bool ValidateAll();

protected:
  PrintfValidated(bool)
  {
    GetAllPrintfs().push_back(this);
  }
  static std::vector<PrintfValidated*>& GetAllPrintfs()
  {
      static std::vector<PrintfValidated*> pPrintfs;
      return pPrintfs;
  }
  template <typename Place>
  void SetLocation()
  {
    m_Format=Place::Format();

    char buffer[1024];
    memset(buffer, 0, sizeof(buffer));
    _snprintf(buffer, sizeof(buffer)-1, "%s(%d)", Place::File(), Place::Line());
    m_Location=buffer;
  }

  std::string m_Format;
  // Stores argument types, used with this instance
  std::vector<KnownTypes> m_Types;
  std::string m_Location;

  void AddType(KnownTypes type)
  {
    m_Types.push_back(type);
  }
};

All parameter types used for a particular instance of PrintfValidated are stored in the member m_Types, which is a vector. This vector is populated by the AddType() member function, called by the derived PrintfGlobal_N classes. The class with N optional arguments as template parameters calls AddType() N times in its constructor. Our capture is successfully completed.
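On a compiler with variadic templates (C++11 and later), the per-argument-count classes could in principle collapse into one that expands a parameter pack. The following is only a sketch of that idea, not part of the article's source, which targets Visual Studio 2005; CaptureTypes is a hypothetical name and the enum is trimmed. Here the general TypeID is left incomplete, so an unsupported type is rejected by the compiler itself.

```cpp
#include <cassert>
#include <vector>

enum KnownTypes { STR, INT, DOUBLE };  // trimmed copy of the article's enum

template <typename T> struct TypeID;   // incomplete: unknown types do not compile

template <> struct TypeID<const char*> { static KnownTypes Get() { return STR; } };
template <> struct TypeID<int>         { static KnownTypes Get() { return INT; } };
template <> struct TypeID<double>      { static KnownTypes Get() { return DOUBLE; } };

// Hypothetical replacement for N consecutive AddType() calls:
// the pack expansion yields one enum value per argument type, in order.
template <typename... Args>
std::vector<KnownTypes> CaptureTypes()
{
  return std::vector<KnownTypes>{ TypeID<Args>::Get()... };
}

void CaptureTypesExample()
{
  std::vector<KnownTypes> types = CaptureTypes<const char*, int, double>();
  assert(types.size() == 3);
  assert(types[0] == STR && types[1] == INT && types[2] == DOUBLE);
  assert(CaptureTypes<>().empty());  // the zero-argument case
}
```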

Validation

Validation itself is performed by the static member function ValidateAll() of the class PrintfValidated.

C++

bool PrintfValidated::ValidateAll()
{
  std::vector<PrintfValidated*>& AllSprintfs=GetAllPrintfs();
  bool Ok=true;
  for (size_t i=0; i<AllSprintfs.size(); i++)
  {
    PrintfValidated* CurSprintf=AllSprintfs[i];
    if (!CurSprintf->Validate())
    {
      Ok=false;
    }
  }
  return Ok;
}

This function should be called explicitly when the application starts. It just iterates over all instances of the PrintfValidated class and calls their instance member function Validate() to validate them. Note that if any validation fails, the iteration continues, because we want to report all errors. The Validate() function is not very interesting. It parses the format string, extracts format specifications, compares them with the parameter types, and reports errors.

C++

bool FilterPercent(const char*& pos);
bool FilterFlags(const char*& pos);
int FilterWidth(const char*& pos);
bool FilterDec(const char*& pos);
bool FilterPrecision(const char*& pos);
PREFIXES FilterOptions(const char*& pos);
bool IsValidPrefix(KnownTypes type, PREFIXES prefix, char symbol);
bool IsTypeSymbol(char token);

//static const char* Prefixes[]={
//};

bool PrintfValidated::Validate()
{
  char buf_err[1024];
  memset(buf_err, 0, sizeof(buf_err));

  size_t NumParams=m_Types.size();

  bool BadNumberOfPlaceholders=false;

  size_t ParamPlaceHolders=0;
  bool DynamicWidth=false;
  const char* pos=strchr(m_Format.c_str(), '%');
  // m_Format is null-terminated
  while (pos!=NULL)
  {
    // Dynamic width is defined by * placeholder
    // and 1 extra argument
    DynamicWidth=false;
    pos++;
    // Flags are: -+0 #
    FilterFlags(pos);
    if (FilterWidth(pos)==2)
    {
      DynamicWidth=true;
    }
    if (FilterDec(pos))
    {
      FilterPrecision(pos);
    }
    // Prefixes are: h l ll I I32 I64
    PREFIXES prefix=FilterOptions(pos);
    if (!FilterPercent(pos))
    {
      if (DynamicWidth)
      {
        // Dynamic width is indicated by * symbol
        ParamPlaceHolders++;
        if (ParamPlaceHolders>NumParams)
        {
          BadNumberOfPlaceholders=true;
          // Continue loop to calculate the actual number of format placeholders
        }
        else
        {
          KnownTypes type=m_Types[ParamPlaceHolders-1];
          const char* valid_symbols=TYPE_SPECIFIERS[type];
          // * acts like a %d
          const char* ValidSymbolPos=strchr(valid_symbols, 'd');

          if (ValidSymbolPos==0)
          {
            _snprintf(buf_err, sizeof(buf_err)-1, "%s : error : 'printf' width specifier"
              " * at position %d does not match parameter type\n", 
              m_Location.c_str(), NumParams-1);
            OutputDebugString(buf_err);
            return false;
          }
        }
      }
      char symbol=*pos;
      if (IsTypeSymbol(symbol))
      {
        ParamPlaceHolders++;
        if (ParamPlaceHolders>NumParams)
        {
          BadNumberOfPlaceholders=true;
          // Continue loop to calculate the actual number of format placeholders
        }
        else
        {
          KnownTypes type=m_Types[ParamPlaceHolders-1];
          const char* valid_symbols=TYPE_SPECIFIERS[type];
          const char* ValidSymbolPos=strchr(valid_symbols, symbol);

          if (ValidSymbolPos==0)
          {
            _snprintf(buf_err, sizeof(buf_err)-1, 
               "%s : error : 'printf' format character"
               " '%c' at position %d does not match parameter type\n", 
               m_Location.c_str(), symbol, NumParams-1);
            OutputDebugString(buf_err);
            return false;
          }
          if (!IsValidPrefix(type, prefix, symbol))
          {
            _snprintf(buf_err, sizeof(buf_err)-1, 
               "%s : error : 'printf' at position %d prefix"
               " %s does not match format character '%c' and parameter type\n", 
               m_Location.c_str(), NumParams-1, PREFIXES_STR[prefix], symbol);
            OutputDebugString(buf_err);
            return false;
          }
        }
      }
    }

    pos=strchr(++pos, '%');
  }
  if (ParamPlaceHolders!=NumParams)
  {
    _snprintf(buf_err, 
       sizeof(buf_err)-1, "%s : error : 'printf' number of format placeholders"
       " - %d does not match number of arguments - %d\n", 
       m_Location.c_str(), ParamPlaceHolders, NumParams);
    OutputDebugString(buf_err);
    return false;
  }
  return true;
}

bool FilterSpaces(const char*& pos)
{
  bool ret=false;
  while (*pos == ' ')
  {
    pos++;
  }
  return ret;
}

bool FilterPercent(const char*& pos)
{
  bool ret=false;
  if (*pos == '%')
  {
    ret=true;
    pos++;
  }
  return ret;
}

bool FilterFlags(const char*& pos)
{
  bool ret=false;
  const char FLAGS[]="-+0 #";
  char token=0;
  while ((token=*pos) > 0 && strchr(FLAGS, token))
  {
    ret=true;
    pos++;
  }
  return ret;
}

// Returns 0 = no width symbol; 1 = static width; 2 = variable width (*)
int FilterWidth(const char*& pos)
{
  int ret=0;
  char token=0;
  if ((token=*pos) == '*')
  {
    pos++;
    ret=2;
  }
  while ((token=*pos) > 0 && isdigit(token))
  {
    ret = (ret==0) ? 1 : ret;
    pos++;
  }
  return ret;
}

bool FilterDec(const char*& pos)
{
  bool ret=false;
  if (*pos == '.')
  {
    ret=true;
    pos++;
  }
  return ret;
}

bool FilterPrecision(const char*& pos)
{
  bool ret=false;
  char token=0;
  while ((token=*pos) > 0 && isdigit(token))
  {
    ret=true;
    pos++;
  }
  return ret;
}

PREFIXES FilterOptions(const char*& pos)
{
  PREFIXES prefix=NONE;
  char token=*pos;
  if (token == 'h')
  {
    prefix=h;
    pos++;
  }
  else if (token == 'w')
  {
    prefix=w;
    pos++;
  }
  else if (token == 'l')
  {
    prefix=l;
    pos++;
    if ((token=*pos) == 'l')
    {
      prefix=ll;
      pos++;
    }
  }
  else if (token == 'I')
  {
    prefix=I;
    const char* cpos=pos+1;
    if (*cpos == '3')
    {
      prefix=NONE;
      cpos++;
      if (*cpos == '2')
      {
        prefix=I32;
        pos=cpos+1;
      }
    }
    else if (*cpos == '6')
    {
      prefix=NONE;
      cpos++;
      if (*cpos == '4')
      {
        prefix=I64;
        pos=cpos+1;
      }
    }
  }
  return prefix;
}

bool IsTypeSymbol(char token)
{
  bool ret=false;
  const char TYPE_SYM[]="cCdiouxXeEfgGaAnpsS";
  if (token > 0 && strchr(TYPE_SYM, token))
  {
    ret=true;
  }
  return ret;
}

bool IsValidPrefix(KnownTypes type, PREFIXES prefix, char symbol)
{
  if (prefix==NONE)
  {
    return true;
  }
  for (size_t i=0; i<PERFIXES_USAGE_SIZE; ++i)
  {
    const PrefixForType& pr=PERFIXES_USAGE[i];
    if (pr.m_Type==type && pr.m_Prefix==prefix && strchr(pr.m_TypeSpec, symbol))
    {
      return true;
    }
  }
  return false;
}

I use two arrays for validation. The first one is for validating type specifiers against types. It maps every possible argument type to type specifiers valid for it.

C++

static const char* TYPE_SPECIFIERS[]={
"cC",     // CHAR
"cC",     // UCHAR
"s",      // STR
"C",      // WCHAR
"S",      // WSTR
"diouxXn",// SHORT
"ouxXn",  // USHORT
"diouxXn",// INT
"ouxXn",  // UINT
"diouxXn",// LONG
"ouxXn",  // ULONG
"diouxXn",// LONGLONG
"ouxXn",  // ULONGLONG
"eEfgGaA",// DOUBLE
"eEfgGaA",// FLOAT
"xXp"     // VOID_PTR
};
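The lookup against this table is a single strchr() call. As a minimal sketch with a trimmed enum and table, and a hypothetical helper name IsSpecifierValid, it could look like this:

```cpp
#include <cassert>
#include <cstring>

enum KnownTypes { CHAR, STR, INT, DOUBLE };  // trimmed copy of the article's enum

// One entry per enum member, in the same order as the enum
static const char* const TYPE_SPECIFIERS[] = {
  "cC",       // CHAR
  "s",        // STR
  "diouxXn",  // INT
  "eEfgGaA"   // DOUBLE
};

// Hypothetical helper: true if the conversion character is allowed for the type
bool IsSpecifierValid(KnownTypes type, char symbol)
{
  if (symbol == '\0')
  {
    return false;  // strchr() would otherwise match the string terminator
  }
  return std::strchr(TYPE_SPECIFIERS[type], symbol) != 0;
}

void IsSpecifierValidExample()
{
  assert(!IsSpecifierValid(DOUBLE, 's'));  // the first bug from the introduction
  assert(IsSpecifierValid(STR, 's'));
  assert(IsSpecifierValid(CHAR, 'c'));
  assert(!IsSpecifierValid(CHAR, 'd'));    // deliberately stricter than printf()
}
```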

The second one is for validating prefixes against types and type specifiers. It maps every possible type of argument with every possible prefix for this type to type specifiers valid for this pair of type and prefix.

C++

enum PREFIXES
{
  NONE,
  h,
  l,
  I,
  I32,
  I64,
  ll,
  w
};

static const char* PREFIXES_STR[]={
  "(empty)",
  "h",
  "l",
  "I",
  "I32",
  "I64",
  "ll",
  "w"
};

struct PrefixForType
{
  KnownTypes m_Type;
  PREFIXES m_Prefix;
  const char* m_TypeSpec;
};

static const PrefixForType PERFIXES_USAGE[]={
{CHAR, h, "cC"},
{UCHAR, h, "cC"},

{SHORT, h, "diouxX"},
{USHORT, h, "ouxX"},
{SHORT, l, "diouxX"},
{USHORT, l, "ouxX"},
{SHORT, I, "diouxX"},
{USHORT, I, "ouxX"},
{SHORT, I32, "diouxX"},
{USHORT, I32, "ouxX"},

{INT, h, "diouxX"},
{UINT, h, "ouxX"},
{INT, l, "diouxX"},
{UINT, l, "ouxX"},
{INT, I, "diouxX"},
{UINT, I, "ouxX"},
{INT, I32, "diouxX"},
{UINT, I32, "ouxX"},

{LONG, h, "diouxX"},
{ULONG, h, "ouxX"},
{LONG, l, "diouxX"},
{ULONG, l, "ouxX"},
{LONG, I, "diouxX"},
{ULONG, I, "ouxX"},
{LONG, I32, "diouxX"},
{ULONG, I32, "ouxX"},

{LONGLONG, h, "diouxX"},
{ULONGLONG, h, "ouxX"},
{LONGLONG, l, "diouxX"},
{ULONGLONG, l, "ouxX"},
{LONGLONG, I, "diouxX"},
{ULONGLONG, I, "ouxX"},
{LONGLONG, I32, "diouxX"},
{ULONGLONG, I32, "ouxX"},
{LONGLONG, I64, "diouxX"},
{ULONGLONG, I64, "ouxX"},
{LONGLONG, ll, "diouxX"},
{ULONGLONG, ll, "ouxX"},

{DOUBLE, l, "f"},
{FLOAT, l, "f"},

{WCHAR, l, "C"},
{WSTR, w, "S"}

};

static const size_t PERFIXES_USAGE_SIZE=sizeof(PERFIXES_USAGE)/sizeof(PrefixForType);

Here, I give myself some freedom and do not follow standards too closely. For example, I can format a character as a symbol with a type specifier ‘c’, or I can format it as an integer with a type specifier ‘d’. I prefer not to use type specifier ‘d’ with characters, because if I make a mistake and use ‘d’ instead of ‘c’, the output will be different from what I want. On the other hand, if I want to format a character as an integer, I can always cast the character to integer explicitly.

C++

char c=10;
printf("Formatting this as an integer: %d\n", c); // No, I don’t want this syntax
printf("Formatting this as an integer: %d\n", int(c)); // I prefer this syntax

I forbid this type of formatting conversion, and limit the type char to only the ‘c’ and ‘C’ type specifiers. If you don’t like my approach, you can change it. All of the formatting compatibility rules are in these two arrays.

Now I want to draw your attention to how I report the errors. I use OutputDebugString() for this. When my program is running under the Visual Studio IDE, the errors are displayed in the Output window. Every error description starts with the file name and the line number that are captured in the m_Location member of PrintfValidated and formatted like FileName(LineNumber). To display the full path of the file, the project should be built with the /FC compiler switch. This switch can be found in the "C/C++" – "Advanced" node in the "Project properties" dialog. If I double-click on a line in the Output window that is formatted according to this rule, Visual Studio will automatically open that file and position the cursor on that line. Cool!
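The message layout is easy to reproduce on its own. The sketch below assumes a hypothetical helper named FormatDiagnostic that builds a line in the same FileName(LineNumber) : error : text shape as the messages in Validate(); on Windows the result would be handed to OutputDebugString(), which is left out here to keep the example portable.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: builds one diagnostic line that starts with
// FileName(LineNumber), the prefix the Output window recognizes.
std::string FormatDiagnostic(const char* file, int line, const char* message)
{
  char buffer[1024];
  std::snprintf(buffer, sizeof(buffer), "%s(%d) : error : %s\n", file, line, message);
  return std::string(buffer);
}

void FormatDiagnosticExample()
{
  std::string text = FormatDiagnostic("c:\\src\\main.cpp", 42, "bad format");
  assert(text == "c:\\src\\main.cpp(42) : error : bad format\n");
}
```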

Testing

Now let’s examine the test program.

C++

int main(int argc, char* argv[])
{
  if (argc>1)
  {
    switch (*argv[1])
    {
    case '1':
      // Use command-line parameter "1" to get here
      Bug1();
      break;
    case '2':
      // Use command-line parameter "2" to get here
      Bug2();
      break;
    case '3':
      // Use command-line parameter "3" to get here
      Bug3();
      break;
    case '4':
      // Use command-line parameter "4" to get here
      Bug4();
      break;
    }

    getchar();
    return 0;
  }

  // Do not use command-line parameters to get here
  if (!ChkdPrintf::PrintfValidated::ValidateAll())
  {
    OutputDebugString("sprintf validation failed\n");
    // We exit in production
    OutputDebugString("Execution terminated\n");
    //exit(1);
  }
  // Extra tests
  Test();
  getchar();
  return 0;
}

If you run the application with the parameter ‘1’, ‘2’, ‘3’, or ‘4’, you can reproduce the bugs with the normal printf() that I described at the beginning of this article. (To pass a parameter when running under Visual Studio, set the "Command Arguments" line in the "Debugging" node of the "Project Properties" dialog.) If you run it without any parameter, the ValidateAll() function will find all the errors in the test functions below and report them. You can jump to the location of each error by double-clicking the error line in the Output window.

C++

void Test_InvalidParamType()
{
  char buffer[128];
  double d=13.3;
  SPRINTF_1(buffer, "Value of decimal = %s", d);
  std::cout << buffer << std::endl;
}

void Test_MissingParameter()
{
  char buffer[128];
  SPRINTF_2(buffer, "Param1 = %s,  param2 = %s, param3 = %s", "one", "two");
  std::cout << buffer << std::endl;
}

void Test_OddParameter()
{
  char buffer[128];
  SPRINTF_3(buffer, "Param1 = %s,  param2 = %s", "one", "two", "three");
  std::cout << buffer << std::endl;
}

void Test_InvalidPrefix()
{
  char buffer[128];
  SPRINTF_1(buffer, "String=%ls", "string");
  std::cout << buffer << std::endl;
}
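
The missing-parameter and extra-parameter cases above come down to comparing N with the number of arguments the format string consumes. The following is a simplified sketch of such a count, not the article's validator: `CountFormatArguments()` is a hypothetical name, and the scanner assumes well-formed specifiers built from the usual flags, width, precision, and size prefixes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Counts how many optional arguments a printf-style format string consumes.
// "%%" consumes nothing, each '*' (width or precision) consumes one argument,
// and every type specifier consumes one more.
std::size_t CountFormatArguments(const char* format)
{
    assert(format != nullptr);
    std::size_t count = 0;
    for (const char* p = format; *p != '\0'; ++p) {
        if (*p != '%')
            continue;              // ordinary character
        ++p;                       // look at the character after '%'
        if (*p == '\0')
            break;                 // dangling '%' at the end of the string
        if (*p == '%')
            continue;              // "%%" is a literal percent sign
        // Skip flags, width, precision, and size prefixes (h, l, ll, L, w, I32, I64).
        while (*p != '\0' && std::strchr("-+ #0123456789.*hlLwI", *p) != nullptr) {
            if (*p == '*')
                ++count;           // width or precision taken from the argument list
            ++p;
        }
        if (*p == '\0')
            break;                 // specifier cut off by the end of the string
        ++count;                   // *p is the type specifier
    }
    return count;
}
```

For the format string in Test_MissingParameter() the count is 3 while the macro promises 2, and for Test_OddParameter() the count is 2 while the macro promises 3, so both mismatches can be reported without running the functions.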

Normally, we want to stop execution when errors are found, but for test purposes, we keep the program running and let it execute the simple Test() function.

C++

void Test()
{
  char buffer[256];

  SPRINTF_0(buffer, "No parameters");
  std::cout << buffer << std::endl;

  SPRINTF_1(buffer, "%%c = %c", 'a'); std::cout << buffer << std::endl;
  SPRINTF_1(buffer, "%%C = %C", 'A'); std::cout << buffer << std::endl;

  SPRINTF_2(buffer, "%%d, %%i = %d, %i", 40, 67);
  std::cout << buffer << std::endl;

  SPRINTF_3(buffer, "%%o, %%u, %%x = %4o, %6u, %8x", 65535, 65535, 65535);
  std::cout << buffer << std::endl;

  SPRINTF_4(buffer, "%%e, %%E, %%f, %%g = %.3e, %.3E, %.0f, %g", 
            13.49e25, 13.49e25, 13.49e25, 13.49e25);
  std::cout << buffer << std::endl;

  SPRINTF_5(buffer, "%%G, %%a, %%A, %%C, %%S = %G, %a, %A, %C, %S", 
            13.49e-12, 13.49e-12, 13.49e-12, L'W', L"Wstring");
  std::cout << buffer << std::endl;

  float d=20.1f;
  // Test space after %
  SPRINTF_1(buffer, "%% f, 20.1 = % f", d);
  std::cout << buffer << std::endl;

  double d1=-56.7;
  SPRINTF_1(buffer, "%%+6.1f, -56.7 = %+6.1f", d1);
  std::cout << buffer << std::endl;

  int d2=100;
  long long dd2=100;
  SPRINTF_1(buffer, "%%hu, 100 = %hu", d2);
  std::cout << buffer << std::endl;

  SPRINTF_1(buffer, "%%lu, 100 = %lu", d2);
  std::cout << buffer << std::endl;

  SPRINTF_1(buffer, "%%I32, 100 = %I32d", d2);
  std::cout << buffer << std::endl;

  SPRINTF_1(buffer, "%%lld, 100 = %lld", dd2);
  std::cout << buffer << std::endl;

  SPRINTF_1(buffer, "%%I64, 100 = %I64d", dd2);
  std::cout << buffer << std::endl;

  short d3=11;
  SPRINTF_2(buffer, "%%*, 11 = %*d", 11, d3);
  std::cout << buffer << std::endl;

  SPRINTF_2(buffer, "%%s, ""\"a string\""" = %s; %%7.3f, 3.1425 = %7.3f", 
            "a string", 3.1425);
  std::cout << buffer << std::endl;

  const char* s2="another string"; // a string literal must not bind to a non-const char*
  int i=9;
  SPRINTF_2(buffer, "%%.10s, ""\"another string\""" = %.10s; %%#06X = %#06X", s2, &i);
  std::cout << buffer << std::endl;

  SPRINTF_1(buffer, "%%#p = %#p", &i);
  std::cout << buffer << std::endl;
}

As you can see, the syntax of our SPRINTF_N() macro is almost the same as that of the normal sprintf() function. The only difference is that you must count the optional arguments passed to SPRINTF_N() and supply the correct number N.

Finally, a few words about limitations. For this article, I created a replacement only for the sprintf() function. The source code implements up to 5 optional arguments; extending it to more arguments is trivial. I also assumed that only a static buffer is used for the sprintf() output. That is why I can query the buffer size with the sizeof() operator and call the safer _snprintf() function in my implementation. If you want to use dynamically allocated buffers, sizeof() would return the size of the pointer rather than the size of the buffer, so the implementation would have to change. As I already mentioned, the described technique works only for hard-coded format strings. Also, although the real sprintf() function has a return value, there is no return value from SPRINTF_N(), because a macro cannot have a return value.
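
The static-buffer restriction can be illustrated with a small function template. This is a sketch under stated assumptions, not the article's SPRINTF_N() implementation: `FormatToArray()` is a hypothetical name, it performs no startup validation, and it uses the standard std::vsnprintf() (C++11) instead of _snprintf().

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// The array-reference parameter lets the compiler deduce the buffer size N.
// Passing a plain char* (for example a heap-allocated buffer) does not compile,
// which is the same restriction that sizeof() imposes on the macro.
template <std::size_t N>
int FormatToArray(char (&buffer)[N], const char* format, ...)
{
    assert(format != nullptr);
    va_list args;
    va_start(args, format);
    // Writes at most N bytes, including the terminating null character.
    int needed = std::vsnprintf(buffer, N, format, args);
    va_end(args);
    // Length the complete output requires (excluding the terminator),
    // or a negative value on an encoding error.
    return needed;
}
```

A call such as FormatToArray(buffer, "%d", 42) works only when buffer is declared as a char array. Unlike the macro, a function can pass the result back to the caller, and a returned value greater than or equal to N signals that the output was truncated.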

All sprintf()-specific code is in the Sprintf.h file, separated from the common core code. Replacements for other functions, such as printf() and fprintf(), will be almost the same. You just need to change the macro and call a different underlying real function in it.

Conclusion

In this article, I presented a technique for using C++ templates for general startup validation purposes. As a working example, we built a replacement for the sprintf() function that can validate its parameters at startup time. All calls are registered for validation automatically as soon as the files containing them compile. The errors found are displayed in the same format that the Visual C++ compiler uses, with quick access to the source code. In the second article, "Validation of SQL Statements", I will show how the described technique can be used for the validation of SQL expressions.

If you have any comments or suggestions regarding this article or the source code, you can contact me at alexander@gorobets.com. Happy error-free coding!

History

26th October, 2010: Initial post
30th October, 2010: Updated article

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By

AlexanderGorobets

Canada
