Tricky Binary Collision

User-13376231

Dec 4, 2020

CPOL

An overview of a tricky binary collision example when there are multiple definitions of symbols in linked binaries

Introduction

Let's define two binaries: a static library "examplelib.a" and a program "example".

examplelib.a:

test_class_lib.h
test_class_lib.cpp
test_class_user.h
test_class_user.cpp

example:

test_class_prog.h
test_class_prog.cpp
test_class_main.cpp

So, in the library, we'll have class TestClass defined as below:

test_class_lib.h

#pragma once

class TestClass {
    public:
        int MethodA();
        virtual int MethodV();
        virtual ~TestClass();
    private:
        const int value = 22;
};

test_class_lib.cpp

#include <test_class_lib.h>
#include <iostream>

int TestClass::MethodA()
{
    std::cout << "common::TestClass::MethodA, value=" << value << std::endl;
    return 5;
}

int TestClass::MethodV()
{
    std::cout << "common::TestClass::MethodV" << std::endl;
    return 7;
}

TestClass::~TestClass()
{
}

In the same library "examplelib.a", we have another class that uses the TestClass above:

test_class_user.h

#pragma once

class TestClass;

class TestClassUser {
    public:
        TestClass * pObj = nullptr;
        void CallMethodA();
        void CallMethodV();
};

test_class_user.cpp

#include <test_class_user.h>
#include <test_class_lib.h>

void TestClassUser::CallMethodA()
{
    if (pObj) {
        pObj->MethodA();
    }
}

void TestClassUser::CallMethodV()
{
    if (pObj) {
        pObj->MethodV();
    }
}

Then we have the program binary "example", where another class with the same name and the same method signatures is defined:

test_class_prog.h

#pragma once

class TestClass {
    public:
        int MethodA();
        virtual int MethodV();
        virtual ~TestClass();
    private:
        const int value = 42;
};

test_class_prog.cpp

#include <test_class_prog.h>
#include <iostream>

int TestClass::MethodA()
{
    std::cout << "prog::TestClass::MethodA" << std::endl;
    return 15;
}

int TestClass::MethodV()
{
    std::cout << "prog::TestClass::MethodV" << std::endl;
    return 17;
}

TestClass::~TestClass()
{
}

For now, the signatures of both classes are the same. The method implementations are slightly different: they output different strings and return different values.

There is also a main function like the following:

test_class_main.cpp

#include <test_class_prog.h>
#include <test_class_user.h>
#include <memory>

int main()
{
    TestClassUser user;
    auto obj = std::make_unique<TestClass>();
    user.pObj = obj.get();
    user.CallMethodA();
    user.CallMethodV();
    return 0;
}

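Let's build this program. For the link to succeed despite the duplicate symbols, we need to pass the "--allow-multiple-definition" option to the linker (with GNU ld driven through g++, that is "-Wl,--allow-multiple-definition").

If we look at the symbols in "example", we can see both methods there. GNU ld documents that with this option the first definition encountered is used, so the methods from "example" should take precedence as long as the program's object files come before the library on the link line: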

000000000041a1c0 T TestClass::MethodV()
000000000041a230 T TestClass::MethodA()

Let's run the program to make sure that the methods from the program (not the library) are called:

$ ./example
prog::TestClass::MethodA
prog::TestClass::MethodV

Good!

Now let's play some tricks.

In the class from "example", let's add an argument to MethodA():

test_class_prog.h

#pragma once

class TestClass {
    public:
        int MethodA(int arg);
        virtual int MethodV();
        virtual ~TestClass();
    private:
        const int value = 42;
};

After rebuilding the program (with the definition in test_class_prog.cpp changed to match), the binary contains both MethodA methods: the one from the library, without arguments, and the one from the program, with an int argument:

000000000041a1c0 T TestClass::MethodV()
000000000041a230 T TestClass::MethodA(int)
000000000041ac54 T TestClass::MethodA()
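
The two MethodA entries can coexist because the parameter list is part of the symbol name. As a rough illustration, the sketch below demangles the two names the way "nm -C" would. It assumes GCC or Clang with the Itanium C++ ABI (the usual setup on Linux); Demangle is a hypothetical helper, not something from the original project.

```cpp
#include <cassert>
#include <cstdlib>    // std::free
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <string>

// Hypothetical helper: converts a mangled Itanium-ABI symbol name into the
// readable form. Returns an empty string if the name cannot be demangled.
std::string Demangle(const char* mangled)
{
    assert(mangled != nullptr);
    int status = 0;
    char* readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return std::string();
    }
    std::string result(readable);
    std::free(readable);  // the buffer is allocated with malloc() by the ABI library
    return result;
}
```

Under that ABI, Demangle("_ZN9TestClass7MethodAEv") gives "TestClass::MethodA()" and Demangle("_ZN9TestClass7MethodAEi") gives "TestClass::MethodA(int)": two different symbols, so the linker sees no conflict between them.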

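The library's test_class_user.cpp was compiled against test_class_lib.h, so it still references TestClass::MethodA() without arguments. Only the library defines that symbol now, so the method from the library is called:

$ ./example
common::TestClass::MethodA, value=42
prog::TestClass::MethodV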

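Notice that the TestClass::value data member comes from the class definition in "example": the this pointer refers to the object instantiated in main() of "example".

So the behavior of MethodA() in this case is unpredictable, because the library code reads data members of an object that was built from a totally different class definition. Formally, two different definitions of the same class in one program violate the One Definition Rule, and the behavior is undefined.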
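
What the library code effectively does with this can be modeled as "read an int at a fixed offset". The sketch below is only an approximation under stated assumptions: LibLayout and ProgLayout are hypothetical stand-ins, and a plain pointer member plays the role of the hidden vtable pointer so that offsetof stays well defined.

```cpp
#include <cassert>
#include <cstddef>   // offsetof
#include <cstring>   // std::memcpy

// Hypothetical stand-in for the layout the library was compiled against.
struct LibLayout {
    const void* vptr;   // models the hidden vtable pointer
    int value;
};

// Hypothetical stand-in for the object actually created by the program.
struct ProgLayout {
    const void* vptr = nullptr;
    int value = 42;
};

// Models the library's MethodA(): it does not look up "value" by name,
// it reads an int at the offset where LibLayout keeps that member.
int ReadValueAsLibrary(const void* object)
{
    assert(object != nullptr);
    int result = 0;
    std::memcpy(&result,
                static_cast<const unsigned char*>(object) + offsetof(LibLayout, value),
                sizeof(result));
    return result;
}
```

Passing the address of a ProgLayout object to ReadValueAsLibrary returns the program's 42, not the library's 22, which mirrors the output above: the code is the library's, the data is the program's.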

Since these are totally different classes, what if the program's class has no TestClass::value member at all?

Let's remove it and see:

test_class_prog.h

#pragma once

class TestClass {
    public:
        int MethodA(int arg);
        virtual int MethodV();
        virtual ~TestClass();
    private:
        // const int value = 42;
};

After compiling and executing, we'll see something like this:

$ ./example
common::TestClass::MethodA, value=27958528
prog::TestClass::MethodV

Tada! This is just whatever happens to be in memory at the offset where the library's definition of TestClass expects value, read from an object that was created from the other TestClass definition and has no such member.
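
A quick size comparison suggests why the value is garbage. This is a sketch under assumptions: the namespaces lib and prog are hypothetical and exist only so both classes fit in one file, and kAssumedValueOffset assumes that value is placed right after the vtable pointer, which is typical for GCC and Clang but not guaranteed by the language.

```cpp
#include <cassert>
#include <cstddef>

namespace lib {    // hypothetical namespace: the library's view of the class
class TestClass {
public:
    int MethodA() const { return value; }
    virtual int MethodV() { return 7; }
    virtual ~TestClass() = default;
private:
    const int value = 22;
};
}

namespace prog {   // hypothetical namespace: the program's class without "value"
class TestClass {
public:
    virtual int MethodV() { return 17; }
    virtual ~TestClass() = default;
};
}

// Assumption: the int member sits directly after the vtable pointer.
constexpr std::size_t kAssumedValueOffset = sizeof(void*);

// True if reading an int at the assumed offset stays inside an object
// of the given size.
bool ReadStaysInside(std::size_t object_size)
{
    assert(object_size > 0);
    return kAssumedValueOffset + sizeof(int) <= object_size;
}
```

On a typical 64-bit GCC or Clang build, ReadStaysInside(sizeof(lib::TestClass)) is true while ReadStaysInside(sizeof(prog::TestClass)) is false: the library's read lands past the end of the program's object.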

Virtual methods behave in this situation much like data members, since a virtual call goes through a function pointer stored in the object's virtual method table (VMT).

Which function is called depends not on the method name (as it does for non-virtual methods), but on the method's position in the class definition.

For example, if we rename the virtual method in "example":

test_class_prog.h

#pragma once

class TestClass {
    public:
        int MethodA(int arg);
        virtual int MethodOtherV();
        virtual ~TestClass();
    private:
        // const int value = 42;
};

By calling MethodV() from the library, we actually call MethodOtherV(), since it is the first method in the VMT of both classes:

$ ./example
common::TestClass::MethodA, value=27958528
prog::TestClass::MethodOtherV

But if we add a new virtual method above MethodV(), this new method will be called instead:

test_class_prog.h

#pragma once

class TestClass {
    public:
        int MethodA(int arg);
        virtual int NewOtherMethodV();
        virtual int MethodV();
        virtual ~TestClass();
    private:
        // const int value = 42;
};

$ ./example
common::TestClass::MethodA, value=27958528
prog::TestClass::NewOtherMethodV
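
The slot-based dispatch seen in the last two experiments can be modeled with a hand-written table of function pointers. This is only a simplified sketch: real vtables are generated by the compiler and contain more entries, and Slot, Object, kProgVtable and LibraryCallMethodV are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <string>

// One entry of the hand-written "vtable".
using Slot = std::string (*)();

struct Object {
    const Slot* vtable;  // plays the role of the hidden vtable pointer
};

std::string ProgNewOtherMethodV() { return "prog::TestClass::NewOtherMethodV"; }
std::string ProgMethodV()         { return "prog::TestClass::MethodV"; }

// Table matching the last version of the program's class:
// NewOtherMethodV is declared first, so it occupies slot 0.
const Slot kProgVtable[] = { &ProgNewOtherMethodV, &ProgMethodV };

// The library was compiled against a header in which MethodV is the first
// virtual method, so its call site always uses slot 0, whatever is stored there.
std::string LibraryCallMethodV(const Object& object)
{
    assert(object.vtable != nullptr);
    return object.vtable[0]();
}
```

With Object obj{kProgVtable}, LibraryCallMethodV(obj) returns "prog::TestClass::NewOtherMethodV": the call site never mentions the method name, only its position in the table.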

Conclusion

This interesting problem came from analyzing production code. It appeared when someone decided to copy-paste a class definition in order to alter its behavior, because making the change inside the library seemed "dangerous" to him (other binaries use this library too).

Before copy-pasting source code, think about how many man-days of support you are going to add by maintaining such code.

History

4th December, 2020: Initial version