Click here to Skip to main content
15,879,028 members
Articles / Programming Languages / C++

32-bit pointers in a 64-bit world

Rate me:
Please Sign up or sign in to vote.
3.18/5 (6 votes)
24 Aug 2008CPOL3 min read 73.6K   202   17   15
64 bit pointers are wasteful if you don't need to access TBs of RAM

Introduction

64 bit is wonderful - we are no longer limited to just 2GB (per Win32 process) - you can now address a full 8TB of memory (per Win64 process)... which is great if your application really needs 8TB of memory.

If you don't need the full address space, 64-bit can be wasteful: pointers take twice the memory, which may actually result in a slower application, especially when using a pointer-laden data structure such as a binary tree (larger structures take more cache space).

A simple scheme is presented that allows using 32-bit "pointers" to access up to 64GB of memory on a 64-bit machine.

Encoding 64-bit pointers in 32-bits

The main idea of sptr is data alignment - modern processors feel much more comfortable when accessing "aligned" addresses: accessing a char at address 0xC0001 means more work than accessing an int at 0xC0000.

malloc() (which does the heavy lifting for new) only returns aligned addresses (see http://msdn.microsoft.com/en-us/library/ycsb6wwf.aspx) - a pointer returned by malloc() will always be aligned to a 16-byte boundary on Visual Studio - the 4 lowest bits will always be 0.

Therefore, at least for pointers we allocate on the heap, we can just "delete" the 4 lowest bits (by shifting right) without losing information.

If all memory addresses were allocated from the "bottom" (the highest bits are also 0), we could just encode a pointer as follows:

C++
#include <stddef.h>  // for uintptr_t
typedef unsigned __int32 uint32_t;

uint32_t encode(void* p)
{
  uintptr_t i = reinterpret_cast<uintptr_t>(p) >> 4;
  return static_cast<uint32_t>(i);
}

(uintptr_t is an integral data type that is guaranteed to be the same size as a pointer.)

This will encode any (16-byte aligned) address in the range 0x00 0000 0000 - 0x10 0000 0000 into a 32-bit integer.

However, the Operating System is free to map physical memory into other virtual memory segments - for example, mapping kernel memory "at the top", so it is preferable to keep such a map ourselves:

C++
#include <boost/thread/mutex.hpp>
#include <boost/thread/locks.hpp>

class sptr_base
{   
protected:
  static const uint32_t ALIGNMENT_BITS = 4;
  static const uint32_t ALIGNMENT = (1<<ALIGNMENT_BITS);
  static const uintptr_t ALIGNMENT_MASK = ALIGNMENT - 1;

protected:
  static uintptr_t     _seg_map[ALIGNMENT];
  static uintptr_t     _segs;
  static boost::mutex  _m;
  inline static uintptr_t ptr2seg(uintptr_t p)
  {
    p &= ~0xFFFFFFFFULL; // Keep only high part
    uintptr_t s = _segs;
    uintptr_t i = 0;
    for (; i < s; ++i)
      if (_seg_map[i] == p)
    return i;

    // Not found - now we do it the "right" way (mutex and all)
    boost::lock_guard<boost::mutex> lock(_m);
    for (i = 0; i < s; ++i)
      if (_seg_map[i] == p)
        return i;

    i = _segs++;
    if (_segs > ALIGNMENT)
      throw bad_segment("Segment out of range");

    _seg_map[i] = p;
    return i;
  }
};

ptr2seg() maps the high bits of a pointer to one of a few segments. The code refrains from using a mutex when reading the map, based on the atomicity of integer assignment.

The actual pointer encode/decode is then as follows:

C++
#include <boost/pool/detail/singleton.hpp>
typedef boost::details::pool::singleton_default<segment_mapper> the_segment_mapper;

uint32_t encode(const_native_pointer_type ptr)
{
  uintptr_t p = reinterpret_cast<uintptr_t>(ptr);
  if ((p & ALIGNMENT_MASK) != 0)
    throw bad_alignment("Pointer is not aligned");
  return (uint32_t)(ptr2seg(p) + p);
}

inline native_pointer_type decode(uint32_t e)
{
   uintptr_t el = e;
   uintptr_t ptr = (_seg_map[el & ALIGNMENT_MASK] + el) & ~ALIGNMENT_MASK;
   return reinterpret_cast<native_pointer_type>(ptr);
}

sptr

sptr gives a pointer-like interface to the "compressed" pointer. It allows accessing our "pointer" similar to a real pointer:

C++
sptr<int> p = new int;
*p = 5;
delete p;

// Conversions from/to pointers and pointer arithmetics work
int* q = new int;
p = q;
p++;
q = p;

Limitations

sptr has several limitations (which may make it inappropriate in many places).

First, sptr can't access the whole 64-bit address space (duh!). It is appropriate only for applications requiring just a bit more than the 32-bit address space (server machines are usually limited to 16GB-32GB of memory; Desktops usually can't deal with more than 8GB).

Second, although an address allocated on the heap will be aligned properly, pointer arithmetic may complicate things:

C++
sptr<char> buf = new char[100];

will work just fine as new returns an aligned address.

However, something like:

C++
for (sptr<char> p = buf; p != buf + 100; ++p)
   *p = '0';

will not work as p will now point to an unaligned address (the sptr implementation will assert at compile time).

However, in most cases, a standard pointer can be used:

C++
for (char* p = buf; p != buf + 100; ++p)
   *p = '0';

Third, mix sptr and multiple-inheritence with caution.

For example, given:

C++
struct A { char i; };
struct B { int j; };
struct C : public A, public B { };

This may or may not work (depending on structure alignment):

C++
sptr<C> c = new C;
sptr<B> b = c;

This is because b does not point to the same address as c (this is not the place for a discussion on multiple-inheritance, but you can try the following code to convince yourself):

C++
C* c = new C;
B* b = c; 
std::cout << ((uintptr_t)b - (uintptr_t)c) << std::endl;

There is no problem accessing the object via a standard pointer:

C++
sptr<C> c = new C;
B* b = c;

Performance

A small benchmark is provided with the code, implementing a simple linked-list which is repeatedly traversed.

The list's node is:

C++
struct node
{
   int val;
   node_ptr prev;
   node_ptr next;
};

The size of the node is therefore 20 bytes with standard pointers vs. 12 bytes with sptr.

However, this comes at a tradeoff - the extra bit shuffling impacts performance - the standard pointer version will perform 1.5x faster on the linked-list benchmark.

Accessing the map accounts for much of this performance hit. If it could be guaranteed that all addresses are allocated from the region 0x00 0000 0000 - 0x10 0000 0000, we could discard the map, significantly improving performance.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
United States United States
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
QuestionGood solution, would add just1 improvement Pin
JCems16-Jan-15 7:04
JCems16-Jan-15 7:04 
GeneralUse with caution Pin
tobywf2-Sep-08 20:18
tobywf2-Sep-08 20:18 
GeneralRe: Use with caution Pin
Stefan_Lang3-Sep-08 4:00
Stefan_Lang3-Sep-08 4:00 
AnswerRe: Use with caution Pin
cppnow10-Oct-08 4:04
cppnow10-Oct-08 4:04 
GeneralRe: Use with caution Pin
Stefan_Lang10-Oct-08 4:14
Stefan_Lang10-Oct-08 4:14 
GeneralRe: Use with caution Pin
cppnow10-Oct-08 6:16
cppnow10-Oct-08 6:16 
Stefan63 wrote:
our memory analysis doesn't take into account the actual data stored in the tree! If it's only 4 bytes per node, then you'll be saving 25% instead of 33 in the best case. If it's more the reduction might be a lot less.



It does take account - that's "S".
The smaller the data element you actually want to store in the tree - the greater the saving.

If you only need 4 bytes per node, then with 64 bit pointers, a tree node will take 28 bytes (3x8 + 4) while with 32 bit pointers a node is only 16 bytes (3x4 + 4). In addition, alignment will also play a part - in this case, alignment will probably mean the 64 bit node will actually take 32 bytes, so the saving (in this particular case) is in fact 50%. If the data stored is larger, the saving will be lower. For example, if the data element is 12 bytes long (let's say an int + "standard" pointer), then we have for 64 bit: 3x8+12 = 36 bytes, while with 32-bit pointers: 3x4+12 = 24, or 33% saving (8 and 16 byte alignment will give 40 vs 24 and 48 vs 32 respectively)

Stefan63 wrote:
If your main reason for introducing such a pointer type is being able to run an application on a lower footprint you have to look at the total amount of memory used, not just the pointers that are part of your data structures!


More often than not, there are only few big data-structures that take obscene amount of memory - optimize those structures and you save memory for the entire application - same as with optimizing performance. And of course - the percentage memory saving will be lower for the whole application - but it doesn't mean it's not worth the effort: if you have one piece of code which the profiler shows takes 50% of the CPU's cycles, and you can improve its performance by 50%, unless the effort is prohibitive, the 25% performance gain for the entire application is probably worth it. If the performance is still not satisfactory, you will just have to find other places to optimize.
GeneralRe: Use with caution Pin
Stefan_Lang10-Oct-08 6:30
Stefan_Lang10-Oct-08 6:30 
GeneralRe: Use with caution Pin
cppnow10-Oct-08 6:58
cppnow10-Oct-08 6:58 
GeneralRe: Use with caution Pin
Stefan_Lang12-Oct-08 22:52
Stefan_Lang12-Oct-08 22:52 
GeneralRe: Use with caution Pin
cppnow13-Oct-08 5:20
cppnow13-Oct-08 5:20 
QuestionBenchmarks? Pin
Leo Davidson25-Aug-08 13:10
Leo Davidson25-Aug-08 13:10 
AnswerRe: Benchmarks? Pin
cppnow26-Aug-08 4:20
cppnow26-Aug-08 4:20 
GeneralError Pin
steveb25-Aug-08 9:24
mvesteveb25-Aug-08 9:24 
GeneralRe: Error Pin
JeanLuc_25-Aug-08 10:30
JeanLuc_25-Aug-08 10:30 
GeneralRe: Error Pin
cppnow25-Aug-08 10:49
cppnow25-Aug-08 10:49 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.