Introduction
TagCache is a full fledged, high performance cache which allows associating values
with tags (a.k.a. labels). Individual tags or a combination of tags can be used to invalidate all cache items that match the tag
specification. The highlight here is that the tag invalidation operation is ~O(1). For the latest information, visit project home.
Background
At work I come across lots of scenarios where a single change (e.g., hierarchical data) would require a bunch of cache items to be invalidated (e.g., a whole hierarchy).
I couldn't find any free .NET cache library that could do this efficiently. The sub optimal approach is to iterate over the cache entries and invalidate those that match the criterion.
This vacuum led to the creation of TagCache which is a full fledged cache implementation that supports annotating cache items with tags (a.k.a. labels) and enables efficient
invalidation using tags. In this article, I explain the usage and the implementation details of TagCache.
Example usage
Let us directly jump into an example usage to get a feel of caching with tags.
TagCache tagCache = new TagCache();
Action fillCacheItems = delegate
{
tagCache.Set("honda", new object(),
new List<string> { "Vehicle", "Car", "Economy" });
tagCache.Set("lexus", new object(),
new List<string> { "Vehicle", "Car", "Luxury" });
tagCache.Set("harley", new object(),
new List<string> { "Vehicle", "Bike", "Luxury" });
tagCache.Set("yamaha", new object(),
new List<string> { "Vehicle", "Bike", "Economy" });
};
fillCacheItems();
tagCache.Invalidate("Bike");
fillCacheItems();
tagCache.Invalidate("Luxury");
fillCacheItems();
tagCache.Invalidate("Vehicle");
fillCacheItems();
tagCache.Invalidate(new List<string> { "Car", "Luxury" });
fillCacheItems();
tagCache.Invalidate(new List<string> { "Bike", "Economy" });
fillCacheItems();
tagCache.Invalidate(new List<List<string>> {
new List<string> { "Bike", "Luxury" },
new List<string> { "Car", "Economy" }
});
Sample scenario
Imagine a system where we are dealing with a hierarchy of configuration information. This is similar to how web.config works
in the case of ASP.NET where the hierarchy is typically a tree structure that descends from the root web.config all the way down to the sub folder level web.config.
However, our system is much simpler than ASP.NET, and we don't need to unload/reload the app-domain during configuration changes, we'll
just have to re-compute the new configuration and move on. Also, we'll cache the computed configuration for performance reasons.
Now, what happens when a configuration update occurs while the system is in operation and the configuration values have been cached?
Ideally, we expect the cached values corresponding to the node of change and all its descendents to be invalidated. With existing caching systems,
the only approach would be to iterate through all the descendents and invalidate their cached configuration one by one, which is cumbersome and very inefficient.
Enter TagCache... you'll be able to invalidate the whole bunch of items in one go and in an efficient manner. You achieve this by
associating each configuration value with a list of tags. The list of tags map to the list of nodes in the ancestral hierarchy of the node whose configuration
has to be cached. So, the root level configuration will have only one tag - that of itself. The configuration of an Nth level node will have N tags
associated to it. So, when a configuration of node K changes, you invalidate tag K which in turn would ensure that the cached configuration values of node K and
all its descendents are removed from the cache!
The code for the above scenario follows. You'll find a fully working sample application attached. The code below is self explanatory...
static TagCache _TagCache = new TagCache();
static List<int> GetNodeAncestorList(int nodeId)
{
List<int> list = new List<int>();
while (nodeId > 0)
{
list.Add(nodeId);
nodeId = GetParentNode(nodeId);
}
return list;
}
static NodeConfig GetMergedNodeConfig(int nodeId)
{
if (nodeId == 0)
{
return null;
}
string cacheKey = String.Format("config-{0}", nodeId);
NodeConfig mergedNodeConfig = (NodeConfig)_TagCache.Get(cacheKey);
if (mergedNodeConfig == null)
{
NodeConfig mergedParentConfig =
GetMergedNodeConfig(GetParentNode(nodeId));
NodeConfig nonMergedNodeConfig = ReadNodeConfigFromXml(nodeId);
mergedNodeConfig =
nonMergedNodeConfig.MergeWith(mergedParentConfig);
List<string> tagList =
GetNodeAncestorList(nodeId).ConvertAll(
(nid) => String.Format("node-{0}", nid));
_TagCache.Set(cacheKey, mergedNodeConfig, tagList);
}
return mergedNodeConfig;
}
static void UpdateNodeConfig(int nodeId)
{
_TagCache.Invalidate(String.Format("node-{0}", nodeId));
}
Implementation details
Before getting into the implementation details, I would recommend that you download the sample application and dig through the code a bit.
Layered cache
Point to note is that TagCache is a full fledge cache which supports most of the cache functionalities like simple key value storage,
absolute and sliding time expiry, scavenging etc. I could have built tagging functionality upon any existing cache library but I chose
to implement the cache myself. One reason is that it is so much fun to do it and the other is that it helped me to leverage the
ConcurrentDictionary
to implement a high performance cache with minimal locks (many cache libraries I checked don't yet use ConcurrentDictionary
).
The classes forming the layered design are listed below...
CacheStore
- basic key value map that uses ConcurrentDictionary
ExpirableCacheStore
- built upon CacheStore
and supports expiration of stored itemsScavengedExpirableCacheStore
- built upon ExpirableCacheStore
and supports scavenging of stored items when a configurable limit is reachedTaggedScavengedExpirableCacheStore
- adds tagging functionality to ScavengedExpirableCacheStore
CacheItem
- a common object used by all the layers and holds data that may be required in one or more layersCacheOptions
- a common object that holds configuration information for different layers of the cacheTagCache
- the public facing cache API that encapsulates TaggedScavengedExpirableCacheStore
Please note that I was only experimenting with this sort of layered design and I am not advocating it. However, I am pretty pleased with
the simplicity and elegance of this layered implementation.
From here, I'll focus my discussion on TaggedScavengedExpirableCacheStore
as that is the unique piece here.
Tagging
The approach is simple as sequenced below...
- For each tag, maintain a
TagInfo
bookkeeping object. TagInfo
will hold ValidAfterTimeStamp
which is the latest invalidation timestamp for the tag.- Whenever a tag is invalidated, its
TagInfo.ValidAfterTimeStamp
is updated to the current timestamp. - Each cache item that has one or more tags associated with it that will capture the tag related information in a
TagExpirationSpecifier
object. - The
TagExpirationSpecifier
will maintain the cache item creation timestamp in its CreationTimeStamp
field. - Whenever a cached item is looked up,
TagExpirationSpecifier
will iterate through the list of tags and ensure that all TagInfo.ValidAfterTimeStamp
s
are behind the CreationTimeStamp
. - The cache item, if it passes the timestamp validation, would be considered valid, otherwise treated as invalid (and removed).
This is super easy, isn't it? As usual, the devil is in the details...
- While
ValidAfterTimeStamp
is good for single tag invalidations, there is no straightforward approach to invalidate efficiently on a combination of tags. - The
TagInfo
object map needs to be pruned of those TagInfo
s which are no longer used by any of the cached items.
We will delve deeper into the above two concerns.
Invalidating tag combinations
One approach is to maintain a tag-combination TagInfo
for all possible tag combinations and deal with invalidations similar to how we deal with
single tag invalidations. This approach would severely increase the bookkeeping memory overhead and will be impractical in most cases. The other approach is to
iterate through all the cache items and invalidate those items that match the tag combination. While this will work, it is very inefficient. Imagine iterating over
tens of thousands of objects each time a tag combination needs to be invalidated.
In the TagCache
implementation, we do check against every cache item. However, it happens in an on-demand fashion, thereby making the actual invalidation API to return quickly.
Here is what we do...
- In the Invalidate API, for a tag-combination, we capture the tag-combination and timestamp in the
TagInvalidationInfo
object. - The next step is to add the
TagInvalidationInfo
object to a TailList
maintained in TagInfo
of any one of the tags in the combination. - The list is called
TailList
because it tracks only the tail of the singly linked list. - The
TagExpirationSpecifier
object of each cache item would maintain the point-in-time head of the TailList
. - Note that the point-in-time head maintained by
TagExpirationSpecifier
may be different for different cache items. - While checking for cache item validity,
TagExpirationSpecifier
would iterate from the point-in-time head till the tail checking whether
the TagInvalidationInfo
would invalidate the cache item. - At the end of the check,
TagExpirationSpecifier
would update the point-in-time head to the current tail of TailList
. - So effectively, each cache item will iterate once through the
TagInvalidationInfo
objects related to each tag in the cache
item's tag list (during the Get request or during expiry poll). - In the course of time, after all related cache items have iterated past a particular
TagInvalidationInfo
object, it'll be automatically available
for GC. That is the beauty of TailList
.
The validation logic for the tag combination available in TagExpirationSpecifier
is shown below...
for (int i = 0; i < _TagInfoList.Count; i++)
{
TagInfo tagInfo = _TagInfoList[i];
object token = _TagInvalidationTokenList[i];
object newToken = tagInfo.TagInvalidationInfoList.ListHeadToken;
foreach (TagInvalidationInfo invalidationInfo in
tagInfo.TagInvalidationInfoList.GetEnumerable(token))
{
if (_CreationTimeStamp < invalidationInfo.ValidAfterTimeStamp)
{
if (_IsMatchedByTagSet(invalidationInfo.TagInfoList))
{
return true;
}
}
}
_TagInvalidationTokenList[i] = newToken;
}
Pruning unused TagInfo objects
The TagInfoMap
object maintains a ConcurrentDictionary
of tag to TagInfo
mapping. The map will keep growing as
more and more tags are associated with cache items. However, in order to prevent the indefinite growth of TagInfoMap
, we need to
figure out those tags that are no longer associated with any object and remove them from the map. One way to do this is to iterate through all the cache
items from time to time and capture a list of all tags that are used and use it thereby removing unused tags from the map. This approach is very
inefficient as well as prone to race conditions (thereby requiring long standing locks).
The approach we follow in the TagCache
implementation is described below:
TagInfoMap
would maintain a map between a tag and a WeakReference
to TagInfo
.- When all cache items that are associated with a tag get removed eventually, the
TagInfo
would be garbage collected as well. - We scavenge the
TagInfoMap
at intervals to identify and remove those WeakReference
objects that have a null
target.
Conclusion
TagCache
helped solve a cache consistency pain-point that was haunting me at work for a long time. I sincerely hope that some of you may benefit from this piece of work.
History
- 9/26/2011 - First publication.
I hold a bachelors degree in Computer Science & Engineering and am currently a Software Architect at GlobalScholar working on cutting edge technological solutions for education. Prior to this, I was at Microsoft working on Microsoft Outlook and Microsoft Office Live. My areas of technical specialization include Web/Ajax, Microsoft .Net, compilers and microprocessors. Other than work, I spend quite some time reading latest happenings in science/technology, hacking out some pet projects, playing musical instruments and being in yoga.