Click here to Skip to main content
15,880,651 members
Articles / Programming Languages / C# 4.0
Article

TagCache - A .NET cache that allows you to tag (aka label) items and invalidate based on tags

Rate me:
Please Sign up or sign in to vote.
0.00/5 (No votes)
23 Sep 2011MIT7 min read 15.7K   180   12  
Explains the usage and inner working of TagCache.

Introduction

TagCache is a full fledged, high performance cache which allows associating values with tags (a.k.a. labels). Individual tags or a combination of tags can be used to invalidate all cache items that match the tag specification. The highlight here is that the tag invalidation operation is ~O(1). For the latest information, visit project home.

Background

At work I come across lots of scenarios where a single change (e.g., hierarchical data) would require a bunch of cache items to be invalidated (e.g., a whole hierarchy). I couldn't find any free .NET cache library that could do this efficiently. The sub optimal approach is to iterate over the cache entries and invalidate those that match the criterion. This vacuum led to the creation of TagCache which is a full fledged cache implementation that supports annotating cache items with tags (a.k.a. labels) and enables efficient invalidation using tags. In this article, I explain the usage and the implementation details of TagCache.

Example usage

Let us directly jump into an example usage to get a feel of caching with tags.

C#
TagCache tagCache = new TagCache();
    
Action fillCacheItems = delegate
{
    tagCache.Set("honda", new object(), 
                 new List<string> { "Vehicle", "Car", "Economy" });
    tagCache.Set("lexus", new object(), 
                 new List<string> { "Vehicle", "Car", "Luxury" });
    tagCache.Set("harley", new object(), 
                 new List<string> { "Vehicle", "Bike", "Luxury" });
    tagCache.Set("yamaha", new object(), 
                 new List<string> { "Vehicle", "Bike", "Economy" });
};

fillCacheItems();
tagCache.Invalidate("Bike");  // invalidates all bikes
    
fillCacheItems();
tagCache.Invalidate("Luxury");  // invalidates all luxury vehicles
    
fillCacheItems();
tagCache.Invalidate("Vehicle");  // invalidates all vehicles
    
fillCacheItems();
tagCache.Invalidate(new List<string> { "Car", "Luxury" });
// invalidates all luxury cars
    
fillCacheItems();
tagCache.Invalidate(new List<string> { "Bike", "Economy" });
// invalidates all economy bikes

fillCacheItems();
tagCache.Invalidate(new List<List<string>> { 
    new List<string> { "Bike", "Luxury" }, 
    new List<string> { "Car", "Economy" } 
});  // invalidates all luxury bikes, economy cars

Sample scenario

Imagine a system where we are dealing with a hierarchy of configuration information. This is similar to how web.config works in the case of ASP.NET where the hierarchy is typically a tree structure that descends from the root web.config all the way down to the sub folder level web.config.

However, our system is much simpler than ASP.NET, and we don't need to unload/reload the app-domain during configuration changes, we'll just have to re-compute the new configuration and move on. Also, we'll cache the computed configuration for performance reasons.

Now, what happens when a configuration update occurs while the system is in operation and the configuration values have been cached? Ideally, we expect the cached values corresponding to the node of change and all its descendents to be invalidated. With existing caching systems, the only approach would be to iterate through all the descendents and invalidate their cached configuration one by one, which is cumbersome and very inefficient.

Enter TagCache... you'll be able to invalidate the whole bunch of items in one go and in an efficient manner. You achieve this by associating each configuration value with a list of tags. The list of tags map to the list of nodes in the ancestral hierarchy of the node whose configuration has to be cached. So, the root level configuration will have only one tag - that of itself. The configuration of an Nth level node will have N tags associated to it. So, when a configuration of node K changes, you invalidate tag K which in turn would ensure that the cached configuration values of node K and all its descendents are removed from the cache!

The code for the above scenario follows. You'll find a fully working sample application attached. The code below is self explanatory...

C#
static TagCache _TagCache = new TagCache();

static List<int> GetNodeAncestorList(int nodeId)
{
    List<int> list = new List<int>();
    while (nodeId > 0)
    {
        list.Add(nodeId);
        nodeId = GetParentNode(nodeId);
    }
    return list;
}

static NodeConfig GetMergedNodeConfig(int nodeId)
{
    // This method computes and caches
    // the config (and associates with tags)

    if (nodeId == 0)
    {
        return null;    // no config available
    }

    string cacheKey = String.Format("config-{0}", nodeId);
    NodeConfig mergedNodeConfig = (NodeConfig)_TagCache.Get(cacheKey);

    if (mergedNodeConfig == null)
    {
        NodeConfig mergedParentConfig = 
          GetMergedNodeConfig(GetParentNode(nodeId));
        NodeConfig nonMergedNodeConfig = ReadNodeConfigFromXml(nodeId);
                
        mergedNodeConfig = 
          nonMergedNodeConfig.MergeWith(mergedParentConfig);

        // create a tag list with tags corresponding
        // to this node and all its ancestors
        // basically a tag is like a dependency indicator;
        // in this case this cached item has to be invalided
        // when any of its ancestors change
        List<string> tagList = 
          GetNodeAncestorList(nodeId).ConvertAll(
          (nid) => String.Format("node-{0}", nid));

        _TagCache.Set(cacheKey, mergedNodeConfig, tagList);
    }

    return mergedNodeConfig;
}

static void UpdateNodeConfig(int nodeId)
{
    // some update logic

    // invalidate the tag corresponding to this node
    // note that this will invalidate cached config
    // of this node and its descendent nodes at one go!
    // the ancestor node configs or node configs
    // in other hierarchies won't be affected
    _TagCache.Invalidate(String.Format("node-{0}", nodeId));
}

Implementation details

Before getting into the implementation details, I would recommend that you download the sample application and dig through the code a bit.

Layered cache

Point to note is that TagCache is a full fledge cache which supports most of the cache functionalities like simple key value storage, absolute and sliding time expiry, scavenging etc. I could have built tagging functionality upon any existing cache library but I chose to implement the cache myself. One reason is that it is so much fun to do it and the other is that it helped me to leverage the ConcurrentDictionary to implement a high performance cache with minimal locks (many cache libraries I checked don't yet use ConcurrentDictionary).

The classes forming the layered design are listed below...

  • CacheStore - basic key value map that uses ConcurrentDictionary
  • ExpirableCacheStore - built upon CacheStore and supports expiration of stored items
  • ScavengedExpirableCacheStore - built upon ExpirableCacheStore and supports scavenging of stored items when a configurable limit is reached
  • TaggedScavengedExpirableCacheStore - adds tagging functionality to ScavengedExpirableCacheStore
  • CacheItem - a common object used by all the layers and holds data that may be required in one or more layers
  • CacheOptions - a common object that holds configuration information for different layers of the cache
  • TagCache - the public facing cache API that encapsulates TaggedScavengedExpirableCacheStore

Please note that I was only experimenting with this sort of layered design and I am not advocating it. However, I am pretty pleased with the simplicity and elegance of this layered implementation.

From here, I'll focus my discussion on TaggedScavengedExpirableCacheStore as that is the unique piece here.

Tagging

The approach is simple as sequenced below...

  1. For each tag, maintain a TagInfo bookkeeping object.
  2. TagInfo will hold ValidAfterTimeStamp which is the latest invalidation timestamp for the tag.
  3. Whenever a tag is invalidated, its TagInfo.ValidAfterTimeStamp is updated to the current timestamp.
  4. Each cache item that has one or more tags associated with it that will capture the tag related information in a TagExpirationSpecifier object.
  5. The TagExpirationSpecifier will maintain the cache item creation timestamp in its CreationTimeStamp field.
  6. Whenever a cached item is looked up, TagExpirationSpecifier will iterate through the list of tags and ensure that all TagInfo.ValidAfterTimeStamps are behind the CreationTimeStamp.
  7. The cache item, if it passes the timestamp validation, would be considered valid, otherwise treated as invalid (and removed).

This is super easy, isn't it? As usual, the devil is in the details...

  1. While ValidAfterTimeStamp is good for single tag invalidations, there is no straightforward approach to invalidate efficiently on a combination of tags.
  2. The TagInfo object map needs to be pruned of those TagInfos which are no longer used by any of the cached items.

We will delve deeper into the above two concerns.

Invalidating tag combinations

One approach is to maintain a tag-combination TagInfo for all possible tag combinations and deal with invalidations similar to how we deal with single tag invalidations. This approach would severely increase the bookkeeping memory overhead and will be impractical in most cases. The other approach is to iterate through all the cache items and invalidate those items that match the tag combination. While this will work, it is very inefficient. Imagine iterating over tens of thousands of objects each time a tag combination needs to be invalidated.

In the TagCache implementation, we do check against every cache item. However, it happens in an on-demand fashion, thereby making the actual invalidation API to return quickly.

Here is what we do...

  1. In the Invalidate API, for a tag-combination, we capture the tag-combination and timestamp in the TagInvalidationInfo object.
  2. The next step is to add the TagInvalidationInfo object to a TailList maintained in TagInfo of any one of the tags in the combination.
  3. The list is called TailList because it tracks only the tail of the singly linked list.
  4. The TagExpirationSpecifier object of each cache item would maintain the point-in-time head of the TailList.
  5. Note that the point-in-time head maintained by TagExpirationSpecifier may be different for different cache items.
  6. While checking for cache item validity, TagExpirationSpecifier would iterate from the point-in-time head till the tail checking whether the TagInvalidationInfo would invalidate the cache item.
  7. At the end of the check, TagExpirationSpecifier would update the point-in-time head to the current tail of TailList.
  8. So effectively, each cache item will iterate once through the TagInvalidationInfo objects related to each tag in the cache item's tag list (during the Get request or during expiry poll).
  9. In the course of time, after all related cache items have iterated past a particular TagInvalidationInfo object, it'll be automatically available for GC. That is the beauty of TailList.

The validation logic for the tag combination available in TagExpirationSpecifier is shown below...

C#
// check whether any tag combinations have been invalidated
for (int i = 0; i < _TagInfoList.Count; i++)
{
    TagInfo tagInfo = _TagInfoList[i];
    object token = _TagInvalidationTokenList[i];
    object newToken = tagInfo.TagInvalidationInfoList.ListHeadToken;
    // take note of the new token before enumerating

    foreach (TagInvalidationInfo invalidationInfo in 
             tagInfo.TagInvalidationInfoList.GetEnumerable(token))
    {
        // first: time stamp should be older
        if (_CreationTimeStamp < invalidationInfo.ValidAfterTimeStamp)
        {
            // second: check if the invalidation tag
            // combination matches with the available tags 
            if (_IsMatchedByTagSet(invalidationInfo.TagInfoList))
            {
                // this means that this item is captured
                // by the invalidation and needs to expire, RIP
                return true;
            }
        }
    }

    // update the list head token to point to the new token
    _TagInvalidationTokenList[i] = newToken;
}

Pruning unused TagInfo objects

The TagInfoMap object maintains a ConcurrentDictionary of tag to TagInfo mapping. The map will keep growing as more and more tags are associated with cache items. However, in order to prevent the indefinite growth of TagInfoMap, we need to figure out those tags that are no longer associated with any object and remove them from the map. One way to do this is to iterate through all the cache items from time to time and capture a list of all tags that are used and use it thereby removing unused tags from the map. This approach is very inefficient as well as prone to race conditions (thereby requiring long standing locks).

The approach we follow in the TagCache implementation is described below:

  1. TagInfoMap would maintain a map between a tag and a WeakReference to TagInfo.
  2. When all cache items that are associated with a tag get removed eventually, the TagInfo would be garbage collected as well.
  3. We scavenge the TagInfoMap at intervals to identify and remove those WeakReference objects that have a null target.

Conclusion

TagCache helped solve a cache consistency pain-point that was haunting me at work for a long time. I sincerely hope that some of you may benefit from this piece of work.

History

  • 9/26/2011 - First publication.

License

This article, along with any associated source code and files, is licensed under The MIT License


Written By
Architect www.globalscholar.com
United States United States
I hold a bachelors degree in Computer Science & Engineering and am currently a Software Architect at GlobalScholar working on cutting edge technological solutions for education. Prior to this, I was at Microsoft working on Microsoft Outlook and Microsoft Office Live. My areas of technical specialization include Web/Ajax, Microsoft .Net, compilers and microprocessors. Other than work, I spend quite some time reading latest happenings in science/technology, hacking out some pet projects, playing musical instruments and being in yoga.

Comments and Discussions

 
-- There are no messages in this forum --