
Load Data in a Distributed Cache using GetCache Robot

17 Sep 2013 · CPOL · 5 min read
This tip shows how you can use the GetCache Robot API to populate cached data automatically.
Example of a cluster architecture that uses Robots

Introduction

Distributed caching systems are becoming increasingly widespread; many developers use them to make better use of server resources, in particular memory.

Integrating a distributed cache system into an application is generally very simple: just add the client library to your application and use it every time you want to retrieve data from the cache.

The typical use is to try to read the data from the cache and, if it is not present, read it from a data source and then put it in the cache.

This implies that the application must maintain access to both the cache and the data source.
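To make this concrete, here is a minimal cache-aside sketch in C#. The ICache interface and its Get/Put methods are hypothetical placeholders for whatever client API your caching system exposes; they are not the actual GetCache client API.

C#
using System;

// Hypothetical cache client abstraction (NOT the actual GetCache client API).
public interface ICache
{
    string Get(string key);
    void Put(string key, string value, TimeSpan expiration);
}

public class CustomerRepository
{
    private readonly ICache _cache;

    public CustomerRepository(ICache cache) { _cache = cache; }

    public string GetCustomerJson(string customerId)
    {
        string key = "customer:" + customerId;

        // 1. Try the cache first.
        string cached = _cache.Get(key);
        if (cached != null)
            return cached;

        // 2. Cache miss: read from the data source (database, web service, ...).
        string json = LoadCustomerFromDataSource(customerId);

        // 3. Put it in the cache so the next read is served from memory.
        _cache.Put(key, json, TimeSpan.FromMinutes(5));
        return json;
    }

    private string LoadCustomerFromDataSource(string customerId)
    {
        // Placeholder for a real database or web-service call.
        return "{\"Id\":\"" + customerId + "\"}";
    }
}

As the sketch shows, the repository needs a reference to both the cache and the data source; this is exactly the coupling that Robots remove.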

But what happens if an application should not have access to the data source? Is it possible for the caching system to load the data autonomously and put it in the cache on its own?

GetCache addresses this situation.

GetCache provides an API that allows users to build components that autonomously and periodically retrieve data and place it in the cache.

This is the GetCache Robot API, and the user components developed with it are named Robots.
Robot classes can access any data source, such as databases, web services, REST APIs, or external resources like files. Robots can aggregate data and put it in the GetCache storage. Clients only need to read the data, which is automatically and periodically refreshed by the Robots.

Background

GetCache is a key/value store developed with .NET 4.5 that is very simple to use and manage. GetCache operates as a distributed in-memory cache that supports data sharding across multiple nodes as well as data replication. Nodes can be added to and removed from the cluster without causing service interruptions.

GetCache stores any type of data: objects can be serialized using JSON or XML, and raw objects like images or videos can be stored as byte arrays. GetCache also supports automatic data expiration.

Creating a Robot

In this tip, we create a Robot that periodically downloads content from RSS sources and stores it in the cache.

As we shall see, the robot can also aggregate multiple RSS sources and store the combined result in the storage.
The first step in developing your own robots is to download the GetCache.Robot API library.
The library is available on NuGet; alternatively, it can be downloaded from the GetCache site, www.getcache.net.

Then we can proceed to define our own Robot class.

C#
using System;
using System.Collections.Generic;
using System.ServiceModel.Syndication;
using System.Xml;
// plus the using directive for the GetCache Robot API namespace
// (Robot, LoadData, StoringMethod, SerializationHelper)

/// Robot RSS Loader. Loads RSS resource contents and stores them in the GetCache Server.
[Robot("RssLoader")]
public class RssLoader
{
    /// Constructor.
    public RssLoader() { }

    /// Reads RSS feeds from The New York Times and stores them in the GetCache Server
    /// under the key "technology". The information is refreshed every 60 seconds.
    /// <returns>the serialized list of RSS feeds</returns>
    [LoadData("technology", Persistence = StoringMethod.SINGLE, RefreshTime = 60)]
    public Object LoadTechnologyRssContent()
    {
        // Read the RSS feed from the NYT
        List<RssContent> rssList = GetRssResource(
            "http://rss.nytimes.com/services/xml/rss/nyt/Technology.xml");

        // Use the helper to serialize the object to JSON
        return SerializationHelper.SerializeDataJson(rssList);
    }

    /// Reads RSS feeds from different sources, aggregates them and stores the
    /// aggregation in the GetCache Server under the key "jobMarket".
    /// The information is refreshed every 120 seconds and is replicated
    /// on every node of the GetCache cluster.
    [LoadData("jobMarket", Persistence = StoringMethod.REPLICATED, RefreshTime = 120)]
    public Object LoadJobRssContent()
    {
        // Read the RSS feed from the NYT
        List<RssContent> rssList = GetRssResource(
            "http://www.nytimes.com/services/xml/rss/nyt/JobMarket.xml");
        // Aggregate the RSS feed from The Washington Post
        rssList.AddRange(GetRssResource(
            "http://feeds.washingtonpost.com/rss/jobs/career-advice"));

        // Serialize the object to JSON
        return SerializationHelper.SerializeDataJson(rssList);
    }

    /// Simple method that reads an RSS feed using the Syndication API.
    private List<RssContent> GetRssResource(String url)
    {
        List<RssContent> list = new List<RssContent>();
        using (var reader = XmlReader.Create(url))
        {
            var feed = SyndicationFeed.Load(reader);
            foreach (SyndicationItem si in feed.Items)
            {
                RssContent rss = new RssContent()
                {
                    Id = si.Id,
                    LastUpdatedTime = si.LastUpdatedTime,
                    PublishDate = si.PublishDate,
                    Summary = si.Summary.Text,
                    Title = si.Title.Text,
                };
                list.Add(rss);
            }
        }
        return list;
    }
}
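
The RssContent type used above is not part of the Robot API; it is a plain DTO defined alongside the robot. A minimal version, assuming only the properties referenced in GetRssResource, might look like this:

C#
using System;

/// Simple serializable container for the RSS fields copied from each SyndicationItem.
/// This class is shown here as an assumption for illustration; define it with
/// whatever fields your application needs.
public class RssContent
{
    public string Id { get; set; }
    public DateTimeOffset LastUpdatedTime { get; set; }
    public DateTimeOffset PublishDate { get; set; }
    public string Summary { get; set; }
    public string Title { get; set; }
}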

The Robot attribute marks the RssLoader class as a robot component and names it "RssLoader".
The LoadData attribute must be applied to every method that is responsible for loading data.

  • The first parameter indicates the key that is used to store data in the cache.
  • The second parameter indicates which storing method the GetCache server has to use:
    • StoringMethod.SINGLE indicates that the server will save only one copy of the data in the cluster; in this case, the sharding algorithm is used to determine which node of the cluster holds the data.
    • StoringMethod.REPLICATED indicates that the server will save the data on all nodes of the cluster; in this case, the server uses an asynchronous replication mechanism that copies the data to every node.
  • The third parameter, RefreshTime, indicates how often the data is reloaded into the cache, expressed in seconds.

The attribute can only be applied to methods that take no parameters in their signature and return an object.

The object returned by the method can be a string or an array of bytes. In this example, we use the helper class SerializationHelper provided by the Robot API to serialize the object to JSON format.
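
On the client side, the JSON string read from the cache can be turned back into objects with any .NET JSON library. Below is a minimal sketch using Json.NET; how the string is fetched from GetCache is left to the client library, and it assumes the JSON produced by SerializationHelper is compatible with Json.NET's defaults.

C#
using System.Collections.Generic;
using Newtonsoft.Json;   // Json.NET NuGet package

public static class RssCacheReader
{
    /// Deserializes the JSON stored by the robot back into a list of RssContent.
    /// "json" is whatever string the GetCache client returned for the
    /// "technology" or "jobMarket" key.
    public static List<RssContent> Parse(string json)
    {
        return JsonConvert.DeserializeObject<List<RssContent>>(json);
    }
}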

Deploying the Robot

After compiling, we can produce a module to be installed on the server. A module is a set of DLLs containing the compiled classes needed by the robots. All the DLLs must be placed in a directory that has the name of the module. A module can contain more than one robot, and a robot can have more than one load-data method.

There are several ways to install the robot on a GetCache server (a sketch of the resulting directory layout follows the list):

  • Create the directory "MyRobot" under the "deploy" directory of the server, copy the robot DLL Robot.RssLoader.dll under "MyRobot" and restart the server.
  • Create the directory "MyRobot" under the "deploy" directory of the server, copy the robot DLL and use the server console command "deploy" to install the robot without restarting the server.
  • Create the directory "MyRobot" locally, copy the robot DLL locally and use the GetCache Remote Console application to install the robot (i.e. use the "upload-module" command).
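
For reference, with the first option the server's directory layout would look roughly like this (assuming all the robot classes are compiled into Robot.RssLoader.dll):

<GetCache server root>
    deploy
        MyRobot                  <- module directory (the module name)
            Robot.RssLoader.dll  <- compiled robot classes (plus any dependent DLLs)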

Test the Robot

Once the robot is installed on the server, it begins to retrieve the data and populate the cache, refreshing the data periodically.

You can then query the cache to get the list of loaded RSS feeds. We can use the server console to read the feeds:

>get technology
technology => [{"Id":"http:\/\/dealbook.nytimes.com\/2013\/09\/17\
/china-internet-giant-buys-stake-in-search-engine\/",...

>get jobMarket
jobMarket => [{"Id":"http:\/\/www.nytimes.com\/2010\/07\/18\
/business\/18corner.html","LastUpdatedTime":...

Conclusions

Of course, robots cannot be used to load every kind of information into the cache, and they do not eliminate the applications' need to access databases. However, robots lend themselves well to loading data that does not change, or changes little over time, such as lists of states, provinces, or postal codes, as well as data that changes little during the day, such as the list of customers, the list of users, or statistical data.

All these types of data can be preloaded into the GetCache storage so that they are always available to client applications.

History

  • First revision GetCache_Robots_1_0

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

