Google Cloud Platform: Storing Data in Google Cloud Datastore

Ted Neward

27 Jan 2014, CPOL, 12 min read

Google Cloud Platform - Part 6: Google Cloud Datastore

Google Cloud Platform: Getting Started with Google App Engine (Part 1)
Google Cloud Platform: Deploying with Google App Engine (Part 2)
Google Cloud Platform: User Management (Part 3)
Google Cloud Platform: Mobile Endpoints (Part 4)
Google Cloud Platform - Google Cloud SQL (Part 5)
Google Cloud Platform - Google Cloud Datastore (Part 6)
Google Cloud Platform - Google Cloud Storage (Part 7)

Part 6: Google Cloud Datastore

Welcome back to the sixth installment of our ongoing series on Google Cloud Platform. If you're one of those people who suffers from FOMO (fear of missing out), you might be better off starting with Part 1, but if you're alright with coming a bit late to the party, then please read on. We're building an application on Google Cloud Platform, and in this installment we continue our investigation of something that most applications need to do: store data.

As we mentioned last time, Google offers several different ways of dealing with data storage. Google Cloud SQL, covered in Part 5, is for applications that want to store data in the time-honored fashion of the relational database and relational model. There’s also the Google Cloud Storage Client, which is geared more toward "large binaries" like images and videos, and which we’ll get into next time. And lastly, there’s Google Cloud Datastore, a non-relational "NoSQL" data storage approach that we’ll get into here, in just a moment.

Before we do so, however, a very serious point bears repeating: Much as the various technical pundits and evangelists might want to disagree, none of these is "superior" to the others. Those individuals who prefer to slavishly follow whatever "best practice" industry pundits are going on about will hate to hear me say this, but the fact is, each one solves a different kind of problem, and sometimes the best approach is to use all of them simultaneously, a technique sometimes called "polyglot persistence" or "poly-store persistence". Or, as a famous writer once put it, "From each database, according to its abilities, to each project, according to its needs."

(OK, I admit it, that writer was me, just now. But still, it sounds good, doesn’t it?)

Google Cloud Datastore

Of the two, Google Cloud SQL has the advantage that it builds on top of the ever-familiar JDBC programming model, one that Java developers will be able to code in their sleep. The Google Cloud Datastore API, on the other hand, is not one that many Java developers will know. Fortunately, Google provides two approaches (and three APIs) for accessing it: a JDO- or JPA-based approach that encourages developers to build persistent classes and leave the details of persistence to the library, or a "low-level" API designed to provide access to the raw details of the Google Cloud Datastore storage layer. (For those who are truly curious, Google Cloud Datastore is built on top of Google’s "Bigtable" storage system, one of the earliest of the NoSQL-branded storage systems. Its design is described at http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf, though readers are warned: this is a full-blown academic paper, and it pulls no punches.)

Depending on which approach appeals, the details of using Google Cloud Datastore differ. The JDO and JPA approaches both require developers to write "persistent classes", classes that conform to a particular set of restrictions and are "enhanced" (modified) during the build process to include additional functionality to make the persistence (mostly) transparent to the application developer. For developers that don’t want (or need) to bother with low-level details, this is usually the better approach.

The high-level APIs, however, have the disadvantage that they are high-level APIs, and there are times when a "raw" low-level approach will offer certain benefits. So, again, neither approach is "better" than the other, though it is fairly safe to say that for persisting an instance of a given class, the high-level APIs will take fewer lines of code than the low-level APIs, whereas the low-level APIs will offer a finer degree of control.

Regardless of approach taken, Google Cloud Datastore has several advantages and disadvantages. For starters, Google Cloud Datastore "entities" are not stored in a tabular relational format; the format has some vaguely tabular shape to it, allowing for entities to in turn have data elements stored on them as directly-dependent data ("properties"), but it doesn’t recognize "relations" (foreign key relationships) as a core part of the model, and it doesn’t support an ad-hoc query format like SQL provides. In return, it automatically distributes data to manage very large data sets, and supports incredibly fast queries, largely because the queries are known well ahead of time and can be optimized long before the query actually runs. It’s a tradeoff, ease-of-accessibility against scale, but the beautiful thing about a poly-store approach is that highly-relational data can be stored in Google Cloud SQL, and large-scale data can be stored in Google Cloud Datastore, and both accessed and used from the same application.

Entities stored in Google Cloud Datastore aren’t accessed in the same way that we’re used to from SQL-based databases, either—entities are structured in a hierarchy (analogously to how files are stored on a filesystem), and thus have a parent entity, except for the "root" entities in the system. Finding an entity, then, becomes an exercise in navigating the child paths to a given entity, much as finding a file on the filesystem is an exercise in navigating through directories to the file in question.
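
The filesystem analogy can be sketched in a few lines of plain Java. This is only an illustrative model, not the Datastore API: the EntityPath class, its methods, and the "kind:name" path format are all hypothetical names invented for this sketch. (The real low-level API has its own Key and KeyFactory types in com.google.appengine.api.datastore.)

```java
// Hypothetical model of a hierarchical entity key; NOT the real Datastore Key class.
final class EntityPath
{
    private final EntityPath parent; // null for a "root" entity
    private final String kind;       // the entity's type, e.g. "Message"
    private final String name;       // identifier unique among siblings of that kind

    private EntityPath(EntityPath parent, String kind, String name)
    {
        this.parent = parent;
        this.kind = kind;
        this.name = name;
    }

    // A root entity has no parent, like a top-level directory.
    static EntityPath root(String kind, String name)
    {
        return new EntityPath(null, kind, name);
    }

    // A child entity hangs off its parent, like a file inside a directory.
    EntityPath child(String kind, String name)
    {
        return new EntityPath(this, kind, name);
    }

    boolean isRoot()
    {
        return parent == null;
    }

    // Walks up to the root and renders the full path, filesystem-style.
    String path()
    {
        String self = kind + ":" + name;
        return isRoot() ? "/" + self : parent.path() + "/" + self;
    }
}
```

For instance, EntityPath.root("User", "ted").child("Message", "greeting-1").path() yields "/User:ted/Message:greeting-1": locating the message means starting at the root entity and walking down through its children.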

It’s a bit easier to see in code, so let’s look at the two high-level approaches side by side. If these two don’t really float your boat, by the way, Google suggests three other possible open-source frameworks that layer on top of the Google Cloud Datastore API: Objectify (https://code.google.com/p/objectify-appengine/), Twig (https://code.google.com/p/twig-persist/) and Slim3 (https://sites.google.com/site/slim3appengine/). More details on each can be found on their respective home pages.

JDO

Java Data Objects was a predecessor to JPA during the "ORM Wars" of the JavaEE world, and syntactically looks and feels a lot like the object-oriented databases that were a big part of the object-oriented world back in the late 90’s. (Versant, in particular, was a big influence on the JDO specification, it seems—at least, based on my own time using Versant and then later using JDO.)

JDO lives in the javax.jdo package (its annotations are in javax.jdo.annotations), and uses annotations to decorate Java classes, describing the entities and the entity properties that are stored in Google Cloud Datastore. It requires an "enhancement" step, which churns out modified versions of those classes with the storage functionality ninja’ed in. Recall that we had to disable this step in Part 1 of this series, so really it just means re-enabling that Ant build script step, or doing nothing at all if you’re working with a fresh copy of the project template from the Google App Engine SDK.

Because JDO is a little less known than JPA, and because JPA is so frequently associated with relational databases (which can sometimes create false equivalences in new users’ minds), we’ll use JDO for the code examples. Note that, particularly at the most basic usage levels, JDO and JPA are pretty interchangeable, so readers more comfortable with JPA can freely use that instead.

JPA

Java Persistence API is the officially-sanctioned API for managing the object/relational impedance mismatch within the JavaEE stack, and was largely influenced by the success of the Hibernate open-source project. (In some respects, it can be called the "winner" of the "ORM Wars", if such a thing can be said to have a winner.) JPA annotations are defined in the javax.persistence package, and, as with JDO, developers annotate the classes to be persisted with JPA annotations to describe the entities to be stored in Google Cloud Datastore.

Low-level

A given datastore can also be accessed from a much lower-level perspective, as can well be imagined (since it essentially rides on top of Google’s Bigtable system). The low-level API can be helpful for seeing "underneath" the objects being stored via JDO or JPA, for example to work with the built-in datatypes offered by the Google Cloud Datastore API (phone numbers, emails, unlimited-length text fields, URL links, and so on). For the most part, though, Java developers will not need to use the low-level API, and it’s mentioned here mostly for completeness’ sake.

Code

Enough conceptual deconstruction; let’s see some code.

The application has thus far been greeting people as they’ve come up to the website (or, as we saw last issue, the mobile endpoint), but without any sense of history. Marketing has decided that the application needs to track users as they come to us, and those who’ve been here before get more personalized and/or heartfelt greetings. That means, practically speaking, that we want to track the date/time a given user (as given by the parameter to the mobile endpoint) hits the endpoint, as well as the message that we sent them this time (so as to avoid any obvious repetitions).

First of all, let’s re-define the Message class to be persistent, and to include both the timestamp of the greeting and the target of the greeting. We’ll still let the Message be passed in from outside the class to allow for maximum flexibility in deciding what the message should be. (Developer aesthetics may differ here—if you prefer to let Message encapsulate the actual choice of messages, that’s a perfectly reasonable decision. Personally, I prefer my data-storage types to be pretty dumb data transfer objects.)

From the JDO perspective, that means that the class needs to be annotated at the class level with the JDO @PersistenceCapable annotation, indicating that this class needs to be enhanced, and the fields to be stored need the JDO @Persistent annotation. There are a few cases where @Persistent isn’t necessary, but it doesn’t hurt to include it even if it’s redundant. JDO also demands that there be one field defined on the class that stores the primary key for the persistent object, so we add one:

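import java.util.Date;

import javax.jdo.annotations.IdGeneratorStrategy;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;

import com.google.appengine.api.datastore.Key;

// Marks the class for the enhancer; each instance is stored as one entity.
@PersistenceCapable
class Message
{
    // Primary key, generated by the datastore when the object is first stored.
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key key;

    // Who was greeted.
    @Persistent
    private String target;

    // The greeting text that was sent.
    @Persistent
    private String message;

    // When the greeting was sent.
    @Persistent
    private Date timestamp;

    public Message(String t, String m)
    {
        target = t;
        message = m;
        timestamp = new Date();
    }

    public String getMessage() { return message; }
    public void setMessage(String value) { message = value; }

    public Date getTimestamp() { return timestamp; }

    public String getTarget() { return target; }
}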

JDO also supports the idea of "serializable types": any class that is marked Serializable (by implementing the marker interface) can be serialized as a "blob", a straight array-of-bytes binary value, for those situations where the entity wants to store some dependent data but doesn’t really need to query or index over that data. For example, if we wanted to store images in the Message, that would be easy to do as a Serializable-implementing field type inside the Message, and would require little beyond that, assuming the Image or other class actually stored were Serializable. (App Engine’s JDO documentation has such fields declared with @Persistent(serialized = "true").) Note that since the standard Collection implementations such as ArrayList are Serializable, an entity could store a Collection this way, and the items within it, assuming all were also Serializable, would be stored along with the entity itself. However, the whole Collection would be stored as a single binary blob, meaning its items would be inaccessible as query predicate parameters.
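
To make the "blob" idea concrete, here is a minimal sketch using nothing but standard Java serialization. It is not what the DataNucleus plugin literally does internally, and the BlobDemo class and its method names are made up for this illustration; it simply shows a Serializable value being flattened into an opaque byte array and restored again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper: illustrates what "stored as a blob" means.
final class BlobDemo
{
    // Flattens any Serializable value into an opaque array of bytes.
    static byte[] toBlob(Serializable value)
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes))
        {
            out.writeObject(value);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds the original object from the bytes produced by toBlob().
    static Object fromBlob(byte[] blob)
    {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob)))
        {
            return in.readObject();
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        catch (ClassNotFoundException e)
        {
            throw new IllegalStateException(e);
        }
    }
}
```

Once a list of strings has gone through toBlob(), the storage layer sees only bytes. That is why the individual items cannot be used in a query filter: nothing short of deserializing the blob reveals what is inside it.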

From the developer’s point of view, this is all that’s necessary to make Message objects persistent. Having made the changes above, we can run "ant enhance", which in turn depends on the "compile" task: Ant runs the code through the Java compiler, then through the enhancer from DataNucleus (the tool used for both JDO and JPA persistence), and deposits the result into the generated "war" directory right next to the "src" directory in the project structure.

However, from the Google App Engine buildchain’s perspective, one other necessary change remains, and that’s the "JDO configuration file" (jdoconfig.xml), which has to end up in a very particular location: the war/WEB-INF/classes/META-INF directory. (In essence, the JDO config file must appear in the "META-INF" directory of the classes it describes, and thus, since these classes are part of a servlet WAR format, in the WEB-INF/classes subdirectory.) The default project template comes with a version stored in the src/META-INF directory that looks like so:

<?xml version="1.0" encoding="utf-8"?>
<jdoconfig xmlns="http://java.sun.com/xml/ns/jdo/jdoconfig"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:noNamespaceSchemaLocation=
               "http://java.sun.com/xml/ns/jdo/jdoconfig">

  <persistence-manager-factory name="transactions-optional">
    <property name="javax.jdo.PersistenceManagerFactoryClass"
          value="org.datanucleus.api.jdo.JDOPersistenceManagerFactory"/>
    <property name="javax.jdo.option.ConnectionURL"
              value="appengine"/>
    <property name="javax.jdo.option.NontransactionalRead"
              value="true"/>
    <property name="javax.jdo.option.NontransactionalWrite"
              value="true"/>
    <property name="javax.jdo.option.RetainValues"
              value="true"/>
    <property name="datanucleus.appengine.autoCreateDatastoreTxns"
              value="true"/>
    <property name="datanucleus.appengine.singletonPMFForName"
              value="true"/>
  </persistence-manager-factory>
</jdoconfig>

As with all XML files, make sure the spelling and case of all non-quoted strings is exactly as defined above; usually the default jdoconfig.xml file is fine, and it’s best to just start with that until a situation arises that demands changing it.

The next steps come when we want to find all the Messages that have a particular target as the value of the target field, to decide what Message to hand back, as well as to store the Message that we created and handed back. Both of these steps require a JDO PersistenceManager, which is obtained by asking the static JDOHelper class for a PersistenceManagerFactory, which in turn hands out PersistenceManager instances:

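// Assumption: @Named is javax.inject.Named, as used for the Cloud Endpoints parameter in Part 4.
import javax.inject.Named;
import javax.jdo.JDOHelper;
import javax.jdo.PersistenceManager;
import javax.jdo.PersistenceManagerFactory;

public class Greetings
{
    // One factory per application; the name must match the entry in jdoconfig.xml.
    private static final PersistenceManagerFactory pmf =
        JDOHelper.getPersistenceManagerFactory("transactions-optional");

    public Message greet(@Named("target") String target)
    {
        // A PersistenceManager is a short-lived, per-request object.
        PersistenceManager pm = pmf.getPersistenceManager();
        try
        {
            Message msg = new Message(
                target,
                "Hello, " + target + ", from Google Cloud Endpoints!");
            // Store the new Message as an entity in the datastore.
            pm.makePersistent(msg);
            return msg;
        }
        finally
        {
            // Always release the manager, even if storage fails.
            pm.close();
        }
    }
}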

Note that the string used to get the PersistenceManagerFactory has to match the name listed in the jdoconfig.xml file; this lets developers define different kinds of PersistenceManagers (one with transactions required, one with them optional, and so on) and pick one by name. Once we have a PersistenceManager, storing the Message is easy: the makePersistent() method call does the actual storage, and JDO and the Google Cloud Datastore API do the rest of the work from there.

Summary

As mentioned earlier, JDO is not the only way to get at Google Cloud Datastore; the JPA standard is equally supported, and may, for some developers, be an easier ramp to getting started with Google Cloud Datastore, if they’re familiar with it from working with Hibernate or the more recent JavaEE standard technologies. And, as one might easily surmise, there’s a lot more to JDO than just what we’ve seen here—the DataNucleus project has a great deal more documentation on JDO, including some nice examples of how to use it in a variety of different scenarios; anyone looking to do anything non-trivial with JDO should spend some serious quality time there. (This is one of the nice things about Google using established Java API standards like JDO and JPA—there’s a ton of documentation already out there, so we can leverage that in learning and using these tools.)

In the meantime, however, we now have a record of those whom we’ve greeted, and we could perhaps use that data as a way of changing up the Message—for those who’ve never been here before, offer them a very polite greeting ("It’s very nice to make your acquaintance"), whereas those who’ve been here numerous times before get a more casual and friendly greeting ("WHAZZZZUPPPPP?!?"). Future customizations are endless, which is good, because this is clearly the Internet’s Next Big Thing.
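
As a rough sketch of that idea, the greeting could be chosen from the number of earlier Messages stored for a given target. The GreetingChooser class, the middle tier of greeting, and the threshold of five visits are all assumptions made up for this example; the article prescribes none of them, and obtaining the count from the datastore is left out entirely.

```java
// Hypothetical helper: picks a greeting based on how often we've seen this target.
final class GreetingChooser
{
    // Assumed cutoff between "returning" and "regular" visitors.
    private static final int REGULAR_VISITOR_THRESHOLD = 5;

    static String chooseGreeting(String target, int previousVisits)
    {
        if (previousVisits <= 0)
        {
            // First-time visitor: keep it polite.
            return "It's very nice to make your acquaintance, " + target + ".";
        }
        if (previousVisits < REGULAR_VISITOR_THRESHOLD)
        {
            // Seen a few times before: friendly but restrained.
            return "Hello again, " + target + "!";
        }
        // A regular: go casual.
        return "WHAZZZZUPPPPP, " + target + "?!?";
    }
}
```

The string returned here would simply take the place of the hard-coded "Hello, ..." text passed to the Message constructor in greet().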

In the next article, we’ll talk about how to jazz up the greetings even further by including video or images with the greeting, but for now, happy coding!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By

Ted Neward

Web Developer

United States

Ted Neward is an independent consultant specializing in high-scale enterprise systems, working with clients ranging in size from Fortune 500 corporations to small 10-person shops. He is an authority in Java and .NET technologies, particularly in the areas of Java/.NET integration (both in-process and via integration tools like Web services), back-end enterprise software systems, and virtual machine/execution engine plumbing.

He is the author or co-author of several books, including Effective Enterprise Java, C# In a Nutshell, SSCLI Essentials, Server-Based Java Programming, and a contributor to several technology journals. Ted is also a Microsoft MVP Architect, BEA Technical Director, INETA speaker, former DevelopMentor instructor, frequent worldwide conference speaker, and a member of various Java JSRs. He lives in the Pacific Northwest with his wife, two sons, and eight PCs.