Introduction
RavenDB is a new open source document database for .NET. If you have never worked with a document database before, the simplest way to think about it is to imagine serializing your objects and storing them on the hard drive of the machine where the app runs. If you stored each object under its key, or whatever lookup value you use most often, it would be easy to retrieve the entire object without mapping to and from columns and rows in a SQL database. Supporting other kinds of lookups, concurrency, etc. would be much harder to build yourself, hence the creation of document databases. The documents stored don't need to be serialized objects (arbitrary documents can be stored independent of objects), but that is probably the most common use. Just keep in mind that it is a completely different way of dealing with data storage, so you probably don't want to always approach it as you would a SQL database. A .NET client is included for communicating with the server, but the underlying protocol is HTTP/JSON, so any client that can communicate with the server via HTTP/JSON will work.
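To make the "serialize the object and file it under a key" idea concrete, here is a minimal sketch in Python. It is not RavenDB and does not use its client API; `TinyDocumentStore`, the `Company` fields and the `companies/1` key are all made up for illustration, and the store lives in memory rather than on disk.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Company:
    name: str
    region: str


class TinyDocumentStore:
    """Toy in-memory store: each document is a JSON string filed under a key."""

    def __init__(self):
        self._docs = {}

    def put(self, key, obj):
        # Serialize the whole object as one JSON document.
        self._docs[key] = json.dumps(asdict(obj))

    def get(self, key, cls):
        # One lookup by key returns the entire object; no column mapping needed.
        return cls(**json.loads(self._docs[key]))


store = TinyDocumentStore()
store.put("companies/1", Company(name="Acme", region="A"))
loaded = store.get("companies/1", Company)
```

Everything beyond this (secondary lookups, concurrency, durability) is the hard part that a real document database takes on.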
Simple Example
Here is a simple example of how to store two arbitrary POCO objects in RavenDB, query for them and print out some of the information to the screen. In this example, the RavenDB server is on the same machine as the client but that doesn't have to be the case (more on that next). Notice that I didn't have to create a table, I didn't have to map columns and classes to the tables and I didn't have to create any stored procedures. The Company class isn't even marked as Serializable - it just works.
Also keep in mind that the creation of a DocumentStore object should be treated as an expensive operation, similar to creating a session factory in NHibernate. Currently it isn't expensive, but there is future work planned that may change this.
The Server and the Client
When starting up the server and running the above example, you will see output in the server similar to the above (log4net is used so console logging can be turned off or directed to a file if desired). There are quite a few interesting things going on here. First, note that instead of submitting two separate commands, the client batched them and submitted them all at once with the SaveChanges() call. Second, after the batch operation, the server started working on the indexes that applied to the saved documents and was then able to query and return those documents.
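The batching behaviour can be pictured as a unit of work: writes are queued locally and sent together when the session is saved. The Python sketch below only illustrates that pattern. `FakeServer`, `Session`, `store` and `save_changes` are hypothetical names, not the RavenDB client API, and no real network traffic is involved.

```python
class FakeServer:
    """Stand-in for a remote server; it only records each request it receives."""

    def __init__(self):
        self.requests = []

    def send_batch(self, commands):
        self.requests.append(list(commands))


class Session:
    """Collects pending writes and sends them as one batch."""

    def __init__(self, server):
        self._server = server
        self._pending = []

    def store(self, key, document):
        # Nothing goes over the wire yet; the write is only queued.
        self._pending.append(("PUT", key, document))

    def save_changes(self):
        # All queued writes travel together in a single request.
        if self._pending:
            self._server.send_batch(self._pending)
            self._pending = []


server = FakeServer()
session = Session(server)
session.store("companies/1", {"name": "Acme"})
session.store("companies/2", {"name": "Globex"})
requests_before_save = len(server.requests)
session.save_changes()
```

After `save_changes()` the fake server has seen exactly one request containing both writes, which mirrors the single batch visible in the server log.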
In addition to the client/server method shown above, you can also directly embed the server functionality inside your app if there is no need for a distributed architecture.
The Browser Based Admin Tool
Once the Raven server is running, it can be accessed via browser to inspect its contents, indexes and to view the documentation.
You can even edit documents and index definitions using your browser.
Indexes and Performance
The more eagle-eyed among you may have noticed the method call to "WaitForNonStaleResults" for the query in the simple example above. When designing a system that uses indexes, there are a few approaches you can take:
- Make the client wait while you update the index whenever data is changed
- Make the client wait while you read data from an index if it is not current (stale)
- Don't make the client wait while updating or reading and just let the client know if data is stale
RavenDB takes the third alternative for performance reasons, but if you want to make it wait, you can do so using the WaitForNonStaleResults method. Chances are that if that call had been left out and any other action at all had happened between the insert and the read, the query would still have worked, since index update times tend to be in the low tens of milliseconds. Typically, it is sufficient and cheaper to just get the stale data while the index is updated and to get the updated data on the next view or query.
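The third approach, and the option to wait, can be modelled in a few lines. This is a simplified Python sketch, not how RavenDB is implemented: `StaleAwareIndex` and its methods are hypothetical, the "background" indexer is an explicit method call so the example stays deterministic, and updates and deletes are ignored.

```python
class StaleAwareIndex:
    """Toy index over company names that is updated after writes, not during them."""

    def __init__(self):
        self._docs = {}
        self._index = {}      # name -> set of document keys
        self._unindexed = []  # keys written but not yet indexed

    def put(self, key, doc):
        # Writes return immediately; indexing is deferred.
        self._docs[key] = doc
        self._unindexed.append(key)

    def run_indexer(self):
        # In a real server this work would happen on a background thread.
        for key in self._unindexed:
            self._index.setdefault(self._docs[key]["name"], set()).add(key)
        self._unindexed.clear()

    def query(self, name, wait_for_non_stale=False):
        if wait_for_non_stale:
            # Simulates blocking until the indexer has caught up.
            self.run_indexer()
        is_stale = bool(self._unindexed)
        keys = sorted(self._index.get(name, set()))
        return keys, is_stale


index = StaleAwareIndex()
index.put("companies/1", {"name": "Acme"})
stale_result = index.query("Acme")
fresh_result = index.query("Acme", wait_for_non_stale=True)
```

The first query returns immediately with no matches and a stale flag; the second waits for the index and finds the document. The caller decides which trade-off it wants per query.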
Adding New Indexes using LINQ
Indexes can be created or edited using the Web UI or programmatically. Here I'll show how to add a new index using the Web UI and then use that index for a query.
First, create the index using LINQ syntax:
Then write a query that uses that index, with Lucene syntax for the where clause:
The server will recognize the index exists and use it via Lucene to get the smaller result set instead of scanning all items.
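As a rough illustration of why an index avoids a full scan, the Python sketch below builds a lookup table from a projection function (playing the role of the LINQ index definition) and answers a single `Field:value` term from it. It is a conceptual model only: `build_index`, `query_index` and the sample documents are hypothetical, and real Lucene indexes and query parsing are far more capable.

```python
def build_index(docs, map_fn):
    """Apply the map function to every document and group keys by (field, value)."""
    index = {}
    for key, doc in docs.items():
        for field, value in map_fn(doc).items():
            index.setdefault((field, value), []).append(key)
    return index


def query_index(index, docs, term):
    """Answer a single 'Field:value' term from the index, without scanning docs."""
    field, value = term.split(":", 1)
    return [docs[key] for key in index.get((field, value), [])]


docs = {
    "companies/1": {"Name": "Acme", "Region": "A"},
    "companies/2": {"Name": "Globex", "Region": "B"},
    "companies/3": {"Name": "Initech", "Region": "A"},
}

# Plays the role of the index definition: project the fields to make searchable.
by_region = build_index(docs, lambda doc: {"Region": doc["Region"]})
region_a = query_index(by_region, docs, "Region:A")
```

The query touches only the entries filed under `("Region", "A")`, so its cost depends on the size of the result rather than the size of the whole collection.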
Sharding
RavenDB also supports sharding, or partitioning the data across multiple servers. If, for example, we knew we had many companies split across multiple regions and wanted certain regions on one server and other regions on another, we could achieve that. The design of the sharding is based on Hibernate Shards, so if you are familiar with that, you will notice some similarities.
Here is an example using sharding (included as a sample project in the RavenDB source code):
When using sharding, you have to come up with the rules to partition the data by. For this example, we will assume 2 shards with companies in region A going to shard 1 and region B going to shard 2. To achieve this, we will create a concrete instance that implements IShardStrategy, which defines 3 pieces of the sharding behavior:
The ShardSelectionStrategy is how it knows which shard to put a new item in:
The shard access strategy controls how it executes queries across multiple shards. Here I've used the built in parallel method that will query all shards simultaneously and return the results. I'm also using the existing resolution strategy which searches all shards. For more information on the idea behind these strategies, see the Hibernate Shards documentation.
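The selection rule for this example (region A to shard 1, region B to shard 2) can be sketched as follows. This is Python, not the C# interface from the sample project: `RegionShardSelection`, `shard_for` and the shard names are hypothetical stand-ins for whatever the real strategy exposes.

```python
class RegionShardSelection:
    """Hypothetical rule: region "A" goes to shard 1, region "B" to shard 2."""

    _REGION_TO_SHARD = {"A": "shard1", "B": "shard2"}

    def shard_for(self, company):
        region = company["region"]
        try:
            return self._REGION_TO_SHARD[region]
        except KeyError:
            # Failing loudly beats silently writing to the wrong server.
            raise ValueError(f"no shard configured for region {region!r}") from None


shards = {"shard1": [], "shard2": []}
selector = RegionShardSelection()

for company in (
    {"name": "Acme", "region": "A"},
    {"name": "Globex", "region": "B"},
    {"name": "Initech", "region": "A"},
):
    # The selection strategy is consulted once for each new document.
    shards[selector.shard_for(company)].append(company)
```

Keeping the rule in one small object means the partitioning scheme can change without touching the code that stores documents.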
Things Implemented using New .NET 4.0/VS2010 Features
The ParallelShardAccessStrategy uses the new Tasks functionality in .NET 4.0. Here is the code that queries all the shards simultaneously:
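The same fan-out idea, expressed outside .NET, looks roughly like this Python sketch built on `concurrent.futures`. It is an analogy to the Tasks-based approach rather than a port of the RavenDB code: `query_shard` and `query_all_shards` are hypothetical names, and each shard is just an in-memory list standing in for a server.

```python
from concurrent.futures import ThreadPoolExecutor


def query_shard(shard_docs, predicate):
    """Stand-in for a network query against one shard."""
    return [doc for doc in shard_docs if predicate(doc)]


def query_all_shards(shards, predicate):
    """Run the same query on every shard at once and merge the results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        # map() submits all queries up front and yields results in shard order.
        per_shard = pool.map(lambda docs: query_shard(docs, predicate), shards)
        return [doc for results in per_shard for doc in results]


shards = [
    [{"name": "Acme", "region": "A"}, {"name": "Initech", "region": "A"}],
    [{"name": "Globex", "region": "B"}],
]
matches = query_all_shards(shards, lambda doc: len(doc["name"]) <= 6)
```

With real servers, the total wait is close to the slowest shard's response time instead of the sum of all of them.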
There is an implementation of an expando object that enables dynamic access of a JSON object:
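For readers unfamiliar with the expando idea, the Python sketch below shows the general technique of exposing JSON fields as attributes. It is an analogy, not the RavenDB class: `DynamicJson` and the sample document are hypothetical.

```python
import json


class DynamicJson:
    """Wraps parsed JSON so fields can be read as attributes."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            # Avoid infinite recursion if _data itself is ever missing.
            raise AttributeError(name)
        try:
            value = self._data[name]
        except KeyError:
            raise AttributeError(name) from None
        return _wrap(value)


def _wrap(value):
    # Nested objects become DynamicJson too; lists are wrapped element by element.
    if isinstance(value, dict):
        return DynamicJson(value)
    if isinstance(value, list):
        return [_wrap(item) for item in value]
    return value


company = DynamicJson(json.loads(
    '{"Name": "Acme", "Address": {"City": "Springfield"}, "Phones": ["555-0100"]}'
))
city = company.Address.City
```

Here `company.Name` and `company.Address.City` resolve at run time against the JSON, so no class has to be declared for the document's shape.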
See the source for more examples and more details.
Where To Go From Here
To run the examples shown here, download the source zip file, open it in VS 2010 and build the solution. Run Raven.Server to start the server, then launch the simple client example, uncommenting the code that inserts the initial data so the database is populated. To run the sharded examples, you'll need to make a copy of the Raven.Server bin directory and change the config for it to listen on a different port so you can run two servers simultaneously on the same machine (or use another machine). The first time you run the server, it will detect whether it has access to the port and, if not, grant permissions, which will prompt you for admin access.
There are far more features in the product than what I have shown here - to see the rest, get into the code and have a look around. The source code repository is on GitHub at http://github.com/ravendb/ravendb - but the current source and code for the examples here is included with the article. You can grab all the code using git if you want to contribute, or just download all the files you need using the "Download source" button GitHub provides at the above link. You will also find a list of issues on GitHub, so if you want to jump in and help on a smaller issue to get used to using git and VS 2010, go for it.
Also, see Oren's blog, which includes many posts related to Raven as it was being built, giving some insight into the design decisions made, etc.
Here are the instructions I used to get git running locally on my Windows machine.
Using the Git GUI has so far worked for me (as opposed to the command line), even in some of the more complicated scenarios such as creating and merging branches. It is still helpful to look through the command line instructions for branching in git, just so you know the terminology, which is quite different from VSS/TFS/SVN.
The user group for the project can be found at http://groups.google.com/group/ravendb.
History
- 26-Apr-2010 - Initial version