
How complicated can a search be?

28 Jul 2016, CPOL, 6 min read
Cloudant distributed database as a service (DBaaS) is engineered to help you solve issues with indexing and searching by integrating the Apache Lucene search library.

This article is in the Product Showcase section for our sponsors at CodeProject. These articles are intended to provide you with information on products and services that we consider useful and of value to developers.


Introduction

How complicated can a search be?

We usually don't ask this question in the early stages of development. That is why some projects end up with an architecture that cannot support fast, efficient search.

In those "usual" circumstances, the project enters a refactoring stage. It also triggers constant discussions with product owners about the time and resources needed to rebuild or redesign it for faster, more efficient search.

Cloudant, a distributed database as a service (DBaaS), is designed to help you solve these indexing and search problems by integrating the Apache Lucene search library.

The benefit is that your NoSQL database gets built-in indexing designed specifically for JSON data. In addition, the index calculation and recalculation algorithms are built to run efficiently over chunks of data in a distributed cloud environment. And you don't have to deploy or configure a separate search server: a short index function in a design document is all it takes.

In short

Search indexes, defined in design documents, allow databases to be queried using Lucene Query Parser Syntax. Search indexes are defined by an index function, similar to a map function in MapReduce views. The index function decides what data to index and store in the index. (from https://docs.cloudant.com/search.html)

The quote above is a good description of the Cloudant search engine, but the best way to understand it is through an example. So we are going to build an example IoT solution that uses Cloudant to store sensor data and to query it for monitoring.

Background

What is Apache Lucene?

In 1999 Doug Cutting wrote the first version of Lucene. In 2001 the project joined the Apache Software Foundation's Jakarta family of open-source products, and many related projects have since branched from it. Today Lucene is one of the most widely used open-source full-text search libraries.

Distributed databases can be scaled across multiple racks, data centers and even different cloud providers. Cloudant's Apache Lucene integration is designed so that scaling out keeps these benefits without sacrificing search speed or efficiency.

Using the RESTful API or the design document web UI, you can define search indexes, which Cloudant then builds automatically. As we mentioned in previous posts, whenever a document is created, updated or deleted, all indexes are updated incrementally, including Lucene search indexes.
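For readers who prefer the RESTful route, here is a minimal JavaScript sketch of what such a request could look like. It is our illustration, not the code behind this article. The account, database and design-document names are placeholders and authentication is left out. It also assumes the classic `https://<account>.cloudant.com` URL pattern.

The sketch builds a design document whose `indexes` member maps each index name to an analyzer and the index function's source. It then describes the `PUT` request that would upload it.

```javascript
// Minimal sketch (not the article's code): assemble a design document with
// search indexes and describe the PUT request that would upload it.
// Account, database and design-document names are placeholders; a real request
// also needs credentials, and updating an existing design document needs its _rev.
function buildSearchDesignDoc(ddocName, indexFunctions) {
  const designDoc = { _id: `_design/${ddocName}`, indexes: {} };
  for (const [name, fn] of Object.entries(indexFunctions)) {
    // Search index functions are stored as JavaScript source strings
    designDoc.indexes[name] = { analyzer: "standard", index: fn.toString() };
  }
  return designDoc;
}

function buildPutRequest(account, db, designDoc) {
  return {
    url: `https://${account}.cloudant.com/${encodeURIComponent(db)}/${designDoc._id}`,
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(designDoc),
  };
}
```

Once credentials are added, the returned object can be passed to `fetch(req.url, req)`. When you update an existing design document, include its current `_rev`; otherwise Cloudant rejects the write as a conflict.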

Lucene Query Parser Syntax

The combination of how the index is defined and the query syntax allows you to perform numeric, date, text, Boolean, and geo-spatial queries on any JSON field in your database.

Here are some of the capabilities of search syntax:

  • Ranked searching - different ways to order the results
  • Powerful query types - Wildcard, Regular Expression, Fuzzy, Proximity, Range and more
  • Language-specific analyzers - choosing a language to recognize terms within text
  • Faceted search and filtering
  • Bookmarking - paging through large result sets
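To make these query types concrete, the hypothetical helpers below build Lucene query strings in JavaScript. They are our own and not part of any Cloudant SDK. The field names follow the sensor documents used later in this article. Whether a particular query returns anything depends on how the field is indexed.

```javascript
// Hypothetical helpers (not part of any Cloudant SDK) that assemble Lucene query
// strings for several of the query types listed above. Values are inserted
// verbatim, so any user-supplied text must be escaped before it is passed in.
const wildcard = (field, prefix) => `${field}:${prefix}*`;
const fuzzy = (field, term, maxEdits = 1) => `${field}:${term}~${maxEdits}`;
const proximity = (field, phrase, slop) => `${field}:"${phrase}"~${slop}`;
const regex = (field, pattern) => `${field}:/${pattern}/`;

// Square brackets make a range inclusive, curly braces exclusive
const range = (field, low, high, inclusive = true) =>
  inclusive ? `${field}:[${low} TO ${high}]` : `${field}:{${low} TO ${high}}`;

// Combine clauses with a Boolean operator, parenthesizing each clause
const combine = (op, ...clauses) => clauses.map((c) => `(${c})`).join(` ${op} `);
```

For example, `combine("AND", range("lat", 42, 43), wildcard("city", "Sof"))` produces `(lat:[42 TO 43]) AND (city:Sof*)`.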

Using the code

Sensor Emulator

We will reuse the code structure from previous posts, extending the JSON documents sent to Cloudant DBaaS with more content. Our enhanced sensors send temperature and humidity as before, but they now also include a device ID, geo-location, user messages and error messages. The data flows directly into the Cloudant database.

This is not a common architecture for an IoT solution: architects usually put a middle-tier service in front of the database to handle security and versioning. With Cloudant, devices can write directly to the database, secured by the API keys that Cloudant DBaaS provides. Because the database is not relational, documents with different JSON versions can live in the same database. They can also be processed by the same (or upgraded) indexes.
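To make the direct device-to-database path concrete, here is a hedged sketch of the request a device could send. The account, database, key and password are placeholders. It assumes:

  • the classic `https://<account>.cloudant.com` URL pattern
  • HTTP Basic authentication with a Cloudant API key
  • Node.js 18 or later, for the global `fetch` and `Buffer`

```javascript
// Sketch of a device posting one reading straight to Cloudant.
// Assumptions: https://<account>.cloudant.com URL pattern, an API key with write
// permission on the database, and Node.js 18+ (global fetch and Buffer).
function buildInsertRequest(account, db, apiKey, apiPassword, reading) {
  // Cloudant API keys authenticate with HTTP Basic auth (key as the user name)
  const token = Buffer.from(`${apiKey}:${apiPassword}`).toString("base64");
  return {
    url: `https://${account}.cloudant.com/${encodeURIComponent(db)}`,
    options: {
      method: "POST", // POST to the database URL creates a new document
      headers: {
        Authorization: `Basic ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(reading),
    },
  };
}

async function sendReading(account, db, apiKey, apiPassword, reading) {
  const { url, options } = buildInsertRequest(account, db, apiKey, apiPassword, reading);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Cloudant returned HTTP ${response.status}`);
  }
  return response.json(); // parsed JSON body of the reply
}
```

Cloudant answers a successful insert with a 2xx status and a JSON body containing `ok`, `id` and `rev`, which is what `sendReading` returns.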

With these additional fields, the Sensor class looks like this:

C#
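public class Sensor
{
    // Cloudant document ID
    public string _id;

    // record title, record and original timestamp
    public string recordtitle;
    public string record;
    public string origtime;

    // sensor values (temp = temperature, hmdt = humidity)
    public int displacement;
    public int temp;
    public int hmdt;
    public long modified;
    public string tags;

    // location: city name and geo-coordinates
    public string city;
    public double lat;
    public double lon;

    // fields added in this version: device ID, user and error messages
    public string deviceId;
    public string userMessage;
    public string errorMessage;
}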

The sensor data generation method picks from several predefined combinations of device IDs, cities and geo-locations. It also picks from a set of predefined error messages and user messages.
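A generator along those lines might look like this minimal JavaScript sketch. The device IDs, cities, coordinates, messages and the `recordtitle` value are made-up placeholders rather than the article's data. Only a subset of the `Sensor` fields is filled in. Empty strings stand for "no message"; the message index functions in this article skip them because of their `length !== 0` checks.

```javascript
// Minimal sketch of the emulator's data generation. All device IDs, cities,
// coordinates and messages are made-up placeholders, not the article's data.
const DEVICES = [
  { deviceId: "dev-01", city: "Sofia", lat: 42.6977, lon: 23.3219 },
  { deviceId: "dev-02", city: "Plovdiv", lat: 42.1354, lon: 24.7453 },
  { deviceId: "dev-03", city: "Varna", lat: 43.2141, lon: 27.9147 },
];
// An empty string means "no message"
const USER_MESSAGES = ["", "Window opened", "Heating turned on"];
const ERROR_MESSAGES = ["", "E01 sensor timeout", "E02 low battery"];

// rand() returns a number in [0, 1); it is a parameter so tests can fix it
function pick(list, rand) {
  return list[Math.floor(rand() * list.length)];
}

function generateReading(rand = Math.random, now = new Date()) {
  const device = pick(DEVICES, rand); // device ID, city and coordinates travel together
  return {
    recordtitle: "climate",             // hypothetical record title
    origtime: now.toISOString(),
    modified: now.getTime(),            // assumed: milliseconds since the Unix epoch
    temp: 15 + Math.floor(rand() * 15), // 15..29
    hmdt: 30 + Math.floor(rand() * 50), // 30..79
    ...device,
    userMessage: pick(USER_MESSAGES, rand),
    errorMessage: pick(ERROR_MESSAGES, rand),
  };
}
```

Keeping each device's city and coordinates together means the geo and city queries later in the article see consistent data.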

Building Search Indexes

Multi-parameter indexing

Using the design document UI, we create a new search index covering six of the fields we query:

Image 2

Image 3

This search index combines six indexed fields from each record. This lets us run complex queries that filter records on any combination of those fields. For example:

  • we can request user messages from a city;
  • we can query user messages within a geo-location range;
  • we can request sensor data for a time frame from a city;
  • and much more.

The index function below indexes the key properties under named fields:

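JavaScript
function (doc) {
  // Guard each field: indexing an undefined value can keep the whole document out of the index
  if (doc.deviceId) {
    index("deviceId", doc.deviceId);
  }
  if (doc.origtime) {
    index("time", doc.origtime, { "store" : true });
  }
  // typeof checks (instead of truthiness) also accept a coordinate of 0
  if (typeof doc.lat === "number" && typeof doc.lon === "number") {
    index("lat", doc.lat, { "store" : true });
    index("lon", doc.lon, { "store" : true });
    if (doc.city) {
      index("city", doc.city, { "store" : true });
    }
  }
  if (doc.userMessage && doc.userMessage.length !== 0) {
    index("userMessage", doc.userMessage, { "store" : true });
  }
}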

Note that some fields are indexed with options:

  • "store" set to true tells the search engine to keep the field's value so it can be returned with the search results
  • "facet" set to true tells the search engine to count how often each distinct value occurs (the facet indexes below use this option)

Image 4

You can test the new search index from the UI:

Image 5

Facet Searches

To simplify REST access, we create three separate search indexes for faceting. They cover city, sensor record title and error message. Error messages come from a small, fixed set of error codes, which makes their counts meaningful.

The resulting design document looks like this:

Image 6

Each search index has its own index function:

facetErrors

JavaScript
function (doc) {
  if (doc.errorMessage && doc.errorMessage.length !== 0) {
    index("errors", doc.errorMessage, { "facet":true });
  }
}

facetCity

JavaScript
function (doc) {
  if (doc.city) {
    index("city", doc.city, { "store" : true, "facet" : true });
  }
}

facetRecords

JavaScript
function (doc) {
  if (doc.recordtitle) {
    index("record", doc.recordtitle, { "store" : true, "facet" : true });
  }
}
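Index functions run inside Cloudant, where `index()` is supplied by the search service, so they are awkward to debug. One option is to exercise the JavaScript logic locally with a stub `index()` that records its calls, as in this sketch. It checks only which fields would be indexed and with which options; it does not reproduce Lucene analysis. `runIndexFunction` is our own hypothetical helper.

```javascript
// Local test harness for index functions (a hypothetical helper, not a Cloudant
// tool). Cloudant supplies index() at run time; here a stub records each call.
function runIndexFunction(indexFnSource, doc) {
  const calls = [];
  const index = (name, value, options = {}) => {
    calls.push({ name, value, options });
  };
  // Compile the source so that the name "index" inside it refers to our stub
  const indexFn = new Function("index", `return (${indexFnSource});`)(index);
  indexFn(doc);
  return calls;
}

// The facetErrors function, as the source string a design document holds
const facetErrors = `function (doc) {
  if (doc.errorMessage && doc.errorMessage.length !== 0) {
    index("errors", doc.errorMessage, { "facet": true });
  }
}`;
```

Calling `runIndexFunction(facetErrors, { errorMessage: "" })` returns an empty array, confirming that documents without an error message are skipped.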

Azure Web App visualization

For this demo, we build the simplest possible Azure Web App.

Image 7

That is why we start from an empty template.

Image 8

Next, we make a few changes to web.config: we add application settings for the Cloudant credentials and enable default documents.

XML
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <appSettings>
    <add key="username" value="[username]" />
    <add key="password" value="[password]" />
  </appSettings>
...
  <system.webServer>
    <defaultDocument enabled="true" />
...
  </system.webServer>
</configuration>

Then we install one NuGet package, which is needed for the REST API calls to Cloudant:

Image 9

Next, we add a static helper class for querying the Cloudant search indexes (see the code on GitHub). The key lines in its three request methods are the ones that specify the filters.

GET Facet Search

C#
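request.AddQueryParameter("q", "*:*");                         // match every indexed document
request.AddQueryParameter("counts", "[\"" + counter + "\"]");  // JSON array with one facet field name
request.AddQueryParameter("limit", "0");                       // return counts only, no rows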

Three parameters are sent in this type of request:

  • the main query, "q" - here *:* because we want to count all indexed records
  • the "counts" parameter, a JSON array of facet field names - here we request "errors", "city" and "record" separately, matching the field names in the facet index functions
  • the "limit" parameter, which caps the number of returned rows - here we need only the counts, so it is zero (no rows returned)
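In JavaScript, the same three parameters, plus the step of reading the result, might look like the sketch below. It assumes the facet response carries a `counts` object keyed by field name, for example `{"total_rows": 120, "rows": [], "counts": {"city": {"Sofia": 70, "Varna": 50}}}`. The numbers there are invented for illustration. `facetParams` and `topCounts` are hypothetical helper names.

```javascript
// Sketch: facet request parameters and a reader for the "counts" in the response.
// Assumed response shape: { total_rows, rows: [], counts: { <field>: { <value>: n } } }
function facetParams(field) {
  return {
    q: "*:*",                        // count over all indexed documents
    counts: JSON.stringify([field]), // e.g. '["city"]', same as the C# string
    limit: "0",                      // counts only, no rows
  };
}

// Turn { value: count } pairs for one field into a list sorted by count, descending
function topCounts(response, field) {
  const counts = (response.counts && response.counts[field]) || {};
  return Object.entries(counts)
    .map(([value, count]) => ({ value, count }))
    .sort((a, b) => b.count - a.count);
}
```

Sorting by count puts the busiest cities or most frequent error codes first, which is usually what a monitoring view wants to show.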

GET Geo-location Search

C#
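// Invariant culture (System.Globalization) keeps "." as the decimal separator on any locale
var lonText = lon.ToString(CultureInfo.InvariantCulture);
var latText = lat.ToString(CultureInfo.InvariantCulture);
request.AddQueryParameter("q", "*:*");
request.AddQueryParameter("sort", "\"<distance,lon,lat," + lonText + "," + latText + ",km>\"");
request.AddQueryParameter("limit", "5");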

Here I request the top five records ordered by distance from a given latitude and longitude.
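The same parameters can be built in JavaScript, where number-to-string conversion always uses "." as the decimal separator. The sketch also includes a haversine great-circle distance, a common approximation that is handy for sanity-checking the order of returned rows. We do not assume Cloudant computes distance with exactly this formula. The coordinates and helper names are hypothetical.

```javascript
// Sketch: distance-sort parameters (argument order lon, lat, as in the sort string)
function geoSortParams(lon, lat, limit = 5, unit = "km") {
  return {
    q: "*:*",
    // Cloudant expects the sort value as a JSON string:
    // "<distance,lonField,latField,lon,lat,unit>"
    sort: JSON.stringify(`<distance,lon,lat,${lon},${lat},${unit}>`),
    limit: String(limit),
  };
}

// Haversine great-circle distance in km, for client-side sanity checks only
function haversineKm(lat1, lon1, lat2, lon2) {
  const R = 6371; // mean Earth radius in km
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}
```

Using `JSON.stringify` for the sort value produces the surrounding double quotes that the C# code adds by hand.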

GET Text Search

C#
request.AddQueryParameter("q", "userMessage:" + text + "*");
request.AddQueryParameter("limit", "10");

The last one is a simple prefix search: a wildcard character is appended to the requested text, and the results are limited to 10 records.
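Plain concatenation lets characters such as `:` or `(` in the user's text change the meaning of the query. With several words, only the first one is restricted to `userMessage`. A slightly more defensive version is sketched below in JavaScript; it is our own suggestion rather than the article's code. It escapes Lucene's special characters with a backslash and requires every word, appending the wildcard to the last one.

```javascript
// Sketch: a safer prefix query than plain "userMessage:" + text + "*".
// Characters with special meaning in Lucene query syntax are backslash-escaped.
const LUCENE_SPECIAL = /[+\-!(){}[\]^"~*?:\\\/&|]/g;

function escapeLucene(term) {
  return term.replace(LUCENE_SPECIAL, "\\$&"); // "$&" re-inserts the matched character
}

// Every word must match inside the field; the wildcard goes on the last word only,
// which mirrors the original behaviour for single-word input.
function buildTextQuery(text, field = "userMessage") {
  const words = text.trim().split(/\s+/).filter(Boolean).map(escapeLucene);
  if (words.length === 0) {
    throw new Error("empty search text");
  }
  words[words.length - 1] += "*";
  return `${field}:(${words.join(" AND ")})`;
}
```

For instance, `buildTextQuery("low batt")` yields `userMessage:(low AND batt*)`. Requiring every word is a design choice; join with `OR` instead if any word should be enough.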

Index.html

A simple HTML page named Index.html (so that it opens as the default document) loads jQuery and the Google Maps JavaScript API. The resulting view shows the facet counts and pins on the map:
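Turning search results into map pins is mostly a data-shaping step. This hedged sketch assumes the rows come from an index that stores `lat`, `lon` and `city` (`"store": true`); Cloudant returns these under each row's `fields` object. The sketch produces option objects in the shape Google Maps marker options expect: `position` as `{lat, lng}`, plus a `title`. `rowsToMarkers` is a hypothetical helper name.

```javascript
// Sketch: convert Cloudant search rows into Google Maps marker options.
// Assumes stored fields lat, lon and (optionally) city under row.fields.
function rowsToMarkers(rows) {
  return rows
    .filter(
      (row) =>
        row.fields &&
        typeof row.fields.lat === "number" &&
        typeof row.fields.lon === "number"
    )
    .map((row) => ({
      position: { lat: row.fields.lat, lng: row.fields.lon }, // LatLngLiteral uses "lng"
      title: row.fields.city || row.id, // fall back to the document ID
    }));
}
```

In the page, each option object could then be passed to `new google.maps.Marker({ ...options, map })`.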

Image 10

The text search shows a different type of pin, with a popup showing the user message text.

Image 11

Points of Interest

This demo did not cover every way to use Cloudant Search, but it shows the core capabilities: how to combine indexes and how to query them. We built a simple Azure cloud-based web application that uses Cloudant DBaaS for storage and data processing. We also used a different architecture for the IoT solution, sending data directly from the devices to the database. The result is a simple implementation that is easy to use and maintain.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Architect, Strypes
Bulgaria
Mihail Mateev is a Technical Consultant, Community enthusiast, PASS RM for CEE and chapter lead, Microsoft Azure MVP
He works as Solutions Architect, Technical PM and Senior Technical Evangelist at Strypes
Mihail Mateev has experience as a Senior Technical Evangelist, Team Lead at Infragistics Inc. He worked as a Software developer and team lead on WPF and Silverlight Line of Business production lines of the company.
Mihail has worked in various areas related to Microsoft technologies: Silverlight, WPF, Windows Phone 7, Visual Studio LightSwitch, WCF RIA Services, ASP.Net MVC, Windows Metro Applications, MS SQL Server and Windows Azure. He has also written many jQuery-related blog posts.
Over the past ten years, Mihail has written articles for the Bulgarian Computer World magazine and blogs about .Net technologies. He is a contributor and technical editor of publications for PACKT Publishing and Wiley. Mihail has given presentations to .Net and Silverlight user groups in Bulgaria. He has experience with GIS systems on the .Net Framework and worked for more than five years at ESRI Bulgaria as a software developer and trainer. For several years, Mihail lectured on Geographic Information Systems at Sofia University “St. Kliment Ohridski”, Faculty of Mathematics and Informatics. Mihail also lectures on Computer Systems at the University of Architecture, Civil Engineering and Geodesy in Sofia, Computer Aided Engineering Department. Mihail holds master's degrees in Structural Engineering and Applied Mathematics and Informatics.
