Cutting Edge - Motion, Texture And 3D Forms As Interactive Services (Part I)

Asame Imoni Obiomah

Sep 8, 2012

MIT

This article is the first toddler step in the development of a framework for the delivery of motion, touch and 3D forms as interactive services locally and over networks.

Introduction

This is the first part of a series of articles introducing concepts and software for a framework that delivers motion and touch as services, both locally and over the internet, all with the aid of your regular webcam. Yes, you heard (or read) right: motion and touch as services. Actually, there's a bit more: with this tech, some time in the near future, you will be able to develop apps that replicate 3D forms across a network!
All of the software will be open source with very permissive licensing, except for a couple of core libraries, which are patent-pending freeware (the intellectual property rights are mine). The core libraries currently host code for image analysis, but will also host code to simulate texture and replicate 3D forms.

The computer can almost become a living thing.

What is it, Joe?
“Sir, it the Kinect, Wii and Sony Move... But on steroids! It will rockses your World and it are a wonderful concept! I speechles, it have cutted my tongue into piece!”

Photo credit: Smoobs, http://www.flickr.com/photos/smoo/
Source: http://www.flickr.com/photos/43541636@N00/237975844

Here goes! A few words about what will be possible:

The framework will allow you to invent your own gestures, effects and commands to control devices. It will also be flexible enough to adapt to existing games with no modifications to your code.

With a simple everyday webcam and a computing device (tablet PC etc.), the new framework will open up all sorts of exciting prospects to extend human-computer interaction into an ecosystem of enhanced sensory exchanges. For example, imagine software that gives you the freedom to create crazy stuff, like pointing at one computer to transfer data to the next. Personally, I've always dreamt of boxing or taekwondo tournaments across the web (any good ol' bashing). Perhaps we could achieve some semblance of that, limited mainly by connection and round-trip speeds; sadly, though, it would require a top-end webcam that can take crisp shots of fast-moving objects.

What about the ability to feel the fabric of your brand new sofa while sitting at your desk, before ordering it over the web? How about an enhanced movie experience where you can feel the creepy crawlies in a horror film while sitting in your favourite chair in your sitting room? Dream of the possibilities of having your website's contact and support pages, your forum, and your Flash, Silverlight and HTML5 bits interacting in novel ways with gestures... Even actuators, motors, robots and other mechanical devices!

Possibilities! Possibilities!! Possibilities!!!

Every aspect of the technology's development will be done with your participation: there are things I can show you, and things you can teach me as well. As code is released here, the real-life framework and servers are being built. Watch out for roll-outs of both incremental code and new explanatory articles every fortnight or so. The current article deals with the most basic concepts of the framework.

Unnecessary History

I've always been intrigued by real-time object recognition, and even started an open source project on www.Codeplex.com in 2006. Since 2010, I've been seriously engaged in tackling real-time object recognition, and started a journey of grit, sweat, failures, the odd success and gruelling restarts on www.KC36.com. The lessons learnt were many.

A Hint At The Methodology and Supported Platforms

The first and most important lesson was that the Cartesian coordinate system is a thoroughly unnatural and confusing choice for image analysis. Firstly, on a grid of square pixels, its representation of angles and lengths is inaccurate. Secondly, nature prefers circles and other conics. As an example, the pupils of our eyes are round, while our field of vision is conical. (I digress. Indeed, throughout observable nature, straight lines turn out to be local approximations of curves; straight lines are merely one of our convenient mathematical concepts.)
Ahem! How do you represent conics with a pile of squares? Not happening, sir!
We solve the problem by breaking the image into polygons. To do this, we find the edges (or depths) and approximate curves with polylines, so that each object in the image becomes a polygon. This simplifies the computations and lets them run faster.
More details about this area will be provided in later articles in this series.
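The patent-pending code that turns edges into polylines is not published, so the snippet below is not the author's method. It is a minimal C++ sketch of one standard, swapped-in technique for approximating a curve with a polyline, the Ramer-Douglas-Peucker algorithm. Take the point that strays furthest from the straight chord between two endpoints. If it strays more than a tolerance `epsilon`, keep it and recurse on both halves; otherwise replace the whole run with the chord. `Point` and `simplifyCurve` are hypothetical names chosen for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical point type for illustration.
struct Point { double x; double y; };

// Perpendicular distance from p to the infinite line through a and b.
static double perpendicularDistance(const Point& p, const Point& a, const Point& b) {
    double dx = b.x - a.x;
    double dy = b.y - a.y;
    double len = std::hypot(dx, dy);
    if (len == 0.0) {
        return std::hypot(p.x - a.x, p.y - a.y);  // a and b coincide
    }
    // |cross product| / |ab| is the distance from p to the line.
    return std::fabs(dy * p.x - dx * p.y + b.x * a.y - b.y * a.x) / len;
}

// Emits the start point of every chord kept between pts[first] and pts[last].
static void rdp(const std::vector<Point>& pts, std::size_t first, std::size_t last,
                double epsilon, std::vector<Point>& out) {
    double maxDist = 0.0;
    std::size_t index = first;
    for (std::size_t i = first + 1; i < last; ++i) {
        double d = perpendicularDistance(pts[i], pts[first], pts[last]);
        if (d > maxDist) { maxDist = d; index = i; }
    }
    if (maxDist > epsilon) {
        rdp(pts, first, index, epsilon, out);  // curve up to the farthest point
        rdp(pts, index, last, epsilon, out);   // curve from the farthest point on
    } else {
        out.push_back(pts[first]);             // the chord replaces this run
    }
}

// Approximates a curve (an ordered list of edge pixels) with a polyline.
std::vector<Point> simplifyCurve(const std::vector<Point>& pts, double epsilon) {
    if (pts.size() < 3) return pts;
    std::vector<Point> out;
    rdp(pts, 0, pts.size() - 1, epsilon, out);
    out.push_back(pts.back());  // the final endpoint is always kept
    return out;
}
```

A larger `epsilon` produces fewer, longer segments, which trades fidelity for speed. How KC36.Native chooses its own tolerances is not documented here.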

The platforms that will eventually be supported are:

Windows (C++, C#)
Android (via C++ and C#/Mono)
Linux (via C++ and C#/Mono)
iOS (via C++)
Mac (via C++)

At this time, only Windows and the Pbgra32 image format are supported. However, the library is written in ANSI C++ and has no dependencies, so as soon as other image formats are supported, there should be few or no issues compiling it for other platforms.
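Pbgra32 is currently the only supported format, so it helps to see how such a buffer is laid out. This C++ sketch assumes 4 bytes per pixel stored in blue, green, red, alpha order, with rows `stride` bytes apart. It reduces each pixel to a single brightness value using the ITU-R BT.601 luma weights. `brightnessAt` and `toBrightness` are hypothetical helpers, not part of KC36.Native. Pbgra32 colours are premultiplied by alpha, so the values are unaffected only for fully opaque pixels, which is the usual case for camera frames.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns an integer brightness (0-255) for the pixel at (x, y) in a Pbgra32 buffer.
// Assumed layout: 4 bytes per pixel in memory order Blue, Green, Red, Alpha.
// Weights are the BT.601 luma coefficients scaled by 1000 (they sum to 1000).
int brightnessAt(const std::uint8_t* pixels, int stride, int x, int y) {
    const std::uint8_t* p = pixels + static_cast<std::size_t>(y) * stride
                                   + static_cast<std::size_t>(x) * 4;
    int b = p[0], g = p[1], r = p[2];  // p[3] is alpha, ignored here
    return (r * 299 + g * 587 + b * 114) / 1000;
}

// Converts a whole Pbgra32 frame into a row-major width*height brightness map.
std::vector<int> toBrightness(const std::uint8_t* pixels, int width, int height, int stride) {
    std::vector<int> out(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            out[static_cast<std::size_t>(y) * width + x] = brightnessAt(pixels, stride, x, y);
    return out;
}
```

For a 32-bit format, every row is already a multiple of 4 bytes, so `stride` is normally `width * 4`. Other formats may pad rows, which is why the sketch takes the stride as a separate parameter.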

Support for browser interactions with HTML5, JavaScript and Webinos (http://webinos.org/) will come through layers built over the C++ and C# APIs. A picture speaks a thousand words, so here's a pictorial overview:

Intro To The Edge Detection Code

The edge detection code does a number of things, the major ones being:

Edge detection
Edge categorising
Edge indexing and sorting
Polyline segment angle approximations
Curve approximation with polylines
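The following C++ sketch illustrates the first step only. It uses threshold-based edge detection: a pixel is marked as an edge when its brightness differs from its right or lower neighbour by at least a tolerance. This is an assumption about what a "minimum brightness difference tolerance" (the trkBarMinBrightnessDiffTolerance slider in the sample code) might control, not a description of how KC36.Native works. `detectEdges` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Marks pixel (x, y) as an edge when its brightness differs from its right or
// lower neighbour by at least `tolerance`. `brightness` is row-major, width*height.
std::vector<bool> detectEdges(const std::vector<int>& brightness,
                              int width, int height, int tolerance) {
    std::vector<bool> edges(brightness.size(), false);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            std::size_t i = static_cast<std::size_t>(y) * width + x;
            bool edge = false;
            if (x + 1 < width) {   // compare with the right-hand neighbour
                edge = edge || std::abs(brightness[i] - brightness[i + 1]) >= tolerance;
            }
            if (y + 1 < height) {  // compare with the neighbour below
                edge = edge || std::abs(brightness[i] - brightness[i + width]) >= tolerance;
            }
            edges[i] = edge;
        }
    }
    return edges;
}
```

Raising the tolerance keeps only strong contrasts, which yields fewer edge pixels and less work for the later steps. Lowering it picks up more texture and noise.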

Firstly, it must be stressed that the app and libraries shown below have no GPU acceleration and are unoptimised (except for the SSE switch in Visual C++), yet execution times are very competitive. That holds even on my 7-year-old, barely crawling granny 'puter, which runs an advanced OS that its primeval BIOS utterly rejects as if it were the spawn of Satan. Here are the specs of my good ol' workhorse:

OS: Windows Server 2008 Standard Service Pack 2 (32-Bit)
Make: Asus Pundit P4S8L
Processor: Intel Celeron 2.40GHz
RAM: 1.5 GB

The pics below show the results of finding, categorising, indexing and sorting the edges (plus all the other goodies listed at the start of this section) of a 615 x 407 pic.

14 milliseconds, not bad, eh?!
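To put the 14 ms in perspective, here is the simple arithmetic for the 615 x 407 test image. It assumes the whole 14 ms is spent on that single frame. It says nothing about how the time scales with image size or content, or about the cost of capturing frames from a webcam:

$$
615 \times 407 = 250\,305 \text{ pixels},\qquad
\frac{14\ \text{ms}}{250\,305\ \text{pixels}} \approx 55.9\ \text{ns per pixel},\qquad
\frac{1000\ \text{ms}}{14\ \text{ms}} \approx 71\ \text{frames per second}
$$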

The code and library for the app below are supplied with this article, so you can run your own tests. Your mileage may vary, and you will find that the busyness of your test pics also affects the execution time.
NOTE: We do not use a webcam yet; for now, one is not needed to show what is possible. Webcams will come in subsequent articles.

Fig. 1. Edge categorisation

Fig. 2. Curve approximation

Fig. 3. Segment angle approximation
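Fig. 3 shows segment angles being approximated, but the article does not say how this is done. One common way to coarsen a direction is to quantise it into eight 45° octants, sketched below in C++ with a hypothetical `segmentOctant` function. The octant field in the sample code's segment layout is described differently, in terms of the darkest pixel in a 3x3 neighbourhood. Treat this purely as an illustration of angle quantisation.

```cpp
#include <cmath>

// Quantises the direction of the segment (x0, y0) -> (x1, y1) into one of
// eight 45-degree octants: 0 = 0 degrees, 2 = 90, 4 = 180, 6 = 270.
// Angles follow atan2's maths convention; in image coordinates (y pointing
// down) the visual sense of rotation is mirrored.
int segmentOctant(int x0, int y0, int x1, int y1) {
    const double pi = 3.14159265358979323846;
    double degrees = std::atan2(static_cast<double>(y1 - y0),
                                static_cast<double>(x1 - x0)) * 180.0 / pi;
    if (degrees < 0.0) degrees += 360.0;  // normalise to [0, 360)
    // Shift by half a bin so each octant is centred on a multiple of 45 degrees.
    return static_cast<int>(std::floor((degrees + 22.5) / 45.0)) % 8;
}
```

Coarser bins make segments with similar directions compare as equal. This kind of simplification helps when grouping segments or searching them for patterns.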

For some reason, it's slow on 64-bit machines, though. I will look into it and provide a 64-bit version at a later date.
NOTE: To avoid a System.IO.FileNotFoundException, 64-bit users will need to install the Microsoft Visual C++ 2010 SP1 Redistributable Package (x86): http://www.microsoft.com/en-gb/download/details.aspx?id=8328

Code Organisation

The software for this article is made up of two projects, KC36.Client and KC36.NET. KC36.Client is the GUI, while KC36.NET wraps the native library, KC36.Native. The native library contains the core analysis functions.

What This Code Release Does

In short, it organises an image into polylines, which can then be searched for patterns. Fairly simple and straightforward. The code block below shows how to use the current files; it is quite similar to the code in the Execute() method of the GUI file, ClientMainUI.cs.
The code in the native library is patent pending, so I am a bit constrained as far as details go. One day it might become open source, but for now, fear of the anti-innovation Apple/Samsung battle is the beginning of wisdom.

Using The Code

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.IO;
using KC36.Interop;
using System.Diagnostics;
using System.Drawing.Imaging;

namespace KC36.Client
{
    internal class Class1
    {
        internal unsafe void TestIt(Image image)
        {
            // Bytes (not bits) per pixel: Pbgra32 stores 4 bytes per pixel
            // (blue, green, red, alpha). This should come in as a parameter
            // or property from the webcam driver.
            int bytesPerPixel = 4;

            // Sets the brightness datum for edge selection; experiment
            // with this value to see its effect.
            // It is controlled by the slider in the GUI.
            int trkBarMinBrightnessDiffTolerance = 177;

            // Sets the tolerance angle for the polyline approximation;
            // experiment with numbers from 1 to 8 and see the result.
            int cmbEdgeAngleTolerance = 1;

            // The image to be analysed.
            Bitmap bitmap = (Bitmap)image;

            // Get the byte array data of the loaded image.
            byte[] pixelArray = Tools.GetBytes(bitmap);

            // Set the properties that will be used to communicate with
            // the native library.
            int pixelArrayCount = pixelArray.Length;
            int stride = bitmap.Width * bytesPerPixel;
            int width = bitmap.Width;
            int height = bitmap.Height;

            // Initialise the native library (skipping this step will
            // result in unpredictable behaviour).
            // Initialisation needs to be carried out only once, or if
            // a resize of the GUI occurs.
            Wrapper.InitialiseCore(width, height, stride,
                        trkBarMinBrightnessDiffTolerance, cmbEdgeAngleTolerance);

            //---------------------------------------------------
            // Call the native library (KC36.Native) to analyse the pixel data.
            Wrapper.GetFeaturedList(pixelArray);

            // Use the data returned by the native library (KC36.Native)
            // to populate local variables.
            // Wrapper.DirectionIndices is key to understanding how the
            // data below fits together.
            // Wrapper.DirectionIndices holds the indices into Wrapper.Directions
            // and Wrapper.EdgeMetrics; for example,
            // Wrapper.EdgeMetrics[0] holds the data for
            // Wrapper.DirectionIndices[Wrapper.DirectionIndices[0]].

            // polyline segment angles.
            int* polylineAngles = Wrapper.Directions;
 
            // indices of Wrapper.Directions and Wrapper.EdgeMetrics.
            int* indices = Wrapper.DirectionIndices;

            // edge quality indicator (used to discriminate between
            // and group polyline segments).
            int* edgeMetrics = Wrapper.EdgeMetrics; 

            // total count of Wrapper.DirectionIndices.
            int featurePointCount = Wrapper.FeaturePointCount;
 
            // Get the polyline segments (recall that polyline segments
            // represent curve approximations).
            // This is a large array with every 10 elements holding the
            // definition of a segment. The order is as follows:
            // 0 index, 1 start x, 2 start y, 3 end x, 4 end y, 5 start depth,
            // 6 end depth, 7 pixel count, 8 octant (darkest pixel in 3x3
            // neighbourhood that holds the normal to orientation of segment),
            // 9 angle

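            // Total number of segment properties (10 per segment), so the
            // number of segments is segmentPropertyCount / 10.
            int segmentPropertyCount = Wrapper.SegmentPropertyCount;

            // Initialise the managed segment array.
            int[] segments = new int[segmentPropertyCount];

            // Populate the managed segment array from the native buffer.
            int* segmentsNative = Wrapper.SegmentProperties;
            for (int i = 0; i < segmentPropertyCount; i++)
            {
                segments[i] = segmentsNative[i];
            }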

            /*
             * Code for your functions goes here.
             * See ClientMainUI.cs for inspiration.
             * ...And they all lived happily ever after! :)
             */
        }
    }
}
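The sample above exposes the segment data as a flat integer array (Wrapper.SegmentProperties), with every 10 elements describing one segment. If you consume that buffer on the native side, or port the client logic to C++, a small unpacking step makes the fields easier to work with. The `Segment` struct and `unpackSegments` function below are hypothetical, with field names taken from the comments in the sample. They are not part of KC36.Native.

```cpp
#include <cstddef>
#include <vector>

// One polyline segment, in the field order given in the sample's comments.
struct Segment {
    int index;
    int startX, startY;
    int endX, endY;
    int startDepth, endDepth;
    int pixelCount;
    int octant;
    int angle;
};

// Splits a flat array (10 ints per segment) into Segment records.
// A trailing partial record (fewer than 10 ints) is ignored.
std::vector<Segment> unpackSegments(const int* data, std::size_t count) {
    const std::size_t fieldsPerSegment = 10;
    std::vector<Segment> segments;
    segments.reserve(count / fieldsPerSegment);
    for (std::size_t i = 0; i + fieldsPerSegment <= count; i += fieldsPerSegment) {
        const int* f = data + i;
        segments.push_back(Segment{f[0], f[1], f[2], f[3], f[4],
                                   f[5], f[6], f[7], f[8], f[9]});
    }
    return segments;
}
```

Named fields make the later pattern-search code easier to read than raw offsets such as `segments[i * 10 + 9]`. The copy is cheap compared with the image analysis itself.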

Experiment with various settings, get used to them and see what you get. These controls are the beginning of an exciting and epochal journey to rewrite the way the Internet and computer interactions are defined.

To use the downloaded files, unzip them to a suitable directory. You will find another zip file called KC36.Native.zip and a license file called “License (for KC36.Native).txt”. Read the license, then unzip the code into the same directory. The license is quite restrictive for the intro release, but that's only because some of the code will be obsolete in the next week or so; subsequent licenses will be much freer. By the way, both KC36.Client and KC36.NET are MIT licensed and as free as air.

That’s all for now, folks! The next article will introduce the object recognition code; it'll link to the intro above and fill in any technical gaps. Expect it a week from now.

Points of Interest

The code is slow on 64-bit machines; that's something that's being worked on. Also, to avoid a System.IO.FileNotFoundException, 64-bit users will need to install the Microsoft Visual C++ 2010 SP1 Redistributable Package (x86): http://www.microsoft.com/en-gb/download/details.aspx?id=8328