Click here to Skip to main content
15,881,248 members
Articles / Programming Languages / SQL

Part 2 – Bits for Event Patterns

Rate me:
Please Sign up or sign in to vote.
5.00/5 (1 vote)
5 Aug 2009CPOL4 min read 11.1K   4   1
Bits for Event Patterns

I promised last time (http://blogs.msdn.com/microsoftbob/archive/2009/06/03/the-power-of-bits-for-historical-event-analysis.aspx) that I would follow up with some code to provide a proof of concept for using bit masks for date patterns. I have good news! I have it working in my simulation application and have prepared a little demo for how this works with a Windows form.

I found it wasn’t too difficult to use .NET BitArrays and convert to varbinary and also was able to do this using SQL-to-LINQ. The tricky part was writing a table-valued function that could actually interpret the bit mask and return back all the associated dates. This would probably be better done via SQLCLR with a .NET function, but I’m out of practice with doing .NET CLR functions so will put that off for another iteration.

So, first of all, here’s a screen shot of the testing form.

image

The goal of the form was just to prototype the concept by allowing the specification of a day of week pattern to repeat for a particular period. In the above example, I wanted to generate a bit pattern that would contain 14 bits (actually 16 bits -8 bits for each week - as I explain later) representing whether an event occurred on a given day offset from Monday, June 22 and then store the data in a database.

Here’s the table definition I used to store the data:

SQL
CREATE TABLE [dbo].[TestBitArray]( 
   [PeriodStartDate] [date] NOT NULL, 
   [PeriodLength] [smallint] NOT NULL, 
   [EventName] [varchar](50) NOT NULL, 
   [SelectedWeekDayMask] [tinyint] NOT NULL, 
   [DateRangeBitArray] [varbinary](50) NOT NULL, 
 CONSTRAINT [PK_TestBitArray] PRIMARY KEY CLUSTERED 
( 
   [PeriodStartDate] ASC, 
   [PeriodLength] ASC, 
   [EventName] ASC 
))

Here’s what ends up in the database:

PeriodStartDatePeriodLengthEventNameSelectedWeekDayMaskDateRangeBitArray
6/22/200914TestEvent1740x4A4A

Please excuse me for using a compound primary key, this is just a quick and dirty example! I wanted to test with different combinations of period start dates, lengths, and event names.

Just to contrast different ways to deal with bits, I decided to store my selection bit mask as a simple tinyint rather than as a byte. I use tinyint since the max number of bits that can be set for a week is 7 for each day, which works out to 127 maximum value using low order bits.

Without getting too bogged down (full code is in the attachment), here’s what the .NET code fragment looks like to calculate the bit mask and display the items in the listbox.

JavaScript
BitArray ba = new BitArray((((periodDays-1) / 7) * 8) + 8 , false);
this.listBoxResults.Items.Clear(); 

DateTime currentDate = periodStartDate;

int j = 0;
for (int i = 0; i < periodDays; i++)
{
   if ((i % 7) == 0)
   {
       j++; // Don't set trailing bit, so we store 7 days per byte - simplifies comparisons
   }
   currentDate = periodStartDate.Add(new TimeSpan(i, 0, 0, 0));
   int currentDayOfWeekIndex = (int)currentDate.DayOfWeek;
   ba.Set(i+j, (checkForDayOfWeek(currentDate.DayOfWeek)));
   this.listBoxResults.Items.Add(currentDate.ToShortDateString() + " : " + ba[i+j].ToString());
}
_ba = ba;

Note, the extra bit not being used so that each week is allocated to 1 byte, so 2 weeks of data is actually 14 bytes. I decided to go ahead and “waste” a bit so that each week is in a single byte. This makes it easier to look at the data byte-by-byte, which is the only way to natively view it from T-SQL and see what the patterns are for each week. I could have just strung them all together, but didn’t think it was worth the space savings and it’s not too hard to work around in the functions on both the .NET and SQL side.

Storing to the database is a bit tricky and requires some helper functions that are included in the attachment.

JavaScript
TestBitArrayDataDataContext dc = new TestBitArrayDataDataContext();
dc.UpsertTestBitArray(
   this.dateTimePickerStartDate.Value,
   (short)this.numericUpDownPeriodLength.Value,
   BitArrayConversions.ConvertBitMaskToByte(selectedDaysMask),
  BitArrayConversions.BitArrayToByteArray(_ba),this.textBoxEvent.Text);
dc.SubmitChanges();

It is simple to read integer-type (tinyint, smallint, integer) data type into a bit array as well as to load from an array of varbinary.

For the selected days bit mask:

JavaScript
BitArray selectedDays = new BitArray(System.BitConverter.GetBytes(tb.SelectedWeekDayMask)); 

For the computed period detailed history bitmap:

JavaScript
_ba = new BitArray(tb.DateRangeBitArray.ToArray());

Now, the fun part is the SQL function to return the data, again for brevity, I’ve not pasted the supporting functions but are included in the TestBitArray.sql attachment in the zip.

SQL
CREATE FUNCTION [dbo].[udf_GetDatesForBitMask] 
( 
    -- Add the parameters for the function here 
     @PeriodStartDate date, 
     @DateBitMask varbinary(8000) 
) 
RETURNS 
 @PeriodDates TABLE 
( 
     EventDate DateTime 
) 
AS 
BEGIN 
     -- Fill the table variable with the rows for your result set 
     DECLARE @ByteLength smallint = LEN(@DateBitMask) 
     DECLARE @ByteString varchar(8000) = master.dbo.fn_varbintohexstr(@DateBitMask) 
     DECLARE @BytePointer smallint = 0 
     DECLARE @EventDate datetime 
     WHILE @BytePointer < @ByteLength 
     BEGIN 
           DECLARE @ByteValue tinyint = dbo.fn_ConvertHexUnsignedCharToTinyInt_
                                        (SUBSTRING(@ByteString,3+@BytePointer*2,2)) 
           DECLARE @BitPower smallint = 7 
           WHILE @BitPower > 0 -- First bit not used 
           BEGIN 
                 DECLARE @BitTestValue smallint = POWER(2.00,@BitPower) 
                 IF @ByteValue >= @BitTestValue 
                 BEGIN 
                       SET @EventDate = DATEADD(DD,@BitPower - 1 + @BytePointer * 7 ,@PeriodStartDate) 
                       SET @ByteValue = @ByteValue - @BitTestValue 
                       INSERT @PeriodDates (EventDate) Values (@EventDate) 
                 END 
                 SET @BitPower = @BitPower -1 
           END 
           SET @BytePointer = @BytePointer + 1 
     END 
     RETURN 
END

The ConvertHexUnsignedCharToTinyInt function is based on a similar function for converting to varbinary from string here.

So, what do we get when we actually query the table and join with the function?

SQL
/****** Script for SelectTopNRows command from SSMS ******/
SELECT [EventName]
     ,[PeriodStartDate]
     ,[PeriodLength]
     ,[Eventdate]
     ,master.dbo.fn_varbintohexstr(DateRangeBitArray) as BitArray
 FROM [dbo].[TestBitArray]
 CROSS APPLY dbo.udf_GetDatesForBitMask(PeriodStartDate,DateRangeBitArray)
 Where EventName = 'TestEvent1'
 Order by PeriodStartDate, Eventdate
EventNamePeriodStartDatePeriodLengthEventdateBitArray
TestEvent16/22/2009146/22/20090x4a4a
TestEvent16/22/2009146/24/20090x4a4a
TestEvent16/22/2009146/27/20090x4a4a
TestEvent16/22/2009146/29/20090x4a4a
TestEvent16/22/2009147/1/20090x4a4a
TestEvent16/22/2009147/4/20090x4a4a

Voila, we’ve stored 2 weeks of history for an event in a single row of a table at a cost of just 2 bytes and we are able to query it with T-SQL and no .CLR by joining to a table valued function. A month could be stored in 5 bytes and whole year’s worth of history could be stored in 53 bytes.

For our example, we repeated the same days for each week, but that is just a limitation of our test harness, we could have stored a different pattern of bits within each byte to represent dates.

Later, I’ll show more practical value with the simulation example including fuzzy logic matching to show for example what stocks traded together with statistical confidence within a particular period of time, by merely comparing bit masks enumerating a few hundred summary rows instead of enumerating through hundreds of thousands of history rows. The main application I see for this design pattern is analytic applications that wants to be able to rapidly compare correlations of entities based on events associated with periods over time. This is also useful to reduce storage requirements and I/O performance if there is a huge mass of history that is linked to a specific event during a period where you only need to track if the event occurred and not the details of the event itself.

I’ve included an attachment with all of the code, if you wish to try it out.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
United States United States
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
Generalfull code is in the attachment Pin
AlphaData14-Apr-11 8:45
AlphaData14-Apr-11 8:45 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.