I promised last time (http://blogs.msdn.com/microsoftbob/archive/2009/06/03/the-power-of-bits-for-historical-event-analysis.aspx) that I would follow up with some code to provide a proof of concept for using bit masks for date patterns. I have good news! I have it working in my simulation application and have prepared a little demo for how this works with a Windows form.
I found it wasn’t too difficult to use .NET BitArrays and convert to varbinary and also was able to do this using SQL-to-LINQ. The tricky part was writing a table-valued function that could actually interpret the bit mask and return back all the associated dates. This would probably be better done via SQLCLR with a .NET function, but I’m out of practice with doing .NET CLR functions so will put that off for another iteration.
So, first of all, here’s a screen shot of the testing form.
The goal of the form was just to prototype the concept by allowing the specification of a day of week pattern to repeat for a particular period. In the above example, I wanted to generate a bit pattern that would contain 14 bits (actually 16 bits -8 bits for each week - as I explain later) representing whether an event occurred on a given day offset from Monday, June 22 and then store the data in a database.
Here’s the table definition I used to store the data:
CREATE TABLE [dbo].[TestBitArray](
[PeriodStartDate] [date] NOT NULL,
[PeriodLength] [smallint] NOT NULL,
[EventName] [varchar](50) NOT NULL,
[SelectedWeekDayMask] [tinyint] NOT NULL,
[DateRangeBitArray] [varbinary](50) NOT NULL,
CONSTRAINT [PK_TestBitArray] PRIMARY KEY CLUSTERED
(
[PeriodStartDate] ASC,
[PeriodLength] ASC,
[EventName] ASC
))
Here’s what ends up in the database:
PeriodStartDate | PeriodLength | EventName | SelectedWeekDayMask | DateRangeBitArray |
6/22/2009 | 14 | TestEvent1 | 74 | 0x4A4A |
Please excuse me for using a compound primary key, this is just a quick and dirty example! I wanted to test with different combinations of period start dates, lengths, and event names.
Just to contrast different ways to deal with bits, I decided to store my selection bit mask as a simple tinyint
rather than as a byte
. I use tinyint
since the max number of bits that can be set for a week is 7 for each day, which works out to 127 maximum value using low order bits.
Without getting too bogged down (full code is in the attachment), here’s what the .NET code fragment looks like to calculate the bit mask and display the items in the listbox
.
BitArray ba = new BitArray((((periodDays-1) / 7) * 8) + 8 , false);
this.listBoxResults.Items.Clear();
DateTime currentDate = periodStartDate;
int j = 0;
for (int i = 0; i < periodDays; i++)
{
if ((i % 7) == 0)
{
j++;
}
currentDate = periodStartDate.Add(new TimeSpan(i, 0, 0, 0));
int currentDayOfWeekIndex = (int)currentDate.DayOfWeek;
ba.Set(i+j, (checkForDayOfWeek(currentDate.DayOfWeek)));
this.listBoxResults.Items.Add(currentDate.ToShortDateString() + " : " + ba[i+j].ToString());
}
_ba = ba;
Note, the extra bit not being used so that each week is allocated to 1 byte, so 2 weeks of data is actually 14 bytes. I decided to go ahead and “waste” a bit so that each week is in a single byte. This makes it easier to look at the data byte-by-byte, which is the only way to natively view it from T-SQL and see what the patterns are for each week. I could have just strung them all together, but didn’t think it was worth the space savings and it’s not too hard to work around in the functions on both the .NET and SQL side.
Storing to the database is a bit tricky and requires some helper functions that are included in the attachment.
TestBitArrayDataDataContext dc = new TestBitArrayDataDataContext();
dc.UpsertTestBitArray(
this.dateTimePickerStartDate.Value,
(short)this.numericUpDownPeriodLength.Value,
BitArrayConversions.ConvertBitMaskToByte(selectedDaysMask),
BitArrayConversions.BitArrayToByteArray(_ba),this.textBoxEvent.Text);
dc.SubmitChanges();
It is simple to read integer-type (tinyint
, smallint
, integer
) data type into a bit array as well as to load from an array of varbinary.
For the selected days bit mask:
BitArray selectedDays = new BitArray(System.BitConverter.GetBytes(tb.SelectedWeekDayMask));
For the computed period detailed history bitmap:
_ba = new BitArray(tb.DateRangeBitArray.ToArray());
Now, the fun part is the SQL function to return the data, again for brevity, I’ve not pasted the supporting functions but are included in the TestBitArray.sql attachment in the zip.
CREATE FUNCTION [dbo].[udf_GetDatesForBitMask]
(
@PeriodStartDate date,
@DateBitMask varbinary(8000)
)
RETURNS
@PeriodDates TABLE
(
EventDate DateTime
)
AS
BEGIN
DECLARE @ByteLength smallint = LEN(@DateBitMask)
DECLARE @ByteString varchar(8000) = master.dbo.fn_varbintohexstr(@DateBitMask)
DECLARE @BytePointer smallint = 0
DECLARE @EventDate datetime
WHILE @BytePointer < @ByteLength
BEGIN
DECLARE @ByteValue tinyint = dbo.fn_ConvertHexUnsignedCharToTinyInt_
(SUBSTRING(@ByteString,3+@BytePointer*2,2))
DECLARE @BitPower smallint = 7
WHILE @BitPower > 0
BEGIN
DECLARE @BitTestValue smallint = POWER(2.00,@BitPower)
IF @ByteValue >= @BitTestValue
BEGIN
SET @EventDate = DATEADD(DD,@BitPower - 1 + @BytePointer * 7 ,@PeriodStartDate)
SET @ByteValue = @ByteValue - @BitTestValue
INSERT @PeriodDates (EventDate) Values (@EventDate)
END
SET @BitPower = @BitPower -1
END
SET @BytePointer = @BytePointer + 1
END
RETURN
END
The ConvertHexUnsignedCharToTinyInt
function is based on a similar function for converting to varbinary from string here.
So, what do we get when we actually query the table and join with the function?
SELECT [EventName]
,[PeriodStartDate]
,[PeriodLength]
,[Eventdate]
,master.dbo.fn_varbintohexstr(DateRangeBitArray) as BitArray
FROM [dbo].[TestBitArray]
CROSS APPLY dbo.udf_GetDatesForBitMask(PeriodStartDate,DateRangeBitArray)
Where EventName = 'TestEvent1'
Order by PeriodStartDate, Eventdate
EventName | PeriodStartDate | PeriodLength | Eventdate | BitArray |
TestEvent1 | 6/22/2009 | 14 | 6/22/2009 | 0x4a4a |
TestEvent1 | 6/22/2009 | 14 | 6/24/2009 | 0x4a4a |
TestEvent1 | 6/22/2009 | 14 | 6/27/2009 | 0x4a4a |
TestEvent1 | 6/22/2009 | 14 | 6/29/2009 | 0x4a4a |
TestEvent1 | 6/22/2009 | 14 | 7/1/2009 | 0x4a4a |
TestEvent1 | 6/22/2009 | 14 | 7/4/2009 | 0x4a4a |
Voila, we’ve stored 2 weeks of history for an event in a single row of a table at a cost of just 2 bytes and we are able to query it with T-SQL and no .CLR by joining to a table valued function. A month could be stored in 5 bytes and whole year’s worth of history could be stored in 53 bytes.
For our example, we repeated the same days for each week, but that is just a limitation of our test harness, we could have stored a different pattern of bits within each byte to represent dates.
Later, I’ll show more practical value with the simulation example including fuzzy logic matching to show for example what stocks traded together with statistical confidence within a particular period of time, by merely comparing bit masks enumerating a few hundred summary rows instead of enumerating through hundreds of thousands of history rows. The main application I see for this design pattern is analytic applications that wants to be able to rapidly compare correlations of entities based on events associated with periods over time. This is also useful to reduce storage requirements and I/O performance if there is a huge mass of history that is linked to a specific event during a period where you only need to track if the event occurred and not the details of the event itself.
I’ve included an attachment with all of the code, if you wish to try it out.
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.