Click here to Skip to main content
15,879,095 members
Articles / Database Development / Data Visualization

Coercing AutoCompletion

Rate me:
Please Sign up or sign in to vote.
4.80/5 (3 votes)
17 Apr 2015CPOL6 min read 10.6K   230   7  
efficient and secure selecting from large amounts of data

Introduction

Presenting Data in order to choose an Item from is not solved sufficient yet. Imagine a Combobox with more than 300 Items - simply NoGo!
An Improvement would be a Textbox with AutoCompletion, because Autocompletion works as an easy-to-use filter, which reduces the data-amount very quickly.

But even Autocompletion has 2 disadvantages:

  • It is not data-safe: a user can ignore the autocompletion-suggestions, and input any rubbish he likes.
  • On large Data-Lists the user still looses the orientation, which keys are valid to continue reducing the abundance of choices to a human limit.

The concept of Coercing Autocompletion

  • display a dropdown-list of options, where the user can choose from, if he likes (standard-autocompletion-behavior)
  • ensure, that the user only can input valid Data. Reject invalid key-strokes immediately
  • give a preview of the next valid KeyStrokes
  • (optional) "WindForward": as long as only one KeyStroke is valid, autocompletion can input it automatically. This can reduce the nessecary keystrokes dramatically, but also can puzzle rapid-typing-users - So make it optional

Maybe theese concepts do not sound that "whoop! - world-changing". But assume a stable and easy-to-use standard-control, which provides theese concepts (and moreover databinding) - do you think anyone would ever use a combobox anymore? No - combobox, as we know it - perhaps/propably would die.

Several ways of looking at Data

Assume the following sorted data (some geman towns):

Genthin
Gentingen
Genzkow
Georgenberg
Georgensgmünd
Georgenthal
Georgsdorf
Georgsmühle
Geraberg
Gerabronn
Gerach

I put them into a uniform grid, and marked up some chars of specific importance:

897961/DataWithMarkups.png<figure1>

At theese markup-points a user can decide, and each decision will reduce the range of further choice-options.

You can understand theese points as nodes of a decision-tree - let me draw in the available decision-pathes:

897961/DecisionTree1.png<figure2>

Moreover i can remove redundant chars from the Tree:

897961/DecisionTree2.png<figure3>

funny - isn't it?

But the most convenient view (to me) is visualizing a practical sample-selection: Lets select "Georgensgmünd", and see, how the decisions reduce choice-options in a few steps to one, which is the final selection:

897961/OptionReduction1.png<figure4>

After typing "G" one can choose any of the 11 towns. But marked up as "Decision-Node" on this level are only these 3: "n o r"
Choosing "o" reduces the choice-options to 5, and now there are only two Decision-Node: "e s"
Choosing the "e", reduces on to choice-options 3, and "b s t" are available as Decision-Nodes.
Choosing "s" finishes selection - only Option "Georgensgmünd" is left.

Proof of Concept: Sample-Application

897961/SampleApp.gif<figure5>

My implementation is far away from beeing able to crowd out Combobox from Gui-Development: It misses Databinding-Support, the ValidNextKeys-Preview is not applicable everywhere, and it inherits a bug and some suboptimal behavior from Standard-Autocompletion. It is easy-to-use in fact, but - i admire: not (yet) as easy-to-use than combobox is...
But nevertheless i think, it already can do a good job in real-world-applications, who deal with selecting data from big data-amounts.
The sample-download is a kind of zip-code-finder, and it provides about 16000 german towns (and town-districts) to choose from.

My code-design is a kind of "Textbox-Behavior": You can attach it to a Textbox, give him data, and it will configure autocompletion, coercing, notify about Valid-Next-Keys and optional wind-forward-functionality.

Internally it simply let standard-autocompletion do its dropdown- and suggesting-job, and just extend functionality with coercion, displaying the available Chars  and stuff.
In other words: a really lightweight Solution - less than 200 lines of code.

Using the code

VB.NET
Public Class frmOrtFinder

   Private WithEvents _Coercer As New CoercingAutoCompletion

   Public Sub New()
      InitializeComponent()
      FillDatasetFromFile(Path.GetFullPath("..\..\Data\GeoNamesDE.inf"))
      Dim data = OrteDts.Ort.Select(Function(rw) ", ".Between(rw.Name, rw.Bundesland, rw.PLZ))
      _Coercer.Textbox = TextBox1
      _Coercer.ReadData(data)
   End Sub

   Private Sub _Coercer_AvailableCharsChanged(sender As Object, e As EventArgs) Handles _Coercer.AvailableCharsChanged
      lbAvailable.Text = _Coercer.AvailableChars
   End Sub

   Private Sub _Coercer_InputDone(sender As Object, e As EventArgs) Handles _Coercer.InputDone
      'busyness-logic: search the in the Textbox specified Data-Record and select it in Datagridview
      Dim fields = TextBox1.Text.Split({", "}, StringSplitOptions.None)
      For i = 0 To bsOrt.Count - 1
         If DirectCast(bsOrt(i), DataRowView).Row.ItemArray.Cast(Of String).SequenceEqual(fields) Then
            bsOrt.Position = i
            grdOrt.FirstDisplayedScrollingRowIndex = i
            Return
         End If
      Next
      Throw New Exception("Record not found - this should never occur!")
   End Sub
   '...
  • instantiate a coercer (line#3)
  • set its Textbox (line#9)
  • give data to it (line#10)
  • handle its Events (lines#13, #17)

Implementation-Details

I separated the concern "Textbox-Behavior" from "Data-Parsing" - first a glance at the Textbox-Part:

VB.NET
Public Class CoercingAutoCompletion

   Public Event AvailableCharsChanged As EventHandler
   Public Event InputDone As EventHandler

   '...

   Public Property Textbox As TextBox
      Get
         Return _TextBox
      End Get
      Set(value As TextBox)
         If _TextBox Is value Then Return
         If _TextBox IsNot Nothing Then Throw New Exception("Textbox already set!")
         _TextBox = value
         _TextBox = _TextBox
         _TextBox.AutoCompleteMode = AutoCompleteMode.SuggestAppend
         _TextBox.AutoCompleteSource = AutoCompleteSource.CustomSource
         _TextBox.AutoCompleteCustomSource = _AutoTextSource
      End Set
   End Property

   '...

   'improvements for standard-autocompletion: coerce valid inputs, notify about available Chars, selectionstart-wind-forward if wanted
   Private Sub _TextBox_KeyUp(sender As Object, e As KeyEventArgs) Handles _TextBox.KeyUp
      With _TextBox
         Dim i = .SelectionStart
         Select Case e.KeyCode
            Case Keys.Up, Keys.Down ' .Up/.Down choose text from Autocompletion-DropDown, try keep previous SelectionStart
               i = Math.Min(i, _UnselectedText.Length)
               _DataParser.Parse(.Text.Substring(0, i))
               _DataParser.SelStartPropose = i
            Case Keys.Left, Keys.Right
               i = _UnselectedText.Length + e.KeyCode - 38
               _DataParser.Parse(.Text.Substring(0, i))
               If e.KeyCode = Keys.Left Then _DataParser.SelStartPropose = i
            Case Else
               _DataParser.Parse(.Text.Substring(0, i))
               If e.KeyCode = Keys.Back Then _DataParser.SelStartPropose = i
               If _DataParser.FirstDivergence < i Then
                  System.Media.SystemSounds.Hand.Play()
                  .Text = .Text.Substring(0, _DataParser.FirstDivergence)
               End If
         End Select
         ImproveSelection()  'improves the Selection done right before by standard-Autocompletion, raises AvailableCharsChanged
      End With
   End Sub

   'Ensures that the Textbox can't be left without valid providing valid Input
   Private Sub _TextBox_Leave(sender As Object, e As EventArgs) Handles _TextBox.Leave
      With _TextBox
         _DataParser.Parse(.Text)
         _TextBox.Text = _DataParser.FirstBestMatch
         RaiseEvent InputDone(Me, EventArgs.Empty)
      End With
   End Sub
   '...

Mainly it configures AutoCompletion and then observes the Textbox_KeyUp-Event. And there is to distinguish between up/down, left/right, backspace and general Key-Strokes.

  • Up/Down address standard-autocompletion to select next/previous entry from Dropdown-list. Therefore the text is to be re-parsed, and try restore previous selection
  • Left/Right shall move the selection-start, but in every state an autocompletion-selection must reach to the end of text: that's the meaning of suggest: it immediately vanishes, when the user types anything else.
  • Backspace equals Cursor-Left-Keystroke, but Autocompletion removes its suggestion (instead of re-suggesting better - i can't help that)
  • other keys also cause re-parsing, and on invalid input the text gets shortened back to validity (line#22).

So far - maybe more interesting is the DataParser - concern:

VB.NET
Private _RawWords As Object()

Public Sub Refill(autoSource As AutoCompleteStringCollection, texts As IEnumerable(Of String))
   autoSource.Clear()
   Dim txts = texts.ToArray
   Array.Sort(txts, StringComparer.OrdinalIgnoreCase)
   autoSource.AddRange(txts)
   Dim bf = BindingFlags.Instance Or BindingFlags.NonPublic
   Dim inf = GetType(AutoCompleteStringCollection).GetField("data", bf)
   Dim data = inf.GetValue(autoSource)
   inf = GetType(ArrayList).GetField("_items", bf)
   _RawWords.Be(inf.GetValue(data))
   Parse("")
End Sub

Refill() fills the AutoCompleteCollection and then hack it with reflection, to get access to the underlying Data-Array. Thats why my solution takes no additional Memory to what standard-Autocompletion takes anyway.
Note that i use StringComparer.OrdinalIgnoreCase for sorting. The same Comparer is in use when searching the data with BinarySearch:

VB.NET
Private Function FindBestMatch(txt As String) As Integer
   Dim i = Array.BinarySearch(_RawWords, txt, StringComparer.OrdinalIgnoreCase)
   Return If(i < 0, Not i, i)
End Function

A binary search is very efficient: Eg. with only 20 internal Iterations it finds an item in a range of about 1 million sorted items. The Return-Value is tricky: If the search finds a matching item it returns its Position as a value >= 0. Otherwise binarysearch doesn't simply return -1, but it returns a negative number - the Bit-Complement of the Position, where a matching item should be (also understand as: "insert-position"). 
And that is, what Dataparser needs to know: the insert-position of the input (although it doesn't insert).
Assume the input "geo", and look back to <figure4>: FindBestMatch() would come up with line#4, and that is the starting-line from where is to look up for the next Decision-Nodes.
Then next we must find the ending-line of the particular "option-range".
Once again apply FindBestMatch()-binarysearch: But first append Char.MaxValue to the input "geo", to ensure, the modified input "geo^" will retrieve the first line behind the option-range - namely line#9.
In summary we figured out, that the first "Bestmatch" to "geo" is at line#4 - "Georgenberg", and the last BestMatch is at line#8 - "Georgsmühle". From this "option-range" we must collect the "decision-nodes", which define the next level of selection.
But how find the correct column of the next available Decision-Nodes?
Not that complicated: Just compare the first line of the option-range with its last line. The occurance of the first difference will give us the x-position we're looking for - in the sample its column#6, because "Georgenberg" and "Georgsmühle" diverge from column#6 on.
Now loop from firstBestmatchPosition to lastBestMatchPosition and collect distinct(!) the chars at column#6. These are: "e s" - the decision-nodes, and the "available-chars" of a wind-forward-coercing-autocompletion.

The story in code:

VB.NET
'figure out _FirstMatchPos, _LastMatchPos, FirstbestMatch, FirstDivergence, _DecisionColumn, SelectionStart
Public Sub Parse(txt As String)
   _FirstMatchPos = FindBestMatch(txt) ' first cut - possibly incorrect, on invalid input
   FirstBestMatch = Words(_FirstMatchPos)
   FirstDivergence = GetDivergence(txt, FirstBestMatch, 0)
   If FirstDivergence < txt.Length AndAlso _FirstMatchPos > 0 Then 'signals invalid input
      'depending on the sort-order of the first invalid char FindBestMatch() may have retrieved the position **after** the option-range
      Dim div2 = GetDivergence(txt, Words(_FirstMatchPos - 1), 0)
      If div2 < FirstDivergence Then
         _LastMatchPos = FindBestMatch(txt.Substring(0, FirstDivergence) & Char.MaxValue) - 1
      Else
         '_FirstMatchPos actually incorrect, and contains the position (+1) that _LastMatchPos should have. Make up that fault
         _LastMatchPos = _FirstMatchPos - 1
         FirstDivergence = div2
         _FirstMatchPos = FindBestMatch(txt.Substring(0, FirstDivergence))
         FirstBestMatch = Words(_FirstMatchPos)
      End If
   Else
      _LastMatchPos = FindBestMatch(txt & Char.MaxValue) - 1 'Position of the last word which fits to FirstDivergence
   End If
   _DecisionColumn = GetDivergence(FirstBestMatch, Words(_LastMatchPos), FirstDivergence)
   'SelectionStart is proposal - can be changed by the caller
   SelectionStart = If(FirstDivergence >= txt.Length AndAlso WindForward, _DecisionColumn, FirstDivergence)
End Sub

Private Function GetDivergence(s0 As String, s1 As String, start As Integer) As Integer
   For i = start To s0.Length - 1
      If Char.ToUpper(s0(i)) <> Char.ToUpper(s1(i)) Then Return i
   Next
   Return s0.Length
End Function

A complication occurs on invalid inputs: FindBestmatch() will either return the firstBestmatch or the lastBestmatch - this depends on the sort-order of the invalid char.
Assume invalid input "Geoo": the binary-searched insert-position is line#4, because "Geoo" is smaller than "Georgenberg".
But another invalid input "Geot" will retrieve line#9, because "Geot" is bigger than "Georgsdorf". So on invalid input there is additional to re-check, whether the line was the first or the last of the option-range.

Conclusion

To me it's less important to introduce a particular tricky Textbox-Behavior-Implementation. More importent i see the introduction of the concept "Coerce-Windforward-Autocompletion" in general - not my contemplations about Data-structures or explain-code-tryals.

Even my implementation is rather poor, compared with what my fantasy imagines:
May be a Wpf-control, where on GotFocus the Valid-Next-Chars pop up as kind of tooltip, and stuff like that.
Or they get marked up within the DropDown-List.
Or - keeping good-ol' combo alive - why not extend ComboBox either with coerce-wind-forward-autocompletion?
And of course Datagrid-Coercing-Columns must be created either ...

Credits

  • the sample-app's data is taken from GeoNames, under the CreativeCommons-License, which means: for free, as long as i mention them as the author - which i comply willingly.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Germany Germany
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
-- There are no messages in this forum --