Click here to Skip to main content
15,038,237 members
Articles / Web Development / ASP.NET / ASP.NET Core
Article
Posted 9 Mar 2017

Stats

17.9K views
170 downloads
7 bookmarked

Generating Anagrams at Breakneck Speed

Rate me:
Please Sign up or sign in to vote.
4.90/5 (8 votes)
9 Mar 2017CPOL5 min read
A documented journey looking for the fastest solution for generating valid Anagrams for the Weekly Code Project Challenge...

Introduction

This article was inspired by the Weekly Code Project Challenge[^] to:

  1. given a random (or not so random) string, generate unique anagrams
  2. hook into a spell checking service of your choice and only return anagrams that are actual words
  3. need to be a fast solution

The purpose of this article is to take you on the journey undertaken and the discoveries made along the way to the final solution for the Weekly Challenge.

The solution will be a console app, however the code could be applied to any other type of app - WinForm, WPF, UWP, Xamarin Native/Forms, ASP.NET, MVC, and ASP.Core.

Disclaimer: I don't claim this to be the best or fastest solution but I feel that it is pretty close. The challenge has not concluded at the time of posting of this article, so someone may have found a faster solution.

TL;DR

For those who want to see the results, you can jump directly to them[^].

What is an Anagram?

Firstly, we need to look at the definition of an Anagram. According to Oxford Dictionaries[^]: "A word, phrase, or name formed by rearranging the letters of another, such as "spar", formed from "rasp"."

1. Build List of Anagrams

To generate anagrams, you need to find all the combinations of the letters. There are many different ways of doing this. Here is one:

C#
static class Program
{
    private static object elapsed;

    private static void Main(string[] args)
    {
        var st = new Stopwatch();
        var elapsed = default(TimeSpan);
        var prime = "a".Anagrams();

        while (true)
        {
            st.Reset();
            Console.Write("\r\nSearch Word: ");

            int count = 0;
            var lookupWord = Console.ReadLine().Trim().ToLower();
            if (lookupWord.Length == 0) break;

            st.Start();
            var anagrams = lookupWord.Anagrams();
            elapsed = st.Elapsed;
            st.Stop();

            foreach (var anagram in anagrams)
                Console.WriteLine($"{++count,2}: {anagram}");

            Console.WriteLine($"Time: {elapsed.TotalMilliseconds} ms");
        }
    }

    // derived from http://stackoverflow.com/a/7187466
    static IEnumerable<string> Anagrams(this string word) 
        => Combinations(word.ToCharArray().ToList())
            .Select(x => new string(x.ToArray())).Distinct();

    static IEnumerable<list<char>> Combinations(List<char> items)
    {
        if (items.Count == 0) yield return new List<char>();

        var copy = new List<char>(items);
        foreach (var i in items)
        {
            copy.Remove(i);
            foreach (var rest in Combinations(copy))
            {
                rest.Insert(0, i);
                yield return rest;
            }
            copy.Insert(0, i);
        }
    }
}

How Did We Do?

Passed 2 out of 3 requirements. My target was 100%, so it fails:

  1. Generate anagrams - passed
  2. Valid (spell checked) words - fail
  3. Speed: Fast - passed barely

2. Using the WPF Spell Checker

I know that WPF has support for a spell checker. The trick with the control is that it needs to run in a STA Thread[^].

The following uses the WPF spell checker:

C#
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Windows.Controls;
using System.Windows.Documents;

namespace FastAnagrams
{
    static class Program
    {
        private static void Main()
        {
            var st = new Stopwatch();
            var elapsed = default(TimeSpan);
            var sh = new SpellCheckHelper();
            IEnumerable<string> anagrams = null;

            while (true)
            {
                st.Reset();
                Console.Write("\r\nSearch Word: ");

                int count = 0;
                var lookupWord = Console.ReadLine().Trim().ToLower();
                if (lookupWord.Length == 0) break;

                st.Start();
                sh.BeginBulkFind(Anagrams(lookupWord));
                anagrams = sh.EndBulkFind();
                elapsed = st.Elapsed;
                st.Stop();

                Console.WriteLine($"Time: {elapsed.TotalMilliseconds} ms");
                Console.WriteLine($"{anagrams.Count():N0} valid anagrams\r\n");

                foreach (var anagram in anagrams)
                    Console.WriteLine($"{++count,2}: {anagram}");
            }
        }

    public class SpellCheckHelper
    {
        private readonly object sync = new object();

        private List<string> validWords = new List<string>();
        private IEnumerable<string> words;

        private bool inProgress;
        private bool isComplete;
        public bool IsComplete => isComplete;

        public void BeginBulkFind(IEnumerable<string> words)
        {
            this.words = words;
            validWords.Clear();

            var backgroundThread 
				= new Thread(new ThreadStart(DoBulkCheck));
            backgroundThread.SetApartmentState(ApartmentState.STA);
            backgroundThread.Start();

            isComplete = false;
            inProgress = true;
        }

        public List<string> EndBulkFind()
        {
            lock (sync) if (!isComplete) Monitor.Wait(sync);

            isComplete = false;
            inProgress = false;

            return validWords;
        }

        private void DoBulkCheck()
        {
            try
            {
                var tb = new TextBox();
                tb.SpellCheck.IsEnabled = true;

                foreach (var word in words)
                {
                    int index = 0;
                    tb.Text = word;
                    var valid = true;
                    while ((index = 
					tb.GetNextSpellingErrorCharacterIndex
					(index, LogicalDirection.Forward)) != -1)
                        if (!string.IsNullOrEmpty
						(tb.Text.Substring(index, 
						tb.GetSpellingErrorLength(index))))
                        {
                            valid = false;
                            break;
                        }

                    if (valid) validWords.Add(word);
                }
            }
            finally
            {
                lock (sync)
                {
                    isComplete = true;
                    Monitor.PulseAll(sync);
                }
            }
        }
    }
}

How Did We Do?

Passed 2 out of 3 requirements. Performance was critical, so it fails at 38+ seconds:

  1. Generate anagrams - passed
  2. Valid (spell checked) words - passed
  3. Fast - failed

3. Check for Valid Words Using a Custom Spell Checker

Checking each word using the WPF spell checker was too slow. So let's load a List of words and run a check against our list.

A quick search finds the list of compiled words: GitHub - 350k+ English words[^].

Next, we require a fast method of checking if the word are in a List. The Generic HashSet<T> Class[^] has "is based on the model of mathematical sets and provides high-performance set operations similar to accessing the keys of the Dictionary<TKey, TValue> or HashTable collections". Here is the revised version:

C#
static class Program
{
    private static void Main(string[] args)
    {
        var st = new Stopwatch();
        var elapsed = default(TimeSpan);
        var anagrams = new List<string>();

        st.Start();
        LoadWords("words.txt");
        st.Stop();
        Console.WriteLine($"Loading Time: {st.ElapsedMilliseconds} ms\r\n");
        var prime = "a".Anagrams();

        while (true)
        {
            st.Reset();
            Console.Write("\r\nSearch Word: ");

            int count = 0;
            anagrams.Clear();

            var lookupWord = Console.ReadLine().Trim().ToLower();
            if (lookupWord.Length == 0) break;

            st.Start();
            var words = Anagrams(lookupWord);
            foreach (var item in words)
            {
                if (wordList.Contains(item))
                    anagrams.Add(item);
            }
            elapsed = st.Elapsed;
            st.Stop();

            Console.WriteLine($"Time: {elapsed.TotalMilliseconds} ms");
            Console.WriteLine($"{anagrams.Count:N0} valid anagrams out of {words.Count():N0}\r\n");

            foreach (var anagram in anagrams)
                Console.WriteLine($"{++count,2}: {anagram}");
        }
    }

    static HashSet<string> wordList = new HashSet<string>();
    static void LoadWords(string filename)
    {
        string word = string.Empty;
        using (StreamReader sr = File.OpenText(filename))
            while ((word = sr.ReadLine()) != null)
                if (word.Length > 0 && !word.Contains("'") && !word.Contains("\""))
                    wordList.Add(word);
    }
}

How Did We Do?

Passed 3 out of 3 requirements with reasonable performance at 2.75 ms:

  1. Generate anagrams - passed
  2. Valid (spell checked) words - passed
  3. Speed: fast - passed

This is a viable solution. But can we make it even faster?

4. Changing Approach to the Problem

In the first 3 attempts, we are generating a list of possible anagrams that require validation against a dictionary or list of words. This takes time to generate and time to validate each word.

A far simpler approach would be to pre-generate sets of anagrams and look for each set. This way, we remove the time to generate and validate. We can store the anagram sets in a Generic Dictionary<TKey, TValue> Class[^] and calculate a key that is common to each anagram in the set.

Calculating a common key went through a few iterations and the winning solution was based on "all anagrams of a word are equal length and contained the same characters".

So, for example, "teaser", "eaters", "easter", "asteer", "aretes", "saeter", "seater", "staree", "reseat" all have the same key "aeerst".

C#
static string GetKey(string w)
{
    var chars = w.ToCharArray();
    Array.Sort(chars);
    return new string(chars);
}

We now don't need to generate a list of anagrams. The word list that will be loaded contains all the words that we are going to need. Pre-building our anagram lists and storing in a Dictionary<TKey, TValue> with our generated key means we can take the word entered from our user, generate a key, and get a list of valid anagrams back in bulk if it is a valid word.

C#
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

namespace FastAnagrams
{
    static class Program
    {
        private static void Main(string[] args)
        {
            var st = new Stopwatch();
            var elapsed = default(TimeSpan);
            IEnumerable<string> anagrams = null;

            st.Start();
            LoadWords("words.txt");
            st.Stop();
            Console.WriteLine($"Loading Time: {st.ElapsedMilliseconds} ms\r\n");
            var prime = ContainsWord("a");

            while (true)
            {
                st.Reset();
                Console.Write("\r\nSearch Word: ");

                int count = 0;

                var lookupWord = Console.ReadLine().Trim().ToLower();
                if (lookupWord.Length == 0) break;

                st.Start();
                var key = lookupWord.GetKey();
                if (ContainsWord(key)) anagrams = GetResults(key);
                elapsed = st.Elapsed;
                st.Stop();

                foreach (var anagram in anagrams)
                    Console.WriteLine($"{++count,2}: {anagram}");

                Console.WriteLine($"\r\nTime: {elapsed.TotalMilliseconds} ms");
                Console.WriteLine($"{count:N0} anagrams for {lookupWord}");
            }
        }

        static bool ContainsWord(string key) => LookupTable.ContainsKey(key);
        static IEnumerable<string> GetResults(string key) => LookupTable[key];

        static Dictionary<string, list<string>> LookupTable
            = new Dictionary<string, list<string>>();

        static void LoadWords(string filename)
        {
            var wordKeys = new Dictionary<string, int>();
            string word, key;
            List<string> wordList = null;

            using (StreamReader sr = File.OpenText(filename))
            {
                while ((word = sr.ReadLine()) != null)
                {
                    if (word.Length > 0 && !word.Contains("'") && !word.Contains("\""))
                    {
                        key = word.GetKey();

                        if (!LookupTable.ContainsKey(key))
                        {
                            wordList = new List<string>();
                            LookupTable.Add(key, wordList);
                        }
                        else
                        {
                            wordList = LookupTable[key];
                        }

                        wordList.Add(word);
                    }
                }
            }
        }
    }
}

How Did We Do?

Passed 3 out of 3 requirements with a major improvement at 0.0148 ms:

  1. Generate anagrams - passed
  2. Valid (spell checked) words - passed
  3. Speed: faster - passed

This is a much better solution. But can we make it even faster?

5. Performance Tweak

Right now, you are probably thinking that we can't tweak the performance any further. But with a bit of experimentation, it is possible. Currently, the code makes 3 function calls:

  1. Generate the key
  2. Check if the key exists
  3. Return the list of anagrams

By combining all 3 functions into one, we gain another major jump in performance. The final version is now:

C#
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

namespace FastAnagrams
{
    static class Program
    {
        private static void Main(string[] args)
        {
            var st = new Stopwatch();
            var elapsed = default(TimeSpan);
            IEnumerable<string> anagrams = null;

            st.Start();
            LoadWords("words.txt");
            st.Stop();
            Console.WriteLine($"Loading Time: {st.ElapsedMilliseconds} ms\r\n");
            var prime = Find("a");

            while (true)
            {
                st.Reset();
                Console.Write("\r\nSearch Word: ");

                int count = 0;

                var lookupWord = Console.ReadLine().Trim().ToLower();
                if (lookupWord.Length == 0) break;

                st.Restart();
                anagrams = Find(lookupWord);
                elapsed = st.Elapsed;

                foreach (var anagram in anagrams)
                    Console.WriteLine($"{++count,2}: {anagram}");

                Console.WriteLine($"\r\nTime: {elapsed.TotalMilliseconds} ms");
                Console.WriteLine($"{count:N0} anagrams for {lookupWord}");
            }
        }

        public static string GetKey(this string word)
        {
            var chars = word.ToCharArray();
            Array.Sort(chars);
            return new string(chars);
        }

        static IEnumerable<string> Find(string lookupWord)
        {
            var chars = lookupWord.ToCharArray();
            Array.Sort(chars);
            var key = new string(chars);
            if (LookupTable.ContainsKey(key))
                foreach (var item in LookupTable[key])
                    yield return item;
            else
                yield break;
        }

        static Dictionary<string, list<string>> LookupTable
            = new Dictionary<string, list<string>>();

        static void LoadWords(string filename)
        {
            var wordKeys = new Dictionary<string, int>();
            string word, key;
            List<string> wordList = null;

            using (StreamReader sr = File.OpenText(filename))
            {
                while ((word = sr.ReadLine()) != null)
                {
                    if (word.Length > 0 && !word.Contains("'") && !word.Contains("\""))
                    {
                        key = word.GetKey();

                        if (!LookupTable.ContainsKey(key))
                        {
                            wordList = new List<string>();
                            LookupTable.Add(key, wordList);
                        }
                        else
                        {
                            wordList = LookupTable[key];
                        }

                        wordList.Add(word);
                    }
                }
            }
        }
    }
}

How Did We Do?

Passed 3 out of 3 requirements!

  1. Generate anagrams - passed
  2. Valid (spell checked) words - passed
  3. Speed: on fire! - passed

We have a winner! From a starting point of 2.75 ms (validated words) to 0.0003 ms.

6. Special Mention - SQLite

I expected SQLite[^] to be a bit slower than an in-memory lookup table. With my Mackbook Pro and its SSD (Solid State Drive), the results were quite surprising - they are just as fast as the Optimised Dictionary method (5.)!

C#
using System;
using System.Collections.Generic;
using System.Data.SQLite;
using System.Diagnostics;
using System.IO;
using System.Text;

namespace FastAnagrams
{
    static class Program
    {
        private static void Main(string[] args)
        {
            var st = new Stopwatch();
            var elapsed = default(TimeSpan);
            IEnumerable<string> anagrams = null;

            st.Start();
            InitDB();
            st.Stop();

            while (true)
            {
                st.Reset();
                Console.Write("\r\nSearch Word: ");

                int count = 0;

                var lookupWord = Console.ReadLine().Trim().ToLower();
                if (lookupWord.Length == 0) break;

                st.Restart();
                anagrams = Find(lookupWord);
                elapsed = st.Elapsed;

                foreach (var anagram in anagrams)
                    Console.WriteLine($"{++count,2}: {anagram}");

                Console.WriteLine($"\r\nTime: {elapsed.TotalMilliseconds} ms");
                Console.WriteLine($"{count:N0} anagrams for {lookupWord}");
            }
        }

        public static string GetKey(this string word)
        {
            var chars = word.ToCharArray();
            Array.Sort(chars);
            return new string(chars);
        }

        static IEnumerable<string> Find(string lookupWord)
        {
            var chars = lookupWord.ToCharArray();
            Array.Sort(chars);

            using (var cmd = 
                new SQLiteCommand($"SELECT word FROM {tableName} WHERE '" + 
                    new string(chars) + "' = wordkey", dbConn))
                using (SQLiteDataReader coreReader = cmd.ExecuteReader())
                    while (coreReader.Read())
                        yield return coreReader[0].ToString();
        }

        static SQLiteConnection dbConn;
        static SQLiteTransaction dbTrans;
        static string wordFile = "words.txt", dbFile = "words.db", 
            conn, tableName = "WordLookup",
            createCmd = $"CREATE TABLE {tableName} (wordkey TEXT, word TEXT)",
            insertCmd = $"INSERT INTO {tableName} (wordkey, word) values (@wordkey, @word)";

        static void InitDB()
        {
            CheckAndOpenDb(dbFile);
            CheckAndCreateTable();
        }

        private static void CheckAndOpenDb(string file)
        {
            conn = $"Data Source={file};Version=3;DateTimeKind=Utc";
            if (!File.Exists(file))
            {
                Console.WriteLine("Creating Database...");
                SQLiteConnection.CreateFile(file);
            }
            else
            {
                Console.WriteLine("Starting DB...");
            }
            dbConn = new SQLiteConnection(conn);
            dbConn.Open();
        }

        private static void CheckAndCreateTable()
        {
            if (!TableExists(tableName))
            {
                Console.WriteLine("Creating Table...");
                ExecuteCommand(createCmd);
                LoadWords();
            }
        }

        static void LoadWords()
        {
            string word, key;

            Console.WriteLine("Copying Word.Txt to Table...");
            BeginTrans();
            using (StreamReader sr = File.OpenText(wordFile))
            {
                while ((word = sr.ReadLine()) != null)
                {
                    if (word.Length > 0 && 
                        !word.Contains("'") && 
                        !word.Contains("\""))
                    {
                        key = word.GetKey();
                        ExecuteCommand(insertCmd, 
                            new Dictionary<string, object>
                            {
                                { "@wordkey", key },
                                { "@word", word }
                            });
                    }
                }
            }
            CommitTrans();
        }

        static bool ExecuteCommand
            (string query, Dictionary<string, object> fields = null)
        {
            bool success = false;
            using (SQLiteCommand cmd = Command(query))
            {
                if (fields != null && fields.Count != 0)
                    foreach (var key in fields.Keys)
                        cmd.Parameters.AddWithValue(key, fields[key]);

                cmd.ExecuteNonQuery();
                success = true;
            }
            return success;
        }

        static SQLiteCommand Command(string sql)
        {
            return new SQLiteCommand(sql, dbConn);
        }

        static bool TableExists(string name)
            => Command($"SELECT name FROM sqlite_master 
            WHERE type='table' AND name='{name}'")
                   .ExecuteReader()
                   .HasRows;

        static void BeginTrans()
        {
            dbTrans = dbConn.BeginTransaction();
        }

        static void CommitTrans()
        {
            dbTrans.Commit();
        }
    }
}

How Did We Do?

Passed 3 out of 3 requirements.

  1. Generate anagrams - passed
  2. Valid (spell checked) words - passed
  3. Speed: on fire! - passed

Result Summary

Loading words...

3. Anagram using HashSet
4. Keyed Dictionary
5. Keyed Dictionary Optimised
6. SqLite Lookup

Running Tests...

Name                           Count   Timing (ms)           % Var
----------------------------------------------------------------------
1. Anagram Generator             360       3.83040         0.00000
2. Anagram WPF Spell Check         3  38,484.62290       -99.99005
3. Anagram using HashSet           9       2.75240        39.16582
4. Keyed Dictionary                9       0.01480    25,781.08108
5. Keyed Dictionary Optimised      9       0.00030 1,276,700.00000
6. SqLite Lookup                   9       0.00030 1,276,700.00000
----------------------------------------------------------------------
* Timings are dependant on the config of the PC running them.

-- press any key to exit --

The WPF word dictionary appears to be smaller than the word list that is being used for tests 3 to 6. Only 3 anagrams found for the word "teaser", 9 for tests 3 to 9. The SqLite test threw me at first but was independently checked and verified. All testing was done in Release Mode and run from the command prompt.

Closing Comments

The journey was an interesting one. The lesson learned from this journey is don't discredit possible solutions without validation. I would never have thought that the SqLite solution would be as fast as it was given the configuration of my computer. In a real world scenario, the results will be different dependent on the system that it is running on.

Included Downloadable Code

The code included with this article is shared and can be run individually or together as a single test solution.

To perform optimal testing, as was done for this article, compile the code in Release Mode, open a command prompt and run.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Graeme_Grant
Technical Lead
Australia Australia
No Biography provided

Comments and Discussions

 
QuestionCoder for small python project Pin
Member 1312910424-Apr-17 14:55
MemberMember 1312910424-Apr-17 14:55 
GeneralRe: Coder for small python project Pin
PIEBALDconsult24-Apr-17 15:16
professionalPIEBALDconsult24-Apr-17 15:16 
GeneralRe: Coder for small python project Pin
Member 1312910424-Apr-17 17:27
MemberMember 1312910424-Apr-17 17:27 
Noted. Although it is difficult to find someone with specific topic experience - thank you.
PraiseSometimes looking at the solution set will reveal a quicker solution Pin
LeeBear14-Mar-17 9:06
MemberLeeBear14-Mar-17 9:06 
GeneralStill no multi-word anagrams Pin
PIEBALDconsult12-Mar-17 5:19
professionalPIEBALDconsult12-Mar-17 5:19 
GeneralRe: Still no multi-word anagrams Pin
Graeme_Grant12-Mar-17 5:39
professionalGraeme_Grant12-Mar-17 5:39 
GeneralRe: Still no multi-word anagrams Pin
PIEBALDconsult12-Mar-17 6:00
professionalPIEBALDconsult12-Mar-17 6:00 
GeneralRe: Still no multi-word anagrams Pin
Graeme_Grant12-Mar-17 5:42
professionalGraeme_Grant12-Mar-17 5:42 
GeneralRe: Still no multi-word anagrams Pin
PIEBALDconsult12-Mar-17 5:59
professionalPIEBALDconsult12-Mar-17 5:59 
QuestionNice solution, but has some invalid time measurements Pin
Jeffrey Patterson12-Mar-17 5:03
MemberJeffrey Patterson12-Mar-17 5:03 
AnswerRe: Nice solution, but has some invalid time measurements Pin
Graeme_Grant12-Mar-17 5:35
professionalGraeme_Grant12-Mar-17 5:35 
GeneralRe: Nice solution, but has some invalid time measurements Pin
Jeffrey Patterson12-Mar-17 6:09
MemberJeffrey Patterson12-Mar-17 6:09 
GeneralRe: Nice solution, but has some invalid time measurements Pin
Jeffrey Patterson12-Mar-17 6:13
MemberJeffrey Patterson12-Mar-17 6:13 
GeneralRe: Nice solution, but has some invalid time measurements Pin
Graeme_Grant12-Mar-17 6:42
professionalGraeme_Grant12-Mar-17 6:42 
GeneralRe: Nice solution, but has some invalid time measurements Pin
Jeffrey Patterson12-Mar-17 7:11
MemberJeffrey Patterson12-Mar-17 7:11 
QuestionGoal 1 - Build List of Anagrams Pin
John Brett10-Mar-17 6:25
MemberJohn Brett10-Mar-17 6:25 
AnswerRe: Goal 1 - Build List of Anagrams Pin
Graeme_Grant10-Mar-17 7:05
professionalGraeme_Grant10-Mar-17 7:05 
GeneralRe: Goal 1 - Build List of Anagrams Pin
John Brett12-Mar-17 21:08
MemberJohn Brett12-Mar-17 21:08 
QuestionNice Pin
Garth J Lancaster9-Mar-17 12:46
mveGarth J Lancaster9-Mar-17 12:46 
AnswerRe: Nice Pin
Graeme_Grant9-Mar-17 13:26
professionalGraeme_Grant9-Mar-17 13:26 
GeneralDownload link not working Pin
Bryian Tan9-Mar-17 6:23
professionalBryian Tan9-Mar-17 6:23 
GeneralRe: Download link not working Pin
Graeme_Grant9-Mar-17 6:27
professionalGraeme_Grant9-Mar-17 6:27 
GeneralRe: Download link not working Pin
Bryian Tan9-Mar-17 6:46
professionalBryian Tan9-Mar-17 6:46 
GeneralRe: Download link not working Pin
Graeme_Grant9-Mar-17 6:50
professionalGraeme_Grant9-Mar-17 6:50 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.