Click here to Skip to main content
15,895,799 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
Hey guys.
Quick question.
I am writing a program to find all the permutations of a set of characters I input into the application.
This part works perfectly.
My problem is that I need to check all the permutations of the characters against a text file I use as a dictionary.
Ex. If I input the characters TSTE the outputs givin are tset,ttse,ttes,tets,test,stte,stet,sett...
I only want to print the valid words like tset,sett,stet,test,tets. where ttse,ttes,stte is not printed.

The code I have so far is as follows.
I have been scracthing at the edges of my scull for the past few days and just cant seem to find a way to do it.
Please if there is anything you can see that I have missed?

Thank you

C#
static void Main(string[] args)
        {
            Console.BufferHeight = Int16.MaxValue - 1;

            Console.WindowHeight = 40;
            Console.WindowWidth = 120;

            Permute p = new Permute();            
            var d = Read();
            string line;
            string str = System.IO.File.ReadAllText("Dictionary.txt");
            while ((line = Console.ReadLine()) != null)
            {                
                char[] c2 = line.ToArray();                
                p.setper(c2);
                           }
        }
        static Dictionary<string, string> Read()
        {
            var d = new Dictionary<string, string>();
            using (StreamReader sr = new StreamReader("Dictionary.txt"))
            {
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    string a = Alphabet(line);
                    string v;
                    if (d.TryGetValue(a, out v))
                    {
                        d[a] = v + "," + line;
                    }
                    else
                    {
                        d.Add(a, line);
                    }
                }
            }
            return d;
        }
        static string Alphabet(string s)
        {
            char[] a = s.ToCharArray();
            Array.Sort(a);
            return new string(a);
        }
        static void Show(Dictionary<string, string> d, string w)
        {
            string v;
            if (d.TryGetValue(Alphabet(w), out v))
            {
                Console.WriteLine(v);
            }
            else
            {
                Console.WriteLine("-----------");
            }
        }
    }
    class Permute
    {
        private void swap(ref char a, ref char b)
        {
            if (a == b) return;
            a ^= b;
            b ^= a;
            a ^= b;
        }
        public void setper(char[] list)
        {
            int x = list.Length - 1;
            go(list, 0, x);
        }
        public void go(char[] list1, int k, int m)
        {
            if (k == m)
            {

                Console.WriteLine(list1);
                Console.WriteLine(" ");

            }
            else
            {
                for (int i = k; i <= m; i++)
                {
                    swap(ref list1[k], ref list1[i]);
                    go(list1, k + 1, m);
                    swap(ref list1[k], ref list1[i]);
                }
            }
        }
Posted
Comments
Jegan Thiyagesan 5-Mar-13 7:20am    
can you post the copy of your dictionary content, as how it is laid out?

Also you are using circular referencing in "go" function, this can go hairy, you should find alternative way to call that increment.

Jegan
BulletVictim 6-Mar-13 4:18am    
I have a dictionary file of about 1097699 words.
The layout of the File is one word underneath the other.

As I see, you are taking the hardest way. A string is the permutation of an other if both contain the same characters in the same amount. As p(x)=x!, it can go wild: there are 3.628.800 permutations of a 10 character long string. With 20 you get 2.432.902.008.176.640.000! Do you have enough memory and time? Think about the recursion too... I suppose the file itself will be considerably shorter...
So in general it is cheaper to check if a word is a permutation of an other, than to generate all permutations, store them, and try to match all of them. Look at the figures above!

Look at following code:
C#
using System;
using System.Linq;
using System.Collections.Specialized;
using System.IO;

namespace MyNetstat
{
	class Program    
	{
		static bool isPermutation(string s1, string s2)
		{
			var _s2 = s2.ToCharArray();
			Array.Sort(_s2);
			
			foreach(var c in s1)
			{
				int i = Array.IndexOf(_s2,c);
				if(i < 0)
				{
					return false;
				}
				_s2[i] = (char)0;
			}
			
			return !_s2.Any(x => x != (char)0);
		}
		
		static StringCollection ParseFile(string filename, string test)
		{
			var result = new StringCollection();
			string line;
			
			StreamReader file = new StreamReader(@"d:\temp\dictionary.txt");
			while((line = file.ReadLine()) != null)
			{
				if(isPermutation(line, test))
				{
					result.Add(line);
				}
			}
			
			file.Close();
			
			return result;
		}
		
		
		public static void Main()
		{
			string test;
			
			Console.Write("Enter word to test for: ");
			test = Console.ReadLine();
			
			if(!string.IsNullOrWhiteSpace(test))
			{
				foreach(string s in ParseFile("Dictionary.txt", test))
				{
					Console.WriteLine(s);
				}
			}
			else
			{
				Console.WriteLine("Can not test for empty string!");
			}
		}
	}
}


BTW: why do you need to make dictionary from the file, and why <string,string>?
 
Share this answer
 
v2
Comments
BulletVictim 6-Mar-13 4:45am    
Hey yes. Im just thinking out loud but if I consider my code then all my formula(swap()) does is change the position of each character in the string. That way it is a permutation of the word already. So I do not need to check if it is a permutation.

I was using the Dictionary<string,string> in combination with the other way of checking, which I have removed now as I now check each permutation against the file.
If you like dense code, here is something to chew ;-)
C#
static IEnumerable<string> Perm(int n, char[] items)
{
    return (++n < items.Length)
        ? from c in items from s in Perm(n, items) select c + s
        : from c in items select c.ToString();
}

static void Main(string[] args)
{
    // dict: each word in one line (leading and trailing spaces get removed)
    string dictFile = @"c:\temp\dict.txt";
    var dict = new HashSet<string>(File.ReadAllLines(dictFile).Select(s=>s.ToLower().Trim()));

    var input = Console.ReadLine().ToLower().ToCharArray();
    var words = Perm(0, input).Where(p => dict.Contains(p)).Distinct();

    Console.WriteLine(string.Join("\n", words));
}


Don't use the above, it's correct but dedly slow. For the new solution see EDIT2 below.


[EDIT]
A more sensible approach was to avoid permutations at all and to the following (dense again ;-)):
C#
static void Main(string[] args)
{
    string dictFile = @"c:\temp\dict.txt";
    var dict = new HashSet<string>(File.ReadAllLines(dictFile).Select(s=>s.ToLower().Trim()));

    var input = Console.ReadLine().ToLower().ToCharArray();
    var words = dict.Where(d => d.Length == input.Length && d.All(c => input.Contains(c)));

    Console.WriteLine(string.Join("\n", words));
}            

[/EDIT]


Don't use that above since it calculates permutation per character position rather than by string, e.g. eeee is not a permutation of tste. For the new solution see EDIT2 below.

[EDIT2]
This takes care of proper permutations.
C#
static void Main(string[] args)
{
    var input = new string(Console.ReadLine().ToLower().OrderBy(c=>c).ToArray());

    string dictFile = @"c:\temp\dict.txt";
    var dict = new HashSet<string>(File.ReadAllLines(dictFile).Where(s=>s.Length == input.Length).Select(s=>s.ToLower()));
    var words = from d in dict where d.Length==input.Length && new string(d.OrderBy(c=>c).ToArray())==input select d;
            
    Console.WriteLine(string.Join("\n", words));
}            

[/EDIT2]
Cheers
Andi
 
Share this answer
 
v7
Comments
BulletVictim 6-Mar-13 4:48am    
Thanks I think I will use your first suggestion if I cannot figure out how to do everything I want to with my current code.
Andreas Gieriet 6-Mar-13 5:54am    
No, please not. It will take literally days to calculate a 10 character string permutation!
Use the second approach. This is the one I described in the comment to your solution.
Andi
BulletVictim 6-Mar-13 6:51am    
Hey Andy
Thanks for the help I see you have also commented on my follow up question.
I have tested your example, and it gives an output that is not exactly what I am looking for in my program. It does work for the intended purpose that you wrote it. I think it might just be a miss communication on my part.
Just to let you know the dictionary file I use has 1097699 words contained within it.
If you could take a look at the solution I posted on the previous problem I had you might better understand what it is I am trying to accomplish with this application.
In essence it is an anagram solver
Andreas Gieriet 6-Mar-13 7:31am    
What you do in your solution is partly non-sense. Sorry. E.g. you seem to not understand what Regex is for.
Plus: doing permanent file access in calculating permutations is very insensitive: file access is the most expensiv storage means (with respect to execution time). Did you ever run your code with a 10 character command line entry? It will take an eternity until you get a result!

On the other hand, loading (as I show in my solution) a text file (say with 1 million entries) is a quick task and storage in the HashSet may be imprived by only storing thoes entries that are of the correct length, e.g. var dict = new HashSet<string>(File.ReadAllLines(dictFile).Select(s=>s.ToLower().Trim()).Where(s=>s.Length == iLen));.
The "same length" is a reasonable filter for anagrams that results in probably far less content to process.
What exactly does not work in my solution? It lists all anagrams from a dictionary. Or do you have a counter example?
Cheers
Andi
BulletVictim 7-Mar-13 2:14am    
Ok miss communication again. Yes I am aware that my solution takes forever with larger strings and is not the most efficient. But it what it gives me in the output is what I want. It compares the entire permutation against the dictionary file and only displays exact matches. (I know I did not make that clear in the beginning.)
Where your solution works EXACTLY as you state its feed back prints out permutations of each character given.

Your solution with "tste" has the following output:
eeee
eeet
eese
esee
eses
eset
esse
esss
este
ests
etee
etes
etet
etse
ette
seee
sees
seet
sese
sess
sest
sete
sets
sett
sses
sset
ssss
ssts
stee
stet
stse
stss
stst
teee
tees
teet
tese
tess
test
tete
tets
tett
tset
tsss
tsts
tstt
ttee
ttet
ttte
tttt

where as my solution with the same input has the following output:
tset
tets
test
stet
stet
sett
sett
tset
tets
test

(Yes I know there are duplicate outputs. That is exactly my problem at the moment with my solution)
Hi,
At the moment you are only calculating the permuted values, but you are not comparing them with the values in your dictionary.


In your line where it says

C#
Console.WriteLine(list1);

you need to keep them saved somewhere, once all the permutation finished, compare the list with your dictionary and print them out.

Regards
Jegan
 
Share this answer
 
Thanks for the feedback guys.
I did however manage to solve the problem I was stuck with. I know its weird but it is getting me on my way to the next part I want to do.



C#
static void Main(string[] args)
       {
           Console.BufferHeight = Int16.MaxValue - 1;

           Console.WindowHeight = 40;
           Console.WindowWidth = 120;
           Console.WriteLine("Enter your caracters for the anagram: ");
           string line;
           while ((line = Console.ReadLine()) != null)
           {
               Console.WriteLine("Your results are: ");
               char[] charArray = line.ToArray();
               setper(charArray);
               Console.WriteLine("-----------------------------------------DONE-----------------------------------------");
               Console.WriteLine("Enter your caracters for the anagram: ");
               File.Delete("Permutations.txt");
           }

       }


       static void swap(ref char a, ref char b)
       {
           if (a == b) return;
           a ^= b;
           b ^= a;
           a ^= b;
       }
       static void setper(char[] list)
       {
           int x = list.Length - 1;
           permuteWords(list, 0, x);
       }
       static void permuteWords(char[] list1, int k, int m)
       {

           if (k == m)
           {
               StreamWriter sw = new StreamWriter("Permutations.txt", true);
               sw.WriteLine(list1);
               sw.Close();
               Regex permutationPattern = new Regex(new string(list1));
               string[] permutations = File.ReadAllLines("Permutations.txt");
               Regex pattern = new Regex(new string(list1));
               string[] lines = File.ReadAllLines("Dictionary.txt");

               foreach (string line in lines)
               {
                   var matches = pattern.Matches(line);
                   if (pattern.ToString() == line)
                   {
                       Console.WriteLine(line);
                   }
               }
           }
           else
           {
               for (int i = k; i <= m; i++)
               {
                   swap(ref list1[k], ref list1[i]);
                   permuteWords(list1, k + 1, m);
                   swap(ref list1[k], ref list1[i]);
               }
           }


       }


Ok just to explain somethings that you might be wondering about, also this brings me to another question I have about this.
I need to check that the permutations given do not give duplicate "values". So far I have tried writing all the permutations into a text file and then using that text file in the same way as I check the Dictionary file before I write out the permutation, but that just gives me more of a headache than anything else.

Does anyone maybe know of another way to check for duplicate permutations that I can use here? Or even if there is a better way to Implement my idea even?

It was something like this:
static void permuteWords(char[] list1, int k, int m)
        {

            if (k == m)
            {
                StreamWriter sw = new StreamWriter("Permutations.txt", true);
                sw.WriteLine(list1);
                sw.Close();
                Regex permutationPattern = new Regex(new string(list1));
                string[] permutations = File.ReadAllLines("Permutations.txt");
                Regex pattern = new Regex(new string(list1));
                string[] lines = File.ReadAllLines("Dictionary.txt");
                foreach (string permutes in permutations)
                {
                    var permuteMatches = permutationPattern.Matches(permutes);
                    if (permutationPattern.ToString() != permutes)
                    {
                        foreach (string line in lines)
                        {
                            var matches = pattern.Matches(line);
                            if (pattern.ToString() == line)
                            {
                                Console.WriteLine(line);
                            }
                        }
                    }
                }
            }
            else
            {
                for (int i = k; i <= m; i++)
                {
                    swap(ref list1[k], ref list1[i]);
                    permuteWords(list1, k + 1, m);
                    swap(ref list1[k], ref list1[i]);
                }
            }


        }
 
Share this answer
 
v2
Comments
Andreas Gieriet 6-Mar-13 5:36am    
Step away from the implementation and think about the problem first.
Your solution is far over-complicated.

The situation as I understand:
1) you have a dictionary stored in a file, one word in each line
2) you get a list of characters from the command line
3) you need to check which permutation of the input characters are stored in the dictionary and print them on the console
4) there shall be no duplicates printed (assuming that the dictionary contains no duplicates)

Assumptions and Constraints
1) we assume the dictionary to contain at most some thousand words
2) we assume the comparison is made case-insensitive
3) permutations produce quickly a huge amount of data, e.g.
5 chars entered = 5*5*5*5 = 3125
10 chars entered = 10*10*...*10 = 10000000000
i.e. double the size from 5 to 10 chars results in about 3 million time larger data!
i.e. assuming linear calculation time, if the 5 chars take 1 sec, 10 chars take 34 days!
4) one may enter duplicate characters, e.g. hello which would produce duplicate permutations.

Possible Approach:
1) Do *not* calculate the permutations upfront, but rather compare each dictionary entry if it is a permutation of the chars.
2) Stand on the shoulders of others: use the means that .Net provide. Everything else is waste of time.
In other words: do not re-invent the wheel. All you need is already "invented" and you just need to know them.
3) especially know what a Hash Table is (e.g. implemented in .Net as HashSet<T>)
3) definition: a string is a permutation of some characters, if all string characters are contained in the list of the permutation characters

So, the approach (pseudo code)
read the command line and store the characters as permutation character array A
the length L of the array gives the length of the entries to match in the dictionary
read the dictionary into a hash table D
for all entries E of D
print only those Es of length L an where all chars of E are contained in A

See my solution#3, the second approach.
Cheers
Andi
BulletVictim 6-Mar-13 6:27am    
Thanks Andi I will give it a try and see what happens

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900