I'm attempting to return every word that can be built from the letters "epfdoctlz", using the nltk words corpus as the word list and a trie for the lookup.

What I have tried:

Python
import nltk
nltk.download('words')


from nltk.corpus import words
word_list = words.words()
# prints 236736
print(len(word_list))


trie = {}

for word in word_list:
    cur = trie
    for l in word:
        cur  = cur.setdefault(l, {})
        cur['word'] = True # defined if this node indicates a complete word
        
def findWords(word, trie = trie, cur = '', word_list = []):
    for i, letter in enumerate(word):
        if letter in trie:
            if 'word' in trie[letter]:
                word_list.append(cur + letter)
            findWords(word, trie[letter], cur+letter, word_list )    
            

    return word_list

words_longer = findWords("epfdoctlz")


longer_list = []

for word in words_longer:
    if len(word) > 1:
        longer_list.append(word)
print(longer_list) 


This returns a list (longer_list) that contains partial, incomplete words alongside the full words I want.

Partial words such as
'lett', 'lette'

(which look like the beginning of 'letter') or
'doct', 'docto'

are returned. When the code sees there is no letter 'r' in "epfdoctlz" it stops, but it still returns
'doct', 'docto'

when it should return nothing for that word, since 'doctor' contains an 'r'.

At a high level, how do I make this code say: "if it's not a full word, don't return the beginning of the word; in fact, don't return anything other than full words"?
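A likely cause, for what it's worth: in the trie-building loop, cur['word'] = True is indented inside the "for l in word:" loop, so every prefix of every word gets flagged as a complete word. Below is a minimal sketch of the same idea with the marker set once per word, after its last letter. The names build_trie and find_words and the small sample list are made up for illustration (a real run would pass words.words() instead). Like the original, it allows a letter to be reused any number of times.

```python
def build_trie(words):
    """Build a nested-dict trie; the 'word' key marks the end of a complete word."""
    root = {}
    for word in words:
        cur = root
        for letter in word:
            cur = cur.setdefault(letter, {})
        # Set the marker once per word, after the last letter,
        # not inside the letter loop (that would flag every prefix).
        cur['word'] = True
    return root


def find_words(letters, node, prefix='', found=None):
    """Collect complete words reachable using only the given letters (reuse allowed)."""
    if found is None:  # avoid a shared mutable default list
        found = []
    for letter in letters:
        child = node.get(letter)
        if child is None:
            continue  # no word continues with this letter
        if 'word' in child:
            found.append(prefix + letter)
        find_words(letters, child, prefix + letter, found)
    return found


# Hypothetical sample list standing in for the nltk corpus
sample_words = ['doctor', 'food', 'fruit', 'pond', 'felt', 'don', 'doe', 'do']
sample_trie = build_trie(sample_words)
result = find_words('epfdoctlz', sample_trie)
print(sorted(result))
```

Using found=None rather than a list default also keeps results from carrying over between separate calls.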
Updated 3-Nov-21 7:51am
Comments
Richard MacCutchan 3-Nov-21 4:54am    
I would start by iterating over the letters in the word and comparing them to the letters in your control string. As soon as it finds a letter that is not in the control string, it should terminate the search and move on to the next word. Only return words where every letter exists in the control string. You may (or may not) also need to check for duplicate letters.
nate walter 3-Nov-21 11:38am    
@Richard MacCutchan Yeah, it's just that I'm not sure how to do that in Python.
Richard MacCutchan 3-Nov-21 12:02pm    
Something like:
word_list = [ 'doctor', 'food', 'fruit', 'pond', 'felt', 'don' ]
pattern = 'epfdoctlz'
for word in word_list:
    find = True
    for letter in word:
        if letter not in pattern:
            find = False
            break
    if find == True:
        print(F'The letters of {word} are all in {pattern}')
nate walter 3-Nov-21 13:28pm    
This seems to work the best out of anything so far.
nate walter 3-Nov-21 13:31pm    
If you'd like to post this as a response, I'd be more than happy to accept it, @Richard MacCutchan.

Suggested solution:
Python
word_list = [ 'doctor', 'food', 'fruit', 'pond', 'felt', 'don' ]
pattern = 'epfdoctlz'
for word in word_list:
    find = True
    for letter in word:
        if letter not in pattern:
            find = False
            break
    if find == True:
        print(F'The letters of {word} are all in {pattern}')


[edit]
A better implementation takes advantage of the else clause that Python allows on a loop statement; the else body runs only when the loop finishes without hitting break:
Python
word_list = [ 'doctor', 'food', 'fruit', 'pond', 'felt', 'don' ]
pattern = 'epfdoctlz'
for word in word_list:
    for letter in word:
        if letter not in pattern:
            break
    else: # note that this else is bound to the inner for statement
        print(F'The letters of {word} are all in {pattern}')


[/edit]
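On the duplicate-letter point raised in the comments: the loops above accept 'food' even though the pattern contains only one 'o'. If each letter of the pattern may be used only once, one option is to compare letter counts with collections.Counter. This is only a sketch under that assumption, and can_form is a hypothetical helper name, not part of the answer above.

```python
from collections import Counter


def can_form(word, pattern):
    """Return True if word can be spelled using each letter of pattern at most once."""
    # Counter subtraction keeps only positive counts, so the result is empty
    # exactly when pattern supplies every letter of word often enough.
    return not (Counter(word) - Counter(pattern))


word_list = ['doctor', 'food', 'fruit', 'pond', 'felt', 'don']
pattern = 'epfdoctlz'
matches = [word for word in word_list if can_form(word, pattern)]
print(matches)
```

With this stricter rule, 'food' is rejected for the sample pattern because it needs two copies of 'o'.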
 
Thanks to Richard MacCutchan for his answer.
I went ahead and made a function out of his solution, using nltk as the word list.

Python
import nltk
nltk.download('words')

from nltk.corpus import words
word_list = words.words()
print(len(word_list))
# prints 236736

pattern = 'epfdoctlz'

def desired_words(pattern):
    # Report every word longer than one letter whose letters all appear in pattern
    for word in word_list:
        find = True
        for letter in word:
            if letter not in pattern:
                find = False
                break
        if find and len(word) > 1:
            print(f'The letters of {word} are all in {pattern}')

desired_words(pattern)
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


