Solving the New York Times Spelling Bee Using Python

The New York Times “Spelling Bee” is a difficult puzzle in which you try to come up with words using only the letters that appear in a honeycomb grid. Every word must contain the center letter and letters can be reused. Below is an example spelling bee:

The first words that come to mind for me here are Table, Beetle, and Letbav. Due to my inability to score anywhere near ‘genius’ I have written a python program to solve the puzzle and temporarily reduce the shame.

Solution 1

My first idea to solve this puzzle consisted of the following:

Find all possible combinations of the puzzle’s letters with repeats. e.g. (aaaaa, aaaab, aaaae, aaaag, etc.) for word lengths from 5 to 10+.
Compare that list with a list of all English words.
Drop all the words that are not in both lists.
Drop all the words left that do not contain the center letter.

I learned that the number of all possible combinations of the spelling bee’s 7 letters with repeats was going to be computationally challenging. There is no limit on word length for the spelling bee puzzle. Combinations with repetition at long word lengths is a very long list.

In trying to understand the problem and see if it was going to be possible, I found out that there is a name for combinations with repetition. This is actually called the Cartesian Product. This product gives all possible combinations of two sets. Alternatively, with one set, you can get all possible combinations of that set with itself. The python implementation of the Cartesian Product comes in the itertools module as the function product.

I tested itertools.product to make sure the result is what I expected. Below is a code snippet and result.

test_list = ['a','b','c','d']
list(itertools.product(test_list,repeat=2))
[('a', 'a'),
 ('a', 'b'),
 ('a', 'c'),
 ('a', 'd'),
 ('b', 'a'),
 ('b', 'b'),
 ('b', 'c'),
 ('b', 'd'),
 ('c', 'a'),
 ('c', 'b'),
 ('c', 'c'),
 ('c', 'd'),
 ('d', 'a'),
 ('d', 'b'),
 ('d', 'c'),
 ('d', 'd')]

The above shows that the output gives all possible combinations of the test list with replacement of length 2. This can be scaled to the NYT Spelling Bee by creating a list of the letters in the spelling bee and iterating over all acceptable word lengths. I started this process and found a computational problem.

There are 7 letters in the bee. The product of a 7 letter set with 5 repeats results in 16,807 possible combinations. 6 repeats results in 117,649 combinations.

7 repeats is 823,543 combinations

8 repeats is 5,764,801 combinations

9 repeats is 40,353,607 combinations

I am not patient enough to create a list of all possible combinations for a 10 repeat list. It may be clear (it wasn’t for me) that the number of combinations for a letter set of length L with N repeats is equal to
$$ L^N $$. The longest word I could find in a quick search that wasn’t some ridiculous technical word and only uses 7 letters is ‘disinterestedness’ which is 17 letters long. To find words up to 17 letters using the itertools.product function, I would need to create a list of

$$ 7^{17} = 232,630,513,987,207 $$

possible combinations. Surprise! That list is unmanageable and would take forever to search through. There is another way to solve this problem though…

Solution 2

Instead of trying to create every possible combination of the Spelling Bee letters in the list from length 5 up to 17, it probably makes more sense to start with a list of actual words and work down from there.

Using the SCOWL (And Friends) website, I pulled down a dictionary of just over 100,000 words.

The first thing to do is get that dictionary into a list.

word_dict = [line.rstrip('\n') for line in open('dict60.txt')]

Then start bringing the list down to a size to look through for words that only contain letters in the Spelling Bee. The below code block removes any punctuation, then remove any words under the minimum length allowed by the Spelling Bee.

for word in word_dict:
    #strip punctuation
    word = word.translate(str.maketrans('', '', string.punctuation))
    #drop words with less than 5 letters
    if len(word) >= 5:
        sub_dictionary.append(word)

Now I’ve got sub_dictionary that contains words only 5 letters or longer. Next is the removal of words that do not contain the center letter.

center_letter = 'T'
for word in sub_dictionary:
    if center_letter.lower() in word:
        center_only.append(word)

Now I’ve got a list of words that are at least 5 letters long and contain the letter ‘T’. Next is the removal of words that are not a subset of all the letters in the Spelling Bee. First I instantiate a list final_words to hold the words in the center_only that are a subset of the Spelling Bee letter set.

#store all the letters in the bee
letters = 'VABEGL'
#set the minimum number allowed by the spelling bee
min_letters = 5
#create the spelling bee letterset
letterset = set(letters.lower() + center_letter.lower())
#instantiate the list to contain the final words
final_words = []
#loop through the words containing the center letter keeping only 
#that are a subset of the letters in the spelling bee 
for word in center_only:
    if set(word).issubset(letterset):
        final_words.append(word)

With that, the final_words list contains all remaining words. Lets print it given the Spelling Bee puzzle letters seen at the beginning of the article.

['abate',
 'ablate',
 'agate',
 'bagatelle',
 'ballet',
 'battle',
 'beatable',
 'beetle',
 'begat',
 'beget',
 'betel',
 'bleat',
 'eaglet',
 'eatable',
 'elate',
 'elevate',
 'latte',
 'legate',
 'legatee',
 'tabla',
 'table',
 'tablet',
 'tattle',
 'tattletale',
 'teabag',
 'telltale',
 'valet',
 'vegetable',
 'vegetate',
 'velvet']

Apparently a young eagle is actually called an eaglet, and begat is the past tense of beget. Fine.

Lets score the words. The NYT board of directors declare that words that use all 7 letters are worth 3 points, and all other words worth 1. The way to do this, I think, is to do the same subset check. Checking if the Spelling Bee letters are a subset of the letters in each word in the final_words list.

for word in final_words:
    if letterset.issubset(word):
        score = score + 3
    else:
        score = score + 1
print('score: ', score)

out: 32

So the python program came up with a word list worth 32 points. ‘Genius’. Stay tuned for the next post, where we’ll go over how to solve the Sunday crossword using Fortran and bend spoons with our minds.

The code in this program can be found in a jupyter notebook showing the combinatorics stuff and the final solution on the woolsocks github.

Solution 1

Solution 2

Leave a Reply Cancel reply