Addicted to Wordle? Professor Nir Oren, Director of Research at the School of Natural and Computing Sciences, has a formula that provides the answer to the question we are all asking: ‘What is the best word to start with?’
Wordle has intrigued and frustrated millions around the world. Articles covering 'the best word to start with from an X point of view' abound (where X can include linguistics, mathematics, astrology, or some other 'expert'). I thought I'd add to this list of articles, approaching the problem from an AI point of view, and explaining why the optimal word I suggest is indeed the best word possible.
If you are unfamiliar with it, Wordle is a word guessing game. Each day, users get six attempts to guess a five letter word. After every attempt, colours indicate which letters are correct and in the right place (green), correct but in the wrong place (yellow), or simply incorrect (grey). A good player will guess words in an order which eliminates many other possibilities, narrowing down candidate words as they go along.
Finding the best word
So what makes a good word to guess? Let's assume we have N candidate words, and no additional information. After we make a guess, we'll get some combination of five grey, green or yellow colours in a specific order as feedback. How many such feedback combinations are possible? Well, if we had a one letter word, we could have 3 possible feedbacks (just the colours themselves); if we had two letter words, 3x3 feedbacks are possible (as any combination of colours for each letter must be considered), and extending this to five letters, there are a total of 3x3x3x3x3 = 243 such feedbacks possible.
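The counting argument above can be checked by listing every colour sequence directly. This is a minimal sketch; the names COLOURS and all_feedbacks are made up for illustration.

```python
from itertools import product

# The three colours a single letter can receive.
COLOURS = ("grey", "yellow", "green")

def all_feedbacks(word_length):
    """List every possible sequence of colours for a word of the given length."""
    return list(product(COLOURS, repeat=word_length))

print(len(all_feedbacks(1)))  # 3
print(len(all_feedbacks(2)))  # 9
print(len(all_feedbacks(5)))  # 243
```

Strictly speaking 243 is an upper bound: a few sequences, such as four greens plus one yellow, can never occur in a real game.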
Each one of these 243 feedbacks can be associated with a different set of candidate words from our original candidate wordlist. For example, let's say the word of the day is 'today', and we guessed 'ahead'. The feedback we'd receive is 'grey, grey, grey, green, yellow': the second 'a' is in the right place, and because 'today' contains only one 'a', the first 'a' is marked grey. If our list of candidate words consisted of 'ahead, today, toady, green', then the only word consistent with this feedback is 'today'; all other words have been eliminated, including 'toady', which would have produced 'yellow, grey, grey, grey, yellow' instead. In general several candidates survive a guess, and we make our next guess from that reduced list, until only one candidate word remains.
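To make the idea of sorting candidates by feedback concrete, here is a small sketch in Python. The function names feedback and partition are my own, and the handling of repeated letters (greens are awarded first, and a letter earns a yellow only while unmatched copies of it remain in the target) is an assumption about how the game scores a guess. The guess 'tares' is used purely as an illustration.

```python
from collections import Counter, defaultdict

def feedback(guess, target):
    """Return feedback as a string: G = green, Y = yellow, - = grey."""
    marks = ["-"] * len(guess)
    # Target letters not matched by a green; only these can earn a yellow.
    unmatched = Counter(t for g, t in zip(guess, target) if g != t)
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            marks[i] = "G"
    for i, g in enumerate(guess):
        if marks[i] != "G" and unmatched[g] > 0:
            marks[i] = "Y"
            unmatched[g] -= 1
    return "".join(marks)

def partition(guess, candidates):
    """Group candidate words by the feedback the guess would receive for each."""
    boxes = defaultdict(list)
    for word in candidates:
        boxes[feedback(guess, word)].append(word)
    return dict(boxes)

candidates = ["ahead", "today", "toady", "green"]
for pattern, words in partition("tares", candidates).items():
    print(pattern, words)
# -Y-Y- ['ahead']
# GY--- ['today', 'toady']
# --YG- ['green']
```

Each printed line is one non-empty box: after guessing 'tares', a green 't' and a yellow 'a' would leave both 'today' and 'toady' in play, while either of the other two patterns would pin the answer down immediately.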
Now imagine a box containing all candidate words. We can draw 243 arrows leaving this box and leading to other boxes. We label each of the 243 arrows with one of the feedback combinations we could receive after playing our chosen word from the original box, and the box at the end of each arrow contains the candidate words that remain possible after that feedback. We then repeat the process until we have - at the very edge of our drawing - only boxes containing one word.
This drawing of what computer scientists call a 'tree' then represents one strategy for playing the game. What we need to do is identify what word to put first in every box (which will affect what words appear in boxes that follow it), to minimise the number of boxes between our first box and any box containing a single word. If this number is less than six, then we are guaranteed to be able to solve Wordle.
So now, we need to try to reduce the number of boxes. The best way to achieve this is to try to have an equal number of words in the boxes at each level. If you're struggling to see this, consider an extreme case. Let's say you have a box containing 10 words, and you pick a word which eliminates only one candidate. Then all but one of the boxes following will have no words, and the remaining box will have nine words. If you repeat this process, you'll see that you will need nine steps to get to the final box.
In contrast, if you started with (say) 59,049 words, and were able to get each box to have the same number of words, then at the next level each of the 243 boxes would have 243 words, and at the final level each of those would lead to another 243 boxes with one word each; you would have managed to find the correct word using only two levels!
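The two extremes can be compared with a little arithmetic. The sketch below uses hypothetical helper names and assumes the idealised case in which every guess either splits the words perfectly evenly across all 243 boxes, or eliminates just one word.

```python
def levels_balanced(n_words, branches=243):
    """Levels needed if every guess splits the words evenly across all boxes."""
    levels = 0
    capacity = 1  # how many words can be told apart with this many levels
    while capacity < n_words:
        capacity *= branches
        levels += 1
    return levels

def levels_one_at_a_time(n_words):
    """Levels needed if every guess eliminates only a single candidate."""
    return n_words - 1

print(levels_one_at_a_time(10))  # 9
print(levels_balanced(10))       # 1
print(levels_balanced(59049))    # 2
```

Real word lists never split this evenly, so the balanced figure is best read as a lower bound on what any strategy could achieve.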
This suggests a strategy that we can use. For every candidate word, we compute how many candidate words go in each box below it, and pick the candidate word which distributes the words as evenly as possible across the next level. We can measure how even this distribution is by computing the word's 'entropy' using a simple formula: the more evenly the words are spread, the larger the entropy. We then start by picking the word with the largest entropy. After receiving feedback from the game, we follow the arrow to the appropriate box, and pick the word from that box with the largest entropy, repeating this process until we get to the last word.
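As a rough illustration of the entropy measure, the sketch below assumes the formula is the standard Shannon entropy of the box sizes, measured in bits. The function names and the three example guesses are hypothetical.

```python
from math import log2

def entropy(box_sizes):
    """Shannon entropy (in bits) of how the words are spread across boxes."""
    total = sum(box_sizes)
    # Empty boxes contribute nothing, so they are skipped.
    return sum((n / total) * log2(total / n) for n in box_sizes if n > 0)

def best_guess(box_sizes_by_guess):
    """Pick the guess whose boxes have the largest entropy."""
    return max(box_sizes_by_guess, key=lambda g: entropy(box_sizes_by_guess[g]))

# Hypothetical ways three different guesses might split four candidate words.
options = {
    "even_split": [1, 1, 1, 1],  # every word lands in its own box
    "partial_split": [2, 1, 1],  # two words still share a box
    "no_split": [4],             # the guess tells us nothing
}
print(entropy(options["even_split"]))     # 2.0
print(entropy(options["partial_split"]))  # 1.5
print(entropy(options["no_split"]))       # 0.0
print(best_guess(options))                # even_split
```

The perfectly even split scores highest, which is why choosing the largest entropy favours guesses that spread the remaining words as evenly as possible.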
I found an online list containing 5757 five letter words. Using the process described above, I was able to build a tree with six levels, allowing me to find any word in fewer than six guesses given the best starting word.
Enough already, what's the word?
Assuming Wordle uses the same dictionary, the best word to start with is 'tares'. This word contains many common letters in common positions, allowing one to quickly eliminate many alternatives. It's worth noting that there are many other words that are pretty much as good, including 'crane', 'rates' and 'teams'. If any of these words are used, and under perfect play, you'll find the target word in between three and four moves on average.
However, even if you start with a bad word such as 'fuzzy', you'll still be able to find the goal word in just over four moves on average (which isn't surprising given that you can then jump back to one of the good words afterwards). No matter which starting word you choose, there are some words that you'll be lucky to find. The longest sequence of guesses can include up to 11 steps; a goal word such as 'watch' could require guesses including 'patch, match, hatch, latch' and 'batch'. In fact, around 5% of the words in my dictionary required luck, or more than six steps to find.
Conclusions
While Wordle is a fun distraction, solving the game actually highlights some very interesting science. The approach described above lies at the heart of the ID3 algorithm for building decision trees. Such decision trees are a form of Artificial Intelligence which has been embedded in many software systems.
Decision trees are used to help make choices ranging from whether to grant someone a mortgage, to what medicine a doctor should prescribe. Building a decision tree for these use-cases is done in precisely the same manner as we built our Wordle decision tree, except that rather than using a dictionary and game feedback to drive the construction process, credit or health records and outcomes are used as input data.
The moral of the story is that any decent word containing common letters (ideally in common positions) is just about as good as a perfect starting word. No matter what you choose, there will be some days that you'll need luck to succeed.