Solving Wordle...with science!

2022-03-16

Addicted to Wordle? Professor Nir Oren, Director of Research at the School of Natural and Computing Sciences, has a formula that provides the answer to the question we are all asking: ‘What is the best word to start with?’

 

Wordle has intrigued and frustrated millions around the world. Articles covering 'the best word to start with from an X point of view' abound (where X can be linguistics, mathematics, astrology, or some other 'expert' field). I thought I'd add to this list of articles, approaching the problem from an AI point of view, and explaining why the optimal word I suggest is indeed the best word possible.

If you are unfamiliar with it, Wordle is a word-guessing game. Each day, users get six attempts to guess a five-letter word. After every attempt, colours indicate which letters are correct and in the right place (green), correct but in the wrong place (yellow), or simply incorrect (grey). A good player will guess words in an order which eliminates many other possibilities, narrowing down candidate words as they go along.

Finding the best word

So what makes a good word to guess? Let's assume we have N candidate words, and no additional information. After we make a guess, we'll get some combination of five grey, green or yellow colours in a specific order as feedback. How many such feedback combinations are possible? Well, if we had a one-letter word, we could have 3 possible feedbacks (just the colours themselves); if we had two-letter words, 3x3 = 9 feedbacks are possible (as any combination of colours for each letter must be considered), and extending this to five letters, there are a total of 3x3x3x3x3 = 243 such feedbacks possible.
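The count above can be checked by brute force. This is a minimal sketch in Python; the one-character colour codes are shorthand chosen for the example, not anything defined by the game.

```python
from itertools import product

# One code per colour: grey, yellow, green (shorthand chosen for this sketch).
COLOURS = "-YG"

# Every ordered combination of five colours.
all_feedbacks = ["".join(p) for p in product(COLOURS, repeat=5)]

print(len(all_feedbacks))  # 243, i.e. 3 ** 5
```

Strictly, 243 is an upper bound: a few patterns, such as four greens plus one yellow, can never occur in a real game.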

Each one of these 243 feedbacks can be associated with a different set of candidate words from our original candidate wordlist. For example, let's say the word of the day is 'today', and we guessed 'ahead'. The feedback we'd receive is 'grey, grey, grey, green, yellow': the second 'a' is in the right place, the 'd' is in the word but in the wrong place, and the first 'a' is grey because 'today' contains only one 'a'. If our list of candidate words consisted of 'ahead, today, toady, green', then the only word consistent with this feedback is 'today' ('toady' would instead have produced 'yellow, grey, grey, grey, yellow'); all other words have been eliminated. With a longer list, several candidates would usually remain, and we would make our next guess from them, repeating until only one candidate word remains.
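The feedback rule and the elimination step can be sketched in a few lines of Python. The function names and the small wordlist are made up for illustration. The sketch assumes duplicate letters are handled the way the game does it: greens are assigned first, and a letter is marked yellow only as many times as it still occurs unmatched in the target.

```python
from collections import Counter

def feedback(guess: str, target: str) -> str:
    """Return a 5-character pattern: G = green, Y = yellow, - = grey."""
    result = ["-"] * len(guess)
    # Target letters not matched by a green are still available for yellows.
    remaining = Counter(t for g, t in zip(guess, target) if g != t)
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            result[i] = "G"
    for i, (g, t) in enumerate(zip(guess, target)):
        if g != t and remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1
    return "".join(result)

def consistent(candidates, guess, pattern):
    """Keep only the candidates that would have produced this pattern."""
    return [w for w in candidates if feedback(guess, w) == pattern]

# Illustrative wordlist, not the dictionary used in the article.
candidates = ["crane", "crate", "craze", "trace", "grape"]
pattern = feedback("crane", "crate")
print(pattern)
print(consistent(candidates, "crane", pattern))
```

Here the guess 'crane' against the target 'crate' leaves only the words that differ from 'crane' in the fourth letter alone.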

Now imagine a box containing all candidate words. We can draw 243 arrows leaving this box and leading to other boxes. Each arrow is labelled with one of the feedback combinations we could receive after playing the first word in the original box, and the box at the end of each arrow contains the candidate words that remain possible given that feedback. We then repeat the process until we have - at the very edge of our drawing - only boxes containing one word.

This drawing of what computer scientists call a 'tree' then represents one strategy for playing the game. What we need to do is identify what word to put first in every box (which will affect what words appear in boxes that follow it), to minimise the number of boxes between our first box and any box containing a single word. If this number is less than six, then we are guaranteed to be able to solve Wordle.

So now, we need to try to reduce the number of boxes. The best way to achieve this is to try to have an equal number of words in the boxes at each level. If you're struggling to see this, consider an extreme case. Let's say you have a box containing 10 words, and you pick a word which eliminates only one candidate. Then all but one of the boxes following will have no words, and the remaining box will have nine words. If you repeat this process, you'll see that you will need nine steps to get to the final box.

In contrast, if you started with (say) 59,049 words, and were able to get each box to have the same number of words, then at the next level each of the 243 boxes would have 243 words, and at the final level each of those boxes would lead to another 243 boxes with one word each; you would have managed to find the correct word using only two levels!
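The two cases can be put side by side as a rough calculation. With N candidates and a perfectly even split into 243 boxes at every level, the number of levels needed is about the base-243 logarithm of N, whereas a guess that removes only one candidate at a time needs N - 1 levels:

```latex
\[
\text{levels}_{\text{even}} = \lceil \log_{243} N \rceil,
\qquad
\log_{243} 59049 = \log_{243} 243^{2} = 2,
\qquad
\text{levels}_{\text{worst}} = N - 1 \quad (\text{e.g. } 10 - 1 = 9).
\]
```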

This suggests a strategy that we can use. For every candidate word, we compute how many candidate words go in each box below it, and pick the candidate word which distributes the words as evenly as possible across the next level. We can measure how even the distribution is by computing the word's 'entropy' using a simple formula. We then start by picking the word with the largest entropy. After receiving feedback from the game, we follow the arrow to the appropriate box, and pick the word from that box with the largest entropy, repeating this process until we get to the last word.
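The formula itself is not spelled out here. The sketch below assumes it is the standard Shannon entropy, H = -sum(p_i * log2(p_i)), where p_i is the fraction of candidate words that land in box i. The function names, the duplicate-letter handling in `feedback` and the tiny wordlist are all illustrative assumptions, not the code used to produce the results in this article.

```python
import math
from collections import Counter

def feedback(guess, target):
    """Pattern string for a guess: G = green, Y = yellow, - = grey."""
    result = ["-"] * len(guess)
    # Target letters not matched by a green are still available for yellows.
    remaining = Counter(t for g, t in zip(guess, target) if g != t)
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            result[i] = "G"
        elif remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1
    return "".join(result)

def entropy(guess, candidates):
    """Shannon entropy (in bits) of the boxes that `guess` splits the candidates into."""
    boxes = Counter(feedback(guess, w) for w in candidates)
    total = len(candidates)
    return -sum((n / total) * math.log2(n / total) for n in boxes.values())

def best_guess(candidates):
    """Candidate that spreads the candidates most evenly over the boxes."""
    return max(candidates, key=lambda g: entropy(g, candidates))

def solve(target, candidates, max_steps=20):
    """Play greedily and return the guesses made (target must be in candidates)."""
    guesses = []
    while candidates and len(guesses) < max_steps:
        guess = best_guess(candidates)
        guesses.append(guess)
        if guess == target:
            break
        # Follow the arrow: keep only words that give the same feedback.
        pattern = feedback(guess, target)
        candidates = [w for w in candidates if feedback(guess, w) == pattern]
    return guesses

# Illustrative wordlist only.
words = ["patch", "match", "hatch", "latch", "batch", "watch", "tares", "crane"]
print(best_guess(words))
print(solve("watch", words))
```

Like the strategy described above, this sketch only ever guesses words that are still candidates. A word that uniquely identifies its target has entropy log2(N) for N candidates, and a word that tells us nothing has entropy 0.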

I found an online list containing 5757 five letter words. Using the process described above, I was able to build a tree with six levels, allowing me to find any word in fewer than six guesses given the best starting word.

Enough already, what's the word?

Assuming Wordle uses the same dictionary, the best word to start with is 'tares'. This word contains many common letters in common positions, allowing one to quickly eliminate many alternatives. It's worth noting that there are many other words that are pretty much as good, including 'crane', 'rates' and 'teams'. If any of these words are used, and under perfect play, you'll find the target word in between three and four moves on average.

However, even if you start with a bad word such as 'fuzzy', you'll still be able to find the goal word in just over four moves on average (which isn't surprising given that you can then jump back to one of the good words afterwards). No matter which starting word you choose, there are some words that you'll be lucky to find. The longest sequence of guesses can include up to 11 steps; a goal word such as 'watch' could require guesses including 'patch', 'match', 'hatch', 'latch' and 'batch'. In fact, around 5% of the words in my dictionary required luck, or more than six steps, to find.

Conclusions

While Wordle is a fun distraction, solving the game actually highlights some very interesting science. The approach described above lies at the heart of the ID3 algorithm to build decision trees. Such decision trees are a form of Artificial Intelligence which has been embedded in many software systems.

Decision trees are used to help make choices ranging from whether to grant someone a mortgage, to what medicine a doctor should prescribe. Building a decision tree for these use-cases is done in precisely the same manner as we built our Wordle decision tree, except that rather than using a dictionary and game feedback to drive the construction process, credit or health records and outcomes are used as input data.

The moral of the story is that any decent word containing common letters (ideally in common positions) is just about as good as a perfect starting word. No matter what you choose, there will be some days that you'll need luck to succeed.

Published by News, University of Aberdeen
