{"id":6088,"date":"2023-04-11T10:00:00","date_gmt":"2023-04-11T10:00:00","guid":{"rendered":"https:\/\/modernsciences.org\/staging\/4414\/?p=6088"},"modified":"2023-03-31T04:27:43","modified_gmt":"2023-03-31T04:27:43","slug":"chatgpt-struggles-with-wordle-puzzles-which-says-a-lot-about-how-it-works","status":"publish","type":"post","link":"https:\/\/modernsciences.org\/staging\/4414\/chatgpt-struggles-with-wordle-puzzles-which-says-a-lot-about-how-it-works\/","title":{"rendered":"ChatGPT struggles with Wordle puzzles, which says a lot about how it works"},"content":{"rendered":"\n  <figure>\n    <img  decoding=\"async\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABAQMAAAAl21bKAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAApJREFUCNdjYAAAAAIAAeIhvDMAAAAASUVORK5CYII=\"  class=\" pk-lazyload\"  data-pk-sizes=\"auto\"  data-pk-src=\"https:\/\/images.theconversation.com\/files\/517926\/original\/file-20230328-3015-qfoq0k.jpg?ixlib=rb-1.1.0&#038;rect=287%2C8%2C5371%2C3637&#038;q=45&#038;auto=format&#038;w=754&#038;fit=clip\" >\n      <figcaption>\n        shutterstock.\n        <span class=\"attribution\"><a class=\"source\" href=\"https:\/\/www.shutterstock.com\/image-photo\/portland-usa-feb-12-2022-child-2126149826\" target=\"_blank\" rel=\"noopener\">Shutterstock<\/a><\/span>\n      <\/figcaption>\n  <\/figure>\n\n<span><a href=\"https:\/\/theconversation.com\/profiles\/michael-g-madden-1422365\" target=\"_blank\" rel=\"noopener\">Michael G. Madden<\/a>, <em><a href=\"https:\/\/theconversation.com\/institutions\/university-of-galway-2699\" target=\"_blank\" rel=\"noopener\">University of Galway<\/a><\/em><\/span>\n\n<p>The AI chatbot known as ChatGPT, developed by the company OpenAI, has caught the public\u2019s attention and imagination. 
Some applications of the technology <a href=\"https:\/\/arxiv.org\/abs\/2302.13817\" target=\"_blank\" rel=\"noopener\">are truly impressive<\/a>, such as its ability to <a href=\"https:\/\/www.forbes.com\/sites\/bernardmarr\/2023\/03\/01\/the-best-examples-of-what-you-can-do-with-chatgpt\/\" target=\"_blank\" rel=\"noopener\">summarise complex topics<\/a> or to <a href=\"https:\/\/www.theatlantic.com\/technology\/archive\/2022\/12\/openai-chatgpt-chatbot-messages\/672411\/\" target=\"_blank\" rel=\"noopener\">engage in long conversations<\/a>. <\/p>\n\n<p>It\u2019s no surprise that other AI companies <a href=\"https:\/\/www.theverge.com\/2022\/11\/2\/23434360\/google-1000-languages-initiative-ai-llm-research-project\" target=\"_blank\" rel=\"noopener\">have been rushing<\/a> to release their own <a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model\" target=\"_blank\" rel=\"noopener\">large language models (LLMs)<\/a> \u2013 the name for the technology underlying chatbots like ChatGPT. Some of these LLMs will be incorporated into other products, such as search engines.<\/p>\n\n<p>With its impressive capabilities in mind, I decided to test the chatbot on <a href=\"https:\/\/www.nytimes.com\/games\/wordle\/index.html\" target=\"_blank\" rel=\"noopener\">Wordle<\/a> \u2013 the word game from the New York Times \u2013 which I have been playing for some time. Players have six goes at guessing a five-letter word. On each guess, the game indicates which letters are in the correct positions, and which appear in the word but in the wrong positions. <\/p>\n\n<p>Using the latest generation, <a href=\"https:\/\/openai.com\/product\/gpt-4\" target=\"_blank\" rel=\"noopener\">called ChatGPT-4<\/a>, I discovered that its performance on these puzzles was surprisingly poor. You might expect word games to be a piece of cake for GPT-4. LLMs are \u201ctrained\u201d on text, meaning they are exposed to information so that they can improve at what they do. 
ChatGPT-4 was trained on about 500 billion words: all of Wikipedia, all public-domain books, huge volumes of scientific articles, and text from many websites.<\/p>\n\n<p>AI chatbots could play a major role in our lives. Understanding why ChatGPT-4 struggles with Wordle provides insights into how LLMs represent and work with words \u2013 along with the limitations this brings. <\/p>\n\n<p>First, I tested ChatGPT-4 on a Wordle puzzle where I knew the correct locations of two letters in a word. The pattern was \u201c#E#L#\u201d, where \u201c#\u201d represented the unknown letters. The answer was the word \u201cmealy\u201d. <\/p>\n\n<p>Five out of ChatGPT-4\u2019s six responses failed to match the pattern. The responses were: \u201cberyl\u201d, \u201cferal\u201d, \u201cheral\u201d, \u201cmerle\u201d, \u201crevel\u201d and \u201cpearl\u201d.<\/p>\n\n<p>With other combinations, the chatbot sometimes found valid solutions. But, overall, it was very hit and miss. In the case of a word fitting the pattern \u201c##OS#\u201d, it found five correct options. 
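Checking whether a guess fits one of these patterns is purely mechanical. As an illustrative sketch (not anything ChatGPT-4 produced), a short Python function can validate candidates against a pattern such as "#E#L#", where "#" marks an unknown letter:

```python
def matches_pattern(word: str, pattern: str) -> bool:
    """True if `word` fits a Wordle-style pattern, where '#'
    marks an unknown letter and any other character must sit
    at exactly that position."""
    if len(word) != len(pattern):
        return False
    return all(p == "#" or w.upper() == p.upper()
               for w, p in zip(word, pattern))

# The six responses from the "#E#L#" test, plus the answer:
for w in ["beryl", "feral", "heral", "merle", "revel", "pearl", "mealy"]:
    print(w, matches_pattern(w, "#E#L#"))
# Only "merle" and the answer "mealy" fit the pattern.
```

Run against the six responses, only "merle" passes, consistent with the five failures noted above.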
But when the pattern was \u201c#R#F#\u201d, it proposed two words without the letter F, and a word \u2013 \u201cTraff\u201d \u2013 that isn\u2019t in dictionaries.<\/p>\n\n<figure class=\"align-center \">\n            <img  decoding=\"async\"  alt=\"Representation of GPT-4\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABAQMAAAAl21bKAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAApJREFUCNdjYAAAAAIAAeIhvDMAAAAASUVORK5CYII=\"  class=\" pk-lazyload\"  data-pk-sizes=\"auto\"  data-ls-sizes=\"(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px\"  data-pk-src=\"https:\/\/images.theconversation.com\/files\/516961\/original\/file-20230322-1452-x215lm.jpg?ixlib=rb-1.1.0&amp;q=45&amp;auto=format&amp;w=754&amp;fit=clip\"  data-pk-srcset=\"https:\/\/images.theconversation.com\/files\/516961\/original\/file-20230322-1452-x215lm.jpg?ixlib=rb-1.1.0&amp;q=45&amp;auto=format&amp;w=600&amp;h=400&amp;fit=crop&amp;dpr=1 600w, https:\/\/images.theconversation.com\/files\/516961\/original\/file-20230322-1452-x215lm.jpg?ixlib=rb-1.1.0&amp;q=30&amp;auto=format&amp;w=600&amp;h=400&amp;fit=crop&amp;dpr=2 1200w, https:\/\/images.theconversation.com\/files\/516961\/original\/file-20230322-1452-x215lm.jpg?ixlib=rb-1.1.0&amp;q=15&amp;auto=format&amp;w=600&amp;h=400&amp;fit=crop&amp;dpr=3 1800w, https:\/\/images.theconversation.com\/files\/516961\/original\/file-20230322-1452-x215lm.jpg?ixlib=rb-1.1.0&amp;q=45&amp;auto=format&amp;w=754&amp;h=503&amp;fit=crop&amp;dpr=1 754w, https:\/\/images.theconversation.com\/files\/516961\/original\/file-20230322-1452-x215lm.jpg?ixlib=rb-1.1.0&amp;q=30&amp;auto=format&amp;w=754&amp;h=503&amp;fit=crop&amp;dpr=2 1508w, https:\/\/images.theconversation.com\/files\/516961\/original\/file-20230322-1452-x215lm.jpg?ixlib=rb-1.1.0&amp;q=15&amp;auto=format&amp;w=754&amp;h=503&amp;fit=crop&amp;dpr=3 2262w\" >\n            <figcaption>\n              <span class=\"caption\">California-based company 
OpenAI recently released its latest chatbot, known as GPT-4.<\/span>\n              <span class=\"attribution\"><a class=\"source\" href=\"https:\/\/www.shutterstock.com\/image-photo\/portland-usa-mar-15-2023-webpage-2275173419\" target=\"_blank\" rel=\"noopener\">Shutterstock \/ Tada Images<\/a><\/span>\n            <\/figcaption>\n          <\/figure>\n\n<h2 id=\"under-the-bonnet\">Under the bonnet<\/h2>\n\n<p>At the core of ChatGPT is a <a href=\"https:\/\/www.ibm.com\/cloud\/blog\/ai-vs-machine-learning-vs-deep-learning-vs-neural-networks\" target=\"_blank\" rel=\"noopener\">deep neural network<\/a>: a complex mathematical function \u2013 or rule \u2013 that maps inputs to outputs. The inputs and outputs must be numbers. Since ChatGPT-4 works with words, these must be \u201ctranslated\u201d to numbers for the neural network to work with them.<\/p>\n\n<p>The translation is performed by a computer program called a <a href=\"https:\/\/platform.openai.com\/tokenizer\" target=\"_blank\" rel=\"noopener\">tokenizer<\/a>, which maintains a huge list of words and letter sequences, called \u201ctokens\u201d. These tokens are identified by numbers. A word such as \u201cfriend\u201d has a token ID of 6756, so a word such as \u201cfriendship\u201d is broken down into the tokens \u201cfriend\u201d and \u201cship\u201d. These are represented as the identifiers 6756 and 6729.<\/p>\n\n<p>When the user enters a question, the words are translated into numbers before ChatGPT-4 even starts processing the request. The deep neural network does not have access to the words as text, so it cannot really reason about the letters.<\/p>\n\n<h2 id=\"poem-task\">Poem task<\/h2>\n\n<p>ChatGPT-4 is good at working with the first letters of words. I asked it to write a poem where the opening letter of each line spelled out \u201cI love robots\u201d. Its response was surprisingly good. 
Here are the first four lines:<\/p>\n\n<blockquote>\n<p>I am a fan of gears and steel<\/p>\n\n<p>Loving their movements, so surreal,<\/p>\n\n<p>Over circuits, they swiftly rule<\/p>\n\n<p>Vying for knowledge, they\u2019re no fool,<\/p>\n<\/blockquote>\n\n<p><a href=\"https:\/\/www.sciencefocus.com\/future-technology\/gpt-3\/\" target=\"_blank\" rel=\"noopener\">The training data for ChatGPT-4<\/a> includes huge numbers of textbooks, which often include alphabetical indices. This could have been enough for GPT-4 to have learned associations between words and their first letters. <\/p>\n\n<p>The tokenizer also appears to have been modified to recognise requests like this, and seems to split a phrase such as \u201cI Love Robots\u201d into individual tokens when users enter their request. However, ChatGPT-4 was not able to handle requests to work with the last letters of words.<\/p>\n\n<p>ChatGPT-4 is also bad at palindromes. Asked to produce a palindrome phrase about a robot, it proposed \u201ca robot\u2019s sot, orba\u201d, which does not fit the definition of a palindrome and relies on obscure words.<\/p>\n\n<p>However, LLMs are relatively good at generating other computer programs. This is because their training data includes many websites devoted to programming. I asked ChatGPT-4 to write a program for working out the identities of missing letters in Wordle.<\/p>\n\n<p>The initial program that ChatGPT-4 produced had a bug in it. It corrected this when I pointed it out. When I ran the program, it found 48 valid words matching the pattern \u201c#E#L#\u201d, including \u201ctells\u201d, \u201ccells\u201d and \u201chello\u201d. 
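A search program of this kind is only a few lines long. The sketch below is not the program ChatGPT-4 produced, and its word list is a small illustrative sample rather than a full dictionary, but it shows the approach: turn the pattern into a regular expression and filter a word list with it.

```python
import re

def find_matches(pattern: str, words):
    """Filter a word list with a Wordle-style pattern by turning
    each '#' wildcard into a regular-expression '.'."""
    rx = re.compile("^" + pattern.replace("#", ".") + "$", re.IGNORECASE)
    return [w for w in words if rx.match(w)]

# Tiny sample list; a real search would load a dictionary file.
sample = ["tells", "cells", "hello", "mealy", "pearl", "baker", "jelly"]
print(find_matches("#E#L#", sample))
# → ['tells', 'cells', 'hello', 'mealy', 'jelly']
```
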
When I had previously asked GPT-4 directly to propose matches for this pattern, it had only found one.<\/p>\n\n<h2 id=\"future-fixes\">Future fixes<\/h2>\n\n<p>It might seem surprising that a large language model like ChatGPT-4 would struggle to solve simple word puzzles or formulate palindromes, since the training data includes almost every word available to it.<\/p>\n\n<p>However, this is because all text inputs must be encoded as numbers and the process that does this doesn\u2019t capture the structure of letters within words. Because neural networks operate purely with numbers, the requirement to encode words as numbers will not change.<\/p>\n\n<p>There are two ways that future LLMs can overcome this. First, ChatGPT-4 knows the first letter of every word, so its training data could be augmented to include mappings of every letter position within every word in its dictionary.<\/p>\n\n<p>The second is a more exciting and general solution. Future LLMs could generate code to solve problems like this, as I have shown. A recent paper demonstrated <a href=\"https:\/\/arxiv.org\/abs\/2302.04761\" target=\"_blank\" rel=\"noopener\">an idea called Toolformer<\/a>, where an LLM uses external tools to carry out tasks it would normally struggle with, such as arithmetic calculations.<\/p>\n\n<p>We are in the early days of these technologies, and insights like this into current limitations can lead to even more impressive AI technologies.
<\/p>\n\n<p><span><a href=\"https:\/\/theconversation.com\/profiles\/michael-g-madden-1422365\" target=\"_blank\" rel=\"noopener\">Michael G. Madden<\/a>, Established Professor of Computer Science, <em><a href=\"https:\/\/theconversation.com\/institutions\/university-of-galway-2699\" target=\"_blank\" rel=\"noopener\">University of Galway<\/a><\/em><\/span><\/p>\n\n<p>This article is republished from <a href=\"https:\/\/theconversation.com\" target=\"_blank\" rel=\"noopener\">The Conversation<\/a> under a Creative Commons license. Read the <a href=\"https:\/\/theconversation.com\/chatgpt-struggles-with-wordle-puzzles-which-says-a-lot-about-how-it-works-201906\" target=\"_blank\" rel=\"noopener\">original article<\/a>.<\/p>\n\n","protected":false},"excerpt":{"rendered":"shutterstock. Shutterstock Michael G. 
Madden, University of Galway The AI chatbot known as ChatGPT, developed by the company&hellip;\n","protected":false},"author":428,"featured_media":6075,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","fifu_image_url":"","fifu_image_alt":"","footnotes":""},"categories":[16],"tags":[334,693,333,497,474],"class_list":{"0":"post-6088","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-tech","8":"tag-artificial-intelligence","9":"tag-chatgpt","10":"tag-machine-learning","11":"tag-neural-network","12":"tag-the-conversation","13":"cs-entry","14":"cs-video-wrap"},"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/modernsciences.org\/staging\/4414\/wp-json\/wp\/v2\/posts\/6088","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/modernsciences.org\/staging\/4414\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/modernsciences.org\/staging\/4414\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/modernsciences.org\/staging\/4414\/wp-json\/wp\/v2\/users\/428"}],"replies":[{"embeddable":true,"href":"https:\/\/modernsciences.org\/staging\/4414\/wp-json\/wp\/v2\/comments?post=6088"}],"version-history":[{"count":1,"href":"https:\/\/modernsciences.org\/staging\/4414\/wp-json\/wp\/v2\/posts\/6088\/revisions"}],"predecessor-version":[{"id":6089,"href":"https:\/\/modernsciences.org\/staging\/4414\/wp-json\/wp\/v2\/posts\/6088\/revisions\/6089"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/modernsciences.org\/staging\/4414\/wp-json\/wp\/v2\/media\/6075"}],"wp:attachment":[{"href":"https:\/\/modernsciences.org\/staging\/4414\/wp-json\/wp\/v2\/media?parent=6088"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/modernsciences.org\/staging\/4414\/wp-json\/wp\/v2\/categories?post=6088"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/modernsciences.org\/staging\/44
14\/wp-json\/wp\/v2\/tags?post=6088"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}