{"id":920,"date":"2023-11-05T00:15:47","date_gmt":"2023-11-05T00:15:47","guid":{"rendered":"http:\/\/kb.shoelace.biz\/?p=920"},"modified":"2025-11-08T15:37:11","modified_gmt":"2025-11-08T15:37:11","slug":"dont-teach-me-sudoku","status":"publish","type":"post","link":"https:\/\/kb.shoelace.biz\/?p=920","title":{"rendered":"Don&#8217;t Teach Me Sudoku"},"content":{"rendered":"\n<p>I&#8217;ve been playing the New York Times Sudoku on occasion, and it takes me almost an hour to solve the medium puzzle.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/kb.shoelace.biz\/wp-content\/uploads\/2023\/10\/image.png\" alt=\"\" class=\"wp-image-921\" style=\"width:840px;height:713px\" width=\"840\" height=\"713\"\/><figcaption class=\"wp-element-caption\">A New York Times Sudoku on Medium Difficulty<\/figcaption><\/figure>\n<\/div>\n\n\n<p>Fun isn&#8217;t fun until its ruined, so as is de rigueur, I started kicking around the idea of getting a computer to solve sudoku puzzles. Of course sudoku-solvers already exist on the internet, so I won&#8217;t be treading any new ground for humanity &#8211; but I&#8217;m interested in a subset of those solvers. Specifically, ones you don&#8217;t have to &#8220;teach&#8221; the rules to:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" src=\"http:\/\/kb.shoelace.biz\/wp-content\/uploads\/2023\/10\/image-1.png\" alt=\"\" class=\"wp-image-922\"\/><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\">Baby Sudoku For Baby Minds<\/h2>\n\n\n\n<p>My first concern was whether a simple feed-forward neural network could even sort out the easiest problem to solve in sudoku &#8211; which is when you have numbers in a row or column and only 1 is missing:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" src=\"http:\/\/kb.shoelace.biz\/wp-content\/uploads\/2023\/10\/image-2.png\" alt=\"\" class=\"wp-image-923\"\/><\/figure>\n<\/div>\n\n\n<p>Even if you&#8217;ve never played sudoku you can probably deduce what number goes in the box. I was not convinced this was particularly easy for a simple neural network to figure out. I mean, how would you set up a problem like this for any ML algorithm? I decided to do it like this:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Treat the numbers as categories. The numbers could be anything &#8211; and their actual relationships (2 being 1+1, 9 being 3 x 3) are irrelevant to the puzzle.<\/li>\n\n\n\n<li>Make the position of the numbers irrelevant. You can imagine that you don&#8217;t need the missing number to be in any particular box to solve the above puzzle.<\/li>\n\n\n\n<li>One-hot encode the categories. If you don&#8217;t what that means &#8211; you can read <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.preprocessing.OneHotEncoder.html\">here<\/a> (or ignore this point).<\/li>\n<\/ol>\n\n\n\n<p>I wrote a small block of code to generate the above problem 20,000 times (with a random number from 1 to 9 missing each time).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import pandas as pd\nimport numpy as np\n# %%\ncategories = &#091;'1','2','3','4','5','6','7','8','9']\npredictors = categories&#091;:-1]\ndef generateSingleLineSudoku(samples):\n\n    df_list = &#091;]\n    for i in np.arange(0,samples):\n        df_list.append(\n            np.random.choice(\n                np.arange(1,10,1),size=9,replace=False\n            )\n        )\n\n    return pd.DataFrame(\n        df_list, \n        columns=predictors+&#091;'target']\n    )<\/code><\/pre>\n\n\n\n<p>Then I one hot encoded the predictors (the 8 numbers shown) and the target (the 9th number &#8211; aka the thing we&#8217;re trying to predict), and trained a single-hidden-layer neural network on 10,000 of the predictors and target:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" src=\"http:\/\/kb.shoelace.biz\/wp-content\/uploads\/2023\/10\/image-3.png\" alt=\"\" class=\"wp-image-926\"\/><\/figure>\n<\/div>\n\n\n<p>All the &#8220;Nones&#8221; refer to batch sizes, which aren&#8217;t important here. The critical thing to see is that the model takes 72 values and outputs 9, Which sounds like 9 times too many given that I claimed it&#8217;s supposed to only predict one missing number given 8 other numbers. The devil-in-the-details is the one-hot encoding &#8211; so each of the 9 outputs from the model represents a probability that particular number is  missing.<\/p>\n\n\n\n<p>Here are 4 different predictions (rows) with columns 1 through 9 where each cell represents the probability that the missing number is the column number.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" src=\"http:\/\/kb.shoelace.biz\/wp-content\/uploads\/2023\/10\/image-4.png\" alt=\"\" class=\"wp-image-927\"\/><\/figure>\n<\/div>\n\n\n<p>These match the actual missing numbers for these rows. Suffice to say this model is very good at predicting the missing number. Upon further reflection this isn&#8217;t too surprising. The model only has to &#8220;see&#8221; 9 unique examples in its training data to have seen the entire universe of possible examples &#8211; and I trained it on 10,000 examples.<\/p>\n\n\n\n<p>Interestingly, if I remove a second number and ask the neural net to predict what is missing from this puzzle:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" src=\"http:\/\/kb.shoelace.biz\/wp-content\/uploads\/2023\/10\/image-5.png\" alt=\"\" class=\"wp-image-931\"\/><\/figure>\n<\/div>\n\n\n<p>It correctly predicts that both 7 AND 8 are missing! So it is has appropriately generalized to a slightly more complex problem.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Toddler Sudoku for Toddler Minds<\/h2>\n\n\n\n<p>The next step is to train a model that can solve a sudoku problem with more constraints. A small sudoku board is suitable. I figured I would use the number of possible solutions as a guide to decide how large of a board to use but it turns out the math behind computing the total number of possible solutions is complex. A 9&#215;9 sudoku board has \\(6.671&#215;10^{21}\\) <a href=\"https:\/\/people.math.sc.edu\/girardi\/sudoku\/enumerate.pdf\" data-type=\"link\" data-id=\"https:\/\/people.math.sc.edu\/girardi\/sudoku\/enumerate.pdf\">solutions<\/a>. I wasn&#8217;t able to find a quick reference for a table or function that says the total number of solutions given the size of the board. I know a 1&#215;1 board has 1 possible solution, a 2&#215;2 board has only 2 possible solutions, a 3&#215;3 board has 6. This is growing like factorial &#8211; but a 9&#215;9 board has far more than 9! solutions. I&#8217;ll guess that a 4&#215;4 board is large enough to create some interesting unseen problems for my neural net.<\/p>\n\n\n\n<p>The biggest change from the previous problem is that in the new problem &#8211; position matters &#8211; so the network will have to learn to do more than just pick the missing numbers. The difficulty of a sudoku puzzle can also vary based on how filled out it is at the start. In general, the more filled out, the easier the puzzle. I generated a dataset of increasingly difficult puzzles and trained a new neural network to try and solve 4&#215;4 sudokus.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" src=\"http:\/\/kb.shoelace.biz\/wp-content\/uploads\/2023\/11\/Screenshot-2023-11-04-120441.png\" alt=\"\" class=\"wp-image-936\"\/><\/figure>\n<\/div>\n\n\n<p>After 2 minutes of training &#8211; the network could solve puzzles with 60% of the cells missing with 94% accuracy. Though my accuracy measure does not account for the problem of sudoku&#8217;s with multiple possible solutions. Here is a situation in which the network found a valid solution that differs from the prescribed solution:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" src=\"http:\/\/kb.shoelace.biz\/wp-content\/uploads\/2023\/11\/Screenshot-2023-11-04-121743.png\" alt=\"\" class=\"wp-image-937\"\/><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\">Original Sudoku For Spoon Bending Minds<\/h2>\n\n\n\n<p>With proof that the neural network could learn the necessary information to solve small sudoku puzzles, I scaled the network up to see how it would perform on bigger puzzles, and unsurprisingly it performs worse. Some obvious improvements could be found using more data and a bigger network. I found some issues with the python package I was using to generate new puzzles and solutions (it was generating the same solutions many times over). However &#8211; simple 9&#215;9 puzzles were solvable for the trained network:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" src=\"http:\/\/kb.shoelace.biz\/wp-content\/uploads\/2023\/11\/Screenshot-2023-11-04-160113.png\" alt=\"\" class=\"wp-image-940\"\/><\/figure>\n<\/div>\n\n\n<p>I highlighted the missing content in yellow, and the correct predictions made by the model in green. <\/p>\n\n\n\n<p>The next step is to generate more unique puzzles and possibly increase the complexity of the network so that I may train the network on more difficult problems &#8211; eventually leading to its ability to solve New York Time&#8217;s difficult puzzles.<\/p>\n\n\n\n<p>Until next time.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve been playing the New York Times Sudoku on occasion, and it takes me almost an hour to solve the medium puzzle. Fun isn&#8217;t fun until its ruined, so as is de rigueur, I started kicking around the idea of getting a computer to solve sudoku puzzles. Of course sudoku-solvers already exist on the internet,&#8230;<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7,11],"tags":[],"class_list":["post-920","post","type-post","status-publish","format-standard","hentry","category-machine-learning","category-python"],"_links":{"self":[{"href":"https:\/\/kb.shoelace.biz\/index.php?rest_route=\/wp\/v2\/posts\/920","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kb.shoelace.biz\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kb.shoelace.biz\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kb.shoelace.biz\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/kb.shoelace.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=920"}],"version-history":[{"count":1,"href":"https:\/\/kb.shoelace.biz\/index.php?rest_route=\/wp\/v2\/posts\/920\/revisions"}],"predecessor-version":[{"id":980,"href":"https:\/\/kb.shoelace.biz\/index.php?rest_route=\/wp\/v2\/posts\/920\/revisions\/980"}],"wp:attachment":[{"href":"https:\/\/kb.shoelace.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=920"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kb.shoelace.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=920"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kb.shoelace.biz\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=920"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}