Sudoku @ paulspages.co.uk - how the solver works

How the solver works

Now on version 2, 3, 4 with improved performance and extra logic!
(See bottom of page for details).

The solver routine in my sudoku page will solve any puzzle - even one with no starting numbers! (Press 'Clear puzzle' then 'Solve puzzle' to see it happen.) However it's not a good guide to solving sudoku puzzles by hand.

This is because, for all except the simplest puzzles, it uses trial and error. This is a workable strategy for a computer, which can handle the thousands of calculations involved, but not for us humans. We need to use logic to ensure that every number we enter is provably correct, otherwise we'd quickly get lost in a mire of wrong guesses and backtracking.

For some tips on human sudoku solving, try my How To Solve Sudoku page (the title says it all!). Also, try Michael Mephem's excellent 'Solving Sudoku' document at www.sudoku.org.uk (it's in PDF format, so you'll need the Adobe reader plugin).

Many computer solvers use 'human' techniques, sticking to logic and attempting to identify advanced patterns such as X Wing and Swordfish in order to solve difficult puzzles. Mine is more single-minded (I come from a commercial programming background!) - its job is to solve puzzles, and it's not averse to using guesswork and some digital brute force to get there. It was also surprisingly simple to write.

The basic principle is to fill in all the available single-candidate squares (including any new ones that result from filling them), and if that hasn't solved the puzzle, make a guess. It's iterative, i.e. it will make guesses-upon-guesses (to any level of nesting), backtracking to previous levels if a guess leads to a dead end. The same solving function (pz_solve() in the page source) is used from the start onwards, calling itself recursively whenever a new level of guess is required.

The first part of the function is the single-candidate scanning loop - the deduction part of the process. Each time through, the program scans all the empty squares, identifying and filling any which have just one possible value. Filling a single-candidate square can create more single-candidate squares, so the scanning loop keeps repeating, until one of the following conditions occurs:

The previous pass through the loop didn't find any new single-candidate squares (or end with one of the other conditions described below).

At this point, a guess must be made. The program takes a snapshot of the current state of the puzzle, then finds an empty square with the least number of candidate values, to act as the 'guess square' for this iteration.

The function saves the guess square's candidate values, puts the first one into the square, then calls itself. The process then begins again (starting with the deduction scanning loop) at a new 'guess level'.
Alternatively, the scanning loop may end because the puzzle is solved (there are no empty squares left at the end of the scanning loop). For easy puzzles this will occur in the first iteration of the solving function (i.e. before any guesses), typically after four or five passes through the scanning loop.

When 'puzzle solved' occurs, the function exits back up through all its iterations, to the routine that called it.
The scanning loop may also end because a dead end is reached (identified by an empty square with no candidate values).

If a dead end occurs in the first iteration (before any guesses have been made), then the puzzle is malformed, and doesn't have a valid solution.

If a dead end occurs in the second or later iteration, i.e. after a guess, then that guess was wrong. The process exits back up to the previous iteration, which restores the puzzle snapshot, puts the next candidate value into its guess-square, and calls itself again.

If there are no more candidate values, then it exits back up to its own previous iteration. At the end of a particular guess thread, the stack of iterations will collapse back up to the earliest guess level that still has untried candidate values.

And, er, that's it. This will solve any valid sudoku puzzle (however fiendish) because, at each guess level, it will eventually find the correct candidate. This will, at some point, result in a sequence of correct guesses from which the next iteration of the scanning loop can deduce all the remaining square values. If necessary, it will guess right down until there's only one square left to deduce, thus guaranteeing to solve any puzzle.

In Mozilla Firefox, the progress display is updated as the solver runs (Internet Explorer suppresses display updates until the process is complete). 'Guess level' is the current depth of the guess stack (this can decrease as well as increase, as the process backtracks along dead-end threads). 'Pass no.' is the total number of passes through the deduction loop so far (including dead-end threads). At the end, 'Maximum guess level' is the deepest guess level that was reached during the solve process.

A 'moderate' rated puzzle will typically use two guess levels and 10 - 15 passes, while the toughest 'outlaw' will run to around 12 guess levels and 90 - 100 passes. There are exceptions though - the toughest 3X3 puzzle from the 'Brain of Britain' section at sudoku.sourceforge.net takes 18 guess levels and 1,830 passes!

Most of the solver's time is, in fact, spent updating the display (even in IE). When you press 'Check my answer' or 'Set puzzle', it runs in silent mode (no display updates), and is much faster.

Version 2 - the new high(-er!) performance solver.
As with most software, the main aim in the first version of the solver was to get something that worked reliably with acceptable performance. As with most version 2's, there's been scope for some tuning.

In the first version of the solver, the deduction loop found single-candidate squares by checking every other square in the same regions (row, column and rectangle), every time through the loop (OK, that was a pretty crude way to do it, but it was quick to code and it worked!). In this version, a candidates array is maintained, so that the process only has to check for array elements with a length of 1. This triples the execution speed of the deduction loop.

This version also checks for pairs (see solving tips in the main page), removing the pair-squares' candidates from other squares in the same regions. I hadn't realised what a difference this would make - by creating more single-candidate squares per pass through the deduction loop, it reduced the number of passes on a benchmark 'hard' puzzle from 93 to 21. There's a tip for human solvers there - make sure you spot the pairs!

.. and now, version 3...
Version 3 adds checking for a candidate number only appearing once in a region. For example, a row might have three empty squares, with candidates 458, 4589 and 2459. The candidate value 2 only appears in the third square, so this row's 2 must go in that square.

This massively reduces the number of passes on most puzzles (especially difficult ones), more than outweighing the extra processing time. The number of passes on the benchmark puzzle is now 12.

.. and now version 4 - this time it's logical

Version 4 adds some more logic checking, this time for triples and number-claiming boxes. The solver now applies the five basic solving rules described in How to solve sudoku. This actually slows it down (the extra logic is switched off during resource-hungry puzzle generation) but means it can solve around 90% of puzzles without making a guess.

I use this feature to validate that puzzles are solvable by logic alone. Any that require a guess are rated 'outlaw'. This doesn't necessarily mean that they can't be solved by logic, merely that my solver can't solve them that way. My guess is that around 50% of outlaws genuinely need guesses - the rest can be solved using more sophisticated logic. Conversely, however, this test does mean that all non-outlaw puzzles can be guaranteed solvable by logic.

Paul Stephens, June 2005