Using z3 To Solve Logic Puzzles

Recently, I learned about point-free programming. The day after I learned about it, I was idly wishing for a puzzle book on converting program code to point-free form. A point-free programming puzzle book would be a good exercise on how well I can compose functions. Sadly, I'm not aware of such a puzzle book, nor am I qualified to write one.

However, thinking of puzzles again reminded me of how I liked logic puzzle books when I was younger (anyone remember PennyPress?). A few years ago, I solved a logic puzzle using an SMT solver out of curiosity. I was motivated to do this after seeing Claire Wolf doing the same to solve a similar "guess the combination to the lock given hints" puzzle. I'd intended to write a blog post about how I mapped the word problem of a logic puzzle to an SMT query, but I never followed up on it.

I know my last post was in late 2019. 2020 through 2023 has not been kind to the world and our collective social/mental bandwidth. Mine included. So while the thought of fun logic puzzles from my youth were fresh in my head yesterday (2023-06-30), I forced myself remember how I solved a logic puzzle with an SMT solver 5 years ago. And at last I'm going to document it here.


This post assumes a background in Boolean logic and/or one programming language (along with how to use its Boolean operators). Familiarity with S-expressions is also a plus, because the input to the z3 program we're using uses S-expressions. However, I will be explaining all the input we send to z3 line-by-line. So feel free to learn about S-expressions at your own pace while or after reading this post.

When referencing concepts common in other programming languages, I refer to Rust documentation, because at this writing (2023-07-01), it's easiest for me to find the analogous concepts in the Rust language's documentation.

And lastly, you should know what a logic puzzle is. Enjoying solving logic puzzles is a nice bonus :).

What is an SMT Solver?

Before I explain what an SMT solver is, and how to use them to solve puzzles, I need to explain what SAT and SAT solvers are as prerequisites.


Suppose you have some Boolean expressions- a bunch of Boolean variables chained together- with Boolean operators- &&, ||, !, etc. Here is an example query:

A || (B && !C)
B && C

SAT, or SATisfiability, is the process of determining whether a set of Boolean expressions like above can all be made true (1) for one or more combination of the values for the given Boolean variables.

Given n Boolean variables, where each variable has 2 possible values- true (1) and false (0)- you can determine whether there's an assignment that satisfies all the Boolean expressions by "substituting all combinations of values for the variables in the expressions" (brute-forcing).

For instance, here's how we can brute-force the above query in (up to) 8 (2^3) easy steps:

  1. A=0,B=0,C=0: UNSAT

    0 || (0 && !0) = 0
    0 && 0 = 0
  2. A=1,B=0,C=0: UNSAT

    1 || (0 && !0) = 1
    0 && 0 = 0
  3. A=0,B=1,C=0: UNSAT

    0 || (1 && !0) = 1
    1 && 0 = 0
  4. A=1,B=1,C=0: UNSAT

    1 || (1 && !0) = 1
    1 && 0 = 0
  5. A=0,B=0,C=1: UNSAT

    0 || (0 && !1) = 0
    0 && 0 = 0
  6. A=1,B=0,C=1: UNSAT

    1 || (0 && !1) = 1
    0 && 1 = 0
  7. A=0,B=1,C=1: UNSAT

    0 || (1 && !1) = 0
    1 && 1 = 1
  8. A=1,B=1,C=1: SAT

    1 || (1 && !1) = 1
    1 && 1 = 1

Indeed, our query is SATisfiable when all three Boolean variables are set to 1.

SAT Solvers

When checking for satisfiability, if just one combination of variable values makes all the expressions in your query equal to 1, you're done- the query is SAT. Unfortunately, this means you have to try 2^n combinations of values for n Boolean variables if you have a query that is UNSATisfiable. And even if the query is SAT, you may have to try many combinations of variable values before you find a combination that returns SAT, such as the above example.

To make matters worse, real-world SAT problems can have thousands of expressions and millions of variables. Unlike my toy example above, it is completely impractical to try to brute-force 2^1000000 different combinations of Boolean variables. It is believed that for some SAT problems, there is no better strategy than brute forcing all 2^n combinations. However, real-world SAT problems are better behaved and can be solved efficiently using, for instance, the Davis–Putnam–Logemann–Loveland (DPLL) algorithm or Conflict-Driven Clause Learning (CDCL) algorithm.

SAT solvers are a type of computer program optimized for solving Boolean satisfiability problems. They take advantage of these algorithms and hueristics common to real-world SAT problems to efficiently find whether a query is SAT or UNSAT1 .

SMT And SMT Solvers

Satisfiability Modulo Theory (SMT) problems and solvers are what you get when you're no longer limited to queries of just Boolean variables. Queries can be taken over Booleans, reals, integers, arrays of Booleans, functions, enumerations, and many more! You can even create queries of combinations of these data types! As might be expected, by relying on queries over data types besides Booleans can result in dramatic speedups, as well as dramatic increases in complexity :).

While some data types map nicely2 to Boolean expressions such as bitvectors, modern SMT solver use theories to map more complex data types back to Boolean expressions. Just like how Booleans are governed by the theory of Boolean algebra, each data type an SMT solver supports has a theory that explains "what you can and cannot do" with the data type. For instance, consider the following expressions, comparing equality of integers I, J, and K:

(I == 1) && (J == I + 1)
(I == 2) || (K == J + 3)
(I == 2) && (J == K + 2)

The expressions I == 1, J == I + 1, I == 2, K == J + 3, J ==K + 2 and all yield Boolean values, and can therefore be converted to and from Boolean values A, B, C, D, and E respectively:

A && B
C || D
C && E

While Boolean expressions can be solved for using the above algorithms for SAT, solving for equality is governed by the congruence closure, and the theory of integers is governed by, well, number theory. The DPLL/CDCL algorithms can be modified to handle conversion to and from Boolean expressions, as well as handle congruence closure and other theories. This is known as the DPLL(T) framework3 .

z3 And SMT-LIB

Our SMT solver of choice will be Microsoft Research's z3. Other SMT solvers exist, such as yices, and thanks to each using different algorithms, there is not a single SMT solver that solves all practical SMT problems better than the others. Speedups and slowdowns can be dramatic depending on which SMT solver you use4 . However, a query to solve a logic puzzle will not tax any SMT solver, so seeing that I'm familiar with its API, I chose z3.

For our purposes, z3 is invoked as the following:

z3 -smt2 model.txt

model.txt is a text file we will create in SMT-LIB format. SMT-LIB is an S-expression-based text input/output format widely-supported by SMT solvers, z3 included. It contains commands to construct input queries for many different theories, run the solver on your query via check-sat, and examine the solver's output. In particular, after running check-sat, you can use SMT-LIB to:

As is custom for an S-expression format, SMT-LIB has a large amount of parantheses to represent nested data structures. While I show a generic example of how to use all our needed SMT-LIB commands, I will be skimming over the meaning of the parentheses nesting for this example application.

An Example In SMT-LIB Format

Like SAT solvers, SMT solvers can handle Boolean expressions just fine, as long as they're in the SMT-LIB format. Using the SAT example earlier in the post, the equivalent SMT-LIB query would look something like this:

(set-option :produce-models true)
(set-logic QF_UF)

(declare-const a Bool)
(declare-const b Bool)
(declare-const c Bool)

(assert (or a (and b (not c)))) ; A || (B && C)
(assert (and b c)) ; B && C

There are a few things to note comparing the SAT example I created above:

  1. The first line set-option :produce-models true enables the get-model command. The second line set-logic restricts the data types and theories that the solver will use. The SMT-LIB website provides a catalog of logics that can be used with set-logic, and QF_UF is a simple sub-logic suitable for a Boolean-only query.

    Although z3 tolerates the first two lines missing, its good practice to add those lines for other solvers like yices and for optimizing. Assuming that some data types and theories can't occur in a given query sometimes allows the solver to speed up proofs.

  1. Variables are not implicit in SMT-LIB, and need to be declared with declare-const. declare-const takes a variable name, and a type- Bool in this case, but an SMT solver understands other types. During check-sat, the solver will fill in a value for you for each variable. The value of a variable is the same everywhere that the variable is used, hence the const part.

  2. As is traditional with S-expressions, operators such as and, or, etc are written in prefix notation rather than infix notation. Parentheses are required around operators and their arguments to actually evaluate the expression/do the calculation5 .

  3. The Boolean expressions you want to check for satisfiability aren't implicit either; you must tell the solver about your query using one more more asserts. The solver must find a way to make all asserts true simultaneously for your query to return SAT. A query without any asserts returns true.

  4. Comments are preceded with ;.

  5. check-sat checks whether your query is SAT or UNSAT, and nothing else. In contrast, get-model returns the value of the variables the solver used to satisfy your query if your query was SAT. In this case, the output of the solver, including the variable values chosen, matches the SAT example:

      (define-fun b () Bool
      (define-fun a () Bool
      (define-fun c () Bool

    Additionally, there is also get-value. As will become clear later, get-value is sometimes more useful than get-model for presenting information about a model which satisfies your query. get-value is especially when you're interested in the return values of uninterpreted functions.

  6. Although not represented in the above example, there are a few additional operators available to the Bool theory in SMT-LIB that are not available in a SAT solver due to being polymorphic: =, distinct, and ite.

    • = checks whether two generic types T have equal value. If T is Bool, this is equivalent to XNOR. Unlike many programming languages, SMT-LIB uses single-equals (=), not double-equals (==) for comparison.

    • distinct checks whether two generic types T are not of equal value. If T is Bool, this is equivalent to XOR.

    • ite corresponds to if-then-else expressions in your favorite programming language. The if predicate requires Bool, while the then and else branches of the expression require a T as input.

      The entire expression returns a T as output. It returns the value of the then branch when the if predicate evaluates to true, and returns the value of the else branch when the if predicate evaluates to false. When T is Bool, ite has the following truth table:


While the above example doesn't really use an SMT solver's true power, it does illustrate the fundamentals of how to interface with one. We will discuss how to use other datatypes and theories with SMT-LIB as we turn a logic puzzle into an SMT query.

SMT And Logic Puzzles

With the whirlwind introduction to SMT out of the way, my goal with this post is to show you how to map the word problem presented by a logic puzzle to a SATisfiable SMT query. I will introduce an example logic puzzle first, and then show how to use various SMT-LIB commands to create some datatypes and constraints to represent the puzzle.

At a high level, if you visualize a logic puzzle grid, such as the one below, the conversion process looks like this:

An empty logic puzzle grid corresponding to the puzzle described
in the next section.
This empty logic puzzle contains 9 rows and columns, along with headers for each row and column. The column headers, in left-to-right order, are: Batman, Jake, Dibii, Lazer, Sleep, Ball, 1, 2, and 3. The row headers in top-to-bottom order, are: Ruby, Spot, and Starbuck, 1, 2, 3, Lazer, Sleep, and Ball.

Each set of 3 row and column headers belong to the same category:

  • Batman, Jake, Dibii are tomcats.
  • Ruby, Spot, and Starbuck are queen cats.
  • Lazer, sleep, and ball are activities.
  • 1, 2, and 3 are number of kittens.

To solve the puzzle, you are intended to mark each table entry a ✓ if the row and column are associated with each other, or ✗ if the row and column are not associated with each other, based on clues given to you (not represented in the above image).

A table entry whose row and column are of the same category does not need to be marked. The pictured table represents this by completely omitting such entries, making the table similar to an upper triangular matrix.

The bottom part of the puzzle consists of a table with the columns queen, tom, activity, and number of kittens. The column queen is filled in for you with Ruby, Spot, and Starbuck. You fill the remaining rows of this table based on which row and columns you marked with a ✓ in the above table.

I got the logic puzzle template (before filling in the headers) from Daydream Puzzles.

An Example Puzzle

For this post, I'll be using a puzzle from Aha! Puzzles as an example. I'm not going to copy the puzzle description, since a person presumably spent time to create the puzzle (among others), and I don't want to steal their work. Instead I will paraphrase the puzzle and grid, which describes a set of cats.


Given the above information, and the following clues, figure out each queen's partner, favorite activity, and the number of kittens they have.


The puzzle is intended to have a unique answer. However, using only the description above, there are multiple correct answers. The clues below are intended to limit the multiple correct answers down to a single unique answer6 :

  1. Batman's partner likes to chase a ball. She was not Starbuck.
  2. Dibii's partner liked the lazer.
  3. Ruby was a cat who liked to sleep.
  4. Starbuck has two more kittens than whoever Batman's companion is.

Mapping The Puzzle To An SMT Query

Now that we have a puzzle, we can map the textual description to a query and a mathematical model that the SMT solver will return. Let's create the query piece by piece, starting with initial setup code:

(set-option :produce-models true)
(set-logic ALL)

The first two lines we've already seen, but we have a new logic ALL. set-logic ALL is a bit of a cop-out; SMT-LIB allows ALL for applications which generate queries on-the-fly. ALL is not intended for handwritten queries. Unfortunately, z3 doesn't really have that many logics that support datatypes. And none of the logics that do support datatypes seem to support both integers and uninterpreted functions.7 So, without any better options for now, I'm just making explicit that z3 should use "whatever means necessary" to solve the logic puzzle8 .


declare-datatypes is part of the most recent version of SMT-LIB, version 2.6. It is a very general function that allows you to create complex data types, including recursive data types. In our usage we will only be using it to create the equivalent of enums without data in other programming languages9 .

In general, you create an enum Foo whose variants A, B, C (and more, if wish) have data using the following syntax10 :

(declare-datatypes ((Foo 0)) (((A) (B) (C))))

Anywhere in your query where a value of type Foo is used, the SMT solver has no choice but to make the value either A, B, or C. Additionally, the following properties hold11 :

This can be confirmed with the following query, which returns SAT:

(declare-datatypes ((Foo 0)) (((A) (B) (C))))
(assert (and (= A A) (= B B) (= C C)))
(assert (and (distinct A B) (distinct A C) (distinct B C)))

With the above syntax and properties in mind, the names of the cats and their activities correspond very nicely to enums whose variants don't have data:

(declare-datatypes ((Tom 0)) (((Batman) (Jake) (Dibii))))
(declare-datatypes ((Queen 0)) (((Ruby) (Spot) (Starbuck))))
(declare-datatypes ((Activities 0)) (((Lazer) (Sleep) (Ball))))

Uninterpreted Functions

Functions form the backbone of programming languages and mathematics, and SMT formulas are no exception. We can create pure functions to abstract away common parts of our query using define-fun. Here is an example add function that "adds one to its input Int":

(define-fun add ((x Int)) Int (+ x 1))
(get-value ((add 1) (add 2) (add 10)))

The above query returns SAT in response to check-sat. After check-sat, your model has an add function that the solver can call.

We don't have any asserts in the above query, so get-model returns nothing. However, we can display the result of calls to our add function using get-value. get-value takes a single argument, but this single argument can take an arbitrary number of terms. Each of the three terms to the get-value call above will make the solver read, evaluate, and print the results of calling your add function on various integers- exciting!

(((add 1) 2)
 ((add 2) 3)
 ((add 10) 11))

Not only can we define functions, we can also declare functions, without bothering to define/fill in the function body! We can request the solver to fill in the function body for us by using declare-fun:

(declare-fun add_two (Int) Int)
(get-value ((add_two 1)))

The above query returns sat in response to check-sat. get-model will then return the definition of add_two that the solver created:

  (define-fun add_two ((x!0 Int)) Int
(((add_two 1) 0))

In this case add_two returns the literal 0. Not particularly interesting. What if we add an assert, to force add_two to return 3 when given the input 1?

(declare-fun add_two (Int) Int)
(assert (= (add_two 1) 3))
(get-value ((add_two 1)))
  (define-fun add_two ((x!0 Int)) Int
(((add_two 1) 3))

It's a function that returns 3 for all inputs. Now the solver is being cute. "Technically correct" is the best type of correct. The solver won't create a more interesting function body unless forced to via proper constraints on the input arguments and return values12 .

We will constrain our uninterpreted functions shortly, but for now, let's declare 4 uninterpreted functions with declare-fun. These functions will in fact return the answers for our logic puzzle, once the solver fills them in:

(declare-fun QueenPartner (Queen) Tom)
(declare-fun QueenActivity (Queen) Activities)
(declare-fun QueenKittens (Queen) Int)
(declare-fun TomPartner (Tom) Queen)

To be specific:

Implicit Constraints

By their very nature, logic puzzles contain some usually-implied rules that we must follow when solving them if we don't want incoherent, nonsensical answers:

A logic puzzle grid such as above visually makes it easy to follow these rules. Unfortunately, an SMT solver knows nothing about logic puzzles without our help, so we must explain those implicit constraints in our query. We use asserts to force an SMT solver to reject nonsensical-but-valid answers to a query of variables, where in the case of a logic puzzle, our variables are our uninterpreted functions.

Our enums we created via declare-datatype already encode these uniqueness/finiteness constraints mentioned above. We can also represent the number of kittens each cat pair has as a NumKittens enum, but for convenience that will become clear shortly, I chose to use an Int. While Ints have a uniqueness constraint (1 is different than 2), in SMT-LIB, Ints do not have a finiteness constraint. Thus without additional information, the solver free to set NumKittens for each cat pair to 1, 10, or even 1,234,567, among other fine Ints, if the solver so chooses.

The puzzle's description implies outright says that the number of kittens each cat pair must be between 1 and 3 inclusive. We can constrain the values for number of kittens the solver will try using an assert:

; Integer constraint
(assert (and 
    (>= (QueenKittens Ruby) 1) (<= (QueenKittens Ruby) 3)
    (>= (QueenKittens Spot) 1) (<= (QueenKittens Spot) 3)
    (>= (QueenKittens Starbuck) 1) (<= (QueenKittens Starbuck) 3)))

While the uniquess constraints of enums and Ints gets us far, it's not quite enough. As quoted above:

As is customary for this type of problem, it is implied that each queen has a different partner, favorite activity, and number of kittens.

Although the solver recognizes that each tom is a different cat, without additional information, nothing prevents the solver from giving an answer which assigns, for instance, two queens to the same tomcat partner.

We can further constrain the solver from trying to assign two queens the same partner, number of kittens, and activity by constraining the uninterpreted functions we defined earlier via more asserts. For instance, the statement (assert (not (= (QueenPartner Ruby) (QueenPartner Spot))) tells the SMT solver "However you decide to fill the function body, I need the function QueenPartner to not return the same value for Ruby and Spot". To properly constrain our query, you need one assert for each possible pair of values in each category, which can be combined into a single large conjuction13 :

; "Only one dot in each row/col" consistency check.
(assert (and
    (not (= (QueenPartner Ruby) (QueenPartner Spot)))
    (not (= (QueenPartner Ruby) (QueenPartner Starbuck)))
    (not (= (QueenPartner Spot) (QueenPartner Starbuck)))
    (not (= (QueenKittens Ruby) (QueenKittens Spot)))
    (not (= (QueenKittens Ruby) (QueenKittens Starbuck)))
    (not (= (QueenKittens Spot) (QueenKittens Starbuck)))
    (not (= (QueenActivity Ruby) (QueenActivity Spot)))
    (not (= (QueenActivity Ruby) (QueenActivity Starbuck)))
    (not (= (QueenActivity Spot) (QueenActivity Starbuck)))))

The above check is morally equivalent to a human making sure a logic grid has only one ✓ per row and column for each combination of values in each category. Because of the uniqueness constraints for each enum and Int, the above is sufficient to make sure the solver knows that each queen has a mutually exclusive partner, number of kittens, and activity and that such a solution exists14 .

Lastly, each cat pair shares a favorite activity and the number kittens they have. So far, we've only written constraints for each queen in the cat pair. However, we don't need to rewrite the constraints for each tom! There is another consistency check we need, and a side-effect, we can represent each tom's favorite activity and number of kittens for free in terms of this constraint.

The TomPartner uninterpreted function returns the Queen who is the partner of the given Tom. Likewise, the QueenPartner function returns the Tom whose partner is the given Queen. Without extra constraints, the SMT solver doesn't know "obvious" facts like "'the partner of the partner' of a cat is the same cat you started with"15 . We can represent this fact with another assert:

; All other constraints for Tom are implied if the partner of the partner is the same.
(assert (and
    (= (QueenPartner (TomPartner Batman)) Batman)
    (= (QueenPartner (TomPartner Jake)) Jake)
    (= (QueenPartner (TomPartner Dibii)) Dibii)))

After this constraint, TomPartner unambiguously maps a tom back to queen. Since a cat pair has the same favorite activity and number of kittens, the solver now has enough information to solve for each tom's favorite activity and number of kittens in terms of queens.

Explicit Constraints

The implicit constraits are necessary but not sufficient to constrain the logic puzzle to a unique answer. If an SMT solver is presented with only the above lines, but there will be multiple ways to fill in our uninterpreted functions which satisfy that query. The solver will choose to present to you16 one interpretation for each function out of all valid ones, and chances are, it won't be the intended answer to the logic puzzle based on the clues.

If you incorporate the clues, however, the SMT solver should17 be limited to only one valid answer to your query. In this sense, the clues given above form the explicit, outright-stated constraints unique to a specific logic puzzle. Explicit constraints are handled identically to the implicit constraints in the previous section, by mapping a textual description of the clues described earlier to asserts:

Batman's partner likes to chase a ball. She was not Starbuck.

; 1. Batman chose the female who liked to chase a ball, but she was not Starbuck.
(assert (= (QueenActivity (TomPartner Batman)) Ball))
(assert (not (= (TomPartner Batman) Starbuck)))

Dibii's partner liked the lazer.

; 2. Dibii's companion liked to chase the laser light.
(assert (= (QueenActivity (TomPartner Dibii)) Lazer))

Ruby was a cat who liked to sleep.

; 3. Ruby loved to cuddle up to her male for a long afternoon nap in the sun.
(assert (= (QueenActivity Ruby) Sleep))

Starbuck has two more kittens than whoever Batman's companion is.

This clue is the reason I represented the number of kittens as an Int. While I would have to reinvent addition if I used a NumKittens enum, the SMT solver already knows the theory of Ints and can add them using +18 .

; 4. Starbuck had two more kittens than Batman's companion.
(assert (= (QueenKittens Starbuck)
           (+ (QueenKittens (TomPartner Batman)) 2)))

Getting An Answer

Once, we add check-sat, our query text file is finished, and we can feed it to z3 using the invocation described to check whether our query is satisfiable. You can download the completed query here and run it online (as of 2023-08-13) if you wish.

Many times, knowing SAT or UNSAT is "good enough". But for a logic puzzle, we actually want the variable values. get-model works, but this shows the definition of the uninterpreted functions you asked the solver to fill in. While seeing the function bodies are useful information, for solving a logic puzzle, it's probably better to use get-value to see individual outputs of each now-interpreted function for "interesting" inputs.

To compare and contrast, I use all of check-sat, get-model, and get-value:


(echo "
(get-value ((QueenPartner Spot)
            (QueenKittens Spot)
            (QueenActivity Spot)))

(echo "
(get-value ((QueenPartner Ruby)
            (QueenKittens Ruby)
            (QueenActivity Ruby)))

(echo "
(get-value ((QueenPartner Starbuck)
            (QueenKittens Starbuck)
            (QueenActivity Starbuck)))

Running the entire query through my local copy of z3 gives the following output:

  (define-fun QueenKittens ((x!0 Queen)) Int
    (ite (= x!0 Spot) 1
    (ite (= x!0 Starbuck) 3
  (define-fun QueenPartner ((x!0 Queen)) Tom
    (ite (= x!0 Spot) Batman
    (ite (= x!0 Starbuck) Dibii
  (define-fun QueenActivity ((x!0 Queen)) Activities
    (ite (= x!0 Starbuck) Lazer
    (ite (= x!0 Ruby) Sleep
  (define-fun TomPartner ((x!0 Tom)) Queen
    (ite (= x!0 Jake) Ruby
    (ite (= x!0 Dibii) Starbuck

(((QueenPartner Spot) Batman)
 ((QueenKittens Spot) 1)
 ((QueenActivity Spot) Ball))

(((QueenPartner Ruby) Jake)
 ((QueenKittens Ruby) 2)
 ((QueenActivity Ruby) Sleep))

(((QueenPartner Starbuck) Dibii)
 ((QueenKittens Starbuck) 3)
 ((QueenActivity Starbuck) Lazer))

First off, the query is sat, which is a good start! The model returned by get-model isn't hard for me to follow either. I could fairly quickly figure out each answer to the logic puzzle using get-model alone, and I'm sure anyone else reading this post could too. But using get-value to evaluate the functions the solver filled in for us gives us the logic puzzle answers directly, ready to be written down! So let's save ourselves the trouble and populate the puzzle's answers thanks to get-value!

Was the solver correct? Well, as of this writing (2023-07-23), Aha! Puzzles lets you solve logic puzzles interactively online. I'll just let the below screenshot speak for itself:

An image from Firefox showing the Aha! Puzzles website popup with
a solved logic puzzle grid behind the popup. A table of answers to the puzzle
is to the right side of the grid, also partially obscured. The popup says &quot;Congratulations, you win!&quot;,
showing a solve time of 19 seconds.
The popup says "Congratulations, you win!", so indeed z3 was correct. Although the logic puzzle grid and answer table is obscured by the popup, there is just enough of the grid and answer table shown to verify that I filled in the puzzle with the answers returned by z3.

Conclusion/Next Steps

I still find logic puzzles fun to work through manually. To me, asking a machine to solve every logic puzzle for you defeats the satisfaction of working your brain to solve the puzzle yourself. So while I hope the above example provides some insight as to how to map word problems to an SMT query/model, I wouldn't suggest that you start solving every logic puzzle with an SMT solver.

That said, I found that trying to map a logic puzzle to an SMT solver query was also intellectually stimulating. It is a low-stakes problem that gave me the opportunity to exercise my skills at creating SMT queries, as opposed to my typical usage where I have a program generate queries and models for me. I was excited after I got my query working, and I wanted to document the result.

Speaking of generating queries and models, it's possible to take this logic puzzle-solving use case further by creating a Domain-Specific Language (DSL). Like how the previously-linked SymbiYosys converts Verilog to SMT-LIB queries, this DSL would convert a more convenient textual representation of a logic puzzle into SMT-LIB queries. It's a bit more up-front work than writing SMT-LIB queries manually, and probably not worth the time spent other than for learning. However, since this project is meant to be low-stakes and for fun, I may look into a DSL in the future for a follow-up post. It's about time I wrote a(nother) compiler anyway.


I would like to thank my friends Screwtape and Alyssa Rosenzweig for their general feedback and a few last-minute corrections on the final draft of this post.


1 I will not be discussing how to write a SAT solver; for those interested I suggest this tutorial for building a SAT solver in Python. The tutorial is based on Donald Knuth's "SAT0W" solver.

Although I have not personally done so yet, I would probably also look at Knuth's "SAT10" solver, which is based on the Davis–Putnam–Logemann–Loveland (DPLL) algorithm.

2 To be honest, I'm going by what Wikipedia says here. There's no external reference:

Early attempts for solving SMT instances involved translating them to Boolean SAT instances (e.g., a 32-bit integer variable would be encoded by 32 single-bit variables with appropriate weights and word-level operations such as 'plus' would be replaced by lower-level logic operations on the bits) and passing this formula to a Boolean SAT solver.

3 Just like for SAT solvers, I will not be discussing how to write an SMT solver. While in the process of writing this post, I have been learning about the DPLL(T) framework myself. I've found the following set of slides useful:

4 In my own experience, yices is significantly faster at many problems involving formal verification of HDL. However, it is not universal. This is why the Symbiyosys front end for HDL verification will run the same query against multiple SMT solvers in parallel if you have them installed.

5 If you don't include parentheses to actually evaluate operators, weird things you probably didn't intend will happen. For instance, try comparing the result of the query (assert (= or true false true)) to (assert (= (or true false) true)).

6 More succinctly, the clues "constrain the solution space".

7 The logic that's closest to what I need, Quantifier-Free Finite Domain (QF_FD), has some additional restrictions which cause my logic puzzle query to return UNKNOWN. I'm not actually certain why it returns UNKNOWN, but I suspect I'm running afoul of the restrictions by using datatypes with uninterpreted functions, and not just equality (=) comparisons.

8 I belive `ALL` is z3's default logic anyway.

9 declare-datatypes is not required, and the logic puzzle can be solved without them. In fact, I have constructed a working query without datatypes for this logic puzzle- twice! I wanted to solve the puzzle with yices as well as z3 as an exercise; while yices does support enums and the like, they are not compatible with SMT-LIB.

As shown in the linked yices examples, declare-sort, declare-const, and declare-fun, plus some assert constraints can emulate declare-datatypes and finite integers reasonably. However, for a handwritten example, I wanted to focus on creating constraints appropriate to a logic puzzle rather than reinventing enums.

10 The integer argument to declare-datatypes is used for parametric polymorphism. It represents, informally, "the number of types to be filled in later".

11 What A, B, and C actually are in terms of "bits in memory somewhere" doesn't matter as long as the solver manipulates these values of type Foo according to the above rules.

12 To make the example more interesting, I tried to use asserts to get the solver to generate a function body which uses + (integer addition), but got bored and gave up. This post is getting long anyway.

13 Readers familiar with using z3 may notice that I can more-succinctly represent the exclusivity constraints using a forall quantifier. Indeed, this works, and was how I originally solved the example logic puzzle. Except for an extra pair datatype, the quantifier version of our query is very similar to the quantifier-free query constructed in this blog post; just compare the comments.

However, I decided not not to discuss quantifiers in this post for a few reasons:

  1. The quantifiers are not required for enumerations with finite variants to create the constraints.
  2. I am not exceptionally familiar with using quantifiers in SMT queries and when they are actually required. I haven't needed them for HDL verification, for instance.
  3. Quantifiers complicate the SMT solver's job. While queries without quantifiers can always be determined to be SAT or UNSAT, some queries with quanitifers are undecidable. As of this writing (2023-07-02) I don't think I'm qualified to discuss undecidability other than in passing.
  4. Not all SMT solvers support quantifiers (e.g. yices).

Therefore, while the quantifiers are likely harmless here, I'd rather them when they are actually required, not just as a means to make queries take fewer lines.

14 If two enum variants could be equal each other in value (i.e. not distinct), it would be impossible for our uninterpreted functions to return three distinct enum variants to three different values of input enum variants. To show this, play with this commented query online:

(declare-sort Foo 0) ; declare-sort declares a new type, like `Int`, `Bool`.
                     ; It has fewer assumptions baked-in than `declare-datatypes`.

; Create some named values for our types.
; These are like our `enum` variants.
(declare-const foo_one Foo)
(declare-const foo_two Foo) 
(declare-const foo_three Foo)

(declare-sort Bar 0)
(declare-const bar_one Bar)
(declare-const bar_two Bar)
(declare-const bar_three Bar)

; Our uninterpreted function. Takes a `Foo`, returns a `Bar`.
(declare-fun my-fun (Foo) Bar)

; Ensure each constant has a unique value. This is implied 
; `declare-datatypes`.
; Change _any_ of the 6 asserts from `distinct` to `=`.
; The solver will return `unsat`.
(assert (distinct foo_one foo_two))
(assert (distinct foo_one foo_three))
(assert (distinct foo_two foo_three))

(assert (distinct bar_one bar_two))
(assert (distinct bar_one bar_three))
(assert (distinct bar_two bar_three))

; Prevent the solver from trying to create more values for our types
; from thin air. This is implied in `declare-datatypes`.
(assert (and
        (= (my-fun foo_one) bar_one)
        (= (my-fun foo_one) bar_two)
        (= (my-fun foo_one) bar_three))
        (= (my-fun foo_two) bar_one)
        (= (my-fun foo_two) bar_two)
        (= (my-fun foo_two) bar_three))
        (= (my-fun foo_three) bar_one)
        (= (my-fun foo_three) bar_two)
        (= (my-fun foo_three) bar_three))))

; Each input to my-fun should return a unique output.
; This is impossible unless foo_{one, two, three} have
; different values according to `=`.
(assert (distinct (my-fun foo_one) (my-fun foo_two)))
(assert (distinct (my-fun foo_one) (my-fun foo_three)))
(assert (distinct (my-fun foo_two) (my-fun foo_three)))


15 While I praise the SMT solver for being open-minded, polyamory is not within the scope of this logic puzzle.

16 By default an SMT solver will give you one answer. It is possible to get all possible answers to an SMT query. Since a logic puzzle is intended to have a single answer, ensuring that the number of unique answers is 1 would be another good consistency check at the cost of execution time. Unfortunately, I don't remember how to do this.

17 The puzzle should have a unique answer if it was designed properly.

18 I chose to write the assert in terms of Int and + as an example. When numbers you can choose from in a category aren't sequential, and/or the clues are more complicated, I think it's useful to be able to model clues with other theories the SMT solver provides.

In this case, because we know that NumKittens is constrained from 1 to 3 inclusive, you could just as easily derive the constraint as:

(assert (= (QueenKittens Starbuck) 3))
(assert (= (QueenKittens (TomPartner Batman)) 1))

Last Updated: 2023-08-15