Property testing property testers

Show the Haskell imports
import Numeric.Natural
import Data.Maybe

Property testers such as QuickCheck are used to search for inputs that invalidate a given property. We focus on properties that are implemented in Haskell as a function a -> Bool. In the context of QuickCheck, the type a is assumed to be an instance of the Arbitrary type class, allowing to randomly generate values of a. Because of its random nature, QuickCheck might not find any offending element of a even if it exists.

type Property a = a -> Bool

If the property tester managed to invalidate the property, it returns the offending element of a. Otherwise, it simply succeeds.

data Result counterexample
  = Success
  | Failure counterexample
instance Show (Result counterexample) where
  show Success = "Success"
  show (Failure _) = "Failure"
type Tester a = Property a -> Result a

It is quite well-known in the functional programming community that there exist infinite exhaustively testable types. That is, in finite time we can exhaustively test whether a given property holds for all input values of type a, even, in some cases, when a is infinite.

A prototypical infinite exhaustively testable type, and the one we’ll focus on in this blog post, is the Cantor space of functions from the natural numbers to the Booleans.

type Cantor = Natural -> Bool

Expanding the above definition, exhaustive testability of Cantor space means that we can write a testing process of type Tester Cantor = Property Cantor -> Result Cantor that, for any property on Cantor space, succeeds when the property holds for all values of type Cantor, and returns a counterexample when there is a value of type Cantor for which it fails.

(#) :: Bool -> Cantor -> Cantor
b # f = \i -> if i == 0 then b else f (i-1)
-- Try to find a probe that invalidates the property.
find :: Property Cantor -> Cantor
find prop = h # find (\a -> prop (h # a))
  where h = prop (False # find (\a -> prop (False # a)))
-- Exhaustively test a property on Cantor space
exhaustiveCheck :: Tester Cantor
exhaustiveCheck prop =
  if prop probe
  -- The property holds on all inputs
  then Success
  -- The property fails; 'probe' is a counterexample
  else Failure probe
  where probe = find prop

The above fact has seen little application outside of papers and blog posts. This text, being a blog post, is no different. The raison d’ĂȘtre of this post is merely to, once again, observe this surprising fact, and to do something fun with it, as appears to be done every couple of years.

So what might we do with an exhaustive testing process Property Cantor -> Result Cantor? What property of Cantor space might we wish to exhaustively test? Such a property might test input-output behavior of a function that takes elements of Cantor space as input (though not necessarily returning Booleans). However, when we are thinking of QuickCheck-able programs, we normally think of types that are isomorphic to the natural numbers, such as strings and binary trees. “Oh, but that type is isomorphic to Cantor space!” is not the one-liner it could be among Haskellers.

Here’s the key observation of this blog post: Cantor = Property Natural.

So here’s a proposal for program we might want to phrase properties about: the QuickCheck-style property tester itself. After all, we may see a property tester for natural numbers as a map Cantor -> Result Natural, that takes a property of natural numbers (e.g. whether taking the square of a natural always results in an even number) and tells us whether it managed to find any inputs for which the property returns False (in this case, one might hope so).

Here are some sample implementations of property testers for us to play around with, which have some intentional shortcomings.

-- A property tester that doesn't check the property against any
-- inputs and simply always succeeds
testNothing :: Tester Natural
-- A property tester that only tests the input 0
testZero :: Tester Natural
-- A property tester that tests the property on inputs 0 through 100
test100 :: Tester Natural
-- A property tester that tests the property on inputs 100 through 200
testNext100 :: Tester Natural
-- A property tester that reports an incorrect witness of failure
testWrongWitness :: Tester Natural
-- A property tester that sometimes reports an incorrect witness:
-- only when the property holds for [0..100] and 100000,
-- but fails at 12345 and 26535
testWrongWitnessSubtle :: Tester Natural
Show the implementations of these testers
testNothing _ = Success
testZero prop =
  if prop 0
  then Success
  else Failure 0
test100 prop = case tests of
  []  -> Success
  i:_ -> Failure i
  tests = mapMaybe (\i -> if prop i then Nothing else Just i) [0..100]
testNext100 prop = case tests of
  []  -> Success
  i:_ -> Failure i
  tests = mapMaybe (\i -> if prop i then Nothing else Just i) [100..200]
testWrongWitness prop =
  if prop 12345
  then Success
  else Failure 9876
testWrongWitnessSubtle prop = case test100 prop of
  Success ->
    if prop 100000
    then case (prop 26535, prop 12345) of
      (True, True) -> Success
      (False, True) -> Failure 26535
      (True, False) -> Failure 12345
      (False, False) -> Failure 9876 -- Wrong witness!
    else Failure 100000
  Failure i -> Failure i

So what property can we phrase for a given property tester to satisfy? We can require that the property tester always tests the property at input 0. More precisely, if the property passes testing, then the property itself needs to hold at 0. We would expect testNothing to fail this test, but testZero and test100 to satisfy it.

prop_alwaysTestZero :: Tester Natural -> Property (Property Natural)
prop_alwaysTestZero tester prop =
  case tester prop of
  Success -> prop 0
  Failure _ -> True

Note that we can’t directly check whether the property tester actually evaluated the property at 0: we only observe input-output behavior of the property tester. So in the case of a test failure, we can’t say anything about the required behavior of the property tester, since it might still fail because the property is false at another input. However, this is not reducing the power of our test at all, since we’re exhaustively testing over all properties of the naturals.

Just to drive this point home: for a given property tester tester, the property prop_alwaysTestZero tester isn’t just QuickCheck-able, in the sense that we can generate 100 inputs and see whether the property holds for those 100 inputs. It is exhaustively checkable, despite it stating a property of an infinite space. If the exhaustive check succeeds, we know that prop_alwaysTestZero tester holds for all properties on the naturals.

At this point, it is important to realize that, in the context of property testing, properties must in fact be decidable, rather than being arbitrary formulae in first-order logic (say). This restricts what we can test. In particular, we can’t test the claim “if the property always holds, then the property tester always succeeds”: because it is not decidable whether the input property always holds, so this claim itself is not a decidable property. But, since if the property tester fails, it also outputs a witness of this, we can test the contrapositive. So here is an exhaustively testable property expressing that if a property fails the property tester, then the witness provided by the property tester is actually an input for which the property is False. We would expect testNothing, testZero and test100 to satisfy it, but testWrongWitness and testWrongWitnessSubtle to fail it.

prop_failureActuallyFails :: Tester Natural -> Property (Property Natural)
prop_failureActuallyFails tester prop =
  case tester prop of
    Success -> True
    -- The property tester found a counterexample, so that better be one
    Failure n -> not (prop n)

This, I think, is a key property of property testers that is currently untested for QuickCheck. To be clear, I would expect QuickCheck to satisfy it, but strictly speaking it’s essential behavior that ought to be in the test suite.

QuickCheck attempts to shrinks counterexamples: so once it has randomly encountered a counterexample, it tries to find another which is intended to be smaller or simpler, in a sense specified by the implementation of shrink as part of the Arbitrary type class. We might therefore expect that if a tester finds a counterexample, that it picks a minimal one, in the sense that applying shrink to it does not yield any counterexamples that are smaller. For the natural numbers, this means that when we obtain a Failure i, then any natural smaller than i should not be a counterexample.

prop_failureMinimal :: Tester Natural -> Property (Property Natural)
prop_failureMinimal tester prop =
  case tester prop of
    Success -> True
    Failure i -> all prop (init [0..i])

Here’s how we can run the exhaustive testing:

main :: IO ()
main = do
  putStr "Testing whether testNothing tests 0 (expecting Failure): "
  print $ exhaustiveCheck (prop_alwaysTestZero testNothing)
  putStr "Testing whether testZero tests 0 (expecting Success): "
  print $ exhaustiveCheck (prop_alwaysTestZero testZero)
  putStr "Testing whether test100 tests 0 (expecting Success): "
  print $ exhaustiveCheck (prop_alwaysTestZero test100)
  putStr "Testing whether test100 yields minimal failure witnesses (expecting Success): "
  print $ exhaustiveCheck (prop_failureMinimal test100)
  putStr "Testing whether testNext100 yields minimal failure witnesses (expecting Failure): "
  print $ exhaustiveCheck (prop_failureMinimal testNext100)
  putStr "Testing whether test100 gives valid failure witnesses (expecting Success): "
  print $ exhaustiveCheck (prop_failureActuallyFails test100)
  putStr "Testing whether testWrongWitness gives valid failure witnesses (expecting Failure): "
  print $ exhaustiveCheck (prop_failureActuallyFails testWrongWitness)
  putStr "Testing whether testWrongWitnessSubtle gives valid failure witnesses (expecting Failure): "
  print $ exhaustiveCheck (prop_failureActuallyFails testWrongWitnessSubtle)

I think the last test case best demonstrates the power of this approach. Normal Arbitrary-based property testing has no chance to catch the subtle bug in the implementation of the property tester testWrongWitnessSubtle. It would take deep inspection of the code of testWrongWitnessSubtle to write a counterexample manually. But this exhaustive technique finds a counterexample within 0.3 seconds on my machine, without any static analysis.

So why is this just a blog post rather than a pull request?

In reality, QuickCheck uses the IO monad to print additional information to the terminal, and to generate randomness, so that in this instance the true type of quickCheck would be closer to (Natural -> Bool) -> IO (Result Natural). But we may imagine a pure variant of QuickCheck whose random number generator can be pre-seeded, and which computes a result purely without writing anything to the terminal. I’m not sure the above idea is adequate motivation to refactor QuickCheck in such a fundamental way. But, if anybody wants to work on this, do let me know.

Good luck to the next person writing a blog post about infinite searchable types!