Property testing property testers
Show the Haskell imports
import Numeric.Natural import Data.Maybe
Property testers such as QuickCheck are used to search for inputs that invalidate a given property. We focus on properties that are implemented in Haskell as a function
a -> Bool. In the context of QuickCheck, the type
a is assumed to be an instance of the
Arbitrary type class, allowing to randomly generate values of
a. Because of its random nature, QuickCheck might not find any offending element of
a even if it exists.
type Property a = a -> Bool
If the property tester managed to invalidate the property, it returns the offending element of
a. Otherwise, it simply succeeds.
data Result counterexample = Success | Failure counterexample instance Show (Result counterexample) where show Success = "Success" show (Failure _) = "Failure"
type Tester a = Property a -> Result a
It is quite well-known in the functional programming community that there exist infinite exhaustively testable types. That is, in finite time we can exhaustively test whether a given property holds for all input values of type
a, even, in some cases, when
a is infinite.
A prototypical infinite exhaustively testable type, and the one we’ll focus on in this blog post, is the Cantor space of functions from the natural numbers to the Booleans.
type Cantor = Natural -> Bool
Expanding the above definition, exhaustive testability of Cantor space means that we can write a testing process of type
Tester Cantor = Property Cantor -> Result Cantor that, for any property on Cantor space, succeeds when the property holds for all values of type
Cantor, and returns a counterexample when there is a value of type
Cantor for which it fails.
(#) :: Bool -> Cantor -> Cantor # f = \i -> if i == 0 then b else f (i-1)b
-- Try to find a probe that invalidates the property. find :: Property Cantor -> Cantor = h # find (\a -> prop (h # a)) find prop where h = prop (False # find (\a -> prop (False # a)))
-- Exhaustively test a property on Cantor space exhaustiveCheck :: Tester Cantor = exhaustiveCheck prop if prop probe -- The property holds on all inputs then Success -- The property fails; 'probe' is a counterexample else Failure probe where probe = find prop
The above fact has seen little application outside of papers and blog posts. This text, being a blog post, is no different. The raison d’être of this post is merely to, once again, observe this surprising fact, and to do something fun with it, as appears to be done every couple of years.
So what might we do with an exhaustive testing process
Property Cantor -> Result Cantor? What property of Cantor space might we wish to exhaustively test? Such a property might test input-output behavior of a function that takes elements of Cantor space as input (though not necessarily returning Booleans). However, when we are thinking of QuickCheck-able programs, we normally think of types that are isomorphic to the natural numbers, such as strings and binary trees. “Oh, but that type is isomorphic to Cantor space!” is not the one-liner it could be among Haskellers.
Here’s the key observation of this blog post:
Cantor = Property Natural.
So here’s a proposal for program we might want to phrase properties about: the QuickCheck-style property tester itself. After all, we may see a property tester for natural numbers as a map
Cantor -> Result Natural, that takes a property of natural numbers (e.g. whether taking the square of a natural always results in an even number) and tells us whether it managed to find any inputs for which the property returns
False (in this case, one might hope so).
Here are some sample implementations of property testers for us to play around with, which have some intentional shortcomings.
-- A property tester that doesn't check the property against any -- inputs and simply always succeeds testNothing :: Tester Natural -- A property tester that only tests the input 0 testZero :: Tester Natural -- A property tester that tests the property on inputs 0 through 100 test100 :: Tester Natural -- A property tester that tests the property on inputs 100 through 200 testNext100 :: Tester Natural -- A property tester that reports an incorrect witness of failure testWrongWitness :: Tester Natural -- A property tester that sometimes reports an incorrect witness: -- only when the property holds for [0..100] and 100000, -- but fails at 12345 and 26535 testWrongWitnessSubtle :: Tester Natural
Show the implementations of these testers
= SuccesstestNothing _
= testZero prop if prop 0 then Success else Failure 0
= case tests of test100 prop -> Success  :_ -> Failure i iwhere = mapMaybe (\i -> if prop i then Nothing else Just i) [0..100] tests
= case tests of testNext100 prop -> Success  :_ -> Failure i iwhere = mapMaybe (\i -> if prop i then Nothing else Just i) [100..200] tests
= testWrongWitness prop if prop 12345 then Success else Failure 9876
= case test100 prop of testWrongWitnessSubtle prop Success -> if prop 100000 then case (prop 26535, prop 12345) of True, True) -> Success (False, True) -> Failure 26535 (True, False) -> Failure 12345 (False, False) -> Failure 9876 -- Wrong witness! (else Failure 100000 Failure i -> Failure i
So what property can we phrase for a given property tester to satisfy? We can require that the property tester always tests the property at input 0. More precisely, if the property passes testing, then the property itself needs to hold at 0. We would expect
testNothing to fail this test, but
test100 to satisfy it.
prop_alwaysTestZero :: Tester Natural -> Property (Property Natural) = prop_alwaysTestZero tester prop case tester prop of Success -> prop 0 Failure _ -> True
Note that we can’t directly check whether the property tester actually evaluated the property at 0: we only observe input-output behavior of the property tester. So in the case of a test failure, we can’t say anything about the required behavior of the property tester, since it might still fail because the property is false at another input. However, this is not reducing the power of our test at all, since we’re exhaustively testing over all properties of the naturals.
Just to drive this point home: for a given property tester
tester, the property
prop_alwaysTestZero tester isn’t just QuickCheck-able, in the sense that we can generate 100 inputs and see whether the property holds for those 100 inputs. It is exhaustively checkable, despite it stating a property of an infinite space. If the exhaustive check succeeds, we know that
prop_alwaysTestZero tester holds for all properties on the naturals.
At this point, it is important to realize that, in the context of property testing, properties must in fact be decidable, rather than being arbitrary formulae in first-order logic (say). This restricts what we can test. In particular, we can’t test the claim “if the property always holds, then the property tester always succeeds”: because it is not decidable whether the input property always holds, so this claim itself is not a decidable property. But, since if the property tester fails, it also outputs a witness of this, we can test the contrapositive. So here is an exhaustively testable property expressing that if a property fails the property tester, then the witness provided by the property tester is actually an input for which the property is
False. We would expect
test100 to satisfy it, but
testWrongWitnessSubtle to fail it.
prop_failureActuallyFails :: Tester Natural -> Property (Property Natural) = prop_failureActuallyFails tester prop case tester prop of Success -> True -- The property tester found a counterexample, so that better be one Failure n -> not (prop n)
This, I think, is a key property of property testers that is currently untested for QuickCheck. To be clear, I would expect QuickCheck to satisfy it, but strictly speaking it’s essential behavior that ought to be in the test suite.
QuickCheck attempts to shrinks counterexamples: so once it has randomly encountered a counterexample, it tries to find another which is intended to be smaller or simpler, in a sense specified by the implementation of
shrink as part of the
Arbitrary type class. We might therefore expect that if a tester finds a counterexample, that it picks a minimal one, in the sense that applying
shrink to it does not yield any counterexamples that are smaller. For the natural numbers, this means that when we obtain a
Failure i, then any natural smaller than
i should not be a counterexample.
prop_failureMinimal :: Tester Natural -> Property (Property Natural) = prop_failureMinimal tester prop case tester prop of Success -> True Failure i -> all prop (init [0..i])
Here’s how we can run the exhaustive testing:
main :: IO () = do main putStr "Testing whether testNothing tests 0 (expecting Failure): " print $ exhaustiveCheck (prop_alwaysTestZero testNothing) putStr "Testing whether testZero tests 0 (expecting Success): " print $ exhaustiveCheck (prop_alwaysTestZero testZero) putStr "Testing whether test100 tests 0 (expecting Success): " print $ exhaustiveCheck (prop_alwaysTestZero test100) putStr "Testing whether test100 yields minimal failure witnesses (expecting Success): " print $ exhaustiveCheck (prop_failureMinimal test100) putStr "Testing whether testNext100 yields minimal failure witnesses (expecting Failure): " print $ exhaustiveCheck (prop_failureMinimal testNext100) putStr "Testing whether test100 gives valid failure witnesses (expecting Success): " print $ exhaustiveCheck (prop_failureActuallyFails test100) putStr "Testing whether testWrongWitness gives valid failure witnesses (expecting Failure): " print $ exhaustiveCheck (prop_failureActuallyFails testWrongWitness) putStr "Testing whether testWrongWitnessSubtle gives valid failure witnesses (expecting Failure): " print $ exhaustiveCheck (prop_failureActuallyFails testWrongWitnessSubtle)
I think the last test case best demonstrates the power of this approach. Normal
Arbitrary-based property testing has no chance to catch the subtle bug in the implementation of the property tester
testWrongWitnessSubtle. It would take deep inspection of the code of
testWrongWitnessSubtle to write a counterexample manually. But this exhaustive technique finds a counterexample within 0.3 seconds on my machine, without any static analysis.
So why is this just a blog post rather than a pull request?
In reality, QuickCheck uses the IO monad to print additional information to the terminal, and to generate randomness, so that in this instance the true type of
quickCheck would be closer to
(Natural -> Bool) -> IO (Result Natural). But we may imagine a pure variant of QuickCheck whose random number generator can be pre-seeded, and which computes a result purely without writing anything to the terminal. I’m not sure the above idea is adequate motivation to refactor QuickCheck in such a fundamental way. But, if anybody wants to work on this, do let me know.
Good luck to the next person writing a blog post about infinite searchable types!