Fuzz Map is a fuzzer for GUIs that automatically identifies states using code coverage and builds a visual map. Ideally, the map is useful even to people who'd prefer not to read code.
Behind this window is a interactive local demo. Hide this introduction then click to start fuzzing. To show this introduction again, click the button.
Here's a 5-second video of real-time fuzzing on my laptop. Every state or arrow in the map corresponds to one or more codepaths discovered by the fuzzer:
The sandbox renders the React code in the editor (specifically, it
renders <App />
). Click
to update the sandbox, and click the
button to reset it. The map supports pan and zoom.
The beginning of this writeup will focus on features. If, instead, you're primarily
interested in how Fuzz Map works, you can skip ahead to the part about
how it works. There's some fun trivia, e.g. why
a && b
can't be transformed into
(a && (h1, true)) || (h2, b)
.
By the way, everything in this research prototype runs in your browser. No data is uploaded to a server.
Fuzzing an application might reveal an unexpected state or crash in the map. After updating the application's code, you can fuzz the application a second time. Checking the new map helps to verify that the effects of the change are what you expected.
In Fuzz Map, fuzzing the second time normally goes much faster than it did the first
time around. This is because inputs are saved for reuse after recompiling. Inputs are
typically reusable without manual annotation because Fuzz Map uses the event handler's
inline declaration to identify it. For example, if you make a button with an
onClick
event handler:
function App(...) { const handleClick = ...; return <button onClick={handleClick} />; }
then the event handler will be identified using the text "handleClick"
.
This means that even if the definition of handleClick
is changed,
click events will still be reusable. If necessary, you can disambiguate further by
defining the data-fuzzmap-key
HTML attribute.
Errors can occur while rendering a state or while handling an event. Fuzz Map detects errors and displays them in the map.
Here's a potential bug in the Checkout example. Suppose an order contains zero items.
The code for computing the order subtotal has a
reduce
call:
const subtotal = sortedItems .map(([_code, item]) => item.quantity * item.price) .reduce((a, b) => a + b, 0);
If the second argument to reduce
is forgotten, then removing all of the
items from the order by decreasing their quantities to zero results in the following
map:
Errors can also occur in event handlers. Inserting an error into
onClickContinue
results in this map:
Hover over the error icon to reveal the error:
A single event handler can have many different cases, each corresponding to a distinct codepath. Fuzz Map lists out the cases for an input in the Before/After view.
Suppose we use the following modified code to handle changes to item quantities in
onChangeQuantity
:
const quantity = e.target.valueAsNumber; const items = new Map(oldItems); if (quantity === 0) { items.delete(code); } else { const item = items.get(code); items.set(code, { ...item, quantity }); } return items;
After fuzzing, there will be an arrow from the initial state to the initial state with a name like “change 'House Wine'” or “change 'Double-Shot Espresso'”. The name could be different because changing the quantity of any item is handled by the same code.
Click on the label to switch to the Before/After view, which lists the different cases for changing an item quantity.
The modified code doesn't properly handle “Case 2. change 'House Wine' to
''”. When the item's quantity is changed to the empty string,
e.target.valueAsNumber
evaluates to NaN
, and the subtotal
calculation goes wrong.
Click
to highlight the event handler's declaration:A future version of Fuzz Map could use the information it already has to highlight the lines of code that executed for each case. You could select two cases to reveal which branches executed differently. This could be used to debug unexpected states.
The main idea behind Fuzz Map is to distinguish between GUI states using code coverage and to render these states as a simplified map. Coverage-guided fuzzing has been applied with great success for more than a decade. A complete survey is beyond the scope of this writeup; see Manes et al. (2018) for an overview, including a fuzzer genealogy.
A visual map is especially useful for GUI development, where bugs are often obvious to humans but difficult for computers to check. It's almost certainly a bug if a program crashes or reads memory it shouldn't. But what if a button sends the user to the wrong screen, or the interface just looks strange under unexpected circumstances? People already use whiteboards or prototyping software to explain an application's behavior to themselves, or to review it with colleagues. Fuzz Map tries to create the same kind of map automatically.
While Fuzz Map fuzzes, it builds a state graph. Each node in the graph represents an application state. Each edge represents an input processed by an event handler. Every state and event is identified by a hit vector which describes the branches that executed while the application rendered or handled the event. The following program contains two branches:
const name = ...; if (name !== null) { return `Hello, ${name}!`; } else { throw new Error("TODO"); }
If the program runs one time and name
is not null, then the hit vector will
be \( (1, 0) \). If the program runs two times, and both times name
is
null, then the hit vector will be \( (0, 2) \).
A hit vector is a coarse representation of an application state: many different states can correspond to the same hit vector. This simplification is typically useful for Fuzz Map. The actual program being fuzzed (approximately a Turing machine) is always much more complicated than the map that Fuzz Map displays (approximately a finite automaton). Note that this simplification means that it's possible, albeit unlikely, to have two different states with the same hit vector where only one exhibits a bug.
Bugs in an application generally correspond to untested codepaths or interactions
between codepaths. Two states that execute very different code might render the same
HTML. But if a program executes the same code when rendering two screens, then any bug
in the code will generally occur in both cases. Because Fuzz Map uses hit vectors to
identify states, it will only show two states in the map for the previous example: one
where
name
is non-null, and another where name
is null.
The state graph helps Fuzz Map to explore states efficiently. The graph keeps track of the states the fuzzer has seen and the shortest paths between them. Instead of building a graph, Fuzz Map could randomly generate long sequences of inputs. This is surprisingly inefficient. Efficiency is especially important for fuzzing GUIs, which process inputs much more slowly than typical fuzzer targets like parsers or protocols.
To illustrate how random fuzzing can be inefficient, consider an application with a sequence of screens, e.g. for checking out items from an online store. In screen \( 1 \), colored blue, you can only go to the next state, and in screens \( 2 \) through \( n - 1 \) you can go to either the previous or next screen.
How long will it take a random fuzzer to find the final screen \( n \), colored orange? If this is the coupon collector's problem, we'd expect it to take around \( n \log n \) inputs. But because each screen can only be accessed through its adjacent screens, this is actually an instance of the gambler's ruin problem. It will take random fuzzing around \( n^2 \) inputs to find the final orange screen! And the structure of a real GUI is normally even worse for fuzzers. Most GUIs also have many cycles and dead ends; these only make it more likely for the fuzzer to get stuck.
Because Fuzz Map uses code coverage to recognize states it has already seen, it only takes around \( n \) inputs in this case.
For the math behind the \( n^2 \) estimate, see angryavian's answer to an equivalent problem on Math StackExchange. For the general case, see the chapter “Random Walk and Ruin Problems” in An Introduction to Probability Theory and Its Applications, Vol. 1 by Feller. The amazing Internet Archive has an online copy you can borrow, where the analysis starts on page 317.
By the way, if you think it's inconvenient that only one person can borrow the book per hour, book publishers disagree—they think it should be even harder! At the time of writing, Hachette, HarperCollins, Wiley, and Penguin Random House are suing the Internet Archive. The Internet Archive is being defended by the EFF.
Whenever Fuzz Map first sees a state, identified by its hit vector \( \vec s \), it generates all the possible inputs for that state. For example, one input \( i \) might be “change the value of 'Pickup time' to '1:24 AM'”. Then it places all of these inputs into a fuzzing queue. The items in the queue are pairs \( (\vec s, i) \).
Fuzzing takes place in a loop. In each iteration, Fuzz Map dequeues a pair \( (\vec s, i) \) then applies it. If \( \vec s \) does not match the current state, then Fuzz Map travels before applying \( i \). It does this by applying the shortest sequence of inputs it previously recorded as having led to \( \vec s \). A minified set of these shortest paths is also saved for reuse after recompiling to make subsequent fuzzing runs faster. This set is comparable to the seed pool of a conventional fuzzer.
Fuzz Map uses a few simple scheduling heuristics for dequeuing items. When possible, Fuzz Map dequeues an item whose state equals the current state. This is because it is typically more expensive to reset a GUI than to apply an additional input. Otherwise, Fuzz Map will alternate between selecting a random item from the queue and selecting an item with the smallest hit vector.
Fuzz Map dequeues a random item rather than always dequeuing the next item to to try avoid getting stuck in a clique of states that all lead to each other. Dequeuing the item with the smallest hit vector helps to expose behavior that occurs only when a loop does not execute.
These are just heuristics, and future versions of Fuzz Map could be much more
sophisticated. Consider an application which only shows some state if
password == "please"
. Fuzz Map knows through branch coverage that isn't
guessing correctly, but it doesn't currently analyze the program to determine what the
correct password is. A more realistic example for a GUI might be an event handler that
checks whether event.touches.length == 2
. One technique for generating
inputs using program analysis is
concolic testing.
To record branch coverage and obtain hit vectors for nodes (render states) and edges (event handlers), Fuzz Map uses compile-time instrumentation.
Tangentially, Fuzz Map does some things with compile-time instrumentation which could just as well be done at by patching the application runtime, e.g. associating event handlers with their elements. It would have been feasible to patch React or the browser, but it will be much harder to patch other platforms' runtimes. iOS is an obvious example, but another scenario is a corporate environment where you are only allowed to deploy JARs.
Instrumentation is applied in a compiler pass. When Fuzz Map encounters this code:
return isLoading ? "Loading..." : "Done!";
it adds instrumentation and outputs the following:
return isLoading ? (hit(65), "Loading...") : (hit(66), "Done!");
The comma denotes a sequence expression. When the expression
a, b
is evaluated, first a
is evaluated, then b
.
Finally, the entire expression evaluates to the result of b
. Here it's used
to insert the hit counter into the ternary expression without changing its value. There
are weirder tricks for this which will appear later in this section.
hit(n)
increments the hit counter with ID n
in a
global map. The hit counter IDs in this example are \( 65 \) and \( 66 \) because Fuzz
Map packs both a branch ID and a target ID into the hit counter ID. The branch ID in
this case is \( 1 \), and the second branch target has ID \( 2 \), so \( 66 = (1 \ll 6)
+ 2 \). Fuzz Map also uses the most significant bit to record whether a hit counter ID
corresponds to a “high detail” branch; currently these are just function
entry points. The extra information packed into the hit counter ID is used during map
graph generation, discussed later.
You probably know that the Boolean operators in JavaScript short-circuit
and evaluate to the first sub-expression that determines their value, as they do in most
(or all?) languages in the C family. Short-circuiting means that
false && alert("boo!")
and true || alert("boo!")
never alert.
But unless you are familiar with React and JSX, you might not know that short-circuiting
is often used
idiomatically. Here's some code from the Checkout example:
{screen === "OrderConfirmed" && ( <> <h2>Order confirmed</h2> <div className="screenContents"> <p>Your order is confirmed!</p> <p>{STORE_NAME} is preparing your order.</p> ... </div> </> )}
When the “Order Confirmed” screen is not being shown,
screen === "OrderConfirmed"
evaluates to false
, so the entire
expression evaluates to false
. React doesn't render false
, so
the screen doesn't show.
The frequent use of short-circuiting in React made it more important for this demo to instrument it. If the demo worked on C programs compiled to WASM instead, this might have been less important.
As an aside, WASM would have made some things easier. Instead of having to write the instrumentation for each branching operation separately, I could have just instrumented the less numerous control instructions in WASM. Rewriting a WASM binary is a lot easier than rewriting e.g. an x86 binary because WASM does not allow jumps to arbitrary addresses; for the difficulties that creates, see e.g. Bauman et al (2018). More generally, it would be nifty to have the ability to write a compiler pass that operates on a lower-level representation, and to then have the compiler automatically transform this pass into one that operates on a higher-level representation, preserving the original structure of the program whenever possible. I don't know of other applications for this, though; maybe it could be used to implement diff/patch not based on lines.
Because we don't have this ability, here are the separate transformations for
a || b
and a ?? b
(the
nullish coalescing operator):
a || b
becomes
((x) => x ? (hit(1), x) : (hit(2), b))(a)
a ?? b
becomes
((x) => (x === null || x === undefined) ? (hit(2), b) : (hit(1), x))(a)
Any expression can be replaced by an
immediately-invoked function expression
containing a conditional expression. It's necessary because we need to bind the value of
the expression a
so we can use it multiple times while evaluating it only
once. For example, to evaluate a ?? b
, first we compare the value of
a
with null
and undefined
, then we return it.
I previously tried to be clever and transform a || b
to
(a && (h1, true)) || (h2, b)
. This works unless a
is only
truthy, not true, and the result of the expression is not used as a Boolean.
Then the transformed expression evaluates to true
when the original
expression would have evaluated to e.g. 1
.
If this section didn't bore you out of your mind, you might enjoy the game return true to win by @steike.
The state graph is already a simplified version of the program being tested. But typically it'll still be humongous, even for a simple application. When rendered it is almost unreadable. So before layout and rendering, Fuzz Map simplifies the state graph into a map graph. Each node or edge in the map graph corresponds to one or more nodes and edges in the state graph. As a final step, Fuzz Map uses elkjs to create a layout where each node and edge in the map graph has been assigned a position on the screen.
Why is the state graph so large? First, there are many ways that different parts of the program can interact with each other. If one branch is added to the program that is independent from the other branches in the program, it more than doubles the number of possible states. In other words, even when each entry of the hit vector is clamped to \( [0, 1] \), a program with \( n \) branches still has \( 2^n \) possible hit vectors.
Second, each iteration of a loop increments each hit counter it contains. So each loop iteration creates an entirely new state.
During fuzzing, it's useful for the state graph to be large. Recall that each item \( (\vec s, i) \) in the fuzzing queue contains a hit vector \( \vec s \) that identifies the state to which the input \( i \) should be applied. So the more precisely the hit vector \( \vec s \) describes the application state, the more likely we are to successfully apply input \( i \) when we travel to that state again during fuzzing. Suppose we used clamped hit vectors to identify application states. Then we might try to apply an input \( i \) that only works in states where a list has 3 elements to a state where that list has only 2 elements.
It's still possible for us to fail to apply an input \( i \). A hit vector \( \vec s \) might correspond to two states, and \( i \) might apply in only one of them. What if we instead identified each state using the application's actual memory, instead of using hit vectors? This would guarantee that \( i \) would apply, but might make the fuzzer slower. Obviously, we would use much more memory to represent each state.
More subtly, we would apply many redundant operations during fuzzing. Suppose the application behaves exactly the same no matter what a user's name is, so long as they have a name. Then unless we have a way to determine that the difference between two states with different names is insignificant, every different name multiplies the number of items in the fuzzing queue.
The map graph is created by combining nodes and edges in the state graph. This process repeats until no more combinations are possible. Fuzz Map uses a few simple heuristics for simplification. The simplest—but most impactful—is clamping: each entry of a hit vector is clamped to at most \( 1 \). Another heuristic, branch collapsing, is to treat two branch targets as identical when we've seen a state where both were hit. Branch collapsing helps with GUIs which perform processing for each item in a list, like the Checkout example.
This diagram shows an example of clamping during simplification. The two blue nodes in the state graph are combined into one node in the map graph:
Initially the two blue nodes have different hit vectors, \( (1, 0, 2) \) and \( (3, 0, 1) \). But both hit vectors are equivalent after clamping to \( (1, 0, 1) \), so the two nodes are combined. This causes the edge from one blue node to the other to become a loop. Simplification operates on edges as well.
There's a lot more that could be said about map graph simplification. In fact, simplification was the largest source of novelty and complexity in this project. Without simplification, interesting parts of the map are drowned out by exponentially more uninteresting parts. Simplification is the key to making the map visualization possible.
Clamping is essentially local in that it acts on each hit vector independently. Branch collapsing is slightly less local. The next step is to explore global analyses, e.g. identifying when a state differs significantly from others.
Changing tack: it's often possible for Fuzz Map to apply inputs even after the program has been modified. The input paths that Fuzz Map uses to identify input targets contain the event handler's inline declaration. For example, if you render a list of buttons:
function App(...) { ... return list.map((name) => <button onClick={handleClick(name)} />); }
each button's onClick
handler gets an input path:
{ subtree: ?, eventName: 'click', handler: 'App_handleClick_name', index: 1 }
where index: 1
means this handler was the second among its siblings in the
DOM with this handler. Fuzz Map uses the event handler's inline declaration because it's
already common practice not to define event handlers entirely inline. This means most
input paths will be stable without any extra work.
Fuzz Map was designed to work without manual annotation whenever feasible. But if you
need to more precisely identify an element, you can set the
data-fuzzmap-key
HTML attribute on the element or its ancestors. The
subtree
field in the input path is a list of
(key, index)
pairs. If you want, you can give each element a totally unique
data-fuzzmap-key
. Many testing systems either require this for all
interactive elements or fall back to guessing based on style or content.
A heuristic is used to minimize the size of the seed input set: if one seed input sequence is exactly contained in another, then the smaller one is removed. For many applications, it is more expensive to reset the sandbox than to extend an existing input sequence with additional inputs, e.g. when resetting the sandbox requires resetting a database. A more optimal approach would be to compute a minimal path cover.
There are many limitations in the current version of Fuzz Map—hence the alpha label. I'll mention a few of them below. All of these will be addressed in future versions.
Fuzz Map does not generate complex values for inputs. This is the exact opposite of most commonly-used fuzzers! That is also exactly why this wasn't a focus of the demo. Instead of reinventing the wheel, a future version of Fuzz Map will integrate established fuzzers for generating e.g. the text in inputs.
Fuzz Map's state and event model is extensible. The demo does not currently handle
asynchronous events like setTimeout
and fetch
calls. But in a
future version of Fuzz Map, edges will correspond not just to DOM event handlers, but
also to event handlers attached to browser APIs. It will be straightforward to extend
Fuzz Map to instrument and replay these in the same way that it already instruments DOM
events. Here's a quick mockup:
Fuzz Map does not handle event handlers that are defined at runtime. Only controlled components are fuzzed.
I was lazy and didn't handle more exotic branching operations like ??=
and
?.
.
I'd like to especially acknowledge Joey Liaw, who gave me the idea to try and fuzz GUIs over lunch!
I'd also like to especially acknowledge the authors of elkjs and the ELK layout library. Of all the libraries I used for this demo, ELK punched the most above its weight. ELK generates the layout used to render the map graph. A similar program is the classic Graphviz.