Using code coverage to improve fuzzing results
Hi all,
I’m Lars Opstad, an engineering manager in the MSEC Science group supporting the SDL within Microsoft. I wanted to share with you some of the ways that we are improving our internal security practices, specifically in the area of file fuzzing.
Many fuzzers take a good file (template) as a starting point for creating malformed content. For years, dating back to Writing Secure Code in 2002, the SDL has encouraged people to fuzz their file parsers with a “good representative set” of template files. The question is “how do I tell if I have a good set?” At Microsoft, our answer is code coverage, which is in line with other external research (Guruswami, Miller, and Sutton, Greene and Amini to cite only a few).
Code Coverage Definition
Before getting into how we use code coverage for fuzzing, I should briefly define the terms. If you are well versed in these concepts, feel free to skip to “Using Coverage for Better Fuzzing.”
Code coverage is a measure of how well a set of code is exercised based on a set of tests or inputs. The simplest scheme of code coverage is at the function level. Functional coverage measures which functions were called and which functions were not called. While this is useful at a very broad level, it does not really provide the granularity necessary to measure meaningful coverage for a file parser. A better metric is “block” coverage. A basic block in coverage terms is a section of code that always executes in sequence with no jumps in or out. Take the following code sample: