Search
๐Ÿ”จ

Breaking Through Binaries: Compiler-quality Instrumentation for Better Binary-only Fuzzing

Created
2022/04/13 05:41
Tags
Fuzzing
ZAFL
Conference
Usenix
Published
2021

Abstract

Fuzzing์€ ์ž๋™ํ™”๋œ ๊ณผ์ •์„ ํ†ตํ•ด ํ”„๋กœ๊ทธ๋žจ์ด ๊ฐ€์ง€๊ณ  ์žˆ๋Š” ์ž ์žฌ์ ์ธ ์ทจ์•ฝ์ ์„ ์‹๋ณ„ํ•˜๋Š”๋ฐ ์‚ฌ์šฉ๋˜๋Š” ๊ธฐ์ˆ ์ž…๋‹ˆ๋‹ค. ๋ณธ ์—ฐ๊ตฌ๋Š” compiler-based ํ™˜๊ฒฝ๋ณด๋‹ค binary-only ํ™˜๊ฒฝ์—์„œ์˜ fuzzing์ด ๋” ์–ด๋ ต๊ณ , ๋น„ํšจ์œจ์ ์ด๋ผ๋Š” ์ ์— ์ฃผ๋ชฉํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. Compiler-based ํ™˜๊ฒฝ์˜ ๊ฒฝ์šฐ compiler๋ฅผ ์ง์ ‘์ ์œผ๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์„ฑ๋Šฅ ํ–ฅ์ƒ์— ์šฉ์ดํ•˜๊ณ , ์ด ๋•Œ๋ฌธ์— ๋Œ€๋ถ€๋ถ„์˜ fuzzing ๊ธฐ๋ฒ•๋“ค์€ compiler-based fuzzing์„ ์ค‘์‹ฌ์œผ๋กœ ๋ฐœ์ „๋˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๋ฐ˜๋ฉด binary-only fuzzing์˜ ๊ฒฝ์šฐ ์ด๋Ÿฐ ์ด์ ๋“ค์„ ์ „ํ˜€ ํ™œ์šฉํ•  ์ˆ˜ ์—†๊ธฐ ๋•Œ๋ฌธ์— ๋น„๊ต์  ๋‚ฎ์€ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ฃผ๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
๊ทธ๋Ÿฌ๋‚˜ ํ˜„์‹ค์—์„œ ๋Œ€๋ถ€๋ถ„์˜ fuzzing์€ closed-source๋“ฑ์˜ ์ด์œ ๋กœ ์†Œ์Šค์ฝ”๋“œ์— ์ ‘๊ทผํ•  ์ˆ˜ ์—†๊ธฐ ๋•Œ๋ฌธ์— binary-only ํ™˜๊ฒฝ์—์„œ ์ง„ํ–‰ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” binary-only fuzzing์—์„œ๋„ compiler-based fuzzing์˜ ์„ฑ๋Šฅ์ด ๋‚˜์˜ฌ ์ˆ˜ ์žˆ๋„๋ก ๋•๋Š” fuzzing-enhancing program์ธ ZAFL์„ ์†Œ๊ฐœํ•ฉ๋‹ˆ๋‹ค.

Background on Fuzzing

์ด ํŒŒํŠธ์—์„œ๋Š” fuzzing์— ๋Œ€ํ•œ ์ „๋ฐ˜์ ์ธ ์„ค๋ช…๊ณผ, binary-only fuzzing์—์„œ์˜ ์–ด๋ ค์›€์„ ์ •๋ฆฌํ–ˆ์Šต๋‹ˆ๋‹ค.

An Overview of Fuzzing

Fuzzing์€ test case๋ฅผ ์ƒ์„ฑํ•˜๊ณ , ์ด ์ผ€์ด์Šค๋“ค์ด ๋Œ€์ƒ ํ”„๋กœ๊ทธ๋žจ์— ๋ฏธ์น˜๋Š” ์˜ํ–ฅ์„ ๊ด€์ฐฐํ•˜๋Š” ๊ฒƒ์„ ๋ฐ˜๋ณตํ•˜๋ฉฐ, ์ตœ์ข…์ ์œผ๋กœ ๋ฒ„๊ทธ๋‚˜ ํฌ๋ž˜์‹œ๋ฅผ ์œ ๋ฐœํ•˜๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ์ž…๋‹ˆ๋‹ค. ๊ธฐ๋ณธ์ ์œผ๋กœ fuzzing์˜ ๊ณผ์ •์€ ์•„๋ž˜ Figure 1๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.
Figure 1: A high-level overview of the basic fuzzing overflow
1.
Instrumentation: ํƒ€๊ฒŸ ํ”„๋กœ๊ทธ๋žจ์„ code coverage tracking๋“ฑ์„ ์œ„ํ•ด ์ˆ˜์ •
2.
Test Case Generation: ์„ ํƒ๋œ ์‹œ๋“œ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ mutation์„ ํ†ตํ•ด test case ์ƒ์„ฑ
3.
Execution Monitoring and Feedback Collection: ๊ฐ test case๋ฅผ ์‹คํ–‰์‹œ์ผœ๋ณด๊ณ , ๊ทธ์— ๋Œ€ํ•œ feedback์ธ instrumentation์„ ์ˆ˜์ง‘
4.
Feedback Decision-making: ์ƒˆ๋กœ์šด code coverage ๋ฐœ๊ฒฌ ๋“ฑ์˜ ์กฐ๊ฑด์„ ์ถฉ์กฑํ•˜๋Š” test case๋งŒ ์œ ์ง€
5.
1๋ฒˆ ๊ณผ์ •๋ถ€ํ„ฐ ๋ฐ˜๋ณต
์œ„์™€ ๊ฐ™์€ ๊ณผ์ •์œผ๋กœ ์ง„ํ–‰์ด ๋˜๋ฉฐ, mutation ๋ฐฉ๋ฒ•์ด๋‚˜ decision-making ๋ฐฉ์‹์— ๋”ฐ๋ผ fuzzer๋ฅผ ๋ถ„๋ฅ˜ํ•  ์ˆ˜ ์žˆ์ง€๋งŒ, ๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” ์ด ๋ถ€๋ถ„์— ๋Œ€ํ•ด์„œ ์ง‘์ค‘์ ์œผ๋กœ ๋‹ค๋ฃจ์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค.

Coverage-guided Grey-box Fuzzing

Coverage-guided Grey-box Fuzzing์€ ๊ฐ€์žฅ ๋งŽ์ด ์“ฐ์ด๋Š” fuzzing ๊ธฐ๋ฒ• ์ค‘ ํ•˜๋‚˜์ž…๋‹ˆ๋‹ค. ์ด ๋ฐฉ์‹์€ ํƒ€๊ฒŸ ํ”„๋กœ๊ทธ๋žจ์— ๋‹จ์œ„ ๋ธ”๋ก(basic block)๋ณ„๋กœ ์‚ฝ์ž… ๋˜์–ด์žˆ๋Š” instrumentation์„ ๊ธฐ๋ฐ˜์œผ๋กœ, code coverage๋ฅผ ์ธก์ •ํ•˜์—ฌ fuzzing์„ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค. ๋˜ํ•œ white-box์™€ black-box์˜ ๋ถ„์„ ๋ฐฉ๋ฒ•์„ ๋ชจ๋‘ ์‚ฌ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— โ€˜grey-boxโ€™๋ผ๊ณ  ์นญํ•ฉ๋‹ˆ๋‹ค.
๋Œ€๋ถ€๋ถ„์˜ fuzzer๋“ค์€ ๋‚ฎ์€ overhead๋ฅผ ์œ„ํ•ด ์ปดํŒŒ์ผ๋ง ๊ณผ์ •์—์„œ instrumentation์„ ์‚ฝ์ž…ํ•˜๊ณ , ์ด๋ฅผ ์œ„ํ•ด ์†Œ์Šค์ฝ”๋“œ๊ฐ€ ํ•„์š”ํ•œ ๊ฒฝ์šฐ๊ฐ€ ๋Œ€๋ถ€๋ถ„์ž…๋‹ˆ๋‹ค. ์†Œ์Šค์ฝ”๋“œ ์ ‘๊ทผ์ด ๋ถˆ๊ฐ€๋Šฅํ•œ binary-only ํ™˜๊ฒฝ์—์„œ๋„ instrumentation์„ ์‚ฝ์ž…ํ•˜์—ฌ fuzzing์„ ์ง„ํ–‰ํ•  ์ˆ˜๋Š” ์žˆ์ง€๋งŒ, ์ด ๊ฒฝ์šฐ ์ปดํŒŒ์ผ๋ง ๋‹จ๊ณ„ ์ˆ˜์ค€์˜ instrumentation์ด ๋ถˆ๊ฐ€๋Šฅํ•˜์—ฌ coverage tracing์—์„œ๋งŒ ์ตœ๋Œ€ 1000%์˜ overhead๋ฅผ ์œ ๋ฐœํ•ฉ๋‹ˆ๋‹ค. โ€˜์œ ์˜๋ฏธํ•œโ€™ test case์— ํ•œํ•ด์„œ๋งŒ ๊ณ„์† fuzzing์„ ์ง„ํ–‰ํ•œ๋‹ค๋Š” ์ ์—์„œ ๋งค์šฐ ๊ฐ•๋ ฅํ•œ ๋ฐฉ๋ฒ•์ด์ง€๋งŒ, binary-only ํ™˜๊ฒฝ์—์„œ๋Š” ์ด๋ฅผ ํšจ์œจ์ ์œผ๋กœ ํ™œ์šฉํ•˜์ง€ ๋ชปํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

Compiler-based Fuzzing Enhancements

Binary-only ํ™˜๊ฒฝ์—์„œ์˜ fuzzing์— ๋Œ€ํ•ด์„œ ์•Œ์•„๋ณด๊ธฐ ์ „์—, ๊ธฐ์กด์˜ compiler-based fuzzing์—์„œ๋Š” ์–ด๋–ค ๋ฐฉ๋ฒ•์„ ํ†ตํ•ด ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ์ด๋ฃจ์—ˆ๋Š”์ง€ ์ •๋ฆฌํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” ์—ฐ์‚ฐ ๊ณผ์ •์„ ํ–ฅ์ƒ์‹œํ‚ค๋Š” Instrumentation Pruning๊ณผ Instrumentation Downgrading, ๊ทธ๋ฆฌ๊ณ  Feedback ๊ณผ์ •์„ ํ–ฅ์ƒ์‹œํ‚ค๋Š” Sub-instruction Profiling๊ณผ Extra-coverage Behavior Tracking์— ๋Œ€ํ•ด์„œ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.
Table 1: Popular compiler-based fuzzing-enhancing program transformations, listed by category and effect
[Performance] Instrumentation Pruning
Graph reducibility technique๋Š” ์ค‘์š”ํ•˜์ง€ ์•Š์€ ๊ธฐ๋ณธ ๋ธ”๋ก์„ ๋ฌด์‹œํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ํ†ตํ•ด ๋Ÿฐํƒ€์ž„ overhead๋ฅผ ๋‚ฎ์ถ”๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. AFL์˜ ๊ฒฝ์šฐ ์ผ์ • ๋น„์œจ์— ํ•œํ•ด์„œ ์ž„์˜์ ์œผ๋กœ graph reduction์„ ํ—ˆ์šฉํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ด์˜ ๊ฒฝ์šฐ ์ค‘์š”ํ•œ coverage๋ฅผ ๋ฌด์‹œํ•  ๊ฐ€๋Šฅ์„ฑ์ด ์ƒ๊ธฐ๊ธฐ ๋•Œ๋ฌธ์—, AFL++์™€ INSTRIM์€ ์‹ค์ œ๋กœ ์˜๋ฏธ๊ฐ€ ์—†๋Š” ๋ธ”๋ก์ธ์ง€ ํ™•์ธํ•˜๊ณ  ๋ฌด์‹œํ•˜๋Š” ๋ฐฉ๋ฒ•์œผ๋กœ ๋ฐœ์ „์‹œ์ผœ ์‚ฌ์šฉํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
[Performance] Instrumentation Downgrading
ํ˜„์žฌ ๋Œ€๋ถ€๋ถ„์˜ fuzzer๋“ค์€ edge๋ฅผ ํ†ตํ•ด coverage๋ฅผ ์ธก์ •ํ•ฉ๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ edge๋ž€, ๋‹จ์œ„ ๋ธ”๋ก ์‚ฌ์ด์˜ branches๋ฅผ ์นญํ•ฉ๋‹ˆ๋‹ค. ๋ณดํ†ต ์ด edge๋ฅผ coverage ์ธก์ •์— ์žˆ์–ด ์ผ์ข…์˜ ํ•ด์‹œ๊ฐ’์œผ๋กœ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ๋‹ค๋งŒ edge hashing์€ ์—ฐ์‚ฐ์ด ํ•„์š”ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ตœ์†Œํ•œ์˜ ์—ฐ์‚ฐ์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ์ „๋ฐ˜์ ์ธ performance์—์„œ ์ค‘์š”ํ•œ ์—ญํ• ์„ ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ CollAFL๋“ฑ์˜ fuzzer๋“ค์€ block์—์„œ instruction์„ ์‚ฝ์ž…ํ•  ๋•Œ ๋” ๊ฐ„๋‹จํ•˜๊ฒŒ ๊ตฌ์„ฑํ•˜์—ฌ overhead๋ฅผ ๊ฐ์†Œ์‹œํ‚ต๋‹ˆ๋‹ค.
[Feedback] Sub-instruction Profiling
๋Œ€๋ถ€๋ถ„์˜ coverage-guided fuzzing์€ magic bytes์™€ ์ฒดํฌ์„ฌ ๊ฐ™์€ ๋ณต์žกํ•œ ์š”์†Œ๊ฐ€ ์žˆ๋Š” ๊ฒฝ์šฐ ์ด๋ฅผ ํŒŒ์•…ํ•˜๋Š” ๊ฒƒ์— ์–ด๋ ค์›€์„ ๊ฒช๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๋Œ€๋ถ€๋ถ„์˜ fuzzer๋“ค์€ ์•ž์„œ ์„ค๋ช…ํ•œ edge์™€ block์„ ๊ธฐ์ค€์œผ๋กœ coverage๋ฅผ ์ธก์ •ํ•˜๊ธฐ ๋•Œ๋ฌธ์—, ์ฒดํฌ์„ฌ ๋“ฑ์˜ ๊ณผ์ •์„ ์ž˜ ์ธ์‹ํ•˜์ง€ ๋ชปํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด honggFuzz, CmpCov, laf-Intel๊ณผ ๊ฐ™์€ fuzzer๋“ค์€ ๋‹จ์ผ ๋ฐ”์ดํŠธ ๋น„๊ต ๋ฐ ํ•˜์œ„ ๋ช…๋ น์–ด ํ”„๋กœํŒŒ์ผ๋ง์„ ํ†ตํ•ด coverage๋ฅผ ๋†’์ด๋ ค๊ณ  ๋…ธ๋ ฅํ•ฉ๋‹ˆ๋‹ค.
[Feedback] Extra-coverage Behavior Tracking
์ „ํ†ต์ ์ธ ์ฝ”๋“œ ๋ฒ”์œ„์—์„œ ๋ฒ—์–ด๋‚˜ ์‹คํ–‰ ๋™์ž‘๊นŒ์ง€ ํฌํ•จํ•˜์—ฌ coverage๋ฅผ ์ธก์ •ํ•˜๋Š” ๊ฒƒ์€ ํ˜„์žฌ fuzzing์˜ ์—ฐ๊ตฌ ๋ถ„์•ผ ์ค‘ ํ•˜๋‚˜์ž…๋‹ˆ๋‹ค. ๊ธฐ์กด์˜ fuzzer๋“ค์€ edge๋ฅผ ํ•˜๋‚˜์˜ ์ง‘ํ•ฉ์œผ๋กœ ๊ฐ„์ฃผํ•˜์—ฌ coverage๋ฅผ ์ธก์ •ํ•˜๊ธฐ ๋•Œ๋ฌธ์—, ์ˆœ์„œ, ์ฆ‰ context์— ๋Œ€ํ•ด์„œ๋Š” ์ธก์ •ํ•˜์ง€ ๋ชปํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด A โ†’ B โ†’C์™€ B โ†’ A โ†’C์™€ ๊ฐ™์€ edge๋“ค์˜ ๊ฒฝ์šฐ context-insensible coverage๋Š” ๋‘ edge๊ฐ€ ๊ฐ™๋‹ค๊ณ  ๊ฐ„์ฃผํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ context-sensitive coverage๋Š” B โ†’ C์™€ A โ†’ C๋กœ ๊ตฌ๋ถ„ํ•˜์—ฌ coverage๋ฅผ ๊ณ ๋ คํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

Binary-only Fuzzing: the Bad & the Ugly

Binary-only fuzzing์€ compiler-based ํ™˜๊ฒฝ์—์„œ์˜ ์žฅ์ ๋“ค์„ ํ™œ์šฉํ•  ์ˆ˜ ์—†๊ธฐ ๋•Œ๋ฌธ์— ์„ฑ๋Šฅ๊ณผ ํšจ์œจ์ด ๋น„๊ต์  ๋‚ฎ์€ ํŽธ์ž…๋‹ˆ๋‹ค. ๋ณธ ์žฅ์—์„œ๋Š” ๊ธฐ์กด์˜ binary-only fuzzing์ด ๊ฐ€์ง€๊ณ  ์žˆ๋Š” ํ•œ๊ณ„์ ์— ๋Œ€ํ•ด์„œ ํƒ๊ตฌํ•ฉ๋‹ˆ๋‹ค.

Limitations of Existing Platforms

Compiler-based ํ™˜๊ฒฝ์—์„œ๋Š” ์ปดํŒŒ์ผ๋Ÿฌ์˜ ๋น ๋ฅธ ์ฒ˜๋ฆฌ ์†๋„๋ฅผ ์ด์šฉํ•˜์—ฌ coverage ์ธก์ •์„ ํ•˜์ง€๋งŒ, binary-only ํ™˜๊ฒฝ์—์„œ๋Š” ๋‹ค์Œ ์„ธ ๊ฐ€์ง€ ๋ฐฉ๋ฒ• ์ค‘ ํ•˜๋‚˜๋ฅผ ํ†ตํ•ด ์ถ”์ ๋ฉ๋‹ˆ๋‹ค.
[1] Hardware-assisted Tracing
Intel PT๋“ฑ์˜ ์ตœ์‹  ํ”„๋กœ์„ธ์„œ๋“ค์˜ ๊ฒฝ์šฐ, binary code coverage๋ฅผ ์‰ฝ๊ฒŒ ์•Œ์•„๋‚ผ ์ˆ˜ ์žˆ๋„๋ก ํ•˜๋Š” ๋ฉ”์ปค๋‹ˆ์ฆ˜์ด ์ œ๊ณต๋ฉ๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ์ด ๊ฒฝ์šฐ๋Š” ๋น„์šฉ์ด ๋งค์šฐ ๋†’๊ณ , compiler-based์— ๋น„ํ•ด 50%๋„˜๋Š” ์˜ค๋ฒ„ํ—ค๋“œ๊ฐ€ ๋ฐœ์ƒํ•œ๋‹ค๋Š” ๋‹จ์ ์ด ์กด์žฌํ•ฉ๋‹ˆ๋‹ค.
[2] Dynamic Binary Translators
Dynamic binary translating์€ ํ”„๋กœ๊ทธ๋žจ์ด ์‹คํ–‰๋˜๋Š” ๋™์•ˆ coverage-tracing ํ•˜๋Š” ๋ฐฉ์‹์„ ๋งํ•ฉ๋‹ˆ๋‹ค. ๋Œ€ํ‘œ์ ์œผ๋กœ DynamoRIO, PIN, QEMU๋“ฑ์˜ ํ”„๋กœ๊ทธ๋žจ์ด ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ๋งŽ์€ ์•„ํ‚คํ…์ฒ˜์™€ ๋ฐ”์ด๋„ˆ๋ฆฌ ํ˜•์‹์„ ์ง€์›ํ•œ๋‹ค๋Š” ๊ฒƒ์ด ํฐ ์žฅ์ ์ด์ง€๋งŒ, fuzzing์˜ ๊ด€์ ์œผ๋กœ ๋ณด์•˜์„ ๋•Œ๋Š” ์˜ค๋ฒ„ํ—ค๋“œ๊ฐ€ 600%๊นŒ์ง€ ๋ฐœ์ƒํ•˜์—ฌ ํšจ์œจ์„ฑ์ด ๋งŽ์ด ๋–จ์–ด์ง„๋‹ค๋Š” ๋‹จ์ ์ด ์žˆ์Šต๋‹ˆ๋‹ค.
[3] Static Binary Rewriters
๋งˆ์ง€๋ง‰์œผ๋กœ static binary rewriting ๊ธฐ๋ฒ•์˜ ๊ฒฝ์šฐ, ๋ฐ”์ด๋„ˆ๋ฆฌ๋ฅผ ์‹คํ–‰์‹œํ‚ค๊ธฐ ์ง์ „์— ๋ฐ”์ด๋„ˆ๋ฆฌ๋ฅผ ๋ณ€ํ˜•์‹œํ‚ค๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. Static binary rewriting๊ธฐ๋ฒ•์˜ ๊ฒฝ์šฐ AFL-Dynist, RetroWrite, Uroboros, Ramblr ๋“ฑ ๋‹ค์–‘ํ•œ fuzzer๋“ค์ด ์ด ๊ธฐ๋ฒ•์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๋ชจ๋‘ binary-only ํ™˜๊ฒฝ์˜ ๊ฒฝ์šฐ ์ œํ•œ์ ์œผ๋กœ ๊ธฐ๋Šฅ์„ ์ œ๊ณตํ•˜๊ฑฐ๋‚˜ Linux ํ”„๋กœ๊ทธ๋žจ์—์„œ๋งŒ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•˜๋‹ค๋Š” ๋‹จ์  ์ด ์กด์žฌํ•ฉ๋‹ˆ๋‹ค.
Table 2: A qualitative comparison of the leading coverage-tracing methodologies currently used in binary-only coverage-guided fuzzing, alongside compiler instrumentation (LLVM).
์œ„์˜ Table์„ ๋ณด๋ฉด ์•Œ ์ˆ˜ ์žˆ๋“ฏ์ด, ๊ธฐ์กด binary-only ํ™˜๊ฒฝ์˜ platform๋“ค์˜ ์„ฑ๋Šฅ์„ ๋ถ„์„ํ•ด๋ณด์•˜์„ ๋•Œ, compiler-based fuzzer๋“ค์˜ ์„ฑ๋Šฅ์„ ๋”ฐ๋ผ๊ฐ€์ง€ ๋ชปํ•˜๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Fundamental Design Considerations

๋‹ค์‹œ compiler-based fuzzing์œผ๋กœ ๋Œ์•„์™€์„œ, compiler ์ˆ˜์ค€์—์„œ์˜ instrumentation ์ž‘์—…์ด binary-only fuzzing์—์„œ๋ณด๋‹ค ํ›จ์”ฌ ํšจ๊ณผ์ ์ด์—ˆ๊ธฐ ๋•Œ๋ฌธ์— ํ•ด๋‹น ๋ฐฉ๋ฒ•๋ก ์— ๋Œ€ํ•ด ๊ณ ๋ฏผํ•ด๋ณผ ํ•„์š”๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ์—ฐ๊ตฌํŒ€์—์„œ๋Š” compiler-based ํ™˜๊ฒฝ์—์„œ fuzzer๋“ค์ด transformation ๋ฐฉ๋ฒ•์—์„œ ํฌ๊ฒŒ 4๊ฐ€์ง€ ์„ ํƒ์„ ํ•˜๋Š” ๊ฒƒ์„ ๋ฐœ๊ฒฌํ•˜์˜€๊ณ , ๋ณธ ์žฅ์—์„œ๋Š” ๊ฐ๊ฐ์˜ ์ค‘์š”์„ฑ์— ๋Œ€ํ•ด ๋…ผ์˜ํ•˜๊ณ  compiler-quality instrumentation์— ๊ฐ€์žฅ ์ ํ•ฉํ•œ ํŠน์„ฑ์„ ์„ ์ •ํ•ฉ๋‹ˆ๋‹ค.
[1] Rewriting versus Translation
Dynamic translation์€ ๊ธฐ๋ณธ์ ์œผ๋กœ emulation์„ ํ†ตํ•ด ๋ฐ”์ด๋„ˆ๋ฆฌ์˜ ์†Œ์Šค ๋ช…๋ น์–ด ์ŠคํŠธ๋ฆผ์„ ์‹คํ–‰ํ•˜๊ณ  ๋ณ€๊ฒฝ(translate)ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋งŽ์€ ๋น„์šฉ์„ ํ•„์š”๋กœ ํ•ฉ๋‹ˆ๋‹ค. ๋ฐ˜๋ฉด์— ํƒ€๊ฒŸ์„ ์‹คํ–‰ํ•˜๊ธฐ์— ์•ž์„œ์„œ ์ •์ ์œผ๋กœ ์šฐ์„  ๋ถ„์„ํ•˜์—ฌ ์žฌ์ž‘์„ฑ(rewrite)ํ•  ๊ฒฝ์šฐ, ํ›จ์”ฌ ์ ์€ ์–‘์˜ ๋น„์šฉ(๋Ÿฐํƒ€์ž„ ๋“ฑ๋“ฑ)์„ ํ•„์š”๋กœ ํ•˜๋ฉฐ, ์ด ๊ฒฝ์šฐ compiler-quality ์†๋„๋ฅผ ๋”ฐ๋ผ๊ฐˆ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ZAFL์˜ ๊ฒฝ์šฐ์—๋„ static rewriting์„ ํ†ตํ•œ instrumentation code๋ฅผ ์ถ”๊ฐ€ํ•˜์˜€์Šต๋‹ˆ๋‹ค.
[2] Inlining versus Trampolining
๋‹ค์Œ์€ coverage๋ฅผ ์ธก์ •ํ•  ๋ฐฉ๋ฒ•์„ ์„ ํƒํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๋Œ€๋ถ€๋ถ„ inlining๊ณผ trampolining ์‚ฌ์ด์—์„œ ์„ ํƒ์„ ํ•ฉ๋‹ˆ๋‹ค. Trampolining์˜ ๊ฒฝ์šฐ, instrumentation์„ ๋งŒ๋‚ฌ์„ ๋•Œ ๋ณ„๋„์˜ ํŽ˜์ด๋กœ๋“œ ํ•จ์ˆ˜๋ฅผ ์ถ”๊ฐ€์ ์œผ๋กœ ํ˜ธ์ถœํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋œปํ•ฉ๋‹ˆ๋‹ค. ๋‹น์—ฐํžˆ payload ํ•จ์ˆ˜๋ฅผ ํ˜ธ์ถœํ•˜๊ณ , ํ˜ธ์ถœํ•œ ํ•จ์ˆ˜์—๊ฒŒ ๋‹ค์‹œ ๋Œ์•„๊ฐ€์•ผ ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋†’์€ ์˜ค๋ฒ„ํ—ค๋“œ๊ฐ€ ๋ฐœ์ƒํ•˜๊ฒŒ ๋œ๋‹ค. ๋ฐ˜๋ฉด inlining์˜ ๊ฒฝ์šฐ, basic block ๋‚ด์— instrumentation ๊ธฐ๋Šฅ์„ ์‚ฝ์ž…ํ•˜๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•œ๋‹ค. ์•„ ๊ฒฝ์šฐ trampolining๊ณผ ๊ฐ™์ด ๊ธฐ์กด์˜ control flow๋ฅผ ๋ฒ—์–ด๋‚  ํ•„์š”๊ฐ€ ์—†๊ธฐ ๋•Œ๋ฌธ์— ์˜ค๋ฒ„ํ—ค๋“œ๋ฅผ ์ค„์ด๋Š” ๋ฐ ํ•„์ˆ˜์ ์ด๋‹ค. ๋”ฐ๋ผ์„œ ZAFL์˜ ๊ฒฝ์šฐ์—๋„ inlining ๊ธฐ๋ฒ•์„ ์ฑ„ํƒํ•˜์˜€์Šต๋‹ˆ๋‹ค.
[3] Register Allocation
๋ฉ”๋ชจ๋ฆฌ ์ ‘๊ทผ์€ ์„ฑ๋Šฅ์— ์žˆ์–ด์„œ ์ง€์—ฐ์„ ๋ฐœ์ƒ์‹œํ‚ต๋‹ˆ๋‹ค. ์˜ค๋ฒ„ํ—ค๋“œ๋ฅผ ์ค„์ด๊ธฐ ์œ„ํ•ด์„œ ํ•„์š”ํ•œ ๋ ˆ์ง€์Šคํ„ฐ๋งŒ ์ ‘๊ทผํ•˜๋Š” ๊ฒƒ์ด ๊ผญ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ์ปดํŒŒ์ผ๋Ÿฌ๋Š” ๋ณดํ†ต ์ž˜ ์‚ฌ์šฉํ•˜์ง€ ์•Š๋Š” ์ฝ”๋“œ ๋ ˆ์ง€์Šคํ„ฐ๋ฅผ ๊ฐ€๋Šฅํ•œ ๋งŽ์ด ์ €์žฅ/๋ณต์›ํ•˜์ง€ ์•Š๋„๋ก ๋ ˆ์ง€์Šคํ„ฐ ํ™œ์„ฑ ์ƒํƒœ๋ฅผ ์ถ”์ ํ•ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์ปดํŒŒ์ผ๋Ÿฌ ํ’ˆ์งˆ์˜ ์ด์ง„ ๊ณ„์ธก ์†๋„๋ฅผ ๋‹ฌ์„ฑํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ์Šค๋งˆํŠธ ๋ ˆ์ง€์Šคํ„ฐ ํ• ๋‹น์ด ํ•„์ˆ˜์ ์ž…๋‹ˆ๋‹ค.
[4] Real-world Scalability
ํ˜„๋Œ€์˜ ์ปดํŒŒ์ผ๋Ÿฌ๋“ค์€ ๋‹ค์–‘ํ•œ ์ปดํŒŒ์ผ ์–ธ์–ด, ๋ฐ”์ด๋„ˆ๋ฆฌ ํ˜•์‹, ํ”Œ๋žซํผ์„ ์ง€์›ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์œ„์—์„œ ์‚ดํŽด๋ณธ ๋งŽ์€ ํ”„๋กœ๊ทธ๋žจ๋“ค์€ ์ผ๋ถ€ ์ธ๊ธฐ์žˆ๋Š” ํ˜•์‹๋“ค๋งŒ ์ง€์›ํ•˜๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋Œ€ํ‘œ์ ์œผ๋กœ ๋งŽ์€ fuzzer๋“ค์ด Linux์—๋งŒ ํ˜ธํ™˜ ๊ฐ€๋Šฅํ•˜๋ฉฐ Windows 64bit PE32+์˜ ๊ฒฝ์šฐ ์ง€์›๋˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋Œ€๋‹ค์ˆ˜์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ๋ณธ ์—ฐ๊ตฌํŒ€์ด ์ถ”๊ตฌํ•˜๋Š” compiler-quality binary-only fuzzing instrumenting์˜ ๊ฒฝ์šฐ ๋‹ค์–‘ํ•œ ๋ฐ”์ด๋„ˆ๋ฆฌ ํ˜•์‹์„ ์ง€์›ํ•˜๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ํ•ฉ๋‹ˆ๋‹ค.

The ZAFL Platform

๋ณธ ์—ฐ๊ตฌํŒ€์€ compiler-based ํ™˜๊ฒฝ์—์„œ์˜ ์žฅ์ ๊ณผ, binary-only ํ™˜๊ฒฝ์—์„œ์˜ ๋‹จ์ ์„ ์ทจํ•ฉํ•˜์—ฌ ํ•˜๋‚˜์˜ ์ƒˆ๋กœ์šด ํ”Œ๋žซํผ์„ ํƒ„์ƒ์‹œ์ผฐ์Šต๋‹ˆ๋‹ค. ZAFL์€ ์ปดํŒŒ์ผ๋Ÿฌ ์ˆ˜์ค€์˜ ์ฒ˜๋ฆฌ๋Ÿ‰์„ closed-source ๋ฐ”์ด๋„ˆ๋ฆฌ์— ๋Œ€ํ•ด์„œ ๊ตฌํ˜„ํ•ฉ๋‹ˆ๋‹ค. ZAFL์˜ ํŠน์ง•์€ staticํ•˜๊ฒŒ inline instrumentation์ด ์ง„ํ–‰๋˜๋ฉฐ, liveness awareness๋˜ํ•œ ๊ณ ๋ คํ•˜์—ฌ ์ˆ˜ํ–‰๋ฉ๋‹ˆ๋‹ค. ๋‚˜์•„๊ฐ€ x86-64 ELF ๋ฐ”์ด๋„ˆ๋ฆฌ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ PE32+์˜ ํ˜•์‹๋„ ์ง€์›ํ•ฉ๋‹ˆ๋‹ค. ์•„๋ž˜๋Š” ZAFL์˜ ์ „๋ฐ˜์ ์ธ ๊ตฌ์กฐ๋ฅผ diagram์œผ๋กœ ํ‘œํ˜„ํ•œ ๊ฒƒ์ž…๋‹ˆ๋‹ค.
Figure 2: A high-level depiction of the ZAFL platform architecture and its four ZAX transformation and instrumentation phases.
Figure 3: Figure 2์˜ ๊ณผ์ •์„ zafl ์‹คํ–‰ log๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ 3๊ฐœ์˜ step์œผ๋กœ ๋‚˜๋ˆˆ ๋ชจ์Šต.
Figure 2์— ๋ณด์ด๋Š” ๊ฒƒ์ฒ˜๋Ÿผ, ZAFL์€ ํฌ๊ฒŒ 1) static rewriting์„ ์ง„ํ–‰ํ•œ ๋’ค, 2) ZAX transformation์„ ์ˆ˜ํ–‰ํ•˜์—ฌ ์ตœ์ ํ™”๋œ IR(Intermediate Representation)์„ ์ถ”์ถœํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ 3) ์ตœ์ ํ™”๋œ IR์„ ๋‹ค์‹œ ์žฌ๊ตฌ์„ฑ์„ ํ•˜๊ฒŒ ๋˜๋ฉด output binary๊ฐ€ ์ƒ์„ฑ๋ฉ๋‹ˆ๋‹ค. ์ด๋ ‡๊ฒŒ ์ถ”์ถœ๋œ binary๋Š” AFL๋“ฑ์˜ fuzzer์— input์œผ๋กœ ์‚ฌ์šฉ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๊ณผ์ •์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

Static Rewriting

ZAFL์€ ํšจ์œจ์ ์ธ instrumentation์„ ์œ„ํ•ด dynamic transformation์ด ์•„๋‹Œ static writing ๊ธฐ๋ฒ•์„ ์ฑ„ํƒํ•ฉ๋‹ˆ๋‹ค. ๊ธฐ๋ณธ์ ์œผ๋กœ static writing์€ Zipr์ด๋ผ๋Š” binary rewriting ํ”„๋กœ๊ทธ๋žจ์„ ํ†ตํ•ด ์ด๋ฃจ์–ด์ง‘๋‹ˆ๋‹ค.
Zipr์€ ์ˆœ์„œ๋Œ€๋กœ rida, pda_register, fill_in_cfg, fill_in_indtargs, fix_calls์˜ ๋‹จ๊ณ„๋ฅผ ๊ฑฐ์ณ ์ˆ˜ํ–‰๋ฉ๋‹ˆ๋‹ค. ๊ฒฐ๋ก ์ ์œผ๋กœ ์ด ๊ณผ์ •์„ ํ†ตํ•ด์„œ input ๋ฐ”์ด๋„ˆ๋ฆฌ์— ๋Œ€ํ•œ IR data structure๊ฐ€ ์ƒ์„ฑ๋ฉ๋‹ˆ๋‹ค. ๊ฐ ๋‹จ๊ณ„์— ๋Œ€ํ•œ ์ž์„ธํ•œ ์„ค๋ช…์€ ๋…ผ๋ฌธ์˜ ๋ฒ”์œ„๋ฅผ ๋ฒ—์–ด๋‚˜๊ธฐ ๋•Œ๋ฌธ์— ์ถ”๊ฐ€์ ์œผ๋กœ ์กฐ์‚ฌํ•˜์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค.

ZAX Transformation

Zipr์„ ํ†ตํ•ด ์ƒ์„ฑ๋œ IR์„ ๊ธฐ๋ฐ˜์œผ๋กœ ZAFL์€ ZAX transformation์„ ์ˆ˜ํ–‰ํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ZAX Transformation์€ optimization, analysis, point selection, ๊ทธ๋ฆฌ๊ณ  application์œผ๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ๊ณผ์ •์„ ํ†ตํ•ด ์ตœ์ ํ™”๋œ IR data structure์„ ์ƒ์„ฑํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
์ฐธ๊ณ ๋กœ ZAX Transformation์€ ํ•ต์‹ฌ ๋‚ด์šฉ์„ ์ •๋ฆฌํ•จ๊ณผ ๋™์‹œ์— ์‹ค์ œ ์ฝ”๋“œ๋ฅผ ํ•จ๊ป˜ ๋ถ„์„ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ZAFL ์‚ฌ์šฉ์‹œ ์˜ต์…˜์— ๋”ฐ๋ผ ๊ณผ์ •์ด ์กฐ๊ธˆ์”ฉ ๋‹ฌ๋ผ์ง€์ง€๋งŒ, ZAX Transformation ์ž์ฒด์— ์ง‘์ค‘ํ•˜๊ณ ์ž ์˜ต์…˜ ์—†์ด ์‹คํ–‰ํ–ˆ์„ ๋•Œ๋ฅผ ๊ฐ€์ •ํ•˜์—ฌ Transformation ๊ณผ์ •์„ ๋ถ„์„ํ•˜์˜€์Šต๋‹ˆ๋‹ค.
[1] Optimization
ZAX Transformation์˜ ์ฒซ ๋ฒˆ์งธ ๋‹จ๊ณ„์ธ Optimization ๋‹จ๊ณ„๋ฅผ ํ†ตํ•ด์„œ๋Š” ์ฃผ์–ด์ง„ ์กฐ๊ฑด์— ๋”ฐ๋ผ์„œ ์ตœ์ ํ™”๋ฅผ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ณผ์ •์€ control flow๋ฅผ ๋ณ€๊ฒฝ์‹œํ‚ค๊ธฐ ๋•Œ๋ฌธ์—, ์ด ๊ณผ์ •์„ ๊ฐ€์žฅ ๋จผ์ € ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค. Optimization ๋‹จ๊ณ„๋Š” filterBlocksByDomgraph ํ•จ์ˆ˜๋กœ ๊ตฌํ˜„๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ด ํ•จ์ˆ˜์˜ ํ•ต์‹ฌ ๋‚ด์šฉ์€ ์•„๋ž˜ for๋ฌธ์ž…๋‹ˆ๋‹ค.
Figure 4: filterBlocksByDomgraph์˜ ์ผ๋ถ€.
๊ณผ์ •์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค. ํ•จ์ˆ˜ ์‹คํ–‰ ์‹œ ์ „๋‹ฌ๋ฐ›์€ basic block set์„ ๊ธฐ๋ฐ˜์œผ๋กœ, dominator tree๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ์ƒ์„ฑ๋œ dominator tree๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ, ๊ฐ branch์— ๋Œ€ํ•ด successor ์—ฌ๋ถ€์— ๋”ฐ๋ผ์„œ CFG ์ตœ์ ํ™” ์—ฌ๋ถ€๋ฅผ ๊ฒฐ์ •ํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ณผ์ •์—์„œ ์™œ dominator tree๋ฅผ ํ™œ์šฉํ•˜๋Š”์ง€์— ๋Œ€ํ•ด์„œ๋Š” ์•„๋ž˜ Extending Compiler-quality Transforms to Binary-only Fuzzing์—์„œ ์„œ์ˆ ํ•˜์˜€์Šต๋‹ˆ๋‹ค.
๊ฒฐ๋ก ์ ์œผ๋กœ Compiler-based Fuzzing Enhancements์—์„œ ๋‹ค๋ฃฌ instrumentation pruning ๊ณผ์ •๊ณผ ๋งค์šฐ ๋น„์Šทํ•œ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ๊ณผ์ •์„ ํ†ตํ•ด ์ตœ์ ์˜ Control-Flow graph๋ฅผ ์ƒ์„ฑํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
[2] Analysis & Point Selection
1๋‹จ๊ณ„์—์„œ ์ƒ์„ฑ๋œ Control-Flow Graph๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ, Control-Flow ๋ถ„์„์„ ์ง„ํ–‰ํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์ด ๊ณผ์ •์„ ํ†ตํ•ด์„œ๋Š” predecessor successor, data-flow, and dominance relationships ๋“ฑ์˜ Meta-characteristics๋ฅผ ์ถ”์ถœํ•ฉ๋‹ˆ๋‹ค. ์ถ”์ถœ๋œ Meta-characteristics๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ, ์‹ค์ œ๋กœ instruction์„ ์‚ฝ์ž…ํ•  ๋ธ”๋ก์„ ํŒ๋‹จํ•ฉ๋‹ˆ๋‹ค. ์ด ๋ฐฉ๋ฒ•์€ ๋ชจ๋“  ๋ธ”๋ก์„ ๋‚˜์—ดํ•œ ๋’ค, ์กฐ๊ฑด์— ๋ถ€ํ•ฉํ•˜์ง€ ์•Š๋Š” ๋ถ€๋ถ„์„ ์ œ๊ฑฐํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค. Analysis์™€ Point Selection ๋‹จ๊ณ„๋Š” getBlocksToInstrumentํ•จ์ˆ˜์— ๊ตฌํ˜„๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค. ์•„๋ž˜๋Š” ํ•จ์ˆ˜์˜ ํ•ต์‹ฌ ๋‚ด์šฉ์ž…๋‹ˆ๋‹ค.
Figure 5: getBlocksToInstrument์˜ ์ผ๋ถ€.
ํ•ด๋‹น for๋ฌธ์„ ๋ณด๋ฉด, if๋ฌธ์„ ํ†ตํ•ด์„œ instrumentation์„ ํฌํ•จํ•˜์ง€ ์•Š์„ ๋ธ”๋ก์„ ์ œ์™ธํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ ๊ตฌํ˜„ํ•œ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ๋ฐฉ๋ฒ•์€ compiler-based ํ™˜๊ฒฝ์—์„œ์˜ fuzzing enhancements๊ธฐ๋ฒ• ์ค‘ ํ•˜๋‚˜์ธ instrumentation downgrading์„ ์ฑ„ํƒํ•˜์˜€์Šต๋‹ˆ๋‹ค. ์ด ๊ณผ์ •์— ๋Œ€ํ•œ ์ž์„ธํ•œ ์„ค๋ช…์€ ์•„๋ž˜ Extending Compiler-quality Transforms to Binary-only Fuzzing์—์„œ ์„œ์ˆ ํ•˜์˜€์Šต๋‹ˆ๋‹ค.
[3] Application
๋งˆ์ง€๋ง‰์œผ๋กœ ZAX Transformation์—์„œ ๋ณ€ํ™˜ํ•œ ๊ฒƒ๋“ค์„ ์ ์šฉ์‹œํ‚ต๋‹ˆ๋‹ค. ์ด ๊ณผ์ •์—์„œ ์ค‘์š”ํ•œ ๊ฒƒ์€ ์–ด๋–ป๊ฒŒ ๊ฐ ์œ„์น˜์— instrument๋ฅผ ์‚ฝ์ž…ํ•˜๋Š”์ง€๊ฐ€ ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค. ํ”„๋กœ๊ทธ๋žจ์ด ์ •์ƒ์ ์œผ๋กœ ์ฒ˜๋ฆฌ๋˜๋Š” ์„  ์•ˆ์—์„œ instrumentation์ด ์ง„ํ–‰๋˜์–ด์•ผ ํ•˜๊ธฐ ๋•Œ๋ฌธ์—, ๋ ˆ์ง€์Šคํ„ฐ๊ฐ€ ํ˜„์žฌ ์‚ฌ์šฉ๋˜๊ณ  ์žˆ๋Š”์ง€ ํŒ๋ณ„ํ•˜์—ฌ(๋…ผ๋ฌธ์—์„œ๋Š” liveness๋ฅผ ํŒ๋ณ„ํ•œ๋‹ค๊ณ  ํ‘œํ˜„ํ•˜์˜€์Šต๋‹ˆ๋‹ค) instrumentation์„ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค.
๋˜ํ•œ, ์ƒํƒœ ๋ ˆ์ง€์Šคํ„ฐ๋Š” ๋‹ค๋ฅธ ๋ ˆ์ง€์Šคํ„ฐ๋ณด๋‹ค ๋” ๋น„์šฉ์ด ๋งŽ์ด ๋“ค๊ธฐ ๋•Œ๋ฌธ์— ์„ฑ๋Šฅ ํ–ฅ์ƒ์— ์žˆ์–ด ๊ณ ๋ คํ•ด์•ผ ํ•˜๋Š” ๋ถ€๋ถ„ ์ค‘ ํ•˜๋‚˜์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์•ž์„œ ๋งํ–ˆ๋˜ liveness ํŒ๋ณ„์„ ํ†ตํ•ด ๊ธฐ์กด์˜ ๊ณผ์ •์„ ํ•ด์น˜์ง€ ์•Š์œผ๋ฉด์„œ, ์ตœ๋Œ€์˜ ํšจ์œจ์„ ๊ฐ€์งˆ ์ˆ˜ ์žˆ๋„๋ก ์„ค๊ณ„ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ์ฝ”๋“œ์ƒ์—์„œ๋Š” ์ด ๊ณผ์ •์„ instrumentBasicBlock์ด๋ผ๋Š” ํ•จ์ˆ˜์—์„œ ์ฒ˜๋ฆฌํ•˜์˜€์Šต๋‹ˆ๋‹ค.
Figure 6: instrumentBasicBlock์˜ ์ผ๋ถ€(1).
ํ•ด๋‹น ํ•จ์ˆ˜์—์„œ ๊ฐ€์žฅ ๋จผ์ € ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ ๋ ˆ์ง€์Šคํ„ฐ๋ฅผ ํŒŒ์•…ํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ณผ์ •์—์„œ ๋”ฐ๋กœ ๋น„์šฉ์„ ๊ณ„์‚ฐํ•˜์—ฌ ์šฐ์„ ์ˆœ์œ„๋ฅผ ๋งค๊ธฐ์ง€๋Š” ์•Š์ง€๋งŒ, eflags๋“ฑ์˜ ์ƒํƒœ ๋ ˆ์ง€์Šคํ„ฐ๊ฐ€ ์—†๋Š” ๊ฒƒ์œผ๋กœ ๋ฏธ๋ฃจ์–ด ๋ณด์•„ ์ ์€ ๋น„์šฉ์„ ํ•„์š”๋กœ ํ•˜๋Š” ๋ ˆ์ง€์Šคํ„ฐ๋“ค๋งŒ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.
Figure 7: instrumentBasicBlock์˜ ์ผ๋ถ€(2).
๋‹ค์Œ์œผ๋กœ๋Š” ์‹ค์ œ instrumentation์„ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค. Instrument๋ฅผ ์‚ฝ์ž…ํ•  ๋•Œ๋Š” ์ฒ˜์Œ ์ง„ํ–‰ํ•˜๋Š” ๊ฒฝ์šฐ insertAssemblyBefore, ์ดํ›„๋กœ๋Š” insertAssemblyAfter๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์•ž์„œ ์„ค๋ช…ํ•œ ๊ฒƒ์ฒ˜๋Ÿผ, ZAFL์€ instrumentation์„ ์ง„ํ–‰ํ•  ๋•Œ ํ•จ์ˆ˜๋ฅผ ์ด์šฉํ•˜๋Š” trampolining์ด ์•„๋‹Œ, ๋ฐ”๋กœ ์–ด์…ˆ๋ธ”๋ฆฌ์–ด๋ฅผ ์‚ฝ์ž…ํ•˜๋Š” inlining ๊ธฐ๋ฒ•์„ ์‚ฌ์šฉํ•œ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
Figure 8: instrumentBasicBlock์˜ ์ผ๋ถ€(3).
๋งˆ์ง€๋ง‰์œผ๋กœ ์‚ฌ์šฉํ–ˆ๋˜ ๋ ˆ์ง€์Šคํ„ฐ๋“ค์„ ๋‹ค์‹œ ๋ณต๊ตฌํ•˜์—ฌ ํ”„๋กœ๊ทธ๋žจ ์‹คํ–‰์— ์ง€์žฅ์ด ์—†๋„๋ก ํ•ฉ๋‹ˆ๋‹ค. ์œ„์˜ ๊ณผ์ •๊ณผ ๋น„๊ตํ•˜์˜€์„ ๋•Œ assembly์˜ ์ˆœ์„œ๊ฐ€ ๋ฐ˜๋Œ€์ธ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
ํŠนํžˆ ์ด ๊ณผ์ •์—์„œ getContextSensitivity๋“ฑ์˜ ํ•จ์ˆ˜๋ฅผ ํ†ตํ•ด context-sensitive tracking์„ ์ง„ํ–‰ํ•˜๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ๊ณผ์ • ๋˜ํ•œ ์•„๋ž˜ Extending Compiler-quality Transforms to Binary-only Fuzzing์—์„œ ์ž์„ธํžˆ ์„œ์ˆ ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

Reconstitute Modified IR

๋งˆ์ง€๋ง‰์œผ๋กœ ZAX Transformation๊นŒ์ง€ ๊ฑฐ์นœ IR data structure์„ ๊ธฐ๋ฐ˜์œผ๋กœ, ๋งˆ์ง€๋ง‰์œผ๋กœ binary rewriting์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ณผ์ • ๋˜ํ•œ Figure 3์—์„œ ๋ณผ ์ˆ˜ ์žˆ๋“ฏ์ด Zipr์ด ์ง„ํ–‰ํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์ด๋ ‡๊ฒŒ binary rewriting์ด ์™„๋ฃŒ๋œ ๋ฐ”์ด๋„ˆ๋ฆฌ๋Š” โ€˜.zaflโ€™์ด๋ผ๋Š” ํ™•์žฅ์ž๋ฅผ ๊ฐ€์ง„ ์ƒํƒœ๋กœ output ๋ฐ”์ด๋„ˆ๋ฆฌ๊ฐ€ ๋ฉ๋‹ˆ๋‹ค. ์ด ์ƒํƒœ์˜ ๋ฐ”์ด๋„ˆ๋ฆฌ๋Š” AFL์˜ input์œผ๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

Extending Compiler-quality Transforms to Binary-only Fuzzing

๋ณธ ์—ฐ๊ตฌํŒ€์€ ZAFL์„ ๊ฐœ๋ฐœํ•˜๋Š” ๊ณผ์ •์—์„œ ์ด 5๊ฐ€์ง€์˜ LLVM ๊ธฐ๋ฐ˜ fuzzing transformation์„ ๊ตฌํ˜„ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๋ณธ ๋ฆฌ๋ทฐ์—์„œ๋Š” ์ถ”๊ฐ€์ ์œผ๋กœ ZAFL ํŒ€์ด ์–ด๋–ค ๋ฐฉ์‹์œผ๋กœ ๊ฐ transform์„ ๊ตฌํ˜„ํ–ˆ๋Š”์ง€ ๋ถ„์„ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

Performance Transforms

Single-Successor Path Pruning
Single-Successor Path Pruning์€ flow graph๋ฅผ ์ตœ์ ํ™”ํ•˜๋Š” ๋ฐฉ๋ฒ• ์ค‘ ํ•˜๋‚˜๋กœ, ZAFL์˜ ๊ฒฝ์šฐ์—๋Š” AFL-Dynist์—์„œ ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๊ตฌํ˜„ํ•˜์˜€์Šต๋‹ˆ๋‹ค. AFL-Dynist์—์„œ๋Š” entry block์ด ์•„๋‹ˆ๋ฉด์„œ, ๋‹จ์ผ successor๋ฅผ ๊ฐ€์งˆ ๊ฒฝ์šฐ ์ตœ์ ํ™”๋ฅผ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ฒฝ์šฐ์—๋Š” ๋ฌด์กฐ๊ฑด ๋„๋‹ฌํ•  ๊ฒƒ์œผ๋กœ ์˜ˆ์ƒํ•  ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์ถ”๊ฐ€์ ์œผ๋กœ instrumentation์„ ์ง„ํ–‰ํ•˜์ง€ ์•Š๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ZAX Transformation์˜ ๋‹จ๊ณ„์—์„œ๋Š” Analysis๋‹จ๊ณ„์—์„œ ์‚ฝ์ž…์ด ๋˜์—ˆ์œผ๋ฉฐ, filterEntryBlock์—์„œ ๊ตฌํ˜„๋œ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
Figure 9: filterEntryBlock์˜ ์ผ๋ถ€.
์ฝ”๋“œ์—์„œ๋„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋“ฏ์ด, ์šฐ์„ ์ ์œผ๋กœ successor์˜ ๊ฐœ์ˆ˜๋ฅผ ํ™•์ธํ•˜์—ฌ ๋‹จ์ผ successor์ธ ๊ฒฝ์šฐ input๊ฐ’์ธ p_in_out์—์„œ ์ œ์™ธ๋˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
Dominator Tree CFG Pruning
ZAX Transformation์˜ Optimization ๊ณผ์ •์—์„œ ์–ธ๊ธ‰ํ–ˆ๋‹ค์‹œํ”ผ, ZAFL์€ dominator tree๋ฅผ ํ†ตํ•ด CFG ์ตœ์ ํ™”๋ฅผ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค. Dominator tree๋Š” ๊ธฐ๋ณธ์ ์œผ๋กœ A๋ผ๋Š” ์ฝ”๋“œ ๋ถ€๋ถ„์ด ์‹คํ–‰๋˜์—ˆ์„ ๋•Œ, B๋ผ๋Š” ์ฝ”๋“œ ๋ถ€๋ถ„ ๋˜ํ•œ ์‹คํ–‰๋˜์—ˆ๋Š”์ง€ ์ฐพ๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉ๋˜๋Š” ๊ฐœ๋…์ž…๋‹ˆ๋‹ค.
๊ฒฐ๋ก ์ ์œผ๋กœ ์ด ์ด๋ก ์„ ์‚ฌ์šฉํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ๋ชจ๋“  Flow Graph์— ๋Œ€์‘ํ•˜๋Š” Dominator Tree๊ฐ€ ์กด์žฌํ•œ๋‹ค๋Š” ๊ฒƒ์„ ์ฆ๋ช…ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ฆ๋ช…์€ [6]์œผ๋กœ ๋Œ€์ฒดํ•ฉ๋‹ˆ๋‹ค.
๊ทธ๋Ÿฌ๋ฉด ์ฒ˜์Œ์— ์–ธ๊ธ‰ํ–ˆ๋˜ โ€˜A๋ผ๋Š” ์ฝ”๋“œ ๋ถ€๋ถ„์ด ์‹คํ–‰๋˜์—ˆ์„ ๋•Œ, B๋ผ๋Š” ์ฝ”๋“œ ๋ถ€๋ถ„ ๋˜ํ•œ ์‹คํ–‰๋˜์—ˆ๋Š”์ง€โ€™์— ๋Œ€ํ•ด์„œ๋„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์•ž์„œ ์ฆ๋ช…ํ•œ ๋‚ด์šฉ์„ ๋ฐ”ํƒ•์œผ๋กœ ํŒ๋‹จํ•˜์˜€์„ ๋•Œ, dominator tree์ƒ์—์„œ A๋ผ๋Š” ์ •์ ์ด B๋ผ๋Š” ์ •์ ์˜ ์กฐ์ƒ์ธ์ง€๋ฅผ ํŒ๋ณ„ํ•˜๋Š” ๊ฒƒ๊ณผ ๋™์น˜์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ dominator tree๋ฅผ ์ด์šฉํ•˜๋ฉด ๊ทธ๋ž˜ํ”„์˜ ๊ฒฝ๋กœ ์กด์žฌ ์—ฌ๋ถ€๋ฅผ ๋‹จ์ˆœํ™”ํ•  ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— instrumentation์„ ์ตœ์†Œํ™”ํ•˜๊ธฐ ์œ„ํ•œ ์„ ํƒ์œผ๋กœ ๋ณด์ž…๋‹ˆ๋‹ค.
Edge Instrumentation Downgrading
CollAFL์€ ๊ธฐ์กด์˜ AFL์˜ edge coverage๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ, ๋ธ”๋ก์„ ์ตœ์†Œํ™”ํ•˜์—ฌ ์ปค๋ฒ„๋ฆฌ์ง€๋ฅผ ์ตœ์ ํ™”ํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” ์•ž์„œ์„œ๋„ ๋…ผ์˜ํ–ˆ๋˜ Single-Successor Path Pruning๋ฐฉ๋ฒ•๊ณผ ๋น„์Šทํ•ฉ๋‹ˆ๋‹ค. ๋งŒ์•ฝ ๋ธ”๋ก์ด ๋‹จ์ผ predecessor๋ฅผ ๊ฐ€์ง„๋‹ค๋ฉด, ํ•ด๋‹น ๋ธ”๋ก์€ ๊ทธ edge๋ฅผ ๋Œ€ํ‘œํ•œ๋‹ค๊ณ  ๋งํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ๋ฐฉ๋ฒ•์„ ์ด์šฉํ•  ๊ฒฝ์šฐ start์™€ end point์˜ ํ•ด์‹ฑํ•˜๋Š” ๋น„์šฉ์„ ์•„๋‚„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ZAFLํŒ€์€ ์ด ๋ฐฉ๋ฒ•์„ ์ ์šฉํ–ˆ๊ณ , ์ „์ฒด ๋‹จ์œ„ ๋ธ”๋ก ์ค‘ 35~45%๊ฐ€ ์ด ์ตœ์ ํ™”๋ฅผ ์ ์šฉํ•  ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.

Feedback Transforms

Sub-instruction Profiling
Sub-instruction Profiling์ด๋ž€ ๋ณต์žกํ•œ ์กฐ๊ฑด๋ถ€ ๋ถ€๋ถ„์„ single-byte ๋น„๊ต๋ฌธ์œผ๋กœ ๋ถ„ํ•ดํ•˜์—ฌ fuzzing์„ ์ง„ํ–‰ํ•˜๋„๋ก ํ•˜๋Š” ๊ณผ์ •์ž…๋‹ˆ๋‹ค. ์ด๋ ‡๊ฒŒ ํ•จ์œผ๋กœ์„œ ์ด๋Ÿฐ ์ œ์•ฝ์กฐ๊ฑด์„ ๋งž์ถ”๋Š” ๊ณผ์ •์„ ์ถ”์ ํ•  ์ˆ˜ ์žˆ๊ณ , mutation๊ณผ์ •์„ ์•„๋‚„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
Compiler-based ํ™˜๊ฒฝ์—์„œ๋Š” ์ด๋ฅผ ์ค‘์ฒฉ ๋น„๊ต๋ฅผ ๊ธฐ์กด์˜ ๋น„๊ต๋ฌธ์„ ๋Œ€์ฒดํ•˜์—ฌ ๊ตฌํ˜„ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ feedback์„ ์ฆ๊ฐ€์‹œํ‚ค๋Š” ๊ฒƒ์ด ์ฃผ๋œ ๋ชฉํ‘œ์ด๊ธฐ ๋•Œ๋ฌธ์—, ZAFL์—์„œ๋Š” ๊ธฐ์กด์˜ ๋น„๊ต๋ฌธ ์•ž์— ์ค‘์ฒฉ ๋น„๊ต๋ฅผ ์‚ฝ์ž…ํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ ๊ตฌํ˜„ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ์ด ๋ฐฉ๋ฒ•์œผ๋กœ ๊ตฌํ˜„ํ•  ๊ฒฝ์šฐ compiler-based ํ™˜๊ฒฝ์—์„œ ๊ตฌํ˜„ํ–ˆ์„ ๋•Œ์™€ ๋™์ผํ•œ ๊ฒฐ๊ณผ๋ฅผ ๊ฐ€์ ธ์˜ค๊ธฐ ๋•Œ๋ฌธ์— ์ด ๋ฐฉ๋ฒ•์„ ์„ ํƒํ•œ ๊ฒƒ์œผ๋กœ ๋ณด์ž…๋‹ˆ๋‹ค. ํŠนํžˆ IR์—์„œ cmp๊ฐ™์€ ๋น„๊ต ๊ตฌ๋ฌธ์„ ์ฐพ์•„์„œ ๋ฐ”์ดํŠธ๋งˆ๋‹ค ์ค‘์ฒฉ ๋น„๊ต๋ฅผ ํ•˜๊ฒŒ๋” ๊ตฌํ˜„ํ•˜์˜€์Šต๋‹ˆ๋‹ค.
Context-sensitive Tracking
๋งˆ์ง€๋ง‰์œผ๋กœ context-sensitive tracking์ž…๋‹ˆ๋‹ค. ์•ž์„œ ์กฐ์‚ฌํ–ˆ๋˜ Extra-coverage Behavior Tracking์ฒ˜๋Ÿผ, ํ˜ธ์ถœํ•  ๋•Œ์˜ ์ƒํ™ฉ๊นŒ์ง€ ๊ณ ๋ คํ•˜์—ฌ trackingํ•˜๋Š” ๊ฒƒ์„ ๋œปํ•ฉ๋‹ˆ๋‹ค. ZAFL์—์„œ๋Š” ๊ฐ ํ•จ์ˆ˜๋ฅผ ๋žœ๋ค ๊ฐ’์œผ๋กœ ๋ฐฐ์ •ํ•˜์—ฌ ๊ธฐ๋Šฅ ์ˆ˜์ค€์˜ ์ปจํ…์ŠคํŠธ ๊ฐ๋„๋ฅผ ์ƒ์„ฑํ•˜๊ณ , ์ด ๊ฐ’์€ ํ•จ์ˆ˜ ์ž…๋ ฅ/์ข…๋ฃŒ ์‹œ edge hashing ์—ฐ์‚ฐ์— ์‚ฌ์šฉ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ZAFL์—์„œ๋Š” addContextSensitivity ๋“ฑ์˜ ํ•จ์ˆ˜๋ฅผ ํ†ตํ•ด ๊ตฌํ˜„ํ–ˆ์Šต๋‹ˆ๋‹ค.
๊ฒฐ๋ก ์ ์œผ๋กœ ZAFL์—๋Š” ์œ„์™€ ๊ฐ™์ด 5๊ฐ€์ง€์˜ low-level์˜ API๋ฅผ ์‚ฝ์ž…ํ•˜์—ฌ binary fuzzing์ด ๊ฐ€์ง€์ง€ ๋ชปํ–ˆ๋˜ ๋†’์€ ์„ฑ๋Šฅ์„ ๊ตฌํ˜„ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

Evaluation

๊ฐœ๋ฐœ์ง„๋“ค์€ ZAFL์ด ์‹ค์ œ๋กœ ์ข‹์€ ์„ฑ๋Šฅ์„ ๊ฐ€์ง„ fuzzer์ธ์ง€ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด ๋ช‡ ๊ฐ€์ง€ ํ‰๊ฐ€๋“ค์„ ์ง„ํ–‰ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ํ‰๊ฐ€๋ฅผ ์ง„ํ–‰ํ•  ๋•Œ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๊ธฐ์ค€์œผ๋กœ ์ง„ํ–‰ํ•˜์˜€์Šต๋‹ˆ๋‹ค.
โ€ข
๋Œ€์ƒ: 8๊ฐœ์˜ open-source์™€ 5๊ฐœ์˜ closed-source ๋ฐ”์ด๋„ˆ๋ฆฌ
โ€ข
๊ธฐ๊ฐ„: 8์ผ๋™์•ˆ fuzzing์„ ์ง„ํ–‰
โ€ข
์„ฑ๋Šฅ: ๋ฐœ์ƒํ•˜๋Š” overhead๋ฅผ ๊ธฐ์ค€์œผ๋กœ ์ธก์ •
โ€ข
์ •ํ™•๋„: open-source์—์„œ ๋ณต๊ตฌ๋˜์ง€ ์•Š์€ instruction์— ๋Œ€ํ•ด AFL-LLVM๊ณผ ๋น„๊ต
โ€ข
ํ™•์žฅ์„ฑ: ์ž๋™ํ™”๋œ smoke-test๋‚˜ ์ง์ ‘ ์‹คํ–‰ํ•˜์—ฌ ์—ฌ๋ถ€ ํ™•์ธ
์ž์„ธํ•œ ํ‰๊ฐ€ ๋‚ด์šฉ์€ ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค.
Does ZAFL enhance binary fuzzing?
๊ฐ€์žฅ ๋จผ์ € ZAFL์ด binary-only ํ™˜๊ฒฝ์˜ fuzzing์—์„œ ๋” ๋‚˜์€ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ฃผ๋Š”์ง€ ์ธก์ •ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ์ด ๊ณผ์ •์—์„œ๋Š” AFL-Dynist, AFL-QEMU, ๊ทธ๋ฆฌ๊ณ  ZAFL์„ ๋น„๊ตํ•˜์˜€์Šต๋‹ˆ๋‹ค.
Figure 11: Real-world software fuzzing unique triaged crashes averaged over 8*24 hour trials.
์œ„ ๊ทธ๋ฆผ์˜ ๊ฒฝ์šฐ ์‹ค์ œ software์„ ๋Œ€์ƒ์œผ๋กœ fuzzing์„ ์ง„ํ–‰ํ•˜์˜€์„ ๋•Œ ์ฃผ์–ด์ง„ test case์— ๋น„ํ•ด ํ‰๊ท ์ ์œผ๋กœ ์–ผ๋งˆ๋‚˜ ํฌ๋ž˜์‹œ๋ฅผ ๋ฐœ๊ฒฌํ•ด๋‚ด๋Š”์ง€์— ๋Œ€ํ•ด ๊ธฐ๋กํ•œ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๊ทธ๋ž˜ํ”„์—์„œ๋„ ๋ณผ ์ˆ˜ ์žˆ๋“ฏ์ด, ZAFL์ด ๋Œ€๋ถ€๋ถ„ ๋” ๋†’์€ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ฃผ๊ณ  ์žˆ์Œ์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ณ„์‚ฐ ๊ฒฐ๊ณผ ZAFL์ด ํ‰๊ท ์ ์œผ๋กœ AFL-Dynist๋ณด๋‹ค๋Š” 26%, ๊ทธ๋ฆฌ๊ณ  AFL-QEMU๋ณด๋‹ค๋Š” 131% ๋” ๋งŽ์ด ํฌ๋ž˜์‹œ๋ฅผ ๋ฐœ๊ฒฌํ–ˆ์Šต๋‹ˆ๋‹ค.
Is ZAFLโ€™s speed near compilersโ€™?
๋‹ค์Œ์œผ๋กœ๋Š” ZAFL์˜ ์†๋„๊ฐ€ ์‹ค์ œ ์ปดํŒŒ์ผ๋Ÿฌ์˜ ์†๋„์™€ ๋น„์Šทํ•œ ์ง€์— ๋Œ€ํ•ด์„œ ํ‰๊ฐ€ํ•ด๋ณด์•˜์Šต๋‹ˆ๋‹ค. ๋น„๊ต ๋Œ€์ƒ์€ ์ปดํŒŒ์ผ๋Ÿฌ, ์–ด์…ˆ๋ธ”๋Ÿฌ, AFL-Dynist, ๊ทธ๋ฆฌ๊ณ  AFL-QEMU์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ ZAFL-none๊ณผ ZAFL์˜ ๋‹ค์–‘ํ•œ version์ธ ZAFL-fsrvr, ZAFL-perf, ZAFL-all์ด ์ ์šฉ๋œ ๊ฒฝ์šฐ๋ฅผ ๋‚˜๋ˆ„์–ด ๊ฒฐ๊ณผ๋ฅผ ์ธก์ •ํ•˜์˜€์Šต๋‹ˆ๋‹ค.
Figure 12: Compiler, assembler, AFL-Dynist, AFL-QEMU, and ZAFL fuzzing instrumentation performance relative to baseline (higher is better).
๊ฒฐ๋ก ์ ์œผ๋กœ ZAFL์€ ์ปดํŒŒ์ผ๋Ÿฌ์— ๋น„ํ•ด ํฌ๊ฒŒ ๋’ค๋–จ์–ด์ง€์ง€ ์•Š๋Š” ์†๋„๋ฅผ ๋ณด์—ฌ์ฃผ์—ˆ์Šต๋‹ˆ๋‹ค. ์ปดํŒŒ์ผ๋Ÿฌ์™€ ์–ด์…ˆ๋ธ”๋Ÿฌ์˜ ๊ฒฝ์šฐ ๊ฐ๊ฐ 24%์™€ 34%์˜ overhead๊ฐ€ ๋ฐœ์ƒํ•˜์˜€๋Š”๋ฐ, ZAFL์˜ ๊ฒฝ์šฐ ๋ฒ„์ „์— ๋”ฐ๋ผ ์ƒ์ดํ•จ์„ ๋ณด์˜€์ง€๋งŒ 27~32%์˜ overhead๊ฐ€ ๋ฐœ์ƒํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๋น„๋ก ์ปดํŒŒ์ผ๋Ÿฌ์— ๋น„๊ตํ–ˆ์„ ๋•Œ๋Š” ๋” ๋‚ฎ์€ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์คฌ์ง€๋งŒ, AFL-Dynist์™€ AFL-QEMU๊ฐ€ ๊ฐ 88%, 256%์˜ overhead๋ฅผ ๋ฐœ์ƒ์‹œํ‚ค๋Š” ๊ฒƒ์„ ๊ณ ๋ คํ•˜์˜€์„ ๋•Œ ์—„์ฒญ๋‚œ ์„ฑ์žฅ์„ ์ด๋ฃจ์—ˆ๋‹ค๊ณ  ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
Can ZAFL support real closed-source?
๋‹ค์Œ์œผ๋กœ๋Š” ZAFL์ด ์‹ค์ œ๋กœ closed-source์— ๋Œ€ํ•ด์„œ ์ข‹์€ ์„ฑ๊ณผ๋ฅผ ๋ณด์—ฌ์ฃผ๋Š”์ง€์— ๋Œ€ํ•ด์„œ ํ‰๊ฐ€ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ํ‰๊ฐ€๋Š” 5๊ฐœ์˜ ์ทจ์•ฝ์ ์— ์ข…๋ฅ˜์— ๋Œ€ํ•ด์„œ ์–ผ๋งˆ๋‚˜ ๋นจ๋ฆฌ, ๊ทธ๋ฆฌ๊ณ  ๋งŽ์ด ๋ฐœ๊ฒฌํ•ด๋‚ด๋Š”์ง€์— ๋Œ€ํ•ด์„œ ํ‰๊ฐ€ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๋น„๊ต ๋Œ€์ƒ์€ AFL-Dynist์™€ AFL-QEMU์ž…๋‹ˆ๋‹ค.
Table 13: Mean time-to-discovery of closed-source binary bugs found for AFL-Dyninst, AFL-QEMU, and ZAFL over 5ร—24-hour fuzzing trials. โœ— = bug is not reached in any trials for that instrumenter configuration.
๊ทธ ๊ฒฐ๊ณผ, ZAFL์ด AFL-Dynist๋ณด๋‹ค๋Š” 55%, ๊ทธ๋ฆฌ๊ณ  AFL-QEMU๋ณด๋‹ค๋Š” 38% ๋” ํฌ๋ž˜์‹œ๋ฅผ ๋งŽ์ด ๋ฐœ๊ฒฌํ•œ ๊ฒƒ์„ ์•Œ์•„๋‚ผ ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค. ๋”๋ถˆ์–ด, ํ‰๊ท ์ ์œผ๋กœ ZAFL์€ ์ด ๋ฒ„๊ทธ๋“ค์„ ์ฐพ์•„๋‚ด๋Š”๋ฐ 660%, 113%(๊ฐ๊ฐ AFL-Dynist, AFL-QEMU) ๋” ๋น ๋ฅธ ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.
Is ZAFL precise?
๋‹ค์Œ์œผ๋กœ๋Š” ZAFL์˜ ์ •ํ™•๋„์— ๋Œ€ํ•ด์„œ ํ‰๊ฐ€ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ์ •ํ™•๋„๋ฅผ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด์„œ LLVM-10 objdump์™€ ์ •ํ™•๋„๋ฅผ ๋น„๊ตํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๋น„๊ต๊ตฐ์€ IDA Pro์™€ Binary Ninja๊ฐ€ ๊ฐ์ง€ํ•ด๋‚ผ ์ˆ˜ ์žˆ๋Š” instrumentation์„ ๋น„๊ตํ•˜์˜€์Šต๋‹ˆ๋‹ค.
Table 14: Instruction recovery statistics for IDA Pro, Binary Ninja, and ZAFL, with ground-truth disassembly from LLVM-10โ€™s objdump. Reached = mean unrecovered instructions reached by fuzzing (hence, erroneouslyunrecovered); FalseNeg = erroneously-unrecovered instructions over total.
๊ฒฐ๋ก ์ ์œผ๋กœ ZAFL์ด ๊ฐ€์žฅ ๋†’์€ ์ •ํ™•๋„๋ฅผ ๋ณด์˜€์Šต๋‹ˆ๋‹ค. ํŠนํžˆ LLVM-10์˜ objdump์™€ ๋น„๊ตํ•ด๋ณด์•˜์„ ๋•Œ, ZAFL์„ ํ†ตํ•ด ๋„๋‹ฌํ•˜์ง€ ๋ชปํ•˜๋Š” instruction์€ ์—†์—ˆ์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ ์ปค๋ฒ„๋ฆฌ์ง€ ์ •ํ™•๋„์˜ ํ‰๊ท ์„ ๋‚ด๋ณด์•˜์„ ๋•Œ 99.99%๋ผ๋Š” ๊ฒฐ๊ณผ๋ฅผ ๋„์ถœํ•  ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.
Does ZAFL Scale?
๋งˆ์ง€๋ง‰์œผ๋กœ ZAFL์ด ๋‹ค์–‘ํ•œ ํ˜•์‹์„ ์ง€์›ํ•˜๋Š”์ง€(scalability)์— ๋Œ€ํ•ด์„œ ํ‰๊ฐ€ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ZAFL์€ open-source, closed-source, ๊ทธ๋ฆฌ๊ณ  Windows ๋ฐ”์ด๋„ˆ๋ฆฌ๊นŒ์ง€ ๋ชจ๋‘ ํ˜ธํ™˜ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค. ๋ณธ ์—ฐ๊ตฌํŒ€์€ ๊ฒฐ๋ก ์ ์œผ๋กœ ์ด 56๊ฐœ์˜ ๋ฐ”์ด๋„ˆ๋ฆฌ์—์„œ ZAFL์ด ์ •์ƒ์ ์œผ๋กœ ๋™์ž‘ํ•œ๋‹ค๋Š” ๊ฒƒ์„ ํ™•์ธํ•˜์˜€์Šต๋‹ˆ๋‹ค. ์ด 56๊ฐœ์˜ ๋ฐ”์ด๋„ˆ๋ฆฌ๋Š” 1) Linux์™€ Window ๋ฐ”์ด๋„ˆ๋ฆฌ, 2) stripped, PIE, ๊ทธ๋ฆฌ๊ณ  non-PIEํ˜•์‹, 3) 100KB์™€ 100MB์‚ฌ์ด์˜ ํฌ๊ธฐ์˜ ๋ฐ”์ด๋„ˆ๋ฆฌ, ๊ทธ๋ฆฌ๊ณ  4)100์—์„œ 1,000,000๊ฐœ์˜ ๋‹จ์œ„ ๋ธ”๋ก์„ ๊ฐ€์ง„ ๋ฐ”์ด๋„ˆ๋ฆฌ๋ฅผ ๋ชจ๋‘ ์ง€์›ํ•œ๋‹ค๋Š” ๊ฒƒ์„ ๊ฒ€์ฆํ•˜์˜€์Šต๋‹ˆ๋‹ค.

Conclusion

๋ณธ ์—ฐ๊ตฌ๋Š” compiler-based fuzzing๊ณผ binary-only fuzzing ์‚ฌ์ด์˜ ๊ฐ„๊ทน์„ ์ค„์ด๊ธฐ ์œ„ํ•ด ์ง„ํ–‰ํ–ˆ๋˜ ์—ฐ๊ตฌ์˜€์Šต๋‹ˆ๋‹ค. ํŠนํžˆ performance์™€ feedback์˜ ์ธก๋ฉด์œผ๋กœ ๋‚˜๋ˆ„์–ด ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ์ด๋ฃฌ ๊ฒƒ์ด ๊ฐ€์žฅ ์ธ์ƒ๊นŠ์—ˆ์Šต๋‹ˆ๋‹ค.
๋‹ค๋งŒ ์•„์‰ฌ์› ๋˜ ์ ์€ ZAFL์ด static rewriting์˜ ๋ถ€๋ถ„์— ์žˆ์–ด์„œ๋Š” ๊ธฐ์กด์˜ binary rewriter์ธ Zipr์— ๋Œ€๋ถ€๋ถ„ ์˜์กดํ•œ ์ ์ž…๋‹ˆ๋‹ค. Zipr๋„ ์ข‹์€ binary rewriter์ด์ง€๋งŒ, ์ด ๋•Œ๋ฌธ์— ๋‚œ๋…ํ™”๊ฐ€ ์ง„ํ–‰๋œ ๋ฐ”์ด๋„ˆ๋ฆฌ์˜ ๊ฒฝ์šฐ์—๋Š” ์ข‹์€ ์„ฑ๋Šฅ์ด ๋‚˜์˜ค์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ตœ๊ทผ automatic deobfuscation ๋“ฑ์˜ ์—ฐ๊ตฌ๊ฐ€ ๊ณ„์†๋˜๊ณ  ์žˆ๊ธฐ ๋•Œ๋ฌธ์—, ์ถ”ํ›„ ์—ฐ๊ตฌ์—์„œ ์ด๋ฅผ ์ ์šฉํ•  ๊ฒฝ์šฐ ๋” ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์ผ ์ˆ˜ ์žˆ์„ ๊ฒƒ์œผ๋กœ ์˜ˆ์ƒ๋ฉ๋‹ˆ๋‹ค.
๋…ผ๋ฌธ review๋ฅผ ์ง„ํ–‰ํ•˜๋ฉด์„œ, ZAFL์— ๋Œ€ํ•ด์„œ ์ง์ ‘ ์†Œ์Šค์ฝ”๋“œ๋ฅผ ๋ถ„์„ํ•˜๊ณ  ๊ตฌ์กฐ๋ฅผ ํŒŒ์•…ํ•ด๋ณด์•˜์Šต๋‹ˆ๋‹ค. ํŠนํžˆ ZAFL์„ ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๊ธฐ ์œ„ํ•ด ๊ฐœ๋ฐœ์ง„๋“ค์ด ํ–ˆ๋˜ ์„ ํƒ์„ ์™œ ํ–ˆ๋Š”์ง€์— ๋Œ€ํ•ด ์ง์ ‘ ์ฆ๋ช…ํ•ด๋ณธ ๊ฒƒ์ด ์žฌ๋ฏธ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฒˆ ๊ณผ์ œ๋ฅผ ํ†ตํ•ด ๊ธฐ์ˆ ์„ ์ตœ์ ํ™”ํ•˜๋Š” ๊ณผ์ •๊ณผ, ๊ทธ๋ฆฌ๊ณ  ๋‹ค๋ฅธ ํ™˜๊ฒฝ์—์„œ ๊ธฐ์ˆ ์„ ๊ตฌํ˜„ํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ๋งŽ์ด ๋ฐฐ์šธ ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.

References

[1] Nagy, S., Nguyen-Tuong, A., Hiser, J. D., Davidson, J. W., & Hicks, M. (2021, September). Breaking Through Binaries: Compiler-quality Instrumentation for Better. USENIX. https://www.usenix.org/system/files/sec21-nagy.pdf
[2] USENIX. (2021.09.04). USENIX Security '21 - Breaking Through Binaries: Compiler-quality Instrumentation for Better [Video]. Youtube. https://www.youtube.com/watch?v=8Z-5aTpk_l0
[3] J. Hiser, A. Nguyen-Tuong, W. Hawkins, M. McGill, M. Co, J. Davidson. (2017, November). Zipr++: Exceptional Binary Rewriting. FEASTโ€™ 17. https://dl.acm.org/doi/pdf/10.1145/3141235.3141240
[6] T. Lengauer, R.E. Tarjan. (1979). A Fast Algorithm for Finding Dominators in a Flowgraph. Stanford University. https://www.cs.princeton.edu/courses/archive/spr03/cs423/download/dominators.pdf