-
Notifications
You must be signed in to change notification settings - Fork 14
/
Copy pathl03-brop.html
851 lines (735 loc) · 32.4 KB
/
l03-brop.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
<h2>Baggy bounds (continued)</h2>
<p><strong>Note:</strong> These lecture notes were slightly modified from the ones posted on the 6.858 <a href="http://css.csail.mit.edu/6.858/2014/schedule.html">course website</a> from 2014.</p>
<p><strong>Example code:</strong> (assume that <code>slot_size = 16</code>):</p>
<pre><code> char *p = malloc(44); // Note that the nearest power of 2 (i.e.,
// 64 bytes) are allocated. So, there are
// 64/(slot_size) = 4 bounds table entries
// that are set to log_2(64) = 6.
char *q = p + 60; // This access is ok: It's past p's object
// size of 44, but still within the baggy
// bounds of 64.
char *r = q + 16; // ERROR: r is now at an offset of 60+16=76
// from p. This means that r is (76-64)=12
// beyond the end of p. This is more than
// half a slot away, so baggy bounds will
// raise an error.
char *s = q + 8; // s is now at an offset of 60+8=68 from p.
// So, s is only 4 bytes beyond the baggy
// bounds, which is less than half a slot
// away. No error is raised, but the OOB
// high-order bit is set in s, so that s
// cannot be derefernced.
char *t = s - 32; // t is now back inside the bounds, so
// the OOB bit is cleared.
</code></pre>
<p>For OOB pointers, the high bit is set (if OOB within half a slot).</p>
<ul>
<li>Typically, OS kernel lives in upper half, protects itself
via paging hardware.</li>
<li><strong>Q:</strong> Why half a slot for out-of-bounds?</li>
</ul>
<p>So what's the answer to the homework problem?</p>
<pre><code> char *p = malloc(255);
char *q = p + 256;
char ch = *q; // Does this raise an exception?
// Hint: How big is the baggy bound for p?
</code></pre>
<p>Does baggy bounds checking have to instrument <em>every</em>
memory address computation and access?</p>
<ul>
<li><em>No</em>, static analysis can prove that some addresses are
always safe to use. However, some address calculations
are "unsafe" in the sense that there's no way to
statically determine bounds on their values. Such
unsafe variables need checks.</li>
</ul>
<p>Handling function call arguments is a bit tricky, because
the x86 calling convention is fixed, i.e., the hardware
expects certain things to be in certain places on the
stack.</p>
<ul>
<li>However, we can copy unsafe arguments to a separate
area, and make sure that the copied arguments are
aligned and protected.</li>
<li><strong>Q:</strong> Do we have to overwrite the original arguments
with the copies values upon function return?</li>
<li><strong>A:</strong> No, because everything is pass-by-value in C!</li>
</ul>
<h3>How does baggy bounds checking ensure binary compatibility with existing libraries?</h3>
<p>In particular, how does baggy bounds code interact with pointers to memory that was
allocated by uninstrumented code?</p>
<ul>
<li><strong>Solution:</strong> Each entry in the bounds table is initialized
to the value 31, meaning that the corresponding pointer
has a memory bound of 2^31 (which is all of the
addressable memory).
<ul>
<li>On memory allocation in <em>instrumented</em> code,
bounds entries are set as previously discussed,
and reset to 31 when the memory is deallocated.</li>
<li>Memory allocated to uninstrumented code will never
change bounds table entries from their default
values of 31; so, when instrumented code interacts
with those pointers, bound errors will never</li>
</ul></li>
</ul>
<p><em>Example:</em></p>
<pre><code> Contiguous range of
memory used for the
heap
+-------------------+
| |
| |
| Heap allocated by |
| uninstrumented |---+
| code | \ Bounds table
| | \
+-------------------+ \ +-----------+
| | +->| |
| | | Always 31 |
| Heap allocated by | | |
| instrumented code | +-----------+
| | | Set using |
| |--------->| baggy bnds|
+-------------------+ +-----------+
</code></pre>
<ul>
<li>What does this all mean?
<ul>
<li>Can't detect out-of-bounds pointers generated in
uninstrumented code.</li>
<li>Can't detect when OOB pointer passed into library goes
in-bounds again.
<ul>
<li><strong>Q:</strong> Why?</li>
<li><strong>A:</strong> Because there is no pointer inspection in the
uninstrumented code which could clear the
high-order OOB bit!</li>
<li><strong>Q:</strong> Why do they instrument <code>strcpy()</code> and <code>memcpy()</code>?</li>
<li><strong>A:</strong> Because otherwise, those functions are
uninstrumented code, and suffer from the same
problems that we just discussed. For example,
off-the-shelf <code>strcpy()</code> does not ensure that
dest has enough space to store src!</li>
</ul></li>
</ul></li>
</ul>
<h3>How can baggy bits leverage 64-bit address spaces?</h3>
<p>Can get rid of the table storing bounds information, and
put it in the pointer.</p>
<pre><code> Regular pointer
+---------------+-------+------------------------+
| zero | size | supported addr space |
+---------------+-------+------------------------+
21 5 38
OOB pointer
+--------+------+-------+------------------------+
| offset | size | zero | supported addr space |
+--------+------+-------+------------------------+
13 5 8 38
</code></pre>
<p>This is similar to a fat pointer, but has the advantages
that...</p>
<ol>
<li>tagged pointers are the same size as regular pointers</li>
<li>writes to them are atomic</li>
</ol>
<p>...so programmer expectations are not broken, and data layouts stay the same.</p>
<p>Also note that, using tagged pointers, we can now keep track of
OOB pointers that go much further out-of-bounds. This is because
now we can tag pointers with an offset indicating how far they
are from their base pointer. In the 32-bit world, we couldn't
track OOB offsets without having an additional data structure!</p>
<h3>Can you still launch a buffer overflow attack in a baggy bounds system?</h3>
<p>Yes, <em>because the world is filled with sadness.</em></p>
<ul>
<li>Could exploit a vulnerability in uninstrumented libraries.</li>
<li>Could exploit temporal vulnerabilities (use-after-free).</li>
<li>Mixed buffers and code pointers</li>
</ul>
<p><em>Example:</em></p>
<pre><code> struct {
char buf[256];
void (*f) (void);
} my_type;
</code></pre>
<p>Note that <code>*f</code> is not an allocated type, so there are no
bounds checks associated with its dereference during
invocation. Thus, if <code>s.buf</code> is overflowed (e.g., by a
bug in an uninstrumented library) and <code>s.f</code> is corrupted,
the invocation of <code>f</code> will not cause a bounds error!</p>
<p>Would re-ordering f and buf help?</p>
<ul>
<li>Might break applications that depend on struct layout.</li>
<li>Might not help if this is an array of (<code>struct my_type</code>)'s.</li>
</ul>
<h3>In general, what are the costs of bounds checking?</h3>
<ul>
<li>Space overhead for bounds information (fat pointer or baggy bounds
table).</li>
<li>Baggy bounds also has space overhead for extra padding memory used
by buddy allocator (although some amount of overhead is intrinsic
to all popular algorithms for dynamic memory allocation).</li>
<li>CPU overheads for pointer arithmetic, dereferencing.</li>
<li>False alarms!
<ul>
<li>Unused out-of-bounds pointers.</li>
<li>Temporary out-of-bounds pointers by more than
<code>slot_size/2</code>.</li>
<li>Conversion from pointer to integers and back.</li>
<li>Passing out-of-bounds pointer into unchecked
code (the high address bit is set, so if the
unchecked code does arithmetic using that
pointer, insanity may ensue).</li>
</ul></li>
<li>Requires a significant amount of compiler support.</li>
</ul>
<p>So, baggy bounds checking is an approach for mitigating buffer
overflows in buggy code.</p>
<h2>More approaches for implementing bounds checking</h2>
<h3>Approach 4: non-executable memory (AMD's NX bit, Windows DEP, W^X, ...)</h3>
<ul>
<li>Modern hardware allows specifying read, write, and execute perms
for memory. (R, W permissions were there a long time ago; execute
is recent.)</li>
<li>Can mark the stack non-executable, so that adversary cannot run
their code.</li>
<li>More generally, some systems enforce "W^X", meaning all memory
is either writable, or executable, but not both. (Of course,
it's OK to be neither.)
<ul>
<li><strong>Advantage:</strong> Potentially works without any application changes.</li>
<li><strong>Advantage:</strong> The hardware is watching you all of the time,
unlike the OS.</li>
<li><strong>Disadvantage:</strong> Harder to dynamically generate code (esp.
with W^X).
<ul>
<li>JITs like Java runtimes, Javascript engines, generate
x86 on the fly.</li>
<li>Can work around it, by first writing, then changing to
executable.</li>
</ul></li>
</ul></li>
</ul>
<h3>Approach 5: randomized memory addresses (ASLR, stack randomization, ...)</h3>
<ul>
<li>Observation: Many attacks use hardcoded addresses in shellcode!
[The attacker grabs a binary and uses gdb to figure out where
stuff lives.]</li>
<li>So, we can make it difficult for the attacker to guess a valid
code pointer.
<ul>
<li>Stack randomization: Move stack to random locations, and/or
place padding between stack variables. This makes it more
difficult for attackers to determine:
<ul>
<li>Where the return address for the current frame is
located</li>
<li>Where the attacker's shellcode buffer will be located</li>
</ul></li>
<li>Randomize entire address space (Address Space Layout
Randomization): randomize the stack, the heap, location of
DLLs, etc.
<ul>
<li>Rely on the fact that a lot of code is relocatable.</li>
<li>Dynamic loader can choose random address for each
library, program.</li>
<li>Adversary doesn't know address of system(), etc.</li>
</ul></li>
<li>Can this still be exploited?
<ul>
<li>Adversary might guess randomness. Especially on 32-bit
machines, there aren't many random bits (e.g., 1 bit
belongs to kernel/user mode divide, 12 bits can't be
randomized because memory-mapped pages need to be
aligned with page boundaries, etc.).</li>
<li>For example, attacker could buffer overflow and
try to overwrite the return address with the
address of <code>usleep(16)</code>, and then seeing if the
connection hangs for 16 seconds, or if it crashes
(in which case the server forks a new ASLR process
with the same ASLR offsets). <code>usleep()</code> could be
in one of 2^16 or 2^28 places.
<a href="https://cseweb.ucsd.edu/~hovav/dist/asrandom.pdf">More details</a>.</li>
<li>ASLR is more practical on 64-bit machines (easily
32 bits of randomness).</li>
</ul></li>
</ul></li>
<li>Adversary might extract randomness.
<ul>
<li>Programs might generate a stack trace or error message
which contains a pointer.</li>
<li>If adversaries can run some code, they might be able to
extract real addresses (JIT'd code?).</li>
<li>Cute address leak in Flash's Dictionary (hash table):
<ul>
<li>Get victim to visit your Flash-enabled page (e.g., buy
an ad).</li>
<li>Hash table internally computes hash value of keys.</li>
<li>Hash value of integers is the integer.</li>
<li>Hash value of object is its memory address.</li>
<li>Iterating over a hash table is done from lowest
hash key to highest hash key.</li>
<li>So, the attacker creates a Dictionary, inserts a
string object which has shellcode, and then inserts
a bunch of numbers into the Dictionary.</li>
<li>By iterating through the Dictionary, the attacker
can determine where the string object lives by seeing
which integers the object reference falls between!</li>
<li>Now, overwrite a code pointer with the shellcode
address and bypass ASLR!</li>
</ul></li>
</ul></li>
<li>Adversary might not care exactly where to jump.
<ul>
<li>Ex: "Heap spraying": fill memory w/ shellcode so that a
random jump is OK!</li>
</ul></li>
<li>Adversary might exploit some code that's not randomized (if such
code exists).</li>
<li>Some other interesting uses of randomization:
<ul>
<li>System call randomization (each process has its own system
call numbers).</li>
<li>Instruction set randomization so that attacker cannot
easily determine what "shellcode" looks like for a
particular program instantiation.
<ul>
<li><em>Example:</em> Imagine that the processor had a special
register to hold a "decoding key." Each installation
of a particular application is associated with a
random key. Each machine instruction in the application
is XOR'ed with this key. When the OS launches the
process, it sets the decoding key register, and the
processor uses this key to decode instructions before
executing them.</li>
</ul></li>
</ul></li>
</ul>
<h3>Which buffer overflow defenses are used in practice?</h3>
<ul>
<li>gcc and MSVC enable stack canaries by default.</li>
<li>Linux and Windows include ASLR and NX by default.</li>
<li>Bounds checking is not as common, due to:
<ul>
<li>Performance overheads</li>
<li>Need to recompile programs</li>
<li>False alarms: Common theme in security tools: false
alarms prevent adoption of tools! Often, zero false
alarms with some misses better than zero misses
but false alarms.</li>
</ul></li>
</ul>
<h2>Return-oriented programming (ROP)</h2>
<p>ASLR and DEP are very powerful defensive techniques.</p>
<ul>
<li>DEP prevents the attacker from executing stack code of his
or her choosing</li>
<li>ASLR prevents the attacker from determining where shellcode
or return addresses are located.</li>
<li>However, what if the attacker could find PREEXISTING CODE
with KNOWN FUNCTIONALITY that was located at a KNOWN LOCATION?
Then, the attacker could invoke that code to do evil.
<ul>
<li>Of course, the preexisting code isn't <em>intentionally</em>
evil, since it is a normal part of the application.</li>
<li>However, the attacker can pass that code unexpected
arguments, or jump to the middle of the code and
only execute a desired piece of that code.</li>
</ul></li>
</ul>
<p>These kinds of attacks are called <em>return-oriented programming</em>,
or <em>ROP</em>. To understand how ROP works, let's examine a simple
C program that has a security vulnerability. <a href="http://codearcana.com/posts/2013/05/28/introduction-to-return-oriented-programming-rop.html">Example adapted from here</a>.</p>
<pre><code> void run_shell(){
system("/bin/bash");
}
void process_msg(){
char buf[128];
gets(buf);
}
</code></pre>
<p>Let's imagine that the system does not use ASLR or stack
canaries, but it does use DEP. <code>process_msg()</code> has an obvious
buffer overflow, but the attacker can't use this overflow
to execute shellcode in <code>buf</code>, since DEP makes the stack
non-executable. However, that <code>run_shell()</code> function looks
tempting... how can the attacker execute it?</p>
<ul>
<li>Attacker disassembles the program and figures out
where the starting address of <code>run_shell()</code>.</li>
<li>The attacker launches the buffer overflow, and
overwrites the return address of <code>process_msg()</code>
with the address of <code>run_shell()</code>. Boom! The attacker
now has access to a shell which runs with the
privileges of the application.</li>
</ul>
<p><em>Example:</em></p>
<pre><code> +------------------+
entry %ebp ----> | .. prev frame .. |
| |
| |
+------------------+
entry %esp ----> | return address | ^ <--Gets overwritten
+------------------+ | with address of
new %ebp ------> | saved %ebp | | run_shell()
+------------------+ |
| buf[127] | |
| ... | |
| buf[0] | |
new %esp ------> +------------------+
</code></pre>
<p>That's a straightforward extension of the buffer overflows that
we've already looked at. But how can we pass arguments to the
function that we're jumping to?</p>
<pre><code> char *bash_path = "/bin/bash";
void run_cmd(){
system("/something/boring");
}
void process_msg(){
char buf[128];
gets(buf);
}
</code></pre>
<p>In this case, the argument that we want to pass to
is already located in the program code. There's also
a preexisting call to <code>system()</code>, but that call isn't
passing the argument that we want.</p>
<p>We know that <code>system()</code> must be getting linked to our
program. So, using our trusty friend gdb, we can find
where the <code>system()</code> function is located, and where
bash_path is located.</p>
<p>To call <code>system()</code> with the <code>bash_path</code> argument, we have
to set up the stack in the way that <code>system()</code> expects
when we jump to it. Right after we jump to <code>system()</code>,
<code>system()</code> expects this to be on the stack:</p>
<pre><code> | ... |
+------------------+
| argument | The system() argument.
+------------------+
%esp ----> | return addr | Where system() should
+------------------+ ret after it has
finished.
</code></pre>
<p>So, the buffer overflow needs to set up a stack that
looks like this:</p>
<pre><code> +------------------+
entry %ebp ----> | .. prev frame .. |
| |
| |
| * - - - - - - - | ^
| | | Address of bash_path
+ * - - - - - - - | |
| | | Junk return addr for system()
+------------------+ |
entry %esp ----> | return address | | Address of system()
+------------------+ |
new %ebp ------> | saved %ebp | | Junk
+------------------+ |
| buf[127] | |
| ... | | Junk
| buf[0] | |
new %esp ------> +------------------+ |
</code></pre>
<p>In essence, what we've done is set up a fake calling
frame for the <code>system()</code> call! In other words, we've
simulated what the compiler would do if it actually
wanted to setup a call to <code>system()</code>.</p>
<p>What if the string <code>"/bin/bash"</code> was not in the program?</p>
<ul>
<li><p>We could include that string in the buffer overflow,
and then have the argument to system() point to
the string.</p>
<pre><code> | h\0 | ^
| * - - - - - - - | |
| /bas | |
| * - - - - - - - | |
| /bin | | <--------------------+
| * - - - - - - - | | |
| | | Address of bash_path--+
+ * - - - - - - - | |
| | | Junk return addr from system()
+------------------+ |
entry %esp ----> | return address | | Address of system()
+------------------+ |
new %ebp ------> | saved %ebp | | Junk
+------------------+ |
| buf[127] | |
| ... | | Junk
| buf[0] | |
new %esp ------> +------------------+ |
</code></pre></li>
</ul>
<p>Note that, in these examples, I've been assuming that
the attacker used a junk return address from <code>system()</code>.
However, the attacker could set it to something
useful.</p>
<p>In fact, by setting it to something useful, the attacker
can chain calls together!</p>
<p><strong>Goal:</strong> We want to call <code>system("/bin/bash")</code> multiple times.
Assume that we've found three addresses:</p>
<ul>
<li>The address of system()</li>
<li>The address of the string "/bin/bash"</li>
<li><p>The address of these x86 opcodes:</p>
<pre><code> pop %eax //Pops the top-of-stack and puts it in %eax
ret //Pops the top-of-stack and puts it in %eip
</code></pre></li>
</ul>
<p>These opcodes are an example of a "gadget." Gadgets
are preexisting instruction sequences that can be
strung together to create an exploit. Note that there
are <a href="http://www.exploit-db.com/download_pdf/17049/">user-friendly tools</a>
to help you extract gadgets from preexisting binaries (e.g., msfelfscan).</p>
<pre><code> | | ^
+ * - - - - - - - + |
| | | Address of bash_path -+ Fake calling
+ * - - - - - - - + | | frame for
(4) | | | Address of pop/ret * + system()
+ * - - - - - - - + |
(3) | | | Address of system()
+ * - - - - - - - + |
(2) | | | Address of bash_path -+ Fake calling
+ * - - - - - - - + | | frame for
(1) | | | Address of pop/ret * + system()
+------------------+ |
entry %esp ----> | return address | | Address of system()
+------------------+ |
new %ebp ------> | saved %ebp | | Junk
+------------------+ |
| buf[127] | |
| ... | | Junk
new %esp ------> | buf[0] | |
+------------------+ |
</code></pre>
<p>So, how does this work? Remember that the return instruction
pops the top of the stack and puts it into %eip.</p>
<ul>
<li>The overflowed function terminates by issuing <code>ret</code>. <code>ret</code>
pops off the top-of-the-stack (the address of <code>system()</code>)
and sets <code>%eip</code> to it. <code>system()</code> starts executing, and
<code>%esp</code> is now at (1), and points to the <code>pop/ret</code> gadget.</li>
<li><code>system()</code> finishes execution and calls <code>ret</code>. <code>%esp</code> goes
from (1)-->(2) as the <code>ret</code> instruction pops the top
of the stack and assigns it to <code>%eip</code>. <code>%eip</code> is now the
start of the <code>pop/ret</code> gadget.</li>
<li>The pop instruction in the <code>pop/ret</code> gadget discards the
<code>bash_path</code> variable from the stack. <code>%esp</code> is now at (3).
We are still in the <code>pop/ret</code> gadget!</li>
<li>The <code>ret</code> instruction in the <code>pop/ret</code> gadget pops the
top-of-the-stack and puts it into <code>%eip</code>. Now we're in
<code>system()</code> again, and <code>%esp</code> is at (4).</li>
</ul>
<p>And so on and so forth.</p>
<p>Basically, we've created a new type of machine that
is driven by the stack pointer instead of the regular
instruction pointer! As the stack pointer moves down
the stack, it executes gadgets whose code comes from
preexisting program code, and whose data comes from
stack data created by the buffer overflow.</p>
<p>This attack evades DEP protections--we're not generating
any new code, just invoking preexisting code!</p>
<h2>Stack reading: defeating canaries</h2>
<p>Assumptions:</p>
<ul>
<li>The remote server has a buffer overflow vulnerability.</li>
<li>Server crashes and restarts if a canary value is set
to an incorrect value.</li>
<li>When the server respawns, the canary is NOT re-randomized,
and the ASLR is NOT re-randomized, e.g., because the
server uses Linux's PIE mechanism, and <code>fork()</code> is used
to make new workers and not <code>execve()</code>.</li>
</ul>
<p>So, to determine an 8-byte canary value:</p>
<pre><code> char canary[8];
for(int i = 1; i <= 8; i++){ //For each canary byte . . .
for(char c = 0; c < 256; c++){ //. . . guess the value.
canary[i-1] = c;
server_crashed = try_i_byte_overflow(i, canary);
if(!server_crashed){
//We've discovered i-th byte of the
//the canary!
break;
}
}
}
</code></pre>
<p>At this point we have the canary, but remember that the
attack assumes that the server uses the same canary after
a crash.</p>
<p>Guessing the correct value for a byte takes 128 guesses on
average, so on a 32-bit system, we only need <code>4*128=512</code>
guesses to determine the canary (on a 64-bit system, we
need <code>8*128=1024</code>).</p>
<ul>
<li>Much faster than brute force attacks on the
canary (<code>2^15</code> or <code>2^27</code> expected guesses on
<code>32/64</code> bit systems with 16/28 bits of ASLR
randomness).</li>
<li>Brute force attacks can use the <code>usleep(16)</code>
probe that we discussed earlier.</li>
<li>Canary reading can be extended to reading arbitrary values
that the buffer overflow can overwrite!</li>
</ul>
<p>So, we've discussed how we can defeat randomized canaries
if canaries are not changed when a server regenerates.
We've also shown how to use gdb and gadgets to execute
preexisting functions in the program using arguments
that the attacker controls. But what if the server DOES
use ASLR? This prevents you from using offline analysis
to find where the preexisting functions are?</p>
<p>This is what the paper for today's lecture discussed.
That paper assumed that we're using a 64-bit machine,
so that's what we'll assume in this lecture from now
on. For the purposes of this discussion, the main
change is that function arguments are now passed in
registers instead of on the stack.</p>
<h2>Blind return-oriented programming</h2>
<h3>Step 1: Find a stop gadget</h3>
<ul>
<li>A stop gadget is a return address that points to code
that will hang the program, but not crash it.</li>
<li>Once the attacker can defeat canaries, he can overwrite
the overflown function's return address and start
guessing locations for a stop gadget. If the client
network connection suddenly closes, the guessed address
was not a stop gadget. If the connection stays open,
the gadget is a stop gadget.</li>
</ul>
<h3>Step 2: Find gadgets that pop stack entries</h3>
<ul>
<li>Once you have a stop gadget, you can use it to find
other gadgets that pop entries off of the stack and
into registers.</li>
<li>There are three building blocks to locate stack popping
gadgets:
<ul>
<li><em>probe:</em> Address of a potential stack popping gadget</li>
<li><em>stop:</em> Address of a stop gadget</li>
<li><em>crash:</em> Address of non-executable code (0x0)</li>
</ul></li>
</ul>
<p><em>Example:</em> Find a gadget that pops one thing off the stack.</p>
<pre><code> sleep(10)
^ ^
+--- pop rax / \
| ret / \
| \--->[stop] 0x5.... 0x5....
| [trap] 0x0 0x0 <-----------------+
+----------[probe] 0x4...8 0x4...c -->xor rax, rax | Crash!
ret |
\__________|
</code></pre>
<p>After you do this a bunch of times, you'll have a
collection of gadgets that pop one thing from the
stack and then return. However, you won't know which
<em>register</em> those gadgets store the popped value in.</p>
<ul>
<li>You need to know which registers are used to store
data so that you can issue a system call. Each system
call expects its arguments to be in a specific set
of registers. </li>
<li>Note that we also don't know the location of the
<code>syscall()</code> library function.</li>
</ul>
<h3>Step 3: Find syscall() and determine which registers the pop gadgets use</h3>
<ul>
<li><code>pause()</code> is a system call that takes no arguments (and
thus ignores everything in the registers).</li>
<li><p>To find <code>pause()</code>, the attacker chains all of the
<code>"pop x; ret"</code> gadgets on the stack, pushing the
system call number for <code>pause()</code> as the "argument"
for each gadget. At the bottom of the chain,
the attacker places the guessed address for <code>syscall()</code>.</p>
<pre><code> | | ^
+ * - - - - - - - + |
| | | Guessed addr of syscall()
+ * - - - - - - - + |
| | | ...
+ * - - - - - - - + |
| | | Sys call # for pause
+ * - - - - - - - + |
| | | Address of pop rsi; ret //Gadget 2
+ * - - - - - - - + |
| | | Sys call # for pause
+------------------+ |
entry %esp ----> | return address | | Address of pop rdi; ret //Gadget 1
+------------------+ |
new %ebp ------> | saved %ebp | | Junk
+------------------+ |
| buf[127] | |
| ... | | Junk
new %esp ------> | buf[0] | |
+------------------+ |
</code></pre></li>
</ul>
<p>So, at the end of this chain, the pop gadgets have
placed the syscall number for <code>pause()</code> in a bunch
of registers, hopefully including <code>rax</code>, which is the
one that <code>syscall()</code> looks in to find the syscall
number. </p>
<p>Once this mega-gadget induces a pause, we know that
we've determined the location of <code>syscall()</code>. Now we
need to determine which gadget pops the top-of-the
stack into <code>rax</code>. The attacker can figure this out by
process-of-elimination: iteratively try just one
gadget and see if you can invoke <code>pause()</code>.</p>
<p>To identify arbitrary <code>"pop x; ret"</code> gadgets, you can
use tricks with other system calls that use the
<code>x</code> register that you're trying to find.</p>
<p>So, the outcome of this phase is knowledge of
<code>"pop x; ret"</code> gadgets, location of <code>syscall()</code>.</p>
<h3>Step 4: Invoke write()</h3>
<p>Now we want to invoke the write call on the network
socket that the server has with the attacker's
client. We need the following gadgets:</p>
<pre><code> pop rdi; ret (socket)
pop rsi; ret (buffer)
pop rdx; ret (length)
pop rax; ret (write syscall number)
syscall
</code></pre>
<p>We have to guess the socket value, but that's
fairly easy to do, since Linux restricts processes
to 1024 simultaneously open file descriptors,
and new file descriptors have to be the lowest
one available (so guessing a small file descriptor
works well in practice).</p>
<p>To test whether we've guessed the correct file
descriptor, simply try the write and see if we
receive anything! </p>
<p>Once we have the socket number, we issue a write,
and for the data to send, we send a pointer
to the program's <code>.text</code> segment! This allows the
attacker to read the program's code (which was
randomized but now totally known to the attacker!).
Now the attacker can find more powerful gadgets
directly, and leverage those gadgets to open a
shell.</p>
<h2>Defenses against BROP</h2>
<ul>
<li>Re-randomize the canaries and the address space
after each crash!
<ul>
<li>Use <code>exec()</code> instead of <code>fork()</code> to create
processes, since <code>fork()</code> copies the address
space of the parent to the child.</li>
<li>Interestingly, Windows is not vulnerable to BROP
because Windows has no <code>fork()</code> equivalent.</li>
</ul></li>
<li>Sleep-on-crash?
<ul>
<li>Now a BROP attack is a denial-of-service!</li>
</ul></li>
<li>Bounds-checking?
<ul>
<li>Up to 2x performance overhead...</li>
</ul></li>
</ul>
<h2>More info on ROP and x86 calling conventions</h2>
<ul>
<li><a href="http://codearcana.com/posts/2013/05/21/a-brief-introduction-to-x86-calling-conventions.html">A brief introduction to x86 calling conventions</a></li>
<li><a href="http://codearcana.com/posts/2013/05/28/introduction-to-return-oriented-programming-rop.html">Introduction to return oriented programming</a></li>
<li><a href="http://www.slideshare.net/saumilshah/dive-into-rop-a-quick-introduction-to-return-oriented-programming">Dive into ROP: A quick introduction to return oriented programming</a></li>
<li><a href="https://cseweb.ucsd.edu/~hovav/dist/rop.pdf">Return-Oriented Programming: Systems, Languages, and Applications</a></li>
</ul>