In my previous article, I showed how debuginfo is used to map between the current instruction pointer (IP) and the function or line containing it. That information is valuable in showing what code the processor is currently executing. However, having more context for the calls that lead up to the current function and line being executed is also extremely helpful.
For example, suppose a function in a library has an illegal memory access due to a null pointer being passed as a parameter into the function. Just looking at the current function and line shows that the fault was triggered by attempted access through a null pointer. However, what you really want to know is the full context of the active function calls leading up to that null pointer access, so you can determine how that null pointer was initially passed into the library function. This context information is provided by a backtrace, and allows you to determine which functions could be responsible for the bogus parameter.
One thing’s certain: Determining the currently active function calls is a non-trivial operation.
Function activation records
Modern programming languages have local variables and allow for recursion, where a function can call itself. Also, concurrent programs have multiple threads that may have the same function running at the same time. The local variables cannot be stored in global locations in these situations. The locations of the local variables must be unique for each invocation of the function. Here’s how it works:
- The compiler generates code that creates a function activation record each time a function is called, so that local variables are stored in a unique location.
- For efficiency, the processor stack is used to store the function activation records.
- A new function activation record is created at the top of the processor stack for the function when it’s called.
- If that function calls another function, then a new function activation record is placed above the existing function activation record.
- Each time there is a return from a function, its function activation record is removed from the stack.
The function activation record is created by code in the function called the prologue. The removal of the function activation record is handled by the function epilogue. The body of the function can use the memory set aside for it on the stack for temporary values and local variables.
Function activation records can be variable size. For some functions, there’s no need for space to store local variables. Ideally, the function activation record only needs to store the return address of the function that called this function. For other functions, significant space may be required to store local data structures for the function in addition to the return address. This variation in frame sizes leads to compilers using frame pointers to track the start of the function’s activation frame. Now the function prologue code has the additional task of storing the old frame pointer before creating a new frame pointer for the current function, and the epilogue has to restore the old frame pointer value.
Because of the way the function activation record is laid out, the return address and old frame pointer of the calling function are at constant offsets from the current frame pointer. With the old frame pointer, the next function’s activation frame on the stack can be located. This process is repeated until all the function activation records have been examined.
Optimization issues
There are a couple of disadvantages to having explicit frame pointers in code. On some processors, there are relatively few registers available. Dedicating one of them to an explicit frame pointer leaves fewer registers for other values, which causes more memory operations to be used and results in slower code. Having explicit frame pointers may also constrain the code that the compiler can generate, because the compiler may not intermix the function prologue and epilogue code with the body of the function.
The compiler’s goal is to generate fast code where possible, so compilers typically omit frame pointers from generated code. Keeping frame pointers can significantly lower performance, as shown by Phoronix’s benchmarking. The downside of omitting frame pointers is that the previous calling function’s activation frame and return address are no longer at simple offsets from the frame pointer.
Call Frame Information
To aid in the generation of function backtraces, the compiler includes DWARF Call Frame Information (CFI) to reconstruct frame pointers and to find return addresses. This supplemental information is stored in the .eh_frame section of the executable. Unlike traditional debuginfo for function and line location information, the .eh_frame section is in the executable even when the executable is generated without debug information, or when the debug information has been stripped from the file. The call frame information is essential for the operation of language constructs like throw-catch in C++.
The CFI has a Frame Description Entry (FDE) for each function. As one of its steps, the backtrace generation process finds the appropriate FDE for the current activation frame being examined. Think of the FDE as a table, with each row representing one or more instructions, with these columns:
- Canonical Frame Address (CFA), the location the frame pointer would point to
- The return address
- Information about other registers
The encoding of the FDE is designed to minimize the amount of space required. The FDE describes the changes between rows rather than fully specifying each row. To further compress the data, starting information common to multiple FDEs is factored out and placed in Common Information Entries (CIE). This makes the FDE more compact, but it also requires more work to compute the actual CFA and find the return address location. The tool must start from the uninitialized state. It steps through the entries in the CIE to get the initial state on function entry, then moves on to the FDE, starting at the FDE’s first entry and processing operations until it gets to the row that covers the instruction pointer currently being analyzed.
Example use of Call Frame Information
Start with a simple example with a function that converts Fahrenheit to Celsius. Inlined functions do not have entries in the CFI, so the __attribute__((noinline)) for the f2c function ensures the compiler keeps f2c as a real function.
#include <stdio.h>

int __attribute__ ((noinline)) f2c(int f)
{
	int c;
	printf("converting\n");
	c = (f-32.0) * 5.0 /9.0;
	return c;
}

int main (int argc, char *argv[])
{
	int f;
	scanf("%d", &f);
	printf ("%d Fahrenheit = %d Celsius\n",
		f, f2c(f));
	return 0;
}
Compile the code with:
$ gcc -O2 -g -o f2c f2c.c
The .eh_frame
is there as anticipated:
$ eu-readelf -S f2c |grep eh_frame
[17] .eh_frame_hdr PROGBITS 0000000000402058 00002058 00000034 0 A 0 0 4
[18] .eh_frame PROGBITS 0000000000402090 00002090 000000a0 0 A 0 0 8
We can get the CFI information in human-readable form with:
$ readelf --debug-dump=frames f2c > f2c.cfi
Generate a disassembly file of the f2c binary so you can look up the addresses of the f2c and main functions:
$ objdump -d f2c > f2c.dis
Find the following lines in f2c.dis to see the start of f2c and main:
0000000000401060 <main>:
0000000000401190 <f2c>:
In many cases, all the functions in the binary use the same CIE to define the initial conditions before a function’s first instruction is executed. In this example, both f2c and main use the following CIE:
00000000 0000000000000014 00000000 CIE
Version: 1
Augmentation: "zR"
Code alignment factor: 1
Data alignment factor: -8
Return address column: 16
Augmentation data: 1b
DW_CFA_def_cfa: r7 (rsp) ofs 8
DW_CFA_offset: r16 (rip) at cfa-8
DW_CFA_nop
DW_CFA_nop
For this example, don’t worry about the Augmentation or Augmentation data entries. Because x86_64 processors have variable-length instructions from 1 to 15 bytes in size, the “Code alignment factor” is set to 1. On a processor that only has 32-bit (4-byte) instructions, this would be set to 4 and would allow more compact encoding of how many bytes a row of state information applies to. In a similar fashion, there is the “Data alignment factor” to make the adjustments to where the CFA is located more compact. On x86_64, the stack slots are 8 bytes in size.
The column in the virtual table that holds the return address is 16. This is used in the instructions at the tail end of the CIE. There are four DW_CFA instructions. The first instruction, DW_CFA_def_cfa, describes how to compute the Canonical Frame Address (CFA) that a frame pointer would point at if the code had a frame pointer. In this case, the CFA is computed from r7 (rsp) and CFA=rsp+8.
The second instruction, DW_CFA_offset, defines where to obtain the return address: CFA-8. In this case, the return address is currently pointed to by the stack pointer, (rsp+8)-8. The CFA starts right above the return address on the stack.
The DW_CFA_nop instructions at the end of the CIE are padding to keep alignment in the DWARF information. The FDE can also have padding at the end for alignment.
Find the FDE for main in f2c.cfi, which covers the main function from 0x401060 up to, but not including, 0x401097:
00000084 0000000000000014 00000088 FDE cie=00000000 pc=0000000000401060..0000000000401097
DW_CFA_advance_loc: 4 to 0000000000401064
DW_CFA_def_cfa_offset: 32
DW_CFA_advance_loc: 50 to 0000000000401096
DW_CFA_def_cfa_offset: 8
DW_CFA_nop
Before executing the first instruction in the function, the CIE describes the call frame state. However, as the processor executes instructions in the function, the details will change. First, the DW_CFA_advance_loc and DW_CFA_def_cfa_offset instructions match up with the first instruction in main at 401060, which adjusts the stack pointer down by 0x18 (24 bytes). The CFA has not changed location but the stack pointer has, so the correct computation for the CFA at 401064 is rsp+32. That’s the extent of the prologue in this code. Here are the first couple of instructions in main:
0000000000401060 <main>:
401060: 48 83 ec 18 sub $0x18,%rsp
401064: bf 1b 20 40 00 mov $0x40201b,%edi
The second DW_CFA_advance_loc makes the current row apply to the next 50 bytes of code in the function, until 401096. The CFA is at rsp+32 until the stack adjustment instruction at 401092 completes execution. The DW_CFA_def_cfa_offset updates the calculation of the CFA to the same as on entry into the function. This is expected, because the next instruction at 401096 is the return instruction (ret), which pops the return address off the stack.
401090: 31 c0 xor %eax,%eax
401092: 48 83 c4 18 add $0x18,%rsp
401096: c3 ret
The FDE for the f2c function uses the same CIE as the main function, and covers the range of 0x401190 to 0x4011c3:
00000068 0000000000000018 0000006c FDE cie=00000000 pc=0000000000401190..00000000004011c3
DW_CFA_advance_loc: 1 to 0000000000401191
DW_CFA_def_cfa_offset: 16
DW_CFA_offset: r3 (rbx) at cfa-16
DW_CFA_advance_loc: 29 to 00000000004011ae
DW_CFA_def_cfa_offset: 8
DW_CFA_nop
DW_CFA_nop
DW_CFA_nop
The objdump output for the f2c function in the binary:
0000000000401190 <f2c>:
401190: 53 push %rbx
401191: 89 fb mov %edi,%ebx
401193: bf 10 20 40 00 mov $0x402010,%edi
401198: e8 93 fe ff ff call 401030 <puts@plt>
40119d: 66 0f ef c0 pxor %xmm0,%xmm0
4011a1: f2 0f 2a c3 cvtsi2sd %ebx,%xmm0
4011a5: f2 0f 5c 05 93 0e 00 subsd 0xe93(%rip),%xmm0 # 402040 <__dso_handle+0x38>
4011ac: 00
4011ad: 5b pop %rbx
4011ae: f2 0f 59 05 92 0e 00 mulsd 0xe92(%rip),%xmm0 # 402048 <__dso_handle+0x40>
4011b5: 00
4011b6: f2 0f 5e 05 92 0e 00 divsd 0xe92(%rip),%xmm0 # 402050 <__dso_handle+0x48>
4011bd: 00
4011be: f2 0f 2c c0 cvttsd2si %xmm0,%eax
4011c2: c3 ret
In the FDE for f2c, the DW_CFA_advance_loc steps over the single-byte instruction at the beginning of the function. Following the advance operation, there are two more operations. A DW_CFA_def_cfa_offset changes the CFA to %rsp+16 and a DW_CFA_offset indicates that the initial value in %rbx is now at CFA-16 (the top of the stack).
Looking at this f2c disassembly code, you can see that a push is used to save %rbx onto the stack. One of the advantages of omitting the frame pointer in the code generation is that compact instructions like push and pop can be used to store and retrieve values from the stack. In this case, %rbx is saved because the function needs it as scratch space: %edi, which holds f on entry, is reused to pass the argument to the printf function (actually converted to a puts call), so the initial value of f is copied into %ebx to preserve it for the later computation. The DW_CFA_advance_loc of 29 bytes to 4011ae shows the next state change just after pop %rbx, which recovers the original value of %rbx. The DW_CFA_def_cfa_offset notes that the pop changed the CFA to be %rsp+8.
GDB using the Call Frame Information
Having the CFI information enables GNU Debugger (GDB) and other tools to generate accurate backtraces. Without CFI information, GDB would have a difficult time finding the return address. You can see GDB making use of this information if you set a breakpoint at line 7 of f2c.c. GDB puts the breakpoint before the pop %rbx in the f2c function is done, so the return address is not at the top of the stack.
GDB is able to unwind the stack, and as a bonus is also able to fetch the argument f, which at this point is being kept in %rbx:
$ gdb f2c
[...]
(gdb) break f2c.c:7
Breakpoint 1 at 0x40119d: file f2c.c, line 7.
(gdb) run
Starting program: /home/wcohen/current/202207youarehere/f2c
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
98
converting
Breakpoint 1, f2c (f=98) at f2c.c:8
8 return c;
(gdb) where
#0 f2c (f=98) at f2c.c:8
#1 0x000000000040107e in main (argc=<optimized out>, argv=<optimized out>)
at f2c.c:15
Call Frame Information
The DWARF Call Frame Information provides a flexible way for a compiler to include information for accurate unwinding of the stack. This makes it possible to determine the currently active function calls. I’ve provided a brief introduction in this article, but for more details on how DWARF implements this mechanism, see the DWARF specification.