Mapping the program counter back to the function name in your code

Compilers are commonly used to convert human-readable source code into a series of instructions that are directly executed by the computer. A common question is “How do debuggers and error handlers report the place in the source code where the processor currently is?” There are various methods to map instructions back to locations in the source code, and compiler optimizations add some complications to that mapping. This first article in the series describes how tools map the program counter (also known as the instruction pointer) back to the function name. Subsequent articles in this series will cover mapping the program counter back to the specific line in a source file, and generating a backtrace describing the series of calls that resulted in the processor being in the current function.

The function entry symbols

When translating source code into an executable binary, the compiler keeps symbol information about the entry to each function. Keeping this information available makes it possible for the linker to assemble multiple object files (.o suffix) into a single executable file. Before the object files are processed by the linker, the final addresses of functions and variables are not known. The object files have placeholders for the symbols that don’t yet have addresses. The linker resolves the addresses when creating the executable. Assuming that the resulting executable is not stripped, these symbols describing the entry of each function are still available. You can see an example of this by compiling the following simple program:

#include <stdlib.h>
#include <stdio.h>

int a;
double b;

int
main(int argc, char* argv[])
{
	a = atoi(argv[1]);
	b = atof(argv[2]);
	a = a + 1;
	b = b / 42.0;
	printf ("a = %d, b = %f\n", a, b);
	return 0;
}

Compile the code:

$ gcc -O2 example.c -o example

You can see that the main function starts at 0x401060 with the following command:

$ nm example |grep main
U __libc_start_main@GLIBC_2.34
0000000000401060 T main

Even though the code is compiled without debugging information (-g option), gdb can still find the start of the main function with the symbol information:

$ gdb example

GNU gdb (GDB) Fedora 12.1-1.fc36
…
(No debugging symbols found in example)
(gdb) break main
Breakpoint 1 at 0x401060

Step through the code with the GDB nexti command. GDB reports the address it is currently at in the main function:

(gdb) nexti
0x0000000000401061 in main ()
(gdb) nexti
0x0000000000401065 in main ()
(gdb) where
#0 0x0000000000401065 in main ()

This minimal information is useful but not ideal. The compiler can optimize functions and split them into non-contiguous sections, so that code associated with a function is not obviously tied to the listed function entry. A portion of the function’s instructions might be separated from the function entry by other functions’ entries. Also, the compiler may generate alternative names that don’t directly match the original function names. This makes it more difficult to determine which function in the source code the instructions are associated with.
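The kind of lookup that function entry symbols alone allow can be sketched in C as a search for the nearest symbol at or below the program counter. This is only an illustration: struct sym and sym_lookup are hypothetical names, not part of gdb or binutils, and the table is assumed to be sorted by address already.

```c
#include <stddef.h>
#include <stdint.h>

/* One function entry symbol, as nm lists it: an address and a name. */
struct sym {
	uint64_t addr;
	const char *name;
};

/*
 * Return the name of the symbol with the greatest address <= pc,
 * or NULL if pc lies below every symbol.
 * Assumption: syms is sorted by ascending address.
 */
const char *
sym_lookup(const struct sym *syms, size_t n, uint64_t pc)
{
	size_t lo = 0, hi = n;	/* the answer is among syms[lo, hi) */

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (syms[mid].addr <= pc)
			lo = mid + 1;
		else
			hi = mid;
	}
	/* lo is now the number of symbols whose address is <= pc */
	return lo == 0 ? NULL : syms[lo - 1].name;
}
```

With a table holding main at 0x401060, looking up 0x401065 yields main. The sketch also shows the weakness described above: a symbol records only where a function starts, so any address after an entry is attributed to that function even if the instruction belongs to code placed elsewhere.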

DWARF information

Code compiled with the -g option includes additional information to map between the source code and the binary executable. By default, RPMs with code compiled on Fedora have debug information generation enabled. This information is then put into a separate debuginfo RPM, which can be installed as a supplement to the RPMs containing the binaries. This makes it easier to analyze crash dumps and debug programs. With debuginfo, you can get address ranges that map back to particular function names. It also provides the line number and file name that each instruction maps back to. The mapping information is encoded in the DWARF standard.

The DWARF function description

For each function with a function entry there is a DWARF Debug Information Entry (DIE) describing it. This information is in a machine-readable format, but there are a number of tools, including llvm-dwarfdump and eu-readelf, that produce human-readable output of the DWARF debug information. Below is the llvm-dwarfdump output of the DIE describing the main function of the earlier example.c program. This output is only available when the program is compiled with the -g option.

The DIE starts with the DW_TAG_subprogram to indicate it describes a function. There are other kinds of DWARF tags used to describe other components of programs such as types and variables.

The DIE for the function has a number of attributes, each starting with DW_AT_, that describe the characteristics of the function. These attributes provide information about the function, such as where it is located in the executable binary and where to find it in the original source code.

A few lines down from the DW_TAG_subprogram is the DW_AT_name attribute that describes the source code function name as main. The DW_AT_decl_file and DW_AT_decl_line DWARF attributes describe the file and line number, respectively, where the function came from. This allows the debugger to find the appropriate location in a file to show you the source code associated with the function. Column information is also included with DW_AT_decl_column.

The other key pieces of information for mapping between the binary instructions and the source code are the DW_AT_low_pc and DW_AT_high_pc attributes. The use of DW_AT_low_pc and DW_AT_high_pc indicates that the code for this function is contiguous, ranging from 0x401060 (the same value as provided by the nm command earlier) up to but not including 0x4010b7. The DW_AT_ranges attribute is used instead to describe a function that covers non-contiguous regions.

With this information, you can map the processor’s program counter to the function name and find the file and line number where the function is declared:

$ llvm-dwarfdump example --name=main
example:	file format elf64-x86-64

0x00000113: DW_TAG_subprogram
              DW_AT_external	(true)
              DW_AT_name	("main")
              DW_AT_decl_file	("/home/wcohen/present/202207youarehere/example.c")
              DW_AT_decl_line	(8)
              DW_AT_decl_column	(0x01)
              DW_AT_prototyped	(true)
              DW_AT_type	(0x00000031 "int")
              DW_AT_low_pc	(0x0000000000401060)
              DW_AT_high_pc	(0x00000000004010b7)
              DW_AT_frame_base	(DW_OP_call_frame_cfa)
              DW_AT_call_all_calls	(true)
              DW_AT_sibling	(0x000001ea)
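The address check implied by DW_AT_low_pc and DW_AT_high_pc can be sketched in C as a half-open interval test. The struct func_die type and the die_contains and die_find functions are hypothetical names for illustration, not a real DWARF library API. The sketch assumes the attributes have already been decoded and that high_pc holds an address, as llvm-dwarfdump prints it (the DWARF encoding may store it as an offset from low_pc instead).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The subset of a DW_TAG_subprogram DIE needed to attribute an address. */
struct func_die {
	const char *name;	/* DW_AT_name */
	const char *decl_file;	/* DW_AT_decl_file */
	unsigned decl_line;	/* DW_AT_decl_line */
	uint64_t low_pc;	/* DW_AT_low_pc: first address of the function */
	uint64_t high_pc;	/* DW_AT_high_pc: one past the last address */
};

/* True if pc lies in the half-open range [low_pc, high_pc). */
bool
die_contains(const struct func_die *d, uint64_t pc)
{
	return d->low_pc <= pc && pc < d->high_pc;
}

/* Linear scan for the DIE covering pc; returns NULL if none does. */
const struct func_die *
die_find(const struct func_die *dies, size_t n, uint64_t pc)
{
	for (size_t i = 0; i < n; i++)
		if (die_contains(&dies[i], pc))
			return &dies[i];
	return NULL;
}
```

Filled in with the values from the output above, 0x401060 through 0x4010b6 map to main, while 0x4010b7 does not because the upper bound is exclusive. Unlike the symbol-only lookup, an address outside every range is reported as unknown instead of being attributed to the nearest preceding function.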

Inlined functions

A compiler can optimize code by replacing a call to another function with instructions that implement the operations of that called function. An inlined function eliminates the control flow changes caused by the call and return instructions that implement traditional function calls. For an inlined function, there is no need to execute the additional function prologue and epilogue instructions required to conform to the Application Binary Interface (ABI) of traditional function calls.

Inlined functions also provide additional opportunities for optimization, as the compiler can intermix instructions from the caller and the inlined function. This gives the compiler a complete picture of what code can safely be eliminated. However, if you just used the address ranges from the real functions described by the DW_TAG_subprogram, then you might misattribute an instruction to the function that called the inlined function, rather than the actual inlined function containing it. For that reason, DWARF has the DW_TAG_inlined_subroutine to provide information about inlined functions.

Surprisingly, even example.c, the simple example provided in this article, has inlined functions in the generated code: atoi and atof. The code block below shows the output of llvm-dwarfdump for atoi. There are two parts: a DW_TAG_inlined_subroutine describing each place atoi was actually inlined, and a DW_TAG_subprogram describing the generic information that doesn’t change between the multiple inlined copies.

The DW_AT_abstract_origin in the DW_TAG_inlined_subroutine points to the associated DW_TAG_subprogram that describes the file with DW_AT_decl_file and DW_AT_decl_line, just like a DWARF DIE describing a regular function. In this case, you see that this inlined function comes from line 362 of the system file /usr/include/stdlib.h.

The actual range of addresses associated with atoi is non-contiguous and described by DW_AT_ranges: [0x401060,0x401060), [0x401061,0x401065), [0x401068,0x401074), and [0x40107a,0x401080). Each range includes its first address and excludes its last, so the first range is empty. The DW_TAG_inlined_subroutine has a DW_AT_entry_pc to indicate what location is considered to be the start of the inlined function. With the compiler reordering instructions, it may not be obvious what would be considered the first instruction of an inlined function:

$ llvm-dwarfdump example --name=atoi
example:	file format elf64-x86-64

0x00000159: DW_TAG_inlined_subroutine
              DW_AT_abstract_origin	(0x00000208 "atoi")
              DW_AT_entry_pc	(0x0000000000401060)
              DW_AT_GNU_entry_view	(0x02)
              DW_AT_ranges	(0x0000000c
                 [0x0000000000401060, 0x0000000000401060)
                 [0x0000000000401061, 0x0000000000401065)
                 [0x0000000000401068, 0x0000000000401074)
                 [0x000000000040107a, 0x0000000000401080))
              DW_AT_call_file	("/home/wcohen/present/202207youarehere/example.c")
              DW_AT_call_line	(10)
              DW_AT_call_column	(6)
              DW_AT_sibling	(0x00000196)

0x00000208: DW_TAG_subprogram
              DW_AT_external	(true)
              DW_AT_name	("atoi")
              DW_AT_decl_file	("/usr/include/stdlib.h")
              DW_AT_decl_line	(362)
              DW_AT_decl_column	(0x01)
              DW_AT_prototyped	(true)
              DW_AT_type	(0x00000031 "int")
              DW_AT_inline	(DW_INL_declared_inlined)
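To make the effect of DW_AT_ranges concrete, the following C sketch checks a program counter against a list of half-open ranges and prefers the inlined function over its caller when one of them matches. The names struct pc_range, ranges_contain, and attribute_pc are hypothetical, and the sketch makes two simplifying assumptions: the address is already known to lie inside the enclosing function, and only one inlined subroutine is considered.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of a DW_AT_ranges list: the half-open range [begin, end). */
struct pc_range {
	uint64_t begin;	/* first address in the range */
	uint64_t end;	/* one past the last address */
};

/* True if pc falls inside any of the n ranges. Empty ranges never match. */
bool
ranges_contain(const struct pc_range *r, size_t n, uint64_t pc)
{
	for (size_t i = 0; i < n; i++)
		if (r[i].begin <= pc && pc < r[i].end)
			return true;
	return false;
}

/*
 * Pick the name to report for pc. The inlined subroutine wins when one
 * of its ranges covers pc; otherwise fall back to the enclosing function.
 * Assumption: pc is already known to be inside the enclosing function.
 */
const char *
attribute_pc(const char *outer_name, const char *inlined_name,
	     const struct pc_range *inlined, size_t n, uint64_t pc)
{
	return ranges_contain(inlined, n, pc) ? inlined_name : outer_name;
}
```

Using the four atoi ranges listed above, 0x401061 is attributed to atoi while 0x401065 is not, because the end of each range is exclusive. A real tool would also check every other inlined subroutine in the function, such as atof here, and handle inlined functions nested inside other inlined functions.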

Implications of inlined functions

Most programmers think of the processor completely executing one line in the source code before moving on to the next. Similarly, with the expected function call ABI, programmers assume that the caller completes the operations in the statements before the call, and that the statements after the call do not start until the callee returns. With inlined functions, these assumptions may not hold and the boundaries between functions become fuzzy. Instructions from the caller may be scheduled before or after the instructions from the inlined functions, regardless of where they were in the source code, if the compiler determines that the final result will be the same. This may lead to unexpected values when inspecting variables before and after the inlined functions.

Further reading

This article covers a very small part of how the mapping between source code and the executable binary is implemented with DWARF. As a starting point to learn more about DWARF, you might read Introduction to the DWARF Debugging Format by Michael J. Eager. Look for upcoming articles about how the instructions in the functions are mapped back to the source code and how backtraces are generated.
