How Tracee solves the lack of BTF information

Tracee is a project by Aqua Security for tracing processes at runtime. By tracing processes using Linux eBPF (extended Berkeley Packet Filter) technology, Tracee can correlate collected information and identify malicious behavioral patterns.

eBPF

BPF is a system to help in network traffic analysis. The later eBPF system extends classic BPF to improve the programmability of the Linux kernel in different areas, such as network filtering, function hooking, and so on. Thanks to its register-based virtual machine, which is embedded in the kernel, eBPF can execute programs written in a restricted C language without needing to recompile the kernel or load a module. Through eBPF, you can run your program in kernel context and hook various events in the kernel path. To do so, eBPF needs deep knowledge of the data structures that the kernel is using.

eBPF CO-RE

eBPF interfaces with the Linux kernel ABI (application binary interface). Access to kernel structures from the eBPF VM depends on the specific Linux kernel release.

eBPF CO-RE (compile once, run everywhere) is the ability to write an eBPF program that will successfully compile, pass kernel verification, and work correctly across different kernel releases without the need to recompile it for each particular kernel.

Ingredients

CO-RE needs a precise synergy of these components:

  • BTF (BPF type format) information: Allows the capture of crucial pieces of information about kernel and BPF program types and code, enabling all the other parts of the BPF CO-RE puzzle.

  • Compiler (Clang): Records relocation information. For example, if you were going to access the task_struct->pid field, Clang would record that it was exactly a field named pid of type pid_t residing within a struct task_struct. This system ensures that even if a target kernel has a task_struct layout in which the pid field is moved to a different offset within the task_struct structure, you’ll still be able to find it just by its name and type information.

  • BPF loader (libbpf): Ties BTFs from the kernel and BPF programs together to adjust compiled BPF code to specific kernels on target hosts.
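The relocation idea described in the Clang item can be illustrated with a toy model. This is only a sketch: the `field`, `layout`, and `relocate` names are hypothetical, the offsets are made up, and real BTF and libbpf relocation logic is far richer. It shows only that a field located by struct name, field name, and type can be found even when its offset differs between kernels.

```go
package main

import (
	"errors"
	"fmt"
)

// field describes one struct member the way type information might record it.
type field struct {
	Name   string
	Type   string
	Offset int // byte offset inside the struct (illustrative values only)
}

// layout is a toy stand-in for the type information of one kernel build:
// struct name -> list of members.
type layout map[string][]field

// relocate finds a member by struct name, field name, and type, and returns
// the offset it has in the given layout.
func relocate(l layout, structName, fieldName, fieldType string) (int, error) {
	for _, f := range l[structName] {
		if f.Name == fieldName && f.Type == fieldType {
			return f.Offset, nil
		}
	}
	return 0, errors.New("no match for " + structName + "." + fieldName)
}

func main() {
	// Two hypothetical kernels where pid lives at different (made-up) offsets.
	kernels := []struct {
		name string
		l    layout
	}{
		{"kernel A", layout{"task_struct": {{Name: "pid", Type: "pid_t", Offset: 1256}}}},
		{"kernel B", layout{"task_struct": {{Name: "pid", Type: "pid_t", Offset: 1304}}}},
	}
	for _, k := range kernels {
		off, err := relocate(k.l, "task_struct", "pid", "pid_t")
		if err != nil {
			fmt.Println(k.name, "error:", err)
			continue
		}
		fmt.Printf("%s: task_struct->pid is at offset %d\n", k.name, off)
	}
}
```

The same lookup succeeds on both layouts because it never hard-codes an offset, which is the property CO-RE relies on.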

So how do these ingredients mix together for a successful recipe?

Development/build

To make the code portable, the following tricks come into play:

  • CO-RE helpers/macros
  • BTF-defined maps
  • #include "vmlinux.h" (the header file containing all the kernel types)

Run

The kernel must be built with the CONFIG_DEBUG_INFO_BTF=y option in order to provide the /sys/kernel/btf/vmlinux interface that exposes BTF-formatted kernel types. This allows libbpf to resolve and match all the types and fields and update necessary offsets and other relocatable data to ensure that the eBPF program works properly for the specific kernel on the target host.

The problem

The problem arises when an eBPF program is written to be portable but the target kernel doesn’t expose the /sys/kernel/btf/vmlinux interface. For more information, refer to this list of distributions that support BTF.

To load and run an eBPF object in different kernels, the libbpf loader uses the BTF information to calculate field offset relocations. Without the BTF interface, the loader doesn’t have the necessary information to adjust the previously recorded types that the program tries to access after processing the object for the running kernel.

Is it possible to avoid this problem?

Use cases

This article explores Tracee, an Aqua Security open source project that provides a possible solution.

Tracee provides different running modes to adapt itself to the environment conditions. It supports two eBPF integration modes:

  • CO-RE: A portable mode, which seamlessly runs on all supported environments
  • Non CO-RE: A kernel-specific mode, requiring the eBPF object to be built for the target host

Both of them are implemented in the eBPF C code (pkg/ebpf/c/tracee.bpf.c), where the pre-processing conditional directive takes place. This allows you to compile the eBPF binary as CO-RE by passing the -DCORE argument at build time with Clang (take a look at the bpf-core Make target).

In this article, we’re going to cover a case of the portable mode in which the eBPF binary is built CO-RE, but the target kernel has not been built with the CONFIG_DEBUG_INFO_BTF=y option.

To better understand this scenario, it helps to understand what’s possible when the kernel doesn’t expose BTF-formatted types on sysfs.

No BTF support

If you want to run Tracee on a host without BTF support, there are two options:

  1. Build and install the eBPF object for your kernel. This depends on Clang and on the availability of a kernel version-specific kernel-headers package.

  2. Download the BTF files from BTFHub for your kernel release and provide them to the tracee-ebpf loader through the TRACEE_BTF_FILE environment variable.

The first option is not a CO-RE solution. It compiles the eBPF binary, including a long list of kernel headers. That means you need kernel development packages installed on the target system. Also, this solution needs Clang installed on your target machine. The Clang compiler can be resource-heavy, so compiling eBPF code can use a significant amount of resources, potentially affecting a carefully balanced production workload. In addition, it’s good practice to avoid the presence of a compiler in your production environment, because attackers could use it to build an exploit and perform a privilege escalation.

The second option is a CO-RE solution. The problem here is that you have to provide the BTF files on your system in order to make Tracee work. The entire archive is nearly 1.3 GB. Of course, you can provide just the right BTF file for your kernel release, but that can be difficult when dealing with different kernel releases.

In the end, these possible solutions can also introduce problems, and that’s where Tracee works its magic.

A portable solution

With a non-trivial building procedure, the Tracee project compiles a binary to be CO-RE even if the target environment doesn’t provide BTF information. This is possible with the embed Go package, which provides, at runtime, access to files embedded in the program. During the build, the continuous integration (CI) pipeline downloads, extracts, minimizes, and then embeds BTF files along with the eBPF object inside the resulting tracee-ebpf binary.

Tracee can extract the correct BTF file and provide it to libbpf, which in turn loads the eBPF program to run across different kernels. But how can Tracee embed all these BTF files downloaded from BTFHub without weighing too much in the end?

It uses a feature recently introduced in bpftool by the Kinvolk team called BTFGen, available using the bpftool gen min_core_btf subcommand. Given an eBPF program, BTFGen generates reduced BTF files, collecting just what the eBPF code needs for its run. This reduction allows Tracee to embed all these files, which are now lighter (just a few kilobytes), and support kernels that don’t have the /sys/kernel/btf/vmlinux interface exposed.

Tracee build

Here’s the execution flow of the Tracee build:

(Alessio Greggi and Massimiliano Giovagnoli, CC BY-SA 4.0)

First, you must build the tracee-ebpf binary, the Go program that loads the eBPF object. The Makefile provides the command make bpf-core to build the tracee.bpf.core.o object with BTF records.

Then STATIC=1 BTFHUB=1 make all builds tracee-ebpf, which has btfhub targeted as a dependency. This last target runs the script 3rdparty/btfhub.sh, which is responsible for downloading the BTFHub repositories.

Once downloaded and placed in the 3rdparty directory, the procedure executes the downloaded script 3rdparty/btfhub/tools/btfgen.sh. This script generates reduced BTF files, tailored for the tracee.bpf.core.o eBPF binary.

The script collects *.tar.xz files from 3rdparty/btfhub-archive/ to uncompress them and finally process them with bpftool, using the following command:

for file in $(find ./archive/${dir} -name "*.tar.xz"); do
    dir=$(dirname $file)
    base=$(basename $file)
    extracted=$(tar xvfJ $dir/$base)
    bpftool gen min_core_btf ${extracted} dist/btfhub/${extracted} tracee.bpf.core.o
done

This code has been simplified to make it easier to understand the scenario.

Now, you have all the ingredients available for the recipe:

  • tracee.bpf.core.o eBPF object
  • Reduced BTF files (for all kernel releases)
  • tracee-ebpf Go source code

At this point, go build is invoked to do its job. Inside the embedded-ebpf.go file, you can find the following code:

//go:embed "dist/tracee.bpf.core.o"
//go:embed "dist/btfhub/*"

Here, the Go compiler is instructed to embed the eBPF CO-RE object with all the BTF-reduced files inside itself. Once compiled, these files will be available using the embed.FS file system. To have an idea of the current situation, you can imagine the binary with a file system structured like this:

dist
├── btfhub
│   ├── 4.19.0-17-amd64.btf
│   ├── 4.19.0-17-cloud-amd64.btf
│   ├── 4.19.0-17-rt-amd64.btf
│   ├── 4.19.0-18-amd64.btf
│   ├── 4.19.0-18-cloud-amd64.btf
│   ├── 4.19.0-18-rt-amd64.btf
│   ├── 4.19.0-20-amd64.btf
│   ├── 4.19.0-20-cloud-amd64.btf
│   ├── 4.19.0-20-rt-amd64.btf
│   └── ...
└── tracee.bpf.core.o

The Go binary is ready. Now to try it out!

Tracee run

Here’s the execution flow of the Tracee run:

(Alessio Greggi and Massimiliano Giovagnoli, CC BY-SA 4.0)

As the flow chart illustrates, one of the very first phases of tracee-ebpf execution is to discover the environment where it is running. The first condition is an abstraction of the cmd/tracee-ebpf/initialize/bpfobject.go file, specifically where the BpfObject() function takes place. The program performs some checks to understand the environment and make decisions based on it:

  1. BPF file given and BTF (vmlinux or env) exists: always load BPF as CO-RE
  2. BPF file given but no BTF exists: it’s a non CO-RE BPF
  3. No BPF file given and BTF (vmlinux or env) exists: load embedded BPF as CO-RE
  4. No BPF file given and no BTF available: check embedded BTF files
  5. No BPF file given and no BTF available and no embedded BTF: non CO-RE BPF

Here’s the code extract:

func BpfObject(config *tracee.Config, kConfig *helpers.KernelConfig, OSInfo *helpers.OSInfo) error {
        ...
        bpfFilePath, err := checkEnvPath("TRACEE_BPF_FILE")
        ...
        btfFilePath, err := checkEnvPath("TRACEE_BTF_FILE")
        ...
        // Decision ordering:
        // (1) BPF file given & BTF (vmlinux or env) exists: always load BPF as CO-RE
        ...
        // (2) BPF file given & if no BTF exists: it's a non CO-RE BPF
        ...
        // (3) no BPF file given & BTF (vmlinux or env) exists: load embedded BPF as CO-RE
        ...
        // (4) no BPF file given & no BTF available: check embedded BTF files
        unpackBTFFile = filepath.Join(traceeInstallPath, "/tracee.btf")
        err = unpackBTFHub(unpackBTFFile, OSInfo)

        if err == nil {
                if debug {
                        fmt.Printf("BTF: using BTF file from embedded btfhub: %v\n", unpackBTFFile)
                }
                config.BTFObjPath = unpackBTFFile
                bpfFilePath = "embedded-core"
                bpfBytes, err = unpackCOREBinary()
                if err != nil {
                        return fmt.Errorf("could not unpack embedded CO-RE eBPF object: %v", err)
                }

                goto out
        }
        // (5) no BPF file given & no BTF available & no embedded BTF: non CO-RE BPF
        ...
out:
        config.KernelConfig = kConfig
        config.BPFObjPath = bpfFilePath
        config.BPFObjBytes = bpfBytes

        return nil
}
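The five-way decision ordering can be condensed into a small pure function. This is a sketch for illustration, not Tracee’s implementation: `chooseLoadPath` and its boolean parameters are hypothetical, and the returned strings only label each case.

```go
package main

import "fmt"

// chooseLoadPath mirrors the decision ordering described above.
//   bpfFileGiven:     a BPF object path was provided (for example via TRACEE_BPF_FILE)
//   btfAvailable:     BTF exists, either from vmlinux or from the environment
//   embeddedBTFFound: a matching BTF file exists in the embedded bundle
func chooseLoadPath(bpfFileGiven, btfAvailable, embeddedBTFFound bool) string {
	switch {
	case bpfFileGiven && btfAvailable:
		return "load given BPF file as CO-RE" // case 1
	case bpfFileGiven:
		return "load given BPF file as non CO-RE" // case 2
	case btfAvailable:
		return "load embedded BPF as CO-RE" // case 3
	case embeddedBTFFound:
		return "load embedded BPF as CO-RE with embedded BTF" // case 4
	default:
		return "fall back to non CO-RE BPF" // case 5
	}
}

func main() {
	// The scenario this article focuses on: nothing provided, no kernel BTF,
	// but a matching embedded BTF file exists.
	fmt.Println(chooseLoadPath(false, false, true))
}
```

Ordering the switch cases from most to least specific keeps each condition short, since earlier cases have already ruled out the combinations above them.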

This analysis focuses on the fourth case, when the eBPF program and BTF files are not provided to tracee-ebpf. At that point, tracee-ebpf tries to load the eBPF program by extracting all the necessary files from its embedded file system. tracee-ebpf is able to provide the files that it needs to run, even in a hostile environment. It is a kind of high-resilience mode used when none of the conditions have been satisfied.

As you see, BpfObject() calls these functions in the fourth case branch:

  • unpackBTFHub()
  • unpackCOREBinary()

They extract, respectively:

  • The BTF file for the underlying kernel
  • The BPF CO-RE binary

Unpack the BTFHub

Now take a look, starting from unpackBTFHub():

func unpackBTFHub(outFilePath string, OSInfo *helpers.OSInfo) error {
        var btfFilePath string

        osId := OSInfo.GetOSReleaseFieldValue(helpers.OS_ID)
        versionId := strings.Replace(OSInfo.GetOSReleaseFieldValue(helpers.OS_VERSION_ID), "\"", "", -1)
        kernelRelease := OSInfo.GetOSReleaseFieldValue(helpers.OS_KERNEL_RELEASE)
        arch := OSInfo.GetOSReleaseFieldValue(helpers.OS_ARCH)

        if err := os.MkdirAll(filepath.Dir(outFilePath), 0755); err != nil {
                return fmt.Errorf("couldn't create temp dir: %s", err.Error())
        }

        btfFilePath = fmt.Sprintf("dist/btfhub/%s/%s/%s/%s.btf", osId, versionId, arch, kernelRelease)
        btfFile, err := embed.BPFBundleInjected.Open(btfFilePath)
        if err != nil {
                return fmt.Errorf("error opening embedded btfhub file: %s", err.Error())
        }
        defer btfFile.Close()

        outFile, err := os.Create(outFilePath)
        if err != nil {
                return fmt.Errorf("couldn't create btf file: %s", err.Error())
        }
        defer outFile.Close()

        if _, err := io.Copy(outFile, btfFile); err != nil {
                return fmt.Errorf("error copying embedded btfhub file: %s", err.Error())

        }

        return nil
}

The function has a first phase where it collects information about the running kernel (osId, versionId, kernelRelease, and so on). Then, it creates the directory that is going to host the BTF file (/tmp/tracee by default). It retrieves the right BTF file from the embedded file system:

btfFile, err := embed.BPFBundleInjected.Open(btfFilePath)

Finally, it creates and fills the file.

Unpack the CORE Binary

The unpackCOREBinary() function does a similar thing:

func unpackCOREBinary() ([]byte, error) {
        b, err := embed.BPFBundleInjected.ReadFile("dist/tracee.bpf.core.o")
        if err != nil {
                return nil, err
        }

        if debug.Enabled() {
                fmt.Println("unpacked CO:RE bpf object file into memory")
        }

        return b, nil
}

Once the main function BpfObject() returns, tracee-ebpf is ready to load the eBPF binary through libbpfgo. This is done in the initBPF() function, inside pkg/ebpf/tracee.go. Here’s the configuration of the program execution:

func (t *Tracee) initBPF() error {
        ...
        newModuleArgs := bpf.NewModuleArgs{
                KConfigFilePath: t.config.KernelConfig.GetKernelConfigFilePath(),
                BTFObjPath:      t.config.BTFObjPath,
                BPFObjBuff:      t.config.BPFObjBytes,
                BPFObjName:      t.config.BPFObjPath,
        }

        // Open the eBPF object file (create a new module)

        t.bpfModule, err = bpf.NewModuleFromBufferArgs(newModuleArgs)
        if err != nil {
                return err
        }
        ...
}

In this piece of code we’re initializing the eBPF args by filling the libbpfgo structure NewModuleArgs{}. Through its BTFObjPath argument, we’re able to instruct libbpf to use the BTF file previously extracted by the BpfObject() function.

At this point, tracee-ebpf is ready to run properly!

(Alessio Greggi and Massimiliano Giovagnoli, CC BY-SA 4.0)

eBPF module initialization

Next, during the execution of the Tracee.Init() function, the configured arguments will be used to open the eBPF object file:

Tracee.bpfModule = libbpfgo.NewModuleFromBufferArgs(newModuleArgs)

Initialize the probes:

t.probes, err = probes.Init(t.bpfModule, netEnabled)

Load the eBPF object into kernel:

err = t.bpfModule.BPFLoadObject()

Populate eBPF maps with preliminary information:

err = t.populateBPFMaps()

And finally, attach eBPF programs to selected events’ probes:

err = t.attachProbes()

Conclusion

Just as eBPF simplified the way to program the kernel, CO-RE is tackling another barrier. But leveraging such features has some requirements. Fortunately, with Tracee, the Aqua Security team found a way to take advantage of portability in case those requirements can’t be satisfied.

At the same time, we’re sure that this is only the beginning of a continuously evolving subsystem that will find increasing support over and over, even in different operating systems.
