Dmesg under the hood

  • Date: 2018-09-24 05:13:05

Hey,

This week I wanted to discover a bit more about how dmesg works under the hood. In the past, I'd wanted alerting based on error messages popping up in dmesg, so, maybe by trying to create something that uses the same machinery that dmesg uses under the hood, I could better understand it.

Also, I knew that in some systems it was possible to gather the same information from kern.log, and that dmesg was all about reading “the kernel ring buffer” - but what did that mean?

More specifically, where was dmesg reading from?

In this blog post, I go through some exploration to better understand the whole picture.

  • Producing logs from kernel space
  • Producing logs from iptables
  • Producing logs from a Linux loadable kernel module
  • Producing logs from kmsg
  • Seeing kernel logs from user space
  • Interpreting the kmsg messages

You can jump to the tl;dr.

Producing logs from kernel space

Whenever something in the Linux kernel (which could be a module) wants to do the equivalent of a printf, a similar utility (printk) can be used.

That allows the program to give information back to the developer (or any other user) in userspace about what’s going on with the module - be it a warning message, a stack trace, or some pure debugging information.

To test out producing logs from the Kernel, I came up with some ideas:

  1. iptables has a LOG target that essentially produces some logs from iptables itself whenever the target is, well, targeted. iptables being something that runs in kernel space (not the CLI, I mean - the actual thing), we could see logs coming out of the kernel;
  2. writing a minimal kernel module should not be something very hard - many people have done that before. I could write a printk("hello-world") and that’s it!
  3. check if there’s something simpler that we could use to produce the logs (maybe the kernel exposes a standard interface for doing that?)

This being an exploration, why not try all of them?

Producing logs from iptables

Although this blog post is not particularly about iptables, we can use it as an example where interacting with something going on deep under the hood in the kernel is interesting for a user sitting in userspace.

From the iptables man page:

Iptables is used to set up, maintain, and inspect the tables of IP packet filter rules in the Linux kernel. Several different tables may be defined.

These rules can be either very simple things (drop a packet if it comes from a given source) or very complicated chains of rules that depend on a bunch of conditionals.

Acknowledging that it’s very useful to be able to visualize the packets that are hitting a specific chain of rules, the LOG target came into existence.

LOG: Turn on kernel logging of matching packets. When this option is set for a rule, the Linux kernel will print some information on all matching packets (like most IP/IPv6 header fields) via the kernel log.

Here’s an example set up that allows us to see that working:

# List the current state of the `filter` table.
# As we can see below, nothing is set up:
# - any packet arriving w/ our machine as the
#   destination is accepted;
# - any packet arriving that should be forwarded
#   is dropped;
# - any packet that is sent from our machine is
#   accepted (can go through).
iptables --table filter --list
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy DROP)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

# Add a rule to the OUTPUT chain to log every packet
# that is destined towards 8.8.8.8 (google's public
# dns)
iptables \
        --table filter \
        --insert OUTPUT \
        --jump LOG \
        --destination 8.8.8.8 \
        --log-prefix="[google-dns-out]"

# Check that the rule has been placed in the OUTPUT
# chain of the `filter` table
iptables --table filter --list OUTPUT --numeric
Chain OUTPUT (policy ACCEPT)
target   prot opt source       destination
LOG      all  --  0.0.0.0/0    8.8.8.8        LOG flags 0 level 4 prefix "[google-dns-out]"

# Perform a DNS query targeting 8.8.8.8
dig example.com @8.8.8.8
...
example.com.    18272   IN      A       93.184.216.34

# Verify the request packet in our logs (dmesg)
dmesg | grep google-dns-out
[66458.608136] [google-dns-out]IN= OUT=enp0s3 SRC=10.0.2.15 DST=8.8.8.8 LEN=80 TOS=0x00 PREC=0x00 TTL=64 ID=30602 PROTO=UDP SPT=39573 DPT=53 LEN=60

Producing logs from a Linux loadable kernel module

Not being a driver developer myself (honestly, having never written a kernel module before), it was interesting to look for resources that would teach me how to accomplish this task.

Based on a guide from The Linux Kernel Labs, I came up with the following code for a very simple kernel module:

/**
 * `module.h` contains the most basic functionality needed for
 * us to create a loadable kernel module, including the `MODULE_*`
 * macros, `module_*` functions and including a bunch of other
 * relevant headers that provide useful functionality for us
 * (for instance, `printk`, which comes from `linux/printk.h`,
 * a header included by `linux/module.h`).
 */
#include <linux/module.h>

/**
 * Following, we make use of several macros to properly provide
 * information about the kernel module that we're creating.
 *
 * The information supplied here is visible through tools like
 * `modinfo`.
 *
 * Note.: the license you choose here **DOES AFFECT** other things -
 * by using a proprietary license your kernel will be "tainted".
 */
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Ciro S. Costa");
MODULE_DESCRIPTION("A hello-world printer");
MODULE_VERSION("0.1");

/**
 * hello_init - initializes the module
 *
 * The `hello_init` method defines the procedures that perform the
 * set up of our module.
 */
static int
hello_init(void)
{
        // By making use of `printk` here (in the initialization),
        // we can look at `dmesg` and verify that what we log here
        // appears there at the moment that we load the module with
        // `insmod`.
        printk(KERN_INFO "HELLO-WORLD: Hello world!\n");
        return 0;
}

static void
hello_exit(void)
{
        // similar to `init`, but for the removal time.
        printk(KERN_INFO "HELLO-WORLD: Bye bye world!\n");
}

// registers the `hello_init` method as the method to run at module
// insertion time.
module_init(hello_init);

// similar, but for `removal`
module_exit(hello_exit);

ps.: the source code is available at github.com/cirocosta/hello-world-lkm

Load the module, and there you go:

dmesg | grep HELLO-WORLD
[62076.224353] HELLO-WORLD: Hello world!

Producing logs from kmsg

https://www.kernel.org/doc/Documentation/ABI/testing/dev-kmsg

A standard interface that we can use to insert messages into the printk buffer is exactly the same one that we can use to read the messages from it: /dev/kmsg.

From the kernel documentation:

Injecting messages: Every write() to the opened device node places a log entry in the kernel’s printk buffer.

That is, whatever we echo "haha" as a single write to /dev/kmsg goes into the buffer.

Let’s try it then:

# Generate the message from my current unprivileged
# user and then pipe to the privileged `tee`, which is
# able to write into `/dev/kmsg` (yep, /dev/kmsg can be
# opened and read by a regular user, but to write you
# need more privileges)
echo "haha" | sudo tee "/dev/kmsg"

# Check that we can see our message is indeed there:
dmesg | grep haha
[67030.010334] haha

Seeing kernel logs from user space

Having already spoiled the blog post by mentioning kmsg in the section before, there’s not much left to talk about now.

The kernel provides us /dev/kmsg in userspace, a special device that allows us to read from the ring buffer (with multitenancy in mind), and that’s what dmesg uses.

# Trace the syscalls `openat` and `read` to verify
# that it reads from `/dev/kmsg`
strace -e openat,read -f dmesg > /dev/null
openat(AT_FDCWD, "/dev/kmsg", O_RDONLY|O_NONBLOCK) = 3
read(3, "5,0,0,-;Linux version 4.15.0-34-"..., 8191) = 192
read(3, "6,1,0,-;Command line: BOOT_IMAGE"..., 8191) = 142
read(3, "6,2,0,-;KERNEL supported cpus:\n", 8191) = 31
read(3, "6,3,0,-;  Intel GenuineIntel\n", 8191) = 29

Going even deeper, whenever a read(2) syscall is issued targeting such a file, the kernel triggers kernel/printk/devkmsg_read internally, taking a message from the circular queue, formatting it, and then sending it back to the user.

# We can use `iovisor/bcc#examples/trace/stacksnoop.py` to
# gather the stack trace from the execution of `devkmsg_read`
# to verify that the `read` gets right there when `vfs_read`
# is called.
./stacksnoop.py -v devkmsg_read
TIME(s)            COMM         PID    CPU FUNCTION
1.875072956        tail         1565   0   devkmsg_read
        devkmsg_read
        vfs_read
        sys_read
        do_syscall_64
        entry_SYSCALL_64_after_hwframe

A similar path is followed when writing to the circular queue as well: whenever a write(2) is issued, at some point devkmsg_write is called.

devkmsg_write translates such a call to the equivalent of a printk, which then takes the path of reaching log_store, the method that ends up finally taking a free space from the queue and adding the log message:

# In one terminal
echo "haha" > /dev/kmsg

# In another terminal
./stacksnoop.py -v log_store
TIME(s)            COMM         PID    CPU FUNCTION
1.786375046        bash         1450   2   log_store
        log_store
        printk_emit
        devkmsg_write
        new_sync_write
        __vfs_write
        vfs_write
        sys_write
        do_syscall_64
        entry_SYSCALL_64_after_hwframe

By knowing that the kernel provides this interface, we can implement a simple program that constantly performs read(2)s against such a device and then parses the contents (messages) that arrive.
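
To make that concrete, here's a minimal sketch of such a reader in Go - assuming read access to /dev/kmsg, and with a buffer size that is an arbitrary choice of mine, not something dmesg prescribes:

package main

import (
	"fmt"
	"os"
)

func main() {
	f, err := os.Open("/dev/kmsg")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Each read(2) against /dev/kmsg returns exactly one record,
	// so there's no extra framing to do: first we drain the
	// records already sitting in the ring buffer, then each Read
	// blocks until a new message arrives.
	buf := make([]byte, 8192)
	for {
		n, err := f.Read(buf)
		if err != nil {
			panic(err)
		}
		fmt.Print(string(buf[:n]))
	}
}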

Interpreting the kmsg messages

Knowing that every read(2) syscall performed against /dev/kmsg returns us a single message, we’re left with:

  • implementing a reader that continuously looks at /dev/kmsg and then extracts the raw messages from there; as well as
  • implementing the parsing of those messages.

The latter is the most interesting, imo.

Each message comes packed in the following format: a list of comma-separated info fields and a message, the two separated by a semicolon.

//
//                      INFO                        MSG
//     .------------------------------------------. .------.
//    |                                            |        |
//    | int    int       int      char, <ignore>  | string |
//    prefix,  seq, timestamp_us, flag[,..........];<message>
//

There are four standard fields in the info section, leaving room for other fields in the future.
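
As an illustration of what that parsing can look like, here's a small Go sketch (names like Record and parse are my own, not the exact code from dmesg_exporter): split once on the semicolon to separate the info section from the message, then split the info section on commas.

package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Record holds the standard fields of a /dev/kmsg entry.
type Record struct {
	Prefix      int64 // syslog-encoded priority + facility
	Seq         int64 // position in the buffer's sequence of messages
	TimestampUs int64 // microseconds since boot
	Message     string
}

func parse(raw string) (rec Record, err error) {
	// INFO;MSG - a single semicolon separates the two sections.
	parts := strings.SplitN(raw, ";", 2)
	if len(parts) != 2 {
		return rec, fmt.Errorf("no ';' separating info from message")
	}
	rec.Message = strings.TrimSuffix(parts[1], "\n")

	// prefix,seq,timestamp_us,flag[,...] - we only need the first
	// three comma-separated fields; the flag (and any future
	// fields) can be ignored.
	info := strings.Split(parts[0], ",")
	if len(info) < 4 {
		return rec, fmt.Errorf("expected at least 4 info fields")
	}
	if rec.Prefix, err = strconv.ParseInt(info[0], 10, 64); err != nil {
		return rec, err
	}
	if rec.Seq, err = strconv.ParseInt(info[1], 10, 64); err != nil {
		return rec, err
	}
	if rec.TimestampUs, err = strconv.ParseInt(info[2], 10, 64); err != nil {
		return rec, err
	}
	return rec, nil
}

func main() {
	// One of the records from the `strace` output above.
	rec, err := parse("6,2,0,-;KERNEL supported cpus:\n")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", rec)
}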

From the prefix field, we can extract two pieces of information: the priority and the facility.

// DecodePrefix extracts both priority and facility from a given
// syslog(2) encoded prefix.
//
//        facility     priority
//      .-----------.  .-----.
//      |           |  |     |
//      7  6  5  4  3  2  1  0    bits
//

With the rest of the fields, we’re able to know when the message was produced and what ID the message carries in the sequence of messages that have been put in the buffer.
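
The bit manipulation itself is tiny - here's a sketch of it in Go, assuming syslog's `facility*8 + priority` encoding (the function name mirrors the comment above but is illustrative, not necessarily the exporter's actual API):

package main

import "fmt"

func DecodePrefix(prefix int64) (facility, priority int64) {
	priority = prefix & 7  // the 3 lowest bits
	facility = prefix >> 3 // everything above them
	return facility, priority
}

func main() {
	// The `6` from the `6,2,0,-;KERNEL supported cpus:` record
	// seen earlier decodes to facility 0 (kern), priority 6 (info).
	facility, priority := DecodePrefix(6)
	fmt.Println(facility, priority) // 0 6
}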

If you’re interested in seeing this parsing in the form of actual code, I wrote a Golang implementation that you can check here: github.com/cirocosta/dmesg_exporter/kmsg/kmsg.go.

Closing thoughts

It was great to have the opportunity to go deep into the stack and verify how components from the kernel can communicate with userspace applications.

That led me to create dmesg_exporter, a tool for exporting dmesg log metrics so that people can eventually alert on messages marked as critical arriving from a specific facility.

If you have any questions, or have ideas for improvements, please let me know! I’m @cirowrc on Twitter, and I’d love to hear from you!

Have a good one!