
C++ Low-latency Logging

Published: at 03:35 PM


Objective

To compare the performance of the latest versions of the binlog, Quill, and fmtlog libraries when logging common data types (strings, doubles, complex structs, etc.).

Methods of Comparison

There are two methods:

All of the following results come from tests run on a machine with isolated CPU cores, with the test threads pinned to those cores.

Single-thread results

Logging one Foo struct (char data[200] + int64_t + double):

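Results of Method 1 (latency in ns at each quantile):

| Library | 0th | 50th | 75th | 90th | 95th | 99th | 99.999th | 100th |
|---|---|---|---|---|---|---|---|---|
| Quill | 19 | 29 | 29 | 29 | 49 | 79 | 759 | 949 |
| fmtlog | 19 | 19 | 29 | 29 | 29 | 29 | 1479 | 1949 |
| binlog | 29 | 29 | 39 | 39 | 39 | 39 | 102267 | 104247 |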
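Each row in these tables summarizes many individually timed log calls as latency quantiles. The post does not show how the quantiles were computed, so the following is only a minimal sketch using the nearest-rank definition; the helper name `quantile_ns` and the choice of nearest-rank are assumptions, not the author's harness.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: nearest-rank quantile of latency samples in nanoseconds.
// q is a percentage in [0, 100]; q = 0 yields the minimum, q = 100 the maximum.
std::uint64_t quantile_ns(std::vector<std::uint64_t> samples, double q) {
    assert(!samples.empty() && q >= 0.0 && q <= 100.0);
    std::sort(samples.begin(), samples.end());
    // Nearest rank: the smallest sample such that at least q% of all
    // samples are less than or equal to it.
    auto rank = static_cast<std::size_t>(
        std::ceil(q / 100.0 * static_cast<double>(samples.size())));
    if (rank == 0) rank = 1;  // the 0th quantile maps to the first sample
    return samples[rank - 1];
}
```

With this definition, a 99.999th quantile only differs from the maximum once there are more than 100,000 samples, so tail columns like the ones above need long runs to be meaningful.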

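Results of Method 2 (latency in ns at each quantile):

| Library | 0th | 50th | 75th | 90th | 95th | 99th | 99.999th | 100th |
|---|---|---|---|---|---|---|---|---|
| Quill | 99 | 109 | 119 | 159 | 229 | 389 | 2489 | 2599 |
| fmtlog | 30 | 40 | 40 | 40 | 40 | 110 | 32904 | 56198 |
| binlog | 39 | 49 | 49 | 59 | 59 | 69 | 117157 | 117737 |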

Note that to optimize Quill's speed, I changed default_queue_capacity in the config to 67108864 (64 MB), so that no new SPSC queue has to be created while the test is running. Quill keeps a thread_local SPSC queue for each thread, and the logger's backend thread pops entries out of each thread's SPSC queue and writes them to the file, as does fmtlog.
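To make the per-thread queue design concrete, here is a minimal sketch of a fixed-capacity single-producer/single-consumer ring buffer. It is not Quill's or fmtlog's actual code (their queues store variable-sized binary records); the class name `SpscQueue` and its interface are hypothetical and only illustrate why capacity matters: when `try_push` reports a full queue, a logger has to block, drop the message, or allocate a larger queue on the hot path.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity SPSC ring buffer, for illustration only.
// Exactly one thread may call try_push and exactly one may call try_pop.
template <typename T>
class SpscQueue {
public:
    // One extra slot distinguishes "full" from "empty".
    explicit SpscQueue(std::size_t capacity) : slots_(capacity + 1) {
        assert(capacity > 0);
    }

    // Producer side (the logging thread). Returns false when the queue is full.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % slots_.size();
        if (next == head_.load(std::memory_order_acquire)) return false;
        slots_[tail] = value;
        tail_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side (the backend thread). Returns false when the queue is empty.
    bool try_pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;
        out = slots_[head];
        head_.store((head + 1) % slots_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<T> slots_;
    std::atomic<std::size_t> head_{0};  // next slot to read
    std::atomic<std::size_t> tail_{0};  // next slot to write
};
```

In a logger built this way, each thread would own one such queue (for example through a `thread_local` variable), and a single backend thread would drain all of them in turn and write to the file.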

In Method 2, I create a volatile char array the size of the L1 cache and rewrite it between log calls, which is equivalent to polluting the L1 cache before every measured call.
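The cache-polluting step can be sketched as follows. The original harness is not shown, so everything here is an assumption: the 32 KiB L1 data cache size (common, but machine-specific), the names `pollute_l1` and `measure_cold`, and the use of `std::chrono::steady_clock` for timing.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed L1 data cache size; check the actual value for the target CPU.
constexpr std::size_t kL1Bytes = 32 * 1024;

// volatile keeps the compiler from optimizing the writes away.
volatile char g_scratch[kL1Bytes];

// Rewrite the whole scratch buffer so that it displaces whatever the
// logging code had left in the L1 data cache.
void pollute_l1() {
    for (std::size_t i = 0; i < kL1Bytes; ++i) {
        g_scratch[i] = static_cast<char>(i & 0x7F);
    }
}

// Time `op` once per iteration, polluting L1 before each timed call.
// The pollution itself is outside the timed region.
template <typename Op>
std::vector<std::int64_t> measure_cold(Op&& op, std::size_t iterations) {
    assert(iterations > 0);
    std::vector<std::int64_t> ns;
    ns.reserve(iterations);
    for (std::size_t i = 0; i < iterations; ++i) {
        pollute_l1();
        const auto t0 = std::chrono::steady_clock::now();
        op();
        const auto t1 = std::chrono::steady_clock::now();
        ns.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count());
    }
    return ns;
}
```

In a real benchmark `op` would be a single log call, for example `measure_cold([&] { /* log one Foo */ }, 1000000)`, and the returned samples would then be reduced to quantiles.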

Multi-thread results

All multi-thread results use Method 2.

4-thread results:

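| Library | 0th | 50th | 75th | 90th | 95th | 99th | 99.999th | 100th |
|---|---|---|---|---|---|---|---|---|
| Quill | 30 | 60 | 80 | 80 | 90 | 100 | 201087 | 201636 |
| fmtlog | 29 | 29 | 39 | 39 | 39 | 79 | 100663 | 103036 |
| binlog | 39 | 49 | 49 | 59 | 59 | 69 | 114607 | 118427 |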

8-thread results:

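| Library | 0th | 50th | 75th | 90th | 95th | 99th | 99.999th | 100th |
|---|---|---|---|---|---|---|---|---|
| Quill | 30 | 70 | 80 | 90 | 120 | 160 | 387276 | 387911 |
| fmtlog | 29 | 69 | 79 | 109 | 129 | 179 | 190420 | 192421 |
| binlog | 40 | 50 | 70 | 110 | 120 | 170 | 137961 | 139571 |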

Conclusions

In theory, single-threaded Quill should perform about the same as binlog, but in practice it is still faster. With multiple threads, Quill at 4 threads is slightly worse than the new version of binlog, and at 8 threads the two are about the same.

For single-threaded logging of simple objects, performance ranks fmtlog ≈ Quill > binlog. For a complex struct, single-threaded performance ranks binlog > fmtlog > Quill, but binlog performs worse at the 99.999th and 100th percentiles.

Comparing the single-thread and multi-thread results, latency is generally higher in the multi-threaded runs. For multi-threaded logging of the Foo struct, performance ranks binlog > fmtlog > Quill, but for multi-threaded logging of a simple variable it ranks fmtlog > binlog > Quill.

This is probably because binlog has an advantage on the Foo struct. In more complex scenarios, where a user may need to log a struct, a string, or a double, it is also important to look at performance on a single simple object.

Implementations

Quill: Quill-arch

binlog: binlog-arch
