Today I announce the world's fastest C preprocessor - meep (*)!

(*) meep is a throw-away name; I'm just using it in this post for the tool.

Headline numbers:

I'll describe later how I profiled.
The C preprocessor is in some respects quite straightforward. But as the result of many years of fixes, additions, and improvements, there are many subtle details, often not documented very well. I thank the developers of gcc for their documentation. The C Preprocessor Iceberg Meme gives a good overview of the rabbit hole.
The development has been somewhat humbling. I've been developing in C and C++ for the past 30 years or so, and I thought I knew most things about how the C preprocessor works. Around 25 years ago I even developed a C preprocessor for building websites (!), but it was significantly slower than existing tools and was far from compliant, or even arguably correct. I've been through several different algorithms and rewrites with meep to get to this point.
Some of the subtle details meep handles:

- `_Pragma` and `#pragma once`
- `__VA_ARGS__`, `__VA_OPT__`, `__COUNTER__`, `__FILE__`
- `#line`
- `#include`, `#line` etc
- Caching of `TokenStream`s as well as `PreprocStream`s
- A `TokenStream` holds a tokenization of a source file
- A `PreprocStream` holds the preprocessor directive "program"

Overview of how testing was performed:
- I don't know what `cpp` or `clang` do in this regard, but they seem single threaded
- Timings were taken with bash `time`, `\time`, and `perf`
- Output was directed to `/dev/null`
- … `#include`s to speed up compilation
- meep is developed in C++ and compiled via `clang` for these tests

I would note that the results in this article are initial results. I have not profiled or optimized the code base around the meep pre-processor in any way yet. I did create the algorithms to be performant though, and that appears to have paid off. There are probably plenty of additional performance gains to be had.
In order to test performance I made part of my engine codebase able to build without including any system headers. This was important to ensure that all tests pre-process exactly the same files regardless of platform or toolchain.
I'm currently developing on Linux (although meep compiles and works on Windows), and the pre-processors I tested against are `clang` and gcc's `cpp`. Testing with `time` gives some idea of the performance difference, but a single file is processed so fast that the numbers produced are all over the map. On average, just running `time` and looking at user time, meep takes about half the time. Wall-clock time is similar across the tools, but this is almost certainly because it is dominated by reading in the original source files. Using `perf` produces similar results.
I decided next to try a multiple-file test. This was largely to make the test slower to complete and therefore easier to measure. Here's where fairness becomes an issue. My implementation is designed to cache, tokenize, and precompile input source once. I don't know how to do this with `cpp` or `clang`, or if that's possible at all.
When I did a multi-file test I did it in meep via a single executable run. For the other tools I produced a shell script that invokes `clang` or `cpp` multiple times. This is somewhat unfair, because my solution can read, tokenize, and precompile the files only once across all compilations, and there is also some overhead in process setup and tear-down.
With those caveats, tests indicate meep is around 4 times as fast as `cpp` or `clang` in this scenario.
I would have liked to compare against the performance of warp, but I couldn't find any binaries. I did install the D compiler and attempted to compile warp; unfortunately this produced numerous errors that I tried to fix. Not being a dlang expert, as more and more errors appeared I dropped the effort. On reading the warp blog post, it seems as if it has some similarities to my approach around caching. The project appears to no longer receive updates, so it seems effectively shelved.
On doing some more profiling via `perf`, it's perhaps interesting to note that nearly 2/3 of all execution time in meep is spent outputting text. This is perhaps not super surprising, because the mechanism for formatted token output is quite complicated. It uses the `TokenStream`s from the source files, and then looks at how they line up with the output tokens. This relies on the assumption that the text between consecutive tokens is either nothing, or comments. If it's nothing, then we know there won't be an issue outputting the tokens directly one after another. If it's comments, we (typically) don't want to output the comments themselves but just their "structure" - meaning lines and carriage returns.