The World's Fastest C Preprocessor

Posted on: 2024-05-02

Today I announce the world's fastest C preprocessor - meep (*)!

Headline numbers:

I'll describe later in this article how I profiled.

The C preprocessor is in some respects quite straightforward, but as the result of many years of fixes, additions, and improvements it has many subtle details, often not documented very well. I thank the developers of gcc for their documentation. The C Preprocessor Iceberg Meme gives a good overview of the rabbit hole.
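To give a taste of those subtleties, here is one classic example (my own illustration, not code from meep): a self-referential macro is expanded exactly once, and the occurrence of its own name inside the replacement is left alone ("painted blue") rather than recursing forever.

```c
#include <assert.h>
#include <string.h>

int foo = 4;           /* an ordinary variable named foo */
#define foo (1 + foo)  /* self-referential macro: expanded only once */

/* Any use of foo from here on expands to (1 + foo); the inner foo is
   NOT re-expanded, so it refers to the variable above, and the whole
   expression evaluates to 1 + 4 = 5. */
```

A naive preprocessor implementation that simply re-scans its own output would loop forever on this.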

The development has been somewhat humbling. I've been developing in C and C++ for the past 30 years or so. I thought I knew most things about how the C preprocessor works. Around 25 years ago I even developed a C preprocessor for building websites(!), but it was significantly slower than existing tools and was far from compliant or even arguably correct. I've been through several different algorithms and rewrites with meep to get to this point.


Novel Features


Overview of how testing was performed:

I would note that the results in this article are initial results. I have not yet profiled or optimized the code base around the meep preprocessor in any way. I did design the algorithms to be performant, though, and that appears to have paid off. There are probably plenty of additional performance gains still to be had.

In order to test performance I made part of my engine codebase buildable without including any system headers. This was important to ensure that all tests preprocess exactly the same files regardless of platform or toolchain.

I'm currently developing on Linux (although meep compiles and works on Windows), and the preprocessors I tested against are clang and gcc's cpp. Timing with `time` gives some idea of the performance difference, but a single file is processed so quickly that the numbers are all over the map. On average, looking at the user time reported by `time`, meep takes about half the time of the other tools. Wall-clock time is similar, and the remainder is almost certainly due to reading in the original source files. Profiling with `perf` produces similar results.
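A single-file run of this kind looks roughly like the following. Everything here is illustrative: `demo.c` is a stand-in for one of the engine sources, and meep's command line is my guess at an equivalent invocation, shown commented out.

```shell
# Hypothetical single-file timing run (demo.c is a stand-in source).
printf '#define SQ(x) ((x) * (x))\nint n = SQ(4);\n' > demo.c

time cpp demo.c > demo.i        # gcc's standalone preprocessor
# time meep demo.c > demo.i     # the equivalent meep invocation
# perf stat cpp demo.c          # perf gives a per-event breakdown

grep -F 'int n' demo.i          # the macro-expanded declaration
```

Comparing the `user` figures from repeated runs like this is where the "about half the time" observation comes from.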

Next I decided to try a multiple-file test, largely to make the test run longer and therefore be easier to measure. Here's where fairness becomes an issue: my implementation is designed to cache, tokenize, and precompile input source once. I don't know how to do this with cpp or clang, or whether it's possible at all.

When I ran the multi-file test I did it in meep via a single executable run. For the other tools I produced a shell script that invokes clang or cpp once per file. This is somewhat unfair, because my solution can read, tokenize, and precompile the files only once across all compilations, and there is also some overhead in per-file process setup and teardown.
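The per-file harness can be sketched like this (a hypothetical reconstruction: the file names are stand-ins for the engine sources, and meep's single-invocation command line is illustrative):

```shell
# Hypothetical multi-file harness: drive cpp with one process per file,
# the way cpp and clang were invoked in the test.
mkdir -p multi out
for i in 1 2 3; do
    printf '#define ID %d\nint id_%d = ID;\n' "$i" "$i" > "multi/f$i.c"
done

time for f in multi/*.c; do
    cpp "$f" > "out/$(basename "$f" .c).i"    # one process per file
done

# meep, by contrast, reads/tokenizes/precompiles shared input once
# across every file, in a single process (illustrative):
# time meep multi/*.c
```

The loop pays process startup and teardown for every file, which is exactly the unfairness noted above.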

With those caveats, tests indicate meep is around 4 times as fast as cpp or clang in this scenario.

I would have liked to compare against the performance of warp, but I couldn't find any binaries. I did install the D compiler and attempt to compile warp. Unfortunately this produced numerous errors that I tried to fix; not being a dlang expert, as more and more errors appeared I dropped the effort. Reading the warp blog post, it seems to have some similarities to my approach around caching. The project appears to no longer receive updates, so it is effectively shelved.

Update 1

On doing some more profiling via perf, it's perhaps interesting to note that nearly 2/3 of all execution time in meep is spent outputting text. This is perhaps not super surprising, because the mechanism for formatted token output is quite complicated. It uses the TokenStreams from the source files, and then looks at how they line up with output tokens. This relies on the assumption that the text between consecutive tokens is either nothing or comments/whitespace.

If it's nothing, then we know there won't be an issue outputting them directly one after another. If it's comments, we (typically) don't want to output the comments themselves, just the "structure" - meaning the line breaks.
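A sketch of the idea (heavily simplified; the names and structure here are mine, not meep's actual code): each token remembers where its text sits in the original source, so the gap between two consecutive tokens can be classified and reduced to just its line structure.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    const char *start;  /* first character of the token in the source */
    const char *end;    /* one past the last character */
} Token;

/* Append tok's text to out, preceded by whatever the gap between the
   previous token and tok requires: newlines in the gap are preserved
   (so line structure survives), comment/space text collapses to a
   single space, and an empty gap emits the tokens back to back.
   Returns the new write position. */
static char *emit_token(char *out, const char *gap_begin, const Token *tok)
{
    int kept_newline = 0;
    for (const char *p = gap_begin; p < tok->start; p++) {
        if (*p == '\n') {
            *out++ = '\n';          /* keep the line structure */
            kept_newline = 1;
        }
    }
    if (!kept_newline && gap_begin != tok->start)
        *out++ = ' ';               /* comments/spaces: keep tokens apart */
    memcpy(out, tok->start, (size_t)(tok->end - tok->start));
    return out + (tok->end - tok->start);
}
```

So `x /*note*/ + y` comes out as `x + y`, while a comment spanning a newline still contributes that newline, keeping line numbers in the output aligned with the input.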

Future Work