Making checkrs 17x faster
checkrs is my personal Rust linter. It is
powered by ast-grep, which means I write lint
rules as tree-sitter queries in Python and ast-grep handles the parsing.
It has been working well but I noticed it was getting slow. I ran it against a project with 75 Rust files and it took 5.4 seconds. That felt wrong. Parsing 75 files and running some pattern matching should take milliseconds, not seconds. I decided to investigate.
Profiling
I started by running cProfile to see where the time was going.
609593 function calls (481349 primitive calls) in 1.560 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
4739 0.029 0.000 3.069 0.001 checkrs/lints/lint.py:27(make_config)
4739 0.017 0.000 3.001 0.002 .../ast.py:50(literal_eval)
123454 0.109 0.000 2.975 0.001 .../ast.py:80(_convert)Wait, why is make_config taking 3 seconds? And why is ast.literal_eval being
called 4739 times? That number seemed suspicious. I have 64 lint rules and 75
files, and 64 times 75 is roughly 4800. So make_config was being called once
per lint per file.
Problem 1: Every lint parsed every file
I looked at how a typical lint was implemented.
def check(self, file_path: Path, source: str) -> list[Violation]:
root = ast_grep_py.SgRoot(source, "rust")
node = root.root()
matches = list(node.find_all(config))
...Every single lint was creating its own SgRoot. That means 75 files times 64
lints equals 4800 redundant parses. The AST was being rebuilt from scratch for
every lint. Tree-sitter parsing is fast, but not that fast.
The fix was obvious: parse once and reuse the root node.
# In the runner, parse once per file
root = ast_grep_py.SgRoot(source, "rust")
node = root.root()
# Pass the node to every lint
for lint in get_all_lints():
violations = lint.check(file_path, node)I updated the Lint.check signature across all 64 lint files.
# Before
def check(self, file_path: Path, source: str) -> list[Violation]:
root = ast_grep_py.SgRoot(source, "rust")
node = root.root()
...
# After
def check(self, file_path: Path, node: ast_grep_py.SgNode) -> list[Violation]:
matches = list(node.find_all(config))
...This alone dropped the runtime from 5.4 seconds to 1.5 seconds.
Problem 2: make_config was rebuilding configs from strings
The profile still showed make_config at the top. I looked at the
implementation.
def make_config(**kwargs: object) -> ast_grep_py.Config:
d = ast.literal_eval(repr(kwargs))
return ast_grep_py.Config(**d)This function was turning a dict into a string with repr, then parsing it back
into a dict with ast.literal_eval. I have no idea why it was written this way.
But it meant every call to make_config was serializing and deserializing the
entire config dict.
The configs are mostly static. The same lint uses the same config for every
file. So I replaced this with json.dumps and an lru_cache.
@lru_cache(maxsize=256)
def _cached_config(key: str) -> ast_grep_py.Config:
return json.loads(key)
def make_config(**kwargs: object) -> ast_grep_py.Config:
return _cached_config(json.dumps(kwargs, sort_keys=True))Now each unique config is built exactly once. The make_config overhead dropped
from about 3 seconds to basically nothing.
After this change the runtime was down to about 1.1 seconds.
Problem 3: ThreadPoolExecutor was making it slower
The code was using ThreadPoolExecutor with 20 threads to lint files in
parallel. I ran a quick benchmark with different worker counts.
1 worker: 1.108s
2 workers: 1.163s
4 workers: 1.187s
8 workers: 1.199s
20 workers: 1.210sMore threads made it slower. This is the GIL. Python threads cannot run in parallel when the work is CPU-bound, and ast-grep’s tree-sitter queries are pure CPU work. The threads were just adding context-switching overhead.
I removed the thread pool entirely and went single-threaded. That brought the time down to about 1.0 seconds. But I wanted sub-second.
Problem 4: No real parallelism
The GIL prevents threads from parallelizing CPU work, but processes can. On
Linux I can use multiprocessing.Pool with the fork context. This creates new
processes by forking the current one, which is fast because it copies the memory
pages without actually duplicating the data until it is written.
if sys.platform == "linux":
ctx = get_context("fork")
max_workers = min(8, (os.cpu_count() or 1))
with ctx.Pool(processes=max_workers) as pool:
results = pool.map(_check_file, files, chunksize=...)
else:
results = [_check_file(f) for f in files]I had to verify that fork multiprocessing did not break anything. Since the
lints are stateless and _check_file only reads files, it was safe. I compared
the single-threaded output against the multiprocessing output for every file and
all 143 tests passed.
Results
The final numbers:
$ time checkrs run external/pyk/raptor
real 0m0.319s
user 0m1.259s
sys 0m0.081sFrom 5.4 seconds down to 0.32 seconds. That is a 17x speedup.
What I learned
This optimization taught me a few things I will keep in mind.
First, caching matters even for “cheap” operations. Building a dict 4800 times by serializing and deserializing it is not cheap. Caching is not just for database queries or network calls.
Second, the GIL is real. I knew this in theory but it was interesting to see it so clearly in the benchmark. More threads literally made the code slower. Processes are the right tool for CPU-bound Python work, at least on Linux where fork is available.
Third, redundant work is the easiest win. Parsing every file 64 times was the biggest single improvement. It is easy to miss when each individual lint looks reasonable on its own. The cost only becomes obvious when you look at the system as a whole.
Finally, profile before optimizing. I could have guessed that threading was the problem, but the profile showed me that redundant parsing and config building were the real culprits. Without the profile I would have spent time on the wrong things.
The linter is fast enough now that I do not notice it at all. Which is exactly how a linter should feel.
Disclaimer: I used Kimi K2.6 to help with this optimization task, all results verified by me.
| Tags | rust , python , performance , ast-grep |
|---|