Preserve cached mutation results on rerun of unchanged source #471

percy-raskova wants to merge 2 commits into boxed:main
Conversation
create_mutants_for_file() correctly detected unchanged source files (source_mtime < mutant_mtime) and skipped regenerating mutations, but also reset all exit_code_by_key values to None. This caused every subsequent run to re-test all mutants from scratch, defeating the incremental caching introduced in bf3a2e8. Remove the reset loop so cached results survive across runs. The skip logic at line 1273 (`if result is not None: continue`) can now actually skip already-tested mutants. Also fix pre-existing ruff lint errors (unused imports, unused variable, f-strings without placeholders).
Pull request overview
Fixes incremental mutation-test caching so reruns on unchanged source keep previously recorded mutant exit codes, preventing unnecessary re-testing and restoring the intended behavior of the cache.
Changes:
- Stop resetting `exit_code_by_key` to `None` when the source file is unchanged (mtime comparison), so cached results persist.
- Add an e2e regression test that injects sentinel exit codes into `.meta` files and verifies they survive a second run.
- Clean up a few ruff issues (unused imports/vars, f-strings without placeholders).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/mutmut/__main__.py` | Preserves cached `.meta` results when skipping mutant regeneration for unchanged files; includes minor lint cleanups. |
| `tests/e2e/test_e2e_incremental_cache.py` | Adds an e2e test verifying cached exit codes are preserved across reruns on unchanged source. |
Hi @percy-raskova. This was an explicit decision, as stated in the #470 PR:

The purpose of that PR was only to reduce the time spent generating mutants, not (yet) to reduce the time testing them. I do think such functionality could be useful, but I would tend towards making it opt-in, as it can "cache" mutant results that would change in a second run. I haven't thought a lot about it yet; open to discussions. Also somewhat related to #418.

We could also try to make it sound, by doing something like testmon for each mutant (e.g. collecting coverage for each test separately; then checking which tests are affected by file changes; then only rerunning mutants that are touched in those tests). This would need some thinking about how sound it actually is and how we'd like to implement it. It seems like some interesting, but non-trivial, work.
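The testmon-style idea above can be sketched as two set intersections: record which files each test covers, find the tests affected by changed files, then keep only the mutants those tests exercise. All names and data shapes here are illustrative, not mutmut's actual API:

```python
def affected_tests(coverage_by_test: dict[str, set[str]],
                   changed_files: set[str]) -> set[str]:
    """Tests whose covered files intersect the changed-file set."""
    return {
        test for test, files in coverage_by_test.items()
        if files & changed_files
    }


def mutants_to_retest(mutant_tests: dict[str, set[str]],
                      stale_tests: set[str]) -> set[str]:
    """Mutants exercised by at least one affected test."""
    return {
        mutant for mutant, tests in mutant_tests.items()
        if tests & stale_tests
    }
```

Whether this is sound depends on the per-test coverage being accurate and complete, which is exactly the open question raised in the comment.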
Thank you for the explanation! I submitted this because on a project I am working on, I have a lot of tests and noticed …
@percy-raskova maybe you closed this a bit prematurely. There could be a third option here: we could mark these mutants as potentially stale and put them last in the queue. The odds of already-tested mutants being killed again are probably high, but strictly speaking they should be retested, as explained above. What do you think about that?
Oh yes, I like this idea and would like to be involved in this process! This is my first attempt at a PR to an open source project, so I am a bit new to the usual practices and customs 😅
No problem at all. This project has a lot of history too, and it's understood that new contributors won't have that background knowledge. If you want to work on an idea like this, it would be very welcome. I think it makes a lot of sense.

I've been thinking also about having priority for different types of mutants, which would fit nicely into this idea. For example, a deletion of an entire line is obviously a more severe mutant than changing …

Even though I'm the original author of mutmut, I seem to be more of a vision/idea man at this point, with Otto-AA doing the actual work on the system, and he has a much better living understanding of the tradeoffs and implementation than me now :P
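The severity-priority idea above could be a simple ranking over mutant kinds, with more drastic mutants tested first. The categories and weights here are made up for the sketch; mutmut's real mutant types differ:

```python
# Lower weight = more severe = tested earlier (hypothetical categories).
SEVERITY = {
    "delete_line": 0,
    "swap_operator": 1,
    "tweak_string": 2,
}


def by_priority(mutants: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Sort (name, kind) pairs so severe mutant kinds come first;
    unknown kinds sort last."""
    return sorted(mutants, key=lambda m: SEVERITY.get(m[1], 99))
```

A scheme like this combines naturally with the stale-last queue by sorting on a (is_cached, severity) tuple.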
Hi there, just catching up on this thread, but I've actually been working on some very similar optimizations as in this PR. We're at a spot internally where we are ready to start merging to the upstream (the first PR can be found here, #473, with some housekeeping items).

We're seeing something in the neighbourhood of 95-99% runtime improvements with our caching strategy, so I just wanted to reach out in this thread to see if there was any interest in holding off on this PR a bit while I start to open-source our work and open the PRs for us all to discuss.
No problem! |
Hey @percy-raskova, I did not intend to say that your work is not useful; sorry if it came across this way. My main point was that I think it should be opt-in (as this type of incremental testing can skip relevant mutants) and that I frankly haven't thought through this problem yet and don't know what's a reasonable approach. I think now it makes sense to wait and see how @nicklafleur approached it, and then decide which way we want to go.

It's also interesting to compare to other tools for inspiration (e.g. https://stryker-mutator.io/docs/stryker-js/incremental/ and https://infection.github.io/guide/command-line-options.html#git-diff-filter).

If you want to work on something else in the meantime, feel free to take a look at the issue tracker and ask for pointers or explanations on how the code base works. For instance, #414 would be a rather small change (renaming …). If you have any questions, just ask :) (but it can take some days until I reply)

I have only used mutmut on side projects and there's not much user feedback, so my UX tradeoffs are probably not the best 🙃 I also feel like string mutations got a bit much; we probably want to enable/disable some of them via settings and per default. But I enjoy thinking about the problems and how things could be improved, and want to get it working on some production code bases at some point.
@Otto-AA no problem! If other people have been working on this problem for longer and have a more robust and performant solution, I'd rather defer to that. My PR was just "Hey, I fixed this for myself on my own project, why not submit an upstream PR?" 😛 I'll take a look at some of the other issues you brought up and try one out.
I've pushed my current work to https://github.com/lyft/mutmut/tree/nicklafleur/oss-complete; the commit with this solution specifically is lyft@92a5d62, though I will be rebasing on top of some review changes, so if/when that happens you can look for the …

I did end up folding in the mtime solution, as that is a nice local optimization, but for our uses (running this across large teams/large codebases) we needed something that could share the gains across many machines.

Also worth noting: while the totality of those changes is a pretty significant rewrite of the application, the current behaviour is still the default (I'm pretty sure, at least), and it's all very negotiable, so I'm happy to refactor and make changes based on feedback.
Summary
`create_mutants_for_file()` correctly detects unchanged source files (`source_mtime < mutant_mtime`) and skips regenerating mutations, but also resets all `exit_code_by_key` values to `None`. This causes every subsequent `mutmut run` to re-test all mutants from scratch, defeating the incremental caching introduced in bf3a2e8.

Root cause: Lines 296-301 in `__main__.py` load the `.meta` file and set every exit code to `None` before returning `unmodified=True`. The skip logic at line 1273 (`if result is not None: continue`) then sees all-`None` results and re-tests everything.

Fix: Remove the reset loop so cached results survive across runs.
Changes
- Remove the reset loop in the `source_mtime < mutant_mtime` branch of `create_mutants_for_file()`
- Add an e2e test that injects sentinel exit codes into `.meta` files and verifies they survive a second run on unchanged source
- Fix pre-existing ruff lint errors (unused imports `Iterable`/`libcst.matchers`, unused variable `tests_dir`, f-strings without placeholders)

Test plan

- `test_rerun_preserves_cached_results` — RED without fix, GREEN with fix
- `ruff check` clean on changed files