Skip to content

Preserve cached mutation results on rerun of unchanged source#471

Open
percy-raskova wants to merge 2 commits intoboxed:mainfrom
percy-raskova:fix/preserve-cached-mutation-results
Open

Preserve cached mutation results on rerun of unchanged source#471
percy-raskova wants to merge 2 commits intoboxed:mainfrom
percy-raskova:fix/preserve-cached-mutation-results

Conversation

@percy-raskova
Copy link

Summary

create_mutants_for_file() correctly detects unchanged source files (source_mtime < mutant_mtime) and skips regenerating mutations, but also resets all exit_code_by_key values to None. This causes every subsequent mutmut run to re-test all mutants from scratch, defeating the incremental caching introduced in bf3a2e8.

Root cause: Lines 296-301 in __main__.py load the .meta file and set every exit code to None before returning unmodified=True. The skip logic at line 1273 (if result is not None: continue) then sees all-None results and re-tests everything.

Fix: Remove the reset loop so cached results survive across runs.

Changes

  • Bug fix: Remove the exit code reset in the source_mtime < mutant_mtime branch of create_mutants_for_file()
  • Test: Add e2e test that injects sentinel exit codes into .meta files and verifies they survive a second run on unchanged source
  • Lint: Fix pre-existing ruff errors (unused imports Iterable/libcst.matchers, unused variable tests_dir, f-strings without placeholders)

Test plan

  • New e2e test test_rerun_preserves_cached_results — RED without fix, GREEN with fix
  • All 179 existing tests pass
  • All 6 e2e tests pass (including existing snapshot tests)
  • ruff check clean on changed files

create_mutants_for_file() correctly detected unchanged source files
(source_mtime < mutant_mtime) and skipped regenerating mutations,
but also reset all exit_code_by_key values to None. This caused
every subsequent run to re-test all mutants from scratch, defeating
the incremental caching introduced in bf3a2e8.

Remove the reset loop so cached results survive across runs. The
skip logic at line 1273 (`if result is not None: continue`) can
now actually skip already-tested mutants.

Also fix pre-existing ruff lint errors (unused imports, unused
variable, f-strings without placeholders).
Copilot AI review requested due to automatic review settings February 25, 2026 00:39
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes incremental mutation-test caching so reruns on unchanged source keep previously recorded mutant exit codes, preventing unnecessary re-testing and restoring the intended behavior of the cache.

Changes:

  • Stop resetting exit_code_by_key to None when the source file is unchanged (mtime comparison) so cached results persist.
  • Add an e2e regression test that injects sentinel exit codes into .meta files and verifies they survive a second run.
  • Clean up a few ruff issues (unused imports/vars, f-strings without placeholders).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/mutmut/__main__.py Preserves cached .meta results when skipping mutant regeneration for unchanged files; includes minor lint cleanups.
tests/e2e/test_e2e_incremental_cache.py Adds an e2e test verifying cached exit codes are preserved across reruns on unchanged source.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@Otto-AA
Copy link
Collaborator

Otto-AA commented Feb 25, 2026

Hi, @percy-raskova . This was an explicit decision as stated in the #470 PR:

When running mutmut run a second time, it will now only mutate modified source files. It will run all mutants nonetheless, because a change in some source file A can influence if a mutant of source file B would survive.

The purpose of this PR was only to reduce the time spent in generating mutants, not (yet) to reduce the time testing these mutants.

I do think such a functionality could be useful, but I would tend towards making it opt-in as it can "cache" mutant results that would change in a second run. I haven't thought a lot about it yet, open to discussions. Also somehow related to #418

We could also try to make it sound, by doing something like testmon for each mutant (e.g. collecting coverage for each test separately; then check which tests are affected by file changes; then only rerun mutants that are touched in those tests). Would need some thinking how sound this actually is, and how we'd like to implement this. This seems like some interesting, but non-trivial work..

@percy-raskova
Copy link
Author

Thank you for the explanation! I submitted this because on a project I am working on, I have a lot of tests and noticed mutmut was mutating all of the source files every time, which was inconvenient for my specific use case. I wasn't aware of the other PRs so my thought process to try submitting an upstream PR in case others had the same issue. But what you said makes sense as well. Thanks for taking the time to spell all that out!

@boxed
Copy link
Owner

boxed commented Feb 25, 2026

@percy-raskova maybe you closed this a bit prematurely. There could be a third option here:

We could mark these potentially stale mutants as potentially stale and put them last in the queue. The odds of already tested mutants being killed is probably high, but strictly speaking they should be retested as explained above.

What do you think about that?

@percy-raskova percy-raskova reopened this Feb 25, 2026
@percy-raskova
Copy link
Author

percy-raskova commented Feb 25, 2026

Oh yes I like this idea and would like to be involved in this process! This is my first attempt at a PR to an open source project, so I am a bit new to the usual practices and customs 😅

@boxed
Copy link
Owner

boxed commented Feb 25, 2026

No problem at all. This project has a lot of history too, and it's understood that new contributors won't have that background knowledge.

If you want to work on an idea like this it would be very welcome. I think it makes a lot of sense. I've been thinking also about having priority for different types of mutants, which would fit nicely into this idea. For example, a deletion of an entire line is obviously a more severe mutant than changing < to <= so logically all those mutants should be run first.

Even though I'm the original author of mutmut I seem to be more of an vision/idea man at this point, with Otto-AA doing the actual work on the system and he has a much better living understanding of the tradeoffs and implementation than me now :P

@nicklafleur
Copy link
Contributor

nicklafleur commented Feb 27, 2026

Hi there, just catching up on this thread, but I've actually been working on some very similar optimizations as in this PR. We're at a spot internally where we are ready to start merging to the upstream (first PR can be found here #473 with some housekeeping items). We're seeing something in the neighbourhood of 95-99% runtime improvements with our caching strategy so just wanted to reach out in this thread to see if there was any interest in holding off on this PR a bit while I start to open source our work and open the PRs for us all to discuss.

@percy-raskova
Copy link
Author

No problem!

@Otto-AA
Copy link
Collaborator

Otto-AA commented Feb 28, 2026

Hey @percy-raskova , I did not intend to say that your wokr is not useful, sorry if it came across this way. My main point was that I think it should be opt-in (as this type of incremental testing can skip relevant mutants) and that I frankly haven't thought through this problem yet and don't know what's a reasonable approach. I think now it makes sense to wait for how @nicklafleur approached it and decide which way we want to go. It's also interesting to compare to other tools for inspiration (e.g. https://stryker-mutator.io/docs/stryker-js/incremental/ and https://infection.github.io/guide/command-line-options.html#git-diff-filter).

If you want to work on something else in the meantime, feel free to take a look at the issue tracker and ask for pointers or explanations on how the code base works. For instance, #414 would be a rather small change (renaming paths_to_mutate because the name can be confusing), or #47 if you want to go deeper into how we use libCST to parse code. Or as @boxed explained, you could also take a look at shadowing in #367.

If you have any questions, just ask :) (but it can take some days until I reply)

@Otto-AA
Copy link
Collaborator

Otto-AA commented Feb 28, 2026

Even though I'm the original author of mutmut I seem to be more of an vision/idea man at this point, with Otto-AA doing the actual work on the system and he has a much better living understanding of the tradeoffs and implementation than me now :P

I have only used mutmut on side projects and there's not much user feedback, so my UX tradeoffs are probably not the best 🙃 I also feel like string mutations got a bit much, we probably want to enable/disable some of them via settings and per default. But I enjoy thinking about the problems and how things could be improved, and want to get it working on some production code bases at some point.

@percy-raskova
Copy link
Author

@Otto-AA no problem! if other people have been working on this problem for longer and have a more robust and performant solution id rather defer to that. my PR was just "Hey I fixed this for myself on my own project, why not submit an upstream PR?" 😛 Ill take a look at some of the other issues you brought up and try one out

@nicklafleur
Copy link
Contributor

nicklafleur commented Mar 1, 2026

I've pushed my current work to https://github.com/lyft/mutmut/tree/nicklafleur/oss-complete, the commit with this solution specifically is lyft@92a5d62, though I will be rebasing on top of some review changes so if/when that happens you can look for the feat: add function hashing for incremental mutation testing commit (tl;dr: it hashes the AST to get a syntactically-stable representation of the source and uses that for change detection). The one after also handles a better cache invalidation strategy so looking at them together isn't a bad idea (they will end up in a pr together anyways).

I did end up folding in the mtime solution as that is a nice local optimization, but for our uses (running this across large teams/large codebases) we needed something that could share the gains across many machines.

Also worth noting, while the totality of those changes are a pretty significant rewrite of the application, the current behaviour is still the default (I'm pretty sure at least), and it's all very negotiable so I'm happy to refactor and make changes based on feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants