@Fidget-Spinner
Member

@Fidget-Spinner Fidget-Spinner commented Jan 9, 2026

I verified on my system that this restores the old performance.

@Fidget-Spinner Fidget-Spinner changed the title gh-143536: Lazily allocate tracer code and opt buffers gh-143421: Lazily allocate tracer code and opt buffers Jan 9, 2026
@cocolato
Contributor

cocolato commented Jan 9, 2026

Should we remove the newly added pycore_optimizer_types.h?

@Fidget-Spinner
Member Author

> Should we remove the newly added pycore_optimizer_types.h?

Let's keep it. It's a good refactor.

@Fidget-Spinner Fidget-Spinner merged commit b852236 into python:main Jan 9, 2026
62 checks passed
@Fidget-Spinner Fidget-Spinner deleted the lazily-allocate branch January 9, 2026 16:56
- // Holds locals, stack, locals, stack ... co_consts (in that order)
- #define MAX_ABSTRACT_INTERP_SIZE 4096
+ // Holds locals, stack, locals, stack ... (in that order)
+ #define MAX_ABSTRACT_INTERP_SIZE 512
Member

Why?

Member Author

I picked this number arbitrarily when I first wrote this, and I now realise it's overkill. We don't need 3 pages of memory just to store locals and stack; it's slow and wastes memory.

@markshannon
Member

This makes no sense to me. How is allocating the necessary memory piecemeal faster than allocating it in a single chunk?

@Fidget-Spinner
Member Author

@markshannon No, the problem was embedding it as part of _PyThreadStateImpl. That caused a 100% slowdown in thread spawning time on the benchmark, which was reflected in bench_thread_pool.

@Fidget-Spinner
Member Author

The benchmark doesn't run any work that uses the JIT; instead, it just measures thread startup overhead: https://github.com/python/pyperformance/blob/main/pyperformance/data-files/benchmarks/bm_concurrent_imap/run_benchmark.py#L19

Embedding the two structs together in the _PyThreadStateImpl causes a massive slowdown.
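
To illustrate the point above, here is a minimal sketch of the difference between embedding a large buffer in a per-thread struct and allocating it lazily. It is not the CPython code: every name (`sketch_tstate_embedded`, `sketch_tstate_lazy`, `sketch_get_tracer`) and the buffer size are hypothetical, chosen only to show why a thread that never uses the JIT pays nothing in the lazy variant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer size, chosen only for illustration. */
#define SKETCH_BUFFER_ENTRIES 4096

typedef struct {
    void *entries[SKETCH_BUFFER_ENTRIES];
} sketch_tracer_buffer;

/* Variant A: the buffer is embedded, so every thread state carries it,
 * whether or not the thread ever runs JIT-related work. */
typedef struct {
    int thread_id;
    sketch_tracer_buffer tracer;
} sketch_tstate_embedded;

/* Variant B: the thread state only holds a pointer; the buffer is
 * allocated the first time it is requested. */
typedef struct {
    int thread_id;
    sketch_tracer_buffer *tracer;  /* NULL until first use */
} sketch_tstate_lazy;

static void sketch_tstate_lazy_init(sketch_tstate_lazy *ts, int id)
{
    assert(ts != NULL);
    ts->thread_id = id;
    ts->tracer = NULL;  /* cheap: no buffer allocated at thread startup */
}

/* Returns the buffer, allocating it on first use. Returns NULL if the
 * allocation fails. */
static sketch_tracer_buffer *sketch_get_tracer(sketch_tstate_lazy *ts)
{
    assert(ts != NULL);
    if (ts->tracer == NULL) {
        ts->tracer = calloc(1, sizeof(sketch_tracer_buffer));
    }
    return ts->tracer;
}

static void sketch_tstate_lazy_fini(sketch_tstate_lazy *ts)
{
    assert(ts != NULL);
    free(ts->tracer);  /* free(NULL) is a no-op */
    ts->tracer = NULL;
}
```

In this sketch, a thread that is created and destroyed without calling `sketch_get_tracer` never touches the large buffer, which is the situation a thread-startup benchmark exercises.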

@markshannon
Member

It shouldn't be embedded; it should be heap-allocated as one chunk:
#143536 (comment)
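
A rough sketch of what "heap-allocated as one" could look like, assuming the two buffers are the tracer code buffer and the optimizer buffer named in the PR title. The struct layout, sizes, and function names below are hypothetical and are not taken from the CPython source; the only point is that both buffers live in a single allocation reached through one pointer in the thread state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sizes, for illustration only. */
#define SKETCH_CODE_ENTRIES 1024
#define SKETCH_OPT_ENTRIES 512

/* Both buffers share one struct, so one malloc() covers them. */
typedef struct {
    uint64_t code[SKETCH_CODE_ENTRIES];  /* stand-in for the tracer code buffer */
    uint64_t opt[SKETCH_OPT_ENTRIES];    /* stand-in for the optimizer buffer */
} sketch_jit_buffers;

/* The thread state holds only a pointer, so it stays small. */
typedef struct {
    int thread_id;
    sketch_jit_buffers *jit;  /* NULL until the buffers are needed */
} sketch_tstate;

/* Ensures the combined buffer exists. Returns 0 on success and -1 if
 * the allocation fails. Calling it again is a no-op. */
static int sketch_ensure_jit_buffers(sketch_tstate *ts)
{
    assert(ts != NULL);
    if (ts->jit != NULL) {
        return 0;
    }
    ts->jit = malloc(sizeof(sketch_jit_buffers));
    return ts->jit == NULL ? -1 : 0;
}

static void sketch_free_jit_buffers(sketch_tstate *ts)
{
    assert(ts != NULL);
    free(ts->jit);  /* one free() releases both buffers */
    ts->jit = NULL;
}
```

Compared with allocating each buffer separately, this variant needs a single allocation and a single failure check, while still keeping the large arrays out of the thread state itself.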

@Fidget-Spinner
Member Author

> It shouldn't be embedded, it should be heap allocated as one #143536 (comment)

OK, let me just put up a PR to do that.
