Conversation

@d-w-moore
Collaborator

@d-w-moore d-w-moore commented May 19, 2025

Wherein we close down threads in an orderly way, so that objects are not left to be disposed of in the wrong order for the ever-persnickety SSL shutdown logic.

Experiments show that SIGTERM actually does induce the Python interpreter to shut down non-daemonic threads, so installing a signal handler for that may not be necessary in the end.

@d-w-moore d-w-moore changed the title from "[_722] fix segfault and hung threads on SIGINT during parallel get" to "[#722] fix segfault and hung threads on KeyboardInterrupt during parallel get" May 19, 2025
@d-w-moore d-w-moore self-assigned this May 19, 2025
@d-w-moore d-w-moore marked this pull request as draft May 19, 2025 17:09
@d-w-moore
Collaborator Author

After a bit of manual testing, I will attempt to write a proper test for SIGINT and SIGTERM to ensure things are left in an OK state.

@d-w-moore
Collaborator Author

d-w-moore commented Jun 5, 2025

A GUI, for example, that maintains background asynchronous parallel transfers using the PRC could trap and guard against Ctrl-C as follows:

import sys
from signal import signal, SIGINT
from irods.parallel import abort_asynchronous_transfers
signal(SIGINT, lambda *_: sys.exit(0 if abort_asynchronous_transfers() else 1))

Update: abort_asynchronous_transfers has been renamed. It is now abort_parallel_transfers and may be used to abort the current (just-interrupted) synchronous transfer as well as all pending background ones. See the README updates in this pull request.
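For reference, here is a minimal sketch of the same Ctrl-C guard using the renamed call. The handler body is an illustrative assumption (the exact return value and any keyword arguments of abort_parallel_transfers are described in the README changes, not here):

import sys
from signal import signal, SIGINT
from irods.parallel import abort_parallel_transfers

def _on_interrupt(*_):
    # Abort the just-interrupted synchronous transfer along with any
    # pending background ones, then exit cleanly.
    abort_parallel_transfers()
    sys.exit(0)

signal(SIGINT, _on_interrupt)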

@d-w-moore d-w-moore force-pushed the segfault_parallel_io_722.m branch from abafff5 to fb36836 Compare June 6, 2025 13:48
Contributor

@alanking alanking left a comment


Seems reasonable. Just a couple of things in the test.

@korydraughn
Contributor

Looks like we have a conflict.

Seems this PR is close to completion?

@alanking
Contributor

alanking commented Dec 5, 2025

Just checking to see if this PR is still being considered for 3.3.0.

@d-w-moore
Collaborator Author

Just checking to see if this PR is still being considered for 3.3.0.

I will check whether it is still current and whether the segfault is still a concern. If so, then I think we can consider it for this release.

@d-w-moore d-w-moore force-pushed the segfault_parallel_io_722.m branch from fb36836 to 481952c Compare December 12, 2025 15:25
@korydraughn
Contributor

What's the status of this PR?

@d-w-moore
Collaborator Author

What's the status of this PR?

I believe it's almost ready. I want to look over it once more.

Contributor

@alanking alanking left a comment


Awaiting signal that this is ready

@d-w-moore
Collaborator Author

Awaiting signal that this is ready

I've added a test (first draft, will run soon) that interrupts a put. We weren't testing that previously.

finish put test

debug(parallel)

debug(put-test)

behaves better if we add mgr to list sooner?

experimental changes ACTIVE_PATH

paths active

make return values consistent from io_multipart_*()

print debug on abort

almost there?

move statement where transfer_managers is updated

rework abort_transfer fn slightly

handle logic for prematurely shutdown executor
@d-w-moore
Collaborator Author

Now open to comment ... once more. These changes are final in my mind, as far as the significant changes to library functionality go.

@d-w-moore
Collaborator Author

I can squash the last 6 commits or so, if that helps reviewers.

@korydraughn
Contributor

Yes please.

transfer_managers: weakref.WeakKeyDictionary["_Multipart_close_manager", Any] = weakref.WeakKeyDictionary()

def abort_parallel_transfers(dry_run=False, filter_function=None):
    """'cls' should be tuple to extract the current synchronous transfer."""
Contributor


What is/was 'cls'?

Collaborator Author


I think it was a base class used to filter/process exposed keys. It's no longer useful or used; I'll remove the mention from the comment.
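For readers skimming the thread, a rough sketch of the pattern under discussion: a registry keyed by weak references lets the abort call enumerate live transfer managers without keeping them alive. The quit() method and the dry_run/filter_function semantics shown here are illustrative assumptions, not the PR's actual implementation:

import weakref

# Hypothetical registry: keys are live transfer-manager objects; entries
# disappear automatically once a manager is garbage-collected.
_registry = weakref.WeakKeyDictionary()

def abort_all(dry_run=False, filter_function=None):
    # Snapshot the keys so we can iterate safely while aborting.
    managers = [m for m in list(_registry)
                if filter_function is None or filter_function(m)]
    if not dry_run:
        for m in managers:
            m.quit()  # assumed abort/cleanup entry point
    return managers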

if File is None:
    File = gen_file_handle()
try:
    f = None
Contributor


What is f? It appears to be unused.

Collaborator Author


I'll look through Codacy; it's probably one among many unused-variable notifications.

Comment on lines +532 to +533
# TODO - examine this experimentally restored code, as
# library should react to these two exception types(and perhaps others) by quitting all transfer threads
Contributor


Is there an issue we can link to with this TODO? To which two exception types is it referring?

Collaborator Author


There's no issue (yet). I was just in the process - incomplete when I wrote the TODO - of trying to understand how the parts fit together. I can change it to an informative comment - not a TODO - with respect to BaseException types and their normal place in handling cleanup. In a nutshell, they are not often trapped by application writers, but libraries like this one (since it spawns threads behind the scenes) need to handle them.

I'm going to remove the TODO unless there are objections.
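To make the point concrete, here is a minimal sketch (not this PR's code) of why a library that spawns worker threads typically catches BaseException subclasses such as KeyboardInterrupt and SystemExit around its dispatch loop, shutting the workers down before re-raising:

import concurrent.futures

def transfer_with_cleanup(jobs, worker):
    executor = concurrent.futures.ThreadPoolExecutor()
    futures = [executor.submit(worker, j) for j in jobs]
    try:
        for f in concurrent.futures.as_completed(futures):
            f.result()
    except BaseException:
        # KeyboardInterrupt / SystemExit are rarely trapped by application
        # code, so the library must cancel and join its own threads here,
        # then let the exception propagate.
        for f in futures:
            f.cancel()
        executor.shutdown(wait=True)
        raise
    else:
        executor.shutdown(wait=True)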

Comment on lines +654 to +659
# if queueLength > 0:
(futures, mgr, chunk_notify_queue) = retval
# else:
# futures = retval
# TODO: investigate: Huh? Why were we zeroing out total_bytes when there is no progress queue?
#chunk_notify_queue = total_bytes = None
Contributor


Do we need to keep these commented-out bits? And if so, can we get an issue number for the TODO?

Collaborator Author


No, this TODO can go.

Comment on lines +129 to +131
# Tell the parent process the name of the local file being "get"ted (got) from iRODS
print(local_path)
sys.stdout.flush()
Contributor


This appears to be used as a signal, in the test on line 70, that the transfer threads have spawned. Is that understanding correct? If so, let's explicitly mention that in the comment as well, because that seems really important.
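For context, a rough sketch of the parent-side handshake being described; the script name and the use of SIGINT here are illustrative assumptions, not the test's actual code:

import signal
import subprocess
import sys

# Launch the child that performs the parallel get; it prints the local
# path (and flushes) once its transfer threads have spawned.
child = subprocess.Popen(
    [sys.executable, "child_get_script.py"],  # hypothetical script name
    stdout=subprocess.PIPE, text=True,
)
local_path = child.stdout.readline().strip()  # blocks until the child's print()
child.send_signal(signal.SIGINT)              # only now is it safe to interrupt
child.wait()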

@alanking
Contributor

Also, take a peek at the 2 failing checks to see if there's anything actionable.

@d-w-moore d-w-moore force-pushed the segfault_parallel_io_722.m branch from 4b0458e to 14037f9 Compare January 15, 2026 21:23
@d-w-moore
Collaborator Author

d-w-moore commented Jan 15, 2026

Sorry about the delay. (I see some reviews have already been made.) I have just submitted a squash, but with no actual changes of any kind since the last note I posted above.

@d-w-moore
Collaborator Author

d-w-moore commented Jan 15, 2026

Still a TODO: introduce a DataTransferInterruptedException and raise the RuntimeError from it, in the case where we need to signal that a put or get didn't complete. We could also, possibly, deprecate catching the RuntimeError, in case at the time of the 4.0.0 release we want to make it a bare DataTransferInterruptedException. (The name is up for suggestions, but that's what I've settled on.)

Going to have to make an issue for this one and handle it later, that is, in 4.0.0, I think. Things are looking pretty complex so far, and I don't want to complicate this release with unforeseen and as-yet too-difficult-to-test problems.
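One possible reading of that plan, sketched purely for discussion (only the class name comes from the comment above; everything else is an assumption): derive the new exception from RuntimeError so existing except RuntimeError handlers keep working, then drop that base class in a later major release after a deprecation period.

class DataTransferInterruptedException(RuntimeError):
    """Raised when a parallel put/get is interrupted before completion.

    Deriving from RuntimeError keeps existing `except RuntimeError`
    handlers working; the base class could become plain Exception in,
    say, 4.0.0 after a deprecation period.
    """

def _finish_transfer(completed_ok):
    if not completed_ok:
        raise DataTransferInterruptedException("parallel transfer did not complete")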
