
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task. #15

Description

@ekkooee7

Hi, I ran into this problem when running python driver.py:

Hello World... From global
(pid=36500)
(imitationRunner pid=37879) Hello World... From global
(imitationRunner pid=37879) starting episode 0 on metaAgent 0
(imitationRunner pid=37879) running imitation job
2024-04-02 16:24:19,702 WARNING worker.py:1986 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker.
RayTask ID: ffffffffffffffff57261355039a445aab5c889701000000
Worker ID: 5319944c466cd717513b05721f5bb35ee9d0bc67636ca45d75ec4b26
Node ID: 9142cb0d3cde6bac61a2c9ea58188ae8f649a46cd4f8ab495df8f181
Worker IP address: 10.26.224.144
Worker port: 46573
Worker PID: 37880
Worker exit type: SYSTEM_ERROR
Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
(imitationRunner pid=37880) cannot allocate memory for thread-local data: ABORT
Traceback (most recent call last):
  File "/home/waz/workspace/PRIMAL2/driver.py", line 338, in <module>
    main()
  File "/home/waz/workspace/PRIMAL2/driver.py", line 170, in main
    jobResults, metrics, info = ray.get(done_id)[0]
  File "/home/waz/anaconda3/envs/py36/lib/python3.6/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/waz/anaconda3/envs/py36/lib/python3.6/site-packages/ray/_private/worker.py", line 2523, in get
    raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
  class_name: imitationRunner
  actor_id: 57261355039a445aab5c889701000000
  pid: 37880
  namespace: 36ab3fad-5802-47dd-a1b7-63dece3b6d68
  ip: 10.26.224.144
The actor is dead because its worker process has died.
Worker exit type: SYSTEM_ERROR
Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
2024-04-02 16:24:19,788 WARNING worker.py:1986 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker.
RayTask ID: ffffffffffffffff747a4f29667195d12b49c67b01000000
Worker ID: 7b73a6ec53dcefe2ebdf2886269b2f5c58b0a07f4dba5383bc0bdb60
Node ID: 9142cb0d3cde6bac61a2c9ea58188ae8f649a46cd4f8ab495df8f181
Worker IP address: 10.26.224.144
Worker port: 34091
Worker PID: 37879
Worker exit type: SYSTEM_ERROR
Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
(imitationRunner pid=37879) cannot allocate memory for thread-local data: ABORT

I changed the number of agents and threads, and I made sure my compute resources are sufficient (a server with 2 Xeon Silver CPUs and 2×4090 + 6×4080 GPUs). Do you have any idea what is causing this?
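
Since the worker aborts with "cannot allocate memory for thread-local data" and the Ray message lists memory pressure as one potential root cause, it may help to record the limits the driver's shell actually imposes before launching. The following is only a diagnostic sketch using the Python standard library, not part of PRIMAL2 or Ray; the helper names (`format_limit`, `collect_limits`, `available_memory_kib`) are made up for illustration, and reading `/proc/meminfo` assumes a Linux host.

```python
import os
import resource


def format_limit(value):
    """Render an rlimit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def collect_limits():
    """Return (soft, hard) limits that can make allocations fail in a worker."""
    names = {
        "address space (RLIMIT_AS)": resource.RLIMIT_AS,
        "processes/threads (RLIMIT_NPROC)": resource.RLIMIT_NPROC,
        "stack size (RLIMIT_STACK)": resource.RLIMIT_STACK,
    }
    report = {}
    for label, limit_id in names.items():
        soft, hard = resource.getrlimit(limit_id)
        report[label] = (format_limit(soft), format_limit(hard))
    return report


def available_memory_kib(meminfo_path="/proc/meminfo"):
    """Read MemAvailable from /proc/meminfo (Linux only); None if unavailable."""
    try:
        with open(meminfo_path) as fh:
            for line in fh:
                if line.startswith("MemAvailable:"):
                    return int(line.split()[1])
    except OSError:
        return None
    return None


if __name__ == "__main__":
    # Print the limits inherited by any process started from this shell.
    for label, (soft, hard) in collect_limits().items():
        print(f"{label}: soft={soft} hard={hard}")
    print("MemAvailable (KiB):", available_memory_kib())
    print("CPU count:", os.cpu_count())
```

Running this in the same shell and conda environment as driver.py shows whether a low per-user process/thread limit or address-space limit is in effect even when the machine itself has plenty of RAM. It does not prove which of the three listed root causes applies; the dead worker's own log files are still the place to confirm that.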
