Cleaner termination #397

@ultmaster

#370 fixes some of the issues, but not all.

                    ERROR    [Worker 6 | Rollout ro-5d3a71320a1d] Exception during update_attempt. Giving up the update.                                                                                                                                   agent.py:548
                             Traceback (most recent call last):                                                                                                                                                                                                        
                               File "/home/xxx/Projects/agent-lightning/agentlightning/runner/agent.py", line 517, in _step_impl                                                                                                                                      
                                 trace_spans = await self._post_process_rollout_result(next_rollout, result)                                                                                                                                                           
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                           
                               File "/home/xxx/Projects/agent-lightning/agentlightning/runner/agent.py", line 300, in _post_process_rollout_result                                                                                                                    
                                 await store.add_otel_span(rollout.rollout_id, rollout.attempt.attempt_id, reward_span)                                                                                                                                                
                               File "/home/xxx/Projects/agent-lightning/agentlightning/store/client_server.py", line 1919, in add_otel_span                                                                                                                           
                                 sequence_id = await self.get_next_span_sequence_id(rollout_id, attempt_id)                                                                                                                                                            
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                            
                               File "/home/xxx/Projects/agent-lightning/agentlightning/store/client_server.py", line 1896, in get_next_span_sequence_id                                                                                                               
                                 data = await self._request_json(                                                                                                                                                                                                      
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                                                      
                               File "/home/xxx/Projects/agent-lightning/agentlightning/store/client_server.py", line 1515, in _request_json                                                                                                                           
                                 async with http_call(url, json=json, params=params) as resp:                                                                                                                                                                          
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                                   
                               File "/home/xxx/Projects/agent-lightning/.venv/lib/python3.12/site-packages/aiohttp/client.py", line 1510, in __aenter__                                                                                                               
                                 self._resp: _RetType = await self._coro                                                                                                                                                                                               
                                                        ^^^^^^^^^^^^^^^^                                                                                                                                                                                               
                               File "/home/xxx/Projects/agent-lightning/.venv/lib/python3.12/site-packages/aiohttp/client.py", line 779, in _request                                                                                                                  
                                 resp = await handler(req)                                                                                                                                                                                                             
                                        ^^^^^^^^^^^^^^^^^^                                                                                                                                                                                                             
                               File "/home/xxx/Projects/agent-lightning/.venv/lib/python3.12/site-packages/aiohttp/client.py", line 757, in _connect_and_send_request                                                                                                 
                                 await resp.start(conn)                                                                                                                                                                                                                
                               File "/home/xxx/Projects/agent-lightning/.venv/lib/python3.12/site-packages/aiohttp/client_reqrep.py", line 539, in start                                                                                                              
                                 message, payload = await protocol.read()  # type: ignore[union-attr]                                                                                                                                                                  
                                                    ^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                                              
                               File "/home/xxx/Projects/agent-lightning/.venv/lib/python3.12/site-packages/aiohttp/streams.py", line 680, in read                                                                                                                     
                                 await self._waiter                                                                                                                                                                                                                    
                             asyncio.exceptions.CancelledError                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                       
                             During handling of the above exception, another exception occurred:                                                                                                                                                                       
                                                                                                                                                                                                                                                                       
                             Traceback (most recent call last):                                                                                                                                                                                                        
                               File "/home/xxx/Projects/agent-lightning/agentlightning/runner/agent.py", line 546, in _step_impl                                                                                                                                      
                                 await store.update_attempt(rollout_id, next_rollout.attempt.attempt_id, status="succeeded")                                                                                                                                           
                               File "/home/xxx/Projects/agent-lightning/agentlightning/store/client_server.py", line 2041, in update_attempt                                                                                                                          
                                 data = await self._request_json(                                                                                                                                                                                                      
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                                                      
                               File "/home/xxx/Projects/agent-lightning/agentlightning/store/client_server.py", line 1545, in _request_json                                                                                                                           
                                 raise last_exc                                                                                                                                                                                                                        
                               File "/home/xxx/Projects/agent-lightning/agentlightning/store/client_server.py", line 1515, in _request_json                                                                                                                           
                                 async with http_call(url, json=json, params=params) as resp:                                                                                                                                                                          
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                                   
                               File "/home/xxx/Projects/agent-lightning/.venv/lib/python3.12/site-packages/aiohttp/client.py", line 1510, in __aenter__                                                                                                               
                                 self._resp: _RetType = await self._coro                                                                                                                                                                                               
                                                        ^^^^^^^^^^^^^^^^                                                                                                                                                                                               
                               File "/home/xxx/Projects/agent-lightning/.venv/lib/python3.12/site-packages/aiohttp/client.py", line 779, in _request                                                                                                                  
                                 resp = await handler(req)                                                                                                                                                                                                             
                                        ^^^^^^^^^^^^^^^^^^                                                                                                                                                                                                             
                               File "/home/xxx/Projects/agent-lightning/.venv/lib/python3.12/site-packages/aiohttp/client.py", line 757, in _connect_and_send_request                                                                                                 
                                 await resp.start(conn)                                                                                                                                                                                                                
                               File "/home/xxx/Projects/agent-lightning/.venv/lib/python3.12/site-packages/aiohttp/client_reqrep.py", line 539, in start                                                                                                              
                                 message, payload = await protocol.read()  # type: ignore[union-attr]                                                                                                                                                                  
                                                    ^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                                              
                               File "/home/xxx/Projects/agent-lightning/.venv/lib/python3.12/site-packages/aiohttp/streams.py", line 680, in read                                                                                                                     
                                 await self._waiter                                                                                                                                                                                                                    
                             aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected     

Currently we still see many ServerDisconnectedError exceptions when the store terminates first. I think the client needs a way to check whether the server is still starting up or already shutting down before dumping all of these errors to the console.
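
A minimal sketch of what such a check could look like, using only the standard library. Everything named here is hypothetical and not part of the current store or runner API: the `ServerState` enum, the `probe` callable (which could, for example, wrap a request to a health endpoint on the store server), and the `report_failure` helper. The idea is that the client asks for the server's lifecycle state once a request fails, and only logs the full traceback when the server claims to be running normally.

```python
import asyncio
import logging
from enum import Enum
from typing import Awaitable, Callable

logger = logging.getLogger("store_client_sketch")


class ServerState(Enum):
    # Hypothetical lifecycle states; the store does not expose these today.
    STARTING = "starting"
    RUNNING = "running"
    STOPPING = "stopping"
    UNREACHABLE = "unreachable"


async def report_failure(
    exc: BaseException,
    probe: Callable[[], Awaitable[ServerState]],
    context: str,
) -> str:
    """Log `exc` briefly if the server is not serving, with a traceback otherwise.

    Returns "quiet" or "loud" so callers can see which path was taken.
    """
    try:
        # Keep the probe short so it does not delay shutdown on its own.
        state = await asyncio.wait_for(probe(), timeout=1.0)
    except (asyncio.TimeoutError, OSError):
        # Connection errors are OSError subclasses: treat them as "server gone".
        state = ServerState.UNREACHABLE

    if state is not ServerState.RUNNING:
        # One short line instead of a full traceback per worker.
        logger.warning(
            "%s skipped: store is %s (%s)", context, state.value, type(exc).__name__
        )
        return "quiet"

    # The server says it is healthy, so this failure is unexpected: keep the traceback.
    logger.error("%s failed", context, exc_info=exc)
    return "loud"
```

Under these assumptions, the `except` branch around `update_attempt` in the runner could call `await report_failure(exc, probe, "update_attempt")` instead of logging the exception directly. A probe that itself fails to connect is treated the same as a server that reports it is stopping, which covers the case where the store has already exited.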


Labels: help wanted, store
