Operating System
Deployment Method
CUDA Usage
Training Process Details (if applicable)
No response
Second-Me version
latest master
Describe the bug
This is as far as I've been able to bring my video drivers and CUDA:
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:24:00.0  On |                  N/A |
| 31%   60C    P0            106W /  370W |    1977MiB /  24576MiB |      6%      Default |
|                                         |                        |                  N/A |
I've had to fall back to Ubuntu 22.04 for compatibility. Despite answering yes to the CUDA prompt, I end up with the following output at the end of the build. The build itself reports no errors, because I've made a number of changes to the Dockerfile as workarounds:
```
[+] Building 2/2
✔ backend Built 0.0s
✔ frontend Built 0.0s
[+] Running 3/3
✔ Network second-me_second-me-network Created 0.1s
✔ Container second-me-backend Started 0.5s
✔ Container second-me-frontend Started 0.6s
Container startup complete
Check CUDA support with: make docker-check-cuda
/r/P/Second-Me on 🌱 master [!?] via 🐍 v3.12.3 took 16m0s
→ make docker-check-cuda
Checking CUDA support in Docker containers...
Running CUDA support check in backend container:
=== GPU Support Check ===
llama-server binary found, checking for CUDA linkage...
❌ llama-server is not linked with CUDA libraries
Container was built without CUDA support
🔍 NVIDIA GPU is available at runtime, but llama-server doesn't support CUDA
To enable GPU support, rebuild using: make docker-up (and select CUDA support when prompted)
No GPU support detected in backend container
```
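For anyone trying to reproduce this, the linkage result can be cross-checked by hand. The following is only a sketch: `has_cuda_linkage` is a hypothetical helper, and the library names it matches are assumptions, since the contents of `check_gpu_support.sh` are not shown in this report. It reads `ldd` output on stdin and succeeds if a CUDA runtime, cuBLAS, or driver library appears in it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Second-Me): succeeds if the ldd output
# piped in on stdin mentions a CUDA runtime, cuBLAS, or driver library.
# The library name patterns are assumptions about what a CUDA build links.
has_cuda_linkage() {
    grep -Eq 'libcudart|libcublas|libcuda\.so'
}

# Usage inside the backend container (binary path as used in the Dockerfile):
#   ldd /app/llama.cpp/build/bin/llama-server | has_cuda_linkage \
#       && echo "CUDA libraries linked" \
#       || echo "no CUDA libraries linked"
```

If this manual check disagrees with `make docker-check-cuda`, the binary being inspected at runtime may not be the one produced during the image build.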
Current Behavior
The container starts without CUDA support
Expected Behavior
The container starts properly with CUDA support
Reproduction Steps
Use the following Dockerfile.backend.cuda:
```
FROM nvidia/cuda:12.4.1-base-ubuntu22.04

# Set working directory
WORKDIR /app

# Add build argument to conditionally skip llama.cpp build
ARG SKIP_LLAMA_BUILD=false

# Install system dependencies with noninteractive mode to avoid prompts
# (disabled; replaced by the Python-from-source step below)
#RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
#    build-essential cmake git curl wget lsof vim unzip sqlite3 \
#    python3-pip python3-venv python3-full python3-poetry pipx \
#    && apt-get clean \
#    && rm -rf /var/lib/apt/lists/* \
#    && ln -sf /usr/bin/python3 /usr/bin/python

ENV DEBIAN_FRONTEND=noninteractive

# Install Python 3.12.2 from source
RUN apt-get update && \
    apt-get install -y \
    build-essential libssl-dev zlib1g-dev libbz2-dev \
    libreadline-dev libsqlite3-dev wget curl llvm \
    libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev \
    libffi-dev liblzma-dev git unzip cmake sqlite3 \
    cuda-toolkit-12-4 \
    && cd /usr/src && \
    wget https://www.python.org/ftp/python/3.12.2/Python-3.12.2.tgz && \
    tar xzf Python-3.12.2.tgz && cd Python-3.12.2 && \
    ./configure --enable-optimizations && \
    make -j"$(nproc)" && make altinstall && \
    ln -sf /usr/local/bin/python3.12 /usr/bin/python && \
    curl -sS https://bootstrap.pypa.io/get-pip.py | python && \
    rm -rf /var/lib/apt/lists/*

# Create a virtual environment to avoid PEP 668 restrictions
RUN python -m venv /app/venv
ENV PATH="/app/venv/bin:$PATH"
ENV VIRTUAL_ENV="/app/venv"

# Use the virtual environment's pip to install packages
RUN pip install --upgrade pip \
    && pip install poetry \
    && poetry config virtualenvs.create false

# Create directories
RUN mkdir -p /app/dependencies /app/data/sqlite /app/data/chroma_db /app/logs /app/run /app/resources

# Copy dependency files - Files that rarely change
COPY dependencies/graphrag-1.2.1.dev27.tar.gz /app/dependencies/
COPY dependencies/llama.cpp.zip /app/dependencies/

# Copy GPU checker script
COPY docker/app/check_gpu_support.sh /app/
COPY docker/app/check_torch_cuda.py /app/
RUN chmod +x /app/check_gpu_support.sh

# Unpack llama.cpp and build with CUDA support (conditionally, based on SKIP_LLAMA_BUILD)
RUN if [ "$SKIP_LLAMA_BUILD" = "false" ]; then \
        echo "=====================================================================" && \
        echo "STARTING LLAMA.CPP BUILD WITH CUDA SUPPORT - THIS WILL TAKE SOME TIME" && \
        echo "=====================================================================" && \
        LLAMA_LOCAL_ZIP="dependencies/llama.cpp.zip" && \
        echo "Using local llama.cpp archive..." && \
        unzip -q "$LLAMA_LOCAL_ZIP" && \
        cd llama.cpp && \
        mkdir -p build && \
        cd build && \
        echo "Starting CMake configuration with CUDA support..." && \
        cmake -DGGML_CUDA=ON \
              -DCMAKE_BUILD_TYPE=Release \
              -DBUILD_SHARED_LIBS=OFF \
              -DLLAMA_NATIVE=ON \
              .. && \
        echo "Starting build process (this will take several minutes)..." && \
        cmake --build . --config Release -j$(nproc) --verbose && \
        echo "Build completed successfully" && \
        chmod +x /app/llama.cpp/build/bin/llama-server /app/llama.cpp/build/bin/llama-cli && \
        echo "====================================================================" && \
        echo "CUDA BUILD COMPLETED SUCCESSFULLY! GPU ACCELERATION IS NOW AVAILABLE" && \
        echo "===================================================================="; \
    else \
        echo "=====================================================================" && \
        echo "SKIPPING LLAMA.CPP BUILD (SKIP_LLAMA_BUILD=$SKIP_LLAMA_BUILD)" && \
        echo "Using existing llama.cpp build from Docker volume" && \
        echo "=====================================================================" && \
        LLAMA_LOCAL_ZIP="dependencies/llama.cpp.zip" && \
        echo "Just unpacking llama.cpp archive (no build)..." && \
        unzip -q "$LLAMA_LOCAL_ZIP" && \
        cd llama.cpp && \
        mkdir -p build; \
    fi

# Mark as GPU-optimized build for runtime reference
# (inner quotes escaped so the marker file contains valid JSON)
RUN mkdir -p /app/data && \
    echo "{ \"gpu_optimized\": true, \"optimized_on\": \"$(date -u +%Y-%m-%dT%H:%M:%SZ)\" }" > /app/data/gpu_optimized.json && \
    echo "Created GPU-optimized marker file"

# Copy project configuration - Files that occasionally change
COPY pyproject.toml README.md /app/

# Fix for potential package installation issues with Poetry
RUN pip install --upgrade setuptools wheel
RUN poetry install --no-interaction --no-root || poetry install --no-interaction --no-root --without dev
RUN pip install --force-reinstall dependencies/graphrag-1.2.1.dev27.tar.gz

# Copy source code - Files that frequently change
COPY docker/ /app/docker/
COPY lpm_kernel/ /app/lpm_kernel/

# Check module import
RUN python -c "import lpm_kernel; print('Module import check passed')"

# Set environment variables
ENV PYTHONUNBUFFERED=1 \
    PYTHONPATH=/app \
    BASE_DIR=/app/data \
    LOCAL_LOG_DIR=/app/logs \
    RUN_DIR=/app/run \
    RESOURCES_DIR=/app/resources \
    APP_ROOT=/app \
    FLASK_APP=lpm_kernel.app \
    LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

# Expose ports
EXPOSE 8002 8080

# Set the startup command
CMD ["bash", "-c", "echo 'Checking SQLite database...' && if [ ! -s /app/data/sqlite/lpm.db ]; then echo 'SQLite database not found or empty, initializing...' && mkdir -p /app/data/sqlite && sqlite3 /app/data/sqlite/lpm.db '.read /app/docker/sqlite/init.sql' && echo 'SQLite database initialized successfully' && echo 'Tables created:' && sqlite3 /app/data/sqlite/lpm.db '.tables'; else echo 'SQLite database already exists, skipping initialization'; fi && echo 'Checking ChromaDB...' && if [ ! -d /app/data/chroma_db/documents ] || [ ! -d /app/data/chroma_db/document_chunks ]; then echo 'ChromaDB collections not found, initializing...' && python /app/docker/app/init_chroma.py && echo 'ChromaDB initialized successfully'; else echo 'ChromaDB already exists, skipping initialization'; fi && echo 'Starting application at ' $(date) >> /app/logs/backend.log && cd /app && python -m flask run --host=0.0.0.0 --port=${LOCAL_APP_PORT:-8002} >> /app/logs/backend.log 2>&1"]
```
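One thing that might help narrow this down is confirming what CMake actually recorded for the CUDA option in the build directory. This is a sketch under assumptions: `cmake_cache_value` is a hypothetical helper, and it is not known from this report which llama.cpp revision is inside `llama.cpp.zip`. Older llama.cpp revisions used `LLAMA_CUBLAS` and later `LLAMA_CUDA` instead of `GGML_CUDA`, so it may be worth checking all three names. The helper relies only on the `NAME:TYPE=VALUE` entry format of `CMakeCache.txt`.

```shell
#!/bin/sh
# Hypothetical helper (not part of Second-Me or llama.cpp): print the cached
# value of a CMake variable. CMakeCache.txt stores entries as NAME:TYPE=VALUE.
#   $1 = variable name, $2 = path to CMakeCache.txt
cmake_cache_value() {
    sed -n "s/^$1:[A-Z]*=//p" "$2"
}

# Usage inside the backend container (build directory as used in the Dockerfile):
#   cmake_cache_value GGML_CUDA    /app/llama.cpp/build/CMakeCache.txt
#   cmake_cache_value LLAMA_CUDA   /app/llama.cpp/build/CMakeCache.txt
#   cmake_cache_value LLAMA_CUBLAS /app/llama.cpp/build/CMakeCache.txt
```

An empty result, or a missing `CMakeCache.txt`, would suggest that the configure step did not run in that directory, for example because `SKIP_LLAMA_BUILD` was not `false` during the build.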
Possible Workaround
No response
Additional Information
No response
Link to related Github discussion or issue
No response