
Failure mode reduction is extremely slow / cannot use GPU inside Docker (SentenceTransformer embeddings) #146

@ozatamago

Description


What I tried

  1. I tried to run Failure Mode Analysis (TrajFM).
  2. The pipeline reaches failure_mode_reduction, then tries to embed ~27k titles using SentenceTransformer('all-MiniLM-L6-v2').
  3. However, the Docker container cannot see any GPU, so embedding runs on CPU (or it hangs while trying to download/load the model).

Where the issue happens
AssetOpsBench/src/TrajFM/failure_mode_reduction.py, around lines 113–120 (SentenceTransformer init + encode).
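
For context, a minimal sketch of what that section roughly looks like (paraphrased from my reading of the report, not a verbatim copy of the repo code), with explicit device selection and a smaller batch size that I would expect to help on CPU:

```python
import torch
from sentence_transformers import SentenceTransformer

# Use the GPU only if the container can actually see one; otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Model name taken from failure_mode_reduction.py as described above.
model = SentenceTransformer("all-MiniLM-L6-v2", device=device)

titles = ["bearing overheating", "pump seal leak"]  # placeholder for the ~27k titles

embeddings = model.encode(
    titles,
    batch_size=64,           # smaller batches keep memory pressure manageable on CPU
    show_progress_bar=True,  # makes it obvious whether the step is slow or actually stuck
    convert_to_numpy=True,
)
```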

Evidence / logs

  • From the container logs:
    • torch.cuda.is_available=False
    • Default HF cache looks like /root/.cache/huggingface/hub (not pre-populated)
  • On my host:
    • nvidia-smi: command not found

So currently, the container cannot access an NVIDIA GPU, and installing/using the embedding model becomes problematic.
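
For completeness, the checks I ran inside the container were along these lines (plain diagnostics, not project code):

```python
import os
import torch

print("torch.cuda.is_available:", torch.cuda.is_available())  # prints False here
print("cuda device count:", torch.cuda.device_count())
# HF_HOME / HUGGINGFACE_HUB_CACHE control the hub cache location; when unset,
# the default is /root/.cache/huggingface/hub inside the container.
print("HF_HOME:", os.environ.get("HF_HOME"))
print("HUGGINGFACE_HUB_CACHE:", os.environ.get("HUGGINGFACE_HUB_CACHE"))
```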

Expected behavior

  • Failure mode reduction should finish reasonably fast (minutes, not “forever”), ideally using GPU when available.

Actual behavior

  • The embedding step is extremely slow / appears stuck (likely CPU-only + a large batch).
  • The GPU is not visible from inside Docker.

Questions / help needed

  • What is the recommended way to run TrajFM failure mode reduction with GPU?
  • If my environment cannot provide GPU to Docker, what is the recommended workaround (pre-download the model into the image, mount the HF cache, reduce batch size, etc.)? A sketch of the pre-download approach is below.
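
For the pre-download idea, I imagine something like the following run at image build time (a hypothetical helper script invoked from the Dockerfile; the file name is my own, not an existing project file):

```python
# prefetch_model.py (hypothetical): run once during docker build so the image
# ships with the model already cached and no hub download happens at runtime.
from sentence_transformers import SentenceTransformer

# The download lands under /root/.cache/huggingface by default, or under
# HF_HOME if that is set for the build stage. Mounting that same directory
# from the host at runtime would be the "mount HF cache" variant.
SentenceTransformer("all-MiniLM-L6-v2")
```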

Optional request (if possible)

  • If feasible, could we avoid local SentenceTransformer embedding entirely by adding an embedding option to the existing watsonx_llm flow, so that embeddings are computed on the service side (on its GPUs) instead of requiring a GPU inside Docker? A rough sketch of what that might look like follows.
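
To make the request concrete, something along these lines, assuming the ibm-watsonx-ai SDK's Embeddings class as described in its public docs (the model id, credential handling, and method names below are my assumptions, not existing AssetOpsBench code):

```python
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import Embeddings

# Credentials and project id would presumably come from the same configuration
# the existing watsonx_llm flow already uses; values here are placeholders.
creds = Credentials(url="https://us-south.ml.cloud.ibm.com", api_key="<WATSONX_APIKEY>")

embedder = Embeddings(
    model_id="ibm/slate-125m-english-rtrvr",  # any embedding model available to the project
    credentials=creds,
    project_id="<WATSONX_PROJECT_ID>",
)

titles = ["bearing overheating", "pump seal leak"]  # placeholder for the ~27k titles
vectors = embedder.embed_documents(texts=titles)    # computed service-side, no GPU needed in Docker
```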
