
fix: remove unused other_mimi instance to save ~200MB GPU memory #49

Open
ThanhNguyxn wants to merge 1 commit into NVIDIA:main from ThanhNguyxn:fix/remove-dead-code-other-mimi

Conversation

@ThanhNguyxn

Summary

Fixes #46 - Remove unused other_mimi MimiModel instance that wastes ~200MB GPU memory

Problem Analysis

Careful code review shows that the other_mimi instance in both server.py and offline.py (see the sketch after this list):

  1. Is instantiated identically to the primary mimi:

    mimi = loaders.get_mimi(args.mimi_weight, args.device)
    other_mimi = loaders.get_mimi(args.mimi_weight, args.device)  # Same weights, same device
  2. Has its outputs discarded (assigned to _):

    codes = mimi.encode(chunk)        # ← Used
    _ = other_mimi.encode(chunk)      # ← Discarded
    
    main_pcm = mimi.decode(tokens[:, 1:9])  # ← Used
    _ = other_mimi.decode(tokens[:, 1:9])   # ← Discarded
  3. Consumes significant resources:

    • ~200MB GPU memory for the duplicate model
    • Extra compute cycles on every encode/decode call
    • Maintains separate streaming state that's never read
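
With the duplicate removed, the per-frame hot path collapses to a single instance. A minimal sketch of that path, reusing the loader call shown above; the import path, checkpoint path, device, and the (1, 1, 1920) frame shape are illustrative assumptions, not code copied from the repository:

    import torch
    from moshi.models import loaders   # import path assumed; the snippets above only show loaders.get_mimi(...)

    device = "cuda"                     # requires a CUDA device for this sketch
    mimi = loaders.get_mimi("path/to/mimi.safetensors", device)   # single instance; no other_mimi

    # One 80 ms frame of 24 kHz audio (frame shape assumed for illustration).
    chunk = torch.zeros(1, 1, 1920, device=device)
    with torch.no_grad():
        codes = mimi.encode(chunk)   # audio -> codebook tokens (the only encode per frame now)
        pcm = mimi.decode(codes)     # tokens -> audio (the only decode per frame now)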

Investigation

I verified there are no hidden side effects (a quick aliasing check is sketched after the list):

  • other_mimi.encode() and other_mimi.decode() return values are never used
  • The streaming state of other_mimi is reset but never read
  • No inter-model communication exists between mimi and other_mimi
  • The two instances share no state (each is independently loaded)
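
To make the "share no state" point concrete, here is a quick check one could run. The import path and checkpoint path are assumptions; the loader call mirrors the snippet above:

    import torch
    from moshi.models import loaders   # assumed import path

    device = "cuda"
    weight = "path/to/mimi.safetensors"   # placeholder checkpoint path

    mimi = loaders.get_mimi(weight, device)
    other_mimi = loaders.get_mimi(weight, device)

    # Each call loads its own parameter tensors, so nothing should be aliased between them.
    aliased = any(a.data_ptr() == b.data_ptr()
                  for a, b in zip(mimi.parameters(), other_mimi.parameters()))
    print("parameters aliased:", aliased)   # expected: False (two full copies in GPU memory)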

Changes

server.py

  • Removed other_mimi from ServerState dataclass fields
  • Removed other_mimi from __init__ parameters
  • Removed other_mimi.streaming_forever(1) call
  • Removed other_mimi.encode(chunk) in warmup() and opus_loop()
  • Removed other_mimi.decode(tokens) in warmup() and opus_loop()
  • Removed other_mimi.reset_streaming() call
  • Removed other_mimi instantiation in main()
  • Removed other_mimi from ServerState() constructor call
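
For orientation, the shape of the ServerState change looks roughly like the sketch below. Field and method names other than mimi/other_mimi are illustrative placeholders, not the actual contents of server.py:

    from dataclasses import dataclass
    from typing import Any

    @dataclass
    class ServerState:
        mimi: Any        # the single MimiModel that remains
        lm_gen: Any      # placeholder for the other existing fields
        # other_mimi: Any  <- removed, along with every call made on it

        def warmup(self) -> None:
            # encode/decode now run once per frame, through self.mimi only
            ...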

offline.py

  • Removed other_mimi parameter from warmup() function
  • Removed other_mimi parameter from decode_tokens_to_pcm() function
  • Removed other_mimi.encode() call in warmup()
  • Removed other_mimi.decode() calls in warmup() and decode_tokens_to_pcm()
  • Removed other_mimi instantiation in run_inference()
  • Removed other_mimi.streaming_forever(1) call
  • Removed other_mimi.reset_streaming() call
  • Updated function calls to not pass other_mimi
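
The offline.py side is the same pattern at the function level. A hedged sketch of the signature change; the parameter list is illustrative and the real function takes more arguments:

    # Before: decode_tokens_to_pcm(mimi, other_mimi, tokens) ran both decodes and discarded one.
    def decode_tokens_to_pcm(mimi, tokens):
        # Only the result that was actually used survives; `_ = other_mimi.decode(...)` is gone.
        return mimi.decode(tokens[:, 1:9])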

Memory Impact

Metric                  Before   After   Savings
MimiModel instances     2        1       ~200MB GPU RAM
encode() calls/frame    2        1       50% compute
decode() calls/frame    2        1       50% compute
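
The ~200MB figure can be reproduced by measuring allocated CUDA memory around the second load. A rough sketch, with the import path and checkpoint path assumed as above:

    import torch
    from moshi.models import loaders   # assumed import path

    device = "cuda"
    weight = "path/to/mimi.safetensors"   # placeholder checkpoint path

    mimi = loaders.get_mimi(weight, device)
    torch.cuda.synchronize()
    before = torch.cuda.memory_allocated()

    other_mimi = loaders.get_mimi(weight, device)   # the duplicate this PR removes
    torch.cuda.synchronize()
    after = torch.cuda.memory_allocated()

    print(f"duplicate MimiModel footprint: {(after - before) / 2**20:.0f} MiB")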

Backward Compatibility

Fully backward compatible: no API changes and no behavioral changes.

Testing Recommendation

# Server mode
python -m moshi.server --ssl "$SSL_DIR"

# Offline mode
python -m moshi.offline --input-wav input.wav --output-wav output.wav \
  --output-text output.json --voice-prompt NATM1.pt

Both should work identically but with ~200MB less GPU memory usage.
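
To check that the removal is behavior-preserving, one can diff offline outputs produced before and after the patch. A sketch assuming soundfile is available and that both runs used the same input and a fixed seed (otherwise token sampling makes outputs differ run to run):

    import numpy as np
    import soundfile as sf   # any WAV reader works; soundfile is an assumption

    ref, sr_ref = sf.read("output_before.wav")   # produced on main
    new, sr_new = sf.read("output_after.wav")    # produced on this branch

    assert sr_ref == sr_new and ref.shape == new.shape
    print("max abs sample difference:", np.max(np.abs(ref - new)))   # 0.0 if generation is deterministic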

Fixes NVIDIA#46

The `other_mimi` MimiModel instance was:
- Instantiated but its encode/decode outputs were discarded
- Consuming ~200MB additional GPU memory unnecessarily
- Running redundant computations on every audio frame

This PR removes all references to `other_mimi` from:
- server.py: ServerState class, warmup(), handle_chat()
- offline.py: warmup(), decode_tokens_to_pcm(), run_inference()

Memory savings: ~200MB GPU RAM per running instance


Development

Successfully merging this pull request may close these issues.

Clarification needed: Purpose of duplicate MimiModel instance (other_mimi) in server.py
