Skip to content

Conversation

@blktests-ci
Copy link

@blktests-ci blktests-ci bot commented Nov 14, 2025

Pull request for series with
subject: nvme: Convert tag_list mutex to rwsemaphore to avoid deadlock
version: 1
url: https://patchwork.kernel.org/project/linux-block/list/?series=1023161

@blktests-ci
Copy link
Author

blktests-ci bot commented Nov 14, 2025

Upstream branch: 6da43bb
series: https://patchwork.kernel.org/project/linux-block/list/?series=1023161
version: 1

@blktests-ci
Copy link
Author

blktests-ci bot commented Nov 16, 2025

Upstream branch: f824272
series: https://patchwork.kernel.org/project/linux-block/list/?series=1023161
version: 1

@blktests-ci blktests-ci bot force-pushed the series/1023161=>linus-master branch from c60e57c to 6268848 Compare November 16, 2025 07:41
@blktests-ci blktests-ci bot force-pushed the linus-master_base branch from 00d5e5c to d782508 Compare November 17, 2025 00:45
@blktests-ci
Copy link
Author

blktests-ci bot commented Nov 17, 2025

Upstream branch: f824272
series: https://patchwork.kernel.org/project/linux-block/list/?series=1023161
version: 1

@blktests-ci blktests-ci bot force-pushed the series/1023161=>linus-master branch from 6268848 to 38e60f0 Compare November 17, 2025 00:51
@blktests-ci
Copy link
Author

blktests-ci bot commented Nov 17, 2025

Upstream branch: f824272
series: https://patchwork.kernel.org/project/linux-block/list/?series=1023161
version: 1

@blktests-ci blktests-ci bot force-pushed the series/1023161=>linus-master branch from 38e60f0 to c0ba4a9 Compare November 17, 2025 02:42
@blktests-ci
Copy link
Author

blktests-ci bot commented Nov 17, 2025

Upstream branch: f824272
series: https://patchwork.kernel.org/project/linux-block/list/?series=1024486
version: 2

@blktests-ci blktests-ci bot added V2 and removed V1 V1-ci-fail labels Nov 17, 2025
@blktests-ci blktests-ci bot force-pushed the series/1023161=>linus-master branch from c0ba4a9 to 0e9d2ad Compare November 17, 2025 20:41
@blktests-ci blktests-ci bot force-pushed the linus-master_base branch from d782508 to 6099a4d Compare November 17, 2025 23:44
@blktests-ci
Copy link
Author

blktests-ci bot commented Nov 17, 2025

Upstream branch: e7c375b
series: https://patchwork.kernel.org/project/linux-block/list/?series=1024486
version: 2

@blktests-ci blktests-ci bot force-pushed the series/1023161=>linus-master branch from 0e9d2ad to 32ed0c4 Compare November 17, 2025 23:52
@blktests-ci blktests-ci bot force-pushed the linus-master_base branch from 6099a4d to 5121c4d Compare November 18, 2025 02:19
@blktests-ci
Copy link
Author

blktests-ci bot commented Nov 18, 2025

Upstream branch: e7c375b
series: https://patchwork.kernel.org/project/linux-block/list/?series=1024486
version: 2

@blktests-ci blktests-ci bot force-pushed the series/1023161=>linus-master branch from 32ed0c4 to c9ded35 Compare November 18, 2025 02:25
@blktests-ci blktests-ci bot force-pushed the linus-master_base branch from 5121c4d to 4458758 Compare November 19, 2025 00:24
@blktests-ci
Copy link
Author

blktests-ci bot commented Nov 19, 2025

Upstream branch: 8b69055
series: https://patchwork.kernel.org/project/linux-block/list/?series=1024486
version: 2

@blktests-ci blktests-ci bot force-pushed the series/1023161=>linus-master branch from c9ded35 to 33a1d14 Compare November 19, 2025 00:28
@blktests-ci blktests-ci bot force-pushed the linus-master_base branch from 4458758 to 6f43942 Compare November 21, 2025 09:45
@blktests-ci
Copy link
Author

blktests-ci bot commented Nov 21, 2025

Upstream branch: fd95357
series: https://patchwork.kernel.org/project/linux-block/list/?series=1024486
version: 2

@blktests-ci blktests-ci bot force-pushed the series/1023161=>linus-master branch from 33a1d14 to 4f56eb1 Compare November 21, 2025 09:55
@blktests-ci blktests-ci bot force-pushed the linus-master_base branch from 6f43942 to ec9caac Compare December 8, 2025 05:15
blk_mq_{add,del}_queue_tag_set() functions add and remove queues from
tagset, the functions make sure that tagset and queues are marked as
shared when two or more queues are attached to the same tagset.
Initially a tagset starts as unshared and when the number of added
queues reaches two, blk_mq_add_queue_tag_set() marks it as shared along
with all the queues attached to it. When the number of attached queues
drops to 1 blk_mq_del_queue_tag_set() need to mark both the tagset and
the remaining queues as unshared.

Both functions need to freeze current queues in tagset before setting on
unsetting BLK_MQ_F_TAG_QUEUE_SHARED flag. While doing so, both functions
hold set->tag_list_lock mutex, which makes sense as we do not want
queues to be added or deleted in the process. This used to work fine
until commit 98d81f0 ("nvme: use blk_mq_[un]quiesce_tagset")
made the nvme driver quiesce tagset instead of quiscing individual
queues. blk_mq_quiesce_tagset() does the job and quiesce the queues in
set->tag_list while holding set->tag_list_lock also.

This results in deadlock between two threads with these stacktraces:

  __schedule+0x48e/0xed0
  schedule+0x5a/0xc0
  schedule_preempt_disabled+0x11/0x20
  __mutex_lock.constprop.0+0x3cc/0x760
  blk_mq_quiesce_tagset+0x26/0xd0
  nvme_dev_disable_locked+0x77/0x280 [nvme]
  nvme_timeout+0x268/0x320 [nvme]
  blk_mq_handle_expired+0x5d/0x90
  bt_iter+0x7e/0x90
  blk_mq_queue_tag_busy_iter+0x2b2/0x590
  ? __blk_mq_complete_request_remote+0x10/0x10
  ? __blk_mq_complete_request_remote+0x10/0x10
  blk_mq_timeout_work+0x15b/0x1a0
  process_one_work+0x133/0x2f0
  ? mod_delayed_work_on+0x90/0x90
  worker_thread+0x2ec/0x400
  ? mod_delayed_work_on+0x90/0x90
  kthread+0xe2/0x110
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x2d/0x50
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork_asm+0x11/0x20

  __schedule+0x48e/0xed0
  schedule+0x5a/0xc0
  blk_mq_freeze_queue_wait+0x62/0x90
  ? destroy_sched_domains_rcu+0x30/0x30
  blk_mq_exit_queue+0x151/0x180
  disk_release+0xe3/0xf0
  device_release+0x31/0x90
  kobject_put+0x6d/0x180
  nvme_scan_ns+0x858/0xc90 [nvme_core]
  ? nvme_scan_work+0x281/0x560 [nvme_core]
  nvme_scan_work+0x281/0x560 [nvme_core]
  process_one_work+0x133/0x2f0
  ? mod_delayed_work_on+0x90/0x90
  worker_thread+0x2ec/0x400
  ? mod_delayed_work_on+0x90/0x90
  kthread+0xe2/0x110
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x2d/0x50
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork_asm+0x11/0x20

The top stacktrace is showing nvme_timeout() called to handle nvme
command timeout. timeout handler is trying to disable the controller and
as a first step, it needs to blk_mq_quiesce_tagset() to tell blk-mq not
to call queue callback handlers. The thread is stuck waiting for
set->tag_list_lock as it tires to walk the queues in set->tag_list.

The lock is held by the second thread in the bottom stack which is
waiting for one of queues to be frozen. The queue usage counter will
drop to zero after nvme_timeout() finishes, and this will not happen
because the thread will wait for this mutex forever.

Convert set->tag_list_lock mutex to set->tag_list_rwsem rwsemaphore to
avoid the deadlock. Update blk_mq_[un]quiesce_tagset() to take the
semaphore for read since this is enough to guarantee no queues will be
added or removed. Update blk_mq_{add,del}_queue_tag_set() to take the
semaphore for write while updating set->tag_list and downgrade it to
read while freezing the queues. It should be safe to update set->flags
and hctx->flags while holding the semaphore for read since the queues
are already frozen.

Fixes: 98d81f0 ("nvme: use blk_mq_[un]quiesce_tagset")
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
@blktests-ci
Copy link
Author

blktests-ci bot commented Dec 8, 2025

Upstream branch: c2f2b01
series: https://patchwork.kernel.org/project/linux-block/list/?series=1024486
version: 2

@blktests-ci blktests-ci bot force-pushed the series/1023161=>linus-master branch from 4f56eb1 to 1746eba Compare December 8, 2025 05:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant