
Conversation

@martindekov
Contributor

@martindekov martindekov commented Sep 19, 2025

Rejecting disk removals when there are attached volumes in a Degraded
state on single-node Harvester setups.

We also reject removals when block devices are in a failed state.

The logic is split in two parts:

  • We take the disk selectors of volumes in a Degraded state and
    compare those selectors against the disk tags
  • We take the Longhorn node related to the current blockdevice
    and, through that node, reach the backing images via indexing

In both scenarios, if there is a match while the disk is being
disabled, we list for the user which volumes or images
need to be fixed first.

Signed-off-by: Martin Dekov martin.dekov@suse.com

Problem:
Users can directly delete disks without any warning, as explained in the issue:
harvester/harvester#3344

Solution:
Reject disk deletions when there are Degraded volumes and the cluster is a single-node Harvester cluster
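
For orientation, a minimal sketch of how the two checks described above could be wired into the blockdevice validator is shown below. The wrapper name validateDiskRemoval and the validateBackingImages helper are illustrative assumptions, not necessarily the exact code in this PR; validateDegradedVolumes is the helper quoted in the review discussion further down.

// Illustrative wiring only; except for validateDegradedVolumes, the names here
// are assumptions and not necessarily the exact code in this PR.
func (v *Validator) validateDiskRemoval(old *diskv1.BlockDevice) error {
	// Part 1: degraded volumes, matched from the volumes' disk selectors
	// back to the tags on the disk being removed.
	if err := v.validateDegradedVolumes(old); err != nil {
		return err
	}
	// Part 2: backing images, reached through the Longhorn node that owns
	// this blockdevice and checked for a non-ready state.
	return v.validateBackingImages(old)
}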

Related Issue:

harvester/harvester#3344

Test plan:

Unit testing added

Replicate Degraded Volume Scenario:

  1. Create a single-node Harvester cluster
  2. Attach SATA disks before installation
  3. Once installed, add the disks through the Hosts -> Edit Config -> Storage -> Add Disk dropdown
  4. Tag the added disk with a new tag, for example "new"
  5. Create a storage class with 3 replicas and select the disk via Advanced -> Storage Classes -> Create -> Disk Selector -> new (same as the disk tag above)
  6. Create a virtual machine and attach a volume that uses that storage class
  7. Since the storage class requests 3 replicas while only one tagged disk is available, the volume ends up Degraded
  8. Try to delete the disk - see the rejection

Replicate Failed Backing Image Scenario:

  1. Follow steps 1 to 4 above
  2. In the Images tab, click the Create button to create a new image
  3. While creating, pick a file image, click the Storage Class tab, and pick the storage class created above - "new"
  4. While the image is uploading, refresh the page multiple times to provoke an EOF upload error
  5. Try deleting the disk from the Hosts page - check the error

Note: make sure the virtualmachineimage has a block device as its backend before testing, so that when the image is created the corresponding blockdevice is created as well

Member

@w13915984028 w13915984028 left a comment


FYI.

@martindekov martindekov force-pushed the diskremoval3344 branch 3 times, most recently from d0a8ce6 to 0cc1d7c on September 24, 2025 08:57
@martindekov martindekov changed the title [WIP] Reject disk removals on Degraded volumes → [WIP] Validate disk removals on Sep 24, 2025
@martindekov martindekov force-pushed the diskremoval3344 branch 21 times, most recently from 0deeccb to a62db7c on September 25, 2025 14:21
Addressing feedback from Moritz including the following:
* adding longhorn volume resource to the ndm clusterrole
* adding virtualmachineimage resource to the clusterrole
* removing redundant usage of a slice after the logic was changed

Signed-off-by: Martin Dekov <martin.dekov@suse.com>
Adding fake implementations and a unit test skeleton for the
pipeline runs. For now there is only one test with minimal
coverage, for the purpose of validating that the tests work
as expected and cover the basic part of the code.

Test cases will be added in an upcoming commit.

Signed-off-by: Martin Dekov <martin.dekov@suse.com>
Adding unit test cases covering mostly the new logic. Removed the slice
allocation size since it produced empty values, which would have required
extra logic to skip them, so for simplicity removing it was the
best choice.

Also augmented the instantiation of the objects to allow empty slices and
extend test coverage when caches are empty.

Signed-off-by: Martin Dekov <martin.dekov@suse.com>
@martindekov martindekov changed the title Validate disk removals → Validate single node disk removals on Sep 30, 2025
m-ildefons previously approved these changes Sep 30, 2025
Member

@m-ildefons m-ildefons left a comment


LGTM.

cd $(dirname $0)/..

echo Running unit tests
go test -v -cover ./pkg/...
Member


Just a nit, you don't have to correct this:

If you want coverage output from the containerized build (i.e. using make unit-test), then you need to do some additional things:

  1. specify a coverage output file:
go test -v -cover -coverprofile=coverage.out ./pkg/...
  2. do the conversion to the final output format (e.g. HTML) inside the container:
go test [...]
go tool cover -html=coverage.out -o coverage.html
  3. add the final coverage output file in Dockerfile.dapper where the environment variable DAPPER_OUTPUT is defined, e.g.:
[...]
ENV DAPPER_OUTPUT ./bin ./manifests ./pkg ./coverage.html

Then you can run make unit-test; the coverage report will be generated and placed outside of the build container so you can analyze it.

Contributor Author


Thanks Moritz, I will leave this open. I intentionally kept it simple and didn't want to overcrowd this change, as it already contains too many different things. Augmenting the pipeline to output coverage properly is work on its own. Also, I think there were badges to show coverage on the front page, etc.

We can open a separate issue for this in this repo, I think? WDYT?

Member


That's completely fair, in fact I think it would be better to have a separate PR for that. I just assumed that you added the -cover flag to get coverage data.

Feel free to tag me as reviewer when you open that PR.

Member

@WebberHuang1118 WebberHuang1118 left a comment


Please review the discussion related to the Longhorn backing image, thanks.

Replacing validation of blockdevice deletion from virtualmachineimage to
backingimage. The logic rejects disk removals in case there are
backingimages attached to the blockdevice in a non-ready state. We index
the caches for both Longhorn nodes and backingimages to reach all
backingimages related to blockdevices through the Longhorn nodes.

Additional changes
* Replaced fakes and testing for the new objects
* Fixed some RBAC permissions for the new objects
* Registered the Longhorn factory, as the caches didn't actually return
anything
* Removed unused objects and prints
* Fixed selection based on Spec.Tags to Status.Tags, as in a real
environment tags are stored in the status, not in the spec

Signed-off-by: Martin Dekov <martin.dekov@suse.com>
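
As a rough illustration of that indexing step, a sketch along the following lines would map each disk UUID found in a backing image's status to that backing image, so the validator can look up only the backing images with a copy on the disk being removed. The index name, the stand-in cache interface, and the helper shape are assumptions, not the exact code in this commit; lhv1 refers to the Longhorn v1beta2 API types, matching the alias used in the quoted diff below.

// Sketch only: biCache stands in for the wrangler-generated BackingImage
// cache; just the two methods the sketch needs are modeled.
type biCache interface {
	AddIndexer(indexName string, indexer func(*lhv1.BackingImage) ([]string, error))
	GetByIndex(indexName, key string) ([]*lhv1.BackingImage, error)
}

// Hypothetical index name used only for this sketch.
const backingImageByDiskUUID = "ndm.harvesterhci.io/backing-image-by-disk-uuid"

func registerBackingImageIndexer(cache biCache) {
	cache.AddIndexer(backingImageByDiskUUID, func(bi *lhv1.BackingImage) ([]string, error) {
		// A backing image keeps a per-disk file status keyed by disk UUID,
		// so every key is a disk that holds (or held) a copy of the image.
		uuids := make([]string, 0, len(bi.Status.DiskFileStatusMap))
		for diskUUID := range bi.Status.DiskFileStatusMap {
			uuids = append(uuids, diskUUID)
		}
		return uuids, nil
	})
}

// In the validator: resolve the blockdevice to its Longhorn disk UUID via the
// Longhorn node, then fetch only the backing images that have a copy there.
func backingImagesOnDisk(cache biCache, diskUUID string) ([]*lhv1.BackingImage, error) {
	return cache.GetByIndex(backingImageByDiskUUID, diskUUID)
}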
Member

@WebberHuang1118 WebberHuang1118 left a comment


Some of the logic for checking backing images seems incorrect to me; please see my comment.

I also suggest reviewing the Backing Image LEP for the relevant data structures.

Thanks

Comment on lines +237 to +247
var failedBackingImages []string
for _, backingImage := range backingImages {
	if backingImage.Status.DiskFileStatusMap[uuid].State != lhv1.BackingImageStateReady {
		failedBackingImages = append(failedBackingImages, backingImage.Name)
	}
}
if len(failedBackingImages) > 0 {
	failedBackingImageList := strings.Join(failedBackingImages, ",")
	return fmt.Errorf("the following backingimages: %v attached to blockdevice: %v are in a %s state; make sure state is fixed before disk deletion",
		failedBackingImageList, old.Name, lhv1.BackingImageStateFailed)
}
Member


If I understand correctly, the current logic is: for the disk that’s about to be removed, check whether any backing image has a copy on that disk, and whether the copy is not in the ready state.

However, what we actually need to check is: for the disk that’s about to be removed, determine if it holds the only valid copy of certain backing images.
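
For illustration, a check along the lines described here might look like the sketch below: count ready copies across all disks in Status.DiskFileStatusMap and reject only when the disk being removed holds the last ready copy. The helper name is hypothetical and this is not the code in the PR.

// Sketch of the suggested check; lhv1 is the Longhorn v1beta2 API package, as
// in the diff above. Returns true when the given disk holds the only ready
// copy of the backing image, i.e. removing it would lose the image.
func holdsOnlyReadyCopy(bi *lhv1.BackingImage, diskUUID string) bool {
	status, ok := bi.Status.DiskFileStatusMap[diskUUID]
	if !ok || status.State != lhv1.BackingImageStateReady {
		// This disk has no copy, or its copy is not ready anyway.
		return false
	}
	for uuid, s := range bi.Status.DiskFileStatusMap {
		if uuid != diskUUID && s.State == lhv1.BackingImageStateReady {
			// Another disk still has a ready copy, so removal is safe.
			return false
		}
	}
	return true
}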

Contributor Author


I think that this:

determine if it holds the only valid copy of certain backing images.

is part of the UI warning change in that issue: harvester/harvester#9218. There is no PR for it yet.

The current PR covers the case where we have a single Harvester node and bad volumes/backingimages, and all storage classes have replica 1 because Harvester consists of a single node, covering this requirement by @Vicente-Cheng:

  • block by webhook - number of replica = 1

IMO, if we have a single node, whatever you do those disks will effectively be attached to that single node; the moment you add more nodes, we don't reach this logic here at all. As per my current understanding, in multi-node setups replication of the data becomes more than 1 and we then go to the warning before deletion rather than rejecting it through the webhook, but this is to be discussed I think.

Member

@WebberHuang1118 WebberHuang1118 Oct 3, 2025


Why does this PR only target a single-node environment? If I understand correctly, the original issue #3344 should also cover multi-node environments.

As I see it, this PR is attempting to address #3344, so this PR should not be limited to single-node setups. From my perspective, #3344 and #9218 appear to be duplicates.

Regarding:

The current PR refers to the case where we have a single harvester node and we have bad volumes/backingimages. And all storage classes have replica 1 when harvester consists of a single node covering this requirement by @Vicente-Cheng:

block by webhook - number of replica = 1

Even in a single-node environment, users can attach multiple disks so that a storageClass targets more than one replica.

If I understand correctly, what @Vicente-Cheng means is that the webhook should block disk removal if a volume has only one healthy replica on the disk that’s about to be deleted. We should trace this through the route volume → engine → replicas, rather than relying on the selector, since the selector is part of the volume spec (the desired state) and not the actual state we need to verify.

The same logic should also apply to backing images.
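
As a sketch of that volume → engine → replicas trace (not code from this PR; the field names are taken from the Longhorn v1beta2 API and the function shape is an assumption): a volume's engine reports the runtime mode of each replica, and a replica records which disk it lives on, so the check becomes whether the only RW replica sits on the disk being removed.

// Sketch only; engine and replicas are assumed to already be filtered to the
// volume being checked. Engine.Status.ReplicaModeMap maps replica name to its
// runtime mode ("RW" = healthy), and Replica.Spec.DiskID records the disk UUID
// the replica is placed on.
func onlyHealthyReplicaOnDisk(engine *lhv1.Engine, replicas []*lhv1.Replica, diskUUID string) bool {
	healthy := 0
	healthyOnDisk := false
	for _, r := range replicas {
		if engine.Status.ReplicaModeMap[r.Name] != lhv1.ReplicaModeRW {
			continue
		}
		healthy++
		if r.Spec.DiskID == diskUUID {
			healthyOnDisk = true
		}
	}
	return healthy == 1 && healthyOnDisk
}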

Contributor Author


I captured our offline discussions in a comment on the issue:
harvester/harvester#3344 (comment)

That is:

  • Prevent the user, through the webhook, from deleting a disk on the Hosts page in a single-node Harvester in case a volume is in a Degraded state and is attached to the disk through a storage class
  • In all cases, warn the user that they should be cautious when deleting a disk from the Hosts page in Harvester and evict it first - first click warns, second click lets it pass

So the implementation reflects that
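
The single-node gate from the first bullet would amount to something like the following sketch; the nodeCache field name is an assumption, while utils.LonghornSystemNamespaceName is the constant already used in the diff below.

// Sketch: apply the strict webhook rejection only when the cluster has a
// single Longhorn node; multi-node clusters fall back to the UI warning flow.
func (v *Validator) isSingleNode() (bool, error) {
	nodes, err := v.nodeCache.List(utils.LonghornSystemNamespaceName, labels.Everything())
	if err != nil {
		return false, err
	}
	return len(nodes) == 1, nil
}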

Comment on lines +176 to +218
func (v *Validator) validateDegradedVolumes(old *diskv1.BlockDevice) error {
	volumeList, err := v.volumeCache.List(utils.LonghornSystemNamespaceName, labels.Everything())
	if err != nil {
		return err
	}
	if len(volumeList) == 0 {
		return nil
	}
	degradedVolumes := []string{}
	for _, vol := range volumeList {
		if vol.Status.Robustness == lhv1.VolumeRobustnessDegraded {
			degradedVolumes = append(degradedVolumes, vol.Name)
		}
	}
	if len(degradedVolumes) == 0 {
		return nil
	}
	selectorDegradedVol := make(map[string][]string)
	for _, name := range degradedVolumes {
		pv, err := v.pvCache.Get(name)
		if err != nil {
			return err
		}
		diskSelector := ""
		if pv.Spec.CSI != nil {
			diskSelector = pv.Spec.CSI.VolumeAttributes[utils.DiskSelectorKey]
		}
		if len(diskSelector) != 0 {
			selectorDegradedVol[diskSelector] = append(selectorDegradedVol[diskSelector], pv.Name)
		}
	}
	degradedVolString := ""
	for _, diskTag := range old.Status.Tags {
		if val, ok := selectorDegradedVol[diskTag]; ok {
			degradedVolString += fmt.Sprintf(" %s: %v", diskTag, val)
		}
	}
	if len(degradedVolString) > 0 {
		return fmt.Errorf("the following tags with volumes:%s attached to disk: %s are in degraded state; evict disk before proceeding",
			degradedVolString, old.Spec.DevPath)
	}
	return nil
}
Member


May I know why we need to care about the selector? If I understand correctly, the only check required is whether the disk slated for removal contains the sole valid replica of any volume.

Contributor Author

@martindekov martindekov Oct 3, 2025


The selector is how we trace which volumes are attached to the disk we are currently trying to remove, so:

  1. We tag a disk
  2. We use that tag to create a storage class
  3. We use that storage class to create volumes/VMs

This tag is used as a selector across those components. That way, if I delete the disk, I can trace back the volumes created through it via the disk -> tag -> storage class -> volumes chain.

Contributor Author


Also, both this logic and the backingimages logic above are triggered only on single-node setups, so replication can only be 1 in that case.

Member


Even in a single-node environment, users can attach multiple disks so that a storageClass targets more than one replica, so we cannot make that assumption.

@martindekov
Contributor Author

@WebberHuang1118 thanks for pointing out the LEP, I will definitely go through it!

Generally - the current change handles bad backingimages and volumes when Harvester is a single node, meaning replication is 1 as well. Any bad backingimage or volume is effectively the last valid one in that single-node setup.

Indexing the backing image by disk UUID taken from
the status, not from the spec, as per feedback
from Webber

Signed-off-by: Martin Dekov <martin.dekov@suse.com>
@martindekov martindekov requested a review from ihcsim October 3, 2025 12:38
@WebberHuang1118
Member

Hi @martindekov

The original issue #3344 should also cover multi-node environments.

As I see it, this PR is attempting to address #3344, so the PR shouldn’t be limited to single-node setups. From my perspective, #3344 and #9218 appear to be duplicates.

IMO, we should design a unified solution that applies to both single-node and multi-node environments in this PR. There’s no need to separate it into another issue (#9218) and another PR.

@martindekov
Contributor Author

martindekov commented Oct 6, 2025

To reiterate, generally we agreed that the solution for all node setups is a warning, not webhook rejection. Webhook rejection is for the corner case where we cannot really have replicated storage even if we set the affinity rule, as per the comment:
harvester/harvester#3344 (comment)

The issue harvester/harvester#9218 refers to the Warning/UI work.
The issue harvester/harvester#3344 was referring to both, but now refers only to the webhook/backend work - this PR - as it took more time than I anticipated.

So the work between the issues is split - one for the UI and one for the backend.

If we decide that all warning work should go into the webhook backend, leaving out the front-end warning aspect, we also need to find ways to force-delete disks in a bad state, as deletion would otherwise be rejected at the API level. That design would further extend the scope of the issue.

@WebberHuang1118
Member

To reiterate, generally we agreed that the solution for all node setups is a warning, not webhook rejection. Webhook rejection is for the corner case where we cannot really have replicated storage even if we set the affinity rule, as per the comment:
harvester/harvester#3344 (comment)

Not every detail becomes clear until the PR is created. IMO, for reviewers, changes or updates to the original design or proposal are sometimes necessary.

The issue harvester/harvester#9218 refers to the Warning/UI work.
The issue harvester/harvester#3344 was referring to both, but now refers only to the webhook/backend work - this PR - as it took more time than I anticipated.

So the work between the issues is split - harvester/harvester#9218 and one for harvester/harvester#3344.

I agree that #3344 targets the backend part only. However, based on my code reading, the current implementation only includes a webhook to reject disk removal for preventing data loss in a single-node cluster. This is not sufficient for #3344 for the following reasons:

  • Multi-node clusters: A volume may have only one healthy replica, and a backing image may have only one valid copy. Therefore, even in a multi-node cluster, a strict webhook block is still required to prevent data loss.
  • Single-node clusters: Multiple disks can be attached to a single node, so separating the logic between single-node and multi-node environments isn’t necessary and could make the code path more complex. IMO, we should have a unified mechanism that works for all cases.
  • PR design scope: If the PR sticks with the current design (comment), it won’t fully resolve the root issue #3344. Even if merged, the source issue likely won’t be marked as ready-for-testing, which may cause confusion for the QA team during verification.

TL;DR
I agree that #9218 should focus on GUI warning messages — specifically, when disk removal will not lead to data loss (i.e., all volumes still have healthy replicas and all backing images have ready copies even if the target disk is removed).

Meanwhile, #3344 should focus on the original context — if removing a disk would cause any volume to lose its only healthy replica or any backing image to lose the only ready copy, the webhook should strongly block the operation to prevent data loss, regardless of how many nodes are in the cluster.

If we continue with the current PR design, I believe the context and scope of #3344 should at least be reduced — for example, to [ENHANCEMENT] Delete disk operation should consider data availability for single-node Harvester. However, to be honest, that doesn’t make much sense from my perspective.

Ping @Vicente-Cheng @ihcsim for your thoughts on this topic.

@martindekov
Contributor Author

Closing and giving priority to the multi-node implementation here: #248

Won't update this PR as it has become overcrowded with comments and hard to review
