Validate disk removals #248
Conversation
WebberHuang1118 left a comment:
@martindekov Please check the comments, thanks.
Force-pushed from f12a8ff to 2937bd6
Validating disk removals in case the disk contains a volume's last healthy replica. A healthy replica is determined by the FailedAt field: in case this field is empty, the replica is deemed healthy. Rejection also happens in case the disk hosts the last usable backing image. A "usable backing image" is one which is in the ready state. Signed-off-by: Martin Dekov <martin.dekov@suse.com>
Adding unit tests for the blockdevice validations' Update method. The change requires a very basic fake implementation of the cache objects used. The pipeline also gets an additional step and a very basic go test which can be further optimized to output the coverage to a file. Signed-off-by: Martin Dekov <martin.dekov@suse.com>
Refactoring code so the logic in:
* validateLHDisk - is a series of checks which return early in case of no matches, to reduce the previous complex if statement
* validateVolumes - extracted the diskUUID retrieval into a standalone function to be reused, and extracted each general step into standalone methods
* validateBackingImages - reused the diskUUID from the volumes and extracted the logic for determining whether it's safe to delete a backing image into a standalone method
Signed-off-by: Martin Dekov <martin.dekov@suse.com>
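For illustration only, a rough Go sketch of the control flow this refactor describes. The disk type and the empty check bodies are simplified stand-ins, not the actual node-disk-manager/Longhorn code.

```go
// NOTE: illustrative sketch only. The disk type and the empty check bodies
// are stand-ins for the real node-disk-manager/Longhorn objects.
package main

import "fmt"

// disk is a simplified stand-in for the Longhorn disk entry being removed.
type disk struct {
	name     string
	diskUUID string
}

// validateLHDisk is a series of early-return checks instead of one large
// if statement, as described in the commit message.
func validateLHDisk(d disk) error {
	diskUUID, err := getDiskUUID(d)
	if err != nil {
		return err
	}
	if err := validateVolumes(diskUUID); err != nil {
		return err
	}
	// The diskUUID resolved for the volume checks is reused here.
	return validateBackingImages(diskUUID)
}

// getDiskUUID is the extracted diskUUID retrieval shared by both checks.
func getDiskUUID(d disk) (string, error) {
	if d.diskUUID == "" {
		return "", fmt.Errorf("disk %s is not registered in the Longhorn node", d.name)
	}
	return d.diskUUID, nil
}

// validateVolumes would reject the removal when the disk holds the last
// healthy replica of any volume.
func validateVolumes(diskUUID string) error { return nil }

// validateBackingImages would reject the removal when the disk holds the
// last usable (ready) backing image copy.
func validateBackingImages(diskUUID string) error { return nil }

func main() {
	fmt.Println(validateLHDisk(disk{name: "sdb", diskUUID: "uuid-123"}))
}
```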
Force-pushed from 2937bd6 to 67a8c68
Including the HealthyAt field in addition to the FailedAt field when deciding whether a replica is failed or healthy. Updating the tests as well when constructing replicas. The condition follows the established approach of the longhorn team, which uses the same condition: https://github.com/longhorn/longhorn-manager/blob/master/datastore/longhorn.go#L2172-L2184 https://github.com/longhorn/longhorn-manager/blob/master/controller/volume_controller.go#L935-L951 Signed-off-by: Martin Dekov <martin.dekov@suse.com>
Force-pushed from 67a8c68 to a477738
Vicente-Cheng left a comment:
overall lgtm, just some questions.
Addressing feedback from Vicente, including the following:
* renamed getDiskUUID to validateDiskInNode and call it directly in validateLHDisk to avoid duplication
* renamed some functions and variables to reduce length, due to the new functions for counting backing images and healthy replicas
* added trailing spaces which were previously removed
* prepended longhorn-related object keys with lh, as longhorn has a lot of objects in common with k8s, like volumes/replicas/nodes, which can be confused
Signed-off-by: Martin Dekov <martin.dekov@suse.com>
Signed-off-by: Martin Dekov <martin.dekov@suse.com>
Renaming the pipeline integration image build step from: "Build the Image for the Integration Test" to: "Build Integration Test Image and run Unit Tests" to make it obvious when the unit tests are being run, as we don't mention this anywhere explicitly. Due to this, also removed the `unit-test` script and step. Signed-off-by: Martin Dekov <martin.dekov@suse.com>
Vicente-Cheng left a comment:
LGTM, please remember to squash after all reviews are complete.
LGTM, just a nit, the PR is great, thanks.
Hi @martindekov,
Could we say that this enhancement is ONLY for the "single replica" storage class scenarios? By the way, for a single-replica but Detached volume, what is the expected behavior?
Hey @albinsun, this might sound misleading if read in isolation. On the question:
We don't evaluate the volume status when we determine whether it's OK to remove a disk or not. We evaluate the replicas of the volume. If there is only 1 healthy replica of a volume and it's on the disk the user is trying to remove, we reject the disk removal operation. So, in short, this covers all scenarios, whether the volume has one or more replicas.
My understanding is that in order to get to a detached volume (I assume you mean detached from a virtual machine), it would've been attached to it at some point and replicated; in that case it would still be on a disk, so depending on the state of its replicas we reject or accept the disk's removal. We are just cautious not to lose the last healthy replica of a volume or backing image unintentionally, so in case we decide to recover it we have a healthy place from which we can start. I think the test plan shows the intent better.
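To make the reasoning above concrete, here is a minimal Go sketch of the per-volume check: count healthy replicas for each volume and reject the removal when the only healthy replica lives on the disk being removed. The replica struct and field names are simplified stand-ins for the Longhorn CRD types; the healthy() predicate mirrors the FailedAt/HealthyAt condition referenced in the commits.

```go
// NOTE: simplified sketch; types and fields are stand-ins for the Longhorn CRDs.
package main

import "fmt"

type replica struct {
	volumeName string
	diskUUID   string
	failedAt   string
	healthyAt  string
}

// healthy mirrors the Longhorn condition: never failed and already healthy.
func healthy(r replica) bool {
	return r.failedAt == "" && r.healthyAt != ""
}

// rejectRemoval returns an error when, for some volume, the disk being
// removed hosts its only healthy replica.
func rejectRemoval(diskUUID string, replicas []replica) error {
	healthyPerVolume := map[string]int{}
	onDiskPerVolume := map[string]bool{}
	for _, r := range replicas {
		if !healthy(r) {
			continue
		}
		healthyPerVolume[r.volumeName]++
		if r.diskUUID == diskUUID {
			onDiskPerVolume[r.volumeName] = true
		}
	}
	for vol, count := range healthyPerVolume {
		if count == 1 && onDiskPerVolume[vol] {
			return fmt.Errorf("disk %s holds the last healthy replica of volume %s", diskUUID, vol)
		}
	}
	return nil
}

func main() {
	replicas := []replica{
		{volumeName: "pvc-1", diskUUID: "disk-a", failedAt: "", healthyAt: "2024-01-01"},
		{volumeName: "pvc-1", diskUUID: "disk-b", failedAt: "2024-01-02", healthyAt: "2024-01-01"},
	}
	// pvc-1 has a single healthy replica and it lives on disk-a, so removal is rejected.
	fmt.Println(rejectRemoval("disk-a", replicas))
}
```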
Validating disk removals in case the disk contains a volume's last healthy
replica. A healthy replica is determined by the FailedAt and HealthyAt fields: if FailedAt is empty and HealthyAt is not empty, the replica is deemed healthy, following examples directly from the longhorn project:
https://github.com/longhorn/longhorn-manager/blob/master/datastore/longhorn.go#L2172-L2184
https://github.com/longhorn/longhorn-manager/blob/master/controller/volume_controller.go#L935-L951
Rejection also happens in case the disk hosts the last
usable backing image. A "usable backing image"
is one which is in the ready state.
The change also renames the pipeline step:
"Build the Image for the Integration Test"
to:
"Build Integration Test Image and run Unit Tests"
in order to make it more obvious where we run
the unit tests in the pipeline as the scripts are
called internally.
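As a companion to the conditions described above, this is a hedged sketch of the backing image side of the check: a copy counts as usable only when it is in the ready state, and removal is rejected when the disk holds the only ready copy of some backing image. The backingImage type and its diskStateMap field are simplified stand-ins, not the real Longhorn BackingImage status.

```go
// NOTE: simplified sketch; backingImage and diskStateMap are stand-ins for
// the Longhorn BackingImage status, which tracks a per-disk copy state.
package main

import "fmt"

type backingImage struct {
	name string
	// diskStateMap maps diskUUID -> copy state ("ready", "failed", ...).
	diskStateMap map[string]string
}

// validateBackingImages rejects the removal when the disk holds the only
// ready copy of some backing image.
func validateBackingImages(diskUUID string, images []backingImage) error {
	for _, bi := range images {
		ready := 0
		onDisk := false
		for uuid, state := range bi.diskStateMap {
			if state != "ready" {
				continue
			}
			ready++
			if uuid == diskUUID {
				onDisk = true
			}
		}
		if ready == 1 && onDisk {
			return fmt.Errorf("disk %s holds the last usable copy of backing image %s", diskUUID, bi.name)
		}
	}
	return nil
}

func main() {
	images := []backingImage{
		{name: "ubuntu-image", diskStateMap: map[string]string{"disk-a": "ready", "disk-b": "failed"}},
	}
	// The only ready copy is on disk-a, so removing disk-a is rejected.
	fmt.Println(validateBackingImages("disk-a", images))
}
```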
Problem:
If a disk contains the last healthy replica of a volume or the last valid backing image, we should reject removal of the disk as we might lose data.
Solution:
Based on the condition above, add webhook validation for blockdevice where, on attempted removal, we make sure the disk being removed does not contain the last healthy replica of any volume or the last usable backing image.
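A very rough sketch of how the webhook Update validation could be wired, assuming removal shows up as a spec change on the BlockDevice (modeled here as a provisioned flag being cleared). The blockDevice type, the flag, and the validator wiring are hypothetical stand-ins, not the actual node-disk-manager or harvester webhook API.

```go
// NOTE: hypothetical sketch. blockDevice, the provisioned flag, and the
// validator wiring are stand-ins, not the real node-disk-manager API.
package main

import "fmt"

type blockDevice struct {
	name        string
	provisioned bool // assumed flag modeling "this disk is provisioned for Longhorn"
}

type validator struct {
	// In the real webhook this would consult the Longhorn replica and
	// backing image caches; here it is injected for the sketch.
	checkLonghornUsage func(deviceName string) error
}

// Update rejects the request only when the change actually removes the disk,
// i.e. the provisioned flag flips from true to false.
func (v *validator) Update(oldBD, newBD blockDevice) error {
	if !(oldBD.provisioned && !newBD.provisioned) {
		return nil // not a removal, nothing to validate
	}
	return v.checkLonghornUsage(newBD.name)
}

func main() {
	v := &validator{checkLonghornUsage: func(name string) error {
		return fmt.Errorf("disk %s still holds the last healthy replica of a volume", name)
	}}
	before := blockDevice{name: "sdb", provisioned: true}
	after := blockDevice{name: "sdb", provisioned: false}
	fmt.Println(v.Update(before, after)) // removal attempt, so it is rejected
}
```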
Related Issue:
harvester/harvester#3344
Original PR with single node validation: #244
Test plan:
Added unit tests in the second commit
Negative scenarios
Reject volume removal:
newtag for example (make sure to add unique tags for each node)
new (same as disk tag above)
Replicate Backing Image removal rejection:
Positive scenarios
Follow the negative scenarios from above, but when creating the storage class, match the replica count to the number of nodes in the harvester cluster. That way both backing images and volumes should have multiple valid replicas and deletion of the disk will be allowed.
Removing disk with last healthy volume: webhook UI result
Removing disk with last healthy backing image: webhook UI result