[e2e][backend] Add test cases for restore backup with snapshot created #1384
base: main
Conversation
api_client.volumes.delete(vol_name)

@pytest.mark.dependency(depends=["TestBackupRestoreWithSnapshot::tests_backup_vm"], param=True)
def test_with_snapshot_restore_replace_retain_vols(
Can we parameterize delete_volumes so the Delete Previous Volumes option is tested with both Delete and Retain?
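A minimal sketch of what the suggested parameterization could look like; the `RestoreSpec.for_existing` helper and the fixture names are assumed to mirror the ones already used in test_4_vm_backup_restore.py and are illustrative, not a confirmed API:

```python
import pytest


# Illustrative only: run the replace-existing restore once with Delete and once
# with Retain. The fixtures and RestoreSpec helper are assumptions based on the
# existing tests in this file.
@pytest.mark.parametrize("delete_volumes", [True, False], ids=["delete", "retain"])
def test_with_snapshot_restore_replace(api_client, vm_checker, base_vm_with_data, delete_volumes):
    unique_vm_name = base_vm_with_data["name"]
    # (stopping the VM and creating the backup/snapshot beforehand is omitted)
    spec = api_client.backups.RestoreSpec.for_existing(delete_volumes=delete_volumes)
    code, data = api_client.backups.restore(unique_vm_name, spec)
    assert 201 == code, f"Failed to restore backup (delete_volumes={delete_volumes}): {data}"
    vm_getable, (code, data) = vm_checker.wait_getable(unique_vm_name)
    assert vm_getable, (code, data)
```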
It's a good idea to use a parameterized config to cover both the Delete and Retain test cases.
However, according to the test issue #1045, we only select "Restore backup to replace existing (retain volume)".
The reason is that when the VM also has a snapshot created, even if we shut down the VM, the backend checks and prevents restoring to replace the existing VM with volumes deleted.
Thus here we test replace-existing with retained volumes only.
The snapshot function is based on the volume, so deleting the volume not working while a snapshot exists should be the expected behavior; or maybe we can discuss the case in the sync-up meeting to double confirm it.
Thanks for the suggestion.
After moving the snapshot restore test cases into the original TestBackupRestore, all the test-generated volumes are automatically cleaned up when the tests complete successfully.
@TachunLin Please help check the error and make sure the tests run successfully in Jenkins.
Thanks for the reminder. I checked the test report; the cause is that we did not yet have the backup bucket created on our ECM lab MinIO artifact endpoint. After setting up the same configuration, the tests pass when I execute them locally, so I triggered a new test run.
I will continue to investigate what causes these tests to fail on the Jenkins run-test jobs while they work fine when triggered locally.
Hi @TachunLin,
If they PASS when triggered locally as you mentioned, then we should suspect it's an environment issue and investigate the ECM lab. CC @lanfon72
Those test cases are a known issue, tracked in harvester/harvester#4640.
Hi @TachunLin, just sent a quick fix PR; please refer to it.
Thank you @albinsun for finding the root cause and creating PR #1419 to fix the flaky case. I added the fix and triggered a new test on the main Jenkins VM; it resolves the flaky case well. I also triggered the entire test suite again.
lanfon72 left a comment
Please move those test cases into TestBackupRestore (at the end) to reduce test time. Since both depend on the same tests_backup_vm, I feel it is not necessary to create another class for it.
class TestBackupRestoreWithSnapshot:

    @pytest.mark.dependency()
    def test_connection(self, api_client, backup_config, config_backup_target):
We already tested the connection in the previous class; we don't need to duplicate the test case.
Thanks for the suggestion.
The reason for running the connection test again is to keep the full backup-and-restore flow self-contained inside the new test class TestBackupRestoreWithSnapshot.
According to the test result in the follow-up comment #1384 (comment), and considering the trade-off of pros and cons, may I keep these tests in the new test class? Many thanks.
        assert 200 == code, f'Failed to test backup target connection: {data}'

    @pytest.mark.dependency(depends=["TestBackupRestoreWithSnapshot::test_connection"], param=True)
    def tests_backup_vm(self, api_client, wait_timeout, backup_config, base_vm_with_data):
Same as above; it looks like you are going to reuse the test case?
Thanks for the suggestion.
The reason for running the backup test again is to provide a fresh backup for all the test cases inside the new test class TestBackupRestoreWithSnapshot.
According to the test result in the follow-up comment #1384 (comment), and considering the trade-off of pros and cons, may I keep these tests in the new test class? Many thanks.
If you want to decouple the new test cases without reusing the existing ones, you can just add another fixture to spin up the VM and do the backup (as it is the prerequisite of the new test cases).
The problem is that if you add those two duplicated test cases (identical to the previous class), the total number of test cases in the report no longer reflects what we actually implemented and covered.
Adding the new cases after the existing test cases would be easier than creating another fixture; that's why I suggested it.
@TachunLin, any update or comment?
The current point is that we should not have two identical tests:
- TestBackupRestore::tests_backup_vm
- TestBackupRestoreWithSnapshot::tests_backup_vm

I think one way is to make TestBackupRestoreWithSnapshot::tests_backup_vm a class fixture.
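A rough sketch of that class-fixture idea; the `api_client.vms.backup` call and the fixture names are assumed to mirror the existing tests_backup_vm and are not a confirmed API:

```python
import pytest


class TestBackupRestoreWithSnapshot:
    """Sketch only: take the backup once as a class-scoped fixture instead of
    duplicating TestBackupRestore::tests_backup_vm as a second reported test."""

    @pytest.fixture(scope="class")
    def vm_backup(self, api_client, wait_timeout, backup_config, base_vm_with_data):
        unique_vm_name = base_vm_with_data["name"]
        # Assumed to mirror the backup step in tests_backup_vm; the exact
        # api_client.vms.backup signature is not confirmed here.
        code, data = api_client.vms.backup(unique_vm_name, unique_vm_name)
        assert 204 == code, (code, data)
        # (waiting for the backup to become ready is omitted for brevity)
        yield unique_vm_name

    def test_with_snapshot_restore_replace_retain_vols(self, api_client, vm_backup):
        ...  # snapshot + restore steps would go here
```

This way the backup remains a prerequisite rather than a counted test case, so the report reflects only the tests that are actually new.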
@TachunLin,
As discussed, letting you know that /issues/1462 has been fixed, so there should be no more connection failures.
You can continue surveying the refactor for this task item, thanks.
Thanks @albinsun for fixing the connection-failure issue.
According to the discussion and based on the review comments, I think it would be better to move the new snapshot restore test cases right after the existing test cases in the same TestBackupRestore class. That would address the following concerns:
- No reuse or duplication of test_connection and tests_backup_vm
- Saves time by not running the take-backup action twice

I already made the fix and tested it working from local against the remote ECM lab machine.
Will try again on the main Jenkins jobs to double confirm.
Tested PASS on the main Jenkins Harvester run-test job #102.
All the new snapshot restore tests execute well inside the TestBackupRestore test class.
http://172.19.99.74/job/harvester-runtests/102/TestReport/
The S3-related tests are blocked from proceeding because the environment was broken by yesterday's VPN disconnection.
code, data = api_client.backups.restore(unique_vm_name, spec)
assert 201 == code, f'Failed to restore backup with current VM replaced, {data}'
vm_getable, (code, data) = vm_checker.wait_getable(unique_vm_name)
Why remove wait_getable?
Wouldn't it be possible to hit #1419 again?
Thanks for the check.
That is probably because I missed adding this check back during the rebase.
Just added the wait_getable action back in its original places.
code, data = api_client.vm_snapshots.get(vm_snapshot_name)
if data.get("status", {}).get("readyToUse"):
    break
print(f"waiting for {vm_snapshot_name} to be ready")
Drop temporary code.
Thanks for the check.
The purpose of the line checking that the vm_snapshot is ready was to ensure the snapshot is actually created before proceeding to the next step.
| print(f"waiting for {vm_snapshot_name} to be ready") | ||
| sleep(3) | ||
| else: | ||
| raise AssertionError(f"timed out waiting for {vm_snapshot_name} to be ready") |
Please include code and data in the error message like the other asserts.
Thanks for the suggestion.
Added the code and data to the error message like the other asserts.
raise AssertionError(f"timed out waiting for {vm_snapshot_name} to be ready")

assert 200 == code
assert data.get("status", {}).get("readyToUse") is True
Should these 2 lines be in the while loop?
Thanks for the suggestion.
Indeed these two lines can be moved into the previous while loop, since it already contains a check on data.get("status", {}).get("readyToUse"), making the check here redundant. Updated the code accordingly.
# Check VM Started then get IPs (vm and host)
vm_got_ips, (code, data) = vm_checker.wait_ip_addresses(unique_vm_name, ['default'])
assert vm_got_ips, (
    f"Failed to Start VM({unique_vm_name}) with errors:\n"
Do we have any restore or restart here?
If the intent is to check that the VM is still running and accessible, then perhaps alter the error message.
Thanks for the check.
The VM is not restarted before this; the check just makes sure the VM is running after taking the snapshot.
Just updated the assert error message accordingly.
code, data = api_client.vm_snapshots.get(vm_snapshot_name)
if data.get("status", {}).get("readyToUse"):
    break
print(f"waiting for {vm_snapshot_name} to be ready")
same as previous.
Thanks for the suggestion.
Added the code and data to the error message like the other asserts.
raise AssertionError(f"timed out waiting for {vm_snapshot_name} to be ready")

assert 200 == code
assert data.get("status", {}).get("readyToUse") is True
same as previous.
Thanks for the suggestion.
Indeed these two lines can be moved into the previous while loop, since it already contains a check on data.get("status", {}).get("readyToUse"), making the check here redundant.
Updated the code accordingly.
vm_got_ips, (code, data) = vm_checker.wait_ip_addresses(unique_vm_name, ['default'])
assert vm_got_ips, (
    f"Failed to Start VM({unique_vm_name}) with errors:\n"
    f"Status: {data.get('status')}\n"
same as previous.
Thanks for the check.
The VM is not restarted before this; the check just makes sure the VM is running after taking the snapshot.
Just updated the assert error message accordingly.
) as sh:
    out, err = sh.exec_command(f"echo {pub_key!r} > {base_vm_with_data['data']['path']}")
    assert not err, (out, err)
    sh.exec_command('sync')
Is the data-messing operation also needed for restore-new?
Since it restores into a new VM, the messed data will not be reverted on base_vm.
Which issue(s) this PR fixes:
Issue #1045
What this PR does / why we need it:
According to issue harvester/harvester#4954, we need to add a backend e2e test to the vm_backup_restore integration tests to cover the case where a VM has both a backup and a snapshot created on it and we then restore the VM from the backup.
The restore should complete successfully.
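For context, a hedged sketch of the core scenario; the `vm_snapshots.create` signature and the `RestoreSpec` helper are assumptions based on the calls shown in the diff excerpts above:

```python
# The VM already has a backup from tests_backup_vm. Create a VM snapshot, wait
# until it is readyToUse, then restore the backup over the existing VM with the
# previous volumes retained (delete_volumes=False), because deleting the
# volumes is rejected while a snapshot still references them.
code, data = api_client.vm_snapshots.create(unique_vm_name, vm_snapshot_name)  # assumed signature
assert 201 == code, (code, data)

# ... poll api_client.vm_snapshots.get(vm_snapshot_name) until readyToUse ...
# ... stop the VM before the replace-existing restore (omitted for brevity) ...

spec = api_client.backups.RestoreSpec.for_existing(delete_volumes=False)  # assumed helper
code, data = api_client.backups.restore(unique_vm_name, spec)
assert 201 == code, f'Failed to restore backup with current VM replaced, {data}'
vm_getable, (code, data) = vm_checker.wait_getable(unique_vm_name)
assert vm_getable, (code, data)
```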
Added the following test script:
Special notes for your reviewer:
Test result (triggered locally and executed against the remote ECM lab machine):
- Tests PASS for all test cases in the TestBackupRestoreWithSnapshot class.
- Tests PASS for most of the test cases in the test_4_vm_backup_restore.py file, including TestBackupRestore, TestBackupRestoreOnMigration, and TestMultipleBackupRestore.

Considering stability and future test scalability, I created a separate class for the new test scenario.