Conversation
Force-pushed from e328e57 to b81c949
Codecov Report
@@            Coverage Diff            @@
##           master     #165     +/-   ##
=========================================
- Coverage   53.93%    53.9%   -0.04%
=========================================
  Files          55       55
  Lines        8452     8608     +156
=========================================
+ Hits         4559     4640      +81
- Misses       3893     3968      +75

Continue to review full report at Codecov.
Force-pushed from 1422b80 to 3605c92
Causes an unnecessary exception
Force-pushed from 3605c92 to 31d5624
strs = []
for vm in sorted(domains):
    self._vm_line_strs(strs, vm)
return ''.join(strs)
Is list.append really faster than str.format?
The idea is to avoid doing any kind of string processing until the final join call, which should be the fastest since it should just allocate a single string in join and copy them all in, avoiding any intermediate allocations, partial concatenations, format string parsing, etc.
It's slightly uglier, but not by much; it looks like writing to an output stream (which would be even faster, but not really worth the effort).
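For illustration, a minimal sketch of that pattern (the field names and output format here are made up, not the actual admin API output):

```python
def _vm_line_strs(strs, vm):
    # each append just stores a reference; no intermediate strings are built
    strs.append(vm.name)
    strs.append(' class=')
    strs.append(type(vm).__name__)
    strs.append('\n')

def dump(domains):
    strs = []
    for vm in sorted(domains):
        _vm_line_strs(strs, vm)
    # the single allocation and copy happen here, in join
    return ''.join(strs)
```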
marmarek
left a comment
Some of those commits are good as-is, some require changes. We'll think about property.GetAll, so if you think the other changes are important improvements anyway, splitting them into a separate PR may be a good idea.
return self._property_get_all(self.app)

property_def = dest.property_get_def(self.arg)
# pylint: disable=no-self-use
qubes/app.py
Outdated
    self._conn.registerCloseCallback(self._close_callback, None)
    break
except libvirt.libvirtError:
    subprocess.call(["systemctl", "start", "libvirtd"])
This is a bad idea. It makes it impossible to stop the libvirtd service - which the admin should be allowed to do. I guess it will also lead to some corner cases during restart and system shutdown.
The libvirtd service has automatic restart on crash configured, so a crash shouldn't be an issue in that case.
I removed it, although I'm not sure which way is best. The problem with not doing this is that qubesd will just freeze forever while trying to reconnect if libvirtd is stopped, which also doesn't seem great.
Maybe add some timeout (5s?) and throw an exception?
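For example, a bounded retry could look roughly like this (a sketch only; the URI, timeout, and poll interval are assumptions, not the actual qubesd code):

```python
import time
import libvirt

def connect_with_timeout(uri='xen:///system', timeout=5.0, delay=0.5):
    # retry for up to `timeout` seconds, then propagate the error
    # instead of blocking qubesd forever
    deadline = time.monotonic() + timeout
    while True:
        try:
            return libvirt.open(uri)
        except libvirt.libvirtError:
            if time.monotonic() >= deadline:
                raise
            time.sleep(delay)
```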
'property {!r} not applicable to {!r}'.format(
    name, self.__class__.__name__))

@classmethod
What about @functools.lru_cache() instead?
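For reference, a hedged sketch of what that could look like (the class and the computed value are hypothetical stand-ins, not the code in the diff):

```python
import functools

class Example:
    # hypothetical stand-in for the class in the diff
    @classmethod
    @functools.lru_cache()
    def cached_name(cls):
        # the first call per class does the work; later calls hit the cache,
        # replacing the manual cls._name caching
        return cls.__name__.lower()
```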
self._stubdom_xid = int(stubdom_xid_str)

if self.is_halted() and not was_halted:
    self.fire_event('domain-shutdown')
See #159 for related changes. Especially see the description of the race condition fixed there.
This prevents the "double shutdown event" as well. I think just taking the startup lock while firing the event could fix the synchronization issue?
It will not prevent handling domain-shutdown after the VM was started again, namely this sequence:
- VM shutdown
- vm.start()
- vm.fire_event('domain-shutdown')
You don't have any guarantee about the order of input processing (request on qubesd.sock, event on the libvirt socket). And additionally, you can't take a lock directly in the libvirt event handler, because it isn't a coroutine.
So I think this needs to be thought about and done more carefully: probably the simplest way is having non-coroutines schedule coroutines, and then pretty much serializing all methods that change the VM in any way, possibly including libvirt state updates, by taking the startup_lock (which should just be renamed to "lock") and updating the libvirt state under the lock if necessary.
We could also consider deleting the libvirt domain for VMs that are shut down, to prevent external tools from starting them while we finalize storage, along with other such robustness measures against raw use of the libvirt/xen APIs.
I think neither the current code nor this patch does this correctly, and some more work is needed.
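A rough sketch of that direction, assuming the libvirt callback runs inside the asyncio event loop (all names here are hypothetical, not the actual qubesd code):

```python
import asyncio

class DomainMonitor:
    def __init__(self, loop):
        self.loop = loop
        self.lock = asyncio.Lock()  # the renamed startup_lock

    def lifecycle_callback(self, conn, dom, event, detail, opaque):
        # libvirt callbacks are plain functions, so only schedule a coroutine
        # here; if the callback could run in another thread, use
        # asyncio.run_coroutine_threadsafe() instead
        self.loop.create_task(self._handle_lifecycle(dom, event))

    async def _handle_lifecycle(self, dom, event):
        async with self.lock:
            # update cached libvirt state and fire events under the lock,
            # so a concurrent start()/shutdown() cannot interleave
            pass

    async def start(self, vm):
        async with self.lock:
            pass  # start the VM, updating cached state under the same lock
```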
raise

assert False
state = self._libvirt_state
Changes here are quite intrusive - especially they rely more on libvirt events being really delivered (which is fragile...). What about caching the libvirt state just in this function, instead of calling state() in each if statement? That should already be a big improvement.
The idea is to never call libvirtd for any read-only admin API call except for CPU/memory stats (and internal inspection).
It should theoretically not be fragile, because it updates the state on every connection to libvirtd and on every lifecycle event received, and if the libvirt connection dies, the close callback should be called, which triggers a reconnect, which triggers a state update.
Of course, it's possible that the implementation is buggy.
Without this, doing qvm-ls requires at least one call to libvirtd per VM, which is probably going to take hundreds of milliseconds, and any random is_running or similar call within qubes is going to cause I/O, which can also cause pervasive slowdowns.
Plus we kind of want to know what the VMs are doing anyway to trigger events and perform appropriate actions, and if that can be done accurately, then the state can be accurately stored as well.
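As a hedged illustration of that mechanism (not the actual qubesd implementation; qubesd integrates libvirt events with asyncio rather than using the default event loop):

```python
import libvirt

class StateCache:
    """Sketch: keep the last known libvirt state in memory and refresh it
    only on (re)connect and on lifecycle events, so read-only calls like
    is_running() never hit libvirtd."""

    def __init__(self, uri='xen:///system'):
        self.uri = uri
        self._state = {}  # domain name -> (state, reason)
        self._conn = None
        # an event implementation must be registered before opening the
        # connection (qubesd would use its asyncio integration instead)
        libvirt.virEventRegisterDefaultImpl()
        self._connect()

    def _connect(self):
        self._conn = libvirt.open(self.uri)
        self._conn.registerCloseCallback(self._on_close, None)
        self._conn.domainEventRegisterAny(
            None, libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE,
            self._on_lifecycle, None)
        # refresh the whole cache on every (re)connection, so nothing is
        # missed while the connection was down
        for dom in self._conn.listAllDomains():
            self._state[dom.name()] = tuple(dom.state())

    def _on_lifecycle(self, conn, dom, event, detail, opaque):
        self._state[dom.name()] = tuple(dom.state())

    def _on_close(self, conn, reason, opaque):
        # connection died: reconnect, which also refreshes the cached state
        self._connect()

    def is_running(self, name):
        state, _reason = self._state.get(name, (libvirt.VIR_DOMAIN_NOSTATE, 0))
        return state == libvirt.VIR_DOMAIN_RUNNING
```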
qubes/vm/qubesvm.py
Outdated
elif not self.is_running():
    if not autostart:
        raise qubes.exc.QubesVMNotRunningError(self)
    yield from self.start(start_guid=gui)
Changed; this commit is quite unrelated to the rest anyway.
    self._vm_line_strs(strs, vm)
return ''.join(strs)

@qubes.api.method('admin.vm.GetAllData', no_payload=True,
Sorry, this is too much. Even if we agree on property.GetAll, getting all info about all the VMs in one call is beyond what we want in the Admin API.
pass
return cls._name

def _default_pool(app):
Force-pushed from 1c01393 to 1db1dd1
it's unused and has a netid property whose name differs from its key, which would cause issues in the next commit
currently it takes 100ms+ to determine the default pool every time, including subprocess spawning (!), which is unacceptable since finding out the default value of properties should be instantaneous; instead of checking every thin pool to see if the root device is in it, find and cache the name of the root device's thin pool and see if there is a configured pool with that name
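For illustration only, a sketch of that caching idea; the helper names, the findmnt/lvs invocations, and the fallback pool name are assumptions, not the actual implementation:

```python
import functools
import subprocess

@functools.lru_cache()
def _root_thin_pool():
    # run the external tools once per process lifetime and cache the answer
    try:
        root_dev = subprocess.check_output(
            ['findmnt', '-n', '-o', 'SOURCE', '/']).decode().strip()
        pool = subprocess.check_output(
            ['lvs', '--noheadings', '-o', 'pool_lv', root_dev]).decode().strip()
        return pool or None
    except (subprocess.CalledProcessError, FileNotFoundError):
        return None  # root is not on an LVM thin volume

def _default_pool(app):
    # look up the cached name among configured pools instead of probing
    # every thin pool with a subprocess call
    name = _root_thin_pool()
    if name and name in app.pools:
        return app.pools[name]
    return app.pools['default']  # hypothetical fallback pool name
```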
Otherwise qubesd will die if you try to break in the debugger
Force-pushed from 1db1dd1 to 27742bc
- connect in a loop, starting libvirtd
- reconnect on demand in a loop
- register events on reconnection
info already contains domain state, so just use it, making a single libvirtd call
This should prevent getting "qrexec not connected" due to trying to start a service in the middle of VM startup
This allows MUCH faster qvm-ls and qvm-prefs. The code is optimized to be as fast as possible:
- optimized property access
- gather strings in a list and join them only at the end
Even faster qvm-ls!
Force-pushed from 27742bc to 0175ad6
The property.GetAll idea has been implemented in #298. A few individual commits may be worth including too, but IMO the most important part already is.
This is the server part of a bunch of changes that get qvm-ls to around 200ms run time.
Warning: I haven't tested them much, and they change fundamental code, so they probably need fixes
The first major change is the introduction of APIs that allow getting all properties of a VM or of Qubes, and all properties of all VMs, in a single call.
The second major change is storing the libvirt state in qubesd (and changing it in response to events), so that libvirtd calls are not required to get the system state.
Finally, there are some optimizations and fixes.