Conversation
Force-pushed from e328e57 to b81c949
Codecov Report
@@            Coverage Diff            @@
##           master     #165     +/-   ##
=========================================
- Coverage   53.93%    53.9%   -0.04%
=========================================
  Files          55       55
  Lines        8452     8608     +156
=========================================
+ Hits         4559     4640      +81
- Misses       3893     3968      +75

Continue to review full report at Codecov.
Force-pushed from 1422b80 to 3605c92
Causes an unnecessary exception
Force-pushed from 3605c92 to 31d5624
strs = []
for vm in sorted(domains):
    self._vm_line_strs(strs, vm)
return ''.join(strs)
Is list.append really faster than str.format?
The idea is to avoid doing any kind of string processing until the final join call, which should be the fastest since it should just allocate a single string in join and copy them all in, avoiding any intermediate allocations, partial concatenations, format string parsing, etc.
It's slightly uglier, but not by much; it looks like writing to an output stream (which would be even faster, but not really worth the effort).
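For illustration, a minimal sketch of that pattern (the field names and output format here are made up, not the actual admin API output):

```python
def _vm_line_strs(strs, vm):
    # each append just stores a reference; no intermediate strings are built
    strs.append(vm.name)
    strs.append(' class=')
    strs.append(type(vm).__name__)
    strs.append('\n')

def dump(domains):
    strs = []
    for vm in sorted(domains):
        _vm_line_strs(strs, vm)
    # the single allocation and copy happen here, in join
    return ''.join(strs)
```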
marmarek
left a comment
Some of those commits are good as-is, some require changes. We'll think about property.GetAll, so if you think the other changes are important improvements anyway, splitting them into a separate PR may be a good idea.
return self._property_get_all(self.app)

property_def = dest.property_get_def(self.arg)
# pylint: disable=no-self-use
qubes/app.py
Outdated
    self._conn.registerCloseCallback(self._close_callback, None)
    break
except libvirt.libvirtError:
    subprocess.call(["systemctl", "start", "libvirtd"])
This is a bad idea. It makes it impossible to stop the libvirtd service - which the admin should be allowed to do. I guess it will also lead to some corner cases during restart and system shutdown.
The libvirtd service has automatic restart on crash configured, so a crash shouldn't be an issue in that case.
I removed it, although I'm not sure which way is best. The problem with not doing this is that qubesd will just freeze forever while trying to reconnect if libvirtd is stopped, which also doesn't seem great.
Maybe add some timeout (5s?) and throw an exception?
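For example, a bounded retry could look roughly like this (a sketch only; the URI, timeout, and poll interval are assumptions, not the actual qubesd code):

```python
import time
import libvirt

def connect_with_timeout(uri='xen:///system', timeout=5.0, delay=0.5):
    # retry for up to `timeout` seconds, then propagate the error
    # instead of blocking qubesd forever
    deadline = time.monotonic() + timeout
    while True:
        try:
            return libvirt.open(uri)
        except libvirt.libvirtError:
            if time.monotonic() >= deadline:
                raise
            time.sleep(delay)
```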
'property {!r} not applicable to {!r}'.format(
    name, self.__class__.__name__))

@classmethod
What about @functools.lru_cache() instead?
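For reference, a hedged sketch of what that could look like (the class and the computed value are hypothetical stand-ins, not the code in the diff):

```python
import functools

class Example:
    # hypothetical stand-in for the class in the diff
    @classmethod
    @functools.lru_cache()
    def cached_name(cls):
        # the first call per class does the work; later calls hit the cache,
        # replacing the manual cls._name caching
        return cls.__name__.lower()
```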
self._stubdom_xid = int(stubdom_xid_str)

if self.is_halted() and not was_halted:
    self.fire_event('domain-shutdown')
See #159 for related changes. Especially see the description of the race condition fixed there.
This prevents the "double shutdown event" as well. I think just taking the startup lock while firing the event could fix the synchronization issue?
It will not prevent handling domain-shutdown after the VM was started again, namely this sequence:
- VM shutdown
- vm.start()
- vm.fire_event('domain-shutdown')
You don't have any guarantee about the order of input processing (request on qubesd.sock, event on the libvirt socket). And additionally, you can't take a lock directly in the libvirt event handler, because it isn't a coroutine.
So I think this needs to be thought about and done more carefully: probably the simplest way is having non-coroutines schedule coroutines, and then pretty much serializing all methods that change the VM in any way, possibly including libvirt state updates, by taking the startup_lock (which should just be renamed to "lock") and updating the libvirt state under the lock if necessary.
We could also consider deleting the libvirt domain for VMs that are shut down, to prevent external tools from starting them while we finalize storage, along with other such robustness measures against raw use of the libvirt/xen APIs.
I think neither the current code nor this patch does this correctly, and some more work is needed.
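A rough sketch of that direction, assuming the libvirt callback runs inside the asyncio event loop (all names here are hypothetical, not the actual qubesd code):

```python
import asyncio

class DomainMonitor:
    def __init__(self, loop):
        self.loop = loop
        self.lock = asyncio.Lock()  # the renamed startup_lock

    def lifecycle_callback(self, conn, dom, event, detail, opaque):
        # libvirt callbacks are plain functions, so only schedule a coroutine
        # here; if the callback could run in another thread, use
        # asyncio.run_coroutine_threadsafe() instead
        self.loop.create_task(self._handle_lifecycle(dom, event))

    async def _handle_lifecycle(self, dom, event):
        async with self.lock:
            # update cached libvirt state and fire events under the lock,
            # so a concurrent start()/shutdown() cannot interleave
            pass

    async def start(self, vm):
        async with self.lock:
            pass  # start the VM, updating cached state under the same lock
```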
raise

assert False
state = self._libvirt_state
Changes here are quite intrusive - especially they rely more on libvirt events being really delivered (which is fragile...). What about caching the libvirt state just in this function, instead of calling state() in each if statement? That should already be a big improvement.
The idea is to never call libvirtd for any read-only admin API call except for CPU/memory stats (and internal inspection).
It should theoretically not be fragile, because it updates the state on every connection to libvirtd and on every lifecycle event received, and if the libvirt connection dies, the close callback should be called, which triggers a reconnect, which triggers a state update.
Of course, it's possible that the implementation is buggy.
Without this, doing qvm-ls requires at least one call to libvirtd per VM, which is probably going to take hundreds of milliseconds, and any random is_running or similar call within qubes is going to cause I/O, which can also cause pervasive slowdowns.
Plus we kind of want to know what the VMs are doing anyway to trigger events and perform appropriate actions, and if that can be done accurately, then the state can be accurately stored as well.
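As a hedged illustration of that mechanism (not the actual qubesd implementation; qubesd integrates libvirt events with asyncio rather than using the default event loop):

```python
import libvirt

class StateCache:
    """Sketch: keep the last known libvirt state in memory and refresh it
    only on (re)connect and on lifecycle events, so read-only calls like
    is_running() never hit libvirtd."""

    def __init__(self, uri='xen:///system'):
        self.uri = uri
        self._state = {}  # domain name -> (state, reason)
        self._conn = None
        # an event implementation must be registered before opening the
        # connection (qubesd would use its asyncio integration instead)
        libvirt.virEventRegisterDefaultImpl()
        self._connect()

    def _connect(self):
        self._conn = libvirt.open(self.uri)
        self._conn.registerCloseCallback(self._on_close, None)
        self._conn.domainEventRegisterAny(
            None, libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE,
            self._on_lifecycle, None)
        # refresh the whole cache on every (re)connection, so nothing is
        # missed while the connection was down
        for dom in self._conn.listAllDomains():
            self._state[dom.name()] = tuple(dom.state())

    def _on_lifecycle(self, conn, dom, event, detail, opaque):
        self._state[dom.name()] = tuple(dom.state())

    def _on_close(self, conn, reason, opaque):
        # connection died: reconnect, which also refreshes the cached state
        self._connect()

    def is_running(self, name):
        state, _reason = self._state.get(name, (libvirt.VIR_DOMAIN_NOSTATE, 0))
        return state == libvirt.VIR_DOMAIN_RUNNING
```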
qubes/vm/qubesvm.py
Outdated
elif not self.is_running():
    if not autostart:
        raise qubes.exc.QubesVMNotRunningError(self)
    yield from self.start(start_guid=gui)
Changed; this commit is quite unrelated to the rest anyway.
    self._vm_line_strs(strs, vm)
return ''.join(strs)

@qubes.api.method('admin.vm.GetAllData', no_payload=True,
Sorry, this is too much. Even if we agree on property.GetAll, getting all info about all the VMs in one call is beyond what we want in the Admin API.
pass
return cls._name

def _default_pool(app):
Force-pushed from 1c01393 to 1db1dd1
it's unused and has a netid property whose name differs from its key, which would cause issues in the next commit
currently it takes 100ms+ to determine the default pool every time, including subprocess spawning (!), which is unacceptable since finding out the default value of properties should be instantaneous; instead of checking every thin pool to see if the root device is in it, find and cache the name of the root device's thin pool and see if there is a configured pool with that name
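For illustration only, a sketch of that caching idea; the helper names, the findmnt/lvs invocations, and the fallback pool name are assumptions, not the actual implementation:

```python
import functools
import subprocess

@functools.lru_cache()
def _root_thin_pool():
    # run the external tools once per process lifetime and cache the answer
    try:
        root_dev = subprocess.check_output(
            ['findmnt', '-n', '-o', 'SOURCE', '/']).decode().strip()
        pool = subprocess.check_output(
            ['lvs', '--noheadings', '-o', 'pool_lv', root_dev]).decode().strip()
        return pool or None
    except (subprocess.CalledProcessError, FileNotFoundError):
        return None  # root is not on an LVM thin volume

def _default_pool(app):
    # look up the cached name among configured pools instead of probing
    # every thin pool with a subprocess call
    name = _root_thin_pool()
    if name and name in app.pools:
        return app.pools[name]
    return app.pools['default']  # hypothetical fallback pool name
```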
Otherwise qubesd will die if you try to break in the debugger
Force-pushed from 1db1dd1 to 27742bc
- connect in a loop, starting libvirtd
- reconnect on demand in a loop
- register events on reconnection
info already contains domain state, so just use it, making a single libvirtd call
This should prevent getting "qrexec not connected" due to trying to start a service in the middle of VM startup
This allows MUCH faster qvm-ls and qvm-prefs. The code is optimized to be as fast as possible:
- optimized property access
- gather strings in a list and join them only at the end
Even faster qvm-ls!
Force-pushed from 27742bc to 0175ad6
The property.GetAll idea has been implemented in #298. A few individual commits may be worth including too, but IMO the most important part already is.
This is the server part of a bunch of changes that get qvm-ls to around 200ms run time.
Warning: I haven't tested them much, and they change fundamental code, so they probably need fixes
The first major change is the introduction of APIs that allow getting all properties of a VM or of Qubes, and all properties of all VMs, in a single call.
The second major change is storing the libvirt state in qubesd (and changing it in response to events), so that libvirtd calls are not required to get the system state.
Finally, there are some optimizations and fixes.