This repository was archived by the owner on Jan 27, 2019. It is now read-only.

@Villemoes
Contributor

This is mostly a proof-of-concept people can play with. The reason I'm not proposing this for immediate inclusion is that I'm not entirely convinced that we don't have some python function relying on being able to modify the task's metadata, with those changes then being visible to some postfunc or some entirely different task. So I wrote this in a way that doing each of the five tasks asynchronously is entirely opt-in and controlled by setting e.g. __ASYNC_SPLIT = True in local.conf.

I've tried an "oe bake world -t fetch" with empty ingredients. Without this, it takes 65 min, while with __ASYNC_FETCH = True it finished in 15 min (it takes that long partly because some ingredients currently fail to fetch, but the wget PR should hopefully fix that). But of course usually one has most ingredients, and even if not, the fetch time will be partly hidden behind the tasks that already run asynchronously.

@esben esben added the ready label Nov 15, 2016
@Villemoes Villemoes force-pushed the ravi/async_python branch 2 times, most recently from 1bed7a4 to d7993dc Compare December 5, 2016 12:55
Villemoes and others added 6 commits January 21, 2017 18:57
For now this is just a copy of the generic implementations in
OEliteFunction.

When we implement async python functions in terms of fork(), we could
just do the umask and chdir in the child, but since we only do some
python functions asynchronously, we'd have to duplicate the
try..finally stuff in the synchronous case, and that's not worth it
for saving four system calls in the parent.
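The trade-off above can be sketched as follows. This is only an illustration, not the code from this branch: `run_function`, its parameters and the exit-status convention are hypothetical names and choices made for the example. The umask and chdir are done in the parent inside a single try..finally, so the synchronous and the fork()-based paths share the same setup and cleanup.

```python
import os
import tempfile


def run_function(func, cwd, umask, run_async=False):
    """Hypothetical sketch: run func() with a given cwd and umask.

    Synchronous case: returns func()'s result.
    Asynchronous case: forks and returns the child's pid.
    """
    old_cwd = os.getcwd()
    old_umask = os.umask(umask)
    try:
        os.chdir(cwd)
        if not run_async:
            return func()
        pid = os.fork()
        if pid == 0:
            # Child: inherits cwd and umask from the parent's setup.
            status = 1
            try:
                status = 0 if func() else 1
            finally:
                # Leave without running the parent's cleanup handlers.
                os._exit(status)
        return pid
    finally:
        # Parent (and synchronous case): restore the original state.
        os.chdir(old_cwd)
        os.umask(old_umask)


def demo():
    """Run a trivial function asynchronously and collect its exit status."""
    start = os.getcwd()
    with tempfile.TemporaryDirectory() as workdir:
        pid = run_function(lambda: True, workdir, 0o022, run_async=True)
        _, status = os.waitpid(pid, 0)
    return os.getcwd() == start, os.WEXITSTATUS(status)
```

Doing the setup only in the child would save the restoring chdir and umask calls in the parent, but the synchronous path would still need its own try..finally, which is the duplication the commit message argues against.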
Well, sort of. For now we just implement retrieving
e.g. do_fetch[__async], but it has to be False.

I do get_flag with expand=CLEAN_EXPANSION to allow me to do something like

do_fetch[__async] = "${__ASYNC_FETCH}"

with the latter set in local.conf, or not set at all. The bool(int(
... or 0)) dance is so that all of unset, "", "0", "1", False, True
etc. do the expected thing. The double underscores are to ensure that
these variables and flags do not affect metadata hashes.
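As a minimal sketch of the bool(int(... or 0)) conversion described above: `parse_async_flag` is a hypothetical helper name used only for this illustration, and `value` stands in for whatever get_flag returns after expansion.

```python
def parse_async_flag(value):
    """Hypothetical helper: normalise a flag value to a bool.

    `value or 0` maps None, "" and False to 0 before int() is applied,
    so an unset or empty flag reads as False instead of raising.
    """
    return bool(int(value or 0))


# The values mentioned in the commit message.
for raw in (None, "", "0", "1", False, True):
    print(repr(raw), "->", parse_async_flag(raw))
```

Strings that int() cannot parse still raise ValueError in this sketch; it only covers the values listed above.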
The list of PythonFunction tasks includes at least:

  do_fetch
  do_unpack
  do_stage
  do_split
  do_package

These are all rather I/O bound, and do_fetch in particular is prone to
stall the entire build if it is trying to fetch from an unresponsive
or just slow server - worst case, with our default timeout settings,
we can end up waiting 10 minutes, during which we may completely fail
to start other tasks.

This implements support for making a particular python task function
run asynchronously; simply add e.g.

do_fetch[__async] = True

above the do_fetch definition in fetch.oeclass.

In order for a task function implemented as a PythonFunction to safely
run asynchronously, it must not rely on mutating state in the OE-lite
process. Checking that is a rather tedious and error-prone job, so
this is mostly an experimental feature for now.
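To illustrate why mutated state is a problem when the function runs in a forked child, here is a small standalone sketch. It does not use any OE-lite code; the `metadata` dict and the `EXAMPLE_VAR` key are stand-ins invented for the example.

```python
import os


def mutate_in_child(metadata):
    """Fork, change `metadata` in the child only, and return the parent's copy."""
    pid = os.fork()
    if pid == 0:
        # Child: this write only touches the child's copy of the dict.
        try:
            metadata["EXAMPLE_VAR"] = "set by task"
        finally:
            os._exit(0)
    os.waitpid(pid, 0)
    return metadata


result = mutate_in_child({"EXAMPLE_VAR": "original"})
print(result["EXAMPLE_VAR"])  # prints "original": the child's change is lost
```

Any postfunc or later task that runs in the parent would see "original", which is the kind of silent dependency that has to be ruled out before a task is made asynchronous.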
For now, this allows controlling/experimenting with which of the tasks
fetch, unpack, stage, split and package are run asynchronously, by
setting the corresponding __ASYNC_FOO variable (e.g. __ASYNC_FETCH)
in local.conf.
2 participants