diff --git a/README.md b/README.md index 26b99d3..1cbe03f 100644 --- a/README.md +++ b/README.md @@ -79,6 +79,7 @@ stateDiagram-v2 | Last updated | Title | Author(s) alias | Category | |--------------|---------------------------------------------------------------------------------------------------------------------------------|------------------------------------------|----------| +| 2024-11-22 | [Copy Action](designs/2024-08-14-copy-action.md) | [@Silic0nS0ldier](https://github.com/Silic0nS0ldier) | Code Coverage | | 2024-08-29 | [A New Code Coverage API](https://docs.google.com/document/d/1HWGRrzE17rVuCpSniA1W3MeqKl-puX_sEzqUGyMxW5E) | [@c-mita](https://github.com/cmita) | Code Coverage | | 2024-07-28 | [Execution Platform Scoped Spawn Strategies](designs/2023-06-04-exec-platform-scoped-spawn-strategies.md) | [@Silic0nS0ldier](https://github.com/Silic0nS0ldier) | Configurability, Execution Strategy | | 2023-11-23 | [Regular rules in module extensions](https://docs.google.com/document/d/1OsEHpsJXXMC9SFAmAh20S42Dbmgdj4cNyYAsFOHMibo/edit) | [@matts1](https://github.com/matts1) | External Repositories | diff --git a/designs/2024-08-14-copy-action.md b/designs/2024-08-14-copy-action.md new file mode 100644 index 0000000..ae61606 --- /dev/null +++ b/designs/2024-08-14-copy-action.md @@ -0,0 +1,162 @@ +--- +created: 2024-08-14 +last updated: 2024-11-22 +status: To be reviewed +reviewers: + - ___ +title: Copy Action +authors: + - Silic0nS0ldier +--- + + +# Abstract + +Copying of files and directories is a common need that is currently underserved in Bazel. This proposal seeks to make this a builtin capability to improve performance and simplify rule development. + +# Background + +The following actions (`ctx.actions.*`) already exist; +- `args` - An abstraction for memory-efficient argument management. May produce a file. +- `declare_directory` - Declares a directory output. +- `declare_file` - Declares a file output. +- `declare_symlink` - Declares a symlink output. +- `do_nothing` - Does nothing, only serving as an insertion point for the deprecated 'extra actions' feature. +- `expand_template` - Creates a file using a template. +- `template_dict` - Abstraction for `expand_template` that allows deferring evaluation of values. +- `run` and `run_shell` - Runs an executable/shell script. May produce file(s), director(y/ies) and/or symlink(s). +- `symlink` - Creates a symlink. +- `write` - Creates a file. + +The closest analogue builtin analogue to copying a file is currently using `ctx.actions.symlink`, which depending on the scenario can lead to different +behaviour at runtime (e.g. NodeJS import resolution is affected, and package managers like [pnpm](https://pnpm.io/) (including rulesets like +[Rules JS](https://github.com/aspect-build/rules_js)) rely on such behavioural differences). This makes for a poor subsitutite. + +For a true file or directory copy, one of the following API surfaces must be used; +- `ctx.actions.run` or `ctx.actions.run_shell`. +- `genrule` + +Some utility rulesets include their own implemenations using these API; +- Skylib + - [`copy_directory` rule](https://github.com/bazelbuild/bazel-skylib/blob/5c071b5006bb9799981d04d74a28bdee2f000d4a/docs/copy_directory_doc.md). + - [`copy_file` rule](https://github.com/bazelbuild/bazel-skylib/blob/5c071b5006bb9799981d04d74a28bdee2f000d4a/docs/copy_file_doc.md). +- Aspect's Bazel Helpers Library + - [`copy_directory` rule and `copy_directory_action` action-generating function](https://github.com/aspect-build/bazel-lib/blob/5d09fc1b8352ef276dd4dd873b3dc5b0f5482f19/docs/copy_directory.md). + - [`copy_file` rule and `copy_file_action` action-generating function](https://github.com/aspect-build/bazel-lib/blob/5d09fc1b8352ef276dd4dd873b3dc5b0f5482f19/docs/copy_file.md). + - [`copy_to_bin` rule, `copy_file_to_bin_action` action-generating function and `copy_files_to_bin_actions` action-generating function](https://github.com/aspect-build/bazel-lib/blob/5d09fc1b8352ef276dd4dd873b3dc5b0f5482f19/docs/copy_to_bin.md) + - [`copy_to_directory` rule and `copy_to_directory_bin_action` action-generating function](https://github.com/aspect-build/bazel-lib/blob/5d09fc1b8352ef276dd4dd873b3dc5b0f5482f19/docs/copy_to_directory.md) + +Because all of these implementations rely on a spawn, they inherit associated weaknesses including; +- Overhead of spawn is disproportionately large relative to the task. This is even worse in conjunction with remote execution. +- Increases size (and potentially volume) of merkle trees, increasing CPU and memory requirements of Bazel. +- Attempts to optimise for remote execution by forcing copy-related spawns to run locally suffers from caching issues... + - ...in scenarios where clients are forbidden from uploading `ActionResult` metadata (listed outputs can be arbitrary, representing a security issue). + - ...when copying files generated by the remote as they must be downloaded first. +- Implementation changes can lead to cache misses. +- If copies are batched, odds are in most runs only 1 input will have actually changed (doing more work than necessary, becomes a performance and efficency concern once the batch hits a certain size). + +# Proposal + +Introduce a new `copy` action with the following behaviors; +- Content hash of output is identical to input, only the name is different. +- Realisation of output is deferred, similar to how remote spawn actions are handled. +- Action result is empty, similar to symlink actions. + +The API looks like `actions.copy(in: File, out: File, path: String|None): None`. +- `actions.copy(in, out)` is a simple same-type copy operation. + Unanswered question: How are `actions.symlink` inputs handled? Update the target? Error? +- `actions.copy(in, out, path = "foo/bar")` copies a file from the input directory. + `path` cannot be used with generic symlinks (`actions.declare_symlink`, source files??? and within tree artifacts support is later added) as Bazel does not track their target. + +Compared to existing solutions, `actions.copy` should; +- Complete much more quickly, as spawn machinery is not used and no subprocess is used. +- Allow other pending spawns to start sooner by sparing local resource pools. +- Cut down on remote execution (TBD - fairly sure this shouldn't have an action metadata record) and BES (TBD - need to check protocols and get some traffic samples) traffic during batches of copy operations. + +## Example Usage + +### Skylib's [`copy_directory`](https://github.com/bazelbuild/bazel-skylib/blob/5c071b5006bb9799981d04d74a28bdee2f000d4a/rules/private/copy_directory_private.bzl) + +```starlark +def _copy_directory_impl(ctx): + dst = ctx.actions.declare_directory(ctx.attr.out) + ctx.actions.copy(ctx.attr.src, dst) + + files = depset(direct = [dst]) + runfiles = ctx.runfiles(files = [dst]) + + return [DefaultInfo(files = files, runfiles = runfiles)] + +copy_directory = rule( + implementation = _copy_directory_impl, + provides = [DefaultInfo], + attrs = { + "src": attr.label(mandatory = True, allow_single_file = True), + # Cannot declare out as an output here, because there's no API for declaring + # TreeArtifact outputs. + "out": attr.string(mandatory = True), + }, +) +``` + +Compared to the canonical implementation; +- Implementation is significantly smaller and drops a (fairly light) import. +- Spawn action branches replaced by `copy` action. +- `is_windows` attribute to pick spawn action branch is gone. +- Macro to automatically set `is_windows` is gone. + +### Aspect's Bazel Helpers Library's [`copy_to_bin`](https://github.com/bazel-contrib/bazel-lib/blob/5d09fc1b8352ef276dd4dd873b3dc5b0f5482f19/lib/private/copy_file.bzl) + +```starlark +def copy_file_to_bin_action(ctx, file): + if not file.is_source: + return file + if ctx.label.workspace_name != file.owner.workspace_name: + fail(_file_in_external_repo_error_msg(file)) + if ctx.label.package != file.owner.package: + fail(_file_in_different_package_error_msg(file, ctx.label)) + + if file.path.startswith("bazel-"): + first = file.path.split("/")[0] + suffix = first[len("bazel-"):] + if suffix in ["testlogs", "bin", "out"]: + print("__WARNING__") + + dst = ctx.actions.declare_file(file.basename, sibling = file) + crx.actions.copy(file, dst) + return dst + +def _file_in_external_repo_error_msg(file): + return "__ERROR__" + +def _file_in_different_package_error_msg(file, curr_package_label): + return "__ERROR__" + +def copy_files_to_bin_actions(ctx, files): + return [copy_file_to_bin_action(ctx, file) for file in files] + +def _copy_to_bin_impl(ctx): + files = copy_files_to_bin_actions(ctx, ctx.files.srcs) + return DefaultInfo( + files = depset(files), + runfiles = ctx.runfiles(files = files), + ) + +copy_to_bin = rule( + implementation = _copy_to_bin_impl, + provides = [DefaultInfo], + attrs = { + "srcs": attr.label_list(mandatory = True, allow_files = True), + }, +) +``` + +Compared to the canonical implementation; +- Implementation remains about the same (common logic is in a separate file). +- Common logic can be significantly simplified or removed in favor of direct `actions.copy` usage. +- No toolchain resolution is necessary. +- N/A for the macro since the the canonical implementation just fowards to the rule. + +# Backward-compatibility + +This proposal won't impact backward compatibility, although feature detection should be considered so that existing utility rulesets can optionally use the newer (and more efficent) API without needing to wait for supported Bazel versions to age out of the support matrix.