Skip to content

Conversation

@dmikusa
Copy link
Contributor

@dmikusa dmikusa commented Jan 7, 2025

Signed-off-by: Daniel Mikusa <dan@mikusa.com>
@buildpack-bot
Copy link
Member

Maintainers,

As you review this RFC please queue up issues to be created using the following commands:

/queue-issue <repo> "<title>" [labels]...
/unqueue-issue <uid>

Issues

(none)

@joeybrown-sf
Copy link
Contributor

I don't see anything in here about chaining transformations. Would we want to support that? If so, perhaps we could consider changing the names of the variables to something like $SOURCE_CMD instead of $ORIGINAL_CMD or something to that effect.

@joeybrown-sf
Copy link
Contributor

What do you think about adding an optional reason field to the transform section? Since we're going to be logging the transformation, it might be good for user experience.

@dmikusa
Copy link
Contributor Author

dmikusa commented Jan 10, 2025

I don't see anything in here about chaining transformations. Would we want to support that? If so, perhaps we could consider changing the names of the variables to something like $SOURCE_CMD instead of $ORIGINAL_CMD or something to that effect.

Not sure I follow, do you have an example of what you mean by chaining?

@dmikusa
Copy link
Contributor Author

dmikusa commented Jan 10, 2025

What do you think about adding an optional reason field to the transform section? Since we're going to be logging the transformation, it might be good for user experience.

Sounds good to me. I'll add it.

Signed-off-by: Daniel Mikusa <dan@mikusa.com>
@joeybrown-sf
Copy link
Contributor

I don't see anything in here about chaining transformations. Would we want to support that? If so, perhaps we could consider changing the names of the variables to something like $SOURCE_CMD instead of $ORIGINAL_CMD or something to that effect.

Not sure I follow, do you have an example of what you mean by chaining?

Kind of like what @jabrown85 mentioned in the slack thread. "time wraps secretmanager wraps some-cmd".

I could see multiple buildpacks that do this transforming, so if you're the 3rd transformation buildpack, $ORIGINAL_CMD might not be accurate because it might not be the original per se, because it's already transformed. I think calling it something like $source helps to make it more apparent that we're transforming the process that may not be in its original state.

@dmikusa
Copy link
Contributor Author

dmikusa commented Jan 10, 2025

I don't see anything in here about chaining transformations. Would we want to support that? If so, perhaps we could consider changing the names of the variables to something like $SOURCE_CMD instead of $ORIGINAL_CMD or something to that effect.

Not sure I follow, do you have an example of what you mean by chaining?

Kind of like what @jabrown85 mentioned in the slack thread. "time wraps secretmanager wraps some-cmd".

I could see multiple buildpacks that do this transforming, so if you're the 3rd transformation buildpack, $ORIGINAL_CMD might not be accurate because it might not be the original per se, because it's already transformed. I think calling it something like $source helps to make it more apparent that we're transforming the process that may not be in its original state.

Ohh, ok. I understand. Yes, I can see how that might get confusing, and a rename sounds reasonable. How about PREVIOUS_ or CURRENT_? I'm not necessarily opposed to SOURCE_ but it is kind of a synonym to ORIGINAL. Or we could just take off the prefix? or use a generic prefix like LAUNCH_? Thoughts?

@joeybrown-sf
Copy link
Contributor

Yeah I think dropping the prefix altogether is the right approach. It should be clear in context what those placeholders are.

Signed-off-by: Daniel Mikusa <dan@mikusa.com>
@dmikusa
Copy link
Contributor Author

dmikusa commented Jan 10, 2025

Yeah I think dropping the prefix altogether is the right approach. It should be clear in context what those placeholders are.

Done.

@dmikusa
Copy link
Contributor Author

dmikusa commented Jan 10, 2025

@joeybrown-sf Any thoughts on the alternatives section? or the format of the toml in the spec changes section? Even if it's just to say you prefer it the way it is now. Lot of ways to do this, just hoping to get a little feedback on that. thanks!

@joeybrown-sf
Copy link
Contributor

sure thing!

I do prefer the single transform block, but that's just my personal preference. I like it because it's a bit more terse, less nested, and self-explanatory. That being said, I like "transformations" better than "transforms" because the former is always a noun where the latter could be a verb.

In my opinion, the following format would is my favorite.

[[transformations]]
type = "task"
command = ["time", $CMD]
args = ["more", "args"]
working-dir = "/somewhere-else"

type = "migration" # reference original process type

[processes.transform]
command = ["bash", "-c '$CMD_STRING'"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we thought through all the things that could go wrong with variable interpolation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're using the $CMD and $ARGS options, I think that's the easiest option and probably the safest.

My thought for those is that the implementation is essentially treating them as lists, so if you have command = ["time", $CMD] and the original command is command = ["foo", "bar"] then you'd be merging the original list items into the new list at the place where the variable is located, so you don't need to look at or manipulate the strings, ex: command = ["time", "foo", "bar"].

For the $CMD_STRING and $ARGS_STRING, that's trickier. The obvious problem is string quoting. That is going to require a very cautious implementation. If you have command = ["bash", "-c '$CMD_STRING'"] and an original command of command = ["foo", "bar"], then you get command = ["bash", "-c 'foo bar'"].

The trouble is if you have single quotes in the original command like command = ["fo'o", "bar"], then what? The implementation could shell escape those single quotes, but then that requires the implementation to be aware of the string in which it's interpolating the variable. To know if it's in the middle of a pair of single quotes, and that is complicated.

I think we have two simple options:

  1. Keep it simple and not escape anything. That reduces the complexity and leaves it up to the buildpack author to sort out. I think this would work for most cases, but would have limits. If I need to use single quotes to wrap a command string and the wrapping buildpack uses single quotes, then it would fail and there would be no workaround.
  2. We could introduce something very basic like $CMD_STRING is the raw string, $CMD_STRING_ESC_SQ and $CMD_STRING_ESC_DQ. This would still be simple, it's not reimplementing Bash, but it would give some very basic control to the buildpack author. So if I'm making a buildpack that wraps another command like command = ["bash", "-c '$CMD_STRING'"], I know this could be a problem because I put in single quotes. I could then switch to command = ["bash", "-c '$CMD_STRING_ESC_SQ'"] and have the string single quote escaped. It's not elegant, but it does work to expand the use cases that we could support further.

The only other option that comes to mind would be using some sort of template language, like Go template. I think that's another level of complexity, though.

There are probably other issues with string interpolation. Maybe security? Although, I think there are other ways one could nefariously modify the start commands, so I don't think string interpolation is exposing anything you can do some other way in that regard.

Was there something in particular that had you concerned?

@jabrown85
Copy link
Contributor

@dmikusa do you want to take a round of edits given the feedback and get this up for vote soon?

@dmikusa
Copy link
Contributor Author

dmikusa commented Mar 30, 2025

@dmikusa do you want to take a round of edits given the feedback and get this up for vote soon?

@jabrown85 I will try, but things are a little busy over at Paketo at the moment and it's eating up my time. I will try.

@dmikusa
Copy link
Contributor Author

dmikusa commented May 21, 2025

Sorry for the delay. I've updated this to use the transformations block instead. I think this is ready for another review.

@wrr
Copy link

wrr commented Jun 11, 2025

Following the discussion on the #buildpack-authors Slack channel, I have studied the 'Flexible Process Types' proposal regarding the use case:

Is there a way for an executable included in a CNB buildpack to start automatically during the application startup and rebind the PORT environment variable that's visible to web processes?

Background: I'm porting a Heroku legacy buildpack to CNB (https://github.com/wwwhisper-auth/wwwhisper-heroku-buildpack). The buildpack provides an authorization proxy that runs in front of a web app. Currently, the proxy starts automatically from a .profile.d script, binds the public PORT passed by Heroku runtime, and remaps the PORT to a local port. This allows users to simply run Heroku buildpack:add ... to enable the proxy without modifying their Procfile.

It looks like the proposal addresses this use case. The authorization web proxy buildpack would provide the following launch.toml:

[[transformations]]
type = "web"
command = ["authproxy", "'$CMD_STRING'"]
reason = "Starting authorization proxy in front of the web app"

The authproxy process would then start listening on externally accessible PORT and change PORT to a local port before executing CMD_STRING as a shell script. For this use case, it is important for the CMD_STRING to still contain the original command (for example, fastapi run --port $PORT --host ::), not the expanded command (fastapi run --port 14000 --host ::), so PORT rebinding is visible in the executed command. My understanding is that this is what the RFC proposes.

@schneems
Copy link

schneems commented Jul 31, 2025

Short: I am 👍 for adding a way to do this in a non-trivial case (it's possible for a trivial case already). I'm 👎 on adding a TOML-based DSL. I have an alternative proposal involving exposing resolved launch.toml information to the buildpack.

Longer:

What is possible today

For the simplest case where there's only one launch.toml that defines processes, this is already possible:

$ $ pack build wrap_time_demo --path . && pack inspect wrap_time_demo | grep web -A5 -B5
...
  me/time-wrapper             0.0.0          -

Processes:
  TYPE                 SHELL        COMMAND        ARGS                                                                                                                          WORK DIR
  web (default)                     time           bash -c bin/rails server --binding "[::]" --port "${PORT:?Error: PORT env var is not set!}" --environment "$RAILS_ENV"        /workspace

This is because:

  • launch.toml is already readable by future buildpacks
  • Duplicate entries are spec-d such that the last running buildpack wins

However, it's not possible in the more complex case where there are multiple entries, or even multiple buildpacks that want to wrap commands i.e. time mcp other bin/rails. That's because the buildpack doesn't know the order of execution of previous buildpacks, but the lifecycle does.

This proposal

I'm +1 on the use case. I don't love adding a TOML-based DSL, though. This opens up the door for people to suggest new syntax options and pitch new use cases that need to be evaluated. There are also a lot of edge cases that upstream (lifecycle) will be tasked with handling. Such as: In the above example where someone is using bash -c (heroku/buildpacks#15) perhaps we want to special case that and instead modify the contents of the argument such that it is time bin/rails rather than time with bash -c bin/rails etc. If we handed the buildpack the TOML, it could make this distinction, but if we provide them a DSL, the DSL would need to add affordances for each and every edge case, or otherwise provide some level of Turing completeness, which I don't think we want.

Resolved launch.toml proposal

When faced with a specific problem, I ask if there's a general-purpose solution that could help me solve it if it existed. If we could provide the buildpack desiring to modify the resolved and finalized launch.toml as the lifecycle would see it, then it could read it in and make any modifications it wanted to its own launch.toml, which will take precedence in the event of name conflict.

I propose we inject this information into the bin/build process via a CNB_PRIOR_LAUNCH_TOML environment variable. The output of this would be TOML resolved from prior buildpacks, and then it's up to the buildpack to do what they want with it. They can choose to read it in modify it as they see fit, and then write it to their own launch.toml. When they pick conflicting/overlapping type names. That would allow them to effectively get the features needed to wrap commands, but we don't need to introduce or maintain a new DSL.

I think from the platform implementer side, the logic for doing this resolution is already present; it's just a matter of refactoring internals to expose it and standardize on how we want to expose the information. They wouldn't have to maintain N different DSLs for N different spec versions. While bash-based buildpacks reading and modifying TOML is more cumbersome than Go or Rust-based buildpacks, it's still possible. For the buildpack maintainer, it's slightly more effort than appending static toml, but it's one fewer concept to learn and supports use cases we've not yet realized we have.

@jericop
Copy link

jericop commented Aug 1, 2025

I too am in favor of adding this functionality in general.

I really like the idea of having something like CNB_PRIOR_LAUNCH_TOML available to buildpack authors for potential modification. It would be a perfect use case for inline buildpacks and would potentially eliminate the need to have custom buildpacks for a somewhat simple change.

One challenge with inline buildpacks, at least from my experience, is that because they always pass detection you cannot create a build plan to require things like python, go, java, etc. This would mean you are limited to tools installed on the build image only. If this is a problem then inline buildpack won’t work and you would then need a custom buildpack so you can have a proper detect script, or you can keep the inline buildpack and use something like the paketo-community/buildp-plan buildpack to specify a build plan and any requirements to the accompanying plan.toml file. Either way this leads to extra files/directories being created.

One thing that would be great with any approach is that it can be configured through environment variables or project.toml entirely without the need for extra files and directories.

I know this is beyond the scope of this rfc, but if inline buildpacks did support a build plan (while still always passing detection), then you would have a very effective way to modify prior launch toml. If anyone else likes this idea and thinks it’s worth a separate rfc, please thumbs up.


Because a subsequent buildpack does not know what a previous buildpack has defined for the command, args, and working directory the following place holders may be used to reference those values:

- `$CMD` - the original command as a list, fits into a TOML list
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a developer needs to add a literal $CMD they need a way to do so, especially since we are working with such a broad ecosystem. It also cannot involve escaping as any possible escape sequence could be a perfectly valid input as well.

I think the only option would be allowing developers to re-map these somehow like:

[[transforms.remap]]
CMD = "$ACTUAL_CMD"
ARGS = "$CUSTOM_ARGS"

That would allow them to use a $CMD literal if needed:

command = ["time", "$ACTUAL_CMD", "--format={$CMD%s}"]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a use case where this actually comes up?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No immediate need, more enumerating edge cases. Aliasing doesn’t need to go into a v1 schema, but I’m making a note that if someone does have this problem, escaping isn’t the right fix.

@dmikusa
Copy link
Contributor Author

dmikusa commented Aug 6, 2025

@schneems - The intent of this proposal is to only cover basic use cases. It is out of scope to cover every edge case and possibility that someone might dream up. The belief of this proposal is that having a basic, limited way to address this user need is better than a.) nothing viable (present situation) and b.) a complicated way of doing everything a user might cook up.

It is not trying to be future-proof either. If additional legitimate use cases surface that are not covered by this proposal, then it would be up to the CNB team to evaluate those in the future.

@dmikusa
Copy link
Contributor Author

dmikusa commented Aug 6, 2025

Resolved launch.toml proposal

Hmm, that's interesting. The context behind the present proposal is on this thread in Slack: https://cloud-native.slack.com/archives/C0331B5QS02/p1734193643763589

From that, I think the three largest design considerations were:

  1. Not producing garbage process types (i.e. having a subsequent buildpack break a prior buildpack)
  2. Knowing who changed what, so that buildpacks cannot covertly change another buildpack's commands and so that the process is easily debuggable.
  3. A buildpack should not require knowledge about what other buildpacks (before or after) are doing. i.e. they should operate in isolation.

I think I'm following what you're suggesting, and it sounds like that proposal could address these considerations as well. Item 3.) is debatable because this new proposal would be giving a buildpack knowledge of what happened previously, but I think the spirit of 3.) is that buildpacks don't need to reach into the files of other buildpacks (i.e. not reading the launch.toml of other buildpacks from disk). If the lifecycle is coordinating things, IMO, that meets 3.).

Anyway. I'm not tied to the particular implementation, but feel strongly that we need this feature. I'd be curious to know from those working on the lifecycle which approach sounds like it'd be easier to implement. That might be a good data point in this decision as well.

@schneems
Copy link

schneems commented Aug 7, 2025

Regarding wanting focus. A specific desire for Heroku is to pattern match. For example by transforming all types that start with “mcp.*”

I talked with @jabrown85 on implementation limitations of my “prior launch time” proposal is the need to handle multiple launch.toml versions.

To address that we could say that launch.toml changes need to be compatible with the prior version (changes are infrequent but have happened).

The transform DSL also introduces restrictions to future launch.toml changes. (Future updates need to not break the transform promises and be backwards compatible).

Copy link
Member

@AidanDelaney AidanDelaney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have an alternative that moves the complexity out of liffecycle and into a spec for exec binaries. We can define how to chain exec binaries. Is this worth discussing?

- This would be more complicated to implement as we'd need to define both a way to pass in the previous process type information. It's unclear if the additional complexity is needed. i.e. do buildpack authors need to be able to see the commmands/args/working directory set previously.
- I don't believe there's a way we can force the previously defined process types as being read-only. There is some protection by obscurity at the moment, if we point a subsequent buildpack to the `launch.toml` files of previous buildpacks, we're removing that obscurity and making it easier for someone to just modify those files directly. We could possibly retain this by passing copies of those files to subsequent buildpacks though.
- If all buildpack authors want is to wrap a command or append arguments, this implementation creates more work for them. They need to read the previous commands, augment, and then write out the changes, as opposed to using some handy place holders.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another alternative approach may be to allow chaining of processes with the requirement that each process exec s the passed in CMD. This is a similar analog to the entrypoint in Dockerfile.

[[transformations]]
type = "task" # reference original process type
exec = "time"
reason = "Wrapping start command to log time spent"

The advantage of this approach is simplicity of implementation. The disadvantage is that some of the example use-cases become less transparent. For example, the following transformation

[[transformations]]
type = "web" # reference original process type
args = [$ARGS, "--production"]
reason = "adding additional arguments"

would instead become

[[transformations]]
type = "web" # reference original process type
exec = "production"
reason = "adding additional arguments"

Where the production binary would exec the passed in command and add the --production flag.

The "add an argument" use-case could be generalized as a buildpack that provides the exec-with-args binary and accepts args.

[[transformations]]
type = "web" # reference original process type
exec = "exec-with-args"
args = ["--production"]
reason = "adding additional arguments"

Such exec wrappers would be defined as having to adhere to a specific interface, accepting --command and --transformations parameters

Let's take an example of where we want to chain such transformations. We serialize the eventual command as a JSON string passed as --command, and the list of subsequent transformations as a JSON string, passed as --transformations.

[[transformations]]
type = "web" # reference original process type
exec = "exec-with-args"
args = ["--production"]
reason = "adding additional arguments"

[[transformations]]
type = "web" # reference original process type
exec = "umask"
args = ["0022"]
reason = "set the umask for the process"

Here we compute the command to execute as

COMMAND: '{"type":"web","command":["my-app"],"args":["arg1","arg2"],"default":true,"working-dir":"/workspace"}'

and the subsequent transformations as

TRANSFORMATIONS: '[{"type":"web","exec":"umask","args":["0022"],"reason":"set the umask for the process"}]'

The exec-with-args --production --command COMMAND --transformation TRANSFORMATIONS process would exec umask 0022 and pass the modified COMMAND with the --production added to the args of COMMAND. exec-with-args execs umask 0022 --command '{"type":"web","command":["my-app"],"args":["arg1","arg2","--production"],"default":true,"working-dir":"/workspace"}' (with no --transformations argument). Subsequently, the umask binary does whatever is expected of it and execs the passed --command.

This alternative avoids interpolation of $CMD, $ARGS and $ARGS_STRING. It's slightly more in-keeping with what folks may have experience with in the Docker world. However, it is less declarative than this proposal.

Copy link

@chap chap Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AidanDelaney thanks for acknowledging Docker entrypoint. I was having similar thoughts.

What if transformations are simplified further to just allow prefixing and suffixing of CMD:

[[transformations]]
type = "web"
cmd_prefix = "time"
cmd_suffix = "--production --port $PORT"
reason = "wrap time and add prod argument"

Or overwriting entirely:

[[transformations]]
type = "web"
cmd = "time bin/server --port $PORT --production"
reason = "force prod web server"

(Forgive my ignorance on the technical feasibility and implementation. I'm new here 🤠 )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants