-
Notifications
You must be signed in to change notification settings - Fork 73
Add proposal for flexible process types #321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Daniel Mikusa <dan@mikusa.com>
|
Maintainers, As you review this RFC please queue up issues to be created using the following commands: Issues(none) |
|
I don't see anything in here about chaining transformations. Would we want to support that? If so, perhaps we could consider changing the names of the variables to something like |
|
What do you think about adding an optional |
Not sure I follow, do you have an example of what you mean by chaining? |
Sounds good to me. I'll add it. |
Signed-off-by: Daniel Mikusa <dan@mikusa.com>
Kind of like what @jabrown85 mentioned in the slack thread. "time wraps secretmanager wraps some-cmd". I could see multiple buildpacks that do this transforming, so if you're the 3rd transformation buildpack, |
Ohh, ok. I understand. Yes, I can see how that might get confusing, and a rename sounds reasonable. How about |
|
Yeah I think dropping the prefix altogether is the right approach. It should be clear in context what those placeholders are. |
Signed-off-by: Daniel Mikusa <dan@mikusa.com>
Done. |
|
@joeybrown-sf Any thoughts on the alternatives section? or the format of the toml in the spec changes section? Even if it's just to say you prefer it the way it is now. Lot of ways to do this, just hoping to get a little feedback on that. thanks! |
|
sure thing! I do prefer the single transform block, but that's just my personal preference. I like it because it's a bit more terse, less nested, and self-explanatory. That being said, I like "transformations" better than "transforms" because the former is always a noun where the latter could be a verb. In my opinion, the following format would is my favorite. |
| type = "migration" # reference original process type | ||
|
|
||
| [processes.transform] | ||
| command = ["bash", "-c '$CMD_STRING'"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we thought through all the things that could go wrong with variable interpolation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're using the $CMD and $ARGS options, I think that's the easiest option and probably the safest.
My thought for those is that the implementation is essentially treating them as lists, so if you have command = ["time", $CMD] and the original command is command = ["foo", "bar"] then you'd be merging the original list items into the new list at the place where the variable is located, so you don't need to look at or manipulate the strings, ex: command = ["time", "foo", "bar"].
For the $CMD_STRING and $ARGS_STRING, that's trickier. The obvious problem is string quoting. That is going to require a very cautious implementation. If you have command = ["bash", "-c '$CMD_STRING'"] and an original command of command = ["foo", "bar"], then you get command = ["bash", "-c 'foo bar'"].
The trouble is if you have single quotes in the original command like command = ["fo'o", "bar"], then what? The implementation could shell escape those single quotes, but then that requires the implementation to be aware of the string in which it's interpolating the variable. To know if it's in the middle of a pair of single quotes, and that is complicated.
I think we have two simple options:
- Keep it simple and not escape anything. That reduces the complexity and leaves it up to the buildpack author to sort out. I think this would work for most cases, but would have limits. If I need to use single quotes to wrap a command string and the wrapping buildpack uses single quotes, then it would fail and there would be no workaround.
- We could introduce something very basic like
$CMD_STRINGis the raw string,$CMD_STRING_ESC_SQand$CMD_STRING_ESC_DQ. This would still be simple, it's not reimplementing Bash, but it would give some very basic control to the buildpack author. So if I'm making a buildpack that wraps another command likecommand = ["bash", "-c '$CMD_STRING'"], I know this could be a problem because I put in single quotes. I could then switch tocommand = ["bash", "-c '$CMD_STRING_ESC_SQ'"]and have the string single quote escaped. It's not elegant, but it does work to expand the use cases that we could support further.
The only other option that comes to mind would be using some sort of template language, like Go template. I think that's another level of complexity, though.
There are probably other issues with string interpolation. Maybe security? Although, I think there are other ways one could nefariously modify the start commands, so I don't think string interpolation is exposing anything you can do some other way in that regard.
Was there something in particular that had you concerned?
|
@dmikusa do you want to take a round of edits given the feedback and get this up for vote soon? |
@jabrown85 I will try, but things are a little busy over at Paketo at the moment and it's eating up my time. I will try. |
Signed-off-by: Daniel Mikusa <dan@mikusa.com>
|
Sorry for the delay. I've updated this to use the transformations block instead. I think this is ready for another review. |
|
Following the discussion on the #buildpack-authors Slack channel, I have studied the 'Flexible Process Types' proposal regarding the use case:
It looks like the proposal addresses this use case. The authorization web proxy buildpack would provide the following The authproxy process would then start listening on externally accessible |
|
Short: I am 👍 for adding a way to do this in a non-trivial case (it's possible for a trivial case already). I'm 👎 on adding a TOML-based DSL. I have an alternative proposal involving exposing resolved launch.toml information to the buildpack. Longer: What is possible todayFor the simplest case where there's only one launch.toml that defines processes, this is already possible: This is because:
However, it's not possible in the more complex case where there are multiple entries, or even multiple buildpacks that want to wrap commands i.e. time mcp other bin/rails. That's because the buildpack doesn't know the order of execution of previous buildpacks, but the lifecycle does. This proposalI'm +1 on the use case. I don't love adding a TOML-based DSL, though. This opens up the door for people to suggest new syntax options and pitch new use cases that need to be evaluated. There are also a lot of edge cases that upstream (lifecycle) will be tasked with handling. Such as: In the above example where someone is using bash -c (heroku/buildpacks#15) perhaps we want to special case that and instead modify the contents of the argument such that it is Resolved launch.toml proposalWhen faced with a specific problem, I ask if there's a general-purpose solution that could help me solve it if it existed. If we could provide the buildpack desiring to modify the resolved and finalized I propose we inject this information into the I think from the platform implementer side, the logic for doing this resolution is already present; it's just a matter of refactoring internals to expose it and standardize on how we want to expose the information. They wouldn't have to maintain N different DSLs for N different spec versions. While bash-based buildpacks reading and modifying TOML is more cumbersome than Go or Rust-based buildpacks, it's still possible. For the buildpack maintainer, it's slightly more effort than appending static toml, but it's one fewer concept to learn and supports use cases we've not yet realized we have. |
|
I too am in favor of adding this functionality in general. I really like the idea of having something like CNB_PRIOR_LAUNCH_TOML available to buildpack authors for potential modification. It would be a perfect use case for inline buildpacks and would potentially eliminate the need to have custom buildpacks for a somewhat simple change. One challenge with inline buildpacks, at least from my experience, is that because they always pass detection you cannot create a build plan to require things like python, go, java, etc. This would mean you are limited to tools installed on the build image only. If this is a problem then inline buildpack won’t work and you would then need a custom buildpack so you can have a proper detect script, or you can keep the inline buildpack and use something like the paketo-community/buildp-plan buildpack to specify a build plan and any requirements to the accompanying plan.toml file. Either way this leads to extra files/directories being created. One thing that would be great with any approach is that it can be configured through environment variables or project.toml entirely without the need for extra files and directories. I know this is beyond the scope of this rfc, but if inline buildpacks did support a build plan (while still always passing detection), then you would have a very effective way to modify prior launch toml. If anyone else likes this idea and thinks it’s worth a separate rfc, please thumbs up. |
|
|
||
| Because a subsequent buildpack does not know what a previous buildpack has defined for the command, args, and working directory the following place holders may be used to reference those values: | ||
|
|
||
| - `$CMD` - the original command as a list, fits into a TOML list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a developer needs to add a literal $CMD they need a way to do so, especially since we are working with such a broad ecosystem. It also cannot involve escaping as any possible escape sequence could be a perfectly valid input as well.
I think the only option would be allowing developers to re-map these somehow like:
[[transforms.remap]]
CMD = "$ACTUAL_CMD"
ARGS = "$CUSTOM_ARGS"
That would allow them to use a $CMD literal if needed:
command = ["time", "$ACTUAL_CMD", "--format={$CMD%s}"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a use case where this actually comes up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No immediate need, more enumerating edge cases. Aliasing doesn’t need to go into a v1 schema, but I’m making a note that if someone does have this problem, escaping isn’t the right fix.
|
@schneems - The intent of this proposal is to only cover basic use cases. It is out of scope to cover every edge case and possibility that someone might dream up. The belief of this proposal is that having a basic, limited way to address this user need is better than a.) nothing viable (present situation) and b.) a complicated way of doing everything a user might cook up. It is not trying to be future-proof either. If additional legitimate use cases surface that are not covered by this proposal, then it would be up to the CNB team to evaluate those in the future. |
Hmm, that's interesting. The context behind the present proposal is on this thread in Slack: https://cloud-native.slack.com/archives/C0331B5QS02/p1734193643763589 From that, I think the three largest design considerations were:
I think I'm following what you're suggesting, and it sounds like that proposal could address these considerations as well. Item 3.) is debatable because this new proposal would be giving a buildpack knowledge of what happened previously, but I think the spirit of 3.) is that buildpacks don't need to reach into the files of other buildpacks (i.e. not reading the launch.toml of other buildpacks from disk). If the lifecycle is coordinating things, IMO, that meets 3.). Anyway. I'm not tied to the particular implementation, but feel strongly that we need this feature. I'd be curious to know from those working on the lifecycle which approach sounds like it'd be easier to implement. That might be a good data point in this decision as well. |
|
Regarding wanting focus. A specific desire for Heroku is to pattern match. For example by transforming all types that start with “mcp.*” I talked with @jabrown85 on implementation limitations of my “prior launch time” proposal is the need to handle multiple launch.toml versions. To address that we could say that launch.toml changes need to be compatible with the prior version (changes are infrequent but have happened). The transform DSL also introduces restrictions to future launch.toml changes. (Future updates need to not break the transform promises and be backwards compatible). |
AidanDelaney
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have an alternative that moves the complexity out of liffecycle and into a spec for exec binaries. We can define how to chain exec binaries. Is this worth discussing?
| - This would be more complicated to implement as we'd need to define both a way to pass in the previous process type information. It's unclear if the additional complexity is needed. i.e. do buildpack authors need to be able to see the commmands/args/working directory set previously. | ||
| - I don't believe there's a way we can force the previously defined process types as being read-only. There is some protection by obscurity at the moment, if we point a subsequent buildpack to the `launch.toml` files of previous buildpacks, we're removing that obscurity and making it easier for someone to just modify those files directly. We could possibly retain this by passing copies of those files to subsequent buildpacks though. | ||
| - If all buildpack authors want is to wrap a command or append arguments, this implementation creates more work for them. They need to read the previous commands, augment, and then write out the changes, as opposed to using some handy place holders. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another alternative approach may be to allow chaining of processes with the requirement that each process exec s the passed in CMD. This is a similar analog to the entrypoint in Dockerfile.
[[transformations]]
type = "task" # reference original process type
exec = "time"
reason = "Wrapping start command to log time spent"
The advantage of this approach is simplicity of implementation. The disadvantage is that some of the example use-cases become less transparent. For example, the following transformation
[[transformations]]
type = "web" # reference original process type
args = [$ARGS, "--production"]
reason = "adding additional arguments"
would instead become
[[transformations]]
type = "web" # reference original process type
exec = "production"
reason = "adding additional arguments"
Where the production binary would exec the passed in command and add the --production flag.
The "add an argument" use-case could be generalized as a buildpack that provides the exec-with-args binary and accepts args.
[[transformations]]
type = "web" # reference original process type
exec = "exec-with-args"
args = ["--production"]
reason = "adding additional arguments"
Such exec wrappers would be defined as having to adhere to a specific interface, accepting --command and --transformations parameters
Let's take an example of where we want to chain such transformations. We serialize the eventual command as a JSON string passed as --command, and the list of subsequent transformations as a JSON string, passed as --transformations.
[[transformations]]
type = "web" # reference original process type
exec = "exec-with-args"
args = ["--production"]
reason = "adding additional arguments"
[[transformations]]
type = "web" # reference original process type
exec = "umask"
args = ["0022"]
reason = "set the umask for the process"
Here we compute the command to execute as
COMMAND: '{"type":"web","command":["my-app"],"args":["arg1","arg2"],"default":true,"working-dir":"/workspace"}'
and the subsequent transformations as
TRANSFORMATIONS: '[{"type":"web","exec":"umask","args":["0022"],"reason":"set the umask for the process"}]'
The exec-with-args --production --command COMMAND --transformation TRANSFORMATIONS process would exec umask 0022 and pass the modified COMMAND with the --production added to the args of COMMAND. exec-with-args execs umask 0022 --command '{"type":"web","command":["my-app"],"args":["arg1","arg2","--production"],"default":true,"working-dir":"/workspace"}' (with no --transformations argument). Subsequently, the umask binary does whatever is expected of it and execs the passed --command.
This alternative avoids interpolation of $CMD, $ARGS and $ARGS_STRING. It's slightly more in-keeping with what folks may have experience with in the Docker world. However, it is less declarative than this proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AidanDelaney thanks for acknowledging Docker entrypoint. I was having similar thoughts.
What if transformations are simplified further to just allow prefixing and suffixing of CMD:
[[transformations]]
type = "web"
cmd_prefix = "time"
cmd_suffix = "--production --port $PORT"
reason = "wrap time and add prod argument"
Or overwriting entirely:
[[transformations]]
type = "web"
cmd = "time bin/server --port $PORT --production"
reason = "force prod web server"
(Forgive my ignorance on the technical feasibility and implementation. I'm new here 🤠 )
Readable