-
Notifications
You must be signed in to change notification settings - Fork 204
Avoid uninstalling and re-installing service components on policy change #11740
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Avoid uninstalling and re-installing service components on policy change #11740
Conversation
|
This pull request does not have a backport label. Could you fix it @ycombinator? 🙏
|
84a4523 to
1951fec
Compare
|
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
|
Dropping the output ID makes sense to me, but I think if you do a policy assignment the input ID also changes, which would cause an uninstall and reinstall. Does the policy reassignment case work properly for tamper protected agents? Or is this only handling the case where the output is changed within the same policy? |
As discussed in today's meeting, the component ID uses the input type (not input ID) so, with the change in this PR, we should still prevent an uninstall and reinstall. |
Yes, it does. I forgot that that's how I'd tested this PR to begin with. I even added manual testing steps to the PR's description. 🤦 I will test the upgrade scenario next. |
Okay, so upgrading works in that nothing is broken after upgrade. Here's the output of BeforeDuringAfter |
|
What's the right additional testing to add for this? It looks correct by inspection, but I think we should add something proving it does what we think. If we could get a simulated policy reassignment (input If policy reassignment always fails without this change, that's a test case that is missing from the endpoint integration tests, so we could add that. I also don't think we have a test that does an upgrade with defend installed but that is even heavier, though we have had lots of problems around this discovered in the field so maybe worth it. |
d7bf91b to
e892133
Compare
Added a unit test case for input ID change to A unit test that tests just changing the output already exists in
But I added additional assertions for that test case to make sure no components were added, removed, or updated in the component model as a result of such a change: 4e352a7#diff-21de8fb82a5ebccaa0ca3afb0cb8bfbe10d5b8f920021785c2d485ee35a3556cR225-R227 |
| require.NotZero(t, secondPID) | ||
|
|
||
| // Assert that endpoint process IDs haven't changed across policy reassignment | ||
| require.Equal(t, firstPID, secondPID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is checking that Endpoint hasn't restarted good enough for this test? Or should we try to check that Endpoint has not be uninstalled/reinstalled — that might be more involved and require looking at the Endpoint service's logs in the service manager.
Added in 8a1a8c3 |
13f5ff1 to
8a1a8c3
Compare
|
…essful policy reassignment
💔 Build Failed
Failed CI StepsHistory
cc @ycombinator |
What does this PR do?
This PR identifies Service Runtime components with only their input type; the output ID is not longer used.
Why is it important?
Service Runtime components are intended to be kept running (via a service) for as long as possible. We should only install/start or uninstall/stop them if they are being explicitly added or removed, respectively, from the component model. If only their configuration is being updated, we should keep the component running.
If a component's ID changes between the last and current component models, Elastic Agent will ask the component's service to uninstall and then reinstall itself. Prior to this PR, service components' ID were determined by their input type and output ID. Therefore, if a service component's output were changed, it would cause the service to be uninstall and then reinstalled. This is undesirable behavior, as services should be kept running as long as possible.
With the changes in this PR, we no longer consider the output ID when generating service components' IDs. If a service component's output is changed, it's ID remains the same between the last and current component models. Elastic Agent does not uninstall and reinstall the component's service but simply passes the configuration change to it (which it was doing prior to this PR anyway).
Checklist
I have made corresponding changes to the documentationI have made corresponding change to the default configuration files./changelog/fragmentsusing the changelog toolI have added an integration test or an E2E testDisruptive User Impact
None.
How to test this PR locally
Policy reassign does not uninstall/reinstall Endpoint
default: containing only thesystemintegrationtp-es: containing the Elastic Defend integration, with tamper protection enabled, and using the Elasticsearch output.tp-ls: containing the Elastic Defend integration, with tamper protection enabled, and using the Logstash output. Note that you will need to create the Logstash output in Fleet > Settings.tp-espolicy and verify the agent is healthy and shipping data.tp-lspolicy.uninstall endpoint service./opt/Elastic/Endpoint/state/log/on Linux) and make sure that Endpoint has connected to Logstash (or has attempted to and failed if there is no actual Logstash endpoint listening).Removing Endpoint from policy uninstalls Endpoint
defaultpolicy.stopping endpoint service runtime, followed byuninstall endpoint service, followed byStopped: endpoint service runtime.Related issues
Questions to ask yourself