gc cron job to clean up dangling policy_bindings entries#3065
gc cron job to clean up dangling policy_bindings entries#3065
Conversation
michaeljguarino
left a comment
There was a problem hiding this comment.
we definitely need to make sure there are indices on all these columns as well, the exists queries are going to be quite slow if not.
|
|
||
| # Create a project with write_bindings - these should be kept | ||
| project = insert(:project, write_bindings: [%{user_id: user.id}]) | ||
| %{write_bindings: [kept_binding]} = Console.Repo.preload(project, [:write_bindings]) |
There was a problem hiding this comment.
i actually don't think this preload should be needed
There was a problem hiding this comment.
i think it's needed if we want a kept_binding that we can run assertions against after the cronjob
lib/console/deployments/cron.ex
Outdated
| |> PolicyBinding.ordered(asc: :id) | ||
| |> Repo.stream(method: :keyset) | ||
| |> Console.throttle(count: 100, pause: 1) | ||
| |> Task.async_stream(fn binding -> |
There was a problem hiding this comment.
we would want to chunk these in batches of 100 and delete each chunk in one query, to save db network round trips. Pretty sure there are some other examples of doing that here.
641a8d3 to
ed1ea75
Compare
ed1ea75 to
c7ee50e
Compare
michaeljguarino
left a comment
There was a problem hiding this comment.
this probably will work, but some refactoring changes, and need to try to use the ecto sql builder api directly since it protects against sql injections natively
| PolicyBinding.dangling() | ||
| |> PolicyBinding.ordered(asc: :id) | ||
| |> Repo.stream(method: :keyset) | ||
| |> Stream.chunk_every(100) |
There was a problem hiding this comment.
we should throttle this (should be examples elsewhere)
lib/console/schema/policy_binding.ex
Outdated
|
|
||
| sql = """ | ||
| SELECT pb.id FROM policy_bindings pb | ||
| WHERE #{where_clause} |
There was a problem hiding this comment.
i feel like this almost certainly has some sort of sql injection vulnerability, although its got to be pretty hard to construct. See if there's a way to sql escape that where clause, but also i'm pretty sure you can use the proper query builder with dynamics maybe for it.
lib/console/schema/policy_binding.ex
Outdated
| ORDER BY table_name, column_name | ||
| """ | ||
|
|
||
| Repo.query!(query, [], timeout: 30_000).rows |
There was a problem hiding this comment.
try to use query dsl here, but also don't have this query run every time you fetch dangling ids, probably refactor it so it can be called independently and have the table names passed to the dangling ids function each time (don't want to make 2*n queries instead of n)
c7ee50e to
dc777f6
Compare
lib/console/schema/policy_binding.ex
Outdated
| table_columns | ||
| |> Enum.flat_map(fn {table, columns} -> | ||
| columns | ||
| |> Enum.map(fn col -> |
There was a problem hiding this comment.
make this a flat_map instead of piping to List.flatten
lib/console/schema/policy_binding.ex
Outdated
| @doc """ | ||
| Returns IDs of policy bindings whose policy_id is not in the given set of referenced IDs. | ||
| """ | ||
| def unreferenced_binding_ids(referenced_ids) do |
There was a problem hiding this comment.
the referenced ids here should almost certainly be done with a subquery and join. If we move them into memory, it could be an unbounded set that either breaks sql (max query size) or breaks our server (OOM).
Look at ecto subquery to figure out how to make the join work (you should still fetch the table names into memory first and build the subquery with them)
dc777f6 to
5ee096a
Compare
gc cron job for dangling policy_bindings entries
Test Plan
unit test
Checklist
Plural Flow: console