Conversation
neuroglancer_auth/scim/filter.py
Outdated
| "ne": lambda attr, val: attr != val, | ||
| "co": lambda attr, val: attr.ilike(f"%{val}%"), # contains (case-insensitive) | ||
| "sw": lambda attr, val: attr.ilike(f"{val}%"), # starts with | ||
| "ew": lambda attr, val: attr.ilike(f"%{val}"), # ends with |
There was a problem hiding this comment.
LIKE wildcards in filter values not escaped
Low Severity
The co, sw, and ew filter operators pass user-supplied values directly into ilike patterns without escaping SQL LIKE wildcards (% and _). A filter like userName co "%" generates the pattern %%%, matching all records instead of only those containing a literal percent sign. This produces incorrect filter results.
|
|
||
| def update(self, data): | ||
| user_fields = ["admin", "name", "pi", "gdpr_consent", "read_only"] | ||
| user_fields = ["admin", "name", "pi", "gdpr_consent", "read_only", "email"] |
There was a problem hiding this comment.
Adding email to update fields affects existing endpoints
Medium Severity
Adding "email" to user_fields in User.update() changes behavior of the existing modify_user_route and modify_service_account_route admin endpoints, which pass raw request JSON to user.update(data). Previously, an email key in the request body was silently ignored; now it updates the email. Those endpoints don't catch IntegrityError, so a duplicate email triggers an unhandled 500 error. This also subtly broadens what admin PUT requests can modify.
Additional Locations (2)
neuroglancer_auth/scim/routes.py
Outdated
| UserGroup.add(user.id, group.id) | ||
| user.update_cache() # Update user cache after group change | ||
| except sqlalchemy.exc.IntegrityError: | ||
| pass # Already in group |
There was a problem hiding this comment.
Missing session rollback after caught IntegrityError breaks subsequent operations
High Severity
When UserGroup.add(), GroupDatasetPermission.add(), or ServiceTable.add() throws an IntegrityError (because the record already exists), their internal db.session.commit() fails and puts the SQLAlchemy session into a "pending rollback" state. The except IntegrityError: pass blocks never call db.session.rollback(), so any subsequent database operation (e.g., the next loop iteration's find_user_by_scim_identifier query) raises PendingRollbackError, crashing the request with a 500 error. This pattern is repeated across ~12 locations in this file.
Additional Locations (2)
migrations/versions/a1b2c3d4e5f6_populate_scim_ids_and_add_constraints.py
Show resolved
Hide resolved
| op.drop_index("ix_group_external_id", table_name="group") | ||
| op.drop_index("ix_group_scim_id", table_name="group") | ||
| op.drop_index("ix_user_external_id", table_name="user") | ||
| op.drop_index("ix_user_scim_id", table_name="user") |
There was a problem hiding this comment.
First migration downgrade drops indexes from second migration
Medium Severity
The downgrade() of migration 15ae05f61e12 drops indexes (e.g., ix_user_scim_id) that are only created by the subsequent migration a1b2c3d4e5f6. When rolling back sequentially, the second migration's downgrade() drops these indexes first, then the first migration's downgrade() tries to drop them again, causing a failure. The index drops belong only in the second migration's downgrade().
Additional Locations (1)
neuroglancer_auth/scim/routes.py
Outdated
| ug = UserGroup.get(group.id, member_user.id) | ||
| if ug: | ||
| db.session.delete(ug) | ||
| affected_users.add(member_user.id) |
There was a problem hiding this comment.
Remove-all-members PATCH misses path == "members" with no value
Medium Severity
In patch_group, the "remove all members" logic (elif value is None at line 1281) is only reachable when path.startswith("members["), not when path == "members". Per SCIM RFC 7644, {"op": "remove", "path": "members"} (with no value) means remove all members. But when path == "members" and value is None, the isinstance(value, list) check on line 1275 is False, nothing happens, and the elif value is None branch is skipped because it's an elif to the if path == "members" branch.
| resource = DatasetSCIMSerializer.to_scim(dataset) | ||
| response = flask.jsonify(resource) | ||
| response.headers["Content-Type"] = "application/scim+json" | ||
| return response |
There was a problem hiding this comment.
Dataset PATCH operations lack atomicity and error handling
Medium Severity
Unlike patch_user and patch_group which wrap all operations in a try/except and commit once atomically at the end, patch_dataset commits after each individual operation (via dataset.update(), ServiceTable.add(), db.session.commit()). If a multi-operation PATCH partially fails, the database is left in an inconsistent state. There's also no error handling — any IntegrityError or other exception propagates as an unhandled 500 instead of a SCIM-compliant error response.
| connection.execute( | ||
| sa.text("UPDATE group SET scim_id = :scim_id WHERE id = :id"), | ||
| {"scim_id": scim_id, "id": group_id} | ||
| ) |
There was a problem hiding this comment.
Migration raw SQL uses unquoted reserved word group
High Severity
The raw SQL queries use unquoted group and user table names via sa.text(). In PostgreSQL (this project uses psycopg2), group is a fully reserved keyword (part of GROUP BY), so SELECT id FROM group and UPDATE group SET ... will raise a syntax error. The user table has the same risk. These identifiers need to be double-quoted in the raw SQL strings (e.g., "\"group\"", "\"user\"). Alembic's op.add_column() in the first migration handles quoting automatically, but sa.text() does not — it passes SQL through verbatim.
Additional Locations (1)
|
|
||
| def update(self, data): | ||
| user_fields = ["admin", "name", "pi", "gdpr_consent", "read_only"] | ||
| user_fields = ["admin", "name", "pi", "gdpr_consent", "read_only", "email"] |
There was a problem hiding this comment.
Adding email to User.update() changes existing API behavior
Medium Severity
Adding "email" to user_fields in User.update() introduces a side effect: existing non-SCIM admin routes (modify_user_route, modify_service_account_route) now accept email changes from the raw JSON request body. Those routes pass flask.request.json directly to user.update() without checking email uniqueness, so a duplicate email will trigger an unhandled IntegrityError (500). Previously, email in the request body was silently ignored.
Additional Locations (1)
|
|
||
| def update(self, data): | ||
| user_fields = ["admin", "name", "pi", "gdpr_consent", "read_only"] | ||
| user_fields = ["admin", "name", "pi", "gdpr_consent", "read_only", "email"] |
There was a problem hiding this comment.
Email updates now bypass endpoint safeguards
Medium Severity
Adding email to User.update() makes legacy admin endpoints mutate email through a generic path that lacks duplicate-email handling. A conflicting update now raises a database IntegrityError outside SCIM handlers, turning a normal validation case into a server error path.
| service_name=st["serviceName"], | ||
| table_name=st["tableName"], | ||
| dataset=dataset.name, | ||
| ) |
There was a problem hiding this comment.
Dataset creation is not atomic
Medium Severity
create_dataset() persists the dataset via create_dataset_with_scim() before creating ServiceTable rows. If a later ServiceTable.add() fails, the request can return an error even though the dataset was already committed, leaving partial state and misleading conflict behavior.
Additional Locations (1)
| _handle_user_add(user, path, value) | ||
| elif op_type == "remove": | ||
| _handle_user_remove(user, path, value) | ||
|
|
There was a problem hiding this comment.
Invalid PATCH operations succeed silently
Medium Severity
PATCH handlers never validate op values. Unknown operations are skipped and the request still returns success, so clients can believe updates were applied when nothing changed. This produces silent data drift for Users, Groups, and Datasets.
Additional Locations (2)
| service_name=st["serviceName"], | ||
| table_name=st["tableName"], | ||
| dataset=dataset.name, | ||
| ) |
There was a problem hiding this comment.
Malformed serviceTables can trigger server error
Low Severity
create_dataset() indexes serviceTables entries with st["serviceName"] and st["tableName"] without validating structure. Missing keys raise KeyError, which is not caught in this handler, causing a 500 instead of a SCIM client error response.
|
|
||
| base_url = os.environ.get("SCIM_BASE_URL") | ||
| if base_url: | ||
| return base_url.rstrip("/") |
There was a problem hiding this comment.
SCIM base URL duplicates version segment
Medium Severity
When SCIM_BASE_URL already includes /v2, get_base_url() returns it unchanged and serializers append another /v2, producing invalid meta.location and Location URLs like /v2/v2/.... This breaks resource references returned by POST and discovery responses.
Additional Locations (1)
| return build_error_response(404, "NOT_FOUND", f"User {scim_id} not found") | ||
|
|
||
|
|
||
| def _sanitize_pi_field(pi_value): |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
|
|
||
| # If it's an auth error (400/401/403), ensure it's SCIM-formatted | ||
| if status_code in [400, 401, 403]: | ||
| # Check if already SCIM-formatted |
There was a problem hiding this comment.
|
|
||
| base_url = os.environ.get("SCIM_BASE_URL") | ||
| if base_url: | ||
| return base_url.rstrip("/") |
There was a problem hiding this comment.
SCIM locations can include duplicated version segment
Medium Severity
get_base_url returns SCIM_BASE_URL as-is, but serializers/routes append "/v2/..." to that value. With SCIM_BASE_URL set to a value already ending in /v2, generated meta.location URLs become .../v2/v2/..., producing invalid resource links.
Additional Locations (1)
|
test is failing for looking for test_config.py in neuroglancer_auth |


this adds a blueprint to middle_auth to implement SCIM 2.0 (https://scim.cloud/) to enable middle_auth users to be provisioned and arranged into groups by a SCIM compatible server.
Note
High Risk
Adds new externally-facing provisioning endpoints and modifies database schema/migrations plus auth gating behavior, so mistakes could impact user/group/dataset integrity or allow unintended access if misconfigured.
Overview
Introduces a new
neuroglancer_auth.scimblueprint mounted at/{URL_PREFIX}/scim/v2implementing SCIM 2.0 discovery endpoints and full CRUD/PATCH behavior for Users, Groups, and a custom Dataset resource, including SCIM-compliant error responses and RFC-style filtering.Adds persistent SCIM identifiers by extending the
user,group, anddatasetmodels/tables withscim_id(deterministic UUID5) and optionalexternal_id, plus Alembic migrations to backfill existing rows and create indexes/uniqueness constraints. Updates creation flows to auto-generatescim_idon insert and expandsUser.update()to allow email changes.Adds SCIM integration test infrastructure: GitHub Actions workflow (Postgres/Redis services) to run migrations, start the app with auth bypass, and execute
pytestintegration tests; plus Docker Compose + dedicated test runner image/config and new test dependencies (scim2-filter-parser,pytest,requests).Written by Cursor Bugbot for commit b99ac5b. This will update automatically on new commits. Configure here.