Skip to content

Comments

initial scim implementation#75

Open
fcollman wants to merge 49 commits intomasterfrom
scim
Open

initial scim implementation#75
fcollman wants to merge 49 commits intomasterfrom
scim

Conversation

@fcollman
Copy link
Collaborator

@fcollman fcollman commented Feb 6, 2026

this adds a blueprint to middle_auth to implement SCIM 2.0 (https://scim.cloud/) to enable middle_auth users to be provisioned and arranged into groups by a SCIM compatible server.


Note

High Risk
Adds new externally-facing provisioning endpoints and modifies database schema/migrations plus auth gating behavior, so mistakes could impact user/group/dataset integrity or allow unintended access if misconfigured.

Overview
Introduces a new neuroglancer_auth.scim blueprint mounted at /{URL_PREFIX}/scim/v2 implementing SCIM 2.0 discovery endpoints and full CRUD/PATCH behavior for Users, Groups, and a custom Dataset resource, including SCIM-compliant error responses and RFC-style filtering.

Adds persistent SCIM identifiers by extending the user, group, and dataset models/tables with scim_id (deterministic UUID5) and optional external_id, plus Alembic migrations to backfill existing rows and create indexes/uniqueness constraints. Updates creation flows to auto-generate scim_id on insert and expands User.update() to allow email changes.

Adds SCIM integration test infrastructure: GitHub Actions workflow (Postgres/Redis services) to run migrations, start the app with auth bypass, and execute pytest integration tests; plus Docker Compose + dedicated test runner image/config and new test dependencies (scim2-filter-parser, pytest, requests).

Written by Cursor Bugbot for commit b99ac5b. This will update automatically on new commits. Configure here.

"ne": lambda attr, val: attr != val,
"co": lambda attr, val: attr.ilike(f"%{val}%"), # contains (case-insensitive)
"sw": lambda attr, val: attr.ilike(f"{val}%"), # starts with
"ew": lambda attr, val: attr.ilike(f"%{val}"), # ends with
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LIKE wildcards in filter values not escaped

Low Severity

The co, sw, and ew filter operators pass user-supplied values directly into ilike patterns without escaping SQL LIKE wildcards (% and _). A filter like userName co "%" generates the pattern %%%, matching all records instead of only those containing a literal percent sign. This produces incorrect filter results.

Fix in Cursor Fix in Web


def update(self, data):
user_fields = ["admin", "name", "pi", "gdpr_consent", "read_only"]
user_fields = ["admin", "name", "pi", "gdpr_consent", "read_only", "email"]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding email to update fields affects existing endpoints

Medium Severity

Adding "email" to user_fields in User.update() changes behavior of the existing modify_user_route and modify_service_account_route admin endpoints, which pass raw request JSON to user.update(data). Previously, an email key in the request body was silently ignored; now it updates the email. Those endpoints don't catch IntegrityError, so a duplicate email triggers an unhandled 500 error. This also subtly broadens what admin PUT requests can modify.

Additional Locations (2)

Fix in Cursor Fix in Web

UserGroup.add(user.id, group.id)
user.update_cache() # Update user cache after group change
except sqlalchemy.exc.IntegrityError:
pass # Already in group
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing session rollback after caught IntegrityError breaks subsequent operations

High Severity

When UserGroup.add(), GroupDatasetPermission.add(), or ServiceTable.add() throws an IntegrityError (because the record already exists), their internal db.session.commit() fails and puts the SQLAlchemy session into a "pending rollback" state. The except IntegrityError: pass blocks never call db.session.rollback(), so any subsequent database operation (e.g., the next loop iteration's find_user_by_scim_identifier query) raises PendingRollbackError, crashing the request with a 500 error. This pattern is repeated across ~12 locations in this file.

Additional Locations (2)

Fix in Cursor Fix in Web

op.drop_index("ix_group_external_id", table_name="group")
op.drop_index("ix_group_scim_id", table_name="group")
op.drop_index("ix_user_external_id", table_name="user")
op.drop_index("ix_user_scim_id", table_name="user")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First migration downgrade drops indexes from second migration

Medium Severity

The downgrade() of migration 15ae05f61e12 drops indexes (e.g., ix_user_scim_id) that are only created by the subsequent migration a1b2c3d4e5f6. When rolling back sequentially, the second migration's downgrade() drops these indexes first, then the first migration's downgrade() tries to drop them again, causing a failure. The index drops belong only in the second migration's downgrade().

Additional Locations (1)

Fix in Cursor Fix in Web

ug = UserGroup.get(group.id, member_user.id)
if ug:
db.session.delete(ug)
affected_users.add(member_user.id)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove-all-members PATCH misses path == "members" with no value

Medium Severity

In patch_group, the "remove all members" logic (elif value is None at line 1281) is only reachable when path.startswith("members["), not when path == "members". Per SCIM RFC 7644, {"op": "remove", "path": "members"} (with no value) means remove all members. But when path == "members" and value is None, the isinstance(value, list) check on line 1275 is False, nothing happens, and the elif value is None branch is skipped because it's an elif to the if path == "members" branch.

Fix in Cursor Fix in Web

resource = DatasetSCIMSerializer.to_scim(dataset)
response = flask.jsonify(resource)
response.headers["Content-Type"] = "application/scim+json"
return response
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dataset PATCH operations lack atomicity and error handling

Medium Severity

Unlike patch_user and patch_group which wrap all operations in a try/except and commit once atomically at the end, patch_dataset commits after each individual operation (via dataset.update(), ServiceTable.add(), db.session.commit()). If a multi-operation PATCH partially fails, the database is left in an inconsistent state. There's also no error handling — any IntegrityError or other exception propagates as an unhandled 500 instead of a SCIM-compliant error response.

Fix in Cursor Fix in Web

connection.execute(
sa.text("UPDATE group SET scim_id = :scim_id WHERE id = :id"),
{"scim_id": scim_id, "id": group_id}
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Migration raw SQL uses unquoted reserved word group

High Severity

The raw SQL queries use unquoted group and user table names via sa.text(). In PostgreSQL (this project uses psycopg2), group is a fully reserved keyword (part of GROUP BY), so SELECT id FROM group and UPDATE group SET ... will raise a syntax error. The user table has the same risk. These identifiers need to be double-quoted in the raw SQL strings (e.g., "\"group\"", "\"user\"). Alembic's op.add_column() in the first migration handles quoting automatically, but sa.text() does not — it passes SQL through verbatim.

Additional Locations (1)

Fix in Cursor Fix in Web


def update(self, data):
user_fields = ["admin", "name", "pi", "gdpr_consent", "read_only"]
user_fields = ["admin", "name", "pi", "gdpr_consent", "read_only", "email"]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding email to User.update() changes existing API behavior

Medium Severity

Adding "email" to user_fields in User.update() introduces a side effect: existing non-SCIM admin routes (modify_user_route, modify_service_account_route) now accept email changes from the raw JSON request body. Those routes pass flask.request.json directly to user.update() without checking email uniqueness, so a duplicate email will trigger an unhandled IntegrityError (500). Previously, email in the request body was silently ignored.

Additional Locations (1)

Fix in Cursor Fix in Web


def update(self, data):
user_fields = ["admin", "name", "pi", "gdpr_consent", "read_only"]
user_fields = ["admin", "name", "pi", "gdpr_consent", "read_only", "email"]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Email updates now bypass endpoint safeguards

Medium Severity

Adding email to User.update() makes legacy admin endpoints mutate email through a generic path that lacks duplicate-email handling. A conflicting update now raises a database IntegrityError outside SCIM handlers, turning a normal validation case into a server error path.

Fix in Cursor Fix in Web

service_name=st["serviceName"],
table_name=st["tableName"],
dataset=dataset.name,
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dataset creation is not atomic

Medium Severity

create_dataset() persists the dataset via create_dataset_with_scim() before creating ServiceTable rows. If a later ServiceTable.add() fails, the request can return an error even though the dataset was already committed, leaving partial state and misleading conflict behavior.

Additional Locations (1)

Fix in Cursor Fix in Web

_handle_user_add(user, path, value)
elif op_type == "remove":
_handle_user_remove(user, path, value)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalid PATCH operations succeed silently

Medium Severity

PATCH handlers never validate op values. Unknown operations are skipped and the request still returns success, so clients can believe updates were applied when nothing changed. This produces silent data drift for Users, Groups, and Datasets.

Additional Locations (2)

Fix in Cursor Fix in Web

service_name=st["serviceName"],
table_name=st["tableName"],
dataset=dataset.name,
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Malformed serviceTables can trigger server error

Low Severity

create_dataset() indexes serviceTables entries with st["serviceName"] and st["tableName"] without validating structure. Missing keys raise KeyError, which is not caught in this handler, causing a 500 instead of a SCIM client error response.

Fix in Cursor Fix in Web


base_url = os.environ.get("SCIM_BASE_URL")
if base_url:
return base_url.rstrip("/")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SCIM base URL duplicates version segment

Medium Severity

When SCIM_BASE_URL already includes /v2, get_base_url() returns it unchanged and serializers append another /v2, producing invalid meta.location and Location URLs like /v2/v2/.... This breaks resource references returned by POST and discovery responses.

Additional Locations (1)

Fix in Cursor Fix in Web

return build_error_response(404, "NOT_FOUND", f"User {scim_id} not found")


def _sanitize_pi_field(pi_value):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems verbose

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.


# If it's an auth error (400/401/403), ensure it's SCIM-formatted
if status_code in [400, 401, 403]:
# Check if already SCIM-formatted
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SCIM auth can return OAuth redirects

Medium Severity

scim_auth_required delegates to auth_required, but only rewrites 400/401/403 responses. When no Authorization header is sent, auth_required can return a 302 OAuth redirect, so SCIM endpoints emit non-SCIM redirects instead of SCIM auth errors.

Fix in Cursor Fix in Web


base_url = os.environ.get("SCIM_BASE_URL")
if base_url:
return base_url.rstrip("/")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SCIM locations can include duplicated version segment

Medium Severity

get_base_url returns SCIM_BASE_URL as-is, but serializers/routes append "/v2/..." to that value. With SCIM_BASE_URL set to a value already ending in /v2, generated meta.location URLs become .../v2/v2/..., producing invalid resource links.

Additional Locations (1)

Fix in Cursor Fix in Web

@chrisj
Copy link
Collaborator

chrisj commented Feb 17, 2026

test is failing for looking for test_config.py in neuroglancer_auth

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants