Fix N+1 query in buildStopList — replace per-stop GetRouteIDsForStop …#305
Fix N+1 query in buildStopList — replace per-stop GetRouteIDsForStop …#305sreenivasivbieb wants to merge 2 commits intoOneBusAway:mainfrom
Conversation
|
@aaronbrethorst pls review this pr |
aaronbrethorst
left a comment
There was a problem hiding this comment.
Hey Sreenivasiv, thanks for working on this N+1 query optimization — reducing 41 queries down to 2 is exactly the kind of performance work that makes a real difference. The approach of batching the route lookups is the right idea. Before we can merge this, I will need you to make a couple changes:
Critical
-
Wrong batch query — route IDs are in the wrong format. The original
GetRouteIDsForStopreturns combined IDs like"25_1234"(viaroutes.agency_id || '_' || routes.id). The new code usesGetRoutesForStops, which returns aGetRoutesForStopsRowwhere.IDis just the raw route ID like"1234"— no agency prefix. This means every stop in the API response will have incorrectly formattedrouteIds.There's already a batch query designed for exactly this use case:
GetRouteIDsForStops(note the plural). It returns combined IDs just like the original single query. You'd use it like:routeIDsForStops, err := rb.api.GtfsManager.GtfsDB.Queries.GetRouteIDsForStops(rb.ctx, stopIDs) // ... for _, r := range routeIDsForStops { stopRouteMap[r.StopID] = append(stopRouteMap[r.StopID], r.RouteID.(string)) }
-
rb.presentRoutesis no longer populated from stop routes. The old code calledprocessRouteIds(), which had a critical side effect — it registered each route inrb.presentRoutes:rb.presentRoutes[routeId] = models.Route{} // This is gone now
The new code bypasses
processRouteIdsentirely, so routes discovered through stops never get registered. This meanscollectAgenciesAndRoutes()won't know about them, and the API response'sreferences.routesarray will be missing routes that serve stops but aren't associated with a currently active trip. Clients will see route IDs on stops that don't exist in the references section.After building
stopRouteMap, you'll need to register each route:for _, routeIDs := range stopRouteMap { for _, routeID := range routeIDs { rb.presentRoutes[routeID] = models.Route{} } }
Important
-
processRouteIdsbecomes dead code. After this PR,processRouteIdsis defined but never called.make lintwill flag this as unused, which means the lint check required by our contributing guidelines won't pass. Once you've addressed the items above and preserved its side effects inline, please removeprocessRouteIdsentirely. -
Error handling is less resilient. The old code used
continueon per-stop errors — if one stop's route lookup failed, the others still got processed. The new code uses a barereturnon the batch error, which silently drops all stops from the response with no logging. While batch queries are less likely to have partial failures, please at minimum log the error before returning:if err != nil { logging.LogError(rb.api.Logger, "failed to batch fetch routes for stops", err) return }
Fit and Finish
- Blank line removed between
buildStopListandprocessRouteIds. The diff shows a blank line was removed between these two functions. This is minor, butgo fmtshould take care of it — please rungo fmt ./...before committing, per our contributing guidelines.
Since this PR will require changes, you'll need to rebase against main before merging. If you haven't rebased before, I wrote a short guide that might help: https://www.brethorsting.com/blog/2026/01/git-rebase-for-the-terrified/
Thanks again, and I look forward to merging this change.
…with batch GetRoutesForStops Fixes OneBusAway#304 - Modified buildStopList to collect all stop IDs first - Replaced loop calling GetRouteIDsForStop for each stop with single GetRoutesForStops batch query - Built stopRouteMap in memory for O(1) lookups instead of database queries in loop Performance improvement for 40 stops: - Before: 41 queries (1 GetStopsByIDs + 40 GetRouteIDsForStop) - After: 2 queries (1 GetStopsByIDs + 1 GetRoutesForStops)
dd29593 to
f31ee3a
Compare
|
@aaronbrethorst pls review the changes i made and merge with main branch |
There was a problem hiding this comment.
Pull request overview
This PR aims to eliminate an N+1 query pattern in referenceBuilder.buildStopList (used by the trips-for-location endpoint) by replacing per-stop route lookups with a single batch query and in-memory stop→routes mapping (Fixes #304).
Changes:
- Collect stop IDs up front and replace per-stop
GetRouteIDsForStopcalls with a single batch query. - Build an in-memory
stopRouteMapfor O(1) route lookups per stop. - Expand route “presence” registration based on routes returned for stops.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // This registers all the routes that were in stopRouteMap as present, even if they don't have a trip in the current set. This ensures that any route that has a stop in the area is included in the references, even if no active trips for that route are currently visible. | ||
| for _, routeIDs := range stopRouteMap { | ||
| for _, routeID := range routeIDs { | ||
| rb.presentRoutes[routeID] = models.Route{} | ||
| } | ||
| } |
There was a problem hiding this comment.
The route IDs coming from GetRouteIDsForStops are the combined agency_id + '_' + route.id values, but presentRoutes is keyed by raw routes.id (it’s later used with GetRoutesByIDs and looked up via tripDetails.RouteID). Storing combined IDs here means these “stop-only” routes won’t be fetched/returned in references, despite the comment. Consider switching to GetRoutesForStops and keying presentRoutes by raw route ID while separately building combined IDs for Stop.RouteIDs.
| routesForStops, err := rb.api.GtfsManager.GtfsDB.Queries.GetRouteIDsForStops(rb.ctx, stopIDs) | ||
| if err != nil { | ||
| logging.LogError(rb.api.Logger, "failed to batch fetch routes for stops", err) | ||
| return | ||
| } |
There was a problem hiding this comment.
On batch query failure you return from buildStopList, which drops all stops from the references (and doesn’t propagate an error). Previously, a per-stop failure only skipped that stop. Either return an error up to build() or degrade gracefully by still appending stops with empty RouteIDs when the batch query fails.
There was a problem hiding this comment.
@copilot open a new pull request to apply changes based on this feedback
| func (rb *referenceBuilder) buildStopList(stops []gtfsdb.Stop) { | ||
| rb.stopList = make([]models.Stop, 0, len(stops)) | ||
|
|
||
| stopIDs := make([]string, 0, len(stops)) | ||
| for _, stop := range stops { | ||
| routeIds, err := rb.api.GtfsManager.GtfsDB.Queries.GetRouteIDsForStop(rb.ctx, stop.ID) | ||
| if err != nil { | ||
| continue | ||
| } | ||
| stopIDs = append(stopIDs, stop.ID) | ||
| } | ||
|
|
||
| routeIdsString := rb.processRouteIds(routeIds) | ||
| rb.stopList = append(rb.stopList, rb.createStop(stop, routeIdsString)) | ||
| routesForStops, err := rb.api.GtfsManager.GtfsDB.Queries.GetRouteIDsForStops(rb.ctx, stopIDs) | ||
| if err != nil { | ||
| logging.LogError(rb.api.Logger, "failed to batch fetch routes for stops", err) | ||
| return | ||
| } | ||
| } | ||
|
|
||
| func (rb *referenceBuilder) processRouteIds(routeIds []interface{}) []string { | ||
| routeIdsString := make([]string, len(routeIds)) | ||
| for i, id := range routeIds { | ||
| routeId := id.(string) | ||
| rb.presentRoutes[routeId] = models.Route{} | ||
| routeIdsString[i] = routeId | ||
| stopRouteMap := make(map[string][]string) | ||
|
|
||
| for _, r := range routesForStops { | ||
| stopRouteMap[r.StopID] = append(stopRouteMap[r.StopID], r.ID) | ||
| } | ||
| // This registers all the routes that were in stopRouteMap as present, even if they don't have a trip in the current set. This ensures that any route that has a stop in the area is included in the references, even if no active trips for that route are currently visible. | ||
| for _, routeIDs := range stopRouteMap { | ||
| for _, routeID := range routeIDs { | ||
| rb.presentRoutes[routeID] = models.Route{} | ||
| } | ||
| } | ||
| for _, stop := range stops { | ||
| routeIdsString := stopRouteMap[stop.ID] | ||
| if routeIdsString == nil { | ||
| routeIdsString = []string{} | ||
| } | ||
| rb.stopList = append(rb.stopList, rb.createStop(stop, routeIdsString)) | ||
| } | ||
| return routeIdsString | ||
| } |
There was a problem hiding this comment.
There are handler tests for trips-for-location, but none assert that the response references includes the expected stops/routes after this refactor. Adding an assertion around data["references"].(map).Routes/Stops (and that route IDs are non-null slices) would help prevent regressions in this batch mapping logic.
| stopRouteMap := make(map[string][]string) | ||
|
|
||
| for _, r := range routesForStops { | ||
| stopRouteMap[r.StopID] = append(stopRouteMap[r.StopID], r.ID) |
There was a problem hiding this comment.
GetRouteIDsForStops returns rows with fields RouteID (interface{}) and StopID; there is no ID field. This will not compile and also needs a type assertion (as done in other handlers) before appending into map[string][]string.
| stopRouteMap[r.StopID] = append(stopRouteMap[r.StopID], r.ID) | |
| routeID, ok := r.RouteID.(string) | |
| if !ok { | |
| continue | |
| } | |
| stopRouteMap[r.StopID] = append(stopRouteMap[r.StopID], routeID) |
aaronbrethorst
left a comment
There was a problem hiding this comment.
Hey Sreenivasiv, you're making good progress on this N+1 optimization. You've addressed items 2–5 from my first review — presentRoutes is now populated, processRouteIds is removed, the error is logged, and the formatting is clean. However, the critical item from last time is still not quite right:
Critical
1. r.ID doesn't exist on GetRouteIDsForStopsRow — this won't compile
You've switched to the correct query (GetRouteIDsForStops), but the code accesses r.ID on the result:
stopRouteMap[r.StopID] = append(stopRouteMap[r.StopID], r.ID)GetRouteIDsForStopsRow has two fields: RouteID interface{} and StopID string — there is no ID field. This will fail to compile. Additionally, since RouteID is typed as interface{}, you need a type assertion. The fix:
stopRouteMap[r.StopID] = append(stopRouteMap[r.StopID], r.RouteID.(string))Please also verify this compiles locally by running make build before pushing.
Important
2. Squash and rebase
The branch has two commits and is seven commits behind main. Please squash into a single atomic commit and rebase against main. If rebasing feels unfamiliar, here's the guide from last time: https://www.brethorsting.com/blog/2026/01/git-rebase-for-the-terrified/
It's OK to use AI tools to build new features and bug fixes, but you must make sure you understand everything that the AI tool is doing, using it as an enhancement, not as a crutch. Please read our policy on AI contributions. https://opentransitsoftwarefoundation.org/2025/12/our-policy-on-ai-generated-contributions/
Thanks again, and I look forward to merging this change.
|
hey I saw that this problem wasnt merged properly 2 weeks ago , can I take it up and generate a commit pr for this ? |
|
If you want to do it you can proceed on with it |
…with batch GetRoutesForStops
Fixes #304
Performance improvement for 40 stops: