Skip to content

Fix N+1 query in buildStopList — replace per-stop GetRouteIDsForStop …#305

Open
sreenivasivbieb wants to merge 2 commits intoOneBusAway:mainfrom
sreenivasivbieb:fix/n-plus-one-buildstoplist
Open

Fix N+1 query in buildStopList — replace per-stop GetRouteIDsForStop …#305
sreenivasivbieb wants to merge 2 commits intoOneBusAway:mainfrom
sreenivasivbieb:fix/n-plus-one-buildstoplist

Conversation

@sreenivasivbieb
Copy link
Contributor

…with batch GetRoutesForStops

Fixes #304

  • Modified buildStopList to collect all stop IDs first
  • Replaced loop calling GetRouteIDsForStop for each stop with single GetRoutesForStops batch query
  • Built stopRouteMap in memory for O(1) lookups instead of database queries in loop

Performance improvement for 40 stops:

  • Before: 41 queries (1 GetStopsByIDs + 40 GetRouteIDsForStop)
  • After: 2 queries (1 GetStopsByIDs + 1 GetRoutesForStops)

@sreenivasivbieb
Copy link
Contributor Author

@aaronbrethorst pls review this pr

Copy link
Member

@aaronbrethorst aaronbrethorst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Sreenivasiv, thanks for working on this N+1 query optimization — reducing 41 queries down to 2 is exactly the kind of performance work that makes a real difference. The approach of batching the route lookups is the right idea. Before we can merge this, I will need you to make a couple changes:

Critical

  1. Wrong batch query — route IDs are in the wrong format. The original GetRouteIDsForStop returns combined IDs like "25_1234" (via routes.agency_id || '_' || routes.id). The new code uses GetRoutesForStops, which returns a GetRoutesForStopsRow where .ID is just the raw route ID like "1234" — no agency prefix. This means every stop in the API response will have incorrectly formatted routeIds.

    There's already a batch query designed for exactly this use case: GetRouteIDsForStops (note the plural). It returns combined IDs just like the original single query. You'd use it like:

    routeIDsForStops, err := rb.api.GtfsManager.GtfsDB.Queries.GetRouteIDsForStops(rb.ctx, stopIDs)
    // ...
    for _, r := range routeIDsForStops {
        stopRouteMap[r.StopID] = append(stopRouteMap[r.StopID], r.RouteID.(string))
    }
  2. rb.presentRoutes is no longer populated from stop routes. The old code called processRouteIds(), which had a critical side effect — it registered each route in rb.presentRoutes:

    rb.presentRoutes[routeId] = models.Route{}  // This is gone now

    The new code bypasses processRouteIds entirely, so routes discovered through stops never get registered. This means collectAgenciesAndRoutes() won't know about them, and the API response's references.routes array will be missing routes that serve stops but aren't associated with a currently active trip. Clients will see route IDs on stops that don't exist in the references section.

    After building stopRouteMap, you'll need to register each route:

    for _, routeIDs := range stopRouteMap {
        for _, routeID := range routeIDs {
            rb.presentRoutes[routeID] = models.Route{}
        }
    }

Important

  1. processRouteIds becomes dead code. After this PR, processRouteIds is defined but never called. make lint will flag this as unused, which means the lint check required by our contributing guidelines won't pass. Once you've addressed the items above and preserved its side effects inline, please remove processRouteIds entirely.

  2. Error handling is less resilient. The old code used continue on per-stop errors — if one stop's route lookup failed, the others still got processed. The new code uses a bare return on the batch error, which silently drops all stops from the response with no logging. While batch queries are less likely to have partial failures, please at minimum log the error before returning:

    if err != nil {
        logging.LogError(rb.api.Logger, "failed to batch fetch routes for stops", err)
        return
    }

Fit and Finish

  1. Blank line removed between buildStopList and processRouteIds. The diff shows a blank line was removed between these two functions. This is minor, but go fmt should take care of it — please run go fmt ./... before committing, per our contributing guidelines.

Since this PR will require changes, you'll need to rebase against main before merging. If you haven't rebased before, I wrote a short guide that might help: https://www.brethorsting.com/blog/2026/01/git-rebase-for-the-terrified/

Thanks again, and I look forward to merging this change.

…with batch GetRoutesForStops

Fixes OneBusAway#304

- Modified buildStopList to collect all stop IDs first
- Replaced loop calling GetRouteIDsForStop for each stop with single GetRoutesForStops batch query
- Built stopRouteMap in memory for O(1) lookups instead of database queries in loop

Performance improvement for 40 stops:
- Before: 41 queries (1 GetStopsByIDs + 40 GetRouteIDsForStop)
- After: 2 queries (1 GetStopsByIDs + 1 GetRoutesForStops)
@sreenivasivbieb sreenivasivbieb force-pushed the fix/n-plus-one-buildstoplist branch from dd29593 to f31ee3a Compare February 7, 2026 14:32
Copilot AI review requested due to automatic review settings February 7, 2026 14:32
@sreenivasivbieb
Copy link
Contributor Author

@aaronbrethorst pls review the changes i made and merge with main branch

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to eliminate an N+1 query pattern in referenceBuilder.buildStopList (used by the trips-for-location endpoint) by replacing per-stop route lookups with a single batch query and in-memory stop→routes mapping (Fixes #304).

Changes:

  • Collect stop IDs up front and replace per-stop GetRouteIDsForStop calls with a single batch query.
  • Build an in-memory stopRouteMap for O(1) route lookups per stop.
  • Expand route “presence” registration based on routes returned for stops.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +376 to +381
// This registers all the routes that were in stopRouteMap as present, even if they don't have a trip in the current set. This ensures that any route that has a stop in the area is included in the references, even if no active trips for that route are currently visible.
for _, routeIDs := range stopRouteMap {
for _, routeID := range routeIDs {
rb.presentRoutes[routeID] = models.Route{}
}
}
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The route IDs coming from GetRouteIDsForStops are the combined agency_id + '_' + route.id values, but presentRoutes is keyed by raw routes.id (it’s later used with GetRoutesByIDs and looked up via tripDetails.RouteID). Storing combined IDs here means these “stop-only” routes won’t be fetched/returned in references, despite the comment. Consider switching to GetRoutesForStops and keying presentRoutes by raw route ID while separately building combined IDs for Stop.RouteIDs.

Copilot uses AI. Check for mistakes.
Comment on lines +365 to 369
routesForStops, err := rb.api.GtfsManager.GtfsDB.Queries.GetRouteIDsForStops(rb.ctx, stopIDs)
if err != nil {
logging.LogError(rb.api.Logger, "failed to batch fetch routes for stops", err)
return
}
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On batch query failure you return from buildStopList, which drops all stops from the references (and doesn’t propagate an error). Previously, a per-stop failure only skipped that stop. Either return an error up to build() or degrade gracefully by still appending stops with empty RouteIDs when the batch query fails.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot open a new pull request to apply changes based on this feedback

Comment on lines 357 to 389
func (rb *referenceBuilder) buildStopList(stops []gtfsdb.Stop) {
rb.stopList = make([]models.Stop, 0, len(stops))

stopIDs := make([]string, 0, len(stops))
for _, stop := range stops {
routeIds, err := rb.api.GtfsManager.GtfsDB.Queries.GetRouteIDsForStop(rb.ctx, stop.ID)
if err != nil {
continue
}
stopIDs = append(stopIDs, stop.ID)
}

routeIdsString := rb.processRouteIds(routeIds)
rb.stopList = append(rb.stopList, rb.createStop(stop, routeIdsString))
routesForStops, err := rb.api.GtfsManager.GtfsDB.Queries.GetRouteIDsForStops(rb.ctx, stopIDs)
if err != nil {
logging.LogError(rb.api.Logger, "failed to batch fetch routes for stops", err)
return
}
}

func (rb *referenceBuilder) processRouteIds(routeIds []interface{}) []string {
routeIdsString := make([]string, len(routeIds))
for i, id := range routeIds {
routeId := id.(string)
rb.presentRoutes[routeId] = models.Route{}
routeIdsString[i] = routeId
stopRouteMap := make(map[string][]string)

for _, r := range routesForStops {
stopRouteMap[r.StopID] = append(stopRouteMap[r.StopID], r.ID)
}
// This registers all the routes that were in stopRouteMap as present, even if they don't have a trip in the current set. This ensures that any route that has a stop in the area is included in the references, even if no active trips for that route are currently visible.
for _, routeIDs := range stopRouteMap {
for _, routeID := range routeIDs {
rb.presentRoutes[routeID] = models.Route{}
}
}
for _, stop := range stops {
routeIdsString := stopRouteMap[stop.ID]
if routeIdsString == nil {
routeIdsString = []string{}
}
rb.stopList = append(rb.stopList, rb.createStop(stop, routeIdsString))
}
return routeIdsString
}
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are handler tests for trips-for-location, but none assert that the response references includes the expected stops/routes after this refactor. Adding an assertion around data["references"].(map).Routes/Stops (and that route IDs are non-null slices) would help prevent regressions in this batch mapping logic.

Copilot uses AI. Check for mistakes.
stopRouteMap := make(map[string][]string)

for _, r := range routesForStops {
stopRouteMap[r.StopID] = append(stopRouteMap[r.StopID], r.ID)
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetRouteIDsForStops returns rows with fields RouteID (interface{}) and StopID; there is no ID field. This will not compile and also needs a type assertion (as done in other handlers) before appending into map[string][]string.

Suggested change
stopRouteMap[r.StopID] = append(stopRouteMap[r.StopID], r.ID)
routeID, ok := r.RouteID.(string)
if !ok {
continue
}
stopRouteMap[r.StopID] = append(stopRouteMap[r.StopID], routeID)

Copilot uses AI. Check for mistakes.
Copy link
Member

@aaronbrethorst aaronbrethorst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Sreenivasiv, you're making good progress on this N+1 optimization. You've addressed items 2–5 from my first review — presentRoutes is now populated, processRouteIds is removed, the error is logged, and the formatting is clean. However, the critical item from last time is still not quite right:

Critical

1. r.ID doesn't exist on GetRouteIDsForStopsRow — this won't compile

You've switched to the correct query (GetRouteIDsForStops), but the code accesses r.ID on the result:

stopRouteMap[r.StopID] = append(stopRouteMap[r.StopID], r.ID)

GetRouteIDsForStopsRow has two fields: RouteID interface{} and StopID string — there is no ID field. This will fail to compile. Additionally, since RouteID is typed as interface{}, you need a type assertion. The fix:

stopRouteMap[r.StopID] = append(stopRouteMap[r.StopID], r.RouteID.(string))

Please also verify this compiles locally by running make build before pushing.

Important

2. Squash and rebase

The branch has two commits and is seven commits behind main. Please squash into a single atomic commit and rebase against main. If rebasing feels unfamiliar, here's the guide from last time: https://www.brethorsting.com/blog/2026/01/git-rebase-for-the-terrified/

It's OK to use AI tools to build new features and bug fixes, but you must make sure you understand everything that the AI tool is doing, using it as an enhancement, not as a crutch. Please read our policy on AI contributions. https://opentransitsoftwarefoundation.org/2025/12/our-policy-on-ai-generated-contributions/

Thanks again, and I look forward to merging this change.

@chokoskoder
Copy link

hey I saw that this problem wasnt merged properly 2 weeks ago , can I take it up and generate a commit pr for this ?

@sreenivasivbieb
Copy link
Contributor Author

If you want to do it you can proceed on with it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix N+1 query in buildStopList — replace per-stop GetRouteIDsForStop with batch GetRoutesForStops

4 participants