-
-
Notifications
You must be signed in to change notification settings - Fork 597
Description
Bug description
When investigating why my default.json search index which is 1.6MB was taking 3-4 seconds to load even with a simple:
return \Statamic\Facades\Search::index($index)
->ensureExists()
->search($query)
->limit($limit)
->get();I got digging and realised that getBaseItems in QueryBuilder hydrates all results and transforms them before applying the limit. This was crazy to me because the lookup() returns results already sorted by score (highest first). So when we do ->take(20), we get the top 20 most relevant results.
I came up with this (rough-ish) solution which got my search results down to roughly 200ms - not great but much better than 3-4s:
// Get the search index and perform raw lookup (fast - searches JSON file)
$searchIndex = StatamicSearchFacade::index($index);
$searchIndex->ensureExists();
$rawResults = $searchIndex->lookup($query);
if ($rawResults->isEmpty()) {
return collect();
}
// Limit results BEFORE hydration to avoid loading hundreds of entries
$limitedResults = $rawResults->take($limit);
// Extract entry IDs from the limited results
$ids = $limitedResults
->map(fn ($result) => Str::after($result['reference'], 'entry::'))
->values()
->all();
// Hydrate only the entries we need
$entries = Entry::query()->whereIn('id', $ids)->get()->keyBy->id();
// Return entries in the same order as search results (by relevance score)
return collect($ids)
->map(fn ($id) => $entries->get($id))
->filter();Then I got digging more and noticed in the IteratorBuilder the getFilteredItems always hydrates all results for everything too (entries etc). I understand due to being flat file, you have to do this in case it has orderBy or no limit, but you could potentially improve it to be smarter via batching if there were limit + where without order by. This would instantly fix a lot of search:
protected function getFilteredItems()
{
// No limit or has orderBy - can't optimize, need all items
if (!$this->limit || $this->orderBys) {
$items = $this->getBaseItems();
return $this->filterWheres($items);
}
// No wheres - just limit base items directly
if (empty($this->wheres)) {
return $this->getBaseItems()->slice($this->offset ?? 0, $this->limit);
}
// Has limit AND wheres - batch hydrate until we have enough
return $this->getFilteredItemsInBatches();
}This would be much faster on large sites on many queries while still supporting the current approach.
Let me know your thoughts - I'd be happy to give a PR a go, just wanted to get your thoughts on the issue before I went down a rabbit hole.
How to reproduce
return \Statamic\Facades\Search::index($index)
->ensureExists()
->search($query)
->limit($limit)
->get();You'd need ~1000 basic entries with title + content to see that .
Environment
Environment
Application Name: xxxx
Laravel Version: 12.40.2
PHP Version: 8.4.6
Composer Version: 2.8.4
Environment: production
Debug Mode: OFF
URL: xxxx
Maintenance Mode: OFF
Timezone: Australia/Brisbane
Locale: en
Cache
Config: CACHED
Events: CACHED
Routes: CACHED
Views: CACHED
Drivers
Broadcasting: log
Cache: redis
Database: mysql
Logs: stack / daily
Mail: smtp
Queue: redis
Session: redis
Storage
public/storage: NOT LINKED
Livewire
Livewire: v3.7.0
Statamic
Addons: 7
Sites: 1
Stache Watcher: Disabled (auto)
Static Caching: full
Version: 5.69.0 PRO
Statamic Addons
aerni/advanced-seo: 2.13.2
aerni/livewire-forms: 9.10.1
digital-bird/statamic-toc: 1.0.2
marcorieser/statamic-livewire: 4.5.3
mitydigital/sitemapamic: 3.4.1
mitydigital/statamic-two-factor: 2.7.3Installation
Fresh statamic/statamic site via CLI
Additional details
No response