Open
Description
Proposal
For very large batch operations, the current implementation collects every result in memory before writing any output. This can cause memory exhaustion on sites with many entities.
Current Behavior
public function batch(...): array {
  $results = [];
  foreach ($entities as $entity) {
    $results[] = $this->collector->collectIntel($entity, [], $plugins); // Accumulates in memory
  }
  return $results; // Full array returned
}
With --limit=1000 on entities with rich field data, this could consume significant memory.
Proposed Enhancement
Add a ci:stream command or --stream option that outputs entities one at a time using JSON Lines format:
#[CLI\Command(name: 'ci:stream', aliases: ['cist'])]
public function stream(string $entity_type, array $options = [...]): void {
  // Process one entity at a time
  foreach ($this->getEntityIterator($entity_type, $options) as $entity) {
    $intel = $this->collector->collectIntel($entity, [], $plugins);
    // Output immediately as JSON line
    $this->output()->writeln(json_encode($intel));
    // $intel is not retained, so its memory can be freed after each iteration
  }
}
Benefits
- Memory efficiency: Constant memory usage regardless of batch size
- Streaming output: Results appear as they're processed
- Pipeline-friendly: JSON Lines format works with jq, head, tail, etc.
- Resilience: Partial results available even if the process is interrupted
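The constant-memory benefit depends on how getEntityIterator() loads entities, which the proposal does not spell out. A minimal sketch of one possible approach is below, assuming the iterator holds only IDs up front and loads full objects in fixed-size chunks. The function name iterateInChunks(), the $loadMultiple callable, and the fake loader are all hypothetical stand-ins, not part of the module; in Drupal the callable would presumably wrap the entity storage's loader.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for getEntityIterator(): yields items one at a time
 * while loading them in fixed-size chunks.
 *
 * @param int[]    $ids          All IDs to process (IDs are cheap to hold).
 * @param callable $loadMultiple fn(int[] $ids): array, loads one chunk of objects.
 * @param int      $chunkSize    Maximum number of loaded objects per chunk.
 */
function iterateInChunks(array $ids, callable $loadMultiple, int $chunkSize = 50): \Generator {
    foreach (array_chunk($ids, $chunkSize) as $chunk) {
        // Only one chunk of loaded objects is requested at a time.
        foreach ($loadMultiple($chunk) as $id => $entity) {
            yield $id => $entity;
        }
    }
}

// Usage with a fake loader (an assumption for illustration) that records
// the size of each chunk it was asked to load.
$chunkSizes = [];
$fakeLoader = function (array $ids) use (&$chunkSizes): array {
    $chunkSizes[] = count($ids);
    $loaded = [];
    foreach ($ids as $id) {
        $loaded[$id] = ['id' => $id, 'title' => "Item $id"];
    }
    return $loaded;
};

$seen = [];
foreach (iterateInChunks(range(1, 7), $fakeLoader, 3) as $id => $entity) {
    $seen[] = $id;
}
```

With this shape, peak memory is bounded by the chunk size rather than by --limit, provided the loader itself does not cache every object it has returned.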
Use Cases
- Exporting all content for AI training
- Generating sitemaps or content inventories
- Migration/sync pipelines
- Large-scale content analysis
Output Format
JSON Lines (one JSON object per line):
{"entity":{"entity_type":"node","id":"1",...},"fields":{...},"intel":{...}}
{"entity":{"entity_type":"node","id":"2",...},"fields":{...},"intel":{...}}
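As a rough illustration of the format, the sketch below writes two records as JSON Lines and reads them back one line at a time. The helper names toJsonLine() and readJsonLines() and the sample records are assumptions made for this example, not part of the module. It relies on json_encode() escaping newlines inside strings, so each record stays on a single line as long as JSON_PRETTY_PRINT is not used.

```php
<?php
declare(strict_types=1);

/** Encodes one record as a single JSON Lines row (no trailing newline). */
function toJsonLine(array $record): string {
    // JSON_THROW_ON_ERROR raises an exception on encoding failure
    // instead of returning false.
    return json_encode(
        $record,
        JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE
    );
}

/** Reads JSON Lines from a stream, yielding one decoded record at a time. */
function readJsonLines($stream): \Generator {
    while (($line = fgets($stream)) !== false) {
        $line = trim($line);
        if ($line === '') {
            continue; // Skip blank lines.
        }
        yield json_decode($line, true, 512, JSON_THROW_ON_ERROR);
    }
}

// Hypothetical sample data; the first title contains an embedded newline.
$records = [
    ['entity' => ['entity_type' => 'node', 'id' => '1'], 'fields' => ['title' => "Line one\nLine two"]],
    ['entity' => ['entity_type' => 'node', 'id' => '2'], 'fields' => ['title' => 'Second']],
];

$stream = fopen('php://memory', 'w+');
foreach ($records as $record) {
    fwrite($stream, toJsonLine($record) . "\n");
}
rewind($stream);
$decoded = iterator_to_array(readJsonLines($stream), false);
fclose($stream);
```

Because every line is a complete JSON document, a consumer can stop after any line and still hold valid data, which is what makes partial output usable after an interruption.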
Alternative: Generator in Service
public function collectIntelBatch(string $entity_type, array $options): \Generator {
  foreach ($this->getEntityIterator(...) as $entity) {
    yield $this->collectIntel($entity);
  }
}
This keeps memory-efficient iteration in the service layer.
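One consequence of the generator approach is that a streaming caller and an array-returning caller could share the same method. The sketch below illustrates this with a stub: the IntelCollector class, its simplified collectIntel() body, and the iterable parameter (in place of the $entity_type and $options arguments above) are assumptions made so the example is self-contained.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in for the collector service. */
final class IntelCollector {
    /** Stub: the real method would inspect fields and run plugins. */
    public function collectIntel(array $entity): array {
        return [
            'entity' => ['id' => $entity['id']],
            'intel' => ['length' => strlen($entity['title'])],
        ];
    }

    /** Lazily yields one result per entity; nothing is accumulated here. */
    public function collectIntelBatch(iterable $entities): \Generator {
        foreach ($entities as $entity) {
            yield $this->collectIntel($entity);
        }
    }
}

$collector = new IntelCollector();
$entities = [
    ['id' => '1', 'title' => 'Hello'],
    ['id' => '2', 'title' => 'Hi'],
];

// Streaming caller: write each result as soon as it is produced.
$out = fopen('php://memory', 'w+');
foreach ($collector->collectIntelBatch($entities) as $intel) {
    fwrite($out, json_encode($intel) . "\n");
}
rewind($out);
$streamed = stream_get_contents($out);
fclose($out);

// Batch caller: materialize the same generator when a full array is wanted.
$batch = iterator_to_array($collector->collectIntelBatch($entities), false);
```

Under this design the memory cost is decided by the caller: iterating keeps one result alive at a time, while iterator_to_array() reproduces the current accumulate-everything behavior.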