Add reddit dump reader ingest tool with GCS support #5
borisolver wants to merge 1 commit into main
💡 Codex Review
Here are some automated review suggestions for this pull request.
    &remaining,
    batch_size,
    &backend_url,
    &fetcher_token,
Spawned tasks borrow backend_url/fetcher_token
The tokio::spawn closure passes &backend_url and &fetcher_token into process_input, but spawned futures must be 'static; these references point to stack-local Strings in main and make the future non‑'static, so the crate does not compile. Clone or move owned strings into the task before spawning.
Good catch — will clone owned copies of backend_url and fetcher_token before tokio::spawn to satisfy 'static.
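For reference, a minimal sketch of the clone-before-spawn pattern. It uses std::thread::spawn as a stand-in for tokio::spawn, since both require a 'static closure or future. The process_input signature and the run_all helper here are hypothetical (the diff shows only part of the call), so treat this as an illustration rather than the PR's actual code.

```rust
use std::thread;

// Hypothetical stand-in for the PR's process_input; the real signature is not shown in the diff.
fn process_input(input: &str, backend_url: &str, fetcher_token: &str) -> String {
    format!("{} -> {} (token length {})", input, backend_url, fetcher_token.len())
}

// Hypothetical driver: spawns one worker per input and collects results in input order.
fn run_all(inputs: Vec<String>, backend_url: &str, fetcher_token: &str) -> Vec<String> {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|input| {
            // Clone owned copies before spawning, so the task owns its data
            // and borrows nothing from the caller's stack frame.
            let backend_url = backend_url.to_owned();
            let fetcher_token = fetcher_token.to_owned();
            // `move` transfers the owned Strings into the closure, making it 'static.
            thread::spawn(move || process_input(&input, &backend_url, &fetcher_token))
        })
        .collect();

    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker panicked"))
        .collect()
}
```

With tokio the shape would be the same: clone inside the loop, then pass the clones into an async move block. If the strings are large or there are many tasks, wrapping them in Arc<str> would avoid repeated allocations.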
    let resp = client.get(input).send().await?.error_for_status()?;
    let stream = resp
        .bytes_stream()
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e));
    let reader = StreamReader::new(stream);
HTTP reader uses map_err without TryStreamExt in scope
The HTTP branch builds resp.bytes_stream().map_err(...), but map_err on a stream is an extension-trait method (futures::TryStreamExt), and no such trait is imported. The method is therefore not available on the stream, and the new crate fails to compile for HTTP inputs. Bring the extension trait into scope or avoid using map_err here.
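The underlying rule is that an extension-trait method can only be called when its trait is in scope. Assuming the crate depends on futures (or futures-util), the likely fix is a single `use futures::TryStreamExt;` line. Below is a std-only sketch of the same rule, with a hypothetical TryIterExt trait over iterators standing in for TryStreamExt; none of these names come from the PR.

```rust
use std::io;

mod ext {
    // Hypothetical extension trait, playing the role of futures' TryStreamExt
    // for plain iterators of Result values.
    pub trait TryIterExt<T, E>: Iterator<Item = Result<T, E>> + Sized {
        // Maps the error type of every item, leaving Ok values untouched.
        fn map_err_items<F, G: FnMut(E) -> F>(self, mut f: G) -> Vec<Result<T, F>> {
            self.map(|item| item.map_err(&mut f)).collect()
        }
    }

    // Blanket impl: every iterator over Result<T, E> gets the method.
    impl<I, T, E> TryIterExt<T, E> for I where I: Iterator<Item = Result<T, E>> {}
}

// Without this import, the call to map_err_items below fails to compile
// with a "method not found" error, even though the blanket impl exists.
use ext::TryIterExt;

// Converts String errors into io::Error, mirroring the closure used in the diff.
fn to_io_errors(items: Vec<Result<u8, String>>) -> Vec<Result<u8, io::Error>> {
    items
        .into_iter()
        .map_err_items(|e| io::Error::new(io::ErrorKind::Other, e))
}
```

The compiler's help text for this kind of error usually names the trait to import, which makes the fix quick to confirm locally.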
Summary
reddit_dump_reader tool for streaming Reddit dump NDJSON files or URLs into bulk_ingest
Testing
cargo test --manifest-path tools/reddit_dump_reader/Cargo.toml (fails: crates.io index blocked by network restrictions in the environment)