S3 protocol #895
base: master
Conversation
Thanks, @LuisSanchez25. Would it be more appropriate to put these new …

And I generally think this is a super good idea! We should benefit from the market resources!

Hey @dachengx, I am a bit unfamiliar with why the decision was made to put the Rucio frontend in straxen rather than in strax; that feels like a tool that would be beneficial for others outside of XENON, right? So to me it might make more sense to move the Rucio storage to straxen, but maybe I am just missing something. Thanks! I am still in the process of fully testing this; I think it still needs some tweaks, but I should have a working prototype soon!

@LuisSanchez25 I think by design, strax is for the prototypes of all classes, like plugins and storage. straxen will inherit the classes and make the functions more specific. I think this is why they put the …
dachengx left a comment
Thanks @LuisSanchez25. I did not look into s3.py deeply because I see there is commented-out code, so maybe it is not finished?
I would still insist on moving the S3 functionality to straxen. strax should be just basic usage, the processor, etc.
You should not change save_file; do what needs to be done only in the _save_chunk function.
for more information, see https://pre-commit.ci
formating fix
Pull Request Overview
Adds S3 storage support to Strax, enabling reading and writing data from/to S3 buckets.
- Introduces new S3-based `load_file_from_s3` and `_save_file_to_s3` helpers integrated into `load_file`/`save_file`
- Adds an `stx_file_parser` utility for parsing Strax file names
- Updates tests and dependencies to include `boto3` and a basic S3 write test
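To make the summary above concrete, here is a hedged sketch of what the two helpers could look like. The names follow the PR summary, but the actual strax signatures and compressor handling are not shown in this thread, so the zlib-based compression and the exact arguments below are assumptions; the client is any boto3-style object exposing `put_object`/`get_object`.

```python
import zlib
from io import BytesIO


def save_file_to_s3(s3_client, key, data, bucket, compress=True):
    """Optionally compress raw bytes and upload them to S3 under key."""
    body = zlib.compress(data) if compress else data
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return key


def load_file_from_s3(s3_client, key, bucket, compressed=True):
    """Download an object from S3 and return its (decompressed) bytes."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    raw = response["Body"].read()
    return zlib.decompress(raw) if compressed else raw
```

A round trip through these helpers should return the original bytes unchanged, which is also the property a unit test would assert.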
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_storage.py | Adds test_write_data_s3 stub to verify S3 writes |
| strax/utils.py | New stx_file_parser function |
| strax/storage/files.py | Annotated run_metadata signature |
| strax/io.py | Extended load_file/save_file for S3, new helpers |
| strax/__init__.py | Exposed S3 frontend in package exports |
| pyproject.toml | Added boto3 dependency |
Comments suppressed due to low confidence (5)
strax/io.py:84
- [nitpick] Adding positional parameters may break existing callers. Consider making `bucket_name` and `is_s3_path` keyword-only (e.g. `*, bucket_name=None`) and updating docstrings to reflect their usage.
def load_file(f, compressor, dtype, bucket_name=None, is_s3_path=False):
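The keyword-only variant Copilot suggests would look like this sketch (the real `load_file` body in strax/io.py is omitted; the return value here is only for illustration):

```python
def load_file(f, compressor, dtype, *, bucket_name=None, is_s3_path=False):
    """The S3 options after `*` must be passed by keyword, so adding
    them cannot silently shift existing positional arguments."""
    return bucket_name, is_s3_path


# Passing the new options positionally now fails loudly instead of misbinding:
try:
    load_file("f", "zstd", "int8", "my-bucket", True)
except TypeError:
    pass  # caller must write bucket_name="my-bucket", is_s3_path=True
```

Existing three-argument callers are unaffected, which is the point of the suggestion.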
tests/test_storage.py:35
- There’s no test coverage for the new `_save_file_to_s3` or `load_file_from_s3` paths. Add tests that simulate S3 interactions to ensure S3 upload/download works as expected.
def test_write_data_s3(self):
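One way to cover the S3 paths without real credentials is to stub the boto3 client with unittest.mock (the moto library is another common choice). The helper below is a stand-in for illustration; the PR's actual `load_file_from_s3` signature may differ:

```python
from io import BytesIO
from unittest import mock


def load_file_from_s3(s3_client, key, bucket):
    """Fetch an object's raw bytes from S3 (toy stand-in for the PR helper)."""
    return s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()


def test_load_file_from_s3():
    # No network: the mock records the call and serves canned bytes.
    fake_client = mock.Mock()
    fake_client.get_object.return_value = {"Body": BytesIO(b"payload")}
    assert load_file_from_s3(fake_client, "run0-peaks", "strax-data") == b"payload"
    fake_client.get_object.assert_called_once_with(Bucket="strax-data", Key="run0-peaks")
```

The same pattern works for the upload path by asserting on `put_object` calls.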
strax/utils.py:836
- [nitpick] The docstring for `stx_file_parser` is brief and unclear about accepted formats and return structure. Expand it with examples of valid inputs and outputs.
def stx_file_parser(path: str):
tests/test_storage.py:39
- The test checks `is_configed` against an empty string, but it likely returns a boolean or object. Make the condition explicit (e.g. `if self.st.storage[0].is_configed:`) or stub `is_configed` properly for the test.
if self.st.storage[0].is_configed != "":
strax/utils.py:835
- `stx_file_parser` uses `re.split` without importing `re`, and if a `ValueError` is caught, `file_data` is undefined. Add `import re` and explicitly raise or return a default when parsing fails.
@export
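A hypothetical repair for the two issues flagged above: `re` is imported at module scope, and the function raises explicitly instead of returning an undefined `file_data`. The `runid-dataname-hash` layout is an assumed example, since the real strax file-name scheme is not shown in this thread:

```python
import re


def stx_file_parser(path: str):
    """Parse a strax-style file name.

    Expects 'runid-dataname-hash', e.g. '012345-peaks-abc123', and
    returns a (run_id, data_name, lineage_hash) tuple. Raises
    ValueError if the name does not have exactly three parts.
    """
    parts = re.split(r"-", path)
    if len(parts) != 3:
        raise ValueError(f"Cannot parse strax file name: {path!r}")
    return tuple(parts)
```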
strax/io.py (Outdated)

        return np.frombuffer(data, dtype=dtype)
    except ValueError as e:
        raise ValueError(f"ValueError while loading data with dtype =\n\t{dtype}") from e
    except Exception as e:
Copilot AI · Jun 4, 2025
The first `except Exception` block will catch all exceptions, so the subsequent `except Exception` block is never reached. Consider combining handlers or narrowing exception types to preserve the DataCorrupted path.
Suggested change:
-     except Exception as e:
+     except (ValueError, KeyError) as e:
strax/io.py (Outdated)

    s3_interface = s3_client
    # Copy temp file to final file
    result = _save_file_to_s3(s3_interface, temp_fn, data, Bucket, compressor)
    s3_interface.copy_object(
        Bucket=Bucket,
        Key=final_fn,
        CopySource={"Bucket": Bucket, "Key": temp_fn},
    )

    # Delete the temporary file
    s3_interface.delete_object(Bucket=Bucket, Key=temp_fn)
Copilot AI · Jun 4, 2025
[nitpick] Inconsistent variable naming between `s3_client` and `s3_interface` can be confusing; pick one name for clarity.
Suggested change:
- s3_interface = s3_client
- # Copy temp file to final file
- result = _save_file_to_s3(s3_interface, temp_fn, data, Bucket, compressor)
- s3_interface.copy_object(
-     Bucket=Bucket,
-     Key=final_fn,
-     CopySource={"Bucket": Bucket, "Key": temp_fn},
- )
- # Delete the temporary file
- s3_interface.delete_object(Bucket=Bucket, Key=temp_fn)
+ # Copy temp file to final file
+ result = _save_file_to_s3(s3_client, temp_fn, data, Bucket, compressor)
+ s3_client.copy_object(
+     Bucket=Bucket,
+     Key=final_fn,
+     CopySource={"Bucket": Bucket, "Key": temp_fn},
+ )
+ # Delete the temporary file
+ s3_client.delete_object(Bucket=Bucket, Key=temp_fn)
strax/io.py (Outdated)

    file_data = response["Body"].read()  # Read the content of the file from S3

    # Create a file-like object from the binary data
    file_buffer = BytesIO(file_data)
Copilot AI · Jun 4, 2025
[nitpick] Reading the entire S3 object into memory with read() may not scale for large files. Consider streaming decompression or chunked reads to reduce memory usage.
Suggested change:
- file_data = response["Body"].read()  # Read the content of the file from S3
- # Create a file-like object from the binary data
- file_buffer = BytesIO(file_data)
+ file_buffer = BytesIO()  # Create a file-like object to store the data
+ for chunk in response["Body"].iter_chunks(chunk_size=DECOMPRESS_BUFFER_SIZE):
+     file_buffer.write(chunk)
+ file_buffer.seek(0)  # Reset the buffer to the beginning
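The chunked-read pattern in the suggestion above can be sketched in a self-contained way. The buffer size and the `read`-loop stand-in for botocore's `StreamingBody` are assumptions made for illustration; the real object also offers `iter_chunks` as used in the suggestion:

```python
from io import BytesIO

DECOMPRESS_BUFFER_SIZE = 8 * 1024 * 1024  # assumed value, not a strax constant


def buffer_s3_body(body, chunk_size=DECOMPRESS_BUFFER_SIZE):
    """Copy a streaming body into a seekable buffer chunk by chunk,
    so no more than chunk_size bytes are read from the stream at once."""
    file_buffer = BytesIO()
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        file_buffer.write(chunk)
    file_buffer.seek(0)  # rewind so callers can read from the start
    return file_buffer
```

For truly large objects one could go further and decompress each chunk as it arrives instead of buffering the whole object, but even this version avoids a single monolithic `read()`.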
What is the problem / what does the code in this PR do
Adds S3 functionality to strax
Can you briefly describe how it works?
We can now use an S3 storage system to store data
Can you give a minimal working example (or illustrate with a figure)?
import strax

st = strax.Context()
s3_storage = strax.S3Frontend()
st.storage = [s3_storage]

Please include the following if applicable:
Please make sure that all automated tests have passed before asking for a review (you can save the PR as a draft otherwise).