
Conversation

@LuisSanchez25 (Collaborator) commented Oct 2, 2024

What is the problem / what does the code in this PR do

Adds S3 functionality to strax

Can you briefly describe how it works?

We can now use an S3 storage system to store and load data.

Can you give a minimal working example (or illustrate with a figure)?

import strax
st = strax.Context()
s3_storage = strax.S3Frontend()
st.storage = [s3_storage]
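For context, the new frontend could sit alongside the usual local storage; a minimal sketch, assuming S3Frontend takes no required arguments (its bucket/credential configuration is not shown in this PR):

import strax

st = strax.Context()
# Hypothetical combination: local directory first, S3 as an additional frontend
st.storage = [strax.DataDirectory("./strax_data"), strax.S3Frontend()]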

Please include the following if applicable:

  • Update the docstring(s)
  • Update the documentation
  • Tests to check the (new) code is working as desired.
  • Does it solve one of the open issues on github?

Please make sure that all automated tests have passed before asking for a review (you can save the PR as a draft otherwise).

@dachengx (Collaborator) commented Oct 2, 2024

Thanks, @LuisSanchez25. Would it be more appropriate to put these new StorageFrontend and StorageBackend classes in straxen, like RucioRemoteFrontend (https://github.com/XENONnT/straxen/blob/a412b4382eed3277fd599c105d69855df38bc7f8/straxen/storage/rucio_remote.py#L29)?

@dachengx (Collaborator) commented Oct 2, 2024

And I generally think this is a super good idea! We should benefit from the market resources!

@LuisSanchez25 (Collaborator, Author) commented:

Hey @dachengx, I am a bit unfamiliar with why the decision was made to put the Rucio frontend in straxen rather than in strax; that feels like a tool that would be beneficial for others outside of XENON, right? So to me it might make more sense to move the Rucio storage to strax, but maybe I am just missing something.

Thanks! I am still in the process of fully testing this; I think it still needs some tweaks, but I should have a working prototype soon!

@dachengx (Collaborator) commented Oct 2, 2024

@LuisSanchez25 I think that, by design, strax holds the prototypes of all classes, like plugins and storage, while straxen inherits those classes and makes the functionality more specific. I think this is why RucioRemoteFrontend was put in straxen.
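A rough sketch of that split, purely for illustration (the class names, the bucket_name argument, and its default below are hypothetical and not part of this PR):

import strax

class S3Frontend(strax.StorageFrontend):
    """Generic S3 prototype that could live in strax."""
    ...

class XenonS3Frontend(S3Frontend):
    """Experiment-specific subclass that would live in straxen, carrying site defaults."""
    def __init__(self, *args, bucket_name="xenonnt-data", **kwargs):  # hypothetical default
        self.bucket_name = bucket_name
        super().__init__(*args, **kwargs)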

@dachengx (Collaborator) left a review comment

Thanks @LuisSanchez25. I did not look into s3.py deeply because I see there is commented-out code, so maybe it is not finished?

I would still insist on moving the S3 functionality to straxen; strax should contain just the basic usage, the processor, etc.

You should not change save_file; instead, do what needs to be done only in the _save_chunk function (see the sketch below).
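A minimal sketch of that pattern, assuming strax's Saver._save_chunk hook with the (data, chunk_info, executor) signature; the class name S3Saver, the key scheme, and the compressor choice are illustrative, not the PR's actual implementation (only the chunk-saving path is sketched):

import bz2
import io

import boto3
import strax

class S3Saver(strax.Saver):
    def __init__(self, metadata, bucket):
        super().__init__(metadata)
        self.s3 = boto3.client("s3")  # credentials come from the usual boto3 configuration
        self.bucket = bucket

    def _save_chunk(self, data, chunk_info, executor=None):
        # Keep save_file untouched: compress the chunk in memory here and upload it under its own key
        key = f"{self.md['run_id']}-{chunk_info['chunk_i']:06d}"  # hypothetical key scheme
        payload = io.BytesIO(bz2.compress(data.tobytes()))
        self.s3.upload_fileobj(payload, self.bucket, key)
        return dict(filename=key), None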

@coveralls commented May 6, 2025

Coverage Status

coverage: 87.217% (-1.7%) from 88.958% when pulling 4bb5ddb on s3_protocol into cd79adc on master.

LuisSanchez25 requested a review from dachengx on May 7, 2025.
LuisSanchez25 requested a review from Copilot on June 4, 2025.
Copilot AI left a comment

Pull Request Overview

Adds S3 storage support to Strax, enabling reading and writing data from/to S3 buckets.

  • Introduces new S3-based load_file_from_s3 and _save_file_to_s3 helpers integrated into load_file/save_file
  • Adds an stx_file_parser utility for parsing Strax file names
  • Updates tests and dependencies to include boto3 and a basic S3 write test

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

File                    Description
tests/test_storage.py   Adds test_write_data_s3 stub to verify S3 writes
strax/utils.py          New stx_file_parser function
strax/storage/files.py  Annotated run_metadata signature
strax/io.py             Extended load_file/save_file for S3, new helpers
strax/__init__.py       Exposed S3 frontend in package exports
pyproject.toml          Added boto3 dependency
Comments suppressed due to low confidence (5)

strax/io.py:84

  • [nitpick] Adding positional parameters may break existing callers. Consider making bucket_name and is_s3_path keyword-only (e.g. *, bucket_name=None) and updating docstrings to reflect their usage.
def load_file(f, compressor, dtype, bucket_name=None, is_s3_path=False):
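One possible shape for the keyword-only version (a sketch; the docstring wording is illustrative):

def load_file(f, compressor, dtype, *, bucket_name=None, is_s3_path=False):
    """Read and return data from file f.

    :param bucket_name: S3 bucket to read from; only used when is_s3_path is True.
    :param is_s3_path: If True, treat f as a key inside bucket_name rather than a local path.
    """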

tests/test_storage.py:35

  • There’s no test coverage for the new _save_file_to_s3 or load_file_from_s3 paths. Add tests that simulate S3 interactions to ensure S3 upload/download works as expected.
def test_write_data_s3(self):
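A possible shape for such a test, using a mocked boto3 client so no real bucket is needed (a sketch; it assumes _save_file_to_s3 lives in strax.io with the call signature shown later in this review, and that the upload goes through put_object or upload_fileobj):

from unittest import mock

import numpy as np
import strax

def test_save_file_to_s3_uses_the_client():
    data = np.zeros(10, dtype=[("time", np.int64), ("length", np.int32)])
    fake_s3 = mock.MagicMock()
    # Call the helper directly with the stubbed client instead of a real S3 connection
    strax.io._save_file_to_s3(fake_s3, "temp_key", data, "test-bucket", "bz2")
    # The helper should have handed the payload to the client in one of the usual ways
    assert fake_s3.put_object.called or fake_s3.upload_fileobj.called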

strax/utils.py:836

  • [nitpick] The docstring for stx_file_parser is brief and unclear about accepted formats and return structure. Expand it with examples of valid inputs and outputs.
def stx_file_parser(path: str):

tests/test_storage.py:39

  • The test checks is_configed against an empty string, but it likely returns a boolean or object. Make the condition explicit (e.g. if self.st.storage[0].is_configed:) or stub is_configed properly for the test.
if self.st.storage[0].is_configed != "":

strax/utils.py:835

  • stx_file_parser uses re.split without importing re, and if a ValueError is caught, file_data is undefined. Add import re and explicitly raise or return a default when parsing fails.
@export
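A sketch of the defensive pattern being asked for; the real parser's separator and returned fields are not visible in this diff, so the regex, the field names, and the runid-datatype-hash naming assumption below are placeholders:

import re

def stx_file_parser(path: str):
    """Split a strax file/directory name of the assumed form 'runid-datatype-lineagehash'."""
    basename = re.split(r"[/\\]", path)[-1]
    parts = basename.split("-")
    if len(parts) < 3:
        # Fail loudly instead of leaving variables undefined when parsing fails
        raise ValueError(f"Cannot parse a strax file name from {path!r}")
    run_id, data_type, lineage_hash = parts[0], parts[1], "-".join(parts[2:])
    return run_id, data_type, lineage_hash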

strax/io.py Outdated
return np.frombuffer(data, dtype=dtype)
except ValueError as e:
raise ValueError(f"ValueError while loading data with dtype =\n\t{dtype}") from e
except Exception as e:
Copilot AI commented Jun 4, 2025

The first except Exception block will catch all exceptions, so the subsequent except Exception block is never reached. Consider combining handlers or narrowing exception types to preserve the DataCorrupted path.

Suggested change
except Exception as e:
except (ValueError, KeyError) as e:

strax/io.py Outdated
Comment on lines 168 to 178
s3_interface = s3_client
# Copy temp file to final file
result = _save_file_to_s3(s3_interface, temp_fn, data, Bucket, compressor)
s3_interface.copy_object(
Bucket=Bucket,
Key=final_fn,
CopySource={"Bucket": Bucket, "Key": temp_fn},
)

# Delete the temporary file
s3_interface.delete_object(Bucket=Bucket, Key=temp_fn)
Copilot AI commented Jun 4, 2025

[nitpick] Inconsistent variable naming between s3_client and s3_interface can be confusing; pick one name for clarity.

Suggested change
s3_interface = s3_client
# Copy temp file to final file
result = _save_file_to_s3(s3_interface, temp_fn, data, Bucket, compressor)
s3_interface.copy_object(
Bucket=Bucket,
Key=final_fn,
CopySource={"Bucket": Bucket, "Key": temp_fn},
)
# Delete the temporary file
s3_interface.delete_object(Bucket=Bucket, Key=temp_fn)
# Copy temp file to final file
result = _save_file_to_s3(s3_client, temp_fn, data, Bucket, compressor)
s3_client.copy_object(
Bucket=Bucket,
Key=final_fn,
CopySource={"Bucket": Bucket, "Key": temp_fn},
)
# Delete the temporary file
s3_client.delete_object(Bucket=Bucket, Key=temp_fn)

strax/io.py Outdated
Comment on lines 121 to 124
file_data = response["Body"].read() # Read the content of the file from S3

# Create a file-like object from the binary data
file_buffer = BytesIO(file_data)
Copilot AI commented Jun 4, 2025

[nitpick] Reading the entire S3 object into memory with read() may not scale for large files. Consider streaming decompression or chunked reads to reduce memory usage.

Suggested change
file_data = response["Body"].read() # Read the content of the file from S3
# Create a file-like object from the binary data
file_buffer = BytesIO(file_data)
file_buffer = BytesIO() # Create a file-like object to store the data
for chunk in response["Body"].iter_chunks(chunk_size=DECOMPRESS_BUFFER_SIZE):
file_buffer.write(chunk)
file_buffer.seek(0) # Reset the buffer to the beginning
