Skip to content

fix: Use filename w/extension#1292

Merged
ns-rse merged 1 commit intomainfrom
ns-rse/1290-summary-stats-output
Feb 6, 2026
Merged

fix: Use filename w/extension#1292
ns-rse merged 1 commit intomainfrom
ns-rse/1290-summary-stats-output

Conversation

@ns-rse
Copy link
Collaborator

@ns-rse ns-rse commented Feb 4, 2026

Closes #1290
Closes #1287

Because of problems that arise with the old format of Bruker files where the filenames have numerical extensions there
was scope for data being over-written as TopoStats.filename uses the "stem", the part upto the file extension and so
file1.001, file1.002 and file1.003 all write output to the same (nested) directory file1. To solve this we now
include the file-extension in TopoStats.filename.

However, this presents a problem when processing .topostats files as you can't have a directory and a file with the
same name in the same location (manifested in the tests where we use minicircle_small.topostats). The solution is
therefore to write output files (the .topostats version of every file) to the directory
<output_dir>/processed/topostats/<filename_without_extension>.topostats.

In doing so we also modify io.save_topostats_file() and drop the filename parameter as it can be obtained directly
from the topostats_object.filename parameter passed in. We also introduce a step when saving files that don't already
carry the .topostats extension (i.e. .spm, .ibw, .006 etc.) to remove the last file extension from the filename
and replace with .topostats for the target that is to be written to.

This hopefully also addressed #1290 at the same time @tcatley if you are able to check with the
20240411_60ngTop2a_6ngpuc19_1mMATP_mgni.0_00024 file that uncovered the problem please that would be really useful and
appreciated.


Before submitting a Pull Request please check the following.

  • Existing tests pass.
  • Pre-commit checks pass.

@ns-rse ns-rse requested review from tcatley and tobyallwood February 4, 2026 10:20
@ns-rse ns-rse force-pushed the ns-rse/1290-summary-stats-output branch from d8ff896 to cf9671e Compare February 4, 2026 10:25
Copy link
Collaborator

@tobyallwood tobyallwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me 👍

Copy link
Collaborator

@tcatley tcatley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does run and output the .csv files. However as @tobyallwood pointed out in #1290, there is now two folders generated for each file - one with the file name with the .spm extension (containing the tracing/filters/grains) and one without (containing the csv files). Ideally this would only be one folder per image

Closes #1290
Closes #1287

Because of problems that arise with the old format of Bruker files where the filenames have numerical extensions there
was scope for data being over-written as `TopoStats.filename` uses the "stem", the part upto the file extension and so
`file1.001`, `file1.002` and `file1.003` all write output to the same (nested) directory `file1`. To solve this we now
include the file-extension in `TopoStats.filename`.

However, this presents a problem when processing `.topostats` files as you can't have a directory and a file with the
same name in the same location (manifested in the tests where we use `minicircle_small.topostats`). The solution is
therefore to write output files (the `.topostats` version of every file) to the directory
`<output_dir>/processed/topostats/<filename_without_extension>.topostats`.

In doing so we also modify `io.save_topostats_file()` and drop the `filename` parameter as it can be obtained directly
from the `topostats_object.filename` parameter passed in. We also introduce a step when saving files that don't already
carry the `.topostats` extension (i.e. `.spm`, `.ibw`, `.006` etc.) to remove the last file extension from the filename
and replace with `.topostats` for the target that is to be written to.

This hopefully also addressed #1290 at the same time @tcatley if you are able to check with the
`20240411_60ngTop2a_6ngpuc19_1mMATP_mgni.0_00024` file that uncovered the problem please that would be really useful and
appreciated.

Also

- Removes some errant `print()` statements
@ns-rse ns-rse force-pushed the ns-rse/1290-summary-stats-output branch from cf9671e to 4c1fdf6 Compare February 4, 2026 15:27
@ns-rse
Copy link
Collaborator Author

ns-rse commented Feb 4, 2026

Thanks @tcatley I've now fixed that, took longer than I expected (as is often the case!).

It stemmed from the disconnect between the code that writes all results out and the section that writes out per image statistics.

I do wonder though, as I had this problem when working on #1284, what the utility of duplicating output is. The all statistics .csv files at the top level of the output_dir contain the same results as the per-image, it just requires filtering. Maintaining such convenience functionality detracts from developer time to work on other tasks.

Basic data manipulation skills are invaluable and something all scientific researchers should have the in their skill set. An alternative if a single images output really is required is to just re-run TopoStats on just the single file.

@ns-rse ns-rse requested a review from tobyallwood February 4, 2026 15:31
@tobyallwood
Copy link
Collaborator

@ns-rse could it be possible the other set of files are left over from when we changed the names a few months back? Will take a look at the new code later am just out right now

@tcatley
Copy link
Collaborator

tcatley commented Feb 4, 2026

Agreed @ns-rse, I can't say I've ever actually used the per-image stats anyway - I would always just filter from the all_statistics.csv. If it is easier to just remove that output folder completely then maybe go with that

@ns-rse
Copy link
Collaborator Author

ns-rse commented Feb 4, 2026

@tobyallwood : No there is a specific function for writing them on a per image basis (it uses the basename value from the results dataframes) so it was something that was deliberately done at some point.

@tcatley : Good to hear, thanks. I'll write up removal as an issue to be addressed.

@ns-rse ns-rse merged commit 0cf2b5d into main Feb 6, 2026
7 checks passed
@ns-rse ns-rse deleted the ns-rse/1290-summary-stats-output branch February 6, 2026 14:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug]: Failed saving summary statistics due to period in filename Filenames of Old Bruker may conflict

3 participants