
Conversation

@hategan (Collaborator) commented Feb 28, 2025

Of course, the saga is not over.

When specifying --tag-output, mpirun is supposed to "tag each line" with [jobid,rank]<stdxxx>:. It mostly does; however, it occasionally does something else. Assume that a.txt and b.txt contain ABCD and EFGH, respectively. Running mpirun --tag-output -n 1 cat a.txt b.txt mostly produces

[1,0]<stdout>:ABCDEFGH

Occasionally, the following shows up instead:

[1,0]<stdout>:ABCD[1,0]<stdout>:EFGH

That is indistinguishable from b.txt having contained [1,0]<stdout>:EFGH. My guess is that this is caused by the brief delay that cat introduces between the files. This can be verified by passing more files to cat and watching all kinds of combinations of tags pop up in the middle of a line.
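
For reference, a minimal reproduction sketch (it assumes OpenMPI's mpirun is on PATH; since the interleaving is a race, it may take several iterations to appear):

    # Files without trailing newlines, matching the example above.
    printf 'ABCD' > a.txt
    printf 'EFGH' > b.txt
    # Repeat a few times; the interleaving is timing-dependent.
    for i in 1 2 3 4 5; do
        mpirun --tag-output -n 1 cat a.txt b.txt
    done
    # Most runs print:   [1,0]<stdout>:ABCDEFGH
    # Some runs print:   [1,0]<stdout>:ABCD[1,0]<stdout>:EFGH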

One solution is to use a heuristic: consider an output line to begin with the tag, while also assuming that it is very unlikely for the application to produce the tag itself. We can then keep only lines that start with the tag and remove any other tags that appear in the middle. This should significantly reduce the likelihood of random mishaps, but it transforms them into less likely yet deterministic mishaps (e.g., running echo "[1, 0]<stdout>:bla" through mpirun).
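
As a sketch of that heuristic in POSIX shell (not the actual PR code; the tag pattern below is my reconstruction from the format shown above):

    # Match [jobid,rank]<stdout>:/<stderr>: tags; BRE, so POSIX grep/sed work.
    TAG='\[[0-9][0-9]*,[0-9][0-9]*\]<std[a-z][a-z]*>:'
    mpirun --tag-output -n 1 cat a.txt b.txt \
        | grep "^${TAG}" \
        | sed "s/${TAG}//g"
    # Keep only lines that start with a tag, then strip any tags that
    # appear mid-line; application output that itself matches the tag
    # pattern gets mangled, which is the deterministic mishap above.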

Another choice is --xml. Unfortunately, parsing XML in POSIX shell alone is difficult, and many simplifying assumptions have to be made. Nonetheless, that branch appears to work fine with OpenMPI 4, so perhaps the loss in clarity might not outweigh the benefits.
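
Purely for illustration of the kind of simplified parsing meant here (the element shape below is an assumption, not OpenMPI's actual --xml output, which varies by version):

    # Assumes (simplification) one <stdout>...</stdout> element per line.
    mpirun --xml -n 1 cat a.txt b.txt \
        | sed -n 's/.*<stdout[^>]*>\(.*\)<\/stdout>.*/\1/p'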

@andre-merzky (Collaborator) commented

A general remark: launching tasks on HPC systems is a really large and complex problem space, and it does require complex solutions. I am not sure, though, that adding solution complexity to a single shell script is a viable route in the long run. Don't get me wrong - I love shell scripting for its directness, performance, and conciseness - but maintainability and readability are not features of shell, and slowly growing the launcher into a single, non-modular and/or large shell script is not a route I would recommend, really.

Obviously I am biased, as we have been there and done that also in RP ;-) Our approach at the moment is to shove the complexities into modular Python code and to generate small, readable, and self-contained shell scripts on the fly. I have been wondering for quite some time whether we should try to extract that code from RP, remove all dependencies, and make it usable for psi/j. I'd love to have a discussion about that at some point...

On to the problem at hand: Yes, I agree, finding the tag in the middle of a line is unlikely. But even so, it remains messy. Is it worth the effort? The original motivation was that mpirun mixes various diagnostic output into the application stdout, and we wanted to filter that out. Well, any user running natively on that machine would also see that output - so psi/j is trying to improve over the system's native behavior (https://xkcd.com/1172/). I understand (and actually share) the sentiment, but the complexity tradeoff might not be worth it.

Having said all that: the code seems correct and appears to address the stated problem, so I'd probably approve the PR ;-)

PS: XML? I would happily avoid that bottomless pit...

@hategan (Collaborator, Author) commented Feb 28, 2025

> A general remark: launching tasks on HPC systems is a really large and complex problem space, and it does require complex solutions. I am not sure, though, that adding solution complexity to a single shell script is a viable route in the long run. Don't get me wrong - I love shell scripting for its directness, performance, and conciseness - but maintainability and readability are not features of shell, and slowly growing the launcher into a single, non-modular and/or large shell script is not a route I would recommend, really.

I am very much in agreement with that statement. I believe that if you seek correctness, you should probably stay as far away as possible from non-formal languages. Even Python is, to me, a step in the wrong direction.
The issue here is largely a practical one. The launcher code runs on compute nodes, and systems make no guarantee that the environment on the compute nodes is the same as the one that PSI/J runs on. This mostly excludes Python as an alternative. We could consider Perl, but that would be only a marginal improvement in readability. Doing the equivalent processing in PSI/J involves the additional complexity of having data generated in one place be filtered in a different place (not to mention the difficulty of doing that while streaming the data).

We could dismiss worries about sprawling shell scripts by noting that, after testing on a relatively diverse set of machines, the likelihood of needing to extend this significantly in the future appears low. That said, the file_staging branch does, unfortunately, resort to shell scripts to emulate staging when it is not natively supported at the level defined by PSI/J (which ends up being pretty much everywhere). Even there, with the specification somewhat complete, it is unlikely that significant additions will be needed.

All that said, there might be clever solutions that I have not thought of, so suggestions are welcome.

PS: Thanks for taking a look at this.

@hategan (Collaborator, Author) commented Mar 3, 2025

Most of the issues were popping up in the file_staging branch because of its additional tests involving cat/stdout gymnastics. Since merging this branch into file_staging, it looks like those issues are not showing up any more.

@hategan merged commit e6b6de3 into main on Mar 3, 2025 (14 checks passed) and deleted the mpirun_output_take_2 branch.