minsii commented Feb 5, 2020

The notes for shmem_pe_accessible and shmem_addr_accessible have several issues as described below. Thus, I'd suggest we delete these notes.

  • shmem_pe_accessible:
...when an MPI job uses Multiple Program Multiple Data (MPMD) mode, 
multiple executable MPI programs are executed as part of the same MPI job. 
In such cases, OpenSHMEM support may only be available between processes 
running from the same executable file.

In a hybrid program, only processes that have initialized SHMEM can have valid PE numbers, and thus can be checked by shmem_pe_accessible.

In addition, some environments may allow a hybrid job to span multiple network partitions. 
In such scenarios, OpenSHMEM support may only be available between PEs within the 
same partition.

Same issue here: it is unclear how we can specify the PE of a remote process that exists in a different network partition. E.g., the PE numbering of two partitions can be {0,1,2} + {0,1,2}, or {0,1,2} + {3,4,5}. If P0 in the first partition checks the accessibility of a process in the second partition, the result is always TRUE in the former case, because the queried PE number also exists in P0's own partition.

  • shmem_addr_accessible
...when an MPI job uses MPMD mode, multiple executable MPI programs may use OpenSHMEM 
routines. In such cases, static memory, such as a C global variable, is symmetric between 
processes running from the same executable file, but is not symmetric between processes 
running from different executable files....

It is unclear how we can specify the SHMEM PE of a process that initializes only MPI.

In a hybrid SHMEM+MPI program, only processes that have initialized
SHMEM can have valid PE numbers, and thus can be checked by
shmem_addr_accessible and shmem_pe_accessible.

minsii (Author) commented Feb 5, 2020

@agrippa @manjugv @swpoole @jamesaross @tonycurtis can you please review?

tonycurtis commented Feb 5, 2020

Outside of the original SGI environment, shmem_pe_accessible still seems to have potential use, e.g., as a simple fault-tolerance check. shmem_addr_accessible could be used as an is-it-symmetric? check.

Agree on dropping old text.

agrippa commented Feb 5, 2020

I think this text is saying that you might (for example) have a job with different executables in different nodes, but which all call shmem_init(). As a result, they might all receive a valid PE # from SHMEM but programs running different executables will not be able to access each other's symmetric data (e.g. global variables).
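
As a rough illustration of that reading, here is a minimal toy model in plain C. It does not call OpenSHMEM at all: `exe_of_pe`, `model_pe_valid`, and `model_static_symmetric` are hypothetical names invented for this sketch, and the rule "static data is symmetric only between PEs launched from the same executable" is simply the assumption taken from the quoted note.

```c
/* Toy model only, not OpenSHMEM. exe_of_pe[i] is a hypothetical table that
 * records which executable PE i was launched from. */

/* Every process that initialized the library gets a PE number in [0, npes). */
int model_pe_valid(int npes, int pe)
{
    return pe >= 0 && pe < npes;
}

/* Assumption from the quoted note: static data (e.g. a C global variable) is
 * symmetric only between PEs that run the same executable file. */
int model_static_symmetric(const int *exe_of_pe, int npes, int me, int pe)
{
    if (!model_pe_valid(npes, me) || !model_pe_valid(npes, pe))
        return 0;
    return exe_of_pe[me] == exe_of_pe[pe];
}
```

With four PEs where PEs 0 and 1 run one executable and PEs 2 and 3 run another, every PE number is valid in this model, yet PE 0 would see static data as symmetric with PE 1 and not with PE 2.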

If these sorts of use cases are fundamentally invalid today, I think a change is needed but I think we need to replace it with other examples. It is not readily apparent how these routines are useful (particularly shmem_pe_accessible), and so some examples in the notes would be useful.

minsii commented Feb 5, 2020

@tonycurtis shmem_addr_accessible returns TRUE also for an address that can be accessed by OpenSHMEM routines.

The return value is 1 if the local address addr is also a symmetric address and
the given data object is accessible via OpenSHMEM routines on the specified remote PE;

I actually do not know how the user can use shmem_addr_accessible, as shmem_pe_accessible already validates the support of symmetric data objects.

minsii commented Feb 5, 2020

@agrippa The #PE across SHMEM partitions might not be unique. E.g., 6 processes are initialized as two SHMEM partitions with #PE {0,1,2} and {0,1,2}. If P0 in the first partition checks the accessibility of a process in the second partition, the result will always be TRUE, which is inaccurate.

To make the example valid, we need a way to get unique #PE across partitions, but SHMEM does not provide such functionality.
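
The numbering problem can be sketched with a small toy model in plain C. This is not how any OpenSHMEM implementation works. It only assumes that an accessibility check can see the caller's own partition, described here by the hypothetical parameters `first_pe` and `npes`, and `model_pe_accessible` is a made-up name.

```c
/* Toy model only, not OpenSHMEM. A PE number counts as "accessible" when it
 * falls inside the caller's own partition, [first_pe, first_pe + npes). */
int model_pe_accessible(int first_pe, int npes, int pe)
{
    return pe >= first_pe && pe < first_pe + npes;
}
```

With overlapping numbering {0,1,2} + {0,1,2}, P0 in the first partition calls `model_pe_accessible(0, 3, 1)` for a process in the second partition and gets 1, because PE 1 also exists locally. With disjoint numbering {0,1,2} + {3,4,5}, the same process is PE 4 and `model_pe_accessible(0, 3, 4)` gives 0.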

agrippa commented Feb 5, 2020

@minsii A SHMEM program with non-unique PE IDs is not a compliant SHMEM program, so that isn't a case I'm worried about.

@tonycurtis

Maybe shmem_pe_accessible could turn into a liveness check? Do you know if people use these in the wild?

agrippa commented Feb 5, 2020

Maybe we need to bring the usefulness of these APIs up as a discussion item.

minsii commented Feb 5, 2020

@agrippa I think we considered the SHMEM initialization in different ways. In my example, the two partitions are initialized as isolated environments, so #PE is guaranteed to be unique only inside a partition. Nevertheless, I think the notes are already confusing and the examples are unusual. It might be better to replace them with better examples, such as a liveness check as suggested by @tonycurtis

manjugv (Owner) commented Feb 5, 2020

We should bring this up at the plenary and see if these interfaces need to be deprecated/removed.

tonycurtis commented Feb 5, 2020 via email

jdinan commented Feb 10, 2020

Please do not add "Notes: none" sections. Just delete these sections, per openshmem-org#330.

minsii commented Feb 10, 2020

@jdinan Thanks, will fix.

I was not able to catch the conclusion at the F2F meeting. Can someone please remind me which of the following options we want to go with for 1.5?

  • Deprecate shmem_pe_accessible or shmem_addr_accessible, or both
  • Delete the example notes for both routines
  • Replace the examples with more common use cases for both routines
  • No change

manjugv commented Feb 11, 2020

@minsii There was no action item for 1.5. We will address this more thoroughly for 1.6.

@manjugv manjugv self-assigned this Jun 8, 2020
manjugv pushed a commit that referenced this pull request Oct 26, 2020
Fix prototype type typos in deprecated reductions
manjugv pushed a commit that referenced this pull request Oct 26, 2020
Changelog: Reorder removal of SHMEM_CACHE
manjugv pushed a commit that referenced this pull request Oct 26, 2020
RM data types from memory ordering figures
manjugv pushed a commit that referenced this pull request Oct 26, 2020
Improve use of "non" vs. "non-"
@naveen-rn

@manjugv Should we assign this to someone else? It looks like an easy change.

jdinan commented Sep 26, 2024

The change to shmem_addr_accessible looks fine. I think there are more fundamental questions about shmem_pe_accessible and perhaps the most sensible course of action would be to deprecate the function.
