
Conversation


@pnijhara pnijhara commented Dec 13, 2025

What does this PR change?

This PR implements a parallel maximal independent set via Luby's random-permutation mechanism (solving issue #144).

  • Adds a parallel implementation of the maximal_independent_set algorithm
  • Uses a Luby-style randomized parallel algorithm for speedup on large graphs
  • Works best on large graphs (n > 50,000 nodes)

What is your approach, and why did you choose it?

A standard parallel MIS algorithm assigns a random priority to every vertex once per round, then simultaneously selects every vertex whose priority is a local maximum among its neighbours:

while graph not empty:
    generate random priority for each vertex
    S = { u | priority(u) > priority(v) for all v in N(u) }
    add S to MIS
    remove S ∪ N(S) from graph

Why this is parallel:

  • Priority comparisons for all vertices are independent operations.
  • Set S can be selected in one parallel pass.
  • All vertices in S are guaranteed independent.
  • Removing S ∪ N(S) can also be done in parallel.

This matches the PRAM formulation and is widely used in CPU and GPU environments.
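
For reference, here is a minimal sequential sketch of those rounds in Python (illustrative only; this is not the PR's implementation, which splits nodes into chunks and parallelizes the work with joblib):

import random
import networkx as nx

def luby_mis_sketch(G, seed=None):
    # Illustrative Luby-style rounds; not the PR's chunk-based parallel code.
    rng = random.Random(seed)
    adj = {u: set(G[u]) for u in G}   # static adjacency snapshot
    remaining = set(G)                # vertices not yet decided
    mis = set()
    while remaining:
        # One round: draw a fresh random priority for every surviving vertex.
        priority = {u: rng.random() for u in remaining}
        # Select every vertex whose priority beats all of its surviving neighbours.
        S = {u for u in remaining
             if all(priority[u] > priority[v] for v in adj[u] & remaining)}
        mis |= S
        # Remove the selected vertices and their neighbourhoods before the next round.
        remaining -= S
        for u in S:
            remaining -= adj[u]
    return mis

Each round the selected vertices are pairwise non-adjacent (only the higher-priority endpoint of an edge can be selected), and the loop terminates because at least one vertex is removed per round.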

Brief summary of changes (1–2 lines)

  • Added a file algorithms/mis.py
  • Added tests for algorithms/mis.py in algorithms/tests/test_mis.py
  • Local benchmarks show that for fewer than ~50K nodes the sequential code is fastest, while for larger graphs (n = 100K and 1M) the parallel code wins. This is likely due to joblib's parallelization overhead. I reproduced the same behaviour in C++ with OpenMP; the issue persists there, but the crossover happens at a smaller node count.

@pnijhara pnijhara changed the title [DRAFT] Added functionality of parallel maximal independent set [WIP] Added functionality of parallel maximal independent set Dec 13, 2025
@pnijhara
Author

@dschult please review it.

@Schefflera-Arboricola Schefflera-Arboricola added the type: Enhancement New feature or request label Dec 13, 2025
Member

@dschult dschult left a comment


Thanks for this!

Some formatting suggestions before diving into the code/tests:

  • we typically don't put error messages on the assert statement. The pytest report of the assert failure is sufficient.
  • we also typically don't include a comment line at the beginning of a test when the test function's name is sufficient. If the test involves something special or unexpected, then a couple-line docstring (or comment) is used. We aren't building docs for the tests.
  • instead of building a sequential version of the code inside nx-parallel, does it make sense to just "fall back" to the NetworkX version?

@pnijhara
Author

instead of building a sequential version of the code inside nx-parallel, does it make sense to just "fall back" to the NetworkX version?

I tried this approach; however, the tests that go through nx.maximal_independent_set were failing. Because of that, I implemented the sequential MIS approach myself.

@dschult
Member

dschult commented Dec 14, 2025

OK... I understand. That makes sense. But the NetworkX backend system has a way to flag functions so that they should not run under certain conditions. The backend provides a "should_run" function which is called by NetworkX before it tries to run the backend version of the function. If the should_run function returns True, the backend version is called. If that function returns a string or False, the backend function does not run (and the string is used to describe why it should not run -- and is used in the logging system for backends).

In nx-parallel we have a set of utility functions to help us set that up. They are currently used in centrality/harmonic.py and algorithms/dag.py. But neither are checking the size of the graph. The way to use "should_run" is with a decorator provided in utils like this from harmonic_centrality:

@nxp._configure_if_nx_active(should_run=nxp.should_run_if_sparse(threshold=0.3))
def harmonic_centrality(

That should_run function checks the density of the graph against a threshold. There is another function provided in utils called nxp.should_run_if_large. So I think we will use that one. Something like:

@nxp._configure_if_nx_active(should_run=nxp.should_run_if_large(50000))

Unfortunately, while that function has been created, it currently has no way to pass in the threshold -- it is hardcoded to 200 nodes and is not used anywhere yet. You can still use it here, though: it will ignore the input and use 200, but you can get this code working with that cutoff.

If you are up for it, you could update the utility function should_run_if_large to use the input value in place of the currently hardcoded 200. Maybe call the argument nodes or nodes_threshold? You can use a default value of 200 nodes. Then this PR can use that function with argument 50000.
If you want, you could even make those changes in another PR and we can review/merge it pretty quickly.

If you'd prefer not to mess with that let me know and I'll fix it.
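
For concreteness, one shape such a threshold-taking utility could have is sketched below (this is an assumed sketch, not the current nx-parallel implementation of should_run_if_large):

def should_run_if_large(nodes_threshold=200):
    # Hypothetical factory sketch: returns a should_run callable that approves
    # the parallel backend only for graphs with at least nodes_threshold nodes,
    # and otherwise returns a string explaining why it should not run.
    def _should_run(G, *_, **__):
        if len(G) >= nodes_threshold:
            return True
        return f"graph has fewer than {nodes_threshold} nodes; skipping parallel backend"
    return _should_run

With that shape, should_run=nxp.should_run_if_large(50000) would match the decorator usage shown above.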

@pnijhara
Author

pnijhara commented Dec 14, 2025

I tried it as suggested, though I am not entirely sure I have done it correctly. Now, instead of duplicating NetworkX's sequential MIS code, we fall back to their implementation for small graphs (< 50,000 nodes).

What I did:

  1. Dual-mode should_run_policies.py policy

Modified it to work in two modes (a rough sketch is shown after this list):

  • Direct function: should_run=nxp.should_run_if_large uses default 200 node threshold
  • Factory function: should_run=nxp.should_run_if_large(50000) returns wrapper with custom threshold

Also updated all policy signatures to accept **__ for keyword arguments passed by the backend dispatcher.

  2. Unwrapping NetworkX's dispatcher
    Used inspect.unwrap() to access the actual NetworkX implementation, bypassing multiple decorator layers. A direct import caused infinite recursion, since NetworkX's maximal_independent_set is itself a dispatched function. This was hard to fix; I had to use LLMs, sorry :(

  3. Explicit fallback check

Added a manual should_run check because NetworkX bypasses should_run when a backend is explicitly specified (e.g., backend='parallel'). Also converted seed to a Random object before the fallback, since the unwrapped NetworkX function expects a Random instance, not an int.
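
A rough sketch of the dual-mode behaviour described in point 1 (names and the graph-detection check here are assumptions for illustration, not the exact PR code):

import networkx as nx

def should_run_if_large(nodes_threshold=200, *_, **__):
    # Hypothetical dual-mode sketch, not the exact PR code.
    if isinstance(nodes_threshold, nx.Graph):
        # Direct mode: the dispatcher called this as should_run(G, ...), so the
        # first positional argument is actually the graph; use the 200 default.
        G = nodes_threshold
        return True if len(G) >= 200 else "graph below the default 200-node threshold"

    # Factory mode: called as should_run_if_large(50000); return the real policy.
    def _should_run(G, *_args, **_kwargs):
        if len(G) >= nodes_threshold:
            return True
        return f"graph has fewer than {nodes_threshold} nodes"

    return _should_run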

@pnijhara pnijhara requested a review from dschult December 14, 2025 17:10
Member

@dschult dschult left a comment


I haven't worked through the actual code yet, but I looked at the tests and the docs. It seems some of the tests may already be covered when we run the NetworkX tests with the environment variable set for nx-parallel. In that case, each NetworkX test runs with the nx-parallel code being called wherever it is supported. And we run that in the nx-parallel CI tests.

  • can you see whether any of the nx-parallel tests are already tested by the networkx tests -- and remove the duplicates?
  • I have a couple other comments/questions below.

I hope to look at the code itself soon.
:)

Comment on lines 9 to 10
from networkx.algorithms.mis import maximal_independent_set as _nx_mis_dispatcher
_nx_mis = inspect.unwrap(_nx_mis_dispatcher)
Member


I think you can use nx.maximal_independent_set._orig_func as the way to call the original networkx function. Can you check if that works? It is possible I haven't fully understood how that works. That way we don't have to use inspect.

@pnijhara
Author

can you see whether any of the nx-parallel tests are already tested by the networkx tests -- and remove the duplicates?

I checked all the tests of nx and nx-parallel and found most of them are already covered. For example, test_maximal_independent_set_basic is similar to test_random_graphs. The only test I think is new and should be kept is test_maximal_independent_set_large_graph, which exercises the parallel execution path. Let me know whether a single test seems viable.

Some other tests that I can think of are:

  1. Test chunking behavior - test different get_chunks parameter values (rough sketch below):
    def test_custom_chunking():
        """Test with a custom chunk function."""

  2. Test threshold boundary - verify the fallback to sequential at the 50,000-node threshold (this can be an overhead on a low-end machine):
    def test_fallback_below_threshold():
        """A graph with 49,999 nodes should use the sequential path."""
    def test_parallel_above_threshold():
        """A graph with 50,001 nodes should use the parallel path."""

  3. Test n_jobs behavior - verify parallel execution with different n_jobs (I am not inclined towards this):
    def test_with_different_njobs():
        """Test with n_jobs=1, 2, 4, etc."""

Let me know if these add value; I can implement some of them.
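
For instance, the chunking test could look roughly like this (a sketch assuming the PR's maximal_independent_set accepts the get_chunks keyword mentioned above; the graph size and chunk function are placeholders):

import networkx as nx
import nx_parallel as nxp

def test_custom_chunking():
    # Hypothetical sketch; assumes maximal_independent_set takes get_chunks.
    G = nx.fast_gnp_random_graph(200, 0.05, seed=42)

    def two_chunks(nodes):
        nodes = list(nodes)
        mid = len(nodes) // 2
        return [nodes[:mid], nodes[mid:]]

    mis = set(nxp.maximal_independent_set(G, get_chunks=two_chunks, seed=42))
    # Independence: no two selected nodes are adjacent.
    assert all(v not in G[u] for u in mis for v in mis if u != v)
    # Maximality: every non-selected node has a neighbour in the set.
    assert all(any(nbr in mis for nbr in G[u]) for u in G if u not in mis)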

I think you can use nx.maximal_independent_set._orig_func as the way to call the original networkx function. Can you check if that works? It is possible I haven't fully understood how that works. That way we don't have to use inspect.

I fixed this one with:

from networkx.algorithms.mis import maximal_independent_set as _nx_mis_dispatcher
_nx_mis = _nx_mis_dispatcher.orig_func

Comment on lines +17 to +18
# If nodes_threshold is a graph-like object, it's being used as a direct should_run
# function instead of a factory. Use default threshold.
Member


I would prefer the function signature here to be: should_run_if_large(G, nodes_threshold=200, *_, **__)
So can you briefly explain where this might be used as a factory? The previous version doesn't seem to deal with that case. Of course, it didn't deal with keyword args either... ;}

Comment on lines +84 to +86
# Validate directed graph
if G.is_directed():
raise nx.NetworkXNotImplemented("Not implemented for directed graphs.")
Member


Does the networkx version work with directed graphs?
If the networkx version uses the not_implemented_for decorator, then I think that decorator runs before the backend is called. You can check whether this code is ever reached by changing the message (locally) and trying it to see which message gets shown.
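
(For context, not_implemented_for is applied as a decorator on the NetworkX function, roughly like this generic illustration -- not a quote of the networkx source:)

from networkx.utils import not_implemented_for

@not_implemented_for("directed")
def some_undirected_only_algorithm(G):
    # Raises NetworkXNotImplemented for directed graphs before the body runs,
    # which is why the explicit G.is_directed() check above may be redundant.
    ...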

Comment on lines +88 to +98
# Convert seed to Random object if needed (for fallback and parallel execution)
import random
if seed is not None:
if hasattr(seed, 'random'):
# It's already a RandomState/Random object
rng = seed
else:
# It's a seed value
rng = random.Random(seed)
else:
rng = random.Random()
Member


Similarly, doesn't the seed input get handled by the decorators in the networkx code before it ever gets to the backend? Check...

Comment on lines +100 to +105
# Check if we should run parallel version
# This is needed when backend is explicitly specified
should_run_result = maximal_independent_set.should_run(G, nodes, seed)
if should_run_result is not True:
# Fall back to NetworkX sequential (unwrapped version needs Random object)
return _nx_mis(G, nodes=nodes, seed=rng)
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely gets called by the backend machinery. This code should never be reached.


# Parallel strategy: Run complete MIS algorithm on node chunks independently
# Then merge results by resolving conflicts
all_nodes = list(G.nodes())
Member


Suggested change
all_nodes = list(G.nodes())
all_nodes = list(G)

if nodes_set:
available = set(all_nodes) - nodes_set
for node in nodes_set:
available.discard(node)
Member


Don't these nodes already get removed two lines up?

Comment on lines +181 to +183
if nodes_set:
for node in nodes_set:
excluded.update(adj_dict[node])
Member


Didn't we do this already with available?
