-
Notifications
You must be signed in to change notification settings - Fork 17
added random_landmarking support for precomputed distance/affinity #88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Pull Request Test Coverage Report for Build 20349244901Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds support for random landmarking with precomputed distance and affinity matrices, addressing issue #87. Previously, random landmarking only worked with raw data using Euclidean distance computations. Now users can provide precomputed distance or affinity matrices and use random landmarking for landmark-based dimensionality reduction.
Key changes:
- Extended
build_landmark_op()to detect and handle precomputed matrices by using affinity-based clustering instead of distance-based clustering - Added test coverage for random landmarking with precomputed affinity matrices
- Preserved existing functionality for non-precomputed graphs
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| graphtools/graphs.py | Modified build_landmark_op() method to check for precomputed matrices and use kernel-based affinity clustering (argmax) instead of distance-based clustering (argmin) when precomputed matrices are detected |
| test/test_random_landmarking.py | Added new test test_random_landmarking_with_precomputed_affinity() to verify random landmarking works correctly with precomputed affinity matrices |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def test_random_landmarking_with_precomputed_affinity(): | ||
| """Random landmarking should work with precomputed affinity matrices""" | ||
| affinity = np.array( | ||
| [ | ||
| [1.0, 0.8, 0.1, 0.0, 0.0, 0.0], | ||
| [0.8, 1.0, 0.2, 0.0, 0.0, 0.0], | ||
| [0.1, 0.2, 1.0, 0.9, 0.4, 0.0], | ||
| [0.0, 0.0, 0.9, 1.0, 0.5, 0.2], | ||
| [0.0, 0.0, 0.4, 0.5, 1.0, 0.9], | ||
| [0.0, 0.0, 0.0, 0.2, 0.9, 1.0], | ||
| ] | ||
| ) | ||
| affinity = (affinity + affinity.T) / 2 # ensure symmetry | ||
| n_landmark = 3 | ||
| random_state = 42 | ||
|
|
||
| G = graphtools.Graph( | ||
| affinity, | ||
| precomputed="affinity", | ||
| n_landmark=n_landmark, | ||
| random_landmarking=True, | ||
| random_state=random_state, | ||
| knn=3, | ||
| thresh=0, | ||
| ) | ||
|
|
||
| # Trigger landmark construction | ||
| _ = G.landmark_op | ||
|
|
||
| rng = np.random.default_rng(random_state) | ||
| landmark_indices = rng.choice(affinity.shape[0], n_landmark, replace=False) | ||
| expected_clusters = np.asarray( | ||
| G.kernel[:, landmark_indices].argmax(axis=1) | ||
| ).reshape(-1) | ||
|
|
||
| assert np.array_equal(G.clusters, expected_clusters) | ||
| assert G.transitions.shape == (affinity.shape[0], n_landmark) | ||
| assert G.landmark_op.shape == (n_landmark, n_landmark) | ||
|
|
Copilot
AI
Dec 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this test covers precomputed="affinity", there is no test coverage for precomputed="distance". Since the implementation in graphs.py handles precomputed distance matrices by converting them to affinity matrices in build_kernel(), it would be valuable to add a similar test case that verifies random landmarking works correctly with precomputed distance matrices as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def test_random_landmarking_with_precomputed_affinity(): | ||
| """Random landmarking should work with precomputed affinity matrices""" | ||
| affinity = np.array( | ||
| [ | ||
| [1.0, 0.8, 0.1, 0.0, 0.0, 0.0], | ||
| [0.8, 1.0, 0.2, 0.0, 0.0, 0.0], | ||
| [0.1, 0.2, 1.0, 0.9, 0.4, 0.0], | ||
| [0.0, 0.0, 0.9, 1.0, 0.5, 0.2], | ||
| [0.0, 0.0, 0.4, 0.5, 1.0, 0.9], | ||
| [0.0, 0.0, 0.0, 0.2, 0.9, 1.0], | ||
| ] | ||
| ) | ||
| affinity = (affinity + affinity.T) / 2 # ensure symmetry | ||
| n_landmark = 3 | ||
| random_state = 42 | ||
|
|
||
| G = graphtools.Graph( | ||
| affinity, | ||
| precomputed="affinity", | ||
| n_landmark=n_landmark, | ||
| random_landmarking=True, | ||
| random_state=random_state, | ||
| knn=3, | ||
| thresh=0, | ||
| ) | ||
|
|
||
| # Trigger landmark construction | ||
| _ = G.landmark_op | ||
|
|
||
| rng = np.random.default_rng(random_state) | ||
| landmark_indices = rng.choice(affinity.shape[0], n_landmark, replace=False) | ||
| expected_clusters = np.asarray( | ||
| G.kernel[:, landmark_indices].argmax(axis=1) | ||
| ).reshape(-1) | ||
|
|
||
| assert np.array_equal(G.clusters, expected_clusters) | ||
| assert G.transitions.shape == (affinity.shape[0], n_landmark) | ||
| assert G.landmark_op.shape == (n_landmark, n_landmark) | ||
|
|
Copilot
AI
Dec 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test only covers dense precomputed affinity matrices. Consider adding test coverage for sparse precomputed affinity matrices to ensure the sparse matrix handling code path (lines 1210-1217 in graphs.py) works correctly with random landmarking.
| def test_random_landmarking_with_precomputed_distance(): | ||
| """Random landmarking should work with precomputed distance matrices""" | ||
| dist = np.array( | ||
| [ | ||
| [0, 1, 4, 4, 4, 4], | ||
| [1, 0, 4, 4, 4, 4], | ||
| [4, 4, 0, 1, 4, 4], | ||
| [4, 4, 1, 0, 4, 4], | ||
| [4, 4, 4, 4, 0, 1], | ||
| [4, 4, 4, 4, 1, 0], | ||
| ] | ||
| ) | ||
|
|
||
| n_landmark = 3 | ||
| random_state = 42 | ||
|
|
||
| G = graphtools.Graph( | ||
| dist, | ||
| precomputed="distance", | ||
| n_landmark=n_landmark, | ||
| random_landmarking=True, | ||
| random_state=random_state, | ||
| bandwidth=1, # deterministic affinity: exp(-dist) | ||
| decay=1, | ||
| thresh=0, | ||
| knn=3, | ||
| ) | ||
|
|
||
| # Trigger landmark construction | ||
| _ = G.landmark_op | ||
|
|
||
| rng = np.random.default_rng(random_state) | ||
| landmark_indices = rng.choice(dist.shape[0], n_landmark, replace=False) | ||
| expected_clusters = np.asarray( | ||
| G.kernel[:, landmark_indices].argmax(axis=1) | ||
| ).reshape(-1) | ||
|
|
||
| assert np.array_equal(G.clusters, expected_clusters) | ||
| assert G.transitions.shape == (dist.shape[0], n_landmark) | ||
| assert G.landmark_op.shape == (n_landmark, n_landmark) | ||
|
|
Copilot
AI
Dec 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test only covers dense precomputed distance matrices. Consider adding test coverage for sparse precomputed distance matrices to ensure the sparse matrix code path works correctly with random landmarking.
| def test_random_landmarking_with_precomputed_affinity(): | ||
| """Random landmarking should work with precomputed affinity matrices""" | ||
| affinity = np.array( | ||
| [ | ||
| [1.0, 0.8, 0.1, 0.0, 0.0, 0.0], | ||
| [0.8, 1.0, 0.2, 0.0, 0.0, 0.0], | ||
| [0.1, 0.2, 1.0, 0.9, 0.4, 0.0], | ||
| [0.0, 0.0, 0.9, 1.0, 0.5, 0.2], | ||
| [0.0, 0.0, 0.4, 0.5, 1.0, 0.9], | ||
| [0.0, 0.0, 0.0, 0.2, 0.9, 1.0], | ||
| ] | ||
| ) | ||
| affinity = (affinity + affinity.T) / 2 # ensure symmetry | ||
| n_landmark = 3 | ||
| random_state = 42 | ||
|
|
||
| G = graphtools.Graph( | ||
| affinity, | ||
| precomputed="affinity", | ||
| n_landmark=n_landmark, | ||
| random_landmarking=True, | ||
| random_state=random_state, | ||
| knn=3, | ||
| thresh=0, | ||
| ) | ||
|
|
||
| # Trigger landmark construction | ||
| _ = G.landmark_op | ||
|
|
||
| rng = np.random.default_rng(random_state) | ||
| landmark_indices = rng.choice(affinity.shape[0], n_landmark, replace=False) | ||
| expected_clusters = np.asarray( | ||
| G.kernel[:, landmark_indices].argmax(axis=1) | ||
| ).reshape(-1) | ||
|
|
||
| assert np.array_equal(G.clusters, expected_clusters) | ||
| assert G.transitions.shape == (affinity.shape[0], n_landmark) | ||
| assert G.landmark_op.shape == (n_landmark, n_landmark) | ||
|
|
Copilot
AI
Dec 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a test case for the edge case where some samples have zero affinity to all randomly selected landmarks. This would test the warning logic implemented in lines 1223-1228 of graphs.py, which is currently not covered by these tests.
graphtools/graphs.py
Outdated
| precomputed = getattr(self, "precomputed", None) | ||
|
|
||
| if precomputed is not None: | ||
| # Use the precomputed affinities/distances directly to avoid Euclidean fallback |
Copilot
AI
Dec 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment could be clearer. It says "Use the precomputed affinities/distances directly" but the code actually uses self.kernel, which is always an affinity matrix (distances are converted to affinities in build_kernel). Consider updating to "Use affinities from the kernel computed from the precomputed matrix" for clarity.
| # Use the precomputed affinities/distances directly to avoid Euclidean fallback | |
| # Use affinities from the kernel computed from the precomputed matrix to avoid Euclidean fallback |
|
@copilot open a new pull request to apply changes based on the comments in this thread |
|
@MattScicluna I've opened a new pull request, #89, to work on those changes. Once the pull request is ready, I'll request review from you. |
…prove comment clarity Co-authored-by: MattScicluna <19255250+MattScicluna@users.noreply.github.com>
Add sparse matrix and edge case test coverage for random landmarking with precomputed matrices
close #87