Add cosine distance as valid metric #36
Allows cosine distance to be set as the metric. [scipy.spatial.distance.cosine](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cosine.html#scipy.spatial.distance.cosine) returns `1 - cosine similarity`, which is [equivalent to angular distance](https://en.wikipedia.org/wiki/Cosine_similarity#Angular_distance_and_similarity), and thus is the same thing as setting `angular` as the metric.

```
ValueError: Unknown metric angular. Valid metrics are ['euclidean', 'l2', 'l1', 'manhattan', 'cityblock', 'braycurtis', 'canberra', 'chebyshev', 'correlation', 'cosine', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule', 'wminkowski', 'nan_euclidean', 'haversine'], or 'precomputed', or a callable
```

<details><summary>Full error message</summary>

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-4ea1dfbcbec4> in <module>
      1 sc.external.pp.bbknn(preprocessed, batch_key='species_batch', n_pcs=15, metric='cosine')
      2 
----> 3 sc.tl.umap(preprocessed)
      4 sc.pl.umap(preprocessed, **umap_plot_kws)

~/miniconda3/envs/tabula-microcebus-jan2021/lib/python3.7/site-packages/scanpy/tools/_umap.py in umap(adata, min_dist, spread, n_components, maxiter, alpha, gamma, negative_sample_rate, init_pos, random_state, a, b, copy, method, neighbors_key)
    171             neigh_params.get('metric', 'euclidean'),
    172             neigh_params.get('metric_kwds', {}),
--> 173             verbose=settings.verbosity > 3,
    174         )
    175     elif method == 'rapids':

~/miniconda3/envs/tabula-microcebus-jan2021/lib/python3.7/site-packages/umap/umap_.py in simplicial_set_embedding(data, graph, n_components, initial_alpha, a, b, gamma, negative_sample_rate, n_epochs, init, random_state, metric, metric_kwds, output_metric, output_metric_kwds, euclidean_output, parallel, verbose)
   1037             random_state,
   1038             metric=metric,
-> 1039             metric_kwds=metric_kwds,
   1040         )
   1041         expansion = 10.0 / np.abs(initialisation).max()

~/miniconda3/envs/tabula-microcebus-jan2021/lib/python3.7/site-packages/umap/spectral.py in spectral_layout(data, graph, dim, random_state, metric, metric_kwds)
    304             random_state,
    305             metric=metric,
--> 306             metric_kwds=metric_kwds,
    307         )
    308 

~/miniconda3/envs/tabula-microcebus-jan2021/lib/python3.7/site-packages/umap/spectral.py in multi_component_layout(data, graph, n_components, component_labels, dim, random_state, metric, metric_kwds)
    191             random_state,
    192             metric=metric,
--> 193             metric_kwds=metric_kwds,
    194         )
    195     else:

~/miniconda3/envs/tabula-microcebus-jan2021/lib/python3.7/site-packages/umap/spectral.py in component_layout(data, n_components, component_labels, dim, random_state, metric, metric_kwds)
    120     else:
    121         distance_matrix = pairwise_distances(
--> 122             component_centroids, metric=metric, **metric_kwds
    123         )
    124 

~/miniconda3/envs/tabula-microcebus-jan2021/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     70                       FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74 

~/miniconda3/envs/tabula-microcebus-jan2021/lib/python3.7/site-packages/sklearn/metrics/pairwise.py in pairwise_distances(X, Y, metric, n_jobs, force_all_finite, **kwds)
   1738         raise ValueError("Unknown metric %s. "
   1739                          "Valid metrics are %s, or 'precomputed', or a "
-> 1740                          "callable" % (metric, _VALID_METRICS))
   1741 
   1742     if metric == "precomputed":

ValueError: Unknown metric angular. Valid metrics are ['euclidean', 'l2', 'l1', 'manhattan', 'cityblock', 'braycurtis', 'canberra', 'chebyshev', 'correlation', 'cosine', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule', 'wminkowski', 'nan_euclidean', 'haversine'], or 'precomputed', or a callable
```

</details>

Let me know if you have any other suggestions! The other PR, #35, is a formatting PR and may be better merged first; then I can apply the Black formatting to this code.

According to annoy, the distance is 0.412; according to scipy/sklearn, it is 0.085. Additionally, the issue you're encountering is a symptom of something else. UMAP only seems to error out this way on select data. My guess is that it kicks in its spectral component layout, which computes some distances on its own when it deems the input too disjoint. As such, it goes to retrieve the metric, sees `angular`, and goes up in flames. The easiest fix is to change the default metric to Euclidean, as that's something every library involved speaks, including UMAP's spectral stitching step. However, while this makes things technically run, the stitched-together manifold turns into a clump. A way to avoid this spectral thing kicking in is increasing

I don't remember BBKNN/UMAP acting like this when I was developing the algorithm, and I'm unsure whether this is due to changes on the UMAP side or to me just being very fortunate with the data I was working with. I'm tempted to consult the UMAP folks for assistance on the matter.

I'm having the same issue. AFAIK, scanpy's UMAP fit takes the weighted adjacency matrix from `neighbors` and does not recalculate the distances, so it seems to me the error may come from a sanity check on the parameters scanpy passes to UMAP. To make

I've band-aided over the issue by swapping the metric to Euclidean. However, there's something weird afoot, as evidenced by the gloopy UMAP I sent earlier. The package also supports pynndescent now, which is UMAP's kNN algorithm of choice.


Hello,
When running this tool recently, I got errors saying "angular" is no longer a valid metric. It seems it has been replaced with "cosine" in both SciPy and scikit-learn, so this PR changes the default metric from "angular" to "cosine".
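Assuming annoy's `angular` is `sqrt(2 * (1 - cos))` while `cosine` is `1 - cos`, the former is a strictly increasing function of the latter, so under exact search switching the default should not change which neighbours are found, only the distance values that downstream steps see. A minimal pure-Python check on hypothetical toy points (helper names are illustrative):

```python
import math


def cosine_distance(u, v):
    # 1 - cosine similarity (scipy/sklearn "cosine")
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)


def annoy_angular(u, v):
    # annoy-style "angular": sqrt(2 * (1 - cos)), clamped against rounding error
    return math.sqrt(max(0.0, 2.0 * cosine_distance(u, v)))


def rank_neighbours(query, points, dist):
    # exact brute-force search: indices of `points`, nearest first
    return sorted(range(len(points)), key=lambda i: dist(query, points[i]))


points = [[1.0, 0.2], [0.1, 1.0], [0.7, 0.7], [-1.0, 0.3]]  # toy data
query = [1.0, 0.5]
print(rank_neighbours(query, points, cosine_distance))  # [0, 2, 1, 3]
print(rank_neighbours(query, points, annoy_angular))    # [0, 2, 1, 3]
```

Approximate indexes such as annoy may still return slightly different neighbours than an exact cosine search, since that depends on the index rather than the metric.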