scXMatch (single-cell cross match) is a Python package that implements Rosenbaum's cross-match test using distance-based matching to assess statistical dependence between two groups of high-dimensional data. This is particularly useful in analyzing multivariate distributions in structured data, such as single-cell RNA-seq.
This package provides a Python implementation inspired by the methodology described in Rosenbaum (2005).
Because it depends on graph-tool, this package can only be installed with conda, not from PyPI. The conda-forge and bioconda channels must be specified:
conda install scxmatch -c conda-forge -c bioconda
Requirements:
- Python ≥ 3.9
- anndata
- scanpy
- scipy
- graph-tool ≥ 2.92
scxmatch.test(
    adata,
    group_by,
    test_group,
    reference=None,
    metric="sqeuclidean",
    rank=False,
    k=100,
    total_RAM_available_gb=None
)
Performs Rosenbaum’s matching-based test to determine if there is a statistically significant difference between two groups of samples using a distance-based graph matching approach.
Parameters:
- adata (anndata.AnnData): The input data matrix. Features should be in adata.X, and group labels in adata.obs[group_by].
- group_by (str): Column in adata.obs indicating group labels.
- test_group (str or list of str): The group(s) to be tested.
- reference (str or list of str, optional): The reference group(s). If None, all non-test samples are used as reference.
- metric (str, default "sqeuclidean"): Distance metric for matching. Follows scipy.spatial.distance.cdist standards.
- rank (bool, default False): If True, features are rank-transformed before distance computation.
- k (int, "auto", or "full", default 100): Number of nearest neighbors to use for graph construction. If "full", a full distance matrix will be calculated.
- total_RAM_available_gb (float, optional): Required if k="auto".
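To make the roles of rank, metric, and k more concrete, here is a minimal, dependency-free sketch of a rank transform followed by squared Euclidean distances restricted to each sample's k nearest neighbors. It is an illustration only and not scXMatch's actual implementation: the helper names rank_columns, sqeuclidean, and knn_edges are hypothetical, and ranking each feature across samples with average ranks for ties is an assumption.

```python
def rank_columns(rows):
    """Replace each value by its 1-based average rank within its feature column."""
    n_rows, n_cols = len(rows), len(rows[0])
    ranked = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        order = sorted(range(n_rows), key=lambda i: rows[i][j])
        pos = 0
        while pos < n_rows:
            end = pos
            # Extend the run while the next sorted value ties with the current one.
            while end + 1 < n_rows and rows[order[end + 1]][j] == rows[order[pos]][j]:
                end += 1
            avg_rank = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
            for t in range(pos, end + 1):
                ranked[order[t]][j] = avg_rank
            pos = end + 1
    return ranked


def sqeuclidean(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def knn_edges(rows, k):
    """Keep, for every sample, the edges to its k nearest neighbors.

    Returns a dict mapping (i, j) with i < j to the distance between samples i and j.
    """
    edges = {}
    for i, u in enumerate(rows):
        dists = sorted((sqeuclidean(u, v), j) for j, v in enumerate(rows) if j != i)
        for d, j in dists[:k]:
            edges[(min(i, j), max(i, j))] = d
    return edges


# Toy data: 3 samples x 2 features, with one extreme value in the first feature.
data = [[1.0, 10.0], [2.0, 10.0], [100.0, 30.0]]
ranked = rank_columns(data)      # analogous in spirit to rank=True
edges = knn_edges(ranked, k=1)   # sparse neighbor graph instead of a full distance matrix
```

Ranking reduces the influence of extreme values on the distances, and a small k keeps the graph sparse, which is presumably why a full distance matrix is only computed on request.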
Returns:
- p_value (float): P-value from the Rosenbaum crossmatch test.
- z_score (float): Standardized test statistic.
- relative_support (float): Proportion of samples included in the matching.
Raises:
- TypeError: If the input adata is not an AnnData object.
- ValueError: If test_group or reference contains values not present in adata.obs[group_by].
- ValueError: If k="auto" and total_RAM_available_gb is not provided.
- ValueError: If k is not an integer, "auto", or "full".
- Modifies adata.obs in-place by adding the following columns:
  - XMatch_partner_<test_group>_vs_<reference>: The index of each sample’s matched partner in the MWMCM.
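The following sketch shows how a matching of the kind stored in the partner column relates to a cross-match count, a z-score, a p-value, and a support value, using the null distribution from Rosenbaum (2005). It is a sketch under stated assumptions rather than scXMatch's code: the function names, the dictionary inputs, the sign convention of the z-score, and the treatment of unmatched samples (simply excluded) are all hypothetical choices made for illustration.

```python
from math import comb, factorial, sqrt


def crossmatch_pmf(a1, n, m):
    """P(A1 = a1) under the null for n test and m reference samples, all matched."""
    if a1 < 0 or a1 > min(n, m) or (n - a1) % 2 or (m - a1) % 2:
        return 0.0
    a2 = (n - a1) // 2  # pairs with both members in the test group
    a0 = (m - a1) // 2  # pairs with both members in the reference group
    half = (n + m) // 2  # total number of matched pairs
    return (2 ** a1) * factorial(half) / (
        comb(n + m, n) * factorial(a0) * factorial(a1) * factorial(a2)
    )


def crossmatch_summary(labels, partner, test_group):
    """labels: sample -> group label; partner: sample -> matched sample or None."""
    matched = [s for s in labels if partner.get(s) is not None]
    pairs = {frozenset((s, partner[s])) for s in matched}
    # A pair is a cross-match when its two members carry different labels.
    a1 = sum(1 for p in pairs if len({labels[s] for s in p}) == 2)
    N = len(matched)
    n = sum(1 for s in matched if labels[s] == test_group)
    m = N - n
    mean = n * m / (N - 1)
    var = 2 * n * (n - 1) * m * (m - 1) / ((N - 3) * (N - 1) ** 2)
    z = (a1 - mean) / sqrt(var) if var > 0 else float("nan")
    # Few cross-matches indicate separated groups, so the p-value is the lower tail.
    p = sum(crossmatch_pmf(a, n, m) for a in range(a1 + 1))
    support = N / len(labels)  # fraction of samples that received a partner
    return a1, z, p, support


# Toy example: four treated and five control samples; "c5" stays unmatched.
labels = {f"t{i}": "treated" for i in range(1, 5)}
labels.update({f"c{i}": "control" for i in range(1, 6)})
partner = {"t1": "t2", "t2": "t1", "t3": "t4", "t4": "t3",
           "c1": "c2", "c2": "c1", "c3": "c4", "c4": "c3", "c5": None}
a1, z, p, support = crossmatch_summary(labels, partner, test_group="treated")
```

In this toy matching no pair crosses the two groups, so the cross-match count is zero and the z-score is negative under the convention chosen here.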
import anndata as ad
import scanpy as sc
import scxmatch

# Load your AnnData object or a scanpy example dataset
# adata = ad.read_h5ad("your_data.h5ad")
adata = sc.datasets.krumsiek11()

# Run test; group_by, test_group, and reference must match a column
# and group labels that actually exist in adata.obs of your dataset
p_val, z, support = scxmatch.test(
    adata=adata,
    group_by="condition",
    test_group="treated",
    reference="control",
    metric="sqeuclidean",
    rank=False,
    k=100
)
print(f"P-value: {p_val:.4f}, Z-score: {z:.2f}, Support: {support:.2%}")

If you use scXMatch in your research, please cite the original paper and our publication:
Rosenbaum, P. R. (2005). An exact distribution-free test comparing two multivariate distributions based on adjacency. Journal of the Royal Statistical Society: Series B, 67(4), 515–530.
Anna Moeller, Miriam Schnitzerlein, Eric Greto, Vasily Zaburdaev, Stefan Uderhardt, David B. Blumenthal. Quantifying distribution shifts in single-cell data with scXMatch. bioRxiv 2025.06.25.661473; doi: https://doi.org/10.1101/2025.06.25.661473
MIT License