
Calling Python with Reticulate

Yichen W edited this page Apr 6, 2021 · 4 revisions

When you need to wrap a Python module in SCTK, you will need to work with Reticulate. Here is the basic workflow.

Prerequisites

  1. Python and module installation
    There are several ways to install Python on your local machine, including a plain Python installation from Python's official site, a bundled Anaconda/Miniconda installation, and a Miniconda installation performed automatically by Reticulate. Choose whichever you prefer, but make sure that all the Python packages SCTK uses are installed in the same environment.

⚠️ Recent Anaconda versions may malfunction on some machines when users try to install packages. According to this GitHub Issue, an older version with "conda-4.6.14" can serve as a workaround. If you have trouble downgrading, download that specific version of the Miniconda installer from its archive.

  2. Reticulate installation
    In the R console, run the following command. Reticulate should already be installed, since it is a required package of SCTK.
> install.packages("reticulate")

To check that everything is set up, try importing the module you plan to wrap. For example, if you are going to wrap a function from NumPy, run the following in the R console:

> library(reticulate)
# Show you the location of the Python that Reticulate can find by default
> py_config() 
# Try to import the module
> np <- import("numpy")
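
The same availability check can also be run from the Python side, for example to confirm that the interpreter reported by py_config() really has the modules installed. The following is a minimal sketch that uses only the Python standard library; the function name missing_modules and the module list are illustrative assumptions, not part of SCTK or Reticulate.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that this interpreter cannot find."""
    # find_spec() looks up a top-level module without importing it
    # and returns None when the module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list: replace it with the modules your wrapper needs.
    required = ["numpy"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

One way to use it is to save it as a script and run it with the Python executable whose path py_config() prints, so that the check targets the same interpreter Reticulate will load.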

Wrap a module

The standard workflow for wrapping a Python function for SCTK consists of the following steps. The example below wraps NumPy's functionality:

  1. In R/reticulate_setup.R, under the comment # python modules to use (line 6), add the following line. For numpy it is already there, at line 10. If you are working with another module, just add its line below.
numpy <- NULL
  2. Still in this file, inside the function .onLoad(), add the following command.
numpy <<- reticulate::import("numpy", delay_load = TRUE)
  3. Also add the module to SCTK's customized auto-install functions, so that future users can install it together with the other required Python modules in one step.
  • In the function sctkPythonInstallConda(), which serves Conda users, add "numpy" to the default vector of the packages argument. In general, if your module is available through Conda, add its name as a string to packages; if it is only available through pip, add it to pipPackages.
  • In the function sctkPythonInstallVirtualEnv(), add "numpy" to the default vector of the packages argument.
  4. Write the function that wraps NumPy's functionality. For example, the following function uses NumPy to calculate the log-transformation of a count matrix.
numpyLogTransform <- function(inSCE, 
                              useAssay = "counts", 
                              assayName = "numpyLogcounts"){
  ## Input check
  if(!inherits(inSCE, "SingleCellExperiment")){
    stop("\"inSCE\" should be a SingleCellExperiment Object.")
  }
  if(!reticulate::py_module_available(module = "numpy")){
    warning("Cannot find python module 'numpy', please install Conda and",
            " run sctkPythonInstallConda() or run ",
            "sctkPythonInstallVirtualEnv(). If one of these have been ",
            "previously run to install the modules, make sure to run ",
            "selectSCTKConda() or selectSCTKVirtualEnvironment(),",
            " respectively, if R has been restarted since the module ",
            "installation. Alternatively, numpy can be installed on the local ",
            "machine with pip (e.g. pip install numpy) and then the ",
            "'use_python()' function from the 'reticulate' package can be used",
            " to select the correct Python environment.")
    return(inSCE)
  }
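  if(!useAssay %in% SummarizedExperiment::assayNames(inSCE)){
    stop("\"useAssay\": ", useAssay, " not found.")
  }
  ## Calculation
  mat <- SummarizedExperiment::assay(inSCE, useAssay)
  # Convert to a dense base R matrix before handing it to NumPy
  mat <- as.matrix(mat)
  # The addition runs in R; only log10() runs in Python
  transformed <- numpy$log10(mat + 1)
  # Copy the dimnames from the input matrix
  rownames(transformed) <- rownames(mat)
  colnames(transformed) <- colnames(mat)
  # Store the result through SCTK's tagging system
  expData(inSCE, assayName, "normalized") <- transformed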
  return(inSCE)
}
  5. Additionally, SCTK's GitHub repository is checked by Travis CI. If you have written an example that is not wrapped in \dontrun{}, you also need to add the required Python module to the Travis YAML file in order to pass the check. The YAML file is located at singleCellTK/.travis.yml, and the module name should be appended to the pip install ... command under the before_install: section.

Here are some points to notice:

  • The "input check" section is a must for every exported function. You may need to add further checks so that users get clear guidance from the error messages when they call the function incorrectly.
  • The only line that actually works with NumPy is transformed <- numpy$log10(mat + 1). Note that the + operation is still performed by R. Reticulate converts the result of mat + 1 to a numpy.ndarray and passes it to Python. After NumPy has done its job, Reticulate receives the result from Python, converts that numpy.ndarray back to an R matrix, and assigns it to transformed.
  • Use expData<- instead of SingleCellExperiment::assay<-. Refer to the SCTK tagging system.
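
When testing a wrapper like numpyLogTransform(), it helps to know exactly which values to expect back. The sketch below recomputes the same element-wise log10(x + 1) transformation in plain Python on a tiny matrix written as nested lists. It deliberately avoids NumPy so that it runs anywhere, and the function name log10_plus_one is an illustrative assumption rather than anything defined by SCTK.

```python
import math


def log10_plus_one(matrix):
    """Apply log10(x + 1) to every element of a matrix given as nested lists."""
    return [[math.log10(value + 1) for value in row] for row in matrix]


if __name__ == "__main__":
    # Toy count matrix: 2 genes (rows) x 3 cells (columns).
    counts = [[0, 9, 99],
              [999, 0, 9]]
    for row in log10_plus_one(counts):
        print([round(value, 6) for value in row])
```

Counts of 0, 9, 99, and 999 are convenient test inputs because they map to 0, 1, 2, and 3, which makes a mismatch in the assay returned by the wrapper easy to spot.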

Debugging

Make sure the auto-installation works.

Do not start with writing the wrapper. First check that the Python package can be installed purely through SCTK, since our goal is to reduce the command-line workload for users without the related knowledge.

Check how Reticulate converts your data.

Developers working with Reticulate should always pay attention to the conversions Reticulate performs, as this behavior can change between Reticulate releases. Each time you pass data from R to Python, or vice versa, print the converted result and check that it matches your expectation.

  • To see the result passed to R from Python, just use print() in the R console, or any other inspection method you trust.
  • To see the result passed to Python from R, assuming the R value is called data, run in the R console:
> py$data_from_R <- data
> py_run_string("print(data_from_R)")

Here py is an object that gives access to the Python workspace of the current session. The first command simply passes the value from R to Python, which is when the conversion takes place. The second command runs Python code directly, without any communication with the R environment. In this way, you can check what your data looks like once it has been converted to a Python object.
This is also a good approach for implementing your wrapper, especially when you need to assemble a complex input structure for the Python function you invoke. Only a few basic data types are supported by the automatic conversion, and sometimes the conversion does not happen as you expect. Keeping as much of the work as possible on one side greatly lowers the risk of messing up the structure.

Alternatively,

> py$data_from_R <- data
> repl_python()
Python 3.7.5 (C:/Users/Yichen/AppData/Local/Programs/Python/Python37/python.exe)
Reticulate 1.18 REPL -- A Python interpreter in R.
>>> print(data_from_R)

With repl_python(), Reticulate starts a Python session inside your R console, where you can run whatever Python operations you like to inspect the data.
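
Inside such a Python session, a bare print() does not always reveal which Python type an R value was converted to. Reticulate's documentation describes, for example, a named R list arriving as a Python dict and an R matrix as a NumPy ndarray. The helper below is a minimal sketch for inspecting whatever arrived; the name describe and its output format are assumptions made for illustration, and it relies only on the Python standard library.

```python
def describe(obj, max_items=3):
    """Return a short text summary of an object's type and structure."""
    lines = [f"type: {type(obj).__name__}"]
    # Array-like objects such as numpy.ndarray expose these attributes;
    # plain Python containers do not, so they are simply skipped.
    for attr in ("shape", "dtype"):
        if hasattr(obj, attr):
            lines.append(f"{attr}: {getattr(obj, attr)}")
    if isinstance(obj, dict):
        lines.append(f"length: {len(obj)}")
        # Show the value type for the first few keys only.
        for key in list(obj)[:max_items]:
            lines.append(f"  {key!r}: {type(obj[key]).__name__}")
    elif isinstance(obj, (list, tuple)):
        lines.append(f"length: {len(obj)}")
        element_types = sorted({type(item).__name__ for item in obj})
        lines.append("element types: " + ", ".join(element_types))
    return "\n".join(lines)


if __name__ == "__main__":
    # Stand-ins for values that would normally arrive from R through py$...
    print(describe({"genes": ["A", "B"], "k": 10}))
    print(describe([1.0, 2.0, 3.0]))
```

After pasting the function definition at the >>> prompt, calling print(describe(data_from_R)) summarizes the object assigned from R, including its shape and dtype when it is an array.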
