☁️ CloudLab-SciPy2025

This repository contains hands-on examples for processing large-scale scientific data in the cloud using:

Dataplug: A lightweight, client-side Python framework for efficient partitioning of unstructured scientific data stored in object storage (like Amazon S3), enabling elastic cloud processing.
Lithops: A serverless computing framework for scalable parallel processing in the cloud.

🚀 Quick Start (Recommended): Use pyrun.cloud

This tutorial is designed to run seamlessly on pyrun.cloud, a cloud-based JupyterLab platform with:

✅ Pre-installed dependencies
✅ Auto-configured Lithops backend
✅ Direct support for Dataplug and serverless workflows

🟢 No setup required: just launch the notebooks and start experimenting!

🧪 Running the Examples

📁 Example 1 – Using Dataplug Locally

Notebook: dataplug_example.ipynb

This notebook shows how to:

Load a FASTA file from an S3 bucket using CloudObject.from_s3
Explore metadata (e.g., number of sequences)
Preprocess and split the file into chunks
Partition the data for analysis

Run it on pyrun.cloud, or locally with:

jupyter notebook dataplug_example.ipynb
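To make the chunking idea concrete without needing S3 access, here is a minimal pure-Python sketch of splitting FASTA text on record boundaries. It is not Dataplug's implementation: the helper names `count_sequences` and `split_fasta` and the in-memory `sample` string are assumptions made up for illustration, whereas the notebook works on an object in S3 through `CloudObject`.

```python
def count_sequences(fasta_text: str) -> int:
    """Count FASTA records; each record starts with a '>' header line."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def split_fasta(fasta_text: str, num_chunks: int) -> list:
    """Split FASTA text into at most num_chunks pieces without cutting a record."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")

    # Group lines into whole records (header line plus its sequence lines).
    records = []
    current = []
    for line in fasta_text.splitlines(keepends=True):
        if line.startswith(">") and current:
            records.append("".join(current))
            current = []
        current.append(line)
    if current:
        records.append("".join(current))
    if not records:
        return []

    # Ceiling division: spread the records over at most num_chunks chunks.
    per_chunk = -(-len(records) // num_chunks)
    return [
        "".join(records[i:i + per_chunk])
        for i in range(0, len(records), per_chunk)
    ]


# Hypothetical three-record input, small enough to inspect by eye.
sample = ">seq1\nACGT\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n"
chunks = split_fasta(sample, num_chunks=2)
for index, chunk in enumerate(chunks):
    print(index, count_sequences(chunk))
```

Splitting only at header lines means every chunk is itself valid FASTA, which is the property that lets each chunk be analysed independently.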

☁️ Example 2 – Scalable Processing with Dataplug + Lithops

Notebook: dataplug_lithops.ipynb

This notebook demonstrates how to scale the same processing logic to the cloud using Lithops:

Partition the FASTA file with co.partition(...)
Apply process_fasta_partition to each slice
Launch parallel processing with lithops.FunctionExecutor

Run it on pyrun.cloud, or locally with:

jupyter notebook dataplug_lithops.ipynb
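The map pattern this notebook relies on can be sketched with the standard library alone. In the sketch below a thread pool stands in for `lithops.FunctionExecutor`, the body of `process_fasta_partition` is a hypothetical summary function (the notebook's own version may do something different), and the `slices` list is made-up sample data rather than the output of `co.partition(...)`.

```python
from concurrent.futures import ThreadPoolExecutor


def process_fasta_partition(chunk: str) -> dict:
    """Hypothetical per-slice task: count the records and bases in one FASTA slice."""
    num_sequences = 0
    num_bases = 0
    for line in chunk.splitlines():
        if line.startswith(">"):
            num_sequences += 1
        else:
            num_bases += len(line.strip())
    return {"sequences": num_sequences, "bases": num_bases}


# Made-up slices; in the notebook these would come from the partitioning step.
slices = [">seq1\nACGT\nACGT\n>seq2\nGGCC\n", ">seq3\nTTAA\n"]

# Map: apply the same function to every slice in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_results = list(pool.map(process_fasta_partition, slices))

# Reduce: combine the per-slice summaries into one result.
totals = {
    "sequences": sum(r["sequences"] for r in partial_results),
    "bases": sum(r["bases"] for r in partial_results),
}
print(totals)
```

With Lithops, the thread pool would be swapped for a `lithops.FunctionExecutor()`, whose `map(...)` and `get_result()` calls play the same roles as `pool.map(...)` and collecting the list here.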

✅ Dataplug integrates natively with Lithops: the same processing code runs locally or serverless with no changes.

💻 Running Locally (Optional)

If you prefer to run the notebooks on your own machine instead of on pyrun.cloud, follow these steps:

📦 Install required libraries

pip install git+https://github.com/CLOUDLAB-URV/dataplug
pip install lithops

⚙️ Configure Lithops

To execute functions in the cloud (AWS, IBM Cloud, Azure, etc.), you’ll need to configure your Lithops backend manually.

You can follow the official guide here:
👉 https://github.com/lithops-cloud/lithops#configuration

Create a .lithops_config file in your working directory (or a config file at ~/.lithops/config) with your credentials and backend options.
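Before launching a notebook, it can help to confirm that a configuration file is actually where you expect it. The helper below is a small sketch, not part of Lithops: `find_lithops_config` is a hypothetical name, and the lookup order it uses (the `LITHOPS_CONFIG_FILE` environment variable, then `.lithops_config` in the working directory, then `~/.lithops/config`) is an assumption that should be checked against the Lithops configuration guide linked above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_lithops_config(cwd, home, env: Optional[Mapping] = None) -> Optional[Path]:
    """Return the first existing candidate config file, or None if there is none.

    The candidate order is an assumption for illustration, not Lithops' own logic.
    """
    env = os.environ if env is None else env
    candidates = []

    explicit = env.get("LITHOPS_CONFIG_FILE")
    if explicit:
        candidates.append(Path(explicit))
    candidates.append(Path(cwd) / ".lithops_config")
    candidates.append(Path(home) / ".lithops" / "config")

    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


found = find_lithops_config(Path.cwd(), Path.home())
print(found if found else "No Lithops config file found")
```

If this prints no path, local runs will probably need a config file created first; on pyrun.cloud the backend is configured for you.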

📚 Requirements

Python 3.10 or higher
Access to S3-compatible object storage (e.g., AWS S3, MinIO)
Internet connection
Cloud credentials (automatically set in pyrun, or configured manually for local runs)
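For local runs, a quick preflight check along these lines can catch a missing requirement before a notebook fails halfway through. It is only a sketch: `check_environment` is a hypothetical helper, and it verifies just the Python version and whether the `dataplug` and `lithops` packages are importable, not storage access or credentials.

```python
import sys
from importlib.util import find_spec


def check_environment(min_version=(3, 10), packages=("dataplug", "lithops")) -> dict:
    """Report whether the interpreter and the required packages look usable."""
    report = {"python_ok": sys.version_info >= min_version}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = find_spec(name) is not None
    return report


for item, ok in check_environment().items():
    print(f"{item}: {'ok' if ok else 'missing'}")
```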

📣 About

This code is part of the PyRun-SciPy2025 tutorial series for scientific computing in the cloud.

ubenabdelkrim/PyRun-SciPy2025
