pyPDAF provides a Python interface to the established Parallel Data Assimilation Framework (PDAF). The original framework is used with various regional and global climate models including atmosphere, ocean, hydrology, land surface and sea ice models. These models are typically written in Fortran, which can be easily used with PDAF. pyPDAF is useful in the following scenarios:
- With an increasing number of Python-coded numerical models, especially machine learning models, pyPDAF is a convenient tool to implement data assimilation (DA) systems purely in Python.
- Alternatively, pyPDAF can be used to set up an offline data assimilation system. In such a system, the model fields in restart files are replaced by analyses generated by pyPDAF. This can be an attractive alternative to the original Fortran implementations considering the simplicity of code implementation and package management in Python.
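The offline workflow described above can be sketched as follows. This is only an illustration under stated assumptions: the restart files are stand-in JSON files, and `read_restart`, `write_restart`, `offline_cycle` and the `analysis_step` callback are hypothetical names that are not part of pyPDAF. In a real system, the analysis would be computed by pyPDAF and the restart format would be that of the model.

```python
import json
from pathlib import Path


def read_restart(path):
    """Read one ensemble member's model field from a stand-in JSON restart file."""
    return json.loads(Path(path).read_text())["field"]


def write_restart(path, field):
    """Overwrite the model field in the restart file with the given values."""
    Path(path).write_text(json.dumps({"field": field}))


def offline_cycle(restart_paths, analysis_step):
    """Replace the forecast fields in each restart file by their analyses.

    analysis_step is a placeholder for the DA update (hypothetical): it takes
    the list of forecast fields, one per ensemble member, and returns the
    list of analysis fields in the same order.
    """
    forecast = [read_restart(p) for p in restart_paths]
    analysis = analysis_step(forecast)
    for path, field in zip(restart_paths, analysis):
        write_restart(path, field)
```

The model is then restarted from the updated files, so no change to the model code itself is needed.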
The interface inherits the efficiency of the Fortran data assimilation algorithms and the flexibility to be applied to different models and observations. Users of pyPDAF can therefore couple the DA algorithms with any type of model and observations without having to code the DA algorithms themselves, and can focus on their specific research problems. The framework includes various ensemble DA algorithms, including many variants of ensemble Kalman filters, particle filters and other non-linear filters. It also provides a framework for variants of 3DVar. A full list of supported methods can be found here
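To give a sense of what an ensemble Kalman filter analysis step does, here is a minimal sketch for a single scalar state that is observed directly. It is a textbook square-root update written for illustration only, not the PDAF implementation; the function name `scalar_ensrf_update` and its arguments are hypothetical.

```python
import math
import statistics


def scalar_ensrf_update(ensemble, obs, obs_var):
    """Deterministic (square-root) ensemble Kalman update of a scalar state.

    ensemble: forecast values, one per member (at least two members)
    obs:      observed value of the state itself
    obs_var:  observation error variance
    """
    mean_f = statistics.fmean(ensemble)
    var_f = statistics.variance(ensemble)     # sample variance of the forecast
    gain = var_f / (var_f + obs_var)          # Kalman gain for a direct observation
    mean_a = mean_f + gain * (obs - mean_f)   # analysis mean
    shrink = math.sqrt(1.0 - gain)            # gives var_a = (1 - gain) * var_f
    return [mean_a + shrink * (x - mean_f) for x in ensemble]
```

The analysis ensemble is pulled towards the observation and its spread is reduced, by an amount that depends on the ratio of forecast variance to observation error variance.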
It is recommended to install pyPDAF via conda:
conda create -n pypdaf -c conda-forge yumengch::pypdaf
You can also install from the source code using pip and meson. One can find the information in .
To construct a data assimilation system, only the following pyPDAF functions are necessary:
- pyPDAF.set_parallel - can be omitted when running without parallelisation
- pyPDAF.init
- pyPDAF.PDAFomi.init
- pyPDAF.PDAFomi.init_local - only used in domain localisation
- pyPDAF.PDAFomi.set_domain_limits - only used in domain localisation
- pyPDAF.init_forecast
- pyPDAF.assimilate
- pyPDAF.deallocate
However, users have to implement user-supplied functions to provide state vector and observation information.
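User-supplied functions typically move data between the model fields and the flat state vector that the DA algorithm works on. The sketch below shows that idea in plain Python. `collect_state` and `distribute_state` are hypothetical helper names and do not follow the actual pyPDAF callback signatures, which should be taken from the pyPDAF documentation.

```python
def collect_state(fields):
    """Flatten named 2D model fields into one state vector.

    fields: dict mapping a field name to a list of rows (a 2D field).
    Fields are concatenated in the dict's insertion order, row by row.
    """
    state = []
    for field in fields.values():
        for row in field:
            state.extend(row)
    return state


def distribute_state(state, fields):
    """Write a state vector back into the model fields in place.

    Uses the same ordering as collect_state, so the two are inverses.
    """
    offset = 0
    for field in fields.values():
        for row in field:
            n = len(row)
            row[:] = state[offset:offset + n]
            offset += n
```

Keeping the two functions symmetric ensures that every element of the state vector returns to the model variable it came from after the analysis.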
For users without prior experience with PDAF, we highly recommend
starting with the tutorial here:
.
To illustrate how to construct a parallel ensemble DA system,
we provide both online and offline examples
in the example directory.
pyPDAF and PDAF both use Message Passing Interface (MPI)
parallelisation. Hence, the example needs to be executed
from the command line using mpiexec. For example,
cd example
mpiexec -n 4 python -u online/main.py
will run the example with 4 processes. The example is based on the tutorials of the original PDAF.
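With several processes, an online ensemble system has to decide which process runs which ensemble member. A minimal sketch of one such assignment is shown below, assuming the processes are split into contiguous groups of near-equal size. `assign_model_tasks` is a hypothetical helper, not a pyPDAF function, and the set-up used in the example itself may differ.

```python
def assign_model_tasks(n_procs, n_tasks):
    """Map each MPI rank to a model task (a group running one ensemble member).

    Ranks are split into n_tasks contiguous groups; the first
    n_procs % n_tasks groups receive one extra rank.
    Returns a list where entry r is the task index of rank r.
    """
    if n_tasks < 1 or n_procs < n_tasks:
        raise ValueError("need at least one process per model task")
    base, extra = divmod(n_procs, n_tasks)
    tasks = []
    for task in range(n_tasks):
        size = base + (1 if task < extra else 0)
        tasks.extend([task] * size)
    return tasks
```

For instance, `assign_model_tasks(4, 2)` places ranks 0 and 1 in the first task and ranks 2 and 3 in the second.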
The latest version of pyPDAF interfaces with PDAF-V3.0.
Documentation is provided.
The interface follows the naming convention of PDAF. We divide PDAF subroutines
into several subpackages including PDAF, PDAF3, PDAFomi, PDAFlocal, and
PDAFlocalomi. These provide
We welcome issues, pull requests, feature requests and any other discussions in the issues section.
Yumeng Chen, Lars Nerger
pyPDAF is mainly developed and maintained by the National Centre for Earth Observation and the University of Reading.

