
Possible route to investigate: Pipeline for creating out-of-memory AIA data cubes with NDCube and Dask #1

@wtbarnes

Description

I realize it's very late in the game to be adding another project idea, but we have quite a few people on the team, so I figured it couldn't hurt. This is a bit of a mess; I'll come back later to clean it up, and some more concrete steps will likely emerge once we talk things through.

One very common task with AIA images is time-series analysis: stacking many sequential images and analyzing them in time. Few people care about a single AIA image. Especially at the full 4K resolution, this can be extremely challenging, both computationally and logistically, over any significant span of time. Even smaller cutouts can be a problem, since accessing and aligning them is often difficult.

The typical workflow looks like this:

  1. Query and download AIA data over some interval and for a selected number of wavelengths
    (optionally, also deconvolve with the PSF, though this isn't often done).
  2. Apply prep operations to every image so that each wavelength, at a given timestep, is on the same pixel grid.
  3. Pick a reference time and align all images to it, removing the effect of the Sun's rotation so that a given pixel corresponds to the same physical location on the Sun at every time. Alternatively, use some type of cross-correlation technique to align the images.
  4. Stack the aligned images into a cube of dimensions nx-by-ny-by-nt, where nt is the number of images in the selected interval. There will be one cube per wavelength.
  5. Perform scientific analysis on these aligned data cubes (e.g. extracting light curves from specific regions, computing time lags in each pixel, computing DEMs in each pixel as a function of time).
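To make step 3 concrete, here is a deliberately simplified, in-memory toy: it cancels a constant (rigid, not differential) apparent drift by shifting each image a whole number of pixels. A real pipeline would instead reproject each map to the reference time's WCS (e.g. with reproject and sunpy's `propagate_with_solar_surface` context manager); the drift rate and `align_to_reference` helper below are invented for illustration.

```python
import numpy as np

# Assumed (made-up) apparent drift of solar features, in pixels per second.
OMEGA_PX_PER_S = 0.01

def align_to_reference(images, times, t_ref):
    """Shift each (ny, nx) image so a feature at t_ref stays at fixed pixels.

    Rigid whole-pixel shifts only; real alignment would reproject with the
    full WCS and handle differential rotation.
    """
    aligned = []
    for img, t in zip(images, times):
        shift = int(round(OMEGA_PX_PER_S * (t - t_ref)))
        aligned.append(np.roll(img, -shift, axis=1))  # undo drift along x
    return np.stack(aligned, axis=0)

# A bright "feature" that drifts one pixel every 100 s:
times = [0.0, 100.0, 200.0]
images = [np.zeros((8, 8)) for _ in times]
for k, img in enumerate(images):
    img[4, 2 + k] = 1.0

cube = align_to_reference(images, times, t_ref=0.0)
# After alignment the feature sits in the same column at every time step.
```

The point is only that alignment is a per-image operation whose output stacks naturally into the (nt, ny, nx) cube of step 4.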

Steps 1 through 3 all have to occur on a per-image basis.

In an ideal world, I would be able to pass in a stack of AIA FITS files and process them, in parallel, into an out-of-core, aligned AIA data cube.
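A minimal sketch of how the lazy stacking could work with `dask.delayed` and `dask.array.from_delayed`. The `load_image` function here is a hypothetical stand-in that fabricates an array so the example is self-contained; in the real pipeline it would open a FITS file (e.g. with `astropy.io.fits` or `sunpy.map.Map`) and return the prepped, aligned image.

```python
import numpy as np
import dask
import dask.array as da

# Hypothetical per-file loader. In a real pipeline this would read and prep
# one AIA FITS file; here it just fabricates a 2D array.
@dask.delayed
def load_image(index, shape=(64, 64)):
    return np.full(shape, float(index), dtype=np.float64)

def lazy_aia_cube(n_images, shape=(64, 64)):
    """Stack n_images lazily into an (nt, ny, nx) Dask array.

    No pixel data is materialized until a reduction or .compute() is
    called, so the full cube never has to fit in memory at once.
    """
    frames = [
        da.from_delayed(load_image(i, shape), shape=shape, dtype=np.float64)
        for i in range(n_images)
    ]
    return da.stack(frames, axis=0)

cube = lazy_aia_cube(10)
print(cube.shape)  # (10, 64, 64)
lightcurve = cube.mean(axis=(1, 2)).compute()  # per-image mean intensity
```

Because each frame is an independent delayed task, the per-image steps (prep, alignment) parallelize for free under whichever Dask scheduler is in use.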

I'm not exactly sure how this would fit into the aiapy package, but I think a prototype of this kind of workflow, especially one accelerated with tools like Numba or CUDA, would be really beneficial to the community.

I see this as less about creating custom tools for AIA analysis and more about demonstrating how fast and powerful this workflow can be in Python.

Tools we would probably need:

  • ndcube
  • sunpy
  • reproject (for image alignment in time)
  • Dask

Main challenges:

  • Parallelizing the prep step (we are already planning to address this)
  • Loading the FITS files lazily
  • Transitioning from per-file operations to operations on the whole cube
  • Making these workflows identical regardless of whether Dask is being used, i.e. all of this should also be possible in memory and in serial
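On the last point, one way to keep the Dask and in-memory paths identical is to write analysis functions against the NumPy API only, so the same code accepts either an `ndarray` or a chunked Dask array. A small sketch (the `region_lightcurve` helper is invented for illustration):

```python
import numpy as np
import dask.array as da

def region_lightcurve(cube, y_slice, x_slice):
    """Mean intensity in a pixel box vs. time for an (nt, ny, nx) cube.

    Uses only NumPy-API operations (slicing, mean), so it works unchanged
    on an in-memory ndarray or a lazy Dask array; with Dask the result is
    itself lazy until .compute() is called.
    """
    return cube[:, y_slice, x_slice].mean(axis=(1, 2))

data = np.arange(4 * 8 * 8, dtype=float).reshape(4, 8, 8)
eager = region_lightcurve(data, slice(2, 4), slice(2, 4))
lazy = region_lightcurve(da.from_array(data, chunks=(1, 8, 8)),
                         slice(2, 4), slice(2, 4))
assert np.allclose(eager, lazy.compute())  # identical answers either way
```

The serial/in-memory case is then just the degenerate configuration, not a separate code path.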

As a possible starting point, see this notebook: https://gitlab.com/wtbarnes/aia-on-pleiades/-/blob/master/notebooks/tidy/data-cubes.ipynb
