Skip to content

Conversation

@JeffFessler
Copy link
Collaborator

@JeffFessler JeffFessler commented Jan 3, 2026

This PR has the following changes:

  • Extend Allow convolve to work with Real arrays with arbitrary precision #159 to also work with GPU arrays, i.e., convolve! now works with real (GPU) array types.
  • Use the new Real capabilities of convolve! so that weights are always real in sdc, avoiding abs and other conversions between real and complex types.
  • Solve analytically for the real weights scaling factor c in sdc to avoid calling \.
  • Comments out the line weights_tmp .+= eps(T) that seems unnecessary. (Are there corner cases where it is needed? If so, then the tests should be expanded to cover such a case.)
  • Make an (almost) non-allocating version sdc!.
  • Offer two versions of sdc! - one with and one without the final "scaling" step that is not always essential.
  • Purge residual code (in comments) about conversion to Array is a workaround for CuNFFT.

The one aspect I could not figure out how to address here is how to recycle the working space in the complex array p.tmpVec as a real array. My attempts to do this using reinterpret have failed. It could be refined with another PR later if anyone has ideas.

@JeffFessler JeffFessler marked this pull request as draft January 3, 2026 19:36
@JeffFessler JeffFessler requested review from nHackel and tknopp January 3, 2026 19:37
@nHackel
Copy link
Collaborator

nHackel commented Jan 5, 2026

@JeffFessler I took a look at the allocations with

Profile.Allocs.@profile sample_rate=1.0 NFFTTools.sdc!(p, 10,
            w2,
            weights_tmp,
            workg,
            workf,
            workv,
)

and I think the allocations are fairly minimal.

Most allocations seem to happen due to the multi-threading book-keeping, either in FFTW or our @cthreads via OhMyThreads.jl. I think with the example the ratio of data to bookkeeping is making this seem worse than things are.

In regards to reusing p.tmpVec I don't have a solution yet, but something I noticed while reviewing was that the sdc code dispatches on p::AbstractNFFTPlans but uses things like p.J, p.N, p.tmpVec which aren't part of the abstract interface. That is of course already an existing problem. I don't think we have the necessary metadata in the abstract interface to write this function in a proper abstract way. In particular, we have size, size_in and size_out, but those don't give us all the required values.

Maybe I am missing something there though and it is possible

@JeffFessler
Copy link
Collaborator Author

Thanks for the notes about allocs - makes sense.
I updated to use size_* where possible, but you are right that we can't eliminate p.tmpVec because we need its type do GPU stuff since the JLArray type is not part of the Plan type. And I don't see how to avoid p.Ñ either. Seems odd that the abstract interface requires convolve! but doesn't provide the key ingredients needed for that method.

Would it make sense to retreat from AbstractNFFTPlans and just use NFFTPlan? I'm happy to make that change, and if someone later wants a more general version they can tackle the challenge of convolve! etc. Thoughts?

@JeffFessler
Copy link
Collaborator Author

tests are failing because "NFFTPlan has no field size_in" sigh.

@nHackel
Copy link
Collaborator

nHackel commented Jan 7, 2026

@JeffFessler size_in and size_out are both functions you apply to the plans.

I think it makes sense to retreat from AbstractNFFT and change both the types in the methods and the dependency fot NFFTPlans/NFFT.jl for now. I'm not sure if I'd even classify this as a breaking change, because the code as written never worked before 😅

Going foward, we could "fix" this by:

  • Extending the AbstractNFFT interface to define methods for the metadata we need to allocate the sdc vectors
  • Giving the sdc an additional parameter which determines the array type with a default value of Array
  • Providing a package extension on NFFT.jl which has a default value of tmpVec (and potentially reuses tmpVec as a sdc! input)

If I remember correctly, AbstractNFFT.jl has @mustimplement convolve! but with a docstring that states that the operation is optional.

Still it would be good to have fitting function to retrieve the values we need, if those are something that are generally required for an NFFT

@nHackel
Copy link
Collaborator

nHackel commented Jan 7, 2026

I think the missing value is only right? Which is roughly σN with some rounding logic and probably common to all implementations unless I got lost in the maths again.

As far as I know, at the moment only NonuniformFFTs.jl would be a possible candidate for another package which could be used for NFFTTools and I don't think it implements the convolve! calls

@JeffFessler
Copy link
Collaborator Author

@nHackel there is a probley with our plan of removing AbstractNFFTs from sdc.
The NFFT tests also call sdc with NFFTGPUArraysExt.GPU_NFFTPlan which is why the tests fail now.
So I think we need to revert to AbstractNFFT for now. Agreed?

@nHackel
Copy link
Collaborator

nHackel commented Jan 9, 2026

@JeffFessler ah good catch! I didn't think about that. Some day I might find the necessary time (and understanding of the maths) to port more convolutions options than just NFFT.full to the GPU and then we can have only one NFFTPlan type.

Or as a temporary workaround, I could try to make the NFFTPlan parametric enough for GPU arrays and then just leave most of the unused fields empty.

Either way, I think for this PR it's good enough if we simply:

  1. Change the dependency from AbstractNFFTs to NFFT
  2. Leave the interface as AbstractNFFT (but get there via NFFT and not AbstractNFFTs)
  3. Make a comment in the source code that this only works for plans from NFFT.jl

@JeffFessler
Copy link
Collaborator Author

I have made the suggested changes I think, except for this one that I didn't grok:

  • Change the dependency from AbstractNFFTs to NFFT
    I think we still need the AbstractNFFTs dependence.

So now my main remaining question is whether I can cut the commented out code about conversion to Array is a workaround for CuNFFT

@JeffFessler JeffFessler marked this pull request as ready for review January 9, 2026 14:54
@nHackel
Copy link
Collaborator

nHackel commented Jan 17, 2026

@JeffFessler I just had time to take a look at this again. I think I meant dropping the AbstractNFFT dependency and just doing:

# in NFFTTools.jl
using NFFT, NFFT.AbstractNFFTs 

but ultimately it doesn't really make a difference. And we can just leave it as it is now

I think we can cut the comment and "worst" case it's still referred to in the PR and releases. IF the bug turns up again, we should try to turn it into a MWE.

Lastly, we also need to update the project.toml of NFFT.jl for your fix to the GPU extension

@JeffFessler
Copy link
Collaborator Author

we also need to update the project.toml of NFFT.jl for your fix to the GPU extension

I think you mean a version bump, right? I bumped versions for both NFFTTools and NFFT.
@nHackel Please check if I did it right because I am less sure how version changes work with extensions.

@nHackel nHackel merged commit cb3cac0 into master Jan 21, 2026
5 of 6 checks passed
@JeffFessler JeffFessler deleted the jf/sdc-real branch January 21, 2026 14:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants