trackpy.locate is slow. Could we use parallel processing? trackpy.batch already processes frames in parallel, but that is currently broken on Windows, so maybe we could fix it. It would also be nice to offer both parallel processing and a cancel button, which means the frame-processing loop needs to be somewhat customizable.
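A rough sketch of what "customizable" could mean, assuming the caller supplies the executor and a cancel flag. This is not trackpy's API: locate_frames is a hypothetical helper, and locate_func stands in for whatever per-frame function is used. The caller chooses between serial execution (no executor), threads, or processes. The caller can also stop early by setting a threading.Event, for example from a GUI cancel button.

```python
import threading
from concurrent.futures import Executor, ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Optional, Sequence


def locate_frames(
    frames: Sequence[Any],
    locate_func: Callable[[Any], Any],
    executor: Optional[Executor] = None,
    cancel_event: Optional[threading.Event] = None,
) -> Dict[int, Any]:
    """Hypothetical helper: apply locate_func to each frame.

    Runs serially if executor is None, otherwise submits every frame to the
    given executor. Stops early once cancel_event is set. Returns a dict
    {frame_index: result} containing only the frames that were collected.
    """
    results: Dict[int, Any] = {}

    if executor is None:
        # Serial path: check the cancel flag before starting each frame.
        for i, frame in enumerate(frames):
            if cancel_event is not None and cancel_event.is_set():
                break
            results[i] = locate_func(frame)
        return results

    # Parallel path: submit all frames, then collect results as they finish.
    futures = {executor.submit(locate_func, frame): i for i, frame in enumerate(frames)}
    for fut in as_completed(futures):
        if cancel_event is not None and cancel_event.is_set():
            # Cancel frames that have not started yet; frames already
            # running cannot be interrupted and will simply be discarded.
            for f in futures:
                f.cancel()
            break
        results[futures[fut]] = fut.result()
    return results
```

For real data, locate_func could be something like functools.partial(trackpy.locate, diameter=11), with the diameter chosen for your particles. If a ProcessPoolExecutor is used on Windows, keep in mind that Windows starts worker processes with the "spawn" method. The calling script then needs an `if __name__ == "__main__":` guard, and the function must be picklable (defined at module level). Whether that is what breaks trackpy.batch on Windows is an assumption worth checking.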