Description
I have recently been experimenting with different ways of dealing with very long arrays. The summary below is partly for the benefit of others, but also to see if there are any other ways to speed things up.
The background is that when transferring very long arrays between Excel and Python, everything works much quicker with a vector array, or a single-row array, than with a 2D array with many rows, even if it is a single column. One way around this is to use the VBA Transpose function to convert a single column into a single row, but this fails if there are more than 2^16 rows.
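On the Python side the same flattening can be done without the 2^16 limit, since numpy's `ravel` works for any length. A minimal sketch (the `column` data is made up for illustration; a single-column Excel range arrives as a list of one-element rows):

```python
import numpy as np

# A single-column range arrives from Excel as a list of one-element rows,
# e.g. [[1.0], [2.0], ...]. Illustrative data with more rows than 2^16 = 65536:
column = [[float(i)] for i in range(70000)]

# asarray gives shape (70000, 1); ravel flattens it to a 1-D vector,
# with no row-count limit, unlike VBA's Transpose.
vector = np.asarray(column).ravel()
print(vector.shape)  # → (70000,)
```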
I have looked at six options, described below. In Python I have two functions:
```python
from xlpython import *
import numpy as np

@xlfunc
@xlarg("x", "nparray", dims=1)
@xlarg("y", "nparray", dims=1)
def AddVect(x, y):
    # x and y arrive already converted to 1-D numpy arrays
    return x + y

@xlfunc
def AddVect2(x, y):
    # x and y arrive as nested lists; convert to numpy arrays here
    x = np.array(x)
    y = np.array(y)
    return x + y
```
Using these I have written six VBA UDFs that transfer two complete columns (i.e. 1,048,576 rows each), read the result of the Python function, and record the time taken for each step.
1: VBA code generated by ExcelPython, plus create a single-column array and transfer the contents of the Python return array with a loop.
2: Transfer the Excel arrays as single-column variant arrays, and convert them to numpy arrays in Python with AddVect2. This returns a single-column array that does not need to be transposed.
3: Convert to vector arrays of doubles with a loop in VBA, then convert to numpy arrays in VBA; convert the results back to a column array with a loop.
4: As 3, but transfer as double arrays and convert to numpy arrays in Python.
5: As 4, but convert the column arrays to single-row arrays in VBA.
6: As 5, but using variant arrays rather than doubles.
Total times for each function were:
1: 1.34 sec
2: 2.28 sec
3: 1.95 sec
4: 0.79 sec
5: 0.82 sec
6: 1.36 sec
So the conclusions are:
- Passing long arrays as vectors, or single row arrays, is much faster than passing single column arrays.
- Function 3 was presumably doubling up some of the operations done in 1, hence the slower time.
- Creating vector arrays of doubles (functions 4 and 5) was significantly faster than converting variant arrays to numpy arrays in VBA, in spite of the need to loop through 1 million+ rows at each end of the operation.
- Looping with variant arrays (Function 6) was very similar in time to the ExcelPython code.
- Even with Functions 4 and 5 the data transpose and transfer operations were hugely slower than the addition of the two vectors in Python, which took just a few milliseconds.
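The in-process part of that difference can be sketched in pure Python. This times only the list-to-numpy conversion for a column-shaped nested list versus a flat vector of doubles; it does not capture the COM marshalling overhead of the real Excel transfer, so the absolute numbers are only illustrative:

```python
import time
import numpy as np

n = 1_048_576  # one full Excel column
column = [[float(i)] for i in range(n)]  # 2-D: one single-element row per cell
flat = [float(i) for i in range(n)]      # 1-D: plain vector of doubles

t0 = time.perf_counter()
a = np.array(column).ravel()             # convert nested column, then flatten
t1 = time.perf_counter()
b = np.array(flat)                       # convert flat vector directly
t2 = time.perf_counter()

print(f"nested column -> numpy: {t1 - t0:.3f} s")
print(f"flat vector   -> numpy: {t2 - t1:.3f} s")
```

The nested-list conversion is consistently the slower path, which is consistent with the column-versus-vector gap seen in the UDF timings above.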
So the message is: if you have long column arrays, transpose them into double vector arrays before transferring to Python, then convert them to numpy arrays in Python.
And the question is, is there any way to further reduce the overhead of the data transfer?