This could go under "optimized calls" in the data.table section or under "benchmarking" in the misc section.
Sorting. An example from today:
```r
library(data.table)
n = 3e7
nv = 1e7
DT = data.table(dt = Sys.time() + sample(nv, n, replace=TRUE))[, c("d", "t") := .(as.IDate(dt), as.ITime(dt))][]
setindex(DT, dt)
setindex(DT, d, t)
system.time(DT[order(dt)]) # 4.8 s
system.time(DT[order(d, t)]) # 2.9 s
```
My takeaway is that sorting on ints is faster: d and t are integer-based (IDate and ITime), while dt is a POSIXct stored as a double. I'm not actually sure whether the indices are helping, since they are not acknowledged in the verbose output. The results above might also be skewed by my computer currently being at 99% RAM usage...
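For anyone who wants to check the index question on their own machine, here is a minimal sketch of how one might inspect it. The column name x and the table size are made up for illustration, and what the verbose output actually says depends on the data.table version, so this only shows where to look, not what you will see.

```r
library(data.table)

# Small illustrative table; the column name x and the sizes are made up
set.seed(1)
DT = data.table(x = sample(100L, 1e4, replace = TRUE))

setindex(DT, x)      # build a secondary index on x
print(indices(DT))   # names of the indices currently stored on DT

# verbose = TRUE makes this one call print its internal steps, so you can
# read off whether an index is mentioned; the messages vary by version
res = DT[order(x), verbose = TRUE]
```

If the verbose output never mentions the index, that would be consistent with the sort not using it, though I would not treat that as proof either way.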
This is part of the unique(DT[order(ovars)], by=byvars, fromLast=TRUE) idiom that has come up on Stack Overflow several times. I also tried DT[order(ovars), .SD[.N], by=byvars] and found its run time similarly long. Of course, something like which.max should be faster for finding the last entry, but I'm not sure whether that's optimized yet; besides, it does not extend to multiple ovars and might not work for, e.g., characters (I recall that gmax does not)...
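To make the two idioms concrete, here is a small sketch on toy data. The columns id (standing in for byvars), d and t (standing in for ovars), and v are hypothetical and chosen only for illustration; this says nothing about timings, it just shows that both forms pick the same "last" row per group.

```r
library(data.table)

# Toy data: id is the grouping column, (d, t) define the ordering, v is payload
DT = data.table(
  id = c("a", "b", "a", "b", "a"),
  d  = as.IDate(c("2016-01-02", "2016-01-01", "2016-01-02", "2016-01-03", "2016-01-01")),
  t  = as.ITime(c("10:00:00", "09:00:00", "12:00:00", "08:00:00", "23:00:00")),
  v  = 1:5
)

# Idiom 1: sort by the ordering columns, then keep the last row per group
res_unique = unique(DT[order(d, t)], by = "id", fromLast = TRUE)

# Idiom 2: sort by the ordering columns, then take row .N of .SD per group
res_sd = DT[order(d, t), .SD[.N], by = id]

# The two results can differ in row order, so compare after sorting by id
print(res_unique[order(id)])
print(res_sd[order(id)])
```

Both calls sort the full table first, which is why the cost of order() dominates in either form.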