add notes re optimized code #31

@franknarf1

Maybe as "optimized calls" in the data.table section or "benchmarking" in the misc section.

Sorting. An example from today:

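library(data.table)

n  = 3e7  # number of rows
nv = 1e7  # range of second offsets to sample from
# dt is POSIXct; d and t are its date and time-of-day parts (IDate and ITime)
DT = data.table(dt = Sys.time() + sample(nv, n, replace=TRUE))[, c("d", "t") := .(as.IDate(dt), as.ITime(dt))][]

# secondary indices on the sort columns
setindex(DT, dt)
setindex(DT, d, t)

system.time(DT[order(dt)]) # 4.8 s
system.time(DT[order(d, t)]) # 2.9 s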

My takeaway is that sorting on integer columns (the IDate/ITime pair) is faster than sorting on the POSIXct column, which is stored as a double. I'm not sure whether the indices are helping, since they are not acknowledged in the verbose output. The results above might also be skewed by my computer currently being at 99% RAM usage...
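A scaled-down sketch of the comparison above, for anyone who wants to rerun it. It is an illustration, not a definitive benchmark: the fixed UTC base time and the smaller n and nv are assumptions made here for reproducibility (the original used Sys.time() and 3e7 rows), and no timings are claimed.

```r
library(data.table)

set.seed(1)
n  <- 1e5   # assumption: far smaller than the 3e7 rows in the original
nv <- 1e4
base <- as.POSIXct("2024-01-15 00:00:00", tz = "UTC")  # assumption: fixed base instead of Sys.time()

DT <- data.table(dt = base + sample(nv, n, replace = TRUE))
DT[, c("d", "t") := .(as.IDate(dt), as.ITime(dt))]

# Storage types: POSIXct is a double, IDate and ITime are integers
sapply(DT, typeof)

setindex(DT, dt)
setindex(DT, d, t)
indices(DT)   # lists the secondary indices that exist on DT

# Elapsed time only; numbers depend heavily on machine and data size
system.time(DT[order(dt)])[["elapsed"]]
system.time(DT[order(d, t)])[["elapsed"]]

# verbose = TRUE prints whatever diagnostics data.table emits for the call;
# the exact messages vary by version
invisible(DT[order(d, t), verbose = TRUE])
```

Checking indices(DT) at least confirms the indices exist, which separates "no index was created" from "the index was not used".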

This is part of the unique(DT[order(ovars)], by=byvars, fromLast = TRUE) idiom that has come up on Stack Overflow several times. I also tried DT[order(ovars), .SD[.N], by=byvars] and found its run time similarly too long. Of course, something like which.max should be faster for finding the last entry, but I'm not sure whether that's optimized yet; besides, it does not extend to multiple ovars and might not work for, e.g., character columns (since I recall that gmax does not)...
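To make the idiom concrete, here is a small sketch of the two calls mentioned above on toy data, plus one other common way to pick the last row per group (row indices via .I), which is not from the original note. The column names g, d, t and v are hypothetical stand-ins for byvars and ovars; nothing here says which variant is fastest.

```r
library(data.table)

# Toy data: g plays the role of byvars, (d, t) the role of ovars
DT <- data.table(
  g = c("a", "b", "a", "b", "a"),
  d = as.IDate(c("2020-01-02", "2020-01-01", "2020-01-01", "2020-01-03", "2020-01-02")),
  t = as.ITime(c("10:00:00", "09:00:00", "12:00:00", "08:00:00", "11:00:00")),
  v = 1:5
)

# 1. Sort, then keep the last row of each group
res_unique <- unique(DT[order(d, t)], by = "g", fromLast = TRUE)

# 2. Sort, then take row .N of each group's .SD
res_sd <- DT[order(d, t), .SD[.N], by = g]

# 3. Sort once, collect the row number of each group's last row, subset once
#    (.I refers to rows of the sorted table, so index into that table)
sorted <- DT[order(d, t)]
res_I  <- sorted[sorted[, .I[.N], by = g]$V1]

# The three results hold the same rows; group order in the output may differ
res_unique[order(g)]
res_sd[order(g)]
res_I[order(g)]
```

All three variants still sort the full table first, so the cost of the order() step discussed above applies to each of them.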
