add notes re optimized code

Maybe as "optimized calls" in the data.table section or "benchmarking" in the misc section.

**Sorting.** An example from today:

```
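library(data.table)
n = 3e7   # number of rows
nv = 1e7  # offsets (in seconds) are sampled from 1:nv
# dt is POSIXct (stored as double); d (IDate) and t (ITime) are stored as integers
DT = data.table(dt = Sys.time() + sample(nv, n, replace=TRUE))[, c("d", "t") := .(as.IDate(dt), as.ITime(dt))][]

# secondary indices on the sort columns
setindex(DT, dt)
setindex(DT, d, t)

system.time(DT[order(dt)]) # 4.8 s
system.time(DT[order(d, t)]) # 2.9 s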
```

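My takeaway is that sorting on integer columns is faster: `d` and `t` (`IDate` and `ITime`) are stored as integers, while `dt` (`POSIXct`) is stored as a double. I'm not actually sure whether the indices are helping, since the verbose output does not mention them. The timings above might also be skewed by my computer currently being at 99% RAM usage...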
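A minimal sketch for re-checking this on a smaller table, so it runs in a fraction of a second. The sizes (`n = 1e5`, `nv = 1e4`) and the `set.seed` call are my own assumptions, not part of the original run. It confirms the storage types, lists the indices that exist with `indices()`, and passes `verbose = TRUE` to a single call so any mention of an index can be looked for in the printed messages. No timings are asserted, since they vary by machine.

```r
library(data.table)

set.seed(1)
n  <- 1e5  # assumed size, far smaller than the 3e7 rows above
nv <- 1e4  # assumed range of second offsets
DT <- data.table(dt = Sys.time() + sample(nv, n, replace = TRUE))
DT[, c("d", "t") := .(as.IDate(dt), as.ITime(dt))]

# Storage types: POSIXct is a double, IDate and ITime are integers
types <- vapply(DT, typeof, character(1))
print(types)

setindex(DT, dt)
setindex(DT, d, t)

# Names of the secondary indices currently attached to DT
idx <- indices(DT)
print(idx)

# verbose = TRUE prints data.table's internal messages for this call only
invisible(DT[order(dt), verbose = TRUE])

# Elapsed times for the two sorts; compare them on your own machine
t_dbl <- system.time(DT[order(dt)])[["elapsed"]]
t_int <- system.time(DT[order(d, t)])[["elapsed"]]
print(c(double_key = t_dbl, integer_keys = t_int))
```

Repeating the two `system.time()` calls several times, ideally on an otherwise idle machine, would give a fairer comparison than a single run.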

This is part of the `unique(DT[order(ovars)], by=byvars, fromLast = TRUE)` idiom that has come up on Stack Overflow several times. I also tried `DT[order(ovars), .SD[.N], by=byvars]` and found its run time similarly long. Of course, something like `which.max` should be faster for finding the last entry, but I'm not sure whether that's optimized yet; besides, it does not extend to multiple ovars and might not work for e.g. characters (since I recall that `gmax` does not)...
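To make the two idioms concrete, here is a small sketch on toy data. The table and its column names (`g` as the grouping variable, `o1` and `o2` as the ordering variables, `v` as a payload) are hypothetical and chosen only for illustration. Both forms should pick, for each group, the row that comes last under the ordering.

```r
library(data.table)

# Hypothetical toy data: g = byvars, (o1, o2) = ovars, v = payload
DT <- data.table(
  g  = c("a", "a", "b", "b", "b"),
  o1 = c(2L, 1L, 1L, 3L, 3L),
  o2 = c(1L, 5L, 9L, 2L, 4L),
  v  = c(10, 20, 30, 40, 50)
)

# Idiom 1: sort, then keep the last row of each group
res_unique <- unique(DT[order(o1, o2)], by = "g", fromLast = TRUE)

# Idiom 2: sort, then take row .N of each group's .SD
res_sd <- DT[order(o1, o2), .SD[.N], by = g]

print(res_unique)
print(res_sd)
```

On this toy table both results contain one row per group, with `v` equal to 10 for group `a` and 50 for group `b`. Both approaches sort the full table first, which is where the cost measured above comes from.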

