Quick memory win w/ edgelist() #10

@benthestatistician

For many purposes, ISMs (InfinitySparseMatrix objects) can be treated like numeric vectors. Two meaningless examples:

is(big_ism_pairs)
## [1] "BlockedInfinitySparseMatrix" "InfinitySparseMatrix"
## [3] "vector"
head(big_ism_pairs)
## [1] 0.0081 0.5048 0.3906 0.6252 0.6794 0.3088
head(big_ism_pairs * 2)
## [1] 0.0162 1.0097 0.7812 1.2504 1.3588 0.6175

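Do the dist columns of the objects returned by edgelist() follow this pattern? If so, we can save memory by simplifying (in R/edgelist.R, branch i54-hinting) "dist = x@.Data" to "dist = x". The measurements below suggest that the column then shares the ISM's storage rather than holding a second copy of the distances: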

load(file="gurm_match_big_ism_pairs.RData")
object_size(big_ism_pairs)
## 584 MB
object_size(big_ism_pairs, optmatch:::edgelist(big_ism_pairs))
## 1.16 GB
edgelist_ <- function(ism) data.frame(i = ism@rownames[ism@rows],
                                      j = ism@colnames[ism@cols],
                                      dist = ism)
object_size(big_ism_pairs, edgelist_(big_ism_pairs))
## 875 MB

(Of course, the task also calls for tests documenting whether, and how, these dist columns can be persuaded to adopt the needed behavior.)
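
A rough sketch of what such a test might look like, using only base R and the methods package. ToyISM is a hypothetical stand-in defined here purely for illustration; it is not optmatch's InfinitySparseMatrix, which defines its own methods and may behave differently. The checks only document that a data frame column built from an S4 object extending "numeric" can still be read back and used arithmetically as a numeric vector.

```r
library(methods)

# Hypothetical stand-in for an ISM: a numeric vector carrying
# row/column index slots. Not the real optmatch class.
setClass("ToyISM",
         contains = "numeric",
         slots = c(rows = "integer", cols = "integer",
                   rownames = "character", colnames = "character"))

toy <- new("ToyISM", c(0.5, 1.5, 2.5),
           rows = c(1L, 1L, 2L), cols = c(1L, 2L, 2L),
           rownames = c("a", "b"), colnames = c("x", "y"))

# Same shape as the edgelist_() proposed above: pass the object
# itself as the dist column instead of extracting ism@.Data.
edgelist_ <- function(ism) data.frame(i = ism@rownames[ism@rows],
                                      j = ism@colnames[ism@cols],
                                      dist = ism)

el <- edgelist_(toy)

# Checks a real test might make against the dist column.
stopifnot(nrow(el) == 3L)
stopifnot(is(el$dist, "numeric"))
stopifnot(isTRUE(all.equal(as.numeric(el$dist), c(0.5, 1.5, 2.5))))
stopifnot(isTRUE(all.equal(as.numeric(el$dist * 2), c(1, 3, 5))))
```

The same assertions could be rerun against a small real ISM and optmatch:::edgelist() to see which of them survive; any that fail would point at the behavior the dist column still needs to adopt.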
