
@Kyoshido (Contributor) commented May 7, 2025

Hello,

so I was tasked with creating synthetic data for a dataset without hhid and hhsize, and I stumbled upon two errors related to hhid.

Let's use an example with the eusilcS data.

The code below works properly.

library("simPop")
data("eusilcS")
seed <- 1234
inp <- specifyInput(data=eusilcS,
                    hhid="db030",
                    hhsize="hsize",
                    strata="db040",
                    weight="db090"
                    )
inp
 -------------- 
survey sample of size 11725 x 19 

 Selected important variables: 

 household ID: db030
 personal ID: pid
 variable household size: hsize
 sampling weight: db090
 strata: db040
 -------------- 

1. Now let's look at similar code, but with hhid and hhsize absent.

inp <- specifyInput(data=eusilcS,
                    strata="db040",
                    weight="db090"
                    )
inp
Error in specifyInput(data = eusilcS, strata = "db040", weight = "db090") :    argument "hhid" is missing, with no default

OK, so the error is: argument "hhid" is missing, with no default

The function definition of specifyInput says:

specifyInput <- function(data, hhid=NULL, hhsize=NULL, pid=NULL, weight=NULL, strata=NULL, population=FALSE)

This suggests that hhid is optional, because it has a default value NULL.

Inside the function, there's this block:

if(is.null(hhid)){
  # initialize hhid
  hhid <- "hhid.simPop"
  data[, hhid.simPop := .I]
}

So why do we get the error argument "hhid" is missing, with no default then?

The problem here is not hhid = NULL, but that hhid is missing altogether, which is exactly what the error message says.

R raises this error when an argument is not passed at all and its formal has no default value: inside the function, hhid is then not NULL, it is missing. (With the hhid=NULL default shown above, an omitted hhid would simply evaluate to NULL, so the installed version apparently declares hhid without a default.)

In that situation, the condition is.null(hhid) already fails with argument "hhid" is missing, with no default, because it tries to evaluate an argument that was never given a value.
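To make the difference between "missing" and "NULL" concrete, here is a minimal base R sketch. The functions no_default, with_default and probe are hypothetical toy functions written for this illustration; they are not part of simPop.

```r
# Toy functions for illustration only (not simPop code).

# hhid has no default: evaluating it when omitted raises an error.
no_default <- function(hhid) is.null(hhid)

# hhid defaults to NULL: an omitted hhid evaluates to NULL.
with_default <- function(hhid = NULL) is.null(hhid)

# missing() is TRUE in both cases when the caller omits the argument;
# only the presence of a default decides whether evaluation succeeds.
probe <- function(hhid = NULL) c(missing = missing(hhid), null = is.null(hhid))

ok <- with_default()
msg <- tryCatch(no_default(), error = function(e) conditionMessage(e))
flags <- probe()

print(ok)
print(msg)
print(flags)
```

If the call without hhid fails in the same way as no_default(), that would be consistent with the installed function lacking the NULL default.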

So I recommend adding these checks at the beginning of the function:

if (missing(hhid)) hhid <- NULL
if (missing(hhsize)) hhsize <- NULL
if (missing(pid)) pid <- NULL

This way, the function will work as expected and will not display an error message when hhid or hhsize are not provided.
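As a rough sketch of how these checks behave, the toy function below mirrors the pattern on a plain data.frame. toy_input and the column name hhid.toy are made up for this example, and base R is used instead of data.table, so this is not the actual specifyInput code.

```r
# Hypothetical stand-in for specifyInput; the arguments deliberately have
# no defaults, so the missing() checks are what make them optional.
toy_input <- function(data, hhid, hhsize, pid) {
  if (missing(hhid)) hhid <- NULL
  if (missing(hhsize)) hhsize <- NULL
  if (missing(pid)) pid <- NULL

  if (is.null(hhid)) {
    # Initialize a household ID: every row becomes its own household.
    hhid <- "hhid.toy"
    data[[hhid]] <- seq_len(nrow(data))
  }
  list(hhid = hhid, data = data)
}

df <- data.frame(x = c(10, 20, 30))

res_missing <- toy_input(df)               # hhid omitted
res_null    <- toy_input(df, hhid = NULL)  # hhid passed explicitly as NULL

print(res_missing$hhid)
print(res_missing$data)
```

In this sketch both call styles end up in the same is.null(hhid) branch, which is the behaviour the checks are meant to guarantee.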

So I added it to my fork:

devtools::install_github("kyoshido/simPop")
library(simPop)
inp <- specifyInput(data=eusilcS,
                    strata="db040",
                    weight="db090"
                    )
inp
 -------------- 
survey sample of size 11725 x 21 

 Selected important variables: 

 household ID: hhid.simPop
 personal ID: pid.simPop
 variable household size: hhsize.simPop
 sampling weight: db090
 strata: db040
 -------------- 

Now it works as expected.

2. However, there was another error when the parameters were included, but set to NULL.

inp <- specifyInput(data=eusilcS,
                    hhid=NULL,
                    hhsize=NULL,
                    strata="db040",
                    weight="db090"
                    )
inp
Error in if (!inherits(hhid, "character") | length(hhid) != 1 | is.na(match(hhid,  : 
  argument is of length zero

The error message says argument is of length zero.

When we explicitly write hhid = NULL, the function sees:

hhid  # is NULL
inherits(hhid, "character")  # FALSE
length(hhid)  # 0
match(hhid, colnames(data))  # integer(0)
is.na(match(hhid, colnames(data)))  # logical(0)

The vectorized | combines these values into logical(0), and if() requires a condition of length one → hence the error.
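The zero-length condition can be reproduced in base R without simPop. In the sketch below, cols stands in for colnames(data) and is_valid_name is a hypothetical helper showing one scalar-safe way to write such a check; it is not the validation that simPop actually uses.

```r
cols <- c("db030", "hsize")  # stand-in for colnames(data)
hhid <- NULL

# Same shape as the condition in the error message: the vectorized `|`
# recycles against the zero-length is.na(...) result and returns logical(0).
cond <- !inherits(hhid, "character") | length(hhid) != 1 | is.na(match(hhid, cols))
print(length(cond))

# if() cannot work with a zero-length condition, so it raises an error.
res <- tryCatch(if (cond) "invalid" else "valid", error = function(e) "error")
print(res)

# A scalar-safe alternative: `&&` short-circuits, so a NULL name is
# rejected before any zero-length comparison is evaluated.
is_valid_name <- function(name, cols) {
  is.character(name) && length(name) == 1 && !is.na(name) && name %in% cols
}

print(is_valid_name(NULL, cols))
print(is_valid_name("db030", cols))
```

Handling NULL before the check runs, or using short-circuit operators as above, both avoid handing if() a zero-length value.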

But all this was properly handled by adding the check for missing(hhid) at the beginning of the function.

add checks when hhid hhsize pid are missing
@matthias-da (Collaborator) commented

We discussed in our meetings that simPop can already do this. There was an update six months ago.
Please see commit f76e4c1.

You just need to install the GitHub version:

library(devtools)
install_github("statistikat/simPop")

and everything works. We have not submitted it to CRAN yet, because we first need to clean up simPop so that it does not receive any notes.

@matthias-da matthias-da closed this May 7, 2025