Description
What happened?
Join keys inferred through the parent dataset appear to be insufficient.
Consider a case where two child datasets are joined to each other.
ADLB and ADRS have the same primary keys: "STUDYID", "USUBJID", "PARAMCD", "AVISIT".
> library(teal.data)
> library(dplyr)
>
> ADLB <- rADLB
> ADRS <- rADRS
>
> jk <- default_cdisc_join_keys["ADLB", "ADRS"]
>
> full_join(ADLB, ADRS) |> dim()
Joining with `by = join_by(STUDYID, USUBJID, SUBJID, SITEID, AGE, AGEU, SEX, RACE, ETHNIC, COUNTRY, DTHFL, INVID, INVNAM, ARM, ARMCD, ACTARM, ACTARMCD, TRT01P, TRT01A, TRT02P, TRT02A, REGION1, STRATA1,
STRATA2, BMRKR1, BMRKR2, ITTFL, SAFFL, BMEASIFL, BEP01FL, AEWITHFL, RANDDT, TRTSDTM, TRTEDTM, TRT01SDTM, TRT01EDTM, TRT02SDTM, TRT02EDTM, AP01SDTM, AP01EDTM, AP02SDTM, AP02EDTM, EOSSTT, EOTSTT, EOSDT,
EOSDY, DCSREAS, DTHDT, DTHCAUS, DTHCAT, LDDTHELD, LDDTHGR1, LSTALVDT, DTHADY, ADTHAUT, ASEQ, PARAM, PARAMCD, AVAL, ADTM, ADY, AVISIT, AVISITN)`
[1] 11600 104
> full_join(ADLB, ADRS, by = jk) |> dim()
[1] 67200 165
Warning message:
In full_join(ADLB, ADRS, by = jk) :
Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 1 of `x` matches multiple rows in `y`.
ℹ Row 1 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship = "many-to-many"` to silence this warning.
>
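Joining by default, i.e. by intersect(names(x), names(y)), uses every shared column, which includes all four primary keys, and returns 11600 rows. Joining by the join_key_set extracted from default_cdisc_join_keys returns 67200 rows and triggers the many-to-many warning, i.e. a Cartesian product within each matched key group.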
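The effect can be reproduced without the package. The following is a minimal sketch on toy data, assuming the inferred key set contains only the keys shared with the parent (STUDYID, USUBJID); the data frames lb and rs and their value columns are hypothetical stand-ins, not rADLB or rADRS, and base merge() is swapped in for dplyr::full_join() to keep the example dependency-free.

```r
# Toy stand-ins for two child datasets (hypothetical data, not rADLB / rADRS).
# Each has one row per subject and visit, i.e. the four primary keys are unique.
lb <- data.frame(
  STUDYID = "S1",
  USUBJID = rep(c("P1", "P2"), each = 2),
  PARAMCD = "X",
  AVISIT  = rep(c("V1", "V2"), times = 2),
  LBVAL   = 1:4
)
rs <- data.frame(
  STUDYID = "S1",
  USUBJID = rep(c("P1", "P2"), each = 2),
  PARAMCD = "X",
  AVISIT  = rep(c("V1", "V2"), times = 2),
  RSVAL   = 5:8
)

# Assumption: keys inferred via the parent cover only the parent's keys.
parent_keys <- c("STUDYID", "USUBJID")
# The full primary keys of both child datasets.
full_keys <- c("STUDYID", "USUBJID", "PARAMCD", "AVISIT")

# Full outer joins, analogous to full_join().
by_parent <- merge(lb, rs, by = parent_keys, all = TRUE)
by_full   <- merge(lb, rs, by = full_keys, all = TRUE)

nrow(by_parent)  # 8: every visit of a subject is paired with every other visit
nrow(by_full)    # 4: one row per subject and visit
```

With only the parent keys, each subject's rows are multiplied (2 x 2 per subject here), and the unmatched key columns come back duplicated as PARAMCD.x / PARAMCD.y and AVISIT.x / AVISIT.y, which mirrors the jump in both rows and columns in the output above.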
sessionInfo()
> packageVersion("teal.data")
[1] '0.6.0'
Relevant log output
No response