CDMConnector: Connect to an OMOP Common Data Model

Package ‘CDMConnector’

July 16, 2024

Title Connect to an OMOP Common Data Model

Version 1.5.0

Description Provides tools for working with observational health data in the

Observational Medical Outcomes Partnership (OMOP) Common Data Model for-

mat with a pipe friendly syntax.

Common data model database table references are stored in a single compound ob-

ject along with metadata.

License Apache License (>= 2)

URL https://darwin-eu.github.io/CDMConnector/,

https://github.com/darwin-eu/CDMConnector

BugReports https://github.com/darwin-eu/CDMConnector/issues

Encoding UTF-8

RoxygenNote 7.3.1

Depends R (>= 4.0)

Imports dplyr, dbplyr (>= 2.5.0), DBI (>= 0.3.0), checkmate, cli,

purrr, rlang, tidyselect, readr, glue, waldo, methods, withr,

lifecycle, jsonlite, stringr, stringi, fs, generics, tidyr,

omopgenerics (>= 0.1.2)

Suggests SqlRender, CirceR, rJava, covr, knitr, rmarkdown, duckdb,

RSQLite, RPostgres, odbc, ggplot2, bigrquery,

DatabaseConnector, lubridate, clock, tibble, testthat (>=

3.0.0), pool, snakecase, palmerpenguins, tictoc

Enhances arrow

Conﬁg/testthat/edition 3

Conﬁg/testthat/parallel false

VignetteBuilder knitr

Collate 'CDMConnector-package.R' 'Eunomia.R' 'benchmarkCDMConnector.R'

'cdm.R' 'cdmSubset.R' 'cdm_from_environment.R'

'cohortTransformations.R' 'cohort_ddl.R' 'compute.R'

'copy_cdm_to.R' 'dateadd.R' 'dbSource.R'

2 Contents

'reexports-omopgenerics.R' 'generateCohortSet.R'

'generateConceptCohortSet.R' 'summariseQuantile.R' 'utils.R'

'validate.R' 'zzz-deprecated.R'

NeedsCompilation no

Author Adam Black [aut, cre] (<https://orcid.org/0000-0001-5576-8701>),

Artem Gorbachev [aut],

Edward Burn [aut],

Marti Catala Sabate [aut]

Maintainer Adam Black <[email protected]>

Repository CRAN

Date/Publication 2024-07-16 08:40:02 UTC

Contents

appendPermanent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

asDate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

assert_tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

assert_write_schema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

benchmarkCDMConnector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

cdmCon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

cdmDisconnect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

cdmFlatten . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

cdmName . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

cdmSample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

cdmSubset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

cdmSubsetCohort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

cdmWriteSchema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

cdm_from_con . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

cdm_from_environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

cdm_from_ﬁles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

cdm_from_tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

cdm_select_tbl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

cohortAttrition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

cohortSet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

cohort_count . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

cohort_erafy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

cohort_union . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

computeQuery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

copy_cdm_to . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

dateadd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

datediff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

datepart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

dbms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

dbSource . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

downloadEunomiaData . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

eunomiaDir . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

appendPermanent 3

eunomia_is_available . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

exampleDatasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

generateCohortSet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

generateConceptCohortSet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

inSchema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

intersect_cohorts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

list_tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

new_generated_cohort_set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

read_cohort_set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

recordCohortAttrition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

snapshot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

stow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

summarise_quantile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

tbl_group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

union_cohorts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

uniqueTableName . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

validate_cdm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

Index 49

appendPermanent Run a dplyr query and add the result set to an existing

Description

Run a dplyr query and add the result set to an existing

Usage

appendPermanent(x, name, schema = NULL)

append_permanent(x, name, schema = NULL)

Arguments

x A dplyr query

name Name of the table to be appended. If it does not already exist it will be created.

schema Schema where the table exists. Can be a length 1 or 2 vector. (e.g. schema =

"my_schema", schema = c("my_schema", "dbo"))

Value

A dplyr reference to the newly created table

4 asDate

Examples

## Not run:

library(CDMConnector)

con <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomia_dir())

concept <- dplyr::tbl(con, "concept")

# create a table

rxnorm_count <- concept %>%

dplyr::filter(domain_id == "Drug") %>%

dplyr::mutate(isRxnorm = (vocabulary_id == "RxNorm")) %>%

dplyr::count(domain_id, isRxnorm) %>%

compute("rxnorm_count")

# append to an existing table

rxnorm_count <- concept %>%

dplyr::filter(domain_id == "Procedure") %>%

dplyr::mutate(isRxnorm = (vocabulary_id == "RxNorm")) %>%

dplyr::count(domain_id, isRxnorm) %>%

appendPermanent("rxnorm_count")

DBI::dbDisconnect(con, shutdown = TRUE)

## End(Not run)

asDate as.Date dbplyr translation wrapper

Description

This is a workaround for using as.Date inside dplyr verbs against a database backend. This function

should only be used inside dplyr verbs where the ﬁrst argument is a database table reference. asDate

must be unquoted with !! inside dplyr verbs (see example).

Usage

asDate(x)

as_date(x)

Arguments

x an R expression

assert_tables 5

Examples

## Not run:

con <- DBI::dbConnect(odbc::odbc(), "Oracle")

date_tbl <- dplyr::copy_to(con,

data.frame(y = 2000L, m = 10L, d = 10L),

name = "tmp",

temporary = TRUE)

df <- date_tbl %>%

dplyr::mutate(date_from_parts = !!asDate(paste0(

.data$y, "/",

.data$m, "/",

.data$d

))) %>%

dplyr::collect()

## End(Not run)

assert_tables Assert that tables exist in a cdm object

Description

A cdm object is a list of references to a subset of tables in the OMOP Common Data Model. If

you write a function that accepts a cdm object as a parameter assert_tables/assertTables will

help you check that the tables you need are in the cdm object, have the correct columns/ﬁelds, and

(optionally) are not empty.

Usage

assert_tables(cdm, tables, empty.ok = FALSE, add = NULL)

assertTables(cdm, tables, empty.ok = FALSE, add = NULL)

Arguments

cdm A cdm object

tables A character vector of table names to check.

empty.ok Should an empty table (0 rows) be considered an error? TRUE or FALSE (de-

fault)

add An optional AssertCollection created by checkmate::makeAssertCollection()

that errors should be added to.

Value

Invisibly returns the cdm object

6 assert_write_schema

Examples

## Not run:

# Use assertTables inside a function to check that tables exist

countDrugsByGender <- function(cdm) {

assertTables(cdm, tables = c("person", "drug_era"), empty.ok = FALSE)

cdm$person %>%

dplyr::inner_join(cdm$drug_era, by = "person_id") %>%

dplyr::count(.data$gender_concept_id, .data$drug_concept_id) %>%

dplyr::collect()

}

library(CDMConnector)

con <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomia_dir())

cdm <- cdm_from_con(con)

countDrugsByGender(cdm)

DBI::dbDisconnect(con, shutdown = TRUE)

## End(Not run)

assert_write_schema Assert that cdm has a writable schema

Description

A cdm object can optionally contain a single schema in a database with write access. assert_write_schema

checks that the cdm contains the "write_schema" attribute and tests that local dataframes can be

written to tables in this schema.

Usage

assert_write_schema(cdm, add = NULL)

assertWriteSchema(cdm, add = NULL)

Arguments

cdm A cdm object

add An optional AssertCollection created by checkmate::makeAssertCollection()

that errors should be added to.

Value

Invisibly returns the cdm object

benchmarkCDMConnector 7

benchmarkCDMConnector Run benchmark of tasks using CDMConnector

Description

Run benchmark of tasks using CDMConnector

Usage

benchmarkCDMConnector(cdm)

Arguments

cdm A CDM reference object

Value

a tibble with time taken for different analyses

Examples

## Not run:

library(CDMConnector)

con <- DBI::dbConnect(duckdb::duckdb(), eunomia_dir())

cdm <- cdm_from_con(con, cdm_schema = "main", write_schema = "main")

benchmarkCDMConnector(cdm)

DBI::dbDisconnect(con, shutdown = TRUE)

## End(Not run)

cdmCon Get underlying database connection

Description

Get underlying database connection

Usage

cdmCon(cdm)

Arguments

cdm A cdm reference object created by cdm_from_con

8 cdmFlatten

Value

A reference to the database containing tables in the cdm reference

Examples

## Not run:

con <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomia_dir())

cdm <- cdm_from_con(con = con, cdm_name = "Eunomia",

cdm_schema = "main", write_schema = "main")

cdmCon(cdm)

DBI::dbDisconnect(con)

## End(Not run)

cdmDisconnect Disconnect the connection of the cdm object

Description

This function will disconnect from the database as well as drop "temporary" tables that were created

on database systems that do not support actual temporary tables. Currently temp tables are emulated

on Spark/Databricks systems.

Usage

cdmDisconnect(cdm)

cdm_disconnect(cdm)

Arguments

cdm cdm reference

cdmFlatten Flatten a cdm into a single observation table

Description

This experimental function transforms the OMOP CDM into a single observation table. This is only

recommended for use with a ﬁltered CDM or a cdm that is small in size.

cdmFlatten 9

Usage

cdmFlatten(

cdm,

domain = c("condition", "drug", "procedure"),

includeConceptName = TRUE

)

cdm_flatten(

cdm,

domain = c("condition", "drug", "procedure"),

include_concept_name = TRUE

)

Arguments

cdm A cdm_reference object

domain Domains to include. Must be a subset of "condition", "drug", "procedure",

"measurement", "visit", "death", "observation".

include_concept_name, includeConceptName

Should concept_name and type_concept_name be include in the output table?

TRUE (default) or FALSE

Details

[Experimental]

Value

A lazy query that when evaluated will result in a single cdm table

Examples

## Not run:

library(CDMConnector)

library(dplyr, warn.conflicts = FALSE)

con <- DBI::dbConnect(duckdb::duckdb(), eunomia_dir())

cdm <- cdm_from_con(con, cdm_schema = "main")

all_observations <- cdmSubset(cdm, personId = c(2, 18, 42)) %>%

cdmFlatten() %>%

collect()

all_observations

#> # A tibble: 213 × 8

#> person_id observation_. start_date end_date type_. domain obser. type_.

#> <dbl> <dbl> <date> <date> <dbl> <chr> <chr> <chr>

#> 1 2 40213201 1986-09-09 1986-09-09 5.81e5 drug pneumo <NA>

#> 2 18 4116491 1997-11-09 1998-01-09 3.20e4 condi Escher <NA>

10 cdmName

#> 3 18 40213227 2017-01-04 2017-01-04 5.81e5 drug tetanu <NA>

#> 4 42 4156265 1974-06-13 1974-06-27 3.20e4 condi Facial <NA>

#> 5 18 40213160 1966-02-23 1966-02-23 5.81e5 drug poliov <NA>

#> 6 42 4198190 1933-10-29 1933-10-29 3.80e7 proce Append <NA>

#> 7 2 4109685 1952-07-13 1952-07-27 3.20e4 condi Lacera <NA>

#> 8 18 40213260 2017-01-04 2017-01-04 5.81e5 drug zoster <NA>

#> 9 42 4151422 1985-02-03 1985-02-03 3.80e7 proce Sputum <NA>

#> 10 2 4163872 1993-03-29 1993-03-29 3.80e7 proce Plain <NA>

#> # ... with 203 more rows, and abbreviated variable names observation_concept_id,

#> # type_concept_id, observation_concept_name, type_concept_name

DBI::dbDisconnect(con, shutdown = TRUE)

## End(Not run)

cdmName Get the CDM name

Description

Extract the CDM name attribute from a cdm_reference object

Usage

cdmName(cdm)

cdm_name(cdm)

Arguments

cdm A cdm object

Value

The name of the CDM as a character string

Examples

## Not run:

library(CDMConnector)

con <- DBI::dbConnect(duckdb::duckdb(), eunomia_dir())

cdm <- cdm_from_con(con, cdm_schema = "main", write_schema = "main")

cdmName(cdm)

#> [1] "eunomia"

DBI::dbDisconnect(con, shutdown = TRUE)

## End(Not run)

cdmSample 11

cdmSample Subset a cdm object to a random sample of individuals

Description

cdmSample takes a cdm object and returns a new cdm that includes only a random sample of per-

sons in the cdm. Only person_ids in both the person table and observation_period table will be

considered.

Usage

cdmSample(cdm, n, seed = sample.int(1e+06, 1), name = "person_sample")

cdm_sample(cdm, n, seed = sample.int(1e+06, 1), name = "person_sample")

Arguments

cdm A cdm_reference object.

n Number of persons to include in the cdm.

seed Seed for the random number generator.

name Name of the table that will contain the sample of persons.

Details

[Experimental]

Value

A modiﬁed cdm_reference object where all clinical tables are lazy queries pointing to subset

Examples

## Not run:

library(CDMConnector)

library(dplyr, warn.conflicts = FALSE)

con <- DBI::dbConnect(duckdb::duckdb(), eunomia_dir())

cdm <- cdm_from_con(con, cdm_schema = "main")

cdmSampled <- cdmSample(cdm, n = 2)

cdmSampled$person %>%

select(person_id)

#> # Source: SQL [2 x 1]

#> # Database: DuckDB 0.6.1

#> person_id

#> <dbl>

12 cdmSubset

#> 1 155

#> 2 3422

DBI::dbDisconnect(con, shutdown = TRUE)

## End(Not run)

cdmSubset Subset a cdm object to a set of persons

Description

cdmSubset takes a cdm object and a list of person IDs as input. It returns a new cdm that includes

data only for persons matching the provided person IDs. Generated cohorts in the cdm will also be

subset to the IDs provided.

Usage

cdmSubset(cdm, personId)

cdm_subset(cdm, person_id)

Arguments

cdm A cdm_reference object

person_id, personId

A numeric vector of person IDs to include in the cdm

Details

[Experimental]

Value

A modiﬁed cdm_reference object where all clinical tables are lazy queries pointing to subset

Examples

## Not run:

library(CDMConnector)

library(dplyr, warn.conflicts = FALSE)

con <- DBI::dbConnect(duckdb::duckdb(), eunomia_dir())

cdm <- cdm_from_con(con, cdm_schema = "main")

cdm2 <- cdmSubset(cdm, personId = c(2, 18, 42))

cdm2$person %>%

cdmSubsetCohort 13

select(1:3)

#> # Source: SQL [3 x 3]

#> # Database: DuckDB 0.6.1

#> person_id gender_concept_id year_of_birth

#> <dbl> <dbl> <dbl>

#> 1 2 8532 1920

#> 2 18 8532 1965

#> 3 42 8532 1909

DBI::dbDisconnect(con, shutdown = TRUE)

## End(Not run)

cdmSubsetCohort Subset a cdm to the individuals in one or more cohorts

Description

cdmSubset will return a new cdm object that contains lazy queries pointing to each of the cdm

tables but subset to individuals in a generated cohort. Since the cdm tables are lazy queries, the

subset operation will only be done when the tables are used. computeQuery can be used to run the

SQL used to subset a cdm table and store it as a new table in the database.

Usage

cdmSubsetCohort(cdm, cohortTable = "cohort", cohortId = NULL, verbose = FALSE)

cdm_subset_cohort(

cdm,

cohort_table = "cohort",

cohort_id = NULL,

verbose = FALSE

)

Arguments

cdm A cdm_reference object

cohortTable, cohort_table

The name of a cohort table in the cdm reference

cohortId, cohort_id

IDs of the cohorts that we want to subset from the cohort table. If NULL (de-

fault) all cohorts in cohort table are considered.

verbose Should subset messages be printed? TRUE or FALSE (default)

Details

[Experimental]

14 cdmWriteSchema

Value

A modiﬁed cdm_reference with all clinical tables subset to just the persons in the selected cohorts.

Examples

## Not run:

library(CDMConnector)

library(dplyr, warn.conflicts = FALSE)

con <- DBI::dbConnect(duckdb::duckdb(), eunomia_dir())

cdm <- cdm_from_con(con, cdm_schema = "main", write_schema = "main")

# generate a cohort

path <- system.file("cohorts2", mustWork = TRUE, package = "CDMConnector")

cohortSet <- readCohortSet(path) %>%

filter(cohort_name == "GIBleed_male")

# subset cdm to persons in the generated cohort

cdm <- generateCohortSet(cdm, cohortSet = cohortSet, name = "gibleed")

cdmGiBleed <- cdmSubsetCohort(cdm, cohortTable = "gibleed")

cdmGiBleed$person %>%

tally()

#> # Source: SQL [1 x 1]

#> # Database: DuckDB 0.6.1

#> n

#> <dbl>

#> 1 237

cdm$person %>%

tally()

#> # Source: SQL [1 x 1]

#> # Database: DuckDB 0.6.1

#> n

#> <dbl>

#> 1 2694

DBI::dbDisconnect(con, shutdown = TRUE)

## End(Not run)

cdmWriteSchema Get cdm write schema

Description

Get cdm write schema

cdm_from_con 15

Usage

cdmWriteSchema(cdm)

Arguments

cdm A cdm reference object created by cdm_from_con

Value

The database write schema

Examples

## Not run:

con <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomia_dir())

cdm <- cdm_from_con(con = con, cdm_name = "Eunomia",

cdm_schema = "main", write_schema = "main")

cdmWriteSchema(cdm)

DBI::dbDisconnect(con)

## End(Not run)

cdm_from_con Create a CDM reference object from a database connection

Description

Create a CDM reference object from a database connection

Usage

cdm_from_con(

con,

cdm_schema,

write_schema,

cohort_tables = NULL,

cdm_version = "5.3",

cdm_name = NULL,

achilles_schema = NULL,

.soft_validation = FALSE,

write_prefix = NULL

)

cdmFromCon(

con,

16 cdm_from_con

cdmSchema,

writeSchema,

cohortTables = NULL,

cdmVersion = "5.3",

cdmName = NULL,

achillesSchema = NULL,

.softValidation = FALSE,

writePrefix = NULL

)

Arguments

con A DBI database connection to a database where an OMOP CDM v5.4 or v5.3

instance is located.

cdm_schema, cdmSchema

The schema where the OMOP CDM tables are located. Defaults to NULL.

write_schema, writeSchema

An optional schema in the CDM database that the user has write access to.

cohort_tables, cohortTables

A character vector listing the cohort table names to be included in the CDM

object.

cdm_version, cdmVersion

The version of the OMOP CDM: "5.3" (default), "5.4", "auto". "auto" attempts

to automatically determine the cdm version using heuristics. Cohort tables must

be in the write_schema.

cdm_name, cdmName

The name of the CDM. If NULL (default) the cdm_source_name . ﬁeld in the

CDM_SOURCE table will be used.

achilles_schema, achillesSchema

An optional schema in the CDM database that contains achilles tables.

.soft_validation, .softValidation

Normally the observation period table should not have overlapping observation

periods for a single person. If .softValidation is TRUE the validation check

that looks for overlapping observation periods will be skipped. Other analytic

packages may break or produce incorrect results if softValidation is TRUE and

the observation period table contains overlapping observation periods.

write_prefix, writePrefix

A preﬁx that will be added to all tables created in the write_schema. This can

be used to create namespace in your database write_schema for your tables.

Details

cdm_from_con / cdmFromCon creates a new cdm reference object from a DBI compliant database

connection. In addition to the connection the user needs to pass in the schema in the database where

the cdm data can be found as well as another schema where the user has write access to create

tables. Nearly all downstream analytic packages need the ability to create temporary data in the

database so the write_schema is required.

cdm_from_con 17

Some database systems have the idea of a catalog or a compound schema with two components.

See examples below for how to pass in catalogs and schemas.

You can also specify a write_prefix. This is a short character string that will be added to any

tables created in the write_schema effectively a namespace in the schema just for your analysis.

This makes it easy to ensure you do not overwrite someone elses tables if the write_schema is shared

and allows you to easily clean up tables by dropping all tables that start with the preﬁx. The preﬁx

is considered part of the write_schema since it is effectively a sub-schema. See examples.

Value

A list of dplyr database table references pointing to CDM tables

Examples

## Not run:

library(CDMConnector)

con <- DBI::dbConnect(duckdb::duckdb(), eunomia_dir())

# minimal example

cdm <- cdm_from_con(con,

cdm_schema = "main",

write_schema = "scratch")

cdm <- cdm_from_con(con,

cdm_schema = "main",

write_schema = "scratch",

write_prefix = "tmp_")

# There are a few differen options for using catalogs

cdm <- cdm_from_con(con,

cdm_schema = "catalog.main",

write_schema = "catalog.scratch",

write_prefix = "tmp_")

cdm <- cdm_from_con(con,

cdm_schema = c(catalog = "catalog", schema = "main"),

write_schema = c(catalog = "catalog", schema = "scratch"))

cdm <- cdm_from_con(con,

cdm_schema = c("catalog", "main"),

write_schema = c("catalog", "scratch"))

cdm <- cdm_from_con(con,

cdm_schema = c(catalog = "catalog", schema = "main"),

write_schema = c(catalog = "catalog",

schema = "scratch",

prefix = "tmp_"))

DBI::dbDisconnect(con)

## End(Not run)

18 cdm_from_environment

cdm_from_environment Create a CDM object from a pre-deﬁned set of environment variables

Description

This function is intended to be used with the Darwin execution engine. The execution engine

runs OHDSI studies in a pre-deﬁned runtime environment and makes several environment variables

available for connecting to a CDM database. Programmer writing code to run on the execution

engine and simply use cdm <- cdm_from_environment() to create a cdm reference object to use

for their analysis and the database connection and cdm object should be automatically created. This

obviates the need for site speciﬁc code for connecting to the database and creating the cdm reference

object.

Usage

cdm_from_environment(write_prefix = "")

Arguments

write_prefix (string) An optional preﬁx to use for all tables written to the CDM.

Details

The environment variables used by this function and provided by the execution engine are listed

below.

• DBMS_TYPE: one of "postgresql", "sql server", "redshift", "duckdb", "snowﬂake".

• DATA_SOURCE_NAME: a free text name for the CDM given by the person running the

study.

• CDM_VERSION: one of "5.3", "5.4".

• DBMS_CATALOG: The database catalog. Important primarily for compound schema names

used in SQL Server and Snowﬂake.

• DBMS_SERVER: The database server URL.

• DBMS_NAME: The database name used for creating the connection.

• DBMS_PORT: The database port number.

• DBMS_USERNAME: The database username needed to authenticate.

• DBMS_PASSWORD: The database password needed to authenticate.

• CDM_SCHEMA: The schema name where the OMOP CDM is located in the database.

• WRITE_SCHEMA: The shema where the user has write access and tables will be created

during study execution.

cdm_from_ﬁles 19

Value

A cdm_reference object

Examples

## Not run:

library(CDMConnector)

# This will only work in an evironment where the proper variables are present.

cdm <- cdm_from_environment()

# Proceed with analysis using the cdm object.

# Close the database connection when done.

cdm_disconnect(cdm)

## End(Not run)

cdm_from_files Create a CDM reference from a folder containing parquet, csv, or

feather ﬁles

Description

Create a CDM reference from a folder containing parquet, csv, or feather ﬁles

Usage

cdm_from_files(

path,

format = "auto",

cdm_version = "5.3",

cdm_name = NULL,

as_data_frame = TRUE

)

cdmFromFiles(

path,

format = "auto",

cdmVersion = "5.3",

cdmName = NULL,

asDataFrame = TRUE

)

20 cdm_from_tables

Arguments

path A folder where an OMOP CDM v5.4 instance is located.

format What is the ﬁle format to be read in? Must be "auto" (default), "parquet", "csv",

"feather".

cdm_version, cdmVersion

The version of the cdm (5.3 or 5.4)

cdm_name, cdmName

A name to use for the cdm.

as_data_frame, asDataFrame

TRUE (default) will read ﬁles into R as dataframes. FALSE will read ﬁles into

R as Arrow Datasets.

Value

A list of dplyr database table references pointing to CDM tables

cdm_from_tables Create a cdm object from local tables

Description

Create a cdm object from local tables

Usage

cdm_from_tables(tables, cdm_name, cohort_tables = list(), cdm_version = NULL)

Arguments

tables List of tables to be part of the cdm object.

cdm_name Name of the cdm object.

cohort_tables List of tables that contains cohort, cohort_set and cohort_attrition can be pro-

vided as attributes.

cdm_version Version of the cdm_reference

Value

A cdm_reference object.

cdm_select_tbl 21

Examples

library(omopgenerics)

library(dplyr, warn.conflicts = FALSE)

person <- tibble(

person_id = 1, gender_concept_id = 0, year_of_birth = 1990,

race_concept_id = 0, ethnicity_concept_id = 0

)

observation_period <- tibble(

observation_period_id = 1, person_id = 1,

observation_period_start_date = as.Date("2000-01-01"),

observation_period_end_date = as.Date("2025-12-31"),

period_type_concept_id = 0

)

cdm <- cdm_from_tables(

tables = list("person" = person, "observation_period" = observation_period),

cdm_name = "test"

)

cdm_select_tbl Select a subset of tables in a cdm reference object

Description

This function uses syntax similar to dplyr::select and can be used to subset a cdm reference

object to a speciﬁc tables

Usage

cdm_select_tbl(cdm, ...)

Arguments

cdm A cdm reference object created by cdm_from_con

... One or more table names of the tables of the cdm object. tidyselect is sup-

ported, see dplyr::select() for details on the semantics.

Value

A cdm reference object containing the selected tables

Examples

## Not run:

con <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomia_dir())

cdm <- cdm_from_con(con, "main")

22 cohortSet

cdm_select_tbl(cdm, person)

cdm_select_tbl(cdm, person, observation_period)

cdm_select_tbl(cdm, tbl_group("vocab"))

cdm_select_tbl(cdm, "person")

DBI::dbDisconnect(con)

## End(Not run)

cohortAttrition Get attrition table from a cohort_table object

Description

Get attrition table from a cohort_table object

Usage

cohortAttrition(x)

cohort_attrition(x)

Arguments

x A cohort_table object

cohortSet Get cohort settings from a cohort_table object

Description

Get cohort settings from a cohort_table object

Usage

cohortSet(x)

cohort_set(x)

Arguments

x A cohort_table object

cohort_count 23

cohort_count Get cohort counts from a generated_cohort_set object.

Description

Get cohort counts from a generated_cohort_set object.

Usage

cohort_count(cohort)

Arguments

cohort A generated_cohort_set object.

Value

A table with the counts.

Examples

## Not run:

library(CDMConnector)

library(dplyr)

con <- DBI::dbConnect(duckdb::duckdb(), eunomia_dir())

cdm <- cdm_from_con(con = con, cdm_schema = "main", write_schema = "main")

cdm <- generateConceptCohortSet(

cdm = cdm, conceptSet = list(pharyngitis = 4112343), name = "new_cohort"

)

cohort_count(cdm$new_cohort)

## End(Not run)

cohort_erafy Collapse cohort records within a certain number of days

Description

Collapse cohort records within a certain number of days

Usage

cohort_erafy(x, gap)

cohortErafy(x, gap)

24 computeQuery

Arguments

x A generated cohort set

gap When two cohort records are ’gap’ days apart or less the periods will be col-

lapsed into a single record

Value

A lazy query on a generated cohort set

cohort_union Union all cohorts in a cohort set with cohorts in a second cohort set

Description

Union all cohorts in a cohort set with cohorts in a second cohort set

Usage

cohort_union(x, y)

cohortUnion(x, y)

Arguments

x A tbl reference to a cohort table with one or more generated cohorts

y A tbl reference to a cohort table with one generated cohort

Value

A lazy query that when executed will resolve to a new cohort table with one the same cohort_deﬁnitions_ids

in x resulting from the union of all cohorts in x with the single cohort in y cohort table

computeQuery Execute dplyr query and save result in remote database

Description

This function is a wrapper around dplyr::compute that is tested on several database systems. It is

needed to handle edge cases where dplyr::compute does not produce correct SQL.

computeQuery 25

Usage

computeQuery(

name = uniqueTableName(),

temporary = TRUE,

schema = NULL,

overwrite = TRUE,

...

)

compute_query(

name = uniqueTableName(),

temporary = TRUE,

schema = NULL,

overwrite = TRUE,

...

)

Arguments

x A dplyr query

name The name of the table to create.

temporary Should the table be temporary: TRUE (default) or FALSE

schema The schema where the table should be created. Ignored if temporary = TRUE.

overwrite Should the table be overwritten if it already exists: TRUE (default) or FALSE

Ignored if temporary = TRUE.

... Further arguments passed on the dplyr::compute

Value

A dplyr::tbl() reference to the newly created table.

Examples

## Not run:

library(CDMConnector)

con <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomia_dir())

cdm <- cdm_from_con(con, "main")

# create a temporary table in the remote database from a dplyr query

drugCount <- cdm$concept %>%

dplyr::count(domain_id == "Drug") %>%

computeQuery()

# create a permanent table in the remote database from a dplyr query

drugCount <- cdm$concept %>%

26 copy_cdm_to

dplyr::count(domain_id == "Drug") %>%

computeQuery("tmp_table", temporary = FALSE, schema = "main")

DBI::dbDisconnect(con, shutdown = TRUE)

## End(Not run)

copy_cdm_to Copy a cdm object from one database to another

Description

It may be helpful to be able to easily copy a small test cdm from a local database to a remote for

testing. copy_cdm_to takes a cdm object and a connection. It copies the cdm to the remote database

connection. CDM tables can be preﬁxed in the new database allowing for multiple cdms in a single

shared database schema.

Usage

copy_cdm_to(con, cdm, schema, overwrite = FALSE)

copyCdmTo(con, cdm, schema, overwrite = FALSE)

Arguments

con A DBI datbase connection created by DBI::dbConnect

cdm A cdm reference object created by CDMConnector::cdmFromCon or CDMConnector::cdm_from_con

schema schema name in the remote database where the user has write permission

overwrite Should the cohort table be overwritten if it already exists? TRUE or FALSE

(default)

Details

[Experimental]

Value

A cdm reference object pointing to the newly created cdm in the remote database

dateadd 27

dateadd Add days or years to a date in a dplyr query

Description

This function must be "unquoted" using the "bang bang" operator (!!). See example.

Usage

dateadd(date, number, interval = "day")

Arguments

date The name of a date column in the database table as a character string

number The number of units to add. Can be a positive or negative whole number.

interval The units to add. Must be either "day" (default) or "year"

Value

Platform speciﬁc SQL that can be used in a dplyr query.

Examples

## Not run:

con <- DBI::dbConnect(duckdb::duckdb())

date_tbl <- dplyr::copy_to(con, data.frame(date1 = as.Date("1999-01-01")),

name = "tmpdate", overwrite = TRUE, temporary = TRUE)

df <- date_tbl %>%

dplyr::mutate(date2 = !!dateadd("date1", 1, interval = "year")) %>%

dplyr::collect()

DBI::dbDisconnect(con, shutdown = TRUE)

## End(Not run)

datediff Compute the difference between two days

Description

This function must be "unquoted" using the "bang bang" operator (!!). See example.

Usage

datediff(start, end, interval = "day")

28 datepart

Arguments

start The name of the start date column in the database as a string.

end The name of the end date column in the database as a string.

interval The units to use for difference calculation. Must be either "day" (default) or

"year".

Value

Platform speciﬁc SQL that can be used in a dplyr query.

Examples

## Not run:

con <- DBI::dbConnect(duckdb::duckdb())

date_tbl <- dplyr::copy_to(con, data.frame(date1 = as.Date("1999-01-01")),

name = "tmpdate", overwrite = TRUE, temporary = TRUE)

df <- date_tbl %>%

dplyr::mutate(date2 = !!dateadd("date1", 1, interval = "year")) %>%

dplyr::mutate(dif_years = !!datediff("date1", "date2", interval = "year")) %>%

dplyr::collect()

DBI::dbDisconnect(con, shutdown = TRUE)

## End(Not run)

datepart Extract the day, month or year of a date in a dplyr pipeline

Description

Extract the day, month or year of a date in a dplyr pipeline

Usage

datepart(date, interval = "year", dbms = NULL)

Arguments

date Character string that represents to a date column.

interval Interval to extract from a date. Valid options are "year", "month", or "day".

dbms Database system, if NULL it is auto detected.

dbms 29

Examples

## Not run:

con <- DBI::dbConnect(duckdb::duckdb(), ":memory:")

date_tbl <- dplyr::copy_to(con,

data.frame(birth_date = as.Date("1993-04-19")),

name = "tmp",

temporary = TRUE)

df <- date_tbl %>%

dplyr::mutate(year = !!datepart("birth_date", "year")) %>%

dplyr::mutate(month = !!datepart("birth_date", "month")) %>%

dplyr::mutate(day = !!datepart("birth_date", "day")) %>%

dplyr::collect()

DBI::dbDisconnect(con, shutdown = TRUE)

## End(Not run)

dbms Get the database management system (dbms) from a cdm_reference or

DBI connection

Description

Get the database management system (dbms) from a cdm_reference or DBI connection

Usage

dbms(con)

Arguments

con A DBI connection or cdm_reference

Value

A character string representing the dbms that can be used with SqlRender

Examples

## Not run:

con <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomia_dir())

cdm <- cdm_from_con(con)

dbms(cdm)

dbms(con)

## End(Not run)

30 downloadEunomiaData

dbSource Create a source for a cdm in a database.

Description

Create a source for a cdm in a database.

Usage

dbSource(con, writeSchema)

Arguments

con Connection to a database.

writeSchema Schema where cohort tables are. You must have read and write access to it.

downloadEunomiaData Download Eunomia data ﬁles

Description

Download the Eunomia data ﬁles from https://github.com/darwin-eu/EunomiaDatasets

Usage

downloadEunomiaData(

datasetName = "GiBleed",

cdmVersion = "5.3",

pathToData = Sys.getenv("EUNOMIA_DATA_FOLDER"),

overwrite = FALSE

)

download_eunomia_data(

dataset_name = "GiBleed",

cdm_version = "5.3",

path_to_data = Sys.getenv("EUNOMIA_DATA_FOLDER"),

overwrite = FALSE

)

eunomiaDir 31

Arguments

overwrite Control whether the existing archive ﬁle will be overwritten should it already

exist.

dataset_name, datasetName

The data set name as found on https://github.com/darwin-eu/EunomiaDatasets.

The data set name corresponds to the folder with the data set ZIP ﬁles

cdm_version, cdmVersion

The OMOP CDM version. This version will appear in the sufﬁx of the data ﬁle,

for example: synpuf_5.3.zip. Must be ’5.3’ (default) or ’5.4’.

path_to_data, pathToData

The path where the Eunomia data is stored on the ﬁle system., By default the

value of the environment variable "EUNOMIA_DATA_FOLDER" is used.

Value

Invisibly returns the destination if the download was successful.

Examples

## Not run:

downloadEunomiaData("GiBleed")

## End(Not run)

eunomiaDir Create a copy of an example OMOP CDM dataset

Description

Creates a copy of a Eunomia database, and returns the path to the new database ﬁle. If the dataset

does not yet exist on the user’s computer it will attempt to download the source data to the the path

deﬁned by the EUNOMIA_DATA_FOLDER environment variable.

Usage

eunomiaDir(

datasetName = "GiBleed",

cdmVersion = "5.3",

databaseFile = tempfile(fileext = ".duckdb")

)

eunomia_dir(

dataset_name = "GiBleed",

cdm_version = "5.3",

database_file = tempfile(fileext = ".duckdb")

)

32 eunomia_is_available

Arguments

datasetName, dataset_name

One of "GiBleed" (default), "synthea-allergies-10k", "synthea-anemia-10k", "synthea-

breast_cancer-10k", "synthea-contraceptives-10k", "synthea-covid19-10k", "synthea-

covid19-200k", "synthea-dermatitis-10k", "synthea-heart-10k", "synthea-hiv-10k",

"synthea-lung_cancer-10k", "synthea-medications-10k", "synthea-metabolic_syndrome-

10k", "synthea-opioid_addiction-10k", "synthea-rheumatoid_arthritis-10k", "synthea-

snf-10k", "synthea-surgery-10k", "synthea-total_joint_replacement-10k", "synthea-

veteran_prostate_cancer-10k", "synthea-veterans-10k", "synthea-weight_loss-10k"

cdmVersion, cdm_version

The OMOP CDM version. Must be "5.3" or "5.4".

databaseFile, database_file

The full path to the new copy of the example CDM dataset.

Value

The ﬁle path to the new Eunomia dataset copy

Examples

## Not run:

con <- DBI::dbConnect(duckdb::duckdb(), eunomiaDir("GiBleed"))

DBI::dbDisconnect(con, shutdown = TRUE)

## End(Not run)

eunomia_is_available Has the Eunomia dataset been cached?

Description

Has the Eunomia dataset been cached?

Usage

eunomia_is_available(dataset_name = "GiBleed", cdm_version = "5.3")

eunomiaIsAvailable(datasetName = "GiBleed", cdmVersion = "5.3")

Arguments

dataset_name, datasetName

Name of the Eunomia dataset to check. Defaults to "GiBleed".

cdm_version, cdmVersion

Version of the Eunomia dataset to check. Must be "5.3" or "5.4".

Value

TRUE if the eunomia example dataset is available and FASLE otherwise

exampleDatasets 33

exampleDatasets List the available example CDM datasets

Description

List the available example CDM datasets

Usage

exampleDatasets()

example_datasets()

Value

A character vector with example CDM dataset identiﬁers

Examples

## Not run:

library(CDMConnector)

exampleDatasets()[1]

#> [1] "GiBleed"

con <- DBI::dbConnect(duckdb::duckdb(), eunomiaDir("GiBleed"))

cdm <- cdm_from_con(con)

## End(Not run)

generateCohortSet Generate a cohort set on a cdm object

Description

A "chort_table" object consists of four components

• A remote table reference to an OHDSI cohort table with at least the columns: cohort_deﬁnition_id,

subject_id, cohort_start_date, cohort_end_date. Additional columns are optional and some

analytic packages deﬁne additional columns speciﬁc to certain analytic cohorts.

• A settings attribute which points to a remote table containing cohort settings including the

names of the cohorts.

• An attrition attribute which points to a remote table with attrition information recorded dur-

ing generation. This attribute is optional. Since calculating attrition takes additional compute

it can be skipped resulting in a NULL attrition attribute.

• A cohortCounts attribute which points to a remote table containing cohort counts

34 generateCohortSet

Each of the three attributes are tidy tables. The implementation of this object is experimental and

user feedback is welcome.

[Experimental]

One key design principle is that cohort_table objects are created once and can persist across analysis

execution but should not be modiﬁed after creation. While it is possible to modify a cohort_table

object doing so will invalidate it and it’s attributes may no longer be accurate.

Usage

generateCohortSet(

cdm,

cohortSet,

name,

computeAttrition = TRUE,

overwrite = TRUE

)

generate_cohort_set(

cdm,

cohort_set,

name = "cohort",

compute_attrition = TRUE,

overwrite = TRUE

)

Arguments

cdm A cdm reference created by CDMConnector. write_schema must be speciﬁed.

name Name of the cohort table to be created. This will also be used as a preﬁx for the

cohort attribute tables.

overwrite Should the cohort table be overwritten if it already exists? TRUE (default) or

FALSE

cohort_set, cohortSet

Can be a cohortSet object created with readCohortSet()

compute_attrition, computeAttrition

Should attrition be computed? TRUE (default) or FALSE

Examples

## Not run:

library(CDMConnector)

con <- DBI::dbConnect(duckdb::duckdb(), eunomia_dir())

cdm <- cdm_from_con(con,

cdm_schema = "main",

write_schema = "main")

cohortSet <- readCohortSet(system.file("cohorts2", package = "CDMConnector"))

cdm <- generateCohortSet(cdm, cohortSet, name = "cohort")

generateConceptCohortSet 35

print(cdm$cohort)

attrition(cdm$cohort)

settings(cdm$cohort)

cohortCount(cdm$cohort)

## End(Not run)

generateConceptCohortSet

Create a new generated cohort set from a list of concept sets

Description

Generate a new cohort set from one or more concept sets. Each concept set will result in one cohort

and represent the time during which the concept was observed for each subject/person. Concept

sets can be passed to this function as:

• A named list of numeric vectors, one vector per concept set

• A named list of Capr concept sets

Clinical observation records will be looked up in the respective domain tables using the vocabulary

in the CDM. If a required domain table does not exist in the cdm object a warning will be given.

Concepts that are not in the vocabulary or in the data will be silently ignored. If end dates are

missing or do not exist, as in the case of the procedure and observation domains, the the start date

will be used as the end date.

Usage

generateConceptCohortSet(

cdm,

conceptSet = NULL,

name,

limit = "first",

requiredObservation = c(0, 0),

end = "observation_period_end_date",

subsetCohort = NULL,

subsetCohortId = NULL,

overwrite = TRUE

)

generate_concept_cohort_set(

cdm,

concept_set = NULL,

name = "cohort",

limit = "first",

required_observation = c(0, 0),

36 generateConceptCohortSet

end = "observation_period_end_date",

subset_cohort = NULL,

subset_cohort_id = NULL,

overwrite = TRUE

)

Arguments

cdm A cdm reference object created by CDMConnector::cdmFromCon or CDMConnector::cdm_from_con

conceptSet, concept_set

A named list of numeric vectors or a Concept Set Expression created omopgenerics::newConceptSetExpression

name The name of the new generated cohort table as a character string

limit Include "ﬁrst" (default) or "all" occurrences of events in the cohort

• "ﬁrst" will include only the ﬁrst occurrence of any event in the concept set

in the cohort.

• "all" will include all occurrences of the events deﬁned by the concept set in

the cohort.

requiredObservation, required_observation

A numeric vector of length 2 that speciﬁes the number of days of required ob-

servation time prior to index and post index for an event to be included in the

cohort.

end How should the cohort_end_date be deﬁned?

• "observation_period_end_date" (default): The earliest observation_period_end_date

after the event start date

• numeric scalar: A ﬁxed number of days from the event start date

• "event_end_date": The event end date. If the event end date is not populated

then the event start date will be used

subsetCohort, subset_cohort

A cohort table containing the individuals for which to generate cohorts for. Only

individuals in the cohort table will appear in the created generated cohort set.

subsetCohortId, subset_cohort_id

A set of cohort IDs from the cohort table for which to include. If none are

provided, all cohorts in the cohort table will be included.

overwrite Should the cohort table be overwritten if it already exists? TRUE (default) or

FALSE.

Value

A cdm reference object with the new generated cohort set table added

inSchema 37

inSchema Helper for working with compound schemas

Description

This is similar to dbplyr::in_schema but has been tested across multiple database platforms. It only

exists to work around some of the limitations of dbplyr::in_schema.

Usage

inSchema(schema, table, dbms = NULL)

in_schema(schema, table, dbms = NULL)

Arguments

schema A schema name as a character string

table A table name as character string

dbms The name of the database management system as returned by dbms(connection)

Value

A DBI::Id that represents a qualiﬁed table and schema

intersect_cohorts Intersect all cohorts in a single cohort table

Description

Intersect all cohorts in a single cohort table

Usage

intersect_cohorts(x, cohort_definition_id = 1L)

intersectCohorts(x, cohort_definition_id = 1L)

Arguments

x A tbl reference to a cohort table

cohort_definition_id

A number to use for the new cohort_deﬁnition_id

[Superseded]

38 new_generated_cohort_set

Value

A lazy query that when executed will resolve to a new cohort table with one cohort_deﬁnition_id

resulting from the intersection of all cohorts in the original cohort table

list_tables List tables in a schema

Description

DBI::dbListTables can be used to get all tables in a database but not always in a speciﬁc schema.

listTables will list tables in a schema.

Usage

list_tables(con, schema = NULL)

listTables(con, schema = NULL)

Arguments

con A DBI connection to a database

schema The name of a schema in a database. If NULL, returns DBI::dbListTables(con).

Value

A character vector of table names

Examples

## Not run:

con <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomia_dir())

listTables(con, schema = "main")

## End(Not run)

new_generated_cohort_set

Constructor for cohort_table objects

Description

[Superseded]

new_generated_cohort_set 39

Usage

new_generated_cohort_set(

cohort_ref,

cohort_set_ref = NULL,

cohort_attrition_ref = NULL,

cohort_count_ref = NULL,

overwrite

)

newGeneratedCohortSet(

cohortRef,

cohortSetRef = NULL,

cohortAttritionRef = NULL,

cohortCountRef = NULL,

overwrite

)

Arguments

cohort_ref, cohortRef

A tbl_sql object that points to a remote cohort table with the following ﬁrst

four columns: cohort_deﬁnition_id, subject_id, cohort_start_date, cohort_end_date.

Additional columns are optional.

cohort_set_ref, cohortSetRef

A tbl_sql object that points to a remote table with the following ﬁrst two

columns: cohort_deﬁnition_id, cohort_name. Additional columns are optional.

cohort_deﬁnition_id should be a primary key on this table and uniquely identify

rows.

cohort_attrition_ref, cohortAttritionRef

A tbl_sql object that points to an attrition table in a remote database with the

ﬁrst column being cohort_deﬁnition_id.

cohort_count_ref, cohortCountRef

A tbl_sql object that points to a cohort_count table in a remote database with

columns cohort_deﬁnition_id, cohort_entries, cohort_subjects.

overwrite Should tables be overwritten if they already exist? TRUE or FALSE (default)

Details

Please use omopgenerics::newCohortTable() instead.

This constructor function is to be used by analytic package developers to create cohort_table

objects.

A cohort_table is a set of person-time from an OMOP CDM database. A cohort_table can be

represented by a table with three columns: subject_id, cohort_start_date, cohort_end_date. Sub-

ject_id is the same as person_id in the OMOP CDM. A cohort_table is a collection of one or

more cohort_table and can be represented as a table with four columns: cohort_deﬁnition_id,

subject_id, cohort_start_date, cohort_end_date.

This constructor function deﬁnes the cohort_table object in R.

40 new_generated_cohort_set

The object is an extension of a tbl_sql object deﬁned in dplyr. This is a lazy database query that

points to a cohort table in the database with at least the columns cohort_deﬁnition_id, subject_id,

cohort_start_date, cohort_end_date. The table could optionally have more columns as well.

In addition the cohort_table object has three optional attributes. These are: cohort_set, co-

hort_attrition, cohort_count. Each of these attributes is also a lazy SQL query (tbl_sql) that points

to a table in a database and is described below.

cohort_set:

cohort_set is a table with one row per cohort_deﬁnition_id. The ﬁrst two columns of the co-

hort_set table are: cohort_deﬁnition_id, and cohort_name. Additional columns can be added.

The cohort_set table is meant to store metadata about the cohort deﬁnition. Since this table is

required it will be created if it it is not supplied.

cohort_attrition:

cohort_attrition is an optional table that stores attrition information recorded during the cohort

generation process such as how many persons were dropped at each step of inclusion rule appli-

cation. The ﬁrst column of this table should be cohort_definition_id but all other columns

currently have no constraints.

cohort_count:

cohort_count is a option attribute table that records the number of records and the number of

unique persons in each cohort in a cohort_table. It is derived metadata that can be re-derived

as long as cohort_set, the complete list of cohorts in the set, is available. Column names of

cohort_count are: cohort_deﬁnition_id, number_records, number_subjects. This table is required

for cohort_table objects and will be created if not supplied.

Value

A cohort_table object that is a tbl_sql reference to a cohort table in the write_schema of an

OMOP CDM

Examples

## Not run:

# This function is for developers who are creating cohort_table

# objects in their packages. The function should accept a cdm_reference

# object as the first argument and return a cdm_reference object with the

# cohort table added. The second argument should be `name` which will be

# the prefix for the database tables, the name of the cohort table in the

# database and the name of the cohort table in the cdm object.

# Other optional arguments can be added after the first two.

generateCustomCohort <- function(cdm, name, ...) {

# accept a cdm_reference object as input

checkmate::assertClass(cdm, "cdm_reference")

con <- attr(cdm, "dbcon")

# Create the tables in the database however you like

# All the tables should be prefixed with `name`

read_cohort_set 41

# The cohort table should be called `name` in the database

# Create the dplyr table references

cohort_ref <- dplyr::tbl(con, name)

cohort_set <- dplyr::tbl(con, paste0(name, "_set"))

cohort_attrition_ref <- dplyr::tbl(con, paste0(name, "_attrition"))

cohort_count_ref <- dplyr::tbl(con, paste0(name, "_count"))

# add to the cdm

cdm[[name]] <- cohort_ref

# create the generated cohort set object using the constructor

cdm[[name]] <- new_generated_cohort_set(

cdm[[name]],

cohort_set_ref = cohort_set_ref,

cohort_attrition_ref = cohort_attrition_ref,

cohort_count_ref = cohort_count_ref)

return(cdm)

}

## End(Not run)

read_cohort_set Read a set of cohort deﬁnitions into R

Description

A "cohort set" is a collection of cohort deﬁnitions. In R this is stored in a dataframe with co-

hort_deﬁnition_id, cohort_name, and cohort columns. On disk this is stored as a folder with a

CohortsToCreate.csv ﬁle and one or more json ﬁles. If the CohortsToCreate.csv ﬁle is missing then

all of the json ﬁles in the folder will be used, cohort_deﬁnition_id will be automatically assigned in

alphabetical order, and cohort_name will match the ﬁle names.

Usage

read_cohort_set(path)

readCohortSet(path)

Arguments

path The path to a folder containing Circe cohort deﬁnition json ﬁles and optionally

a csv ﬁle named CohortsToCreate.csv with columns cohortId, cohortName, and

jsonPath.

42 recordCohortAttrition

recordCohortAttrition Add attrition reason to a cohort_table object

Description

Update the cohort attrition table with new counts and a reason for attrition.

Usage

recordCohortAttrition(cohort, reason, cohortId = NULL)

record_cohort_attrition(cohort, reason, cohortId = NULL)

Arguments

cohort A generated cohort set

reason The reason for attrition as a character string

cohortId Cohort deﬁnition id of the cohort you want to update the attrition

Value

The cohort object with the attributes created or updated.

[Experimental]

Examples

## Not run:

library(CDMConnector)

library(dplyr)

con <- DBI::dbConnect(duckdb::duckdb(), eunomia_dir())

cdm <- cdm_from_con(con = con, cdm_schema = "main", write_schema = "main")

cdm <- generateConceptCohortSet(

cdm = cdm, conceptSet = list(pharyngitis = 4112343), name = "new_cohort"

)

settings(cdm$new_cohort)

cohortCount(cdm$new_cohort)

cohortAttrition(cdm$new_cohort)

cdm$new_cohort <- cdm$new_cohort %>%

filter(cohort_start_date >= as.Date("2010-01-01"))

cdm$new_cohort <- updateCohortAttributes(

cohort = cdm$new_cohort, reason = "Only events after 2010"

)

settings(cdm$new_cohort)

snapshot 43

cohortCount(cdm$new_cohort)

cohortAttrition(cdm$new_cohort)

## End(Not run)

snapshot Extract CDM metadata

Description

Extract the name, version, and selected record counts from a cdm.

Usage

snapshot(cdm)

Arguments

cdm A cdm object

Value

A named list of attributes about the cdm including selected ﬁelds from the cdm_source table and

record counts from the person and observation_period tables

Examples

## Not run:

library(CDMConnector)

con <- DBI::dbConnect(duckdb::duckdb(), eunomia_dir())

cdm <- cdm_from_con(con, "main")

snapshot(cdm)

DBI::dbDisconnect(con, shutdown = TRUE)

## End(Not run)

stow Collect a list of lazy queries and save the results as ﬁles

Description

Collect a list of lazy queries and save the results as ﬁles

Usage

stow(cdm, path, format = "parquet")

44 summarise_quantile

Arguments

cdm A cdm object

path A folder to save the cdm object to

format The ﬁle format to use: "parquet" (default), "csv", "feather" or "duckdb".

Value

Invisibly returns the cdm input

Examples

## Not run:

con <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomia_dir())

vocab <- cdm_from_con(con, "main") %>%

cdm_select_tbl("concept", "concept_ancestor")

stow(vocab, here::here("vocab_tables"))

DBI::dbDisconnect(con, shutdown = TRUE)

## End(Not run)

summarise_quantile Quantile calculation using dbplyr

Description

This function provides DBMS independent syntax for quantiles estimation. Can be used by itself

or in combination with mutate() when calculating other aggregate metrics (min, max, mean).

summarise_quantile(), summarize_quantile(), summariseQuantile() and summarizeQuantile()

are synonyms.

Usage

summarise_quantile(.data, x = NULL, probs, name_suffix = "value")

summarize_quantile(.data, x = NULL, probs, name_suffix = "value")

summariseQuantile(.data, x = NULL, probs, nameSuffix = "value")

summarizeQuantile(.data, x = NULL, probs, nameSuffix = "value")

Arguments

.data lazy data frame backed by a database query.

x column name whose sample quantiles are wanted.

probs numeric vector of probabilities with values in [0,1].

name_suffix, nameSuffix

character; is appended to numerical quantile value as a column name part.

tbl_group 45

Details

Implemented quantiles estimation algorithm returns values analogous to quantile{stats} with ar-

gument type = 1. See discussion in Hyndman and Fan (1996). Results differ from PERCENTILE_CONT

natively implemented in various DBMS, where returned values are equal to quantile{stats} with

default argument type = 7

Value

An object of the same type as ’.data’

Examples

## Not run:

con <- DBI::dbConnect(duckdb::duckdb())

mtcars_tbl <- dplyr::copy_to(con, mtcars, name = "tmp", overwrite = TRUE, temporary = TRUE)

df <- mtcars_tbl %>%

dplyr::group_by(cyl) %>%

dplyr::mutate(mean = mean(mpg, na.rm = TRUE)) %>%

summarise_quantile(mpg, probs = c(0, 0.2, 0.4, 0.6, 0.8, 1),

name_suffix = "quant") %>%

dplyr::collect()

DBI::dbDisconnect(con, shutdown = TRUE)

## End(Not run)

tbl_group CDM table selection helper

Description

The OMOP CDM tables are grouped together and the tbl_group function allows users to easily

create a CDM reference including one or more table groups.

Usage

tbl_group(group)

tblGroup(group)

Arguments

group A character vector of CDM table groups: "vocab", "clinical", "all", "default",

"derived".

46 union_cohorts

Details

" alt" alt

The "default" table group is meant to capture the most commonly used set of CDM tables. Cur-

rently the "default" group is: person, observation_period, visit_occurrence, visit_detail, condi-

tion_occurrence, drug_exposure, procedure_occurrence, device_exposure, measurement, observa-

tion, death, note, note_nlp, specimen, fact_relationship, location, care_site, provider, payer_plan_period,

cost, drug_era, dose_era, condition_era, concept, vocabulary, concept_relationship, concept_ancestor,

concept_synonym, drug_strength

Value

A character vector of CDM tables names in the groups

Examples

## Not run:

con <- DBI::dbConnect(RPostgres::Postgres(),

dbname = "cdm",

host = "localhost",

user = "postgres",

password = Sys.getenv("PASSWORD"))

cdm <- cdm_from_con(con, cdm_name = "test", cdm_schema = "public") %>%

cdm_select_tbl(tbl_group("vocab"))

## End(Not run)

union_cohorts Union all cohorts in a single cohort table

Description

Union all cohorts in a single cohort table

Usage

union_cohorts(x, cohort_definition_id = 1L)

unionCohorts(x, cohort_definition_id = 1L)

Arguments

x A tbl reference to a cohort table

cohort_definition_id

A number to use for the new cohort_deﬁnition_id

[Superseded]

uniqueTableName 47

Value

A lazy query that when executed will resolve to a new cohort table with one cohort_deﬁnition_id

resulting from the union of all cohorts in the original cohort table

uniqueTableName Create a unique table name for temp tables

Description

Create a unique table name for temp tables

Usage

uniqueTableName()

unique_table_name()

Value

A string that can be used as a dbplyr temp table name

validate_cdm Validation report for a CDM

Description

Print a short validation report for a cdm object. The validation includes checking that column names

are correct and that no tables are empty. A short report is printed to the console. This function is

meant for interactive use.

Usage

validate_cdm(cdm)

validateCdm(cdm)

Arguments

cdm A cdm reference object.

Value

Invisibly returns the cdm input

48 version

Examples

## Not run:

con <- DBI::dbConnect(duckdb::duckdb(), eunomia_dir())

cdm <- cdm_from_con(con, cdm_schema = "main")

validate_cdm(cdm)

DBI::dbDisconnect(con)

## End(Not run)

version Get the CDM version

Description

Extract the CDM version attribute from a cdm_reference object

Usage

version(cdm)

Arguments

cdm A cdm object

Value

"5.3" or "5.4"

Examples

## Not run:

library(CDMConnector)

con <- DBI::dbConnect(duckdb::duckdb(), eunomia_dir())

cdm <- cdm_from_con(con, cdm_schema = "main", write_schema = "main")

version(cdm)

DBI::dbDisconnect(con, shutdown = TRUE)

## End(Not run)

Index

append_permanent (appendPermanent), 3

appendPermanent, 3

as_date (asDate), 4

asDate, 4

assert_tables, 5

assert_write_schema, 6

assertTables (assert_tables), 5

assertWriteSchema

(assert_write_schema), 6

benchmarkCDMConnector, 7

cdm_disconnect (cdmDisconnect), 8

cdm_flatten (cdmFlatten), 8

cdm_from_con, 15

cdm_from_environment, 18

cdm_from_files, 19

cdm_from_tables, 20

cdm_name (cdmName), 10

cdm_sample (cdmSample), 11

cdm_select_tbl, 21

cdm_subset (cdmSubset), 12

cdm_subset_cohort (cdmSubsetCohort), 13

cdmCon, 7

cdmDisconnect, 8

cdmFlatten, 8

cdmFromCon (cdm_from_con), 15

cdmFromFiles (cdm_from_files), 19

cdmName, 10

cdmSample, 11

cdmSubset, 12

cdmSubsetCohort, 13

cdmWriteSchema, 14

cohort_attrition (cohortAttrition), 22

cohort_count, 23

cohort_erafy, 23

cohort_set (cohortSet), 22

cohort_union, 24

cohortAttrition, 22

cohortErafy (cohort_erafy), 23

cohortSet, 22

cohortUnion (cohort_union), 24

compute_query (computeQuery), 24

computeQuery, 24

copy_cdm_to, 26

copyCdmTo (copy_cdm_to), 26

dateadd, 27

datediff, 27

datepart, 28

dbms, 29

dbSource, 30

download_eunomia_data

(downloadEunomiaData), 30

downloadEunomiaData, 30

eunomia_dir (eunomiaDir), 31

eunomia_is_available, 32

eunomiaDir, 31

eunomiaIsAvailable

(eunomia_is_available), 32

example_datasets (exampleDatasets), 33

exampleDatasets, 33

generate_cohort_set

(generateCohortSet), 33

generate_concept_cohort_set

(generateConceptCohortSet), 35

generateCohortSet, 33

generateConceptCohortSet, 35

in_schema (inSchema), 37

inSchema, 37

intersect_cohorts, 37

intersectCohorts (intersect_cohorts), 37

list_tables, 38

listTables (list_tables), 38

new_generated_cohort_set, 38

50 INDEX

newGeneratedCohortSet

(new_generated_cohort_set), 38

read_cohort_set, 41

readCohortSet (read_cohort_set), 41

record_cohort_attrition

(recordCohortAttrition), 42

recordCohortAttrition, 42

snapshot, 43

stow, 43

summarise_quantile, 44

summariseQuantile (summarise_quantile),

summarize_quantile

(summarise_quantile), 44

summarizeQuantile (summarise_quantile),

tbl_group, 45

tblGroup (tbl_group), 45

union_cohorts, 46

unionCohorts (union_cohorts), 46

unique_table_name (uniqueTableName), 47

uniqueTableName, 47

validate_cdm, 47

validateCdm (validate_cdm), 47

version, 48