5 Functional Programming

Functions provide a host of benefits to the user. The allow for efficient automation of repetitive tasks, bundling or common operations and a host of other possibilities.

One additional, and equally important, opporunity they offer is to reduce errors.

As touched upon in Pragmatic Programming, the DRY (DON’T REPEAT YOURSELF) is well suited to functions.

A Motivating Example

When dealing with a new dataset there are two frequent issues that arise with concentration data - BQL values, and how to handle them.

Given the hypothetical situation where you are given 3 similarly-structured datasets how you could handle replacing the phrase LLOQ < 10 simply with NA?

The copy-paste process may be something along the lines of:

df1$conc[df1$CONC == 'LLOQ < 10'] <- NA
df2$conc[df2$CONC == 'LLOQ < 10'] <- NA
df3$conc[df3$CONC == 'LLOQ < 10'] <- NA
df4$conc[df4$CONC == 'LLOQ < 10'] <- NA
df5$conc[df4$CONC == 'LLOQ < 10'] <- NA

Quick - did anyone spot any issue with the code?

Let’s write a function to help automate this, as well as reduce potential for mistakes such as the above

BQL_NA <- function(x) {
    x[x[["CONC"]] == 'LLOQ < 10'] <- NA
    x
}

BQL_NA(df1)
BQL_NA(df2)
BQL_NA(df3)
BQL_NA(df4)
BQL_MA(df5)

Closer, but again, we’re repeating ourselves.

We touched on lapply, lets combine our dataframes into a list a

SIDE TRICK df[] <- lapply(df, our_fun) - using df[] will give us back a dataframe instead of a list from df <- lapply(df, our_fun)

df_list <- list(df1, df2, df3, df4, df5)
df_NOBQL <- lapply(df_list, BQL_NA)

5.0.1 Assignment

  • extend BQL_NA to allow us to pass different character strings for what how the LLOQ was defined
  • BONUS: likewise, extend it to:
    • handle any column ‘conc’ regardless of capitalization
    • handle any concentration column name (conc, concentration, DV, …)