Using RDA Files as a Notebook

Lots of my work involves quite a few R files that are executed in order. For example, 01-prune-data.R might be executed first, and 02-filter-data.R might be executed next. If the intermediate results are tabular, I might save them in CSV or similar text files, but quite often I use RDA files, either because the data are in a complex form (e.g. containing lists) or because the data are large.

When I get close to the paper-writing stage, though, I often want to pick out things that not just interconnecting data, but rather processed data. For example, I might do a lm() and then find the coefficients, the p value, etc., perhaps rounded to the number of digits that I want for publication. Or I might want to store the number of data in a particular category. But the problem is that one of the many things I might want could be defined in one R file, and another might be in another file. That means that, when I want to incorporate the results into a publication, I have to remember not just the name of an item, but also the name of the particular file that stores it.

Below is a scheme that gets around this. I won’t bother explaining it in much detail. I might decide to make it into a package, in which case I’ll document how things work. But, in a nutshell, the idea is that a single RDA file is created at an early stage of processing, and that R files may later write results there as desired. Note that the stored items can be any R object, e.g. the full results of lm(), etc. It is, of course, required that the user be aware of the objects that are already stored. I will write a function to do that, perhaps updating this blog posting if I do. But I plan to use the code you see below in an actual project, to see what “feels right”, rather than planning it out in advance. My approach is to use a saw before a plane, a plane before sandpaper, and so on.

UPDATE 2024-02-09. I am going to make a package for this, to save having to copy the code from one spot to another, and doing source() in code that uses it. While I’m at it, I’ll invent a scheme where you can specify which RDA file to use at the start of your script, and then don’t need to name it in calls to e.g. saveRda(). The natural name for the package seems to be rdan, for RDA-notebook (also for “arrh Dan”, as a pirate might say.)

Example

Here’s an example of why I made this. I like how I can insert comments in there, as well as values. And I think it will be convenient in an Rmd or Rnw file (that is, in a final document) to have all such things loadable in a single rda file.

> load("results.rda"); str(results)
List of 4
 $ havePair:List of 3
  ..$ value  : Named logi [1:1453] TRUE TRUE TRUE FALSE TRUE TRUE ...
  .. ..- attr(*, "names")= chr [1:1453] "D1901534_152" "D1901534_153" "D3901601_002" "D3901601_003" ...
  ..$ comment: chr "profile has warmish-coldish pair"
  ..$ context: chr "/Users/kelley/git/argo_intrusions/sandbox/dek/01"
 $ m1ar2   :List of 3
  ..$ value  : num 0.709
  ..$ comment: chr "adj R^2 from lm() of tagged fraction vs longitude"
  ..$ context: chr "/Users/kelley/git/argo_intrusions/sandbox/dek/01"
 $ m1p     :List of 3
  ..$ value  : num 3.39e-09
  ..$ comment: chr "p from lm() of tagged fraction vs longitude"
  ..$ context: chr "/Users/kelley/git/argo_intrusions/sandbox/dek/01"
 $ m2p     :List of 3
  ..$ value  : num 0.564
  ..$ comment: chr "p from lm() of # profiles vs longitude"
  ..$ context: chr "/Users/kelley/git/argo_intrusions/sandbox/dek/01"

Code

debug <- FALSE
dmsg <- function(...) if (debug) message(...)

createRDA <- function(rdaName = "results.rda", clear = FALSE) {
    if (clear || !file.exists(rdaName)) {
        dmsg("creating RDA file \"", rdaName, "\"")
        results <- list() # stores name, value, comment and context
        save(results, file = rdaName)
    } else {
        dmsg("RDA file \"", rdaName, "\" already exists, so will not be recreated")
    }
}

readRDA <- function(name = NULL, rdaName = "results.rda") {
    if (!file.exists(rdaName)) {
        stop("RDA file \"", rdaName, "\" does not exist yet; use createRDA()")
    }
    get(load(rdaName))[[name]]
}

writeRDA <- function(name = NULL, value = NULL, comment = "", context = NULL, rdaName = "results.rda") {
    if (!file.exists(rdaName)) {
        stop("RDA file \"", rdaName, "\" does not exist yet; use createRDA()")
    }
    load(rdaName) # defines 'results'
    if (is.null(context)) {
        context <- getwd()
    }
    results[[name]] <- list(value = value, comment = comment, context = context)
    #print(str(results))
    save(results, file = rdaName)
}

## demo
#createRDA(clear = TRUE)
#readRDA("test")
#stopifnot(is.null(readRDA("test")))
#writeRDA("test", 999)
#stopifnot(identical(list(value = 999, comment = ""), readRDA("test")))
#writeRDA("test", list(A = 1, B = 2), "a list")
#stopifnot(identical(A, list(value = list(A = 1, B = 2), comment = "a list")))