Statistical project directory structure with multiple languages (e.g., R and Splus)?

Building on the post “How to efficiently manage a statistical analysis project” and the ProjectTemplate package in R…

Q: How do you build your statistical project directory structure when multiple languages feature heavily (e.g., R AND Splus)?

Most of the discussions on this topic have been limited to projects that primarily use one language. I’m concerned with how to minimize sloppiness, confusion, and breakage when using multiple languages.

I’ve included below my current project structure and methods for doing things. An alternative might be to separate code so that I have ./R and ./Splus directories—each containing their own /lib, /src, /util, /tests, and /munge directories.
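
A minimal sketch of that alternative layout in R, in case it helps to see it concretely. The helper name make_language_dirs() and its arguments are hypothetical, and the example builds the tree under tempdir() rather than a real project path:

```r
# Hypothetical helper: create one subtree per language under a project root,
# each with its own lib, src, util, tests, and munge directories.
make_language_dirs <- function(root,
                               languages = c("R", "Splus"),
                               subdirs = c("lib", "src", "util", "tests", "munge")) {
  # file.path() is vectorized, so this yields every language/subdir pair.
  paths <- file.path(root, rep(languages, each = length(subdirs)), subdirs)
  for (p in paths) {
    # dir.create() handles one path at a time; recursive = TRUE creates parents.
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Example: build the layout in a temporary directory (an assumed location).
root <- file.path(tempdir(), "myproject")
created <- make_language_dirs(root)
```

Shared directories such as ./data, ./cache, and ./figures would presumably stay at the project root so that both languages can reach them.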

Q: Which approach would be closest to “best practices” (if any exist)?

  • /data – data shared across projects
  • /libraries – scripts shared across projects
  • /projects/myproject – my working directory. Currently, if I use multiple languages they share this location as their working directory.
  • ./data/ – data specific to /myproject and symlinks to data in /data
  • ./cache/ – cached workspaces (e.g., .RData files saved using save.image() in R or .sdd files saved using data.dump() in Splus)
  • ./lib/ – main project files. Same across all projects. An R project is run via source("./lib/main.R"), which in turn runs load.R, clean.R, test.R, analyze.R, and report.R. Currently, if multiple languages are being used (say, Splus in addition to R), I’ll throw main.ssc, clean.ssc, etc. into this directory as well. I’m not sure I like this, though.
  • ./src/ – project-specific functions. Collected one function per file.
  • ./util/ – general functions eventually to be packaged. Collected one function per file.
  • ./tests/ – files for running test cases. Used by ./lib/test.R
  • ./munge/ – files for cleaning data. Used by ./lib/clean.R
  • ./figures/ – tables and figure output from ./lib/report.R to be used in the final report
  • ./report/ – .tex files and symlinks to files in ./figures
  • ./presentation/ – .tex files for presentations (usually the Beamer class)
  • ./temp/ – location for temporary scripts
  • ./README
  • ./TODO
  • ./.RData – for storing R project workspaces
  • ./.Data/ – for storing S project workspaces
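
The way ./lib/main.R drives the other scripts, as described in the list above, could be sketched roughly as follows. This is not my actual file, just an illustration: run_pipeline() and its arguments are hypothetical names, and the step order is the one listed under ./lib/.

```r
# Sketch of a driver in the spirit of ./lib/main.R: source each step in order.
# run_pipeline() is a hypothetical helper, not part of any package.
run_pipeline <- function(lib_dir = "./lib",
                         steps = c("load.R", "clean.R", "test.R",
                                   "analyze.R", "report.R")) {
  for (step in steps) {
    path <- file.path(lib_dir, step)
    # Fail early with a clear message if a step is missing.
    if (!file.exists(path)) {
      stop("Missing pipeline step: ", path)
    }
    message("Running ", path)
    # local = FALSE evaluates the script in the global environment,
    # so objects created by one step are visible to the next.
    source(path, local = FALSE)
  }
  invisible(steps)
}
```

From the project working directory, calling run_pipeline() and then something like save.image("./cache/myproject.RData") (the file name is an assumption) would run the steps and cache the workspace. An Splus counterpart would need its own driver, which is part of what makes sharing ./lib/ between languages awkward.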


I definitely wouldn’t call it “best practices”, but my typical project has directories

R (which generally contains prepData.R, analysis.R, func.R, and figs.R, though each of these could be split into many files and could use Sweave or asciidoc)

Perl (mostly for parsing/converting data files)

RawData (all original data files)

Data (all processed files)

Notes (generally notes from the collaborator)

The R directory often contains subdirectories Figs and Rcache.

Particularly important: version control! I like git.

Source: Link, Question Author: lowndrul, Answer Author: Karl
