How to optimize my R script in order to use “multicore”

I am using GNU R on an Ubuntu Lucid PC which has 4 CPUs.
In order to use all 4 CPUs, I installed the “r-cran-multicore” package.
As the package’s manual lacks practical examples that I understand, I need advice on how to optimize my script in order to make use of all 4 CPUs.

My dataset is a data.frame (called P1) that has 50,000 rows and 1,600 columns. For each row, I’d like to calculate the maximum, sum and mean. My script looks as follows:

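plength <- nrow(P1)
# preallocate one slot per row instead of growing the vectors with c()
p1max  <- numeric(plength)
p1mean <- numeric(plength)
p1sum  <- numeric(plength)
for(i in 1:plength){
   row <- unlist(P1[i,])   # a data.frame row is a list; flatten it to a numeric vector
   p1max[i]  <- max(row)
   p1mean[i] <- mean(row)
   p1sum[i]  <- sum(row)
}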

Could anyone please tell me how to modify and run the script in order to use all 4 CPUs?

Answer

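Use foreach and doMC. The detailed explanation can be found here. Your script will change only a little; the line

for(i in 1:plength){

should be changed to

stats <- foreach(i=1:plength, .combine=rbind) %dopar% {

Each iteration runs in a separate worker process, so assignments made inside the loop body are not visible afterwards. Instead, let the body return the statistics for row i as a vector; foreach() collects these return values, .combine=rbind stacks them into a matrix, and that matrix is assigned to stats.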

The prerequisites for any parallel script using these packages are:

library(foreach)
library(doMC)
registerDoMC(cores = 4)  # use all 4 CPUs; without cores, doMC picks its own default
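Put together, the foreach version of the question's loop might look like the sketch below. It assumes all columns of P1 are numeric and uses a small randomly generated stand-in for P1 (hypothetical data, not the asker's); stats is just an illustrative name. doMC forks worker processes, so this will not run in parallel on Windows.

```r
library(foreach)
library(doMC)
registerDoMC(cores = 4)  # assumption: 4 cores, as in the question

# Hypothetical small stand-in for the question's P1 (all columns numeric).
set.seed(42)
P1 <- as.data.frame(matrix(runif(200 * 16), nrow = 200))

# Each task handles one row and *returns* its statistics; foreach collects
# the returned vectors and rbind() stacks them into an nrow(P1) x 3 matrix.
stats <- foreach(i = seq_len(nrow(P1)), .combine = rbind) %dopar% {
  row <- unlist(P1[i, ])  # a data.frame row is a list; flatten to numeric
  c(max = max(row), mean = mean(row), sum = sum(row))
}

p1max  <- stats[, "max"]
p1mean <- stats[, "mean"]
p1sum  <- stats[, "sum"]
```

foreach creates one task per iteration, so with 50,000 rows and very cheap work per row the bookkeeping can eat much of the gain from using 4 cores.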

A note of caution: according to the documentation, you should not use this in a GUI.

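As for your problem, do you really need multitasking? At 8 bytes per numeric value, 50,000 × 1,600 values take about 640 MB of RAM (roughly twice that while apply() holds its matrix copy of the data.frame), so the data should fit into your memory. So you can simply use apply: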

p1smry <- apply(P1,1,summary)

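The result will be a matrix with one column per row of P1; each column holds that row's minimum, first quartile, median, mean, third quartile and maximum. summary() does not report the sum, so you still need sum() for that.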
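A vectorized alternative, not part of the original answer, is base R's rowSums() and rowMeans(), which compute the per-row sum and mean in compiled code without calling an R function once per row. Base R has no row-maximum counterpart, so the sketch falls back to apply() for the maximum. It again uses a hypothetical small stand-in for P1 with only numeric columns.

```r
# Hypothetical small stand-in for the question's P1 (all columns numeric).
set.seed(42)
P1 <- as.data.frame(matrix(runif(200 * 16), nrow = 200))

M <- as.matrix(P1)            # convert once instead of once per row

p1sum  <- rowSums(M)          # vectorized, runs in compiled code
p1mean <- rowMeans(M)         # vectorized, runs in compiled code
p1max  <- apply(M, 1, max)    # base R has no rowMax(), so use apply()
```

For a 50,000 × 1,600 table this single-core version may already be fast enough that parallelizing the sum and mean is not worth the effort.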

You can also use the function mclapply from the multicore package. Then your script might look like this:

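library(multicore)   # provides mclapply (since R 2.14.0 it is also in the bundled parallel package)

loopfun <- function(i) {
     # a data.frame row is a list; unlist() turns it into a numeric vector
     summary(unlist(P1[i,]))
}

res <- mclapply(1:nrow(P1), loopfun)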

This returns a list whose i-th element is the summary of the i-th row. You can convert it to a matrix using sapply:

mres <- sapply(res, identity)
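The multicore package has since been superseded: mclapply() now ships with R (2.14.0 and later) in the parallel package. The sketch below, an assumption-laden variation rather than the answer's own code, hands each worker one contiguous block of rows (via parallel's splitIndices()) so that every worker can use the vectorized rowSums()/rowMeans() on its block, and it returns max, mean and sum directly. It uses a hypothetical small stand-in for P1 with only numeric columns; mc.cores > 1 relies on forking and is not supported on Windows.

```r
library(parallel)  # bundled with R >= 2.14.0, provides mclapply()

# Hypothetical small stand-in for the question's P1 (all columns numeric).
set.seed(42)
P1 <- as.data.frame(matrix(runif(1000 * 16), nrow = 1000))
M <- as.matrix(P1)

# Split the row indices into 4 contiguous blocks, one per core.
chunks <- splitIndices(nrow(M), 4)

# Statistics for one block of rows, one output row per input row.
row_stats <- function(idx) {
  sub <- M[idx, , drop = FALSE]
  cbind(max = apply(sub, 1, max), mean = rowMeans(sub), sum = rowSums(sub))
}

res   <- mclapply(chunks, row_stats, mc.cores = 4)
stats <- do.call(rbind, res)  # blocks are returned in order, so rows line up with M
```

mclapply() already splits its input across cores by default (mc.preschedule = TRUE); passing explicit row blocks mainly lets each worker use vectorized code on its share of the rows.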

Attribution
Source: Link, Question Author: Produnis, Answer Author: mpiktas
