# How to aggregate by minute data for a week into hourly means? [closed]

How would you get hourly means for multiple data columns, for a daily period, and show results for twelve “Hosts” in the same graph? That is, I’d like to graph what a 24 hour period looks like, for a weeks worth of data. The eventual goal would be to compare two sets of this data, before and after samplings.

``````                dates     Hos      CPUIOWait CPUUser CPUSys
1 2011-02-11 23:55:12     db       0         14      8
2 2011-02-11 23:55:10     app1     0          6      1
3 2011-02-11 23:55:09     app2     0          4      1
``````

I’ve been able to run `xyplot(CPUUser ~ dates | Host)` with good effect. However, rather than showing each date in the week, I’d like the X axis to be the hours of the day.

Trying to get this data into an xts object results in errors such as:

“order.by requires an appropriate time-based object”

Here is a `str()` of the data frame:

``````'data.frame':   19720 obs. of  5 variables:
$$dates : POSIXct, format: "2011-02-11 23:55:12" "2011-02-11 23:55:10" ...$$ Host     : Factor w/ 14 levels "app1","app2",..: 9 7 5 4 3 10 6 8 2 1 ...
$$CPUIOWait: int 0 0 0 0 0 0 0 0 0 0 ...$$ CPUUser  : int  14 6 4 4 3 10 4 3 4 4 ...
\$ CPUSys   : int  8 1 1 1 1 3 1 1 1 1 ...
``````

UPDATE: Just for future reference, I decided to go with a boxplot, to show both the median, and the ‘outliers’.

Essentially:

``````Data$$hour <- as.POSIXlt(dates)$$hour  # extract hour of the day
boxplot(Data$$CPUUser ~ Data$$hour)    # for a subset with one host or for all hosts
xyplot(Data$$CPUUser ~ Data$$hour | Data\$Host, panel=panel.bwplot, horizontal=FALSE)
``````

Here is one approach using cut() to create the appropriate hourly factors and ddply() from the plyr library for calculating the means.

``````library(lattice)
library(plyr)

## Create a record and some random data for every 5 seconds
## over two days for two hosts.
dates <- seq(as.POSIXct("2011-01-01 00:00:00", tz = "GMT"),
as.POSIXct("2011-01-02 23:59:55", tz = "GMT"),
by = 5)
hosts <- c(rep("host1", length(dates)), rep("host2",
length(dates)))
x1    <- sample(0:20, 2*length(dates), replace = TRUE)
x2    <- rpois(2*length(dates), 2)
Data  <- data.frame(dates = dates, hosts = hosts, x1 = x1,
x2 = x2)

## Calculate the mean for every hour using cut() to define
## the factors and ddply() to calculate the means.
## getmeans() is applied for each unique combination of the
## hosts and hour factors.
getmeans  <- function(Df) c(x1 = mean(Df\$x1),
x2 = mean(Df\$x2))
Data\$hour <- cut(Data\$dates, breaks = "hour")
Means <- ddply(Data, .(hosts, hour), getmeans)
Means\$hour <- as.POSIXct(Means\$hour, tz = "GMT")

## A plot for each host.
xyplot(x1 ~ hour | hosts, data = Means, type = "o",
scales = list(x = list(relation = "free", rot = 90)))
``````