How would you get hourly means for multiple data columns, for a daily period, and show results for twelve “Hosts” in the same graph? That is, I’d like to graph what a 24 hour period looks like, for a weeks worth of data. The eventual goal would be to compare two sets of this data, before and after samplings.
dates Hos CPUIOWait CPUUser CPUSys 1 2011-02-11 23:55:12 db 0 14 8 2 2011-02-11 23:55:10 app1 0 6 1 3 2011-02-11 23:55:09 app2 0 4 1
I’ve been able to run
xyplot(CPUUser ~ dates | Host)
with good effect. However, rather than showing each date in the week, I’d like the X axis to be the hours of the day.Trying to get this data into an xts object results in errors such as:
“order.by requires an appropriate time-based object”
Here is a
str()
of the data frame:'data.frame': 19720 obs. of 5 variables: $ dates : POSIXct, format: "2011-02-11 23:55:12" "2011-02-11 23:55:10" ... $ Host : Factor w/ 14 levels "app1","app2",..: 9 7 5 4 3 10 6 8 2 1 ... $ CPUIOWait: int 0 0 0 0 0 0 0 0 0 0 ... $ CPUUser : int 14 6 4 4 3 10 4 3 4 4 ... $ CPUSys : int 8 1 1 1 1 3 1 1 1 1 ...
UPDATE: Just for future reference, I decided to go with a boxplot, to show both the median, and the ‘outliers’.
Essentially:
Data$hour <- as.POSIXlt(dates)$hour # extract hour of the day boxplot(Data$CPUUser ~ Data$hour) # for a subset with one host or for all hosts xyplot(Data$CPUUser ~ Data$hour | Data$Host, panel=panel.bwplot, horizontal=FALSE)
Answer
Here is one approach using cut() to create the appropriate hourly factors and ddply() from the plyr library for calculating the means.
library(lattice)
library(plyr)
## Create a record and some random data for every 5 seconds
## over two days for two hosts.
dates <- seq(as.POSIXct("2011-01-01 00:00:00", tz = "GMT"),
as.POSIXct("2011-01-02 23:59:55", tz = "GMT"),
by = 5)
hosts <- c(rep("host1", length(dates)), rep("host2",
length(dates)))
x1 <- sample(0:20, 2*length(dates), replace = TRUE)
x2 <- rpois(2*length(dates), 2)
Data <- data.frame(dates = dates, hosts = hosts, x1 = x1,
x2 = x2)
## Calculate the mean for every hour using cut() to define
## the factors and ddply() to calculate the means.
## getmeans() is applied for each unique combination of the
## hosts and hour factors.
getmeans <- function(Df) c(x1 = mean(Df$x1),
x2 = mean(Df$x2))
Data$hour <- cut(Data$dates, breaks = "hour")
Means <- ddply(Data, .(hosts, hour), getmeans)
Means$hour <- as.POSIXct(Means$hour, tz = "GMT")
## A plot for each host.
xyplot(x1 ~ hour | hosts, data = Means, type = "o",
scales = list(x = list(relation = "free", rot = 90)))
Attribution
Source : Link , Question Author : Scott Hoffman , Answer Author : Jason Morgan