# Multivariate time series in R: how to find lagged correlation and build a model for forecasting

I’m new to this site and fairly new to statistics and R. I’m working on a college project whose objective is to find the correlation between rainfall and river flow. Once the correlation is established, I want to use it to forecast the river flow.

The data
I have several years of data (recorded every 5 minutes) for a particular river, containing:

• Rainfall in millimetres
• River flow in cubic meters per second

This river doesn’t have snow, so the model is based only on rain and time. There are occasionally freezing temperatures, but I’m thinking of removing those periods from the data as outliers, since that situation is out of scope for my project.

Examples
Here are a couple of plots of sample data, showing a rainfall event and the rise in water a few hours later.

The red line is the river flow. The orange line is the rain. You can see that it always rains before the water rises in the river. There is some rain starting again at the end of the time series, but it will only affect the river flow later.

The correlation is there. Here is what I’ve looked at in R, using `ccf`, to demonstrate it:

• the cross-correlation
• the lag

This is my R line used for the second example (one rainfall period):

```
ccf(arnoiaex1$Caudal, arnoiaex1$Precip, lag.max=1000, plot=TRUE, main="Flow & Rain")
```

My interpretation is:

• that the rain leads (happens first),
• there is a significant correlation that peaks at a lag of $\approx 450$ (I can check the exact number; I know how to do that part).
• I don’t know how to find out for how long the rain keeps affecting the river flow; I think the name is “retention”. What I see is that the cross-correlation plot follows the same shape as the first graph, where the river loses the water after the rain. I don’t know whether, based on that, I can say the retention lasts from $\approx 450$, where it peaks, to $\approx 800$ (I can check this in the object returned by `ccf` and see when the water level comes back to its “before rain” value). Is that right? Is there a better way to find the retention?

Am I right?
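
To make the reading of the `ccf` output concrete, here is a minimal sketch on synthetic data (not my real series). The variables `rain` and `flow`, the helper `shift`, the 30-step delay and the short recession weights are all assumptions made up for illustration. It shows how the peak lag can be read from the object that `ccf` returns, and how to list the positive lags that exceed the approximate 95% bound drawn in the plot; whether that range is a fair measure of "retention" is exactly what I am unsure about.

```r
# Synthetic stand-in for the real data (all names and numbers are hypothetical).
set.seed(1)
n     <- 2000
delay <- 30                                        # true lag, in 5-minute steps
shift <- function(x, k) c(rep(0, k), head(x, -k))  # delay a series by k steps

rain <- rbinom(n, 1, 0.02) * rexp(n, 1)            # sparse rain pulses
# Flow responds after `delay` steps, then decays over the next two steps
flow <- shift(rain, delay) + 0.5 * shift(rain, delay + 1) +
  0.25 * shift(rain, delay + 2) + rnorm(n, sd = 0.01)

# ccf(x, y) estimates the correlation between x[t + k] and y[t], so with
# x = flow and y = rain a peak at a positive lag k means rain leads flow.
cc <- ccf(flow, rain, lag.max = 100, plot = FALSE)

# The result has class "acf"; lags are in cc$lag, correlations in cc$acf.
peak_lag <- cc$lag[which.max(cc$acf)]              # lag in sampling steps
peak_lag * 5                                       # the same lag in minutes

# Approximate 95% bound (the dashed lines in the ccf plot)
bound    <- qnorm(0.975) / sqrt(cc$n.used)
sig_lags <- cc$lag[cc$acf > bound & cc$lag > 0]    # positive lags above the bound
sig_lags
```

Because both series are autocorrelated in real data, lags above the bound can be more numerous than the physical response would suggest, so this list is only a rough indication.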

This time series doesn’t have periodicity or seasonality. Rain can come at any time and cause an effect. It does decrease in summer, but it still happens; it’s an area with a lot of rain all year round.

Model and forecast.
I don’t know how to create a model that can forecast how much the river’s flow will increase after a period of rain. I’ve been trying `arima` and `auto.arima` but haven’t been very successful. Should I use `Arima`, `vars` or a different multivariate model? Any link to an example would be of great help.
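
For reference, this is the kind of thing I have in mind, sketched on synthetic data with base R only: a regression with AR(1) errors in which the flow is explained by the rain `delay` steps earlier. Everything here is an assumption for illustration (the series, the helper `shift`, the single fixed lag of 30 steps, the AR(1) order); it is not a claim that this is the right model for my river. One convenient property is that, because the rain leads the flow, the lagged rain needed to forecast up to `delay` steps ahead has already been observed.

```r
# Synthetic data (hypothetical): flow = base level + delayed rain + AR(1) noise
set.seed(2)
n     <- 1500
delay <- 30                                        # assumed lag, in 5-minute steps
shift <- function(x, k) c(rep(0, k), head(x, -k))  # delay a series by k steps

rain     <- rbinom(n, 1, 0.02) * rexp(n, 1)
rain_lag <- shift(rain, delay)                     # rain as seen `delay` steps later
flow     <- 5 + 2 * rain_lag +
  as.numeric(arima.sim(list(ar = 0.7), n = n, sd = 0.05))

# Hold out the last h steps to check the forecast
h     <- delay
train <- 1:(n - h)
test  <- (n - h + 1):n

# Regression on lagged rain with AR(1) errors (stats::arima with xreg)
fit <- arima(flow[train], order = c(1, 0, 0),
             xreg = cbind(rain_lag = rain_lag[train]))

# Forecast h steps ahead; the lagged rain for those steps is already known
fc <- predict(fit, n.ahead = h,
              newxreg = cbind(rain_lag = rain_lag[test]))

coef(fit)                                          # ar1, intercept, rain_lag
rmse <- sqrt(mean((as.numeric(fc$pred) - flow[test])^2))
rmse
```

A real river would need several lags of rain rather than one, which is where I get stuck.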

Please let me know if you know the best way to create this prediction and which model I should use. There are a few other things I’m considering, but I have left them out of this explanation for simplicity.
I can share some data if required.