When can I stop looking for a better model?

I’m looking for a model relating energy market prices to the weather. I have the price of the megawatt traded between European countries, and a lot of weather variables (GRIB files), both hourly over a period of 5 years (2011-2015).

Price/day

[Figure: daily price over one year]

This is the daily price for one year. I have the hourly price for 5 years.

Example of weather

[Figure: 3D scatter plot of temperature in kelvin for one hour]

I have 1000 values per variable per hour and 200 variables, such as temperature (kelvin), wind, geopotential, etc.

I’m trying to forecast the mean hourly price of the megawatt.

My weather data are very dense, more than 10000 values per hour, and therefore highly correlated. It’s a “short, big data” problem: few time points relative to the number of variables.

I’ve tried Lasso, Ridge and SVR with the mean price of the megawatt as the outcome and my weather data as the inputs. I used 70% of the data for training and 30% for testing. If the test data are not a forecast (they are sampled from somewhere inside the training period), I get a good prediction (R² = 0.89). But I want to forecast.

So if the test data come chronologically after my training data, the model predicts almost nothing (R² = 0.05). I think this is normal because it’s a time series, and there is a lot of autocorrelation.
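The gap between the two R² values is what one would expect when a shuffled split lets the model interpolate between neighbouring hours. A minimal sketch of a walk-forward (expanding window) evaluation, in which every test block lies strictly after its training data, could look like the following. The helper name and its parameters are hypothetical, written only for illustration; scikit-learn's `TimeSeriesSplit` provides a similar scheme.

```python
def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_indices, test_indices) with every test index
    strictly after every training index (expanding window).

    Hypothetical helper for illustration; not part of any library.
    """
    if n_folds < 1:
        raise ValueError("n_folds must be at least 1")
    if not 0 < min_train < n_samples:
        raise ValueError("min_train must be between 1 and n_samples - 1")
    fold_size = (n_samples - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        # The last fold absorbs any leftover samples.
        test_end = n_samples if k == n_folds - 1 else train_end + fold_size
        yield range(train_end), range(train_end, test_end)


# Example: 10 hourly observations, 3 folds, at least 4 training points.
for train_idx, test_idx in walk_forward_splits(10, 3, 4):
    print(list(train_idx), "->", list(test_idx))
```

Averaging the score over such folds gives an estimate of forecasting skill that is not inflated by neighbouring hours leaking from the training set into the test set.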

I thought I had to use a time series model like ARIMA. I determined the order of the model (the series is stationary) and tested it, but it doesn’t work: the forecast has an R² of 0.05, and my predictions are nowhere near the test data. I also tried ARIMAX with the weather as a regressor, but it doesn’t add any information.
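One way to put an out-of-sample R² of 0.05 in context is to compare it with a naive benchmark. An R² near zero means the forecast does about as well as predicting the mean of the test period, so a useful check is whether any model beats a forecast that simply repeats the last observed daily cycle. The sketch below assumes hourly data with a 24-hour period; the function names and the toy numbers are invented for illustration.

```python
def r_squared(actual, predicted):
    """Out-of-sample R^2: 1 - SS_res / SS_tot, using the mean of `actual`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def seasonal_naive(history, horizon, period=24):
    """Forecast by repeating the last observed `period` values."""
    if len(history) < period:
        raise ValueError("history must cover at least one full period")
    last_cycle = history[-period:]
    return [last_cycle[i % period] for i in range(horizon)]


# Toy example (made-up numbers): a repeating 4-step cycle plus a small drift.
history = [10, 20, 30, 20, 11, 21, 31, 21]
future = [12, 22, 32, 22]
benchmark = seasonal_naive(history, horizon=len(future), period=4)
print(benchmark, r_squared(future, benchmark))
```

If ARIMA or ARIMAX cannot beat this benchmark on held-out data, that suggests the model is not capturing more than the seasonal pattern, at least at that horizon.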

ACF/PACF, test/train data

So I did a seasonal decomposition by day and by week.

Day

[Figure: daily seasonal decomposition]

Week, computed on the trend of the daily decomposition

[Figure: weekly seasonal decomposition of the daily trend]

And I can get this if I can predict the trend of the trend of my price:

[Figure: predicted price versus real price]

The blue is my prediction and the red the real value.

I’m going to do a regression with a rolling mean of the weather as the input and the trend of the trend of the price as the outcome. But for now, I haven’t found any relation.
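A point worth checking in this kind of setup is whether the smoothed series use future values. Many decomposition routines estimate the trend with a centered moving average, which looks ahead and so cannot be computed at forecast time. A minimal sketch of a trailing rolling mean, which uses only the current and past values, is shown below; the function name is hypothetical.

```python
def trailing_mean(values, window):
    """Mean of the current value and the previous `window - 1` values.

    Positions without a full window are None, so no future value is used.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


# Example with made-up temperatures in kelvin and a 2-step window.
print(trailing_mean([280.0, 282.0, 284.0, 283.0], window=2))
```

Using the same trailing construction for both the weather input and the price trend keeps the regression honest about what would be known at the time of the forecast.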

But if there is no relation, how can I know that there really isn’t anything? Maybe it’s just that I haven’t found it.
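No test can prove that nothing is there, but one can at least ask whether the strongest association found so far is distinguishable from chance. With autocorrelated series an ordinary permutation test tends to be too optimistic, because shuffling destroys the autocorrelation. A rough alternative, sketched below under the assumption that both series are equally long lists of numbers, is to shift one series circularly by random offsets and see how often the shifted copy correlates with the other as strongly as the original does. All names and the toy data are invented for illustration, and the check only covers linear association at lag zero.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def circular_shift_p_value(x, y, n_shifts=199, min_shift=48, seed=0):
    """Share of circularly shifted copies of x whose |correlation| with y
    is at least as large as the observed one (with a plus-one correction)."""
    n = len(x)
    if n <= 2 * min_shift:
        raise ValueError("series too short for the requested min_shift")
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    hits = 0
    for _ in range(n_shifts):
        s = rng.randint(min_shift, n - min_shift)
        shifted = x[s:] + x[:s]  # keeps the autocorrelation of x intact
        if abs(pearson(shifted, y)) >= observed:
            hits += 1
    return (hits + 1) / (n_shifts + 1)


# Toy data: an autocorrelated "weather" series and a "price" that follows it.
toy_rng = random.Random(1)
weather = [0.0]
for _ in range(499):
    weather.append(0.9 * weather[-1] + toy_rng.gauss(0, 1))
price = [w + toy_rng.gauss(0, 0.1) for w in weather]
print(circular_shift_p_value(weather, price))
```

A large p-value here would not show that weather and price are unrelated; it would only say that this particular linear, same-hour link is not stronger than what shifted copies produce. Lagged or nonlinear relations would need their own checks.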

Answer

You might be interested in a formal science domain called “computational mechanics.” In an article by James Crutchfield and David Feldman, they lay out the program of computational mechanics—as far as I understand it—as parsing out the boundaries between (1) deterministic uncertainty and the information cost of inferring deterministic relationships, (2) stochastic uncertainty and the information cost of inferring probability distributions, and (3) entropic uncertainty and the consequences of being information poor.

To answer your question directly (albeit also quite broadly, since you asked a broad question), how we know when we have learned either “enough,” or “all we can” from data is an open domain of research. The former will necessarily be contingent upon one’s needs as a researcher and actor in the world (e.g., how much time, processing power, memory, and urgency one has).

I’m not up on this field, or even deep with this particular article, but they’re some cool thinkers. 🙂

Crutchfield, J. P. and Feldman, D. P. (2003). Regularities unseen, randomness observed: Levels of entropy convergence. Chaos, 13(1):25–54.

Attribution
Source: Link, Question Author: el Josso, Answer Author: Alexis
