I asked this question on the Mathematics Stack Exchange site and was advised to ask here instead.

I’m working on a hobby project and need some help with the following problem.

## A bit of context

Let’s say there is a collection of items, each with a description of its features and a price. Imagine a list of cars and prices. Every car has a list of features, e.g. engine size, color, horsepower, model, year etc. For each make, the list looks something like this:

`Ford:`

- `V8, green, manual, 200hp, 2007, $200`
- `V6, red, automatic, 140hp, 2010, $300`
- `V6, blue, manual, 140hp, 2005, $100`
- `...`

Going even further, the list of cars with prices is published at regular intervals, which means we have access to historical price data. Each published list might not include exactly the same cars.

## Problem

I would like to understand how to model the price of any car based on this information, most importantly cars that are not in the initial list, for example:

`Ford, v6, red, automatic, 130hp, 2009`

The above car is almost the same as one in the list, just slightly different in horsepower and year. What is needed to price it?

What I’m looking for is something practical and simple, but I would also like to hear about more complex approaches to modelling something like this.

## What I’ve tried

Here is what I’ve been experimenting with so far:

1) Using historical data to look up car X. If it is not found, there is no price. This is of course very limited, and it can only be combined with some time decay to adjust the prices of known cars over time.

2) Using a feature weighting scheme together with a priced sample car. Basically, there is a base price and each feature alters it by some factor. Any car’s price is derived from this.

The first proved insufficient, and the second was not always correct; I might not have had the best approach to choosing the weights. Maintaining the weights by hand also seems laborious, so I wondered whether the historical data could be used statistically to derive the weights, or something else entirely. I just don’t know where to start.

## Other important aspects

- It must integrate into a software project I have, either by using existing libraries or by writing the algorithm myself.
- Recalculation must be fast when new historical data comes in.

Any suggestions on how a problem like this could be approached? All ideas are more than welcome. Thanks a lot in advance, and I look forward to reading your suggestions!

**Answer**

“Practical” and “simple” suggest **least squares regression.** It’s easy to set up, easy to do with lots of software (R, Excel, Mathematica, *any* statistics package), easy to interpret, and can be extended in many ways depending on how accurate you want to be and how hard you’re willing to work.

This approach is essentially your “weighting scheme” (2), but it finds the weights automatically, gives the best fit to your data in the least squares sense, and is easy and fast to update. There are *loads* of libraries to perform least squares calculations.
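As a rough illustration, here is a minimal sketch of such a fit in Python with NumPy, using the three Ford rows from the question as toy data. The helper names (`build_levels`, `encode`, `fit`, `predict`) and the encoding choices (one indicator column per category value, year shifted by 2000) are assumptions made for this example, not something the question or answer prescribes. With only three cars there are more weights than observations, so `numpy.linalg.lstsq` returns the minimum-norm solution that reproduces the known prices exactly; a real data set would need many more rows than features for the weights to mean anything.

```python
import numpy as np

# Toy data copied from the question: (engine, color, gearbox, hp, year, price).
CARS = [
    ("V8", "green", "manual", 200, 2007, 200.0),
    ("V6", "red", "automatic", 140, 2010, 300.0),
    ("V6", "blue", "manual", 140, 2005, 100.0),
]


def build_levels(cars):
    """Collect the distinct values seen in each of the three categorical columns."""
    return [sorted({car[i] for car in cars}) for i in range(3)]


def encode(car, levels):
    """Turn one car into a numeric row: intercept, indicator columns, hp, year."""
    row = [1.0]  # intercept, i.e. the "base price"
    for value, known in zip(car[:3], levels):
        row.extend(1.0 if value == level else 0.0 for level in known)
    row.append(float(car[3]))         # horsepower
    row.append(float(car[4] - 2000))  # year, shifted to keep the numbers small
    return row


def fit(cars):
    """Find the weights that minimize the sum of squared pricing errors."""
    levels = build_levels(cars)
    X = np.array([encode(car, levels) for car in cars])
    y = np.array([car[5] for car in cars])
    # lstsq also copes with the redundant indicator columns (rank deficiency).
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return levels, weights


def predict(car, levels, weights):
    """Price a car as the weighted sum of its encoded features."""
    return float(np.array(encode(car, levels)) @ weights)


levels, weights = fit(CARS)
new_car = ("V6", "red", "automatic", 130, 2009)  # the unlisted car from the question
estimate = predict(new_car, levels, weights)
print(f"estimated price: {estimate:.2f}")
```

When a new price list arrives, the simplest update under this sketch is to append the new rows to `CARS` and call `fit` again; for data sets of hobby-project size a full refit is typically fast.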

It will help to include not only the variables you listed (engine type, power, etc.) but also the *age* of the car. Furthermore, make sure to adjust prices for inflation.
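The two preprocessing steps mentioned here could be sketched as follows. Everything specific in this snippet is hypothetical: the `PRICE_INDEX` values are made-up placeholders rather than real statistics, the listing year of the example observation is invented for illustration, and the function names are not from any library. The idea is only to show that age is computed relative to the date of the price list, and that historical prices are rescaled into the money of one common base year before fitting.

```python
# Hypothetical price index by listing year (made-up numbers, NOT real data).
# Substitute an official consumer price index for your currency.
PRICE_INDEX = {2010: 100.0, 2011: 102.0, 2012: 105.0}


def real_price(nominal_price, listing_year, base_year, index=PRICE_INDEX):
    """Express a price observed in listing_year in base_year money."""
    return nominal_price * index[base_year] / index[listing_year]


def car_age(model_year, listing_year):
    """Age of the car at the time the price list was published."""
    return listing_year - model_year


# Assumed observation: a 2007 car listed at $200 in a 2010 price list.
adjusted = real_price(200.0, listing_year=2010, base_year=2012)
age = car_age(model_year=2007, listing_year=2010)
print(adjusted, age)
```

The adjusted price and the age would then replace the raw price and the model year as inputs to the regression.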

**Attribution**

*Source: Link, Question Author: murrekatt, Answer Author: whuber*