When performing $k$-fold cross validation, I understand that you train on all the folds except one, make predictions on the held-out fold, and repeat this process $k$ times so that every instance receives an out-of-sample prediction. You can then compute accuracy metrics (precision, recall, % classified correctly) over all of your instances, which ought to be the same as if you calculated them on each fold and then averaged the results (correct me if I’m wrong).
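To make the procedure concrete, here is a minimal sketch (assuming scikit-learn; the classifier and data set are placeholders). With equal-sized folds, pooling all out-of-fold predictions and then scoring gives the same accuracy as averaging the per-fold accuracies, since each fold contributes equally:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Illustrative data: 100 samples split into 5 equal folds of 20.
X, y = make_classification(n_samples=100, random_state=0)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

pooled_pred = np.empty_like(y)
fold_accs = []
for train_idx, test_idx in kf.split(X):
    # Train on k-1 folds, predict the held-out fold.
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    pooled_pred[test_idx] = pred
    fold_accs.append(accuracy_score(y[test_idx], pred))

pooled_acc = accuracy_score(y, pooled_pred)   # score all instances at once
mean_acc = np.mean(fold_accs)                 # average of per-fold scores
```

One caveat: the two agree exactly only because the folds are the same size, and only for instance-weighted metrics like accuracy; ratio-based metrics such as precision and recall can differ slightly between pooling and averaging.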
The end result you want is a final model.
Do you average the $k$ models obtained from the folds to end up with a final model that has the accuracy metrics obtained by the above method?
The aim of $k$-fold cross validation is not to produce a model; it is to compare models.
The results of a cross validation experiment could tell you that Support Vector Machines outperform Naive Bayes on your data, or that the classifier’s hyperparameter should be set to $c$ for this particular data set. Armed with this knowledge, you then train a “production” classifier on ALL of the available data and apply it to your problem.
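A sketch of that workflow (assuming scikit-learn; the candidate models and data are illustrative): cross validation is used only to *choose* between the candidates, and the winner is then refit on everything.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

# Placeholder data set; in practice this is all of your labelled data.
X, y = make_classification(n_samples=200, random_state=0)
candidates = {"SVM": SVC(C=1.0), "NaiveBayes": GaussianNB()}

# Mean cross-validated accuracy for each candidate model.
scores = {name: cross_val_score(clf, X, y, cv=5).mean()
          for name, clf in candidates.items()}
best_name = max(scores, key=scores.get)

# The "production" classifier is trained on ALL available data.
production_model = candidates[best_name].fit(X, y)
```

Note that none of the $k$ fold-level models is kept; they exist only to produce the estimates in `scores`.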
In many cases, it’s not even clear how you would go about averaging several models. For example, what is the average of three decision trees or nearest neighbor classifiers?
It’s important to keep in mind that the cross validation results are estimates, not guarantees, and these estimates are more valid if the production classifier is trained with a similar quality (and quantity) of data. There has been a fair amount of work on developing ways to use these estimates to perform inference; that is, to say in a statistically sound way that method A is generally superior to method B on these data.
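One simple version of such an inference, sketched here with scipy (an assumption, and known to be optimistic because fold scores are not independent): run both methods on the same folds and apply a paired $t$-test to the per-fold scores. Corrected resampled tests are preferred in serious work.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

# Illustrative data; use the SAME fold assignments for both methods.
X, y = make_classification(n_samples=200, random_state=0)
cv = KFold(n_splits=10, shuffle=True, random_state=0)

scores_a = cross_val_score(SVC(), X, y, cv=cv)         # method A per fold
scores_b = cross_val_score(GaussianNB(), X, y, cv=cv)  # method B per fold

# Paired test: each fold yields one (score_a, score_b) pair.
t_stat, p_value = stats.ttest_rel(scores_a, scores_b)
```

A small $p$-value suggests the difference in fold scores is unlikely to be chance, but the caveats above about correlated folds still apply.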