I got this question in a quiz: what will the training error be for a KNN classifier when K=1? What does training mean for a KNN classifier? My understanding was that a KNN classifier considers the entire dataset and assigns any new observation the majority class among its K closest neighbors. Where does training come into the picture? Also, the correct answer given was that the training error will be zero irrespective of the dataset. How is this possible?
Training error here is the error you get when you feed your training set to your KNN as the test set. When K = 1, the classifier picks the single closest training sample to each test sample. Since each test sample is itself in the training dataset, its closest sample is itself (at distance zero), so the classifier never makes a mistake. For this reason, the training error will be zero when K = 1, irrespective of the dataset. There is one logical assumption here, by the way: your training set does not contain identical samples belonging to different classes, i.e. conflicting information. Some real-world datasets do contain such conflicting duplicates, though.
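To make the argument concrete, here is a minimal sketch in plain Python. The toy data, the function names (predict_1nn, training_error), and the choice of Euclidean distance are all assumptions made for illustration; they do not come from the quiz. The second dataset adds a duplicate point with a conflicting label to show where the zero-error claim breaks down.

```python
import math

def predict_1nn(train_X, train_y, query):
    """Return the label of the training point closest to `query` (Euclidean distance)."""
    # min() returns the first index with the smallest distance, so exact ties
    # (including duplicate points) are resolved in favor of the earlier sample.
    best_index = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return train_y[best_index]

def training_error(train_X, train_y):
    """Fraction of training points misclassified when the training set is reused as the test set."""
    mistakes = sum(predict_1nn(train_X, train_y, x) != y for x, y in zip(train_X, train_y))
    return mistakes / len(train_X)

# Hypothetical toy data: every point is distinct, so each point's nearest
# neighbor is itself and its own label is returned.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
y = ["a", "b", "a", "b"]
clean_error = training_error(X, y)

# Same data plus a copy of the first point carrying a different label
# (the "conflicting information" case).
X_conflict = X + [(0.0, 0.0)]
y_conflict = y + ["b"]
conflict_error = training_error(X_conflict, y_conflict)

print(clean_error)     # 0.0
print(conflict_error)  # 0.2
```

With distinct points the error is exactly zero. With the conflicting duplicate, one of the two identical points has to be misclassified, so the error is 1 out of 5. Which of the two is misclassified depends on how ties are broken; this sketch simply keeps the earlier sample, and other implementations may behave differently.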