I am looking for a KNN imputation package. I have been looking at imputation package (http://cran.r-project.org/web/packages/imputation/imputation.pdf) but for some reason the KNN impute function (even when following the example from the description) only seems to impute zero values (as per below). I have been looking around but cannot find something yet, and hence was wondering if anyone has other suggestions for good KNN imputation packages?
W
In the code per below – the NA values are replaced by zero’s – not by the Knn mean value
require(imputation) x = matrix(rnorm(100),10,10) x.missing = x > 1 x[x.missing] = NA kNNImpute(x, 3) x
Answer
You could also try the following package: DMwR.
It failed on the case of 3 NN, giving ‘Error in knnImputation(x, k = 3) :
Not sufficient complete cases for computing neighbors.’
However, trying 2 gives.
> knnImputation(x,k=2)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] -0.59091360 -1.2698175 0.5556009 -0.1327224 -0.8325065 0.71664000
[2,] -1.27255074 -0.7853602 0.7261897 0.2969900 0.2969556 -0.44612831
[3,] 0.55473981 0.4748735 0.5158498 -0.9493917 -1.5187722 -0.99377854
[4,] -0.47797654 0.1647818 0.6167311 -0.5149731 0.5240514 -0.46027809
[5,] -1.08767831 -0.3785608 0.6659499 -0.7223724 -0.9512409 -1.60547053
[6,] -0.06153279 0.9486815 -0.5464601 0.1544475 0.2835521 -0.82250221
[7,] -0.82536029 -0.2906253 -3.0284281 -0.8473210 0.7985286 -0.09751927
[8,] -1.15366189 0.5341000 -1.0109258 -1.5900281 0.2742328 0.29039928
[9,] -1.49504465 -0.5419533 0.5766574 -1.2412777 -1.4089572 -0.71069839
[10,] -0.35935440 -0.2622265 0.4048126 -2.0869817 0.2682486 0.16904559
[,7] [,8] [,9] [,10]
[1,] 0.58027159 -1.0669137 0.48670802 0.5824858
[2,] -0.48314440 -1.0532693 -0.34030385 -1.1041681
[3,] -2.81996446 0.3191438 -0.48117020 -0.0352633
[4,] -0.55080515 -1.0620243 -0.51383557 0.3161907
[5,] -0.56808769 -0.3696951 0.35549191 0.3202675
[6,] -0.25043479 -1.0389393 0.07810902 0.5251606
[7,] -0.41667318 0.8809541 -0.04613332 -1.1586756
[8,] -0.06898363 -1.0736161 0.62698065 -1.0373835
[9,] 0.30051583 -0.2936140 0.31417921 -1.4155193
[10,] -0.68180034 -1.0789745 0.58290920 -1.0197956
You can test for sufficient observations using complete.cases(x),
where that value must be at least k.
One way to overcome this problem is to relax your requirements (i.e. less incomplete rows),
by 1) increasing the NA threshold, or alternatively, 2) increasing your number of observations.
Here is the first:
> x = matrix(rnorm(100),10,10)
> x.missing = x > 2
> x[x.missing] = NA
> complete.cases(x)
[1] TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
> knnImputation(x,k=3)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0.86882569 -0.2409922 0.3859031 0.5818927 -1.50310330 0.8752261 -0.5173105 -2.18244988 -0.28817656 -0.63941237
[2,] 1.54114079 0.7227511 0.7856277 0.8512048 -1.32442954 -2.1668744 0.7017532 -0.40086348 -0.41251883 0.42924986
[3,] 0.60062917 -0.5955623 0.6192783 -0.3836310 0.06871570 1.7804657 0.5965411 -1.62625036 1.27706937 0.72860273
[4,] -0.07328279 -0.1738157 1.4965579 -1.1686115 -0.06954318 -1.0171604 -0.3283916 0.63493884 0.72039689 -0.20889111
[5,] 0.78747874 -0.8607320 0.4828322 0.6558960 -0.22064430 0.2001473 0.7725701 0.06155196 0.09011719 -1.01902968
[6,] 0.17988720 -0.8520000 -0.5911523 1.8100573 -0.56108621 0.0151522 -0.2484345 -0.80695513 -0.18532984 -1.75115335
[7,] 1.03943492 0.4880532 -2.7588922 -0.1336166 -1.28424057 1.2871333 0.7595750 -0.55615677 -1.67765572 -0.05440992
[8,] 1.12394474 1.4890366 -1.6034648 -1.4315445 -0.23052386 -0.3536677 -0.8694188 -0.53689507 -1.11510406 -1.39108817
[9,] -0.30393916 0.6216156 0.1559639 1.2297105 -0.29439390 1.8224512 -0.4457441 -0.32814665 0.55487894 -0.22602598
[10,] 1.18424722 -0.1816049 -2.2975095 -0.7537477 0.86647524 -0.8710603 0.3351710 -0.79632184 -0.56254688 -0.77449398
> x
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0.86882569 -0.2409922 0.3859031 0.5818927 -1.5031033 0.8752261 -0.5173105 -2.18244988 -0.28817656 -0.63941237
[2,] 1.54114079 0.7227511 0.7856277 0.8512048 -1.3244295 -2.1668744 0.7017532 -0.40086348 -0.41251883 0.42924986
[3,] 0.60062917 -0.5955623 0.6192783 -0.3836310 0.0687157 1.7804657 0.5965411 -1.62625036 1.27706937 0.72860273
[4,] -0.07328279 -0.1738157 1.4965579 -1.1686115 NA -1.0171604 -0.3283916 0.63493884 0.72039689 -0.20889111
[5,] 0.78747874 -0.8607320 0.4828322 NA -0.2206443 0.2001473 0.7725701 0.06155196 0.09011719 -1.01902968
[6,] 0.17988720 -0.8520000 -0.5911523 1.8100573 -0.5610862 0.0151522 -0.2484345 -0.80695513 -0.18532984 -1.75115335
[7,] 1.03943492 0.4880532 -2.7588922 -0.1336166 -1.2842406 1.2871333 0.7595750 -0.55615677 -1.67765572 -0.05440992
[8,] 1.12394474 1.4890366 -1.6034648 -1.4315445 -0.2305239 -0.3536677 -0.8694188 -0.53689507 -1.11510406 -1.39108817
[9,] -0.30393916 0.6216156 0.1559639 1.2297105 -0.2943939 1.8224512 -0.4457441 -0.32814665 0.55487894 -0.22602598
[10,] 1.18424722 -0.1816049 -2.2975095 -0.7537477 0.8664752 -0.8710603 0.3351710 -0.79632184 -0.56254688 -0.77449398
here is an example of the 2nd…
x = matrix(rnorm(1000),100,10)
x.missing = x > 1
x[x.missing] = NA
complete.cases(x)
[1] TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
[22] FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[43] TRUE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[64] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
[85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE
At least k=3 complete rows are satisfied, thus it is able to impute for k=3.
> head(knnImputation(x,k=3))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0.01817557 -2.8141502 0.3929944 0.1495092 -1.7218396 0.4159133 -0.8438809 0.6599224 -0.02451113 -1.14541016
[2,] 0.51969964 -0.4976021 -0.1495392 -0.6448184 -0.6066386 -1.6210476 -0.3118440 0.2477855 -0.30986749 0.32424673
...
Attribution
Source : Link , Question Author : Wouter , Answer Author : pat