Use the k-nearest neighbour algorithm on the digit data set as follows:
Learn in increments of 50 (you should get 8 result sets in all) for each value
of k. Do this for k=1 to 15 for odd values of k and use the majority vote for
classification. Plot the recognition accuracy against both k and the number of
exemplars in the learning set (i.e. 50, 100, ... 400).
![]()
![]()
| K=1 |
K=3 |
K=5 |
K=7 |
K=9 |
K=11 |
K=13 |
K=15 | |
| 50 Training Instances |
61.5000 | 49.0000 | 42.5000 | 38.5000 | 37.0000 | 32.5000 | 28.5000 | 27.5000 |
| 100 Training instances |
70.0000 | 66.5000 | 65.5000 | 62.0000 | 60.5000 | 57.5000 | 53.0000 | 53.5000 |
| 150 Training instances |
77.0000 | 71.5000 | 67.5000 | 66.5000 | 63.0000 | 62.0000 | 59.0000 | 58.5000 |
| 200 Training instances |
78.5000 | 74.0000 | 71.5000 | 70.0000 | 68.0000 | 69.0000 | 65.5000 | 63.0000 |
| 250 Training instances |
80.0000 | 76.5000 | 71.0000 | 68.5000 | 67.5000 | 67.5000 | 68.0000 | 67.0000 |
| 300 Training instances |
81.5000 | 77.5000 | 74.5000 | 73.5000 | 72.5000 | 74.0000 | 73.0000 | 71.5000 |
| 350 Training instances |
84.0000 | 80.5000 | 79.5000 | 78.5000 | 77.5000 | 77.0000 | 77.0000 | 77.0000 |
| 400 Training instances |
88.5000 | 84.0000 | 82.0000 | 81.5000 | 78.5000 | 79.0000 | 78.5000 | 79.0000 |
![]()
Significanse of the
results:
The result signifies that
when more number of the neighbours were taken the accuracy is decreasing for
every set of training Instances. ie., the accuracy is decreasing as the k
value (number of neighbours) are increasing. Also the result indicate that
the accuracy is increasing woth the number of training examples taken.
By the results we can say that the actual output is more
depending on the neighbours that are nearer to the instance. Since as the number
of neighbours are increasing the farther instances will also effect the output
that is the reason the output accuracy is decreaing. When the number of
training examples are increasing the distance of the K-nearest neighbours will
be less. So the accuracy is increaing as we are taking the most nearest
neighbours.
In other words, As the
k-increases, the chance that data instances from different class will match the
test data increases (in terms of Euclidean distances). This may lead to false
classification of the test data into other classes. However when the no. of
training data is sufficient, The chance remains that more of the data
corresponding to a single class will cluster in Euclidean space. Hence the
chances that the test data will be correctly classified also increases. Hence we
observe increased performance in recognition accuracy.
![]()
Comparision between Backprop and
K-Nearest neighbour
Algorithm:
The Backprop networks is giving maximum accuracy for the training set as 88%.
This algorithm is giving maximum
accuracy as 88.5% when k=1 an all 400 instances are taken.
The reason is that in back
propagation technique we consider all the training instances, to classify a
particular test instance. As such, some irrelevant instances will have
effect on the classification, decreasing the accuracy. In contrast with this,
K-nearest neighbor technique considers only the nearest instances, which
predominantly have high effect on the target classification of a new query
instance. Therefore the above result follows.
![]()
Comparision of Execurion
time:
In
Backprop networks the training will take more time. For giving an output for a
new query it will take less time as there is no processsing
again.
But in these networks the
training will take less time and also for a new query instance we have to
calculate the distance from each instance and then find the k-nearest
neighbours. i.e., all the processing we will do at that time only. So the K
-nearest Neighbour algorithm will take less time in training but it will take
more for giving answer to a query instance as complement to the Backprop
networks.
![]()