4. K-Nearest Neighnour Algorithm
 


Use the k-nearest neighbour algorithm on the digit data set as follows: Learn in increments of 50 (you should get 8 result sets in all) for each value of k. Do this for k=1 to 15 for odd values of k and use the majority vote for classification. Plot the recognition accuracy against both k and the number of exemplars in the learning set (i.e. 50, 100, ... 400).

  1. Discuss the significance of the results you get.
  2. Compare your results with the backprop neural network you used in the previous assignment.
  3. What are the comparative execution times for comparable problems using k-NN and backprop based classification - you can do this for two sample cases at either end i.e. assuming learning set size is 50 and 400.

Graph:

 


 

Results
 



 


 
K=1
 
K=3
 
K=5
 
K=7
 
K=9
 
K=11
 
K=13
 
K=15
 
50 Training Instances
 
61.5000 49.0000 42.5000 38.5000 37.0000 32.5000 28.5000 27.5000
100 Training instances
 
70.0000 66.5000 65.5000 62.0000 60.5000 57.5000 53.0000 53.5000
150 Training instances
 
77.0000 71.5000 67.5000 66.5000 63.0000 62.0000 59.0000 58.5000
200 Training instances
 
78.5000 74.0000 71.5000 70.0000 68.0000 69.0000 65.5000 63.0000
250 Training instances
 
80.0000 76.5000 71.0000 68.5000 67.5000 67.5000 68.0000 67.0000
300 Training instances
 
81.5000 77.5000 74.5000 73.5000 72.5000 74.0000 73.0000 71.5000
350 Training instances
 
84.0000 80.5000 79.5000 78.5000 77.5000 77.0000 77.0000 77.0000
400 Training instances
 
88.5000 84.0000 82.0000 81.5000 78.5000 79.0000 78.5000 79.0000


 


Significanse of the results:



 The result signifies that when more number of the neighbours were taken the accuracy is decreasing for every set of training Instances. ie.,  the accuracy is decreasing as the k value (number of neighbours) are increasing.  Also the result indicate that the accuracy is increasing woth the number of training examples taken.
       
        By the results we can say that the actual output is more depending on the neighbours that are nearer to the instance. Since as the number of neighbours are increasing the farther instances will also effect the output that is the reason the output accuracy is decreaing.  When the number of training examples are increasing the distance of the K-nearest neighbours will be less. So the accuracy is increaing as we are taking the most nearest neighbours.

        In other words, As the k-increases, the chance that data instances from different class will match the test data increases (in terms of Euclidean distances). This may lead to false classification of the test data into other classes. However when the no. of training data is sufficient, The chance remains that more of the data corresponding to a single class will cluster in Euclidean space. Hence the chances that the test data will be correctly classified also increases. Hence we observe increased performance in recognition accuracy.


 


Comparision between Backprop and K-Nearest neighbour Algorithm:

        The Backprop networks is giving maximum accuracy for the training set as 88%.

        This algorithm is giving maximum accuracy as 88.5% when k=1 an all 400 instances are taken.

        The reason is that in back propagation technique we consider all the training instances, to classify a particular test instance. As such, some irrelevant instances will have effect on the classification, decreasing the accuracy. In contrast with this, K-nearest neighbor technique considers only the nearest instances, which predominantly have high effect on the target classification of a new query instance. Therefore the above result  follows.

 


Comparision of Execurion time:

        In Backprop networks the training will take more time. For giving an output for a new query it will take less time as there is no processsing again.

        But in these networks the training will take less time and also for a new query instance we have to calculate the distance from each instance and then find the k-nearest neighbours. i.e., all the processing we will do at that time only. So the K -nearest Neighbour algorithm will take less time in training but it will take more for giving answer to a query instance as complement to the Backprop networks.