SE367

HW4

Vidur Kumar (Y8560)

 

Codes/Commands used

 

Q1.

a) Plot of residual errors versus the number of dimensions chosen (for 100 images)

 

b) The residual variance is minimal when two dimensions are chosen for the Isomap - most likely because the data is best represented in two dimensions (matching the two pose angles Theta1 and Theta2).

This is further corroborated by the fact that increasing the number of dimensions of the Isomap beyond two does not reduce the residual variance at all. In fact, if anything, it increases marginally with increasing dimensions - the only plausible explanation for which is the addition of noise with the inclusion of unnecessary dimensions.
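The residual-variance curve in (a) can be sketched as follows. This is a minimal sketch using scikit-learn's Isomap on random stand-in data (the actual face images are not reproduced here); residual variance is taken as 1 - R^2 between the graph geodesic distances and the Euclidean distances in the embedding:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)
# stand-in for the 100 flattened images (one image per row)
X = rng.normal(size=(100, 64))

residual = []
for d in range(1, 6):
    iso = Isomap(n_neighbors=10, n_components=d)
    Y = iso.fit_transform(X)
    D_geo = iso.dist_matrix_          # graph geodesic distances
    D_emb = squareform(pdist(Y))      # Euclidean distances in the embedding
    r = np.corrcoef(D_geo.ravel(), D_emb.ravel())[0, 1]
    residual.append(1.0 - r ** 2)     # residual variance for d dimensions
```

On the real images the curve would flatten out after d = 2, as described above.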

 

c) Plot of the 2D Isomap with labelled boundary points of (Theta1) and (Theta2):

As seen from the above labelling of the plot, the occurrence of the theta-boundary points does not follow any particular pattern in the Isomap, nor do all points on the boundary of the Isomap correspond to boundary points of theta.

Hence, it is assumed that there is no simple pattern relating the Isomap representation of the vectors to their layout in Euclidean space.

d) Plot of residual errors for 2D Isomap generation over 1000 images:

The residual error shows a similar pattern to that of the Isomap generated from 100 images, the only difference being that the absolute value of the error has gone down, since the larger number of data points allows the eigenvectors to be chosen more precisely.

e) The values of Y1 and Y2 versus Theta1 and Theta2, plotted as shown:

  #          Y1             Y2         Theta1       Theta2
  1    4446.367685    1164.778935     23.061785   111.920576
  2    4134.200528     323.6929394    13.010174   143.673573
  3   -2052.107757    1777.2488       77.935974    63.507889
  4   -3621.799525   -1054.84844      93.446213   -27.068283
  5   -2861.849756      28.53883074  117.977051    16.14006
  6   -6139.902964   -1035.405504     60.216198     3.45488
  7    -862.8480584    921.5577661    98.890152   -52.05019
  8   -3520.120481     270.0209506    53.22966     86.592529
  9   -2700.068712    1856.022592    114.079742   -10.437866
 10    5113.083798     907.9178429    96.59771    -81.656945
 11     371.2288524   1255.061606     61.536392   -41.745966
 12   -3068.718884     159.2593573    73.605782    -3.282196
 13   -5542.463144     507.3439305    76.759033    95.872925
 14    -232.9344764  -1098.584115    117.142075   -87.767963
 15    3208.814481   -1439.483843     37.193707     9.671291
 16    2045.998808      51.52377313  124.752792   -46.28672
 17    -182.8565163   -288.9544859   106.577217   -83.606153
 18   -3190.656936     644.9811566    11.097768   158.217799
 19    5638.949873    1320.331654      3.452615    24.305352
 20   -4927.76027    -1130.816218     51.594627   -23.742022

The plots of Theta1 versus Y1 and Theta2 versus Y2 did not yield any significant patterns (at least in terms of linear fits).
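The linear-fit check can be sketched as follows; the arrays below are placeholders standing in for one embedding coordinate and one pose angle (substitute the Y1 and Theta1 columns of the table above), and the R^2 of the least-squares line is the goodness-of-fit measure:

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in arrays; replace with the Y1 and Theta1 columns of the table
y1 = rng.normal(size=20)
theta1 = rng.uniform(0.0, 180.0, size=20)

# least-squares line theta1 ~ a*y1 + b, and its R^2 as the fit quality
a, b = np.polyfit(y1, theta1, 1)
pred = a * y1 + b
ss_res = np.sum((theta1 - pred) ** 2)
ss_tot = np.sum((theta1 - theta1.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
```

A value of r2 near zero, as observed for the real data, indicates no linear relationship.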

 

 

Q2.

The eigenvalues from the PCA analysis (for 10 dimensions) are as follows:

951682.4
402869.8
332749.8
102007.5
71759.04
43769.68
28116.36
21325.43
17958.51
16813.90

Hence, it is visible that the first two eigenvalues are not very significantly greater than the others - which should ideally show up in the quality of the reconstruction, but does not seem to affect the image in any significant way (considering the angles Theta1 and Theta2):

PCA reconstruction, using 2 eigenvectors:

PCA reconstruction, using 10 eigenvectors:
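The 2- versus 10-eigenvector comparison can be sketched as follows, assuming plain eigen-decomposition PCA on stand-in data. The residual of the 10-eigenvector reconstruction can never exceed that of the 2-eigenvector one, since the 2-D subspace is contained in the 10-D one:

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in data matrix: 100 flattened images of 64 pixels each
X = rng.normal(size=(100, 64))
Xc = X - X.mean(axis=0)                  # centre the data for PCA

# eigen-decomposition of the covariance matrix
cov = Xc.T @ Xc / (len(X) - 1)
evals, evecs = np.linalg.eigh(cov)
order = np.argsort(evals)[::-1]          # largest eigenvalues first
evals, evecs = evals[order], evecs[:, order]

def reconstruct(x, k):
    """Project one centred image onto the top-k eigenvectors and back."""
    V = evecs[:, :k]
    return V @ (V.T @ x)

x = Xc[0]
err2 = np.linalg.norm(x - reconstruct(x, 2))     # residual with 2 eigenvectors
err10 = np.linalg.norm(x - reconstruct(x, 10))   # residual with 10 eigenvectors
```

For the face images, err10 being only moderately smaller than err2 would match the observation that the extra eigenvectors change the image little.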

 

 

Q3.

a) Reconstruction of the x' vector (corresponding to the y' vector in eigenspace, after LLE-based reduction of the vector/image space):

b) Reconstruction of x' corresponding to the y' vector (after Isomap-based reduction of the vector/image space), using the weighting function over the local neighbours:

Reconstruction after PCA (using 2 eigenvectors), given above as well:

 

c) From the above examples it is clearly visible that reconstructing y-data (in eigenspace) back into its x-data form (the original dimensional space) works best when the original data of the neighbouring points (identified in eigenspace) is used.

This is because using the original data with weighting functions retains all the detail that existed in the original dimensions, as opposed to reconstructing the data as a linear combination of the eigenvectors - especially when the number of eigenvectors chosen is smaller than the dimensionality of the original vector/image space.
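A minimal sketch of this neighbour-based reconstruction, assuming barycentric (LLE-style) weights over the k nearest neighbours in the embedding; all arrays are stand-ins for the actual image data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, D, d, k = 100, 64, 2, 5
X = rng.normal(size=(n, D))   # stand-in originals (flattened images)
Y = rng.normal(size=(n, d))   # stand-in embedding coordinates

def reconstruct_from_neighbours(y_new, Y, X, k):
    """Barycentric weights over the k nearest embedding neighbours,
    then the same weights applied to the original-space vectors."""
    idx = np.argsort(np.linalg.norm(Y - y_new, axis=1))[:k]
    Z = Y[idx] - y_new                      # neighbours centred on y_new
    G = Z @ Z.T                             # local Gram matrix
    G += 1e-3 * np.trace(G) * np.eye(k)     # regularise for stability
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()                            # weights sum to one
    return w @ X[idx]                       # weighted sum of originals

x_rec = reconstruct_from_neighbours(Y[0], Y, X, k)
```

Because the weighted sum runs over the full original-space vectors, detail in all of the original dimensions is carried into the reconstruction.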

The eigenvectors allow the reconstruction to resemble the original data because the 'constant' or non-variable component of the data is accounted for within the chosen eigenvectors that define the lower-dimensional eigenspace. Hence, mapping coordinates in eigenspace back to the original space yields images that resemble the original image, but still lose some of the detail.

The linear methods are simpler in their dimension-reduction and reconstruction methodology, but therefore also allow a greater loss of data for the same amount of information retained (in terms of storage in bytes). Their data is stored as coordinates in eigenspace, together with the representation of the eigenvectors in the original vector/image space.

The non-linear methods are more complex in their construction and deconstruction algorithms, but give far better results for the same amount of information stored. Their data is stored as a set of nodes and the geodesic distances between nodes.

NOTE - in the non-linear method the data is stored as a function of local neighbourhoods of points, within which the manifold is approximately linear. This results in multiple pockets of linearity that enable the overall data reduction: distances are followed along these linear neighbourhoods, from one neighbourhood to the next, traversing the entire image space - but there is no global linearity in the storage.
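The note above can be illustrated by sketching how geodesic distances chain through the locally linear neighbourhoods: each edge of a k-nearest-neighbour graph is a local Euclidean step, and SciPy's shortest-path routine strings those steps together (random stand-in points):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))    # stand-in points on the manifold

# k-NN graph: each edge is a locally linear (Euclidean) step
G = kneighbors_graph(X, n_neighbors=6, mode='distance')

# geodesic distances: shortest paths chained through local neighbourhoods
D_geo = shortest_path(G, method='D', directed=False)
```

Each geodesic distance is a sum of short Euclidean hops, so no globally linear structure is ever assumed.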

 

Q4.

Having run the Isomap and PCA algorithms on the set of 2000 images, it can be said that the non-linear method is more effective than the linear method, since it does not assume or require a particular pattern or distribution in the vector/image space in order to give good reconstructions.

The linear methods are less effective, since they achieve good dimension reduction only if the data in the vector/image space is oriented suitably for that reduction.

The linear methods have the advantage of being faster and requiring less processing, but suffer a loss of reconstruction quality, even when the neighbours are used for the reconstruction.

Example :

PCA-reconstructed image (using neighbours)

The more intense image in the reconstruction does not match the theta angles of the two images averaged to create this new point, while the shadow image's angles do match the average of the two thetas.

This is probably explained by the fact that, since (y1, y2) did not map in any particular pattern to (theta1, theta2), the neighbours of the averaged (y1, y2) point in eigenspace turned out to be points with widely deviating (theta1, theta2). Hence, the image is grossly inaccurate compared to what it should be.

 

IMAGE : Isomap reconstruction using neighbours:

This image is an accurate average of the two images whose embedding coordinates were averaged to create this point in the representation space.
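The averaging experiment can be sketched as follows, with stand-in arrays in place of the actual images and embedding coordinates; the midpoint in the representation space is reconstructed from the originals of its nearest embedding neighbours:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))   # stand-in originals (flattened images)
Y = rng.normal(size=(100, 2))    # stand-in 2-D embedding coordinates

# midpoint of two embedding points: the "averaged" point in representation space
y_mid = 0.5 * (Y[3] + Y[7])

# reconstruct by averaging the originals of the k nearest embedding neighbours
k = 5
idx = np.argsort(np.linalg.norm(Y - y_mid, axis=1))[:k]
x_mid = X[idx].mean(axis=0)
```

Whether x_mid resembles the true average of the two source images then depends on how faithfully the embedding preserves the pose angles - which is exactly where the Isomap embedding outperformed PCA above.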

 

HENCE - non-linear methods do seem to have the edge over linear methods.