function bestclass = knn(train_data, labels, example, k); %kNN-- do k-nearest neighbours classification % % BESTCLASS = knn(TRAIN_DATA, LABELS, EXAMPLE, K) % % Takes TRAIN_DATA, a D x N matrix containing N training examples of dimension % D; LABELS, an N-vector of the (positive integer) classes assigned to each % column of TRAIN_DATA; EXAMPLE, a D-vector consisting of the example we % are trying to classify; and K, the number of neighbours to use in % classifying. % % Returns BESTCLASS, the predicted class of the test point. % % The K-nearest neighbour algorithm works exactly like the one-nearest % neighbour algorithm (which chooses the class containing the example that is % has minimum Euclidean distance to the test example) but instead of using only % the closest neighbour it takes the K closest points and computes the % majority vote. See http://en.wikipedia.org/wiki/KNN for more details. % % By David Warde-Farley , November 2006. % % THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR % IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES % OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. % IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, % INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT % NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, % DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY % THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT % (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF % THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. % Compute the distances to each of the N training examples by duplicating % the test vector using REPMAT, subtracting, elementwise squaring with .^2, % and sum() to get the sums of each column, then sort them. % % NOTE: calling sort with two output arguments as I've done returns a vector % of the sorted distances as the first output argument, and a vector of % indices for the original positions of the sorted elements. i.e. ind(5) % contains the index that element 5 of the sorted array originally appeared % at in the unsorted array. [val, ind] = sort(sum((repmat(example,1,size(train_data,2)) - train_data).^2)); % Create a vector to store the number of examples observed for each class % among the K neighbours. counts = zeros(max(labels),1); % Loop through the k closest neighbours for neighbour = 1:k, % Get the label of the current neighbour from the LABELS vector % and increment its count. counts(labels(ind(neighbour))) = counts(labels(ind(neighbour))) + 1; end; % Take the class with the highest count (throw away the actual count % but store the index in BESTCLASS). [junk, bestclass] = max(counts);