Distance Between Vectors and Big Data
Maribel Bueno Cachadina
Data Science and Machine Learning are trending topics, and most math departments are adding related courses to their catalogs. Linear Algebra is one of the main pillars of these fields. In this talk we will illustrate how to discuss the concept of distance between vectors in light of its applications in Machine Learning. In particular, we will distinguish between distance-based metrics and similarity-based metrics. The former include the well-known Euclidean and taxicab distances. The latter include perhaps less familiar measures such as cosine similarity and Jaccard similarity. The talk includes numerous applications. In particular, we will discuss the use of the k-nearest neighbors (k-NN) algorithm in classification problems and the use of the k-means algorithm in clustering problems. We will show how to use these algorithms to classify handwritten digits or to analyze traffic patterns, for example.
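
As a rough illustration of the measures named in the abstract, the following Python sketch computes the two distances and the two similarities, then uses the Euclidean distance in a bare-bones k-NN classifier. It is not taken from the talk: the function names, the toy data, and the choice of majority voting are assumptions made here for illustration only.

```python
import math
from collections import Counter


def euclidean(u, v):
    # Straight-line distance: square root of the sum of squared differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def taxicab(u, v):
    # Taxicab (Manhattan) distance: sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_similarity(u, v):
    # Cosine of the angle between u and v; undefined if either is the zero vector.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def jaccard_similarity(s, t):
    # Size of the intersection over size of the union; undefined if both sets are empty.
    s, t = set(s), set(t)
    return len(s & t) / len(s | t)


def knn_predict(train, query, k=3):
    # train is a list of (vector, label) pairs (a hypothetical layout chosen here).
    # Take the k training points closest to the query and return the majority label.
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    u, v = (1, 2), (4, 6)
    print(euclidean(u, v), taxicab(u, v))
    print(cosine_similarity((1, 0), (0, 1)))
    print(jaccard_similarity({1, 2, 3}, {2, 3, 4}))

    # Toy labeled points, invented for this example.
    train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
             ((5, 5), "b"), ((6, 5), "b")]
    print(knn_predict(train, (0.5, 0.5), k=3))
```

Note the difference in orientation: for the two distances, smaller values mean the vectors are closer, while for the two similarities, larger values mean they are more alike. Swapping `euclidean` for `taxicab` inside `knn_predict` is enough to see how the choice of metric can change which neighbors are selected.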