I have to implement a k-Nearest Neighbor classifier to predict the sentiment of a large number of movie reviews, which are provided in the file test.dat.
Positive reviews are represented by a review rating of +1 and negative reviews by a review rating of -1. In test.dat I am only given the reviews; that data will be used to evaluate my predictions.
The training data consists of 25000 reviews, provided in the file train.dat. Each row begins with the sentiment score, followed by the text associated with that rating. Note that the text may contain HTML artifacts and other text or numbers not associated with the review text.
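To make the row format concrete, here is a rough sketch of how one row might be read and cleaned. It assumes the score is the first whitespace-separated token on the line, and that dropping HTML tags, digits, and punctuation is acceptable; `parse_row` and `clean_text` are names made up for illustration.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")   # matches HTML tags such as <br />
WORD_RE = re.compile(r"[a-z']+")  # keeps lowercase words, drops digits and punctuation


def clean_text(text):
    """Return a list of lowercase word tokens with HTML artifacts removed."""
    text = html.unescape(text)    # turn entities like &amp; back into characters
    text = TAG_RE.sub(" ", text)  # replace tags with a space
    return WORD_RE.findall(text.lower())


def parse_row(line):
    """Split one row into (label, tokens).

    Assumption: the label (+1 or -1) is the first whitespace-separated
    token and everything after it is the review text.
    """
    parts = line.strip().split(None, 1)
    label = int(parts[0])
    text = parts[1] if len(parts) > 1 else ""
    return label, clean_text(text)


# Made-up example row, not taken from train.dat
label, tokens = parse_row("+1\tGreat movie!<br /><br />Loved it 10/10")
print(label, tokens)
```

For test.dat, which has no score column, `clean_text` could be applied to each line directly.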
I am completely lost on how to compare text reviews and turn them into something numerical, so any suggestions are appreciated. I am a beginner in data science and really need guidance. I know how kNN works; my current thought is to associate common words with each review. Is that a reasonable direction?
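To show what I mean by associating words with a review, here is a minimal sketch of one possible approach: represent each review as a bag of word counts, compare two reviews with cosine similarity, and let the k most similar training reviews vote. This is only an illustration under my own assumptions, not a known-good solution; the function names and the toy reviews are made up.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty review is similar to nothing
    return dot / (norm_a * norm_b)


def knn_predict(train, query_tokens, k=3):
    """Predict +1 or -1 by majority vote of the k most similar reviews.

    train is a list of (Counter, label) pairs with labels +1 or -1.
    With an even k a tied vote falls back to +1, so an odd k is safer.
    """
    query = Counter(query_tokens)
    ranked = sorted(
        train,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    vote = sum(label for _, label in ranked[:k])
    return 1 if vote >= 0 else -1


# Made-up toy training set, only to show the shapes involved
toy_train = [
    (Counter("great fun great acting".split()), 1),
    (Counter("loved it great".split()), 1),
    (Counter("boring slow awful".split()), -1),
    (Counter("awful waste boring".split()), -1),
]

print(knn_predict(toy_train, "great great fun".split(), k=3))
print(knn_predict(toy_train, "boring awful".split(), k=3))
```

Raw counts let very frequent words such as "the" dominate the similarity, so removing stop words or weighting the counts with TF-IDF is probably worth trying. Comparing against all 25000 training reviews for every test review will also be slow in pure Python.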