K Means Clustering

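  • algorithm that clusters data points based on their similarity
  • number of clusters (k) has to be specified in advance
  • algorithm randomly initializes k centroids (e.g. three for k = 3) and assigns each observation to its nearest centroid
  • calculates new centroids as the mean of each cluster and reassigns data points, repeating until the assignments stop changing
  • variables have to be standardized, since otherwise variables on larger scales dominate the distance calculation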
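
The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names (standardize, k_means), the seed and max_iter parameters, and the sample data are all assumptions made for the example, and starting centroids are chosen by sampling k observations, which is only one of several common initialization strategies.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(rows, k, max_iter=100, seed=0):
    """Return (labels, centroids) for rows, a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(rows, k)  # assumption: start from k random observations
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(row, centroids[j]))
            for row in rows
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its assigned points.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(mean(col) for col in zip(*members))
    return labels, centroids


# Hypothetical data: two variables on very different scales.
data = [(1.0, 100.0), (1.2, 110.0), (0.8, 90.0),
        (5.0, 500.0), (5.2, 510.0), (4.8, 490.0),
        (9.0, 900.0), (9.2, 910.0), (8.8, 890.0)]
labels, centroids = k_means(standardize(data), k=3)
print(labels)
```

Because the starting centroids are random, different seeds can produce different clusterings; in practice the algorithm is often run several times and the best result kept.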


Text mining

Text mining is an area of computer science that has grown in recent years as unstructured data has become far more available and relevant.

It can be used to make large quantities of unstructured data accessible and useful. Typical applications include:

  • Anomaly detection
  • Contextual advertising
  • Sentiment analysis
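
As a rough illustration of how unstructured text can be turned into something usable, the sketch below scores sentiment by counting words from two small word lists. The word lists, the function names (tokenize, sentiment_score), and the scoring rule are hypothetical and chosen only for this example; practical sentiment analysis typically relies on much larger lexicons or trained models.

```python
import re
from collections import Counter

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Count positive words minus negative words; above 0 leans positive."""
    counts = Counter(tokenize(text))
    positive_hits = sum(counts[word] for word in POSITIVE)
    negative_hits = sum(counts[word] for word in NEGATIVE)
    return positive_hits - negative_hits


for review in ["I love this great phone", "Terrible battery, bad screen"]:
    print(review, "->", sentiment_score(review))
```

A simple count like this ignores negation ("not good") and context, which is one reason real systems use more elaborate methods.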