Completed Research

Twitter Sentiment Analysis

Python, Jupyter Notebook, scikit-learn, NLTK, pandas, matplotlib

About this project

Jupyter-based NLP pipeline for predicting sentiment from Twitter posts using machine learning. Applies text classification to infer positive, negative, or neutral sentiment from social media data.

Background

This project was a comparative study of classical ML classifiers applied to a well-understood NLP task. Twitter sentiment analysis is a benchmark problem with known challenges: informal language, heavy use of abbreviations, irony, and the brevity of the format all make it harder than standard document classification.

The value of the multi-classifier comparison (Naïve Bayes, SVM, Logistic Regression) was less about identifying a single winner and more about understanding the trade-offs. Naïve Bayes trains quickly and performs surprisingly well on short texts, plausibly because the independence assumption is violated less badly in a tweet than in a longer document. SVM with an appropriate kernel handles the high-dimensional feature space well but is sensitive to the feature engineering choices upstream. Logistic Regression is interpretable: you can inspect the feature weights to see what the model has learned.

The pre-processing pipeline handles the specifics of tweet data: the hashtag symbol is stripped but the tag itself is preserved as a signal (a tweet with #angry is different from one without), URLs are removed, and user mentions are normalised. Lemmatisation reduces vocabulary size, and with it feature dimensionality, without losing the root meaning of terms. The temporal trend visualisation was added to make the output useful for longitudinal analysis rather than just point-in-time classification.
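A minimal sketch of the cleaning steps described above, using only the standard library. The regular expressions, the `hashtag_` prefix, the `user_mention` placeholder and the function name `preprocess_tweet` are assumptions made for illustration; the project's actual rules may differ. Lemmatisation is left out here because NLTK's WordNet lemmatiser needs corpus data to be downloaded first.

```python
import re

# URLs are matched first, since they can themselves contain '#' or '@'.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
TOKEN_RE = re.compile(r"[a-z0-9_']+")


def preprocess_tweet(text: str) -> list[str]:
    """Clean one tweet and return a list of lowercase tokens."""
    text = URL_RE.sub(" ", text)                  # remove URLs
    text = MENTION_RE.sub(" user_mention ", text)  # normalise @handles
    # Drop the '#' but keep the tag as its own feature (assumed convention).
    text = HASHTAG_RE.sub(r" hashtag_\1 ", text)
    return TOKEN_RE.findall(text.lower())


if __name__ == "__main__":
    print(preprocess_tweet("@bob I am SO done with this https://t.co/xyz #angry"))
    # ['user_mention', 'i', 'am', 'so', 'done', 'with', 'this', 'hashtag_angry']
```

Keeping the tag under a distinct prefix lets a downstream classifier weight `hashtag_angry` separately from the plain word `angry`, which is one way to preserve the hashtag as a signal.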

Highlights

  • Tweet pre-processing: tokenisation, hashtag handling, URL stripping, lemmatisation
  • Multiple classifier comparison: Naïve Bayes, SVM, Logistic Regression
  • Sentiment polarity prediction with confidence scores
  • Visualisation of sentiment distribution and temporal trends
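The last two highlights can be sketched as follows. This is an illustration under stated assumptions, not the project's code: confidence is taken to be the highest class probability (for example from a scikit-learn classifier's `predict_proba`, which `LinearSVC` does not provide), and the column name `created_at` and the helper names `label_with_confidence` and `daily_sentiment_share` are hypothetical.

```python
import numpy as np
import pandas as pd


def label_with_confidence(proba, classes):
    """Turn an (n_samples, n_classes) probability array into a frame with
    the most likely sentiment and its probability as a confidence score."""
    proba = np.asarray(proba, dtype=float)
    best = proba.argmax(axis=1)
    return pd.DataFrame(
        {
            "sentiment": np.asarray(classes)[best],
            "confidence": proba.max(axis=1),
        }
    )


def daily_sentiment_share(df, time_col="created_at"):
    """Share of each sentiment per calendar day (each row sums to 1)."""
    days = df[time_col].dt.floor("D")
    return pd.crosstab(days, df["sentiment"], normalize="index")


if __name__ == "__main__":
    # Made-up probabilities and timestamps, for illustration only.
    classes = ["negative", "neutral", "positive"]
    proba = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.2, 0.5, 0.3], [0.6, 0.3, 0.1]]
    preds = label_with_confidence(proba, classes)
    preds["created_at"] = pd.to_datetime(
        ["2024-01-01 09:00", "2024-01-01 18:00", "2024-01-02 10:00", "2024-01-02 12:00"]
    )
    print(daily_sentiment_share(preds))
```

The resulting frame has one row per day and one column per sentiment, so calling its `.plot()` method draws the trend lines with matplotlib.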