Swatilalwani
Create Your First Project
Start adding your projects to your portfolio. Click on "Manage Projects" to get started
Sentiment Analysis on Amazon Alexa Review
Tools and techniques used
Libraries:
Pandas and NumPy for data manipulation.
Matplotlib and Seaborn for visualization.
Nltk for natural language processing, particularly stemming and stopword removal.
Scikit-learn for machine learning models and evaluation metrics.
WordCloud for visualizing frequently used words.
Dataset Overview:
The dataset contains Amazon Alexa reviews, including features like:
Rating: The star rating given by the user.
Feedback: Binary classification of reviews as positive (1) or negative (0).
Variation: The type of Alexa product (e.g., Echo, Black Dot).
Verified Reviews: Textual reviews provided by users.
Length: The length of the textual review.
The goal of the project is to predict the sentiment (positive/negative) of user reviews.
Project Workflow:
Data Exploration: The data is explored for missing values, unique values, and distributions of key features such as rating, feedback, and variation.
Rating Analysis: Most users provide ratings of 4 or 5, indicating positive feedback.
Feedback Analysis: About 91.87% of reviews are positive, while 8.13% are negative.
Variation Analysis: Various Alexa product variations are explored and analyzed based on ratings.
Review Length Analysis: The length of reviews is analyzed and related to feedback (positive or negative).
Text Preprocessing:
The text from the reviews is preprocessed by removing non-alphabet characters, converting to lowercase, and stemming the words.
Stopwords are removed to create a clean text corpus for analysis.
Machine Learning Models:
Random Forest Classifier:
Achieved high accuracy on both training and testing data.
Cross-validation and hyperparameter tuning were performed using GridSearchCV to improve performance.
XGBoost Classifier:
Another model tested for sentiment prediction with good performance.
Decision Tree Classifier:
Compared against other models but generally had lower accuracy than Random Forest and XGBoost.
Evaluation:
Confusion matrices were used to assess the performance of each model.
Accuracy scores, standard deviations, and cross-validation results were compared for optimal model selection.
Recommendations:
Model Selection: Random Forest and XGBoost provide high accuracy and could be prioritized for deployment. XGBoost slightly outperforms others in training and testing accuracy.
Conclusion:
The project successfully classifies the sentiment of Amazon Alexa reviews using machine learning algorithms like Random Forest, XGBoost, and Decision Trees.
While the data is heavily skewed towards positive feedback, the models achieve high accuracy in classifying both positive and negative sentiments.







