Sentiment Analysis

Machine learning offers great potential for many fields, and natural language processing (NLP) is no exception. NLP is an area of artificial intelligence in which machine learning has achieved strong results on genuinely challenging tasks. Applying machine learning to language is a relatively recent development, but it is progressing quickly, and most people with even a budget smartphone already use the results every day.

Sentiment analysis

Sentiment analysis is an area of NLP in which we try to determine whether the writer of a text has positive or negative feelings towards a topic. It is used in social media, marketing, and finance applications. For example, a friend on Facebook may send you a link to a product they are promoting. Before clicking that link, you want to know what other people think about the product; otherwise it may be of no use to you. Or suppose you are sending money through Western Union or PayPal: you want to know how transfers went for other senders, so that you get an idea of the delivery time and service quality. Examples like these show why sentiment analysis matters.

Much in the way your brain remembers the descriptive words you encounter over your lifetime and their relative "sentiment weight," a basic sentiment analysis system draws on a sentiment library to understand the sentiment-bearing phrases it meets. Words that intensify, relax, or negate the sentiment expressed by a concept can affect its score. Alternatively, a text can be given separate positive and negative sentiment strength scores if the goal is to capture the sentiments present in the text rather than its overall polarity and strength.
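To make the idea of a sentiment library concrete, here is a minimal sketch of a lexicon-based scorer. The word lists, the weights, and the two-word look-back window are all assumptions chosen for illustration; real sentiment lexicons are far larger and more carefully tuned.

```python
# Hypothetical toy lexicon: word -> sentiment weight (illustrative values only).
LEXICON = {"good": 1.0, "great": 2.0, "excellent": 3.0,
           "bad": -1.0, "poor": -2.0, "terrible": -3.0}
INTENSIFIERS = {"very": 1.5, "really": 1.5, "extremely": 2.0}
DIMINISHERS = {"slightly": 0.5, "somewhat": 0.5}
NEGATORS = {"not", "never", "no"}


def score_text(text):
    """Return (positive_strength, negative_strength) for a text."""
    tokens = text.lower().replace(".", " ").replace(",", " ").split()
    positive, negative = 0.0, 0.0
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        weight = LEXICON[token]
        # Look at up to two preceding tokens for modifiers (an assumed window).
        for prev in tokens[max(0, i - 2):i]:
            if prev in INTENSIFIERS:
                weight *= INTENSIFIERS[prev]   # intensify the sentiment
            elif prev in DIMINISHERS:
                weight *= DIMINISHERS[prev]    # relax the sentiment
            elif prev in NEGATORS:
                weight *= -1                   # negate the sentiment
        if weight > 0:
            positive += weight
        else:
            negative += -weight
    return positive, negative


print(score_text("The delivery was not very good but the price was great"))
```

With this toy lexicon, the negated and intensified "good" contributes to the negative score while "great" contributes to the positive one, so the text receives two separate strength scores instead of a single polarity.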

Sentiment analysis is a hot topic in NLP. Are you interested in learning more?

Sentiment analysis can be used to understand people's feelings and opinions in social media, marketing, finance, and many other applications. The tone of voice of a piece of writing can also be analyzed with it.

You will learn about different types of sentiment analysis, such as polarity detection (positive or negative), subjectivity detection (objective or subjective), and emotion detection (anger, disgust, etc.). Then we'll go through how to build our own sentiment classifier for English texts using Python libraries like scikit-learn and black. Finally, we'll see examples of these models applied successfully to real-world problems. By the end of this course, you will know how to use machine learning techniques to solve text classification tasks, with practical examples. Let's get started!

Formulating the problem statement of sentiment analysis

Assume that I follow some influential people on Twitter. If I can monitor their tweets, I will learn which products or services are most popular among them. I therefore need to develop an algorithm that works as follows:

1. It should first identify influencers of any particular topic from Twitter.

2. It should collect the tweets posted by those influencers over a period of time (e.g., daily).

3. For each tweet, it should identify the positive or negative emotion expressed by the writer towards the topic, product, or service mentioned in that tweet.

4. If most of these influencers have a favourable opinion of the product, I can promote it on my social media profile. If most of them have a negative opinion of it, I should avoid promoting it on my profile.
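Step 4 above can be sketched as a simple majority vote. This is only an illustration: the function name `should_promote`, the input layout, and the rule that each influencer counts once are assumptions, and the per-tweet labels are presumed to come from whatever classifier handles step 3.

```python
from collections import Counter


def should_promote(tweet_labels):
    """Decide whether to promote a product.

    tweet_labels maps each influencer to the list of per-tweet labels
    ("positive" or "negative") that a sentiment classifier produced.
    """
    votes = Counter()
    for influencer, labels in tweet_labels.items():
        counts = Counter(labels)
        # Each influencer casts one vote based on the majority of their tweets.
        if counts["positive"] > counts["negative"]:
            votes["favourable"] += 1
        elif counts["negative"] > counts["positive"]:
            votes["unfavourable"] += 1
        # Influencers with a tie (or no tweets) do not vote.
    return votes["favourable"] > votes["unfavourable"]


# Hypothetical labels for three influencers.
example = {
    "influencer_a": ["positive", "positive", "negative"],
    "influencer_b": ["positive"],
    "influencer_c": ["negative", "negative"],
}
print(should_promote(example))
```

In this made-up example two of the three influencers lean positive, so the function recommends promoting the product.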

Beyond the difficulty of sentiment analysis itself, applying it to reviews or feedback also faces the challenge of spam and biased reviews. One direction of work therefore focuses on evaluating the helpfulness of each review. By analyzing tweets, online reviews, and news articles at scale, business analysts gain valuable insights into how customers feel about their brands, products, and services.

I will describe these steps in detail in upcoming blog posts. If you are interested, please leave your questions and suggestions in the comment section below, and check this space for more updates about the NLP techniques used in sentiment analysis.

Naive Bayes classification for sentiment analysis

Naive Bayes classification is one of the most commonly used machine learning techniques for solving NLP problems. It is a supervised learning algorithm that works well in many practical applications, such as text classification, document categorization, and spam detection. So far, in my previous blog posts (see here and here), we have worked with emails and categories like Spam/Ham, Junk/Not Junk, and Phishing/Not Phishing.
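As a minimal sketch of how a Naive Bayes sentiment classifier might be set up with scikit-learn, the example below trains `MultinomialNB` on bag-of-words counts. The six training sentences and their labels are invented for illustration and are not data from this post; a real model needs far more examples.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data (illustrative only).
train_texts = [
    "I loved this movie, it was great",
    "great acting and a wonderful story",
    "what a wonderful, enjoyable film",
    "I hated this movie, it was terrible",
    "terrible acting and a boring story",
    "what a boring, awful film",
]
train_labels = ["positive", "positive", "positive",
                "negative", "negative", "negative"]

# CountVectorizer turns each text into word counts;
# MultinomialNB learns per-class word probabilities with Laplace smoothing.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

new_texts = ["a great and wonderful film", "a boring and terrible film"]
predictions = model.predict(new_texts)
for text, label in zip(new_texts, predictions):
    print(f"{label}: {text}")
```

The pipeline keeps vectorization and classification together, so the same word-count vocabulary is applied at training and prediction time.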

Methods

Machine learning techniques fall into three broad families, all of which have been applied to NLP problems: supervised learning, unsupervised learning, and reinforcement learning. We have already seen supervised and unsupervised methods in many previous posts starting from here. In this blog post, we focus on supervised learning, since a sentiment classifier is trained on labelled examples. Note that algorithms such as decision trees and random forests are not a fourth family: they are supervised methods, and they can be applied to text once it has been converted into numeric features, although simpler models such as Naive Bayes and linear classifiers are often preferred for sparse, high-dimensional text data.

Prerequisites

This tutorial assumes that you have some knowledge of machine learning algorithms and how they work. If you do not, please refer to my previous posts first. It is worth understanding this material well, because the entire sentiment classifier project is built around it.

Sentiment analysis - A practitioner's perspective

Sentiment analysis is the technique of classifying a textual input (e.g., a sentence) as either "positive" or "negative." For example, it identifies people's opinions in different kinds of text, such as product reviews and social media posts, and categorizes them into positive and negative sentiments. Although its roots can be traced back to the early 1990s, it did not attract much interest until the mid-2000s, when systems such as OpinionFinder began identifying opinions and sentiment in text automatically.

Soon after that, many other researchers attempted to build similar systems, but many struggled because suitable training data was scarce: texts such as tweets or emails were not annotated with sentiment labels, and their n-gram features (unigrams, bigrams, etc.) were not well defined. To address this issue and achieve a reasonably accurate sentiment analysis system, researchers in 2009 proposed training the Maximum Entropy Model (see here) on annotated tweets and emails with well-defined n-gram features. Essentially, the Maximum Entropy Model predicts the sentiment label of a document from weighted features of the words it contains, choosing the least biased probability distribution that is consistent with the training data.

Analyzing the sentiment of a text

Let's take the following text from a movie review of "Star Wars VII" by Peter Travers for Rolling Stone to understand sentiment analysis.

"The force is strong with this one. Harrison Ford's Han Solo has lost none of his lustre, and the blend of the old and new works brilliantly when so much depends on it. The energy picks up in time to deliver an emotional payoff at the end that feels earned without being predictable. Yet there are also undercooked characters in need of more depth, something that can be said about just about every character in Star Wars except for Chewbacca, who always was excellent."

The main factors in scoring such a text are its negative words, its positive words, and the opinions mined from it. The same applies to customer feedback. Sentiment analysis (a form of text analytics) measures the customer's attitude towards the aspects of a service or product they describe in text, and when customers like a product very much (their new bed, say), the sentiment score should reflect that intensity. With the recent advances in deep learning, the ability of algorithms to analyze text has improved considerably.

Problems of sentiment analysis

The most common problem associated with sentiment analysis is assigning incorrect polarity labels. This tends to happen when there is too little annotated data, such as labelled tweets and emails, from which to learn reliable n-gram features (unigrams, bigrams, etc.). A system trained on such data has to guess from the surrounding words alone, which leads to incorrect results. On top of that, a sentiment analysis system should be able to handle negation (e.g., "not good"), sarcasm (e.g., "That was great" said of a bad experience), and slang (e.g., "the bomb"). This is why some sentiment analysis algorithms look beyond unigrams (i.e., single words) to understand the sentiment of a sentence as a whole.
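A short sketch shows why looking beyond unigrams helps with negation. The helper `ngrams` is a hypothetical name, and whitespace tokenization is an assumed simplification.

```python
def ngrams(text, n):
    """Return the list of n-grams (as space-joined strings) in a text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "the food was not good"
unigrams = ngrams(sentence, 1)
bigrams = ngrams(sentence, 2)

print(unigrams)  # ['the', 'food', 'was', 'not', 'good']
print(bigrams)   # ['the food', 'food was', 'was not', 'not good']
```

As unigrams, "not" and "good" are separate features, and "good" alone looks positive. As a bigram, "not good" becomes a single feature that a classifier can learn to associate with negative sentiment.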

Supervised Learning

Supervised learning is all about training a model on labelled data and testing it on new, unseen data with the help of metrics such as accuracy, precision, and recall. This project uses supervised learning and aims for accuracy above 80%, which is often treated as a benchmark for sentiment analysis models. You also need labelled examples of each class. In our case there are only two classes, positive and negative, so plan on at least 20 positive and 20 negative reviews for training.
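The metrics mentioned above can be computed directly from the true and predicted labels. The following is a small sketch; the function name and the sample labels are made up for illustration.

```python
def classification_metrics(y_true, y_pred, positive="positive"):
    """Return (accuracy, precision, recall) for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical labels for five test reviews.
y_true = ["positive", "positive", "positive", "negative", "negative"]
y_pred = ["positive", "positive", "negative", "positive", "negative"]

accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Here three of five predictions are correct (accuracy 0.60), and two of the three predicted positives and two of the three actual positives are right (precision and recall both about 0.67). Accuracy alone can hide such differences when the classes are unbalanced.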

Maximum Entropy Modeling

The Maximum Entropy Model, also known as multinomial logistic regression, has been used in NLP since the 1990s. It is a supervised machine learning algorithm: it considers the words (features) seen in a labelled corpus and learns a weight for each one, choosing the most uniform probability distribution over the classes that is still consistent with the training data. This contrasts with systems built on the Vector Space Model, which represent each document as a tf-idf vector, so that two similar documents have similar vector representations.

Examples

A common example of the Maximum Entropy Model in use is spam detection, where the two classes are "spam" and "ham." You have a corpus of emails labelled as either ham or spam. After training a Maximum Entropy Model on this data, you can use it to predict which category future emails belong to. Because there are only two classes, the model reduces to binary logistic regression.
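The spam example can be sketched with scikit-learn. A maximum entropy classifier is commonly implemented as logistic regression, so this sketch assumes `LogisticRegression` as a stand-in; the six emails are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labelled corpus (illustrative only).
emails = [
    "win a free prize now",
    "free money click now",
    "claim your free prize today",
    "meeting agenda for monday",
    "lunch with the team today",
    "please review the project report",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Word counts are the features; logistic regression learns one weight per word.
maxent = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
maxent.fit(emails, labels)

new_emails = ["free prize, click now", "project meeting on monday"]
predictions = maxent.predict(new_emails)
probabilities = maxent.predict_proba(new_emails)  # columns follow maxent.classes_

for email, label in zip(new_emails, predictions):
    print(f"{label}: {email}")
```

Unlike a hard rule list, the model also returns a probability for each class, which is useful when borderline emails should be reviewed instead of discarded.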

NMF

For sentiment analysis, we took a corpus containing 1,000 tweets for training and 50 positive and negative reviews for testing. Non-negative matrix factorization (NMF) approximates a non-negative document-feature matrix V as the product of two smaller non-negative matrices, W and H. In our case, the features included bigrams as well as unigrams, since we weren't interested in just one word at a time but in two consecutive words that were used more than once to express opinions about products and services on Twitter and Amazon respectively. Once we did that, we found that eight unique bigram features were enough to factorize our initial dataset into these two matrices, which resulted in a high accuracy of 90%.
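A minimal sketch of NMF on unigram and bigram features, using scikit-learn, looks like this. The documents, the choice of two components, and the tf-idf weighting are assumptions for illustration; the sketch does not reproduce the dataset or the accuracy reported above.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus (illustrative only).
docs = [
    "really love this phone",
    "love this phone so much",
    "great battery really love it",
    "do not buy this phone",
    "do not buy terrible battery",
    "terrible screen do not buy",
]

# ngram_range=(1, 2) builds features from single words and consecutive word pairs.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
V = vectorizer.fit_transform(docs)          # documents x features, non-negative

nmf = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
W = nmf.fit_transform(V)                    # documents x components
H = nmf.components_                         # components x features

print(V.shape, W.shape, H.shape)
```

Each row of W says how strongly a document loads on each component, and each row of H says which unigrams and bigrams define that component. The components are found without labels, so whether they line up with positive and negative sentiment has to be checked against labelled data.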

Importance of sentiment analysis

Sentiment analysis is becoming more and more important as social media platforms expand. Tweets, for example, are short, public opinions that people share about their lives, interests, and so on. They give an unfiltered look into customers' thoughts, which can help companies build trust with their customers, or damage that trust if companies do nothing to address the concerns raised.

Collaborative Filtering

To solve this problem, we used unsupervised collaborative filtering, in which the purpose of the algorithm is to find "similar" users based on their behaviour over time. In our case, each user was represented by counts of the positive and negative sentiment words in their reviews. After applying Principal Component Analysis (PCA), a dimensionality reduction technique, we found that the first component alone was sufficient to describe almost half of all user behaviour.
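Finding "similar" users can be sketched with cosine similarity over their sentiment-word counts. The user names, the four sentiment words, and the counts below are hypothetical, and cosine similarity is one common choice of measure, not necessarily the one used in this project.

```python
import math

# Hypothetical counts per user for the sentiment words
# ["good", "great", "bad", "awful"] (illustrative only).
users = {
    "alice": [5, 7, 1, 0],
    "bob": [4, 6, 0, 2],
    "carol": [0, 1, 6, 3],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(name):
    """Return the other user whose vector is closest to the given user's."""
    others = (other for other in users if other != name)
    return max(others, key=lambda other: cosine_similarity(users[name], users[other]))


print(most_similar("alice"))
```

Alice and Bob both use mostly positive words, so they come out as each other's nearest neighbour, while Carol's mostly negative vocabulary sets her apart.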

Principal Component Analysis

PCA is a linear transformation technique. Given n points in m-dimensional space, it transforms them into n points in a new coordinate system defined by k < m new orthogonal axes. Put differently, PCA finds the line (or hyperplane) that best approximates all the data points and projects them onto it, so that we can visualize the data and find patterns more easily. In practice PCA is usually computed with a singular value decomposition instead of an explicit eigendecomposition of the covariance matrix, which works well on large datasets, and it often provides meaningful results even with just one or two components. Great! So what's the problem?

The first component of our dataset was heavily influenced by users' counts of positive sentiment words, which made the results hard to interpret, since very strong negative statements are uncommon on social media. For example, someone might say, "I'm not satisfied," but they're unlikely to say, "I'm very dissatisfied." If we look at the second component, however, this issue is resolved: both positive and negative sentiments are represented, so there is no bias towards either sentiment type.
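The PCA step described above can be sketched with NumPy's singular value decomposition. The user-by-word count matrix is invented for illustration, so the explained-variance figures it prints will not match the ones reported for the real dataset.

```python
import numpy as np


def pca(X, k):
    """Project the rows of X onto the first k principal components.

    Returns (projected, components, explained_ratio), where explained_ratio[i]
    is the share of total variance captured by component i.
    """
    X_centered = X - X.mean(axis=0)                 # PCA requires centred data
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]                             # orthonormal axes, one per row
    projected = X_centered @ components.T           # n points in k dimensions
    explained_ratio = (S ** 2) / np.sum(S ** 2)
    return projected, components, explained_ratio[:k]


# Hypothetical counts: rows are users, columns are sentiment words
# ["good", "great", "bad", "awful"].
X = np.array([
    [5, 7, 1, 0],
    [4, 6, 0, 2],
    [0, 1, 6, 3],
    [1, 0, 5, 4],
    [3, 3, 2, 1],
], dtype=float)

projected, components, ratio = pca(X, k=2)
print("explained variance ratio:", np.round(ratio, 3))
print("first component loadings:", np.round(components[0], 3))
```

Inspecting the loadings of each component, as in the last line, is how one would check whether a component is dominated by positive-word counts or balances both sentiment types.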

© Copyright 2024 Geolance. All rights reserved.