Hi Python lovers,

I am planning to build a self-learning dictionary of sentiment words with their sentiment labels.

I am able to identify the sentiment words using POS tags, but I am not able to label those words as positive, negative or neutral.

For example, in the sentence "The food was not good", I have extracted "not good" as a sentiment phrase using POS tags. Now I want to label it as negative and add it to my new dictionary for future use.

My preference is to do this project/task without using any pre-defined dictionary, word bank or pre-defined sentiment analysis package.

I am seeking your views on the way to label it, either without any pre-defined dictionary or with one.

What I have tried:

Currently, I have explored word embeddings and skip-gram/n-gram models for this. I have also used a pre-defined dictionary to train supervised models such as XGBoost, KNN and a Naive Bayes classifier, and I have tried an unsupervised model (k-means) to predict the labels from the words.
I am still not able to get the results.
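One common way to put word embeddings to work here is seed-based label propagation: hand-label a few seed words, then label every other word by whether its vector is, on average, closer to the positive or to the negative seeds. The sketch below assumes made-up 3-dimensional toy vectors (in practice they would come from an embedding model you trained yourself) and a hypothetical `margin` of 0.2 that defines the neutral zone; it is an illustration, not a tuned method. One known caveat: embeddings trained purely on context often place antonyms such as "good" and "bad" close together, because they occur in similar contexts, so the seeds and margin need checking against real data.

```python
import math

# Made-up toy vectors (assumption); real ones would come from a trained
# embedding model such as a skip-gram model.
VECTORS = {
    "good":      [0.9, 0.1, 0.0],
    "excellent": [0.8, 0.2, 0.1],
    "bad":       [-0.8, 0.1, 0.0],
    "awful":     [-0.9, 0.2, 0.1],
    "table":     [0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def propagate(word, pos_seeds, neg_seeds, margin=0.2):
    """Label a word by its mean similarity to positive vs. negative seed words."""
    vec = VECTORS[word]
    pos = sum(cosine(vec, VECTORS[s]) for s in pos_seeds) / len(pos_seeds)
    neg = sum(cosine(vec, VECTORS[s]) for s in neg_seeds) / len(neg_seeds)
    if pos - neg > margin:
        return "positive"
    if neg - pos > margin:
        return "negative"
    return "neutral"  # too close to call: neither side wins by the margin

for word in ["excellent", "awful", "table"]:
    print(word, propagate(word, ["good"], ["bad"]))
```

Each newly labelled word could then be added to the seed lists, which is what makes the dictionary "self-learning", at the risk of errors compounding.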

If you know any other way, or have some input on applying any of the above models to label words as positive, negative or neutral, please suggest it.

Thanks in advance for your time and input.
Updated 24-Jul-19 5:57am

Computers are still not "human".

There is no "learning" without (initial) "training" (data) in the case of "artificial intelligence". Note the absence of "knowledge" in that phrase.

We can train ourselves and gain knowledge; computers, not so much.

Without knowing cause and effect (i.e. "stats"), "we" get to decide what is "good or bad" (whether or not it actually is) and project from there.
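A minimal sketch of "deciding what is good or bad and projecting from there": a few hand-picked seed polarities (the hypothetical `SEED` dict below) are projected onto new phrases, with "not", "never" and "no" treated as negators that flip the polarity of the words after them. The seed values and the negator list are assumptions for illustration only.

```python
# Hypothetical hand-picked seed polarities: "we" decide what is good or bad.
SEED = {"good": 1, "great": 1, "tasty": 1, "bad": -1, "poor": -1, "cold": -1}
NEGATORS = {"not", "never", "no"}

def label_phrase(phrase, lexicon):
    """Return 'positive', 'negative' or 'neutral' for a short phrase such as 'not good'."""
    score = 0
    negate = False
    for word in phrase.lower().split():
        if word in NEGATORS:
            negate = not negate  # flips the polarity of every word after it
            continue
        polarity = lexicon.get(word, 0)  # unknown words count as neutral
        score += -polarity if negate else polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# The growing dictionary of extracted phrases and their labels.
learned = {}
for phrase in ["not good", "great", "not bad", "was served"]:
    learned[phrase] = label_phrase(phrase, SEED)
print(learned)
```

Every phrase labelled this way can go into the dictionary, but the result is only as good as the seed you chose.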
 
 
Quote:
the way to label it
We can start here. Computer algorithms need enough data to perform their tasks. In machine learning, learning tasks are categorized as supervised or unsupervised. In supervised learning, you provide data with both independent and dependent variables: your input fields are the independent variables and the label is the dependent variable. You then predict the label based on the input.
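A minimal pure-Python sketch of the supervised case, assuming a tiny hypothetical labelled set (`TRAIN`): a multinomial Naive Bayes classifier with add-one smoothing learns word counts per label from the inputs and then predicts the label of new text. Real use would need far more data, or a library implementation such as scikit-learn's `MultinomialNB`.

```python
import math
from collections import Counter, defaultdict

# Tiny hypothetical labelled data: (input text, label).
TRAIN = [
    ("the food was great", "positive"),
    ("excellent service and tasty food", "positive"),
    ("the food was bad", "negative"),
    ("poor service and cold food", "negative"),
]

def train_nb(examples):
    """Count labels and words per label for a multinomial Naive Bayes model."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def predict_nb(text, model):
    """Return the label with the highest log-probability (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train_nb(TRAIN)
print(predict_nb("great food", model))
print(predict_nb("bad service", model))
```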

In unsupervised learning, you provide only the input data. Your algorithm (this is the meat of the project: you have to select an algorithm that can do this task) then has to find the similarities in the input sentences and group similar ones together. For example, Group_1 might contain the sentences with the word "not" or with words like "bad", "poor" and "disturbing", whereas Group_2 would contain the sentences with words like "amazing", "excellent", "great" and "refreshing". You get the point.
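A minimal sketch of that grouping, assuming four hypothetical short sentences, 0/1 word-presence vectors and plain k-means with k = 2. To keep it deterministic, it assumes the first two sentences serve as the initial centroids. Note that the algorithm only returns cluster ids 0 and 1; it does not know which group is "positive".

```python
# Hypothetical input sentences (no labels are given to the algorithm).
SENTENCES = [
    "amazing great food",
    "bad poor food",
    "great refreshing excellent",
    "poor disturbing bad",
]

def vectorize(sentences):
    """Turn each sentence into a 0/1 vector over the sorted vocabulary."""
    vocab = sorted({w for s in sentences for w in s.split()})
    return [[1.0 if w in set(s.split()) else 0.0 for w in vocab] for s in sentences]

def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, init_idx, max_iter=20):
    """Plain k-means; init_idx picks which vectors start as centroids."""
    centroids = [list(vectors[i]) for i in init_idx]
    assign = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_assign = [min(range(len(centroids)), key=lambda k: sq_dist(v, centroids[k]))
                      for v in vectors]
        if new_assign == assign:
            break  # converged: no vector changed cluster
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assign) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assign

clusters = kmeans(vectorize(SENTENCES), init_idx=[0, 1])
print(clusters)  # [0, 1, 0, 1]
```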

Supervised vs. Unsupervised Machine Learning

So it is up to you to choose which algorithm to apply to this problem. If you want the data labelled, you have to do the labelling yourself. A machine learning algorithm can label the data, but only for itself, and only the model will be able to understand those labels.
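One way to turn such model-internal cluster ids into human labels, sketched under the assumption that you hand-label just a few sentences yourself: give each cluster the majority label of its hand-labelled members. The `assignments` list below is a hypothetical clustering result, and `name_clusters` is an illustrative name, not a library function.

```python
from collections import Counter

def name_clusters(assignments, hand_labels):
    """Map each cluster id to the majority human label among its hand-labelled members.

    assignments: list of cluster ids, one per sentence.
    hand_labels: {sentence index: label} for the few sentences you labelled yourself.
    """
    votes = {}
    for idx, label in hand_labels.items():
        votes.setdefault(assignments[idx], Counter())[label] += 1
    return {cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()}

assignments = [0, 1, 0, 1, 0]  # hypothetical output of a clustering step
mapping = name_clusters(assignments, {0: "positive", 3: "negative"})
# Clusters with no hand-labelled member stay "unknown".
labels = [mapping.get(c, "unknown") for c in assignments]
print(labels)
```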

There are several articles available online that discuss this in great detail; see the following links:

https://towardsdatascience.com/sentiment-analysis-with-python-part-1-5ce197074184
https://medium.com/district-data-labs/modern-methods-for-sentiment-analysis-694eaf725244
GitHub - Aminoid/supervised-sentiment-analysis: Supervised Learning Techniques for Sentiment Analytics
Quote:
my preference to do this project/task is by not using any pre-defined dictionary/word bank/any pre-defined sentiment analysis package.
If that is your preference, then you should find the data sets and label them yourself. Otherwise, you can always find a large number of data sets on sites such as Kaggle.
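Once you have labelled data of your own, a dictionary can be derived from those labels alone. A minimal sketch, assuming a few hypothetical hand-labelled sentences: "not X" is merged into one token (so "not good" gets its own entry), and each token is labelled by a smoothed log ratio of how often it appears in positive versus negative sentences. The threshold of 0.5 is an arbitrary assumption.

```python
import math
from collections import Counter

# Hypothetical sentences labelled by hand (no external lexicon involved).
LABELLED = [
    ("the food was good", "positive"),
    ("the staff were good and friendly", "positive"),
    ("the food was not good", "negative"),
    ("the room was dirty", "negative"),
]

def tokens(text):
    """Split text into words, merging 'not X' into one token such as 'not_good'."""
    words = text.lower().split()
    out, i = [], 0
    while i < len(words):
        if words[i] == "not" and i + 1 < len(words):
            out.append("not_" + words[i + 1])
            i += 2
        else:
            out.append(words[i])
            i += 1
    return out

def build_lexicon(labelled, threshold=0.5):
    """Label each token by the smoothed log ratio of its positive vs. negative counts."""
    pos, neg = Counter(), Counter()
    for text, label in labelled:
        (pos if label == "positive" else neg).update(tokens(text))
    lexicon = {}
    for tok in set(pos) | set(neg):
        score = math.log((pos[tok] + 1) / (neg[tok] + 1))  # add-one smoothing
        if score > threshold:
            lexicon[tok] = "positive"
        elif score < -threshold:
            lexicon[tok] = "negative"
        else:
            lexicon[tok] = "neutral"
    return lexicon

lexicon = build_lexicon(LABELLED)
for tok in ["good", "not_good", "dirty", "the"]:
    print(tok, lexicon[tok])
```

With this little data, incidental words such as "staff" or "room" also pick up a polarity; more labelled sentences are what fixes that.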
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


