Quote:
the way to label it
We can start here. Computer algorithms need enough data to perform their tasks. In machine learning, the learning tasks are categorized as supervised or unsupervised. In supervised learning, you provide data with both independent and dependent variables: your input fields are the independent variables and the label is the dependent variable. The model learns to predict the label from the input.
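To make the supervised case concrete, here is a minimal sketch of a Naive Bayes text classifier written from scratch, with no sentiment dictionary or package. The tiny training set, the label names, and the helper names (`tokenize`, `train`, `predict`) are all made up for illustration; a real project would need far more hand-labelled sentences.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase, split on whitespace, and strip basic punctuation.
    words = (w.strip(".,!?\"'") for w in text.lower().split())
    return [w for w in words if w]

def train(examples):
    """examples: list of (text, label) pairs that you labelled by hand."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of sentences
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for w in tokenize(text):
            word_counts[label][w] += 1
            vocab.add(w)
    return word_counts, label_counts, vocab

def predict(model, text):
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior plus log likelihood with add-one (Laplace) smoothing.
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical hand-labelled data: input text plus the label to predict.
training_data = [
    ("the food was amazing", "positive"),
    ("excellent and refreshing service", "positive"),
    ("a great experience", "positive"),
    ("the room was bad", "negative"),
    ("poor and disturbing service", "negative"),
    ("not a good experience", "negative"),
]
model = train(training_data)
print(predict(model, "the service was excellent"))
print(predict(model, "a poor room"))
```

The point is only that the labels come from you; the algorithm learns the mapping from input words to those labels.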
Now, in unsupervised learning, you provide only the input data. Your algorithm (the meat of the project is here: you have to select an algorithm that can do this task!) then has to find the similarities among the input sentences and group similar ones together. For example, Group_1 might hold the sentences that contain the word "not" or words like "bad", "poor", "disturbing", whereas Group_2 would hold the sentences that contain words like "amazing", "excellent", "great", "refreshing"... You get the point.
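As a rough sketch of the unsupervised case, the snippet below groups sentences into two clusters with a hand-written k-means (k = 2) over bag-of-words counts. This is just one possible algorithm, not a recommendation; the sentences and function names are invented for illustration. Note that it groups by shared words, so whether the groups line up with sentiment depends entirely on your data.

```python
from collections import Counter

def vectorize(text):
    # Bag of words: word -> number of occurrences in the sentence.
    return Counter(text.lower().split())

def sq_dist(a, b):
    # Squared Euclidean distance between two sparse word-count vectors.
    keys = set(a) | set(b)
    return sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in keys)

def mean(vectors):
    # Component-wise average of a non-empty list of sparse vectors.
    total = Counter()
    for v in vectors:
        total.update(v)
    return {k: c / len(vectors) for k, c in total.items()}

def two_means(texts, max_iter=20):
    vecs = [vectorize(t) for t in texts]
    # Deterministic start: the first sentence and the one farthest from it.
    far = max(range(len(vecs)), key=lambda i: sq_dist(vecs[0], vecs[i]))
    centroids = [dict(vecs[0]), dict(vecs[far])]
    labels = None
    for _ in range(max_iter):
        # Assign every sentence to its nearest centroid.
        new_labels = [min((0, 1), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vecs]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in (0, 1):
            members = [v for v, l in zip(vecs, labels) if l == c]
            if members:
                centroids[c] = mean(members)
    return labels

sentences = [
    "amazing and excellent",
    "excellent and refreshing",
    "great and amazing",
    "bad and poor",
    "poor and disturbing",
    "disturbing and bad",
]
labels = two_means(sentences)
for group, sentence in zip(labels, sentences):
    print(f"Group_{group + 1}: {sentence}")
```

The group numbers carry no meaning by themselves; you still have to look at each group and decide what it represents.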
Supervised vs. Unsupervised Machine Learning
So it is up to you to choose which algorithm to apply to this problem. If you want labelled data, the labelling has to be done by you. A machine learning algorithm can label the data, but only for itself, and only the model will be able to understand those labels.
Several articles available online discuss this in great detail; see the following links:
https://towardsdatascience.com/sentiment-analysis-with-python-part-1-5ce197074184
https://medium.com/district-data-labs/modern-methods-for-sentiment-analysis-694eaf725244
GitHub - Aminoid/supervised-sentiment-analysis: Supervised Learning Techniques for Sentiment Analytics
Quote:
my preference to do this project/task is by not using any pre-defined dictionary/word bank/any pre-defined sentiment analysis package.
If that is your preference, then you should find data sets and label them yourself. Otherwise, you can always find plenty of data sets on sites such as Kaggle.
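If you do label the data yourself, one simple option (an assumption on my part, not a requirement) is a two-column CSV with the sentence and your label. The sketch below reads such a file with Python's standard `csv` module; the column names `text` and `label` and the sample rows are hypothetical, and an in-memory string stands in for the real file.

```python
import csv
import io

# Stand-in for a hand-labelled file; in practice you would use open("your_file.csv").
raw = io.StringIO(
    "text,label\n"
    "the food was amazing,positive\n"
    "the room was bad,negative\n"
)

# Each row becomes a (text, label) pair ready to feed to a training routine.
rows = [(r["text"], r["label"]) for r in csv.DictReader(raw)]
print(rows)
```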