EXECUTIVE SUMMARY

  1. A higher positive sentiment rate tends to be associated with a lower Nasdaq close price.

  2. COVID and Racial Equality related posts tend to have a larger proportion of negative sentiment, while Economics and Gender Equality related posts are likely to be more positive.

Summary table by topic
Topic             Positive  Neutral  Negative  Pos-Neg Ratio
Covid                12548     1813     20020           0.63
Election             20562     2516     21120           0.97
Economics            62773     4979     52316           1.20
Finance              13928     1517     13579           1.03
Gender Equality      43667     2021     14229           3.07
Racial Equality      98744    15071    157961           0.63

Data Analysis



Mainly Short Posts

After processing and splitting the text, the length of each post can be computed. Because the subreddit, #PoliticalCompassMemes, is about memes and pictures, most of the posts (over 60%) are very short, with fewer than 20 words.
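The length check can be sketched in plain Python as follows. The sample posts, the whitespace-based tokenization, and the helper names `word_count` and `short_post_share` are assumptions for illustration; the report does not show the tokenizer actually used.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated tokens in a post."""
    return len(text.split())


def short_post_share(posts, max_words=20):
    """Return the fraction of posts with fewer than `max_words` words."""
    if not posts:
        return 0.0
    short = sum(1 for p in posts if word_count(p) < max_words)
    return short / len(posts)


# Hypothetical sample posts, not taken from the dataset.
sample_posts = [
    "based",
    "flair up",
    "the government should stay out of this",
    " ".join(["word"] * 25),  # a 25-word post
]

share = short_post_share(sample_posts)
print(f"{share:.0%} of sample posts have fewer than 20 words")
```

Run against the full set of posts, the same ratio would give the "over 60%" figure quoted above, provided the tokenization matches the one used in the analysis.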

Word Frequency Ranking
Ranking  Word          Ranking  Word
1        people        11       auth
2        make          12       yeah
3        leave         13       libleft
4        base          14       flair
5        thing         15       state
6        good          16       work
7        time          17       libright
8        government    18       man
9        lib           19       country
10       bad           20       authright

People, Government, State, and Country

Before applying CountVectorizer and TF-IDF, the text was cleaned by tokenizing, removing stop words, and lemmatizing. Consistent with the results, People, Government, State, and Country are among the most important words in the posts.
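The cleaning and weighting steps can be illustrated with a small standard-library sketch. This is a plain-Python stand-in, not the CountVectorizer or TF-IDF implementation used in the analysis: the stop-word set, the lemma map, the sample posts, and the smoothed IDF formula are all assumptions chosen to keep the example self-contained.

```python
import math
import re
from collections import Counter

# Tiny illustrative resources (assumptions, not the real stop-word list or lemmatizer).
STOP_WORDS = {"the", "a", "is", "are", "of", "to", "and", "in", "for"}
LEMMAS = {"governments": "government", "states": "state", "countries": "country"}


def clean(text):
    """Tokenize, drop stop words, and map each token to a base form."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]


def term_counts(docs):
    """Corpus-wide term counts, i.e. the column sums of a count matrix."""
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    return counts


def tf_idf(docs):
    """Per-document TF-IDF using a smoothed IDF: ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return scores


# Hypothetical posts, not taken from the dataset.
posts = [
    "The people are the state",
    "Governments of the people",
    "Countries and governments",
]
docs = [clean(p) for p in posts]
counts = term_counts(docs)
weights = tf_idf(docs)
print(counts.most_common(3))
```

Raw counts favor words that appear everywhere, while TF-IDF down-weights terms shared by many posts, so comparing both rankings is a reasonable way to check that the top words are not just common filler.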

Higher Positive Sentiment Rate, Lower Nasdaq Close Price

To examine the relationship between the daily sentiment of the Reddit posts and the Nasdaq close price, a contour plot was drawn. The plot shows a clear negative relationship between the positive sentiment rate and the Nasdaq close price.
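One way to put a number on the relationship seen in the contour plot is a Pearson correlation between the two daily series. The report itself relies on the plot only, so the sketch below is an optional complement rather than the method used; the function name `pearson` and the two short series are placeholders, not the report's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Placeholder values for illustration only; they are NOT the report's data.
daily_pos_rate = [0.62, 0.61, 0.60, 0.59, 0.58]
nasdaq_close = [100.0, 105.0, 110.0, 118.0, 125.0]

r = pearson(daily_pos_rate, nasdaq_close)
print(f"Pearson r = {r:.3f}")
```

A coefficient near -1 would be consistent with the negative relationship described above, although a correlation between two trending time series does not by itself imply that one drives the other.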

Centrist Posts Tend to be More Positive

For most author flair types, the positive sentiment rate is around 0.575, while centrists' posts tend to have higher positive sentiment rates, around 0.625 to 0.65.
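The per-flair rate is a grouped proportion: positive posts divided by all posts for each flair. A minimal sketch, assuming each record is a `(flair, sentiment)` pair with sentiment labels `"positive"`, `"neutral"`, and `"negative"`; the records and the function name `positive_rate_by_flair` are hypothetical.

```python
from collections import defaultdict


def positive_rate_by_flair(records):
    """Map each flair to the share of its posts labelled 'positive'."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for flair, sentiment in records:
        totals[flair] += 1
        if sentiment == "positive":
            positives[flair] += 1
    return {flair: positives[flair] / totals[flair] for flair in totals}


# Hypothetical (flair, sentiment) records, not taken from the dataset.
records = [
    ("centrist", "positive"),
    ("centrist", "positive"),
    ("centrist", "negative"),
    ("libleft", "positive"),
    ("libleft", "negative"),
    ("authright", "negative"),
]

rates = positive_rate_by_flair(records)
for flair, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{flair:10s} {rate:.3f}")
```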

Positive Posts Usually Gain a Higher Score

According to the boxen plot, a positive post is more likely to receive a higher post score than a negative or neutral post.
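A boxen plot compares score distributions through their quantiles, so the same comparison can be approximated numerically with per-group quartiles. The sketch below assumes scores are grouped by sentiment label; the score values and the function name `score_summary` are hypothetical and stand in for the real post scores.

```python
from statistics import median, quantiles


def score_summary(scores_by_sentiment):
    """Return first quartile, median, and third quartile of scores per sentiment."""
    summary = {}
    for sentiment, scores in scores_by_sentiment.items():
        q1, _, q3 = quantiles(scores, n=4)  # three cut points for four groups
        summary[sentiment] = {"q1": q1, "median": median(scores), "q3": q3}
    return summary


# Hypothetical post scores, not taken from the dataset.
scores_by_sentiment = {
    "positive": [1, 5, 12, 30],
    "neutral": [1, 1, 3, 5],
    "negative": [1, 2, 4, 9],
}

summary = score_summary(scores_by_sentiment)
for sentiment, stats in summary.items():
    print(sentiment, stats)
```

Comparing medians and upper quartiles is more robust than comparing means here, since post scores are typically dominated by a few very high values.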

Summary table by month
Month     19-07   19-08   19-09   19-10   19-11   19-12
Neg-Rate  0.3301  0.3478  0.3493  0.3358  0.3299  0.3248
Pos-Rate  0.6211  0.6031  0.6005  0.6107  0.6207  0.6237

Month     20-01   20-02   20-03   20-04   20-05   20-06
Neg-Rate  0.3239  0.3210  0.3448  0.3459  0.3511  0.3508
Pos-Rate  0.6258  0.6285  0.6043  0.6039  0.5986  0.5992

Month     20-07   20-08   20-09   20-10   20-11   20-12
Neg-Rate  0.3516  0.3587  0.3579  0.3657  0.3651  0.3644
Pos-Rate  0.6003  0.5918  0.5928  0.5872  0.5864  0.5883

Month     21-01   21-02   21-03   21-04   21-05   21-06
Neg-Rate  0.3781  0.3729  0.3670  0.3797  0.3868  0.3679
Pos-Rate  0.5736  0.5793  0.5853  0.5731  0.5674  0.5846

COVID Made a More Negative Atmosphere

After the COVID outbreak, the monthly proportion of negative posts rose from about 0.32 to about 0.37, suggesting that the pandemic made the subreddit's posts more negative.
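The shift can be checked directly against the monthly table by averaging the negative rate before and after the outbreak. The values below are copied from the summary table by month; the cutoff of March 2020 (`"20-03"`) is an assumption about where the outbreak period starts, chosen because the rate jumps in that month.

```python
# Monthly negative-post rates, copied from the summary table by month.
NEG_RATE = {
    "19-07": 0.3301, "19-08": 0.3478, "19-09": 0.3493, "19-10": 0.3358,
    "19-11": 0.3299, "19-12": 0.3248, "20-01": 0.3239, "20-02": 0.3210,
    "20-03": 0.3448, "20-04": 0.3459, "20-05": 0.3511, "20-06": 0.3508,
    "20-07": 0.3516, "20-08": 0.3587, "20-09": 0.3579, "20-10": 0.3657,
    "20-11": 0.3651, "20-12": 0.3644, "21-01": 0.3781, "21-02": 0.3729,
    "21-03": 0.3670, "21-04": 0.3797, "21-05": 0.3868, "21-06": 0.3679,
}

CUTOFF = "20-03"  # assumed start of the outbreak period (YY-MM)


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Zero-padded YY-MM keys sort chronologically as plain strings.
before = [rate for month, rate in NEG_RATE.items() if month < CUTOFF]
after = [rate for month, rate in NEG_RATE.items() if month >= CUTOFF]

print(f"mean negative rate before {CUTOFF}: {mean(before):.4f}")
print(f"mean negative rate from {CUTOFF} on: {mean(after):.4f}")
```

The comparison is descriptive only: it shows that negative posts became more common after the cutoff, not that the pandemic caused the change.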

Different Topics, Different Sentiments

COVID and Racial Equality related posts tend to have a larger proportion of negative sentiment, while Economics and Gender Equality related posts are likely to be more positive.

Summary table by topic
Topic             Positive  Neutral  Negative  Pos-Neg Ratio
Covid                12548     1813     20020           0.63
Election             20562     2516     21120           0.97
Economics            62773     4979     52316           1.20
Finance              13928     1517     13579           1.03
Gender Equality      43667     2021     14229           3.07
Racial Equality      98744    15071    157961           0.63
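The Pos-Neg Ratio column is the positive count divided by the negative count, so a value below 1 means negative posts outnumber positive ones. The sketch below recomputes it from the counts in the table; the dictionary layout and the function name `pos_neg_ratio` are assumptions for illustration.

```python
# (positive, neutral, negative) post counts, copied from the summary table by topic.
TOPIC_COUNTS = {
    "Covid": (12548, 1813, 20020),
    "Election": (20562, 2516, 21120),
    "Economics": (62773, 4979, 52316),
    "Finance": (13928, 1517, 13579),
    "Gender Equality": (43667, 2021, 14229),
    "Racial Equality": (98744, 15071, 157961),
}


def pos_neg_ratio(positive, negative):
    """Ratio of positive to negative posts; neutral posts are ignored."""
    if negative == 0:
        raise ValueError("ratio is undefined when there are no negative posts")
    return positive / negative


ratios = {
    topic: pos_neg_ratio(pos, neg) for topic, (pos, _, neg) in TOPIC_COUNTS.items()
}

for topic, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    leaning = "more negative" if ratio < 1 else "more positive"
    print(f"{topic:16s} {ratio:.2f}  ({leaning})")
```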

Summary

  1. Most of the posts (over 60%) have fewer than 20 words.
  2. People, Government, State, and Country are among the most important words in the posts.
  3. There is a clear negative relationship between the positive sentiment rate and the Nasdaq close price.
  4. Centrists' posts tend to have higher positive sentiment rates.
  5. Positive posts usually gain a higher score.
  6. The pandemic seems to have made the subreddit's posts more negative.
  7. COVID and Racial Equality related posts tend to have a larger proportion of negative sentiment, while Economics and Gender Equality related posts are likely to be more positive.

What's next?

The next section, Machine Learning Models, starts building predictive models.