Twitter speaks Crowdlending — A sentiment analysis case
At Acredius, we use alternative data to constantly enhance our proprietary credit risk scorecard. One of the sources we use is Twitter. Tweets have proven to be more powerful than one might think when it comes to understanding trends, and combined with appropriate machine learning techniques, they can yield valuable insights. As a “fun” project, we wanted to see what people think about our industry, so we investigated the overall opinion of “crowdlending”.
Generally speaking, we form our opinions based on several factors. They can be personal, such as our past experiences or our current understanding, or external, such as the news or a person who inspires us.
Once gathered and analyzed, these individual opinions can become a powerful indicator in marketing, sales, risk assessment and even dating apps.
Each tweet captures an opinion at a given point in time. The beauty of it is that so many people from different places, backgrounds and demographics use the same social media platforms. Together, their posts form a picture that can, with caution, be taken as a general opinion.
Let’s try to gauge the overall sentiment about crowdlending using tweets alone.
Sentiment analysis
The main objective is to determine the general sentiment of tweets about crowdlending. To do so, we use sentiment analysis to classify these tweets into three categories: positive, neutral and negative.
Since not all tweets are written in English, we ran the analysis twice: first on the full set of crowdlending tweets, regardless of language, and then only on tweets written in English, the most common language among crowdlending tweets.
In both cases, we observe the same trend. The large majority of tweets are classified as neutral. Only a few are labeled negative, while almost a third are positive. These results suggest a clear lack of awareness of this new asset class, since most tweets are neutral, but also a promising outlook, since the tweets that do take a position are mainly positive (around 75% once neutral tweets are excluded).
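The “around 75%” figure is the positive share once neutral tweets are left out, i.e. positive / (positive + negative). Below is a minimal Python sketch of both computations, assuming each tweet has already been labeled with one of the three class names; the function name `class_shares` and the toy labels are hypothetical and are not the article's data.

```python
from collections import Counter

CLASSES = ("positive", "neutral", "negative")

def class_shares(labels):
    """Return (share of each class, positive share among non-neutral tweets)."""
    counts = Counter(labels)
    total = sum(counts[c] for c in CLASSES)
    shares = {c: counts[c] / total for c in CLASSES}
    # Drop the neutral class and look only at tweets that take a position.
    opinionated = counts["positive"] + counts["negative"]
    positive_share = counts["positive"] / opinionated if opinionated else 0.0
    return shares, positive_share

# Toy labels for illustration only (not the article's results).
labels = ["positive"] * 3 + ["neutral"] * 6 + ["negative"]
shares, positive_share = class_shares(labels)
print(shares)          # {'positive': 0.3, 'neutral': 0.6, 'negative': 0.1}
print(positive_share)  # 0.75
```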
This labeling is based on the words that make up each tweet. Some words, such as ‘great’ or ‘amazing’, appear mainly in positive sentences. The more positive words a tweet contains, the higher its positive score, and therefore the more likely it is to be labeled positive.
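The article does not name the tool it used, so the following is only a minimal sketch of the lexicon-based idea described above, assuming a hypothetical hand-made word list (`LEXICON`) and an illustrative neutral band (`threshold`). Each known word adds its polarity to the score, so the more positive words a tweet contains, the higher the score.

```python
import re

# Hypothetical mini-lexicon (word -> polarity); a real analysis would use a full sentiment lexicon.
LEXICON = {
    "great": 0.8,
    "amazing": 0.9,
    "benefits": 0.5,
    "growth": 0.4,
    "risky": -0.5,
    "fraud": -0.8,
}

def polarity(text):
    """Sum the polarities of all lexicon words found in the text (unknown words count 0)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(w, 0.0) for w in words)

def label(text, threshold=0.1):
    """Map the summed score to positive, negative or neutral."""
    score = polarity(text)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

print(label("Crowdlending is a great and amazing alternative"))  # positive
print(label("New crowdlending platform launched today"))         # neutral
print(label("Another fraud case, risky business"))               # negative
```

Summing rather than averaging is a deliberate choice here: it matches the description that more positive words push a tweet further toward the positive class.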
That said, the shares of both positive and negative tweets are higher in the second case. The difference comes from the use of a translator. In the second case, we simply ran the sentiment analysis algorithm on tweets originally written in English, whereas in the first case we translated every non-English tweet and then ran the sentiment analysis on the whole set. Some of the meaning carried by the original language was lost in translation, which resulted in more tweets being assigned to the neutral class.
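The two runs can be sketched as two small pipelines, assuming each tweet record carries a `lang` code (the article does not say how language was determined) and that `classify` and `translate` are pluggable functions. The toy classifier and word-by-word translator below are hypothetical stand-ins, chosen only to show how a translation can swap a sentiment-bearing word for one the classifier does not recognize.

```python
import re

def run_all_languages(tweets, classify, translate):
    """Run 1: translate non-English tweets into English, then classify every tweet."""
    labels = []
    for tweet in tweets:
        text = tweet["text"]
        if tweet["lang"] != "en":
            text = translate(text)
        labels.append(classify(text))
    return labels

def run_english_only(tweets, classify):
    """Run 2: classify only the tweets originally written in English."""
    return [classify(t["text"]) for t in tweets if t["lang"] == "en"]

# Hypothetical stand-ins: a one-word classifier and a word-by-word "translator"
# that renders Spanish "genial" as "cool", a word the classifier does not know.
def toy_classify(text):
    return "positive" if "great" in re.findall(r"[a-z]+", text.lower()) else "neutral"

TOY_DICTIONARY = {"es": "is", "genial": "cool"}

def toy_translate(text):
    return " ".join(TOY_DICTIONARY.get(w, w) for w in text.lower().split())

tweets = [
    {"text": "Crowdlending is great", "lang": "en"},
    {"text": "Crowdlending es genial", "lang": "es"},
]
print(run_all_languages(tweets, toy_classify, toy_translate))  # ['positive', 'neutral']
print(run_english_only(tweets, toy_classify))                  # ['positive']
```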
We could have stopped the sentiment analysis here, but since every single word is labeled as positive, negative or neutral, let’s look at the most frequent positive terms.
We used only the tweets originally written in English and extracted the most frequent positively labeled words, both per year and overall.
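A minimal sketch of the per-year and overall counting, assuming the tweets are available as (year, text) pairs and that the positively labeled words from the sentiment step are collected in a set; the function name and the toy tweets are hypothetical.

```python
import re
from collections import Counter, defaultdict

def top_positive_words(tweets, positive_words, k=5):
    """Return the k most frequent positive words per year and overall.

    `tweets` is an iterable of (year, text) pairs; `positive_words` is the set
    of words labeled positive by the sentiment step.
    """
    per_year = defaultdict(Counter)
    overall = Counter()
    for year, text in tweets:
        hits = [w for w in re.findall(r"[a-z']+", text.lower()) if w in positive_words]
        per_year[year].update(hits)
        overall.update(hits)
    yearly_top = {year: counts.most_common(k) for year, counts in sorted(per_year.items())}
    return yearly_top, overall.most_common(k)

# Toy tweets for illustration only.
tweets = [
    (2017, "The benefits of crowdlending: real benefits"),
    (2019, "Strong growth expected"),
    (2019, "More growth, faster growth"),
]
yearly_top, overall_top = top_positive_words(tweets, {"benefits", "growth", "strong"}, k=2)
print(yearly_top)   # {2017: [('benefits', 2)], 2019: [('growth', 3), ('strong', 1)]}
print(overall_top)  # [('growth', 3), ('benefits', 2)]
```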
Crowdlending is a recent industry, and most tweets follow the same pattern: informing people about its advantages. This is visible first in the overall ranking, where, of the five most frequent words, “participate” and “alternative” encourage people to get involved, while “benefits” and “growth” promote the industry.
This trend is also observed on a yearly basis.
In 2014, tweets mainly encouraged people to join the community and raised awareness of the industry. The same holds for 2016, when most of the positive posts included the word “participate”.
In between, most 2015 tweets emphasized that crowdlending is a great alternative to traditional ways of investing or getting financed. In 2017, most posts highlighted the benefits of crowdlending, such as portfolio diversification.
“Fintech” is the most frequent positive word of 2018, as many crowdlending firms have recently positioned themselves as Fintechs. Since tweets are scored on all the words they contain, other positive words in those tweets certainly contributed to their positive label. Still, “Fintech” was the most frequent word across all positive 2018 tweets!
Lastly, in 2019, most positive posts highlighted the potential future growth of the crowdlending industry. This is reflected in the word “growth”, the most used positive word that year.
Now that we have a clear overview of the sentiment of the tweets, let’s look at the context in which they were written.
Top 3 #Crowdlending related hashtags
One key component of a tweet is its hashtags. They categorize the content of a post and help users find it. Since a post on Twitter usually includes more than one hashtag, let’s look at the hashtags used alongside crowdlending.
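Counting which hashtags co-occur with crowdlending takes only a few lines. This is a minimal sketch, assuming each tweet's hashtags are available as a list of strings; the function name and sample data are hypothetical.

```python
from collections import Counter

def co_occurring_hashtags(hashtag_lists, target="crowdlending", k=3):
    """Count the hashtags that appear in the same tweet as `target`."""
    counts = Counter()
    for tags in hashtag_lists:
        # Normalize case and the leading '#', and count each tag once per tweet.
        normalized = {tag.lower().lstrip("#") for tag in tags}
        if target in normalized:
            counts.update(normalized - {target})
    return counts.most_common(k)

hashtag_lists = [
    ["#Crowdlending", "#Fintech", "#Invest"],
    ["#crowdlending", "#fintech"],
    ["#Fintech", "#Blockchain"],  # ignored: no #crowdlending
]
print(co_occurring_hashtags(hashtag_lists))  # [('fintech', 2), ('invest', 1)]
```

Using a set per tweet means a hashtag repeated within one post is counted only once, so the ranking reflects how many tweets pair it with crowdlending.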
In order of frequency, the main hashtags used alongside crowdlending are Fintech, Invest and Alternative investment.
The most frequent hashtag is Fintech, which is not surprising since crowdlending is an activity specific to Fintechs. The other tags relate to investing, presenting crowdlending as an alternative way to earn attractive returns.
PS: Crowdlending is on the rise
The total number of tweets can be a good indicator of a hashtag’s current popularity: the more often it is used, the greater the popularity of and interest in the topic. For this reason, we counted the tweets tagged with crowdlending for each year and plotted the totals.
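A minimal sketch of the yearly count, together with the relative change used to compare two years, assuming each tweet's timestamp is available as a `datetime`; the function names and timestamps are hypothetical and are not the article's data.

```python
from collections import Counter
from datetime import datetime

def tweets_per_year(dates):
    """Count tweets per calendar year, returned in chronological order."""
    counts = Counter(d.year for d in dates)
    return dict(sorted(counts.items()))

def relative_change(first, last):
    """Relative change between two counts: 0.5 means +50%."""
    return (last - first) / first

# Toy timestamps for illustration only.
dates = [datetime(2014, 3, 1)] * 2 + [datetime(2018, 6, 15)] * 3
per_year = tweets_per_year(dates)
print(per_year)                                         # {2014: 2, 2018: 3}
print(relative_change(per_year[2014], per_year[2018]))  # 0.5
```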
From the beginning of 2014 until the end of 2018, we observe a clear uptrend: the number of tweets about crowdlending increased by almost 50%. In other words, crowdlending is a subject that is attracting more and more attention and consideration.
The curve shows the same pattern as the one obtained above: a steady increase, interrupted by a drop in 2017.
A quick note on this decline: we think it results from a combination of factors. First, Instagram emerged not only as one of the main social media platforms but also as a major advertising platform for small businesses such as Fintechs. Second, as with any money-related internet business, transactions inevitably lead to some cases of fraud, and their number exploded in 2017, especially in China. Third and last, 2017 was an intense year: it included Brexit, Trump’s first year in office, a homelessness crisis, terrorist attacks and escalating tensions with North Korea. All of these events likely drew interest away from crowdlending and reduced the number of posts about it.
Sample information:
We used the Get Old Tweet Python package to gather all the tweets from 01.01.2014 to 31.12.2019. In addition to the posted text itself, we collected the datetime and the hashtags of each of the 84 117 tweets. The following table summarizes the percentage of tweets originally written in English for each year:
*Special thanks go to Cedric Higel, our data science intern, who played an important role in producing and writing this article.