At Elastacloud, it is important for us to let people know what goes on day to day. We use social media tools (e.g. Twitter), Channels and other avenues to showcase the awesome stuff we do at work. As a data scientist at Elastacloud, I could not help but try to do some analysis on our social media presence!
In this post (and possibly the next one), I will be investigating and highlighting how much work our amazing marketing team have done on Twitter. To do this, I will analyse the contents of our Twitter handle, @elastacloud (if you’re not following us already, please do so now :) ). In this initial post, I will show a simple exploratory analysis of our tweets: how many we send, what time of day they go out, and so on.
To extract tweets from Twitter, you will need to create a Twitter Application at https://apps.twitter.com. If you don’t have a Twitter account already, you will need to create one. The Twitter Application will provide you with a unique ‘consumerkey’, ‘consumersecret’, ‘accesstokenkey’ and ‘accesstokensecret’. Ensure you keep these safe. Using the twitteR package for R, we can use these to connect to the Twitter API.
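As a rough sketch of that connection step, the function below authenticates and pulls a handle's timeline into a data frame. It assumes the twitteR package is installed and uses its setup_twitter_oauth(), userTimeline() and twListToDF() functions. The function name fetch_timeline and the environment variable names are my own placeholders, not part of the original analysis.

```r
# Sketch only: needs the twitteR package and valid credentials to run.
# fetch_timeline and the environment variable names are hypothetical.
fetch_timeline <- function(handle, n = 3200) {
  # Read the four credentials from the environment rather than hard-coding them
  twitteR::setup_twitter_oauth(
    consumer_key    = Sys.getenv("TWITTER_CONSUMER_KEY"),
    consumer_secret = Sys.getenv("TWITTER_CONSUMER_SECRET"),
    access_token    = Sys.getenv("TWITTER_ACCESS_TOKEN"),
    access_secret   = Sys.getenv("TWITTER_ACCESS_SECRET")
  )
  # includeRts = TRUE keeps retweets in the returned timeline
  statuses <- twitteR::userTimeline(handle, n = n, includeRts = TRUE)
  # Convert the list of status objects to a data frame, one row per tweet
  twitteR::twListToDF(statuses)
}

# Example call (not run here, since it needs network access and credentials):
# tweets <- fetch_timeline("elastacloud")
```

Keeping the keys in environment variables is one way to follow the "keep these safe" advice, since they never end up in a script you might share.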
I extracted 482 tweets (observations) and 16 variables via the API. The variables are listed below:
The first 6 tweets are shown below (note that I extracted these tweets on 13th of September 2017):
The ‘text’ variable is the content of the tweet itself, and it includes some of our retweets. The other variables tell us whether a tweet was favorited or retweeted, how many times, and so on. If the user has shared their location, the API will also return the user’s latitude and longitude values; otherwise it will return NA.
A quick look at the summary() of the data shows that we will be assessing tweets created between 9:03:45 on 09-03-2012 and 10:43:45 on 13-09-2017.
Let’s do some exploratory analysis. First we will look at when our tweets are posted (time of day and day of week).
Each dot represents a tweet. Seems like we do a lot of tweeting on Wednesday. Interesting to see that the marketing team tweet as late as 12 midnight! Amazing!
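A jittered plot of this kind could be produced along the following lines. This is a minimal sketch, assuming ggplot2 is installed and that the tweet timestamps sit in a ‘created’ column. The four rows of data are synthetic stand-ins, not our real tweets.

```r
library(ggplot2)

# Synthetic stand-in for the extracted tweets (not real data)
tweets <- data.frame(
  created = as.POSIXct(c("2017-08-02 11:05:00", "2017-08-03 23:50:00",
                         "2017-08-09 09:30:00", "2017-08-11 16:15:00"),
                       tz = "UTC")
)

# Derive weekday and hour in a locale-independent way
day_labels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")
lt <- as.POSIXlt(tweets$created, tz = "UTC")
tweets$weekday <- factor(day_labels[lt$wday + 1], levels = day_labels)
tweets$hour <- lt$hour + lt$min / 60

# One dot per tweet; horizontal jitter only, so the time of day stays exact
p <- ggplot(tweets, aes(x = weekday, y = hour)) +
  geom_jitter(width = 0.2, height = 0) +
  scale_x_discrete(drop = FALSE) +
  labs(x = "Day of week", y = "Hour of day")
print(p)
```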
We get a clearer view of tweet volume per day by using the geom_bar() function from the ggplot2 package instead of jittered points. Wednesday is our most active day, followed by Thursday and Friday.
On a year by year basis, we’ve really been active in 2017! Before that though, 2012 was our most active period, followed by 2015. I reckon we will surpass 200 tweets before the end of 2017. Yes we can!
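For illustration, here is a small sketch of the bar chart version. geom_bar() counts the rows for each x value by default, so no pre-aggregation is needed. The data is again synthetic, and the ‘created’ column name is an assumption.

```r
library(ggplot2)

# Synthetic stand-in for the extracted tweets (not real data)
tweets <- data.frame(
  created = as.POSIXct(c("2017-08-02 11:05:00", "2017-08-09 09:30:00",
                         "2017-08-16 14:00:00", "2017-08-03 23:50:00",
                         "2017-08-11 16:15:00"),
                       tz = "UTC")
)

day_labels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")
lt <- as.POSIXlt(tweets$created, tz = "UTC")
tweets$weekday <- factor(day_labels[lt$wday + 1], levels = day_labels)

# geom_bar() uses stat = "count" by default: one bar per weekday
p <- ggplot(tweets, aes(x = weekday)) +
  geom_bar() +
  scale_x_discrete(drop = FALSE) +
  labs(x = "Day of week", y = "Number of tweets")
print(p)

# The same counts as a plain table, handy for checking the plot
day_counts <- table(tweets$weekday)
```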
On a month by month basis over the six years in consideration (2012 to 2017), June has been our most active month followed by May and then August. We can facet the plots and see what it looks like on a month to month basis for all 6 years.
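Faceting could look something like the sketch below, which draws one panel of monthly counts per year using facet_wrap(). The data is synthetic and the column names are assumptions.

```r
library(ggplot2)

# Synthetic stand-in for the extracted tweets (not real data)
tweets <- data.frame(
  created = as.POSIXct(c("2016-05-10 10:00:00", "2016-06-01 11:00:00",
                         "2016-06-15 11:30:00", "2017-06-20 12:00:00",
                         "2017-08-01 09:00:00"),
                       tz = "UTC")
)

lt <- as.POSIXlt(tweets$created, tz = "UTC")
tweets$year  <- lt$year + 1900
tweets$month <- factor(month.abb[lt$mon + 1], levels = month.abb)

# One panel per year, one bar per month
p <- ggplot(tweets, aes(x = month)) +
  geom_bar() +
  facet_wrap(~ year) +
  scale_x_discrete(drop = FALSE) +
  labs(x = "Month", y = "Number of tweets")
print(p)

# Year-by-month counts as a table
monthly <- table(tweets$year, tweets$month)
```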
On an hour by hour basis, our tweets tend to go live at about 11am and, as before, we can see the team working late into the night to get our tweets out.
To get a more relevant view, let’s look at our performance in 2017.
Similar to the general view, most of our tweets go out at around 11am. However, in 2017, the 20:00 bar has no count at all, which means we have not tweeted (or retweeted) at this hour in 2017.
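A quick way to confirm a gap like this without reading it off a plot is to count tweets per hour for the year and list the hours with a zero count. The sketch below uses base R only, on synthetic data with an assumed ‘created’ column.

```r
# Synthetic stand-in for the extracted tweets (not real data)
tweets <- data.frame(
  created = as.POSIXct(c("2016-03-01 20:10:00", "2017-02-01 11:00:00",
                         "2017-04-05 11:45:00", "2017-06-07 23:30:00"),
                       tz = "UTC")
)

# Keep only the 2017 tweets
lt <- as.POSIXlt(tweets$created, tz = "UTC")
tweets_2017 <- tweets[lt$year + 1900 == 2017, , drop = FALSE]

# Count per hour, keeping all 24 hours as levels so empty hours show as 0
hours_2017 <- factor(as.POSIXlt(tweets_2017$created, tz = "UTC")$hour,
                     levels = 0:23)
hourly_counts <- table(hours_2017)

# Hours with no tweets at all in 2017
empty_hours <- as.integer(names(hourly_counts)[hourly_counts == 0])
```

In this toy data the only 20:00 tweet is from 2016, so 20 shows up in empty_hours once the filter is applied.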
Our most active month (so far) in 2017 is August. There is still some way to go in September, so I expect us to surpass the 50 tweets we sent in August. Interestingly, we didn’t have any tweets go out in January.
Finally, looking at the weekly data, we see that in 2017 we tend to post more on Thursdays. This is a shift from the analysis of the entire data set from 2012 onwards, which showed Wednesday as our day of choice for sending out tweets. We can facet this weekly data and see our performance from day to day for all 6 years.
From the plots, it can be seen that Wednesday is our most popular day for tweeting (2012, 2014, 2016), followed by Thursday (2015 and 2017). In 2013, it’s between Monday and Tuesday.
I know I used the word ‘finally’ in the last section, but how many presentations have we seen where the speaker says ‘finally’ and goes on for another 30 minutes? Virtually all the time, right? :D Please bear with me for a few more minutes. Let’s take a look at who we are retweeting.
This plot shows us the top ten handles we have retweeted on the @elastacloud handle. Andy (@andyelastacloud) and Richard (@azurecoder), we see you!
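One possible way to get at the retweeted handles is to pull them out of the tweet text, relying on the convention that a retweet's text starts with "RT @handle:". The sketch below does this in base R. The tweet texts are made up for illustration, and the regular expression is my assumption about the text format rather than the exact method used for the plot.

```r
# Synthetic stand-in for the 'text' column (made-up tweet texts)
tweets <- data.frame(
  text = c("RT @azurecoder: New post on Azure ML",
           "RT @andyelastacloud: Speaking today",
           "Our own tweet, not a retweet",
           "RT @azurecoder: Another talk"),
  stringsAsFactors = FALSE
)

# Retweets conventionally start with "RT @handle"
is_rt <- grepl("^RT @\\w+", tweets$text)

# Extract just the handle from each retweet
handles <- sub("^RT @(\\w+).*$", "\\1", tweets$text[is_rt])

# Count retweets per handle and keep the ten most frequent
top_handles <- head(sort(table(handles), decreasing = TRUE), 10)
```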
I’ll quickly sneak this one in before anyone notices: Our top performing tweets (in terms of number of retweets)
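A ranking like this can be sketched by sorting on the retweet count. The column names ‘retweetCount’ and ‘isRetweet’ follow what twitteR's twListToDF() returns, the data is synthetic, and dropping retweets of other accounts before ranking is my own assumption.

```r
# Synthetic stand-in for the extracted tweets (not real data)
tweets <- data.frame(
  text         = c("Tweet A", "Tweet B", "Tweet C", "RT @someone: Tweet D"),
  retweetCount = c(3, 12, 7, 40),
  isRetweet    = c(FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Assumption: rank only our own tweets, not retweets of other accounts
own <- tweets[!tweets$isRetweet, ]

# Sort by retweet count, highest first, and keep the top ten
top_tweets <- head(own[order(-own$retweetCount), c("text", "retweetCount")], 10)
```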
What I’ve tried to do in this post is a basic exploratory analysis of the @elastacloud Twitter handle. In another post, I will aim to analyse the actual tweet text itself. I will show which words appear most often in our tweets, along with other possible analyses.