Twitterlytics

Extracting and analyzing Twitter data

// What is Twitterlytics?

Twitterlytics is an application that collects, analyzes, and displays real-time Twitter data, as seen in the quick demonstration below.

The current iteration of this application analyzes 2,000 tweets (a limit imposed by Twitter API constraints) for their most used languages, love vs. hate words, and countries mentioned. It also categorizes a tweet as "top" if it exceeded 10,000 retweets. Some of these top tweets are printed below, and the languages of these top tweets are also analyzed.

// How does Twitterlytics work?

Twitterlytics uses Python 3 and the Tweepy library to grab tweets from the live Twitter API stream. The data is written to a SQLite database and then presented using Flask and Google Charts.

The flowchart below illustrates the process.

// How was Twitterlytics developed?

Design

I began designing my solution after determining the technologies to use, as illustrated by the flowchart in the previous section. Doing this helped me keep the big picture in mind as I moved on to actually coding. Through my design, I was able to think about the code blocks I would have, their relationships to each other, and the overall data flow through Twitterlytics. Settling on all the technologies during the design phase also allowed me to compile resources to help me accomplish the overall task.

messy_twitter.py + twitter.py

Then, I began messy_twitter.py, which was my initial iteration of the final twitter.py file. The (messy_)twitter.py file is the main script that talks to the Twitter API. I first ensured this script could actually stream tweets, which verified my credentials.py file (renamed to YOUR-CREDENTIALS.py to keep my credentials secret) and confirmed that I could retrieve tweets using Tweepy. I then added a counter to the script so that it would stream tweets continuously. Next, I added more detail to the tweets being pulled by having the script categorize them by language and retweet count. At this point, messy_twitter.py began getting really messy (hence its name), so I added classes (a Twitter class and a Statistics class) to encompass the functions spread throughout the code. In order to preserve my evolving development process, I left messy_twitter.py as it was and created a new file (twitter.py) that wasn’t as messy.
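The counter step can be sketched without any Twitter credentials. This is only a rough illustration, not the code in twitter.py: the `TweetCollector` class, its `limit` parameter, and the `lang` and `retweet_count` dictionary keys are all assumptions made for the example, and the Tweepy wiring is left out entirely.

```python
class TweetCollector:
    """Collects tweets until a fixed limit is reached (hypothetical helper)."""

    def __init__(self, limit=2000):
        self.limit = limit
        self.tweets = []

    def on_tweet(self, tweet):
        """Store one tweet; return False once the limit has been reached."""
        if len(self.tweets) >= self.limit:
            return False
        self.tweets.append(tweet)
        return len(self.tweets) < self.limit


# A generator stands in for the live stream (assumption: tweets are dicts).
fake_stream = ({"lang": "en", "retweet_count": n} for n in range(10))

collector = TweetCollector(limit=3)
for tweet in fake_stream:
    if not collector.on_tweet(tweet):
        break

print(len(collector.tweets))
```

Returning a boolean from the handler lets the calling loop decide when to disconnect, which keeps the stopping rule in one place.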

create_database.py

After ensuring twitter.py retrieves and analyzes the tweets from the Twitter stream, I worked on the database script (create_database.py) to have the data inserted into a SQLite database according to the tables I created. Whenever twitter.py is run, twitterlytics_data.db is updated.
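A minimal sketch of this kind of database step, using Python's built-in sqlite3 module, might look like the following. The `languages` table, its columns, and both function names are hypothetical; the real schema in create_database.py is not shown on this page. An in-memory database is used so the example leaves no file behind.

```python
import sqlite3


def create_tables(conn):
    """Create a hypothetical table holding one row per tweet language."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS languages ("
        "language TEXT PRIMARY KEY, tweet_count INTEGER NOT NULL)"
    )


def save_language_counts(conn, counts):
    """Insert or overwrite the count stored for each language."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO languages (language, tweet_count) "
            "VALUES (?, ?)",
            counts.items(),
        )


conn = sqlite3.connect(":memory:")
create_tables(conn)
save_language_counts(conn, {"en": 5, "ja": 2})
save_language_counts(conn, {"en": 7})  # a later run updates the row

rows = conn.execute(
    "SELECT language, tweet_count FROM languages ORDER BY language"
).fetchall()
print(rows)
```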

test_backend.py

Most of the backend code was finished at this point, so I wrote a test script for the aforementioned files: test_backend.py. This script tests the development database code (not production data!), tests whether the Twitter class can retrieve a small number of tweets (compared to the 2,000 actually grabbed), tests the analytical features (it calls the functions that retrieve the categorized tweets and checks whether they’ve been written to the database), and tests the Statistics class.
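One way to keep such tests away from production data is to run them against a throwaway in-memory database. The sketch below uses the standard unittest module; the test class, the table, and its columns are hypothetical and are not taken from test_backend.py.

```python
import sqlite3
import unittest


class DevelopmentDatabaseTest(unittest.TestCase):
    def setUp(self):
        # An in-memory database keeps the test away from production data.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE languages (language TEXT PRIMARY KEY, tweet_count INTEGER)"
        )

    def tearDown(self):
        self.conn.close()

    def test_written_counts_can_be_read_back(self):
        self.conn.execute("INSERT INTO languages VALUES (?, ?)", ("en", 3))
        row = self.conn.execute(
            "SELECT tweet_count FROM languages WHERE language = ?", ("en",)
        ).fetchone()
        self.assertEqual(row[0], 3)


# Run the single test case programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DevelopmentDatabaseTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```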

test_frontend.py

Onto the frontend! I wrote the tests for the upcoming Flask web server ahead of time to help with developing the frontend, because these tests captured the bulk of what I designed the frontend to include. These tests may be seen in test_frontend.py. Basically, the script first checks whether Flask is running on the development server (localhost:5000), then tests each frontend feature individually, starting with tweet languages.

FlaskApp.py

Next came the actual Flask code, which may be seen in FlaskApp.py. This was my first time working with Flask, but it was generally straightforward thanks to their excellent documentation. Basically, FlaskApp.py reads the SQLite database and sends this data to the HTML pages in the Templates folder. In Twitterlytics’ case, the majority of the data is graphed through Google Charts. After getting the data displayed in my local environment, I touched up the frontend using my website as a guideline for Twitterlytics’ frontend codebase.
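The database-to-chart step can be illustrated with a small sketch that shapes query results into a header row followed by data rows, the layout that Google Charts' `arrayToDataTable` accepts. The Flask route and template are omitted here, and the `chart_rows` function and `languages` table are hypothetical names, not the ones used in FlaskApp.py.

```python
import json
import sqlite3


def chart_rows(conn):
    """Return a header row plus one [language, count] row per language."""
    rows = conn.execute(
        "SELECT language, tweet_count FROM languages "
        "ORDER BY tweet_count DESC, language"
    ).fetchall()
    return [["Language", "Tweets"]] + [list(row) for row in rows]


# Throwaway in-memory data standing in for the real database file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE languages (language TEXT, tweet_count INTEGER)")
conn.executemany("INSERT INTO languages VALUES (?, ?)", [("ja", 2), ("en", 5)])

table = chart_rows(conn)
print(json.dumps(table))  # JSON text that a template could embed for the chart
```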

// Why was Twitterlytics made?

Ultimately, I decided to develop this 1) to practice building full-stack Python applications and 2) because I use Twitter a lot (follow me at: @christianlobmit) and found it intriguing to dive into Twitter data.

If you would like more information, please see my project page on GitHub, where all the source code may be viewed.

But enough introduction, and onto the actual analytics!

Top Languages

This section categorizes the grabbed tweets by language, so we can see the breakdown of languages used and determine the "top languages" in terms of tweet quantity.

Hover over the different sections of the chart to see the number of tweets in that language.
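Counting tweets per language can be sketched with `collections.Counter`. This is an illustration only: it assumes each tweet is a dictionary with a `lang` key, and it falls back to the label `und` when that key is missing, which is a choice made for this example.

```python
from collections import Counter


def top_languages(tweets):
    """Return (language, count) pairs, most frequent first."""
    return Counter(tweet.get("lang", "und") for tweet in tweets).most_common()


sample = [{"lang": "en"}, {"lang": "ja"}, {"lang": "en"}, {}]
ranking = top_languages(sample)
print(ranking)
```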

Love Tweets vs. Hate Tweets

This section categorizes tweets as "Love" or "Hate" depending on the words used in the tweets grabbed.

Words that categorize a tweet as loving: 'Love', 'Thank', 'Happy', 'Bless'

Words that categorize a tweet as hating: 'F*ck', 'Sh*t', 'B*tch', 'Idiot'

Hover over the different sections of the chart to see the number of tweets in that category.
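A keyword-based categorization like this one could be sketched as follows. The matching rule (case-insensitive substring search) and the decision to count a tweet in both categories when it matches both lists are assumptions, not confirmed details of Twitterlytics. Only one of the listed hate words is included in the sketch.

```python
LOVE_WORDS = ("love", "thank", "happy", "bless")
HATE_WORDS = ("idiot",)  # the other listed words are left out of this sketch


def classify(text):
    """Return the set of labels ('love', 'hate') whose words appear in text."""
    lowered = text.lower()
    labels = set()
    if any(word in lowered for word in LOVE_WORDS):
        labels.add("love")
    if any(word in lowered for word in HATE_WORDS):
        labels.add("hate")
    return labels


def tally(texts):
    """Count how many texts fall under each label."""
    counts = {"love": 0, "hate": 0}
    for text in texts:
        for label in classify(text):
            counts[label] += 1
    return counts


sample = [
    "I love overhearing dog owners",
    "shut up, idiot",
    "Just needs a little help",
]
totals = tally(sample)
print(totals)
```

Substring matching is simple but imprecise: a word such as "glove" would also count as a love word under this rule.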

Countries Mentioned

This section categorizes the tweets grabbed by country mention, so we may see the breakdown of which countries are being talked about most.

Hover over the different sections of the chart to see the number of tweets that mention that country.
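Country mentions could be counted along these lines. The short `COUNTRIES` list and the whole-word, case-insensitive matching rule are assumptions made for illustration; the list of countries Twitterlytics actually checks is not given on this page.

```python
import re
from collections import Counter

COUNTRIES = ("Japan", "Brazil", "France")  # hypothetical short list


def country_mentions(texts, countries=COUNTRIES):
    """Count how many texts mention each country as a whole word."""
    counts = Counter()
    for text in texts:
        for country in countries:
            pattern = r"\b" + re.escape(country) + r"\b"
            if re.search(pattern, text, re.IGNORECASE):
                counts[country] += 1
    return counts


sample = [
    "Flying to Japan tomorrow",
    "japan and France in the final",
    "no country here",
]
mentions = country_mentions(sample)
print(mentions.most_common())
```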

Top Tweets' Languages

This section categorizes the Top Tweets grabbed (those retweeted more than 10,000 times) by language, so we can see the breakdown of languages used in Top Tweets.

Hover over the different sections of the chart to see the number of Top Tweets in that language.
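The "top" filter combined with the language breakdown can be sketched like this. The 10,000-retweet threshold comes from the description above; the dictionary keys `lang` and `retweet_count` and the function names are assumptions for the example.

```python
from collections import Counter

TOP_TWEET_THRESHOLD = 10_000  # a tweet is "top" only if it exceeds this


def is_top(tweet):
    """True when the tweet has more retweets than the threshold."""
    return tweet.get("retweet_count", 0) > TOP_TWEET_THRESHOLD


def top_tweet_languages(tweets):
    """Count languages among top tweets only."""
    return Counter(t.get("lang", "und") for t in tweets if is_top(t))


sample = [
    {"lang": "en", "retweet_count": 10_000},   # exactly at the threshold
    {"lang": "ko", "retweet_count": 250_000},
    {"lang": "en", "retweet_count": 15_000},
]
breakdown = top_tweet_languages(sample)
print(breakdown)
```

A tweet sitting exactly at 10,000 retweets is excluded here because the description says a top tweet "exceeded" that number.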

Sample of Top Tweets

This section provides a sample of the Top Tweets (retweeted > 10,000 times) that were grabbed. It's interesting to see a handful of tweets that were capable of garnering the attention of thousands of people.

yo i hate honors college boys i just asked this guy “hey why aren’t koalas considered bears?” and he hits me with “they’re marsupials” shut up nerd the answer to the joke is “they don’t have the koalafications”
— claire (@clairedaniellem) July 22, 2018

First woman that gave birth to twins was prolly like “????????”
— нarry вelaғcĸdyoвтcн (@BarkyBoogz) July 25, 2018

I was on the Moon! #Apollo11 @NASA https://t.co/6Nb2cQVU32
— Buzz Aldrin (@TheRealBuzz) July 21, 2018

I love overhearing dog owners talking to their dogs

eg, I was petting this dog who seemed happy but then suddenly growled at me, so I left

As I turned the corner I could hear his owner saying to him reproachfully, "You always do this, Oscar, you drive away all your friends"
— Julia Galef (@juliagalef) July 17, 2018

There’s 7 million people in this world and you think I’m gonna let one customer with a bad attitude to ruin my day??? damn right I am I’ll probably even go cry in the freezer too
— Jenna Cherry (@jennacherry_10) July 14, 2018

【盗撮注意】公衆トイレや試着室に、フック型隠しカメラが設置されていることがあるそう。一見、隠しカメラとは思えないものなので、フックがあれば全体を布か何かでフック全体を覆うようにした方が良いかも。これは是非とも皆に知って欲しい。 pic.twitter.com/jdL20x1Mhh
— she (@oshoyu_egg) July 24, 2018

Just needs a little help pic.twitter.com/mydc3vybZM
— Nature is Amazing 🌴 (@AMAZlNGNATURE) July 26, 2018

me opening the same app i just closed https://t.co/ttFbf1Qb5p
— josh (@yunginstitution) July 25, 2018

How to NOT kill yourself pt 1

Avoid being around people who make you want to kill yourself
— KANYE WEST (@kanyewest) July 27, 2018

eclipse #RM pic.twitter.com/uFDjljdxOM
— 방탄소년단 (@BTS_twt) July 27, 2018

Direct deposit ? I know https://t.co/CshhfX0rvI
— D (@yeeerderrick) July 26, 2018

eclipse #RM pic.twitter.com/uFDjljdxOM
— 방탄소년단 (@BTS_twt) July 27, 2018

SABON日本上陸10周年を記念してTwitter限定キャンペーンがスタート。
つるんとなめらか肌へと導く「フェイスポリッシャー」を抽選で20名様にプレゼントします。

▼参加方法
①@SABON_Japanをフォロー
②この投稿をRT

特設サイトでもキャンペーン実施中https://t.co/Yw3VX3wUGz #sabon #プレゼント pic.twitter.com/URbuZGrmxn
— 【公式】SABON(サボン) (@SABON_Japan) July 13, 2018

not to be confused with award winning actor Leonardo DaVinci https://t.co/ODOpurNgj3
— ruckin📱 (@ruckin_) July 23, 2018

but yet it’s hispanic workers who have to fix his piece of shit star when all he does is talk shit about them. LMAO funny how that works out https://t.co/lEjfv4K4hv
— 𝘽🧡 (@_bbrirose) July 27, 2018

90+ Grades
90+ Grades
90+ Grades
90+ Grades
90+ Grades
90+ Grades
90+ Grades
90+ Grades
90+ Grades
90+ Grades
90+ Grades
90+ Grades
90+ Grades

For this SY 2018-2019

Rt now✨
— JhonAlbertJuario (@Jalbert444) July 21, 2018

#BTS #방탄소년단 #VLIVE channel hit 10 million followers! 천만 구독자 달성 기념 축하 메시지가 네이버 그린팩토리 외벽에 게시되었습니다! V❤️BTS 천만! #BTS_VLIVE_10million pic.twitter.com/WECPlX58EC
— V LIVE (@Vliveofficial) July 27, 2018

do coke and pepsi taste the same??????

rt for no
like for yes

it’s time to finally settle this
— shenk (@taylorshenk) July 22, 2018

Sooo today my 6 year old cousin tried to kill me ...Follow thread to see the rest ..I promise ITS WORTH ALL YOUR TIME . pic.twitter.com/wNgoRmsoLl
— cass🦄 (@darealcass_) July 30, 2017

I don’t know how I thought jeans were being made, but this wasn’t it. https://t.co/IKKBdfCOFe
— Zoi🤸🏾‍♀️ (@zoisthoughts) July 26, 2018

*logs into gmail on different device*

Google 2 seconds later: https://t.co/RjANdDfNGW
— pull the receipts type beat (@Groovy1u) July 26, 2018

eclipse #RM pic.twitter.com/uFDjljdxOM
— 방탄소년단 (@BTS_twt) July 27, 2018

No I’m just gay https://t.co/qoSPo3w0aS
— Kait (@kaitlinmaarie) July 25, 2018

猛暑の影響で救急出動が急増し、通常より救急隊を増やして対応していますが、1日の出動が22件となった隊もあります。
そのため出動が連続し消防署に戻れない時は、救急車でコンビニ等に立ち寄り飲料水等を購入する場合があります。
その際も、出動態勢は維持していますので、ご理解をお願いします。
— 名古屋市消防局【公式】 (@NagoyaShobo) July 26, 2018

It all makes sense now. Imagine working from 9-5 and coming back and your kid didn’t take the chicken out the freezer
— AJ Olarinde (@AjOlarinde) July 25, 2018

eclipse #RM pic.twitter.com/uFDjljdxOM
— 방탄소년단 (@BTS_twt) July 27, 2018

MY CHILDHOOD HAS BEEN RESTORED 😂 pic.twitter.com/GjVj3FRRWX
— dayana • (@DayanaMarshae) July 26, 2018

This man decided to go 70+ in a 40 while drunk and ran into a fucking tree and Killed my 16 year old cousin. Lock him up https://t.co/x5O1CTxhme
— Jonte’ (@BOO_Dank) July 25, 2018

do coke and pepsi taste the same??????

rt for no
like for yes

it’s time to finally settle this
— shenk (@taylorshenk) July 22, 2018

Contact

Please send me an e-mail if you have any suggestions for me to further improve my code, improve this page, or for other analyses for me to conduct! Thanks for scrolling by :-)
