A Streaming Twitter Client

Sams Publishing

Oct 3, 2012

CPOL

A chapter excerpt from Sams Teach Yourself Node.js in 24 Hours.

This excerpt is from the new book, ‘Sams Teach Yourself Node.js in 24 Hours’ authored by George Ornbo, published by Pearson/SAMS, Sept. 2012, ISBN 9780672335952, Copyright © 2013 by Pearson Education, Inc. For more info please visit the publisher site: www.informit.com/title/9780672335952

George Ornbo
Published by Sams
ISBN-10: 0-672-33595-6
ISBN-13: 978-0-672-33595-2

Receive data from Twitter’s streaming API
Parse data received from Twitter’s streaming API
Push third-party data out to clients in real-time
Create a real-time graph
Discover whether there is more love or hate in the world by using real-time data from Twitter

Streaming APIs

In Hour 13, "A Socket.IO Chat Server," you learned how to create a chat server with Socket.IO and Express. This involved sending data from clients (or browsers) to the Socket.IO server and then broadcasting it out to other clients. In this hour, you learn about how Node.js and Socket.IO can also be used to consume data directly from the Web and then broadcast the data to connected clients. You will be working with Twitter’s streaming Application Programming Interface (API) and pushing data out to the browser in real-time.

With Twitter’s standard API, the process for getting data is as follows:

You open a connection to the API server.
You send a request for some data.
You receive the data that you requested from the API.
The connection is closed.

With Twitter’s streaming API, the process is different:

You open a connection to the API server.
You send a request for some data.
Data is pushed to you from the API.
The connection remains open.
More data is pushed to you when it becomes available.

Streaming APIs allow data to be pushed from the service provider whenever new data is available. In the case of Twitter, this data can be extremely frequent and high volume. Node.js is a great fit for this type of scenario where large numbers of events are happening frequently as data is received. This hour represents another excellent use case for Node.js and highlights some of the features that make Node.js different from other languages and frameworks.
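To picture what consuming a long-lived connection involves, here is a minimal sketch of a parser that turns raw chunks into message events. It assumes one JSON message per line, and createMessageParser and receive are hypothetical names invented for this illustration. A client library normally does this work for you, so treat it as a model of the idea rather than how any particular library is implemented.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: turns raw text chunks into parsed 'data' events.
// Assumption: one JSON message per line; blank lines are ignored.
function createMessageParser() {
  const parser = new EventEmitter();
  let buffer = '';

  parser.receive = function (chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    // The last element is an incomplete message (or ''), so keep it buffered.
    buffer = lines.pop();

    lines.forEach(function (line) {
      const trimmed = line.trim();
      if (trimmed === '') {
        return; // nothing to parse on an empty line
      }
      let message;
      try {
        message = JSON.parse(trimmed);
      } catch (err) {
        parser.emit('error', err);
        return;
      }
      parser.emit('data', message);
    });
  };

  return parser;
}

// Usage: the connection stays open and chunks arrive at arbitrary boundaries.
const parser = createMessageParser();
const received = [];
parser.on('data', function (message) { received.push(message); });
parser.on('error', function (err) { console.error('bad message:', err.message); });

parser.receive('{"text":"I love Node.js"}\r\n{"te');
parser.receive('xt":"I hate waiting"}\r\n');
console.log(received.length);
```

The second message is split across two chunks, which is why the parser keeps the unfinished tail in a buffer until the rest arrives.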

Signing Up for Twitter

Twitter provides a huge amount of data to developers via a free, publicly available API. Many Twitter desktop and mobile clients are built on top of this API, but it is also open to developers to use however they want.

If you do not already have a Twitter account, you need one for this hour. You can sign up for an account for free at https://twitter.com/. It takes less than a minute! Once you have a Twitter account, you need to sign into the Twitter Developers website with your details at http://dev.twitter.com/. This site provides documentation and forums for anything to do with the Twitter API. The documentation is thorough, so if you want, you can get a good understanding of what types of data you can request from the API here.

Within the Twitter Developers website, you can also register applications that you create with the Twitter API. You create a Twitter application in this hour, so to register your application, do the following:

Click the link Create an App.
Pick a name for your application and fill out the form (see Figure 14.1). Application names on Twitter have to be unique, so if you find that the name has already been taken, choose another one.

Figure 14.1 Creating a Twitter application

Once you create your application, you need to generate an access token and an access token secret to gain access to the API from your application.

At the bottom of the Details tab is a Create My Access Token button (see Figure 14.2). Click this button to create an access token and an access token secret.

Figure 14.2 Requesting an access token

When the page refreshes, you see that values have been added for access token and access token secret (see Figure 14.3). Now you are ready to start using the API!

Figure 14.3 A successful creation of an access token

By The Way

OAuth Is a Way of Allowing Access to Online Accounts

OAuth is an open standard for authorization, typically used within the context of web applications. It allows users to grant access to all or parts of an account without handing over a username or password. When a user grants an application access to their account, a unique token is generated. This can be used by a third-party service to access all or parts of a user’s account. At any time, the user can revoke access, and the token will no longer be valid, so the application no longer has access to the account.

Using Twitter’s API with Node.js

Once you create your application within the Twitter Developers website and request an OAuth access token, you are ready to start using the Twitter API. An excellent Node.js module called ntwitter is available for interacting with the Twitter API. This module was initially developed by technoweenie (Rick Olson), then jdub (Jeff Waugh), and is now maintained by AvianFlu (Charlie McConnell). All the authors have done an amazing job of abstracting the complexity of interacting with Twitter’s API to make it trivial to get data and do things with it. You continue to use Express in this hour, so the package.json file for the application will include the Express and ntwitter modules.

{
  "name": "socket.io-twitter-example",
  "version": "0.0.1",
  "private": true,
  "dependencies": {
    "express": "2.5.4",
    "ntwitter": "0.2.10"
  }
}

To connect, the application needs four credentials: a consumer key, a consumer secret, an access token key, and an access token secret. If you requested the access token when you were setting up the application in the Twitter Developers website, all four are available on the Details page for your application. If you did not request it when you set up the application, you need to do so now under the Details tab. Once you have the keys and secrets, you can create a small Express server to connect to Twitter’s streaming API:

var app = require('express').createServer(),
    twitter = require('ntwitter');

app.listen(3000);

// Replace each placeholder with the value from your application's Details page.
var twit = new twitter({
  consumer_key: 'YOUR_CONSUMER_KEY',
  consumer_secret: 'YOUR_CONSUMER_SECRET',
  access_token_key: 'YOUR_ACCESS_TOKEN_KEY',
  access_token_secret: 'YOUR_ACCESS_TOKEN_SECRET'
});
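If you would rather not type secrets into a source file, one option is to read them from environment variables. The sketch below is only an illustration: loadTwitterCredentials and the TWITTER_* variable names are assumptions made up for this example, not something ntwitter requires. The returned object uses the same four option names that are passed to the twitter constructor.

```javascript
// Hypothetical helper: reads the four credentials from an environment object.
// The TWITTER_* variable names are assumptions chosen for this sketch.
function loadTwitterCredentials(env) {
  const mapping = {
    consumer_key: 'TWITTER_CONSUMER_KEY',
    consumer_secret: 'TWITTER_CONSUMER_SECRET',
    access_token_key: 'TWITTER_ACCESS_TOKEN_KEY',
    access_token_secret: 'TWITTER_ACCESS_TOKEN_SECRET'
  };
  const credentials = {};
  const missing = [];

  Object.keys(mapping).forEach(function (option) {
    const value = env[mapping[option]];
    if (value) {
      credentials[option] = value;
    } else {
      missing.push(mapping[option]);
    }
  });

  // Fail early with a clear message rather than with a rejected request later.
  if (missing.length > 0) {
    throw new Error('Missing environment variables: ' + missing.join(', '));
  }
  return credentials;
}

// Usage (commented out because it needs real credentials and the ntwitter module):
// var twit = new twitter(loadTwitterCredentials(process.env));
```

Whichever way you supply them, the four values must be your own.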

Of course, you need to remember to replace the values in the example with your actual values. This is all you need to start interacting with Twitter’s API! In this example, you answer the question, "Is there more love or hate in the world?" by using real-time data from Twitter. You request tweets from Twitter’s streaming API that mention the words "love" or "hate" and perform a small amount of analysis on the data to answer the question. The ntwitter module makes it easy to request this data:

twit.stream('statuses/filter', { track: ['love', 'hate'] }, function (stream) {
  stream.on('data', function (data) {
    console.log(data);
  });
});

This requests data from the 'statuses/filter' endpoint that allows developers to track tweets by keyword, location, or specific users. In this case, we are interested in the keywords 'love' and 'hate'. The Express server opens a connection to the API server and listens for new data being received. Whenever a new data item is received, it writes the data to the console. In other words, you can see the stream live for the keywords "love" and "hate" in the terminal.
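Because the connection stays open for a long time, it is worth thinking about what happens when it fails or closes. The following sketch assumes that the stream behaves like a standard Node.js EventEmitter and that it emits 'error' and 'end' events; those event names are assumptions here, so check the documentation of the library you use. watchStream is a hypothetical helper, and a plain EventEmitter stands in for the real stream so that the example runs without credentials.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: attaches listeners to any EventEmitter-style stream.
// Assumption: the stream emits 'data', 'error', and 'end' events.
function watchStream(stream, onTweet, log) {
  stream.on('data', onTweet);
  stream.on('error', function (err) {
    log('stream error: ' + (err && err.message ? err.message : err));
  });
  stream.on('end', function () {
    log('stream ended; a real application might reconnect here');
  });
}

// A stand-in for the stream object passed to the twit.stream() callback.
const fakeStream = new EventEmitter();
const tweets = [];
const logLines = [];

watchStream(
  fakeStream,
  function (data) { tweets.push(data); },
  function (line) { logLines.push(line); }
);

fakeStream.emit('data', { text: 'love is all you need' });
fakeStream.emit('error', new Error('connection reset'));
fakeStream.emit('end');
console.log(tweets.length, logLines);
```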

Figure 14.4 Streaming data to the terminal

Extracting Meaning from the Data

So far, you have created a way to retrieve data in real-time from Twitter, and you saw a terminal window move very fast with a lot of data. This is good, but it does not yet let you answer the question. To work toward this, you need to be able to parse the tweets received and extract information. Twitter provides data in JSON, a format based on JavaScript object syntax, and this is great news for using it with Node.js. For each response, you can simply use dot notation to retrieve the data that you are interested in. So, if you want to view the screen name of the user along with the tweet, this can be easily achieved:

twit.stream('statuses/filter', { track: ['love', 'hate'] }, function (stream) {
  stream.on('data', function (data) {
    console.log(data.user.screen_name + ': ' + data.text);
  });
});

Full documentation on the structure of the data received from Twitter is available in the documentation for the status element. This can be viewed online at https://dev.twitter.com/docs/api/1/get/statuses/show/%3Aid. Under the section "Example Request," you can see the data structure for a status response. Using dot notation on the data object returned from Twitter, you are able to access any of these data points. For example, if you want the URL for the user, you can use data.user.url. Here is the full data available for the user who posted the tweet:

"user": {
  "profile_sidebar_border_color": "eeeeee",
  "profile_background_tile": true,
  "profile_sidebar_fill_color": "efefef",
  "name": "Eoin McMillan ",
  "profile_image_url": "http://a1.twimg.com/profile_images/1380912173/Screen_shot_2011-06-03_at_7.35.36_PM_normal.png",
  "created_at": "Mon May 16 20:07:59 +0000 2011",
  "location": "Twitter",
  "profile_link_color": "009999",
  "follow_request_sent": null,
  "is_translator": false,
  "id_str": "299862462",
  "favourites_count": 0,
  "default_profile": false,
  "url": "http://www.eoin.me",
  "contributors_enabled": false,
  "id": 299862462,
  "utc_offset": null,
  "profile_image_url_https": "https://si0.twimg.com/profile_images/1380912173/Screen_shot_2011-06-03_at_7.35.36_PM_normal.png",
  "profile_use_background_image": true,
  "listed_count": 0,
  "followers_count": 9,
  "lang": "en",
  "profile_text_color": "333333",
  "protected": false,
  "profile_background_image_url_https": "https://si0.twimg.com/images/themes/theme14/bg.gif",
  "description": "Eoin's photography account. See @mceoin for tweets.",
  "geo_enabled": false,
  "verified": false,
  "profile_background_color": "131516",
  "time_zone": null,
  "notifications": null,
  "statuses_count": 255,
  "friends_count": 0,
  "default_profile_image": false,
  "profile_background_image_url": "http://a1.twimg.com/images/themes/theme14/bg.gif",
  "screen_name": "imeoin",
  "following": null,
  "show_all_inline_media": false
}

There is much more information available with each response including geographic coordinates, whether the tweet was retweeted, and more.
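Dot notation throws an error if an intermediate property is missing, so it can be safer to check the shape of each message before reading from it. The sketch below assumes that some messages arriving on the stream may not be tweets and therefore may lack a user or text property. summarizeTweet is a hypothetical helper written for this illustration.

```javascript
// Hypothetical helper: returns { user, text } for a tweet, or null for any
// message that does not look like one (for example, a message without a user).
function summarizeTweet(data) {
  if (!data || typeof data.text !== 'string' || !data.user) {
    return null;
  }
  return {
    user: data.user.screen_name,
    text: data.text
  };
}

// A tweet-shaped message produces a summary.
console.log(summarizeTweet({
  text: 'Node.js, I love you',
  user: { screen_name: 'imeoin', url: 'http://www.eoin.me' }
}));

// Anything else is skipped instead of raising a TypeError.
console.log(summarizeTweet({ delete: { status: { id: 1 } } }));
```

Inside the 'data' handler you could call this helper first and return early when it gives back null.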

Pushing Data to the Browser

Now that data from Twitter is in a more digestible format, you can push this data out to connected browsers using Socket.IO and use some client-side JavaScript to display the tweets. This is similar to the patterns that you saw in Hours 12 and 13 where data is received by a Socket.IO server and then broadcast to connected clients. To use Socket.IO, it must first be added as a dependency in the package.json file:

{
  "name": "socket.io-twitter-example",
  "version": "0.0.1",
  "private": true,
  "dependencies": {
    "express": "2.5.4",
    "ntwitter": "0.2.10",
    "socket.io": "0.8.7"
  }
}

Then, Socket.IO must be required in the main server file and instructed to listen to the Express server. This is exactly the same as the examples that you worked through in Hours 12 and 13:

var app = require('express').createServer(),
    twitter = require('ntwitter'),
    io = require('socket.io').listen(app);

The streaming API request can now be augmented to push the data out to any connected Socket.IO clients whenever a new data event is received:

twit.stream('statuses/filter', { track: ['love', 'hate'] }, function (stream) {
  stream.on('data', function (data) {
    io.sockets.volatile.emit('tweet', {
      user: data.user.screen_name,
      text: data.text
    });
  });
});

Instead of logging the data to the console, you are now doing something useful with the data by pushing it out to connected clients. A simple JSON structure is created to hold the name of the user and the tweet. If you want to send more information to the browser, you could simply extend the JSON object to hold other attributes.

You may have noticed that, instead of using io.sockets.emit as you did in Hours 12 and 13, you are now using io.sockets.volatile.emit. This is an additional method provided by Socket.IO for scenarios where certain messages can be dropped. A message may be dropped because of network issues or because a client is in the middle of a request-response cycle, and this is particularly likely where high volumes of messages are being sent to clients. By using the volatile method, you accept that a given client may miss a message. In other words, the application does not depend on every client receiving every message.

The Express server is also instructed to serve a single HTML page so that the data can be viewed in a browser.

app.get('/', function (req, res) { res.sendfile(__dirname + '/index.html'); });

On the client side (or browser), some simple client-side JavaScript is added to the index.html file to listen for new tweets being sent to the browser and display them to the user. The full HTML file is available in the example that follows:

<ul class="tweets"></ul>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>
<script src="/socket.io/socket.io.js"></script>
<script>
  var socket = io.connect();
  jQuery(function ($) {
    var tweetList = $('ul.tweets');
    socket.on('tweet', function (data) {
      tweetList.prepend('<li>' + data.user + ': ' + data.text + '</li>');
    });
  });
</script>

An empty unordered list is added to the DOM (Document Object Model), and this is filled with a new list item containing the screen name of the user and the tweet each time a new tweet is received. This uses jQuery’s prepend() method to insert data received into a list item within the unordered list. This has the effect of creating a stream on the page.

Now, whenever Socket.IO pushes a new tweet event out, the browser receives it and writes it to the page immediately. Instead of viewing the stream of tweets in a terminal, it can now be viewed in the browser!

Creating a Real-Time Lovehateometer

Although the application can now stream tweets to a browser window, it is still not very useful. It is still impossible to answer the question of whether there is more love or hate in the world. To answer the question, you need a way to visualize the data. Assuming that the tweets received from the API are indicative of human sentiment, you set up several counters on the server that increment when the words "love" and "hate" are mentioned in the streaming data that is received. Furthermore, by maintaining another counter for the total number of tweets with either love or hate in them, you can calculate whether love or hate is mentioned more often. With this approach, it is possible to say—in unscientific terms—that there is x% of love and y% of hate in the world.

To be able to show data in the browser, you need counters on the server to hold:

The total number of tweets containing "love" or "hate"
The total number of tweets containing "love"
The total number of tweets containing "hate"

This can be achieved by initializing variables and setting these counters to zero on the Node.js server:

var app = require('express').createServer(), twitter = require('ntwitter'), 
    io = require('socket.io').listen(app), love = 0, hate = 0, total = 0;

Whenever new data is received from the API, the love counter will be incremented if the word "love" is found and so on. JavaScript’s indexOf() string function can be used to look for words within a tweet and provides a simple way to analyze the content of tweets:

twit.stream('statuses/filter', { track: ['love', 'hate'] }, function (stream) {
  stream.on('data', function (data) {
    var text = data.text.toLowerCase();
    if (text.indexOf('love') !== -1) {
      love++;
      total++;
    }
    if (text.indexOf('hate') !== -1) {
      hate++;
      total++;
    }
  });
});

Because some tweets may contain both "love" and "hate," the total is incremented each time a word is found. This means that the total counter represents the total number of times "love" or "hate" was mentioned in a tweet rather than the total number of tweets.
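To see how the counters behave, the same checks can be pulled into a small function and run against a few sample strings. This is a sketch for experimentation only: countMentions and the counters object are hypothetical names, and the sample texts are invented.

```javascript
// Hypothetical helper: applies the same indexOf() checks to one tweet's text
// and updates a counters object of the form { love, hate, total }.
function countMentions(text, counters) {
  const lower = text.toLowerCase();
  if (lower.indexOf('love') !== -1) {
    counters.love++;
    counters.total++;
  }
  if (lower.indexOf('hate') !== -1) {
    counters.hate++;
    counters.total++;
  }
  return counters;
}

const counters = { love: 0, hate: 0, total: 0 };
countMentions('I LOVE this', counters);                // love only
countMentions('Love and hate in one tweet', counters); // both words, total grows by 2
countMentions('I hate Mondays', counters);             // hate only
console.log(counters);
```

Three tweets produce a total of four because the second one is counted once for each word. Note also that indexOf() matches substrings, so a tweet about "gloves" counts as love. That is acceptable for an unscientific measure, but worth knowing.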

Now that the application is maintaining a count of the occurrences of words, this data can be added to the tweet emitter and pushed to connected clients in real-time. A simple calculation is also used to send the values as a percentage of the total number of mentions:

io.sockets.volatile.emit('tweet', {
  user: data.user.screen_name,
  text: data.text,
  love: (love / total) * 100,
  hate: (hate / total) * 100
});
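One edge case in this calculation is a total of zero, which can happen if a message arrives whose text field contains neither word before anything has been counted. In JavaScript, 0 / 0 is NaN, so the browser would receive NaN instead of a number. The sketch below shows one way to guard against that. toPercentages is a hypothetical helper, and returning zero for both values is a choice made for this example.

```javascript
// Hypothetical helper: converts the counters into percentages of the total,
// returning zeros while nothing has been counted yet.
function toPercentages(love, hate, total) {
  if (total === 0) {
    return { love: 0, hate: 0 };
  }
  return {
    love: (love / total) * 100,
    hate: (hate / total) * 100
  };
}

console.log(toPercentages(0, 0, 0)); // nothing counted yet
console.log(toPercentages(3, 1, 4)); // three love mentions, one hate mention
```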

On the client side, by using an unordered list and some client-side JavaScript, the browser can receive the data and show it to users. Before any data is received from the server, the values are set to zero:

<ul class="percentage"> 
  <li class="love">0</li> 
  <li class="hate">0</li> 
</ul>

Finally, a client-side listener can be added to receive the tweet event and replace the percentage values with the ones received from the server. By starting the server and opening the browser, you can now answer the question!

<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>
<script src="/socket.io/socket.io.js"></script>
<script>
  var socket = io.connect();
  jQuery(function ($) {
    var tweetList = $('ul.tweets'),
        loveCounter = $('li.love'),
        hateCounter = $('li.hate');

    socket.on('tweet', function (data) {
      tweetList.prepend('<li>' + data.user + ': ' + data.text + '</li>');
      loveCounter.text(data.love + '%');
      hateCounter.text(data.hate + '%');
    });
  });
</script>

Adding a Real-Time Graph

The application is now able to answer the question. Hurray! In terms of visualization, though, it is still just data. It would be great if the application could generate a small bar graph that moved dynamically based on the data received. The server is already sending this data to the browser so this can be implemented entirely using client-side JavaScript and some CSS. The application has an unordered list containing the percentages, and this is perfect to create a simple bar graph. The unordered list will be amended slightly so that it is easier to style. The only addition here is to wrap the number in a span tag:

<ul class="percentage">
  <li class="love">
    <span>0</span>
  </li>
  <li class="hate">
    <span>0</span>
  </li>
</ul>

Some CSS can then be added to the head of the HTML document that makes the unordered list look like a bar graph. The list items represent the bars with colors of pink to represent love and black to represent hate:

<style>
  ul.percentage { width: 100%; }
  ul.percentage li { display: block; width: 0; }
  ul.percentage li span { float: right; display: block; }
  ul.percentage li.love { background: #ff0066; color: #fff; }
  ul.percentage li.hate { background: #000; color: #fff; }
</style>

Finally, some client-side JavaScript allows the bars (the list items) to be resized dynamically based on the percentage values received from the server:

<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>
<script src="/socket.io/socket.io.js"></script>
<script>
  var socket = io.connect();
  jQuery(function ($) {
    var tweetList = $('ul.tweets'),
        loveCounter = $('li.love'),
        hateCounter = $('li.hate'),
        loveCounterPercentage = $('li.love span'),
        hateCounterPercentage = $('li.hate span');

    socket.on('tweet', function (data) {
      // The bar width uses the raw percentage; the label is rounded to one decimal place.
      loveCounter.css('width', data.love + '%');
      loveCounterPercentage.text(Math.round(data.love * 10) / 10 + '%');
      hateCounter.css('width', data.hate + '%');
      hateCounterPercentage.text(Math.round(data.hate * 10) / 10 + '%');
      tweetList.prepend('<li>' + data.user + ': ' + data.text + '</li>');
    });
  });
</script>

Whenever a new tweet event is received from Socket.IO, the bar graph is updated by dynamically setting the CSS width of the list items with the percentage values received from the server. This has the effect of adjusting the graph each time a new tweet event is received. You have created a real-time graph!
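The width and label calculations can be tried outside the browser. The sketch below mirrors the Math.round(value * 10) / 10 expression used for the labels, which rounds to one decimal place. barState is a hypothetical helper, and clamping the value to the range 0 to 100 is an extra safeguard added for this example, not part of the application.

```javascript
// Hypothetical helper: derives the CSS width and the text label for one bar.
function barState(percentage) {
  // Extra safeguard for this sketch: keep the value between 0 and 100.
  const clamped = Math.min(100, Math.max(0, percentage));
  return {
    width: clamped + '%',
    // Multiply, round, and divide to keep one decimal place.
    label: Math.round(clamped * 10) / 10 + '%'
  };
}

console.log(barState(75));      // a whole number passes through unchanged
console.log(barState(200 / 3)); // a long decimal gets a short label
```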

The application that you created provides a visual representation of whether there is more love than hate in the world based on real-time data from Twitter. Granted, this is totally unscientific, but it does showcase the capabilities of Node.js and Socket.IO to receive large amounts of data and to push it out to the browser. With a little more CSS work, the application can be styled to look better (see Figure 14.9).

Figure 14.9 The finished application with additional styling

If you want to run this example yourself, this version is available in the code for this book as hour14/example06.

Summary

In this hour, you answered a fundamental question about human nature using Node.js, Twitter, and Socket.IO. Not bad for an hour’s work! At the time of writing, there is more love in the world, so if you take nothing else from this hour, rejoice! You learned how a Node.js server can receive large amounts of data from a third-party service and push it out to the browser in real-time using Socket.IO. You saw how to manipulate the data to extract meaning from it and performed simple calculations on the data to extract percentage values. Finally, you added some client-side JavaScript to receive the data and create a real-time graph. This hour showcased many of the strengths of Node.js, including the ease with which data can be sent between the server and browser, the ability to process large amounts of data, and the strong support for networking.

Q&A

Q. Are there other streaming APIs that I can use to create applications like this?

A. Yes. An increasing number of streaming APIs are becoming available to developers. At the time of writing, some APIs of interest include Campfire, Salesforce, Datasift, and Apigee, with many more expected to be created.

Q. How accurate is this data?

A. Not very. This data is based on the "statuses/filter" method from Twitter’s streaming API. More information about what goes into this feed is available at https://dev.twitter.com/docs/streaming-api/methods. In short, do not base any anthropological studies on it.

Q. Can I save this data somewhere?

A. The application created in this hour does not persist data anywhere, so if the server is stopped, the counters and percentages are reset. Clearly, the longer that data can be collected, the more accurate the results. The application could be extended to store the counters in a data store that can handle high volumes of writes, such as Redis. This is outside the scope of this hour, though!
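As a much simpler stand-in for a real data store, the counters could be written to a JSON file from time to time and read back when the server starts. The sketch below uses only Node.js core modules. It is not Redis and would not cope with a high volume of writes, so it suits an occasional save (for example, on a timer) rather than one per tweet. saveCounters, loadCounters, and the file name are hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: writes the counters to disk as JSON.
function saveCounters(file, counters) {
  fs.writeFileSync(file, JSON.stringify(counters));
}

// Hypothetical helper: reads the counters back, or starts from zero if the
// file is missing or cannot be parsed.
function loadCounters(file) {
  try {
    return JSON.parse(fs.readFileSync(file, 'utf8'));
  } catch (err) {
    return { love: 0, hate: 0, total: 0 };
  }
}

// Usage: a temporary file keeps this example self-contained.
const file = path.join(os.tmpdir(), 'lovehate-counters-' + process.pid + '.json');
saveCounters(file, { love: 3, hate: 1, total: 4 });
const restored = loadCounters(file);
fs.unlinkSync(file); // clean up after the demonstration
console.log(restored);
```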

Workshop

This workshop contains quiz questions and exercises to help cement your learning in this hour.