Maybe you’re like me and you have a few web properties or applications on the public web that are running and (according to your server monitoring) are functioning well. You haven’t heard of any problems with memory or processor utilization, and maybe you’ve even got them on a public cloud service like Azure, where they are scaled out automatically with new servers added when there is a flood of requests. But are the users, the visitors to your web applications, having a good experience? What is the REAL performance of the web application in the browser? How do you test these things and get any sense of how appealing the experience is to your visitors and customers?
I had heard about the Raygun Pulse product from Raygun's website while looking for some way to measure whether my Azure-based blog was serving content at an appropriate rate to my visitors. Raygun also provides real-time error and crash reporting for all major programming languages and platforms, but for now I really wanted to dive into these performance questions to find some answers.
My blog doesn’t serve new content every day, but when I do publish new content I want my cloud services to handle the load, and I think I have that configured properly. Or at least I thought I had my blog configured properly ... I decided to give Raygun Pulse a try and connect it to my Azure-hosted Wordpress blog to do some performance check-ups. You can use Pulse on any web application you want to monitor, and mobile application support is also on the way soon.
Sign up and Start up
Getting started with Raygun is a breeze. I was able to register with my Twitter account and request a 30-day demo of the Pulse monitoring system at www.raygun.com in just a few quick seconds. I then started the short process to configure my application for use with Raygun Pulse:
- I submitted the name of my application to the Raygun configuration:
- I chose what features from Raygun I wanted to enable with this application:
- I then was able to choose my application framework and get instructions specific to that framework to configure the standard Raygun monitoring. An API Key is assigned to the application and reported so that it can be used in the application configuration. In my case, there is a Wordpress plugin that I installed from the Wordpress admin console called "Raygun4WP".
- After a brief pause while my blog downloaded and installed the plugin, I filled out some configuration information in my blog console:
This console is really easy to understand, allowing me to configure the appropriate Wordpress errors that I want reported to me at the email address I configured during the signup process. The API Key is the key reported to me for this application in step 3. Tags are strings that you can use to help identify the application’s errors in the Raygun console. Finally, the domains to ignore setting is a comma-separated list of domains from which visits and errors should be ignored. This is really helpful to prevent error reports generated from test requests or administrative actions that originate from your corporate domain.
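To make the idea of a comma-separated ignore list concrete, here is a minimal Python sketch of how such a list could be parsed and applied. It is purely illustrative: the function names `parse_ignore_list` and `is_ignored` are hypothetical, the example.com domains are placeholders, and the subdomain matching rule is an assumption rather than a description of how the Raygun4WP plugin actually behaves.

```python
from urllib.parse import urlparse


def parse_ignore_list(raw):
    """Split a comma-separated domain list into a normalized set."""
    return {part.strip().lower() for part in raw.split(",") if part.strip()}


def is_ignored(url_or_host, ignored):
    """True if the host equals an ignored domain or is a subdomain of one.

    Assumption: subdomains of an ignored domain are ignored as well.
    """
    # urlparse only finds a hostname when a scheme is present, so fall
    # back to treating the input as a bare host name.
    host = urlparse(url_or_host).hostname or url_or_host
    host = host.lower()
    return any(host == d or host.endswith("." + d) for d in ignored)


# Placeholder domains, not taken from any real configuration.
ignored = parse_ignore_list("corp.example.com, localhost")
print(is_ignored("https://vpn.corp.example.com/wp-admin", ignored))
print(is_ignored("https://reader.example.org/", ignored))
```

Matching on a leading dot keeps a look-alike host such as notcorp.example.com from being treated as part of corp.example.com.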
I copied these two scripts from the Raygun website into the header.php file in use with my Wordpress theme, inserting them just before the </head> tag on the page.
With that last file saved, I navigated to my Raygun dashboard and was already greeted with some information about my Wordpress application tracking errors and visitor status. However, what I’m really interested in is the visitor performance information that Pulse delivers. I accessed that by choosing the Pulse menu option on the left side of the screen and started looking into what insights Pulse had about my visitors’ experience.
Pulse of my blog at a glance
Confession: I let Pulse run for a while and published a new blog post so that I could see some new visitors hit my blog with the Pulse monitor. The initial results were simply not what I expected:
The first screen of the Pulse dashboard shows you where visitors are "roughly" located geographically and some overall information about their experiences on your site. In my case, I have some visitors from North America, Spain, Africa, and Asia ... I never knew I had blog readers from Africa! I also could see from the top of the dashboard that my visitors are having a poor experience, with an eight-second load time ... not cool. I know from previous studies that the request time for a page should be less than 500ms (half a second); any longer and visitors think that your site is slow and decide to go somewhere else.
Oh, and this screen was updating LIVE in front of me. The Pulse dashboard is like my own personal website NORAD with instant information about the application experience reported to me. I saw some requests start to come in from Eastern Canada and even the UK shortly after I took the screenshot above. Neat stuff ... The green pings on the map are neat, but they do not look like they are intended to be accurate. Also, I cannot zoom in on the map and see more details on the visitors connecting from Europe from here, but I can get better geographical information from the Geo tab, which I’ll show you later.
After spending a few minutes watching the purple map light up with friends and readers visiting my blog, it was time to get down to business: why the heck is my blog so SLOW? I started digging into my problem by clicking the Performance link at the top of the dashboard.
Look at that, performance is rated as "Poor" with a median load time of 2 seconds for my pages, and even slower at the 90th percentile with an 8-second report and the 99th percentile reporting a 47-second load time! Ouch ... The most requested pages are listed below, and I see an immediate trend: the blue segments of these page load times (which indicate server load time) are very small; my Wordpress server is doing its job of delivering the HTML quickly. However, the child objects and references indicated in purple bars on these pages are taking FOREVER to load. Let’s drill in a little further by clicking on the name of that first article:
I can see that this page is rated as a poor performer, and the load times are dreadful. Looking further into it, the server is delivering the HTML in 645ms, which is a decent but not great time for the server to fetch and deliver HTML. The 2267ms to render the content in the browser is a problem. Let’s look further into that and the child elements that are taking 2354ms to load:
Now things are getting clearer. Based on the sampling of the almost 300 visits to this article, it's clear that there are some problems with the content being delivered. There are 50 images, scripts, and CSS files hosted either on my server or somewhere else that take a REALLY long time to deliver. Those red and blue lines are transfer times and server render times respectively. Both are way too long for the purposes of this static content and I should do something about them. Additionally, there are just WAY too many additional requests for content for this page. Granted: I used several code embedding scripts from gist.github.com to format sample code on the page, but there are still too many items on this page. Do I really need the Twitter sidebar and gizmos on my blog? Probably not, and I could save my visitors about a dozen requests for content.
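The same kind of triage can be done by hand on any list of child requests. The Python sketch below groups requests by host and totals their load times, so that third-party embeds stand out next to content served from the blog itself. It is only a sketch: the `summarize_by_host` helper is hypothetical, and the sample URLs and durations are invented for illustration, not taken from the Pulse dashboard.

```python
from collections import defaultdict
from urllib.parse import urlparse


def summarize_by_host(resources):
    """Group (url, duration_ms) pairs by host.

    Returns a list of (host, request_count, total_ms), slowest host first.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for url, duration_ms in resources:
        host = urlparse(url).hostname or "(unknown)"
        totals[host][0] += 1
        totals[host][1] += duration_ms
    rows = [(host, count, total) for host, (count, total) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical sample, not real measurements from the blog.
sample = [
    ("https://blog.example.com/wp-content/uploads/header.png", 7200.0),
    ("https://blog.example.com/wp-content/themes/site/style.css", 310.0),
    ("https://gist.github.com/embed-one.js", 450.0),
    ("https://gist.github.com/embed-two.js", 520.0),
    ("https://platform.twitter.com/widgets.js", 900.0),
]

for host, count, total_ms in summarize_by_host(sample):
    print(f"{host}: {count} requests, {total_ms:.0f} ms total")
```

Browsers fetch many resources in parallel, so the summed time per host overstates the wall-clock cost. It is still a useful way to rank which hosts deserve attention first.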
I’m also the victim of a number of Wordpress plugins that are doing a handful of tasks for me and injecting extra script on the page. Could I eliminate that content and rewrite it into something simpler and easier to deliver? Absolutely, and I know exactly what items are misbehaving and what to target. Those large 7+ second lines in this performance view need to be eliminated.
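The median, 90th percentile, and 99th percentile figures on the Performance tab describe how load times are distributed across visits, which is why a handful of very slow visits shows up in the high percentiles without moving the median much. Here is a minimal Python sketch of one common definition, the nearest-rank percentile. The sample load times are made up to loosely echo the shape of the numbers above, and there is no claim that Pulse computes its percentiles this way.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical page load times in seconds, not real dashboard data.
load_times = [1.2, 1.5, 1.8, 2.0, 2.1, 2.4, 3.0, 5.5, 8.0, 47.0]

for p in (50, 90, 99):
    print(f"p{p}: {percentile(load_times, p)} s")
```

With this sample, one 47-second visit dominates the 99th percentile while the median stays near 2 seconds, which is the same pattern the dashboard reported.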
What about the Platforms in use?
I saw the ‘Platforms’ button on the top and decided to take a look at the list of platforms or operating systems that my visitors are using. Given that I write a lot about .NET and Microsoft, I expected an overwhelming number of visitors to come from Windows. I was correct, but found another piece to my performance puzzle:
While 83% of my visitors are on Windows, 11% are on a mobile operating system. Going further, the mobile visitors are having a SIGNIFICANTLY worse experience on my blog. Look at those numbers from November 10th: average load times of more than 30 seconds on a mobile device. This information, coupled with the data about the really slow images that are loading on my blog, leads me to want to revisit the Wordpress plugins that I have installed specifically to support mobile visitors and to look at updating my theme to use smaller images.
Is Geography an Issue?
I wanted to take a look at one final thing. I was amazed earlier that I had visitors from Africa, and I had never considered that their geography may be causing some of the performance issues being reported. When I click the Geo tab at the top of the dashboard, I can see the world map and indicators of relative speed, as well as the breakdown of load times across various countries:
Most of my visitors are coming from Europe at the time of this screenshot, and I find that interesting given how American-centric my writing tends to be. Looking closer, I can see that the slowest load times are coming from Nigeria and Indonesia. Here’s the kicker: my blog runs from a server in North America, so I’ve already got a problem with distance to those visitors. With Brazil listed as the fourth country with the slowest visits, maybe I can look into Azure replicating my static content to a data center in the southern hemisphere. If my primary storage is hosted in North America, with a read-only backup running in Brazil, I should be able to deliver better performance and also gain a distributed backup of my content in case of a failure in the primary data center in North America.
With just a 10-minute investment up front to configure Raygun Pulse for my Wordpress blog, I was able to monitor my site for a few days free of charge to see how it performs. After publishing a new blog post to help generate some traffic, I was able to immediately see the visitors to my site on the dashboard in real time. I could see where my blog runs slow, and identify the pain points in the plugins and libraries that my web application references. I now have a baseline that I can compare against as I set out to tune my blog for better performance.