
How We Reduced CPU Usage on Our Front-End Web Servers

3 Jun 2014 | CPOL | 4 min read
How to reduce CPU usage on our front-end web servers

Despite our focus on web performance and load testing, we recently ran into performance issues caused by high CPU usage on some of our front-end web servers.

At Betclic, we use StructureMap (2.6.4) as the main IoC container in an ASP.NET MVC 4 project that also uses ASP.NET Web API. It is used across the website following best practices; nothing special here. We also care about clean architecture, which is why we rely heavily on dependency resolution in a multi-tenancy-like code base.

In this post, I explain how we handled this issue and the methodology I recommend for this kind of problem. It is a variation of Deming's cycle (Plan-Do-Check-Act).

Note: For confidentiality reasons, the details provided are not complete, but you can trust me.

The Problem

Starting in February 2014, we began to see the following CPU usage pattern. It did not appear after one particular deployment; instead, we observed a gradual increase after each deployment of new features.

[Figure: CPU usage on a front-end web server before the fix]

One important thing to note: unlike many websites, our activity is concentrated in the evenings after working hours and during live sports events.

Why is high CPU usage so problematic? It is just one of many possible bottlenecks, but it limits scalability, increases response times, and raises the probability of component, application, or server failure. Applications can also become sluggish or stop responding completely.

Detecting a performance issue early is very important but it’s only half of the job; you still have to understand and explain why.

Step 1: Analyze the Crime Scene / Understand the Context

Here is the crime scene: let's find the clues and analyze the evidence. Your first mission is to understand what happened.

  • How were page and API response times for users during the issue?
  • Do you know how requests were distributed?
  • Do the application or web server logs show anything abnormal?
  • Did anyone report an outage?

A lack of tools, logs, and metrics makes these questions much harder to answer.

One key thing to notice on the graph above: CPU usage was already at 30% very early in the morning, when our activity was low.

Step 2: Try to Reproduce

We created integration and web tests based on IIS logs and data from our monitoring tools.

Running these tests under the Visual Studio Profiler, locally or remotely during load testing, led to the same result: GetNestedContainer is a CPU-intensive operation (around 25% of CPU time). This is not a guess; it is something we measured.

A very basic microbenchmark on GetNestedContainer confirmed it: the Stopwatch measured around 15 ms per call, compared with about 40 ms for an entire API action.
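A microbenchmark of this kind can be as small as the sketch below. It is not the original benchmark: the `MicroBenchmark` class and its `AverageMilliseconds` method are hypothetical names, and the helper uses only `System.Diagnostics.Stopwatch`, so any `Action` can be measured. Against StructureMap, the action would be something like `() => container.GetNestedContainer().Dispose()`, assuming an already configured container.

```csharp
using System;
using System.Diagnostics;

// Hypothetical Stopwatch-based microbenchmark helper (not the original code).
public static class MicroBenchmark
{
    // Calls 'action' 'warmup' times first so JIT compilation is not measured,
    // then returns the average elapsed milliseconds per call over 'iterations' calls.
    public static double AverageMilliseconds(Action action, int iterations, int warmup = 10)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        for (int i = 0; i < warmup; i++)
        {
            action();
        }

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

A microbenchmark like this only isolates a single call; the profiler runs under load remain the stronger evidence, and the two agreeing is what makes the diagnosis convincing.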

This method is called only once, inside our implementation of IDependencyResolver.BeginScope:

C#
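public IDependencyScope BeginScope()
{
    // Web API calls BeginScope for every request; each call creates a
    // StructureMap nested container, which is where the CPU time goes.
    IContainer child = this.Container.GetNestedContainer();
    return new StructureMapHttpDependencyResolver(child);
}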

Step 3: Understand and Try to Improve

With ASP.NET Web API, BeginScope is called for every request. Because our real-time display generates many Ajax requests, this is a serious bottleneck.

Digging deeper showed that ToNestedGraph takes a lock and uses reflection to clone each instance factory. Reflection is often costly, and it is in fact the main problem here.

C#
// From StructureMap 2.6.4 (PipelineGraph.ToNestedGraph)
public PipelineGraph ToNestedGraph()
{
    PipelineGraph clone = new PipelineGraph(this._profileManager.Clone(),
                          this._genericsGraph.Clone(), this._log)
    {
        _missingFactory = this._missingFactory
    };
    // A lock on this PipelineGraph is held while every registered factory is
    // cloned, so the cost grows with the number of registered instance factories.
    Monitor.Enter(this);
    try
    {
        foreach (KeyValuePair<Type, IInstanceFactory> pair in this._factories)
        {
            clone._factories.Add(pair.Key, pair.Value.Clone());
        }
    }
    finally
    {
        Monitor.Exit(this);
    }
    clone.EjectAllInstancesOf<IContainer>();
    return clone;
}
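To see why the number of registrations matters, here is a self-contained sketch that mimics the shape of ToNestedGraph: take a lock, then clone every factory through reflection. `FakePipelineGraph`, `FakeInstanceFactory` and `ReflectionClone` are hypothetical stand-ins; StructureMap's actual `IInstanceFactory.Clone()` implementation is not reproduced here, only the pattern of one reflective copy per registration.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical stand-in for an instance factory (not StructureMap's class).
public class FakeInstanceFactory
{
    public string PluginTypeName;
    public string Lifecycle;
    public List<string> Instances = new List<string>();
}

// Hypothetical stand-in for PipelineGraph with the same ToNestedGraph shape.
public class FakePipelineGraph
{
    private readonly Dictionary<string, FakeInstanceFactory> _factories =
        new Dictionary<string, FakeInstanceFactory>();
    private readonly object _sync = new object();

    public int Count
    {
        get { return _factories.Count; }
    }

    public void Register(string pluginTypeName)
    {
        _factories[pluginTypeName] = new FakeInstanceFactory
        {
            PluginTypeName = pluginTypeName,
            Lifecycle = "Transient"
        };
    }

    // Lock, then clone every registered factory: one reflective copy per entry.
    public FakePipelineGraph ToNestedGraph()
    {
        FakePipelineGraph clone = new FakePipelineGraph();
        lock (_sync)
        {
            foreach (KeyValuePair<string, FakeInstanceFactory> pair in _factories)
            {
                clone._factories.Add(pair.Key, ReflectionClone(pair.Value));
            }
        }
        return clone;
    }

    // Shallow copy that reads and writes every instance field via reflection.
    private static T ReflectionClone<T>(T source) where T : class, new()
    {
        T copy = new T();
        FieldInfo[] fields = typeof(T).GetFields(
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
        foreach (FieldInfo field in fields)
        {
            field.SetValue(copy, field.GetValue(source));
        }
        return copy;
    }
}
```

With 400 registrations, every nested graph performs 400 reflective copies while holding the lock; with 15, it performs 15. Timing both cases with a Stopwatch is a quick way to see the per-scope cost scale with the registration count.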

There is a wide range of ways to avoid cloning all our definitions. We chose to create a separate container holding only the ASP.NET Web API definitions, with valid lifecycles. The number of instance factories dropped to 15 (previously more than 400, shared between ASP.NET MVC and ASP.NET Web API).
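The shape of this fix can be pictured with a minimal sketch. It deliberately does not use StructureMap: `SimpleContainer`, `ApiDependencyResolver`, `IOddsService` and `OddsService` are all hypothetical names, and the real resolver implements Web API's `IDependencyResolver`, which is left out here to keep the example self-contained. What it illustrates is the idea described above: the Web API resolver only ever nests a dedicated, small container, so each per-request scope copies a handful of registrations instead of hundreds.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal container (not StructureMap): factories keyed by type.
public class SimpleContainer
{
    private readonly Dictionary<Type, Func<object>> _factories;

    public SimpleContainer() : this(new Dictionary<Type, Func<object>>()) { }

    private SimpleContainer(Dictionary<Type, Func<object>> factories)
    {
        _factories = factories;
    }

    public int RegistrationCount
    {
        get { return _factories.Count; }
    }

    public SimpleContainer Register<T>(Func<T> factory) where T : class
    {
        _factories[typeof(T)] = () => factory();
        return this;
    }

    public T Resolve<T>() where T : class
    {
        return (T)_factories[typeof(T)]();
    }

    // Like a nested container, copies every registration:
    // the cost of each call is proportional to RegistrationCount.
    public SimpleContainer GetNestedContainer()
    {
        return new SimpleContainer(new Dictionary<Type, Func<object>>(_factories));
    }
}

// Hypothetical Web API-side resolver: BeginScope nests the API-only container.
public class ApiDependencyResolver
{
    public ApiDependencyResolver(SimpleContainer container)
    {
        Container = container;
    }

    public SimpleContainer Container { get; private set; }

    public ApiDependencyResolver BeginScope()
    {
        return new ApiDependencyResolver(Container.GetNestedContainer());
    }
}

// Hypothetical service used only by API controllers.
public interface IOddsService { string Name { get; } }

public class OddsService : IOddsService
{
    public string Name { get { return "odds"; } }
}
```

The MVC side keeps its own, larger container; the design choice is simply that the per-request path never pays for registrations it cannot use.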

As with most performance issues, the fix required only a few lines of code.

Step 4: Check Locally and on Test Rig

With appropriate scenarios (Step 2) and a precise understanding of the problem, it is easy to evaluate how effective the fix is. Do not hesitate to repeat these steps as many times as needed until you reach your goals.

Step 5: Deploy and Check on Production

Once the fix was deployed, the situation was clearly better. Here is the CPU usage for the same server with nearly the same activity:

[Figure: CPU usage on the same server after the fix]

It’s quite a good job, sir!

Conclusion and Lessons Learned

Being metrics-oriented is the first key lesson here. It lets you react proactively to the first abnormal behavior and resolve it before users are really impacted.

It is also very important to understand the circumstances that lead to this kind of undesirable situation. IIS and application logs, real user monitoring, and similar data are the minimum viable starting point. We had load-testing scenarios, but they clearly did not match real activity, especially in the evenings when live activity is high.

Finally, perhaps the most important point: should I blame StructureMap? Not at all, and I won't. Its team provides a great IoC framework; our implementation simply reached the framework's limits. You can run into the same kind of problem with any external framework in your code. Our context is very specific, but so is every project's.

To conclude, my guideline when working on performance is quite simple.

“For every line of code executed on your server, you should have a very good explanation.” – Me


License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Chief Technology Officer Betclic
France
I am Head of Software Development at Betclic France. I manage the Paris Dev Team, consisting of 35+ talented people, in various technical and functional projects in the fields of sports betting, poker, casino or horse betting.

Check out our technical blog at https://techblog.betclicgroup.com