Posted 10 Jan 2019



Tools to Help Optimize Deep-Learning Performance

This article explores how developers can make deep-learning applications faster and more efficient by taking advantage of tools that optimize deep-learning code.

This article is in the Product Showcase section for our sponsors at CodeProject. These articles are intended to provide you with information on products and services that we consider useful and of value to developers.

Machine learning and deep learning are becoming more and more sophisticated every year. In the 1990s, the early days of deep-learning software, recognizing digits was considered state-of-the-art. By the early 2010s, deep-learning researchers had moved on to more impressive tasks, such as image classification. And in the past couple of years, we’ve seen the application of deep learning for solving challenges like large-scale license plate recognition at speeds as fast as 3 milliseconds per plate.

But there is a challenge that programmers need to address if deep learning is to keep evolving at this pace. As deep-learning algorithms grow increasingly complex, and as the amount of data they parse grows larger and larger, delivering results quickly and affordably becomes harder and harder. That is because the more complex a deep-learning algorithm is, the more hardware resources it demands in order to perform at an acceptable pace.

A secondary factor at play is the fact that deep-learning code is now moving from the laboratories of academic researchers or corporate R&D departments into real-world production use. Therefore, guaranteeing deep-learning performance is becoming more and more crucial. In the lab, performance may not be as critical a consideration as the sophistication of the deep-learning model that you create. But in the real world, the speed at which you achieve results matters.

With these challenges in mind, this article explores how developers can make deep-learning applications faster and more efficient by taking advantage of tools that optimize deep-learning code.

How not to improve deep-learning performance

One solution to the conundrum of assuring deep-learning software performance is to throw more hardware at the environments that host deep-learning algorithms. If you add more CPUs, GPUs, memory and so on, you’ll likely achieve faster performance.

On its own, however, adding hardware is neither efficient nor cost-effective. It helps to a degree, and hardware infrastructure will need to keep expanding to support the increasingly complex deep-learning algorithms of the future. But it should not be the only way that programmers respond to the challenge of guaranteeing acceptable performance for deep-learning code.

A better answer: Deep-learning optimization

Instead, now more than ever, developers should take care to optimize the deep-learning code they write.

While most developers already understand the importance of optimization in principle, actually optimizing code can be more difficult in practice. This is especially true in the context of deep-learning applications, where data quality and algorithmic sophistication often take priority over low-level code optimizations.

In addition, deep-learning applications that were written for experimental purposes, without performance as a primary goal, can be difficult to optimize when you later decide to use them in a context where performance and efficiency matter more. It’s typically easier to optimize code that you are building from the ground up than to improve the efficiency of code that already exists.

While these may be challenging obstacles to overcome, the fact remains that under-optimized deep-learning software is unlikely to keep pace with the evolving needs and expectations of the deep-learning field. It’s also poorly positioned to move beyond the experimental phase and achieve real-world adoption.

Deep-learning optimization tools

That’s why it’s worth taking a look at the tools that are available for optimizing deep-learning code. Below are several such tools.

When identifying these tools, we searched for those that can accommodate existing deep-learning applications by helping to optimize the way they run with minimal recoding or code refactoring required. We’ve also included some optimization tools that are best used when writing a deep-learning application from scratch.

We should also note that, for the purposes of this article, "optimization" entails two main goals. The first is improving the speed at which a deep-learning application runs. The second is improving an application’s efficiency by allowing it to consume fewer hardware resources while retaining the same level of performance. Both goals are important for improving deep-learning performance and addressing the challenges described above.
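Both goals are easier to pursue when they are measured. The following is a minimal sketch, using only the Python standard library, of how you might record run time and peak memory for a piece of code before and after an optimization. The `measure` helper and the two toy workloads are illustrative names invented for this example and are not part of any tool discussed in this article.

```python
import time
import tracemalloc


def measure(fn, *args, **kwargs):
    """Run fn once; return (result, elapsed seconds, peak bytes traced)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) for Python allocations.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def sum_squares_list(n):
    # Builds the full list in memory before summing it.
    return sum([i * i for i in range(n)])


def sum_squares_gen(n):
    # Streams values one at a time, so peak memory stays small.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    for fn in (sum_squares_list, sum_squares_gen):
        result, elapsed, peak = measure(fn, 200_000)
        print(f"{fn.__name__}: time={elapsed:.4f}s peak={peak} bytes")
```

Note that `tracemalloc` only sees memory allocated through Python and adds overhead of its own, so the timings it produces are best used for relative comparison rather than as absolute figures.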


Intel MKL-DNN

The Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) provides a range of functions for building deep neural networks. The functions are optimized for high performance on Intel’s Atom, Core and Xeon processors, as well as compatible devices.

MKL-DNN thus provides both ready-made functions that are likely to be useful for a variety of deep-learning applications and code-level optimizations within those functions that improve performance and efficiency. The library is an easy way to build high-performing deep-learning software without having to spend much time optimizing code yourself.

A key advantage of MKL-DNN is that the library’s functions can be called by any C or C++ application. Therefore, you don’t have to write an application from scratch in order to take advantage of the library. You can take an existing C or C++ application and call MKL-DNN functions from it, or use those functions to replace unoptimized ones, without having to overhaul your codebase.

The MKL-DNN open source code and documentation are available on GitHub.


BigDL

BigDL is designed to do what its name suggests: enable deep learning in a Big Data framework. Specifically, BigDL provides a library of deep-learning functions for use with Apache Spark.

BigDL functions achieve a high level of performance by taking advantage of the Intel® Math Kernel Library (Intel® MKL), a general-purpose library for a broad range of compute-intensive programming tasks, not just those related to deep learning or machine learning. MKL is optimized for performance on Intel processors. In addition, BigDL runs multiple threads within each Spark task.

Thanks to these optimizations, BigDL provides performance that is "orders of magnitude faster" than other, non-optimized deep-learning frameworks, according to BigDL developers.
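To illustrate the idea of running several threads inside each task, here is a small standard-library sketch. It does not use Spark or any BigDL API. The `score`, `run_task` and `run_job` functions are hypothetical stand-ins that only show the shape of the approach: data is split into partitions, and the records within each partition are spread across a pool of threads.

```python
from concurrent.futures import ThreadPoolExecutor


def score(record):
    # Stand-in for per-record model work (hypothetical; not a BigDL call).
    return sum(record) / len(record)


def run_task(partition, threads_per_task=4):
    """Process one partition, spreading its records across several threads."""
    with ThreadPoolExecutor(max_workers=threads_per_task) as pool:
        # map() preserves input order, so results line up with the records.
        return list(pool.map(score, partition))


def run_job(partitions, threads_per_task=4):
    # In Spark, each partition would be handled by a separate task, possibly
    # on a different machine; here the tasks simply run one after another.
    return [run_task(p, threads_per_task) for p in partitions]


if __name__ == "__main__":
    data = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9]]]
    print(run_job(data))
```

In CPython, threads do not speed up pure-Python arithmetic like this because of the global interpreter lock. The benefit appears when the per-record work happens in native code, which is the situation a library built on MKL is in.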


DLOPT

The Deep Learning Optimization Library, or DLOPT, is a tool that takes a somewhat different approach to optimizing deep-learning code. Its goal is not to provide hardware-optimized functions for common deep-learning tasks, but rather to simplify the process of designing and implementing a highly efficient deep-learning application architecture.

In other words, the optimizations it provides center on the architectural dimension of deep-learning software, rather than optimizing execution at the hardware level.

DLOPT, which is written in Python and depends on Keras and TensorFlow, is a new project created in just the past year. So far, it has been used only for research purposes, but its creators say in a paper describing the tool that the library is fully functional and that they plan to transfer it into commercial use. If you’re looking for very new, cutting-edge deep-learning optimization tools, DLOPT might be a good place to start.
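To give a sense of what optimizing at the architectural level can mean, the sketch below enumerates candidate fully connected network shapes and keeps those that fit within a parameter budget. This is not DLOPT's API or its actual method. The function names are invented for this example, and a real tool would also weigh how accurate each candidate turns out to be.

```python
from itertools import product


def dense_param_count(n_inputs, hidden_layers, n_outputs):
    """Count the weights and biases in a fully connected network."""
    sizes = [n_inputs, *hidden_layers, n_outputs]
    # Each layer has (fan_in * fan_out) weights plus fan_out biases.
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


def candidates_within_budget(n_inputs, n_outputs, widths, max_depth, budget):
    """Return sorted (param_count, hidden_layers) pairs that fit the budget."""
    found = []
    for depth in range(1, max_depth + 1):
        # Try every combination of the allowed widths at this depth.
        for hidden in product(widths, repeat=depth):
            count = dense_param_count(n_inputs, hidden, n_outputs)
            if count <= budget:
                found.append((count, hidden))
    return sorted(found)


if __name__ == "__main__":
    for count, hidden in candidates_within_budget(
        n_inputs=784, n_outputs=10, widths=(32, 64), max_depth=2, budget=30_000
    ):
        print(count, hidden)
```

Even in this toy search space, the first hidden layer's width dominates the cost, which is the kind of structural insight that hardware-level tuning alone does not surface.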


McTorch

McTorch is another very new Python-based deep-learning library designed for architectural optimization, introduced this year by researchers. It is based on PyTorch, an open source machine-learning library that has been around for a couple of years.

The main difference between McTorch and PyTorch is that the former is designed to make it easier to implement manifold constraints for deep-learning applications in an optimized way, using the same methods as Manopt and Pymanopt.
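A manifold constraint restricts parameters to a curved surface, such as the set of unit-length vectors. The sketch below shows the general technique of gradient descent on a manifold in plain Python, for the simple case of the unit sphere. It does not use McTorch, PyTorch, Manopt or Pymanopt, and the function names are invented for this example. The objective, f(x) = -c·x, is chosen only because its constrained minimum is known to be c divided by its length.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def sphere_gradient_descent(c, x0, lr=0.1, steps=200):
    """Minimize f(x) = -c.x subject to the constraint ||x|| = 1."""
    x = normalize(x0)
    for _ in range(steps):
        grad = [-ci for ci in c]  # Euclidean gradient of f
        radial = dot(grad, x)
        # Project onto the tangent space at x by dropping the radial part.
        rgrad = [g - radial * xi for g, xi in zip(grad, x)]
        # Take the step, then retract onto the sphere by renormalizing.
        x = normalize([xi - lr * g for xi, g in zip(x, rgrad)])
    return x


if __name__ == "__main__":
    print(sphere_gradient_descent(c=[3.0, 4.0, 0.0], x0=[1.0, 0.0, 0.0]))
```

The two manifold-specific steps are the projection of the gradient and the retraction after the update. Everything else is ordinary gradient descent, which suggests why a library that handles those two steps behind the scenes can make constrained models easier to write.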

OpenVINO (Open Visual Inference and Neural Network Optimization)

The Intel® Distribution of OpenVINO™ toolkit is a production-ready software toolkit for building high-performing deep-learning applications that emulate human vision. It includes optimized functions and supports deployment across heterogeneous compute environments, which means you can easily take advantage of GPUs and other vision accelerators to achieve massive parallel processing.

The toolkit is most useful if you’re creating a new vision application by leveraging the 30+ optimized pre-trained models that are included in the package, although its unified common API could be incorporated into existing deep-learning codebases in order to improve performance and efficiency.

The OpenVINO™ toolkit is available as an open source product on GitHub, and contains the Deep Learning Deployment Toolkit (DLDT) and an open model zoo.


nGraph

Last but not least on our list of deep-learning optimization tools is nGraph, another deep-learning toolset from Intel. nGraph is currently under development and available only in beta form, but the goal is for it to provide a C++ library, specialized compiler and runtime accelerator designed to make it easy to implement and deploy high-performing deep-learning code.

nGraph works via "bridges" that connect it to other deep-learning frameworks. Prebuilt bridges are available for major frameworks like TensorFlow and MXNet, and it’s possible to implement a custom bridge for other frameworks.

Thus, nGraph is a very useful tool for taking a deep-learning application that you’ve already written in one framework and improving its performance and efficiency by taking advantage of an optimized compilation and execution suite.
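The bridge idea can be pictured as a set of small translators that all feed one shared execution layer. The sketch below is purely illustrative and uses no nGraph code. The two "frameworks", the tuple-based intermediate representation and every function name are hypothetical, chosen only to show why optimizing a single backend can benefit models written in different front ends.

```python
def run_backend(ir, x):
    """The one place where execution (and any optimization) would live."""
    ops = {"scale": lambda v, k: v * k, "shift": lambda v, k: v + k}
    for op, k in ir:
        x = ops[op](x, k)
    return x


def bridge_from_dicts(model):
    # Hypothetical framework A describes layers as dictionaries.
    return [(layer["type"], layer["value"]) for layer in model]


def bridge_from_string(model):
    # Hypothetical framework B describes layers as "op:value" text.
    ir = []
    for part in model.split(","):
        op, value = part.split(":")
        ir.append((op.strip(), float(value)))
    return ir


if __name__ == "__main__":
    model_a = [{"type": "scale", "value": 2.0}, {"type": "shift", "value": 1.0}]
    model_b = "scale:2, shift:1"
    # Both models translate to the same representation and run identically.
    print(run_backend(bridge_from_dicts(model_a), 3.0))
    print(run_backend(bridge_from_string(model_b), 3.0))
```

Supporting a new framework in this arrangement means writing one more bridge function, while the backend stays untouched.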


Writing deep-learning code is one thing. Writing (and executing) deep-learning code that runs at an acceptable speed without consuming unreasonable quantities of hardware resources is another.

The tools described above will help you to achieve the latter goal by helping you design and write better-performing deep-learning code, as well as (in some cases) optimizing the performance of deep-learning applications you’ve already written. With tools like these, making the leap from proof-of-concept deep-learning code to applications that are practical for production-level deployment is easier than ever.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author


Chris Riley is a technologist who has spent 12 years helping organizations transition from traditional development practices to a modern set of culture, processes and tooling. In addition to being a Gigaom Research analyst, he is an O’Reilly author, regular speaker, and subject matter expert in the areas of DevOps strategy and culture and Enterprise Content Management. Chris believes the biggest challenges faced in the tech market are not tools, but rather people and planning.

Throughout his career, Chris has crossed the roles of marketing, product management, and engineering to gain a unique perspective on how the deeply technical is used to solve real-world problems. By working with both early and late adopters, he has watched technologies mature from rough solutions into essential, transparent ones. In addition to spending his time understanding the market, he helps ISVs selling B2D and is a practitioner of DevOps strategy. He is interested in machine learning and the intersection of big data and information management.

