Summary
MXNet is an open-source deep learning framework that allows you to define, train, and deploy deep neural networks on a wide array of devices, from cloud infrastructure to mobile devices. It is highly scalable, allowing for fast model training, and supports a flexible programming model and multiple languages. MXNet allows you to mix symbolic and imperative programming styles to maximize both efficiency and productivity. MXNet is built on a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient. The latest version of MXNet includes built-in support for the Intel® Math Kernel Library (Intel® MKL) 2017. The latest version of Intel MKL includes optimizations for the Intel® Advanced Vector Extensions 2 (Intel® AVX2) and AVX-512 instructions, which are supported on Intel® Xeon® and Intel® Xeon Phi™ processors.
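For context, here is what mixing the two styles looks like in Python (a minimal sketch; the shapes and layer names are arbitrary):

import mxnet as mx

# Imperative style: NDArray operations execute eagerly, much like NumPy.
a = mx.nd.ones((2, 3))
b = a * 2 + 1
print(b.asnumpy())

# Symbolic style: declare a computation graph first, then bind and run it.
data = mx.sym.Variable('data')
fc = mx.sym.FullyConnected(data=data, num_hidden=10, name='fc1')
out = mx.sym.Activation(data=fc, act_type='relu', name='relu1')
ex = out.simple_bind(ctx=mx.cpu(), data=(4, 20))
ex.forward(is_train=False, data=mx.nd.ones((4, 20)))
print(ex.outputs[0].shape)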
Prerequisites
Follow the instructions given here.
Building/Installing with MKL
MXNet can be installed and used with several combinations of development tools and libraries on a variety of platforms. This tutorial provides one such recipe describing steps to build and install MXNet with Intel MKL 2017 on CentOS*- and Ubuntu*-based systems.
1. Clone the mxnet tree and pull down its submodule dependencies
git clone https://github.com/dmlc/mxnet.git
cd mxnet
git submodule update --init --recursive
2. Edit the following lines in make/config.mk, setting them to "1" to enable MKL support.
With these options enabled, the build will automatically download the latest MKL package and install it on your system.
USE_MKL2017 = 1
USE_MKL2017_EXPERIMENTAL = 1
3. Build the mxnet library
# Use one build job per logical core (physical cores x 2 with Hyper-Threading)
NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))
make -j $NUM_THREADS
4. Install the Python modules
cd python
python setup.py install
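Once installed, a quick sanity check confirms the module imports and runs (a minimal sketch; the array values are arbitrary):

import mxnet as mx

# Confirm the module imports and print the installed version.
print(mx.__version__)

# Run a trivial CPU computation to confirm the library works end to end.
a = mx.nd.ones((2, 2)) * 2
print(a.asnumpy())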
Benchmarks
A range of standard image classification benchmarks can be found under example/image-classification. We'll focus on a benchmark meant to test inference performance across a range of topologies.
Running the Inference Benchmark:
The provided benchmark_score.py will run a variety of standard topologies (AlexNet, Inception, ResNet, etc.) at a range of batch sizes and report the results in images/sec. Before running, set the following environment variables for optimal performance:
# Match the number of OpenMP threads to the number of logical cores
export OMP_NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))
# Pin threads to hardware contexts for consistent performance
export KMP_AFFINITY=granularity=fine,compact,1,0
Then run the benchmark by doing:
python benchmark_score.py
If everything is installed correctly, you should see image/sec numbers for a variety of topologies and batch sizes. For example:
INFO:root:network: alexnet
INFO:root:device: cpu(0)
INFO:root:batch size 1, image/sec: XXX
INFO:root:batch size 2, image/sec: XXX
…
INFO:root:batch size 32, image/sec: XXX
INFO:root:network: vgg
INFO:root:device: cpu(0)
INFO:root:batch size 1, image/sec: XXX
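For reference, benchmark_score.py essentially times repeated forward passes over each network. The sketch below shows the general shape of such a scoring loop (the toy network, batch size, and iteration count here are illustrative, not what the actual script uses):

import time
import mxnet as mx

# Toy network standing in for a real topology such as AlexNet or ResNet.
data = mx.sym.Variable('data')
net = mx.sym.FullyConnected(data=data, num_hidden=512, name='fc1')
net = mx.sym.Activation(data=net, act_type='relu', name='relu1')
net = mx.sym.FullyConnected(data=net, num_hidden=10, name='fc2')

batch_size = 32
# Bind the symbol for inference only (no gradient buffers).
ex = net.simple_bind(ctx=mx.cpu(), grad_req='null', data=(batch_size, 3 * 224 * 224))
batch = mx.nd.ones((batch_size, 3 * 224 * 224))

# Warm up, then time forward passes; wait_to_read() forces MXNet's
# asynchronous engine to finish before the clock stops.
for _ in range(5):
    ex.forward(is_train=False, data=batch)
    ex.outputs[0].wait_to_read()

n_iter = 50
start = time.time()
for _ in range(n_iter):
    ex.forward(is_train=False, data=batch)
    ex.outputs[0].wait_to_read()
elapsed = time.time() - start
print('batch size %d, image/sec: %.1f' % (batch_size, n_iter * batch_size / elapsed))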