
Getting Started with Intel® Software Optimization for Theano and Intel® Distribution for Python

4 May 2017 · CPOL · 9 min read
Theano is a Python library developed at the LISA lab to define, optimize, and evaluate mathematical expressions, including those involving multi-dimensional arrays (numpy.ndarray).

This article is a sponsored article. Articles such as these are intended to provide you with information on products and services that we consider useful and of value to developers.


Summary

Theano is a Python* library developed at the LISA lab to define, optimize, and evaluate mathematical expressions, including those involving multi-dimensional arrays (numpy.ndarray). Intel® optimized-Theano is a new version based on Theano 0.8.0rc1, which is optimized for Intel® architecture and enables Intel® Math Kernel Library (Intel® MKL) 2017. The latest version of Intel MKL includes optimizations for the Intel® Advanced Vector Extensions 2 (Intel® AVX2) and AVX-512 instructions, which are supported by Intel® Xeon® and Intel® Xeon Phi™ processors.
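As a quick illustration of that define-compile-evaluate workflow, here is a minimal, generic Theano sketch (not specific to the Intel branch) that builds a symbolic expression, compiles it, and runs it on a NumPy array:

import numpy as np
import theano
import theano.tensor as T

# Define a symbolic expression: the logistic function over a matrix
x = T.dmatrix('x')
y = 1 / (1 + T.exp(-x))

# Compile the graph into a callable; this is where Theano applies its
# optimizations and links against the configured BLAS (Intel MKL here)
logistic = theano.function([x], y)

# Evaluate on a numpy.ndarray
print(logistic(np.array([[0.0, 1.0], [-1.0, -2.0]])))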

Theano can be installed and used with several combinations of development tools and libraries on a variety of platforms. This tutorial provides one such recipe, describing the steps to build and install Intel optimized-Theano with the Intel® compilers and Intel MKL 2017 on CentOS*- and Ubuntu*-based systems. We also verify the installation by running common industry-standard benchmarks like MNIST*, DBN-Kyoto*, LSTM*, and ImageNet*.

Prerequisites

Intel® Compilers and Intel® Math Kernel Library 2017

This tutorial assumes that the Intel compilers (C/C++ and Fortran) are already installed and verified. If not, the Intel compilers can be downloaded and installed as part of Intel® Parallel Studio XE or can be independently installed.

Installing Intel MKL 2017 is optional when using the Intel® Distribution for Python*. For other Python distributions, Intel MKL 2017 can be downloaded as part of Intel Parallel Studio XE 2017 or can be downloaded and installed for free using the community license. To download it, first register here for a free community license and follow the installation instructions.

Python* Tools

In this tutorial, the Intel® Distribution for Python* will be used, as it provides ready access to tools and techniques that are enabled and verified for higher performance on Intel architecture. This allows the use of Intel-optimized, precompiled tools like NumPy* and SciPy* without worrying about building and installing them.

The Intel Distribution for Python is available as part of Intel Parallel Studio XE or can be independently downloaded for free from here.

Instructions to install the Intel Distribution for Python are given below. This article assumes that the Python installation is completed in the local user account.

Python 2.7
tar -xvzf l_python27_p_2017.0.028.tgz
cd l_python27_p_2017.0.028
./install.sh

Python 3.5
tar -xvzf l_python35_p_2017.0.028.tgz
cd l_python35_p_2017.0.028
./install.sh

Using Anaconda, create an independent user environment using the steps given below. Here, the required NumPy, SciPy, and Cython packages are also installed along with the environment.

Python 2.7
conda create -n pcs_theano_2 -c intel python=2 numpy scipy cython
source activate pcs_theano_2

Python 3.5
conda create -n pcs_theano_2 -c intel python=3 numpy scipy cython
source activate pcs_theano_2

Alternatively, NumPy and SciPy can also be built and installed from source as given in Appendix A. Steps to install other Python development tools, which may be required if a non-Intel distribution of Python is used, are also shown there.

Building and installing Intel® Software Optimization for Theano*

The branch of Theano optimized for Intel architecture can be checked out and installed from the following git repository:

git clone https://github.com/intel/theano.git theano
cd theano
python setup.py build
python setup.py install
theano-cache clear

An example of the Theano configuration file is given below for reference. To use the Intel compilers and specify the compiler flags Theano should use, create a copy of this file in the user's home directory.

vi ~/.theanorc

[cuda]
root = /usr/local/cuda

[global]
device = cpu
floatX = float32
cxx = icpc
mode = FAST_RUN
openmp = True
openmp_elemwise_minsize = 10
[gcc]
cxxflags = -qopenmp -march=native -O3 -vec-report3 -fno-alias -opt-prefetch=2 -fp-trap=none
[blas]
ldflags = -lmkl_rt
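To confirm that Theano actually picked up this configuration, the compiler and OpenMP settings can be inspected from Python. A quick check, assuming the .theanorc above is in place, should look like:

python -c "import theano; print (theano.config.cxx)"
-> icpc
python -c "import theano; print (theano.config.openmp)"
-> True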

Verify Theano and NumPy Installation

It is important to verify which versions of the Theano and NumPy libraries are referenced once they are imported in Python. The versions of NumPy and Theano referenced in this article are verified as follows:

python -c "import numpy; print (numpy.__version__)"
->1.11.1
python -c "import theano; print (theano.__version__)"
-> 0.9.0dev1.dev-*

It is also important to verify that the installed versions of NumPy and Theano are using Intel MKL.

python -c "import theano; print (theano.numpy.show_config())"

Fig 1. Desired output for theano.numpy.show_config()
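Since the figure is not reproduced in text, note that the key indicator is Intel MKL appearing in NumPy's build configuration. The output should contain sections of roughly the following form (a sketch, not verbatim output; exact paths and fields vary by installation):

blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/intel/mkl/lib/intel64']
    include_dirs = ['/opt/intel/mkl/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    ...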

Benchmarks

The DBN-Kyoto and ImageNet benchmarks are available in the theano/democase directory.

DBN-Kyoto

Procuring the Dataset for Running DBN-Kyoto

The sample dataset can be downloaded for DBN-Kyoto from Dropbox via the following link: https://www.dropbox.com/s/ocjgzonmxpmerry/dataset1.pkl.7z?dl=0. Unzip the file and save it in the theano/democase/DBN-Kyoto directory.

Prerequisites

Dependencies for training DBN-Kyoto can be installed using Anaconda or built from the sources provided in the tools directory. Due to conflicts between the pandas library and Python 3, this benchmark is validated only for Python 2.7.

Python 2.7
conda install -c intel --override-channels pandas
conda install imaging

Alternatively, the dependencies can also be installed from source as given in Appendix B.

Running DBN-Kyoto on CPU

The provided run.sh script can be used to download the dataset (if not already present) and start the training.

cd theano/democase/DBN-Kyoto/
./run.sh

MNIST

In this article, we show how to train a neural network on MNIST using Lasagne, a lightweight library for building and training neural networks in Theano. The Lasagne library will be built and installed using the Intel compilers.
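For context, the core Lasagne pattern (stacking layers, then compiling a Theano training function) is sketched below; the mnist.py example used later follows the same pattern. The layer sizes here are illustrative only:

import theano
import theano.tensor as T
import lasagne

input_var = T.tensor4('inputs')
target_var = T.ivector('targets')

# A tiny MLP for 28x28 MNIST images with 10 output classes
network = lasagne.layers.InputLayer((None, 1, 28, 28), input_var=input_var)
network = lasagne.layers.DenseLayer(network, num_units=500,
                                    nonlinearity=lasagne.nonlinearities.rectify)
network = lasagne.layers.DenseLayer(network, num_units=10,
                                    nonlinearity=lasagne.nonlinearities.softmax)

# Loss and update rule, compiled into a Theano training function
prediction = lasagne.layers.get_output(network)
loss = lasagne.objectives.categorical_crossentropy(prediction, target_var).mean()
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.nesterov_momentum(loss, params,
                                            learning_rate=0.01, momentum=0.9)
train_fn = theano.function([input_var, target_var], loss, updates=updates)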

Download the MNIST Database

The MNIST database can be downloaded from http://yann.lecun.com/exdb/mnist/. We downloaded images and labels for both training and validation data.

Installing Lasagne Library

The latest version of the Lasagne library can be built and installed from the Lasagne git repository as given below:

Python 2.7 and Python 3.5
git clone https://github.com/Lasagne/Lasagne.git
cd Lasagne
python setup.py build
python setup.py install

Training

cd Lasagne/examples
python mnist.py [model [epochs]]
                    --  where model can be mlp - simple multi-layer perceptron (default) or
                         cnn - simple convolutional neural network,
                         and epochs = 500 (default)

AlexNet

Procuring the ImageNet dataset for AlexNet training

The ImageNet dataset can be obtained from the image-net.org website.

Prerequisites

Dependencies for training AlexNet can be installed using Anaconda or installed from the Fedora EPEL source repository. Currently, Hickle (a required dependency for preprocessing the data) is only available for Python 2 and is not supported on Python 3.

  • Installing h5py, pyyaml, pyzmq using Anaconda:
conda install h5py
conda install -c intel --override-channels pyyaml pyzmq
  • Installing Hickle (HDF5-based clone of Pickle):
git clone https://github.com/telegraphic/hickle.git
cd hickle
python setup.py build
python setup.py install

Alternatively, the dependencies can also be installed from source as given in Appendix B.

Preprocessing the ImageNet Dataset

Preprocessing is required to dump Hickle files and create labels for training and validation data.

  • Modify the paths.yaml file in the preprocessing directory to update the path for the dataset. One example of paths.yaml file is given below for reference.
cat theano/democase/alexnet_grp1/preprocessing/paths.yaml

train_img_dir: '/mnt/DATA2/TEST/ILSVRC2012_img_train/'
# the dir that contains folders like n01440764, n01443537, ...

val_img_dir: '/mnt/DATA2/TEST/ILSVRC2012_img_val/'
# the dir that contains ILSVRC2012_val_00000001~50000.JPEG

tar_root_dir: '/mnt/DATA2/TEST/parsed_data_toy'  # dir to store all the preprocessed files
tar_train_dir: '/mnt/DATA2/TEST/parsed_data_toy/train_hkl'  # dir to store training batches
tar_val_dir: '/mnt/DATA2/TEST/parsed_data_toy/val_hkl'  # dir to store validation batches
misc_dir: '/mnt/DATA2/TEST/parsed_data_toy/misc'
# dir to store img_mean.npy, shuffled_train_filenames.npy, train.txt, val.txt

meta_clsloc_mat: '/mnt/DATA2/imageNet-2012-images/ILSVRC2014_devkit/data/meta_clsloc.mat'
val_label_file: '/mnt/DATA2/imageNet-2012-images/ILSVRC2014_devkit/data/ILSVRC2014_clsloc_validation_ground_truth.txt'
# although from ILSVRC2014, these 2 files still work for ILSVRC2012

# caffe style train and validation labels
valtxt_filename: '/mnt/DATA2/TEST/parsed_data_toy/misc/val.txt'
traintxt_filename: '/mnt/DATA2/TEST/parsed_data_toy/misc/train.txt'

A toy data set can be created using the provided generate_toy_data.sh script:

cd theano/democase/alexnet_grp1/preprocessing
chmod u+x make_hkl.py make_labels.py make_train_val_txt.py
./generate_toy_data.sh

AlexNet training on CPU

  • Modify the config.yaml file to update the path to the preprocessed dataset:
cd theano/democase/alexnet_grp1/

# Sample changes to the paths for input (label_folder, mean_file) and output (weights_dir)
label_folder: /mnt/DATA2/TEST/parsed_data_toy/labels/
mean_file: /mnt/DATA2/TEST/parsed_data_toy/misc/img_mean.npy
weights_dir: ./weight/  # directory for saving weights and results
  • Similarly, modify the spec.yaml file to update the path to the parsed toy data set:
# Directories
train_folder: /mnt/DATA2/TEST/parsed_data_toy/train_hkl_b256_b256_bchw/
val_folder: /mnt/DATA2/TEST/parsed_data_toy/val_hkl_b256_b256_bchw/
  • Start the training:
./run.sh

Large Movie Review Dataset (IMDB)

The Large Movie Review Dataset example is a Recurrent Neural Network using a Long Short-Term Memory (LSTM) model. The IMDB data set is used for sentiment analysis on movie reviews with the LSTM model.

Procuring the Dataset

Obtain the imdb.pkl file from http://www-labs.iro.umontreal.ca/~lisa/deep/data/ and extract the file to a local folder.

Preprocessing

The http://deeplearning.net/tutorial/lstm.html page provides two scripts:

imdb.py – handles loading and preprocessing of the IMDB dataset.

lstm.py – the primary script that defines and trains the model.

Copy both of the above files into the same folder as the imdb.pkl file.
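With both scripts in place, the loader can be exercised directly as a quick sanity check. The sketch below assumes the load_data signature used by the deeplearning.net LSTM tutorial (n_words caps the vocabulary size, valid_portion splits off validation data); adjust it if your copy of imdb.py differs:

# Run from the folder containing imdb.py and imdb.pkl
import imdb

train, valid, test = imdb.load_data(path='imdb.pkl', n_words=10000,
                                    valid_portion=0.1)
# Each split is a (sequences, labels) pair
print(len(train[0]), len(valid[0]), len(test[0]))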

Training

Training can be started using the following command:

THEANO_FLAGS="floatX=float32" python lstm.py

Troubleshooting

Error 1: In some cases, you might see errors stating that libraries such as libmkl_rt.so or libimf.so cannot be opened. In this case, locate the libraries:

find /opt/intel -name library_name.so

Add the resulting paths to the /etc/ld.so.conf file and run the ldconfig command to link the libraries. Also make sure the MKL installation paths are set correctly in the LD_LIBRARY_PATH environment variable.
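For example, if the library is found under /opt/intel/mkl/lib/intel64 (the exact path varies with the installation and version):

find /opt/intel -name libmkl_rt.so
echo "/opt/intel/mkl/lib/intel64" | sudo tee -a /etc/ld.so.conf
sudo ldconfig
export LD_LIBRARY_PATH=/opt/intel/mkl/lib/intel64:$LD_LIBRARY_PATH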

Error 2: AlexNet preprocessing error for toy data

python make_hkl.py toy
generating toy dataset ...
Traceback (most recent call last):
  File "make_hkl.py", line 293, in <module>
    train_batchs_per_core)
ValueError: xrange() arg 3 must not be zero

The default number of processes used to preprocess ImageNet is currently set to 16. For the toy dataset, this creates more processes than required, causing the application to crash. To resolve this issue, change the number of processes in the file Alexnet_CPU/preprocessing/make_hkl.py:258 from 16 to 2. When preprocessing the full data set, however, a higher value of num_process is recommended for faster preprocessing.

num_process = 2

Error 3: Referencing the correct version of NumPy when installing the Intel(R) Distribution for Python* through Conda

If installing the Intel(R) Distribution for Python from within Conda instead of through the Intel(R) Distribution for Python installer, make sure that you set the PYTHONNOUSERSITE environment variable to True. This enables the Conda environment to reference the correct version of NumPy. This is a known error in Conda. More information can be found here.

export PYTHONNOUSERSITE=True


Appendix A

Installing Python* Tools for Other Python Distributions

CentOS:
Python 2.7 - sudo yum install python-devel python-setuptools
Python 3.5 - sudo yum install python35-libs python35-devel python35-setuptools
//Note - Python 3.5 packages can be obtained from Fedora EPEL source repository
Ubuntu:
Python 2.7 - sudo apt-get install python-dev python-setuptools
Python 3.5 - sudo apt-get install libpython3-dev python3-dev python3-setuptools
  • In case pip and Cython are not installed on the system, they can be installed using the following commands:
sudo -E easy_install pip
sudo -E pip install cython

Installing NumPy

NumPy is the fundamental package needed for scientific computing with Python. This package contains:

  1. A powerful N-dimensional array object
  2. Sophisticated (broadcasting) functions
  3. Tools for integrating C/C++ and Fortran code
  4. Useful linear algebra, Fourier transform, and random number capabilities.
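As a one-line illustration of broadcasting (item 2), NumPy automatically expands arrays of compatible shapes without materializing the expanded copies:

import numpy as np

a = np.arange(3).reshape(3, 1)   # shape (3, 1)
b = np.arange(4)                 # shape (4,)
print((a + b).shape)             # (3, 4): b is broadcast across a's rows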

Note: An older version of the NumPy library can be removed by verifying its existence and deleting the related files. However, in this tutorial all the remaining libraries will be installed in the user's local directory, so this step is optional. If required, old versions can be cleaned as follows:

  • Verify if old version exists:
python -c "import numpy; print numpy.version"
<module 'numpy.version' from '/home/plse/.local/lib/python2.7/site-packages/numpy-1.11.0rc1-py2.7-linux-x86_64.egg/numpy/version.pyc'>
  • Delete any previously installed NumPy packages:
rm -r /home/plse/.local/lib/python2.7/site-packages/numpy-1.11.0rc1-py2.7-linux-x86_64.egg
  • Building and installing NumPy optimized for Intel architecture:
git clone https://github.com/pcs-theano/numpy.git
//Update the site.cfg file to point to the required MKL directory. This step is optional if Parallel Studio or MKL were installed in the default /opt/intel directory.
python setup.py config --compiler=intelem build_clib --compiler=intelem build_ext --compiler=intelem install --user
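The site.cfg edit mentioned above typically amounts to pointing the [mkl] section at the MKL installation; a sketch assuming the default /opt/intel layout (adjust the paths for your system):

[mkl]
library_dirs = /opt/intel/mkl/lib/intel64
include_dirs = /opt/intel/mkl/include
mkl_libs = mkl_rt
lapack_libs =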

Installing SciPy

SciPy is an open source Python library used for scientific and technical computing. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers, and other tasks common in science and engineering.

  • Building and installing SciPy:
tar -xvzf scipy-0.16.1.tar.gz    (can be downloaded from: https://sourceforge.net/projects/scipy/files/scipy/0.16.1/  or 
     obtain the latest sources from https://github.com/scipy/scipy/releases) 
cd scipy-0.16.1/
python setup.py config --compiler=intelem --fcompiler=intelem build_clib --compiler=intelem --fcompiler=intelem build_ext --compiler=intelem --fcompiler=intelem install --user

Appendix B

Building and installing benchmark dependencies from source

DBN-Kyoto

//Untar and install all the provided tools:

cd theano/democase/DBN-Kyoto/tools
tar -xvzf Imaging-1.1.7.tar.gz
cd Imaging-1.1.7
python setup.py build
python setup.py install --user

cd theano/democase/DBN-Kyoto/tools
tar -xvzf python-dateutil-2.4.1.tar.gz
cd python-dateutil-2.4.1
python setup.py build
python setup.py install --user

cd theano/democase/DBN-Kyoto/tools
tar -xvzf pytz-2014.10.tar.gz
cd pytz-2014.10
python setup.py build
python setup.py install --user

cd theano/democase/DBN-Kyoto/tools
tar -xvzf pandas-0.15.2.tar.gz
cd pandas-0.15.2
python setup.py build
python setup.py install --user

AlexNet

  • Installing dependencies for AlexNet from source

Access to some of the add-on packages from the Fedora EPEL source repository may be required for running AlexNet on the CPU.

wget http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-8.noarch.rpm
sudo rpm -ihv epel-release-7-8.noarch.rpm
sudo yum install hdf5-devel
sudo yum install zmq-devel
sudo yum install zeromq-devel
sudo yum install python-zmq
  • Installing Hickle (HDF5-based clone of Pickle):
git clone https://github.com/telegraphic/hickle.git
python setup.py build install --user
  • Installing h5py (Python interface to HDF5 binary data format):
git clone https://github.com/h5py/h5py.git
python setup.py build install --user


About The Authors

Sunny Gogar
Software Engineer

Sunny Gogar received a Master’s degree in Electrical and Computer Engineering from the University of Florida, Gainesville and a Bachelor’s degree in Electronics and Telecommunications from the University of Mumbai, India. He is currently a software engineer with Intel Corporation's Software and Services Group. His interests include parallel programming and optimization for Multi-core and Many-core Processor Architectures.

Meghana Rao received a Master’s degree in Engineering and Technology Management from Portland State University and a Bachelor’s degree in Computer Science and Engineering from Bangalore University, India. She is a Developer Evangelist with the Software and Services Group at Intel focused on Machine Learning and Deep Learning.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).

