The full working example used in this article can be downloaded from here.
The October 2018 issue of MSDN Magazine includes the article “Sentiment Analysis Using CNTK” by James McCaffrey. I wondered whether I could implement the same solution in ANNdotNET, as Dr. McCaffrey described it in the magazine. Indeed, I implemented the complete solution in less than 5 minutes.
In this blog post, I am going to walk you through the example from this well-written MSDN article. I am not going to repeat the text of the article, so I recommend reading it first, then coming back here to implement the example in ANNdotNET. Since ANNdotNET is a GUI tool, you can watch the visualizations during model training and evaluation. ANNdotNET also provides a complete binary model evaluation, including the confusion matrix, ROC curve, and other binary performance parameters, which makes this example more interesting and valuable to follow.
The whole example is implemented in five steps.
Step 1: Prepare Files and Folder Structure
First, we need to create several folders and files in order to create an empty annproject. This manual creation of folders is necessary because ANNdotNET v1.0 has no option to create an empty project. This will be added in the next version.
So first, create the following set of hierarchically ordered folders:
The following figure shows this set of folders:
Step 2: Download Data Sets Used in the Example
The only thing we need from the MSDN article is the train and test data sets. The data can be downloaded from the MSDN sample: Code_McCaffreyTestRun1018.zip. Once the zip file is downloaded, unzip the sample and copy the files imdb_sparse_train_50w.txt and imdb_sparse_test_50w.txt to the data folder, as the image above shows.
Step 3: Create MovieReview.ann and LSTM-Net.mlconfig Files
- Open Notepad and create a file with the following content:
project:|Name:MovieReview |Type:NoRawData |MLConfigs:LSTM-Net
parser:|RowSeparator:rn |ColumnSeparator: ; |Header:0 |SkipLines:0
Save the file in the SentimentAnalysis folder as MovieReview.ann. The following picture shows the saved annproject file on disk.
Now open Notepad again and create a new empty file. This file will become the mlconfig file, with the content shown below. Don’t worry about the content of the file, since all those details will be visible once we open it with ANNdotNET. If you want to know more about the structure of the mlconfig file, please refer to this wiki page of the ANNdotNET project.
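If you prefer a script to Notepad, the annproject file can also be written programmatically. The following is a minimal Python sketch, not part of ANNdotNET; the function name `write_annproject` and its `root` parameter are hypothetical, and the sketch only assumes that the file belongs in a folder named SentimentAnalysis, as described above.

```python
from pathlib import Path

# The two lines of the annproject file, copied from the listing above.
ANN_CONTENT = (
    "project:|Name:MovieReview |Type:NoRawData |MLConfigs:LSTM-Net\n"
    "parser:|RowSeparator:rn |ColumnSeparator: ; |Header:0 |SkipLines:0\n"
)


def write_annproject(root):
    """Create <root>/SentimentAnalysis/MovieReview.ann and return its path.

    `root` is a hypothetical parent folder chosen by the caller.
    """
    project_dir = Path(root) / "SentimentAnalysis"
    project_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    ann_path = project_dir / "MovieReview.ann"
    ann_path.write_text(ANN_CONTENT, encoding="utf-8")
    return ann_path
```

Calling `write_annproject(".")` would create the folder and the file under the current directory.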
features:|x 129892 1
labels:|y 2 0
network:|Layer:Embedding 50 0 0 None 0 0 |Layer:LSTM 25 25 0 TanH 1 1 |Layer:Dense 2 0 0 Softmax 0 0
learning:|Type:AdamLearner |LRate:0.01 |Momentum:0.85 |Loss:CrossEntropyWithSoftmax |Eval:ClassificationAccuracy |L1:0 |L2:0
training:|Type:Default |BatchSize:250 |Epochs:400 |Normalization:0 |RandomizeBatch:0 |SaveWhileTraining:0 |FullTrainingSetEval:1 |ProgressFrequency:1 |ContinueTraining:0 |TrainedModel:
paths:|Training:data\imdb_sparse_train_50w.txt |Validation:data\imdb_sparse_test_50w.txt |Test:data\imdb_sparse_test_50w.txt |TempModels:temp_models |Models:models|Result:LSTM-Net_result.csv |Logs:log
Save the file in the MovieReview folder under the name LSTM-Net.mlconfig. The next image shows where the mlconfig file is stored.
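To get a feeling for the layout of the mlconfig file, here is a small Python sketch that splits one of its lines into a section name and its items. It is only an illustration based on the listing above (a section name, a colon, then items separated by `|`), not the parser ANNdotNET itself uses; the function names are hypothetical.

```python
def parse_mlconfig_line(line):
    """Split 'section:|item |item ...' into (section, [items])."""
    section, _, rest = line.partition(":")
    items = [item.strip() for item in rest.split("|") if item.strip()]
    return section.strip(), items


def items_to_dict(items):
    """Turn 'Key:Value' items into a dict, splitting on the first colon only."""
    result = {}
    for item in items:
        key, _, value = item.partition(":")
        result[key.strip()] = value.strip()
    return result


# The learning line, copied from the mlconfig listing above.
LEARNING_LINE = (
    "learning:|Type:AdamLearner |LRate:0.01 |Momentum:0.85 "
    "|Loss:CrossEntropyWithSoftmax |Eval:ClassificationAccuracy |L1:0 |L2:0"
)

section, items = parse_mlconfig_line(LEARNING_LINE)
learning = items_to_dict(items)
```

All values stay strings in this sketch; lines such as `features:|x 129892 1` hold space-separated items rather than `Key:Value` pairs, so only `parse_mlconfig_line` applies to them.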
Step 4: Open annproject File with ANNdotNET GUI Tool
Now we have set up everything needed to open and train the sentiment analysis example with ANNdotNET. Since ANNdotNET implements MLEngine, which is based on CNTK, the data sets are compatible and can be read by the trainer. In order to get better results, we have changed the learning parameters a little bit. Instead of SGD, we used AdamLearner, as specified in the mlconfig file above.
In case you don’t have the ANNdotNET tool installed on your machine, just go to the release section and download the latest version, or clone the GitHub repository and run it within Visual Studio. All information about how to run ANNdotNET as a standalone application or as a Visual Studio solution can be found on the GitHub page https://github.com/bhrnjica/anndotnet.
After unzipping the ANNdotNET binaries on your machine, run the tool by selecting the anndotnet.wnd.exe file. Once ANNdotNET is running, click the Open application command and select the MovieReview.ann file. In a second, the application loads the project with the corresponding mlconfig file. In the project explorer, click the LSTM-Net tree item, and content similar to the image below should appear.
Everything we have written into the mlconfig file is now shown in the Network settings tab page:
- Input layer with 129892 dimensions
- Output layer with 2 dimensions (binary problem)
- Learning parameters:
- AdamLearner, with a learning rate of 0.01 and momentum of 0.85
- Loss function is CrossEntropyWithSoftmax
- Evaluation function is ClassificationAccuracy
- NNetwork Designer shows typical LSTM recurrent network
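The mlconfig file names CrossEntropyWithSoftmax as the loss and ClassificationAccuracy as the evaluation function. As a rough illustration of what these two quantities measure for a two-class output, here is a plain Python sketch. It is not the CNTK or ANNdotNET implementation; the function names and the sample values are made up for the example.

```python
import math


def softmax(logits):
    """Convert raw network outputs into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_with_softmax(logits, one_hot):
    """Negative log-probability assigned to the true class."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)


def classification_accuracy(batch_logits, batch_one_hot):
    """Fraction of samples whose largest output matches the true class."""
    correct = 0
    for logits, one_hot in zip(batch_logits, batch_one_hot):
        predicted = logits.index(max(logits))
        actual = one_hot.index(max(one_hot))
        correct += int(predicted == actual)
    return correct / len(batch_logits)
```

For example, when both outputs are equal the model assigns probability 0.5 to each class, so the loss equals ln 2, about 0.693.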
Step 5: Training and Evaluation of the Example
Now that we have reviewed the network settings, we can switch to the Train tab page and review the training parameters. Since we already set up the training parameters in the mlconfig file, we don’t need to change anything.
Start the training process by clicking the Run application command. After some time, we should see the following result:
If we switch to the Evaluation page, we can perform some statistical analysis to evaluate whether the model is good or not. Once the Evaluation tab page is shown, click the Refresh button to evaluate the model against the training and validation data sets.
The statistics on the left are for the training data set, and those on the right are for the validation data set. As can be seen, the model perfectly predicted all data from the training data set and reached about 70% accuracy on the validation data set. Of course, the model is not as good as we would expect for production, but it is good enough for this demonstration. There are also two buttons to show the ROC curve and other binary performance parameters for both data sets, which the reader may try.
That’s all that is needed to have a complete sentiment analysis example set up and running. In case you want the complete ANNdotNET project, it can be downloaded from here.
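For readers who want to see how numbers such as accuracy follow from a confusion matrix, here is a minimal Python sketch. It is independent of ANNdotNET and assumes that label 1 is the positive class; the function names and the sample labels are hypothetical.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        else:  # a == 1 and p == 0
            counts["fn"] += 1
    return counts


def binary_metrics(cm):
    """Derive accuracy, precision and recall from the four counts."""
    total = cm["tp"] + cm["fp"] + cm["tn"] + cm["fn"]
    predicted_pos = cm["tp"] + cm["fp"]
    actual_pos = cm["tp"] + cm["fn"]
    return {
        "accuracy": (cm["tp"] + cm["tn"]) / total if total else 0.0,
        "precision": cm["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": cm["tp"] / actual_pos if actual_pos else 0.0,
    }


# Made-up labels, only to exercise the functions.
cm = confusion_matrix([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
metrics = binary_metrics(cm)
```

A large gap between training and validation accuracy, like the one seen here, shows up directly when these metrics are computed separately for each data set.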
- 17th October, 2018: Initial version