Posted 23 Mar 2013


# Multi-Linear Regression in Java

13 Mar 2017 · CPOL · 5 min read
Multi-linear regression/classification with simple examples and Java code
This article introduces multi-linear regression/classification with simple examples and provides the code in Java.

## Introduction

I introduce a very popular subject in statistical modelling: multi-linear (or multi-variate) regression (MLR), which can be used for both regression and classification. Using simple examples, I will show you how to use MLR. MLR has been used extensively in science (biological, pharmaceutical, financial, medical and more).

## Background

A few months ago, I wrote an article about matrix operations in Java. I suggest reading that article first, since the code in this article depends heavily on those matrix operations.

To understand multi-linear regression (MLR), have a look at the following table:

| Diet score | Male | Age > 20 | BMI |
|---|---|---|---|
| 4 | 0 | 1 | 27 |
| 7 | 1 | 1 | 29 |
| 6 | 1 | 0 | 23 |
| 2 | 0 | 0 | 20 |
| 3 | 0 | 1 | 21 |

The body mass index (BMI) of five people has been measured. For each person, the diet score, whether they are male or female, and whether they are older than 20 have also been recorded in three columns. Do not ask me what the diet score is or how it is measured, because I do not know; this is just a toy example. The question is: what is the relationship between BMI and diet score, gender and age? If we have the diet score, gender and age of a new person, can we predict his/her BMI? MLR is here to answer these questions. We expect the relationship between BMI and the three variables to be something like this, where Male and Age>20 are 1/0 indicator columns:

$$\text{BMI} = \beta_0 + \beta_1\,\text{Diet score} + \beta_2\,\text{Male} + \beta_3\,\text{Age}_{>20}$$

Based on this equation, to predict the BMI of a person with a known diet score, gender and age, you need the values of all the β coefficients. MLR finds the values of these unknown coefficients. We call β0 the bias (also known as the intercept). In most real-life applications, a large bias suggests that the predictors (i.e., the three variables) do not have enough predictive power, while a small bias is a good sign of a good predictive model. A large bias could also mean that there are other descriptors, not yet discovered, that can explain the observations.
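To make the role of the coefficients concrete, the following minimal sketch evaluates the equation for a new person once the betas are known. It assumes the coefficients are stored in a plain `double[]` with β0 first and the predictors in the same order as the table columns; `LinearModelSketch` and `predict` are hypothetical names, not part of the article's code.

```java
/** Hypothetical helper: evaluates y = beta[0] + beta[1]*x[0] + ... + beta[p]*x[p-1]. */
final class LinearModelSketch {

    private LinearModelSketch() {}

    /**
     * @param beta coefficients with the bias beta0 at index 0 (assumed layout)
     * @param x    predictor values for one person, e.g. {dietScore, male, over20}
     */
    static double predict(double[] beta, double[] x) {
        if (beta.length != x.length + 1) {
            throw new IllegalArgumentException("expected " + (x.length + 1) + " coefficients");
        }
        double y = beta[0];                 // start from the bias term
        for (int j = 0; j < x.length; j++) {
            y += beta[j + 1] * x[j];        // add each predictor's contribution
        }
        return y;
    }
}
```

Keeping β0 at index 0 matches the convention of adding the bias as the first column of X.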

Let's write the BMI column of the above table as a column matrix named Y, the values of all independent variables as a 5 x 3 matrix named X (five observations, three variables), and the beta values, which will be found later, as a column matrix b. The unknown matrix b can be found as:

$$b = (X'X)^{-1}X'Y$$

where X' is the transpose of matrix X and the superscript -1 denotes the matrix inverse.
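This formula is the least-squares solution: b is chosen to minimise the sum of squared differences between Y and Xb. A short derivation, assuming X'X is invertible (X'X is symmetric, so the derivative of b'X'Xb is 2X'Xb):

$$
\begin{aligned}
S(b) &= (Y - Xb)'(Y - Xb) = Y'Y - 2\,b'X'Y + b'X'Xb \\
\frac{\partial S}{\partial b} &= -2\,X'Y + 2\,X'Xb = 0 \\
\Rightarrow\quad X'Xb &= X'Y \quad\Longrightarrow\quad b = (X'X)^{-1}X'Y
\end{aligned}
$$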

If you want to include the bias β0, you need to add a new column to matrix X. This new column must be the first one and its value in every row must be 1, so in our example X becomes a 5 x 4 matrix.
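The following is a minimal sketch of adding the bias column on plain `double[][]` arrays. It mirrors the idea behind the article's `insertColumnWithValue1()`, but `BiasColumnSketch` and `withBiasColumn` are hypothetical names and this is not the author's implementation.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating how a bias column is prepended to X. */
final class BiasColumnSketch {

    private BiasColumnSketch() {}

    /** Returns a copy of x with a leading column of 1s, so an n x p matrix becomes n x (p + 1). */
    static double[][] withBiasColumn(double[][] x) {
        double[][] out = new double[x.length][];
        for (int i = 0; i < x.length; i++) {
            out[i] = new double[x[i].length + 1];
            out[i][0] = 1.0;                                     // bias column
            System.arraycopy(x[i], 0, out[i], 1, x[i].length);   // original predictors shifted right
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] x = {{4, 0, 1}, {7, 1, 1}};
        System.out.println(Arrays.deepToString(withBiasColumn(x)));
        // prints [[1.0, 4.0, 0.0, 1.0], [1.0, 7.0, 1.0, 1.0]]
    }
}
```

Returning a copy leaves the caller's X untouched, which avoids accidentally adding the column twice.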

Limitations of MLR: MLR works only when the number of columns in matrix X is less than or equal to the number of rows. In other words, the number of descriptors cannot exceed the number of observations. Another limitation concerns the inverse operation in the above equation. Not all matrices have an inverse; when X'X cannot be inverted (for example, when one column of X is a linear combination of the others), the calculation of b, and therefore MLR, fails. Other methods, such as Partial Least Squares or Support Vector Machines, work well where MLR fails.
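Both limitations can be checked before fitting, because X'X is invertible exactly when X has full column rank (and full column rank is impossible with fewer rows than columns). A minimal sketch, assuming plain `double[][]` input and an absolute pivot tolerance of 1e-10 chosen for illustration; `MlrFeasibilitySketch`, `rank` and `canFit` are hypothetical names.

```java
/** Hypothetical pre-check for the two MLR limitations. */
final class MlrFeasibilitySketch {

    private MlrFeasibilitySketch() {}

    /** Numerical rank via Gaussian elimination with partial pivoting. */
    static int rank(double[][] a, double tol) {
        int rows = a.length, cols = a[0].length;
        double[][] m = new double[rows][];
        for (int i = 0; i < rows; i++) m[i] = a[i].clone();     // work on a copy
        int rank = 0;
        for (int c = 0; c < cols && rank < rows; c++) {
            // choose the row with the largest absolute value in column c
            int pivot = rank;
            for (int r = rank + 1; r < rows; r++) {
                if (Math.abs(m[r][c]) > Math.abs(m[pivot][c])) pivot = r;
            }
            if (Math.abs(m[pivot][c]) <= tol) continue;          // column depends on earlier ones
            double[] tmp = m[rank]; m[rank] = m[pivot]; m[pivot] = tmp;
            for (int r = rank + 1; r < rows; r++) {              // eliminate below the pivot
                double f = m[r][c] / m[rank][c];
                for (int k = c; k < cols; k++) m[r][k] -= f * m[rank][k];
            }
            rank++;
        }
        return rank;
    }

    /** True when rows >= cols and X has full column rank, i.e. X'X has an inverse. */
    static boolean canFit(double[][] x) {
        int rows = x.length, cols = x[0].length;
        return rows >= cols && rank(x, 1e-10) == cols;
    }
}
```

Checking the rank of X directly, rather than inverting X'X and testing for failure, reports the problem before any model is built.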

## Using the Code

On top of the matrix operations described in that other article, we only need to implement a single method to create the model and find the values of the b matrix.

```java
public Matrix calculate() throws NoSquareException {
    // Prepend a column of 1s so that beta0 (the bias) is estimated too
    if (bias)
        this.X = X.insertColumnWithValue1();
    checkDiemnsion(); // check that X and Y have compatible dimensions
    Matrix Xtr = MatrixMathematics.transpose(X);               // X'
    Matrix XXtr = MatrixMathematics.multiply(Xtr, X);          // X'X
    Matrix inverse_of_XXtr = MatrixMathematics.inverse(XXtr);  // (X'X)^-1
    if (inverse_of_XXtr == null) {
        System.out.println("Matrix X'X does not have any inverse. "
                + "So MLR failed to create the model for these data.");
        return null;
    }
    Matrix XtrY = MatrixMathematics.multiply(Xtr, Y);          // X'Y
    return MatrixMathematics.multiply(inverse_of_XXtr, XtrY);  // (X'X)^-1 X'Y
}
```

The above code follows these steps to get the b matrix:

1. If you want a bias (i.e., β0), add a new column of 1s to the X matrix.
2. Check that the input matrices are valid.
3. Find the transpose of X (i.e., X').
4. Multiply X' by X (i.e., X'X).
5. Find the inverse of the matrix from step 4, i.e., (X'X)^-1; if it has no inverse, MLR fails and the method returns null.
6. Multiply X' by Y.
7. Finally, multiply the matrix from step 5 by the matrix from step 6.

Now let's test the method on the above example:

```java
Matrix X = new Matrix(new double[][]{{4, 0, 1}, {7, 1, 1}, {6, 1, 0}, {2, 0, 0}, {3, 0, 1}});
Matrix Y = new Matrix(new double[][]{{27}, {29}, {23}, {20}, {21}});
MultiLinear ml = new MultiLinear(X, Y);
Matrix beta = ml.calculate();
```
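The `Matrix`, `MatrixMathematics` and `MultiLinear` classes come from the author's matrix article and are not reproduced here. For readers who want to run the example without them, the following is a dependency-free sketch of the same steps on plain `double[][]` arrays. `MlrSketch` and its methods are hypothetical names, and the Gauss-Jordan inverse with partial pivoting (and its 1e-12 singularity threshold) is an assumption, not necessarily how `MatrixMathematics.inverse` works.

```java
import java.util.Arrays;

/** Hypothetical, dependency-free sketch of b = (X'X)^-1 X'Y with a bias column. */
final class MlrSketch {

    private MlrSketch() {}

    static double[][] transpose(double[][] a) {
        double[][] t = new double[a[0].length][a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++) t[j][i] = a[i][j];
        return t;
    }

    static double[][] multiply(double[][] a, double[][] b) {
        double[][] c = new double[a.length][b[0].length];
        for (int i = 0; i < a.length; i++)
            for (int k = 0; k < b.length; k++)
                for (int j = 0; j < b[0].length; j++) c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    /** Gauss-Jordan inverse with partial pivoting; returns null if the matrix is (numerically) singular. */
    static double[][] inverse(double[][] a) {
        int n = a.length;
        double[][] m = new double[n][2 * n];                    // augmented [A | I]
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, m[i], 0, n);
            m[i][n + i] = 1.0;
        }
        for (int c = 0; c < n; c++) {
            int p = c;
            for (int r = c + 1; r < n; r++) if (Math.abs(m[r][c]) > Math.abs(m[p][c])) p = r;
            if (Math.abs(m[p][c]) < 1e-12) return null;          // no inverse
            double[] tmp = m[c]; m[c] = m[p]; m[p] = tmp;
            double pivot = m[c][c];
            for (int k = 0; k < 2 * n; k++) m[c][k] /= pivot;    // scale pivot row to 1
            for (int r = 0; r < n; r++) {                        // clear column c in other rows
                if (r == c) continue;
                double f = m[r][c];
                for (int k = 0; k < 2 * n; k++) m[r][k] -= f * m[c][k];
            }
        }
        double[][] inv = new double[n][n];
        for (int i = 0; i < n; i++) System.arraycopy(m[i], n, inv[i], 0, n);
        return inv;
    }

    /** Prepends a column of 1s (the bias), then returns b = (X'X)^-1 X'Y, or null if X'X is singular. */
    static double[][] fit(double[][] x, double[][] y) {
        double[][] xb = new double[x.length][];
        for (int i = 0; i < x.length; i++) {
            xb[i] = new double[x[i].length + 1];
            xb[i][0] = 1.0;
            System.arraycopy(x[i], 0, xb[i], 1, x[i].length);
        }
        double[][] xt = transpose(xb);                           // X'
        double[][] inv = inverse(multiply(xt, xb));              // (X'X)^-1
        return inv == null ? null : multiply(inv, multiply(xt, y));
    }

    public static void main(String[] args) {
        double[][] x = {{4, 0, 1}, {7, 1, 1}, {6, 1, 0}, {2, 0, 0}, {3, 0, 1}};
        double[][] y = {{27}, {29}, {23}, {20}, {21}};
        System.out.println(Arrays.deepToString(fit(x, y)));
    }
}
```

Run on the BMI data, `fit` should return coefficients close to 9.25, 4.75, -13.5 and -1.25 (bias, diet score, male, age > 20), up to floating-point rounding; these reproduce the predicted BMI values in the comparison table further down.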

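When we use the two-argument constructor, bias defaults to true. Here are the results, b = [9.25, 4.75, -13.5, -1.25]', which give the model:

$$\text{BMI} = 9.25 + 4.75\,\text{Diet score} - 13.5\,\text{Male} - 1.25\,\text{Age}_{>20}$$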

This is a model to predict BMI from the values of all independent variables (i.e., diet score, gender and age). The magnitude and sign of each beta indicate its importance (bearing in mind that the predictors are on different scales). In this illustrative example, diet score and gender contribute more to BMI than age does, and the effects of gender and diet score are opposite: people with a higher diet score tend to have a higher BMI, while males have markedly lower BMI than females. It is interesting to see the insight MLR gives into the BMI observations.

One final question: is this a good model? The least we can do is use the model (the above equation) to predict BMI and compare the predictions with the observed values:

| BMI (observed) | BMI (predicted) |
|---|---|
| 27 | 27 |
| 29 | 27.75 |
| 23 | 24.25 |
| 20 | 18.75 |
| 21 | 22.25 |

As you can see, the predicted values are not far from the observed ones. You can compute the error (i.e., predicted - observed) for each case and then the mean squared error (MSE), which indicates how accurate the model is: the lower the MSE, the better the model. There are plenty of more sophisticated statistical tests for examining the suitability of a model, which I will not cover in this article. You can find a couple more tests in the code; one of them is a classification analysis using MLR.
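The MSE described above is simple to compute. A minimal sketch, assuming the observed and predicted values are held in two `double[]` arrays of equal length; `MseSketch` and `meanSquaredError` are hypothetical names. Applied to the five rows of the comparison table, the errors are 0 and four values of ±1.25, so the MSE is 6.25 / 5 = 1.25.

```java
/** Hypothetical helper: mean squared error between observed and predicted values. */
final class MseSketch {

    private MseSketch() {}

    static double meanSquaredError(double[] observed, double[] predicted) {
        if (observed.length != predicted.length || observed.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < observed.length; i++) {
            double error = predicted[i] - observed[i];   // error = predicted - observed
            sum += error * error;
        }
        return sum / observed.length;
    }

    public static void main(String[] args) {
        double[] observed  = {27, 29, 23, 20, 21};
        double[] predicted = {27, 27.75, 24.25, 18.75, 22.25};
        System.out.println(meanSquaredError(observed, predicted)); // prints 1.25
    }
}
```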

## Points of Interest

With a few lines of code, I have tried to illustrate one of the most important statistical modelling algorithms (MLR). I have not tested the code on large matrices, and because some of the matrix operations are recursive, you may need to increase the thread stack size (i.e., the JVM's -Xss flag). Please let me know if you have some interesting data on which we can test the code.
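Besides the global `-Xss` flag (e.g. `-Xss64m`), the JDK lets you request a larger stack for a single thread through the `Thread(ThreadGroup, Runnable, String, long stackSize)` constructor. A minimal sketch, assuming the recursive work can be wrapped in a `Runnable`; `recursiveSum` is only a stand-in for the article's recursive matrix code, and `BigStackSketch` and `runWithStack` are hypothetical names. Note that the Java documentation treats `stackSize` as a hint that some platforms may ignore.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: run deeply recursive work on a thread with a larger requested stack. */
final class BigStackSketch {

    private BigStackSketch() {}

    /** Stand-in for a recursive matrix routine: sums 1..n recursively. */
    static long recursiveSum(int n) {
        return n == 0 ? 0 : n + recursiveSum(n - 1);
    }

    static long runWithStack(int depth, long stackBytes) {
        AtomicLong result = new AtomicLong();
        Runnable task = () -> result.set(recursiveSum(depth));
        // stackBytes is a hint; on some platforms the JVM may ignore it
        Thread worker = new Thread(null, task, "mlr-worker", stackBytes);
        worker.start();
        try {
            worker.join();                                   // wait for the recursive work to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();              // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return result.get();
    }

    public static void main(String[] args) {
        System.out.println(runWithStack(100_000, 256L * 1024 * 1024));
    }
}
```

Scoping the larger stack to one worker thread avoids raising the stack size of every thread in the JVM, which is what `-Xss` does.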

## History

- 23rd March, 2013: First version (v1.0.1)

## About the Author

Software Developer (Senior), Private, United Kingdom
I have a PhD in computational chemistry from Newcastle University. I worked for Imperial College London as a research scientist for 6.5 years, followed by 7 years in banking in the City of London as a senior software developer. Currently I do mathematical modelling and software development for a private company and spend some time on research and development at Newcastle University.
