Posted 23 Mar 2013

Multi-Linear Regression in Java

13 Mar 2017, CPOL, 5 min read
This article introduces multi-linear regression/classification with simple examples and provides the code in Java.

Introduction

I introduce a very popular subject in statistical modelling: multi-linear (or multi-variate) regression (MLR) or classification. Using simple examples, I will show you how to use MLR. MLR has been used extensively in science (biological, pharmaceutical, financial, medical and more).

Background

A few months ago, I wrote an article about matrix operations in Java. I suggest reading that article first since the code in this article is heavily dependent on the matrix operations.

To understand multi-linear regression (MLR), have a look at the following table:

Diet score   Male   Age > 20   BMI
4            0      1          27
7            1      1          29
6            1      0          23
2            0      0          20
3            0      1          21

The body mass index (BMI) of five people has been measured. For each person, the diet score, whether they are male or female and whether they are older than 20 have also been recorded in three columns. Do not ask me what a diet score is or how to measure it, because I do not know and this is just a toy example. The question is: what is the relationship between BMI and diet score, gender and age? If we have the diet score, gender and age of a new person, can we get his/her body mass index? MLR is here to answer these questions. We expect the relationship between BMI and the three variables to be something like this:

BMI = β0 + β1 × (diet score) + β2 × (male) + β3 × (age > 20)

Based on this equation, to predict the BMI of a person with a known diet score, gender and age, you need to know the values of all the betas. MLR finds the values of these unknown coefficients. We call β0 the bias. In most real-life applications, a large bias means the predictors (i.e., the three variables) do not have enough predictive power, and a small bias is a good sign of a good predictive model. A large bias could also mean that there are other descriptors, not yet discovered, that can explain the observations.

Let's write the BMI column in the above table as a column matrix named Y, the values of all independent variables as a 5 x 3 matrix named X (five observations, three variables), and the beta values, which will be found later, as a column matrix b. The unknown matrix b can be found as:

b = (X'X)^-1 X'Y

where X' is the transpose of matrix X and ^-1 denotes the matrix inverse.
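For readers who want to see where this formula comes from, here is a short sketch of the usual least-squares argument. It assumes b is chosen to minimise the sum of squared errors S(b), writes X' as X^T, and assumes X'X is invertible; the symbol S is introduced here for illustration only.

```latex
\begin{aligned}
S(b) &= (Y - Xb)^{\top}(Y - Xb) \\
\frac{\partial S}{\partial b} &= -2\,X^{\top}(Y - Xb) = 0 \\
X^{\top}X\,b &= X^{\top}Y \\
b &= (X^{\top}X)^{-1}X^{\top}Y
\end{aligned}
```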

If you want to have bias, you need to add a new column to matrix X. This new column should be the first one and its value for all rows must be 1.
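As a minimal sketch of this step, assuming X is held as a plain double[][] rather than the article's Matrix class, the bias column can be prepended like this. The class and method names (BiasColumn, withBiasColumn) are hypothetical and are not part of the article's code.

```java
import java.util.Arrays;

final class BiasColumn {
    /** Returns a copy of x with a new first column whose value is 1 in every row. */
    static double[][] withBiasColumn(double[][] x) {
        double[][] out = new double[x.length][];
        for (int i = 0; i < x.length; i++) {
            out[i] = new double[x[i].length + 1];
            out[i][0] = 1.0;                                   // the bias column
            System.arraycopy(x[i], 0, out[i], 1, x[i].length); // original values shifted right
        }
        return out;
    }

    public static void main(String[] args) {
        // The three independent variables from the table: diet score, male, age > 20.
        double[][] x = {{4, 0, 1}, {7, 1, 1}, {6, 1, 0}, {2, 0, 0}, {3, 0, 1}};
        System.out.println(Arrays.deepToString(withBiasColumn(x)));
    }
}
```

With this extra column, X for the table becomes a 5 x 4 matrix and b gains one more entry, the bias.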

Limitations of MLR: MLR works only when the number of columns in the X matrix is less than or equal to the number of rows. In other words, the number of descriptors cannot exceed the number of observations. Another limitation concerns the inverse operation in the above equation. Not all matrices have an inverse, and when X'X cannot be inverted, the calculation of the b matrix fails and so does MLR. Other methods, such as Partial Least Squares or Support Vector Machines, can still be used when MLR fails.
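To make the second limitation concrete, the sketch below computes X'X and its determinant for two inputs: a 2 x 3 matrix (more columns than rows), for which X'X is singular, and the table's X with a bias column, for which it is not. This is an illustration with plain arrays, not the article's MatrixMathematics code; the names SingularCheck, gram and determinant are assumptions.

```java
final class SingularCheck {
    /** Computes X'X for a row-major matrix x. */
    static double[][] gram(double[][] x) {
        int p = x[0].length;
        double[][] g = new double[p][p];
        for (double[] row : x)
            for (int i = 0; i < p; i++)
                for (int j = 0; j < p; j++)
                    g[i][j] += row[i] * row[j];
        return g;
    }

    /** Determinant by Gaussian elimination with partial pivoting, on a copy of m. */
    static double determinant(double[][] m) {
        int n = m.length;
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++) a[i] = m[i].clone();
        double det = 1.0;
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            if (a[pivot][col] == 0.0) return 0.0;
            if (pivot != col) {                       // a row swap flips the sign
                double[] tmp = a[col]; a[col] = a[pivot]; a[pivot] = tmp;
                det = -det;
            }
            det *= a[col][col];
            for (int r = col + 1; r < n; r++) {
                double f = a[r][col] / a[col][col];
                for (int j = col; j < n; j++) a[r][j] -= f * a[col][j];
            }
        }
        return det;
    }

    public static void main(String[] args) {
        double[][] wide = {{1, 2, 3}, {4, 5, 6}};      // 2 rows, 3 columns
        double[][] table = {{1, 4, 0, 1}, {1, 7, 1, 1}, {1, 6, 1, 0}, {1, 2, 0, 0}, {1, 3, 0, 1}};
        System.out.println(determinant(gram(wide)));   // zero up to rounding error
        System.out.println(determinant(gram(table)));  // clearly non-zero
    }
}
```

Because of floating-point rounding, the first determinant may print as a tiny non-zero number rather than exactly 0, so in practice a comparison against a small tolerance is safer than testing for equality with zero.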

Using the Code

On top of the matrix operation methods described in the other article, we only need to implement a single method to create the model and find the values of the b matrix.

Java
public Matrix calculate() throws NoSquareException {
	if (bias)
		this.X = X.insertColumnWithValue1();
	checkDiemnsion();
	Matrix Xtr = MatrixMathematics.transpose(X); //X'
	Matrix XXtr = MatrixMathematics.multiply(Xtr,X); //X'X
	Matrix inverse_of_XXtr = MatrixMathematics.inverse(XXtr); //(X'X)^-1
	if (inverse_of_XXtr == null) {
		System.out.println("Matrix X'X does not have any inverse. "
				+ "So MLR failed to create the model for these data.");
		return null;
	}
	Matrix XtrY = MatrixMathematics.multiply(Xtr,Y); //X'Y
	return MatrixMathematics.multiply(inverse_of_XXtr,XtrY); //(X'X)^-1 X'Y
}

The above code takes the following steps to get the b matrix:

  1. If you want to have bias (i.e., beta0), add a new first column of ones to the X matrix
  2. Check that the input matrices are valid
  3. Find the transpose of X (i.e., X')
  4. Multiply X' by X to get X'X
  5. Find the inverse of the matrix from step 4, i.e., (X'X)^-1
  6. Multiply X' by Y
  7. Finally, multiply the matrix from step 5 by the matrix from step 6
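The steps above can also be sketched without the Matrix and MatrixMathematics classes, using plain double[][] arrays. This is a self-contained illustration under stated assumptions, not the article's implementation: the class name NormalEquationSketch, the method fit, and the 1e-12 pivot threshold are hypothetical, and the inverse here uses Gauss-Jordan elimination with partial pivoting.

```java
import java.util.Arrays;

final class NormalEquationSketch {
    static double[][] transpose(double[][] a) {
        double[][] t = new double[a[0].length][a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++)
                t[j][i] = a[i][j];
        return t;
    }

    static double[][] multiply(double[][] a, double[][] b) {
        if (a[0].length != b.length)
            throw new IllegalArgumentException("Inner dimensions differ");
        double[][] c = new double[a.length][b[0].length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < b[0].length; j++)
                for (int k = 0; k < b.length; k++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    /** Gauss-Jordan inverse with partial pivoting; returns null if m looks singular. */
    static double[][] inverse(double[][] m) {
        int n = m.length;
        double[][] aug = new double[n][2 * n];        // [m | I]
        for (int i = 0; i < n; i++) {
            System.arraycopy(m[i], 0, aug[i], 0, n);
            aug[i][n + i] = 1.0;
        }
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(aug[r][col]) > Math.abs(aug[pivot][col])) pivot = r;
            if (Math.abs(aug[pivot][col]) < 1e-12) return null; // assumed tolerance
            double[] tmp = aug[col]; aug[col] = aug[pivot]; aug[pivot] = tmp;
            double p = aug[col][col];
            for (int j = 0; j < 2 * n; j++) aug[col][j] /= p;
            for (int r = 0; r < n; r++) {
                if (r == col) continue;
                double f = aug[r][col];
                for (int j = 0; j < 2 * n; j++) aug[r][j] -= f * aug[col][j];
            }
        }
        double[][] inv = new double[n][n];
        for (int i = 0; i < n; i++) System.arraycopy(aug[i], n, inv[i], 0, n);
        return inv;
    }

    /** Returns b as a column matrix, or null when X'X has no inverse. */
    static double[][] fit(double[][] x, double[][] y, boolean bias) {
        if (bias) {                                   // step 1: prepend a column of ones
            double[][] xb = new double[x.length][];
            for (int i = 0; i < x.length; i++) {
                xb[i] = new double[x[i].length + 1];
                xb[i][0] = 1.0;
                System.arraycopy(x[i], 0, xb[i], 1, x[i].length);
            }
            x = xb;
        }
        if (x.length != y.length || x[0].length > x.length) // step 2: validate input
            throw new IllegalArgumentException("Invalid dimensions");
        double[][] xt = transpose(x);                 // step 3: X'
        double[][] xtx = multiply(xt, x);             // step 4: X'X
        double[][] inv = inverse(xtx);                // step 5: (X'X)^-1
        if (inv == null) return null;
        double[][] xty = multiply(xt, y);             // step 6: X'Y
        return multiply(inv, xty);                    // step 7: (X'X)^-1 X'Y
    }

    public static void main(String[] args) {
        double[][] x = {{4, 0, 1}, {7, 1, 1}, {6, 1, 0}, {2, 0, 0}, {3, 0, 1}};
        double[][] y = {{27}, {29}, {23}, {20}, {21}};
        System.out.println(Arrays.deepToString(fit(x, y, true)));
    }
}
```

For the table's data with bias enabled, this should give coefficients close to 9.25, 4.75, -13.5 and -1.25 (bias, diet score, male, age > 20), up to floating-point rounding. Since the elimination is iterative, it does not deepen the call stack as the matrix grows.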

Now let's test the calculate() method on the example table above:

Java
Matrix X = new Matrix(new double[][]{{4,0,1},{7,1,1},{6,1,0},{2,0,0},{3,0,1}});
Matrix Y = new Matrix(new double[][]{{27},{29},{23},{20},{21}});
MultiLinear ml = new MultiLinear(X, Y);
Matrix beta = ml.calculate();

When we use the two-argument constructor, bias defaults to true. Here are the results:

BMI = 9.25 + 4.75 × (diet score) - 13.5 × (male) - 1.25 × (age > 20)

This is a model to predict BMI from the values of all independent variables (i.e., diet score, gender and age). The magnitude and sign of each beta indicate the importance and direction of the corresponding variable. In this illustrative example, diet score and gender contribute more to BMI than age does, and the effects of gender and diet score are opposite: people with a higher diet score have a higher BMI, while males have a significantly lower BMI than females. It is interesting to see the insight that MLR gives into the BMI observations.
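To answer the earlier question about a new person, the fitted model can be applied directly. The sketch below uses the coefficients 9.25, 4.75, -13.5 and -1.25 (bias, diet score, male, age > 20), which are the least-squares values for the table and reproduce its predicted BMI column. The class BmiPredictor and the new person (diet score 5, male, older than 20) are hypothetical.

```java
final class BmiPredictor {
    // Least-squares coefficients for the toy table: bias, diet score, male, age > 20.
    static final double[] BETA = {9.25, 4.75, -13.5, -1.25};

    /** Applies BMI = b0 + b1*dietScore + b2*male + b3*olderThan20. */
    static double predict(double dietScore, boolean male, boolean olderThan20) {
        return BETA[0]
                + BETA[1] * dietScore
                + BETA[2] * (male ? 1 : 0)
                + BETA[3] * (olderThan20 ? 1 : 0);
    }

    public static void main(String[] args) {
        // First row of the table: diet score 4, female, older than 20.
        System.out.println(predict(4, false, true)); // prints 27.0
        // Hypothetical new person: diet score 5, male, older than 20.
        System.out.println(predict(5, true, true));  // prints 18.25
    }
}
```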

One final question: is this a good model? The least we can do is use the model (the above equation) to predict the BMI values and then compare them with the observed values:

Observed BMI   Predicted BMI
27             27
29             27.75
23             24.25
20             18.75
21             22.25

As you can see, the predicted values are not far from the observed ones. You can find the error (i.e., predicted - observed) for each case and calculate the mean squared error (MSE), which indicates how accurate the model is. The lower the MSE, the better the model. There are plenty of more sophisticated statistical tests for examining the suitability of a model, which I will not cover in this article. You can find a couple more tests in the code. One of the test examples is a classification analysis using MLR.
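As a small worked example, the MSE for the five cases in the table can be computed as follows. This sketch assumes the plain definition of MSE (the sum of squared errors divided by the number of observations); the class name MeanSquaredError is hypothetical.

```java
final class MeanSquaredError {
    /** Mean of (predicted - observed)^2 over all cases. */
    static double mse(double[] observed, double[] predicted) {
        if (observed.length != predicted.length || observed.length == 0)
            throw new IllegalArgumentException("Arrays must be non-empty and of equal length");
        double sum = 0.0;
        for (int i = 0; i < observed.length; i++) {
            double error = predicted[i] - observed[i];
            sum += error * error;
        }
        return sum / observed.length;
    }

    public static void main(String[] args) {
        double[] observed = {27, 29, 23, 20, 21};
        double[] predicted = {27, 27.75, 24.25, 18.75, 22.25};
        System.out.println(mse(observed, predicted)); // prints 1.25
    }
}
```

The errors are 0, -1.25, 1.25, -1.25 and 1.25, so the sum of squares is 6.25 and the MSE is 6.25 / 5 = 1.25.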

Points of Interest

With a few lines of code, I have tried to illustrate one of the most important statistical modelling algorithms (MLR). I have not tested the code on large matrices, and because of the recursive operations involved you may need to increase the thread's stack size (i.e., the -Xss flag). Please let me know if you have some interesting data on which we can test the code.

History

  • 23rd March, 2013: First version (v1.0.1)

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Ata Amini
Software Developer (Senior) Private
United Kingdom
I have a PhD in computational chemistry from Newcastle University. I worked for Imperial College London as research scientist for 6.5 years followed by 7 years in banking in the City of London as senior software developer. Currently I do mathematical modelling and software development for a private company and spend some time in research and development in the University of Newcastle.
