Multiple linear regression to fit data to a third degree polynomial equation with interaction terms
Statistical Modeling/Regression 2016. 6. 2. 17:29
Hello,
I have data for two independent variables and one dependent variable (obtained from experiment). I need to fit this data using linear regression to a 10 coefficient third degree polynomial equation - for the engineers among you, this is the standard equation for specifying refrigeration compressor performance. This can be done quite easily in EES by entering data into a table and selecting the option "Linear Regression" from the Tables menu. How do I achieve the same in Matlab? My equation is of the form X = C1 + C2.(S) + C3.(D) + C4·(S2) + C5 · (S·D) + C6 · (D2 ) + C7 · (S3 ) + C8 · (D·S2 ) +C9 · (S·D2 ) + C10 · (D3 ) Here, X is the dependent variable and S and D are the independent variables. The numbers next to S and D indicate the power to which they are raised. C1 to C10 are the coefficients that need to be calculated.
Using the 'regress' function gives me a constant of 0, and warns me that my design matrix is rank deficient. Using the curve fitting toolbox (cftool - polynomial option) gives me ridiculous values for the coefficients (p00 = -6.436e15). When I try to input a custom equation in the cftool, it is switching to non-linear regression and asks me to input guess values for the coefficients, which I don't want to do. What other functions are available that I might use to perform this regression and how do I implement them. Any suggestions/ help/ recommendations would be greatly appreciated.
Thanks very much.
Gautam.
---------------------------------------------------------------------------------------------------
Please post your regression equation and 15-10 rows of your X, S and D data.
It’s difficult to determine what the problem may be without that information.
---------------------------------------------------------------------------------------------------
>> X = [-10 101; -10 114; -10 129; -10 144; -5 96; -5 106; -5 119; -5 133; 5 92; 5 97; 5 108; 5 122];
>> Y = [72.03; 67.90; 62.77; 57.55; 88.53; 84.70; 79.91; 73.73; 123.43; 120.91; 115.44; 107.83];
>> x1 = X(:,1);
>> x2 = X(:,2);
>> cftool
I used a custom equation, Y = f(x1, x2) and the custom equation is:
a + b*x1 + c*x2 + d*(x1^2) + e*(x1*x2) + f*(x2^2) + g*(x1^3) + h*(x1^2*x2) + i*(x1*x2^2) + j*(x2^3)
What turned up in the Results box was this:
Complex value computed by model function, fitting cannot continue. Try using or tightening upper and lower bounds on coefficients.
Any help?
---------------------------------------------------------------------------------------------------
You have only three distinct values in the first column of your X matrix, so you won't be able to estimate constant, linear, quadratic, and cubic effects for that column. Here's how you can use all terms in the cubic model except x1^3:
fitlm(X,Y,[0 0;1 0;0 1;1 1;2 0;0 2;0 3;1 2;2 1])
Alternatively, if you know that a third order term is appropriate, you will need to collect some data at another value of x1.
---------------------------------------------------------------------------------------------------
Thank you very much Mr. Lane, for the comment. I certainly need to have the x1^3 term - I am not allowed to modify the equation - it is the AHRI 540-2004 Standard. And, the data is indistinct because it is a measurement of evaporator temperature (at the suction of a compressor) and that is how compressor data is taken and performance evaluated. Either the evaporator or the condenser temperature will have to be held constant, while the other is measured. Is there any way I can get around that?
---------------------------------------------------------------------------------------------------
To Gautam: Tom has told you the essential difficulty with your problem as you have presented it. The difficulty is not with your ten-coefficient third degree polynomial. It is with the shortage of data to be fitted to it. The quantity you called 'x1' occurs in only three distinct values: -10, -5, and +5. That is just not enough, given the complexity of your polynomial.
Look at it this way. Suppose your problem involves only a single variable and is approximated with a third degree polynomial with four adjustable coefficients. It is obvious that three data values of that single variable would not be enough to uniquely determine the four coefficients. You would have only three equations and four unknowns, and there are infinitely many coefficient combinations that would give you an exact fit to that limited data.
In other words, Gautam, your data is too easy to fit as far as the x1 variable is concerned. You need much more variation in x1 to produce a meaningful set of ten coefficients. Actually you are also not providing enough variation in the other variable, x2. You have only four values of x2 for each value of x1 and that is insufficient to give enough information to determine the coefficients with sufficient reliability. It is too easy to fit. You really need much greater variation in both variables than you have provided. That is what is behind the messages you have received about "rank deficiency". Ideally the messages ought to have complained with the declaration: "Data! Data! Give me more data! I am starving from too little data!"
---------------------------------------------------------------------------------------------------
Thank you very much Mr. Stafford for clearing that up for me. I'm a newcomer to Matlab and could not grasp completely the meaning behind the warnings. I tried out fitlm with a much bigger data set and it did give me the exact same values EES did, which has around 2% variation with the published coefficients. I cannot thank you enough.
---------------------------------------------------------------------------------------------------
There are numerous ways to do this. The most easiest of them all is to use the Curve Fitting Tool with the correct options. Try this link for help:
http://www.mathworks.com/help/curvefit/polynomial.html
Using the Statistics Toolbox you can do that by specifying the polyijf modelspec:
http://www.mathworks.com/help/stats/fitlm.html#inputarg_modelspec
Another way is to solve a linear system in MATLAB. Assuming X, S and D are vectors in your MATLAB workspace. Create random data and compute the coefficients in C:
S = randn(100,1); D = randn(100,1), X = randn(100,1);
M = [ones(length(S),1), S, D, S.^2, S.*D, D.^2, S.^3, D.*S.^2, S.*D.^2, D.^3] C = M\X
---------------------------------------------------------------------------------------------------
Shashank, thanks for the prompt response. In addition to the cftool I posted above, I also tried the below:
Using the same data (X and Y), I used the fitlm function with 'poly33' as the argument:
mdl = fitlm(X,Y,'poly33')
It did work, but the coefficients it gave me are not the coefficients I should be getting. The actual values of the coefficients are:
beta = [119.61 4.37 -0.39 -0.04 0.09 -0.01 0.00 0.02 0.00 0.00];
However, when I used fitlm, the value of the constant (119.61 above) is obtained as 0, which should not be the case.
As for the last method you mentioned (using the backslash operator), it gave me a warning that the rank of my matrix was deficient and still gave me wrong values for the values of the coefficients (and different from those obtained using the other functions).
Some more suggestions please?
---------------------------------------------------------------------------------------------------
I'm posting some of the attempts I made below:
X = [-10 101; -10 114; -10 129; -10 144; -5 96; -5 106; -5 119; -5 133; 5 92; 5 97; 5 108; 5 122];
>> Y = [72.03; 67.90; 62.77; 57.55; 88.53; 84.70; 79.91; 73.73; 123.43; 120.91; 115.44; 107.83];
>> x1 = X(:,1);
>> x2 = X(:,2);
>> mdl = fitlm(X,Y,'poly33')
Warning: Regression design matrix is rank deficient to within machine precision. > In TermsRegression>TermsRegression.checkDesignRank at 98 In LinearModel.LinearModel>LinearModel.fit at 868 In fitlm at 117
mdl =
Linear regression model: y ~ 1 + x1^2 + x1*x2 + x2^2 + x1^3 + (x1^2):x2 + x1:(x2^2) + x2^3
Estimated Coefficients: Estimate SE tStat pValue _________ ________ _______ ______
(Intercept) 0 0 NaN NaN x1 -10.143 5.3752 -1.887 0.19979 x2 -0.3125 1.1432 -0.27335 0.81022 x1^2 5.5413 1.6852 3.2882 0.081361 x1:x2 0.0043522 0.026071 0.16694 0.88277 x2^2 2.6105e-05 0.010348 0.0025228 0.99822 x1^3 0.55158 0.16763 3.2905 0.08126 x1^2:x2 -5.4944e-05 0.00021022 -0.26136 0.81827 x1:x2^2 -8.3639e-05 0.00012017 -0.69599 0.55844 x2^3 -4.12e-06 3.109e-05 -0.13252 0.9067
Number of observations: 12, Error degrees of freedom: 3 Root Mean Squared Error: 0.165 R-squared: 1, Adjusted R-Squared 1 F-statistic vs. constant model: 2.76e+04, p-value = 3.3e-07
Attempt 2:
A = [1 -10 101 100 -1010 10201 -1000 10100 -102010 1030301 1 -5 96 25 -480 9216 -125 2400 -46080 884736 1 5 92 25 460 8464 125 2300 42320 778688 1 -10 114 100 -1140 12996 -1000 11400 -129960 1481544 1 -5 106 25 -530 11236 -125 2650 -56180 1191016 1 5 97 25 485 9409 125 2425 47045 912673 1 -10 129 100 -1290 16641 -1000 12900 -166410 2146689 1 -5 119 25 -595 14161 -125 2975 -70805 1685159 1 5 108 25 540 11664 125 2700 58320 1259712 1 -10 144 100 -1440 20736 -1000 14400 -207360 2985984 1 -5 133 25 -665 17689 -125 3325 -88445 2352637 1 5 122 25 610 14884 125 3050 74420 1815848 ];
b = A\Y Warning: Rank deficient, rank = 9, tol = 4.463946e-09.
b =
0 -62.6154 18.9421 -35.6208 2.9627 -0.0958 -3.8184 -0.0236 -0.0136 0.0001
Attempt 3:
sf = fit([x1 x2],Y, 'poly33', 'Robust', 'Bi')
Warning: Equation is badly conditioned. Remove repeated data points or try centering and scaling. > In Warning>Warning.throw at 30 In fit>iLinearFit at 690 In fit>iFit at 396 In fit at 108 Warning: Iteration limit reached for robust fitting. > In Warning>Warning.throw at 30 In fit>iRobustLinearFit at 796 In fit>iFit at 404 In fit at 108
Linear model Poly33: sf(x,y) = p00 + p10*x + p01*y + p20*x^2 + p11*x*y + p02*y^2 + p30*x^3 + p21*x^2*y + p12*x*y^2 + p03*y^3 Coefficients (with 95% confidence bounds): p00 = -2.736e+14 (-6.205e+17, 6.199e+17) p10 = -2.736e+13 (-6.205e+16, 6.199e+16) p01 = 0.3112 (-346.1, 346.7) p20 = 1.094e+13 (-2.48e+16, 2.482e+16) p11 = -0.009459 (-5.569, 5.551) p02 = -0.005727 (-3.308, 3.296) p30 = 1.094e+12 (-2.48e+15, 2.482e+15) p21 = -0.0001193 (-0.4058, 0.4055) p12 = -1.846e-05 (-0.02545, 0.02542) p03 = 1.358e-05 (-0.01007, 0.01009)
Attempt 4:
sf = fit([x1 x2],Y, 'poly33', 'Robust', 'Bi','Normalize','on') Warning: Equation is badly conditioned. Remove repeated data points or try centering and scaling. > In Warning>Warning.throw at 30 In fit>iLinearFit at 690 In fit>iFit at 396 In fit at 108 Warning: Iteration limit reached for robust fitting. > In Warning>Warning.throw at 30 In fit>iRobustLinearFit at 796 In fit>iFit at 404 In fit at 108
Linear model Poly33: sf(x,y) = p00 + p10*x + p01*y + p20*x^2 + p11*x*y + p02*y^2 + p30*x^3 + p21*x^2*y + p12*x*y^2 + p03*y^3 where x is normalized by mean -3.333 and std 6.513 and where y is normalized by mean 113.4 and std 16.34 Coefficients (with 95% confidence bounds): p00 = 2.802e+14 (-4.173e+15, 4.734e+15) p10 = 1.15e+15 (-1.712e+16, 1.942e+16) p01 = -6.475 (-15.32, 2.371) p20 = 1.005 (-3.659, 5.67) p11 = -1.507 (-6.289, 3.275) p02 = -0.3495 (-5.084, 4.385) p30 = -8.362e+14 (-1.413e+16, 1.245e+16) p21 = -0.2157 (-8.185, 7.754) p12 = -0.6978 (-12.13, 10.73) p03 = -0.3148 (-7.039, 6.409)
Is there anything I haven't yet tried out or any mistakes I made? Anything at all would be helpful. Thank you.
---------------------------------------------------------------------------------------------------
'Statistical Modeling > Regression' 카테고리의 다른 글
Symbolic Polynomial Manipulation (0) | 2016.06.03 |
---|---|
polyfitn - extension of polyfit, allowing the user to create models with more than one independent variable (0) | 2016.06.03 |
Create linear regression model (0) | 2016.06.02 |
Predict response of linear regression model (0) | 2016.06.02 |