How to Analyze the Regression Analysis Output from Excel
In a simple regression model, we are trying to determine if a variable Y is linearly dependent on variable X. That is, whenever X changes, Y also changes linearly. A linear relationship is a straight line relationship. In the form of an equation, this relationship can be expressed as
Y = α + βX + e
In this equation, Y is the dependent variable, and X is the independent variable. α is the intercept of the regression line, β is the slope of the regression line, and e is the random disturbance term.
The way to interpret the above equation is as follows:
Y = α + βX (ignoring the disturbance term “e”)
gives the average relationship between the values of Y and X.
For example, let Y be the cost of goods sold and X be the sales. If α = 2 and β = 0.75, and if the sales are 100, i.e., X = 100, the cost of goods sold would be, on average, 2 + 0.75(100) = 77. However, in any particular year when sales X = 100, the actual cost of goods sold can deviate randomly around 77. This deviation from the average is called the “disturbance” or the “error” and is represented by “e”.
Also, in the equation
Y = 2 + 0.75X + e
Cost of goods sold = 2 + 0.75 (sales) + e
the interpretation is that the cost of goods sold increases by 0.75 times the increase in sales. For example, if the sales increase by 20, the cost of goods sold increases, on average, by 0.75(20) = 15. In general, we are much more interested in the value of the slope of the regression line, β, than in the value of the intercept, α.
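The arithmetic above can be sketched in a few lines of Python (the values α = 2 and β = 0.75 are the illustrative numbers from the example, not estimates from real data):

```python
# Illustrative regression line: cost of goods sold = 2 + 0.75 * sales
# (alpha = 2 and beta = 0.75 are the example values from the text above)

def predicted_cogs(sales, alpha=2.0, beta=0.75):
    """Average cost of goods sold implied by the line Y = alpha + beta * X."""
    return alpha + beta * sales

print(predicted_cogs(100))                          # average COGS at sales = 100 -> 77.0
print(predicted_cogs(120) - predicted_cogs(100))    # effect of a 20-unit sales increase -> 15.0
```

Note that the slope β, not the intercept α, drives the second number: a 20-unit change in X moves the average of Y by β × 20 regardless of where X started.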
Now, suppose we are trying to determine if there is a relationship between two variables that apparently have no relationship, say the sales of a firm and the average height of its employees. We would set up an equation like the following:
Y = α + βX + e
Y = sales of firm, X = average height of employees, α = intercept of the regression line,
β = slope of the regression line, and e = disturbance term
Then, we would collect a sample of data from a number of firms regarding sales and average height of employees. The relationship between the two variables is estimated by a technique called “ordinary least squares”. If indeed there is no relationship between the two variables, what do you expect the value of β, the slope of the regression line, to be? We would expect this value to be zero, or some number close to it.
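For a single independent variable, ordinary least squares has a simple closed form: the slope estimate is the sum of cross-deviations of X and Y divided by the sum of squared deviations of X. A minimal pure-Python sketch, using data generated exactly on the line Y = 2 + 0.75X from the earlier example (made up purely for illustration):

```python
# Minimal ordinary-least-squares estimator for one regressor (pure Python).

def ols(x, y):
    """Return (alpha_hat, beta_hat) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # beta_hat = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta_hat = sxy / sxx
    alpha_hat = mean_y - beta_hat * mean_x
    return alpha_hat, beta_hat

# Data lying exactly on Y = 2 + 0.75 X (no disturbance), so the estimates
# recover alpha = 2 and beta = 0.75 exactly.
x = [10, 20, 30, 40, 50]
y = [2 + 0.75 * xi for xi in x]
print(ols(x, y))  # -> (2.0, 0.75)
```

With real data the disturbance term would push the estimates away from the true values, which is exactly why the inference question in the next paragraph arises.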
Though there may not be any real relationship between the two variables, the estimated value of β will most probably not be exactly equal to zero. But if we were to repeat this exercise of estimating β using many more samples, we would expect the value of β to be zero on average. However, we have only one sample of a certain number of firms (say, 30 firms), and from this one sample we have to infer whether X influences Y (i.e., β ≠ 0) or X does not influence Y (i.e., β = 0). How do we make such an inference?

Testing whether β = 0 or β ≠ 0 is called the Test of Significance. In other words, we are trying to test whether the independent variable X (average height in our example) is significant in explaining (or determining) the value of Y (sales). If X is indeed significant in explaining Y, then whenever X changes we would expect Y to change in a systematic manner as well. In this case, β would not be equal to zero. However, if X is not significant in explaining Y, then changes in X would not cause systematic changes in Y. In this case, β should be STATISTICALLY close to zero.
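The repeated-sampling idea can be illustrated with a small simulation. Here we generate many samples of 30 “firms” in which Y truly has no relationship to X (all numbers are simulated, chosen only for illustration), estimate β for each sample, and check that the estimates average out near zero even though no single estimate is exactly zero:

```python
import random

# Simulate the "no real relationship" case: true beta = 0, so each sample's
# estimated slope is pure noise, but the estimates average out near zero.

random.seed(42)

def beta_hat(x, y):
    """OLS slope estimate for one regressor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

estimates = []
for _ in range(1000):                                    # 1000 samples of 30 firms each
    x = [random.uniform(150, 200) for _ in range(30)]    # average heights (cm), made up
    y = [random.gauss(100, 20) for _ in range(30)]       # sales, unrelated to x
    estimates.append(beta_hat(x, y))

print(sum(estimates) / len(estimates))  # close to zero
```

In practice we get only one of these 1000 slopes, which is why we need the significance measures below to judge whether the one estimate we have is statistically distinguishable from zero.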
To determine whether β is statistically close to zero (and hence whether there is any relationship at all between the variables X and Y), we can use several measures from the regression output to make an inference regarding the significance of variable X in explaining variable Y. These measures are as follows:
R²: This statistic measures the percentage of variation in the dependent variable Y which is explained by the independent variable X. The value of R² is always between 0 and 1. The weaker the relationship between the two variables, the closer the value of R² is to 0. The stronger the relationship between the two variables, the closer the value of R² is to 1.
t-value: A rough rule of thumb to determine the significance of X in explaining Y is that the t-value of the slope coefficient, β, should be at least 2 in absolute value. The greater the t-value, the stronger the evidence that X is significant in explaining Y.
Significance F: The lower this value, the stronger the evidence that there is indeed a relationship between X and Y. If this value is less than 0.05, we would be safe in accepting that there is a relationship between X and Y.
P-value: Look at the p-value of the independent variable (and not the intercept). If this p-value is less than 0.05, we would be safe in accepting that there is a relationship between X and Y.
95% Confidence Interval: Look at the 95% confidence interval of the independent variable (not the intercept). If this confidence interval does not contain zero, we would be safe in accepting that there is a relationship between X and Y. However, if the 95% confidence interval contains zero, there is a good chance that we would be making a mistake by assuming that there is a relationship between X and Y.
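To see where these numbers come from, the slope statistics in Excel's regression output can be computed by hand for one regressor. The sketch below uses made-up data scattered loosely around the line Y = 2 + 0.75X; note that Excel uses the exact t-distribution quantile for the confidence interval, where this sketch substitutes the rough rule-of-thumb critical value of 2 mentioned above:

```python
import math

# Hand computation of the slope diagnostics from a simple regression output:
# R-squared, the standard error and t-value of the slope, and a rough 95%
# confidence interval (using the rule-of-thumb critical value of 2 instead of
# the exact t-distribution quantile that Excel uses).

def regression_summary(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx                                  # slope estimate
    alpha = mean_y - beta * mean_x                    # intercept estimate
    residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
    ss_res = sum(e ** 2 for e in residuals)           # unexplained variation
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)      # total variation in Y
    r_squared = 1 - ss_res / ss_tot
    s2 = ss_res / (n - 2)                             # residual variance estimate
    se_beta = math.sqrt(s2 / sxx)                     # standard error of the slope
    t_value = beta / se_beta
    ci_95 = (beta - 2 * se_beta, beta + 2 * se_beta)  # rough 95% interval
    return {"beta": beta, "r_squared": r_squared,
            "t_value": t_value, "ci_95": ci_95}

# Made-up data scattered around Y = 2 + 0.75 X:
x = [10, 20, 30, 40, 50, 60]
y = [11.0, 15.5, 26.0, 30.5, 40.0, 46.5]
summary = regression_summary(x, y)
print(summary)
```

For this strongly linear data, R² comes out close to 1, the t-value is well above 2, and the 95% interval for the slope excludes zero, so every measure points the same way: X is significant in explaining Y. For the sales-versus-height example, we would expect the opposite pattern on all of these measures.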