Multiple Regression


Introduction

For this study, researchers were interested in assessing whether self-reported health behaviours and health literacy predict self-rated physical health after controlling for the effects of gender and age. They were further interested in which of the variables make a statistically significant contribution to the equation. Also of interest to the researchers was the interaction between gender and health literacy, and the impact of this interaction on health. Health literacy is the degree to which individuals are able to obtain, process and understand the information needed to make appropriate decisions about their health. Data were collected from 350 people randomly selected from a dataset from a population-based study of health and health determinants. Health was measured on a scale of 1 to 10, where higher scores represent better health. Health behaviours include healthy diet, physical activity and relaxation and are measured on a scale from 1 to 15. Health literacy is measured on a scale from 10 to 45. Gender and age in years were also collected from the respondents.

Data Screening & Assumption Testing

The initial step in this data analysis involved screening the data for possible missing values, out-of-range values, univariate and multivariate outliers, and multicollinearity. Three variables used for this study contained missing values, both system missing and identified missing. These variables were health literacy, physical activity and age in years, with one case for each. Each of these missing values was recoded to the missing value code 999. Descriptive statistics produced for each of the variables used in the analysis revealed out-of-range values for the variables healthy diet, physical activity and relaxation. These values were also recoded to the missing value code 999.
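The recoding step described above can be sketched in a few lines of Python. This is only an illustration, not the procedure the researchers actually ran: the variable names, the `recode` and `valid_values` helpers, and the reading that the 1 to 15 range applies to each health behaviour separately are all assumptions made for the example.

```python
MISSING = 999  # missing value code reported in the study

# Valid ranges per variable. Applying 1-15 to each behaviour separately
# is an assumption; the variable names are hypothetical.
VALID_RANGES = {
    "healthy_diet": (1, 15),
    "physical_activity": (1, 15),
    "relaxation": (1, 15),
}


def recode(record):
    """Return a copy of record with absent or out-of-range values set to MISSING."""
    cleaned = dict(record)
    for name, value in record.items():
        if value is None:
            # System-missing value: replace with the missing value code.
            cleaned[name] = MISSING
            continue
        bounds = VALID_RANGES.get(name)
        if bounds is not None and value != MISSING:
            low, high = bounds
            if not low <= value <= high:
                # Out-of-range value: treat it as missing as well.
                cleaned[name] = MISSING
    return cleaned


def valid_values(records, name):
    """Collect the values of one variable, skipping the missing value code."""
    return [r[name] for r in records if r[name] != MISSING]
```

Whatever software is used, the code 999 has to be excluded before any descriptive statistics or regression are computed, which is the role `valid_values` plays in this sketch; otherwise it would be treated as a real score.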
Testing for the presence of outliers was done by generating a scatterplot matrix for all variables (Figure 1), and plots of Cook’s distances (Figure 2) and Mahalanobis distances (Figure 3). No cases indicate a particular cause for concern. On the Mahalanobis distance chart no case is substantially larger than the rest, and on the Cook’s distance chart no case has a distance above 1, which would indicate an influential point. Multicollinearity was tested and no variable had a tolerance of less than 0.3. It is also necessary to check the regression assumptions to ensure that any results from the analysis are valid. The first assumption is that all variables are measured on a metric scale or that categorical variables are dichotomously coded. This is true for the data in this study. The second assumption is that each observation in the sample is independent of the other observations, the
