Data Handling


In this coursework, I shall investigate whether there is a correlation

between the height and weight of students at a fictional school,

Mayfield High. I expect there to be a positive correlation between the

height and the weight of the students, as it would make sense that the

taller someone is, the heavier they are. I am going to

investigate the differences between each year, and between male and

female overall. I am doing this for a number of reasons. Firstly, the

range in the higher years is likely to be smaller because a lot of

people have finished growing, or have passed their peak growth

rate. Therefore the data is likely to be closer together. I also

feel it would be interesting to see if there is a difference between

the correlations of males and females and what these differences are.

To help me do this, I am going to create scatter graphs, histograms

and box and whisker plots, as I feel that these will help my

investigation a lot. I will use scatter graphs to compare all of my

data and find correlations, and I will also calculate the standard deviation. I will use my

histograms and box and whisker plots to investigate further the weight

differences between each year and between the boys and girls. I will find the

quartiles, the median and similar measures.
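
As a rough sketch of the calculations I have in mind, the Python below works out a correlation coefficient, a standard deviation and the quartiles. The function names `pearson_r` and `five_number_summary` are my own, and the heights and weights are made-up values for illustration only, not taken from the Mayfield data.

```python
import statistics
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)


# Hypothetical heights (m) and weights (kg), NOT real Mayfield data.
heights = [1.50, 1.55, 1.60, 1.65, 1.70, 1.75]
weights = [40, 45, 47, 52, 58, 60]

r = pearson_r(heights, weights)          # close to +1 means strong positive correlation
spread = statistics.pstdev(weights)      # standard deviation of the weights
summary = five_number_summary(weights)   # values needed for a box and whisker plot
```

A value of `r` near +1 would support my prediction, while the five-number summary gives the points needed to draw a box and whisker plot for each group.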


I could do this investigation using all 1183 pieces of data; however,

this would be extremely time-consuming. I am therefore going to take a

stratified random sample. If I make my sample large enough I am

confident that the results will give a good indication of the data

as a whole. By sampling, I simplify my calculations and graphs

dramatically. However, I must make sure that the sample I take is

completely unbiased, otherwise my results will be unreliable.
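
One way to keep the selection unbiased is to let a random number generator choose the students, so that every student in a group has the same chance of being picked. This is only a sketch: the function name `draw_random_sample`, the student ID numbers and the group size of 200 are all assumptions made for illustration.

```python
import random


def draw_random_sample(group, sample_size, seed=None):
    """Pick sample_size distinct members of group, each equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(group, sample_size)


# Hypothetical student ID numbers for one group (not real data).
year_group = list(range(1, 201))
chosen = draw_random_sample(year_group, 10, seed=1)
```

Because `random.sample` picks without replacement, no student can appear twice in the sample.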


To see how many pieces of data I need to sample from each group, I used the following formula:


Number of students in group x Amount of data being stratified
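
The sketch below shows how such a calculation might look in Python. It assumes that the product is divided by the total number of students (1183), which is the usual proportional method for a stratified sample but is not shown in the text above. The group size of 200 and the overall sample size of 60 are hypothetical values chosen only to illustrate the arithmetic.

```python
def stratum_sample_size(group_size, total_students, sample_size):
    """Students to take from one group, in proportion to its size.

    Assumes proportional allocation: group_size / total_students * sample_size,
    rounded to the nearest whole student.
    """
    return round(group_size * sample_size / total_students)


TOTAL_STUDENTS = 1183   # from the coursework data
SAMPLE_SIZE = 60        # hypothetical overall sample size
GROUP_SIZE = 200        # hypothetical number of students in one group

from_group = stratum_sample_size(GROUP_SIZE, TOTAL_STUDENTS, SAMPLE_SIZE)
```

Because each group's figure is rounded separately, the rounded figures may not add up exactly to the intended sample size, so the total should be checked afterwards.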
