# Statistics Project

For my GCSE statistics coursework I have been asked to collect data on
second-hand cars and then to represent and interpret it using graphs,
looking at the attributes which I think influence the price of a
second-hand car. Below is my coursework flowchart, which shows the
steps I will take to complete my coursework.

FLOWCHART
=========

1. Formulate my hypothesis

2. Plan data collection

3. Record data

4. Present the data

5. Analyse and interpret

7. Draw conclusions

8. Ideas for further investigation

HYPOTHESIS

I think that the age, make and number of owners will affect the price
of a second-hand car. In the investigation, I will use several
different types of charts and graphs to test my hypothesis and to show
whether my predictions were correct or incorrect. After I have
investigated some of these attributes, I will draw up a second
hypothesis and test it in the second half of my investigation. If my
findings change, I think the tax amount and mileage may also affect
the price.

DATA COLLECTION
---------------

There are many types of data. To obtain data we must observe or
measure something, and the thing we observe or measure is known as a
variable. There are two types of variable:

- Quantitative variables, which have numerical observations or
measurements.

- Qualitative variables, which have non-numerical observations or
measurements.

The other types of data I may consider using are:

- Continuous data, which are measured on a scale and can take any value
on that scale.

- Discrete data, which can only take a number of countable values.

Collecting data

The data used can be either primary data or secondary data.

Primary data is data that is collected by or for the person who is
going to use it. Secondary data is data that is not collected by the
person who uses it.

Data can be collected using an experiment or a survey.

In a statistical experiment, one of the variables is controlled while
the other is observed.

The controlled variable is called the explanatory or independent
variable. The effect being observed is called the response or
dependent variable.

A survey can be particularly useful when the data is likely to be
personal. However, it may fail if a person does not tell the truth.

SAMPLING

When sample data are collected, information is taken from part of the
population. The information is then used to make conclusions about the
population.

RANDOM SAMPLING

Random sampling is a sampling technique where we select a group of
subjects (a sample) for study from a larger group (a population). Each
individual is chosen entirely by chance and each member of the
population has a known, but possibly non-equal, chance of being
included in the sample.

By using random sampling, the likelihood of bias is reduced.

Simple Random Sampling

Simple random sampling is the basic sampling technique where we select
a group of subjects (a sample) for study from a larger group (a
population). Each individual is chosen entirely by chance and each
member of the population has an equal chance of being included in the
sample. Every possible sample of a given size has the same chance of
selection; i.e. each member of the population is equally likely to be
chosen at any stage in the sampling process.
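As a rough illustration of the idea above, the sketch below draws a simple random sample in Python. The population of 100 numbered cars and the helper name `simple_random_sample` are assumptions made up for this example; they are not taken from my data sheet.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Pick sample_size different items, each equally likely to be chosen."""
    rng = random.Random(seed)  # the seed only makes the draw repeatable
    return rng.sample(population, sample_size)


# Hypothetical population: cars numbered 1 to 100 (an assumed size).
car_numbers = list(range(1, 101))
chosen = simple_random_sample(car_numbers, 20, seed=1)
print(sorted(chosen))
```

Because `random.sample` draws without replacement, no car can be picked twice, which matches the way a sample would be picked by drawing numbers out of a hat.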

Stratified Sampling

There may often be factors which divide up the population into
sub-populations (groups / strata) and we may expect the measurement of
interest to vary among the different sub-populations. This has to be
accounted for when we select a sample from the population in order
that we obtain a sample that is representative of the population. This
is achieved by stratified sampling.

A stratified sample is obtained by taking samples from each stratum or
sub-group of a population.

When we sample a population with several strata, we generally require
that the proportion of each stratum in the sample should be the same
as in the population.

Stratified sampling techniques are generally used when the population
is heterogeneous, or dissimilar, where certain homogeneous, or
similar, sub-populations can be isolated (strata). Simple random
sampling is most appropriate when the entire population from which the
sample is taken is homogeneous. Some reasons for using stratified
sampling over simple random sampling are:

a. the cost per observation in the survey may be reduced;

b. estimates of the population parameters may be wanted for each
sub-population;

c. increased accuracy at given cost.

Example
Suppose a farmer wishes to work out the average milk yield of each cow
type in his herd which consists of Ayrshire, Friesian, Galloway and
Jersey cows. He could divide up his herd into the four sub-groups and
take samples from these.
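The farmer example can be sketched in Python as a proportional stratified sample. The herd sizes below (40, 30, 20 and 10 cows) are invented for illustration only, and `stratified_sample` is a hypothetical helper rather than a standard function.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Take the same fraction from every stratum, choosing at random."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)  # sample size for this stratum
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical herd: the counts are assumptions, not real data.
herd = {
    "Ayrshire": [f"Ayrshire-{i}" for i in range(40)],
    "Friesian": [f"Friesian-{i}" for i in range(30)],
    "Galloway": [f"Galloway-{i}" for i in range(20)],
    "Jersey": [f"Jersey-{i}" for i in range(10)],
}

picked = stratified_sample(herd, 0.1, seed=1)
sizes = {breed: len(cows) for breed, cows in picked.items()}
print(sizes)
```

Taking the same fraction from each breed keeps the proportions in the sample the same as in the whole herd, which is the requirement described above.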

Quota Sampling

Quota sampling is a method of sampling widely used in opinion polling
and market research. Interviewers are each given a quota of subjects
of a specified type to attempt to recruit. For example, an interviewer
might be told to go out and select 20 adult men, 20 adult women, 10
teenage girls and 10 teenage boys to interview.

It suffers from a number of practical flaws, the most basic of which
is that the sample is not a random sample, and therefore the sampling
distributions of any statistics are unknown.

Below is a graph comparing the new and second-hand prices of the first
20 cars.

[IMAGE]

You can see the comparison between the new and second-hand prices. As
expected, the second-hand prices are lower, but if you look carefully
the prices vary more for the new cars than for the second-hand ones.
The new prices go up and down, ranging from 20,000 down to 7,000. This
showed me that there is something that really affects the price,
because even the Mercedes, which is the most expensive car when new,
was in the range of the Fords and Nissans.

Below is a tally chart showing the colours of the first 20 cars and
how frequently each colour appears.

| Colour | Tally | Frequency |
|--------|-------|-----------|
| Black  | llll  | 4         |
| Red    | lll   | 3         |
| White  | llll  | 4         |
| Blue   | llll  | 4         |
| Silver | ll    | 2         |
| Marine | l     | 1         |
| Green  | l     | 1         |
| Gold   | l     | 1         |
| Total  |       | 20        |

You can see that the most frequent colours in the first 20 cars were
black, white and blue. This does not prove that colour affects the
price, because when you look at the prices in the earlier graph, you
can see that the prices vary among cars of every colour. I will now
represent the data above on a pie chart, to give an easier view.

Below is a pie chart showing the colours of the cars.

[IMAGE]

The pie chart gives a much easier view of the colours and how they
vary. More than half of the data (12 of the 20 cars) is taken up by
the black, white and blue cars, so these colours are very popular.
However, this still does not prove that colour affects the price.
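The sector angles for a pie chart like this can be worked out from the tally chart frequencies as frequency ÷ total × 360. The short sketch below does that calculation in Python; the variable names are my own choices for the example.

```python
# Frequencies copied from the tally chart of the first 20 cars.
colour_counts = {
    "Black": 4, "Red": 3, "White": 4, "Blue": 4,
    "Silver": 2, "Marine": 1, "Green": 1, "Gold": 1,
}

total = sum(colour_counts.values())

# Each sector angle is the colour's share of the total, times 360 degrees.
angles = {colour: n * 360 / total for colour, n in colour_counts.items()}

# Share of the cars that are black, white or blue.
top_three_share = (
    colour_counts["Black"] + colour_counts["White"] + colour_counts["Blue"]
) / total

for colour, angle in angles.items():
    print(f"{colour}: {angle:.0f} degrees")
print(f"Black, white and blue together: {top_three_share:.0%}")
```

With these frequencies, black, white and blue each take a 72 degree sector, and together they make up 60% of the 20 cars.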

Below is a line graph showing the make and second-hand price of each
of the first 20 cars.

[IMAGE]

The results support my hypothesis. Looking at the graph, you can see
that the second point, the Mercedes, is well above the rest. Mercedes
is a very well-known make, so this suggests that the better the make,
the more you will have to pay.

This is a look at how the number of owners can affect the price
---------------------------------------------------------------

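| Owners | Price second hand (£) | Make       |
|--------|-----------------------|------------|
| 1      | 7999                  | Ford       |
| 1      | 10999                 | Mercedes   |
| 1      | 7999                  | Vauxhall   |
| 2      | 6595                  | Vauxhall   |
| 1      | 3999                  | Nissan     |
| 1      | 4999                  | Renault    |
| 1      | 5999                  | Mitsubishi |
| 1      | 6999                  | Rover      |
| 2      | 6999                  | Renault    |
| 2      | 7499                  | Vauxhall   |
| 2      | 3495                  | BMW        |
| 2      | 6499                  | Vauxhall   |
| 3      | 5995                  | Fiat       |
| 1      | 4995                  | Rover      |
| 1      | 3995                  | Mitsubishi |
| 3      | 3795                  | Fiat       |
| 1      | 5999                  | Mitsubishi |
| 2      | 1995                  | Fiat       |
| 1      | 3795                  | Rover      |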

In my hypothesis I stated that the number of owners would affect the
price: the more owners, the lower the price. Looking at the table
above, you can see that the cars with 3 owners, the two Fiats (5995
and 3795), are lower in price than several of the cars that have had
only one owner. However, one of the Mitsubishis is only a couple of
hundred pounds more than the cheaper Fiat (3995 against 3795) even
though it has had only one owner. This has confused me, and my
hypothesis is only partly supported. I will need to look further into
the data if I want to get a clearer and more accurate result.
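One way to get a clearer picture is to work out the mean second-hand price for each number of owners. The sketch below does this in Python using the 19 rows of the owners table; it is only a rough check, since just two of the cars have had 3 owners.

```python
from collections import defaultdict

# (owners, second-hand price) pairs copied from the owners table.
cars = [
    (1, 7999), (1, 10999), (1, 7999), (2, 6595), (1, 3999),
    (1, 4999), (1, 5999), (1, 6999), (2, 6999), (2, 7499),
    (2, 3495), (2, 6499), (3, 5995), (1, 4995), (1, 3995),
    (3, 3795), (1, 5999), (2, 1995), (1, 3795),
]

# Group the prices by the number of owners.
prices_by_owners = defaultdict(list)
for owners, price in cars:
    prices_by_owners[owners].append(price)

# Mean price for each group, in order of the number of owners.
mean_price = {
    owners: sum(prices) / len(prices)
    for owners, prices in sorted(prices_by_owners.items())
}

for owners, mean in mean_price.items():
    count = len(prices_by_owners[owners])
    print(f"{owners} owner(s): {count} cars, mean price {mean:.0f}")
```

For these cars the mean price falls as the number of owners rises, which agrees with the direction of my hypothesis, although the groups are too small to treat this as proof.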

Below is another chart which shows the mileage and make of each car,
so you can see which cars have been driven more. The Mercedes has a
low mileage and is very expensive. The cheaper cars, however, have a
higher mileage, which may push the price lower because the car has
been used more.

This is a look at how the engine can affect the price of the car

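| Engine | Price second hand (£) | Make       |
|--------|-----------------------|------------|
| 1.8    | 7999                  | Ford       |
| 1.4    | 10999                 | Mercedes   |
| 2.5    | 7999                  | Vauxhall   |
| 1.6    | 6595                  | Vauxhall   |
| 1      | 3999                  | Nissan     |
| 1.6    | 4999                  | Renault    |
| 1.8    | 5999                  | Mitsubishi |
| 2.3    | 6999                  | Rover      |
| 1.6    | 6999                  | Renault    |
| 1.4    | 7499                  | Vauxhall   |
| 1.4    | 3495                  | BMW        |
| 2.5    | 6499                  | Vauxhall   |
| 2.5    | 5995                  | Fiat       |
| 1.6    | 4995                  | Rover      |
| 1.2    | 3995                  | Mitsubishi |
| 2      | 3795                  | Fiat       |
| 1.8    | 5999                  | Mitsubishi |
| 0.9    | 1995                  | Fiat       |
| 1.6    | 3795                  | Rover      |
| 1.2    | 1795                  | Nissan     |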

To test whether engine size affects the price of a car, it helps to
compare two cars of the same make. For example, one Rover has a 2.3
engine and costs 6999, while another Rover has a 1.6 engine and costs
only 4995. This could mean that engine size does affect the price of
the car.
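Comparing two cars only gives a hint, so a further check could be to measure the correlation across all 20 cars in the engine table. The sketch below calculates the product-moment (Pearson) correlation coefficient in Python. This is an extra technique that I have not used elsewhere in the coursework, and `pearson_r` is a helper written just for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation coefficient between two lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Engine sizes and second-hand prices copied from the engine table.
engine_sizes = [1.8, 1.4, 2.5, 1.6, 1.0, 1.6, 1.8, 2.3, 1.6, 1.4,
                1.4, 2.5, 2.5, 1.6, 1.2, 2.0, 1.8, 0.9, 1.6, 1.2]
prices = [7999, 10999, 7999, 6595, 3999, 4999, 5999, 6999, 6999, 7499,
          3495, 6499, 5995, 4995, 3995, 3795, 5999, 1995, 3795, 1795]

r = pearson_r(engine_sizes, prices)
print(f"r = {r:.2f}")
```

A value of r close to +1 would mean that bigger engines go with higher prices, a value close to -1 would mean the opposite, and a value near 0 would mean there is little linear relationship.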

BELOW IS A GRAPH COMPARING THE LAST 30 CARS AND THEIR SECOND HAND
PRICES

[IMAGE]

You can see from the chart above that the expensive make, Mercedes,
has the highest prices, ranging from 15000 to 20000. This suggests
that the better the make, the more expensive the car, which supports
my hypothesis.

[IMAGE]
You can see that the mileage is higher among the cheaper cars and
lower among the expensive cars. This could mean that the better the
make, the lower the mileage and the more expensive the car.

Below is a chart showing the mileage vs second hand price.

[IMAGE]

You can see from the chart above that the higher the mileage, the
cheaper the price. For the first car, the mileage is very low and the
price is higher, whereas for the 12th to 14th cars the price is very
low.

Below are a series of charts which represent the prices of 3 makes,
Ford, Vauxhall and Rover.

FORD

The scatter graph with its line of best fit shows that the prices are
evenly spread out. There is a negative correlation.

[IMAGE]

Once again there is a negative correlation between the prices.

[IMAGE]

VAUXHALL
--------

Here there is a positive correlation, and the prices are evenly spread.

[IMAGE]

Here there is another positive correlation. The prices are also evenly
spread, and you can see that many of them lie close together.

[IMAGE]

ROVER

There is no correlation.

[IMAGE]

There is no correlation.

[IMAGE]

Now I will use simple random sampling to represent my data. This means
I will pick out data at random and interpret it.

Below I have randomly selected 4 cars, with their prices when new and
second hand. This is simply to show how much the second-hand price
differs from the price when new.

[IMAGE]

If you look at the graph, you can see that the price when new is
about double the second-hand price or more. With the Mercedes, a
high-class make, the two prices are not as far apart, which could help
me show that the make affects the price. I will use this information
when I group my data and test whether my hypothesis is correct.

[IMAGE]

Above is another graph showing the second-hand and new prices for 5
random cars which are of an equal class. As you can see, the new
prices are more than double the second-hand prices, which supports my
hypothesis that the make does affect the price.

BELOW IS A SERIES OF TALLY CHARTS SHOWING THE COLOURS OF THE CARS AND
SHOWING THE MOST POPULAR COLOUR.

First 20 cars

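| Colour | Tally | Frequency |
|--------|-------|-----------|
| BLACK  | IIII  | 4         |
| RED    | III   | 3         |
| WHITE  | IIII  | 4         |
| BLUE   | IIII  | 4         |
| GREEN  | I     | 1         |
| MARINE | I     | 1         |
| GOLD   | I     | 1         |
| SILVER | II    | 2         |
| Total  |       | 20        |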

Black, white and blue seem to be the most popular colours in the first
20 cars.

Next 20 cars

| Colour    | Tally  | Frequency |
|-----------|--------|-----------|
| SILVER    | IIII   | 4         |
| BLUE      | III    | 3         |
| RED       | IIII I | 6         |
| NIGHTFIRE | I      | 1         |
| GREEN     | III    | 3         |
| BLACK     | III    | 3         |
| Total     |        | 20        |

The most popular colour in the next 20 cars is red.
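The two tally charts can also be added together to find the most popular colour across all 40 cars. The sketch below does this with Python's `collections.Counter`, using the frequencies from the two charts; the variable names are my own.

```python
from collections import Counter

# Frequencies copied from the two tally charts.
first_20 = Counter({"Black": 4, "Red": 3, "White": 4, "Blue": 4,
                    "Green": 1, "Marine": 1, "Gold": 1, "Silver": 2})
next_20 = Counter({"Silver": 4, "Blue": 3, "Red": 6, "Nightfire": 1,
                   "Green": 3, "Black": 3})

# Adding two Counters sums the frequencies colour by colour.
combined = first_20 + next_20

for colour, frequency in combined.most_common():
    print(colour, frequency)
```

Across the 40 cars, red is the most frequent colour with 9 cars, followed by black and blue with 7 each.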

I will now use a more accurate method of finding results: stratified
sampling, in which I categorise my data by make.

I will take 3 makes with their models, compare the new prices with the
second-hand prices, and then compare these with the prices of the
other makes.

VAUXHALL

[IMAGE]

Above you can see that the 5 Vauxhall cars were in the same price
range when new, but the second-hand prices have decreased by a huge
margin. This may show that the prices of lower-class cars decrease
more than those of higher-class cars.

MERCEDES

[IMAGE]

From the graph, you can see that Mercedes is a very expensive make.
The second-hand prices have decreased from the new prices, but by a
smaller amount. With the Vauxhall cars, you saw that the second-hand
prices halved. When you look at the Mercedes, you will see that the
price has not halved, but has decreased by a smaller amount. This
suggests that the make does indeed affect the second-hand price, so my
hypothesis is supported.
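To compare how much different makes lose in value, the drop can be written as a percentage of the new price. The sketch below shows the calculation in Python. The prices used are made-up round numbers for illustration, not figures from my data, and `percentage_decrease` is a hypothetical helper.

```python
def percentage_decrease(price_new, price_second_hand):
    """Drop from the new price to the second-hand price, as a percentage."""
    return (price_new - price_second_hand) / price_new * 100


# Hypothetical prices, invented only to show the calculation.
halved = percentage_decrease(10000, 5000)
smaller_drop = percentage_decrease(20000, 15000)

print(f"Price halved: {halved:.0f}% decrease")
print(f"Smaller drop: {smaller_drop:.0f}% decrease")
```

A car whose price has halved shows a 50% decrease, so any make with a percentage below 50% has held its value better than that.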

FORD

[IMAGE]

Now you can see another change. Ford is a widely used make but is not
considered to be high class, so this supports my hypothesis that the
better the make, the more expensive the car.

I will now compare other makes and their prices to get a greater look
at the comparison between prices.

[IMAGE]

Above you can see a great deal of difference. The better make,
Mercedes, has a higher second-hand price than the Ford, Rover and
Volkswagen have when new, so this again supports my hypothesis that
make affects the price of a car.