Football Statistics Project



Introduction
------------

I have chosen to base my project on football statistics because they
are both readily available and interesting enough for deep analysis.
As a starting point I decided to look at the generally accepted theory
of 'Home Advantage'.

Home advantage, or the tendency for the home team to do better than
they would away, could have several causes. It could be partly
psychological - the home team would almost always have the majority of
the crowd behind them, cheering them on. It could also be to do with
the condition of the pitch - Premiership teams sometimes find it hard
to play on the muddy, waterlogged pitches of some lower-division teams.

Another factor may be the attitudes of referees and officials. If
they are intimidated by the home crowd they may give more decisions in
favour of the home team, meaning teams may also have a worse
disciplinary record when playing away.


Hypotheses:

1. Teams have a worse disciplinary record away than at home

2. Better attended teams have a greater home advantage

3. More successful teams have a better disciplinary record


Collecting Data

I found that football statistics were easy to find on the internet. I
obtained mine from two main sites:

http://soccer-stats.football365.com

http://www.bettingzone.co.uk

There is a very small risk that some of the data I collected could be
incorrect. However, I have found alternative sites for the Premiership
statistics (such as www.4thegame.com) which gave the same results. I
also think that a betting site is likely to give accurate statistics
because they are such an important part of gambling.


Using Software

I chose to input my data into Microsoft Excel because it makes it much
quicker and easier to manipulate the data.


Hypothesis 1 - Teams have a worse disciplinary record away than at
home
------------------------------------------------------------------


Discipline 'points' system

On the internet I was able to find out the numbers of red and yellow
cards for each team at home and away. However, in order to give an
overall impression of how good or bad the team's discipline was I
needed to turn these two pieces of data into one measurement. I
decided to use the points system (as on www.4thegame.com). Under this
system a yellow card counts for one point whereas a red card is more
severe and counts for three.

To make this easier to calculate I used formulae in Excel:

[IMAGE]


Because some divisions have different numbers of teams than others,
some teams played more games than others. This means their players had
slightly more opportunities to get booked or sent off, so their points
totals might be higher. To correct for this I divided the points
scores by the number of games each team had to play to give a
'Disciplinary Points Per Game' score. This can then be compared to any
other team in any division.

To give a measure of how much better or worse the team's disciplinary
record is away and at home I decided to divide the away points per
game score by the home. I subtracted one from this and expressed it as
a percentage. This gives a positive percentage if the team has a worse
disciplinary record away and a negative one if it is worse at home.
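
The calculation described above can also be sketched in Python rather than Excel. This is only an illustration: the function names are my own, and the card counts are hypothetical, chosen so that the results match Manchester United's points per game figures in the Wilcoxon table later on (assuming 19 home and 19 away games).

```python
def disciplinary_points(yellow_cards, red_cards):
    """One point per yellow card, three points per red card."""
    return yellow_cards + 3 * red_cards


def points_per_game(yellow_cards, red_cards, games):
    """Disciplinary points divided by the number of games played."""
    return disciplinary_points(yellow_cards, red_cards) / games


def away_vs_home_percent(home_ppg, away_ppg):
    """Positive if discipline is worse away, negative if worse at home."""
    return (away_ppg / home_ppg - 1) * 100


# Hypothetical card counts (assumed for illustration only)
home_ppg = points_per_game(yellow_cards=32, red_cards=1, games=19)
away_ppg = points_per_game(yellow_cards=40, red_cards=2, games=19)

print(round(home_ppg, 3), round(away_ppg, 3))
print(round(away_vs_home_percent(home_ppg, away_ppg), 1))
```

A team with no disciplinary points at home would cause a division by zero in the last function, so a fuller version would need to handle that case.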


Pilot Study

In order to find out how well my data would support my hypothesis
about teams having a worse disciplinary record away than at home I
made a bar chart using Excel to show the difference between
disciplinary points per game away and at home.

[IMAGE]


As you can see most teams have a considerably worse disciplinary
record away than at home, as shown by the taller red bars. For this
bar chart I simply ranked the teams in the Premiership and the First
Division from the top of the Premiership (1) to the bottom of Division
1 (44). The names of these teams can be found in the appendix at the
back.


Stratified random sampling

In order to better represent football at other levels of the game I
also collected data for lower divisions (Division 2 and Division 3).
However this gave me far too much data - a total of 92 teams - to
perform statistical tests such as the Wilcoxon Signed Rank Test. In
order to cut down on this I decided to use random sampling to lower
the number of teams involved.

However, if I just randomly selected teams from all of the divisions
put together I might over-represent some divisions over others,
affecting the results. To make this fairer I decided to use stratified
random sampling, with the different divisions as the strata. This way
I was sure to get proportionate numbers of teams from each division.

I chose to take 25% of the teams in each division, to give me 23 sets
of data - a much more manageable figure! I chose the teams by writing
the numbers of the teams in each division e.g. 1-24 on small pieces of
paper. I folded these up, shuffled them and picked them at random
until I had the right number.
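
The paper-slip method above could equally be done by computer. The sketch below is an assumption about how that might look, not what I actually did: it uses Python's random module, numbers the teams 1 to 20 or 1 to 24 within each division, and fixes a seed only so that the example is repeatable.

```python
import random


def stratified_sample(strata, fraction, rng):
    """Pick round(fraction * size) team numbers at random from each stratum."""
    sample = {}
    for name, size in strata.items():
        k = round(size * fraction)
        # Teams are numbered 1..size within their division, like the paper slips
        sample[name] = sorted(rng.sample(range(1, size + 1), k))
    return sample


divisions = {"Premiership": 20, "Division 1": 24, "Division 2": 24, "Division 3": 24}
rng = random.Random(2003)  # arbitrary seed, used only for repeatability
chosen = stratified_sample(divisions, 0.25, rng)

for name, teams in chosen.items():
    print(name, teams)
print(sum(len(teams) for teams in chosen.values()))  # 5 + 6 + 6 + 6 = 23
```

Sampling within each division separately is what keeps the proportions fixed: a single draw of 23 teams from all 92 could by chance leave one division with very few teams.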

Once I had chosen the teams I put them in a new spreadsheet. I
produced another bar chart similar to the one I had produced for the
preliminary test. This illustrates how well my randomly sampled data
supports my hypothesis.

[IMAGE]


As you can see the pattern I noticed in the pilot study is continued
with the data from the other divisions. The teams' away disciplinary
record is in almost all cases worse than at home.

As further evidence of this I found the mean disciplinary points per
game at home and away. At home this was about 1.71 compared to about
2.28 away (to 3 significant figures). This shows a 33% difference
between the two. I will now test whether or not this difference is
statistically significant. I chose to compare the means of the two
sets because this gives more weight to big differences between two
scores than small differences.


Wilcoxon signed-rank test

Although graphs and charts can illustrate trends in data they cannot
show whether a difference is statistically significant. In order to
test my hypothesis I will have to use a statistical test. Because I
have no reason to believe my data will follow a normal distribution I
need a nonparametric test, and because I am comparing pairs of data
from two categories I will use the Wilcoxon signed-rank test.


Method:

1. First I found the difference between the home and away
disciplinary points per game for each team by subtracting one from
the other using Excel.

2. Because some of the differences were negative I used the abs()
function in Excel to find the absolute values of the differences.

3. I sorted the data by the absolute differences between the home
and away disciplinary points per game. Ignoring the teams where
the difference was zero, I ranked them in order from the lowest to
the highest. Where several differences were the same I gave each
of them the mean of the ranks they shared.

4. I then looked to see where the differences had originally been
negative and I added the negative sign in front of the rank for
those differences. This gave me the signed rank.

5. Finally I found the greatest absolute sum of the signed rank (in
this case the negative ranks), which is the 'W' value. The number
of teams where the difference is not equal to zero gives the 'N'
value.

                    A            B            Original  Absolute  Rank of absolute  Signed Rank
Team                Home PPG     Away PPG     (XA-XB)   (XA-XB)   (XA-XB)

Manchester United   1.842105263  2.421052632  -0.57895  0.578947  7                 -7
Tottenham Hotspur   1.947368421  1.947368421  0         0
Birmingham City     1.947368421  2.842105263  -0.89474  0.894737  13                -13
Aston Villa         2.263157895  2.105263158  0.157895  0.157895  2                 2
Bolton Wanderers    2.105263158  2.368421053  -0.26316  0.263158  4                 -4
Portsmouth          1.434782609  1.913043478  -0.47826  0.478261  6                 -6
Wolverhampton       1.52173913   2.173913043  -0.65217  0.652174  8.5               -8.5
Norwich             1.47826087   1.47826087   0         0
Wimbledon           1.130434783  1.913043478  -0.78261  0.782609  11                -11
Rotherham United    1.869565217  2.869565217  -1        1         14                -14
Grimsby             2.304347826  1.608695652  0.695652  0.695652  10                10
Crewe Alexandra     0.913043478  1            -0.08696  0.086957  1                 -1
Cheltenham Town     1.608695652  1.434782609  0.173913  0.173913  3                 3
Huddersfield Town   1.130434783  2.52173913   -1.3913   1.391304  19                -19
Northampton Town    1.826086957  1.826086957  0         0
Bristol City        1.434782609  2.782608696  -1.34783  1.347826  18                -18
QPR                 1.695652174  2.782608696  -1.08696  1.086957  16.5              -16.5
Rushden & Diamonds  1.608695652  2.652173913  -1.04348  1.043478  15                -15
Lincoln City        2            2.652173913  -0.65217  0.652174  8.5               -8.5
Bury                1.043478261  2.608695652  -1.56522  1.565217  20                -20
Darlington          2.217391304  2.565217391  -0.34783  0.347826  5                 -5
Leyton Orient       1.826086957  2.695652174  -0.86957  0.869565  12                -12
Shrewsbury Town     2.173913043  3.260869565  -1.08696  1.086957  16.5              -16.5

W (sum of the negative signed ranks)   -195
|W|                                    195
N                                      20

I found that the value of W was 195 (the sum of the negative ranks;
the positive ranks add up to only 15), and that N, the number of teams
where the difference was not equal to zero, was 20. Looking these up
in a table of critical values (OCR AS/A Level MEI Structured
Mathematics Examination Formulae and Tables, October 2000) I found
that the result is significant at the 5% level. In other words, if
there were no real difference between home and away discipline, a
result this extreme would be expected less than 5% of the time. The
data therefore support my hypothesis.
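
The ranking steps in the method above can be reproduced outside Excel. The following Python sketch is my own illustration (the function name and the rounding to six decimal places are assumptions, the rounding being there so that equal differences are recognised as ties). It returns the sum of the positive ranks, the sum of the negative ranks and N, using the home and away points per game from the table.

```python
def wilcoxon_signed_rank(home, away, ndigits=6):
    """Return (sum of positive ranks, sum of negative ranks, N)."""
    # Steps 1-2: differences (home minus away), dropping the zero differences
    diffs = [round(h - a, ndigits) for h, a in zip(home, away)]
    diffs = [d for d in diffs if d != 0]
    # Step 3: rank by absolute difference, giving tied values their mean rank
    ordered = sorted(diffs, key=abs)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        ranks[abs(ordered[i])] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    # Steps 4-5: add up the ranks separately by the sign of the difference
    w_plus = sum(ranks[abs(d)] for d in diffs if d > 0)
    w_minus = sum(ranks[abs(d)] for d in diffs if d < 0)
    return w_plus, w_minus, len(diffs)


home_ppg = [1.842105263, 1.947368421, 1.947368421, 2.263157895, 2.105263158,
            1.434782609, 1.52173913, 1.47826087, 1.130434783, 1.869565217,
            2.304347826, 0.913043478, 1.608695652, 1.130434783, 1.826086957,
            1.434782609, 1.695652174, 1.608695652, 2.0, 1.043478261,
            2.217391304, 1.826086957, 2.173913043]
away_ppg = [2.421052632, 1.947368421, 2.842105263, 2.105263158, 2.368421053,
            1.913043478, 2.173913043, 1.47826087, 1.913043478, 2.869565217,
            1.608695652, 1.0, 1.434782609, 2.52173913, 1.826086957,
            2.782608696, 2.782608696, 2.652173913, 2.652173913, 2.608695652,
            2.565217391, 2.695652174, 3.260869565]

print(wilcoxon_signed_rank(home_ppg, away_ppg))
```

With the 23 sampled teams this gives 15 for the positive ranks, 195 for the negative ranks and N = 20. Many tables of critical values are entered with the smaller of the two rank sums, so it is worth checking which convention a given table uses before looking a value up.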


Hypothesis 2 - Better attended teams have a greater home advantage
------------------------------------------------------------------

I proposed this hypothesis because a better attended team would have
more of the crowd behind them when playing at home, giving them a
psychological advantage over their opponents.

As with the disciplinary points system, I used Excel to find the
points per game score for each team both at home and away. This time I
divided the home points per game score by the away and subtracted one
from this, expressing it as a percentage.

A problem arises because some teams have much bigger stadiums than
others. For example, 20,000 might be considered good attendance for a
First Division club, but very poor for a Premiership team. Because of
this I divided the average number of home supporters at each football
ground by its total capacity to give the average attendance
percentage. I plotted this against the home advantage percentage in a
scatter graph.
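
As a minimal sketch of the two percentages plotted on the scatter graph, the Python below takes the home advantage as home points per game over away points per game, minus one, and the attendance percentage as average attendance as a share of ground capacity. The function names and the example figures are hypothetical and are not taken from my data.

```python
def home_advantage_percent(home_ppg, away_ppg):
    """How much better (in %) a team's points per game is at home than away."""
    return (home_ppg / away_ppg - 1) * 100


def capacity_percent(average_attendance, ground_capacity):
    """Average attendance as a percentage of the ground's capacity."""
    return average_attendance / ground_capacity * 100


# Hypothetical team: 2.0 points per game at home, 1.0 away,
# averaging 15,000 supporters in a 20,000-capacity ground
print(home_advantage_percent(2.0, 1.0))
print(capacity_percent(15000, 20000))
```

A negative home advantage percentage means the team actually did better away, which is why a few teams in the results have values below zero.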


Pilot Study

The scatter graph is a useful way of looking for correlation between
two variables. As with the first hypothesis I used the data for the
Premiership and the First Division as a pilot test.

[IMAGE]

As you can see there is no strong correlation between these two
variables. There may be a slight trend for the higher home advantage
percentages to be towards the higher percentages of stadium capacity.
I decided to continue investigating this hypothesis because there
might be clearer correlation in the data from the other divisions.


Spearman's Rank

In order to tell whether or not there is a correlation between home
advantage and attendance I will need to use a statistical test.
Because this data is also nonparametric I will use the Spearman's Rank
Correlation Coefficient.


Method:

1. The first step was to rank the teams by both % Home Advantage and
Average % Capacity. As with the Wilcoxon test I found the mean of
tied ranks.

2. I found the difference between these two ranks by subtracting one
from the other using Excel.

3. I then squared the differences between the two ranks.

4. I used the formula below to find rs, the Spearman's Rank
Correlation Coefficient. My workings are illustrated in the table
overleaf.

rs = 1 - (6∑d²) / (n³ - n)

where:

d = the difference in the rank of the values of each matched pair

n = the number of pairs

                      % PPG Home  Average %  % PPG Home      Average %
Team                  Advantage   capacity   Advantage Rank  Capacity Rank  d      d²

Manchester United     52%         99.16%     24              1              23     529
Tottenham Hotspur     63%         99.06%     19              2              17     289
Portsmouth            23%         98.72%     35              3              32     1024
West Ham United       10%         96.59%     41              4              37     1369
Birmingham City       53%         96.07%     22              5              17     289
Everton               81%         95.82%     10              6              4      16
Brighton              50%         95.56%     26              7              19     361
West Bromwich Albion  17%         95.46%     37.5            8              29.5   870.25
Liverpool             21%         95.33%     36              9              27     729
Norwich               100%        94.81%     4.5             10             -5.5   30.25
Wolverhampton         -5%         90.25%     44              11             33     1089
Bolton Wanderers      93%         89.73%     8               12             -4     16
Hull City             68%         84.72%     16              13             3      9
Blackburn Rovers      31%         83.61%     29              14             15     225
Aston Villa           250%        81.87%     1               15             -14    196
Nottingham Forest     96%         79.85%     6.5             16             -9.5   90.25
Derby                 60%         75.81%     20              17             3      9
QPR                   24%         68.97%     34              18             16     256
Hartlepool United     66%         68.38%     17              19             -2     4
Northampton Town      79%         68.09%     11              20             -9     81
Crewe Alexandra       -21%        67.04%     46              21             25     625
Rotherham United      27%         65.33%     31              22             9      81
Rushden & Diamonds    56%         65.26%     21              23             -2     4
Preston North End     90%         64.70%     9               24             -15    225
Watford               73%         64.45%     13              25             -12    144
Cheltenham Town       29%         62.85%     30              26             4      16
Grimsby               17%         58.65%     37.5            27             10.5   110.25
AFC Bournemouth       96%         58.37%     6.5             28             -21.5  462.25
Bristol City          52%         55.36%     24              29             -5     25
York City             75%         46.48%     12              30             -18    324
Boston United         105%        46.20%     3               31             -28    784
Chesterfield          185%        45.85%     2               32             -30    900
Shrewsbury Town       5%          45.70%     42              33             9      81
Colchester United     15%         44.83%     39.5            34             5.5    30.25
Millwall              44%         42.24%     27              35             -8     64
Barnsley              26%         42.09%     32.5            36             -3.5   12.25
Scunthorpe United     32%         40.20%     28              37             -9     81
Darlington            70%         39.17%     15              38             -23    529
Huddersfield Town     100%        38.80%     4.5             39             -34.5  1190.25
Lincoln City          26%         35.94%     32.5            40             -7.5   56.25
Leyton Orient         65%         33.84%     18              41             -23    529
Peterborough United   15%         32.33%     39.5            42             -2.5   6.25
Wigan Athletic        -4%         29.15%     43              43             0      0
Bury                  -16%        27.65%     45              44             1      1
Port Vale             52%         19.84%     24              45             -21    441
Wimbledon             71%         10.55%     14              46             -32    1024

∑d²                          15227.5
n                            46
n³                           97336
1 - ((6∑d²) / (n³ - n))      0.0609

I found that rs = 0.0609, and that the critical value for rs at 10%
was 0.2456 (OCR AS/A Level MEI Structured Mathematics Examination
Formulae and Tables, October 2000). Because rs is smaller than the
critical value, the correlation is not significant at the 10% level:
a correlation this weak could easily have occurred by chance.
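
The Spearman's Rank working can be checked with a short Python sketch. The helper names are my own, and I have assumed ranks run from the highest value (rank 1) downwards, with tied values sharing their mean rank, as in the table. The last line simply re-checks the arithmetic from the totals at the foot of the table.

```python
def average_ranks(values):
    """Rank values from highest (1) to lowest, giving ties their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        mean_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[order[k]] = mean_rank
        i = j
    return ranks


def spearman_rs(x, y):
    """rs = 1 - 6 * sum(d^2) / (n^3 - n), with d the difference in ranks."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


print(average_ranks([52, 63, 52]))  # the two 52s share ranks 2 and 3
print(round(1 - 6 * 15227.5 / (46 ** 3 - 46), 4))  # totals from the table
```

This simple formula is exact only when there are no tied ranks. With a few ties, as here, it is a close approximation.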

This is no great surprise to me, as the pilot test showed little or no
correlation. Unfortunately my hypothesis does not seem to be correct.
Perhaps the fact that away supporters are not included might have made
a difference - if a team is well-supported away from home it might
reverse the disadvantage I predicted. I could not find any data on
away supporters so I am unable to investigate this possibility.


Hypothesis 3 - More successful teams have a better disciplinary record
----------------------------------------------------------------------


Pilot Study

My idea for a third hypothesis was that a team struggling at the
bottom of the table facing relegation would lose confidence and become
desperate, causing the players to commit more fouls. On the other
hand, a team near the top of the table would be confident and more
relaxed, and would not feel the need for desperate challenges etc.

As a pilot test I decided to plot a scatter graph to look for a
relationship between the position of a team within its division and
its disciplinary points per game. As with the other tests I used only
the data for the Premiership and the First Division.

[IMAGE]

This graph doesn't show an obvious trend, but there is a slight
tendency for the disciplinary points to rise further down the table,
especially in the First Division. The second team in Division 1
(Leicester, shown circled) is clearly an outlier, and perhaps if I
continued the study on the other divisions a clearer pattern would
emerge.

In order to test this hypothesis further I decided to take all of the
data from the Football League and randomly select 3 teams from the top
25% and 3 teams from the bottom 25% of each division. This means the
data is collected using stratified random sampling. However, as the
Premiership has only 20 teams instead of 24 it is slightly
over-represented compared to divisions 1-3.

Most importantly I am not using the data from the middle 50% of the
divisions, so any possible patterns there will be lost. However, there
are two good reasons to sacrifice this data. Firstly, any differences
between successful and unsuccessful teams would be most apparent at
the top and bottom of each division. Secondly I need a more manageable
sample size which I can perform statistical tests on.

I produced two histograms to show any difference between top and
bottom teams.

[IMAGE][IMAGE]

As you can see, slightly more teams in the lower quarters of the
divisions have higher disciplinary points per game, while slightly
more teams in the upper quarters of the divisions have lower
disciplinary points per game. The easiest way to tell this is that the
histogram for the bottom 25% is shifted slightly to the right compared
to the one for the top 25%.

I calculated the median for each set of data to give an idea of the
central tendency for each distribution. I used the median because I am
comparing the 'average team' in the top 25% with the 'average team' in
the bottom 25%. The median for the upper quarters is 2.12 and for the
lower quarters, 2.41 (answers to 2 decimal places), meaning there is a
14% difference between the two. This suggests that the disciplinary
points per game for the lower teams are generally higher than those of
the upper teams.

In order to tell for certain whether or not there is a significant
difference between the lower and upper quarters of the divisions I
would have to perform a statistical test. In this case I will use the
Mann-Whitney U-Test.


Mann-Whitney U-Test

This is a non-parametric statistical test to show whether or not two
groups of samples are from different populations. In this case it will
show whether or not there is a statistically significant difference
between teams in the top and bottom 25% of each division, comparing
their average disciplinary points per game.


Method:

1. First I ranked the data from both groups in increasing order of
size (see column B in the table overleaf).

2. Next, for each team in group b, I counted how many teams in group
a had a smaller disciplinary points per game total. Teams with
equal disciplinary points per game scored ½. I did the same for
group a. See column C in the table.

3. I found the total of the column C values for both group a and
group b. I called these two totals Ua and Ub.

4. I chose the smaller value of U and I looked up the critical
values of U at the 5% significance level.


A                      B                     C                           D
Team                   Average disciplinary  Number of teams in other    Top (a, blue) or
                       points per game       group with a lower points   bottom (b, red)
                                             per game score              group

Hartlepool United      1.369565217           0                           a
Bristol Rovers         1.652173913           1                           b
Portsmouth             1.673913043           1                           a
Cardiff City           1.739130435           1                           a
Sheffield United       1.826086957           1                           a
Rochdale               1.826086957           4                           b
Huddersfield Town      1.826086957           4                           b
AFC Bournemouth        1.847826087           3                           a
Wolverhampton          1.847826087           3                           a
Brighton               1.913043478           6                           b
Sheffield Wednesday    1.956521739           6                           b
Chelsea                1.973684211           5                           a
Stoke                  1.97826087            7                           b
Liverpool              2.131578947           6                           a
Aston Villa            2.184210526           8                           b
Scunthorpe United      2.195652174           7                           a
Bolton Wanderers       2.236842105           9                           b
QPR                    2.239130435           8.5                         a
Barnsley               2.239130435           9.5                         b
Swansea City           2.304347826           10                          b
West Ham United        2.315789474           10                          b
Arsenal                2.342105263           11                          a
Oldham Athletic        2.391304348           11                          a
Mansfield Town         2.456521739           12                          b

Sum of column C values for group a (Ua)      57.5
Sum of column C values for group b (Ub)      86.5


Results

I found that the lower value of U was Ua (57.5). The critical value
for U at the 5% significance level was 37 (Advanced Biology Study Guide
by C J Clegg & D G MacKean, 1996). This meant that Ua was larger than
the critical value of U at the 5% significance level. Therefore the
difference between teams in the top and bottom 25% of each division,
comparing the average disciplinary points per game, is not
significant at the 5% level. A difference of this size could
plausibly have been caused by chance alone.
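
The counting method for U can be sketched in Python as follows. The function name is my own, and the two lists are the points per game values of the top (a) and bottom (b) groups from the table. The sketch scores every tie as ½, exactly as step 2 describes.

```python
def mann_whitney_u(group_a, group_b):
    """Return (Ua, Ub): for each value, count the values in the other
    group that are lower, scoring 1/2 for each tie, then add them up."""
    def score(own, other):
        total = 0.0
        for x in own:
            for y in other:
                if y < x:
                    total += 1.0
                elif y == x:
                    total += 0.5
        return total
    return score(group_a, group_b), score(group_b, group_a)


top = [1.369565217, 1.673913043, 1.739130435, 1.826086957, 1.847826087,
       1.847826087, 1.973684211, 2.131578947, 2.195652174, 2.239130435,
       2.342105263, 2.391304348]
bottom = [1.652173913, 1.826086957, 1.826086957, 1.913043478, 1.956521739,
          1.97826087, 2.184210526, 2.236842105, 2.239130435, 2.304347826,
          2.315789474, 2.456521739]

u_a, u_b = mann_whitney_u(top, bottom)
print(u_a, u_b, min(u_a, u_b))
```

Run on these values the sketch gives Ua = 58.5 and Ub = 85.5 rather than 57.5 and 86.5. The difference comes from the three-way tie at 1.826086957 (Sheffield United, Rochdale and Huddersfield Town), which the table scores by row order rather than as ½ each. Either way the smaller U is well above the critical value of 37, so the conclusion is unchanged.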

Again this result is hardly surprising considering the lack of strong
correlation in the pilot test. There could be several reasons why this
hypothesis failed. Perhaps certain teams do well whilst still playing
dirty - maybe this is even a valid tactic for success!

It might also be the case that the disciplinary points scores for some
teams are disproportionately increased by certain players who are
frequently booked or sent off - Patrick Vieira of Arsenal for example.
I am unable to find data on individual players so I cannot investigate
this further.


Evaluation
----------

I am quite pleased with the way my investigation went. Although
hypotheses 2 and 3 were not statistically supported by my data, these
raised other interesting questions, which could be investigated. Of
course there are certain limitations to my study. The data I used came
from complete, published tables, and its authenticity is not in doubt.
However, there is nothing to say that the 2002-2003 season was a
typical one, and my results might have been different for a
different year.

Another important point to consider is that the data for different
teams is not independent. For example, because Manchester United was
top of the Premiership, no other team could possibly be top as well.
In fact, even the points totals of the teams are interdependent - a
team can only be judged in comparison to the other teams it plays. It
is possible that every team played worse in the 2002-2003 season than
in previous or subsequent seasons - it is impossible to tell if this
is true as the points totals for each team are relative to those of
the other teams. Therefore there can be no stand-alone measure of how
good a team is.

It is also important to remember that football is a sport played at
many levels, in hundreds of countries and by many age and social
groups. The English Football League is only a tiny part of this, and
if I conducted my study on different aspects of the game I might obtain
very different results.


Appendix - Team numbers
-----------------------

Premiership

Team                    Rank (within division)  Rank (overall)

Manchester United       1                       1
Arsenal                 2                       2
Newcastle United        3                       3
Chelsea                 4                       4
Liverpool               5                       5
Blackburn Rovers        6                       6
Everton                 7                       7
Manchester City         8                       8
Southampton             9                       9
Tottenham Hotspur       10                      10
Middlesbrough           11                      11
Charlton Athletic       12                      12
Birmingham City         13                      13
Fulham                  14                      14
Leeds United            15                      15
Aston Villa             16                      16
Bolton Wanderers        17                      17
West Ham United         18                      18
West Bromwich Albion    19                      19
Sunderland              20                      20

Division 1

Team                    Rank (within division)  Rank (overall)

Portsmouth              1                       21
Leicester               2                       22
Sheffield United        3                       23
Reading                 4                       24
Wolverhampton           5                       25
Nottingham Forest       6                       26
Ipswich                 7                       27
Norwich                 8                       28
Millwall                9                       29
Wimbledon               10                      30
Gillingham              11                      31
Preston North End       12                      32
Watford                 13                      33
Crystal Palace          14                      34
Rotherham United        15                      35
Burnley                 16                      36
Walsall                 17                      37
Derby                   18                      38
Bradford                19                      39
Coventry                20                      40
Stoke                   21                      41
Sheffield Wednesday     22                      42
Brighton                23                      43
Grimsby                 24                      44

Division 2

Team                    Rank (within division)  Rank (overall)

Wigan Athletic          1                       50
Crewe Alexandra         2                       45
Bristol City            3                       61
QPR                     4                       64
Oldham Athletic         5                       65
Cardiff City            6                       49
Tranmere Rovers         7                       58
Plymouth Argyle         8                       46
Luton Town              9                       67
Swindon Town            10                      48
Peterborough United     11                      56
Colchester United       12                      51
Blackpool               13                      60
Stockport County        14                      52
Notts County            15                      63
Brentford               16                      57
Port Vale               17                      53
Wycombe Wanderers       18                      59
Barnsley                19                      62
Chesterfield            20                      66
Cheltenham Town         21                      47
Huddersfield Town       22                      54
Mansfield Town          23                      68
Northampton Town        24                      55

Division 3

Team                    Rank (within division)  Rank (overall)

Rushden & Diamonds      1                       69
Hartlepool United       2                       70
Wrexham                 3                       71
AFC Bournemouth         4                       72
Scunthorpe United       5                       73
Lincoln City            6                       74
Bury                    7                       75
Oxford United           8                       76
Torquay United          9                       77
York City               10                      78
Kidderminster Harriers  11                      79
Cambridge United        12                      80
Hull City               13                      81
Darlington              14                      82
Boston United           15                      83
Macclesfield Town       16                      84
Southend United         17                      85
Leyton Orient           18                      86
Rochdale                19                      87
Bristol Rovers          20                      88
Swansea City            21                      89
Carlisle United         22                      90
Exeter City             23                      91
Shrewsbury Town         24                      92

How to Cite this Page

MLA Citation:
"Football Statistics Project." 123HelpMe.com. 17 Apr 2014
    <http://www.123HelpMe.com/view.asp?id=148009>.



