The Guardian vs. The Mirror

I am investigating the statistical differences between daily tabloid
newspapers and weekly broadsheet newspapers.

My overall hypothesis is that the daily tabloid papers, here
represented by the Saturday edition of The Mirror, use shorter words,
sentences and articles than the broadsheet papers, here represented by
the Guardian, a weekly broadsheet. To reach a conclusion, I plan to
test three hypotheses in specific areas. I will use a range of
sampling methods and ways of presenting data, in order to form valid
conclusions.

Planning

1 - My hypothesis is that the number of letters per word will be
greater in the Guardian than in the Mirror.

Number of letters - I will count the number of letters in every fourth
word.

In order to make my calculations accurate enough to reach a valid
conclusion, I must collect a minimum of twenty pieces of data from
each newspaper. I was planning to collect data from the fourth word in
the first sentence on each page. However, if my second hypothesis is
correct, then the sentences in the Guardian will be longer than those
in The Mirror. This would corrupt the results, as some would be more
accurate than others. So, I have decided to take the fourth and the
eighth word from the first article on each page. The sections of each
paper I have chosen are twenty-five pages long, so this will provide
more than enough data to support any conclusion I reach, and should
incorporate all sections of each newspaper.

I will display my results in a data frequency chart. Then I will use
averages and histograms, to compare the results and draw my
conclusion.
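The sampling rule described above (the fourth and the eighth word of an article) can be sketched in a few lines. This is only an illustration: the function name `letters_in_sampled_words` and the sample sentence are made up, and the sketch assumes a word is any run of letters and apostrophes, so a hyphenated word would count as two words.

```python
import re

def letters_in_sampled_words(text, positions=(4, 8)):
    """Return the letter counts of the words at the given 1-based positions."""
    # Assumption: a "word" is a run of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    counts = []
    for pos in positions:
        if pos <= len(words):
            # Count alphabetic characters only, so apostrophes are ignored.
            counts.append(sum(ch.isalpha() for ch in words[pos - 1]))
    return counts

sample = "The quick brown fox jumps over the lazy dog"
print(letters_in_sampled_words(sample))  # [3, 4] for "fox" and "lazy"
```

Positions that fall beyond the end of the text are skipped, so a very short article contributes fewer than two pieces of data.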

2 - My second hypothesis is that the number of words per sentence will
be fewer in The Mirror than in the Guardian.

Number of words - I'll count the number of words in the first
sentence, on each page.

In order to make my calculations accurate enough to reach a valid
conclusion, I must collect a minimum of twenty pieces of data from
each newspaper. The section I've chosen from each newspaper is
twenty-five pages long, so I will collect data from the first
sentence, in the first article of every page. This should incorporate
all sections of both newspapers, and provide more than enough data to
support any conclusion I reach.

I will display my results in a data frequency chart. Then I will use
standard deviation, averages, histograms, box and whisker diagrams,
and the quartile range, to compare the results and draw my conclusion.

3 - My hypothesis is that the larger the number of words in the
headline, the longer the article. I also believe that the number of
words in the headline and/or article, will be greater in the Guardian
than in The Mirror.

Number of words in the headline - All words will be included.

Length of article - The Guardian has a standard column width, so I
could simply measure the length of the column with a ruler. However,
The Mirror uses two different standard widths. I can't exclude columns
of one width, as there may be a pattern to which articles have the
wider width column, and which articles have the narrower one. The
Mirror is not separated into sections in the same way the Guardian is
(e.g. finance, politics, sport), but the column width may be its
equivalent way of sectioning off different forms of article. I may,
therefore, be excluding a large part of the newspaper and, in the
process, invalidating all my results and conclusions. So instead, I
will use the average number of words per sentence, which I calculated
whilst working on my second hypothesis, and then count the number of
sentences. I'll multiply the number of sentences by the average
number of words per sentence. I will use this to calculate a
reasonably accurate estimate of the number of words per article.
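The estimate described above is a single multiplication, rounded to a whole number of words. A minimal sketch, assuming the hypothetical function name `estimate_article_words` and using the two mean sentence lengths (18 and 30.2) that appear in the results for the second hypothesis:

```python
def estimate_article_words(sentence_count, mean_words_per_sentence):
    """Estimate article length as sentences x mean words per sentence."""
    return round(sentence_count * mean_words_per_sentence)

# A 21-sentence article at 18 words per sentence.
print(estimate_article_words(21, 18))    # 378
# A 53-sentence article at 30.2 words per sentence.
print(estimate_article_words(53, 30.2))  # 1601
```

The estimate is only as good as the assumption that every article in a paper has the same average sentence length as the sampled first sentences.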

In order to make my calculations accurate enough to have a valid
conclusion, I must collect a minimum of twenty pieces of data from
each paper. The section of each paper I have selected is twenty-five
pages long, so I will use the first article on every page. This should
incorporate all sections in the Guardian, and all the column widths
used in The Mirror. This will provide more than enough data to support
the conclusion I reach.

I will display my results in a scatter graph, and use a line of best
fit, to see if there is a positive correlation between the number of
words in the headline, and the article. I will also work out the mean
ratio of number of headline words to number of words in article, for
each newspaper, and use this to compare them. If my hypothesis is
correct, we may be able to compare the length of the articles, simply

Collecting Data

No. of Letters   Frequency of The Mirror    Total
1-2              IIIII III                  8 x 2 = 16
3-4              IIIII IIIII IIIII IIIII    20 x 4 = 80
5-6              IIIII IIIII III            13 x 6 = 78
7-8              IIIII IIII                 9 x 8 = 72
9-10                                        0 x 10 = 0
11-12                                       0 x 12 = 0

1 -

No. of Letters   Frequency of The Mirror    Cumulative Frequency
1-2              IIIII III                  8 (+ 20)
3-4              IIIII IIIII IIIII IIIII    28 (+ 13)
5-6              IIIII IIIII III            41 (+ 9)
7-8              IIIII IIII                 50 (+ 0)
9-10                                        50 (+ 0)

No. of Letters   Frequency of the Guardian  Total
1-2              IIIII                      5 x 2 = 10
3-4              IIIII IIIII III            13 x 4 = 52
5-6              IIIII IIIII IIIII I        16 x 6 = 96
7-8              IIII                       4 x 8 = 32
9-10             IIIII IIII                 9 x 10 = 90
11-12            III                        3 x 12 = 36

No. of Letters   Frequency of the Guardian  Cumulative Frequency
1-2              IIIII                      5 (+ 13)
3-4              IIIII IIIII III            18 (+ 16)
5-6              IIIII IIIII IIIII I        34 (+ 4)
7-8              IIII                       38 (+ 9)
9-10             IIIII IIII                 47 (+ 3)
11-12            III                        50 (+ 0)
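A cumulative frequency column is a running total of the frequency column, so its last entry must equal the sample size. A minimal sketch of that check, using the Guardian frequencies from the tally column (the variable names are illustrative only):

```python
from itertools import accumulate

# Guardian frequencies for the classes 1-2, 3-4, 5-6, 7-8, 9-10, 11-12.
guardian_frequencies = [5, 13, 16, 4, 9, 3]

# Each entry is the sum of all frequencies up to and including that class.
cumulative = list(accumulate(guardian_frequencies))
print(cumulative)  # [5, 18, 34, 38, 47, 50]
```

The final value of 50 matches the fifty words sampled from each paper.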

The information in these data frequency charts, and cumulative
frequency charts can also be displayed in many different ways. First,
I will use it to work out the mode, range and mean number of letters
per word:

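Mean for The Mirror:

Total frequency = 8 + 20 + 13 + 9 = 50

Total letters = 16 + 80 + 78 + 72 = 246

246/50 = 4.92

Mean for the Guardian:

Total frequency = 5 + 13 + 16 + 4 + 9 + 3 = 50

Total letters = 10 + 52 + 96 + 32 + 90 + 36 = 316

316/50 = 6.32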
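The means above multiply each frequency by the upper value of its class. A common alternative for grouped data is to use the class midpoint instead. The sketch below, with the hypothetical function name `grouped_mean`, reproduces the figures above and shows the midpoint version for comparison; it is an illustration under those assumptions, not a replacement for the working shown.

```python
def grouped_mean(classes, frequencies, use_midpoint=False):
    """Estimate a mean from grouped data.

    By default each class is represented by its upper value, as in the
    working above; with use_midpoint=True the class midpoint is used.
    """
    total = 0.0
    for (low, high), freq in zip(classes, frequencies):
        value = (low + high) / 2 if use_midpoint else high
        total += value * freq
    return total / sum(frequencies)

letter_classes = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12)]
mirror = [8, 20, 13, 9, 0, 0]
guardian = [5, 13, 16, 4, 9, 3]

print(round(grouped_mean(letter_classes, mirror), 2))    # 4.92
print(round(grouped_mean(letter_classes, guardian), 2))  # 6.32
print(round(grouped_mean(letter_classes, mirror, use_midpoint=True), 2))    # 4.42
print(round(grouped_mean(letter_classes, guardian, use_midpoint=True), 2))  # 5.82
```

Because every class is two letters wide, switching to midpoints lowers both means by 0.5 and leaves the difference between the papers at 1.4.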

My hypothesis was correct. The mean number of letters per word in the
Guardian is 1.4 more than the mean number of letters per word in the
Mirror.

Mode for the Mirror: 3-4

Mode for the Guardian: 5-6

This also supports my hypothesis, as the mode for the Guardian
is higher than the mode for the Mirror. The mode for the
Guardian is still fairly low, because there are still many more
words with 5-6 letters than there are with 12. However, if you look at
the range:

Range for the Mirror: 8-1= 7

Range for the Guardian: 12-1=11

You can see that the Guardian does have a much larger range,
demonstrating that it does use the longer words I predicted it would,
although it obviously still has to use average length words as well.

I will also use a histogram to back up the results shown above in a
clearer format.

No. of Letters   Frequency of the Mirror    Frequency Density
1-2              IIIII III                  8/1 = 8
3-4              IIIII IIIII IIIII IIIII    20/3 = 6.67 (2dp)
5-6              IIIII IIIII III            13/5 = 2.6
7-8              IIIII IIII                 9/7 = 1.29 (2dp)

No. of Letters   Frequency of the Guardian  Frequency Density
1-2              IIIII                      5/1 = 5
3-4              IIIII IIIII III            13/3 = 4.33 (2dp)
5-6              IIIII IIIII IIIII I        16/5 = 3.2
7-8              IIII                       4/7 = 0.57 (2dp)
9-10             IIIII IIII                 9/9 = 1
11-12            III                        3/11 = 0.27 (2dp)
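The tables above divide each frequency by the lower value of its class. The usual definition of frequency density is frequency divided by class width. A minimal sketch of that calculation, assuming whole-number classes (so 1-2 covers two values) and the hypothetical function name `frequency_density`:

```python
def frequency_density(classes, frequencies):
    """Return frequency / class width for each class."""
    densities = []
    for (low, high), freq in zip(classes, frequencies):
        width = high - low + 1  # a whole-number class such as 1-2 spans two values
        densities.append(freq / width)
    return densities

letter_classes = [(1, 2), (3, 4), (5, 6), (7, 8)]
mirror_frequencies = [8, 20, 13, 9]

print(frequency_density(letter_classes, mirror_frequencies))  # [4.0, 10.0, 6.5, 4.5]
```

When every class has the same width, as here, the densities are simply the frequencies scaled by a constant, so the histogram has the same shape as a plain frequency chart.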

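The Mirror Histogram

[Histogram: frequency density (y axis, 0 to 6.7) against number of
letters (x axis: 1-2, 3-4, 5-6, 7-8)]

The Guardian Histogram

[Histogram: frequency density (y axis, 0 to 5) against number of
letters (x axis: 1-2, 3-4, 5-6, 7-8, 9-10, 11-12)]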

I conclude that my hypothesis was correct. The number of letters per
word in the Guardian is greater than the number of letters per word in
The Mirror. I think I tested it effectively and fairly, although it
may have been useful to check the consistency of each result using
standard deviation, or by working out the inter-quartile range.
However, this information still clearly proves my hypothesis.
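The consistency check mentioned above could be done with a standard deviation estimated from the grouped data. The sketch below is one way to do it, not part of the original working: it assumes class midpoints represent each class, uses the population formula, and the function name `grouped_std_dev` is illustrative.

```python
from math import sqrt

def grouped_std_dev(classes, frequencies):
    """Population standard deviation estimated from class midpoints."""
    n = sum(frequencies)
    midpoints = [(low + high) / 2 for low, high in classes]
    mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n
    variance = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies)) / n
    return sqrt(variance)

letter_classes = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12)]
mirror_sd = grouped_std_dev(letter_classes, [8, 20, 13, 9, 0, 0])
guardian_sd = grouped_std_dev(letter_classes, [5, 13, 16, 4, 9, 3])

print(round(mirror_sd, 2), round(guardian_sd, 2))
```

A larger value for one paper would indicate that its word lengths are more spread out around the mean.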

No. of Words     Frequency of the Mirror    Total
1-5              I                          1 x 5 = 5
6-10             III                        3 x 10 = 30
11-15            IIII                       4 x 15 = 60
16-20            IIIII IIIII IIII           14 x 20 = 280
21-25            III                        3 x 25 = 75

No. of Words     Frequency of the Guardian  Total
1-5                                         0 x 5 = 0
6-10             I                          1 x 10 = 10
11-15            I                          1 x 15 = 15
16-20            I                          1 x 20 = 20
21-25            IIIII II                   7 x 25 = 175
26-30            IIIII II                   7 x 30 = 210
31-35            I                          1 x 35 = 35
36-40            IIIII                      5 x 40 = 200
41-45            II                         2 x 45 = 90

2 -

I will now use these figures in the data frequency chart, to work out
the averages of each, and compare the two. The mean, mode and range:

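The mean for the Mirror:

Total frequency = 1 + 3 + 4 + 14 + 3 = 25

Total words = 5 + 30 + 60 + 280 + 75 = 450

450/25 = 18

The mean for the Guardian:

Total frequency = 1 + 1 + 1 + 7 + 7 + 1 + 5 + 2 = 25

Total words = 10 + 15 + 20 + 175 + 210 + 35 + 200 + 90 = 755

755/25 = 30.2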

My hypothesis was correct. The mean number of words per sentence in
the Guardian was 12.2 more than the mean number of words per sentence
in the Mirror.

The Mode for the Mirror: 16-20

The Mode for the Guardian: there are two modal classes, 21-25 and
26-30, with midpoints 23 and 28, so (23+28)/2 = 25.5
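Finding the modal class means picking the class with the highest frequency, and there can be more than one. A minimal sketch, using the frequencies from the tally charts and the hypothetical function name `modal_classes`:

```python
def modal_classes(classes, frequencies):
    """Return every class whose frequency equals the highest frequency."""
    top = max(frequencies)
    return [c for c, f in zip(classes, frequencies) if f == top]

word_classes = [(1, 5), (6, 10), (11, 15), (16, 20), (21, 25),
                (26, 30), (31, 35), (36, 40), (41, 45)]
mirror = [1, 3, 4, 14, 3, 0, 0, 0, 0]
guardian = [0, 1, 1, 1, 7, 7, 1, 5, 2]

print(modal_classes(word_classes, mirror))    # [(16, 20)]
print(modal_classes(word_classes, guardian))  # [(21, 25), (26, 30)]
```

The Guardian data is bimodal, which is why a single figure for its mode has to be built from two classes.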

This backs up my earlier results and supports my hypothesis. The mode
for the Guardian is larger than the mode for the Mirror, although the
range demonstrates the true extent of the Guardian's sentences.

Range for the Mirror: 25 - 5 = 20

Range for the Guardian: 45 - 10 = 35

The Guardian has a larger range. This is because there are always
some exceptions, and there was one sentence which had only 6-10 words.

I will use a histogram to back up my results:

No. of Words     Frequency of the Mirror    Frequency Density
1-5              I                          1/5 = 0.2
6-10             III                        3/10 = 0.3
11-15            IIII                       4/15 = 0.27 (2dp)
16-20            IIIII IIIII IIII           14/20 = 0.7
21-25            III                        3/25 = 0.12

No. of Words     Frequency of the Guardian  Frequency Density
1-5                                         0/5 = 0
6-10             I                          1/10 = 0.1
11-15            I                          1/15 = 0.07 (2dp)
16-20            I                          1/20 = 0.05
21-25            IIIII II                   7/25 = 0.28
26-30            IIIII II                   7/30 = 0.23 (2dp)
31-35            I                          1/35 = 0.03 (2dp)
36-40            IIIII                      5/40 = 0.125
41-45            II                         2/45 = 0.04 (2dp)

The Mirror Histogram

[Histogram: frequency density (y axis, 0 to 0.7) against number of
words (x axis: 1-5, 6-10, 11-15, 16-20, 21-25)]

The Guardian Histogram

[Histogram: frequency density (y axis, 0 to 0.3) against number of
words (x axis: 1-5, 6-10, 11-15, 16-20, 21-25, 26-30, 31-35, 36-40,
41-45)]

I conclude that my hypothesis was correct. The number of words per
sentence in the Guardian was greater than the number of words per
sentence in the Mirror. I think my investigation was fair, and the
results were clear.

3 - The Mirror

No. of words    No. of sentences   No. of words         Ratio     Decimal ratio
in headline     per article        per article                    (3dp)
2               21                 21 x 18 = 378        1:189     0.005
4               29                 29 x 18 = 522        2:261     0.008
1               14                 14 x 18 = 252        1:252     0.004
3               20                 20 x 18 = 360        1:120     0.008
15              71                 71 x 18 = 1278       5:426     0.012
5               13                 13 x 18 = 234        5:234     0.021
3               32                 32 x 18 = 576        1:192     0.005
7               24                 24 x 18 = 432        7:432     0.016
5               15                 15 x 18 = 270        1:54      0.019
8               11                 11 x 18 = 198        4:99      0.040
4               124                124 x 18 = 2232      1:558     0.002
3               31                 31 x 18 = 558        1:186     0.005
5               97                 97 x 18 = 1746       5:1746    0.003
11              24                 24 x 18 = 432        11:432    0.025
1               19                 19 x 18 = 342        1:342     0.003
3               27                 27 x 18 = 486        1:162     0.006
9               66                 66 x 18 = 1188       1:132     0.008
9               43                 43 x 18 = 774        1:86      0.012
2               33                 33 x 18 = 594        1:297     0.003
7               111                111 x 18 = 1998      7:1998    0.004
9               20                 20 x 18 = 360        1:40      0.025
2               15                 15 x 18 = 270        1:135     0.007
7               11                 11 x 18 = 198        7:198     0.035
8               24                 24 x 18 = 432        1:54      0.019
3               37                 37 x 18 = 666        1:222     0.005

The average number of words per sentence (The Mirror): 18

The average number of words per sentence (the Guardian): 30.2

No. of words    No. of sentences   No. of words per     Ratio     Decimal ratio
in headline     in article         article (0dp)                  (3dp)
4               53                 53 x 30.2 = 1601     4:1601    0.002
7               18                 18 x 30.2 = 544      7:544     0.013
6               19                 19 x 30.2 = 574      3:287     0.010
5               19                 19 x 30.2 = 574      5:574     0.009
7               39                 39 x 30.2 = 1178     7:1178    0.006
12              26                 26 x 30.2 = 785      12:785    0.015
7               14                 14 x 30.2 = 423      7:423     0.017
8               61                 61 x 30.2 = 1842     4:921     0.004
8               24                 24 x 30.2 = 725      8:725     0.011
6               17                 17 x 30.2 = 513      2:171     0.012
8               27                 27 x 30.2 = 815      8:815     0.010
11              35                 35 x 30.2 = 1057     11:1057   0.010
4               51                 51 x 30.2 = 1540     1:385     0.003
7               17                 17 x 30.2 = 513      7:513     0.014
9               55                 55 x 30.2 = 1661     9:1661    0.005
5               65                 65 x 30.2 = 1963     5:1963    0.003
2               15                 15 x 30.2 = 453      2:453     0.004
4               12                 12 x 30.2 = 362      2:181     0.011
5               15                 15 x 30.2 = 453      5:453     0.011
2               23                 23 x 30.2 = 695      2:695     0.003
6               31                 31 x 30.2 = 936      1:156     0.006
8               16                 16 x 30.2 = 483      8:483     0.017
11              29                 29 x 30.2 = 876      11:876    0.013
9               14                 14 x 30.2 = 423      1:47      0.021

The Mirror

[Scatter graph: x axis 0 to 12, y axis 0 to 68]

The Guardian

[Scatter graph: x axis 0 to 16, y axis 0 to 130]

There doesn't appear to be any direct correlation between the number
of words in the headline and the number of words in the article. I
conclude that my hypothesis was incorrect, and therefore, I do not
have the data to compare the two results.
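Reading a scatter graph by eye can be backed up with a correlation coefficient. The sketch below computes Pearson's r for the Mirror figures (headline words against sentences per article); it is an illustration rather than part of the original working, and the function name `pearson_r` is hypothetical. Because each article length is the sentence count multiplied by a constant 18, r is the same whether sentences or estimated words are used.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# The Mirror: headline word counts and sentence counts, in table order.
headline_words = [2, 4, 1, 3, 15, 5, 3, 7, 5, 8, 4, 3, 5,
                  11, 1, 3, 9, 9, 2, 7, 9, 2, 7, 8, 3]
sentences = [21, 29, 14, 20, 71, 13, 32, 24, 15, 11, 124, 31, 97,
             24, 19, 27, 66, 43, 33, 111, 20, 15, 11, 24, 37]

print(round(pearson_r(headline_words, sentences), 2))
```

A value close to 0 would agree with the reading of the scatter graph, while a value close to 1 would suggest a positive correlation that the graph did not make obvious.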

My overall conclusion is that my original hypothesis was correct. The
fact that the Guardian uses longer words, has longer sentences, and,
as you can see from my third investigation, has longer articles, shows
that it is aimed at the more intelligent reader, who intends to find
out more about the subject in context.

I think my investigation went well, although section three's results
were disappointing. I think I investigated them in the quickest,
fairest way possible, and displayed them clearly, in a number of ways,
to represent the results.