I aim to compare mass-appeal tabloid newspapers and quality newspapers
by attempting to find statistical differences. To represent the
mass-appeal papers, I chose the Daily Mirror, and for the quality
newspapers, I chose the Times. Hopefully, there will be some
significant statistical differences in the style of journalism which I
will be able to comment on.

Pre-Test

Data Collection: I decided to choose similar pages from both the Times
and the Mirror with roughly equal numbers of paragraphs and adverts:
pages 4-5, or 4-6 in the Mirror, as there were otherwise not enough
sentences to take samples from. To find mean sentence lengths in the
two papers, I decided I would sample systematically from my
populations, counting the number of words in every 3rd sentence. I
quickly came up against several problems. Should I include headlines
in my count? I decided against it, as headlines tend to be shorter
than normal sentences. The next problem came with numbers: did they
get counted as words in the sentences? Making sure that I did the same
with both papers, I decided to exclude numbers from my count. I also
decided to exclude any sentences in adverts, as the number of adverts
on the compared pages varied. I then took the mean and found the
standard deviation of my data.

To find the average word length, I decided to 'cluster-sample', and
count the letters in each of the first 30 words in the first paragraph
of page 4 in each paper. I decided that I would again exclude numbers,
and that hyphenated words counted as a single word. Again, when I had
found all the data, I found its mean and standard deviation.
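
The counting rules described above can be sketched in code. This is only an illustration, assuming plain-text sentences. The helper names (`words_in`, `sentence_length`, `word_length`), the rule that a token is a 'number' when it contains a digit and no letters, and the choice to count letters only (so a hyphen adds nothing to a word's length) are assumptions, not rules stated in the project. The standard deviation shown is the population form; the project does not say which form was used.

```python
import re
from statistics import mean, pstdev

PUNCTUATION = ".,;:!?\"'()[]-"

def is_number(token):
    # Assumption: a token is a "number" if it has a digit and no letters.
    return bool(re.search(r"\d", token)) and not re.search(r"[A-Za-z]", token)

def words_in(sentence):
    # Split on spaces, trim punctuation from both ends, then drop numbers.
    # An internal hyphen survives, so "mass-appeal" stays one word.
    tokens = (t.strip(PUNCTUATION) for t in sentence.split())
    return [t for t in tokens if t and not is_number(t)]

def sentence_length(sentence):
    return len(words_in(sentence))

def word_length(word):
    # Letters only (assumption): the hyphen in "mass-appeal" is not counted.
    return sum(c.isalpha() for c in word)

sample = "The mass-appeal paper sold 2 million copies in 2004."
print(sentence_length(sample))                     # "2" and "2004" are skipped
print([word_length(w) for w in words_in(sample)])

sentence_lengths = [18, 22, 20]                    # made-up counts, not project data
print(mean(sentence_lengths), pstdev(sentence_lengths))
```

Whatever rules are chosen, the important point is the one made above: they must be applied identically to both papers.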

Location: As can be seen from this box and whisker diagram, the Times
has a similar mean sentence length (20 to 1sf.) to the Mirror (18).
This shows that the average sentences in the Mirror and the Times
contain a similar number of words. In the box and whisker diagram for
word lengths, it is visible that their medians are the same.

The mean word lengths of both the Times and the Mirror are very
similar, separated by only 0.4 letters (the Times mean 5.5 letters,
the Mirror 5.1 letters), which does not signify much as the sample
was small.

Spread: The 1st and 3rd quartiles of the Times sentence lengths are
much more widely spaced than those of the Mirror, and the Times has a
higher standard deviation (9 compared to the Mirror's 6), showing that
the sentences in the Times on the recorded pages vary more widely in
length than those of the Mirror. The furthest outliers of the Times
sentence lengths are also much more widely spread than those of the
Mirror.

Skew: There seems to be a slight positive skew in the Times's sentence
lengths, but the Mirror's distribution is roughly symmetrical.

[IMAGE][IMAGE] The word lengths in the two papers also have similar
1st and 3rd quartiles, showing that there is no significant difference
in the lengths of the words used in the two newspapers.

I think that the pre-test results will be reflected in the results of
the main samples and I would be surprised if this was not the case.

Hypothesis

From the results of the pre-test, I have come up with the following
hypotheses.

1) H(0) There will be no differences between the two papers in terms
of sentence lengths.

H(1) Times will have longer sentence lengths and a wider spread of
sentence lengths than the Mirror.

2) H(0) The two newspapers will have similar word lengths.

H(1) Times will have longer word lengths.

3) H(0) There will be no significant differences between the picture
to text area ratio of the two papers.

H(1) Times will have a lower mean picture to text area ratio than the
Mirror.

4) H(0) In corresponding articles written on the same subject and
event, the article lengths in both papers will be the same.

H(1) In corresponding articles written on the same subject and event,
the articles in Times will be longer.

Sampling Techniques

For the population of the project, I used the Times and the Daily
Mirror, both printed on Friday 9th January, 2004. In my opinion, this
was a good day to take the population from as it was a weekday,
meaning there were no bundles of extras and supplements, and at the
time there were no big events going on, such as a war (as there had
recently been in Iraq) or important sporting fixtures such as the
Olympics, during which newspapers tend to thicken and print 'special
editions'. This meant the newspapers on the 9th January were pretty
'normal', so to speak, and represented well the general publication of
these newspapers. Also, I chose the tabloid edition of the Times so
the comparison in the picture to text area ratio would be fair.

Obviously, measuring the whole population of a newspaper would take an
extremely long time. Therefore, I have taken samples to represent the
population while at the same time minimising bias. I took samples from
8 different pages in both papers, and to make sure that the pages I
chose correctly represented the entire population, I looked at the
proportion of pages belonging to each category (general news,
business, sports and TV/entertainment) in the two papers. I thought it
would be best to use stratified sampling, as in a population with
several sub-genres, stratified sampling takes the variety into account
by sampling in proportion to the size of each sub-type. So, to do
this, I had to decide how many pages I was going to take from each of
the papers and in each of the categories. I chose to exclude adverts
and non-news material, like comments, obituaries, letters to the
editor and debate, from the samples because I think that these
sections do not reflect the actual journalism of the newspapers, as
they tend to differ greatly from the rest of the paper. From the
proportion of pages in each category, I decided to take 3 pages of
news, 2 of sports, 2 of business and 1 TV/entertainment page from the
Times, and from the Mirror, 2 news, 2 sports and 4 TV/entertainment
pages. Interestingly, there was no section in the Mirror devoted to
business news. I labelled each page and, using a random number
generator, selected the pages to be sampled from.
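
The stratified selection described above can be sketched as follows. This is a minimal illustration, not the procedure actually run for the project: the page numbers per category are hypothetical (chosen only so that the proportions give the 3/2/2/1 split used for the Times), and `allocate` and `pick_pages` are names invented here. Rounding uses the largest-remainder rule, which is one common convention among several.

```python
import random

def allocate(pages_per_category, total):
    # Share out `total` sample pages in proportion to each category's size,
    # giving any leftover pages to the largest fractional remainders.
    population = sum(pages_per_category.values())
    exact = {c: total * n / population for c, n in pages_per_category.items()}
    quota = {c: int(v) for c, v in exact.items()}
    leftover = total - sum(quota.values())
    by_remainder = sorted(exact, key=lambda c: exact[c] - quota[c], reverse=True)
    for c in by_remainder[:leftover]:
        quota[c] += 1
    return quota

def pick_pages(pages_by_category, quota, seed=None):
    # Random selection (without replacement) inside each category.
    rng = random.Random(seed)
    return {c: sorted(rng.sample(pages_by_category[c], quota[c])) for c in quota}

# Hypothetical page numbers for one paper, not counted from the real issue.
times_pages = {
    "news": list(range(1, 19)),
    "sport": list(range(19, 31)),
    "business": list(range(31, 43)),
    "tv": list(range(43, 49)),
}
quota = allocate({c: len(p) for c, p in times_pages.items()}, total=8)
print(quota)
print(pick_pages(times_pages, quota, seed=1))
```

Fixing the seed only makes the sketch repeatable; in practice any random number generator serves the same purpose.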

For counting the length of the sentences, I used systematic sampling.
Starting on the upper-most sentence on the page, I took a sample from
every 4th sentence. I took 80 sample sentences from the 8 selected
pages, so 10 on each page. I used a similar technique for getting the
lengths of the words, but I took 20 samples from each page, using
every 10th word.
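
As a small sketch of the systematic sampling just described: the function name and the numbered stand-in "sentences" are hypothetical, and it assumes the first sentence on the page is itself the first sample, which the description leaves open.

```python
def systematic_sample(items, step, size, start=0):
    # Take every `step`-th item from index `start`, stopping after `size` items.
    return items[start::step][:size]

# Stand-ins for the sentences and words on one page, numbered in reading order.
sentences_on_page = list(range(1, 41))
words_on_page = list(range(1, 201))

print(systematic_sample(sentences_on_page, step=4, size=10))  # 10 sentences per page
print(len(systematic_sample(words_on_page, step=10, size=20)))  # 20 words per page
```

Note that 10 samples at every 4th sentence need at least 37 sentences on the page, and 20 samples at every 10th word need at least 191 words.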

For the text area to rest area ratio, I obtained the data from the
same 8 pages I took the word and sentence samples from.

In the articles in both papers, I inevitably came across
abbreviations, acronyms and numbers. I decided that in counting the
sentence lengths, I should include them, but ignore them when getting
the word lengths, as their lengths are irrelevant to the style of
journalism. I also opted to exclude headlines and captions underneath
pictures, as they are usually shortened and are therefore 'outliers',
extremes compared to the rest of the population. I included
prepositions in both the word and the sentence length counts because I
felt that their use and frequency is something that could be affected
by the journalists.

Also, the systematic sampling I used is not as unbiased as random
sampling, because I often hit on extremes.

Hypothesis 1

- H(0) There will be no significant difference between the two papers
in terms of sentence lengths.

H(1) Times will have longer sentence lengths and a wider spread of
sentence lengths than the Mirror.

[IMAGE]

Statistics            Times    Mirror   Approx. difference

Mean                  26       18       35%
Standard deviation    11       9        20%
Median                27       19       35%
IQR                   14.75    11.75    20%

*Please note that all approximate differences in this paper are
calculated as 100{(1±T/M)+(1±M/T)}/2, rounded to the nearest 5%, where
T = figure for the Times and M = figure for the Mirror.
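
The note above can be read as the average of two relative differences: one taken as a fraction of the Mirror figure and one as a fraction of the Times figure. The sketch below follows that reading; resolving the ± signs with absolute values is an assumption, and `approx_difference` is a name made up for this illustration.

```python
def approx_difference(t, m):
    # Difference as a fraction of the Mirror figure, and of the Times figure.
    relative_to_mirror = abs(t / m - 1)
    relative_to_times = abs(1 - m / t)
    exact = 100 * (relative_to_mirror + relative_to_times) / 2
    # Round to the nearest 5%.
    return 5 * round(exact / 5)

print(approx_difference(27, 19))   # median sentence lengths
print(approx_difference(11, 9))    # standard deviations of sentence lengths
```

Averaging the two directions avoids having to pick one paper as the baseline, since a gap looks larger as a fraction of the smaller figure than of the larger one.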

The reason why I drew a variable-width histogram to compare the two
sets of data is that if I had drawn a box-and-whisker diagram, the
outliers in both papers would have been excluded, but I want the
outliers to be included because I think they do have some reflection
on the style of journalism. Also, histograms show the general
distribution well.

Interpretation of Results

Location: The top histogram shows the sentence lengths in the Times
and the bottom one the Mirror. The mean sentence length in the Times
(26) is substantially higher than that of the Mirror (18), by about 8
words per sentence. Given that neither mean exceeds 30, this is quite
something; a 35% difference is significant, I would say.

Spread: The standard deviation for the Times is 11, and for the Mirror
it is 9, again showing that the Times has a wider spread, although
this is only slight. The 20% difference in both standard deviation and
IQR indicates a greater variety of journalism in the Times. Not only
that, the Mirror has fewer sentences anyway, so it would be harder to
find sentences which would be considered outliers in that they are
longer than the 'average' sentence.
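
For reference, the two measures of spread used here can be computed as in the sketch below. The data are made up, and the quartile convention (the "inclusive" interpolation in Python's `statistics` module) is an assumption; the project does not say which convention produced its quartiles, and different conventions give slightly different IQRs on small samples.

```python
from statistics import pstdev, quantiles

def spread(lengths):
    # Quartiles by linear interpolation, treating the data as the whole population.
    q1, _median, q3 = quantiles(lengths, n=4, method="inclusive")
    return {"iqr": q3 - q1, "sd": pstdev(lengths)}

example_lengths = [1, 2, 3, 4, 5, 6, 7]   # illustrative values, not project data
print(spread(example_lengths))
```

The IQR ignores the extremes while the standard deviation is pulled by them, which is why it is worth quoting both.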

Skew: The sentence lengths in both papers seem to be positively
skewed, though in the Times it is very slight compared with the
Mirror. Naturally, you would expect there to be positive skew.
However, I do not think there is much to comment on here.

For further investigation, I could do the normal distribution test,
but I do not have to.
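
One possible reading of 'the normal distribution test' is a two-sample z-test on the difference between the mean sentence lengths. The sketch below works only from the rounded summary figures in the table and assumes 80 sampled sentences per paper; it is an illustration of the method, not a result obtained from the raw data, and `two_sample_z` is a name invented here.

```python
from math import erfc, sqrt

def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    # Standard error of the difference between two independent sample means.
    standard_error = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = (mean_a - mean_b) / standard_error
    # Two-sided tail probability under the standard normal distribution.
    p_two_sided = erfc(abs(z) / sqrt(2))
    return z, p_two_sided

# Times: mean 26, s.d. 11; Mirror: mean 18, s.d. 9; assumed n = 80 for each.
z, p = two_sample_z(26, 11, 80, 18, 9, 80)
print(round(z, 2), p)
```

A z value this far from zero would be very unlikely if the two papers really had the same mean sentence length, which is in line with the conclusion drawn from the histograms.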

Conclusion: There was quite a significant difference in the sentence
lengths. I reject H(0) in favour of H(1); the Times has quite
significantly longer sentences than the Mirror. The fact that the
Times has longer sentences could be due to a number of reasons. The
Mirror is a shorter paper than the Times, so one way to deal with as
many subjects and still stay 'thin' is to cut down on the provision of
tiny details to the readers. In the Times, small details usually tend
to be crammed into the same sentence after the main clause,
effectively lengthening the sentences.

The Times did have a slightly larger spread of sentence lengths, but
not so much that I can say the difference was significant. However,
the spread occurs at different levels; the median of the Times'
sentence lengths is 27, whereas in the Mirror it was 19.

Hypothesis 2

- H(0) The two newspapers will have similar word lengths.

H(1) Times will have longer word lengths.

[IMAGE] Diagram: I have chosen to use a box plot to compare the word
lengths of the two papers because it shows clearly the location,
spread and skew of the data at one glance.

Location: The median in both cases lies at 4. The difference in the
means between the two papers is under 0.5 letters, with the Times at
4.7 and the Mirror at 4.3, which is quite close and therefore
insignificant.

Spread: The spread in the Times seems to be larger. Its interquartile
range is 4, whereas in the Mirror it is 2. You could comment that this
does show a larger variety in the vocabulary used in the Times, but it
might just be that there happened to be fewer prepositions in the
Times sample, and as the sample was taken systematically, this may
well be the case.

Skew: The results in both papers are positively skewed. However, one
would naturally expect that because of the presence of prepositions
and articles in the English language, which occur very frequently and
usually are very short, often only one or two letters.

Further Investigation: Seeing that there was no significant difference
in the word lengths of the two papers, I wondered if this is so in all
English literature. So, I did a test to see if this was the case. I
took 20 words from two books aimed at different age groups. The sample
was selected by choosing every 4th word from a random page. I chose a
book that is read by 9-10 year olds and another which is aimed at
adults. I found that the results are much the same as the results for
the two newspapers. They have the same median to start with, and they
share the lower quartile. However, 'The Glass Bead Game' (the adult
book) has a wider IQR, indicating that the spread in the lengths of
its words is bigger than that of the children's book.

[IMAGE]

Conclusion: There is no significant difference. H(1) is rejected in
favour of H(0). There was actually more difference in the pre-test
samples. Maybe this lack of difference in word lengths is something
that is present in all English prose, as the further test has shown.
It seems that there is not much to say about the comparison between
the two papers in terms of word lengths. The actual word lengths are
so similar that I cannot conclude much from them; their median was the
same, for example. Although the IQR in the Times was bigger than that
of the Mirror by 2, I think that was just 'unlucky'; with the IQR
being so small and the figures heavily clustered around the median, it
could just be by chance that there happened to be one or two fewer
2- or 3-lettered words.

Hypothesis 3

- H(0) There will be no significant difference in the text area to
rest area ratio in the pages of the two newspapers.

H(1) There will be a significant difference in the text area to rest
area ratio in the pages of the two papers.

Mirror

[IMAGE]

[IMAGE] Diagram: I have drawn two scatter diagrams, with the y-axis
being the percentage of the non-text area in the pages and the x-axis
the percentage of text area in the corresponding pages. This means
that the x co-ordinate and the y co-ordinate of a point always add up
to 100. So, all the points in the scatter diagrams lie on a straight
diagonal line between (0,100) and (100,0) on the graph. However, it is
not the gradient that matters in these two diagrams, as both have the
same gradient; it is the location, spread and skew of the
co-ordinates, which will give us information about the trend in the
text area to rest area ratio in the newspapers.

Times

The sample size is not too big, as I only took the data from 8 pages.
Still, I hope that what little sample I have would show some
significant differences.

Statistics                             Times   Mirror   Approx. difference

Mean percentage of non-text area       43      63       40% (20% if you just take 63-43)
Standard deviation of non-text area    11      9        15%

Interpretation of the Results

Location: You can see at a glance that the co-ordinates in the Mirror
are grouped further up the y-axis than those of the Times. Clearly the
Mirror has less text in its pages than the Times. Although it might
appear that the difference in the means is 20%, that is only the gap
in percentage points (63 - 43); measured relative to the size of the
actual figures, the difference is around 40%. If this is so with all
other pages outside the 8 sample pages I chose, it suggests that the
Mirror places greater emphasis on its big headlines and pictures and
relies on them to attract readers, whereas in the Times, the articles
themselves have priority.

Spread: The co-ordinates for the Mirror are a little more clustered
than those of the Times. This is because there are pages in the Times
where a high percentage of the page is occupied by pictures, but on
the other hand there are pages where there are hardly any pictures.
The standard deviation for the Times was 11, whereas in the Mirror it
was 9. However, it is not a very significant difference.

Further Investigation: As 8 pages is not considered to be a big enough
sample to be sure of the differences it indicates, I have prepared a
frequency density histogram, with a much larger sample, to illustrate
the text to rest ratio differences in the two papers. This indicates
that not only is there more text in general in the Times pages, but
there is also a bigger spread in the ratio.

[IMAGE]

Conclusion: There was a significant difference in the text area to
rest area ratio in the two newspapers, with the Times having
considerably more text in its pages than the Mirror. H(0) is rejected
in favour of H(1). This strongly suggests that the Times concentrates
more on the articles than the Mirror.

Hypothesis 4

- H(0) In corresponding articles written on the same subject and
event, the article lengths in both papers will be the same

H(1) In corresponding articles written on the same subject and event,
the articles in Times will be longer

I added this hypothesis to reinforce further the findings I made at
hypothesis 3.

Data Collection: I was attempting to find a variable which could
effectively compare the article lengths and support my prediction that
the Times will have longer articles and that there is in fact a
general relationship between the article lengths.

I decided not to include the articles on the Rusedski drug scandal
because not only have both papers written multiple articles on it,
which makes it hard to compare, but also I felt that it was a big
event that does not occur every day, so it was not really
representative of either paper, and really rather a special feature
of January 9th. I could only find 10 articles common to both papers.
It is hoped that these will be sufficiently representative of the
papers, although it may be that these articles are heavily distorted
and not fit for comparison.

Diagram: I chose a scatter graph because this will show me if there is
a correlation between the article lengths in the Times and the Mirror
on the same subjects. It was hoped that I would find some sort of a
trend line. If so, the gradient should reveal the relationship, which
can be confirmed by looking at the correlation coefficient. Of course,
if the articles I compared were not written on the same thing, I could
have distorted the data however I liked.

[IMAGE]

Statistics                 The Times      The Mirror     Approx. difference

Mean article lengths       714.6 words    411.3 words    65%
Standard deviation         233.3          178.8          25%
Correlation coefficient    r = 0.895
Line of regression         y = 0.69x + 79.3

Interpretation of Results: There is no doubt that there is a
significant difference. This analysis has shown the highest percentage
difference of anywhere in the project.

Location: The mean article length for the Times is 715 words, compared
with 411 for the Mirror. This is a 65% difference and is significant.
This shows that the Times places a much greater emphasis on the text
of the articles than the Mirror. This supports the last hypothesis
that the two papers have different priorities.

(In the scatter graph, x = length of the Times article and y = length
of the corresponding Mirror article.)

Spread: There is also a difference in the spread of the lengths of the
articles in the two papers; the Times has a standard deviation of 233,
whereas the Mirror has 179. This tells us that in the Mirror, they
like to keep the articles of similar lengths, all shorter than the
Times, so that they are never long enough to overload the reader's
brain with information. However, in the Times, articles go on as long
as is felt necessary to provide as much relevant information as
readers would like to know. This significant difference of 25%
suggests a more varying style of journalism in the Times than in the
Mirror.

Further Investigation: The graph shows a positive relationship between
the two, which is only natural: if something is 'big', newspapers tend
to write at greater length on that subject. However, when I drew the
y-on-x regression line, it came out to be y = 0.69x + 79.3. This, in
short, shows that the articles in the Times are a lot longer than
those of the Mirror. I can say that of the corresponding articles in
the papers, those in the Mirror are in general only about 70% of the
length of their Times counterparts. In fact, in no case is y greater
than x. The fit is quite good too; the correlation coefficient is
0.895. The line of regression shows that there is something of a
relationship between the two. Also, one would think that the
newspapers would write mostly on the same events, but I only found 10
articles common to both papers. This also suggests that their articles
deal with different subjects; maybe the material for the Mirror is not
something that one can write a lot about.
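
A quick way to check a y-on-x line of regression is to rebuild it from the summary statistics: for a least-squares line the gradient equals r times (s.d. of y) / (s.d. of x), and the line passes through the point (mean of x, mean of y). The sketch below applies this to the figures in the table; `line_from_summary` is a name invented here, and the inputs are the rounded published values, so the output is only approximate.

```python
def line_from_summary(mean_x, sd_x, mean_y, sd_y, r):
    # Gradient of the least-squares y-on-x line.
    slope = r * sd_y / sd_x
    # The line must pass through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

# x = Times article length, y = Mirror article length (figures from the table).
slope, intercept = line_from_summary(714.6, 233.3, 411.3, 178.8, 0.895)
print(round(slope, 2), round(intercept, 1))
```

With these figures the gradient comes out at about 0.69, matching the line quoted above, but the intercept comes out at roughly -79 rather than +79.3, so the sign of the constant is worth re-checking against the raw data.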

Overall Summary

The reason why I chose the Times and the Mirror for comparison is that
the Times is a stereotypical quality paper and the Mirror represents
the tabloids, so I can use these results to compare the broadsheets
and tabloids in general. Although I obtained my data from the tabloid
version of the Times, it still retains all its characteristics and all
the articles are the same; it is only smaller.

I have already said that the Times has quite significantly longer
sentences, but why would they want this to happen? The Mirror is
somewhat less 'reputable' than the Times, and is aimed at people who
are not particularly interested in politics or business, but in
celebrity news and gossip. For this purpose, the actual quality of
the article does not matter; in fact, simpler articles are better
because these people just want some basic information, which gets
across to the reader's mind quickest in short, decisive sentences.
However, the Times targets people who want to read intelligent prose,
with plenty of details and words to describe the situation accurately
and inform the reader.

Unfortunately, the second hypothesis turned out to be a bit of a 'dud'
and nothing much can be indicated in terms of word lengths.

However, I think that the third and the fourth hypotheses say more
than the sentence lengths about the general editorial and the style of
journalism in the two newspapers.

It is clear that there is less room for text in the Mirror, because so
much is taken up by pictures. 17.5 is quite something, and as I have
said before, that is not counting the headline areas. I have seen
pages in the Mirror where the headlines occupy almost a half of the
paper. Also, the picture to text area ratio in the Times is more
consistent.

The articles in the Mirror are also shorter, as I have proved above,
which is in accordance with the fact that there is less room for the
text anyway. So why do people bother buying the Mirror then? The truth
is, I must conclude, that the Mirror, as with all mass-appeal
tabloids, sells itself by the pictures, 'page 3 girls' and catchy