# Investigation of Film Lengths

The title of my investigation is 'The Average Length of a Sample of 52 Randomly Selected Films'. As the title suggests, I am going to investigate the length of a set of randomly selected feature-length films. I have chosen this topic because I am a Film Studies student and am required to produce a feature-length film as part of a coursework assignment. This leaves me with the predicament of deciding how long my feature film should be. Is there a length of film so short that the audience feel they have not got their money's worth and leave the cinema disappointed? In contrast, is there a length of film so long that the audience are bored because the film has run on for too long? In short, my aim for this investigation is to find the average length of feature films, so that I can choose a length for my Film Studies coursework film that leaves the audience happy after viewing it.

To determine a population for my coursework, I am going to use the HMV superstore Internet site to search for my sample of films. This site is split into categories corresponding to letters of the alphabet, so there are 26 categories in total. Each category contains roughly 500 films, giving a total population of about 13,000 films. Obviously this population is far too vast, so what I propose to do is number each film in a category and then use the random number generator function on my calculator to give two random numbers. I then match the numbers with the corresponding films and get two films for each category, complete with total running time and year of release. I will therefore end up with a sample of 52 films, fulfilling the criteria for a sample. To ensure that my sample of films is as accurate as possible, I am going to cross-reference the data that I collect from the HMV site with data from the Channel 4 Film Four website. Film Four is regarded in film society as being the definitive movie database, and if this site does not know what you are looking for, then no one does. I would have used this site to collect my data in the first place, but it has a smaller total population of films; that is, its categories do not contain as many films as HMV's do. It is nevertheless a useful point of cross-reference.
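The sampling method described above can be sketched in code. This is only an illustration: it assumes every category holds exactly 500 numbered films (the text says "roughly 500"), and it uses Python's random module in place of the calculator's random number function. The name `pick_sample` is hypothetical.

```python
import random

def pick_sample(films_per_category=500, picks=2, seed=None):
    """Pick `picks` distinct film numbers from each of the 26 letter categories.

    Assumption: each category is numbered 1..films_per_category.
    """
    rng = random.Random(seed)
    sample = {}
    for code in range(ord("A"), ord("Z") + 1):
        letter = chr(code)
        # rng.sample draws without replacement, so the same film
        # cannot be picked twice within one category.
        numbers = rng.sample(range(1, films_per_category + 1), picks)
        sample[letter] = sorted(numbers)
    return sample

sample = pick_sample(seed=1)
total = sum(len(numbers) for numbers in sample.values())
print(total)  # 26 categories x 2 films = 52
```

Each film number would then be matched to the corresponding film in that category's listing, as described in the text.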

See overleaf for collected data

On this page I have included results that I calculated with the help of Microsoft Excel. I can then later compare my own calculated results with those shown below.

| Statistic | Value (mins) |
| --- | --- |
| Mean | 120.1346154 |
| Standard Deviation | 28.15296373 |
| 1st Quartile | 103.75 |
| Median | 117.5 |
| 3rd Quartile | 129.25 |

To get a visual idea of the spread of my data, I decided to represent it in a stem and leaf diagram:

| Stem | Leaf |
| --- | --- |
| 70 | 7 |
| 80 | 8 0 1 5 |
| 90 | 1 2 4 5 3 7 |
| 100 | 5 0 5 5 5 4 3 4 5 |
| 110 | 1 7 4 5 4 4 8 |
| 120 | 9 1 1 0 2 3 0 8 4 5 2 1 |
| 130 | 0 6 9 0 |
| 140 | 7 8 |
| 150 | 6 |
| 160 | 7 4 2 |
| 170 | |
| 180 | 1 6 |
| 190 | |
| 200 | |
| 210 | 3 |

N = 52. Key: stem 150 with leaf 6 represents 156 mins.

Stem and leaf diagram showing the total duration of a sample of 52 films (unsorted)

To help me when constructing a cumulative frequency diagram, I have sorted the above diagram:

| Stem | Leaf |
| --- | --- |
| 70 | 7 |
| 80 | 0 1 5 8 |
| 90 | 1 2 3 4 5 7 |
| 100 | 0 3 4 4 5 5 5 5 5 |
| 110 | 1 4 4 4 5 7 8 |
| 120 | 0 0 1 1 1 2 2 3 4 5 8 9 |
| 130 | 0 0 6 9 |
| 140 | 7 8 |
| 150 | 6 |
| 160 | 2 4 7 |
| 170 | |
| 180 | 1 6 |
| 190 | |
| 200 | |
| 210 | 3 |

N = 52. Key: stem 150 with leaf 6 represents 156 mins.

Stem and leaf diagram showing the total duration of a sample of 52 films (sorted)
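As a rough check on the sorted diagram, the same layout can be produced programmatically. The sketch below assumes the 52 durations are exactly those read off the unsorted diagram; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# Durations in minutes, transcribed from the unsorted stem and leaf diagram.
DURATIONS = [
    77, 88, 80, 81, 85, 91, 92, 94, 95, 93, 97,
    105, 100, 105, 105, 105, 104, 103, 104, 105,
    111, 117, 114, 115, 114, 114, 118,
    129, 121, 121, 120, 122, 123, 120, 128, 124, 125, 122, 121,
    130, 136, 139, 130, 147, 148, 156, 167, 164, 162, 181, 186, 213,
]

def stem_and_leaf(values, width=10):
    """Return {stem: sorted leaves}, including stems that have no leaves."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // width * width].append(v % width)
    lowest, highest = min(rows), max(rows)
    return {stem: rows.get(stem, []) for stem in range(lowest, highest + width, width)}

diagram = stem_and_leaf(DURATIONS)
for stem, leaves in diagram.items():
    print(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}")
```

Empty stems such as 170 are kept so that the gaps in the data remain visible.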

As you can see from the above diagram:

· the shortest film has a duration of 77 mins, and the longest is 213 mins long

· the most common group is 120-129 mins, which contains 12 films.

I have now decided to construct a frequency table so that I can draw a cumulative frequency graph, which will enable me to draw a box and whisker plot and therefore see any outliers in my data. Here follows my table:

| Class (mins) | Frequency | Cumulative Frequency |
| --- | --- | --- |
| 70-79 | 1 | 1 |
| 80-89 | 4 | 5 |
| 90-99 | 6 | 11 |
| 100-109 | 9 | 20 |
| 110-119 | 7 | 27 |
| 120-129 | 12 | 39 |
| 130-139 | 4 | 43 |
| 140-149 | 2 | 45 |
| 150-159 | 1 | 46 |
| 160-169 | 3 | 49 |
| 170-179 | 0 | 49 |
| 180-189 | 2 | 51 |
| 190-199 | 0 | 51 |
| 200-209 | 0 | 51 |
| 210-219 | 1 | 52 |
| Total | 52 | |

See overleaf for cumulative frequency diagram.
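The frequency table can be checked with a short script. This is a sketch only, assuming classes of width 10 starting at 70 and the durations read from the sorted stem and leaf diagram; `frequency_table` is a hypothetical helper name.

```python
from itertools import accumulate

# Durations in minutes, transcribed from the sorted stem and leaf diagram.
DURATIONS = [
    77, 80, 81, 85, 88, 91, 92, 93, 94, 95, 97,
    100, 103, 104, 104, 105, 105, 105, 105, 105,
    111, 114, 114, 114, 115, 117, 118,
    120, 120, 121, 121, 121, 122, 122, 123, 124, 125, 128, 129,
    130, 130, 136, 139, 147, 148, 156, 162, 164, 167, 181, 186, 213,
]

def frequency_table(values, width=10):
    """Return (class start, frequency, cumulative frequency) for each class."""
    start = min(values) // width * width
    stop = max(values) // width * width
    classes = list(range(start, stop + width, width))
    freqs = [sum(1 for v in values if c <= v < c + width) for c in classes]
    # The cumulative frequency is a running total, so it never decreases.
    cumulative = list(accumulate(freqs))
    return list(zip(classes, freqs, cumulative))

table = frequency_table(DURATIONS)
for class_start, freq, cum in table:
    print(f"{class_start}-{class_start + 9}: {freq:>2} {cum:>2}")
```

The cumulative column is a running total, so it stays at 51 through the empty 190-199 and 200-209 classes and finishes at 52.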

From Graph:

First Quartile - 103 mins

Median - 117 mins

Third Quartile - 129 mins

From my sorted stem and leaf diagram it is also possible to calculate the median and the first and third quartiles of my data, so I will now do so in order to compare these results with those obtained from my graph.

First Quartile: ¼ × 52 + ½ = position 13.5

Value = 103.5 mins (halfway between the 13th value, 103, and the 14th value, 104)

Median: ½ × 52 + ½ = position 26.5

Value = 117.5 mins (halfway between the 26th value, 117, and the 27th value, 118)

Third Quartile: ¾ × 52 + ½ = position 39.5

Value = 129.5 mins (halfway between the 39th value, 129, and the 40th value, 130)

As you can see, my graph values deviate slightly from the calculated values. As the calculations are likely to be more accurate than readings taken from the graph, I will use the calculated values when I draw my box and whisker plot.
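The position rule used above can be written out as a small sketch. It assumes the durations read from the sorted stem and leaf diagram; `value_at_position` is a hypothetical helper. For comparison it also calls `statistics.quantiles` with `method="inclusive"`, a different interpolation rule which, on this data, appears to reproduce the quartiles reported by Excel earlier.

```python
import statistics

# Durations in minutes, transcribed from the sorted stem and leaf diagram.
DURATIONS = [
    77, 80, 81, 85, 88, 91, 92, 93, 94, 95, 97,
    100, 103, 104, 104, 105, 105, 105, 105, 105,
    111, 114, 114, 114, 115, 117, 118,
    120, 120, 121, 121, 121, 122, 122, 123, 124, 125, 128, 129,
    130, 130, 136, 139, 147, 148, 156, 162, 164, 167, 181, 186, 213,
]

def value_at_position(sorted_values, position):
    """Read a value at a 1-based, possibly fractional, position."""
    lower = int(position)
    fraction = position - lower
    below = sorted_values[lower - 1]
    if fraction == 0:
        return below
    above = sorted_values[lower]
    # Interpolate linearly between the two neighbouring values.
    return below + fraction * (above - below)

data = sorted(DURATIONS)
n = len(data)
q1 = value_at_position(data, n / 4 + 0.5)         # position 13.5
median = value_at_position(data, n / 2 + 0.5)     # position 26.5
q3 = value_at_position(data, 3 * n / 4 + 0.5)     # position 39.5
print(q1, median, q3)  # 103.5 117.5 129.5

# A different interpolation convention, shown only for comparison.
inclusive = statistics.quantiles(data, n=4, method="inclusive")
print(inclusive)  # [103.75, 117.5, 129.25]
```

The two methods agree on the median and differ by a quarter of a minute on the quartiles, which is why the hand-calculated and computer values do not match exactly.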

See below for box and whisker plot:

I am now going to calculate the mean and standard deviation of my data so that I can see where the outliers of my data, if any, are.

Mean
====

Standard Deviation
------------------
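As a hedged illustration of the two calculations named above, the sketch below computes the mean and the standard deviation directly from the 52 durations read off the sorted stem and leaf diagram. It shows both common forms of the standard deviation, dividing by n (`statistics.pstdev`) and dividing by n - 1 (`statistics.stdev`), because the source does not say which form was used by hand.

```python
import statistics

# Durations in minutes, transcribed from the sorted stem and leaf diagram.
DURATIONS = [
    77, 80, 81, 85, 88, 91, 92, 93, 94, 95, 97,
    100, 103, 104, 104, 105, 105, 105, 105, 105,
    111, 114, 114, 114, 115, 117, 118,
    120, 120, 121, 121, 121, 122, 122, 123, 124, 125, 128, 129,
    130, 130, 136, 139, 147, 148, 156, 162, 164, 167, 181, 186, 213,
]

total = sum(DURATIONS)                    # sum of all 52 durations
mean = total / len(DURATIONS)             # mean = total / n
pop_sd = statistics.pstdev(DURATIONS)     # divides by n
sample_sd = statistics.stdev(DURATIONS)   # divides by n - 1

print(total)                # 6247
print(round(mean, 2))       # 120.13
print(round(pop_sd, 2))     # 27.88
print(round(sample_sd, 2))  # 28.15
```

On this data the divide-by-(n - 1) form agrees with the computer's value of 28.15, while the divide-by-n form gives about 27.88, which is close to the 27.91 used in the calculations that follow.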

In most data sets that are roughly symmetrical and bell-shaped:

· about 2/3 of the values lie within 1 standard deviation of the mean

· about 95% of the values lie within 2 standard deviations of the mean

· about 99.7% of the values lie within 3 standard deviations of the mean.

To see if this is so with my data, I will now perform the relevant calculations:

Mean = 120.13

Standard Deviation = 27.91

120.13 + 27.91 = 148.04

120.13 - 27.91 = 92.22

Therefore about 2/3 of my data should lie between 92 mins and 148 mins. That is, about 35 of the 52 values should be within this range. In fact 38 values are in this range, slightly more than expected, so the rule holds reasonably well for my data.

Mean = 120.13

120.13 + 2(27.91) = 175.95

120.13 - 2(27.91) = 64.31

Therefore about 95% of my data should lie in the range of 64 mins to 176 mins. About 49 values should be in this range. Exactly 49 values are in this range, and so I have identified that the outliers in my data are 181, 186 and 213 mins.
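The two counts above can be verified with a short sketch. It uses the mean (120.13) and standard deviation (27.91) quoted in the text together with the durations from the sorted stem and leaf diagram; `count_within` is a hypothetical helper name.

```python
# Durations in minutes, transcribed from the sorted stem and leaf diagram.
DURATIONS = [
    77, 80, 81, 85, 88, 91, 92, 93, 94, 95, 97,
    100, 103, 104, 104, 105, 105, 105, 105, 105,
    111, 114, 114, 114, 115, 117, 118,
    120, 120, 121, 121, 121, 122, 122, 123, 124, 125, 128, 129,
    130, 130, 136, 139, 147, 148, 156, 162, 164, 167, 181, 186, 213,
]

MEAN = 120.13  # value used in the text
SD = 27.91     # value used in the text

def count_within(values, mean, sd, k):
    """Count the values lying within k standard deviations of the mean."""
    low, high = mean - k * sd, mean + k * sd
    return sum(1 for v in values if low <= v <= high)

within_1 = count_within(DURATIONS, MEAN, SD, 1)
within_2 = count_within(DURATIONS, MEAN, SD, 2)
# Anything further than 2 standard deviations from the mean is treated as an outlier.
outliers = [v for v in DURATIONS if abs(v - MEAN) > 2 * SD]

print(within_1)  # 38
print(within_2)  # 49
print(outliers)  # [181, 186, 213]
```

Treating values beyond 2 standard deviations as outliers follows the rule used in the text; other conventions, such as 1.5 times the interquartile range, could give a different list.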

SUMMARY OF WHAT I FOUND AND ACCURACY
====================================

I began this investigation with the aim of finding out the average length of a film and determining how short or long a film can be without the audience being disappointed or bored, respectively.

I found that the average duration of the sample of films that I used was about 120 mins. The shortest film in my sample was 77 mins long, and the longest was 213 mins. 38 of the 52 films fell between 92 mins and 148 mins. Therefore my Film Studies coursework film should aim to be between about 92 mins and about 148 mins long. I will aim to make the film about 120 mins long in total, as this is the average of these two values. I think that this investigation has been a success because I have achieved what I set out to achieve: finding what length my film should be for maximum audience enjoyment.

In terms of accuracy, I think that this investigation has been a success. I have constructed a table to show my graph values, where applicable, my previously calculated values and the computer's values (all in mins):

| | Graph value | My calculated value | Computer value |
| --- | --- | --- | --- |
| Mean | | 120.13 | 120.1346154 |
| Standard Deviation | | 27.91 | 28.15296373 |
| Median | 117 | 117.5 | 117.5 |
| Mode | | 105 | |

As you can see, the only real discrepancy occurs with the computer's standard deviation. One reason may be that the computer has used a mean with more decimal places, which makes its value more accurate. Another possible cause is that the computer divides by n - 1 rather than n when calculating the standard deviation, which gives a slightly larger value. If the calculated standard deviation was found from grouped data, it is also only an estimate. Graph discrepancies occur because of inaccurate drawing techniques. Other than that, my values are, on the whole, close to the computer's.

The quality of the investigation could be improved by using a more varied population; in an ideal world this would be an Internet site listing every film ever made. As this does not exist, the site that contains the most comprehensive list of films will have to do, which in this case is the site that I used, the HMV superstore site. My method of collection would not vary even if I used another site. If I used three randomly picked numbers for each category, then I would end up with a sample of 78 films, which I think is too many to produce an accurate report on.

I think that this investigation was a success, as it helped me to gain some information that will help me in another subject, and because it was completed as accurately as I could manage.