Reliability is the extent to which scores obtained from similar or parallel instruments, by different observers or raters, or at different times yield the same or similar results (Streiner, 2003c). Importantly, reliability applies to the scores obtained from an instrument rather than to the instrument itself. Treating reliability as a fixed property of the instrument is one of the most commonly made measurement mistakes in psychology. One way to establish reliability is to create alternate forms of an instrument: we take one instrument and compile a second, similar instrument that measures the same construct. If the two yield the same or similar results, they are said to be equivalent or parallel forms.
There are several types of reliability. Test-retest reliability, in which the same instrument is administered at two different points in time, is used to determine temporal stability. The appropriate length of time between assessments depends on whether the construct is trait-like, state-like, or transitory in nature. Trait-like constructs should be highly stable over time, so the interval between measurements can be longer; state-like constructs tend to be less permanent and call for a shorter interval. In essence, the length of time between administrations should reflect the degree of temporal stability expected of the construct being measured.
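Test-retest reliability is commonly summarized as the correlation between scores from the two administrations. The following is a minimal sketch of that idea, assuming a Pearson correlation is an acceptable index; the function name `pearson_r` and the scores in `time1` and `time2` are hypothetical and are not taken from any study discussed here.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x, mean_y = mean(x), mean(y)
    # Sum of cross-products of deviations from each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cross / (dev_x * dev_y)


# Hypothetical scores for the same five people at two points in time
time1 = [12, 18, 25, 30, 22]
time2 = [14, 17, 27, 29, 21]

r = pearson_r(time1, time2)
print(f"test-retest correlation: {r:.2f}")
```

A value close to 1 would suggest that people kept roughly the same rank order across the two administrations. How high the value should be depends on how stable the construct is expected to be over the chosen interval.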
Another type of reliability is interrater reliability, or interrater agreement, which applies to instruments that require judges or observers to score or rate a behavior. The degree of agreement between the raters' scores is the interrater reliability of those scores. Instruments of this nature include semistructured interviews, observational coding systems, behavior checkli...

... middle of paper ...

... how similar the facets or elements of a construct are to each other. For example, impulsivity is said to have four different elements: premeditation, urgency, sensation seeking, and perseverance. All differ from each other to varying degrees, but when combined, they produce a general construct of impulsivity. If a construct has several dimensions that are incongruent with each other, a single score derived from an instrument intended to measure that construct does not provide adequate information, because we cannot know which facets contribute most heavily to the score. Additionally, two individuals could conceivably have the same score but exhibit vastly different combinations of the facets that make up the construct. For this reason, it is important to obtain a score on each dimension of the construct that an instrument is intended to measure.
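The point about identical totals hiding different profiles can be illustrated with a short sketch. The facet names come from the impulsivity example above; the 0 to 10 scale, the two profiles, and the helper names `total_score` and `profile_difference` are assumptions made up for illustration.

```python
# Facet names follow the impulsivity example; the scale and values are hypothetical.
FACETS = ("premeditation", "urgency", "sensation_seeking", "perseverance")

person_a = {"premeditation": 9, "urgency": 2, "sensation_seeking": 8, "perseverance": 1}
person_b = {"premeditation": 5, "urgency": 5, "sensation_seeking": 5, "perseverance": 5}


def total_score(profile):
    """Collapse a facet profile into a single summed score."""
    return sum(profile[facet] for facet in FACETS)


def profile_difference(first, second):
    """Facet-by-facet difference between two profiles (first minus second)."""
    return {facet: first[facet] - second[facet] for facet in FACETS}


print(total_score(person_a), total_score(person_b))
print(profile_difference(person_a, person_b))
```

Both hypothetical profiles sum to the same total, yet they differ on every facet, which is the information a single score discards.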

