Computer-Assisted Text Analysis

Computational approaches are widely used in a variety of text applications, such as feature selection and classification tasks, because they deal efficiently with huge amounts of data. The present discussion, however, is concerned only with the application of computational approaches to literary texts in general and to Hardy’s texts in particular. To my knowledge, there is no computer-aided thematic classification of the works of Thomas Hardy. The only study that has approached Hardy’s works through clustering techniques is Hoover’s (2002). It evaluates the validity of multivariate analysis techniques, especially cluster analysis based on frequent words, for distinguishing texts by different authors and for grouping texts by a single author. Hoover (2002) investigates 29 novels and literary-critical texts by American and British writers, including Hardy’s Jude the Obscure, Tess of the d’Urbervilles, and The Mayor of Casterbridge.

The comparative lack of computer-based analysis of literary texts may be due to literary scholars’ unfamiliarity with computational theory and methodology. Ramsay (2003) suggests that “the inability of computing humanists to break into the mainstream of literary critical scholarship may be attributed to the prevalence of scientific methodologies and metaphors in humanities computing research” (2003: 167). One might even suggest that this unfamiliarity with computational and mathematical approaches has led literary scholars to believe that all computational and statistical approaches are somehow antithetical to literary critical approaches. This would explain the gap we see between literary critical theory on the one hand and computer-based text analysis and quantitative approaches on the [...] powerful tools seem appropriate for the very large amounts of information represented by texts” (Hoover, 2001: 421).
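Hoover’s (2002) exact procedure is not reproduced here. As a rough illustration of what “cluster analysis based on frequent words” involves, the minimal Python sketch below assumes a crude regex tokenizer, z-scored relative frequencies of the corpus’s n most frequent words, Euclidean distance, and average-linkage agglomerative clustering. All function names are hypothetical, and the three toy “texts” are invented placeholders, not Hardy’s novels.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lower-case alphabetic tokens (apostrophes kept); a deliberately crude tokenizer.
    return re.findall(r"[a-z']+", text.lower())


def frequency_profiles(texts, n_words):
    """Relative frequency, in each text, of the corpus's n most frequent words."""
    token_lists = [tokenize(t) for t in texts]
    corpus_counts = Counter(w for toks in token_lists for w in toks)
    vocab = [w for w, _ in corpus_counts.most_common(n_words)]
    profiles = []
    for toks in token_lists:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty texts
        profiles.append([counts[w] / total for w in vocab])
    return vocab, profiles


def zscore_columns(rows):
    # Standardize each word's column so the commonest words do not dominate distances.
    standardized_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        standardized_cols.append([(x - mean) / sd if sd > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*standardized_cols)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(rows, labels):
    """Agglomerative clustering; returns merges as (left, right, distance)."""
    members = {i: [i] for i in range(len(rows))}  # cluster id -> row indices
    names = {i: labels[i] for i in range(len(rows))}
    merges = []
    next_id = len(rows)
    while len(members) > 1:
        best = None
        ids = sorted(members)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Average linkage: mean distance over all cross-cluster pairs.
                d = sum(euclidean(rows[p], rows[q]) for p in members[a] for q in members[b])
                d /= len(members[a]) * len(members[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((names[a], names[b], d))
        members[next_id] = members.pop(a) + members.pop(b)
        names[next_id] = "(" + names.pop(a) + " + " + names.pop(b) + ")"
        next_id += 1
    return merges


# Invented toy "texts", for illustration only.
labels = ["A1", "A2", "B1"]
texts = [
    "the cat sat on the mat and the dog sat on the log",
    "the dog sat on the mat and the cat sat on the log",
    "of a man of a time in a house of a town is it",
]
vocab, profiles = frequency_profiles(texts, n_words=10)
merges = average_linkage(zscore_columns(profiles), labels)
for left, right, dist in merges:
    print(f"merge {left} with {right} at distance {dist:.3f}")
```

Here A1 and A2 contain exactly the same words in a different order, so they merge first at distance 0 before B1 joins them. A real stylometric study would use far longer texts, many more frequent words, and usually a library implementation such as SciPy’s hierarchical clustering.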
These have come to be used in critical investigations of theme (Labbe and Labbe, 2006; Laffal, 1995), structure, genre, characterization (Ramsay and Steger, 2006; Craig, 1999), imagery (Ide, 1989), and text classification (Burrows, 2004). In such applications, cluster analysis and principal components analysis (PCA) are the dominant multivariate analysis methods. For instance, Ide (1989) investigates the use of imagery in William Blake’s The Four Zoas by means of PCA and correlation analysis. Similarly, Craig (1999) applies PCA to the frequencies of very common words in the speech of characters in Ben Jonson’s plays. He concludes that significant change within the idiolects of Jonson’s characters is one of the most remarkable features of Jonson’s characterization.
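The core idea behind a PCA of common-word frequencies, as in Craig’s (1999) study of Jonson’s characters, can be sketched in a few lines of Python. This is a minimal illustration, not Craig’s method: it assumes one row of relative word frequencies per speaker, centers the columns, and uses power iteration to extract only the first principal component. The function names are hypothetical, and the frequency values for “Character A” to “Character D” are invented, not Jonson data.

```python
import math


def center(rows):
    # Subtract each column's mean so PCA describes variation around the average profile.
    means = [sum(col) / len(col) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    # Sample covariance matrix of already-centered rows.
    n, k = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(k)] for i in range(k)]


def first_component(cov, iterations=500):
    # Power iteration converges to the eigenvector of the largest eigenvalue,
    # i.e. the first principal component (assumes that eigenvalue is unique).
    # The non-uniform start vector is an arbitrary choice, not part of PCA itself.
    v = [1.0 / (i + 1) for i in range(len(cov))]
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def pc1_scores(rows):
    # Project each centered row onto the first principal component.
    centered = center(rows)
    pc = first_component(covariance(centered))
    return [sum(a * b for a, b in zip(row, pc)) for row in centered], pc


# Invented relative frequencies of three common words for four hypothetical characters.
speakers = ["Character A", "Character B", "Character C", "Character D"]
profiles = [
    [0.050, 0.010, 0.020],
    [0.048, 0.012, 0.021],
    [0.020, 0.040, 0.022],
    [0.022, 0.038, 0.019],
]
scores, loadings = pc1_scores(profiles)
for name, s in zip(speakers, scores):
    print(f"{name}: PC1 score {s:+.4f}")
```

The sign of a principal component is arbitrary, so what matters is the grouping: A and B fall on one side of PC1 and C and D on the other. A full analysis would typically keep several components and use a library implementation such as scikit-learn’s `PCA`.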