A Bridge Over the Language Gap: Topic Modelling for Text Analyses Across Languages for Country Comparative Research
November 4, 2019
Working Paper
Work Package 8 of the REMINDER project has used computer-assisted techniques to examine some 1.5 million news articles in seven languages. This paper presents a comprehensive overview of methodological strategies for conducting such analyses across multiple languages. The authors outline the intricacies of computer-assisted text analysis of multilingual data and introduce the concept of ‘topic modelling’, a form of computer-assisted analysis that identifies clusters of words likely to occur together in a given body of texts.
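
To make the idea of words that are "likely to occur together" concrete, the following minimal sketch counts how often pairs of words appear in the same document. It is only an illustration of the co-occurrence intuition behind topic modelling, not a topic model and not the method used in the paper. The toy corpus and the names `documents`, `cooccurrence_counts` and `frequent_pairs` are assumptions invented for this example.

```python
from collections import Counter
from itertools import combinations

# Toy corpus, invented for illustration (not data from the REMINDER project).
documents = [
    "migration border policy government",
    "border policy asylum government",
    "football match goal league",
    "league goal match transfer",
]


def cooccurrence_counts(docs):
    """Count how many documents each unordered word pair appears in together."""
    counts = Counter()
    for doc in docs:
        # Use the set of words so a pair is counted once per document.
        words = sorted(set(doc.split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


pair_counts = cooccurrence_counts(documents)

# Pairs seen together in more than one document hint at a shared theme.
frequent_pairs = sorted(pair for pair, n in pair_counts.items() if n > 1)
print(frequent_pairs)
```

In this toy corpus the recurring pairs fall into two groups, one around border policy and one around football. A real topic model estimates such groupings probabilistically over a large corpus rather than by raw pair counts.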