Topic Modeling for Text Structure Assessment: The Case of Russian Academic Texts
This paper develops topic modeling methods (LDA, OnlineLDA) to assess the complexity of Russian academic texts by analyzing parameters of their topical structure (number of topics, coherence, distribution, and weight) across school textbooks and texts for different grade levels.
Background: Automatic assessment of text complexity levels is viewed as an important task, primarily in education. Existing methods of computing text complexity rely on simple surface properties of text and neglect the complexity of text content and structure. The current paradigm of complexity studies can no longer keep up with the challenges of automatic evaluation of text structure. Purpose: The aim of the paper is twofold: (1) it introduces a new notion, i.e., complexity of a text topical structu