textgraph is an R package for building and analysing large-scale text co-occurrence graphs. It turns document-feature data into (weighted) feature co-occurrence graphs and analyses them via seeded random walks, topic clustering and temporal topic clustering. It also provides numerous helper functions to prepare, analyse and explore the results.
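To illustrate the underlying idea, here is a minimal sketch of how a weighted feature co-occurrence graph can be derived from document-feature data. It uses only base R and igraph and does not call any textgraph functions; the toy matrix and all object names (`dfm`, `presence`, `cooc`) are assumptions made for this example, and the package's own implementation may differ.

```r
library(igraph)

# Hypothetical toy document-feature matrix: rows are documents, columns are features.
dfm <- matrix(
  c(1, 1, 0, 0,
    1, 0, 1, 0,
    0, 1, 1, 1,
    2, 1, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), c("climate", "energy", "policy", "tax"))
)

# Binarise so that each document counts at most once per feature pair.
presence <- (dfm > 0) * 1

# t(X) %*% X counts, for every feature pair, the documents containing both.
cooc <- crossprod(presence)
diag(cooc) <- 0  # drop self co-occurrences

# Non-zero cells become weighted, undirected edges.
g <- graph_from_adjacency_matrix(cooc, mode = "undirected", weighted = TRUE)

# Inspect the edge list with its co-occurrence weights.
as_data_frame(g, what = "edges")
```

Counting shared documents is only one possible weighting; normalised measures such as pointwise mutual information could be substituted at the `cooc` step.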
Currently, the package provides two main workflows:
- Retrieving terms or entities functionally equivalent to previously extracted seed terms, and retrieving the associated documents. This can be compared to certain types of seeded topic modeling;
- Retrieving clusters of related terms or entities via community detection algorithms, either statically for a single network or dynamically for a number of temporal network snapshots. This can be compared to unsupervised topic modeling approaches.
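The two workflows above can be sketched on a toy graph. This is an illustration under stated assumptions, not the package's API: the random walk with restart is written out in base R rather than calling RandomWalkRestartMH, the community detection uses igraph's `cluster_louvain()` as one possible algorithm, and the edge list, the `rwr()` helper and the restart probability of 0.7 are all hypothetical choices for this example.

```r
library(igraph)

# Hypothetical toy graph: two tight term groups joined by one weak edge.
edges <- data.frame(
  from   = c("climate", "climate", "energy", "tax", "tax", "budget", "energy"),
  to     = c("energy", "emissions", "emissions", "budget", "debt", "debt", "tax"),
  weight = c(3, 2, 2, 3, 2, 2, 1)
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Workflow 1: seeded random walk with restart.
# Column-normalise the weighted adjacency matrix into a transition matrix.
A <- as_adjacency_matrix(g, attr = "weight", sparse = FALSE)
W <- sweep(A, 2, colSums(A), "/")

# Iterate p <- (1 - restart) * W p + restart * p0 until the scores stabilise.
rwr <- function(W, seeds, restart = 0.7, tol = 1e-10, max_iter = 1000) {
  p0 <- setNames(numeric(nrow(W)), rownames(W))
  p0[seeds] <- 1 / length(seeds)
  p <- p0
  for (i in seq_len(max_iter)) {
    p_next <- (1 - restart) * as.vector(W %*% p) + restart * p0
    if (sum(abs(p_next - p)) < tol) break
    p <- p_next
  }
  sort(p_next, decreasing = TRUE)
}

# Terms ranked by proximity to the seed term.
scores <- rwr(W, seeds = "climate")

# Workflow 2: unsupervised clusters of related terms via community detection.
set.seed(1)
communities <- cluster_louvain(g)  # uses the "weight" edge attribute by default
groups <- membership(communities)
```

Terms with high `scores` are candidates for expanding the seed set, while `groups` assigns every term to a cluster. For the dynamic case, the same clustering would be repeated per temporal snapshot and the resulting clusters matched across snapshots.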
textgraph is built to facilitate the analysis of large-scale graphs built from millions of documents. Whenever feasible, functions can be parallelized through the furrr framework. Most graph operations are handled via the igraph library. Random walk functionality is provided via RandomWalkRestartMH, while dynamic topics are facilitated via Memory Community Matching.
textgraph is not currently on CRAN, so it needs to be installed directly from GitHub.
# install.packages("remotes")
remotes::install_github("TimBMK/textgraph", build_vignettes = TRUE)
The vignette provides a thorough overview of the workflows, with explanations and example data.
vignette("textgraph")
Please report any and all bugs here.