Many R packages ship with a selection of texts that you can use to get started with text analysis. Here is a short (non-exhaustive) list for your first experiments:
# Jane Austen texts
library(janeaustenr) # her books in one-row-per-line format
library(dplyr)       # %>%, group_by(), mutate(), row_number()
library(stringr)     # str_detect(), regex()
texts <- austen_books() %>%
  group_by(book) %>%
  mutate(linenumber = row_number(), # annotate line number within each book
         chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]",
                                                 ignore_case = TRUE)))) %>% # running chapter count
  ungroup()
# Access to Project Gutenberg (not accessible from Germany!)
library(gutenbergr) #access to the public Project Gutenberg collection
texts <- gutenberg_download(c(35, 36, 5230, 159)) #downloads one or more books by ID
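# Access to Associated Press articles
library(tm) # provides the DocumentTermMatrix class used by this dataset
data("AssociatedPress", package = "topicmodels") # loads the object AssociatedPress into the workspace
texts <- AssociatedPress # data() only returns the dataset name, so assign the object itself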
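# Access to metadata of NASA datasets
library(jsonlite)
library(tibble) # tibble()
metadata <- fromJSON("https://data.nasa.gov/data.json")
names(metadata$dataset) # check which fields the feed currently provides
# the `_id`/`$oid` field is used as given here; adjust if names() above shows a different structure
texts <- tibble(id = metadata$dataset$`_id`$`$oid`, desc = metadata$dataset$description)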
Have fun! ;)
Photo by Engin Akyurt from Pexels