No further development will occur in this package, as it has been superseded by the actively maintained (and quite spiffy!) epubr package.
Convert 'epub' Files to Text
The 'epub' file format is really just a structured 'ZIP' archive with metadata, graphics and (usually) 'HTML' text. Tools are provided to turn an 'epub' file into a tidy data frame.
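Because an epub is a ZIP archive, base R can inspect one without any extra packages. The following is a minimal sketch, not necessarily how this package does it internally: `html_members` is a hypothetical helper name, and the file path is an assumption (substitute any epub you have locally).

```r
# Hypothetical helper: keep only the (X)HTML members of a ZIP listing.
# `entries` is a data frame with a `Name` column, such as the one
# returned by utils::unzip(..., list = TRUE).
html_members <- function(entries) {
  is_html <- grepl("\\.x?html?$", entries$Name, ignore.case = TRUE)
  entries[is_html, , drop = FALSE]
}

# Assumed path; replace with an epub that exists on your machine.
epub_path <- path.expand("~/Data/R Packages.epub")

if (file.exists(epub_path)) {
  # list = TRUE reads the archive's table of contents without extracting files
  entries <- utils::unzip(epub_path, list = TRUE)
  print(html_members(entries)[, c("Name", "Length", "Date")])
}
```

The listing should roughly correspond to the path, size and date columns shown in the output further down, though the package may derive them differently.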
The following functions are implemented:
- epub_to_text: Convert an epub file into a data frame of plaintext chapters
There are edge cases I've totally not covered yet. Feel free to jump in and make this a real, useful package!
- Refactor so there aren't so many heavy dependencies
- Try to get hgr on CRAN so it's not a GH dep (moved the cleaner code into here instead)
- Better docs
- Embed some epubs for examples and tests
- Set up Travis, AppVeyor, code coverage
devtools::install_github("hrbrmstr/pubcrawl")
library(pubcrawl)
library(tidyverse)
# current version
packageVersion("pubcrawl")
## [1] '0.1.0'
epub_to_text("~/Data/R Packages.epub")
## # A tibble: 26 x 4
## path size date content
## <chr> <dbl> <dttm> <chr>
## 1 OEBPS/cover.html 315 2015-03-24 21:49:16 Cover
## 2 OEBPS/titlepage01.html 466 2015-03-24 21:49:16 "R Packages\n\nHadley Wickham"
## 3 OEBPS/copyright-page01.html 3286 2015-03-24 21:49:16 "R Packages\n\nby Hadley Wickham\n\n\n\nPrinted in the Unite…
## 4 OEBPS/toc01.html 17557 2015-03-24 21:49:16 "navPrefaceIn This Book\n\nConventions Used in This Book\n\nU…
## 5 OEBPS/preface01.html 17784 2015-03-24 21:49:16 "Preface\n\n\nIn This Book\n\nThis book will guide you from b…
## 6 OEBPS/part01.html 444 2015-03-24 21:49:16 Getting Started
## 7 OEBPS/ch01.html 12007 2015-03-24 21:49:16 "Introduction\n\nIn R, the fundamental unit of shareable code…
## 8 OEBPS/ch02.html 28633 2015-03-24 21:49:18 "Package Structure\n\nThis chapter will start you on the road…
## 9 OEBPS/part02.html 454 2015-03-24 21:49:18 Package Components
## 10 OEBPS/ch03.html 28629 2015-03-24 21:49:18 "R Code\n\nThe first principle of using a package is that all…
## # ... with 16 more rows
epub_to_text(system.file("extdat", "augustine.epub", package="pubcrawl")) %>%
  mutate(path = abbreviate(path))
## # A tibble: 10 x 4
## path size date content
## <chr> <dbl> <dttm> <chr>
## 1 OEBPS/@@@@@@@3296@3296-@3296--0 63804 2017-10-02 07:00:00 "THE CONFESSIONS\nOF\nSAINT AUGUSTINE\n\nBy Saint Augusti…
## 2 OEBPS/@@@@@@@3296@3296-@3296--1 68504 2017-10-02 07:00:00 "BOOK III\nTo Carthage I came, where there sang all aroun…
## 3 OEBPS/@@@@@@@3296@3296-@3296--2 80192 2017-10-02 07:00:00 "BOOK V\nAccept the sacrifice of my confessions from the …
## 4 OEBPS/@@@@@@@3296@3296-@3296--3 51898 2017-10-02 07:00:00 "O crooked paths! Woe to the audacious soul, which hoped,…
## 5 OEBPS/@@@@@@@3296@3296-@3296--4 80194 2017-10-02 07:00:00 "Anubis, barking Deity, and all        The monster Gods …
## 6 OEBPS/@@@@@@@3296@3296-@3296--5 80718 2017-10-02 07:00:00 "The boy then being stilled from weeping, Euodius took up…
## 7 OEBPS/@@@@@@@3296@3296-@3296--6 65956 2017-10-02 07:00:00 "And Thou knowest how far Thou hast already changed me, w…
## 8 OEBPS/@@@@@@@3296@3296-@3296--7 57022 2017-10-02 07:00:00 "BOOK XII\nMy heart, O Lord, touched with the words of Th…
## 9 OEBPS/@@@@@@@3296@3296-@3296--8 69513 2017-10-02 07:00:00 "BOOK XIII\nI call upon Thee, O my God, my mercy, Who cre…
## 10 OEBPS/@@@@@@@3296@3296-@3296--9 21223 2017-10-02 07:00:00 "The Confessions of Saint Augustine, by Saint Augustine\n…
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.