Ebook

Counting Word Frequencies in One Line

    cat book.txt |
      # replace punctuation with spaces; remove \r from \r\n
      tr '!()[]{};:",<.>?“”‘’*/\r' ' ' |
      # one word per line
      tr ' ' '\n' |
      # keep only words composed of Unicode letters, numbers, hyphens and apostrophes
      grep -a -P "^[\p{L}\p{N}\-']+\$" |
      # remove pure numbers
      grep -a -P -v "^[\p{N}\-']+\$" |
      # remove 's
      sed "s/'s\$//" |
      # remove starting and ending apostrophes
      sed "s/^'//" | sed "s/'\$//" > words.txt   # output words.txt

    cat words.txt |
      # sort and count unique words ...
      sort | uniq -c |
      sort -nr |
      cut -c9- > words_desc.txt
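The pipeline above drops the counts with cut -c9-. If the counts are wanted next to each word, the counting step could be sketched as below. rank_words is a hypothetical helper name, not part of the original one-liner, and it uses awk in place of cut so it does not depend on the column width that uniq -c prints.

```shell
# Hypothetical helper (not in the original notes): read one word per line
# on stdin and print "word<TAB>count", most frequent first.
# LC_ALL=C keeps the ordering byte-based, so ties sort the same everywhere.
rank_words() {
  LC_ALL=C sort |
    uniq -c |
    LC_ALL=C sort -k1,1nr -k2,2 |
    awk '{ print $2 "\t" $1 }'
}

# Small demonstration on inline sample data.
printf '%s\n' the cat the dog the cat | rank_words
```

With the files from the notes, this would be run as rank_words < words.txt, and piping the result through head -n 20 shows only the most frequent words.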

A Note on Mobi Format

Mobi Format Description: http://wiki.mobileread.com/wiki/MOBI
Python lib: https://github.com/kroo/mobi-python
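A quick way to tell whether a file is a Mobi container at all, without any library: Mobi files are Palm Database files, and the header stores the type and creator codes "BOOK" and "MOBI" at byte offset 60. The sketch below only checks that signature; is_mobi is a hypothetical helper name, and it says nothing about whether the rest of the file is valid.

```shell
# Hypothetical helper: succeed if the file carries the "BOOKMOBI" signature.
is_mobi() {
  # Bytes 60-67 of the Palm Database header hold the type and creator codes.
  # tr drops NUL and other non-letter bytes so the comparison stays clean.
  sig=$(dd if="$1" bs=1 skip=60 count=8 2>/dev/null | LC_ALL=C tr -cd 'A-Za-z')
  [ "$sig" = "BOOKMOBI" ]
}

# Usage: is_mobi book.mobi && echo "looks like a Mobi file"
```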

Ebook Manipulation Tools

Ebook manager: Calibre
Kindle PDF optimizer: k2pdfopt
PDF border cropper: Briss
CHM file extractor: archmage
PDF editing: Xournal

Convert images to PDF:

    sudo apt-get install imagemagick
    convert *.jpg pictures.pdf

Convert between different formats:

    sudo apt-get install calibre
    ebook-convert xxx.mobi xxx.txt --unsmarten-punctuation
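To convert a whole directory rather than a single file, the ebook-convert call above can be wrapped in a loop. This is a minimal sketch: mobi_to_txt and the EBOOK_CONVERT override variable are hypothetical names introduced here for illustration, and the only real command used is the same ebook-convert invocation shown above.

```shell
# Hypothetical helper: convert every .mobi file in a directory to plain text.
# EBOOK_CONVERT is an assumed override hook; it defaults to Calibre's
# ebook-convert and exists only so the loop can be dry-run with echo.
mobi_to_txt() {
  dir=${1:-.}
  for src in "$dir"/*.mobi; do
    [ -e "$src" ] || continue   # the glob matched nothing
    "${EBOOK_CONVERT:-ebook-convert}" "$src" "${src%.mobi}.txt" --unsmarten-punctuation
  done
}

# Usage: mobi_to_txt ~/books
# Dry run that only prints the commands: EBOOK_CONVERT=echo mobi_to_txt ~/books
```

Each resulting .txt file can then be fed to the word-frequency pipeline in place of book.txt.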