HathiTrust Digital Library

Millions of books online.

Getting Started

HathiTrust is a partnership of academic & research institutions, offering a collection of millions of titles digitized from libraries around the world.

HathiTrust Research Center (HTRC)

Supports large-scale computational analysis of the works in the HathiTrust Digital Library to facilitate non-profit and educational research.

PythonAnywhere

Host, run, and code Python in the cloud.

Python Activity


Commands we will use in the Python activity

Unzip Activity Files
unzip activity_files_hathitrust.zip
Move Files
mv activity_files/* /home/[your pythonanywhere username]
Download the Text (wget)
wget https://en.wikisource.org/wiki/George_Washington%27s_Fourth_State_of_the_Union_Address --output-document=washington_4.txt
Run Remove Tags Script
python remove_tag.py washington_4.txt
View Text
less tagless_file.txt
Remove Stop Words
python remove_stopwords.py tagless_file.txt stopwords.txt washington_4_stops_removed.txt
Install HTRC Feature Reader
pip install --user htrc-feature-reader
Run Top Adjectives Script
python top_adjectives.py 1970
Word Count
python word_count.py
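
What the text-cleaning steps do

The activity scripts themselves (remove_tag.py, remove_stopwords.py, word_count.py) are not reproduced here. As a rough illustration of what steps like these usually involve, the sketch below strips HTML tags, drops stop words, and counts the remaining words using only the Python standard library. The function names, the sample text, and the small stop word set are hypothetical and are not taken from the workshop files; the real scripts may work differently.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text of an HTML page, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_tags(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def remove_stopwords(text, stopwords):
    """Lowercase the text, split it into words, and drop any word in stopwords."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [w for w in words if w not in stopwords]


def word_count(words, top_n=10):
    """Return the top_n most frequent words as (word, count) pairs."""
    return Counter(words).most_common(top_n)


if __name__ == "__main__":
    # Hypothetical sample input; the activity uses the downloaded washington_4.txt.
    sample_html = "<p>The state of the <b>Union</b> and the people</p>"
    sample_stopwords = {"the", "of", "and"}

    text = strip_tags(sample_html)
    words = remove_stopwords(text, sample_stopwords)
    print(word_count(words))
```

To try this on the downloaded page instead of the sample string, read washington_4.txt into a string, load stopwords.txt into a set (this assumes one word per line, which may not match the workshop file), and pass both through the same three functions.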