Building a Goodreads quote scraper

2025 January 09

I’ve been wanting to code a web scraper I would actually use, instead of yet another Amazon one. Goodreads.com keeps 100 pages of popular quotes from authors. Unfortunately there’s no data on which books they come from, but I’ll update this project if they ever add that. For now, I just need functions to pick a random page (1 to 100), fetch that page, and extract the quote text and author from it.

Prerequisites: the Python requests library (pip install requests) and BeautifulSoup (pip install beautifulsoup4).

Screenshot of Python code

This gives you a random page number and uses the requests library to fetch the page that holds the actual quotes and authors. The User-Agent header makes the requests look like they come from a browser; I used one from the examples here. time.sleep(2) is there so I don’t overload Goodreads. Before implementing basic rate limiting, I couldn’t extract any quotes at all 🙁
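For readers who can't see the screenshot, here is a minimal sketch of what this step might look like. The names get_random_page and fetch_page, the BASE_URL value with its page query parameter, and the User-Agent string are all assumptions for illustration, not necessarily what the original script uses.

```python
import random
import time

import requests

# Assumed URL for the quotes listing; verify it against the live site.
BASE_URL = "https://www.goodreads.com/quotes"

# Placeholder browser-like User-Agent; any common browser string should do.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    )
}


def get_random_page(max_page=100):
    """Return a random page number between 1 and max_page, inclusive."""
    return random.randint(1, max_page)


def fetch_page(page, delay=2):
    """Download one page of quotes and return its HTML, or None on failure."""
    time.sleep(delay)  # basic rate limiting before every request
    try:
        response = requests.get(
            BASE_URL, params={"page": page}, headers=HEADERS, timeout=10
        )
        response.raise_for_status()  # treat 4xx/5xx responses as failures
    except requests.RequestException as exc:
        print(f"Request for page {page} failed: {exc}")
        return None
    return response.text
```

Calling fetch_page(get_random_page()) would then return the HTML of one random page, or None if the request failed. Sleeping before each request rather than after keeps the delay in place even when a request raises an error.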

Screenshot of Python code

If you inspect the Goodreads page source, you’ll find the quotes are inside elements with the class quote. The quote text is in an element with the class quoteText, and the author is in an element with the class authorOrTitle.
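A rough sketch of the extraction, using only the three class names mentioned above. The function name extract_quotes is hypothetical, and the cleanup steps (removing the author element in case it is nested inside quoteText, trimming a trailing separator, comma, and curly quotes) are assumptions about the markup rather than something confirmed here.

```python
from bs4 import BeautifulSoup


def extract_quotes(html):
    """Return a list of (text, author) tuples found in the page HTML."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # Select by class only, so no particular tag names are assumed.
    for block in soup.find_all(class_="quote"):
        text_el = block.find(class_="quoteText")
        author_el = block.find(class_="authorOrTitle")
        if text_el is None or author_el is None:
            continue  # skip blocks that don't match the expected layout

        # Drop a trailing comma in case one follows the author's name.
        author = author_el.get_text(strip=True).rstrip(",").strip()

        # Assumption: the author element may sit inside quoteText, so
        # remove it from the tree before reading the quote text.
        author_el.extract()
        text = text_el.get_text(" ", strip=True)
        text = text.rstrip(" \u2015")          # trailing separator bar, if any
        text = text.strip("\u201c\u201d\" ")   # surrounding quotation marks

        if text:
            results.append((text, author))
    return results
```

Returning tuples rather than dictionaries keeps each quote hashable, which is convenient if the quotes are later stored in a set.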

Screenshot of Python code

displayed_quotes keeps track of the quotes already shown so the same quote never displays twice, and a while True loop keeps trying until it finds one that hasn’t been shown yet.
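The deduplication idea can be sketched like this. Only displayed_quotes and the while True loop come from the post. The function next_unique_quote, its get_quotes argument (a hypothetical callable that returns a list of (text, author) tuples, for example by fetching and parsing a random page), and the max_attempts cap are assumptions added for illustration.

```python
import random

displayed_quotes = set()  # every (text, author) pair shown so far


def next_unique_quote(get_quotes, displayed, max_attempts=20):
    """Keep drawing quotes until one turns up that has not been shown yet.

    Returns the new quote, or None if max_attempts draws all fail.
    """
    attempts = 0
    while True:
        if attempts >= max_attempts:
            return None  # give up instead of looping forever
        attempts += 1

        quotes = get_quotes()
        if not quotes:
            continue  # nothing came back this round, so try again

        candidate = random.choice(quotes)
        if candidate not in displayed:
            displayed.add(candidate)  # remember it so it is never repeated
            return candidate
```

A plain while True with no cap would spin forever once every available quote has been displayed, which is why this sketch bails out after a fixed number of attempts.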

Save the script and run it from your terminal: python goodreads_quotes.py

Of course, it could be improved a lot. Another day I’ll update it to write the quotes out in JSON format or possibly add a GUI.