Transform a Humble Book Bundle link into a ready-to-import Bookwyrm CSV file

Create a Bookwyrm-ready import list from a Humble Bundle book bundle

Usage: ./grab.nu <link-to-humble-book-bundle-page>

For example: ./grab.nu https://www.humblebundle.com/books/data-engineering-science-oreilly-books | save -f data-eng.csv

The script downloads book information in parallel, but it often cannot find enough information for every book.
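The parallel lookup can be pictured roughly as follows. This is a Python sketch for illustration only, not the Nushell code in grab.nu; `lookup_all`, `fake_lookup`, and the `OL-PLACEHOLDER` value are hypothetical names and data. It assumes each book is looked up independently and that a failed lookup should leave a gap rather than abort the whole run.

```python
from concurrent.futures import ThreadPoolExecutor


def lookup_all(books, lookup, max_workers=8):
    """Call lookup(book) for every book concurrently, preserving input order.

    A lookup that raises is recorded as None, so one missing book
    does not stop the others from being processed.
    """

    def safe(book):
        try:
            return lookup(book)
        except Exception:
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input list.
        return list(pool.map(safe, books))


def fake_lookup(book):
    # Hypothetical stand-in for a real Open Library request (no network access).
    known = {"Effective Software Architecture": "OL-PLACEHOLDER"}
    return known[book["title"]]  # raises KeyError for unknown titles


if __name__ == "__main__":
    books = [
        {"title": "Effective Software Architecture"},
        {"title": "Some Unlisted Book"},
    ]
    for book, found in zip(books, lookup_all(books, fake_lookup)):
        print(book["title"], "->", found or "not found")
```

Passing the lookup function in as a parameter keeps the sketch runnable offline; a real lookup would perform the HTTP requests described under "Internal logic".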

Internal logic

  • Get the Humble Bundle page: http get https://www.humblebundle.com/books/software-architecture-pearson-books

  • Extract the element that holds the JSON data: open out.html | pup "script#webpack-bundle-page-data text{}" | from json

  • Get a transposed version: open books.json | get bundleData.tier_item_data | transpose machine_id item | insert human_name {$in.item.human_name} | insert cover_art {$in.item.resolved_paths.front_page_art_imgix_retina} | insert publisher {$in.item.publishers.0.publisher-name} | reject item

  • Get the book details into tabular form: open books_transposed.json | where machine_id != "code_org" | insert human_name {$in.item.human_name} | insert cover_art {$in.item.resolved_paths.front_page_art_imgix_retina} | insert publisher {$in.item.publishers.0.publisher-name} | insert author {$in.item.developers.0.developer-name} | reject item | save books.csv

  • Search Open Library for the book: http get https://openlibrary.org/search.json?author=Oliver+Goldman&title=Effective+Software+Architecture | save search_result.json

  • Get the ISBN from Open Library (currently from the first result's primary edition): open search_result.json | get docs.0.cover_edition_key | http get $"https://openlibrary.org/books/($in).json" | save work_result.json

  • Fill in the remaining fields from the edition info, if available:

    • publishers
    • publish_date
    • languages
    • authors?
    • title
    • subtitle
    • isbn_13
    • isbn_10 if exists
    • olid (Open Library ID) = key above
  • Save it into a CSV file with these columns:

    • title
    • author_text (First Last, First Last)
    • remote_id (a link, or nothing)
    • openlibrary_key (olid above)
    • isbn_10
    • isbn_13
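The last steps above (building the search URL, then mapping an edition record to the CSV columns) can be sketched as follows. This is a Python illustration, not the Nushell code in grab.nu. The function names are hypothetical, the sample edition uses placeholder values rather than real Open Library data, and the sketch assumes that `isbn_10` and `isbn_13` arrive as lists, that the edition `key` has the form `/books/<olid>`, and that author names are supplied separately (for example from the bundle data).

```python
import csv
import io
from urllib.parse import urlencode

# Column order taken from the list above.
COLUMNS = ["title", "author_text", "remote_id", "openlibrary_key", "isbn_10", "isbn_13"]


def search_url(author, title):
    """Build the Open Library search URL; spaces are encoded as '+'."""
    query = urlencode({"author": author, "title": title})
    return "https://openlibrary.org/search.json?" + query


def first(values):
    """Return the first list element, or '' when the field is missing or empty."""
    return values[0] if values else ""


def edition_to_row(edition, authors):
    """Map one edition record (a dict) to a row with the columns above."""
    return {
        "title": edition.get("title", ""),
        "author_text": ", ".join(authors),  # "First Last, First Last"
        "remote_id": "",  # left empty in this sketch
        # Assumed key shape "/books/<olid>": keep only the last path segment.
        "openlibrary_key": edition.get("key", "").rsplit("/", 1)[-1],
        "isbn_10": first(edition.get("isbn_10")),
        "isbn_13": first(edition.get("isbn_13")),
    }


def rows_to_csv(rows):
    """Serialize rows to CSV text, header line first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder edition record; the key and ISBN are not real identifiers.
    sample = {
        "key": "/books/OL0000000M",
        "title": "Effective Software Architecture",
        "isbn_13": ["0000000000000"],
    }
    print(search_url("Oliver Goldman", "Effective Software Architecture"))
    print(rows_to_csv([edition_to_row(sample, ["Oliver Goldman"])]))
```

Missing fields become empty cells instead of errors, which matches the note that not every book has complete information.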