Inspecting websites to find JSON data APIs | Marcos Huerta | Data Science Lab
Author: Posit PBC
Uploaded: 2026-01-26
Views: 719
Description:
The Data Science Lab is a live weekly call. Register at pos.it/dslab! Discord invites go out each week on live calls. We'd love to have you!
The Lab is an open, messy space for learning and asking questions. Think of it like pair coding with a friend or two. Learn something new, and share what you know to help others grow.
On this call, Libby Heeren is joined by Marcos Huerta, a Data Science Manager at CarMax, as he walks us through the guts of websites looking for data we can play with. He shows us how to find hidden REST/JSON APIs using the web inspector in Safari/Firefox, and then how to get what's necessary to pull the same data programmatically in Python or R.
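As a rough illustration of that last step, here is a minimal Python sketch that takes a command copied from the browser's Network tab ("Copy as cURL") and turns it into a URL and headers that can be replayed from a script. It is not the code shown on the call: `parse_curl` and `fetch_json` are hypothetical helper names, the parser only understands the URL and `-H`/`--header` flags, and the endpoint in the usage note is a placeholder.

```python
import json
import shlex
import urllib.request


def parse_curl(command: str) -> tuple[str, dict[str, str]]:
    """Extract the URL and -H headers from a simple copied curl command.

    Assumption: only the URL and -H/--header flags are handled. Other
    flags (request bodies, cookies, --compressed) are ignored here.
    """
    # Join shell line continuations so shlex sees one logical line.
    tokens = shlex.split(command.replace("\\\n", " "))
    url = ""
    headers: dict[str, str] = {}
    i = 1  # skip the leading "curl"
    while i < len(tokens):
        token = tokens[i]
        if token in ("-H", "--header") and i + 1 < len(tokens):
            name, _, value = tokens[i + 1].partition(":")
            headers[name.strip()] = value.strip()
            i += 2
        elif token.startswith("http"):
            url = token
            i += 1
        else:
            i += 1
    return url, headers


def fetch_json(url: str, headers: dict[str, str], timeout: float = 10.0):
    """Replay the request with the same headers and decode the JSON body."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would look like `url, headers = parse_curl(copied_command)` followed by `data = fetch_json(url, headers)`, where `copied_command` is whatever the browser put on the clipboard. Tools such as Postman or Insomnia do the same conversion with far more complete flag handling.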
Hosting crew from Posit: Libby Heeren, Isabella Velasquez, Daniel Chen
Marcos's urls:
Website: https://marcoshuerta.com
GitHub: https://github.com/astrowonk/
Resources from the hosts and from participants in the Discord chat:
🔗 Postman: https://www.postman.com/
🔗 Insomnia (open source alternative to Postman): https://insomnia.rest/
🔗 Baseball Savant website Marcos is using: https://baseballsavant.mlb.com/gamefe...
🔗 Isabella Velasquez's blog on using {polite} R package to help scrape Wikipedia: https://ivelasq.rbind.io/blog/politel...
🔗 Festivitas Mac app Marcos used to add the lights to his desktop: https://festivitas.app/
🔗 Ted Laderas blog post on parsing JSON in R: https://laderast.github.io/intro_apis...
🔗 New rvest read_html_live() function: https://rvest.tidyverse.org/reference...
🔗 yyjsonr R package: https://github.com/coolbutuseless/yyj...
🔗 tuber R package: https://github.com/gojiplus/tuber
🔗 WikipediaR R package: https://www.quantargo.com/help/r/late...
🔗 rookiepy python package: https://pypi.org/project/rookiepy/
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here:
Website: https://www.posit.co
The Lab: https://pos.it/dslab
Hangout: https://pos.it/dsh
LinkedIn: / posit-software
Bluesky: https://bsky.app/profile/posit.co
Thanks for learning with us! 💛
Timestamps
00:00 Introduction
03:05 Web scraping vs. API calls
04:12 Server-side rendering vs. client-side JSON
06:12 Warning: Rate limits and business ethics (ahem)
08:39 Demo: Baseball Savant website
08:57 Using browser Developer Tools and the Network tab
12:15 "What is curl?"
13:30 Importing curl into Postman
16:03 Generating Python code from Postman
16:50 "Are there open source alternatives to Postman?"
17:50 Using the generated code in Python/Jupyter
22:28 R packages for JSON (jsonlite, yyjsonr)
25:09 Demo: Massachusetts Lottery website
28:17 Example: scripts Marcos automated with Cron jobs
30:17 Handling logins and cookies with rookiepy
32:19 Demo: CNN Election Data
34:26 Inspecting ESPN's website
36:58 "Can you scrape YouTube?"
38:19 Finding hidden JSON in CardsMania history
45:00 Benefits of API inspection over Beautiful Soup
46:59 New rvest function: read_html_live
50:40 Inspecting LinkedIn and finding GraphQL
53:58 Encouragement on handling API pagination
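
For readers curious about the pagination topic at 53:58, here is a minimal Python sketch of one common pattern. It is not taken from the call. It assumes a hypothetical API whose pages are numbered from 1 and which returns an empty list once the data runs out; `iter_pages` and `fetch_page` are illustrative names, and the pause between requests is a nod to the rate-limit warning at 06:12.

```python
import time
from typing import Callable, Iterator


def iter_pages(
    fetch_page: Callable[[int], list],
    max_pages: int = 100,
    delay_seconds: float = 1.0,
) -> Iterator[dict]:
    """Yield items page by page until an empty page is returned.

    Assumptions: pages are numbered from 1, and an empty list marks the
    end. Real APIs may use offsets, cursors, or "next" links instead.
    """
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:
            break
        yield from items
        # Pause between requests to stay polite to the server.
        time.sleep(delay_seconds)
```

Here `fetch_page` is any function that takes a page number and returns that page's list of records, for example a small wrapper around a request discovered in the Network tab. The `max_pages` cap is a safety net so a misbehaving endpoint cannot loop forever.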