Getting the Toner Level Value from Dynamic HTML with Cheerio and Puppeteer
Author: vlogize
Uploaded: 2025-04-14
Views: 2
Description:
Learn how to effectively scrape dynamic content such as the `toner level` from complex HTML structures using Cheerio and Puppeteer in Node.js.
---
This video is based on the question https://stackoverflow.com/q/68643093/ asked by the user 'MRK' ( https://stackoverflow.com/u/14438049/ ) and on the answer https://stackoverflow.com/a/68687628/ provided by the user 'MRK' ( https://stackoverflow.com/u/14438049/ ) on the 'Stack Overflow' website. Thanks to these users and the Stack Exchange community for their contributions.
Visit these links for original content and any more details, such as alternate solutions, latest updates/developments on topic, comments, revision history etc. For example, the original title of the Question was: cheerio .text() returns empty string
Also, content (except music) is licensed under CC BY-SA https://meta.stackexchange.com/help/l...
The original Question post is licensed under the 'CC BY-SA 4.0' ( https://creativecommons.org/licenses/... ) license, and the original Answer post is licensed under the 'CC BY-SA 4.0' ( https://creativecommons.org/licenses/... ) license.
If anything seems off to you, please feel free to write me at vlogize [AT] gmail [DOT] com.
---
Scraping Dynamic Content with Cheerio and Puppeteer
Web scrapers often stumble on dynamic content: a selector returns an empty string even though the element is plainly visible when you inspect the page in the browser. This post walks through a typical case, scraping a toner level from a web page, where Cheerio alone comes back empty.
The Problem: Empty Selector Output
Imagine you’re tasked with scraping toner levels from a website. You’ve identified that the required data sits in a specific <span> tag within a larger HTML structure, yet when you apply your selector with Cheerio, .text() returns an empty string.
Here’s a quick look at the HTML structure that contains the value you want to scrape:
[[See Video to Reveal this Text or Code Snippet]]
The goal is to extract the number 72, which is located within the <span> element with the class dataText.
Code to Extract Toner Level
You might try a simple selector like the one below:
[[See Video to Reveal this Text or Code Snippet]]
Yet this returns an empty string, leaving you puzzled about what went wrong.
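One way to reproduce the symptom is sketched below. The markup is hypothetical (the real page's HTML is not reproduced here), and the names SERVER_HTML and staticTonerText are made up for illustration. It assumes the span is created by an inline script, which a browser would run but Cheerio does not.

```javascript
// Hypothetical markup: the server sends an empty container, and an inline
// script inserts the span once the page runs in a browser.
const SERVER_HTML = `
<div id="toner"></div>
<script>
  document.getElementById('toner').innerHTML =
    '<span class="dataText">72</span>';
</script>`;

// Reads the span's text the way a Cheerio-only scraper would.
function staticTonerText(html) {
  // Required here rather than at the top so the sketch can be loaded
  // even where Cheerio is not installed.
  const cheerio = require('cheerio');
  const $ = cheerio.load(html);
  // Cheerio parses the markup but never executes the <script>, so no
  // span.dataText element exists and .text() yields ''.
  return $('span.dataText').text();
}
```

Calling staticTonerText(SERVER_HTML) gives an empty string: the selector itself is fine, but the element it targets only exists after the script has run.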
The Solution: Using Puppeteer
After some investigation, it becomes apparent that the toner level is loaded dynamically by a script. Cheerio only parses the HTML it is given and does not execute JavaScript, so the value isn’t in the document when Cheerio reads it. To solve this, we can use Puppeteer, a Node.js library for controlling a headless browser.
Because Puppeteer drives a real browser, the page’s scripts run, and we can wait until the element appears before scraping it. Here’s how to modify your approach using Puppeteer:
Step-by-Step Solution
Install Puppeteer: Make sure you have Puppeteer installed along with Cheerio.
[[See Video to Reveal this Text or Code Snippet]]
Use Puppeteer to Fetch and Parse Content:
[[See Video to Reveal this Text or Code Snippet]]
Key Parts of the Code:
await page.goto('http://example.com'): Navigates to the desired URL.
await page.waitForSelector(...): Waits until an element matching the selector appears in the DOM.
await page.$eval(...): Finds the first element matching the selector, runs the given callback on it inside the page, and returns the result (here, the element’s text content).
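The three calls above can be combined roughly as follows. This is a minimal sketch, not the original answer's code: the selector span.dataText is inferred from the class name mentioned earlier, the names TONER_SELECTOR, scrapeTonerLevel, main and the file name toner.js are invented for illustration, and it assumes the packages were installed with `npm install puppeteer cheerio`.

```javascript
// Assumption: the toner value lives in <span class="dataText">.
const TONER_SELECTOR = 'span.dataText';

// Takes an already-open Puppeteer page, so the scraping steps stay
// separate from browser start-up.
async function scrapeTonerLevel(page, url, selector = TONER_SELECTOR) {
  await page.goto(url);                 // navigate to the page
  await page.waitForSelector(selector); // wait for the element to appear
  // The callback runs inside the browser with the matched element.
  const text = await page.$eval(selector, (el) => el.textContent);
  return text.trim();
}

async function main(url) {
  const puppeteer = require('puppeteer'); // loaded only when scraping
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    console.log('Toner level:', await scrapeTonerLevel(page, url));
  } finally {
    await browser.close(); // always release the headless browser
  }
}

// Run as: node toner.js http://example.com
if (process.argv[2]) {
  main(process.argv[2]).catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}
```

If the rest of the scraper is already written against Cheerio, `await page.content()` returns the rendered HTML, which can then be passed to `cheerio.load` in place of the raw server response.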
Bonus: Extracting Attributes
If you also want the title attribute from the same element, you can change the callback passed to $eval like so:
[[See Video to Reveal this Text or Code Snippet]]
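A possible shape for that callback is sketched here. The function name readTextAndTitle and the default selector span.dataText are assumptions for illustration. The callback returns a plain object so that both values come back from the browser in a single call.

```javascript
// Reads the element's text and its title attribute in one round trip.
// `page` is an open Puppeteer page that has already navigated to the URL.
async function readTextAndTitle(page, selector = 'span.dataText') {
  await page.waitForSelector(selector);
  // The callback runs in the browser; its return value must be
  // serializable, which a plain object of strings is.
  return page.$eval(selector, (el) => ({
    text: el.textContent.trim(),
    title: el.getAttribute('title'), // null when the attribute is absent
  }));
}
```

With a page opened through `puppeteer.launch()` and `browser.newPage()`, and after `page.goto(url)`, it could be used as `const { text, title } = await readTextAndTitle(page);`.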
Conclusion
Scraping dynamic content can be tricky, especially when data is loaded asynchronously through scripts. Cheerio only sees the HTML as delivered, so the fix is to let a browser render the page first: using Puppeteer to wait for the element to appear lets you extract the information you need.
If you found this guide helpful or have any questions, feel free to leave a comment below!