DataHen Documentation
Table of Contents
- DataHen Platform
- Getting Started (a CLI sketch of these steps follows this table of contents)
  - Install the DataHen Command Line Interface using RubyGems
  - Get your access token
  - Set your access token as an environment variable
  - Create the scraper
  - Deploy the scraper
  - Run the scraper
  - View the Job Stats
  - View the Job Pages
  - View a Job Page’s content
  - View a Global Page’s content
  - View the scraper output
  - View the scraper logs
- High-Level Concepts
- User Access
- Scraper Development Workflow
- Scraper Maintenance Workflow
- Coding Tutorials
- Advanced Tutorials
- How-Tos
  - Setting a scraper’s mode
  - Setting a scraper’s scheduler
  - Enabling global page cache
  - Changing a Scraper’s, Job’s, or Job Page’s Proxy Type
  - Changing a Scraper’s or a Job’s Profiles
  - Account deploy key
  - Setting a specific Ruby version
  - Setting a specific Ruby gem
  - Changing a Scraper’s Standard worker count
  - Changing a Scraper’s Browser worker count
  - Changing an existing scrape job’s worker count
  - Enqueueing a page to the Browser Fetcher’s queue
  - Setting the fetch priority of a Job Page
  - Setting the user-agent type of a Job Page
  - Setting the request method of a Job Page
  - Setting the TLS version on a Job Page (only works on standard fetch)
  - Setting the request headers of a Job Page
  - Setting the request body of a Job Page
  - Setting the page_type of a Job Page
  - Resetting a Job Page
  - Handling cookies
  - Force-fetching a specific unfresh page
  - Handling JavaScript
  - Max page size
  - Browser display
  - Browser interaction
  - Enqueueing the same page twice with different code
  - Generating a GID without enqueueing a page
  - Intercepting requests
  - Executing Puppeteer code before the fetch is done
  - Sharing data between pre_code and code
  - Enabling browser images
  - Disabling the browser Adblocker feature
  - Changing browser fetch behavior
  - Dealing with responsive designs
  - Dealing with infinite load timeouts
  - Distinction between pages() and newPage()
  - Taking screenshots
  - Doing a dry run of your script locally
  - Executing your script locally and uploading to DataHen
  - Querying scraper outputs
  - Restarting a scraping job
  - Setting Variables and Secrets for your Account, Scrapers, and Jobs
  - Using a custom Docker image for the scraper
  - Using shared code libraries from other Git repositories with Git submodules
  - Debugging a page fetch
- Advanced Usage
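
As referenced in the Getting Started entry above, here is a minimal command-line sketch of that flow. It assumes the `hen` CLI installed by the `datahen` gem and a `DATAHEN_TOKEN` environment variable; the scraper name and Git repository URL below are hypothetical placeholders, and exact subcommands should be confirmed against `hen help` and the sections linked above.

```bash
# Install the DataHen Command Line Interface from RubyGems
gem install datahen

# Set your access token as an environment variable
# (value is a placeholder; get your token from your DataHen account)
export DATAHEN_TOKEN=<your_access_token>

# Create a scraper pointing at the Git repository that holds your scraper code
# (name and URL are hypothetical)
hen scraper create my-scraper https://github.com/example/my-scraper.git

# Deploy the scraper code, then start a scrape job
hen scraper deploy my-scraper
hen scraper start my-scraper

# Inspect the running job: stats and logs
hen scraper stats my-scraper
hen scraper log my-scraper
```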