Bright Data
Search, Crawl and Scrape any site, at scale, without getting blocked
0.5.0
Bright Data provides a developer toolkit for large-scale web search, crawling, and scraping, enabling reliable extraction of pages and structured data without getting blocked. It supports search queries, content-to-Markdown conversion, and configurable data feeds across many site types.
The toolkit is designed for integration into data pipelines and analytics workflows, with parameterized feeds and a choice of output formats.
Capabilities
- Crawling and scraping at scale, with anti-blocking behavior for sustained collection.
- Flexible search engine queries with advanced parameters across major engines.
- Transform pages into clean Markdown and emit structured JSON feeds for profiles, products, reviews, listings, and media.
- Configurable extraction parameters for batching, pagination, and media handling.
Secrets
- API key (BRIGHTDATA_API_KEY) and zone token (BRIGHTDATA_ZONE). Example values: BRIGHTDATA_API_KEY=sk_..., BRIGHTDATA_ZONE=zone123.
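One common way to supply these two values is through environment variables. The sketch below assumes the secrets are exported under the names listed above; the `BrightDataSecrets` type and the `load_brightdata_secrets` helper are hypothetical and not part of the toolkit.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BrightDataSecrets:
    """Hypothetical container for the two secrets named above."""

    api_key: str
    zone: str


def load_brightdata_secrets(
    env: Optional[Mapping[str, str]] = None,
) -> BrightDataSecrets:
    """Read both secrets from a mapping (defaults to os.environ).

    Raises KeyError naming every variable that is missing or empty.
    """
    source = os.environ if env is None else env
    names = ("BRIGHTDATA_API_KEY", "BRIGHTDATA_ZONE")
    missing = [name for name in names if not source.get(name, "").strip()]
    if missing:
        raise KeyError("missing secrets: " + ", ".join(missing))
    return BrightDataSecrets(
        api_key=source["BRIGHTDATA_API_KEY"].strip(),
        zone=source["BRIGHTDATA_ZONE"].strip(),
    )
```

Passing a plain dictionary instead of relying on `os.environ` keeps the helper easy to exercise in tests without touching real credentials.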
Available tools (3)
| Tool name | Description | Secrets |
|---|---|---|
| Brightdata.ScrapeAsMarkdown | Scrape a webpage and return content in Markdown format using Bright Data. | 2 |
| Brightdata.SearchEngine | Search using Google, Bing, or Yandex with advanced parameters using Bright Data. | 2 |
| Brightdata.WebDataFeed | Extract structured data from various websites like LinkedIn, Amazon, Instagram, etc. | 1 |
Brightdata.ScrapeAsMarkdown
Scrape a webpage and return content in Markdown format using Bright Data.
Examples:
- scrape_as_markdown("https://example.com") -> "# Example Page\nContent..."
- scrape_as_markdown("https://news.ycombinator.com") -> "# Hacker News\n..."
Parameters
| Parameter | Type | Req. | Description |
|---|---|---|---|
| url | string | Required | URL to scrape |
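Because this tool returns the page as a single Markdown string, downstream code often needs only light parsing. The sketch below pulls the first level-one heading out of such a string. `first_heading` is a hypothetical helper, and the sample input mirrors the illustrative output in the examples above rather than a real response.

```python
from typing import Optional

FENCE = "`" * 3  # Markdown code-fence marker


def first_heading(markdown: str) -> Optional[str]:
    """Return the text of the first '# ' heading, or None if there is none."""
    in_fence = False
    for line in markdown.splitlines():
        stripped = line.strip()
        # Skip fenced code blocks so '# comment' lines are not read as headings.
        if stripped.startswith(FENCE):
            in_fence = not in_fence
            continue
        if not in_fence and stripped.startswith("# "):
            return stripped[2:].strip()
    return None


sample = "# Example Page\nContent..."
print(first_heading(sample))  # Example Page
```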
Output
string: Scraped webpage content as Markdown
Brightdata.SearchEngine
Search using Google, Bing, or Yandex with advanced parameters using Bright Data.
Examples:
- search_engine("climate change") -> "# Search Results\n## Climate Change - Wikipedia\n..."
- search_engine("Python tutorials", engine="bing", num_results=5) -> "# Bing Results\n..."
- search_engine("cats", search_type="images", country_code="us") -> "# Image Results\n..."
Parameters
| Parameter | Type | Req. | Description |
|---|---|---|---|
| query | string | Required | Search query |
| engine | string | Optional | Search engine to use: google, bing, yandex |
| language | string | Optional | Two-letter language code |
| country_code | string | Optional | Two-letter country code |
| search_type | string | Optional | Type of search: images, shopping, news, jobs |
| start | integer | Optional | Results pagination offset |
| num_results | integer | Optional | Number of results to return. The default is 10 |
| location | string | Optional | Location for search results |
| device | string | Optional | Device type: mobile, ios, iphone, ipad, android, android_tablet |
| return_json | boolean | Optional | Return JSON instead of Markdown |
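The allowed values in the table lend themselves to a small client-side check before a call is made. The following is a minimal sketch, assuming the tool accepts exactly the values listed; `build_search_args` is a hypothetical helper, not part of the toolkit, and the rules that `start` is non-negative and `num_results` is at least 1 are assumptions rather than documented constraints.

```python
from typing import Any, Dict, Optional

ENGINES = {"google", "bing", "yandex"}
SEARCH_TYPES = {"images", "shopping", "news", "jobs"}
DEVICES = {"mobile", "ios", "iphone", "ipad", "android", "android_tablet"}


def build_search_args(
    query: str,
    engine: Optional[str] = None,
    language: Optional[str] = None,
    country_code: Optional[str] = None,
    search_type: Optional[str] = None,
    start: Optional[int] = None,
    num_results: Optional[int] = None,
    location: Optional[str] = None,
    device: Optional[str] = None,
    return_json: Optional[bool] = None,
) -> Dict[str, Any]:
    """Validate arguments against the parameter table and drop unset ones."""
    if not query.strip():
        raise ValueError("query is required")
    # Enumerated parameters must use one of the values listed in the table.
    for name, value, allowed in (
        ("engine", engine, ENGINES),
        ("search_type", search_type, SEARCH_TYPES),
        ("device", device, DEVICES),
    ):
        if value is not None and value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")
    # Language and country are documented as two-letter codes.
    for name, code in (("language", language), ("country_code", country_code)):
        if code is not None and not (len(code) == 2 and code.isalpha()):
            raise ValueError(f"{name} must be a two-letter code, got {code!r}")
    # Assumed bounds; the table does not state them.
    if start is not None and start < 0:
        raise ValueError("start must be non-negative")
    if num_results is not None and num_results < 1:
        raise ValueError("num_results must be at least 1")
    args = {
        "query": query,
        "engine": engine,
        "language": language,
        "country_code": country_code,
        "search_type": search_type,
        "start": start,
        "num_results": num_results,
        "location": location,
        "device": device,
        "return_json": return_json,
    }
    return {key: value for key, value in args.items() if value is not None}


print(build_search_args("Python tutorials", engine="bing", num_results=5))
# {'query': 'Python tutorials', 'engine': 'bing', 'num_results': 5}
```

Dropping unset arguments leaves the tool's own defaults in effect, such as the documented default of 10 for `num_results`.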
Output
string: Search results as Markdown or JSON
Brightdata.WebDataFeed
Extract structured data from various websites like LinkedIn, Amazon, Instagram, etc.
NEVER MAKE UP LINKS. IF LINKS ARE NEEDED, EXECUTE search_engine FIRST.
Supported source types:
- amazon_product, amazon_product_reviews
- linkedin_person_profile, linkedin_company_profile
- zoominfo_company_profile
- instagram_profiles, instagram_posts, instagram_reels, instagram_comments
- facebook_posts, facebook_marketplace_listings, facebook_company_reviews
- x_posts
- zillow_properties_listing
- booking_hotel_listings
- youtube_videos
Examples:
- web_data_feed("amazon_product", "https://amazon.com/dp/B08N5WRWNW") -> "{"title": "Product Name", ...}"
- web_data_feed("linkedin_person_profile", "https://linkedin.com/in/johndoe") -> "{"name": "John Doe", ...}"
- web_data_feed("facebook_company_reviews", "https://facebook.com/company", num_of_reviews=50) -> "[{"review": "...", ...}]"
Parameters
| Parameter | Type | Req. | Description |
|---|---|---|---|
| source_type | string | Required | Type of data source: amazon_product, amazon_product_reviews, linkedin_person_profile, linkedin_company_profile, zoominfo_company_profile, instagram_profiles, instagram_posts, instagram_reels, instagram_comments, facebook_posts, facebook_marketplace_listings, facebook_company_reviews, x_posts, zillow_properties_listing, booking_hotel_listings, youtube_videos |
| url | string | Required | URL of the web resource to extract data from |
| num_of_reviews | integer | Optional | Number of reviews to retrieve. Only applicable for facebook_company_reviews. Default is None |
| timeout | integer | Optional | Maximum time in seconds to wait for data retrieval |
| polling_interval | integer | Optional | Time in seconds between polling attempts |
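The `timeout` and `polling_interval` parameters suggest that results are collected by polling until the data is ready. How the tool does this internally is not documented here, so the sketch below only illustrates the general pattern. The `check` callable is a hypothetical stand-in for whatever status request is actually made, and `poll_until_ready` is not a toolkit function.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until_ready(
    check: Callable[[], Optional[T]],
    timeout: float,
    polling_interval: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call check() until it returns a non-None result or timeout seconds pass."""
    deadline = clock() + timeout
    while True:
        result = check()
        if result is not None:
            return result
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"no data after {timeout} seconds")
        # Never sleep past the deadline.
        sleep(min(polling_interval, remaining))
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting. A shorter `polling_interval` returns data sooner at the cost of more status checks, while `timeout` bounds the total wait.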
Output
string: Structured data from the requested source as JSON
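Since the output is a JSON string, callers will usually decode it before use. The examples above show both a single object and a list of objects, so this sketch normalizes either shape to a list. `parse_feed_output` is a hypothetical helper, and the sample payloads are made up for illustration.

```python
import json
from typing import Any, Dict, List


def parse_feed_output(raw: str) -> List[Dict[str, Any]]:
    """Decode the tool's JSON string and always return a list of records."""
    data = json.loads(raw)
    if isinstance(data, dict):
        return [data]
    if isinstance(data, list) and all(isinstance(item, dict) for item in data):
        return data
    raise ValueError(f"unexpected JSON shape: {type(data).__name__}")


print(parse_feed_output('{"title": "Product Name"}'))
# [{'title': 'Product Name'}]
print(parse_feed_output('[{"review": "Great"}, {"review": "Okay"}]'))
# [{'review': 'Great'}, {'review': 'Okay'}]
```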