Mango Scraper — Extract Mango data

Mango scraper powered by Scrapy and Selenium. Run a full crawl or scrape a single product.

Details

Implemented Features

  • Footer navigation: categories from footer links
  • Categories: Women and Men with full navigation
  • Advanced extraction: products, prices, descriptions, images
  • Images per color (max 15 per color) with deduplication
  • Progressive scrolling, up to 30 attempts per page
  • Selenium integration with anti-detection measures
  • Pricing system with discount detection
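As an illustration of the progressive scrolling feature, here is a minimal sketch of a scroll loop capped at 30 attempts. It is not the scraper's actual code: the function name scroll_until_stable, the pause parameter, and the "stop when the page height stops growing" rule are assumptions. The only Selenium call it relies on is the driver's execute_script method.

```python
import time

MAX_SCROLL_ATTEMPTS = 30  # cap taken from the feature list; everything else is assumed


def scroll_until_stable(driver, max_attempts=MAX_SCROLL_ATTEMPTS, pause=1.0):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is any object exposing Selenium's `execute_script(script)` method.
    Returns the number of scroll attempts performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for attempt in range(1, max_attempts + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazy-loaded products time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return attempt  # nothing new was loaded, so stop early
        last_height = new_height
    return max_attempts  # hit the cap while the page was still growing
```

With a real browser this would be called as scroll_until_stable(driver) after driver.get(url). Comparing page heights is only one possible stop condition; counting product cards between scrolls would work as well.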

Technical Capabilities

scrapy crawl mango                   # Full crawl
scrapy crawl mango -a url="URL"     # Single product
scrapy crawl mango -o products.json # Export results

Extracted Data

  • Normalized product name
  • Full description
  • Original and current price
  • Discount percentage and amount
  • Automatically detected currency (COP)
  • Canonical product URL
  • Images organized by color with duplicate detection
  • Extraction metadata (date, site)
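The discount and image fields above could be derived along these lines. This is a hedged sketch, not the scraper's implementation: compute_discount and dedupe_images are hypothetical helper names, rounding the percentage to one decimal place is an assumption, and so is treating two image URLs as duplicates when they differ only in their query string. The cap of 15 images per color comes from the feature list.

```python
from decimal import Decimal, ROUND_HALF_UP

MAX_IMAGES_PER_COLOR = 15  # limit stated in the feature list


def compute_discount(original, current):
    """Return (amount, percentage); both are zero when there is no discount."""
    original, current = Decimal(str(original)), Decimal(str(current))
    if original <= 0 or current >= original:
        return Decimal("0"), Decimal("0")
    amount = original - current
    # Assumption: percentage rounded to one decimal place.
    percentage = (amount / original * 100).quantize(
        Decimal("0.1"), rounding=ROUND_HALF_UP
    )
    return amount, percentage


def dedupe_images(urls, limit=MAX_IMAGES_PER_COLOR):
    """Drop repeated image URLs, keep first-seen order, cap at `limit`."""
    seen, unique = set(), []
    for url in urls:
        if len(unique) >= limit:
            break
        key = url.split("?", 1)[0]  # assumption: query string only sets size/quality
        if key in seen:
            continue
        seen.add(key)
        unique.append(url)
    return unique
```

Decimal is used instead of float so that price arithmetic stays exact. For COP prices, which are normally whole numbers, plain integers would also be adequate.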

FAQ

How to run the Mango scraper?

For a full crawl, run: scrapy crawl mango. For a single product, run: scrapy crawl mango -a url="URL".

What data does the Mango scraper extract?

Name, description, original and current price, discount percentage and amount, currency (COP), canonical URL, images organized by color, and extraction metadata (date, site).

Docker Compose is up. Can I run the scraper from a script that talks to the API?

Yes. Use the control_scraper.py script, which calls the API to orchestrate scraping. Example of a full run: python control_scraper.py --spider mango
