The professional Web Scraping solution for Fashion E-commerce
A distributed, scalable, and robust system designed for large-scale data extraction powering the Stylos AI ecosystem.
What is Stylos Scraper?
Stylos Scraper is a professional, distributed web scraping solution built for large-scale data extraction from fashion e-commerce websites. It combines Scrapy spiders, Scrapyd, Selenium Grid, FastAPI, and Docker into a scalable, robust system that can scrape multiple websites simultaneously.
Key Highlights
Production‑ready: scalable, observable, and easy to extend.
Multi‑Country/Multi‑Language Support
Extract Zara catalogs internationally by passing country and language as parameters.
Automatic Multi‑Currency
Automatic currency detection per country (USD, EUR, COP, ...).
Modular Extractors
Pluggable architecture to extend to new retailers quickly.
Fully Dockerized
Cloud‑native architecture orchestrated via Docker Compose.
Distributed Scraping
Parallel browser automation using Selenium Grid.
Advanced CLI
Schedule, launch, and monitor jobs from the terminal.
Sentry Monitoring
End‑to‑end error and performance tracking.
Advanced Middlewares
Smart request management and improved anti‑detection.
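Multi-country extraction and currency detection can be illustrated with a small sketch. It assumes Zara URLs follow the `https://www.zara.com/{country}/{lang}/...` layout shown in the Basic Usage examples below; the `COUNTRY_CURRENCY` table and every helper name here are hypothetical, not taken from the repository, and the table lists only the three currencies mentioned above.

```python
from urllib.parse import urlparse

# Hypothetical country -> ISO 4217 currency table. Only the three currencies
# named above are listed; the real scraper may detect currency another way.
COUNTRY_CURRENCY: dict[str, str] = {"us": "USD", "es": "EUR", "co": "COP"}


def zara_base_url(country: str, lang: str) -> str:
    """Storefront URL for a country/language pair (assumes /{country}/{lang}/ paths)."""
    return f"https://www.zara.com/{country.lower()}/{lang.lower()}/"


def locale_from_url(url: str) -> tuple[str, str]:
    """Read (country, lang) from the first two path segments of a Zara URL."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"Cannot find /country/lang/ in {url!r}")
    return segments[0].lower(), segments[1].lower()


def currency_for(country: str) -> str:
    """Map a country code to its currency; unknown countries raise ValueError."""
    try:
        return COUNTRY_CURRENCY[country.lower()]
    except KeyError:
        raise ValueError(f"No currency configured for {country!r}") from None


if __name__ == "__main__":
    country, lang = locale_from_url("https://www.zara.com/us/en/your-product-url.html")
    print(zara_base_url(country, lang), currency_for(country))
```

Deriving the locale from the URL path means a single product URL is enough to pick both the storefront and the currency, which matches how the `--url` option is used alongside `--country` and `--lang`.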
Available scrapers
Quick Start
Get the distributed architecture running in minutes.
Clone the repository
git clone https://github.com/builker-col/stylos-scrapers.git
cd stylos-scrapers
Create your .env file
# Copy the example: cp .env.example .env
# Or create a new one with the content below
Sample .env
# MongoDB Configuration (use host.docker.internal to connect from a container to the host)
MONGO_URI=mongodb://host.docker.internal:27017
MONGO_DATABASE=stylos_scrapers
MONGO_COLLECTION=products
# Selenium Grid Configuration
SELENIUM_MODE=remote
SELENIUM_HUB_URL=http://selenium-hub:4444/wd/hub
# Scrapyd Configuration
SCRAPYD_URL=http://scrapyd:6800
PROJECT_NAME=stylos
# Monitoring (Optional)
SENTRY_DSN=
SCRAPY_ENV=development
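One way a service could consume these variables is sketched below. The `ScraperSettings` class and `load_settings` function are hypothetical, not the project's code; they use the variable names from the sample .env, fall back to the same values as defaults, and treat an empty `SENTRY_DSN` as "monitoring disabled". The `local` Selenium mode is an assumption, since only `remote` appears in the sample.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ScraperSettings:
    mongo_uri: str
    mongo_database: str
    mongo_collection: str
    selenium_mode: str
    selenium_hub_url: str
    scrapyd_url: str
    project_name: str
    sentry_dsn: Optional[str]  # None when SENTRY_DSN is empty (monitoring off)
    scrapy_env: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> ScraperSettings:
    """Read settings from the environment, defaulting to the sample .env values."""
    env = os.environ if env is None else env
    mode = env.get("SELENIUM_MODE", "remote").strip().lower()
    # "local" is a hypothetical alternative; only "remote" appears in the sample.
    if mode not in {"remote", "local"}:
        raise ValueError(f"Unsupported SELENIUM_MODE: {mode!r}")
    return ScraperSettings(
        mongo_uri=env.get("MONGO_URI", "mongodb://host.docker.internal:27017"),
        mongo_database=env.get("MONGO_DATABASE", "stylos_scrapers"),
        mongo_collection=env.get("MONGO_COLLECTION", "products"),
        selenium_mode=mode,
        selenium_hub_url=env.get("SELENIUM_HUB_URL", "http://selenium-hub:4444/wd/hub"),
        scrapyd_url=env.get("SCRAPYD_URL", "http://scrapyd:6800"),
        project_name=env.get("PROJECT_NAME", "stylos"),
        sentry_dsn=env.get("SENTRY_DSN", "").strip() or None,
        scrapy_env=env.get("SCRAPY_ENV", "development"),
    )
```

Accepting an explicit mapping instead of always reading `os.environ` keeps the loader easy to test with a plain dictionary.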
Launch the architecture
docker-compose up --build -d
Services started
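Before scheduling jobs, it can help to wait until Scrapyd is accepting requests. The sketch below polls Scrapyd's `daemonstatus.json` endpoint until it reports `"status": "ok"`. The helper names are hypothetical, and calling it from the host as `http://localhost:6800` assumes the compose file publishes port 6800 (inside the Docker network the .env uses `http://scrapyd:6800`).

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def wait_for_scrapyd(base_url: str, attempts: int = 30, delay: float = 2.0,
                     fetch: Callable[[str], dict] = fetch_json) -> dict:
    """Poll daemonstatus.json until Scrapyd answers "ok", else raise TimeoutError."""
    url = base_url.rstrip("/") + "/daemonstatus.json"
    last_error = None
    for _ in range(attempts):
        try:
            status = fetch(url)
            if status.get("status") == "ok":
                return status
        except OSError as exc:  # connection refused, DNS failure, timeout
            last_error = exc
        time.sleep(delay)
    raise TimeoutError(
        f"Scrapyd at {base_url} not ready after {attempts} attempts"
    ) from last_error
```

The `fetch` parameter is injectable so the polling logic can be exercised without a running container.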
Basic Usage
python control_scraper.py --spider zara
python control_scraper.py --spider zara --country us --lang en
python control_scraper.py --spider zara --country us --lang en --url "https://www.zara.com/us/en/your-product-url.html"
python control_scraper.py --spider mango
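Under the hood, a CLI like this has to ask Scrapyd to start a spider. A minimal sketch of that step, assuming the `--country`, `--lang`, and `--url` flags are forwarded as spider arguments with the same names (an assumption; the real `control_scraper.py` may differ), uses Scrapyd's `schedule.json` endpoint, which passes any extra form field to the spider as an argument. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def build_schedule_request(scrapyd_url: str, project: str, spider: str,
                           **spider_args: Optional[str]) -> urllib.request.Request:
    """Build a POST to Scrapyd's schedule.json; extra fields become spider arguments."""
    fields = {"project": project, "spider": spider}
    # Drop flags the user did not pass (e.g. no --url given).
    fields.update({k: v for k, v in spider_args.items() if v is not None})
    data = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        scrapyd_url.rstrip("/") + "/schedule.json", data=data, method="POST")


def schedule(req: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send the request and return the job ID Scrapyd assigns."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "ok":
        raise RuntimeError(f"Scrapyd rejected the job: {payload}")
    return payload["jobid"]
```

For example, `build_schedule_request("http://scrapyd:6800", "stylos", "zara", country="us", lang="en")` produces the form fields `project=stylos&spider=zara&country=us&lang=en`; with Scrapy's default `Spider.__init__`, those arguments end up as `self.country` and `self.lang` on the spider.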
Tips
- The CLI shows real-time status, job IDs, and detailed logs.
- Scale the Chrome nodes for more parallelism:
docker-compose up --scale chrome=3 -d
- Run commands inside containers:
docker-compose exec api ...
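After scaling, you can check that the extra Chrome nodes actually joined the grid. This sketch assumes Selenium Grid 4, whose `/status` endpoint returns a `value` object with `ready` and a `nodes` list (each node carrying `availability` and `slots`); the helper names are hypothetical, and stripping `/wd/hub` from `SELENIUM_HUB_URL` to reach `/status` is an assumption about this deployment.

```python
import json
import urllib.request
from typing import NamedTuple


class GridCapacity(NamedTuple):
    ready: bool
    nodes_up: int
    free_slots: int


def grid_status_url(hub_url: str) -> str:
    """Turn a .../wd/hub URL into the Grid's /status URL."""
    base = hub_url.rstrip("/")
    if base.endswith("/wd/hub"):
        base = base[: -len("/wd/hub")]
    return base + "/status"


def summarize_grid(status: dict) -> GridCapacity:
    """Count nodes marked UP and their slots with no active session."""
    value = status.get("value", {})
    nodes = [n for n in value.get("nodes", []) if n.get("availability") == "UP"]
    free = sum(1 for n in nodes for slot in n.get("slots", [])
               if slot.get("session") is None)
    return GridCapacity(bool(value.get("ready")), len(nodes), free)


def fetch_grid_capacity(hub_url: str, timeout: float = 5.0) -> GridCapacity:
    """Query the live grid (requires network access to the hub)."""
    with urllib.request.urlopen(grid_status_url(hub_url), timeout=timeout) as resp:
        return summarize_grid(json.load(resp))
```

If each `chrome` container registers as one node, `--scale chrome=3` should show up as `nodes_up == 3` once all containers have started.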
Advanced Technologies
Built on a modern and powerful tech stack to ensure maximum performance and scalability.
Selenium Grid
Parallelized scraping and testing across multiple machines and browsers.
Scrapyd
Service for deploying and running Scrapy spiders and managing their scraping processes.
FastAPI
High-performance framework to build the API that controls and monitors scraping jobs.
Docker
Containerization for easy and consistent deployments across environments.
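To monitor jobs, an API layer such as the FastAPI service has to read job state from Scrapyd. A hedged sketch of that query uses Scrapyd's `listjobs.json` endpoint, which groups jobs into `pending`, `running`, and `finished` lists; the helper names are hypothetical and this is not the project's actual API code.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter


def listjobs_url(scrapyd_url: str, project: str) -> str:
    """URL for Scrapyd's listjobs.json filtered to one project."""
    query = urllib.parse.urlencode({"project": project})
    return f"{scrapyd_url.rstrip('/')}/listjobs.json?{query}"


def summarize_jobs(listing: dict) -> dict[str, Counter]:
    """Count jobs per spider for each Scrapyd job state."""
    return {
        state: Counter(job.get("spider", "?") for job in listing.get(state, []))
        for state in ("pending", "running", "finished")
    }


def fetch_job_summary(scrapyd_url: str, project: str,
                      timeout: float = 10.0) -> dict[str, Counter]:
    """Fetch and summarize the live job list (requires a reachable Scrapyd)."""
    with urllib.request.urlopen(listjobs_url(scrapyd_url, project), timeout=timeout) as resp:
        return summarize_jobs(json.load(resp))
```

Keeping the parsing in `summarize_jobs` separate from the HTTP call lets an endpoint reuse it and makes it testable against a recorded response.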
Docs
Detailed documentation for setup, usage, contribution, and licensing.
The Stylos Ecosystem
This project is part of the Stylos ecosystem, an AI platform that analyzes fashion trends and produces personalized recommendations.
Data extracted by Stylos Scraper fuels our AI models to identify styles such as Old Money, Formal, Streetwear, and many more.
Ready to contribute or use the project?
The code is fully open. Explore the repository, report issues, or make your first pull request.