Stylos Scraper: Zara and Mango Scrapers for Fashion E-commerce

OPEN SOURCE PROJECT

A professional web scraping solution for fashion e-commerce

A distributed, scalable, and robust system designed for large-scale data extraction powering the Stylos AI ecosystem.

What is Stylos Scraper?

Stylos Scraper is a distributed web scraping system built for large-scale data extraction from fashion e-commerce websites. It combines Scrapy spiders deployed on Scrapyd, parallel browser automation through Selenium Grid, a FastAPI service for controlling and monitoring jobs, and Docker Compose orchestration, which lets it scrape multiple websites simultaneously.

Key Highlights

Production‑ready: scalable, observable, and easy to extend.

Multi‑Country/Multi‑Language Support

Extract Zara catalogs for different countries and languages through the --country and --lang parameters.

Automatic Multi‑Currency

Automatic currency detection per country (USD, EUR, COP, ...).

Modular Extractors

Pluggable architecture to extend to new retailers quickly.

Fully Dockerized

Cloud‑native architecture orchestrated via Docker Compose.

Distributed Scraping

Parallel browser automation using Selenium Grid.

Advanced CLI

Schedule, launch, and monitor jobs from the terminal.

Sentry Monitoring

End‑to‑end error and performance tracking.

Advanced Middlewares

Smart request management and improved anti‑detection.
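The multi-country and multi-currency highlights can be illustrated with a small sketch. It assumes the Zara URL pattern /{country}/{lang}/ that appears in the single-product example under Basic Usage, and it uses a hypothetical CURRENCY_BY_COUNTRY table and an assumed default language of "es" for Colombia; the project's real detection logic may read the currency from the storefront instead of a fixed map.

```python
# Hypothetical country -> ISO 4217 currency table. The real project may
# detect the currency from the storefront rather than from a fixed map.
CURRENCY_BY_COUNTRY = {
    "co": "COP",  # Colombia, the Zara spider's default country
    "us": "USD",
    "es": "EUR",
    "fr": "EUR",
}


def zara_base_url(country: str = "co", lang: str = "es") -> str:
    """Build a Zara storefront URL following the /{country}/{lang}/ pattern.

    The default language "es" is an assumption for illustration.
    """
    return f"https://www.zara.com/{country.lower()}/{lang.lower()}/"


def currency_for(country: str) -> str:
    """Return the currency code configured for a country, or raise KeyError."""
    try:
        return CURRENCY_BY_COUNTRY[country.lower()]
    except KeyError:
        raise KeyError(f"no currency configured for country {country!r}") from None


if __name__ == "__main__":
    print(zara_base_url("us", "en"))  # https://www.zara.com/us/en/
    print(currency_for("US"))         # USD
```

Keeping the country code as the single input means the URL and the currency always stay in sync for a given run.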

Available scrapers

By website

ZARA


MANGO

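The "Modular Extractors" highlight implies that each retailer lives in its own pluggable unit. The sketch below shows that registry pattern in plain Python with hypothetical names (Extractor, EXTRACTORS, register, get_extractor), not the project's real classes; in Scrapy itself, spiders are already discovered by their name attribute, which is the same name passed to --spider.

```python
from typing import Dict, List, Type


class Extractor:
    """Hypothetical base class: one subclass per retailer."""

    name: str = ""

    def extract(self, html: str) -> List[dict]:
        # Each retailer implements its own parsing of a product page.
        raise NotImplementedError


# Hypothetical registry mapping a --spider name to its extractor class.
EXTRACTORS: Dict[str, Type[Extractor]] = {}


def register(cls: Type[Extractor]) -> Type[Extractor]:
    """Class decorator that adds an extractor under its `name`."""
    if not cls.name or cls.name in EXTRACTORS:
        raise ValueError(f"invalid or duplicate extractor name: {cls.name!r}")
    EXTRACTORS[cls.name] = cls
    return cls


def get_extractor(name: str) -> Extractor:
    """Instantiate the extractor registered under `name`."""
    try:
        return EXTRACTORS[name]()
    except KeyError:
        known = ", ".join(sorted(EXTRACTORS))
        raise KeyError(f"unknown spider {name!r}; known: {known}") from None


@register
class ZaraExtractor(Extractor):
    name = "zara"


@register
class MangoExtractor(Extractor):
    name = "mango"
```

With this layout, supporting a new retailer means adding one decorated subclass; nothing else in the dispatch code changes.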

Quick Start

Get the distributed architecture running in minutes.

Step 1

Clone the repository

git clone https://github.com/builker-col/stylos-scrapers.git
cd stylos-scrapers

Step 2

Create your .env file

# Copy the example: cp .env.example .env
# Or create a new one with the content below

Sample .env

# MongoDB Configuration (use host.docker.internal to connect from a container to the host)
MONGO_URI=mongodb://host.docker.internal:27017
MONGO_DATABASE=stylos_scrapers
MONGO_COLLECTION=products

# Selenium Grid Configuration
SELENIUM_MODE=remote
SELENIUM_HUB_URL=http://selenium-hub:4444/wd/hub

# Scrapyd Configuration
SCRAPYD_URL=http://scrapyd:6800
PROJECT_NAME=stylos

# Monitoring (Optional)
SENTRY_DSN=
SCRAPY_ENV=development
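To sanity-check a .env file before starting the stack, a minimal parser for the simple KEY=value format above is enough. This is a sketch, not the loader the project uses (Docker Compose can read .env on its own); it skips blank lines and # comments but does not handle quoting or variable expansion, and REQUIRED_KEYS is a hypothetical choice of variables to insist on.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical set of variables we insist on; SENTRY_DSN may stay empty.
REQUIRED_KEYS = ("MONGO_URI", "MONGO_DATABASE", "MONGO_COLLECTION",
                 "SELENIUM_HUB_URL", "SCRAPYD_URL", "PROJECT_NAME")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values such as URIs stay intact.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        env = parse_env(env_path.read_text(encoding="utf-8"))
        print("missing:", missing_keys(env) or "none")
    else:
        print(".env not found; copy .env.example first")
```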

Step 3

Launch the architecture

docker-compose up --build -d

Services started
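Once the containers are up, a quick health check can confirm that Scrapyd and the Selenium hub respond. This sketch derives endpoints from the .env values: Scrapyd's daemonstatus.json, and the hub's /status endpoint, assuming a Selenium Grid 4 hub that reports value.ready. Note that hostnames such as scrapyd and selenium-hub only resolve inside the Docker network; from the host you would use localhost, assuming those ports are published in the Compose file.

```python
import json
import urllib.request
from typing import Dict


def health_urls(env: Dict[str, str]) -> Dict[str, str]:
    """Derive health-check endpoints from .env-style values."""
    scrapyd = env.get("SCRAPYD_URL", "http://localhost:6800").rstrip("/")
    hub = env.get("SELENIUM_HUB_URL", "http://localhost:4444/wd/hub").rstrip("/")
    return {
        "scrapyd": f"{scrapyd}/daemonstatus.json",  # Scrapyd daemon status
        "selenium": f"{hub}/status",                 # Grid status (Grid 4 assumed)
    }


def is_healthy(name: str, payload: dict) -> bool:
    """Interpret each service's JSON status payload."""
    if name == "scrapyd":
        return payload.get("status") == "ok"
    if name == "selenium":
        return bool((payload.get("value") or {}).get("ready"))
    raise ValueError(f"unknown service {name!r}")


def check_all(env: Dict[str, str], timeout: float = 5.0) -> Dict[str, bool]:
    """Query every endpoint and report True/False per service."""
    results: Dict[str, bool] = {}
    for name, url in health_urls(env).items():
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[name] = is_healthy(name, json.load(resp))
        except (OSError, ValueError):
            # Connection refused, DNS failure, timeout, or a non-JSON body.
            results[name] = False
    return results
```

From the host, a call such as check_all({"SCRAPYD_URL": "http://localhost:6800", "SELENIUM_HUB_URL": "http://localhost:4444/wd/hub"}) should return True for both services when the stack is healthy.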

Basic Usage

Zara (defaults to Colombia)

python control_scraper.py --spider zara

Zara US in English

python control_scraper.py --spider zara --country us --lang en

Single product (testing)

python control_scraper.py --spider zara --country us --lang en --url "https://www.zara.com/us/en/your-product-url.html"

Mango (full run)

python control_scraper.py --spider mango
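The CLI presumably runs spiders through Scrapyd (the .env defines SCRAPYD_URL and PROJECT_NAME). We have not checked how control_scraper.py builds its requests; this sketch only shows how flags like --country and --lang could map onto Scrapyd's schedule.json endpoint, where any extra form field is passed to the spider as an argument. The argument names country, lang, and url are assumptions taken from the CLI flags.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def schedule_payload(project: str, spider: str,
                     **spider_args: Optional[str]) -> bytes:
    """Encode a schedule.json form body; extra fields become spider arguments."""
    fields = {"project": project, "spider": spider}
    # Drop unset options so the spider falls back to its own defaults.
    fields.update({k: v for k, v in spider_args.items() if v is not None})
    return urllib.parse.urlencode(fields).encode("ascii")


def schedule(scrapyd_url: str, project: str, spider: str,
             **spider_args: Optional[str]) -> str:
    """POST to Scrapyd's schedule.json and return the new job ID."""
    req = urllib.request.Request(
        f"{scrapyd_url.rstrip('/')}/schedule.json",
        data=schedule_payload(project, spider, **spider_args),
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = json.load(resp)
    if body.get("status") != "ok":
        raise RuntimeError(f"Scrapyd rejected the job: {body}")
    return body["jobid"]
```

Under these assumptions, schedule("http://localhost:6800", "stylos", "zara", country="us", lang="en") would correspond roughly to the "Zara US in English" command.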

Tips

The CLI shows real-time status, job IDs, and detailed logs.
You can scale the Chrome service for more parallelism: docker-compose up --scale chrome=3 -d
Run commands inside containers: docker-compose exec api ...
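As an illustration of job monitoring (not the CLI's actual code), the sketch below polls Scrapyd's listjobs.json endpoint, which groups a project's jobs into pending, running, and finished lists, and reports which list a given job ID is in. The poll interval and timeout defaults are arbitrary choices.

```python
import json
import time
import urllib.parse
import urllib.request


def job_state(listing: dict, job_id: str) -> str:
    """Return 'pending', 'running', 'finished', or 'unknown' for a job ID."""
    for state in ("pending", "running", "finished"):
        if any(job.get("id") == job_id for job in listing.get(state, [])):
            return state
    return "unknown"


def wait_for_job(scrapyd_url: str, project: str, job_id: str,
                 poll_seconds: float = 5.0, timeout: float = 3600.0) -> str:
    """Poll listjobs.json until the job finishes or the timeout expires."""
    query = urllib.parse.urlencode({"project": project})
    url = f"{scrapyd_url.rstrip('/')}/listjobs.json?{query}"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(url, timeout=10) as resp:
            state = job_state(json.load(resp), job_id)
        print(f"job {job_id}: {state}")
        if state == "finished":
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish within {timeout} s")
```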

Advanced Technologies

Built on a modern and powerful tech stack to ensure maximum performance and scalability.

🧪

Selenium Grid

Parallelized scraping and testing across multiple machines and browsers.

🕷️

Scrapyd

Service that deploys and runs Scrapy spiders and manages their scraping processes.

⚡

FastAPI

High-performance framework to build the API that controls and monitors scraping jobs.

🐳

Docker

Containerization for easy and consistent deployments across environments.

Docs

Detailed documentation for setup, usage, contribution, and licensing.

Usage

How to configure parameters, run extractors, and export results.

Contributing

Style guide, branching model, and how to propose new extractors.

License

Licensing model and legal considerations for scraping.

Detailed Docs

Architecture, middlewares, monitoring, and scaling best practices.
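This page does not detail which middlewares Stylos ships. As a hedged illustration of "smart request management" in Scrapy, here is a minimal downloader middleware that picks a User-Agent per request. It follows Scrapy's documented process_request(request, spider) hook, where returning None lets processing continue; the class name and the agent strings are hypothetical.

```python
import random
from typing import Optional, Sequence

# Hypothetical pool of desktop browser User-Agent strings.
DEFAULT_AGENTS = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_4) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
)


class RotateUserAgentMiddleware:
    """Hypothetical Scrapy downloader middleware: one User-Agent per request."""

    def __init__(self, agents: Sequence[str] = DEFAULT_AGENTS,
                 rng: Optional[random.Random] = None):
        if not agents:
            raise ValueError("need at least one User-Agent")
        self.agents = list(agents)
        self.rng = rng or random.Random()

    def process_request(self, request, spider):
        # Overwrite the header; returning None lets Scrapy keep processing.
        request.headers["User-Agent"] = self.rng.choice(self.agents)
        return None
```

To enable it, it would be listed in the DOWNLOADER_MIDDLEWARES setting, for example {"stylos.middlewares.RotateUserAgentMiddleware": 543}, where the module path is hypothetical. An order above 500 runs it after Scrapy's built-in UserAgentMiddleware, so its header wins.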

PART OF SOMETHING BIGGER

The Stylos Ecosystem

This project is part of the Stylos ecosystem, an AI platform that analyzes fashion trends and produces personalized recommendations.

Data extracted by Stylos Scraper fuels our AI models to identify styles such as Old Money, Formal, Streetwear, and many more.

Ready to contribute or use the project?

The code is fully open. Explore the repository, report issues, or make your first pull request.

Repository: https://github.com/builker-col/stylos-scrapers