OPEN SOURCE PROJECT

The professional Web Scraping solution for Fashion E-commerce

A distributed, scalable, and robust system designed for large-scale data extraction powering the Stylos AI ecosystem.


What is Stylos Scraper?

Stylos Scraper is a distributed web scraping system built for large-scale data extraction from fashion e-commerce websites. It combines Scrapy spiders deployed on Scrapyd, browsers running on Selenium Grid, and a FastAPI control layer, all packaged with Docker, so that several retailer websites can be scraped at the same time.

Key Highlights

Production‑ready: scalable, observable, and easy to extend.

Multi‑Country/Multi‑Language Support

Zara extraction across countries and languages, selected at launch with the --country and --lang parameters.

Automatic Multi‑Currency

Automatic currency detection per country (USD, EUR, COP, ...).

Modular Extractors

Pluggable architecture to extend to new retailers quickly.

Fully Dockerized

Cloud‑native architecture orchestrated via Docker Compose.

Distributed Scraping

Parallel browser automation using Selenium Grid.

Advanced CLI

Schedule, launch, and monitor jobs from the terminal.

Sentry Monitoring

End‑to‑end error and performance tracking.

Advanced Middlewares

Smart request management and improved anti‑detection.
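The modular extractor and per-country currency highlights can be pictured with a small registry. This is only a minimal sketch under stated assumptions: the names BaseExtractor, register, get_extractor, and CURRENCY_BY_COUNTRY are hypothetical and do not come from the repository, and the currency table is a hard-coded illustration rather than the project's actual detection logic. The URL shape follows the Zara product URL shown in Basic Usage.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type

# Hypothetical country-to-currency table (illustration only). The real
# project may detect the currency in a different way.
CURRENCY_BY_COUNTRY: Dict[str, str] = {"us": "USD", "es": "EUR", "co": "COP"}


class BaseExtractor(ABC):
    """Hypothetical base class that each retailer extractor would implement."""

    retailer: str = ""

    def __init__(self, country: str = "co", lang: str = "es") -> None:
        # Colombia is the documented default country; "es" is an assumption.
        self.country = country.lower()
        self.lang = lang.lower()

    @property
    def currency(self) -> str:
        """Look up the currency code for the configured country."""
        try:
            return CURRENCY_BY_COUNTRY[self.country]
        except KeyError:
            raise ValueError(f"No currency configured for {self.country!r}")

    @abstractmethod
    def start_url(self) -> str:
        """Return the first URL to crawl for this retailer."""


# Registry that maps a retailer name to its extractor class.
EXTRACTORS: Dict[str, Type[BaseExtractor]] = {}


def register(cls: Type[BaseExtractor]) -> Type[BaseExtractor]:
    """Class decorator that adds an extractor to the registry."""
    EXTRACTORS[cls.retailer] = cls
    return cls


@register
class ZaraExtractor(BaseExtractor):
    retailer = "zara"

    def start_url(self) -> str:
        return f"https://www.zara.com/{self.country}/{self.lang}/"


def get_extractor(name: str, **kwargs: str) -> BaseExtractor:
    """Instantiate the extractor registered under the given retailer name."""
    try:
        extractor_cls = EXTRACTORS[name]
    except KeyError:
        raise ValueError(f"Unknown retailer: {name!r}")
    return extractor_cls(**kwargs)
```

In this sketch, supporting a new retailer means adding one decorated class; nothing else in the registry code has to change.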

Quick Start

Get the distributed architecture running in minutes.

Step 1

Clone the repository

git clone https://github.com/builker-col/stylos-scrapers.git
cd stylos-scrapers
Step 2

Create your .env file

# Copy the example file
cp .env.example .env
# Or create a new .env with the content below
Sample .env
# MongoDB Configuration (use host.docker.internal to connect from a container to the host)
MONGO_URI=mongodb://host.docker.internal:27017
MONGO_DATABASE=stylos_scrapers
MONGO_COLLECTION=products

# Selenium Grid Configuration
SELENIUM_MODE=remote
SELENIUM_HUB_URL=http://selenium-hub:4444/wd/hub

# Scrapyd Configuration
SCRAPYD_URL=http://scrapyd:6800
PROJECT_NAME=stylos

# Monitoring (Optional)
SENTRY_DSN=
SCRAPY_ENV=development
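
The variables in the sample .env could be read from Python along the following lines. This is a hedged sketch, not the project's own settings module: the Settings dataclass and load_settings function are hypothetical names, and treating every variable except SENTRY_DSN and SCRAPY_ENV as required is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables in the sample .env."""

    mongo_uri: str
    mongo_database: str
    mongo_collection: str
    selenium_mode: str
    selenium_hub_url: str
    scrapyd_url: str
    project_name: str
    sentry_dsn: Optional[str]
    scrapy_env: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, failing fast on gaps."""

    def required(key: str) -> str:
        value = env.get(key, "").strip()
        if not value:
            raise RuntimeError(f"Missing required setting: {key}")
        return value

    return Settings(
        mongo_uri=required("MONGO_URI"),
        mongo_database=required("MONGO_DATABASE"),
        mongo_collection=required("MONGO_COLLECTION"),
        selenium_mode=required("SELENIUM_MODE"),
        selenium_hub_url=required("SELENIUM_HUB_URL"),
        scrapyd_url=required("SCRAPYD_URL"),
        project_name=required("PROJECT_NAME"),
        # Monitoring is optional: an empty SENTRY_DSN means "disabled".
        sentry_dsn=env.get("SENTRY_DSN", "").strip() or None,
        scrapy_env=env.get("SCRAPY_ENV", "").strip() or "development",
    )
```

Passing the mapping as a parameter keeps the function easy to test with a plain dictionary instead of the real process environment.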
Step 3

Launch the architecture

docker-compose up --build -d

Basic Usage

Zara (defaults to Colombia)
python control_scraper.py --spider zara
Zara US in English
python control_scraper.py --spider zara --country us --lang en
Single product (testing)
python control_scraper.py --spider zara --country us --lang en --url "https://www.zara.com/us/en/your-product-url.html"
Mango (full run)
python control_scraper.py --spider mango
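
The flags used in the commands above (--spider, --country, --lang, --url) could be parsed with argparse roughly as follows. This is an illustrative sketch and not the actual control_scraper.py: the build_parser and parse_cli functions are hypothetical, and forwarding only the flags the user supplied (so that each spider keeps its own defaults, such as Colombia for Zara) is an assumption.

```python
import argparse
from typing import Dict, List, Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Declare the four flags shown in the usage examples."""
    parser = argparse.ArgumentParser(
        prog="control_scraper.py",
        description="Launch a scraping job (illustrative sketch).",
    )
    parser.add_argument("--spider", required=True, help="Spider name, e.g. zara")
    parser.add_argument("--country", help="Country code, e.g. us")
    parser.add_argument("--lang", help="Language code, e.g. en")
    parser.add_argument("--url", help="Single product URL, for testing")
    return parser


def parse_cli(argv: Optional[List[str]] = None) -> Tuple[str, Dict[str, str]]:
    """Return the spider name and only the options that were supplied."""
    args = build_parser().parse_args(argv)
    options = {
        key: value
        for key, value in (
            ("country", args.country),
            ("lang", args.lang),
            ("url", args.url),
        )
        if value is not None
    }
    return args.spider, options


if __name__ == "__main__":
    spider, options = parse_cli()
    print(f"spider={spider} options={options}")
```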

Tips

  • The CLI shows real-time status, job IDs, and detailed logs.
  • Scale the Chrome nodes for more parallelism: docker-compose up --scale chrome=3 -d
  • Run commands inside containers: docker-compose exec api ...

Advanced Technologies

Built on a modern and powerful tech stack to ensure maximum performance and scalability.

🧪

Selenium Grid

Parallelized scraping and testing across multiple machines and browsers.

🕷️

Scrapyd

Service to deploy and run Scrapy spiders, managing scraping processes.

FastAPI

High-performance framework to build the API that controls and monitors scraping jobs.

🐳

Docker

Containerization for easy and consistent deployments across environments.
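
To make the Scrapyd role more concrete, the sketch below prepares a request for Scrapyd's schedule.json endpoint and reads a job's state from a listjobs.json response. The endpoint names and the pending, running, and finished response keys belong to Scrapyd's HTTP API. The helper names build_schedule_request and job_state are hypothetical, and passing country and lang as spider arguments is an assumption about how the spiders are parameterized. The request is only constructed here, never sent.

```python
from typing import Dict, Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_schedule_request(
    scrapyd_url: str,
    project: str,
    spider: str,
    spider_args: Optional[Dict[str, str]] = None,
) -> Request:
    """Prepare (but do not send) a POST to Scrapyd's schedule.json."""
    payload = {"project": project, "spider": spider}
    # Scrapyd forwards any extra form fields to the spider as arguments.
    payload.update(spider_args or {})
    return Request(
        url=f"{scrapyd_url.rstrip('/')}/schedule.json",
        data=urlencode(payload).encode("utf-8"),
        method="POST",
    )


def job_state(listjobs_response: dict, job_id: str) -> Optional[str]:
    """Return 'pending', 'running', or 'finished' for a job, else None."""
    for state in ("pending", "running", "finished"):
        for job in listjobs_response.get(state, []):
            if job.get("id") == job_id:
                return state
    return None
```

Sending the prepared request with urllib.request.urlopen would return a JSON body that includes the job ID, which can then be looked up with job_state.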

PART OF SOMETHING BIGGER

The Stylos Ecosystem

This project is part of the Stylos ecosystem, an AI platform that analyzes fashion trends and produces personalized recommendations.

Data extracted by Stylos Scraper fuels our AI models to identify styles such as Old Money, Formal, Streetwear, and many more.

Ready to contribute or use the project?

The code is fully open. Explore the repository, report issues, or make your first pull request.

Go to Repository: https://github.com/builker-col/stylos-scrapers