Polish Independent Coffee Market – ETL & BI Pipeline

End-to-end data pipeline that extracts coffee market data from the Allegro API, transforms it into a clean analytical model, stores it in MySQL, and visualizes insights in Power BI.

Project Overview

  • Collects data about independent coffee roasters in Poland, excluding mass-market brands.
  • Focuses on product characteristics: brands, origin, roast level, caffeine content, packaging, and more.
  • Creates a normalized relational schema for analytical purposes (snowflake-style).
  • Provides clean datasets for BI and dashboards.

Tech Stack

Python (Poetry, Pydantic, SQLAlchemy), MySQL, DBeaver, Power BI

The pipeline uses Python for ETL: extracting JSON data from the Allegro REST API, validating and transforming it, and loading it into a relational database. Power BI visualizes the data for insights.

Architecture

The project follows an end-to-end ETL workflow:

  1. Extract: Fetch product data from Allegro API and save raw JSON for traceability.
  2. Transform: Normalize products, parameters, and parameter values into relational tables. Clean and validate data using Pydantic models.
  3. Load: Insert processed data into MySQL tables using idempotent Python functions.
  4. Analyze: Create BI dashboards in Power BI using the cleaned relational model. Key metrics include brand distribution, origin, roast levels, caffeine content, and package sizes.
Coffee Market ETL Architecture

Key Insights

  • Top package sizes: 1000g, 250g, 500g, with rare sizes grouped as “Other”.
  • Most products are Arabica, with light and medium roast dominating independent roasters.
  • Brazil, Ethiopia, and Colombia are the most common origins for Polish independent coffees.
  • Decaf products are a small fraction, showing niche market presence.

Screenshots

Power BI Dashboard OverviewPower BI Charts

Code & Documentation

Full code, database schema, and Power BI files are available on GitHub:github.com/yourusername/data-coffee-market

What I Learned

This project demonstrates end-to-end ETL design, data modeling, Python validation with Pydantic, relational schema design, and interactive dashboards in Power BI. I improved skills in API integration, data cleaning, idempotent database operations, and dashboard storytelling.

Software Developer Portfolio