Polish Independent Coffee Market – ETL & BI Pipeline
End-to-end data pipeline that extracts coffee market data from the Allegro API, transforms it into a clean analytical model, stores it in MySQL, and visualizes insights in Power BI.
Project Overview
- Collects data about independent coffee roasters in Poland, excluding mass-market brands.
- Focuses on product characteristics: brands, origin, roast level, caffeine content, packaging, and more.
- Creates a normalized relational schema for analytical purposes (snowflake-style).
- Provides clean datasets for BI and dashboards.
Tech Stack
Python (Poetry, Pydantic, SQLAlchemy), MySQL, DBeaver, Power BI
The pipeline uses Python for ETL: extracting JSON data from the Allegro REST API, validating and transforming it, and loading it into a relational database. Power BI visualizes the data for insights.
Architecture
The project follows an end-to-end ETL workflow:
- Extract: Fetch product data from Allegro API and save raw JSON for traceability.
- Transform: Normalize products, parameters, and parameter values into relational tables. Clean and validate data using Pydantic models.
- Load: Insert processed data into MySQL tables using idempotent Python functions.
- Analyze: Create BI dashboards in Power BI using the cleaned relational model. Key metrics include brand distribution, origin, roast levels, caffeine content, and package sizes.

Key Insights
- Top package sizes: 1000g, 250g, 500g, with rare sizes grouped as “Other”.
- Most products are Arabica, with light and medium roast dominating independent roasters.
- Brazil, Ethiopia, and Colombia are the most common origins for Polish independent coffees.
- Decaf products are a small fraction, showing niche market presence.
Screenshots


Code & Documentation
Full code, database schema, and Power BI files are available on GitHub:github.com/yourusername/data-coffee-market
What I Learned
This project demonstrates end-to-end ETL design, data modeling, Python validation with Pydantic, relational schema design, and interactive dashboards in Power BI. I improved skills in API integration, data cleaning, idempotent database operations, and dashboard storytelling.