
VIEW PROJECTS


SQL/MACHINE LEARNING

Paystack Payment Analytics - B2B Fintech Product Analytics

Project Background

Project Overview:

This project analyzes payment processing data modeled after the Nigerian fintech ecosystem (Paystack, Flutterwave) to understand merchant behavior, optimize activation funnels, predict churn, and detect fraud. By combining SQL-based product analytics with machine learning models, I built a comprehensive analytical framework that provides actionable insights for improving platform health and revenue growth.

Business Context: Payment processing platforms like Paystack earn revenue through transaction fees (typically 1.5-2% per successful payment). Their business success depends on three critical factors:

  1. Activation: Getting merchants to complete setup and process their first live transaction

  2. Retention: Keeping merchants actively processing payments month-over-month

  3. Fraud Prevention: Blocking fraudulent transactions that result in chargebacks and revenue loss

The Challenge: With thousands of merchants spanning different business types (SME, Enterprise, Individual), payment methods (Card, Bank Transfer, USSD, Mobile Money), and engagement levels, how do you identify which merchants to prioritize for retention efforts? How do you detect fraud patterns in real-time? How do you optimize the activation funnel?

The Solution: Built an end-to-end analytics pipeline covering 20 SQL queries (activation, engagement, retention, churn, revenue, payment analysis) and 3 machine learning models (churn prediction, fraud detection, merchant segmentation) to transform raw payment data into strategic business recommendations.

Objectives:

Primary Goal

Develop data-driven strategies to improve merchant activation rates, reduce churn, and prevent fraud in a B2B fintech payment platform.

Specific Goals

1. Activation Analysis

  • Calculate merchant activation rate and identify drop-off points in onboarding funnel

  • Determine average time-to-activate and factors that accelerate/slow activation

  • Compare activation performance across merchant segments (SME vs Enterprise)
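
As a minimal sketch of these activation metrics — on a made-up merchant table whose column names (`signup_date`, `first_txn_date`) are illustrative, not the project's actual schema — activation rate and time-to-activate reduce to a few pandas operations:

```python
import pandas as pd

# Hypothetical merchant table (columns are illustrative stand-ins)
merchants = pd.DataFrame({
    "merchant_id": [1, 2, 3, 4],
    "segment": ["SME", "SME", "Enterprise", "Individual"],
    "signup_date": pd.to_datetime(["2024-01-01", "2024-01-05", "2024-01-03", "2024-01-10"]),
    "first_txn_date": pd.to_datetime(["2024-01-04", pd.NaT, "2024-01-08", "2024-01-11"]),
})

# A merchant counts as "activated" once it processes a first live transaction
merchants["activated"] = merchants["first_txn_date"].notna()
activation_rate = merchants["activated"].mean()

# Time-to-activate in days, over activated merchants only
merchants["days_to_activate"] = (
    merchants["first_txn_date"] - merchants["signup_date"]
).dt.days
avg_days = merchants.loc[merchants["activated"], "days_to_activate"].mean()

# Segment comparison (SME vs Enterprise vs Individual)
by_segment = merchants.groupby("segment")["activated"].mean()
print(activation_rate, avg_days)
```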

2. Engagement & Retention Analysis

  • Track monthly active merchants (MAM) and growth trends

  • Segment merchants by engagement levels (Low, Medium, High)

  • Calculate M1, M3 retention rates and build cohort retention tables

  • Identify D7, D14, D30, D60, D90 retention benchmarks
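
A cohort retention table of this kind can be sketched in pandas; the activity log below is a toy stand-in for the project's transactions table, with one row per merchant per active month:

```python
import pandas as pd

# Toy activity log (illustrative, not the real schema)
activity = pd.DataFrame({
    "merchant_id": [1, 1, 1, 2, 2, 3],
    "month": pd.PeriodIndex(
        ["2024-01", "2024-02", "2024-03", "2024-01", "2024-03", "2024-02"], freq="M"
    ),
})

# Cohort = month of first activity; age = months since the cohort month
first = activity.groupby("merchant_id")["month"].min().rename("cohort")
activity = activity.join(first, on="merchant_id")
activity["age"] = (activity["month"] - activity["cohort"]).apply(lambda d: d.n)

# Retention table: share of each cohort still active at age 0, 1, 2, ...
cohort_sizes = activity[activity["age"] == 0].groupby("cohort")["merchant_id"].nunique()
active = activity.groupby(["cohort", "age"])["merchant_id"].nunique()
retention = active.unstack("age").div(cohort_sizes, axis=0)
print(retention)
```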

3. Churn Analysis

  • Calculate monthly churn rate and identify churn drivers

  • Compare logo churn vs revenue churn (merchant count vs revenue impact)

  • Build predictive model to flag high-risk merchants before they churn

4. Revenue Analysis

  • Track Monthly Recurring Revenue (MRR) growth

  • Calculate Net Revenue Retention (NRR) to measure expansion from existing merchants

  • Analyze revenue distribution by merchant segment

  • Calculate Cohort Lifetime Value (LTV)
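
Net Revenue Retention reduces to simple arithmetic: revenue collected this month from merchants who were already paying last month, divided by last month's revenue from that same group. A toy example with made-up figures:

```python
# MRR per merchant; figures are invented for illustration
last_month = {"m1": 1000.0, "m2": 500.0, "m3": 250.0}
this_month = {"m1": 1200.0, "m2": 400.0}   # m3 churned, m1 expanded

base = sum(last_month.values())
retained = sum(this_month.get(m, 0.0) for m in last_month)
nrr = retained / base   # > 1.0 means expansion outweighs churn + contraction
print(f"NRR = {nrr:.1%}")
```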

5. Payment Performance Analysis

  • Compare success rates across payment methods (Card, Bank, USSD, Mobile Money)

  • Identify failure reasons and their business impact

  • Analyze payment method preferences by merchant segment

Tools and Technologies:

Data Generation

Because real Paystack data is confidential, I generated a realistic synthetic dataset using Python:

  • 800 merchants

  • 23,857 transactions

  • Payment methods: Card, Bank Transfer, USSD, Mobile Money

  • Time-based activity patterns

  • Fraud-like anomalies embedded intentionally

Machine Learning

  • scikit-learn: Random Forest (churn), Isolation Forest (fraud), K-Means (segmentation)

  • pandas, numpy: Data manipulation and feature engineering

  • matplotlib, seaborn: Model performance visualization
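
As an illustration of the fraud-detection model, here is a minimal Isolation Forest sketch on synthetic transactions with a few injected anomalies; the features and the `contamination` value are assumptions for the sketch, not the project's tuned settings:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic features: transaction amount and hour of day,
# plus a few injected anomalies (huge amounts at odd hours)
normal = np.column_stack([
    rng.normal(50, 15, 500),   # typical transaction amounts
    rng.normal(14, 3, 500),    # daytime activity
])
anomalies = np.array([[900.0, 3.0], [1200.0, 2.0], [800.0, 4.0]])
X = np.vstack([normal, anomalies])

# contamination is a prior guess at the fraud share — a tunable assumption
model = IsolationForest(contamination=0.01, random_state=42).fit(X)
labels = model.predict(X)            # -1 = flagged as anomalous
flagged = np.where(labels == -1)[0]
print(len(flagged))
```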

Key Insights:

  • Churn is driven by inactivity, not merchant size

  • Revenue is highly concentrated among power merchants

  • High logo churn does not necessarily imply business failure

  • Fraud follows predictable behavioral patterns

  • Retention strategies must differ by merchant segment


SQL/MACHINE LEARNING/POWER BI

Open Food Facts Sales Analytics

Project Background

Project Overview:

This project demonstrates a complete data science workflow for an e-commerce business, tackling real-world challenges in customer retention, revenue optimization, and personalized marketing. Using a dataset of 3,900 customers, 10,002 products, and 855 transactions spanning one year, I built an end-to-end analytics solution featuring SQL database queries, three machine learning models, and interactive Power BI dashboards.

The Business Problem

E-commerce companies face three critical challenges:

  1. Customer Churn - 27% of customers are at risk of leaving, resulting in significant revenue loss

  2. Ineffective Marketing - Generic campaigns fail to engage different customer segments

  3. Missed Cross-Selling Opportunities - Products are recommended without understanding customer preferences

My Solution

I created a comprehensive analytics system that:

  •  Predicts customer churn with 83% accuracy, enabling proactive retention campaigns

  •  Segments customers into 5 distinct groups for targeted marketing

  •  Recommends products based on similarity algorithms to increase cross-selling

  •  Analyzes 20+ business metrics through complex SQL queries across multiple tables

  •  Delivers insights through interactive dashboards for stakeholder decision-making
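
The content-based recommendation step can be sketched with scikit-learn's cosine similarity on a toy product-feature matrix; the products and features below are invented for illustration (in the project they would come from encoded product attributes):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy feature matrix (rows = products, cols = encoded attributes)
products = ["granola", "muesli", "cola", "oat bar"]
features = np.array([
    [1, 1, 0, 0],   # granola: cereal, breakfast
    [1, 1, 0, 0],   # muesli:  cereal, breakfast
    [0, 0, 1, 1],   # cola:    drink, sugary
    [1, 0, 0, 1],   # oat bar: cereal, sugary
], dtype=float)

sim = cosine_similarity(features)      # pairwise similarity matrix

def recommend(idx: int, k: int = 2) -> list[str]:
    """Return the k most similar products, excluding the product itself."""
    order = np.argsort(sim[idx])[::-1]  # most similar first
    return [products[j] for j in order if j != idx][:k]

print(recommend(0))
```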

Objectives:

Primary Goals

1. Build Production-Ready SQL Database

  • Design relational database structure with multiple tables

  • Write complex queries using JOINs, CTEs, and window functions

  • Demonstrate real-world database querying skills beyond CSV analysis

2. Develop Predictive Machine Learning Models

  • Customer Segmentation - Identify distinct customer groups using unsupervised learning

  • Churn Prediction - Forecast which customers will leave using classification

  • Product Recommendation - Suggest relevant products using content-based filtering

3. Create Actionable Business Insights

  • Translate model outputs into clear business recommendations

  • Calculate ROI for retention campaigns

  • Prioritize high-value customers for marketing spend

4. Visualize Complex Data Effectively

  • Build interactive Power BI dashboards with drill-down capabilities

  • Present model performance metrics (confusion matrix, feature importance)

  • Enable stakeholders to make data-driven decisions

Technical Learning Objectives:

  • Master SQL for multi-table analysis (not just single CSV files)

  • Implement supervised learning (classification) and unsupervised learning (clustering)

  • Evaluate models properly (train/test split, ROC-AUC, confusion matrix)

  • Build end-to-end pipeline: Database → Analysis → Modeling → Visualization
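
The evaluation workflow named above (train/test split, ROC-AUC, confusion matrix) looks roughly like this; the data is a synthetic stand-in from `make_classification`, not the project's engineered customer features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in data for the sketch
X, y = make_classification(n_samples=1000, n_features=8, random_state=42)

# Hold out 30% for unbiased evaluation (the split used in the project)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
cm = confusion_matrix(y_test, clf.predict(X_test))
print(f"ROC-AUC: {auc:.3f}")
print(cm)
```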

Tools and Technologies:

  • Programming & Data Analysis

  • Python 3.8: Data manipulation, ML modeling

  • Pandas: Cleaning, transformation, feature engineering

  • NumPy: Array operations, mathematical functions

  • Jupyter Notebook: Exploratory analysis, model experimentation

  • Machine Learning & Statistics

  • Scikit-learn: Random Forest, K-Means, preprocessing

  • OneHotEncoder: Categorical variable transformation

  • Train-Test Split: Model validation (70-30 split for unbiased evaluation)

  • Cosine Similarity: Product similarity calculations for content-based recommendations

  • Specific Algorithms:

  • K-Means Clustering - Unsupervised segmentation (5 customer groups)

  • Random Forest Classifier - Churn prediction (100 trees, max_depth=10)

  • Content-Based Filtering - Product recommendations (similarity matrix)

  • SQL Techniques Applied:

  • INNER/LEFT/RIGHT JOINs - Multi-table relationships

  •  Window Functions - RANK(), ROW_NUMBER(), cumulative calculations

  •  Subqueries - Nested SELECT statements

  •  Date/Time Functions - YEAR(), MONTH(), FORMAT()

  •  Aggregate Functions - SUM(), AVG(), COUNT(), GROUP BY, HAVING

  • Data Visualization

  • Power BI Desktop Interactive Dashboard

  • Matplotlib

  • Seaborn

  • Power BI Features:

  • DAX measures for calculated metrics

  • Drill-through functionality

  • Conditional formatting

  • Custom tooltips

  • Matrix visuals with color scales


SQL/MACHINE LEARNING/VISUALIZATION

League of Legends Player Retention Analytics

Project Background

Project Overview:

This project analyzes player engagement and retention patterns in League of Legends using real-time data from the Riot Games API. By combining SQL analytics, machine learning (K-Means clustering and Logistic Regression), and interactive Power BI dashboards, I identified the key drivers of player churn and built predictive models to flag at-risk players before they leave.

The Core Problem: Out of 12,760 new players, only 378 (2.96%) returned for a second match—a staggering 97% early drop-off rate that represents massive lost revenue and community growth potential.

The Solution: A comprehensive analytics pipeline that segments players by behavior, predicts 14-day retention with 84% accuracy, and provides actionable recommendations projected to reduce churn by 30-40%.

Project Duration: 4 weeks
Data Volume: 100,000+ raw match records collected, 12,760+ players, 1,944 matches analyzed
Key Deliverables: SQL database, 2 ML models, interactive Power BI dashboard, business recommendations

Objectives:

Primary Objective

Identify why players churn early and develop data-driven strategies to improve retention rates.

Specific Goals

  1. Understand Player Behavior: Analyze engagement patterns, match frequency, and performance metrics

  2. Segment Players: Use machine learning to identify distinct player types with different retention profiles

  3. Predict Churn: Build a model to identify at-risk players before they quit (D14 retention prediction)

  4. Quantify Impact: Calculate the business value of retention improvements

  5. Provide Recommendations: Deliver specific, actionable product changes based on data insights


Tools and Technologies Used:

Data Collection

  • Riot Games API: Real-time match data, player profiles, in-game events

  • Python (Requests library): API integration with rate limiting and error handling

Data Storage & Processing

  • SQL Server: Relational database design for players, matches, events

  • SQL: Complex queries for cohort analysis, retention calculations, funnel metrics

  • Python (pandas, numpy): Data cleaning, transformation, feature engineering

Machine Learning

  • scikit-learn: K-Means clustering (player segmentation), Logistic Regression (churn prediction)

  • Python: Model training, evaluation, hyperparameter tuning

  • Jupyter Notebooks: Exploratory data analysis and model development
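
The K-Means segmentation step can be sketched as follows — on synthetic player features, not Riot data. Scaling first matters because match counts and win rates live on very different numeric scales:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic per-player features: matches played and win rate,
# with two deliberately separated behavior groups as stand-ins
casual = np.column_stack([rng.normal(3, 1, 200), rng.normal(0.45, 0.05, 200)])
grinders = np.column_stack([rng.normal(40, 8, 200), rng.normal(0.52, 0.05, 200)])
X = np.vstack([casual, grinders])

# Standardize so neither feature dominates the distance metric
X_scaled = StandardScaler().fit_transform(X)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)

labels = km.labels_
# Each synthetic group should land almost entirely in one cluster
print(np.bincount(labels))
```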

Visualization & Reporting

  • Power BI: Interactive dashboards with DAX measures and Power Query transformations

  • DAX: Custom retention metrics, time intelligence calculations

  • Power Query (M language): Advanced data transformations for player-level aggregations

Methodology

Phase 1: Data Collection & Database Design

1. Registered for Riot Games API access and obtained development key

2. Built Python scripts to extract match history, player profiles, and in-game events

3. Designed normalized SQL database schema with relationships:

    • players table (demographics, join dates)

    • matches table (match metadata, duration, game mode)

    • match_participants table (player performance per match)

    • events table (in-game actions: kills, deaths, objectives)

4. Implemented automated daily data refresh pipeline

Phase 2: Exploratory Analysis

  1. Cohort Retention Analysis: Calculated D1, D7, D14, D30 retention rates by signup week

  2. Engagement Funnel: Mapped player progression from first match → second match → activation (3+ matches)

  3. Performance Impact: Analyzed correlation between kills, win rate, match duration, and retention

  4. Segment Hypothesis: Identified patterns suggesting different player types exist
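
The engagement funnel in step 2 can be sketched in pandas. The match log below is illustrative; in the project, the per-player match sequence number would come from ranking match timestamps in SQL:

```python
import pandas as pd

# Toy match log: one row per (player, match)
matches = pd.DataFrame({
    "player_id": [1, 1, 1, 2, 3, 3, 4],
    "match_no":  [1, 2, 3, 1, 1, 2, 1],   # per-player sequence number
})

counts = matches.groupby("player_id")["match_no"].max()
funnel = {
    "first_match": int((counts >= 1).sum()),
    "second_match": int((counts >= 2).sum()),
    "activated_3plus": int((counts >= 3).sum()),
}
# Conversion between consecutive funnel stages
second_rate = funnel["second_match"] / funnel["first_match"]
print(funnel, f"{second_rate:.0%}")
```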


VISUALIZATION WITH POWER BI

Fintech Project Management Analysis

Project Background

Project Overview:

This project was built for FP20 Analytics Live Challenge 32. It analyzes real-world fintech project management operations using a multi-table dataset covering projects, tasks, employees, milestones, and budgets. The aim was to understand how digital payment solutions are developed across teams and to uncover patterns in delays, resource use, and cost efficiency. The final Power BI report highlights key delivery metrics, performance gaps, and financial insights needed to improve on-time delivery and optimize workforce utilization.

Objectives:

  • Evaluate project performance: timeline accuracy, delivery progress, and completion rates.

  • Measure cost performance against budget and highlight overspending or underspending risks.

  • Track task-level efficiency to identify blockers such as long review cycles or tasks on hold.

  • Compare departments, teams, and cities based on consistency and delivery reliability.

  • Analyze workforce impact using experience level, hourly rates, and actual hours logged.

  • Support decision-making through interactive drill-through analysis and department-level insights.

Key Insights:

  • 48% of budget allocated remained unused, signaling potential planning inefficiencies.

  • Projects with low completion rates shared common blockers like “On Hold” or “Review Required.”

  • High-experience employees consistently delivered faster with fewer hours variance.

  • Certain departments completed tasks with better cost efficiency, indicating better workload management.

  • Projects with early milestone delays had a much higher likelihood of overspending later.

  • Cities with stable workforce distribution demonstrated more predictable delivery patterns.

Tools and Techniques:

  • Power BI Desktop: Data modeling, DAX measures, interactive dashboard creation

  • Power Query: Type corrections, locale date fixes, merging & cleaning

  • DAX: KPIs, variance metrics, resource cost calculations, experience-efficiency modeling

  • UI/UX principles: sidebar navigation, section layout, KPI placement, consistent typography

  • Drill-through analytics: Employee-level and project-level investigation

Deliverables:

  • 3-page interactive Power BI dashboard

  • Clean PDF export for portfolio

  • Insight summary for stakeholders


MACHINE LEARNING/POWER BI

Marketing Campaign Success Predictor

Project Background

Project Overview:

Built an end-to-end machine learning system that predicts whether digital marketing campaigns will achieve high ROI (>1000%) before launch, enabling data-driven budget allocation decisions and preventing wasted ad spend.

 

The Challenge

Marketing teams struggle to predict campaign success before investing significant budgets. Most decisions are based on intuition rather than data, resulting in:
- £90,000+ annual waste on underperforming campaigns
- Missed opportunities on high-potential campaigns
- Lack of actionable insights on what drives success

 

The Solution

Developed a machine learning classification system that:
- Analyzes 9,900 historical campaigns across Facebook, Instagram, and Pinterest
- Predicts campaign success probability with 79% accuracy on unseen data
- Identifies the key factors driving campaign performance
- Provides interactive dashboards for decision-making
- Offers a "What-If" simulator for testing new campaign ideas

Objectives:

Primary Objectives

1. Build a Predictive Model
   - Develop ML classifier to predict campaign success (High ROI vs Low ROI)
   - Achieve minimum 75% accuracy on unseen test data
   - Ensure model is interpretable for business stakeholders

2. Identify Success Drivers
   - Determine which factors matter most for campaign performance
   - Quantify the importance of budget, channel, timing, and engagement
   - Provide actionable insights for marketing strategy

3. Create Decision-Support Tools
   - Build interactive dashboards for campaign analysis
   - Develop "What-If" simulator for testing new campaigns
   - Enable non-technical users to leverage ML predictions

4. Demonstrate Business Value
   - Calculate ROI and cost savings from model implementation
   - Show clear recommendations for budget allocation
   - Prove the model's real-world applicability

Tools and Technologies:

Programming & Machine Learning: Python, RandomForest, pandas, numpy
Data Processing & Feature Engineering: OneHotEncoder, ColumnTransformer, MinMaxScaler
Visualization & Reporting: Power BI, matplotlib, seaborn
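
A minimal sketch of how these pieces compose — OneHotEncoder and MinMaxScaler inside a ColumnTransformer feeding a RandomForest — on a tiny invented campaign table (the column names and values are hypothetical, not the project's dataset):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Tiny illustrative campaign table
df = pd.DataFrame({
    "channel": ["Facebook", "Instagram", "Pinterest", "Facebook"] * 25,
    "budget": [500, 1500, 800, 2500] * 25,
    "duration_days": [7, 14, 10, 30] * 25,
    "high_roi": [0, 1, 0, 1] * 25,   # target: ROI above the chosen threshold
})

pre = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["channel"]),
    ("num", MinMaxScaler(), ["budget", "duration_days"]),
])
model = Pipeline([("pre", pre), ("rf", RandomForestClassifier(random_state=0))])
model.fit(df.drop(columns="high_roi"), df["high_roi"])

# Score a prospective campaign before launch ("What-If" style)
new = pd.DataFrame({"channel": ["Instagram"], "budget": [1500], "duration_days": [14]})
print(model.predict_proba(new)[0, 1])
```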


BIG QUERY/SQL SERVER/POWER BI

Google Merchandise Store: Full Stack Product Analytics

Project Background

Project Overview:

A comprehensive e-commerce analytics project analyzing 900,000+ customer events from Google's official Merchandise Store. Using the full product analytics stack—from BigQuery data extraction to Power BI dashboards—I identified critical conversion bottlenecks and developed data-driven recommendations projected to increase revenue by 15% ($58K annually).

This project demonstrates end-to-end product analytics capabilities: data engineering, advanced SQL analysis, statistical reasoning, visualization design, and business strategy development.

Objectives:

Primary Goal

  • Conduct a complete conversion funnel analysis to identify where Google Merchandise Store loses customers and provide actionable recommendations to increase revenue

Specific Objectives

  • Map the customer journey - Build product-level funnel from view to purchase

  • Quantify drop-off points - Calculate conversion rates between each funnel stage

  • Segment user behavior - Analyze by traffic source, device, and temporal patterns

  • Identify product opportunities - Determine best and worst-performing items

  • Calculate business impact - Estimate revenue potential of optimization efforts
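
Quantifying drop-off between funnel stages reduces to counting distinct users per stage. A toy sketch on a simplified GA4-style event log — the event names follow the GA4 schema, but the data is invented:

```python
import pandas as pd

# Simplified GA4-style event log (illustrative)
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 4],
    "event":   ["view_item", "add_to_cart", "purchase",
                "view_item", "add_to_cart",
                "view_item", "view_item"],
})

stages = ["view_item", "add_to_cart", "purchase"]
users_per_stage = {s: events.loc[events["event"] == s, "user_id"].nunique()
                   for s in stages}

# Stage-to-stage conversion and cart abandonment
cart_rate = users_per_stage["add_to_cart"] / users_per_stage["view_item"]
purchase_rate = users_per_stage["purchase"] / users_per_stage["view_item"]
abandonment = 1 - users_per_stage["purchase"] / users_per_stage["add_to_cart"]
print(users_per_stage, f"cart: {cart_rate:.0%}")
```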

Success Metrics

  • Overall conversion rate analysis

  • Cart abandonment diagnosis

  • Traffic source ROI evaluation

  • Product-level conversion insights

  • Revenue opportunity quantification


Tools and Technologies Used:

Data Infrastructure

  • Google BigQuery - Extracted 900K events from GA4 public dataset

  • SQL Server - Data warehouse for transformation and analysis

  • T-SQL - 20+ complex analytical queries

Analysis & Visualization

  • Power BI Desktop - Interactive 2-page dashboard

  • DAX - Custom measures for conversion and abandonment metrics

Technical Skills Applied

  • SQL: CTEs, window functions, self-joins, date manipulation, aggregations

  • Data Modeling: Dimensional modeling with fact and dimension tables

  • Statistics: Conversion rate analysis, cohort retention, A/B test framework

  • Dashboard Design: User experience principles, visual hierarchy, storytelling

  • Business Analysis: Revenue impact modeling, prioritization frameworks

Skills Demonstrated:

Technical Proficiency

  • Advanced SQL: Complex CTEs, window functions, self-joins for 900K+ row datasets

  • Data Engineering: End-to-end pipeline from BigQuery to SQL Server to Power BI

  • Statistical Analysis: Conversion rates, cohort analysis, A/B test design

  • Data Visualization: Dashboard UX design following best practices

Business Acumen

  • Product Thinking: Translated metrics into user behavior insights

  • Impact Quantification: Tied every finding to revenue opportunity

  • Prioritization: Ranked recommendations by effort vs. impact

  • Stakeholder Communication: Crafted narrative for technical and non-technical audiences

Product Analytics Mindset

  • Asked "why" behind every metric (not just "what")

  • Segmented data multiple ways to find hidden patterns

  • Connected user behavior to business outcomes

  • Provided actionable next steps, not just observations


PYTHON/MACHINE LEARNING

Hull Tactical Analysis

Project Background

Project Overview:

This project analyzes the Hull Tactical Fund using a dataset obtained from kaggle.com, applying Python to explore market data, identify patterns, and evaluate risk and performance metrics. It covers data preprocessing, visualization, and predictive modeling techniques to draw insights into market behavior and investment trends.

Objectives:

  • Perform exploratory data analysis (EDA) on the provided datasets (train.csv, test.csv).

  • Identify correlations and trends in historical financial data.

  • Build and evaluate predictive models to forecast fund performance.

  • Visualize insights using Python libraries for better interpretability.


Tools and Technologies:

  • Python

  • Jupyter Notebook

  • pandas, numpy, matplotlib, seaborn, scikit-learn, Linear Regression
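
A minimal sketch of the modeling step: fit a Linear Regression and score it on held-out data. Synthetic stand-in data is used here, since the actual Kaggle columns are not reproduced in this summary:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic stand-in for train.csv: a few market features and a noisy
# linear target (the real columns differ)
X = rng.normal(size=(400, 3))
y = 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.1, size=400)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
model = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))
print(f"R^2 on held-out data: {r2:.2f}")
```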


VISUALIZATION WITH POWER BI

West Africa Development Dashboard

Project Background

Project Overview:

This Power BI project explores economic, infrastructure, and environmental development trends across West African countries using World Bank (WDI) data.
It provides a multi-dimensional view of growth—connecting GDP, fiscal health, labor participation, poverty reduction, and sustainability performance—to reveal how the region balances progress with environmental responsibility.

Objectives:

  • Evaluate economic performance through GDP per capita, PPP, and government debt (% of GDP).

  • Examine welfare trends—poverty ratio and labor force participation.

  • Assess infrastructure and environmental balance via electricity access, CO₂ emissions, forest coverage, and land area.

  • Identify top-performing countries and opportunities for sustainable development.


Tools and Concepts Used:

  • Power BI Desktop: Data modeling & interactive visualization

  • Data Source: World Bank DataBank (WDI) – Africa Development Indicators 2020–2024

  • DAX Measures: Dynamic averages, ratios (e.g., Sustainability Index, PPP/GDP comparisons)

  • Data Cleaning: Power Query (removed nulls, filtered years, normalized country names)

  • Design: Clean theme with KPIs, conditional formatting, and iconography for readability

Analytic Takeaways:

  • Steady economic progress coexists with slow poverty reduction → policy focus should target inclusive growth.

  • Energy access expansion drives development but raises CO₂ emissions → green energy transition is key.

  • Countries like Ghana and Cabo Verde show that high purchasing power can coexist with moderate emissions—models for regional sustainability.

  • Integrating economic and environmental metrics offers a richer picture of Africa’s development story beyond GDP alone.


SQL/POWER BI/MACHINE LEARNING

Telco Customer Churn Analysis

Project Background

Project Overview:

This project analyzes customer churn patterns for a telecommunications company using the Telco Customer Churn dataset from Kaggle.
It demonstrates how SQL, Power BI, and Python can be combined to uncover key drivers of churn and build a predictive model that helps businesses retain customers.

The project integrates:

  • Data extraction and cleaning in MySQL

  • Visual analytics and KPIs in Power BI

  • Predictive modeling in Python using Logistic Regression

Objectives:

  • Identify factors contributing most to customer churn.

  • Quantify churn rates across service types, contracts, and demographics.

  • Develop a machine-learning model to predict customers likely to leave.

  • Present insights visually for clear business communication.

Tools and Concepts Used:

  • Database: MySQL

  • Visualization: Power BI

  • Machine Learning: Python, Pandas, NumPy, Scikit-Learn

  • Preprocessing: OneHotEncoder, MinMaxScaler, ColumnTransformer

  • Evaluation Metrics: Accuracy, AUC, ROC Curve
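
These pieces compose naturally into a single scikit-learn Pipeline. The miniature table below mirrors a few of the Kaggle dataset's column names, but the rows are invented for the sketch:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Miniature stand-in for the Telco dataset
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Two year", "Month-to-month", "One year"] * 30,
    "tenure": [2, 60, 5, 24] * 30,
    "MonthlyCharges": [85.0, 20.0, 90.0, 55.0] * 30,
    "Churn": [1, 0, 1, 0] * 30,
})

pre = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Contract"]),
    ("num", MinMaxScaler(), ["tenure", "MonthlyCharges"]),
])
clf = Pipeline([("pre", pre), ("lr", LogisticRegression(max_iter=1000))])

X, y = df.drop(columns="Churn"), df["Churn"]
clf.fit(X, y)
auc = roc_auc_score(y, clf.predict_proba(X)[:, 1])
print(f"Training AUC: {auc:.2f}")
```

In practice the AUC would be measured on a held-out split rather than the training rows, as noted in the evaluation metrics above.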

Analytic Takeaways:

  • Customers on month-to-month contracts and fiber optic plans have the highest churn rates.

  • Longer tenure significantly reduces churn probability.

  • High monthly charges correlate strongly with churn risk.

  • Payment method and internet service type are major churn drivers.

Business Impact:

This analysis provides a data-driven foundation for:

  • Targeted customer retention campaigns.

  • Pricing or contract adjustments to minimize churn.

  • Predictive alerts to flag high-risk customers early.


Let's create data-driven success together
