Case Studies — Satyam Sharma | Data Analyst · Remote Contract

Samsung Global Sales Analytics (2021–2024)

MySQL Power BI DAX ETL Power Query

📋 Project Overview

Dataset Size

15,500+ Records | 28 Columns

Coverage

52 Countries | 2021–2024

Tools Used

MySQL, Power BI, DAX, Power Query

Dashboard Pages

3 Pages (Executive, Customer+Product, Return+Performance)

🎯 Problem Statement

Samsung operates across 52 countries with thousands of sales records spanning 4 years. The challenge was to build a unified end-to-end analytics pipeline — from raw CSV data to a 3-page interactive Power BI dashboard — that could surface global sales trends, regional performance, product category insights, return patterns, and 5G adoption across markets.

🔧 My Approach & Methodology

Data Import & ETL

Imported 15,500+ row CSV into MySQL using LOAD DATA INFILE with LINES TERMINATED BY '\r\n' fix for Windows encoding. Cleaned nulls, standardized columns, and created derived fields.

MySQL Analysis (7 Blocks)

Ran full SQL analysis across 7 blocks: category, region, country, channel, returns, customer segments, product performance, 5G trends, and year-over-year comparisons.

Power BI Dashboard (3 Pages)

Built Executive Overview, Customer+Product, and Return+Performance pages with dark theme and Samsung Blue (#1428A0) styling using DAX measures and Power Query transformations.

🗄️ MySQL — Data Import & Cleaning

MySQL - ETL Setup

CREATE DATABASE samsung_sales;
USE samsung_sales;

CREATE TABLE global_sales (
    order_id VARCHAR(50),
    order_date DATE,
    country VARCHAR(100),
    region VARCHAR(50),
    product_category VARCHAR(50),
    product_name VARCHAR(100),
    channel VARCHAR(50),
    units_sold INT,
    unit_price DECIMAL(10,2),
    total_revenue DECIMAL(12,2),
    total_cost DECIMAL(12,2),
    profit DECIMAL(12,2),
    returns INT,
    customer_segment VARCHAR(50),
    is_5g BOOLEAN
);

-- Import fix for Windows line endings
LOAD DATA INFILE 'C:/Users/satya/OneDrive/Desktop/samsung_global_sales_dataset.csv'
INTO TABLE global_sales
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 ROWS;

-- Data Cleaning
UPDATE global_sales SET channel = 'Unknown' WHERE channel IS NULL;
UPDATE global_sales SET returns = 0 WHERE returns IS NULL;

-- Derived columns
ALTER TABLE global_sales ADD COLUMN year_num INT;
ALTER TABLE global_sales ADD COLUMN quarter_num INT;
UPDATE global_sales SET
    year_num = YEAR(order_date),
    quarter_num = QUARTER(order_date);

🔍 MySQL — Key Analysis Queries

MySQL - Sales Analysis

-- 1. Revenue by Region (2021-2024)
SELECT region,
    SUM(total_revenue) AS total_revenue,
    SUM(profit) AS total_profit,
    ROUND(SUM(profit)/SUM(total_revenue)*100, 2) AS profit_margin
FROM global_sales
GROUP BY region
ORDER BY total_revenue DESC;

-- 2. Top 10 Countries by Revenue
SELECT country,
    SUM(total_revenue) AS revenue,
    SUM(units_sold) AS units,
    ROUND(AVG(unit_price), 2) AS avg_price
FROM global_sales
GROUP BY country
ORDER BY revenue DESC
LIMIT 10;

-- 3. Product Category Performance
SELECT product_category,
    SUM(units_sold) AS units_sold,
    SUM(total_revenue) AS revenue,
    SUM(returns) AS total_returns,
    ROUND(SUM(returns)/SUM(units_sold)*100, 2) AS return_rate
FROM global_sales
GROUP BY product_category
ORDER BY revenue DESC;

-- 4. Year-over-Year Growth
SELECT year_num,
    SUM(total_revenue) AS revenue,
    SUM(units_sold) AS units,
    LAG(SUM(total_revenue)) OVER (ORDER BY year_num) AS prev_year,
    ROUND((SUM(total_revenue) - LAG(SUM(total_revenue)) OVER (ORDER BY year_num))
        / LAG(SUM(total_revenue)) OVER (ORDER BY year_num) * 100, 2) AS yoy_growth
FROM global_sales
GROUP BY year_num;

-- 5. 5G vs Non-5G Sales
SELECT is_5g,
    COUNT(*) AS orders,
    SUM(total_revenue) AS revenue,
    ROUND(AVG(unit_price), 2) AS avg_price
FROM global_sales
GROUP BY is_5g;

-- 6. Channel Performance
SELECT channel,
    SUM(total_revenue) AS revenue,
    SUM(returns) AS returns,
    ROUND(SUM(returns)/SUM(units_sold)*100, 2) AS return_rate
FROM global_sales
GROUP BY channel
ORDER BY revenue DESC;

-- 7. Return Analysis by Region
SELECT region, product_category,
    SUM(returns) AS total_returns,
    SUM(units_sold) AS units_sold,
    ROUND(SUM(returns)/SUM(units_sold)*100, 2) AS return_rate
FROM global_sales
GROUP BY region, product_category
ORDER BY return_rate DESC;

📊 Power BI DAX Measures

DAX Formulas

// Total Revenue
Total Revenue = SUM(global_sales[total_revenue])

// Total Profit
Total Profit = SUM(global_sales[profit])

// Profit Margin %
Profit Margin =
DIVIDE([Total Profit], [Total Revenue], 0) * 100

// Total Units Sold
Total Units = SUM(global_sales[units_sold])

// Total Returns
Total Returns = SUM(global_sales[returns])

// Return Rate %
Return Rate =
DIVIDE([Total Returns], [Total Units], 0) * 100

// YoY Revenue Growth
YoY Growth =
VAR CurrentYear = CALCULATE([Total Revenue], YEAR(global_sales[order_date]) = MAX(YEAR(global_sales[order_date])))
VAR PrevYear = CALCULATE([Total Revenue], YEAR(global_sales[order_date]) = MAX(YEAR(global_sales[order_date])) - 1)
RETURN DIVIDE(CurrentYear - PrevYear, PrevYear, 0) * 100

// 5G Revenue Share
5G Revenue Share =
DIVIDE(
    CALCULATE([Total Revenue], global_sales[is_5g] = TRUE),
    [Total Revenue], 0
) * 100

// Top Country Revenue
Top Country =
CALCULATE(
    [Total Revenue],
    TOPN(1, ALL(global_sales[country]), [Total Revenue])
)

💡 Key Insights & Findings

🌍

Regional Dominance

Asia-Pacific leads revenue across all 4 years with consistent YoY growth, driven by high smartphone penetration.

📱

5G Adoption Surge

5G products grew from 18% revenue share in 2021 to 54% in 2024 — fastest growing product category globally.

🔄

Return Patterns

Online channel shows 2.3x higher return rate vs in-store — suggesting product expectation gaps in digital sales.

📈

Revenue Growth

Global revenue grew 31% from 2021 to 2024 with Q4 consistently outperforming all other quarters across regions.

🎯 Results & Impact

15,500+ Records Analyzed

52 Countries Covered

3 Dashboard Pages

View Full Project →

Healthcare Risk Analysis Dashboard

Power BI DAX Power Query Bookmarks Conditional Formatting

📋 Project Overview

Dataset Size

55,500+ Patient Records

Conditions

6 Medical Conditions

Tools Used

Power BI, DAX, Power Query

Special Features

Bookmarks, Human Body Toggle, Dynamic Filters

🎯 Problem Statement

Healthcare providers need quick visibility into patient risk patterns across multiple medical conditions. The goal was to analyze 55,500+ patient records and build an interactive Power BI dashboard that enables clinical and operational teams to identify high-risk patient segments, track test result distributions, and monitor billing patterns — with a unique Human Body Toggle feature for intuitive healthy vs unhealthy patient comparison.

🔧 My Approach & Methodology

Data Loading & Cleaning

Loaded 55,500 row CSV into Power BI via Power Query. Handled nulls, standardized date formats, created age group buckets, and built calculated columns for risk segmentation.

DAX Measures

Built core measures for Total Patients, Avg Billing, Avg Age, Abnormal %, and condition-specific calculations using CALCULATE and DIVIDE functions.

Dashboard & Interactivity

Implemented dynamic filter buttons (Normal/Abnormal/Inconclusive), bookmark-based Human Body Toggle, conditional formatting, and custom slicer styling for clinical usability.

📊 DAX Measures

DAX Formulas

// Total Patients
Total Patients = COUNTROWS('healthcare_dataset')

// Average Billing Amount
Avg Billing = AVERAGE('healthcare_dataset'[Billing Amount])

// Average Age
Avg Age = AVERAGE('healthcare_dataset'[Age])

// Abnormal Test Result %
Abnormal % =
DIVIDE(
    CALCULATE(
        COUNTROWS('healthcare_dataset'),
        'healthcare_dataset'[Test Results] = "Abnormal"
    ),
    COUNTROWS('healthcare_dataset'), 0
) * 100

// Normal Test Result %
Normal % =
DIVIDE(
    CALCULATE(
        COUNTROWS('healthcare_dataset'),
        'healthcare_dataset'[Test Results] = "Normal"
    ),
    COUNTROWS('healthcare_dataset'), 0
) * 100

// Patients by Condition
Patients by Condition =
CALCULATE(
    [Total Patients],
    ALLEXCEPT('healthcare_dataset', 'healthcare_dataset'[Medical Condition])
)

// High Billing Patients (>$25K)
High Billing Count =
CALCULATE(
    COUNTROWS('healthcare_dataset'),
    'healthcare_dataset'[Billing Amount] > 25000
)

// Gender Split
Female Patients =
CALCULATE(
    [Total Patients],
    'healthcare_dataset'[Gender] = "Female"
)

Male Patients =
CALCULATE(
    [Total Patients],
    'healthcare_dataset'[Gender] = "Male"
)

🎨 Dashboard Features

📊

KPI Cards

Total Patients, Avg Age, Avg Billing Amount — with conditional color formatting based on thresholds

🔘

Dynamic Filter Buttons

Normal / Abnormal / Inconclusive — bookmark-driven buttons that filter entire dashboard instantly

🫀

Human Body Toggle

Unique visual feature — toggle between Healthy and Unhealthy body images using Power BI bookmarks

📈

6 Chart Visuals

Donut (Condition Distribution), 100% Stacked Bar (Test Results by Gender), Ribbon Chart (Age Trends), Line Chart (Patient Distribution)

💡 Key Insights & Findings

🏥

Top Conditions

Diabetes and Hypertension account for the largest patient share, with Abnormal test results highest in the 50-70 age group.

💰

Billing Patterns

Cancer patients show 2.4x higher average billing vs other conditions — critical for resource planning and insurance risk modeling.

👥

Gender Distribution

Abnormal test results are slightly higher among male patients (54%) across most medical conditions in the dataset.

📅

Age Group Risk

Patients aged 60+ represent 38% of all Abnormal results despite being only 22% of total patient volume.

🎯 Results & Impact

55,500+ Records Analyzed

6 Medical Conditions

3 Dynamic Filter Modes

View Full Project →

Credit Card Fraud Analysis

MySQL Power BI DAX Data Modeling ETL

📋 Project Overview

Dataset Size

100,000 Transactions

Fraud Detected

$2.5M in Losses

Fraud Rate

1% Overall

Tools Used

MySQL, Power BI, DAX

🎯 Problem Statement

Credit card fraud costs businesses millions annually. The objective was to analyze 100,000 transactions to identify fraud patterns, detect high-risk times and locations, understand fraud methodologies, and build an interactive dashboard with real-time fraud detection capabilities to prevent financial losses.

🗄️ Step 1: Database Schema Design

MySQL - Optimized Schema

CREATE DATABASE fraud_detection;
USE fraud_detection;

CREATE TABLE transactions (
    transaction_id VARCHAR(50) PRIMARY KEY,
    card_number VARCHAR(20),
    transaction_date DATE,
    transaction_time TIME,
    amount DECIMAL(10,2),
    merchant_name VARCHAR(100),
    merchant_category VARCHAR(50),
    location_city VARCHAR(100),
    location_state VARCHAR(50),
    transaction_type VARCHAR(20),
    is_fraud BOOLEAN,
    INDEX idx_date (transaction_date),
    INDEX idx_fraud (is_fraud),
    INDEX idx_location (location_city, location_state),
    INDEX idx_amount (amount)
);

LOAD DATA LOCAL INFILE '/path/raw_transactions.csv'
INTO TABLE transactions
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;

🧹 Step 2: Data Cleaning & ETL

MySQL - Data Cleaning

-- Remove duplicates
DELETE t1 FROM transactions t1
INNER JOIN transactions t2
WHERE t1.transaction_id > t2.transaction_id
AND t1.card_number = t2.card_number
AND t1.transaction_date = t2.transaction_date
AND t1.amount = t2.amount;

-- Standardize transaction types
UPDATE transactions
SET transaction_type =
    CASE
        WHEN LOWER(transaction_type) IN ('purchase', 'buy') THEN 'Purchase'
        WHEN LOWER(transaction_type) IN ('refund', 'return') THEN 'Refund'
        WHEN LOWER(transaction_type) IN ('withdrawal', 'atm') THEN 'Withdrawal'
        ELSE 'Other'
    END;

-- Add derived columns
ALTER TABLE transactions ADD COLUMN hour_of_day INT;
ALTER TABLE transactions ADD COLUMN day_of_week VARCHAR(10);
ALTER TABLE transactions ADD COLUMN fraud_flag VARCHAR(10);

UPDATE transactions SET
    hour_of_day = HOUR(transaction_time),
    day_of_week = DAYNAME(transaction_date),
    fraud_flag = IF(is_fraud = 1, 'Fraud', 'Legitimate');

🔍 Step 3: Fraud Pattern Analysis

MySQL - Fraud Queries

-- 1. Overall Fraud Statistics
SELECT
    COUNT(*) as Total_Transactions,
    SUM(CASE WHEN is_fraud = 1 THEN 1 ELSE 0 END) as Fraud_Count,
    ROUND(AVG(CASE WHEN is_fraud = 1 THEN 1 ELSE 0 END) * 100, 2) as Fraud_Rate,
    SUM(CASE WHEN is_fraud = 1 THEN amount ELSE 0 END) as Total_Fraud_Loss
FROM transactions;

-- 2. Fraud by Hour of Day
SELECT hour_of_day,
    COUNT(*) as Total_Trans,
    SUM(CASE WHEN is_fraud = 1 THEN 1 ELSE 0 END) as Fraud_Count,
    ROUND(AVG(CASE WHEN is_fraud = 1 THEN 1 ELSE 0 END) * 100, 2) as Fraud_Rate
FROM transactions
GROUP BY hour_of_day
ORDER BY Fraud_Rate DESC;

-- 3. High-Risk Locations
SELECT location_city, location_state,
    SUM(CASE WHEN is_fraud = 1 THEN 1 ELSE 0 END) as Fraud_Count,
    ROUND(AVG(CASE WHEN is_fraud = 1 THEN 1 ELSE 0 END) * 100, 2) as Fraud_Rate,
    SUM(CASE WHEN is_fraud = 1 THEN amount ELSE 0 END) as Fraud_Loss
FROM transactions
GROUP BY location_city, location_state
HAVING Fraud_Count > 10
ORDER BY Fraud_Loss DESC
LIMIT 10;

-- 4. Fraud by Transaction Type
SELECT transaction_type,
    COUNT(*) as Total,
    SUM(CASE WHEN is_fraud = 1 THEN 1 ELSE 0 END) as Fraud_Count,
    ROUND(AVG(CASE WHEN is_fraud = 1 THEN 1 ELSE 0 END) * 100, 2) as Fraud_Rate
FROM transactions
GROUP BY transaction_type
ORDER BY Fraud_Rate DESC;

-- 5. Cards with Multiple Fraud Attempts
SELECT card_number,
    COUNT(*) as Fraud_Attempts,
    SUM(amount) as Total_Fraud_Amount
FROM transactions
WHERE is_fraud = 1
GROUP BY card_number
HAVING COUNT(*) >= 3
ORDER BY Total_Fraud_Amount DESC;

📊 Power BI DAX Measures

DAX Formulas

// Total Transactions
Total Transactions = COUNTROWS(Transactions)

// Fraud Count
Fraud Count =
CALCULATE(
    COUNTROWS(Transactions),
    Transactions[is_fraud] = TRUE
)

// Fraud Rate %
Fraud Rate =
DIVIDE([Fraud Count], [Total Transactions], 0) * 100

// Total Fraud Loss
Total Fraud Loss =
CALCULATE(
    SUM(Transactions[amount]),
    Transactions[is_fraud] = TRUE
)

// Average Fraud Amount
Avg Fraud Amount =
CALCULATE(
    AVERAGE(Transactions[amount]),
    Transactions[is_fraud] = TRUE
)

// High Risk Hours (6 PM and 1 AM)
High Risk Trans =
CALCULATE(
    COUNTROWS(Transactions),
    Transactions[hour_of_day] IN {1, 18}
)

// Refund Fraud Rate
Refund Fraud Rate =
CALCULATE(
    [Fraud Rate],
    Transactions[transaction_type] = "Refund"
)

// Unique Cards Affected
Unique Cards Fraud =
CALCULATE(
    DISTINCTCOUNT(Transactions[card_number]),
    Transactions[is_fraud] = TRUE
)

💡 Key Insights & Findings

⚠️

Fraud Rate

1% overall fraud rate — 1,000 fraudulent transactions out of 100,000, resulting in $2.5M potential losses.

⏰

High-Risk Times

Fraud peaks at 6 PM and 1 AM — suggesting automated bot attacks during off-hours with less monitoring.

📍

Geographic Hotspots

NYC and San Diego are top fraud locations — 35% of all fraudulent transactions originate from these cities.

🔄

Modus Operandi

Refunds used more than purchases for fraud (60% vs 40%) — fraudsters exploit refund policies with less scrutiny.

🎯 Results & Impact

$2.5M Fraud Detected

1% Fraud Rate

100K+ Transactions Analyzed

View Full Project →

Project Case Studies

Samsung Global Sales Analytics (2021–2024)

📋 Project Overview

🎯 Problem Statement

🔧 My Approach & Methodology

Data Import & ETL

MySQL Analysis (7 Blocks)

Power BI Dashboard (3 Pages)

🗄️ MySQL — Data Import & Cleaning

🔍 MySQL — Key Analysis Queries

📊 Power BI DAX Measures

💡 Key Insights & Findings

Regional Dominance

5G Adoption Surge

Return Patterns

Revenue Growth

🎯 Results & Impact

Healthcare Risk Analysis Dashboard

📋 Project Overview

🎯 Problem Statement

🔧 My Approach & Methodology

Data Loading & Cleaning

DAX Measures

Dashboard & Interactivity

📊 DAX Measures

🎨 Dashboard Features

KPI Cards

Dynamic Filter Buttons

Human Body Toggle

6 Chart Visuals

💡 Key Insights & Findings

Top Conditions

Billing Patterns

Gender Distribution

Age Group Risk

🎯 Results & Impact

Credit Card Fraud Analysis

📋 Project Overview

🎯 Problem Statement

🗄️ Step 1: Database Schema Design

🧹 Step 2: Data Cleaning & ETL

🔍 Step 3: Fraud Pattern Analysis

📊 Power BI DAX Measures

💡 Key Insights & Findings

Fraud Rate

High-Risk Times

Geographic Hotspots

Modus Operandi

🎯 Results & Impact

🛠️ Technical Skills Demonstrated

Power BI

MySQL

Data Analysis

Visualization

Ready to Work Together?