π§Ή Global Job Layoffs β SQL Data Cleaning & Analysis
π§ Project Overview
This project focuses on cleaning and analyzing a dataset containing records of global job layoffs across multiple industries and countries. The dataset includes company details, locations, industries, number of layoffs, company stages, and total funds raised (in millions).
Using MySQL, the project demonstrates how to clean raw, inconsistent data, standardize formats, fill missing values, and perform exploratory data analysis to uncover key insights about the global workforce trends.
π― Objectives
1 Remove duplicate and inconsistent records from the dataset. 2 Standardize and transform data types for accuracy and usability. 3 Handle missing and null values appropriately. 4 Derive insights on layoffs by company, industry, country, and time. 5 Identify funding and layoff trends across industries.
π§° Tools & Technologies
SQL (MySQL Workbench) RDBMS: MySQL 8+ Functions Used:
ROW_NUMBER(),TRIM(),STR_TO_DATE(),GROUP BY,JOIN,UPDATE,DELETE,ALTER,SUM(), etc.
π§Ή Data Cleaning Process
The data cleaning process followed a systematic approach:
-
Duplicate Removal
. Used
ROW_NUMBER()window function to identify and delete duplicate rows. . Created temporary tables (layoffs1,layoffs2) to ensure safe data cleaning. -
Trimming & Standardization
. Removed extra spaces using
TRIM()on text columns. . Corrected inconsistent entries (e.g., industry names like"crypto currency"β"crypto"). . Removed trailing characters from country names. -
Data Type Corrections
. Converted the
datecolumn fromTEXTtoDATEusingSTR_TO_DATE(). . Standardized numeric columns (total_laid_off,funds_raised_millions) for accurate aggregation. -
Handling Null or Blank Values
. Set empty strings to
NULL. . Filled missingindustry,total_laid_off, andpercentage_laid_offwhere logical matches existed usingJOINstatements. . Replaced remaining nulls in numeric fields with0where appropriate. -
Data Validation
. Checked for incomplete or irrelevant rows and deleted unusable entries. . Verified corrections using multiple
SELECTvalidation queries.
π Exploratory Analysis & Insights
After cleaning, SQL queries were used to extract valuable insights, including:
| Analysis Goal | SQL Summary Query | Example Output / Finding |
|---|---|---|
| Total layoffs globally | SELECT SUM(total_laid_off) |
Total layoffs: X employees |
| Funds raised by industry | GROUP BY industry ORDER BY SUM(funds_raised_millions) |
Tech sector raised the most funding |
| Top company by funds raised | LIMIT 1 query |
Company X raised the highest funds |
| Industries most affected by layoffs | GROUP BY industry |
Consumer & Tech sectors had the most layoffs |
| Preferred company location | GROUP BY location ORDER BY COUNT(location) |
San Francisco had the highest company count |
| Layoffs by year | WHERE date LIKE '2022%' |
Major layoffs occurred in 2022 |
| Cross-industry presence | WHERE industry LIKE 'health%' OR industry LIKE 'crypto%' |
Healthcare and crypto industries span multiple countries |
π Key Insights
β’ Total layoffs and funds raised across the dataset β’ Industries most affected by layoffs β’ Companies with highest layoffs and fundraising β’ Geographic trends in layoffs and funding β’ Yearly patterns and 2022-specific layoffs
π Repository Structure
π SQL-DC-IPROJECT/
βvisuals/
β βββ Overview.png
| Insights.png
| FinalTableSample.png
|README.md
|
|data/
β βββ layoffs.csv
β
|scripts/
β βββ layoffs.sql
π§ Skills Demonstrated
. SQL-based data cleaning and transformation . Exploratory analysis and business insight extraction . Query optimization and stepwise validation . Attention to data consistency and quality assurance
π¬ Contact