diff --git a/Hotel Guests Analysis/README.md b/Hotel Guests Analysis/README.md new file mode 100644 index 0000000..225f77a --- /dev/null +++ b/Hotel Guests Analysis/README.md @@ -0,0 +1,114 @@ +# Hotel Guests Data Analysis 🏨 + +This directory contains a comprehensive analysis of hotel guest data, including visualizations and insights that can help improve hotel operations and optimize revenue. + +## 📊 Analysis Overview + +The analysis examines various aspects of hotel guest behavior and patterns: + +### 1. **Room Type Analysis** +- Distribution of room types (Basic, Deluxe, Suite) +- Average room rates by type +- Total revenue contribution by room type +- Room rate distribution patterns + +### 2. **Rewards Program Impact** +- Participation rates in rewards programs +- Spending differences between rewards and non-rewards guests +- Room type preferences by rewards status +- Amenities fee patterns + +### 3. **Seasonal Booking Patterns** +- Monthly booking distribution throughout the year +- Seasonal trends (Spring, Summer, Fall, Winter) +- Average room rates by season +- Length of stay variations by season + +### 4. **Geographic Distribution** +- Top states by number of guests +- Average spending by geographic location +- Regional booking patterns + +### 5. 
**Amenities and Stay Patterns** +- Distribution of amenities fees +- Correlation between room rates and amenities fees +- Length of stay analysis +- Cost patterns by stay duration + +## 🚀 Getting Started + +### Prerequisites +Install the required Python packages: +```bash +pip install -r requirements.txt +``` + +### Running the Analysis +Execute the main analysis script from this directory (it reads `guests.csv` by relative path): +```bash +python hotel_guests_analysis.py +``` + +## 📈 Generated Outputs + +The analysis generates the following files: + +### Visualizations +- `room_type_analysis.png` - Room type distribution and revenue analysis +- `rewards_program_analysis.png` - Rewards program impact visualization +- `seasonal_patterns.png` - Seasonal booking trends +- `geographic_distribution.png` - Geographic guest distribution +- `amenities_stay_patterns.png` - Amenities and stay duration analysis + +### Reports +- `analysis_report.txt` - Comprehensive summary report with key insights and business recommendations + +## 🔍 Key Insights + +The analysis reveals several important patterns: + +1. **Revenue Optimization**: Different room types show varying profitability patterns +2. **Customer Loyalty**: Rewards program members exhibit different spending behaviors +3. **Seasonal Trends**: Clear seasonal patterns in booking volume and pricing +4. **Geographic Patterns**: Certain states contribute more to revenue than others +5. 
**Stay Duration**: Length of stay correlates with total spending patterns + +## 💡 Business Applications + +This analysis can help with: +- **Pricing Strategy**: Optimize room rates based on demand patterns +- **Inventory Management**: Allocate room types based on popularity +- **Marketing Campaigns**: Target specific customer segments +- **Seasonal Planning**: Prepare for peak and off-peak periods +- **Rewards Program**: Enhance customer loyalty initiatives + +## 📋 Data Schema + +The analysis works with the following data fields: +- `guest_email`: Guest email address +- `hotel_id`: Hotel identifier +- `has_rewards`: Rewards program participation (True/False) +- `room_type`: Type of room (BASIC, DELUXE, SUITE) +- `amenities_fee`: Additional amenities charges (may be missing) +- `checkin_date`: Check-in date (`DD Mon YYYY`, e.g. `27 Dec 2020`) +- `checkout_date`: Check-out date (same format; may be missing) +- `room_rate`: Base room rate +- `billing_address`: Guest billing address +- `credit_card_number`: Payment information + +## 🛠️ Technical Details + +- **Language**: Python 3.8+ +- **Key Libraries**: pandas, matplotlib, seaborn, numpy +- **Output Format**: PNG images (300 DPI) and text reports +- **Data Processing**: Handles missing values and date parsing +- **Visualization Style**: Professional seaborn styling with custom color palettes + +## 📞 Support + +For questions or suggestions about this analysis, please refer to the generated reports or examine the detailed code comments in `hotel_guests_analysis.py`. 
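The date parsing and cost fields listed in the schema can be illustrated with the standard library alone. The following is a minimal sketch, not the implementation in `hotel_guests_analysis.py`: the helpers `parse_date` and `stay_summary` are hypothetical names, and treating `room_rate` as a per-night charge is an assumption that the schema does not confirm. It contrasts the flat total the script computes (`room_rate` plus `amenities_fee`) with a per-night total under that assumption.

```python
from datetime import datetime
from typing import Optional

# Matches values such as "27 Dec 2020". %b is locale-dependent; this sketch
# assumes English month abbreviations.
DATE_FORMAT = "%d %b %Y"


def parse_date(value: str) -> Optional[datetime]:
    """Parse a schema date; return None for blank or malformed input."""
    try:
        return datetime.strptime(value.strip(), DATE_FORMAT)
    except ValueError:
        return None


def stay_summary(row: dict) -> dict:
    """Summarize one guest record given as a dict of strings (hypothetical helper)."""
    checkin = parse_date(row["checkin_date"])
    checkout = parse_date(row["checkout_date"])
    nights = (checkout - checkin).days if checkin and checkout else None

    room_rate = float(row["room_rate"])
    # A blank amenities fee is treated as 0, mirroring fillna(0) in the script.
    amenities_fee = float(row["amenities_fee"] or 0)

    # Flat total: one room rate plus the amenities fee, as the script sums it.
    flat_total = round(room_rate + amenities_fee, 2)
    # ASSUMPTION: room_rate is charged per night (not stated in the schema).
    nightly_total = (
        round(room_rate * nights + amenities_fee, 2) if nights is not None else None
    )
    return {"nights": nights, "flat_total": flat_total, "nightly_total": nightly_total}


# Example record with values copied from guests.csv.
example = {
    "checkin_date": "05 Apr 2020",
    "checkout_date": "10 Apr 2020",
    "room_rate": "197.41",
    "amenities_fee": "0.0",
}
print(stay_summary(example))
```

If the rate is in fact per night, the two totals diverge for any stay longer than one night, which would change the revenue figures in the report; if it is a per-stay rate, the flat total is the right one.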
+ +--- + +*Generated by Codegen AI Assistant - June 2025* + diff --git a/Hotel Guests Analysis/amenities_stay_patterns.png b/Hotel Guests Analysis/amenities_stay_patterns.png new file mode 100644 index 0000000..8420fc9 Binary files /dev/null and b/Hotel Guests Analysis/amenities_stay_patterns.png differ diff --git a/Hotel Guests Analysis/analysis_report.txt b/Hotel Guests Analysis/analysis_report.txt new file mode 100644 index 0000000..7082a26 --- /dev/null +++ b/Hotel Guests Analysis/analysis_report.txt @@ -0,0 +1,38 @@ + + ═══════════════════════════════════════════════════════════════ + HOTEL GUESTS ANALYSIS REPORT + ═══════════════════════════════════════════════════════════════ + + 📊 OVERVIEW METRICS + ───────────────────────────────────────────────────────────── + • Total Guests: 71 + • Total Revenue: $13,563.06 + • Average Room Rate: $178.54 + • Average Amenities Fee: $13.65 + • Average Length of Stay: 2.3 days + + 🎯 KEY INSIGHTS + ───────────────────────────────────────────────────────────── + • Rewards Program Participation: 26.8% + • Most Popular Room Type: BASIC + • Peak Booking Month: July + + 💡 BUSINESS RECOMMENDATIONS + ───────────────────────────────────────────────────────────── + 1. REWARDS PROGRAM: 26.8% participation rate suggests + opportunity to increase enrollment through targeted marketing. + + 2. ROOM TYPE OPTIMIZATION: BASIC rooms are most popular - + consider inventory allocation and pricing strategies. + + 3. SEASONAL PLANNING: July shows highest booking volume - + optimize staffing and pricing for peak periods. + + 4. AMENITIES STRATEGY: Average amenities fee of $13.65 + indicates potential for revenue optimization through package deals. + + 5. LENGTH OF STAY: Average stay of 2.3 days suggests + opportunities for extended stay packages and loyalty incentives. 
+ + ═══════════════════════════════════════════════════════════════ + \ No newline at end of file diff --git a/Hotel Guests Analysis/geographic_distribution.png b/Hotel Guests Analysis/geographic_distribution.png new file mode 100644 index 0000000..eb54afe Binary files /dev/null and b/Hotel Guests Analysis/geographic_distribution.png differ diff --git a/Hotel Guests Analysis/guests.csv b/Hotel Guests Analysis/guests.csv new file mode 100644 index 0000000..cc4dbdd --- /dev/null +++ b/Hotel Guests Analysis/guests.csv @@ -0,0 +1,144 @@ +guest_email,hotel_id,has_rewards,room_type,amenities_fee,checkin_date,checkout_date,room_rate,billing_address,credit_card_number +awolf@phillips.com,HID_000,False,BASIC,37.89,27 Dec 2020,28 Dec 2020,156.23,"993 Rebecca Landing +Jesseburgh, PA 05072",4075084747483975747 +tonya44@wilkinson-wilkins.com,HID_000,False,BASIC,24.37,30 Dec 2020,31 Dec 2020,139.43,"958 Beverly Bypass +South Ronald, GA 46368",180072822063468 +harriskathleen@goodwin.com,HID_000,True,DELUXE,0.0,17 Sep 2020,19 Sep 2020,403.33,"8302 Nathaniel Pike +Rileyland, TX 71613",38983476971380 +kayladiaz@wallace-simmons.com,HID_000,False,BASIC,,28 Dec 2020,30 Dec 2020,140.61,"77 Massachusetts Ave +Cambridge, MA 02139",4969551998845740 +paigemendoza@tran-martin.com,HID_000,True,DELUXE,0.0,05 Apr 2020,10 Apr 2020,197.41,"1234 Corporate Drive +Boston, MA 02116",3558512986488983 +alexanderparker@robinson.com,HID_000,True,BASIC,0.0,18 Oct 2020,19 Oct 2020,197.76,"888 Little Stream +Lake Annmouth, ME 16402",4701079720447404938 +donald84@owens-arnold.org,HID_000,True,BASIC,0.0,22 Nov 2020,24 Nov 2020,108.09,"0156 Russell Trail Apt. 
291 +Port Scottchester, IA 23231",6011956907055260 +keithbarnes@elliott-haley.com,HID_000,False,BASIC,16.45,04 Mar 2020,08 Mar 2020,136.12,"Unit 5037 Box 8794 +DPO AP 31934",4266279461142102517 +petermorton@garcia.biz,HID_000,False,BASIC,19.56,06 Jan 2020,09 Jan 2020,149.23,"583 Lewis Burgs +Port Jessetown, WI 21902",4779208902549 +mark93@good-ramirez.biz,HID_000,False,BASIC,15.23,22 Jan 2020,,139.44,"54238 Mcgee Crescent +Briantown, WY 72770",676384499254 +andrea71@johnson-chen.com,HID_000,False,BASIC,,15 Jun 2020,17 Jun 2020,131.09,"0910 Casey Land +Tammyville, TX 36463",3576959406725080 +brian51@bowman-brooks.com,HID_000,False,BASIC,37.1,23 Oct 2020,25 Oct 2020,141.36,"329 Deborah Via +Nataliehaven, MO 87399",675991525725 +pwilliams@cline.com,HID_000,False,BASIC,14.55,07 Mar 2020,12 Mar 2020,139.03,"904 Hines Rue +Port Gregory, MN 66236",4092078584581 +banthony@jackson.biz,HID_000,False,BASIC,4.54,12 Aug 2020,14 Aug 2020,161.73,"862 Mueller Creek Apt. 492 +Whiteland, WY 55940",4483421863642 +hicksdiana@carpenter.net,HID_000,True,BASIC,0.0,01 Oct 2020,03 Oct 2020,103.8,"954 Taylor Burgs +Port Steveton, OR 26243",4253047975942 +stevenscharlene@anderson.com,HID_000,False,BASIC,8.15,19 Jul 2020,21 Jul 2020,134.97,"4084 Heather Locks Apt. 091 +Blackwellport, GA 62060",563273068871 +margaret57@rodriguez.com,HID_000,False,DELUXE,17.94,14 Jul 2020,16 Jul 2020,282.27,"8302 Nathaniel Pike +Rileyland, TX 71613",3509825543444962 +leehelen@valencia.info,HID_000,False,BASIC,9.76,02 Mar 2020,07 Mar 2020,253.68,"19711 Alvarado Route +East Lori, AR 26670",3559348543878898 +dayers@bennett-ponce.com,HID_000,True,SUITE,0.0,16 Jul 2020,17 Jul 2020,254.43,"8594 Brian Lake Apt. 
682 +East Lisafort, MT 74116",2247314498652894 +gracemorales@may.info,HID_000,True,BASIC,0.0,31 May 2020,02 Jun 2020,138.28,"8302 Nathaniel Pike +Rileyland, TX 71613",2290513662982427 +whitestephen@moreno-clark.com,HID_000,False,BASIC,12.87,30 May 2020,01 Jun 2020,132.53,"77 Massachusetts Ave +Cambridge, MA 02139",180065036602337 +elizabeth14@harrington.net,HID_000,False,BASIC,12.31,13 Oct 2020,17 Oct 2020,133.53,"Unit 8361 Box 9968 +DPO AE 57371",4670439786825211106 +ericayoung@neal.com,HID_000,False,BASIC,17.7,22 May 2020,24 May 2020,133.4,"502 Grant Spring Apt. 915 +Patriciaport, VA 24312",4249793726653 +heidi94@lopez.com,HID_000,False,BASIC,7.56,13 Sep 2020,15 Sep 2020,138.16,"329 Deborah Via +Nataliehaven, MO 87399",4010294825813198 +thomas68@harris.org,HID_000,False,BASIC,13.74,27 Jul 2020,30 Jul 2020,179.4,"3826 Rowe Mission Suite 167 +Jenniferhaven, CO 17091",213108431357494 +tiffany07@johnson.org,HID_000,False,BASIC,10.76,30 Nov 2020,02 Dec 2020,165.95,"1753 Anna Circles Suite 976 +West Amanda, GA 44322",30213206437256 +phillipsjay@pineda-bender.com,HID_000,True,BASIC,0.0,26 Aug 2020,28 Aug 2020,121.31,"1234 Corporate Drive +Boston, MA 02116",4551878586989 +millerrichard@robinson.com,HID_000,True,BASIC,0.0,05 Jul 2020,06 Jul 2020,113.97,"PSC 3637, Box 6528 +APO AP 00600",343863920293179 +xjohnson@baird.com,HID_000,True,BASIC,0.0,08 Dec 2020,11 Dec 2020,120.15,"0215 Chase Roads Suite 951 +Port Joshua, MS 80335",571167792928 +daniel92@gonzalez.com,HID_000,False,BASIC,9.76,05 Nov 2020,07 Nov 2020,143.07,"5678 Office Road +San Francisco, CA 94103",4704147652368811480 +ucastaneda@mitchell.com,HID_000,True,DELUXE,0.0,03 Dec 2020,04 Dec 2020,219.39,"Unit 8361 Box 9968 +DPO AE 57371",4104817616762 +christinesingh@pena.info,HID_000,False,BASIC,3.77,05 May 2020,07 May 2020,135.68,"4758 Parsons Camp +Lake Annettehaven, DC 78176",2255876870833937 +dillonmiranda@west.net,HID_000,False,BASIC,26.47,14 Nov 2020,15 Nov 2020,144.96,"5678 Office Road +San Francisco, CA 
94103",4196831074465 +whitneychen@taylor.com,HID_000,False,BASIC,14.03,11 Aug 2020,16 Aug 2020,143.15,"82622 Christopher Skyway Apt. 066 +Hopkinsview, SC 55350",4372475126552 +mduncan@mullins.info,HID_000,False,BASIC,,11 Aug 2020,14 Aug 2020,144.06,"977 Valentine Corner +North Davidmouth, VT 07522",4484017750569544 +rjackson@castillo.com,HID_000,False,BASIC,17.32,03 Oct 2020,04 Oct 2020,150.77,"Unit 8361 Box 9968 +DPO AE 57371",3563866109003415 +ramosjames@gregory.com,HID_000,False,BASIC,28.41,12 Jul 2020,16 Jul 2020,149.48,"454 Jennifer Port +New Sandra, NY 05178",379626111119090 +vscott@beltran.com,HID_000,False,BASIC,30.86,28 Jan 2020,31 Jan 2020,201.31,"43645 Clark Landing +West Williamtown, SC 77546",4026911632249 +lsteele@mendoza.biz,HID_000,True,BASIC,0.0,20 Feb 2020,22 Feb 2020,118.41,"5678 Office Road +San Francisco, CA 94103",4942094262703149 +martinsydney@stone.net,HID_000,False,BASIC,7.92,18 May 2020,21 May 2020,220.73,"0215 Chase Roads Suite 951 +Port Joshua, MS 80335",586471481935 +coreymurray@duran-martinez.com,HID_000,False,SUITE,32.68,21 Jan 2020,23 Jan 2020,327.29,"9726 Amber Station +East Keithland, ND 56091",4506717789672128364 +daniel57@marsh-shaw.com,HID_000,False,BASIC,13.02,13 Sep 2020,14 Sep 2020,137.46,"7105 Lauren Cliffs Suite 835 +Hartfurt, VT 96920",4495106062613062000 +lhart@lozano-rogers.biz,HID_000,False,DELUXE,19.13,05 Aug 2020,06 Aug 2020,240.96,"19711 Alvarado Route +East Lori, AR 26670",2242995672400517 +alan08@jordan.info,HID_000,False,BASIC,27.45,04 Jul 2020,06 Jul 2020,153.66,"4084 Heather Locks Apt. 
091 +Blackwellport, GA 62060",30277478142829 +michael29@bauer.org,HID_000,True,DELUXE,0.0,13 Jul 2020,15 Jul 2020,201.3,"41459 Sarah Ranch +Floresview, ND 01693",4440004935964264 +daltonward@ingram.com,HID_000,False,BASIC,9.51,06 Sep 2020,07 Sep 2020,198.11,"74489 Simmons Trail +Valdezstad, GA 17568",4799286980280913 +gtorres@mckinney-baker.com,HID_000,False,BASIC,,20 May 2020,22 May 2020,187.76,"3826 Rowe Mission Suite 167 +Jenniferhaven, CO 17091",30145483229883 +jameswelch@levine.biz,HID_000,False,BASIC,,11 May 2020,15 May 2020,157.93,"77 Massachusetts Ave +Cambridge, MA 02139",30255137954032 +grantwendy@bird.com,HID_000,False,BASIC,15.64,26 Jun 2020,28 Jun 2020,139.35,"0215 Chase Roads Suite 951 +Port Joshua, MS 80335",4821889109737689 +jonathan12@white-payne.com,HID_000,True,SUITE,0.0,19 Aug 2020,21 Aug 2020,268.16,"0784 Todd Manors +Jonesmouth, WY 42593",30240040028009 +wmorgan@mahoney-pope.info,HID_000,True,DELUXE,0.0,08 Apr 2020,10 Apr 2020,257.55,"0784 Todd Manors +Jonesmouth, WY 42593",2349865661751281 +nmiller@jordan.com,HID_000,False,BASIC,16.23,09 Jun 2020,10 Jun 2020,150.06,"5678 Office Road +San Francisco, CA 94103",3556805154473395 +ashleycollins@conner.com,HID_000,False,BASIC,6.68,01 Jan 2021,03 Jan 2021,130.42,"1234 Corporate Drive +Boston, MA 02116",4153708178750118 +david07@ramirez-stanton.com,HID_000,False,DELUXE,22.27,04 Feb 2020,07 Feb 2020,228.83,"4322 Wilson Squares +Matthewchester, HI 85123",180017311857920 +fbowman@weaver.com,HID_000,False,BASIC,38.21,13 Sep 2020,16 Sep 2020,136.27,"958 Beverly Bypass +South Ronald, GA 46368",4016023549459142452 +hbryan@george.com,HID_000,True,BASIC,0.0,09 May 2020,11 May 2020,163.57,"329 Deborah Via +Nataliehaven, MO 87399",4847818679193 +markthomas@johnson.com,HID_000,False,BASIC,23.61,16 Aug 2020,18 Aug 2020,143.04,"1234 Corporate Drive +Boston, MA 02116",2294477661503079 +brownmartha@armstrong.com,HID_000,True,BASIC,0.0,13 Oct 2020,16 Oct 2020,292.5,"958 Beverly Bypass +South Ronald, GA 
46368",180064426198428 +kimberlyrandolph@mueller-pratt.org,HID_000,False,SUITE,25.01,05 Jun 2020,07 Jun 2020,351.23,"4152 Alyssa Dale Suite 100 +Port Joshua, UT 45333",3556723727485418 +abrewer@martin.com,HID_000,False,SUITE,27.51,10 Mar 2020,12 Mar 2020,335.26,"54238 Mcgee Crescent +Briantown, WY 72770",2669631248945594 +ehall@smith-stewart.com,HID_000,True,SUITE,0.0,25 Jun 2020,27 Jun 2020,262.99,"PSC 7921, Box 4282 +APO AA 49483",3599849235456515 +suttonmelissa@sanchez.com,HID_000,False,BASIC,28.02,12 Oct 2020,14 Oct 2020,178.1,"4811 Mariah Center +Port Timothyville, SD 37840",4462406877117731 +kelly01@young-aguilar.com,HID_000,False,BASIC,32.25,28 Dec 2020,30 Dec 2020,140.79,"5895 Rebecca Cliff +East Elizabethhaven, NY 44803",349391012165075 +erin19@johnson.com,HID_000,False,BASIC,28.18,09 Jan 2020,11 Jan 2020,133.81,"1234 Corporate Drive +Boston, MA 02116",4461033775093730 +christopherschaefer@lee.org,HID_000,True,BASIC,0.0,16 Jan 2020,19 Jan 2020,135.45,"54238 Mcgee Crescent +Briantown, WY 72770",180095407605742 +wfreeman@webb.com,HID_000,False,BASIC,26.14,20 Jul 2020,22 Jul 2020,145.34,"2506 Christopher Lock +Lake Paul, IL 17894",38651500078643 +christine70@lane-klein.com,HID_000,False,BASIC,11.62,14 Jun 2020,17 Jun 2020,190.96,"0215 Chase Roads Suite 951 +Port Joshua, MS 80335",38416070843739 +haley68@clark.com,HID_000,False,BASIC,,22 Dec 2020,24 Dec 2020,143.66,"941 Samantha Port +North Ashley, DC 89806",2273379952126518 +fordterri@jackson-chambers.org,HID_000,False,SUITE,31.68,05 Jan 2020,08 Jan 2020,313.65,"6535 Andre Mountain Suite 741 +Melaniestad, VA 73781",372770926213149 +tbutler@hill.com,HID_000,False,BASIC,10.28,24 Aug 2020,27 Aug 2020,169.23,"285 Singh Via Suite 786 +Lake Ashley, AK 61126",676148456368 +raymondmoreno@hayes.biz,HID_000,False,BASIC,22.87,05 Jun 2020,07 Jun 2020,143.66,"Unit 8361 Box 9968 +DPO AE 57371",4104817616762 + diff --git a/Hotel Guests Analysis/hotel_guests_analysis.py b/Hotel Guests Analysis/hotel_guests_analysis.py new 
file mode 100644 index 0000000..af6ccc1 --- /dev/null +++ b/Hotel Guests Analysis/hotel_guests_analysis.py @@ -0,0 +1,366 @@ +#!/usr/bin/env python3 +""" +Hotel Guests Data Analysis +========================== + +This script analyzes hotel guest data to extract meaningful insights and create visualizations. +The analysis includes: +1. Room type distribution and revenue analysis +2. Rewards program impact on spending +3. Seasonal booking patterns +4. Geographic distribution of guests +5. Amenities fee analysis +6. Length of stay patterns + +Author: Codegen AI Assistant +Date: June 2025 +""" + +import pandas as pd +import matplotlib.pyplot as plt +import seaborn as sns +import numpy as np +from datetime import datetime +import warnings +warnings.filterwarnings('ignore') + +# Set style for better-looking plots +plt.style.use('seaborn-v0_8') +sns.set_palette("husl") + +def load_and_clean_data(filepath): + """Load and clean the hotel guests dataset.""" + print("📊 Loading and cleaning hotel guests data...") + + # Load the data + df = pd.read_csv(filepath) + + # Basic info about the dataset + print(f"Dataset shape: {df.shape}") + print(f"Columns: {list(df.columns)}") + + # Report missing values (they are handled per analysis below) + print(f"\nMissing values per column:") + print(df.isnull().sum()) + + # Convert date columns + df['checkin_date'] = pd.to_datetime(df['checkin_date'], format='%d %b %Y', errors='coerce') + df['checkout_date'] = pd.to_datetime(df['checkout_date'], format='%d %b %Y', errors='coerce') + + # Calculate length of stay + df['length_of_stay'] = (df['checkout_date'] - df['checkin_date']).dt.days + + # Extract month and season from check-in date + df['checkin_month'] = df['checkin_date'].dt.month + df['checkin_season'] = df['checkin_month'].map({ + 12: 'Winter', 1: 'Winter', 2: 'Winter', + 3: 'Spring', 4: 'Spring', 5: 'Spring', + 6: 'Summer', 7: 'Summer', 8: 'Summer', + 9: 'Fall', 10: 'Fall', 11: 'Fall' + }) + + # Extract state from billing address + df['state'] = 
df['billing_address'].str.extract(r', ([A-Z]{2}) \d{5}', expand=False) + + # Calculate total cost (room rate + amenities fee) + df['total_cost'] = df['room_rate'] + df['amenities_fee'].fillna(0) + + print(f"\n✅ Data cleaning completed!") + print(f"Valid check-in dates: {df['checkin_date'].notna().sum()}") + print(f"Valid length of stay calculations: {df['length_of_stay'].notna().sum()}") + + return df + +def create_room_type_analysis(df): + """Analyze room types and their revenue contribution.""" + print("\n🏨 Analyzing room types and revenue...") + + fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12)) + fig.suptitle('Room Type Analysis', fontsize=16, fontweight='bold') + + # Room type distribution + room_counts = df['room_type'].value_counts() + ax1.pie(room_counts.values, labels=room_counts.index, autopct='%1.1f%%', startangle=90) + ax1.set_title('Distribution of Room Types') + + # Average room rate by type + avg_rates = df.groupby('room_type')['room_rate'].mean().sort_values(ascending=True) + ax2.barh(avg_rates.index, avg_rates.values) + ax2.set_title('Average Room Rate by Type') + ax2.set_xlabel('Average Room Rate ($)') + + # Total revenue by room type + total_revenue = df.groupby('room_type')['total_cost'].sum().sort_values(ascending=True) + ax3.barh(total_revenue.index, total_revenue.values) + ax3.set_title('Total Revenue by Room Type') + ax3.set_xlabel('Total Revenue ($)') + + # Room rate distribution by type + df.boxplot(column='room_rate', by='room_type', ax=ax4) + ax4.set_title('Room Rate Distribution by Type') + ax4.set_xlabel('Room Type') + ax4.set_ylabel('Room Rate ($)') + + plt.tight_layout() + plt.savefig('room_type_analysis.png', dpi=300, bbox_inches='tight') + plt.show() + + return room_counts, avg_rates, total_revenue + +def analyze_rewards_program(df): + """Analyze the impact of rewards program on guest behavior.""" + print("\n🎁 Analyzing rewards program impact...") + + fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12)) + 
fig.suptitle('Rewards Program Analysis', fontsize=16, fontweight='bold') + + # Rewards program distribution (sort_index() keeps False first so the fixed labels line up) + rewards_dist = df['has_rewards'].value_counts().sort_index() + ax1.pie(rewards_dist.values, labels=['No Rewards', 'Has Rewards'], autopct='%1.1f%%', startangle=90) + ax1.set_title('Rewards Program Participation') + + # Average spending by rewards status + avg_spending = df.groupby('has_rewards')['total_cost'].mean() + ax2.bar(['No Rewards', 'Has Rewards'], avg_spending.values) + ax2.set_title('Average Total Spending by Rewards Status') + ax2.set_ylabel('Average Total Cost ($)') + + # Amenities fee by rewards status + amenities_by_rewards = df.groupby('has_rewards')['amenities_fee'].mean() + ax3.bar(['No Rewards', 'Has Rewards'], amenities_by_rewards.values) + ax3.set_title('Average Amenities Fee by Rewards Status') + ax3.set_ylabel('Average Amenities Fee ($)') + + # Room type preference by rewards status + room_rewards = pd.crosstab(df['room_type'], df['has_rewards'], normalize='columns') * 100 + room_rewards.plot(kind='bar', ax=ax4) + ax4.set_title('Room Type Preference by Rewards Status (%)') + ax4.set_ylabel('Percentage') + ax4.legend(['No Rewards', 'Has Rewards']) + ax4.tick_params(axis='x', rotation=45) + + plt.tight_layout() + plt.savefig('rewards_program_analysis.png', dpi=300, bbox_inches='tight') + plt.show() + + return rewards_dist, avg_spending, amenities_by_rewards + +def analyze_seasonal_patterns(df): + """Analyze seasonal booking patterns.""" + print("\n🗓️ Analyzing seasonal booking patterns...") + + # Filter out rows with missing check-in dates + df_valid_dates = df.dropna(subset=['checkin_date']) + + fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12)) + fig.suptitle('Seasonal Booking Patterns', fontsize=16, fontweight='bold') + + # Monthly booking distribution + monthly_bookings = df_valid_dates['checkin_month'].value_counts().sort_index() + month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', + 'Jul', 'Aug', 'Sep', 'Oct', 
'Nov', 'Dec'] + ax1.bar(range(1, 13), [monthly_bookings.get(i, 0) for i in range(1, 13)]) + ax1.set_title('Bookings by Month') + ax1.set_xlabel('Month') + ax1.set_ylabel('Number of Bookings') + ax1.set_xticks(range(1, 13)) + ax1.set_xticklabels(month_names, rotation=45) + + # Seasonal distribution + seasonal_bookings = df_valid_dates['checkin_season'].value_counts() + ax2.pie(seasonal_bookings.values, labels=seasonal_bookings.index, autopct='%1.1f%%', startangle=90) + ax2.set_title('Bookings by Season') + + # Average room rate by season + seasonal_rates = df_valid_dates.groupby('checkin_season')['room_rate'].mean() + ax3.bar(seasonal_rates.index, seasonal_rates.values) + ax3.set_title('Average Room Rate by Season') + ax3.set_ylabel('Average Room Rate ($)') + ax3.tick_params(axis='x', rotation=45) + + # Length of stay by season + df_valid_dates.boxplot(column='length_of_stay', by='checkin_season', ax=ax4) + ax4.set_title('Length of Stay by Season') + ax4.set_xlabel('Season') + ax4.set_ylabel('Length of Stay (days)') + + plt.tight_layout() + plt.savefig('seasonal_patterns.png', dpi=300, bbox_inches='tight') + plt.show() + + return monthly_bookings, seasonal_bookings, seasonal_rates + +def analyze_geographic_distribution(df): + """Analyze geographic distribution of guests.""" + print("\n🗺️ Analyzing geographic distribution...") + + # Keep only rows with valid state information + df_with_states = df.dropna(subset=['state']) + + fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6)) + fig.suptitle('Geographic Distribution of Guests', fontsize=16, fontweight='bold') + + # Top states by number of guests + top_states = df_with_states['state'].value_counts().head(10) + ax1.barh(range(len(top_states)), top_states.values) + ax1.set_yticks(range(len(top_states))) + ax1.set_yticklabels(top_states.index) + ax1.set_title('Top 10 States by Number of Guests') + ax1.set_xlabel('Number of Guests') + + # Top states by average spending + state_spending = 
df_with_states.groupby('state')['total_cost'].mean().sort_values(ascending=False).head(10) + ax2.barh(range(len(state_spending)), state_spending.values) + ax2.set_yticks(range(len(state_spending))) + ax2.set_yticklabels(state_spending.index) + ax2.set_title('Top 10 States by Average Spending') + ax2.set_xlabel('Average Total Cost ($)') + + plt.tight_layout() + plt.savefig('geographic_distribution.png', dpi=300, bbox_inches='tight') + plt.show() + + return top_states, state_spending + +def analyze_amenities_and_stay_patterns(df): + """Analyze amenities fees and length of stay patterns.""" + print("\n🛎️ Analyzing amenities and stay patterns...") + + fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12)) + fig.suptitle('Amenities and Stay Patterns Analysis', fontsize=16, fontweight='bold') + + # Amenities fee distribution + amenities_with_fee = df[df['amenities_fee'] > 0]['amenities_fee'] + ax1.hist(amenities_with_fee, bins=20, edgecolor='black', alpha=0.7) + ax1.set_title('Distribution of Amenities Fees (Excluding $0)') + ax1.set_xlabel('Amenities Fee ($)') + ax1.set_ylabel('Frequency') + + # Amenities fee vs room rate correlation + df_clean = df.dropna(subset=['amenities_fee', 'room_rate']) + ax2.scatter(df_clean['room_rate'], df_clean['amenities_fee'], alpha=0.6) + ax2.set_title('Amenities Fee vs Room Rate') + ax2.set_xlabel('Room Rate ($)') + ax2.set_ylabel('Amenities Fee ($)') + + # Length of stay distribution + stay_lengths = df.dropna(subset=['length_of_stay'])['length_of_stay'] + stay_lengths = stay_lengths[stay_lengths > 0] # Remove invalid stays + ax3.hist(stay_lengths, bins=range(1, int(stay_lengths.max()) + 2), edgecolor='black', alpha=0.7) + ax3.set_title('Distribution of Length of Stay') + ax3.set_xlabel('Length of Stay (days)') + ax3.set_ylabel('Frequency') + + # Average total cost by length of stay + stay_cost = df.groupby('length_of_stay')['total_cost'].mean() + stay_cost = stay_cost[stay_cost.index > 0] # Remove invalid stays + 
ax4.bar(stay_cost.index, stay_cost.values) + ax4.set_title('Average Total Cost by Length of Stay') + ax4.set_xlabel('Length of Stay (days)') + ax4.set_ylabel('Average Total Cost ($)') + + plt.tight_layout() + plt.savefig('amenities_stay_patterns.png', dpi=300, bbox_inches='tight') + plt.show() + + return amenities_with_fee, stay_lengths, stay_cost + +def generate_summary_report(df): + """Generate a comprehensive summary report.""" + print("\n📋 Generating summary report...") + + # Calculate key metrics + total_guests = len(df) + total_revenue = df['total_cost'].sum() + avg_room_rate = df['room_rate'].mean() + avg_amenities_fee = df['amenities_fee'].mean() + rewards_participation = (df['has_rewards'].sum() / total_guests) * 100 + avg_length_of_stay = df['length_of_stay'].mean() + + # Most popular room type + most_popular_room = df['room_type'].mode()[0] + + # Peak booking month + peak_month = df.dropna(subset=['checkin_month'])['checkin_month'].mode() + peak_month_name = {1: 'January', 2: 'February', 3: 'March', 4: 'April', + 5: 'May', 6: 'June', 7: 'July', 8: 'August', + 9: 'September', 10: 'October', 11: 'November', 12: 'December'} + peak_month_str = peak_month_name.get(peak_month[0], 'Unknown') if len(peak_month) > 0 else 'Unknown' + + report = f""" + ═══════════════════════════════════════════════════════════════ + HOTEL GUESTS ANALYSIS REPORT + ═══════════════════════════════════════════════════════════════ + + 📊 OVERVIEW METRICS + ───────────────────────────────────────────────────────────── + • Total Guests: {total_guests:,} + • Total Revenue: ${total_revenue:,.2f} + • Average Room Rate: ${avg_room_rate:.2f} + • Average Amenities Fee: ${avg_amenities_fee:.2f} + • Average Length of Stay: {avg_length_of_stay:.1f} days + + 🎯 KEY INSIGHTS + ───────────────────────────────────────────────────────────── + • Rewards Program Participation: {rewards_participation:.1f}% + • Most Popular Room Type: {most_popular_room} + • Peak Booking Month: 
{peak_month_str} + + 💡 BUSINESS RECOMMENDATIONS + ───────────────────────────────────────────────────────────── + 1. REWARDS PROGRAM: {rewards_participation:.1f}% participation rate suggests + opportunity to increase enrollment through targeted marketing. + + 2. ROOM TYPE OPTIMIZATION: {most_popular_room} rooms are most popular - + consider inventory allocation and pricing strategies. + + 3. SEASONAL PLANNING: {peak_month_str} shows highest booking volume - + optimize staffing and pricing for peak periods. + + 4. AMENITIES STRATEGY: Average amenities fee of ${avg_amenities_fee:.2f} + indicates potential for revenue optimization through package deals. + + 5. LENGTH OF STAY: Average stay of {avg_length_of_stay:.1f} days suggests + opportunities for extended stay packages and loyalty incentives. + + ═══════════════════════════════════════════════════════════════ + """ + + print(report) + + # Save report to file (explicit UTF-8 so the emoji and box-drawing characters survive on any platform) + with open('analysis_report.txt', 'w', encoding='utf-8') as f: + f.write(report) + + return report + +def main(): + """Main analysis function.""" + print("🏨 Hotel Guests Data Analysis Starting...") + print("=" * 60) + + # Load and clean data + df = load_and_clean_data('guests.csv') + + # Perform analyses + room_analysis = create_room_type_analysis(df) + rewards_analysis = analyze_rewards_program(df) + seasonal_analysis = analyze_seasonal_patterns(df) + geographic_analysis = analyze_geographic_distribution(df) + amenities_analysis = analyze_amenities_and_stay_patterns(df) + + # Generate summary report + summary_report = generate_summary_report(df) + + print("\n✅ Analysis completed successfully!") + print("📁 Generated files:") + print(" • room_type_analysis.png") + print(" • rewards_program_analysis.png") + print(" • seasonal_patterns.png") + print(" • geographic_distribution.png") + print(" • amenities_stay_patterns.png") + print(" • analysis_report.txt") + print("\n🎉 All visualizations and reports have been saved!") + +if __name__ == "__main__": + main() 
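Review note on the state extraction in `load_and_clean_data`: the pattern `, ([A-Z]{2}) \d{5}` requires a comma directly before the two-letter code. The standard-library check below, using addresses copied from `guests.csv`, suggests that military-style addresses such as `DPO AP 31934` produce no match, so those guests would be dropped by `dropna(subset=['state'])` in the geographic analysis. The `extract_state` helper is hypothetical and only mirrors the regex with `re.search`; it assumes pandas `str.extract` applies the same first-match semantics.

```python
import re
from typing import Optional

# Same pattern as in load_and_clean_data: ", XX 12345".
STATE_PATTERN = re.compile(r", ([A-Z]{2}) \d{5}")


def extract_state(address: str) -> Optional[str]:
    """Return the two-letter code, or None when the pattern does not match."""
    match = STATE_PATTERN.search(address)
    return match.group(1) if match else None


# Billing addresses copied from guests.csv (the newline is part of the field).
addresses = [
    "993 Rebecca Landing\nJesseburgh, PA 05072",  # regular street address
    "Unit 5037 Box 8794\nDPO AP 31934",           # no comma before the code
    "PSC 3637, Box 6528\nAPO AP 00600",           # comma is followed by "Box"
]

for address in addresses:
    print(repr(address), "->", extract_state(address))
```

Whether such rows should be reported under a separate category or excluded is a design choice for the author; the sketch only makes the current behavior visible.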
diff --git a/Hotel Guests Analysis/requirements.txt b/Hotel Guests Analysis/requirements.txt new file mode 100644 index 0000000..fece643 --- /dev/null +++ b/Hotel Guests Analysis/requirements.txt @@ -0,0 +1,5 @@ +pandas>=1.3.0 +matplotlib>=3.6.0 +seaborn>=0.11.0 +numpy>=1.21.0 + diff --git a/Hotel Guests Analysis/rewards_program_analysis.png b/Hotel Guests Analysis/rewards_program_analysis.png new file mode 100644 index 0000000..c00cc95 Binary files /dev/null and b/Hotel Guests Analysis/rewards_program_analysis.png differ diff --git a/Hotel Guests Analysis/room_type_analysis.png b/Hotel Guests Analysis/room_type_analysis.png new file mode 100644 index 0000000..c32f066 Binary files /dev/null and b/Hotel Guests Analysis/room_type_analysis.png differ diff --git a/Hotel Guests Analysis/seasonal_patterns.png b/Hotel Guests Analysis/seasonal_patterns.png new file mode 100644 index 0000000..590445f Binary files /dev/null and b/Hotel Guests Analysis/seasonal_patterns.png differ