341 changes: 17 additions & 324 deletions Flight_Data_Analysis.ipynb
@@ -4,7 +4,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Predicting Flight Delays with GitHub Copilot\n",
"# Predicting Flight Delays\n",
"\n",
"The flight dataset is a dataset that contains information about flights and how they are delayed.\n",
"In this notebook, we will use the dataset to predict whether a flight will be delayed or not."
]
@@ -25,334 +26,48 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cleaning data with GitHub Copilot\n",
"## Cleaning data\n",
"\n",
"You can clean data with GitHub Copilot and ask questions about your data. For example, you can ask Copilot to remove missing values, remove duplicates, normalize your data and more."
]
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Year</th>\n",
" <th>Month</th>\n",
" <th>DayofMonth</th>\n",
" <th>DayOfWeek</th>\n",
" <th>Carrier</th>\n",
" <th>OriginAirportID</th>\n",
" <th>OriginAirportName</th>\n",
" <th>OriginCity</th>\n",
" <th>OriginState</th>\n",
" <th>DestAirportID</th>\n",
" <th>DestAirportName</th>\n",
" <th>DestCity</th>\n",
" <th>DestState</th>\n",
" <th>CRSDepTime</th>\n",
" <th>DepDelay</th>\n",
" <th>DepDel15</th>\n",
" <th>CRSArrTime</th>\n",
" <th>ArrDelay</th>\n",
" <th>ArrDel15</th>\n",
" <th>Cancelled</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2013</td>\n",
" <td>9</td>\n",
" <td>16</td>\n",
" <td>1</td>\n",
" <td>DL</td>\n",
" <td>15304</td>\n",
" <td>Tampa International</td>\n",
" <td>Tampa</td>\n",
" <td>FL</td>\n",
" <td>12478</td>\n",
" <td>John F. Kennedy International</td>\n",
" <td>New York</td>\n",
" <td>NY</td>\n",
" <td>1539</td>\n",
" <td>4</td>\n",
" <td>0.0</td>\n",
" <td>1824</td>\n",
" <td>13</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2013</td>\n",
" <td>9</td>\n",
" <td>23</td>\n",
" <td>1</td>\n",
" <td>WN</td>\n",
" <td>14122</td>\n",
" <td>Pittsburgh International</td>\n",
" <td>Pittsburgh</td>\n",
" <td>PA</td>\n",
" <td>13232</td>\n",
" <td>Chicago Midway International</td>\n",
" <td>Chicago</td>\n",
" <td>IL</td>\n",
" <td>710</td>\n",
" <td>3</td>\n",
" <td>0.0</td>\n",
" <td>740</td>\n",
" <td>22</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2013</td>\n",
" <td>9</td>\n",
" <td>7</td>\n",
" <td>6</td>\n",
" <td>AS</td>\n",
" <td>14747</td>\n",
" <td>Seattle/Tacoma International</td>\n",
" <td>Seattle</td>\n",
" <td>WA</td>\n",
" <td>11278</td>\n",
" <td>Ronald Reagan Washington National</td>\n",
" <td>Washington</td>\n",
" <td>DC</td>\n",
" <td>810</td>\n",
" <td>-3</td>\n",
" <td>0.0</td>\n",
" <td>1614</td>\n",
" <td>-7</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2013</td>\n",
" <td>7</td>\n",
" <td>22</td>\n",
" <td>1</td>\n",
" <td>OO</td>\n",
" <td>13930</td>\n",
" <td>Chicago O'Hare International</td>\n",
" <td>Chicago</td>\n",
" <td>IL</td>\n",
" <td>11042</td>\n",
" <td>Cleveland-Hopkins International</td>\n",
" <td>Cleveland</td>\n",
" <td>OH</td>\n",
" <td>804</td>\n",
" <td>35</td>\n",
" <td>1.0</td>\n",
" <td>1027</td>\n",
" <td>33</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2013</td>\n",
" <td>5</td>\n",
" <td>16</td>\n",
" <td>4</td>\n",
" <td>DL</td>\n",
" <td>13931</td>\n",
" <td>Norfolk International</td>\n",
" <td>Norfolk</td>\n",
" <td>VA</td>\n",
" <td>10397</td>\n",
" <td>Hartsfield-Jackson Atlanta International</td>\n",
" <td>Atlanta</td>\n",
" <td>GA</td>\n",
" <td>545</td>\n",
" <td>-1</td>\n",
" <td>0.0</td>\n",
" <td>728</td>\n",
" <td>-9</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Year Month DayofMonth DayOfWeek Carrier OriginAirportID \\\n",
"0 2013 9 16 1 DL 15304 \n",
"1 2013 9 23 1 WN 14122 \n",
"2 2013 9 7 6 AS 14747 \n",
"3 2013 7 22 1 OO 13930 \n",
"4 2013 5 16 4 DL 13931 \n",
"\n",
" OriginAirportName OriginCity OriginState DestAirportID \\\n",
"0 Tampa International Tampa FL 12478 \n",
"1 Pittsburgh International Pittsburgh PA 13232 \n",
"2 Seattle/Tacoma International Seattle WA 11278 \n",
"3 Chicago O'Hare International Chicago IL 11042 \n",
"4 Norfolk International Norfolk VA 10397 \n",
"\n",
" DestAirportName DestCity DestState CRSDepTime \\\n",
"0 John F. Kennedy International New York NY 1539 \n",
"1 Chicago Midway International Chicago IL 710 \n",
"2 Ronald Reagan Washington National Washington DC 810 \n",
"3 Cleveland-Hopkins International Cleveland OH 804 \n",
"4 Hartsfield-Jackson Atlanta International Atlanta GA 545 \n",
"\n",
" DepDelay DepDel15 CRSArrTime ArrDelay ArrDel15 Cancelled \n",
"0 4 0.0 1824 13 0 0 \n",
"1 3 0.0 740 22 1 0 \n",
"2 -3 0.0 1614 -7 0 0 \n",
"3 35 1.0 1027 33 1 0 \n",
"4 -1 0.0 728 -9 0 0 "
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"# identify and replace missing values\n",
"df.isnull().sum()\n",
"# Normalize missing values\n",
"df.fillna(0, inplace=True)\n",
"\n",
"# update DepDel15 to 0 if null\n",
"df['DepDel15'].fillna(0, inplace=True)\n",
"\n",
"# Calculate the z-scores of depdelay and arrdelay & find outliers\n",
"z_scores = (df[['DepDelay', 'ArrDelay']] - df[['DepDelay', 'ArrDelay']].mean()) / df[['DepDelay', 'ArrDelay']].std()\n",
"abs_z_scores = z_scores.abs()\n",
"outliers = (abs_z_scores > 3).any(axis=1)\n",
"\n",
"# Remove the outliers\n",
"df = df[~outliers]\n",
"\n",
"# show newly cleaned data\n",
"df.head()\n"
"# Detect and remove outliers across delay columns"
]
},
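Note on the cleaning cell: the steps it describes (fill missing values, then drop rows whose delay z-score exceeds 3) can be sketched as one standalone function. This is a minimal sketch and not the notebook's final code; the function name `clean_delays`, the `z_max` parameter, and the toy data are assumptions made for illustration. It uses `fillna` without `inplace=True`, so the input frame is not modified.

```python
import pandas as pd


def clean_delays(df: pd.DataFrame, cols=("DepDelay", "ArrDelay"), z_max: float = 3.0) -> pd.DataFrame:
    """Fill missing values with 0, then drop rows whose delay z-score exceeds z_max."""
    cleaned = df.fillna(0)  # returns a copy; the input frame is left untouched
    delays = cleaned[list(cols)]
    # Column-wise z-scores: distance from the mean in units of standard deviation
    z_scores = (delays - delays.mean()) / delays.std()
    # A row is an outlier if any of the delay columns is beyond the threshold
    outliers = (z_scores.abs() > z_max).any(axis=1)
    return cleaned.loc[~outliers].reset_index(drop=True)


# Hypothetical toy data: 19 ordinary flights plus one extreme delay.
flights = pd.DataFrame({
    "DepDelay": [0, 1, 2, 3, 4] * 3 + [0, 1, 2, 3] + [500],
    "ArrDelay": [1, 2, 3, 4, 5] * 3 + [1, 2, 3, 4] + [480],
    "DepDel15": [0.0] * 18 + [None, 1.0],
})
cleaned = clean_delays(flights)
print(len(flights), "rows before cleaning,", len(cleaned), "rows after")
```

Filling every column with 0 mirrors the removed cell; for columns where 0 is a meaningful value, a per-column fill may be the safer choice.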
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Ask Questions about your dataset with GitHub Copilot\n",
"\n",
"If there are any questions you have about your dataset, you can ask Copilot to help you with that. For example, you can ask Copilot to show you the shape of your dataset, show you the columns of your dataset and even ask about columns that you are not sure about."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"q: what is DepDelay?\n",
"a: DepDelay is the departure delay in minutes.\n",
"\n",
"q: what is DepDel15?\n",
"a: DepDel15 is the departure delay indicator. It is 1 if the flight was delayed by 15 minutes or more, and 0 otherwise.\n",
"\n",
"q: what is CRSArrTime?\n",
"a: CRSArrTime is the scheduled arrival time."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a model with GitHub Copilot\n",
"\n",
"You can create a model with GitHub Copilot. For example, you can ask Copilot to create a model that predicts whether a flight will be delayed or not.\n",
"\n",
"Demo Prompt: \n",
"```markdown\n",
"## Create a predictive model\n",
"\n",
"1. CMD + i (in a new cell)\n",
"\n",
"2. create a model to predict the likelihood of a flight being delayed based on the day of the week and the arrival airport. Use Logistic regression and calculate the accuracy of the model.\n",
"\n",
"3. resolve terminal errors with copilot in the terminal\n",
"\n",
"```"
"GitHub Copilot can help you write the code to train a model. For example, you can ask Copilot to create a model that predicts whether a flight will be delayed or not."
]
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy: 0.80\n"
]
}
],
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"# Split the dataset into features (X) and target variable (y)\n",
"X = df[['DayOfWeek', 'DestAirportID']]\n",
"y = df['ArrDel15']\n",
"\n",
"# Split the dataset into training and testing sets\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n",
"\n",
"# Create a logistic regression model\n",
"model = LogisticRegression()\n",
"\n",
"# Fit the model on the training data\n",
"model.fit(X_train, y_train)\n",
"\n",
"# Predict the probabilities of flight delays for the test data\n",
"y_pred_proba = model.predict_proba(X_test)\n",
"\n",
"# You can also predict the actual class labels\n",
"y_pred = model.predict(X_test)\n",
"\n",
"# Calculate the accuracy of the model\n",
"accuracy = model.score(X_test, y_test)\n",
"print(f'Accuracy: {accuracy:.2f}')\n"
"# Train a model to predict the likelihood of a flight being delayed based on the day of the week and the arrival airport. Use Logistic regression and calculate the accuracy of the model.\n"
]
},
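Note on the model cell: a possible answer to the prompt is sketched below. It differs from the removed cell in one respect that should be stated plainly: `DestAirportID` and `DayOfWeek` are one-hot encoded with scikit-learn's `OneHotEncoder`, because airport IDs are labels rather than quantities. The data here is synthetic and the delay pattern is an assumption made for the demo, so the printed accuracy says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical synthetic data standing in for the flight dataset.
rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({
    "DayOfWeek": rng.integers(1, 8, size=n),
    "DestAirportID": rng.choice([10397, 12478, 12892, 13232], size=n),
})
# Assumed pattern for the demo only: flights to 12892 are delayed more often.
p_delay = np.where(df["DestAirportID"] == 12892, 0.6, 0.2)
df["ArrDel15"] = (rng.random(n) < p_delay).astype(int)

# Split the dataset into features (X) and target variable (y)
X = df[["DayOfWeek", "DestAirportID"]]
y = df["ArrDel15"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Treat both features as categories instead of numeric magnitudes
encode = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["DayOfWeek", "DestAirportID"])]
)
model = make_pipeline(encode, LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
# Accuracy of always predicting the majority class, for comparison
baseline = max(y_test.mean(), 1 - y_test.mean())
print(f"Accuracy: {accuracy:.2f} (majority-class baseline: {baseline:.2f})")
```

Comparing against the majority-class baseline matters here because delayed flights are the minority, so accuracy alone can look better than the model really is.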
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Code with your Voice with GitHub Copilot\n",
"\n",
"With Copilot Chat you can start a Voice chat and get help with your dataset.\n",
"\n",
"Demo Prompt:\n",
"```markdown\n",
"## Evaluate the Model\n",
"\n",
"1. open Copilot Chat\n",
"2. click the mic to start a voice chat\n",
"3. Ask: Make a prediction of the odds of a flight being delayed to Los Angeles on a Wednesday\n",
"\n",
"```"
"Try Copilot's voice input to test your model."
]
},
{
@@ -361,38 +76,16 @@
"metadata": {},
"outputs": [],
"source": [
"# add code here from voice\n",
"\n",
"# Create a dataframe with the input values\n",
"data = {'DayOfWeek': [3], 'DestAirportID': [12892]}\n",
"\n",
"# Make a prediction\n",
"input_values = pd.DataFrame(data)\n",
"\n",
"# Make a prediction\n",
"prediction = model.predict_proba(input_values)\n",
"\n",
"# Show the prediction\n",
"prediction[0]"
"# Predict flight delays with the model\n"
]
},
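Note on the prediction cell: the voice prompt asks for the odds of a delay, while `predict_proba` returns probabilities, one per class in the order of `model.classes_` (for 0/1 labels, index 1 is the delayed class). A minimal sketch of the conversion follows; the helper name `odds_from_probability` and the example row are hypothetical values, not output from the notebook's model.

```python
def odds_from_probability(p_delayed: float) -> float:
    """Convert a probability in [0, 1) into odds in favor, p / (1 - p)."""
    if not 0.0 <= p_delayed < 1.0:
        raise ValueError("probability must be in [0, 1)")
    return p_delayed / (1.0 - p_delayed)


# Hypothetical row in the shape of model.predict_proba(...)[0]:
# index 0 is P(not delayed), index 1 is P(delayed).
prediction_row = [0.75, 0.25]
odds = odds_from_probability(prediction_row[1])
print(f"P(delayed) = {prediction_row[1]:.2f}, odds = {odds:.2f} to 1")
```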
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Slash Commands & Smart Actions with GitHub Copilot\n",
"\n",
"Slash commands and smart actions make it a fast way to interact with GitHub Copilot. You can use it to write documentation, test, and more.\n",
"\n",
"Demo Prompt:\n",
"```markdown\n",
"# Make it an app\n",
"\n",
"1. open server.py file\n",
"2. open Copilot Chat or press CMD + i\n",
"3. type /test\n",
"4. add tests to a new file and save\n",
"5. commit your changes with smart actions sparkles\n",
"6. push your changes to the repo\n",
"Move model to some server code … see [server.py](server/server.py)\n",
"\n",
"```"
]