This project builds a machine learning model to detect fraudulent transactions based on behavioral and transactional features. It supports financial institutions in identifying suspicious activity and preventing fraud in real time.
The dataset includes anonymized transaction records with engineered features for fraud detection.
Key columns:
- Time: Time elapsed since the first transaction
- Amount: Transaction amount
- V1 to V28: PCA-transformed features capturing behavioral patterns
- Class: Target label (1 = Fraud, 0 = Legitimate)
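Before training, it can help to confirm that a loaded frame actually matches the columns listed above and to see how rare the fraud class is. The following is a minimal sketch, not part of the original project: `EXPECTED_COLUMNS`, `missing_columns`, `fraud_rate`, and the randomly generated `demo` frame are all hypothetical names and stand-in data, and the 2% fraud rate is an arbitrary assumption chosen only so the example runs without the real CSV.

```python
from typing import List

import numpy as np
import pandas as pd

# Expected schema, taken from the column list above (order is not enforced).
EXPECTED_COLUMNS = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount", "Class"]


def missing_columns(df: pd.DataFrame) -> List[str]:
    """Return the expected columns that are absent from df, in schema order."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


def fraud_rate(df: pd.DataFrame) -> float:
    """Return the fraction of rows labeled as fraud (Class == 1)."""
    return float((df["Class"] == 1).mean())


# Hypothetical stand-in data: 1000 random rows, roughly 2% labeled as fraud.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(1000, 30)), columns=EXPECTED_COLUMNS[:-1])
demo["Class"] = (rng.random(1000) < 0.02).astype(int)

print("missing columns:", missing_columns(demo))
print(f"fraud rate: {fraud_rate(demo):.2%}")
```

With the real data, the same two helpers could be called on the frame returned by `pd.read_csv`; a very low fraud rate is a signal that accuracy alone will not be an informative metric.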
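The workflow loads the data, trains a random forest on a stratified split, evaluates it on the held-out set, and scores a single transaction:

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
    from sklearn.model_selection import train_test_split

    # Load the data and drop rows with missing values
    df = pd.read_csv('/kaggle/input/fraud-detection-dataset/fraud.csv')
    df = df.dropna()

    # Separate the features from the target label
    X = df.drop('Class', axis=1)
    y = df['Class']

    # Stratified 80/20 split keeps the fraud ratio the same in both sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.2, random_state=42)

    # Train the classifier
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    # Evaluate on the held-out test set
    y_pred = model.predict(X_test)
    print(accuracy_score(y_test, y_pred))
    print(classification_report(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))

    # Score a single placeholder transaction (all-zero feature values)
    input_df = pd.DataFrame(np.zeros((1, len(X_train.columns))), columns=X_train.columns)
    model.predict(input_df)

- Accuracy: ~99% on test data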
- Precision: High precision for fraud class
- Top predictors: PCA components V14, V17, V10, and transaction amount
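Because fraud is typically a small minority of transactions, a high accuracy figure can be reached by a model that never flags fraud at all, so fraud-class precision, recall, and average precision are usually more telling. The sketch below illustrates this and shows one way to rank predictors with the forest's impurity-based `feature_importances_`. It runs on synthetic data from `make_classification`, not on the project's dataset, so the feature names (`f0` to `f9`), the roughly 2% positive rate, and every printed number are assumptions for illustration only and do not reproduce the results listed above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, heavily imbalanced stand-in for the real dataset.
X_arr, y_arr = make_classification(
    n_samples=5000, n_features=10, n_informative=4, n_redundant=0,
    weights=[0.98, 0.02], random_state=42,
)
X_demo = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(10)])

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_arr, stratify=y_arr, test_size=0.2, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_tr, y_tr)

# Baseline: always predicting "legitimate" already scores high accuracy.
baseline_accuracy = float((y_te == 0).mean())

y_hat = clf.predict(X_te)
fraud_scores = clf.predict_proba(X_te)[:, 1]  # probability of the positive class

metrics = {
    "baseline_accuracy": baseline_accuracy,
    "precision": precision_score(y_te, y_hat, zero_division=0),
    "recall": recall_score(y_te, y_hat, zero_division=0),
    "average_precision": average_precision_score(y_te, fraud_scores),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Impurity-based importances, sorted from most to least influential.
importances = pd.Series(clf.feature_importances_, index=X_demo.columns)
print(importances.sort_values(ascending=False).head(5))
```

Impurity-based importances are quick to compute but can favor high-cardinality features; permutation importance on the test set is a common cross-check.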
numpy
pandas
scikit-learn