Conversation

@shayan74 commented Nov 7, 2025

Dear Jadi,

Thank you for creating such a wonderful machine learning course — I’ve been recommending it to Persian-speaking students who are eager to learn ML.

While reviewing the code, I noticed a small detail in the train/test split logic that might cause slight variations in the ratio. The current approach:

np.random.rand(len(df)) < 0.8

works well in general, but because each row is drawn at random, the realized training fraction can land anywhere between roughly 77% and 82%. This is perfectly acceptable for large datasets, but on smaller datasets it can lead to noticeable deviations and potential confusion for learners.
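
For illustration, here is a quick sketch (using a hypothetical 50-row DataFrame, not the course's actual data) showing how much the realized training fraction drifts from run to run:

import numpy as np
import pandas as pd

# Hypothetical small dataset, just to show the drift on small data.
df = pd.DataFrame({"x": range(50)})

for _ in range(5):
    msk = np.random.rand(len(df)) < 0.8
    print(msk.mean())  # realized training fraction, e.g. values like 0.74 or 0.84 rather than exactly 0.80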

To make the ratio more consistent, I suggest using:

def random_boolean_array(x, true_ratio=0.8):
    """Return a shuffled boolean array of length x with an exact share of True values."""
    n_true = int(x * true_ratio)          # exact number of training rows
    n_false = x - n_true                  # remaining rows go to the test set
    arr = np.array([True] * n_true + [False] * n_false)
    np.random.shuffle(arr)                # randomize which rows get which label
    return arr

This approach produces an exact 80/20 split every time (up to integer rounding).
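
As a rough sketch of how it would slot into the existing notebook code (the df, train, and test names are assumed from the course material):

msk = random_boolean_array(len(df))
train = df[msk]    # exactly int(0.8 * len(df)) rows
test = df[~msk]    # the remaining rows
print(len(train) / len(df))  # 0.80, up to integer rounding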

Alternatively, to keep the inline coding style (and avoid a separate splitting function), the split probabilities can be made explicit with:

np.random.choice([True, False], size=len(df), p=[0.8, 0.2])
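
For completeness, a minimal sketch of how this variant would slot into the same split (again assuming the notebook's df, train, and test names):

msk = np.random.choice([True, False], size=len(df), p=[0.8, 0.2])
train = df[msk]    # each row is still an independent draw, so the ratio remains approximate
test = df[~msk]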

Thank you for your time and for the excellent educational content you share.

Damet garm (thanks a lot)!
Shayan
