Describe the bug
mlxtend 0.23.4 introduced a severe performance regression in the fpgrowth function when processing large sparse boolean DataFrames with low min_support values (e.g. 0.01). The execution time increased from seconds to hours or days, making the function unusable for realistic datasets.
The regression appears to be caused by inefficient NULL/NaN handling introduced in 0.23.4 that processes every cell of the DataFrame unnecessarily, even when the DataFrame contains only boolean values and no NULL values at all. There is even a null_values parameter that lets the user declare whether the DataFrame contains NULL values, but setting it to False does not actually skip the unnecessary NULL handling logic.
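For reference, this is the call shape meant here (a minimal example assuming the null_values keyword as exposed in 0.23.4); per the observations above, passing null_values=False still goes through the per-row NULL logic:

import pandas as pd
from mlxtend.frequent_patterns import fpgrowth

# Purely boolean frame, no NULLs anywhere.
df = pd.DataFrame({"a": [True, False, True], "b": [True, True, False]})

# Declaring the data NULL-free does not skip the NULL handling,
# so large inputs remain as slow as with the default settings.
fpgrowth(df, min_support=0.01, use_colnames=True, null_values=False)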
The additional NULL handling logic appears to increase the generate_itemsets() time complexity from O(n_generated_itemsets) to O(n_generated_itemsets * n_df_rows), because of a nested for loop that iterates over rows. When processing a DataFrame of size 300,000 x 3,600 with min_support=0.01, roughly 300,000 itemsets are generated, so the amount of work is effectively squared compared to the previous version (0.23.1): 300,000 rows * 300,000 itemsets ≈ 10^11 operations. This suddenly makes fpgrowth unusable, where previously the same call took less than a minute. That dataset is intractable with the current version of fpgrowth, even though it is only small to moderate by business-application standards.
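To make the complexity argument concrete, here is a minimal sketch (not the actual mlxtend source; function and variable names are illustrative only) contrasting a result build that reuses the counts already produced by the FP-tree with one that rescans every row per itemset, which is where the extra n_df_rows factor comes from:

def build_result_fast(itemset_counts, n_rows):
    # itemset_counts: {frozenset_of_column_labels: absolute_count} from the FP-tree.
    # One division per itemset -> O(n_generated_itemsets).
    return {itemset: count / n_rows for itemset, count in itemset_counts.items()}


def build_result_with_row_scan(itemset_counts, df):
    # Rescanning every row for every itemset -> O(n_generated_itemsets * n_df_rows),
    # even when df is purely boolean and the scan cannot change the result.
    supports = {}
    for itemset, count in itemset_counts.items():
        valid_rows = 0
        for _, row in df.iterrows():  # the costly inner loop over rows
            if not row[list(itemset)].isna().any():
                valid_rows += 1
        supports[itemset] = count / valid_rows
    return supports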
Steps/Code to Reproduce
The example below uses a smaller DataFrame (10,000 rows x 400 columns with rather low support) so that it showcases the large execution-time difference between mlxtend 0.23.1 and 0.23.4 instead of simply stalling indefinitely (as happens with larger data).
import time
import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import fpgrowth


def create_dataframe(n_rows=10000, n_cols=400):
    np.random.seed(42)
    # Generate realistic sparse boolean data based on real categorical data patterns
    # 90% columns have support < 0.01, 6% have 0.01-0.1, 4% have 0.1-0.65
    support_values = np.zeros(n_cols)
    # Most columns: very low support (don't participate in itemsets with min_support=0.01)
    n_very_low = int(n_cols * 0.9)
    support_values[:n_very_low] = np.random.uniform(0.0001, 0.009, n_very_low)
    # Some columns: medium support (participate in itemsets)
    n_medium = int(n_cols * 0.06)
    support_values[n_very_low:n_very_low + n_medium] = np.random.uniform(0.01, 0.1, n_medium)
    # Few columns: high support (common categories)
    n_high = n_cols - n_very_low - n_medium
    support_values[n_very_low + n_medium:] = np.random.uniform(0.1, 0.65, n_high)
    # Generate boolean DataFrame (no NULL values)
    data = {}
    for i in range(n_cols):
        col_name = f"feature_{i:04d}"
        prob = support_values[i]
        data[col_name] = np.random.random(n_rows) < prob
    return pd.DataFrame(data)


def test_performance():
    """Test fpgrowth performance."""
    df = create_dataframe()
    print(f"Testing {df.shape} DataFrame with {df.values.mean():.4f} density")
    print(f"mlxtend version: {__import__('mlxtend').__version__}")
    start_time = time.time()
    frequent_itemsets = fpgrowth(df, min_support=0.01, use_colnames=True)
    elapsed_time = time.time() - start_time
    print(f"Execution time: {elapsed_time:.2f} seconds")
    print(f"Found itemsets: {len(frequent_itemsets):,}")
    return elapsed_time


if __name__ == "__main__":
    test_performance()

Expected Results
With mlxtend 0.23.1:
- Execution time: < 1 second
Actual Results
With mlxtend 0.23.4:
- Execution time: ~3 minutes!
- This shows that the newly added NULL handling dramatically increases complexity and takes up nearly all of the computing time.
The issue becomes even worse with larger DataFrames or higher-support columns (it scales poorly with both n_rows and n_generated_itemsets).
Workaround: Downgrade to mlxtend 0.23.1.
Suggested Fix: Review the NULL handling logic introduced in 0.23.4. The function should skip expensive NULL processing when the DataFrame contains only boolean values.
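One possible shape of such a fast-path guard, sketched here (this is not a patch against the actual mlxtend internals, and the helper name is assumed):

import pandas as pd

def needs_null_handling(df: pd.DataFrame, null_values: bool) -> bool:
    # Skip the expensive per-row NULL logic when the caller declares the data clean,
    # when the dtypes cannot hold NaN/None at all, or when no NaN is actually present.
    if not null_values:
        return False
    if all(pd.api.types.is_bool_dtype(dtype) for dtype in df.dtypes):
        return False
    return bool(df.isna().any().any())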
The NULL logic itself could also be sped up somewhat by using vectorized operations in generate_itemsets() instead of the row loop and the costly conversions from pandas objects to Python lists. However, this does not solve the underlying O(n_generated_itemsets * n_df_rows) complexity issue.
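For illustration, a vectorized per-itemset computation along these lines (a sketch under the assumption that the intended semantics are "rows where every item is True, divided by rows where no item is NULL"; it reduces the constant factor but keeps the O(n_generated_itemsets * n_df_rows) total work):

import pandas as pd

def null_aware_support(df: pd.DataFrame, itemset) -> float:
    block = df[list(itemset)]
    not_null = block.notna().all(axis=1)              # rows usable for this itemset
    all_true = block.fillna(False).astype(bool).all(axis=1)
    denominator = int(not_null.sum())
    return float((all_true & not_null).sum()) / denominator if denominator else 0.0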
Versions
MLxtend 0.23.4
Linux-5.15.0-152-generic-x86_64-with-glibc2.35
Python 3.11.13 (main, Jun 4 2025, 08:57:29) [GCC 11.4.0]
Scikit-learn 1.7.0
NumPy 1.26.4
SciPy 1.15.2