Collaborator

@Bidek56 Bidek56 commented Dec 7, 2025

Adding Series mapElements to close #46

This feature allows mapping a DataFrame column through a Series, for example:

const df = pl.DataFrame({ a: [1, 5, 3] });
const mapping: Record<number, number> = { 1: 11, 2: 22, 3: 33, 4: 44 };
const funcMap = (k: number): number => mapping[k] ?? '';

const mappedSeries: pl.Series = df.select(pl.col("a")).toSeries().mapElements(funcMap);
const df2 = df.withColumn(mappedSeries.alias("mappedSeries"));
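
What each element goes through can be sketched without polars. This is a minimal stand-in, assuming a plain array in place of the Series: `mapElementsSketch` is a hypothetical helper, not the implementation in this PR, and it returns `null` for missing keys instead of `''` (also an assumption).

```typescript
// Hypothetical stand-in for Series.mapElements: applies `fn` to every element.
// This is NOT the nodejs-polars implementation, only an illustration.
function mapElementsSketch<T, U>(values: readonly T[], fn: (v: T) => U): U[] {
  return values.map((v) => fn(v));
}

const mapping: Record<number, number> = { 1: 11, 2: 22, 3: 33, 4: 44 };

// Keys missing from `mapping` become null (an assumption; the example above uses '').
const funcMap = (k: number): number | null => mapping[k] ?? null;

const mapped = mapElementsSketch([1, 5, 3], funcMap);
// mapped is [11, null, 33]
```

Returning `null` keeps the output a single numeric type with a missing value, rather than mixing numbers and strings in one column.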

@Bidek56 Bidek56 self-assigned this Dec 7, 2025
@Bidek56 Bidek56 added the enhancement New feature or request label Dec 7, 2025
Collaborator

@universalmind303 universalmind303 left a comment

so this 100% needs to be done on the rust side of things via threadsafe-function. napi-v2 threadsafe functions don't really work the way we need to, so i think we'd need to upgrade to napi v3 first.

the current implementation in this PR is going to be painfully slow for any non toy examples

Collaborator Author

Bidek56 commented Dec 8, 2025

We are already using Napi v3.5

I have tested this simple example on 1M rows and it takes 0.5 sec on my laptop using the current JS implementation.

const mapping: Record<number, number> = { 1: 11, 2: 22, 3: 33, 4: 44, 11111: 4444 };
const funcMap = (k: number): number => mapping[k] ?? "";
pl.Series("foo", regularArray, pl.Int32).mapElements(funcMap);

Would you consider that slow?
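
For anyone wanting to reproduce a measurement of this kind, a rough timing harness might look like the following. It is a sketch under stated assumptions: `timeMs` is a hypothetical helper, the plain `Array.prototype.map` call stands in for `Series.mapElements`, and the input is generated here because the original `regularArray` is not shown. Timings from it say nothing about the polars code path itself.

```typescript
// Hypothetical helper: runs `fn` once and reports elapsed wall-clock milliseconds.
function timeMs<T>(fn: () => T): { result: T; ms: number } {
  const start = Date.now();
  const result = fn();
  return { result, ms: Date.now() - start };
}

const mapping: Record<number, number> = { 1: 11, 2: 22, 3: 33, 4: 44, 11111: 4444 };
// Missing keys map to null here (an assumption, not the original fallback).
const funcMap = (k: number): number | null => mapping[k] ?? null;

// Assumed input: 1M integers cycling through 0..4 (the original data is not shown).
const input: number[] = Array.from({ length: 1_000_000 }, (_, i) => i % 5);

// Plain-array stand-in for pl.Series(...).mapElements(funcMap).
const { result, ms } = timeMs(() => input.map((k) => funcMap(k)));
```

A single run is noisy; repeating the call several times and taking the median would give a steadier number.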

Collaborator Author

Bidek56 commented Dec 8, 2025

A Rust implementation of series.mapElements on a 1M-element int array takes 0.45 sec, slightly faster than JS (0.5 sec), but I would need to find a clever way to express all the possible type combinations in Rust, not just Function<i32, i32>.
This is my Rust implementation for an i32 dtype:

#[napi(catch_unwind)]
pub fn map_elements(&self, lambda: Function<i32, i32>) -> JsSeries {
  // Caveats: `x.unwrap()` panics on nulls; `filter_map(Result::ok)` drops failed calls.
  let s1: Vec<i32> = self.series.i32().unwrap().iter()
    .map(|x| lambda.call(x.unwrap()))
    .filter_map(Result::ok)
    .collect();
  let ss = Series::new(self.series.name().clone(), s1);
  JsSeries::new(ss)
}

Collaborator

universalmind303 commented Dec 9, 2025

so primitive types will generally be much faster, as they are backed by zero-copy data types on the js side. complex and nested types, however, are not. I'd also want to see what the perf is when running on node.js. Bun has a highly optimized napi implementation that's way faster than node.js. I'd suspect node.js to be orders of magnitude slower.

For a rust based map_elements, the best way would probably be to just use AnyValue. There's some minimal conversion cost, but it should cover all combinations without writing specialized functions for every signature: Function<AnyValue, AnyValue>.
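
The appeal of a single dynamic value type can be illustrated on the JS side too. The sketch below is an analogy only: `AnyValueSketch` is a hypothetical TypeScript union, not the polars `AnyValue` enum, and `mapAny` and `transform` are made-up names. It shows how one callback signature can cover several dtypes at the cost of a per-element type check.

```typescript
// Hypothetical union loosely mirroring the idea of a dynamic value type.
type AnyValueSketch = null | boolean | number | string;

// One generic entry point instead of one function per (input, output) dtype pair.
function mapAny(
  values: readonly AnyValueSketch[],
  fn: (v: AnyValueSketch) => AnyValueSketch
): AnyValueSketch[] {
  return values.map((v) => fn(v));
}

// The callback pays a small per-element cost to discover the runtime type.
const transform = (v: AnyValueSketch): AnyValueSketch => {
  if (v === null) return null;
  if (typeof v === "number") return v * 2;
  if (typeof v === "string") return v.toUpperCase();
  return v; // booleans pass through unchanged
};

const out = mapAny([1, "a", null, true], transform);
// out is [2, "A", null, true]
```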

Collaborator Author

Bidek56 commented Dec 9, 2025

The same test example on Node.js 25 runs in 0.4 sec.
Bun will default to Node.js without the --bun flag.
When using bun run --bun ..., it completes in 0.292 sec.
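
Because the runtime affects these numbers, it can help to record which one actually executed a benchmark. A small check, relying only on the `process.versions.bun` field that Bun sets, could look like this; `detectRuntime` is a hypothetical helper name, not part of this PR.

```typescript
// Hypothetical helper: reports which JavaScript runtime is executing the script.
// Bun sets `process.versions.bun`; under Node.js that field is absent.
function detectRuntime(): "bun" | "node" | "unknown" {
  const versions: Record<string, string | undefined> | undefined =
    (globalThis as any).process?.versions;
  if (versions?.bun) return "bun";
  if (versions?.node) return "node";
  return "unknown";
}

// Store this next to each timing so results from different runtimes are not mixed up.
const runtime = detectRuntime();
```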

Collaborator Author

Bidek56 commented Dec 9, 2025

This Rust implementation is actually 20-40 ms slower than JS on a 1M-element int32 series.

pub fn map_elements(&self, lambda: Function<Wrap<AnyValue>, Wrap<AnyValue>>) -> JsSeries {
  let out: Vec<AnyValue> = self.series.iter().map(|av| {
      let wrapped_in = Wrap(av);
      lambda.call(wrapped_in).map(|w| w.0)
  }).collect::<Result<_, _>>().unwrap();
  let ss = Series::new(self.series.name().clone(), out);
  JsSeries::new(ss)
}

When using a more complex series with 1M rows:

const data = [
    { utf8: "a", f64: 1 },
    { utf8: "b", f64: 2 },
];

The Rust implementation is about 200 ms slower: Rust 2,210 ms vs JS 2,007 ms.

Development

Successfully merging this pull request may close these issues.

Map and apply function to perform custom operations

2 participants