Skip to content

Conversation

@Jolanrensen
Copy link
Collaborator

This PR contains a handful of fixes and optimizations regarding Nothing(?) in column types:

  • Fixes DataFrame with Nothing? column is detected as FormattedFrame<*>? in notebooks #1546 and adds a helpful note to colsOf to prevent future confusion
  • DataColumn.empty() created a Unit-typed column. I have no idea why, I made it Nothing, which better represents that emptyCol.get(0): Nothing will always throw an exception. Added DataColumn.emptyOf<T>() if you need a specifically typed empty column.
  • #1546 also discovered that exceptions in update provided little info, so I added UpdateException for this.
  • While inspecting Column DataCollector for null-filled or empty columns, I rewrote ::class-logic to typeOf's (for efficiency), and added redundancy checks:
    • (If the data collector should produce a Nothing column, it can only do that if it has no values; so the resulting column must be empty)
    • (if the data collector should produce a Nothing? column, it can only do that if all its values are null)

I also checked usages of colsOf() querying for nullable types to see if similar issues like #1546 existed. Fortunately, in most cases, including null-filled columns are not an issue! All statistics functions simply skip nulls, for instance.

There are some unfortunate cases, like:

dataFrameOf(
    "a" to columnOf(1, 2, 3, null),
    "b" to columnOf(null, null, null, null),
    "c" to columnOf(7, 3, 2, 65),
).fillNulls { colsOf<Int?>() }.with { 0 }

which throw java.lang.IllegalArgumentException: Can not add value of class kotlin.Int to column of type kotlin.Nothing?. Value = 0 because column b: Nothing? is included by colsOf<Int?>(). However, hopefully, the note to add .filter { !it.allNulls() } is found :). Alternatively, we could decide to add an argument colsOf<Int?>(exactType = true) in the future.

…urate than `Unit` as getting something out of this column will always result in an exception, not "just run", like `Unit` does. Added emptyOf for if you need a specific type
…dded redundancy checks for nothing types in column creation
@Jolanrensen Jolanrensen added this to the 1.0.0-Beta4 milestone Nov 6, 2025
@Jolanrensen Jolanrensen requested a review from koperagen November 6, 2025 20:06
@Jolanrensen Jolanrensen added bug Something isn't working enhancement New feature or request labels Nov 6, 2025

public class UpdateException(override val message: String, cause: Throwable? = null) :
IllegalStateException(message, cause),
DataFrameError
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to add this supertype only if you have example of code that triggers this in compiler plugin

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that might be difficult, as it most often relies on runtime information. (besides maybe update {}.with { error() }). I thought we now preferred to use the DataFrameError type for any new exception we create

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think about it as a way to opt-in reporting warnings for exceptions, that we know for sure make sense for user. In theory we could report any exception. DataFrameError will help us to validate them first

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

soo... DataFrameWarning? ;P

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be

* Creates an empty [DataColumn] with given [name].
* @see emptyOf
*/
public fun empty(name: String = ""): DataColumn<Nothing> =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have two operations that use empty: concatImpl and getColumn. Please see if they can be affected by this change. Otherwise it's quite neat

Copy link
Collaborator Author

@Jolanrensen Jolanrensen Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems concat fails on dataframes without rows, but not because of Nothing columns, but because it uses ::class reflection in createColumnGuessingType(), interesting

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed it in the next commit :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DataFrame with Nothing? column is detected as FormattedFrame<*>? in notebooks

3 participants