
Commit 77a0d7e

Refactored DataFrame JDBC API plus DataSource handling (#1487)

* Refactored DataFrame JDBC API for enhanced DataSource handling

  This commit introduces new DataFrame JDBC extension functions with `DataSource` support, removing redundant duplicate utilities. It includes revisions to streamline table/schema reading and validation features, delegating reusable connection-handling logic to a dedicated utility class. Also refactor file structure for better organization of DB-related code.
* Refactored schema extraction to use `readSqlTable` and `readSqlQuery` for consistency and improved readability.
* Refactored and modularized schema extraction utilities into a dedicated file `readDataFrameSchema.kt`, improving organization and code clarity. Converted several helper functions to `internal` visibility for encapsulation.
* Refactor: Replace `DataFrame` with `DataFrameSchema` for schema-related methods in tests and main codebase
* Update logging levels in validation utilities to debug and minor schema interface cleanup
* Refactor: support custom `PreparedStatement` configuration, unify query limits, and standardize identifier quoting for various database dialects
* Refactor: enhance `DbType` with batch size and query timeout properties, improve result set processing, and streamline table metadata handling
* Refactor: centralize `makeCommonSqlToKTypeMapping` in `DbType`, streamline SQL type handling and improve readability
* Refactored query execution logic by introducing `readDataFrameFromDatabase` utility for improved code reuse and clarity.
* Refactored ResultSet-processing utilities to use mutable lists for improved post-processing efficiency and reduced copying. Adjusted related functions to accept mutable lists accordingly.
* Add `configureStatement` missed parameters
* Refactored JDBC utilities: added comprehensive error handling in `readDataFrameFromDatabase`, converted data classes to classes with equality and hashCode implementations, added validation for database interaction methods.
* Update the exception type in the `read from non-existing table` test for accuracy
* Renamed schema extraction functions from `getSchemaFor*` to `from*` for consistency and clarity across the DataFrame JDBC API. Updated all usages and tests accordingly.
* Rename `fromSqlTable` and `fromSqlQuery` to `readSqlTable` and `readSqlQuery` for consistency with updated naming conventions.
* Update `GenerateDataSchemaTask` to use `DataFrameSchema` methods for SQL table and query schema generation
* Refactor: improve code consistency, update parameter documentation, streamline SQL query-related methods, and enhance table schema handling
* Replace `DEFAULT_LIMIT` with nullable `limit` parameter, defaulting to `null` for unlimited row fetching. Updated all related methods and documentation for clarity.
* Add `validateLimit` utility to ensure limit parameter is null or positive; removed redundant exception handling and updated query-building logic.
* Add `validateLimit` calls across all JDBC read methods to enforce limit validation.
* Clarify "limit" parameter documentation and rename `readDataFrameFromDatabase` to `executeQueryAndBuildDataFrame` across JDBC methods for improved readability and consistency.
* Refactor JDBC data handling: relocate and centralize `buildSchemaByTableColumns` and `buildDataColumn` functions, streamline column post-processing logic, and enhance modularity across schema and SQL utilities.
* Ktlint with Junie
* Linter with Junie, part 2
* Refactor and enhance JDBC: update references for improved consistency, streamline `TableMetadata` class with compact constructor and copy method, and remove unused `columnMetadata` parameter from `buildDataColumn`.
* Add `DataFrameSchema.Companion` class to core API
1 parent a27e415 commit 77a0d7e
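
The commit message describes `validateLimit` as a check that the `limit` parameter is either `null` (unlimited row fetching) or positive. Its implementation is not visible in the rendered diff, so the following is only a minimal sketch of that contract. The `Int?` parameter type, the error message, and the use of `IllegalArgumentException` are assumptions.

```kotlin
// Hypothetical sketch; the real `validateLimit` in dataframe-jdbc may differ
// in signature, visibility, and exception type.
fun validateLimit(limit: Int?) {
    // `null` means "no limit"; any non-null value must be strictly positive.
    require(limit == null || limit > 0) {
        "Illegal limit value: $limit. Limit must be null or a positive integer."
    }
}

fun main() {
    validateLimit(null) // accepted: unlimited
    validateLimit(10)   // accepted: at most 10 rows

    // Zero and negative values are rejected.
    val rejected = runCatching { validateLimit(0) }.exceptionOrNull()
    println(rejected is IllegalArgumentException) // prints: true
}
```

Validating once at the entry point of each read method keeps the query-building code free of defensive checks, which matches the "removed redundant exception handling" note in the message.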

33 files changed, +2009 -1124 lines changed

core/api/core.api

Lines changed: 4 additions & 0 deletions
@@ -6810,11 +6810,15 @@ public final class org/jetbrains/kotlinx/dataframe/schema/ComparisonMode : java/
 }
 
 public abstract interface class org/jetbrains/kotlinx/dataframe/schema/DataFrameSchema {
+	public static final field Companion Lorg/jetbrains/kotlinx/dataframe/schema/DataFrameSchema$Companion;
 	public abstract fun compare (Lorg/jetbrains/kotlinx/dataframe/schema/DataFrameSchema;Lorg/jetbrains/kotlinx/dataframe/schema/ComparisonMode;)Lorg/jetbrains/kotlinx/dataframe/schema/CompareResult;
 	public static synthetic fun compare$default (Lorg/jetbrains/kotlinx/dataframe/schema/DataFrameSchema;Lorg/jetbrains/kotlinx/dataframe/schema/DataFrameSchema;Lorg/jetbrains/kotlinx/dataframe/schema/ComparisonMode;ILjava/lang/Object;)Lorg/jetbrains/kotlinx/dataframe/schema/CompareResult;
 	public abstract fun getColumns ()Ljava/util/Map;
 }
 
+public final class org/jetbrains/kotlinx/dataframe/schema/DataFrameSchema$Companion {
+}
+
 public final class org/jetbrains/kotlinx/dataframe/util/DeprecationMessagesKt {
 	public static final field DF_READ_EXCEL Ljava/lang/String;
 }

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/schema/DataFrameSchema.kt

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 package org.jetbrains.kotlinx.dataframe.schema
 
 public interface DataFrameSchema {
+    public companion object;
 
     public val columns: Map<String, ColumnSchema>
 
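
An empty companion object is usually declared so that other modules can attach extension functions to the type name itself, which fits the commit message's mention of `DataFrameSchema` methods for SQL schema generation. The sketch below illustrates the language mechanism only. The `Schema` interface and the `fromColumnNames` extension are stand-ins invented for this example and are not part of the DataFrame API.

```kotlin
// Stand-in for an interface that declares an empty companion object,
// mirroring the shape of the change above. Not the real DataFrameSchema.
interface Schema {
    companion object

    val columns: Map<String, String>
}

// Hypothetical extension declared on the companion; another module could add
// this without touching the interface itself.
fun Schema.Companion.fromColumnNames(vararg names: String): Schema =
    object : Schema {
        override val columns: Map<String, String> =
            names.toList().associateWith { "Any?" }
    }

fun main() {
    // The extension is called on the type name, like a static factory.
    val schema = Schema.fromColumnNames("id", "title")
    println(schema.columns.keys) // prints: [id, title]
}
```

Without the companion declaration there would be no `Schema.Companion` receiver to extend, so the call `Schema.fromColumnNames(...)` would not compile.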

dataframe-jdbc/api/dataframe-jdbc.api

Lines changed: 136 additions & 118 deletions
Large diffs are not rendered by default.
Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
+package org.jetbrains.kotlinx.dataframe.io
+
+/**
+ * Represents the configuration for an internally managed JDBC database connection.
+ *
+ * This class defines connection parameters used by the library to create a `Connection`
+ * when the user does not provide one explicitly.
+ * It is designed for safe, read-only access by default.
+ *
+ * __NOTE:__ Connections created using this configuration are managed entirely by the library.
+ * Users do not have access to the underlying `Connection` instance and cannot commit or close it manually.
+ *
+ * ### Read-Only Mode Behavior:
+ *
+ * When [readOnly] is `true` (default), the connection operates in read-only mode with:
+ * - `Connection.setReadOnly(true)`
+ * - `Connection.setAutoCommit(false)`
+ * - automatic `rollback()` at the end of execution
+ *
+ * When [readOnly] is `false`, the connection uses JDBC defaults (usually read-write),
+ * but the library still rejects any queries that appear to modify data
+ * (e.g. contain `INSERT`, `UPDATE`, `DELETE`, etc.).
+ *
+ * ### Examples:
+ *
+ * ```kotlin
+ * // Safe read-only connection (default)
+ * val config = DbConnectionConfig("jdbc:sqlite::memory:")
+ * val df = DataFrame.readSqlQuery(config, "SELECT * FROM books")
+ *
+ * // Use default JDBC connection settings (still protected against mutations)
+ * val config = DbConnectionConfig(
+ *     url = "jdbc:sqlite::memory:",
+ *     readOnly = false
+ * )
+ * ```
+ *
+ * @property [url] The JDBC URL of the database, e.g., `"jdbc:postgresql://localhost:5432/mydb"`.
+ * Must follow the standard format: `jdbc:subprotocol:subname`.
+ *
+ * @property [user] The username used for authentication.
+ * Optional, default is an empty string.
+ *
+ * @property [password] The password used for authentication.
+ * Optional, default is an empty string.
+ *
+ * @property [readOnly] If `true` (default), enables read-only mode. If `false`, uses JDBC defaults
+ * but still prevents data-modifying queries. See class documentation for details.
+ */
+public class DbConnectionConfig(
+    public val url: String,
+    public val user: String = "",
+    public val password: String = "",
+    public val readOnly: Boolean = true,
+) {
+    override fun equals(other: Any?): Boolean {
+        if (this === other) return true
+        if (other !is DbConnectionConfig) return false
+
+        if (url != other.url) return false
+        if (user != other.user) return false
+        if (password != other.password) return false
+        if (readOnly != other.readOnly) return false
+
+        return true
+    }
+
+    override fun hashCode(): Int {
+        var result = url.hashCode()
+        result = 31 * result + user.hashCode()
+        result = 31 * result + password.hashCode()
+        result = 31 * result + readOnly.hashCode()
+        return result
+    }
+
+    override fun toString(): String = "DbConnectionConfig(url='$url', user='$user', password='***', readOnly=$readOnly)"
+
+    /**
+     * Creates a copy of this configuration with the option to override specific properties.
+     *
+     * @param url The JDBC URL. If not specified, uses the current value.
+     * @param user The username. If not specified, uses the current value.
+     * @param password The password. If not specified, uses the current value.
+     * @param readOnly The read-only flag. If not specified, uses the current value.
+     * @return A new [DbConnectionConfig] instance with the specified changes.
+     */
+    public fun copy(
+        url: String = this.url,
+        user: String = this.user,
+        password: String = this.password,
+        readOnly: Boolean = this.readOnly,
+    ): DbConnectionConfig = DbConnectionConfig(url, user, password, readOnly)
+}
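
The KDoc above lists three steps for read-only mode: `setReadOnly(true)`, `setAutoCommit(false)`, and a `rollback()` at the end of execution. The code that applies them is not part of the rendered diff, so the following is only a sketch of how such a wrapper might look with plain JDBC. The helper names `withReadOnlyMode` and `recordingConnection` are hypothetical, and the stub connection exists only so the example runs without a JDBC driver.

```kotlin
import java.lang.reflect.InvocationHandler
import java.lang.reflect.Proxy
import java.sql.Connection

// Hypothetical helper, not the library's code: applies the documented
// read-only settings, runs the block, and always rolls back afterwards.
fun <T> withReadOnlyMode(connection: Connection, readOnly: Boolean, block: (Connection) -> T): T {
    if (!readOnly) return block(connection) // JDBC defaults, nothing to undo
    connection.setReadOnly(true)
    connection.setAutoCommit(false)
    try {
        return block(connection)
    } finally {
        // Discards anything the block may have changed, even on failure.
        connection.rollback()
    }
}

// Stub connection that only records which methods were called, so the
// example runs without a JDBC driver on the classpath.
fun recordingConnection(calls: MutableList<String>): Connection {
    val handler = InvocationHandler { _, method, _ ->
        calls += method.name
        null // every method used in this example returns void
    }
    return Proxy.newProxyInstance(
        Connection::class.java.classLoader,
        arrayOf<Class<*>>(Connection::class.java),
        handler,
    ) as Connection
}

fun main() {
    val calls = mutableListOf<String>()
    val result = withReadOnlyMode(recordingConnection(calls), readOnly = true) { "query result" }
    println(result) // prints: query result
    println(calls)  // prints: [setReadOnly, setAutoCommit, rollback]
}
```

Placing `rollback()` in a `finally` block means the transaction is discarded whether the query succeeds or throws, which is one way to honor the "automatic `rollback()` at the end of execution" wording.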
