
DPY-4009 when using direct_path_load() for inserting dataframe in batches #551

@urosdigital

Description

  1. What versions are you using?

oracledb version: 3.4.1

Give your Oracle Database version, e.g.:

DB version: 19.15.0.0.0

Give your Oracle Client version (if you are using Thick mode):

NO. Using Thin mode

Also run Python and show the output of:

import sys
import platform

print("platform.platform:", platform.platform())
print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
print("platform.python_version:", platform.python_version())

platform.platform: Windows-11-10.0.22631-SP0
sys.maxsize > 2**32: True
platform.python_version: 3.13.5

And:

print("oracledb.__version__:", oracledb.__version__)

oracledb.__version__: 3.4.1

  2. Is it an error or a hang or a crash?

An error

  3. What error(s) or behavior are you seeing?
code: 0
full code: DPY-4009
error message: DPY-4009: 15 positional bind values are required but 16 were provided
  4. Does your application call init_oracle_client()?

No. Thin mode

  5. Include a runnable Python script that shows the problem.

Hello,

I'm trying to insert a few million records into the database in batches, using a pandas DataFrame and connection.direct_path_load().
Below is the code I'm using:

# conn, df, user and tablename are created earlier (connection, DataFrame, schema and table name)
print(f'oracledb version: {oracledb.__version__}')
print(f'DB version: {conn.version}')
print(f'Thin mode: {conn.thin}')
columns = list(df.columns)
print(columns)
print(f'Number of columns: {len(columns)}')
print(f'Number of records in dataframe: {len(df)}')

batchsize=1000000
print(f'Start inserting dataframe into temp table, batch size: {batchsize}')
conn.direct_path_load(
	schema_name=user,
	table_name=tablename,
	column_names=columns,
	data=df,
	batch_size=batchsize
)

When I execute this code, I get the following error:

oracledb version: 3.4.1
DB version: 19.15.0.0.0
Thin mode: True
['CDR_KEY', 'PERIOD', 'ISID', 'FILE_NAME', 'CDR_INDEX', 'STATUS', 'TOTAL_CHARGE', 'TOTAL_TAX', 'TOTAL_DISCOUNT', 'SYS_CREATION_DATE', 'SYS_UPDATE_DATE', 'OPERATOR_ID', 'APPLICATION_ID', 'DL_SERVICE_CODE', 'DL_UPDATE_STAMP']
Number of columns: 15
Number of records in df: 9103493
Start inserting dataframe into temp table, batch size: 1000000


code: 0
full code: DPY-4009
error message: DPY-4009: 15 positional bind values are required but 16 were provided

If I convert the df into a list of tuples (bind values), the operation completes without errors:

tuples_ready_for_insertion = [tuple(x) for x in df.values]
connection.direct_path_load(
	schema_name=user,
	table_name=tablename,
	column_names=columns,
	data=tuples_ready_for_insertion,
	batch_size=100000
)
oracledb version: 3.4.1
DB version: 19.15.0.0.0
Thin mode: True
['CDR_KEY', 'PERIOD', 'ISID', 'FILE_NAME', 'CDR_INDEX', 'STATUS', 'TOTAL_CHARGE', 'TOTAL_TAX', 'TOTAL_DISCOUNT', 'SYS_CREATION_DATE', 'SYS_UPDATE_DATE', 'OPERATOR_ID', 'APPLICATION_ID', 'DL_SERVICE_CODE', 'DL_UPDATE_STAMP']
Number of columns: 15
Number of records in df: 9103493
Start inserting dataframe into temp table, batch size: 1000000

Is this a bug maybe?
Also, please note that before inserting records I create a temp table; for this operation I do not need an explicit commit
(https://python-oracledb.readthedocs.io/en/latest/user_guide/txn_management.html#managing-transactions). What bothers me is
that once the insertion using connection.direct_path_load() completes, the changes are committed. Is this expected behaviour
when using direct_path_load()?

What I would also like to know: is it possible to find out whether some of the records were not inserted (had errors), the number of inserted
records, etc., like when we use executemany(batcherrors=True) and cursor.rowcount?
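For reference, this is the kind of reporting I mean. A sketch of the executemany() pattern (the table name and binds here are hypothetical, and conn is an open oracledb connection):

```python
# Sketch: row-level error reporting with executemany(); the table name and
# binds are hypothetical placeholders for my real 15-column temp table.
def insert_with_batch_errors(conn, rows):
    with conn.cursor() as cursor:
        cursor.executemany(
            "insert into my_temp_table (cdr_key, period) values (:1, :2)",
            rows,
            batcherrors=True,  # keep going past bad rows, collect their errors
        )
        for err in cursor.getbatcherrors():
            # err.offset is the index of the failing input row
            print("row", err.offset, "failed:", err.message)
        # cursor.rowcount = total rows successfully inserted by this call
        return cursor.rowcount
```

Something equivalent for direct_path_load() would be very useful for loads of this size.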

BR,
Uros

Labels: bug