@@ -51,6 +51,8 @@ To fetch in batches, use an iterator:
 
 .. code-block:: python
 
+    import pyarrow
+
     sql = "select * from departments where department_id < 80"
     # Adjust "size" to tune the query fetch performance
     # Here it is small to show iteration
@@ -144,6 +146,10 @@ Oracle Database will result in an exception. :ref:`Output type handlers
       - TIMESTAMP
     * - DB_TYPE_VARCHAR
       - STRING
+    * - DB_TYPE_VECTOR
+      - List or struct with DOUBLE, FLOAT, INT8, or UINT8 values
+
+**Numbers**
 
 When converting Oracle Database NUMBERs:
 
@@ -158,10 +164,51 @@ When converting Oracle Database NUMBERs:
 
 - In all other cases, the Arrow data type is DOUBLE.
 
+**Vectors**
+
+When converting Oracle Database VECTORs:
+
+- Dense vectors are fetched as lists.
+
+- Sparse vectors are fetched as structs with fields ``num_dimensions``,
+  ``indices`` and ``values`` similar to :ref:`SparseVector objects
+  <sparsevectorsobj>`.
+
+- VECTOR columns with flexible dimensions are supported.
+
+- VECTOR columns with flexible formats are not supported. Each vector value
+  must have the same storage format data type.
+
+- Vector values are fetched as the following types:
+
+  .. list-table-with-summary::
+      :header-rows: 1
+      :class: wy-table-responsive
+      :widths: 1 1
+      :align: left
+      :summary: The first column is the Oracle Database VECTOR format. The second column is the resulting Arrow data type in the list.
+
+      * - Oracle Database VECTOR format
+        - Arrow data type
+      * - FLOAT64
+        - DOUBLE
+      * - FLOAT32
+        - FLOAT
+      * - INT8
+        - INT8
+      * - BINARY
+        - UINT8
+
+See :ref:`dfvector` for more information.
+
+**LOBs**
+
 When converting Oracle Database CLOBs and BLOBs:
 
 - The LOBs must be no more than 1 GB in length.
 
+**Dates and Timestamps**
+
 When converting Oracle Database DATEs and TIMESTAMPs:
 
 - Arrow TIMESTAMPs will not have timezone data.
@@ -236,6 +283,8 @@ An example that creates and uses a `PyArrow Table
 
 .. code-block:: python
 
+    import pyarrow
+
     # Get an OracleDataFrame
     # Adjust arraysize to tune the query fetch performance
     sql = "select id, name from SampleQueryTab order by id"
@@ -303,8 +352,8 @@ An example that creates and uses a `Polars DataFrame
 
 .. code-block:: python
 
-    import pyarrow
     import polars
+    import pyarrow
 
     # Get an OracleDataFrame
     # Adjust arraysize to tune the query fetch performance
@@ -377,8 +426,8 @@ For example, to convert to `NumPy <https://numpy.org/>`__ ``ndarray`` format:
 
 .. code-block:: python
 
-    import pyarrow
     import numpy
+    import pyarrow
 
     SQL = "select id from SampleQueryTab order by id"
 
@@ -426,3 +475,150 @@ An example of working with data as a `Torch tensor
 
 See `samples/dataframe_torch.py <https://github.com/oracle/python-oracledb/
 blob/main/samples/dataframe_torch.py>`__ for a runnable example.
+
+.. _dfvector:
+
+Using VECTOR data with Data Frames
+----------------------------------
+
+Columns of the `VECTOR <https://www.oracle.com/pls/topic/lookup?ctx=dblatest&
+id=GUID-746EAA47-9ADA-4A77-82BB-64E8EF5309BE>`__ data type can be fetched with
+the methods :meth:`Connection.fetch_df_all()` and
+:meth:`Connection.fetch_df_batches()`. VECTOR columns can have flexible
+dimensions, but flexible storage formats are not supported: each vector value
+must have the same format data type. Vectors can be dense or sparse.
+
+See :ref:`dftypemapping` for the type mapping for VECTORs.
+
+**Dense Vectors**
+
+By default, Oracle Database vectors are "dense". These are fetched in
+python-oracledb as Arrow lists. For example, if the table::
+
+    create table myvec (v64 vector(3, float64));
+
+contains these two vectors::
+
+    [4.1, 5.2, 6.3]
+    [7.1, 8.2, 9.3]
+
+then the code:
+
+.. code-block:: python
+
+    odf = connection.fetch_df_all("select v64 from myvec")
+    pyarrow_table = pyarrow.Table.from_arrays(
+        odf.column_arrays(), names=odf.column_names()
+    )
+
+will result in a PyArrow table containing lists of doubles. The table can be
+converted to a data frame of your chosen library using functionality described
+earlier in this chapter. For example, to convert to Pandas:
+
+.. code-block:: python
+
+    pdf = pyarrow_table.to_pandas()
+    print(pdf)
+
+The output will be::
+
+                   V64
+    0  [4.1, 5.2, 6.3]
+    1  [7.1, 8.2, 9.3]
+
+**Sparse Vectors**
+
+Sparse vectors (where many of the values are 0) are fetched as structs with
+fields ``num_dimensions``, ``indices``, and ``values`` similar to
+:ref:`SparseVector objects <sparsevectorsobj>` which are discussed in a
+non-data frame context in :ref:`sparsevectors`.
+
+If the table::
+
+    create table myvec (v64 vector(3, float64, sparse));
+
+contains these two vectors::
+
+    [3, [1,2], [4.1, 5.2]]
+    [3, [0], [9.3]]
+
+then the code to fetch as data frames:
+
+.. code-block:: python
+
+    import pyarrow
+
+    odf = connection.fetch_df_all("select v64 from myvec")
+    pdf = pyarrow.Table.from_arrays(
+        odf.column_arrays(), names=odf.column_names()
+    ).to_pandas()
+
+    print(pdf)
+
+    print("First row:")
+
+    num_dimensions = pdf.iloc[0].V64['num_dimensions']
+    print(f"num_dimensions={num_dimensions}")
+
+    indices = pdf.iloc[0].V64['indices']
+    print(f"indices={indices}")
+
+    values = pdf.iloc[0].V64['values']
+    print(f"values={values}")
+
+will display::
+
+                                                     V64
+    0  {'num_dimensions': 3, 'indices': [1, 2], 'valu...
+    1  {'num_dimensions': 3, 'indices': [0], 'values'...
+
+    First row:
+    num_dimensions=3
+    indices=[1 2]
+    values=[4.1 5.2]
+
+You can convert each struct as needed. One way to convert into `Pandas
+dataframes with sparse values
+<https://pandas.pydata.org/docs/user_guide/sparse.html>`__ is via a `SciPy
+coordinate format matrix <https://docs.scipy.org/doc/scipy/reference/generated/
+scipy.sparse.coo_array.html#scipy.sparse.coo_array>`__. The Pandas method
+`from_spmatrix() <https://pandas.pydata.org/docs/reference/api/
+pandas.DataFrame.sparse.from_spmatrix.html>`__ can then be used to create the
+final sparse dataframe:
+
+.. code-block:: python
+
+    import numpy
+    import pandas
+    import pyarrow
+    import scipy
+
+    def convert_to_sparse_array(val):
+        dimensions = val["num_dimensions"]
+        col_indices = val["indices"]
+        row_indices = numpy.zeros(len(col_indices))
+        values = val["values"]
+        sparse_matrix = scipy.sparse.coo_matrix(
+            (values, (col_indices, row_indices)), shape=(dimensions, 1))
+        return pandas.arrays.SparseArray.from_spmatrix(sparse_matrix)
+
+    odf = connection.fetch_df_all("select v64 from myvec")
+    pdf = pyarrow.Table.from_arrays(
+        odf.column_arrays(), odf.column_names()
+    ).to_pandas()
+
+    pdf["SPARSE_ARRAY_V64"] = pdf["V64"].apply(convert_to_sparse_array)
+
+    print(pdf.SPARSE_ARRAY_V64)
+
+The code will print::
+
+    0    [0.0, 4.1, 5.2]
+    Fill: 0.0
+    IntIndex
+    Indices: ar...
+    1    [9.3, 0.0, 0.0]
+    Fill: 0.0
+    IntIndex
+    Indices: ar...
+    Name: SPARSE_ARRAY_V64, dtype: object
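
Reviewer sketch (not part of the diff): for quick inspection without SciPy, each sparse struct could also be expanded into a plain dense list. The helper name ``sparse_struct_to_dense`` is hypothetical and is not a python-oracledb API. The input dictionaries are written by hand to mirror the ``num_dimensions``, ``indices``, and ``values`` fields of the two sample rows in the documentation, so nothing here is fetched from a database.

```python
def sparse_struct_to_dense(val, fill=0.0):
    """Expand a sparse-vector struct into a dense Python list.

    Hypothetical helper: "val" is any mapping with the fields
    num_dimensions, indices, and values described in the docs.
    """
    dense = [fill] * val["num_dimensions"]
    # Place each stored value at its index; all other slots keep "fill"
    for index, value in zip(val["indices"], val["values"]):
        dense[index] = value
    return dense


# Hand-written stand-ins for the two sample rows of the V64 column
rows = [
    {"num_dimensions": 3, "indices": [1, 2], "values": [4.1, 5.2]},
    {"num_dimensions": 3, "indices": [0], "values": [9.3]},
]

dense_rows = [sparse_struct_to_dense(row) for row in rows]
print(dense_rows)
```

For the sample rows this yields the same dense values that the ``SparseArray`` output in the documentation displays, at the cost of giving up the sparse storage.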