@architch commented Jun 23, 2025

Description

This change optimises byte array to hex string conversion.
The current implementation uses string formatting, which has to parse the format string for every byte. This costs a lot of time and CPU.
The new implementation uses bit operations instead.
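
To illustrate what "bit operations" can look like for this conversion, here is a minimal sketch. It is not the code from this patch: the names `HexSketch`, `HexDigits` and `toHex` are hypothetical, and the lookup-table approach is assumed from the general technique rather than taken from the diff.

```scala
object HexSketch {
  // Lookup table: index is a nibble value (0-15), value is its lowercase hex digit.
  private val HexDigits: Array[Char] = "0123456789abcdef".toCharArray

  // Hypothetical helper for illustration only; no "0x" prefix is added here.
  def toHex(bytes: Array[Byte]): String = {
    // Two hex characters per byte, so the builder never has to grow.
    val sb = new StringBuilder(bytes.length * 2)
    var i = 0
    while (i < bytes.length) {
      val b = bytes(i) & 0xff        // widen to Int and drop the sign extension
      sb.append(HexDigits(b >>> 4))  // high nibble
      sb.append(HexDigits(b & 0x0f)) // low nibble
      i += 1
    }
    sb.toString
  }
}
```

For example, `HexSketch.toHex(Array[Byte](0, 15, -1))` returns `"000fff"`. No format string is parsed per byte; each byte costs one mask, one shift and two array lookups.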

How did the Spark Cassandra Connector Work or Not Work Before this Patch

The code below compares the time taken by the two implementations (from >600 ms down to <20 ms for a 1-million-byte array).

// Create a deterministic array of 1 million bytes with mixed positive and negative values
val byteArray = (1 to 1000000).map(x => ((x % 128) * (if (x % 2 == 0) -1 else 1)).toByte).toArray

// Time taken by the current code: ~800 ms
var start = System.currentTimeMillis()
var hexString = "0x" + byteArray.map("%02x" format _).mkString
var end = System.currentTimeMillis()
println("time taken : " + (end - start)) // time taken : 854


// Time taken by the new code: ~15 ms
// (byteArrayToHexString is the new conversion introduced by this patch)
start = System.currentTimeMillis()
hexString = byteArrayToHexString(byteArray)
end = System.currentTimeMillis()
println("time taken : " + (end - start)) // time taken : 14

General Design of the patch

Why pursue this particular fix?
We found many threads in our application busy parsing the format string while converting byte arrays to strings.

Fixes: CASSANALYTICS-68

How Has This Been Tested?

Unit Tests.

Checklist:

  • I have a ticket in the JIRA
  • I have performed a self-review of my own code
  • Locally all tests pass (make sure tests fail without your patch)

@architch changed the title from "optimise byte array to hex string conversion" to "CASSANALYTICS-68: Optimise byte array to hex string conversion" on Jun 23, 2025
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
)
private def byteArrayToString (x: Array[Byte]) : String = {
val result = new StringBuilder(x.length * 2)

Nit: Why not use x.length * 2 + 2 as the capacity and insert the 0x prefix up front? Then you can simply return result.toString()
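
One possible reading of this suggestion, as a sketch only: the names `PrefixedHexSketch`, `HexDigits` and `toPrefixedHex` are hypothetical, and the loop body is an assumption, since the diff above shows only the first lines of the method.

```scala
object PrefixedHexSketch {
  // Lookup table: index is a nibble value (0-15), value is its lowercase hex digit.
  private val HexDigits: Array[Char] = "0123456789abcdef".toCharArray

  // Hypothetical variant that writes the "0x" prefix into the builder itself.
  def toPrefixedHex(x: Array[Byte]): String = {
    // Two characters per byte plus two for the prefix, so no resize is needed.
    val result = new StringBuilder(x.length * 2 + 2)
    result.append("0x")
    var i = 0
    while (i < x.length) {
      val b = x(i) & 0xff                // treat the byte as unsigned
      result.append(HexDigits(b >>> 4))  // high nibble
      result.append(HexDigits(b & 0x0f)) // low nibble
      i += 1
    }
    result.toString()
  }
}
```

With the prefix already in the builder, the caller no longer needs a separate `"0x" + ...` concatenation, which avoids one extra copy of the full string.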

@architch requested a review from nob13 on October 28, 2025 17:27