Skip to content

Conversation

@gouhongshen
Copy link
Contributor

@gouhongshen gouhongshen commented Dec 1, 2025

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue ##23029 #23030

What this PR does / why we need it:

compatible with MySQL


PR Type

Enhancement


Description

  • Implement JSON_ARRAYAGG and JSON_OBJECTAGG aggregate functions

  • Support distinct aggregation and proper NULL handling

  • Add comprehensive type conversion for JSON serialization

  • Register new functions with IDs and validation logic


Diagram Walkthrough

flowchart LR
  A["Function Registration"] --> B["Function IDs<br/>JSON_ARRAYAGG=400<br/>JSON_OBJECTAGG=401"]
  B --> C["Validation Logic<br/>Parameter checks<br/>Type validation"]
  C --> D["Execution Engines<br/>jsonArrayAggExec<br/>jsonObjectAggExec"]
  D --> E["Result Building<br/>Array/Object<br/>JSON serialization"]
  E --> F["Test Cases<br/>GROUP BY queries<br/>Duplicate handling"]
Loading

File Walkthrough

Relevant files
Enhancement
4 files
jsonagg.go
Implement JSON_ARRAYAGG and JSON_OBJECTAGG execution logic
+646/-0 
types.go
Add factory functions for JSON aggregate executors             
+40/-0   
special.go
Register JSON aggregate functions in planning layer           
+8/-0     
list_agg.go
Add JSON aggregate function definitions and validation     
+69/-0   
Configuration changes
2 files
register.go
Register new JSON aggregate function IDs                                 
+12/-10 
function_id.go
Define function IDs for JSON aggregates                                   
+9/-1     
Tests
3 files
function_id_test.go
Update test constants for new function IDs                             
+3/-1     
func_json_agg.test
Add test cases for JSON aggregate functions                           
+20/-0   
func_json_agg.result
Add expected results for JSON aggregate tests                       
+26/-0   

@qodo-code-review
Copy link

qodo-code-review bot commented Dec 1, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🟡
🎫 #23029
🟢 Implement the MySQL-compatible aggregate function JSON_ARRAYAGG(col_or_expr).
Ensure correct MySQL behavior including handling of NULLs inside the aggregated JSON
array.
Integrate function into planner/execution framework with proper function ID and
registration.
Provide tests demonstrating expected results comparable to MySQL.
Behavioral parity with MySQL for all edge cases (collations, large inputs,
overflow/precision, DISTINCT semantics) may require additional testing beyond included
cases.
🟡
🎫 #23030
🟢 Implement the MySQL-compatible aggregate function JSON_OBJECTAGG(key, val).
Enforce MySQL rules: key must be non-NULL string; last duplicate key wins; NULL values
become JSON null.
Integrate function into planner/execution framework with proper function ID and
registration.
Provide tests demonstrating expected results comparable to MySQL.
Full MySQL compatibility across data types, collations, and DISTINCT behavior may need
broader test coverage.
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
No auditing: The new aggregation implementations perform data processing without emitting any audit
logs of critical actions, which may be acceptable in compute-layer code but cannot be
verified from the diff.

Referred Code
func RegisterJsonArrayAgg(id int64) {
	specialAgg[id] = true
	AggIdOfJsonArrayAgg = id
}

func RegisterJsonObjectAgg(id int64) {
	specialAgg[id] = true
	AggIdOfJsonObjectAgg = id
}

type jsonArrayAggGroup struct {
	values []bytejson.ByteJson
}

type jsonArrayAggExec struct {
	multiAggInfo
	ret aggResultWithBytesType

	distinctHash
	groups []jsonArrayAggGroup
}


 ... (clipped 599 lines)

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@mergify mergify bot added the kind/feature label Dec 1, 2025
@qodo-code-review
Copy link

qodo-code-review bot commented Dec 1, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Optimize JSON array generation to improve performance

To fix a quadratic complexity performance issue, move the JSON array generation
logic from updateResult to the Flush method, so it runs once per group instead
of on every row.

pkg/sql/colexec/aggexec/jsonagg.go [201-215]

 func (exec *jsonArrayAggExec) updateResult(groupIndex int) error {
+	// This function is still inefficient if called on every insert.
+	// The ideal solution is to call this logic only from Flush().
+	// The following is a slightly optimized version of the original.
 	arr := make([]any, len(exec.groups[groupIndex].values))
 	for i, v := range exec.groups[groupIndex].values {
 		arr[i] = v
 	}
-	bj, err := bytejson.CreateByteJSONWithCheck(arr)
-	if err != nil {
-		return err
-	}
+	bj := bytejson.ByteJson(bytejson.NewArray(arr))
 	data, err := bj.Marshal()
 	if err != nil {
 		return err
 	}
 	return exec.ret.set(data)
 }
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a significant performance issue (quadratic complexity) due to re-calculating the aggregate result on every row and proposes the correct architectural fix.

Medium
Optimize JSON object generation to improve performance

To fix a quadratic complexity performance issue, move the JSON object generation
logic from updateResult to the Flush method, so it runs once per group instead
of on every row.

pkg/sql/colexec/aggexec/jsonagg.go [452-467]

 func (exec *jsonObjectAggExec) updateResult(groupIndex int) error {
+	// This function is still inefficient if called on every insert.
+	// The ideal solution is to call this logic only from Flush().
+	// The following is a slightly optimized version of the original.
 	m := exec.groups[groupIndex].values
 	obj := make(map[string]any, len(m))
 	for k, v := range m {
 		obj[k] = v
 	}
-	bj, err := bytejson.CreateByteJSONWithCheck(obj)
-	if err != nil {
-		return err
-	}
+	bj := bytejson.ByteJson(bytejson.NewObject(obj))
 	data, err := bj.Marshal()
 	if err != nil {
 		return err
 	}
 	return exec.ret.set(data)
 }
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a significant performance issue (quadratic complexity) due to re-calculating the aggregate result on every row and proposes the correct architectural fix.

Medium
  • Update

@mergify
Copy link
Contributor

mergify bot commented Dec 4, 2025

Merge Queue Status Beta

✅ The pull request has been merged

This pull request spent 17 seconds in the queue, with no time running CI.
The checks were run in-place.

Required conditions to merge
  • #approved-reviews-by >= 1 [🛡 GitHub branch protection]
  • #changes-requested-reviews-by = 0 [🛡 GitHub branch protection]
  • #review-threads-unresolved = 0 [🛡 GitHub branch protection]
  • branch-protection-review-decision = APPROVED [🛡 GitHub branch protection]
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-neutral = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-skipped = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / SCA Test on Ubuntu/x86
    • check-neutral = Matrixone CI / SCA Test on Ubuntu/x86
    • check-skipped = Matrixone CI / SCA Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / UT Test on Ubuntu/x86
    • check-neutral = Matrixone CI / UT Test on Ubuntu/x86
    • check-skipped = Matrixone CI / UT Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-neutral = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-skipped = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Utils CI / Coverage
    • check-neutral = Matrixone Utils CI / Coverage
    • check-skipped = Matrixone Utils CI / Coverage

@mergify mergify bot added the queued label Dec 4, 2025
@mergify mergify bot merged commit b40e6a0 into matrixorigin:main Dec 4, 2025
32 of 34 checks passed
@mergify mergify bot removed the queued label Dec 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/feature Review effort 3/5 size/XL Denotes a PR that changes [1000, 1999] lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants