
Commit 95bb795

chore: add simple token-based parser (#285)
* chore: add simple token parser
* chore: add multi-byte support
* chore: add benchmark
* chore: find parameters with comments in place (#424)

  Find query parameters in the SQL string while keeping comments in place. This improves performance, because the string needs to be copied fewer times. More importantly, it keeps the comments in the SQL string that is sent to Spanner. The latter is important for the PostgreSQL dialect, as query hints in that dialect are embedded in comments.
* build: run tests on pull requests for all branches (#426)
* chore: ignore lint errors
* perf: add a cache for parsed statements (#425)

  The functions that determine the type of a statement and which query parameters it contains are deterministic and do not change over time, so their results can safely be cached. This speeds up frequently executed SQL strings, as the same parsing does not need to be repeated for each execution.
* chore: go mod tidy
* chore: create statementParser struct and make it dialect-aware (#427)

  Refactors the statement parser into a struct that contains a field for the database dialect that is being used. This dialect field will be used in follow-up pull requests to implement dialect-specific parsing.
* chore: remove duplicate quoted string parsing (#428)

  The calculateFindParams function contained its own logic for skipping over quoted string literals. This code was redundant, as the same logic is also implemented in the skip function. This change removes the duplication and significantly simplifies the calculateFindParams implementation. It also ensures that there is only one place in the parser where string literals are found and skipped, which will make it easier to make this code dialect-aware.
* chore: add support for PostgreSQL-style comments (#429)

  Adds support for PostgreSQL-style comments and sets some of the tests up for multi-dialect testing.
* chore: support PostgreSQL-style parameters (#431)

  Adds support for recognizing PostgreSQL-style query parameters and for converting positional parameters to PostgreSQL-style query parameters. When using the PostgreSQL dialect, the driver accepts both GoogleSQL-style named parameters and PostgreSQL-style parameters (so both @param and $1). This is for consistency with other PostgreSQL drivers for Go (e.g. pgx).
* chore: add PostgreSQL quoting rules (#432)

  Add PostgreSQL quoting rules to the parser. This means:
  1. Backslashes do not start an escape sequence.
  2. Repeating the same quote twice inside a string literal is an escaped quote.
  3. Backtick is not a valid quote character.
  4. Triple-quoted strings are not supported. However, if a string starts with three consecutive quotes, that means 'start of a string' followed by 'an escaped quote'.
  5. Linefeeds are allowed in string literals.

  This change does not add support for dollar-quoted strings. That will be added in a follow-up pull request.
* chore: add support for dollar-quoted strings (#433)

  Adds support for dollar-quoted strings. Also fixes an issue in the skipWhitespaces function, where multi-byte characters would be skipped as if they were spaces.
* test: add some more tests for PostgreSQL parsing (#434)
* feat: detect database dialect and support PostgreSQL databases (#435)

  Automatically detect the dialect of the database that the driver connects to, and modify the way that query parameters are recognized based on that dialect. This makes it possible to use this driver with Spanner databases that use the PostgreSQL dialect.
* perf: cache client-side SQL parsing (#436)

  Cache the outcome of parsing a potential client-side SQL statement to prevent the same regular expressions from being tested repeatedly. Also, the initial check whether a statement is a client-side statement has been replaced with a keyword-based check: instead of trying to match the SQL statement against the list of all possible client-side statements, the parser first checks whether the first keyword could potentially start a client-side statement. If it cannot, the function returns early.

  Benchmarks:

  ```
  BenchmarkDetectStatementTypeWithCache-8       12530068       942 ns/op
  BenchmarkDetectStatementTypeWithoutCache-8     2861143      4182 ns/op
  ```
* chore: rename function to skipWhitespacesAndComments
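To make the PostgreSQL quoting rules from #432 and the positional-parameter conversion from #431 concrete, here is a minimal sketch. The function `toPostgresParams` is hypothetical and not part of the driver. It assumes only single-quoted literals and double-quoted identifiers need protecting (doubled quotes are escapes, backslashes are not), and it deliberately ignores comments and dollar-quoted strings, which the driver's parser also handles.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// toPostgresParams is a hypothetical sketch that rewrites positional `?`
// placeholders into PostgreSQL-style `$1`, `$2`, ... placeholders and returns
// the rewritten SQL plus the number of placeholders found.
// Quoting rules assumed (PostgreSQL): a doubled quote ('' or "") is an
// escaped quote and a backslash has no special meaning.
// NOT handled: comments, dollar-quoted strings.
func toPostgresParams(sql string) (string, int) {
	var b strings.Builder
	n := 0
	for i := 0; i < len(sql); i++ {
		c := sql[i]
		switch c {
		case '\'', '"':
			// Copy the literal or quoted identifier verbatim.
			j := i + 1
			for j < len(sql) {
				if sql[j] == c {
					if j+1 < len(sql) && sql[j+1] == c {
						j += 2 // doubled quote: escaped, keep scanning
						continue
					}
					break // closing quote
				}
				j++
			}
			if j >= len(sql) {
				j = len(sql) - 1 // unterminated: copy the rest of the string
			}
			b.WriteString(sql[i : j+1])
			i = j
		case '?':
			n++
			b.WriteString("$" + strconv.Itoa(n))
		default:
			b.WriteByte(c) // byte-wise copy preserves multi-byte UTF-8
		}
	}
	return b.String(), n
}

func main() {
	sql, n := toPostgresParams("SELECT * FROM t WHERE a = ? AND b = 'what''s ?' AND c = ?")
	fmt.Println(sql, n)
}
```

Because the quote inside `'what''s ?'` is escaped by doubling it, the `?` inside that literal is left untouched, and the two real placeholders become `$1` and `$2`. Under GoogleSQL rules, where a backslash starts an escape sequence, the literal-skipping loop would need different escape handling, which is why the parser was made dialect-aware.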
1 parent c68065d commit 95bb795


18 files changed: +2534 / -390 lines changed


benchmarks/go.mod

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ require (
 	github.com/google/s2a-go v0.1.9 // indirect
 	github.com/googleapis/enterprise-certificate-proxy v0.3.6 // indirect
 	github.com/googleapis/gax-go/v2 v2.14.2 // indirect
+	github.com/hashicorp/golang-lru v0.5.1 // indirect
 	github.com/planetscale/vtprotobuf v0.6.1-0.20240319094008-0393e58bdf10 // indirect
 	github.com/spiffe/go-spiffe/v2 v2.5.0 // indirect
 	github.com/zeebo/errs v1.4.0 // indirect
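The new `github.com/hashicorp/golang-lru` dependency presumably backs the statement cache introduced in #425. As a rough, dependency-free illustration of that idea, the sketch below keeps a bounded LRU map from SQL text to parse result using the standard library's `container/list` instead of golang-lru. The `statementCache` type and `getOrParse` method are hypothetical names. A size of zero or less disables caching, mirroring how `DisableStatementCache` is turned into a cache size of 0 in `increaseConnCount`.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// entry is one cached parse result stored in the LRU list.
type entry[V any] struct {
	sql   string
	value V
}

// statementCache is a hypothetical bounded LRU cache keyed by SQL text.
// It is guarded by a mutex because one parser (and thus one cache) is shared
// by all connections created from the same connector.
type statementCache[V any] struct {
	mu    sync.Mutex
	size  int
	order *list.List               // front = most recently used
	items map[string]*list.Element // SQL text -> element in order
}

func newStatementCache[V any](size int) *statementCache[V] {
	return &statementCache[V]{size: size, order: list.New(), items: make(map[string]*list.Element)}
}

// getOrParse returns the cached result for sql, calling parse on a miss.
// A size <= 0 disables caching, so parse is then called every time.
func (c *statementCache[V]) getOrParse(sql string, parse func(string) V) V {
	if c.size <= 0 {
		return parse(sql)
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[sql]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry[V]).value
	}
	v := parse(sql)
	c.items[sql] = c.order.PushFront(&entry[V]{sql: sql, value: v})
	if c.order.Len() > c.size {
		// Evict the least recently used statement.
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry[V]).sql)
	}
	return v
}

func main() {
	calls := 0
	parse := func(sql string) int { calls++; return len(sql) }
	c := newStatementCache[int](2)
	c.getOrParse("SELECT 1", parse)
	c.getOrParse("SELECT 1", parse)   // hit: not parsed again
	c.getOrParse("SELECT 22", parse)  // miss
	c.getOrParse("SELECT 333", parse) // miss, evicts "SELECT 1"
	c.getOrParse("SELECT 1", parse)   // miss again after eviction
	fmt.Println(calls) // 4
}
```

Caching is only safe here because, as #425 notes, statement-type detection and parameter discovery are deterministic for a given SQL string and dialect; the real driver creates one parser per connector, so the dialect is fixed for each cache.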

benchmarks/go.sum

Lines changed: 1 addition & 0 deletions
@@ -828,6 +828,7 @@ github.com/grpc-ecosystem/grpc-gateway v1.16.0/go.mod h1:BDjrQk3hbvj6Nolgz8mAMFb
 github.com/grpc-ecosystem/grpc-gateway/v2 v2.7.0/go.mod h1:hgWBS7lorOAVIJEQMi4ZsPv9hVvWI6+ch50m39Pf2Ks=
 github.com/grpc-ecosystem/grpc-gateway/v2 v2.11.3/go.mod h1:o//XUCC/F+yRGJoPO/VU0GSB0f8Nhgmxx0VIRUvaC0w=
 github.com/hashicorp/golang-lru v0.5.0/go.mod h1:/m3WP610KZHVQ1SGc6re/UDhFvYD7pJ4Ao+sR/qLZy8=
+github.com/hashicorp/golang-lru v0.5.1 h1:0hERBMJE1eitiLkihrMvRVBYAkpHzc/J3QdDN+dAcgU=
 github.com/hashicorp/golang-lru v0.5.1/go.mod h1:/m3WP610KZHVQ1SGc6re/UDhFvYD7pJ4Ao+sR/qLZy8=
 github.com/iancoleman/strcase v0.2.0/go.mod h1:iwCmte+B7n89clKwxIoIXy/HfoL7AsD47ZCWhYzw7ho=
 github.com/ianlancetaylor/demangle v0.0.0-20181102032728-5e5cf60278f6/go.mod h1:aSSvb/t6k1mPoxDqO4vJh6VOCGPwU4O0C2/Eqndh1Sc=

conn.go

Lines changed: 8 additions & 7 deletions
@@ -204,6 +204,7 @@ type SpannerConn interface {
 var _ SpannerConn = &conn{}

 type conn struct {
+	parser *statementParser
 	connector *connector
 	closed bool
 	client *spanner.Client
@@ -724,7 +725,7 @@ func (c *conn) Prepare(query string) (driver.Stmt, error) {

 func (c *conn) PrepareContext(_ context.Context, query string) (driver.Stmt, error) {
 	execOptions := c.options()
-	parsedSQL, args, err := parseParameters(query)
+	parsedSQL, args, err := c.parser.parseParameters(query)
 	if err != nil {
 		return nil, err
 	}
@@ -733,7 +734,7 @@ func (c *conn) PrepareContext(_ context.Context, query string) (driver.Stmt, err

 func (c *conn) QueryContext(ctx context.Context, query string, args []driver.NamedValue) (driver.Rows, error) {
 	// Execute client side statement if it is one.
-	clientStmt, err := parseClientSideStatement(c, query)
+	clientStmt, err := c.parser.parseClientSideStatement(c, query)
 	if err != nil {
 		return nil, err
 	}
@@ -754,11 +755,11 @@ func (c *conn) queryContext(ctx context.Context, query string, execOptions ExecO
 		return pq.execute(ctx, execOptions.PartitionedQueryOptions.ExecutePartition.Index)
 	}

-	stmt, err := prepareSpannerStmt(query, args)
+	stmt, err := prepareSpannerStmt(c.parser, query, args)
 	if err != nil {
 		return nil, err
 	}
-	statementType := detectStatementType(query)
+	statementType := c.parser.detectStatementType(query)
 	// DDL statements are not supported in QueryContext so fail early.
 	if statementType.statementType == statementTypeDdl {
 		return nil, spanner.ToSpannerError(status.Errorf(codes.FailedPrecondition, "QueryContext does not support DDL statements, use ExecContext instead"))
@@ -797,7 +798,7 @@ func (c *conn) queryContext(ctx context.Context, query string, execOptions ExecO

 func (c *conn) ExecContext(ctx context.Context, query string, args []driver.NamedValue) (driver.Result, error) {
 	// Execute client side statement if it is one.
-	stmt, err := parseClientSideStatement(c, query)
+	stmt, err := c.parser.parseClientSideStatement(c, query)
 	if err != nil {
 		return nil, err
 	}
@@ -812,7 +813,7 @@ func (c *conn) execContext(ctx context.Context, query string, execOptions ExecOp
 	// Clear the commit timestamp of this connection before we execute the statement.
 	c.commitTs = nil

-	statementInfo := detectStatementType(query)
+	statementInfo := c.parser.detectStatementType(query)
 	// Use admin API if DDL statement is provided.
 	if statementInfo.statementType == statementTypeDdl {
 		// Spanner does not support DDL in transactions, and although it is technically possible to execute DDL
@@ -824,7 +825,7 @@ func (c *conn) execContext(ctx context.Context, query string, execOptions ExecOp
 		return c.execDDL(ctx, spanner.NewStatement(query))
 	}

-	ss, err := prepareSpannerStmt(query, args)
+	ss, err := prepareSpannerStmt(c.parser, query, args)
 	if err != nil {
 		return nil, err
 	}
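`QueryContext` and `ExecContext` now call `c.parser.parseClientSideStatement` for every statement, so the keyword-based pre-check from #436 sits on the hot path. The sketch below shows the general shape of such a pre-check. The keyword set is an illustrative assumption, not the driver's actual list, and unlike the driver's `skipWhitespacesAndComments`, this version skips only leading whitespace, not comments.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// clientSideKeywords is an ASSUMED, illustrative set of leading keywords that
// a client-side statement could start with; the real driver derives its own
// list from its client-side statement definitions.
var clientSideKeywords = map[string]bool{
	"SHOW": true, "SET": true, "START": true, "RUN": true, "ABORT": true,
}

// firstKeyword returns the first word of sql in upper case after leading
// whitespace. Comments are NOT skipped in this sketch.
func firstKeyword(sql string) string {
	s := strings.TrimLeftFunc(sql, unicode.IsSpace)
	end := strings.IndexFunc(s, func(r rune) bool {
		return !unicode.IsLetter(r) && r != '_'
	})
	if end < 0 {
		end = len(s)
	}
	return strings.ToUpper(s[:end])
}

// mayBeClientSideStatement is the cheap pre-check: only when it returns true
// would the (more expensive) regular-expression matching need to run.
func mayBeClientSideStatement(sql string) bool {
	return clientSideKeywords[firstKeyword(sql)]
}

func main() {
	fmt.Println(mayBeClientSideStatement("  start batch ddl")) // true
	fmt.Println(mayBeClientSideStatement("SELECT 1"))          // false
}
```

Ordinary queries and DML, which make up most of the traffic, fail the map lookup immediately, so the regular expressions are only evaluated for the rare statements that could actually be client-side statements.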

driver.go

Lines changed: 79 additions & 2 deletions
@@ -22,6 +22,7 @@ import (
 	"io"
 	"log/slog"
 	"math/big"
+	"os"
 	"regexp"
 	"runtime"
 	"runtime/debug"
@@ -33,6 +34,7 @@ import (
 	"cloud.google.com/go/civil"
 	"cloud.google.com/go/spanner"
 	adminapi "cloud.google.com/go/spanner/admin/database/apiv1"
+	"cloud.google.com/go/spanner/admin/database/apiv1/databasepb"
 	"cloud.google.com/go/spanner/apiv1/spannerpb"
 	"github.com/google/uuid"
 	"github.com/googleapis/gax-go/v2"
@@ -48,6 +50,10 @@ const userAgent = "go-sql-spanner/1.13.2" // x-release-please-version
 const gormModule = "github.com/googleapis/go-gorm-spanner"
 const gormUserAgent = "go-gorm-spanner"

+const DefaultStatementCacheSize = 1000
+
+var defaultStatementCacheSize int
+
 // LevelNotice is the default logging level that the Spanner database/sql driver
 // uses for informational logs. This level is deliberately chosen to be one level
 // lower than the default log level, which is slog.LevelInfo. This prevents the
@@ -91,6 +97,19 @@ var spannerDriver *Driver
 func init() {
 	spannerDriver = &Driver{connectors: make(map[string]*connector)}
 	sql.Register("spanner", spannerDriver)
+	determineDefaultStatementCacheSize()
+}
+
+func determineDefaultStatementCacheSize() {
+	if defaultCacheSizeString, ok := os.LookupEnv("SPANNER_DEFAULT_STATEMENT_CACHE_SIZE"); ok {
+		if defaultCacheSize, err := strconv.Atoi(defaultCacheSizeString); err == nil {
+			defaultStatementCacheSize = defaultCacheSize
+		} else {
+			defaultStatementCacheSize = DefaultStatementCacheSize
+		}
+	} else {
+		defaultStatementCacheSize = DefaultStatementCacheSize
+	}
 }

 // ExecOptions can be passed in as an argument to the Query, QueryContext,
@@ -210,6 +229,16 @@ type ConnectorConfig struct {
 	Instance string
 	Database string

+	// StatementCacheSize is the size of the internal cache that is used for
+	// connectors that are created from this ConnectorConfig. This cache stores
+	// the result of parsing SQL statements for query parameters and the type of
+	// statement (Query / DML / DDL).
+	// The default size is 1000. This default can also be overridden by setting
+	// the environment variable SPANNER_DEFAULT_STATEMENT_CACHE_SIZE.
+	StatementCacheSize int
+	// DisableStatementCache disables the use of a statement cache.
+	DisableStatementCache bool
+
 	// AutoConfigEmulator automatically creates a connection for the emulator
 	// and also automatically creates the Instance and Database on the emulator.
 	// Setting this option to true will:
@@ -335,6 +364,7 @@ type connector struct {
 	adminClient *adminapi.DatabaseAdminClient
 	adminClientErr error
 	connCount int32
+	parser *statementParser
 }

 func newOrCachedConnector(d *Driver, dsn string) (*connector, error) {
@@ -480,6 +510,11 @@ func createConnector(d *Driver, connectorConfig ConnectorConfig) (*connector, er
 			connectorConfig.IsolationLevel = val
 		}
 	}
+	if strval, ok := connectorConfig.Params[strings.ToLower("StatementCacheSize")]; ok {
+		if val, err := strconv.Atoi(strval); err == nil {
+			connectorConfig.StatementCacheSize = val
+		}
+	}

 	// Check if it is Spanner gorm that is creating the connection.
 	// If so, we should set a different user-agent header than the
@@ -574,6 +609,7 @@ func openDriverConn(ctx context.Context, c *connector) (driver.Conn, error) {
 	connId := uuid.New().String()
 	logger := c.logger.With("connId", connId)
 	connection := &conn{
+		parser: c.parser,
 		connector: c,
 		client: c.client,
 		adminClient: c.adminClient,
@@ -614,12 +650,33 @@ func (c *connector) increaseConnCount(ctx context.Context, databaseName string,
 	if c.clientErr != nil {
 		return c.clientErr
 	}
+	c.logger.Log(ctx, LevelNotice, "fetching database dialect")
+	closeClient := func() {
+		c.client.Close()
+		c.client = nil
+	}
+	if dialect, err := determineDialect(ctx, c.client); err != nil {
+		closeClient()
+		return err
+	} else {
+		// Create a separate statement parser and cache per connector.
+		cacheSize := c.connectorConfig.StatementCacheSize
+		if c.connectorConfig.DisableStatementCache {
+			cacheSize = 0
+		} else if c.connectorConfig.StatementCacheSize == 0 {
+			cacheSize = defaultStatementCacheSize
+		}
+		c.parser, err = newStatementParser(dialect, cacheSize)
+		if err != nil {
+			closeClient()
+			return err
+		}
+	}

 	c.logger.Log(ctx, LevelNotice, "creating Spanner Admin client")
 	c.adminClient, c.adminClientErr = adminapi.NewDatabaseAdminClient(ctx, opts...)
 	if c.adminClientErr != nil {
-		c.client = nil
-		c.client.Close()
+		closeClient()
 		c.adminClient = nil
 		return c.adminClientErr
 	}
@@ -630,6 +687,26 @@ func (c *connector) increaseConnCount(ctx context.Context, databaseName string,
 	return nil
 }

+func determineDialect(ctx context.Context, client *spanner.Client) (databasepb.DatabaseDialect, error) {
+	it := client.Single().Query(ctx, spanner.Statement{SQL: "select option_value from information_schema.database_options where option_name='database_dialect'"})
+	defer it.Stop()
+	for {
+		if row, err := it.Next(); err != nil {
+			return databasepb.DatabaseDialect_DATABASE_DIALECT_UNSPECIFIED, err
+		} else {
+			var dialectName string
+			if err := row.Columns(&dialectName); err != nil {
+				return databasepb.DatabaseDialect_DATABASE_DIALECT_UNSPECIFIED, err
+			}
+			if dialect, ok := databasepb.DatabaseDialect_value[dialectName]; ok {
+				return databasepb.DatabaseDialect(dialect), nil
+			} else {
+				return databasepb.DatabaseDialect_DATABASE_DIALECT_UNSPECIFIED, fmt.Errorf("unknown database dialect: %s", dialectName)
+			}
+		}
+	}
+}
+
 // decreaseConnCount decreases the number of connections that are active and closes the underlying clients if it was the
 // last connection.
 func (c *connector) decreaseConnCount() error {
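In this diff, the effective cache size for a connector is resolved in `increaseConnCount` from `DisableStatementCache`, `StatementCacheSize` (which can also be set through the `StatementCacheSize` DSN parameter in `createConnector`) and the package-level default read from `SPANNER_DEFAULT_STATEMENT_CACHE_SIZE` in `init`. The helper below is a hypothetical, standalone restatement of that precedence so it can be tested in isolation; it takes the environment value as arguments instead of calling `os.LookupEnv`.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultCacheSize mirrors DefaultStatementCacheSize in the diff.
const defaultCacheSize = 1000

// resolveCacheSize is a hypothetical restatement of the precedence in the diff:
//  1. DisableStatementCache forces a size of 0 (no cache).
//  2. A non-zero StatementCacheSize is used as-is.
//  3. Otherwise the environment default is used if it parses as an int,
//     else the built-in default of 1000.
func resolveCacheSize(disable bool, configured int, envValue string, envSet bool) int {
	if disable {
		return 0
	}
	if configured != 0 {
		return configured
	}
	if envSet {
		if n, err := strconv.Atoi(envValue); err == nil {
			return n
		}
	}
	return defaultCacheSize
}

func main() {
	fmt.Println(resolveCacheSize(false, 0, "", false))     // 1000
	fmt.Println(resolveCacheSize(false, 0, "250", true))   // 250
	fmt.Println(resolveCacheSize(false, 500, "250", true)) // 500
	fmt.Println(resolveCacheSize(true, 500, "250", true))  // 0
	fmt.Println(resolveCacheSize(false, 0, "abc", true))   // 1000
}
```

One consequence of this logic is that an unparseable environment value silently falls back to 1000 rather than producing an error, and an explicit `StatementCacheSize` of 0 cannot be distinguished from "not set", which is why a separate `DisableStatementCache` flag exists.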

driver_test.go

Lines changed: 27 additions & 2 deletions
@@ -26,6 +26,7 @@ import (
 	"time"

 	"cloud.google.com/go/spanner"
+	"cloud.google.com/go/spanner/admin/database/apiv1/databasepb"
 	"cloud.google.com/go/spanner/apiv1/spannerpb"
 	"github.com/google/go-cmp/cmp"
 	"github.com/google/go-cmp/cmp/cmpopts"
@@ -533,7 +534,12 @@ func TestConn_StartBatchDml(t *testing.T) {
 }

 func TestConn_NonDdlStatementsInDdlBatch(t *testing.T) {
+	parser, err := newStatementParser(databasepb.DatabaseDialect_GOOGLE_STANDARD_SQL, 1000)
+	if err != nil {
+		t.Fatal(err)
+	}
 	c := &conn{
+		parser: parser,
 		logger: noopLogger,
 		autocommitDMLMode: Transactional,
 		batch: &batch{tp: ddl},
@@ -568,7 +574,12 @@ func TestConn_NonDdlStatementsInDdlBatch(t *testing.T) {
 }

 func TestConn_NonDmlStatementsInDmlBatch(t *testing.T) {
+	parser, err := newStatementParser(databasepb.DatabaseDialect_GOOGLE_STANDARD_SQL, 1000)
+	if err != nil {
+		t.Fatal(err)
+	}
 	c := &conn{
+		parser: parser,
 		logger: noopLogger,
 		batch: &batch{tp: dml},
 		execSingleQuery: func(ctx context.Context, c *spanner.Client, statement spanner.Statement, tb spanner.TimestampBound, options ExecOptions) *spanner.RowIterator {
@@ -605,8 +616,12 @@ func TestConn_NonDmlStatementsInDmlBatch(t *testing.T) {
 func TestConn_GetBatchedStatements(t *testing.T) {
 	t.Parallel()

+	parser, err := newStatementParser(databasepb.DatabaseDialect_GOOGLE_STANDARD_SQL, 1000)
+	if err != nil {
+		t.Fatal(err)
+	}
 	ctx := context.Background()
-	c := &conn{logger: noopLogger}
+	c := &conn{logger: noopLogger, parser: parser}
 	if !reflect.DeepEqual(c.GetBatchedStatements(), []spanner.Statement{}) {
 		t.Fatal("conn should return an empty slice when no batch is active")
 	}
@@ -649,8 +664,13 @@ func TestConn_GetBatchedStatements(t *testing.T) {
 }

 func TestConn_GetCommitTimestampAfterAutocommitDml(t *testing.T) {
+	parser, err := newStatementParser(databasepb.DatabaseDialect_GOOGLE_STANDARD_SQL, 1000)
+	if err != nil {
+		t.Fatal(err)
+	}
 	want := time.Now()
 	c := &conn{
+		parser: parser,
 		logger: noopLogger,
 		autocommitDMLMode: Transactional,
 		execSingleQuery: func(ctx context.Context, c *spanner.Client, statement spanner.Statement, tb spanner.TimestampBound, options ExecOptions) *spanner.RowIterator {
@@ -677,7 +697,12 @@ func TestConn_GetCommitTimestampAfterAutocommitDml(t *testing.T) {
 }

 func TestConn_GetCommitTimestampAfterAutocommitQuery(t *testing.T) {
+	parser, err := newStatementParser(databasepb.DatabaseDialect_GOOGLE_STANDARD_SQL, 1000)
+	if err != nil {
+		t.Fatal(err)
+	}
 	c := &conn{
+		parser: parser,
 		logger: noopLogger,
 		execSingleQuery: func(ctx context.Context, c *spanner.Client, statement spanner.Statement, tb spanner.TimestampBound, options ExecOptions) *spanner.RowIterator {
 			return &spanner.RowIterator{}
@@ -693,7 +718,7 @@ func TestConn_GetCommitTimestampAfterAutocommitQuery(t *testing.T) {
 	if _, err := c.QueryContext(ctx, "SELECT * FROM Foo", []driver.NamedValue{}); err != nil {
 		t.Fatalf("failed to execute query: %v", err)
 	}
-	_, err := c.CommitTimestamp()
+	_, err = c.CommitTimestamp()
 	if g, w := spanner.ErrCode(err), codes.FailedPrecondition; g != w {
 		t.Fatalf("error code mismatch\n Got: %v\nWant: %v", g, w)
 	}
