Advanced Topics
Deep dive into change detection, performance optimization, and advanced patterns
This page covers advanced features and implementation details of the DB Query Operator.
Change Detection
Change detection optimizes database polling by only executing the main query when data has actually changed.
How It Works
sequenceDiagram
participant C as Controller
participant D as Database
loop Every pollInterval
C->>D: SELECT MAX(updated_at) FROM table
D-->>C: 2024-01-15 10:30:00
alt Timestamp changed
C->>D: Execute main query
D-->>C: Return all rows
C->>C: Reconcile resources
else Timestamp unchanged
C->>C: Skip reconciliation
end
endImplementation
The operator stores the last seen timestamp in memory and compares it on each poll:
- First Poll: No timestamp cached, executes main query
- Subsequent Polls:
- Queries
SELECT MAX(timestamp_column) FROM table_name - Compares with cached value
- Only runs main query if timestamp increased
- Queries
Configuration
spec:
changeDetection:
enabled: true
tableName: "my_table"
timestampColumn: "updated_at"Database Requirements
Your table must have a timestamp column that updates whenever data changes:
CREATE TABLE my_table (
id TEXT PRIMARY KEY,
data JSONB,
updated_at TIMESTAMP DEFAULT NOW()
);
-- Option 1: Trigger-based update
CREATE OR REPLACE FUNCTION update_timestamp()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER my_table_timestamp
BEFORE UPDATE ON my_table
FOR EACH ROW
EXECUTE FUNCTION update_timestamp();
-- Option 2: Application-managed
UPDATE my_table SET data = $1, updated_at = NOW() WHERE id = $2;Performance Impact
Without change detection (pollInterval = 30s):
- 120 queries/hour to database
- Full table scan every 30s
- Continuous resource reconciliation
With change detection (pollInterval = 30s):
- 120 lightweight timestamp checks/hour
- Main query only when data changes
- Reconciliation only when needed
Example: For a table updated 5 times/hour:
- Reduces main queries from 120 to 5 (96% reduction)
- Minimal overhead from timestamp checks
Limitations
- Only detects row updates/inserts, not deletes (use
prune: truefor delete handling) - Requires timestamp column maintained by triggers or application
- Single table only (no joins in change detection query)
Best Practices
- Use change detection for tables with infrequent updates
- Set aggressive
pollInterval(e.g., 30s) with change detection enabled - Ensure timestamp column is indexed for performance
- Use database triggers for automatic timestamp updates
Connection Pooling
The operator reuses database connections across reconciliation cycles.
Implementation Details
- Single
*sql.DBconnection pool perDatabaseQueryResource - Connection established on first reconciliation
- Reused for all subsequent polls
- Closed when
DatabaseQueryResourceis deleted
Connection Configuration
Currently uses pgx defaults:
- Max open connections: Unlimited
- Max idle connections: 2
- Connection max lifetime: Unlimited
- Connection max idle time: Unlimited
Future Enhancements
Connection pool configuration via CRD:
# Future feature
spec:
database:
connectionPool:
maxOpenConns: 10
maxIdleConns: 5
connMaxLifetime: "1h"
connMaxIdleTime: "5m"Multi-Statement Query Execution
The operator automatically handles multi-statement queries using pgx.Batch.
Detection Logic
func splitStatements(sql string) []string {
// Splits on semicolons outside of:
// - Single quotes: 'text;text'
// - Double quotes: "text;text"
// - Dollar quotes: $$text;text$$
// - Tagged dollar quotes: $tag$text;text$tag$
}Execution Strategy
When multiple statements detected:
- Parse query into individual statements
- Create
pgx.Batch - Execute all but last statement (setup commands)
- Execute last statement and return results
Use Cases
Setting Session Parameters
SET statement_timeout = '5s';
SET lock_timeout = '3s';
SELECT * FROM large_table WHERE indexed_column = 'value';The operator automatically:
- Sets statement timeout
- Sets lock timeout
- Executes main query
- Returns results from step 3
Using PostgreSQL Extensions
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
SET search_path = public, extensions;
SELECT uuid_generate_v4() as id, name FROM users;Setup commands executed once, main query returns data.
Limitations
- All statements except the last must not return data
- Cannot use transactions (each statement auto-commits)
- Error in setup statement aborts entire batch
Resource Ownership and Finalizers
The operator uses Kubernetes ownership and finalizers for resource lifecycle management.
Owner References
Every created resource has an owner reference pointing to its DatabaseQueryResource:
apiVersion: v1
kind: ConfigMap
metadata:
name: generated-config
ownerReferences:
- apiVersion: konnektr.io/v1alpha1
kind: DatabaseQueryResource
name: my-dbqr
uid: abc-123
controller: true
blockOwnerDeletion: trueImplications
- Deleting a
DatabaseQueryResourceautomatically deletes managed resources - Owner references work only for resources in the same namespace
- Cross-namespace or cluster-scoped resources require manual cleanup
Finalizers
The operator adds a finalizer to track managed resources:
apiVersion: konnektr.io/v1alpha1
kind: DatabaseQueryResource
metadata:
name: my-dbqr
finalizers:
- konnektr.io/dbqr-finalizerDeletion Process:
- User deletes
DatabaseQueryResource - Kubernetes marks resource for deletion (
deletionTimestampset) - Operator finalizer prevents immediate deletion
- Operator cleans up managed resources
- Operator removes finalizer
- Kubernetes completes deletion
Cross-Namespace Resources
For resources in different namespaces, the operator:
- Cannot set owner references (Kubernetes limitation)
- Tracks resources via labels/annotations
- Manually deletes on DBQR deletion via finalizer logic
Cluster-Scoped Resources
Similar to cross-namespace resources:
- No owner references possible
- Manual tracking and cleanup
- Requires cluster-level RBAC permissions
Server-Side Apply
The operator uses Kubernetes Server-Side Apply for resource management.
Benefits
- Conflict Resolution: Multiple controllers can manage same resource
- Field Ownership: Operator owns only fields it sets
- Partial Updates: Only specified fields are updated
- Drift Detection: Kubernetes tracks who owns which fields
Field Manager
The operator identifies itself as field manager:
metadata:
managedFields:
- manager: db-query-operator
operation: Apply
apiVersion: v1
fields:
f:data:
f:config.json: {}Implications
- Manual
kubectl editchanges to managed fields are reverted - Other controllers can manage non-conflicting fields
- Operator updates are atomic and conflict-free
Example: Shared ConfigMap
# Operator manages data section
apiVersion: v1
kind: ConfigMap
metadata:
name: shared-config
labels:
managed-by: db-query-operator # Operator field
team: platform # Manual field (preserved)
data:
config.json: "{...}" # Operator field
manual-key: "manual-value" # Manual field (preserved)Status Updates and Two-Way Sync
The statusUpdateQuery enables bidirectional synchronization between Kubernetes and the database.
Architecture
sequenceDiagram
participant D as Database
participant O as Operator
participant K as Kubernetes
O->>D: Query for resources
D-->>O: Return rows
O->>K: Apply resources
K-->>K: Resources reach desired state
O->>K: Get resource status
K-->>O: Return status
O->>D: Execute statusUpdateQueryUse Cases
Track Deployment Readiness
spec:
statusUpdateQuery: |
UPDATE deployments
SET ready_replicas = {{ .Status.readyReplicas }},
available_replicas = {{ .Status.availableReplicas }},
last_check = NOW()
WHERE name = {{ .Metadata.Name | squote }}Track ArgoCD Application Sync Status
spec:
gvk:
group: "argoproj.io"
version: "v1alpha1"
kind: "Application"
statusUpdateQuery: |
UPDATE applications
SET sync_status = {{ .Status.sync.status | squote }},
health_status = {{ .Status.health.status | squote }},
last_synced = {{ .Status.operationState.finishedAt | squote }}
WHERE app_name = {{ .Metadata.Name | squote }}Available Template Data
In statusUpdateQuery, you have access to:
.Metadata: Resource metadata (name, namespace, labels, annotations).Status: Complete resource status object.Spec: Resource spec (for reading values)
Template Functions
squote: Single-quote SQL strings safelytoJson: Convert object to JSON string- All Sprig functions
Error Handling
- Status update failures are logged but don't block reconciliation
- Resource is still created/updated even if status update fails
- Useful for read-only database scenarios
Performance Tuning
Poll Interval Selection
Aggressive (30s-1m):
- ✅ Use with change detection enabled
- ✅ Real-time requirements
- ❌ Without change detection (high DB load)
Moderate (2m-5m):
- ✅ Most production workloads
- ✅ Balance between freshness and load
- ✅ Default recommendation
Conservative (10m-1h):
- ✅ Infrequently changing data
- ✅ Large result sets
- ❌ Time-sensitive applications
Query Optimization
-- Bad: Full table scan
SELECT * FROM resources;
-- Good: Indexed filter
SELECT * FROM resources WHERE enabled = true AND updated_at > NOW() - INTERVAL '24 hours';
-- Better: Materialized view
CREATE MATERIALIZED VIEW active_resources AS
SELECT * FROM resources WHERE enabled = true;
CREATE INDEX ON active_resources(updated_at);
-- Query the view
SELECT * FROM active_resources;Resource Template Optimization
Minimize Template Complexity:
# Complex (slower rendering)
resourceTemplate: |
{{ range $i, $item := .Row.items | fromJson }}
{{ if gt $item.value 100 }}
...
{{ end }}
{{ end }}
# Simple (faster rendering)
resourceTemplate: |
data:
items: {{ .Row.items }}Pre-compute in Database:
-- Move logic to database
SELECT
name,
CASE WHEN value > 100 THEN 'high' ELSE 'low' END as tier
FROM resourcesSecurity Considerations
Database Credentials
- Store in Kubernetes Secrets
- Use namespace-local secrets when possible
- Rotate credentials regularly
- Use read-only database users where appropriate
RBAC Permissions
The operator needs:
# Same-namespace resources
- apiGroups: [""]
resources: ["configmaps", "secrets"]
verbs: ["get", "list", "create", "update", "patch", "delete"]
# Cross-namespace resources
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["get", "list", "create", "update", "patch", "delete"]
resourceNames: [] # Or specific names
# Cluster-scoped
- apiGroups: [""]
resources: ["namespaces"]
verbs: ["get", "list", "create", "update", "patch", "delete"]SQL Injection Prevention
The operator does NOT sanitize SQL queries. Ensure:
- Database users have minimal privileges
- Queries are written by trusted operators
- Use parameterized queries where possible (in
statusUpdateQuerytemplates)
Monitoring and Observability
Logs
The operator logs:
- Reconciliation cycles
- Query execution times
- Resource creation/update/delete operations
- Errors and warnings
kubectl logs -n dbqo-system deployment/db-query-operator -fMetrics
Future feature: Prometheus metrics
dbqr_reconciliation_duration_seconds
dbqr_query_duration_seconds
dbqr_resources_managed_total
dbqr_errors_totalStatus Conditions
Future feature: Status conditions on DBQR
status:
conditions:
- type: Ready
status: "True"
lastTransitionTime: "2024-01-15T10:00:00Z"
- type: DatabaseConnected
status: "True"
observedGeneration: 5
lastReconcile: "2024-01-15T10:30:00Z"
managedResources: 15Next Steps
- Review Examples for practical implementations
- Check Troubleshooting for common issues
- See API Reference for complete CRD documentation