Skip to content

Extend Idempotency Detection: Add 30+ Missing SQL Patterns from 757 Production Migrations Analysis

🔍 Extend Idempotency Detection for SQL Migrations - Comprehensive Gap Analysis

🎯 Executive Summary

After analyzing 757 production migrations from other repositories and comparing against the current MCP server's idempotency detection capabilities, I've identified 30+ critical patterns that are not currently detected. These gaps represent real-world production scenarios that required manual DO $$ block wrapping and other idempotency techniques.

Impact: Without these detections, AI developers and human developers miss critical non-idempotent patterns that can cause migration failures in production.

📊 Current Coverage vs. Required Coverage

Currently Detected (Working Well)

  • CREATE TABLE without IF NOT EXISTS
  • CREATE INDEX without IF NOT EXISTS
  • DROP TABLE/INDEX without IF EXISTS
  • INSERT without ON CONFLICT
  • UPDATE without WHERE clause
  • DELETE without WHERE clause
  • ALTER TABLE ADD COLUMN without IF NOT EXISTS
  • CREATE OR REPLACE FUNCTION (recognized as safe)
  • INSERT ... ON CONFLICT (recognized as safe)

Critical Gaps Found from Analysis

🚨 Priority 1: CREATE TYPE Patterns (Found in 15+ migrations)

-- Currently NOT detected as non-idempotent
CREATE TYPE payment_status AS ENUM ('pending', 'paid', 'failed');

-- Should suggest this DO $$ block pattern:
DO $$
BEGIN
  IF NOT EXISTS (
    SELECT FROM pg_type t
    LEFT JOIN pg_catalog.pg_namespace n ON n.oid = t.typnamespace
    WHERE typname = 'payment_status' AND nspname = 'public'
  ) THEN
    CREATE TYPE public.payment_status AS ENUM ('pending', 'paid', 'failed');
  END IF;
END $$;

🚨 Priority 1: CREATE EXTENSION (Found in multiple migrations)

-- Currently NOT detected
CREATE EXTENSION fuzzystrmatch;
CREATE EXTENSION pg_trgm;

-- Should suggest:
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

🚨 Priority 1: CREATE SCHEMA (Business-critical)

-- Currently NOT detected
CREATE SCHEMA ops_api;

-- Should suggest:
CREATE SCHEMA IF NOT EXISTS ops_api;

⚠️ Priority 2: ALTER TABLE RENAME COLUMN

-- Currently NOT detected as risky
ALTER TABLE users RENAME COLUMN old_name TO new_name;

-- Actual production pattern used:
DO $$
BEGIN
  ALTER TABLE public.users
    RENAME COLUMN referred_user_id TO referral_partner_user_id;
EXCEPTION
  WHEN undefined_column THEN
    -- Column was already renamed
END;
$$;

⚠️ Priority 2: ADD CONSTRAINT Patterns (Found in 20+ migrations)

-- Currently NOT detected
ALTER TABLE users ADD CONSTRAINT users_email_unique UNIQUE (email);
ALTER TABLE posts ADD CONSTRAINT posts_user_fk FOREIGN KEY (user_id) REFERENCES users(id);

-- Should suggest:
DO $$
BEGIN
  IF NOT EXISTS (
    SELECT 1 FROM pg_constraint WHERE conname = 'users_email_unique'
  ) THEN
    ALTER TABLE users ADD CONSTRAINT users_email_unique UNIQUE (email);
  END IF;
END $$;

⚠️ Priority 2: CREATE TRIGGER (Found in 189 migrations with TRIGGER/RULE/etc)

-- Currently NOT detected
CREATE TRIGGER update_timestamp BEFORE UPDATE ON users
  FOR EACH ROW EXECUTE FUNCTION update_modified_column();

-- Should suggest:
DROP TRIGGER IF EXISTS update_timestamp ON users;
CREATE TRIGGER update_timestamp BEFORE UPDATE ON users
  FOR EACH ROW EXECUTE FUNCTION update_modified_column();

-- Or CREATE OR REPLACE TRIGGER (PostgreSQL 14+)

📍 Priority 3: Advanced Patterns

ALTER COLUMN Operations:
-- Currently limited detection
ALTER TABLE users ALTER COLUMN status SET DEFAULT 'active';
ALTER TABLE users ALTER COLUMN email SET NOT NULL;
ALTER TABLE users ALTER COLUMN age DROP NOT NULL;

CREATE MATERIALIZED VIEW:
-- Found in multiple migrations
CREATE MATERIALIZED VIEW worker_stats AS SELECT ...;

-- Should suggest:
DROP MATERIALIZED VIEW IF EXISTS worker_stats;
CREATE MATERIALIZED VIEW worker_stats AS SELECT ...;

CREATE DOMAIN:
CREATE DOMAIN email AS TEXT CHECK (VALUE ~ '^[^@]+@[^@]+$');

-- Should suggest DO $$ block with existence check

GRANT/REVOKE Operations (Found in 267 files with 3610 occurrences):
-- Currently not checked for idempotency
GRANT SELECT ON TABLE users TO analyst;
REVOKE INSERT ON TABLE sensitive_data FROM public;

-- These are naturally idempotent but should be documented

🔧 Template System Extensions Required

1. DO $$ Block Template Generator

The system needs to generate appropriate DO $$ block templates based on the operation type:

interface DOBlockTemplate {
  operationType: 'CREATE_TYPE' | 'RENAME_COLUMN' | 'ADD_CONSTRAINT' | 'CONDITIONAL_OPERATION';
  generateTemplate(operation: SQLOperation): string;
}

Template Examples:

Type Creation Template:
DO $$
BEGIN
  IF NOT EXISTS (
    SELECT FROM pg_type t
    LEFT JOIN pg_catalog.pg_namespace n ON n.oid = t.typnamespace
    WHERE typname = '${typeName}' AND nspname = '${schema}'
  ) THEN
    ${originalSQL}
  END IF;
END $$;

Rename Column Template:
DO $$
BEGIN
  ${originalSQL}
EXCEPTION
  WHEN undefined_column THEN
    RAISE NOTICE 'Column was already renamed';
END $$;

Constraint Addition Template:
DO $$
BEGIN
  IF NOT EXISTS (
    SELECT 1 FROM pg_constraint
    WHERE conname = '${constraintName}'
  ) THEN
    ${originalSQL}
  END IF;
EXCEPTION
  WHEN duplicate_object THEN
    RAISE NOTICE 'Constraint already exists';
END $$;

2. Environment-Conditional Templates

Found in migrations that check database environment:

DO $$
BEGIN
  IF (SELECT current_database() != 'dev_local') THEN
    -- Production-only operations
    ${productionSQL}
  END IF;
END $$;

3. Data Migration Safety Templates

DO $$
BEGIN
  IF EXISTS (
    SELECT FROM information_schema.columns
    WHERE table_schema = '${schema}'
    AND table_name = '${table}'
    AND column_name = '${oldColumn}'
  ) THEN
    UPDATE ${table} SET ${newColumn} = ${oldColumn}
    WHERE ${oldColumn} IS NOT NULL;
  END IF;
END $$;

🏗️ Implementation Requirements

Detection Engine Enhancements

interface IdempotencyDetector {
  // New patterns to detect
  patterns: {
    CREATE_TYPE: /CREATE\s+TYPE\s+(?!IF\s+NOT\s+EXISTS)/i,
    CREATE_EXTENSION: /CREATE\s+EXTENSION\s+(?!IF\s+NOT\s+EXISTS)/i,
    CREATE_SCHEMA: /CREATE\s+SCHEMA\s+(?!IF\s+NOT\s+EXISTS)/i,
    RENAME_COLUMN: /ALTER\s+TABLE.*RENAME\s+COLUMN/i,
    ADD_CONSTRAINT: /ADD\s+CONSTRAINT/i,
    CREATE_TRIGGER: /CREATE\s+TRIGGER\s+(?!OR\s+REPLACE)/i,
    CREATE_RULE: /CREATE\s+RULE\s+(?!OR\s+REPLACE)/i,
    CREATE_MATERIALIZED_VIEW: /CREATE\s+MATERIALIZED\s+VIEW/i,
    CREATE_DOMAIN: /CREATE\s+DOMAIN/i,
    ALTER_COLUMN_SET: /ALTER\s+COLUMN.*SET\s+(DEFAULT|NOT\s+NULL)/i,
    ALTER_COLUMN_DROP: /ALTER\s+COLUMN.*DROP\s+(DEFAULT|NOT\s+NULL)/i,
  };

  // Risk levels for each pattern
  riskLevels: {
    CREATE_TYPE: 'ERROR',
    CREATE_EXTENSION: 'ERROR',
    CREATE_SCHEMA: 'ERROR',
    RENAME_COLUMN: 'WARNING',
    ADD_CONSTRAINT: 'WARNING',
    CREATE_TRIGGER: 'WARNING',
    ALTER_COLUMN_SET: 'INFO',
    GRANT_REVOKE: 'INFO', // Already idempotent
  };
}

PostgreSQL Version Awareness

interface PostgreSQLVersionFeatures {
  '9.6': ['ALTER_COLUMN_IF_NOT_EXISTS'],
  '12': ['CREATE_TYPE_IF_NOT_EXISTS'],
  '13': ['ALTER_TYPE_ADD_VALUE_IF_NOT_EXISTS'],
  '14': ['CREATE_OR_REPLACE_TRIGGER'],
}

Fix Suggestion Engine

interface FixSuggestion {
  pattern: string;
  severity: 'ERROR' | 'WARNING' | 'INFO';
  originalSQL: string;
  suggestedFix: string;
  explanation: string;
  postgresVersionRequired?: string;
  alternativeFixes?: string[];
}

 Acceptance Criteria

Detection Coverage

- Detect all CREATE TYPE patterns without existence checks
- Detect CREATE EXTENSION without IF NOT EXISTS
- Detect CREATE SCHEMA without IF NOT EXISTS
- Detect ALTER TABLE RENAME COLUMN operations
- Detect ADD CONSTRAINT operations without existence checks
- Detect CREATE TRIGGER/RULE without DROP IF EXISTS or CREATE OR REPLACE
- Detect CREATE MATERIALIZED VIEW without DROP IF EXISTS
- Detect CREATE DOMAIN without existence checks
- Detect ALTER COLUMN SET/DROP operations
- Recognize and properly categorize GRANT/REVOKE as idempotent

Template System

- Generate appropriate DO $$ block templates for each pattern
- Support environment-conditional templates
- Support data migration safety templates
- Generate templates with proper exception handling
- Support namespace-aware existence checks

Risk Assessment

- Categorize each pattern by risk level (ERROR/WARNING/INFO)
- Provide confidence scores for fix suggestions
- Include PostgreSQL version requirements in suggestions

Testing

- Unit tests for each new pattern detection
- Integration tests with real migration examples
- Performance tests with 750+ migration files
- Template generation tests for each pattern type

Documentation for AI Developers

- Update CLAUDE.md with new patterns and templates
- Provide examples of each pattern with fixes
- Include best practices for idempotent migrations
- Add troubleshooting guide for complex patterns

📈 Success Metrics

- Coverage: Detect 95%+ of non-idempotent patterns found in 757 production migrations
- Accuracy: Zero false positives for ERROR-level detections
- Performance: Analyze 750+ migrations in <5 seconds
- Usability: AI developers can generate correct fixes 100% of the time

🔗 References

- Current implementation: src/utils/idempotency-analyzer.ts
- PostgreSQL documentation on idempotent operations
- https://github.com/graphile/migrate/blob/main/docs/idempotent-examples.md

🚀 Implementation Phases

Phase 1: Core Pattern Detection (Week 1)

- Implement CREATE TYPE, EXTENSION, SCHEMA detection
- Add basic DO $$ block template generation

Phase 2: Advanced Patterns (Week 1-2)

- Implement RENAME, CONSTRAINT, TRIGGER detection
- Add exception handling templates

Phase 3: Template System (Week 2)

- Build comprehensive template generator
- Add environment-conditional support
- Implement PostgreSQL version awareness

Phase 4: Testing & Documentation (Week 2)

- Test with all 757 production migrations
- Update documentation and CLAUDE.md
- Performance optimization

---
This issue enables AI developers to write production-grade idempotent migrations by detecting and
fixing patterns proven critical in 757 real-world migrations.