Optimize Container and Database Management - Reduce CI setup time by 40%

Phase 3: Container and Database Optimization

🎯 Objective

Reduce container startup and database initialization time from 13+ minutes to ~8 minutes per test run.

📊 Current Performance Analysis

Container Startup Bottlenecks

  • PostgreSQL container startup: 60-90 seconds
  • Docker-in-Docker (dind) service: 30-45 seconds overhead
  • Container pull time: Variable (no local caching)
  • Database initialization: 15-30 seconds per test suite

Impact

  • Each test run waits 60+ seconds for container startup
  • No container reuse between test suites
  • Sequential database creation adds cumulative delays

🚀 Implementation Strategy

1. Pre-pulled Container Images

# .gitlab-ci.yml
before_script:
  - docker pull postgres:16-alpine || true
  - docker pull docker:dind || true

2. Optimized Container Configuration

// tests/helpers/shared-container-manager.ts
private async _startContainer(): Promise<void> {
  this.container = await new PostgreSqlContainer('postgres:16-alpine')
    .withDatabase('testdb')
    .withUsername('test')
    .withPassword('test')
    .withTmpFs({ '/var/lib/postgresql/data': 'rw,noexec,nosuid,size=256m' }) // RAM disk
    .withCommand([
      '-c', 'shared_buffers=128MB',
      '-c', 'max_connections=100',
      '-c', 'fsync=off',              // Test-only optimization
      '-c', 'synchronous_commit=off',  // Test-only optimization
      '-c', 'full_page_writes=off',    // Test-only optimization
      '-c', 'max_wal_size=1536MB',     // checkpoint_segments was removed in PostgreSQL 9.5; 32 segments ≈ 1536MB
      '-c', 'checkpoint_completion_target=0.9',
      '-c', 'wal_buffers=16MB'
    ])
    .withStartupTimeout(30000) // Fail fast if container issues
    .start();
}

3. Connection Pool Optimization

// src/utils/database-operations.ts
export function getOptimizedPoolConfig(): PoolConfig {
  return {
    max: isCI() ? 5 : 10,           // Reduce connections in CI
    idleTimeoutMillis: 10000,       // Quick cleanup
    connectionTimeoutMillis: 5000,   // Fail fast
    statement_timeout: 30000,        // Prevent hanging queries
    query_timeout: 30000,
    // Tag connections so each test process can be identified in pg_stat_activity
    application_name: `test_${process.pid}_${Date.now()}`
  };
}
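
getOptimizedPoolConfig calls isCI(), which is not shown in this issue. A minimal sketch of such a helper, assuming it only needs to detect GitLab CI (which sets CI=true and GITLAB_CI=true in every job). Passing the environment in explicitly is an assumption made for testability, and maxPoolSize is a hypothetical name; neither is the project's actual implementation.

```typescript
// Hypothetical helper: the real isCI() in this project is not shown in the issue.
type Env = Record<string, string | undefined>;

// GitLab CI sets CI=true (and GITLAB_CI=true) for every job.
export function isCI(env: Env): boolean {
  return env.CI === 'true' || env.GITLAB_CI === 'true';
}

// Mirrors the pool sizing above: fewer connections on shared CI runners.
export function maxPoolSize(env: Env): number {
  return isCI(env) ? 5 : 10;
}
```

In application code this would be called as isCI(process.env).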

4. Parallel Database Creation

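// tests/helpers/postgres-test-utils.ts
import { Pool } from 'pg';
// Adjust this import to match the testcontainers package version in use
import type { StartedPostgreSqlContainer } from '@testcontainers/postgresql';

export async function createTestDatabaseBatch(
  container: StartedPostgreSqlContainer,
  dbNames: string[]
): Promise<Map<string, string>> {
  // A single pg Client queues queries and runs them one at a time, so a small
  // Pool is used to let the CREATE DATABASE statements actually overlap.
  const pool = new Pool({
    connectionString: container.getConnectionUri(),
    max: 4, // Assumed concurrency limit; tune for the CI runner
  });

  try {
    // Create all databases in parallel
    const results = await Promise.all(
      dbNames.map(async (dbName): Promise<[string, string]> => {
        const safeDbName = dbName.toLowerCase().replace(/[^a-z0-9_]/g, '_');
        await pool.query(`CREATE DATABASE "${safeDbName}"`);
        // Swap only the path component so the URI points at the new database
        const uri = new URL(container.getConnectionUri());
        uri.pathname = `/${safeDbName}`;
        return [dbName, uri.toString()];
      })
    );
    return new Map(results);
  } finally {
    await pool.end();
  }
}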
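
The sanitizing step in createTestDatabaseBatch can map two different requested names to the same database name (for example user-db and user_db), which is one source of the name-collision risk listed under Medium Risk. A sketch of a pre-check that could run before any CREATE DATABASE is issued; findNameCollisions and toSafeDbName are hypothetical helpers, not part of the existing code.

```typescript
// Same sanitizing rule as createTestDatabaseBatch.
function toSafeDbName(dbName: string): string {
  return dbName.toLowerCase().replace(/[^a-z0-9_]/g, '_');
}

// Hypothetical helper: groups requested names by their sanitized form and
// returns only the groups where more than one requested name maps to it.
export function findNameCollisions(dbNames: string[]): Map<string, string[]> {
  const bySafeName = new Map<string, string[]>();
  for (const dbName of dbNames) {
    const safe = toSafeDbName(dbName);
    const group = bySafeName.get(safe) ?? [];
    group.push(dbName);
    bySafeName.set(safe, group);
  }

  const collisions = new Map<string, string[]>();
  bySafeName.forEach((group, safe) => {
    if (group.length > 1) collisions.set(safe, group);
  });
  return collisions;
}
```

A caller could throw when the returned map is non-empty, so a collision is reported with the offending names instead of surfacing as a "database already exists" error from PostgreSQL.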

5. Container Health Checks

# docker-compose.test.yml (for local development)
services:
  postgres:
    image: postgres:16-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U test -d testdb"]
      interval: 5s
      timeout: 3s
      retries: 5
      start_period: 10s

6. GitLab CI Service Optimization

# .gitlab-ci.yml
test:coverage:
  services:
    - name: docker:dind
      alias: docker
      command: ["--storage-driver=overlay2", "--mtu=1450"]
    - name: postgres:16-alpine
      alias: postgres
      variables:
        POSTGRES_DB: testdb
        POSTGRES_USER: test
        POSTGRES_PASSWORD: test
        POSTGRES_INITDB_ARGS: "--encoding=UTF8 --locale=C"
        POSTGRES_HOST_AUTH_METHOD: "md5"

📈 Expected Improvements

Before

  • Container startup: 60-90 seconds
  • Database creation: 15-30 seconds per suite
  • Total setup time: ~13 minutes

After

  • Container startup: 20-30 seconds (pre-pulled + optimized)
  • Database creation: 5-10 seconds per suite (parallel + RAM disk)
  • Total setup time: ~8 minutes

Net Savings

  • ~5 minutes per test run
  • ~38% reduction in setup time (5 of 13 minutes, roughly the 40% targeted)

⚠️ Risk Assessment

Low Risk

  • Container configuration changes (test-only optimizations)
  • Connection pool adjustments
  • Health check additions

Medium Risk

  • RAM disk usage (may require CI runner memory adjustments)
  • Parallel database creation (potential name collisions)

Mitigation

  • Gradual rollout with monitoring
  • Fallback to sequential creation if parallel fails
  • Memory usage monitoring in CI
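
The second mitigation, falling back to sequential creation when the parallel path fails, can be sketched independently of PostgreSQL. Here createOne is a hypothetical stand-in for whatever function issues a single CREATE DATABASE, and the retry policy (one parallel attempt, then a sequential pass over the names that failed) is an assumption.

```typescript
type CreateOne = (dbName: string) => Promise<void>;

// Tries all names concurrently first; any name whose creation rejected is
// retried one at a time. Returns the names that needed the sequential retry.
export async function createWithFallback(
  dbNames: string[],
  createOne: CreateOne
): Promise<string[]> {
  // Convert each result to a boolean so one failure does not reject the batch.
  const outcomes = await Promise.all(
    dbNames.map((name) => createOne(name).then(() => true, () => false))
  );
  const failed = dbNames.filter((_, i) => !outcomes[i]);

  for (const dbName of failed) {
    // Sequential retry; an error here propagates to the caller.
    await createOne(dbName);
  }
  return failed;
}
```

The returned list doubles as a monitoring signal: if it is regularly non-empty, the parallel path is not reliable on that runner.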

Success Criteria

  1. Container startup < 30 seconds consistently
  2. Database creation < 10 seconds per suite
  3. No increase in test flakiness
  4. CI memory usage stays within limits
  5. Total test time reduced by 5+ minutes
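
The numeric criteria above can be checked mechanically once timings are collected. A sketch, assuming measurements are recorded in seconds; the RunMetrics shape and its field names are hypothetical, and the thresholds are the ones listed above. Criteria 3 and 4 need data this sketch does not model.

```typescript
// Hypothetical shape for one set of CI timings, all in seconds.
interface RunMetrics {
  containerStartupSec: number;
  dbCreationSecPerSuite: number;
  totalTestTimeSec: number;
}

// Returns a description of each numeric success criterion that is not met.
export function unmetCriteria(baseline: RunMetrics, current: RunMetrics): string[] {
  const unmet: string[] = [];
  if (current.containerStartupSec >= 30) {
    unmet.push('container startup < 30 s');
  }
  if (current.dbCreationSecPerSuite >= 10) {
    unmet.push('database creation < 10 s per suite');
  }
  if (baseline.totalTestTimeSec - current.totalTestTimeSec < 5 * 60) {
    unmet.push('total test time reduced by 5+ minutes');
  }
  return unmet;
}
```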

📋 Testing Plan

  1. Implement optimizations in feature branch
  2. Run 10 CI builds to gather metrics
  3. Compare against baseline performance
  4. Monitor for flaky tests or failures
  5. Gradual rollout to main branch

🔗 Related Issues

  • #11: Enable parallel test execution
  • #13: Implement advanced caching strategies

📊 Monitoring Metrics

  • Container startup time
  • Database creation time
  • Memory usage per job
  • Test success rate
  • P95 test execution time
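
For the P95 metric, a small helper is enough to summarize per-job timings exported from CI. This sketch uses the nearest-rank definition of a percentile, which is one of several common definitions; the function name and input format are assumptions.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
export function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('percentile of empty sample');
  if (p <= 0 || p > 100) throw new Error('p must be in (0, 100]');

  const sorted = values.slice().sort((a, b) => a - b);
  // Multiply before dividing to limit floating-point error in the rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}
```

With only 10 builds, as in the testing plan, the nearest-rank P95 is simply the slowest run, so a larger sample is needed before P95 says more than the maximum does.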

Priority: High
Estimated Effort: 1 day
Labels: performance, ci-cd, infrastructure