Optimize Container and Database Management - Reduce CI setup time by 40%
Phase 3: Container and Database Optimization
🎯 Objective
Reduce container startup and database initialization time from 13+ minutes to ~8 minutes per test run.
📊 Current Performance Analysis
Container Startup Bottlenecks
- PostgreSQL container startup: 60-90 seconds
- Docker-in-Docker (dind) service: 30-45 seconds overhead
- Container pull time: Variable (no local caching)
- Database initialization: 15-30 seconds per test suite
Impact
- Each test run waits 60+ seconds for container startup
- No container reuse between test suites
- Sequential database creation adds cumulative delays
🚀 Implementation Strategy
1. Pre-pulled Container Images
# .gitlab-ci.yml
before_script:
  - docker pull postgres:16-alpine || true
  - docker pull docker:dind || true
2. Optimized Container Configuration
// tests/helpers/shared-container-manager.ts
private async _startContainer(): Promise<void> {
  this.container = await new PostgreSqlContainer('postgres:16-alpine')
    .withDatabase('testdb')
    .withUsername('test')
    .withPassword('test')
    .withTmpFs({ '/var/lib/postgresql/data': 'rw,noexec,nosuid,size=256m' }) // RAM disk
    .withCommand([
      '-c', 'shared_buffers=128MB',
      '-c', 'max_connections=100',
      '-c', 'fsync=off', // Test-only optimization
      '-c', 'synchronous_commit=off', // Test-only optimization
      '-c', 'full_page_writes=off', // Test-only optimization
      // checkpoint_segments was removed in PostgreSQL 9.5 and would stop 16 from starting;
      // max_wal_size replaces it (value is an assumption, sized to fit the 256 MB RAM disk)
      '-c', 'max_wal_size=128MB',
      '-c', 'checkpoint_completion_target=0.9',
      '-c', 'wal_buffers=16MB'
    ])
    .withStartupTimeout(30000) // Fail fast; matches the < 30 s startup target
    .start();
}
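The Impact section lists "no container reuse between test suites" as a bottleneck. One way to share a single started container is to memoize the start promise, sketched below. `SharedResource` is a hypothetical name, and the start function is a stand-in so the example runs without Docker; in the real manager it would presumably wrap `_startContainer()`.

```typescript
// Hypothetical helper: memoizes an async start function so every caller
// shares one started resource (for example, one PostgreSQL container).
class SharedResource<T> {
  private startPromise: Promise<T> | null = null;
  private readonly startFn: () => Promise<T>;

  constructor(startFn: () => Promise<T>) {
    this.startFn = startFn;
  }

  // Returns the same promise to all callers. startFn runs at most once,
  // unless a start attempt fails, in which case the next call retries.
  get(): Promise<T> {
    if (this.startPromise === null) {
      this.startPromise = this.startFn().catch((err) => {
        this.startPromise = null; // allow a retry after a failed start
        throw err;
      });
    }
    return this.startPromise;
  }
}

// Stand-in for the real container start (assumption, not the project's code).
let startCount = 0;
const shared = new SharedResource(async () => {
  startCount += 1;
  return { uri: 'postgres://test:test@localhost:5432/testdb' };
});
```

Every suite that awaits `shared.get()` receives the same instance, so the startup cost is paid once per process rather than once per suite.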
3. Connection Pool Optimization
// src/utils/database-operations.ts
import type { PoolConfig } from 'pg';

export function getOptimizedPoolConfig(): PoolConfig {
  return {
    max: isCI() ? 5 : 10, // Fewer connections in CI
    idleTimeoutMillis: 10000, // Release idle connections quickly
    connectionTimeoutMillis: 5000, // Fail fast when no connection is available
    statement_timeout: 30000, // Server-side limit to prevent hanging queries
    query_timeout: 30000, // Client-side limit for the same purpose
    // Tags each connection so it can be traced to a test process in pg_stat_activity
    application_name: `test_${process.pid}_${Date.now()}`
  };
}
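The pool configuration calls `isCI()`, which is not shown in this issue. A minimal sketch of such a helper follows, assuming it only needs to detect GitLab CI; the optional `env` parameter is an addition for testability and is not part of the original call.

```typescript
// Hypothetical implementation of the isCI() helper used above.
// GitLab CI sets CI="true" and GITLAB_CI="true" for every job.
function isCI(
  env: Record<string, string | undefined> = process.env
): boolean {
  return env.CI === 'true' || env.GITLAB_CI === 'true';
}
```

Calling `isCI()` with no arguments reads the real process environment, so the existing call site would not need to change.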
4. Parallel Database Creation
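// tests/helpers/postgres-test-utils.ts
import { Pool } from 'pg';
import type { StartedPostgreSqlContainer } from '@testcontainers/postgresql';

export async function createTestDatabaseBatch(
  container: StartedPostgreSqlContainer,
  dbNames: string[]
): Promise<Map<string, string>> {
  // A single Client queues its queries and runs them one at a time, so a Pool
  // is used to let the CREATE DATABASE statements run concurrently.
  const pool = new Pool({
    connectionString: container.getConnectionUri(),
    max: 5, // Assumed cap on concurrent connections
  });
  try {
    // Create all databases in parallel
    const results = await Promise.all(
      dbNames.map(async (dbName): Promise<[string, string]> => {
        const safeDbName = dbName.toLowerCase().replace(/[^a-z0-9_]/g, '_');
        await pool.query(`CREATE DATABASE "${safeDbName}"`);
        // Swap only the database segment of the connection URI
        const uri = new URL(container.getConnectionUri());
        uri.pathname = `/${safeDbName}`;
        return [dbName, uri.toString()];
      })
    );
    return new Map(results);
  } finally {
    await pool.end();
  }
}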
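The sanitizing rule above can map two different inputs to the same database name (for example `a-b` and `a_b`), which is the name-collision risk noted in the Risk Assessment. The sketch below checks for that before any `CREATE DATABASE` runs; `sanitizeDbNames` is a hypothetical helper, not existing project code.

```typescript
// Hypothetical helper: applies the same sanitizing rule as the batch function
// and rejects any input list where two entries produce the same identifier.
function sanitizeDbNames(dbNames: string[]): Map<string, string> {
  const seen = new Map<string, string>();   // safe name -> original name
  const result = new Map<string, string>(); // original name -> safe name
  for (const dbName of dbNames) {
    const safe = dbName.toLowerCase().replace(/[^a-z0-9_]/g, '_');
    const existing = seen.get(safe);
    if (existing !== undefined) {
      throw new Error(`"${dbName}" and "${existing}" both map to "${safe}"`);
    }
    seen.set(safe, dbName);
    result.set(dbName, safe);
  }
  return result;
}
```

Running this check up front turns a mid-batch `CREATE DATABASE` failure into an immediate, descriptive error.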
5. Container Health Checks
# docker-compose.test.yml (for local development)
services:
  postgres:
    image: postgres:16-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U test -d testdb"]
      interval: 5s
      timeout: 3s
      retries: 5
      start_period: 10s
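Test code that connects to this service may want the same retry behavior on the client side. The sketch below mirrors the `interval` and `retries` values from the health check; `waitUntilHealthy` is a hypothetical helper, the readiness check is injected by the caller, and `start_period` is not modeled.

```typescript
// Hypothetical polling helper mirroring the healthcheck's interval and retries.
// Resolves with the number of attempts it took, or rejects after the last one.
async function waitUntilHealthy(
  check: () => Promise<boolean>,
  intervalMs = 5000,
  retries = 5
): Promise<number> {
  for (let attempt = 1; attempt <= retries; attempt++) {
    // A check that throws is treated the same as "not ready yet".
    const ok = await check().catch(() => false);
    if (ok) return attempt;
    if (attempt < retries) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`not healthy after ${retries} attempts`);
}
```

In practice the `check` callback could run a trivial query such as `SELECT 1` against the database and return `true` on success.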
6. GitLab CI Service Optimization
# .gitlab-ci.yml
test:coverage:
  services:
    - name: docker:dind
      alias: docker
      command: ["--storage-driver=overlay2", "--mtu=1450"]
    - name: postgres:16-alpine
      alias: postgres
  variables:
    POSTGRES_DB: testdb
    POSTGRES_USER: test
    POSTGRES_PASSWORD: test
    POSTGRES_INITDB_ARGS: "--encoding=UTF8 --locale=C"
    POSTGRES_HOST_AUTH_METHOD: "md5"
📈 Expected Improvements
Before
- Container startup: 60-90 seconds
- Database creation: 15-30 seconds per suite
- Total setup time: ~13 minutes
After
- Container startup: 20-30 seconds (pre-pulled + optimized)
- Database creation: 5-10 seconds per suite (parallel + RAM disk)
- Total setup time: ~8 minutes
Net Savings
- ~5 minutes per test run
- About 38% reduction in setup time (5 of 13 minutes), in line with the ~40% target
⚠️ Risk Assessment
Low Risk
- Container configuration changes (test-only optimizations)
- Connection pool adjustments
- Health check additions
Medium Risk
- RAM disk usage (may require CI runner memory adjustments)
- Parallel database creation (potential name collisions)
Mitigation
- Gradual rollout with monitoring
- Fallback to sequential creation if parallel fails
- Memory usage monitoring in CI
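The "fallback to sequential creation" mitigation could look roughly like the sketch below. It runs all tasks concurrently, then retries only the failed ones one at a time, since rerunning a `CREATE DATABASE` that already succeeded would fail. `runWithSequentialFallback` is a hypothetical name, and the tasks are assumed to be safe to retry.

```typescript
// Hypothetical wrapper: run every task concurrently, then rerun only the
// failed ones sequentially. Results keep the order of the input tasks.
async function runWithSequentialFallback<T>(
  tasks: Array<() => Promise<T>>
): Promise<T[]> {
  // allSettled waits for every task, so no work is still in flight
  // when the sequential retries begin.
  const settled = await Promise.allSettled(tasks.map((task) => task()));
  const results: T[] = [];
  for (const [i, outcome] of settled.entries()) {
    if (outcome.status === 'fulfilled') {
      results.push(outcome.value);
    } else {
      results.push(await tasks[i]()); // a second failure propagates to the caller
    }
  }
  return results;
}
```

Each task would wrap one database creation, for example `() => pool.query(...)`, so a transient failure in the parallel pass costs only one extra sequential attempt.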
✅ Success Criteria
- Container startup < 30 seconds consistently
- Database creation < 10 seconds per suite
- No increase in test flakiness
- CI memory usage stays within limits
- Total setup time reduced by about 5 minutes per test run
📋 Testing Plan
- Implement optimizations in feature branch
- Run 10 CI builds to gather metrics
- Compare against baseline performance
- Monitor for flaky tests or failures
- Gradual rollout to main branch
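Gathering metrics across the 10 CI builds requires timing each setup phase. A minimal sketch of a timing wrapper follows; `timePhase` and `phaseTimings` are hypothetical names, and how the numbers get reported (log line, job artifact) is left open.

```typescript
// Hypothetical helper: runs an async phase and records its wall-clock
// duration in milliseconds, whether the phase succeeds or throws.
const phaseTimings = new Map<string, number>();

async function timePhase<T>(
  label: string,
  phase: () => Promise<T>
): Promise<T> {
  const startedAt = Date.now();
  try {
    return await phase();
  } finally {
    phaseTimings.set(label, Date.now() - startedAt);
  }
}
```

Wrapping the container start and the batch database creation in `timePhase` would yield the two numbers the success criteria are stated in.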
🔗 Related Issues
- #11: Enable parallel test execution
- #13: Implement advanced caching strategies
📊 Monitoring Metrics
- Container startup time
- Database creation time
- Memory usage per job
- Test success rate
- P95 test execution time
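For the P95 metric, a nearest-rank percentile over the collected durations is one simple option, sketched below. The function name is hypothetical, and nearest-rank is one of several percentile definitions; the issue does not specify which one to use.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  if (p <= 0 || p > 100) throw new Error('p must be in (0, 100]');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Illustrative durations in seconds (made-up values, not measurements).
const durations = [480, 495, 470, 510, 488, 502, 476, 530, 491, 485];
const p95 = percentile(durations, 95);
```

With only 10 samples, the nearest-rank P95 equals the slowest run, so more builds are needed before P95 says anything beyond the maximum.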
Priority: High
Estimated Effort: 1 day
Labels: performance, ci-cd, infrastructure