Implement Advanced Caching Strategies - Reduce dependency installation by 60%

Phase 4: Advanced Caching Strategies

🎯 Objective

Reduce npm dependency installation time from 90+ seconds to ~30 seconds through intelligent caching strategies.

📊 Current Caching Analysis

Current State

  • npm ci execution: 90-120 seconds per job
  • Cache hit rate: ~50% (key changes frequently)
  • Cache size: Growing unbounded
  • No layer caching for Docker operations

Problems Identified

  1. Cache key based only on package-lock.json - misses when lock file changes
  2. No fallback cache strategies
  3. Downloading full npm registry metadata repeatedly
  4. No caching of built artifacts between stages

🚀 Implementation Strategy

1. Multi-Level Cache Strategy

# .gitlab-ci.yml
variables:
  npm_config_cache: "$CI_PROJECT_DIR/.npm"
  YARN_CACHE_FOLDER: "$CI_PROJECT_DIR/.yarn"

.npm_cache_enhanced: &npm_cache_enhanced
  cache:
    - key:
        files:
          - package-lock.json
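      paths:
        - .npm/
        # npm ci deletes node_modules/ before installing, so this path only
        # helps jobs that skip npm ci when node_modules/ is already present
        - node_modules/
      policy: pull-push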
    # Fallback cache when lock file changes; pull-push so that some job
    # actually populates the branch-level cache
    - key: "$CI_COMMIT_REF_SLUG"
      paths:
        - .npm/
      policy: pull-push
    # Global fallback cache
    - key: "npm-cache-global"
      paths:
        - .npm/
      policy: pull

2. Artifact Caching Between Stages

# .gitlab-ci.yml
build:
  stage: build
  script:
    - npm ci --cache .npm --prefer-offline
    - npm run build
  artifacts:
    paths:
      - dist/
      - node_modules/
      - .npm/
    expire_in: 2 hours
    # Upload artifacts only when the job succeeds, for reuse by downstream jobs
    when: on_success

test:coverage:
  stage: test
  dependencies: ["build"]  # Reuse build artifacts
  script:
    # Skip npm ci if node_modules exists from artifacts
    - |
      if [ ! -d "node_modules" ]; then
        npm ci --cache .npm --prefer-offline
      else
        echo "Using node_modules from build artifacts"
      fi
    - npm run test:coverage:ci
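
The directory check above only proves that node_modules/ exists, not that it matches the current lock file. A minimal sketch of a stricter check follows, assuming a marker file that stores the lock file hash. The marker name `.lock-hash` and the helper names are hypothetical, not npm or GitLab conventions, and `sha256sum` is assumed to be available in the job image.

```shell
#!/bin/sh
# Hypothetical helpers: decide whether node_modules/ restored from artifacts
# still matches the current package-lock.json.

# Marker file name is an assumption, not an npm convention.
MARKER="node_modules/.lock-hash"

# Print the SHA-256 of the lock file in the given project directory.
lock_hash() {
  sha256sum "$1/package-lock.json" | cut -d ' ' -f 1
}

# Return 0 (true) when an install is needed, 1 when node_modules/ is current.
needs_install() {
  dir="$1"
  [ -f "$dir/$MARKER" ] || return 0
  [ "$(cat "$dir/$MARKER")" = "$(lock_hash "$dir")" ] && return 1
  return 0
}

# Record the current lock file hash after a successful install.
mark_installed() {
  mkdir -p "$1/node_modules"
  lock_hash "$1" > "$1/$MARKER"
}

# Possible use in a job script (commented out so the sketch has no side effects):
#   if needs_install .; then
#     npm ci --cache .npm --prefer-offline && mark_installed .
#   fi
```

Because the marker lives inside node_modules/, it travels with the artifact and disappears whenever npm ci wipes the directory, so a missing marker always means "install".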

3. Docker Layer Caching

# .gitlab-ci.yml
variables:
  DOCKER_BUILDKIT: 1

build:docker:
  before_script:
    # Create a buildx builder that supports registry cache export.
    # Scoped to this job so that npm-only jobs do not need Docker.
    - docker buildx create --use --driver docker-container
    - docker buildx inspect --bootstrap
  script:
    # The registry cache with mode=max stores all layers, so the
    # BUILDKIT_INLINE_CACHE build argument is not needed here
    - |
      docker buildx build \
        --cache-from type=registry,ref=$CI_REGISTRY_IMAGE:cache \
        --cache-to type=registry,ref=$CI_REGISTRY_IMAGE:cache,mode=max \
        --tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA \
        --push .

4. NPM Registry Caching

# .gitlab-ci.yml
before_script:
  # Configure npm to use local cache aggressively
  # (cache-min is deprecated in favor of prefer-offline, so it is not set)
  - npm config set prefer-offline true
  - npm config set fetch-retries 3
  - npm config set fetch-retry-mintimeout 15000
  - npm config set fetch-retry-maxtimeout 90000

5. Selective Dependency Installation

// package.json - Add CI-specific install script
{
  "scripts": {
    "ci:install": "npm ci --omit=optional --ignore-scripts --cache .npm --prefer-offline",
    "ci:install:production": "npm ci --omit=dev --cache .npm --prefer-offline"
  }
}

6. Cache Warming Strategy

# .gitlab-ci.yml - Scheduled pipeline to warm caches
cache:warm:
  only:
    - schedules
  script:
    - npm ci --cache .npm
    - npm run build
    - echo "Cache warmed at $(date)"
  cache:
    key: "npm-cache-global"
    paths:
      - .npm/
      - node_modules/
    policy: push

7. Intelligent Cache Invalidation

# .gitlab-ci.yml
variables:
  # Version cache keys to allow controlled invalidation
  CACHE_VERSION: "v2"
  
.npm_cache: &npm_cache
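  cache:
    # Bump CACHE_VERSION to force a fresh cache. A per-pipeline ID in the key
    # would make every pipeline miss, so it is left out.
    key: "$CACHE_VERSION-$CI_COMMIT_REF_SLUG"
    paths:
      - .npm/
      - node_modules/
    # Cache only the listed paths, not untracked files
    untracked: false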

8. Distributed Cache with GitLab

# .gitlab-ci.yml
default:
  cache:
    # Shared key for all jobs; the distributed cache backend itself (for
    # example object storage) is enabled in the runner configuration, not here
    key:
      files:
        - package-lock.json
        - .gitlab-ci.yml  # Invalidate on CI changes
    paths:
      - .npm/
      - node_modules/
    # Skip untracked files and save the cache only when the job succeeds
    untracked: false
    when: on_success

📈 Expected Improvements

Before

  • npm ci: 90-120 seconds
  • Cache hit rate: ~50%
  • Total dependency time: ~100 seconds average

After

  • npm ci (cache hit): 20-30 seconds
  • npm ci (cache miss): 60-70 seconds
  • Cache hit rate: ~85%
  • Total dependency time: ~35 seconds average

Net Savings

  • ~65 seconds per job on average (100 seconds down to 35 seconds)
  • Roughly 65% reduction in dependency installation time, which meets the 60% target
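
The averages above can be cross-checked with integer arithmetic. The sketch below assumes the midpoints of the quoted ranges (25 seconds for a hit, 65 seconds for a miss); the function names are illustrative only.

```shell
#!/bin/sh
# Expected average install time in whole seconds, given a hit rate in percent
# and the install times for a cache hit and a cache miss.
expected_avg() {
  hit_pct="$1"; hit_s="$2"; miss_s="$3"
  echo $(( (hit_pct * hit_s + (100 - hit_pct) * miss_s) / 100 ))
}

# Percentage reduction from a "before" time to an "after" time.
reduction_pct() {
  echo $(( ($1 - $2) * 100 / $1 ))
}

expected_avg 85 25 65   # prints 31 (midpoints, 85% hit rate)
reduction_pct 100 35    # prints 65 (Before/After averages)
```

The weighted average of 31 seconds sits slightly below the ~35 seconds quoted, so the After figure leaves some headroom for cache transfer time.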

⚠️ Risk Assessment

Low Risk

  • npm configuration changes
  • Artifact reuse between stages
  • Cache key strategies

Medium Risk

  • Cache size growth (mitigated by expiration)
  • Stale cache issues (mitigated by versioning)
  • Network issues with distributed cache

Mitigation

  • Regular cache pruning via scheduled pipelines
  • Cache version bumping when needed
  • Fallback strategies for cache misses
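
The pruning step could be sketched as follows for a self-managed runner that keeps cache archives in a local directory. The directory path, the 7-day threshold, and the function name are assumptions; a runner backed by object storage would more likely rely on the storage provider's lifecycle rules.

```shell
#!/bin/sh
# Delete cache files whose last modification is more than max_age_days days
# old (find counts age in whole days). cache_dir is a hypothetical local path.
prune_cache() {
  cache_dir="$1"; max_age_days="$2"
  [ -d "$cache_dir" ] || return 0
  find "$cache_dir" -type f -mtime +"$max_age_days" -exec rm -f {} +
}

# Possible use in a scheduled job (path is a placeholder):
#   prune_cache /path/to/runner/cache 7
```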

Success Criteria

  1. npm ci < 30 seconds with cache hit
  2. Cache hit rate > 80%
  3. Cache size < 500 MB per job
  4. No stale dependency issues
  5. Total CI time reduced by 2+ minutes
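
The cache size criterion can be checked inside a job with a small guard. This is a sketch only: the function name is hypothetical, and it measures the unpacked directories with `du` rather than the compressed archive that the runner uploads.

```shell
#!/bin/sh
# Succeed when the directory uses at most limit_mb megabytes on disk
# (1 MB is treated as 1024 KB here).
cache_within_limit() {
  dir="$1"; limit_mb="$2"
  size_kb=$(du -sk "$dir" | cut -f 1)
  [ "$size_kb" -le $(( limit_mb * 1024 )) ]
}

# Possible use at the end of a job script:
#   cache_within_limit .npm 500 || echo "WARNING: .npm exceeds 500 MB"
```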

📋 Implementation Steps

  1. Multi-level cache keys
  2. Artifact sharing between stages
  3. Docker layer caching
  4. Cache warming pipelines
  5. Monitoring and optimization

🔗 Related Issues

  • #11: Enable parallel test execution
  • #12: Optimize container and database management

📊 Monitoring Metrics

  • Cache hit/miss ratio
  • npm ci execution time
  • Cache transfer time
  • Total cache size
  • Job duration percentiles
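
The hit/miss ratio could be derived from a simple per-job record. The sketch below assumes a hypothetical log with one `hit` or `miss` per line; how that record is produced (for example by a job script that checks whether `.npm/` was restored) is left open.

```shell
#!/bin/sh
# Read one "hit" or "miss" per line on stdin and print the hit rate as a
# whole-number percentage (rounded down). Prints 0 when there is no input.
hit_rate_pct() {
  awk '
    $1 == "hit" { hits++ }
    $1 == "hit" || $1 == "miss" { total++ }
    END { if (total > 0) printf "%d\n", hits * 100 / total; else print 0 }
  '
}

printf 'hit\nhit\nmiss\nhit\n' | hit_rate_pct   # prints 75
```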

🏷️ Advanced Optimizations (Future)

npm Optimization

# Use pnpm for potentially faster installs (benchmark on this project before adopting)
pnpm install --frozen-lockfile --prefer-offline

# Or Yarn with PnP (Plug'n'Play)
yarn install --immutable --immutable-cache

Pre-built Docker Images

# Dockerfile.ci
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
# Push as base image for CI

Distributed npm Cache

# Using Artifactory or Nexus as npm proxy
npm config set registry https://npm-cache.company.internal/

Priority: Medium
Estimated Effort: 1 day
Labels: performance, ci-cd, caching