# CI/CD: Inconsistent artifact collection for monitoring reports across test jobs
## Problem Description
The CI pipeline generates valuable monitoring reports (`ci-monitoring-*/final-ci-report.md`) that provide post-mortem analysis of CI executions, but these artifacts are not being collected consistently across all test jobs. This makes debugging CI failures difficult and prevents access to critical diagnostic information.
## Current State
**What the monitoring reports contain:**
The `ci-enhanced-monitoring.sh` script generates comprehensive reports including:
- Execution summary (success/failure with exit codes)
- System information (host, OS, architecture)
- Resource usage peaks (memory, CPU, disk)
- Process analysis (final process states)
- Error analysis (tail of error logs)
- Cleanup actions performed
- Recommendations for fixing issues
- Paths to detailed monitoring artifacts
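For illustration, a report with the sections above could be assembled by a skeleton like this (a hypothetical sketch; the actual layout produced by `ci-enhanced-monitoring.sh`, and the `ci-error.log` filename, are assumptions):

```shell
#!/bin/sh
# Hypothetical sketch: emit a final-ci-report.md with the sections listed above.
# Usage: write_report_skeleton EXIT_CODE > final-ci-report.md
write_report_skeleton() {
  exit_code=$1
  cat <<EOF
# Final CI Report
## Execution summary
Exit code: ${exit_code}
## System information
OS: $(uname -s), arch: $(uname -m)
## Resource usage peaks
(filled in by the monitoring loop)
## Error analysis
$(tail -n 20 ci-error.log 2>/dev/null || echo "no error log found")
EOF
}
```

The real script also records process states, cleanup actions, and recommendations; the point here is only that each section is a cheap shell snippet appended to one markdown file.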
**Current artifact collection issues:**
- **Inconsistent collection**: only some jobs collect monitoring artifacts
  - ✅ `test:coverage`: collects `ci-monitoring-$CI_JOB_ID/`
  - ✅ `performance:migration-tools`: collects `ci-monitoring-$CI_JOB_ID/`
  - ❌ `test:unit`: no monitoring artifacts collected
  - ❌ `test:integration`: no monitoring artifacts collected
  - ❌ `test:e2e`: no monitoring artifacts collected
  - ❌ `validate:pipeline`: no monitoring artifacts collected
- **Variable naming**: jobs use `$CI_JOB_ID` in artifact paths, but the monitoring script creates directories with timestamps (e.g., `ci-monitoring-1755401890`), so the paths never match
- **Missing validation artifacts**: the `validate:pipeline` job doesn't collect any artifacts, making it hard to debug validation failures
## Impact
- **Debugging difficulty**: when CI fails, developers can't access the monitoring reports without SSH access to the runners
- **Lost diagnostic data**: valuable performance and resource-usage data is not preserved
- **Inconsistent troubleshooting**: some jobs provide artifacts while others don't, leading to confusion
- **Manual intervention required**: developers must manually run scripts locally to reproduce issues
## Proposed Solution
### 1. Standardize artifact collection across ALL test jobs

Update `.gitlab-ci.yml` to add consistent artifact collection:
```yaml
.standard_artifacts: &standard_artifacts
  artifacts:
    when: always
    reports:
      junit:
        - coverage/junit.xml
    paths:
      - coverage/
      - ci-monitoring-*/
      - ci-monitoring-*/final-ci-report.md
      - docker-cleanup-*/
    expire_in: 3 days
```
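A job would then opt in by merging the anchor (a sketch; the `script` line is a placeholder, not the project's actual test command):

```yaml
test:unit:
  stage: test
  script:
    - npm run test:unit  # placeholder for the job's existing script
  <<: *standard_artifacts
```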
### 2. Fix artifact path mismatch

Either:

- **Option A**: update the monitoring script to use `$CI_JOB_ID` in directory names
- **Option B**: use wildcard patterns in artifact paths (the current approach with `ci-monitoring-*/`)
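Option A could be as small as one line near the top of `ci-enhanced-monitoring.sh` (a sketch; it assumes the script currently derives the directory name from a timestamp):

```shell
#!/bin/sh
# Hypothetical Option A: prefer the CI job ID so the directory name matches the
# artifact path in .gitlab-ci.yml, falling back to a timestamp for local runs
# where CI_JOB_ID is unset.
MONITOR_DIR="ci-monitoring-${CI_JOB_ID:-$(date +%s)}"
mkdir -p "$MONITOR_DIR"
```

Keeping the timestamp fallback means developers can still run the script outside CI without any environment setup.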
### 3. Add artifacts to these jobs

- `test:unit`
- `test:integration`
- `test:e2e`
- `validate:pipeline`
### 4. Create artifact aggregation job

Add a new job in the `quality-gates` stage that:
- Downloads all monitoring reports from previous jobs
- Creates a consolidated summary report
- Posts key findings as MR comments (for MR pipelines)
Example:
```yaml
aggregate:monitoring-reports:
  stage: quality-gates
  needs:
    - job: "test:unit"
      artifacts: true
    - job: "test:integration"
      artifacts: true
    - job: "test:e2e"
      artifacts: true
    - job: "test:coverage"
      artifacts: true
  script:
    - |
      echo "# Consolidated CI Monitoring Report" > consolidated-report.md
      for report in ci-monitoring-*/final-ci-report.md; do
        echo "## $(dirname "$report")" >> consolidated-report.md
        cat "$report" >> consolidated-report.md
      done
    - |
      # Post to MR if in MR pipeline
      if [[ -n "$CI_MERGE_REQUEST_IID" ]]; then
        # Extract key metrics and post as MR comment
        # (implementation details omitted for brevity)
        :  # no-op placeholder so the otherwise-empty if-body stays valid shell
      fi
  artifacts:
    paths:
      - consolidated-report.md
    expire_in: 1 week
```
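The omitted MR-comment step could look roughly like this (a hypothetical sketch: `extract_key_findings` and the `GITLAB_API_TOKEN` variable are assumptions of this example, though the merge request Notes API endpoint itself is standard GitLab):

```shell
#!/bin/sh
# Hypothetical helper: pull each job's section header plus its first line out
# of the consolidated report, to keep the MR comment short.
extract_key_findings() {
  awk '/^## ci-monitoring-/ { print; if ((getline line) > 0) print line }' "$1"
}

# The actual posting step is a network call, shown commented out.
# GITLAB_API_TOKEN would be a project CI/CD variable with `api` scope.
# curl --request POST \
#      --header "PRIVATE-TOKEN: ${GITLAB_API_TOKEN}" \
#      --data-urlencode "body=$(extract_key_findings consolidated-report.md)" \
#      "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/merge_requests/${CI_MERGE_REQUEST_IID}/notes"
```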
## Benefits
- **Consistent debugging**: all CI failures will have monitoring reports available
- **Performance tracking**: historical performance data can be analyzed across runs
- **Resource optimization**: identify jobs that need more resources based on actual usage
- **Faster troubleshooting**: no need to SSH to runners or reproduce locally
- **MR visibility**: key findings posted directly to merge requests
## Implementation Priority
High - This is a critical debugging and monitoring capability that directly impacts developer productivity when CI fails.
## Testing
After implementation:
- Trigger a CI run and verify all jobs collect artifacts
- Intentionally fail a test and verify error reports are accessible
- Check GitLab UI shows artifacts for all test jobs
- Verify artifact retention periods are appropriate
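The first two checks can be partly automated against a downloaded artifact archive (a sketch; `check_artifacts` is a hypothetical helper, and the extraction directory is whatever you unzip the job's artifacts into):

```shell
#!/bin/sh
# Hypothetical check: given a directory where a job's artifacts were extracted,
# verify that at least one monitoring report made it into the archive.
check_artifacts() {
  for report in "$1"/ci-monitoring-*/final-ci-report.md; do
    [ -f "$report" ] && return 0
  done
  return 1
}
```

Run it once per test job's artifact download; any job where it fails is still missing the standardized `artifacts` block.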
## Related Files

- `.gitlab-ci.yml`: main CI configuration
- `.ci/ci-enhanced-monitoring.sh`: monitoring script that generates reports
- `.ci/process-monitor.sh`: background monitoring process
## Labels
- ci-cd
- monitoring
- artifacts
- debugging
- quality-of-life