# CI/CD: Inconsistent artifact collection for monitoring reports across test jobs
## Problem Description
The CI pipeline generates valuable monitoring reports (`ci-monitoring-*/final-ci-report.md`) that provide post-mortem analysis of CI executions, but these artifacts are not being collected consistently across all test jobs. This makes debugging CI failures difficult and prevents access to critical diagnostic information.
## Current State
**What the monitoring reports contain:**
The `ci-enhanced-monitoring.sh` script generates comprehensive reports including:
- Execution summary (success/failure with exit codes)
- System information (host, OS, architecture)
- Resource usage peaks (memory, CPU, disk)
- Process analysis (final process states)
- Error analysis (tail of error logs)
- Cleanup actions performed
- Recommendations for fixing issues
- Paths to detailed monitoring artifacts
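For illustration, a report with the sections above could be assembled by a skeleton like this (a hypothetical sketch; the actual layout produced by `ci-enhanced-monitoring.sh`, and the `ci-error.log` filename, are assumptions):

```shell
#!/bin/sh
# Hypothetical sketch: emit a final-ci-report.md with the sections listed above.
# Usage: write_report_skeleton EXIT_CODE > final-ci-report.md
write_report_skeleton() {
  exit_code=$1
  cat <<EOF
# Final CI Report
## Execution summary
Exit code: ${exit_code}
## System information
OS: $(uname -s), arch: $(uname -m)
## Resource usage peaks
(filled in by the monitoring loop)
## Error analysis
$(tail -n 20 ci-error.log 2>/dev/null || echo "no error log found")
EOF
}
```

The real script also records process states, cleanup actions, and recommendations; the point here is only that each section is a cheap shell snippet appended to one markdown file.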
**Current artifact collection issues:**
- **Inconsistent collection**: only some jobs collect monitoring artifacts
  - ✅ `test:coverage`: collects `ci-monitoring-$CI_JOB_ID/`
  - ✅ `performance:migration-tools`: collects `ci-monitoring-$CI_JOB_ID/`
  - ❌ `test:unit`: no monitoring artifacts collected
  - ❌ `test:integration`: no monitoring artifacts collected
  - ❌ `test:e2e`: no monitoring artifacts collected
  - ❌ `validate:pipeline`: no monitoring artifacts collected
- **Variable naming**: jobs use `$CI_JOB_ID` in artifact paths, but the monitoring script creates directories with timestamps (e.g., `ci-monitoring-1755401890`), so the paths never match
- **Missing validation artifacts**: the `validate:pipeline` job doesn't collect any artifacts, making it hard to debug validation failures
## Impact
- **Debugging difficulty**: when CI fails, developers can't access the monitoring reports without SSH access to the runners
- **Lost diagnostic data**: valuable performance and resource-usage data is not preserved
- **Inconsistent troubleshooting**: some jobs provide artifacts while others don't, leading to confusion
- **Manual intervention required**: developers must manually run scripts locally to reproduce issues
## Proposed Solution
### 1. Standardize artifact collection across ALL test jobs

Update `.gitlab-ci.yml` to add consistent artifact collection:
```yaml
.standard_artifacts: &standard_artifacts
  artifacts:
    when: always
    reports:
      junit:
        - coverage/junit.xml
    paths:
      - coverage/
      - ci-monitoring-*/
      - ci-monitoring-*/final-ci-report.md
      - docker-cleanup-*/
    expire_in: 3 days
```
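A job would then opt in by merging the anchor (a sketch; the `script` line is a placeholder, not the project's actual test command):

```yaml
test:unit:
  stage: test
  script:
    - npm run test:unit  # placeholder for the job's existing script
  <<: *standard_artifacts
```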
### 2. Fix artifact path mismatch

Either:

- **Option A**: update the monitoring script to use `$CI_JOB_ID` in directory names
- **Option B**: use wildcard patterns in artifact paths (the current approach with `ci-monitoring-*/`)
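Option A could be as small as one line near the top of `ci-enhanced-monitoring.sh` (a sketch; it assumes the script currently derives the directory name from a timestamp):

```shell
#!/bin/sh
# Hypothetical Option A: prefer the CI job ID so the directory name matches the
# artifact path in .gitlab-ci.yml, falling back to a timestamp for local runs
# where CI_JOB_ID is unset.
MONITOR_DIR="ci-monitoring-${CI_JOB_ID:-$(date +%s)}"
mkdir -p "$MONITOR_DIR"
```

Keeping the timestamp fallback means developers can still run the script outside CI without any environment setup.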
### 3. Add artifacts to these jobs

- `test:unit`
- `test:integration`
- `test:e2e`
- `validate:pipeline`
### 4. Create artifact aggregation job

Add a new job in the `quality-gates` stage that:
- Downloads all monitoring reports from previous jobs
- Creates a consolidated summary report
- Posts key findings as MR comments (for MR pipelines)
Example:
```yaml
aggregate:monitoring-reports:
  stage: quality-gates
  needs:
    - job: "test:unit"
      artifacts: true
    - job: "test:integration"
      artifacts: true
    - job: "test:e2e"
      artifacts: true
    - job: "test:coverage"
      artifacts: true
  script:
    - |
      echo "# Consolidated CI Monitoring Report" > consolidated-report.md
      for report in ci-monitoring-*/final-ci-report.md; do
        echo "## $(dirname "$report")" >> consolidated-report.md
        cat "$report" >> consolidated-report.md
      done
    - |
      # Post to MR if in MR pipeline
      if [[ -n "$CI_MERGE_REQUEST_IID" ]]; then
        # Extract key metrics and post as MR comment
        # (implementation details omitted for brevity)
        :  # no-op placeholder so the otherwise-empty if-body stays valid shell
      fi
  artifacts:
    paths:
      - consolidated-report.md
    expire_in: 1 week
```
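The omitted MR-comment step could look roughly like this (a hypothetical sketch: `extract_key_findings` and the `GITLAB_API_TOKEN` variable are assumptions of this example, though the merge request Notes API endpoint itself is standard GitLab):

```shell
#!/bin/sh
# Hypothetical helper: pull each job's section header plus its first line out
# of the consolidated report, to keep the MR comment short.
extract_key_findings() {
  awk '/^## ci-monitoring-/ { print; if ((getline line) > 0) print line }' "$1"
}

# The actual posting step is a network call, shown commented out.
# GITLAB_API_TOKEN would be a project CI/CD variable with `api` scope.
# curl --request POST \
#      --header "PRIVATE-TOKEN: ${GITLAB_API_TOKEN}" \
#      --data-urlencode "body=$(extract_key_findings consolidated-report.md)" \
#      "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/merge_requests/${CI_MERGE_REQUEST_IID}/notes"
```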
## Benefits
- **Consistent debugging**: all CI failures will have monitoring reports available
- **Performance tracking**: historical performance data can be analyzed across runs
- **Resource optimization**: identify jobs that need more resources based on actual usage
- **Faster troubleshooting**: no need to SSH to runners or reproduce locally
- **MR visibility**: key findings posted directly to merge requests
## Implementation Priority
High - This is a critical debugging and monitoring capability that directly impacts developer productivity when CI fails.
## Testing
After implementation:
- Trigger a CI run and verify all jobs collect artifacts
- Intentionally fail a test and verify error reports are accessible
- Check GitLab UI shows artifacts for all test jobs
- Verify artifact retention periods are appropriate
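The first two checks can be partly automated against a downloaded artifact archive (a sketch; `check_artifacts` is a hypothetical helper, and the extraction directory is whatever you unzip the job's artifacts into):

```shell
#!/bin/sh
# Hypothetical check: given a directory where a job's artifacts were extracted,
# verify that at least one monitoring report made it into the archive.
check_artifacts() {
  for report in "$1"/ci-monitoring-*/final-ci-report.md; do
    [ -f "$report" ] && return 0
  done
  return 1
}
```

Run it once per test job's artifact download; any job where it fails is still missing the standardized `artifacts` block.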
## Related Files

- `.gitlab-ci.yml`: main CI configuration
- `.ci/ci-enhanced-monitoring.sh`: monitoring script that generates reports
- `.ci/process-monitor.sh`: background monitoring process
## Labels
- ci-cd
- monitoring
- artifacts
- debugging
- quality-of-life