CI Job False Positives: OPAM Install Errors Not Failing Build
Problem Summary
Our CI pipeline has a critical issue where jobs are showing errors during the OPAM install phase but are not failing the build. This creates dangerous false positives where we think builds are passing when they're actually failing.
Evidence
Discovered in MR !14
Failing CI Job: https://git.haley.io/john/melange-mvp-template/-/jobs/1154#L51
Example Error (Job Shows Green ✅ but Contains Errors)
The CI job shows as "passed" but contains errors like:
[ERROR] Package odoc.3.1.0 is not available in lock files, but is present in opam files
[ERROR] Package odoc-parser.3.1.0 is not available in lock files, but is present in opam files
Root Cause Analysis
- OPAM Lock File Mismatches: The .opam files contain odoc {with-doc} dependencies, but the corresponding .opam.locked files don't include these packages
- CI Error Handling: The CI jobs are not properly failing when OPAM install steps encounter errors
- Silent Failures: This allows broken dependency configurations to pass CI, potentially reaching production
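The CI configuration itself is not shown in this issue, so the exact mechanism that swallows the error is still unknown. As a hedged illustration only, here are two common shell patterns that hide a failing step. `fake_install` is a hypothetical stand-in for the real install command, and the function names are made up for this sketch:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a failing install step: prints an error, exits non-zero.
fake_install() {
  echo "[ERROR] Package odoc.3.1.0 is not available in lock files, but is present in opam files"
  return 1
}

# Pattern 1: piping into another command. Without pipefail, the pipeline's
# status is that of the last command (cat), so the failure is lost.
masked_by_pipe() (
  set +o pipefail
  fake_install | cat
)

# Pattern 2: an explicit "|| true" discards the status outright.
masked_by_or_true() {
  fake_install || true
}

# With pipefail enabled, the same pipeline reports the failure.
strict_pipe() (
  set -o pipefail
  fake_install | cat
)
```

If the job script contains anything resembling the first two patterns around the OPAM install step, that would be a likely place to start looking. A third possibility, not covered by this sketch, is that the command exits 0 despite printing `[ERROR]` lines.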
Impact
- False Confidence: Developers think builds are passing when they're failing
- Broken Deployments: Code with dependency issues could be deployed
- Debugging Overhead: Time wasted investigating "working" code that actually has issues
- Technical Debt: Accumulation of dependency mismatches over time
Investigation Done in MR !14
During our attempt to fix the odoc warnings, we:
- Identified that all .opam files contain "odoc" {with-doc} dependencies
- Attempted to regenerate .opam.locked files with OPAMWITHDOC=1 opam lock
- Successfully added odoc to lock files but discovered the CI was hiding errors
- Added OPAM management guidelines to CLAUDE.md including:
  - Never manually edit .opam files (they're generated from dune-project)
  - Never manually edit .opam.locked files (regenerate with opam lock)
  - Proper procedures for dependency management
- Rolled back changes to avoid masking the underlying CI issue
CLAUDE.md Memory Updates
Added comprehensive OPAM file management guidelines to prevent future manual editing errors:
### OPAM File Management
- **NEVER modify `.opam` files directly** - they are generated from `dune-project`
- **To modify dependencies**: Edit `dune-project` and regenerate opam files with `dune build @all`
- **NEVER manually edit `.opam.locked` files** - regenerate them using `OPAMWITHDOC=1 opam lock ./package.opam`
- **For CI warnings about missing dependencies in lock files**: regenerate lock files rather than manually editing them
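For repositories with several packages, the regeneration step above could be wrapped in a small loop. This is a sketch, not the project's actual tooling: `regenerate_locks` and the `LOCK_CMD` override are hypothetical names, and the override exists only so the loop can be dry-run on a machine without opam.

```shell
#!/usr/bin/env bash
# regenerate_locks [DIR]
# Runs the lock command with doc dependencies enabled for every *.opam file in DIR.
# LOCK_CMD (hypothetical) defaults to "opam lock"; set LOCK_CMD=echo for a dry run.
regenerate_locks() {
  local dir="${1:-.}" f
  for f in "$dir"/*.opam; do
    [ -e "$f" ] || continue            # no .opam files: the glob stays literal
    OPAMWITHDOC=1 ${LOCK_CMD:-opam lock} "$f" || return 1
  done
}
```

Running `LOCK_CMD=echo regenerate_locks .` first lists the files that would be processed without touching any lock file.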
Required Fixes
Immediate (Critical)
- Fix CI Error Handling: Ensure OPAM install errors cause job failure
- Add Exit Code Validation: Verify that non-zero exit codes from OPAM commands fail the build
- Review All CI Jobs: Check other jobs for similar silent failure patterns
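A minimal sketch of what the first two items could look like in a job script, assuming the job runs under bash. `run_strict` is a hypothetical helper, not something that exists in the repository. It fails when the wrapped command exits non-zero and also when the command exits 0 but prints a line starting with `[ERROR]`, which covers the case seen in job 1154 regardless of which of the two is actually happening.

```shell
#!/usr/bin/env bash
# run_strict CMD [ARGS...]
# Runs CMD, replays its combined stdout/stderr, and returns non-zero if either
#   (a) CMD itself exits non-zero, or
#   (b) CMD's output contains a line starting with "[ERROR]".
run_strict() {
  local log status=0
  log="$(mktemp)"
  "$@" >"$log" 2>&1 || status=$?
  cat "$log"
  if [ "$status" -ne 0 ]; then
    echo "run_strict: '$*' exited with status $status" >&2
  elif grep -q '^\[ERROR\]' "$log"; then
    echo "run_strict: '$*' exited 0 but logged [ERROR] lines" >&2
    status=1
  fi
  rm -f "$log"
  return "$status"
}
```

In the job script, the existing install line would then be prefixed with `run_strict`, for example `run_strict opam install . --deps-only --locked` (the exact install command used by our jobs is an assumption here). The message printed to stderr also addresses the "job logs clearly show why builds failed" criterion.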
Follow-up (Important)
- Fix OPAM Lock Dependencies: Create separate MR to properly resolve odoc lock file issues
- Add CI Monitoring: Implement checks to catch when jobs pass with errors
- Documentation: Update CI setup docs to prevent similar issues
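For the lock file follow-up, a cheap pre-install check could flag the mismatch before OPAM ever runs. The sketch below is a naive text match, not a real opam parser: `check_lock_has` is a hypothetical helper that looks for a quoted package name in each .opam file and then in its sibling .opam.locked file.

```shell
#!/usr/bin/env bash
# check_lock_has PKG [DIR]
# For every DIR/*.opam that mentions "PKG" (quoted, as in an opam depends field),
# verify that the sibling *.opam.locked file mentions it too.
# Returns 1 if any mismatch (or missing lock file) is found.
check_lock_has() {
  local pkg="$1" dir="${2:-.}" f rc=0
  for f in "$dir"/*.opam; do
    [ -e "$f" ] || continue                  # no .opam files in DIR
    grep -q "\"$pkg\"" "$f" || continue      # package not declared here
    if ! grep -q "\"$pkg\"" "$f.locked" 2>/dev/null; then
      echo "MISMATCH: $pkg is in $f but not in $f.locked" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Calling `check_lock_has odoc .` at the start of the job would have failed on the state described in this issue. Because the match includes the closing quote, `"odoc"` does not match `"odoc-parser"`, so each package needs its own call.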
Acceptance Criteria
- CI jobs fail when OPAM install encounters errors
- Job logs clearly show why builds failed
- False positive rate reduced to zero for dependency issues
- All existing "passing" builds re-evaluated for hidden errors
Labels
- bug - This is a critical bug in our CI system
- priority::critical - False positives are dangerous for production
- type::ci - CI/CD infrastructure issue
- phase::foundation - Blocking foundation development work
Related
- MR !14: Database Foundation PostgreSQL Setup (where this was discovered)
- Future MR: Separate fix for OPAM lock file odoc dependencies
Edited by John Haley