Skip to content

Conversation

@vector-of-bool
Copy link
Contributor

Summary

Refer: CDRIVER-6142, this is a set of drive-by changes done as part of CDRIVER-5982, split into a separate PR to keep a clean history and review set.

This changeset modifies how Earthly environments are specified and set up for the build.

Most of this changeset consists of modifications to generated Evergreen configs.

While doing refactors to the amongoc CI plan, I found that it is far simpler to have an Earthly function for each dependency that knows how to "do the right thing" to install the relevant packages for whatever host platform it happens to be running within, especially by adding a new __can_install utility. The following significant changes are made:

  1. The env.foo Earthly targets are dropped. Instead, environments are specified using --from instead of --env, which just executes a regular FROM command to set the base container image.
  2. For each dependency, an ADD_FOO Earthly function is defined, and it will just "do the right thing" to obtain the relevant package for the build (or do nothing, if that dependency is disabled).
  3. The many small scripts are consolidated into a single __tool script, which explodes itself using __alias to reproduce the same set of tools that were available beforehand.
  4. New tools are defined, the most important being __can_install <pkg>, which determines whether a package <pkg> is available for installation. This makes dep management far simpler since we don't need to do fancy platform/tool detection when we can just say "If gcc-c++ is available, install it." without needing to know which package manager we're going to use.
  5. Because --from (formerly --env) has been modified to take a plain container image specifier, the logic for switching to ECR for the base platform container images is moved to earthly.py.
  6. The build has been split into four steps: +init, +build-environment, +configure, and +build. The main purpose of this split is to support subdividing the EVG build into the four steps so that we get separate timing/logging information for each.

All of the above also has the added benefit of being able to test any Linux distro/version on-the-fly by just passing a new --from rather than needing to modify Earthfile to support a new platform.

This change rewrites much of Earthfile's dependency-installation handling
code to be simpler to modify, and more flexible to use.

Previously, each environment had its own target that needed to know how
to install all of our dependencies for the specific platform. This
refactor inverts the relationship: Instead, each dependency has an Earthly
function that knows how to install itself based on whatever platform it
happens to be running within.

This allows the FROM line to be a plain image, rather that one of our
handcrafted `env.foo` targets. This will make introducing and testing
additional environments far more simple in the future.

This change also allows us to remove the tools/build.earth utility, since we can
now fit everything in the top-level file.
@vector-of-bool vector-of-bool marked this pull request as ready for review November 10, 2025 16:25
@vector-of-bool vector-of-bool requested a review from a team as a code owner November 10, 2025 16:25
@vector-of-bool vector-of-bool requested review from connorsmacd and eramongodb and removed request for connorsmacd November 10, 2025 16:25
@vector-of-bool
Copy link
Contributor Author

@eramongodb requested you especially because this does some juggling of the ECR handling in EVG.

Copy link
Contributor

@eramongodb eramongodb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Earthly workflow is a lot easier to track than before. Thank you for these improvements!

Comment on lines 64 to 77
# Match 'alpine3.18' 'alpine53.123' etc.
case alp if mat := re.match(r'alpine(\d+\.\d+)', alp):
case alp if mat := re.match(r'alpine:(\d+\.\d+)', alp):
return ('Alpine', mat[1])
case 'archlinux':
return 'ArchLinux', None
# Match 'u22', 'u20', 'u71' etc.
case ubu if mat := re.match(r'u(\d\d)', ubu):
return 'Ubuntu', f'{mat[1]}.04'
case ubu if mat := re.match(r'ubuntu:(\d\d.*)', ubu):
return 'Ubuntu', f'{mat[1]}'
# Match 'centos9', 'centos10', etc.
case cent if mat := re.match(r'centos(\d+)', cent):
return 'CentOS', f'{mat[1]}'
case cent if mat := re.match(r'.*centos:(stream)?(\d+.*)', cent):
return 'CentOS', f'{mat[2]}'
# Match 'almalinux8', 'almalinux10', etc.
case alm if mat := re.match(r'almalinux(\d+)', alm):
case alm if mat := re.match(r'almalinux:(\d+.*)', alm):
return 'AlmaLinux', f'{mat[1]}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments need to be updated for consistency with new patterns.

Comment on lines +73 to +74
case cent if mat := re.match(r'.*centos:(stream)?(\d+.*)', cent):
return 'CentOS', f'{mat[2]}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
case cent if mat := re.match(r'.*centos:(stream)?(\d+.*)', cent):
return 'CentOS', f'{mat[2]}'
case cent if mat := re.match(r'.*centos:(?:stream)?(\d+.*)', cent):
return 'CentOS', f'{mat[1]}'

Use (?:<...>) to match <...> without including it in resulting match groups.

Can the (\d+.*) be changed to (\d+)? I don't think we intend to use any -blah images (minimal, dev, date, etc.).

Comment on lines +82 to +88
def from_container_image(img: EnvImage) -> str:
"""
Modify an unqualified FROM container identifier to route to our ECR host
"""
if '/' in img or img.startswith('+'):
return img
return f'{_ECR_HOST}/dockerhub/library/{img}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest including a reference to DEVPROD-21478 for discoverability once Amazon ECR registry-handling is implemented in EVG (specifically RHEL distros) and we no longer have to workaround it ourselves.

args={'targets': ' '.join(targets)} | earthly_args,
),
], # type: ignore (The type annots on `commands` is wrong)
tags=['earthly', 'pr-merge-gate', *env_tags],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pre-existing issue (not a regression in this PR0, but the tag list for Earthly tasks could use some cleanup, e.g. tasks in the archlinux-clang variant include all OS image tags:

Image

Suggest taking this opportunity to restrict the OS image tags to the corresponding build variant: e.g. I expect tasks in the archlinux-clang to have only the following tags at most:

  • earthly
  • pr-merge-gate
  • archlinux-clang

@@ -0,0 +1,311 @@
#!/usr/bin/env sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document that this file/script is intended for use via Earthly only (notably, includes modifications to system environment and system paths).

Comment on lines +163 to +165
elif test -f /etc/SuSE-brand \
|| (test -f /etc/os-release && grep "opensuse" /etc/os-release); then
test $# = 0 || zypper --non-interactive install "$@" || return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? AFAIK we have never tested with OpenSUSE before. Is adding OpenSUSE coverage planned?

Comment on lines +148 to +150
if test -f /usr/bin/dnf; then
# 'dnf' will "do the right thing"
test $# = 0 || dnf install -y "$@" || return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need dnf when we always have yum in our RHEL-like distros?

# a TTY.
test $# = 0 || env DEBIAN_FRONTEND=noninteractive \
apt-get -y install -- "$@" || return
elif test -f /etc/redhat-release || grep 'ID="amzn"' /etc/os-release >/dev/null 1>&2; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ID="amzn" testing for Amazon Linux? Do we plan to extend Earthly coverage to Amazon Linux? (Note: EVG distros for this OS are available and are not limited resources.)

Comment on lines +66 to +69
# Run the add-ccache command here. This needs to run directly within the same
# target that makes use of it, to ensure that the CACHE line has an effect
# within that target
DO --pass-args +ADD_CCACHE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a point to adding ccache support within Earthly tasks? AFAIK the ccache cache directory will be isolated within each temporary container it is executed within, then immediately thrown away along with the temporary container. There is no opportunity for reuse unless an existing cache directory is copied/preserved across Earthly tasks, such as (I think) via --mount type=cache.

ARG lld = on
IF __bool "$lld"
IF __can_install lld
RUN __do "Installing LLD linker..." __install lld
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest adding a reference to CDRIVER-6150 to help track required changes should MONGO_USE_LLD be deprecated/removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants