18 changes: 14 additions & 4 deletions ansible/roles/eessi/README.md
@@ -8,10 +8,20 @@ None.

## Role Variables

- `cvmfs_quota_limit_mb`: Optional int. Maximum size of local package cache on each node in MB.
- `cvmfs_config_overrides`: Optional dict. Set of key-value pairs for additional CernVM-FS settings see [official docs](https://cvmfs.readthedocs.io/en/stable/cpt-configure.html) for list of options.
Each dict key should correspond to a valid config variable (e.g. `CVMFS_HTTP_PROXY`) and the corresponding dict value will be set as the variable value (e.g. `https://my-proxy.com`).
These configuration parameters will be written to the `/etc/cvmfs/default.local` config file on each host in the form `KEY=VALUE`.
All variables relate to [CernVM-FS configuration](https://cvmfs.readthedocs.io/en/stable/cpt-configure.html).
By default, the configuration is that [recommended by EESSI for single clients](https://www.eessi.io/docs/getting_access/native_installation/#installation-for-single-clients).
However if `cvmfs_http_proxy` is set to a non-empty string then a configuration
suitable for using a [squid proxy](https://www.eessi.io/docs/getting_access/native_installation/#configuring-your-client-to-use-a-squid-proxy)
is applied instead. See [docs/eessi](../../../docs/eessi.md#eessi-proxy-configuration)
for guidance on appliance configuration.

- `cvmfs_quota_limit_mb`: Optional int. Maximum size of local package cache on
each node in MB. Default 10GB.
- `cvmfs_http_proxy`: Optional string. Value for [CVMFS_HTTP_PROXY](https://cvmfs.readthedocs.io/en/stable/cpt-configure.html#proxy-lists). Quotes are added around the provided value. Default empty string.
- `cvmfs_config_overrides`: Optional dict. Set of key-value pairs for additional
CernVM-FS settings, written to `/etc/cvmfs/default.local`. Keys are
[CVMFS configuration options](https://cvmfs.readthedocs.io/en/stable/cpt-configure.html)
(e.g. `CVMFS_TIMEOUT_DIRECT`). Default empty dict.
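
As a rough illustration of how these variables interact, the sketch below
mimics in plain Python the selection and merge performed in
`defaults/main.yaml` and the `KEY=VALUE` rendering of
`templates/cvmfs.config.j2`. It is not the role's code: the function names
`resolve_cvmfs_config` and `render_default_local` are hypothetical, and the
proxy address and `CVMFS_TIMEOUT_DIRECT` value are illustrative only.

```python
# Plain-Python stand-in for the role's Jinja logic; not Ansible itself.

def resolve_cvmfs_config(quota_limit_mb=10000, http_proxy="", overrides=None):
    """Return the final CVMFS settings dict, mirroring the role defaults."""
    single = {
        "CVMFS_CLIENT_PROFILE": "single",
        "CVMFS_QUOTA_LIMIT": str(quota_limit_mb),
    }
    proxy = {
        "CVMFS_QUOTA_LIMIT": str(quota_limit_mb),
        # The role wraps the proxy value in single quotes.
        "CVMFS_HTTP_PROXY": f"'{http_proxy}'",
    }
    # An empty proxy string selects the single-client configuration.
    base = single if http_proxy == "" else proxy
    # Like Ansible's `combine` filter: override keys win over defaults.
    return {**base, **(overrides or {})}


def render_default_local(config):
    """Render one KEY=VALUE line per setting, as the template does."""
    return "".join(f"{key}={value}\n" for key, value in config.items())


example = render_default_local(
    resolve_cvmfs_config(
        http_proxy="http://10.8.1.16:3128",       # illustrative address
        overrides={"CVMFS_TIMEOUT_DIRECT": "10"},  # illustrative value
    )
)
print(example)
```

With no proxy set, the result keeps `CVMFS_CLIENT_PROFILE=single`. Once a proxy
is given that key is absent, unless it is added back via
`cvmfs_config_overrides`.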

## Dependencies

15 changes: 12 additions & 3 deletions ansible/roles/eessi/defaults/main.yaml
@@ -1,13 +1,22 @@
---
cvmfs_release_version: "6-3"

# Default to 10GB
cvmfs_quota_limit_mb: 10000
cvmfs_quota_limit_mb: 10000 # local cache soft quota in MB (default 10GB)

cvmfs_config_default:
# NB: The string omit removes the option. Defined here for both default configs
# so that swapping between them works properly.
# TODO explain the omits
cvmfs_config_single:
  CVMFS_CLIENT_PROFILE: single
  CVMFS_QUOTA_LIMIT: "{{ cvmfs_quota_limit_mb }}"

cvmfs_http_proxy: '' # as per docs, quotes are added automatically
# See https://www.eessi.io/docs/getting_access/native_installation/#configuring-your-client-to-use-a-squid-proxy
cvmfs_config_proxy:
  CVMFS_QUOTA_LIMIT: "{{ cvmfs_quota_limit_mb }}"
  CVMFS_HTTP_PROXY: "'{{ cvmfs_http_proxy }}'"

cvmfs_config_default: "{{ cvmfs_config_single if cvmfs_http_proxy == '' else cvmfs_config_proxy }}"
cvmfs_config_overrides: {}
cvmfs_config: "{{ cvmfs_config_default | combine(cvmfs_config_overrides) }}"

19 changes: 11 additions & 8 deletions ansible/roles/eessi/tasks/configure.yml
@@ -1,21 +1,24 @@
---

- name: Add base CVMFS config
community.general.ini_file:
ansible.builtin.template:
dest: /etc/cvmfs/default.local
section: null
option: "{{ item.key }}"
value: "{{ item.value }}"
no_extra_spaces: true
mode: "0644"
loop: "{{ cvmfs_config | dict2items }}"

src: cvmfs.config.j2
mode: u=rw,go=r
owner: root
register: cvmfs_config

# NOTE: Not clear how to make this idempotent
- name: Ensure CVMFS config is setup # noqa: no-changed-when
ansible.builtin.command:
cmd: "cvmfs_config setup"

- name: Reload CVMFS config
ansible.builtin.command:
cmd: cvmfs_config reload
when: cvmfs_config.changed # noqa: no-handler
changed_when: true # workaround ansible-lint

# configure gpus
- name: Check for NVIDIA GPU
ansible.builtin.stat:
3 changes: 3 additions & 0 deletions ansible/roles/eessi/templates/cvmfs.config.j2
@@ -0,0 +1,3 @@
{% for k, v in cvmfs_config.items() %}
{{ k }}={{ v }}
{% endfor %}
42 changes: 37 additions & 5 deletions ansible/roles/squid/README.md
@@ -2,16 +2,37 @@

Deploy a caching proxy.

**NB:** The default configuration is aimed at providing a proxy for package installs etc. for
nodes which do not have direct internet connectivity. It assumes access to the proxy is protected
by the OpenStack security groups applied to the cluster. The generated configuration should be
reviewed if this is not case.
**NB:** This role provides two default configurations, selected by setting
`squid_conf_mode`:

- `default`: This is aimed at providing a proxy for package installs etc.
for nodes which do not have direct internet connectivity. It assumes access
to the proxy is protected by the OpenStack security groups applied to the
cluster. The generated configuration should be reviewed if this is not the case.
- `eessi`: This provides a proxy server for EESSI clients. It uses the
[recommended configuration](https://www.eessi.io/docs/tutorial/access/proxy/#configuration)
which assumes a server with:

- 10Gbit link or faster to the client systems
- a sufficiently powerful CPU
- a decent amount of memory for the kernel cache (tens of GBs)
- fast storage
- 50GB is used for cache

For this use case the above link recommends at least two squid servers, and at
least one for every 100-500 client nodes.
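
The sizing guidance above can be turned into a quick lower-bound estimate. The
helper below is only a sketch: `min_squid_servers` is a hypothetical name, and
`clients_per_proxy` is an assumption to be chosen within the quoted 100-500
range depending on how conservative the deployment should be.

```python
import math


def min_squid_servers(client_nodes, clients_per_proxy=500):
    """Lower bound on squid servers: at least two, and at least one
    per `clients_per_proxy` client nodes."""
    if client_nodes < 0 or clients_per_proxy <= 0:
        raise ValueError("client_nodes must be >= 0 and clients_per_proxy > 0")
    return max(2, math.ceil(client_nodes / clients_per_proxy))


for nodes in (50, 800, 2000):
    optimistic = min_squid_servers(nodes)                           # 500 clients per proxy
    conservative = min_squid_servers(nodes, clients_per_proxy=100)  # 100 clients per proxy
    print(nodes, optimistic, conservative)
# prints:
# 50 2 2
# 800 2 8
# 2000 4 20
```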

## Role Variables

- `squid_conf_mode`: Optional str, `default` (the default) or `eessi`. See above.
- `squid_conf_template`: Optional str. Path (using Ansible search paths) to
squid.conf template. Default is in-role templates. If this is overridden then
`squid_conf_mode` has no effect.

### Role Variables for squid_conf_mode: default

Where noted these map to squid parameters of the same name without the `squid_` prefix - see [squid documentation](https://www.squid-cache.org/Doc/config) for details.

- `squid_conf_template`: Optional str. Path (using Ansible search paths) to squid.conf template. Default is in-role template.
- `squid_started`: Optional bool. Whether to start squid service. Default `true`.
- `squid_enabled`: Optional bool. Whether squid service is enabled on boot. Default `true`.
- `squid_cache_mem`: Required str. Size of memory cache, e.g "1024 KB", "12 GB" etc. See squid parameter.
@@ -37,3 +58,14 @@ Where noted these map to squid parameters of the same name without the `squid_`
http_access deny all

See squid parameter.

### Role Variables for squid_conf_mode: eessi

- `squid_eessi_clients`: Optional str. CIDR specifying clients allowed to access
this proxy. Default is the CIDR for the subnet of the [access network](../../../docs/networks.md),
i.e. the first cluster network. For clusters with multiple networks this may
need overriding.
- `squid_eessi_stratum_1`: Optional str. Domain (in squid `acl dstdomain`
format) of Stratum 1 replica servers. Defaults to upstream EESSI Stratum 1
servers.
- `squid_cache_dir`: See definition for default mode above.
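
To make the effect of `squid_eessi_clients` and `squid_eessi_stratum_1` more
concrete, the sketch below models in Python the `http_access` rules from
`templates/squid-eessi.conf.j2`, which squid evaluates in order until the first
match. It is an approximation for reasoning about the rules, not a
reimplementation of squid: the function names and the example hostnames and
addresses are hypothetical, and the `localhost` ACL is approximated with a
loopback check.

```python
import ipaddress


def dstdomain_matches(host, patterns):
    """squid `dstdomain`: a leading dot matches the domain and its subdomains."""
    host = host.lower().rstrip(".")
    for pattern in patterns:
        pattern = pattern.lower()
        if pattern.startswith("."):
            if host == pattern[1:] or host.endswith(pattern):
                return True
        elif host == pattern:
            return True
    return False


def access_allowed(client_ip, dest_host, clients_cidr, stratum_1=".eessi.science"):
    """Apply the template's http_access rules in order; first match wins."""
    stratum_ones = [".cern.ch", ".opensciencegrid.org", stratum_1]
    client = ipaddress.ip_address(client_ip)
    # http_access deny !stratum_ones
    if not dstdomain_matches(dest_host, stratum_ones):
        return False
    # http_access allow local_nodes
    if client in ipaddress.ip_network(clients_cidr):
        return True
    # http_access allow localhost (approximated as any loopback address)
    if client.is_loopback:
        return True
    # http_access deny all
    return False


cidr = "10.8.0.0/16"  # illustrative value for squid_eessi_clients
print(access_allowed("10.8.1.20", "mirror.eessi.science", cidr))    # client in CIDR, allowed domain
print(access_allowed("10.8.1.20", "example.com", cidr))             # domain not in stratum_ones
print(access_allowed("192.168.0.5", "mirror.eessi.science", cidr))  # client outside CIDR
```

Because the destination check comes first, even permitted clients cannot use
the proxy for domains outside the Stratum 1 list, so this mode is not a
general-purpose package proxy.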
9 changes: 8 additions & 1 deletion ansible/roles/squid/defaults/main.yml
@@ -1,5 +1,8 @@
---
squid_conf_template: squid.conf.j2
squid_conf_mode: default # or 'eessi'

# squid_conf_mode=default:
squid_conf_template: "squid-{{ squid_conf_mode }}.conf.j2"
squid_started: true
squid_enabled: true

@@ -23,3 +26,7 @@ squid_http_access: |
http_access allow localhost
# Finally deny all other access to this proxy
http_access deny all

# squid_conf_mode=eessi:
squid_eessi_clients: "{{ cluster_subnets[0].cidr | mandatory('squid_eessi_clients must be defined when using eessi squid config') }}"
squid_eessi_stratum_1: '.eessi.science'
30 changes: 30 additions & 0 deletions ansible/roles/squid/templates/squid-eessi.conf.j2
@@ -0,0 +1,30 @@
# From https://www.eessi.io/docs/tutorial/access/proxy/
# List of local IP addresses (separate IPs and/or CIDR notation) allowed to access your local proxy
acl local_nodes src {{ squid_eessi_clients }}

# Destination domains that are allowed
# cern.ch + opensciencegrid.org domains because of cvmfs-config.cern.ch repository,
# which are provided via Stratum-1 mirror servers hosted by CERN and OSG
acl stratum_ones dstdomain .cern.ch .opensciencegrid.org {{ squid_eessi_stratum_1 }}

# Squid port
http_port 3128

# Deny access to anything which is not part of our stratum_ones ACL.
http_access deny !stratum_ones

# Only allow access from our local machines
http_access allow local_nodes
http_access allow localhost

# Finally, deny all other access to this proxy
http_access deny all

minimum_expiry_time 0
maximum_object_size 1024 MB

# proxy memory cache of 1GB
cache_mem 1024 MB
maximum_object_size_in_memory 128 KB
# 50 GB disk cache
cache_dir ufs {{ squid_cache_dir }} 50000 16 256
47 changes: 46 additions & 1 deletion docs/eessi.md
@@ -2,7 +2,7 @@

## How to Load EESSI

The EESSI environment can be initialise by running:
The EESSI environment can be initialised by running:

```bash
source /cvmfs/software.eessi.io/versions/2023.06/init/bash
@@ -109,3 +109,48 @@ cmake ..
make
./deviceQuery
```

## EESSI Proxy Configuration

EESSI recommend that clusters use a proxy to reduce latency for clients and
avoid excessive load on the EESSI Stratum 1 servers. Squid can be deployed and
configured appropriately as this proxy on separate node(s) using
the OpenTofu variable `additional_nodegroups`, e.g.:

```hcl
additional_nodegroups = {
  # EESSI squid proxy
  squid = {
    nodes  = ["squid-0"]
    flavor = "squid_flavor_name"
  }
}
```

EESSI [recommend](https://www.eessi.io/docs/tutorial/access/proxy/#general-recommendations)
that:

> The proxy server should have a 10Gbit link to the client systems, a
> sufficiently powerful CPU, a decent amount of memory for the kernel cache (tens
> of GBs), and fast local storage (SSD or NVMe).
>
> As a rule of thumb, it is recommended to have (at least) one proxy server for
> every couple of hundred worker nodes (100-500).

Generally, both the `squid` nodes and the `eessi` client nodes can be
appropriately configured simply by setting the squid mode to `eessi`:

```yaml
# environments/site/inventory/group_vars/all/squid.yml:
squid_conf_mode: eessi
```

In this mode, by default:

- `squid` is configured to allow clients from the access network's CIDR, using
the EESSI-recommended cache configuration
- `eessi` is configured to use the `squid` node IPs on the access network (the
first network in `cluster_networks`) as proxies.

If this is not suitable then override the defaults provided by `environments/common/inventory/group_vars/all/eessi.yml`
and the `eessi` and `squid` roles.
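
For reference, the default proxy string that
`environments/common/inventory/group_vars/all/eessi.yml` derives from the squid
node IPs can be sketched in Python as below. This only mirrors the
`regex_replace` and `join('|')` steps of that Jinja expression; the function
name `build_cvmfs_http_proxy` is hypothetical and the IP addresses are
illustrative.

```python
def build_cvmfs_http_proxy(proxy_ips, squid_http_port=3128):
    """Join one http://IP:PORT entry per squid node with '|'."""
    return "|".join(f"http://{ip}:{squid_http_port}" for ip in proxy_ips)


print(build_cvmfs_http_proxy(["10.8.1.16", "10.8.1.17"]))
# prints: http://10.8.1.16:3128|http://10.8.1.17:3128

# With no squid nodes in eessi mode the string is empty, which makes the
# eessi role fall back to the single-client configuration.
print(repr(build_cvmfs_http_proxy([])))
# prints: ''
```

In CernVM-FS proxy lists, `|` separates proxies within a single load-balancing
group, so clients spread requests across all listed squid nodes. A site wanting
failover groups instead (separated by `;`) would need to set `cvmfs_http_proxy`
explicitly.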
74 changes: 50 additions & 24 deletions docs/production.md
@@ -163,33 +163,47 @@ will have been generated for you already under

## Define and deploy infrastructure

Create an OpenTofu variables file to define the required infrastructure, e.g.:

```text
# environments/$ENV/tofu/terraform.tfvars
cluster_name = "mycluster"
cluster_networks = [
{
network = "some_network" # *
subnet = "some_subnet" # *
}
]
key_pair = "my_key" # *
control_node_flavor = "some_flavor_name"
login = {
# Arbitrary group name for these login nodes
interactive = {
nodes: ["login-0"]
flavor: "login_flavor_name" # *
Modify the cookiecutter-templated OpenTofu configuration to define the required
infrastructure, e.g.:

```hcl
# environments/$ENV/tofu/main.tf
module "cluster" {
source = "../../site/tofu/"
environment_root = var.environment_root

cluster_name = "mycluster"
cluster_networks = [
{
network = "some_network" # *
subnet = "some_subnet" # *
}
}
cluster_image_id = "rocky_linux_9_image_uuid"
compute = {
]
key_pair = "my_key" # *
control_node_flavor = "some_flavor_name"
login = {
# Arbitrary group name for these login nodes
head = {
nodes = ["login-0"]
flavor = "login_flavor_name" # *
}
}
cluster_image_id = "rocky_linux_9_image_uuid"
compute = {
# Group name used for compute node partition definition
general = {
nodes: ["compute-0", "compute-1"]
flavor: "compute_flavor_name" # *
nodes = ["compute-0", "compute-1"]
flavor = "compute_flavor_name" # *
}
}
additional_nodegroups = {
# Nodes configured to provide a squid proxy for EESSI - for guidance
# on number and sizing see [docs/eessi.md](./eessi.md#eessi-proxy-configuration)
squid = {
nodes = ["squid-0"]
flavor = "squid_flavor_name" # *
}
}
}
```

@@ -203,7 +217,7 @@ Note that:
- Environment-specific variables (`cluster_name`) should be hardcoded into
the cluster module block.

- Environment-independent variables (e.g. maybe `cluster_net` if the same
- Environment-independent variables (e.g. maybe `cluster_networks` if the same
is used for staging and production) should be set as _defaults_ in
`environments/site/tofu/variables.tf`, and then don't need to be passed
in to the module.
@@ -356,6 +370,16 @@ environments which should be unique, e.g. production and staging.
not, remove `grafana_auth_anonymous` in
`environments/$ENV/inventory/group_vars/all/grafana.yml`

- Configure EESSI to be proxied via the `squid` node(s) defined in the OpenTofu
configuration:

```yaml
# environments/site/inventory/group_vars/all/squid.yml:
squid_conf_mode: eessi
```

See [docs/eessi](./eessi.md#eessi-proxy-configuration) for more information.

- See the [hpctests docs](../ansible/roles/hpctests/README.md) for advice on
raising `hpctests_hpl_mem_frac` during tests.

@@ -409,3 +433,5 @@ Once it completes you can log in to the cluster using:

For further information, including additional configuration guides and
operations instructions, see the [docs](README.md) directory.

TODO: Add stuff on eessi proxy.
17 changes: 17 additions & 0 deletions environments/common/inventory/group_vars/all/eessi.yml
@@ -0,0 +1,17 @@
# Automatically configure EESSI clients to use any 'squid' node(s) with squid_conf_mode == eessi as a proxy:
cvmfs_proxy_ip_var: ansible_host # hostvar with proxy IP - default is IP on access network
cvmfs_proxy_ips: >- # list of IPs for squid nodes with squid_conf_mode == eessi:
  {{
    hostvars.values() |
    selectattr('group_names', 'contains', 'squid') |
    selectattr('squid_conf_mode', 'eq', 'eessi') |
    map(attribute=cvmfs_proxy_ip_var)
  }}
# proxy string as per EESSI docs (but unquoted), or empty string:
# TODO: just check final format eg "http://10.8.1.16:3128|http://10.8.1.17:3128" looks ok
cvmfs_http_proxy: >-
  {{
    cvmfs_proxy_ips |
    map('regex_replace', '^(.*)$', 'http://\1:' ~ (squid_http_port | string)) |
    join('|')
  }}
3 changes: 2 additions & 1 deletion environments/common/inventory/group_vars/all/squid.yml
@@ -1,2 +1,3 @@
---
squid_http_port: 3128 # defined here for proxy role
squid_http_port: 3128 # defined here for proxy/eessi roles
squid_conf_mode: default # defined here for eessi config