4 changes: 2 additions & 2 deletions README.md
@@ -64,10 +64,10 @@ The cluster-deployment tools here include helm charts and ansible playbooks to s
| data-sync | [![](https://img.shields.io/docker/v/instantlinux/data-sync?sort=date)](https://hub.docker.com/r/instantlinux/data-sync "Version badge") | poor-man's SAN for persistent storage |
| ddclient | [![](https://img.shields.io/docker/v/instantlinux/ddclient?sort=date)](https://hub.docker.com/r/instantlinux/ddclient "Version badge") | Dynamic DNS client |
| ez-ipupdate | [![](https://img.shields.io/docker/v/instantlinux/ez-ipupdate?sort=date)](https://hub.docker.com/r/instantlinux/ez-ipupdate "Version badge") | Dynamic DNS client |
| fluent-bit | ** | central logging for Kubernetes |
| haproxy-keepalived | [![](https://img.shields.io/docker/v/instantlinux/haproxy-keepalived?sort=date)](https://hub.docker.com/r/instantlinux/haproxy-keepalived "Version badge") | load balancer |
| grafana | ** | monitoring dashboard with prometheus-based alerting |
| guacamole | ** | authenticated remote-desktop server |
| logspout | ** | central logging for Docker |
| mysqldump | [![](https://img.shields.io/docker/v/instantlinux/mysqldump?sort=date)](https://hub.docker.com/r/instantlinux/mysqldump "Version badge") | per-database alternative to xtrabackup |
| nagios | [![](https://img.shields.io/docker/v/instantlinux/nagios?sort=date)](https://hub.docker.com/r/instantlinux/nagios "Version badge") | Nagios Core v4 for monitoring |
| nagiosql | [![](https://img.shields.io/docker/v/instantlinux/nagiosql?sort=date)](https://hub.docker.com/r/instantlinux/nagiosql "Version badge") | NagiosQL for configuring Nagios Core v4 |
@@ -121,4 +121,4 @@ Thank you to the following contributors!
* [Alberto Galera](https://github.com/agalera)
* [Andrew Eacott](https://github.com/andreweacott)

Contents created 2017-25 under [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0) by Rich Braun.
Contents created 2017-26 under [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0) by Rich Braun.
1 change: 1 addition & 0 deletions ansible/ansible.cfg
@@ -5,6 +5,7 @@ managed_str = This file is managed by Ansible.%n
user: {uid}
host: {host}
callback_whitelist = profile_tasks
interpreter_python = /usr/bin/python3
inventory = ./hosts
remote_user = ubuntu

3 changes: 2 additions & 1 deletion ansible/roles/docker_node/defaults/main.yml
@@ -5,7 +5,7 @@ docker_defaults:
apt_repo:
key: 9DC858229FC7DD38854AE2D88D81803C0EBFCD88
package_name: docker-ce
package_ver: 5:28.4.0-1~ubuntu.24.04~noble
package_ver: 5:29.2.0-1~ubuntu.24.04~noble
repo: deb [arch=amd64 signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu {{ ansible_distribution_release }} stable
url: https://download.docker.com/linux/ubuntu/gpg
certs:
@@ -29,6 +29,7 @@ docker_defaults:
log-opts:
max-size: 50m
max-file: "3"
min-api-version: "1.43"
storage-driver: overlay2
# storage-opts:
# - dm.thinpooldev=/dev/mapper/{{ thinpool_vg_alt }}-thinpool
2 changes: 1 addition & 1 deletion ansible/roles/kubernetes/defaults/main.yml
@@ -32,7 +32,7 @@ k8s_defaults:
name: kubelet
state: restarted
service_network: 10.96.0.0/12
version: 1.34.1
version: 1.34.3
coredns_version: v1.11.3
cni_version: 1.7.1
k8s_override: {}
2 changes: 1 addition & 1 deletion ansible/roles/mythfrontend/tasks/main.yml
@@ -12,7 +12,7 @@
- include_tasks: "{{ ansible_os_family | lower }}/ir-keytable.yml"

- include_tasks: autosuspend.yml
when: suspend
when: suspend | length > 0

- include_tasks: drivers/{{ display_driver.type }}.yml

2 changes: 2 additions & 0 deletions ansible/roles/ntp/defaults/main.yml
@@ -1,6 +1,8 @@
---

ntp_defaults:
driftfile: /var/lib/ntpsec/ntp.drift
leapfile: /usr/share/zoneinfo/leap-seconds.list
query_ok:
- localhost
- ::1
4 changes: 2 additions & 2 deletions ansible/roles/ntp/templates/ntp.conf.j2
@@ -1,6 +1,6 @@
{{ ansible_managed | comment }}
driftfile /var/lib/ntp/ntp.drift

driftfile {{ ntp.driftfile }}
leapfile {{ ntp.leapfile }}
{% if 'symmetric_key' in ntp %}
keys /etc/ntp.keys # path for keys file
trustedkey 1 # define trusted keys
Expand Down
10 changes: 4 additions & 6 deletions images/dovecot/README.md
@@ -25,27 +25,25 @@ Configuration is defined as files in a volume mounted as
./mkcert.sh
```

For settings, see etc-example directory and [helm]((https://github.com/instantlinux/docker-tools/tree/main/images/dovecot/helm) / kubernetes.yaml / docker-compose.yml. The [k8s/Makefile.vars](https://github.com/instantlinux/docker-tools/blob/main/k8s/Makefile.vars) file defines default values.
For settings, see etc-example directory and [helm](https://github.com/instantlinux/docker-tools/tree/main/images/dovecot/helm) / docker-compose.yml. The [k8s/Makefile.vars](https://github.com/instantlinux/docker-tools/blob/main/k8s/Makefile.vars) file defines default values.

Also configure postfix as described in the postfix image.

This repo has complete instructions for
[building a kubernetes cluster](https://github.com/instantlinux/docker-tools/blob/main/k8s/README.md) where you can launch with [helm](https://github.com/instantlinux/docker-tools/tree/main/images/dovecot/helm) or [kubernetes.yaml](https://github.com/instantlinux/docker-tools/blob/main/images/dovecot/kubernetes.yaml) using _make_ and customizing [Makefile.vars](https://github.com/instantlinux/docker-tools/blob/main/k8s/Makefile.vars) after cloning this repo:
[building a kubernetes cluster](https://github.com/instantlinux/docker-tools/blob/main/k8s/README.md) where you can launch with [helm](https://github.com/instantlinux/docker-tools/tree/main/images/dovecot/helm) using _make_ after customizing overrides of [values.yaml](https://github.com/instantlinux/docker-tools/blob/main/images/dovecot/helm/values.yaml) after cloning this repo:
~~~
git clone https://github.com/instantlinux/docker-tools.git
cd docker-tools/k8s
make dovecot
~~~

See the Makefile and Makefile.vars files under k8s directory for default values referenced within kubernetes.yaml.

To provide high availability across the cluster, the helm chart here includes an optional data-sync service to keep the inbox, mail and spool directories synchronized across 2 or more worker nodes. Minor data loss can occur when the service shifts from one worker to another, so this feature isn't recommended for large production deployments (when running on a cloud provider, simply use its block storage capabilities). That said, the unison-based data-sync service has been rock-solid on a bare-metal cluster for years.

Auth is the most challenging aspect of implementing dovecot. Use the following command from within the container to verify user authentication:
```
doveadm auth login <user>
```
If using openldap, turn on log setting `BER` to view raw packet contents as you troubleshoot login from dovecot.
If using openldap, turn on openldap's log setting `BER` to view raw packet contents as you troubleshoot login from dovecot.
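
As a minimal sketch (assuming slapd exposes the cn=config backend over a local ldapi socket with EXTERNAL auth; adjust for your deployment), enabling that log level might look like:
```
ldapmodify -Y EXTERNAL -H ldapi:/// <<EOF
dn: cn=config
changetype: modify
replace: olcLogLevel
olcLogLevel: stats BER
EOF
```
Set it back to something quieter (e.g. `stats`) when done; `BER` output is extremely verbose.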

### Variables

@@ -72,7 +70,7 @@ Need more configurability? Edit the ConfigMap defined in the helm chart.

| Helm var | 2.3 | 2.4 | Notes |
| -------- | --- | --- | ----- |
| uris | hosts | ldap_uris | <host> becomes ldap://<host>:389 |
| uris | hosts | ldap_uris | host becomes ldap://host:389 |
| | ldap_version | (unchanged) | |
| base | base | ldap_base | |
| bind | auth_bind | ldap_bind | |
32 changes: 12 additions & 20 deletions k8s/Makefile
@@ -103,7 +103,7 @@ VOLUMES_YAML = $(basename $(wildcard volumes/*.yaml))
install: install/admin-user cluster_network \
install/local-storage storage_localdefault imports \
install_imports namespace_config install/prometheus-rbac \
install/k8s-backup install/logspout remote_volumes \
install/k8s-backup fluent-bit remote_volumes \
sops data-sync-ssh persistent secrets install/ingress-nginx \
install/cert-manager

@@ -166,17 +166,6 @@ storage_localdefault:
kubectl $(ADMIN_CTX) patch storageclass local-storage -p \
'{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

##########
# etcd
##########
imports/etcd-token:
@-kubectl delete secret $(@F)
(cd imports && \
basename \
`curl -s 'https://discovery.etcd.io/new?size=$(ETCD_NUM_NODES)'` \
> $(@F) && \
kubectl create secret generic $(@F) --from-file $(@F))

##########
# Helm
##########
@@ -232,11 +221,14 @@ install_metrics: imports/kube-state-metrics
imports/traefik-prom.yaml:
curl -sLo $@ https://raw.githubusercontent.com/mateobur/prometheus-monitoring-guide/master/traefik-prom.yaml

# As of Jan-2019, the helm chart for etcd doesn't reliably construct multi-node
# cluster, just use 'make etcd' rather than 'make etcd_chart'
etcd_chart:
helm install --name etcd --namespace $(K8S_NAMESPACE) \
--kube-context=kubernetes-admin@$(CLUSTER) \
bitnami/etcd --set auth.rbac.enabled=false
sleep 30
kubectl scale statefulset etcd-etcd --namespace=$(K8S_NAMESPACE) --replicas=3
SPLUNK_OPT = $(if $(LOG_TO_SPLUNK), -f install/fluent-bit-splunk.yaml, )
fluent-bit: install/namespace.yaml
K8S_NAMESPACE=$(LOG_NAMESPACE) envsubst < $< | \
kubectl apply --context=sudo -f -
helm repo add fluent https://fluent.github.io/helm-charts
helm repo update
@$(eval OVERRIDE := $(shell [ -s ../admin/services/values/$@.yaml ] \
&& echo "-f ../admin/services/values/$@.yaml"))
envsubst < install/fluent-bit.yaml | \
helm install -f - $(SPLUNK_OPT) $(OVERRIDE) \
--kube-context=sudo --namespace=$(LOG_NAMESPACE) $@ fluent/fluent-bit
4 changes: 2 additions & 2 deletions k8s/Makefile.vars
@@ -18,6 +18,8 @@ export LIMIT_CPU_DEFAULT ?= 500m
export LIMIT_CPU_REQUEST ?= 50m
export LIMIT_MEM_DEFAULT ?= 256Mi
export LIMIT_MEM_REQUEST ?= 64Mi
export LOG_NAMESPACE ?= logging
export LOG_TO_SPLUNK ?=
export MYTHTV_VOL_SIZE ?= 400Gi
export NAMED_VOLUMES ?= share $(LOCAL_VOLUMES)
export NFS_HOST ?= nfs.$(DOMAIN)
@@ -46,5 +48,3 @@ export PORT_DOVECOT_SMTP ?= 825
export PORT_GIT_SSH ?= 8999
export PORT_POSTFIX_INTERNAL ?= 3425
export PORT_POSTFIX_EXTERNAL ?= 3525
# Port configured in install/logspout.yaml
export PORT_RSYSLOGD ?= 514
89 changes: 27 additions & 62 deletions k8s/README.md
@@ -10,7 +10,7 @@ enterprise market.
This repo is an attempt to make Kubernetes more approachable for any
user who wants to get started easily, with a real cluster (not just
a single-instance minikube setup) on bare-metal. Most of this will
probably work in IaaS providers like Google or AWS but the purpose
probably work on cloud providers like Google or AWS but the purpose
of this repo is to set up production-grade K8S with your own servers / VMs.

See [How to use this](#how-to-use-this) below to get started.
@@ -32,14 +32,12 @@ kubeadm suite:
* Encryption for internal etcd
* MFA using [Authelia](https://github.com/clems4ever/authelia) and Google Authenticator
* Calico or flannel networking
* Fluent Bit for container-log aggregation
* ingress-nginx
* Local-volume sync
* Automatic certificate issuing/renewal with Letsencrypt

Resource yaml files are in standard k8s format, parameterized by simple
environment-variable substitution. Helm is provided only to enable
access to published helm charts; resources herein are defined using the
Kubernetes-native API syntax.
Helm has become the standard mechanism for deploying kubernetes resources. This repo provides a library, chartlib, which handles almost all of the logic and tedium of golang templating. Look in the values.yaml file of each of the helm charts published here for parameters that you can override by supplying a helm overrides yaml file.

### Requirements and cost

@@ -50,15 +48,12 @@ services such as etcd and MariaDB is 4+ nodes. (An inexpensive node
similar to mine is an [Intel N6005 Mini PC](https://www.newegg.com/neosmay-ac8-jasper-lake/p/2SW-006Y-00003) with two 8GB DDR4 RAM modules
and a 500GB to 2TB drive installed in each.) As of Sep 2022, three of these configured with 16GB of RAM and a 512GB SSD, plus a control-plane node with a 250GB SSD and 8GB of RAM, add up to about $1250 USD; you
can shave maybe $400 off by reducing RAM and storage, and another $250
by virtualizing the manager on your existing server. By Nov 2024, costs of such nodes have plunged: Intel N100 quad-core mini-PCs with 16GB of RAM and 512GB SSD can be had for under $150, so four of these come in under $600.
by virtualizing the manager on your existing server. By Nov 2024, costs of such nodes have plunged: Intel N100 quad-core mini-PCs with 16GB of RAM and 512GB SSD can be had for under $150, so four of these come in under $600. (Inflation has picked up since, but this hardware remains relatively inexpensive.)

### Assumptions

* You're not running in a cloud provider (this probably works in
cloud instances but isn't tested there; if you're already in
cloud and willing to pay for it you don't need this tool anyway)
* You want to run a current stable version of docker engine
* You're running Ubuntu 24.04 LTS on the nodes
* You're running Ubuntu LTS on the nodes
* You want the fastest / most secure direct-attached SSD performance:
persistent storage will be local LUKS-encrypted directories on
each node
@@ -68,6 +63,9 @@ by virtualizing the manager on your existing server. By Nov 2024, costs of such
_make_ and _helm_ rather than ksonnet / ansible (I independently opted
to learn kubernetes this way, as described in [Using Makefiles and
envsubst as an Alternative to Helm and Ksonnet](https://vadosware.io/post/using-makefiles-and-envsubst-as-an-alternative-to-helm-and-ksonnet/) by vados.)
* You're not running in a cloud provider (this probably works in
cloud instances but isn't tested there; if you're already in
cloud and willing to pay for it you don't need this tool anyway)

### Repo layout

@@ -79,7 +77,6 @@ by virtualizing the manager on your existing server. By Nov 2024, costs of such
| secrets/ | symlink to private directory of encrypted secrets |
| volumes/ | (deprecated) persistent volume claims |
| Makefile | resource deployer |
| *.yaml | applications as noted in top-level README |

### How to use this

@@ -289,6 +286,23 @@ CERT_MGR_EMAIL=<my email> make install/cert-manager
```
A lot of things have to be functioning before letsencrypt will issue certs: the [Let's Encrypt troubleshooting guide](https://cert-manager.io/docs/troubleshooting/acme/) is super-helpful.

### Container logs

The former standard way to funnel logs to a central log server (e.g. logstash, Splunk) was logspout; Fluent Bit is the newer way. See the [k8s/Makefile](https://github.com/instantlinux/docker-tools/blob/main/k8s/Makefile), which has a `fluent-bit` target to launch the current version of the vendor-supplied helm chart. Set the `LOG_TO_SPLUNK` environment variable to any non-empty value (this activates the Makefile's `SPLUNK_OPT` flag) to apply config overrides suitable for a HEC forwarder. Your forwarder token should first be stored as key `splunk_token` in a `fluent-bit` secret in the `logging` namespace. The installation override yaml files provided under the k8s/install directory here work as-is for simple use-cases; add any additional overrides in the directory where you keep helm override files for other services (e.g. ../admin/services/values/fluent-bit.yaml).
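
As a minimal sketch, storing that token might look like this (assuming the HEC token is in `$SPLUNK_HEC_TOKEN` and the `logging` namespace already exists):
```
kubectl create secret generic fluent-bit --namespace logging \
  --from-literal=splunk_token=$SPLUNK_HEC_TOKEN
```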

As Fluent Bit still has not implemented a way (see [issue #4651](https://github.com/fluent/fluent-bit/issues/4651) and related reports) to strip unwanted, storage-consuming items from the kubernetes metadata json blob, if you use Splunk as your log aggregator you can add these two entries to files in /opt/splunk/etc/system/local to reduce logging storage:

transforms.conf
```
[removeJsonKeys1]
INGEST_EVAL = _raw=json_delete(_raw, "kubernetes.annotations",
"kubernetes.container_hash", "kubernetes.docker_id",
"kubernetes.labels", "kubernetes.pod_id", "kubernetes.pod_name")
```
props.conf
```
[httpevent]
TRANSFORMS-removeJsonKeys = removeJsonKeys1
```
### Network and local storage

Storage management is a mostly-unsolved problem in the container world; indeed there are startup companies raising cash to try to solve this in ways that will be affordable only to big enterprises (a blogger at StorageOS has [this to say](https://medium.com/@saliloquy/storage-is-the-achilles-heel-of-containers-97d0341e8d87)). Long before Swarm and Kubernetes came out, I was using LVM snapshots to quickly clone LXC containers, and grew accustomed to the speedy performance of direct-attached SSD storage. At a former employer, in order to provide resiliency for database masters, I used [drbd](http://www.drbd.org) to sync volumes at the block level across the network.
@@ -353,57 +367,8 @@ notes are as of Jan 2019 on version 1.13.1:
cluster isn't fully supported at this time, there are race conditions
which create performance problems).

### The version 1.15.0 upgrade fiasco

The kubeadm update procedure didn't work for the 1.13->1.14 upgrade, so
when an unknown fault took down my single master, I opted to do a
fresh install of 1.15.0. That led to a multiple-day total outage of
all services. Here are notes that might help others prevent similar
debacles:

* Networking and DNS are fundamental, and can fail silently on a newly
built cluster. Use a busybox container to ensure you can do DNS
lookups against the nameserver listed in /etc/resolv.conf after
generating a new master and worker(s); a busybox sketch follows this
list. Do not proceed until you've solved any mysteries that prevent
this from working (and THIS CAN TAKE DAYS on a bare-metal cluster.)

* If you only have a single controller that has failed, don't do anything
intrusive to it (like, say, the obvious--restoring a backup; this
totally clobbered my setup). If it's been running for more than a
couple months, chances are it's got some hard-to-replace
configurations and the currently available backup/restore procedures
may not (or probably won't) save you. Build a new one and use the
old server as reference.

* Switching from flannel to calico 3.8 was fraught with
problems. Hours into things, I couldn't figure out why coredns
wouldn't resolve any names from any pod; I wound up sticking with
flannel for the foreseeable future. Also, it's easy to get both
flannel and calico installed, a major conflict.

* Installation procedure for cert-manager is 100% different from 5
months ago (in 2022). That took me about 3 hours to resolve. And I'd
become over-reliant on cert-manager: without valid TLS certificates,
my local Docker registry wouldn't come up. Without the registry,
most services wind up in ImagePullBackoff failure state. (Update in
2024 -- almost all services I run are now on docker hub or
registry.k8s.io, so they depend only on Internet and DNS.)

* When restoring cert-manager, get ingress-nginx working first.

* My efforts to lock down kubernetes security (based mostly on tutorial-
style procedures found online) backfired big-time: bottom line is that
if you have to do something manual to get your setup running after
the community-supplied installer (kubeadm) finishes, then it's quite
likely whatever script or resource definition you created to automate
such manual processes *won't* work next time you need to do disaster-
recovery or routine upgrades. Make sure your kubeadm-config.yaml
defines all the flags required for the control plane under
/etc/kubernetes/manifests.

* One thing I'd done that compromised availability in the interest of
security was to encrypt etcd key-value storage. Make sure to
* One thing I'd done that initially compromised availability in the interest
of security was to encrypt etcd key-value storage. Make sure to
practice backup/restore a couple times, and document in an obvious
place what the restore procedure is and where to get the decryption
codes. The k8s-cplane ansible playbook here should help; a sketch of
the api-server encryption config follows below.
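
The DNS check mentioned in the first bullet above might look like this (a minimal sketch; the pod name and busybox tag are arbitrary assumptions):
```
kubectl run dnscheck --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup kubernetes.default.svc.cluster.local
```
If this fails, verify the nameserver in the pod's /etc/resolv.conf and the health of the coredns pods before troubleshooting anything else.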
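
And a minimal sketch of the etcd encryption-at-rest config from the last bullet (the file path and key name are assumptions; kube-apiserver must be started with `--encryption-provider-config` pointing at the file):
```
head -c 32 /dev/urandom | base64 > /tmp/enc.key
cat <<EOF > /etc/kubernetes/enc.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources: [secrets]
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: $(cat /tmp/enc.key)
      - identity: {}
EOF
```
Lose that key after secrets are written and the secrets are gone -- hence the advice to rehearse the restore procedure.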