Preface

Our RabbitMQ hosted on AWS ran as a single instance for cost reasons, so the app took a hit every time a maintenance window rolled around. I therefore decided to try building a RabbitMQ cluster on Kubernetes.

RabbitMQ version: v3.13.7 (following the operator default)
Kubernetes (EKS) version: v1.30

Installing RabbitMQ Cluster Operator in a Kubernetes Cluster [ref]

This Operator is responsible for creating the RabbitMQ cluster. I pinned it to the current latest release, v2.11.0, which deploys RabbitMQ v3.13.7 by default (ref).

The RabbitMQ instance version is controlled by the Operator (I have not found a documented way to customize it yet; maybe setting image works?); the version shipped with each release is listed in the release Changelog.
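
If pinning a specific broker version matters, the RabbitmqCluster spec does expose an image field; a minimal sketch, assuming the chosen tag is one the operator actually supports (untested on my side, so treat it as a starting point rather than a confirmed recipe):

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: dev-rabbitmq
spec:
  image: rabbitmq:3.13.7-management  # hypothetical override; check the operator Changelog for compatible versions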

kubectl apply -f "https://github.com/rabbitmq/cluster-operator/releases/download/v2.11.0/cluster-operator.yml"
# namespace/rabbitmq-system created
# customresourcedefinition.apiextensions.k8s.io/rabbitmqclusters.rabbitmq.com created
# serviceaccount/rabbitmq-cluster-operator created
# role.rbac.authorization.k8s.io/rabbitmq-cluster-leader-election-role created
# clusterrole.rbac.authorization.k8s.io/rabbitmq-cluster-operator-role created
# rolebinding.rbac.authorization.k8s.io/rabbitmq-cluster-leader-election-rolebinding created
# clusterrolebinding.rbac.authorization.k8s.io/rabbitmq-cluster-operator-rolebinding created
# deployment.apps/rabbitmq-cluster-operator created
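
Before creating any clusters, it is worth confirming that the operator Deployment from the output above is up:

kubectl -n rabbitmq-system get deployment rabbitmq-cluster-operator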

Creating a RabbitMQ Cluster [example]

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: dev-rabbitmq
  labels:
    app.kubernetes.io/name: dev-rabbitmq
spec:
  replicas: 3
  resources:
    requests:
      cpu: 300m
      memory: 256Mi
    limits:
      cpu: 2000m
      memory: 256Mi
  rabbitmq:
    additionalConfig: |
      vm_memory_high_watermark.absolute = 200MB
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                - dev-rabbitmq
          topologyKey: kubernetes.io/hostname
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: eks.amazonaws.com/nodegroup
                operator: In
                values:
                  - ARM-t4g-medium-1a
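
The preferred podAntiAffinity spreads the three replicas across different nodes so a single node failure cannot take out quorum, while the nodeAffinity pins them to the ARM-t4g-medium-1a node group. After applying the manifest, a quick sanity check, assuming it was saved as dev-rabbitmq.yaml (pods follow the operator's <name>-server-N naming):

kubectl apply -f dev-rabbitmq.yaml
kubectl get rabbitmqcluster dev-rabbitmq
kubectl get pods -l app.kubernetes.io/name=dev-rabbitmq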

PDB [ref]

apiVersion: policy/v1  # policy/v1beta1 was removed in Kubernetes 1.25 and will not work on EKS v1.30
kind: PodDisruptionBudget
metadata:
  name: dev-rabbitmq
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: dev-rabbitmq
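
With maxUnavailable: 1, voluntary disruptions such as node drains during upgrades can evict at most one broker at a time, which is exactly the maintenance-window pain this cluster is meant to avoid. Confirm it is in effect with:

kubectl get pdb dev-rabbitmq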

Load / Connection Testing [ref]

Replace hello-world from the upstream example with your own cluster name (here: dev-rabbitmq).

username="$(kubectl get secret dev-rabbitmq-default-user -o jsonpath='{.data.username}' | base64 --decode)"
password="$(kubectl get secret dev-rabbitmq-default-user -o jsonpath='{.data.password}' | base64 --decode)"
service="$(kubectl get service dev-rabbitmq -o jsonpath='{.spec.clusterIP}')"
kubectl run perf-test --image=pivotalrabbitmq/perf-test -- --uri amqp://$username:$password@$service

## test quorum queue
kubectl run perf-test --image=pivotalrabbitmq/perf-test -- --uri amqp://$username:$password@$service --quorum-queue --queue eric-quorum --metrics-format compact --use-millis --rate 100

## Test client HPA (Fill up the queue)
kubectl run perf-test --image=pivotalrabbitmq/perf-test -- \
--uri amqp://$username:$password@$service \
--quorum-queue --queue eric-quorum \
--producers 1 --consumers 0 --predeclared --routing-key rk \
--metrics-format compact --use-millis --rate 100

# pod/perf-test created

kubectl logs -f perf-test 2>&1 | tee ~/Downloads/rabbitMQ-cluster-perf-test-$(date "+%m-%d-%H-%M")
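
kubectl run leaves the pod behind once a test finishes, so delete it before launching the next scenario:

kubectl delete pod perf-test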

By default the memory high watermark is hit at 60% RAM usage; RabbitMQ then raises a memory alarm and blocks publishing connections. ref

It can be tuned with vm_memory_high_watermark.relative = 0.8, though the official docs say a relative percentage is a poor fit for container environments; use vm_memory_high_watermark.absolute = 200MB instead (as in the manifest above). On a running node it can also be adjusted on the fly with rabbitmqctl set_vm_memory_high_watermark <fraction>.
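
For example, against one of the pods created above (note that the dynamic setting is transient and reverts to the configured value when the node restarts):

kubectl exec dev-rabbitmq-server-0 -- rabbitmqctl set_vm_memory_high_watermark 0.8
## or as an absolute value
kubectl exec dev-rabbitmq-server-0 -- rabbitmqctl set_vm_memory_high_watermark absolute "200MB"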

References