This document outlines the instructions for performance testing using Kwok for the Kyverno 1.10 release.
ETCD_VER=v3.4.13
# choose either URL
GOOGLE_URL=https://storage.googleapis.com/etcd
GITHUB_URL=https://github.com/etcd-io/etcd/releases/download
DOWNLOAD_URL=${GOOGLE_URL}
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
rm -rf /tmp/etcd-download-test && mkdir -p /tmp/etcd-download-test
curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /usr/local/bin --strip-components=1
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
etcd --version
etcdctl version
More details for etcdctl installation can be found here.
wget -q -O - https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash
More details for k3d installation can be found here.
To quickly try out the scaling test, you can use the following command to create the k3d cluster with 3 workers:
k3d cluster create --agents=3 --k3s-arg "--disable=metrics-server@server:*" --k3s-node-label "ingress-ready=true@agent:*"
To set up embedded etcd for the K3s cluster, follow the instructions below.
k3d cluster create scaling --servers 3 --agents=15 --k3s-arg "--disable=metrics-server@server:*" --k3s-node-label "ingress-ready=true@agent:*"
Use the following command if you want to configure the etcd storage limit. This command sets the limit to 8 GiB (8589934592 bytes):
k3d cluster create scaling --servers 3 --agents=15 --k3s-arg "--disable=metrics-server@server:*" --k3s-node-label "ingress-ready=true@agent:*" --k3s-arg "--etcd-arg=quota-backend-bytes=8589934592@server:*"
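The quota value is a plain byte count, so a different limit has to be converted by hand. A minimal sketch of that conversion, assuming the limit is given in whole GiB (the helper name `gib_to_bytes` is hypothetical and not part of the repository):

```shell
# Hypothetical helper: convert whole GiB to bytes for quota-backend-bytes.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 8   # prints 8589934592

# Build the k3s argument used in the command above from the helper's output.
echo "--etcd-arg=quota-backend-bytes=$(gib_to_bytes 8)@server:*"
```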
Note that you can exec into the server node to check the storage setting:
docker exec -ti k3d-scaling-server-0 sh
cat /var/lib/rancher/k3s/server/db/etcd/config | tail -2
quota-backend-bytes: 8589934592
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
docker cp k3d-scaling-server-0:/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt ./server-ca.crt
docker cp k3d-scaling-server-0:/var/lib/rancher/k3s/server/tls/etcd/server-client.crt ./server-client.crt
docker cp k3d-scaling-server-0:/var/lib/rancher/k3s/server/tls/etcd/server-client.key ./server-client.key
etcd=https://$(kubectl get node -o wide | grep k3d-scaling-server-0 | awk '{print $6}'):2379
etcd_ep=$etcd/version
curl -L --cacert ./server-ca.crt --cert ./server-client.crt --key ./server-client.key $etcd_ep
export ETCDCTL_ENDPOINTS=$etcd
export ETCDCTL_CACERT='./server-ca.crt'
export ETCDCTL_CERT='./server-client.crt'
export ETCDCTL_KEY='./server-client.key'
export ETCDCTL_API=3
etcdctl endpoint status -w table
Credits to k3s etcd commands.
./docs/perf-testing/kwok.sh
Run the script to create the desired number of nodes for your Kwok cluster:
./docs/perf-testing/node.sh
More about Kwok on this page.
make dev-lab-metrics-server dev-lab-prometheus
helm repo update
helm upgrade --install kyverno kyverno/kyverno -n kyverno \
--create-namespace \
--set admissionController.serviceMonitor.enabled=true \
--set admissionController.replicas=3 \
--set reportsController.serviceMonitor.enabled=true \
--set reportsController.resources.limits.memory=10Gi
# --devel \
# --set features.admissionReports.enabled=false \
helm upgrade --install kyverno kyverno/kyverno-policies --set=podSecurityStandard=restricted --set=background=true --set=validationFailureAction=Enforce --devel
This script creates 1000 pods, with QPS and burst set to 50:
kubectl create ns test
go run docs/perf-testing/main.go --count=1000 --kinds=pods --clientRateLimitQPS=50 --clientRateLimitBurst=50 --namespace=test
Note that these pods will be scheduled to the Kwok nodes, not k3s nodes.
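To get a feel for how the QPS and burst settings bound the test duration, the sketch below estimates a lower bound on the time needed to issue all create requests. It assumes the two flags map to a client-side token-bucket rate limiter (as client-go's QPS and Burst settings do): the first `burst` requests go out immediately and the rest are released at `qps` per second. The helper name `min_create_seconds` is hypothetical, and the real run takes longer because each request also has its own latency.

```shell
# Hypothetical helper: lower bound (in seconds) for sending COUNT requests
# through a token bucket with the given QPS and BURST.
min_create_seconds() {
  awk -v n="$1" -v qps="$2" -v burst="$3" 'BEGIN {
    if (qps <= 0) exit 1            # a non-positive QPS makes no sense here
    rest = n - burst                # requests that must wait for new tokens
    if (rest < 0) rest = 0
    printf "%.1f\n", rest / qps
  }'
}

min_create_seconds 1000 50 50   # prints 19.0
```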
To view the Prometheus dashboard, you can expose it on localhost port 9090:
kubectl port-forward --address 127.0.0.1 svc/kube-prometheus-stack-prometheus 9090:9090 -n monitoring &
To get a view of memory utilization over time, you can select by the container image for a specific Kyverno controller:
container_memory_working_set_bytes{image="ghcr.io/kyverno/kyverno:v1.10.0-rc.1"}
container_memory_working_set_bytes gives you the current working set in bytes, and this is what the OOM killer is watching for.
rate(container_cpu_usage_seconds_total{image="ghcr.io/kyverno/kyverno:v1.10.0-rc.1"}[1m])
container_cpu_usage_seconds_total is the sum of the total amount of “user” time (i.e. time spent not in the kernel) and the total amount of “system” time (i.e. time spent in the kernel). This query gives the average CPU usage in the last 1 minute.
It's a bit tricky to get the precise admission request rate (ARPS). The Prometheus rate() function always requires a time window over which to calculate the rate, and the result may differ when the window differs.
During our test, we take the difference between the admission request counts recorded at the start and at the end of a particular period. We then divide this difference by the length of the period to derive the average admission request rate over that period.
sum(kyverno_admission_requests_total)
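The calculation described above can be sketched as a small helper. It takes two readings of the query result and the elapsed time in seconds; the helper name `arps` and the sample numbers are hypothetical and only illustrate the arithmetic.

```shell
# Hypothetical helper: average admission requests per second between two
# readings of sum(kyverno_admission_requests_total).
# Usage: arps START_COUNT END_COUNT DURATION_SECONDS
arps() {
  awk -v s="$1" -v e="$2" -v d="$3" 'BEGIN {
    if (d <= 0) exit 1              # the window must have a positive length
    printf "%.2f\n", (e - s) / d
  }'
}

# Example with made-up readings taken 600 seconds apart.
arps 12000 42000 600   # prints 50.00
```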
Run the following script to calculate total sizes for the given resource (pods in the following example):
$ ./docs/perf-testing/size.sh
Enter the resource to calculate the size:
pods
The total size for pods is 8861737 bytes.
You can also check the total etcd size:
$ etcdctl endpoint status -w table
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://172.19.0.2:2379 | d7380397c3ec4b90 | 3.5.3 | 84 MB | true | false | 2 | 154449 | 154449 | |
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
This command returns the resources stored in etcd that have more than 100 objects:
kubectl get --raw=/metrics | grep apiserver_storage_objects |awk '$2>100' |sort -g -k 2
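The same filter can be tried offline on captured metrics text. The sketch below wraps it in a hypothetical function, `large_storage_objects`, with a configurable threshold. It also anchors the grep at the start of the line, because the `# HELP` and `# TYPE` comment lines have a non-numeric second field, which awk compares as a string and can therefore let through. The sample input is made up for illustration.

```shell
# Hypothetical helper: read metrics text on stdin and print the
# apiserver_storage_objects samples whose value exceeds THRESHOLD (default 100),
# sorted by value in ascending order.
large_storage_objects() {
  threshold="${1:-100}"
  grep '^apiserver_storage_objects' \
    | awk -v t="$threshold" '$2 + 0 > t' \
    | sort -g -k 2
}

# Made-up sample input; on a live cluster, pipe in `kubectl get --raw=/metrics`.
cat <<'EOF' | large_storage_objects 100
# TYPE apiserver_storage_objects gauge
apiserver_storage_objects{resource="pods"} 1000
apiserver_storage_objects{resource="nodes"} 18
apiserver_storage_objects{resource="events"} 250
EOF
```

With this sample, only the events and pods lines are printed, in that order.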
Kyverno exposes two metrics that can be used to calculate the admission review latency:
sum(kyverno_admission_review_duration_seconds_sum{resource_request_operation=~"create|update"})/sum(kyverno_admission_review_duration_seconds_count{resource_request_operation=~"create|update"})
The following metrics, exposed by the API server and scraped by Prometheus, should give you the same result if you follow the same setup on this page:
sum(apiserver_admission_webhook_admission_duration_seconds_sum{name="validate.kyverno.svc-fail",operation="CREATE"}) / sum(apiserver_admission_webhook_admission_duration_seconds_count{name="validate.kyverno.svc-fail",operation="CREATE"})
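Both queries divide a cumulative sum by a cumulative count, so they report the mean latency since the counters started. To restrict the average to a test window, the same division can be applied to the differences between two readings. A minimal sketch, assuming you record the `_sum` and `_count` values at the start and end of the window (the helper name `avg_latency_ms` and the sample numbers are hypothetical):

```shell
# Hypothetical helper: mean admission latency in milliseconds over a window.
# Usage: avg_latency_ms SUM_START SUM_END COUNT_START COUNT_END
avg_latency_ms() {
  awk -v s0="$1" -v s1="$2" -v c0="$3" -v c1="$4" 'BEGIN {
    n = c1 - c0
    if (n <= 0) exit 1              # no requests observed in the window
    printf "%.2f\n", (s1 - s0) * 1000 / n
  }'
}

# Example with made-up readings: 30 seconds spent on 1500 requests.
avg_latency_ms 10.0 40.0 1000 2500   # prints 20.00
```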