Microk8s v1.29 snap installation failed on plain Debian 12.4 #4361
Comments
For me: root@microk8s-master:~# microk8s status. The inspect log looks the same as your first inspect output. I don't know what's going on.
@odoo-sh thanks for your fast feedback. Do you really mean my first inspect output, representing the output of a v1.28 installation without errors? Or do you mean my last output after a re-installation (microk8s-reinstall-1.29_6364-inspection-report-20240110_104641.tar.gz.tar.gz)? Sorry, just to clarify.
I have the same problem on Ubuntu Server 22.04: the file localnode.yaml does not exist when running microk8s inspect.
Same problem as well. Also, I'm wondering whether those of you who got 1.29 running (e.g. by upgrading from 1.28) also can't use
Same issue on Ubuntu Desktop 23.10:
Hi @TecIntelli and other folks who are running into this, sorry for taking so long to check this. This seems to be related to cgroups; I see the following in the error logs (and I can also reproduce it on Debian 12 systems).
One workaround for this is to disable this on the kubelet with:
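(The exact flags from this comment are not preserved in the thread; a minimal sketch of such a kubelet workaround, assuming the commonly used --cgroups-per-qos / --enforce-node-allocatable flags and the standard MicroK8s args file, would be:)
# append the flags to the kubelet args used by kubelite (assumed flags, verify before use)
echo '--cgroups-per-qos=false' | sudo tee -a /var/snap/microk8s/current/args/kubelet
echo '--enforce-node-allocatable=' | sudo tee -a /var/snap/microk8s/current/args/kubelet
# restart the combined kubelite daemon so the kubelet picks up the new args
sudo snap restart microk8s.daemon-kubelite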
Afterwards, MicroK8s should come up. We will take this back to see what the root cause is and what sort of mitigations we could apply to prevent this in out-of-the-box deployments.
To add some more details, this is what I'm seeing on a Debian 12 instance where I can reproduce the issue:
Let me share some news on this issue regarding our initially mentioned problem; maybe somebody else can explain more about the findings we have made. It might be an issue with kernel 6.1 as used on Debian 12 (last tried with the latest version, 6.1.69). Let me attach the inspect files, just to compare if required. Additionally (linking to @neoaggelos' detailed information), we have figured out that the reason in kernel 6.1.x might be a delegation issue. If we add the following before we install MicroK8s, the initial problem does not occur.
Reference: the opencontainers cgroup v2 documentation on GitHub.
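(The snippet referenced above is not preserved in the thread; as a diagnostic sketch, not the commenter's commands, you can check which cgroup v2 controllers the kernel exposes and which are delegated to child cgroups, and compare the two:)
# controllers available at the cgroup v2 root
cat /sys/fs/cgroup/cgroup.controllers
# controllers delegated to child cgroups
cat /sys/fs/cgroup/cgroup.subtree_control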
Hi @TecIntelli, thanks a lot for looking deeper and coming up with a path towards a solution. It is still not too clear to me how we could handle this on the MicroK8s side; I do not think it's a good approach to mess with the system like this.
I spontaneously ran into the same issue on an HA cluster running Ubuntu 22.04 LTS (Hetzner cloud servers) and MicroK8s 1.29/stable. First, I spotted weird behavior on one faulty node of the HA cluster (a container stayed in the Terminating state, no deletion possible). After rebooting, I observed that the node did not recover, and at some point I realized that microk8s.daemon-kubelite was not starting.
After setting up a new, clean machine with Ubuntu 22.04.3 LTS and 1.29/stable (single node), I ran into the same problem of microk8s.daemon-kubelite not starting. On top of that, I got the missing localnode.yaml error reported by @Zvirovyi earlier. For now, I managed to restore the cluster by downgrading MicroK8s to v1.28.3:
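(The exact downgrade command is not preserved above; presumably something along these lines:)
sudo snap refresh microk8s --channel=1.28/stable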
PS: Adding and removing nodes from the HA cluster was very smooth at every stage, even with the "broken" 1.29/stable. Kudos to the maintainers!
@dimw Do you remember what kernel version was running on your broken node with Ubuntu 22.04 and on the new clean host with Ubuntu 22.04.3?
@TecIntelli I made a snapshot of the machine before purging it so I restored it now and checked the data. Both machines have the same configuration:
@dimw I was just curious and did a short test on an AWS EC2 instance with Ubuntu 22.04.3 and kernel 5.15.0-1052-aws. Unfortunately, I cannot confirm the behavior you mentioned when I installed MicroK8s 1.29/stable (6364) via snap. It seems to run smoothly; all pods came up as expected. Here is the inspect file of the single-node instance.
@TecIntelli I repeated the process yesterday, installed the newest Ubuntu on Hetzner Cloud, and ran into the following two issues again:
$ apt update
$ apt upgrade -y
$ apt install snapd -y
$ snap install microk8s --classic --channel=1.29/stable
$ reboot # after kernel upgrade
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy
$ uname -r
5.15.0-92-generic
$ microk8s start
$ microk8s inspect
Inspecting system
Inspecting Certificates
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-kubelite is running
Service snap.microk8s.daemon-k8s-dqlite is running
Service snap.microk8s.daemon-apiserver-kicker is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy openSSL information to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy current linux distribution to the final report tarball
Copy asnycio usage and limits to the final report tarball
Copy inotify max_user_instances and max_user_watches to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster
Inspecting dqlite
Inspect dqlite
cp: cannot stat '/var/snap/microk8s/6364/var/kubernetes/backend/localnode.yaml': No such file or directory
Building the report tarball
Report tarball is at /var/snap/microk8s/6364/inspection-report-20240131_203342.tar.gz
$ journalctl -u snap.microk8s.daemon-kubelite -n 1000 | grep "err="
Jan 31 20:38:20 ubuntu-4gb-fsn1-2 microk8s.daemon-kubelite[65475]: E0131 20:38:20.663704 65475 kubelet.go:2353] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
Jan 31 20:38:20 ubuntu-4gb-fsn1-2 microk8s.daemon-kubelite[65475]: E0131 20:38:20.721005 65475 container_manager_linux.go:881] "Unable to get rootfs data from cAdvisor interface" err="unable to find data in memory cache"
Jan 31 20:38:20 ubuntu-4gb-fsn1-2 microk8s.daemon-kubelite[65475]: E0131 20:38:20.772964 65475 kubelet.go:2353] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
Jan 31 20:38:20 ubuntu-4gb-fsn1-2 microk8s.daemon-kubelite[65475]: E0131 20:38:20.967043 65475 kubelet.go:1542] "Failed to start ContainerManager" err="failed to initialize top level QOS containers: root container [kubepods] doesn't exist"
I also tried the same with Ubuntu 20.04.6 LTS (kernel 5.4.0-170-generic) and got the same error.
Hi all, I have the same issue on Oracle Linux 9.3:
microk8s inspect
Building the report tarball
Same on Ubuntu Server 22.04.4.
Same on Debian GNU/Linux 12 (bookworm). microk8s.inspect:
Inspecting system
Building the report tarball
System info:
When I installed version 1.29, microk8s inspect reported: WARNING: Maximum number of inotify user watches is less than the recommended value of 1048576. I got the above error. Solution:
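(The solution steps were not preserved; a common way to clear that inotify warning, sketched here as an assumption, is to raise the limit via sysctl:)
# raise the limit flagged by microk8s inspect and persist it across reboots
sudo sysctl -w fs.inotify.max_user_watches=1048576
echo 'fs.inotify.max_user_watches=1048576' | sudo tee -a /etc/sysctl.conf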
Ran into this same issue. This happened on a fresh 22.04.4 Ubuntu Server minimal installation. The fix for me was to roll back to 1.28, i.e.:
sudo snap remove --purge microk8s
sudo snap install microk8s --classic --channel=1.28/stable
This let me start the node back up. Subsequently I upgraded to the latest:
sudo snap refresh microk8s --channel 1.30/stable
and rejoined the node to the cluster:
microk8s join 192.168.x.x:25000/xxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxx
Everything seems to be in order.
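(Not part of the original comment: if you follow a similar rollback-and-rejoin path, a quick sanity check afterwards could be:)
microk8s status --wait-ready
microk8s kubectl get nodes -o wide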
Hi @Nospamas and all, this seems to have started on Debian but is currently affecting Ubuntu versions as well. It is related to the kubepods cgroup not being set up with the controllers it needs. We have a fix #4503 that is out.
The issue will remain open until the bugfix is promoted to stable.
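(Not from the original comment: a quick way to see whether the kubepods cgroup was created at all; the exact path depends on the cgroup driver in use:)
ls -d /sys/fs/cgroup/kubepods* 2>/dev/null || echo "kubepods cgroup not found"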
Same issue on Raspberry Pi 5:
Linux pi1 6.6.28+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.28-1+rpt1 (2024-04-22) aarch64 GNU/Linux
$ microk8s.inspect
Building the report tarball
inspection-report-20240506_095631.tar.gz
Are we able to switch back to stable once the bug is fixed, or is it best to use version 1.28?
I hate microk8s
Hi, I have the same issue on my Ubuntu 22.04
report file
k3s is life!
See canonical/microk8s#4361 for why. Particularly: canonical/microk8s#4361 (comment)
Hi, same issue here on Ubuntu 22.04 with both 1.29/edge and 1.30/edge. Refreshing to 1.28/stable seems to work,
and I no longer get the startup error. But I still have the missing localnode.yaml warning from microk8s inspect.
It fixed my issues:
apiVersion: v1
This configuration sets the address to 192.168.1.100:19001 and assigns the role as a node.
Purpose of localnode.yaml:
Not working; WSL Ubuntu 20.04, 22.04, and 24.04 all gave the same error.
Are there plans to get this into the stable branch before the end of life of 1.28 on 2024-10-28?
As of writing this, both are still affected for me. I first saw this problem appear on an Ubuntu 24.04 (x86) VM after upgrading that node from an earlier release.
I am still facing this issue on my new VPS; I have tried all the available fixes and none of them have worked.
I am also facing the same issue.
Re: the top-level QoS problems, it looks like the systemd delegate conf was included in 1.31 (and maybe ported to others; I didn't check). It is /not/ in 1.28. Presumably the changes could just be executed manually via:
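(The commands from this comment are not preserved; a sketch of what such a manual delegation drop-in for the kubelite service might look like, with the unit name and Delegate= value as assumptions rather than the confirmed 1.31 change, is:)
sudo mkdir -p /etc/systemd/system/snap.microk8s.daemon-kubelite.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/snap.microk8s.daemon-kubelite.service.d/delegate.conf
[Service]
# ask systemd to delegate cgroup management below this unit (assumed value)
Delegate=true
EOF
sudo systemctl daemon-reload
sudo snap restart microk8s.daemon-kubelite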
Hey folks, we had not seen this on 1.28 initially. We've backported the workaround/fix with #4667 to 1.28; it should be promoted shortly. We are following the upstream issue kubernetes/kubernetes#122955 (comment) and the possible fix kubernetes/kubernetes#125923.
Hi, |
Same issue. Following the workaround suggested by thirusubash (tested on 1.31.0 and 1.29.8), I've created the file manually. You can create it based on the configuration already present on the node (see the sketch after the note below).
Note: If you drained your node,
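(Not from the original comments: a minimal sketch of what such a file could contain, assuming the dqlite node format with Address/ID/Role fields; verify the field names and values against the other YAML files in /var/snap/microk8s/current/var/kubernetes/backend/ on your own node before writing anything:)
# hypothetical example values; the address must match your node's dqlite address
cat <<'EOF' | sudo tee /var/snap/microk8s/current/var/kubernetes/backend/localnode.yaml
Address: 192.168.1.100:19001
ID: 1
Role: 0
EOF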
Ran into this today. It seems like microk8s is genuinely unusable until this is fixed; any updates on this?
Also ran into this issue, any news on a fix?
Chipping in: I have this too on bare-metal AMD64 clusters (three identical 6-CPU, 16 GB RAM mini desktop PCs) on the latest Ubuntu Server 24.04 (minimized install). The only thing I did was install microk8s via the option in the Ubuntu installer to install the snap, and then I followed the (few) steps to get an HA cluster running. I'm running a 3-node HA cluster. I repeatedly see that when I take down the cluster (shut down the machines) and later start it again, 1 or 2 random nodes will report 'NotReady' for a long time. When I ssh into the machine running the NotReady node, I can see the kubelite service failing; when I then restart the MicroK8s services on that node (a sketch of such a restart follows this comment), the node reports Ready again.
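(Sketch referenced in the comment above; not the commenter's actual commands, just one plausible way to restart MicroK8s on a stuck node:)
sudo snap restart microk8s
# or only the kubelite daemon:
sudo systemctl restart snap.microk8s.daemon-kubelite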
Got this error too. It seems there is still no fix for this issue after 11 months.
Hey folks, this issue now contains reports related to multiple causes. To address certain ones:
I'm closing this issue since a workaround has been issued for the original bug report. Please create a separate issue if you are facing a different problem. Thanks!
Summary
Over the last few days I noticed that the installation of MicroK8s v1.29/stable (6364) fails on a new (plain) Debian 12.4 system (tested on AWS EC2 with the default Debian 12 image provided by AWS). After a few tests I can summarize the following behavior:
Installation of MicroK8s v1.28/stable (6089) on the described Debian system via snap works as expected:
microk8s-1.28_6089-inspection-report-20240110_103728.tar.gz
Installation of MicroK8s v1.29/stable (6364) on the described Debian system via snap failed, and
microk8s inspect
responded with:
microk8s_1.29_6364-inspection-report-20240110_102300.tar.gz
microk8s-1.28_6089-refreshed-1.29_6364-inspection-report-20240110_103926.tar.gz
If I run
sudo snap remove --purge microk8s
and install v1.29 (6364) again, the (one-node) cluster seems to work as expected, but the inspect output also does not look well:
microk8s-reinstall-1.29_6364-inspection-report-20240110_104641.tar.gz.tar.gz
What Should Happen Instead?
I hope somebody from the development team can find the reason for this behavior. I guess something is set up on the host system during the v1.28 installation that v1.29 fails to set up itself, and it is not removed during the
snap remove --purge
process.
Reproduction Steps
Explained above (incl. inspection tarballs).
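(The exact commands are not restated in the issue text; an assumed minimal reproduction on a plain Debian 12.4 host would be roughly:)
sudo snap install microk8s --classic --channel=1.29/stable
sudo microk8s status --wait-ready
sudo microk8s inspect
journalctl -u snap.microk8s.daemon-kubelite | grep 'Failed to start ContainerManager'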
If any points are left open, I will try to answer your questions.
Thanks!