I have a fresh NethServer install, and during the night the server stops responding: it hangs.
I don’t have physical access to the server, someone has to reboot it for me in the morning.
The server is a new Dell PowerEdge T130 with 8GB RAM and four 1TB drives configured in software RAID1 (boot) and RAID10 (the rest).
The server has the Samba server and router roles (DHCP, DNS, firewall, MultiWAN).
I tried to configure WAN failover and Shorewall crashed. Even now it says “Check firewall rules The firewall is NOT running”. The last time I tried to reboot the server remotely, it hung.
I would like to know where I should start troubleshooting.
There are a lot of messages like this before it hangs:
Feb 3 00:18:19 server kernel: IPv4: martian source 46.97.27.250 from 46.97.27.249, on dev p3p1
Feb 3 00:18:19 server kernel: ll header: 00000000: ff ff ff ff ff ff 00 6c bc ef 5e 2e 08 06 …l…^…
These are related to the 2nd WAN which I disabled to be able to start Shorewall.
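To see how many of these messages there are and which interface and sender they come from, the log can be summarised with a few lines of Python. This is only a minimal sketch: the function name `count_martians` is made up for illustration, and the regular expression assumes the message format shown in the log line above.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   kernel: IPv4: martian source 46.97.27.250 from 46.97.27.249, on dev p3p1
# The first address is the packet's destination, the second is its sender.
MARTIAN_RE = re.compile(r"martian source (\S+) from (\S+), on dev (\S+)")

def count_martians(lines):
    """Count martian-source messages per (device, sender) pair."""
    counts = Counter()
    for line in lines:
        match = MARTIAN_RE.search(line)
        if match:
            _dest, sender, dev = match.groups()
            counts[(dev, sender)] += 1
    return counts

# Sample input taken from the log excerpt above (hex dump shortened).
SAMPLE = [
    "Feb 3 00:18:19 server kernel: IPv4: martian source 46.97.27.250 "
    "from 46.97.27.249, on dev p3p1",
    "Feb 3 00:18:19 server kernel: ll header: 00000000: ff ff ff ff ff ff",
]

for (dev, sender), n in count_martians(SAMPLE).most_common():
    print(dev, sender, n)
```

On the server itself the same function could be fed an open log file instead of `SAMPLE`, for example `/var/log/messages`, assuming that is where syslog writes on this install.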
This morning the server was not accessible again. This is my 2nd NethServer and there are way too many stability problems with it. I need to find out if the stability problems are hardware related or NethServer related. It is not normal to have the server hang every week.
I think that @dnutan has found the right answer. @adv, could you please reboot your server using kernel 3.10.0-514.26.2.el7.x86_64 just to confirm that the problem goes away?
This seems to be a regression coming from Red Hat, whom we usually trust.
For this issue I was supposed to open a new topic, but the forum administrators are closing them. They say the issues are related to this topic.
The thing is that I’m facing a lot of stability issues with this NethServer installation. Actually, in the 15 years I have been using Linux, I have never had stability issues.
I assume this must be a hardware issue, but I can’t prove it yet.
For this issue I disabled the auditd service, as described at this link:
You are using CentOS 7; I don’t think that instructions for CentOS 6 are relevant to your problem.
Does your system match what is reported in the CentOS bugtracker?
Today the server was blocked again. The kernel version is: 3.10.0-693.17.1.el7.x86_64
Apparently the bug reported here persists in this version: https://bugs.centos.org/view.php?id=13843#c31121
That thread mentions that the bug is fixed in version 3.10.0-820.el7, but I can’t find this version.
Instead I found that kernel 3 is EOL (https://www.kernel.org/) but that it can be upgraded to 4.4 or 4.15 (https://www.howtoforge.com/tutorial/how-to-upgrade-kernel-in-centos-7-server/).
So what do you think? Should I upgrade to 4.4 or 4.15? Or is it better to revert to 3.10.0-514.26.2.el7.x86_64?
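The comparison between the running kernel and the build that is said to contain the fix can be sketched in Python as below. This is a rough check under stated assumptions, not a real RPM version comparison: `kernel_key` and `has_fix` are hypothetical helper names, only the numeric part of the release is compared, and a fix backported to an earlier update build would not be detected.

```python
import re

def kernel_key(release):
    """Split '3.10.0-693.17.1.el7.x86_64' into ((3, 10, 0), (693, 17, 1))."""
    version, _, rest = release.partition("-")
    # Keep only the leading numeric part of the release, before '.el7' etc.
    numeric = re.match(r"\d+(?:\.\d+)*", rest)
    rel_part = numeric.group(0) if numeric else "0"
    return (
        tuple(int(p) for p in version.split(".")),
        tuple(int(p) for p in rel_part.split(".")),
    )

def has_fix(running, fixed_in):
    """True if `running` is the same upstream version and at least build `fixed_in`."""
    run_ver, run_rel = kernel_key(running)
    fix_ver, fix_rel = kernel_key(fixed_in)
    if run_ver != fix_ver:
        # Build numbers of different upstream versions are not comparable.
        raise ValueError("different upstream kernel versions")
    return run_rel >= fix_rel

print(has_fix("3.10.0-693.17.1.el7.x86_64", "3.10.0-820.el7"))
```

The check is only meaningful inside the same 3.10.0 series; for a kernel from another series (such as 4.4) it raises an error rather than guessing.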
I just discovered I can’t revert to 3.10.0-514.26.2.el7.x86_64. The available versions are:
0 : CentOS Linux (3.10.0-693.17.1.el7.x86_64) 7 (Core)
1 : CentOS Linux (3.10.0-693.11.6.el7.x86_64) 7 (Core)
2 : CentOS Linux (3.10.0-693.el7.x86_64) 7 (Core)
3 : CentOS Linux (0-rescue-b6a299e6f2a74c57aa53e745d1fd3c69) 7 (Core)
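A listing like the one above can also be checked programmatically for a given kernel version. The sketch below is only an illustration: `parse_entries` and `find_kernel` are hypothetical helper names, and the parsing assumes the "index : title" layout shown in the list.

```python
import re

ENTRY_RE = re.compile(r"^\s*(\d+)\s*:\s*(.+)$")

def parse_entries(listing):
    """Map boot index -> menu title from lines like '0 : CentOS Linux (...)'."""
    entries = {}
    for line in listing.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            entries[int(match.group(1))] = match.group(2).strip()
    return entries

def find_kernel(entries, version):
    """Return the index of the first entry whose title mentions `version`, else None."""
    for index in sorted(entries):
        if version in entries[index]:
            return index
    return None

# The boot menu entries quoted above.
LISTING = """\
0 : CentOS Linux (3.10.0-693.17.1.el7.x86_64) 7 (Core)
1 : CentOS Linux (3.10.0-693.11.6.el7.x86_64) 7 (Core)
2 : CentOS Linux (3.10.0-693.el7.x86_64) 7 (Core)
3 : CentOS Linux (0-rescue-b6a299e6f2a74c57aa53e745d1fd3c69) 7 (Core)
"""

entries = parse_entries(LISTING)
print(find_kernel(entries, "3.10.0-514.26.2.el7"))  # version that is not installed
print(find_kernel(entries, "3.10.0-693.el7"))
```

If a matching index is found, it is the number that `grub2-set-default` expects when choosing which entry to boot by default.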
So should I update to kernel v4? Which one? 4.4 or 4.15?
3.10.0-820 will come with 7.5 in a few months.
3.10 is not EOL; it’s maintained by Red Hat.
Follow the howto you have found to pick a new kernel from elrepo.
I’d prefer the -lt kernel (which, as of today, is 4.4).
You will lose the nDPI function, but you may rebuild it from sources (I know it works).
I upgraded to 4.4.116-1.el7.elrepo.x86_64. I’m anxious to see if it solves the problem.
Indeed, the DPI function is not available. Do you have a tutorial on how to rebuild it from the sources? I’m planning to use this feature.