Hardening The Linux Kernel

Linux Hardening For a Better Tomorrow

In our blog post from last year, we discussed how the internet has entered an age of corporate surveillance. With data being one of the biggest commodities today, people on the internet are fighting back through the IndieWeb, the fediverse, Lemmy, and, believe it or not, parts of the dark web. Though our threat landscape is diverse and expansive, especially if you’re using the dark web, there are measures we can take on Linux systems to lock down and harden them.

Today’s focus will be on the Arch ecosystem. When you’re installing Arch, you’ll be given the option to choose which kernel you’d like. Those in the security sector would choose the hardened kernel (linux-hardened), but there are other options, like Zen for performance and LTS for stability. Choosing the hardened kernel on its own, though, gives a false sense of security: many settings are left open by default, which can let ordinary users easily get a root shell or expose important information about your system.

Kernel Exploitation

ROP (Return-Oriented Programming) is an exploit technique that allows you to execute code in the presence of security defenses. But since the kernel’s attack surface is a lot bigger than that of userland, you would probably want to use a tool like ROPgadget to automate scanning it. We’ll go into kernel exploitation in another post, though; for now we’re locking down our system for a more hardened setup. But as we talk about the different settings, you’ll get a good understanding of what we could look for and use in a Red Team application.

Kernel Self-Protection

In the following, we’ll be using sysctl to change our kernel settings.

*kernel.kptr`_`restrict=2*

Kernel pointers point to a specific location in kernel memory, and they are not hidden by default. You can find them in /proc/kallsyms. Setting this sysctl to 2 hides them from all users, regardless of privileges.

*kernel.dmesg`_`restrict=1*

dmesg is the kernel log. It exposes a large amount of useful kernel debugging information, but this can often leak sensitive information, such as kernel pointers. Changing the above sysctl restricts the kernel log to the CAP`_`SYSLOG capability.

*kernel.printk=3 3 3 3*

Despite the value of dmesg`_`restrict, the kernel log will still be displayed in the console during boot. Malware that is able to record the screen during boot may be able to abuse this to gain higher privileges. This option prevents those information leaks. This must be used in combination with certain boot parameters described below to be fully effective.
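The four numbers in kernel.printk are, in order, the console log level, the default message log level, the minimum console log level, and the default console log level. As a minimal sketch (the helper name `parse_printk` is made up for illustration), the following Python labels each field so the value is easier to read:

```python
# Field names follow the documented order of the kernel.printk sysctl.
PRINTK_FIELDS = (
    "console_loglevel",
    "default_message_loglevel",
    "minimum_console_loglevel",
    "default_console_loglevel",
)


def parse_printk(value: str) -> dict[str, int]:
    """Label the four whitespace-separated integers of kernel.printk."""
    numbers = [int(part) for part in value.split()]
    if len(numbers) != len(PRINTK_FIELDS):
        raise ValueError(f"expected 4 values, got {len(numbers)}")
    return dict(zip(PRINTK_FIELDS, numbers))


if __name__ == "__main__":
    for name, level in parse_printk("3 3 3 3").items():
        print(f"{name} = {level}")
```

With a console log level of 3, only messages more severe than error level (emergency, alert and critical) should reach the console.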

*kernel.unprivileged`_`bpf`_`disabled=1* *net.core.bpf`_`jit`_`harden=2*

eBPF exposes quite a large attack surface. As such, it must be restricted. These sysctls restrict eBPF to the CAP`_`BPF capability (CAP`_`SYS`_`ADMIN on kernel versions prior to 5.8) and enable JIT hardening techniques, such as constant blinding.

*dev.tty.ldisc`_`autoload=0*

This restricts loading TTY line disciplines to the CAP`_`SYS`_`MODULE capability to prevent unprivileged attackers from loading vulnerable line disciplines with the TIOCSETD ioctl, which has been abused in a number of exploits before.

*vm.unprivileged`_`userfaultfd=0*

The userfaultfd() syscall is often abused to exploit use-after-free flaws. Due to this, this sysctl is used to restrict the syscall to the CAP`_`SYS`_`PTRACE capability.

*kernel.kexec`_`load`_`disabled=1*

kexec is a system call that is used to boot another kernel during runtime. This functionality can be abused to load a malicious kernel and gain arbitrary code execution in kernel mode, so this sysctl disables it.

*kernel.sysrq=4*

The SysRq key exposes a lot of potentially dangerous debugging functionality to unprivileged users. Contrary to common assumptions, SysRq is not only an issue for physical attacks, as it can also be triggered remotely. The value of this sysctl makes it so that a user can only use the secure attention key, which will be necessary for accessing root securely. Alternatively, you can simply set the value to 0 to disable SysRq completely.
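The kernel.sysrq value is a bitmask, which is why 4 leaves only the keyboard control functions (including the secure attention key) enabled. Here is a small Python sketch that decodes a value into the functions it allows. The bit meanings are taken from the kernel's SysRq documentation, and the helper name `decode_sysrq` is hypothetical:

```python
# Bit meanings as listed in the kernel's SysRq documentation.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all real-time tasks",
}


def decode_sysrq(value: int) -> list[str]:
    """Return the SysRq function groups enabled by a kernel.sysrq value."""
    if value == 0:
        return []  # SysRq disabled completely
    if value == 1:
        return ["all functions"]  # 1 is a special case: everything enabled
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


if __name__ == "__main__":
    print(decode_sysrq(4))
```

Running the same function on the value you find in /proc/sys/kernel/sysrq shows what your distribution currently allows.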

*kernel.unprivileged`_`userns`_`clone=0*

User namespaces are a feature in the kernel which aims to improve sandboxing and make it easily accessible for users. However, this feature exposes significant kernel attack surface for privilege escalation, so this sysctl restricts the usage of user namespaces to the CAP`_`SYS`_`ADMIN capability. For unprivileged sandboxing, it is instead recommended to use a setuid binary with little attack surface to minimize the potential for privilege escalation. This topic is covered further in the sandboxing section.

Be aware, though, that this sysctl only exists on certain Linux distributions, as it requires a kernel patch. If your kernel does not include this patch, you can alternatively disable user namespaces completely (including for root) by setting user.max`_`user`_`namespaces=0.

*kernel.perf`_`event`_`paranoid=3*

Performance events add considerable kernel attack surface and have caused abundant vulnerabilities. This sysctl restricts all usage of performance events to the CAP`_`PERFMON capability (CAP`_`SYS`_`ADMIN on kernel versions prior to 5.8).

Be aware that this sysctl also requires a kernel patch that is only available on certain distributions. Otherwise, this setting is equivalent to kernel.perf`_`event`_`paranoid=2, which only restricts a subset of this functionality.
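To check whether settings like these are actually in effect, you can read them back from /proc/sys, where each dotted sysctl key maps to a file path. The following is a minimal Python sketch, not a complete auditing tool. The `WANTED` table is only a subset of the settings above, and the helper names `sysctl_path` and `audit` are made up for illustration:

```python
from pathlib import Path

# A subset of the settings discussed above; extend as needed.
WANTED = {
    "kernel.kptr_restrict": "2",
    "kernel.dmesg_restrict": "1",
    "net.core.bpf_jit_harden": "2",
    "dev.tty.ldisc_autoload": "0",
    "vm.unprivileged_userfaultfd": "0",
}


def sysctl_path(key: str, root: Path = Path("/proc/sys")) -> Path:
    """Map a dotted sysctl key to its file under /proc/sys."""
    return root.joinpath(*key.split("."))


def audit(wanted: dict[str, str], root: Path = Path("/proc/sys")) -> dict[str, str]:
    """Return {key: status}: 'ok', 'missing', 'unreadable' or 'is <value>'."""
    report = {}
    for key, expected in wanted.items():
        try:
            # Collapse whitespace so multi-value sysctls compare cleanly.
            actual = " ".join(sysctl_path(key, root).read_text().split())
        except FileNotFoundError:
            report[key] = "missing"  # e.g. the sysctl needs a kernel patch
            continue
        except OSError:
            report[key] = "unreadable"
            continue
        report[key] = "ok" if actual == expected else f"is {actual}"
    return report


if __name__ == "__main__":
    for key, status in audit(WANTED).items():
        print(f"{key}: {status}")
```

A status of "missing" is what you would expect for sysctls that only exist on patched kernels.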

Network Self-Protection

*net.ipv4.tcp`_`syncookies=1*

This helps protect against SYN flood attacks, which are a form of denial-of-service attack in which an attacker sends a large number of bogus SYN requests in an attempt to consume enough resources to make the system unresponsive to legitimate traffic.

*net.ipv4.tcp`_`rfc1337=1*

This protects against time-wait assassination by dropping RST packets for sockets in the time-wait state.

*net.ipv4.conf.all.rp`_`filter=1* *net.ipv4.conf.default.rp`_`filter=1*

These enable source validation of packets received from all interfaces of the machine. This protects against IP spoofing, in which an attacker sends a packet with a fraudulent IP address.

*net.ipv4.conf.all.accept`_`redirects=0* *net.ipv4.conf.default.accept`_`redirects=0* *net.ipv4.conf.all.secure`_`redirects=0* *net.ipv4.conf.default.secure`_`redirects=0* *net.ipv6.conf.all.accept`_`redirects=0* *net.ipv6.conf.default.accept`_`redirects=0* *net.ipv4.conf.all.send`_`redirects=0* *net.ipv4.conf.default.send`_`redirects=0*

These disable ICMP redirect acceptance and sending to prevent man-in-the-middle attacks and minimise information disclosure.

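*net.ipv4.icmp`_`echo`_`ignore`_`all=1*

This makes the system ignore all ICMP echo requests (pings), which makes the machine harder to enumerate on the network.

*net.ipv4.conf.all.accept`_`source`_`route=0* *net.ipv4.conf.default.accept`_`source`_`route=0* *net.ipv6.conf.all.accept`_`source`_`route=0* *net.ipv6.conf.default.accept`_`source`_`route=0*

Source routing is a mechanism that allows users to redirect network traffic. As this can be used to perform man-in-the-middle attacks in which the traffic is redirected for nefarious purposes, the above settings disable this functionality.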

*net.ipv6.conf.all.accept`_`ra=0* *net.ipv6.conf.default.accept`_`ra=0*

Malicious IPv6 router advertisements can result in a man-in-the-middle attack, so they should be disabled.

*net.ipv4.tcp`_`sack=0* *net.ipv4.tcp`_`dsack=0* *net.ipv4.tcp`_`fack=0*

This disables TCP SACK. SACK is commonly exploited and unnecessary in many circumstances, so it should be disabled if it is not required.
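Settings like these only last until reboot unless they are written to a file under /etc/sysctl.d/ and loaded, for example with `sysctl --system`. The sketch below renders a handful of the network settings above into that file format. The helper name `render_sysctl_conf` and the file name in the comment are hypothetical, and the table is a subset rather than the full list:

```python
# A few of the network settings from above; not the complete list.
NETWORK_SETTINGS = {
    "net.ipv4.tcp_syncookies": 1,
    "net.ipv4.tcp_rfc1337": 1,
    "net.ipv4.conf.all.accept_redirects": 0,
    "net.ipv4.conf.default.accept_redirects": 0,
    "net.ipv4.tcp_sack": 0,
}


def render_sysctl_conf(settings: dict[str, int]) -> str:
    """Render settings as 'key = value' lines, sorted for stable output."""
    lines = [f"{key} = {value}" for key, value in sorted(settings.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Redirect this output to a file such as
    # /etc/sysctl.d/99-hardening.conf (hypothetical name), then reload
    # with `sysctl --system`.
    print(render_sysctl_conf(NETWORK_SETTINGS), end="")
```

Keeping the settings in one table makes it easy to review what is applied and to drop entries, such as the SACK ones, on machines that need them.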

User Space Self-Protection

*kernel.yama.ptrace`_`scope=2*

Ptrace is a system call that allows a program to alter and inspect another running process, which allows attackers to trivially modify the memory of other running programs. This restricts usage of ptrace to only processes with the CAP`_`SYS`_`PTRACE capability. Alternatively, set the sysctl to 3 to disable ptrace entirely.

*vm.mmap`_`rnd`_`bits=32* *vm.mmap`_`rnd`_`compat`_`bits=16*

ASLR is a common exploit mitigation which randomizes the position of critical parts of a process in memory. This can make a wide variety of exploits harder to pull off, as they first require an information leak. The above settings increase the bits of entropy used for mmap ASLR, improving its effectiveness.

The values of these sysctls must be set in relation to the CPU architecture. The above values are compatible with x86, but other architectures may differ.
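To get a feel for what those bit counts mean, here is a rough back-of-the-envelope calculation in Python. It assumes 4 KiB pages and that the random offset is applied in whole pages, as on x86. The helper names are made up for illustration:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as is typical on x86


def aslr_positions(rnd_bits: int) -> int:
    """Number of distinct page-aligned offsets for a given bit count."""
    return 2 ** rnd_bits


def aslr_span_bytes(rnd_bits: int, page_size: int = PAGE_SIZE) -> int:
    """Size of the address range the random offset can cover."""
    return aslr_positions(rnd_bits) * page_size


if __name__ == "__main__":
    for name, bits in (("vm.mmap_rnd_bits", 32), ("vm.mmap_rnd_compat_bits", 16)):
        positions = aslr_positions(bits)
        span_mib = aslr_span_bytes(bits) // 2**20
        print(f"{name}={bits}: {positions} positions over {span_mib} MiB")
```

Each extra bit doubles the number of positions an attacker has to guess among, which is why raising these values makes brute-forcing the layout less practical.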

*fs.protected`_`symlinks=1* *fs.protected`_`hardlinks=1*

This only permits symlinks to be followed when outside of a world-writable sticky directory, when the owner of the symlink and follower match or when the directory owner matches the symlink’s owner. This also prevents hardlinks from being created by users that do not have read/write access to the source file. Both of these prevent many common TOCTOU races.

*fs.protected`_`fifos=2* *fs.protected`_`regular=2*

These prevent creating files in potentially attacker-controlled environments, such as world-writable directories, to make data spoofing attacks more difficult.

Boot Parameters

Boot parameters pass settings to the kernel at boot using your bootloader. Some settings can be used to increase security, similar to sysctl. Bootloaders often differ in how boot parameters are set. A few examples are listed below, but you should research the required steps for your specific bootloader.

If using GRUB as your bootloader, edit /etc/default/grub, and add your parameters to the GRUB`_`CMDLINE`_`LINUX`_`DEFAULT= line.
If using Syslinux, edit /boot/syslinux.cfg, and add them to the APPEND line.
If using systemd-boot, edit your loader entry, and append them to the end of the options line.

The following settings are recommended to increase security.

This section originally recommended to apply various slub`_`debug options; however, due to Linux deciding to implicitly disable kernel pointer hashing when using this option, in addition to several other issues with these features, they can no longer be recommended. Users are instead advised to use init`_`on`_`free as a replacement for memory poisoning and linux-hardened’s slab canaries in place of redzoning. If slub`_`debug is in use for anything other than debugging, it is highly recommended to remove it immediately.

Kernel Self-Protection

*slab`_`nomerge*

This disables slab merging, which significantly increases the difficulty of heap exploitation by preventing overwriting objects from merged caches and by making it harder to influence slab cache layout.

*init`_`on`_`alloc=1 init`_`on`_`free=1*

This enables zeroing of memory during allocation and free time, which can help mitigate use-after-free vulnerabilities and erase sensitive information in memory.

*page`_`alloc.shuffle=1*

This option randomizes page allocator freelists, improving security by making page allocations less predictable. This also improves performance.

*pti=on*

This enables Kernel Page Table Isolation, which mitigates Meltdown and prevents some KASLR bypasses.

*randomize`_`kstack`_`offset=on*

This option randomises the kernel stack offset on each syscall, which makes attacks that rely on deterministic kernel stack layout significantly more difficult, such as the exploitation of CVE-2019-18683.

*vsyscall=none*

This disables vsyscalls, as they are obsolete and have been replaced with vDSO. vsyscalls are also at fixed addresses in memory, making them a potential target for ROP attacks.

*debugfs=off*

This disables debugfs, which exposes a lot of sensitive information about the kernel.

*oops=panic*

Sometimes certain kernel exploits will cause what is known as an “oops”. This parameter will cause the kernel to panic on such oopses, thereby preventing those exploits. However, sometimes bad drivers cause harmless oopses which would result in your system crashing, meaning this boot parameter can only be used on certain hardware.

*module.sig`_`enforce=1*

This only allows kernel modules that have been signed with a valid key to be loaded, which increases security by making it much harder to load a malicious kernel module. This prevents all out-of-tree kernel modules, including DKMS modules, from being loaded unless you have signed them, meaning that modules such as the VirtualBox or Nvidia drivers may not be usable, although that may not be important, depending on your setup.

*lockdown=confidentiality*

The kernel lockdown LSM can eliminate many methods that user space code could abuse to escalate to kernel privileges and extract sensitive information. This LSM is necessary to implement a clear security boundary between user space and the kernel. The above option enables this feature in confidentiality mode, the strictest option. This implies module.sig`_`enforce=1.

*mce=0*

This causes the kernel to panic on uncorrectable errors in ECC memory which could be exploited. This is unnecessary for systems without ECC memory.

*quiet loglevel=0*

These parameters prevent information leaks during boot and must be used in combination with the kernel.printk sysctl documented above.
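After rebooting, the parameters the kernel actually received can be read from /proc/cmdline. As a hedged sketch, the following Python compares that line against some of the parameters above. The `REQUIRED` list is only a subset, `missing_params` is a hypothetical helper, and splitting on whitespace is a simplification that ignores quoted values:

```python
from pathlib import Path

# A subset of the boot parameters discussed above.
REQUIRED = [
    "slab_nomerge",
    "init_on_alloc=1",
    "init_on_free=1",
    "page_alloc.shuffle=1",
    "pti=on",
    "randomize_kstack_offset=on",
    "debugfs=off",
]


def missing_params(cmdline: str, required: list[str]) -> list[str]:
    """Return the required parameters that do not appear on the command line."""
    present = set(cmdline.split())  # simplification: no quoting support
    return [param for param in required if param not in present]


if __name__ == "__main__":
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        cmdline = ""  # not on Linux, or /proc is unavailable
    for param in missing_params(cmdline, REQUIRED):
        print(f"missing boot parameter: {param}")
```

If nothing is printed, every parameter in the list was passed to the running kernel.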

CPU Mitigations

It is best to enable all CPU mitigations that are applicable to your CPU to ensure that you are not affected by known vulnerabilities. This is a list that enables all built-in mitigations:

*spectre`_`v2=on spec`_`store`_`bypass`_`disable=on tsx=off tsx`_`async`_`abort=full,nosmt mds=full,nosmt l1tf=full,force nosmt=force kvm.nx`_`huge`_`pages=force*
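The kernel reports the state of each known CPU vulnerability under /sys/devices/system/cpu/vulnerabilities, with one file per issue containing text such as "Not affected", "Vulnerable" or "Mitigation: ...". A minimal Python sketch for reviewing them after applying the parameters above could look like this (the helper names are hypothetical):

```python
from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_vulnerabilities(root: Path = VULN_DIR) -> dict[str, str]:
    """Return {vulnerability name: status text} from the sysfs directory."""
    if not root.is_dir():
        return {}  # older kernel or non-Linux system
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(root.iterdir())
        if entry.is_file()
    }


def still_vulnerable(status: dict[str, str]) -> list[str]:
    """Names whose status text starts with 'Vulnerable'."""
    return [name for name, text in status.items() if text.startswith("Vulnerable")]


if __name__ == "__main__":
    status = read_vulnerabilities()
    for name, text in status.items():
        print(f"{name}: {text}")
    for name in still_vulnerable(status):
        print(f"WARNING: {name} is not mitigated")
```

Anything still listed as vulnerable is worth checking against your CPU model and microcode version.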