When using Proxmox in my homelab, one of the parameters that most affects my IOPS figures is the disk cache mode. In this article I will try to explain how Proxmox’s disk cache works, which modes it offers, and how to choose the best one for each scenario. But first…
What are IOPS?
IOPS (Input/Output Operations Per Second, pronounced i-ops) is a common benchmark performance measure for computer storage devices.
It therefore gives us an idea of how many operations the drive can serve per second. The higher this number, the better the disk can keep up with the needs of our homelab.
So the higher our IOPS number, the better our storage will respond. However, it depends not only on the type of storage we use (HDD vs SSD), but also on other factors such as network bandwidth (when using a SAN), the hardware used to connect the disks, the storage controllers, OS background operations, and the system configuration. This last one is what we are going to deal with today.
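To make the definition concrete, here is a minimal sketch of the arithmetic behind an IOPS figure. The function names and the benchmark numbers are hypothetical, chosen only for illustration. The throughput helper assumes every operation uses the same block size, which real workloads rarely do.

```python
def iops(operations: int, seconds: float) -> float:
    """Average number of I/O operations completed per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return operations / seconds


def throughput_mib_per_s(iops_value: float, block_size_bytes: int) -> float:
    """Approximate throughput in MiB/s, assuming a fixed block size."""
    return iops_value * block_size_bytes / (1024 * 1024)


# Hypothetical benchmark: 12,000 operations of 4 KiB completed in 60 seconds.
measured = iops(12_000, 60)
print(measured)                              # 200.0
print(throughput_mib_per_s(measured, 4096))  # 0.78125
```

This also shows why an IOPS number means little without the block size: 200 IOPS at 4 KiB is well under 1 MiB/s, while the same 200 IOPS at 1 MiB blocks would be 200 MiB/s.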
Disk cache in Proxmox
In the Proxmox wiki we can find a section dedicated to disk caching. Although a bit chaotic, it gives us the keys to know which modes are available to us and what advantages each one gives us. I summarize it here:
cache = none
The host page cache is bypassed and I/O happens directly between QEMU-KVM and the actual storage device.
cache = writethrough
This mode causes QEMU-KVM to interact with the disk image file or block device in a way that writes are reported as completed ONLY when the data has been committed to the storage device.
cache = writeback
This mode causes QEMU-KVM to interact with the disk image file or block device in a way that the host page cache is used and writes are reported to the guest as completed when placed in the host page cache, BEFORE they are committed to the storage device.
cache = directsync
This mode causes QEMU-KVM to interact with the disk image file or block device in a way that writes are reported as completed ONLY when the data has been committed to the storage device, AND the host page cache is bypassed as well.
cache = unsafe
This mode is similar to the cache = writeback mode discussed above.
The key aspect of this “unsafe” mode is that all flush commands from the guests are ignored. Using this mode implies that the user has accepted the trade-off of performance over risk of data loss in the event of a host failure.
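The five modes described above can be condensed into three yes/no properties. The sketch below is my own summary, not something taken from the Proxmox wiki: the class and field names are hypothetical, and the flag values follow QEMU's documented `cache.direct`, `cache.writeback` and `cache.no-flush` settings for each mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheMode:
    bypasses_host_page_cache: bool  # corresponds to QEMU cache.direct
    guest_sees_write_cache: bool    # corresponds to QEMU cache.writeback
    ignores_guest_flushes: bool     # corresponds to QEMU cache.no-flush


CACHE_MODES = {
    "none":         CacheMode(True,  True,  False),
    "writethrough": CacheMode(False, False, False),
    "writeback":    CacheMode(False, True,  False),
    "directsync":   CacheMode(True,  False, False),
    "unsafe":       CacheMode(False, True,  True),
}


def describe(name: str) -> str:
    """Return a one-line, human-readable summary of a cache mode."""
    mode = CACHE_MODES[name]
    host = "bypasses" if mode.bypasses_host_page_cache else "uses"
    flush = "ignored" if mode.ignores_guest_flushes else "honored"
    return f"cache={name}: {host} the host page cache, guest flushes are {flush}"


for mode_name in CACHE_MODES:
    print(describe(mode_name))
```

Read this way, unsafe is simply writeback with guest flushes ignored, and directsync is writethrough with the host page cache bypassed.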
So… which one is better for me?
As a general rule, high-end systems typically perform best with cache = none, because of the reduced data copying that occurs. But looking at safety, these are the conclusions:
cache = writethrough, cache = none, cache = directsync
These are the safest modes, and they are considered equally safe provided that the guest operating system is “modern and well behaved”, which means that it issues flushes as needed.
cache = writeback
This mode relies on the guest to send flush commands as needed to maintain data integrity within its disk image.
It should be noted that because there is a window between the moment a write is reported as completed and the moment it is committed to the storage device, this mode exposes the guest to data loss in the unlikely event of a host failure.
cache = unsafe
This mode is similar to writeback caching, except that guest flush commands are ignored, resulting in a higher risk of data loss due to host failure.
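Once a mode is chosen, it has to be applied to the virtual disk. As a rough sketch, the helper below only builds the argument list for Proxmox's `qm set` command and does not run anything. The helper name, the VM ID 100, the `scsi0` slot and the volume `local-lvm:vm-100-disk-0` are all hypothetical placeholders, so check your own VM configuration before using anything like it.

```python
from typing import List

VALID_CACHE_MODES = ("none", "writethrough", "writeback", "directsync", "unsafe")


def build_qm_set_args(vmid: int, disk_slot: str, volume: str, cache: str) -> List[str]:
    """Build (but do not execute) a `qm set` command that sets a disk's cache mode."""
    if cache not in VALID_CACHE_MODES:
        raise ValueError(f"unknown cache mode: {cache!r}")
    # The disk option is a comma-separated property string: <volume>,cache=<mode>
    return ["qm", "set", str(vmid), f"--{disk_slot}", f"{volume},cache={cache}"]


# Hypothetical VM 100 with its first SCSI disk on a storage called local-lvm.
args = build_qm_set_args(100, "scsi0", "local-lvm:vm-100-disk-0", "writeback")
print(" ".join(args))
# qm set 100 --scsi0 local-lvm:vm-100-disk-0,cache=writeback
```

As far as I know, the disk line is replaced as a whole, so any other options already set on that disk (for example discard or ssd) would need to be repeated in the same property string. The same setting is also available in the web interface when editing the hard disk.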
Some interesting resources on the subject: