SQL Server Virtual Machine Storage Design
We have now covered how to optimize storage performance for SQL Server and Windows at the operating system level. Now we will look at how to optimize storage performance with your virtual machine template and discuss the different configuration options available to you. In this section, we cover different virtual machine hardware versions, virtual IO controllers, types of virtual disk, and how to size and deploy your virtual disks onto your storage devices. In this section, we start to look further at IO device queues and how they impact virtual machine storage performance.
Virtual Machine Hardware Version
The virtual machine hardware version you choose will impact the type of virtual storage IO controllers available to your virtual machine. The type of virtual storage IO controller you choose will have an impact on your storage performance, as we will cover in the next section. Table 6.3 shows the different storage IO controller options based on the different virtual hardware versions.
Table 6.3 Supported Virtual Machine Storage IO Controllers
|Feature||ESXi 5.5 and Later||ESXi 5.1 and Later||ESXi 5.0 and Later||ESXi 4.x and Later||ESXi 3.5 and Later|
|Maximum SCSI Adapters||4||4||4||4||4|
|Supported SCSI Adapters||BusLogicLSI Parallel
|SATA (AHCI) Controllers||4||N||N||N||N|
Each virtual SCSI controller (vSCSI) allows up to 15 disks to be connected, for a total of 60 vSCSI disks per virtual machine. With hardware version 10 in ESXi 5.5, VMware has introduced a new SATA (AHCI) controller. Each SATA controller allows up to 30 disks to be connected, for a total of 120 vSATA disks per virtual machine. vSCSI and vSATA can be combined on the same virtual machine for a maximum of 180 virtual disks per VM.
BusLogic and LSI Parallel are legacy controllers not suitable for SQL Server 2012 virtual machines. If you are virtualizing on ESXi 5.5 and using virtual hardware version 10, SATA may have some use if you have a need for a particularly large number of virtual disks per VM. However, for almost all cases, you will choose either LSI Logic SAS or VMware PVSCI (Paravirtualized SCSI). The reason why will become clear as we look in more detail at each of these controllers.
Choosing the Right Virtual Storage Controller
A virtual storage controller is very much like a physical storage controller in terms of how Windows interacts with it. Choosing the right virtual storage controller can have a big impact on your performance. In this section, we cover the different controller options and the performance characteristics of each type. This will help you make the right virtual storage controller choice based on your requirements and constraints.
Remember back to Principle 1: Your database is just an extension of your storage? Our goal is to maximize the parallelism of IOs from SQL Server and Windows through to the underlying storage devices. We don’t want IOs to get held up unnecessarily in Windows itself. However, care needs to be taken because we don’t want to issue so many IOs that the backend storage devices get overloaded and cause additional latency. Overloading your backend storage will not just impact SQL Server, but could also impact all of your other VMs. We will discuss later how you can use features of VMware vSphere to ensure quality of service to your critical databases and other virtual machines to protect them from the impact of noisy neighbors.
Table 6.4 outlines some of the key attributes of the storage controller options for your SQL Server VMs.
Table 6.4 Supported Virtual Machine Storage IO Controllers
|Feature||LSI Logic SAS||VMware PVSCSI||SATA AHCI*|
|Maximum Disks per Controller||15||15||30|
|Default Adapter Queue Depth||128||256||N/A|
|Maximum Adapter Queue Depth||128||1,024||N/A|
|Default Virtual Disk Queue Depth||32||64||32|
|Maximum Virtual Disk Queue Depth||32||256||32|
|AlwaysOn Failover Cluster Instance Supported||Y||N||N|
|AlwaysOn Availability Group Supported||Y||Y||Y|
VMware’s Paravirtualized SCSI controller (PVSCSI) is a high-performance SCSI adapter that allows the lowest possible latency and highest throughput with the lowest CPU overhead. In VMware vSphere 5.x, PVSCSI is the best choice, even if you don’t expect your database to be issuing large amounts of outstanding IO operations. However, like SATA, PVSCSI can’t be used with SQL Server AlwaysOn Failover Cluster Instances, which leverage shared disk clustering. When you are using AlwaysOn Failover Cluster Instances, your only option is LSI Logic SAS.
Choosing a virtual storage controller with a higher queue depth will allow SQL Server to issue more IOs concurrently through Windows and to the underlying storage devices (virtual disks). By having more virtual disks (more drives or mount points), you increase the amount of queues that SQL Server has access to. Balancing the number of data files to drive letters, to virtual disks, and to adapters allows you to maximize the IO efficiency of your database. This will reduce IO bottlenecks and lower latency.
Not all virtual disks will issue enough IOs to fill all of the available queue slots all of the time. This is why the adapter queue depths are lower than the aggregate total number of queues per device multiplied by the total number of devices per adapter. PVSCSI, for example, has 15 virtual disks, and each disk has a queue depth of 64 by default. The number of devices multiplied by their queue depth would be 960, even though the adapter default queue depth is only 256.
In most cases, the default queue depths are sufficient for even very high performance SQL Server systems—especially when you are able to add up to four vSCSI adapters and increase the number of virtual disks per adapter. With LSI Logic SAS, you have a maximum of 32 queue slots per disk and a maximum of 128 queue slots per adapter. Neither can be changed. In this case, your only option to scale IO concurrency is by adding virtual controllers and adding virtual disks. This is a key consideration when considering AlwaysOn Failover Cluster Instances, where LSI Logic SAS is the only vSCSI adapter option.
With PVSCSI, you can modify the disk queue depth and the adapter queue depth from their default settings. This is only required in very rare cases where your database needs to issue very large amounts of IO in parallel (>1,000). To keep things standardized and simple, we recommend leaving the default settings in your templates and only modify them if absolutely necessary. This assumes your underlying disk subsystems can support the parallelism required at low-enough latency.
Figure 6.11 shows an example of the registry entries configured to increase the maximum adapter and virtual disk queue depths for a VMware PVSCSI adapter, as documented in VMware KB 2053145.
If you choose not adjust the queue depth or are unable to adjust the queue depth of a particular storage device or adapter, Windows will queue any additional IOs. Windows will hold up to 255 IOs per device before issuing them to the adapter driver, regardless of the devices underlying queue depth. By holding the IOs in the Windows OS before issuing them to the adapter driver and the underlying storage, you will see increased IO latency. To learn more about the Windows storage driver architecture (storport), we recommend you read the article “Using Storage Drivers for Storage Subsystem Performance” at Windows Dev Center [http://msdn.microsoft.com/en-us/library/windows/hardware/dn567641 and http://msdn.microsoft.com/en-us/library/windows/hardware/ff567565(v=vs.85).aspx].
Figure 6.12 shows the difference in IOPS and latency between PVSCSI, LSI Logic SAS, and SATA AHCI. These tests were conducted using a single drive at a time on a single VM. The VM was configured with two vCPUs and 8GB RAM. Each virtual disk was placed on the same VMFS5 data store on top of a Fusion-io ioDrive2 1.2TB PCIe flash card. IOMeter was used to drive the IO load and measure the results.
As you can see from the graph in Figure 6.12 (published at http://longwhiteclouds.com/2014/01/13/vmware-vsphere-5-5-virtual-storage-adapter-performance/), both SATA and LSI Logic SAS have no significant performance advantage going from 32 outstanding IO operations (OIO) to 64 due to their maximum device queue depth being 32. PVSCSI, however, sees a 15% improvement in IOPS between 32 OIOs and 64, based on a single Fusion-io ioDrive2 card as the underlying storage. A storage array of multiple backend devices will potentially show a much greater improvement when queue depth is increased. This assumes the HBAs and storage processors are configured to accept a higher queue depth and not overloaded.
Table 6.5 displays the IOMeter performance results for each virtual storage adapter, including throughput and CPU utilization for the 8KB IO size. The IO pattern used was 100% random read, with a single worker thread and single virtual disk from the test virtual machine. As you can see from the results, PVSCSI shows significantly better IO performance at lower latency and lower CPU utilization compared to the other adapter types.
Table 6.5 Virtual Storage Adapter Performance (32 OIOs)
|Virtual Storage Adapter||IOPS||Avg. Latency (ms)||Throughput (MB/s)||CPU Utilization|
|LSI Logic SAS||45212.96||0.71||353.23||45%|
Table 6.6 displays the IOMeter performance results of increasing the outstanding IOs from 32 to 64 to issue more IOs in parallel using otherwise similar test parameters. As was the case with the previous test, PVSCSI shows significantly improved performance compared to the other adapter types.
Table 6.6 Virtual Storage Adapter Performance (64 OIOs)
|Virtual Storage Adapter||IOPS||Avg. Latency (ms)||Throughput (MB/s)||CPU Utilization|
|LSI Logic SAS||44933.46||1.42||351.04||43%|
This test also shows that a single virtual machine on a vSphere 5.5 host with a single virtual disk can provide good performance and throughput, provided the underlying storage system can support it. Using PVSCSI with vSphere 5.1 and above, a single virtual machine can support up to one million IOPS at 4KB IO size (see http://blogs.vmware.com/performance/2012/08/1millioniops-on-1vm.html).
Choosing the Right Virtual Disk Device
You have a small number of options when choosing a virtual disk type for SQL Server, and the choice you make may have an impact on performance. However, modern storage systems and advancements in hypervisor technology have equalized the performance aspects of different virtual disk types in a lot of cases. Today, the type of storage you’re using and your availability design will largely drive your virtual disk selection. Table 6.7 outlines the different virtual disk options.
Table 6.7 Virtual Disk Types
|Virtual Disk||Zero on First Write||Size Limit||SQL Failover Cluster Instance Across Boxes Supported|
|Thin||Y||2TB–512B62TB (ESXi 5.5)||N|
|Thick – Lazy Zero||Y||N|
|Thick – Eager Zero||N||N*|
|Virtual Raw Device Map (vRDM)||N||N|
|Physical Raw Device Map (pRDM)||N||64TB||Y|
|PassThrough PCI or VM DirectPath IO||N||Device Limit||Y**|
Thin Versus Thick Lazy Zero for SQL Server
The major difference between Thin and Thick Lazy Zero disks is that Thin disks are not preallocated and start small and grow on demand, whereas Thick Lazy Zero disks are preallocated. The unit of growth of a Thin disk is the VMFS block size, which is usually 1MB for VMFS5, unless the data store was upgraded form VMFS3. On a very active VMFS data store, there is the possibility that as the Thin disk grows, the blocks allocated will not be contiguous, but in most cases, this will not negatively impact performance.
There is a myth that the performance of Thick Provisioned disks, even if using Lazy Zero, is much better than a Thin Provisioned virtual disk (VMDK). This is not the case. Thin Provisioned disks and Thick Provisioned Lazy Zero disks have similar performance characteristics. This is because each time a block of data is initially written to either a Thin or Thick Lazy Zero disk, the block must first be zeroed out. This magnifies the write IO impact of blocks that have not previously been written because two write IOs will be issued by the ESXi host for each block. This may have a noticeable negative impact on write IO latency, depending on your underlying storage.
The reason to choose between Thin or Thick Lazy Zero therefore has little to do with performance and more to do with manageability and efficiency of storage consumption. There are tradeoffs to each choice. Your choice needs to be based on your requirements and circumstances.
Using thin provisioning may allow for higher utilization of storage capacity as each VMDK, data store, and underlying storage device will have a percentage of free space unused as a safety buffer. At the same time, it will add additional management overheads to the environment because administrators have to ensure they do not excessively over-provision real storage resources in terms of both capacity and performance. You need to be aware of possible growth storms and keep on top of your storage demands.
The capacity savings from thin provisioning may well be enough to justify the management overheads because you are able to purchase more on demand instead of up front, and this could save a considerable amount of money. But you need to make sure you can get the performance you need from the capacity that has been provisioned and is used. Sizing for performance may necessitate much more capacity is provisioned on the backend storage devices and therefore diminishes any savings that may have been had when saving capacity through thin provisioning.
If you don’t use Instant File Initialization, then SQL Server will zero out its data files whenever they are created or extended. This will ensure you get optimal performance from the data files regardless of the underlying virtual disk format. But this comes at the cost of the time taken to zero out the file and the resulting impact in terms of storage IO to the underlying storage. As previously discussed, using Instant File Initialization allows SQL Server to act as part of Windows and not write a zero to a block before data is written to it. In certain cases, there could be substantial storage efficiencies (IO Performance and Capacity) by combining the use of Instant File Initialization, thin provisioning, and SQL Server compression. This may be especially advantageous to development and test environments. There can be a significant performance penalty if you use a non-VAAI array without using SQL Instant File Initialization on Thin and Thick Lazy Zero disks. VAAI allows the zeroing operation to be offloaded to the array and performed more efficiently, thus saving vSphere resources for executing VMs. If you use Thin Provisioned or Lazy Thick VMDKs without a VAAI-compatible array, the entire zeroing operation has to be handled solely by vSphere.
If your SQL Server and environment meets the following requirements, you may want to consider using Thin Provisioned VMDKs with Instant File Initialization and SQL Data Compression:
- The SQL Server workload will be largely read biased.
- Performance from your storage during times that blocks are initially written to and zeroed out is sufficient to meet your database SLAs.
- Performance is sufficient from the capacity required when thin provisioning is used.
- You are not planning to use Transparent Data Encryption.
- You wish to minimize the amount of storage provisioned up front and only purchase and provision storage on demand.
When you are using Thick Provisioning Lazy Zero (the default), the VMDK’s space is allocated up front by vSphere, although like with thin provisioning, it is not zeroed out until it’s written to for the first time (or you select full format in Windows when partitioning the disks). When you look at the data store, you may get a more accurate view of free space and there may be less variance between provisioned space and usage. The reason we say you may get a more accurate view of free space is that many modern arrays will tell vSphere the storage is allocated or consumed but won’t actually do so until data is written to it, although it most likely will be reserved.
If you were considering using Thin or Thick Lazy Zero VMDKs for SQL Server, we would recommend you choose the default of Thick Lazy Zero to minimize management overheads. We would recommend using Thin where there are requirements that would benefit from it and justify the management overheads. However, before you decide on Thick Lazy Zero, you should consider Thick Eager Zero, which we cover next.
Using Thick Eager Zero Disks for SQL Server
The major difference between Thick Eager Zero and Thick Lazy Zero or thin provisioning is when the blocks on the VMDK are zeroed out. As we’ve covered with Lazy Zero and Thin VMDKs, blocks are zeroed on first write. With Eager Zero, the blocks are zeroed when the VMDK is created as part of the provisioning process. This means all blocks are pre-zeroed before Windows or SQL goes to access them. By doing this, you are eliminating a first write penalty in the situations where that would occur. This ensures there is no double write IO required to the VMDK after it is provisioned.
As you can imagine, it can take quite a bit longer to provision Thick Eager Zeroed disks. Additionally, provisioning and zeroing out the blocks may impact the performance of other VMs when using shared storage devices. The impact to your environment will be dependent upon the type and utilization of your backend storage devices. Some storage arrays will just throw away or ignore the zeros, and in these cases, the provisioning operations will complete relatively quickly and have minimal impact on performance.
In aggregate, over the life of a VMDK there is normally little difference in the amount of IO generated when using Thin, Thick Lazy Zero, or Thick Eager Zero VMDKs. The difference is all about the timing of when the IO is generated, either up front (in the case of Eager Zero) or on demand (first write) with Thick Lazy Zero and Thin. Once a block has been written to with Thick Lazy Zero or Thin, it has exactly the same performance characteristics as if it were Eager Zeroed. However, with Eager Zero, even if a block is never used, you have zeroed it out at the cost of a write IO operation.
If you are proactively managing SQL Server data and transaction log files, and not using Instant File Initialization, then the performance of your virtual machine will be the same regardless of the virtual disk type you select. This is because SQL Server is zeroing out the blocks first before they are used. If you enable IFI, then Eager Zero will give better performance in terms of lower latency compared to Thick Lazy Zero or Thin, but only when the block is first written to. All subsequent writes or access to the same block will have exactly the same performance characteristics.
Although the aggregate amount of IO may be similar between the different virtual disk options, Eager Zero generally provides the more predictable response times because IOs will not be impacted by the additional write operation when data is written to a new block. This predictability of IO response and generally lower latency is why Eager Zero is required for the non-shared disks of a SQL Server Failover Cluster Instance. Increased latency or poor IO performance can cause unnecessary cluster failovers between nodes.
With the advent of VMware’s VAAI and modern arrays that support it, the impact to the environment of zeroing operations is reduced and therefore the performance impact of using Eager Zero Thick disks is also reduced during initial provisioning. If you were previously thinking of using Thick Lazy Zero VMDKs and you have a VAAI-capable array that supports the Write Same primitive, we would recommend you use Thick Eager Zero instead. This provides lower management overheads and optimal performance. Regardless of whether you are using IFI or not, and in spite of the possible overhead of having written zeros to a block that may not be used, we feel this is justified for the decreased latency and increased predictability of IO responses that are provided to SQL Server. This is especially important for business-critical production databases. It is fine to use Thin or Thick Lazy Zero for your Windows OS disk, while using Eager Zero Thick for your database drives (data files, Temp DB, and transaction logs). When using SQL AlwaysOn Failover Cluster Instance, it is recommended that you configure Windows OS disks as Eager Zero Thick; shared LUNs will in this case be configured as physical RDMs.
Figure 6.13 shows a sample configuration of a virtual disk with the selection of Thick Provision Eager Zeroed.
Using Raw Device Maps for SQL Server
A Raw Device Map (RDM), as the name suggests, is a direct mapping of a physical LUN to a virtual machine. The main reason to choose RDM is SQL Server Failover Cluster Instances (FCI). SQL FCI uses Windows Failover Clustering (previously known as Microsoft Cluster Services), shared SCSI bus, shared disks between nodes, and requires persistent SCSI reservations. To allow the persistent SCSI reservations and the cluster to function correctly, Physical Mode or Pass-through RDM (pRDM) are required. Another reason to consider using RDMs for SQL Server is if you are leveraging physical SAN capabilities such as snapshots that you wish to present to a physical server for a purpose such as LAN-free backups, if you are not using a backup solution integrated with the VMware APIs for Data Protection. However, there are no noticeable performance differences between RDMs and virtual disks on a VMFS file system, as Figure 6.14 illustrates.
Figure 6.14 illustrates the performance comparison between VMFS and RDM using a random 50/50 mixed read-write workload pattern and the different IO sizes based on data published at http://www.vmware.com/files/pdf/performance_char_vmfs_rdm.pdf.
Although there are no noticeable performance differences between a single VMDK on a VMFS data store and an RDM, there are important performance considerations and constraints with using RDMs that need to be considered, such as:
- An RDM maps a single LUN to a virtual machine, so each VM will likely consume multiple LUNs and there will be more LUNs to manage.
- More LUNs are required, which may constrain the number of VMs possible as the maximum number of LUNs per host is currently 256.
- It is not possible to perform storage IO quality of service on a pRDM; therefore, a VM configured to use a pRDM could negatively impact the performance of other VMs using the same underlying shared storage array.
- Can’t leverage vSphere features such as Storage vMotion, so it can be more difficult to balance capacity and performance when using pRDMs and more difficult to resolve any storage hot spots.
Due to the management overheads, constraints, and VMware feature limitations of using RDMs, we recommend their use only when absolutely necessary, such as to deploy SQL FCI; in all other cases, VMDKs should be used. Using VMDKs future proofs your environment and allows it to benefit from any further advancements that VMware releases that pertain to VMDKs.
The IO Blender Effect
When you virtualize SQL and consolidate many SQL VMs onto fewer hosts, the amount of IO per host increases. In addition to the increase in IO per host, in most cases the IO patterns will also change. Unless you have completely dedicated storage for each SQL VM, which is not optimal from a cost or performance perspective in most cases, all IO will at some point become random.
Any time you share storage and have multiple VMs and different IO workloads, the combined IO pattern is random. Random IO, especially when write biased, can have a significant impact on storage performance, particularly when RAID (Redundant Array of Inexpensive or Independent Disks) is used. Grouping similar workloads together can help improve the blended IO pattern and reduce the burden on storage. Figure 6.15 shows the impact of combining different IO workload patterns.
This is an important concept to understand because you will need to size your storage to be able to handle the required number of IOPS with a completely random IO pattern. Random IO has a higher overhead than sequential IO in most cases, with the exception of some flash-based storage systems. Subsequent sections of this chapter will discuss IO workload performance impacts of different physical storage systems in more detail.
SQL Virtual Machine Storage Layout
Now that we have covered the various storage IO controller and virtual disk device choices, we can put it all together and discuss the design of a logical virtual machine storage layout. This layout, in turn, supports our SQL Server and Windows design and will drive the design of our underlying storage devices. We want to take our five key principles and apply these so our virtual machine storage layout meets the requirements of our database workloads in the simplest way possible, without compromising SLAs.
The example in Figure 6.16 shows a simple storage layout for a SQL Server VM that has all of its VMDKs supported by a single underlying data store. You could also have a number of SQL Server VMs and their VMDKs on the same data store. For development and test VMs, and where SQL FCI is not used, this may be a suitable design choice. It would also be suitable for the storage of your SQL Server template VM. However, it is unlikely to be a suitable choice for high-performance business-critical production SQL Server databases. The Windows C: drive, application binaries, and page file may be on the same data store or hosted on another data store.
The performance of SQL in this example will be limited to the performance of a single data store, and it will have access to the queue depth of a single data store, even though the individual virtual disks may be trying to issue many IOs in parallel. This example is the simplest from a management perspective, though, because there is only a single data store to manage. This sample layout assumes that not all of the virtual disks will be issuing IOs at the same time and that the aggregate amount of outstanding IOs will not exceed the available queue depth of the data store. If the available queue depth of the data store and the underlying storage device is exceeded, the result will be additional IO latency in the hypervisor and slower response times for your database. Another impact of this choice is that all IOs from SQL will be blended together and become completely random, as we show in the “IO Blender Effect.”
The example in Figure 6.17 shows two VMDKs per data store. This layout may be suitable for production SQL databases, provided the underlying data store could support the performance requirements of the VMDKs. This assumes that the data store has sufficient queue depth for the peak number of outstanding or parallel IOs from the VMDKs; otherwise, additional latency will result and response times will be degraded. SQL will benefit from the combined IO performance of multiple data stores and the queue depth available from multiple data stores to allow many IOs to be serviced in parallel.
This sample layout provides a balance between manageability and performance, because there are a relatively small number of data stores to manage per SQL VM, less than would be required for a physical SQL system or where RDMs are used. This is quite a common layout for SQL systems that have reasonable but not extreme performance requirements. The data store that holds the transaction logs would blend the otherwise sequential IO patterns and make them random. If this was a concern, the transaction log VMDKs could be separated onto their own data stores.
The example in Figure 6.18 shows each VMDK mapped to a dedicated data store. This layout is suitable for SQL systems that need extreme IO performance and scalability. It allows IO to be spread across more storage devices, and each VMDK has access to the maximum possible amount of parallel IO. The increased number of data stores and therefore LUNs will limit the total number of VMs that can be supported per host. You will have many more data stores to manage per VM, which will increase your management overheads.
If each SQL VM has 10 data stores, you could be limited to just 25 VMs per cluster, as each data store should be zoned to each host in the cluster to allow VMware HA and DRS to function correctly. It is likely that if you need this layout for storage performance, your SQL VMs will also have very large compute footprints in terms of memory and CPU. However, if this is required to meet your performance requirements, you may find that you need to design for a smaller number of hosts per cluster, and potentially have more clusters. This layout assumes that each VMDK will use the full queue depth of each data store, which is often not the case. You may find that you need to reduce the queue depth per LUN to avoid overloading your backend storage ports, which defeats the purpose of having more LUNs in the first place.
Often the need for extreme performance is driven by many database instances or schemas running on a single VM, and in these cases it may be a better design choice to split up those instances into multiple VMs. Because VMDKs (not RDMs) are used, it is possible to start with the example in Figure 6.19 and increase the number of data stores if required at a later time. You can migrate the VMDKs without any downtime by using VMware Storage vMotion.
Up until now we have provided examples where the storage is dedicated to each SQL Server. This is a very traditional approach to SQL storage architecture. When you have a very good baseline and understanding of your inventory and workload characteristics, it is a good approach, but it has a couple of potential drawbacks. The first drawback is manageability. You must have a number of data stores supporting each VM, which produces more data stores to manage, and may not balance performance and capacity efficiently between many SQL Server VMs. You may end up with many different data store sizes for each of the different databases, which provides little opportunity for standardization. This may be more of a problem in a smaller environment because there may be fewer SQL VMs of similar size; in large-scale environments (hundreds of SQL VMs), this is generally less of a problem.
The next potential drawback is that although you may have isolated the storage logically to each VM, if you share the same storage under the covers, each VM could impact the performance of the others. When a single VM is using a storage device, you can’t make use of VMware vSphere features such as Storage IO Control (SIOC) to ensure quality of service and fair IO performance between different VMs. This may place an additional burden on storage administrators to try and isolate performance at the physical storage level, which can often lead to limited and suboptimal overall performance.
Finally, the isolation approach doesn’t lend itself easily to automation and policy-based administration. It is also not possible to dedicate storage devices to SQL Server VMs in this manner in most Cloud or Infrastructure as a Service environments. To make automation and policy-based administration possible, you need standardization and you need to share multiple data stores among many VMs. This then allows you to leverage the features of VMware vSphere to ensure quality of service and fairness of IO performance between the many SQL VMs if there is any contention.
The example in Figure 6.19 shows two SQL Server VMs sharing the same data stores for the different types of Windows OS and SQL disks. This layout allows the SQL VM’s performance to be balanced with a standardized data store size and allows for easier automation and policy-drive provisioning and load balancing. Because the data stores are shared, VMware Storage IO Control can ensure fairness of IO and quality of service for IO performance between the multiple SQL VMs.
SQL Failover Cluster Instance Storage Layout
In this section we have shown how you can efficiently lay out your virtual machine storage for SQL and use fewer LUNs than you have VMDKs, while balancing performance requirements. This is possible when using standalone instances or when using AlwaysOn Availability Groups. However, when using SQL AlwaysOn Failover Cluster Instances, you must use pRDMs and therefore bypass the VMFS data store and the ability to share LUNs, as Figure 6.20illustrates.
For this reason and for reduced management overheads and complexity, we recommend the use of AlwaysOn Availability Groups over Failover Cluster Instances where possible.
Expanding SQL Virtual Machine Storage
When designing your virtual machine storage architecture for SQL Server, you need to consider how the allocated storage will be expanded in the future as your databases grow. We previously discussed in “SQL Server File System Layout” the ability to expand partitions and virtual disks online without disruption. This is one way of expanding the storage available to SQL Server. An alternative approach would be to hot-add additional virtual disks to SQL Server and then balance the data files across the additional disks.
If you hot-add new disks and need to create new data files, SQL Server will stripe the data to the newly created data files as they have the more free space. For this reason, we recommend you add more than one virtual disk and data file to try and spread the IO load. This will help avoid creating hot spots. The number of VMDKs and data files you need to create will depend on your SQL workload profile.
Jumbo VMDK Implications for SQL Server
vSphere 5.5 introduced the ability to provision 62TB Jumbo VMDKs and Virtual Mode RDMs (vRDM) with a VM. Physical Mode RDMs (pRDM) are capable of being provisioned up to 64TB, as of vSphere 5.0. The VMware maximum VMFS data store size is 64TB, as it was in vSphere 5.0. This allows truly massive storage footprints to a single VM.
With Virtual Hardware Version 10, we now have the ability to provision a single VM with maximum storage capacity (see Table 6.8).
Table 6.8 Maximum Virtual Machine Storage
|Virtual Disk Controller||Max # Controllers||VMDK/ Controller||Total VMDKs||Capacity VMDK/vRDM||Capacity pRDM|
|Totals||8||180||11,160TB (11PB)||11,520TB (11.5PB)|
Just because the size of the virtual disk increases doesn’t mean the performance of the virtual disk increases. With each virtual disk, the queue depth is still the same regardless of the size. This limits the parallelism of IOs to a single virtual disk, and it will also limit the throughput unless SQL is issuing incredibly large IO sizes. For this reason, the maximum capacity is largely theoretical because you would not be able to get the necessary performance.
Although having lots of 62TB virtual disks is unrealistic, having a few virtual disks > 2TB is possible and potentially desirable for large SQL Servers. You can use a single virtual disk for your transaction logs (max 2TB per transaction log file), and you would be able to use a single virtual disk for your backup drive. Both transaction logs and backups are sequential in nature and could benefit from the capacity of a larger > 2TB VMDK without the performance drawbacks that would be likely for data files. Your underlying storage platform would need to support a VMFS data store of a LUN size big enough to support all of these large VMDKs. You should also consider your restore times when using large VMDKs. If you can’t restore a large VMDK within your SLAs, it is not a good choice. Just because you can use Jumbo VMDKs doesn’t mean you always should.
VMFS Heap Size Considerations with Monster VMs and Jumbo VMDKs
ESXi 4.x and 5.x prior to 5.5 used a VMFS Heap value to control how much memory was consumed to manage the VMFS file system and for open or active VMDK capacity on a single ESXi host. This limit was not documented in the vSphere Maximum’s product document, and by default with a 1MB block size on ESXi 5.0 GA, it would limit a host to being able to open 8TB of total VMDKs before errors could occur. The maximum on ESXi 5.0 GA was 25TB with a 1MB block size, which required adjusting the advanced parameter VMFS3.MaxHeapSizeMB. This was later increased to 60TB by default on ESXi 5.0 by applying the latest patches and in ESXi 5.1 Update 1. The only downside of this was 640MB of RAM was consumed for the VMFS Heap.
In vSphere 5.5, the whole VMFS Heap size problem has been addressed. The VMFS Heap is now irrelevant as a measure of how much open and active VMDK capacity a single ESXi 5.5 host can handle. This is due to major improvements in the way the VMFS Heap and pointer blocks are managed.
VMFS pointer blocks are a pointer to a VMFS block on disk. When a VMDK is opened on an ESXi 5.5 host, all of the VMFS “pointer” blocks are cached in the Pointer Block Cache, which is not part of the main VMFS Heap (where the pointer blocks were previously stored in prior versions of ESXi). This allows the open VMFS “pointer” blocks to be addressed or accessed and managed as fast as possible without having to access metadata from the VMFS file system directly. The pointer blocks will remain in use so long as a VMDK or other file is open. However, many blocks in any individual VMDK are not often active. It’s usually only a percentage of the blocks that are actively used (say, 20%). The images shown in Figures 6.21 and 6.22 display how the pointer blocks are used to refer to data blocks on the VMFS file system. Each pointer block that is active is stored in the pointer block cache to ensure the fastest possible access to the most frequently used blocks.
Pointer Block Eviction Process
This is where the new Pointer Block Eviction Process introduced in ESXi 5.5 comes in. If the number of open and active VMFS blocks reaches 80% of the capacity of the Pointer Block Cache, a Pointer Block Eviction Process will commence. This basically means the pointer blocks that are not active, or least active, will be evicted from memory and only the active blocks will remain in the cache. This new process greatly reduces the amount of ESXi host memory consumed to manage VMFS file systems and the open VMDKs capacity per host. The VMFS Heap itself in ESXi 5.5 consumes 256MB of host RAM (down from 640MB), and the Pointer Block Cache by default consumes 128MB of host RAM. You no longer have to worry about adjusting the size of the VMFS Heap at all. A new advanced parameter has been introduced to control the size of the Pointer Block Cache, MaxAddressableSpaceTB.
As with all advanced parameters, you should not change MaxAddressableSpaceTB without a good justification, and in most cases, it will not be necessary. MaxAddressableSpaceTB by default is set to 32, with a maximum of 128. This controls the amount of host RAM the Pointer Block Cache consumes. With the default setting at 32, it will consume 128MB of host RAM (as mentioned previously), and with the maximum setting of 128, it will consume 512MB of host RAM. However, it’s important to note that this does not limit the capacity of open VMDKs on the ESXi 5.5 Host, just how many of the pointer blocks can stay cached in RAM. If only 20% of all VMDK blocks are active, you could conceivably be able to have 640TB or more of open VMDK capacity on the host, while still having the active pointer blocks cached without much, if any, performance penalty.
The way this new Pointer Block Eviction Process works gives you a sense of having an almost unlimited amount of open VMDK capacity per ESXi 5.5 host. But it’s not quite unlimited; there is a tradeoff as the amount of active VMDK capacity on an ESXi 5.5 host increases. The tradeoff is possible Pointer Block Cache Thrashing, which may impact performance.
With the default setting of MaxAddressableSpaceTB=32, the Pointer Block Eviction Process won’t kick in until the amount of open VMDKs exceeds 25.6TB. So if you aren’t expecting the VMs on your hosts to routinely exceed 25TB of open and active VMDK blocks, there is probably no need to even look at adjusting MaxAddressableSpaceTB; this saves you some host RAM that can be used for other things. In most cases, you would only have to adjust MaxAddressableSpaceTB if the active part of all open VMDKs on a host exceeds 25TB. If active VMDK blocks exceed the capacity of the Pointer Block Cache, then thrashing could result from constantly evicting and reloading pointer blocks, which may have a performance penalty.
You will see signs of Pointer Block Eviction in the VMKernel logs on your hosts if it is occurring. Syslog, vCenter Log Insight, or Splunk will help you spot this type of activity. If you start to notice any sort of performance impact, such as additional storage latency visible in KAVG in ESXTOP, and a correlation to Pointer Block Eviction, then that would be a sign you should consider adjusting MaxAddressableSpaceTB. If you’re planning to have 100TB of open VMDKs per host routinely, as in the case of large SQL Servers, we recommend setting MaxAddressableSpaceTB = 64 and adjusting upwards if necessary. If you’re not concerned about the amount of RAM the Pointer Block Cache will consume, you could consider setting it to the maximum of 128.
Increasing MaxAddressableSpaceTB may consume host RAM unnecessarily and so should be considered along with the total RAM per host and the RAM that is likely to be consumed by all VMs. 512MB of RAM consumed for Pointer Block Cache on a host with 512GB of RAM or more is not significant enough to worry about, but could be worth considering carefully if your hosts only have 32GB of RAM.