Posted by: virtualizationeh | July 12, 2011

VMware vSphere 5.0 – Performance Unleashed

It’s been quiet recently…very quiet…because something BIG was being created…

Finally I’m able to share some of my excitement with the virtualization community around VMware’s next major release – vSphere 5.  You all know I have a passion for performance, so today we start a new journey of education on just what is possible on this Cloud Infrastructure.  Here’s the start:

What’s New in VMware vSphere 5.0: Performance Whitepaper

Some very cool technical highlights of the ‘Monster’ VM:

I used to say one could confidently virtualize 99% of their workloads, but after today’s release, I’m now suggesting one can virtualize 99.9% of workloads.  Over the coming weeks you’ll start to see a stream of performance data demonstrating this enormous capability.

“And one more thing” Eric just posted:

vSphere 5 can virtualize itself and 64-bit guests – Sweet!

Posted by: virtualizationeh | February 17, 2011

Troubleshooting vSphere 4.1 Performance Issues – Updated Whitepaper

I’m excited to highlight that VMware, specifically Chethan Kumar, has released an updated Performance Troubleshooting guide for vSphere. The document was originally penned by Hal Rosenberg and has been referred to as the ‘Performance Troubleshooting Bible.’  Both of these fine gentlemen work as performance engineers within VMware and have striven to identify common issues and potential resolutions in an easy-to-use framework.

This new edition has been updated specifically for vSphere 4.1 and should be part of every virtual administrator’s core collection. Whatever you call it, all virtual administrators should not only review this guide for the cool performance information it contains but also use it as Step #1 when troubleshooting any performance-related issue.  Using it will help you diagnose the most common issues seen in the field.

Abstract:

This document provides step-by-step approach for troubleshooting most common performance problems in vSphere-based virtual environments. The steps discussed in the document use performance data and charts readily available in the vSphere Client and esxtop to aid the troubleshooting flows. Each performance troubleshooting flow has two parts:

  1. How to identify the problem using specific performance counters.
  2. Possible causes of the problem and solutions to solve it.

I reference this guide at the end of all my presentations and will be updating my links to the new edition.  Performance is often seen as a dark art but in reality is often misunderstood. I hope resources, like this guide, will help take everyone’s knowledge to the next level through awareness and practical application.

The document is available here: Troubleshooting Performance Related Problems in vSphere 4.1 Environments

If you have any feedback, feel free to drop me a line and I’ll pass it along.

Posted by: virtualizationeh | January 18, 2011

vSphere Performance Presentation at PEX

I’m very excited to be presenting at VMware’s Partner Exchange 2011 in Orlando.  In this session I’ll be covering why I’m confident you can virtualize your Tier 1 apps on the vSphere platform, practices & configurations to consider and some common troubleshooting steps.

Session Details:

  • Location: Coronado DE
  • Time/Date: Thursday Feb 10 at 2:00pm

So for all you VMware partners out there, I hope you can attend and reach out to me while you’re down there so we can chat performance.

For more information on PEX 2011 and Registration check out this link.

Remember:  Performance is not a barrier.

Posted by: virtualizationeh | December 23, 2010

Do Consolidation Ratios Matter Anymore?

I was on a road trip recently and had a number of discussions about consolidation ratios as a measure of success.  This worried me.  Especially considering that these were large virtual adopters and I was talking with these clients about moving their Tier 1 applications to their virtual platforms.  It made me realize that a number of organizations are still using that ‘old’ ratio as a key performance indicator (KPI) for measuring how successful they are at virtualization.

Yes – back in the day consolidation ratios were a common measure often used for capacity management, generating ROI models and generally bragging rights.  This was fine when the only services being migrated were “low hanging fruit” or “extremely” underutilized servers.  One could expect 10:1, 25:1 even 50+:1 as a ratio.

But can we keep up those high ratios as we migrate Tier 1 services to virtual platforms?

Easy answer: No!

The obvious reason is that these larger workloads actually use the hardware resources underneath them (whether physical or virtual), so when we start to stack them side by side there are fewer resources to go around.  But this is not a bad thing.  A lower ratio does not mean you’ve failed.  It seems an obvious message but one I wanted to reiterate as some seem to miss it.

Let’s make sure our vision and focus for virtualization go deeper than consolidation.  Don’t forget about the many other cool things virtualization brings, like DR enablement, unlocking multicore platforms, simpler management and capacity in the cloud.

Merry Christmas Everyone!

Posted by: virtualizationeh | December 20, 2010

How to Collect Useful Performance Data

Despite the fact my role is pre-sales, I spend a lot of time digging into client performance statistics with them.  I wanted to document some of the more common ways to collect and review performance data.

vCenter:

I view vCenter as a great performance trending tool only.  Real-time statistics are not stored in the database and can only be viewed for the past hour – sadly.  That said, when a client is trying to troubleshoot an issue I recommend they temporarily change the logging level for the 5 minute interval from the default level of 1 to 4.  While this will increase your database size, some other interesting counters (memory, cpu, disk and network) are captured and can be looked at with some historical accuracy for the day.  See this vCenter Performance Counters page for what’s available in each level.  Organizations should really consider augmenting the vCenter tool with something that will keep granular historical data.

esxtop/resxtop:

esxtop (or resxtop if you’re using the vMA appliance) is my hardcore tactical troubleshooting tool.  The real-time information it displays is extremely useful in diagnosis though its biggest issue is that you can only look at one host at a time.  I often ask for esxtop data samples using the following command:

esxtop -b -a -d 2 -i 150 >outputfile.csv

This will collect data using esxtop ‘b’atch mode, ‘a’ll counters, every ‘2’ seconds for 150 ‘i’terations (or a total of 5 minutes).  The resulting .CSV file can be opened directly with PerfMon (use Win7 64bit edition) or esxplot (cheers to Geoffrey for his development effort).
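If you don’t have PerfMon or esxplot handy, a few lines of Python can give a first look at the file. The sketch below averages every column whose header ends in a given counter name. It assumes the batch output is a perfmon-style CSV with a header row, a timestamp in the first column, and counter paths shaped like `\\host\Group Cpu(1:ad)\% Ready`; that header layout, the host name `esx01` and the function name `average_counter` are illustrative assumptions, so check them against your own capture.

```python
import csv
import io
from collections import defaultdict


def average_counter(csv_file, counter_suffix):
    """Average every column whose header ends with counter_suffix."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Column 0 is assumed to be the sample timestamp, so skip it.
    wanted = [i for i, name in enumerate(header)
              if i > 0 and name.endswith(counter_suffix)]
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in reader:
        for i in wanted:
            try:
                totals[header[i]] += float(row[i])
                counts[header[i]] += 1
            except (ValueError, IndexError):
                continue  # skip blank or malformed samples
    return {name: totals[name] / counts[name] for name in totals}


# Tiny made-up sample in the assumed layout (not real esxtop output).
sample = io.StringIO(
    '"Time","\\\\esx01\\Group Cpu(1:ad)\\% Ready","\\\\esx01\\Group Cpu(1:ad)\\% Used"\n'
    '"12/20/2010 10:00:00","4.0","50.0"\n'
    '"12/20/2010 10:00:02","6.0","70.0"\n'
)
averages = average_counter(sample, "% Ready")
for name, value in averages.items():
    print(f"{name}: {value:.2f}")
```

For a real capture, pass `open("outputfile.csv", newline="")` instead of the in-memory sample. Files collected with all counters can be very wide, so filtering by suffix keeps the output readable.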

vm-support:

Last, the vm-support script can be called with some parameters to not only collect the diagnostic bundle but to also collect performance data from the VSI nodes that can be re-played with esxtop (or extracted – see the command below).  This combo deal is also very useful since it allows us to not only look at the performance data, but some of the logging to check things like vSwitch config, flow control and monitor modes, etc.  The size can be daunting, potentially GB’s when extracted and so typically must be transported outside of email.

vm-support -n -s -i 2 -d 150

(again for samples every 2 seconds for a total of 5 minutes with no core dump).  To extract esxtop data use the following command:

esxtop -R / |esxtop -b >outputfile.csv

Anyone else have any cool tools or tips for data collection and analysis?

Happy Holidays Everyone!

Posted by: virtualizationeh | November 23, 2010

Checking vSphere Monitor Modes and Performance Implications

There are two forms of hardware assist that vSphere can leverage if your server platform supports them.  The vSphere monitor will determine if they are available and use them appropriately.  They can make a significant performance difference so it’s important to understand them.

First is CPU virtualization assist, known as AMD-V and Intel VT-x, which replaces Binary Translation (aka BT), where CPU virtualization is done completely in software.  As you can imagine, using these hardware instructions greatly increases performance.  They have been around for quite some time and you’d really have to dig for old server class hardware that doesn’t have them.

Second is memory assist, known as AMD RVI (Rapid Virtualization Indexing) and Intel EPT (Extended Page Tables), which replaces shadow page tables managed completely in software (sometimes referred to as SWmmu).  These additional functions allow the memory scheduler to use hardware to manage the mapping between physical host memory and virtual guest memory.  Again these functions increase performance and reduce virtualization overhead.  The benefit can be felt the most with memory intensive workloads like Tier 1 applications such as databases, messaging, XenApp, etc.

To reduce virtualization overhead and maximize performance you want to ensure your server has these functions and that they are ‘enabled’ in the BIOS (I see many cases where people have forgotten to enable them or have toggled them off by accident).

So how can I check if I have these assists available and that my virtual machines are leveraging them?

First check what has been set within the virtual machine configuration.  Here they can be switched manually.  Automatic is recommended and will allow the monitor to make its best attempt.

Second, search for “MONITOR MODE” in the vmware.log file of the running virtual machine.  These are the lines you’re interested in:

vmx| MONITOR MODE: allowed modes                   : BT32 HV HWMMU
vmx| MONITOR MODE: user requested modes        : BT32 HV HWMMU
vmx| MONITOR MODE: guestOS preferred modes   : HWMMU HV BT32
vmx| MONITOR MODE: filtered list                         : HWMMU HV BT32
vmx| HV Settings: virtual exec = ‘hardware’; virtual mmu = ‘hardware’

Decoded:

  • BT32 or BT – Binary Translation which is full software virtualization of CPU and Memory
  • HV – Hardware Virtualization (AMD-V, Intel VT-x) which is CPU assist
  • HWMMU – Hardware Memory Management Unit (AMD RVI, Intel EPT) which is Memory assist in addition to CPU assist (preferred option)
  • The order of modes is also important

In the example above, the server supports (ie: allowed modes) all of the modes above validating that the server does have both assists.  In the settings of the VM you have the option of selecting the monitor mode manually (ie: user requested modes).  Each operating system also has its own preference (ie: guestOS preferred modes) that is tied back to the operating system selection you make when creating the VM.

After all these options are taken into consideration (ie: filtered list) the most performant choice is made and documented on the next line.  In this case, virtual exec (cpu assist) = hardware and virtual mmu (memory assist) = hardware.  It is still relatively common today not to have Intel EPT, in which case virtual mmu = software.
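If you need to check many virtual machines, the log search can be scripted. The sketch below pulls the MONITOR MODE lines out of log text and lists each set of modes in the order logged. It is written against the sample lines in this post only; real vmware.log lines carry extra prefixes such as timestamps, and the function name `parse_monitor_modes` is hypothetical. Reading the first entry of the filtered list as the mode in use is my assumption, consistent with the example above.

```python
import re

# Sample lines in the shape shown in this post; real logs carry extra prefixes.
LOG_TEXT = """\
vmx| MONITOR MODE: allowed modes                   : BT32 HV HWMMU
vmx| MONITOR MODE: user requested modes        : BT32 HV HWMMU
vmx| MONITOR MODE: guestOS preferred modes   : HWMMU HV BT32
vmx| MONITOR MODE: filtered list                         : HWMMU HV BT32
"""

# Capture the label between the two colons and the mode list after it.
MONITOR_LINE = re.compile(
    r"MONITOR MODE:\s*(?P<label>[^:]+?)\s*:\s*(?P<modes>.*)$"
)


def parse_monitor_modes(text):
    """Return {label: [modes in logged order]} for each MONITOR MODE line."""
    result = {}
    for line in text.splitlines():
        match = MONITOR_LINE.search(line)
        if match:
            result[match.group("label")] = match.group("modes").split()
    return result


modes = parse_monitor_modes(LOG_TEXT)
for label, values in modes.items():
    print(f"{label}: {values}")

# Assumption: the first entry of the filtered list is the mode in use.
chosen = modes["filtered list"][0]
print("chosen:", chosen)
```

To run it against a real file, replace `LOG_TEXT` with the contents of the virtual machine’s vmware.log.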

More Information:

Also remember that a ‘Restart’ of the guest is not enough to change the monitor mode – you must ‘Power Off’ the virtual machine when making a monitor mode change.

Posted by: virtualizationeh | October 29, 2010

Apples to Oranges – Performance Comparisons Gone Awry

I’ve been involved in a few scenarios recently where “a virtual machine isn’t performing as well as it was on physical” and virtualization was blamed.

So where should one start troubleshooting this?

In this case we’re comparing an application on a physical host to the same application within a VM.  Some simple, but often overlooked, questions should be:

1) Are both hosts the same hardware and configuration?

More often than not, I find that people are comparing performance of two different platforms or configurations.  While servers may have the same model number, important differences “sneak in” like processor models, clock speeds, physical memory balance/speed and BIOS settings.  One cannot make a fair comparison unless the same foundation is used.  An easy way to eliminate this confusion is to use the same host both with/without a hypervisor.

2) Is the VM configured the same as the physical host?

This really seems silly but I see different configurations quite regularly.  Example: a physical platform has two processor sockets and 4 cores per processor but the VM is configured with only 4 vCPUs.  This cannot be compared directly, nor can it be compared linearly (eg: 50% of the sockets doesn’t equal 50% of the performance).  The same considerations apply to memory.  A change in memory allocation, either increase or decrease, can change the way an operating system or application performs.

3) Is the storage the same?

Storage is the most common performance issue I diagnose.  This one is a little more complex but extremely important.  If the storage is not identical between tests, it will invalidate any comparison.  You must use the same type of storage (local, SAN, NAS), same protocol (iSCSI, FC or NFS), same disk type (SAS, SATA, FC) and even same spindle speed.  The best comparison is done using the same LUNs if possible, switched between physical and virtual guests.  Small IO latency changes can account for large performance swings, so even simple things like using local disk but changing the RAID type introduce discrepancy.

If you ever look at true benchmarks, like VMmark, that’s why the process and method are very prescribed and audited.  They leave little flexibility, which ensures the highest quality comparison and results.  They do need more effort but that’s the price of accuracy.

So if you’re asked to do a comparison, or find yourself troubleshooting why performance has changed, ask the simple question first: “Is this apples to apples?”  Better to be fair and meticulous whether the results are good or bad.

I challenge you to make the comparison as close as possible.  If you have any questions, please feel free to ask.

Posted by: virtualizationeh | October 15, 2010

CPU Contention %RDY with CPU Limits %MLMTD

We all know about ‘Ready’ time (%RDY in esxtop) – so high Ready time is bad – right?  Now what happens if you’re using CPU ‘Limits’ either on a specific guest or resource pool?

Limits can artificially raise Ready time because when a limit stops a VM from executing, that VM still accumulates Ready time.  In this case the VM wasn’t held back by CPU contention; the scheduler held it back because you configured it to respect a Limit.

So how can I measure CPU contention in an environment with Limits?

In esxtop there is another counter called %MLMTD which is the percentage of time the VM was ready to run but wasn’t scheduled because it would violate the CPU Limit set.  Since %MLMTD is added to Ready time, we can use this simple formula:

%RDY – %MLMTD

Using the sample esxtop data above for the VM named ‘ad’ it would look like this: 239.05 – 239.04 = 0.01 % Ready Time without limits.  Here it’s obvious the VM is restricted by an artificial CPU limit I configured rather than by the contention of a busy host.
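The subtraction is trivial, but wrapping it in a function makes it easy to apply to every VM in a batch capture. This is a minimal sketch using the two values quoted for ‘ad’; the function name `ready_without_limits` is hypothetical.

```python
def ready_without_limits(rdy_pct, mlmtd_pct):
    """Ready time not explained by a CPU Limit: %RDY minus %MLMTD."""
    return rdy_pct - mlmtd_pct


# Values quoted in this post for the VM named 'ad'.
contention = ready_without_limits(239.05, 239.04)
print(f"{contention:.2f}% Ready time without limits")
```

A result near zero points at the Limit; a large remainder points at genuine contention on the host.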

Now the bad news – %MLMTD is only available in esxtop and is not visible in vCenter.  This means you’ll need to be creative in monitoring it on a per host basis, or leverage a capacity management tool like CapacityIQ.  Again, we always need to look at performance holistically instead of at a couple of counters here and there occasionally.

FYI: Here’s a simple virtual machine utilization formula to put more counters in perspective:

100% = %RUN + %READY + %CSTP + %WAIT
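One way to use this formula is to solve for whichever counter you don’t have. The sketch below derives %WAIT from the other three. It assumes the values describe a single vCPU, so the total is 100%; the input numbers are made up for illustration and the function name is hypothetical.

```python
def wait_from_formula(run_pct, ready_pct, cstp_pct):
    """Solve 100% = %RUN + %READY + %CSTP + %WAIT for %WAIT."""
    return 100.0 - (run_pct + ready_pct + cstp_pct)


# Made-up sample for a single vCPU, not measured data.
wait = wait_from_formula(run_pct=35.0, ready_pct=5.0, cstp_pct=0.0)
print(f"%WAIT = {wait:.1f}")
```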

Shout out to RyanP for asking the question!

Posted by: virtualizationeh | October 4, 2010

PVSCSI, LSI SAS or Parallel – What vSCSI adapter should I choose?

Update:  I’ve come across some more information which changes my recommendation (nothing is ever written in stone).  While I originally suggested LSI Logic Parallel was my preferred default vSCSI adapter, armed with this newly discovered information, I now understand why LSI Logic SAS is being used as the default for the latest Microsoft OS’s.  This is what my continued investigation uncovered:

In the VI Client help file (I’ve never actually looked in there before) around selecting your vSCSI adapter it says this:

The LSI Logic Parallel adapter and the LSI Logic SAS adapter offer equivalent performance. Some guest operating system vendors are phasing out support for parallel SCSI in favor of SAS, so if your virtual machine and guest operating system support SAS, choose LSI SAS to maintain future compatibility.

So my next question was ‘why’ and who are these ‘OS vendors?’  This was the clearest answer (Thanks Eric):

The main reason we did this is because as of Win2k8, MS deemed parallel scsi too old to care about supporting for MSCS.  As of now, I believe all versions of windows still ship with an LSI Parallel driver as well, though the newer versions also ship with LSI SAS.  I think it was more of a “why not use LSI SAS if the guest supports it?”  You get the benefit of never running into issues with clustering, and also LSI is more likely to push fixes to the more modern driver for a longer period of time before declaring it end of life.  As far as performance and stability, the two devices are identical (modulo potential driver bugs).

So based on this new information, I’ve updated my article and table below.

Under vSphere here’s a quick table to summarize my perspective:

Historically, there was very little performance difference between BusLogic and LSI Logic as documented under ESX 2 back in the day here.  It was a choice of which driver was available by default in each OS, versus a performance choice.  Today I rarely, if ever, see BusLogic used for anything other than legacy Windows 2000.  What was a popular choice has now become legacy and vendors have made investments in new directions.

So how do I choose between LSI Logic Parallel or SAS?

Neither of them is a bad choice.  Both are proven in their development life-cycle and have the same performance characteristics.  SAS seems to be where LSI’s more active investment is taking place so it would make sense that this driver would be better maintained (bug fixes, updates, available in many OS’s, etc) today and in the future.  So if your OS has a supported driver, that would be my new vSCSI adapter choice.

So where does PVSCSI fit in?  That answer depends on the version of vSphere you’re running and your guest OS:

  • vSphere 4.0 – PVSCSI is only recommended for use with supported server based workloads for which more than 2000 IOPS are required.  The reason you would want to use this driver was that it was virtualization aware and therefore could process these high IO rates using less host CPU (ranging from 5-20%).  This enabled the virtualization of large Tier 1 I/O intensive applications.  However for lower I/O scenarios, the LSI Logic Parallel adapter would often outperform it – details here.  This generated confusion that still exists today.
  • vSphere 4.1 – Thanks to some driver optimizations, PVSCSI can now be used with consistent performance in both high and low I/O use cases.  Again the reason for PVSCSI is a lower host CPU cost per IO but it really only becomes a consideration when you’re dealing with thousands of I/Os.  So I use this option as an exception rather than a default.  It’s also worth noting you can now use PVSCSI for Windows boot disks.
  • PVSCSI only supports the following guest OS’s – Windows 2003, Windows 2008 and Red Hat Enterprise Linux 5.

References:

Bottom line:  Use the chart above to select the vSCSI adapter based on your OS.  There is very little read or write performance difference (I’d argue insignificant) between adapter choices now.  The choice is more around compatibility, vendor support, and futures.

Posted by: virtualizationeh | September 21, 2010

Transparent Page Sharing & Large Memory Pages

Do you feel like your vSphere 4.x servers are consuming more memory than they used to?

Does vCenter indicate, or alarm, that host memory utilization is consistently high?

Do you have a funny feeling that Transparent Page Sharing (TPS) isn’t reclaiming memory?

Turns out there is some confusion about how and when TPS works with Large Memory Pages.  Ever since ESX 3.5, if your CPU has a hardware MMU (ie: AMD RVI or Intel EPT), large memory pages (2MB each) are used instead of 4KB pages for their performance benefit.  But since TPS only works with 4KB pages, TPS does not come into play until the host is under memory pressure and begins breaking 2MB pages into 4KB pages for TPS reclamation.  If you’re not looking at the right combination of counters, you would wrongly assume the host is out of memory and stop adding workloads to it.
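To put the two page sizes in perspective, the arithmetic below shows how many small pages one large page breaks into, and how many small pages back a given amount of guest memory. It is plain arithmetic on the page sizes named above; the 4 GB guest is an arbitrary example and the function name is hypothetical.

```python
LARGE_PAGE_BYTES = 2 * 1024 * 1024  # one 2 MB large page
SMALL_PAGE_BYTES = 4 * 1024         # one 4 KB small page

# Each large page splits into this many small pages for TPS to scan.
small_pages_per_large = LARGE_PAGE_BYTES // SMALL_PAGE_BYTES
print(small_pages_per_large)


def small_pages_for(guest_memory_gb):
    """Number of 4 KB pages needed to back guest_memory_gb of memory."""
    return guest_memory_gb * 1024**3 // SMALL_PAGE_BYTES


# Arbitrary example: a guest configured with 4 GB of memory.
print(small_pages_for(4))
```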

Background: Transparent Page Sharing (TPS) in hardware MMU systems

The reality is that you can continue to add workloads, over committing the physical memory until TPS, Ballooning and Memory Compression can no longer manage that memory pressure and you start to swap to disk.  Remember, swapping is very bad.  In the past, many administrators have used only memory utilization to predict when their hosts are out of memory and performance may suffer.  Since memory is consumed first and shared second, this counter is not a good measure of memory capacity.

Some other counters to consider in combination:

  • Active – Amount of memory that is actively used, as estimated by VMkernel based on recently touched memory pages.
  • Swapped – Current amount of guest physical memory swapped out to the virtual machine’s swap file by the VMkernel.
  • Swap In Rate – Rate at which memory is swapped from disk back into memory – if you see this consistently, it’s too late.

Also remember that use of reservations will have an effect on these counters.  Example:  If you have a host with a large percentage of memory reserved by guests and those guests are idle, using only the active memory counter it would seem you might have more capacity left, but the reservation is not subject to memory reclamation techniques like TPS.  So you might add workloads only to find swapping occurs.

Capacity Planning and Management is more complex than reviewing a couple of host counters because things like clusters and reservations must be considered.  If you’re interested in a purpose-built tool I’d suggest evaluating VMware’s CapacityIQ.  Proper planning for capacity is important as otherwise performance suffers.
