

vSphere 5 Upgrade: PSOD for vSphere 4.1.0 Hosts

I had a customer get ambushed by this last week and wanted to put it out there so that, hopefully, no one else runs into this huge problem.  There is a VMware KB article out regarding the issue; however, none of the release notes or upgrade documentation has been updated to reflect the information in the KB article.  Bottom line: DON’T UPGRADE VCENTER TO V5 IF YOU ARE RUNNING VSPHERE 4.1.0 (NO UPDATES) ON YOUR HOSTS!

The KB article can be found at the link here: clicky

Most of us have read through the upgrade guide for vSphere 5 and know that the first step in the upgrade process is to upgrade/install vCenter 5.  After doing that, if your hosts are running 4.1 with no updates, you will begin having problems with hosts disconnecting from vCenter and with re-adding them.  If you are unlucky enough to catch this bug, the result can be a PSOD (Purple Screen of Death).
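Before touching vCenter, it is worth confirming that no host is still on 4.1.0 with no updates. A minimal sketch, assuming you have already exported each host's name, version string and update level into a list (the `Host` record and sample host names are hypothetical, not a VMware API):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical inventory record; not a VMware API object."""
    name: str
    version: str       # e.g. "4.1.0" (4.1 Update 1 and later still report 4.1.0)
    update_level: int  # 0 = GA release with no updates, 1 = Update 1, ...


def hosts_blocking_vcenter5_upgrade(hosts: list[Host]) -> list[str]:
    """Names of hosts still on 4.1.0 with no updates applied."""
    return [h.name for h in hosts if h.version == "4.1.0" and h.update_level == 0]


inventory = [
    Host("esx01", "4.1.0", 0),
    Host("esx02", "4.1.0", 1),
    Host("esx03", "4.1.0", 2),
]

blockers = hosts_blocking_vcenter5_upgrade(inventory)
if blockers:
    print("Do not upgrade vCenter yet; hosts on 4.1.0 with no updates:", ", ".join(blockers))
else:
    print("No 4.1.0 (no updates) hosts found")
```

The check keys on the update level rather than the version string alone, since a patched 4.1 host still reports itself as 4.1.0.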

For this customer, 8 of the 10 hosts in the cluster they were troubleshooting with VMware Support had a PSOD as soon as HA was re-enabled.  Needless to say, all the VMs on those hosts went down and had to be brought back up when the hosts came back online.  Luckily there wasn’t any corruption within the VMs, and they didn’t have to go through a lengthy restore process.

If you look at the chart below, VMware lists 4.1.0 as supported for an upgrade.  Hopefully this will change in the future.

 

 

 

 

Hopefully this article reaches some people before they get bitten by this bug.

Update:  Chris Wahl has a great article on this over on his blog as well.  clicky

Update 2:  VMware has changed their compatibility matrix to reflect the bug.  Great and very reactive support as always from VMware!

 

Posted in vmware.



vCenter 5 Database Move: Update the Tomcat DB pointer

Ran into a weird issue the other day and was totally perplexed.  I moved a vCenter database from one SQL server to another, as I have done numerous times before without issue, but this time something else cropped up.  The old SQL server kept receiving login attempts from the vCenter server.  What?!?  

My DSNs for vCenter and Update Manager had been changed to point at the new DB server.  All apps and plugins that query the vCenter database use these DSNs.  WTF is going on?

The "VMware VirtualCenter Management Webservices" service uses Tomcat to connect to the DB, I believe to run the rollup jobs for DB cleanup and performance data.  vCenter runs fine even when this service can't connect to the DB, which is what had me confused for a while.  If you dig into the config file at "C:\ProgramData\VMware\VMware VirtualCenter\vcdb.properties" you will find what you're looking for: a URL pointing to the old DB server, e.g. url=jdbc:sqlserver://<dbserver>;databaseName\=<vcenter database name>  

The fix was to change <dbserver> to the new DB server, and all the failed login attempts went away.  And in the spirit of the holidays, "Yippee ki yay mother…"!
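If you have several vCenter servers to fix, the edit is easy to script. A minimal sketch, assuming the url= line has the shape shown above (the sample server and database names are hypothetical, and the sample is only an excerpt of the real file):

```python
import re

# Matches the JDBC URL line and captures: prefix, server name, everything from the first ';'.
_URL_LINE = re.compile(r"^(url=jdbc:sqlserver://)([^;\r\n]+)(;.*)$", re.MULTILINE)


def repoint_vcdb_url(properties_text: str, new_server: str) -> str:
    """Return the properties text with the SQL server in the url= line replaced."""
    # A function replacement stops re from treating backslashes in new_server as escapes.
    return _URL_LINE.sub(lambda m: m.group(1) + new_server + m.group(3), properties_text)


# Hypothetical excerpt; the real vcdb.properties contains other keys as well.
sample = "url=jdbc:sqlserver://oldsql01;databaseName\\=VCDB\n"
updated = repoint_vcdb_url(sample, "newsql02")
print(updated, end="")  # url=jdbc:sqlserver://newsql02;databaseName\=VCDB
```

To apply it for real, read the file, pass its contents through `repoint_vcdb_url`, and write the result back, keeping a copy of the original first. Only the server portion is touched, so the escaped `databaseName\=` part survives unchanged.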

Hope this helps those DBAs out there who hate to see failed login attempts filling up their logs!  

Thanks to VMware Support for helping me run this down! Always a great experience when having to work with those guys and gals!  I don't have to call support often, but when I do, I call VMware Support…

Posted in Snig, vmware.



vSphere Customization Specification Manager Bug

I've been working with a client over the past few weeks, and one of my tasks was to begin upgrading their infrastructure to vSphere 5.  As we all know, the first step is to upgrade vCenter.  So I walked through the upgrade on all three vCenter servers, and everything seemed OK until we went to deploy a new VM using a customization specification that had been created under vCenter 4.1.

During the customization process for the VM we received the infamous "Windows Setup encountered an internal error while loading or searching for an unattended answer file." error.  I opened a ticket with VMware, but I figured out the problem before they could call me back.  The problem stems from an ampersand ("&") the client was using in the Organization field on the first screen of the Guest Customization wizard.  Simply changing "&" to "and" fixed the issue, and everyone is happy.
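VMware never gave me a root cause, but one plausible explanation (my assumption, not confirmed) is that for guests using an XML answer file, the Organization value is written into it without escaping, and a bare "&" makes XML malformed. A small sketch showing why "&" breaks an XML document while "&amp;" or the "and" workaround does not; the element name `RegisteredOrganization` and the sample value are assumptions for illustration:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text: str) -> bool:
    """True if the text parses as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


org = "Smith & Jones"  # hypothetical Organization value

raw = f"<RegisteredOrganization>{org}</RegisteredOrganization>"
escaped = f"<RegisteredOrganization>{escape(org)}</RegisteredOrganization>"
workaround = f"<RegisteredOrganization>{org.replace('&', 'and')}</RegisteredOrganization>"

print(is_well_formed(raw))         # False: a bare & is not valid XML
print(is_well_formed(escaped))     # True: &amp; parses back to "Smith & Jones"
print(is_well_formed(workaround))  # True: the "and" fix from this post
```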

When VMware did call back, I asked them to reproduce the issue and verify that we were not the only customer to run into it.  They said we were only the second customer to open a ticket and report the issue, but that they would create a KB article so the information would be out there if someone else ran into it.

I hope this helps anyone who has been scratching their head over this small, but potentially significant issue.

Posted in Snig.



NFS Large Disk Support in vSphere

I've run into this issue with a couple of customers over the past couple of weeks, and there isn't anything definitive out there that I could find, so I thought I'd write about it.  These customers need VM disks larger than the currently supported maximum of 2TB – 512 bytes.  There are a couple of options to get around this limitation, but they add a bit more complexity to their environments.  

One nugget of knowledge I confirmed with @VMwareStorage last week: when presented to a vSphere host, an NFS volume's size is restricted only by the disk array itself.  For example, if my NetApp running ONTAP 8.x can create a 50TB volume, I can present that volume to vSphere.

 

Problem:

The customer is using an NFS datastore, running Windows VMs on it, and requires a 6TB volume for an application (a poorly written one) that requires local disk.  (This application cannot map to an NFS or CIFS export/share; it must use a disk the OS sees as local.)  Currently we cannot use RDMs with NFS, so we are restricted to the maximum VM disk size of 2TB – 512 bytes.  (@VMwareStorage has indicated to me that RDMs on NFS are coming in the future.)

 

Solution(s):

NFS = we can create multiple disks (VMDKs) for that VM and present them to the OS as local disks.  Once they are presented and added, we can convert them to Dynamic Disks within Windows and concatenate them into a spanned volume, using GPT partitioning, to get the one large contiguous volume needed.
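How many VMDKs does the NFS approach need for the 6TB requirement? A quick worked sketch, assuming "TB" in the VMDK limit means binary terabytes (2**40 bytes); the helper name is illustrative only:

```python
TiB = 2 ** 40
# The per-VMDK ceiling of "2TB - 512 bytes", reading TB as binary (2**40) terabytes.
MAX_VMDK_BYTES = 2 * TiB - 512


def vmdks_needed(volume_bytes: int, max_disk: int = MAX_VMDK_BYTES) -> int:
    """Fewest maximum-size VMDKs whose combined capacity covers the volume (integer ceiling)."""
    return (volume_bytes + max_disk - 1) // max_disk


print(MAX_VMDK_BYTES)            # 2199023255040
print(vmdks_needed(6 * TiB))     # 4: three max-size disks fall 1,536 bytes short of 6 TiB
print(vmdks_needed(6 * 10**12))  # 3: a decimal 6 TB fits in three
```

The edge case is worth knowing: because each VMDK is 512 bytes under 2 TiB, a full binary 6 TiB volume just misses fitting on three disks.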

iSCSI = if available, we can create a large LUN and present it to the VM as a virtual RDM.  Once it's added to the OS, we simply format it with a GPT partition and away we go.  You could also use a software iSCSI initiator in the VM itself to bypass the hypervisor altogether, but that will have other implications for backup/recovery.
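Why GPT in both solutions? An MBR partition entry records the partition length as a 32-bit sector count, so with 512-byte sectors a single MBR partition tops out just under 2 TiB. A short sketch of that arithmetic, assuming 512-byte logical sectors (the GPT figure is a rough upper bound, not an exact limit):

```python
TiB = 2 ** 40
SECTOR_BYTES = 512  # assumed logical sector size

# An MBR partition entry stores the partition length as a 32-bit sector count.
MBR_MAX_PARTITION_BYTES = (2 ** 32 - 1) * SECTOR_BYTES
# GPT addresses sectors with 64-bit LBAs; treat this as a rough upper bound.
GPT_MAX_BYTES = (2 ** 64) * SECTOR_BYTES


def fits_mbr(volume_bytes: int) -> bool:
    return volume_bytes <= MBR_MAX_PARTITION_BYTES


def fits_gpt(volume_bytes: int) -> bool:
    return volume_bytes <= GPT_MAX_BYTES


print(MBR_MAX_PARTITION_BYTES == 2 * TiB - 512)  # True
print(fits_mbr(6 * TiB), fits_gpt(6 * TiB))      # False True
```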

 

Risks:

There are two obvious risks with these large volumes: backup and recovery.  The primary reason we are using a virtual RDM in the iSCSI solution is so that we can continue to use the vStorage APIs for backup, enabling some advanced backup technologies that ensure we can get the job done within the prescribed backup window.  Obviously some sort of back-end disk array snapshot and offload to tape would be best for backups here, but it would complicate recovery.  For the NFS solution, the vCenter snapshots of the VM could take quite a while, so you would have to tune your timeout values accordingly.  Putting an agent inside the VM may be best for this solution, depending on the backup software you're using.  You have to weigh the good with the bad, as always.

 

I hope this short article has helped a few people out there in the internets.  Let me know what you think in the comments below.  Have I missed anything?

Posted in storage, vmware.



Hitachi has gone 3D and it’s good!

As many of you have already heard this morning, Hitachi has announced their new Virtual Storage Platform (VSP) and the software that goes along with it.  It's a pretty exciting time for the Hitachi folks as most new product launches are.  Having seen and actually used the new box and software, I'm pretty excited too!
 

3D Baby!

One of the coolest things I think they have designed into the latest product is the ability to manage your storage across three different axes in your data center.  You can go wide by adding multiple compute and/or capacity modules to the base module.  You can go up and down by using the new Hitachi Dynamic Tiering software to move data automatically between the storage tiers you define.  And last but not least, you can go deep by virtualizing existing or new storage behind the VSP, as you've been able to do with the two previous iterations of this product line.

I can't say it much better than Hitachi themselves, so here are a few slides from their announcement.
 

Scale Deep

Scale Out

Scale Up

So those are some of the basics of the announcement.  I'll be going into more technical detail on each of these items (and many more) over the next few weeks.  Let me know what you'd like to see more of and I'll hit those items sooner rather than later.

Here is a sneak peek at the base of the new controller architecture.  Just imagine taking this base, attaching more of them horizontally, and growing capacity and/or performance.  Pretty flexible, right?  And all without data traffic ever leaving the subsystem itself!

Posted in Snig.



The Three Design Decisions – Choose Two

Someone on Twitter made a comment about a replication solution being expensive.  That particular solution is expensive, but I think that expense is relative to the problem you’re trying to solve. 

Something I always tell my customers when designing solutions is "You can have availability, performance, or an inexpensive solution.   Choose two."  That’s right: no matter how hard you try, you can’t have all three.  Each slice of the pie requires sacrificing another. 

I’d love to hear your thoughts…

Posted in Snig.


First VMworld Post

Well, I have arrived. Check-in at registration was very smooth at 0700 this morning. There was no wait, and I was able to walk right into the breakfast area.

After breakfast I went to our booth (booth #101) to finish the setup and run through our demo to make sure no one broke anything while testing over the weekend. Everything is working well without any issues.

So I walked over to my first session of the day, “Troubleshooting using ESXTOP”, and it was full. I got there 15 minutes early and it was already full. I’m praying that the rest of the sessions are not going to be like this. If they are, I’m thinking there are going to be a lot of angry people.

I decided to walk over to the blogger lounge to sit down and type this, and the lounge was full. IMHO it looks more like a TV studio/DJ booth than a blogger lounge. So I’m back at my booth typing this up.

It’s 1100 now, so I should probably head to my next session (@1200) to ensure I get a seat. I wonder what will happen when I actually get into a session and then expect to get to the next scheduled session. Will I be able to get in?

Posted in Uncategorized.


The New EMC is Pretty Impressive…

I had an opportunity to sit in on a day-long meeting with EMC and a customer last week. We did a walkthrough of the customer’s current environment and how we can simplify it so the customer can move to a more nimble, simpler architecture. The customer has been an EMC customer for a little more than three years now, and they have asked me to come in and bring all of the infrastructure silos together into an overall solution architecture. It has been fun so far, and I’m looking forward to working with them and EMC over the next few years as things come together.

A Little History

I used to be an EMC customer back in the day and had a few CLARiiONs running some small storage environments. They worked well, but I wasn’t really impressed with EMC service and the overall kludge that was the EMC machine at the time. Today I work for an EMC partner (amongst other vendors), and until recently I spent most of my time competing against them rather than working with them. Chuck, Barry and I have had several disagreements/discussions in the past over technologies (just look back in time on this blog), and I have nothing but respect for those guys.

Impressed with the new EMC

So back to the meeting we had last week. We had discussions around BURA, Unified Storage, Encryption, RSA, and the list goes on. Just the amount of resources that EMC was able to bring to the table, on short notice, was impressive. Each individual was knowledgeable about their individual product, but also was able to spout the overall EMC messaging well. They were able to show how their individual product meshed with the other EMC products that were relevant to the customer.

Most impressive about the day were the products in the pipeline slated for the near future. In the past EMC was a company with a lot of products that didn’t really fit well together. Now they are a company that has brought things together, and it feels like they have a true vision of where they want to go. Not just a vision, but a plan for how to get there. The end game is something I always felt they lacked in the past. You can actually see how the differing products are going to come together to create an entire solution. Beautiful!

What will others do?

So that brings me to the smaller players in the storage industry. What are they going to do? They don’t have the breadth of products that EMC has. They don’t have the money EMC has. It’s going to be interesting to see how this all plays out. The good thing about the company I work for is that I am able to pick and choose from all the vendors we partner with to create the perfect solution. Storage companies just don’t have that ability, so they are going to have to partner with a company like mine to even think about competing with EMC. As for the storage-only players, I’m thinking they’re probably just waiting to be bought. 3PAR anyone?

Posted in Snig.


Article Discussing Who’s Switching to ESXi

Alex Barrett of TechTarget just published the article. It’s definitely worth a read.

Linkage

Posted in Uncategorized.


Migrating from ESX 3.5 to vSphere and ESXi

I posted a tweet yesterday about a migration and upgrade I was doing and received a couple of replies asking me to let them know how everything goes. I decided to create a blog post for everyone to read rather than replying in 140-character tweets. =)

Background: This customer has a very simple SMB setup as far as VMware goes: 3 servers in a single cluster, only a few vSwitches per host, and an HDS AMS disk subsystem. They wanted to upgrade to vSphere and are able to take some downtime to do it. (Not that downtime is required.) They bought 3 new Dell 710 servers to run vSphere on.

There are at least 20 different ways to go about this, and I wanted to keep this upgrade as simple as possible. Since the customer can take some downtime for their VMs, I decided to do a cold migration of the VMs. This is by far the simplest approach and carries the least risk of problems with the VMs. The customer, while familiar with VMware and its administration, would not be able to troubleshoot issues once I was no longer onsite.

Here is the process that we walked through for a successful upgrade and migration:

1. Upgrade vCenter to version 4 – You need to read through the upgrade guide before you attempt this upgrade. There are some specific permissions changes to networks and datastores that could bite you after an upgrade if you don’t understand the way that vCenter changes read-only attributes. We’ll leave it at that. RTFM!

2. Install new ESXi servers on new hardware and add them to the existing cluster. Ensure all updates and patches have been applied. – Here are a few key reasons I pushed this customer to go with ESXi. Every new implementation I have done over the past year has been done with ESXi.

A) They were not running any fancy monitoring tools in the Service Console.
B) ESXi is a much smaller attack surface for hackers since Red Hat is no longer running underneath.
C) Because there is no Red Hat, VMware has full control over all patches, which allows much quicker turnaround on bug fixes, etc.
D) The customer didn’t have any scripts or jumpstart servers that required the Service Console. Even if they did, we could have rewritten them to run against the vMA (vSphere Management Assistant).
E) No Service Console! Personally I see more people get on the Service Console and make a mistake that causes a lot of problems. “Let’s see what this command does.”
F) THERE IS NO TECHNICAL REASON NOT TO GO TO ESXi!!!!!!

3. Map out all networks and datastores attached to the 3.5 cluster – Self-explanatory. You need to know what you need to know.

4. Create identical networks on new vSphere cluster – Again SE. The VMs will have to connect to the network after the migration. If the networks aren’t there, migration validation will fail.

5. Test cold migration to local storage and test networks with a test VM – Just test things out prior to the big outage.

6. Remediate any changes that need to be made. – SE. Fix any problems that pop up.

7. Present all datastores that reside on the old ESX servers to new ESXi servers. Verify connectivity, specifically LUN IDs. – Since the new ESXi boxes are part of the same cluster we can present the same LUNs/Datastores to them without any potential problems. Scan them and bring in the VMFS Datastores and verify everything looks OK.

8. Shut down all VMs – SE.

9. Cold migrate all VMs to new ESXi servers. – SE.

10. Remove all ESX servers from the cluster. – This ensures that during the next step DRS won’t try to spread the workload to the old servers. It just makes things cleaner and gives DRS fewer hosts to check.

11. Upgrade the VMware Tools and Virtual Machine Hardware on all VMs. – Using Update Manager, go ahead and upgrade things before putting the VMs back online for your users. A reboot is required, so users would see an interruption anyway. I’m not going to walk you through this. RTFM!

12. Install the vMA for command line management of the ESXi hosts. – SE.
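Steps 3, 4 and 7 boil down to comparing what the old hosts see with what the new ones see before the outage. A minimal sketch, assuming you have already exported port group names and datastore-to-LUN-ID mappings per host by whatever means you prefer (the `HostInventory` record, the helper and all sample names are hypothetical, not pulled from vCenter):

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HostInventory:
    """Hypothetical per-host snapshot; fill it from your own inventory export."""
    name: str
    port_groups: set[str] = field(default_factory=set)
    datastores: dict[str, int] = field(default_factory=dict)  # datastore name -> LUN ID


def migration_gaps(old: HostInventory, new: HostInventory) -> list[str]:
    """List everything the new host is missing compared with the old one."""
    problems = []
    # Steps 3-4: every VM network on the old host must exist on the new one.
    for pg in sorted(old.port_groups - new.port_groups):
        problems.append(f"{new.name}: missing port group '{pg}'")
    # Step 7: every datastore must be presented, with the same LUN ID.
    for ds, lun in sorted(old.datastores.items()):
        if ds not in new.datastores:
            problems.append(f"{new.name}: datastore '{ds}' not presented")
        elif new.datastores[ds] != lun:
            problems.append(f"{new.name}: '{ds}' is LUN {new.datastores[ds]}, expected LUN {lun}")
    return problems


old_host = HostInventory("esx35-01", {"VM Network", "DMZ", "Backup"}, {"AMS_DS01": 0, "AMS_DS02": 1})
new_host = HostInventory("esxi-01", {"VM Network", "Backup"}, {"AMS_DS01": 0, "AMS_DS02": 2})

gaps = migration_gaps(old_host, new_host)
for gap in gaps:
    print(gap)
```

An empty list means the new host sees the same networks and LUN IDs as the old one, which is what the cold migration validation in step 9 needs.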

On the ESX vs. ESXi front, I truly believe that once you try it you won’t even notice a difference and you’ll probably like ESXi better. If you’re an old ESX shop that loves the Service Console, then I would challenge you to use the vMA for a month and I would bet that you won’t go back.

Posted in Snig.