Migrating Failover Cluster to New 10Gbit Network + New SAN



We are doing a mass migration of our iSCSI network from a 1Gbit backbone and an old SAN to a new 10Gbit backbone with a new SAN. All of the other data on the old SAN has been migrated; I'm finally at the last step in the process, but I'm stuck.

 

We have two SQL servers that are clustered using Failover Cluster Manager. We've just installed two X520-DA2 NICs, one in each server. Each NIC has two ports, as shown below.

 

--> SQL1
      - 1Gbit iSCSI: 192.168.250.25
      - NEW 10Gbit iSCSI: 192.168.250.21, 192.168.250.22

--> SQL2
      - 1Gbit iSCSI: 192.168.250.26
      - NEW 10Gbit iSCSI: 192.168.250.23, 192.168.250.24

 

Currently, in Failover Cluster Manager (FCM), the two 1Gbit NICs show up just fine. What I don't understand is how to add the new 10Gbit IP addresses to that same cluster network. When I enable the new NICs on the passive host, the 250_network gets messed up and I have to reboot the server. Obviously I'm working on the passive host first so as not to disturb the primary host that's hosting the LUNs.

 

I have new LUNs ready to go on the new SAN. If I can get the 250_network configured correctly with BOTH the new and old IP addresses, I should be able to map the new LUNs and have them join the FCM as well.

 

Here's what I'm thinking the step-by-step should be:

 

  1. Add the new 250_network IPs into FCM (how? see the sketch after this list)
  2. Map the new LUNs to SQL1 and SQL2
  3. Bring the disks online, ready to join the FCM
  4. Use "Add a disk" in FCM to join the new disks to the cluster
  5. Migrate the quorum to a new disk
  6. Stop the SQL services during non-production hours and migrate the data to the new disks
  7. Remove the old disks from FCM
  8. Disconnect the iSCSI targets to the old SAN
  9. Disable the old 1Gbit NICs
  10. Profit?
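
For step 1, this is roughly how I've been trying to inspect what the cluster currently sees. It's only a sketch using the FailoverClusters PowerShell module that ships with the clustering feature; the names in the output will obviously differ per environment:

    # Load the failover clustering cmdlets (Server 2008 R2 and later)
    Import-Module FailoverClusters

    # Networks the cluster has detected, one per logical subnet, with their roles
    Get-ClusterNetwork | Format-Table Name, Address, AddressMask, Role -AutoSize

    # Which adapter/IP on each node the cluster mapped into each of those networks
    Get-ClusterNetworkInterface | Format-Table Node, Network, Adapter, Address -AutoSize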

 

Can someone guide me through this process?

 

Thanks!

 

    

 


Hi,

 

Have a look at the following URL:

http://blogs.technet.com/b/askcore/archive/2014/02/20/configuring-windows-failover-cluster-networks.aspx

It might be easier if you remove the passive node from the cluster and disable the old 1Gbit connections. Then configure the 10Gbit NICs either with the new IPs or by reusing the old 1Gbit IPs; if you reuse the old IPs, you can connect the 10Gbit network back to the old LUNs plus the new LUNs and add the node back into the cluster.

 

To migrate the SQL data you can use attach/detach, or script the move process; there's a rough sketch below the link.

http://www.drawbackz.com/stack/28891/what-is-the-proper-way-to-move-a-database-from-one-drive-to-another-in-sql-server-2005.html
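
If you script it, a minimal sketch of the ALTER DATABASE ... MODIFY FILE approach would look something like this. The server (-S .), database, logical file names and paths are all placeholders, and it needs to run in a maintenance window with the SQL resources stopped:

    # Repoint the database files at the new LUNs (names and paths are placeholders)
    sqlcmd -S . -Q "ALTER DATABASE MyDb MODIFY FILE (NAME = MyDb_Data, FILENAME = 'E:\SQLData\MyDb.mdf');"
    sqlcmd -S . -Q "ALTER DATABASE MyDb MODIFY FILE (NAME = MyDb_Log, FILENAME = 'F:\SQLLogs\MyDb_log.ldf');"

    # Take the database offline, copy the files to the new drives, bring it back online
    sqlcmd -S . -Q "ALTER DATABASE MyDb SET OFFLINE WITH ROLLBACK IMMEDIATE;"
    Copy-Item 'D:\SQLData\MyDb.mdf' 'E:\SQLData\MyDb.mdf'
    Copy-Item 'D:\SQLLogs\MyDb_log.ldf' 'F:\SQLLogs\MyDb_log.ldf'
    sqlcmd -S . -Q "ALTER DATABASE MyDb SET ONLINE;"

Attach/detach achieves the same thing; this way just keeps the database registered in the instance.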

This might not be your preferred way, but there's more than one way to skin a cat.



I cannot use the 10Gbit network to connect to the old LUNs; the old and new SANs are unaware of each other. The old one is 1Gbit, the new one is 10Gbit.

 

Whilst you are planning downtime anyway, why don't you just change the IPs over as well? Or am I missing something?

 

Sorry, I don't follow. These two networks, while both 192.168.250.x, are physically unaware of each other. They have separate switches: one uses copper (Cat) cabling, the other uses SFP.



 

I might be out of my league here, so please don't think badly of me. You say they are completely unaware of each other, i.e. on different physical networks, right? So how are your subnets and gateway set up? 255.255.255.0 with 192.168.250.1, or something like that?

 

I'm afraid to say what I think... afraid I'm going to sound condescending by mentioning an easy mistake.


More information is required about your networking and SAN connections.

 

For your SQL cluster you should have an IP address for node 1, an IP address for node 2, and your cluster and MSDTC should have separate IPs?

 

Your storage NICs on each node should have their own IPs for the iSCSI connections?

 

Adding an additional 10Gbit network for iSCSI connections should have no effect, since you will be adding the new LUNs into the cluster; is it possible the binding order of the NICs has the new 10Gbit listed first?


I might be out of my league here, so please don't think badly of me. You say they are completely unaware of each other, i.e. on different physical networks, right? So how are your subnets and gateway set up? 255.255.255.0 with 192.168.250.1, or something like that?

 

I'm afraid to say what I think... afraid I'm going to sound condescending by mentioning an easy mistake.

There's no gateway, it's just 192.168.250.x and 255.255.255.0.

 

This setup has worked great for all my other migrations. I've migrated VMs to new LUNs on ESX hosts, a 1TB LUN on another server, as well as 2.4TB worth of files on our file server cluster.

 

I think my 10 steps in the first post are correct, but I'm just not quite confident yet. Also, for step 1, I'm not sure how to get the new IPs (NICs) into the FCM.


Any conflicting IPs across the two networks, or are all IPs unique on the individual networks? You could be having an issue with the server not knowing which path to take, since both adapters are on the same subnet. Yes, you will need to be able to put the path into the FCM, but it should see the iSCSI WWN/IQN of the new SAN, not necessarily the IP of that NIC. You would have to put that into the iSCSI initiator, then tell the FCM where to look.
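
On Server 2012 or later, the initiator side of that can be scripted roughly as below; on 2008 R2 the iSCSI Initiator control panel or iscsicli does the same job. The portal address and IQN here are placeholders for the new SAN:

    # Register the new SAN's portal from one of the new 10Gbit initiator IPs
    New-IscsiTargetPortal -TargetPortalAddress 192.168.250.50 -InitiatorPortalAddress 192.168.250.21

    # See which targets the new SAN presents, then log in persistently over the 10Gbit path
    Get-IscsiTarget
    Connect-IscsiTarget -NodeAddress 'iqn.2015-01.com.example:new-san-target' `
        -TargetPortalAddress 192.168.250.50 -InitiatorPortalAddress 192.168.250.21 `
        -IsPersistent $true -IsMultipathEnabled $true

Repeat the Connect-IscsiTarget per initiator IP (250.21 and 250.22) so MPIO has both 10Gbit paths.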


More information is required about your networking and SAN connections.

 

For your SQL cluster you should have an IP address for node 1, an IP address for node 2, and your cluster and MSDTC should have separate IPs?

 

Your storage NICs on each node should have their own IPs for the iSCSI connections?

 

Adding an additional 10Gbit network for iSCSI connections should have no effect, since you will be adding the new LUNs into the cluster; is it possible the binding order of the NICs has the new 10Gbit listed first?

--> SQL1
      - 1Gbit iSCSI: 192.168.250.25

--> SQL2
      - 1Gbit iSCSI: 192.168.250.26

 

The cluster, as well as the SQL cluster service, is on a different subnet --> 192.168.2.x. This will not be changing.

 

Also keep in mind that this is not a SQL cluster; it's simply an active/passive cluster of two servers. SQL1 is active, hosting the LUNs, while SQL2 is in standby, showing no LUNs. It's not an active/active SQL load balancer or anything like that.


Any conflicting IPs across the two networks, or are all IPs unique on the individual networks? You could be having an issue with the server not knowing which path to take, since both adapters are on the same subnet. Yes, you will need to be able to put the path into the FCM, but it should see the iSCSI WWN/IQN of the new SAN, not necessarily the IP of that NIC. You would have to put that into the iSCSI initiator, then tell the FCM where to look.

This is the problem I'm running into with the 250.x network now. When I enable the new NICs, it messes up the old NICs in the FCM. It appears that the FCM doesn't know how to prioritize or run all of the 250.x IPs simultaneously. So instead of putting all of the 250.x addresses into FCM, it seems to just pick whichever it feels like. This, of course, messes up the iSCSI cluster until I disable the new NICs again.

 

So going back to my step-by-step:

 

1. I need FCM to 'see' all of the 250.x IPs instead of just picking and choosing what it feels like

2. Once that's working, I can map the new LUNs to the servers

3. Add the new LUNs to FCM

4. etc.


Windows does not like to be multihomed: it cannot determine the correct path to take. You are going to have to disable one set of NICs. Given that it is the secondary server and the primary is functioning properly, this should not be a big issue. If you need to have both up, it may be best to raise the priority of one NIC over the other.

 

 

  1. Click Start, click Network, click Network and Sharing Center, and then click Change Adapter Settings.

  2. Press the ALT key, click Advanced, and then click Advanced Settings. If you are prompted for an administrator password or confirmation, type the password or provide confirmation.

  3. Click the Adapters and Bindings tab, and then, under Connections, click the connection you want to modify.

  4. Under Connections, select the adapter that you want to move up or down in the list, click the up or down arrow button, and then click OK.

     


 


 

OK, I modified it slightly. Now the list has 250.25 (the current 1Gbit) first, then 250.21, then 250.22. The latter two are the new IPs.

 

What should I expect to happen if I enable one or both of those new IPs?


What is the subnet? A /24 is not going to work; it will bomb out or give mixed functionality (sometimes up, sometimes down). If you segmented the subnet it would work, but it would require a bit of redesign on your part. Perhaps do a /27 instead of a /24. If there is no gateway it shouldn't be too hard to do; just make sure that the card IPs are within the same usable range on their subnet and you will be good to go. You can do a /25 if you want to split the 254 addresses in half.
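
To make that concrete: carving 192.168.250.0/24 into /27 blocks (mask 255.255.255.224) would give, for example, 192.168.250.1-.30 for the old 1Gbit gear and 192.168.250.33-.62 for the new 10Gbit gear; a /25 split (mask 255.255.255.128) simply halves the range into .1-.126 and .129-.254.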


Yeah, I'm still pretty stuck...

 

I enabled the new NICs on the passive node this morning. I then connected those NICs to the new SAN via the iSCSI Initiator. Now those new drives are available on the server. I will repeat these steps on the active node.

 

After this, I'm totally at a loss. I found out that I cannot add more than one NIC per subnet per host to the FCM, so there's no possible way to add the new 250_network IPs alongside the current IPs. What I would have to do is replicate the data, shut down all services, disable the current 1Gbit 250_network IPs, and revalidate the cluster. This should then add the new 250_network IPs to the FCM.

 

Even if that works, though, the FCM is going to freak out when it can't find its disks, since they'll be offline. I'd have to add the new disks and change drive letters? I just don't know.

 

The idea also came up today to make a new cluster. I don't think that's going to work because it'll still have the 1Gbit NICs taking precedence. Plus, I'd have to worry about new cluster IP addresses and changing every application in our domain that references the old cluster name. That would be a nightmare.

 

In the end, I need a way to gracefully disable the old 1Gbit stuff and bring the 10Gbit stuff online in its place...

 

[EDIT] On second thought, is this going to be a case where I need to destroy the current cluster, disable the 1Gbit stuff, and re-create the cluster with the 10Gbit stuff in place? Ugh...


This goes back to my initial link that I shared with you.

 

 

The Failover Clustering network driver detects networks on the system by their logical subnet. It is not recommended to assign more than one network adapter per subnet, including IPV6 Link local, as only one card would be used by Cluster and the other ignored.

http://windowsitpro.com/windows-server/six-common-problems-failover-clusters

This should only affect the cluster network; adding the iSCSI LUNs should be OK. You stated that you got the new LUNs added to your passive node? Have you tried to add the storage into the cluster?

https://technet.microsoft.com/en-us/library/cc733046.aspx
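
In PowerShell terms the "add storage" step is roughly the sketch below, assuming the new disks are already online and initialised on the node that owns them:

    # Disks that are visible to the cluster nodes and eligible for clustering
    Get-ClusterAvailableDisk

    # Add them all to the cluster's Available Storage
    Get-ClusterAvailableDisk | Add-ClusterDisk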

You also said it's a cluster but not a SQL cluster, so I'm unsure whether you're using a SQL mirror or cluster shared volumes (CSVs)?

 

Depending on your SQL version, you could consider adding a third node and configuring SQL AlwaysOn:

http://www.mssqltips.com/sqlservertip/3241/implement-a-sql-server-ha-failover-solution-without-shared-storage/

I think you need to re-read my original link and, in particular, the last comment:

http://blogs.technet.com/b/askcore/archive/2014/02/20/configuring-windows-failover-cluster-networks.aspx#pi169993=2

 

Failover Cluster deals with "networks", not network cards. So if your two storage connected cards are 1.1.1.1 and 1.1.1.2, from a Windows and MPIO perspective, that is fine and Windows handles it like it is supposed to when going to and from the storage. Failover Cluster on the other hand, uses the "networks" for its Cluster communication (heartbeats, joins, registry replication, etc) between the nodes. If it sees two cards on the same "network", it is only going to use one of the cards for its communication between the nodes. Cluster Validation will flag this setup, but you can ignore it as a possible misconfiguration because it is not. Since these are cards going out to your storage, you should have them disabled for Cluster use so that we do not use it for anything Cluster communication related. Right mouse click the network (Failover Cluster Manager / Networks) and choose Properties and you will see the setting. Set this network to be disabled for Cluster use.

 

Networks used for ISCSI communication with ISCSI software initiators is automatically disabled for Cluster communication (Do not allow cluster network communication on this network).

 

Networks configured without default gateway is automatically enabled for cluster communication only (Allow cluster network communication on this network).

 

Network configured with default gateway is automatically enabled for client and cluster communication (Allow cluster network communication on this network, Allow clients to connect through this network).
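
If the 250_network ever comes up with the wrong setting, the role can also be flipped from PowerShell instead of the GUI. A sketch, where '250_network' is whatever name Failover Cluster Manager shows for that network:

    # Role 0 = do not allow cluster communication (iSCSI only),
    # 1 = cluster communication only, 3 = cluster and client
    (Get-ClusterNetwork -Name '250_network').Role = 0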


Forgive my ignorance, I haven't touched this since 2010, but can't you re-IP the networks without screwing things up royally? I do remember being able to re-IP the servers without issue when doing this with Exchange failover clustering.

 

iSCSI should be its own network.

 

The failover cluster should have its own network to communicate on for heartbeats and active/passive transfer.

 

The servers themselves should be on an intranet network.

 

 

You should have a minimum of three NICs to be able to support all of this on different networks.


 

This goes back to my initial link that I shared with you.

 

"The Failover Clustering network driver detects networks on the system by their logical subnet. It is not recommended to assign more than one network adapter per subnet, including IPV6 Link local, as only one card would be used by Cluster and the other ignored."

 

--> I read this same thing yesterday. So that means I'd have to re-order the hierarchy and put the new IPs ahead of the current 1Gbit IPs.

This should only affect the cluster network; adding the iSCSI LUNs should be OK. You stated that you got the new LUNs added to your passive node? Have you tried to add the storage into the cluster?

 

--> Yes, I was able to add the new LUNs to the passive node just fine. I cannot add the storage into the cluster because the requirement is that the storage is visible to all nodes equally; so far, I haven't added the LUNs to the active node.

 

You also said it's a cluster but not a SQL cluster, so I'm unsure whether you're using a SQL mirror or cluster shared volumes (CSVs)?

 

--> AFAIK, I'm just using CSVs. There are four shared volumes in the FCM: Data, Logs, Backups, and Quorum. And you're right, it's not a SQL cluster; it's just two physical servers in active/passive.

 

Depending on your SQL version, you could consider adding a third node and configuring SQL AlwaysOn.

 

--> I'm not sure what AlwaysOn is, but I thought about a third node yesterday. I can readily add VMs if necessary; I might explore this route today.

 

My responses are in-line. Thanks!

 

Networks used for ISCSI communication with ISCSI software initiators is automatically disabled for Cluster communication (Do not allow cluster network communication on this network).

 

Networks configured without default gateway is automatically enabled for cluster communication only (Allow cluster network communication on this network).

 

Network configured with default gateway is automatically enabled for client and cluster communication (Allow cluster network communication on this network, Allow clients to connect through this network).

In our case, the 250_network is configured for the second option (cluster communication only), even though it's iSCSI communication.


Forgive my ignorance, I haven't touched this since 2010, but can't you re-IP the networks without screwing things up royally? I do remember being able to re-IP the servers without issue when doing this with Exchange failover clustering.

 

iSCSI should be its own network.

 

The failover cluster should have its own network to communicate on for heartbeats and active/passive transfer.

 

The servers themselves should be on an intranet network.

 

You should have a minimum of three NICs to be able to support all of this on different networks.

All of this is already in place.

 

The issue here is that I have new switches and a new SAN, and I have to use them in tandem with our old switches and old SAN until the cut-over can be made.

 

Remember, I've already done this with other storage in our environment. The issue here is that we're dealing with two clustered physical servers, not VMs, which makes this much more difficult, since I can't readily just create new VMs and swap them in and out. I did this recently with our file server cluster (also using FCM): I created a completely new file server cluster using VMs, migrated the data, and did the cut-over.

 

This leads me back to my edited post yesterday: am I just going to have to replicate the data, destroy the current cluster, and re-create it with the new 10Gbit equipment?


I would think it would be easier to re-IP everything versus breaking the cluster. You would have less downtime if you have a ton of data.

I assume you mean to disable the 1Gbit NIC and move the 10Gbit NIC up the hierarchy? Or do you mean something else?

 

If I did that, then the FCM is most definitely going to freak out when it can't find its four drives. I'm not sure what to do after that.


I apologize if you've already given an answer to this question, but is there any reason you can't / won't use a different subnet for the 10Gbit NICs?

 

192.168.251.X as opposed to 192.168.250.X.


I would mark the 10Gbit SAN connections as "Do not allow cluster network communication on this network" on both nodes. I would then allocate the LUNs to both nodes but not configure the storage initially. I would then reboot the passive node, check the cluster for alerts, then fail the active node over to the passive, reboot the new passive node, and check the cluster for alerts.

 

If there are no alerts and the cluster is OK, I would initialise the storage on the passive node. If you're using CSVs, use the wizard to add the storage; I would then reboot the passive node and check for alerts. Then fail over, reboot the new passive node, and check alerts. At this stage, if all is well, you're good to go.

 

Make sure the cluster network is at the top of the bindings order.
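
In rough PowerShell terms that sequence looks something like the sketch below. The group, disk, and network names are placeholders, and the quorum cmdlet syntax varies a little between OS versions:

    # Keep the cluster from using the iSCSI network for heartbeats
    (Get-ClusterNetwork -Name '250_network').Role = 0

    # Once the new LUNs are online/initialised on the owning node, add them to the cluster
    Get-ClusterAvailableDisk | Add-ClusterDisk

    # Fail the clustered services over so each node can be rebooted and checked in turn
    Move-ClusterGroup -Name 'SQL Group' -Node 'SQL2'

    # Point the quorum witness at a disk on the new SAN
    Set-ClusterQuorum -NodeAndDiskMajority 'New Quorum Disk'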


I apologize if you've already given an answer to this question, but is there any reason you can't / won't use a different subnet for the 10Gbit NICs?

 

192.168.251.X as opposed to 192.168.250.X.

In hindsight, yes, I should have used another subnet for this. At the time I was going into the project blind, so I just wanted to keep everything as neat and tidy as I could. So, yes, another subnet would have avoided this problem completely. At this point, though, I'm stuck with the 250.x subnet because everything else I've already migrated is configured that way too.

