OpenStack resizing of instances
One thing that is not always adequately explained in the OpenStack documentation is how instance resizing works and what it requires, especially when using KVM as the virtualisation provider with multiple compute nodes.
You might find something similar to the following in your logs, with no good documentation on how to fix it:
ERROR nova.compute.manager [req-7cb1c029-beb4-4905-a9d9-62d488540eda f542d1b5afeb4908b8b132c4486f9fa8 c2bfab5ad24642359f43cdff9bb00047] [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b] Setting instance vm_state to ERROR
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b] Traceback (most recent call last):
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5596, in _error_out_instance_on_exception
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b]     yield
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3459, in resize_instance
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b]     block_device_info)
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4980, in migrate_disk_and_power_off
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b]     utils.execute('ssh', dest, 'mkdir', '-p', inst_base)
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b]   File "/usr/lib/python2.7/site-packages/nova/utils.py", line 165, in execute
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b]     return processutils.execute(*cmd, **kwargs)
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b]   File "/usr/lib/python2.7/site-packages/nova/openstack/common/processutils.py", line 193, in execute
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b]     cmd=' '.join(cmd))
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b] ProcessExecutionError: Unexpected error while running command.
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b] Command: ssh 10.5.2.20 mkdir -p /var/lib/nova/instances/99736f90-db0f-4cba-8f44-a73a603eee0b
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b] Exit code: 255
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b] Stdout: ''
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b] Stderr: 'Host key verification failed.\r\n'
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b]
ERROR oslo.messaging.rpc.dispatcher [-] Exception during message handling: Unexpected error while running command.
Command: ssh 10.5.2.20 mkdir -p /var/lib/nova/instances/99736f90-db0f-4cba-8f44-a73a603eee0b
Exit code: 255
Stdout: ''
Stderr: 'Host key verification failed.\r\n'
When OpenStack's nova is instructed to resize an instance, by default it also moves the instance to a different compute node; it will not schedule the resized instance back onto the host it is already running on. There is a configuration flag to change this (allow_resize_to_same_host in nova.conf), but in my case I would rather the scheduler run again, especially if the instance size is changing drastically. During the resize, the compute node where the instance is currently running uses SSH to connect to the compute node where the resized instance will live, creates the instance directory there, and copies over the instance's disk and associated files.
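Since the failure in the log above is just the source node's ssh call to the destination, you can reproduce it by hand before retrying a resize. The helper below is a hypothetical sketch: run it as the nova user on a compute node and pass the other compute nodes' addresses. It assumes instances live under /var/lib/nova/instances (override with the made-up INSTANCES_PATH variable) and runs a read-only `test -d` instead of nova's `mkdir -p`. BatchMode=yes makes ssh fail instead of prompting, which roughly matches how nova runs it without a terminal.

```bash
#!/usr/bin/env bash
# Assumption: the instance directory is the same path on every node.
INSTANCES_PATH="${INSTANCES_PATH:-/var/lib/nova/instances}"

# Hypothetical helper: check_resize_ssh HOST...
# Tries a non-interactive ssh to each HOST, similar to the
# "ssh <dest> mkdir -p <instance dir>" call nova makes during a resize,
# but only tests that the instances directory exists.
check_resize_ssh() {
    local dest rc=0
    for dest in "$@"; do
        # -n: don't read stdin; BatchMode=yes: never prompt, just fail.
        if ssh -n -o BatchMode=yes "$dest" test -d "$INSTANCES_PATH"; then
            echo "OK: $dest"
        else
            echo "FAIL: $dest" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `check_resize_ssh 10.5.2.20` run as the nova user should print `OK: 10.5.2.20` once SSH is set up; a `FAIL` line accompanied by `Host key verification failed.` on stderr is the same error as in the trace.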
There are a couple of assumptions I will be making:
- Your nova and qemu users both have the same UID on all compute nodes.
- The path for your instances is the same on all of your compute nodes.
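If you would rather check the first assumption than take it on faith, something like the following compares a user's UID on the current node with the UID on each remote node, presumably so that files copied between nodes stay owned by the right account. It is a hypothetical sketch (uid_of and check_uids are made-up names) and assumes you can already ssh to the other nodes non-interactively, for example as root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: uid_of USER [HOST]
# Prints USER's numeric UID locally, or on HOST via non-interactive ssh.
uid_of() {
    local user="$1" host="${2:-}"
    if [ -z "$host" ]; then
        id -u "$user"
    else
        ssh -n -o BatchMode=yes "$host" id -u "$user"
    fi
}

# Hypothetical helper: check_uids USER HOST...
# Compares USER's local UID with the UID on each HOST and returns 1 if
# any host differs or cannot be queried.
check_uids() {
    local user="$1"; shift
    local local_uid remote_uid host rc=0
    local_uid=$(uid_of "$user") || return 1
    for host in "$@"; do
        if ! remote_uid=$(uid_of "$user" "$host"); then
            echo "$host: could not look up $user" >&2
            rc=1
        elif [ "$remote_uid" = "$local_uid" ]; then
            echo "$host: $user uid $remote_uid matches"
        else
            echo "$host: $user uid $remote_uid differs from local $local_uid" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Running `check_uids nova compute2 compute3` and `check_uids qemu compute2 compute3` (hostnames hypothetical) should report a match for every node.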
Configure the nova user
First things first, let's make sure our nova user has an appropriate shell set:
grep '^nova:' /etc/passwd
Verify that the last field is /bin/bash.
If not, let's modify the user and make it so:
usermod -s /bin/bash nova
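The check and the fix can be combined in a small sketch. It uses getent, which reads the passwd database through NSS (so it also finds users that are not in /etc/passwd itself), and only calls usermod when the shell actually differs. The function names are hypothetical; run it as root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: login_shell_of USER
# Prints USER's login shell, the 7th field of their passwd entry.
login_shell_of() {
    getent passwd "$1" | cut -d: -f7
}

# Hypothetical helper: ensure_bash_shell USER (run as root)
# Switches USER to /bin/bash with usermod, but only if needed.
ensure_bash_shell() {
    local user="$1" shell
    if ! getent passwd "$user" >/dev/null; then
        echo "no such user: $user" >&2
        return 1
    fi
    shell=$(login_shell_of "$user")
    if [ "$shell" != /bin/bash ]; then
        usermod -s /bin/bash "$user"
    fi
}
```

Usage: `ensure_bash_shell nova`.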
Generate SSH key and configuration
After doing this, all of the following steps are run as the nova user:
su - nova
We need to generate an SSH key:
ssh-keygen -t rsa
Follow the prompts, keep the default location (~/.ssh/id_rsa), and save the key WITHOUT a passphrase.
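If you are scripting this, ssh-keygen can skip the prompts: `-N ''` sets an empty passphrase and `-f` sets the output file. The sketch below (a hypothetical make_nova_key helper) takes the target directory as an argument and leaves an existing key alone, since ssh-keygen would otherwise ask whether to overwrite it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: make_nova_key [DIR]
# Creates DIR/id_rsa and DIR/id_rsa.pub (DIR defaults to ~/.ssh) with an
# empty passphrase, without prompting. An existing key is left untouched.
make_nova_key() {
    local dir="${1:-$HOME/.ssh}"
    mkdir -p "$dir"
    chmod 700 "$dir"
    if [ ! -f "$dir/id_rsa" ]; then
        # -q: quiet, -N '': empty passphrase, -f: output file
        ssh-keygen -q -t rsa -N '' -f "$dir/id_rsa"
    fi
}
```

As the nova user, `make_nova_key` with no argument writes to ~/.ssh, which is where the following steps expect the key.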
Next, we need to configure SSH to skip host key verification; otherwise you would have to SSH manually to every existing compute node and accept its key (and keep doing so for each new compute node you add).
cat << EOF > ~/.ssh/config
Host *
    StrictHostKeyChecking no
    UserKnownHostsFile=/dev/null
EOF
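A scripted version of the same config, as a hypothetical write_nova_ssh_config helper, also tightens the file's permissions, since ssh_config(5) requires the per-user config not to be writable by others. The optional second argument lets you narrow `Host *` to just your compute nodes, so host key checking stays on for anything else the nova user might connect to; the 10.5.2.* pattern below is only an example based on the address in the log.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write_nova_ssh_config DIR [PATTERN]
# Writes DIR/config (replacing any existing one) with host key checking
# disabled for hosts matching PATTERN (default "*", as in the heredoc above).
write_nova_ssh_config() {
    local dir="$1" pattern="${2:-*}"
    mkdir -p "$dir"
    cat > "$dir/config" <<EOF
Host $pattern
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
EOF
    # ssh rejects a per-user config that is writable by others.
    chmod 600 "$dir/config"
}
```

Usage: `write_nova_ssh_config ~/.ssh '10.5.2.*'`.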
Next, copy the contents of id_rsa.pub to authorized_keys and set the mode on it correctly:
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
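Note that `>` replaces any existing authorized_keys. If the nova account might already have one, a more cautious sketch (a hypothetical authorize_own_key helper) appends the public key only when it is missing and sets the conventional 700/600 permissions that sshd's StrictModes check accepts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: authorize_own_key DIR
# Adds DIR/id_rsa.pub to DIR/authorized_keys unless that exact line is
# already present, then sets conventional permissions.
authorize_own_key() {
    local dir="$1" pub
    pub=$(cat "$dir/id_rsa.pub") || return 1
    touch "$dir/authorized_keys"
    # -x: whole-line match, -F: treat the key as a fixed string
    if ! grep -qxF "$pub" "$dir/authorized_keys"; then
        printf '%s\n' "$pub" >> "$dir/authorized_keys"
    fi
    chmod 700 "$dir"
    chmod 600 "$dir/authorized_keys"
}
```

Usage: `authorize_own_key ~/.ssh`; running it twice leaves a single copy of the key.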
This should be all the SSH configuration you need. Now comes the important part: you will need to tar up the ~nova/.ssh directory and copy it to every single compute node you have provisioned. This way every compute node will be able to SSH to any other compute node to run the commands required to copy an instance over and resize it.
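One way to do the copy is to stream a tar archive over ssh instead of creating a tarball file. This is a sketch under several assumptions: push_nova_ssh is a hypothetical name, it runs from an account that can already ssh to every node as root (nova's own key is not on the other nodes yet), `~nova` resolves to the nova user's home on each remote node, and a nova group exists there.

```bash
#!/usr/bin/env bash
# Hypothetical helper: push_nova_ssh SRC_HOME HOST...
# Streams SRC_HOME/.ssh to each HOST and unpacks it into the nova user's
# home there. Assumes root ssh access and a "nova" group on every HOST.
push_nova_ssh() {
    local src_home="$1"; shift
    local host rc=0
    for host in "$@"; do
        # The remote command is single-quoted so ~nova is expanded by the
        # remote shell; -p keeps the 700/600 permissions from the archive.
        if tar -C "$src_home" -cf - .ssh |
            ssh "root@$host" 'tar -C ~nova -xpf - && chown -R nova:nova ~nova/.ssh'; then
            echo "copied to $host"
        else
            echo "failed: $host" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For example, as root on the node where you generated the key: `push_nova_ssh "$(getent passwd nova | cut -d: -f6)" compute2 compute3` (hostnames hypothetical).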
Reset state on existing ERROR'ed instances
If you have any instances that are currently in the ERROR state due to a failed resize, you can issue the following command to reset an instance's state back to active and try again:
nova reset-state --active <ID of instance>
This marks the instance as ACTIVE again (it only changes the state nova has recorded; it does not itself start or stop the guest), after which you can once again issue the resize command to resize the instance.
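If you want to script the retry, the sketch below chains the reset with a new resize and then polls `nova show` until the instance reaches VERIFY_RESIZE or falls back into ERROR. Only standard novaclient subcommands are used (reset-state, resize, show); retry_resize, server_status and the RESIZE_POLL_SECONDS variable are hypothetical, and the awk parsing assumes nova show's usual two-column table layout. It deliberately stops short of confirming, leaving the choice between `nova resize-confirm` and `nova resize-revert` to you.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the "status" row of `nova show`.
# Assumes the default table output, e.g. "| status | ACTIVE |".
server_status() {
    nova show "$1" | awk '$2 == "status" { print $4 }'
}

# Hypothetical helper: retry_resize SERVER FLAVOR [TRIES]
# Resets a failed resize and tries again. reset-state needs admin
# credentials (OS_* environment variables).
retry_resize() {
    local server="$1" flavor="$2" tries="${3:-60}" status
    nova reset-state --active "$server" || return 1
    nova resize "$server" "$flavor" || return 1
    while [ "$tries" -gt 0 ]; do
        status=$(server_status "$server")
        case "$status" in
            VERIFY_RESIZE)
                echo "$server resized; check it, then run: nova resize-confirm $server (or nova resize-revert $server)"
                return 0 ;;
            ERROR)
                echo "$server is in ERROR again; check the nova-compute logs" >&2
                return 1 ;;
        esac
        tries=$((tries - 1))
        # RESIZE_POLL_SECONDS is a made-up knob; defaults to 5 seconds.
        sleep "${RESIZE_POLL_SECONDS:-5}"
    done
    echo "timed out waiting for $server to reach VERIFY_RESIZE" >&2
    return 1
}
```

Usage: `retry_resize <ID of instance> <flavor>`.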