typedef int (*funcptr)();

An engineer's technical notebook

Mac OS X El Capitan Installer Removes Custom Group ID and Membership

As always, after Apple released their new operating system, my systems were upgraded. This time the upgrade was less of a surprise in terms of what it brings, because I'd been beta testing the new release for the past couple of weeks; however, I was still caught off guard.

On OS X, by default all user accounts start at ID 501 and count up, so if you have two accounts, you will have user IDs 501 and 502 in use. Most people will likely never change this, and all is well. The default group for all new user accounts is staff, which has a group ID of 20. So if you have a single account named, for example, janedoe, her user ID would be 501 and her group ID would be 20 (staff).

Coming from a FreeBSD world and running a lot of FreeBSD systems, user accounts start at 1001, and count up. When you create a new user account on FreeBSD, by default that user is also added to a group with the same name as the username, with the same ID. So you end up with an account with ID 1001 and default group ID 1001. Using the same example, a user named janedoe would have a user ID of 1001, and a group ID of 1001 (janedoe).

When I first installed OS X, and almost every single new installation since, I have followed these steps to change my user ID and group ID to match those on my FreeBSD systems:

  1. The assumption is that you have a separate account with administrator privileges on the local Mac, other than the one you are about to modify, that you can use temporarily; I create an "Administrator" account for exactly that reason.
  2. System Preferences
  3. Users and Groups
  4. Click the + (You may need to click the lock in the bottom left first)
  5. Change the dropdown to group
  6. Enter Full Name: janedoe
  7. Create group
  8. Right click on group (janedoe)
  9. Advanced Options...
  10. Change the Group ID to 1001
  11. Okay
  12. Right click on user (janedoe)
  13. Advanced Options...
  14. Change User ID from 501 to 1001
  15. Change Group from staff to janedoe
  16. Okay
  17. Close System Preferences
  18. Open Terminal, become root user (sudo su)
  19. cd /Users/janedoe
  20. find . -uid 501 -print0 | xargs -0 chown 1001:1001

This allows me to have the same user ID and group ID on both my Mac OS X and FreeBSD systems, thereby making it easier to use tools like rsync that preserve ownership and permissions, as well as to use NFS. Another way to do something similar is LDAP/Kerberos with a shared directory service, but that is a little heavy-handed for a home network.
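
Step 20 above can be rehearsed first without changing anything; here is a small Python sketch (my own helper, not part of any of these tools) that only lists the files owned by the old UID:

```python
import os

def files_owned_by(root, uid):
    """Dry run for the find/chown step: collect paths under root whose
    owner matches uid, without modifying anything."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_uid == uid:
                    matches.append(path)
            except OSError:
                pass  # unreadable entry, skip it
    return matches
```

Running files_owned_by('/Users/janedoe', 501) before the real chown shows exactly which files would be touched.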

This has worked for me without issues since OS X 10.8, even upgrading from 10.8 to 10.9 and then 10.10 did not change anything. However as soon as I did the upgrade to El Capitan (10.11) I noticed that all of my ls -lah output looked like this:

drwxr-xr-x+  13 xistence  1001   442B Oct  1 16:58 Desktop
drwx------+  28 xistence  1001   952B Aug 31 12:17 Documents
drwx------+  89 xistence  1001   3.0K Oct  1 15:56 Downloads
drwx------@  72 xistence  1001   2.4K Oct  2 00:16 Library

and id provided this valuable output:

uid=1001(xistence) gid=20(xistence) groups=20(xistence),12(everyone),61(localaccounts),399(com.apple.access_ssh),402(com.apple.sharepoint.group.2),401(com.apple.sharepoint.group.1),100(_lpoperator)

Wait, what happened to the staff group that I am supposed to be a member of, and why is my xistence group now stating its ID is 20 and not 1001 as I was expecting?

I wondered if the upgrade had messed up my group somehow, and that suspicion was confirmed when I checked with dscl.

$ dscl . -read /Groups/xistence
Password: *
PrimaryGroupID: 20
RealName: xistence
RecordName: xistence
RecordType: dsRecTypeStandard:Groups

Do note that the group xistence does not show up in System Preferences -> Users and Groups, so we'll have to do some command line magic.

Well, that's worrisome: why is this matching a built-in group's ID? Let's check the staff group specifically and make sure it still has the appropriate group ID.

$ dscl . -read /Groups/staff
GroupMembership: root
Password: *
PrimaryGroupID: 20
RealName: Staff
RecordName: staff BUILTIN\Users
RecordType: dsRecTypeStandard:Groups

Next I had to check to see what my user account was set to as the default group ID:

$ dscl . -read /Users/xistence
NFSHomeDirectory: /Users/xistence
Password: ********
PrimaryGroupID: 20
RealName: Bert JW Regeer
RecordName: xistence bertjw@regeer.org com.apple.idms.appleid.prd.53696d524c62372b48344a53755864634e4f374b32513d3d
RecordType: dsRecTypeStandard:Users
UniqueID: 1001
UserShell: /bin/bash

Well, that is not entirely what I was expecting either; at least it didn't touch my user ID. Time to fix things.

First let's change the xistence group's group ID to 1001, and then change the Primary Group ID for the user xistence to group ID 1001.

# dscl . -change /Groups/xistence PrimaryGroupID 20 1001
# dscl . -change /Users/xistence PrimaryGroupID 20 1001

After that id looked a little bit more sane:

uid=1001(xistence) gid=1001(xistence) groups=1001(xistence),12(everyone),61(localaccounts),399(com.apple.access_ssh),402(com.apple.sharepoint.group.2),401(com.apple.sharepoint.group.1),100(_lpoperator)

However, now the group staff is missing from the list of groups that the user xistence is a member of. I don't think that will hurt anything, but we still want to be able to read/write any folders designated as staff elsewhere in the OS, along with any other privileges that entails. So let's add the user xistence to the staff group:

# dscl . -append /Groups/staff GroupMembership xistence

Let's verify, and check id again:

uid=1001(xistence) gid=1001(xistence) groups=1001(xistence),12(everyone),20(staff),61(localaccounts),399(com.apple.access_ssh),402(com.apple.sharepoint.group.2),401(com.apple.sharepoint.group.1),100(_lpoperator)

For this to fully take effect, log out and log back in. This will make sure that all new files have the correct user ID/group ID set.

After the change to the Group ID, the group still doesn't show up in System Preferences -> Users and Groups, which I find weird since it is not a built-in group.

Luckily everything is back to the way it was before the upgrade, and my backup scripts and NFS shares work again without issues.

Cobbler with CentOS 7 failure to boot/kickstart

Over the past week I've been working on building out an instance of Cobbler and testing some of the provisioning that it is able to do. One of the operating systems that I wanted to deploy is CentOS 7.

After I imported the system into cobbler, it correctly showed up in the pxelinux boot menu and it would happily load the kernel and the initrd, however after initial bootup it would throw the following error message:

dracut-initqueue[867]: Warning: Could not boot.
dracut-initqueue[867]: Warning: /dev/root does not exist

         Starting Dracut Emergency Shell...
Warning: /dev/root does not exist

Generating "/run/initramfs/rdsosreport.txt"

Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view the system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report

After that it gives you a root shell.

Some Google searching led me to a mailing list post for Cobbler where someone mentioned that adding ksdevice=link to the Cobbler profile allowed the system to boot without issues.

However, before I implement a change I want to know why it fixes the issue, so I searched Google for "kickstart ksdevice" and found Red Hat's documentation on starting a kickstart installation. Searching that page for "ksdevice" led me to this section:


The installation program uses this network device to connect to the network. You can specify the device in one of five ways:

  • the device name of the interface, for example, eth0
  • the MAC address of the interface, for example, 00:12:34:56:78:9a
  • the keyword link, which specifies the first interface with its link in the up state
  • the keyword bootif, which uses the MAC address that pxelinux set in the BOOTIF variable. Set IPAPPEND 2 in your pxelinux.cfg file to have pxelinux set the BOOTIF variable.
  • the keyword ibft, which uses the MAC address of the interface specified by iBFT

For example, consider a system connected to an NFS server through the eth1 device. To perform a kickstart installation on this system using a kickstart file from the NFS server, you would use the command ks=nfs:<server>:/<path> ksdevice=eth1 at the boot: prompt.

While ksdevice=link would work for some of the machines I am deploying, it wouldn't work for most, since they have multiple interfaces and more than one of those interfaces would have its link up. What I really wanted was ksdevice=bootif, which is the most sensible default.
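
For reference, with IPAPPEND 2 pxelinux passes the boot interface as BOOTIF=01-aa-bb-cc-dd-ee-ff (a hardware-type octet plus the MAC with dashes). A small sketch of my own for decoding that value back into a normal MAC address:

```python
def bootif_to_mac(bootif):
    """Convert a pxelinux BOOTIF value (e.g. '01-00-12-34-56-78-9a')
    back into a colon-separated MAC address."""
    parts = bootif.split('-')
    if len(parts) == 7:  # leading hardware-type octet (01 = Ethernet)
        parts = parts[1:]
    return ':'.join(parts)
```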

So I modified the profile with ksdevice=link just to test, and that worked without issues. Then I modified the profile to use ksdevice=bootif instead, and this failed.

I figured I should check the pxelinux.cfg/default file that Cobbler generates upon issuing a cobbler sync and verify that ksdevice=bootif is actually listed correctly.

What I found was this:

LABEL CentOS-7.1-x86_64
        kernel /images/CentOS-7.1-x86_64/vmlinuz
        MENU LABEL CentOS-7.1-x86_64
        append initrd=/images/CentOS-7.1-x86_64/initrd.img ksdevice=${net0/mac} lang=  kssendmac text  ks=
        ipappend 2

This has ksdevice=${net0/mac}, which is not what I had put in the profile. Overriding ksdevice in the profile with ksdevice=link did correctly put that into the pxelinux.cfg/default file, so Cobbler was overwriting my bootif setting somehow.

A quick search for ${net0/mac} led me to a page about gPXE command-line settings that contained the same variable, at which point I remembered that in Cobbler you set up each profile to be gPXE enabled or not. The default when you import an image is to enable gPXE support.

cobbler profile report  --name=CentOS-7.1-x86_64

Name                           : CentOS-7.1-x86_64
TFTP Boot Files                : {}
Comment                        : 
DHCP Tag                       : default
Distribution                   : CentOS-7.1-x86_64
Enable gPXE?                   : True
Enable PXE Menu?               : 1

So let's modify the profile to disable gPXE support:

cobbler profile edit --name=CentOS-7.1-x86_64 --enable-gpxe=False
cobbler sync

Verify that the change was made:

cobbler profile report  --name=CentOS-7.1-x86_64

Enable gPXE?                   : False

Then let's take a look at our pxelinux.cfg/default file and make sure that it looks correct:

LABEL CentOS-7.1-x86_64
        kernel /images/CentOS-7.1-x86_64/vmlinuz
        MENU LABEL CentOS-7.1-x86_64
        append initrd=/images/CentOS-7.1-x86_64/initrd.img ksdevice=bootif lang=  kssendmac text  ks=
        ipappend 2

This time our ksdevice is correctly set. Upon rebooting my PXE booted server it picked up the correct interface, made a DHCP request and kickstarted the server using the provided kickstart file, and installation completed successfully.
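
To sanity-check generated configs like this in the future, the ksdevice value can be pulled out of each append line; a rough sketch (my own helper, not part of Cobbler):

```python
import re

def ksdevice_of(append_line):
    """Return the ksdevice= value from a pxelinux append line,
    or None if it is not present."""
    match = re.search(r'ksdevice=(\S+)', append_line)
    return match.group(1) if match else None
```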

So unless you chain-boot gPXE from pxelinux by default, make sure that your profiles are not set to be gPXE enabled if you want to use them directly from the pxelinux menu.

While researching more for this article, I found a blog post by Vlad Ionescu about PXE installing RHEL 7 from Cobbler in which he suggests disabling ksdevice entirely and adding an extra inst.repo variable to the kernel command line. However, on older versions of CentOS 7 and Red Hat Enterprise Linux 7 there is a bug report showing that an empty ksdevice could cause anaconda to crash, and setting a manual inst.repo for every profile seems like overkill when simply disabling gPXE for the profile also solves the problem.

Neutron L3 agent with multiple provider networks

Due to requirements outside of my control, I needed to run multiple "provider" networks, each providing its own floating address pool, from a single network node. I wanted to do this as simply as possible using a single l3 agent, rather than figuring out how to get systemd to start multiple agents with different configuration files.

Currently I've installed and configured an OpenStack instance that looks like this:

|                     |
|                  +--+----+
|                  |       |
|      +-----------+-+  +--+----------+
|      | Compute     |  | Compute     |
|      |     01      |  |     02      |
|      +------+------+  +-----+-------+
|             |               |
|             |               +----------+
|             +------------+--+          |
|                          |             |
| +-------------+    +-----+-------+     |
| | Controller  |    |   Network   |     |
| |             |    |             |     +---+  Tenant Networks (vlan tagged) (vlan ID's 350 - 400)
| +-----+----+--+    +------+----+-+
|       |    |              |    |
|       |    |              |    +-----------+  Floating Networks (vlan tagged) (vlan ID's 340 - 349)
|       |    |              |
|       |    |              |
+------------+--------------+----------------+  Management Network
        +------------------------------------+  External API Network

There are two compute nodes, a controller node that runs all of the API services, and a network node that is strictly used for providing network functions (routers, load balancers, firewalls, all that fun stuff!).

There are two flat networks that provide the following:

  1. External API access
  2. A management network that OpenStack uses internally for communication between services, which is not accessible from the other three networks.

The other two networks are both vlan tagged:

  1. Tenant networks, with the possibility of 50 vlan ID's
  2. Floating networks, with existing vlan ID's for existing networks

Since the OpenStack Icehouse release, the l3 agent has supported using the Open vSwitch configuration to specify how traffic should be routed, rather than statically defining that a single l3 agent routes certain traffic to a single Linux bridge. Setting this up is fairly simple if you follow the documentation, with one caveat: variables you would expect to default to no value actually have defaults set, and thus need to be explicitly zeroed out.

On the network node

First, we need to configure the l3 agent, so let's set some extra variables in /etc/neutron/l3-agent.ini:

gateway_external_network_id =
external_network_bridge =

It is important that these two are set and not left commented out; unfortunately, when commented out they have defaults set and the agent will fail to work, so explicitly setting them to blank fixes that issue.

Next, we need to set up our Open vSwitch configuration. In /etc/neutron/plugin.ini the following needs to be configured:

  • bridge_mappings
  • network_vlan_ranges

Note that these may already be configured, in which case there is nothing left to do. Mine currently looks like this:

bridge_mappings = tenant1:br-tnt,provider1:br-ex

This basically specifies that any networks created under "provider name" tenant1 are going to be mapped to the Open vSwitch bridge br-tnt and any networks with "provider name" provider1 will be mapped to br-ex.

br-tnt is mapped to my tenant network and on the switch has vlan ID's 350 - 400 assigned, and br-ex has vlan ID's 340 - 349 assigned.

Given the above, my network_vlan_ranges is configured as follows:

network_vlan_ranges = tenant1:350:400,provider1:340:349
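
Both settings share the same comma-separated, colon-delimited format keyed by provider name; a small sketch (hypothetical helpers of my own) for parsing them, so the two files can be compared programmatically:

```python
def parse_vlan_ranges(value):
    """Parse 'name:start:end,...' into {name: (start, end)}."""
    ranges = {}
    for item in value.split(','):
        name, start, end = item.split(':')
        ranges[name] = (int(start), int(end))
    return ranges

def parse_bridge_mappings(value):
    """Parse 'name:bridge,...' into {name: bridge}."""
    return dict(item.split(':') for item in value.split(','))
```

Comparing the key sets of the two results quickly shows whether every vlan range has a matching bridge mapping.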

Make sure to restart all neutron services:

openstack-service restart neutron

On the controller (where neutron-server lives)

On the controller we just need to make sure that our network_vlan_ranges matches what is on the network node, with one exception: we do not list the provider1 vlan ranges, since we don't want those to be accidentally assigned when a regular tenant creates a new network.

So our configuration should list:

network_vlan_ranges = tenant1:350:400

Make sure that all neutron services are restarted:

openstack-service restart neutron

Create the Neutron networks

Now, as an administrative user we need to create the provider networks.

source ~/keystonerc_admin

neutron net-create "" \
--router:external True \
--provider:network_type vlan \
--provider:physical_network provider1 \
--provider:segmentation_id 340

neutron net-create "" \
--router:external True \
--provider:network_type vlan \
--provider:physical_network provider1 \
--provider:segmentation_id 341

Notice how we've created two networks, given them each individual names (I like to use the name of the network they are going to be used for), and attached them to provider1. Note that provider1 is completely administratively defined and could just as well have been physnet1, so long as it is consistent across all of the configuration files.

Now let's create subnets on these networks:

neutron subnet-create "" \
--allocation-pool start=,end= \
--disable-dhcp --gateway

neutron subnet-create "" \
--allocation-pool start=,end= \
--disable-dhcp --gateway

Now that these networks are defined, you should be able to have tenants create routers and set their gateways to either of these new networks by selecting from the drop-down in Horizon or by calling neutron router-gateway-set <router id> <network id> on the command line.

The l3 agent will automatically configure and set up the router as required on the network node, and traffic will flow to either vlan 340 or vlan 341 as defined above depending on what floating network the user uses as a gateway.

This drastically simplifies the configuration of multiple floating IP networks since no longer is there a requirement to start up and configure multiple l3 agents each with their own network ID configured. This makes configuration less brittle and easier to maintain over time.

OpenStack resizing of instances

One thing that is not always adequately explained in the OpenStack documentation is how exactly instance resizing works, and what is required, especially while using KVM as the virtualisation provider, with multiple compute nodes.

You might find something similar to the following in your logs, and no good documentation on how to fix it.

ERROR nova.compute.manager [req-7cb1c029-beb4-4905-a9d9-62d488540eda f542d1b5afeb4908b8b132c4486f9fa8 c2bfab5ad24642359f43cdff9bb00047] [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b] Setting instance vm_state to ERROR
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b] Traceback (most recent call last):
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5596, in _error_out_instance_on_exception
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b]     yield
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3459, in resize_instance
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b]     block_device_info)
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4980, in migrate_disk_and_power_off
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b]     utils.execute('ssh', dest, 'mkdir', '-p', inst_base)
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b]   File "/usr/lib/python2.7/site-packages/nova/utils.py", line 165, in execute
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b]     return processutils.execute(*cmd, **kwargs)
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b]   File "/usr/lib/python2.7/site-packages/nova/openstack/common/processutils.py", line 193, in execute
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b]     cmd=' '.join(cmd))
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b] ProcessExecutionError: Unexpected error while running command.
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b] Command: ssh mkdir -p /var/lib/nova/instances/99736f90-db0f-4cba-8f44-a73a603eee0b
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b] Exit code: 255
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b] Stdout: ''
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b] Stderr: 'Host key verification failed.\r\n'
TRACE nova.compute.manager [instance: 99736f90-db0f-4cba-8f44-a73a603eee0b] 
ERROR oslo.messaging.rpc.dispatcher [-] Exception during message handling: Unexpected error while running command.
Command: ssh mkdir -p /var/lib/nova/instances/99736f90-db0f-4cba-8f44-a73a603eee0b
Exit code: 255
Stdout: ''
Stderr: 'Host key verification failed.\r\n'

When OpenStack's nova is instructed to resize an instance, it will also change the host it is running on; it will almost never schedule the instance on the host it already lives on and do the resize in place. There is a configuration flag to change this, but in my case I would rather the scheduler be run again, especially if the instance size is changing drastically. During the resize process, the node where the instance is currently running will use SSH to connect to the node where the resized instance will live, and copy over the instance and its associated files.

There are a couple of assumptions I will be making:

  1. Your nova and qemu users each have the same UID on all compute nodes
  2. The path for your instances is the same on all of your compute nodes
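
Assumption 1 can be checked quickly by comparing the relevant passwd entries from each node; a hypothetical helper of my own:

```python
def uid_of(passwd_text, username):
    """Return the numeric UID for username from /etc/passwd-style text,
    or None if the user is absent."""
    for line in passwd_text.splitlines():
        fields = line.split(':')
        if fields and fields[0] == username:
            return int(fields[2])  # third field is the UID
    return None
```

Run it against the /etc/passwd contents gathered from each compute node and make sure the nova (and qemu) UIDs all agree.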

Configure the nova user

First things first, let's make sure our nova user has an appropriate shell set:

cat /etc/passwd | grep nova

Verify that the last field is /bin/bash.

If not, let's modify the user and make it so:

usermod -s /bin/bash nova

Generate SSH key and configuration

After doing this the next steps are all run as the nova user.

su - nova

We need to generate an SSH key:

ssh-keygen -t rsa

Follow the directions, and save the key WITHOUT a passphrase.

Next up, we need to configure SSH to skip host key verification, unless you want to manually SSH to every existing compute node and accept its key (and continue to do so for each new compute node you add).

cat << EOF > ~/.ssh/config
Host *
    StrictHostKeyChecking no
EOF

Next we need to make sure we copy the contents of id_rsa.pub to authorized_keys and set the mode on it correctly.

cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

This should be all the SSH configuration you need to do. Now comes the important part: you will need to tar up and copy the ~nova/.ssh directory to every single compute node you have provisioned. This way all compute nodes will be able to SSH to the remote host to run the commands required to copy an instance over and resize it.

Reset state on existing ERROR'ed instances

If you have any instances that are currently in the ERROR state due to a failed resize, you can issue the following command to reset the state back to active and try again:

nova reset-state --active <ID of instance>

This will put the instance back into the active state, and you will once again be able to issue the resize command to resize the instance.

Build numbers in binaries using waf

My build system of choice these days for any C++ project is waf. One of the things I always like having is the build number included in the final binary, so that a simple ./binary --version (or even just running ./binary) prints the version it was built from. This can make it much simpler to debug potential issues, especially if fixes have already been made but a stale binary was deployed.

Set up the wscript

Make sure that your wscript somewhere near the top contains the following:

APPNAME = 'myapp'
VERSION = '0.0.0'

Then in your configure(cfg) add the following:


git_version = try_git_version()

if git_version:
    cfg.env.VERSION += '-' + git_version

The try_git_version() function is fairly simple and looks like this:

def try_git_version():
    import os

    version = None
    try:
        version = os.popen('git describe --always --dirty --long').read().strip()
    except Exception as e:
        print(e)
    return version

It runs git describe --always --dirty --long, which will return something along these lines: 401b85f-dirty. If you have any annotated tags, it will return the tag name as well.

If git is not installed, or the directory is not a valid git repository, the function simply returns None (or an empty string), and the if git_version: check above falls back to the VERSION variable set at the top of the wscript.

Now that we have our configuration environment set up with the VERSION we want to get that into a file that we can then include in our C++ source code.

Create a build_version.h.in file


#ifndef BUILD_VERSION_H_IN_941AD1F24D0A9D
#define BUILD_VERSION_H_IN_941AD1F24D0A9D

char VERSION[] = "@VERSION@";

#endif /* BUILD_VERSION_H_IN_941AD1F24D0A9D */

Add the following to build(ctx)

ctx(
    features = 'subst',
    source = 'build_version.h.in',
    target = 'build_version.h',
    VERSION = ctx.env['VERSION'],
)

This uses the substitution feature to transform build_version.h.in into build_version.h, while inserting the version into the file.
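
The substitution itself amounts to replacing @NAME@ tokens in the template; roughly like this simplified sketch (my own illustration, not waf's actual implementation):

```python
import re

def subst(template, **values):
    """Replace each @NAME@ token in template with values['NAME'],
    mimicking what waf's subst feature does to the file contents."""
    return re.sub(r'@(\w+)@', lambda m: values[m.group(1)], template)
```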

Include build_version.h in your source code

#include "build_version.h"

And add something along these lines to your main():

std::cerr << "Version: " << VERSION << std::endl;

This will print out the VERSION that has been stored in build_version.h.

Full example

Check out my mdns-announce project on GitHub for an example of how this is implemented.