22:39 07 Aug 2012

IPv6 -- getaddrinfo() and bind() ordering with V6ONLY

Recently I ran into an issue that took me a while to sort out: inconsistent behaviour across operating systems with IPv6 sockets (AF_INET6¹) when calling bind(2) on the results returned by getaddrinfo(3).

A call to getaddrinfo() with ai_family set to AF_UNSPEC and ai_flags set to AI_PASSIVE in the hints returns one or more results that we can bind() to. Sample code for that looks like this:

struct addrinfo hints, *addrlist, *addr;

memset(&hints, 0, sizeof(hints));

// Ask for TCP
hints.ai_socktype = SOCK_STREAM;

// Any family works for us ...
hints.ai_family = AF_UNSPEC;

// Set some hints
hints.ai_flags = 
            AI_PASSIVE    | // We want to use this with bind
            AI_ADDRCONFIG;  // Only return IPv4 or IPv6 if they are configured

int rv;

if ((rv = getaddrinfo(0, "7020", &hints, &addrlist)) != 0) {
    fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rv));
    return 1;
}

// Use the list in *addrlist
for (addr = addrlist; addr != 0; addr = addr->ai_next) {
    // use *addr as appropriate
}

// Clean up the memory from getaddrinfo()
freeaddrinfo(addrlist);

On Linux, two entries are returned when the host has both IPv4 and IPv6 enabled: an AF_INET entry followed by an AF_INET6 entry. You are not required to use every result that is returned, but if you want to listen on all address families it is of course the suggested approach.

Following the steps below for each returned result should leave you with one or more sockets, all bound to the same port.

1. Create the socket()
2. Set any socket options you want (SO_REUSEADDR for example)
3. Then bind() the socket
4. After that call listen() (followed of course by accept() on the socket)

On Linux, however, bind() fails when you get to the AF_INET6 entry, which was returned second, and errno is of little help in working out why. Searching online for why the bind would fail doesn't turn up any good results, and what is even worse is that the same code runs without failure on other platforms such as FreeBSD, OpenIndiana or Mac OS X. I started suspecting something was up when I looked at the output of netstat -lan | grep 7020 on Mac OS X, where 7020 is the port I passed into getaddrinfo().

tcp46      0      0  *.7020                 *.*  LISTEN     
tcp4       0      0  *.7020                 *.*  LISTEN

Wait a minute ... one of the sockets is on both IPv4 and IPv6. After some more time spent searching the internet I came across RFC 3493 section 5.3, which is titled "IPV6_V6ONLY option for AF_INET6 Sockets".

As stated in section <3.7 Compatibility with IPv4 Nodes>, AF_INET6 sockets may be used for both IPv4 and IPv6 communications. Some applications may want to restrict their use of an AF_INET6 socket to IPv6 communications only.

This was going down the right route, so I changed step 2 of the list above to run the following code when the socket's family is AF_INET6:

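int yes = 1;

if (addr->ai_family == AF_INET6 &&
    setsockopt(sockfd, IPPROTO_IPV6, IPV6_V6ONLY, &yes, sizeof(yes)) == -1) {
    // Report the error before close() has a chance to overwrite errno
    fprintf(stderr, "setsockopt: %s IPV6_V6ONLY\n", strerror(errno));
    close(sockfd);
    continue;
}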

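RFC 3493 section 5.3 also states that this option should be turned off by default, which means that by default every IPv6 socket can also communicate over IPv4. With the option off, the AF_INET6 wildcard socket also covers IPv4 on that port, which on Linux conflicts with the AF_INET socket that is already bound. Thus setting the option explicitly in code is technically the best way to fix the issue. FreeBSD has had this option turned on by default (as in IPv6 sockets can only communicate with IPv6 and NOT IPv4) since 5.x.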
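Since the default differs between platforms, it can help to ask the kernel what a socket's IPV6_V6ONLY setting actually is. The following is a small sketch; get_v6only() and set_v6only() are hypothetical helper names introduced here for illustration.

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: returns the current IPV6_V6ONLY setting of fd
 * (0 or 1), or -1 if getsockopt() fails. */
static int get_v6only(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &value, &len) == -1)
        return -1;

    return value != 0;
}

/* Hypothetical helper: restrict an AF_INET6 socket to IPv6 traffic only.
 * Returns 0 on success, -1 on error. Call it before bind(). */
static int set_v6only(int fd)
{
    int yes = 1;

    return setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &yes, sizeof(yes));
}
```

Calling get_v6only() on a freshly created AF_INET6 socket reports the platform's default; calling it again after set_v6only() should report 1.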

The biggest issue is that the other operating systems (OS X and OpenIndiana) don't behave the same way as Linux, which makes troubleshooting more difficult than it should be. The underlying problem is that the RFC doesn't specify what exactly the operating system should do when it encounters a request to bind to the same port on both IPv4 and IPv6. The only place where I have found this documented is in "IPv6 Network Programming", in chapter 4 ("Tips in IPv6 Programming"), section 4, appropriately titled "bind(2) Ordering and Conflicts".

If you get a bind() error when attempting to bind an AF_INET6 socket, make sure that you set the IPV6_V6ONLY socket option on it. The default as required by RFC 3493 is for that option to be off. The default is wrong, and the RFC should have been more specific about the right behaviour when binding an AF_INET6 socket while an AF_INET socket is already bound to the same port and IPV6_V6ONLY is off.

The full code that I used for testing, along with a little more information, is available as a gist on GitHub.

¹ The old BSD-style socket() called for defines starting with PF_, such as PF_INET and PF_INET6, with the PF standing for protocol family. POSIX starts them with AF_, and calls them an address family. On almost every operating system PF_INET is the same as AF_INET. If the define doesn't exist you can always create it.
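Creating the missing defines could look like this. It is a sketch that assumes the PF_ and AF_ values are interchangeable on the target platform, which is the common case described above.

```c
#include <sys/socket.h>

/* Fall back to the AF_ values when the PF_ names are not provided. */
#ifndef PF_INET
#define PF_INET AF_INET
#endif

#ifndef PF_INET6
#define PF_INET6 AF_INET6
#endif
```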

00:10 18 Jan 2009

Upgrading an OpenSolaris ipkg zone

Recently I was working on my OpenSolaris machine (a file server; ZFS rocks, and I will write more on that later), which has one non-global zone that I use for web development, aptly called dev-web.

So the following shows up when I run zoneadm:

xistence@Keyhole.network.lan:~# zoneadm list -iv
  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              native   shared
   4 dev-web          running    /storage/zones/dev-web         ipkg     shared

The thing is, I had upgraded the global zone with the latest update available for the version (snv_101b):

pkg image-update -v

This had not upgraded my one non-global zone. Running pkg image-update from within the zone itself is not possible either, because you can't upgrade a "live" system: image-update wants to create a new bootable environment, something that was already created when I upgraded the global zone. So what we have to do is mount the non-global zone on /mnt, tell pkg with -R where to find it, and upgrade it anyway!

First we are going to halt the zone. Since I am mean and nothing important is running on this test bed system, I just used:

zoneadm -z dev-web halt

However, the better way is of course to use the shutdown command:

zlogin dev-web shutdown

and then for good measure a halt!

Next up, looking at what we are going to be mounting.

zfs list
NAME                               USED  AVAIL  REFER  MOUNTPOINT
rpool                             11.9G  42.7G    75K  /rpool
rpool/ROOT                        2.93G  42.7G    18K  legacy
rpool/ROOT/opensolaris            7.34M  42.7G  2.74G  /
rpool/ROOT/opensolaris-1          2.92G  42.7G  2.74G  /
rpool/dump                        1.87G  42.7G  1.87G  -
rpool/export                      5.27G  42.7G    19K  /export
rpool/export/home                 5.27G  42.7G    21K  /export/home
rpool/export/home/guest           5.24G  14.8G  5.24G  /export/home/guest
rpool/export/home/xistence        37.0M  42.7G  37.0M  /export/home/xistence
rpool/swap                        1.87G  43.0G  1.51G  -
storage                            493G  3.08T  35.1K  /storage
storage/media                      398G  3.08T   398G  /storage/media
storage/virtualbox                2.10G  3.08T  2.10G  /storage/virtualbox
storage/xistence                  91.7G  3.08T  91.7G  /storage/xistence
storage/zones                      957M  99.1G  30.4K  /storage/zones
storage/zones/dev-web              957M  19.1G  32.0K  /storage/zones/dev-web
storage/zones/dev-web/ROOT         957M  19.1G  28.8K  legacy
storage/zones/dev-web/ROOT/zbe    1.60M  19.1G   936M  legacy
storage/zones/dev-web/ROOT/zbe-1   955M  19.1G   935M  legacy

When pkg image-update was run on the global zone it created a new bootable environment named opensolaris-1. The cool thing is that beadm will at the same time also create a new bootable environment for your zones. That way you can upgrade your zones afterwards, and if stuff does not work you can revert the ENTIRE machine back to its previous state (ZFS is cool like that), zones included, so that there are no incompatibilities.

So what we are looking for in this case is zbe-1. This is the new root for the zone that we need to update, so we now need to mount it.

mount -F zfs storage/zones/dev-web/ROOT/zbe-1 /mnt

Note that there is no / in front of storage. This is because we are specifying a ZFS dataset name rather than a filesystem path: the dataset's mountpoint is legacy, so there is no "real" path defined as /storage/zones/dev-web/ROOT/zbe-1. Now that it is mounted, we are able to pass the -R flag to pkg to get it to update our zone:

xistence@Keyhole.network.lan:~# pkg -R /mnt image-update -v
Creating Plan / Before evaluation:     
UNEVALUATED:
+pkg:/entire@0.5.11,5.11-0.101:20081204T010954Z

After evaluation:
pkg:/entire@0.5.11,5.11-0.101:20081119T235706Z -> pkg:/entire@0.5.11,5.11-0.101:20081204T010954Z
Actuators:

None
PHASE                                        ACTIONS
Update Phase                                     1/1 
PHASE                                          ITEMS
Reading Existing Index                           9/9 
Indexing Packages                                1/1

---------------------------------------------------------------------------
NOTE: Please review release notes posted at:
   http://opensolaris.org/os/project/indiana/resources/relnotes/200811/x86/
---------------------------------------------------------------------------

Voilà, the deed is done. The last step is of course to unmount the zone, so that we can then issue a zoneadm boot command to start it back up:

umount /mnt
zoneadm -z dev-web boot

And then on the zone after we log into it (over SSH in my case):

xistence@webdev.network.lan:~# pkg list -u
No installed packages have available updates

Which is exactly what we wanted! Your zone is now upgraded with the latest version available from the global zone.

typedef int (*funcptr)();

An engineers technical notebook
