typedef int (*funcptr)();

An engineer's technical notebook

Tamper-proof session cookies and session storage

As a follow-up to my previous article, User sessions, what data should be stored where?, I want to discuss how to store the session and how to generate cookies that are tamper-proof.

What are we trying to accomplish?

Ultimately we want to be able to tie some number of pieces of data to a particular user. Because HTTP is a stateless protocol, we have to use cookies. Cookies are small pieces of data that are transmitted from the server to the client (generally once); when the user comes back to the website, they are transmitted from the client to the server. This allows us to uniquely track a single user across connections to our website.
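
Concretely, a cookie is nothing more than a pair of HTTP headers. The server sets it (the value here is made up for illustration):

Set-Cookie: session=d41d8cd98f00b204; Path=/; HttpOnly

and the browser sends it back on every subsequent request:

Cookie: session=d41d8cd98f00b204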

If the website allows a user to authenticate, and the fact that they are authenticated is stored in the session, we also want to make sure that we can aggressively expire a session; whether this is possible depends on our session storage.

Session storage

There are a multitude of ways to store session data, but they ultimately boil down to server-side or client-side. Server-side storage can be done in Cassandra, Memcached, Redis, or even a SQL database.

Server-side storage

The approach that has been used for years is server-side storage: a small file on the server's hard drive contains the data, and the client is sent a cookie containing a unique identifier that is linked to the on-disk storage.

For example:

1 => /tmp/session_1
2 => /tmp/session_2
...
N => /tmp/session_N

Easily expire sessions

The upside to server-side storage is that it is very easy for us to expire a session: simply remove the associated file/data, and the user's session is now invalid.

Client-side storage

The other method, which has recently started seeing more use because it makes the server side easier to scale, is to store the session data, encoded in base64, in the cookie itself. In this case there is no unique session ID, and no data is stored server-side.
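
To illustrate, the cookie value for a client-side session might be nothing more than the session data serialised and base64-encoded. A minimal Python sketch (a real implementation would also sign the value, as discussed below):

import base64
import json

session = {"user_id": 950, "csrf": "abc123"}
cookie_value = base64.urlsafe_b64encode(json.dumps(session).encode())

# The value (b'eyJ1c2Vy...') looks opaque but is trivially decodable
assert json.loads(base64.urlsafe_b64decode(cookie_value)) == session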

Expiration is more difficult

The downside to using client-side storage is that there is no way, short of the expiration on the cookie itself, for the website to expire a session. There are work-arounds, but they all require storing state server-side. A hybrid approach, for example, is possible: store a unique ID along with the session data, and store that unique ID server-side, but none of the other data. To expire a session, remove its unique ID server-side; if we receive a session containing a unique ID we don't recognise, we simply clear the session.
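
A minimal sketch of that hybrid approach, using an in-memory set as a stand-in for a server-side store such as Redis (all names here are hypothetical):

import uuid

valid_ids = set()  # stand-in for Redis/Memcached/SQL; the only server-side state

def create_session(data):
    session = dict(data)
    session['id'] = uuid.uuid4().hex
    valid_ids.add(session['id'])
    return session

def load_session(session):
    # The ID was removed server-side, so this session has been expired
    if session.get('id') not in valid_ids:
        return {}
    return session

def expire_session(session_id):
    valid_ids.discard(session_id)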

Expiration, why do I care?

Being able to easily expire a user's sessions allows for extra security measures. For example, Google Mail makes it possible to sign out all other locations, which forces those other locations to re-authenticate before gaining access to your account.

This is a good security measure to have: if a user's cookie is stolen, or their credentials are compromised, then upon changing their password all their sessions are invalidated, and an attacker using an old cookie/session ID can't continue to wreak havoc on the user's account.

Cookie format

Whether we are storing just a session ID or the full session, the cookie should be hardened so that it cannot be tampered with by the client. Even if the cookie is protected by SSL, we still don't want to allow a malicious user to modify it to change the session ID or the session itself.

Signing your cookie

The single best way to make sure your cookie has not been tampered with is to cryptographically sign it, and upon receiving the cookie from the client, verify that the signature matches what you are expecting. This is especially important if you are using client-side storage, because you don't want someone to be able to change the user ID from 950 to 1 and suddenly impersonate a different user.

Use an HMAC

HMAC (hash-based message authentication code) is a cryptographic construct that uses a hashing algorithm (SHA-1, SHA-256, SHA-3) to create a MAC (message authentication code) with a secret key. Given the secret key and the original data, it is very easy to create the MAC, but it is very difficult, if not impossible, to take the original data and the MAC and recover the secret key.

This allows us to do the following:

data = "Hello World"
mac = HMAC(data, sha256, "SEEKRIT")

Our mac would now be equal to:

e655f98cb9b3c02f45576f7906d64b0b7f8731f25a5319c42ca666917aca45a4

If we now create our cookie as follows:

cookie = mac + " " + data

It would look as follows:

cookie = e655f98cb9b3c02f45576f7906d64b0b7f8731f25a5319c42ca666917aca45a4 Hello World

We can then send that to the client that requested the page. Once the client visits the next page, their browser will send that same cookie back to us. If we split the MAC from the data, we can then do the following operation:

cookie = e655f98cb9b3c02f45576f7906d64b0b7f8731f25a5319c42ca666917aca45a4 Hello World
data = "Hello World"
mac = e655f98cb9b3c02f45576f7906d64b0b7f8731f25a5319c42ca666917aca45a4

mac_verify = HMAC(data, sha256, "SEEKRIT")

mac_verify == mac

If and only if mac_verify and mac are the same can we be sure that the cookie has not been tampered with.

This requires that the client is NEVER made aware of our secret key; in the above examples that is "SEEKRIT". In your web application you will want to make this a configuration variable, and you will have to take care not to commit that configuration variable to a git repository and upload it to GitHub (for example).
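
In Python, for example, the standard library's hmac module implements this construct. A minimal sketch of the sign/verify round trip described above (the hard-coded secret is for illustration only):

import hashlib
import hmac

SECRET = b"SEEKRIT"  # in a real application, load this from configuration

def sign(data):
    mac = hmac.new(SECRET, data, hashlib.sha256).hexdigest().encode()
    return mac + b" " + data

def verify(cookie):
    mac, _, data = cookie.partition(b" ")
    expected = hmac.new(SECRET, data, hashlib.sha256).hexdigest().encode()
    # compare_digest compares in constant time, so the check itself
    # doesn't leak timing information about the expected MAC
    if not hmac.compare_digest(mac, expected):
        raise ValueError("cookie has been tampered with")
    return data

cookie = sign(b"Hello World")
assert verify(cookie) == b"Hello World"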

Do not use a bare hash algorithm

Using a bare hash algorithm incorrectly opens you up to length extension attacks: an attacker can concatenate extra data to the end of our existing data, modify the "MAC", and the server will accept it.

This construct is thus very dangerous:

data = "Hello World"
key = "SEEKRIT"
mac = SHA1(key + data)

The following construct is still not recommended, but is not nearly as dangerous:

mac = SHA1(data + key)

Because the key comes last, this is not vulnerable to a length extension attack. However, please don't do this either; stick to using an HMAC.

Encrypting session data

When using client-side storage, it may be beneficial to encrypt the data to add an extra layer of security. Even if you encrypt the data, you still need to use a MAC.

Encryption alone will not protect you from decrypting bad data that an attacker has substituted for the real ciphertext. Signing the cookie data with a MAC makes sure that the attacker is not able to mess with the ciphertext.
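
For illustration, the Python cryptography package's Fernet recipe bundles exactly this encrypt-then-MAC construction (AES-CBC plus HMAC-SHA256), so tampered ciphertext is rejected before anything is decrypted. A minimal sketch:

from cryptography.fernet import Fernet, InvalidToken

key = Fernet.generate_key()  # keep this secret, alongside your signing key
f = Fernet(key)

# Encrypts the session payload and appends a MAC over the ciphertext
token = f.encrypt(b'{"user_id": 950}')

try:
    data = f.decrypt(token)
except InvalidToken:
    # The ciphertext or its MAC was tampered with; discard the session
    data = None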

What are web frameworks/languages doing by default?

I am most familiar with the Pylons Project's Pyramid web framework. The default session implementation provided by the project is named SignedCookieSessionFactory; as the name implies, it uses a client-side cookie to store the session data, signed using a secret key that is provided upon instantiation of the factory.
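
A minimal sketch of wiring that up (the secret here is a placeholder; it belongs in configuration, not in source):

from pyramid.config import Configurator
from pyramid.session import SignedCookieSessionFactory

session_factory = SignedCookieSessionFactory('REPLACE-WITH-A-REAL-SECRET')

config = Configurator()
config.set_session_factory(session_factory)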

Flask also uses a signed cookie for client-side session storage.

Ruby on Rails uses a signed/encrypted cookie for client-side session storage by default.

PHP does not sign the session cookie by default; it does, however, use server-side storage for session data by default. Extra security can be added by installing Suhosin, which adds session cookie encryption/signing.

Building custom ports with Poudriere and Portshaker

Guest post by Scott Sturdivant.

Maintaining custom ports and integrating them into your build process doesn't need to be difficult. The documentation surrounding this process, however, is either non-existent or lacking in clarity. At the end of the day, it really is as simple as maintaining a repository whose structure matches the ports tree layout, merging that repository and the standard ports tree with portshaker, and finally handing the end result off to poudriere.

Your Custom Repository

For this example, we'll assume a git repo is used and that you're already familiar with how to build FreeBSD ports. We'll also assume that we have but a single port that we're maintaining and that it is called myport. The hierarchy of your repo should simply be category/myport. We'll refer to this repo simply as myrepo.
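
For example, if myport lives in the net category, myrepo might look like this (the files being the usual ports boilerplate):

myrepo/
    net/
        myport/
            Makefile
            distinfo
            pkg-descr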

Portshaker

Portshaker is the tool responsible for taking multiple ports sources and then merging them down into a single target. In our case, we have two sources: our git repo (myrepo) containing myport, and the standard FreeBSD ports tree. We aim to merge this down into a single ports tree that poudriere will then use for its builds.

To configure portshaker, add the following to the /usr/local/etc/portshaker.conf file:

# vim:set syntax=sh:
# $Id: portshaker.conf.sample 116 2008-09-30 16:15:02Z romain.tartiere $

#---[ Base directory for mirrored Ports Trees ]---
mirror_base_dir="/var/cache/portshaker"

#---[ Directories where to merge ports ]---
ports_trees="default"

use_zfs="no"
poudriere_ports_mountpoint="/usr/local/poudriere/ports"
default_poudriere_tree="default"
default_merge_from="freebsd myrepo"

Some key points here: the two items listed for the default_merge_from argument need to have scripts present in the /usr/local/etc/portshaker.d directory. Furthermore, the combination of poudriere_ports_mountpoint and default_poudriere_tree needs to be a ports tree that is then registered with poudriere.

Next, we need to tell portshaker how to go off and fetch our two types of ports trees, freebsd and myrepo. For the freebsd ports tree, create /usr/local/etc/portshaker.d/freebsd with the following contents and make it executable:

#!/bin/sh
. /usr/local/share/portshaker/portshaker.subr
method="portsnap"
run_portshaker_command $*

Next, create a similar script to handle our repository containing our custom port. /usr/local/etc/portshaker.d/myrepo should contain the following and similarly be executable:

#!/bin/sh
. /usr/local/share/portshaker/portshaker.subr
method="git"
git_clone_uri="http://github.com/scott.sturdivant/packaging.git"
git_branch="master"
run_portshaker_command $*

Obviously, replace the git_clone_uri and git_branch variables to reflect your actual configuration. For more information about these values and what they can contain, consult man portshaker.d.

Now, portshaker should be all set. Execute portshaker -U to update your merge_from ports trees (freebsd and myrepo). You'll see the standard portsnap fetch and extract process as well as a git clone. After a good bit of time, these will both be present in the /var/cache/portshaker directory. Go ahead and merge them together by executing portshaker -M.

Hooray! You now have /usr/local/poudriere/ports/default/ports, a combination of the normal ports tree and your custom one.

We're effectively complete with configuring portshaker. Whenever your port is updated, just re-run portshaker -U and portshaker -M to grab the latest changes and perform the merge.

Poudriere

Poudriere is a good tool for building ports. We will use it to build packages from our merged ports tree. Begin by configuring poudriere (/usr/local/etc/poudriere.conf):

NO_ZFS=yes
FREEBSD_HOST=ftp://ftp.freebsd.org
RESOLV_CONF=/etc/resolv.conf
BASEFS=/usr/local/poudriere
USE_PORTLINT=no
USE_TMPFS=yes
DISTFILES_CACHE=/usr/ports/distfiles
CHECK_CHANGED_OPTIONS=yes

Really there's nothing here that is specific to the problem at hand, so feel free to consult the provided configuration file to tune it to your needs.

Now for the step that is specific to our setup: pointing poudriere at a ports tree that it does not manage, namely our merged directory. If you consult man poudriere, it specifies that the ports subcommand has a -m method switch which controls the methodology used to create the ports tree; by default it is portsnap. This is confusing, as in our case we do not want poudriere to actually do anything. We want it to just use an existing path. Fortunately, there is a way!

The poudriere wiki has an entry for using the system ports tree, so we adapt it for our needs by executing:

poudriere ports -c -F -f none -M /usr/local/poudriere/ports/default \
-p default

If you've consulted the poudriere manpage, you'll see that the -F and -f switches both reference ZFS in their help. As we're not using ZFS, it's not clear how they will behave; however, in conjunction with the custom mountpoint (-M /usr/local/poudriere/ports/default), we ultimately wind up with what we want: a ports tree that poudriere can use but does not manage:

# poudriere ports -l
PORTSTREE            METHOD     PATH
default              -          /usr/local/poudriere/ports/default

Note that this resulting PATH is the combination of the poudriere_ports_mountpoint and default_poudriere_tree variables present in our /usr/local/etc/portshaker.conf configuration file.

Building software from your custom ports tree

Go ahead and create your jail(s) like you normally would (i.e. poudriere jail -c -j 92amd64 -v 9.2-RELEASE -a amd64) plus any other configuration you would like, and then build myport with poudriere bulk -j 92amd64 -p default category/myport. Success!

User sessions, what data should be stored where?

This article has been updated. For the old version please check Archive.org

A couple of days ago on reddit.com's /r/netsec, a poster by the name of Dan Weber posted what he believed to be an attack on PHP sessions: Hacking PHP sessions by running out of memory (reddit link). The "attack" works as follows:

  1. Create a new session
  2. Assign some new data to the session, in this case a username which is used to signify that the user is logged in.
  3. Check to verify that the user should be logged in
  4. If the user should not be logged in, destroy the session.

The "attack" would be to run the PHP script out of memory on number 3, since once something is set on the session it is immediately stored, so even if the user is not supposed to be logged in, they are now logged in since their session says they are.

I wouldn't necessarily call this a PHP hack; this is really just bad programming practice. The logic should be reversed:

  1. Create a new session
  2. Verify the user should be logged in
  3. If so, set the session data
  4. If the user should not be logged in, don't set the session data

That would solve the problem at hand, and now there is no way for the user to trick the PHP script into believing she is logged in when that is not the case.
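
A Python sketch of the two orderings (check_credentials is a hypothetical helper); the safe variant never touches the session before verification:

# BAD: the session is written before verification; if the script dies
# between the two steps, the bogus "logged in" state survives.
def login_bad(session, username, password):
    session['username'] = username                 # committed immediately
    if not check_credentials(username, password):
        del session['username']                    # may never be reached

# GOOD: verify first, then write; a crash can never leave the user logged in.
def login_good(session, username, password):
    if check_credentials(username, password):
        session['username'] = username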

However, as the discussion on Reddit went on, it became even clearer that there are no good resources on what you should and shouldn't store in the user session. Some of these things may seem like common knowledge, but sadly they are something every person new to programming has to learn on their own.

Some assumptions

Let's get this out of the way: this is in no way limited to PHP, but PHP is the language I will use as an example. This all applies equally to Ruby (Ruby on Rails), Python (Pyramid), and many other languages and frameworks.

The basic problem is that writing to the session is generally not an atomic transaction scoped to the page being accessed. The assumption made in this article is therefore that when you write to the session it is instantly committed, and there is no way to roll it back upon failure. If there were, the first example couldn't occur, since upon running out of memory at step 3 the session would have been rolled back and cleaned up.

What should you store in the users session?

You should only store data in the session that would do very little harm if it were made public.

Really, the list of items to store in a session is as follows:

  1. The user's unique ID (the ID that allows you to retrieve the user's information from storage)
  2. Temporary state (i.e. Flash messages)
  3. CSRF token

More importantly, don't store permission bits, group memberships, or anything else that is used to allow/deny access to particular resources. You want to store just enough information that, upon a user accessing your site, you are able to retrieve the user's information from storage, and based upon that information make decisions such as permissions/group memberships.

Why is this important?

One of the things that Dan Weber brought up in the Reddit post was storing the user's permission level and group membership in the session. If your code then relies on the session to always contain the right permission level, there is no way to expire someone's access to the data.

  1. A user logs in
  2. A user gets various permissions, and they are set in the session
  3. The user is fired from his job, and an administrator removes the user's permissions
  4. Since the user is still logged in, the user's permission bits are still set, and he continues to have access to parts of the site/information he shouldn't have access to.

If instead on every page visit we simply pull out the user's unique ID and verify the permissions upon access, then as soon as the permissions are revoked by the administrator the user no longer has access to the various resources.
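
A sketch of that per-request check, with load_user and the user.permissions attribute as hypothetical stand-ins for your storage layer:

# Only the user's ID comes from the session; permissions are fetched
# fresh from storage on every request, so revocation is immediate.
def require_permission(session, permission):
    user = load_user(session.get('user_id'))  # database/cache lookup
    if user is None or permission not in user.permissions:
        raise PermissionError("access denied")
    return user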

Temporary state

There has to be an easy way to remember something from page visit to page visit that isn't considered detrimental if the information gets lost. One of those things is flash messages. Flash messages are generally used to give the user an indication that something has changed; they are shown once and then never again.

Storing these as session data makes sense: if the flash message gets set, great; if it doesn't, it doesn't matter. Flash messages are simply a notification tool, and if the user misses them it isn't important.
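
A flash message store can be as small as a pop from the session (a sketch; frameworks such as Flask and Pyramid ship an equivalent):

def flash(session, message):
    # Queue a message to be shown on the next page view
    session.setdefault('flash', []).append(message)

def pop_flashed_messages(session):
    # Shown once, then gone; if this data is lost, nothing breaks
    return session.pop('flash', [])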

What you shouldn't store in the session

Definitely don't store any kind of permission bits, groups a user is a part of, or anything else that would grant the user access they normally would not have.

On each page access, check what permissions the user has. While it may mean a little more heavy lifting server-side, it provides extra security and the means to enforce changes in permissions instantly.

Good secure programming practices

Keep secure programming practices in mind at all times: always consider how the information you are storing/processing is accessed/viewed/administered. More importantly, think about the access controls that are in place, and how one could expire access to a particular resource without requiring a co-operative client.

The ordering of how and when variables are set is very important. Setting $_SESSION['isadmin'] = True at the top of a PHP script, and then removing it after checking whether the user actually is an administrator later in the script, is a bad idea.

Always order your code so that if a failure does occur, there is no chance that a critical section of your code is executed by accident, or that information is stored in a half-verified state. This is especially important for access control.

Embedding ZeroMQ in the libev Event Loop

In a previous article on ZeroMQ we went over how ZeroMQ's notifications are edge triggered when you use the file descriptor that ZeroMQ returns; in that article there was some discussion of embedding ZeroMQ into another event loop. Let's do that.

libev is an absolutely fantastic library that makes it easy to write evented programs. Evented programs work by getting notified that an action has happened and acting upon it. Unlike a threaded system, where multiple pieces of work are executed at the same time, in an evented system you move every item that could block into an event loop, which then calls back into user code with a notification to continue. If one event uses up more than its fair share of CPU time because it is busy doing a long calculation, no other waiting event will be notified until it yields.

Now, as previously discussed, ZeroMQ is edge triggered, so embedding it into a level-triggered event loop doesn't do us much good, because we will miss certain ZeroMQ notifications.

One way to solve this problem is by looping over ZeroMQ's event system until we get back a notification that it no longer has anything else for us to process; that would look something like this1:

int zevents = 0;
size_t zevents_len = sizeof(zevents);
zmq_socket.getsockopt(ZMQ_EVENTS, &zevents, &zevents_len);

do {
    if (zevents & ZMQ_POLLIN) {
        // We can read from the ZeroMQ socket
    } else {
        break;
    }

    // Check to see if there is more to read ...
    zmq_socket.getsockopt(ZMQ_EVENTS, &zevents, &zevents_len);
} while (zevents & ZMQ_POLLIN);

if (zevents & ZMQ_POLLOUT) {
    // We can write to the ZeroMQ socket
}

// If neither of the above is true, then it was a false positive

However, if we are receiving messages from ZeroMQ remote endpoints faster than we can process them, we end up stuck in that do ... while loop forever. If we have other events we want to process, that isn't entirely fair, since they will never get called. Especially in a server application that may be servicing thousands of clients, this is simply not acceptable.

libev

libev provides various types of event watchers. To get around the edge-triggered notifications and still provide fair round-robin processing for all events, we are going to build on top of several of them.

The events used will be:

  • ev::io: This one is pretty self-explanatory: it is for getting notified about input/output changes. This is the one we are going to use on the ZMQ_FD.
  • ev::prepare and ev::check: These two are generally used together; they can be used to change the event loop and/or make modifications on the fly to events that have been registered with the event loop.
  • ev::idle: This is an event that fires whenever the event loop has nothing else to do; if no other event fires, this one will.

Plan of attack

Since the prepare and check watchers run before and after the loop, we are going to use those to do most of the work. We use an io watcher so that we can turn off the idle watcher when we can actually wait for a result from ZeroMQ's file descriptor; otherwise we use the idle watcher so that we will always get called once every loop iteration.

In the prepare watcher callback we do the following:

  1. Check to see what events ZeroMQ has for us, and check what events the user has requested.
  2. If ZeroMQ has an event for us that we want, and the user has requested that event, we start the idle watcher.
  3. If ZeroMQ has no events, we start the io watcher.

In the check watcher callback we do the following:

  1. Stop both the io and idle watchers; they were only there to make sure that our check watcher was called.
  2. See what events ZeroMQ has for us, and check that against what the user wants. Depending on the event, call the user-defined write() or read() function.
  3. If this was a spurious wake-up on the part of ZeroMQ we simply ignore it and let libev go on to other events.

We could make all of this work by using only the prepare, check and idle watchers, but that would mean libev would be busy-waiting for something to happen on the ZeroMQ socket. The io watcher is required simply so that in times of nothing happening, libev can call into the kernel's event handling mechanism and go to sleep. We can't use just the io watcher, due to the edge-triggered notification: we'd miss all kinds of ZeroMQ messages. So all four watchers are required, and each plays a crucial part in making this work.

Let's get down to code

Below you will find example code; it is not complete. Do note that I am using some C++11isms, the error checking code may not be complete/correct, and in general I don't suggest you copy and paste this without reading and understanding what it does.

The zmq_event class is meant to be used as a base class: inherit from it, and implement the write() and read() functions. These functions will be called when you are able to read from or write to the ZeroMQ socket. You are guaranteed to be able to read one whole ZeroMQ message, so if it is a multi-part message, do make sure to loop on ZMQ_RCVMORE as required.

Upon instantiation it will automatically start being notified about events; we start off with ev::READ. When your sub-class wants to write to ZeroMQ, it should put the messages to be written into a list somewhere and set ev::READ | ev::WRITE on watcher_io by calling watcher_io.set(socket_fd, ev::READ | ev::WRITE). write() will then be called; write a single message to ZeroMQ, and when finished writing, unset ev::WRITE using watcher_io.set(socket_fd, ev::READ). If you are not finished writing, you may return after writing that single message, and write() will be called again on the next loop iteration. This way, if you have a lot of data to write, you don't starve the other events of their notifications.

zmq_event.h

#include <string>

#include <ev++.h>
#include <zmq.hpp>

class zmq_event {
    public:
        zmq_event(zmq::context_t& context, int type, const std::string& connect);
        virtual ~zmq_event();

    protected:
        // This gets fired before the event loop, to prepare
        void before(ev::prepare& prep, int revents);

        // This is fired after the event loop, but before any other type of events
        void after(ev::check& check, int revents);

        // We need to have a no-op function available for those events that we
        // want to add to the list, but should never fire an actual event
        template <typename T>
            inline void noop(T& w, int revents) {};

        // Function we are going to call to write to the ZeroMQ socket
        virtual void write() = 0;

        // Function we are going to call to read from the ZeroMQ socket
        virtual void read() = 0;

        // Some helper function, one to start notifications
        void start_notify();

        // And one to stop notifications.
        void stop_notify();

        // Our event types
        ev::io      watcher_io;
        ev::prepare watcher_prepare;
        ev::check   watcher_check;
        ev::idle    watcher_idle;

        // Our ZeroMQ socket
        zmq::socket_t socket;
        int           socket_fd = -1;
};

zmq_event.cc

#include <stdexcept>

#include "zmq_event.h"

zmq_event::zmq_event(zmq::context_t& context, int type, const std::string& connect) : socket(context, type) {
    // Get the file descriptor for the socket
    size_t fd_len = sizeof(socket_fd);
    socket.getsockopt(ZMQ_FD, &socket_fd, &fd_len);

    // Actually connect to the ZeroMQ endpoint; you could replace this with a bind as well ...
    socket.connect(connect.c_str());

    // Set up all of our watchers

    // Have our IO watcher check for READ on the ZeroMQ socket
    watcher_io.set(socket_fd, ev::READ);

    // This watcher has a no-op callback
    watcher_io.set<zmq_event, &zmq_event::noop>(this);

    // Set up our prepare watcher to call the before() function
    watcher_prepare.set<zmq_event, &zmq_event::before>(this);

    // Set up the check watcher to call the after() function
    watcher_check.set<zmq_event, &zmq_event::after>(this);

    // Set up our idle watcher, once again a no-op
    watcher_idle.set<zmq_event, &zmq_event::noop>(this);

    // Tell libev to start notifying us!
    start_notify();
}

zmq_event::~zmq_event() {}

void zmq_event::before(ev::prepare&, int revents) {
    if (EV_ERROR & revents) {
        throw std::runtime_error("libev error");
    }

    // Get any events that may be waiting
    uint32_t zevents = 0;
    size_t zevents_len = sizeof(zevents);

    // Lucky for us, getting the events available doesn't invalidate the
    // events, so that calling this in `before()` and in `after()` will
    // give us the same results.
    socket.getsockopt(ZMQ_EVENTS, &zevents, &zevents_len);

    // Check what events exists, and check it against what event we want. We
    // "abuse" our watcher_io.events for this information.
    if ((zevents & ZMQ_POLLOUT) && (watcher_io.events & ev::WRITE)) {
        watcher_idle.start();
        return;
    }

    if ((zevents & ZMQ_POLLIN) && (watcher_io.events & ev::READ)) {
        watcher_idle.start();
        return;
    }

    // No events ready to be processed, we'll just go watch some io
    watcher_io.start();
}

void zmq_event::after(ev::check&, int revents) {
    if (EV_ERROR & revents) {
        throw std::runtime_error("libev error");
    }

    // Stop both the idle and the io watchers; no point in calling the no-op callback.
    // One of them will be reactivated by before() on the next loop iteration
    watcher_idle.stop();
    watcher_io.stop();

    // Get the events
    uint32_t zevents = 0;
    size_t zevents_len = sizeof(zevents);
    socket.getsockopt(ZMQ_EVENTS, &zevents, &zevents_len);

    // Check the events and call the users read/write function
    if ((zevents & ZMQ_POLLIN) && (watcher_io.events & ev::READ)) {
        this->read();
    }

    if ((zevents & ZMQ_POLLOUT) && (watcher_io.events & ev::WRITE)) {
        this->write();
    }
}

void zmq_event::start_notify() {
    watcher_check.start();
    watcher_prepare.start();
}

void zmq_event::stop_notify() {
    watcher_check.stop();
    watcher_prepare.stop();
}

Other event loops

libev is but one of many event loops out there; hopefully this shows how it is possible to embed ZeroMQ into an event loop, and thereby makes it easier to embed ZeroMQ into other event loops as well.


  1. This snippet was from my older article regarding ZeroMQ edge triggered notifications. I would highly suggest reading that article for more information and even more background on what is going on. 

ZeroMQ - Edge Triggered Notification

ZeroMQ is an absolutely fantastic communications library that has allowed me to very easily create networks/connections between multiple applications, letting them communicate in an extremely fast yet directed manner, and to scale the connected entities with almost no second thought.

I have used ZeroMQ within Python and C++, and it has been fantastic leaving that part of the communication layer to a library rather than writing something similar myself. However, embedding ZeroMQ into another event loop has proven to have some quirks. They are documented in the API, but not so much by other users/examples found around the web, and they caught me by surprise.

This post mainly goes over the issue, the difference between edge and level triggered, and how ZeroMQ works. I am hoping to spend some time documenting how to embed ZeroMQ correctly inside libev at a later date.

The quirk

ZeroMQ allows you to get an underlying file descriptor (ZMQ_FD) for a ZeroMQ socket using getsockopt(). This is not the actual ZeroMQ file descriptor, but a stand-in:

int fd = 0;
size_t fd_len = sizeof(fd);
zmq_socket.getsockopt(ZMQ_FD, &fd, &fd_len);

This file descriptor can then be passed on to poll(), select(), kqueue(), or whatever your event notification library of choice may be. The idea is that if the file descriptor is signalled as available for reading, you check the socket's ZMQ_EVENTS to find out whether you are allowed to read from or write to the ZeroMQ socket, or possibly neither (a false positive). The same file descriptor is also used internally within the ZeroMQ library for notification, and thus we may receive an event notification that we can't do anything with.

As stated in the ZeroMQ API guide for getsockopt() under ZMQ_FD:

the ØMQ library shall signal any pending events on the socket in an edge-triggered fashion by making the file descriptor become ready for reading.

and further it states:

The ability to read from the returned file descriptor does not necessarily indicate that messages are available to be read from, or can be written to, the underlying socket; applications must retrieve the actual event state with a subsequent retrieval of the ZMQ_EVENTS option.

So far so good. After we return from our event notification on the file descriptor, we simply ask ZeroMQ whether or not we can read or write (or possibly both):

int zevents = 0;
size_t zevents_len = sizeof(zevents);
zmq_socket.getsockopt(ZMQ_EVENTS, &zevents, &zevents_len);

if (zevents & ZMQ_POLLIN) {
    // We can read from the ZeroMQ socket
}

if (zevents & ZMQ_POLLOUT) {
    // We can write to the ZeroMQ socket
}

// If neither of the above is true, then it was a false positive

So now you are happily reading from ZeroMQ and using the data as you wish, except you start to notice that sometimes it can take a little while for data to show up, or that if the endpoint you are connected to sends a lot of data, it is not delivered in a timely fashion. This is where the documentation on edge triggering comes into play, and it is something that sets ZeroMQ apart from, for example, BSD sockets.

Edge triggered versus level triggered

Edge and level triggered refer to the kind of signal a device outputs in an attempt to gain the attention of another device. Both have their history in electronics and hardware, where they play a very crucial role.

For both level and edge triggered, assume the data is represented by this simple string:

00000000000111111111111111111111100000000011111111111000000000

The zeroes (0) are where no data is available, and the ones (1) are where data is available for processing.

Level triggered

A level-triggered device signals with a high voltage (going back to early electronics) from the moment it requires attention until it no longer does, at which point it signals with a low voltage.

Data:  00000000001111111111111111111111000000000011111111111000000000
Block:      A               B              C         D
+ 5v              ____________________           _________
  -    __________|                    |_________|         |__________
  0v
State: 0          1                    0          1        0

Once the signal turns to 1 in block B we can start processing data, and until we are done processing the signal will stay 1. Once we are done, the device drops the state back to 0 in block C, and the process can repeat itself with block D.

Edge triggered

Edge-triggered devices, when they require attention, will simply turn on the signal for a short period of time and then turn it back off; once all of the required processing is complete, they will once again signal when they need attention.

Data:  00000000001111111111111111111111100000000011111111111000000000
Block:      A     B              C                 D
+ 5v              __                              __
  -    __________|  |____________________________|  |________________
  0v
State: 0          1              0                1        0

In this case, when the trigger arrives you have to process all of the data available. When block B arrives and we get 1 as our signal, we have to process all of the data; we won't know when we are done other than by some other notification scheme from the device (such as a special message that says we are done). Then, when we are done, we can go back to waiting for a trigger.

If, however, while we are processing the data from the trigger at B, we receive more data to process, we won't get notified, because we weren't officially done processing the existing data; the device will not notify us of the new data until we have processed everything it already gave us.

There are some devices where, as soon as you start processing their data, the trigger is re-enabled, so if you receive new data while still processing the old you will get a new notification.

ZeroMQ uses edge triggered

When multiple messages arrive on the ZeroMQ socket, one notification is sent. You check that ZMQ_EVENTS actually contains ZMQ_POLLIN and then read a single message from ZeroMQ. Content that you have processed the data ZeroMQ has for you, you add the file descriptor back to the event notification library and wait. However, you know you were sent two messages, but all you got was one. The next time a message is sent to the socket, you get notified and read the second message; three have now been sent, but the third doesn't seem to arrive.

This is where edge triggering comes in. ZeroMQ only notifies you once, and after it has notified you it won't notify you again until it receives new messages, EVEN if you still have messages waiting for you. With BSD sockets, when you recv() data and it is not all of the data, the socket will simply keep telling you there is more until everything it has to offer has been read (level triggered).

This means that when you get a notification from ZeroMQ you have to use a loop to process all of the messages that are coming your way:

int zevents = 0;
size_t zevents_len = sizeof(zevents);
zmq_socket.getsockopt(ZMQ_EVENTS, &zevents, &zevents_len);

do {
    if (zevents & ZMQ_POLLIN) {
        // We can read from the ZeroMQ socket
    } else {
        break;
    }

    // Check to see if there is more to read ...
    zmq_socket.getsockopt(ZMQ_EVENTS, &zevents, &zevents_len);
} while (zevents & ZMQ_POLLIN);

if (zevents & ZMQ_POLLOUT) {
    // We can write to the ZeroMQ socket
}

// If neither of the above is true, then it was a false positive

The same is true if you need to send multiple messages: you will need to check ZMQ_EVENTS for ZMQ_POLLOUT and send messages so long as that holds true; as soon as it no longer does, you will need to wait until you get triggered again via the file descriptor in your event notification library.

Embedding or co-existing with another event loop

Embedding ZeroMQ into a different event loop becomes more difficult once you realise that it is edge triggered rather than level triggered. Fair processing of data between events is harder to accomplish with ZeroMQ, because you are either required to process all of the data before going back to your event loop, or required to set up different types of events, depending on the state the ZeroMQ socket is in, to make sure you process all of the data and don't forget any.

Edge triggered for the ZeroMQ file descriptor was a design decision which unfortunately made the use of ZeroMQ sockets more difficult and different from BSD sockets which makes it harder for application programmers to do the right thing. There are plenty of questions on the internet regarding the proper usage of ZMQ_FD with various different event notification schemes/implementations.