typedef int (*funcptr)();

An engineers technical notebook

Python Packaging and Distribution

There have been many discussions online about the various tools for Python packaging and distribution. The tools under discussion are pip, pipenv, and poetry.

As an open source maintainer in the Pylons Project, I would love to be writing code, but I end up spending a lot of time answering user questions about packaging and distributing their source code using the software I've helped build. As we move forward, the other maintainers and I were wondering whether we are actually helping users in the best way possible, using best-of-breed tools.[1]

As the Python community has moved from easy_install to pip, we too have kept the documentation up to date. We went from python setup.py develop to pip install -e . to create editable installs of local projects, and try to let people know the pitfalls of using both easy_install and pip in the same project (mostly with an answer that falls in line with: remove your virtual environment and start over, just use pip).

As part of Pyramid we develop and maintain various cookiecutter templates. Our goal is to provide templates that are useful and that follow the best practices being adopted within the community at large. That way newcomers can use their existing skills and knowledge, and those who start with us walk away with knowledge and experience that applies not just to developing Pyramid applications but to the broader community as a whole.

pip

Pip is a great tool that has simplified the installation of packages. It supports binary distributions called wheels and makes it easy to install software from the Python Package Index. Its dependency resolution process is rather naive, but for the most part it works, and works well. It replaced easy_install as the tool to use for installing packages.

While you can use a requirements.txt file with pip to install a "blessed" list of software, there is no good way to "lock" the dependencies of dependencies without manually adding them to the list of requirements. This makes the list very difficult to manage, and it is hard to know that what has been tested is what the user is actually going to get, because packages may be updated at any time. Re-creating the exact same environment is difficult and fraught with errors.
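To make the "dependencies of dependencies" problem concrete, here is a minimal sketch that walks a dependency graph and reports everything a top-level requirements list pulls in without pinning. The package names and the graph itself are made up for illustration; a real tool would read this information from package metadata.

```python
from collections import deque


def transitive_dependencies(direct, graph):
    """Return every package reachable from `direct`, excluding `direct` itself."""
    seen = set(direct)
    queue = deque(direct)
    while queue:
        package = queue.popleft()
        for dependency in graph.get(package, ()):
            if dependency not in seen:
                seen.add(dependency)
                queue.append(dependency)
    return seen - set(direct)


# Hypothetical graph: each key lists the packages it depends on directly.
graph = {
    "framework": ["http-lib", "templating"],
    "templating": ["markup"],
    "http-lib": [],
    "markup": [],
}

# A requirements.txt that only lists "framework" leaves these unpinned.
unpinned = sorted(transitive_dependencies(["framework"], graph))
print(unpinned)
```

Every name in `unpinned` is a package whose version can change between two installs unless it is added to the requirements list by hand, which is exactly the gap a lockfile is meant to close.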

This is where Pipfile is supposed to help. This is a project to add a new, more descriptive requirements file, as well as allowing for a lockfile that would lock not just your primary packages you have listed, but also all dependencies of dependencies all the way down the tree. This helps with reproducibility and allows for the same installation on two different systems to have the exact same software/dependencies installed.

pipenv

While pip is a great tool, and the Pipfile changes would allow for locking of dependencies, there is one more puzzle piece missing. Although you can install packages into the global namespace, the recommended way is to install all packages for a particular tool or project into a virtual environment.

Normally you'd invoke virtualenv to create this environment, and then you'd make sure to install all packages within it, thereby isolating it from the rest of the system.

pipenv automates this for you. As well as using Pipfile, it supports locking using Pipfile.lock and provides a bunch of tooling around adding and removing dependencies in a local project.

pipenv allows you to easily create an environment and manage dependencies, but it makes no effort to solve the problem of distributing and building a package that may be installed by third parties.

poetry

Poetry is a similar project to pipenv, with a major difference being that it was built to help with distributing/developing applications and building a distributable package that may then be installed using pip.

Instead of a Pipfile it uses the recently standardised pyproject.toml file. Like pipenv it also supports locking, and it provides tooling around adding/removing dependencies as well as managing what versions are required.

Ultimately those dependencies are going to end up as metadata in a distributable package.

Poetry makes it easier to manage a software development project, whether that is for an application using various libraries for internal use, or for libraries that are going to be distributed to other developers.

The divide

This is where the divide really starts. While you can use pipenv with a standard setuptools project, any dependencies you add to the Pipfile using pipenv's tooling will not be listed as dependencies of your project when you distribute it. This means you either need to duplicate the list in both setup.py and the Pipfile, or add your current project as an editable install within your Pipfile, which makes your Pipfile less easily distributable.

There are workarounds that people have used, such as having setup.py read a requirements.txt, so that all requirements are listed in a text file rather than in setup.py. Asking to do the same with a Pipfile in pipenv, however, was met with a "Do not do this."
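For readers who have not seen the requirements.txt workaround, it usually looks something like the sketch below. The helper name `read_requirements` is hypothetical, and this version only handles plain requirement lines: it skips blank lines, comments, and pip options such as `-e` or `-r`.

```python
from pathlib import Path


def read_requirements(path):
    """Return the requirement strings found in a requirements file."""
    requirements = []
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        # Drop trailing comments, then surrounding whitespace.
        line = raw_line.split(" #", 1)[0].strip()
        # Skip blanks, full-line comments, and pip options like "-e ." or "-r other.txt".
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        requirements.append(line)
    return requirements


# In setup.py the result would typically be passed to setuptools, e.g.:
#     setup(..., install_requires=read_requirements("requirements.txt"))
```

This keeps a single list of dependencies, at the cost of mixing abstract requirements meant for package metadata with whatever pins the text file happens to contain.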

poetry explicitly allows you to add dependencies in one place, and those dependency listings are then automatically inserted into the package metadata that is created when you build your distributable package.

The two use cases

There are two competing use cases. One is deploying software packages and being able to run them, but not as a developer. The other is developing software packages, where the developer needs to define the dependencies the project requires to run.

pipenv solves the deployment case. If I were a user, I could very simply grab a known good Pipfile.lock and use pipenv to install a known good set of software. This is great when I am deploying a project. It is the use case that many in the Python Packaging Authority also seem to be optimizing for.
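A Pipfile.lock is a JSON document, so it is easy to inspect what a "known good" set actually pins. The sketch below reads the pinned versions from the `default` (or `develop`) section. The sample lock content is invented for illustration and trimmed down; a real file also carries hashes and more metadata.

```python
import json


def locked_versions(lock_text, section="default"):
    """Map package name to pinned version specifier for one lock section."""
    lock = json.loads(lock_text)
    return {
        name: entry["version"]
        for name, entry in lock.get(section, {}).items()
        # Editable or path-based entries carry no "version" key; skip them.
        if "version" in entry
    }


# Illustrative sample only; versions and structure are abbreviated.
example_lock = """
{
  "_meta": {"pipfile-spec": 6},
  "default": {
    "pyramid": {"version": "==1.9.2"},
    "webob": {"version": "==1.8.2"},
    "myproject": {"editable": true, "path": "."}
  },
  "develop": {}
}
"""

print(locked_versions(example_lock))
```

Every entry pins one exact version, including the dependencies of dependencies, which is what makes the file suitable for reproducing a deployment.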

The other use case is for developers who are building new software, either by using a list of existing packages and deploying privately, or by developing software for other developers to be published on the Python Package Index.

This latter group is underrepresented, likely because it is much smaller and because existing tools like setuptools and setup.py already provide a "good enough" experience. Innovation in this area is badly needed to make it easier to create new libraries and packages that follow best practices. The list of things people have copied and pasted to add a setup.py to their projects, or just to make something work, is long. It's all a little bit of black magic, and a great many things have been carried over because of cargo cult programming.

Explicit mentions by the Python Packaging Authority

Reading the packaging guide on managing dependencies, pipenv is the recommended tool:

This tutorial walks you through the use of Pipenv to manage dependencies for an application. It will show you how to install and use the necessary tools and make strong recommendations on best practices.

This language, along with what packaging.python.org implies as a URL, makes it difficult as a project maintainer to recommend alternate tools, because even if those tools are superior for the use case we are recommending them for, it is always going to lead to questions from users, such as:

Why are you not using pipenv, the official tool recommended by Python.org?

We get similar questions about easy_install vs pip all of the time, as well as why people should switch, and we can point to various bits of documentation that explains why pip is a better choice.

If we were to recommend an alternative, the appeal to authority that python.org implies is going to make it much more difficult, and the question will become "why is the Pylons Project not using recommended tooling?"

poetry is listed as a footnote on that page, alongside pip-tools and hatch, and is mentioned only for doing library development, with no mention of other requirements that may make it a much better tool for developing locally.

Deployment is not development

If I am using pipenv with a non-installable project (no setup.py), I end up having to figure out how to get the code and the Pipfile/Pipfile.lock into the environment I am deploying to. pipenv's install provides a way to install only if the Pipfile.lock is up to date and to fail otherwise. If you are using a local project that uses setup.py, though, the only way the Pipfile.lock will contain any sub-dependencies of your setup.py project is if you install it as editable. Otherwise sub-dependencies are not locked.[2]

If I am using poetry I get a pip-installable project, but it doesn't contain any hard pins or lock files. I'd have to distribute pyproject.lock as well as my wheel. This gets me a little closer, but I still have no lock file that includes my newly produced wheel and has all of its dependencies locked.

Based on Twitter conversations with its members and the documentation on packaging.python.org, the Python Packaging Authority suggests using pipenv for development. pipenv is particularly ill-suited for development if the goal is to create a package to be deployed to production. With two locations to define dependencies, people are left scratching their heads as to which is canonical. If a dependency is added to Pipfile but not setup.py, a developer may think their package is ready for distribution when in reality it is missing a dependency that is required to run or use said distribution.
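One way to catch that drift is a small check that compares the two lists. The sketch below is hypothetical and takes the names as plain Python lists rather than parsing either file. It normalizes project names in the manner PEP 503 describes so that, for example, `SQLAlchemy` and `sqlalchemy` compare equal.

```python
import re


def normalize(name):
    """Normalize a project name: lowercase, runs of '-', '_', '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement):
    """Extract the normalized project name from a string such as 'WebOb>=1.8'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return normalize(match.group(1))


def missing_from_setup(pipfile_packages, install_requires):
    """Return Pipfile package names that setup.py does not declare."""
    declared = {requirement_name(req) for req in install_requires}
    return sorted(
        normalize(name)
        for name in pipfile_packages
        if normalize(name) not in declared
    )


# Illustrative inputs: names from a Pipfile's [packages] section and the
# install_requires list from setup.py.
print(missing_from_setup(
    ["pyramid", "SQLAlchemy", "zope.sqlalchemy"],
    ["pyramid>=1.9", "sqlalchemy"],
))
```

A check like this only papers over the underlying problem of having two sources of truth, but it does make the missing dependency visible before a release rather than after.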

At this point using both projects seems like a win-win. Use poetry to build and develop a package, then use pipenv in the integration phase to create a Pipfile.lock that is used to deploy in production. This way you get the best of both worlds: one great tool that helps you register entry points, and another that helps you deploy a known good set of dependencies.

Interestingly, even the pipenv docs seem to agree that it is a deployment tool:

Specify your target Python version in your Pipfile’s [requires] section. Ideally, you should only have one target Python version, as this is a deployment tool.

-- Pipenv - General Recommendations & Version Control

Use pipenv if you have a script that requires a couple of dependencies and doesn't need all of the extra overhead of packaging metadata/packaging. Use poetry if you want to build a distributable project that can easily be deployed by others, and use both if you develop a project and need a known good environment to deploy.

In summary

There will likely never be a time when one single tool is considered good enough, and competition between tools is a way to keep moving forward. Packaging in the Python community has been difficult for a long time. Wheels have made things a little better. pip has made installing new packages easier and improved upon easy_install. Here's to the next evolution.


Now, can we talk about standardising on pyproject.toml? That is already where "project" metadata needs to go, so we might as well re-use the name instead of having two different names/files. Oh, and PEP 517 can't come soon enough, so that alternate tools like flit can be used instead of setuptools/setup.py.


  1. We created an issue named Support poetry, flit, pipenv, or ...? that attempts to go over the pros and cons of the various tools and how we currently support our users in our documentation on building projects using pyramid, including how to create a project that is distributable. Pyramid heavily uses pkg_resources and entry points. The way to register the entry points is to have an installable package.

    The framework is flexible enough that there is no requirement for entry points, but at that point you are in territory where the default tooling provided by the project will not work, and some of the convenience tools/functionality that Pyramid provides its users/developers is not available.

  2. See documentation for Editable Dependencies (e.g. -e .) which as of this writing states:

    Sub-dependencies are not added to the Pipfile.lock if you leave the -e option out.