Python Packaging and Distribution
There has been a lot of discussion online about the various tools for Python packaging and distribution. The tools most often under discussion are pip, pipenv and poetry.
As an open source maintainer in the Pylons Project, I would love to spend my time writing code, but I end up spending a lot of it answering user questions about packaging and distributing their source code with the software I've helped build. As we move forward, other maintainers and I have been wondering whether we are actually helping users in the best way possible, using best-of-breed tools.[1]
As the Python community has moved from easy_install to pip, we too have kept the documentation up to date. We went from `python setup.py develop` to `pip install -e .` to create editable installs of local projects, and try to let people know the pitfalls of using both easy_install and pip in the same project (mostly with an answer that falls in line with: remove your virtual environment and start over, just use pip).
As part of Pyramid we develop and maintain several cookiecutter templates. Our goal is to provide templates that are useful and that follow the best practices being adopted by the community at large, so that newcomers can use their existing skills and knowledge, and so that those who start with us walk away with experience that applies not just to developing Pyramid applications but to the broader community as a whole.
pip
Pip is a great tool that has simplified the installation of packages. It supports binary distributions named wheels and makes it easy to install software from the Python Package Index. It has a rather naive dependency resolution process, but for the most part it works, and works well. It replaced easy_install as the tool to use for installing packages.
While you can use a requirements.txt file with pip to install a "blessed" list of software, there is no good way to "lock" the dependencies of dependencies without manually adding them to the list of requirements. This makes the list very difficult to manage, and it is hard to know that what has been tested is what the user is actually going to get, because packages may be updated at any time. Re-creating the exact same environment is difficult and fraught with errors.
This is where Pipfile is supposed to help. It is a project to add a new, more descriptive requirements file, along with a lockfile that locks not just the primary packages you have listed but also all dependencies of dependencies, all the way down the tree. This helps with reproducibility: the same installation on two different systems ends up with exactly the same software and dependencies.
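To make the lockfile idea concrete, here is a minimal sketch that reads a Pipfile.lock and prints every pinned package. It assumes the JSON layout that pipenv writes (a "_meta" block plus "default" and "develop" sections). The sample content, the package versions and the helper name pinned_requirements are illustrative only and are not taken from a real project.

```python
import json

# Illustrative Pipfile.lock content; versions and hashes are placeholders.
SAMPLE_LOCK = """
{
    "_meta": {
        "hash": {"sha256": "<hash of the Pipfile>"},
        "pipfile-spec": 6,
        "requires": {"python_version": "3.6"},
        "sources": [
            {"name": "pypi", "url": "https://pypi.org/simple", "verify_ssl": true}
        ]
    },
    "default": {
        "pyramid": {"hashes": ["sha256:<elided>"], "version": "==1.9.2"},
        "webob": {"hashes": ["sha256:<elided>"], "version": "==1.8.2"}
    },
    "develop": {
        "pytest": {"hashes": ["sha256:<elided>"], "version": "==3.6.3"}
    }
}
"""


def pinned_requirements(lock, section="default"):
    """Return 'name==version' strings for every pinned package in a section."""
    pins = []
    for name, entry in sorted(lock.get(section, {}).items()):
        version = entry.get("version")  # stored with its operator, e.g. "==1.9.2"
        if version is None:
            continue  # editable or VCS entries carry no version pin
        pins.append(f"{name}{version}")
    return pins


lock = json.loads(SAMPLE_LOCK)
for line in pinned_requirements(lock):
    print(line)
```

Note that webob appears in the lock even though only pyramid would be listed in the Pipfile: the lock records the dependencies of dependencies as well.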
pipenv
While pip is a great tool, and the Pipfile changes would allow it to lock dependencies, there is one more puzzle piece missing. Although you can install packages into the global namespace, the recommended way is to install all packages for a particular tool or project into a virtual environment.
Normally you'd invoke virtualenv to create this environment, and then make sure to install all packages within it, thereby isolating it from the rest of the system.
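As a rough illustration of that manual step, the sketch below creates an isolated environment from Python itself. It uses the standard library's venv module rather than the virtualenv tool mentioned above, which is a substitution made here so the example has no third-party dependency. The helper name create_environment is hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(directory):
    """Create a virtual environment in `directory` and return its config file."""
    # with_pip=False keeps this sketch fast and offline; pass with_pip=True
    # in practice so that packages can be installed into the environment.
    venv.create(directory, with_pip=False)
    return Path(directory) / "pyvenv.cfg"


# Build a throwaway environment and show the configuration it was given.
with tempfile.TemporaryDirectory() as scratch:
    config = create_environment(Path(scratch) / "env")
    print(config.read_text())
```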
pipenv automates this for you. As well as using Pipfile, it supports locking with Pipfile.lock and provides a bunch of tooling around adding and removing dependencies in a local project.
pipenv allows you to easily create an environment and manage dependencies, but it makes no effort to solve the problem of distributing and building a package that may be installed by third parties.
poetry
Poetry is a similar project to pipenv, with a major difference being that it was built to help with distributing/developing applications and building a distributable package that may then be installed using pip.
Instead of a Pipfile it uses the recently standardised pyproject.toml file. Like pipenv it supports locking, and it provides tooling around adding and removing dependencies as well as managing which versions are required.
Ultimately those dependencies are going to end up as metadata in a distributable package.
Poetry makes it easier to manage a software development project, whether that is for an application using various libraries for internal use, or for libraries that are going to be distributed to other developers.
The divide
This is where the divide really starts. While you can use pipenv with a standard setuptools project, any dependencies you add to the Pipfile using pipenv's tooling will not be listed as dependencies of your project when you distribute it. This means you either need to duplicate the list in both setup.py and the Pipfile, or you have to add your current project as an editable install within your Pipfile, which makes your Pipfile less easily distributable.
There are workarounds that people have used, such as having setup.py read a requirements.txt, so that all your requirements are listed in a text file rather than in setup.py, but a request to do the same with a Pipfile in pipenv was met with a "Do not do this."
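The requirements.txt workaround usually looks something like the sketch below. This is one common shape of it rather than a recommended practice; the helper name read_requirements and the project name are hypothetical, and the parsing only handles plain requirement lines and comments.

```python
import re
import tempfile
from pathlib import Path

# pip treats "#" as a comment only at the start of a line or after whitespace,
# which keeps URL fragments such as "#egg=name" intact.
COMMENT = re.compile(r"(^|\s+)#.*$")


def read_requirements(path):
    """Return the requirement specifiers listed in a requirements file."""
    requirements = []
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = COMMENT.sub("", raw_line).strip()
        # Skip blanks and pip options (-r, -e, --index-url, ...), which are
        # not valid entries for install_requires.
        if not line or line.startswith("-"):
            continue
        requirements.append(line)
    return requirements


# Demonstrate with a throwaway requirements.txt.
with tempfile.TemporaryDirectory() as scratch:
    requirements_file = Path(scratch) / "requirements.txt"
    requirements_file.write_text(
        "# runtime dependencies\npyramid>=1.9\nwaitress  # WSGI server\n\n-e .\n",
        encoding="utf-8",
    )
    setup_kwargs = {
        "name": "myapp",  # hypothetical project name
        "version": "0.1.0",
        "install_requires": read_requirements(requirements_file),
    }

print(setup_kwargs["install_requires"])
# A real setup.py would finish with: setuptools.setup(**setup_kwargs)
```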
poetry explicitly allows you to add dependencies in one place, and those dependency listings are then automatically inserted into the package metadata that is created when you build your distributable package.
The two use cases
There are two competing use cases. One is deploying software packages and being able to run them, without being a developer of them. The other is developing software packages, which requires defining the dependencies the project needs in order to run.
pipenv solves the deployment case. If I were a user, I could very simply grab a known good Pipfile.lock and use pipenv to install a known good set of software, which is great when I am deploying a project. It is the use case that many in the Python Packaging Authority also seem to be optimizing for.
The other use case is for developers who are building new software, either by combining a list of existing packages and deploying privately, or by developing software for other developers to be published on the Python Package Index.
This latter group is underrepresented, likely because it is much smaller and because existing tools like setuptools and setup.py already provide a "good enough" experience. This area badly needs innovation to make it easier to create new libraries and packages that follow best practices. A lot of people have copied and pasted a setup.py into their projects, or pasted in snippets to make something work. It's all a little bit of black magic, and a great many things have been carried over because of cargo cult programming.
Explicit mentions by the Python Packaging Authority
Reading the packaging guide on managing dependencies, pipenv is the recommended tool:
This tutorial walks you through the use of Pipenv to manage dependencies for an application. It will show you how to install and use the necessary tools and make strong recommendations on best practices.
This language, along with what packaging.python.org implies as a URL, makes it difficult as a project maintainer to recommend alternate tools, because even if those tools are superior for the use case we are recommending them for, it is always going to lead to questions from users, such as:
Why are you not using pipenv, the official tool recommended by Python.org?
We get similar questions about easy_install vs pip all of the time, as well as about why people should switch, and we can point to various bits of documentation that explain why pip is a better choice.
If we were to recommend an alternative, the appeal to authority that python.org implies is going to make that much more difficult, and the question will become "why is the Pylons Project not using recommended tooling?"
poetry is listed as a footnote on that page, alongside pip-tools and hatch, and is mentioned only for doing library development, with no mention of other requirements that may make it a much better tool for developing locally.
Deployment is not development
If I am using pipenv with a non-installable project (no setup.py) I end up having to figure out how to get the code, and the Pipfile/Pipfile.lock, to the environment I am deploying into. pipenv's install command (with its --deploy flag) provides a way to install only if the Pipfile.lock is up to date, and to fail otherwise. If you are using a local project that uses setup.py, though, the only way that the Pipfile.lock will contain any sub-dependencies of your setup.py project is if you install it as editable. Otherwise sub-dependencies are not locked.[2]
If I am using poetry I get a pip-installable project, but it doesn't contain any hard pins or lock files. I'd have to distribute pyproject.lock alongside my wheel. This gets me a little closer, but I still have no lock file that includes my newly produced wheel and has all of its dependencies locked.
The Python Packaging Authority, based on Twitter conversations with its members and the documentation on packaging.python.org, suggests using pipenv for development. pipenv is particularly ill-suited for development if the goal is to create a package to be deployed to production. With two locations to define dependencies, it leaves people scratching their heads as to which is canonical, and if a dependency is added to Pipfile but not setup.py, it may leave a developer thinking their package is ready for distribution when in reality it is missing a dependency that is required to run or use said distribution.
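One way to catch that kind of drift is a small check that compares the two lists. The sketch below is a hypothetical helper, not something pipenv or setuptools provides: it takes the [packages] table of a Pipfile as a dict and the install_requires list from setup.py, and reports names present only in the Pipfile. Name matching uses a simple regular expression and PEP 503 style normalization, which is an assumption that covers common cases only.

```python
import re

# Leading project name of a requirement string such as "pyramid>=1.9".
NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name):
    """Normalize a project name so that 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(spec):
    """Extract the normalized project name from a requirement specifier."""
    match = NAME.match(spec)
    if match is None:
        raise ValueError(f"cannot find a project name in {spec!r}")
    return normalize(match.group(1))


def missing_from_setup(pipfile_packages, install_requires):
    """Return Pipfile packages that setup.py does not declare."""
    declared = {requirement_name(spec) for spec in install_requires}
    return sorted(
        normalize(name)
        for name in pipfile_packages
        if normalize(name) not in declared
    )


# Illustrative data: pyramid_jinja2 was added with pipenv but never to setup.py.
pipfile_packages = {"pyramid": "*", "pyramid_jinja2": "*", "waitress": ">=1.0"}
install_requires = ["pyramid>=1.9", "waitress"]
print(missing_from_setup(pipfile_packages, install_requires))
```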
At this point using both projects seems like a win-win. Use poetry to build and develop a package, then use pipenv in the integration phase to create a Pipfile.lock that is used to deploy in production. This way you get the best of both worlds: a great tool that can help you register entry points, and another that can help you deploy a known good set of dependencies.
Interestingly, even the pipenv docs seem to agree that it is a deployment tool:
Specify your target Python version in your Pipfile’s [requires] section. Ideally, you should only have one target Python version, as this is a deployment tool.
Use pipenv if you have a script that requires a couple of dependencies and doesn't need all of the extra overhead of packaging metadata/packaging. Use poetry if you want to build a distributable project that can easily be deployed by others, and use both if you develop a project and need a known good environment to deploy.
In summary
There will likely never be a time when one single tool is considered good enough, and competition between tools is a way to keep moving forward. Packaging in the Python community has been difficult for a long time. Wheels have made things a little better. pip has made installing new packages easier and improved upon easy_install. Here's to the next evolution.
Now, can we talk about standardising on pyproject.toml? Since that is already where "project" metadata needs to go, we might as well re-use the name instead of having two different names/files. Oh, and PEP 517 can't come soon enough, so that alternate tools like flit can be used instead of setuptools/setup.py.
[1] We created an issue named "Support poetry, flit, pipenv, or ...?" that attempts to go over the pros and cons of the various tools and how we currently support our users in our documentation on building projects using Pyramid, including how to create a project that is distributable. Pyramid heavily uses pkg_resources and entry points. The way to register the entry points is to have an installable package. The framework is flexible enough that there is no requirement for entry points, but at that point you are in territory where the default tooling provided by the project will not work, and some of the convenience tools/functionality that Pyramid provides its users/developers is not available.
[2] See the documentation for Editable Dependencies (e.g. `-e .`), which as of this writing states: "Sub-dependencies are not added to the Pipfile.lock if you leave the -e option out."