Introducing and Open Sourcing Shiv

  • Date: 2018-05-11 01:13:18

At LinkedIn, we ship hundreds of command-line utilities to every machine in our data centers and to all of our employees’ workstations. The vast majority of these utilities are written in Python.

In addition to developing these command-line utilities, we have hundreds of supporting libraries that are constantly being iterated on, with new versions published daily. Because of the inherent problems present when dealing with such a huge and ever-changing dependency graph, we need to package the executables individually to avoid dependency conflicts. Initially, we took advantage of the great open source tool PEX. PEX elegantly solved the isolated packaging requirement we had by including all of a tool’s dependencies inside a single binary file that we could then distribute.

However, as our tools matured over time and picked up additional dependencies, we became acutely aware of the well-documented performance issues imposed by the pkg_resources library.

To briefly summarize:

  • pkg_resources must scan every element of Python’s sys.path in order to build metadata objects that represent each “distribution.”
  • This work is done at import time, which can significantly slow down CLI invocation time.
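One rough way to observe this cost is to time a fresh interpreter importing the module. The sketch below is illustrative only: the helper name time_import is hypothetical, the numbers depend heavily on how many entries are on sys.path, and pkg_resources (which ships with setuptools) may not be installed at all, in which case the helper returns None.

```python
import subprocess
import sys
import time


def time_import(module_name):
    """Return wall-clock seconds for a fresh interpreter to import module_name.

    Returns None if the import fails, e.g. because the module is not installed.
    The figure includes interpreter start-up, so compare it against a baseline.
    """
    start = time.perf_counter()
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        capture_output=True,
    )
    elapsed = time.perf_counter() - start
    return elapsed if result.returncode == 0 else None


if __name__ == "__main__":
    # "sys" is always already loaded, so this approximates bare start-up time.
    baseline = time_import("sys")
    candidate = time_import("pkg_resources")
    print(f"interpreter baseline: {baseline:.3f}s")
    if candidate is None:
        print("pkg_resources is not importable in this environment")
    else:
        print(f"with pkg_resources:   {candidate:.3f}s")
```

The difference between the two figures approximates the penalty a CLI pays on every invocation before any of its own code runs.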

Since PEX leans heavily on pkg_resources to bootstrap its environment, we found ourselves at an impasse: either lose out on the ability to neatly package our tools in favor of invocation speed, or impose a few-second start-up penalty for the benefit of easy packaging.

After spending some time investigating and extricating pkg_resources from PEX, we decided to start from a clean slate, and thus, shiv was created.

What does shiv do?

Shiv allows us to create a single binary artifact from a Python project that includes all of its dependencies. The only thing required to run a full-fledged Python application is an interpreter.

How does shiv work?

Inspired by PEP 441, shiv creates a “zipapp” containing a Python project, all the project’s dependencies, and a special bootstrap module that extracts and injects all required dependencies at run time with very little latency cost.
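To make the zipapp idea concrete, here is a minimal sketch using the standard library's zipapp module, which implements PEP 441. It is not shiv's build code: it bundles only a single __main__.py with no dependencies or bootstrap module, and the helper names build_zipapp and demo are hypothetical.

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp


def build_zipapp(source_dir, target):
    """Bundle source_dir (which must contain __main__.py) into one archive."""
    # The interpreter argument only sets the shebang line written at the top
    # of the archive; the value used here is an assumption for illustration.
    zipapp.create_archive(source_dir, target, interpreter="/usr/bin/env python3")
    return target


def demo():
    """Build a tiny zipapp in a temporary directory, run it, return its output."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "app"
        src.mkdir()
        (src / "__main__.py").write_text('print("hello from a zipapp")\n')
        target = pathlib.Path(tmp) / "app.pyz"
        build_zipapp(src, target)
        # Python can execute a zip archive directly when it holds __main__.py.
        completed = subprocess.run(
            [sys.executable, str(target)],
            capture_output=True,
            text=True,
            check=True,
        )
        return completed.stdout.strip()


if __name__ == "__main__":
    print(demo())
```

The resulting .pyz file is a single artifact that any compatible interpreter can run, which is the property the paragraph above relies on.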

  • Shiv completely avoids the use of pkg_resources. If it is included by a transitive dependency, the performance implications are mitigated by limiting the length of sys.path and always including the -s and -E Python interpreter flags.

  • Instead of shipping our binary with downloaded wheels inside, we package an entire site-packages directory, as installed by pip. We then bootstrap that directory post-extraction via the stdlib’s site.addsitedir function. That way, everything works out of the box: namespace packages, real filesystem access, etc.
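The extract-then-addsitedir step can be sketched roughly as follows. This is an assumption-laden illustration rather than shiv's actual bootstrap module: the archive layout (a top-level site-packages/ folder), the cache location, and the names bootstrap, demo, and hypothetical_dep are all invented for the example. Only zipfile and site.addsitedir are real standard-library APIs.

```python
import importlib
import pathlib
import site
import tempfile
import zipfile


def bootstrap(archive, cache_dir):
    """Extract the archive's site-packages/ once, then register it with site."""
    site_packages = pathlib.Path(cache_dir) / "site-packages"
    if not site_packages.exists():
        with zipfile.ZipFile(archive) as zf:
            members = [n for n in zf.namelist() if n.startswith("site-packages/")]
            zf.extractall(cache_dir, members=members)
    # addsitedir appends the directory to sys.path and processes any .pth
    # files inside it, which a plain sys.path.append would skip.
    site.addsitedir(str(site_packages))
    return site_packages


def demo():
    """Build a toy archive, bootstrap it, and import a module from it."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = pathlib.Path(tmp)
        archive = tmp_path / "bundle.zip"
        with zipfile.ZipFile(archive, "w") as zf:
            # Stand-in for a dependency that pip would have installed.
            zf.writestr("site-packages/hypothetical_dep.py", "VALUE = 42\n")
        bootstrap(archive, tmp_path / "cache")
        module = importlib.import_module("hypothetical_dep")
        return module.VALUE


if __name__ == "__main__":
    print(demo())
```

Because the dependencies end up as ordinary files on disk, code that expects real filesystem paths keeps working, unlike imports served directly from inside a zip file.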

Because we optimize for a shorter sys.path and don’t include pkg_resources in the critical path, executables created with shiv are often even faster than running a script from within a virtualenv.

Why shiv?

The tool freezes a Python environment, so you can think of shiv as a shorter way of saying “shiver.”

Acknowledgements

Special thanks to Barry Warsaw and Dan Sully for their help open sourcing this project.

We also have to credit the great work by @wickman, @kwlzn, @jsirois, and the other PEX contributors for laying the groundwork for shiv!