SRCco.de

My Python Project Setup

I have 8+ Python Open Source projects on github.com and on codeberg.org. This post describes the tools and practices I currently use to maintain them.

We have a pretty active Python community at Zalando, so I could learn some good practices from colleagues who are much more experienced than I am. I would not have found or adopted some of these tools without my helpful Zalando colleagues.

My setup for Python projects includes:

  • Python 3.7+

  • Poetry for dependency management

  • Make to leverage muscle memory

  • black for code formatting

  • mypy for type checking

  • py.test for unit and e2e tests

  • pre-commit hooks to run formatting and linting

  • ReadTheDocs for documentation

  • CalVer for releases

Python 3.7+

I try to keep my projects up-to-date with the latest Python features. I make use of more recent features such as:

  • pathlib (3.4+): nicer path handling (replaces most uses of os.path), e.g. path = Path(__file__).parent / "myfile.txt"

  • typing (3.5+): type hints which can be checked with mypy

  • f-strings (3.6+): fast inline template strings: f"Hello {name}!" instead of "Hello {}".format(name)

  • asyncio (3.7+ for asyncio.run()): write concurrent code with async/await

My projects therefore require at least Python 3.7.
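A minimal sketch that combines the four features listed above in one small module. The function names (greeting_file, greet, greet_all) are made up for illustration and do not come from any of my projects:

```python
import asyncio
from pathlib import Path
from typing import List


def greeting_file(base: Path, name: str) -> Path:
    """Build a path with the / operator instead of os.path.join."""
    return base / f"hello-{name}.txt"


async def greet(name: str, delay: float = 0.0) -> str:
    """Simulate some I/O, then return a greeting built with an f-string."""
    await asyncio.sleep(delay)
    return f"Hello {name}!"


async def greet_all(names: List[str]) -> List[str]:
    """Run all greetings concurrently; gather keeps the input order."""
    return list(await asyncio.gather(*(greet(name) for name in names)))


if __name__ == "__main__":
    print(greeting_file(Path("/tmp"), "world"))
    # asyncio.run() is the piece that needs Python 3.7+
    print(asyncio.run(greet_all(["Alice", "Bob"])))
```

The type hints cost nothing at runtime, but they let mypy check every call to these functions.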

I use pyenv on my local computer to get the latest Python version (3.8.1). See Real Python's blog series on cool new Python features: 3.7, 3.8.

Poetry

I use Poetry for dependency and package management. Poetry uses virtualenvs, has a better dependency resolver than Pipenv, and implements PEP 518 (aka pyproject.toml).

I used Pipenv before and converted the Pipfiles with dephell to pyproject.toml configuration (still needed some manual work afterwards).

Make

Leveraging muscle memory is powerful: make is easy to remember and one of the first things people try in a repo. GNU Make is pretty ubiquitous and can safely be assumed to be present on a developer machine, so any missing dependency (like Poetry) will be discovered by trial and error:

$ make
make: poetry: Command not found  # <-- ah, "poetry" is required!
Makefile:3: recipe for target 'install' failed
make: *** [install] Error 127

My standard targets are make lint and make test.

This blog post taught me a few things (but I did not follow all of its advice).

black & Flake8

Code formatting can spark heated and unnecessary debates, and I was jealous of Go having go fmt, so I was very happy to see black come along. black is an uncompromising code formatter for Python: it has nearly no options to tweak and therefore sets a standard across the globe. Luckily Python already had PEP 8 and Flake8, so black is merely a tool to achieve standards compliance without human effort.

There are still some dark corners with black+Flake8: black sometimes generates code which Flake8 complains about, so we need to tell Flake8 to ignore these violations. This can be done via .flake8:

[flake8]
ignore=E501,E203

  • ignore E501: black does not always enforce the maximum line length, e.g. it won't break long docstrings or comments

  • ignore E203: black puts whitespace before the colon in slices with complex bounds, e.g. mylist[len(prefix) :], and Flake8 flags that whitespace
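To make the E203 case concrete, here is a small sketch of the kind of slice that triggers it. strip_prefix is a hypothetical helper, not code from one of my projects:

```python
def strip_prefix(value: str, prefix: str) -> str:
    """Return value without prefix if it starts with it, else unchanged."""
    if value.startswith(prefix):
        # black treats ":" in a slice like a binary operator when a bound is
        # a complex expression, so it keeps the space before the colon.
        # Flake8 reports exactly that space as E203.
        return value[len(prefix) :]
    return value


# Slices with simple bounds are formatted without the space, so no E203 here.
assert "release-20.1.0"[8:] == "20.1.0"
assert strip_prefix("release-20.1.0", "release-") == "20.1.0"
```

Both spellings are valid Python with identical behavior, so ignoring E203 only silences a style disagreement between the two tools.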

mypy

I started using mypy for type checking. This was triggered by a bug I introduced months ago: I refactored a function signature and had no tests for it. Tests would have caught the bug, but mypy would have caught it, too. I introduced mypy instead of adding tests (shame on me!). Having mypy cover these cases is better than nothing, and typing can be improved gradually: specific lines can be ignored by adding a # type: ignore comment.
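The following sketch shows the kind of bug I mean. It is not the original code: format_size and its old signature format_size(size, precision) are invented for illustration.

```python
from typing import Optional


def format_size(size_bytes: int, unit: Optional[str] = None) -> str:
    """Format a byte count (hypothetical; formerly format_size(size, precision))."""
    if unit == "KiB":
        return f"{size_bytes / 1024:.1f} KiB"
    return f"{size_bytes} B"


# Call site that was updated during the refactoring: type checks and runs.
print(format_size(2048, unit="KiB"))

# Call site that was missed: it still passes the old "precision" argument.
# Python runs it without raising and silently returns the wrong format,
# but mypy rejects it because an int is passed where Optional[str] is expected.
# The comment below silences that error on this one line only.
stale = format_size(2048, 1)  # type: ignore
print(stale)
```

Note that # type: ignore only hides the finding. It is useful for adopting mypy gradually, but the stale call above is still a bug that needs fixing.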

py.test

Using py.test as a test framework instead of the "old" unittest library does not need to be elaborated: it's just so much easier to use and less code to write! An example that asserts a specific exception message:

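from datetime import datetime, timezone
import pytest

# matches_time_spec is the project function under test (its import is omitted here)

def test_invalid_weekday_range():
    # Monday, November 27th 2017
    dt = datetime(2017, 11, 27, 15, 33, tzinfo=timezone.utc)
    with pytest.raises(ValueError) as excinfo:
        matches_time_spec(dt, "Sun-Fri 15:30-16:00 UTC")
    assert "invalid range (Sun is after Fri)" in str(excinfo.value)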
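For comparison, this is roughly what the same kind of check looks like with unittest. check_weekday_range is a simplified, hypothetical stand-in for the real function so that the example is self-contained:

```python
import unittest

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def check_weekday_range(spec: str) -> None:
    """Hypothetical stand-in: reject ranges like "Sun-Fri" that run backwards."""
    start, end = spec.split("-")
    if WEEKDAYS.index(start) > WEEKDAYS.index(end):
        raise ValueError(f"invalid range ({start} is after {end})")


class WeekdayRangeTest(unittest.TestCase):
    def test_invalid_weekday_range(self):
        with self.assertRaises(ValueError) as ctx:
            check_weekday_range("Sun-Fri")
        self.assertIn("invalid range (Sun is after Fri)", str(ctx.exception))
```

The test needs a class, a self parameter and dedicated assert methods, whereas the py.test version is a plain function with a plain assert statement.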

I created pytest-kind as a py.test plugin to support e2e testing with a local kind Kubernetes cluster.

Pre-Commit Hooks

Do you know the pre-commit framework? I did not, but fell in love with it recently!

The framework lets you configure pluggable hooks to check all kinds of different files, e.g.:

  • make sure that line endings are consistent (also important when working with Windows colleagues)

  • strip unnecessary whitespace (avoids unnecessary git diffs)

  • validate YAML/Dockerfile/... syntax

  • validate Kubernetes manifests (easy to get some deployment spec wrong)

  • format Python code with black

  • lint Python code (Flake8, mypy, Bandit)

All code formatting (black) and linting are executed via pre-commit hooks on Travis CI. make lint runs pre-commit on all files, e.g.:

$ make lint
poetry run pre-commit run --all-files
Check hooks apply to the repository.........................Passed
Check for useless excludes..................................Passed
Check Kubernetes manifests..................................Passed
Reorder python imports......................................Passed
black.......................................................Passed
pydocstyle..................................................Passed
yamllint....................................................Passed
mypy........................................................Passed
Dockerfile linter...........................................Passed
Check for added large files.................................Passed
Check docstring is first....................................Passed
Debug Statements (Python)...................................Passed
Fix End of Files............................................Passed
Flake8......................................................Passed
Trim Trailing Whitespace....................................Passed
Check python ast............................................Passed
Check builtin type constructor use..........................Passed
Detect Private Key..........................................Passed
Mixed line ending...........................................Passed
Tests should end in _test.py................................Passed
type annotations not comments...............................Passed
use logger.warning(.........................................Passed
check for eval()............................................Passed
check for not-real mock methods.............................Passed
check blanket noqa..........................................Passed

By using the pre-commit git hooks locally, I get quick feedback and don't have to remember to run make lint manually. The .pre-commit-config.yaml file is a helpful abstraction to share common formatting/linting configuration across repositories, i.e. I can copy .pre-commit-config.yaml around to apply a common standard to my projects.

My .pre-commit-config.yaml for Python looks like:

minimum_pre_commit_version: 1.21.0
repos:
- repo: meta
  hooks:
  - id: check-hooks-apply
  - id: check-useless-excludes

# reorder Python imports
- repo: https://github.com/asottile/reorder_python_imports
  rev: v1.9.0
  hooks:
  - id: reorder-python-imports

# format Python code with black
- repo: https://github.com/ambv/black
  rev: 19.10b0
  hooks:
  - id: black

# check docstrings
- repo: https://github.com/PyCQA/pydocstyle
  rev: 5.0.2
  hooks:
  - id: pydocstyle
    args: ["--ignore=D10,D21,D202"]

# static type checking with mypy
- repo: https://github.com/pre-commit/mirrors-mypy
  rev: v0.761
  hooks:
  - id: mypy

- repo: https://github.com/pre-commit/pre-commit-hooks
  rev: v2.4.0
  hooks:
  - id: check-added-large-files
  - id: check-docstring-first
  - id: debug-statements
  - id: end-of-file-fixer
  - id: flake8
    additional_dependencies: ["flake8-bugbear"]
  - id: trailing-whitespace
  - id: check-ast
  - id: check-builtin-literals
  - id: detect-private-key
  - id: mixed-line-ending
  - id: name-tests-test
    args: ["--django"]

ReadTheDocs

Not all my open source projects have dedicated documentation sites, but if I need one, I pick ReadTheDocs with Sphinx to publish documentation. See the Kubernetes Web View Documentation as an example.

Calendar Versioning

I switched all my projects to Calendar Versioning (CalVer). Releases now have a version like YY.MM.MICRO, e.g. 20.1.0 for the first release in January 2020.
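As an illustration of the scheme, a small sketch that computes the next YY.MM.MICRO version. next_calver is a hypothetical helper written for this post, not part of my release tooling:

```python
from datetime import date
from typing import Optional


def next_calver(today: date, previous: Optional[str] = None) -> str:
    """Return the next YY.MM.MICRO version for a release made on `today`."""
    year, month = today.year % 100, today.month
    micro = 0
    if previous is not None:
        prev_year, prev_month, prev_micro = (int(part) for part in previous.split("."))
        if (prev_year, prev_month) == (year, month):
            # Another release in the same month: only bump the counter.
            micro = prev_micro + 1
    return f"{year}.{month}.{micro}"


print(next_calver(date(2020, 1, 15)))            # first release in January 2020
print(next_calver(date(2020, 1, 20), "20.1.0"))  # second release in the same month
```

The MICRO part restarts at 0 whenever the month changes, so the first release of every month ends in .0.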

Why? I believe SemVer is mostly a lie: it sounds good in theory, but in practice any change can be breaking (e.g. bug fixes) and often nobody knows when to increment the major version:

  • Kubernetes: nobody knows when to increment from 1.* to 2.*; breaking changes are introduced over multiple releases

  • Some projects never make it to 1.0 ("ZeroVer", e.g. Cython is still 0.28, but used in production). This was also the case for my personal projects (I never had the courage to make it to version 1.0)

SemVer would only really work if the previous version were maintained, so that users can stay with the previous major version and still get bug fixes. I don't plan to support older stable releases for my open source projects, i.e. users don't really have the option to not upgrade (if they want to receive potential bug fixes).

That being said, I still try to keep compatibility and avoid unnecessary breaking changes --- I just won't guarantee it.

A simple release counter would also do (like Kubernetes does with 1.X where X just increments all the time), but CalVer has some nice benefits:

  • old versions are immediately visible: "I still use the foo library in version 18.2.0? It's 2020, so that version is 2 years old!"

  • it encourages working in small batches and releasing more often: regular (e.g. monthly) releases are a good way to stay up-to-date with the environment (all kinds of dependencies update all the time)
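The first benefit can even be automated. A minimal sketch, assuming the YY.MM.MICRO scheme from above and years in the 2000s; calver_age_in_months is a made-up helper name:

```python
from datetime import date


def calver_age_in_months(version: str, today: date) -> int:
    """Months between the release month encoded in YY.MM.MICRO and `today`."""
    year, month, _micro = (int(part) for part in version.split("."))
    # Assumption: two-digit years always mean 20YY.
    return (today.year - (2000 + year)) * 12 + (today.month - month)


print(calver_age_in_months("18.2.0", date(2020, 2, 1)))  # 24 months, i.e. 2 years
```

With SemVer or a plain release counter this calculation is impossible without looking up the release date somewhere else.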

I think that SemVer has its merits, but it's not a silver bullet for all projects --- just having a tuple of 3 numbers does not make a semantic version.

Summary

I'm relatively happy with my current collection of tools & practices around Python. There are always new things to learn and tools to discover, e.g. I was surprised to learn about pre-commit only very recently. Anything I can do better? Do you have tips and suggestions? Please let me know on Twitter or Mastodon!