Python Style Guide

It is our collective responsibility to enforce this Style Guide since our chosen linter does not catch everything.

Linter

We use Black, a code formatter, as our linter, with its default configuration.

There is a manual CI job in the review stage that lints the entire repo and returns a non-zero exit code if any files need to be formatted. Both the MR author and the reviewer are responsible for making sure that this job passes before the MR is merged. To format the entire repo, run black . from the top of the repo. To only report which files would change without modifying them, run black --check . instead.
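As a rough sketch of what such a check amounts to, the helper below runs a command and returns its exit code so that a CI step can fail on it. The name run_format_check is hypothetical, and the black --check . invocation shown in the comment assumes Black is installed on the runner; this is not the team's actual CI job definition.

```python
import subprocess
import sys
from typing import List


def run_format_check(command: List[str]) -> int:
    """
    Run a formatting check command and return its exit code.
    """
    completed = subprocess.run(command, check=False)
    return completed.returncode


# Assumed usage in a CI script (requires Black to be installed):
#     sys.exit(run_format_check(["black", "--check", "."]))

# Demonstration with a command that is always available: the current interpreter.
print(run_format_check([sys.executable, "--version"]))
```

Returning the exit code instead of raising keeps the decision about failing the job with the caller.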

Type Hints

All function signatures should contain type hints, including for the return type, even if it is None. Type hints serve as documentation and can also be checked with mypy to catch type errors before the code runs.

Examples:

def foo(x: int, y: int) -> int:
    """
    Add two numbers together and return the result.
    """

    return x + y


def bar(some_str: str) -> None:
    """
    Print a string.
    """
    print(some_str)
    return
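To illustrate how hints help with error checking, here is a minimal sketch. The function names find_first_even and log_result are hypothetical and used only for illustration.

```python
from typing import List, Optional


def find_first_even(numbers: List[int]) -> Optional[int]:
    """
    Return the first even number in the list, or None if there is none.
    """
    for number in numbers:
        if number % 2 == 0:
            return number
    return None


def log_result(result: Optional[int]) -> None:
    """
    Print a message describing the result.
    """
    if result is None:
        print("No even number found.")
        return
    print(f"First even number: {result}")


log_result(find_first_even([1, 3, 4, 6]))

# A type checker such as mypy would flag a call like find_first_even("12"),
# because a str is not a List[int].
```

The Optional[int] return type also makes it explicit that callers have to handle the None case.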

Import Order

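Imports should follow the PEP 8 rules: standard library imports first, then third-party packages, then local modules, with a blank line between groups. Furthermore, within each group, import ... statements should come before from ... import ... statements.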

Example:

import logging
import sys
from os import environ

import pandas as pd
from requests import get

import some_local_module
from another_local_module import something

Docstrings

Every function should have a docstring. Since we are using type hints in the function signature, there is no requirement to describe each parameter. Docstrings should use triple double quotes and complete sentences with punctuation.

Examples:

def foo(x: int, y: int) -> int:
    """
    Add two numbers together and return the result.
    """

    return x + y


def bar(some_str: str) -> None:
    """
    Print a string.

    This is another proper sentence.
    """
    print(some_str)
    return

How to integrate Environment Variables

To make functions as reusable as possible, using environment variables directly in functions is highly discouraged unless there is a very good reason (there is an example of this below). Instead, the best practice is to either pass in the specific variable you want to use or pass all of the environment variables in as a dictionary. This lets the function accept any dictionary and does not require the variables to be defined at the environment level.

Examples:

import os
from typing import Dict


## Don't do this!
def foo(x: int) -> int:
    """
    Add x to the environment variable y and return the result.
    """

    # Reading os.environ here means foo only works when y is set in the environment.
    # Environment values are strings, so y has to be converted before adding.
    return x + int(os.environ["y"])


foo(1)

## Do this!
env_vars = os.environ.copy()  # The copy method returns a normal dict of the env vars.


def bar(some_str: str, another_string: str) -> None:
    """
    Print two strings concatenated together.
    """
    print(some_str + another_string)
    return


bar("foo", env_vars["bar"])


## Or do this!
def bar(some_str: str, env_vars: Dict[str, str]) -> None:
    """
    Print two strings concatenated together.
    """
    print(some_str + env_vars["another_string"])
    return


bar("foo", env_vars)
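One practical benefit of passing the variables in is that the function can be exercised with a plain dictionary, without touching the real environment. A minimal sketch, assuming a hypothetical function build_greeting and a hypothetical GREETING key:

```python
from typing import Dict


def build_greeting(name: str, env_vars: Dict[str, str]) -> str:
    """
    Return a greeting built from the GREETING entry of env_vars.
    """
    return f"{env_vars['GREETING']}, {name}!"


# In production the caller could pass os.environ.copy().
# In a test or a notebook, any dict with the right keys works.
fake_env_vars = {"GREETING": "Hello"}
print(build_greeting("data team", fake_env_vars))
```

Because the function never reads os.environ itself, nothing has to be exported in the shell before calling it.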

Package Aliases

We use a few standard aliases for common third-party packages. They are as follows:

Variable Naming Conventions

When possible, use descriptive names for variables, especially with regard to data type. Here are some examples:

Adding the type to the name makes code self-documenting, although for constants (particularly strings and numbers) it is usually less helpful.
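A minimal sketch of this convention follows. The variable names are hypothetical and chosen only for illustration; they are not taken from the team's codebase.

```python
from typing import Dict, List

# Collections: the suffix tells the reader what kind of container to expect.
user_id_list: List[int] = [101, 102, 103]
email_by_user_id_dict: Dict[int, str] = {101: "ada@example.com"}

# Constants: the type suffix adds little, so a plain descriptive name is enough.
MAX_RETRIES = 3
DEFAULT_SCHEMA = "analytics"

for user_id in user_id_list:
    # dict.get returns None for user IDs without a known email address.
    print(user_id, email_by_user_id_dict.get(user_id))
```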

When not to use Python

Since this style guide is for the entire data team, it is important to remember that there is a time and place for using Python and it is usually outside of the data modeling phase. Stick to SQL for data manipulation tasks where possible.