This lesson has passed peer review! See the publication in JOSE.

Python for Atmosphere and Ocean Scientists: Glossary

Key Points

Package management
  • xarray and iris are the core Python libraries used in the atmosphere and ocean sciences.

  • Use conda to install and manage your Python environments.

Data processing and visualisation
  • Libraries such as xarray can make loading, processing and visualising netCDF data much easier.

  • The cmocean library contains colormaps custom made for the ocean sciences.

Functions
  • Define a function using def name(...params...).

  • The body of a function must be indented.

  • Call a function using name(...values...).

  • Use help(thing) to view help for something.

  • Put docstrings in functions to provide help for that function.

  • Specify default values for parameters when defining a function using name=value in the parameter list.

  • The readability of your code can be greatly enhanced by using numerous short functions.

  • Write (and import) modules to avoid code duplication.
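
The points above can be sketched with one short function. The name celsius_to_kelvin and its offset parameter are made up for this illustration and are not taken from the lesson.

```python
def celsius_to_kelvin(temp_c, offset=273.15):
    """Convert a temperature from Celsius to Kelvin.

    Args:
      temp_c (float): Temperature in degrees Celsius
      offset (float): Value added to convert to Kelvin (default 273.15)

    """
    # The indented lines form the body of the function
    return temp_c + offset


# Call the function, first relying on the default value, then overriding it
freezing_k = celsius_to_kelvin(0.0)
shifted_k = celsius_to_kelvin(0.0, offset=273.0)

# help() prints the docstring written above
help(celsius_to_kelvin)
```

Saving the function in a file (say, a hypothetical unit_conversion.py) would let other scripts reuse it via import rather than copying the code.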

Command line programs
  • Libraries such as defopt can be used to efficiently handle command line arguments.

  • Most Python scripts have a similar structure that can be used as a template.
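
One possible shape for such a template is sketched below. To keep the example free of extra installs it uses argparse from the standard library rather than defopt, so it is a stand-in and not the lesson's own approach. The function names and the --variable and --season options are hypothetical.

```python
import argparse


def describe_plot(variable, season):
    """Return a short description of the requested plot."""
    return f"Plotting {variable} climatology for {season}"


def parse_arguments(argv=None):
    """Parse command line arguments (argv=None means read sys.argv)."""
    parser = argparse.ArgumentParser(description="Describe a climatology plot.")
    parser.add_argument("--variable", type=str, default="pr",
                        help="Variable name")
    parser.add_argument("--season", type=str, default="DJF",
                        choices=["DJF", "MAM", "JJA", "SON"],
                        help="Season to plot")
    return parser.parse_args(argv)


def main(args):
    """Run the program."""
    print(describe_plot(args.variable, args.season))


# Only run main() when the file is executed as a script, not when imported
if __name__ == "__main__":
    main(parse_arguments())
```

The layout (imports, then function definitions, then a guarded call to main at the bottom) is what carries over from script to script; only the functions and arguments change.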

Version control
  • Use git config to configure a user name, email address, editor, and other preferences once per machine.

  • git init initializes a repository.

  • git status shows the status of a repository.

  • Files can be stored in a project’s working directory (which users see), the staging area (where the next commit is being built up) and the local repository (where commits are permanently recorded).

  • git add puts files in the staging area.

  • git commit saves the staged content as a new commit in the local repository.

  • Always write a log message when committing changes.

  • git diff displays differences between commits.

  • git restore recovers old versions of files.

GitHub
  • A local Git repository can be connected to one or more remote repositories.

  • Use the HTTPS protocol to connect to remote repositories until you have learned how to set up SSH.

  • git push copies changes from a local repository to a remote repository.

  • git pull copies changes from a remote repository to a local repository.

Vectorisation
  • For large arrays, looping over each element can be slow in high-level languages like Python.

  • Vectorised operations can be used to avoid looping over array elements.
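
A minimal sketch of the difference, assuming NumPy is installed. The temperature values and the 16.0 threshold are invented for illustration.

```python
import numpy as np

temps_c = np.array([12.5, 15.0, 18.2, 21.7])

# Loop version: visits each element one at a time in Python
temps_k_loop = np.empty_like(temps_c)
for i in range(temps_c.size):
    temps_k_loop[i] = temps_c[i] + 273.15

# Vectorised version: a single expression applied to the whole array
temps_k_vec = temps_c + 273.15

# Conditions can be vectorised too: keep values above 16.0, mask the rest
warm_only = np.where(temps_c > 16.0, temps_c, np.nan)
```

With only four values the two versions take about the same time; the gap tends to show up as arrays grow to the size of typical gridded datasets.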

Defensive programming
  • Program defensively, i.e., assume that errors are going to arise, and write code to detect them when they do.

  • Put assertions in programs to check their state as they run, and to help readers understand how those programs are supposed to work.

  • The pdb library can be used to debug a Python script by stepping through line-by-line.

  • Software Carpentry has more advanced lessons on code testing.
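
As a small illustration of assertions, the sketch below checks its inputs before computing and its result afterwards. The function mean_precipitation and its checks are hypothetical examples, not code from the lesson.

```python
def mean_precipitation(values):
    """Return the mean of a sequence of precipitation values (mm/day)."""
    # Check the inputs before doing any work
    assert len(values) > 0, "At least one value is required"
    assert all(value >= 0 for value in values), "Precipitation cannot be negative"

    result = sum(values) / len(values)

    # Check the result before handing it back
    assert result >= 0, "Mean precipitation cannot be negative"
    return result


mean_value = mean_precipitation([1.0, 2.0, 3.0])

# Invalid input is caught immediately rather than producing a silent error
try:
    mean_precipitation([1.0, -2.0, 3.0])
    caught_bad_input = False
except AssertionError:
    caught_bad_input = True
```

The assertion messages double as documentation: a reader can see at a glance what the function expects of its data.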

Data provenance
  • It is possible (in only a few lines of code) to record the provenance of a data file or image.
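
A standard-library sketch of what such a record might contain: a time stamp, the Python executable and the command line that was run. The function provenance_record and the file names are hypothetical, and dedicated provenance libraries may format their records differently.

```python
import datetime
import sys


def provenance_record(argv=None):
    """Return a one-line record of when and how a script was run."""
    if argv is None:
        argv = sys.argv  # the command line arguments of the running script
    timestamp = datetime.datetime.now().strftime("%a %b %d %H:%M:%S %Y")
    return f"{timestamp}: {sys.executable} {' '.join(argv)}"


# Example with a made-up command line
record = provenance_record(["plot_precipitation.py", "input.nc", "output.png"])
print(record)
```

A record like this could then be stored alongside the output, for example in the metadata of the data file or image it describes.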

Large data
  • Libraries such as dask and xarray can make loading, processing and visualising large netCDF datasets much easier.

  • Dask can speed up processing through parallelism, but care may be needed, particularly with data chunking.
