Package management
|
|
Data processing and visualisation
|
Libraries such as xarray can make loading, processing and visualising netCDF data much easier.
The cmocean library contains colormaps custom made for the ocean sciences.
|
Functions
|
Define a function using def name(...params...).
The body of a function must be indented.
Call a function using name(...values...).
Use help(thing) to view help for something.
Put docstrings in functions to provide help for that function.
Specify default values for parameters when defining a function using name=value in the parameter list.
Breaking code into many short functions can greatly improve its readability.
Write (and import) modules to avoid code duplication.
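The points above can be illustrated with a minimal sketch. The function name rate_to_total and its parameters are hypothetical, chosen only for illustration; the example assumes a rate given per second and a default period of one day.

```python
def rate_to_total(rate_per_second, seconds=86400):
    """Convert a per-second rate into a total over a period.

    rate_per_second: the rate to convert (for example kg m-2 s-1)
    seconds: length of the period in seconds (defaults to one day)
    """
    # The indented lines form the body of the function.
    return rate_per_second * seconds


# Call the function with a value; the default period (one day) is used.
daily_total = rate_to_total(0.0001)

# Override the default by naming the parameter in the call.
hourly_total = rate_to_total(0.0001, seconds=3600)

# The docstring is what help(rate_to_total) would display.
doc_text = rate_to_total.__doc__
```

Saving a function like this in its own file (a module) and importing it from other scripts is one way to avoid copying the same code into several places.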
|
Command line programs
|
|
Version control
|
Use git config to configure a user name, email address, editor, and other preferences once per machine.
git init initializes a repository.
git status shows the status of a repository.
Files can be stored in a project’s working directory (which users see), the staging area (where the next commit is being built up) and the local repository (where commits are permanently recorded).
git add puts files in the staging area.
git commit saves the staged content as a new commit in the local repository.
Always write a log message when committing changes.
git diff displays differences between commits.
git restore recovers old versions of files.
|
GitHub
|
A local Git repository can be connected to one or more remote repositories.
Use the HTTPS protocol to connect to remote repositories until you have learned how to set up SSH.
git push copies changes from a local repository to a remote repository.
git pull copies changes from a remote repository to a local repository.
|
Vectorisation
|
For large arrays, looping over each element can be slow in high-level languages like Python.
Vectorised operations can be used to avoid looping over array elements.
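As a small sketch of the difference, the example below converts an array of temperatures from Kelvin to Celsius twice: once with an explicit loop and once with a vectorised NumPy expression. It assumes NumPy is installed; the array contents and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical data: 100,000 temperatures in Kelvin.
kelvin = np.linspace(250.0, 310.0, 100_000)

# Loop version: visits every element one at a time in Python.
celsius_loop = np.empty_like(kelvin)
for i in range(kelvin.size):
    celsius_loop[i] = kelvin[i] - 273.15

# Vectorised version: the subtraction is applied to the whole array at once.
celsius_vec = kelvin - 273.15

# Conditions can be vectorised too: keep values above freezing, zero the rest.
above_freezing = np.where(celsius_vec > 0.0, celsius_vec, 0.0)
```

Both versions give the same values; the vectorised form is shorter and, for large arrays, typically much faster because the loop runs in compiled code rather than in the Python interpreter.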
|
Defensive programming
|
Program defensively, i.e., assume that errors are going to arise, and write code to detect them when they do.
Put assertions in programs to check their state as they run, and to help readers understand how those programs are supposed to work.
The pdb library can be used to debug a Python script by stepping through line-by-line.
Software Carpentry has more advanced lessons on code testing.
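A minimal sketch of assertions in practice is shown below. The function mean_of_fractions is a hypothetical helper invented for this example; it assumes its inputs are fractions between 0 and 1 and checks that assumption as it runs.

```python
def mean_of_fractions(values):
    """Return the mean of values that are expected to lie between 0 and 1."""
    # Precondition: there must be something to average.
    assert len(values) > 0, "values must not be empty"

    # Precondition: every input must be a valid fraction.
    for value in values:
        assert 0.0 <= value <= 1.0, f"fraction out of range: {value}"

    result = sum(values) / len(values)

    # Postcondition: the mean of fractions must itself be a fraction.
    assert 0.0 <= result <= 1.0, "mean should stay within [0, 1]"
    return result


# Valid input passes every check.
ok = mean_of_fractions([0.2, 0.4, 0.9])

# Invalid input is detected immediately, with a message saying what went wrong.
try:
    mean_of_fractions([0.2, 1.7])
    caught = None
except AssertionError as err:
    caught = str(err)
```

The assertions also document what the function expects, so a reader can see the intended range of inputs and outputs without running the code. When a check fails and the cause is not obvious, running the script under the debugger with python -m pdb followed by the script name allows stepping through it line by line.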
|
Data provenance
|
|
Large data
|
Libraries such as Dask and xarray can make loading, processing and visualising netCDF data much easier.
Dask can speed up processing through parallelism, but care may be needed, particularly with data chunking.
|