In order to work efficiently as a Node Manager, the following programs are necessary and/or useful.
To standardize the verification and quality control process that all contributed data is subjected to, OTN has built custom quality control workflows and tools for Node Managers, often referred to as the OTN Nodebooks. The underlying functions are written in Python, and the workflows that rely on them are run through Jupyter Notebooks. To use these tools and interact with your database, you will need to install a few different applications and packages. All installation instructions are also available on our GitLab.
This lesson will give attendees a chance to install all the relevant software, under the supervision of OTN staff.
Python/Mamba
Python is a popular general-purpose programming language that can be used for a wide variety of applications. It is the main language used by OTN and our data processing pipeline.
Mamba is a fast, cross-platform Python distribution and package manager. When you install Mamba (through Miniforge) you get a Python interpreter and many of the core Python libraries. Managing your Python installation with Mamba lets you install and update all the packages needed to run the Nodebooks with one command, rather than installing each one individually.
Miniforge Windows - https://conda-forge.org/miniforge/
- Select the option to install for Just Me (recommended).
- Check the option to Add Miniforge3 to my PATH environment variable.
Miniforge Mac
- Set up Homebrew by running the command:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- Note: this operation requires elevated privileges (sudo).
- Use the commands output by brew to add brew to your system path.
- Use brew to install Miniforge:
brew install miniforge
- Add Miniforge to your zsh environment by running:
conda init zsh
- Restart the terminal.
Miniforge Linux (Debian)
- Download the shell script (.sh) file from https://conda-forge.org/miniforge/
- Recommended: Choose Python 64-bit Linux Installer
- Change the run permissions for the Miniforge installer script, e.g.
chmod +x Miniforge3-[version]-Linux-x86_64.sh
- Run the installer from the Linux terminal:
./Miniforge3-[version]-Linux-x86_64.sh
- Add this conda installation to your terminal environment by running:
conda init
- Restart the terminal to see the changes reflected.
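Once the terminal has been restarted, a quick way to confirm that the install worked is to check that the commands are visible on your PATH. The following is a minimal sketch, not an official OTN tool: the function name `find_tools` is hypothetical, and it assumes that finding `conda` and `mamba` on the PATH is a reasonable sign that Miniforge was set up correctly.

```python
import shutil
import sys


def find_tools(names):
    """Map each command name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Show which Python interpreter is running this check.
    print("Python", sys.version.split()[0], "at", sys.executable)
    for name, path in find_tools(["conda", "mamba"]).items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If either command is reported as not found, re-check the PATH option (Windows) or the `conda init` step (Mac/Linux) and restart the terminal before trying again.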
Git
Git is a version-control system. It helps people work collaboratively and maintains a complete history of all changes made to a project. We use Git at OTN to track changes made to the Nodebooks by our developer team, and occasionally you will need to update your Nodebooks to include those changes.
Install Git
- Windows - https://git-scm.com/download/win
- Linux (Debian) - run the command:
sudo apt install git
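To confirm that Git is installed and reachable from your terminal, you can ask it for its version. The sketch below is one possible check, written in Python so that it behaves the same on Windows, Mac and Linux. The helper names `parse_git_version` and `installed_git_version` are hypothetical, and the parsing assumes the version appears as three dot-separated numbers in the output of `git --version`.

```python
import re
import subprocess


def parse_git_version(text):
    """Extract (major, minor, patch) from `git --version` output, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def installed_git_version():
    """Run `git --version`; return the parsed version, or None if Git cannot be run."""
    try:
        done = subprocess.run(
            ["git", "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_git_version(done.stdout)


if __name__ == "__main__":
    version = installed_git_version()
    print("Git version:", version if version else "Git not found on PATH")
```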
Nodebooks - iPython Utilities
The ipython-utilities project contains the collection of Jupyter notebooks used to load data into the OTN data system.
Create an Account
First, you will need a GitLab account. Please fill out this signup form for an account on GitLab.
Then, OTN staff will give you access to the relevant Projects containing the code we will use.
Install iPython Utilities
- Determine the folder in which you wish to keep the iPython Utilities Nodebooks.
- Open your terminal or command prompt app.
  - Type cd followed by a space.
  - You then need to get the filepath to the folder in which you wish to keep the iPython Utilities Nodebooks. You can either drag the folder into the terminal or command prompt app, or hold shift/option while right-clicking the folder and select "copy as path" from the menu.
  - Then paste the filepath into the terminal or command prompt and hit enter.
  - In summary, you should type
  cd /path/to/desired/folder
  before pressing enter.
- Create and activate the “nodebook” Python environment. The creation process only needs to happen once.
  - In your terminal, run the command:
  conda create -n nodebook python=3.9
  - Activate the nodebook environment using:
  conda activate nodebook
- You are now able to run commands in that folder. Now run:
git clone https://gitlab.oceantrack.org/otn-partner-nodes/ipython-utilities.git
This will get the latest version of iPython Utilities from our GitLab.
- Navigate to the ipython-utilities subdirectory that was created by running:
cd ipython-utilities
- Switch to the integration branch (which contains the most up-to-date code) by running:
git checkout integration
- If you did not already create the nodebook environment in the earlier step, create it now by running the command:
conda create -n nodebook python=3.9
This will create the environment named nodebook using Python version 3.9.XX.
- Now install all required Python packages by running the following:
mamba env update -n nodebook -f environment.yml
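Because several of the commands above must be run from a specific folder, it can help to see the whole sequence in one place. The sketch below only prints the commands alongside the folder each should be run from and does not execute anything. The function name `install_steps` is hypothetical, and the folder path is the same placeholder used in the steps above. `conda activate` is left out of the list because the final command names the environment directly with `-n nodebook`.

```python
from pathlib import Path

REPO_URL = "https://gitlab.oceantrack.org/otn-partner-nodes/ipython-utilities.git"


def install_steps(parent_dir):
    """Return the install commands as (argument list, working directory) pairs."""
    parent = Path(parent_dir)
    repo = parent / "ipython-utilities"  # created by `git clone`
    return [
        (["conda", "create", "-n", "nodebook", "python=3.9"], parent),
        (["git", "clone", REPO_URL], parent),
        (["git", "checkout", "integration"], repo),
        (["mamba", "env", "update", "-n", "nodebook", "-f", "environment.yml"], repo),
    ]


if __name__ == "__main__":
    # Print each command with the folder it should be run from.
    for args, cwd in install_steps("/path/to/desired/folder"):
        print(f"(in {cwd}) {' '.join(args)}")
```

Keeping the working directory next to each command makes the most common mistake easy to spot: running `git checkout` or the environment update from outside the ipython-utilities folder.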
To open and use the OTN Nodebooks:
- MAC/WINDOWS/LINUX: Open your terminal and navigate to your ipython-utilities directory using
cd /path/to/ipython-utilities
Then run the commands:
conda activate nodebook
to activate the nodebook Python environment, and
jupyter notebook --config="nb_config.py" "0. Home.ipynb"
to open the Nodebooks in a browser window.
- DO NOT CLOSE the terminal/CMD instance that opens! It needs to remain open in the background for the Nodebooks to be operational.
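If the launch command fails, a common cause is running it from the wrong folder. As a hedged illustration, the following sketch checks that the two files named in the launch command, `nb_config.py` and `0. Home.ipynb`, are present in the current directory. The function name `missing_launch_files` is hypothetical, and the check assumes both files sit at the top level of your ipython-utilities folder.

```python
from pathlib import Path

# File names taken from the launch command above.
REQUIRED_FILES = ["nb_config.py", "0. Home.ipynb"]


def missing_launch_files(repo_dir):
    """Return the launch files absent from repo_dir (an empty list means ready)."""
    repo = Path(repo_dir)
    return [name for name in REQUIRED_FILES if not (repo / name).is_file()]


if __name__ == "__main__":
    missing = missing_launch_files(Path.cwd())
    if missing:
        print("Not ready to launch; missing:", ", ".join(missing))
    else:
        print("Found nb_config.py and '0. Home.ipynb'; ready to launch.")
```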
More operating system-specific instructions and troubleshooting tips can be found at: https://gitlab.oceantrack.org/otn-partner-nodes/ipython-utilities/-/wikis/New-Install-of-Ipython-Utilities
Database Console Viewer
There are database administration applications to assist with interacting directly with your database. There are many options available, but DBeaver and DataGrip are the most popular options at OTN.
- https://dbeaver.io/ (free and open access - recommended)
- https://www.jetbrains.com/datagrip (free institutional/student access options - another option)
More Useful Programs
In order to work efficiently as a Node Manager, the following programs are necessary and/or useful.
Cross-Platform
Visual Studio Code - An advanced code editing integrated development environment (IDE).
For WINDOWS users
Path Copy Copy - For copying path links from your file browser. Since many of the notebooks require you to provide the path to the file you wish to load, being able to copy and paste the entire path at once can save a lot of time.
Notepad++ - For reading and editing code, csv files etc. without altering the formatting. Opening CSV files in Excel can change the formatting of the data in the file (this is a common problem with dates). Notepad++ will allow you to edit CSV files (and code, if necessary) without imposing additional formatting on data.
Tortoise Git - For managing git, avoiding command line. Depending on what changes have been made to the code, you may be required to use a different branch of the notebook repository than the main one. Although using git through the command line is supported, you may prefer to have a graphical user interface (GUI) instead. Tortoise Git can provide that.
For MAC users
Source Tree - For managing git, avoiding command line.
Node Training Datasets
We have created test datasets to use for this workshop. Each attendee has their own files, available at this link: http://129.173.48.161/data/repository/node_training/node-training-files-1
Please find the folder with your name and download it. Save the files somewhere safe on your computer, and UNZIP all of them.
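If your folder contains several zipped files, you can unzip them all at once rather than one at a time. This is a minimal sketch using Python's standard `zipfile` module. The function name `unzip_all` and the folder path in the usage line are placeholders, and it assumes the archives use the `.zip` extension and that extracting each into a folder named after the archive suits your layout.

```python
import zipfile
from pathlib import Path


def unzip_all(folder):
    """Extract every .zip archive in folder into a sibling directory named after it."""
    extracted = []
    for archive in sorted(Path(folder).glob("*.zip")):
        target = archive.with_suffix("")  # e.g. data.zip -> data/
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted


if __name__ == "__main__":
    # Placeholder path: replace with the folder where you saved your files.
    for folder in unzip_all("/path/to/your/training/files"):
        print("Extracted to", folder)
```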