OTN Node Manager Training: Setup

In order to work efficiently as a Node Manager, the following programs are necessary and/or useful.

To standardize the verification and quality control process that all contributing data is subjected to, OTN has built custom quality control workflows and tools for Node Managers, often referred to as the OTN Nodebooks. The underlying functions are written in Python, and the workflows that rely on them are run through Jupyter Notebooks. To use these tools and interact with your database, you will need to install a few different applications and packages. All installation instructions are also available on our GitLab.

This lesson will give attendees a chance to install all the relevant software, under the supervision of OTN staff.

Python/Mamba

Python is a popular general-purpose programming language that can be used for a wide variety of applications. It is the main language used by OTN and our data processing pipeline.

Mamba is a fast, cross-platform Python distribution and package manager. When you install Mamba (through Miniforge), you get a Python interpreter and many of the core Python libraries. Managing your Python installation with Mamba lets you install and update all the packages needed to run the Nodebooks with one command, rather than installing each one individually.
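After installing Miniforge, a quick way to confirm that everything is in place is to ask Python which interpreter it is running and whether the `conda` and `mamba` commands can be found. The sketch below is a hypothetical helper (`describe_python_install` is not part of the Nodebooks) that uses only the standard library; run it from the terminal (or Miniforge Prompt on Windows) you will use for the Nodebooks, since PATH settings can differ between terminals.

```python
import shutil
import sys


def describe_python_install() -> dict:
    """Report which Python is running and whether conda/mamba are on the PATH.

    Hypothetical helper for checking a Miniforge install; not part of the Nodebooks.
    """
    return {
        "python_executable": sys.executable,
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        "conda": shutil.which("conda"),  # None if conda is not on the PATH
        "mamba": shutil.which("mamba"),  # None if mamba is not on the PATH
    }


if __name__ == "__main__":
    for name, value in describe_python_install().items():
        print(f"{name}: {value or 'not found'}")
```

If `conda` or `mamba` reports "not found", the Miniforge installation is most likely not on the PATH of the terminal you are using.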

Miniforge Windows - https://conda-forge.org/miniforge/

Miniforge Mac -

Miniforge Linux (Debian)

Git

Git is a version-control system that helps people work collaboratively and maintains a complete history of all changes made to a project. We use Git at OTN to track changes to the Nodebooks made by our developer team, and occasionally you will need to update your Nodebooks to include those changes.
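Updating the Nodebooks usually means fetching the latest commits and pulling them into your local copy of the branch you work on (the installation steps below use the integration branch). The following is a minimal sketch, assuming your clone lives in a folder you pass in and that you have no uncommitted local changes that would conflict; `update_nodebooks` is a hypothetical helper that only wraps standard `git` commands. By default it performs a dry run and just returns the commands.

```python
import subprocess
from pathlib import Path


def update_nodebooks(repo_dir, branch="integration", dry_run=True):
    """Build (and optionally run) the git commands that update a local clone.

    Hypothetical helper. With dry_run=True (the default) nothing is executed;
    the list of commands is only returned so you can inspect it first.
    """
    repo = str(Path(repo_dir))
    commands = [
        ["git", "-C", repo, "fetch"],             # download new commits from GitLab
        ["git", "-C", repo, "checkout", branch],  # switch to the expected branch
        ["git", "-C", repo, "pull"],              # merge new commits into the local branch
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises CalledProcessError if git fails
    return commands
```

For example, `update_nodebooks("/path/to/ipython-utilities", dry_run=False)` would run the three commands in order and stop at the first one that fails.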

Install Git

Nodebooks - iPython Utilities

The ipython-utilities project contains the collection of Jupyter notebooks used to load data into the OTN data system.

Create an Account

First, you will need a GitLab account. Please fill out this signup form for an account on GitLab.

Then, OTN staff will give you access to the relevant Projects containing the code we will use.

Install iPython Utilities

  1. Determine the folder in which you wish to keep the iPython Utilities Nodebooks.
  2. Open your terminal or command prompt app.
    • Type cd then space.
    • Next, get the filepath of the folder in which you wish to keep the iPython Utilities Nodebooks. You can either drag the folder into the terminal or command prompt window, or hold Shift (Windows) or Option (Mac) while right-clicking the folder and select Copy as path (Copy as Pathname on Mac) from the menu.
    • Paste the filepath into the terminal or command prompt and press Enter.
    • In summary, you should type cd /path/to/desired/folder before pressing Enter.
  3. Create and activate the “nodebook” Python environment. You only need to create the environment once.
    • In your terminal, run the command conda create -n nodebook python=3.9. This creates an environment named nodebook using Python version 3.9.XX.
    • Activate the nodebook environment using conda activate nodebook
  4. From that folder, run: git clone https://gitlab.oceantrack.org/otn-partner-nodes/ipython-utilities.git. This will get the latest version of iPython Utilities from our GitLab.
  5. Navigate to the ipython-utilities subdirectory that was created by running cd ipython-utilities.
  6. Switch to the integration branch (which contains the most up-to-date code) by running git checkout integration.
  7. Install all required Python packages by running: mamba env update -n nodebook -f environment.yml
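Before opening the Nodebooks, it can help to confirm that the nodebook environment is the one actually active and that it runs Python 3.9. This is a minimal sketch, assuming the environment was activated with `conda activate`, which sets the `CONDA_DEFAULT_ENV` variable; `check_nodebook_env` is a hypothetical helper, and its optional arguments exist only so the check can be tested without a real environment.

```python
import os
import sys


def check_nodebook_env(expected_env="nodebook", expected_python=(3, 9),
                       environ=None, version_info=None):
    """Return a list of problems with the current environment (empty if all is well).

    Hypothetical helper; relies on conda setting CONDA_DEFAULT_ENV on activation.
    """
    environ = os.environ if environ is None else environ
    version_info = sys.version_info if version_info is None else version_info
    problems = []

    active = environ.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(f"active conda environment is {active!r}, expected {expected_env!r}")

    if tuple(version_info[:2]) != tuple(expected_python):
        running = ".".join(str(part) for part in version_info[:3])
        wanted = ".".join(str(part) for part in expected_python)
        problems.append(f"running Python {running}, expected {wanted}.x")

    return problems


if __name__ == "__main__":
    for problem in check_nodebook_env() or ["environment looks OK"]:
        print(problem)
```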

To open and use the OTN Nodebooks:

More operating system-specific instructions and troubleshooting tips can be found at: https://gitlab.oceantrack.org/otn-partner-nodes/ipython-utilities/-/wikis/New-Install-of-Ipython-Utilities

OTN Nodebooks - home page

Database Console Viewer

Database administration applications make it easier to interact directly with your database. Many options are available, but DBeaver and DataGrip are the most popular at OTN.

More Useful Programs


Cross-Platform

Visual Studio Code - An advanced code editing integrated development environment (IDE).

For WINDOWS users

Path Copy Copy - For copying path links from your file browser. Since many of the notebooks require you to provide the path to the file you wish to load, being able to copy and paste the entire path at once can save a lot of time.
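A copied path is not always ready to paste into a notebook as-is: Windows' built-in Copy as path wraps the path in double quotes, and pasting can pick up stray spaces or a trailing newline. The sketch below shows one way to tidy such a string before using it; `clean_pasted_path` is a hypothetical helper, not a Nodebook function.

```python
from pathlib import Path


def clean_pasted_path(raw: str) -> Path:
    """Turn a pasted path string into a Path, removing whitespace and wrapping quotes.

    Hypothetical helper; handles quotes added by Windows "Copy as path".
    """
    text = raw.strip()
    # Strip one pair of matching quotes around the whole path, if present.
    if len(text) >= 2 and text[0] == text[-1] and text[0] in {'"', "'"}:
        text = text[1:-1]
    # Expand a leading "~" to the user's home folder.
    return Path(text).expanduser()
```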

Notepad++ - For reading and editing code, CSV files, etc., without altering the formatting. Opening CSV files in Excel can change the formatting of the data in the file (this is a common problem with dates). Notepad++ lets you edit CSV files (and code, if necessary) without imposing additional formatting on the data.
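The same principle applies if you ever edit a CSV file from Python: treating every value as plain text avoids the reformatting a spreadsheet might apply. The sketch below uses the standard `csv` module, which returns every field as a string, so dates and leading zeros are written back exactly as they were read. `edit_csv_cell` and the sample column names are hypothetical, for illustration only.

```python
import csv
import io


def edit_csv_cell(text: str, row: int, col: int, new_value: str) -> str:
    """Change one cell of CSV text; every other value is written back unchanged.

    Hypothetical helper. csv.reader yields strings only, so values such as
    "2023-01-05" or "0042" are never reinterpreted as dates or numbers.
    """
    rows = list(csv.reader(io.StringIO(text)))
    rows[row][col] = new_value
    out = io.StringIO()
    # lineterminator="\n" writes Unix line endings (the csv default is "\r\n").
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

When reading a real file for this purpose, open it with `newline=""` as the `csv` module documentation recommends, so line endings inside quoted fields are handled correctly.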

TortoiseGit - For managing Git without the command line. Depending on what changes have been made to the code, you may need to use a branch of the notebook repository other than the main one. Although using Git through the command line is supported, you may prefer a graphical user interface (GUI) instead, which TortoiseGit provides.

For MAC users

Sourcetree - For managing Git without the command line.

Node Training Datasets

We have created test datasets to use for this workshop. Each attendee has their own files, available at this link: http://129.173.48.161/data/repository/node_training/node-training-files-1

Please find the folder with your name and download it. Save the files somewhere safe on your computer, and UNZIP all of them.
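If your download contains several zip files, they can be extracted in one go instead of one at a time. This is a minimal sketch, assuming all the archives sit directly inside one folder that you pass in; `unzip_all` is a hypothetical helper that extracts each `name.zip` into a folder called `name` next to it, using only the standard `zipfile` module.

```python
import zipfile
from pathlib import Path


def unzip_all(folder) -> list:
    """Extract every .zip file in `folder` into a sibling folder of the same name.

    Hypothetical helper; returns the list of folders that were created or updated.
    """
    folder = Path(folder).expanduser()
    extracted = []
    for archive in sorted(folder.glob("*.zip")):
        target = archive.with_suffix("")  # e.g. my_files.zip -> my_files/
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)  # creates the target folder if needed
        extracted.append(target)
    return extracted
```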