Setup and Installing Needed Software

Overview

Teaching: 10 min
Exercises: 50 min
Questions
  • What software does a Node Manager need?

  • Why do I need the recommended software?

  • How do I install the required software?

Objectives
  • Understand how to install required software and prepare to load data

  • Ensure attendees have the required software installed and are ready to use it

The NodeBook environment and supporting software

In order to work efficiently as a Node Manager, the following programs are necessary.

To standardize the verification and quality control process that all contributed data is subjected to, OTN has built custom quality control workflows and tools for Node Managers, often referred to as the OTN Nodebooks. The underlying functions are written in Python, and the workflows that rely on them are run through Jupyter Notebooks. To use these tools and interact with your database, you will need to install a Python environment and the software packages that support the workflows. Updates to these tools, as well as up-to-date installation instructions, are always available on the OTN GitLab.

This lesson will give attendees a chance to install all the relevant software, under the supervision of OTN staff.

Python/Mamba

Python is a general-purpose programming language that has become one of the most popular languages on GitHub and in many of the computational sciences. It is the main language used by OTN to standardize our data processing pipeline.

Mamba is a fast, cross-platform package and environment manager for Python. When you install Mamba (through Miniforge) you get a self-contained version of the Python interpreter (which enables your computer to run Python code) and many of the core Python libraries. Managing your Python installation with Mamba allows you to install, and keep updated, all the supporting packages needed for the Nodebooks with one command rather than installing each one individually.

Miniforge Windows - https://conda-forge.org/miniforge/

Miniforge Mac -

Miniforge Linux (Debian)

Git

Git is a version-control system for text. It helps people work on code collaboratively and maintains a complete history of all changes made to the files in a project. We use Git at OTN to track and disseminate changes to the Nodebooks that are made by our developer team, and occasionally you will need to use Git to update your Nodebooks and receive those changes.

Install Git
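
Once Miniforge and Git are installed, a quick check can confirm that your terminal is able to find them. The following is a minimal sketch, not an official OTN tool: it assumes the commands are named git, conda, and mamba and are available on your PATH, and the function name find_tools is made up for this illustration.

```python
import shutil

# Assumed command names for the tools installed in this lesson.
REQUIRED_TOOLS = ("git", "conda", "mamba")


def find_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

Save it under any filename you like (for example check_tools.py, a hypothetical name) and run it with python from the same terminal you will use for the Nodebooks. A NOT FOUND result may simply mean the terminal needs to be reopened after installation.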

Nodebooks - iPython Utilities

The ipython-utilities project contains the collection of Jupyter notebooks used to load data into the OTN data system.

Create an Account

First, you will need a GitLab account. Please fill out this signup form for an account on GitLab.

Then, OTN staff will give you access to the OTN-Partner-Nodes group, which hosts all of the relevant Projects for Node Managers.

Install iPython Utilities

  1. Determine the folder in which you wish to keep the iPython Utilities Nodebooks.
  2. Open your terminal or command prompt.
    • Type cd followed by a space.
    • You then need to get the filepath to the folder in which you wish to keep the iPython Utilities Nodebooks. You can either drag the folder into the terminal/command prompt OR right-click on the folder, select ‘Copy as Path’ from the dropdown menu, and paste the result into the terminal/command prompt.
    • You should have a command that looks like cd /path/to/desired/folder.
    • Press Enter, and your terminal/command prompt will navigate to the folder you provided.
  3. Create and activate the “nodebook” Python environment. The creation process only needs to happen once.
    • In your terminal, run the command conda create -n nodebook python=3.9
    • Activate the nodebook environment by running conda activate nodebook
  4. Next, run: git clone https://gitlab.oceantrack.org/otn-partner-nodes/ipython-utilities.git. This will get the latest version of iPython Utilities from our GitLab.
  5. Navigate to the newly-created ipython-utilities subdirectory by running cd ipython-utilities.
  6. Switch to the integration branch (which contains the most up-to-date code) by running git checkout integration.
  7. Now install all required Python packages by running the following: mamba env update -n nodebook -f environment.yml
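
After step 7, you may want to confirm that the clone looks as expected. The sketch below is illustrative rather than part of the Nodebooks: it assumes it is run from the folder chosen in step 1, so that ipython-utilities is a subfolder, and the helper names current_branch and missing_files are hypothetical. It asks Git which branch is checked out and confirms that environment.yml is present.

```python
import subprocess
from pathlib import Path


def current_branch(repo_dir):
    """Ask Git for the name of the branch checked out in repo_dir."""
    result = subprocess.run(
        ["git", "-C", str(repo_dir), "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True,
        text=True,
        check=True,  # raise an error if Git reports a problem
    )
    return result.stdout.strip()


def missing_files(repo_dir, expected=("environment.yml",)):
    """List the expected files that are absent from repo_dir."""
    repo_dir = Path(repo_dir)
    return [name for name in expected if not (repo_dir / name).is_file()]


if __name__ == "__main__":
    # Assumption: the current folder is the one chosen in step 1.
    repo = Path("ipython-utilities")
    if not repo.is_dir():
        print("No ipython-utilities folder here; check steps 2 and 4.")
    else:
        print("Branch:", current_branch(repo))
        print("Missing files:", missing_files(repo) or "none")
```

If the branch is reported as anything other than integration, repeating step 6 should correct it.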

To open and use the OTN Nodebooks:

More operating system-specific instructions and troubleshooting tips can be found at: https://gitlab.oceantrack.org/otn-partner-nodes/ipython-utilities/-/wikis/New-Install-of-Ipython-Utilities

OTN Nodebooks - home page

Database Console Viewer

Database administration applications help you interact directly with your database. Many options are available, but DBeaver and DataGrip are the most popular at OTN.

In the next lesson we will practice using our database console viewer and connecting to our node_training database.

More Useful Programs

In order to work efficiently as a Node Manager, the following programs are necessary and/or useful.

Cross-Platform

Visual Studio Code - An advanced code editor and integrated development environment (IDE). It also offers extensions that can run Jupyter notebooks, display CSV files in a readable tabular view, and handle updating your Git repositories.

For WINDOWS users

Path Copy Copy - For copying path links from your file browser. Since many of the notebooks require you to provide the path to the file you wish to load, being able to copy and paste the entire path at once can save a lot of time.

Notepad++ - For reading and editing code, csv files etc. without altering the formatting. Opening CSV files in Excel can change the formatting of the data in the file (this is a common problem with dates). Notepad++ will allow you to edit CSV files (and code, if necessary) without imposing additional formatting on data.

Tortoise Git - For managing Git without the command line. Depending on what new features have been recently added, you may be asked to use a branch of the notebook repository other than the main one (e.g. integration). Although using Git through the command line is supported, you may prefer to manage your Nodebooks via a graphical user interface (GUI). Tortoise Git can provide that.

For MAC users

Source Tree - For managing git, avoiding command line.
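
On any operating system, Python itself can show what a CSV file contains without reformatting it, which is handy when you suspect a spreadsheet program has altered dates. This is a minimal sketch using the standard csv module; the column names and values are invented for illustration, and read_raw_rows is a hypothetical helper, not an OTN function.

```python
import csv
import io


def read_raw_rows(handle):
    """Read CSV rows as dictionaries of plain text; no values are converted."""
    return list(csv.DictReader(handle))


# Invented sample data standing in for a real file.
sample = io.StringIO(
    "tag_id,release_date\n"
    "TAG001,2021-03-04\n"
)

for row in read_raw_rows(sample):
    # Every value is still the exact text stored in the file.
    print(row["tag_id"], row["release_date"])
```

For a real file, pass an open file handle instead of the in-memory sample, opening it with newline="" as the csv module documentation recommends.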

Node Training Datasets

We have created test datasets to use for this workshop. Each attendee has their own files, available at this link: http://129.173.48.161/data/repository/node_training/node-training-files-1

Please find the folder with your name and download it. Save the files somewhere on your computer, and UNZIP all of them.
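
If you have several archives to extract, the unzipping step can be scripted. The sketch below uses Python's standard zipfile module; unzip_all is a hypothetical helper, and it assumes each archive should be extracted into a folder of the same name beside it.

```python
import zipfile
from pathlib import Path


def unzip_all(folder):
    """Extract every .zip under folder into a sibling folder of the same name."""
    extracted = []
    # sorted() collects the archive list first, so files created during
    # extraction are not picked up by the search.
    for archive in sorted(Path(folder).rglob("*.zip")):
        target = archive.with_suffix("")  # data.zip -> data/
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

Calling unzip_all with the path to your downloaded folder returns the list of folders it created, which you can print to see where the files ended up.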

Key Points

  • Node Manager tasks involve the use of many different programs

  • OTN staff are always available to help with installation of these programs or any issues

  • There are many programs and tools to help Node Managers