OTN Node Manager Training

Introduction to Nodes

Overview

Teaching: 30 min
Exercises: 0 min
Questions
  • What is an OTN-style Database Node?

  • What is expected from a Node Manager?

  • Who is this training for?

  • What is the schedule for the next several days?

Objectives
  • Understand the relationship between OTN and its Nodes.

  • Ensure attendees have required background information.

What is a Node?

OTN partners with regional acoustic telemetry networks around the world to enable detection-matching across our communities. An OTN Node is an exact copy of OTN’s acoustic telemetry database structure, which allows for direct cross-referencing between the data holdings of each regional telemetry sharing community. The list of OTN Nodes is available here: https://members.oceantrack.org. Data only needs to be reported to one Node in order for tags/detections to be matched across all.

How does a Node benefit its users?

OTN and affiliated networks provide automated cross-referencing of your detection data with other tags in the system to help resolve “mystery detections” and provide detection data to taggers in other regions. OTN Data Managers extensively quality-control submitted metadata to ensure the most accurate records possible are stored in the database. OTN’s database and Data Portal website are well suited for archiving datasets for future use and sharing with collaborators. The OTN system includes pathways to publish datasets with OBIS, and for sharing via open data portals such as ERDDAP and GeoServer. The data-product format returned by OTN is directly ingestible by analysis packages including glatos and resonATe. OTN offers continuous support for the use of these packages and tools.

Below is a presentation from current Node Managers, describing the relationship between OTN and its Nodes and the benefits of the Node system as a community outgrows more organic person-to-person sharing, as well as giving a realistic understanding of the work involved in hosting and maintaining a Node.

PowerPoint

Node Managers

To date, the greatest successes in organizing telemetry communities have come from identifying and working with local, on-the-ground Node Managers for each affiliated Node. These trusted and connected ‘data wranglers’ have been essential to building and maintaining the culture and sense of trust in each telemetry group.

In order to be successful as a Node Manager in your region, here are a few tips and guidelines:

  1. Ensure you set aside time each week to wear your ‘Node Manager hat’.
  2. Familiarize yourself with the metadata reporting templates, and carefully follow the formats required for each variable.
  3. Ensure you have continuous communication with the OTN data team so you are able to align your detection matching with all other Nodes, and keep your local toolbox up to date.
  4. Ensure you have consistent communication with your local telemetry community - knowing who’s who is important for relationship building.
  5. Be willing to learn and ask questions - we are always trying to improve our tools and processes!

No previous coding or data management experience is required! Anyone who is willing to put in the work can become a successful Node Manager. Being involved in the telemetry community as a researcher (or affiliate) is enough to get you started with ‘data wrangling’ in your region.

Node Training

Each year OTN hosts a training session for Node Managers. This session is not only for new Node Managers, but also a refresher for current Node Managers on our updated tools and processes.

This is a hands-on course: participants will use the tools to practice loading telemetry data with us, using a Training Node we have built for this purpose. This means you will need to install all required software and devote your full attention for the next several days.

Here are the general topics that will be covered:

If you do not intend on learning how to load data to an OTN-style Node (and would prefer to be a spectator) please let us know, so we can identify who our hands-on learners will be.

A great resource for Node Managers as they get started will be OTN’s FAQ page - https://members.oceantrack.org/faq. Your local telemetry community will likely have many questions about the Node and how it works, and the FAQs can help answer some of these questions.

Key Points

  • Your Node is fully compatible with all others like it.

  • A well-connected Node Manager is essential.

  • OTN staff are always available to support Node Managers.


OTN System, Structure and Outputs

Overview

Teaching: 25 min
Exercises: 0 min
Questions
  • What does an OTN-style Database look like?

  • What is the general path data takes through the OTN data system?

  • What does the OTN data system output?

Objectives
  • Understand the OTN Database structure on a high level

  • Understand the general path of data through the OTN system

  • Understand what the OTN system can output

OTN Data System

The OTN data system is an aggregator of telemetry data made up of interconnected Node Databases and data processing tools. These work together to connect researchers with relevant and reliable data. At the heart of this system are Nodes and their OTN-style Databases.

Affiliated acoustic telemetry partner Networks may become an OTN Node by deploying their own database that follows the same structure as all the others. This structure allows Nodes to use OTN’s data loading processes, produce OTN data products, and match detections across other Nodes.

Basic Structure

The basic structural decision at the centre of an OTN-style Database is that each of a Node’s projects is given its own database schema. These schemas contain only the tables and data relevant to that project. The tables included in each schema are created and updated based on which types of data each project is reporting.

Projects can have the type tracker, deployment, or data.

In addition to the project-specific schemas, there are some important common schemas in the Database that Node Managers will interact with. These additional schemas include the obis, erddap, geoserver, vendor, and discovery schemas. These schemas are found across all Nodes and are used to create important end-products and for processing.

The amount of information shared through the discovery tables can be adjusted based on sharing and reporting requirements for each Node.

OTN Database - path of data through the system

The Path of Data

The OTN data system takes 4 types of data/metadata: project, tag, instrument deployments, and detections. Most data follows a similar flow through the OTN system, even though each type has different notebooks and processes for loading. The exception is project metadata, which takes a different path because it is completely user-defined and must be used to initially define and create a project’s schema.

OTN Database - structural ER diagram

Project Data

Project data follows a different workflow from the other input data and metadata that flows into an OTN Node: it is generally the first information received about a project, and it is used to create the project’s new schema in the Database. The type of project selected (tracker, deployment, or data) will determine the format of the tables in the newly created schema. The type of project will also impact the loading tools and processes that will be used later on. The general journey of project data is:

Tag, Deployment and Detections Data

Even though tag, deployment, and detections data all have their own loading tools and processes, their general path through the database is the same.

OTN Data Products

The OTN Database has specific data products available, based upon the clean processed data, for researchers to use for their scientific analysis.

In order to create meaningful Detection Extracts, OTN and affiliated Nodes only perform cross-matching events every 4 months (when a reasonable amount of new data has been processed). This event is called a synchronous Data Push. In a Data Push:

Summary Map

Backing Up Your Data

As with any database, it is important to make sure the data held by the OTN Database Node is protected. Your Database, and potentially the files contributed to your Node by the research community you support, should be backed up properly in case your primary Database crashes, is corrupted, is lost, or suffers any other unexpected event.

You should discuss and coordinate a backup strategy with the groups or institutions responsible for your Database’s administration, and find out their policies and practices for performing backups. Backup strategies vary from group to group, but it is a good idea to make sure they are backing up your data daily, if not multiple times a day, and keeping copies of backups in different physical locations in case something happens at any single location.
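If your group manages its own PostgreSQL server, one common approach is a scheduled, dated dump. The sketch below is purely illustrative and not an OTN-provided tool: it builds a pg_dump command line, and the host, user, and directory names are placeholders to adapt to your own setup.

```python
from datetime import date

def pg_dump_command(dbname, backup_dir, host="localhost", user="node_admin"):
    """Build a pg_dump invocation producing a dated, compressed custom-format dump.

    The host/user defaults are hypothetical placeholders -- substitute your own.
    """
    outfile = f"{backup_dir}/{dbname}_{date.today().isoformat()}.dump"
    return [
        "pg_dump",
        "-h", host,
        "-U", user,
        "-Fc",          # custom (compressed) format, restorable with pg_restore
        "-f", outfile,
        dbname,
    ]

# Example: run on a schedule (cron, Task Scheduler, etc.)
# import subprocess
# subprocess.run(pg_dump_command("node_training", "/backups"), check=True)
```

A command built this way can be dropped into whatever scheduler your institution already uses; your database administrators may well have an equivalent process in place already.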

OTN is happy to offer guidance and recommendations to any group.

Key Points

  • All OTN-style Databases have the same structure

  • Databases are divided into project schemas which get certain tables based on the type of data they collect

  • Data in the OTN system moves from the raw tables to the intermediate tables to the upper tables before aggregation


Setup and Installing Needed Software

Overview

Teaching: 10 min
Exercises: 50 min
Questions
  • What software does a Node Manager need?

  • Why do I need the recommended software?

  • How do I install the required software?

Objectives
  • Understand how to install required software and prepare to load data

  • Ensure attendees have the required software installed and are ready to use it

In order to work efficiently as a Node Manager, the following programs are necessary and/or useful.

To standardize the verification and quality control process that all contributed data is subjected to, OTN has built custom quality control workflows and tools for Node Managers, often referred to as the OTN Nodebooks. The underlying functions are written in Python, and the workflows that rely on them are run through Jupyter Notebooks. In order to use these tools and interact with your database, you will need to install a few different applications and packages. All installation instructions are also available on our GitLab here.

This lesson will give attendees a chance to install all the relevant software, under the supervision of OTN staff.

Python/Mamba

Python is a popular general-purpose programming language that can be used for a wide variety of applications. It is the main language used by OTN and our data processing pipeline.

Mamba is a fast, cross-platform Python distribution and package manager. When you install Mamba (through Miniforge) you get a Python interpreter and many of the core Python libraries. Managing your Python installation with Mamba allows you to install and update all the packages needed to run the Nodebooks with one command, rather than having to install each one individually.

Install Miniforge3 - https://conda-forge.org/miniforge/

Git

Git is a version-control system: it helps people work collaboratively and maintains a complete history of all changes made to a project. We use Git at OTN to track changes to the Nodebooks made by our developer team, and occasionally you will need to update your Nodebooks to include those changes.

Install Git

Nodebooks - iPython Utilities

The ipython-utilities project contains the collection of Nodebooks used to load data into the OTN data system.

Create an Account

First, you will need a GitLab account. Please fill out this signup form for an account on GitLab.

Then, OTN staff will give you access to the relevant Projects containing the code we will use.

Install iPython Utilities

  1. Determine the folder in which you wish to keep the iPython Utilities Nodebooks.
  2. Open your terminal or command prompt app.
    • Type cd then space.
    • You then need the filepath to the folder in which you wish to keep the iPython Utilities Nodebooks. You can either drag the folder into the terminal or command prompt app, or hold Shift (Windows) / Option (Mac) while right-clicking the folder and select Copy as Path from the menu.
    • Then paste the filepath into the terminal or command prompt and press Enter.
    • In summary, you should type cd /path/to/desired/folder before pressing enter.
  3. Create and activate the “nodebook” python environment. The creation process will only need to happen once.
    • In your terminal, run the command conda create -n nodebook python=3.9
    • Activate the nodebook environment using conda activate nodebook
  4. You are now able to run commands in that folder. Now run: git clone https://gitlab.oceantrack.org/otn-partner-nodes/ipython-utilities.git. This will get the latest version of iPython Utilities from our GitLab.
  5. Navigate to the ipython-utilities subdirectory that was created by running cd ipython-utilities.
  6. Now install all required python packages by running the following: mamba env update -n nodebook -f environment.yml

To open and use the OTN Nodebooks:

More operating system-specific instructions and troubleshooting tips can be found at: https://gitlab.oceantrack.org/otn-partner-nodes/ipython-utilities/-/wikis/New-Install-of-Ipython-Utilities

OTN Nodebooks - home page

Database Console Viewer

Database administration applications assist with interacting directly with your database. There are many options available, but DBeaver and DataGrip are the most popular at OTN.

In the next lesson we will practice using our database console viewer and connecting to our node_training database.

More Useful Programs

In addition to the tools above, the following programs can make a Node Manager’s day-to-day work easier.

For WINDOWS users

Path Copy Copy - For copying path links from your file browser.

Notepad++ - For reading and editing code, csv files etc. without altering the formatting.

Tortoise Git - For managing git, avoiding command line.

For MAC users

VS Code - For reading and editing code, csv files etc. without altering the formatting.

Source Tree - For managing git, avoiding command line.

Node Training Datasets

We have created test datasets to use for this workshop. Each attendee has their own files, available at this link: https://members.oceantrack.org/data/repository/fntp/nmt-files/

Please find the folder with your name and download. Save these somewhere on your computer, and UNZIP all files.

Key Points

  • Node Manager tasks involve the use of many different programs

  • OTN staff are always available to help with installation of these programs or any issues

  • There are many programs and tools to help Node Managers


Data Loading Workflow

Overview

Teaching: 30 min
Exercises: 15 min
Questions
  • How does a Node Manager receive data?

  • How does a Node Manager track their To-Do list?

  • How can a Node Manager interact with their database directly?

Objectives
  • Understand the data-loading workflow

  • Understand how to create and use GitLab Issues

  • Understand how to access and query your database tables

  • Understand how to use the AUTH - Create and Update notebook to maintain your database credentials file

Data Managers receive data from a researcher and then begin a several-step process of QA/QC and matching of data.

  1. Records are received, and immediately a GitLab Issue is created.
  2. Data are QA/QC’d using the OTN Nodebook tools (covered in detail later in the curriculum), and all progress is tracked in GitLab. Feedback between Data Manager and researchers happens at this stage, until data is clean and all GitLab tasks are completed.
  3. The successful processing of records can be evaluated by checking the database tables using DBeaver, and SQL queries.

Researcher data submission

There are many ways to receive data from researchers in your community/group. Find and make official the way that works for your community, and ensure that it becomes standard practice for reporting to your Node.

File management website

The most common way to receive data and metadata from a researcher is through some type of file management website. This will require either an email notification system for the Node Manager or constant checking to look for new submissions.

OTN-managed Nodes can always use the same Plone file management portal software that OTN itself uses to create and maintain private-access data repository folders into which researchers can deposit their data and metadata. These private folders also serve as the location where Detection Extracts are distributed to users, when available.

The FACT Network currently uses a custom instance of Research Workspace for the same purpose.

The ACT and GLATOS Networks use a custom data-submission form managed through their networks’ web sites.

It’s common for groups of researchers to use Dropbox, Google Drive, or something similar to share data/metadata while the Network is still small. This can be a great, accessible option, but the caveat is that it is much more difficult to control access to each individual folder in accordance with the Data Policy, and it may be difficult to determine when new data has been submitted.

Email-only submission

Generally, each Node Manager has an email address for communicating with their network’s data submitters (ex: Data @ TheFACTNetwork . org). This is a great way to ensure all Node-related emails are contained in the same account when there are multiple Node Managers, or when a new Node Manager takes over. With proper email management, this can be a very successful way to have Node users submit their data/metadata to your Node.

It is not recommended to use a personal email account for this, since all the files and history of the project’s data submissions will be lost if that Manager ever moves away from the role. If the account is hosted at an institution, it may be advisable to submit requests to raise institutional limits on things like email storage in advance.

Documenting data submission

Using one of the suggested means above, a user has submitted data and metadata to the Node Manager. Now what?

OTN uses GitLab Issues with templates of task-lists to ensure we NEVER forget a step in data loading, and that no file is ever lost/forgotten in an inbox.

Immediately upon receipt of a data file, you are advised to log in to OTN’s GitLab (https://gitlab.oceantrack.org) and navigate to the project for your Node. You may want to bookmark this webpage!

Once on the GitLab project page, you should navigate to the Issues menu option, on the left side. Think of your GitLab issues as your running “TODO List”! You will want to create a new Issue for each piece of data that is submitted.

NOTE: GitLab Issues are often referred to as “tickets”

Creating GitLab issues

By choosing the New Issue button in the top-right of your screen, you will be taken to a new, blank, issue form. To fill out the fields you will need to do the following:

Now, with all information completed, you can select Create Issue.

Using GitLab to track progress

As you approach the deadline for data-loading, before a data PUSH, you should begin to work on your Issues which fall under that Milestone. When you open an issue, you will be able to see the remaining tasks to properly load/process that data along with the name of the OTN Nodebook you should use to complete each task.

Keep GitLab open in your browser as you work through the relevant Nodebooks. You should check off tasks as you complete them, and add any comments you have at the bottom of the ticket. Comments can include error messages from the Nodebook, questions you have for the researcher, any re-formatting required, etc. At any time you can change the Labels on the issue, to help you remember the issue’s status at a glance.

Once you are done for the day, you’ll be able to come back and see exactly where you left off, thanks to the checklist!

You can tag anyone from the OTN Data Team in your GitLab issue (using the @NAME syntax). We will be notified via email to come and check out the Issue and answer any questions that have been commented.

Once you have completed all the tasks in the template, you can edit the Assignee value in the top-right corner and assign the issue to someone from OTN’s Database team (currently, Angela or Yinghuan). They will complete the final verification of the data, and close the issue when finished. At this time, you can apply the Verify label, or something similar, to help visually “mark it off” your issue list on the main page.

GitLab practice

At this time we will take a moment to practice making GitLab Issues, and explore other pages on our GitLab such as Milestones, Repository, Snippets, and Wiki.

Database access

As part of the OTN workflow, once we have used the OTN Nodebooks, it may be prudent to use a database client like DBeaver to view the contents of your database Node directly, and be sure the data was indeed loaded as expected.

DBeaver is an open-source application for interacting directly with databases. There are lots of built-in tools for quick query-writing, and data exploration. We will assume that workshop attendees are novices in using this application.

Connecting to your database

For this training we will connect to a Node Training test database, as practice. Once you open DBeaver, you will need to click on the Database menu item, and choose New Database Connection. A popup will appear, and you will choose the PostgreSQL logo (the elephant), then click Next. Using the .auth file provided to you by OTNDC, you will complete the following fields:

Next you choose Test Connection and see if it passes the tests. If so, you can choose Finish and you’re now connected to your database!

On the left side you should now see a Database Navigator tab, with a list of all your active database connections. You can use the drop-down menu to explore all the schemas (i.e., collections) stored in your database. You can even view each individual table, to confirm the creation steps in the Nodebooks were successful!

Writing a query in DBeaver

If you wish to write an SQL query to see a specific portion of your already-loaded data, you should first open a new SQL console. Choose SQL Editor from the top menu, then New SQL Script. A blank form should appear.

While writing SQL is out of the scope of this course, there are many great SQL resources available online. The general premise involves creating conditional select statements to specify the data you’re interested in. ex: select * from hfx.rcvr_locations where rcv_serial_no = '12345'; will select all records from the HFX schema’s rcvr_locations table, where the serial number is 12345.

To run a query, ensure your cursor is on the line you want to run, then either 1) right-click and choose Execute, or 2) press CTRL-ENTER (CMD-ENTER on Mac). The results of your query will be displayed in the window below the SQL console.

OTN is here to support you as you begin to experiment with SQL queries, and the OTN database structure, and can help you build a library of helpful custom queries that you may want or need.
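If you find yourself running the same lookups over and over, small helpers can reduce typos. The sketch below builds the example receiver-locations query shown earlier for any project schema and serial number; the function name is hypothetical and this is not part of the OTN toolset. In application code you would let your database driver bind parameters instead of formatting strings.

```python
def rcvr_locations_query(schema: str, serial_no: str) -> str:
    """Build the example receiver-locations query for a given project schema.

    For ad-hoc console exploration only -- in real code, prefer your driver's
    parameter binding over string formatting.
    """
    # Guard against obviously malformed schema names (spaces, semicolons, etc.)
    if not schema.replace("_", "").isalnum():
        raise ValueError(f"suspicious schema name: {schema!r}")
    return (
        f"select * from {schema}.rcvr_locations "
        f"where rcv_serial_no = '{serial_no}';"
    )

print(rcvr_locations_query("hfx", "12345"))
# select * from hfx.rcvr_locations where rcv_serial_no = '12345';
```

You could keep a small file of helpers like this alongside your other queries, building up the personal query library mentioned above.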

Database Practice

Let’s take a moment to explore some of the tables in the Node Training database, and write some example SQL queries.

Connecting to your Database from the Nodebooks

Now that we have explored and set up some of the tools needed to work as a Node Manager, we can begin preparing our Nodebook connection files. To enhance security, OTN uses encrypted, password-protected .kdbx files to store login credentials for your database. To create these, we have developed an interactive AUTH - Create and Update Nodebook.

  1. Open the OTN Nodebooks
    • MAC: Open your terminal, and navigate to your ipython-utilities directory, using cd /path/to/ipython-utilities. Then, run the command: jupyter notebook --config="nb_config.py" "0. Home.ipynb" to open the Nodebooks
    • WINDOWS: Double-click the start-jupyter.bat file in your ipython-utilities folder, which will open the Nodebooks.
  2. Your Nodebooks should open in a new browser window, showing the Home page.
  3. Open the AUTH - Create and Update notebook

Imports cell

This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.

Path to file

This cell needs to be edited. Between the quotes you will type the filepath to the .kdbx file you would like to create, or one which already exists that you would like to edit. The format should look like:

file_location = 'C:/Users/path/to/node_auth.kdbx'

Run this cell to save the input.
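If you want a quick sanity check on the filepath before running the cell, a small helper like the following can flag common mistakes. This is purely illustrative and not part of the Nodebooks; the check_kdbx_path name is made up for this example.

```python
from pathlib import Path

def check_kdbx_path(file_location: str) -> list:
    """Return a list of likely problems with a .kdbx filepath (empty list = looks fine).

    An illustrative pre-check only -- the notebook will also report errors itself.
    """
    problems = []
    path = Path(file_location)
    if path.suffix.lower() != ".kdbx":
        problems.append("filename should end in .kdbx")
    if "\\" in file_location:
        problems.append("use forward slashes, e.g. C:/Users/... , even on Windows")
    if not path.parent.exists():
        problems.append(f"parent folder does not exist: {path.parent}")
    return problems

# Example usage with the filepath you plan to paste into the notebook cell:
# check_kdbx_path('C:/Users/path/to/node_auth.kdbx')
```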

Create Password

Run this cell. You will be prompted to create a password for the file (if it is a new file) or to enter the existing password if you are accessing an existing file. Ensure that you remember this password, as you will be using it every time you connect to the database through the Nodebooks.

Create or Update Main Connections

Run this cell. This section will have an editable form. If it is a new file, all fields will be blank. If it is an existing file, the previously-entered information will display. You may now edit the information, pressing the blue button when you are finished to save your results.

In order to match detections across databases, you will need to establish DBLink connections to other OTN Nodes. This information can be stored in your .kdbx file, and will give you limited access to information required to match detections and create detection extracts.

Run this cell. This section will have an editable form. If it is a new file, all fields will be blank, and you can choose Add Connection. If it is an existing file, the previously-entered information will display, for each DBLink Connection you’ve specified. You may now edit the information, pressing the update or save button when you are finished to save your results.

Please contact OTN if you need any of the following information:

Once you have saved your new DBLink connection, you can create another. Continue until you have established connections to all remote Nodes. Currently, you will require DBLinks to 7 Nodes.

Test Connections

The next two cells will test the connection information you entered. Success messages will look like:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.your.org Database:your_db User:node_admin Node: Node

and also like:

Testing dblink connections:
	fact-link-to-Node1: DBLink established on user@db.load.oceantrack.org:5432 - Node: NODE1
	fact-link-to-Node2: DBLink established on user@db.load.oceantrack.org:5432 - Node: NODE2
	fact-link-to-Node3: DBLink established on user@db.load.oceantrack.org:5432 - Node: NODE3
	fact-link-to-Node4: DBLink established on user@db.load.oceantrack.org:5432 - Node: NODE4
	fact-link-to-Node5: DBLink established on user@db.load.oceantrack.org:5432 - Node: NODE5
	fact-link-to-Node6: DBLink established on user@db.load.oceantrack.org:5432 - Node: NODE6
	fact-link-to-Node7: DBLink established on user@db.load.oceantrack.org:5432 - Node: NODE7

You are now able to use the filepath to your .kdbx file to run all the Nodebooks.

Change KDBX password

If you have been passed a template .kdbx file from OTNDC with prefilled information, you should use this section to change the password to ensure your privacy is protected.

You will need to enter:

Press Save to change the password of your .kdbx. Ensure that you remember this password, as you will be using it every time you connect to the database through the Nodebooks.

Add a Git Access Token

This will be relevant for users of the DB Fix and Verification suite of Nodebooks only. If you have questions about this section please reach out to OTNDC.

Key Points

  • Node members cannot access the database directly; you are the liaison

  • Data submissions and QC processes should be trackable and archived

  • OTN is always here to help with any step of the process


Project Metadata

Overview

Teaching: 30 min
Exercises: 0 min
Questions
  • How do I register a new project in the Database?

Objectives
  • Understand how to complete the template

  • Understand how to use the Gitlab checklist

  • Learn how to use the Create and Update Projects notebook

  • Learn how to use the Create Plone folders and add users notebook

The first step when you are contacted by a researcher who wants to register their project with the Database is to request Project Metadata. For most Nodes, this is in the form of a plaintext .txt file, using the template provided here. This file allows the researcher to provide information on the core attributes of the project, including the scientific abstract, associated investigators, geospatial details, and temporal and taxonomic range.

Completed Metadata

Immediately upon receipt of the metadata, a new GitLab Issue should be created. Please use the Project Metadata Issue checklist template.

Here is the Issue checklist, for reference:

Project Metadata
- [ ] - NAME add label *'loading records'*
- [ ] - NAME define type of project  **select here one of Data, Deployment, Tracker**
- [ ] - NAME create schema and project records (`Creating and Updating project metadata` notebook)
- [ ] - NAME add project contact information (`Creating and Updating project metadata` notebook)
- [ ] - NAME add scientificnames (`Creating and Updating project metadata` notebook)
- [ ] - NAME verify all of above (`Creating and Updating project metadata` notebook)
- [ ] - NAME create new project repository users (`Create Plone Folders and Add Users` notebook)
- [ ] - NAME create project repository folder (`Create Plone Folders and Add Users` notebook)
- [ ] - NAME add project repository users to folder (`Create Plone Folders and Add Users` notebook)
- [ ] - NAME access project repository double-check project repository creation and user access
- [ ] - NAME add project metadata file to project folder (OTN members.oceantrack.org, FACT RW etc)
- [ ] - NAME send onboarding email to PIs
- [ ] - NAME label issue with *'Verify'*
- [ ] - NAME pass issue to OTN analyst for final verification
- [ ] - NAME verify project in database
- [ ] - NAME pass issue to OTN DAQ staff
- [ ] - NAME [OTN only] if this is a loan, update links for PMO
- [ ] - NAME [OTN only] manually identify if this is a loan, if so add record to otnunit.obis.loan_tracking (`Creating and Updating project metadata` notebook)

**project metadata txt file:**

Visual Inspection

Once the completed file is received from a researcher, the Data Manager should complete a visual check for formatting and accuracy.

Things to check:

  1. Is the PI-provided collection code unique/appropriate? Do you need to create one yourself? Existing schemas/collection codes can be seen in the database.
  2. Are there typos in the title or abstract?
  3. Are the contacts formatted correctly?
  4. Are the species formatted correctly?
  5. Does the location make sense based on the abstract, and is it formatted correctly (one per line)?

Formatting errors most commonly occur in the Contacts section, so pay close attention there.
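Some of these visual checks can be scripted. Below is an illustrative sketch (not part of the official OTN Nodebooks) that flags common problems in a single contact line, based on the format and role list described in the metadata template; the function name and the exact checks are assumptions made for this example.

```python
import re

# Roles accepted in the project metadata template (see the template below).
VALID_ROLES = {"principal investigator", "technician", "researcher",
               "student", "collaborator"}

def check_contact_line(line: str) -> list:
    """Return a list of formatting problems for one contact line (empty = looks fine).

    Expected shape: Name, Affiliation, Role, email[, point of contact][, ORCID]
    Illustrative helper only -- not an official QC tool.
    """
    problems = []
    fields = [f.strip() for f in line.split(",")]
    if len(fields) < 4:
        problems.append("expected at least 4 comma-separated fields")
        return problems
    name, affiliation, role, email = fields[:4]
    if len(name.split()) < 2:
        problems.append(f"name should be 'Firstname Lastname': {name!r}")
    if role.lower() not in VALID_ROLES:
        problems.append(f"unrecognized project role: {role!r}")
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        problems.append(f"email looks malformed: {email!r}")
    return problems

print(check_contact_line("Caitlin Bate, Dal, researcher, caitlin.bate@dal.ca, point of contact"))
# prints []
```

A loop over all submitted contact lines would give you a quick first pass before the closer visual read.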

Below is an example of a properly completed metadata form, for your reference.

===FORM START===
0. Intended/preferred project code, if known? (May be altered by OTNDC)
format: XXXX (3-6 uppercase letters that do not already have a representation in the OTN DB. Will be assigned if left blank)

NSBS

1. Title-style description of the project?
format:  < 70 words in 'paper title' form

OTN NS Blue Shark Tracking

2. Brief abstract of the project?
format: < 500 words in 'abstract' form

In the Northwest Atlantic, the Ocean Tracking Network (OTN), in collaboration with Dalhousie University, is using an acoustic telemetry infrastructure to monitor the habitat use, movements, and survival of juvenile blue sharks (Prionace glauca). This infrastructure includes state-of-the-art acoustic receivers and oceanographic monitoring equipment, and autonomous marine vehicles carrying oceanographic sensors and mobile acoustic receivers. Long-life acoustic tags (n=40) implanted in the experimental animals will provide long-term spatial resolution of shark movements and distribution, trans-boundary migrations, site fidelity, and the species’ response to a changing ocean. This study will facilitate interspecific comparisons, documentation of intra- and interspecific interactions, and permit long-term monitoring of this understudied predator in the Northwest Atlantic. The study will also provide basic and necessary information to better inform fisheries managers and policy makers. This is pertinent given the recent formulation of the Canadian Plan of Action for Shark Conservation.

3. Names, affiliations, email addresses, and ORCID (if available) of researchers involved in the project and their role.

The accepted Project Roles are defined as:
Principal Investigator: PI or Co-PI. The person(s) responsible for the overall planning, direction and management of the project.
Technician: Person(s) responsible for preparation, operation and/or maintenance of shipboard, laboratory or deployed scientific instrumentation, but with no vested interest in the data returned by that instrumentation.
Researcher: Person(s) who may use/analyse the data to answer research questions, but is not the project lead. Can be a student if their involvement extends beyond the completion of an academic degree.
Student: Person(s) conducting research on a project as part of their work towards an academic degree.
Collaborator: A provider of input/support to a project without formal involvement in the project.

Please add 'Point of Contact' to the contact(s) who will be responsible for communicating with OTN.

format: Firstname Lastname, Employer OR Affiliation, Project Role (choose from above list), email.address@url.com, point of contact (if relevant), ORCID

Fred Whoriskey, OTN, principal investigator, fwhoriskey@dal.ca, 0000-0001-7024-3284
Sara Iverson, OTN, principal investigator, sara.iverson@dal.ca
Caitlin Bate, Dal, researcher, caitlin.bate@dal.ca, point of contact


4. Project URL - can be left blank
format: http[s]://yoursite.com

https://members.oceantrack.org/

5. Species being studied?  
format: Common name (scientific name)

Blue shark (Prionace glauca)
Sunfish (Mola mola)

6. Location of the project?
format: (city, state/province OR nearby landmark OR lat/long points in decimal degree), one per line

Halifax, NS
44.19939/-63.24085

7. Start and end dates of the project, if known?
format: YYYY-MM-DD to YYYY-MM-DD ('ongoing' is an acceptable end date)

2013-08-21 to ongoing

8. Citation to use when referencing this project:
format: Lastname, I., Lastname, I. YYYY. [Title from question 1 or suitable alternative] Will be assigned if left blank.


===FORM END===

Quality Control - Create and Update Projects

Each step in the Issue checklist will be discussed here, along with other important notes required to use the Nodebooks.

Imports cell

This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.

You will have to edit one section: engine = get_engine()

Project Metadata Parser

This cell is where you input the information contained in the Project Metadata .txt file. There are two ways to do this:

  1. You can paste in the filepath to the saved .txt. You should ensure the formatting follows this example: project_template_path = '/path/to/project_metadata.txt'
  2. You can paste the entire contents of the file, from ===FORM START=== to ===FORM END=== inclusive, into the provided triple quotation marks.

Once either option is selected, you can run the cell to complete the quality control checks.

The output will have useful information:

Generally, most of the error messages arise from the Contacts and Species sections.

If there is information which is not parsing correctly, you should fix the source-file and re-run the cell until you are happy with the output.
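The parsing step can be sketched roughly as follows. This is an illustrative stand-in, not the Nodebook's actual parser, and the split_form name is invented: it splits the pasted form text into numbered sections, discarding the question and format: hint lines.

```python
import re

def split_form(text):
    """Split a pasted metadata form into {question_number: [answer lines]}."""
    body = text.split("===FORM START===", 1)[-1].split("===FORM END===", 1)[0]
    sections = {}
    current = None
    for line in body.splitlines():
        m = re.match(r"^(\d+)\.\s", line)
        if m:                      # a numbered question starts a new section
            current = int(m.group(1))
            sections[current] = []
        elif current is not None and line.strip() and not line.startswith("format:"):
            sections[current].append(line.strip())
    return sections

form = """===FORM START===
0. Intended/preferred project code, if known?
format: XXXX

NSBS

1. Title-style description of the project?
format: < 70 words

OTN NS Blue Shark Tracking
===FORM END==="""

parts = split_form(form)
print(parts[0])  # ['NSBS']
print(parts[1])  # ['OTN NS Blue Shark Tracking']
```

Whether you pass a filepath or paste the text directly, the parser sees the same FORM START/FORM END-delimited content.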

Manual Fields - Dropdown Menu

There are some fields which need to be set up by the Data Manager, rather than the researcher. These are in the next cell.

You will run this cell, and a fillable form will appear.

  1. Node: select your node
  2. Collaboration Type: based on the abstract, are they deploying only tags (Tracker project), only receivers (Deployment project) or both tags and receivers (Data project)?
  3. Ocean: choose the most appropriate ocean region based on the abstract.
  4. Shortname: use the Title provided by the researcher, or something else, which will be used as the name of the Data Portal folder. ex: OTN Blue Sharks.
  5. Longname: use the Title provided by the researcher, or something else, which is in “scientific-paper” style. ex: Understanding the movements of Blue sharks through Nova Scotia waters, using acoustic telemetry.
  6. Series Code: this will generally be the name of your node. Compare to values found in the database obis.otn_resources if you’re unsure.
  7. Institution Code: The main institution responsible for maintaining the project. Compare to values found in the database obis.institution_codes and obis.otn_resources if you’re unsure. If this is a new Institution, please make a note in the Issue so you can add it later on.
  8. Country: based upon the abstract. Multiple countries can be listed as such: CANADA, USA, EGYPT etc.
  9. State: based upon the abstract. Multiple states can be listed as such: NOVA SCOTIA, NEWFOUNDLAND etc.
  10. Local Area: based upon the abstract. Location information. ex: Halifax
  11. Locality: based upon the abstract. Finest-scale of location. ex: Shubenacadie River
  12. Status: is the project completed, ongoing or something else?

To save and parse your inputted values DO NOT re-run the form cell - this will clear all your input. Instead, run the next cell, which parses the information.

Verify the output from the parser cell, looking for several things:

  1. Are all the fields marked with a green OK?
  2. Is the following feedback included in the institution code section: Found institution record for XXX in your database: followed by a small, embedded Table?

If anything is wrong, please begin again from the Manual Field input cell.

If the institution code IS NOT found - compare to values found in the database obis.institution_codes and obis.otn_resources. If this is a new Institution, please make a note in the Issue so you can add it later on.

Task list checkpoint

In Gitlab, this task can be completed at this stage:

- [ ] - NAME define type of project **select here one of Data, Deployment, Tracker**

Please edit to include the selected project type, in harmony with the selected field in the notebook.

Verifying Correct Info

At this stage, we have all the information parsed that we need in order to register the project in the database. There is now a cell that will print out every saved value for your review.

Review this output carefully for typos or parsing mistakes.

If this information is satisfactory, you can proceed.

Adjust Bounding Box

The cell titled Verify the new project details before writing to the DB is the final step for verification before the values are written to the database. At this stage a map will appear, with the proposed Bounding Box for the project.

Based on the abstract, you can use the Square Draw Tool to re-draw the bounding box until you are happy with it.

Proj 1

Once you are happy, you can run the next cell in order to save your bounding box adjustments. The success output should be formatted like this:

--- Midpoint ---
Latitude:
Longitude:
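The midpoint reported here is simply the average of the bounding box's corner coordinates. As a sketch (the notebook computes this for you; the coordinates below are made up):

```python
def bbox_midpoint(south, west, north, east):
    """Midpoint of a bounding box as (lat, lon) - simple corner averages."""
    return ((south + north) / 2, (west + east) / 2)

lat, lon = bbox_midpoint(43.0, -66.0, 46.0, -60.0)
print(f"Latitude: {lat}")   # Latitude: 44.5
print(f"Longitude: {lon}")  # Longitude: -63.0
```

Note that naive longitude averaging misbehaves for boxes spanning the antimeridian; that is rarely a concern for a single regional project.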

Create New Institution

STOP - confirm there is no Push currently ongoing. If a Push is ongoing, you must wait for it to be completed before proceeding beyond this point.

Remember above, where we noted whether or not an institution existed on obis.institution_codes or if it was a new institution? This cell is our opportunity to add any institutions if they are new. If all institutions (for each contact, plus for the project as a whole) exist, then you can skip this cell.

To run the cell, you will need to complete:

  1. Institution Code: a short-code for the institution (ex: DAL)
  2. Institution Name: the full institution name (ex: Dalhousie University)
  3. Institution Website: the institution’s website (ex: https://www.dal.ca/). You can confirm the URL is valid with the Check URL button.
  4. Organization ID: Press the Search Org ID button. A list of potential WikiData, GRID, ROR, EDMO and NERC results for the institution will appear. Choose the best match, and paste that URL into the blank Organization ID cell.
  5. Institution Country: the country where the institution is headquartered
  6. Institution State: the state where the institution is headquartered
  7. Institution Sector: one of Unknown, Government/Other Public, University/College/Research Hospital, Private, or Non-profit.

Once all values are completed, press Create Institution and confirm the following output:

Institution record 'DAL' created.

You can re-run this cell as many times as you need, to add each missing institution.

Write Project to the Database

STOP - confirm there is no Push currently ongoing. If a Push is ongoing, you must wait for it to be completed before proceeding beyond this point.

Finally, it is time to write the project records to the database and create the new project!

First, you should run this cell with printSQL = True. If there are no errors, you can edit and change to printSQL = False and run again. This will register the project!

You will see some output - confirm each line is accompanied by a green OK.
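The printSQL flag follows a standard dry-run pattern: with the flag on, statements are only printed for review; with it off, they are executed. A minimal sketch, assuming invented names (run_statements, execute) that are not the notebook's actual API:

```python
def run_statements(statements, execute, printSQL=True):
    for sql in statements:
        if printSQL:
            print(sql)    # dry run: review the SQL, write nothing
        else:
            execute(sql)  # live run: actually write to the database

executed = []
stmts = ["INSERT INTO obis.otn_resources (collectioncode) VALUES ('NSBS')"]
run_statements(stmts, executed.append, printSQL=True)   # prints only
run_statements(stmts, executed.append, printSQL=False)  # executes
print(len(executed))  # 1
```

This is why the order matters: the first pass costs nothing, so any mistake caught in the printed SQL is caught before the project is registered.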

Task list checkpoint

In Gitlab, this task can be completed at this stage:

- [ ] - NAME create schema and project records ("Creating and Updating project metadata" notebook)

Save Contact Information

At this stage, the next step is to add contact information for all identified collaborators.

This cell will gather and print out all contacts and their information. Review for errors. If none exist, move to the next cell.

The next cell writes to the database: STOP - confirm there is no Push currently ongoing. This cell will add each contact to the database, into the obis.contacts and obis.contacts_projects tables, as needed.

There should be output similar to this:

Valid contact:  Fred Whoriskey OTN principalInvestigator fwhoriskey@dal.ca
Created contact Fred Whoriskey

Task list checkpoint

In Gitlab, this task can be completed at this stage:

- [ ] - NAME add project contact information ("Creating and Updating project metadata" notebook)

Add Species Information

The following section will allow you to add the required information into the obis.scientific_names table.

The first cell imports the required function.

The second cell, when run, will create an editable input form for each animal.

You should review to confirm the following:

  1. the scientific name is matching to an accepted WoRMS taxon. There should be a URL provided, and no error messages.
  2. the common name is acceptable. If it is not, you can choose a value from the dropdown menu (taken directly from WoRMS’ vernacular list) OR you can enter a custom value to match the common name provided by the researcher.

Once you are sure that both the scientific and common names are correct, based on the information provided by both the project and the notebook, you may click the Add to project button for each animal record you’d like to insert.

There will be a confirmation display in the notebook to demonstrate if the insertion was successful.

Task list checkpoint

In Gitlab, this task can be completed at this stage:

- [ ] - NAME add scientificnames ("Creating and Updating project metadata" notebook)

OPTIONAL: Add Project Loan Information

The following section is used by OTN staff to track projects which are recipients of OTN-equipment loans. This section is not within the scope of this Node Manager Training, because it requires a login-file for the otnunit database.

Skip to Verification

Once you scroll past the Project Loan Information section, you will see a yellow star and the words Skip to the new project Verification. You should click the button provided, which will help you scroll to the bottom of the notebook, where the Verify section is located.

This is a chance to visually review all the fields you just entered. You should run these cells and review all output to ensure the database values align with the intended insertions.

Task list checkpoint

In Gitlab, this task can be completed at this stage:

- [ ] - NAME verify all of above ("Creating and Updating project metadata" notebook)

Other Features of this Notebook

There are more features in the Create and Update Projects notebook than those covered above.

Schema extension

This section is to be used when you have a project which is either Tracker or Deployment and is expanding to become a Data project. Ex: a project which was only tagging has begun deploying receivers.

You can use this notebook to create the missing tables for the schema. Ex: if a tagging project begins deploying receivers, the schema would now need stations, rcvr_locations, and moorings tables created.

Schema updating

This section is to be used to change the values contained in obis.otn_resources. The first cell will open an editable form with the existing database values. You can change the required fields (ex: abstract).

To save and parse your inputted values DO NOT re-run the form cell - this will clear all your input. Instead, run the next cell, which parses the information.

Review the output of the parser cell and check for typos.

The next cell will show the changes that will be made to the project data. You can copy this output and paste it into the relevant Gitlab Issue for tracking.

The final cell will make the desired changes in the database. Ensure printSQL = False if you want the cell to execute directly.

The output should look like this to confirm success:

'Resource record for HFX has been updated.'

The following, highlighted section is relevant only to Nodes who use Plone for their document management system

Quality Control - Create Plone Users and Access

If you are part of a Node that uses Plone as your document repository, then the following will be relevant for you.

Imports cell

This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.

You will have to edit one section: engine = get_engine()

  • Within the open brackets you need to open quotations and paste the path to your database .kdbx file which contains your login credentials.
  • On MacOS computers, you can usually find and copy the path to your database .kdbx file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
  • The path should look like engine = get_engine('C:/Users/username/Desktop/Auth files/database_connection.kdbx').
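If the path you copied uses Windows backslashes, Python's standard pathlib can produce the unix-style form shown above (a sketch; the filename is from the example path, not a real file):

```python
from pathlib import PureWindowsPath

# A backslash path as copied from Windows Explorer
raw = r"C:\Users\username\Desktop\Auth files\database_connection.kdbx"
print(PureWindowsPath(raw).as_posix())
# C:/Users/username/Desktop/Auth files/database_connection.kdbx
```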

Plone Login

The first cell is another import step.

The second cell requires input:

  • Proper Plone log-in information must be written in the plone_auth = get_plone_auth('./plonetools/plone_auth.json') file.
  • In order to do this, click on the Jupyter icon in the top left corner of the page.
  • This will bring you to a list of folders and notebooks. Select the plonetools folder. From there, select the plone_auth.json file and input your Plone base URL, username, and password. Hint: ensure the base_url in your json file ends in a slash, like https://members.oceantrack.org/!
  • You can now successfully log into Plone.
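The trailing-slash hint can be expressed as a tiny guard, illustrative only; the notebook does not include this helper:

```python
def ensure_trailing_slash(url):
    """Return the URL with exactly one trailing slash appended if missing."""
    return url if url.endswith("/") else url + "/"

print(ensure_trailing_slash("https://members.oceantrack.org"))
# https://members.oceantrack.org/
```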

Now, when you run the cell, you should get the following output:

Auth Loaded:
------------------------------------------------------------------------------
base_url: https://members.oceantrack.org/
user_name: user
verify ssl: False

Finally, the third cell in this section will allow you to login. You should see this message:

Login Successful!

Access Project Info

Some information is needed in order to create the project Plone folders.

There are three ways to enter this information:

  1. Access Project Information from Database
  2. Manual Project Information Form - Parse Contacts
  3. Manual Project Information Form - Insert Contacts into Textfields

The first option is generally the easiest, if the project has already been successfully written to the database using the Create and Update Projects notebook. To do this, you enter the collectioncode of your project, and run the cell. If there are no errors, you can click the SKIP button which will take you down the notebook to the next section.

Create Missing Users

This section will use the registered project contacts and compare against existing Plone users. It will compare by 1) email, 2) fullname, 3) lastname.

If a user is found: you will not need to create a new account for them.

If a user is not found: you will have to create an account for them. To do this, you can use the editable form in the next cell.

The editable cell will allow you to choose each contact that you’d like to register, and will autofill the information (including a suggested username). The password should be left blank. Once you are happy with the form, click Add User. An email will be sent to the new user, prompting them to set a password. Then you can repeat by selecting the next contact, etc.

Once all contacts have Plone accounts (new or otherwise) you are finished.
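The email-then-fullname-then-lastname matching order described above can be sketched as follows. The find_user name and the dictionary shapes are assumptions for illustration; the notebook's own comparison is authoritative.

```python
def find_user(contact, plone_users):
    """Return the first Plone user matching by email, then fullname, then lastname."""
    for key in ("email", "fullname", "lastname"):
        for user in plone_users:
            if user.get(key) and user[key].lower() == contact[key].lower():
                return user
    return None

users = [{"email": "c.bate@dal.ca", "fullname": "Caitlin Bate", "lastname": "Bate"}]
contact = {"email": "caitlin.bate@dal.ca", "fullname": "Caitlin Bate", "lastname": "Bate"}
match = find_user(contact, users)
print(match is not None)  # True - matched on full name despite a different email
```

This is also why a contact with a new email address usually does not need a second account: the name-based fallbacks will still find them.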

Task list checkpoint

In Gitlab, this task can be completed at this stage:

- [ ] - NAME create new project repository users ("Create Plone Folders and Add Users" notebook)

Create Project Repository

To create the project folder you must first enter the relevant Node information:

  • otnunit: node = None
  • safnode, migramar, nepunit: node = "node" - lowercase, with quotation marks; fill in the value based on the path in Plone.
  • all other nodes (not hosted by OTN): node = None

Running this cell will print out an example of the URL, for your confirmation. Ensure the collectioncode and Node are correct.

The expected format:

https://members.oceantrack.org/data/repository/node_name/collectioncode (node name is for SAF, MigraMar and NEP only)
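The path logic can be sketched as follows (the collectioncodes in the example calls are hypothetical):

```python
def repository_url(collectioncode, node=None):
    """Build the Plone repository folder URL; the node segment is used only
    for OTN-hosted nodes such as SAF, MigraMar and NEP."""
    base = "https://members.oceantrack.org/data/repository/"
    return base + (node + "/" if node else "") + collectioncode

print(repository_url("NSBS"))
# https://members.oceantrack.org/data/repository/NSBS
print(repository_url("SAFSS", node="safnode"))
# https://members.oceantrack.org/data/repository/safnode/SAFSS
```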

If you are confident the folder path is correct, you can run the next cell and confirm the following success message:

Creating collection folder 'collectioncode'. Done!
https://members.oceantrack.org/data/repository/node_name/collectioncode

Task list checkpoint

In Gitlab, this task can be completed at this stage:

- [ ] - NAME create project repository folder ("Create Plone Folders and Add Users" notebook)

Add Users to Repository

Now that the users AND folder have been created, the users must be given access to the new folder.

Using the final cell in the Plone repository folder creation section, you will be provided with an editable search-bar.

Type in the Plone username of each contact (new and existing). Search results will appear:

  1. select the User who you would like to add
  2. choose their permissions
  3. click “Change repo permissions” to add them to the folder.

Review for the following success message:

Changed https://members.oceantrack.org/data/repository/node_name/collectioncode sharing for username:
	Contributor=True Reviewer=True Editor=True Reader=True

Then you may choose Add another user and begin again.

The acceptable folder permissions may vary depending on the project role of the contact. Here are some guidelines:

  • Principal Investigator: all permissions
  • Researcher: all permissions except Reviewer
  • Student: all permissions except Reviewer
  • Technician: only Contributor and Reader
  • Collaborator: only Contributor and Reader

This is very fluid and can be edited at any time. These are guidelines only!
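The guideline table above can be written as a simple lookup, where "all permissions" means the four shown in the sharing success message (Contributor, Reviewer, Editor, Reader). Remember these defaults are guidelines only:

```python
DEFAULT_PERMISSIONS = {
    "Principal Investigator": {"Contributor", "Reviewer", "Editor", "Reader"},
    "Researcher":             {"Contributor", "Editor", "Reader"},
    "Student":                {"Contributor", "Editor", "Reader"},
    "Technician":             {"Contributor", "Reader"},
    "Collaborator":           {"Contributor", "Reader"},
}

print(sorted(DEFAULT_PERMISSIONS["Technician"]))
# ['Contributor', 'Reader']
```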

Task list checkpoint

In Gitlab, this task can be completed at this stage:

- [ ] - NAME add project repository users to folder ("Create Plone Folders and Add Users" notebook)

OPTIONAL: Add Project Loan Information

The following section is used by OTN staff to track projects which are recipients of OTN-equipment loans. This section is not within the scope of this Node Manager Training, because it requires a login-file for the otnunit database.

Final Steps

The remaining steps in the Gitlab Checklist are completed outside the notebooks.

First: you should access the created Repository folder in your browser and confirm that the title and sharing information are correct. If so, add the project metadata .txt file into the “Data and Metadata” folder to archive.

Next, you should send an email to the project contacts letting them know their project code and other onboarding information.

Finally, the Issue can be passed off to an OTN-analyst for final verification in the database.

Key Points

  • Loading project metadata requires judgement from the Data Manager

  • Loading project metadata is the first step towards a functioning project


Tagging Metadata

Overview

Teaching: 20 min
Exercises: 0 min
Questions
  • How do I load new tags into the Database?

Objectives
  • Understand the proper template-completion

  • Understand how to use the Gitlab checklist

  • Learn how to use the Tag-1 notebook

  • Learn how to use the Tag-1b notebook

  • Learn how to use the Tag-2 notebook

Once a project has been registered, the next step (for Tracker and Data project types) is to quality control and load the tagging metadata into the database. Tagging metadata should be reported to the Node in the template provided here. This file will contain information about the deployment of tags (acoustic, PIT, satellite, floy etc.) in or on animals for the purposes of tracking their movements using either listening stations or via mark/recapture. Any biological metrics that were measured at tagging time, e.g. length, weight, population, are recorded for future analysis.

Remembering a previous lesson, there are multiple levels of data-tables in the database for tagging records: raw tables, cache tables and otn tables. The process for loading tagging metadata reflects this, as does the Gitlab task list.

Completed Metadata

Immediately upon receipt of the metadata, a new Gitlab Issue should be created. Please use the Tag Metadata Issue checklist template.

Here is the Issue checklist, for reference:

Tag Meta Data
- [ ] - NAME add label *'loading records'*
- [ ] - NAME load raw tag metadata (`tag-1` notebook) **put_table_name_in_ticket**
- [ ] - NAME confirm no duplicates in raw table, review and remove (`tag-1b` notebook)
- [ ] - NAME verify raw table (`tag-2` notebook)
- [ ] - NAME post updated metadata to project folder (OTN members.oceantrack.org, FACT RW etc) if needed
- [ ] - NAME build cache tables (`tag-2` notebook)
- [ ] - NAME verify cache tables (`tag-2` notebook)
- [ ] - NAME load otn tables (`tag-2` notebook)
- [ ] - NAME verify otn tables (`tag-2` notebook)
- [ ] - NAME verify tags are not part of another collection (`tag-2` notebook)
- [ ] - NAME label issue with *'Verify'*
- [ ] - NAME pass issue to analyst for final verification
- [ ] - NAME check for double reporting (verification_notebooks/Tag Verification notebook)

Visual Inspection

Once the completed file is received from a researcher, the Data Manager should first complete a visual check for formatting and accuracy.

In general, Tagging Metadata has 3 sections:

  1. information about the tag
  2. information about the animal
  3. information about the tag deployment

Information about the tag is obtained from Tag Specifications and is mandatory. Information about the animal is limited to the sampling conducted. The minimum requirement here is simply the common/scientific names. All other columns for biological parameters are optional. Information about the tag deployment includes location and dates and is mandatory to complete.

Things to visually check in the metadata:

  1. Is there any information missing from the essential columns? These are:
    • tag_type
    • tag_manufacturer
    • tag_model
    • tag_serial_number (if not completed, unknown can be entered ONLY IF animal_id is also completed with unique values)
    • tag_id_code
    • tag_code_space
    • tag_implant_type
    • est_tag_life
    • common_name_e
    • scientific_name
    • release_location
    • release_latitude
    • release_longitude
    • utc_release_date_time
  2. If any of the above mandatory fields are blank, follow-up with the researcher will be required if:
    • you cannot discern the values yourself
    • you do not have access to the Tag Specifications from the manufacturer (relevant for the columns containing tag information).
  3. Are the tag_id_code and tag_code_space values formatted correctly?
  4. Is the UTC_release_date_time column formatted correctly?
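A hypothetical sketch of the blank-check in step 1 (the Nodebook performs the authoritative version of this; the example row's values are made up):

```python
ESSENTIAL = ["tag_type", "tag_manufacturer", "tag_model", "tag_serial_number",
             "tag_id_code", "tag_code_space", "tag_implant_type", "est_tag_life",
             "common_name_e", "scientific_name", "release_location",
             "release_latitude", "release_longitude", "utc_release_date_time"]

def missing_essentials(row):
    """Return the essential columns that are blank in one metadata row."""
    return [col for col in ESSENTIAL if not str(row.get(col, "")).strip()]

row = {"tag_type": "acoustic", "tag_manufacturer": "Vemco", "tag_model": "V13",
       "tag_serial_number": "1234567", "tag_id_code": "12345",
       "tag_code_space": "A69-1601", "tag_implant_type": "internal",
       "est_tag_life": "913 days", "common_name_e": "blue shark",
       "scientific_name": "Prionace glauca", "release_location": "Halifax",
       "release_latitude": "44.19939", "release_longitude": "-63.24085",
       "utc_release_date_time": ""}
print(missing_essentials(row))  # ['utc_release_date_time']
```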

In general, formatting errors occur most often in the tag-information section. Pay close attention here.

The metadata template available here has a Sample Data Row as an example of properly-formatted metadata, along with the Data Dictionary sheet which contains detailed expectations for each column. Refer back to these often. We have also included some recommendations on our FAQ page.

Quality Control - Tag-1 Notebook

Each step in the Issue checklist will be discussed here, along with other important notes required to use the Nodebook.

Imports cell

This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.

There are no values here which need to be edited.

Path to File

In this cell, you need to paste a filepath to the relevant Tagging Metadata file. The filepath will be added between the provided quotation marks.

Correct formatting looks something like this:

# Path to your tagging metadata file - Formats Accepted: (csv, xls or xlsx)
filepath = r"C:/Users/path/to/tag_metadata.xlsx"

Once you have added your filepath, you can run the cell.

Verification of File Contents - formatting

First, you must choose which sheet you would like to quality control. Generally, it will be named Tag Metadata, but the name is often customized by researchers. Once you have selected the sheet name, do not re-run the cell to save the output - simply ensure the correct sheet is highlighted and move on to the next cell.

This cell will now complete the first round of Quality Control checks.

The output will have useful information:

The notebook will indicate the sheet has passed quality control by adding a ✔️green checkmark beside each section. There should also be an interactive plot generated, summarizing the tags released over time, by species, for you to explore.

If there is information which is not passing quality control, you should fix the source-file (potentially speaking to the researcher) and try again.

Tag 1

Connection to Database

You will have to edit two sections:

  1. schema = 'collectioncode'
    • please edit to include the relevant project code, in lowercase, between the quotes.
  2. engine = get_engine()
    • Within the open brackets you need to open quotations and paste the path to your database .kdbx file which contains your login credentials.
    • On MacOS computers, you can usually find and copy the path to your database .kdbx file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
    • The path should look like engine = get_engine('C:/Users/username/Desktop/Auth files/database_connection.kdbx').

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.load.oceantrack.org Database:otnunit User:admin Node:OTN

Bounding Box Check

The following cell will map the locations of all the tag deployments, compared to the Bounding Box of the project. This is used to confirm the following:

  1. the tag release locations are in the part of the world expected based on the project abstract. Ex: lat/long have correct +/- signs
  2. the project bounding box is correct

If it appears the tag release locations have incorrect signs, you can fix it in the source file and re-run the cell.

If it appears there are tag release locations which are on land, you may want to reach out to the researcher for corrected coordinates.

If the bounding box needs to be expanded to include the tags, you can use the Square Draw Tool to re-draw the bounding box until you are happy with it. Once all tags are drawn inside the bounding box, press the Adjust Bounding Box button to save the results.

Tag 1 - bounds
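The containment check illustrated by the map can be sketched as a simple comparison. A sign slip on longitude is the most common cause of on-land or out-of-box points (the bounding box coordinates below are made up):

```python
def in_bbox(lat, lon, south, west, north, east):
    """True when the point falls inside the bounding box."""
    return south <= lat <= north and west <= lon <= east

# Halifax-area release with the correct negative longitude:
print(in_bbox(44.19939, -63.24085, 43.0, -66.0, 46.0, -60.0))  # True
# The same point with a dropped minus sign falls outside the box:
print(in_bbox(44.19939, 63.24085, 43.0, -66.0, 46.0, -60.0))   # False
```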

Verification of File Contents - against database

This cell will now complete the second round of Quality Control checks.

The output will have useful information:

  1. Have these tags been used on other projects in the database? Check the dates to ensure they don’t overlap, and double-reporting is not taking place.
  2. Do we have the Tag Specifications from the manufacturer? Do the tag_id_code, tag_code_space and est_tag_life match the specifications for each provided serial number? Are there typos or errors that require clarification from the researcher?
  3. Is the information about the animal formatted according to the Data Dictionary?
    • Are all the life stages in the obis.lifestage_codes table? If not, the reported life stage should be compared to the values in the obis.lifestage_code table, and adjusted if possible. Otherwise, use the add_lifestage_codes notebook
    • Are all length types in the obis.length_type_codes table? If not, the reported length type code should be compared to the values in the obis.length_type_codes table, and adjusted if possible. Otherwise, use the add_lengthtype_codes notebook
    • Are all the age units in the obis.control_units table? If not, the reported age units should be compared to the values in the obis.control_units table, and adjusted if possible. Otherwise, use the add_control_units notebook
  4. Are there any tags in this sheet which have been previously reported on this project in the metadata? ex: duplicates.
  5. Do the scientific and common names match the records which are previously added to obis.scientificnames for this schema? If not, please check the records in the obis.scientificnames and the source file to confirm there are no typos. If this is a new species tagged by this project, use the scientific_name_check notebook to add the new species.
  6. Are all the provided tag_model values present in the obis.instrument_models table? If not, please check the records in the obis.instrument_models and the source file to confirm there are no typos. If this is a new model which has never been used before, use the add_instrument_models notebook to add the new tag model.
  7. Are there any tags in this sheet which have been previously reported on this project in the metadata, but with different deployment dates? ex: overlapping/missing harvest dates
  8. Are there any tags being flagged as overlapping tag deployments, but not as duplicate tags? There may be an error with the tag’s serial number. Check if the tag’s ID exists in the otn_transmitters table of the schema or in the vendor.c_vemco_tags table, and compare it to the tag in the tagging metadata sheet. Fix the tag in the tagging metadata sheet if any errors are found.
  9. Are there any release dates in the future?

The notebook will indicate the sheet has passed quality control by adding a ✔️green checkmark beside each section.

If there is information which is not passing quality control, you should fix the source-file (potentially speaking to the researcher) and try again.
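The overlap test behind checks 1 and 7 can be sketched in a few lines. Two deployments of the same tag overlap when one starts before the other ends; ISO date strings compare chronologically, so no date parsing is needed for the sketch (the dates are made up):

```python
def overlaps(start_a, end_a, start_b, end_b):
    """True when two date ranges overlap (half-open interval comparison)."""
    return start_a < end_b and start_b < end_a

print(overlaps("2019-06-01", "2021-06-01", "2020-01-15", "2022-01-15"))  # True
print(overlaps("2019-06-01", "2021-06-01", "2021-06-02", "2023-06-02"))  # False
```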

Loading the Raw Table

ONLY once the source file has successfully passed ALL quality control checks can you load the raw table to the database.

In this cell, you will have to edit one section in order to name the raw table. Between the quotes, please add your custom table suffix. We recommend using year_month or similar, to indicate the most-recently tagged animal in the metadata sheet.

table_suffix = "YYYY_mm"
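One way to derive the suggested suffix from the most recent release date in the sheet (a sketch with made-up dates; use whatever convention your Node prefers):

```python
from datetime import datetime

release_times = ["2013-08-21 14:00:00", "2014-05-05 09:30:00"]
# Latest release in the sheet drives the suggested suffix
latest = max(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in release_times)
print(latest.strftime("%Y_%m"))  # 2014_05
```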

The notebook will indicate the success of the table-creation with the following message:

Reading file: otn_metadata_tagging.xlsx.
Tag Metadata sheet found, reading file... OK
Loading 'otn_metadata_tagging.xlsx' file into collectioncode.c_tag_meta_YYYY_mm... OK
Loaded 37 records into table collectioncode.c_tag_meta_YYYY_mm
True

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME load raw tag metadata ('tag-1' notebook) **put_table_name_in_ticket**

Ensure you paste the table name (ex: c_tag_meta_YYYY_mm) into the section indicated, before you check the box.

Quality Control - Tag-1b Notebook

Once the raw table is successfully loaded, the next step is to ensure any previously-loaded tags are not re-added to the database (causing duplication errors).

Imports cell

This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.

There are no values here which need to be edited.

Table Name and Database Connection

You will have to edit three sections:

  1. engine = get_engine()
    • Within the parentheses, add quotation marks and paste the path to your database .kdbx file, which contains your login credentials.
    • On MacOS computers, you can usually find and copy the path to your database .kdbx file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
    • The path should look like engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx').
  2. schema = 'collectioncode'
    • please edit to include the relevant project code, in lowercase, between the quotes.
  3. table_name = 'c_tag_meta_YYYY_mm'
    • Within the quotes, please add the name of the raw table.
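A filled-in version of these edits might look like the following sketch; every value shown is hypothetical and should be replaced with your own:

```python
# All values below are hypothetical -- substitute your own path, project
# code, and raw table name.
kdbx_path = 'C:/Users/username/Desktop/Auth files/database_conn_string.kdbx'
schema = 'hfx'                      # project code, lowercase
table_name = 'c_tag_meta_2021_09'   # raw table created by the tag-1 notebook
# engine = get_engine(kdbx_path)    # get_engine comes from the imports cell
```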

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.load.oceantrack.org Database:otnunit User:admin Node:OTN

Checking for Duplicates

This cell, once run, may print out that there are No Duplicates Found. If so, you can move on.

If there is output identifying duplicates, some review is necessary.

Immediately, the notebook will create a new table named schema.c_tag_meta_YYYY_mm_no_dup, which is an exact copy of your raw table. There will be a print-out (under a table) saying the following:

Building schema.c_tag_meta_YYYY_mm_no_dup table:
The following SQL queries will display the differences/similarities between the raw table (schema.c_tag_meta_YYYY_mm_no_dup) record and the otn_animals or otn_transmitters table record.

Then, there will be two interactive tables provided, which can be used to identify and delete any duplicate records from the no_dup table.

  1. Comparison to otn_animals
  2. Comparison to otn_transmitters

All text should initially be black. Scroll through the entire table, row by row, and review any cells which are highlighted/flashing RED. These highlight differences between the values in the raw table and the values in the otn table for this animal record.

If there are no highlighted/flashing red cells in a row, you can delete the record from the no_dup table by using the delete button on the left-side of the table. You can also use the Select All button if appropriate.

If there are highlighted/flashing red cells in a row, you must compare the highlighted values. These cannot be deleted without previous review since they are not exact duplicates, and could indicate an update to the database records is needed.

Ex: life_stage from the raw table might = SMOLT while lifestage from otn_animals might = ADULT. This will cause the column lifestage_equal? to read FALSE and the cells to flash red. In this example, it could indicate that the researcher was correcting an error in the previously-submitted record (this animal was actually a smolt, not adult) and therefore we need to update the record in our database. In this case, you would need to email the researcher to confirm, pause processing this Issue, and create a new DB_fix Issue with the information which needs updating.

If you review all the flashing red cells and find they are only rounding errors, or similar non-significant data changes, you can determine that they are indeed duplicate records. You can now use the delete button on the left-side of the table. You can also use the Select All button, if all records are true duplicates.
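The red-cell comparison amounts to a column-by-column equality check. A minimal sketch, using hypothetical column values:

```python
# Illustrative sketch with hypothetical values: compare a raw-table record
# against the matching otn_animals record and list the columns that differ --
# the same comparison the notebook renders as flashing red cells.
raw_row = {'catalognumber': 'HFX-2021-001', 'lifestage': 'SMOLT', 'length': 18.0}
otn_row = {'catalognumber': 'HFX-2021-001', 'lifestage': 'ADULT', 'length': 18.0}

differing = [col for col in raw_row if raw_row[col] != otn_row[col]]
# An empty list would mean an exact duplicate, safe to delete from no_dup;
# here differing == ['lifestage'], so the record needs review first.
```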

Once you have selected a row to be deleted, the text will turn red to help you visualize your selection.

Deleting Duplicates

Once you have identified all the true duplicate records, this next cell will remove them from the no_dup table. Success will be indicated with this message, and a table:

Compiling list of deletions to the schema.c_tag_meta_YYYY_mm_no_dup table.
Delete the following XXX records from schema.c_tag_meta_YYYY_mm_no_dup?

You will be able to press the button to confirm, and the notebook will remove all the records.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME confirm no duplicates in raw table, review and remove ("tag-1b" notebook)

Ensure you paste the no_dup table name (ex: c_tag_meta_2021_09_no_dup), if relevant, into the Issue before you check the box. This is now the raw table that will be used for the rest of the data-loading process.

Quality Control - Tag-2 Notebook

Now that the raw table is free from duplicates, we can begin to move the records into the higher-level cache and otn tables, where they will be matched to detections.

Imports cell

This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.

There are no values here which need to be edited.

Database Connection

You will have to edit one section:

  1. engine = get_engine()
    • Within the parentheses, add quotation marks and paste the path to your database .kdbx file, which contains your login credentials.
    • On MacOS computers, you can usually find and copy the path to your database .kdbx file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
    • The path should look like engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx').

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.load.oceantrack.org Database:otnunit User:admin Node:OTN

Table Name

You will have to edit two sections:

  1. table_name = 'c_tag_meta_YYYY_mm'
    • Within the quotes, please add the name of the raw table. This may be the no_dup table, if one was created.
  2. schema = 'collectioncode'
    • please edit to include the relevant project code, in lowercase, between the quotes.

Verification of File Contents - against database

This cell will now complete the final round of Quality Control checks. These are exactly the same as the checks at the end of the tag-1 notebook.

The output will have useful information:

  1. Have these tags been used on other projects in the database? Check the dates to ensure they don’t overlap, and double-reporting is not taking place.
  2. Do we have the Tag Specifications from the manufacturer? Do the tag_id_code, tag_code_space and est_tag_life match the specifications for each provided serial number? Are there typos or errors that require clarification from the researcher?
  3. Is the information about the animal formatted according to the Data Dictionary?
    • Are all the life stages in the obis.lifestage_codes table? If not, the reported life stage should be compared to the values in the obis.lifestage_code table, and adjusted if possible. Otherwise, use the add_lifestage_codes notebook
    • Are all length types in the obis.length_type_codes table? If not, the reported length type code should be compared to the values in the obis.length_type_codes table, and adjusted if possible. Otherwise, use the add_lengthtype_codes notebook
    • Are all the age units in the obis.control_units table? If not, the reported age units should be compared to the values in the obis.control_units table, and adjusted if possible. Otherwise, use the add_control_units notebook
  4. Are there any tags in this sheet which have been previously reported on this project in the metadata? ex: duplicates.
  5. Do the scientific and common names match the records which are previously added to obis.scientificnames for this schema? If not, please check the records in the obis.scientificnames and the source file to confirm there are no typos. If this is a new species tagged by this project, use the scientific_name_check notebook to add the new species.
  6. Are all the provided tag_model values present in the obis.instrument_models table? If not, please check the records in the obis.instrument_models and the source file to confirm there are no typos. If this is a new model which has never been used before, use the add instrument_models notebook to add the new tag model.
  7. Are there any tags in this sheet which have been previously reported on this project in the metadata, but with different deployment dates? ex: overlapping/missing harvest dates
  8. Are there any tags being flagged as overlapping tag deployments, but not as duplicate tags? There may be an error with the tag’s serial number. Check if the tag’s ID exists in the otn_transmitters table of the schema or in the vendor.c_vemco_tags table, and compare it to the tag in the tagging metadata sheet. Fix the tag in the tagging metadata sheet if any errors are found.
  9. Are there any release dates in the future?

The notebook will indicate the sheet has passed quality control by adding a ✔️ green checkmark beside each section.

If there is information which is not passing quality control, you should fix the source-file (potentially speaking to the researcher), delete or edit the raw table, and try again.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME verify raw table ("tag-2" notebook)

Loading to Cache Tables

The first cell will create the cache tables, with the following success messaging:

Creating table schema.tagcache_YYYY_mm
Creating table schema.animalcache_YYYY_mm
Table creation finished.

The next step will populate the tag_cache and animal_cache tables by splitting apart the information about the tag and the information about the animal, joining the records by a unique catalognumber based on the tag deployment information.
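A rough sketch of that split, using hypothetical column names and a hypothetical catalognumber format (the real format is built by the notebook):

```python
# One metadata record feeds both caches, joined by a shared catalognumber.
# Column names and catalognumber format here are illustrative only.
record = {
    'collectioncode': 'HFX',
    'tag_serial': '123456',
    'scientificname': 'Salmo salar',
    'utc_release_date_time': '2021-09-14',
}
catalognumber = f"{record['collectioncode']}-{record['tag_serial']}-{record['utc_release_date_time']}"
animal_cache_row = {'catalognumber': catalognumber, 'scientificname': record['scientificname']}
tag_cache_row = {'catalognumber': catalognumber, 'tag_serial': record['tag_serial']}
```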

Running this cell will provide the following success message:

Added XX records to the schema.animalcache_YYYY_mm table
Added XX records to the schema.tagcache_YYYY_mm table

Pay special attention to the number of records loaded to the animal and tag caches. If these numbers don’t match, you may need to investigate why there are more tags than animals, or vice versa. Possible reasons the values may not match:

If the values are acceptable, you can move on.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME build cache tables ("tag-2" notebook)

Verifying Cache Tables

This cell will now complete the Quality Control checks of the cache tables.

The output will have useful information:

The notebook will indicate the tables have passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors, go into the database and fix the cache tables themselves, and re-run.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME verify cache tables ("tag-2" notebook)

Loading to OTN Tables

STOP - confirm there is no Push currently ongoing. If a Push is ongoing, you must wait for it to be completed before processing beyond this point.

This cell will populate the otn_animals and otn_transmitters master-tables, with the following success messaging:

Added XX records to schema.otn_animals table from animalcache_YYYY_mm
Added XX records to schema.otn_transmitters table from tagcache_YYYY_mm

The number of records added should match the number from the cache table loading step.

Task list checkpoint

In GitLab, these tasks can be completed at this stage:

- [ ] - NAME load otn tables ("tag-2" notebook)

Verifying OTN Tables

This cell will now complete the Quality Control checks of the tag and animal records contained in the entire schema. We are no longer checking our newly-loaded records only, but also each previously-loaded record.

The output will have useful information:

The notebook will indicate the tables have passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors, go into the database and fix the otn tables themselves, or contact the researcher, and re-run.

Task list checkpoint

In GitLab, these tasks can be completed at this stage:

- [ ] - NAME verify otn tables ("tag-2" notebook)
- [ ] - NAME verify tags are not part of another collection (`tag-2` notebook)

Final Steps

The remaining steps in the GitLab Checklist are completed outside the notebooks.

First: you should access the Repository folder in your browser and add the cleaned Tag Metadata .xlsx file into the “Data and Metadata” folder.

Finally, the Issue can be passed off to an OTN-analyst for final verification in the database.

Key Points

  • Loading tagging metadata requires judgement from the Data Manager

  • Communication with the researcher is essential when errors are found


Deployment Metadata

Overview

Teaching: 30 min
Exercises: 0 min
Questions
  • How do I load new deployments into the Database?

Objectives
  • Understand proper template completion

  • Understand how to use the GitLab checklist

  • Learn how to use the Deploy notebook

Once a project has been registered, the next step (for Deployment and Data project types) is to quality control and load the instrument deployment metadata into the database. Deployment metadata should be reported to the Node in the template provided here. This file will contain information about the deployment of any instruments used to detect tagged subjects or collect related data. This includes stationary test tags, range test instruments, non-acoustic environmental sensors etc. Geographic location, as well as the duration of the deployment for each instrument, is recorded. The locations of these listening stations are used to fix detections geographically.

Remembering our previous lessons, there are multiple levels of data-tables in the database for deployment records: raw tables, rcvr_locations, stations and moorings. The process for loading instrument metadata reflects this, as does the GitLab task list.

Submitted Metadata

Immediately upon receipt of the metadata, a new GitLab Issue should be created. Please use the Receiver_metadata Issue checklist template.

Here is the Issue checklist, for reference:

Receiver Metadata
- [ ] - NAME add label *'loading records'*
- [ ] - NAME load raw receiver metadata (`deploy` notebook) **put_table_name_in_ticket**
- [ ] - NAME check that station locations have not changed station "NAMES" since last submission (manual check)
- [ ] - NAME verify raw table (`deploy` notebook)
- [ ] - NAME post updated metadata file to project repository (OTN members.oceantrack.org, FACT RW etc)
- [ ] - NAME load station records (`deploy` notebook)
- [ ] - NAME verify stations (`deploy` notebook)
- [ ] - NAME load to rcvr_locations (`deploy` notebook)
- [ ] - NAME verify rcvr_locations (`deploy` notebook)
- [ ] - NAME add transmitter records receivers with integral pingers (`deploy` notebook)
- [ ] - NAME load to moorings (`deploy` notebook)
- [ ] - NAME verify moorings (`deploy` notebook)
- [ ] - NAME label issue with *'Verify'*
- [ ] - NAME pass issue to OTN DAQ for reassignment to analyst
- [ ] - NAME check if project is OTN loan, if yes, check for lost indicator in recovery column, list receiver serial numbers for OTN inventory updating.
- [ ] - NAME pass issue to OTN analyst for final verification
- [ ] - NAME check for double reporting (verification_notebooks/Deployment Verification notebook)

**receiver deployment files/path:**

Visual Inspection

Once the completed file is received from a researcher, the Data Manager should first complete a visual check for formatting and accuracy.

In general, the deployment metadata contains information on the instrument, the deployment location, and the deployment/recovery times.

Things to visually check in the metadata:

  1. Is there any information missing from the essential columns? These are:
    • otn_array
    • station_no
    • deploy_date
    • deploy_lat
    • deploy_long
    • ins_model_no
    • ins_serial_no
    • recovered
    • recover_date
  2. If any of the above mandatory fields are blank, follow-up with the researcher will be required if:
    • you cannot discern the values yourself.
    • you do not have access to the Tag or Receiver Specifications from the manufacturer (relevant for the columns containing transmitter information).
  3. Are all lat/longs in the correct sign? Are they in the correct format (decimal degrees)?
  4. Do all transceivers/test tags have their transmitters provided?
  5. Are all recoveries from previous years recorded?
  6. Do comments suggest anything was lost or damaged, where recovery indicator doesn’t say “lost” or “failed”?

In general, the most common formatting errors occur in records where more than one instrument is deployed at a station, or where a receiver was deployed and recovered at the same site.
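A quick way to spot the first case is to count distinct instruments per station. A sketch with hypothetical rows, using the template's column names:

```python
from collections import defaultdict

# Hypothetical deployment rows illustrating the error-prone case of more than
# one instrument at a single station; column names follow the metadata template.
rows = [
    {'station_no': 'ST01', 'ins_serial_no': '450001', 'deploy_date': '2021-05-01'},
    {'station_no': 'ST01', 'ins_serial_no': '450002', 'deploy_date': '2021-05-01'},
    {'station_no': 'ST02', 'ins_serial_no': '450003', 'deploy_date': '2021-05-02'},
]
by_station = defaultdict(set)
for r in rows:
    by_station[r['station_no']].add(r['ins_serial_no'])
stations_to_review = sorted(s for s, serials in by_station.items() if len(serials) > 1)
# stations_to_review == ['ST01'] -> give these rows extra scrutiny
```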

The metadata template available here has a Data Dictionary sheet which contains detailed expectations for each column. Refer back to these definitions often. We have also included some recommendations on our FAQ page. Here are some guidelines:

Quality Control - Deploy Notebook

Each step in the Issue checklist will be discussed here, along with other important notes required to use the Nodebook.

Imports cell

This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.

There are no values here which need to be edited.

Path to File

In this cell, you need to paste a filepath to the relevant Deployment Metadata file. The filepath will be added between the provided quotation marks.

Correct formatting looks something like this:

# Shortfrom metadata path (xls, csv)
filepath = r'C:/Users/path/to/deployment_metadata.xlsx'

You must also select the format of the Deployment metadata. Currently, only the FACT Network uses a slightly different format than the template available here. If it's relevant for your Node, you can edit the excel_fmt section.

Correct formatting looks something like this:

excel_fmt = 'otn' # Deployment metadata format 'otn' or 'fact'

Once you have added your filepath and chosen your template format, you can run the cell.

Next, you must choose which sheet you would like to quality control. Generally, it will be named Deployment but is often customized by researchers. Once you have selected the sheet name, do not re-run the cell to save the output - simply ensure the correct sheet is highlighted and move on to the next cell.

Table Name and Database

You will have to edit three sections:

  1. schema = 'collectioncode'
    • please edit to include the relevant project code, in lowercase, between the quotes.
  2. table_name = 'c_shortform_YYYY_mm'
    • Within the quotes, please add your custom table suffix. We recommend using year_month or similar, to indicate the most-recently deployed/recovered instrument in the metadata sheet.
  3. engine = get_engine()
    • Within the parentheses, add quotation marks and paste the path to your database .kdbx file, which contains your login credentials.
    • On MacOS computers, you can usually find and copy the path to your database .kdbx file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
    • The path should look like engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx').

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node

Verification of File Contents

This cell will now complete the first round of Quality Control checks.

The output will have useful information:

The notebook will indicate the sheet has passed quality control by adding a ✔️ green checkmark beside each section. There should also be an interactive plot generated, summarizing the instruments deployed over time for you to explore, and a map of the deployments.

Using the map, please confirm the following:

  1. the instrument deployment locations are in the part of the world expected based on the project abstract. Ex: lat/long have correct +/- signs
  2. the instrument deployments do not occur on land
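Both checks boil down to comparing each deployment's coordinates against the expected study region. A minimal sketch, with a hypothetical bounding box:

```python
# Illustrative coordinate check; the bounding box is hypothetical and should
# come from the study area described in the project abstract.
expected = {'lat_min': 43.0, 'lat_max': 47.0, 'lon_min': -67.0, 'lon_max': -59.0}
deployments = [
    {'station': 'ST01', 'deploy_lat': 44.5, 'deploy_long': -63.6},
    {'station': 'ST02', 'deploy_lat': 44.5, 'deploy_long': 63.6},  # missing minus sign
]
suspect = [
    d['station'] for d in deployments
    if not (expected['lat_min'] <= d['deploy_lat'] <= expected['lat_max']
            and expected['lon_min'] <= d['deploy_long'] <= expected['lon_max'])
]
# suspect == ['ST02'] -> likely a sign error on the longitude
```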

If there is information which is not passing quality control, you should fix the source-file (potentially after speaking to the researcher) and try again.

Deploy 1

Loading the Raw Table

ONLY once the source file has successfully passed ALL quality control checks can you load the raw table to the database.

You have already named the table above, so there are no edits needed in this cell.

The notebook will indicate the success of the table-creation with the following message:

Reading file 'deployment_metadata.xlsx' as otn formatted Excel.
Table Loading Complete:
 Loaded XXX records into table schema.c_shortform_YYYY_mm

Task list checkpoint

In GitLab, these tasks can be completed at this stage:

- [ ] - NAME load raw receiver metadata ("deploy" notebook) **put_table_name_in_ticket**
- [ ] - NAME check that station locations have not changed station "NAMES" since last submission (manual check)

Ensure you paste the table name (ex: c_shortform_YYYY_mm) into the section indicated, before you check the box.

Verify Raw Table

This cell will now complete the Quality Control checks of the raw table. This is to ensure the Nodebook loaded the records correctly from the Excel sheet.

The output will have useful information:

The notebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors, go into the database and fix the raw table directly, or contact the researcher, and re-run.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME verify raw table ("deploy" notebook)

Loading Stations Records

STOP - confirm there is no Push currently ongoing. If a Push is ongoing, you must wait for it to be completed before processing beyond this point.

Only once the raw table has successfully passed ALL quality control checks can you load the stations information to the database stations table.

Running this cell will first check for any new stations to add, then confirm that the records in the stations table match the records in the moorings table where basisofrecord = 'STATION'.

If new stations are identified:

The success message will look like:

Adding station records to the stations table.
Creating new stations...
Added XXX new stations to schema.moorings

If the stations and moorings tables are not in-sync, the difference between the two will need to be compared and possibly updated.
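Conceptually, the sync check is a set comparison between the station names on each side; a sketch with hypothetical station lists:

```python
# Hypothetical station lists: set differences show which stations are missing
# from either side when the two tables fall out of sync.
stations_table = {'ST01', 'ST02', 'ST03'}
moorings_stations = {'ST01', 'ST02'}   # moorings rows where basisofrecord = 'STATION'
missing_from_moorings = stations_table - moorings_stations   # {'ST03'}
missing_from_stations = moorings_stations - stations_table   # set()
```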

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - load station records ("deploy" notebook)

Verify Stations Table

This cell will now complete the Quality Control checks of the stations records contained in the entire schema. We are no longer only checking against our newly-loaded records, but also each previously-loaded record in this schema/project. This will help catch historical errors.

The output will have useful information:

The notebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors, go into the database and fix the raw table directly, or contact the researcher, and re-run. If there are problems with records that have already been promoted to the stations or moorings table, you will need to contact an OTN database staff member to resolve these.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - verify stations ("deploy" notebook)

Load to rcvr_locations

Once the station table is verified, the receiver deployment records can now be promoted to the “intermediate” rcvr_locations table.

The cell will identify any new deployments to add, and any previously-loaded deployments which need updating (ex: they have been recovered).

If new deployments are identified:

If deployment updates are identified:

Each instance will give a success message such as:

Loading deployments into the rcvr_locations_table
Loaded XX records into the schema.rcvr_locations table.
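The new-versus-update split can be pictured as a lookup against existing deployments. In this sketch a deployment is keyed by (station_no, deploy_date), which is an assumption for illustration only:

```python
# Hypothetical sketch of splitting incoming deployment rows into brand-new
# records and updates to previously loaded, still-open deployments.
existing = {('ST01', '2021-05-01'): {'recover_date': None}}  # loaded, not yet recovered
incoming = [
    {'station_no': 'ST01', 'deploy_date': '2021-05-01', 'recover_date': '2021-10-01'},
    {'station_no': 'ST02', 'deploy_date': '2021-06-01', 'recover_date': '2021-10-02'},
]
new_deploys = [d for d in incoming if (d['station_no'], d['deploy_date']) not in existing]
updates = [d for d in incoming if (d['station_no'], d['deploy_date']) in existing]
# new_deploys -> the ST02 record; updates -> the now-recovered ST01 record
```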

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - load to rcvr_locations ("deploy" notebook)

Verify rcvr_locations

This cell will now complete the Quality Control checks of the rcvr_locations records contained in the entire schema. We are no longer only checking our newly-loaded records, but also each previously-loaded record for this schema/project. This will help catch historical errors.

The output will have useful information:

The notebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors contact OTN, or contact the researcher, to resolve.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - verify rcvr_locations ("deploy" notebook)

Load Transmitter Records to Moorings

The transmitter values associated with transceivers, co-deployed sentinel tags or stand-alone test tags will be loaded to the moorings table in this section. Existing transmitter records will also be updated if relevant.

If new transmitters are identified:

If transmitter updates are identified:

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - load transmitter records receivers with integral pingers ("deploy" notebook)

Load Receivers to Moorings

The final, highest-level table for instrument deployments is moorings.

The cell will identify any new deployments to add, and any previously-loaded deployments which need updating (ex: they have been recovered).

Please review all new deployments and deployment updates for accuracy, then press the associated buttons to make the changes. At this stage, the updates are not editable: any updates chosen from the rcvr_locations section will be processed here.

You may be asked to select an instrumenttype for certain receivers. Use the drop-down menu to select before adding the deployment.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - load to moorings ("deploy" notebook)

Verify Moorings

This cell will now complete the Quality Control checks of the moorings records contained in the entire schema. We are no longer only checking our newly-loaded records, but also each previously-loaded record in this project/schema.

The output will have useful information:

The notebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors contact OTN to resolve.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - verify moorings ("deploy" notebook)

Final Steps

The remaining steps in the GitLab Checklist are completed outside the notebooks.

First: you should access the Repository folder in your browser and add the cleaned Deployment Metadata .xlsx file into the “Data and Metadata” folder.

Finally, the Issue can be passed off to an OTN-analyst for final verification in the database.

Key Points

  • Loading receiver metadata requires judgement from the Data Manager

  • Communication with the researcher is essential when errors are found


Detection Loading

Overview

Teaching: 40 min
Exercises: 0 min
Questions
  • What is the workflow for loading detection data?

  • What do I need to look out for as a Node Manager when loading detections?

Objectives
  • Understand how to use the GitLab checklist

  • Understand the workflow for detection data in the OTN system

  • Learn common errors and pitfalls that come up when loading detections

Once deployment metadata has been processed for a project, the related detections may now be processed. Detection data should be reported to the Node as a collection of raw, unedited files. These can be in the form of a zipped folder of .VRLs, a database from Thelma Biotel or any other raw data product from any manufacturer. The files contain only transmitter numbers and the datetimes at which they were recorded at a specific receiver. The tag metadata and deployment metadata will provide the associated geographic and biological context to this data.

Submitted Records

Immediately upon receipt of the data files, a new GitLab Issue should be created. Please use the Detections Issue checklist template.

Here is the Issue checklist, for reference:

Detections
- [ ] - NAME load raw detections and events `(detections-1` notebook and `events-1` notebook **OR** `Convert - Fathom Export` notebook and `detections-1` notebook) **(put table names here)**
- [ ] - NAME upload raw detections to project folder (OTN members.oceantrack.org, FACT RW etc) if needed
- [ ] - NAME verify raw detections table (`detections-1` notebook)
- [ ] - NAME load raw events to events table (`events-2` notebook)
- [ ] - NAME load to detections_yyyy (`detections-2` notebook)
- [ ] - NAME comment in issue what detection years were loaded (output from `detections-2`)
- [ ] - NAME verify detections_yyyy (looking for duplicates) (`detections-2` notebook)
- [ ] - NAME load to sensor_match_yyyy (`detections-2` notebook)
- [ ] - NAME comment in issue what sensor years were loaded (output from `detections-2`)
- [ ] - NAME timedrift correction for affected detection and sensor years (`detections-2b` notebook)
- [ ] - NAME verify timedrift corrections (`detections-2b` notebook)
- [ ] - NAME manually check for open, unverified receiver metadata, **STOP** if it exists! (**put Gitlab issue number here**)
-----
- [ ] - NAME load to otn_detections_yyyy (`detections-3` notebook)
- [ ] - NAME verify otn_detections_yyyy (`detections-3` notebook)
- [ ] - NAME load sentinel records (`detections-3` notebook)
- [ ] - NAME check for missing receiver metadata (`detections-3b` notebook)
- [ ] - NAME check for missing data records (`detections-3c` notebook)
- [ ] - NAME load download records (`events-3` notebook)
- [ ] - NAME verify download records (`events-3` notebook)
- [ ] - NAME process receiver configuration (`events-4` notebook)
- [ ] - NAME label issue with *'Verify'*
- [ ] - NAME pass issue to OTN analyst for final steps
- [ ] - NAME check for double reporting (verification_notebooks/Project Verification notebook)
- [ ] - NAME match tags to animals (`detections-4` notebook)
- [ ] - NAME do sensor tag processing (only done if vendor specifications are available)
- [ ] - NAME update detection extract table

**detections files/path:**

Visual Inspection

Once the files are received from a researcher, the Data Manager should first complete a visual check for formatting and accuracy.

Things to visually check:

Convert to CSV

Once the raw files are obtained, the data must be converted to csv format. There are several ways this can be done, depending on the manufacturer.

For Innovasea

For Thelma Biotel

For Lotek

Other manufacturers: contact OTN staff.
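Once converted, a quick programmatic look can complement the visual check. A minimal sketch, assuming typical VUE-style column names (your exports may differ):

```python
import io
import pandas as pd

# Hypothetical quick-look at a converted detections CSV before loading.
# The column names follow a typical VUE export; they are illustrative only.
csv_text = io.StringIO(
    "Date and Time (UTC),Receiver,Transmitter\n"
    "2021-05-01 10:00:00,VR2W-123456,A69-1602-555\n"
)
df = pd.read_csv(csv_text)

# Flag any expected columns that did not survive the conversion.
expected = {'Date and Time (UTC)', 'Receiver', 'Transmitter'}
missing = expected - set(df.columns)
print(missing or 'all expected columns present')
```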

detections - 1 - load csv detections

Detections-1 loads CSV detections files into a new database table. If detections were exported using Fathom or the convert - Fathom (vdat) Export - VRL to CSV notebook, the events records will also be loaded at this stage. This is because these applications combine the detections and events data in one CSV file.

Import cells and Database Connections

As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.

The second cell will set your database connection. You will have to edit one section: engine = get_engine()

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node

User Input

Cell three requires input from you. This information will be used to locate the raw detections CSV(s) and to create a new raw table in the database.

  1. file_or_folder_path = r'C:/Users/path/to/detections_CSVs/'
    • paste a filepath to the relevant CSV file(s). The filepath will be added between the provided quotation marks.
    • this can be a path to a single CSV file, or a folder of multiple CSVs.
  2. table_suffix = 'YYYY_mm'
    • Within the quotes, please add your custom table suffix. We recommend using year_month or similar, to indicate the most-recently downloaded instrument.
  3. schema = 'collectioncode'
    • please edit to include the relevant project code, in lowercase, between the quotes.

There are also some optional inputs:

Once you have added your information, you can run the cell.
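For illustration, a filled-in version of this cell might look like the following (the path, suffix, and collection code are placeholders, not real values):

```python
# Hypothetical filled-in user-input cell for detections-1.
# All values below are placeholders for illustration.
file_or_folder_path = r'C:/Users/me/data/detections_CSVs/'  # one CSV, or a folder of CSVs
table_suffix = '2024_06'  # raw table becomes c_detections_2024_06
schema = 'hfx'            # project collection code, lowercase
```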

Verify Detection File and Load to Raw Table

Next, the notebook will review and verify the detection file(s) format and report any errors. Upon successful verification, you can run the cell below, which will attempt to load the detections into a new raw table.

The notebook will indicate the success of the table-creation with a message such as this:

Reading fathom files...
Loading Files...
7/7

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME load raw detections and events ('detections-1' notebook and 'events-1' notebook **OR** 'Batch Fathom Export' notebook and 'detections-1' notebook) **(put table names here)**

Ensure you paste the table name (ex: c_detections_YYYY_mm) into the section indicated, before you check the box.

Verify Raw Detection Table

This cell will now complete the Quality Control checks of the raw table. This is to ensure the Nodebook loaded the records correctly from the CSVs.

The output will have useful information:

The notebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors, contact OTN for next steps.

Task list checkpoint

In GitLab, these tasks can be completed at this stage:

- [ ] - NAME verify raw detections table ('detections-1' notebook)

events - 1 - load events into c_events_yyyy

Events-1 is responsible for loading receiver events files into raw tables. This is only relevant for CSVs that were NOT exported using Fathom or the convert - Fathom (vdat) Export - VRL to CSV notebook.

Import cell

As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.

User Inputs

Cell two requires input from you. This information will be used to locate the raw events CSV and to create a new raw table in the database.

  1. filepath = r'C:/Users/path/to/events.csv'
    • paste a filepath to the relevant CSV file. The filepath will be added between the provided quotation marks.
  2. table_name = 'c_events_YYYY_mm'
    • Within the quotes, please add your table name, including a custom suffix. We recommend using year_month or similar, to indicate the most-recently downloaded instrument.
  3. schema = 'collectioncode'
    • please edit to include the relevant project code, in lowercase, between the quotes.

There are also some optional inputs:

Once you have added your information, you can run the cell.

Verifying the events file

Before attempting to load the event files to a raw table, the notebook will verify the file to make sure there are no major issues. This is done by running the Verify events file cell. Barring any errors, you will be able to continue.

The notebook will indicate the success of the file verification with a message such as this:

Reading file 'events.csv' as CSV.
Verifying the file.
Format: VUE 2.6+
Mandatory Columns: OK
date_and_time datetime:OK
Initialization(s): XX
Data Upload(s): XX
Reset(s): XX

Database Connection

You will have to edit one section: engine = get_engine()

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node

Load the events file into the c_events_yyyy table

The second-last cell loads the events file into a raw table. It depends on successful verification in the previous step. Upon successful loading, you can dispose of the engine and move on to the next notebook.

The notebook will indicate the success of the table-creation with the following message:

File loaded with XXXXX records.
100%

Task list checkpoint

In GitLab, these tasks can be completed at this stage:

- [ ] - NAME load raw detections and events ('detections-1' notebook and 'events-1' notebook **OR** 'Batch Fathom Export' notebook and 'detections-1' notebook) **(put table names here)**

Ensure you paste the table name (ex: c_events_YYYY_mm) into the section indicated, before you check the box.

events - 2 - move c_events into events table

This notebook will move the raw events records into the intermediate events table.

Import cell

As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.

User input

This cell requires input from you. This information identifies the raw events table you just loaded and the schema it belongs to.

  1. c_events_table = 'c_events_YYYY_mm'
    • Within the quotes, please add the raw events table name, which you just loaded in either detections-1 or events-1.
  2. schema = 'collectioncode'
    • please edit to include the relevant project code, in lowercase, between the quotes.

Database Connection

You will have to edit one section: engine = get_engine()

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node

Verify table format

You will then verify that the c_events table you specified exists, and that it meets the required format specifications.

The notebook will indicate the success of the table verification with a message such as this:

Checking table name format... OK
Checking if schema collectioncode exists... OK!
Checking collectioncode schema for c_events_YYYY_mm table... OK!
collectioncode.c_events_YYYY_mm table found.

If there are any errors in this section, please contact OTN.

Load to Events table

Provided nothing comes up in the verification cells, you can run the loading cell.

The notebook will indicate the success of the processing with a message such as this:

Checking for the collectioncode.events table... OK!
Loading events... OK!
Loaded XX rows into collectioncode.events table.

Task list checkpoint

In GitLab, these tasks can be completed at this stage:

- [ ] - NAME load raw events to events table ("events-2" notebook)

detections - 2 - c_table into detections_yyyy

This notebook takes the raw detection data from detections-1 and moves it into the intermediate detections_yyyy tables (split out by year).

Import cells and Database Connections

As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.

The second cell will set your database connection. You will have to edit one section: engine = get_engine()

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node

User Inputs

To load to the detections_yyyy tables, the notebook will require information about the schema you are working in and the raw table that you created in detections-1.

  1. c_table = 'c_detections_YYYY_mm'
    • Within the quotes, please add the raw table name, including the custom suffix you loaded in detections-1.
  2. schema = 'collectioncode'
    • please edit to include the relevant project code, in lowercase, between the quotes.

The notebook will indicate success with the following message:

Checking table name format... OK
Checking if schema collectioncode exists... OK!
Checking collectioncode schema for c_detections_yyyy_mm table... OK!
collectioncode.c_detections_yyyy_mm table found.

Create Missing Tables

Detections tables are only created on an as-needed basis. These cells will detect any tables you are missing and create them as needed, based on the years covered in the raw detection table (c_table). This will check all tables such as detections_yyyy, sensor_match_yyyy and otn_detections_yyyy.

First, the notebook will gather and print the missing tables. If there are none missing, the notebook will report that as well.

vemco: Match
You are missing the following tables:
[collectioncode.detections_YYYY, collectioncode.otn_detections_YYYY, collectioncode.sensor_match_YYYY]
Create these tables by passing the missing_tables variable into the create_detection_tables function.

If you proceed in the notebook, there is a creation cell which will add these tables to the project schema in the database. Success will be indicated with the following message:

Creating table collectioncode.detections_YYYY... OK
Creating table collectioncode.otn_detections_YYYY... OK
Creating table collectioncode.sensor_match_YYYY... OK

Create Detection Sequence

Before loading detections, a detection sequence is created. The sequence is used to populate the det_key column. The det_key value is a unique ID for each detection, which helps ensure there are no duplicates. If a sequence is required, you will see this output:

creating sequence collectioncode.detections_seq... OK

No further action is needed.

Load to Detections_YYYY

Duplicate detections are then checked for and will not be inserted into the detections_yyyy tables.

If no duplicates are found you will see:

No duplicates found. All of the detections will be loaded into the detections_yyyy table(s).

If duplicates are found, the notebook will instead report the duplicate records, which will be excluded from loading.

After all this, the raw detection records are ready to be loaded into the detections_yyyy tables. The notebook will indicate success with the following message:

Inserting records from collectioncode.c_detections_YYYY_mm into collectioncode.detections_2018... OK
Added XXXXX rows.
Inserting records from collectioncode.c_detections_YYYY_mm into collectioncode.detections_2019... OK
Added XXXXX rows.
Inserting records from collectioncode.c_detections_YYYY_mm into collectioncode.detections_2020... OK
Added XXXXX rows.
Inserting records from collectioncode.c_detections_YYYY_mm into collectioncode.detections_2021... OK
Added XXXXX rows.

You must note which years have been loaded! In the example above, this would be 2018, 2019, 2020, and 2021.
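The duplicate check described above can be illustrated in pandas. A reasonable working definition, used here only for illustration, is the same transmitter heard on the same receiver at the same timestamp (column names are made up):

```python
import pandas as pd

# Hypothetical sketch of the detections-2 duplicate check: a detection
# is a duplicate if the same tag is heard on the same receiver at
# exactly the same timestamp.
new = pd.DataFrame({
    'datecollected': ['2021-05-01 10:00:00', '2021-05-01 10:00:00'],
    'receiver': ['VR2W-123456', 'VR2W-123456'],
    'tagname': ['A69-1602-555', 'A69-1602-555'],
})
dupes = new[new.duplicated(subset=['datecollected', 'receiver', 'tagname'], keep='first')]
print(len(dupes))  # 1 duplicate row flagged; only the first copy would be loaded
```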

Task list checkpoint

In GitLab, these tasks can be completed at this stage:

- [ ] - NAME load to detections_yyyy (`detections-2` notebook)
- [ ] - NAME comment in issue what detection years were loaded (output from `detections-2`)

Ensure you paste the affected tables (ex: 2019, 2020) into the section indicated, before you check the box.

Verify Detections YYYY Tables

This cell will now complete the Quality Control checks of the detections_yyyy tables. This is to ensure the Nodebook loaded the records correctly.

First, you will need to list all of the years that were affected by the previous loading step, so the Nodebook knows which tables need to be verified. The format will look like this:

years = ['YYYY','YYYY','YYYY', 'YYYY']

If left blank, the Nodebook will check all the years, which may take a long time for some projects.

Run this cell, then you can verify in the next cell.

The output will have useful information:

The notebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors, contact OTN for next steps.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME verify detections_yyyy (looking for duplicates) ("detections-2" notebook)

Load sensors_match Tables by Year

For the last part of this notebook, you will need to load to the sensor_match_YYYY tables. This loads detections with sensor information into a project’s sensor_match_yyyy tables. Later, these tables will aid in matching vendor specifications to resolve sensor tag values.

Output will appear like this:

Inserting records from collectioncode.detections_2019 INTO sensor_match_2019... OK
Added XXX rows.
Inserting records from collectioncode.detections_2021 INTO sensor_match_2021... OK
Added XXX rows.

You must note which years have been loaded! In the example above, this would be 2019 and 2021.

Task list checkpoint

In GitLab, these tasks can be completed at this stage:

- [ ] - NAME load to sensor_match_yyyy (`detections-2` notebook)
- [ ] - NAME comment in issue what sensor years were loaded (output from `detections-2`)

Ensure you paste the affected tables (ex: 2019, 2020) into the Issue.

detections - 2b - timedrift calculations

This notebook calculates time drift factors and applies the corrections to the detections_yyyy tables, in a field called corrected_time. OTN’s Data Manager toolbox (the Nodebooks) corrects for timedrift between each initialization and offload of a receiver. If a receiver is offloaded several times in one data file, time correction does not occur linearly from start to end, but between each download, to ensure the most accurate correction. If there is only one download in a data file then the time correction in VUE software will match the time correction performed by OTN.
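The per-download correction described above amounts to a linear interpolation of the measured clock drift between each initialization and offload. A minimal sketch, with an assumed 30-second drift over a single deployment:

```python
from datetime import datetime, timedelta

# Hypothetical sketch of per-download linear time-drift correction.
# The drift measured at offload is spread linearly across the interval.
init = datetime(2021, 1, 1)
offload = datetime(2021, 7, 1)
drift_seconds = 30.0  # clock was 30 s fast at offload (assumed value)

def corrected_time(det_time):
    # 0 at initialization, 1 at offload
    frac = (det_time - init) / (offload - init)
    return det_time - timedelta(seconds=drift_seconds * frac)

dt = datetime(2021, 4, 1)  # a detection partway through the deployment
print(corrected_time(dt))  # shifted back by a proportional share of the drift
```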

Import cells and Database connections

As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.

The second cell will set your database connection. You will have to edit one section: engine = get_engine()

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node

User Inputs

To load to the detections_yyyy tables, the notebook will require information about the schema you are working in. Please edit schema = 'collectioncode' to include the relevant project code, in lowercase, between the quotes.

Calculating Time Drift Factors

The first function is create_tbl_time_drift_factors. It will create the time_drift_factors table in the schema if it doesn’t exist.

The next step is to run check_time_drifts which gives a display of the time drift factor values that will be added to the time_drift_factors table given an events table. At this stage, you should review for any erroneous/large timedrifts.

If everything looks good, you may proceed to the next cell which adds new time drift factors to the time_drift_factors table from the events file. A success message will appear:

Adding XXX records to collectioncode.time_drift_factors table from collectioncode.events... OK!

You will then see a cell to create missing views, which creates the time drift views the database will use to calculate drift values for both the detections_yyyy and sensor_match_yyyy tables.

Correcting Time Drift

Finally, we are ready to update the times in both the detections_yyyy and sensor_match_yyyy tables with corrected time values using the vw_time_drift_cor database view.

The notebook should identify all of the years affected by the detections-2 loading steps, so it knows which tables need to be corrected.

Once the timedrift calculation is done (indicated by ✔️ green checkmarks), you can move on.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME timedrift correction for affected detection and sensor years ("detections-2b" notebook)

Verify Detections After Time Drift Calculations

After running the above cells you will then verify the time drift corrections on the detections_yyyy and sensor_match_yyyy tables.

The output will have useful information:

The notebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors, contact OTN for next steps.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME verify timedrift corrections ("detections-2b" notebook)

detections - 3 - detections_yyyy into otn_detections

The detections - 3 notebook moves the detections from detections_yyyy and sensor_match_yyyy tables into the final otn_detections_yyyy tables. This will join the detections records to their associated deployment records, providing geographic context to each detection. If there is no metadata for a specific detection (no receiver record to match with) it will not be promoted to otn_detections_yyyy.
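The join described above can be sketched in pandas (receiver serials and dates are made up; the real notebook works against database views):

```python
import pandas as pd

# Hypothetical sketch of the detections-3 join: a detection is only
# promoted if its receiver has a deployment record covering its timestamp.
deployments = pd.DataFrame({
    'receiver': ['VR2W-123456'],
    'deploy_date': pd.to_datetime(['2021-01-01']),
    'recover_date': pd.to_datetime(['2021-06-01']),
})
detections = pd.DataFrame({
    'receiver': ['VR2W-123456', 'VR2W-123456', 'VR2W-999999'],
    'datecollected': pd.to_datetime(['2021-03-15', '2021-07-01', '2021-03-15']),
})
joined = detections.merge(deployments, on='receiver', how='left')
covered = joined[(joined.datecollected >= joined.deploy_date)
                 & (joined.datecollected <= joined.recover_date)]
print(len(covered))  # only the detection inside a deployment window is promoted
```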

Import cells and Database connections

As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.

The second cell will set your database connection. You will have to edit one section: engine = get_engine()

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.load.oceantrack.org Database:otnunit User:admin Node:OTN

User Inputs

The notebook will require information about the schema you are working in. Please edit schema = 'collectioncode' to include the relevant project code, in lowercase, between the quotes.

Before moving on from this, you will need to confirm two things:

1) Confirm that NO Push is currently ongoing.

2) Confirm that the rcvr_locations for this schema have been verified.

If a Push is ongoing, or if verification has not yet occurred, you must wait for it to be completed before processing beyond this point.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME manually check for open, unverified receiver metadata, **STOP** if it exists! **(put GitLab issue number here)**

Creating detection views and loading to otn_detections

Once you are clear to continue loading, you can run create_detection_views. This function, as its name implies, will create database views for the detection data.

Output will look like:

Creating view collectioncode.vw_detections_2020... OK
Creating view collectioncode.vw_sentinel_2020... OK
Creating view collectioncode.vw_detections_2021... OK

These views are then used by the function in the next cell, load_into_otn_detections_new, which loads the detections from those views into otn_detections. You will be asked to select all relevant tables here, using a dropdown menu and checkboxes.

You must select all years that were impacted by the detections_yyyy or sensor_match_yyyy loading steps, then click the Load Detections button to begin loading. A status bar will indicate your progress.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME load to otn_detections_yyyy ("detections-3" notebook)

Verify OTN Detections

After running the cells above, you will then verify the otn_detections_yyyy tables.

The output will have useful information:

The notebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors, contact OTN for next steps.

Task list checkpoint

In GitLab, these tasks can be completed at this stage:

- [ ] - NAME verify otn_detections_yyyy (`detections-3` notebook)
- [ ] - NAME load sentinel records (`detections-3` notebook)

detections - 3b - missing_metadata_check

This notebook checks for detections that have not been inserted into otn_detections_yyyy, which may indicate missing receiver metadata.

The user will be able to set a threshold for the minimum number of detections to look at (default is 100). It will also separate animal detections from transceiver detections in a graph. At the end, it will show a SQL command to run so that the missing metadata can be viewed in table format.

Import cells and Database connections

As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.

The second cell will set your database connection. You will have to edit one section: engine = get_engine()

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node

User Inputs

Information regarding the tables we want to check against is required.

  1. schema = 'collectioncode'
    • edit to include the relevant project code, in lowercase, between the quotes.
  2. years = []
    • A comma-separated list of detection table years for detections_yyyy, should be in form [yyyy] or a list such as [yyyy,yyyy,'early']

Once you have edited these values, you can run the cell. You should see the following success message:

All tables exist!

Check for Missing Metadata in Detections_yyyy

This step will perform the check for missing metadata in detections_yyyy and display results for each record where the number of excluded detections is greater than the specified threshold.

First, enter your threshold, formatted like threshold = 100. Then you may run the cell.

The output will include useful information:

There will be a visualization of the missing metadata for each instance where the number of missing detections is over the threshold.
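The threshold behaviour described above can be sketched as follows (receiver serials and counts are made up for illustration):

```python
import pandas as pd

# Hypothetical sketch of the detections-3b threshold logic: count the
# detections that failed to match deployment metadata, per receiver,
# and report only receivers over the chosen threshold.
threshold = 100
excluded = pd.DataFrame({'receiver': ['VR2W-123456'] * 150 + ['VR2W-567891'] * 20})
counts = excluded['receiver'].value_counts()
over = counts[counts > threshold]
print(over.index.tolist())  # only receivers exceeding the threshold are reported
```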

Any instance with missing detections (greater than the threshold) should be identified, and the results pasted into a new GitLab Issue. The format will look like:

VR2W-123456
	missing XX detections (0 transceiver tags, XX animal tags, 0 test tags) (before deployments) (YYYY-MM-DD HH:MM:SS to YYYY-MM-DD HH:MM:SS)

VR2W-567891
	missing XX detections (0 transceiver tags, XX animal tags, 0 test tags) (before deployments) (YYYY-MM-DD HH:MM:SS to YYYY-MM-DD HH:MM:SS)

There are also two cells at the end that allow you to create reports for researchers in CSV or HTML format.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME check for missing receiver metadata ("detections-3b" notebook)

detections - 3c - missing_vrl_check

This notebook will check for missing data files in the database by comparing the rcvr_locations and events tables. For any receiver deployments that are missing events, it will check if there are detections during that time period for that receiver.
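The comparison described above can be sketched as follows (serials and dates are made up; the real notebook queries the database tables):

```python
import pandas as pd

# Hypothetical sketch of the detections-3c logic: flag receiver
# deployments that have no data-download event in the events table.
rcvr_locations = pd.DataFrame({
    'receiver': ['VR2W-111111', 'VR2W-222222'],
    'deploy_date': pd.to_datetime(['2021-01-01', '2021-01-01']),
    'recover_date': pd.to_datetime(['2021-06-01', '2021-06-01']),
})
events = pd.DataFrame({
    'receiver': ['VR2W-111111'],
    'date': pd.to_datetime(['2021-06-01']),  # offload recorded for one unit only
})
missing = rcvr_locations[~rcvr_locations['receiver'].isin(events['receiver'])]
print(missing['receiver'].tolist())  # deployments with no data file on record
```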

Import cells and Database connections

As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.

The second cell will set your database connection. You will have to edit one section: engine = get_engine()

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node

User Inputs

Information regarding the tables we want to check against is required. Please edit schema = 'collectioncode' to include the relevant project code, in lowercase, between the quotes.

There are also optional fields:

Once you have edited the values, you can run the cell. You should see the following success message:

Checking if collectioncode schema exists...
OK

Checking For Missing Data Files

The next cell will begin scanning the project’s schema to identify if there are any missing data files. If there are no missing files, you will see this output:

Checking if all deployment periods have events...
X/X
Checking if deployment periods missing events have detections...
⭐ All deployment periods had events! Skipping detection check. ⭐

If the notebook detects missing data files, you will see this output:

Checking if all deployment periods have events...
XXX/XXX
Checking if deployment periods missing events have detections...
XX/XX

Displaying Missing Data Files

Now the notebook will begin plotting a Gantt chart, displaying the periods of deployment for which the database is missing data files. There are some optional customizations you can try:

Running the cell will save your configuration options, and the next cell creates the chart(s).

The plot will have useful information:

Exporting Results

After investigating the Gantt chart, a CSV export can be created to send to the researcher.

First, you must select which types of records you’d like to export from this list:

Then, the next cell will print the relevant dataframe, with an option below to Save Dataframe. Simply type the intended filename and filetype into the File or Dataframe Name box (ex. missing_vrls_collectioncode.csv) and press Save Dataframe. The file should now be available in your ipython-utilities folder for dissemination. Please track this information in a new GitLab ticket.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME check for missing data records ("detections-3c" notebook)

events - 3 - create download records

This notebook will promote the events records from the intermediate events table to the final moorings records. Only use this notebook after adding the receiver records to the moorings table, as this process is dependent on receiver records.

Import cells and Database connections

As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.

The second cell will set your database connection. You will have to edit one section: engine = get_engine()

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node

User Inputs

Information regarding the tables we want to check against is required. Please edit schema = 'collectioncode' to include the relevant project code, in lowercase, between the quotes.

Once you have edited the value, you can run the cell.

Detecting Download Records

The next cell will scan the events table looking for data download events, and attempt to match them to their corresponding receiver deployment.

You should see output like this:

Found XXX download records to add to the moorings table

The next cell will print out all the identified download records, in a dataframe for you to view.

Loading Download Records

Before moving on from this, you will need to confirm two things:

1) Confirm that NO Push is currently ongoing.

2) Confirm that the rcvr_locations for this schema have been verified.

If a Push is ongoing, or if verification has not yet occurred, you must wait for it to be completed before processing beyond this point.

If everything is OK, you can run the cell. The notebook will indicate success with a message like:

Added XXX records to the moorings table

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME load download records ("events-3" notebook)

Verify Download Records

This cell will have useful information:

The notebook will indicate the table has passed verification by the presence of ✔️green checkmarks.

If there are any errors, contact OTN for next steps.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME verify download records ("events-3" notebook)

events-4 - process receiver configuration

This notebook will process the receiver configurations (such as MAP code) from the events table and load them into the schema’s receiver_config table. This is a new initiative by OTN to document and store this information, to provide better feedback to researchers regarding the detectability of their tag-programming through time and space.

Import cells and Database connections

As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.

The second cell will set your database connection. You will have to edit one section: engine = get_engine()

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node

User Inputs

Information regarding the tables we want to check against is required. Please edit schema = 'collectioncode' to include the relevant project code, in lowercase, between the quotes.

Once you have edited the value, you can run the cell.

Get Receiver Configuration

Using the receiver deployment records and the information found in the events table, this cell will identify important configuration information for each deployment. A dataframe will be displayed.

The following cell will extrapolate further to populate all the required columns from the receiver_config table. A dataframe will be displayed.

Load Configuration to Database

Finally, the notebook will insert the identified records into the receiver_config table. You should see the following success message, followed by a dataframe:

The following XX receiver configurations are new and have been inserted:

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME process receiver configuration ("events-4" notebook)

Final Steps

The remaining steps in the GitLab Checklist are completed outside the notebooks.

First: you should access the Repository folder in your browser and ensure the raw detections are posted in the Data and Metadata folder.

Finally, the Issue can be passed off to an OTN-analyst for final verification in the database.

Key Points

  • It’s important to handle errors when they come up, as they can have implications for detections

  • OTN finishes off detections Issues by running Matching and sensor tag processing


Supplementary notebooks

Overview

Teaching: 15 min
Exercises: 0 min
Questions
  • What other Nodebooks could be useful for me to know as a Node Manager?

Objectives
  • Learn about Nodebooks that are useful for Node Managers outside the core data loading notebooks

OTN maintains several additional Nodebooks that fall outside the core tag, deployment and detection tools. These may be useful to Node managers who also deal with these particular scenarios.

Check Environment

This notebook checks your system Python environment against our environment.yml. This is to see if the Python packages and libraries you have installed are in line with what is required to run the ipython Notebooks. This will assist you with updating your packages if they become out-of-date, or if OTN develops and publishes new tools which rely on new packages.
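The core idea can be sketched with the standard library alone: confirm that each module the Nodebooks rely on is actually importable. The module list below is illustrative only — the real notebook compares installed packages and versions against environment.yml:

```python
import importlib.util

# Illustrative requirement list -- the real check parses environment.yml.
required_modules = ["sqlite3", "json", "not_a_real_module_xyz"]

# find_spec returns None when a top-level module cannot be found.
missing = [m for m in required_modules if importlib.util.find_spec(m) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("Environment looks up to date.")
```

A version-aware check would additionally compare `importlib.metadata.version(pkg)` against the pinned versions in environment.yml.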

scientific_name_check

This notebook uses WoRMS to check animal common and scientific names. It is used to add new species names to the obis.scientificnames table for use in each project. The instructions for using this notebook are the same as the Adding Scientific Names section in the Create and Update Projects notebook.

Add Instrument Models to Database

This notebook is used to add an instrument model to the obis.instrument_models table.

insert_vendor_sheet

Used to load manufacturer specifications for tags or receivers into the vendor tables.

convert - Fathom (vdat) Export - VRL to CSV

Convert VRLs or VDATs into CSV files using command-line Fathom software. Can also be done using the Fathom app.
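When automating conversions in bulk, the command-line tool can be driven from Python via subprocess. The executable name and flags below are assumptions for illustration — consult your installed Fathom/vdat CLI's help output for the actual syntax:

```python
import subprocess

def vdat_convert_cmd(vrl_path, output_dir, vdat_exe="vdat"):
    """Build a hypothetical vdat conversion command (executable name and
    flags are assumptions -- check `vdat --help` on your installation)."""
    return [vdat_exe, "convert", "--format=csv.fathom",
            f"--output={output_dir}", str(vrl_path)]

cmd = vdat_convert_cmd("VR2W_123456_20230101_1.vrl", "csv_out")
# In real use, run it and fail loudly on errors:
# subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes the call easy to log and dry-run before touching real data.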

Active Tracking

Handles active tracking data - detections collected by using a VR100 hydrophone (or similar) during “mobile” tracking activities.

Slocum Telemetry Notebook

This notebook will process and load detection data collected by a slocum glider mission. Information required includes glider telemetry, glider metadata, and detection files.

Load Health Report

This notebook will load “health reports” collected by LiquidRobotics WaveGliders while remotely offloading VR4 receivers.

Telemetry Processing for Wave Gliders and Animals

This notebook will process and load detection data collected by a WaveGlider mission. Information required includes glider telemetry, glider metadata, and detection files.

Create and Update Contacts Notebook

This notebook can be used to add new contacts to a project and update existing contacts in the database. Note: you cannot change someone’s email address using this tool.

Active Tags and IUCN Status

Located in the vis subfolder. This creates a summary report of Tag Life, Tags, Detections, Stations. Tailored for OTN’s reporting requirements and data policy.

Generate Receiver Map

Located in the vis subfolder. This creates a map to show receivers from Nodes. Tailored for OTN’s reporting requirements and data policy.

Receiver Operator Report

Located in the vis subfolder. This notebook generates a summary report for receiver operators to describe what animals have been seen and when. Tailored for OTN’s reporting requirements and data policy.

DB-Fix Notebooks

These are a series of notebooks for fixing common issues found in the database. These notebooks are beyond the scope of the current training but eventually Data Managers who wish to learn more will be able to take further training. In the meantime, if you see notes in the notebooks such as “Use the DB Fix notebook called XXXX to correct this error”, please contact OTN for assistance.

Key Points

  • ipython-utilities contains many notebooks beyond the core data-loading tools that are useful to Node Managers


Data Push

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • What is a Data Push?

  • What do I need to do as a Node Manager for a Data Push?

  • What are the end products of a Data Push?

Objectives
  • Understand a basic overview of a Data Push

  • Understand what a Node Manager’s responsibilities are during a Data Push

  • Understand the format of Detection Extracts

What is a Data Push?

A Data Push is when the OTN data system is re-verified and any new relevant information is sent to researchers. Loading of new data is paused so that what's in the system can be reliably verified; this way, any issues found can be fixed and the data can be in the best form possible given the information available at that moment. Once verification is done, detections are matched across Nodes and Detection Extracts are sent out to researchers. This is also when summary schemas such as discovery, erddap, and geoserver are updated with the newly verified data.

What is the Push Schedule?

Push events happen three times a year. They start on the third Thursday of the “push month”: February, June, or October. This date is the cut-off for all data-loading: no records can be loaded after it. Please aim to have all tickets ready for verification one week before this date.
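As a quick sanity check, the cut-off rule above can be computed for any year with the standard library:

```python
import calendar
from datetime import date

def push_cutoff(year, month):
    """Return the third Thursday of a push month (February, June, or October)."""
    thursdays = [d for d in calendar.Calendar().itermonthdates(year, month)
                 if d.weekday() == calendar.THURSDAY and d.month == month]
    return thursdays[2]

# Data-loading cut-off dates for the three push months of a given year.
cutoffs = [push_cutoff(2023, m) for m in (2, 6, 10)]
```

Note the `d.month == month` filter: `itermonthdates` pads with days from neighbouring months, which would otherwise skew the count.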

With the increased number of Nodes joining the Pushes, we are announcing the schedule for the next year. Please prepare in advance and mark your calendars.

Push schedule through 2023:

Node Manager Roles During a Push

Node Managers have two main jobs during a Push:

  1. The first job is to get the Node’s data loaded in time for the cut-off date. Data will be submitted by researchers on a continuous basis, but will likely increase just before a cut-off date. We recommend loading data as it arrives, to attempt to prevent a backlog near the Push date.
  2. The second job for Node Managers is to create and send out Detection Extracts when they are ready to be made. This will be done using the detections - create detection extracts notebook.

Once the cut-off date has passed, Node Managers are “off duty”! When it's time for Detection Extracts to be created and disseminated, that task will be assigned to the Node Managers, but this does not signify the end of the Push: several more “behind the scenes” steps are required.

Please refrain from interacting with the Node Database until OTN staff have announced the Push has ended and data may be loaded again.

Detection Extracts

Detection Extracts are the main output of the Push. They contain all the new detection matches for each project. There are multiple types of detection extracts OTN creates:

Detection Extract files are formatted for direct ingestion by analysis packages such as glatos and resonATe.

Detections - Create Detection Extracts Notebook

During the Push process, any new detection matches that are made are noted in the obis.detection_extracts_list table of your Node. These entries will have several pieces of useful information:

Using these fields, the detections-create detection extracts notebook can determine which extracts need to be created for each push.

Imports cell

This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.

There are no values here which need to be edited.

User Inputs Database Connection

  1. outputdir = 'C:/Users/path/to/detection extracts/folder/'
    • Within the quotes, please paste a filepath to the folder in which you’d like to save all the Detection Extracts.
  2. engine = get_engine()
    • Within the parentheses, open quotation marks and paste the path to your database .kdbx file, which contains your login credentials.
    • On macOS, you can usually copy the path to your .kdbx file by right-clicking on it while holding down the “option” key. On Windows, we recommend the installed software Path Copy Copy, which lets you copy a unix-style path by right-clicking.
    • The path should look like engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx').

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.your.org Database:your_db User:node_admin Node: Node
Testing dblink connections:
	saf-on-fact: DBLink established on user@fact.secoora.org:5002 - Node: FACT
	saf-on-migramar: DBLink established on user@db.load.oceantrack.org:5432 - Node: MIGRAMAR
	saf-on-nep: DBLink established on user@db.load.oceantrack.org:5432 - Node: NEP
	saf-on-otn: DBLink established on user@db.load.oceantrack.org:5432 - Node: OTN
	saf-on-act: DBLink established on user@matos.asascience.com:5432 - Node: ACT
	saf-on-pirat: DBLink established on user@161.35.98.36:5432 - Node: PIRAT
	saf-on-path: DBLink established on user@fishdb.wfcb.ucdavis.edu:5432 - Node: PATH

You may note that there are multiple DB links required here: this is so that you will be able to include detection matches from all the Nodes. If your kdbx file doesn’t include any of your DB link accounts, reach out to OTN to help set it up for you.

Detection Extract Selection

There are two options for selecting which Detection Extracts to create:

  1. There is a manual entry cell. Here you can paste a list of extracts in this format (one per line):
    • project code (capitals), year, type
  2. There is a cell to query the obis.detection_extracts_list table. This is the preferred method.
    • enter the current Push date like push_date = 'YYYY-MM-DD'

Once you have a list of the Detection Extracts to create, you can move on. The next cell will create a list of all the extracts that were just created, which you can use for your own records. It will save in your ipython-utilities folder.
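Option 2 boils down to a date-filtered query against the extracts list. An illustrative sketch follows — the real table is obis.detection_extracts_list on your Node, and the column names here are assumptions, with an in-memory SQLite database standing in for your Node connection:

```python
import sqlite3

# Stand-in for the Node database and its extracts-list table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detection_extracts_list "
             "(collectioncode TEXT, year INT, det_type TEXT, push_date TEXT)")
conn.executemany(
    "INSERT INTO detection_extracts_list VALUES (?, ?, ?, ?)",
    [("HFX", 2023, "qualified", "2023-02-16"),
     ("NSBS", 2022, "sentinel", "2022-10-20")])

push_date = "2023-02-16"  # current Push date, YYYY-MM-DD
to_create = conn.execute(
    "SELECT collectioncode, year, det_type FROM detection_extracts_list "
    "WHERE push_date = ?", (push_date,)).fetchall()
```

Only the entries flagged with the current Push date are returned, which is why this is the preferred, less error-prone method over manual entry.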

Create Detection Extracts

This cell will begin creating the identified detection extracts, one by one. You will be able to see a summary of the matched projects for each extract. Please wait for them all to complete - indicated by a green checkmark and a summary of the time it took to complete the extract.

The following section applies only to Nodes that use Plone as their document management system

Uploading Extracts to Plone

First the notebook will print a list of all the extracts that need to be uploaded. It should match the list of those just created.

Next, you will need to connect to Plone using a .auth file. The format will be like this: plone_auth_path = r'C:/path/to/Plone.auth'. Success will be indicated with this message:

Plone authorization was successful.
Connected to 'https://members.oceantrack.org' as 'USER'

Now the notebook will upload all the Detection Extracts into their relevant folders on Plone.

Please wait for them all to complete - indicated by a green checkmark and a summary of the time it took to complete the extract.

Emailing Researchers - Plone

Using the Plone users system, it's possible to identify which researchers require an email notification. First you need to supply a .auth file for an email account. The format will be like this: email_auth_path = r'C:/path/to/email.auth'. Success will be indicated with this message:

Email auth is set:
 user= otndc@dal.ca
 host= smtp.office365.com
 cc= otndc@dal.ca
 port= XXX

Upon successful login, you will be able to print out your current email template. If it is not adequate, you can edit the template by changing the det_extracts_emailSpecial.j2 template in the templates subfolder of ipython-utilities, and changing the filepath to be email_template = 'templates/det_extracts_emailSpecial.j2', then re-running.

Finally, this stage will send the emails. Ensure that date = 'YYYY-MM-DD' for the date you uploaded the extracts to Plone. This is how the notebook will determine which links to include in the email template. First: set send_mail = False. Run the cell, select the projects of interest and Simulate Sending Emails. If you are pleased with the output, you can then change send_mail = True and re-run. Choose Send Emails and they will be sent.
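The send_mail toggle follows a common dry-run pattern: build every message either way, but only hand them to the mail server when the flag is True. A simplified sketch of that pattern (the contact fields, subject, and body are illustrative, not the notebook's actual template):

```python
from email.message import EmailMessage

def build_notifications(contacts, send_mail=False):
    """Build messages; only a send_mail=True run would actually dispatch them."""
    to_send, simulated = [], []
    for name, address in contacts:
        msg = EmailMessage()
        msg["To"] = address
        msg["Subject"] = "New Detection Extracts available"
        msg.set_content(f"Hello {name}, your Detection Extracts have been uploaded.")
        # Route each message by the flag instead of branching the whole loop,
        # so the simulated run exercises exactly the same message-building code.
        (to_send if send_mail else simulated).append(msg)
    # A real run would open an SMTP connection and send everything in to_send.
    return to_send, simulated

to_send, simulated = build_notifications([("A. Researcher", "a.researcher@example.org")])
```

Because the simulated pass builds the identical messages, reviewing its output is a faithful preview of what `send_mail = True` will actually send.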

Emailing Researchers - Manual

If you are not using the Plone system for sending emails, you can use the manual email tool.

You will first enter the collectioncode which has the new Detection Extract file requiring notification: cntct_schema = 'schema'. Please edit to include the project code, in lowercase, between the quotation marks.

An interactive menu will then appear, listing all the contacts associated with the project. You can select each of those you’d like to email.

The following cell will print out the resulting contact list.

Next, you will be able to review the email template. Required input includes:

If the email preview is acceptable, you may run the final cell in this section which will send the emails.

Update Extract Table

Once all extracts are made, uploaded to your file management system and emails have been sent to researchers, the final step is to ensure we mark in the obis.detection_extracts_list table that we have completed these tasks.

Please enter current_push_date = 'yyyy-mm-dd': the date of the Push for which these extracts were made.

Then, an interactive dataframe will appear, allowing you to check off as completed the extracts you were able to successfully create.
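Behind the checkboxes, this bookkeeping step reduces to an UPDATE flagging the extracts you created for the current Push. An illustrative sketch (column names are assumptions, with SQLite standing in for your Node database):

```python
import sqlite3

# Stand-in extracts-list table with a completion flag.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detection_extracts_list "
             "(collectioncode TEXT, push_date TEXT, completed INTEGER)")
conn.execute("INSERT INTO detection_extracts_list VALUES ('HFX', '2023-02-16', 0)")

current_push_date = "2023-02-16"
created = ["HFX"]  # extracts you successfully created and distributed

# Mark only the extracts you checked off, scoped to the current Push date.
conn.executemany(
    "UPDATE detection_extracts_list SET completed = 1 "
    "WHERE push_date = ? AND collectioncode = ?",
    [(current_push_date, code) for code in created])

done = conn.execute(
    "SELECT completed FROM detection_extracts_list WHERE collectioncode = 'HFX'"
).fetchone()[0]
```

Scoping the update by both project code and Push date keeps earlier Pushes' records untouched.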

Now you’re done with Detection Extracts until next Push!

Key Points

  • A Data Push is when we verify all the data in the system, fix any issues, and then provide detection matches to researchers

  • As a Node Manager, it's your responsibility to get the data into the system so OTN can verify it and get it ready to be sent out

  • Detection Extracts are the main end product of the Push


Fixing Data Errors

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • How do I identify data errors?

  • How do I correct data errors?

Objectives
  • Understand the workflow for fixing data errors.

Identifying Errors

During the process of loading data it is possible to uncover errors with previously-loaded datasets. OTN is constantly improving our QA/QC tools, which means we are identifying and correcting more and more historical errors.

Generally, there are a few ways Node Managers will identify errors:

In the latter case, full comparison between the records is required, followed by a discussion with the researcher to identify if the previously-loaded records or the new records are correct. Often, the outcome is that the data in the DB needs correction.

Scoping the Required Correction

Once an error has been identified, the correction needs to be scoped. This includes: which tables are affected by the error, which catalognumbers, etc. Please use the DB Fix Issue GitLab template.

Here is the template for reference:

# **DB Fix Issue**

__This issue title should include schema name, year of record affected__
- eg "HFX 2020 receiver lat/long change"

__add DB fix label to issue__

## Related gitlab issue:
- eg #1234, [paste link to issue]

## Schema:
- eg HFX

## DB table name:
- eg HFX.moorings

### information needed for each record that needs fixing (repeat for each catalog number):
1. unique identifier:
1. original value:
1. new value:

- [ ] NAME - label issue `db fix`
- [ ] NAME - tag OTNDC staff for assistance with request (@diniangela etc)

Once fully scoped, you may assign the Issue to OTN Staff to complete the correction. If it is a simple process, they may provide instructions to the Node Manager to complete. Do not attempt to correct the issue yourself without consultation with OTN.

Database Fix Notebooks

Currently, the OTN Database Team is working on a suite of notebooks for fixing common issues found in the database. These are the tools that the OTN Team will be using to correct the identified errors, if the tool to do so already exists.

These notebooks are beyond the scope of the current training but eventually Data Managers who wish to learn more will be able to take further training. In the meantime, if you see notes in the notebooks such as “Use the DB Fix notebook called XXXX to correct this error”, please create a DB Fix Issue ticket and pass it off to OTN.

Key Points

  • Errors are corrected via scoped DB Fix Issues; OTN is developing tools to support this.


Upholding the Data Policy

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • How do I handle data requests?

Objectives
  • Understand how to uphold the Network’s Data Policy.

As a Node Data Manager, you will have the key responsibility of upholding your Network's Data Privacy Policy. You have access to private datasets that not even your direct supervisors can see.

For this reason, you need to be intimately familiar with the types of data which are under embargo/private, and what is publicly available for your Network. You will need to develop an internal protocol for when requests for data access are submitted, to ensure appropriate care is taken to protect the integrity of your members’ data.

OTN recommends creating a “Data Request Response Policy”.

External Requests

This is an example of how OTN handles these requests:

  1. Request for data (from person other than data owner) submitted
  2. Data request is scoped, in a GitLab Issue. All details from requester is included.
  3. Impacted PIs are identified, and contacted, seeking written permission for requester to access the information.
  4. Written permission is documented in GitLab Issue, to preserve the paper-trail.
  5. Data request report is compiled and provided to requester, once all permissions have been received.

Internal Requests

Internal requests, from other Network staff, need to be handled in a similar way. Although all Network staff should be familiar with the Data Policy, this may not always be the case. It is the Node Manager's responsibility to ensure that information from the database is not shared outside of internal reports. Track requests from all sources (for OTN, this includes even the OTN Glider and Field teams), so you have a record of who knows what and can officially remind them of your policies.

Partner-Node Data Policies

It’s important to remember that the Data Policy of the Network which holds the tag records will apply to the detections of those tags, in other Nodes. You cannot share information/detections of tags from outside your Node, without first consulting with your partner’s Data Policy. This will likely not be an issue for most Nodes, but is a key consideration for OTN staff when creating data products.

For guidance on the interpretation of your Data Policy, or the Data Policy of a partner Node, OTN's Director of Data Operations is available to assist.

Key Points

  • Node Managers have a key responsibility to ensure the Data Policy is followed.


Nodebook Improvement and Development

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • How do I request new features?

  • How do I report bugs to OTN?

  • What is my role in Nodebook development?

Objectives
  • Understand the process for developing the Nodebooks

  • Understand the process for testing new Nodebook changes

Once you begin using the OTN Nodebooks, you will likely discover tools that are missing, or features that have bugs. OTN developers are here to help ensure the Nodebooks meet your needs. We are constantly refining our processes and are always open to suggestions for improvement! These tools are for you, and we want to ensure they are useful.

New Features

If there is a feature that you’d like to see, you can bring this to OTN’s attention in this way:

  1. Ask in our Slack channels to see if there is a tool that already exists that will fit your needs. Discuss your ideal-feature with OTN developers to help them understand what you need.
  2. Once the new feature is properly scoped, create a new GitLab Issue here https://gitlab.oceantrack.org/otn-partner-nodes/ipython-utilities/-/issues, using the new_feature template. You should assign to one of the OTN developers, who can begin to work on the request.

Here is the “new_feature” template, for your information:

            Note that this template is **ONLY** used for tracking new features. Bugs should be filed separately using the Bug template.

            The new feature should be proposed, scoped, and approved **before** you fill out this template.

            ## Summary
            **Provide a general summary of the feature to be built.**

            ## Rationale
            **Why do we want this?**

            ## End Users
            **Who is expected to use this functionality most?**

            ## Scope
            **What are the expected qualities and functionality of this new feature? One per line.**

            ## Minimum viable product
            **What subset of the above functionality is necessary before the feature is considered finished?**

            ## Timeline
            **Give a brief, ballpark estimate on when you would want this done.**

Bug-fixes

If you encounter an error in your Nodebooks, it's possible there is an issue with your dataset. Sometimes, however, the Nodebook itself is not functioning as expected! If you believe a certain Nodebook is malfunctioning, you will want to report this bug to OTN developers as soon as possible.

To report a bug, here are the steps to take:

  1. Ask in our Slack channels to see if the error is caused by your dataset. This can include posting an error message, or just describing the output from the Nodebook, and why it is not as expected.
  2. If OTN developers identify that the problem is not your dataset, the next step will be to create a GitLab Issue here https://gitlab.oceantrack.org/otn-partner-nodes/ipython-utilities/-/issues, using the bug template. You should assign to one of the OTN developers, and use the label bugfix.

Here is the “bug” template, for your information:

            Note that this template is **ONLY** used for reporting bugs. New features must be vetted and filed separately using the New Feature template.

            As such, before you continue, ask yourself:
            **Does this ticket concern existing functionality that is broken? Or is it a deficiency in the system as-designed that needs net new work to complete?**
            If it's the former, then use this template, otherwise, use the New Feature template.

            ## Summary
            **Provide a general summary of the problem.**

            ## Expected Behavior
            **What should be happening?**

            ## Current Behavior
            **What is actually happening?**

            ## Steps to Reproduce
            **How can I see this bug happening? Where does it happen? Provide links to specific pages where the bug is visible, along with steps to reproduce it.**

            ## Priority
            **On a scale from 1 to 5, approximate the severity of the issue. Be honest.**

Testing Features/Fixes

Once OTN developers have attempted to build your new feature, or fix your bug, they will publish their code into the integration branch of our ipython-utilities Nodebooks. You will be asked to test these changes, to ensure the end-product is satisfactory. This branch is used for development and testing only, and should not be used for data-loading outside of testing.

Here are instructions for changing branches:

(Replace anything in <> with the sensible value for your installation/situation)

Open git bash or terminal.

From your git bash window, type:

cd "/path/to/your/ipython-utilities"

This will put your terminal or git bash session in the folder where this codebase lives on your computer. Now we're going to discard any local changes, switch branches, and pull the latest code:

git reset --hard
git checkout <branch you are moving to ie:integration>
git pull

Resolve python libraries by using the mamba command:

mamba env update -n root -f environment.yml

You will then be able to test the new code. When you are finished, follow the same steps to switch back to the master branch to continue your work. The master branch contains all the production-quality tools.

Key Points

  • The Nodebooks are for you - OTN wants to ensure they remain useful