OTN Node Manager Training

Introduction to Nodes

Overview

Teaching: 30 min
Exercises: 0 min
Questions
  • What is an OTN-style Database Node?

  • What is expected from a Node Manager?

  • Who is this training for?

  • What is the schedule for the next several days?

Objectives
  • Understand the relationship between OTN and its Nodes.

  • Ensure attendees have required background information.

What is a Node?

OTN partners with regional acoustic telemetry networks around the world to enable detection-matching across our communities. An OTN Node is an exact copy of OTN’s acoustic telemetry database structure, which allows for direct cross-referencing between the data holdings of each regional telemetry sharing community. The list of OTN Nodes is available here: https://members.oceantrack.org. Data only needs to be reported to one Node in order for tags/detections to be matched across all.

How does a Node benefit its users?

OTN and affiliated networks provide automated cross-referencing of your detection data with other tags in the system to help resolve “mystery detections” and provide detection data to taggers in other regions. OTN Data Managers extensively quality-control submitted metadata to ensure the most accurate records possible are stored in the database. OTN’s database and Data Portal website are well suited for archiving datasets for future use and sharing with collaborators. The OTN system includes pathways to publish datasets with OBIS, and for sharing via open data portals such as ERDDAP and GeoServer. The data-product format returned by OTN is directly ingestible by analysis packages including glatos and resonATe. OTN offers continuous support for the use of these packages and tools.

Below is a presentation from current Node Managers, describing the relationship between OTN and its Nodes and the benefits of the Node system as a community outgrows more organic person-to-person sharing, as well as giving a realistic understanding of the work involved in hosting and maintaining a Node.

PowerPoint

Node Managers

To date, the greatest successes in organizing telemetry communities have come from identifying and working with local, on-the-ground Node Managers for each affiliated Node. These trusted and connected ‘data wranglers’ have been essential to building and maintaining the culture and sense of trust in each telemetry group.

In order to be successful as a Node Manager in your region, here are a few tips and guidelines:

  1. Ensure you set aside time each week to wear your ‘Node Manager hat’.
  2. Familiarize yourself with the metadata reporting templates, and carefully follow the formats required for each variable.
  3. Ensure you have continuous communication with the OTN data team so you are able to align your detection matching with all other Nodes, and keep your local toolbox up to date.
  4. Ensure you have consistent communication with your local telemetry community - knowing who’s who is important for relationship building.
  5. Be willing to learn and ask questions - we are always trying to improve our tools and processes!

No previous coding or data management experience is required! Anyone who is willing to put in the work can become a successful Node Manager. Being involved in the telemetry community as a researcher (or affiliate) is enough to get you started with ‘data wrangling’ in your region.

Node Training

Each year OTN hosts a training session for Node Managers. This session is not only for new Node Managers, but also a refresher for current Node Managers on our updated tools and processes.

This is a hands-on course: participants will use the tools to practice loading telemetry data with us, using a Training Node we have built for this purpose. This means you will need to install all required software and devote your full attention for the next several days.

Here are the general topics that will be covered:

If you do not intend on learning how to load data to an OTN-style Node (and would prefer to be a spectator) please let us know, so we can identify who our hands-on learners will be.

A great resource for Node Managers as they get started will be OTN’s FAQ page - https://members.oceantrack.org/faq. Your local telemetry community will likely have many questions about the Node and how it works, and the FAQs can help answer some of these questions.

Key Points

  • Your Node is fully compatible with all others like it.

  • A well-connected Node Manager is essential.

  • OTN staff are always available to support Node Managers.


OTN System, Structure and Outputs

Overview

Teaching: 25 min
Exercises: 0 min
Questions
  • What does an OTN-style Database look like?

  • What is the general path data takes through the OTN data system?

  • What does the OTN data system output?

Objectives
  • Understand the OTN Database structure on a high level

  • Understand the general path of data through the OTN system

  • Understand what the OTN system can output

OTN Data System

The OTN data system is an aggregator of telemetry data made up of interconnected Node Databases and data processing tools. These work together to connect researchers with relevant and reliable data. At the heart of this system are Nodes and their OTN-style Databases.

Affiliated acoustic telemetry partner Networks may become an OTN Node by deploying their own database that follows the same structure as all the others. This structure allows Nodes to use OTN’s data loading processes, produce OTN data products, and match detections across other Nodes.

Basic Structure

The basic structural decision at the centre of an OTN-style Database is that each of a Node’s projects is given its own database schema. These schemas contain only the tables and data relevant to that project. The tables included in each schema are created and updated based on which types of data each project is reporting.

Projects can have the type tracker, deployment, or data.

In addition to the project-specific schemas, there are some important common schemas in the Database that Node Managers will interact with. These additional schemas include the obis, erddap, geoserver, vendor, and discovery schemas. These schemas are found across all Nodes and are used to create important end-products and for processing.

The amount of information shared through the discovery tables can be adjusted based on sharing and reporting requirements for each Node.

OTN Database - path of data through the system

The Path of Data

The OTN data system takes 4 types of data/metadata: project, tag, instrument deployments, and detections. Most data follows a similar flow through the OTN system, even though each type has different notebooks and processes for loading. The exception is project metadata, which takes a different path because it is completely user-defined and must be used to initially define and create a project’s schema.

OTN Database - structural ER diagram

Project Data

Project data follows a different workflow from the other input data and metadata that flows into an OTN Node: it is generally the first information received about a project, and it is used to create the project’s new schema in the Database. The type of project selected (tracker, deployment, or data) will determine the format of the tables in the newly created schema. The type of project will also impact the loading tools and processes that will be used later on. The general journey of project data is:

Tag, Deployment and Detections Data

Even though tag, deployment, and detections data all have their own loading tools and processes, their general path through the database is the same.

OTN Data Products

The OTN Database has specific data products available, based upon the clean processed data, for researchers to use for their scientific analysis.

In order to create meaningful Detection Extracts, OTN and affiliated Nodes only perform cross-matching events every 4 months (when a reasonable amount of new data has been processed). This event is called a synchronous Data Push. In a Data Push:

Summary Map

Backing Up Your Data

As with any database, it is important to make sure the data held by the OTN Database Node is protected. Your Database, and potentially the files contributed to your Node by the research community you support, should be backed up properly in case your primary Database crashes, is corrupted, is lost, or suffers any other unexpected event.

You should discuss and coordinate a backup strategy with the groups or institutions responsible for your Database’s administration, and find out their policies and practices for performing backups. Backup strategies vary from group to group, but it is a good idea to make sure they are backing up your data daily, if not multiple times a day, and keeping copies of backups in different physical locations in case something happens at any single location.
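If your group manages its own PostgreSQL server, one common approach is a scheduled, dated dump. The sketch below is purely illustrative and not an OTN-provided tool: it builds a pg_dump command line, and the host, user, and directory names are placeholders to adapt to your own setup.

```python
from datetime import date

def pg_dump_command(dbname, backup_dir, host="localhost", user="node_admin"):
    """Build a pg_dump invocation producing a dated, compressed custom-format dump.

    The host/user defaults are hypothetical placeholders -- substitute your own.
    """
    outfile = f"{backup_dir}/{dbname}_{date.today().isoformat()}.dump"
    return [
        "pg_dump",
        "-h", host,
        "-U", user,
        "-Fc",          # custom (compressed) format, restorable with pg_restore
        "-f", outfile,
        dbname,
    ]

# Example: run on a schedule (cron, Task Scheduler, etc.)
# import subprocess
# subprocess.run(pg_dump_command("node_training", "/backups"), check=True)
```

A command built this way can be dropped into whatever scheduler your institution already uses; your database administrators may well have an equivalent process in place already.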

OTN is happy to offer guidance and recommendations to any group.

Key Points

  • All OTN-style Databases have the same structure

  • Databases are divided into project schemas which get certain tables based on the type of data they collect

  • Data in the OTN system moves from the raw tables to the intermediate tables to the upper tables before aggregation


Setup and Installing Needed Software

Overview

Teaching: 10 min
Exercises: 50 min
Questions
  • What software does a Node Manager need?

  • Why do I need the recommended software?

  • How do I install the required software?

Objectives
  • Understand how to install required software and prepare to load data

  • Ensure attendees have the required software installed and are ready to use it

In order to work efficiently as a Node Manager, the following programs are necessary and/or useful.

To standardize the verification and quality control process that all contributed data is subjected to, OTN has built custom quality control workflows and tools for Node Managers, often referred to as the OTN Nodebooks. The underlying functions are written in Python, and the workflows that rely on them are run through Jupyter Notebooks. In order to use these tools and interact with your database, you will need to install a few different applications and packages. All installation instructions are also available on our GitLab here.

This lesson will give attendees a chance to install all the relevant software, under the supervision of OTN staff.

Python/Mamba

Python is a popular general-purpose programming language that can be used for a wide variety of applications. It is the main language used by OTN and our data processing pipeline.

Mamba is a fast, cross-platform Python distribution and package manager. When you install Mamba (through Miniforge) you get a Python interpreter and many of the core Python libraries. Managing your Python installation with Mamba allows you to install and update all the packages needed to run the Nodebooks with one command, rather than having to install each one individually.

Install Miniforge3 - https://conda-forge.org/miniforge/

Git

Git is a version-control system: it helps people work collaboratively and maintains a complete history of all changes made to a project. We use Git at OTN to track changes to the Nodebooks made by our developer team, and occasionally you will need to update your Nodebooks to include those changes.

Install Git

Nodebooks - iPython Utilities

The ipython-utilities project contains the collection of Nodebooks used to load data into the OTN data system.

Create an Account

First, you will need a GitLab account. Please fill out this signup form for an account on GitLab.

Then, OTN staff will give you access to the relevant Projects containing the code we will use.

Install iPython Utilities

  1. Determine the folder in which you wish to keep the iPython Utilities Nodebooks.
  2. Open your terminal or command prompt app.
    • Type cd then space.
    • You then need the filepath to the folder in which you wish to keep the iPython Utilities Nodebooks. You can either drag the folder into the terminal or command prompt app, or hold Shift (Windows) / Option (Mac) while right-clicking the folder and select Copy as Path from the menu.
    • Then paste the filepath into the terminal or command prompt and press Enter.
    • In summary, you should type cd /path/to/desired/folder before pressing enter.
  3. Create and activate the “nodebook” python environment. The creation process will only need to happen once.
    • In your terminal, run the command conda create -n nodebook python=3.9
    • Activate the nodebook environment using conda activate nodebook
  4. You are now able to run commands in that folder. Now run: git clone https://gitlab.oceantrack.org/otn-partner-nodes/ipython-utilities.git. This will get the latest version of iPython Utilities from our GitLab.
  5. Navigate to the ipython-utilities subdirectory that was created by running cd ipython-utilities.
  6. Now install all required python packages by running the following: mamba env update -n nodebook -f environment.yml

To open and use the OTN Nodebooks:

More operating system-specific instructions and troubleshooting tips can be found at: https://gitlab.oceantrack.org/otn-partner-nodes/ipython-utilities/-/wikis/New-Install-of-Ipython-Utilities

OTN Nodebooks - home page

Database Console Viewer

Database administration applications assist with interacting directly with your database. There are many options available, but DBeaver and DataGrip are the most popular at OTN.

In the next lesson we will practice using our database console viewer and connecting to our node_training database.

More Useful Programs

In addition to the tools above, the following programs can make a Node Manager’s day-to-day work easier.

For WINDOWS users

Path Copy Copy - For copying path links from your file browser.

Notepad++ - For reading and editing code, csv files etc. without altering the formatting.

Tortoise Git - For managing git, avoiding command line.

For MAC users

VS Code - For reading and editing code, csv files etc. without altering the formatting.

Source Tree - For managing git, avoiding command line.

Node Training Datasets

We have created test datasets to use for this workshop. Each attendee has their own files, available at this link: https://members.oceantrack.org/data/repository/fntp/nmt-files/

Please find the folder with your name and download. Save these somewhere on your computer, and UNZIP all files.

Key Points

  • Node Manager tasks involve the use of many different programs

  • OTN staff are always available to help with installation of these programs or any issues

  • There are many programs and tools to help Node Managers


Data Loading Workflow

Overview

Teaching: 30 min
Exercises: 15 min
Questions
  • How does a Node Manager receive data?

  • How does a Node Manager track their To-Do list?

  • How can a Node Manager interact with their database directly?

Objectives
  • Understand the data-loading workflow

  • Understand how to create and use GitLab Issues

  • Understand how to access and query your database tables

  • Understand how to use the AUTH - Create and Update notebook to maintain your database credentials file

Data Managers receive data from a researcher and then begin a several-step process of QA/QC and matching of data.

  1. Records are received, and immediately a GitLab Issue is created.
  2. Data are QA/QC’d using the OTN Nodebook tools (covered in detail later in the curriculum), and all progress is tracked in GitLab. Feedback between Data Manager and researchers happens at this stage, until data is clean and all GitLab tasks are completed.
  3. The successful processing of records can be evaluated by checking the database tables using DBeaver, and SQL queries.

Researcher data submission

There are many ways to receive data from researchers in your community/group. Find and make official the way that works for your community, and ensure that it becomes standard practice for reporting to your Node.

File management website

The most common way to receive data and metadata from a researcher is through some type of file management website. This will require either an email notification system for the Node Manager or constant checking to look for new submissions.

OTN-managed Nodes can always use the same Plone file management portal software that OTN itself uses to create and maintain private-access data repository folders into which researchers can deposit their data and metadata. These private folders also serve as the location where Detection Extracts are distributed to users, when available.

The FACT Network currently uses a custom instance of Research Workspace for the same purpose.

The ACT and GLATOS Networks use a custom data-submission form managed through their networks’ web sites.

It’s common for groups of researchers to use Dropbox, Google Drive, or something similar to share data/metadata while the Network is still small. This can be a great, accessible option, but the caveat is that it is much more difficult to control access to each individual folder in accordance with the Data Policy, and it may be difficult to determine when new data has been submitted.

Email-only submission

Generally, each Node Manager has an email address for communicating with their network’s data submitters (ex: Data @ TheFACTNetwork . org). This is a great way to ensure all Node-related emails are contained in the same account when there are multiple Node Managers, or when a new Node Manager takes over. With proper email management, this can be a very successful way to have Node users submit their data/metadata to your Node.

It is not recommended to use a personal email account for this, since all the files and history of the project’s data submissions will be lost if that Manager ever moves away from the role. If the account is hosted at an institution, it may be advisable to submit requests to raise institutional limits on things like email storage in advance.

Documenting data submission

Using one of the suggested means above, a user has submitted data and metadata to the Node Manager. Now what?

OTN uses GitLab Issues with templates of task-lists to ensure we NEVER forget a step in data loading, and that no file is ever lost/forgotten in an inbox.

Immediately upon receipt of a data file, you are advised to log in to OTN’s GitLab (https://gitlab.oceantrack.org) and navigate to the project for your Node. You may want to bookmark this webpage!

Once on the GitLab project page, you should navigate to the Issues menu option, on the left side. Think of your GitLab issues as your running “TODO List”! You will want to create a new Issue for each piece of data that is submitted.

NOTE: GitLab Issues are often referred to as “tickets”

Creating GitLab issues

By choosing the New Issue button in the top-right of your screen, you will be taken to a new, blank, issue form. To fill out the fields you will need to do the following:

Now, with all information completed, you can select Create Issue.

Using GitLab to track progress

As you approach the deadline for data-loading, before a data PUSH, you should begin to work on your Issues which fall under that Milestone. When you open an issue, you will be able to see the remaining tasks to properly load/process that data along with the name of the OTN Nodebook you should use to complete each task.

Keep GitLab open in your browser as you work through the relevant Nodebooks. You should check off tasks as you complete them, and add any comments you have at the bottom of the ticket. Comments can include error messages from the Nodebook, questions you have for the researcher, any re-formatting required, etc. At any time you can change the Labels on the issue, to help you remember the issue’s status at a glance.

Once you are done for the day, you’ll be able to come back and see exactly where you left off, thanks to the checklist!

You can tag anyone from the OTN Data Team in your GitLab issue (using the @NAME syntax). We will be notified via email to come and check out the Issue and answer any questions that have been commented.

Once you have completed all the tasks in the template, you can edit the Assignee value in the top-right corner and assign the issue to someone from OTN’s Database team (currently, Angela or Yinghuan). They will complete the final verification of the data, and close the issue when finished. At this time, you can apply the Verify label, or something similar, to help visually “mark it off” your issue list on the main page.

GitLab practice

At this time we will take a moment to practice making GitLab Issues, and explore other pages on our GitLab such as Milestones, Repository, Snippets, and Wiki.

Database access

As part of the OTN workflow, once we have used the OTN Nodebooks, it may be prudent to use a database client like DBeaver to view the contents of your database Node directly, and be sure the data was indeed loaded as expected.

DBeaver is an open-source application for interacting directly with databases. There are lots of built-in tools for quick query-writing, and data exploration. We will assume that workshop attendees are novices in using this application.

Connecting to your database

For this training we will connect to a Node Training test database, as practice. Once you open DBeaver, you will need to click on the Database menu item, and choose New Database Connection. A popup will appear, and you will choose the PostgreSQL logo (the elephant), then click Next. Using the .auth file provided to you by OTNDC, you will complete the following fields:

Next you choose Test Connection and see if it passes the tests. If so, you can choose Finish and you’re now connected to your database!

On the left side you should now see a Database Navigator tab, with a list of all your active database connections. You can use the drop-down menu to explore all the schemas (i.e., collections) stored in your database. You can even view each individual table, to confirm the creation steps in the Nodebooks were successful!

Writing a query in DBeaver

If you wish to write an SQL query to see a specific portion of your already-loaded data, you should first open a new SQL console. Choose SQL Editor from the top menu, then New SQL Script. A blank form should appear.

While writing SQL is out of the scope of this course, there are many great SQL resources available online. The general premise involves creating conditional select statements to specify the data you’re interested in. ex: select * from hfx.rcvr_locations where rcv_serial_no = '12345'; will select all records from the HFX schema’s rcvr_locations table, where the serial number is 12345.

To run a query, ensure your cursor is on the line you want to run, then either 1) right-click and choose Execute, or 2) press CTRL-ENTER (CMD-ENTER on Mac). The results of your query will be displayed in the window below the SQL console.

OTN is here to support you as you begin to experiment with SQL queries, and the OTN database structure, and can help you build a library of helpful custom queries that you may want or need.
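If you find yourself running the same lookups over and over, small helpers can reduce typos. The sketch below builds the example receiver-locations query shown earlier for any project schema and serial number; the function name is hypothetical and this is not part of the OTN toolset. In application code you would let your database driver bind parameters instead of formatting strings.

```python
def rcvr_locations_query(schema: str, serial_no: str) -> str:
    """Build the example receiver-locations query for a given project schema.

    For ad-hoc console exploration only -- in real code, prefer your driver's
    parameter binding over string formatting.
    """
    # Guard against obviously malformed schema names (spaces, semicolons, etc.)
    if not schema.replace("_", "").isalnum():
        raise ValueError(f"suspicious schema name: {schema!r}")
    return (
        f"select * from {schema}.rcvr_locations "
        f"where rcv_serial_no = '{serial_no}';"
    )

print(rcvr_locations_query("hfx", "12345"))
# select * from hfx.rcvr_locations where rcv_serial_no = '12345';
```

You could keep a small file of helpers like this alongside your other queries, building up the personal query library mentioned above.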

Database Practice

Let’s take a moment to explore some of the tables in the Node Training database, and write some example SQL queries.

Connecting to your Database from the Nodebooks

Now that we have explored and set up some of the tools needed to work as a Node Manager, we can begin preparing our Nodebook connection files. To enhance security, OTN uses encrypted, password-protected .kdbx files to store login credentials for your database. To create these, we have developed an interactive AUTH - Create and Update Nodebook.

  1. Open the OTN Nodebooks
    • MAC: Open your terminal, and navigate to your ipython-utilities directory, using cd /path/to/ipython-utilities. Then, run the command: jupyter notebook --config="nb_config.py" "0. Home.ipynb" to open the Nodebooks
    • WINDOWS: Double-click the start-jupyter.bat file in your ipython-utilities folder, which will open the Nodebooks.
  2. Your Nodebooks should open in a new browser window, showing the Home page.
  3. Open the AUTH - Create and Update notebook

Imports cell

This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.

Path to file

This cell needs to be edited. Between the quotes you will type the filepath to the .kdbx file you would like to create, or one which already exists that you would like to edit. The format should look like:

file_location = 'C:/Users/path/to/node_auth.kdbx'

Run this cell to save the input.
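If you want a quick sanity check on the filepath before running the cell, a small helper like the following can flag common mistakes. This is purely illustrative and not part of the Nodebooks; the check_kdbx_path name is made up for this example.

```python
from pathlib import Path

def check_kdbx_path(file_location: str) -> list:
    """Return a list of likely problems with a .kdbx filepath (empty list = looks fine).

    An illustrative pre-check only -- the notebook will also report errors itself.
    """
    problems = []
    path = Path(file_location)
    if path.suffix.lower() != ".kdbx":
        problems.append("filename should end in .kdbx")
    if "\\" in file_location:
        problems.append("use forward slashes, e.g. C:/Users/... , even on Windows")
    if not path.parent.exists():
        problems.append(f"parent folder does not exist: {path.parent}")
    return problems

# Example usage with the filepath you plan to paste into the notebook cell:
# check_kdbx_path('C:/Users/path/to/node_auth.kdbx')
```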

Create Password

Run this cell. You will be prompted to create a password for the file (if it is a new file) or to enter the existing password if you are accessing an existing file. Ensure that you remember this password, as you will be using it every time you connect to the database through the Nodebooks.

Create or Update Main Connections

Run this cell. This section will have an editable form. If it is a new file, all fields will be blank. If it is an existing file, the previously-entered information will display. You may now edit the information, pressing the blue button when you are finished to save your results.

In order to match detections across databases, you will need to establish DBLink connections to other OTN Nodes. This information can be stored in your .kdbx file, and will give you limited access to information required to match detections and create detection extracts.

Run this cell. This section will have an editable form. If it is a new file, all fields will be blank, and you can choose Add Connection. If it is an existing file, the previously-entered information will display, for each DBLink Connection you’ve specified. You may now edit the information, pressing the update or save button when you are finished to save your results.

Please contact OTN if you need any of the following information:

Once you have saved your new DBLink connection, you can create another. Continue until you have established connections to all remote Nodes. Currently, you will require DBLinks to 7 Nodes.

Test Connections

The next two cells will test the connection information you entered. Success messages will look like:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.your.org Database:your_db User:node_admin Node: Node

and also like:

Testing dblink connections:
	fact-link-to-Node1: DBLink established on user@db.load.oceantrack.org:5432 - Node: NODE1
	fact-link-to-Node2: DBLink established on user@db.load.oceantrack.org:5432 - Node: NODE2
	fact-link-to-Node3: DBLink established on user@db.load.oceantrack.org:5432 - Node: NODE3
	fact-link-to-Node4: DBLink established on user@db.load.oceantrack.org:5432 - Node: NODE4
	fact-link-to-Node5: DBLink established on user@db.load.oceantrack.org:5432 - Node: NODE5
	fact-link-to-Node6: DBLink established on user@db.load.oceantrack.org:5432 - Node: NODE6
	fact-link-to-Node7: DBLink established on user@db.load.oceantrack.org:5432 - Node: NODE7

You are now able to use the filepath to your .kdbx file to run all the Nodebooks.

Change KDBX password

If you have been passed a template .kdbx file from OTNDC with prefilled information, you should use this section to change the password to ensure your privacy is protected.

You will need to enter:

Press Save to change the password of your .kdbx. Ensure that you remember this password, as you will be using it every time you connect to the database through the Nodebooks.

Add a Git Access Token

This will be relevant for users of the DB Fix and Verification suite of Nodebooks only. If you have questions about this section please reach out to OTNDC.

Key Points

  • Node members cannot access the database directly; you are the liaison

  • Data submissions and QC processes should be trackable and archived

  • OTN is always here to help with any step of the process


Project Metadata

Overview

Teaching: 30 min
Exercises: 0 min
Questions
  • How do I register a new project in the Database?

Objectives
  • Understand how to complete the template

  • Understand how to use the Gitlab checklist

  • Learn how to use the Create and Update Projects notebook

  • Learn how to use the Create Plone folders and add users notebook

The first step when you are contacted by a researcher who wants to register their project with the Database is to request Project Metadata. For most Nodes, this is in the form of a plaintext .txt file, using the template provided here. This file allows the researcher to provide information on the core attributes of the project, including the scientific abstract, associated investigators, geospatial details, and temporal and taxonomic range.

Completed Metadata

Immediately upon receipt of the metadata, a new GitLab Issue should be created. Please use the Project Metadata Issue checklist template.

Here is the Issue checklist, for reference:

Project Metadata
- [ ] - NAME add label *'loading records'*
- [ ] - NAME define type of project  **select here one of Data, Deployment, Tracker**
- [ ] - NAME create schema and project records (`Creating and Updating project metadata` notebook)
- [ ] - NAME add project contact information (`Creating and Updating project metadata` notebook)
- [ ] - NAME add scientificnames (`Creating and Updating project metadata` notebook)
- [ ] - NAME verify all of above (`Creating and Updating project metadata` notebook)
- [ ] - NAME create new project repository users (`Create Plone Folders and Add Users` notebook)
- [ ] - NAME create project repository folder (`Create Plone Folders and Add Users` notebook)
- [ ] - NAME add project repository users to folder (`Create Plone Folders and Add Users` notebook)
- [ ] - NAME access project repository double-check project repository creation and user access
- [ ] - NAME add project metadata file to project folder (OTN members.oceantrack.org, FACT RW etc)
- [ ] - NAME send onboarding email to PIs
- [ ] - NAME label issue with *'Verify'*
- [ ] - NAME pass issue to OTN analyst for final verification
- [ ] - NAME verify project in database
- [ ] - NAME pass issue to OTN DAQ staff
- [ ] - NAME [OTN only] if this is a loan, update links for PMO
- [ ] - NAME [OTN only] manually identify if this is a loan, if so add record to otnunit.obis.loan_tracking (`Creating and Updating project metadata` notebook)

**project metadata txt file:**

Visual Inspection

Once the completed file is received from a researcher, the Data Manager should complete a visual check for formatting and accuracy.

Things to check:

  1. Is the PI-provided collection code unique/appropriate? Do you need to create one yourself? Existing schemas/collection codes can be seen in the database.
  2. Are there typos in the title or abstract?
  3. Are the contacts formatted correctly?
  4. Are the species formatted correctly?
  5. Does the location make sense based on the abstract, and is it formatted correctly (one per line)?

Formatting errors most commonly occur in the Contacts section, so pay close attention there.
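Some of these visual checks can be scripted. Below is an illustrative sketch (not part of the official OTN Nodebooks) that flags common problems in a single contact line, based on the format and role list described in the metadata template; the function name and the exact checks are assumptions made for this example.

```python
import re

# Roles accepted in the project metadata template (see the template below).
VALID_ROLES = {"principal investigator", "technician", "researcher",
               "student", "collaborator"}

def check_contact_line(line: str) -> list:
    """Return a list of formatting problems for one contact line (empty = looks fine).

    Expected shape: Name, Affiliation, Role, email[, point of contact][, ORCID]
    Illustrative helper only -- not an official QC tool.
    """
    problems = []
    fields = [f.strip() for f in line.split(",")]
    if len(fields) < 4:
        problems.append("expected at least 4 comma-separated fields")
        return problems
    name, affiliation, role, email = fields[:4]
    if len(name.split()) < 2:
        problems.append(f"name should be 'Firstname Lastname': {name!r}")
    if role.lower() not in VALID_ROLES:
        problems.append(f"unrecognized project role: {role!r}")
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        problems.append(f"email looks malformed: {email!r}")
    return problems

print(check_contact_line("Caitlin Bate, Dal, researcher, caitlin.bate@dal.ca, point of contact"))
# prints []
```

A loop over all submitted contact lines would give you a quick first pass before the closer visual read.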

Below is an example of a properly completed metadata form, for your reference.

===FORM START===
0. Intended/preferred project code, if known? (May be altered by OTNDC)
format: XXXX (3-6 uppercase letters that do not already have a representation in the OTN DB. Will be assigned if left blank)

NSBS

1. Title-style description of the project?
format:  < 70 words in 'paper title' form

OTN NS Blue Shark Tracking

2. Brief abstract of the project?
format: < 500 words in 'abstract' form

In the Northwest Atlantic, the Ocean Tracking Network (OTN), in collaboration with Dalhousie University, is using an acoustic telemetry infrastructure to monitor the habitat use, movements, and survival of juvenile blue sharks (Prionace glauca). This infrastructure includes state-of-the-art acoustic receivers and oceanographic monitoring equipment, and autonomous marine vehicles carrying oceanographic sensors and mobile acoustic receivers. Long-life acoustic tags (n=40) implanted in the experimental animals will provide long-term spatial resolution of shark movements and distribution, trans-boundary migrations, site fidelity, and the species’ response to a changing ocean. This study will facilitate interspecific comparisons, documentation of intra- and interspecific interactions, and permit long-term monitoring of this understudied predator in the Northwest Atlantic. The study will also provide basic and necessary information to better inform fisheries managers and policy makers. This is pertinent given the recent formulation of the Canadian Plan of Action for Shark Conservation.

3. Names, affiliations, email addresses, and ORCID (if available) of researchers involved in the project and their role.

The accepted Project Roles are defined as:
Principal Investigator: PI or Co-PI. The person(s) responsible for the overall planning, direction and management of the project.
Technician: Person(s) responsible for preparation, operation and/or maintenance of shipboard, laboratory or deployed scientific instrumentation, but with no vested interest in the data returned by that instrumentation.
Researcher: Person(s) who may use/analyse the data to answer research questions, but is not the project lead. Can be a student if their involvement extends beyond the completion of an academic degree.
Student: Person(s) conducting research on a project as part of their work towards an academic degree.
Collaborator: A provider of input/support to a project without formal involvement in the project.

Please add 'Point of Contact' to the contact(s) who will be responsible for communicating with OTN.

format: Firstname Lastname, Employer OR Affiliation, Project Role (choose from above list), email.address@url.com, point of contact (if relevant), ORCID

Fred Whoriskey, OTN, principal investigator, fwhoriskey@dal.ca, 0000-0001-7024-3284
Sara Iverson, OTN, principal investigator, sara.iverson@dal.ca
Caitlin Bate, Dal, researcher, caitlin.bate@dal.ca, point of contact


4. Project URL - can be left blank
format: http[s]://yoursite.com

https://members.oceantrack.org/

5. Species being studied?  
format: Common name (scientific name)

Blue shark (Prionace glauca)
Sunfish (Mola mola)

6. Location of the project?
format: (city, state/province OR nearby landmark OR lat/long points in decimal degree), one per line

Halifax, NS
44.19939/-63.24085

7. Start and end dates of the project, if known?
format: YYYY-MM-DD to YYYY-MM-DD ('ongoing' is an acceptable end date)

2013-08-21 to ongoing

8. Citation to use when referencing this project:
format: Lastname, I., Lastname, I. YYYY. [Title from question 1 or suitable alternative] Will be assigned if left blank.


===FORM END===

Quality Control - Create and Update Projects

Each step in the Issue checklist will be discussed here, along with other important notes required to use the Nodebooks.

Imports cell

This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.

You will have to edit one section: engine = get_engine()

Project Metadata Parser

This cell is where you input the information contained in the Project Metadata .txt file. There are two ways to do this:

  1. You can paste in the filepath to the saved .txt. You should ensure the formatting follows this example: project_template_path = '/path/to/project_metadata.txt'
  2. You can paste the entire contents of the file, from ===FORM START=== to ===FORM END=== inclusive, into the provided triple quotation marks.

Once either option is selected, you can run the cell to complete the quality control checks.

The output will have useful information:

Generally, most of the error messages arise from the Contacts and Species sections.

If there is information which is not parsing correctly, you should fix the source-file and re-run the cell until you are happy with the output.
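The parsing step can be sketched roughly as follows. This is an illustrative stand-in, not the Nodebook's actual parser, and the split_form name is invented: it splits the pasted form text into numbered sections, discarding the question and format: hint lines.

```python
import re

def split_form(text):
    """Split a pasted metadata form into {question_number: [answer lines]}."""
    body = text.split("===FORM START===", 1)[-1].split("===FORM END===", 1)[0]
    sections = {}
    current = None
    for line in body.splitlines():
        m = re.match(r"^(\d+)\.\s", line)
        if m:                      # a numbered question starts a new section
            current = int(m.group(1))
            sections[current] = []
        elif current is not None and line.strip() and not line.startswith("format:"):
            sections[current].append(line.strip())
    return sections

form = """===FORM START===
0. Intended/preferred project code, if known?
format: XXXX

NSBS

1. Title-style description of the project?
format: < 70 words

OTN NS Blue Shark Tracking
===FORM END==="""

parts = split_form(form)
print(parts[0])  # ['NSBS']
print(parts[1])  # ['OTN NS Blue Shark Tracking']
```

Whether you pass a filepath or paste the text directly, the parser sees the same FORM START/FORM END-delimited content.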

Manual Fields - Dropdown Menu

There are some fields which need to be set up by the Data Manager, rather than the researcher. These are in the next cell.

You will run this cell, and a fillable form will appear.

  1. Node: select your node
  2. Collaboration Type: based on the abstract, are they deploying only tags (Tracker project), only receivers (Deployment project) or both tags and receivers (Data project)?
  3. Ocean: choose the most appropriate ocean region based on the abstract.
  4. Shortname: use the Title provided by the researcher, or something else, which will be used as the name of the Data Portal folder. ex: OTN Blue Sharks.
  5. Longname: use the Title provided by the researcher, or something else, which is in “scientific-paper” style. ex: Understanding the movements of Blue sharks through Nova Scotia waters, using acoustic telemetry.
  6. Series Code: this will generally be the name of your node. Compare to values found in the database obis.otn_resources if you’re unsure.
  7. Institution Code: The main institution responsible for maintaining the project. Compare to values found in the database obis.institution_codes and obis.otn_resources if you’re unsure. If this is a new Institution, please make a note in the Issue so you can add it later on.
  8. Country: based upon the abstract. Multiple countries can be listed as such: CANADA, USA, EGYPT etc.
  9. State: based upon the abstract. Multiple states can be listed as such: NOVA SCOTIA, NEWFOUNDLAND etc.
  10. Local Area: based upon the abstract. Location information. ex: Halifax
  11. Locality: based upon the abstract. Finest-scale of location. ex: Shubenacadie River
  12. Status: is the project completed, ongoing or something else?

To save and parse your inputted values DO NOT re-run the form cell - this will clear all your input. Instead, run the next cell, which parses the information.

Verify the output from the parser cell, looking for several things:

  1. Are all the fields marked with a green OK?
  2. Is the following feedback included in the institution code section: Found institution record for XXX in your database: followed by a small, embedded Table?

If anything is wrong, please begin again from the Manual Field input cell.

If the institution code IS NOT found - compare to values found in the database obis.institution_codes and obis.otn_resources. If this is a new Institution, please make a note in the Issue so you can add it later on.

Task list checkpoint

In Gitlab, this task can be completed at this stage:

- [ ] - NAME define type of project **select here one of Data, Deployment, Tracker**

Please edit to include the selected project type, in harmony with the selected field in the notebook.

Verifying Correct Info

At this stage, we have all the information parsed that we need in order to register the project in the database. There is now a cell that will print out every saved value for your review.

Review this output carefully for typos or parsing mistakes.

If this information is satisfactory, you can proceed.

Adjust Bounding Box

The cell titled Verify the new project details before writing to the DB is the final step for verification before the values are written to the database. At this stage a map will appear, with the proposed Bounding Box for the project.

Based on the abstract, you can use the Square Draw Tool to re-draw the bounding box until you are happy with it.

Proj 1

Once you are happy, you can run the next cell in order to save your bounding box adjustments. The success output should be formatted like this:

--- Midpoint ---
Latitude:
Longitude:
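The midpoint reported here is simply the average of the bounding box's corner coordinates. As a sketch (the notebook computes this for you; the coordinates below are made up):

```python
def bbox_midpoint(south, west, north, east):
    """Midpoint of a bounding box as (lat, lon) - simple corner averages."""
    return ((south + north) / 2, (west + east) / 2)

lat, lon = bbox_midpoint(43.0, -66.0, 46.0, -60.0)
print(f"Latitude: {lat}")   # Latitude: 44.5
print(f"Longitude: {lon}")  # Longitude: -63.0
```

Note that naive longitude averaging misbehaves for boxes spanning the antimeridian; that is rarely a concern for a single regional project.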

Create New Institution

STOP - confirm there is no Push currently ongoing. If a Push is ongoing, you must wait for it to be completed before proceeding beyond this point.

Remember above, where we noted whether or not an institution existed on obis.institution_codes or if it was a new institution? This cell is our opportunity to add any institutions if they are new. If all institutions (for each contact, plus for the project as a whole) exist, then you can skip this cell.

To run the cell, you will need to complete:

  1. Institution Code: a short-code for the institution (ex: DAL)
  2. Institution Name: the full institution name (ex: Dalhousie University)
  3. Institution Website: the institution’s website (ex: https://www.dal.ca/). You can confirm the URL is valid with the Check URL button.
  4. Organization ID: Press the Search Org ID button. A list of potential WikiData, GRID, ROR, EDMO and NERC results for the institution will appear. Choose the best match, and paste that URL into the blank Organization ID cell.
  5. Institution Country: the country where the institution is headquartered
  6. Institution State: the state where the institution is headquartered
  7. Institution Sector: one of Unknown, Government/Other Public, University/College/Research Hospital, Private, or Non-profit.

Once all values are completed, press Create Institution and confirm the following output:

Institution record 'DAL' created.

You can re-run this cell as many times as you need, to add each missing institution.

Write Project to the Database

STOP - confirm there is no Push currently ongoing. If a Push is ongoing, you must wait for it to be completed before proceeding beyond this point.

Finally, it is time to write the project records to the database and create the new project!

First, you should run this cell with printSQL = True. If there are no errors, you can edit and change to printSQL = False and run again. This will register the project!

You will see some output - confirm each line is accompanied by a green OK.
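The printSQL flag follows a standard dry-run pattern: with the flag on, statements are only printed for review; with it off, they are executed. A minimal sketch, assuming invented names (run_statements, execute) that are not the notebook's actual API:

```python
def run_statements(statements, execute, printSQL=True):
    for sql in statements:
        if printSQL:
            print(sql)    # dry run: review the SQL, write nothing
        else:
            execute(sql)  # live run: actually write to the database

executed = []
stmts = ["INSERT INTO obis.otn_resources (collectioncode) VALUES ('NSBS')"]
run_statements(stmts, executed.append, printSQL=True)   # prints only
run_statements(stmts, executed.append, printSQL=False)  # executes
print(len(executed))  # 1
```

This is why the order matters: the first pass costs nothing, so any mistake caught in the printed SQL is caught before the project is registered.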

Task list checkpoint

In Gitlab, this task can be completed at this stage:

- [ ] - NAME create schema and project records ("Creating and Updating project metadata" notebook)

Save Contact Information

At this stage, the next step is to add contact information for all identified collaborators.

This cell will gather and print out all contacts and their information. Review for errors. If none exist, move to the next cell.

The next cell writes to the database: STOP - confirm there is no Push currently ongoing. This cell will add each contact to the database, into the obis.contacts and obis.contacts_projects tables, as needed.

There should be output similar to this:

Valid contact:  Fred Whoriskey OTN principalInvestigator fwhoriskey@dal.ca
Created contact Fred Whoriskey

Task list checkpoint

In Gitlab, this task can be completed at this stage:

- [ ] - NAME add project contact information ("Creating and Updating project metadata" notebook)

Add Species Information

The following section will allow you to add the required information into the obis.scientific_names table.

The first cell imports the required function.

The second cell, when run, will create an editable input form for each animal.

You should review to confirm the following:

  1. the scientific name is matching to an accepted WoRMS taxon. There should be a URL provided, and no error messages.
  2. the common name is acceptable. If it is not, you can choose a value from the dropdown menu (taken directly from WoRMS’ vernacular list) OR you can enter a custom value to match the common name provided by the researcher.

Once you are sure that both the scientific and common names are correct, based on the information provided by both the project and the notebook, you may click the Add to project button for each animal record you’d like to insert.

There will be a confirmation display in the notebook to demonstrate if the insertion was successful.

Task list checkpoint

In Gitlab, this task can be completed at this stage:

- [ ] - NAME add scientificnames ("Creating and Updating project metadata" notebook)

OPTIONAL: Add Project Loan Information

The following section is used by OTN staff to track projects which are recipients of OTN-equipment loans. This section is not within the scope of this Node Manager Training, because it requires a login-file for the otnunit database.

Skip to Verification

Once you scroll past the Project Loan Information section, you will see a yellow star and the words Skip to the new project Verification. You should click the button provided, which will help you scroll to the bottom of the notebook, where the Verify section is located.

This is a chance to visually review all the fields you just entered. You should run these cells and review all output to ensure the database values align with the intended insertions.

Task list checkpoint

In Gitlab, this task can be completed at this stage:

- [ ] - NAME verify all of above ("Creating and Updating project metadata" notebook)

Other Features of this Notebook

There are more features in the Create and Update Projects notebook than those covered above.

Schema extension

This section is to be used when you have a project which is either Tracker or Deployment and is expanding to become a Data project. Ex: a project which was only tagging has begun deploying receivers.

You can use this notebook to create the missing tables for the schema. Ex: if a tagging project begins deploying receivers, the schema would now need stations, rcvr_locations, and moorings tables created.

Schema updating

This section is to be used to change the values contained in obis.otn_resources. The first cell will open an editable form with the existing database values. You can change the required fields (ex: abstract).

To save and parse your inputted values DO NOT re-run the form cell - this will clear all your input. Instead, run the next cell, which parses the information.

Review the output of the parser cell and check for typos.

The next cell will show the changes that will be made to the project data. You can copy this output and paste it into the relevant Gitlab Issue for tracking.

The final cell will make the desired changes in the database. Ensure printSQL = False if you want the cell to execute directly.

The output should look like this to confirm success:

'Resource record for HFX has been updated.'

The following, highlighted section is relevant only to Nodes who use Plone for their document management system

Quality Control - Create Plone Users and Access

If you are part of a Node that uses Plone as your document repository, then the following will be relevant for you.

Imports cell

This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.

You will have to edit one section: engine = get_engine()

  • Within the open brackets you need to open quotations and paste the path to your database .kdbx file which contains your login credentials.
  • On MacOS computers, you can usually find and copy the path to your database .kdbx file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
  • The path should look like engine = get_engine('C:/Users/username/Desktop/Auth files/database_connection.kdbx').
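If the path you copied uses Windows backslashes, Python's standard pathlib can produce the unix-style form shown above (a sketch; the filename is from the example path, not a real file):

```python
from pathlib import PureWindowsPath

# A backslash path as copied from Windows Explorer
raw = r"C:\Users\username\Desktop\Auth files\database_connection.kdbx"
print(PureWindowsPath(raw).as_posix())
# C:/Users/username/Desktop/Auth files/database_connection.kdbx
```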

Plone Login

The first cell is another import step.

The second cell requires input:

  • Proper Plone log-in information must be written in the plone_auth = get_plone_auth('./plonetools/plone_auth.json') file.
  • In order to do this, click on the Jupyter icon in the top left corner of the page.
  • This will bring you to a list of folders and notebooks. Select the plonetools folder. From there, select the plone_auth.json file and input your Plone base URL, username, and password. Hint: ensure the base_url in your json file ends in a slash, like https://members.oceantrack.org/!
  • You can now successfully log into Plone.
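The trailing-slash hint can be expressed as a tiny guard, illustrative only; the notebook does not include this helper:

```python
def ensure_trailing_slash(url):
    """Return the URL with exactly one trailing slash appended if missing."""
    return url if url.endswith("/") else url + "/"

print(ensure_trailing_slash("https://members.oceantrack.org"))
# https://members.oceantrack.org/
```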

Now, when you run the cell, you should get the following output:

Auth Loaded:
------------------------------------------------------------------------------
base_url: https://members.oceantrack.org/
user_name: user
verify ssl: False

Finally, the third cell in this section will allow you to login. You should see this message:

Login Successful!

Access Project Info

Some information is needed in order to create the project Plone folders.

There are three ways to enter this information:

  1. Access Project Information from Database
  2. Manual Project Information Form - Parse Contacts
  3. Manual Project Information Form - Insert Contacts into Textfields

The first option is generally the easiest, if the project has already been successfully written to the database using the Create and Update Projects notebook. To do this, you enter the collectioncode of your project, and run the cell. If there are no errors, you can click the SKIP button which will take you down the notebook to the next section.

Create Missing Users

This section will use the registered project contacts and compare against existing Plone users. It will compare by 1) email, 2) fullname, 3) lastname.

If a user is found: you will not need to create a new account for them.

If a user is not found: you will have to create an account for them. To do this, you can use the editable form in the next cell.

The editable cell will allow you to choose each contact that you’d like to register, and will autofill the information (including a suggested username). The password should be left blank. Once you are happy with the form, click Add User. An email will be sent to the new user, prompting them to set a password. Then you can repeat by selecting the next contact, etc.

Once all contacts have Plone accounts (new or otherwise) you are finished.
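The email-then-fullname-then-lastname matching order described above can be sketched as follows. The find_user name and the dictionary shapes are assumptions for illustration; the notebook's own comparison is authoritative.

```python
def find_user(contact, plone_users):
    """Return the first Plone user matching by email, then fullname, then lastname."""
    for key in ("email", "fullname", "lastname"):
        for user in plone_users:
            if user.get(key) and user[key].lower() == contact[key].lower():
                return user
    return None

users = [{"email": "c.bate@dal.ca", "fullname": "Caitlin Bate", "lastname": "Bate"}]
contact = {"email": "caitlin.bate@dal.ca", "fullname": "Caitlin Bate", "lastname": "Bate"}
match = find_user(contact, users)
print(match is not None)  # True - matched on full name despite a different email
```

This is also why a contact with a new email address usually does not need a second account: the name-based fallbacks will still find them.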

Task list checkpoint

In Gitlab, this task can be completed at this stage:

- [ ] - NAME create new project repository users ("Create Plone Folders and Add Users" notebook)

Create Project Repository

To create the project folder you must first enter the relevant Node information:

  • otnunit: node = None
  • safnode, migramar, nepunit: node = "node" - lowercase, with quotation marks; fill in the value based on the path in Plone.
  • all other nodes (not hosted by OTN): node = None

Running this cell will print out an example of the URL, for your confirmation. Ensure the collectioncode and Node are correct.

The expected format:

https://members.oceantrack.org/data/repository/node_name/collectioncode (node name is for SAF, MigraMar and NEP only)
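The path logic can be sketched as follows (the collectioncodes in the example calls are hypothetical):

```python
def repository_url(collectioncode, node=None):
    """Build the Plone repository folder URL; the node segment is used only
    for OTN-hosted nodes such as SAF, MigraMar and NEP."""
    base = "https://members.oceantrack.org/data/repository/"
    return base + (node + "/" if node else "") + collectioncode

print(repository_url("NSBS"))
# https://members.oceantrack.org/data/repository/NSBS
print(repository_url("SAFSS", node="safnode"))
# https://members.oceantrack.org/data/repository/safnode/SAFSS
```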

If you are confident the folder path is correct, you can run the next cell and confirm the following success message:

Creating collection folder 'collectioncode'. Done!
https://members.oceantrack.org/data/repository/node_name/collectioncode

Task list checkpoint

In Gitlab, this task can be completed at this stage:

- [ ] - NAME create project repository folder ("Create Plone Folders and Add Users" notebook)

Add Users to Repository

Now that the users AND folder have been created, the users must be given access to the new folder.

Using the final cell in the Plone repository folder creation section, you will be provided with an editable search-bar.

Type in the Plone username of each contact (new and existing). Search results will appear:

  1. select the User who you would like to add
  2. choose their permissions
  3. click “Change repo permissions” to add them to the folder.

Review for the following success message:

Changed https://members.oceantrack.org/data/repository/node_name/collectioncode sharing for username:
	Contributor=True Reviewer=True Editor=True Reader=True

Then you may choose Add another user and begin again.

The acceptable folder permissions may vary depending on the project role of the contact. Here are some guidelines:

  • Principal Investigator: all permissions
  • Researcher: all permissions except Reviewer
  • Student: all permissions except Reviewer
  • Technician: only Contributor and Reader
  • Collaborator: only Contributor and Reader

This is very fluid and can be edited at any time. These are guidelines only!
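The guideline table above can be written as a simple lookup, where "all permissions" means the four shown in the sharing success message (Contributor, Reviewer, Editor, Reader). Remember these defaults are guidelines only:

```python
DEFAULT_PERMISSIONS = {
    "Principal Investigator": {"Contributor", "Reviewer", "Editor", "Reader"},
    "Researcher":             {"Contributor", "Editor", "Reader"},
    "Student":                {"Contributor", "Editor", "Reader"},
    "Technician":             {"Contributor", "Reader"},
    "Collaborator":           {"Contributor", "Reader"},
}

print(sorted(DEFAULT_PERMISSIONS["Technician"]))
# ['Contributor', 'Reader']
```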

Task list checkpoint

In Gitlab, this task can be completed at this stage:

- [ ] - NAME add project repository users to folder ("Create Plone Folders and Add Users" notebook)

OPTIONAL: Add Project Loan Information

The following section is used by OTN staff to track projects which are recipients of OTN-equipment loans. This section is not within the scope of this Node Manager Training, because it requires a login-file for the otnunit database.

Final Steps

The remaining steps in the Gitlab Checklist are completed outside the notebooks.

First: you should access the created Repository folder in your browser and confirm that the title and sharing information are correct. If so, add the project metadata .txt file into the “Data and Metadata” folder to archive.

Next, you should send an email to the project contacts letting them know their project code and other onboarding information.

Finally, the Issue can be passed off to an OTN-analyst for final verification in the database.

Key Points

  • Loading project metadata requires judgement from the Data Manager

  • Loading project metadata is the first step towards a functioning project


Tagging Metadata

Overview

Teaching: 20 min
Exercises: 0 min
Questions
  • How do I load new tags into the Database?

Objectives
  • Understand the proper template-completion

  • Understand how to use the Gitlab checklist

  • Learn how to use the Tag-1 notebook

  • Learn how to use the Tag-1b notebook

  • Learn how to use the Tag-2 notebook

Once a project has been registered, the next step (for Tracker and Data project types) is to quality control and load the tagging metadata into the database. Tagging metadata should be reported to the Node in the template provided here. This file will contain information about the deployment of tags (acoustic, PIT, satellite, floy etc.) in or on animals for the purposes of tracking their movements using either listening stations or via mark/recapture. Any biological metrics that were measured at tagging time, e.g. length, weight, population, are recorded for future analysis.

Remembering a previous lesson, there are multiple levels of data-tables in the database for tagging records: raw tables, cache tables and otn tables. The process for loading tagging metadata reflects this, as does the Gitlab task list.

Completed Metadata

Immediately upon receipt of the metadata, a new Gitlab Issue should be created. Please use the Tag Metadata Issue checklist template.

Here is the Issue checklist, for reference:

Tag Meta Data
- [ ] - NAME add label *'loading records'*
- [ ] - NAME load raw tag metadata (`tag-1` notebook) **put_table_name_in_ticket**
- [ ] - NAME confirm no duplicates in raw table, review and remove (`tag-1b` notebook)
- [ ] - NAME verify raw table (`tag-2` notebook)
- [ ] - NAME post updated metadata to project folder (OTN members.oceantrack.org, FACT RW etc) if needed
- [ ] - NAME build cache tables (`tag-2` notebook)
- [ ] - NAME verify cache tables (`tag-2` notebook)
- [ ] - NAME load otn tables (`tag-2` notebook)
- [ ] - NAME verify otn tables (`tag-2` notebook)
- [ ] - NAME verify tags are not part of another collection (`tag-2` notebook)
- [ ] - NAME label issue with *'Verify'*
- [ ] - NAME pass issue to analyst for final verification
- [ ] - NAME check for double reporting (verification_notebooks/Tag Verification notebook)

Visual Inspection

Once the completed file is received from a researcher, the Data Manager should first complete a visual check for formatting and accuracy.

In general, Tagging Metadata has 3 sections:

  1. information about the tag
  2. information about the animal
  3. information about the tag deployment

Information about the tag is obtained from Tag Specifications and is mandatory. Information about the animal is limited to the sampling conducted. The minimum requirement here is simply the common/scientific names. All other columns for biological parameters are optional. Information about the tag deployment includes location and dates and is mandatory to complete.

Things to visually check in the metadata:

  1. Is there any information missing from the essential columns? These are:
    • tag_type
    • tag_manufacturer
    • tag_model
    • tag_serial_number (if not completed, unknown can be entered ONLY IF animal_id is also completed with unique values)
    • tag_id_code
    • tag_code_space
    • tag_implant_type
    • est_tag_life
    • common_name_e
    • scientific_name
    • release_location
    • release_latitude
    • release_longitude
    • utc_release_date_time
  2. If any of the above mandatory fields are blank, follow-up with the researcher will be required if:
    • you cannot discern the values yourself
    • you do not have access to the Tag Specifications from the manufacturer (relevant for the columns containing tag information).
  3. Are the tag_id_code and tag_code_space values formatted correctly?
  4. Is the UTC_release_date_time column formatted correctly?
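A hypothetical sketch of the blank-check in step 1 (the Nodebook performs the authoritative version of this; the example row's values are made up):

```python
ESSENTIAL = ["tag_type", "tag_manufacturer", "tag_model", "tag_serial_number",
             "tag_id_code", "tag_code_space", "tag_implant_type", "est_tag_life",
             "common_name_e", "scientific_name", "release_location",
             "release_latitude", "release_longitude", "utc_release_date_time"]

def missing_essentials(row):
    """Return the essential columns that are blank in one metadata row."""
    return [col for col in ESSENTIAL if not str(row.get(col, "")).strip()]

row = {"tag_type": "acoustic", "tag_manufacturer": "Vemco", "tag_model": "V13",
       "tag_serial_number": "1234567", "tag_id_code": "12345",
       "tag_code_space": "A69-1601", "tag_implant_type": "internal",
       "est_tag_life": "913 days", "common_name_e": "blue shark",
       "scientific_name": "Prionace glauca", "release_location": "Halifax",
       "release_latitude": "44.19939", "release_longitude": "-63.24085",
       "utc_release_date_time": ""}
print(missing_essentials(row))  # ['utc_release_date_time']
```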

In general, formatting errors occur most often in the tag-information section. Pay close attention here.

The metadata template available here has a Sample Data Row as an example of properly-formatted metadata, along with the Data Dictionary sheet which contains detailed expectations for each column. Refer back to these often. We have also included some recommendations on our FAQ page.

Quality Control - Tag-1 Notebook

Each step in the Issue checklist will be discussed here, along with other important notes required to use the Nodebook.

Imports cell

This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.

There are no values here which need to be edited.

Path to File

In this cell, you need to paste a filepath to the relevant Tagging Metadata file. The filepath will be added between the provided quotation marks.

Correct formatting looks something like this:

# Path to your tagging metadata file - Formats Accepted: (csv, xls or xlsx)
filepath = r"C:/Users/path/to/tag_metadata.xlsx"

Once you have added your filepath, you can run the cell.

Verification of File Contents - formatting

First, you must choose which sheet you would like to quality control. Generally, it will be named Tag Metadata, but the name is often customized by researchers. Once you have selected the sheet name, do not re-run the cell to save the output - simply ensure the correct sheet is highlighted and move on to the next cell.

This cell will now complete the first round of Quality Control checks.

The output will have useful information:

The notebook will indicate the sheet has passed quality control by adding a ✔️green checkmark beside each section. There should also be an interactive plot generated, summarizing the tags released over time, by species, for you to explore.

If there is information which is not passing quality control, you should fix the source-file (potentially speaking to the researcher) and try again.

Tag 1

Connection to Database

You will have to edit two sections:

  1. schema = 'collectioncode'
    • please edit to include the relevant project code, in lowercase, between the quotes.
  2. engine = get_engine()
    • Within the open brackets you need to open quotations and paste the path to your database .kdbx file which contains your login credentials.
    • On MacOS computers, you can usually find and copy the path to your database .kdbx file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
    • The path should look like engine = get_engine('C:/Users/username/Desktop/Auth files/database_connection.kdbx').

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.load.oceantrack.org Database:otnunit User:admin Node:OTN

Bounding Box Check

The following cell will map the locations of all the tag deployments, compared to the Bounding Box of the project. This is used to confirm the following:

  1. the tag release locations are in the part of the world expected based on the project abstract. Ex: lat/long have correct +/- signs
  2. the project bounding box is correct

If it appears the tag release locations have incorrect signs, you can fix it in the source file and re-run the cell.

If it appears there are tag release locations which are on land, you may want to reach out to the researcher for corrected coordinates.

If the bounding box needs to be expanded to include the tags, you can use the Square Draw Tool to re-draw the bounding box until you are happy with it. Once all tags are drawn inside the bounding box, press the Adjust Bounding Box button to save the results.

Tag 1 - bounds
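The containment check illustrated by the map can be sketched as a simple comparison. A sign slip on longitude is the most common cause of on-land or out-of-box points (the bounding box coordinates below are made up):

```python
def in_bbox(lat, lon, south, west, north, east):
    """True when the point falls inside the bounding box."""
    return south <= lat <= north and west <= lon <= east

# Halifax-area release with the correct negative longitude:
print(in_bbox(44.19939, -63.24085, 43.0, -66.0, 46.0, -60.0))  # True
# The same point with a dropped minus sign falls outside the box:
print(in_bbox(44.19939, 63.24085, 43.0, -66.0, 46.0, -60.0))   # False
```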

Verification of File Contents - against database

This cell will now complete the second round of Quality Control checks.

The output will have useful information:

  1. Have these tags been used on other projects in the database? Check the dates to ensure they don’t overlap, and double-reporting is not taking place.
  2. Do we have the Tag Specifications from the manufacturer? Do the tag_id_code, tag_code_space and est_tag_life match the specifications for each provided serial number? Are there typos or errors that require clarification from the researcher?
  3. Is the information about the animal formatted according to the Data Dictionary?
    • Are all the life stages in the obis.lifestage_codes table? If not, the reported life stage should be compared to the values in the obis.lifestage_code table, and adjusted if possible. Otherwise, use the add_lifestage_codes notebook
    • Are all length types in the obis.length_type_codes table? If not, the reported length type code should be compared to the values in the obis.length_type_codes table, and adjusted if possible. Otherwise, use the add_lengthtype_codes notebook
    • Are all the age units in the obis.control_units table? If not, the reported age units should be compared to the values in the obis.control_units table, and adjusted if possible. Otherwise, use the add_control_units notebook
  4. Are there any tags in this sheet which have been previously reported on this project in the metadata? ex: duplicates.
  5. Do the scientific and common names match the records which are previously added to obis.scientificnames for this schema? If not, please check the records in the obis.scientificnames and the source file to confirm there are no typos. If this is a new species tagged by this project, use the scientific_name_check notebook to add the new species.
  6. Are all the provided tag_model values present in the obis.instrument_models table? If not, please check the records in the obis.instrument_models and the source file to confirm there are no typos. If this is a new model which has never been used before, use the add_instrument_models notebook to add the new tag model.
  7. Are there any tags in this sheet which have been previously reported on this project in the metadata, but with different deployment dates? ex: overlapping/missing harvest dates
  8. Are there any tags being flagged as overlapping tag deployments, but not as duplicate tags? There may be an error with the tag’s serial number. Check if the tag’s ID exists in the otn_transmitters table of the schema or in the vendor.c_vemco_tags table, and compare it to the tag in the tagging metadata sheet. Fix the tag in the tagging metadata sheet if any errors are found.
  9. Are there any release dates in the future?

The notebook will indicate the sheet has passed quality control by adding a ✔️green checkmark beside each section.

If there is information which is not passing quality control, you should fix the source-file (potentially speaking to the researcher) and try again.
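The overlap test behind checks 1 and 7 can be sketched in a few lines. Two deployments of the same tag overlap when one starts before the other ends; ISO date strings compare chronologically, so no date parsing is needed for the sketch (the dates are made up):

```python
def overlaps(start_a, end_a, start_b, end_b):
    """True when two date ranges overlap (half-open interval comparison)."""
    return start_a < end_b and start_b < end_a

print(overlaps("2019-06-01", "2021-06-01", "2020-01-15", "2022-01-15"))  # True
print(overlaps("2019-06-01", "2021-06-01", "2021-06-02", "2023-06-02"))  # False
```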

Loading the Raw Table

ONLY once the source file has successfully passed ALL quality control checks can you load the raw table to the database.

In this cell, you will have to edit one section in order to name the raw table. Between the quotes, please add your custom table suffix. We recommend using year_month or similar, to indicate the most-recently tagged animal in the metadata sheet.

table_suffix = "YYYY_mm"
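One way to derive the suggested suffix from the most recent release date in the sheet (a sketch with made-up dates; use whatever convention your Node prefers):

```python
from datetime import datetime

release_times = ["2013-08-21 14:00:00", "2014-05-05 09:30:00"]
# Latest release in the sheet drives the suggested suffix
latest = max(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in release_times)
print(latest.strftime("%Y_%m"))  # 2014_05
```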

The notebook will indicate the success of the table-creation with the following message:

Reading file: otn_metadata_tagging.xlsx.
Tag Metadata sheet found, reading file... OK
Loading 'otn_metadata_tagging.xlsx' file into collectioncode.c_tag_meta_YYYY_mm... OK
Loaded 37 records into table collectioncode.c_tag_meta_YYYY_mm
True

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME load raw tag metadata ('tag-1' notebook) **put_table_name_in_ticket**

Ensure you paste the table name (ex: c_tag_meta_YYYY_mm) into the section indicated, before you check the box.

Quality Control - Tag-1b Notebook

Once the raw table is successfully loaded, the next step is to ensure any previously-loaded tags are not re-added to the database (causing duplication errors).

Imports cell

This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.

There are no values here which need to be edited.

Table Name and Database Connection

You will have to edit three sections:

  1. engine = get_engine()
    • Within the parentheses, add quotation marks and paste the path to your database .kdbx file, which contains your login credentials.
    • On MacOS computers, you can usually find and copy the path to your database .kdbx file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
    • The path should look like engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx').
  2. schema = 'collectioncode'
    • please edit to include the relevant project code, in lowercase, between the quotes.
  3. table_name = 'c_tag_meta_YYYY_mm'
    • Within the quotes, please add the name of the raw table.
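A filled-in version of these edits might look like the following sketch; every value shown is hypothetical and should be replaced with your own:

```python
# All values below are hypothetical -- substitute your own path, project
# code, and raw table name.
kdbx_path = 'C:/Users/username/Desktop/Auth files/database_conn_string.kdbx'
schema = 'hfx'                      # project code, lowercase
table_name = 'c_tag_meta_2021_09'   # raw table created by the tag-1 notebook
# engine = get_engine(kdbx_path)    # get_engine comes from the imports cell
```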

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.load.oceantrack.org Database:otnunit User:admin Node:OTN

Checking for Duplicates

This cell, once run, may print out that there are No Duplicates Found. If so, you can move on.

If there is output identifying duplicates, some review is necessary.

Immediately, the notebook will create a new table named schema.c_tag_meta_YYYY_mm_no_dup, which is an exact copy of your raw table. There will be a print-out (under a table) saying the following:

Building schema.c_tag_meta_YYYY_mm_no_dup table:
The following SQL queries will display the differences/similarities between the raw table (schema.c_tag_meta_YYYY_mm_no_dup) record and the otn_animals or otn_transmitters table record.

Then, there will be two interactive tables provided, which can be used to identify and delete any duplicate records from the no_dup table.

  1. Comparison to otn_animals
  2. Comparison to otn_transmitters

All text should initially be black. Scroll through the entire table, row by row, and review any cells which are highlighted/flashing RED. These highlight differences between the values in the raw table and the values in the otn table for this animal record.

If there are no highlighted/flashing red cells in a row, you can delete the record from the no_dup table by using the delete button on the left-side of the table. You can also use the Select All button if appropriate.

If there are highlighted/flashing red cells in a row, you must compare the highlighted values. These cannot be deleted without previous review since they are not exact duplicates, and could indicate an update to the database records is needed.

Ex: life_stage from the raw table might = SMOLT while lifestage from otn_animals might = ADULT. This will cause the column lifestage_equal? to read FALSE and the cells to flash red. In this example, it could indicate that the researcher was correcting an error in the previously-submitted record (this animal was actually a smolt, not adult) and therefore we need to update the record in our database. In this case, you would need to email the researcher to confirm, pause processing this Issue, and create a new DB_fix Issue with the information which needs updating.

If you review all the flashing red cells and find they are only rounding errors, or similar non-significant data changes, you can determine that they are indeed duplicate records. You can now use the delete button on the left-side of the table. You can also use the Select All button, if all records are true duplicates.
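The red-cell comparison amounts to a column-by-column equality check. A minimal sketch, using hypothetical column values:

```python
# Illustrative sketch with hypothetical values: compare a raw-table record
# against the matching otn_animals record and list the columns that differ --
# the same comparison the notebook renders as flashing red cells.
raw_row = {'catalognumber': 'HFX-2021-001', 'lifestage': 'SMOLT', 'length': 18.0}
otn_row = {'catalognumber': 'HFX-2021-001', 'lifestage': 'ADULT', 'length': 18.0}

differing = [col for col in raw_row if raw_row[col] != otn_row[col]]
# An empty list would mean an exact duplicate, safe to delete from no_dup;
# here differing == ['lifestage'], so the record needs review first.
```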

Once you have selected a row to be deleted, the text will turn red to help you visualize your selection.

Deleting Duplicates

Once you have identified all the true duplicate records, this next cell will remove them from the no_dup table. Success will be indicated with this message, and a table:

Compiling list of deletions to the schema.c_tag_meta_YYYY_mm_no_dup table.
Delete the following XXX records from schema.c_tag_meta_YYYY_mm_no_dup?

You will be able to press the button to confirm, and the notebook will remove all the records.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME confirm no duplicates in raw table, review and remove ("tag-1b" notebook)

Ensure you paste the no_dup table name (ex: c_tag_meta_2021_09_no_dup), if relevant, into the Issue before you check the box. This is now the raw table that will be used for the rest of the data-loading process.

Quality Control - Tag-2 Notebook

Now that the raw table is free from duplicates, we can begin to move the records into the higher-level cache and otn tables, where they will be matched to detections.

Imports cell

This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.

There are no values here which need to be edited.

Database Connection

You will have to edit one section:

  1. engine = get_engine()
    • Within the parentheses, add quotation marks and paste the path to your database .kdbx file, which contains your login credentials.
    • On MacOS computers, you can usually find and copy the path to your database .kdbx file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
    • The path should look like engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx').

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.load.oceantrack.org Database:otnunit User:admin Node:OTN

Table Name

You will have to edit two sections:

  1. table_name = 'c_tag_meta_YYYY_mm'
    • Within the quotes, please add the name of the raw table. This may be the no_dup table, if one was created.
  2. schema = 'collectioncode'
    • please edit to include the relevant project code, in lowercase, between the quotes.

Verification of File Contents - against database

This cell will now complete the final round of Quality Control checks. These are exactly the same as the checks at the end of the tag-1 notebook.

The output will have useful information:

  1. Have these tags been used on other projects in the database? Check the dates to ensure they don’t overlap, and double-reporting is not taking place.
  2. Do we have the Tag Specifications from the manufacturer? Do the tag_id_code, tag_code_space and est_tag_life match the specifications for each provided serial number? Are there typos or errors that require clarification from the researcher?
  3. Is the information about the animal formatted according to the Data Dictionary?
    • Are all the life stages in the obis.lifestage_codes table? If not, the reported life stage should be compared to the values in the obis.lifestage_code table, and adjusted if possible. Otherwise, use the add_lifestage_codes notebook
    • Are all length types in the obis.length_type_codes table? If not, the reported length type code should be compared to the values in the obis.length_type_codes table, and adjusted if possible. Otherwise, use the add_lengthtype_codes notebook
    • Are all the age units in the obis.control_units table? If not, the reported age units should be compared to the values in the obis.control_units table, and adjusted if possible. Otherwise, use the add_control_units notebook
  4. Are there any tags in this sheet which have been previously reported on this project in the metadata? ex: duplicates.
  5. Do the scientific and common names match the records which are previously added to obis.scientificnames for this schema? If not, please check the records in the obis.scientificnames and the source file to confirm there are no typos. If this is a new species tagged by this project, use the scientific_name_check notebook to add the new species.
  6. Are all the provided tag_model values present in the obis.instrument_models table? If not, please check the records in the obis.instrument_models and the source file to confirm there are no typos. If this is a new model which has never been used before, use the add instrument_models notebook to add the new tag model.
  7. Are there any tags in this sheet which have been previously reported on this project in the metadata, but with different deployment dates? ex: overlapping/missing harvest dates
  8. Are there any tags being flagged as overlapping tag deployments, but not as duplicate tags? There may be an error with the tag’s serial number. Check if the tag’s ID exists in the otn_transmitters table of the schema or in the vendor.c_vemco_tags table, and compare it to the tag in the tagging metadata sheet. Fix the tag in the tagging metadata sheet if any errors are found.
  9. Are there any release dates in the future?

The notebook will indicate the sheet has passed quality control by adding a ✔️ green checkmark beside each section.

If there is information which is not passing quality control, you should fix the source-file (potentially speaking to the researcher), delete or edit the raw table, and try again.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME verify raw table ("tag-2" notebook)

Loading to Cache Tables

The first cell will create the cache tables, with the following success messaging:

Creating table schema.tagcache_YYYY_mm
Creating table schema.animalcache_YYYY_mm
Table creation finished.

The next step will populate the tag_cache and animal_cache tables by splitting apart the information about the tag and the information about the animal, joining the records by a unique catalognumber based on the tag deployment information.
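A rough sketch of that split, using hypothetical column names and a hypothetical catalognumber format (the real format is built by the notebook):

```python
# One metadata record feeds both caches, joined by a shared catalognumber.
# Column names and catalognumber format here are illustrative only.
record = {
    'collectioncode': 'HFX',
    'tag_serial': '123456',
    'scientificname': 'Salmo salar',
    'utc_release_date_time': '2021-09-14',
}
catalognumber = f"{record['collectioncode']}-{record['tag_serial']}-{record['utc_release_date_time']}"
animal_cache_row = {'catalognumber': catalognumber, 'scientificname': record['scientificname']}
tag_cache_row = {'catalognumber': catalognumber, 'tag_serial': record['tag_serial']}
```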

Running this cell will provide the following success message:

Added XX records to the schema.animalcache_YYYY_mm table
Added XX records to the schema.tagcache_YYYY_mm table

Pay special attention to the number of records loaded to the animal and tag caches. If these numbers don’t match, you may need to investigate why there are more tags than animals, or vice versa. Possible reasons the values may not match:

If the values are acceptable, you can move on.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME build cache tables ("tag-2" notebook)

Verifying Cache Tables

This cell will now complete the Quality Control checks of the cache tables.

The output will have useful information:

The notebook will indicate the tables have passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors, go into the database and fix the cache tables themselves, and re-run.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME verify cache tables ("tag-2" notebook)

Loading to OTN Tables

STOP - confirm there is no Push currently ongoing. If a Push is ongoing, you must wait for it to be completed before processing beyond this point.

This cell will populate the otn_animals and otn_transmitters master-tables, with the following success messaging:

Added XX records to schema.otn_animals table from animalcache_YYYY_mm
Added XX records to schema.otn_transmitters table from tagcache_YYYY_mm

The number of records added should match the number from the cache table loading step.

Task list checkpoint

In GitLab, these tasks can be completed at this stage:

- [ ] - NAME load otn tables ("tag-2" notebook)

Verifying OTN Tables

This cell will now complete the Quality Control checks of the tag and animal records contained in the entire schema. We are no longer checking our newly-loaded records only, but also each previously-loaded record.

The output will have useful information:

The notebook will indicate the tables have passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors, go into the database and fix the otn tables themselves, or contact the researcher, and re-run.

Task list checkpoint

In GitLab, these tasks can be completed at this stage:

- [ ] - NAME verify otn tables ("tag-2" notebook)
- [ ] - NAME verify tags are not part of another collection (`tag-2` notebook)

Final Steps

The remaining steps in the GitLab Checklist are completed outside the notebooks.

First: you should access the Repository folder in your browser and add the cleaned Tag Metadata .xlsx file into the “Data and Metadata” folder.

Finally, the Issue can be passed off to an OTN-analyst for final verification in the database.

Key Points

  • Loading tagging metadata requires judgement from the Data Manager

  • Communication with the researcher is essential when errors are found


Deployment Metadata

Overview

Teaching: 30 min
Exercises: 0 min
Questions
  • How do I load new deployments into the Database?

Objectives
  • Understand proper template completion

  • Understand how to use the GitLab checklist

  • Learn how to use the Deploy notebook

Once a project has been registered, the next step (for Deployment and Data project types) is to quality control and load the instrument deployment metadata into the database. Deployment metadata should be reported to the Node in the template provided here. This file will contain information about the deployment of any instruments used to detect tagged subjects or collect related data. This includes stationary test tags, range test instruments, non-acoustic environmental sensors etc. Geographic location, as well as the duration of the deployment for each instrument, is recorded. The locations of these listening stations are used to fix detections geographically.

Remembering our previous lessons, there are multiple levels of data-tables in the database for deployment records: raw tables, rcvr_locations, stations and moorings. The process for loading instrument metadata reflects this, as does the GitLab task list.

Submitted Metadata

Immediately upon receipt of the metadata, a new GitLab Issue should be created. Please use the Receiver_metadata Issue checklist template.

Here is the Issue checklist, for reference:

Receiver Metadata
- [ ] - NAME add label *'loading records'*
- [ ] - NAME load raw receiver metadata (`deploy` notebook) **put_table_name_in_ticket**
- [ ] - NAME check that station locations have not changed station "NAMES" since last submission (manual check)
- [ ] - NAME verify raw table (`deploy` notebook)
- [ ] - NAME post updated metadata file to project repository (OTN members.oceantrack.org, FACT RW etc)
- [ ] - NAME load station records (`deploy` notebook)
- [ ] - NAME verify stations (`deploy` notebook)
- [ ] - NAME load to rcvr_locations (`deploy` notebook)
- [ ] - NAME verify rcvr_locations (`deploy` notebook)
- [ ] - NAME add transmitter records receivers with integral pingers (`deploy` notebook)
- [ ] - NAME load to moorings (`deploy` notebook)
- [ ] - NAME verify moorings (`deploy` notebook)
- [ ] - NAME label issue with *'Verify'*
- [ ] - NAME pass issue to OTN DAQ for reassignment to analyst
- [ ] - NAME check if project is OTN loan, if yes, check for lost indicator in recovery column, list receiver serial numbers for OTN inventory updating.
- [ ] - NAME pass issue to OTN analyst for final verification
- [ ] - NAME check for double reporting (verification_notebooks/Deployment Verification notebook)

**receiver deployment files/path:**

Visual Inspection

Once the completed file is received from a researcher, the Data Manager should first complete a visual check for formatting and accuracy.

In general, the deployment metadata contains information on the instrument, the deployment location, and the deployment/recovery times.

Things to visually check in the metadata:

  1. Is there any information missing from the essential columns? These are:
    • otn_array
    • station_no
    • deploy_date
    • deploy_lat
    • deploy_long
    • ins_model_no
    • ins_serial_no
    • recovered
    • recover_date
  2. If any of the above mandatory fields are blank, follow-up with the researcher will be required if:
    • you cannot discern the values yourself.
    • you do not have access to the Tag or Receiver Specifications from the manufacturer (relevant for the columns containing transmitter information).
  3. Are all lat/longs in the correct sign? Are they in the correct format (decimal degrees)?
  4. Do all transceivers/test tags have their transmitters provided?
  5. Are all recoveries from previous years recorded?
  6. Do comments suggest anything was lost or damaged, where recovery indicator doesn’t say “lost” or “failed”?

In general, the most common formatting errors occur in records where more than one instrument is deployed at a station, or where a receiver was deployed and recovered at the same site.
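A quick way to spot the first case is to count distinct instruments per station. A sketch with hypothetical rows, using the template's column names:

```python
from collections import defaultdict

# Hypothetical deployment rows illustrating the error-prone case of more than
# one instrument at a single station; column names follow the metadata template.
rows = [
    {'station_no': 'ST01', 'ins_serial_no': '450001', 'deploy_date': '2021-05-01'},
    {'station_no': 'ST01', 'ins_serial_no': '450002', 'deploy_date': '2021-05-01'},
    {'station_no': 'ST02', 'ins_serial_no': '450003', 'deploy_date': '2021-05-02'},
]
by_station = defaultdict(set)
for r in rows:
    by_station[r['station_no']].add(r['ins_serial_no'])
stations_to_review = sorted(s for s, serials in by_station.items() if len(serials) > 1)
# stations_to_review == ['ST01'] -> give these rows extra scrutiny
```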

The metadata template available here has a Data Dictionary sheet which contains detailed expectations for each column. Refer back to these definitions often. We have also included some recommendations on our FAQ page. Here are some guidelines:

Quality Control - Deploy Notebook

Each step in the Issue checklist will be discussed here, along with other important notes required to use the Nodebook.

Imports cell

This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.

There are no values here which need to be edited.

Path to File

In this cell, you need to paste a filepath to the relevant Deployment Metadata file. The filepath will be added between the provided quotation marks.

Correct formatting looks something like this:

# Shortfrom metadata path (xls, csv)
filepath = r'C:/Users/path/to/deployment_metadata.xlsx'

You must also select the format of the Deployment metadata. Currently, only the FACT Network uses a slightly different format than the template available here. If it's relevant for your Node, you can edit the excel_fmt section.

Correct formatting looks something like this:

excel_fmt = 'otn' # Deployment metadata format 'otn' or 'fact'

Once you have added your filepath and chosen your template format, you can run the cell.

Next, you must choose which sheet you would like to quality control. Generally, it will be named Deployment but is often customized by researchers. Once you have selected the sheet name, do not re-run the cell to save the output - simply ensure the correct sheet is highlighted and move on to the next cell.

Table Name and Database

You will have to edit three sections:

  1. schema = 'collectioncode'
    • please edit to include the relevant project code, in lowercase, between the quotes.
  2. table_name = 'c_shortform_YYYY_mm'
    • Within the quotes, please add your custom table suffix. We recommend using year_month or similar, to indicate the most-recently deployed/recovered instrument in the metadata sheet.
  3. engine = get_engine()
    • Within the parentheses, add quotation marks and paste the path to your database .kdbx file, which contains your login credentials.
    • On MacOS computers, you can usually find and copy the path to your database .kdbx file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
    • The path should look like engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx').

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node

Verification of File Contents

This cell will now complete the first round of Quality Control checks.

The output will have useful information:

The notebook will indicate the sheet has passed quality control by adding a ✔️ green checkmark beside each section. There should also be an interactive plot generated, summarizing the instruments deployed over time for you to explore, and a map of the deployments.

Using the map, please confirm the following:

  1. the instrument deployment locations are in the part of the world expected based on the project abstract. Ex: lat/long have correct +/- signs
  2. the instrument deployments do not occur on land
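Both checks boil down to comparing each deployment's coordinates against the expected study region. A minimal sketch, with a hypothetical bounding box:

```python
# Illustrative coordinate check; the bounding box is hypothetical and should
# come from the study area described in the project abstract.
expected = {'lat_min': 43.0, 'lat_max': 47.0, 'lon_min': -67.0, 'lon_max': -59.0}
deployments = [
    {'station': 'ST01', 'deploy_lat': 44.5, 'deploy_long': -63.6},
    {'station': 'ST02', 'deploy_lat': 44.5, 'deploy_long': 63.6},  # missing minus sign
]
suspect = [
    d['station'] for d in deployments
    if not (expected['lat_min'] <= d['deploy_lat'] <= expected['lat_max']
            and expected['lon_min'] <= d['deploy_long'] <= expected['lon_max'])
]
# suspect == ['ST02'] -> likely a sign error on the longitude
```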

If there is information which is not passing quality control, you should fix the source-file (potentially after speaking to the researcher) and try again.

Deploy 1

Loading the Raw Table

ONLY once the source file has successfully passed ALL quality control checks can you load the raw table to the database.

You have already named the table above, so there are no edits needed in this cell.

The notebook will indicate the success of the table-creation with the following message:

Reading file 'deployment_metadata.xlsx' as otn formatted Excel.
Table Loading Complete:
 Loaded XXX records into table schema.c_shortform_YYYY_mm

Task list checkpoint

In GitLab, these tasks can be completed at this stage:

- [ ] - NAME load raw receiver metadata ("deploy" notebook) **put_table_name_in_ticket**
- [ ] - NAME check that station locations have not changed station "NAMES" since last submission (manual check)

Ensure you paste the table name (ex: c_shortform_YYYY_mm) into the section indicated, before you check the box.

Verify Raw Table

This cell will now complete the Quality Control checks of the raw table. This is to ensure the Nodebook loaded the records correctly from the Excel sheet.

The output will have useful information:

The notebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors, go into the database and fix the raw table directly, or contact the researcher, and re-run.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME verify raw table ("deploy" notebook)

Loading Stations Records

STOP - confirm there is no Push currently ongoing. If a Push is ongoing, you must wait for it to be completed before processing beyond this point.

Only once the raw table has successfully passed ALL quality control checks can you load the stations information to the database stations table.

Running this cell will first check for any new stations to add, then confirm that the records in the stations table match the records in the moorings table where basisofrecord = 'STATION'.

If new stations are identified:

The success message will look like:

Adding station records to the stations table.
Creating new stations...
Added XXX new stations to schema.moorings

If the stations and moorings tables are not in-sync, the difference between the two will need to be compared and possibly updated.
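Conceptually, the sync check is a set comparison between the station names on each side; a sketch with hypothetical station lists:

```python
# Hypothetical station lists: set differences show which stations are missing
# from either side when the two tables fall out of sync.
stations_table = {'ST01', 'ST02', 'ST03'}
moorings_stations = {'ST01', 'ST02'}   # moorings rows where basisofrecord = 'STATION'
missing_from_moorings = stations_table - moorings_stations   # {'ST03'}
missing_from_stations = moorings_stations - stations_table   # set()
```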

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - load station records ("deploy" notebook)

Verify Stations Table

This cell will now complete the Quality Control checks of the stations records contained in the entire schema. We are no longer only checking against our newly-loaded records, but also each previously-loaded record in this schema/project. This will help catch historical errors.

The output will have useful information:

The notebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors, go into the database and fix the raw table directly, or contact the researcher, and re-run. If there are problems with records that have already been promoted to the stations or moorings table, you will need to contact an OTN database staff member to resolve these.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - verify stations ("deploy" notebook)

Load to rcvr_locations

Once the station table is verified, the receiver deployment records can now be promoted to the “intermediate” rcvr_locations table.

The cell will identify any new deployments to add, and any previously-loaded deployments which need updating (ex: they have been recovered).

If new deployments are identified:

If deployment updates are identified:

Each instance will give a success message such as:

Loading deployments into the rcvr_locations_table
Loaded XX records into the schema.rcvr_locations table.
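The new-versus-update split can be pictured as a lookup against existing deployments. In this sketch a deployment is keyed by (station_no, deploy_date), which is an assumption for illustration only:

```python
# Hypothetical sketch of splitting incoming deployment rows into brand-new
# records and updates to previously loaded, still-open deployments.
existing = {('ST01', '2021-05-01'): {'recover_date': None}}  # loaded, not yet recovered
incoming = [
    {'station_no': 'ST01', 'deploy_date': '2021-05-01', 'recover_date': '2021-10-01'},
    {'station_no': 'ST02', 'deploy_date': '2021-06-01', 'recover_date': '2021-10-02'},
]
new_deploys = [d for d in incoming if (d['station_no'], d['deploy_date']) not in existing]
updates = [d for d in incoming if (d['station_no'], d['deploy_date']) in existing]
# new_deploys -> the ST02 record; updates -> the now-recovered ST01 record
```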

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - load to rcvr_locations ("deploy" notebook)

Verify rcvr_locations

This cell will now complete the Quality Control checks of the rcvr_locations records contained in the entire schema. We are no longer only checking our newly-loaded records, but also each previously-loaded record for this schema/project. This will help catch historical errors.

The output will have useful information:

The notebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors contact OTN, or contact the researcher, to resolve.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - verify rcvr_locations ("deploy" notebook)

Load Transmitter Records to Moorings

The transmitter values associated with transceivers, co-deployed sentinel tags or stand-alone test tags will be loaded to the moorings table in this section. Existing transmitter records will also be updated if relevant.

If new transmitters are identified:

If transmitter updates are identified:

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - load transmitter records receivers with integral pingers ("deploy" notebook)

Load Receivers to Moorings

The final, highest-level table for instrument deployments is moorings.

The cell will identify any new deployments to add, and any previously-loaded deployments which need updating (ex: they have been recovered).

Please review all new deployments and deployment updates for accuracy, then press the associated buttons to make the changes. At this stage, the updates are not editable: any updates chosen from the rcvr_locations section will be processed here.

You may be asked to select an instrumenttype for certain receivers. Use the drop-down menu to select before adding the deployment.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - load to moorings ("deploy" notebook)

Verify Moorings

This cell will now complete the Quality Control checks of the moorings records contained in the entire schema. We are no longer only checking our newly-loaded records, but also each previously-loaded record in this project/schema.

The output will have useful information:

The notebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors contact OTN to resolve.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - verify moorings ("deploy" notebook)

Final Steps

The remaining steps in the GitLab Checklist are completed outside the notebooks.

First: you should access the Repository folder in your browser and add the cleaned Deployment Metadata .xlsx file into the “Data and Metadata” folder.

Finally, the Issue can be passed off to an OTN-analyst for final verification in the database.

Key Points

  • Loading receiver metadata requires judgement from the Data Manager

  • Communication with the researcher is essential when errors are found


Detection Loading

Overview

Teaching: 40 min
Exercises: 0 min
Questions
  • What is the workflow for loading detection data?

  • What do I need to look out for as a Node Manager when loading detections?

Objectives
  • Understand how to use the GitLab checklist

  • Understand the workflow for detection data in the OTN system

  • Learn common errors and pitfalls that come up when loading detections

Once deployment metadata has been processed for a project, the related detections may now be processed. Detection data should be reported to the Node as a collection of raw, unedited files. These can be in the form of a zipped folder of .VRLs, a database from Thelma Biotel or any other raw data product from any manufacturer. The files contain only transmitter numbers and the datetimes at which they were recorded at a specific receiver. The tag metadata and deployment metadata will provide the associated geographic and biological context to this data.

Submitted Records

Immediately upon receipt of the data files, a new GitLab Issue should be created. Please use the Detections Issue checklist template.

Here is the Issue checklist, for reference:

Detections
- [ ] - NAME load raw detections and events `(detections-1` notebook and `events-1` notebook **OR** `Convert - Fathom Export` notebook and `detections-1` notebook) **(put table names here)**
- [ ] - NAME upload raw detections to project folder (OTN members.oceantrack.org, FACT RW etc) if needed
- [ ] - NAME verify raw detections table (`detections-1` notebook)
- [ ] - NAME load raw events to events table (`events-2` notebook)
- [ ] - NAME load to detections_yyyy (`detections-2` notebook)
- [ ] - NAME comment in issue what detection years were loaded (output from `detections-2`)
- [ ] - NAME verify detections_yyyy (looking for duplicates) (`detections-2` notebook)
- [ ] - NAME load to sensor_match_yyyy (`detections-2` notebook)
- [ ] - NAME comment in issue what sensor years were loaded (output from `detections-2`)
- [ ] - NAME timedrift correction for affected detection and sensor years (`detections-2b` notebook)
- [ ] - NAME verify timedrift corrections (`detections-2b` notebook)
- [ ] - NAME manually check for open, unverified receiver metadata, **STOP** if it exists! (**put Gitlab issue number here**)
-----
- [ ] - NAME load to otn_detections_yyyy (`detections-3` notebook)
- [ ] - NAME verify otn_detections_yyyy (`detections-3` notebook)
- [ ] - NAME load sentinel records (`detections-3` notebook)
- [ ] - NAME check for missing receiver metadata (`detections-3b` notebook)
- [ ] - NAME check for missing data records (`detections-3c` notebook)
- [ ] - NAME load download records (`events-3` notebook)
- [ ] - NAME verify download records (`events-3` notebook)
- [ ] - NAME process receiver configuration (`events-4` notebook)
- [ ] - NAME label issue with *'Verify'*
- [ ] - NAME pass issue to OTN analyst for final steps
- [ ] - NAME check for double reporting (verification_notebooks/Project Verification notebook)
- [ ] - NAME match tags to animals (`detections-4` notebook)
- [ ] - NAME do sensor tag processing (only done if vendor specifications are available)
- [ ] - NAME update detection extract table

**detections files/path:**

Visual Inspection

Once the files are received from a researcher, the Data Manager should first complete a visual check for formatting and accuracy.

Things to visually check:

Convert to CSV

Once the raw files are obtained, the data must be converted to csv format. There are several ways this can be done, depending on the manufacturer.

For Innovasea

For Thelma Biotel

For Lotek

Other manufacturers: contact OTN staff.
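Once converted, a quick programmatic look can complement the visual check. A minimal sketch, assuming typical VUE-style column names (your exports may differ):

```python
import io
import pandas as pd

# Hypothetical quick-look at a converted detections CSV before loading.
# The column names follow a typical VUE export; they are illustrative only.
csv_text = io.StringIO(
    "Date and Time (UTC),Receiver,Transmitter\n"
    "2021-05-01 10:00:00,VR2W-123456,A69-1602-555\n"
)
df = pd.read_csv(csv_text)

# Flag any expected columns that did not survive the conversion.
expected = {'Date and Time (UTC)', 'Receiver', 'Transmitter'}
missing = expected - set(df.columns)
print(missing or 'all expected columns present')
```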

detections - 1 - load csv detections

Detections-1 loads CSV detections files into a new database table. If detections were exported using Fathom or the convert - Fathom (vdat) Export - VRL to CSV notebook, the events records will also be loaded at this stage. This is because these applications combine the detections and events data in one CSV file.

Import cells and Database Connections

As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.

The second cell will set your database connection. You will have to edit one section: engine = get_engine()

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node

User Input

Cell three requires input from you. This information will be used to locate the raw detections CSV(s) and to create a new raw table in the database.

  1. file_or_folder_path = r'C:/Users/path/to/detections_CSVs/'
    • paste a filepath to the relevant CSV file(s). The filepath will be added between the provided quotation marks.
    • this can be a path to a single CSV file, or a folder of multiple CSVs.
  2. table_suffix = 'YYYY_mm'
    • Within the quotes, please add your custom table suffix. We recommend using year_month or similar, to indicate the most-recently downloaded instrument.
  3. schema = 'collectioncode'
    • please edit to include the relevant project code, in lowercase, between the quotes.

There are also some optional inputs:

Once you have added your information, you can run the cell.
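For illustration, a filled-in version of this cell might look like the following (the path, suffix, and collection code are placeholders, not real values):

```python
# Hypothetical filled-in user-input cell for detections-1.
# All values below are placeholders for illustration.
file_or_folder_path = r'C:/Users/me/data/detections_CSVs/'  # one CSV, or a folder of CSVs
table_suffix = '2024_06'  # raw table becomes c_detections_2024_06
schema = 'hfx'            # project collection code, lowercase
```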

Verify Detection File and Load to Raw Table

Next, the notebook will review and verify the detection file(s) format and report any errors. Upon successful verification, you can run the cell below, which will attempt to load the detections into a new raw table.

The notebook will indicate the success of the table-creation with a message such as this:

Reading fathom files...
Loading Files...
7/7

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME load raw detections and events ('detections-1' notebook and 'events-1' notebook **OR** 'Batch Fathom Export' notebook and 'detections-1' notebook) **(put table names here)**

Ensure you paste the table name (ex: c_detections_YYYY_mm) into the section indicated, before you check the box.

Verify Raw Detection Table

This cell will now complete the Quality Control checks of the raw table. This is to ensure the Nodebook loaded the records correctly from the CSVs.

The output will have useful information:

The notebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors, contact OTN for next steps.

Task list checkpoint

In GitLab, these tasks can be completed at this stage:

- [ ] - NAME verify raw detections table ('detections-1' notebook)

events - 1 - load events into c_events_yyyy

Events-1 is responsible for loading receiver events files into raw tables. This is only relevant for CSVs that were NOT exported using Fathom or the convert - Fathom (vdat) Export - VRL to CSV notebook.

Import cell

As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.

User Inputs

Cell two requires input from you. This information will be used to locate the raw events CSV and to create a new raw table in the database.

  1. filepath = r'C:/Users/path/to/events.csv'
    • paste a filepath to the relevant CSV file. The filepath will be added between the provided quotation marks.
  2. table_name = 'c_events_YYYY_mm'
    • Within the quotes, please add your table name, including a custom suffix. We recommend using year_month or similar, to indicate the most-recently downloaded instrument.
  3. schema = 'collectioncode'
    • please edit to include the relevant project code, in lowercase, between the quotes.

There are also some optional inputs:

Once you have added your information, you can run the cell.

Verifying the events file

Before attempting to load the event files to a raw table, the notebook will verify the file to make sure there are no major issues. This is done by running the Verify events file cell. Barring any errors, you will be able to continue.

The notebook will indicate the success of the file verification with a message such as this:

Reading file 'events.csv' as CSV.
Verifying the file.
Format: VUE 2.6+
Mandatory Columns: OK
date_and_time datetime:OK
Initialization(s): XX
Data Upload(s): XX
Reset(s): XX

Database Connection

You will have to edit one section: engine = get_engine()

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node

Load the events file into the c_events_yyyy table

The second-last cell loads the events file into a raw table. It depends on successful verification in the previous step. Upon successful loading, you can dispose of the engine and move on to the next notebook.

The notebook will indicate the success of the table-creation with the following message:

File loaded with XXXXX records.
100%

Task list checkpoint

In GitLab, these tasks can be completed at this stage:

- [ ] - NAME load raw detections and events ('detections-1' notebook and 'events-1' notebook **OR** 'Batch Fathom Export' notebook and 'detections-1' notebook) **(put table names here)**

Ensure you paste the table name (ex: c_events_YYYY_mm) into the section indicated, before you check the box.

events - 2 - move c_events into events table

This notebook will move the raw events records into the intermediate events table.

Import cell

As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.

User input

This cell requires input from you. This information identifies the raw events table you just loaded and the schema it belongs to.

  1. c_events_table = 'c_events_YYYY_mm'
    • Within the quotes, please add the raw events table name, which you just loaded in either detections-1 or events-1.
  2. schema = 'collectioncode'
    • please edit to include the relevant project code, in lowercase, between the quotes.

Database Connection

You will have to edit one section: engine = get_engine()

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node

Verify table format

You will then verify that the c_events table you specified exists, and that it meets the required format specifications.

The notebook will indicate the success of the table verification with a message such as this:

Checking table name format... OK
Checking if schema collectioncode exists... OK!
Checking collectioncode schema for c_events_YYYY_mm table... OK!
collectioncode.c_events_YYYY_mm table found.

If there are any errors in this section, please contact OTN.

Load to Events table

Provided nothing comes up in the verification cells, you can run the loading cell.

The notebook will indicate the success of the processing with a message such as this:

Checking for the collectioncode.events table... OK!
Loading events... OK!
Loaded XX rows into collectioncode.events table.

Task list checkpoint

In GitLab, these tasks can be completed at this stage:

- [ ] - NAME load raw events to events table ("events-2" notebook)

detections - 2 - c_table into detections_yyyy

This notebook takes the raw detection data from detections-1 and moves it into the intermediate detections_yyyy tables (split out by year).

Import cells and Database Connections

As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.

The second cell will set your database connection. You will have to edit one section: engine = get_engine()

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node

User Inputs

To load to the detections_yyyy tables, the notebook will require information about the schema you are working in and the raw table that you created in detections-1.

  1. c_table = 'c_detections_YYYY_mm'
    • Within the quotes, please add the raw table name, including the custom suffix you loaded in detections-1.
  2. schema = 'collectioncode'
    • please edit to include the relevant project code, in lowercase, between the quotes.

The notebook will indicate success with the following message:

Checking table name format... OK
Checking if schema collectioncode exists... OK!
Checking collectioncode schema for c_detections_yyyy_mm table... OK!
collectioncode.c_detections_yyyy_mm table found.

Create Missing Tables

Detections tables are only created on an as-needed basis. These cells will detect any tables you are missing and create them as needed, based on the years covered in the raw detection table (c_table). This will check all tables such as detections_yyyy, sensor_match_yyyy and otn_detections_yyyy.

First, the notebook will gather and print the missing tables. If there are none missing, the notebook will report that as well.

vemco: Match
You are missing the following tables:
[collectioncode.detections_YYYY, collectioncode.otn_detections_YYYY, collectioncode.sensor_match_YYYY]
Create these tables by passing the missing_tables variable into the create_detection_tables function.

If you proceed in the notebook, there is a creation cell which will add these tables to the project schema in the database. Success will be indicated with the following message:

Creating table collectioncode.detections_YYYY... OK
Creating table collectioncode.otn_detections_YYYY... OK
Creating table collectioncode.sensor_match_YYYY... OK

Create Detection Sequence

Before loading detections, a detection sequence is created. The sequence is used to populate the det_key column. The det_key value is a unique ID for each detection, which helps ensure there are no duplicates. If a sequence is required, you will see this output:

creating sequence collectioncode.detections_seq... OK

No further action is needed.

Load to Detections_YYYY

Duplicate detections are then checked for and will not be inserted into the detections_yyyy tables.

If no duplicates are found you will see:

No duplicates found. All of the detections will be loaded into the detections_yyyy table(s).

If duplicates are found, the notebook will instead report the duplicate records, which will be excluded from loading.

After all this, the raw detection records are ready to be loaded into the detections_yyyy tables. The notebook will indicate success with the following message:

Inserting records from collectioncode.c_detections_YYYY_mm into collectioncode.detections_2018... OK
Added XXXXX rows.
Inserting records from collectioncode.c_detections_YYYY_mm into collectioncode.detections_2019... OK
Added XXXXX rows.
Inserting records from collectioncode.c_detections_YYYY_mm into collectioncode.detections_2020... OK
Added XXXXX rows.
Inserting records from collectioncode.c_detections_YYYY_mm into collectioncode.detections_2021... OK
Added XXXXX rows.

You must note which years have been loaded! In the example above, this would be 2018, 2019, 2020, and 2021.
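The duplicate check described above can be illustrated in pandas. A reasonable working definition, used here only for illustration, is the same transmitter heard on the same receiver at the same timestamp (column names are made up):

```python
import pandas as pd

# Hypothetical sketch of the detections-2 duplicate check: a detection
# is a duplicate if the same tag is heard on the same receiver at
# exactly the same timestamp.
new = pd.DataFrame({
    'datecollected': ['2021-05-01 10:00:00', '2021-05-01 10:00:00'],
    'receiver': ['VR2W-123456', 'VR2W-123456'],
    'tagname': ['A69-1602-555', 'A69-1602-555'],
})
dupes = new[new.duplicated(subset=['datecollected', 'receiver', 'tagname'], keep='first')]
print(len(dupes))  # 1 duplicate row flagged; only the first copy would be loaded
```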

Task list checkpoint

In GitLab, these tasks can be completed at this stage:

- [ ] - NAME load to detections_yyyy (`detections-2` notebook)
- [ ] - NAME comment in issue what detection years were loaded (output from `detections-2`)

Ensure you paste the affected tables (ex: 2019, 2020) into the section indicated, before you check the box.

Verify Detections YYYY Tables

This cell will now complete the Quality Control checks of the detections_yyyy tables. This is to ensure the Nodebook loaded the records correctly.

First, you will need to list all of the years that were affected by the previous loading step, so the Nodebook knows which tables need to be verified. The format will look like this:

years = ['YYYY','YYYY','YYYY', 'YYYY']

If left blank, the Nodebook will check all the years, which may take a long time for some projects.

Run this cell, then you can verify in the next cell.

The output will have useful information:

The notebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors, contact OTN for next steps.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME verify detections_yyyy (looking for duplicates) ("detections-2" notebook)

Load sensors_match Tables by Year

For the last part of this notebook, you will need to load to the sensor_match_YYYY tables. This loads detections with sensor information into a project’s sensor_match_yyyy tables. Later, these tables will aid in matching vendor specifications to resolve sensor tag values.

Output will appear like this:

Inserting records from collectioncode.detections_2019 INTO sensor_match_2019... OK
Added XXX rows.
Inserting records from collectioncode.detections_2021 INTO sensor_match_2021... OK
Added XXX rows.

You must note which years have been loaded! In the example above, this would be 2019 and 2021.

Task list checkpoint

In GitLab, these tasks can be completed at this stage:

- [ ] - NAME load to sensor_match_yyyy (`detections-2` notebook)
- [ ] - NAME comment in issue what sensor years were loaded (output from `detections-2`)

Ensure you paste the affected tables (ex: 2019, 2020) into the Issue.

detections - 2b - timedrift calculations

This notebook calculates time drift factors and applies the corrections to the detections_yyyy tables, in a field called corrected_time. OTN’s Data Manager toolbox (the Nodebooks) corrects for timedrift between each initialization and offload of a receiver. If a receiver is offloaded several times in one data file, time correction does not occur linearly from start to end, but between each download, to ensure the most accurate correction. If there is only one download in a data file then the time correction in VUE software will match the time correction performed by OTN.
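The per-download correction described above amounts to a linear interpolation of the measured clock drift between each initialization and offload. A minimal sketch, with an assumed 30-second drift over a single deployment:

```python
from datetime import datetime, timedelta

# Hypothetical sketch of per-download linear time-drift correction.
# The drift measured at offload is spread linearly across the interval.
init = datetime(2021, 1, 1)
offload = datetime(2021, 7, 1)
drift_seconds = 30.0  # clock was 30 s fast at offload (assumed value)

def corrected_time(det_time):
    # 0 at initialization, 1 at offload
    frac = (det_time - init) / (offload - init)
    return det_time - timedelta(seconds=drift_seconds * frac)

dt = datetime(2021, 4, 1)  # a detection partway through the deployment
print(corrected_time(dt))  # shifted back by a proportional share of the drift
```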

Import cells and Database connections

As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.

The second cell will set your database connection. You will have to edit one section: engine = get_engine()

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node

User Inputs

To load to the detections_yyyy tables, the notebook will require information about the schema you are working in. Please edit schema = 'collectioncode' to include the relevant project code, in lowercase, between the quotes.

Calculating Time Drift Factors

The first function is create_tbl_time_drift_factors. It will create the time_drift_factors table in the schema if it doesn’t exist.

The next step is to run check_time_drifts which gives a display of the time drift factor values that will be added to the time_drift_factors table given an events table. At this stage, you should review for any erroneous/large timedrifts.

If everything looks good, you may proceed to the next cell which adds new time drift factors to the time_drift_factors table from the events file. A success message will appear:

Adding XXX records to collectioncode.time_drift_factors table from collectioncode.events... OK!

You will then see a cell to create missing views, which creates the time drift views the database will use to calculate drift values for both the detections_yyyy and sensor_match_yyyy tables.

Correcting Time Drift

Finally, we are ready to update the times in both the detections_yyyy and sensor_match_yyyy tables with corrected time values using the vw_time_drift_cor database view.

The notebook should identify all of the years affected by the detections-2 loading steps, so it knows which tables need to be corrected.

Once the timedrift calculation is done (indicated by ✔️ green checkmarks), you can move on.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME timedrift correction for affected detection and sensor years ("detections-2b" notebook)

Verify Detections After Time Drift Calculations

After running the above cells you will then verify the time drift corrections on the detections_yyyy and sensor_match_yyyy tables.

The output will have useful information:

The notebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors, contact OTN for next steps.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME verify timedrift corrections ("detections-2b" notebook)

detections - 3 - detections_yyyy into otn_detections

The detections - 3 notebook moves the detections from detections_yyyy and sensor_match_yyyy tables into the final otn_detections_yyyy tables. This will join the detections records to their associated deployment records, providing geographic context to each detection. If there is no metadata for a specific detection (no receiver record to match with) it will not be promoted to otn_detections_yyyy.
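The join described above can be sketched in pandas (receiver serials and dates are made up; the real notebook works against database views):

```python
import pandas as pd

# Hypothetical sketch of the detections-3 join: a detection is only
# promoted if its receiver has a deployment record covering its timestamp.
deployments = pd.DataFrame({
    'receiver': ['VR2W-123456'],
    'deploy_date': pd.to_datetime(['2021-01-01']),
    'recover_date': pd.to_datetime(['2021-06-01']),
})
detections = pd.DataFrame({
    'receiver': ['VR2W-123456', 'VR2W-123456', 'VR2W-999999'],
    'datecollected': pd.to_datetime(['2021-03-15', '2021-07-01', '2021-03-15']),
})
joined = detections.merge(deployments, on='receiver', how='left')
covered = joined[(joined.datecollected >= joined.deploy_date)
                 & (joined.datecollected <= joined.recover_date)]
print(len(covered))  # only the detection inside a deployment window is promoted
```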

Import cells and Database connections

As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.

The second cell will set your database connection. You will have to edit one section: engine = get_engine()

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.load.oceantrack.org Database:otnunit User:admin Node:OTN

User Inputs

The notebook will require information about the schema you are working in. Please edit schema = 'collectioncode' to include the relevant project code, in lowercase, between the quotes.

Before moving on from this, you will need to confirm two things:

1) Confirm that NO Push is currently ongoing.

2) Confirm that the rcvr_locations for this schema have been verified.

If a Push is ongoing, or if verification has not yet occurred, you must wait for it to be completed before processing beyond this point.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME manually check for open, unverified receiver metadata, **STOP** if it exists! **(put GitLab issue number here)**

Creating detection views and loading to otn_detections

Once you are clear to continue loading, you can run create_detection_views. This function, as its name implies, will create database views for the detection data.

Output will look like:

Creating view collectioncode.vw_detections_2020... OK
Creating view collectioncode.vw_sentinel_2020... OK
Creating view collectioncode.vw_detections_2021... OK

These views are then used by the function in the next cell, load_into_otn_detections_new, which loads the detections from those views into otn_detections. You will be asked to select all relevant tables here, using a dropdown menu and checkboxes.

You must select all years that were impacted by the detections_yyyy or sensor_match_yyyy loading steps, then click the Load Detections button to begin loading. A status bar will indicate your progress.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME load to otn_detections_yyyy ("detections-3" notebook)

Verify OTN Detections

After running the cells above, you will then verify the otn_detections_yyyy tables.

The output will have useful information:

The notebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors, contact OTN for next steps.

Task list checkpoint

In GitLab, these tasks can be completed at this stage:

- [ ] - NAME verify otn_detections_yyyy (`detections-3` notebook)
- [ ] - NAME load sentinel records (`detections-3` notebook)

detections - 3b - missing_metadata_check

This notebook checks for detections that have not been inserted into otn_detections_yyyy, which may indicate missing receiver metadata.

The user will be able to set a threshold for the minimum number of detections to look at (default is 100). It will also separate animal detections from transceiver detections in a graph. At the end, it will show a SQL command to run so that the missing metadata can be viewed in table format.

Import cells and Database connections

As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.

The second cell will set your database connection. You will have to edit one section: engine = get_engine()

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node

User Inputs

Information regarding the tables we want to check against is required.

  1. schema = 'collectioncode'
    • edit to include the relevant project code, in lowercase, between the quotes.
  2. years = []
    • A comma-separated list of detection table years for detections_yyyy, should be in form [yyyy] or a list such as [yyyy,yyyy,'early']

Once you have edited these values, you can run the cell. You should see the following success message:

All tables exist!

Check for Missing Metadata in Detections_yyyy

This step will perform the check for missing metadata in detections_yyyy and display results for each record where the number of excluded detections is greater than the specified threshold.

First, enter your threshold, formatted like threshold = 100. Then you may run the cell.

The output will include useful information:

There will be a visualization of the missing metadata for each instance where the number of missing detections is over the threshold.
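The threshold behaviour described above can be sketched as follows (receiver serials and counts are made up for illustration):

```python
import pandas as pd

# Hypothetical sketch of the detections-3b threshold logic: count the
# detections that failed to match deployment metadata, per receiver,
# and report only receivers over the chosen threshold.
threshold = 100
excluded = pd.DataFrame({'receiver': ['VR2W-123456'] * 150 + ['VR2W-567891'] * 20})
counts = excluded['receiver'].value_counts()
over = counts[counts > threshold]
print(over.index.tolist())  # only receivers exceeding the threshold are reported
```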

Any instance with missing detections (greater than the threshold) should be identified, and the results pasted into a new GitLab Issue. The format will look like:

VR2W-123456
	missing XX detections (0 transceiver tags, XX animal tags, 0 test tags) (before deployments) (YYYY-MM-DD HH:MM:SS to YYYY-MM-DD HH:MM:SS)

VR2W-567891
	missing XX detections (0 transceiver tags, XX animal tags, 0 test tags) (before deployments) (YYYY-MM-DD HH:MM:SS to YYYY-MM-DD HH:MM:SS)

There are also two cells at the end that allow you to create reports for researchers in CSV or HTML format.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME check for missing receiver metadata ("detections-3b" notebook)

detections - 3c - missing_vrl_check

This notebook will check for missing data files in the database by comparing the rcvr_locations and events tables. For any receiver deployments that are missing events, it will check if there are detections during that time period for that receiver.
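The comparison described above can be sketched as follows (serials and dates are made up; the real notebook queries the database tables):

```python
import pandas as pd

# Hypothetical sketch of the detections-3c logic: flag receiver
# deployments that have no data-download event in the events table.
rcvr_locations = pd.DataFrame({
    'receiver': ['VR2W-111111', 'VR2W-222222'],
    'deploy_date': pd.to_datetime(['2021-01-01', '2021-01-01']),
    'recover_date': pd.to_datetime(['2021-06-01', '2021-06-01']),
})
events = pd.DataFrame({
    'receiver': ['VR2W-111111'],
    'date': pd.to_datetime(['2021-06-01']),  # offload recorded for one unit only
})
missing = rcvr_locations[~rcvr_locations['receiver'].isin(events['receiver'])]
print(missing['receiver'].tolist())  # deployments with no data file on record
```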

Import cells and Database connections

As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.

The second cell will set your database connection. You will have to edit one section: engine = get_engine()

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node

User Inputs

Information regarding the tables we want to check against is required. Please edit schema = 'collectioncode' to include the relevant project code, in lowercase, between the quotes.

There are also optional fields:

Once you have edited the values, you can run the cell. You should see the following success message:

Checking if collectioncode schema exists...
OK

Checking For Missing Data Files

The next cell will begin scanning the project’s schema to identify if there are any missing data files. If there are no missing files, you will see this output:

Checking if all deployment periods have events...
X/X
Checking if deployment periods missing events have detections...
⭐ All deployment periods had events! Skipping detection check. ⭐

If the notebook detects missing data files, you will see this output:

Checking if all deployment periods have events...
XXX/XXX
Checking if deployment periods missing events have detections...
XX/XX

Displaying Missing Data Files

Now the notebook will begin plotting a Gantt chart, displaying the periods of deployment for which the database is missing data files. There are some optional customizations you can try:

Running the cell will save your configuration options, and the next cell creates the chart(s).

The plot will have useful information:

Exporting Results

After investigating the Gantt chart, a CSV export can be created to send to the researcher.

First, you must select which types of records you’d like to export from this list:

Then, the next cell will print the relevant dataframe, with an option below to Save Dataframe. Simply type the intended filename and filetype into the File or Dataframe Name box (ex. missing_vrls_collectioncode.csv) and press Save Dataframe. The file should now be available in your ipython-utilities folder for dissemination. Please track this information in a new GitLab ticket.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME check for missing data records ("detections-3c" notebook)

events - 3 - create download records

This notebook will promote the events records from the intermediate events table to the final moorings records. Only use this notebook after adding the receiver records to the moorings table, as this process is dependent on receiver records.

Import cells and Database connections

As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.

The second cell will set your database connection. You will have to edit one section: engine = get_engine()

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node

User Inputs

Information regarding the tables we want to check against is required. Please edit schema = 'collectioncode' to include the relevant project code, in lowercase, between the quotes.

Once you have edited the value, you can run the cell.

Detecting Download Records

The next cell will scan the events table looking for data download events, and attempt to match them to their corresponding receiver deployment.

You should see output like this:

Found XXX download records to add to the moorings table

The next cell will print out all the identified download records, in a dataframe for you to view.

Loading Download Records

Before moving on from this, you will need to confirm two things:

1) Confirm that NO Push is currently ongoing.

2) Confirm that the rcvr_locations for this schema have been verified.

If a Push is ongoing, or if verification has not yet occurred, you must wait for it to be completed before processing beyond this point.

If everything is OK, you can run the cell. The notebook will indicate success with a message like:

Added XXX records to the moorings table

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME load download records ("events-3" notebook)

Verify Download Records

This cell will have useful information:

The notebook will indicate the table has passed verification by the presence of ✔️green checkmarks.

If there are any errors, contact OTN for next steps.

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME verify download records ("events-3" notebook)

events-4 - process receiver configuration

This notebook will process the receiver configurations (such as MAP code) from the events table and load them into the schema’s receiver_config table. This is a new initiative by OTN to document and store this information, to provide better feedback to researchers regarding the detectability of their tag-programming through time and space.

Import cells and Database connections

As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.

The second cell will set your database connection. You will have to edit one section: engine = get_engine()

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node

User Inputs

Information regarding the tables we want to check against is required. Please edit schema = 'collectioncode' to include the relevant project code, in lowercase, between the quotes.

Once you have edited the value, you can run the cell.

Get Receiver Configuration

Using the receiver deployment records and the information found in the events table, this cell will identify important configuration information for each deployment. A dataframe will be displayed.

The following cell will extrapolate further to populate all the required columns from the receiver_config table. A dataframe will be displayed.

Load Configuration to Database

Finally, the notebook will insert the identified records into the receiver_config table. You should see the following success message, followed by a dataframe:

The following XX receiver configurations are new and have been inserted:

Task list checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME process receiver configuration ("events-4" notebook)

Final Steps

The remaining steps in the GitLab Checklist are completed outside the notebooks.

First: you should access the Repository folder in your browser and ensure the raw detections are posted in the Data and Metadata folder.

Finally, the Issue can be passed off to an OTN-analyst for final verification in the database.

Key Points

  • It’s important to handle errors when they come up, as they can have implications for detections

  • OTN finishes off detections Issues by running Matching and sensor tag processing


Supplementary notebooks

Overview

Teaching: 15 min
Exercises: 0 min
Questions
  • What other Nodebooks could be useful for me to know as a Node Manager?

Objectives
  • Learn about Nodebooks that are useful for Node Managers outside the core data loading notebooks

OTN maintains several additional Nodebooks that fall outside the core tag, deployment and detection tools. These may be useful to Node managers who also deal with these particular scenarios.

Check Environment

This notebook checks your system Python environment against our environment.yml. This is to see if the Python packages and libraries you have installed are in line with what is required to run the ipython Notebooks. This will assist you with updating your packages if they become out-of-date, or if OTN develops and publishes new tools which rely on new packages.
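The core idea can be sketched with the standard library alone: confirm that each module the Nodebooks rely on is actually importable. The module list below is illustrative only — the real notebook compares installed packages and versions against environment.yml:

```python
import importlib.util

# Illustrative requirement list -- the real check parses environment.yml.
required_modules = ["sqlite3", "json", "not_a_real_module_xyz"]

# find_spec returns None when a top-level module cannot be found.
missing = [m for m in required_modules if importlib.util.find_spec(m) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("Environment looks up to date.")
```

A version-aware check would additionally compare `importlib.metadata.version(pkg)` against the pinned versions in environment.yml.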

scientific_name_check

This notebook uses WoRMS to check animal common and scientific names. It is used to add new species names to the obis.scientificnames table for use in each project. The instructions for using this notebook are the same as the Adding Scientific Names section in the Create and Update Projects notebook.

Add Instrument Models to Database

This notebook is used to add an instrument model to the obis.instrument_models table.

insert_vendor_sheet

Used to load manufacturer specifications for tags or receivers into the vendor tables.

convert - Fathom (vdat) Export - VRL to CSV

Convert VRLs or VDATs into CSV files using command-line Fathom software. Can also be done using the Fathom app.
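When automating conversions in bulk, the command-line tool can be driven from Python via subprocess. The executable name and flags below are assumptions for illustration — consult your installed Fathom/vdat CLI's help output for the actual syntax:

```python
import subprocess

def vdat_convert_cmd(vrl_path, output_dir, vdat_exe="vdat"):
    """Build a hypothetical vdat conversion command (executable name and
    flags are assumptions -- check `vdat --help` on your installation)."""
    return [vdat_exe, "convert", "--format=csv.fathom",
            f"--output={output_dir}", str(vrl_path)]

cmd = vdat_convert_cmd("VR2W_123456_20230101_1.vrl", "csv_out")
# In real use, run it and fail loudly on errors:
# subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes the call easy to log and dry-run before touching real data.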

Active Tracking

Handles active tracking data - detections collected by using a VR100 hydrophone (or similar) during “mobile” tracking activities.

Slocum Telemetry Notebook

This notebook will process and load detection data collected by a slocum glider mission. Information required includes glider telemetry, glider metadata, and detection files.

Load Health Report

This notebook will load “health reports” collected by LiquidRobotics WaveGliders while remotely offloading VR4 receivers.

Telemetry Processing for Wave Gliders and Animals

This notebook will process and load detection data collected by a WaveGlider mission. Information required includes glider telemetry, glider metadata, and detection files.

Create and Update Contacts Notebook

This notebook can be used to add new contacts to a project and update existing contacts in the database. Note: you cannot change someone’s email address using this tool.

Active Tags and IUCN Status

Located in the vis subfolder. This creates a summary report of Tag Life, Tags, Detections, Stations. Tailored for OTN’s reporting requirements and data policy.

Generate Receiver Map

Located in the vis subfolder. This creates a map to show receivers from Nodes. Tailored for OTN’s reporting requirements and data policy.

Receiver Operator Report

Located in the vis subfolder. This notebook generates a summary report for receiver operators to describe what animals have been seen and when. Tailored for OTN’s reporting requirements and data policy.

DB-Fix Notebooks

These are a series of notebooks for fixing common issues found in the database. These notebooks are beyond the scope of the current training but eventually Data Managers who wish to learn more will be able to take further training. In the meantime, if you see notes in the notebooks such as “Use the DB Fix notebook called XXXX to correct this error”, please contact OTN for assistance.

Key Points

  • ipython-utilities contains many notebooks beyond the core data-loading tools that are useful to Node Managers


Data Push

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • What is a Data Push?

  • What do I need to do as a Node Manager for a Data Push?

  • What are the end products of a Data Push?

Objectives
  • Understand a basic overview of a Data Push

  • Understand what a Node Manager’s responsibilities are during a Data Push

  • Understand the format of Detection Extracts

What is a Data Push?

A Data Push is when the OTN data system is re-verified and any new relevant information is sent to researchers. Loading of new data is paused so that what's in the system can be reliably verified; this way, any issues found can be fixed and the data can be in the best form possible given the information available at that moment. Once verification is done, detections are matched across Nodes and Detection Extracts are sent out to researchers. This is also when summary schemas such as discovery, erddap, and geoserver are updated with the newly verified data.

What is the Push Schedule?

Push events happen three times a year. They start on the third Thursday of the “push month”: February, June, or October. This date is the cut-off for all data-loading: no records can be loaded after it. Please aim to have all tickets ready for verification one week before this date.
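As a quick sanity check, the cut-off rule above can be computed for any year with the standard library:

```python
import calendar
from datetime import date

def push_cutoff(year, month):
    """Return the third Thursday of a push month (February, June, or October)."""
    thursdays = [d for d in calendar.Calendar().itermonthdates(year, month)
                 if d.weekday() == calendar.THURSDAY and d.month == month]
    return thursdays[2]

# Data-loading cut-off dates for the three push months of a given year.
cutoffs = [push_cutoff(2023, m) for m in (2, 6, 10)]
```

Note the `d.month == month` filter: `itermonthdates` pads with days from neighbouring months, which would otherwise skew the count.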

With the increased number of Nodes joining the Pushes, we are announcing the schedule for the next year. Please prepare in advance and mark your calendars.

Push schedule through 2023:

Node Manager Roles During a Push

Node Managers have two main jobs during a Push:

  1. The first job is to get the Node’s data loaded in time for the cut-off date. Data will be submitted by researchers on a continuous basis, but will likely increase just before a cut-off date. We recommend loading data as it arrives, to attempt to prevent a backlog near the Push date.
  2. The second job for Node Managers is to create and send out Detection Extracts when they are ready to be made. This will be done using the detections - create detection extracts notebook.

Once the cut-off date has passed, Node Managers are “off duty”! When it's time for Detection Extracts to be created and disseminated, that task will be assigned to the Node Managers, but this does not signify the end of the Push: several more “behind the scenes” steps are required.

Please refrain from interacting with the Node Database until OTN staff have announced the Push has ended and data may be loaded again.

Detection Extracts

Detection Extracts are the main output of the Push. They contain all the new detection matches for each project. There are multiple types of detection extracts OTN creates:

Detection Extract files are formatted for direct ingestion by analysis packages such as glatos and resonATe.

Detections - Create Detection Extracts Notebook

During the Push process, any new detection matches that are made are noted in the obis.detection_extracts_list table of your Node. These entries will have several pieces of useful information:

Using these fields, the detections-create detection extracts notebook can determine which extracts need to be created for each push.

Imports cell

This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.

There are no values here which need to be edited.

User Inputs Database Connection

  1. outputdir = 'C:/Users/path/to/detection extracts/folder/'
    • Within the quotes, please paste a filepath to the folder in which you’d like to save all the Detection Extracts.
  2. engine = get_engine()
    • Within the parentheses, open quotation marks and paste the path to your database .kdbx file, which contains your login credentials.
    • On macOS, you can usually copy the path to your .kdbx file by right-clicking on it while holding down the “option” key. On Windows, we recommend the installed software Path Copy Copy, which lets you copy a unix-style path by right-clicking.
    • The path should look like engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx').

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.your.org Database:your_db User:node_admin Node: Node
Testing dblink connections:
	saf-on-fact: DBLink established on user@fact.secoora.org:5002 - Node: FACT
	saf-on-migramar: DBLink established on user@db.load.oceantrack.org:5432 - Node: MIGRAMAR
	saf-on-nep: DBLink established on user@db.load.oceantrack.org:5432 - Node: NEP
	saf-on-otn: DBLink established on user@db.load.oceantrack.org:5432 - Node: OTN
	saf-on-act: DBLink established on user@matos.asascience.com:5432 - Node: ACT
	saf-on-pirat: DBLink established on user@161.35.98.36:5432 - Node: PIRAT
	saf-on-path: DBLink established on user@fishdb.wfcb.ucdavis.edu:5432 - Node: PATH

You may note that there are multiple DB links required here: this is so that you will be able to include detection matches from all the Nodes. If your kdbx file doesn’t include any of your DB link accounts, reach out to OTN to help set it up for you.

Detection Extract Selection

There are two options for selecting which Detection Extracts to create:

  1. There is a manual entry cell. Here you can paste a list of extracts in this format (one per line):
    • project code (capitals), year, type
  2. There is a cell to query the obis.detection_extracts_list table. This is the preferred method.
    • enter the current Push date like push_date = 'YYYY-MM-DD'

Once you have a list of the Detection Extracts to create, you can move on. The next cell will create a list of all the extracts that were just created, which you can use for your own records. It will save in your ipython-utilities folder.
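Option 2 boils down to a date-filtered query against the extracts list. An illustrative sketch follows — the real table is obis.detection_extracts_list on your Node, and the column names here are assumptions, with an in-memory SQLite database standing in for your Node connection:

```python
import sqlite3

# Stand-in for the Node database and its extracts-list table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detection_extracts_list "
             "(collectioncode TEXT, year INT, det_type TEXT, push_date TEXT)")
conn.executemany(
    "INSERT INTO detection_extracts_list VALUES (?, ?, ?, ?)",
    [("HFX", 2023, "qualified", "2023-02-16"),
     ("NSBS", 2022, "sentinel", "2022-10-20")])

push_date = "2023-02-16"  # current Push date, YYYY-MM-DD
to_create = conn.execute(
    "SELECT collectioncode, year, det_type FROM detection_extracts_list "
    "WHERE push_date = ?", (push_date,)).fetchall()
```

Only the entries flagged with the current Push date are returned, which is why this is the preferred, less error-prone method over manual entry.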

Create Detection Extracts

This cell will begin creating the identified detection extracts, one by one. You will be able to see a summary of the matched projects for each extract. Please wait for them all to complete - indicated by a green checkmark and a summary of the time it took to complete the extract.

The following section applies only to Nodes that use Plone as their document management system

Uploading Extracts to Plone

First the notebook will print a list of all the extracts that need to be uploaded. It should match the list of those just created.

Next, you will need to connect to Plone using a .auth file. The format will be like this: plone_auth_path = r'C:/path/to/Plone.auth'. Success will be indicated with this message:

Plone authorization was successful.
Connected to 'https://members.oceantrack.org' as 'USER'

Now the notebook will upload all the Detection Extracts into their relevant folders on Plone.

Please wait for them all to complete - indicated by a green checkmark and a summary of the time it took to complete the extract.

Emailing Researchers - Plone

Using the Plone users system, it's possible to identify which researchers require an email notification. First you need to supply a .auth file for an email account. The format will be like this: email_auth_path = r'C:/path/to/email.auth'. Success will be indicated with this message:

Email auth is set:
 user= otndc@dal.ca
 host= smtp.office365.com
 cc= otndc@dal.ca
 port= XXX

Upon successful login, you will be able to print out your current email template. If it is not adequate, you can edit the template by changing the det_extracts_emailSpecial.j2 template in the templates subfolder of ipython-utilities, and changing the filepath to be email_template = 'templates/det_extracts_emailSpecial.j2', then re-running.

Finally, this stage will send the emails. Ensure that date = 'YYYY-MM-DD' for the date you uploaded the extracts to Plone. This is how the notebook will determine which links to include in the email template. First: set send_mail = False. Run the cell, select the projects of interest and Simulate Sending Emails. If you are pleased with the output, you can then change send_mail = True and re-run. Choose Send Emails and they will be sent.
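The send_mail toggle follows a common dry-run pattern: build every message either way, but only hand them to the mail server when the flag is True. A simplified sketch of that pattern (the contact fields, subject, and body are illustrative, not the notebook's actual template):

```python
from email.message import EmailMessage

def build_notifications(contacts, send_mail=False):
    """Build messages; only a send_mail=True run would actually dispatch them."""
    to_send, simulated = [], []
    for name, address in contacts:
        msg = EmailMessage()
        msg["To"] = address
        msg["Subject"] = "New Detection Extracts available"
        msg.set_content(f"Hello {name}, your Detection Extracts have been uploaded.")
        # Route each message by the flag instead of branching the whole loop,
        # so the simulated run exercises exactly the same message-building code.
        (to_send if send_mail else simulated).append(msg)
    # A real run would open an SMTP connection and send everything in to_send.
    return to_send, simulated

to_send, simulated = build_notifications([("A. Researcher", "a.researcher@example.org")])
```

Because the simulated pass builds the identical messages, reviewing its output is a faithful preview of what `send_mail = True` will actually send.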

Emailing Researchers - Manual

If you are not using the Plone system for sending emails, you can use the manual email tool.

You will first enter the collectioncode which has the new Detection Extract file requiring notification: cntct_schema = 'schema'. Please edit to include the project code, in lowercase, between the quotation marks.

An interactive menu will then appear, listing all the contacts associated with the project. You can select each of those you’d like to email.

The following cell will print out the resulting contact list.

Next, you will be able to review the email template. Required input includes:

If the email preview is acceptable, you may run the final cell in this section which will send the emails.

Update Extract Table

Once all extracts are made, uploaded to your file management system and emails have been sent to researchers, the final step is to ensure we mark in the obis.detection_extracts_list table that we have completed these tasks.

Please enter current_push_date = 'yyyy-mm-dd': the date of the Push for which these extracts were made.

Then, an interactive dataframe will appear, allowing you to check off as completed the extracts you were able to successfully create.
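Behind the checkboxes, this bookkeeping step reduces to an UPDATE flagging the extracts you created for the current Push. An illustrative sketch (column names are assumptions, with SQLite standing in for your Node database):

```python
import sqlite3

# Stand-in extracts-list table with a completion flag.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detection_extracts_list "
             "(collectioncode TEXT, push_date TEXT, completed INTEGER)")
conn.execute("INSERT INTO detection_extracts_list VALUES ('HFX', '2023-02-16', 0)")

current_push_date = "2023-02-16"
created = ["HFX"]  # extracts you successfully created and distributed

# Mark only the extracts you checked off, scoped to the current Push date.
conn.executemany(
    "UPDATE detection_extracts_list SET completed = 1 "
    "WHERE push_date = ? AND collectioncode = ?",
    [(current_push_date, code) for code in created])

done = conn.execute(
    "SELECT completed FROM detection_extracts_list WHERE collectioncode = 'HFX'"
).fetchone()[0]
```

Scoping the update by both project code and Push date keeps earlier Pushes' records untouched.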

Now you’re done with Detection Extracts until next Push!

Key Points

  • A Data Push is when we verify all the data in the system, fix any issues, and then provide detection matches to researchers

  • As a Node Manager, it's your responsibility to get the data into the system so OTN can verify it and get it ready to be sent out

  • Detection Extracts are the main end product of the Push


Fixing Data Errors

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • How do I identify data errors?

  • How do I correct data errors?

Objectives
  • Understand the workflow for fixing data errors.

Identifying Errors

During the process of loading data it is possible to uncover errors with previously-loaded datasets. OTN is constantly improving our QA/QC tools, which means we are identifying and correcting more and more historical errors.

Generally, there are a few ways Node Managers will identify errors:

In the latter case, full comparison between the records is required, followed by a discussion with the researcher to identify if the previously-loaded records or the new records are correct. Often, the outcome is that the data in the DB needs correction.

Scoping the Required Correction

Once an error has been identified, the correction needs to be scoped. This includes: which tables are affected by the error, which catalognumbers, etc. Please use the DB Fix Issue GitLab template.

Here is the template for reference:

# **DB Fix Issue**

__This issue title should include schema name, year of record affected__
- eg "HFX 2020 receiver lat/long change"

__add DB fix label to issue__

## Related gitlab issue:
- eg #1234, [paste link to issue]

## Schema:
- eg HFX

## DB table name:
- eg HFX.moorings

### information needed for each record that needs fixing (repeat for each catalog number):
1. unique identifier:
1. original value:
1. new value:

- [ ] NAME - label issue `db fix`
- [ ] NAME - tag OTNDC staff for assistance with request (@diniangela etc)

Once fully scoped, you may assign the Issue to OTN Staff to complete the correction. If it is a simple process, they may provide instructions to the Node Manager to complete. Do not attempt to correct the issue yourself without consultation with OTN.

Database Fix Notebooks

Currently, the OTN Database Team is working on a suite of notebooks for fixing common issues found in the database. These are the tools that the OTN Team will be using to correct the identified errors, if the tool to do so already exists.

These notebooks are beyond the scope of the current training but eventually Data Managers who wish to learn more will be able to take further training. In the meantime, if you see notes in the notebooks such as “Use the DB Fix notebook called XXXX to correct this error”, please create a DB Fix Issue ticket and pass it off to OTN.

Key Points

  • Errors are corrected via scoped DB Fix Issues; OTN is developing tools to support this.


Upholding the Data Policy

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • How do I handle data requests?

Objectives
  • Understand how to uphold the Network’s Data Policy.

As a Node Data Manager, you will have the key responsibility of upholding your Network's Data Privacy Policy. You have access to private datasets that not even your direct supervisors can see.

For this reason, you need to be intimately familiar with the types of data which are under embargo/private, and what is publicly available for your Network. You will need to develop an internal protocol for when requests for data access are submitted, to ensure appropriate care is taken to protect the integrity of your members’ data.

OTN recommends creating a “Data Request Response Policy”.

External Requests

This is an example of how OTN handles these requests:

  1. Request for data (from person other than data owner) submitted
  2. Data request is scoped, in a GitLab Issue. All details from requester is included.
  3. Impacted PIs are identified, and contacted, seeking written permission for requester to access the information.
  4. Written permission is documented in GitLab Issue, to preserve the paper-trail.
  5. Data request report is compiled and provided to requester, once all permissions have been received.

Internal Requests

Internal requests, from other Network staff, need to be handled in a similar way. Although all Network staff should be familiar with the Data Policy, this may not always be the case. It is the Node Manager's responsibility to ensure that information from the database is not shared outside of internal reports. Track requests from all sources (for OTN, this includes even the OTN Glider and Field teams), so you have a record of who knows what and can officially remind them of your policies.

Partner-Node Data Policies

It’s important to remember that the Data Policy of the Network which holds the tag records will apply to the detections of those tags, in other Nodes. You cannot share information/detections of tags from outside your Node, without first consulting with your partner’s Data Policy. This will likely not be an issue for most Nodes, but is a key consideration for OTN staff when creating data products.

For guidance on the interpretation of your Data Policy, or the Data Policy of a partner Node, OTN's Director of Data Operations is available to assist.

Key Points

  • Node Managers have a key responsibility to ensure the Data Policy is followed.


Nodebook Improvement and Development

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • How do I request new features?

  • How do I report bugs to OTN?

  • What is my role in Nodebook development?

Objectives
  • Understand the process for developing the Nodebooks

  • Understand the process for testing new Nodebook changes

Once you begin using the OTN Nodebooks, you will likely discover tools that are missing, or features that have bugs. OTN developers are here to help ensure the Nodebooks meet your needs. We are constantly refining our processes and are always open to suggestions for improvement! These tools are for you, and we want to ensure they are useful.

New Features

If there is a feature that you’d like to see, you can bring this to OTN’s attention in this way:

  1. Ask in our Slack channels to see if there is a tool that already exists that will fit your needs. Discuss your ideal-feature with OTN developers to help them understand what you need.
  2. Once the new feature is properly scoped, create a new GitLab Issue here https://gitlab.oceantrack.org/otn-partner-nodes/ipython-utilities/-/issues, using the new_feature template. You should assign to one of the OTN developers, who can begin to work on the request.

Here is the “new_feature” template, for your information:

            Note that this template is **ONLY** used for tracking new features. Bugs should be filed separately using the Bug template.

            The new feature should be proposed, scoped, and approved **before** you fill out this template.

            ## Summary
            **Provide a general summary of the feature to be built.**

            ## Rationale
            **Why do we want this?**

            ## End Users
            **Who is expected to use this functionality most?**

            ## Scope
            **What are the expected qualities and functionality of this new feature? One per line.**

            ## Minimum viable product
            **What subset of the above functionality is necessary before the feature is considered finished?**

            ## Timeline
            **Give a brief, ballpark estimate on when you would want this done.**

Bug-fixes

If you encounter an error in your Nodebooks, it's possible there is an issue with your dataset. Sometimes, however, the Nodebook itself is not functioning as expected! If you believe a certain Nodebook is malfunctioning, you will want to report this bug to OTN developers as soon as possible.

To report a bug, here are the steps to take:

  1. Ask in our Slack channels to see if the error is caused by your dataset. This can include posting an error message, or just describing the output from the Nodebook, and why it is not as expected.
  2. If OTN developers identify that the problem is not your dataset, the next step will be to create a GitLab Issue here https://gitlab.oceantrack.org/otn-partner-nodes/ipython-utilities/-/issues, using the bug template. You should assign to one of the OTN developers, and use the label bugfix.

Here is the “bug” template, for your information:

            Note that this template is **ONLY** used for reporting bugs. New features must be vetted and filed separately using the New Feature template.

            As such, before you continue, ask yourself:
            **Does this ticket concern existing functionality that is broken? Or is it a deficiency in the system as-designed that needs net new work to complete?**
            If it's the former, then use this template, otherwise, use the New Feature template.

            ## Summary
            **Provide a general summary of the problem.**

            ## Expected Behavior
            **What should be happening?**

            ## Current Behavior
            **What is actually happening?**

            ## Steps to Reproduce
            **How can I see this bug happening? Where does it happen? Provide links to specific pages where the bug is visible, along with steps to reproduce it.**

            ## Priority
            **On a scale from 1 to 5, approximate the severity of the issue. Be honest.**

Testing Features/Fixes

Once OTN developers have attempted to build your new feature, or fix your bug, they will publish their code into the integration branch of our ipython-utilities Nodebooks. You will be asked to test these changes, to ensure the end-product is satisfactory. This branch is used for development and testing only, and should not be used for data-loading outside of testing.

Here are instructions for changing branches:

(Replace anything in <> with the sensible value for your installation/situation)

Open git bash or terminal.

From your git bash window, type:

cd "/path/to/your/ipython-utilities"

This will put your terminal or git bash session in the folder where this codebase lives on your computer. Now we're going to discard any local changes, switch branches, and pull the latest code:

git reset --hard
git checkout <branch you are moving to ie:integration>
git pull

Resolve python libraries by using the mamba command:

mamba env update -n root -f environment.yml

You will then be able to test the new code. When you are finished, follow the same steps to switch back to the master branch to continue your work. The master branch contains all the production-quality tools.

Key Points

  • The Nodebooks are for you - OTN wants to ensure they remain useful