Deployment Metadata

Overview

Teaching: 30 min
Exercises: 0 min
Questions
  • How do I load new deployments into the Database?

Objectives
  • Understand how to complete the template

  • Understand how to use the GitLab checklist

  • Learn how to use the Deploy notebook

Process workflow

The process workflow for deployment metadata is as follows:

flowchart LR
    tag_start(( )) --> get_meta(Receive<br>deployment metadata<br>from researchers)
    style tag_start fill:#00FF00,stroke:#00FF00,stroke-width:4px
    get_meta --> gitlab(Create<br>Gitlab<br>issue)
    gitlab --> inspect(Visually<br>inspect)
    inspect --> nodebook(Process and verify<br>with nodebooks)
    nodebook --> plone(Add metadata<br>to repository folder)
    plone --> otn(Pass to<br>OTN)
    otn --> end2(( ))
    style end2 fill:#FF0000,stroke:#FF0000

Once a project has been registered, the next step (for Deployment and Data project types) is to quality control the instrument deployment metadata and load it into the database. Deployment metadata should be reported to the Node in the template provided here. This file contains information about the deployment of any instruments used to detect tagged subjects or collect related data, including stationary test tags, range test instruments, non-acoustic environmental sensors, and so on. For each instrument, it records the geographic location and the duration of the deployment. The locations of these listening stations are used to fix detections geographically.

Recall that there are multiple levels of data-tables in the database for deployment records: raw tables, rcvr_locations, stations and moorings. The process for loading instrument metadata reflects this, as does the GitLab task list.

Submitted Metadata

Immediately upon receipt of the metadata, create a new GitLab issue. Please use the Receiver_metadata Issue checklist template.

Here is the Issue checklist, for reference:

Receiver Metadata
- [ ] - NAME add label *'loading records'*
- [ ] - NAME load raw receiver metadata (`deploy` notebook) **put_table_name_in_ticket**
- [ ] - NAME check that station locations have not changed station "NAMES" since last submission (manual check)
- [ ] - NAME verify raw table (`deploy` notebook)
- [ ] - NAME post updated metadata file to project repository (OTN members.oceantrack.org, FACT RW etc)
- [ ] - NAME email notification of updated metadata file to PI and individual who submitted
- [ ] - NAME load station records (`deploy` notebook)
- [ ] - NAME verify stations (`deploy` notebook)
- [ ] - NAME load to rcvr_locations (`deploy` notebook)
- [ ] - NAME verify rcvr_locations (`deploy` notebook)
- [ ] - NAME add transmitter records receivers with integral pingers (`deploy` notebook)
- [ ] - NAME load to moorings (`deploy` notebook)
- [ ] - NAME verify moorings (`deploy` notebook)
- [ ] - NAME label issue with *'Verify'*
- [ ] - NAME pass issue to OTN DAQ for reassignment to analyst
- [ ] - NAME check if project is OTN loan, if yes, check for lost indicator in recovery column, list receiver serial numbers for OTN inventory updating.
- [ ] - NAME pass issue to OTN analyst for final verification
- [ ] - NAME check for double reporting (verification_notebooks/Deployment Verification notebook)

**receiver deployment files/path:**

Visual Inspection

Once the researcher provides the completed file, the Data Manager should complete a visual check for formatting and accuracy.

In general, the deployment metadata contains information on the instrument, the deployment location, and the deployment/recovery times.

Check for the following in the deployment metadata:

  1. Is there any information missing from the essential columns? These are:
    • otn_array
    • station_no
    • deploy_date_time
    • deploy_lat
    • deploy_long
    • ins_model_no
    • ins_serial_no
    • recovered
    • recover_date_time
  2. If any of the above mandatory fields are blank, follow-up with the researcher will be required if:
    • you cannot discern the values yourself.
    • you do not have access to the Tag or Receiver Specifications from the manufacturer (relevant for the columns containing transmitter information).
  3. Are the station names in the metadata consistent with those already loaded to the database? Watch for inconsistencies such as ‘_yyyy’ appended to station names, or special characters in the metadata.
  4. Do all lat/longs have the correct sign? Are they in the correct format (decimal degrees)?
  5. Do all transceivers/test tags have their transmitters provided?
  6. Are all recoveries from previous years recorded?
  7. Do comments suggest anything was lost or damaged, where recovery indicator doesn’t say “lost” or “failed”?
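
Checks 1 and 2 can be partly automated before you open the Nodebook. The following is a minimal sketch and is not part of the Deploy Nodebook. It assumes the metadata rows have already been read into a list of dictionaries keyed by column name. `find_blank_essentials` and the example rows are hypothetical; the column names are the essential columns listed above.

```python
# Essential columns, copied from the list above.
ESSENTIAL_COLUMNS = [
    "otn_array", "station_no", "deploy_date_time", "deploy_lat",
    "deploy_long", "ins_model_no", "ins_serial_no", "recovered",
    "recover_date_time",
]


def find_blank_essentials(rows):
    """Return (row_index, column) pairs where an essential value is blank."""
    problems = []
    for i, row in enumerate(rows):
        for col in ESSENTIAL_COLUMNS:
            value = row.get(col)
            # Treat missing keys, None and whitespace-only strings as blank.
            if value is None or str(value).strip() == "":
                problems.append((i, col))
    return problems


# Hypothetical example rows standing in for the metadata sheet.
example_rows = [
    {"otn_array": "ABC", "station_no": "01",
     "deploy_date_time": "2021-05-01T12:00:00",
     "deploy_lat": 44.6, "deploy_long": -63.5,
     "ins_model_no": "VR2W", "ins_serial_no": "123456",
     "recovered": "y", "recover_date_time": "2022-05-01T12:00:00"},
    {"otn_array": "ABC", "station_no": "02",
     "deploy_date_time": "2021-05-01T13:00:00",
     "deploy_lat": "  ", "deploy_long": -63.6,
     "ins_model_no": "VR2W",  # ins_serial_no is missing entirely
     "recovered": "y", "recover_date_time": "2022-05-01T13:00:00"},
]

blank_cells = find_blank_essentials(example_rows)
print(blank_cells)
```

Treat the result as a list of cells to review, not as automatic failures: whether a blank needs follow-up with the researcher is still a judgement call.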

In general, the most common formatting errors occur in records where more than one instrument is deployed at a station, or where the receiver was deployed and recovered from the same site.
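
Since stations with more than one instrument are a common source of formatting errors, it can help to list those stations and read their rows with extra care. This is a minimal sketch, assuming the rows are dictionaries keyed by the template column names. `stations_with_multiple_records` is a hypothetical helper, not a Nodebook function.

```python
from collections import Counter


def stations_with_multiple_records(rows):
    """Return sorted (otn_array, station_no) pairs that appear more than once."""
    counts = Counter((row["otn_array"], row["station_no"]) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical example: station 01 carries two instruments.
multi_rows = [
    {"otn_array": "ABC", "station_no": "01", "ins_serial_no": "123456"},
    {"otn_array": "ABC", "station_no": "01", "ins_serial_no": "654321"},
    {"otn_array": "ABC", "station_no": "02", "ins_serial_no": "111111"},
]

to_review = stations_with_multiple_records(multi_rows)
print(to_review)
```

A station appearing here is not an error in itself. It only marks the rows where a visual check is most likely to find one.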

The metadata template available here has a Data Dictionary sheet which contains detailed expectations for each column. Refer back to these definitions often. We have also included some recommendations on our FAQ page. Here are some guidelines:

Task List Checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - check that station locations have not changed station "NAMES" since last submission (manual check)
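
The manual station-name check amounts to comparing the names in the submitted sheet with those already in the database. The sketch below is one possible way to support that comparison. It assumes you have both sets of names as plain Python lists. `review_station_names` is hypothetical, and the allowed character set (letters, digits, underscore, hyphen) is an assumption you should adapt to your Node's naming rules.

```python
import re

YEAR_SUFFIX = re.compile(r"_\d{4}$")          # e.g. a trailing '_2021'
ALLOWED = re.compile(r"^[A-Za-z0-9_-]+$")     # assumed naming rule


def review_station_names(submitted, known):
    """Map each submitted name that is not already known to a reason."""
    known_set = set(known)
    flags = {}
    for name in submitted:
        if name in known_set:
            continue  # exact match with an existing station
        stripped = YEAR_SUFFIX.sub("", name)
        if stripped != name and stripped in known_set:
            flags[name] = f"existing station '{stripped}' with a year suffix"
        elif not ALLOWED.match(name):
            flags[name] = "contains special characters"
        else:
            flags[name] = "not in database, confirm it is a new station"
    return flags


# Hypothetical station names.
known_names = ["HFX001", "HFX002"]
submitted_names = ["HFX001", "HFX002_2021", "HFX 003!", "HFX004"]

name_flags = review_station_names(submitted_names, known_names)
for name, reason in name_flags.items():
    print(f"{name}: {reason}")
```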

Quality Control - Deploy Nodebook

Each step in the Issue checklist will be discussed here, along with other important notes required to use the Nodebook.

Imports Cell

This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.

There are no values here which need to be edited.

Path to File

In this cell, you need to paste a filepath to the relevant Deployment Metadata file. The filepath will be added between the provided quotation marks.

Correct formatting looks something like this:

# Shortform metadata path (xls, csv)
filepath = r'C:/Users/path/to/deployment_metadata.xlsx'

You must also select the format of the Deployment metadata. Currently, only the FACT Network uses a slightly different format than the template available here. If it is relevant for your Node, you can edit the excel_fmt section.

Correct formatting looks something like this:

excel_fmt = 'otn' # Deployment metadata format 'otn' or 'fact'

Once you have added your filepath and chosen your template format, you can run the cell.
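
If the cell fails, the cause is often a mistyped path or format value. The sketch below shows a quick pre-check you could run in a scratch cell. `check_inputs` is a hypothetical helper, not part of the Nodebook. The accepted file types are taken from the comment and example above, and the two format values from the `excel_fmt` line.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".xls", ".xlsx", ".csv"}
ALLOWED_FORMATS = {"otn", "fact"}


def check_inputs(filepath, excel_fmt):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    path = Path(filepath)
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append(f"unexpected file type: {path.suffix or '(none)'}")
    if not path.is_file():
        problems.append(f"file not found: {path}")
    if excel_fmt not in ALLOWED_FORMATS:
        problems.append(f"unknown metadata format: {excel_fmt!r}")
    return problems


# Hypothetical values with a wrong extension and a typo in the format.
for problem in check_inputs("no_such_dir/deployment_metadata.txt", "oton"):
    print(problem)
```

An empty list means the path points to an existing file of an expected type and the format value is one of the two the Nodebook mentions.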

Next, you must choose which sheet you would like to quality control. Generally, it will be named Deployment, but researchers often customize it. Once you have selected the sheet name, do not re-run the cell to save the output. Simply ensure the correct sheet is highlighted and move on to the next cell.

Table Name and Database

You will have to edit three sections:

  1. schema = 'collectioncode'
    • Please edit to include the relevant project code, in lowercase, between the quotes.
  2. table_name = 'c_shortform_YYYY_mm'
    • Within the quotes, please add your custom table suffix. We recommend using year_month or similar, to indicate the most-recently deployed/recovered instrument in the metadata sheet.
  3. engine = get_engine()
    • Within the parentheses, add quotation marks and paste the path to your database .kdbx file, which contains your login credentials.
    • On MacOS computers, you can usually find and copy the path to your database .kdbx file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
    • The path should look like engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx').
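
To keep table suffixes consistent between loads, the year_month suffix can be derived from the date of the most recent deployment or recovery in the sheet. This is only a sketch of that naming convention: `build_table_name` is a hypothetical helper and the date is an example value. The `c_shortform` prefix and the lowercase schema follow the descriptions above.

```python
from datetime import date


def build_table_name(latest_event, prefix="c_shortform"):
    """Compose a raw-table name such as c_shortform_YYYY_mm."""
    return f"{prefix}_{latest_event:%Y_%m}"


# The project code is lowercased, as requested for the schema value.
schema = "COLLECTIONCODE".lower()
# Example date of the most recently deployed/recovered instrument.
table_name = build_table_name(date(2023, 4, 17))

print(f"{schema}.{table_name}")
```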

Once you have added your information, you can run the cell. Successful login is indicated with the following output:

Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node

Verification of File Contents

Run this cell to complete the first round of Quality Control checks.

The output will have useful information:

The Nodebook will indicate the sheet has passed quality control by adding a ✔️ green checkmark beside each section. The Nodebook will also generate an interactive plot for you to explore, summarizing the instruments deployed over time, and a map of the deployments.

Using the map, please confirm the following:

  1. The instrument deployment locations are in the part of the world expected based on the project abstract. Ex: lat/long have correct +/- signs.
  2. The instrument deployments do not occur on land.
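
The first map check can be approximated numerically by comparing each coordinate with a bounding box drawn around the expected study area. The sketch below assumes rows are dictionaries with `deploy_lat` and `deploy_long` values. `check_coordinates` and the example bounding box are hypothetical. It cannot tell whether a point is on land, so the map is still needed for the second check.

```python
def check_coordinates(rows, lat_range, lon_range):
    """Return (row_index, reason) pairs for coordinates that need review."""
    problems = []
    for i, row in enumerate(rows):
        try:
            lat = float(row.get("deploy_lat"))
            lon = float(row.get("deploy_long"))
        except (TypeError, ValueError):
            # Blank cells or degrees-minutes strings end up here.
            problems.append((i, "not decimal degrees"))
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append((i, "outside valid latitude/longitude range"))
        elif not (lat_range[0] <= lat <= lat_range[1]
                  and lon_range[0] <= lon <= lon_range[1]):
            # A flipped +/- sign usually lands here.
            problems.append((i, "outside expected study area"))
    return problems


# Hypothetical deployments and an example bounding box.
coord_rows = [
    {"deploy_lat": 44.6, "deploy_long": -63.5},
    {"deploy_lat": 44.6, "deploy_long": 63.5},            # sign flipped
    {"deploy_lat": "44 36.0 N", "deploy_long": "63 30.0 W"},
    {"deploy_lat": 144.6, "deploy_long": -63.5},
]

coord_problems = check_coordinates(
    coord_rows, lat_range=(43.0, 46.0), lon_range=(-66.0, -60.0)
)
print(coord_problems)
```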

If there is information that fails quality control, you should fix the source-file (potentially after speaking to the researcher) and try again.


Loading the Raw Table

ONLY once the source file has successfully passed ALL quality control checks can you load the raw table to the database.

You have already named the table above, so there are no edits needed in this cell.

The Nodebook will indicate the success of the table-creation with the following message:

Reading file 'deployment_metadata.xlsx' as otn formatted Excel.
Table Loading Complete:
 Loaded XXX records into table schema.c_shortform_YYYY_mm

Task List Checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - load raw receiver metadata ("deploy" notebook) **put_table_name_in_ticket**

Ensure you paste the table name (ex: c_shortform_YYYY_mm) into the indicated section before you check the box.

Verify Raw Table

This cell will now complete the Quality Control checks of the raw table. This is to ensure the Nodebook loaded the records correctly from the Excel sheet.

The output will have useful information:

The Nodebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors, go into the database and fix the raw table directly, or contact the researcher and then fix the raw table.

Task List Checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - NAME verify raw table ("deploy" notebook)

Loading Stations Records

STOP - confirm there is no Push currently ongoing. If a Push is ongoing, you must wait for it to be completed before processing beyond this point.

Only once the raw table has successfully passed ALL quality control checks can you load the stations information to the database stations table.

Running this cell will first check for any new stations to add, then confirm that the records in the stations table match the records in the moorings table where basisofrecord = 'STATION'.

If new stations are identified:

The success message will look like:

Adding station records to the stations table.
Creating new stations...
Added XX new stations to schema.moorings

If the stations and moorings tables are not in sync, you will need to compare the two tables for differences and possibly update one or the other.
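
Comparing the two tables comes down to a set difference on the station identifier. The sketch below illustrates the idea on rows that have already been pulled from the database into lists of dictionaries. `compare_station_tables` and the `station_name` column are hypothetical, so use whichever key your schema actually shares between the tables. Only `basisofrecord = 'STATION'` comes from the description above.

```python
def compare_station_tables(stations, moorings):
    """Report station names present in one table but not in the other."""
    station_names = {row["station_name"] for row in stations}
    mooring_names = {
        row["station_name"]
        for row in moorings
        if row["basisofrecord"] == "STATION"
    }
    return {
        "only_in_stations": sorted(station_names - mooring_names),
        "only_in_moorings": sorted(mooring_names - station_names),
    }


# Hypothetical table contents; 'OTHER' stands in for any non-station record.
stations_rows = [
    {"station_name": "HFX001"},
    {"station_name": "HFX002"},
    {"station_name": "HFX003"},
]
moorings_rows = [
    {"station_name": "HFX001", "basisofrecord": "STATION"},
    {"station_name": "HFX002", "basisofrecord": "STATION"},
    {"station_name": "HFX002", "basisofrecord": "OTHER"},
    {"station_name": "HFX004", "basisofrecord": "STATION"},
]

differences = compare_station_tables(stations_rows, moorings_rows)
print(differences)
```

Each name in the result is a starting point for deciding which table needs to be updated.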

Task List Checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - load station records ("deploy" notebook)

Verify Stations Table

This cell will now complete the Quality Control checks of the stations records contained in the entire schema. We are no longer only checking against our newly-loaded records, but also each previously-loaded record in this schema/project. This will help catch historical errors.

The output will have useful information:

The Nodebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors, you can connect to the database and fix the raw table directly, or contact the researcher and then fix the raw table using updated input metadata. If there are problems with records that have already been promoted to the stations or moorings tables, you will need to create a db fix ticket in GitLab to correct the records in the database.

Task List Checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - verify stations ("deploy" notebook)

Load to rcvr_locations

Once the station table is verified, the receiver deployment records can now be promoted to the “intermediate” rcvr_locations table.

The cell will identify any new deployments to add and any previously-loaded deployments which need updating (ex: they have been recovered).
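
The distinction between a new deployment and an update can be pictured as a keyed comparison between the raw table and the records already loaded. The sketch below illustrates that idea only and does not show how the Nodebook implements it. `classify_deployments` is hypothetical, and matching on station, serial number and deployment time is an assumed key.

```python
def deployment_key(row):
    """Assumed identity of a deployment: station, instrument and deploy time."""
    return (row["station_no"], row["ins_serial_no"], row["deploy_date_time"])


def classify_deployments(raw_rows, loaded_rows):
    """Split raw rows into brand-new deployments and changed existing ones."""
    loaded = {deployment_key(row): row for row in loaded_rows}
    new, updates = [], []
    for row in raw_rows:
        existing = loaded.get(deployment_key(row))
        if existing is None:
            new.append(row)          # never loaded before
        elif existing != row:
            updates.append(row)      # e.g. recovery details were added
    return new, updates


# Hypothetical records: one deployment has since been recovered.
loaded_rows = [
    {"station_no": "01", "ins_serial_no": "123456",
     "deploy_date_time": "2021-05-01T12:00:00",
     "recovered": "", "recover_date_time": ""},
]
raw_rows = [
    {"station_no": "01", "ins_serial_no": "123456",
     "deploy_date_time": "2021-05-01T12:00:00",
     "recovered": "y", "recover_date_time": "2022-05-01T12:00:00"},
    {"station_no": "02", "ins_serial_no": "654321",
     "deploy_date_time": "2022-05-01T14:00:00",
     "recovered": "", "recover_date_time": ""},
]

new_deployments, deployment_updates = classify_deployments(raw_rows, loaded_rows)
print(len(new_deployments), "new,", len(deployment_updates), "to update")
```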

If new deployments are identified:

If deployment updates are identified:

Each instance will give a success message such as:

Loading deployments into the rcvr_locations_table
Loaded XX records into the schema.rcvr_locations table.

Task List Checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - load to rcvr_locations ("deploy" notebook)

Verify rcvr_locations

This cell will now complete the Quality Control checks of the rcvr_locations records contained in the entire schema. We are no longer only checking our newly-loaded records, but also each previously-loaded record for this schema/project. This will help catch historical errors.

The output will have useful information:

The Nodebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors with records that have already been promoted to the rcvr_locations table, you will need to create a db fix ticket in GitLab to correct the records in the database. You may need to contact the researcher before resolving the error.

Task List Checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - verify rcvr_locations ("deploy" notebook)

Load Transmitter Records to Moorings

The transmitter values associated with transceivers, co-deployed sentinel tags, or stand-alone test tags will be loaded to the moorings table in this section. Existing transmitter records will also be updated, if relevant.

If new transmitters are identified:

If transmitter updates are identified:

Task List Checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - add transmitter records receivers with integral pingers ("deploy" notebook)

Load Receivers to Moorings

The final, highest-level table for instrument deployments is moorings.

The cell will identify any new deployments to add and any previously-loaded deployments which need updating (ex: they have been recovered).

Please review all new deployments and deployment updates for accuracy, then press the associated buttons to make the changes. At this stage, the updates are not editable: any updates chosen from the rcvr_locations section will be processed here.

You may be asked to select an instrumenttype for certain receivers. Use the drop-down menu to select before adding the deployment.

Task List Checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - load to moorings ("deploy" notebook)

Verify Moorings

This cell will now complete the Quality Control checks of the moorings records contained in the entire schema. We are no longer only checking our newly-loaded records, but also each previously-loaded record in this project/schema.

The output will have useful information:

The Nodebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.

If there are any errors with records that have already been promoted to the moorings table, you will need to create a db fix ticket in GitLab to correct the records in the database. You may need to contact the researcher before resolving the error.

Task List Checkpoint

In GitLab, this task can be completed at this stage:

- [ ] - verify moorings ("deploy" notebook)

Final Steps

The remaining steps in the GitLab Checklist are completed outside the Nodebooks.

First, access the Repository folder in your browser and add the cleaned Deployment Metadata .xlsx file to the “Data and Metadata” folder.

Finally, the GitLab ticket can be reassigned to an OTN analyst for final verification in the database.

Key Points

  • Loading receiver metadata requires judgement from the Data Manager

  • Communication with the researcher is essential when errors are found