Introduction to Nodes
Overview
Teaching: 30 min
Exercises: 0 min
Questions
What is an OTN-style Database Node?
What is expected from a Node Manager?
Who is this training for?
What is the schedule for the next several days?
Objectives
Understand the relationship between OTN and its Nodes.
Ensure attendees have required background information.
What is a Node?
OTN partners with regional acoustic telemetry networks around the world to enable detection-matching across our communities. An OTN Node is an exact copy of OTN’s acoustic telemetry database structure, which allows for direct cross-referencing between the data holdings of each regional telemetry sharing community. The list of OTN Nodes is available here: https://members.oceantrack.org. Data only needs to be reported to one Node in order for tags/detections to be matched across all.
How does a Node benefit its users?
OTN and affiliated networks provide automated cross-referencing of your detection data with other tags in the system to help resolve “mystery detections” and provide detection data to taggers in other regions. OTN Data Managers extensively quality-control submitted metadata to ensure the most accurate records possible are stored in the database. OTN’s database and Data Portal website are well suited for archiving datasets for future use and sharing with collaborators. The OTN system includes pathways to publish datasets with OBIS, and for sharing via open data portals such as ERDDAP and GeoServer. The data-product format returned by OTN is directly ingestible by analysis packages including glatos and resonATe. OTN offers continuous support for the use of these packages and tools.
Below is a presentation from current Node Managers, describing the relationship between OTN and its Nodes, the benefits of the Node system as a community outgrows more organic person-to-person sharing, and giving a realistic picture of the work involved in hosting and maintaining a Node.
Node Managers
To date, the greatest successes in organizing telemetry communities have come from identifying and working with local, on-the-ground Node Managers for each affiliated Node. These trusted and connected ‘data wranglers’ have been essential to building and maintaining the culture and sense of trust in each telemetry group.
In order to be successful as a Node Manager in your region, here are a few tips and guidelines:
- Ensure you set aside time each week to wear your ‘Node Manager hat’.
- Familiarize yourself with the metadata reporting templates, and follow carefully the formats required for each variable.
- Ensure you have continuous communication with the OTN data team so you are able to align your detection matching with all other Nodes, and keep your local toolbox up to date.
- Ensure you have consistent communication with your local telemetry community - knowing who’s who is important for relationship building.
- Be willing to learn and ask questions - we are always trying to improve our tools and processes!
No previous coding or data management experience is required! Anyone who is willing to put in the work to become a Data Manager can be successful. Being involved in the telemetry community as a researcher (or affiliate) is enough to get you started with ‘data wrangling’ in your region.
Node Training
Each year OTN hosts a training session for Node Managers. This session is not only for new Node Managers, but also a refresher for current Node Managers on our updated tools and processes.
This is a hands-on course: participants will use the tools to practice loading telemetry data with us, using a Training Node we have built for this purpose. This means you will need to install all required software and devote your full attention for the next several days.
Here are the general topics that will be covered:
- OTN Node structure and database maintenance
- Data loading workflow: from metadata to detection extracts, and how we track progress in GitLab
- Practice interfacing with an OTN Node database in DBeaver, using SQL scripts
- Practice using OTN’s Nodebooks to quality control, process and verify records, using python and jupyter notebooks
- Overview of OTN’s Data Push process, and how Node Managers are involved
- Data Policy guidelines as a Node Manager
If you do not intend to learn how to load data to an OTN-style Node (and would prefer to be a spectator) please let us know, so we can identify who our hands-on learners will be.
A great resource for Node Managers as they get started will be OTN’s FAQ page - https://members.oceantrack.org/faq. Your local telemetry community will likely have many questions about the Node and how it works, and the FAQs can help answer some of these questions.
Key Points
Your Node is fully compatible with all others like it.
A well-connected Node Manager is essential.
OTN staff are always available to support Node Managers.
OTN System, Structure and Outputs
Overview
Teaching: 25 min
Exercises: 0 min
Questions
What does an OTN-style Database look like?
What is the general path data takes through the OTN data system?
What does the OTN data system output?
Objectives
Understand the OTN Database structure on a high level
Understand the general path of data through the OTN system
Understand what the OTN system can output
OTN Data System
The OTN data system is an aggregator of telemetry data made up of interconnected Node Databases and data processing tools. These work together to connect researchers with relevant and reliable data. At the heart of this system are Nodes and their OTN-style Databases.
Affiliated acoustic telemetry partner Networks may become an OTN Node by deploying their own database that follows the same structure as all the others. This structure allows Nodes to use OTN’s data loading processes, produce OTN data products, and match detections across other Nodes.
Basic Structure
The basic structural decision at the centre of an OTN-style Database is that each of a Node’s projects is given its own database `schema`. These schemas contain only the tables and data relevant to that project. The tables included in each schema are created and updated based on which types of data each project reports.
Projects can have the type `tracker`, `deployment`, or `data`.
- Tracker projects only submit data about tag-releases and animals. They get tables based on the tags, animals, and detections of those tags.
- Deployment projects only submit data about receivers and their collected data. These projects get tables related to receiver deployments and detections on their receivers.
- Data projects deploy both tags and receivers; they submit data related to tags, animals, receivers, and detections, and they get all the related tables.
In addition to the project-specific `schemas`, there are some important common schemas in the Database that Node Managers will interact with. These additional schemas include the `obis`, `erddap`, `geoserver`, `vendor`, and `discovery` schemas. These schemas are found across all Nodes and are used to create important end-products and for processing.
- The `obis` schema holds summary data describing each project contained in the Node as well as the aggregated data from those projects. When data goes into a final table of the project schema it will be inherited into a table in `obis` (generally with a similar name).
- The `erddap` schema holds aggregated data re-formatted to be used to serve telemetry data via an ERDDAP data portal.
- The `geoserver` schema holds aggregated data re-formatted to be used to create geospatial data products published to a GeoServer.
- The `vendor` schema holds manufacturer specifications for tags and receivers, used for quality control purposes.
- The `discovery` schema holds summaries of data across the OTN ecosystem. These tables are used to create summary reports and populate statistics and maps on partner webpages.
The amount of information shared through the discovery tables can be adjusted based on sharing and reporting requirements for each Node.
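If you would like to see this layout for yourself once you have a database connection (connection setup is covered in a later lesson), you can list the schemas in your Node directly. The snippet below is an illustrative sketch only: the connection string is a placeholder, and you should substitute the host, port, database name, and credentials from your own `.auth` file.

```python
# Illustrative sketch: list the schemas in an OTN-style Node database.
# The connection string is a placeholder; use your own credentials.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://username:password@129.173.48.161:5432/node_training")

schemas = pd.read_sql(
    "SELECT schema_name FROM information_schema.schemata ORDER BY schema_name;",
    engine,
)

# Expect one schema per project, plus the common schemas described above
# (obis, erddap, geoserver, vendor, discovery).
print(schemas)
```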
The Path of Data
The OTN data system takes 4 types of data/metadata: project, tag, instrument deployments, and detections. Most data has a similar flow through the OTN system even though each type has different notebooks and processes for loading. The exception to this is `project` metadata, which has a more unique journey because it is completely user-defined, and must be used to initially define and create a project’s `schema`.
Project Data
`Project` data follows a different workflow from the other input data and metadata that flow into an OTN Node: it is generally the first bit of information received about a project, and it will be used to create the project’s new `schema` in the Database. The type of project selected (`tracker`, `deployment`, or `data`) will determine the format of the tables in the newly created `schema`. The type of project will also impact the loading tools and processes that will be used later on. The general journey of project data is:
- To register a new project, a researcher will fill out a `project metadata` template and submit it to the Node Manager.
- The Node Manager will visually evaluate the template to catch any obvious errors and then run the data through the OTN Nodebook responsible for creating and updating projects (`Create and Update Projects`).
- The `Create and Update Projects` notebook will make a new schema in the Database for that project, and fill it with the required tables based on the type of project.
- Summary tables are populated at this time (`scientificnames`, `contacts`, `otn_resources`, etc.).
- After this, OTN will verify the project one last time to make sure every necessary field is filled out and properly defined.
Tag, Deployment and Detections Data
Even though `tag`, `deployment`, and `detections` data all have their own loading tools and processes, their general path through the database is the same.
- Their data workflows all begin with a submission of data or metadata files from a researcher.
- The Node Manager ensures there is a copy of the file on the Node’s document management website.
- They will then do some quick visual QC to catch any obvious errors.
- The data is then processed through the relevant OTN Nodebooks. This process is outlined by the task list associated with the GitLab Issue made for this data.
- The data will first be loaded into the `raw` tables. These tables hold the raw data as submitted by the researcher (the naming convention for raw tables: they always have the prefix `c_` and a suffix indicating the date loaded, typically `YYYY_MM`).
- After the raw data table is verified, the data will move to the `intermediate` tables, which hold partially-processed data as a “staging area”.
- After the intermediate table is verified, data will move to the `upper` tables, where the data is finished processing and is in its “final form”. This is the data that will be used for aggregation tables such as `obis` and for outputs such as `Detection Extracts`. (A quick way to check these tables directly is sketched after this list.)
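As a concrete example of the raw-table naming convention, the sketch below looks for `c_` tables inside a single project schema. This is illustrative only: the connection string is a placeholder and `hfx` is a hypothetical project schema name.

```python
# Illustrative sketch: find the raw (c_-prefixed) tables in one project schema.
# The connection string and the schema name 'hfx' are placeholders.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://username:password@129.173.48.161:5432/node_training")

raw_tables = pd.read_sql(
    """
    SELECT table_name
    FROM information_schema.tables
    WHERE table_schema = 'hfx'
      AND left(table_name, 2) = 'c_'
    ORDER BY table_name;
    """,
    engine,
)

# Raw table names carry the c_ prefix and a YYYY_MM load-date suffix.
print(raw_tables)
```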
OTN Data Products
The OTN Database has specific data products available, based upon the clean processed data, for researchers to use for their scientific analysis.
In order to create meaningful Detection Extracts, OTN and affiliated Nodes only perform cross-matching events every 4 months (when a reasonable amount of new data has been processed). This event is called a synchronous `Data Push`. In a `Data Push`:
- All recently-loaded data is verified and updated.
- Cross-node matching is done, where detections are matched to their relevant tags across all Nodes.
- Once cross-node matching is done, Detection Extracts are created, containing all the new detection matches for each project. Detection Extract files are formatted for direct ingestion by analysis packages such as glatos and resonATe.
- Summary schemas like `discovery`, `erddap`, and `geoserver` are updated with the newly verified and updated data.
- Summary schema records can be used to create maps and other record overviews, such as this map of active OTN receivers:
Backing Up Your Data
As with any database, it is important to make sure the data held by the OTN Database Node is protected. Your Database, and potentially the files contributed to your Node by the research community you support, should be backed up properly in case your primary Database crashes, is corrupted, is lost, or suffers any other unexpected event.
You should discuss and coordinate a backup strategy with the groups or institutions responsible for your Database’s administration, and find out their policies and practices for performing backups. Backup strategies may vary from group to group, but it is a good idea to make sure they are adequately backing up your data daily, if not multiple times a day, and that they keep copies of backups in different physical locations in case something happens at a single location.
OTN is happy to offer guidance and recommendations to any group.
Key Points
All OTN-style Databases have the same structure
Databases are divided into project schemas which get certain tables based on the type of data they collect
Data in the OTN system moves from the raw tables to the intermediate tables to the upper tables before aggregation
Setup and Installing Needed Software
Overview
Teaching: 10 min
Exercises: 50 min
Questions
What software does a Node Manager need?
Why do I need the recommended software?
How do I install the required software?
Objectives
Understand how to install required software and prepare to load data
Ensure attendees have the required software installed and are ready to use it
In order to work efficiently as a Node Manager, the following programs are necessary and/or useful.
To standardize the verification and quality control process that all contributed data is subjected to, OTN has built custom quality control workflows and tools for Node Managers, often referred to as the OTN Nodebooks. The underlying functions are written in `Python`, and the workflows that rely on them are run through `Jupyter` Notebooks. In order to use these tools, and interact with your database, you will need to install a few different applications and packages. All installation instructions are also available on our GitLab here.
This lesson will give attendees a chance to install all the relevant software, under the supervision of OTN staff.
Python/Mamba
`Python` is a popular general-purpose programming language that can be used for a wide variety of applications. It is the main language used by OTN and our data processing pipeline.
`Mamba` is a fast, cross-platform Python distribution and package manager. When you install Mamba (through Miniforge) you get a `python` interpreter and many of the core python libraries. Managing your Python installation with Mamba allows you to install and update all the packages needed to run the Nodebooks with one command, rather than having to install each one individually.
Install Miniforge3 - https://conda-forge.org/miniforge/
- Select the option to `add to PATH environment variable` during the install steps!
Git
`Git` is a version-control system: it helps people work collaboratively and maintains a complete history of all changes made to a project. We use `Git` at OTN to track changes to the Nodebooks made by our developer team, and occasionally you will need to update your Nodebooks to include those changes.
Install Git
- Windows: https://git-scm.com/download/win
Nodebooks - iPython Utilities
The `ipython-utilities` project contains the collection of Nodebooks used to load data into the OTN data system.
Create an Account
First, you will need a GitLab account. Please fill out this signup form for an account on GitLab.
Then, OTN staff will give you access to the relevant Projects containing the code we will use.
Install iPython Utilities
- Determine the folder in which you wish to keep the iPython Utilities Nodebooks.
- Open your `terminal` or `command prompt` app.
  - Type `cd` then a `space`.
  - You then need to get the filepath to the folder in which you wish to keep the iPython Utilities Nodebooks. You can either drag the folder into the `terminal` or `command prompt` app, or hit `shift/option` while right-clicking and select `copy as path` from the menu.
  - Then paste the filepath in the `terminal` or `command prompt` and hit `enter`.
  - In summary, you should type `cd /path/to/desired/folder` before pressing enter.
- Create and activate the “nodebook” python environment. The creation process will only need to happen once.
  - In your terminal, run the command `conda create -n nodebook python=3.9`
  - Activate the nodebook environment using `conda activate nodebook`
- You are now able to run commands in that folder. Now run: `git clone https://gitlab.oceantrack.org/otn-partner-nodes/ipython-utilities.git`. This will get the latest version of iPython Utilities from our GitLab.
- Navigate to the ipython-utilities subdirectory that was created, by running `cd ipython-utilities`.
- Now install all required python packages by running the following: `mamba env update -n nodebook -f environment.yml` (a quick sanity check of the environment is sketched after this list).
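Once the environment update finishes, a quick sanity check can save debugging time later. This is a minimal sketch, and the package names below are assumptions about what `environment.yml` installs (they are not an official list); adjust them to match your file.

```python
# sanity_check.py: run inside the activated "nodebook" environment.
# The package names are assumptions; edit them to match environment.yml.
import importlib
import sys

print(f"Python version: {sys.version.split()[0]}")

for package in ["pandas", "sqlalchemy", "psycopg2", "notebook"]:
    try:
        module = importlib.import_module(package)
        version = getattr(module, "__version__", "(version unknown)")
        print(f"OK: {package} {version}")
    except ImportError:
        print(f"MISSING: {package} - re-run 'mamba env update -n nodebook -f environment.yml'")
```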
To open and use the OTN Nodebooks:
- MAC/WINDOWS: Open your terminal, and navigate to your ipython-utilities directory using `cd /path/to/ipython-utilities`. Then, run the commands: `conda activate nodebook` to activate the nodebook python environment, and `jupyter notebook --config="nb_config.py" "0. Home.ipynb"` to open the Nodebooks in a browser window.
- DO NOT CLOSE the terminal/CMD instance that opens! It will need to remain open in the background in order for the Nodebooks to be operational.
More operating system-specific instructions and troubleshooting tips can be found at: https://gitlab.oceantrack.org/otn-partner-nodes/ipython-utilities/-/wikis/New-Install-of-Ipython-Utilities
Database Console Viewer
There are database administration applications to assist with interacting directly with your database. There are many options available, but `DBeaver` and `DataGrip` are the most popular options at OTN.
- https://dbeaver.io/ (free and open access - recommended)
- https://www.jetbrains.com/datagrip (free institutional/student access options - another option)
In the next lesson we will practice using our database console viewer and connecting to our `node_training` database.
More Useful Programs
In order to work efficiently as a Node Manager, the following programs are necessary and/or useful.
For WINDOWS users
Path Copy Copy - For copying path links from your file browser.
Notepad++ - For reading and editing code, csv files etc. without altering the formatting.
TortoiseGit - For managing Git, avoiding the command line.
For MAC users
VS Code - For reading and editing code, csv files etc. without altering the formatting.
Sourcetree - For managing Git, avoiding the command line.
Node Training Datasets
We have created test datasets to use for this workshop. Each attendee has their own files, available at this link: https://members.oceantrack.org/data/repository/fntp/nmt-files/
Please find the folder with your name and download it. Save the files somewhere on your computer, and UNZIP all files.
Key Points
Node Manager tasks involve the use of many different programs
OTN staff are always available to help with installation of these programs or any issues
There are many programs and tools to help Node Managers
Data Loading Workflow
Overview
Teaching: 30 min
Exercises: 15 min
Questions
How does a Node Manager receive data?
How does a Node Manager track their To-Do list?
How can a Node Manager interact with their database directly?
Objectives
Understand the data-loading workflow
Understand how to create and use GitLab Issues
Understand how to access and query your database tables
Understand how to use the `AUTH - Create and Update` notebook to maintain your database credentials file
Data Managers receive data from a researcher and then begin a several-step process of QA/QC and matching of data.
- Records are received, and immediately a GitLab Issue is created.
- Data are QA/QC’d using the OTN Nodebook tools (covered in detail later in the curriculum), and all progress is tracked in GitLab. Feedback between Data Manager and researchers happens at this stage, until data is clean and all GitLab tasks are completed.
- The successful processing of records can be evaluated by checking the database tables using DBeaver, and SQL queries.
Researcher data submission
There are many ways to receive data from researchers in your community/group. Find the method that works for your community, make it official, and ensure it becomes standard practice for reporting to your Node.
File management website
The most common way to receive data and metadata from a researcher is through some type of file management website. This will require either an email notification system for the Node Manager or constant checking to look for new submissions.
OTN-managed Nodes can always use the same Plone file management portal software that OTN itself uses to create and maintain private-access data repository folders into which researchers can deposit their data and metadata. These private folders also serve as the location where Detection Extracts are distributed to users, when available.
The FACT Network currently uses a custom instance of Research Workspace for the same purpose.
The ACT and GLATOS Networks use a custom data-submission form managed through their networks’ web sites.
It’s common for groups of researchers to use DropBox, Google Drive, or something similar to share data/metadata when the Network is still small. This can be a great, accessible option, but the caveat is that it is much more difficult to control access to each individual folder to protect the Data Policy, and it may be difficult to determine when new data has been submitted.
Email-only submission
Generally, each Node Manager has an email address for communicating with their network’s data submitters (ex: Data @ TheFACTNetwork . org). This is a great way to ensure all Node-related emails are contained in the same account in the case of multiple Node Managers or the succession of a new Node Manager. With proper email management, this can be a very successful way to ask Node-users to submit their data/metadata to your Node.
It is not recommended to use a personal email account for this, since all the files and history of the project’s data submissions will be lost if that Manager ever moves away from the role. If the account is hosted at an institution, it may be advisable to submit requests to raise institutional limits on things like email storage in advance.
Documenting data submission
Using one of the suggested means above, a user has submitted data and metadata to the Node Manager. Now what?
OTN uses GitLab Issues with templates of task-lists to ensure we NEVER forget a step in data loading, and that no file is ever lost/forgotten in an inbox.
Immediately upon receipt of a data file, you are advised to log in to OTN’s GitLab (https://gitlab.oceantrack.org), where you will have a project for your Node.
Once on the GitLab project page, you should navigate to the Issues menu option, on the left side. Think of your GitLab issues as your running “TODO List”! You will want to create a new Issue for each piece of data that is submitted.
NOTE: GitLab Issues are often referred to as “tickets”
Creating GitLab issues
By choosing the New Issue button in the top-right of your screen, you will be taken to a new, blank, issue form. To fill out the fields you will need to do the following:
- Title: Write the project name/code, the type of data submitted, and the submission date; this makes the ticket searchable in the future (eg: `HFX tag metadata 2022-02`).
- Type: No need to edit this field; it should be type `Issue`.
- Description: 1) There are pre-made Templates to choose from here, using the drop-down menu. Ensure you choose the relevant checklist for the type of data that was submitted (eg: `Tag_metadata`). This will populate the large description field! 2) Ensure you include the link to the submitted data file OR use the `Attach a file` option to attach a copy of the submitted data file to the issue.
- Assignee: Assign to yourself if this is a task for you, or to anyone else to whom you would like to delegate it.
- Milestone: These are the upcoming Data Push dates. You should choose the nearest future PUSH date as the Milestone for this issue.
- Labels: This is for your reference - choose a label that will help you remember what stage of processing this issue is in. Some common examples include `Needs QC`, `Waiting for Metadata`, `Waiting for VRLs`, `Request PI Clarification`, etc. You can create new labels at any time to help sort your tickets.
Now, with all information completed, you can select Create Issue.
Using GitLab to track progress
As you approach the deadline for data-loading, before a data PUSH, you should begin to work on your Issues which fall under that Milestone. When you open an issue, you will be able to see the remaining tasks to properly load/process that data along with the name of the OTN Nodebook you should use to complete each task.
Keep GitLab open in your browser as you work through the relevant Nodebooks. You should check off the tasks as you complete them, and insert any comments you have into the bottom of the ticket. Comments can include error messages from the Nodebook, questions you have for the researcher, any re-formatting required, etc. At any time you can change the `Labels` on the issue, to help you remember the issue’s status at-a-glance.
Once you are done for the day, you’ll be able to come back and see exactly where you left off, thanks to the checklist!
You can tag anyone from the OTN Data Team in your GitLab issue (using the `@NAME` syntax). We will be notified via email to come and check out the Issue and answer any questions that have been commented.
Once you have completed all the tasks in the template, you can edit the `Assignee` value in the top-right corner, and assign it to someone from OTN’s Database team (currently, Angela or Yinghuan). They will complete the final verification of the data, and close the issue when completed. At this time, you can change to the `Verify` issue label, or something similar, to help visually “mark it off” your issue list on the main page.
GitLab practice
At this time we will take a moment to practice making GitLab Issues, and explore other pages on our GitLab like `Milestones`, `Repository`, `Snippets`, and `Wiki`.
Database access
As part of the OTN workflow, once we have used the OTN Nodebooks, it may be prudent to use a database client like DBeaver to view the contents of your database Node directly, and be sure the data was indeed loaded as expected.
DBeaver is an open-source application for interacting directly with databases. There are lots of built-in tools for quick query-writing, and data exploration. We will assume that workshop attendees are novices in using this application.
Connecting to your database
For this training we will connect to a Node Training test database, as practice. Once you open DBeaver, you will need to click on the `Database` menu item, and choose `New Database Connection`. A popup will appear, and you will choose the `PostgreSQL` logo (the elephant) then click Next. Using the `.auth` file provided to you by OTNDC you will complete the following fields:
- Host: this will be something like `matos.asascience.com` for your DB, but we will use the IP address `129.173.48.161` for our Node Training DB.
- Database: this will be your database name, something like `pathnode`. For training, it will be `node_training`.
- Port: this is specified in your `.auth` file and will be four digits. For training, this port will be set to `5432`.
- Username/Password: your personal username and password, as found in your `.auth` file.
Next you choose `Test Connection` and see if it passes the tests. If so, you can choose `Finish` and you’re now connected to your database!
On the left side you should now see a `Database Navigator` tab, and a list of all your active database connections. You can use the drop-down menu to explore all the `schemas` (aka collections) stored in your database. You can even view each individual table, to confirm the creation steps in the Nodebooks were successful!
Writing a query in DBeaver
If you wish to write an SQL query to see a specific portion of your already-loaded data, you should first open a new SQL console. Choose `SQL Editor` from the top menu, then `New SQL Script`. A blank form should appear.
While writing SQL is out of the scope of this course, there are many great SQL resources available online. The general premise involves creating conditional `select` statements to specify the data you’re interested in. For example, `select * from hfx.rcvr_locations where rcv_serial_no = '12345';` will select all records from the HFX schema’s rcvr_locations table where the serial number is 12345.
To run a query, you ensure your cursor is on the line you want to run, then you can either 1) right-click, and choose Execute, or 2) press CTRL-ENTER (CMD-ENTER for Mac). The results of your query will be displayed in the window below the SQL console.
OTN is here to support you as you begin to experiment with SQL queries, and the OTN database structure, and can help you build a library of helpful custom queries that you may want or need.
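The same query can also be run from Python, for example from a Jupyter notebook, once you are comfortable working outside DBeaver. This is an illustrative sketch and not part of the OTN Nodebooks: the connection string is a placeholder, and the `hfx` schema and serial number are taken from the example above.

```python
# Illustrative sketch: run the example query from Python instead of DBeaver.
# The connection string is a placeholder; use your own credentials.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://username:password@129.173.48.161:5432/node_training")

receivers = pd.read_sql(
    "select * from hfx.rcvr_locations where rcv_serial_no = '12345';",
    engine,
)

print(receivers)
```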
Database Practice
Let’s take a moment to explore some of the tables in the Node Training database, and write some example SQL queries.
Connecting to your Database from the Nodebooks
Now that we have explored and set up some of the tools needed to work as a Node Manager, we can begin preparing our Nodebook connection files. To enhance security, OTN uses encrypted, password-protected `.kdbx` files to store login credentials for your database. To create these, we have developed an interactive `AUTH - Create and Update` Nodebook.
- Open the OTN Nodebooks.
  - MAC: Open your terminal, and navigate to your ipython-utilities directory, using `cd /path/to/ipython-utilities`. Then, run the command `jupyter notebook --config="nb_config.py" "0. Home.ipynb"` to open the Nodebooks.
  - WINDOWS: Double-click the `start-jupyter.bat` file in your ipython-utilities folder, which will open the Nodebooks.
- Your Nodebooks should open in a new browser window, showing the `Home` page.
- Open the `AUTH - Create and Update` notebook.
Imports cell
This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.
Path to file
This cell needs to be edited. Between the quotes you will type the filepath to the `.kdbx` file you would like to create, or one which already exists that you would like to edit. The format should look like:
file_location = 'C:/Users/path/to/node_auth.kdbx'
Run this cell to save the input.
Create Password
Run this cell. You will be prompted to create a password for the file (if it is a new file) or to enter the existing password if you are accessing an existing file. Ensure that you remember this password, as you will be using it every time you connect to the database through the Nodebooks.
Create or Update Main Connections
Run this cell. This section will have an editable form. If it is a new file, all fields will be blank. If it is an existing file, the previously-entered information will display. You may now edit the information, pressing the blue button when you are finished to save your results.
- Conn Name: this is customizable - what is the name of this connection? Something like “OTN Database” to help you remember.
- Host: this will be something like `matos.asascience.com` for your DB, but is just an IP address for our Node Training DB: `129.173.48.161`.
- Port: this is specified in your `.auth` file and will be four digits. Use `5432` for Node Training.
- DB Name: this will be your database name, something like `pathnode`. For training, it will be `node_training`.
- User/Password: your personal username and password, as found in your `.auth` file.
Create or Update DBLink Connections
In order to match detections across databases, you will need to establish `DBLink` connections to other OTN Nodes. This information can be stored in your `.kdbx` file, and will give you limited access to the information required to match detections and create detection extracts.
Run this cell. This section will have an editable form. If it is a new file, all fields will be blank, and you can choose `Add Connection`. If it is an existing file, the previously-entered information will display for each DBLink Connection you’ve specified. You may now edit the information, pressing the update or save button when you are finished to save your results.
Please contact OTN if you need any of the following information:
- Conn Name: this is customizable - what is the name of this connection? Something like `fact-link-to-migramar` is informative.
- Host: this will be something like `matos.asascience.com` for the DB you are trying to connect to (i.e. not your own Node db host).
- Port: this will be the port required to connect to the remote DB (not your Node db).
- DB Name: this will be the name of the database you are trying to connect to (not your Node db), something like `otnunit`.
- User/Password: the username and password of the DBLink user for your database. Not your own node admin’s connection information.
Once you have saved your new DBLink connection, you can create another. Continue until you have established connections to all remote Nodes. Currently, you will require DBLinks to 7 Nodes.
Test Connections
The next two cells will test the connection information you entered. Success messages will look like:
Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.your.org Database:your_db User:node_admin Node: Node
and also like:
Testing dblink connections:
fact-link-to-Node1: DBLink established on user@db.load.oceantrack.org:5432 - Node: NODE1
fact-link-to-Node2: DBLink established on user@db.load.oceantrack.org:5432 - Node: NODE2
fact-link-to-Node3: DBLink established on user@db.load.oceantrack.org:5432 - Node: NODE3
fact-link-to-Node4: DBLink established on user@db.load.oceantrack.org:5432 - Node: NODE4
fact-link-to-Node5: DBLink established on user@db.load.oceantrack.org:5432 - Node: NODE5
fact-link-to-Node6: DBLink established on user@db.load.oceantrack.org:5432 - Node: NODE6
fact-link-to-Node7: DBLink established on user@db.load.oceantrack.org:5432 - Node: NODE7
You are now able to use the filepath to your `.kdbx` file to run all the Nodebooks.
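The Nodebooks read this file for you, so you will not normally need to open it yourself. Purely for illustration, the open-source `pykeepass` package can also open a `.kdbx` file if you ever want to double-check what is stored in it. The file path below is a placeholder, and the way the Nodebooks lay out entries inside the file is not documented here, so treat this as a generic sketch.

```python
# Illustrative sketch only: the AUTH notebook manages this file for you.
# The file path is a placeholder; the entry layout inside the file may differ.
from getpass import getpass
from pykeepass import PyKeePass

kp = PyKeePass("C:/Users/path/to/node_auth.kdbx", password=getpass("Auth password: "))

# List the stored entries (e.g. your main connection and any DBLink connections).
for entry in kp.entries:
    print(entry.title, entry.username, entry.url)
```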
Change KDBX password
If you have been passed a template `.kdbx` file from OTNDC with prefilled information, you should use this section to change the password to ensure your privacy is protected.
You will need to enter:
- Current Password (provided by OTNDC) for your template file
- New Password (of your choosing)
Press `Save` to change the password of your `.kdbx`. Ensure that you remember this password, as you will be using it every time you connect to the database through the Nodebooks.
Add a Git Access Token
This will be relevant for users of the `DB Fix` and `Verification` suite of Nodebooks only. If you have questions about this section please reach out to OTNDC.
Key Points
Node members cannot access the database directly; you are the liaison
Data submissions and QC processes should be trackable and archived
OTN is always here to help with any step of the process
Project Metadata
Overview
Teaching: 30 min
Exercises: 0 min
Questions
How do I register a new project in the Database?
Objectives
Understand how to complete the template
Understand how to use the Gitlab checklist
Learn how to use the `Create and Update Projects` notebook
Learn how to use the `Create Plone folders and add users` notebook
The first step when you are contacted by a researcher who wants to register their project with the Database is to request Project Metadata. For most Nodes, this is in the form of a plaintext `.txt` file, using the template provided here. This file allows the researcher to provide information on the core attributes of the project, including the scientific abstract, associated investigators, geospatial details, and temporal and taxonomic range.
Completed Metadata
Immediately upon receipt of the metadata, a new Gitlab Issue should be created. Please use the `Project Metadata` Issue checklist template.
Here is the Issue checklist, for reference:
Project Metadata
- [ ] - NAME add label *'loading records'*
- [ ] - NAME define type of project **select here one of Data, Deployment, Tracker**
- [ ] - NAME create schema and project records (`Creating and Updating project metadata` notebook)
- [ ] - NAME add project contact information (`Creating and Updating project metadata` notebook)
- [ ] - NAME add scientificnames (`Creating and Updating project metadata` notebook)
- [ ] - NAME verify all of above (`Creating and Updating project metadata` notebook)
- [ ] - NAME create new project repository users (`Create Plone Folders and Add Users` notebook)
- [ ] - NAME create project repository folder (`Create Plone Folders and Add Users` notebook)
- [ ] - NAME add project repository users to folder (`Create Plone Folders and Add Users` notebook)
- [ ] - NAME access project repository double-check project repository creation and user access
- [ ] - NAME add project metadata file to project folder (OTN members.oceantrack.org, FACT RW etc)
- [ ] - NAME send onboarding email to PIs
- [ ] - NAME label issue with *'Verify'*
- [ ] - NAME pass issue to OTN analyst for final verification
- [ ] - NAME verify project in database
- [ ] - NAME pass issue to OTN DAQ staff
- [ ] - NAME [OTN only] if this is a loan, update links for PMO
- [ ] - NAME [OTN only] manually identify if this is a loan, if so add record to otnunit.obis.loan_tracking (`Creating and Updating project metadata` notebook)
**project metadata txt file:**
Visual Inspection
Once the completed file is received from a researcher, the Data Manager should complete a visual check for formatting and accuracy.
Things to check:
- Is the PI-provided collection code unique/appropriate? Do you need to create one yourself? Existing schemas/collection codes can be seen in the database.
- Are there typos in the title or abstract?
- Are the contacts formatted correctly?
- Are the species formatted correctly?
- Does the location make sense based on the abstract, and is it formatted correctly (one per line)?
In general, formatting errors most commonly occur in the `Contacts` section. Pay close attention here.
Below is an example of a properly completed metadata form, for your reference.
===FORM START===
0. Intended/preferred project code, if known? (May be altered by OTNDC)
format: XXXX (3-6 uppercase letters that do not already have a representation in the OTN DB. Will be assigned if left blank)
NSBS
1. Title-style description of the project?
format: < 70 words in 'paper title' form
OTN NS Blue Shark Tracking
2. Brief abstract of the project?
format: < 500 words in 'abstract' form
In the Northwest Atlantic, the Ocean Tracking Network (OTN), in collaboration with Dalhousie University, is using an acoustic telemetry infrastructure to monitor the habitat use, movements, and survival of juvenile blue sharks (Prionace glauca). This infrastructure includes state-of-the-art acoustic receivers and oceanographic monitoring equipment, and autonomous marine vehicles carrying oceanographic sensors and mobile acoustic receivers. Long-life acoustic tags (n=40) implanted in the experimental animals will provide long-term spatial resolution of shark movements and distribution, trans-boundary migrations, site fidelity, and the species’ response to a changing ocean. This study will facilitate interspecific comparisons, documentation of intra- and interspecific interactions, and permit long-term monitoring of this understudied predator in the Northwest Atlantic. The study will also provide basic and necessary information to better inform fisheries managers and policy makers. This is pertinent given the recent formulation of the Canadian Plan of Action for Shark Conservation.
3. Names, affiliations, email addresses, and ORCID (if available) of researchers involved in the project and their role.
The accepted Project Roles are defined as:
Principal Investigator: PI or Co-PI. The person(s) responsible for the overall planning, direction and management of the project.
Technician: Person(s) responsible for preparation, operation and/or maintenance of shipboard, laboratory or deployed scientific instrumentation, but has no invested interest in the data returned by that instrumentation.
Researcher: Person(s) who may use/analyse the data to answer research questions, but is not the project lead. Can be a student if their involvement spans past the completion of an academic degree.
Student: Person(s) researching as part of a project as part of their work towards an academic degree.
Collaborator: A provider of input/support to a project without formal involvement in the project.
Please add 'Point of Contact' to the contact(s) who will be responsible for communicating with OTN.
format: Firstname Lastname, Employer OR Affiliation, Project Role (choose from above list), email.address@url.com, point of contact (if relevant), ORCID
Fred Whoriskey, OTN, principal investigator, fwhoriskey@dal.ca, 0000-0001-7024-3284
Sara Iverson, OTN, principal investigator, sara.iverson@dal.ca
Caitlin Bate, Dal, researcher, caitlin.bate@dal.ca, point of contact
4. Project URL - can be left blank
format: http[s]://yoursite.com
https://members.oceantrack.org/
5. Species being studied?
format: Common name (scientific name)
Blue shark (Prionace glauca)
Sunfish (Mola mola)
6. Location of the project?
format: (city, state/province OR nearby landmark OR lat/long points in decimal degree), one per line
Halifax, NS
44.19939/-63.24085
7. Start and end dates of the project, if known?
format: YYYY-MM-DD to YYYY-MM-DD ('ongoing' is an acceptable end date)
2013-08-21 to ongoing
8. Citation to use when referencing this project:
format: Lastname, I., Lastname, I. YYYY. [Title from question 1 or suitable alternative] Will be assigned if left blank.
===FORM END===
Quality Control - Create and Update Projects
Each step in the Issue checklist will be discussed here, along with other important notes required to use the Nodebooks.
Imports cell
This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.
You will have to edit one section: `engine = get_engine()`
- Within the open brackets you need to open quotations and paste the path to your database `.kdbx` file which contains your login credentials.
- On MacOS computers, you can usually find and copy the path to your database `.kdbx` file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
- The path should look like `engine = get_engine('C:/Users/username/Desktop/Auth files/database_connection.kdbx')`.
Project Metadata Parser
This cell is where you input the information contained in the Project Metadata `.txt` file. There are two ways to do this:
- You can paste in the filepath to the saved `.txt` file. You should ensure the formatting follows this example: `project_template_path = '/path/to/project_metadata.txt'`
- You can paste the entire contents of the file, from `===FORM START===` to `===FORM END===` inclusive, into the provided triple quotation marks.
Once either option is selected, you can run the cell to complete the quality control checks.
The output will have useful information:
- Are there strange characters in the collectioncode, project title, or abstract?
- Were the names and affiliations of each contact successfully parsed? Are there any affiliated institutions which are not found? Are there any contacts which were not found that you expected to be?
- Is the project URL formatted correctly?
- Are all the species studied found in WoRMS? Are any of them non-accepted taxonomy (the top of the species record should have this success message: `INFO: Halichoerus grypus is an accepted taxon, and has Aphia ID 137080.`, followed by a URL)? Which ones have common names which are not matching the WoRMS records (look at the bottom of each species record for the success message: `OK: Grey seal is an acceptable vernacular name for Halichoerus grypus`)? NOTE: any mismatches with commonname can be fixed at a later stage; make a note in the Issue for your records.
- Is the suggested Bounding Box appropriate based on the abstract? NOTE: any issues with the scale of the bounding box can be fixed at a later stage; make a note in the Issue for your records.
- Are the start and end dates formatted correctly?
Generally, most of the error messages arise from the Contacts and Species sections.
If there is information which is not parsing correctly, you should fix the source-file and re-run the cell until you are happy with the output.
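If a species is not resolving and you want to investigate outside the notebook, the WoRMS register that the parser checks against also has a public REST API. The sketch below is an independent, illustrative lookup; the endpoint and response field names reflect the public WoRMS API as we understand it, and are not part of the OTN Nodebooks.

```python
# Illustrative sketch: look up a scientific name in the public WoRMS REST API.
import requests

name = "Halichoerus grypus"
resp = requests.get(
    f"https://www.marinespecies.org/rest/AphiaRecordsByName/{name}",
    params={"like": "false", "marine_only": "true"},
    timeout=30,
)

if resp.status_code == 200:
    for record in resp.json():
        # Expect status 'accepted' and AphiaID 137080 for this example species.
        print(record.get("scientificname"), record.get("status"), record.get("AphiaID"))
else:
    print(f"No WoRMS match found (HTTP {resp.status_code})")
```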
Manual Fields - Dropdown Menu
There are some fields which need to be set up by the Data Manager, and not set by the researcher. These are in the next cell.
You will run this cell, and a fillable form will appear.
- Node: select your node
- Collaboration Type: based on the abstract, are they deploying only tags (`Tracker` project), only receivers (`Deployment` project), or both tags and receivers (`Data` project)?
- Ocean: choose the most appropriate ocean region based on the abstract.
- Shortname: use the Title provided by the researcher, or something else, which will be used as the name of the Data Portal folder. ex: `OTN Blue Sharks`.
- Longname: use the Title provided by the researcher, or something else, which is in “scientific-paper” style. ex: `Understanding the movements of Blue sharks through Nova Scotia waters, using acoustic telemetry.`
- Series Code: this will generally be the name of your node. Compare to values found in the database `obis.otn_resources` if you’re unsure.
- Institution Code: the main institution responsible for maintaining the project. Compare to values found in the database `obis.institution_codes` and `obis.otn_resources` if you’re unsure. If this is a new Institution, please make a note in the Issue, so you can add it later on.
- Country: based upon the abstract. Multiple countries can be listed as such: `CANADA, USA, EGYPT` etc.
- State: based upon the abstract. Multiple states can be listed as such: `NOVA SCOTIA, NEWFOUNDLAND` etc.
- Local Area: based upon the abstract. Location information. ex: `Halifax`
- Locality: based upon the abstract. Finest-scale of location. ex: `Shubenacadie River`
- Status: is the project completed, ongoing, or something else?
To save and parse your inputted values, DO NOT re-run the cell - this will clear all your input. Instead, run the next cell to parse the information.
Verify the output from the parser cell, looking for several things:
- Are all the fields marked with a green `OK`?
- Is the following feedback included in the institution code section: `Found institution record for XXX in your database:` followed by a small, embedded table?
If anything is wrong, please begin again from the Manual Field input cell.
If the institution code IS NOT found, compare to values found in the database `obis.institution_codes` and `obis.otn_resources`. If this is a new Institution, please make a note in the Issue, so you can add it later on.
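If you prefer to check the database directly rather than re-running the cell, a hedged sketch of that lookup is below. The connection string is a placeholder, and `select *` is used deliberately since the exact column layout of these tables is not described here.

```python
# Illustrative sketch: list existing institution records before creating a new one.
# The connection string is a placeholder; use your own credentials.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://username:password@129.173.48.161:5432/node_training")

institutions = pd.read_sql("select * from obis.institution_codes;", engine)
resources = pd.read_sql("select * from obis.otn_resources;", engine)

# Eyeball the results for the institution code you were given (e.g. 'DAL').
print(institutions)
print(resources)
```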
Task list checkpoint
In Gitlab, this task can be completed at this stage:
- [ ] - NAME define type of project **select here one of Data, Deployment, Tracker**
Please edit to include the selected project type, in harmony with the selected field in the notebook.
Verifying Correct Info
At this stage, we have all the information parsed that we need in order to register the project in the database. There is now a cell that will print out every saved value for your review.
You are looking for:
- Typos
- Strange characters
- Correct information based on abstract
If this information is satisfactory, you can proceed.
Adjust Bounding Box
The cell titled `Verify the new project details before writing to the DB` is the final step for verification before the values are written to the database. At this stage a map will appear, with the proposed Bounding Box for the project.
Based on the abstract, you can use the `Square Draw Tool` to re-draw the bounding box until you are happy with it.
Once you are happy, you can run the next cell in order to save your bounding box adjustments. The success output should be formatted like this:
--- Midpoint ---
Latitude:
Longitude:
Create New Institution
STOP - confirm there is no Push currently ongoing. If a Push is ongoing, you must wait for it to be completed before processing beyond this point.
Remember above, where we noted whether or not an institution existed on `obis.institution_codes` or if it was a new institution? This cell is our opportunity to add any institutions if they are new. If all institutions (for each contact, plus for the project as a whole) exist, then you can skip this cell.
To run the cell, you will need to complete:
- Institution Code: a short-code for the institution (ex: DAL)
- Institution Name: the full institution name (ex: Dalhousie University)
- Institution Website: the institution’s website (ex: https://www.dal.ca/). You can confirm the URL is valid with the `Check URL` button.
- Organization ID: press the `Search Org ID` button. A list of potential WikiData, GRID, ROR, EDMO and NERC results for the institution will appear. Choose the best match, and paste that URL into the blank Organization ID cell.
- Institution Country: the country where the institution is headquartered
- Institution State: the state where the institution is headquartered
- Institution Sector: one of Unknown, Government/Other Public, University/College/Research Hospital, Private, or Non-profit.
Once all values are completed, press `Create Institution` and confirm the following output:
Institution record 'DAL' created.
You can re-run this cell as many times as you need, to add each missing institution.
Write Project to the Database
STOP - confirm there is no Push currently ongoing. If a Push is ongoing, you must wait for it to be completed before processing beyond this point.
Finally, it is time to write the project records to the database and create the new project!
First, you should run this cell with `printSQL = True`. If there are no errors, you can edit and change to `printSQL = False` and run again. This will register the project!
You will see some output - confirm each line is accompanied by a green `OK`.
Task list checkpoint
In Gitlab, this task can be completed at this stage:
- [ ] - NAME create schema and project records ("Creating and Updating project metadata" notebook)
Save Contact Information
At this stage, the next step is to add contact information for all identified collaborators.
This cell will gather and print out all contacts and their information. Review for errors. If none exist, move to the next cell.
The next cell writes to the database: STOP - confirm there is no Push currently ongoing. This cell will add each contact to the database, into the `obis.contacts` and `obis.contacts_projects` tables, as needed.
There should be output similar to this:
Valid contact: Fred Whoriskey OTN principalInvestigator fwhoriskey@dal.ca
Created contact Fred Whoriskey
Task list checkpoint
In Gitlab, this task can be completed at this stage:
- [ ] - NAME add project contact information ("Creating and Updating project metadata" notebook)
Add Species Information
The following section will allow you to add the required information into the `obis.scientific_names` table.
The first cell imports the required function.
The second cell, when run, will create an editable input form for each animal.
You should review to confirm the following:
- the scientific name is matching to an accepted WoRMS taxon. There should be a URL provided, and no error messages.
- the common name is acceptable. If it is not, you can choose a value from the dropdown menu (taken directly from WoRMS’ `vernacular` list) OR you can enter a custom value to match the common name provided by the researcher.
Once you are sure that both the scientific and common names are correct, based on the information provided by both the project and the notebook, you may click the `Add to project` button for each animal record you’d like to insert.
There will be a confirmation display in the notebook to demonstrate if the insertion was successful.
Task list checkpoint
In Gitlab, this task can be completed at this stage:
- [ ] - NAME add scientificnames ("Creating and Updating project metadata" notebook)
OPTIONAL: Add Project Loan Information
The following section is used by OTN staff to track projects which are recipients of OTN-equipment loans. This section is not within the scope of this Node Manager Training, because it requires a login-file for the `otnunit` database.
Skip to Verification
Once you scroll past the `Project Loan Information` section, you will see a yellow star and the words Skip to the new project Verification. You should click the button provided, which will help you scroll to the bottom of the notebook, where the `Verify` section is located.
This is a chance to visually review all the fields you just entered. You should run these cells and review all output to ensure the database values align with the intended insertions.
Task list checkpoint
In Gitlab, this task can be completed at this stage:
- [ ] - NAME verify all of above ("Creating and Updating project metadata" notebook)
Other Features of this Notebook
There are more features in the `Create and Update Projects` notebook than those covered above.
Schema extension
This section is to be used when you have a project which is either `Tracker` or `Deployment` and is expanding to become a `Data` project. Ex: a project which was only tagging has begun deploying receivers.
You can use this notebook to create the missing tables for the schema. Ex: if a tagging project begins deploying receivers, the schema would now need `stations`, `rcvr_locations`, and `moorings` tables created.
Schema updating
This section is to be used to change the values contained in `obis.otn_resources`. The first cell will open an editable form with the existing database values. You can change the required fields (ex: abstract).
To save and parse your inputted values, DO NOT re-run the cell - this will clear all your input. Instead, run the next cell to parse the information.
Review the output of the parser cell and check for typos.
The next cell will show the changes that will be made to the project data. You can copy this output and paste it into the relevant Gitlab Issue for tracking.
The final cell will make the desired changes in the database. Ensure `printSQL = False` if you want the cell to execute directly.
The output should look like this to confirm success:
'Resource record for HFX has been updated.'
The following highlighted section is relevant only to Nodes who use `Plone` for their document management system.
Quality Control - Create Plone Users and Access
If you are part of a Node that uses Plone as your document repository, then the following will be relevant for you.
Imports cell
This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.
You will have to edit one section: `engine = get_engine()`
- Within the open brackets you need to open quotations and paste the path to your database `.kdbx` file which contains your login credentials.
- On MacOS computers, you can usually find and copy the path to your database `.kdbx` file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
- The path should look like `engine = get_engine('C:/Users/username/Desktop/Auth files/database_connection.kdbx')`.
Plone Login
The first cell is another import step.
The second cell requires input:
- Proper Plone log-in information must be written in the `plone_auth.json` file referenced by `plone_auth = get_plone_auth('./plonetools/plone_auth.json')`.
- In order to do this, click on the `Jupyter` icon in the top left corner of the page.
- This will bring you to a list of folders and notebooks. Select the `plonetools` folder. From there, select the `plone_auth.json` file and input your Plone base URL, username, and password. Hint: ensure the base_url in your json file ends in a slash, like `https://members.oceantrack.org/`!
- You can now successfully log into Plone.
Now, when you run the cell, you should get the following output:
Auth Loaded: ------------------------------------------------------------------------------ base_url: https://members.oceantrack.org/ user_name: user verify ssl: False
Finally, the third cell in this section will allow you to log in. You should see this message:
Login Successful!
Access Project Info
Some information is needed in order to create the project Plone folders.
There are three ways to enter this information:
- Access Project Information from Database
- Manual Project Information Form - Parse Contacts
- Manual Project Information Form - Insert Contacts into Textfields
The first option is generally the easiest, if the project has already been successfully written to the database using the Create and Update Projects notebook. To do this, you enter the collectioncode of your project, and run the cell. If there are no errors, you can click the SKIP button which will take you down the notebook to the next section.
Create Missing Users
This section will use the registered project contacts and compare against existing Plone users. It will compare by 1) email, 2) fullname, 3) lastname.
If a user is found: you will not need to create a new account for them.
If a user is not found: you will have to create an account for them. To do this, you can use the editable form in the next cell.
The editable cell will allow you to choose each contact that you'd like to register, and will autofill the information (including a suggested username). The password should be left blank. Once you are happy with the form, click Add User. An email will be sent to the new user, prompting them to set a password. Then you can repeat by selecting the next contact, etc.
Once all contacts have Plone accounts (new or otherwise) you are finished.
Task list checkpoint
In Gitlab, this task can be completed at this stage:
- [ ] - NAME create new project repository users ("Create Plone Folders and Add Users" notebook)
Create Project Repository
To create the project folder you must first enter the relevant Node information:
- otnunit: node = None
- safnode, migramar, nepunit: node = "node" - lowercase with quotation marks, fill in the value based on the path in Plone (see the example below).
- all other nodes (not hosted by OTN): node = None
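For example, the cell could be filled in like this (the quoted value is a placeholder - use the node folder name from the Plone path):
# Project on the OTN unit database, or any Node not hosted by OTN:
node = None
# SAF, MigraMar or NEP project - use the node folder name from the Plone path:
node = "node_name"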
Running this cell will print out an example of the URL, for your confirmation. Ensure the collectioncode and Node are correct.
The expected format:
https://members.oceantrack.org/data/repository/node_name/collectioncode
(node name is for SAF, MigraMar and NEP only)
If you are confident the folder path is correct, you can run the next cell and confirm the following success message:
Creating collection folder 'collectioncode'. Done! https://members.oceantrack.org/data/repository/node_name/collectioncode
Task list checkpoint
In Gitlab, this task can be completed at this stage:
- [ ] - NAME create project repository folder ("Create Plone Folders and Add Users" notebook)
Add Users to Repository
Now that the users AND folder have been created, the users must be given access to the new folder.
Using the final cell in the Plone repository folder creation section, you will be provided with an editable search-bar. Type in the Plone username of each contact (new and existing). Search results will appear:
- select the User who you would like to add
- choose their permissions
- click “Change repo permissions” to add them to the folder.
Review for the following success message:
Changed https://members.oceantrack.org/data/repository/node_name/collectioncode sharing for username: Contributor=True Reviewer=True Editor=True Reader=True
Then you may choose Add another user and begin again.
The acceptable folder permissions may vary depending on the project role of the contact. Here are some guidelines:
- Principal Investigator: all permissions
- Researcher: all permissions except Reviewer
- Student: all permissions except Reviewer
- Technician: only Contributor and Reader
- Collaborator: only Contributor and Reader
This is very fluid and can be edited at any time. These are guidelines only!
Task list checkpoint
In Gitlab, this task can be completed at this stage:
- [ ] - NAME add project repository users to folder ("Create Plone Folders and Add Users" notebook)
OPTIONAL: Add Project Loan Information
The following section is used by OTN staff to track projects which are recipients of OTN-equipment loans. This section is not within the scope of this Node Manager Training, because it requires a login-file for the
otnunit
database.
Final Steps
The remaining steps in the Gitlab Checklist are completed outside the notebooks.
First: you should access the created Repository folder in your browser and confirm that the title and sharing information are correct. If so, add the project metadata .txt
file into the “Data and Metadata” folder to archive.
Next, you should send an email to the project contacts letting them know their project code and other onboarding information.
Finally, the Issue can be passed off to an OTN-analyst for final verification in the database.
Key Points
Loading project metadata requires judgement from the Data Manager
Loading project metadata is the first step towards a functioning project
Tagging Metadata
Overview
Teaching: 20 min
Exercises: 0 minQuestions
How do I load new tags into the Database?
Objectives
Understand the proper template-completion
Understand how to use the Gitlab checklist
Learn how to use the Tag-1 notebook
Learn how to use the Tag-1b notebook
Learn how to use the Tag-2 notebook
Once a project has been registered, the next step (for Tracker
and Data
project types) is to quality control and load the tagging metadata into the database. Tagging metadata should be reported to the Node in the template provided here. This file will contain information about the deployment of tags (acoustic, PIT, satellite, floy etc.) in or on animals for the purposes of tracking their movements using either listening stations or via mark/recapture. Any biological metrics that were measured at tagging time, e.g. length, weight, population, are recorded for future analysis.
Remembering a previous lesson, there are multiple levels of data-tables in the database for tagging records: raw tables
, cache tables
and otn tables
. The process for loading tagging metadata reflects this, as does the Gitlab task list.
Completed Metadata
Immediately upon receipt of the metadata, a new Gitlab Issue should be created. Please use the Tag Metadata
Issue checklist template.
Here is the Issue checklist, for reference:
Tag Meta Data
- [ ] - NAME add label *'loading records'*
- [ ] - NAME load raw tag metadata (`tag-1` notebook) **put_table_name_in_ticket**
- [ ] - NAME confirm no duplicates in raw table, review and remove (`tag-1b` notebook)
- [ ] - NAME verify raw table (`tag-2` notebook)
- [ ] - NAME post updated metadata to project folder (OTN members.oceantrack.org, FACT RW etc) if needed
- [ ] - NAME build cache tables (`tag-2` notebook)
- [ ] - NAME verify cache tables (`tag-2` notebook)
- [ ] - NAME load otn tables (`tag-2` notebook)
- [ ] - NAME verify otn tables (`tag-2` notebook)
- [ ] - NAME verify tags are not part of another collection (`tag-2` notebook)
- [ ] - NAME label issue with *'Verify'*
- [ ] - NAME pass issue to analyst for final verification
- [ ] - NAME check for double reporting (verification_notebooks/Tag Verification notebook)
Visual Inspection
Once the completed file is received from a researcher, the Data Manager should first complete a visual check for formatting and accuracy.
In general, Tagging Metadata has 3 sections:
- information about the tag
- information about the animal
- information about the tag deployment
Information about the tag
is obtained from Tag Specifications and is mandatory. Information about the animal
is limited to the sampling conducted. The minimum requirement here is simply the common/scientific names. All other columns for biological parameters are optional. Information about the tag deployment
includes location and dates and is mandatory to complete.
Things to visually check in the metadata:
- Is there any information missing from the essential columns? These are:
- tag_type
- tag_manufacturer
- tag_model
- tag_serial_number (if not completed,
unknown
can be entered ONLY IF animal_id is also completed with unique values) - tag_id_code
- tag_code_space
- tag_implant_type
- est_tag_life
- common_name_e
- scientific_name
- release_location
- release_latitude
- release_longitude
- utc_release_date_time
- If any of the above mandatory fields are blank, follow-up with the researcher will be required if:
- you cannot discern the values yourself
- you do not have access to the Tag Specifications from the manufacturer (relevant for the columns containing
tag
information).
- Are the tag_id_code and tag_code_space values formatted correctly?
- Is the UTC_release_date_time column formatted correctly?
In general, formatting errors most commonly occur in the information about the tag section. Pay close attention here.
The metadata template available here has a Sample Data Row
as an example of properly-formatted metadata, along with the Data Dictionary
sheet which contains detailed expectations for each column. Refer back to these often. We have also included some recommendations on our FAQ page. Here are some guidelines:
- Animals with >1 associated tag (sensors, or double-tagging): add one line PER TRANSMITTER ID into the Tag Metadata form. The ANIMAL_ID column, or the TAG_SERIAL_NUMBER column, must be the same between the rows in order to link those two (or more) records together (see the example after this list).
- Animals with ONLY a floy tag: ensure the TAG_TYPE column = FLOY. You may leave the following columns empty - tag_manufacturer, tag_model, tag_id_code, tag_code_space and est_tag_life.
- Animals with satellite tags: ensure the TAG_TYPE column = SATELLITE. You may leave the following columns empty - tag_id_code and tag_code_space.
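For example, a double-tagged animal could be reported as two rows that share the same ANIMAL_ID (all values below are made up, and only a few of the columns are shown):
animal_id | tag_serial_number | tag_id_code
FISH-001  | 1111111           | 100
FISH-001  | 2222222           | 101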
Quality Control - Tag-1 Notebook
Each step in the Issue checklist will be discussed here, along with other important notes required to use the Nodebook.
Imports cell
This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.
There are no values here which need to be edited.
Path to File
In this cell, you need to paste a filepath to the relevant Tagging Metadata file. The filepath will be added between the provided quotation marks.
Correct formatting looks something like this:
# Path to your tagging metadata file - Formats Accepted: (csv, xls or xlsx)
filepath = r"C:/Users/path/to/tag_metadata.xlsx"
Once you have added your filepath, you can run the cell.
Verification of File Contents - formatting
First, you must choose which sheet you would like to quality control. Generally, it will be named Tag Metadata
but is often customized by researchers. Once you have selected the sheet name, do not re-run the cell to save the output - simply ensure the correct sheet is highlighted and move onto the next cell.
This cell will now complete the first round of Quality Control checks.
The output will have useful information:
- Is the sheet formatted correctly? Correct column names, datatypes in each column etc.
- Are either the animal_id or tag_serial_number columns completed?
- Are there any harvest_date values in the metadata? Are they all after the utc_release_date_time?
- Is the information about the animal formatted according to the Data Dictionary?
- Are there any tags which are used twice in the same sheet?
- Are there potential transcription errors in the tag_code_space? Ex: drag-and-drop errors from Excel
- Are the scientific and common names used accepted by WoRMS? If there are errors flagged here which state Vernacular records not found for Aphia ID: 12345, please make a note, but continue on. This will be fixed at a later stage.
The notebook will indicate the sheet has passed quality control by adding a ✔️ green checkmark beside each section. There should also be an interactive plot generated, summarizing the tags released over time, by species, for you to explore.
If there is information which is not passing quality control, you should fix the source-file (potentially speaking to the researcher) and try again.
Connection to Database
You will have to edit two sections:
schema = 'collectioncode'
- please edit to include the relevant project code, in lowercase, between the quotes.
engine = get_engine()
- Within the open brackets you need to open quotations and paste the path to your database .kdbx file which contains your login credentials.
- On MacOS computers, you can usually find and copy the path to your database .kdbx file by right-clicking on the file and holding down the "option" key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
- The path should look like engine = get_engine('C:/Users/username/Desktop/Auth files/database_connection.kdbx').
Once you have added your information, you can run the cell. Successful login is indicated with the following output:
Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.load.oceantrack.org Database:otnunit User:admin Node:OTN
Bounding Box Check
The following cell will map the locations of all the tag deployments
, compared to the Bounding Box of the project. This is used to confirm the following:
- the tag release locations are in the part of the world expected based on the project abstract. Ex: lat/long have correct +/- signs
- the project bounding box is correct
If it appears the tag release locations have incorrect signs, you can fix it in the source file and re-run the cell.
If it appears there are tag release locations which are on land, you may want to reach out to the researcher for corrected coordinates.
If the bounding box needs to be expanded to include the tags, you can use the Square Draw Tool
to re-draw the bounding box until you are happy with it. Once all tags are drawn inside the bounding box, press the Adjust Bounding Box
button to save the results.
Verification of File Contents - against database
This cell will now complete the second round of Quality Control checks.
The output will have useful information:
- Have these tags been used on other projects in the database? Check the dates to ensure they don’t overlap, and double-reporting is not taking place.
- Do we have the Tag Specifications from the manufacturer? Do the tag_id_code, tag_code_space and est_tag_life match the specifications for each provided serial number? Are there typos or errors that require clarification from the researcher?
- Is the information about the animal formatted according to the Data Dictionary?
  - Are all the life stages in the obis.lifestage_codes table? If not, the reported life stage should be compared to the values in the obis.lifestage_codes table, and adjusted if possible. Otherwise, use the add_lifestage_codes notebook.
  - Are all length types in the obis.length_type_codes table? If not, the reported length type code should be compared to the values in the obis.length_type_codes table, and adjusted if possible. Otherwise, use the add_lengthtype_codes notebook.
  - Are all the age units in the obis.control_units table? If not, the reported age units should be compared to the values in the obis.control_units table, and adjusted if possible. Otherwise, use the add_control_units notebook.
- Are there any tags in this sheet which have been previously reported on this project in the metadata? ex: duplicates.
- Do the scientific and common names match the records which were previously added to obis.scientificnames for this schema? If not, please check the records in obis.scientificnames and the source file to confirm there are no typos. If this is a new species tagged by this project, use the scientific_name_check notebook to add the new species.
- Are all the provided tag_model values present in the obis.instrument_models table? If not, please check the records in obis.instrument_models and the source file to confirm there are no typos. If this is a new model which has never been used before, use the add instrument_models notebook to add the new tag model.
- Are there any tags in this sheet which have been previously reported on this project in the metadata, but with different deployment dates? ex: overlapping/missing harvest dates.
- Are there any tags being flagged as overlapping tag deployments, but not as duplicate tags? There may be an error with the tag's serial number. Check if the tag's ID exists in the otn_transmitters table of the schema or in the vendor.c_vemco_tags table, and compare it to the tag in the tagging metadata sheet. Fix the tag in the tagging metadata sheet if any errors are found.
- Are there any release dates in the future?
The notebook will indicate the sheet has passed quality control by adding a ✔️ green checkmark beside each section.
If there is information which is not passing quality control, you should fix the source-file (potentially speaking to the researcher) and try again.
Loading the Raw Table
ONLY once the source file has successfully passed ALL quality control checks can you load the raw table to the database.
In this cell, you will have to edit one section in order to name the raw table. Between the quotes, please add your custom table suffix. We recommend using year_month
or similar, to indicate the most-recently tagged animal in the metadata sheet.
table_suffix = "YYYY_mm"
The notebook will indicate the success of the table-creation with the following message:
Reading file: otn_metadata_tagging.xlsx.
Tag Metadata sheet found, reading file... OK
Loading 'otn_metadata_tagging.xlsx' file into collectioncode.c_tag_meta_YYYY_mm... OK
Loaded 37 records into table collectioncode.c_tag_meta_YYYY_mm
True
Task list checkpoint
In Gitlab, this task can be completed at this stage:
- [ ] - NAME load raw tag metadata ('tag-1' notebook) **put_table_name_in_ticket**
Ensure you paste the table name (ex: c_tag_meta_YYYY_mm) into the section indicated, before you check the box.
Quality Control - Tag-1b Notebook
Once the raw table is successfully loaded, the next step is to ensure any previously-loaded tags are not re-added to the database (causing duplication errors).
Imports cell
This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.
There are no values here which need to be edited.
Table Name and Database Connection
You will have to edit three sections:
engine = get_engine()
- Within the open brackets you need to open quotations and paste the path to your database .kdbx file which contains your login credentials.
- On MacOS computers, you can usually find and copy the path to your database .kdbx file by right-clicking on the file and holding down the "option" key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
- The path should look like engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx').
schema = 'collectioncode'
- please edit to include the relevant project code, in lowercase, between the quotes.
table_name = 'c_tag_meta_YYYY_mm'
- Within the quotes, please add the name of the raw table.
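Putting the three edits together, the completed cell might look something like this (the path, project code and table name are placeholders for your own values):
engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx')
schema = 'hfx'                        # your project's collectioncode, lowercase
table_name = 'c_tag_meta_2021_09'     # the raw table loaded by the tag-1 notebook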
Once you have added your information, you can run the cell. Successful login is indicated with the following output:
Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.load.oceantrack.org Database:otnunit User:admin Node:OTN
Checking for Duplicates
This cell, once run, may print out that there are No Duplicates Found
. If so, you can move on.
If there is output identifying duplicates, some review is necessary.
Immediately, the notebook will create a new table, named schema.c_tag_meta_YYYY_mm_no_dup
which is an exact copy of your raw table. There will be a print-out saying the following (under a table):
Building schema.c_tag_meta_YYYY_mm_no_dup table:
The following SQL queries will display the differences/similarities between the raw table (schema.c_tag_meta_YYYY_mm_no_dup) record and the otn_animals or otn_transmitters table record.
Then, there will be two interactive tables provided, which can be used to identify and delete any duplicate records from the no_dup
table.
- Comparison to otn_animals
- Comparison to otn_transmitters
All text should be black, and you should scroll through the entire table, row-by-row, to review any values where the cell is highlighted/flashing RED. These highlight differences between the values in the raw table vs the values in the otn table, for this animal record.
If there are no highlighted/flashing red cells in a row, you can delete the record from the no_dup
table by using the delete
button on the left-side of the table. You can also use the Select All
button if appropriate.
If there are highlighted/flashing red cells in a row, you must compare the highlighted values. These cannot be deleted without previous review since they are not exact duplicates, and could indicate an update to the database records is needed.
Ex: life_stage
from the raw table might = SMOLT
while lifestage
from otn_animals might = ADULT
. This will cause the column lifestage_equal?
to read FALSE
and the cells to flash red. In this example, it could indicate that the researcher was correcting an error in the previously-submitted record (this animal was actually a smolt, not adult) and therefore we need to update the record in our database. In this case, you would need to email the researcher to confirm, pause processing this Issue, and create a new DB_fix
Issue with the information which needs updating.
If you review all the flashing red cells and find they are only rounding errors, or similar non-significant data changes, you can determine that they are indeed duplicate records. You can now use the delete
button on the left-side of the table. You can also use the Select All
button, if all records are true duplicates.
Once you have selected a row to be deleted, the text will turn red
to help you visualize your selection.
Deleting Duplicates
Once you have identified all the true duplicate records, this next cell will remove them from the no_dup
table. Success will be indicated with this message, and a table:
Compiling list of deletions to the schema.c_tag_meta_YYYY_mm_no_dup table.
Delete the following XXX records from schema.c_tag_meta_YYYY_mm_no_dup?
You will be able to press the button to confirm, and the notebook will remove all the records.
Task list checkpoint
In Gitlab, this task can be completed at this stage:
- [ ] - NAME confirm no duplicates in raw table, review and remove ("tag-1b" notebook)
Ensure you paste the no_dup table name (ex: c_tag_meta_2021_09_no_dup), if relevant, into the Issue before you check the box. This is now the raw table that will be used for the rest of the data-loading process.
Quality Control - Tag-2 Notebook
Now that the raw table is free from duplicates, we can begin to move the records into the higher-level cache
and otn
tables, where they will be matched to detections.
Imports cell
This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.
There are no values here which need to be edited.
Database Connection
You will have to edit one section:
engine = get_engine()
- Within the open brackets you need to open quotations and paste the path to your database .kdbx file which contains your login credentials.
- On MacOS computers, you can usually find and copy the path to your database .kdbx file by right-clicking on the file and holding down the "option" key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
- The path should look like engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx').
Once you have added your information, you can run the cell. Successful login is indicated with the following output:
Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.load.oceantrack.org Database:otnunit User:admin Node:OTN
Table Name
You will have to edit two sections:
table_name = 'c_tag_meta_YYYY_mm'
- Within the quotes, please add the name of the raw table. This might be the no_dup table, if relevant.
schema = 'collectioncode'
- please edit to include the relevant project code, in lowercase, between the quotes.
Verification of File Contents - against database
This cell will now complete the final round of Quality Control checks. These are exactly the same as the checks at the end of the tag-1
notebook.
The output will have useful information:
- Have these tags been used on other projects in the database? Check the dates to ensure they don’t overlap, and double-reporting is not taking place.
- Do we have the Tag Specifications from the manufacturer? Do the tag_id_code, tag_code_space and est_tag_life match the specifications for each provided serial number? Are there typos or errors that require clarification from the researcher?
- Is the information about the animal formatted according to the Data Dictionary?
  - Are all the life stages in the obis.lifestage_codes table? If not, the reported life stage should be compared to the values in the obis.lifestage_codes table, and adjusted if possible. Otherwise, use the add_lifestage_codes notebook.
  - Are all length types in the obis.length_type_codes table? If not, the reported length type code should be compared to the values in the obis.length_type_codes table, and adjusted if possible. Otherwise, use the add_lengthtype_codes notebook.
  - Are all the age units in the obis.control_units table? If not, the reported age units should be compared to the values in the obis.control_units table, and adjusted if possible. Otherwise, use the add_control_units notebook.
- Are there any tags in this sheet which have been previously reported on this project in the metadata? ex: duplicates.
- Do the scientific and common names match the records which were previously added to obis.scientificnames for this schema? If not, please check the records in obis.scientificnames and the source file to confirm there are no typos. If this is a new species tagged by this project, use the scientific_name_check notebook to add the new species.
- Are all the provided tag_model values present in the obis.instrument_models table? If not, please check the records in obis.instrument_models and the source file to confirm there are no typos. If this is a new model which has never been used before, use the add instrument_models notebook to add the new tag model.
- Are there any tags in this sheet which have been previously reported on this project in the metadata, but with different deployment dates? ex: overlapping/missing harvest dates.
- Are there any tags being flagged as overlapping tag deployments, but not as duplicate tags? There may be an error with the tag's serial number. Check if the tag's ID exists in the otn_transmitters table of the schema or in the vendor.c_vemco_tags table, and compare it to the tag in the tagging metadata sheet. Fix the tag in the tagging metadata sheet if any errors are found.
- Are there any release dates in the future?
The notebook will indicate the sheet has passed quality control by adding a ✔️ green checkmark beside each section.
If there is information which is not passing quality control, you should fix the source-file (potentially speaking to the researcher), delete or edit the raw table, and try again.
Task list checkpoint
In Gitlab, this task can be completed at this stage:
- [ ] - NAME verify raw table ("tag-2" notebook)
Loading to Cache Tables
The first cell will create the cache tables, with the following success messaging:
Creating table schema.tagcache_YYYY_mm
Creating table schema.animalcache_YYYY_mm
Table creation finished.
The next step will populate the tag_cache and animal_cache tables by splitting apart the information about the tag and the information about the animal, joining the records by a unique catalognumber based on the tag deployment information (see the sketch below).
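Conceptually, the split works something like the sketch below. This is only an illustration - the real notebook logic and the actual catalognumber format are more involved - but it shows how one metadata row becomes one animal record and one (or more) tag records linked by a shared key:
# Hypothetical illustration only; field names are simplified and the key format is made up.
row = {"animal_id": "FISH-001", "scientific_name": "Salmo salar",
       "tag_serial_number": "1111111", "tag_id_code": "100",
       "utc_release_date_time": "2021-09-15 12:00:00"}

# A unique key derived from the tag deployment information
catalognumber = "collectioncode-" + row["animal_id"] + "-" + row["utc_release_date_time"][:10]

animal_record = {"catalognumber": catalognumber,
                 "animal_id": row["animal_id"],
                 "scientificname": row["scientific_name"]}

tag_record = {"catalognumber": catalognumber,
              "tag_serial_number": row["tag_serial_number"],
              "tag_id_code": row["tag_id_code"]}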
Running this cell will provide the following success message:
Added XX records to the schema.animalcache_YYYY_mm table
Added XX records to the schema.tagcache_YYYY_mm table
You need to pay special attention to the number of records loaded to the animal and tag caches. If these numbers don't match, you may need to investigate why there are more tags than animals or vice versa. Possible reasons the values may not match:
- There are some animals with only FLOY tags, no acoustic tags (no record added to
tag_cache
). - There are some animals with >1 tag attached, or a tag with >1 pinger ID (multiple records added to
tag_cache
, for each animal)
If the values are acceptable, you can move on.
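If you want to double-check the counts yourself, a quick query along these lines can be run from the notebook (the table names are placeholders; this is not part of the Nodebook itself):
import pandas as pd

# Hypothetical sanity check: compare record counts in the two cache tables
animals = pd.read_sql("SELECT count(*) AS n FROM schema.animalcache_YYYY_mm", engine)
tags = pd.read_sql("SELECT count(*) AS n FROM schema.tagcache_YYYY_mm", engine)
print("animals:", animals["n"][0], "tags:", tags["n"][0])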
Task list checkpoint
In Gitlab, this task can be completed at this stage:
- [ ] - NAME build cache tables ("tag-2" notebook)
Verifying Cache Tables
This cell will now complete the Quality Control checks of the cache tables.
The output will have useful information:
- Are there more animals than tags?
- Were all records loaded from raw to cache successfully?
- Do all animals have tag records, and all tags have animal records?
- Are there blank strings that need to be set to NULL? If so, press the
Set to NULL
button. - Is the species information and location information formatted correctly?
- Are the sex, age, common name and instrument model records present in the
obis
table controlled vocabulary? - Visually confirm that the length/weight values make sense for that species/lifestage, and if not, change in the
animal_cache
table. - Confirm there are no overlapping tags remaining
- Confirm there are Tag Specifications available, and they match the records. Be mindful that harvested tags will have a different tag life than is stated in the Specifications.
The notebook will indicate the sheet has passed quality control by adding a ✔️ green checkmark beside each section.
If there are any errors, go into the database and fix the cache tables themselves, and re-run.
Task list checkpoint
In Gitlab, this task can be completed at this stage:
- [ ] - NAME verify cache tables ("tag-2" notebook)
Loading to OTN Tables
STOP - confirm there is no Push currently ongoing. If a Push is ongoing, you must wait for it to be completed before processing beyond this point.
This cell will populate the otn_animals
and otn_transmitters
master-tables, with the following success messaging:
Added XX records to schema.otn_animals table from animalcache_YYYY_mm
Added XX records to schema.otn_transmitters table from tagcache_YYYY_mm
The number of records added should match the number from the cache
table loading step.
Task list checkpoint
In Gitlab, these tasks can be completed at this stage:
- [ ] - NAME load otn tables ("tag-2" notebook)
Verifying OTN Tables
This cell will now complete the Quality Control checks of the tag and animal records contained in the entire schema. We are no longer checking our newly-loaded records only, but also each previously-loaded record.
The output will have useful information:
- Are there more animals than tags?
- Do all animals have tag records, and all tags have animal records? Do their SNs/Datetimes match?
- Are there overlapping tags within the schema?
- Are there tags outside of the project bounding box?
- Are the values formatted properly for the_geom, lengthtype, length2type, ageunits, sex?
- Are there blank strings that need to be set to NULL? If so, press the
Set to NULL
button in that cell. - Are there extra spaces that need to be clipped? If so, press the
Remove extra spaces
button. - Visually confirm that the length/weight values make sense for that species/lifestage, and if not, change in the
otn_animals
table, or contact researcher. - Are the date-fields formatted correctly?
- Confirm there are Tag Specifications available, and they match the records provided. Be mindful that harvested tags will have a different tag life than is stated in the Specifications.
- Have these tags been used on other projects in the database? Check the dates to ensure they don’t overlap, and double-reporting is not taking place.
The notebook will indicate the sheet has passed quality control by adding a ✔️ green checkmark beside each section.
If there are any errors, go into the database and fix the otn
tables themselves, or contact the researcher, and re-run.
Task list checkpoint
In Gitlab, these tasks can be completed at this stage:
- [ ] - NAME verify otn tables ("tag-2" notebook)
- [ ] - NAME verify tags are not part of another collection (`tag-2` notebook)
Final Steps
The remaining steps in the Gitlab Checklist are completed outside the notebooks.
First: you should access the Repository folder in your browser and add the cleaned Tag Metadata .xlsx
file into the “Data and Metadata” folder.
Finally, the Issue can be passed off to an OTN-analyst for final verification in the database.
Key Points
Loading tagging metadata requires judgement from the Data Manager
Communication with the researcher is essential when errors are found
Deployment Metadata
Overview
Teaching: 30 min
Exercises: 0 minQuestions
How do I load new deployments into the Database?
Objectives
Understand the proper template-completion
Understand how to use the GitLab checklist
Learn how to use the
Deploy
notebook
Once a project has been registered, the next step (for Deployment
and Data
project types) is to quality control and load the instrument deployment metadata into the database. Deployment metadata should be reported to the Node in the template provided here. This file will contain information about the deployment of any instruments used to detect tagged subjects or collect related data. This includes stationary test tags, range test instruments, non-acoustic environmental sensors etc. Geographic location, as well as the duration of the deployment for each instrument, is recorded. The locations of these listening stations are used to fix detections geographically.
Remembering our previous lessons, there are multiple levels of data-tables in the database for deployment records: raw tables
, rcvr_locations
, stations
and moorings
. The process for loading instrument metadata reflects this, as does the GitLab task list.
Submitted Metadata
Immediately upon receipt of the metadata, a new GitLab Issue should be created. Please use the Receiver_metadata
Issue checklist template.
Here is the Issue checklist, for reference:
Receiver Metadata
- [ ] - NAME add label *'loading records'*
- [ ] - NAME load raw receiver metadata (`deploy` notebook) **put_table_name_in_ticket**
- [ ] - NAME check that station locations have not changed station "NAMES" since last submission (manual check)
- [ ] - NAME verify raw table (`deploy` notebook)
- [ ] - NAME post updated metadata file to project repository (OTN members.oceantrack.org, FACT RW etc)
- [ ] - NAME load station records (`deploy` notebook)
- [ ] - NAME verify stations (`deploy` notebook)
- [ ] - NAME load to rcvr_locations (`deploy` notebook)
- [ ] - NAME verify rcvr_locations (`deploy` notebook)
- [ ] - NAME add transmitter records receivers with integral pingers (`deploy` notebook)
- [ ] - NAME load to moorings (`deploy` notebook)
- [ ] - NAME verify moorings (`deploy` notebook)
- [ ] - NAME label issue with *'Verify'*
- [ ] - NAME pass issue to OTN DAQ for reassignment to analyst
- [ ] - NAME check if project is OTN loan, if yes, check for lost indicator in recovery column, list receiver serial numbers for OTN inventory updating.
- [ ] - NAME pass issue to OTN analyst for final verification
- [ ] - NAME check for double reporting (verification_notebooks/Deployment Verification notebook)
**receiver deployment files/path:**
Visual Inspection
Once the completed file is received from a researcher, the Data Manager should first complete a visual check for formatting and accuracy.
In general, the deployment metadata contains information on the instrument, the deployment location, and the deployment/recovery times.
Things to visually check in the metadata:
- Is there any information missing from the essential columns? These are:
- otn_array
- station_no
- deploy_date
- deploy_lat
- deploy_long
- ins_model_no
- ins_serial_no
- recovered
- recover_date
- If any of the above mandatory fields are blank, follow-up with the researcher will be required if:
- you cannot discern the values yourself.
- you do not have access to the Tag or Receiver Specifications from the manufacturer (relevant for the columns containing
transmitter
information).
- Are all lat/longs in the correct sign? Are they in the correct format (decimal degrees)?
- Do all transceivers/test tags have their transmitters provided?
- Are all recoveries from previous years recorded?
- Do comments suggest anything was lost or damaged, where recovery indicator doesn’t say “lost” or “failed”?
In general, the most common formatting errors occur in records where there is >1 instrument deployed at a station, or where the receiver was deployed and recovered at the same site.
The metadata template available here has a Data Dictionary
sheet which contains detailed expectations for each column. Refer back to these definitions often. We have also included some recommendations on our FAQ page. Here are some guidelines:
- Deployment, download, and recovery information for each station is entered on a single line.
- When more than one instrument is deployed, downloaded, or recovered at the same station, enter each one on a separate line using the same OTN_ARRAY and STATION_NO.
- When sentinel tags are co-deployed with receivers, their information can be added to the TRANSMITTER and TRANSMIT_MODEL columns, on the same line as the receiver deployment.
- If a sentinel tag is deployed alone then a new line for that station, with as much information as possible, is added.
- When an instrument is deemed lost, a value of l or lost should be entered in the "recovered" field; if the instrument is found, this can be updated by changing the recovery field to f or found and resubmitting the metadata sheet.
- Every time an instrument is brought to the surface, enter y to indicate it was successfully recovered, even if only for downloading and redeployment. A new line for the redeployment is required (see the example after this list).
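For example, a receiver downloaded and redeployed at the same station would be reported as two lines like these (all values are illustrative only, and only a few of the columns are shown):
otn_array | station_no | deploy_date | ins_serial_no | recovered | recover_date
HFX       | HFX001     | 2020-06-01  | 123456        | y         | 2021-06-02
HFX       | HFX001     | 2021-06-02  | 123456        | y         | 2022-06-01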
Quality Control - Deploy Notebook
Each step in the Issue checklist will be discussed here, along with other important notes required to use the Nodebook.
Imports cell
This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.
There are no values here which need to be edited.
Path to File
In this cell, you need to paste a filepath to the relevant Deployment Metadata file. The filepath will be added between the provided quotation marks.
Correct formatting looks something like this:
# Shortfrom metadata path (xls, csv)
filepath = r'C:/Users/path/to/deployment_metadata.xlsx'
You also must select the format of the Deployment metadata. Currently, only the FACT Network uses a slightly different format than the template available here. If it's relevant for your Node, you can edit the excel_fmt
section.
Correct formatting looks something like this:
excel_fmt = 'otn' # Deployment metadata format 'otn' or 'fact'
Once you have added your filepath and chosen your template format, you can run the cell.
Next, you must choose which sheet you would like to quality control. Generally, it will be named Deployment
but is often customized by researchers. Once you have selected the sheet name, do not re-run the cell to save the output - simply ensure the correct sheet is highlighted and move onto the next cell.
Table Name and Database
You will have to edit three sections:
schema = 'collectioncode'
- please edit to include the relevant project code, in lowercase, between the quotes.
table_name = 'c_shortform_YYYY_mm'
- Within the quotes, please add your custom table suffix. We recommend using year_month or similar, to indicate the most-recently deployed/recovered instrument in the metadata sheet.
engine = get_engine()
- Within the open brackets you need to open quotations and paste the path to your database .kdbx file which contains your login credentials.
- On MacOS computers, you can usually find and copy the path to your database .kdbx file by right-clicking on the file and holding down the "option" key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
- The path should look like engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx').
Once you have added your information, you can run the cell. Successful login is indicated with the following output:
Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node
Verification of File Contents
This cell will now complete the first round of Quality Control checks.
The output will have useful information:
- Is the sheet formatted correctly? Correct column names, datatypes in each column etc.
- Compared to the stations table in the database, are the station names correct? Have stations "moved" location? Are the reported bottom_depths significantly different (check for possible ft vs m vs ftm errors)?
- Are all recovery dates after the deployment dates?
- Are all the provided ins_model_no values present in the obis.instrument_models table? If not, please check the records in obis.instrument_models and the source file to confirm there are no typos. If this is a new model which has never been used before, use the add instrument_models notebook to add the new instrument model.
- Do all transceivers/test tags have their transmitters provided? Do these match any manufacturer Specifications we have in the database?
- Are there any overlapping deployments (one serial number deployed at multiple locations for a period of time)?
- Are all the deployments within the Bounding Box of the project? If the bounding box needs to be expanded to include the stations, you can use the Square Draw Tool to re-draw the bounding box until you are happy with it. Once all stations are drawn inside the bounding box, press the Adjust Bounding Box button to save the results.
- Are there possible gaps in the metadata, based on previously-loaded detections files? This will be investigated in the Detections-3b notebook if you need more details.
The notebook will indicate the sheet has passed quality control by adding a ✔️ green checkmark beside each section. There should also be an interactive plot generated, summarizing the instruments deployed over time for you to explore, and a map of the deployments.
Using the map, please confirm the following:
- the instrument deployment locations are in the part of the world expected based on the project abstract. Ex: lat/long have correct +/- signs
- the instrument deployments do not occur on land
If there is information which is not passing quality control, you should fix the source-file (potentially after speaking to the researcher) and try again.
Loading the Raw Table
ONLY once the source file has successfully passed ALL quality control checks can you load the raw table to the database.
You have already named the table above, so there are no edits needed in this cell.
The notebook will indicate the success of the table-creation with the following message:
Reading file 'deployment_metadata.xlsx' as otn formatted Excel.
Table Loading Complete:
Loaded XXX records into table schema.c_shortform_YYYY_mm
Task list checkpoint
In GitLab, these tasks can be completed at this stage:
- [ ] - NAME load raw receiver metadata ("deploy" notebook) **put_table_name_in_ticket**
- [ ] - NAME check that station locations have not changed station "NAMES" since last submission (manual check)
Ensure you paste the table name (ex: c_shortform_YYYY_mm) into the section indicated, before you check the box.
Verify Raw Table
This cell will now complete the Quality Control checks of the raw table. This is to ensure the Nodebook loaded the records correctly from the Excel sheet.
The output will have useful information:
- Are there any duplicate records?
- Is there missing information in the ar_model_no and ar_serial_no columns? If so, ensure that it makes sense for receivers to be deployed without Acoustic Releases in this location (ex: diver deployed).
- Do all transceivers/test tags have their transmitters provided? Do these match any manufacturer Specifications we have in the database?
- Are there any non-numeric serial numbers? Do these make sense (ex: environmental sensors)?
- Are the deployments within the bounding box?
- Is the recovered column completed correctly, based on the comments and the recovery_date columns?
- Are there blank strings that need to be set to NULL? If so, press the Set to NULL button in that cell.
The notebook will indicate the sheet has passed quality control by adding a ✔️ green checkmark beside each section.
If there are any errors, go into the database and fix the raw table directly, or contact the researcher, and re-run.
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - NAME verify raw table ("deploy" notebook)
Loading Stations Records
STOP - confirm there is no Push currently ongoing. If a Push is ongoing, you must wait for it to be completed before processing beyond this point.
Only once the raw table has successfully passed ALL quality control checks can you load the stations information to the database stations
table.
Running this cell will first check for any new stations to add, then confirm the records in the stations table match the records in the moorings table where basisofrecord = 'STATION'.
If new stations are identified:
- Compare these station names to existing stations in the
stations
table in the database. Are they truly new, or is there a typo in the raw table? Did the station change names, but is in the same location as a previous station? - If there are fixes to be made, change the records in the
raw
table, or contact the researcher. - If all stations are truly new, you should review the information in the editable form, then select
Add New Stations
.
The success message will look like:
Adding station records to the stations table.
Creating new stations...
Added XXX new stations to schema.moorings
If the stations and moorings tables are not in sync, the differences between the two will need to be reviewed and possibly updated (a quick comparison like the sketch below can help).
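If you need to investigate, a comparison along these lines can show station names present in one table but not the other (the station_name column and schema name are assumptions - adjust them to your own schema):
import pandas as pd

# Hypothetical comparison of station names between the two tables
stations = pd.read_sql("SELECT station_name FROM schema.stations", engine)
moorings = pd.read_sql("SELECT station_name FROM schema.moorings WHERE basisofrecord = 'STATION'", engine)
print(set(stations["station_name"]) ^ set(moorings["station_name"]))  # names found in only one table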
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - load station records ("deploy" notebook)
Verify Stations Table
This cell will now complete the Quality Control checks of the stations records contained in the entire schema. We are no longer only checking against our newly-loaded records, but also each previously-loaded record in this schema/project. This will help catch historical errors.
The output will have useful information:
- Were all the stations from our
raw
table promoted to thestations
table, and themoorings
table? - Are all stations in unique locations?
- Are all stations within the project’s bounding box?
- Does the date in the
stations
table match the first deployment date for that station? - Are there blank strings that need to be set to NULL? If so, press the
Set to NULL
button in that cell. - Are any of the dates in the future?
The notebook will indicate the sheet has passed quality control by adding a ✔️ green checkmark beside each section.
If there are any errors, go into the database and fix the raw table directly, or contact the researcher, and re-run. If there are problems with records that have already been promoted to the stations or moorings tables, you will need to contact an OTN database staff member to resolve these.
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - verify stations("deploy" notebook)
Load to rcvr_locations
Once the stations table is verified, the receiver deployment records can now be promoted to the "intermediate" rcvr_locations table.
The cell will identify any new deployments to add, and any previously-loaded deployments which need updating (ex: they have been recovered).
If new deployments are identified:
- Compare these deployments to existing deployments in the
rcvr_locations
table in the database. Are they truly new, or is there a typo in the raw table? Did the station change names, but is in the same location as before? Did the serial number change, but the deployment location and date are the same? - If there are fixes to be made, change the records in the
raw
table, or contact the researcher. - If all deployments are truly new, you can then select
Add Deployments
.
If deployment updates are identified:
- Review the suggested changes: do not select changes which remove information (ex: release 590023 -> None is not a good change to accept).
- Use the check boxes to select only the appropriate changes.
- Once you are sure the changes are correct, you can then select
Update Deployments
.
Each instance will give a success message such as:
Loading deployments into the rcvr_locations_table
Loaded XX records into the schema.rcvr_locations table.
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - load to rcvr_locations ("deploy" notebook)
Verify rcvr_locations
This cell will now complete the Quality Control checks of the rcvr_locations records contained in the entire schema. We are no longer only checking our newly-loaded records, but also each previously-loaded record for this schema/project. This will help catch historical errors.
The output will have useful information:
- Have all deployments been loaded from the raw table? Please note that instruments where a sentinel tag is deployed alone at a station will not be loaded to rcvr_locations, and so these will likely be flagged in this section for your review.
- Are there blank strings that need to be set to NULL? If so, press the
Set to NULL
button in that cell. - Based on comments, are there any instances where a receiver’s status should be changed to “lost”?
- Are all instruments in the
obis.instrument_models
table? - Are there any overlapping deployments?
- Is the geom, serial number and catalognumber formatted correctly?
The notebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.
If there are any errors contact OTN, or contact the researcher, to resolve.
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - verify rcvr_locations ("deploy" notebook)
Load Transmitter Records to Moorings
The transmitter
values associated with transceivers, co-deployed sentinel tags or stand-alone test tags will be loaded to the moorings
table in this section. Existing transmitter records will also be updated if relevant.
If new transmitters are identified:
- Review for accuracy then select
Add Transmitters
.
If transmitter updates are identified:
- Review the suggested changes: do not select changes which remove information (ex: release 590023 -> None is not a good change to accept).
- Use the check boxes to select only the appropriate changes.
- Once you are sure the changes are correct, you can then select
Update Transmitters
.
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - load transmitter records receivers with integral pingers ("deploy" notebook)
Load Receivers to Moorings
The final, highest-level table for instrument deployments is moorings
.
The cell will identify any new deployments to add, and any previously-loaded deployments which need updating (ex: they have been recovered).
Please review all new deployments and deployment updates for accuracy, then press the associated buttons to make the changes. At this stage, the updates are not editable: any updates chosen from the rcvr_locations
section will be processed here.
You may be asked to select an instrumenttype
for certain receivers. Use the drop-down menu to select before adding the deployment.
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - load to moorings ("deploy" notebook)
Verify Moorings
This cell will now complete the Quality Control checks of the moorings records contained in the entire schema. We are no longer only checking our newly-loaded records, but also each previously-loaded record in this project/schema.
The output will have useful information:
- Have all deployments been loaded from rcvr_locations?
- Do we have manufacturer specifications for the receivers or tags deployed?
- Are there blank strings that need to be set to NULL? If so, press the
Set to NULL
button in that cell. - Do the
STATIONS
records match the information in thestations
table? - Are all instruments in the
obis.instrument_models
table? - Are there any overlapping deployments or receivers or tags?
- Are there duplicate download records?
- Is the lat/long, geom, serial number and catalognumber formatted correctly?
The notebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.
If there are any errors contact OTN to resolve.
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - verify moorings ("deploy" notebook)
Final Steps
The remaining steps in the GitLab Checklist are completed outside the notebooks.
First: you should access the Repository folder in your browser and add the cleaned Deployment Metadata .xlsx
file into the “Data and Metadata” folder.
Finally, the Issue can be passed off to an OTN-analyst for final verification in the database.
Key Points
Loading receiver metadata requires judgement from the Data Manager
Communication with the researcher is essential when errors are found
Detection Loading
Overview
Teaching: 40 min
Exercises: 0 minQuestions
What is the workflow for loading detection data?
What do I need to look out for as a Node Manager when loading detections?
Objectives
Understand how to use the GitLab checklist
Understand the workflow for detection data in the OTN system
Learn common errors and pitfalls that come up when loading detections
Once deployment metadata
has been processed for a project, the related detections may now be processed. Detection data should be reported to the Node as a collection of raw, unedited files. These can be in the form of a zipped folder of .VRLs
, a database from Thelma Biotel or any other raw data product from any manufacturer. The files contain only transmitter numbers and the datetimes at which they were recorded at a specific receiver. The tag metadata
and deployment metadata
will provide the associated geographic and biological context to this data.
Submitted Records
Immediately upon receipt of the data files, a new GitLab Issue should be created. Please use the Detections
Issue checklist template.
Here is the Issue checklist, for reference:
Detections
- [ ] - NAME load raw detections and events (`detections-1` notebook and `events-1` notebook **OR** `Convert - Fathom Export` notebook and `detections-1` notebook) **(put table names here)**
- [ ] - NAME upload raw detections to project folder (OTN members.oceantrack.org, FACT RW etc) if needed
- [ ] - NAME verify raw detections table (`detections-1` notebook)
- [ ] - NAME load raw events to events table (`events-2` notebook)
- [ ] - NAME load to detections_yyyy (`detections-2` notebook)
- [ ] - NAME comment in issue what detection years were loaded (output from `detections-2`)
- [ ] - NAME verify detections_yyyy (looking for duplicates) (`detections-2` notebook)
- [ ] - NAME load to sensor_match_yyyy (`detections-2` notebook)
- [ ] - NAME comment in issue what sensor years were loaded (output from `detections-2`)
- [ ] - NAME timedrift correction for affected detection and sensor years (`detections-2b` notebook)
- [ ] - NAME verify timedrift corrections (`detections-2b` notebook)
- [ ] - NAME manually check for open, unverified receiver metadata, **STOP** if it exists! (**put Gitlab issue number here**)
-----
- [ ] - NAME load to otn_detections_yyyy (`detections-3` notebook)
- [ ] - NAME verify otn_detections_yyyy (`detections-3` notebook)
- [ ] - NAME load sentinel records (`detections-3` notebook)
- [ ] - NAME check for missing receiver metadata (`detections-3b` notebook)
- [ ] - NAME check for missing data records (`detections-3c` notebook)
- [ ] - NAME load download records (`events-3` notebook)
- [ ] - NAME verify download records (`events-3` notebook)
- [ ] - NAME process receiver configuration (`events-4` notebook)
- [ ] - NAME label issue with *'Verify'*
- [ ] - NAME pass issue to OTN analyst for final steps
- [ ] - NAME check for double reporting (verification_notebooks/Project Verification notebook)
- [ ] - NAME match tags to animals (`detections-4` notebook)
- [ ] - NAME do sensor tag processing (only done if vendor specifications are available)
- [ ] - NAME update detection extract table
**detections files/path:**
Visual Inspection
Once the files are received from a researcher, the Data Manager should first complete a visual check for formatting and accuracy.
Things to visually check:
- Do the files appear edited? Look for `_edited` in the file name.
- Is the file format the same as expected for that manufacturer? Ex. `.vrl` for Innovasea - not `.csv` or `rld` formats.
- Is there data for each of the instrument recoveries that was reported in the `deployment metadata`?
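Much of this visual check can be pre-screened with a short script before opening each file. Below is a minimal, hypothetical sketch (not part of the Nodebooks); the folder path and the set of expected file extensions are assumptions you would adjust for the manufacturer in question.

```python
from pathlib import Path

# Hypothetical pre-screen of a submitted folder of raw Innovasea files.
# Adjust submission_dir and expected_exts per manufacturer.
submission_dir = Path("C:/Users/path/to/submitted_files")
expected_exts = {".vrl", ".vdat"}

for f in sorted(submission_dir.rglob("*")):
    if not f.is_file():
        continue
    if "_edited" in f.name.lower():
        print(f"WARNING: possibly edited file: {f.name}")
    if f.suffix.lower() not in expected_exts:
        print(f"WARNING: unexpected file type ({f.suffix}): {f.name}")
```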
Convert to CSV
Once the raw files are obtained, the data must be converted to `csv` format. There are several ways this can be done, depending on the manufacturer.
For Innovasea
- VUE
  - Open a new `database`
  - Import all the `VRL` files
  - Select `export detections` and choose the location you want to save the files
  - Select `export events` and choose the location you want to save the files
- Fathom Connect App
  - choose “export data”
  - select the relevant files and import into the Fathom Connect application
  - export all data types, and choose the location you want to save the files
- `convert - Fathom (vdat) Export - VRL to CSV` Nodebook
  - this will use the `vdat.exe` executable to export from VRL/VDAT to CSV
  - select the folder containing the relevant files and the location you’d like the CSVs saved
  - run the cells to `convert`
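If you prefer to script the export instead of clicking through VUE or Fathom Connect, the `vdat.exe` executable can also be driven from Python. The sketch below is illustrative only: the `convert --format=csv.fathom` subcommand and flags are assumptions based on common vdat usage and can differ between vdat versions, so confirm them with `vdat --help` on your installation.

```python
import subprocess
from pathlib import Path

# Hypothetical batch conversion of VRL files to Fathom-style CSV via vdat.exe.
# The exact subcommand/flags are assumptions; confirm with `vdat --help`.
vdat_exe = r"C:/path/to/vdat.exe"
vrl_dir = Path(r"C:/Users/path/to/VRL_files")
out_dir = Path(r"C:/Users/path/to/detections_CSVs")
out_dir.mkdir(parents=True, exist_ok=True)

for vrl in sorted(vrl_dir.glob("*.vrl")):
    result = subprocess.run(
        [vdat_exe, "convert", "--format=csv.fathom", str(vrl), f"--output={out_dir}"],
        capture_output=True, text=True,
    )
    print(vrl.name, "OK" if result.returncode == 0 else f"FAILED: {result.stderr.strip()}")
```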
For Thelma Biotel
- use the `ComPort` software to open the `.tbdb` file and export as CSV
For Lotek
- exporting to CSV is more complicated, please reach out to OTN for specific steps
Other manufacturers: contact OTN staff.
detections - 1 - load csv detections
Detections-1 loads CSV detections files into a new database table. If detections were exported using `Fathom` or the `convert - Fathom (vdat) Export - VRL to CSV` notebook, the `events` records will also be loaded at this stage. This is because these applications combine the detections and events data in one CSV file.
Import cells and Database Connections
As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.
The second cell will set your database connection. You will have to edit one section: `engine = get_engine()`
- Within the open brackets you need to open quotations and paste the path to your database `.kdbx` file which contains your login credentials.
- On MacOS computers, you can usually find and copy the path to your database `.kdbx` file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
- The path should look like `engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx')`.
Once you have added your information, you can run the cell. Successful login is indicated with the following output:
Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node
User Input
Cell three requires input from you. This information will be used to get the raw detections CSV and to be able to create a new raw table in the database.
`file_or_folder_path = r'C:/Users/path/to/detections_CSVs/'`
- paste a filepath to the relevant CSV file(s). The filepath will be added between the provided quotation marks.
- this can be a path to a single CSV file, or a folder of multiple CSVs.

`table_suffix = 'YYYY_mm'`
- Within the quotes, please add your custom table suffix. We recommend using `year_month` or similar, to indicate the most-recently downloaded instrument.

`schema = 'collectioncode'`
- please edit to include the relevant project code, in lowercase, between the quotes.

There are also some optional inputs:
- `load_detections`: a true or false value using the table suffix you supplied
- `stacked`: this is for Fathom exports only and is a way to know how to parse them
Once you have added your information, you can run the cell.
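Put together, a filled-in user input cell might look like the following sketch. The path, suffix, and collection code are placeholders, and the optional values are only needed when they apply to your files.

```python
# Example values for the detections-1 user input cell (placeholders only).
file_or_folder_path = r'C:/Users/username/Desktop/hfx_2023_06/detections_CSVs/'
table_suffix = '2023_06'    # year_month of the most recent download
schema = 'hfx'              # project collection code, lowercase

# Optional inputs:
load_detections = True      # true or false, using the table suffix above
stacked = False             # Fathom exports only; tells the notebook how to parse them
```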
Verify Detection File and Load to Raw Table
Next, the notebook will review and verify the detection file(s) format, and report any errors. Upon successful verification, you can then run the cell below, which will attempt to load the detections into a new raw table.
The notebook will indicate the success of the table-creation with a message such as this:
Reading fathom files...
Loading Files...
7/7
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - NAME load raw detections and events ('detections-1' notebook and 'events-1' notebook **OR** 'Batch Fathom Export' notebook and 'detections-1' notebook) **(put table names here)**
Ensure you paste the table name (ex: c_detections_YYYY_mm) into the section indicated, before you check the box.
Verify Raw Detection Table
This cell will now complete the Quality Control checks of the raw table. This is to ensure the Nodebook loaded the records correctly from the CSVs.
The output will have useful information:
- Are there any duplicates?
- Are the serial numbers formatted correctly?
- Are the models formatted correctly?
The notebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.
If there are any errors, contact OTN for next steps.
Task list checkpoint
In GitLab, these tasks can be completed at this stage:
- [ ] - NAME verify raw detections table ('detections-1' notebook)
events - 1 - load events into c_events_yyyy
Events-1 is responsible for loading receiver events files into raw tables. This is only relevant for CSVs that were NOT exported using `Fathom` or the `convert - Fathom (vdat) Export - VRL to CSV` notebook.
Import cell
As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.
User Inputs
Cell two requires input from you. This information will be used to get the raw events CSV and to be able to create a new raw table in the database.
`filepath = r'C:/Users/path/to/events.csv'`
- paste a filepath to the relevant CSV file. The filepath will be added between the provided quotation marks.

`table_name = 'c_events_YYYY_mm'`
- Within the quotes, please add your custom table suffix. We recommend using `year_month` or similar, to indicate the most-recently downloaded instrument.

`schema = 'collectioncode'`
- please edit to include the relevant project code, in lowercase, between the quotes.

There are also some optional inputs:
- `file_encoding`: the encoding of the events file; ISO-8859-1 is the default encoding used in VUE’s event export.
Once you have added your information, you can run the cell.
Verifying the events file
Before attempting to load the event files to a raw table, the notebook will verify the file to make sure there are no major issues. This will be done by running the `Verify events file` cell. Barring any errors, you will be able to continue.
The notebook will indicate the success of the file verification with a message such as this:
Reading file 'events.csv' as CSV.
Verifying the file.
Format: VUE 2.6+
Mandatory Columns: OK
date_and_time datetime:OK
Initialization(s): XX
Data Upload(s): XX
Reset(s): XX
Database Connection
You will have to edit one section: `engine = get_engine()`
- Within the open brackets you need to open quotations and paste the path to your database `.kdbx` file which contains your login credentials.
- On MacOS computers, you can usually find and copy the path to your database `.kdbx` file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
- The path should look like `engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx')`.
Once you have added your information, you can run the cell. Successful login is indicated with the following output:
Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node
Load the events file into the c_events_yyyy table
The second-last cell loads the events file into a raw table. It depends on successful verification from the previous step. Upon successful loading, you can dispose of the engine and then move on to the next notebook.
The notebook will indicate the success of the table-creation with the following message:
File loaded with XXXXX records.
100%
Task list checkpoint
In GitLab, these tasks can be completed at this stage:
- [ ] - NAME load raw detections and events ('detections-1' notebook and 'events-1' notebook **OR** 'Batch Fathom Export' notebook and 'detections-1' notebook) **(put table names here)**
Ensure you paste the table name (ex: c_events_YYYY_mm) into the section indicated, before you check the box.
events - 2 - move c_events into events table
This notebook will move the `raw` events records into the `intermediate` events table.
Import cell
As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.
User input
This cell requires input from you. This information will be used to identify the raw events table and the schema it should be loaded into.
`c_events_table = 'c_events_YYYY_mm'`
- Within the quotes, please add your custom table suffix, which you have just loaded in either `detections-1` or `events-1`.

`schema = 'collectioncode'`
- please edit to include the relevant project code, in lowercase, between the quotes.
Database Connection
You will have to edit one section: `engine = get_engine()`
- Within the open brackets you need to open quotations and paste the path to your database `.kdbx` file which contains your login credentials.
- On MacOS computers, you can usually find and copy the path to your database `.kdbx` file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
- The path should look like `engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx')`.
Once you have added your information, you can run the cell. Successful login is indicated with the following output:
Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node
Verify table format
You will then verify that the c_events table you entered exists, and that it meets the required format specifications.
The notebook will indicate the success of the table verification with a message such as this:
Checking table name format... OK
Checking if schema collectioncode exists... OK!
Checking collectioncode schema for c_events_YYYY_mm table... OK!
collectioncode.c_events_YYYY_mm table found.
If there are any errors in this section, please contact OTN.
Load to Events table
Provided nothing comes up in the verification cells, you can run the `loading` cell.
The notebook will indicate the success of the processing with a message such as this:
Checking for the collectioncode.events table... OK!
Loading events... OK!
Loaded XX rows into collectioncode.events table.
Task list checkpoint
In GitLab, these tasks can be completed at this stage:
- [ ] - NAME load raw events to events table ("events-2" notebook)
detections - 2 - c_table into detections_yyyy
This notebook takes the `raw` detection data from detections-1 and moves it into the `intermediate` detections_yyyy tables (split out by year).
Import cells and Database Connections
As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.
The second cell will set your database connection. You will have to edit one section: `engine = get_engine()`
- Within the open brackets you need to open quotations and paste the path to your database `.kdbx` file which contains your login credentials.
- On MacOS computers, you can usually find and copy the path to your database `.kdbx` file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
- The path should look like `engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx')`.
Once you have added your information, you can run the cell. Successful login is indicated with the following output:
Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node
User Inputs
To load to the detections_yyyy tables, the notebook will require information about the schema you are working in and the raw table that you created in `detections-1`.

`c_table = 'c_detections_YYYY_mm'`
- Within the quotes, please add your custom table suffix, which you have just loaded in `detections-1`.

`schema = 'collectioncode'`
- please edit to include the relevant project code, in lowercase, between the quotes.
The notebook will indicate success with the following message:
Checking table name format... OK
Checking if schema collectioncode exists... OK!
Checking collectioncode schema for c_detections_yyyy_mm table... OK!
collectioncode.c_detections_yyyy_mm table found.
Create Missing Tables
Detections tables are only created on an as-needed basis. These cells will detect any tables you are missing and create them as needed, based on the years covered in the raw detection table (c_table). This will check all tables such as `detections_yyyy`, `sensor_match_yyyy` and `otn_detections_yyyy`.
First, the notebook will gather and print the missing tables. If none are missing, the notebook will report that as well.
vemco: Match
You are missing the following tables:
[collectioncode.detections_YYYY, v2lbeiar.otn_detections_YYYY, v2lbeiar.sensor_match_YYYY]
Create these tables by passing the missing_tables variable into the create_detection_tables function.
If you proceed in the notebook, there is a `creation` cell which will add these tables to the project schema in the database. Success will be indicated with the following message:
Creating table collectioncode.detections_YYYY... OK
Creating table collectioncode.otn_detections_YYYY... OK
Creating table collectioncode.sensor_match_YYYY... OK
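If you want to double-check which detection-related tables now exist in the schema, a quick query of `information_schema` (sketched below with pandas, and assuming the `engine` object created earlier in the notebook is still available) will list them. This is a convenience check only, not a notebook step.

```python
import pandas as pd

# Convenience check: list the detection-related tables in the project schema.
# Assumes `engine` from the notebook's get_engine() cell is still available.
schema = 'collectioncode'
tables = pd.read_sql(
    f"SELECT table_name FROM information_schema.tables "
    f"WHERE table_schema = '{schema}' ORDER BY table_name;",
    engine,
)
print(tables[tables['table_name'].str.contains('detections|sensor_match')])
```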
Create Detection Sequence
Before loading detections, a detection sequence is created. The sequence is used to populate the `det_key` column. The `det_key` value is a unique ID for that detection, to help ensure there are no duplicates. If a sequence is required, you will see this output:
creating sequence v2lbeiar.detections_seq... OK
No further action is needed.
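To illustrate what the `det_key` protects against, a duplicate-key check on one year's table could look like the hedged sketch below (again assuming the notebook's `engine` is available; the year is a placeholder). The notebook's own verification cells already perform this kind of check.

```python
import pandas as pd

# Illustration only: look for duplicated det_key values in one detections_yyyy table.
# Assumes `engine` from the notebook's get_engine() cell is still available.
schema, year = 'collectioncode', 2020   # placeholders
query = f"""
    SELECT det_key, COUNT(*) AS n
    FROM {schema}.detections_{year}
    GROUP BY det_key
    HAVING COUNT(*) > 1;
"""
dupes = pd.read_sql(query, engine)
print(f"{len(dupes)} duplicated det_key values found")
```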
Load to Detections_YYYY
Duplicate detections are then checked for and will not be inserted into the detections_yyyy tables.
If no duplicates are found you will see:
No duplicates found. All of the detections will be loaded into the detections_yyyy table(s).
If duplicates are found you will see:
- a bar chart showing the number of detections per year which:
  - have already been loaded
  - are new, and will be loaded this time
- you may want to investigate if the results are not what you expected.
After all this, the `raw` detection records are ready to be loaded into the `detections_yyyy` tables. The notebook will indicate success with the following message:
Inserting records from collectioncode.c_detections_YYYY_mm into collectioncode.detections_2018... OK
Added XXXXX rows.
Inserting records from collectioncode.c_detections_YYYY_mm into collectioncode.detections_2019... OK
Added XXXXX rows.
Inserting records from collectioncode.c_detections_YYYY_mm into collectioncode.detections_2020... OK
Added XXXXX rows.
Inserting records from collectioncode.c_detections_YYYY_mm into collectioncode.detections_2021... OK
Added XXXXX rows.
You must note which years have been loaded! In the example above, this would be 2018, 2019, 2020, and 2021.
Task list checkpoint
In GitLab, these tasks can be completed at this stage:
- [ ] - NAME load to detections_yyyy (`detections-2` notebook)
- [ ] - NAME comment in issue what detection years were loaded (output from `detections-2`)
Ensure you paste the affected years (ex: 2019, 2020) into the section indicated, before you check the box.
Verify Detections YYYY Tables
This cell will now complete the Quality Control checks of the `detections_yyyy` tables. This is to ensure the Nodebook loaded the records correctly.
First, you will need to list all of the years that were affected by the previous loading step, so the Nodebook knows which tables need to be verified. The format will look like this:
years = ['YYYY','YYYY','YYYY', 'YYYY']
If left blank, the Nodebook will check all the years, which may take a long time for some projects.
Run this cell, then you can verify in the next cell.
The output will have useful information:
- Were all the detections loaded?
- Are the serial numbers formatted correctly?
- Are the models formatted correctly?
- Are there duplicate detections?
- What sensors will need to be loaded?
The notebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.
If there are any errors, contact OTN for next steps.
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - NAME verify detections_yyyy (looking for duplicates) ("detections-2" notebook)
Load sensors_match Tables by Year
For the last part of this notebook, you will need to load to the `sensor_match_YYYY` tables. This loads detections with sensor information into a project’s sensor_match_yyyy tables. Later, these tables will aid in matching vendor specifications to resolve sensor tag values.
Output will appear like this:
Inserting records from collectioncode.detections_2019 INTO sensor_match_2019... OK
Added XXX rows.
Inserting records from collectioncode.detections_2021 INTO sensor_match_2021... OK
Added XXX rows.
You must note which years have been loaded! In the example above, this would be 2019 and 2021.
Task list checkpoint
In GitLab, these tasks can be completed at this stage:
- [ ] - NAME load to sensor_match_yyyy (`detections-2` notebook)
- [ ] - NAME comment in issue what sensor years were loaded (output from `detections-2`)
Ensure you paste the affected years (ex: 2019, 2020) into the Issue.
detections - 2b - timedrift calculations
This notebook calculates time drift factors and applies the corrections to the `detections_yyyy` tables, in a field called `corrected_time`. OTN’s Data Manager toolbox (the Nodebooks) corrects for timedrift between each initialization and offload of a receiver. If a receiver is offloaded several times in one data file, time correction does not occur linearly from start to end, but between each download, to ensure the most accurate correction. If there is only one download in a data file then the time correction in `VUE` software will match the time correction performed by OTN.
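To make the idea concrete, here is a minimal sketch of a piecewise-linear drift correction: the clock offset measured at each download is spread proportionally across the records recorded since the previous initialization or download. This is not the Nodebook's implementation; the function name, arguments, and sign convention are assumptions for illustration only.

```python
from datetime import datetime, timedelta

def correct_time(obs_time, interval_start, interval_end, drift_seconds):
    """Linearly apply the clock drift accumulated over one
    initialization/download interval to a single observation.
    drift_seconds is the offset measured at interval_end; the
    sign convention here is an assumption for illustration."""
    span = (interval_end - interval_start).total_seconds()
    elapsed = (obs_time - interval_start).total_seconds()
    fraction = elapsed / span if span > 0 else 0.0
    return obs_time - timedelta(seconds=drift_seconds * fraction)

# Example: a receiver that drifted +10 s over a six-month deployment.
start = datetime(2020, 1, 1)
end = datetime(2020, 7, 1)
obs = datetime(2020, 4, 1)
print(correct_time(obs, start, end, drift_seconds=10))  # roughly 5 s earlier
```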
Import cells and Database connections
As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.
The second cell will set your database connection. You will have to edit one section: `engine = get_engine()`
- Within the open brackets you need to open quotations and paste the path to your database `.kdbx` file which contains your login credentials.
- On MacOS computers, you can usually find and copy the path to your database `.kdbx` file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
- The path should look like `engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx')`.
Once you have added your information, you can run the cell. Successful login is indicated with the following output:
Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node
User Inputs
To load to the detections_yyyy tables, the notebook will require information about the schema you are working in. Please edit `schema = 'collectioncode'` to include the relevant project code, in lowercase, between the quotes.
Calculating Time Drift Factors
The first step is to run `create_tbl_time_drift_factors`. This function will create the `time_drift_factors` table in the schema if it doesn’t exist.
The next step is to run `check_time_drifts`, which gives a display of the time drift factor values that will be added to the time_drift_factors table given an events table. At this stage, you should review for any erroneous/large timedrifts.
If everything looks good, you may proceed to the next cell which adds new time drift factors to the time_drift_factors table from the events file. A success message will appear:
Adding XXX records to collectioncode.time_drift_factors table from collectioncode.events... OK!
You will then see a cell to create missing views, which creates the time drift views which the database will use to calculate drift values for both the `detections_yyyy` and `sensor_match_yyyy` tables.
Correcting Time Drift
Finally, we are ready to update the times in both the `detections_yyyy` and `sensor_match_yyyy` tables with corrected time values using the vw_time_drift_cor database view.
The notebook should identify all of the years that were affected by `detections-2` loading steps, so the notebook knows which tables need to be corrected.
Once the timedrift calculation is done, it will be indicated by ✔️ green checkmarks.
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - NAME timedrift correction for affected detection and sensor years ("detections-2b" notebook)
Verify Detections After Time Drift Calculations
After running the above cells, you will then verify the time drift corrections on the `detections_yyyy` and `sensor_match_yyyy` tables.
The output will have useful information:
- Are there any erroneous timedrift values?
- Did the time correction cause detections to need moving to another `yyyy` table? If so, select the “move detections” button.
- Are the receiver models formatted correctly?
The notebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.
If there are any errors, contact OTN for next steps.
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - NAME verify timedrift corrections ("detections-2b" notebook)
detections - 3 - detections_yyyy into otn_detections
The `detections - 3` notebook moves the detections from the `detections_yyyy` and `sensor_match_yyyy` tables into the final `otn_detections_yyyy` tables. This will join the detections records to their associated deployment records, providing geographic context to each detection. If there is no metadata for a specific detection (no receiver record to match with) it will not be promoted to `otn_detections_yyyy`.
Import cells and Database connections
As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.
The second cell will set your database connection. You will have to edit one section: `engine = get_engine()`
- Within the open brackets you need to open quotations and paste the path to your database `.kdbx` file which contains your login credentials.
- On MacOS computers, you can usually find and copy the path to your database `.kdbx` file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
- The path should look like `engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx')`.
Once you have added your information, you can run the cell. Successful login is indicated with the following output:
Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.load.oceantrack.org Database:otnunit User:admin Node:OTN
User Inputs
The notebook will require information about the schema you are working in. Please edit `schema = 'collectioncode'` to include the relevant project code, in lowercase, between the quotes.
Before moving on from this you will need to confirm 2 things:
1) Confirm that NO Push is currently ongoing
2) Confirm `rcvr_locations` for this schema have been verified.
If a Push is ongoing, or if verification has not yet occurred, you must wait for it to be completed before processing beyond this point.
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - NAME manually check for open, unverified receiver metadata, **STOP** if it exists! **(put GitLab issue number here)**
Creating detection views and loading to otn_detections
Once you are clear to continue loading, you can run `create_detection_views`. This function, as its name implies, will create database views for detection data.
Output will look like:
Creating view collectioncode.vw_detections_2020... OK
Creating view collectioncode.vw_sentinel_2020... OK
Creating view collectioncode.vw_detections_2021... OK
These views are then used by the function in the next cell, `load_into_otn_detections_new`, which loads the detections from those views into otn_detections. You will be asked to select all relevant tables here, with a dropdown menu and checkboxes.
You must select all years that were impacted by the `detections_yyyy` or `sensor_match_yyyy` loading steps. Then click the `Load Detections` button to begin loading. There will be a status bar indicating your progress.
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - NAME load to otn_detections_yyyy ("detections-3" notebook)
Verify OTN Detections
After running your needed cells, you will then verify the `otn_detections_yyyy` detections.
The output will have useful information:
- Are there any `sentinel` detections identified? If so, select the `Load Sentinel Detections for YYYY` button.
- Are all the sensors loaded to the `sensor_match` tables? Compare the counts between `detections_yyyy` and `sensor_match_yyyy` provided.
- Are all the detections loaded from `detections_yyyy` to `otn_detections_yyyy`? Compare the counts provided. If there is a mismatch, we will get more details in the `detections-3b` notebook.
- Is the formatting for the dates correct?
- Is the formatting for `the_geom` correct?
- Are there detections without receivers? If so, we will get more details in the `detections-3b` notebook.
- Are there receivers without detections? If so, we will get more details in the `detections-3c` notebook.
The notebook will indicate the table has passed quality control by adding a ✔️ green checkmark beside each section.
If there are any errors, contact OTN for next steps.
Task list checkpoint
In GitLab, these tasks can be completed at this stage:
- [ ] - NAME verify otn_detections_yyyy (`detections-3` notebook)
- [ ] - NAME load sentinel records (`detections-3` notebook)
detections - 3b - missing_metadata_check
This notebook is for checking for detections that have not been inserted into `otn_detections_yyyy`, which will indicate missing receiver metadata.
The user will be able to set a threshold for the minimum number of detections to look at (default is 100). It will also separate animal detections from transceiver detections in a graph. At the end, it will show a SQL command to run so that the missing metadata can be seen in table format.
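The underlying comparison is between a year's `detections_yyyy` table and its `otn_detections_yyyy` counterpart. The hedged sketch below shows roughly what such a count comparison looks like (assuming the notebook's `engine` is available; the year is a placeholder); the notebook's own output and generated SQL are more detailed.

```python
import pandas as pd

# Illustration only: compare loaded vs. promoted detection counts for one year.
# Assumes `engine` from the notebook's get_engine() cell is still available.
schema, year = 'collectioncode', 2020   # placeholders
query = f"""
    SELECT
        (SELECT COUNT(*) FROM {schema}.detections_{year})     AS loaded,
        (SELECT COUNT(*) FROM {schema}.otn_detections_{year}) AS promoted;
"""
counts = pd.read_sql(query, engine)
print(counts.assign(unpromoted=counts['loaded'] - counts['promoted']))
```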
Import cells and Database connections
As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.
The second cell will set your database connection. You will have to edit one section: `engine = get_engine()`
- Within the open brackets you need to open quotations and paste the path to your database `.kdbx` file which contains your login credentials.
- On MacOS computers, you can usually find and copy the path to your database `.kdbx` file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
- The path should look like `engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx')`.
Once you have added your information, you can run the cell. Successful login is indicated with the following output:
Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node
User Inputs
Information regarding the tables we want to check against is required.
`schema = 'collectioncode'`
- edit to include the relevant project code, in lowercase, between the quotes.

`years = []`
- A comma-separated list of detection table years for detections_yyyy, which should be in the form `[yyyy]` or a list such as `[yyyy,yyyy,'early']`.
Once you have edited these values, you can run the cell. You should see the following success message:
All tables exist!
Check for Missing Metadata in Detections_yyyy
This step will perform the check for missing metadata in `detections_yyyy` and display results for each record where the number of excluded detections is greater than the specified threshold.
First: enter your threshold, formatted like `threshold = 100`. Then you may run the cell.
The output will include useful information:
- Which receiver is missing detections from `otn_detections_yyyy`?
- How many are missing?
- What type of detections are missing? Transceiver, animal tags, or test tags.
- What are the date-ranges of these missing detections? These dates can be used to determine the period for which we are missing metadata.
- Other notes: is this date range before known deployments? After known deployments? Between known deployments?
There will be a visualization of the missing metadata for each instance where the number of missing detections is over the threshold.
Any instance with missing detections (greater than the threshold) should be identified, and the results pasted into a new GitLab Issue. The format will look like:
VR2W-123456
missing XX detections (0 transceiver tags, XX animal tags, 0 test tags) (before deployments) (YYYY-MM-DD HH:MM:SS to YYYY-MM-DD HH:MM:SS)
VR2W-567891
missing XX detections (0 transceiver tags, XX animal tags, 0 test tags) (before deployments) (YYYY-MM-DD HH:MM:SS to YYYY-MM-DD HH:MM:SS)
There are also two cells at the end that allow you to create reports for researchers in CSV or HTML format.
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - NAME check for missing receiver metadata ("detections-3b" notebook)
detections - 3c - missing_vrl_check
This notebook will check for missing data files in the database by comparing the `rcvr_locations` and `events` tables. For any receiver deployments that are missing events, it will check if there are `detections` during that time period for that receiver.
Import cells and Database connections
As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.
The second cell will set your database connection. You will have to edit one section: `engine = get_engine()`
- Within the open brackets you need to open quotations and paste the path to your database `.kdbx` file which contains your login credentials.
- On MacOS computers, you can usually find and copy the path to your database `.kdbx` file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
- The path should look like `engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx')`.
Once you have added your information, you can run the cell. Successful login is indicated with the following output:
Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node
User Inputs
Information regarding the tables we want to check against is required. Please complete `schema = 'collectioncode'`, edited to include the relevant project code, in lowercase, between the quotes.
There are also optional fields:
- `start_year = YYYY`: The user will be able to select a time range (in years) to limit the receivers to only ones that were active at some point during the time range.
- `end_year = YYYY`: The user will be able to select a time range (in years) to limit the receivers to only ones that were active at some point during the time range.
- `skip_events = False`: By changing this to True, the notebook will skip the events check and go right to checking if there are detections for each period. Only skip the events check if you know there won’t be any events, since the detections check takes longer than the events check.
Once you have edited the values, you can run the cell. You should see the following success message:
Checking if collectioncode schema exists...
OK
Checking For Missing Data Files
The next cell will begin scanning the project’s schema to identify if there are any missing data files. If there are no missing files, you will see this output:
Checking if all deployment periods have events...
X/X
Checking if deployment periods missing events have detections...
⭐ All deployment periods had events! Skipping detection check. ⭐
If the notebook has detected missing files, you will see this output:
Checking if all deployment periods have events...
XXX/XXX
Checking if deployment periods missing events have detections...
XX/XX
Displaying Missing Data Files
Now the notebook will begin plotting a Gantt chart, displaying the periods of deployment for which the database is missing data files. There are some optional customizations you can try:
- `split_plots = False`: you can set this to True if you would like multiple, smaller plots created
- `rcvrs_per_split = 20`: if you are splitting the plots, how many receiver deployments should be depicted on each plot?
Running the cell will save your configuration options, and the next cell creates the chart(s).
The plot will have useful information:
- the receiver (x axis)
- the date range (y axis)
- the state of the data for that date-range:
  - all detections and events present
  - missing events, detections present
  - missing some events
  - missing ALL events
- hovering over a deployment period will give details regarding the date range
Exporting Results
After investigating the Gantt chart, a `CSV` export can be created to send to the researcher.
First, you must select which types of records you’d like to export from this list:
- all detections and events present
- missing events, detections present
- missing some events (recommended)
- missing ALL events (recommended)
Then, the next cell will print the relevant dataframe, with an option below to `Save Dataframe`. Simply type the intended filename and filetype into the `File or Dataframe Name` box (ex. missing_vrls_collectioncode.csv) and press `Save Dataframe`. The file should now be available in your `ipython-utilities` folder for dissemination. Please track this information in a new GitLab ticket.
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - NAME check for missing data records ("detections-3c" notebook)
events - 3 - create download records
This notebook will promote the events records from the intermediate `events` table to the final `moorings` records. Only use this notebook after adding the receiver records to the moorings table, as this process is dependent on receiver records.
Import cells and Database connections
As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.
The second cell will set your database connection. You will have to edit one section: `engine = get_engine()`
- Within the open brackets you need to open quotations and paste the path to your database `.kdbx` file which contains your login credentials.
- On MacOS computers, you can usually find and copy the path to your database `.kdbx` file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
- The path should look like `engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx')`.
Once you have added your information, you can run the cell. Successful login is indicated with the following output:
Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node
User Inputs
Information regarding the tables we want to check against is required. Please complete `schema = 'collectioncode'`, edited to include the relevant project code, in lowercase, between the quotes.
Once you have edited the value, you can run the cell.
Detecting Download Records
The next cell will scan the `events` table looking for data download events, and attempt to match them to their corresponding receiver deployment.
You should see output like this:
Found XXX download records to add to the moorings table
The next cell will print out all the identified download records, in a dataframe for you to view.
Loading Download Records
Before moving on from this you will need to confirm 2 things:
1) Confirm that NO Push is currently ongoing
2) Confirm `rcvr_locations` for this schema have been verified.
If a Push is ongoing, or if verification has not yet occurred, you must wait for it to be completed before processing beyond this point.
If everything is OK, you can run the cell. The notebook will indicate success with a message like:
Added XXX records to the moorings table
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - NAME load download records ("events-3" notebook)
Verify Download Records
This cell will have useful information:
- Are the instrument models formatted correctly?
- Are receiver serial numbers formatted correctly?
- Are there any other outstanding download records which haven’t been loaded?
The notebook will indicate the table has passed verification by the presence of ✔️ green checkmarks.
If there are any errors, contact OTN for next steps.
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - NAME verify download records ("events-3" notebook)
events-4 - process receiver configuration
This notebook will process the receiver configurations (such as MAP code) from the events table and load them into the schema’s `receiver_config` table. This is a new initiative by OTN to document and store this information, to provide better feedback to researchers regarding the detectability of their tag-programming through time and space.
Import cells and Database connections
As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.
The second cell will set your database connection. You will have to edit one section: `engine = get_engine()`
- Within the open brackets you need to open quotations and paste the path to your database `.kdbx` file which contains your login credentials.
- On MacOS computers, you can usually find and copy the path to your database `.kdbx` file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
- The path should look like `engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx')`.
Once you have added your information, you can run the cell. Successful login is indicated with the following output:
Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node
User Inputs
Information regarding the tables we want to check against is required. Please complete `schema = 'collectioncode'`, edited to include the relevant project code, in lowercase, between the quotes.
Once you have edited the value, you can run the cell.
Get Receiver Configuration
Using the receiver deployment records, and the information found in the `events` table, this cell will identify any important configuration information for each deployment. A dataframe will be displayed.
The following cell will extrapolate further to populate all the required columns from the `receiver_config` table. A dataframe will be displayed.
Load Configuration to Database
Finally, the notebook will insert the identified records into the `receiver_config` table. You should see the following success message, followed by a dataframe:
The following XX receiver configurations are new and have been inserted:
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - NAME process receiver configuration ("events-4" notebook)
Final Steps
The remaining steps in the GitLab Checklist are completed outside the notebooks.
First: you should access the Repository folder in your browser and ensure the raw detections are posted in the `Data and Metadata` folder.
Finally, the Issue can be passed off to an OTN-analyst for final verification in the database.
Key Points
It’s important to handle errors when they come up, as they can have implications for detections
OTN finishes off detections Issues by running Matching and sensor tag processing
Supplementary notebooks
Overview
Teaching: 15 min
Exercises: 0 minQuestions
What other Nodebooks are there that could be useful for me to know as a Node Manager?
Objectives
Learn about Nodebooks that are useful for Node Managers outside the core data loading notebooks
OTN maintains several additional Nodebooks that fall outside the core `tag`, `deployment` and `detection` tools. These may be useful to Node Managers who also deal with these particular scenarios.
Check Environment
This notebook checks your system Python environment against our `environment.yml`. This is to see if the Python packages and libraries you have installed are in line with what is required to run the ipython Notebooks. This will assist you with updating your packages if they become out-of-date, or if OTN develops and publishes new tools which rely on new packages.
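For a rough manual comparison outside the notebook, the sketch below parses `environment.yml` and reports which packages are missing from the current Python environment. It is a simplification (it ignores pip sub-sections, conda-only packages, and most version operators); the Check Environment notebook itself is more thorough.

```python
import yaml
from importlib import metadata

# Rough, hypothetical comparison of installed packages against environment.yml.
with open("environment.yml") as f:
    env = yaml.safe_load(f)

for dep in env.get("dependencies", []):
    if not isinstance(dep, str):                    # skip nested pip: sections for simplicity
        continue
    name = dep.split("=")[0].rstrip("<>").strip()   # drop any version pin
    try:
        print(f"{name}: {metadata.version(name)} installed")
    except metadata.PackageNotFoundError:
        print(f"{name}: MISSING")
```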
scientific_name_check
This notebook uses WoRMS to check animal common and scientific names. It is used to add new species names to the `obis.scientificnames` table for use in each project. The instructions for using this notebook are the same as the `Adding Scientific Names` section in the `Create and Update Projects` notebook.
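For context, the name check is built on the WoRMS web service. A standalone lookup against the public WoRMS REST API might look like the sketch below; the endpoint and response fields are taken from the public API documentation, and this is not how the Nodebook itself is invoked.

```python
import requests
from urllib.parse import quote

# Hypothetical standalone lookup of a scientific name against the WoRMS REST API.
name = "Salmo salar"
url = f"https://www.marinespecies.org/rest/AphiaRecordsByName/{quote(name)}"
resp = requests.get(url, params={"like": "false", "marine_only": "true"}, timeout=30)
resp.raise_for_status()

for record in resp.json():
    print(record.get("AphiaID"), record.get("scientificname"), record.get("status"))
```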
Add Instrument Models to Database
This notebook is used to add an instrument model to the `obis.instrument_models` table.
insert_vendor_sheet
Used to load manufacturer specifications for tags or receivers into the `vendor` tables.
convert - Fathom (vdat) Export - VRL to CSV
Convert VRLs or VDATs into CSV files using command-line Fathom software. Can also be done using the Fathom app.
Active Tracking
Handles active tracking data - detections collected by using a VR100 hydrophone (or similar) during “mobile” tracking activities.
Slocum Telemetry Notebook
This notebook will process and load detection data collected by a `slocum` glider mission. Information required includes `glider telemetry`, `glider metadata`, and `detection files`.
Load Health Report
This notebook will load “health reports” collected by LiquidRobotics `WaveGliders` while remotely offloading `VR4` receivers.
Telemetry Processing for Wave Gliders and Animals
This notebook will process and load detection data collected by a `WaveGlider` mission. Information required includes `glider telemetry`, `glider metadata`, and `detection files`.
create and update contacts’ notebook
This notebook can be used to add new contacts to a project and update existing contacts in the database. Note: you cannot change someone’s email address using this tool.
Active Tags and IUCN Status
Located in the `vis` subfolder. This creates a summary report of Tag Life, Tags, Detections, Stations. Tailored for OTN’s reporting requirements and data policy.
Generate Receiver Map
Located in the `vis` subfolder. This creates a map to show receivers from Nodes. Tailored for OTN’s reporting requirements and data policy.
Receiver Operator Report
Located in the `vis` subfolder. This notebook generates a summary report for receiver operators to describe what animals have been seen and when. Tailored for OTN’s reporting requirements and data policy.
DB-Fix Notebooks
These are a series of notebooks for fixing common issues found in the database. These notebooks are beyond the scope of the current training but eventually Data Managers who wish to learn more will be able to take further training. In the meantime, if you see notes in the notebooks such as “Use the DB Fix notebook called XXXX to correct this error”, please contact OTN for assistance.
Key Points
ipython-utilities has many useful notebooks to help Node Managers beyond the core data loading workflow
Data Push
Overview
Teaching: 10 min
Exercises: 0 minQuestions
What is a Data Push?
What do I need to do as a Node Manager for a Data Push?
What are the end products of a Data Push?
Objectives
Understand a basic overview of a Data Push
Understand what a Node Manager’s responsibilities are during a Data Push
Understand the format of Detection Extracts
What is a Data Push?
A Data Push is when the OTN data system is re-verified and any new relevant information is sent to researchers. New data being brought in is cut off so that what’s in the system can be reliably verified. This way any issues found can be fixed and the data can be in the best form based on the information available at that moment. Once verification is done, detections are matched across Nodes and detection extracts are sent out to researchers. This is also the time when summary schemas like `discovery`, `erddap`, and `geoserver` are updated with the newly verified and updated data.
What is the Push Schedule?
Push events happen three times a year. They start on the third Thursday of the “push months”, which are February, June, and October. This date is the cut-off date for all data-loading: no records can be loaded after this. Please aim to have all tickets ready for verification 1 week before this date.
With the increased number of Nodes joining the Pushes, we are announcing the schedule for the next year. Please prepare in advance and mark your calendars.
Push schedule through 2023:
- June 23, 2022
- October 20, 2022
- February 16, 2023
- June 15, 2023
- October 19, 2023
Node Manager Roles During a Push
Node Managers have two main jobs during a Push:
- The first job is to get the Node’s data loaded in time for the cut-off date. Data will be submitted by researchers on a continuous basis, but will likely increase just before a cut-off date. We recommend loading data as it arrives, to attempt to prevent a backlog near the Push date.
- The second job for Node Managers is to create and send out Detection Extracts when they are ready to be made. This will be done using the `detections - create detection extracts` notebook.
Once the cut-off date has passed, Node Managers are “off duty”! When it’s time for Detection Extracts to be created and disseminated, that task will be assigned to the Node Managers, but this does not signify the end of the Push. There are several more “behind the scenes” steps required.
Please refrain from interacting with the Node Database until OTN staff have announced the Push has ended and data may be loaded again.
Detection Extracts
Detection Extracts are the main output of the Push. They contain all the new detection matches for each project. There are multiple types of detection extracts OTN creates:
- ‘qualified’ which contain detections collected by an array but matched to animals of other projects
- ‘unqualified’ which contain the unmatched or mystery detections collected by an array
- ‘sentinel’ which contain the detections matched to test or transceiver tags collected by an array
- ‘tracker’ which contains detections that have been mapped to animals tagged by a project that can originate from any receiver in the entire Network
- ‘external partners’ which is a report with suggested matches to the mystery-tag layers provided by non-Node partner networks. Summary detection information, including a count of the number of potential matches per project is provided. These matches are meant as a starting point for gaining information from non-Node telemetry networks. Researchers will have to contact those networks directly for more detailed information, and to register.
Detection Extract files are formatted for direct ingestion by analysis packages such as glatos and resonATe.
Detections - Create Detection Extracts Notebook
During the Push process, any new detection matches that are made are noted in the obis.detection_extracts_list
table of your Node. These entries will have several pieces of useful information:
- `detection_extract`: this contains the project code, year, and type of extract that needs to be created.
  - ex: `ABC,2022,t` will suggest that project ABC needs the extract `matched to animals 2022` (tracker format) created.
- `git_issue_link`: the issue in which these detection matches were impacted
- `push_date`: the date of the Push when this extract will have to be made
Using these fields, the `detections - create detection extracts` notebook can determine which extracts need to be created for each Push.
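For orientation, querying that table directly for the current Push might look like the hedged sketch below (assuming a database `engine` such as the one created with `get_engine()` in the next section). The column names follow the fields described above, and the Push date is a placeholder.

```python
import pandas as pd

# Illustration only: list the extracts flagged for the current Push.
# Assumes `engine` from the notebook's get_engine() cell is available.
push_date = '2023-02-16'   # placeholder; use the current Push date
query = f"""
    SELECT detection_extract, git_issue_link, push_date
    FROM obis.detection_extracts_list
    WHERE push_date = '{push_date}'
    ORDER BY detection_extract;
"""
print(pd.read_sql(query, engine))
```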
Imports cell
This section will be common for most Nodebooks: it is a cell at the top of the notebook where you will import any required packages and functions to use throughout the notebook. It must be run first, every time.
There are no values here which need to be edited.
User Inputs Database Connection
`outputdir = 'C:/Users/path/to/detection extracts/folder/'`
- Within the quotes, please paste a filepath to the folder in which you’d like to save all the Detection Extracts.

`engine = get_engine()`
- Within the open brackets you need to open quotations and paste the path to your database `.kdbx` file which contains your login credentials.
- On MacOS computers, you can usually find and copy the path to your database `.kdbx` file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
- The path should look like `engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx')`.
Once you have added your information, you can run the cell. Successful login is indicated with the following output:
Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.your.org Database:your_db User:node_admin Node: Node
Testing dblink connections:
saf-on-fact: DBLink established on user@fact.secoora.org:5002 - Node: FACT
saf-on-migramar: DBLink established on user@db.load.oceantrack.org:5432 - Node: MIGRAMAR
saf-on-nep: DBLink established on user@db.load.oceantrack.org:5432 - Node: NEP
saf-on-otn: DBLink established on user@db.load.oceantrack.org:5432 - Node: OTN
saf-on-act: DBLink established on user@matos.asascience.com:5432 - Node: ACT
saf-on-pirat: DBLink established on user@161.35.98.36:5432 - Node: PIRAT
saf-on-path: DBLink established on user@fishdb.wfcb.ucdavis.edu:5432 - Node: PATH
You may note that there are multiple `DB links` required here: this is so that you will be able to include detection matches from all the Nodes. If your `kdbx` file doesn’t include any of your DB link accounts, reach out to OTN to help set it up for you.
Detection Extract Selection
There are two options for selecting which Detection Extracts to create:
- There is a `manual entry` cell. Here you can paste a list of extracts in this format (one per line):
  - project code (capitals), year, type
- There is a cell to query the `obis.detection_extracts_list` table. This is the preferred method.
  - enter the current Push date like `push_date = 'YYYY-MM-DD'`
Once you have a list of the Detection Extracts to create, you can move on. The next cell will create a list of all the extracts that were just created, which you can use for your own records. It will save in your `ipython-utilities` folder.
Create Detection Extracts
This cell will begin creating the identified detection extracts, one by one. You will be able to see a summary of the matched projects for each extract. Please wait for them all to complete - indicated by a green checkmark and a summary of the time it took to complete the extract.
The following section is for Nodes who use Plone as their document management system only
Uploading Extracts to Plone
First, the notebook will print a list of all the extracts that need to be uploaded. It should match the list of those just created.
Next, you will need to connect to Plone using a `.auth` file. The format will be like this: `plone_auth_path = r'C:/path/to/Plone.auth'`. Success will be indicated with this message: `Plone authorization was successful. Connected to 'https://members.oceantrack.org' as 'USER'`
Now the notebook will upload all the Detection Extracts into their relevant folders on Plone.
Please wait for them all to complete - indicated by a green checkmark and a summary of the time it took to complete the extract.
Emailing Researchers - Plone
Using the Plone users system, it’s possible to identify which researchers require an email notification. First you need to supply a `.auth` file for an email account. The format will be like this: `email_auth_path = r'C:/path/to/email.auth'`. Success will be indicated with this message: `Email auth is set: user= otndc@dal.ca host= smtp.office365.com cc= otndc@dal.ca port= XXX`
Upon successful login, you will be able to print out your current email template. If it is not adequate, you can edit the template by changing the `det_extracts_emailSpecial.j2` template in the `templates` subfolder of `ipython-utilities`, and changing the filepath to be `email_template = 'templates/det_extracts_emailSpecial.j2'`, then re-running.
Finally, this stage will send the emails. Ensure that `date = 'YYYY-MM-DD'` is set to the date you uploaded the extracts to Plone. This is how the notebook will determine which links to include in the email template. First: set `send_mail = False`. Run the cell, select the projects of interest and `Simulate Sending Emails`. If you are pleased with the output, you can then change `send_mail = True` and re-run. Choose `Send Emails` and they will be sent.
Emailing Researchers - Manual
If you are not using the Plone system for sending emails, you can use the `manual` email tool.
You will first enter the collectioncode which has the new Detection Extract file requiring notification: `cntct_schema = 'schema'`. Please edit to include the project code, in lowercase, between the quotation marks.
An interactive menu will then appear, listing all the contacts associated with the project. You can select each of those you’d like to email.
The following cell will print out the resulting contact list.
Next, you will be able to review the email template. Required input includes:
- email_auth_path = '/path/to/email.auth': please paste the filepath to your email.auth file between the quotes.
- template = './emailtools/templates/email_researcher.html': you can select the template you’d like to use. If the ones provided are not adequate, you can edit the email_researcher.html template by navigating from the ipython-utilities main folder to emailtools and then into the templates folder. Save your edited file and re-run.
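For reference, a filled-in set of inputs for the manual email tool might look like the following (the project code and paths are hypothetical):

cntct_schema = 'hfx'                                                  # project code, lowercase
email_auth_path = 'C:/Users/username/Desktop/Auth files/email.auth'   # path to your email .auth file
template = './emailtools/templates/email_researcher.html'             # template to use for the email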
If the email preview is acceptable, you may run the final cell in this section which will send the emails.
Update Extract Table
Once all extracts are made, uploaded to your file management system and emails have been sent to researchers, the final step is to ensure we mark in the obis.detection_extracts_list
table that we have completed these tasks.
Please enter current_push_date = 'yyyy-mm-dd'
: the date of the Push when these extracts have been made.
Then, an interactive dataframe will appear. This dataframe will allow you to check-off the extracts as completed
based on those you were able to successfully create.
Now you’re done with Detection Extracts until next Push!
Key Points
A Data Push is when we verify all the data in the system, fix any issues, and then provide detection matches to researchers
As Node Managers, it's your responsibility to get the data into the system so OTN can verify it and get it ready to be sent out
Detection Extracts are the main end product of the Push
Fixing Data Errors
Overview
Teaching: 10 min
Exercises: 0 minQuestions
How do I identify data errors?
How do I correct data errors?
Objectives
Understand the workflow for fixing data errors.
Identifying Errors
During the process of loading data it is possible to uncover errors with previously-loaded datasets. OTN is constantly improving our QA/QC tools, which means we are identifying and correcting more and more historical errors.
Generally, there are a few ways Node Managers will identify errors:
- By using the Verification cells in the Nodebooks
- When new QA/QC updates are released for the Nodebooks
- When a researcher identifies an error explicitly
- When a researcher submits data that does not match previously-loaded records
In the latter case, a full comparison between the records is required, followed by a discussion with the researcher to determine whether the previously-loaded records or the new records are correct. Often, the outcome is that the data in the DB needs correction.
Scoping the Required Correction
Once an error has been identified, the correction needs to be scoped. This includes: which tables are affected by the error, which catalognumbers, etc. Please use the DB Fix Issue
GitLab template.
Here is the template for reference:
# **DB Fix Issue**
__This issue title should include schema name, year of record affected__
- eg "HFX 2020 receiver lat/long change"
__add DB fix label to issue__
## Related gitlab issue:
- eg #1234, [paste link to issue]
## Schema:
- eg HFX
## DB table name:
- eg HFX.moorings
### information needed for each record that needs fixing (repeat for each catalog number):
1. unique identifier:
1. original value:
1. new value:
- [ ] NAME - label issue `db fix`
- [ ] NAME - tag OTNDC staff for assistance with request (@diniangela etc)
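For illustration, a scoped issue based on the template's own example project might read as follows (the identifier and values are hypothetical):

# **DB Fix Issue** - HFX 2020 receiver lat/long change
## Related gitlab issue:
- #1234, [link to the originating issue]
## Schema:
- HFX
## DB table name:
- HFX.moorings
### information needed for each record that needs fixing:
1. unique identifier: catalognumber HFX-2020-001
1. original value: -63.5800 (long)
1. new value: -63.5810 (long)
- [x] NAME - label issue `db fix`
- [ ] NAME - tag OTNDC staff for assistance with request (@diniangela etc)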
Once fully scoped, you may assign the Issue to OTN Staff to complete the correction. If it is a simple process, they may provide instructions to the Node Manager to complete. Do not attempt to correct the issue yourself without consultation with OTN.
Database Fix Notebooks
Currently, the OTN Database Team is working on a suite of notebooks for fixing common issues found in the database. These are the tools that the OTN Team will be using to correct the identified errors, if the tool to do so already exists.
These notebooks are beyond the scope of the current training but eventually Data Managers who wish to learn more will be able to take further training. In the meantime, if you see notes in the notebooks such as “Use the DB Fix notebook called XXXX to correct this error”, please create a DB Fix Issue
ticket and pass it off to OTN.
Key Points
OTN is developing tools (DB Fix notebooks) to support identifying and correcting data errors.
Upholding the Data Policy
Overview
Teaching: 10 min
Exercises: 0 minQuestions
How do I handle data requests?
Objectives
Understand how to uphold the Network’s Data Policy.
As a Node Data Manager, you will have the key responsibility to uphold your Network’s Data Privacy Policy. You have access to private datasets that not even your direct supervisors will have access to.
For this reason, you need to be intimately familiar with the types of data which are under embargo/private, and what is publicly available for your Network. You will need to develop an internal protocol for when requests for data access are submitted, to ensure appropriate care is taken to protect the integrity of your members’ data.
OTN recommends creating a “Data Request Response Policy”.
External Requests
This is an example of how OTN handles these requests:
- Request for data (from person other than data owner) submitted
- Data request is scoped in a GitLab Issue. All details from the requester are included.
- Impacted PIs are identified and contacted, seeking written permission for the requester to access the information.
- Written permission is documented in the GitLab Issue, to preserve the paper-trail.
- Data request report is compiled and provided to the requester, once all permissions have been received.
Internal Requests
Internal requests, from other Network staff, need to be handled in a similar way. While all Network staff should be familiar with the Data Policy, this may not always be the case. It is the Node Manager’s responsibility to ensure that information from the database is not shared outside of internal reports. We need to track requests from all sources (for OTN this even includes the OTN Glider and Field teams) so we can have a record of who knows what, and be able to remind them of our policies officially.
Partner-Node Data Policies
It’s important to remember that the Data Policy of the Network which holds the tag records will apply to the detections of those tags, in other Nodes. You cannot share information/detections of tags from outside your Node, without first consulting with your partner’s Data Policy. This will likely not be an issue for most Nodes, but is a key consideration for OTN staff when creating data products.
For guidance on the interpretation of your Data Policy, or the Data Policy of a partner Node, OTN’s Director of Data Operations is available to assist.
Key Points
Node Managers have a key responsibility to ensure the Data Policy is followed.
Nodebook Improvement and Development
Overview
Teaching: 10 min
Exercises: 0 minQuestions
How do I request new features?
How do I report bugs to OTN?
What is my role in Nodebook development?
Objectives
Understand the process for developing the Nodebooks
Understand the process for testing new Nodebook changes
Once you begin using the OTN Nodebooks, you will likely discover tools that are missing, or features which have code-bugs in them. OTN developers are here to help ensure the Nodebooks meet your needs. We are constantly changing our processes and are always open to suggestions to improve! These tools are for you, and we want to ensure they are useful.
New Features
If there is a feature that you’d like to see, you can bring this to OTN’s attention in this way:
- Ask in our Slack channels to see if there is a tool that already exists that will fit your needs. Discuss your ideal-feature with OTN developers to help them understand what you need.
- Once the new feature is properly scoped, create a new GitLab Issue here https://gitlab.oceantrack.org/otn-partner-nodes/ipython-utilities/-/issues, using the
new_feature
template. You should assign it to one of the OTN developers, who can begin to work on the request.
Here is the “new_feature” template, for your information:
Note that this template is **ONLY** used for tracking new features. Bugs should be filed separately using the Bug template.
The new feature should be proposed, scoped, and approved **before** you fill out this template.
## Summary
**Provide a general summary of the feature to be built.**
## Rationale
**Why do we want this?**
## End Users
**Who is expected to use this functionality most?**
## Scope
**What are the expected qualities and functionality of this new feature? One per line.**
## Minimum viable product
**What subset of the above functionality is necessary before the feature is considered finished?**
## Timeline
**Give a brief, ballpark estimate on when you would want this done.**
Bug-fixes
If you encounter an error in your Nodebooks, it's possible there is an issue with your dataset. Sometimes, however, the Nodebook itself is not functioning as expected! If you believe a certain Nodebook is malfunctioning, you will want to report this bug to OTN developers as soon as possible.
To report a bug, here are the steps to take:
- Ask in our Slack channels to see if the error is caused by your dataset. This can include posting an error message, or just describing the output from the Nodebook, and why it is not as expected.
- If OTN developers identify that the problem is not your dataset, the next step will be to create a GitLab Issue here https://gitlab.oceantrack.org/otn-partner-nodes/ipython-utilities/-/issues, using the
bug
template. You should assign it to one of the OTN developers, and use the label bugfix.
Here is the “bug” template, for your information:
Note that this template is **ONLY** used for reporting bugs. New features must be vetted and filed separately using the New Feature template.
As such, before you continue, ask yourself:
**Does this ticket concern existing functionality that is broken? Or is it a deficiency in the system as-designed that needs net new work to complete?**
If it's the former, then use this template, otherwise, use the New Feature template.
## Summary
**Provide a general summary of the problem.**
## Expected Behavior
**What should be happening?**
## Current Behavior
**What is actually happening?**
## Steps to Reproduce
**How can I see this bug happening? Where does it happen? Provide links to specific pages where the bug is visible, along with steps to reproduce it.**
## Priority
**On a scale from 1 to 5, approximate the severity of the issue. Be honest.**
Testing Features/Fixes
Once OTN developers have attempted to build your new feature, or fix your bug, they will publish their code into the integration
branch of our ipython-utilities Nodebooks. You will be asked to test these changes, to ensure the end-product is satisfactory. This branch is used for development and testing only, and should not be used for data-loading outside of testing.
Here are instructions for changing branches:
(Replace anything in <> with the sensible value for your installation/situation)
Open git bash or terminal.
From your git bash window, type:
cd "/path/to/your/ipython-utilities"
This will put your terminal or git bash terminal in the folder where this codebase lives on your computer. Now we’re going to:
- reset any changes you have made locally (save anything you might want to keep with git stash),
- switch to the new branch,
- update that branch with any new code.
git reset --hard
git checkout <branch you are moving to ie:integration>
git pull
Resolve python libraries by using the mamba command:
mamba env update -n root -f environment.yml
You will then be able to test the new code. When you are finished, follow the same steps to switch back to the master
branch to continue your work. The master
branch contains all the production-quality tools.
Key Points
The Nodebooks are for you - OTN wants to ensure they remain useful
Moving Platform: Mission Metadata, Telemetry and Detection Loading
Overview
Teaching: 90 min
Exercises: 0 minQuestions
What are considered moving platforms, and what are the use cases?
How do I load moving platform metadata, telemetry data, and detections into the Database?
Objectives
Understand the workflow for moving platform data in the OTN system
Learn how to use the
Movers
notebooks
Here is the issue checklist in the OTN Gitlab Moving Platforms
template, for reference:
Moving platform
- [ ] - NAME load raw metadata file (`movers-1` notebook)**(:fish: table name: c_moving_platform_missions_yyyy)**
- [ ] - NAME load raw telemetry files (`movers-2` notebook) **(:fish: table name: c_moving_platform_telemetry_yyyy**)
- [ ] - NAME create telemetry table from raw table (`movers-2` notebook) **(:fish: table name: moving_platform_telemetry_yyyy**)
- [ ] - NAME combine mission metadata with telemetry (`movers-2` notebook) **(:fish: table name: moving_platform_mission_telemetry_yyyy)**
- [ ] - NAME load to raw detections (`detections-1` notebook) **(:fish: table name: c_detections_yyyy)**
- [ ] - NAME verify raw detections table (`detections-1` notebook)
- [ ] - NAME load raw events (`events-1` notebook) **(:fish: table name: c_events_yyyy )**
- [ ] - NAME load raw events to events table (`events-2` notebook)
- [ ] - NAME load to detections_yyyy_movers (`movers-2` notebook) **(:fish: put affected years here)**
- [ ] - NAME delete self detections (`movers-3` notebook)
- [ ] - NAME timedrift correction for affected detection (`movers-3` notebook)
- [ ] - NAME verify timedrift corrections (`movers-3` notebook)
- [ ] - NAME verify detections_yyyy_movers (looking for duplicates) (`movers-3` notebook)
- [ ] - NAME load to sensor match (`movers-3` notebook) **(:fish: put affected years here)**
- [ ] - NAME load formatted telemetry tables (`movers-4` notebook) **(:fish: put affected years here)**
- [ ] - NAME load reduced telemetry tables (`movers-4` notebook) **(:fish: put affected years here)**
- [ ] - NAME load glider as receiver tables (`movers-4` notebook) **(:fish: put affected years here)**
- [ ] - NAME load into vw_detections_yyyy_movers (`movers-4` notebook) **(:fish: put affected years here)**
- [ ] - NAME load view detections into otn_detections_yyyy (`movers-4` notebook) **(:fish: put affected years here)**
- [ ] - NAME verify otn_detections_yyyy (`movers-4` notebook)
- [ ] - NAME create mission and receiver records in moorings (`movers-4` notebook)
- [ ] - NAME load download records (`events-3` notebook)
- [ ] - NAME verify download records (`events-3` notebook)
- [ ] - NAME process receiver configuration (`events-4` notebook)
- [ ] - NAME label issue with *'Verify'*
- [ ] - NAME pass issue to analyst for final steps
- [ ] - NAME match tags to animals (`detections-4` notebook)
- [ ] - NAME update detection extract table
metadata: **(put metadata repository link here)**
data: **(put data repository link here)**
telemetry: **(put telemetry repository link here)**
Loading Mission Metadata
Moving platform mission metadata should be reported to the Node in the template provided here. This spreadsheet file will contain one or more missions (rows) of the moving platform: identifiers, instruments used, and deployment/recovery times.
- Visually check for any missing information, inconsistencies, or formatting issues in the essential columns. Column names and example data are shown below:
- platform_id: e.g. 1234567
- otn_mission_id: e.g. 1234567202310031456 (Note: otn_mission_id is an internal unique identifier which can be constructed as platform_id + deploy_date_time digits; see the sketch after this list).
- ins_model_no: e.g. VMT
- ins_serial_no: e.g. 130000
- deploy_date_time: e.g. 2023-10-03T14:56:00
- recover_date_time: e.g. 2023-12-03T12:00:00
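A quick way to construct that identifier (a sketch only, not something the Nodebooks require you to script) is to append the digits of deploy_date_time, down to the minute, onto platform_id:

platform_id = '1234567'
deploy_date_time = '2023-10-03T14:56:00'

# keep the timestamp digits to the minute (YYYYMMDDHHMM) and append them to the platform id
digits = ''.join(ch for ch in deploy_date_time if ch.isdigit())[:12]
otn_mission_id = platform_id + digits
print(otn_mission_id)  # 1234567202310031456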
- Run through the [movers - 1 - Load Mission Metadata notebook](http://localhost:8888/notebooks/movers%20-%201%20-%20Load%20Mission%20Metadata.ipynb) to load the spreadsheet into the mission_table:
User Input
Cell three requires input from you. This information will be used to get the raw mission metadata file and to be able to create a new raw mission table in the database.
- schema: 'collectioncode' - please edit to include the relevant project code, in lowercase, between the quotes.
- table_suffix: e.g. 2024_03 - within the quotes, please add your custom table suffix. We recommend using year_month or similar, to indicate the most-recently downloaded instrument.
- mission_file: '/path/to/mission_file' - paste a filepath to the relevant XLSX file. The filepath will be added between the provided quotation marks.
- Check off the step and record the
mission_table
name in the Gitlab ticket.
- [ ] - NAME load raw metadata file (movers-1 notebook)**(:fish: table name: c_moving_platform_missions_yyyy)**
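For reference, a filled-in user input cell for this notebook might look like the following (the project code and path are hypothetical):

schema = 'hfx'                                        # project code, lowercase
table_suffix = '2024_03'                              # year_month of the most recent download
mission_file = r'C:/data/hfx/mission_metadata.xlsx'   # path to the mission metadata XLSX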
Loading Telemetry Data
- Visually check for any missing information, inconsistencies, or formatting issues in the four essential columns. Column names and example data are shown below:
- Timestamp: e.g. 2023-12-13T13:10:12 (Note: the column name may be different)
- lat: e.g. 28.33517 (Note: the column name may be different)
- lon: e.g. -80.33734833 (Note: the column name may be different)
- vehicleName: e.g. 1234567202310031456 (Note: the column name may be different. Ensure the values match the mission_table.platform_id from the Loading Mission Metadata step)
- otn_mission_id: e.g. 1234567202310031456 (Note: this column needs to be added in a spreadsheet application. Populate the values to match mission_table.otn_mission_id from the Loading Mission Metadata step)
- Launch the [movers - 2 - Load telemetry notebook](http://localhost:8888/notebooks/movers%20-%202%20-%20Load%20telemetry.ipynb) and fill in:
User Input
Cell three requires input from you. This information will be used to get the telemetry CSV and to be able to create a new raw telemetry table in the database.
- table_suffix: e.g. 2024_03 - should be the same as in the movers - 1 - Load Mission Metadata notebook. Within the quotes, please add your custom table suffix. We recommend using year_month or similar.
- schema: 'collectioncode' - please edit to include the relevant project code, in lowercase, between the quotes.
- telemetry_file: '/path/to/telem_file' - paste a filepath to the relevant CSV file. The filepath will be added between the provided quotation marks.
- Run the Prepare the telemetry file to be upload cell to map the spreadsheet columns to Database columns.
- Run the verify_telemetry_file and load_csv cells to load the telemetry data (.csv) file into the raw_telemetry table, the telemetry table, and joined with the mission_table as the moving_platform_mission_telemetry table.
- Check off the step and record the c_moving_platform_telemetry name in the Gitlab ticket.
- [ ] - NAME load raw telemetry files (movers-2 notebook) **(:fish: table name: c_moving_platform_telemetry_yyyy**)
- Run the movers - 2 - Load telemetry notebook's create_telemetry_table and verify_telemetry_table cells to create the telemetry table for joining to missions.
- Check off the step and record the moving_platform_telemetry name in the Gitlab ticket.
- [ ] - NAME create telemetry table from raw table (movers-2 notebook) **(:fish: table name: moving_platform_telemetry_yyyy**)
- Run the movers - 2 - Load telemetry notebook's verify_missions_table, create_joined_table, and verify_joined_table cells to create the mission and telemetry joined table.
- Check off the step and record the moving_platform_mission_telemetry name in the Gitlab ticket.
- [ ] - NAME combine mission metadata with telemetry (movers-2 notebook) **(:fish: table name: moving_platform_mission_telemetry_yyyy)**
Loading Raw Detections and Events
These detailed steps and explanations are the same as in the Detection Loading lesson (https://github.com/ocean-tracking-network/node-manager-training/blob/gh-pages/_episodes/08_Detections.md): the Convert to CSV, detections - 1 - load csv detections, events - 1 - load events into c_events_yyyy, and events - 2 - move c_events into events table sections. Please use that Detection Loading process as a reference.
- The related detections may now be processed. Detection data should be reported to the Node as a collection of raw, unedited files. These can be in the form of a zipped folder of .VRLs, a database from Thelma Biotel, or any other raw data product from any manufacturer. The files contain only transmitter numbers and the datetimes at which they were recorded at a specific receiver. The tag metadata and deployment metadata will provide the associated geographic and biological context to this data.
Visual Inspection
Once the files are received from a researcher, the Data Manager should first complete a visual check for formatting and accuracy.
Things to visually check:
- Do the files appear edited? Look for _edited in the file name.
- Is the file format the same as expected for that manufacturer? Ex. .vrl for Innovasea - not .csv or rld formats.
- Is there data for each of the instrument recoveries that was reported in the deployment metadata?
Convert to CSV
Once the raw files are obtained, the data must be converted to csv
format. There are several ways this can be done, depending on the manufacturer.
For Innovasea
- VUE
  - Open a new database
  - Import all the VRL files
  - Select export detections and choose the location you want to save the files
  - Select export events and choose the location you want to save the files
- Fathom Connect App
  - choose “export data”
  - select the relevant files and import into the Fathom Connect application
  - export all data types, and choose the location you want to save the files
- convert - Fathom (vdat) Export - VRL to CSV Nodebook
  - this will use the vdat.exe executable to export from VRL/VDAT to CSV
  - select the folder containing the relevant files and the location you’d like the CSVs saved
  - run the cells to convert
For Thelma Biotel
- use the ComPort software to open the .tbdb file and export as CSV
For Lotek
- exporting to CSV is more complicated, please reach out to OTN for specific steps
Other manufacturers: contact OTN staff.
detections - 1 - load csv detections
Detections-1 loads CSV detections files into a new database table. If detections were exported using Fathom
or the convert - Fathom (vdat) Export - VRL to CSV
notebook, the events
records will also be loaded at this stage. This is because these applications combine the detections and events data in one CSV file.
Import cells and Database Connections
As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.
The second cell will set your database connection. You will have to edit one section: engine = get_engine()
- Within the open brackets you need to open quotations and paste the path to your database .kdbx file which contains your login credentials.
- On MacOS computers, you can usually find and copy the path to your database .kdbx file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
- The path should look like engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx').
Once you have added your information, you can run the cell. Successful login is indicated with the following output:
Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node
User Input
Cell three requires input from you. This information will be used to get the raw detections CSV and to be able to create a new raw table in the database.
file_or_folder_path = r'C:/Users/path/to/detections_CSVs/'
- paste a filepath to the relevant CSV file(s). The filepath will be added between the provided quotation marks.
- this can be a path to a single CSV file, or a folder of multiple CSVs.
table_suffix = 'YYYY_mm'
- Within the quotes, please add your custom table suffix. We recommend using year_month or similar, to indicate the most-recently downloaded instrument.
schema = 'collectioncode'
- please edit to include the relevant project code, in lowercase, between the quotes.
There are also some optional inputs:
- load_detections: a true or false value using the table suffix you supplied
- stacked: this is for Fathom exports only and is a way to know how to parse them
Once you have added your information, you can run the cell.
Verify Detection File and Load to Raw Table
Next, the notebook will review and verify the format of the detection file(s) and report any errors. Upon successful verification, you can then run the cell below, which will attempt to load the detections into a new raw table.
The notebook will indicate the success of the table-creation with a message such as this:
Reading fathom files...
Loading Files...
7/7
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - NAME load to raw detections (detections-1 notebook) **(:fish: table name: c_detections_yyyy)**
Ensure you paste the table name (ex: c_detections_YYYY_mm) into the section indicated, before you check the box.
Verify Raw Detection Table
This cell will now complete the Quality Control checks of the raw table. This is to ensure the Nodebook loaded the records correctly from the CSVs.
The output will have useful information:
- Are there any duplicates?
- Are the serial numbers formatted correctly?
- Are the models formatted correctly?
The notebook will indicate the sheet has passed quality control by adding a green checkmark ✔️ beside each section.
If there are any errors, contact OTN for next steps.
Task list checkpoint
In GitLab, these tasks can be completed at this stage:
- [ ] - NAME verify raw detections table (detections-1 notebook)
events - 1 - load events into c_events_yyyy
Events-1 is responsible for loading receiver events files into raw tables. This is only relevant for CSVs that were NOT exported using Fathom
or the convert - Fathom (vdat) Export - VRL to CSV
notebook.
Import cell
As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.
User Inputs
Cell two requires input from you. This information will be used to get the raw events CSV and to be able to create a new raw table in the database.
filepath = r'C:/Users/path/to/events.csv'
- paste a filepath to the relevant CSV file. The filepath will be added between the provided quotation marks.
table_name = 'c_events_YYYY_mm'
- Within the quotes, please add your custom table suffix. We recommend using year_month or similar, to indicate the most-recently downloaded instrument.
schema = 'collectioncode'
- please edit to include the relevant project code, in lowercase, between the quotes.
There are also some optional inputs:
- file_encoding: the encoding of the events file. The default, ISO-8859-1, is the encoding used in VUE’s event export.
Once you have added your information, you can run the cell.
Verifying the events file
Before attempting to load the event files to a raw table, the notebook will verify the file to make sure there are no major issues. This will be done by running the Verify events file cell. Barring any errors, you will be able to continue.
The notebook will indicate the success of the file verification with a message such as this:
Reading file 'events.csv' as CSV.
Verifying the file.
Format: VUE 2.6+
Mandatory Columns: OK
date_and_time datetime:OK
Initialization(s): XX
Data Upload(s): XX
Reset(s): XX
Database Connection
You will have to edit one section: engine = get_engine()
- Within the open brackets you need to open quotations and paste the path to your database .kdbx file which contains your login credentials.
- On MacOS computers, you can usually find and copy the path to your database .kdbx file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
- The path should look like engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx').
Once you have added your information, you can run the cell. Successful login is indicated with the following output:
Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node
Load the events file into the c_events_yyyy table
The second-to-last cell loads the events file into a raw table. It depends on successful verification from the last step. Upon successful loading, you can dispose of the engine and move on to the next notebook.
The notebook will indicate the success of the table-creation with the following message:
File loaded with XXXXX records.
100%
Task list checkpoint
In GitLab, these tasks can be completed at this stage:
- [ ] - NAME load raw events (events-1 notebook) **(:fish: table name: c_events_yyyy )**
Ensure you paste the table name (ex: c_events_YYYY_mm) into the section indicated, before you check the box.
events - 2 - move c_events into events table
This notebook will move the raw events records into the intermediate events table.
Import cell
As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.
User input
This cell requires input from you. This information will be used to get the raw events CSV and to be able to create a new raw table in the database.
c_events_table = 'c_events_YYYY_mm'
- Within the quotes, please add your custom table suffix, which you have just loaded in either detections-1 or events-1.
schema = 'collectioncode'
- please edit to include the relevant project code, in lowercase, between the quotes.
Database Connection
You will have to edit one section: engine = get_engine()
- Within the open brackets you need to open quotations and paste the path to your database .kdbx file which contains your login credentials.
- On MacOS computers, you can usually find and copy the path to your database .kdbx file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
- The path should look like engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx').
Once you have added your information, you can run the cell. Successful login is indicated with the following output:
Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node
Verify table format
You will then verify that the c_events table you entered exists and that it meets the required format specifications.
The notebook will indicate the success of the table verification with a message such as this:
Checking table name format... OK
Checking if schema collectioncode exists... OK!
Checking collectioncode schema for c_events_YYYY_mm table... OK!
collectioncode.c_events_YYYY_mm table found.
If there are any errors in this section, please contact OTN.
Load to Events table
Provided nothing comes up in the verification cells, you can run the loading cell.
The notebook will indicate the success of the processing with a message such as this:
Checking for the collectioncode.events table... OK!
Loading events... OK!
Loaded XX rows into collectioncode.events table.
Task list checkpoint
In GitLab, these tasks can be completed at this stage:
- [ ] - NAME load raw events to events table (events-2 notebook)
Loading Detections for Moving Platforms
- With the telemetry and mission table, we can now upload the raw detections and promote them to the detections_yyyy_movers tables.
- This notebook has the functionality of the detections - 2 - c_table into detections_yyyy and detections - 2b - timedrift calculations notebooks. The difference is that it handles the _movers tables.
- Run the movers - 3 - Load Detections notebook until the Load raw dets into detections_yyyy_movers cell to populate the detections_yyyy_movers tables and load raw detections into them.
User Input
Cell three requires input from you. This information will be used to get the raw detections CSV and to be able to create a new raw table in the database.
- table_suffix: e.g. 2024_03 - should be the same as in the movers - 1 - Load Mission Metadata notebook. Within the quotes, please add your custom table suffix. We recommend using year_month or similar.
- schema: 'collectioncode' - please edit to include the relevant project code, in lowercase, between the quotes.
- Check off the step and record affected years (which detections_yyyy_movers tables were updated) in the Gitlab ticket.
- [ ] - NAME load to detections_yyyy_movers (movers-2 notebook) **(:fish: put affected years here)**
- Note: if no detections were promoted into detections_yyyy_movers, please assign the ticket to OTN for troubleshooting.
- Run the next six cells to load timedrift factors (into time_drift_factors), apply time adjustment to detections_yyyy_movers and verify the timedrift corrections. Check off the steps in the Gitlab ticket.
- [ ] - NAME timedrift correction for affected detection (movers-3 notebook)
- [ ] - NAME verify timedrift corrections (movers-3 notebook)
- Run the next two cells to verify the detections_yyyy_movers tables. Check off the step in the Gitlab ticket.
- [ ] - NAME verify detections_yyyy_movers (looking for duplicates) (movers-3 notebook)
- Run the last cell to create and load sensor match tables for movers. Check off the step and record affected years in the Gitlab ticket.
- [ ] - NAME load to sensor match (movers-3 notebook) **(:fish: put affected years here)**
Loading OTN Detections
User Input
Cell three requires input from you. This information will be used to get the raw detections CSV and to be able to create a new raw table in the database.
- table_suffix: e.g. 2024_03 - should be the same as in the movers - 1 - Load Mission Metadata notebook. Within the quotes, please add your custom table suffix. We recommend using year_month or similar.
- schema: 'collectioncode' - please edit to include the relevant project code, in lowercase, between the quotes.
- years: e.g. [2022, 2023] - enter the affected years from movers - 3 - Load Detections.
- Run the create_moving_platforms_full_telemetry cell to load the formatted telemetry table. Check off the step and record the affected years in the Gitlab ticket.
- [ ] - NAME load formatted telemetry tables (movers-4 notebook) **(:fish: put affected years here)**
- Run the create_reduced_telemetry_tables cell to create the reduced telemetry tables. Check off the step and record the affected years in the Gitlab ticket.
- [ ] - NAME load reduced telemetry tables (movers-4 notebook) **(:fish: put affected years here)**
- Run the create_platform_as_receiver_tables cell to load the glider as receiver tables. Check off the step and record the affected years in the Gitlab ticket.
- [ ] - NAME load glider as receiver tables (movers-4 notebook) **(:fish: put affected years here)**
- Run the create_detections_view and load_view_into_otn_dets cells to load view detections into the otn_detections_yyyy tables. Check off the step and record the affected years in the Gitlab ticket.
- [ ] - NAME load into vw_detections_yyyy_movers (movers-4 notebook) **(:fish: put affected years here)**
- [ ] - NAME load view detections into otn_detections_yyyy (movers-4 notebook) **(:fish: put affected years here)**
- Note: if no detections were promoted into detections_yyyy_movers, please assign the ticket to OTN for troubleshooting.
- Run the verify_otn_detections cell to verify the otn_detections_yyyy tables. Check off the step in the Gitlab ticket.
- [ ] - NAME verify otn_detections_yyyy (movers-4 notebook)
- Run the load_platforms_to_moorings cell to create the mission and receiver records in the moorings tables. Check off the step in the Gitlab ticket.
- [ ] - NAME create mission and receiver records in moorings (movers-4 notebook)
events - 3 - create download records
This notebook will promote the events records from the intermediate events
table to the final moorings
records. Only use this notebook after adding the receiver records to the moorings table, as this process is dependent on receiver records.
Import cells and Database connections
As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.
The second cell will set your database connection. You will have to edit one section: engine = get_engine()
- Within the open brackets you need to open quotations and paste the path to your database .kdbx file which contains your login credentials.
- On MacOS computers, you can usually find and copy the path to your database .kdbx file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
- The path should look like engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx').
Once you have added your information, you can run the cell. Successful login is indicated with the following output:
Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node
User Inputs
Information regarding the tables we want to check against is required. Please complete schema = 'collectioncode'
, edited to include the relevant project code, in lowercase, between the quotes.
Once you have edited the value, you can run the cell.
Detecting Download Records
The next cell will scan the events
table looking for data download events, and attempt to match them to their corresponding receiver deployment.
You should see output like this:
Found XXX download records to add to the moorings table
The next cell will print out all the identified download records, in a dataframe for you to view.
Loading Download Records
Before moving on from this you will need to confirm 2 things:
1) Confirm that NO Push is currently ongoing
2) Confirm the rcvr_locations for this schema have been verified.
If a Push is ongoing, or if verification has not yet occurred, you must wait for it to be completed before processing beyond this point.
If everything is OK, you can run the cell. The notebook will indicate success with a message like:
Added XXX records to the moorings table
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - NAME load download records ("events-3" notebook)
Verify Download Records
This cell will have useful information:
- Are the instrument models formatted correctly?
- Are the receiver serial numbers formatted correctly?
- Are there any other outstanding download records which haven’t been loaded?
The notebook will indicate the table has passed verification by the presence of ✔️green checkmarks.
If there are any errors, contact OTN for next steps.
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - NAME verify download records ("events-3" notebook)
events-4 - process receiver configuration
This notebook will process the receiver configurations (such as MAP code) from the events table and load them into the schema’s receiver_config
table. This is a new initiative by OTN to document and store this information, to provide better feedback to researchers regarding the detectability of their tag-programming through time and space.
Import cells and Database connections
As in all notebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.
The second cell will set your database connection. You will have to edit one section: engine = get_engine()
- Within the open brackets you need to open quotations and paste the path to your database .kdbx file which contains your login credentials.
- On MacOS computers, you can usually find and copy the path to your database .kdbx file by right-clicking on the file and holding down the “option” key. On Windows, we recommend using the installed software Path Copy Copy, so you can copy a unix-style path by right-clicking.
- The path should look like engine = get_engine('C:/Users/username/Desktop/Auth files/database_conn_string.kdbx').
Once you have added your information, you can run the cell. Successful login is indicated with the following output:
Auth password:········
Connection Notes: None
Database connection established
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node
User Inputs
Information regarding the tables we want to check against is required. Please complete schema = 'collectioncode'
, edited to include the relevant project code, in lowercase, between the quotes.
Once you have edited the value, you can run the cell.
Get Receiver Configuration
Using the receiver deployment records, and the information found in the events
table, this cell will identify important configuration information for each deployment. A dataframe will be displayed.
The following cell will extrapolate further to populate all the required columns from the receiver_config
table. A dataframe will be displayed.
Load Configuration to Database
Finally, the notebook will insert the identified records into the receiver_config
table. You should see the following success message, followed by a dataframe:
The following XX receiver configurations are new and have been inserted:
Task list checkpoint
In GitLab, this task can be completed at this stage:
- [ ] - NAME process receiver configuration ("events-4" notebook)
Final Steps
The remaining steps in the GitLab Checklist are completed outside the notebooks.
First: you should access the Repository folder in your browser and ensure the raw detections are posted in the Data and Metadata
folder.
Finally, the Issue can be passed off to an OTN-analyst for final verification in the database.
Key Points
OTN supports processing of Slocum and wave glider detections, detections from other mobile platforms (ship-board receivers, animal-mounted receivers, etc.), and active tracking (reference https://oceantrackingnetwork.org/gliders/).
OTN is able to match detections collected by moving platforms as long as geolocation data is provided.
mission metadata, telemetry data and detection data should be submitted prior to the Moving Platform data loading process.