This lesson is being piloted (Beta version)

Preparing ACT Data for actel

Overview

Teaching: 30 min
Exercises: 0 min
Questions
  • How do I take my ACT detection extracts and metadata and format them for use in actel?

Objectives

Preparing our data to use in Actel

So now, as the last piece of stock curriculum for this workshop, let’s quickly look at how we can take the data reports we get from the ACT-MATOS (or any other OTN-compatible data partner, like FACT, or OTN proper) and make it ready for Actel.

# Using ACT-style data in Actel ####

library(actel)
library(stringr)
library(glatos)
library(tidyverse)
library(readxl)

Within actel there is a preload() function for folks who are holding their deployment, tagging, and detection data in R variables already instead of the files and folders we saw in the actel intro. This function expects 4 input objects, plus the ‘spatial’ data object that will help us describe the locations of our receivers and how the animals are allowed to move between them.

To achieve the minimum required data for actel’s ingestion, we’ll want deployment and recovery datetimes, instrument models, etc. We can transform our metadata’s standard format into the standard format and naming schemes expected by actel::preload() with a bit of dplyr magic:

# Load the ACT metadata and detection extracts -------------
# set working directory to the data folder for this workshop
setwd("YOUR/PATH/TO/data/act")

# Our project's detections file - I'll use readr to read everything from proj59 in at once:

proj_dets <- list.files(pattern="proj59_matched_detections*") %>% 
  map_df(~readr::read_csv(.))
# note: readr::read_csv will read in csvs inside zip files no problem.

# read in the tag metadata:

tag_metadata <- readxl::read_excel('Tag_Metadata/Proj59_Metadata_bluecatfish.xls', 
                                   sheet='Tag Metadata', # use the Tag Metadata sheet from this excel file
                                   skip=4) # skip the first 4 lines as they're 'preamble'


# And we can import first a subset of the deployments in MATOS that were deemed OK to publish
deploy_metadata <- read_csv('act_matos_moorings_receivers_202104130939.csv') %>%
  # Add a very quick and dirty receiver group column.
  mutate(rcvrgroup = ifelse(collectioncode %in% c('PROJ60', 'PROJ61'), # if we're talking PROJ61
                            paste0(collectioncode,station_name), #let my receiver group be the station name
                            collectioncode)) # for other project receivers just make it their whole project code.

# Also tried to figure out if there was a pattern to station naming that we could take advantage of
# but nothing obvious materialized.
# mutate(rcvrgroup = paste(collectioncode, stringr::str_replace_all(station_name, "[:digit:]", ""), sep='_'))

# Let's review the groups quickly to see if we under or over-shot what our receiver groups should be.
# nb. hiding the legend because there are too many categories.

deploy_metadata %>% ggplot(aes(deploy_lat, deploy_long, colour=rcvrgroup)) + 
  geom_point() + 
  theme(legend.position="none")

# Maybe this is a bit of an overshoot, but it should work out. proj61 has receivers all over the place, 
# so our spatial analysis is not going to be very accurate.


# And let's look at what the other projects were that detected us.
proj_dets %>% count(detectedby)

# And how many of our tags are getting detections back:
proj_dets %>% filter(receiver != 'release') %>% count(tagname)

# OK most of those who have more than an isolated detection are in our deploy metadata.
# just one OTN project to add.

# For OTN projects, we would be able to add in any deployments of OTN receivers from the OTN GeoServer:

# if we wanted to grab and add V2LGMXSNAP receivers to our deployment metadata
# using OTN's public station history records on GeoServer:
# otn_geoserver_stations_url = 'https://members.oceantrack.org/geoserver/otn/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=otn:stations_receivers&outputFormat=csv&cql_filter=collectioncode=%27V2LGMXSNAP%27'
## TODO: Actel needs serial numbers for receivers, so OTN 
#  will add serial numbers to this layer, and then we can just
#  urlencode the CQL-query for which projects you want, 
#  so you can write your list in 'plaintext' and embed it in the url

# For today, we've made an extract for V2LGMXSNAP and included it in the data folder:
otn_deploy_metadata <- readr::read_csv('otn_moorings_receivers_202104130938.csv') %>%
  mutate(rcvrgroup = collectioncode)

# Tack OTN stations to the end of the MATOS extract.
# These are in the exact same format because OTN and MATOS's databases are the
# same format, so we can easily coordinate our output formats.
all_stations <- bind_rows(deploy_metadata, otn_deploy_metadata)

# For ACT/FACT projects - we could use GeoServer to share this information, and could even add
# an authentication layer to let ACT members do this same trick to fetch deploy histories!

# So now this is our animal tagging metadata
# tag_metadata %>% View

# these are our detections: 

# proj_dets %>% View

# These are our deployments:

# all_stations %>% View

# Mutate metadata into Actel format ----

# Create a station entry from the projectcode and station number.
# --- add station to receiver metadata ----
full_receiver_meta <- all_stations %>%
  dplyr::mutate(
    station = paste(collectioncode, station_name, sep = '-')
  ) %>% 
  filter(is.na(rcvrstatus)|rcvrstatus != 'lost')

We’ve now imported our data, and renamed a few columns from the receiver metadata sheet so that they are in a nicer format. We also create a few helper columns, like a ‘station’ column that is of the form collectioncode + station_name, guaranteed unique for any project across the entire Network.

Formatting - Tagging and Deployment Data

As we saw earlier, tagging metadata is entered into Actel as biometrics, and deployment metadata as deployments. These data structures also require a few specially named columns, and a properly formatted date.

# All dates will be supplied to Actel in this format:
actel_datefmt = '%Y-%m-%d %H:%M:%S'

# biometrics is the tag metadata. If you have a tag metadata sheet, it looks like this:

actel_biometrics <- tag_metadata %>% dplyr::mutate(Release.date = format(UTC_RELEASE_DATE_TIME, actel_datefmt),
                                                   Signal=as.integer(TAG_ID_CODE),
                                                   Release.site = RELEASE_LOCATION, 
                                                   # Group=RELEASE_LOCATION  # can supply group to subdivide tagging groups
)


# deployments is based in the receiver deployment metadata sheet

actel_deployments <- full_receiver_meta %>% dplyr::filter(!is.na(recovery_date)) %>%
  mutate(Station.name = station,
         Start = format(deploy_date, actel_datefmt), # no time data for these deployments
         Stop = format(recovery_date, actel_datefmt),  # not uncommon for this region
         Receiver = rcvrserial) %>%
  arrange(Receiver, Start)

Detections

For detections, a few columns need to exist: Transmitter holds the full transmitter ID. Receiver holds the receiver serial number, Timestamp has the detection times, and we use a couple of Actel functions to split CodeSpace and Signal from the full transmitter_id.

# Renaming some columns in the Detection extract files   
actel_dets <- proj_dets %>% dplyr::filter(receiver != 'release') %>%
  dplyr::mutate(Transmitter = tagname,
                Receiver = as.integer(receiver),
                Timestamp = format(datecollected, actel_datefmt),
                CodeSpace = extractCodeSpaces(tagname),
                Signal = extractSignals(tagname), 
                Sensor.Value = sensorvalue,
                Sensor.Unit = sensorunit)

Note: we don’t have any environmental data in our detection extract here, but Actel will also find and plot temperature or other sensor values if you have those kinds of tags.

Creating the Spatial dataframe

The spatial dataframe must have entries for all release locations and all receiver deployment locations. Basically, it must have an entry for every distinct location we can say we know an animal has been.

# Prepare and style entries for receivers
actel_receivers <- full_receiver_meta %>% dplyr::mutate( Station.name = station,
                                                         Latitude = deploy_lat,
                                                         Longitude = deploy_long,
                                                         Type='Hydrophone') %>%
  dplyr::mutate(Array=rcvrgroup) %>%    # Having too many distinct arrays breaks things.
  dplyr::select(Station.name, Latitude, Longitude, Array, Type) %>%
  distinct(Station.name, Latitude, Longitude, Array, Type)

# Actel Tag Releases ---------------

# Prepare and style entries for tag releases
actel_tag_releases <- tag_metadata %>% mutate(Station.name = RELEASE_LOCATION,
                                              Latitude = RELEASE_LATITUDE,
                                              Longitude = RELEASE_LONGITUDE,
                                              Type='Release') %>%
  # It's helpful to associate release locations with their nearest Array.
  # Could set all release locations to the same Array:
  #  mutate(Array = 'PROJ61JUGNO_2A') %>% # Set this to the closest array to your release locations
  # if this is different for multiple release groups, can do things like this to subset case-by-case:
  # here Station.name is the release location 'station' name, and the value after ~ will be assigned to all.
  mutate(Array = case_when(Station.name %in% c('Red Banks', 'Eldorado', 'Williamsburg') ~ 'PROJ61UTEAST', 
                           Station.name == 'Woodrow Wilson Bridge' ~ 'PROJ56',
                           Station.name == 'Adjacent to Lyons Creek' ~ 'PROJ61JUGNO_5',
                           Station.name == 'Merkle Wildlife Sanctuary' ~ 'PROJ61JUGNO_2A',
                           Station.name == 'Nottingham' ~ 'PROJ61NOTTIN',
                           Station.name == 'Sneaking Point' ~ 'PROJ61MAGRUD',
                           Station.name == 'Jug Bay Dock' ~ 'PROJ61JUGDCK')) %>% # This value needs to be the nearest array to the release site
  distinct(Station.name, Latitude, Longitude, Array, Type)

# Combine Releases and Receivers ------

# Bind the releases and the deployments together for the unique set of spatial locations
actel_spatial <- actel_receivers %>% bind_rows(actel_tag_releases)

Now, for longer data series, we may have similar stations that were deployed and redeployed at very slightly different locations. One way to deal with this issue is that for stations that are named the same, we assign an average location in spatial.

Another way we might overcome this issue could be to increment station_names that are repeated and provide their distinct locations.

# group by station name and take the mean lat and lon of each station deployment history.
actel_spatial_sum <- actel_spatial %>% dplyr::group_by(Station.name, Type) %>%
  dplyr::summarize(Latitude = mean(Latitude),
                   Longitude = mean(Longitude),
                   Array =  first(Array))

Creating the Actel data object w/ preload()

Now you have everything you need to call preload().

# Specify the timezone that your timestamps are in.
# OTN provides them in UTC/GMT.
# FACT has both UTC/GMT and Eastern
# GLATOS provides them in UTC/GMT
# If you got the detections from someone else,
#    they will have to tell you what TZ they're in!
#    and you will have to convert them before importing to Actel!

tz <- "GMT0"

# You've collected every piece of data and metadata and formatted it properly.
# Now you can create the Actel project object.

actel_project <- preload(biometrics = actel_biometrics,
                         spatial = actel_spatial_sum,
                         deployments = actel_deployments,
                         detections = actel_dets,
                         tz = tz)
# Alas, we're going to have to discard a bunch of detections here, 
# as our subsetted demo data doesn't have deployment metadat for certain 
# receivers / time periods and is missing some station deployments

e # discard all detections at unknown receivers - this is almost never
# what you want to do in practice. Ask for missing metadata before
# resorting to this one

There will very likely be some issues with the data that the Actel checkers find and warn us about. Detections outside the deployment time bounds, receivers that aren’t in your metadata. For the purposes of today, we will drop those rows from the final copy of the data, but you can take these prompts as cues to verify your input metadata is accurate and complete. It is up to you in the end to determine whether there is a problem with the data, or an overzealous check that you can safely ignore. Here our demo is using a very deeply subsetted version of one project’s data, and it’s not surprising to be missing some deployments.

Once you have an Actel object, you can run explore() to generate your project’s summary reports:

# actel::explore()

actel_explore_output <- explore(datapack=actel_project, 
                                report=TRUE, GUI='never',
                                print.releases=FALSE)
								
n  # don't render any movements invalid - repeat for each tag, because:
# we haven't told Actel anything about which arrays connect to which
# so it's not able to properly determine which movements are valid/invalid

n  # don't save a copy of the results to a RData object... this time.

# Review the output .html file that has popped up in a browser.
# Our analysis might not make a lot of sense, since...
# actel assumed our study area was linear, we didn't tell it otherwise!

Review the file that Actel pops up in our browser. It presumed our Arrays were arranged linearly and alphabetically, which is of course not correct!

Custom spatial.txt files for Actel

We’ll have to tell Actel how our arrays are inter-connected. To do this, we’ll need to design a spatial.txt file for our detection data.

To help with this, we can go back and visualize our study area interactively, and start to see how the Arrays are connected.

# Designing a spatial.txt file -----

library(mapview)
library(spdplyr)
library(leaflet)
library(leafpop)


## Exploration - Let's use mapview, since we're going to want to move around, 
#   drill in and look at our stations


# Get a list of spatial objects to plot from actel_spatial_sum:
our_receivers <- as.data.frame(actel_spatial_sum) %>%    
  dplyr::filter(Array %in% (actel_spatial_sum %>%   # only look at the arrays already in our spatial file
                              distinct(Array))$Array)

rcvr_spatial <- our_receivers %>%    
  dplyr::select(Longitude, Latitude) %>%           # and get a SpatialPoints object to pass to mapview
  sp::SpatialPoints(CRS('+proj=longlat'))
# and plot it using mapview. The popupTable() function lets us customize our tooltip
mapview(rcvr_spatial, popup = popupTable(our_receivers, 
                                         zcol = c("Array",
                                                  "Station.name")))  # and make a tooltip we can explore

Can we design a graph and write it into spatial.txt that fits all these Arrays together? The station value we put in Array for our PROJ61 and PROJ60 projects looks to be a bit too granular for our purposes. Maybe we can combine many arrays that are co-located in open water into a singular ‘zone’, preserving the complexity of the river systems but creating a large basin to which we can connect the furthest downstream of those river arrays.

To do this, we only need to update the arrays in our spatial.csv file or actel_spatial dataframe. We don’t need to edit our source metadata! We will have to define a spatial.txt file and how these newly defined Arrays interconnect. While there won’t be time to do that for this example dataset and its large and very complicated region, this approach is definitely suitable for small river systems and even perhaps for multiple river systems feeding a bay and onward to the open water.

Custom Spatial Networks

Let’s get started designing a spatial.txt file for our detection data!

First we will visualize our study area, with a popup that tells us what project each deployment belongs to.

library(ggplot2)
library(ggmap)
library(plotly)

# Subset our study area so we're not dealing with too much stuff.

# Where were we detected, excluding Canada and the GoM:
actel_dets <- actel_dets %>% filter(!detectedby %in% c('OTN.V2LGMXSNAP', 'OTN.V2LSTANI', 'OTN.V2LSTADC'))

bbox.minlat <- min(actel_dets$latitude) - 0.5
bbox.maxlat <- max(actel_dets$latitude) + 0.5
bbox.minlong <- min(actel_dets$longitude) - 0.5
bbox.maxlong <- max(actel_dets$longitude) + 1.5

actel_deployments <- actel_deployments %>% filter(between(deploy_lat, bbox.minlat, bbox.maxlat) &
                                                    between(deploy_long, bbox.minlong, bbox.maxlong))

actel_receivers <- actel_receivers %>% filter(Station.name %in% actel_deployments$Station.name)

# biometrics? maybe don't have to?

actel_spatial_sum <- actel_spatial_sum %>% filter(Station.name %in% actel_deployments$Station.name)

base <- get_stamenmap(
  bbox = c(left = min(actel_deployments$deploy_long), 
           bottom = min(actel_deployments$deploy_lat), 
           right = max(actel_deployments$deploy_long), 
           top = max(actel_deployments$deploy_lat)),
  maptype = "toner", 
  crop = FALSE,
  zoom = 12)


proj59_zoomed_map <- ggmap(base, extent='panel') + 
                      ylab("Latitude") +
                      xlab("Longitude") +
                      geom_point(data = actel_spatial_sum, 
                        aes(x = Longitude,y = Latitude, colour = Station.name),
                        shape = 19, size = 2) 
ggplotly(proj59_zoomed_map)

Now we can see all the arrays in our region, and can begin to imagine how they are all connected. Can we design a spatial.txt file that fits our study area using ‘collectioncode’ as our Array?

Not really, no. Too complicated, too many interconnected arrays! Let’s define most of our projects as “outside”, that is, outside our estuary/river system, not really subject to being interconnected.

# projects in relatively 'open' water:
outside_arrays <- c('PROJ56',
                    'PROJ61CORNHB', 'PROJ61POCOMO', 
                    'PROJ61LTEAST', 'PROJ61LTWEST')

# Within the Chesapeake area:
# Piney Point's a gate array
piney_point <-c('PROJ60PINEY POINT A', 'PROJ60PINEY POINT B', 
                'PROJ60PINEY POINT C', 'PROJ60PINEY POINT D')
				
# the RTE 301 receivers are a group.
rte_301 <- c('PROJ60RT 301 A', 'PROJ60RT 301 B')

cedar_point <- c('PROJ60CEDAR POINT A', 'PROJ60CEDAR POINT B', 
                 'PROJ60CEDAR POINT C', 'PROJ60CEDAR POINT D',
                 'PROJ60CEDAR POINT E')
				 
ccb_kent <- c('PROJ60CCB1', 'PROJ60CCB2', 'PROJ60CCB3', 'PROJ60CCB4',
              'PROJ60KENT ISLAND A', 'PROJ60KENT ISLAND B', 
              'PROJ60KENT ISLAND C', 'PROJ60KENT ISLAND D')
			  
bear_cr <- c('PROJ61BEARCR', 'PROJ61BEARCR2')

# Single receivers inside the Chesapeake:
ches_rcvrs <- c('PROJ61COOKPT','PROJ61NELSON','PROJ61CASHAV',
                'PROJ61TPTSH','PROJ61SAUNDR', 'PROJ61DAVETR')

rhode_r <- c('PROJ61WESTM', 'PROJ61RHODEM', 'PROJ61RMOUTH')

Now we can update actel_spatial_sum to reflect the inter-connectivity of the Chesapeake arrays.

actel_spatial <- actel_receivers %>% bind_rows(actel_tag_releases)

To improve our plots we can summarize and take mean locations for stations

# group by station name and take the mean lat and lon of each station deployment history.
actel_spatial_sum <- actel_spatial %>% dplyr::group_by(Station.name, Type) %>%
  dplyr::summarize(Latitude = mean(Latitude),
                   Longitude = mean(Longitude),
                   Array =  first(Array))

actel_spatial_sum_grouped <- actel_spatial_sum %>% 
  dplyr::mutate(Array = if_else(Array %in% outside_arrays, 'Outside', #if any of the above, make it 'Huron'
                                Array)) %>% # else leave it as its current value
#  dplyr::mutate(Array = if_else(Array %in% wilmington, 'Wilmington', Array)) %>%
  dplyr::mutate(Array = if_else(Array %in% ches_rcvrs, 'InnerChesapeake', Array)) %>%
  dplyr::mutate(Array = if_else(Array %in% piney_point, 'PROJ60PineyPoint', Array)) %>%
  dplyr::mutate(Array = if_else(Array %in% rte_301, 'PROJ60RTE301', Array)) %>%
  dplyr::mutate(Array = if_else(Array %in% rhode_r, 'PROJ61RHODER', Array)) %>%
  dplyr::mutate(Array = if_else(Array %in% cedar_point, 'PROJ60CEDARPOINT', Array)) %>%
  dplyr::mutate(Array = if_else(Array %in% ccb_kent, 'CCBKENT', Array)) %>%
  dplyr::mutate(Array = if_else(Array %in% bear_cr, 'PROJ60BEARCR', Array)) %>%
  dplyr::mutate(Array = if_else(Station.name %in% c('PROJ56-UP', 'Woodrow Wilson Bridge'),'PROJ56-UP', Array)) %>% # one tricky receiver?
  dplyr::mutate(Array = if_else(Station.name == 'PROJ56-SN', 'PROJ56-SN', Array)) %>%
  dplyr::mutate(Array = if_else(Station.name == 'PROJ56-GR', 'PROJ56-GR', Array)) %>%
#  dplyr::mutate(Array = if_else(Array == 'PROJ60CD WEST LL 9200', 'PROJ60CDWESTLL9200', Array)) %>%
  dplyr::mutate(Array = if_else(Array == 'PROJ60CBL PIER', 'PROJ60CBLPIER', Array))# two tricky receivers?
# Notice we haven't changed any of our data or metadata, just the spatial table

Now let’s remake our map, see if the connectivity is better denoted.

# Head back into the map, and denote the connectivity between our receiver groups:

proj59_arrays_map <- ggmap(base, extent='panel') + 
  ylab("Latitude") +
  xlab("Longitude") +
  geom_point(data = actel_spatial_sum_grouped, 
             aes(x = Longitude,y = Latitude, colour = Array), 
             shape = 19, size = 2) +
  geom_point(data = actel_tag_releases, aes(x=Longitude, y=Latitude, colour=Station.name),
             shape = 10, size = 5) 
 
ggplotly(proj59_arrays_map)

At this stage you would create and curate your spatial.txt file. This example shows how complicated they can get in large systems where all receivers arrays interact!

spatial_txt_dot = 'act_spatial.txt'

# How many unique spatial Arrays do we still have, now that we've combined
# so many?

actel_spatial_sum_grouped %>% dplyr::group_by(Array) %>% dplyr::select(Array) %>% unique()

For a grand finale, let’s try analyzing this dataset with our reduced spatial complexity compared to previous runs.

# actel::preload() with custom spatial.txt ----

actel_project <- preload(biometrics = actel_biometrics,
                         spatial = actel_spatial_sum_grouped,
                         deployments = actel_deployments,
                         detections = actel_dets,
                         dot = readLines(spatial_txt_dot),
                         tz = tz)


# Now actel understands the connectivity between our arrays better!
# actel::explore() with custom spatial.txt

actel_explore_output_chesapeake <- explore(datapack=actel_project, report=TRUE, print.releases=FALSE)

# We no longer get the error about detections jumping across arrays!
# and we don't need to save the report
n

If you haven’t been able to follow along, this report is included in the data/act/ folder you have cloned from the workshop repository (called actel_explore_report_chesapeake.html) so you can check it out!

Key Points