OTN Collaboration Update
Overview
Teaching: 0 min
Exercises: 0 min
Questions
What is the Ocean Tracking Network?
How does the FACT Network interact with OTN?
Objectives
This presentation was given 2020-12-15 at the FACT Network December meeting to update members on the status of OTN.
Key Points
Intro to R
Overview
Teaching: 30 min
Exercises: 20 min
Questions
What are common operators in R?
What are common data types in R?
What are some base R functions?
How do I deal with missing data?
Objectives
First, let's learn about RStudio.
RStudio is divided into 4 “Panes”: the Source for your scripts and documents (top-left, in the default layout); your Environment/History (top-right) which shows all the objects in your working space (Environment) and your command history (History); your Files/Plots/Packages/Help/Viewer (bottom-right); and the R Console (bottom-left). The placement of these panes and their content can be customized (see menu, Tools -> Global Options -> Pane Layout).
The R Script in the top pane can be saved and edited, while code typed directly into the Console below will disappear after closing the R session.
R can access files on and save outputs to any folder on your computer. R knows where to look for and save files based on the current working directory. This is one of the first things you should set up: a folder you’d like to contain all your data, scripts and outputs. The working directory path will be different for everyone.
Setting up R
# Packages ####
# once you install packages to your computer, you can "check them out" of your packages library each time you need them
library(tidyverse)# really neat collection of packages! https://www.tidyverse.org/
library(lubridate)
library(plotly)
library(ggmap)
# Working Directory ####
setwd('C:/Users/ct991305/Documents/Workshop Material/2020-12-17-telemetry-packages-FACT/') #set folder you're going to work in
getwd() #check working directory
#you can also change it in the RStudio interface by navigating in the file browser to where your working directory should be,
#then clicking on the blue gear icon "More" and selecting "Set As Working Directory".
Intro to R
Learning about R
Operators
3 + 5 #maths! including - , *, /
weight_kg <- 55 #assignment operator! for objects/variables. shortcut: alt + -
weight_kg
weight_lb <- 2.2 * weight_kg #can assign output to an object. can use objects to do calculations
# Challenge 1:
# if we change the value of weight_kg to be 100, does the value of weight_lb also change automatically?
# remember: you can check the contents of an object by simply typing out its name
Functions
#functions take "arguments": you have to tell them what to run their script against
ten <- sqrt(weight_kg) #functions contain calculations wrapped into one easy-to-type command
round(3.14159) #don't have to assign
args(round) #the args() function will show you the required arguments of another function
?round #will show you the full help page for a function, so you can see what it does.
#Challenge 2: can you round the value 3.14159 to two decimal places?
# using args() should give a clue!
Vectors and Data Types
weight_g <- c(21, 34, 39, 54, 55) #use the combine function to join values into a vector object
length(weight_g) #explore vector
class(weight_g) #a vector can only contain one data type
str(weight_g) #find the structure of your object.
#our vector is numeric.
#other options include: character (words), logical (TRUE or FALSE), integer etc.
animals <- c("mouse", "rat", "dog") #to create a character vector, use quotes
#Challenge 3: what data type will this vector become? You can check using class()
#challenge3 <- c(1, 2, 3, "4")
Subsetting
animals #calling your object will print it out
animals[2] #square brackets = indexing. selects the 2nd value in your vector
weight_g > 50 #conditional indexing: selects based on criteria
weight_g[weight_g <=30 | weight_g == 55] #many new operators here!
#<= less than or equal to, | "or", == equal to
weight_g[weight_g >= 30 & weight_g == 21] # >= greater than or equal to, & "and"
# this particular example gives 0 results - why?
Missing Data
heights <- c(2, 4, 4, NA, 6)
mean(heights) #some functions can't handle NAs
mean(heights, na.rm = TRUE) #remove the NAs before calculating
#other ways to get a dataset without NAs:
heights[!is.na(heights)] #select for values where it's NOT NA
#[] square brackets are the base R way to select a subset of data --> called indexing
#! is the NOT operator: it reverses TRUE/FALSE values
na.omit(heights) #omit the NAs
heights[complete.cases(heights)] #select only complete cases
#Challenge 4:
#1. Using this vector of heights in inches, create a new vector, heights_no_na, with the NAs removed.
#heights <- c(63, 69, 60, 65, NA, 68, 61, 70, 61, 59, 64, 69, 63, 63, NA, 72, 65, 64, 70, 63, 65)
#2. Use the function median() to calculate the median of the heights vector.
#BONUS: Use R to figure out how many people in the set are taller than 67 inches.
Key Points
Starting with Data Frames
Overview
Teaching: 25 min
Exercises: 10 min
Questions
How do I import tabular data?
How do I explore my data set?
What are some basic data manipulation functions?
Objectives
Importing from csv
dplyr takes advantage of tidyverse pipes and chains of data manipulation to create powerful exploratory summaries.
If you’re unfamiliar with detection extract formats from OTN-style database nodes, see the documentation here.
#imports file into R. paste the filepath to the unzipped file here!
tqcs_matched_2010 <- read_csv("data/tqcs_matched_detections_2010.csv", guess_max = 117172)
#read_csv() is from tidyverse's readr package --> you can also use read.csv() from base R, but it creates a data frame (not a tibble) and loads more slowly
#see https://link.medium.com/LtCV6ifpQbb
#the guess_max argument is helpful when there are many rows of NAs at the top: readr guesses each column's data type from the first guess_max rows.
#I chose to use these here because I got the following warning from read_csv()
# Warning: 82 parsing failures.
# row col expected actual file
#117172 bottom_depth 1/0/T/F/TRUE/FALSE 5 'data/tqcs_matched_detections_2010.csv'
#117172 receiver_depth 1/0/T/F/TRUE/FALSE 4 'data/tqcs_matched_detections_2010.csv'
#122664 bottom_depth 1/0/T/F/TRUE/FALSE 17.5 'data/tqcs_matched_detections_2010.csv'
#122664 receiver_depth 1/0/T/F/TRUE/FALSE 16.5 'data/tqcs_matched_detections_2010.csv'
#162757 bottom_depth 1/0/T/F/TRUE/FALSE 6 'data/tqcs_matched_detections_2010.csv'
Exploring Detection Extracts
Let’s start with a practical example. What can we find out about these matched detections?
head(tqcs_matched_2010) #first 6 rows
View(tqcs_matched_2010) #can also click on object in Environment window
str(tqcs_matched_2010) #can see the type of each column (vector)
glimpse(tqcs_matched_2010) #similar to str()
#summary() is a base R function that will spit out some quick stats about a vector (column)
#the $ syntax is the way base R selects columns from a data frame
summary(tqcs_matched_2010$latitude)
#Challenge 5:
#1. What is the class of the station column in tqcs_matched_2010?
#2. How many rows and columns are in the tqcs_matched_2010 dataset?
Data Manipulation
What is dplyr, and how can it be used to create summaries for me?
library(dplyr) #can use tidyverse package dplyr to do exploration on dataframes in a nicer way
# %>% is a "pipe" which allows you to join functions together in sequence.
#it can be read as "and then". shortcut: ctrl + shift + m
tqcs_matched_2010 %>% dplyr::select(8) #selects column 8
# dplyr::select this syntax is to specify that we want the select function from the dplyr package.
#often functions are named the same but do diff things
tqcs_matched_2010 %>% slice(1:5) #selects rows 1 to 5 dplyr way
tqcs_matched_2010 %>% distinct(detectedby) %>% nrow #number of arrays that detected my fish in dplyr!
tqcs_matched_2010 %>% distinct(catalognumber) %>% nrow #number of animals that were detected in 2010 (includes release records)
tqcs_matched_2010 %>% filter(catalognumber=="TQCS-1049258-2008-02-14") #filtering in dplyr!
tqcs_matched_2010 %>% filter(monthcollected >= 10) #month is in/after Oct
#get the mean value across a column
tqcs_matched_2010 %>%
group_by(catalognumber) %>%
summarise(MeanLat=mean(latitude)) #uses pipes and dplyr functions to find mean latitude for each fish
#Challenge 6:
#1. find the mean latitude and mean longitude for animal "TQCS-1049258-2008-02-14"
#2. find the min lat/long of each animal for detections occurring in July
Joining Detection Extracts
Here we will join and filter our detection extracts
tqcs_matched_2011 <- read_csv("data/tqcs_matched_detections_2011.csv", guess_max = 41880)
tqcs_matched_10_11_full <- rbind(tqcs_matched_2010, tqcs_matched_2011) #join the two files
#release records for animals often appear in >1 year, this will remove the duplicates
tqcs_matched_10_11_full <- tqcs_matched_10_11_full %>% distinct()
View(tqcs_matched_10_11_full) #wow this is huge!
tqcs_matched_10_11 <- tqcs_matched_10_11_full %>% slice(1:100000) #subset our example data to help this workshop run smoother!
Dealing with Datetimes
Datetimes are special formats which are neither numbers nor characters.
library(lubridate)
tqcs_matched_10_11 %>% mutate(datecollected=ymd_hms(datecollected)) #Tells R to treat this column as a date, not regular numbers
#as.POSIXct(tqcs_matched_2010$datecollected) #this is the base R way - if you ever see this function
#lubridate is amazing if you have a dataset with multiple datetime formats / timezones
#the function parse_date_time() can be used to specify multiple date formats if you have a dataset with mixed rows
#the function with_tz() can change timezone. accounts for daylight savings too!
#example code to change timezone:
#My_Data_Set %>% mutate(datetime = ymd_hms(datetime, tz = "America/Nassau")) #change your column to a datetime format, specifying TZ (eastern)
#My_Data_Set %>% mutate(datetime_utc = with_tz(datetime, tzone = "UTC")) #make new column called datetime_utc which is datetime converted to UTC
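Since parse_date_time() was mentioned above, here is a minimal sketch of how it handles a vector with mixed date formats; the mixed_dates values below are invented purely for illustration.
#example: parse_date_time() with two different formats in one vector (made-up values)
mixed_dates <- c("2010-07-03 14:30:00", "03/08/2010 09:15")
parse_date_time(mixed_dates, orders = c("ymd HMS", "dmy HM"), tz = "UTC") #tries each format in turn and returns POSIXct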
Key Points
Intro to Plotting
Overview
Teaching: 15 min
Exercises: 10 min
Questions
How do I plot my data?
How can I plot summaries of my data?
Objectives
Learn how to make basic plots with ggplot2
Learn how to combine dplyr summaries with ggplot2 plots
Background
ggplot2 takes advantage of tidyverse pipes and chains of data manipulation, as well as separating the aesthetics of the plot (what are we plotting) from the styling of the plot (how should we show it?), in order to produce readable and malleable plotting code.
general formula ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) + <GEOM_FUNCTION>()
library(ggplot2) #tidyverse-style plotting, a very customizable plotting package
# Assign plot layout to a variable
tqcs_10_11_plot <- ggplot(data = tqcs_matched_10_11,
mapping = aes(x = latitude, y = longitude)) #can assign a base plot to data, and add the geom() later
# Draw the plot
tqcs_10_11_plot +
geom_point(alpha=0.1,
colour = "blue")
#layer whatever geom you want onto your plot template
#very easy to explore diff geoms without re-typing
#alpha is a transparency argument in case points overlap
Basic plots
You can also build your plots iteratively, without assigning them to a variable.
tqcs_matched_10_11 %>%
ggplot(aes(latitude, longitude)) +
geom_point() #geom = the type of plot
tqcs_matched_10_11 %>%
ggplot(aes(latitude, longitude, colour = commonname)) + #colour by species!
geom_point()
#anything you specify in the aes() is applied to the actual data points/whole plot,
#anything specified in geom() is applied to that layer only (colour, size...)
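The plots above map colour to a variable inside aes(); as a quick contrast, here is a minimal sketch (on the same tqcs_matched_10_11 data) where colour is instead set as a fixed value inside the geom, so it styles that layer only and produces no legend.
tqcs_matched_10_11 %>%
  ggplot(aes(latitude, longitude)) +
  geom_point(colour = "darkred", alpha = 0.1) #a fixed colour set in the geom styles this layer only; it is not mapped to data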
Challenge
Try combining your plotting with dplyr functions in this challenge!
#Challenge 7: try making a scatterplot showing the lat/long for animal "TQCS-1049258-2008-02-14", coloured by detection array
#Question: what other geoms are there? Try typing `geom_` into R to see what it suggests!
Key Points
You can feed output from dplyr’s data manipulation functions into ggplot using pipes.
Plotting various summaries and groupings of your data is good practice at the exploratory phase, and dplyr and ggplot make iterating different ideas straightforward.
Telemetry Reports - Imports
Overview
Teaching: 10 min
Exercises: 0 min
Questions
What datasets do I need from the Node?
How do I import all the datasets?
Objectives
Importing all the datasets
Now that we have an idea of what an exploratory workflow might look like with Tidyverse libraries like dplyr and ggplot2, let's look at how we might implement a common telemetry workflow using these tools.
View(tqcs_matched_10_11) #already have our Tag matches
#need our Array matches, joined
teq_qual_2010 <- read_csv("data/teq_qualified_detections_2010_ish.csv")
teq_qual_2011 <- read_csv("data/teq_qualified_detections_2011_ish.csv")
teq_qual_10_11_full <- rbind(teq_qual_2010, teq_qual_2011)
teq_qual_10_11 <- teq_qual_10_11_full %>% slice(1:100000) #subset our example data for ease of analysis!
#need Array metadata
teq_deploy <- read.csv("data/TEQ_Deployments_201001_201201.csv")
View(teq_deploy)
#need Tag metadata
tqcs_tag <- read.csv("data/TQCS_metadata_tagging.csv")
View(tqcs_tag)
#remember: we learned how to switch timezone of datetime columns above, if that is something you need to do with your dataset!!
Key Points
Telemetry Reports for Array Operators
Overview
Teaching: 30 min
Exercises: 0 min
Questions
How do I summarize and plot my deployments?
How do I summarize and plot my detections?
Objectives
Mapping my stations - Static map
Since we have already imported and joined our datasets, we can jump in. This section will use the Deployment metadata for your array.
library(ggmap)
#first, what are our columns called?
names(teq_deploy)
#make a basemap for your stations, using the min/max deploy lat and longs as bounding box
base <- get_stamenmap(
bbox = c(left = min(teq_deploy$DEPLOY_LONG),
bottom = min(teq_deploy$DEPLOY_LAT),
right = max(teq_deploy$DEPLOY_LONG),
top = max(teq_deploy$DEPLOY_LAT)),
maptype = "terrain-background",
crop = FALSE,
zoom = 8)
#filter for stations you want to plot
teq_deploy_plot <- teq_deploy %>%
mutate(deploy_date=ymd_hms(DEPLOY_DATE_TIME....yyyy.mm.ddThh.mm.ss.)) %>% #make a datetime
mutate(recover_date=ymd_hms(RECOVER_DATE_TIME..yyyy.mm.ddThh.mm.ss.)) %>% #make a datetime
filter(!is.na(deploy_date)) %>% #no null deploys
filter(deploy_date > ymd_hms('2010-07-03 00:00:00')) %>% #only looking at certain deployments! (compare against a parsed datetime, not bare numbers)
group_by(STATION_NO) %>%
summarise(MeanLat=mean(DEPLOY_LAT), MeanLong=mean(DEPLOY_LONG)) #get the mean location per station
# you could choose to plot stations which are within a certain bounding box!
# to do this you would add another filter to the above data, before passing to the map
# ex: add this line after the mutate() clauses:
# filter(latitude >= 0.5 & latitude <= 24.5 & longitude >= 0.6 & longitude <= 34.9)
#add your stations onto your basemap
teq_map <-
ggmap(base, extent='panel') +
ylab("Latitude") +
xlab("Longitude") +
geom_point(data = teq_deploy_plot, #filtering for recent deployments
aes(x = MeanLong,y = MeanLat), #specify the data
colour = 'blue', shape = 19, size = 2) #lots of aesthetic options here!
#view your receiver map!
teq_map
#save your receiver map into your working directory
ggsave(plot = teq_map, file = "code/day1/teq_map.tiff", units="in", width=15, height=8)
Mapping my stations - Interactive map
An interactive map can contain more information than a static map.
library(plotly)
#set your basemap
geo_styling <- list(
fitbounds = "locations", visible = TRUE, #fits the bounds to your data!
showland = TRUE,
landcolor = toRGB("gray95"),
subunitcolor = toRGB("gray85"),
countrycolor = toRGB("gray85")
)
#decide what data you're going to use
teq_map_plotly <- plot_geo(teq_deploy_plot, lat = ~MeanLat, lon = ~MeanLong)
#add your markers for the interactive map
teq_map_plotly <- teq_map_plotly %>% add_markers(
text = ~paste(STATION_NO, MeanLat, MeanLong, sep = "<br />"),
symbol = I("square"), size = I(8), hoverinfo = "text"
)
#Add layout (title + geo styling)
teq_map_plotly <- teq_map_plotly %>% layout(
title = 'TEQ Deployments<br />(> 2010-07-03)', geo = geo_styling
)
#View map
teq_map_plotly
#You might see the following warning: it just means that the plotly package has some updating to do
# Warning message:
# `arrange_()` is deprecated as of dplyr 0.7.0.
# Please use `arrange()` instead.
# See vignette('programming') for more help
# This warning is displayed once every 8 hours.
# Call `lifecycle::last_warnings()` to see where this warning was generated.
Summary of Animals Detected
Let’s find out more about the animals detected by our array!
#How many of each animal did we detect from each collaborator, by species?
teq_qual_summary <- teq_qual_10_11 %>%
filter(datecollected > '2010-06-01') %>% #select timeframe, stations etc.
group_by(trackercode, scientificname, tag_contact_pi, tag_contact_poc) %>%
summarize(count = n()) %>%
select(trackercode, tag_contact_pi, tag_contact_poc, scientificname, count)
#view our summary table
teq_qual_summary #remember, this is just the first 100,000 rows!
#export our summary table
write_csv(teq_qual_summary, "code/day1/teq_detection_summary_June2010_to_Dec2011.csv", col_names = TRUE)
Summary of Detections
This can suggest array performance, highlight hotspot stations, and serve as a metric for funders.
# number of dets per month/year per station
teq_det_summary <- teq_qual_10_11 %>%
mutate(datecollected=ymd_hms(datecollected)) %>%
group_by(station, year = year(datecollected), month = month(datecollected)) %>%
summarize(count =n())
teq_det_summary #remember: this is a subset!
# number of dets per month/year per station & species
teq_anim_summary <- teq_qual_10_11 %>%
mutate(datecollected=ymd_hms(datecollected)) %>%
group_by(station, year = year(datecollected), month = month(datecollected), scientificname) %>%
summarize(count =n())
teq_anim_summary # remember: this is a subset!
Plot of Detections
Let's make an informative plot showing the number of matched detections per year and month.
#try with teq_qual_10_11_full if you're feeling bold! takes about 1 min to run on a fast machine
teq_qual_10_11 %>%
mutate(datecollected=ymd_hms(datecollected)) %>% #make datetime
mutate(year_month = floor_date(datecollected, "months")) %>% #round to month
group_by(year_month) %>% #can group by station, species etc.
summarize(count =n()) %>% #how many dets per year_month
ggplot(aes(x = (month(year_month) %>% as.factor()),
y = count,
fill = (year(year_month) %>% as.factor())
)
)+
geom_bar(stat = "identity", position = "dodge2")+
xlab("Month")+
ylab("Total Detection Count")+
ggtitle('TEQ Animal Detections by Month')+ #title
labs(fill = "Year") #legend title
Key Points
Telemetry Reports for Tag Owners
Overview
Teaching: 30 min
Exercises: 0 min
Questions
How do I summarize and plot my detections?
How do I summarize and plot my tag metadata?
Objectives
New data frames
Filtering out release records from the detection extracts
#optional subsetted dataset to use: detections with releases filtered out!
tqcs_matched_10_11_no_release <- tqcs_matched_10_11 %>%
filter(receiver != "release")
#optional full dataset to use: detections with releases filtered out!
tqcs_matched_10_11_full_no_release <- tqcs_matched_10_11_full %>%
filter(receiver != "release")
Mapping my Detections and Releases - static map
Where were my fish observed?
base <- get_stamenmap(
bbox = c(left = min(tqcs_matched_10_11$longitude),
bottom = min(tqcs_matched_10_11$latitude),
right = max(tqcs_matched_10_11$longitude),
top = max(tqcs_matched_10_11$latitude)),
maptype = "terrain-background",
crop = FALSE,
zoom = 8)
#add your releases and detections onto your basemap
tqcs_map <-
ggmap(base, extent='panel') +
ylab("Latitude") +
xlab("Longitude") +
geom_point(data = tqcs_matched_10_11,
aes(x = longitude,y = latitude), #specify the data
colour = 'blue', shape = 19, size = 2) #lots of aesthetic options here!
#view your tagging map!
tqcs_map
Mapping my Detections and Releases - interactive map
Let’s use plotly!
#set your basemap
geo_styling <- list(
fitbounds = "locations", visible = TRUE, #fits the bounds to your data!
showland = TRUE,
landcolor = toRGB("gray95"),
subunitcolor = toRGB("gray85"),
countrycolor = toRGB("gray85")
)
#decide what data you're going to use
tqcs_map_plotly <- plot_geo(tqcs_matched_10_11, lat = ~latitude, lon = ~longitude)
#add your markers for the interactive map
tqcs_map_plotly <- tqcs_map_plotly %>% add_markers(
text = ~paste(catalognumber, scientificname, paste("Date detected:", datecollected),
paste("Latitude:", latitude), paste("Longitude",longitude),
paste("Detected by:", detectedby), paste("Station:", station),
paste("Contact:", contact_poc, contact_pi), sep = "<br />"),
symbol = I("square"), size = I(8), hoverinfo = "text"
)
#Add layout (title + geo styling)
tqcs_map_plotly <- tqcs_map_plotly %>% layout(
title = 'TQCS Detections<br />(2010-2011)', geo = geo_styling
)
#View map
tqcs_map_plotly
Summary of tagged animals
This section will use your Tagging Metadata
# summary of animals you've tagged
tqcs_tag_summary <- tqcs_tag %>%
mutate(UTC_RELEASE_DATE_TIME = ymd_hms(UTC_RELEASE_DATE_TIME)) %>%
#filter(UTC_RELEASE_DATE_TIME > '2019-06-01') %>% #select timeframe, specific animals etc.
group_by(year = year(UTC_RELEASE_DATE_TIME), COMMON_NAME_E) %>%
summarize(count = n(),
Meanlength = mean(LENGTH..m., na.rm=TRUE),
minlength= min(LENGTH..m., na.rm=TRUE),
maxlength = max(LENGTH..m., na.rm=TRUE),
MeanWeight = mean(WEIGHT..kg., na.rm = TRUE))
#view our summary table
tqcs_tag_summary
Detection Attributes
Joining the detections to the tag metadata will add line-by-line morphometrics and other information!
#Average location of each animal, without release records
tqcs_matched_10_11_no_release %>%
group_by(catalognumber) %>%
summarize(NumberOfStations = n_distinct(station),
AvgLat = mean(latitude),
AvgLong =mean(longitude))
#Lets try to join to our tag metadata to get some more context!!
#First we need to make a tagname column in the tag metadata, and figure out the enddate of the tag battery
tqcs_tag <- tqcs_tag %>%
mutate(enddatetime = (ymd_hms(UTC_RELEASE_DATE_TIME) + days(EST_TAG_LIFE))) %>% #adding enddate
mutate(tagname = paste(TAG_CODE_SPACE,TAG_ID_CODE, sep = '-')) #adding tagname column
#Now we join by tagname, to the detections without the release information
tag_joined_dets <- left_join(x = tqcs_matched_10_11_no_release, y = tqcs_tag, by = "tagname")
#make sure the redeployed tags have matched within their deployment period only
tag_joined_dets <- tag_joined_dets %>%
filter(datecollected >= UTC_RELEASE_DATE_TIME & datecollected <= enddatetime)
View(tag_joined_dets)
#Lets use this new dataframe to make summaries! Avg length per location
tqcs_tag_det_summary <- tag_joined_dets %>%
group_by(detectedby, station, latitude, longitude) %>%
summarise(AvgSize = mean(LENGTH..m., na.rm=TRUE))
tqcs_tag_det_summary
Summary of Detection Counts
Let's make an informative plot showing the number of matched detections per year and month.
#try with tqcs_matched_10_11_full_no_release if you're feeling bold! takes ~30 secs
tqcs_matched_10_11_no_release %>%
mutate(datecollected=ymd_hms(datecollected)) %>% #make datetime
mutate(year_month = floor_date(datecollected, "months")) %>% #round to month
group_by(year_month) %>% #can group by station, species etc.
summarize(count =n()) %>% #how many dets per year_month
ggplot(aes(x = (month(year_month) %>% as.factor()),
y = count,
fill = (year(year_month) %>% as.factor())
)
)+
geom_bar(stat = "identity", position = "dodge2")+
xlab("Month")+
ylab("Total Detection Count")+
ggtitle('TQCS Detections by Month (2010-2011)')+ #title
labs(fill = "Year") #legend title
Other Example Plots
Some examples of complex plotting options
# monthly latitudinal distribution of your animals (works best w >1 species)
tqcs_matched_10_11 %>%
group_by(m=month(datecollected), catalognumber, scientificname) %>% #make our groups
summarise(mean=mean(latitude)) %>% #mean lat
ggplot(aes(m %>% factor, mean, colour=scientificname, fill=scientificname))+ #the data is supplied, but no info on how to show it!
geom_point(size=3, position="jitter")+ # draw data as points, and use jitter to help see all points instead of superimposition
#coord_flip()+ #flip x y, not needed here
scale_colour_manual(values = "blue")+ #change the colour to represent the species better!
scale_fill_manual(values = "grey")+
geom_boxplot()+ #another layer
geom_violin(colour="black") #and one more layer
#There are other ways to present a summary of data like this that we might have chosen.
#geom_density2d() will give us a KDE for our data points and give us some contours across our chosen plot axes.
tqcs_matched_10_11 %>% #doesn't work on the subsetted data, back to original dataset for this one
group_by(month=month(datecollected), catalognumber, scientificname) %>%
summarise(meanlat=mean(latitude)) %>%
ggplot(aes(month, meanlat, colour=scientificname, fill=scientificname))+
geom_point(size=3, position="jitter")+
scale_colour_manual(values = "blue")+
scale_fill_manual(values = "grey")+
geom_density2d(size=7, lty=1) #this is the only difference from the plot above
#anything you specify in the aes() is applied to the actual data points/whole plot,
#anything specified in geom() is applied to that layer only (colour, size...)
# per-individual density contours - lots of plots: called facets!
tqcs_matched_10_11 %>%
ggplot(aes(longitude, latitude))+
facet_wrap(~catalognumber)+ #make one plot per individual
geom_violin()
Key Points
Introduction to GLATOS Data Processing
Overview
Teaching: 30 min
Exercises: 0 min
Questions
How do I load my data into GLATOS?
How do I filter out false detections?
How can I consolidate my detections into detection events?
How do I summarize my data?
Objectives
GLATOS is a powerful toolkit that provides a wide range of functionality for loading, processing, and visualizing your data. With it, you can gain valuable insights with quick and easy commands that condense high volumes of base R into straightforward functions, with enough versatility to meet a variety of needs.
First, we must set our working directory and import the relevant library.
## Set your working directory ####
setwd("./data")
library(glatos)
library(tidyverse)
library(VTrack)
Your code may not be in the ‘code/glatos’ folder, so use the appropriate file path for your data.
Next, we will create paths to our detections and receiver files. GLATOS can function with both GLATOS- and OTN-formatted data, but the functions are different for each. Both, however, provide a marked performance boost over base R, and both ensure that the resulting data set will be compatible with the rest of the glatos framework.
We will use the TQCS detections.
# Get file path to example FACT data
det_file_name <- 'tqcs_matched_detections.csv'
Remember: you can always check a function’s documentation by typing a question mark, followed by the name of the function.
## GLATOS help files are helpful!! ####
?read_otn_detections
With our file path in hand, we’ll want to use the read_otn_detections function to load our data into a dataframe. In this case, our data is formatted in the FACT style; if it were GLATOS-formatted, we would want to use read_glatos_detections() instead.
# Save our detections file data into a dataframe called detections
detections <- read_otn_detections(det_file=det_file_name)
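For comparison, if your detections were GLATOS-formatted, the load step would use read_glatos_detections() instead; the file name below is only a placeholder, not one of the workshop files.
# hypothetical GLATOS-format equivalent (placeholder file name)
# glatos_dets <- read_glatos_detections(det_file = 'my_glatos_detections.csv')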
We will also filter out all the release rows.
# Detection extracts have rows that report the animal release
# for all the animals in the file:
View(detections %>% filter(receiver_sn == "release") %>% dplyr::select(transmitter_id, receiver_sn, detection_timestamp_utc, notes))
detections <- detections %>% filter(receiver_sn != "release")
Remember that we can use head() to inspect a few lines of our data to ensure it was loaded properly.
# View first 2 rows of output
head(detections, 2)
With our data loaded, we next want to apply a false filtering algorithm to reduce the number of false detections in our dataset. GLATOS uses the Pincock algorithm to filter probable false detections based on the time lag between detections: tightly clustered detections are weighted as more likely to be true, while detections spaced out temporally will be marked as false.
## Filtering False Detections ####
## ?glatos::false_detections
# write the filtered data (no rows deleted, just a filter column added)
# to a new det_filtered object
detections_filtered <- false_detections(detections, tf=3600, show_plot=TRUE)
head(detections_filtered)
nrow(detections_filtered)
The false_detections function will add a new column to your dataframe, ‘passed_filter’. This contains a boolean value that will tell you whether or not that record passed the false detection filter. That information may be useful on its own merits; for now, we will just use it to filter out the false detections.
# Filter based on the column if you're happy with it.
detections_filtered <- detections_filtered[detections_filtered$passed_filter == 1,]
nrow(detections_filtered) # Smaller than before
With our data properly filtered, we can begin investigating it and developing some insights. GLATOS provides a range of tools for summarizing our data so that we can better see what our receivers are telling us.
We can begin with a summary by animal, which will group our data by the unique animals we’ve detected.
# Summarize Detections ####
# By animal ====
sum_animal <- summarize_detections(detections_filtered, location_col = 'station', summ_type='animal')
sum_animal
We can also summarize by location, grouping our data by distinct locations.
# By location ====
sum_location <- summarize_detections(detections_filtered, location_col = 'station', summ_type='location')
head(sum_location)
Finally, we can summarize by both dimensions.
# By both dimensions
sum_animal_location <- summarize_detections(det = detections_filtered,
location_col = 'station',
summ_type='both')
head(sum_animal_location)
Summarizing by both dimensions will create a row for each station and animal pair, so let's filter out the stations where the animal wasn't detected.
# Filter out stations where the animal was NOT detected.
sum_animal_location <- sum_animal_location %>% filter(num_dets > 0)
sum_animal_location
One other method: we can summarize by a subset of our animals as well. If we only want to see summary data for a fixed set of animals, we can pass a vector containing the animal_ids that we want to see summarized.
# create a custom vector of Animal IDs to pass to the summary function
# look only for these ids when doing your summary
tagged_fish <- c("TQCS-1049258-2008-02-14", "TQCS-1055546-2008-04-30", "TQCS-1064459-2009-06-29")
sum_animal_custom <- summarize_detections(det=detections_filtered,
animals=tagged_fish,
location_col = 'station',
summ_type='animal')
sum_animal_custom
Alright, we can summarize our data. Let’s move on and see if we can make our dataset more amenable to plotting by reducing it from detections to detection events.
Detection Events differ from detections in that they condense a lot of temporally and spatially clustered detections for a single animal into a single detection event. This is a powerful and useful way to clean up the data, and makes it easier to present and clearer to read. Fortunately, GLATOS lets us do this easily.
# Reduce Detections to Detection Events ####
# ?glatos::detection_events
# arrival and departure time instead of multiple detection rows
# you specify how long an animal must be absent before starting a fresh event
events <- detection_events(detections_filtered,
location_col = 'station', # combines events across different receivers in a single array
time_sep=432000)
head(events)
We can also keep the full extent of our detections, but add a group column so that we can see how they would have been condensed.
# keep detections, but add a 'group' column for each event group
detections_w_events <- detection_events(detections_filtered,
location_col = 'station', # combines events across different receivers in a single array
time_sep=432000, condense=FALSE)
With our filtered data in hand, let’s move on to some visualization.
Key Points
More Features of GLATOS
Overview
Teaching: 15 min
Exercises: 0 min
Questions
What other features does GLATOS offer?
Objectives
GLATOS has some more advanced analytic tools beyond filtering and creating events.
GLATOS can be used to get the residence index of your animals at all the different stations. GLATOS offers 5 different methods for calculating residence index; here we will showcase 2 of them. residence_index requires an events object; we will use the one from the last lesson.
# Calc residence index using the Kessel method
rik_data <- glatos::residence_index(events,
calculation_method = 'kessel')
rik_data
# Calc residence index using the time interval method, interval set to 6 hours
rit_data <- glatos::residence_index(events,
calculation_method = 'time_interval',
time_interval_size = "6 hours")
rit_data
These two methods are similar but will almost always give somewhat different results; you can explore all of the methods to see which works best for your data.
GLATOS strives to be interoperable with other scientific R packages. Currently, we can crosswalk OTN data over to the package VTrack. Here’s an example:
?convert_otn_to_att
# FACT's tagging and deployment metadata sheet
tag_sheet_path <- 'TQCS_metadata_tagging.xlsx'
rcvr_sheet_path <- 'TEQ_Deployments.xlsx'
# Load the data from the tagging sheet and the receiver sheet
tags <- prepare_tag_sheet(tag_sheet_path, sheet=2)
receivers <- prepare_deploy_sheet(rcvr_sheet_path)
# Add columns missing from FACT extracts
detections_filtered['sensorvalue'] = NA
detections_filtered['sensorunit'] = NA
# Rename the station names in receivers to match station names in detections
receivers <- receivers %>% mutate(station=substring(station, 4))
ATTdata <- convert_otn_to_att(detections_filtered, tags, deploymentSheet = receivers)
# ATT is split into 3 objects, we can view them like this
ATTdata$Tag.Detections
ATTdata$Tag.Metadata
ATTdata$Station.Information
And then you can use your data with the VTrack package. You can call its abacusPlot function to generate an abacus plot:
# Now that we have an ATT dataframe, we can use it in VTrack functions:
# Abacus plot:
VTrack::abacusPlot(ATTdata)
To use the spatial features of VTrack, we have to give the ATT object a coordinate system to use.
# If you're going to do spatial things in ATT:
library(rgdal)
# Tell the ATT dataframe its coordinates are in decimal lat/lon
proj <- CRS("+init=epsg:4326")
attr(ATTdata, "CRS") <-proj
Here’s an example of the Centers of Activity function from VTrack.
?COA
coa <- VTrack::COA(ATTdata)
coa
GLATOS also includes tools for planning receiver arrays, simulating fish moving in an array, and some nice visualizations (which we will cover in the next episode).
Key Points
Basic Visualization and Plotting
Overview
Teaching: 30 min
Exercises: 0 min
Questions
How can I use GLATOS to plot my data?
What kinds of plots can I make with my data?
Objectives
We can use GLATOS to quickly and effectively visualize our data, now that we’ve cleaned it up.
One of the simplest ways is to use an abacus plot to display animal detections against the appropriate stations.
# Visualizing Data - Abacus Plots ####
# ?glatos::abacus_plot
# customizable version of the standard VUE-derived abacus plots
abacus_plot(detections_w_events,
location_col='station',
main='TQCS Detections By Station') # can use plot() variables here, they get passed thru to plot()
This is good, but cluttered. We can also filter out a single animal ID and plot only the abacus plot for that.
# pick a single fish to plot
abacus_plot(detections_filtered[detections_filtered$animal_id== "TQCS-1049258-2008-02-14",],
location_col='station',
main="TQCS-1049258-2008-02-14 Detections By Station")
If we want to see actual physical distribution, a bubble plot will serve us better.
Before we can plot this data properly, we need to download a shapefile of Florida. This will give us a map on which we can plot our data. We can get a suitable shapefile for Florida from GADM, the Global Administrative boundaries reference. The following code will retrieve first the country, then the province/state:
library(raster)
library(sp)
USA <- getData('GADM', country="USA", level=1)
FL <- USA[USA$NAME_1=="Florida",]
With the map generated, we can pass it to the bubble plot and see the results.
# Bubble Plots for Spatial Distribution of Fish ####
# bubble variable gets the summary data that was created to make the plot
detections_filtered
bubble <- detection_bubble_plot(detections_filtered,
out_file = '../tqcs_bubble.png',
location_col = 'station',
map = FL,
col_grad=c('white', 'green'),
background_xlim = c(-81, -80),
background_ylim = c(26, 28))
Key Points
Preparing FACT/OTN/GLATOS Data for actel
Overview
Teaching: 15 min
Exercises: 0 min
Questions
How do I take my glatos data and format it for actel?
Objectives
Preparing our data to use in Actel
Up next, we’re going to be learning about Actel, a new player in the acoustic telemetry data analysis ecosystem. We’ve got the package author coming up next to tell you all about it, so let’s quickly look at how we can take the data we have been working with from FACT (and any OTN/GLATOS style data) and make it ready for Actel.
# Using FACT/OTN/GLATOS-style data in Actel ####
# install.packages('actel') # CRAN Version 1.2.0
# Or the development version:
# remotes::install_github("hugomflavio/actel", build_opts = c("--no-resave-data", "--no-manual"), build_vignettes = TRUE)
library(actel)
library(stringr)
# Hugo has created within Actel a preload() function for folks who are holding their deployment, tagging, and detection data in R variables already. This function expects 4 input objects, similar to VTrack's 3 objects, plus a 'spatial' data object that will help us describe the places we are able to detect animals and how the animals are allowed to move between them.
# But it wants a bit more data than VTrack did, so we're going to have to go back to our deployment metadata sheet and reload it:
full_receiver_meta <- readxl::read_excel(rcvr_sheet_path, sheet=1, skip=0) %>%
dplyr::rename(
deploy_lat = DEPLOY_LAT,
deploy_long = DEPLOY_LONG,
ins_model_no = INS_MODEL_NO,
deploy_date_time = `DEPLOY_DATE_TIME (yyyy-mm-ddThh:mm:ss)`,
recover_date_time = `RECOVER_DATE_TIME (yyyy-mm-ddThh:mm:ss)`,
) %>%
dplyr::mutate(
station = paste(OTN_ARRAY, STATION_NO, sep = '')
)
We rename a few columns from the receiver metadata sheet so that they are in a nicer format. We also create a ‘station’ column that is array_code + station_name, guaranteed unique for any project across the entire Network.
Formatting - Tagging and Deployment Data
Tagging metadata is entered into Actel as biometrics, and deployment metadata as deployments. Each needs a few specially named columns, and a properly formatted date.
# All dates will be supplied to Actel in this format:
actel_datefmt = '%Y-%m-%d %H:%M:%S'
# biometrics is the tag metadata. If you have a tag metadata sheet, it looks like this:
actel_biometrics <- tags %>% mutate(Release.date = format(time, actel_datefmt),
Signal=as.integer(TAG_ID_CODE),
Release.site = RELEASE_LOCATION)
# deployments is based in the receiver deployment metadata sheet
actel_deployments <- full_receiver_meta %>% filter(!is.na(recover_date_time)) %>%
mutate(Station.name = station,
Start = format(deploy_date_time, actel_datefmt), # no time data for these deployments
Stop = format(recover_date_time, actel_datefmt), # not uncommon for this region
Receiver = INS_SERIAL_NO) %>%
arrange(Receiver, Start)
Detections
For detections, a few columns need to exist: Transmitter holds the full transmitter ID, Receiver holds the receiver serial number, Timestamp has the detection times, and we use a couple of Actel functions to split CodeSpace and Signal from the full transmitter_id.
# Renaming some columns in the Detection extract files
actel_dets <- detections %>% mutate(Transmitter = transmitter_id,
Receiver = as.integer(receiver_sn),
Timestamp = format(detection_timestamp_utc, actel_datefmt),
CodeSpace = extractCodeSpaces(transmitter_id),
Signal = extractSignals(transmitter_id))
Creating the Spatial dataframe
# Spatial is all release locations and all receiver deployment locations.
# Basically, every distinct location we can say we know an animal has been.
actel_receivers <- full_receiver_meta %>% mutate( Station.name = station,
Latitude = deploy_lat,
Longitude = deploy_long,
Type='Hydrophone') %>%
mutate(Array=OTN_ARRAY) %>% # Having this many distinct arrays breaks things with few clues as to why.
dplyr::select(Station.name, Latitude, Longitude, Array, Type) %>%
distinct(Station.name, Latitude, Longitude, Array, Type)
actel_tag_releases <- tags %>% mutate(Station.name = RELEASE_LOCATION,
Latitude = latitude,
Longitude = longitude,
Type='Release') %>%
mutate(Array = 'TEQ') %>% # released by TEQ, TEQ is 'first array'
distinct(Station.name, Latitude, Longitude, Array, Type)
# Bind the releases and the deployments together for the unique set of spatial locations
actel_spatial <- actel_receivers %>% bind_rows(actel_tag_releases)
# Now, for stations that are named the same, take an average location.
actel_spatial_sum <- actel_spatial %>% group_by(Station.name, Type) %>%
dplyr::summarize(Latitude = mean(Latitude),
Longitude = mean(Longitude),
Array = first(Array))
Creating the Actel data object w/ preload()
Now you have everything you need to call preload().
# Specify the timezone that your timestamps are in.
# OTN provides them in GMT.
# FACT has both UTC/GMT and Eastern
tz <- "GMT0"
# Then you can create the Actel project object.
actel_project <- preload(biometrics = actel_biometrics,
spatial = actel_spatial_sum,
deployments = actel_deployments,
detections = actel_dets,
tz = tz)
There will be some issues with the data that the Actel checkers find, such as detections outside the deployment time bounds and receivers that aren't in your metadata. For the purposes of today, we will drop those rows from the final copy of the data, but you can take these prompts as cues to verify that your input metadata is accurate and complete.
# Once you have an Actel object, you can run things like explore to generate the summary reports you're about to see:
actel_explore_output <- explore(actel_project, tz=tz, report=TRUE, print.releases=FALSE)
See more on what you can do with this output coming up next!
Key Points
Introduction to actel
Overview
Teaching: 45 min
Exercises: 0 min
Questions
Objectives
actel is designed for studies where animals tagged with acoustic tags are expected to move through receiver arrays. actel combines the advantages of automatic sorting and checking of animal movements with the possibility for user intervention on tags that deviate from expected behaviour. The three analysis functions (explore, migration, and residency) allow users to analyse their data in a systematic way, making it easy to compare results from different studies.
(Speaker: Dr. Hugo Flavio, hflavio@wlu.ca)
Exploring
library("actel")
# The first thing you want to do when you try out a package is...
# explore the documentation!
# See the package level documentation:
?actel
# See the manual:
browseVignettes("actel")
# access the paper:
citation("actel")
# Finally, every function in actel contains detailed documentation
# of the function's purpose and parameters. You can access this
# documentation by typing a question mark before the function name.
# e.g.: ?explore
Example data exercise
# Start by checking which directory you are working in (it is always good to know this)
getwd()
# We will deploy actel's example files into a new folder, called "actel_example".
# exampleWorkspace() will provide you with some information about how to run the example analysis.
exampleWorkspace("actel_example")
# Side note: When preparing your own data, you can create template files
# with the function createWorkspace("directory_name")
# Take a minute to explore the folder contents. You will find the files that were presented earlier.
# -----------------------
# If you read the information provided by exampleWorkspace, you will find these two commands:
# move into the newly created folder
setwd('actel_example')
# Run analysis. Note: This will open an analysis report on your web browser.
exp.results <- explore(tz = 'Europe/Copenhagen', report = TRUE)
# Because this is an example dataset, this analysis will run very smoothly.
# Real data is not always this nice to us!
# ----------
# IF your analysis failed while compiling the report, you can load
# the saved results back in using the dataToList() function:
exp.results <- dataToList("actel_explore_results.RData")
# IF your analysis failed before you had a chance to save the results,
# load the pre-compiled results, so you can keep up with the workshop.
# Remember to change the path so R can find the RData file.
exp.results <- dataToList("pre-compiled_results.RData")
# ----------
# -----------------------
# What is inside the output?
names(exp.results)
# What is inside the valid movements?
names(exp.results$valid.movements)
# let's have a look at the first one:
exp.results$valid.movements[["R64K-4451"]]
# and here are the respective valid detections:
exp.results$valid.detections[["R64K-4451"]]
# We can use these results to obtain our own plots (We will go into that later)
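As a taste of those custom plots, here is a minimal sketch built from the valid detections of one tag; the Timestamp and Array column names are assumptions, so confirm them with names() on your own output first.
# Sketch of a custom plot from the explore() output.
# NOTE: the Timestamp and Array column names are assumed here - run
# names(exp.results$valid.detections[["R64K-4451"]]) to confirm them on your data.
library(ggplot2)
ggplot(exp.results$valid.detections[["R64K-4451"]],
       aes(x = Timestamp, y = Array)) +
  geom_point() +
  ggtitle("Valid detections for tag R64K-4451")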
Distances matrix exercise
# Let's load the spatial file individually, so we can have a look at it.
spatial <- loadSpatial()
head(spatial)
# When doing the following steps, it is imperative that the coordinate reference
# system (CRS) of the shapefile and of the points in the spatial file are the same.
# In this case, the values in columns "x" and "y" are already in the right CRS.
# loadShape will rasterize the input shape, using the "size" argument as a reference
# for the pixel size. Note: The units of the "size" will be the same as the units
# of the shapefile projection (i.e. metres for metric projections, and degrees for latlong systems)
#
# In this case, we are using a metric system, so we are saying that we want the pixel
# size to be 10 metres.
#
# NOTE: Change the 'path' to the folder where you have the shape file.
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
water <- loadShape(path = "replace/with/path/to/shapefile",
shape = "stora_shape_epsg32632.shp", size = 10,
coord.x = "x", coord.y = "y")
# The function above can run without the coord.x and coord.y arguments. However, by including them,
# you are allowing actel to load the spatial.csv file on the fly and check if the spatial points
# (i.e. hydrophone stations and release sites) are positioned in water. This is very important,
# as any point positioned on land will be cut off during distance calculations.
# Now we need to create a transition layer, which R will use to estimate the distances
tl <- transitionLayer(water)
# We are ready to try it out! distancesMatrix will automatically search for a "spatial.csv"
# file in the current directory, so remember to keep that file up to date!
dist.mat <- distancesMatrix(tl, coord.x = "x", coord.y = "y")
# have a look at it:
dist.mat
migration and residency
# Let's go ahead and try running migration() and residency() on this dataset.
mig.results <- migration(tz = 'Europe/Copenhagen', report = TRUE)
# Now try copy-pasting the next five lines as a block and run it all at once.
res.results <- residency(tz = 'Europe/Copenhagen', report = TRUE)
comment
This is a lovely fish
n
y
# R will know to answer each of the questions that pop up during the analysis
# with the lines you copy-pasted together with your code!
# explore the reports to see what's new!
# Note: There is a known bug in residency() as of actel 1.2.0, which for some datasets
# will cause a crash with the following error message:
#
# Error in tableInteraction(moves = secmoves, tag = tag, trigger = the.warning, :
# argument "save.tables.locally" is missing, with no default
#
# This has already been corrected in development and a fix will be released in actel 1.2.1.
# In the meantime, if you come across this error, get in contact with me and I will guide
# you through how to install the development version.
For home: Transforming the results
# Try some of the stuff in this manual page!
vignette("f-0_post_functions", "actel")
Key Points
Introduction to Boosted Regression Trees
Overview
Teaching: 45 min
Exercises: 0 min
Questions
Objectives
Overview
- BRT Basics
- A Brief (Likely Incomplete) History of BRT Analysis in Shark Research
- Using BRTs with Acoustic Telemetry Data
- BRT Analysis in R using gbm.auto
- Case Study (and Shameless Paper Promotion): NW Atlantic Dusky Sharks
(Speaker: Dr. Charles Bangley, Charles.Bangley@dal.ca)
Key Points
Other OTN Telemetry Curriculums
Overview
Teaching: 0 min
Exercises: 0 min
Questions
How can I keep expanding my learning?
Objectives
OTN has hosted other workshops in the past which contain different code sets that may be useful to explore after this workshop.
- IdeasOTN Telemetry Workshop Series 2020: code available here and videos available on our YouTube here
- SPG workshop 2020. Videos will soon be posted on our YouTube.
Many of our Intro to R workshops are based upon this curriculum from The Carpentries
Key Points