Data Manipulation Basics
Overview
Teaching: 20 min
Exercises: 0 minQuestions
How do I load data?
How do I clean data?
How can I inspect data?
Objectives
Explain data cleaning as a component of analysis.
Demonstrate read.csv() as a tool for loading data.
Explain the different kinds of files in which telemetry data is stored.
Use head, tail, str, and indexing to display loaded telemetry data.
Data Cleaning and Preprocessing
When analyzing data, 80% of time is spent cleaning and manipulating data and only 20% actually analyzing it. For this reason, it is critical to become familiar with the data cleaning process and getting your data into a format that can be analyzed. Let’s begin with reading in our data using the suite of tidyverse functions.
Reading in data can be done using the read_csv
function that automatically recognises data type in each column. The data is read in as a tibble
.
library(tidyverse)
# We need to define these as the first 1000 rows don't have any data so read_csv thinks they are logicals
col_specs <- cols(
sensor_value = col_character(),
sensor_unit = col_character(),
glatos_caught_date = col_date()
)
data <- read_csv("data/detections.csv", col_types = col_specs)
data
The read_csv
function outlines what each column was recognised as (e.g. double, integer, logical, date time). The function will also tell you which columns and rows it found difficult to recognise. In this case, the warnings specify that it was expecting the sensor_value
and sensor_unit
colums being a logical variable, but couldnt recognise the input.
# You can view the data you have just input using the `View()` function
View(data)
Load data. This enables collapsing blocks of code using the drop arrow on the left
Acoustic telemetry data are commonly stored in 3 different files:
- Detections
- Receiver deployment metadata
- Tag metadata
dets_file <- file.path('data', 'detections.csv')
rcv_file <- file.path('data', 'deployments.csv')
tags_file <- file.path('data', 'animal_tags.csv')
dets <- read_csv(dets_file, col_types = col_specs) #detections from acoustic receivers
Rxdeploy <- read_csv(rcv_file) #receiver station info
tags <- read_csv(tags_file) #tagged fish data
Check out the data (these are all data frames by default):
head(dets)
tail(dets)
str(dets)
dets[1:10,]
head(Rxdeploy)
head(tags)
Notice the variables and their data type (important - google data types in R if unfamiliar).
Clearly we need to combine the above 3 dataframes in various ways to do anything with these data let’s grease the wheels and check out fish tagging and receiver locations:
Key Points
read.csv() can be used to load data.
head, tail, and str() can be used to inspect data.