1 Getting started

Welcome to the statistics tutorial for datasets typically encountered in the Davidson Lab, School of Biological Sciences, University of East Anglia. This guide will walk you through selecting the appropriate statistical model for your data, formulating the model syntax and variables, testing model assumptions, improving model fit, and performing “model selection.” We will also cover commonly used graphical representations for different types of data.

1.1 Install and load R package dependencies

# Install the required packages (if not already installed)
required_packages <- c("tidyverse", "lme4", "MASS", "nlme", "lmerTest", "MuMIn", "ggpubr", "performance")

# Install packages that aren't already installed
install_if_needed <- required_packages[!(required_packages %in% installed.packages()[,"Package"])]
if(length(install_if_needed)) install.packages(install_if_needed)

You can also install each package manually with install.packages("nameOfPackage").

# Load the packages after installation
library(tidyverse)   # A collection of R packages for data science
library(lme4)        # Provides tools for fitting linear and generalized linear mixed-effects models (GLMMs).
library(MASS)        # A large collection of statistical functions, with a focus on generalized linear models (GLMs).
library(nlme)        # Fits linear and nonlinear mixed-effects models (e.g. U-shaped, sigmoidal, exponential relationships); also provides model diagnostic tools.
library(lmerTest)    # Adds p-values to lme4 models
library(performance) # Checks for collinearity and other model diagnostics
library(MuMIn)       # Model selection & averaging
library(ggplot2)     # Data visualization
library(ggpubr)      # Enhanced ggplot2 functions

1.2 Troubleshooting install issues

If you have trouble installing packages, try the following:

  • Read the content of the error or warning message.
  • Does the package you are installing have dependencies? The error may say "package 'x' not found"; install that package using install.packages("nameofpackage").
  • Search the error online.
  • Look for alternative ways to install the package. Sometimes the package vignette has direct links you can use for installation, rather than using install.packages().

2 Workflow

3 Defining variables

Some terminology is used interchangeably; here is a breakdown.

Common Terms for Independent and Dependent Variables

  • Predictor: an independent variable used to explain or predict the dependent variable.
  • Outcome: the dependent variable, whose variation is being explained or predicted by the predictors.
  • Explanatory: similar to 'predictor'; a variable that explains or influences the dependent variable.
  • Response: the dependent variable in a study or experiment, often used in regression models.
  • Fixed effect: a variable considered to have a fixed influence on the dependent variable in a model, often contrasted with random effects.
Types of Response and Predictor Variables

  • Continuous (Gaussian): a variable that can take any value within a range, typically measured on a continuous scale (e.g., height, weight).
  • Count: a variable representing the number of occurrences of an event, typically non-negative integers (e.g., number of birds observed).
  • Binomial (success/failure): a variable with two possible outcomes (e.g., success/failure, yes/no), or proportion data derived from two possible outcomes (e.g., 4 out of 10 correct).
  • Categorical: a variable with categories or levels representing distinct groups or classes (e.g., species, treatment groups).

4 Choosing an appropriate statistical test

4.1 Regression models

LMs (Linear Models), GLMs (Generalized Linear Models), and GLMMs (Generalized Linear Mixed Models) are all part of a broad collection of regression models. Each of these models is used to examine relationships between dependent and independent variables, but they differ in their assumptions, data requirements, and flexibility.

4.1.1 Linear Models (LMs)

LMs are the simplest type of regression models. They model the relationship between a dependent variable (response) and one or more independent variables (predictors) using a linear equation.

The model can be written as: \[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \epsilon \]

Where:

  • \(Y\) is the dependent variable, specifically a continuous variable (i.e. a Gaussian distribution).
  • \(\beta_0\) is the intercept.
  • \(\beta_1, \beta_2, \dots\) are the coefficients for the independent variables \(X_1, X_2, \dots\).
  • \(\epsilon\) is the error term.

4.1.2 Linear Mixed Models (LMMs)

The Linear Mixed Model (LMM) is an extension of the linear model that includes random effects to account for correlations within grouped or hierarchical data. These might include repeated measures (i.e. multiple data points) collected from the same individuals, or from the same geographical site.

The model can be written as:

\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + b_0 + \epsilon \]

Where:

  • \(Y\) is the dependent variable, specifically a continuous variable (i.e. a Gaussian distribution).
  • \(\beta_0\) is the fixed intercept.
  • \(\beta_1, \beta_2, \dots\) are the fixed-effect coefficients for the independent variables \(X_1, X_2, \dots\).
  • \(b_0\) is the random intercept associated with each group, representing the deviation of each group's intercept from the overall intercept. It is assumed to follow a normal distribution with mean 0 and variance \(\sigma^2_b\), i.e., \(b_0 \sim \mathcal{N}(0, \sigma^2_b)\).
  • \(\epsilon \sim \mathcal{N}(0, \sigma^2)\) is the residual error term, assumed to be normally distributed with mean 0 and variance \(\sigma^2\).

What does it mean?

  • If modeling data with repeated measurements from subjects, \(b_0\) accounts for individual-specific variation in the intercept.
  • The fixed effects (\(\beta_0, \beta_1, \dots\)) represent the overall average relationship between predictors and the outcome, while the random effects capture variation at the group level.

This formulation allows the model to handle correlated data within groups, such as multiple measurements from the same subject or repeated observations from different locations.

4.1.3 Generalised Linear Models (GLMs)

GLMs extend linear models by allowing for response variables that have distributions other than the normal distribution (e.g., binomial, Poisson).

They introduce the concept of a link function to model the relationship between the predictors and the mean of the response variable.

GLMs are used when the response variable is categorical (e.g., logistic regression for binary outcomes) or count data (e.g., Poisson regression).

The model can be written as: \[ g(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots \]

Where:

  • \(\mu\) is the expected value of the dependent variable \(Y\), which can follow distributions such as binomial (e.g., for logistic regression) or Poisson (e.g., for count data).
  • \(g\) is the link function, relating the expected value of \(Y\) to the linear predictor (e.g., the logit link for binomial data, the log link for count data).
  • \(\beta_0\) is the intercept.
  • \(\beta_1, \beta_2, \dots\) are the coefficients for the independent variables \(X_1, X_2, \dots\).
  • The error structure is determined by the chosen response distribution rather than by an additive normal error term.

4.1.4 Generalised Linear Mixed Models (GLMMs)

The Generalized Linear Mixed Model (GLMM) extends the Generalized Linear Model (GLM) by incorporating random effects, which allow for hierarchical or grouped data structures.

The model can be written as:

\[ g(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + b_0 \]

Where:

  • \(\mu\) is the expected value of the dependent variable \(Y\), which can follow distributions such as binomial (e.g., for logistic regression) or Poisson (e.g., for count data).
  • \(g\) is the link function; its inverse, \(g^{-1}\), maps the linear predictor back to the expected value of \(Y\).
  • \(\beta_0\) is the fixed intercept.
  • \(\beta_1, \beta_2, \dots\) are the fixed-effect coefficients for the independent variables \(X_1, X_2, \dots\).
  • \(b_0 \sim \mathcal{N}(0, \sigma^2_b)\) is the random intercept, assumed to follow a normal distribution with mean 0 and variance \(\sigma^2_b\).
  • Residual variation follows the chosen response distribution (e.g., binomial or Poisson) rather than an additive normal error term.

What does it mean?

  • If modeling repeated measures, \(b_0\) represents variation in the intercept between individuals.
  • If modeling data from multiple locations, \(b_0\) accounts for location-specific differences.

This formulation allows the model to capture dependencies within grouped data while retaining the flexibility of a generalized linear framework.


4.1.5 Selecting the appropriate regression model for your data

Simply put, follow the workflow (Figure 1) to select the right regression model for your data.

Figure 1. Regression model selection workflow

4.1.6 Model Assumptions for LM, LMM, GLM, and GLMM

Model assumptions by model type (Homoscedasticity (Homogeneity of Variance) / Normality of Residuals / Linearity / Independence of Observations or Errors):

  • Linear Model (LM): Yes / Yes / Yes / Yes. Homoscedasticity assumes the variance of the residuals is constant across all levels of the predictors. Normality assumes the residuals are normally distributed. Linearity assumes a straight-line relationship between predictors and the response variable. Independence assumes observations (or errors) are not correlated with each other.
  • Linear Mixed Model (LMM): Yes / Yes / Yes / Yes. Homoscedasticity applies to residuals after accounting for random effects. Normality applies to both residuals and random effects. The linearity and independence assumptions are similar to those of the LM.
  • Generalized Linear Model (GLM): No / No / Yes / Yes. GLMs do not assume homoscedasticity; the variance is determined by the distribution of the response variable. Normality is not assumed, as the response follows the appropriate distribution. Linearity still applies in terms of the linear predictor. Independence of observations is assumed.
  • Generalized Linear Mixed Model (GLMM): No / No / Yes / No. GLMMs do not assume homoscedasticity or normality. The variance depends on the response distribution, and errors follow the chosen family (e.g., Poisson, binomial). Linearity applies to the linear predictor, but observations are not assumed independent because random effects model within-group correlation.

Summary of assumptions

Homoscedasticity (Homogeneity of Variance): This assumption states that the variance of the residuals should be constant across all levels of the independent variable(s). In the context of linear and linear mixed models, the residuals should exhibit consistent spread across the predictor values.

Normality of Residuals: This assumption holds that the residuals (or errors) from the model should follow a normal distribution. For linear models, this is important for valid hypothesis testing and constructing confidence intervals. In mixed models, it applies to both residuals and random effects.

Linearity: This assumption states that there is a linear relationship between the independent variable(s) and the dependent variable. For linear models, the effect of each predictor on the outcome is assumed to be linear, meaning a unit change in the predictor corresponds to a fixed change in the outcome.

Independence of Observations/Errors: This assumption holds that the residuals (errors) are independent of one another. In linear and mixed models, this means that one observation (or error) is not correlated with another, ensuring the independence of the data points used in the model.

4.2 Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a dimensionality reduction technique used to simplify large datasets by transforming the data into a smaller set of uncorrelated variables called principal components. These components explain the variance in the original data, with the first principal component explaining the greatest variance.

For example, imagine you have a dataset containing scores from an intelligence test, where multiple measures (such as memory, reasoning, problem-solving, etc.) are recorded. These measures may be correlated, as they all assess aspects of cognitive ability. PCA can be used to combine these correlated variables into a smaller number of principal components, capturing the main variation in the data without losing significant information.

The PCA model can be written as:

\[ \mathbf{X} = \mathbf{T} \mathbf{P}^T + \mathbf{E} \]

Where:

  • \(\mathbf{X}\) is the original data matrix (with variables as columns and observations as rows).
  • \(\mathbf{T}\) is the scores matrix, where each row represents the transformed data in terms of the principal components.
  • \(\mathbf{P}\) is the loadings matrix, containing the eigenvectors (directions) of the original data that correspond to the principal components.
  • \(\mathbf{E}\) is the residual matrix, representing the part of the original data that cannot be explained by the principal components.

What does it mean?

  • PCA identifies the axes (principal components) that maximize the variance in the data.
  • The first principal component \(\mathbf{t}_1\) explains the greatest variance, while subsequent components \(\mathbf{t}_2, \mathbf{t}_3, \dots\) explain progressively less.
  • The loadings matrix \(\mathbf{P}\) shows how each original variable contributes to the principal components.

PCA is commonly used for:

  • Data reduction, where a small number of principal components represent the data with minimal loss of information.
  • Visualisation of high-dimensional data, typically in 2D or 3D plots, using the first two or three principal components.
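
To see these quantities in R, here is a minimal sketch using the built-in prcomp() function on a small set of hypothetical cognitive test scores (the cog_scores data frame and its column names are invented purely for illustration).

# Hypothetical cognitive test scores (invented data, for illustration only)
set.seed(1)
cog_scores <- data.frame(memory = rnorm(50), reasoning = rnorm(50), problem_solving = rnorm(50))

# Run PCA, centring and scaling the variables first
pca_fit <- prcomp(cog_scores, center = TRUE, scale. = TRUE)

summary(pca_fit)   # proportion of variance explained by each principal component
pca_fit$rotation   # loadings (P): how each variable contributes to each component
head(pca_fit$x)    # scores (T): the observations expressed on the principal components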


5 Regression models in R

5.1 Input data format

Your dataframe should be in long format (Table 1) where:

Each row represents a single observation (i.e., the unit of analysis).
Columns contain variables, including:
Response variable (e.g., count, presence/absence, continuous measure).
Fixed effects (e.g., treatment, environmental variables).
Random effects (e.g., subject ID, site, time point).

Categorical variables should be coded appropriately (e.g., as factors or dummy variables). A categorical variable has 2 or more categories, also known as levels. In the types of regression taught in this tutorial, the functions automatically define a reference level alphabetically. Therefore, for a variable like age with levels adult and juvenile, the reference category will be adult, and the output will describe juvenile relative to adult (see the example after this list for checking or changing the reference level).
Numeric variables should be scaled if necessary for model convergence.
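
As a minimal sketch of checking the reference level and scaling a predictor (your_data, age and body_size are placeholder names; substitute your own dataframe and columns):

your_data$age <- factor(your_data$age)
levels(your_data$age)                                       # the first level listed is the reference
your_data$age <- relevel(your_data$age, ref = "juvenile")   # change the reference level if needed
your_data$body_size_scaled <- scale(your_data$body_size)    # centre and scale a numeric predictor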

Example of long-format data for regression models
Subject_ID Time Treatment Response Site
A 1 Control 5.2 X
A 2 Control 6.1 X
B 1 Treatment 8.3 Y
B 2 Treatment 7.9 Y

5.2 Basic model code syntax

The R package, the function and its syntax (i.e. the way you write the model formula) depend on the type of model you are running.

LM syntax

model <- lm(response ~ predictor1 + predictor2, data = your_data)

LMM syntax

model <- lmer(response ~ predictor1 + (1|group), data = your_data) # lmer() assumes a gaussian response, so no family argument is needed

GLM syntax

model <- glm(response ~ predictor1 + predictor2, family = binomial, data = your_data)

model <- glm(response ~ predictor1 + predictor2, family = poisson, data = your_data)

GLMM syntax

model <- glmer(response ~ predictor1 + predictor2 + (1 | group), family = binomial, data = your_data)

Mixed models can also be run using a slightly different package, nlme. Its lme() function fits linear mixed models (rather than GLMMs), can handle correlation and non-linear structures, and reports AIC and BIC in its summary output:

model <- lme(response ~ predictor1 + predictor2, random = ~ 1 | group, data = your_data)

where response is your dependent variable; predictor1 and predictor2 are your independent variables, also known as fixed effects; and group is the grouping variable for your repeated measures, such as individual or site, also known as a random effect.

5.3 Including interaction terms in your model code syntax

Interactions in statistical models allow you to explore how the relationship between one predictor and the outcome changes depending on the value of another predictor.

Instead of simply including the main fixed effects, response ~ predictor1 + predictor2, you can test an interaction between fixed effects by replacing the + with a *: response ~ predictor1*predictor2 (the equivalent formulas are shown below).
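
In R formula syntax, the * operator is shorthand for both main effects plus their interaction, so these two formulas fit the same model:

# '*' expands to both main effects plus the interaction term
response ~ predictor1 * predictor2
response ~ predictor1 + predictor2 + predictor1:predictor2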

Example: Imagine you measured the cognition of birds of different sexes and ages; including an interaction in your model allows you to explore how the relationship between age and cognition changes depending on the bird's sex.

For example, age may positively affect cognition in males but not in females: older males may have better cognition than younger males, while older females might have lower cognition than younger females, perhaps due to differences in reproductive energy allocation or other biological factors.

Ignoring the interaction could lead to incorrect inferences. For instance, a model without the interaction term might incorrectly assume that age affects males and females the same way.

That being said, be sure to only include interaction terms if you have an a priori reason to test the interaction, based on existing literature, ecological theory, or predictions arising from your own observations.


5.4 Example dataset and models

We will perform some example analyses in R. They are a guide and starting point and do not cover all model types, but the concepts are the same. You will need to apply these concepts flexibly, according to the type of model that is appropriate for your data, and the variables included. Your dataframe, variable names and number of variables and interactions will likely differ, and will need to be substituted according to the syntax described above.

We begin by creating some mock data. When it comes to running your own analyses, you would load your dataframe into your R global environment and give it a name; it will then be available to call within the regression function by specifying that name. In the example here, we create a dataframe called bird_data.

set.seed(123)  # Ensures the random code is reproducible 

# Number of rows
n <- 100

# Create mock data
bird_data <- data.frame(
  # Randomly assign age (adult or juvenile)
  age = factor(sample(c("adult", "juvenile"), n, replace = TRUE)),
  
  # Randomly assign sex (male or female)
  sex = factor(sample(c("male", "female"), n, replace = TRUE)),
  
  # Randomly assign site (3 different sites)
  site = factor(sample(1:3, n, replace = TRUE)),
  
  # Generate continuous body_size data (e.g., normally distributed)
  body_size = rnorm(n, mean = 50, sd = 10)  # Mean = 50, SD = 10
)

# Now, generate cognition based on the predictors (age, sex, body_size) and their interaction
# We'll add some noise with rnorm() to simulate data
bird_data$cognition <- with(bird_data, 
                            10 + 3*(age == "juvenile") + 2*(sex == "male") + 
                              1.5*(age == "juvenile" & sex == "male") + 0.05*body_size + 
                              rnorm(n, mean = 0, sd = 5))  # Add random noise
# Check the first few rows of the dataset
head(bird_data)
##        age    sex site body_size cognition
## 1    adult   male    2  63.72614  12.60574
## 2    adult female    2  44.35730  14.66327
## 3    adult female    1  59.70311  17.49528
## 4 juvenile   male    2  49.81366  22.19250
## 5    adult female    3  53.62370  17.43723
## 6 juvenile female    3  70.11306  13.51004

5.4.1 LMM: Gaussian distribution with random effects

# Run the model
library(lme4) # For the lmer() function
library(lmerTest) # To generate p-values

# Fit the linear mixed model with age, sex, body_size, and their interaction
model <- lmer(cognition ~ age * sex + body_size + (1 | site), data = bird_data)

Generate the statistical output of your model

# Summary of the model
summary(model)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: cognition ~ age * sex + body_size + (1 | site)
##    Data: bird_data
## 
## REML criterion at convergence: 593.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.6894 -0.6091  0.1092  0.7675  2.4452 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  site     (Intercept)  0.12    0.3465  
##  Residual             23.77    4.8753  
## Number of obs: 100, groups:  site, 3
## 
## Fixed effects:
##                      Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)         12.989235   2.451018 84.689839   5.300 9.07e-07 ***
## agejuvenile          3.254914   1.342987 93.379681   2.424   0.0173 *  
## sexmale              1.979627   1.297545 94.157439   1.526   0.1304    
## body_size           -0.008942   0.045148 94.741097  -0.198   0.8434    
## agejuvenile:sexmale  0.706059   1.979490 94.197241   0.357   0.7221    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) agjvnl sexmal bdy_sz
## agejuvenile -0.274                     
## sexmale     -0.219  0.439              
## body_size   -0.930  0.044 -0.024       
## agjvnl:sxml  0.108 -0.675 -0.656  0.054

5.4.2 LMM model output and interpretation

The key details to pay attention to are the fixed effects table (Table 2) and the random effects table (Table 3).

Estimate:

The model's best guess at the effect size of each predictor. The intercept in a linear model represents the baseline category, which is determined by the reference levels of the categorical predictors. In this case, adult (age) and female (sex) are the baseline reference categories (i.e. what juveniles and males are compared to, respectively). The baseline category is determined by the model based on alphabetical order.
Intercept (12.99): The estimated cognition score for an adult female with body size = 0 (baseline category).
Age (juvenile) = 3.25: Juveniles have, on average, a 3.25-point higher cognition score than adults (when holding other factors constant). This is statistically significant.
Sex (male) = 1.98: Although males have a slightly higher cognition score than females, the effect is not statistically significant.
Body size = -0.0089: Body size does not predict cognition in this model.
Interaction (agejuvenile:sexmale) = 0.71: The additional effect of being both juvenile and male is not significant.
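
As a worked example using the estimates above, the predicted cognition score for a juvenile male with a body size of 50 is obtained by summing the relevant coefficients:

# Intercept + age effect + sex effect + interaction + body_size slope * 50
12.989235 + 3.254914 + 1.979627 + 0.706059 + (-0.008942 * 50)   # ~18.5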

Standard Error (SE):

Measures the uncertainty around each estimate. Larger SE means less confidence in the estimate.

Degrees of Freedom (df):

Degrees of freedom (df) determine how much information is available to estimate model parameters and conduct significance tests; they take into account the sample size and the number of fixed and random effects, and are used, together with the t-value, to calculate the p-value.

t-value:

Computed as Estimate / SE. Higher absolute t-values indicate stronger evidence that the effect is different from zero. The sign of the t-value indicates the direction of the relationship (positive or negative for continuous predictors; for categorical predictors, greater than or less than the reference category).
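
For example, the t-value reported for age (juvenile) in the output above can be reproduced by hand:

# t-value = Estimate / SE for the age (juvenile) term
3.254914 / 1.342987   # ~2.42, matching the summary output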

p-value:

The probability of obtaining an effect at least as large as the one observed if the true effect were zero. Conventional thresholds:

between 0.05 and 0.10: non-significant trend (.)

<0.05: Significant (*)

<0.01: Very significant (**)

<0.001: Highly significant (***)

Table 2: Fixed Effects from LMM
Term                    Estimate  Std. Error      df  t value  p value
(Intercept)               12.990       2.450   84.69    5.300    0.000
age (juvenile)             3.250       1.340   93.38    2.420    0.017
sex (male)                 1.980       1.300   94.16    1.530    0.130
body size                 -0.009       0.045   94.74   -0.198    0.843
age × sex interaction      0.710       1.980   94.20    0.357    0.722

The random effects table tells us how much variation is explained by the random effects and how much variation remains unexplained by the model.

Site (random intercept variance = 0.12): Suggests only a small amount of variation in cognition is due to site differences.

Residual variance (23.77): The amount of variability left unexplained by the model.

Table 3: Random Effects from LMM
Group      Name          Variance  Std. Dev.
Site       (Intercept)       0.12       0.35
Residual                    23.77       4.88
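
One way to express how little variation site explains is to compute the proportion of the total random-effect variance attributable to site (the intraclass correlation); the performance package's icc() function reports a similar quantity, but the hand calculation is enough here:

# Proportion of total variance attributable to site, from the random effects table above
0.12 / (0.12 + 23.77)   # ~0.005, i.e. site explains roughly 0.5% of the variance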

5.4.3 GLMM: binomial distribution with random effects

# Create a binary variable for proportion of correct choices (e.g., cognition > 15 is correct, otherwise incorrect)
bird_data$correct_choice <- ifelse(bird_data$cognition > 15, 1, 0)

head(bird_data)
##        age    sex site body_size cognition correct_choice
## 1    adult   male    2  63.72614  12.60574              0
## 2    adult female    2  44.35730  14.66327              0
## 3    adult female    1  59.70311  17.49528              1
## 4 juvenile   male    2  49.81366  22.19250              1
## 5    adult female    3  53.62370  17.43723              1
## 6 juvenile female    3  70.11306  13.51004              0
# Run the model
# Fit a mixed-effects logistic regression model
model2 <- glmer(correct_choice ~ age * sex + body_size + (1 | site), family = binomial(link = "logit"), data = bird_data)
## boundary (singular) fit: see help('isSingular')

Generate the statistical output of your model

# Check the summary of the model
summary(model2)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: correct_choice ~ age * sex + body_size + (1 | site)
##    Data: bird_data
## 
##      AIC      BIC   logLik deviance df.resid 
##    136.0    151.7    -62.0    124.0       94 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.4928 -0.8295  0.4072  0.8681  1.4017 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  site   (Intercept) 0        0       
## Number of obs: 100, groups:  site, 3
## 
## Fixed effects:
##                      Estimate Std. Error z value Pr(>|z|)  
## (Intercept)         -0.270904   1.062068  -0.255   0.7987  
## agejuvenile          0.327316   0.564575   0.580   0.5621  
## sexmale              0.913902   0.547102   1.670   0.0948 .
## body_size           -0.006494   0.019771  -0.328   0.7425  
## agejuvenile:sexmale  1.074141   0.932849   1.151   0.2495  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) agjvnl sexmal bdy_sz
## agejuvenile -0.275                     
## sexmale     -0.206  0.455              
## body_size   -0.935  0.042 -0.039       
## agjvnl:sxml  0.100 -0.602 -0.587  0.046
## optimizer (Nelder_Mead) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

5.4.4 GLMM model output and interpretation

The model output is similar to that of an LMM, except that the test statistic is a z-value rather than a t-value, because exact degrees of freedom cannot be calculated for GLMMs. glmer() reports Wald p-values based on the z-value (it is lmer() models that need lmerTest to obtain p-values). As a rule of thumb, if the absolute value of the z-value (or indeed the t-value) is about 2.00 or greater, the p-value will usually be statistically significant.

In this example a warning about singular fit was returned. This is because no variance was explained by site in this model.

There is also a non-significant trend for males to make more correct choices. Also notable is that the interaction is not significant, which raises the question of whether we should remove it.

5.4.5 Interactions: to include or not to include?

Including non-significant terms increases noise. Models can be packed so full of terms and interactions that they are overfit and fail to converge. More parameters make the model more complex without improving accuracy, and unnecessary interactions can lead to higher standard errors and less precise estimates for other terms. Every added predictor consumes degrees of freedom, reducing statistical power for detecting real effects; removing non-significant interactions frees up degrees of freedom and improves estimates of the main effects. One school of thought is to adopt parsimony: the best model is the simplest one that adequately explains the data. This rationale applies not just to interactions, but also to main effects.

To demonstrate this, run the previous model without the interaction term

# Run the model
# Fit a mixed-effects logistic regression model without the interaction term
model3 <- glmer(correct_choice ~ age + sex + body_size + (1 | site), family = binomial(link = "logit"), data = bird_data)
## boundary (singular) fit: see help('isSingular')
summary(model3)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: correct_choice ~ age + sex + body_size + (1 | site)
##    Data: bird_data
## 
##      AIC      BIC   logLik deviance df.resid 
##    135.4    148.4    -62.7    125.4       95 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.0157 -0.9208  0.5050  0.7833  1.5523 
## 
## Random effects:
##  Groups Name        Variance  Std.Dev. 
##  site   (Intercept) 7.307e-16 2.703e-08
## Number of obs: 100, groups:  site, 3
## 
## Fixed effects:
##              Estimate Std. Error z value Pr(>|z|)   
## (Intercept) -0.401608   1.055870  -0.380  0.70368   
## agejuvenile  0.745049   0.438327   1.700  0.08918 . 
## sexmale      1.315685   0.434462   3.028  0.00246 **
## body_size   -0.007675   0.019685  -0.390  0.69661   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) agjvnl sexmal
## agejuvenile -0.278              
## sexmale     -0.183  0.104       
## body_size   -0.944  0.093 -0.014
## optimizer (Nelder_Mead) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

Without the interaction term, males have significantly higher cognition than females, and there is a trend for juveniles to have better cognition than adults.

5.5 Checking model assumptions

Model assumptions must be checked and reported in methods/results.

5.5.1 Normality and homogeneity of residuals

In this section, we will apply two common transformations to the response variable (cognition): the log transformation and the square root transformation. These transformations can help stabilize variance and improve model fit. We will then assess the residuals from the different models to determine which, if any, transformation fits best. To visualise normality of residuals, we assess which Q-Q plot shows data points that are most tightly aligned with the solid reference line. To visualise homogeneity of residuals, we look for the residuals-versus-fitted plot where the data points are most evenly and randomly scattered around zero (i.e. no clumping, funnelling or curvature).

# Apply log transformation
bird_data$cognition_log <- log(bird_data$cognition + 1)  # log(x+1) to avoid log(0)

# Apply square root transformation
bird_data$cognition_sqrt <- sqrt(bird_data$cognition)

# Fit models for raw, log-transformed, and square root-transformed data
model_raw <- lmer(cognition ~ age * sex + body_size + (1 | site), data = bird_data)
model_log <- lmer(cognition_log ~ age * sex + body_size + (1 | site), data = bird_data)
model_sqrt <- lmer(cognition_sqrt ~ age * sex + body_size + (1 | site), data = bird_data)

# Check residuals for each model
residuals_raw <- residuals(model_raw)
residuals_log <- residuals(model_log)
residuals_sqrt <- residuals(model_sqrt)

# Check normality of residuals using Q-Q plots
par(mfrow = c(1, 3))  # Set up a 1x3 layout for the plots
qqnorm(residuals_raw, main = "Raw Data")
qqline(residuals_raw)

qqnorm(residuals_log, main = "Log-transformed Data")
qqline(residuals_log)

qqnorm(residuals_sqrt, main = "Square Root-transformed Data")
qqline(residuals_sqrt)

# Reset layout
par(mfrow = c(1, 1))

# Plot residuals vs fitted values for homogeneity of variance
par(mfrow = c(1, 3))  # Set up a 1x3 layout for the plots
plot(fitted(model_raw), residuals_raw, main = "Raw Data", xlab = "Fitted Values", ylab = "Residuals")
abline(h = 0, col = "red")

plot(fitted(model_log), residuals_log, main = "Log-transformed Data", xlab = "Fitted Values", ylab = "Residuals")
abline(h = 0, col = "red")

plot(fitted(model_sqrt), residuals_sqrt, main = "Square Root-transformed Data", xlab = "Fitted Values", ylab = "Residuals")
abline(h = 0, col = "red")

# Reset layout
par(mfrow = c(1, 1))

Having reviewed these plots, the raw data provides the best fit, so there is no need to transform the response variable.

5.5.2 Collinearity

Next we will test for collinearity between predictor variables by calculating the Variance Inflation Factor (VIF). check_collinearity() evaluates the extent of multicollinearity between predictors in linear, generalised, and mixed models, flagging variables that may be highly correlated and affect model estimates.

VIF = 1: No correlation with other predictors.

1 < VIF < 5: Moderate correlation, generally considered acceptable.

VIF > 5: Indicates potentially problematic multicollinearity, suggesting that predictors are highly correlated.

VIF > 10: Very high multicollinearity, which can severely distort model estimates. You should consider removing or combining predictors (see PCA) to address this issue.

library(performance)

check_collinearity(model_raw)
## # Check for Multicollinearity
## 
## Low Correlation
## 
##       Term  VIF      VIF 95% CI Increased SE Tolerance Tolerance 95% CI
##        age 1.86 [1.48,    2.52]         1.36      0.54     [0.40, 0.67]
##        sex 1.76 [1.41,    2.38]         1.33      0.57     [0.42, 0.71]
##  body_size 1.02 [1.00, 3260.63]         1.01      0.98     [0.00, 1.00]
##    age:sex 2.63 [2.02,    3.62]         1.62      0.38     [0.28, 0.50]

The 95% confidence interval around the VIF estimate gives the range of VIF values consistent with sampling variability. If the "Increased SE" value is large, multicollinearity is inflating the standard error of that coefficient, which makes it harder to detect a true effect of the predictor. Tolerance is the inverse of VIF: a value close to 1 indicates little to no multicollinearity, while a tolerance below 0.1 suggests problematic multicollinearity. The output also gives the 95% CI around the tolerance.

Our data show low collinearity, so we can proceed with the analysis as is.


5.6 Model selection

There are many methods for simplifying models by removing predictor terms. This is known as “model selection”. The best model selection method depends on who you speak to, and current trends.

Summary of Model Selection Methods

  • LRT: best for hypothesis testing and mixed models. Advantages: compares model fit rigorously. Disadvantages: requires nested models; sensitive to sample size.
  • AIC: best for model selection and avoiding overfitting. Advantages: penalises complexity; allows non-nested comparisons. Disadvantages: no hypothesis testing; sensitive to small sample sizes.
  • p-values: best for a quick significance check. Advantages: easy to interpret; widely used. Disadvantages: ignores overall model fit; affected by sample size.

Recommendations:
Use LRT if testing a specific hypothesis about an effect.
Use AIC for overall model selection and avoiding overfitting.
Use p-values for quick checks, but don’t rely on them alone.

5.6.1 Likelihood Ratio Test (LRT)

The Likelihood Ratio Test (LRT) compares two nested models (one with and one without the term of interest) by evaluating whether removing the term significantly worsens model fit.

LRT Formula

\[ LRT\ statistic = -2 \times (\log \text{Likelihood of Reduced Model} - \log \text{Likelihood of Full Model}) \]

If the LRT statistic follows a χ² distribution, we can compute a p-value to determine if removing the term significantly worsens the model.

In the example below, the p-value is 0.72: removing the interaction does not significantly worsen the fit, so the full model is not significantly better than the reduced model. Following the principle of parsimony, we would retain the reduced (simpler) model.

library(lme4)
library(lmerTest)

# Full model (with interaction)
full_model <- lmer(cognition ~ age * sex + body_size + (1 | site), data = bird_data)

# Reduced model (without interaction)
reduced_model <- lmer(cognition ~ age + sex + body_size + (1 | site), data = bird_data)

# LRT comparison
anova(reduced_model, full_model)
## Data: bird_data
## Models:
## reduced_model: cognition ~ age + sex + body_size + (1 | site)
## full_model: cognition ~ age * sex + body_size + (1 | site)
##               npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## reduced_model    6 607.97 623.60 -297.98   595.97                     
## full_model       7 609.83 628.07 -297.92   595.83 0.1324  1     0.7159
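
As a sanity check, the LRT statistic and p-value in the table above can be reproduced by hand; note that anova() refits the models with maximum likelihood (ML), so we do the same here:

# Refit with ML and compute the LRT statistic and p-value by hand
full_ml    <- update(full_model, REML = FALSE)
reduced_ml <- update(reduced_model, REML = FALSE)
lrt_stat <- as.numeric(-2 * (logLik(reduced_ml) - logLik(full_ml)))
lrt_stat                                       # ~0.13
pchisq(lrt_stat, df = 1, lower.tail = FALSE)   # ~0.72, matching Pr(>Chisq) above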

5.6.2 Akaike Information Criterion (AIC) Comparison

The Akaike Information Criterion (AIC) balances model fit and complexity:

\[ AIC = -2 \times \log \text{Likelihood} + 2K \]

where \(K\) is the number of estimated parameters. Lower AIC values indicate better models.

Example in R:

AIC(full_model, reduced_model)
##               df      AIC
## full_model     7 607.0767
## reduced_model  6 608.4035
  • If ΔAIC < 2, the models are considered equivalent.
  • If ΔAIC > 2, the model with higher AIC is worse.
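
For the two models above, the difference can be computed directly:

# Difference in AIC between the reduced and full models
aic_tab <- AIC(full_model, reduced_model)
diff(aic_tab$AIC)   # ~1.3, i.e. less than 2, so the models are considered equivalent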

5.6.3 p-values to perform stepwise elimination

Start with a full model, then remove non-significant interactions first, followed by non-significant main effects (unless they are needed for interpretability). This approach used to be very popular, but is now rarely used.


5.7 AIC Model selection with model averaging

The order in which you remove terms can affect the outcome. A more thorough method for models with several terms is therefore to run every combination of terms (present, absent, or as interactions). Luckily, there are packages that automate this process and tell us which models have the lowest AIC values, and whether they differ by more or less than 2 units (a difference of at least 2 units is generally considered necessary for one model to be a better fit than another).

Global model: Also known as Full Model. One in which all terms and interactions are included.
Top model: The model with the lowest AIC score.
Averaged model: used when more than one model is at least 2 AIC units better than the remaining models but within 2 AIC units of one another. In other words, there are multiple "top models", so they are combined into a consensus (averaged) model.

library(lme4)
library(lmerTest)
library(MuMIn)  

# Full model (with interaction)
global_model <- glmer(correct_choice ~ age*sex + body_size + (1 | site), family = binomial(link = "logit"), data = bird_data, na.action=na.fail) #must include na.action=na.fail for dredge()
## boundary (singular) fit: see help('isSingular')
dd<-dredge(global_model, evaluate=TRUE, rank=AICc) # this uses AICc for ranking models; AICc corrects for small sample sizes
## Fixed term is "(Intercept)"
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
dd
## Global model call: glmer(formula = correct_choice ~ age * sex + body_size + (1 | 
##     site), data = bird_data, family = binomial(link = "logit"), 
##     na.action = na.fail)
## ---
## Model selection table 
##      (Int) age   bdy_siz sex age:sex df  logLik  AICc delta weight
## 6  -0.7912   +             +          4 -62.788 134.0  0.00  0.315
## 14 -0.5978   +             +       +  5 -62.075 134.8  0.79  0.212
## 5  -0.4520                 +          3 -64.353 135.0  0.96  0.195
## 8  -0.4016   + -0.007675   +          5 -62.712 136.1  2.07  0.112
## 7   0.1004     -0.011100   +          4 -64.189 136.8  2.80  0.078
## 16 -0.2709   + -0.006494   +       +  6 -62.021 136.9  2.95  0.072
## 2  -0.1759   +                        3 -67.672 141.6  7.60  0.007
## 1   0.1201                            2 -69.135 142.4  8.40  0.005
## 4   0.2117   + -0.007640              4 -67.589 143.6  9.60  0.003
## 3   0.6586     -0.010820              3 -68.963 144.2 10.18  0.002
## Models ranked by AICc(x) 
## Random terms (all models): 
##   1 | site

TIP: If there is a term you want to retain for biological or hypothesis-testing reasons, you can specify this using the subset argument of dredge():

dd<-dredge(global_model, subset= ~age, evaluate=TRUE, rank=AICc)

The output above shows us a model selection table. Each row represents a model variation, with a different combination of predictor variables (age, sex, body_size, and their interaction age:sex). The columns indicate:
(Int): the intercept estimate.
age, sex, body_size, age:sex: whether each predictor is included (a + or a coefficient estimate when included, blank when excluded).

We can subset these models to keep only the top models, i.e. those within 2 AICc units of the best model. Note that this can sometimes return just a single model, if no other model comes within 2 units of the best one.

In this case, there are three top models: one that includes age and sex as main effects only, one that includes age, sex and their interaction, and one that includes only sex.

ddAIC_S<-subset(dd, delta < 2)
ddAIC_S
## Global model call: glmer(formula = correct_choice ~ age * sex + body_size + (1 | 
##     site), data = bird_data, family = binomial(link = "logit"), 
##     na.action = na.fail)
## ---
## Model selection table 
##      (Int) age sex age:sex df  logLik  AICc delta weight
## 6  -0.7912   +   +          4 -62.788 134.0  0.00  0.436
## 14 -0.5978   +   +       +  5 -62.075 134.8  0.79  0.294
## 5  -0.4520       +          3 -64.353 135.0  0.96  0.270
## Models ranked by AICc(x) 
## Random terms (all models): 
##   1 | site

Next we perform model averaging, which combines these three top models, treated as equally well supported, into a single output that can be reported.

ddMAc<-model.avg(ddAIC_S, subset= delta <2)
summary(ddMAc)
## 
## Call:
## model.avg(object = ddAIC_S, subset = delta < 2)
## 
## Component model call: 
## glmer(formula = correct_choice ~ <3 unique rhs>, data = bird_data, 
##      family = binomial(link = "logit"), na.action = na.fail)
## 
## Component models: 
##     df logLik   AICc delta weight
## 12   4 -62.79 134.00  0.00   0.44
## 123  5 -62.08 134.79  0.79   0.29
## 2    3 -64.35 134.96  0.96   0.27
## 
## Term codes: 
##     age     sex age:sex 
##       1       2       3 
## 
## Model-averaged coefficients:  
## (full average) 
##                     Estimate Std. Error Adjusted SE z value Pr(>|z|)  
## (Intercept)          -0.6428     0.3680      0.3721   1.728   0.0840 .
## agejuvenile           0.4308     0.5262      0.5305   0.812   0.4167  
## sexmale               1.1858     0.5009      0.5065   2.341   0.0192 *
## agejuvenile:sexmale   0.3197     0.7075      0.7122   0.449   0.6535  
##  
## (conditional average) 
##                     Estimate Std. Error Adjusted SE z value Pr(>|z|)  
## (Intercept)          -0.6428     0.3680      0.3721   1.728   0.0840 .
## agejuvenile           0.5903     0.5340      0.5398   1.094   0.2742  
## sexmale               1.1858     0.5009      0.5065   2.341   0.0192 *
## agejuvenile:sexmale   1.0890     0.9314      0.9434   1.154   0.2484  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

5.7.1 Choosing between full and conditional model averaging

When using model averaging, you have two options for computing the estimates:

  1. Full Average: Includes all candidate models, even if a term is absent in some models. When a term is missing, it is treated as zero.
  2. Conditional Average: Only includes models where a term appears, ignoring models where the term is absent.

Key Differences and When to Use Each

  • Full Average: assigns a coefficient of zero in models where a predictor is missing, then averages across all candidate models. Pros: more conservative (reduces overestimation). Cons: can underestimate effects if a term is frequently excluded.
  • Conditional Average: averages only across the models in which the term appears. Pros: provides more accurate estimates for included terms. Cons: can overestimate effects if the term is weak but appears in only some models.

Which Should You Use?
If your goal is prediction or avoiding overestimation, use full averaging (safer, more conservative).
If you are interested in inference and reporting effects, use conditional averaging, as it provides a clearer effect size when a term is included.
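
As a sketch of how to extract the two versions from the averaged object created above (assuming MuMIn's coef() method for averaged models; see ?model.avg if the argument differs in your version):

coef(ddMAc, full = TRUE)    # full-average coefficients (absent terms counted as zero)
coef(ddMAc, full = FALSE)   # conditional-average coefficients (only models containing the term)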

5.8 Reporting your regression analyses

When concluding your analysis, it is important to clearly communicate the findings and rationale behind your model choice. Here’s what to include:

5.8.1 Model Selection

Report the best model based on AICc, ΔAICc, or any other model selection criteria. If multiple models are equally good (ΔAICc < 2), discuss model uncertainty and therefore the choice to perform model averaging. If terms are not included in the top model, report their test statistics as they appeared in the global model, but be clear that they were dropped from the top model(s).

5.8.2 Model Averaging

If you used model averaging, specify whether you used the full average or conditional average and explain the choice (based on whether you prioritize predictive accuracy or effect size estimation). Report the model-averaged coefficients, including the estimate, standard error, z-value, and p-value for each significant term. Ensure clarity by highlighting any predictors that were consistently significant across models.

5.8.3 Statistical Significance

Clearly indicate the significance of each predictor. Use standard significance thresholds (e.g., p < 0.05), report any terms that were significant, and discuss marginally non-significant terms (e.g., 0.05 < p < 0.1) as "trends". If an interaction term (e.g., age:sex) is significant, explain the implications of this interaction for understanding the relationship between the predictors and the response variable.

5.8.4 Other test statistics

Refer to Table 2 and Table 3 as a guide.

5.8.5 Model Assumptions

Discuss whether the assumptions of the model (normality of residuals, homogeneity of residuals, etc.) were met. If assumptions were violated, describe the steps you took to address them (e.g., log transformations). Typically this is reported in the methods, so as not to clog the results section, which should focus on the answers to your statistical tests around your hypothesis.

6 Generating figures and plots

This section covers three types of data plots: a linear regression plot for continuous response and predictor variables; a scatterplot with boxplots for categorical predictors; and a stacked barplot for binomial data across categorical predictors.

Your plots should reflect the data and analyses performed, and should be as transparent as possible by including individual data points, means/medians, and error bars.

The plots below use the bird_data dataframe generated in the previous sections.

6.1 Linear regression plot

Imagine we are interested in the effect of body size on cognition.

library(ggplot2)

ggplot(bird_data, aes(x=body_size, y=cognition)) + #specify the data, x and y variables
  geom_point(alpha = 0.7, size = 1.5 ) +  # Add scatterplot points, specify their size, and with slight transparency
  geom_smooth(method=lm, size = 0.5, colour = "black") +  # Add linear regression line with black colour 
  theme_bw() +   theme(panel.grid.major = element_blank(),    # Use a clean black-and-white theme,   # Remove major and minor grid lines, customise axis and border appearance
                       panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"),  
                       panel.border = element_rect(linetype = "solid", colour = "black", size=.8)) +
  theme(text=element_text(size=14,  family="sans"),   # Adjust text size and font, enhance axis ticks
        axis.ticks = element_line(colour = "black", size = .7)) +
    labs(x = "Body size (scale mass index)") +   # Set axis labels with meaningful descriptions
  labs(y = "Cognition (trials to criterion)") 

We can also depict regression lines for males and females separately by specifying color

library(ggplot2)

ggplot(bird_data, aes(x=body_size, y=cognition, color = sex)) + #specify the data, x and y variables
  geom_point(alpha = 0.7, size = 1.5 ) +  # Add scatterplot points, specify their size, and with slight transparency
  geom_smooth(method=lm, size = 0.8, se = FALSE) +  # Add linear regression lines for both sexes
  scale_color_manual(values = c("male" = "#56B4E9", "female" = "#CC79A7"),#consider colour-blind friendly colours
                     labels = c("Females", "Males")) + # Manually specify legend text
  theme_bw() +   theme(panel.grid.major = element_blank(),    # Use a clean black-and-white theme,   # Remove major and minor grid lines, customise axis and border appearance
                       panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"),  
                       panel.border = element_rect(linetype = "solid", colour = "black", size=.8)) +
  theme(text=element_text(size=14,  family="sans"),   # Adjust text size and font, enhance axis ticks
        axis.ticks = element_line(colour = "black", size = .7)) +
    labs(x = "Body size (scale mass index)") +   # Set axis labels with meaningful descriptions
  labs(y = "Cognition (trials to criterion)") 

6.1.1 Saving your plot

Use the following code to save as a high-quality .tiff file. This code will save to the current working directory. Adjust the width and height to suit your plot.

tiff(file = "NAME_OF_YOUR_FILE.tiff", width = 4, height = 4, units = 'in', res = 300)
print(YOUR_PLOT_OBJECT)  # or paste your full ggplot() code here
dev.off()
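
Alternatively, ggsave() from ggplot2 saves the most recently displayed plot (or a plot object passed to it); the file name here is a placeholder:

# Saves the last plot displayed; adjust width, height and dpi to suit
ggsave("NAME_OF_YOUR_FILE.tiff", width = 4, height = 4, units = "in", dpi = 300)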

6.2 Scatterplot boxplot

Imagine we are interested in the effect of age and sex on cognition. First, we calculate some summary statistics: the sample size, mean, standard error and confidence intervals.

library(dplyr)

summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE,
                      conf.interval=.95, .drop=TRUE) {
    library(plyr)

    # New version of length which can handle NA's: if na.rm==T, don't count them
    length2 <- function (x, na.rm=FALSE) {
        if (na.rm) sum(!is.na(x))
        else       length(x)
    }

    # This does the summary. For each group's data frame, return a vector with
    # N, mean, and sd
    datac <- ddply(data, groupvars, .drop=.drop,
      .fun = function(xx, col) {
        c(N    = length2(xx[[col]], na.rm=na.rm),
          mean = mean   (xx[[col]], na.rm=na.rm),
          sd   = sd     (xx[[col]], na.rm=na.rm)
        )
      },
      measurevar
    )

    # Rename the "mean" column    
    datac <- rename(datac, c("mean" = measurevar))

    datac$se <- datac$sd / sqrt(datac$N)  # Calculate standard error of the mean

    # Confidence interval multiplier for standard error
    # Calculate t-statistic for confidence interval: 
    # e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
    ciMult <- qt(conf.interval/2 + .5, datac$N-1)
    datac$ci <- datac$se * ciMult

    return(datac)
}

summary_stats <- summarySE(data = bird_data, measurevar = "cognition", groupvars = c("sex", "age"))

summary_stats
##      sex      age  N cognition       sd        se       ci
## 1 female    adult 31  12.50466 4.811956 0.8642527 1.765040
## 2 female juvenile 23  15.77239 5.562456 1.1598522 2.405386
## 3   male    adult 26  14.49989 5.033285 0.9871084 2.032988
## 4   male juvenile 20  18.48973 3.682931 0.8235284 1.723665

We overlay the mean and standard error on a boxplot of the raw data:

library(ggplot2)

ggplot(bird_data, aes(x = interaction(age, sex), y = cognition, color = age)) +
  geom_point(position = position_jitter(width = 0.2), alpha = 0.7, size = 1) +
  geom_boxplot(alpha = 0.8, outlier.shape = NA) + #outlier.shape = NA otherwise outliers are duplicated data points on plot
  geom_pointrange(data = summary_stats,
                  aes(x = interaction(age, sex), y = cognition, 
                      ymin = cognition - se, 
                      ymax = cognition + se,
                      color =age),
                  colour = "black", size = 0.6) +
  theme_bw() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.line = element_line(colour = "black"),
    panel.border = element_rect(linetype = "solid", colour = "black", linewidth = 0.8),
    legend.title = element_blank(),
    plot.title = element_text(hjust = 0.5, size = 20), # Adjust title size here
    axis.text.x = element_text(angle = 45, hjust = 0.95, vjust = 1.0),
    axis.text = element_text(size = 10),
    axis.title = element_text(size = 24),
    axis.title.x = element_text(size = 14),
    axis.title.y = element_text(size = 14)
  ) +
  labs(
    title = "Cognitive performance in birds",
    x = "",
    y = "Cognition (trials until criterion)"
  ) +
  guides(colour = guide_legend(override.aes = list(size = 16))) +
  scale_color_manual(values = c(
    "#00ABE6", "#EFDD9F", "purple", "#577E43"))

Now let's rename the x-axis tick labels:

library(ggplot2)

ggplot(bird_data, aes(x = interaction(age, sex), y = cognition, color = age)) +
  geom_point(position = position_jitter(width = 0.2), alpha = 0.7, size = 1) +
  geom_boxplot(alpha = 0.8, outlier.shape = NA) + #outlier.shape = NA otherwise outliers are duplicated data points on plot
  geom_pointrange(data = summary_stats,
                  aes(x = interaction(age, sex), y = cognition, 
                      ymin = cognition - se, 
                      ymax = cognition + se,
                      color =age),
                  colour = "black", size = 0.6) +
  theme_bw() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.line = element_line(colour = "black"),
    panel.border = element_rect(linetype = "solid", colour = "black", linewidth = 0.8),
    legend.title = element_blank(),
    plot.title = element_text(hjust = 0.5, size = 20), # Adjust title size here
    axis.text.x = element_text(angle = 45, hjust = 0.95, vjust = 1.0),
    axis.text = element_text(size = 10),
    axis.title = element_text(size = 24),
    axis.title.x = element_text(size = 14),
    axis.title.y = element_text(size = 14)
  ) + scale_x_discrete(labels = c("Female Adult", "Female Juvenile", "Male Adult", "Male Juvenile")) +
  labs(
    title = "Cognitive performance in birds",
    x = "Sex and Age",
    y = "Cognition (trials until criterion)"
  ) +
  guides(colour = guide_legend(override.aes = list(size = 16))) +
  scale_color_manual(values = c(
    "#00ABE6", "#EFDD9F", "purple", "#577E43"))

6.3 Stacked barplot

Stacked barplots are useful for binomial data.

ggplot(bird_data, aes(x = interaction(sex, age), fill = factor(correct_choice))) +
  geom_bar(position = "fill") +
  geom_text(aes(label = ..count..), stat = 'count', position = position_fill(vjust = 0.5)) +
  scale_x_discrete(labels = c("Female adult", "Female juvenile", "Male adult", "Male juvenile")) +
  theme_bw() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), 
        axis.line = element_line(colour = "black"), 
        panel.border = element_rect(linetype = "solid", colour = "black", size = 0.8)) +
  scale_fill_manual("Correct choice", values = c("1" = "gray49", "0" = "gray84")) +
  theme(text = element_text(size = 12, family = "sans"),
        axis.ticks = element_line(colour = "black", size = 0.6)) +
  labs(x = "Sex and Age", y = "Proportion of individuals")

Let's adjust the naming and legend values:

# Rename the values in the column from binary 1 and 0 into words so the plot is visually more appealing and publishable
bird_data$correct_choice[bird_data$correct_choice == '1'] <- 'Correct'
bird_data$correct_choice[bird_data$correct_choice == '0'] <- 'Incorrect'

ggplot(bird_data, aes(x = interaction(sex, age), fill = factor(correct_choice))) +
  geom_bar(position = "fill") +
  geom_text(aes(label = ..count..), stat = 'count', position = position_fill(vjust = 0.5)) +
  scale_x_discrete(labels = c("Female adult", "Female juvenile", "Male adult", "Male juvenile")) +
  theme_bw() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), 
        axis.line = element_line(colour = "black"), 
        panel.border = element_rect(linetype = "solid", colour = "black", size = 0.8)) +
  scale_fill_manual("Cognition choice", values = c("Correct" = "gray49", "Incorrect" = "gray84")) +
  theme(text = element_text(size = 12, family = "sans"),
        axis.ticks = element_line(colour = "black", size = 0.6)) +
  labs(x = "Sex and Age", y = "Proportion of individuals")

And without the interaction between age and sex.

# Rename the values in the column from binary 1 and 0 into words so it's visually more appealing and publishable
bird_data$correct_choice[bird_data$correct_choice == '1'] <- 'Correct'
bird_data$correct_choice[bird_data$correct_choice == '0'] <- 'Incorrect'

ggplot(bird_data, aes(x = sex, fill = factor(correct_choice))) +
  geom_bar(position = "fill") +
  geom_text(aes(label = ..count..), stat = 'count', position = position_fill(vjust = 0.5)) +
  scale_x_discrete(labels = c("Female", "Male")) +
  theme_bw() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), 
        axis.line = element_line(colour = "black"), 
        panel.border = element_rect(linetype = "solid", colour = "black", size = 0.8)) +
  scale_fill_manual("Cognition choice", values = c("Correct" = "gray49", "Incorrect" = "gray84")) +
  theme(text = element_text(size = 12, family = "sans"),
        axis.ticks = element_line(colour = "black", size = 0.6)) +
  labs(x = "Sex", y = "Proportion of individuals")