Most real-world ecological studies are characterized by imperfect
detectability, i. e. the inability to detect a species or taxon despite
its presence in a location. Imperfect detectability is a potential
source of bias that must be avoided or at least estimated, particularly
since it influences estimates of colonization and extinction.
Unfortunately, it is not always possible to avoid or estimate the
effects of imperfect detectability. We should be cautious in
interpreting estimates derived from the methods that assume perfect
detectability. However, when we have a replicated sampling design we can
account for detectability while estimating colonization and extinction
rates (MacKenzie *et al.* 2003).

MacKenzie (2003) presents a likelihood function to estimate site
occupancy, colonization, and local extinction when a species is detected
imperfectly. The method relies on replicate observations per sampling
time. The implementation of this likelihood is not trivial because there
might be several underlying colonization-extinction trajectories that
are compatible with the same observed detection history. For example, a
detection history such as \(\lbrace 101 \ 100
\rbrace\) means that in the first sampling time we have three
replicates, \(101\), where we detected
our hypothetical species twice, and a second sampling time, where we
observed \(100\), this is, we detected
the species only once. Since we detected it at least once at both time 0
and time 1, there is only one underlying colonization-extinction
trajectory compatible with it, which, we take the convention of
collapsing it into \(( 1 \ 1 )\).
However, imagine we fail to detect the species at time 1, being then our
detection history \(\lbrace 101 \ 000
\rbrace\). In this case, there are two underlying trajectories
that are both compatible with this observation, since the species could
have or could have not gone extinct at time 1. These are \(( 1 \ 1 )\) and \(( 1 \ 0 )\). Therefore, the probability of
the observed detection history \(\lbrace 101 \
000 \rbrace\) should sum over the two ways in which that
detection history could have been observed, either through the
trajectory \((1 \ 1 )\) or \((1 \ 0)\). For simplicity, let us analyse
first what is the probability for the observed detection history \(\lbrace 101 \ 100 \rbrace\). The first
sampling time always considers the probability of the species being
present at the site, \(P_0\), as the
fourth model parameter, and given that, the probability of making two
out of three possible detections, \(d^2·(1-d)\). The probability of being also
present at the time 1 given that the species was present at time 0 is
given by \(T_{11}\), and given that,
the probability of making only one out three possible detections is
\(d · (1-d)^2\), where \(d\) is the detectability *per*
replicate or probability of detecting a species when is present per
observation. Taking all together, this leads us to the following
probability for the full detection history: \[Pr(\lbrace 101 \ 100 \rbrace) = P_0 ·
d^2·(1-d)·T_{11} · d · (1-d)^2\] Now, let us examine the
detection history \(\lbrace 101 \ 000
\rbrace\). As mentioned, we have two possibilities for the second
sampling time: the species could be present and have not been detected
or could have been truly absent. Notice then that the probability of the
full detection history should sum over the two underlying
colonization-extinction histories, \(\lbrace 1
\ 1 \rbrace\) and \(\lbrace 1 \ 0
\rbrace\). It would be: \[Pr(\lbrace
101 \ 000 \rbrace) = P_0 · d^2·(1-d) · T_{11} · (1-d)^3 + P_0 ·
d^2·(1-d) · T_{10} \] where \(T_{10}\) is the probability of
colonization.

As a final example, consider the detection history \(\lbrace 001 \ 000 \ 101 \ 000 \ 111 \rbrace\). This detection history can be produced by four underlying colonization-extinction trajectories. These are: \((1 \ 1 \ 1\ 1\ 1)\), \((1 \ 0 \ 1\ 1\ 1)\), \((1 \ 1 \ 1\ 0\ 1)\), \((1 \ 0 \ 1\ 0\ 1)\). The probability of this detection history should sum over these four possible underlying colonization-extinction trajectories because all are compatible with it. Below we detailed the four conditional probabilities:

\[Pr( \lbrace 001 \ 000 \ 101 \ 000 \ 111 \rbrace | ( 1 \ 1 \ 1\ 1\ 1 ) ) = P_0·d·(1-d)^2 · T_{11}·(1-d)^3 · T_{11}·d^2·(1-d) · T_{11}·(1-d)^3 · T_{11}·d^3 \]

\[Pr( \lbrace 001 \ 000 \ 101 \ 000 \ 111 \rbrace | ( 1 \ 0 \ 1\ 1\ 1 ) ) = P_0·d·(1-d)^2 · T_{01} · T_{10}·d^2·(1-d) · T_{11}·(1-d)^3 · T_{11}·d^3 \]

\[Pr( \lbrace 001 \ 000 \ 101 \ 000 \ 111 \rbrace | ( 1 \ 1 \ 1\ 0\ 1 ) ) = P_0·d·(1-d)^2 · T_{11}·(1-d)^3 · T_{11}·d^2·(1-d) · T_{01} · T_{10}·d^3 \]

\[Pr( \lbrace 001 \ 000 \ 101 \ 000 \ 111 \rbrace | ( 1 \ 0 \ 1\ 0\ 1 ) ) = P_0·d·(1-d)^2 · T_{01} · T_{10}·d^2·(1-d) · T_{01} · T_{10}·d^3 \]

The algorithm implemented in island would sum over these four conditional probabilities to calculate the total probability for the initial detection history,\(Pr( \lbrace 001 \ 000 \ 101 \ 000 \ 111 \rbrace )\). Please note that, for real-life examples, when a species goes fully undetected for many sampling times, the full total sum becomes unfeasible because the number of compatible trajectories undergoes rapidly a combinatorial explosion. This may happen in practice if detectability per replicate is very low. In this case, only approximated likelihoods can be given. Alternatively, one could get around this problem by redesigning the full survey and taking more replicates per sampling time. As we have discussed in the main vignette of the package, transition probabilities \(T_{00}, \ T_{10}, \ T_{01}, \ T_{11}\) are functions of the rates \(c\) and \(e\) for a given time interval \(dt\) between observations. Therefore, we have all the elements required to estimate the likelihood of any detection history, even if time intervals between observations vary, which allows to find maximum likelihood estimates for the four model parameters, colonization and extinction rates, \(c\) and \(e\), along with the detectability, \(d\), and the probability of initial presence, \(P_0\).

In order to estimate detectability, we need to provide
presence-absence data with replicated samples for the same sampling
time, as in the example below extracted from data set
`lakshadweepPLUS`

, where column X2000 and X2000.1 correspond
to two replicate transects sampled in the same year. In addition, the
data can have groups that can be treated as levels of a factor, as in
column “Guild”.

Species | Atoll | Guild | X2000 | X2000.1 | X2001 | X2001.1 | X2001.2 | X2001.3 |
---|---|---|---|---|---|---|---|---|

Acanthurus_auranticavus | AGATHI | Algal_Feeder | 1 | 1 | 1 | 0 | 1 | 0.1 |

Acanthurus_leucosternon | AGATHI | Algal_Feeder | 1 | 1 | 1 | 0 | 0 | 0.1 |

Acanthurus_lineatus | AGATHI | Algal_Feeder | 1 | 1 | 1 | 0 | 0 | 0.1 |

Acanthurus_nigrofuscus | AGATHI | Algal_Feeder | 0 | 0 | 0 | 0 | 0 | 0.1 |

Acanthurus_thompsoni | AGATHI | Zooplanktivore | 0 | 0 | 0 | 0 | 0 | 0.1 |

Acanthurus_triostegus | AGATHI | Algal_Feeder | 1 | 1 | 0 | 1 | 1 | 0.1 |

Functions `sss_cedp`

, `mss_cedp`

allow the
estimation of colonization and extinction rates with imperfect
detectability with simple and multiple sampling schemes, respectively.
The function `sss_cedp`

allows estimation for a single
sampling scheme with repeated measures that has to be specified with
arguments `Time`

, that contains the unique sampling times,
and argument `Transects`

that specifies the number of of
transects per sampling time. By contrast, `mss_cedp`

allows
the estimation of rates with perfect or imperfect detectability for
multiple sampling schemes, via the use of flags for missing values
specified by argument `MV_FLAG`

, for the whole data set or
groups of factors. A full sampling scheme should be specified with
argument `Time`

, which is internally used to calculate the
particular sampling schemes associated to each separate row with the
help of the missing value flags on the columns that have not been
sampled. In the next example, we use data sets `lakshadweep`

and `lakshadweepPLUS`

to demonstrate the use of the previous
functions. These data sets are extensions of data set
`alonso15`

, and include raw data (information of up to 4
transects per atoll at each sampling time, and additional samples for
2012 and 2013). Transects are considered as replicates.
`lakshadweepPLUS`

differs in marking missing data with a
flag, combining the data for the three atolls in a single
`data.frame`

.

```
### Using sss_cedp
Data1 <- lakshadweep[[1]]
Name_of_Factors <- c("Species","Atoll","Guild")
Factors <- Filter(is.factor, Data1)
No_of_Factors <- length(Factors[1,])
n <- No_of_Factors + 1
D1 <- as.matrix(Data1[1:nrow(Data1),n:ncol(Data1)])
Time <- as.double(D1[1,])
P1 <- as.matrix(D1[2:nrow(D1),1:ncol(D1)])
Time_Vector <- as.numeric(names(table(Time)))
Transects <- as.numeric((table(Time)))
R1 <- sss_cedp(P1, Time_Vector, Transects,
Colonization=0.5, Extinction=0.5, Detectability=0.5,
Phi_Time_0=0.5,
Tol=1.0e-8, Verbose = F)
knitr::kable(unlist(R1))
```

x | |
---|---|

C | 0.3212542 |

E | 0.2007582 |

D | 0.4980380 |

P | 0.5082734 |

NLL | 2200.7687093 |

```
### Using mss_cedp
Data <- lakshadweepPLUS[[1]]
Guild_Tag = c("Alg","Cor","Mac","Mic","Omn","Pis","Zoo") # In alphabetical order.
Time <- as.vector(c(2000, 2000, 2001, 2001, 2001, 2001, 2002, 2002, 2002,
2002, 2003, 2003, 2003, 2003, 2010, 2010, 2011, 2011, 2011, 2011, 2012,
2012, 2012, 2012, 2013, 2013, 2013, 2013))
R2 <- mss_cedp(Data, Time, Factor=3, Tags=Guild_Tag, PerfectDetectability=FALSE, z=4)
#> Group 0 (Alg): NLL (Col = 0.544756, Ext = 0.167592, Dtc = 0.598818, P_0 = 0.662212) = 1399.03
#> Group 1 (Cor): NLL (Col = 0.366281, Ext = 0.216484, Dtc = 0.49887, P_0 = 0.529949) = 817.113
#> Group 2 (Mac): NLL (Col = 0.314384, Ext = 0.244689, Dtc = 0.4636, P_0 = 0.596814) = 1591.12
#> Group 3 (Mic): NLL (Col = 0.332609, Ext = 0.202035, Dtc = 0.501711, P_0 = 0.588815) = 849.614
#> Group 4 (Omn): NLL (Col = 0.184111, Ext = 0.167086, Dtc = 0.525903, P_0 = 0.426777) = 323.018
#> Group 5 (Pis): NLL (Col = 0.364651, Ext = 0.31265, Dtc = 0.371796, P_0 = 0.450578) = 712.661
#> Group 6 (Zoo): NLL (Col = 0.404582, Ext = 0.184582, Dtc = 0.558295, P_0 = 0.64373) = 619.948
```

Model selection aims to select the best model for a given phenomenon
with a reasonable number of parameters describing it and avoiding
over-fitting. Our procedure is intended to distinguish, for example,
guilds or islands with different colonization and extinction dynamics.
The function `upgma_model_selection`

incorporates an UPGMA
algorithm based model selection procedure intended to find an optimal
partition that minimizes AIC values. The algorithm needs a vector of
tags in order to estimate the partition. This function allows the
estimation of colonization and extinction rates with or without
imperfect detectability.

The following example (using `lakshadweepPLUS`

) shows the
best model describing the dynamics of coral reef fishes in the
Lakshadweep Archipelago, based on their guilds.

```
Data <- lakshadweepPLUS[[1]]
Guild_Tag = c("Alg", "Cor", "Mac", "Mic", "Omn", "Pis", "Zoo")
Time <- as.vector(c(2000, 2000, 2001, 2001, 2001, 2001, 2002, 2002, 2002,
2002, 2003, 2003, 2003, 2003, 2010, 2010, 2011, 2011, 2011, 2011, 2012,
2012, 2012, 2012, 2013, 2013, 2013, 2013))
R3 <- upgma_model_selection(Data, Time, Factor = 3, Tags = Guild_Tag,
PerfectDetectability = FALSE, z = 4)
#> Number of Columns: 28
#> Group 0 (Alg): NLL (Col = 0.544756, Ext = 0.167592, Dtc = 0.598818, P_0 = 0.662212) = 1399.03
#> Group 1 (Cor): NLL (Col = 0.366281, Ext = 0.216484, Dtc = 0.49887, P_0 = 0.529949) = 817.113
#> Group 2 (Mac): NLL (Col = 0.314384, Ext = 0.244689, Dtc = 0.4636, P_0 = 0.596814) = 1591.12
#> Group 3 (Mic): NLL (Col = 0.332609, Ext = 0.202035, Dtc = 0.501711, P_0 = 0.588815) = 849.614
#> Group 4 (Omn): NLL (Col = 0.184111, Ext = 0.167086, Dtc = 0.525903, P_0 = 0.426777) = 323.018
#> Group 5 (Pis): NLL (Col = 0.364651, Ext = 0.31265, Dtc = 0.371796, P_0 = 0.450578) = 712.661
#> Group 6 (Zoo): NLL (Col = 0.404582, Ext = 0.184582, Dtc = 0.558295, P_0 = 0.64373) = 619.948
#> Partition 0-th: Number of estimated parameters: 2
#> { Alg Cor Mac Mic Omn Pis Zoo }
#> NLL = 6395.98 AIC = 12800 AIC (corrected) = 12800 AIC_d = 127.573 AIC_w = 1.80675e-28
#> Partition 1-th: Number of estimated parameters: 4
#> { Omn Pis Zoo Cor Mic Mac } { Alg }
#> NLL = 6348.51 AIC = 12713 AIC (corrected) = 12713 AIC_d = 40.641 AIC_w = 1.36121e-09
#> Partition 2-th: Number of estimated parameters: 6
#> { Pis Zoo Cor Mic Mac } { Omn } { Alg }
#> NLL = 6345.48 AIC = 12715 AIC (corrected) = 12715 AIC_d = 42.6136 AIC_w = 5.07662e-10
#> Partition 3-th: Number of estimated parameters: 8
#> { Zoo Cor Mic Mac } { Pis } { Omn } { Alg }
#> NLL = 6324.1 AIC = 12680.2 AIC (corrected) = 12680.3 AIC_d = 7.87612 AIC_w = 0.0177308
#> Partition 4-th: Number of estimated parameters: 10
#> { Cor Mic Mac } { Zoo } { Pis } { Omn } { Alg }
#> NLL = 6316.15 AIC = 12672.3 AIC (corrected) = 12672.4 AIC_d = 0 AIC_w = 0.909928
#> Partition 5-th: Number of estimated parameters: 12
#> { Mic Mac } { Cor } { Zoo } { Pis } { Omn } { Alg }
#> NLL = 6314.83 AIC = 12677.7 AIC (corrected) = 12677.8 AIC_d = 5.39963 AIC_w = 0.0611635
#> Partition 6-th: Number of estimated parameters: 14
#> { Mac } { Mic } { Cor } { Zoo } { Pis } { Omn } { Alg }
#> NLL = 6312.5 AIC = 12681 AIC (corrected) = 12681.2 AIC_d = 8.79881 AIC_w = 0.0111781
#> \begin{table}
#> \centering
#> \begin{tabular}{lccccc}
#> Model& NLL& AIC& AIC corrected& AIC difference& AIC weights\\
#> \hline
#> 2-parameter model& 6395.98& 12800& 12800& 127.573& 1.80675e-28\\
#> 4-parameter model& 6348.51& 12713& 12713& 40.641& 1.36121e-09\\
#> 6-parameter model& 6345.48& 12715& 12715& 42.6136& 5.07662e-10\\
#> 8-parameter model& 6324.1& 12680.2& 12680.3& 7.87612& 0.0177308\\
#> 10-parameter model& 6316.15& 12672.3& 12672.4& 0& 0.909928\\
#> 12-parameter model& 6314.83& 12677.7& 12677.8& 5.39963& 0.0611635\\
#> 14-parameter model& 6312.5& 12681& 12681.2& 8.79881& 0.0111781\\
#> \end{tabular}
#> \caption{Caption goes here}
#> \label{tab:myfirsttable}
#> \end{table}
#> \begin{table}
#> \centering
#> \begin{tabular}{lcc}
#> Species Group& Extinction Rate& Colonization Rate\\
#> \hline
#> { Cor Mic Mac }& 0.22788& 0.334553\\
#> { Zoo }& 0.184582& 0.404582\\
#> { Pis }& 0.31265& 0.364651\\
#> { Omn }& 0.167086& 0.184111\\
#> { Alg }& 0.167592& 0.544756\\
#> \end{tabular}
#> \caption{Caption goes here}
#> \label{tab:myfirsttable}
#> \end{table}
```

The function `upgma_model_selection`

also generates two
output files in latex format (.tex) with: a) the parameters of the best
model found under the model selection procedure and b) the summary of
the procedure. Rmarkdown equivalent tables are included below.

Species Group | Extinction Rate | Colonization Rate |
---|---|---|

Cor Mic Mac | 0.22788 | 0.334553 |

Zoo | 0.184582 | 0.404582 |

Pis | 0.31265 | 0.364651 |

Omn | 0.167086 | 0.184111 |

Alg | 0.167592 | 0.544756 |

In table 3 (a), we find that corallivores, microinvertivores and macroinvertivores group together while the other guilds have their own estimates.

Model | NLL | AIC | AIC corrected | AIC difference | AIC weights |
---|---|---|---|---|---|

2-parameter model | 6395.98 | 12800 | 12800 | 127.573 | 1.80675e-28 |

4-parameter model | 6348.51 | 12713 | 12713 | 40.641 | 1.36121e-09 |

6-parameter model | 6345.48 | 12715 | 12715 | 42.6136 | 5.07662e-10 |

8-parameter model | 6324.1 | 12680.2 | 12680.3 | 7.87612 | 0.0177308 |

10-parameter model | 6316.15 | 12672.3 | 12672.4 | 0 | 0.909928 |

12-parameter model | 6314.83 | 12677.7 | 12677.8 | 5.39963 | 0.0611635 |

14-parameter model | 6312.5 | 12681 | 12681.2 | 8.79881 | 0.0111781 |

Table 4 (b) shows the Negative Log-Likelihood, Akaike Information Criterion and associated measures for the models considered in the UPGMA-based model selection procedure.