---
title: 'Distance sampling: Methods and applications'
author: 'Lure point transect case study: Scottish crossbills'
mainfont: Cabin
output:
  html_document:
    highlight: tango
    includes:
      after_body: ../include/afterbody.html
      in_header: ../include/inheader.html
    pandoc_args:
    - --bibliography
    - ../include/MainBibFile.bib
    - --csl
    - ../include/ecology.csl
    - --css
    - ../include/markdown2.css
    - --toc
    - --number-sections
    self_contained: no
    theme: cosmo
  pdf_document:
    includes:
      after_body: ../include/afterbody.tex
      in_header: ../include/inheader.tex
    latex_engine: xelatex
    pandoc_args:
    - --bibliography
    - ../include/MainBibFile.bib
    - --csl
    - ../include/ecology.csl
    - --toc
    - --number-sections
classoption: a4paper
---

```{r, echo=FALSE, warning=FALSE}
myecho <- TRUE
myeval <- TRUE
library(knitr)
## http://stackoverflow.com/questions/11062497/how-to-avoid-using-round-in-every-sexpr
# knit_hooks$get("inline")
## Then customize it to your own liking
inline_hook <- function(x) {
    if(is.numeric(x)) x <- round(x, 0)
    paste(as.character(x), collapse=", ")
}
knit_hooks$set(inline = inline_hook)
opts_chunk$set(
  tidy=TRUE, size = 'small'
)
```
The Scottish crossbill is Britain's only endemic bird species.  A lure point transect survey was used to estimate its population size [@Summers2011].  In this case study, we work through the analyses reported in Section 10.2.1 of the book.  We do not attempt to recreate those analyses exactly;  instead, we simplify the analyses, while retaining the essential elements to allow the approach to be adapted to other studies.

# The trials

The trials data are in file `lure trials.csv`.  We first read the data:
```{r, echo=myecho, eval=myeval}
xbill <- read.table("lure trials.csv",header=T,sep=",")
attach(xbill)
```

We next specify a distance (in metres) beyond which it is reasonable to assume that no crossbill will respond to the lure:

```{r, echo=myecho, eval=myeval}
w <- 850
```

There are several covariates in the data that might affect probability that a bird responds to the lure:  day (days from 1^st^ January);  time (hour of the day);  habitat (1 = plantation, 2 = native pinewood);  behavcode (1 = perching and feeding, 2 = giving excitement calls, 3 = singing);  numbirds (flock size);  and dist (distance of birds from the point when the lure was played).
We can fit all these covariates as follows.

```{r, echo=myecho, eval=myeval, comment=NA}
habitat <- factor(habitat)
behavcode <- factor(behavcode)
model1 <- glm(response~dist+numbirds+day+time+habitat+behavcode,family=binomial)
summary(model1)
anova(model1)
```

We see that only `dist` has a coefficient that is significantly different from zero.  Experiment with dropping non-significant terms.  Using a backwards stepping procedure, you should find that other covariates remain non-significant, so we can use the simple model:

```{r, echo=myecho, eval=myeval, comment=NA}
model2 <- glm(response~dist,family=binomial)
summary(model2)
```

(Degrees of freedom change a little as some covariates have a few missing values.)


We can now plot the fitted detection function model:

```{r, echo=myecho, eval=myeval, fig=TRUE}
fits <- fitted(model2)
plot(dist,response,xlim=c(0,w),xlab="Distance from lure",ylab="Probability of response")
dist2 <- 0:w  
phat <- predict.glm(model2,newdata=data.frame(dist=dist2),type="response")
lines(dist2,phat)
```


# The survey

We need to specify the function $/pi(r), r /le w$, which represents the probability density function of distances of animals from the point.  If we extend the study area out a distance w around the entire boundary, thus creating a bufferzone, and place our survey points at random throughout the area + bufferzone (or randomly place a systematic grid of points over the extended study area), we can assume that this density is triangular:

```{r, echo=myecho, eval=myeval, fig=TRUE}
pi_r <- dist2/sum(dist2)
```

If the study area is fragmented, it may not be cost-effective to sample into the bufferzone.  If points are restricted to within the boundaries of the study area, and the species of interest does not occur beyond the study area because habitat is unsuitable, we need to adjust $/pi(r)$ so that it reflects the amount of available habitat (averaged over points) with increasing distance from the point [@Buckland2006b].  [@Summers2011] estimated this availability function, but for the purposes of this case study, we use the above triangular distribution.

```{r, echo=myecho, eval=myeval, comment=NA}
# Pa is prob of response, integrating r from 0 to w, assuming that phat is
# a function of distance alone (no z's in model)

(Pa <- sum(phat*pi_r))
```

Thus we estimate that just under 10% of birds within 850m of a point are detected.  We now read in the data from the main survey.  Note that detection distances are unknown in the main survey;  instead, we have used the trials data to estimate the detection function, and hence the proportion of birds within 850m that are detected.

```{r, echo=myecho, eval=myeval}
detections <- read.table("mainsurveydetections.csv",sep=",",header=T)
attach(detections)
```

We now calculate the number of points kmain in the main survey, and the total area a within 850m of a point, converting from m^2^ to km^2^.  (Note that, if we do not include unsuitable habitat as described above, this total area should also exclude that habitat.)

```{r, echo=myecho, eval=myeval}
kmain <- length(point)
a <- kmain*pi*(w/1000)^2
A <- 3505.8  # size in km2 of study area
```

We can now estimate the size of the population as
```{r, echo=myecho, eval=myeval, comment=NA}
(Nscot <- sum(nscottish)*A/(Pa*a))
```
Thus we estimate that there are around 10,000 birds.  (This does not agree with the estimate of 13,600 in [@Summers2011], because we have opted to keep the case study simple, and have omitted corrections for unsuitable habitat on surveyed plots, for juveniles, for unidentified crossbills, for incubating or brooding females, and for flying birds.  Correcting only for unsuitable habitat on surveyed plots increases the abundance estimate to 12,900 birds.)

# Measure of precision

We can calculate a standard error for our estimate and a 95% confidence interval for true abundance by bootstrapping both trials and points:

```{r, echo=myecho, eval=myeval, comment=NA}
index <- 1:kmain
nboot <- 999
bNscot <- vector(length=nboot)
m <- length(dist)
tindex <- 1:m
bdist <- vector(length=m)
bresponse <- vector(length=m)
for (i in 1:nboot) {
# bootstrap trials
  btindex <- sample(tindex,m,replace=TRUE)
  for (l in tindex) {
    bdist[l] <- dist[btindex[l]]
    bresponse[l] <- response[btindex[l]] 
    }
  bmodel <- glm(bresponse~bdist,family=binomial)
  bphat <- predict.glm(bmodel,newdata=data.frame(bdist=dist2),type="response")
  bPa <- sum(bphat*pi_r)
#bootstrap points
  rindex <- sample(index,kmain,replace=TRUE)
  bNscot[i] <- sum(nscottish[rindex])*A/(bPa*a)
}
#calculate bootstrap standard error 
seNscot <- sd(bNscot)
#calculate bootstrap percentile confidence limits
bNscot <- sort(bNscot)
alpha.over.2 <- 0.025
loNscot <- bNscot[round(alpha.over.2*(nboot+1))]
hiNscot <- bNscot[round((1-alpha.over.2)*(nboot+1))]
(c(Nscot,seNscot,loNscot,hiNscot))
```

Thus we estimate the population size to be `r Nscot` birds with standard error `r seNscot` and 95% confidence interval (`r loNscot`, `r hiNscot`).

# References