---
title: "Subsample table in pepr"
author: "Michal Stolarczyk & Nathan Sheffield"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Subsample table in pepr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Learn sample subannotations in `pepr`

This vignette will show you how and why to use the subsample table functionality of the `pepr` package.

 - basic information about the PEP concept visit the [project website](http://pep.databio.org/en/2.0.0/).
 
 - broader theoretical description in the subsample table [documentation section](http://pep.databio.org/en/2.0.0/specification/#project-attribute-subsample_table).

## Problem/Goal

This series of examples below demonstrates how and why to use sample subannoatation functionality in multiple cases to **provide multiple input files of the same type for a single sample**.


## Solutions

### Example 1: basic sample subannotation table

This example demonstrates how the sample subannotation functionality is used. In this example, 2 samples have multiple input files that need merging (`frog_1` and `frog_2`), while 1 sample (`frog_3`) does not. Therefore, `frog_3` specifies its file in the `sample_table.csv` file, while the others leave that field blank and instead specify several files in the `subsample_table.csv` file.

This example is made up of these components:

* Project config file:
```{r, echo=FALSE,message=TRUE,collapse=TRUE,comment=" "}
branch = "master"
library(pepr)
projectConfig = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable1",
"project_config.yaml",
package = "pepr"
)
.printNestedList(yaml::read_yaml(projectConfig))
```
* Sample table:
```{r ,echo=FALSE}
library(knitr)
sampleAnnotation = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable1",
"sample_table.csv",
package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = T)
kable(sampleAnnotationDF, format = "html") 
```
* Subsample table:
```{r ,echo=FALSE}
sampleAnnotation = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable1",
  "subsample_table.csv",
  package = "pepr"
  )
  sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = T)
  kable(sampleAnnotationDF, format = "html") 
```

Let's create the Project object and see if multiple files are present 
```{r}
projectConfig1 = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable1",
"project_config.yaml",
package = "pepr"
)
p1 = Project(projectConfig1)
# Check the files
p1Samples = sampleTable(p1)
p1Samples$file
# Check the subsample names
p1Samples$subsample_name
```
And inspect the whole table in `p1@samples` slot
```{r,echo=FALSE}
kable(p1Samples)
```

You can also access a single subsample if you call the `getSubsample` method with appropriate `sample_name` - `subsample_name` attribute combination. Note, that this is only possible if the `subsample_name` column is defined in the `sub_annotation.csv` file.

```{r}
sampleName = "frog_1"
subsampleName = "sub_a"
getSubsample(p1, sampleName, subsampleName)
```


### Example 2: subannotations and derived attributes

This example uses a `subsample_table.csv` file and a derived attributes to point to files. This is a rather complex example. Notice we must include the `file_id` column in the `sample_table.csv` file, and leave it blank; this is then populated by just some of the samples (`frog_1` and `frog_2`) in the `subsample_table.csv`, but is left empty for the samples that are not merged.

This example is made up of these components:

* Project config file:
```{r, echo=FALSE,message=TRUE,collapse=TRUE,comment=" "}
projectConfig = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable2",
"project_config.yaml",
package = "pepr"
)
.printNestedList(yaml::read_yaml(projectConfig))
```
* Sample annotation table:
```{r ,echo=FALSE}
sampleAnnotation = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable2",
"sample_table.csv",
package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = T)
kable(sampleAnnotationDF, format = "html") 
```
* Sample subannotation table:
```{r ,echo=FALSE}
sampleAnnotation = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable2",
  "subsample_table.csv",
  package = "pepr"
  )
  sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = T)
  kable(sampleAnnotationDF, format = "html") 
```
Let's load the project config, create the Project object and see if multiple files are present 
```{r}
projectConfig2 = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable2",
"project_config.yaml",
package = "pepr"
)
p2 = Project(projectConfig2)
# Check the files
p2Samples = sampleTable(p2)
p2Samples$file
```
And inspect the whole table in `p2@samples` slot
```{r,echo=FALSE}
kable(p2Samples)
```

### Example 3: subannotations and expansion characters

This example gives the exact same results as Example 2, but in this case, uses a wildcard for `frog_2` instead of including it in the `subsample_table.csv` file. Since we can't use a wildcard and a subannotation for the same sample, this necessitates specifying a second data source class (`local_files_unmerged`) that uses an asterisk (`*`). The outcome is the same.

This example is made up of these components:

* Project config file:
```{r, echo=FALSE,message=TRUE,collapse=TRUE,comment=" "}
projectConfig = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable3",
"project_config.yaml",
package = "pepr"
)
.printNestedList(yaml::read_yaml(projectConfig))
```
* Sample annotation table:
```{r ,echo=FALSE}
sampleAnnotation = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable3",
"sample_table.csv",
package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = T)
kable(sampleAnnotationDF, format = "html") 
```
* Sample subtable table:
```{r ,echo=FALSE}
sampleAnnotation = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable3",
  "subsample_table.csv",
  package = "pepr"
  )
  sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = T)
  kable(sampleAnnotationDF, format = "html") 
```
Let's load the project config, create the Project object and see if multiple files are present 
```{r}
projectConfig3 = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable3",
"project_config.yaml",
package = "pepr"
)
p3 = Project(projectConfig3)
# Check the files
p3Samples = sampleTable(p3)
p3Samples$file
```
And inspect the whole table in `p3@samples` slot
```{r,echo=FALSE}
kable(p3Samples)
```

### Example 4: subannotations and multiple (separate-class) inputs

Merging is for same class inputs (like, multiple files for read1). Different-class inputs (like read1 vs read2) are handled by different attributes (or columns). This example shows you how to handle paired-end data, while also merging within each.

This example is made up of these components:

* Project config file:
```{r, echo=FALSE,message=TRUE,collapse=TRUE,comment=" "}
project_config = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable4",
"project_config.yaml",
package = "pepr"
)
.printNestedList(yaml::read_yaml(project_config))
```
* Sample annotation table:
```{r ,echo=FALSE}
sampleAnnotation = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable4",
"sample_table.csv",
package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = T)
kable(sampleAnnotationDF, format = "html") 
```
* Sample subannotation table:
```{r ,echo=FALSE}
sampleAnnotation = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable4",
  "subsample_table.csv",
  package = "pepr"
  )
  sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = T)
  kable(sampleAnnotationDF, format = "html") 
```
Let's load the project config, create the Project object and see if multiple files are present 
```{r}
projectConfig4 = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable4",
"project_config.yaml",
package = "pepr"
)
p4 = Project(projectConfig4)
# Check the read1 and read2 columns
p4Samples = sampleTable(p4)
p4Samples$read1
p4Samples$read2
```
And inspect the whole table in `p4@samples` slot
```{r,echo=FALSE}
kable(p4Samples)
```