---
title: "Sample modifiers in pepr: derive"
author: "Michal Stolarczyk"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteEncoding{UTF-8}
  %\VignetteIndexEntry{Sample modifiers in pepr: derive}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Learn derived attributes in `pepr`

This vignette will show you how and why to use the derived attributes functionality of the `pepr` package. 

 - basic information about the PEP concept on the [project website](https://pepkit.github.io/).
 
 - broader theoretical description in the derived attributes [documentation section](http://pep.databio.org/en/2.0.0/specification/#sample_modifiersderive).

## Problem/Goal

The example below demonstrates how to use the derived attributes to **flexibly define the samples attributes the `file_path` column** of the `sample_table.csv` file to match the file names in  your project. Please consider the example below for reference:


```{r ,echo=FALSE}
branch = "master"
library(knitr)
sampleAnnotation = system.file(
"extdata",
paste0("example_peps-", branch),
"example_derive",
"sample_table_pre.csv",
package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = T)
knitr::kable(sampleAnnotationDF, format = "html") 
```

----

## Solution

As the name suggests the attributes in the specified attributes (here: `file_path`) can be derived from other ones. The way how this process is carried out is indicated explicitly in the `project_config.yaml` file (presented below). The name of the column is determined in the `sample_modifiers.derive.attributes` key-value pair, whereas the pattern for the attributes construction - in the `sample_modifiers.derive.sources` one. Note that the second level key (here: `source`) has to exactly match the attributes in the `file_path` column of the modified `sample_annotation.csv` (presented below).


```{r, echo=FALSE,message=TRUE,collapse=TRUE,comment=" "}
library(pepr)
projectConfig = system.file(
"extdata",
paste0("example_peps-", branch),
"example_derive",
"project_config.yaml",
package = "pepr"
)
pepr::.printNestedList(yaml::read_yaml(projectConfig))
```


Let's introduce a few modifications to the original `sample_annotation.csv` file to map the appropriate data sources from the `project_config.yaml` with attributes in the derived column - `[file_path]`:

```{r ,echo=FALSE}
library(knitr)
sampleAnnotation = system.file(
"extdata",
paste0("example_peps-", branch),
"example_derive",
"sample_table.csv",
package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = T)
kable(sampleAnnotationDF, format = "html") 
```

----

## Code

Load `pepr` and read in the project metadata by specifying the path to the `project_config.yaml`:

```{r}
library(pepr)
projectConfig = system.file(
"extdata",
paste0("example_peps-", branch),
"example_derive",
"project_config.yaml",
package = "pepr"
)
p = Project(projectConfig)
```

And inspect it:

```{r}
sampleTable(p)
```

As you can see, the resulting samples are annotated the same way as if they were read from the original, unwieldy, annotations file.

What is more, the `p` object consists of all the information from the project config file (`project_config.yaml`). Run the following line to explore it:

```{r}
config(p)
```