---
title: "Sample modifiers in pepr: derive"
author: "Michal Stolarczyk"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteEncoding{UTF-8}
  %\VignetteIndexEntry{Sample modifiers in pepr: derive}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Learn derived attributes in `pepr`

This vignette will show you how and why to use the derived attributes functionality of the `pepr` package. For background, see:

- basic information about the PEP concept on the [project website](https://pepkit.github.io/).
- a broader theoretical description in the derived attributes [documentation section](http://pep.databio.org/en/2.0.0/specification/#sample_modifiersderive).

## Problem/Goal

The example below demonstrates how to use derived attributes to **flexibly define the sample attribute values in the `file_path` column** of the `sample_table.csv` file so that they match the file names in your project. Consider the following sample table for reference:

```{r ,echo=FALSE}
branch = "master"
library(knitr)
# Locate the unmodified sample table shipped with the package
sampleAnnotation = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_derive",
  "sample_table_pre.csv",
  package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = TRUE)
knitr::kable(sampleAnnotationDF, format = "html")
```

----

## Solution

As the name suggests, the values of the specified attributes (here: `file_path`) can be derived from other attributes. How this is done is stated explicitly in the `project_config.yaml` file (presented below). The name of the column to derive is set in the `sample_modifiers.derive.attributes` key, whereas the patterns used to construct its values are set in the `sample_modifiers.derive.sources` key. Note that the second-level key (here: `source`) has to exactly match the values in the `file_path` column of the modified `sample_table.csv` (presented below).

```{r, echo=FALSE,message=TRUE,collapse=TRUE,comment=" "}
library(pepr)
# Locate the example project configuration file and print its contents
projectConfig = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_derive",
  "project_config.yaml",
  package = "pepr"
)
pepr::.printNestedList(yaml::read_yaml(projectConfig))
```

Let's introduce a few modifications to the original `sample_table.csv` file to map the appropriate data sources from the `project_config.yaml` to the values in the derived column, `file_path`:

```{r ,echo=FALSE}
library(knitr)
# Locate the modified sample table, which refers to data sources by key
sampleAnnotation = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_derive",
  "sample_table.csv",
  package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = TRUE)
kable(sampleAnnotationDF, format = "html")
```

----

## Code

Load `pepr` and read in the project metadata by specifying the path to the `project_config.yaml`:

```{r}
library(pepr)
projectConfig = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_derive",
  "project_config.yaml",
  package = "pepr"
)
p = Project(projectConfig)
```

Then inspect the sample table:

```{r}
sampleTable(p)
```

As you can see, the resulting samples are annotated the same way as if they had been read from the original, unwieldy annotation file.

What is more, the `p` object contains all the information from the project config file (`project_config.yaml`). Run the following line to explore it:

```{r}
config(p)
```
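
To make the derivation step more concrete, here is a minimal base-R sketch of the idea: each value in the derived column is treated as a key into the sources, and the matching template has its `{attribute}` placeholders filled in from the same sample row. This is only an illustration under stated assumptions, not how `pepr` implements it. The helpers `expand_template()` and `derive_column()`, the sample table, and the source templates are all hypothetical, and the sketch does not expand environment variables.

```r
# Hypothetical helper (not part of pepr): fill {attribute} placeholders in a
# template with the values of one sample, given as a named list.
expand_template <- function(template, sample) {
  for (attr in names(sample)) {
    template <- gsub(
      paste0("{", attr, "}"),
      as.character(sample[[attr]]),
      template,
      fixed = TRUE  # match the placeholder literally, not as a regex
    )
  }
  template
}

# Hypothetical helper: replace each source key found in the derived column
# with the expanded template registered under that key.
derive_column <- function(samples, column, sources) {
  for (i in seq_len(nrow(samples))) {
    key <- as.character(samples[i, column])
    if (key %in% names(sources)) {
      samples[i, column] <- expand_template(
        sources[[key]],
        as.list(samples[i, ])
      )
    }
  }
  samples
}

# Made-up sample table and sources, for illustration only.
samples <- data.frame(
  sample_name = c("a", "b"),
  organism = c("human", "mouse"),
  file_path = c("local", "remote"),
  stringsAsFactors = FALSE
)
sources <- list(
  local = "/data/{organism}/{sample_name}.fastq",
  remote = "/mnt/shared/{sample_name}.fastq"
)

derived <- derive_column(samples, "file_path", sources)
print(derived$file_path)
```

In this toy example the `file_path` values `local` and `remote` become `/data/human/a.fastq` and `/mnt/shared/b.fastq`. Values that do not match any source key are left untouched, which is a design choice of this sketch rather than a statement about `pepr`.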