--- title: "Sample modifiers in pepr: imply and derive" author: "Michal Stolarczyk" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Sample modifiers in pepr: imply and derive} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Learn how to combine implied and derived attributes in `pepr` This vignette will show you how and why to use the derieved attributes and implied attributes functionalities concurrently of the `pepr` package. - For the basic information about the PEP concept on the [project website](https://pepkit.github.io/) - Make sure to study the dedicated [derived attributes](./feature3_derivedAttributes.html) and [implied attributes](./feature2_impliedAttributes.html) vignettes prior to reading this one ## Problem/Goal While either derived attributes or implied attributes functionalities alone are often sufficient to efficiently describe your samples in PEP, the example below demonstrates how to use the derived attributes to **simplify and unclutter the columns** of the `sample_table.csv` file, after implying the attributes for samples that **follow certain patterns**. The two functionalities combined provide you with the way of building complex, yet flexible sample annotation tables effortlessly. Note that the attributes implication is always performed first - before the attributes are derived. This means that the newly created attributes (implied ones) can be used to construct the attributes in the column derivation process. Please consider the example below for reference: ```{r ,echo=FALSE} branch = "master" library(knitr) sampleAnnotation = system.file( "extdata", paste0("example_peps-", branch), "example_derive_imply", "sample_table_pre.csv", package = "pepr" ) sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = T) knitr::kable(sampleAnnotationDF, format = "html") ``` ---- ## Solution The specification of detailed file paths/names (as presented above) is cumbersome. In order to make your life easier just find the patterns that the file names in `file_path` column of `sample_table.csv` follow, imply needed attributes and derive the file names. This multi step process is orchestrated by the `project_config.yaml` file via the `sample_modifiers.derive` and `sample_modifiers.imply` sections: ```{r, echo=FALSE,message=TRUE,collapse=TRUE,comment=" "} library(pepr) projectConfig = system.file( "extdata", paste0("example_peps-", branch), "example_derive_imply", "project_config.yaml", package = "pepr" ) .printNestedList(yaml::read_yaml(projectConfig)) ``` The `*_untreated` files are clearly associated with the samples that are labeled with `time` 0. Therefore the `untreated` attribute is implied for the samples which have 0 in the `time` columns. Similarly, the codes `susScr11` and `xenTro9` are associated with the attributes in the `oragnism` column. Therefore, the column `condion` that consists of those two codes is implied from the attributes in the `organism` column according to the `project_config.yaml`. Let's introduce a few modifications to the original `sample_table.csv` file to imply the attributes `genome` and `condition` and subsequently map the appropriate data sources from the `project_config.yaml` with attributes in the derived column - `[file_path]`: ```{r,echo=FALSE} sampleAnnotation = system.file( "extdata", paste0("example_peps-", branch), "example_derive_imply", "sample_table.csv", package = "pepr" ) sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = T) knitr::kable(sampleAnnotationDF, format = "html") ``` ---- ## Code Load `pepr` and read in the project metadata by specifying the path to the `project_config.yaml`: ```{r collapse=TRUE} library(pepr) projectConfig = system.file( "extdata", paste0("example_peps-", branch), "example_derive_imply", "project_config.yaml", package = "pepr" ) p = Project(projectConfig) ``` And inspect it: ```{r collapse=T} sampleTable(p) ``` As you can see, the resulting samples are annotated the same way as if they were read from the original, unwieldy, annotations file (enriched with the `genome` and `condition` attributes that were implied).