--- title: "Sample modifiers in pepr: imply" author: "Michal Stolarczyk" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Sample modifiers in pepr: imply} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Learn implied sample modifier in `pepr` This vignette will show you how and why to use the implied attributes functionality of the `pepr` package. - basic information about the PEP concept on the [project website](http://pep.databio.org/en/2.0.0) - broader theoretical description in the implied attributes [documentation section](http://pep.databio.org/en/2.0.0/specification/#sample_modifiersimply). ## Problem/Goal The example below demonstrates how and why to use implied attributes functionality to **save your time and effort** in case multiple sample attributes need to be defined for many samples and they **follow certain patterns**. Please consider the example below for reference: ```{r ,echo=FALSE} branch = "master" library(knitr) sampleAnnotation = system.file( "extdata", paste0("example_peps-", branch), "example_imply", "sample_table_pre.csv", package = "pepr" ) sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = T) knitr::kable(sampleAnnotationDF, format = "html") ``` ---- ## Solution Noticeably, the samples with attributes `human` and `mouse` (in the `organism` column) follow two distinct patterns here. They have additional attributes in attributes `genome` and `genome_size` in the `sample_table.csv` file. Consequently you can use implied attributes to add those attributes to the sample annotations (set global, species-level attributes at the project level instead of duplicating that information for every sample that belongs to a species). The way how this process is carried out is indicated explicitly in the `project_config.yaml` file (presented below). ```{r, echo=FALSE,message=FALSE,collapse=TRUE,comment=" "} library(pepr) projectConfig = system.file( "extdata", paste0("example_peps-", branch), "example_imply", "project_config.yaml", package = "pepr" ) .printNestedList(yaml::read_yaml(projectConfig)) ``` Consequently, you can design `sample_modifiers.imply` - a multi-level key-value section in the `project_config.yaml` file. Note that the keys must match the column names and attributes in the `sample_annotations.csv` file. Let's introduce a few modifications to the original `sample_table.csv` file to use the `sample_modifiers.imply` section of the config. Simply skip the attributes that will be implied and let the `pepr` do the work for you. ```{r ,echo=FALSE} sampleAnnotation = system.file( "extdata", paste0("example_peps-", branch), "example_imply", "sample_table.csv", package = "pepr" ) sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = T) kable(sampleAnnotationDF, format = "html") ``` ---- ## Code Rread in the project metadata by specifying the path to the `project_config.yaml`: ```{r} projectConfig = system.file( "extdata", paste0("example_peps-", branch), "example_imply", "project_config.yaml", package = "pepr" ) p = Project(projectConfig) ``` And inspect it: ```{r} sampleTable(p) ``` As you can see, the resulting samples are annotated the same way as if they were read from the original annotations file with attributes in the two last columns manually determined. What is more, the `p` object consists of all the information from the project config file (`project_config.yaml`). Run the following line to explore it: ```{r} config(p) ```