---
title: "Subsample table in pepr"
author: "Michal Stolarczyk & Nathan Sheffield"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Subsample table in pepr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Learn sample subannotations in `pepr`

This vignette will show you how and why to use the subsample table functionality of the `pepr` package.

- For basic information about the PEP concept, visit the [project website](http://pep.databio.org/en/2.0.0/).
- For a broader theoretical description, see the subsample table [documentation section](http://pep.databio.org/en/2.0.0/specification/#project-attribute-subsample_table).

## Problem/Goal

The examples below demonstrate how and why to use the sample subannotation functionality to **provide multiple input files of the same type for a single sample**.

## Solutions

### Example 1: basic sample subannotation table

This example demonstrates how the sample subannotation functionality is used. In this example, 2 samples have multiple input files that need merging (`frog_1` and `frog_2`), while 1 sample (`frog_3`) does not. Therefore, `frog_3` specifies its file in the `sample_table.csv` file, while the others leave that field blank and instead specify several files in the `subsample_table.csv` file.
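
To make the idea concrete, here is a minimal base R sketch of how the rows of a subsample table could be collapsed into one multi-valued `file` attribute per sample. It is an illustration only, not the `pepr` implementation; the two data frames and all file paths are made-up stand-ins for the example files.

```r
# Hypothetical toy tables mirroring the layout described above (values are made up)
sampleTableToy = data.frame(
  sample_name = c("frog_1", "frog_2", "frog_3"),
  file = c(NA, NA, "data/frog3_data.txt"),
  stringsAsFactors = FALSE
)
subsampleTableToy = data.frame(
  sample_name = c("frog_1", "frog_1", "frog_2", "frog_2"),
  subsample_name = c("sub_a", "sub_b", "sub_a", "sub_b"),
  file = c(
    "data/frog1a_data.txt", "data/frog1b_data.txt",
    "data/frog2a_data.txt", "data/frog2b_data.txt"
  ),
  stringsAsFactors = FALSE
)

# Group the subsample files by the sample they belong to
filesBySample = split(subsampleTableToy$file, subsampleTableToy$sample_name)

# For each sample, prefer the subsample files; otherwise keep the sample table value
mergedFiles = lapply(seq_len(nrow(sampleTableToy)), function(i) {
  name = sampleTableToy$sample_name[i]
  if (name %in% names(filesBySample)) {
    filesBySample[[name]]
  } else {
    sampleTableToy$file[i]
  }
})
names(mergedFiles) = sampleTableToy$sample_name

# frog_1 and frog_2 now carry two files each, frog_3 carries one
str(mergedFiles)
```

The result is a named list in which each sample maps to a character vector of files, which is the shape of attribute a multi-file sample needs.
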
This example is made up of these components:

* Project config file:

```{r, echo=FALSE,message=TRUE,collapse=TRUE,comment=" "}
branch = "master"
library(pepr)
projectConfig = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable1",
  "project_config.yaml",
  package = "pepr"
)
.printNestedList(yaml::read_yaml(projectConfig))
```

* Sample table:

```{r ,echo=FALSE}
library(knitr)
sampleAnnotation = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable1",
  "sample_table.csv",
  package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = TRUE)
kable(sampleAnnotationDF, format = "html")
```

* Subsample table:

```{r ,echo=FALSE}
sampleAnnotation = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable1",
  "subsample_table.csv",
  package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = TRUE)
kable(sampleAnnotationDF, format = "html")
```

Let's create the Project object and see if multiple files are present:

```{r}
projectConfig1 = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable1",
  "project_config.yaml",
  package = "pepr"
)
p1 = Project(projectConfig1)
# Check the files
p1Samples = sampleTable(p1)
p1Samples$file
# Check the subsample names
p1Samples$subsample_name
```

And inspect the whole table in the `p1@samples` slot:

```{r,echo=FALSE}
kable(p1Samples)
```

You can also access a single subsample by calling the `getSubsample` method with the appropriate combination of `sample_name` and `subsample_name` attributes. Note that this is only possible if the `subsample_name` column is defined in the `subsample_table.csv` file.

```{r}
sampleName = "frog_1"
subsampleName = "sub_a"
getSubsample(p1, sampleName, subsampleName)
```

### Example 2: subannotations and derived attributes

This example uses a `subsample_table.csv` file and derived attributes to point to files. This is a rather complex example.
Notice we must include the `file_id` column in the `sample_table.csv` file, and leave it blank; this is then populated by just some of the samples (`frog_1` and `frog_2`) in the `subsample_table.csv`, but is left empty for the samples that are not merged.

This example is made up of these components:

* Project config file:

```{r, echo=FALSE,message=TRUE,collapse=TRUE,comment=" "}
projectConfig = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable2",
  "project_config.yaml",
  package = "pepr"
)
.printNestedList(yaml::read_yaml(projectConfig))
```

* Sample annotation table:

```{r ,echo=FALSE}
sampleAnnotation = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable2",
  "sample_table.csv",
  package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = TRUE)
kable(sampleAnnotationDF, format = "html")
```

* Sample subannotation table:

```{r ,echo=FALSE}
sampleAnnotation = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable2",
  "subsample_table.csv",
  package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = TRUE)
kable(sampleAnnotationDF, format = "html")
```

Let's load the project config, create the Project object and see if multiple files are present:

```{r}
projectConfig2 = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable2",
  "project_config.yaml",
  package = "pepr"
)
p2 = Project(projectConfig2)
# Check the files
p2Samples = sampleTable(p2)
p2Samples$file
```

And inspect the whole table in the `p2@samples` slot:

```{r,echo=FALSE}
kable(p2Samples)
```

### Example 3: subannotations and expansion characters

This example gives the exact same results as Example 2, but in this case, uses a wildcard for `frog_2` instead of including it in the `subsample_table.csv` file. Since we can't use a wildcard and a subannotation for the same sample, this necessitates specifying a second data source class (`local_files_unmerged`) that uses an asterisk (`*`).
The outcome is the same.

This example is made up of these components:

* Project config file:

```{r, echo=FALSE,message=TRUE,collapse=TRUE,comment=" "}
projectConfig = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable3",
  "project_config.yaml",
  package = "pepr"
)
.printNestedList(yaml::read_yaml(projectConfig))
```

* Sample annotation table:

```{r ,echo=FALSE}
sampleAnnotation = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable3",
  "sample_table.csv",
  package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = TRUE)
kable(sampleAnnotationDF, format = "html")
```

* Sample subannotation table:

```{r ,echo=FALSE}
sampleAnnotation = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable3",
  "subsample_table.csv",
  package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = TRUE)
kable(sampleAnnotationDF, format = "html")
```

Let's load the project config, create the Project object and see if multiple files are present:

```{r}
projectConfig3 = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable3",
  "project_config.yaml",
  package = "pepr"
)
p3 = Project(projectConfig3)
# Check the files
p3Samples = sampleTable(p3)
p3Samples$file
```

And inspect the whole table in the `p3@samples` slot:

```{r,echo=FALSE}
kable(p3Samples)
```

### Example 4: subannotations and multiple (separate-class) inputs

Merging is for same-class inputs (e.g., multiple files for read1). Different-class inputs (e.g., read1 vs. read2) are handled by different attributes (or columns). This example shows you how to handle paired-end data, while also merging files within each class.
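
The distinction between same-class and different-class inputs can be sketched in base R: each attribute column is grouped by sample independently, so `read1` files are only ever merged with other `read1` files. This is a conceptual illustration under assumed inputs, not how `pepr` does it internally; the table and file names below are hypothetical.

```r
# Hypothetical subsample table with two different-class attributes (made-up paths)
pairedSubsampleToy = data.frame(
  sample_name = c("frog_1", "frog_1", "frog_2"),
  read1 = c("frog1a_R1.fq", "frog1b_R1.fq", "frog2_R1.fq"),
  read2 = c("frog1a_R2.fq", "frog1b_R2.fq", "frog2_R2.fq"),
  stringsAsFactors = FALSE
)

# Every column except the sample identifier is treated as its own input class
attributeCols = setdiff(colnames(pairedSubsampleToy), "sample_name")

# Merge within each class: group that column's values by sample name
mergedByAttribute = lapply(attributeCols, function(col) {
  split(pairedSubsampleToy[[col]], pairedSubsampleToy$sample_name)
})
names(mergedByAttribute) = attributeCols

# Two read1 files and two read2 files for frog_1, kept in separate attributes
mergedByAttribute$read1$frog_1
mergedByAttribute$read2$frog_1
```

Keeping one grouping per column means the pairing between `read1` and `read2` is preserved by position within each sample.
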
This example is made up of these components:

* Project config file:

```{r, echo=FALSE,message=TRUE,collapse=TRUE,comment=" "}
project_config = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable4",
  "project_config.yaml",
  package = "pepr"
)
.printNestedList(yaml::read_yaml(project_config))
```

* Sample annotation table:

```{r ,echo=FALSE}
sampleAnnotation = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable4",
  "sample_table.csv",
  package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = TRUE)
kable(sampleAnnotationDF, format = "html")
```

* Sample subannotation table:

```{r ,echo=FALSE}
sampleAnnotation = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable4",
  "subsample_table.csv",
  package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = TRUE)
kable(sampleAnnotationDF, format = "html")
```

Let's load the project config, create the Project object and see if multiple files are present:

```{r}
projectConfig4 = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable4",
  "project_config.yaml",
  package = "pepr"
)
p4 = Project(projectConfig4)
# Check the read1 and read2 columns
p4Samples = sampleTable(p4)
p4Samples$read1
p4Samples$read2
```

And inspect the whole table in the `p4@samples` slot:

```{r,echo=FALSE}
kable(p4Samples)
```