pepr
This vignette shows how and why to use the implied attributes functionality of the pepr package.
For basic information about the PEP concept, see the project website. For a broader theoretical description, see the implied attributes documentation section.
Implied attributes save time and effort when several sample attributes have to be defined for many samples and their values follow a pattern. Consider the following sample table:
sample_name | organism | time | file_path | genome | genome_size |
---|---|---|---|---|---|
frog_0h | frog | 0 | data/lab/project/frog_0h.fastq | | |
frog_1h | frog | 1 | data/lab/project/frog_1h.fastq | | |
human_1h | human | 1 | data/lab/project/human_1h.fastq | hg38 | hs |
human_0h | human | 0 | data/lab/project/human_0h.fastq | hg38 | hs |
mouse_1h | mouse | 1 | data/lab/project/mouse_1h.fastq | mm10 | mm |
mouse_0h | mouse | 0 | data/lab/project/mouse_1h.fastq | mm10 | mm |
Notably, the samples with the values human and mouse in the organism column follow two distinct patterns: they carry additional values in the genome and genome_size columns of the sample_table.csv file. You can therefore use implied attributes to add those attributes to the sample annotations, that is, set global, species-level attributes at the project level instead of duplicating that information for every sample that belongs to a species. How this is done is stated explicitly in the project_config.yaml file (presented below).
pep_version: 2.0.0
sample_table: sample_table.csv
looper:
  output_dir: $HOME/hello_looper_results
sample_modifiers:
  imply:
    - if:
        organism: human
      then:
        genome: hg38
        macs_genome_size: hs
    - if:
        organism: mouse
      then:
        genome: mm10
        macs_genome_size: mm
The rules live in sample_modifiers.imply, a multi-level key-value section of the project_config.yaml file. Note that the keys in each if block must match column names in the sample_table.csv file, and the values must match the attribute values found in those columns.
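To make the if/then semantics concrete, here is a minimal base R sketch of what applying such rules to a table amounts to. It is only an illustration under stated assumptions: the apply_imply helper and the rules list layout are hypothetical names made up for this example, and this is not how pepr implements the feature internally.

```r
# A subset of the sample table, without the implied columns
samples <- data.frame(
  sample_name = c("frog_0h", "human_1h", "mouse_1h"),
  organism = c("frog", "human", "mouse"),
  stringsAsFactors = FALSE
)

# The two imply rules from the config, written as a plain R list
# (hypothetical layout chosen for this sketch)
rules <- list(
  list("if" = list(organism = "human"),
       then = list(genome = "hg38", macs_genome_size = "hs")),
  list("if" = list(organism = "mouse"),
       then = list(genome = "mm10", macs_genome_size = "mm"))
)

# Hypothetical helper: for each rule, find the rows that satisfy every
# condition in "if" and set the attributes listed in "then" on those rows
apply_imply <- function(tbl, rules) {
  for (rule in rules) {
    matches <- rep(TRUE, nrow(tbl))
    for (key in names(rule[["if"]])) {
      matches <- matches & tbl[[key]] %in% rule[["if"]][[key]]
    }
    for (key in names(rule[["then"]])) {
      # Create the column, filled with empty strings, the first time it is implied
      if (is.null(tbl[[key]])) tbl[[key]] <- ""
      tbl[[key]][matches] <- rule[["then"]][[key]]
    }
  }
  tbl
}

implied <- apply_imply(samples, rules)
print(implied)
```

Rows that match no rule (the frog sample here) keep empty values in the implied columns, which mirrors the blank cells in the original table.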
Now let’s modify the original sample_table.csv file so that it relies on the sample_modifiers.imply section of the config. Simply leave out the attributes that will be implied and let pepr do the work for you.
sample_name | organism | time | file_path |
---|---|---|---|
frog_0h | frog | 0 | data/lab/project/frog_0h.fastq |
frog_1h | frog | 1 | data/lab/project/frog_1h.fastq |
human_1h | human | 1 | data/lab/project/human_1h.fastq |
human_0h | human | 0 | data/lab/project/human_0h.fastq |
mouse_1h | mouse | 1 | data/lab/project/mouse_1h.fastq |
mouse_0h | mouse | 0 | data/lab/project/mouse_1h.fastq |
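If you would like to try this setup outside the package examples, the sketch below writes a reduced sample table and a matching config to a temporary directory. It assumes a three-row subset of the table above, omits the looper section, and uses illustrative names (pep_dir, config_path) that do not come from pepr. The project is loaded only if pepr is installed.

```r
# Work in a throwaway directory (pep_dir is an illustrative name)
pep_dir <- file.path(tempdir(), "example_imply")
dir.create(pep_dir, showWarnings = FALSE, recursive = TRUE)

# Reduced sample table: the implied columns are left out entirely
samples <- data.frame(
  sample_name = c("frog_0h", "human_1h", "mouse_1h"),
  organism = c("frog", "human", "mouse"),
  time = c(0L, 1L, 1L),
  stringsAsFactors = FALSE
)
samples$file_path <- paste0("data/lab/project/", samples$sample_name, ".fastq")
write.csv(samples, file.path(pep_dir, "sample_table.csv"),
          row.names = FALSE, quote = FALSE)

# Project config with one imply rule per organism
config_lines <- c(
  "pep_version: 2.0.0",
  "sample_table: sample_table.csv",
  "sample_modifiers:",
  "  imply:",
  "    - if:",
  "        organism: human",
  "      then:",
  "        genome: hg38",
  "        macs_genome_size: hs",
  "    - if:",
  "        organism: mouse",
  "      then:",
  "        genome: mm10",
  "        macs_genome_size: mm"
)
config_path <- file.path(pep_dir, "project_config.yaml")
writeLines(config_lines, config_path)

# Optional: load the project only when pepr is available
if (requireNamespace("pepr", quietly = TRUE)) {
  p_local <- pepr::Project(config_path)
  print(pepr::sampleTable(p_local))
}
```

Keeping the table and the config side by side in one directory lets the relative sample_table entry in the config resolve without further path handling.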
Read in the project metadata by specifying the path to the project_config.yaml file:
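library(pepr)
# Branch of the bundled example PEPs; "master" matches the path printed below
branch = "master"
# Locate the example project config shipped with the pepr package
projectConfig = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_imply",
  "project_config.yaml",
  package = "pepr"
)
# Create the Project object from the config file
p = Project(projectConfig)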
#> Loading config file: /tmp/RtmptrVshf/Rinstad543d17118/pepr/extdata/example_peps-master/example_imply/project_config.yaml
And inspect it:
sampleTable(p)
#> sample_name organism time file_path genome
#> <char> <char> <int> <char> <char>
#> 1: frog_0h frog 0 data/lab/project/frog_0h.fastq
#> 2: frog_1h frog 1 data/lab/project/frog_1h.fastq
#> 3: human_1h human 1 data/lab/project/human_1h.fastq hg38
#> 4: human_0h human 0 data/lab/project/human_0h.fastq hg38
#> 5: mouse_1h mouse 1 data/lab/project/mouse_1h.fastq mm10
#> 6: mouse_0h mouse 0 data/lab/project/mouse_1h.fastq mm10
#> macs_genome_size
#> <char>
#> 1:
#> 2:
#> 3: hs
#> 4: hs
#> 5: mm
#> 6: mm
As you can see, the resulting samples are annotated exactly as if they had been read from the original annotation file, in which the values in the last two columns were entered manually.
In addition, the p object contains all the information from the project config file (project_config.yaml). Run the following line to explore it:
config(p)
#> Config object. Class: Config
#> pep_version: 2.0.0
#> sample_table:
#> /tmp/RtmptrVshf/Rinstad543d17118/pepr/extdata/example_peps-master/example_imply/sample_table.csv
#> looper:
#> output_dir: /github/home/hello_looper_results
#> sample_modifiers:
#> imply:
#> if:
#> organism: human
#> then:
#> genome: hg38
#> macs_genome_size: hs
#> if:
#> organism: mouse
#> then:
#> genome: mm10
#> macs_genome_size: mm
#> name: example_imply