Commit fc18199

dlaehnemann, Addimator, and fxwiegand authored
feat!: allow for multiple gene_lists (#164)
This will generate one heatmap per list, and bootstrap plots for each of the genes in the union of the lists.

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **New Features**
  * Support multiple named gene lists: one heatmap and one bootstrap plot per list; list name appears in filenames, logs, and outputs.
  * Ability to exclude samples via a YAML list.
* **Refactor**
  * Heatmap generation moved to a tidyverse/ggplot pipeline; bootstrap plotting updated to operate per-list/per-transcript.
* **Documentation**
  * Clarified multi-list usage, per-list plotting, sample-exclude format, and Ensembl/Pfam guidance.
* **Chores**
  * Default bootstrap FDR changed to 0.5; Ensembl updated to 115 and Pfam to 37.1; R environment/dependency definitions updated.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Adrian Prinz <44083468+Addimator@users.noreply.github.com>
Co-authored-by: Felix Wiegand <fxwiegand@wgdnet.de>
1 parent e580493 commit fc18199
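
The behavior described in the commit message (one heatmap per named list, bootstrap plots for every gene in the union of the lists) can be sketched in plain Python. This is only an illustration, not the workflow's code: the helper name `plan_plots` is hypothetical, and the example lists are shortened selections of gene symbols from the test resources.

```python
def plan_plots(gene_lists):
    """Derive plot targets from a mapping of list name -> gene symbols.

    Hypothetical helper for illustration only.
    """
    # One heatmap per named list.
    heatmaps = list(gene_lists)
    # Bootstrap plots cover every gene that appears in any list (the union).
    bootstrap_genes = sorted(set().union(*gene_lists.values()))
    return heatmaps, bootstrap_genes


# Shortened example lists (assumption: not the full test files).
example = {
    "gene_list_1": ["MLANA", "MITF", "CDK2", "SOX10"],
    "gene_list_2": ["SOX10", "ERBB3", "LEF1"],
}
heatmaps, bootstrap_genes = plan_plots(example)
```

A gene that occurs in more than one list, such as `SOX10` here, ends up in the union only once.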

14 files changed: +207 -150 lines changed

.test/config/config.yaml

Lines changed: 6 additions & 4 deletions
@@ -15,9 +15,9 @@ experiment:
 resources:
   ref:
     species: homo_sapiens
-    release: "114"
+    release: "115"
     build: GRCh38
-    pfam: "37.0"
+    pfam: "37.1"
     representative_transcripts: canonical
   ontology:
     gene_ontology: "https://release.geneontology.org/2025-07-22/ontology/go-basic.obo"
@@ -44,7 +44,9 @@ diffexp:
     qq-plot: 0.05
   genes_of_interest:
     activate: true
-    genelist: "resources/gene_list.tsv"
+    gene_lists:
+      gene_list_1: "resources/gene_list.tsv"
+      gene_list_2: "resources/gene_list_2.tsv"
 
 diffsplice:
   activate: false
@@ -81,7 +83,7 @@ report:
   offer_excel: true
 
 bootstrap_plots:
-  FDR: 0.01
+  FDR: 0.5 # Intentionally high for testing.
   top_n: 3
   color_by: condition
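
Because the single `genelist` key is replaced by a `gene_lists` mapping (hence the `feat!` marker), existing configs need a small adjustment. The following sketch shows one way such a conversion could look on an already parsed `genes_of_interest` section. The function `migrate_genes_of_interest` and the default name `gene_list_1` are assumptions for illustration; the commit itself ships no migration helper.

```python
import copy


def migrate_genes_of_interest(goi, list_name="gene_list_1"):
    """Return a copy of a `genes_of_interest` section that uses `gene_lists`.

    Hypothetical helper: `list_name` is the name given to the single
    list found under the old `genelist` key.
    """
    migrated = copy.deepcopy(goi)
    old_path = migrated.pop("genelist", None)
    # Only convert when there is an old entry and no new-style mapping yet.
    if old_path is not None and "gene_lists" not in migrated:
        migrated["gene_lists"] = {list_name: old_path}
    return migrated


old_section = {"activate": True, "genelist": "resources/gene_list.tsv"}
new_section = migrate_genes_of_interest(old_section)
```

Since the list name becomes part of the heatmap filename, a descriptive name is preferable to the placeholder default.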

.test/resources/gene_list.tsv

Lines changed: 1 addition & 8 deletions
@@ -8,11 +8,4 @@ DCT
 MLANA
 MITF
 CDK2
-SOX10
-ERBB3
-LEF1
-CTNNB1
-CDH1
-FN1
-NGFR
-AXL
+SOX10

.test/resources/gene_list_2.tsv

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+MLANA
+MITF
+CDK2
+SOX10
+ERBB3
+LEF1
+CTNNB1
+CDH1
+FN1
+NGFR
+AXL

.test/three_prime/config/config.yaml

Lines changed: 3 additions & 1 deletion
@@ -32,6 +32,7 @@ scatter:
 
 diffexp:
   exclude:
+    - SRR8309099_2
   models:
     model_X:
       full: ~condition
@@ -44,7 +45,8 @@ diffexp:
     qq-plot: 0.05
   genes_of_interest:
     activate: false
-    genelist: "resources/gene_list.tsv"
+    gene_lists:
+      gene_list_1: "resources/gene_list.tsv"
 
 diffsplice:
   activate: false

.test/three_prime/config/samples.tsv

Lines changed: 2 additions & 1 deletion
@@ -4,4 +4,5 @@ SRR8309094 Control
 SRR8309095 Treated
 SRR8309097 Treated
 SRR8309098 Control
-SRR8309099 Treated
+SRR8309099 Treated
+SRR8309099_2 Treated

.test/three_prime/config/units.tsv

Lines changed: 2 additions & 1 deletion
@@ -4,4 +4,5 @@ SRR8309094 u1 430 43 quant_seq_test_data/SRR8309094.fastq.gz
 SRR8309095 u1 430 43 quant_seq_test_data/SRR8309095.fastq.gz
 SRR8309097 u1 430 43 quant_seq_test_data/SRR8309097.fastq.gz
 SRR8309098 u1 430 43 quant_seq_test_data/SRR8309098.fastq.gz
-SRR8309099 u1 430 43 quant_seq_test_data/SRR8309099.fastq.gz
+SRR8309099 u1 430 43 quant_seq_test_data/SRR8309099.fastq.gz
+SRR8309099_2 u1 430 43 quant_seq_test_data/SRR8309099.fastq.gz
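
The three_prime test setup adds a duplicate sample, `SRR8309099_2`, and lists it under `diffexp: exclude`. How such an exclude list could be applied to the sample names is sketched below. This is a minimal illustration: the function `samples_to_model` is hypothetical, only the sample names visible in the hunk are used, and the workflow's real filtering is not shown in this diff.

```python
def samples_to_model(sample_names, exclude=None):
    """Drop excluded samples while keeping the original order.

    `exclude` may be None, which is what an `exclude:` key without
    entries parses to in YAML.
    """
    excluded = set(exclude or [])
    return [name for name in sample_names if name not in excluded]


# Sample names as visible in the samples.tsv hunk above.
samples = [
    "SRR8309094",
    "SRR8309095",
    "SRR8309097",
    "SRR8309098",
    "SRR8309099",
    "SRR8309099_2",
]
kept = samples_to_model(samples, exclude=["SRR8309099_2"])
```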

config/config.yaml

Lines changed: 29 additions & 18 deletions
@@ -24,27 +24,30 @@ resources:
     # For a quick check, see the Ensembl species list:
     # https://www.ensembl.org/info/about/species.html
     # For full valid species names, consult the respective table for the release
-    # you specify, for example for ‘114’ this is at:
-    # https://ftp.ensembl.org/pub/release-114/species_EnsemblVertebrates.txt
+    # you specify, for example for ‘115’ this is at:
+    # https://ftp.ensembl.org/pub/release-115/species_EnsemblVertebrates.txt
     # And to browse available downloads in more detail, see the FTP server:
     # https://ftp.ensembl.org/pub/
     species: homo_sapiens
     # ensembl release version:
     # Update this to the latest working version, when you first set up a new
-    # analysis on a dataset. Later, it only makes sense to update (or downgrade)
-    # the release versions if either (a) the version you are using consistently
-    # fails to download (some Ensembl release versions are just broken) or
-    # (b) you know that a newer version will include changes that will fix some
-    # error or adds transcripts that will be relevant to your analysis.
-    release: "114"
+    # analysis on a dataset. You can usually find the latest release in the
+    # Ensembl blog, by looking at the latest posts of the release category:
+    # https://www.ensembl.info/category/01-release/
+    # Later, it only makes sense to update (or downgrade) the release versions
+    # if either (a) the version you are using consistently fails to download
+    # (some Ensembl release versions are just broken) or (b) you know that a
+    # newer version will include changes that will fix some error or adds
+    # transcripts that will be relevant to your analysis.
+    release: "115"
     # genome build:
     # Usually, this should just be the main build listed in:
-    # https://ftp.ensembl.org/pub/release-114/species_EnsemblVertebrates.txt
+    # https://ftp.ensembl.org/pub/release-115/species_EnsemblVertebrates.txt
     # For example, for homo_sapiens, you strip the assembly column entry
     # "GRCh38.p12" down to "GRCh38". If in doubt, navigate to the respective
     # cdna folder on the FTP server, and look for the correct build in the
     # file names there. For example "GRCh38" in:
-    # https://ftp.ensembl.org/pub/release-114/fasta/homo_sapiens/cdna/
+    # https://ftp.ensembl.org/pub/release-115/fasta/homo_sapiens/cdna/
     build: GRCh38
     # pfam release:
     # This is used for annotation of domains in differential splicing analysis.
@@ -54,7 +57,7 @@ resources:
     # https://xfam.wordpress.com/
     # For debugging file downloads, you can browse the FTP download server:
     # https://ftp.ebi.ac.uk/pub/databases/Pfam/releases/
-    pfam: "37.0"
+    pfam: "37.1"
     # representative transcripts:
     # Strategy for selecting a representative transcript for each gene.
     # kallisto quantifies expression on the transcript level. For datasets
@@ -91,8 +94,11 @@ scatter:
 
 diffexp:
   # Samples to exclude from differential expression modeling (for example,
-  # outliers due to technical problems).
+  # outliers due to technical problems). List their `sample_name` column
+  # entry in the form of a YAML list.
   exclude:
+    # - sample_X
+    # - sample_Y
   # model for sleuth differential expression analysis
   # For an introduction to sleuth, see its online manual:
   # https://pachterlab.github.io/sleuth/about
@@ -135,14 +141,19 @@ diffexp:
     volcano-plot: 0.05
     ma-plot: 0.05
     qq-plot: 0.05
-  # heatmap and bootstrap plots for given set of genes:
-  # If you want a heatmap and bootstrap plots for a particular set of genes,
-  # you can set `activate: true` and provide a genelist file. In this file,
-  # list all your HGNC gene symbols of interest in one line, separated by
-  # tabulators (tabs).
+  # heatmap and bootstrap plots for given sets of genes:
+  # If you want a heatmap and bootstrap plots for particular sets of genes,
+  # you can set `activate: true` and provide one or more gene list files. In
+  # those files, list all your HGNC gene symbols of interest, one gene per line.
+  # The workflow will generate one heatmap plot per gene set file, and a
+  # bootstrap plot for each of the genes contained in any of the provided files.
   genes_of_interest:
     activate: false
-    genelist: "config/gene_list.tsv"
+    gene_lists:
+      # Use a descriptive gene list name, as this will be part of
+      # the filename of the resulting heatmap plot.
+      gene_list_1: "config/gene_list.tsv"
+
 
 diffsplice:
   # isoformSwitchAnalyzer
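
The updated config comment changes the gene list file format from one tab-separated line to one HGNC symbol per line. A reader for the new format might look like the sketch below. It is an assumption-laden illustration: `parse_gene_list` is a hypothetical name, and skipping blank lines and duplicates is a design choice of this example, not documented behavior of the workflow (whose plotting scripts are written in R and are not part of this excerpt).

```python
import io


def parse_gene_list(handle):
    """Read gene symbols from a text handle, one symbol per line.

    Blank lines and repeated symbols are skipped (an assumption of
    this sketch); the first-seen order is preserved.
    """
    genes = []
    seen = set()
    for line in handle:
        symbol = line.strip()
        if not symbol or symbol in seen:
            continue
        seen.add(symbol)
        genes.append(symbol)
    return genes


# In-memory stand-in for a file such as config/gene_list.tsv.
example_file = io.StringIO("MLANA\nMITF\n\nCDK2\nMLANA\n")
genes = parse_gene_list(example_file)
```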

workflow/envs/heatmap.yaml

Lines changed: 3 additions & 3 deletions
@@ -3,6 +3,6 @@ channels:
   - bioconda
   - nodefaults
 dependencies:
-  - r-pheatmap =1.0.12
-  - r-dplyr =1.0.9
-  - r-tidyr =1.2.0
+  - r-base >=4.1
+  - r-tidyverse =2.0
+  - r-ggalign =1.1

workflow/envs/sleuth.yaml

Lines changed: 1 addition & 1 deletion
@@ -8,5 +8,5 @@ dependencies:
   - r-pheatmap =1.0.12
   - r-tidyverse =2.0
   - r-ggpubr =0.6
-  - r-base =4
+  - r-base >=4.1
   - bioconductor-limma =3.56

workflow/rules/common.smk

Lines changed: 6 additions & 13 deletions
@@ -242,15 +242,6 @@ def kallisto_params(wildcards, input):
     return extra
 
 
-def input_genelist(predef_genelist):
-    if config["diffexp"]["genes_of_interest"]["activate"] == True:
-        predef_genelist = config["diffexp"]["genes_of_interest"]["genelist"]
-    else:
-        predef_genelist = []
-
-    return predef_genelist
-
-
 def all_input(wildcards):
     """
     Function defining all requested inputs for the rule all (below).
@@ -329,20 +320,22 @@ def all_input(wildcards):
                 "results/tables/tpm-matrix/{model}.tpm-matrix.tsv",
                 "results/sleuth/{model}.samples.tsv",
                 "results/datavzrd-reports/diffexp-{model}",
-                "results/plots/diffexp-heatmap/{model}.diffexp-heatmap.{mode}.pdf",
+                "results/plots/diffexp-heatmap/{model}.diffexp-heatmap.{gene_list}.pdf",
             ],
             model=config["diffexp"]["models"],
-            mode=["topn"],
+            gene_list=["topn"],
         )
     )
     if config["diffexp"]["genes_of_interest"]["activate"]:
         wanted_input.extend(
             expand(
                 [
-                    "results/plots/diffexp-heatmap/{model}.diffexp-heatmap.{mode}.pdf",
+                    "results/plots/diffexp-heatmap/{model}.diffexp-heatmap.{gene_list}.pdf",
                 ],
                 model=config["diffexp"]["models"],
-                mode=["predefined"],
+                gene_list=lookup(
+                    within=config, dpath="diffexp/genes_of_interest/gene_lists"
+                ),
             )
         )
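
To see which heatmap files the renamed `{gene_list}` wildcard leads to, the expansion can be imitated in plain Python without importing Snakemake. This sketch assumes that the `lookup` call resolves to the `gene_lists` mapping and that its keys serve as the `{gene_list}` values; the diff does not state this explicitly, so treat it as an assumption. The function `heatmap_targets` is hypothetical.

```python
from itertools import product

PATTERN = "results/plots/diffexp-heatmap/{model}.diffexp-heatmap.{gene_list}.pdf"


def heatmap_targets(models, gene_lists, activate):
    """List heatmap paths for the `topn` heatmap plus one per named gene list.

    Plain-Python imitation of the two expand() calls; not Snakemake code.
    """
    names = ["topn"]
    if activate:
        # Assumption: the keys of the gene_lists mapping become wildcard values.
        names.extend(gene_lists)
    return [
        PATTERN.format(model=model, gene_list=name)
        for model, name in product(models, names)
    ]


targets = heatmap_targets(
    models={"model_X": {"full": "~condition"}},
    gene_lists={
        "gene_list_1": "resources/gene_list.tsv",
        "gene_list_2": "resources/gene_list_2.tsv",
    },
    activate=True,
)
```

With one model and two named lists, this yields three paths: the `topn` heatmap and one heatmap per list.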
