Commit e4affbe

Merge pull request #129 Added report system
2 parents 9885389 + 281225b commit e4affbe

117 files changed

+3243
-282
lines changed

changelog.md

Lines changed: 42 additions & 0 deletions
@@ -1,5 +1,45 @@
11
# Changelog
22

3+
## 1.3.0
4+
5+
### Features
6+
- Added `report` run mode to Flowcraft that displays the report of any given
7+
pipeline in the Flowcraft web application. The `report` mode can be executed
8+
after a pipeline ended or during the pipeline execution using the `--watch`
9+
option.
10+
- Added standalone report HTML at the end of the pipeline execution.
11+
- Components with support for the new report system:
12+
- `abricate`
13+
- `assembly_mapping`
14+
- `check_coverage`
15+
- `chewbbaca`
16+
- `dengue_typing`
17+
- `fastqc`
18+
- `fastqc_trimmomatic`
19+
- `integrity_coverage`
20+
- `mlst`
21+
- `patho_typing`
22+
- `pilon`
23+
- `process_mapping`
24+
- `process_newick`
25+
- `process_skesa`
26+
- `process_spades`
27+
- `process_viral_assembly`
28+
- `seq_typing`
29+
- `trimmomatic`
30+
- `true_coverage`
31+
32+
### Minor/Other changes
33+
34+
- Refactored report json for components `mash_dist`, `mash_screen` and
35+
`mapping_patlas`
36+
37+
### Bug fixes
38+
- Fixed issue where `seq_typing` and `patho_typing` processes were not feeding
39+
report data to report compiler.
40+
- Fixed fail messages for `process_assembly` and `process_viral_assembly`
41+
components
42+
343
## 1.2.2
444

545
### Components changes
@@ -9,6 +49,8 @@ sam and bam files and added data to .report.json. Updated databases to pATLAS
949
version 1.5.2.
1050
- `mash_screen` and `mash_dist`: added data to .report.json. Updated databases
1151
to pATLAS version 1.5.2.
52+
- Added new options to `abricate` component. Users can now provide custom database
53+
directories, minimum coverage and minimum identity parameters.
1254

1355
### New components
1456

docs/_static/custom.css

Lines changed: 4 additions & 0 deletions
@@ -4,4 +4,8 @@ div.wy-side-nav-search, div.wy-nav-top {
44

55
.wy-menu > .caption > .caption-text {
66
color: #5c6bc0;
7+
}
8+
9+
.wy-nav-content {
10+
max-width: 100%
711
}

docs/dev/create_process.rst

Lines changed: 2 additions & 0 deletions
@@ -116,6 +116,8 @@ must be used **only once**. Like in the input channel, this channel should
116116
be defined with a two element tuple with the sample ID and the data. The
117117
sample ID must match the one specified in the ``input_channel``.
118118

119+
.. _compiler:
120+
119121
{% include "compiler_channels.txt %}
120122
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
121123

docs/dev/pipeline_reporting.rst

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
1+
Pipeline reporting
2+
==================
3+
4+
This section describes how the reports of a FlowCraft pipeline are generated
5+
and collected at the end of a run. These reports can then be sent to the
6+
`FlowCraft web application <https://github.com/assemblerflow/flowcraft-webapp>`_
7+
where the results are visualized.
8+
9+
.. important::
10+
Note that if the nextflow process reports add new types of data, one or
11+
more React components need to be added to the web application for them
12+
to be rendered.
13+
14+
Data collection
15+
---------------
16+
17+
The data for the pipeline reports is collected from three dotfiles in each nextflow
18+
process (they should be present in each work sub directory):
19+
20+
- **.report.json**: Contains report data (See :ref:`report-json` for more information).
21+
- **.versions**: Contains information about the versions of the software used
22+
(See :ref:`versions` for more information).
23+
- **.command.trace**: Contains resource usage information.
24+
25+
The **.command.trace** file is generated by nextflow when the **trace** scope
26+
is active. The **.report.json** and **.versions** files are specific to
27+
FlowCraft pipelines.
28+
29+
Generation of dotfiles
30+
^^^^^^^^^^^^^^^^^^^^^^
31+
32+
Empty **.report.json** and **.versions** dotfiles are automatically generated
33+
by the ``{% include "post.txt" ignore missing %}`` placeholder, specified in the
34+
:ref:`create-process` section. Using this placeholder in your processes is all
35+
that is needed.
36+
37+
Collection of dotfiles
38+
^^^^^^^^^^^^^^^^^^^^^^
39+
40+
The **.report.json**, **.versions** and **.command.trace** files are automatically
41+
collected and sent to dedicated report channels in the pipeline by the
42+
``{%- include "compiler_channels.txt" ignore missing -%}`` placeholder, specified
43+
in the :ref:`process creation <compiler>` section. Placing this placeholder in your
44+
processes will generate the following line in the output channel specification::
45+
46+
set {{ sample_id|default("sample_id") }}, val("{{ task_name }}_{{ pid }}"), val("{{ pid }}"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_{{task_name}}_{{ pid }}
47+
48+
This line collects several pieces of metadata associated with the process, along with the three
49+
dotfiles.
50+
51+
Compilation of dotfiles
52+
^^^^^^^^^^^^^^^^^^^^^^^
53+
54+
As mentioned in the previous section, the dotfiles and other relevant metadata
55+
are sent through special report channels to a FlowCraft component that is
56+
responsible for compiling all the information and generating a single report
57+
file at the end of each pipeline run.
58+
59+
This component is specified in ``flowcraft.generator.templates.report_compiler.nf``
60+
and it consists of two nextflow processes:
61+
62+
- First, the **report** process receives the data from each executed process that
63+
sends report data and runs the ``flowcraft/bin/prepare_reports.py`` script
64+
on that data. This script simply merges the metadata and dotfile information
65+
in a single JSON file. This file contains the following keys:
66+
67+
- ``reportJson``: The data in **.report.json** file.
68+
- ``versions``: The data in **.versions** file.
69+
- ``trace``: The data in **.command.trace** file.
70+
- ``processId``: The process ID
71+
- ``pipelineId``: The pipeline ID that defaults to one, unless specified in
72+
the parameters.
73+
- ``projectid``: The project ID that defaults to one, unless specified in
74+
the parameters.
75+
- ``userId``: The user ID that defaults to one, unless specified in
76+
the parameters.
77+
- ``username``: The user name that defaults to *user*, unless specified in
78+
the parameters
79+
- ``processName``: The name of the flowcraft component.
80+
- ``workdir``: The work directory where the process was executed.
81+
82+
- Second, all JSON files created in the process above are merged
83+
and a single reports JSON file is created. This file has the
84+
following structure::
85+
86+
reportJSON = {
87+
"data": {
88+
"results": [<array of report JSONs>]
89+
}
90+
}
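
The two compilation steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names ``build_process_report`` and ``compile_reports`` are hypothetical and are not taken from ``prepare_reports.py`` or ``report_compiler.nf``, and the dotfile contents are assumed to be already parsed into Python objects. Only the key names and default values come from the list above.

```python
import json


def build_process_report(report_json, versions, trace, process_id,
                         process_name, workdir, pipeline_id=1,
                         project_id=1, user_id=1, username="user"):
    """Merge dotfile data and metadata into one per-process record.

    Hypothetical helper: only the key names and defaults follow the
    documentation; the real script may differ.
    """
    return {
        "reportJson": report_json,
        "versions": versions,
        "trace": trace,
        "processId": process_id,
        "pipelineId": pipeline_id,
        "projectid": project_id,
        "userId": user_id,
        "username": username,
        "processName": process_name,
        "workdir": workdir,
    }


def compile_reports(process_reports):
    """Wrap all per-process records in the final report structure."""
    return {"data": {"results": list(process_reports)}}


# Two made-up process records with empty versions and trace data.
records = [
    build_process_report({"tableRow": []}, [], {}, "1", "fastqc",
                         "work/ab/123"),
    build_process_report({"plotData": []}, [], {}, "2", "trimmomatic",
                         "work/cd/456"),
]

final_report = compile_reports(records)
print(json.dumps(final_report, indent=2))
```

The IDs and user name fall back to the documented defaults when they are not passed explicitly, which mirrors the "unless specified in the parameters" behaviour described for each key.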

docs/dev/process_dotfiles.rst

Lines changed: 46 additions & 17 deletions
@@ -44,15 +44,22 @@ execution of the process. When this occurs, the ``.status`` channel must have
4444
the ``fail`` string as well. As in the warning dotfile, there is no
4545
particular format for the fail message.
4646

47+
.. _report-json:
48+
4749
Report JSON
4850
-----------
4951

52+
.. important::
53+
The general specification of the report JSON changed in version 1.2.2.
54+
See the `issue tracker <https://github.com/assemblerflow/flowcraft/issues/96>`_
55+
for details.
56+
5057
The ``.report.json`` file stores any information from a given process that is
5158
deemed worthy of being reported and displayed at the end of the pipeline.
5259
Any information can be stored in this file, as long as it is in JSON format,
5360
but there are a couple of recommendations that are necessary to follow
5461
for them to be processed by a reporting web app (Currently hosted at
55-
`report-nf <https://github.com/ODiogoSilva/report-nf>`_). However, if
62+
`flowcraft-webapp <https://github.com/assemblerflow/flowcraft-webapp>`_). However, if
5663
data processing will be performed with custom scripts, feel free to specify
5764
your own format.
5865

@@ -63,33 +70,53 @@ Information meant to be displayed in tables should be in the following
6370
format::
6471

6572
json_dic = {
66-
"tableRow": [
67-
{"header": "Raw BP",
68-
"value": chars,
69-
"table": "assembly",
70-
"columnBar": True},
73+
"tableRow": [{
74+
"sample": "A",
75+
"data": [{
76+
"header": "Raw BP",
77+
"value": 123,
78+
"table": "qc"
79+
}, {
80+
"header": "Coverage",
81+
"value": 32,
82+
"table": "qc"
83+
}]
84+
}, {
85+
"sample": "B",
86+
"data": [{
87+
"header": "Coverage",
88+
"value": 35,
89+
"table": "qc"
90+
}]
91+
}]
7192
}
7293

73-
This means that the ``chars`` variable that is created during the execution
74-
of the process should appear as a table entry with the specified ``header``
75-
and ``value``. The ``table`` key specifies in which table of the reports
76-
it will appear and the ``columnBar`` key informs the report generator to
77-
create a bar column in that particular cell.
94+
This provides table information for multiple samples in the same process. In
95+
this case, data for two samples is provided. For each sample, values for
96+
one or more headers can be provided. For instance, this report provides
97+
information about the **Raw BP** and **Coverage** for sample **A** and this
98+
information should go to the **qc** table. If any other information is relevant
99+
to build the table, feel free to add more elements to the JSON.
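
As a minimal sketch of how a process script might assemble this structure, the Python snippet below builds the same ``tableRow`` entries from per-sample values. The helper names ``table_entry`` and ``build_table_report`` are hypothetical and are not part of FlowCraft; only the ``sample``, ``data``, ``header``, ``value`` and ``table`` keys come from the format above.

```python
import json


def table_entry(header, value, table):
    """Return one table cell in the documented format."""
    return {"header": header, "value": value, "table": table}


def build_table_report(samples):
    """Build the tableRow structure.

    ``samples`` maps a sample name to a list of
    (header, value, table) tuples. Hypothetical helper.
    """
    return {
        "tableRow": [
            {"sample": sample, "data": [table_entry(*cell) for cell in cells]}
            for sample, cells in samples.items()
        ]
    }


json_dic = build_table_report({
    "A": [("Raw BP", 123, "qc"), ("Coverage", 32, "qc")],
    "B": [("Coverage", 35, "qc")],
})

# Serialize compactly; a process would write this text to .report.json.
report_text = json.dumps(json_dic, separators=(",", ":"))
print(report_text)
```

Dictionaries preserve insertion order in Python 3.7 and later, so the samples appear in the report in the order they were provided.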
78100

79101
Information for plots
80102
^^^^^^^^^^^^^^^^^^^^^
81103

82104
Information meant to be displayed in plots should be in the following format::
83105

84106
json_dic = {
85-
"plotData": {
86-
"size_dist": size_dist
87-
}
107+
"plotData": [{
108+
"sample": "strainA",
109+
"data": {
110+
"sparkline": 23123,
111+
"otherplot": [1,2,3]
112+
}
113+
}],
88114
}
89115

90-
This is a simple key:value pair, where the key is the ID of the plot in the
91-
reports and the ``size_dist`` contains the plot data that was gathered
92-
for a particular process.
116+
As in the table JSON, *plotData* should be an array with an entry for each
117+
sample. The data for each sample should be another JSON where the keys are
118+
the *plot signatures*, so that we know to which plot the data belongs. The
119+
corresponding values are whatever data object you need.
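
The per-sample layout can be illustrated with a short Python sketch that adds data for one plot signature at a time. The function ``add_plot_data`` is a hypothetical helper, not part of FlowCraft, and the signatures ``sparkline`` and ``otherplot`` are simply the placeholder names used in the example above.

```python
import json


def add_plot_data(json_dic, sample, signature, data):
    """Attach data for a plot signature to the entry of a sample.

    A new entry is created the first time a sample is seen.
    Hypothetical helper for illustration only.
    """
    entries = json_dic.setdefault("plotData", [])
    for entry in entries:
        if entry["sample"] == sample:
            entry["data"][signature] = data
            return json_dic
    entries.append({"sample": sample, "data": {signature: data}})
    return json_dic


json_dic = {}
add_plot_data(json_dic, "strainA", "sparkline", 23123)
add_plot_data(json_dic, "strainA", "otherplot", [1, 2, 3])

print(json.dumps(json_dic, indent=2))
```

Both calls target the same sample, so the result holds a single ``plotData`` entry whose ``data`` object carries the two plot signatures.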
93120

94121
Other information
95122
^^^^^^^^^^^^^^^^^
@@ -99,6 +126,8 @@ is not particular format for other information. They will simply store the
99126
data of interest to report and it will be the job of a downstream report app
100127
to process that data into an actual visual report.
101128

129+
.. _versions:
130+
102131
Versions
103132
--------
104133
