Getting Started
Prerequisites
DeepLEAP is built using Nextflow version 25.04.2 which must be installed on your system, and depends upon a container runtime such as Docker or Singularity. Other containerization systems may work as long as they can convert Docker images to their own format.
Nextflow Installation
Nextflow requires Java to be installed on your system. Thereafter, you can install Nextflow by running the following command in your terminal:
And you can check that this has been successful by running: Further information can be found in the Nextflow documentation.Container Runtime
Note, you only need to have one of these installed on your system.
Docker Installation
Instructions for installing Docker can be found on the Docker website. You can check if Docker is installed correctly by running:
Singularity Installation
You can find instructions for installing Singularity on the Singularity website. You can check if Singularity is installed correctly by running:
Installation
Simply clone this repository into the directory of your choice:
Usage
The repository has a sample dataset in the sample_data directory which we can use to test the pipeline. Navigate to the deepleap directory, and run the following command, replacing ${deepleap_root} with the path to the directory you cloned this repo to:
nextflow run -c nextflow.config main.nf \
--run_name testA \
--samplesheet ${deepleap_root}/sample_data/samplesheet.csv \
--reference_file ${deepleap_root}/sample_data/hxb2_env.fasta \
--aligner MAFFT \
-profile test,docker \
--region_of_interest envelope-polyprotein \
--sample_base_dir ${deepleap_root}/sample_data/input \
-output-dir ./testoutput/ \
--skip_trim
Note
Depending on your docker setup (if using docker), you may need to enter your sudo password. The prompt for this sometimes gets hidden by the rest of the Nextflow output, so be sure to check your terminal if the pipeline seems to hang.
Parameter explanation
--run_name: A name for this pipeline run. Is used in logging and report generation.--samplesheet: Path to a CSV file containing sample information. Seesample_data/samplesheet.csvfor an example.--reference_file: Path to a FASTA file containing the reference sequence for alignment.--aligner: The alignment tool to use.--region_of_interest: The genomic region to focus on. This does not have to be in any format, and is largely present for legacy reasons.--sample_base_dir: Path to the directory containing input FASTA files. This is the directory where the files specified in the samplesheet are will be looked for.--skip_trim: Optional flag to skip the trimming step. In this case it is necessary since the input files are already trimmed to the region of interest.-profile: Specifies the configuration profile to use. In this example, we use thetestprofile for testing purposes which will cap the memory usage to the maximum available on the machine and thedockerprofile to run the pipeline using Docker containers. You can also use thesingularityprofile if you prefer Singularity.-c: Specifies the path to the Nextflow configuration file. By default, Nextflow looks for a file namednextflow.configin the current directory, but the option is specified here for clarity.-output-dir: Path to the directory where output files will be saved.
Outputs
The outputs of the pipeline will be saved in the directory specified by the -output-dir option. The structure of the output directory will be as follows:
``` testoutput/ ├── amino_acid_alignments/ ├── execution_report/ ├── functional_filter ├── logs/ ├── nucleotide_alignments/ ├── rejected_sequences/ └── trimmed_sequences/