Or some Bismark HTML summary reports: Bismark Summary Report WGBS, Bismark Summary Report RRBS (no deduplication), or a Bismark Summary for a single-cell experiment which summarises a larger number of samples (Bismark Summary single cells data (.txt)).
Here is an overview of the alignment modes that are currently supported by Bismark: Bismark alignment modes (pdf).
Changelog
19-11-2019: 0.22.3 released (click here for the Release Notes hosted on Github)
16-10-2019: 0.22.2 released (click here for the Release Notes hosted on Github)
21-04-2019: 0.22.1 released (click here for the Release Notes hosted on Github)
16-04-2019: 0.22.0 released (click here for the Release Notes hosted on Github)
14-03-2019: 0.21.0 released (click here for the Release Notes hosted on Github)
01-02-2019: 0.20.1 released (click here for the Release Notes hosted on Github)
16-08-2018: 0.20.0 released (click here for the Release Notes hosted on Github)
27-04-2018: 0.19.1 released (click here for the Release Notes hosted on Github)
13-10-2017: 0.19.0 released (click here for the Release Notes hosted on Github)
23-05-2017: 0.18.1 released (click here for the Release Notes hosted on Github)
15-05-2017: 0.18.0 released (click here for the Release Notes hosted on Github)
18-01-2017: 0.17.0 released (click here for the Release Notes hosted on Github)
25-07-2016: 0.16.3 released (click here for the Release Notes hosted on Github)
Bismark: Essential fixes (2 in total) to address a bug for Bowtie 2 alignments where reads that should be considered ambiguous were incorrectly assigned to the first alignment thread (these errors had crept in during releases 0.16.0 and 0.16.2). More info available on Github
Bismark: Added support for large Bowtie (1) index files ending in .ebwtl which had been added in Bowtie v1.1.0
Changed the Shebang in all scripts of the Bismark suite to #!/usr/bin/env perl instead of #!/usr/bin/perl
deduplicate_bismark: Now bails with a useful error message when the input files are empty
bismark_genome_preparation: Added new option '--genomic_composition' so that the genomic composition can be calculated and written right at the genome preparation stage rather than by using bam2nuc
bam2nuc: Now also calculates a fold coverage for the various (di-)nucleotides. The changes in the nucleotide_stats text file are also picked up and plotted by bismark2report
bam2nuc: Added a new option '--genomic_composition_only' to just process the genomic sequence without requiring any data files
bismark2summary: Added option -o/--basename FILENAME to specify a certain filename. If not specified the name will remain bismark_summary_report.txt/html
bismark2summary: Added documentation and the options '--help' and '--version' to be consistent with the rest of Bismark
bismark2summary: Added option '--title STRING' to give the HTML report a different title
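The genomic composition calculation mentioned for this release (--genomic_composition and --genomic_composition_only) can be pictured as a counting exercise over the genome sequence. The sketch below is a minimal illustration and not Bismark's own Perl code: the function names nucleotide_composition and as_percentages are hypothetical, only the forward strand is counted, and any window containing N or another non-ACGT character is skipped. All of these are assumptions made for the example.

```python
from collections import Counter


def nucleotide_composition(sequence):
    """Count mono- and di-nucleotides in one sequence (forward strand only)."""
    seq = sequence.upper()
    # Mono-nucleotides: ignore N and any other non-ACGT character.
    mono = Counter(base for base in seq if base in "ACGT")
    # Di-nucleotides: overlapping windows of two, skipped if either base is not ACGT.
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in "ACGT" and seq[i + 1] in "ACGT"
    )
    return mono, di


def as_percentages(counts):
    """Convert raw counts into percentages of the total (empty dict if no counts)."""
    total = sum(counts.values())
    return {key: 100.0 * value / total for key, value in counts.items()} if total else {}


mono, di = nucleotide_composition("ACGTNACG")
print(dict(mono))
print(dict(di))
```

Running the same counting over the aligned reads and comparing the two sets of percentages is, roughly, the comparison that bam2nuc reports; the exact formula it uses for fold coverage is not given here.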
25-04-2016: 0.16.1 released (click here for the Release Notes hosted on Github)
Bismark: Removed a rogue warn/sleep statement for PE/Bowtie2 mode that had crept in during the last release...
20-04-2016: 0.16.0 released (click here for the Release Notes hosted on Github)
Bismark: File endings .fastq .fq .fastq.gz .fq.gz are now removed from the output file (unless they were specified with --basename) in a bid to reduce the length of the already long file names
Bismark: Enabled the new option --dovetail (which will be turned on by default for --pbat libraries) which will now allow dovetailing reads to be reported
Bismark: Changed the behaviour of corner cases where several non-directional alignments could have existed for the very same position but to different strands so that now the best alignment trumps the weaker one. As an example: If you relaxed the alignment criteria of a given alignment to allow 60 mismatches for PE alignment we did find an alignment to the OT strand with a combined AS of -324, but there also was an alignment to the CTOB strand with an AS of 0 (perfect alignment). The CTOB now trumps the OT alignment, and the methylation information is now reported for the bottom strand
New module: bismark2summary accepts Bismark BAM files as input. It will then try to identify Bismark reports, and optionally deduplication reports or methylation extractor (splitting) reports automatically based on the BAM file basename. It produces a tab delimited overview table (.txt) as well as a graphical HTML report. Examples can be found at Bismark Summary Report and Bismark Summary Report (.txt)
The new Bismark module bam2nuc calculates the average mono- and di-nucleotide coverage of libraries and compares this to the genomic average composition. bam2nuc can be called straight from within Bismark (option --nucleotide_coverage) or run stand-alone. bam2nuc creates a ...nucleotide_stats.txt file that is also automatically detected by bismark2report and incorporated into the HTML report
bismark_sitrep.tpl: Removed an extra function call so that the M-bias 2 plot is drawn only once the M-bias 1 plot has finished drawing (with certain browsers and data, drawing both plots in parallel may have resulted in a white placeholder only)
Methylation extractor: Altering the file path handling of coverage2cytosine and bismark2bedGraph also required some changes in the methylation extractor
bismark2bedGraph: Input file path handling has been completely reworked. The output file which can be specified as -o output.bedGraph now has to be a single file name and mustn't contain any path information. A particular output folder may be specified with -dir /any/path/
bismark2bedGraph: Addressing the file path handling issue also fixed a similar issue with the option --remove_spaces when -o had been specified
coverage2cytosine: Changed zcat for gunzip -c when reading a gzipped coverage file. This should avoid crashes on some Mac platforms, where zcat invariably requires the file name to end in .Z (which it doesn't...)
coverage2cytosine: Changed the way in which the coverage input file is handed over from the methylation_extractor to coverage2cytosine (previously the path information might have been part of the file name, but instead it will now be only part of the -dir output_directory option)
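The "best alignment trumps the weaker one" rule described for non-directional alignments in this release can be sketched as picking the candidate with the highest Bowtie 2 alignment score (AS). This is an illustration under stated assumptions rather than Bismark's implementation: the Alignment record and the best_alignment helper are hypothetical names, and keeping the first candidate on a tie is an arbitrary choice made for the sketch.

```python
from typing import NamedTuple, Optional, Sequence


class Alignment(NamedTuple):
    strand: str    # one of "OT", "OB", "CTOT", "CTOB"
    chrom: str
    position: int
    score: int     # Bowtie 2 alignment score (AS); 0 is a perfect end-to-end match


def best_alignment(candidates: Sequence[Alignment]) -> Optional[Alignment]:
    """Return the candidate with the highest AS (first one wins on a tie)."""
    if not candidates:
        return None
    return max(candidates, key=lambda aln: aln.score)


# The example from the release note: OT with AS -324 versus CTOB with AS 0.
ot = Alignment("OT", "chr1", 1000, -324)
ctob = Alignment("CTOB", "chr1", 1000, 0)
winner = best_alignment([ot, ctob])
print(winner.strand)  # prints "CTOB"
```

The chromosome name and position are made-up values; only the two scores and strand names come from the example in the release note.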
14-01-2016: 0.15.0 released (click here for the Release Notes hosted on Github)
Added option --se/--single_end [list]. This sets single-end mapping mode explicitly, giving a list of file names as [list]. The filenames may be provided as a comma (,) or colon (:) separated list
Added option --genome_folder /path/to/genome as alternative to supplying the genome as the first argument
Added an option --rg_tag to print an @RG header line as well as an RG:Z: tag to each read. The ID and SAMPLE fields default to 'SAMPLE', but can be specified manually with --rg_id or --rg_sample
Added new option --ambig_bam for Bowtie2-mode only, which writes out a single alignment for sequences with multiple alignments to a special file ending in .ambiguous.bam. The alignments are in Bowtie2 format and do not contain any Bismark specific entries such as the methylation call etc. These ambiguous BAM files are intended to be used as coverage estimators for variant callers. Works for single-end and paired-end alignments in single or multi-core mode
Added the new options --cram and --cram_ref to Bismark for both paired- and single-end alignments in single or multi-core mode. This option requires Samtools version 1.2 or higher. A genome FastA reference may be supplied as a single file with the option --cram_ref; if this is not specified the file is derived from the reference FastA file(s) used for the Bismark run, and written to the file Bismark_genome_CRAM_reference.mfa into the output directory.
deduplicate_bismark: Added better handling of cases when the input file was empty (died for percentage calculation instead of calling it N/A)
Added a note mentioning that Read1 and Read2 of paired-end files are expected to follow each other in two consecutive lines and possibly require name-sorting prior to deduplication. Also added a check that reads the first 100000 lines to see if the file appears to have been sorted and bail out if this is true
methylation extractor: Added support for CRAM files (this option requires Samtools version 1.2 or higher)
bismark2bedGraph: Changed the way gzip compressed input files are handled when using the UNIX sort command (i.e. with --scaffolds/--gazillion or without --ample_memory)
coverage2cytosine: Added option --gzip to compress output files. This currently only works for the default CpG_report and CX_report output files (and thus not with the option --gc or --split_files). The option --gzip is now also passed on from the bismark_methylation_extractor
coverage2cytosine: Added a check to bail if no information was found in the coverage file, e.g. if a wrong file path for a .cov.gz file had been specified
bismark_genome_preparation: Added process handling to the child processes
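The comma- or colon-separated file list accepted by --se/--single_end in this release amounts to splitting on either delimiter. A minimal sketch follows; the helper name parse_file_list is hypothetical, and dropping empty items (for example from a trailing comma) is an assumption, not documented Bismark behaviour.

```python
import re


def parse_file_list(value):
    """Split a comma- or colon-separated list of file names, dropping empty items."""
    return [name for name in re.split(r"[,:]", value) if name]


print(parse_file_list("lane1.fq.gz,lane2.fq.gz:lane3.fq.gz"))
```

One consequence of treating the colon as a delimiter is that file names which themselves contain a colon cannot be passed this way.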
20-08-2015: 0.14.5 released - minor fix
deduplicate_bismark: Changed all instances of literal calls of 'samtools' calls to '$samtools_path'
19-08-2015: 0.14.4 released
Bismark: Changed the FLAG values of paired-end alignments to the CTOT or CTOB strands so that reads can be properly displayed in SeqMonk when imported as BAM files. This change affects only paired-end alignments in --pbat or --non_directional mode. In detail we simply swapped the Read 1 and Read 2 FLAG values round so reads now resemble exactly concordant read pairs to the OT or OB strands. Note that results produced by the methylation extractor or further downstream of that are not affected by this change
Bismark: Input files specified with filepath information for FastA files are now handled properly in --multicore runs (this was fixed only for FastQ files in the previous patch)
Bismark: Unmapped and ambiguous files (options --unmapped and --ambiguous) are now written out as gzip compressed files by default
Bismark: Changed the default mode of operation to --bowtie2. Bowtie (1) alignments may still be chosen using the option --bowtie1
Bismark Genome Preparation: Changed the execution of the genome indexing of the parent process to system() rather than an exec() call since this seemed to lead to interesting faults when run in a pipeline setting
Bismark Genome Preparation: Changed the default indexing mode to --bowtie2. Bowtie (1) indexing is still available via the option --bowtie1
bismark2bedGraph: The coverage (.cov) and bedGraph (.bedGraph) files are now written out as gzip compressed files by default
coverage2cytosine: Added new option '--gc/--gc_context' to reprocess the genome to find methylation in GpC context. This might be useful for specialist applications where GpC methylases had been employed. The output format is exactly the same as for the normal cytosine report, and only positions covered by at least one read are reported (output file ends in .GpC_report.txt). In addition this will write out a Bismark coverage file (ending in GpC.cov)
deduplicate_bismark: Removed redundant closing statements to get rid of warning messages
deduplicate_bismark: The option --representative is no longer displayed in the help text. The option was once useful to determine the PCR bias that had been introduced by over digestion with bisulfite and is nearly always not what should be used for deduplication (it will be left in and is still functional for the time being though)
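The GpC context introduced with --gc/--gc_context in this release can be illustrated by locating the cytosines that sit in a 5'-GC-3' dinucleotide. Because GC is its own reverse complement, every such dinucleotide contributes one cytosine on each strand. The sketch below only finds candidate positions in a sequence; it does not apply the "covered by at least one read" filter mentioned above, and the function name gpc_cytosines is hypothetical.

```python
def gpc_cytosines(sequence):
    """Yield (position, strand) for cytosines in GpC context, using 1-based positions."""
    seq = sequence.upper()
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "GC":
            # Top strand: the C directly follows the G.
            yield i + 2, "+"
            # Bottom strand: the top-strand G pairs with a C that is also preceded
            # by a G when the bottom strand is read 5' to 3'.
            yield i + 1, "-"


print(list(gpc_cytosines("AGCT")))  # prints [(3, '+'), (2, '-')]
```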
06-05-2015: 0.14.3 released
Bismark: Changed the renaming settings for paired-end files so that 'sam' within the filename no longer gets renamed to 'bam' (e.g. smallsample.sam > smallbample.sam)
Bismark: Input files specified with filepath information are now handled properly in --multicore runs
Bismark: The --multicore option currently requires the output files to be in BAM format, so specifying --sam at the same time has been disallowed
Methylation Extractor: fixed another bug for the same issue as in 0.14.1 that had crept into the 0.14.2 release (to do with --ignore_3prime)
coverage2cytosine: Changed the option --merge_CpG so that CGs starting at position 1 are not considered (since the 3-base sequence context of the bottom strand C at position 2 can not be determined)
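The renaming problem fixed in this release (smallsample.sam turning into smallbample.sam) is the classic difference between replacing the first occurrence of a substring and replacing only the file extension. Bismark itself is written in Perl; the Python sketch below merely reproduces the effect, and the helper name sam_to_bam_name is hypothetical.

```python
import re


def sam_to_bam_name(filename):
    """Rename only a trailing .sam extension to .bam, leaving the rest untouched."""
    return re.sub(r"\.sam$", ".bam", filename)


# Replacing the first occurrence of "sam" anywhere in the name hits the wrong spot.
print("smallsample.sam".replace("sam", "bam", 1))  # prints "smallbample.sam"

# Anchoring the pattern to the end of the name changes just the extension.
print(sam_to_bam_name("smallsample.sam"))          # prints "smallsample.bam"
```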
27-03-2015: 0.14.2 released
Methylation Extractor: Added a bug fix for the same issue as in 0.14.1 that was overlooked in the earlier release
27-03-2015: 0.14.1 released
Bismark: Fixed the cleaning up stage in a --multicore run when --gzip had been specified as well
Bismark: Fixed the handling of files in a --multicore run when the input files had been specified including file path information
Bismark: Please note that the option -B/--basename in conjunction with --multicore is currently not supported (as in: disabled), but we are aiming to address this soon
Methylation Extractor: Fixed a bug with the position adjustment of paired-end reads when the reads should have been trimmed from their 3' ends (option --ignore_3prime)
deduplicate_bismark: Now also removing newline characters from the read conversion tag in case other programs interfered with the tag ordering and put this tag into the very last column
06-03-2015: 0.14.0 released - Bismark Parallelization
Bismark: Finally added parallelization to the Bismark alignment step using the option '--multicore int' which sets the number of parallel instances of Bismark to be run concurrently. At least in this first distribution this is achieved by forking the Bismark alignment step very early on so that each individual Spawn of Bismark (SoB?) processes only every n-th sequence (n being set by --multicore). Once all processes have completed, the individual BAM files, mapping reports, unmapped or ambiguous FastQ files are merged into single files in very much the same way as they would have been generated running Bismark conventionally with only a single instance. If system resources are plentiful this is a viable option to speed up the alignment process (we observed a near linear speed increase for up to --multicore 8 tested so far). However, please note that a typical Bismark run will use several cores already (Bismark itself, 2 or 4 threads of Bowtie/Bowtie2, Samtools, gzip etc...) and 10-16GB of memory depending on the choice of aligner and genome. WARNING: Bismark Parallel (BP?) is resource hungry! Each value of --multicore specified will effectively lead to a linear increase in compute and memory requirements, so --multicore 4 for e.g. the GRCm38 mouse genome will probably use 20 cores and eat 40GB of RAM, but at the same time reduce the alignment time to 25-30%. You have been warned...
Bismark: Changed the default output to BAM. SAM output may be requested using the option --sam
Bismark: No longer generates a piechart (.png) with the alignment stats. bismark2report generates a much nicer report anyway
Methylation Extractor: To detect paired-end alignment mode from the @PG header line, white spaces before and after -1 and -2 are now required. In some instances files containing e.g. -1-2 in their filename might previously have been identified as paired-end incorrectly
deduplicate_bismark: To detect paired-end alignment mode from the @PG header line, white spaces before and after -1 and -2 are now required
deduplicate_bismark: Added option --version so that Clusterflow can report a version number
bismark2bedGraph: Fixed path handling for cases where the input files were given with path information and an output directory had been specified as well
coverage2cytosine: Fixed a typo in the shebang which prevented coverage2cytosine from running
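The "every n-th sequence" scheme described for --multicore in this release, together with its linear resource scaling, can be sketched as follows. This is an illustration only and not Bismark's forking code: the function names are hypothetical, and the per-instance figures of 5 cores and 10 GB are back-calculated from the "--multicore 4 will probably use 20 cores and 40GB" example rather than stated limits.

```python
def reads_for_worker(reads, worker_index, n_workers):
    """Return the reads one worker handles: every n-th read, offset by its index."""
    return [read for i, read in enumerate(reads) if i % n_workers == worker_index]


def estimate_resources(multicore, cores_per_instance=5, gb_per_instance=10):
    """Rough linear estimate of (cores, GB of RAM); the defaults are assumptions."""
    return multicore * cores_per_instance, multicore * gb_per_instance


reads = ["read%d" % i for i in range(10)]
n = 4
chunks = [reads_for_worker(reads, w, n) for w in range(n)]
print(chunks[0])              # prints ['read0', 'read4', 'read8']
print(estimate_resources(n))  # prints (20, 40)
```

Every read lands in exactly one chunk, which is why the per-worker outputs can simply be merged at the end; the merged file will not necessarily keep the original read order.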
27-12-2014: 0.13.1 released
Bismark Genome Preparation: Added a check for unique chromosome names to the Bismark indexer to avoid disappointments later
Methylation Extractor: Fixed a bug for the M-bias reports when the option --multicore was used, in which case only the numbers of one core were used to construct the report. Now every different thread writes out an individual M-bias table, and once the methylation extraction has completed all these individual files are merged into a single, cumulative table as it should be
Methylation Extractor: Added a new option --mbias_off, which processes the files as normal but does not write out any M-bias files. This option is meant for users who run the methylation extractor two times, the first time to figure out whether there is a bias that needs to be removed, and the second time using the --ignore options, but without overwriting the already existent M-bias reports
bismark2bedGraph: Deferred removal of the input file path information a little so that specifying file paths doesn't prevent bismark2bedGraph from finding the input files anymore
bismark2bedGraph: If the specified output directory doesn't exist it will be created
bismark2bedGraph: Changed the way scaffolds are sorted (with --gazillion/--scaffold specified) to -k3,3V (this was done following a suggestion by Volker Brendel, Indiana University: "The -k3,3V sort option is critical when the sequence names are numbered scaffolds (without left-buffering of zeros)")
coverage2cytosine: Added a new option --merge_CpG that will post-process the genome-wide report to write out an additional coverage file which has the top and bottom strand methylation evidence pooled into a single CpG dinucleotide entity. This may be the desirable input format for some downstream processing tools such as the R-package bsseq (by K.D. Hansen). For an example please see the RELEASE_NOTES file. This option is currently experimental, and only works if CpG context only and a single genome-wide report were specified (i.e. it doesn't work with the options --CX or --split_by_chromosome)
coverage2cytosine: Changed the processing of not-covered chromosomes so that they are sorted and not processed randomly. This should make runs more reproducible
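The pooling performed by --merge_CpG in this release can be sketched as combining the top-strand cytosine of a CpG with the bottom-strand cytosine one position further along. The input layout below (chromosome, 1-based position, strand, methylated count, unmethylated count) is a simplified stand-in for the genome-wide report, and the output tuple mimics a coverage line (chromosome, start, end, percentage, methylated, unmethylated). The function name merge_cpg and the decision to skip uncovered CpGs are assumptions; the RELEASE_NOTES example mentioned above remains the authoritative description.

```python
def merge_cpg(report_rows):
    """Pool top- and bottom-strand counts of each CpG into a single entry.

    report_rows: iterable of (chrom, position, strand, methylated, unmethylated)
    for CpG-context cytosines, sorted by chromosome and position.
    """
    rows = list(report_rows)
    merged = []
    i = 0
    while i + 1 < len(rows):
        chrom, pos, strand, meth, unmeth = rows[i]
        chrom2, pos2, strand2, meth2, unmeth2 = rows[i + 1]
        # A CpG is a '+' cytosine directly followed by a '-' cytosine on the same chromosome.
        if strand == "+" and strand2 == "-" and chrom2 == chrom and pos2 == pos + 1:
            m, u = meth + meth2, unmeth + unmeth2
            if m + u > 0:  # assumption: CpGs without any coverage are not written out
                merged.append((chrom, pos, pos2, 100.0 * m / (m + u), m, u))
            i += 2
        else:
            i += 1  # unpaired row, e.g. a lone bottom-strand C at the chromosome start
    return merged


rows = [("chr1", 10, "+", 3, 1), ("chr1", 11, "-", 1, 3)]
print(merge_cpg(rows))  # prints [('chr1', 10, 11, 50.0, 4, 4)]
```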
01-10-2014: 0.13.0 released
Bismark: Fixed renaming issue for SAM to BAM files (which would have replaced any occurrence of sam in the file name, e.g. sample1_... instead of the file extension .sam)
Methylation Extractor: Added new option '--multicore INT' to set the number of cores to be used for the methylation extraction process. If system resources are plentiful this is a viable option to speed up the extraction process (we observed a near linear speed increase for up to 10 cores specified). Please note that a typical process of extracting a BAM file and writing out '.gz' output streams will in fact use 3 cores per value of --multicore INT specified (1 for the methylation extractor itself, 1 for a Samtools stream, 1 for a GZIP stream), so --multicore 10 is likely to use around 30 cores of system resources. This option has no bearing on the speed of the bismark2bedGraph or genome-wide cytosine report processes
Methylation Extractor: Added two new options '--ignore_3prime INT' (for single-end alignments and Read 1 of paired-end alignments) and '--ignore_3prime_r2 INT' (for Read 2 of paired-end a