DOE Joint Genome Institute


Reformat Guide

Reformat is designed for generic streaming read-processing tasks that have low memory or computational demands, such as format conversion, subsampling, and various filtering operations. Some of its functionality (like quality-trimming, length-filtering, histogram generation) is shared with BBDuk, in which case BBDuk will be faster; but much of it (like converting degenerate bases to N) is unique to Reformat. Because of its lower resource consumption, Reformat is often preferable to BBDuk when piping data to or from a high-resource program. This guide will ignore most of the functionality shared with BBDuk.

Reformat’s parameters are described in its shell script (reformat.sh). This guide provides usage examples for various common tasks.

*Notes*

Memory:

Reformat needs only a trivial amount of memory for processing short reads, regardless of how many there are. The only situation in which it needs more memory is when processing very long sequences, such as the human genome: by default Reformat buffers several hundred sequences in memory at a time, which for the human genome means the whole thing (over 3GB). In that situation you can reduce the number of buffered reads with the flags “readbufferlength=1 readbuffers=1”, and/or increase the amount of memory available with the -Xmx flag (see UsageGuide for details).

Threads:

Reformat only uses a single worker thread, but can use multiple I/O threads and potentially even more compression threads if pigz is installed. The “t” flag will not impact the number of worker threads, but it can be used to cap the number of compression and I/O threads used. However, even with “t=1”, Reformat will generally use over 2 CPU cores on average since the I/O is in separate threads.

Output streams:

Reformat has 2 standard output streams, “out” and “outs”. Normal reads passing any filters being used go to “out”; “outs” only captures singleton reads that pass a filter but whose mate fails the filter.
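
The routing between the two streams can be pictured with a small Python sketch. This is an illustration of the rule described above, not Reformat's actual code; the function name route_pairs and the passes predicate are hypothetical.

```python
def route_pairs(pairs, passes):
    """Split read pairs into an 'out' list and an 'outs' (singleton) list.

    pairs  : iterable of (read1, read2) tuples
    passes : hypothetical predicate, True if a read passes the filter
    """
    out, outs = [], []
    for r1, r2 in pairs:
        ok1, ok2 = passes(r1), passes(r2)
        if ok1 and ok2:
            out.extend([r1, r2])  # both mates pass: kept together as a pair
        elif ok1:
            outs.append(r1)       # only read 1 passes: singleton
        elif ok2:
            outs.append(r2)       # only read 2 passes: singleton
        # neither mate passes: the pair is dropped
    return out, outs
```

For example, with a minimum-length filter of 4, the pair ("AAAA", "C") would send "AAAA" to the singleton list and drop "C".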

Formats supported:

Please see readme_filetypes.txt.

Related tools:

Reformat shares a lot of functionality with BBDuk, which is typically faster but more resource-intensive. Some similar low-resource, streaming operations that might be expected in Reformat are instead provided by other tools:
rename.sh does various renaming operations;
repair.sh/bbsplitpairs.sh does reordering of paired reads that have lost synchronization;
readlength.sh has more advanced length histogram control options;
filterbyname.sh for name-based filtering;
fuse.sh and split.sh for concatenating and shredding sequences;
phylip2fasta for phylip reformatting;
translate6frames for AminoAcid<->Nucleotide conversion.

*Usage Examples*

To reformat fastq to fasta:

reformat.sh in=reads.fastq out=reads.fasta

The same pattern applies to any file format conversion. For example, “reformat.sh in=reads.fa.gz out=reads.sam” will convert gzipped fasta reads to uncompressed sam; the reads won’t be mapped, of course. If you wish to convert sam to fastq, it is recommended that you add the “primaryonly” flag to avoid getting duplicate copies of reads.

To run reformat in a loop, and automatically rename files appropriately:

reformat.sh in=read#.fq out=%.fa

This will convert read1.fq and read2.fq (expanded from the # symbol) to read1.fa and read2.fa (expanded from the % symbol).

To change quality encoding:

reformat.sh in=reads.fq out=reads.fq qin=33 qout=64

This will convert ASCII-33 qualities (Sanger, modern Illumina, and all other platforms) to ASCII-64 (obsolete Illumina).
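
To make the offset arithmetic concrete, here is a minimal Python sketch of re-encoding one quality string. It is only an illustration of the ASCII shift; the function name convert_quality is hypothetical and this is not how Reformat is implemented.

```python
def convert_quality(qual, qin=33, qout=64):
    """Re-encode a FASTQ quality string from one ASCII offset to another.

    Each character encodes the score (ord(c) - qin); the same score is
    written back as chr(score + qout).
    """
    return "".join(chr(ord(c) - qin + qout) for c in qual)
```

A score of 40 is written as "I" in ASCII-33 and as "h" in ASCII-64, so convert_quality("I") gives "h".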

To convert between fastq and fasta+qual:

reformat.sh in=reads.fq out=reads.fa qfout=reads.qual
or
reformat.sh in=reads.fa qfin=reads.qual out=reads.fq
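
As a rough picture of what the fasta+qual pair contains, the sketch below splits one fastq record into a fasta record and a qual record. It assumes ASCII-33 input and the common qual layout of space-separated integer scores; the function name fastq_to_fasta_qual is hypothetical.

```python
def fastq_to_fasta_qual(name, seq, qual, offset=33):
    """Turn one FASTQ record into a (fasta_record, qual_record) pair.

    The qual record repeats the header and lists one integer score per
    base, separated by spaces (assumed layout).
    """
    fasta = f">{name}\n{seq}\n"
    scores = " ".join(str(ord(c) - offset) for c in qual)
    qual_record = f">{name}\n{scores}\n"
    return fasta, qual_record
```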

To interleave or deinterleave paired reads:

reformat.sh in=reads.fq out1=read1.fq out2=read2.fq
or
reformat.sh in1=read1.fq in2=read2.fq out=reads.fq
or to be concise,
reformat.sh in=read#.fq out=reads.fq
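
Conceptually, an interleaved file alternates read 1 and read 2 of each pair. The following Python sketch shows that ordering on in-memory lists; the function names are hypothetical and the sketch ignores file I/O.

```python
def interleave(reads1, reads2):
    """Merge two equally long read lists as r1, r2, r1, r2, ..."""
    if len(reads1) != len(reads2):
        raise ValueError("paired files must contain the same number of reads")
    merged = []
    for r1, r2 in zip(reads1, reads2):
        merged.extend([r1, r2])
    return merged


def deinterleave(reads):
    """Split an interleaved list back into (reads1, reads2)."""
    if len(reads) % 2:
        raise ValueError("interleaved input must hold an even number of reads")
    return reads[0::2], reads[1::2]
```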

To change fasta word-wrap limits:

reformat.sh in=reads.fa out=wrapped.fa fastawrap=70
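
The effect of a wrap limit can be sketched in Python as follows. This only illustrates the line-breaking; the function name wrap_fasta is hypothetical.

```python
def wrap_fasta(name, seq, width=70):
    """Format one FASTA record with sequence lines of at most `width` bases."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([f">{name}"] + lines) + "\n"
```

A 150-base sequence wrapped at 70 would be written as lines of 70, 70 and 10 bases.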

To verify that reads appear to be correctly paired, based on their names:

reformat.sh in=reads.fq vint
or (for reads in 2 files)
reformat.sh in=read#.fq vpair

If it is acceptable for reads to have identical names, rather than the usual /1 and /2 or 1: and 2: at the end, add the flag “allowidenticalnames”.
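
A name-based pairing check of this kind might look like the Python sketch below. The regular expressions and function names are assumptions made for illustration; Reformat's actual rules may accept more name styles.

```python
import re

# Assumed name styles: "name/1" and "name/2", or "name 1:..." and "name 2:..."
_SLASH = re.compile(r"^(.*)/([12])$")
_ILLUMINA = re.compile(r"^(\S+)\s+([12]):")


def split_name(name):
    """Return (base_name, mate_number) or (name, None) if no mate tag is found."""
    for pattern in (_SLASH, _ILLUMINA):
        match = pattern.match(name)
        if match:
            return match.group(1), match.group(2)
    return name, None


def names_look_paired(name1, name2, allow_identical=False):
    """True if the two names look like mate 1 and mate 2 of the same pair."""
    base1, mate1 = split_name(name1)
    base2, mate2 = split_name(name2)
    if mate1 == "1" and mate2 == "2":
        return base1 == base2
    # Untagged names only count as paired when identical names are allowed.
    return allow_identical and name1 == name2
```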

To discard reads that have mismatching lengths of bases and qualities:

reformat.sh in=reads.fq out=fixed.fq tossbrokenreads

Note that this flag should be used with caution, since mismatched lengths normally mean the input file is corrupt.
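
The check itself is simple, as this Python sketch shows. It is an illustration only; the function name toss_broken is hypothetical, and counting the discarded reads is an addition meant to make a corrupt input easier to notice.

```python
def toss_broken(records):
    """Keep only (name, seq, qual) records whose seq and qual lengths match.

    Returns the kept records and the number of records discarded.
    """
    kept = [(n, s, q) for n, s, q in records if len(s) == len(q)]
    return kept, len(records) - len(kept)
```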

To add a “/1” and “/2” to the names of paired reads that don’t have them:

reformat.sh in=reads.fq out=renamed.fq addslash int

To change whitespace in read names to underscores:

reformat.sh in=reads.fq out=renamed.fq underscore

Or, to trim read names after the first whitespace:

reformat.sh in=reads.fq out=renamed.fq trd

BBTools by default always use the full name of a sequence. However, some other programs ignore everything after the first whitespace, so these options are often useful for compatibility with them.
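
The difference between the two renaming options can be sketched in Python. The function names are hypothetical, and whether Reformat replaces each whitespace character individually (as here) or collapses runs is an assumption.

```python
import re


def underscore(name):
    """Replace every whitespace character in a read name with an underscore."""
    return re.sub(r"\s", "_", name)


def trim_description(name):
    """Keep only the part of a read name before the first whitespace."""
    return re.split(r"\s", name, maxsplit=1)[0]
```

For the name “read1 1:N:0:ATCACG”, the first function gives “read1_1:N:0:ATCACG” and the second gives “read1”.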

To reverse-complement reads:

reformat.sh in=reads.fq out=out.fq rcomp
or, for just read2:
reformat.sh in=reads.fq out=out.fq rcompmate
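
For reference, reverse-complementing a read means complementing each base and reversing the order; the quality string has to be reversed as well so each score stays with its base. The Python sketch below handles only A, C, G and T (other characters are left as they are); the function names are hypothetical.

```python
# Complement table for the four standard bases, upper and lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_complement_read(seq, qual):
    """Reverse-complement the bases and reverse the qualities to match."""
    return reverse_complement(seq), qual[::-1]
```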

To change lowercase letters to uppercase:

reformat.sh in=reads.fq out=out.fq tuc

To perform arbitrary remapping of input bases:

reformat.sh in=reads.fq out=out.fq remap=aZGP

The map consists of a series of pairs, in this case “aZ” and “GP”. This will change “a” to “Z” and “G” to “P”, and leave all other characters unchanged.

To convert degenerate bases (IUPAC characters) to Ns:

reformat.sh in=reads.fq out=out.fq iupacton
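
Both the remap and iupacton operations are per-character substitutions, which the Python sketch below illustrates. The function names are hypothetical, and treating upper- and lowercase A, C, G, T and N as the only non-degenerate characters is an assumption.

```python
def remap(seq, spec):
    """Apply a remap spec such as "aZGP": pairs of (from, to) characters."""
    if len(spec) % 2:
        raise ValueError("remap spec must contain an even number of characters")
    table = str.maketrans(spec[0::2], spec[1::2])
    return seq.translate(table)  # characters not in the map are unchanged


def iupac_to_n(seq):
    """Replace every character other than A, C, G, T or N with N."""
    return "".join(c if c in "ACGTNacgtn" else "N" for c in seq)
```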

To ensure all sequences in a file have unique names:

reformat.sh in=reads.fq out=out.fq uniquenames

If a name is duplicated, the additional copies will have “_number” appended to them to ensure all names are unique. Note that unlike most other functions, this is NOT streaming and requires storing all names in memory. As a result, it can use a substantial amount of memory.
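
The reason this operation cannot stream is visible in a small Python sketch: every name seen so far has to be remembered. The function name unique_names and the exact numbering (starting at 1) are assumptions for illustration.

```python
def unique_names(names):
    """Append "_number" to repeated names so that every output name is unique."""
    counts = {}   # how many suffixes have been tried for each original name
    used = set()  # every name emitted so far; this is what costs memory
    result = []
    for name in names:
        candidate = name
        while candidate in used:
            counts[name] = counts.get(name, 0) + 1
            candidate = f"{name}_{counts[name]}"
        used.add(candidate)
        result.append(candidate)
    return result
```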

To cap quality scores into a certain range:

reformat.sh in=reads.fq out=out.fq mincalledquality=2 maxcalledquality=41

This is useful for preventing abnormal quality scores (such as in error-corrected PacBio reads) that can break some programs.
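
Capping amounts to clamping each score into the given range, as in this Python sketch. It assumes ASCII-33 encoding and ignores any special handling Reformat may apply to no-called bases; the function name cap_quality is hypothetical.

```python
def cap_quality(qual, min_q=2, max_q=41, offset=33):
    """Clamp every score in a quality string into [min_q, max_q]."""
    return "".join(
        chr(min(max(ord(c) - offset, min_q), max_q) + offset) for c in qual
    )
```

With the defaults, a score of 0 (“!”) is raised to 2 (“#”) and a score of 93 (“~”) is lowered to 41 (“J”).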

Lawrence Berkeley National Lab Biosciences Area
A project of the US Department of Energy, Office of Science

JGI is a DOE Office of Science User Facility managed by Lawrence Berkeley National Laboratory

© 1997-2023 The Regents of the University of California