Biopython 1.75 released

Dear Biopythoneers,

Biopython 1.75 has been released and is available from our website and PyPI.

This release of Biopython supports Python 2.7, 3.5, 3.6 and 3.7, and is expected to work on the soon-to-be-released Python 3.8. It has also been tested on PyPy2.7.13 v7.1.1 and PyPy3.6.1 v7.1.1-beta0.

Note that we intend to drop Python 2.7 support in early 2020.

The restriction enzyme list in Bio.Restriction has been updated to the August 2019 release of REBASE.

Bio.SeqIO now supports reading and writing files in the native format of Christian Marck’s DNA Strider program (“xdna” format, also used by Serial Cloner), as well as reading files in the native formats of GSL Biotech’s SnapGene (“snapgene”) and Textco Biosoftware’s Gene Construction Kit (“gck”).

Bio.AlignIO now supports GCG MSF multiple sequence alignments as the “msf” format (work funded by the National Marrow Donor Program).

The main Seq object now has string-like .index() and .rindex() methods, matching the existing .find() and .rfind() implementations. The MutableSeq object retains its more list-like .index() behaviour.
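As a minimal sketch of the string-like behaviour (the sequence below is made up for illustration), .index() and .rindex() return the same positions as .find() and .rfind() when the subsequence is present, but raise ValueError rather than returning -1 when it is absent:

```python
from Bio.Seq import Seq

seq = Seq("ACGTACGT")  # arbitrary example sequence

first_gt = seq.index("GT")   # leftmost match, same as seq.find("GT")
last_gt = seq.rindex("GT")   # rightmost match, same as seq.rfind("GT")

# A missing subsequence: find() returns -1, index() raises ValueError
not_found = seq.find("TTT")
try:
    seq.index("TTT")
    raised = False
except ValueError:
    raised = True

print(first_gt, last_gt, not_found, raised)
```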

The new MMTFIO class allows MMTF files to be written from a Biopython structure object. MMTFIO has a similar interface to PDBIO and MMCIFIO, including the use of a Select class to write out a specified selection. This completes read/write support for PDB/mmCIF/MMTF in Biopython, allowing conversion between all three file formats.

Values from mmCIF files are now read in as a list even when they consist of a single value. This change improves consistency and reduces the likelihood of making an error, but will require user code to be updated accordingly.
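To illustrate the kind of update user code may need, here is a small sketch that parses a two-line mmCIF fragment (invented for this example) from an in-memory handle. It assumes MMCIF2Dict accepts an open handle as well as a filename. A single-valued item comes back as a one-element list, so code that previously used the string directly needs to take the first element:

```python
from io import StringIO

from Bio.PDB.MMCIF2Dict import MMCIF2Dict

# Tiny made-up mmCIF fragment with one single-valued data item
cif_text = "data_demo\n_entry.id DEMO\n"

mmcif_dict = MMCIF2Dict(StringIO(cif_text))

values = mmcif_dict["_entry.id"]  # a list, even for a single value
entry_id = values[0]              # take the first element to get the string

print(values, entry_id)
```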

Bio.PDB has been updated to support parsing REMARK 99 header entries from PDB-style Astral files.

A new keyword parameter full_sequences was added to Bio.pairwise2’s pretty-print function format_alignment to restore the output of local alignments to the ‘old’ format (showing the whole sequences, including the un-aligned parts, instead of only the aligned parts).
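A brief sketch of the new keyword, using two short made-up sequences and arbitrary scoring values: by default format_alignment prints only the aligned region of a local alignment, while full_sequences=True prints both sequences in full.

```python
from Bio import pairwise2
from Bio.pairwise2 import format_alignment

# Local alignment of two short example sequences
# (match = 1, mismatch = -1, gap open = -2, gap extend = -1; values are arbitrary)
alignments = pairwise2.align.localms("TTACGTT", "ACG", 1, -1, -2, -1)
best = alignments[0]

# Default output: only the aligned part of each sequence
aligned_only = format_alignment(*best)

# 'Old' style output: whole sequences, including the un-aligned flanks
whole = format_alignment(*best, full_sequences=True)

print(aligned_only)
print(whole)
```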

A new function charge_at_pH(pH) has been added to ProtParam and IsoelectricPoint in Bio.SeqUtils.
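As a hedged illustration (the peptide is invented, and the pH values are arbitrary), the net charge can be queried either through ProteinAnalysis or through the IsoelectricPoint class directly:

```python
from Bio.SeqUtils.IsoelectricPoint import IsoelectricPoint
from Bio.SeqUtils.ProtParam import ProteinAnalysis

peptide = "MKRDEHAG"  # made-up peptide with acidic and basic residues

analysis = ProteinAnalysis(peptide)
charge_acidic = analysis.charge_at_pH(2.0)   # expected to be net positive
charge_basic = analysis.charge_at_pH(12.0)   # expected to be net negative

# The same calculation via the IsoelectricPoint class
iep = IsoelectricPoint(peptide)
charge_neutral_iep = iep.charge_at_pH(7.0)
charge_neutral_pa = analysis.charge_at_pH(7.0)

print(charge_acidic, charge_basic, charge_neutral_iep)
```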

The PairwiseAligner in Bio.Align was extended to allow generalized pairwise alignments, i.e. alignments of any Python object, for example three-letter amino acid sequences, three-nucleotide codons, and arrays of integers.

A new module substitution_matrices was added to Bio.Align, which includes an Array class that can be used as a substitution matrix. As the Array class is a subclass of a NumPy array, mathematical operations can be applied to it directly, and C code that makes use of substitution matrices can directly access the numerical values stored in the substitution matrices. This module is intended as a replacement for Bio.SubsMat, which is currently unmaintained.
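A short sketch of the new module, assuming the bundled BLOSUM62 matrix is loaded by name: scores can be looked up by residue letters, and because the object is a NumPy array it can be handed to ordinary NumPy code.

```python
import numpy as np

from Bio.Align import substitution_matrices

blosum62 = substitution_matrices.load("BLOSUM62")

# Look up scores by residue letters
score_aa = blosum62["A", "A"]
score_ww = blosum62["W", "W"]

# The matrix is a NumPy array subclass, so NumPy code can use it directly
is_ndarray = isinstance(blosum62, np.ndarray)
values = np.asarray(blosum62)  # plain ndarray view of the numerical values
is_symmetric = bool((values == values.T).all())

print(score_aa, score_ww, is_ndarray, is_symmetric)
```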

As in recent releases, more of our code is now explicitly available under either our original “Biopython License Agreement”, or the very similar but more commonly used “3-Clause BSD License”. See the LICENSE.rst file for more details.

Additionally, a number of small bugs and typos have been fixed, with further additions to the test suite, and there has been further work to follow the Python PEP8 and PEP257 coding style standards and best practice. We have also started to use the black Python code formatting tool.

Many thanks to the Biopython developers and community for making this release possible, especially the following contributors:

  • Chris MacRaild
  • Chris Rands
  • Damien Goutte-Gattat (first contribution)
  • Devang Thakkar
  • Harry Jubb
  • Joe Greener
  • Kiran Mukhyala (first contribution)
  • Konstantin Vdovkin
  • Mark Amery
  • Markus Piotrowski
  • Mike Moritz (first contribution)
  • Mustafa Anil Tuncel
  • Nick Negretti
  • Peter Cock
  • Peter Kerpedjiev
  • Sergio Valqui
  • Spencer Bliven