Generation and Use of Substitution Matrices in Biopython

Iddo Friedberg*(1) and Brad Chapman(2)

* Corresponding author

(1) Dept. of Molecular Genetics and Biotechnology, The Hebrew University of Jerusalem, POB 12272, Jerusalem 91120, Israel. Email: idoerg@cc.huji.ac.il
(2) Department of Crop and Soil Science, University of Georgia, GA, USA

Substitution matrices provide a means of scoring an alignment, multiple or pairwise, between protein sequences. Commonly used examples are the PAM and BLOSUM series. Substitution matrices are usually derived from multiple sequence alignments of proteins. However, matrices based on structural alignments and matrices incorporating physico-chemical information have also been derived.

As more research is conducted using tailored subsets of sequence and structure databases, there is a need for an easy way to derive substitution matrices from alignments, and to analyze and compare them. This is especially true when such tailored subsets are far from representative of protein sequence space, since representativeness is an underlying assumption of the commonly used matrices.

Biopython provides several tools for the generation, analysis, and comparison of alignments. The module SubsMat can be used in conjunction with those tools to easily generate a substitution matrix from an alignment. SubsMat features the following:

* Generation of observed frequency matrices, relative frequency matrices, and log-odds matrices.
* Arithmetic operations on matrices.
* Relative and absolute entropy calculations.
* Comparison with other matrices: correlation, Jensen-Shannon divergence.
* Access to existing substitution matrices.
* Formatted output.

SubsMat is presented here within the framework of Biopython, and its implementation and use are discussed. An example of substitution matrices generated from structural alignments of sequence-dissimilar proteins will be shown, along with their analysis.
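
The relationship between an observed frequency matrix, a log-odds matrix, and relative entropy can be illustrated with a small pure-Python sketch. It deliberately does not use the SubsMat API: the function names `observed_pair_counts`, `log_odds_matrix`, and `relative_entropy` are hypothetical, and the BLOSUM-style counting of unordered residue pairs per alignment column is an assumption made for illustration, not necessarily the procedure SubsMat follows.

```python
import math
from collections import Counter
from itertools import combinations


def observed_pair_counts(alignment):
    """Count unordered residue pairs in every column of an alignment.

    `alignment` is a list of equal-length strings; pairs involving a
    gap character ('-') are skipped.
    """
    counts = Counter()
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" or b == "-":
                continue
            counts[tuple(sorted((a, b)))] += 1
    return counts


def log_odds_matrix(counts, base=2.0):
    """Return (q, p, scores) derived from unordered pair counts.

    q[(a, b)]      observed frequency of the pair
    p[a]           background frequency of residue a implied by q
    scores[(a, b)] log(q / expected), where expected is p[a]*p[a] for
                   identical residues and 2*p[a]*p[b] otherwise
    """
    total = sum(counts.values())
    q = {pair: n / total for pair, n in counts.items()}

    # Each off-diagonal pair contributes half its frequency to each residue.
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2

    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = math.log(freq / expected, base)
    return q, dict(p), scores


def relative_entropy(q, scores):
    """Average score per aligned pair: sum of q_ij * s_ij."""
    return sum(q[pair] * scores[pair] for pair in q)


if __name__ == "__main__":
    toy_alignment = ["ACDA", "ACEA", "ACDA"]  # toy data, not real proteins
    counts = observed_pair_counts(toy_alignment)
    q, p, scores = log_odds_matrix(counts)
    for pair in sorted(scores):
        print(pair, round(scores[pair], 3))
    print("relative entropy (bits):", round(relative_entropy(q, scores), 3))
```

Only pairs that actually occur in the alignment receive a score here; a practical implementation would need pseudocounts or a similar treatment for unobserved pairs, which this sketch omits.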