BOSC 2000

FAKtory: An Open Source DNA Fragment Assembly System

Susan J. Miller and Nirav Merchant*

*Corresponding author
nirav@arl.arizona.edu
Biotechnology Computing
Arizona Research Labs
University of Arizona
Tucson, AZ 85721
USA

FAKtory is a highly configurable, GUI-based software system for DNA fragment manipulation and assembly. The system includes fragment clipping, prescreening, tagging functions, sequence assembly with or without constraints, and a user-friendly finishing editor. FAKtory is developed for UNIX platforms using C and Tcl/Tk. An extensible input/output mechanism and scalability permit integration into varied informatics environments.

FAKtory provides a general purpose fragment database. The type of data associated with each project is configurable. Controlled vocabularies can be specified for given fields, and default values can be established, for the purpose of validity checking on input data. Database records may be sorted and viewed using any combination of fields. Sets of fragment records may be selected or saved using an SQL-like query mechanism that supports string searches.
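The validity checking and record selection described above can be illustrated with a minimal Python sketch. It is not FAKtory's implementation (FAKtory is written in C and Tcl/Tk); the field names, vocabularies, and the functions `validate_record` and `select` are all hypothetical, and a plain predicate stands in for the SQL-like query language.

```python
# Hypothetical field schema: names and vocabularies are illustrative only.
SCHEMA = {
    "direction": {"vocabulary": {"forward", "reverse"}, "default": "forward"},
    "chemistry": {"vocabulary": {"primer", "terminator"}, "default": "terminator"},
}

def validate_record(record, schema=SCHEMA):
    """Return a copy of record with defaults filled in; raise ValueError on bad values."""
    checked = dict(record)
    for field_name, rule in schema.items():
        # Missing fields receive the configured default value.
        value = checked.setdefault(field_name, rule["default"])
        if value not in rule["vocabulary"]:
            raise ValueError(f"{field_name}={value!r} is not in the controlled vocabulary")
    return checked

def select(records, predicate, sort_by=()):
    """Keep records matching the predicate, sorted by any combination of fields."""
    chosen = [r for r in records if predicate(r)]
    return sorted(chosen, key=lambda r: tuple(r[f] for f in sort_by))
```

A string search such as "all reverse reads whose name contains 'plate2'" would then be expressed as `select(records, lambda r: r["direction"] == "reverse" and "plate2" in r["name"], sort_by=("name",))`.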

FAKtory provides a fragment pipeline, which is a configurable conduit of processing stages through which fragments must pass. There are stages for input, overlap detection, assembly, and prescreening activities such as vector removal, low quality data trimming, and tagging of recognizable elements such as Alus. FAKtory supports different modes of processing, from fully automatic to completely manual, which lets the user determine the degree of interaction and control over the prescreening process. FAKtory also provides a review mechanism for handling processing anomalies and the ability to export data at any given stage of the pipeline.
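The idea of a pipeline with a review mechanism can be sketched in Python as follows. This is an illustration under stated assumptions rather than FAKtory's design: the `Fragment` class, the two example stages, the vector sequence, and the convention that a stage raises `ValueError` to flag an anomaly are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Fragment:
    name: str
    bases: str
    tags: list = field(default_factory=list)

VECTOR_PREFIX = "GAATTC"  # hypothetical vector sequence, for illustration only

def strip_vector(frag):
    """Example prescreening stage: remove a leading vector sequence and tag the fragment."""
    if frag.bases.startswith(VECTOR_PREFIX):
        return Fragment(frag.name, frag.bases[len(VECTOR_PREFIX):], frag.tags + ["vector_clipped"])
    return frag

def require_min_length(frag, minimum=5):
    """Example stage that flags an anomaly by raising ValueError."""
    if len(frag.bases) < minimum:
        raise ValueError("too short")
    return frag

def run_pipeline(fragments, stages):
    """Pass each fragment through every (name, function) stage in order.

    A fragment whose stage raises ValueError is diverted to the review list
    together with the name of the stage that rejected it.
    """
    passed, review = [], []
    for frag in fragments:
        try:
            for stage_name, stage in stages:
                frag = stage(frag)
            passed.append(frag)
        except ValueError as problem:
            review.append((frag.name, stage_name, str(problem)))
    return passed, review
```

Because the stage list is ordinary data, reordering or omitting stages reconfigures the pipeline, and the intermediate fragments could be exported after any stage.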

FAKtory can be configured with a very large range of preferences. The resulting complexity is shielded from end users by a default scheme that permits a master user to establish a lab-customized configuration that all end users see by default. Individual users may override the master defaults if desired. FAKtory has been designed to permit cooperative work on a given sequencing project. Several users may open a project simultaneously for reading, while only one user at a time may hold the write-lock on the project.
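The layered default scheme described above can be modeled in a few lines of Python using the standard library's `collections.ChainMap`, which searches its mappings in order. The preference names and values are hypothetical, and this is only a sketch of the lookup order, not FAKtory's configuration format.

```python
from collections import ChainMap

# Hypothetical preference names and values, for illustration only.
builtin_defaults = {"min_quality": 20, "window": 10, "editor_font": "fixed"}
master_config = {"min_quality": 25}        # set once by the master user for the lab
user_config = {"editor_font": "courier"}   # one end user's personal overrides

# Lookups consult the user's overrides first, then the lab-wide master
# configuration, and finally the built-in defaults.
effective = ChainMap(user_config, master_config, builtin_defaults)
```

An end user who sets nothing sees the master configuration, while an individual override affects only that user's view.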

FAKtory uses input and output filters for extensibility, handling a wide range of input formats, including ABI, SCF, FASTA, and phd, and linking to a variety of post-analysis programs. For example, it is possible to create an input filter that runs 'phred' on the data to produce quality values before importing traces into FAKtory. The output of FAKtory may be sent to programs such as Genscan, RepeatMasker, or MAGPIE, or to an RDBMS.

The FAKtory prescreener stages employ a general framework that permits users to develop highly sophisticated criteria for clipping or tagging data. Fragments may be trimmed or tagged based on assigned quality numbers, frequency of bases in a given window, regular expression pattern matching, or predetermined fixed intervals (e.g. clip after base 700).
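To make the clipping criteria above concrete, the following Python sketch computes a clip position from a fixed interval, a sliding quality window, and a regular expression, then takes the earliest. The function names, thresholds, and the "earliest position wins" rule are assumptions chosen for illustration; they are not taken from FAKtory.

```python
import re

def clip_fixed(bases, limit=700):
    """Clip at a predetermined position (e.g. after base 700)."""
    return min(len(bases), limit)

def clip_by_quality(quals, window=10, min_mean=20):
    """Return the start of the first window whose mean quality is below min_mean."""
    for start in range(max(len(quals) - window + 1, 0)):
        if sum(quals[start:start + window]) / window < min_mean:
            return start
    return len(quals)

def clip_by_pattern(bases, pattern=r"N{3,}"):
    """Return the start of the first regular expression match, if any."""
    match = re.search(pattern, bases)
    return match.start() if match else len(bases)

def clip_point(bases, quals):
    """Combine the criteria: keep bases[:n], where n is the earliest clip position."""
    return min(clip_fixed(bases), clip_by_quality(quals), clip_by_pattern(bases))
```

A tagging stage could reuse the same position functions and record the interval as an annotation instead of discarding the bases.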

The FAKtory finisher automatically keeps track of edited regions of a multi-alignment, organizing the task of editing. Chromatograms are displayed and correlated with the aligned sequences. The finisher maintains a complete history of all editing sessions and allows unlimited undo and redo of edits.

FAKtory source code, a tutorial, and precompiled binaries are available at ftp://bcf.arl.arizona.edu/pub/faktory. A comprehensive website and access to the CVS repository will be available at http://bcf.arl.arizona.edu/faktory beginning October 2000.