- Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA
Genome Biol 10:R25. 2009
..Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source (http://bowtie.cbcb.umd.edu)...
- MUSCLE: multiple sequence alignment with high accuracy and high throughput
Robert C Edgar
Nucleic Acids Res 32:1792-7. 2004
..The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle...
- Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
S F Altschul
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
Nucleic Acids Res 25:3389-402. 1997
..PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily...
- Fast and accurate short read alignment with Burrows-Wheeler transform
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
Bioinformatics 25:1754-60. 2009
..The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals...
- Velvet: algorithms for de novo short read assembly using de Bruijn graphs
Daniel R Zerbino
EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
Genome Res 18:821-9. 2008
We have developed a new set of algorithms, collectively called "Velvet," to manipulate de Bruijn graphs for genomic sequence assembly...
- Mapping and quantifying mammalian transcriptomes by RNA-Seq
Division of Biology, MC 156 29, California Institute of Technology, Pasadena, California 91125, USA
Nat Methods 5:621-8. 2008
..We observed 1.45 × 10^5 distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices...
- New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0
Méthodes et Algorithmes pour la Bioinformatique, LIRMM, Centre National de la Recherche Scientifique, Université de Montpellier, Montpellier Cedex 5, France
Syst Biol 59:307-21. 2010
..In the meantime, research around PhyML has continued, and this article describes the new algorithms and methods implemented in the program...
- Exploration, normalization, and summaries of high density oligonucleotide array probe level data
Rafael A Irizarry
Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA
Biostatistics 4:249-64. 2003
..Finally, we evaluate the algorithms in terms of their ability to detect known levels of differential expression using the spike-in data...
- Basic local alignment search tool
S F Altschul
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894
J Mol Biol 215:403-10. 1990
..In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity...
- Model-based analysis of ChIP-Seq (MACS)
Department of Biostatistics and Computational Biology, Dana Farber Cancer Institute and Harvard School of Public Health, Boston, MA 02115, USA
Genome Biol 9:R137. 2008
..MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions. MACS compares favorably to existing ChIP-Seq peak-finding algorithms, and is freely available.
- MRBAYES: Bayesian inference of phylogenetic trees
J P Huelsenbeck
Department of Biology, University of Rochester, Rochester, NY 14627, USA
Bioinformatics 17:754-5. 2001
..The program MRBAYES performs Bayesian inference of phylogeny using a variant of Markov chain Monte Carlo...
- Principal components analysis corrects for stratification in genome-wide association studies
Alkes L Price
Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA
Nat Genet 38:904-9. 2006
..Our simple, efficient approach can easily be applied to disease studies with hundreds of thousands of markers...
- Statistical significance for genomewide studies
John D Storey
Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
Proc Natl Acad Sci U S A 100:9440-5. 2003
..Our approach avoids a flood of false positive results, while offering a more liberal criterion than what has been used in genome scans for linkage...
- Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences
Burnham Institute for Medical Research La Jolla, CA 92037, USA
Bioinformatics 22:1658-9. 2006
..All these programs can handle huge datasets with millions of sequences and can be hundreds of times faster than methods based on the popular sequence comparison and database search tools, such as BLAST...
- A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood
LIRMM, CNRS, 161 rue Ada, 34392, Montpellier Cedex 5, France
Syst Biol 52:696-704. 2003
..pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page: http://www...
- KEGG for representation and analysis of molecular networks involving diseases and drugs
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611 0011, Japan
Nucleic Acids Res 38:D355-60. 2010
..The new disease/drug information resource named KEGG MEDICUS can be used as a reference knowledge base for computational analysis of molecular networks, especially, by integrating large-scale experimental datasets...
- Inference of population structure using multilocus genotype data
J K Pritchard
Department of Statistics, University of Oxford, United Kingdom
Genetics 155:945-59. 2000
..g., seven microsatellite loci in an example using genotype data from an endangered bird species. The software used for this article is available from http://www.stats.ox.ac.uk/~pritch/home.html...
- Haploview: analysis and visualization of LD and haplotype maps
J C Barrett
Whitehead Institute for Biomedical Research Cambridge, MA 02142, USA
Bioinformatics 21:263-5. 2005
..Haploview is a software package that provides computation of linkage disequilibrium statistics and population haplotype patterns from primary genotype data in a visually appealing and interactive interface...
- Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications
Department of Genetics, The Norwegian Radium Hospital, Montebello, N 0310 Oslo, Norway
Proc Natl Acad Sci U S A 98:10869-74. 2001
- Cytoscape: a software environment for integrated models of biomolecular interaction networks
Institute for Systems Biology, Seattle, Washington 98103, USA
Genome Res 13:2498-504. 2003
- CAP3: A DNA sequence assembly program
Department of Computer Science, Michigan Technological University, Houghton, Michigan 49931 USA
Genome Res 9:868-77. 1999
..PHRAP often produces longer contigs than CAP3 whereas CAP3 often produces fewer errors in consensus sequences than PHRAP. It is easier to construct scaffolds with CAP3 than with PHRAP on low-pass data with forward-reverse constraints...
- Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies
Department of Molecular Biology, Max Planck Institut für Infektionsbiologie, Schumann Strasse 21 22, 10117 Berlin, Germany
Genetics 164:1567-87. 2003
..The methods are implemented in a program, structure, version 2.0, which is available at http://pritch.bsd.uchicago.edu...
- MrBayes 3: Bayesian phylogenetic inference under mixed models
Department of Systematic Zoology, Evolutionary Biology Centre, Uppsala University, Norbyv 18D, SE 752 36 Uppsala, Sweden
Bioinformatics 19:1572-4. 2003
..g. morphological, nucleotide, and protein, and to explore a wide variety of structured models mixing partition-unique and shared parameters. The program employs MPI to parallelize Metropolis coupling on Macintosh or UNIX clusters...
- Combinatorial microRNA target predictions
Center for Comparative Functional Genomics, Department of Biology, New York University, 100 Washington Square East, New York, New York 10003, USA
Nat Genet 37:495-500. 2005
..In particular, we experimentally validate common regulation of Mtpn by miR-375, miR-124 and let-7b and thus provide evidence for coordinate microRNA control in mammals...
- A comparison of normalization methods for high density oligonucleotide array data based on variance and bias
B M Bolstad
Group in Biostatistics, University of California, Berkeley, CA 94720, USA
Bioinformatics 19:185-93. 2003
..Normalization is a process for reducing this variation. It is common to see non-linear relations between arrays and the standard normalization provided by Affymetrix does not perform well in these situations...
- Prediction of mammalian microRNA targets
Benjamin P Lewis
Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Cell 115:787-98. 2003
..The predicted regulatory targets of mammalian miRNAs were enriched for genes involved in transcriptional regulation but also encompassed an unexpectedly broad range of other functions...
- Tandem repeats finder: a program to analyze DNA sequences
Department of Biomathematical Sciences, Mount Sinai School of Medicine, New York, NY 10029 6574, USA
Nucleic Acids Res 27:573-80. 1999
..These sequences range in size from 3 kb up to 700 kb. A World Wide Web server interface at c3.biomath.mssm.edu/trf.html has been established for automated use of the program...
- RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models
Swiss Federal Institute of Technology Lausanne, School of Computer and Communication Sciences Lab Prof Moret, Station 14, CH 1015 Lausanne, Switzerland
Bioinformatics 22:2688-90. 2006
..The program has been used to compute ML trees on two of the largest alignments to date containing 25,057 (1463 bp) and 2182 (51,089 bp) taxa, respectively...
- ABySS: a parallel assembler for short read sequence data
Jared T Simpson
Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 4E6, Canada
Genome Res 19:1117-23. 2009
..acid (DNA) sequencing instruments has prompted the recent development of de novo short read assembly algorithms. A common shortcoming of the available tools is their inability to efficiently assemble vast amounts of data ..
- Search and clustering orders of magnitude faster than BLAST
Robert C Edgar
Tiburon, CA 94920, USA
Bioinformatics 26:2460-1. 2010
..Biological sequence data is accumulating rapidly, motivating the development of improved high-throughput methods for sequence classification...
- TopHat: discovering splice junctions with RNA-Seq
Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
Bioinformatics 25:1105-11. 2009
..TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites...
- Fast and accurate long-read alignment with Burrows-Wheeler transform
Wellcome Trust Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
Bioinformatics 26:589-95. 2010
..of them are very efficient for short reads but inefficient or not applicable for reads >200 bp because the algorithms are heavily and specifically tuned for short queries with low sequencing error rate...
- affy--analysis of Affymetrix GeneChip data at the probe level
Center for Biological Sequence Analysis CBS, Technical University of Denmark, Building 208, 2800 Lyngby, Denmark
Bioinformatics 20:307-15. 2004
..The processing of the Affymetrix GeneChip data has been a recent focus for data analysts. Alternatives to the original procedure have been proposed and some of these new methods are widely used...
- Fast robust automated brain extraction
Stephen M Smith
Oxford Centre for Functional Magnetic Resonance Imaging of the Brain, Department of Clinical Neurology, Oxford University, John Radcliffe Hospital, Headington, Oxford, United Kingdom
Hum Brain Mapp 17:143-55. 2002
..We describe the new method and give examples of results and the results of extensive quantitative testing against "gold-standard" hand segmentations, and two other popular automated methods...
- Recent developments in the MAFFT multiple sequence alignment program
Digital Medicine Initiative, Kyushu University, Fukuoka 812 8582, Japan
Brief Bioinform 9:286-98. 2008
..We review these and other techniques that MAFFT uses and suggest possible future directions of MSA software as a basis of comparative analyses. MAFFT is available at http://align.bmr.kyushu-u.ac.jp/mafft/software/...
- The transcriptional landscape of the yeast genome defined by RNA sequencing
Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT 06520, USA
Science 320:1344-9. 2008
..We also found unexpected 3'-end heterogeneity and the presence of many overlapping genes. These results indicate that the yeast transcriptome is more complex than previously appreciated...
- FastTree 2--approximately maximum-likelihood trees for large alignments
Morgan N Price
Physical Biosciences Division, Lawrence Berkeley National Lab, Berkeley, California, United States of America
PLoS ONE 5:e9490. 2010
..We recently described FastTree, a tool for inferring phylogenies for alignments with up to hundreds of thousands of sequences. Here, we describe improvements to FastTree that improve its accuracy without sacrificing scalability...
- Human MicroRNA targets
Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, New York, USA
PLoS Biol 2:e363. 2004
..microrna.org. Our analysis suggests that miRNA genes, which are about 1% of all human genes, regulate protein production for 10% or more of all human genes...
- Open source clustering software
M J L de Hoon
Human Genome Center, Institute of Medical Science, University of Tokyo, 4 6 1 Shirokanedai, Minato ku, Tokyo, 108 8639 Japan
Bioinformatics 20:1453-4. 2004
..In addition, we generated a Python and a Perl interface to the C Clustering Library, thereby combining the flexibility of a scripting language with the speed of C...
- Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy
Center for Microbial Ecology, Michigan State University, East Lansing, MI 48824, USA
Appl Environ Microbiol 73:5261-7. 2007
..It combines the RDP Classifier with a statistical test to flag taxa differentially represented between samples. The RDP Classifier and RDP Library Compare are available online at http://rdp.cme.msu.edu/...
- MODELTEST: testing the model of DNA substitution
Department of Zoology, Brigham Young University, 574 WIDB, Provo, UT 84602 5255, USA
Bioinformatics 14:817-8. 1998
..The program MODELTEST uses log likelihood scores to establish the model of DNA evolution that best fits the data...
- The Amber biomolecular simulation programs
David A Case
Department of Molecular Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, TPC15, La Jolla, CA 92037, USA
J Comput Chem 26:1668-88. 2005
- EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates
Albert J Vilella
EMBL EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
Genome Res 19:327-35. 2009
..All data are made available in a number of formats and will be kept up to date with the Ensembl project...
- BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks
Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology VIB, Ghent University, Technologiepark 927, B 9052, Ghent, Belgium
Bioinformatics 21:3448-9. 2005
- Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search
Institute for Systems Biology, Seattle, Washington 98103, USA
Anal Chem 74:5383-92. 2002
..This analysis makes it possible to filter large volumes of MS/MS database search results with predictable false identification error rates and can serve as a common standard by which the results of different research groups are compared...
- Protein secondary structure prediction based on position-specific scoring matrices
D T Jones
Department of Biological Sciences, University of Warwick, Coventry, CV4 7AL, United Kingdom
J Mol Biol 292:195-202. 1999
..Given the success of the method in CASP3, it is reasonable to be confident that the evaluation presented here gives a fair indication of the performance of the method in general...
- An efficient algorithm for large-scale detection of protein families
A J Enright
Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK
Nucleic Acids Res 30:1575-84. 2002
..This novel approach does not suffer from the problems that normally hinder other protein sequence clustering algorithms, such as the presence of multi-domain proteins, promiscuous domains and fragmented proteins...
- Clustal W and Clustal X version 2.0
M A Larkin
The Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland
Bioinformatics 23:2947-8. 2007
..This will facilitate the further development of the alignment algorithms in the future and has allowed proper porting of the programs to the latest versions of Linux, Macintosh and ..
- A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer
Division of Pathology, Operation Center, and the Biostatistics Center, National Surgical Adjuvant Breast and Bowel Project, Pittsburgh 15212, USA
N Engl J Med 351:2817-26. 2004
..The likelihood of distant recurrence in patients with breast cancer who have no involved lymph nodes and estrogen-receptor-positive tumors is poorly defined by clinical and histopathological measures...
- Identifying bacterial genes and endosymbiont DNA with Glimmer
Arthur L Delcher
Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
Bioinformatics 23:673-9. 2007
..This module was developed in response to the discovery that eukaryotic genome sequencing projects sometimes inadvertently capture the DNA of intracellular bacteria living in the host...
- The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes
Mathematics and Computer Science Division, Argonne National Laboratory, 9700 S Cass Avenue, Argonne, IL 60439, USA
BMC Bioinformatics 9:386. 2008
..High-throughput, low-cost next-generation sequencing has provided access to metagenomics to a wide range of researchers...
- Fast and effective prediction of microRNA/target duplexes
Universität Bielefeld, International NRW Graduate School in Bioinformatics and Genome Research, Postfach 10 01 31, 33501 Bielefeld, Germany
RNA 10:1507-17. 2004
..RNAhybrid, with its accompanying programs RNAcalibrate and RNAeffective, is available for download and as a Web tool on the Bielefeld Bioinformatics Server (http://bibiserv.techfak.uni-bielefeld.de/rnahybrid/)...
- GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists
Molecular Cell Biology Department, Weizmann Institute of Science, Rehovot, Israel
BMC Bioinformatics 10:48. 2009
..A few tools also exist that support analyzing ranked lists. The latter typically rely on simulations or on union-bound correction for assigning statistical significance to the results...
- Unified segmentation
Wellcome Department of Imaging Neuroscience, 12 Queen Square, London, WC1N 3BG, UK
Neuroimage 26:839-51. 2005
..A strategy for optimising the model parameters is described, along with the requisite partial derivatives of the objective function...
- Genome-wide mapping of in vivo protein-DNA interactions
David S Johnson
Department of Genetics, Stanford University School of Medicine, Stanford, CA, 94305 5120, USA
Science 316:1497-502. 2007
..96] and statistical confidence (P < 10^-4), properties that were important for inferring new candidate interactions. These include key transcription factors in the gene network that regulates pancreatic islet cell development...
- WGCNA: an R package for weighted correlation network analysis
Department of Human Genetics and Department of Biostatistics, University of California, Los Angeles, CA 90095, USA
BMC Bioinformatics 9:559. 2008
..While parts of the correlation network methodology have been described in separate publications, there is a need to provide a user-friendly, comprehensive, and consistent software implementation and an accompanying tutorial...
- A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach
Department of Zoology, University of Cambridge, Cambridge, England
Mol Biol Evol 18:691-9. 2001
- Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome
Nathaniel D Heintzman
Ludwig Institute for Cancer Research, University of California San Diego UCSD School of Medicine, 9500 Gilman Drive, La Jolla, California 92093 0653 USA
Nat Genet 39:311-8. 2007
..We developed computational algorithms using these distinct chromatin signatures to identify new regulatory elements, predicting over 200 promoters and ..
- Protein homology detection by HMM-HMM comparison
Department of Protein Evolution, Max Planck Institute for Developmental Biology, Spemannstrasse 35, D 72076 Tübingen, Germany
Bioinformatics 21:951-60. 2005
..Protein homology detection and sequence alignment are at the basis of protein structure prediction, function prediction and evolution...
- Removing noise from pyrosequenced amplicons
Department of Civil Engineering, University of Glasgow, Rankine Building, Oakfield Avenue, Glasgow G12 8LT, UK
BMC Bioinformatics 12:38. 2011
..We use data sets where samples of known diversity have been amplified and sequenced to quantify the effect of each of the sources of error on OTU inflation and to validate these algorithms.
- Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data
Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada
Med Care 43:1130-9. 2005
..Recognizing this, we conducted a multistep process to develop ICD-10 coding algorithms to define Charlson and Elixhauser comorbidities in administrative data and assess the performance of the ..
- Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry
Joshua E Elias
Department of Cell Biology, 240 Longwood Avenue, Harvard Medical School, Boston, Massachusetts 02115, USA
Nat Methods 4:207-14. 2007
- R/qtl: QTL mapping in experimental crosses
Karl W Broman
Department of Biostatistics, Johns Hopkins University, 615 N Wolfe St, Baltimore, MD 21205, USA
Bioinformatics 19:889-90. 2003
- JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles
Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, University of British Columbia, 950 West 28th Avenue, Vancouver, BC V5Z 4H4, Canada
Nucleic Acids Res 38:D105-10. 2010
..Additionally, three new special collections provide matrix profile data produced by recent alternative high-throughput approaches...
- Mapping short DNA sequencing reads and calling variants using mapping quality scores
The Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom
Genome Res 18:1851-8. 2008
..produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software...
- A comparison of bayesian methods for haplotype reconstruction from population genotype data
Department of Statistics, University of Washington, Seattle, WA 98195 4322, USA
Am J Hum Genet 73:1162-9. 2003
..The new algorithm is included in the software package PHASE, version 2.0, available online (http://www.stat.washington.edu/stephens/software.html)...
- InParanoid 7: new algorithms and tools for eukaryotic orthology analysis
Department of Biochemistry and Biophysics, Stockholm Bioinformatics Centre, AlbaNova University Centre, Stockholm University, SE 10691 Stockholm, Sweden
Nucleic Acids Res 38:D196-203. 2010
..To facilitate data exchange and comparisons among ortholog databases, we have developed and are making available two XML schemas: SeqXML for the input sequences and OrthoXML for the output ortholog clusters...
- Improved prediction of signal peptides: SignalP 3.0
Jannick Dyrløv Bendtsen
Center for Biological Sequence Analysis, BioCentrum DTU, Building 208, Technical University of Denmark, DK 2800 Lyngby, Denmark
J Mol Biol 340:783-95. 2004
..SignalP consists of two different predictors based on neural network and hidden Markov model algorithms, where both components have been updated...
- T-Coffee: A novel method for fast and accurate multiple sequence alignment
National Institute for Medical Research, The Ridgeway, London, NW7 1AA, UK
J Mol Biol 302:205-17. 2000
..The improvement, especially clear with the more difficult test cases, is always visible, regardless of the phylogenetic spread of the sequences in the tests...
- MatInspector and beyond: promoter analysis based on transcription factor binding sites
Genomatix Software GmbH Landsberger Strasse 6, 80339 München, Germany
Bioinformatics 21:2933-42. 2005
..The next steps in promoter analysis can be tackled only with reliable predictions, e.g. finding phylogenetically conserved patterns or identifying higher order combinations of sites in promoters of co-regulated genes...
- RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees
Department of Computer Science, Technical University of Munich Boltzmannstrasse 3, D 85748 München, Germany
Bioinformatics 21:456-63. 2005
..Due to the combinatorial and computational complexity the size of trees which can be computed on a Biologist's PC workstation within reasonable time is limited to trees containing approximately 100 taxa...
- Imaging intracellular fluorescent proteins at nanometer resolution
Howard Hughes Medical Institute, Janelia Farm Research Campus, Ashburn, VA 20147, USA
Science 313:1642-5. 2006
- A new statistical method for haplotype reconstruction from population data
Department of Statistics, University of Oxford
Am J Hum Genet 68:978-89. 2001
..applicable to genotype data at linked loci from a population sample, that improves substantially on current algorithms; often, error rates are reduced by > 50%, relative to its nearest competitor...
- Improved scoring of functional groups from gene expression data by decorrelating GO graph structure
Max Planck Institute for Informatics, Stuhlsatzenhausweg 85, D 66123 Saarbrücken, Germany
Bioinformatics 22:1600-7. 2006
..g. based on Gene Ontology (GO). We develop methods that increase the explanatory power of this approach by integrating knowledge about relationships between the GO terms into the calculation of the statistical significance...
- Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments
Plant Science Group, Institute of Biomedical and Life Sciences, University of Glasgow, Glasgow G12 8QQ, UK
FEBS Lett 573:83-92. 2004
..In addition, using RP can lead to a sharp reduction in the number of replicate experiments needed to obtain reproducible results...
- SNP detection for massively parallel whole-genome resequencing
Beijing Genomics Institute at Shenzhen, Shenzhen 518000, China
Genome Res 19:1124-32. 2009
..Our analyses demonstrate that our method has a very low false call rate at any sequencing depth and excellent genome coverage at a high sequencing depth...
- A neural substrate of prediction and reward
Institute of Physiology, University of Fribourg, CH 1700 Fribourg, Switzerland
Science 275:1593-9. 1997
..Taken together, these findings can be understood through quantitative theories of adaptive optimizing control...
- Accurate determination of microbial diversity from 454 pyrosequencing data
Department of Civil Engineering, Rankine Building, University of Glasgow, Glasgow, UK
Nat Methods 6:639-41. 2009
..We pyrosequenced a known mixture of microbial 16S rDNA sequences extracted from a lake and found that without noise reduction the number of operational taxonomic units is overestimated but using PyroNoise it can be accurately calculated...
- Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering
Sharon R Browning
Department of Statistics, The University of Auckland, Auckland, New Zealand
Am J Hum Genet 81:1084-97. 2007
..1 days of computing time, with 99% of masked alleles imputed correctly. Our method is implemented in the Beagle software package, which is freely available...
- High-quality draft assemblies of mammalian genomes from massively parallel sequence data
Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
Proc Natl Acad Sci U S A 108:1513-8. 2011
..The ALLPATHS-LG program is available at http://www.broadinstitute.org/science/programs/genome-biology/crd...
- PROSITE, a protein domain database for functional characterization and annotation
Christian J A Sigrist
Structural Biology and Bioinformatics Department, University of Geneva, Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH 1211 Geneva 4, Switzerland
Nucleic Acids Res 38:D161-6. 2010
..The latest version of PROSITE (release 20.54, of 22 September 2009) contains 1308 patterns, 863 profiles and 869 ProRules. PROSITE is accessible at: http://www.expasy.org/prosite/...
- GROMACS: fast, flexible, and free
David van der Spoel
Department of Cell and Molecular Biology, Uppsala University, Husargatan 3, Box 596, S 75124 Uppsala, Sweden
J Comput Chem 26:1701-18. 2005
..It is maintained by a group of developers from the Universities of Groningen, Uppsala, and Stockholm, and the Max Planck Institute for Polymer Research in Mainz. Its Web site is http://www.gromacs.org...
- Discovering microRNAs from deep sequencing data using miRDeep
Marc R Friedländer
Max Delbrück Centrum für Molekulare Medizin, Robert Rössle Strasse 10, D 13125 Berlin Buch, Germany
Nat Biotechnol 26:407-15. 2008
..miRDeep reports altogether approximately 230 previously unannotated miRNAs, of which four novel C. elegans miRNAs are validated by northern blot analysis...
- The sequence and de novo assembly of the giant panda genome
BGI Shenzhen, Shenzhen 518083, China
Nature 463:311-7. 2010
- Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes
Center for Medical Genetics, Ghent University Hospital 1K5, De Pintelaan 185, B 9000 Ghent, Belgium
Genome Biol 3:RESEARCH0034. 2002
- PANTHER: a library of protein families and subfamilies indexed by function
Paul D Thomas
Protein Informatics, Celera Genomics, Foster City, California 94404, USA
Genome Res 13:2129-41. 2003
..Third, we use the family HMMs to rank missense single nucleotide polymorphisms (SNPs), on a database-wide scale, according to their likelihood of affecting protein function...
- Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists
Da Wei Huang
Laboratory of Immunopathogenesis and Bioinformatics, Clinical Services Program, SAIC Frederick, Inc, National Cancer Institute at Frederick, Frederick, MD 21702, USA
Nucleic Acids Res 37:1-13. 2009
..Tools are uniquely categorized into three major classes, according to their underlying enrichment algorithms. The comprehensive collections, unique tool classifications and associated questions/issues will provide a more ..
- A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes
Kevin C Miranda
Bioinformatics and Pattern Discovery Group, IBM Thomas J Watson Research Center, Yorktown Heights, P O Box 218, NY 10598, USA
Cell 126:1203-17. 2006
..We also extended the method's key idea to a low-error microRNA-precursor-discovery scheme; our studies suggest that the number of microRNA precursors in mammalian genomes likely ranges in the tens of thousands...
- ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context
Adam A Margolin
Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
BMC Bioinformatics 7:S7. 2006
..This method uses an information theoretic approach to eliminate the majority of indirect interactions inferred by co-expression methods...
- Using GOstats to test gene lists for GO term association
Fred Hutchinson Cancer Research Center, Program Computational Biology, 1100 Fairview Avenue North P O Box 19024, Seattle, WA 98109, USA
Bioinformatics 23:257-8. 2007
..In this paper we report significant improvements and extensions such as support for conditional testing...
- Scalable molecular dynamics with NAMD
James C Phillips
Beckman Institute, University of Illinois at Urbana Champaign, Urbana, IL 61801, USA
J Comput Chem 26:1781-802. 2005
..force field, equations of motion, and integration methods along with the efficient electrostatics evaluation algorithms employed and temperature and pressure controls used...
- Bayesian inference of species trees from multilocus data
Department of Computer Science, University of Auckland, New Zealand
Mol Biol Evol 27:570-80. 2010
..We demonstrate that both BEST and our method have much better estimation accuracy for species tree topology than concatenation, and our method outperforms BEST in divergence time and population size estimation...
- MEME SUITE: tools for motif discovery and searching
Timothy L Bailey
Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
Nucleic Acids Res 37:W202-8. 2009
..Three sequence scanning algorithms--MAST, FIMO and GLAM2SCAN--allow scanning numerous DNA and protein sequence databases for motifs discovered by ..
- Bayesian inference of phylogeny and its impact on evolutionary biology
J P Huelsenbeck
Department of Biology, University of Rochester, Rochester, NY 14627, USA
Science 294:2310-4. 2001
- Assembly algorithms for next-generation sequencing data
Jason R Miller
J Craig Venter Institute, Rockville, MD 20850 3343, USA
Genomics 95:315-27. 2010
..emergence of next-generation sequencing platforms led to resurgence of research in whole-genome shotgun assembly algorithms and software...
- High-resolution mapping and characterization of open chromatin across the genome
Alan P Boyle
Institute for Genome Sciences and Policy, Duke University, Durham, NC 27708, USA
Cell 132:311-22. 2008
..In addition, and unexpectedly, our analyses have uncovered detailed features of nucleosome structure...
- Improved microbial gene identification with GLIMMER
A L Delcher
Department of Computer Science, Loyola College in Maryland, Baltimore, MD 21210, USA
Nucleic Acids Res 27:4636-41. 1999
..When the analysis is restricted to genes that have significant homology to genes in other organisms, GLIMMER misses <1% of known genes...
- IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content
Institute of Enzymology, BRC, Hungarian Academy of Sciences, PO Box 7, H 1518 Budapest, Hungary
Bioinformatics 21:3433-4. 2005
..Optional to the prediction are built-in parameter sets optimized for predicting short or long disordered regions and structured domains...
- HyPhy: hypothesis testing using phylogenies
Sergei L Kosakovsky Pond
Antiviral Research Center, University of California San Diego San Diego, CA 92103, USA
Bioinformatics 21:676-9. 2005
..AVAILABILITY: http://www.hyphy.org CONTACT: email@example.com SUPPLEMENTARY INFORMATION: HyPhy documentation and tutorials are available at http://www.hyphy.org...
- Genesis: cluster analysis of microarray data
Institute of Biomedical Engineering, Graz University of Technology, Krenngasse 37, 8010 Graz, Austria
Bioinformatics 18:207-8. 2002
..analysis such as filters, normalization and visualization tools, distance measures as well as common clustering algorithms including hierarchical clustering, self-organizing maps, k-means, principal component analysis, and support ..
- A fast diffeomorphic image registration algorithm
Wellcome Trust Centre for Neuroimaging, 12 Queen Square, London, UK
Neuroimage 38:95-113. 2007