datasets as topic


Summary: Subject matter related to the curation of data from research projects, stored permanently in a formalized manner suitable for communication, interpretation, or processing.

Top Publications

  1. Torkamani A, Andersen K, Steinhubl S, Topol E. High-Definition Medicine. Cell. 2017;170:828-843 pubmed publisher
    ..In this review, we will examine the core disciplines that enable high-definition medicine and project how these technologies will alter the future of medicine. ..
  2. Kim S, Cho K, Oh S. Development of machine learning models for diagnosis of glaucoma. PLoS ONE. 2017;12:e0177726 pubmed publisher
    ..We may combine multiple learning models to increase prediction accuracy. The C5.0 model includes decision rules for prediction. It can be used to explain the reasons for specific predictions. ..
  3. Dimitropoulos K, Barmpoutis P, Zioga C, Kamas A, Patsiaoura K, Grammalidis N. Grading of invasive breast carcinoma through Grassmannian VLAD encoding. PLoS ONE. 2017;12:e0185110 pubmed publisher
    ..Experimental results have shown that the proposed method outperforms a number of state of the art approaches providing average classification rates of 95.8% and 91.38% with our dataset and the BreaKHis dataset, respectively...
  4. Kanaya S, Altaf Ul Amin M, Kiboi S, Afendi F. Big Data and Network Biology 2015. Biomed Res Int. 2015;2015:604623 pubmed publisher
  5. Adams J. Genetics: Big hopes for big data. Nature. 2015;527:S108-9 pubmed publisher
  6. Hildebrand D, Cicconet M, Torres R, Choi W, Quan T, Moon J, et al. Whole-brain serial-section electron microscopy in larval zebrafish. Nature. 2017;545:345-349 pubmed publisher
    ..All obtained images and reconstructions are provided as an open-access resource. ..
  7. Zanderigo F, Mann J, Ogden R. A hybrid deconvolution approach for estimation of in vivo non-displaceable binding for brain PET targets without a reference region. PLoS ONE. 2017;12:e0176636 pubmed publisher
    ..HYDECA can provide subject-specific estimates of VND without requiring a blocking study for tracers and targets for which a valid reference region does not exist. ..
  8. Kishi A, Van Dongen H, Natelson B, Bender A, Palombini L, Bittencourt L, et al. Sleep continuity is positively correlated with sleep duration in laboratory nighttime sleep recordings. PLoS ONE. 2017;12:e0175504 pubmed publisher
    ..These findings suggest that S-TST may differ from L-TST in processes underlying sleep continuity, shedding new light on mechanisms underlying individual differences in sleep duration. ..
  9. Kiselev V, Kirschner K, Schaub M, Andrews T, Yiu A, Chandra T, et al. SC3: consensus clustering of single-cell RNA-seq data. Nat Methods. 2017;14:483-486 pubmed publisher
    ..We demonstrate that SC3 is capable of identifying subclones from the transcriptomes of neoplastic cells collected from patients. ..

More Information


  1. Bauman W, Krassioukov A, Biering Sørensen F. Version 2.0 of the international spinal cord injury endocrinology and metabolic function basic data set. Spinal Cord. 2017;55:327-328 pubmed publisher
  2. Bourne P, Bonazzi V, Dunn M, Green E, Guyer M, Komatsoulis G, et al. The NIH Big Data to Knowledge (BD2K) initiative. J Am Med Inform Assoc. 2015;22:1114 pubmed publisher
  3. Auton A, Brooks L, Durbin R, Garrison E, Kang H, Korbel J, et al. A global reference for human genetic variation. Nature. 2015;526:68-74 pubmed publisher
    ..This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies. ..
  4. Malde K. Estimating the information value of polymorphic sites using pooled sequences. BMC Genomics. 2014;15 Suppl 6:S20 pubmed publisher
    ..The results show that we achieve a clear separation between true variants and noise, allowing us to select candidate sites with a high degree of confidence. ..
  5. Fagnan L, Dolor R. PBRNS discuss utilizing big data for research and within a learning health system. Ann Fam Med. 2015;13:185 pubmed publisher
  6. Northcott P, Buchhalter I, Morrissy A, Hovestadt V, Weischenfeldt J, Ehrenberger T, et al. The whole-genome landscape of medulloblastoma subtypes. Nature. 2017;547:311-317 pubmed publisher
  7. Bria A, Iannello G, Onofri L, Peng H. TeraFly: real-time three-dimensional visualization and annotation of terabytes of multidimensional volumetric images. Nat Methods. 2016;13:192-4 pubmed publisher
  8. Brown L, Williams J, Taylor L, Thomson R, Nolan P, Foster R, et al. Meta-analysis of transcriptomic datasets identifies genes enriched in the mammalian circadian pacemaker. Nucleic Acids Res. 2017;45:9860-9873 pubmed publisher
    ..SCN-enriched transcripts identified in this study provide novel insights into SCN function, including identifying genes which may play key roles in SCN physiology or provide SCN-specific drivers...
  9. Hucka M, Bergmann F, Hoops S, Keating S, Sahle S, Schaff J, et al. The Systems Biology Markup Language (SBML): Language Specification for Level 3 Version 1 Core. J Integr Bioinform. 2015;12:266 pubmed publisher
    ..Other materials and software are available from the SBML project web site, ..
  10. Zhou C, Yu H, Ding Y, Guo F, Gong X. Multi-scale encoding of amino acid sequences for predicting protein interactions using gradient boosting decision tree. PLoS ONE. 2017;12:e0181426 pubmed publisher
    ..Because of learning capabilities of the gradient boosting decision tree and the multi-scale feature representation scheme, the proposed method might be a useful tool for future proteomics studies. ..
  11. de Ruiter J, Kas S, Schut E, Adams D, Koudijs M, Wessels L, et al. Identifying transposon insertions and their effects from RNA-sequencing data. Nucleic Acids Res. 2017;45:7064-7077 pubmed publisher
    ..We expect that IM-Fusion will significantly enhance the accuracy of cancer gene discovery in forward genetic screens and provide initial insight into the biological effects of insertions on candidate cancer genes. ..
  12. Páez Espino D, Eloe Fadrosh E, Pavlopoulos G, Thomas A, Huntemann M, Mikhailova N, et al. Uncovering Earth's virome. Nature. 2016;536:425-30 pubmed
    ..Our results highlight an extensive global viral diversity and provide detailed insight into viral habitat distribution and host–virus interactions. ..
  13. Brandizi M, Melnichuk O, Bild R, Kohlmayer F, Rodriguez Castro B, Spengler H, et al. Orchestrating differential data access for translational research: a pilot implementation. BMC Med Inform Decis Mak. 2017;17:30 pubmed publisher
    ..Here we report experience and lessons learnt of our pilot implementation, which may be useful for similar use cases. Furthermore, we discuss possible extensions for more complex scenarios. ..
  14. Mathes R, Lall R, Levin Rector A, Sell J, Paladini M, Konty K, et al. Evaluating and implementing temporal, spatial, and spatio-temporal methods for outbreak detection in a local syndromic surveillance system. PLoS ONE. 2017;12:e0184419 pubmed publisher
    ..Furthermore, we found the scan statistics, as applied in the SaTScan software package, to be the easiest to program and implement for daily data analysis. ..
  15. Mallik S, Kundu S. Coevolutionary constraints in the sequence-space of macromolecular complexes reflect their self-assembly pathways. Proteins. 2017;85:1183-1189 pubmed publisher
  16. Green E, Watson J, Collins F. Human Genome Project: Twenty-five years of big biology. Nature. 2015;526:29-31 pubmed publisher
  17. Welch R, Chung D, Grass J, Landick R, Keles S. Data exploration, quality control and statistical analysis of ChIP-exo/nexus experiments. Nucleic Acids Res. 2017;45:e145 pubmed publisher
    ..ChIPexoQual analysis of these datasets resulted in guidelines for using these QC metrics across a wide range of sequencing depths and provided further insights for modelling ChIP-exo data. ..
  18. Drazen J. Sharing individual patient data from clinical trials. N Engl J Med. 2015;372:201-2 pubmed publisher
  19. Medhaug I, Stolpe M, Fischer E, Knutti R. Reconciling controversies about the 'global warming hiatus'. Nature. 2017;545:41-47 pubmed publisher
    ..Combined with stronger recent warming trends in newer datasets, we are now more confident than ever that human influence is dominant in long-term warming. ..
  20. Lubbeke A, Rees J, Barea C, Combescure C, Carr A, Silman A. International variation in shoulder arthroplasty. Acta Orthop. 2017;88:592-599 pubmed publisher
    ..The internationally increasing registry activity is an excellent basis for improving the so far weak evidence in shoulder arthroplasty...
  21. Chaouiya C, Keating S, Bérenguier D, Naldi A, Thieffry D, van Iersel M, et al. The Systems Biology Markup Language (SBML) Level 3 Package: Qualitative Models, Version 1, Release 1. J Integr Bioinform. 2015;12:270 pubmed publisher
    ..This is particularly suited to logical models (Boolean or multi-valued) and some classes of Petri net models can be encoded with the approach. ..
  22. Wang Y, Freedman J, Liu H, Moorman P, Hyslop T, George D, et al. Associations between RNA splicing regulatory variants of stemness-related genes and racial disparities in susceptibility to prostate cancer. Int J Cancer. 2017;141:731-743 pubmed publisher
    ..6) were predicted to regulate RNA splicing. These variants may serve as novel biomarkers for racial disparities in prostate cancer risk. ..
  23. Cáceres E, Hurst L. The evolution, impact and properties of exonic splice enhancers. Genome Biol. 2013;14:R143 pubmed publisher
    ..Prior analyses that used the RESCUE-ESE set of hexamers captured the properties of consensus exonic splice enhancers. We estimate that at least 4% of synonymous mutations are deleterious owing to an effect on enhancer functioning. ..
  24. Tørring M, Murchie P, Hamilton W, Vedsted P, Esteva M, Lautrup M, et al. Evidence of advanced stage colorectal cancer with longer diagnostic intervals: a pooled analysis of seven primary care cohorts comprising 11 720 patients in five countries. Br J Cancer. 2017;117:888-897 pubmed publisher
    ..Furthermore, the study cannot define a specific 'safe' waiting time as the length of the primary care interval appears to have negative impact from day one. ..
  25. Luo H, Lum T, Wong G, Kwan J, Tang J, Chi I. Predicting Adverse Health Outcomes in Nursing Homes: A 9-Year Longitudinal Study and Development of the FRAIL-Minimum Data Set (MDS) Quick Screening Tool. J Am Med Dir Assoc. 2015;16:1042-7 pubmed
    ..It can be applied using variables from the MDS, allowing direct adoption in long-term care facilities already using this health information system. ..
  26. Hull S. Patient-generated health data foundation for personalized collaborative care. Comput Inform Nurs. 2015;33:177-80 pubmed publisher
  27. de Ávila M, Xavier M, Pintro V, de Azevedo W. Supervised machine learning techniques to predict binding affinity. A study for cyclin-dependent kinase 2. Biochem Biophys Res Commun. 2017;494:305-310 pubmed publisher
    ..In addition, the machine-learning model was applied to predict binding affinity of CDK2, which showed a better performance when compared with AutoDock4, AutoDock Vina, MolDock, and PLANTS scores...
  28. Obermeyer Z, Lee T. Lost in Thought - The Limits of the Human Mind and the Future of Medicine. N Engl J Med. 2017;377:1209-1211 pubmed publisher
  29. Dove E, Townend D, Meslin E, Bobrow M, Littler K, Nicol D, et al. RESEARCH ETHICS. Ethics review for international data-intensive research. Science. 2016;351:1399-400 pubmed publisher
  30. Zhang Y, Hardison R. Accurate and reproducible functional maps in 127 human cell types via 2D genome segmentation. Nucleic Acids Res. 2017;45:9823-9836 pubmed publisher
    ..Thus, we provide a high-quality map of candidate functional regions across 127 human cell types and compare the quality of different annotation methods in order to facilitate biomedical research in epigenomics...
  31. Cohen J, Pettitt J, Wilbourn E. Intentional burn injury: Assessment of allegations of self-infliction. J Forensic Leg Med. 2017;51:9-21 pubmed publisher
  32. Stetson L, Pearl T, Chen Y, Barnholtz Sloan J. Computational identification of multi-omic correlates of anticancer therapeutic response. BMC Genomics. 2014;15 Suppl 7:S2 pubmed publisher
  33. Quinn T, Singh S, Lees K, Bath P, Myint P. Validating and comparing stroke prognosis scales. Neurology. 2017;89:997-1002 pubmed publisher
    ..Our comparative analyses confirm differences in the prognostic accuracy of stroke scales. However, even the best performing scale had prognostic accuracy that may not be sufficient as a basis for clinical decision-making. ..
  34. Lloyd Price J, Mahurkar A, Rahnavard G, Crabtree J, Orvis J, Hall A, et al. Strains, functions and dynamics in the expanded Human Microbiome Project. Nature. 2017;550:61-66 pubmed publisher
    ..This study furthers our knowledge of baseline human microbial diversity and enables an understanding of personalized microbiome function and dynamics...
  35. Wang J, Tiyip T, Ding J, Zhang D, Liu W, Wang F, et al. Desert soil clay content estimation using reflectance spectroscopy preprocessed by fractional derivative. PLoS ONE. 2017;12:e0184836 pubmed publisher
    ..888, RMSEC = 0.446%, [Formula: see text] = 0.918, RMSEP = 0.383% and RPD = 2.511 ? 2.000) were most effective. Furthermore, they performed well in quantitative estimations of the clay content of soils in the study area...
  36. Deeter A, Dalman M, Haddad J, Duan Z. Inferring gene and protein interactions using PubMed citations and consensus Bayesian networks. PLoS ONE. 2017;12:e0186004 pubmed publisher
  37. Venkatesan M, Gadalla N, Stepniewska K, Dahal P, Nsanzabana C, Moriera C, et al. Polymorphisms in Plasmodium falciparum chloroquine resistance transporter and multidrug resistance 1 genes: parasite risk factors that affect treatment outcomes for P. falciparum malaria after artemether-lumefantrine and artesunate-amodiaquine. Am J Trop Med Hyg. 2014;91:833-43 pubmed publisher
  38. Yang C, Wang X, Liao X, Han C, Yu T, Qin W, et al. Aldehyde dehydrogenase 1 (ALDH1) isoform expression and potential clinical implications in hepatocellular carcinoma. PLoS ONE. 2017;12:e0182208 pubmed publisher
    ..Furthermore, high serum AFP levels contributed to lower ALDH1L1. ALDH1A1, ALDH1B1, and ALDH1L1 were all considered promising diagnostic and prognostic markers as well as potential drug targets. ..
  39. Silberzahn R, Uhlmann E. Crowdsourced research: Many hands make tight work. Nature. 2015;526:189-91 pubmed publisher
  40. Ogbo F, Agho K, Page A. Determinants of suboptimal breastfeeding practices in Nigeria: evidence from the 2008 demographic and health survey. BMC Public Health. 2015;15:259 pubmed publisher
  41. Yazdani M, Chow J, Manovich L. Quantifying the development of user-generated art during 2001-2010. PLoS ONE. 2017;12:e0175350 pubmed publisher
    ..Our analysis reveals a number of gradual and systematic changes over a ten-year period in artworks belonging to both categories. ..
  42. Wedemeyer A, Kliemann L, Srivastav A, Schielke C, Reusch T, Rosenstiel P. An improved filtering algorithm for big read datasets and its application to single-cell assembly. BMC Bioinformatics. 2017;18:324 pubmed publisher
    ..Our Bignorm algorithm allows assemblies of competitive quality in comparison to Diginorm, while being much faster. Bignorm is available for download at . ..
  43. Taichman D, Sahni P, Pinborg A, Peiperl L, Laine C, James A, et al. Data Sharing Statements for Clinical Trials: A Requirement of the International Committee of Medical Journal Editors. PLoS Med. 2017;14:e1002315 pubmed publisher
  44. Gui H, Kwan J, Sham P, Cherny S, Li M. Sharing of Genes and Pathways Across Complex Phenotypes: A Multilevel Genome-Wide Analysis. Genetics. 2017;206:1601-1609 pubmed publisher
    ..The investigation on genetic sharing at three different levels presents a complementary picture of how common DNA sequence variations contribute to disease comorbidities and trait manifestations. ..
  45. Johnson K, Maughan E, Bergren M, Wolfe L, Gerdes J. Step Up & Be Counted! Strategies for Data Collection. NASN Sch Nurse. 2017;32:356-360 pubmed publisher
    ..The following is a discussion of some of the data collection innovations shared by Designated State Data Champions at the 2017 NASN Annual Conference...
  46. Voros S, Moreau Gaudry A. Sensor, signal, and imaging informatics: big data and smart health technologies. Yearb Med Inform. 2014;9:150-3 pubmed publisher
    ..This review shows that it is necessary not only to develop new tools specifically designed for Big Data, but also to evaluate their performance on such large datasets. ..
  47. Mohorianu I, Bretman A, Smith D, Fowler E, Dalmay T, Chapman T. Comparison of alternative approaches for analysing multi-level RNA-seq data. PLoS ONE. 2017;12:e0182694 pubmed publisher
    ..The proposed approaches have the potential to improve key steps in the analysis of RNA-seq data by incorporating the structure and characteristics of biological experiments. ..
  48. Gazzo A, Raimondi D, Daneels D, Moreau Y, Smits G, Van Dooren S, et al. Understanding mutational effects in digenic diseases. Nucleic Acids Res. 2017;45:e140 pubmed publisher
    ..Together, our results show that digenic disease data generates novel insights, providing a glimpse into the oligogenic realm. ..
  49. Epskamp S, Kruis J, Marsman M. Estimating psychopathological networks: Be careful what you wish for. PLoS ONE. 2017;12:e0179891 pubmed publisher
    ..To illustrate this point, we discuss recent literature and show the effect of the assumption of sparsity in three simulation studies. ..
  50. Esbenshade A, Zhao Z, Aftandilian C, Saab R, Wattier R, Beauchemin M, et al. Multisite external validation of a risk prediction model for the diagnosis of blood stream infections in febrile pediatric oncology patients without severe neutropenia. Cancer. 2017;123:3781-3790 pubmed publisher
  51. Liu L, Wu F, Zhang W. A group LASSO-based method for robustly inferring gene regulatory networks from multiple time-course datasets. BMC Syst Biol. 2014;8 Suppl 3:S1 pubmed publisher
    ..The simulation and real data examples demonstrate the effectiveness of the Huber group LASSO in integrating multiple time-course gene expression datasets and improving the resistance to large errors or outliers. ..
  52. Chao Y, Wu C. Principal component-based weighted indices and a framework to evaluate indices: Results from the Medical Expenditure Panel Survey 1996 to 2011. PLoS ONE. 2017;12:e0183997 pubmed publisher
    ..The indices selected by this framework could lead to a new genre of publications focusing on meaningful aggregation of information. ..
  53. Leem S, Park T. An empirical fuzzy multifactor dimensionality reduction method for detecting gene-gene interactions. BMC Genomics. 2017;18:115 pubmed publisher
    ..The program written in R for EF-MDR is available at . ..