Research data repositories based on platforms such as DSpace were introduced about 10 years ago, and their use in domains such as chemistry and molecular sciences has gradually increased. Their importance has recently come to the fore with funding agencies in the USA, Europe and Asia all indicating that open deposition of research data will become a mandatory aspect of their funding, and many universities are now starting to consider the implications of implementing research data management, or RDM. An early example of such RDM is a project to produce a library of quantum-mechanically-optimised molecular coordinates derived from a computable subset of the National Cancer Institute's (NCI) collection of small molecules. The information for each molecule was originally annotated by optimising the coordinates with respect to the energy obtained using the semi-empirical PM5 parameter set in MOPAC (then the most current parameter set) and creating a DSpace collection. At the commencement of the present project, the original deposition of this information for 175,356 molecules into the institutional repository of the University of Cambridge represented the only openly accessible copy.

An issue frequently raised in the context of research data management relates to the prospects of being able to access and use such digitally held information in the future. Until relatively recently, such questions were largely directed towards the expected longevity of physical media such as punched cards and floppy disks (both now effectively extinct), hard drives, CD-ROMs, DVDs, magnetic tape etc. Few of these media have proven lifetimes exceeding 20 years, and the real problem would be locating working devices capable of reading such physical media in the future. Quite different problems are associated with virtual collections, where the physical medium is less important than the information associated with the data itself. In this context, it is becoming increasingly accepted that successful long-term preservation of digital data depends upon repeated incremental improvements or curations taking place in 5–10 year cycles. Such operations can in principle be repeated indefinitely, thus creating a long-term mechanism with an anticipated lifetime of 100+ years if required. These curation cycles can track the evolution of data storage hardware, data formats and the introduction of new software, so ensuring that the data remains accessible and in a usable form.

The purpose of this project was to explore the viability of the long-term preservation of the 10-year-old Cambridge dataset through such an incremental curation, by migrating it to the SPECTRa repository hosted at Imperial College London. Specific benefits of undertaking such a curation include re-filtering the original source data for errors not previously eliminated, producing an enhanced metadata record for each entry, and recomputing the optimised molecular coordinates using the newer PM7 method. The original PM5 method used to obtain the molecular geometries was never formally published and is now unavailable, whereas the succeeding PM7 method has been formally peer reviewed and published.

We will also compare our approach with two other examples drawn from computational chemistry. The first is typical of how almost all datasets derived from molecular computations are currently curated, in this case the stochastic generation of all possible stable molecular structures from an initial set of specified atoms. The trend in scientific publication in recent years has required authors reporting such studies to include more extensive data in the form of supporting information (SI) to accompany the scientific narrative from which their models are constructed and their conclusions drawn. We will argue here that these SI-based mechanisms for depositing, retrieving and re-using the data components of journal articles are no longer fit for this purpose (if indeed they ever were) and should be urgently replaced by repositories of data and closely-coupled metadata as a fundamentally different model for research data management. The second example describes such a deposition of a dataset containing the quantum mechanically computed structures and properties of 134,000 molecules into the Figshare digital repository.
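
As a rough illustration of what one incremental curation cycle of the kind described above might involve (re-filtering entries, recomputing coordinates with a newer method, and enriching the metadata), here is a minimal Python sketch. It is not the workflow used in the project: the `Entry` record, the `is_valid` filter, the `curate` function and the `reoptimise` callback are all hypothetical names introduced for illustration, and `reoptimise` merely stands in for an external calculation such as a PM7 optimisation. The SHA-256 checksum is an assumed way of recording a fixity value that a later cycle could use to confirm the data has not changed.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One deposited molecule record (hypothetical structure)."""
    identifier: str
    coordinates: str              # text of a coordinate file
    method: str                   # method that produced the coordinates, e.g. "PM5"
    metadata: dict = field(default_factory=dict)


def is_valid(entry: Entry) -> bool:
    """Placeholder error filter (assumption): reject entries with no coordinates."""
    return bool(entry.coordinates.strip())


def curate(entries, reoptimise, new_method):
    """Run one curation cycle and return the migrated entries.

    `reoptimise` is a caller-supplied function mapping old coordinates to
    new ones; here it stands in for an external quantum chemistry run.
    """
    migrated = []
    for entry in entries:
        if not is_valid(entry):
            continue  # drop entries whose errors were missed earlier
        coords = reoptimise(entry.coordinates)
        checksum = hashlib.sha256(coords.encode("utf-8")).hexdigest()
        # Keep the old metadata and record provenance plus a fixity value.
        metadata = dict(entry.metadata,
                        previous_method=entry.method,
                        sha256=checksum)
        migrated.append(Entry(entry.identifier, coords, new_method, metadata))
    return migrated


# Usage with an identity "reoptimisation" purely for demonstration.
old = [Entry("mol-1", "C 0.0 0.0 0.0", "PM5"), Entry("mol-2", "   ", "PM5")]
new = curate(old, reoptimise=lambda c: c, new_method="PM7")
```

Because each cycle writes its provenance into the metadata rather than overwriting it, the same function could in principle be applied again in a later cycle with a different method, which is the sense in which such curation is incremental.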