NISO Virtual Conference: Dealing with the Data Deluge: Successful Techniques for Scientific Data Management

April 23, 2014
11:00 a.m. - 5:00 p.m. (Eastern Time)

System Requirements:

  • NISO has developed a quick tutorial, How to Participate in a NISO Web Event. Please view the recording, which is an overview of the web conferencing system and will help to answer the most commonly asked questions regarding participating in an online Webex event.
  • You will need a computer for the presentation and Q&A.
  • Audio is available through the computer (broadcast) and by telephone. We recommend setting up telephone audio as a backup even if you plan to use the broadcast audio, because voice over Internet isn't always 100% reliable.
  • Please check your system in advance to make sure it meets the Cisco WebEx requirements. It is your responsibility to ensure that your system is properly set up before each webinar begins. 

About the Virtual Conference

With the expansion of digital data collection and the increased expectations of data sharing, researchers are turning to their libraries or institutional repositories as a place to store and preserve that data. Many institutions have created such data management services and see the data curation role as a growing and important element of their service portfolio. While some of the experience in managing other types of digital resources is transferable, the management of large-scale scientific data has many special requirements and challenges. From metadata collection and cataloging data sources, to identification, discovery, and preservation, best practices and standards are still in their infancy.

This Virtual Conference will explore in greater depth than traditional webinars some of the practical lessons from those who have implemented data management and developed best practices, as well as provide some insight into the evolving issues the community faces. It will include discussions related to certification of trusted repositories, provenance and identification issues around data, data citation, preservation, and the work of several repository networks to advance distribution of scientific information.


Todd Carpenter, Executive Director, NISO

* * * * * * * * *
11:10 a.m. - 12:00 p.m. Keynote Speaker: DataCite – A Global Approach for Better Data Sharing
Jan Brase, Ph.D., German National Library of Science and Technology

The publication of research data is still not a widespread practice in many disciplines. The lack of acceptance of data as scientific output equal to scientific articles, and the lack of suitable infrastructures for the storage of data make it difficult to publish and cite data independently. The global consortium DataCite was established in 2009 to overcome the challenges of data citation. The aim of the consortium is to establish easy access to data, to increase the acceptance of data publication and to support data archiving. The use of Digital Object Identifiers (DOI) provides an easy method to access and re-use research data. The DOI facilitates the citation of data and therefore increases the availability and acknowledgement of research data.

Jan Brase has a degree in Mathematics and a Ph.D. in Computer Science. His research background is metadata, ontologies and digital libraries. Since 2005, he has served as head of the DOI Registration Agency for research data at the German National Library of Science and Technology (TIB). Jan is also the Executive Officer of DataCite, an international consortium (with 22 members from 14 countries) focused on making online access to research data easier for scientists by promoting the acceptance of research data as individual, citable scientific objects.

Jan is Chair of the International DOI Foundation (IDF), President of the International Council for Scientific and Technical Information (ICSTI) and Co-Chair of the recently established CODATA Data Citation task group. He is the author of several articles and conference papers on the citation of data sets and the new challenges for libraries in dealing with such non-textual information objects.

* * * * * * * * *

12:00 p.m. - 12:30 p.m. Guidelines and Resources for Office of Science and Technology Policy (OSTP) Data Access Plans
Jared Lyle, Director of Data Curation Services, Inter-university Consortium for Political and Social Research (ICPSR), University of Michigan

In February 2013, the Executive Office of the President's Office of Science and Technology Policy published a memo titled "Increasing Access to the Results of Federally Funded Scientific Research," which directs funding agencies with an annual R&D budget over $100 million to develop a public-access plan for disseminating the results of their research. This presentation will cover the data portion of the memo, explain the 13 elements for a public-access plan, and discuss resources and options to successfully address the elements, including examples of public-access projects at ICPSR.

Jared Lyle directs the Curation Services Unit at the Inter-university Consortium for Political and Social Research (ICPSR). His work includes developing and maintaining a comprehensive approach to data management and digital preservation policy at ICPSR.

* * * * * * * * *

12:30 p.m. - 1:00 p.m. Joint Declaration of Data Citation Principles: Implementation and Compliance in the Dataverse Repository  
Mercè Crosas, Ph.D., Director of Data Science, Institute for Quantitative Social Science (IQSS), Harvard University

Decades of data citation research, initiatives and guidelines have been consolidated into a single set of Data Citation Principles, created by a synthesis group that represents more than 25 organizations. The principles are driven by the premise that "sound, reproducible scholarship rests upon a foundation of robust, accessible data" and therefore "data should be considered legitimate, citable products of research". The Dataverse repository, developed at Harvard University's IQSS, generates a data citation compliant with the Joint Principles, and provides data publishing workflows to guarantee a persistent linkage between journal articles and the underlying data. The Dataverse is open and free to all researchers.

Mercè Crosas is the Director of Data Science at the Institute for Quantitative Social Science (IQSS) at Harvard University. Her group includes the Dataverse Network project, data acquisition and curation, the Murray Research Archive, statistical programming (Zelig and other R statistical packages), and the Consilience project on text analysis.

* * * * * * * * *

1:00 p.m. - 1:45 p.m. Lunch Break

* * * * * * * * *

1:45 p.m. - 2:15 p.m. Purdue University Research Repository (PURR): A Commitment to Supporting Researchers
Michael Witt, Head, Distributed Data Curation Center (D2C2); Associate Professor of Library Science, Purdue University Research Repository (PURR)

The Purdue University Research Repository (PURR) uses HUBzero to provide a research collaboration and data management platform for the researchers on its campus. Funding agencies in the United States and other countries are beginning to require that researchers explain how they will manage and share the data that will be produced from their research. PURR enables researchers to create and implement effective data management plans, to invite collaborators to work with them in a web-based, virtual research environment, to publish datasets in a scholarly context, and to archive them in a secure, reliable institutional repository. This presentation will explore the development of PURR, including an overview of its service definition and design, governance, roles and responsibilities, workflows, policies, infrastructure, and some early metrics of adoption.

Michael Witt is the head of the Distributed Data Curation Center (D2C2) and an Associate Professor of Library Science. He is the library's liaison to Computer Science as well as the Project Director for the Purdue University Research Repository (PURR).

* * * * * * * * *

2:15 p.m. - 2:45 p.m.  The Roles of Data Citation in Data Management
Christine L. Borgman, Professor & Presidential Chair in Information Studies, UCLA

One of the continuing barriers to managing data for reuse is that authors rarely cite the data they use. The problem is manifold. Many stakeholders with competing interests are concerned with data citation, including authors, publishers, funding agencies, universities, libraries, repositories, and commercial and public interest groups. Goals and practices for citation mechanisms vary accordingly, such as credit, attribution, discovery, licensing, access, curation, and reuse. The problem of data citation is not simply a technical matter of mapping bibliographic citation methods. Rather, it is a challenge that lies deep within scholarly communication practices. This talk will explore the roles of data citation in scholarship and the implications for managing research data.

Christine L. Borgman is Professor and Presidential Chair in Information Studies at UCLA. Prof. Borgman is the author of more than 200 publications in information studies, computer science, and communication. Her monographs, Scholarship in the Digital Age: Information, Infrastructure, and the Internet (MIT Press, 2007) and From Gutenberg to the Global Information Infrastructure: Access to Information in a Networked World (MIT Press, 2000), each won the Best Information Science Book of the Year award from the American Society for Information Science and Technology. Her next book, Big Data, Little Data, No Data: Scholarship in the Networked World, is forthcoming from MIT Press in late 2014. She is a Fellow of the American Association for the Advancement of Science and of the Association for Computing Machinery.

* * * * * * * * *

2:45 p.m. - 3:15 p.m.  Is This Data Fit for My Use? The Challenges and Opportunities Data Provenance Presents
Adriane Chapman, MITRE

Given huge amounts of data to sift through, how can a user truly understand whether a dataset is fit for a particular purpose? If the fields and data look about right, what other clues are there to help find and choose between data sources? Provenance can provide additional information to a user on the applicability of a data source: was it owned by an untrustworthy organization in the past, was it run through a cleansing algorithm that altered fields, and so on? This presentation will discuss provenance usage, as well as the cost of capturing it.

Adriane Chapman, Ph.D., is a Principal Database Technology Software Engineer at MITRE, where she leads the provenance research effort. Her focus is on taking academic theory on provenance and making it viable in functioning systems.

* * * * * * * * *

3:15 p.m. - 3:30 p.m. Afternoon Break

* * * * * * * * *

3:30 p.m. - 4:00 p.m. A Durable Space: Technologies for Accessing Our Collective Digital Heritage 
David Wilcox, Product Manager, DuraSpace

Repositories play a critical role in managing and preserving scientific research data. This presentation will give an overview of repository software, with a particular focus on the open source Fedora Commons project. Fedora provides a flexible, extensible digital object model that is ideally suited to supporting the diversity and complexity of scientific research data. By exploring real-world use cases, this presentation will showcase the ways Fedora can be used for research data management.

David Wilcox is the Product Manager for the open source Fedora Repository project at DuraSpace. He sets the vision for Fedora and serves as strategic liaison to the steering committee, advisory group, sponsors, service providers, and other stakeholders. David works together with the Fedora Technical Lead to oversee key project processes such as gathering requirements, setting work priorities, and coordinating user acceptance testing. David holds an MLIS from Dalhousie University and a BA from St. Thomas University.

* * * * * * * * *

4:00 p.m. - 4:30 p.m. The SHared Access Research Ecosystem (SHARE) Project: A Joint Initiative of ARL, AAU, and APLU
Judy Ruttenberg, Program Director for Transforming Research Libraries, Association of Research Libraries (ARL)

Ruttenberg will provide an update on SHARE (SHared Access Research Ecosystem), a higher-education based initiative that aims to make the inventory of research assets more discoverable and more accessible, and to enable the research community to build upon these assets in creative and productive ways. Specifically, Ruttenberg will describe SHARE's plans for an automated notification service for "research events" defined broadly to include published articles, data, grey literature and more. She will focus on the minimum requirements for the service, and touch upon concurrent discussions within the overall SHARE initiative.

Judy Ruttenberg is the program director for the Transforming Research Libraries strategic direction. Her responsibilities also include the E-Research Working Group and the Transforming Special Collections in the Digital Age Working Group.

Prior to joining ARL in 2011, Judy was a program officer at the Triangle Research Libraries Network (TRLN) where she coordinated the work of TRLN's collections groups, focusing on issues such as collections analysis, shared collections, and large-scale digitization. She has also held library appointments at the University of California Irvine, California State University at Fullerton, and the National Criminal Justice Reference Service.

Judy holds an MLS from the University of Maryland College Park, an MA in history from the University of Massachusetts Amherst, and a BA from the University of Michigan.

* * * * * * * * *

4:30 p.m. - 5:00 p.m. Conference Roundtable:
Moderated by: Todd Carpenter, Executive Director, NISO

Event Slides


If paying by credit card, register online.

If paying by check, please use this PDF form.

Registration closes on April 22, 2014 at 4:00 p.m. (ET).

Registration Costs

  • NISO Member
    • $185.00 (US and Canada)
    • $225.00 (International)
  • Non-Member
    • $245.00 (US and Canada)
    • $285.00 (International)
  • Student
    • $80.00

Additional Information

  • Registration closes on April 22, 2014 at 4:00 p.m. (ET). Cancellations made by April 16, 2014 will receive a refund, less a $35 cancellation fee. After that date, there are no refunds.

  • Registrants will receive detailed instructions about accessing the virtual conference via e-mail the Friday prior to the event. (Anyone registering between Monday and the close of registration will receive the message shortly after the registration is received, within normal business hours.) Due to the widespread use of spam blockers, filters, out of office messages, etc., it is your responsibility to contact the NISO office if you do not receive login instructions before the start of the webinar.

  • If you have not received your login instructions e-mail by 10:00 a.m. (ET) on the Tuesday before the virtual conference, please contact the NISO office or e-mail Juliana Wood, Educational Programs Manager, at jwood@niso.org for immediate assistance.

  • Registration is per site (access for one computer) and includes access to the online recorded archive of the conference. You may have as many people as you like from the registrant's organization view the conference from that one connection. If you need additional connections, you will need to enter a separate registration for each connection needed.

  • If you are registering someone else from your organization, either use that person's e-mail address when registering or contact the NISO office to provide alternate contact information.

  • Conference presentation slides and Q&A will be posted to this event webpage following the live conference.

  • Registrants will receive an e-mail message containing access information to the archived conference recording within 48 hours after the event. This recording access is only to be used by the registrant's organization.