Conquaire Continuous quality control for research data to ensure reproducibility

Conquaire at ELAG 2017 in Athens

The ELAG 2017 conference was hosted by the NTUA from 5 to 9 June in sunny Athens. Our project (Conquaire) had applied to conduct a workshop at ELAG on "Automatic quality feedback for inter-disciplinary research data management", to be jointly presented by Christian and Vid.

Before the workshop, Christian and I attended the bootcamp "Applying JSON-LD to make linked-data-driven applications" on Tuesday. The main conference started on Wednesday, 7 June, with the keynote speeches. It was fun to see Tory mingle with the attendees, who gladly took pics.

We had created a private repo so that the attendees could see the demo and understand how GitLab can be used as a research tool to maintain the quality of research data. All attendees were sent a login and could find the workshop material within the repo.

The workshop session ran over two days, Wednesday and Thursday, with the third day reserved for feedback presentations from the attendees in the main hall. On Wednesday, we had 10 attendees from different countries and work backgrounds. Christian and I had decided that we wanted an open, un-conference, BoF-like session that kept the discussion free-flowing, with notes collected in etherpads. I kick-started the workshop by introducing the two of us and the Conquaire project, and by presenting our workshop agenda and intended outcomes:

* Learn and gain insights on current research data management (RDM) practices.
* Document tools and methodologies currently used within research groups.
* Document guidelines on organizing research workflows in various interdisciplinary scientific research projects.
* Document the impact of Libre tools on research reproducibility and workflow automation.

Then I asked each attendee to introduce themselves, as we wanted to get to know them and their expectations of the workshop, and I designated a maintainer for each etherpad. Christian used this opportunity to present the existing research data and publication services run by Bielefeld University Library.

After this, I gave a more detailed introduction to the Conquaire project, and the first topic was the "Tools" used by researchers. We had a lively discussion on how researchers use FOSS but don't release their own software because the legal department does not want them to lose the IP. Then we discussed the fundamental RDM challenges in interdisciplinary research projects: understanding the common file formats, research objects, and ontologies for storing metadata that are used in research environments, and how one could build an infrastructure that caters to this diversity. We quickly ran out of time and had to postpone two topics (data pipeline maintenance tools, and common computational services, skills and technology) to the second day of the workshop.

On Thursday, the second day of the workshop, I started by resuming where we had left off the previous day, viz. data pipeline maintenance tools, and common computational services, skills and technology. Then Christian gave a demo of the GitLab infrastructure that could be used for automation and software maintenance. He explained that GitLab can not only track changes to research data, but can also run automated tests that check data integrity as the data arrives, or make sure that analysis scripts still produce the same results. This can be done using GitLab CI or other continuous integration tools, and he presented a minimal .gitlab-ci.yml file to get people started. Some participants had not used Git before, but they were impressed with the hands-on demo.
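To give a flavour of what such an automated integrity test might look like, here is a minimal Python sketch. It is not the script shown at the workshop: the function names `sha256_of` and `find_mismatches`, and the idea of keeping a manifest of expected checksums next to the data, are assumptions made purely for illustration.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest differs.

    `manifest` is a hypothetical mapping of relative file path -> expected
    SHA-256 digest, recorded when the data was first committed.
    """
    problems = []
    for relative_path, expected in sorted(manifest.items()):
        target = data_dir / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(relative_path)
    return problems
```

A CI job could call `find_mismatches` on every push and exit with a non-zero status when the returned list is not empty, which marks the job as failed and gives the researcher immediate feedback.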

After this, we continued the workshop discussion with the impact of dependency hell (constant changes in technology) on reproducibility. Most participants found this an interesting aspect, as they had not previously considered it a serious issue. We discussed reproducibility vis-a-vis "research freedom", storage issues, and which library services (e.g. Beaker, Jupyter notebooks, etc.) could be integrated to manage research data.

Participants seemed to enjoy the interleaving of slide presentations with feedback rounds and practical demonstrations. On Friday, Alain Borel from the Swiss Federal Institute of Technology, Lausanne, presented his feedback in the main hall. Another researcher, Vasiliki Kokkala, e-mailed their feedback and allowed us to reproduce it here:

It was a pleasure to attend this workshop. I am a postgraduate student and have not worked with the management of research data, but it is an area that interests me. So for me the workshop was useful as an analytic introduction to the subject, and I liked the way the subject matters were constructed in the process. It was also of great profit for me to listen to the other participants' experience of managing research data in an interdisciplinary context. It helped me to get a concrete idea of the practices and the problems that take place in the various institutions.