Conquaire: Continuous quality control for research data to ensure reproducibility

Conquaire presented at the RepScience 2016 Workshop in Hannover

On Friday, 9 September 2016, Conquaire team members Cord Wiljes, Jochen Schirrwagen and Vidya Ayer attended the RepScience 2016 Workshop, co-located with TPDL 2016 at the Congress Centrum near the city zoo in the historic city of Hannover. The city, bounded by two small rivers, was largely rebuilt after the war and seamlessly melds old and new: modern art occupies pride of place along the green, tree-lined central streets, while an erstwhile palace houses the main Leibniz University building.

Jochen chaired the first and the last (third) session of the workshop in the afternoon, and Vidya presented the goals and architecture of the Conquaire project at Universität Bielefeld. The workshop day kicked off with an interesting keynote by Prof. Carole Goble of the University of Manchester, who enumerated research objects and the many 'R's of the research lab environment ('replicable', 'reproduce', 'rerun', 'repeat', 'repair', 'reconstruct', etc.) from both a technical and a theoretical perspective.

Jingbo introduced the attendees to the Provenance Capture System at the Australian NCI's HPC centre. This cloud-based solution hosts 10 PB of research data and uses the PROV Ontology to create a traceable, reproducible, and machine-actionable workflow, storing and publishing the provenance information as RDF documents in an RDF graph database backed by PID services.
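As a rough illustration of the idea (a sketch only, not the NCI system itself; all identifiers below are hypothetical), a provenance record in the PROV-O vocabulary boils down to a handful of RDF triples linking a dataset to its source and to the activity that produced it:

```python
# Sketch: express dataset provenance as PROV-O triples and serialize them
# to N-Triples, ready for loading into an RDF graph database.
# The 'ex:' identifiers are invented for this example.

PROV = "http://www.w3.org/ns/prov#"

triples = [
    ("ex:climate-subset", f"{PROV}wasDerivedFrom", "ex:raw-observations"),
    ("ex:climate-subset", f"{PROV}wasGeneratedBy", "ex:subsetting-run-42"),
    ("ex:subsetting-run-42", f"{PROV}wasAssociatedWith", "ex:researcher-1"),
]

def to_ntriples(triples, base="http://example.org/"):
    """Expand the 'ex:' prefix and emit one N-Triples statement per line."""
    def iri(term):
        return "<" + term.replace("ex:", base) + ">"
    return "\n".join(f"{iri(s)} <{p}> {iri(o)} ." for s, p, o in triples)

print(to_ntriples(triples))
```

Because each statement is self-describing, a graph store can answer questions like "which activity produced this dataset?" long after the original workflow has finished.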

The Scholix talk touched upon the interoperability issues in linking datasets with literature and the lack of a cogent exchange framework, which Scholix attempts to solve by creating a clear set of guidelines for research objects, citations, data formats, protocols, etc., within the Open Access framework.

The o2r presentation touched upon the publication process: a researcher submits a folder, and the system creates an ERC (Executable Research Compendium) for the existing workspace and also checks the metadata. The entire project uses Node.js extensively and, most interestingly, uses Docker containers to package the R workspace into base images for the ERC. Similarly, the ReplicationWiki project also uses Docker and VMs for their research workflow automation.

Dr. Dallmeier-Tiessen, an invited speaker, shared her experiences and lessons learned while enabling reproducible research within her research group at CERN, which echoed the issues most researchers face. Stefan Pröll from SBA spoke about their query store, which records queries, parameters, and metadata for data citations and datasets in both small- and large-scale research settings.
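A query store of this kind can be sketched in a few lines. The schema and hashing scheme below are assumptions made for illustration, not SBA's actual implementation: the point is that hashing the query text together with its parameters yields a stable, citable identifier for a past query over an evolving dataset.

```python
# Sketch of a query store: persist query text, parameters, and a timestamp
# under a deterministic identifier, so a data citation can later point back
# to the exact query that produced a result. (Illustrative schema only.)
import hashlib
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE query_store (
        pid      TEXT PRIMARY KEY,   -- citable identifier
        query    TEXT NOT NULL,
        params   TEXT NOT NULL,      -- JSON-encoded parameters
        executed TEXT NOT NULL       -- UTC timestamp of first execution
    )
""")

def store_query(query, params):
    """Hash the query plus parameters into an identifier and persist both."""
    payload = json.dumps({"query": query, "params": params}, sort_keys=True)
    pid = "query:" + hashlib.sha256(payload.encode()).hexdigest()[:12]
    conn.execute(
        "INSERT OR IGNORE INTO query_store VALUES (?, ?, ?, ?)",
        (pid, query, json.dumps(params),
         datetime.now(timezone.utc).isoformat()),
    )
    return pid

pid = store_query("SELECT * FROM measurements WHERE site = ?", ["Bielefeld"])
print(pid)  # the same query and parameters always yield the same identifier
```

Storing the parameters separately from the query text keeps the citation precise even when the same parameterised query is reused with different inputs.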

The FLARE project tackles the research e-infrastructure challenge of repeating scientific workflows through optimised database management, including language operators, execution operators, and workflow controls for different types of data sources. They plan to extend an existing language such as R to achieve this flexibility in data manipulation.

The workshop ended with a lively brainstorming session in which attendees and speakers exchanged ideas, ranging from 'failure' or negative results that are never published to the lack of guidelines for 'replication' and 'reproducibility' in research.