
\documentclass[procedia]{easychair}
\usepackage{url}
\usepackage[flushleft]{threeparttable}
\hyphenation{resour-ces}
\hyphenation{approac-hes}
\hyphenation{har-der}
\hyphenation{spe-cifically}
\makeatletter
\def\@copyrightspace{\relax}
\makeatother

\usepackage{comment}
\usepackage{wrapfig}
\excludecomment{TM}

\title{An Invariant Framework for Conducting Reproducible Computational Science}

\titlerunning{An invariant framework for conducting reproducible computational science}

\author{
	Haiyan Meng\inst{2}
\and Rupa Kommineni\inst{1}
\and Quan Pham\inst{1} \\
\and Robert Gardner\inst{1}
\and Tanu Malik\inst{1}
\and
	Douglas Thain\inst{2}
}
\institute{
	Computation Institute,
	University of Chicago,
	Chicago, Illinois, USA \\
	\email{rupa, quanpt, rwg, tanum@uchicago.edu}
\and
	Department of Computer Science and Engineering,
	University of Notre Dame,
	Notre Dame, Indiana, USA \\
	\email{hmeng, dthain@nd.edu}
}

%  \authorrunning{} has to be set for the shorter version of the authors' names;
% otherwise a warning will be rendered in the running heads. When processed by
% EasyChair, this command is mandatory: a document without \authorrunning
% will be rejected by EasyChair

\authorrunning{Meng et al.}

\begin{document}

\maketitle

\keywords{Preservation framework, reproducible research, virtualization, container}

\begin{abstract}
\it Computational reproducibility depends on the ability not only to isolate the necessary and sufficient computational artifacts but also to preserve those artifacts for later re-execution. Both isolation and preservation are challenging, in large part because of the complexity of existing software and systems and because of the implicit dependencies, resource distribution, and shifting compatibility of systems that arise over time, all of which conspire to break the reproducibility of an application. Sandboxing is a technique that has been used extensively in OS environments to isolate computational artifacts. Several recently proposed tools employ sandboxing as a mechanism to ensure reproducibility. However, none of these tools preserves the sandboxed application for redistribution to a larger scientific community, although preservation and redistribution are as crucial to reproducibility as sandboxing itself. In this paper, we describe a framework that combines sandboxing and preservation and that is not only efficient and invariant but also practical for large-scale reproducibility. We present case studies of complex high-energy physics applications and show how the framework can be used to sandbox, preserve, and distribute applications. We report on the completeness, performance, and efficiency of the framework and suggest possible standardization approaches.

\end{abstract}

\vspace{-10pt}
\input {intro}
\vspace{-10pt}
\input {tauroast}
\vspace{-10pt}
\input {observation}
\vspace{-10pt}
\input {evolution}
\vspace{-10pt}
\input {measure}
\vspace{-10pt}
\input {evaluation}
\vspace{-10pt}
\input {related_work}
\vspace{-10pt}
\section{Conclusions and Future Work}
In this paper, we propose an invariant framework for conducting reproducible computational science. The framework uses lightweight virtualization approaches to preserve applications as self-contained packages and standardized software delivery mechanisms to deliver and distribute the preserved packages.
We use two complex high-energy physics applications to illustrate how the framework helps the original authors preserve and distribute their applications and helps others reproduce them.

This paper focuses on how to measure the mess and track the dependencies that an application actually uses, so that the application can be preserved.
In future work, we plan to explore how to preserve an application in a more organized way by clearly specifying its execution environment.
Preserving remote network resources and improving their availability is another important problem to explore.

The DOI for the experiment described in this paper is \url{doi:10.7274/R0C24TCG}, and current information may be found on the web at \url{http://doi.org/doi:10.7274/R0C24TCG}.

The Athena experiment is further preserved at: 

\indent \indent \indent \indent \indent \url{https://sites.google.com/site/invariantcompatlas/} 

\section*{Acknowledgments}

This work was supported in part by National Science Foundation grants PHY-1247316 (DASPOS), 
OCI-1148330 (SI2), PHY-1312842, ICER-1440327, SES-0951576 (RDCEP), and ICER-1343816 (UChicago subcontract).
Scientists and engineers at the University of Notre Dame Center for Research Computing, as well as the Open Science Grid at the University of Chicago, provided critical technical assistance throughout this research effort.

\vspace{-10pt}
\bibliographystyle{plain}
\bibliography{cclpapers,this,sole}

\end{document}