Program (Workshop is in Room: Hilton 408)

Please evaluate our workshop using the following link:

A special issue of the journal Future Generation Computer Systems will include papers from WORKS as well as other innovative papers in the field of workflows. Please contact us if you are interested. The CFP is here.

All slides can be found here, except for the keynote, which can be found here.

9:00-10:00: Keynote, Bertram Ludäscher
YesWorkflow: More Provenance Mileage from Scientific Workflows and Scripts
Abstract and Author Info

Bertram Ludäscher is a professor at the Graduate School of Library and Information Science (GSLIS) at the University of Illinois, Urbana-Champaign, and the Director of the Center for Informatics Research in Science and Scholarship (CIRSS). He also holds affiliate faculty appointments at the National Center for Supercomputing Applications (NCSA) and the Department of Computer Science at UIUC. Prior to joining Illinois, he was a professor in the Department of Computer Science and the Genome Center at UC Davis. His research interests are in scientific data and workflow management, and in knowledge representation and reasoning. He is one of the founders of the Kepler scientific workflow system and a member of the DataONE leadership team, focusing on data and workflow provenance. Until 2004, he was a research scientist at the San Diego Supercomputer Center (SDSC). He received his MS in computer science from the Technical University of Karlsruhe (now KIT) and his PhD from the University of Freiburg, both in Germany.


An often-touted advantage of using scientific workflow systems is their ability to capture provenance information during execution. The idea is that a controlled environment such as a workflow system makes it easy to record relevant observables, e.g., data read and write events. The captured provenance can then be used to document data lineage, to debug faulty runs, to speed up re-runs of workflows by reusing unchanged parts, or, more generally, to support the reproducibility of computational science experiments. In this talk, I will first give an overview of different notions, forms, and research questions around data and workflow provenance. The database community, for example, has developed specialized notions such as why, how, where, and why-not provenance. The scientific workflow community, on the other hand, has focused on forms of "black-box provenance", capturing, e.g., actor invocations and file I/O events to track possible data dependencies. Both communities share an interest in querying and analyzing provenance information. In the second part of the talk, I will take a critical look at the current use of provenance information from scientific workflows and scripts and argue that open, interoperable tools are needed that can combine different forms of available provenance, e.g., recorded or reconstructed retrospective provenance together with prospective provenance given by a workflow specification or via high-level user-defined annotations in scripts. To this end, I will describe YesWorkflow, a new project and toolkit under development that combines different forms of provenance information to allow users to answer questions about the data created and used during workflow runs and script executions. An important source of provenance in the YesWorkflow approach is a set of simple user annotations that represent a user's conceptual model of a workflow.
In this way, YesWorkflow can link low-level provenance observables with high-level questions users often need to answer to conduct their (computational) science. Thus, in addition to outward-facing “provenance for others”, YesWorkflow emphasizes the utility of provenance for the researchers’ own purposes.

10:30-12:00: Session 1: Scheduling & Resource Allocation
10:30-11:00 - A Workflow Runtime Environment for Manycore Parallel Architectures
Matthias Janetschek, Radu Prodan and Shajulin Benedict
11:00-11:30 - Orchestrating Workflows Over Heterogeneous Networking Infrastructures
Ian Taylor and Joe Macker
11:30-12:00 - Towards Efficient Scheduling of Data Intensive High Energy Physics Workflows
Mahantesh Halappanavar, Malachi Schram, Luis de La Torre, Kevin Barker, Nathan Tallent and Darren Kerbyson
1:30-3:00: Session 2: Dataflow Management
1:30-2:00 - Contemporary Challenges for Data-intensive Scientific Workflow Management Systems
Ryan Mork, Paul Martin and Zhiming Zhao
2:00-2:30 - Co-Sites: The Autonomous Distributed Dataflows in Collaborative Scientific Discovery
Yanwei Zhang, Matthew Wolf, Karsten Schwan, Qing Liu, Greg Eisenhauer and Scott Klasky
2:30-3:00 - Interlanguage Parallel Scripting for Distributed Memory Scientific Computing
Justin Wozniak, Timothy Armstrong, Ketan Maheshwari, Daniel Katz, Michael Wilde and Ian Foster
3:30-5:00: Session 3: Control & Re-execution
3:30-4:00 - Dynamically Reconfigurable Workflows for Time-Critical Applications
Kieran Evans, Andrew Jones, Alun Preece, Francisco Quevedo, David Rogers, Irena Spasic, Ian Taylor, Vlado Stankovski, Salman Taherizadeh, Jernej Trnkoczy, George Suciu, Victor Suciu, Paul Martin, Junchao Wang and Zhiming Zhao
4:00-4:30 - Enabling Workflow Repeatability with Virtualization Support
Fan Jiang, Claris Castillo, Charles Schmitt, Anirban Mandal, Paul Ruth and Ilya Baldin
4:30-5:00 - Workflow Provenance: An Analysis of Long Term Storage Costs
Simon Woodman, Hugo Hiden and Paul Watson
5:00-5:30: Closing comments for the 10th anniversary and discussion of the future of WORKS

You can view the proceedings material using the following URL:

Please be aware that the material was only recently uploaded to the ACM Digital Library (DL), and it may take some time for the data to propagate through the entire system.

Full-text article files (PDFs) will be available for download either on the conference start date or on the date set by the ACM SIG.