TY - GEN
T1 - Exploring repositories of scientific workflows
AU - Stoyanovich, Julia
AU - Taskar, Ben
AU - Davidson, Susan
PY - 2010
Y1 - 2010
N2 - Scientific workflows are gaining popularity, and repositories of workflows are starting to emerge. In this paper we present some initial experiences of information discovery in repositories of scientific workflows. In the first part of the paper we consider a collection of VisTrails workflows, and explore how this collection may be summarized when workflow modules are used as features. We present a hierarchical browsable view of the repository in which categories are derived using frequent itemset mining or latent Dirichlet allocation. We demonstrate that both approaches may be used for effective data exploration. In the second part of the paper we focus on a collection of Taverna workflows from myExperiment.org, and consider how these workflows may be browsed using modules and tags as features. Finally, we outline some interesting challenges and describe conditions under which these techniques work well for repositories of scientific workflows, and conditions under which additional work is needed for effective data exploration.
AB - Scientific workflows are gaining popularity, and repositories of workflows are starting to emerge. In this paper we present some initial experiences of information discovery in repositories of scientific workflows. In the first part of the paper we consider a collection of VisTrails workflows, and explore how this collection may be summarized when workflow modules are used as features. We present a hierarchical browsable view of the repository in which categories are derived using frequent itemset mining or latent Dirichlet allocation. We demonstrate that both approaches may be used for effective data exploration. In the second part of the paper we focus on a collection of Taverna workflows from myExperiment.org, and consider how these workflows may be browsed using modules and tags as features. Finally, we outline some interesting challenges and describe conditions under which these techniques work well for repositories of scientific workflows, and conditions under which additional work is needed for effective data exploration.
UR - http://www.scopus.com/inward/record.url?scp=77955857994&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77955857994&partnerID=8YFLogxK
U2 - 10.1145/1833398.1833405
DO - 10.1145/1833398.1833405
M3 - Conference contribution
AN - SCOPUS:77955857994
SN - 9781450301886
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
BT - Proceedings of the 1st International Workshop on Workflow Approaches to New Data-centric Science, Wands '10
T2 - 1st International Workshop on Workflow Approaches to New Data-centric Science, Wands '10
Y2 - 6 June 2010 through 6 June 2010
ER -