Date of Award
7-24-2017
Thesis Type
phd
Document Type
Thesis (Restricted Access)
Divisions
fsktm
Department
Faculty of Computer Science & Information Technology
Institution
University of Malaya
Abstract
With advances in science and technology, numerous complex scientific applications can be executed in heterogeneous computing environments; the bottleneck, however, is efficient scheduling. Such complex applications can be expressed as workflows, and geographically distributed heterogeneous resources can execute these workflows in parallel, speeding up their execution. In data-intensive workflows, large volumes of data move between execution nodes, causing high communication overhead. Many techniques have been used to avoid such overheads; this thesis adopts a stream-based data processing model in which data is processed as continuous instances of data items. Data-intensive workflow optimization is an active research area because numerous applications produce huge amounts of data that grow exponentially. This thesis proposes data-intensive workflow optimization algorithms. The first algorithm consists of two phases: (a) workflow partitioning and (b) partition mapping. Partitions are formed so that minimal data moves across partition boundaries; because each partition is mapped to a single execution node, heavy data processing takes place locally on the same node, avoiding high communication costs. In the mapping phase, each partition is assigned to the execution node that offers the minimum execution time, and the workflow is then executed. The second algorithm is a variant of the first in which data parallelism is introduced within each partition: the most compute-intensive task in each partition is identified and data parallelism is applied to it, reducing that task's execution time. Simulation results show that the proposed algorithms outperform state-of-the-art algorithms on a variety of workflows. The datasets used for performance evaluation include both synthesized workflows and workflows derived from real-world applications.
The workflows derived from real-world applications include Montage and CyberShake. Synthesized workflows were generated with different sizes, shapes, and densities to evaluate the proposed algorithms. The simulation results show a 60% reduction in latency and a 47% improvement in throughput. When data parallelism is introduced into the algorithm, performance improves by a further 12% in latency and 17% in throughput compared to the PDWA algorithm. In a real-time stream-processing framework, experiments were performed using STORM with a use-case data-intensive workflow (EURExpressII). These experiments show that PDWA outperforms in terms of workflow execution time across different input data sizes.
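The thesis's PDWA algorithm itself is not reproduced in this record. Purely as an illustrative sketch of the two-phase idea the abstract describes, under assumed simplifications (greedy merging of the heaviest data edges for phase (a), greedy minimum-finish-time assignment for phase (b)), the structure might look like the following; all names (`Task`, `partition_workflow`, `map_partitions`) and the cost model are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    cost: float                                 # abstract compute cost
    edges: dict = field(default_factory=dict)   # successor name -> data volume

def partition_workflow(tasks, k):
    """Phase (a) sketch: merge tasks across the heaviest data edges first
    (union-find), so the largest transfers become partition-local, until
    only k partitions remain."""
    parent = {t.name: t.name for t in tasks}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    edges = sorted(((u.name, v, w) for u in tasks for v, w in u.edges.items()),
                   key=lambda e: -e[2])     # heaviest data edges first
    groups = len(tasks)
    for u, v, _ in edges:
        if groups <= k:
            break
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            groups -= 1
    parts = {}
    for t in tasks:
        parts.setdefault(find(t.name), []).append(t)
    return list(parts.values())

def map_partitions(partitions, node_speeds):
    """Phase (b) sketch: assign each partition to the node whose finish time
    after adding the partition's compute load is smallest."""
    load = {n: 0.0 for n in node_speeds}    # accumulated time per node
    mapping = []
    for part in partitions:
        total = sum(t.cost for t in part)
        best = min(node_speeds, key=lambda n: load[n] + total / node_speeds[n])
        load[best] += total / node_speeds[best]
        mapping.append(best)
    return mapping

# A small diamond workflow: heavy edges A->B and B->D stay inside one partition.
tasks = [Task('A', 1, {'B': 10, 'C': 1}), Task('B', 2, {'D': 10}),
         Task('C', 2, {'D': 1}), Task('D', 1)]
parts = partition_workflow(tasks, 2)
mapping = map_partitions(parts, {'fast': 2.0, 'slow': 1.0})
```

Here the heavy A→B→D chain ends up on one execution node, so its large transfers never cross the network; only the light edges to and from C become inter-node communication, mirroring the abstract's goal of processing heavy data locally.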
Note
Thesis (PhD) – Faculty of Computer Science & Information Technology, University of Malaya, 2017.
Recommended Citation
Saima Gulzar, Ahmad, "Workflow optimization in distributed computing environment for stream-based data processing model / Saima Gulzar Ahmad" (2017). Student Works (2010-2019). 4626.
https://knova.um.edu.my/student_works_2010s/4626