Workshops and Tutorials
The 24th International Conference on Supercomputing (ICS-2010) program will include workshops and tutorials on Tuesday, June 1.
| Tuesday, 1 June 2010 | | | | |
| Morning | T1 | T2 | W1 (HAC'10) (CANCELLED) | W2 (HEART) |
| Afternoon | T3 | T4 (start from 14:00) | W3 (IRMM'10) | W2 (HEART), continues until 17:00 |
The following workshops and tutorials will take place in conjunction with ICS'10:
- W1: 2nd International Workshop on Hybrid Architecture Computing (HAC’10)
- W2: International Workshop on Highly Efficient Accelerators and Reconfigurable Technologies (HEART) (9:00 - 17:00)
- W3: Information Retrieval by Matrix Methods on Supercomputer Infrastructures 2010 (IRMM'10) (13:00 - 17:00)
- T1: The Speedup-Test : a rigorous statistical protocol for declaring program speedups with high confidence (9:00 - 12:00)
- T2: Tutorial on Architecture Specific Optimizations for modern CPU and GPU (9:00 - 12:00)
- T3: Monolithic Supercomputing; what it is, why it’s important and how to design for it (13:00 - 17:00)
- T4: Designing High-End Computing Systems with InfiniBand and High-speed Ethernet (14:00 - 18:00)
The time schedule of each workshop may be updated after the final program is fixed. For additional information, please contact the chairs at ics2010-ws@c.csce.kyushu-u.ac.jp.
W1: 2nd International Workshop on Hybrid Architecture Computing (HAC’10) (CANCELLED)
Full day workshop
Workshop organizers: Taisuke Boku (University of Tsukuba, Japan), Serge Petiton (CNRS/LIFL, France)
Large-scale hybrid architecture computing, which combines general-purpose processors and computation accelerators in a single system, is considered a strong and promising approach to supporting a wide range of high-performance computing applications on the way toward Exa-FLOPS performance. Various types of accelerator hardware, such as GPGPUs, the Cell Broadband Engine, GRAPE-DR and other ASICs, can now be attached to general-purpose microprocessors to build large-scale and extremely powerful high-end computing systems. The scope of this workshop covers hybrid architecture computing systems of all types and fields, as well as real applications that run on them. The purpose of the workshop is to discuss hybrid architecture computing from several aspects, such as hardware, system software, middleware, programming, applications, benchmarks and execution environments. The workshop will collect a wide variety of systems and applications as contributed papers, along with a couple of keynote and invited talks. Our goal is to establish the current status of this high-end computing technology, as well as the future trends and work toward Exa-FLOPS computing.
W2: International Workshop on Highly Efficient Accelerators and Reconfigurable Technologies (HEART)
Full day workshop (9:00 - 17:00)
Workshop organizers: Hideharu Amano (Keio University, Japan), Wayne Luk (Imperial College London, United Kingdom)
The 1st International Workshop on Highly Efficient Accelerators and Reconfigurable Technologies (HEART) is a forum to present and discuss new research on accelerators and the use of reconfigurable technologies for high-performance and/or power-efficient computation. Submissions are solicited on a wide variety of topics related to the acceleration for high-performance computation, including but not limited to:
Architectures and systems:
- Reconfigurable/configurable hardware and systems including IP-cores, embedded systems and/or SoCs for scalable, high-performance and/or low-power computation
- High-performance custom-computing systems based on reconfigurable technologies including FPGAs
- Grid/cluster computing with FPGAs, other reconfigurable technologies and accelerators such as GPGPUs and Cell/B.E.s
- Novel architectures and devices that can be applied to efficient acceleration, including many-core architectures, NoC architectures, optical devices/interconnection
- Novel applications for high-performance reconfigurable systems and accelerators including GPGPUs and Cell/B.E.
- Compiler techniques and programming languages for reconfigurable acceleration systems and other accelerators such as GPGPUs and Cell/B.E.
- Run-time systems and the use of run-time reconfigurability for acceleration
- Performance evaluation and analysis of reconfigurable acceleration systems and other accelerators such as GPGPUs and Cell/B.E.
W3: Information Retrieval by Matrix Methods on Supercomputer Infrastructures 2010 (IRMM'10)
Half day workshop (13:00 - 17:00)
Workshop organizers: Marian Vajtersic (University of Salzburg, AUSTRIA), Michael W. Berry (University of Tennessee, USA), Efstratios Gallopoulos (University of Patras, GREECE)
Information retrieval (IR) in large data sets, such as web pages, libraries and image collections, is a challenging application for high-performance computing. The dimensionality of these problems can be enormous: millions of documents with thousands of features each. Moreover, the data is subject to frequent changes. Therefore, the primary goal is dimensionality reduction, so that responses to queries and other requests can be delivered within acceptable time constraints. The algorithmic foundation of IR depends critically on matrix methods. These are already well-examined candidates for HPC implementation, but they have to be modified and restructured to reflect the special requirements of this specific problem. The workshop aims to bring together researchers active in the development of algorithms and tools for high-performance IR on state-of-the-art systems (multicores, GPUs and Grid environments) to present their latest results. The workshop covers (but is not limited to) the following topics:
- Innovative Matrix-Based Models for IR
- Novel Fast Linear Algebra Solvers for IR
- Solving the Update-Downdate Problem of IR
- Specialized Approaches for Solving Sparse- and Dense-Vector IR Applications
- High-Performance Implementations of IR Matrix Algorithms
- Solving IR-Problems on a Grid
- Formal Comparisons of Matrix-Based to Matrix-Free Methods
Link to workshop page
T1: The Speedup-Test : a rigorous statistical protocol for declaring program speedups with high confidence
Half day tutorial (9:00 - 12:00)
Tutorial organizer: Sid Touati (University of Versailles Saint-Quentin en Yvelines, France)
Code optimisation methods are usually evaluated by making multiple observations of the initial and the optimised execution times in order to declare a speedup. Even with a fixed input and execution environment, program execution times vary in general, especially for toy/kernel benchmarks. With the introduction of multi-core architectures, execution times are becoming increasingly variable. Hence different kinds of speedups may be reported: the speedup of the average execution time, the speedup of the minimal execution time, the speedup of the median, etc. Many speedups published in the literature are observations from a set of experiments that do not guarantee reproducibility. In order to improve the reproducibility of experimental results, this tutorial presents a rigorous statistical methodology for program performance analysis. We rely on well-known statistical tests (Shapiro-Wilk's test, Fisher's F-test, Student's t-test, Kolmogorov-Smirnov's test, Wilcoxon-Mann-Whitney's test) to study whether the observed speedups are statistically significant. By fixing a desired risk level between 0 and 1, we are able to analyse the statistical significance of the speedup of the average execution time as well as of the median. We can also check whether P(X>Y)>1/2, where P(X>Y) is the probability that an individual execution of the optimised code is faster than an individual execution of the initial code. Our methodology is a consistent improvement over the usual performance analysis method in high-performance computing. We explain, in each situation, which hypotheses must be checked in order to declare a correct risk level for the statistics. The Speedup-Test protocol, which certifies the observed speedups with rigorous statistics, is implemented and will be distributed for the tutorial as an open source tool based on the R software.
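To illustrate why the choice of speedup matters, the following is a minimal Python sketch, not the Speedup-Test tool itself (which is based on R) and not its statistical tests. It computes the speedups of the mean, median and minimum execution times, and a simple pairwise estimate of P(X>Y). The function names and the timings are hypothetical and invented for illustration only.

```python
import statistics
from itertools import product


def speedup_summary(initial_times, optimised_times):
    """Return speedup ratios (initial / optimised) for three summary statistics."""
    return {
        "mean": statistics.mean(initial_times) / statistics.mean(optimised_times),
        "median": statistics.median(initial_times) / statistics.median(optimised_times),
        "min": min(initial_times) / min(optimised_times),
    }


def prob_initial_slower(initial_times, optimised_times):
    """Estimate P(X > Y), where X is one run of the initial code and
    Y is one run of the optimised code.

    Every (x, y) pair is compared; a tie counts as one half.
    """
    wins = 0.0
    for x, y in product(initial_times, optimised_times):
        if x > y:
            wins += 1.0
        elif x == y:
            wins += 0.5
    return wins / (len(initial_times) * len(optimised_times))


# Hypothetical timings in seconds (assumption: invented data, one outlier run).
initial = [10.0, 12.0, 11.0, 30.0, 10.0]
optimised = [9.0, 10.0, 9.5, 9.0, 11.5]

summary = speedup_summary(initial, optimised)
p_faster = prob_initial_slower(initial, optimised)

for name, value in summary.items():
    print(f"speedup of the {name}: {value:.3f}")
print(f"estimated P(X > Y): {p_faster:.2f}")
```

On this invented data the single slow run inflates the speedup of the mean well above the speedup of the median, which is the kind of discrepancy the tutorial addresses. The sketch only describes the samples; deciding whether a speedup is statistically significant at a given risk level requires the tests named above.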
Ref: Sid-Ahmed-Ali Touati and Julien Worms and Sébastien Briais. The Speedup-Test. Technical Report number HAL-inria-00443839. University of Versailles Saint-Quentin en Yvelines. January 2010.
Link to the tutorial document and tool
T2: Tutorial on Architecture Specific Optimizations for modern CPU and GPU
Half day tutorial (9:00 - 12:00)
Tutorial organizer: Victor Lee (Throughput Computing Lab, Intel Corporation)
As multicore architectures overtake single-core architectures in today's and future computer systems, applications must switch to parallel algorithms to achieve higher performance. Years of research have yielded many ways to parallelize applications, such as functional decomposition and data partitioning. However, we have found that exploiting parallelism at the algorithmic level alone is not sufficient to achieve the best performance. To achieve optimal application performance, we must also take into account the characteristics of the underlying platform architecture, such as core architecture, SIMD width and bandwidth. We have been engaged in platform-specific optimizations for many years, and this tutorial will present our architecture-specific optimization guides for modern CPUs and GPUs. We will use industry examples to illustrate how specific optimization techniques have benefited application performance.
Link to tutorial page
T3: Monolithic Supercomputing; what it is, why it’s important and how to design for it
Half day tutorial (13:00 - 17:00)
Tutorial organizers: Michael J Flynn (Maxeler Technologies, Palo Alto, CA, USA), Dennis Allison (Maxeler Technologies, Palo Alto, CA, USA), Oskar Mencer (Maxeler Technologies, London, UK), Rob Dimond (Maxeler Technologies, London, UK)
HPC applications take several different forms. This tutorial focuses on monolithic applications, sometimes referred to as warehouse-scale computing. These are applications with a defined computational structure that occupy significant computational resources over an extended period of time. As used here, a monolithic HPC application would, if executed on conventional multi-core processors, use a “GigaFlop” of computation per node and be executed on more than 1000 nodes for several days. This defines a problem size (or computational volume) of at least 10^18 EPOs, or essential program operations (such as floating point operations). Monolithic applications may be scalable on multi-core clusters but still hit practical limits on the total number of nodes owing to power, space and reliability. For example, the next generation of oil and gas seismic simulations will require two or more orders of magnitude more computation, and these are already very large cluster applications. Accelerators that significantly increase performance per node offer a potential solution, if the application kernel meets certain criteria.
Monolithic applications arise out of several inherently different needs, from processing an enormous volume of sensor data at one extreme to running extremely large computational simulations at the other. While monolithic HPC applications all have a large computational volume and a defined dataflow graph, they differ significantly in at least three dimensions: dataflow graph size, input/output data processed and intermediate data storage. A search engine application may have a quite small dataflow graph and no intermediate storage, but a huge number of instantiations of the dataflow and potentially a large input database. An oil and gas survey application may have an intermediate-sized dataflow graph of O(10^5) nodes, with very large input/output data structures of O(10^12 to 10^15) bytes, and require 10^10 bytes of intermediate storage (storing results for 3D image processing).
Monolithic applications with a well-defined structure are usually amenable to computational acceleration. Acceleration hardware is added to conventional server hardware, providing significant speedup to application execution. This can be used in several ways: enhancing the scope and robustness of the application, or reducing the hardware costs, energy and space required. Monolithic applications share the essential environmental concerns of peak power, cooling, energy consumption, reliability and availability.
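As a rough check on the problem size quoted in the description above, the following Python sketch multiplies out the stated figures. It assumes that a “GigaFlop” per node means a sustained 10^9 operations per second on each node, and it picks 10 days as one reading of "several days". Both are assumptions, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def computational_volume(ops_per_second_per_node, nodes, days):
    """Total operations executed by `nodes` nodes running for `days` days."""
    return ops_per_second_per_node * nodes * days * SECONDS_PER_DAY


def days_to_reach(target_ops, ops_per_second_per_node, nodes):
    """Days needed for the cluster to execute `target_ops` operations."""
    return target_ops / (ops_per_second_per_node * nodes * SECONDS_PER_DAY)


# Assumptions: 10**9 sustained operations per second per node, 1000 nodes.
volume_10_days = computational_volume(10**9, 1000, 10)
days_for_1e18 = days_to_reach(10**18, 10**9, 1000)

print(f"operations in 10 days: {volume_10_days:.3e}")
print(f"days to reach 10^18 operations: {days_for_1e18:.1f}")
```

Under these assumptions, 1000 nodes execute about 8.64 x 10^17 operations in 10 days and reach 10^18 operations after roughly 11.6 days, which is consistent in order of magnitude with the figure in the description.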
Tutorial outline:
1) The need, opportunity, challenges, environment (power and space) for Monolithic computing
2) The computational alternatives: clusters, acceleration, types of accelerators, etc.
3) Hardware and structural issues: performance, energy, cooling and reliability.
4) Tools, acceleration methodology and some case studies
Ref. "The Fourth Paradigm: Data-Intensive Scientific Discovery" by Hey, Tansley and Tolle, and "The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines" by Barroso and Hölzle.
Link to tutorial page (to be linked)
T4: Designing High-End Computing Systems with InfiniBand and High-speed Ethernet
Half day tutorial (14:00 - 18:00)
Tutorial organizers: D. K. Panda (The Ohio State University, USA), P. Balaji (Argonne National Laboratory, USA), S. Sur (The Ohio State University, USA)
InfiniBand (IB) and High-speed Ethernet interconnects are generating a lot of excitement towards building next generation High Performance Computing (HPC) systems and enterprise datacenters. This tutorial will provide an overview of these emerging interconnects, their offered features, their current market standing, and their suitability for prime-time HPC. It will start with a brief overview of IB, high-speed Ethernet and their architectural features. An overview of the emerging OpenFabrics stack which encapsulates both IB and Ethernet in a unified manner, and hardware technologies such as Virtual Protocol Interconnect (VPI), RDMA over Ethernet and RDMA over Converged Enhanced Ethernet that aim at converged hardware solutions will be presented. IB and high-speed Ethernet hardware/software solutions and the market trends will be highlighted. Finally, sample performance numbers highlighting the performance these technologies can achieve in different environments such as MPI, Sockets, Parallel File Systems, Multi-tier Datacenters, and Virtual Machines, will be shown.
Link to tutorial page

