Publications

Spacey S, Luk W, Kuhn D, Kelly PHJ et al., 2013, Parallel partitioning for distributed systems using sequential assignment, Journal of Parallel and Distributed Computing, Vol: 73, Pages: 207-219

This paper introduces a method that combines the advantages of task parallelism and fine-grained co-design specialisation to achieve faster execution times than either method alone on distributed heterogeneous architectures. The method uses a novel mixed-integer linear programming formalisation to assign code sections from parallel tasks to shared computational components, with the optimal trade-off between the acceleration gained from component specialisation and the resulting serialisation delay. The paper provides results for software benchmarks partitioned using the method and using formal implementations of previous alternatives, demonstrating both the practical tractability of the linear programming approach and the increase in achievable program acceleration.

Journal article

Eele A, Maciejowski J, Chau T, Luk W et al., 2013, Parallelisation of sequential Monte Carlo for real-time control in air traffic management, Pages: 4859-4864, ISSN: 0743-1546

This paper presents the parallelisation of a Sequential Monte Carlo algorithm, and the associated changes required when it is applied to conflict resolution and aircraft trajectory control in air traffic management. The target problem is non-linear, constrained, non-convex and multi-agent. The new method achieves a 98.5% saving in computation time over a previous sequential implementation, with no degradation in path quality; this saving is enough to allow real-time implementation. © 2013 IEEE.

Citations: 2

Conference paper

Lam YM, Tsoi KH, Luk W, 2013, Parallel neighbourhood search on many-core platforms, International Journal of Computational Science and Engineering, Vol: 8, Pages: 281-293, ISSN: 1742-7185

This paper presents a parallel search parallel move approach for parallelising neighbourhood search algorithms on many-core platforms. In this approach, a large number of searches run concurrently and are coordinated periodically; in each iteration, every search generates and evaluates multiple moves in parallel. The proposed approach can fully utilise the computing capability of many-core platforms under various platform-specific constraints. A parallel simulated annealing algorithm for the travelling salesman problem is developed using the parallel search parallel move scheme and implemented on an NVIDIA Tesla C2050 GPU platform. We evaluate the performance of our approach against a multi-threaded CPU implementation on a server containing two Intel Xeon X5650 CPUs (12 cores in total). Experimental results on 20 benchmark problems show that the GPU implementation explores the solution space 99 times faster on average. In terms of effectiveness, the GPU implementation finds good solutions 39.5 times faster, or improves solution quality by 21.7% given the same search time. Copyright © 2013 Inderscience Enterprises Ltd.

Citations: 13

Journal article
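The parallel search parallel move scheme in the Lam, Tsoi and Luk abstract can be sketched in plain Python. This is a minimal, sequential sketch under stated assumptions: 2-opt moves, Metropolis acceptance, geometric cooling, and "restart every search from the best tour found so far" as the periodic coordination rule. These choices, the function names (`pspm_anneal`, `two_opt_delta`, `tour_length`) and all parameters are hypothetical illustrations, not the paper's GPU implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(tour: Sequence[int], pts: Sequence[Point]) -> float:
    """Length of the closed tour that visits pts in the given order."""
    n = len(tour)
    return sum(math.dist(pts[tour[k]], pts[tour[(k + 1) % n]]) for k in range(n))


def two_opt_delta(tour: List[int], pts: Sequence[Point], i: int, j: int) -> float:
    """Length change if tour[i..j] is reversed (a 2-opt move); needs 1 <= i < j <= n-1."""
    n = len(tour)
    a, b = pts[tour[i - 1]], pts[tour[i]]
    c, d = pts[tour[j]], pts[tour[(j + 1) % n]]
    return math.dist(a, c) + math.dist(b, d) - math.dist(a, b) - math.dist(c, d)


def pspm_anneal(pts: Sequence[Point], n_searches: int = 8, moves_per_step: int = 16,
                steps: int = 2000, sync_every: int = 100, t0: float = 1.0,
                alpha: float = 0.998, seed: int = 0) -> Tuple[List[int], float]:
    """Parallel search, parallel move simulated annealing for the TSP (sequential sketch)."""
    rng = random.Random(seed)
    n = len(pts)  # assumes n >= 3
    tours = [rng.sample(range(n), n) for _ in range(n_searches)]
    lengths = [tour_length(t, pts) for t in tours]
    k = min(range(n_searches), key=lengths.__getitem__)
    best, best_len = tours[k][:], lengths[k]
    temp = t0
    for step in range(1, steps + 1):
        for s, tour in enumerate(tours):  # "parallel searches": independent on a GPU
            # "Parallel moves": evaluate several random 2-opt candidates, keep the best one.
            cands = []
            for _ in range(moves_per_step):
                i = rng.randrange(1, n - 1)
                j = rng.randrange(i + 1, n)
                cands.append((two_opt_delta(tour, pts, i, j), i, j))
            delta, i, j = min(cands)
            # Metropolis acceptance rule.
            if delta < 0 or rng.random() < math.exp(-delta / temp):
                tour[i:j + 1] = tour[i:j + 1][::-1]
                lengths[s] += delta
                if lengths[s] < best_len:
                    best, best_len = tour[:], lengths[s]
        temp *= alpha  # geometric cooling
        if step % sync_every == 0:  # periodic coordination: restart all searches from the best tour
            tours = [best[:] for _ in range(n_searches)]
            lengths = [best_len] * n_searches
    return best, best_len
```

Here the searches and candidate moves run one after another; on a GPU, one plausible mapping is a thread block per search and a thread per candidate move, with the periodic copy of the best tour as one simple coordination policy among several possible.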

Chau TCP, Niu X, Eele A, Luk W, Cheung PYK, Maciejowski J et al., 2013, Heterogeneous Reconfigurable System for Adaptive Particle Filters in Real-Time Applications, 9th International Applied Reconfigurable Computing Symposium (ARC), Publisher: SPRINGER-VERLAG BERLIN, Pages: 1-12, ISSN: 0302-9743

Citations: 11

Conference paper

Niu X, Coutinho JGF, Luk W, 2013, A Scalable Design Approach for Stencil Computation on Reconfigurable Clusters, 23rd International Conference on Field Programmable Logic and Applications (FPL), Publisher: IEEE, ISSN: 1946-1488

Conference paper

Guo C, Luk W, 2013, Accelerating Maximum Likelihood Estimation for Hawkes Point Processes, 23rd International Conference on Field Programmable Logic and Applications (FPL), Publisher: IEEE, ISSN: 1946-1488

Citations: 2

Conference paper

Liu Q, Ma Y, Wang Y, Luk W, Bian J et al., 2013, RALP: Reconvergence-Aware Layer Partitioning for 3D FPGAs, International Conference on Reconfigurable Computing and FPGAs (ReConFig), Publisher: IEEE, ISSN: 2325-6532

Conference paper

Gan L, Fu H, Luk W, Yang C, Xue W, Huang X, Zhang Y, Yang G et al., 2013, Accelerating Solvers for Global Atmospheric Equations through Mixed-Precision Data Flow Engine, 23rd International Conference on Field Programmable Logic and Applications (FPL), Publisher: IEEE, ISSN: 1946-1488

Citations: 7

Conference paper

Todman T, Luk W, 2013, Runtime Assertions and Exceptions for Streaming Systems, 23rd International Conference on Field Programmable Logic and Applications (FPL), Publisher: IEEE, ISSN: 1946-1488

Citations: 1

Conference paper

Gan L, Fu H, Luk W, Yang C, Xue W, Yang G et al., 2013, Global Atmospheric Simulation on a Reconfigurable Platform, 21st Annual International IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), Publisher: IEEE, Pages: 230-230

Conference paper

Niu X, Chau TCP, Jin Q, Luk W, Liu Q et al., 2013, Automating resource optimisation in reconfigurable design (abstract only), Publisher: ACM, Pages: 275-275

Conference paper

Kurek M, Becker T, Luk W, 2013, Parametric Optimization of Reconfigurable Designs Using Machine Learning, 9th International Applied Reconfigurable Computing Symposium (ARC), Publisher: SPRINGER-VERLAG BERLIN, Pages: 134-145, ISSN: 0302-9743

Citations: 4

Conference paper

Eele A, Maciejowski J, Chau T, Luk W et al., 2013, Parallelisation of Sequential Monte Carlo for Real-Time Control in Air Traffic Management, 52nd IEEE Annual Conference on Decision and Control (CDC), Publisher: IEEE, Pages: 4853-4858, ISSN: 0743-1546

Conference paper

Niu X, Chau TCP, Jin Q, Luk W, Liu Q et al., 2013, Automating elimination of idle functions by run-time reconfiguration, 21st Annual International IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), Publisher: IEEE, Pages: 97-104

Citations: 7

Conference paper

Ruan H, Huang X, Fu H, Yang G, Luk W, Racaniere S, Pell O, Han W et al., 2013, An FPGA-Based Data Flow Engine for Gaussian Copula Model, 21st Annual International IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), Publisher: IEEE, Pages: 218-225

Citations: 1

Conference paper

Grigoras P, Niu X, Coutinho JGF, Luk W, Bower J, Pell O et al., 2013, Aspect Driven Compilation for Dataflow Designs, IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Publisher: IEEE, Pages: 18-25, ISSN: 2160-0511

Citations: 3

Conference paper

Cattaneo R, Niu X, Pilato C, Becker T, Luk W, Santambrogio MD et al., 2013, A Framework for Effective Exploitation of Partial Reconfiguration in Dataflow Computing, 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC), Publisher: IEEE

Conference paper

Arram J, Luk W, Jiang P, 2013, Reconfigurable Filtered Acceleration of Short Read Alignment, 12th International Conference on Field-Programmable Technology (FPT), Publisher: IEEE, Pages: 438-441

Citations: 5

Conference paper

Niu X, Coutinho JGF, Wang Y, Luk W et al., 2013, Computing nodes in reconfigurable clusters are occupied and released by applications during their, 12th International Conference on Field-Programmable Technology (FPT), Publisher: IEEE, Pages: 214-221

Citations: 2

Conference paper

Ng N, Yoshida N, Luk W, 2013, Scalable Session Programming for Heterogeneous High-Performance Systems, Publisher: Springer, Pages: 82-98


Conference paper

Arram J, Tsoi KH, Luk W, Jiang P et al., 2013, Reconfigurable Acceleration of Short Read Mapping, 21st Annual International IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), Publisher: IEEE, Pages: 210-217

Citations: 24

Conference paper

Guo C, Luk W, 2013, Accelerating HAC Estimation for Multivariate Time Series, IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Publisher: IEEE, Pages: 42-49, ISSN: 2160-0511

Citations: 3

Conference paper

Chau TCP, Kwok K-W, Chow GCT, Tsoi KH, Lee K-H, Tse Z, Cheung PYK, Luk W et al., 2013, Acceleration of Real-time Proximity Query for Dynamic Active Constraints, 12th International Conference on Field-Programmable Technology (FPT), Publisher: IEEE, Pages: 206-213

Citations: 2

Conference paper

Inggs G, Thomas D, Luk W, 2013, A Heterogeneous Computing framework for Computational Finance, 42nd Annual International Conference on Parallel Processing (ICPP), Publisher: IEEE, Pages: 688-697, ISSN: 0190-3918

Citations: 5

Conference paper

Thomas DB, Luk W, 2013, Multiplierless Algorithm for Multivariate Gaussian Random Number Generation in FPGAs, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol: PP, Pages: 1-1, ISSN: 1063-8210

Journal article

Petrov Z, Zaykov PG, Cardoso JMP, Coutinho JGF, Diniz PC, Luk W et al., 2013, An Aspect-Oriented Approach for Designing Safety-Critical Systems, IEEE Aerospace Conference, Publisher: IEEE, ISSN: 1095-323X

Conference paper

Chau TCP, Luk W, Cheung PYK, 2012, Roberts: a reconfigurable platform for benchmarking real-time systems, ACM SIGARCH Computer Architecture News, Vol: 40, Pages: 10-15, ISSN: 0163-5964

This paper presents Roberts, a Reconfigurable platfOrm for BEnchmarking Real-Time Systems. Roberts is the first platform that can be customised for a given system-under-test to support benchmarking of real-time properties and energy consumption. The benchmarking takes into account system workload and environmental events, with facilities for generating test vectors that conform to the specification of the system-under-test, and with support for on-line monitoring of response time, output values and energy consumption. The proposed benchmarking platform has been implemented on the DE4 development system to provide cycle-accurate timing measurement at nanosecond precision for analysing high-performance applications. An evaluation of our approach shows that the platform can be used to analyse the performance of target applications and the overheads of other timing facilities, such as the interval timer on processors.

Journal article

Tsoi KH, Becker T, Luk W, 2012, Modelling reconfigurable systems in event driven simulation, Publisher: Association for Computing Machinery (ACM), Pages: 34-39, ISSN: 0163-5964

Reconfigurable platforms allow hardware developers to customise their designs for specific applications. However, adopting them involves challenges in understanding and estimating the impact of various design parameters and approaches. This paper proposes a unified framework for modelling the behaviour of reconfigurable systems using an event-driven simulation approach. This provides an abstract yet informative method to capture both analytical relationships and empirical parameters of reconfigurable systems, and it can help in making design decisions or verifying analytical models. We apply this approach to three models of reconfigurable applications to estimate the communication efficiency of networked clusters, and the performance and energy efficiency of run-time reconfigurable designs for software-defined radio and for option pricing in finance. The results show that, through this simulation framework, we can verify the accuracy of analytical models and also obtain practical information that analytical models do not provide.

Conference paper
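As a minimal illustration of event-driven modelling in the spirit of the Tsoi, Becker and Luk abstract, the sketch below simulates a toy system assumed here for illustration (not the paper's framework): one reconfigurable device serves jobs in arrival order and pays a fixed reconfiguration penalty whenever a job needs a different configuration from the one currently loaded. A priority queue of timestamped events drives the simulation, so simulated time jumps from one event to the next instead of advancing cycle by cycle. The names `simulate`, `exec_time` and `reconfig_time` are hypothetical.

```python
import heapq
from collections import deque
from typing import Dict, List, Tuple


def simulate(jobs: List[Tuple[float, str]], exec_time: Dict[str, float],
             reconfig_time: float) -> Tuple[float, int]:
    """Event-driven model of one reconfigurable device serving a FIFO job queue.

    jobs: (arrival_time, configuration_needed) pairs.
    Returns (time of the last event, number of reconfigurations).
    """
    events: List[Tuple[float, int, str, str]] = []  # (time, tie-breaker, event, config)
    seq = 0
    for t, cfg in jobs:
        heapq.heappush(events, (t, seq, "arrive", cfg))
        seq += 1

    queue: deque = deque()
    busy, loaded, reconfigs, now = False, None, 0, 0.0
    while events:
        now, _, event, cfg = heapq.heappop(events)  # jump straight to the next event
        if event == "arrive":
            queue.append(cfg)
        else:  # "done": the device becomes free
            busy = False
        if not busy and queue:
            nxt = queue.popleft()
            delay = exec_time[nxt]
            if nxt != loaded:  # different configuration needed: pay the reconfiguration cost
                delay += reconfig_time
                reconfigs += 1
                loaded = nxt
            busy = True
            heapq.heappush(events, (now + delay, seq, "done", nxt))
            seq += 1
    return now, reconfigs
```

For example, two `fft` jobs arriving at time 0 and one `fir` job at time 1, with execution times of 2 and 3 and a reconfiguration cost of 5, finish at time 17 after two reconfigurations. Comparing such simulated figures with a closed-form estimate is one way to check an analytical model.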

Todman T, Luk W, 2012, Verification of streaming designs by combining symbolic simulation and equivalence checking, Pages: 203-208

As design complexity grows, verification becomes a bottleneck in design development and implementation. This paper describes a novel approach for verifying reconfigurable streaming designs, based on symbolic simulation and equivalence checking. Compared with numerical simulation, symbolic simulation provides a more informative way of showing that a design behaves as expected, while equivalence checking enables automatic checking of the equivalence of symbolic expressions. Our approach has been implemented for designs targeting Maxeler technology, using an easy-to-use symbolic simulator and the Yices equivalence checker, together with other facilities such as an output combiner to support an automated verification flow. Several benchmarks, including one-dimensional convolution and finite difference computation, are used to evaluate the proposed approach. © 2012 IEEE.

Citations: 5

Conference paper
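The combination of symbolic simulation and equivalence checking described in the Todman and Luk abstract can be illustrated with a small sketch under loudly stated assumptions: symbolic values are modelled as polynomials over named inputs, and equivalence is decided by comparing canonical polynomial forms instead of calling an SMT solver such as Yices. That shortcut is only sound for designs built from exact addition and multiplication (no fixed-point rounding, comparisons or bit manipulation). The class `Sym` and the functions `conv_reference`, `conv_streaming` and `equivalent` are hypothetical, not part of the Maxeler toolchain or the paper's simulator.

```python
from collections import defaultdict
from typing import Dict, List, Tuple, Union

Monomial = Tuple[str, ...]  # sorted variable names; repeats encode powers


class Sym:
    """Symbolic value: a polynomial over named inputs, kept in canonical form."""

    def __init__(self, terms: Dict[Monomial, int]):
        self.terms = {m: c for m, c in terms.items() if c != 0}

    @staticmethod
    def var(name: str) -> "Sym":
        return Sym({(name,): 1})

    @staticmethod
    def lift(x: Union["Sym", int]) -> "Sym":
        return x if isinstance(x, Sym) else Sym({(): x})

    def __add__(self, other):
        out = defaultdict(int, self.terms)
        for m, c in Sym.lift(other).terms.items():
            out[m] += c
        return Sym(out)

    __radd__ = __add__

    def __mul__(self, other):
        out = defaultdict(int)
        for m1, c1 in self.terms.items():
            for m2, c2 in Sym.lift(other).terms.items():
                out[tuple(sorted(m1 + m2))] += c1 * c2
        return Sym(out)

    __rmul__ = __mul__

    def __eq__(self, other) -> bool:  # equal canonical forms <=> equal polynomials
        return self.terms == Sym.lift(other).terms


def conv_reference(xs, w):
    """Reference 3-tap convolution: y[i] = w0*x[i-1] + w1*x[i] + w2*x[i+1]."""
    return [w[0] * xs[i - 1] + w[1] * xs[i] + w[2] * xs[i + 1] for i in range(1, len(xs) - 1)]


def conv_streaming(xs, w):
    """Streaming version: one input per cycle through a 3-element shift register."""
    window, out = [], []
    for x in xs:
        window = (window + [x])[-3:]
        if len(window) == 3:
            out.append(w[0] * window[0] + w[1] * window[1] + w[2] * window[2])
    return out


def equivalent(ys1: List[Sym], ys2: List[Sym]) -> bool:
    """Symbolic outputs match element by element."""
    return len(ys1) == len(ys2) and all(a == b for a, b in zip(ys1, ys2))


xs = [Sym.var(f"x{k}") for k in range(6)]  # symbolic input stream
w = [Sym.var(f"w{k}") for k in range(3)]   # symbolic coefficients
print(equivalent(conv_reference(xs, w), conv_streaming(xs, w)))        # True
print(equivalent(conv_reference(xs, w), conv_streaming(xs, w[::-1])))  # False: taps reversed
```

Because the inputs and coefficients stay symbolic, one simulation run covers every numerical input at once, and a mismatch points to the exact output expression that differs, which is the informativeness advantage over numerical simulation that the abstract mentions.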

Betkaoui B, Wang Y, Thomas DB, Luk W et al., 2012, Parallel FPGA-based all pairs shortest paths for sparse networks: A human brain connectome case study, Pages: 99-104

This paper proposes a highly parallel and scalable reconfigurable design for the All-Pairs Shortest-Paths (APSP) algorithm on very sparse networks. Our work is motivated by a computationally intensive bioinformatics application that employs this memory-latency-bound algorithm. The proposed design methodology takes advantage of the distributed on-chip memory resources of modern FPGAs to reduce accesses to high-latency off-chip memories. We develop design optimisations that yield different FPGA configurations, which are selected at run time based on the input graph data. Using human brain network data, we achieve performance superior to that of multi-core CPU and GPU implementations, while attaining linear scaling with the number of processors. Our FPGA-based APSP design is over 10 times faster than a quad-core CPU implementation and 2 to 5 times faster than an AMD Cypress GPU implementation. © 2012 IEEE.

Citations: 16

Conference paper
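A common software baseline for APSP on very sparse, unweighted graphs is one breadth-first search per source vertex, which costs O(V(V+E)) time instead of the O(V^3) of Floyd-Warshall. The sketch below assumes an unweighted adjacency-list graph; it is not the FPGA design described in the Betkaoui et al. abstract, only an illustration of two properties that abstract relies on: the searches from different sources are independent, and each search performs irregular neighbour-list lookups that make the computation memory-latency bound. The function names are hypothetical.

```python
from collections import deque
from typing import List


def bfs_hops(adj: List[List[int]], src: int) -> List[int]:
    """Hop distance from src to every vertex; -1 marks unreachable vertices."""
    dist = [-1] * len(adj)
    dist[src] = 0
    frontier = deque([src])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:  # irregular, data-dependent memory accesses
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist


def apsp_unweighted(adj: List[List[int]]) -> List[List[int]]:
    """All-pairs shortest paths on an unweighted graph: one BFS per source.

    The sources are independent, so a parallel design can hand different
    sources to different processing elements.
    """
    return [bfs_hops(adj, s) for s in range(len(adj))]
```

For example, for a path 0-1-2 plus an isolated vertex 3, `apsp_unweighted([[1], [0, 2], [1], []])` returns `[[0, 1, 2, -1], [1, 0, 1, -1], [2, 1, 0, -1], [-1, -1, -1, 0]]`.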

Professor Wayne Luk
