Publications

Ho CH, Yu CW, Leong PHW, Luk W, Wilton SJE et al., 2009, Floating-point FPGA: architecture and modeling, IEEE Transactions on VLSI Systems, Vol: 17, Pages: 1709-1718


Journal article

Morris GW, Thomas DB, Luk W, 2009, FPGA accelerated low-latency market data feed processing, 17th Symposium on High-Performance Interconnects, Publisher: IEEE, Pages: 83-89

Citations: 14

Conference paper

Mencer O, Tsoi KH, Craimer S, Todman T, Luk W, Wong MY, Leong PHW et al., 2009, CUBE: A 512-FPGA Cluster, 5th Southern Conference on Programmable Logic, Publisher: IEEE, Pages: 51-+

Citations: 18

Conference paper

Lamoureux J, Field T, Luk W, 2009, Accelerating a Virtual Ecology Model with FPGAs, 20th IEEE International Conference on Application-Specific Systems, Architectures and Processors, Publisher: IEEE, Pages: 67-74, ISSN: 2160-0511

Conference paper

Ho CH, Luk W, Szefer JM, Lee RB et al., 2009, Tuning Instruction Customisation for Reconfigurable System-on-Chip, IEEE International SOC Conference, Publisher: IEEE, Pages: 61-+, ISSN: 2164-1676

Conference paper

Koester M, Luk W, Hagemeyer J, Porrmann M et al., 2009, Design Optimizations to Improve Placeability of Partial Reconfiguration Modules, Design, Automation and Test in Europe Conference and Exhibition, Publisher: IEEE, Pages: 976-+, ISSN: 1530-1591

Citations: 5

Conference paper

Potter PG, Luk W, Cheung P, 2009, Partition-based exploration for reconfigurable JPEG designs, Design, Automation and Test in Europe Conference and Exhibition, Publisher: IEEE, Pages: 886-889, ISSN: 1530-1591

Citations: 1

Conference paper

Das J, Wilton SJE, Leong P, Luk W et al., 2009, Modeling Post-Techmapping and Post-Clustering FPGA Circuit Depth, 19th International Conference on Field Programmable Logic and Applications, Publisher: IEEE, Pages: 205-+, ISSN: 1946-1488

Citations: 11

Conference paper

Terry L, Roitch V, Tufail S, Singh K, Taraq O, Luk W, Jamieson P et al., 2009, Harnessing Human Computation Cycles for the FPGA Placement Problem, Publisher: CSREA Press, Pages: 188-194

Conference paper

Fidjeland AK, Roesch EB, Shanahan MP, Luk W et al., 2009, NeMo: A Platform for Neural Modelling of Spiking Neurons Using GPUs, 20th IEEE International Conference on Application-Specific Systems, Architectures and Processors, Publisher: IEEE, Pages: 137-144, ISSN: 2160-0511

Citations: 64

Conference paper

Tsoi KH, Rueckert D, Ho CH, Luk W et al., 2009, Reconfigurable Acceleration of 3D Image Registration, 5th Southern Conference on Programmable Logic, Publisher: IEEE, Pages: 95-100

Conference paper

Spacey SA, Luk W, Kelly PHJ, Kuhn D et al., 2009, Rapid Design Space Visualisation through Hardware/Software Partitioning, 5th Southern Conference on Programmable Logic, Publisher: IEEE, Pages: 159-164

Citations: 6

Conference paper

Todman T, Fu H, Tsoi B, Mencer O, Luk W et al., 2009, Smart Enumeration: A Systematic Approach to Exhaustive Search, 18th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), Publisher: SPRINGER-VERLAG BERLIN, Pages: 429-438, ISSN: 0302-9743

Citations: 1

Conference paper

Tse AHT, Thomas DB, Luk W, 2009, Accelerating Quadrature Methods for Option Valuation, 17th Annual IEEE Symposium on Field Programmable Custom Computing Machines, Publisher: IEEE COMPUTER SOC, Pages: 29-36

Citations: 4

Conference paper

Thomas DB, Luk W, 2009, FPGA Accelerated Simulation of Biologically Plausible Spiking Neural Networks, 17th Annual IEEE Symposium on Field Programmable Custom Computing Machines, Publisher: IEEE COMPUTER SOC, Pages: 45-52

Citations: 46

Conference paper

Pell O, Luk W, 2008, Instance-Specific Design, Pages: 455-474

This chapter covers instance-specific design, an optimization technique involving effective exploitation of information specific to an instance of a generic design description. It introduces different types of instance-specific designs with examples and describes partial evaluation, a systematic method for producing instance-specific designs that can be automated. It covers the application of partial evaluation to hardware design in general and to field-programmable gate arrays (FPGAs) in particular. FPGAs are an effective way to implement designs in computationally intensive datapath-oriented applications such as cryptography, digital signal processing (DSP), and network processing. The main alternative implementation technologies in these application areas are general-purpose processors, digital signal processors, and application-specific integrated circuits (ASICs). Instance-specific design offers the opportunity to exploit the reconfigurable nature of FPGAs to improve performance by tailoring circuits to particular problem instances. It can be broadly categorized into three techniques that include constant folding, which can be applied when some inputs are static; function adaptation, which alters the function of circuitry to produce a certain quality of result; and architecture adaptation, in which the circuit architecture is adapted without affecting its functional behavior. The level of automation that can be applied varies among these approaches. Constant folding can often be carried out automatically using partial evaluation techniques. Function adaptation can be performed by varying bit widths and arithmetic methods in parameterized IP cores. Tools such as Quartz (for low-level design) or tool for stream architectures can produce highly parameterized circuit cores where design parameters can be traded off against each other to achieve the desired requirements in area, speed, and power consumption. Architecture adaptation, such as adding processing units to

Citations: 2

Journal article

Luk W, Mencer O, Savaria Y, 2008, Guest editorial: 20 years of ASAP, Journal of Signal Processing Systems for Signal Image and Video Technology, Vol: 53, Pages: 1-2, ISSN: 1939-8018

Journal article

Becker T, Jamieson P, Luk W, Cheung PYK, Rissa T et al., 2008, Towards benchmarking energy efficiency of reconfigurable architectures, International Conference on Field Programmable Logic and Applications, Publisher: IEEE, Pages: 691-694

Energy research in reconfigurable architectures often involves legacy benchmarks such as the MCNC benchmarks. These benchmarks, however, are not well-suited for assessing energy consumption of reconfigurable technology, since they lack realistic input stimuli. This paper reviews and categorises a range of computation system benchmarks, and shows that there are no comprehensive benchmarks targeting reconfigurable architectures that would stimulate energy or power research. We review existing energy research in the field which involves microbenchmarks, in-house designs, or legacy benchmark suites used to evaluate power optimisations.


Conference paper

Atasu K, Todman T, Mencer O, Luk W et al., 2008, Optimal Implementation of Combinational Logic on Look-up Tables, The Fourth Conference on Ph.D. Research in Microelectronics and Electronics (PRIME'08)


Conference paper

Echeverría P, Thomas DB, López-Vallejo M, Luk W et al., 2008, An FPGA run-time parameterisable log-normal random number generator, Pages: 221-232, ISSN: 0302-9743

Monte Carlo financial simulation relies on the generation of random variables with different probability distribution functions. These simulations, particularly the random number generator (RNG) cores, are computationally intensive and are ideal candidates for hardware acceleration. In this work we present an FPGA based Log-normal RNG ideally suited for financial Monte Carlo simulations, as it is run-time parameterisable and compatible with variance reduction techniques. Our architecture achieves a throughput of one sample per cycle with a 227.6 MHz clock on a Xilinx Virtex-4 FPGA. © 2008 Springer-Verlag Berlin Heidelberg.

Citations: 2

Conference paper
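
As a software illustration of the log-normal generator summarised in the Echeverría et al. abstract above, the sketch below draws log-normal samples by exponentiating a Gaussian draw, with the parameters supplied per sample to mimic run-time parameterisation. It is a minimal reference under stated assumptions, not the paper's FPGA architecture: the function names `lognormal_sample` and `lognormal_stream` are hypothetical, and Python's standard `random` module stands in for the hardware Gaussian source.

```python
import math
import random


def lognormal_sample(rng, mu, sigma):
    """Return one log-normal sample exp(mu + sigma * z), with z ~ N(0, 1).

    mu and sigma are passed on every call, so they can change from one
    sample to the next (a software analogue of run-time parameterisation).
    """
    z = rng.gauss(0.0, 1.0)
    return math.exp(mu + sigma * z)


def lognormal_stream(seed, params):
    """Generate one sample per (mu, sigma) pair in params.

    seed makes the stream reproducible; params is an iterable of
    (mu, sigma) tuples. Both names are illustrative assumptions.
    """
    rng = random.Random(seed)
    return [lognormal_sample(rng, mu, sigma) for mu, sigma in params]
```

For example, `lognormal_stream(42, [(0.0, 0.2), (0.05, 0.3)])` yields two strictly positive samples, each drawn with its own parameters.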

Atasu K, Mencer O, Luk W, Ozturan C, Dundar G et al., 2008, Fast Custom Instruction Identification by Convex Subgraph Enumeration, 19th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP)


Conference paper

Todman T, Atasu K, Mencer O, Luk W et al., 2008, Optimal Implementation of Combinational Logic on Lookup Tables, The Fourth Conference on Ph.D. Research in Microelectronics and Electronics (PRIME'08), Istanbul, Turkey.

We present a methodology for optimally implementing combinational logic equations on networks of look-up tables. Our work effectively extends optimality to span logic minimization and technology mapping. We restrict ourselves to 4-input look-up tables (LUTs) and enumerate all possible circuits up to a certain area or latency. Since simple-minded enumeration would take a long time, we develop levels of abstractions (steps) and we formulate the key step of enumeration as an Integer Linear Programming (ILP) problem. We show results on a set of ISCAS benchmarks.

Conference paper

Thomas DB, Luk W, 2008, Multivariate Gaussian Random Number Generation Targeting Reconfigurable Hardware, ACM Transactions on Reconfigurable Technology and Systems (TRETS), Vol: 1, Pages: 12:1-12:??, ISSN: 1936-7406

The multivariate Gaussian distribution is often used to model correlations between stochastic time-series, and can be used to explore the effect of these correlations across $N$ time-series in Monte-Carlo simulations. However, generating random correlated vectors is an $O(N^2)$ process, and quickly becomes a computational bottleneck in software simulations. This article presents an efficient method for generating vectors in parallel hardware, using $N$ parallel pipelined components to generate a new vector every $N$ cycles. This method maps well to the embedded block RAMs and multipliers in contemporary FPGAs, particularly as extensive testing shows that the limited bit-width arithmetic does not reduce the statistical quality of the generated vectors. An implementation of the architecture in the Virtex-4 architecture achieves a 500MHz clock-rate, and can support vector lengths up to 512 in the largest devices. The combination of a high clock-rate and parallelism provides a significant performance advantage over conventional processors, with an xc4vsx55 device at 500MHz providing a 200 times speedup over an Opteron 2.6GHz using an AMD optimised BLAS package. In a case study in Delta-Gamma Value-at-Risk, an RC2000 accelerator card using an xc4vsx55 at 400MHz is 26 times faster than a quad Opteron 2.6GHz SMP.


Journal article
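
To make the quadratic cost mentioned in the Thomas and Luk abstract above concrete, the sketch below shows one standard software way to generate correlated Gaussian vectors: factor the covariance matrix once, then multiply the lower-triangular factor by a vector of independent standard normals. This is an assumption for illustration only, since the listing does not say how the article's hardware produces its vectors; the names `cholesky` and `correlated_vector` are hypothetical.

```python
import math
import random


def cholesky(cov):
    """Return lower-triangular A with A * A^T == cov.

    cov must be a symmetric positive-definite matrix given as a list of rows.
    """
    n = len(cov)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(a[i][k] * a[j][k] for k in range(j))
            if i == j:
                a[i][j] = math.sqrt(cov[i][i] - s)
            else:
                a[i][j] = (cov[i][j] - s) / a[j][j]
    return a


def correlated_vector(a, rng):
    """Return x = A z for a fresh vector z of independent N(0, 1) draws.

    The triangular matrix-vector product takes about N^2 / 2
    multiply-accumulates per vector, which is the quadratic cost
    paid for every generated vector.
    """
    n = len(a)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(a[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
```

The factorisation is done once per covariance matrix; only `correlated_vector` sits in the inner loop of a Monte-Carlo simulation, so its quadratic cost is the part worth parallelising.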

Mak T, D'Alessandro C, Sedcole P, Cheung PYK, Yakovlev A, Luk W et al., 2008, Implementation of Wave-Pipelined Interconnects in FPGAs, Publisher: IEEE, Pages: 213-214

Global interconnection and communication at high clock frequencies are becoming more problematic in FPGAs. In this paper, we address this problem by presenting an interconnect wave-pipelining strategy, which utilizes the existing programmable interconnect fabrics to provide high-throughput communication in FPGAs. Two design approaches for interconnect wave-pipelining, using simple clock phase shifting and asynchronous phase encoding, are presented in this paper. Experimental results from a Xilinx Virtex-5 FPGA device are also presented.

Conference paper

Lee D-U, Cheung RCC, Luk W, Villasenor JD et al., 2008, Hardware implementation trade-offs of polynomial approximations and interpolations, IEEE Transactions on Computers, Vol: 57, Pages: 686-701, ISSN: 0018-9340

Citations: 47

Journal article

Wilton S, Ho C, Quinton B, Leong P, Luk W et al., 2008, A Synthesizable Datapath-Oriented Embedded FPGA Fabric for Silicon Debug Applications, ACM Transactions on Reconfigurable Technology and Systems, Vol: 1, Pages: 1-25

We present an architecture for a synthesizable datapath-oriented FPGA core which can be used to provide post-fabrication flexibility to an SoC. Our architecture is optimized for bus-based operations and employs a directional routing architecture, which allows it to be synthesized using standard ASIC design tools and flows. The primary motivation for this architecture is to provide an efficient mechanism to support on-chip debugging. The fabric can also be used to implement other datapath-oriented circuits such as those needed in signal processing and computation-intensive applications. We evaluate our architecture using a set of benchmark circuits and compare it to previous fabrics in terms of area, speed, and power consumption.


Journal article

Cope BT, Cheung PYK, Luk W, 2008, Using Reconfigurable Logic to Optimise GPU Memory Accesses, Pages: 44-49


Conference paper

Koester M, Luk W, Brown G, 2008, A hardware compilation flow for instance-specific vliw cores, Pages: 619-622

Hardware compilers for high-level programming languages are important tools to reduce the design productivity gap in hardware development. In this paper a hardware compilation approach is described, which is able to generate a hardware description based on a specification in a high-level programming language such as ANSI C. No modification of the program specification is required, allowing it to be suitable for a hardware and a software implementation at the same time. The parallelism is extracted by using VLIW optimization techniques. The generated hardware implementation is an instance-specific VLIW core, which is defined by its high-level program specification. To demonstrate the principle of the design flow, a prototype is presented which uses the VEX compiler as the front-end and the Handel-C tool chain as the back-end. The resulting instance-specific VLIW cores of several test functions are compared to equivalent software implementations. © 2008 IEEE.

Citations: 8

Conference paper

Yusuf S, Luk W, Sloman M, Dulay N, Lupu EC et al., 2008, Reconfigurable architecture for network flow analysis, IEEE Transactions on VLSI Systems, Vol: 16, Pages: 57-65, ISSN: 1063-8210

Journal article

Yusuf S, Luk W, Sloman M, Dulay N, Lupu EC, Brown G et al., 2008, Reconfigurable architecture for network flow analysis, International Conference on Engineering of Reconfigurable Systems and Algorithms, Pages: 57-65

This paper describes a reconfigurable architecture based on field-programmable gate-array (FPGA) technology for monitoring and analyzing network traffic at increasingly high network data rates. Our approach maps the performance-critical tasks of packet classification and flow monitoring into reconfigurable hardware, such that multiple flows can be processed in parallel. We explore the scalability of our system, showing that it can support flows at multi-gigabit rate; this is faster than most software-based solutions where acceptable data rates are typically no more than 100 million bits per second.


Conference paper

Professor Wayne Luk
