Domain Specific Architecture: Introduction

1. Introduction

Hardware power budgets have not grown the way performance has, so pushing the hardware limit further now requires optimizing energy per operation.
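The relationship behind this point can be sketched numerically: sustained power equals energy per operation times operations per second, so under a fixed power budget throughput can only rise if energy per operation falls. The function name, the 100 W budget, and the picojoule figures below are illustrative assumptions, not measurements from the source.

```python
def max_ops_per_second(power_budget_watts: float, energy_per_op_joules: float) -> float:
    """Upper bound on throughput under a fixed power budget.

    power (W) = energy per operation (J) * operations per second
    """
    return power_budget_watts / energy_per_op_joules


# Hypothetical numbers chosen only for illustration.
budget_w = 100.0
baseline = max_ops_per_second(budget_w, 100e-12)  # assumed 100 pJ per operation
improved = max_ops_per_second(budget_w, 10e-12)   # assumed 10 pJ per operation

print(f"baseline: {baseline:.2e} ops/s")
print(f"improved: {improved:.2e} ops/s")
print(f"gain at the same power: {improved / baseline:.1f}x")
```

With the power budget held constant, the throughput gain is exactly the factor by which energy per operation was reduced.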

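With conventional techniques, adding more cores might yield roughly a 10% performance improvement. Going further requires raising the number of arithmetic operations per instruction by about a hundredfold, which makes specialized processors all the more necessary.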

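Future systems will pair general-purpose processors, which handle general tasks such as the operating system, with specialized processors that do the work they are good at, such as a particular kind of computation.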

[Figure: energy cost]

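Established architectural features such as caches and out-of-order execution serve general-purpose computing well, but in some specialized domains they waste both silicon and energy, and dropping them can give a larger advantage. In video processing, for example, the data is usually large and is rarely reused, so a cache is essentially wasted.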
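The video example can be illustrated with a toy simulation. The sketch below is a simplified model, assuming a fully associative LRU cache that tracks whole blocks and ignores spatial locality within a cache line; `lru_hit_rate`, `CAPACITY`, and both traces are hypothetical names and values introduced for illustration. It compares a streaming trace, where every block is touched once, with a trace that keeps revisiting a small working set.

```python
from collections import OrderedDict


def lru_hit_rate(addresses, capacity):
    """Return the hit rate of a fully associative LRU cache over an access trace."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for addr in addresses:
        total += 1
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)  # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / total if total else 0.0


CAPACITY = 64  # hypothetical number of cache blocks

# Streaming trace: each block is read exactly once, like a large frame scanned start to end.
streaming = range(10_000)

# Reuse trace: a 32-block working set that fits in the cache is visited repeatedly.
reuse = [i % 32 for i in range(10_000)]

streaming_rate = lru_hit_rate(streaming, CAPACITY)
reuse_rate = lru_hit_rate(reuse, CAPACITY)
print(f"streaming hit rate: {streaming_rate:.4f}")
print(f"reuse hit rate:     {reuse_rate:.4f}")
```

In this model the streaming trace never hits, so the cache's area and lookup energy buy nothing, while the reuse trace misses only on the first touch of each block.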

A DSA usually targets only a subset of the system rather than trying to support the whole system. For DSA research, the two biggest problems are:

The guidelines below bring the following benefits:

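Based on a better understanding of the data being processed and the operations performed on it, the following five guidelines are proposed: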

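Use dedicated memories to minimize the distance data has to move: a two-way set associative cache uses 2.5 times as much energy as an equivalent software-controlled scratchpad memory.
Invest resources in more arithmetic units and larger memories rather than in the advanced microarchitectural optimizations that conventional designs adopted to keep pace with Moore's Law (out-of-order execution, multithreading, multiprocessing, prefetching, address coalescing, etc.).
Use the simplest form of parallelism that matches the domain: For example, with respect to data-level parallelism, if SIMD works in the domain, it's certainly easier for the programmer and the compiler writer than MIMD. Similarly, if VLIW can express the instruction-level parallelism for the domain, the design can be smaller and more energy-efficient than out-of-order execution.
Reduce the data size to the minimum that is sufficient; this improves memory utilization and also lets more compute units fit in the same chip area.
Use a domain-specific language.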
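The data-size guideline can be made concrete with a small sketch. It assumes a simple fixed-scale quantization from 32-bit floats to signed 8-bit integers; `quantize_int8`, the scale, and the sample values are hypothetical and chosen only for illustration, not taken from any particular DSA.

```python
from array import array


def quantize_int8(values, scale):
    """Map floats to signed 8-bit integers with a fixed scale, clamping to [-128, 127]."""
    return array("b", (max(-128, min(127, round(v / scale))) for v in values))


# Hypothetical sample data and scale (1/64), for illustration only.
values = [0.5, -1.0, 0.25, 2.0]
scale = 1.0 / 64.0

as_float32 = array("f", values)          # 4 bytes per element
as_int8 = quantize_int8(values, scale)   # 1 byte per element

float32_bytes = as_float32.itemsize * len(as_float32)
int8_bytes = as_int8.itemsize * len(as_int8)

print("quantized:", list(as_int8))
print("float32 bytes:", float32_bytes)
print("int8 bytes:", int8_bytes)
```

The same values occupy a quarter of the memory in the narrower format, at the cost of range and precision: the last sample value falls outside the representable range and is clamped. Whether that loss is acceptable depends on the domain.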

[Figure: four DSAs]