Hardware architecture improvements had relied on Moore's law and Dennard scaling for some decades. According to Moore's prediction, the number of transistors on a chip doubles almost every 18 months. According to Dennard scaling, as transistors become smaller, their power density stays constant. These allowed manufacturers to raise clock frequencies without significantly increasing overall circuit power consumption. Nowadays, Moore’s law is not happening as predicted, and Dennard scaling law is broken down. To address this, (homogeneous) multi-processors were invented, and opened a new era for parallel processing to gain better performance with a reasonable increase in power consumption. However, homogeneous multi-processors cannot keep up with high-demand application requirements such as low power, energy efficiency, and high performance.