
Hello everyone, this is Haruyuki Tago, Edge Evangelist at HACARUS’ Tokyo R&D center.
In this series of articles, I will share some insights from my decades of experience in the semiconductor industry and I will comment on various AI industry-related topics from my unique perspective.
In today’s volume, I will once again cover three main topics. First, I will discuss the estimated number of servers used by Amazon, the world’s largest online retailer, and why in-house development is advantageous for them. Second, I will discuss the design and performance of Amazon’s proprietary microprocessors, the Graviton, Graviton2, and Graviton3. Finally, I will close by looking at several examples of logic integrated circuits developed by Japanese semiconductor manufacturers using the ARM Neoverse.
The Scale of Amazon Web Services
In the fourth quarter of 2021 (2021/Q4), Amazon’s global sales were approximately $137.412 billion [2][3]. Below is a breakdown of Amazon’s sales by service.
- Internet shopping – about 48% ($66.075 billion)
- Amazon Web Services (AWS) – 13% ($17.78 billion)
Compared to the previous quarter (Q3), internet shopping and AWS each rose by 40%.
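As a quick sanity check, the short sketch below recomputes each segment’s share of total sales from the dollar amounts quoted above. The variable and function names are illustrative only and do not come from Amazon’s report.

```python
# Quarterly figures quoted in the text, in billions of US dollars.
TOTAL_SALES = 137.412
SEGMENTS = {
    "internet shopping": 66.075,
    "AWS": 17.78,
}


def share_percent(part: float, total: float) -> float:
    """Return `part` as a percentage of `total`."""
    return 100.0 * part / total


# Share of total sales for each segment.
shares = {name: share_percent(value, TOTAL_SALES) for name, value in SEGMENTS.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% of total sales")
```

The remaining sales come from Amazon’s other segments, which are not broken out here.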
Amazon also offers Elastic Compute Cloud (EC2), a virtual server service that provides on-demand computing resources, a variety of instance types suited to different workloads, and simple pay-as-you-go pricing billed by the second.
Next, let’s look at Amazon’s CPU choices in 2021, shown in Figure 1. On the right are the CPU options available on Amazon EC2. On the left are the shares of Intel, AMD, and Amazon CPUs, with the Graviton2 shown in light blue. The growth in the Graviton2’s share stands out.

Figure 1 Graviton2 as a percentage of Amazon’s EC2 [5] (left); Options for Amazon’s EC2 high-performance CPUs (right)
By developing servers in-house, Amazon can optimize them for AWS workloads. It can also produce CPUs at a lower cost than it would pay to purchase them from Intel or AMD. The author’s rough estimate of the cost of a Graviton3 chip is somewhere between several thousand and several tens of thousands of yen. At large production volumes, this could lead to massive savings in the long run.
AWS C7g Instance Configuration
AWS offers its customers a wide variety of instances (virtual CPU environments). The previously mentioned Graviton3 CPU is used for the C7g instance. To get a better understanding of how instances work, let’s look at Figure 2. The left image shows a server rack that holds multiple C7g boards, while the center image shows an individual C7g board. The right image shows the block diagram, in which three Graviton3 chips are connected to a Nitro card. The board is laid out so that the three Graviton3 chips are in the center, the Nitro card and SSD are at the back, and the power supply circuit is at the front.

Figure 2 C7g rack and board
Diving deeper into the C7g board, Figure 3 shows a photo of the Graviton3 chip package on the left. The package contains seven pieces of silicon (dies). Beyond this image, however, Amazon has not released any further details. Based on information released by the media and their own estimates, the authors of references [8] and [11] created the estimated package diagram shown on the right of Figure 3.
In this diagram, the seven dies consist of one Graviton3 CPU die (light yellow), four DDR5 dies (light green), and two PCI Express 5th-generation (PCIe Gen5) I/O dies (light blue). The DDR5 dies and the PCIe Gen5 dies are both mounted face-up on the package board (green rectangles in Figure 3). The I/O terminals of these dies are located on the top surface and are represented by the bright green semicircles.
The Graviton3 CPU die is turned face-down, with its surface (the side with the I/O pins) bonded directly to the mounted DDR5 and PCIe Gen5 dies so that the I/O pins line up. Small bumps (solder balls several tens of microns in diameter) formed on the surface of each die are bonded directly to each other where the dies slightly overlap.
The Graviton3, DDR5, and PCIe Gen5 dies are all designed so that their bumps align precisely. As a result, the dies are connected over a distance of only about 0.1 to 0.2 mm, without going through the package substrate.

Figure 3 Graviton3 package (left); Seven dies in the Graviton3 package (author’s assumption) (right)
The Graviton3 has two major advantages over other models.
- The distance between the Graviton3 CPU die and the DDR5 dies is radically shortened to about 0.1 to 0.2 mm, which also relaxes the timing constraints of high-speed DDR5 operation.
- The other advantage involves PCIe Gen5, the latest I/O IP, which is probably available from only a few vendors such as Synopsys Inc. The Graviton3 die can be developed independently of the PCIe Gen5 die, which is important because design and development need time to adapt to new manufacturing processes. New technologies can also fail because they are sensitive to semiconductor characteristics. For this reason, I think that using pre-verified PCIe Gen5 dies avoids this risk.
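To give a feel for why a 0.1 to 0.2 mm connection eases DDR5 timing, the rough sketch below compares signal flight time with the length of one data bit at 4800 MT/s (the DDR5-4800 speed grade mentioned later in this article). It rests on assumptions that are not in the source material: a relative permittivity of 4.0, a typical value for package and board dielectrics, and a hypothetical 50 mm board trace used purely as a point of comparison. The function names are illustrative as well.

```python
SPEED_OF_LIGHT_MM_PER_PS = 0.299792458  # speed of light in vacuum, mm per picosecond


def flight_time_ps(length_mm: float, relative_permittivity: float = 4.0) -> float:
    """Propagation delay over a trace, assuming velocity = c / sqrt(eps_r).

    The default eps_r of 4.0 is an assumed typical value, not a Graviton3 figure.
    """
    velocity_mm_per_ps = SPEED_OF_LIGHT_MM_PER_PS / relative_permittivity ** 0.5
    return length_mm / velocity_mm_per_ps


def unit_interval_ps(transfer_rate_mtps: float) -> float:
    """Duration of one data bit, in picoseconds, at the given transfer rate."""
    return 1e6 / transfer_rate_mtps


bit_time = unit_interval_ps(4800)    # DDR5-4800
die_to_die = flight_time_ps(0.2)     # upper end of the 0.1 to 0.2 mm range
board_trace = flight_time_ps(50.0)   # hypothetical 50 mm trace, for comparison only

print(f"One bit at 4800 MT/s:   {bit_time:.1f} ps")
print(f"0.2 mm die-to-die link: {die_to_die:.2f} ps")
print(f"50 mm board trace:      {board_trace:.1f} ps")
```

Under these assumptions, the die-to-die flight time is a tiny fraction of one bit time, while the hypothetical board trace takes longer than a full bit. Real timing budgets also depend on drivers, loading, and skew, which this sketch ignores.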
The Graviton3 CPU die in the center of the Graviton3 package uses ARM’s Neoverse V1 IP, with 64 cores connected in a mesh-like pattern, shown on the right of Figure 4. The left image shows a breakdown of an individual core, with an enhanced instruction decode section and roughly twice as many execution units as the Graviton2. Increasing the number of instructions executed in parallel in this way also increases performance.

Figure 4 Graviton3-Interconnect & system (left); Enhancement from the Graviton2 CPU core to the Graviton3 CPU core [1] (right).
AWS Instance Comparison
It is also important to show how the specifications of AWS instances compare with those of competitors. Figure 5 compares three different instances as of 2021: the m6g instance using the Graviton2, the m5a instance using AMD’s EPYC 7571, and the m5n instance using Intel’s Xeon Platinum 8259CL.
Looking at the results, it is of particular interest that the m6g instance has a Thermal Design Power (TDP) about half that of both the m5a and m5n instances. The hourly cost of the m6g is also the lowest of the three, making it the most attractive option for customers.

Figure 5 Graviton2, AMD, and Intel Comparison [10]
Focusing on the Graviton3, the first enhancement is that the number of integrated transistors has increased from 30 billion in the Graviton2 to 50 billion in the Graviton3, which is built with TSMC’s 5nm manufacturing technology. The larger transistor budget is used to add the 256-bit Scalable Vector Extension (SVE) to the core instruction set and to improve parallel execution performance (Figure 4).
AWS has also stated its intention to expand into the field of high-performance computing, and these enhancements are its first step. To match the increase in core performance, DDR5-4800 is used for main memory, providing about 300 GB/s of memory bandwidth, 1.5 times that of the Graviton2. It is also interesting to note that DDR5 memory is not yet used by either Intel or AMD in server CPUs.
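The bandwidth figure can be reproduced with a back-of-the-envelope calculation. The sketch below assumes eight 64-bit memory channels on both chips and DDR4-3200 on the Graviton2; neither detail is stated in the text above, so treat them as assumptions, and the function name as illustrative.

```python
def peak_bandwidth_gb_s(transfer_rate_mtps: float,
                        channels: int = 8,
                        bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s.

    transfer_rate_mtps: memory speed in mega-transfers per second (4800 for DDR5-4800).
    channels: number of memory channels (assumed to be 8 here).
    bytes_per_transfer: channel width in bytes (a 64-bit channel moves 8 bytes).
    """
    return transfer_rate_mtps * 1e6 * bytes_per_transfer * channels / 1e9


graviton3_bw = peak_bandwidth_gb_s(4800)  # DDR5-4800
graviton2_bw = peak_bandwidth_gb_s(3200)  # DDR4-3200 (assumed for the Graviton2)

print(f"Graviton3: {graviton3_bw:.1f} GB/s")
print(f"Graviton2: {graviton2_bw:.1f} GB/s")
print(f"Ratio:     {graviton3_bw / graviton2_bw:.2f}x")
```

Because the channel count is the same in both cases, the 1.5x ratio comes entirely from the higher transfer rate (4800 versus 3200 MT/s). These are theoretical peaks; sustained bandwidth in practice is lower.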
In spite of all these performance enhancements, the power consumption of the Graviton3 is approximately 100W, surprisingly close to that of the Graviton2. This is most likely due to the move in semiconductor manufacturing technology from a 7nm process to a newer 5nm process. The advanced packaging technology (Figure 3) may also have contributed.

Figure 6 Graviton, Graviton2, Graviton3 Comparison. Red items indicate values obtained from crude estimates [11]
Graviton3 Performance
Another important factor is the performance of the Graviton3 CPU. Figure 7 (left) shows the SPEC CPU 2017 rate scores of five instances. With the C6g instance (Graviton2) set to 100%, the C7g (Graviton3) scores between 130% and 160%, while the Intel and AMD CPUs score around 70% to 90%.
As a machine learning example, performance on the BERT natural language processing (NLP) model is about 2.4 times that of the C6g, as shown on the right of Figure 7.

Figure 7 Graviton3 SPEC CPU 2017 rate (left); Graviton3’s machine learning performance data (right) [1]
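Combining these relative scores with the roughly 100W power figure quoted earlier for both chips gives a simple estimate of the performance-per-watt gain. The sketch below is only an approximation: it applies chip-level power to instance-level benchmark ratios, and the names used are illustrative.

```python
# Relative performance of the C7g (Graviton3), with the C6g (Graviton2) as 1.0,
# taken from the figures quoted in the text.
RELATIVE_PERF_C7G = {
    "SPEC CPU 2017 rate (low end)": 1.3,
    "SPEC CPU 2017 rate (high end)": 1.6,
    "BERT (NLP)": 2.4,
}

# Approximate chip power; the article gives roughly 100 W for both chips.
C6G_POWER_W = 100.0
C7G_POWER_W = 100.0


def perf_per_watt_gain(relative_perf: float, power_w: float, baseline_power_w: float) -> float:
    """Performance-per-watt ratio versus the baseline chip."""
    return relative_perf / (power_w / baseline_power_w)


gains = {
    name: perf_per_watt_gain(perf, C7G_POWER_W, C6G_POWER_W)
    for name, perf in RELATIVE_PERF_C7G.items()
}

for name, gain in gains.items():
    print(f"{name}: {gain:.1f}x performance per watt vs. C6g")
```

With equal power on both sides, the performance-per-watt gain equals the raw speedup, so any change in the power figures would shift these ratios proportionally.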
Intel and AMD are semiconductor manufacturers that sell CPU chips. Each company controls its own CPU standard, and its CPUs operate within that standard. The interfaces, on the other hand, are open to the public so that other semiconductor manufacturers can develop and manufacture DDR memory, PCI Express and other I/O chips, motherboards, and so on. In other words, each company builds an ecosystem centered on its own CPU design.
At the same time, interfaces evolve slowly because every company has to move in step with the others, so sudden, drastic changes aren’t possible. AWS, by contrast, isn’t in the CPU chip business, so it can focus on finding the optimal solution for servers with high processing performance and low power consumption. It is also important to optimize at the C7g rack and board level, and the availability of high-performance IP, packaging technology, and state-of-the-art foundries has made this possible.
Graviton3 uses ARM Neoverse V1 (Zeus), but ARM is already preparing its next-generation platform, the Poseidon generation, shown in Figure 8. Oracle Cloud uses Ampere Altra servers built on chips (with Neoverse N1 cores) from the semiconductor venture company Ampere [9]. Chinese company Tencent is also running CPUs based on Neoverse N1, while Alibaba, another Chinese company, is using ARM as well [5].

Figure 8 ARM Neoverse Platform Roadmap [5]
Japanese Developed Neoverse-based Logic Semiconductors
Japan has also been making strides when it comes to semiconductor manufacturing. The Japanese company Socionext has recently developed a 40-core CPU using Neoverse in Yokohama [12].
The CPU was designed for integration into an overseas client’s System on Chip (SoC). The SoC is manufactured on the 7nm process of Taiwanese company TSMC, and the CPUs operate at speeds as high as 3 GHz.
For comparison, consider Intel’s latest server MPU (microprocessor), the third-generation Xeon Scalable Processor (Xeon SP, code name: Ice Lake SP). Manufactured on a process comparable to that of Socionext’s CPU, Intel’s design has up to 40 cores and a base clock frequency of 2.3 GHz, reaching 3.4 GHz with turbo boost.

Figure 9 Japanese semiconductor manufacturer Socionext’s development of a 40-core CPU using Neoverse [12]
Summary
- AWS operates millions of servers and reportedly replaces more than one million of them each year. This scale gives Amazon significant advantages when it comes to in-house development. First, servers can be optimized for AWS workloads. Second, Amazon can produce CPUs at a lower cost than it would pay to purchase them from Intel or AMD.
- At the end of 2021, AWS announced the Graviton3 CPU and the C7g, an instance that runs on the Graviton3. For natural language processing, the C7g delivers about 2.4 times the performance of the previous C6g (Graviton2), while the power consumption of the Graviton3 remains at roughly 100W, about the same as the Graviton2. The chip combines the ARM Neoverse V1 IP, advanced packaging technology, and TSMC’s 5nm semiconductor manufacturing technology.
- Heavy cloud users such as Oracle (using Ampere chips), Tencent, and Alibaba are reportedly adopting ARM-based servers, several of them built on ARM Neoverse. ARM has also announced its next-generation Neoverse, which promises even higher performance.
- Japanese semiconductor manufacturer Socionext has developed a 40-core CPU using the Neoverse N1. Developed for an overseas customer, the CPU can operate at speeds up to 3 GHz.
References
[1] AWS re:Invent 2021 – {New Launch} Deep dive into AWS Graviton3 and Amazon EC2 C7g instances
https://www.youtube.com/watch?v=WDKwwFQKfSI
[2] AMAZON.COM ANNOUNCES FOURTH QUARTER RESULTS
https://s2.q4cdn.com/299287126/files/doc_financials/2021/q4/business_and_financial_update.pdf
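[3] Amazon、売上高は過去最高も予測に届かず 純利益98%増は出資先Rivianの上場益 2022年2月4日 (Amazon posts record sales but misses forecasts; 98% rise in net income comes from IPO gains at its investee Rivian, February 4, 2022)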
https://www.itmedia.co.jp/news/articles/2202/04/news065.html
[4] AWSとは? 初心者にもわかりやすく解説 (What is AWS? An easy-to-understand explanation for beginners)
https://www.skyarch.net/column/whataws01/
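[5] Arm、DC向けCPU IPデザイン「Neoverse」の高性能版「Neoverse V1」とArmv9に対応した「Neoverse N2」を発表 (Arm announces Neoverse V1, a high-performance version of its Neoverse CPU IP design for data centers, and the Armv9-compatible Neoverse N2)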
https://cloud.watch.impress.co.jp/docs/news/1321510.html
[6] Amazon EC2 で選択できる高性能CPUの選択肢 (High-performance CPU options available on Amazon EC2)
https://d1.awsstatic.com/webinars/jp/pdf/services/20200707_BlackBelt_Graviton2.pdf
[7] コレ1枚で分かる「クラウドコンピューティング 2019年版」 (Cloud computing explained on a single page, 2019 edition)
[8] Amazon Graviton 3 Uses Chiplets & Advanced Packaging To Commoditize High Performance CPUs | The First PCIe 5.0 And DDR5 Server CPU
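[9] 『半導体業界の第一人者,AI業界を行く!』 Vol.12:クラウドネイティブプロセッサ 2021年7月9日 (A leading figure of the semiconductor industry explores the AI industry! Vol. 12: Cloud-native processors, July 9, 2021)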
https://hacarus.com/ja/ai-lab/20210709-cloud-native-prosessor/
[10] Amazon’s Arm-based Graviton2 Against AMD and Intel: Comparing Cloud Compute
https://www.anandtech.com/show/15578/cloud-clash-amazon-graviton2-arm-against-intel-and-amd
[11] INSIDE AMAZON’S GRAVITON3 ARM SERVER PROCESSOR
https://www.nextplatform.com/2022/01/04/inside-amazons-graviton3-arm-server-processor/
[12] 日本に残る世界レベルの設計力、40コア大規模CPUで見せつける (World-class design capability that remains in Japan, demonstrated with a large-scale 40-core CPU)
https://xtech.nikkei.com/atcl/nxt/column/18/00138/090200869/