Edge AI Evangelist’s Thoughts Vol.20: The Secrets of Apple’s M1 Ultra

Hello everyone, this is Haruyuki Tago, Edge Evangelist at HACARUS’ Tokyo R&D center.

In this series of articles, I will share some insights from my decades of experience in the semiconductor industry and I will comment on various AI industry-related topics from my unique perspective.

In today’s volume, I want to present my findings on Apple’s M1 Ultra microprocessor and the Mac Studio PC, both announced in March 2022. I will cover their specifications and performance, how Apple went from the M1 to the M1 Ultra, the challenges of building the M1 Ultra, and the secrets behind Apple’s UltraFusion.

The Apple M1 Ultra & Mac Studio PC

At an event in March 2022, Apple unveiled the M1 Ultra microprocessor and the Mac Studio PC [1][2]. The Mac Studio PC is offered in several variants with different microprocessor configurations. Figure 1 shows the configuration that uses the M1 Ultra, which combines two M1 Max dies to achieve high processing performance. This system can play back up to 18 streams of 8K ProRes 422 video while keeping power consumption low. The power savings are shown in Figure 2, where performance is plotted against power consumption. According to Apple [1], the M1 Ultra delivers up to 90% higher multi-threaded CPU performance than a comparable Windows PC chip at the same power, and its GPU outperforms a high-end PC GPU while using 200W less power. The Mac Studio PC is priced at $5,799 (excluding tax) and is marketed as a premium product for those who desire higher performance [4].

Figure 1 The Mac Studio using the M1 Ultra Microprocessor [2]

Figure 2 CPU performance vs power consumption (left): GPU performance vs power consumption (right) [1]

From the Apple M1 to the M1 Ultra

Apple’s in-house microprocessors for the Mac began with the M1, which was announced in November 2020. By March 2022, four variants had been announced: the M1, M1 Pro, M1 Max, and M1 Ultra. Figure 3 shows the chip area, transistor count, layout, and other specs for each of these four chips [3].

Figure 3 Apple M1, M1 Pro, M1 Max, M1 Ultra Dies  [2], [5]

Figure 4 shows the specifications of each chip, including the number of CPU cores. All four types of Apple Silicon have been announced, and PCs using them have been released to the public. From one chip to the next, the number of CPU, GPU, and AI accelerator cores generally increases along with chip area. Memory bandwidth, another important factor in determining processing speed, triples from the M1 to the M1 Pro and then doubles with each step to the M1 Max and the M1 Ultra (Figure 5).

Chip name | M1 | M1 Pro | M1 Max | M1 Ultra
CPU core count (note1) | 4+4 | 8+2 | 8+2 | 16+4
GPU core count (note2) | 8 | 16 | 32 | 64
AI accelerator core count | 16 | 16 | 16 | 32
Main memory capacity | 8GB | 16GB | 32GB | 64GB
DRAM type | LPDDR4x-4266 | LPDDR5-6400 | LPDDR5-6400 | LPDDR5-6400
DRAM channel count | 2 | 2 | 4 | 8
Channels x bits per channel | 2ch x 64bit/ch | 2ch x 128bit/ch | 4ch x 128bit/ch | 8ch x 128bit/ch
Memory bandwidth | 68.2GB/s | 204.8GB/s | 409.6GB/s | 819.2GB/s

Figure 4 Specifications of Apple M1, M1 Pro, M1 Max, M1 Ultra [5]

note1: Performance-core count + Efficiency-core count

note2: total implemented GPU cores which may include unactivated cores

Figure 5 Main memory capacity and memory bandwidth trends for the Apple M1, M1 Pro, M1 Max, and M1 Ultra [5]
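
The memory bandwidth row in Figure 4 can be reproduced from the channel count, the bits per channel, and the DRAM transfer rate. The following is a minimal sketch of that arithmetic. The function name is hypothetical, and the transfer rates are taken from the DRAM type names (for example, LPDDR5-6400 is read as 6400 mega-transfers per second).

```python
# Peak DRAM bandwidth = channels x bits per channel / 8 x transfer rate.
# All input figures are copied from Figure 4; the helper name is illustrative.

def peak_bandwidth_gb_s(channels: int, bits_per_channel: int,
                        mega_transfers_per_s: int) -> float:
    """Return the theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = channels * bits_per_channel / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9

# (channel count, bits per channel, mega-transfers per second)
chips = {
    "M1": (2, 64, 4266),
    "M1 Pro": (2, 128, 6400),
    "M1 Max": (4, 128, 6400),
    "M1 Ultra": (8, 128, 6400),
}

bandwidths = {name: peak_bandwidth_gb_s(*spec) for name, spec in chips.items()}

for name, bw in bandwidths.items():
    print(f"{name:9s} {bw:7.1f} GB/s")
```

The M1 value comes out slightly above the 68.2GB/s listed in the table because of rounding in the nominal transfer rate. The other three values match the table.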

Challenges of the M1 Ultra

Next, suppose that we are developing a microprocessor with twice the processing performance of the M1 Max, using the same TSMC N5 semiconductor manufacturing technology. The main difficulty is doubling the number of CPU and GPU cores, as well as the number of AI accelerators. Looking at the M1 Ultra, I think that there are two major challenges that Apple faced in its development.

Challenge 1: Overcoming the Chip Size Wall

Chips manufactured with the TSMC 5nm process are patterned using extreme ultraviolet (EUV) light with a wavelength of 13.5nm. In this process, the pattern on a mask is exposed onto the surface of the wafer, and the exposure area is limited to a maximum width of 26 mm and a length of 33 mm, for a total area of 858 square mm [6]. A single die cannot be larger than this.

However, even if the chip could be contained within this limit, such a large die would reduce the overall yield and increase the manufacturing cost. Because of this, combining multiple smaller dies within a single package is preferred instead.
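
To make this challenge concrete, here is a rough sketch that checks a die against the 26 mm by 33 mm exposure limit and compares yields using a simple Poisson yield model. The M1 Max die area of 432 square mm is a widely reported estimate rather than a figure from Apple, the defect density is an arbitrary illustrative value, and the Poisson model is a textbook simplification, so the numbers should be read as a trend only.

```python
import math

# Exposure-area limit of a single EUV shot, as cited from [6].
RETICLE_WIDTH_MM = 26.0
RETICLE_LENGTH_MM = 33.0
RETICLE_AREA_MM2 = RETICLE_WIDTH_MM * RETICLE_LENGTH_MM  # 858 mm^2

# Assumption: widely reported estimate of the M1 Max die area, not an Apple figure.
M1_MAX_AREA_MM2 = 432.0
# Assumption: illustrative defect density, not a published TSMC N5 value.
DEFECT_DENSITY_PER_CM2 = 0.1


def fits_in_reticle(area_mm2: float) -> bool:
    """True if a die of this area could fit within one exposure field."""
    return area_mm2 <= RETICLE_AREA_MM2


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Simple Poisson yield model: Y = exp(-A * D0), with A in cm^2."""
    return math.exp(-(area_mm2 / 100.0) * defects_per_cm2)


single = M1_MAX_AREA_MM2
doubled = 2 * M1_MAX_AREA_MM2

for label, area in (("M1 Max sized die", single), ("doubled monolithic die", doubled)):
    print(f"{label}: {area:.0f} mm^2, "
          f"fits in reticle: {fits_in_reticle(area)}, "
          f"estimated yield: {poisson_yield(area, DEFECT_DENSITY_PER_CM2):.2f}")
```

Under these assumptions, a monolithic die with twice the area would slightly exceed the exposure limit, and its yield would be the square of the single-die yield. Both effects favor connecting two known-good dies in one package.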

Challenge 2: Securing Unified Memory Access

In its press release [1], Apple stated, “This method allows the M1 Ultra to operate as a single chip and is recognized this way by the software, so developers don’t have to rewrite the code to take advantage of its performance.”

The M1 series is based on the Unified Memory architecture. This means that the CPU, GPU, and AI accelerator cores all share a single memory address space and can access the same memory. These cores and the memory interface IPs are all interconnected by an on-chip network. For software compatibility, the characteristics of the Unified Memory must be maintained even when two M1 Max chips are connected. For this reason, the on-chip networks of the two M1 Max chips are connected to each other without altering the architecture (although some buffers and registers might be added).

In the same press release [1], Apple stated, “…meanwhile, Apple’s innovative UltraFusion uses silicon interposers to connect the chips through more than 10,000 signals to deliver more than four times the bandwidth (2.5 TB/s) compared to industry-leading multi-chip technology.” The company website also states that the Silicon Bridge is a powerful, low latency inter-processor bandwidth solution.

Secrets of Apple’s UltraFusion

In the video of the Apple event, two M1 Max dies are brought close together and more than 10,000 wires are connected to form a long thin pattern at their center (Figure 6) [2]. Apple has named this chip-to-chip connection structure UltraFusion [1].

Figure 6 A snapshot from the Apple event video [2]

Figure 7 shows Apple’s illustration of the inside of the M1 Ultra package [1], with my own annotations added. The illustration is only a conceptual image, but I believe that the M1 Max dies, the small center connection piece (red rectangle), and the eight LPDDR5 dies are all mounted face-down. The wiring between the M1 Max dies and the LPDDR5 dies is embedded in the package board.

Figure 7 Interior Apple M1 Ultra package Illustration and author’s notes

Combining multiple dies into a single package through Multi-Chip Packages (MCP) or Multi-Chip Modules (MCM) is a widely used practice. Figure 8 compares three technologies for wiring between multiple dies by their half-line pitch [7]. In order of decreasing half-line pitch, these are traditional organic packages (FCBGA) at 10 µm or more, high-density organic interposers at about 2-3 µm or more, and technologies that use Si back-end wiring at 2 µm or less.

When Si back-end wiring is used for short connections between adjacent chips, the structure is generally called a Silicon Bridge. Intel, for example, calls its version the Embedded Multi-Die Interconnect Bridge (EMIB). In Figure 8, Si back-end wiring has a half-pitch of about 1 µm, which means a wiring pitch of about 2 µm. We can make a similar estimate for the M1 Ultra from [1]: the width of the M1 Max chip is 19.3 mm, and if 10,000 wires are laid side by side along it, the wiring pitch is 19.3 mm / 10,000 wires, or about 1.9 µm, which gives a half-pitch of about 1 µm. This is consistent with the Silicon Bridge’s minimum wiring pitch of 2 µm.
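
The pitch estimate above is a one-line calculation. It is written out below as a small sketch. It assumes that all 10,000 wires run side by side in a single layer along the 19.3 mm die edge, which is a simplification, and the function name is hypothetical.

```python
# Figures from the text: die edge length and number of die-to-die wires.
DIE_EDGE_MM = 19.3
WIRE_COUNT = 10_000


def wiring_pitch_um(edge_mm: float, wires: int) -> float:
    """Pitch in micrometers if all wires sit side by side along one die edge."""
    return edge_mm * 1000.0 / wires


pitch_um = wiring_pitch_um(DIE_EDGE_MM, WIRE_COUNT)
half_pitch_um = pitch_um / 2  # half-pitch = line width = space, by convention

print(f"pitch: {pitch_um:.2f} um, half-pitch: {half_pitch_um:.3f} um")
```

If the wires were spread over several wiring layers, the pitch in each layer could be relaxed accordingly.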

Furthermore, Apple claims to have achieved a transfer bandwidth of 2.5TB/s. From the number of signals in the connection, we can estimate the switching speed. Assuming differential signals, the 10,000 wires form 5,000 signal pairs, so the rate per pair is 8 bit/B x 2.5 TB/s / (10,000 / 2) = 4 Gbps. This is comparable to the effective data rate of USB 3.0 (5 Gbps on the wire, about 4 Gbps after 8b/10b encoding).
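
The switching-speed estimate can be checked with a short calculation. This sketch takes the 2.5 TB/s and 10,000-wire figures from [1] and treats the number of wires per signal as a parameter, since differential signaling is an assumption rather than something Apple has stated. The function name is hypothetical.

```python
def per_signal_gbps(total_tb_per_s: float, wires: int, wires_per_signal: int) -> float:
    """Required data rate per signal in Gbps, given the total bandwidth."""
    signals = wires // wires_per_signal
    total_bits_per_s = total_tb_per_s * 1e12 * 8  # 8 bits per byte
    return total_bits_per_s / signals / 1e9


# Differential signaling: two wires carry one signal.
differential = per_signal_gbps(2.5, 10_000, wires_per_signal=2)
# For comparison: single-ended signaling, one wire per signal.
single_ended = per_signal_gbps(2.5, 10_000, wires_per_signal=1)

print(f"differential: {differential:.1f} Gbps per pair")
print(f"single-ended: {single_ended:.1f} Gbps per wire")
```

With single-ended signaling the required rate per wire would be halved, so the differential case is the more demanding estimate.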

Figure 8 Three interconnect technologies for MCPs (Multi-Chip Packages) and their half-line pitches, from page 10 of [7]

Figure 9 shows a schematic cross-section of the package along the dotted blue line (between A and A’) in Figure 7. The figure also shows the Silicon Bridge, denoted by the red rectangle, which directly connects the two M1 Max dies using over 10,000 wires. The LPDDR5 signals (more than 1,000) and other signals of the M1 Max die are connected to the package board using copper pillars. They are then routed to the LPDDR5 dies and the package terminals through the wiring in the top build-up layer.

Figure 9 Vertical cross-section of the M1 Ultra package (author’s estimate)

After looking at Figure 9, I noticed an interesting feature of the Silicon Bridge. It isn’t quite clear why the height of the Silicon Bridge shown in Figure 9 is so large. This is pure speculation, but I think it might be preparation for the next generation of the M1 Ultra. With its high drive capability, the next generation might combine three, or even four, M1 Max dies and even more memory [8].

Apple’s American Patent

Apple has filed patent applications for multiple-die packaging methods, including US 2020/0176419 A1, “WAFER RECONSTITUTION AND DIE-STITCHING” [9]. According to its abstract, reconstituted chips are formed using wafer reconstitution and die-stitching techniques. In one embodiment, a die set embedded in an inorganic gap-fill material is connected by a reconstituted chip-level back-end-of-line (BEOL) build-up structure.

Figure 10 Fig. 3 and Fig. 4 of Apple’s US 2020/0176419 A1 [9]

Summary

  1. In March 2022, Apple introduced its Mac Studio PC, which uses the brand new M1 Ultra microprocessor. The design and implementation of this high-performance, low-power chip is the talk of the industry.
  2. As part of its microprocessor line, Apple has developed the M1, M1 Pro, and M1 Max for use in its Mac PCs. Chip area has grown with each model. However, the M1 Max is already so large that doubling its area on a single die would run into the exposure-area limit, so chip size cannot keep increasing this way.
  3. To overcome the limitations on chip size while maintaining software compatibility, Apple adopted a new Multi-Chip Package (MCP) that directly connects two M1 Max dies. This configuration also mounts eight LPDDR5 dies inside the package.

References 

[1] Apple unveils M1 Ultra, the world’s most powerful chip for a personal computer

https://www.apple.com/ca/newsroom/2022/03/apple-unveils-m1-ultra-the-worlds-most-powerful-chip-for-a-personal-computer/

[2]  Apple Event – March 9 –

https://www.youtube.com/watch?v=CUwg_JoNHpo

[3] Tago, “A Leading Figure of the Semiconductor Industry Ventures into the AI Industry!” Vol.4: Why Is the Apple M1 Processor So Fast? (in Japanese)

https://hacarus.com/ja/information/column/20210120-apple-m1/

[4] Junya Suzuki, “M1 Ultra” Is a Natural Evolution but Also Shows Its Limits: Will It Lay the Groundwork for the Next Apple Silicon? (in Japanese)

https://xtech.nikkei.com/atcl/nxt/column/18/01983/031000001/?P=2

[5] Wikipedia Apple silicon

https://en.wikipedia.org/wiki/Apple_silicon#Apple_M1_Ultra

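[6] Akira Fukuda’s Semiconductor Industry Frontline: Development of Next-Generation EUV Lithography Gets into Full Swing, Leading Scaling at 3nm and Beyond (in Japanese)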

https://pc.watch.impress.co.jp/docs/column/semicon/1165543.html

[7] R. Mahajan and S. Sane, Intel, “Technology Provider: Intel packaging technologies for chiplets and 3D”, Hot Chips 33, 2021

https://hc33.hotchips.org/assets/program/tutorials/Tutorial_Mahajan_Sane_HotChips_2021_Talk_final_Formatted_1.pdf

[8] Yusuke Ohara’s Electronics and Embedded Playback: The Mysteries of the Ultra-High-Performance “Apple M1 Ultra” (in Japanese)

https://techfactory.itmedia.co.jp/tf/articles/2204/07/news042.html

[9] US 2020/0176419 A1 WAFER RECONSTITUTION AND DIE-STITCHING

https://patentimages.storage.googleapis.com/ee/a1/13/470ad3bdceab68/US20200176419A1.pdf

 
