YouTube:https://www.youtube.com/watch?v=pXUfLdi_Agc
Text:
All right. So thank you all for the opportunity to introduce our report and share some insights. I'm going to present some extracts from our report, Memory-Processor Interface 2023, which focuses on CXL.

OK, I will first give you a brief introduction to my company. My name is Thibault Grossi, and I'm a senior analyst within the Memory team. The Memory team is expanding, because we have two more people now: on top of John and Simone, we also have Josephine, who is based in California, and Belinda, who is based in Nantes. The team is managed by Emile Jolivet, who also takes care of the computing and software division.

Quickly about Yole. Yole is a 25-year-old French market intelligence company with over 100 analysts worldwide. Worldwide means that on top of France and Europe, we also have analysts in North America and Asia, including China, which is clearly a strength these days. We attend over 120 conferences and conduct over 5,000 interviews every year. We can also rely on Yole SystemPlus, which performs teardowns of systems as well as components. Systems can be smartphones or automotive systems like ADAS and infotainment, for example, as well as accelerators like the H100, for which we have a teardown of the board as well as of the chips themselves. All of this together puts us in a pretty unique position, allowing us to obtain detailed and accurate information, which clearly helps us meet our customers' expectations.

Quickly covering Yole's fields of expertise: this is a wide range covering the whole semiconductor industry. It goes from photonics and lighting, imaging, sensing and actuating, display, radio frequency, compound semiconductors, power electronics and batteries, to semiconductor packaging, semiconductor manufacturing equipment, memory, and computing and software. Then there are some more transversal topics, such as electronic systems more globally and emerging technologies. We deliver off-the-shelf reports just like this one, which are meant to be refreshed every year or every two years, depending on the subject. We also provide monitors, which are refreshed on a quarterly basis. As examples, we have NAND and DRAM monitors, MCU and processor monitors, as well as compound semiconductor, packaging and wafer reports. So we have a wide range of offers, including tracks and teardowns of systems, and on top of that we also provide custom studies for specific requests. So that's it for the short introduction of Yole.

I will now jump directly to the topic of interest for today, which is CXL. Here's an overview of the forecast we have at Yole. From a bit less than 2 million dollars in 2022, we project the market to reach almost 16 billion by 2028. In 2022, we estimate that the revenue remained pretty limited and came mostly from prototyping configurations. In 2026, we expect the market to reach about 2.1 billion, mainly boosted by CXL 2.0 and the early phases of CXL 3.1. By 2028, we forecast the revenue to be boosted mainly by CXL 3.1 configurations, reaching almost 16 billion.

So now let's have a look at how we get there and why we're seeing this acceleration of revenue. First, I'm not going to spend ages on this, since other people have already explained it in past presentations, but you're all aware that data centers are facing, and will keep facing, several challenges. Among them we have power consumption, reliability, ease of maintenance and the memory bottlenecks. These are clearly seen as the main rationale for CXL adoption.

Quickly going through some of them. What do we have in mind in terms of memory bottlenecks? The first is the memory bandwidth per core, which has been decreasing since 2012. The average number of cores per server has increased by about a factor of three, but DRAM clearly did not scale as fast, which is leading, on average, to a decrease of the bandwidth per core. The second point is the memory hierarchy. There is today a pretty big latency gap between the attached memory and the storage devices. Various technologies have emerged to fill that gap, such as 3D XPoint, for example, but none of them has really taken off at the end of the day.

The third point is that the cost of memory per server is increasing. DRAM spending per server has progressively increased from about 15% back in 2012 to something like 30% in 2022, and we even expect it to keep rising. The third point can partially be explained by the fourth point, which is memory stranding. Memory stranding is a common architecture issue that occurs when all the cores are fully allocated but some unrented memory remains, which leads to inefficient use of this pretty expensive resource.

And the last point: workloads are growing in complexity. If we think about the workloads that are driving the development of the data center market, we have HPC and AI servers, for instance, which are clearly sensitive to memory capacity and bandwidth, and those are rapidly increasing in complexity. For instance, if we take NLP or AI models in general, the number of parameters has increased by something like a factor of 14 per year. So overall, we believe that open standards such as CXL could actually help tackle most of these challenges and even revolutionize computing architecture for data centers.

A quick slide, not to describe the different types of devices again, but just to highlight the fact that in the report we mainly focus on the Type 3 devices, which are memory buffers and expanders, even though we do cover the relationship that may exist between the Type 1, Type 2 and Type 3 devices, mainly with Type 2 accelerators like GPUs, for instance.

Now, let's have a quick look at the different revisions of CXL and what they mean in terms of memory. The first one is CXL 1.1, which runs over PCI Express 5 and sees its application limited to in-server use. So we have here the in-server memory expansion case. The use cases are mainly capacity and bandwidth expansion, using the 64 lanes available. That's the equivalent of about 8 additional DDR5 channels in terms of bandwidth. We can see here on the slide, for example, a brief presentation of the content of CXL memory expanders, which, to make it simple, embed DRAM and a CXL memory controller.

CXL 2.0 still runs over PCI Express 5, but it adds the CXL switch to the possible architectures, and with it the possibility to have a CXL memory pool behind a switch. So we are talking here about the possibility to share the memory expanders across several compute nodes. And we expect this to be the beginning of the acceleration of CXL adoption, as the CXL 2.0 CPUs are released on the market. With CXL 2.0, we're still talking about an in-rack configuration.

With CXL 3.1, we start running over PCI Express 6, which doubles the bandwidth per lane without increasing the latency. Among the key new features of CXL 3.1, it will enable the possibility to cascade switches as well as device-to-device communication, which opens the door to the full disaggregation and composability of resources, including memory. So it makes great sense here, and this is where we can expect CXL adoption in data centers to boom and fully accelerate.
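To make the "64 lanes is about 8 DDR5 channels" comparison concrete, here is a minimal back-of-the-envelope sketch. It is not taken from the report: it uses raw per-direction signalling rates (32 GT/s per PCI Express 5 lane, a 64-bit DDR5 channel) and ignores encoding and protocol overheads. The DDR5 speed grade is an assumption, and the function names are purely illustrative.

```python
# Back-of-the-envelope comparison of CXL lanes and DDR5 channels.
# Assumptions: raw rates only, one direction, no encoding/protocol overhead.

PCIE5_GT_PER_S = 32      # PCI Express 5 signalling rate per lane (GT/s)
DDR5_BUS_BYTES = 8       # 64-bit data path per DDR5 DIMM channel


def pcie_lane_gb_per_s(gt_per_s: float = PCIE5_GT_PER_S) -> float:
    """Approximate one-direction bandwidth of a single lane in GB/s."""
    return gt_per_s / 8  # one bit per transfer, eight bits per byte


def ddr5_channel_gb_per_s(mt_per_s: int) -> float:
    """Peak bandwidth of one DDR5 channel in GB/s for a given speed grade."""
    return mt_per_s * DDR5_BUS_BYTES / 1000


def equivalent_ddr5_channels(lanes: int, ddr5_mt_per_s: int = 4800) -> float:
    """How many DDR5 channels deliver the same raw bandwidth as `lanes` lanes."""
    return lanes * pcie_lane_gb_per_s() / ddr5_channel_gb_per_s(ddr5_mt_per_s)


if __name__ == "__main__":
    for speed in (4000, 4800):  # hypothetical speed grades to compare against
        print(f"DDR5-{speed}: {equivalent_ddr5_channels(64, speed):.1f} channels")
```

Under these assumptions the result lands between roughly 6.7 and 8 channels depending on the DDR5 speed grade chosen, which is the same order of magnitude as the "about 8" quoted in the talk.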

Now, let's have a look at the memory and storage technologies that are leveraging CXL. As I stated previously, the gap between DRAM and NVMe SSDs in the memory hierarchy is currently quite substantial. I have in mind a factor of 1000 in terms of latency. CXL presents an opportunity to bridge this gap by introducing far memory and subsequently allowing the inclusion of additional layers in the memory hierarchy. These layers can be classified into three levels, each of them offering distinct levels of latency and bandwidth. First, we're going to have the direct-attached CXL memory expanders, where the memory expanders are mounted directly in the server. Then, with a bit more latency, we would have the DRAM-based memory pooling solutions, still using the same devices, but behind a switch, so you would have additional latency. And finally, memory pooling, but using a slower, higher-latency type of media.

Talking about those media and those types of devices: the three layers are going to use various types of devices, CXL being media agnostic. The first type of device is going to be CXL memory expanders. I'm going to come back to it afterwards, but we expect most of the demand to come from this type of device. Then we're going to have, for the third layer, persistent memory. For this, we have in mind solutions such as NVDIMMs or any solutions using emerging non-volatile memories such as PCM or MRAM, for instance. And finally, we have the third category of devices, where we have in mind storage class memory or CXL SSDs.

By CXL SSDs, what we have in mind is mainly either very low latency NAND flash and/or SSDs that take advantage of the CXL protocol, for example with the possibility to address them with a finer granularity. Instead of 4-kilobyte pages, you would be able to address them using 64, 128 or 256 bytes, which could be of interest for specific applications that handle large amounts of data with a pretty fine granularity. I have in mind, of course, AI, but data analytics as well. Now, as I was saying, DRAM-based CXL memory expanders are where I expect most of the demand to be.
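The benefit of finer addressing granularity can be illustrated with a small sketch. It assumes aligned requests and a device that always transfers whole granules; the 64-byte request size and the function names are hypothetical choices for illustration, not figures from the report.

```python
# Illustrative read-amplification estimate for different access granularities.
# Assumption: requests are aligned and the device moves whole granules only.


def bytes_transferred(request_bytes: int, granule_bytes: int) -> int:
    """Bytes actually moved when a request is rounded up to whole granules."""
    granules = -(-request_bytes // granule_bytes)  # ceiling division
    return granules * granule_bytes


def amplification(request_bytes: int, granule_bytes: int) -> float:
    """Ratio of bytes moved to bytes the application actually asked for."""
    return bytes_transferred(request_bytes, granule_bytes) / request_bytes


if __name__ == "__main__":
    request = 64  # hypothetical small record, e.g. one cache line
    for granule in (4096, 256, 128, 64):
        print(f"{granule:>4}-byte granule: {amplification(request, granule):.0f}x")
```

For a workload made of small random accesses, a 4-kilobyte page moves far more data than requested, while a 64-byte granule moves only what is needed, which is why fine granularity matters for the AI and analytics cases mentioned above.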

So I suggest we have a quick look at this type of device. These are available in two main form factors. We have the add-in cards (AIC), which include a CXL memory expander controller and need to be populated with conventional DRAM DIMMs. And then we have the drives, which are fully integrated solutions in which the DRAM chips, the supporting chipset such as the PMIC, and the CXL memory expander controller are assembled on the same board. Both have pros and cons. The AIC is probably the easiest way to start with CXL, allowing you to tune the capacity to your requirements and potentially reuse DDR4 DIMMs from legacy configurations. The drives, on their side, currently use the EDSFF E3.S form factor, which could evolve in the future. But clearly the drive form factor is seen as more robust and optimized, and compliant with existing server chassis standards. Now, at Yole, we believe that in the short term the AIC will represent most of the volumes, because they are very convenient for quickly building configurations to assess CXL capabilities. However, we expect the drive form factor to take over in the mid and longer term, thanks to its robustness, which makes it easier and safer to handle, especially thinking of future use cases enabled by CXL 2.0 with hot-pluggable and managed hot-removable devices.

Now, I'm going to move to the next slide. In the report, we're also trying to cover what the key milestones would be for CXL deployment in data centers. The first point is that the CXL 1.1, 2.0 and 3.1 ramp-ups are clearly conditioned on the availability of CXL-capable CPUs. Based on what we know from the available CPU roadmaps, CXL 1.1 configurations have been possible since late 2022, CXL 2.0 starts ramping up in 2024, and finally CXL 3.1 comes by late 2025 or 2026.

Now, if we look at the memory expander types, we expect the drives and the AIC to ramp up with CXL 1.1. However, drive adoption is expected to accelerate with the CXL memory pool enabled by CXL 2.0. In the future, with CXL 3.1, we could think of having CXL drives in four-lane configurations instead of eight lanes, thanks to the switch to PCI Express 6, which doubles the bandwidth per lane. Said differently, with four lanes in CXL 3.1, you will have the same bandwidth and the same latency as a configuration with eight lanes in CXL 2.0, allowing you to connect more devices with the same number of lanes.

Now focusing on the use cases. The direct-attached in-server memory expander starts with CXL 1.1. This is the main use case today. Then the memory pool use case is expected to start with CXL 2.0 configurations, with the possibility to attach and share memory expanders across a few compute nodes, thanks to the CXL switches. And finally, with CXL 3.1, we may expect the rise of multi-headed memory expanders starting in 2025-2026. They would offer lower latency, which could be good, but they wouldn't be as scalable as a memory pool behind a switch.
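The lane arithmetic behind the four-lane drive idea can be sketched in a few lines. The per-lane rates are the raw PCI Express 5 and 6 signalling rates; the 64-lane budget is simply the figure quoted earlier in the talk, and the helper names are illustrative only.

```python
# Sketch: a x4 link on PCI Express 6 versus a x8 link on PCI Express 5.
# Raw signalling rates only; protocol overheads are deliberately ignored.

RAW_GT_PER_LANE = {"PCIe 5": 32, "PCIe 6": 64}  # GT/s per lane


def link_rate_gt(generation: str, lanes: int) -> int:
    """Aggregate raw rate of a link in GT/s."""
    return RAW_GT_PER_LANE[generation] * lanes


def devices_for_budget(lane_budget: int, lanes_per_device: int) -> int:
    """Number of devices that fit in a fixed lane budget."""
    return lane_budget // lanes_per_device


if __name__ == "__main__":
    print(link_rate_gt("PCIe 5", 8), link_rate_gt("PCIe 6", 4))
    print(devices_for_budget(64, 8), devices_for_budget(64, 4))
```

With the same per-device raw bandwidth, halving the link width doubles the number of drives that can hang off a given lane budget.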

Now, if we look at the forecast revenue for those memory expanders: as I explained previously, in the short term we do expect the AIC to represent most of the revenue. This is true at least until the end of this year and potentially early 2025. But clearly, in the longer term, we expect the drives to take over. By 2028, our forecast reflects 87% coming from the drives, for the reasons I explained previously. Now looking at the use cases, direct-attached memory expansion is currently the main use case. But with CXL 2.0, memory pooling is becoming possible, and we actually expect it to represent most of the revenue by 2028.

In the report, we are also trying to provide a picture of where the interest would be coming from, looking at the different applications. This slide gives a quick overview. Looking at in-server memory expansion, we expect to see interest mainly where a large amount of memory is required. So AI servers, for instance, in-memory databases or HPC are where we would expect the main interest first. With CXL 2.0 and the possibility to have a memory pool, we expect rising interest again from in-memory databases, but even stronger interest coming from HPC, cloud and hyperscalers. And finally, with CXL 3.1, we actually expect to see a boom of interest across all the applications. Two main points would be, obviously, AI servers, with the possibility to extend the accelerator memory using a memory pool, so it could be inference AI servers, for example. Another case would be cloud and hyperscalers, with the possibility to virtually compose the servers, including memory, according to the application requirements. Overall, the adoption of CXL 2.0 and CXL 3.1 configurations will require the implementation of systems and software to manage the CXL fabric. And we believe that this could actually be the birth of a totally new market, if not an entirely new industry.

Coming back to the initial forecast with a bit more detail. In 2022, it's pretty limited, but by 2026 we expect the market to reach 2.1 billion dollars, boosted by the CXL 2.0 configurations. Most of it will be coming from memory expanders, with 1.9 billion. We forecast this 1.9 billion to include 1.5 billion from the DRAM and about 112 million dollars from the memory expander controllers. But it would also be the beginning of memory pooling behind a switch, and we forecast the revenue generated by switches to reach about 143 million dollars by 2026. Just a quick clarification: by switch here, I mean the switch chips themselves. In the case of a larger SoC that would have an embedded switch, I'm only counting the switch functionality, not the full SoC. So in terms of units, volumes will remain pretty limited in 2026, but the ASP is still expected to be relatively high. By 2028, we forecast the CXL market to reach about 15.8 billion dollars, with 15 billion coming from the memory expanders. Out of those 15 billion, we would be seeing 12.5 billion coming from the DRAM and about 600 million dollars coming from the CXL memory expander controllers. Then for the switches, whose demand will be boosted by CXL 3.1 configurations, we see the ASP becoming a bit more reasonable and volumes rising. That explains the revenue rising to 741 million dollars by 2028.
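The forecast figures quoted above lend themselves to a quick sanity check. The sketch below only re-uses the numbers stated in the talk (in millions of dollars); the derived shares and the two-year growth rate are simple arithmetic on those numbers, not additional data from the report, and the dictionary layout is an illustrative choice.

```python
# Quick arithmetic on the forecast figures quoted in the talk (USD millions).

FORECAST = {
    2026: {"total": 2100, "expanders": 1900, "dram": 1500,
           "controllers": 112, "switches": 143},
    2028: {"total": 15800, "expanders": 15000, "dram": 12500,
           "controllers": 600, "switches": 741},
}


def share(year: int, part: str, whole: str) -> float:
    """Fraction of `whole` revenue represented by `part` in a given year."""
    return FORECAST[year][part] / FORECAST[year][whole]


def cagr(start_year: int, end_year: int, key: str = "total") -> float:
    """Compound annual growth rate between two forecast years."""
    years = end_year - start_year
    ratio = FORECAST[end_year][key] / FORECAST[start_year][key]
    return ratio ** (1 / years) - 1


if __name__ == "__main__":
    for year in FORECAST:
        print(year,
              f"DRAM share of expanders: {share(year, 'dram', 'expanders'):.0%}",
              f"switch share of market: {share(year, 'switches', 'total'):.1%}")
    print(f"2026-2028 CAGR: {cagr(2026, 2028):.0%}")
```

The calculation shows that DRAM makes up the bulk of expander revenue in both years, and that the step from 2026 to 2028 implies the market multiplying by roughly 7.5 over two years.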

Now, a quick takeaway about CXL. We believe CXL benefits from significant tailwinds overall. This includes the large support from the industry, with the CXL Consortium counting over 80 members and even more adopters. There is also the fact that CXL could actually help solve the data center memory bottlenecks by boosting DRAM capacity and server bandwidth, and by enabling flexible allocation and optimized usage of this expensive memory. It will also enable the full disaggregation of servers, which will allow them to be customized to meet application needs. So this is clearly a strength, and we believe these are strong tailwinds for CXL adoption.

There are still some headwinds that will remain. The first one is that the initial acceleration of CXL adoption will clearly depend on the successful execution of roadmaps by CPU, memory controller, CXL chip and system manufacturers. Any delay in the execution could clearly postpone the wide adoption of CXL in the industry. You also have to keep in mind the macroeconomic uncertainties, even though these days they don't seem to be a very clear concern yet. But any fluctuation in CAPEX could potentially postpone the adoption of CXL. The next aspects are more inherent to the CXL protocol itself, such as system reliability, configuration, security, with the memory being far from the processor, and cost considerations. We do acknowledge that the current chip cost may pose a challenge, but we anticipate a normalization as production volume increases. Additionally, we believe it's also pretty important to consider the total cost of the solution to make a fuller cost assessment.

So Franck, I just added a slide compared with the presentation I shared with you, to try to show this. I made a quick comparison. First, I have a conventional approach: a large configuration, but still using the conventional approach. I have eight servers, each of them having two CPUs, running with two DIMMs per channel and 128 gigabytes per DIMM. That's a total of four terabytes per server and 32 terabytes of DRAM in total.

Then I look at how I could distribute it differently. Looking at the capacity-optimized scenario, number two, I still have my eight servers running with two CPUs, but I only populate one DIMM out of two, still with 128 gigabytes of DRAM. I could have populated with two times 64 to lower the cost even further. On top of this, I'm adding two memory pools, each with 16 times 512-gigabyte CXL expanders. By doing this, I still have 32 terabytes in the overall configuration, but what I'm actually doing is increasing the total DRAM accessible per server to 18 terabytes, with the same amount of DRAM. And I'm doing it at a cost difference of about 10%.

Then I have the other extreme case, where I'm saying: I want to keep the total DRAM accessible per server at the same level, but I want to lower my overall cost. So I keep my eight servers with two processors and one DIMM per channel, but I only have one memory pool of 16 times 128 gigabytes. That's two terabytes per server and two terabytes in the memory pool. In that case, I still have a total accessible DRAM per server of four terabytes, but I'm reducing my overall cost by 35%. When I say cost, I'm considering the RDIMMs, the JBOM and the CXL drives. There might be other pieces, but I took what seem to be the highest-cost pieces.

Now, we could argue that the third scenario might be a bit light. But even if I keep the capacity-optimized scenario and just exchange the 512-gigabyte CXL expanders for 256-gigabyte ones, I would still have 10 terabytes accessible per server while reducing my cost by something like 10% overall. To have eight terabytes accessible using the conventional approach, I would have to switch to 256-gigabyte DRAM DIMMs, and I would simply be doubling my cost. So what CXL does here is actually increase the server-accessible DRAM at a reasonable cost. The more the memory resources are shared, the lower the cost of accessible DRAM per server. Of course, every configuration needs to be assessed against the application, because there are other parameters to look at: overall bandwidth, acceptable latency, and so on. But overall, we believe that CXL can be a way to get cost-optimized configurations.
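The capacity arithmetic behind these scenarios can be reproduced with a short sketch. It assumes 16 DIMM slots per server at one DIMM per channel and 32 at two DIMMs per channel (inferred from 4 terabytes per server with 128-gigabyte DIMMs), and it assumes every server can reach the full capacity of every pool, which is what the "accessible per server" figures in the talk imply. Costs are not modelled, and the class and field names are hypothetical.

```python
# Capacity arithmetic for the server / memory-pool scenarios described above.
# Assumptions: 1 TB = 1024 GB; each server can address the whole pool capacity.
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    servers: int
    dimms_per_server: int
    dimm_gb: int
    pools: int = 0
    expanders_per_pool: int = 0
    expander_gb: int = 0

    def local_tb(self) -> float:
        """DRAM directly attached to one server."""
        return self.dimms_per_server * self.dimm_gb / 1024

    def pooled_tb(self) -> float:
        """DRAM sitting in shared CXL memory pools."""
        return self.pools * self.expanders_per_pool * self.expander_gb / 1024

    def total_tb(self) -> float:
        """All DRAM purchased across servers and pools."""
        return self.servers * self.local_tb() + self.pooled_tb()

    def accessible_per_server_tb(self) -> float:
        """DRAM one server can reach: its own DIMMs plus the pools."""
        return self.local_tb() + self.pooled_tb()


SCENARIOS = [
    Scenario("conventional, 2 DIMMs per channel", 8, 32, 128),
    Scenario("capacity-optimized, 512 GB expanders", 8, 16, 128, 2, 16, 512),
    Scenario("capacity-optimized, 256 GB expanders", 8, 16, 128, 2, 16, 256),
    Scenario("cost-optimized, one small pool", 8, 16, 128, 1, 16, 128),
]

if __name__ == "__main__":
    for s in SCENARIOS:
        print(f"{s.name}: total {s.total_tb():.0f} TB, "
              f"accessible per server {s.accessible_per_server_tb():.0f} TB")
```

Running the sketch reproduces the capacities quoted in the talk: 4, 18, 10 and 4 terabytes accessible per server, for 32, 32, 24 and 18 terabytes of DRAM purchased in total.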

That's it for me. Thank you again for the opportunity. I'm leaving a QR code here so you can access our report online.

One last point: if you wish to contact us about this report or any other topic, feel free to contact any of the people listed here. You also have my email address at the beginning of the presentation.