Commit 64729e4

Update Mount for CacheRuntime
Signed-off-by: xliuqq <xlzq1992@gmail.com>
1 parent 0ed91e8 commit 64729e4

2 files changed

Lines changed: 73 additions & 28 deletions

proposals/runtime/v1.1.0_extend_cache_runtime/full_cache_runtime.md

Lines changed: 41 additions & 13 deletions
@@ -63,12 +63,33 @@ Cache Runtime provides the cache system with RuntimeConfig, so the Curvine components need to**

 - Unlike Curvine, JuiceFS has a master-less architecture and [runs format at startup](https://juicefs.com/docs/zh/community/getting-started/for_distributed/#4-创建文件系统). It supports only one remote storage backend, so format is executed directly in the worker/fuse startup command;

+Extend the CacheRuntimeClass definition to add support for mounting a UFS
+
+```go
+type CacheRuntimeClass struct {
+	// Data operations supported by the current cache system
+	LifeCycleHook *LifeCycleHook `json:"lifeCycleHook,omitempty"`
+}
+type LifeCycleHook struct {
+	// Hook for mounting the UFS; for a Master-Slave architecture it must run in the Master
+	MountUfs *MountUfs `json:"mountUfs,omitempty"`
+}
+type MountUfs struct {
+	// Command to execute (required); it runs inside the Master Pod
+	Command string `json:"command"`
+
+	// Timeout for the command, in seconds; the minimum value is 5s.
+	Timeout int `json:"timeout,omitempty"`
+}
+```
+
 The processing flow of the Curvine Cache Runtime is shown in the figure below. The focus of this work is:

 1. Provide a startup script that converts the RuntimeConfig supplied by Fluid into the configuration file used by Curvine;
    - The plan is to define Curvine's configuration file in Go template format and perform the substitution;
 1. Add a Mount UFS step: once the Master StatefulSet has started, enter the Master Pod and run the cv mount operation;
-   - The mount operation is defined by the cache system in a ConfigMap and mounted into the Master Pod.
+   - **The mount operation is defined in CacheRuntimeClass, which specifies the command to run in the Pod of a particular role (such as Master), with the RuntimeConfig file as its argument.**
+   - The JuiceFS cache system needs no separate mount operation; simply leave the hook undefined in CacheRuntimeClass;

 ![img](./pics/curvine_integration.jpeg)


@@ -83,19 +104,18 @@ type CacheRuntimeClass struct {
 }

 type DataOperationSpec struct {
-	// Data operations, such as DataLoad, DataBackup, DataMigration, etc.
+	// Data Operation, such as DataLoad, DataBackup, DataMigration, etc.
 	// +kubebuilder:validation:Enum=DataLoad
-	Name string `json:"string"`
+	Name string `json:"name"`

-	// Image version used by the DataOperation, to avoid parsing it from the Master Pod's Container
-	// +kubebuilder:validation:Required
-	Image string `json:"image,omitempty"`
-
-	// Command for DataOperation Pod
+	// The image name used to execute the DataOperation
+	Image string `json:"image"`
+
+	// Command for image container
 	Command []string `json:"command,omitempty"`
-
-	// Args for DataOperation Pod
-	Args []string `json:"args,omitempty"
+
+	// Args for image container
+	Args []string `json:"args,omitempty"`
 }
 ```

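The JSON tags matter here: in Go's `encoding/json`, two fields that share the same key at the same level are both dropped from the output, so `Name` and `Image` each need their own key. The sketch below assumes the keys `name` and `image`; `encodeSpec` and the image reference are hypothetical and serve only as illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DataOperationSpec follows the struct in this hunk, with distinct JSON
// keys for Name and Image (an assumption about the intended tags).
type DataOperationSpec struct {
	Name    string   `json:"name"`
	Image   string   `json:"image"`
	Command []string `json:"command,omitempty"`
	Args    []string `json:"args,omitempty"`
}

// encodeSpec (hypothetical helper) serializes a spec to a JSON string.
func encodeSpec(spec DataOperationSpec) (string, error) {
	b, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	spec := DataOperationSpec{
		Name:    "DataLoad",
		Image:   "example.com/curvine:latest", // placeholder image reference
		Command: []string{"/entrypoint.sh"},
		Args:    []string{"/etc/fluid/config/dataop"},
	}
	out, err := encodeSpec(spec)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```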

@@ -122,8 +142,16 @@ type DataOperatorYamlGenerator interface {
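
 Each operation is handled according to its Type. DataProcess is independent of the cache system and can reuse the existing Helm YAML generation logic, whereas every other DataOperation needs the corresponding cache system image and its configuration.

-- Build the Pod from the newly added DataOperationSpec definition, and mount the RuntimeConfig and the config of the corresponding DataOperation
+- Define the corresponding Pod through the newly added DataOperationSpec, then start it and run the command for the data operation; the Fluid DataOperation configuration is mounted at /etc/fluid/config/dataop

-- The Pod's startup command looks like: "/bin/sh -c generate_conf.sh /etc/fluid/config/runtime && /entrypoint.sh /etc/fluid/config/dataop", covering both the config conversion and the actual cache data operation command;

 ### 3. Support In-Place Upgrade/Rebuild
+
+In-place upgrade on version update:
+
+
+
+Cache rebuild on configuration update:
+
+
+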

roadmap/v1.1.0_extend_cache_runtime .md

Lines changed: 32 additions & 15 deletions
@@ -2,7 +2,7 @@

 ## 1. Overview

-This document outlines the implementation roadmap for extending the Generic Cache Runtime interface, aiming to provide a simpler way to integrate cache systems and higher cache availability. For a more detailed design document, see XXX
+This document outlines the implementation roadmap for extending the Generic Cache Runtime interface, aiming to provide a simpler way to integrate caching systems and enhance cache availability. For a more detailed design document, please refer to [the proposal](../proposals/runtime/v1.1.0_extend_cache_runtime/full_cache_runtime.md).

 ### 1.1. Problem Statement

@@ -12,22 +12,22 @@ In production scenarios, it is necessary to support In-Place upgrade for engine

 ### 1.2. Solution Summary

-Extend the Cache Runtime interface, using Curvine/Alluxio as example cache systems, to achieve
+Extend the generic cache runtime interface to achieve:

 - Curvine Integration: Implement working reference adapters for Curvine;
 - DataOperation Support: Implement the DataOperation interface, including DataLoad and DataProcess, for cache runtime;
 - In-Place upgrade and rebuild: Implement in-place upgrade for engine version update and in-place cache rebuild after node failure or config change.

 ### 1.3. Goals

-- Lower the difficulty of integrating third-party cache systems and improve code quality by improving GenericCacheRuntime;
-- Support online upgrades of the cache system without interrupting service, improving system availability and meeting production-grade requirements;
+- Reduce the integration difficulty of third-party caching systems and enhance code quality by improving Generic Cache Runtime;
+- Support in-place upgrades for the caching system without interrupting services, improve system availability, and meet production-level requirements;

 ## 2. Phase

 ### Phase 1: Curvine Integration

-Objective: Complete the data caching function of Cache Runtime, complete joint debugging and testing with Curvine, and write unit tests, e2e unit tests, and integration documentation.
+**Objectives**: Complete the data caching function of Cache Runtime, conduct joint debugging and testing with Curvine, and write unit tests, e2e tests, and integration documentation.

 | Task | Description | Deliverable |
 | ---- | ------------------------------------------------------------ | ------------------------------------------------------------ |
@@ -38,7 +38,7 @@ In production scenarios, it is necessary to support In-Place upgrade for engine

 ### Phase 2: DataOperation Support

-Objective: Cache Runtime supports DataOperation (including DataLoad and DataProcess); write unit tests, e2e unit tests, and integration documentation.
+Objective: Support DataOperation (including DataLoad and DataProcess) in Cache Runtime, and write unit tests, e2e tests, and integration documentation.

 | Task | Description | Deliverable |
 | ---- | -------------------------------------------------------- | ------------------------------------------------------------ |
@@ -52,23 +52,40 @@ In production scenarios, it is necessary to support In-Place upgrade for engine

 ### Phase 3: In-Place Upgrade and Rebuild

-Objective: Support in-place upgrade and rebuild of the cache system
+Objective: Support in-place upgrades when the caching system version is updated, and cache rebuilding when its configuration changes.

-| Task | Description                             | Deliverable |
-| ---- | --------------------------------------- | ----------- |
-| 1.1  | Support engine version in-place upgrade |             |
-| 1.2  |                                         |             |
-| 1.3  |                                         |             |
-| 1.4  |                                         |             |
+| Task | Description                                                     | Deliverable                                          |
+| ---- | --------------------------------------------------------------- | ---------------------------------------------------- |
+| 1.1  | Support in-place upgrade for updating engine version            | Cache Runtime supports in-place upgrade              |
+| 1.2  | Support in-place cache rebuild for config changes               | Cache Runtime supports in-place cache rebuild        |
+| 1.3  | Write related unit test code and e2e test scripts               | Unit tests and e2e tests using Curvine cache runtime |
+| 1.4  | Document how to use in-place upgrade and in-place cache rebuild | Docs section: "In-Place Upgrade and Rebuild".        |



-## 3. Timeline
+## 3. Dependencies
+
+- No new external service dependencies at build time.
+
+
+
+## 4. Success Metrics
+
+**Usability**: users can utilize Curvine for data caching through GenericCacheRuntime.
+
+**Support**: users can use Curvine for DataLoad/DataProcess scenarios.
+
+**Resilience**: users can update Curvine Cache Runtime version and rebuild cache in-place.
+
+
+
+## 5. **Timeline (Suggested)**

 | **Phase**   | **Focus**                    | **Suggested duration** |
 | ----------- | ---------------------------- | ---------------------- |
 | **Phase 1** | Curvine Integration          | 3-4 Weeks              |
 | **Phase 2** | DataOperation Support        | 3-4 Weeks              |
 | **Phase 3** | In-Place Upgrade and Rebuild | 3-4 Weeks              |

-Total: on the order of **9-12 weeks** for a small team or single contributor, depending on familiarity with the codebase and cluster access for testing.
+Total: on the order of **9-12 weeks** for a small team or single contributor, depending on familiarity with the codebase and cluster access for testing.
+
