---
title: "Getting Started"
format:
  html:
    code-fold: false
    code-tools: true
    toc: true
    number-sections: true
author: "Jenny L. Smith"
---
::: callout-tip
## Standardization of Code Repositories
1. Naming conventions for `github` repositories
2. Initializing the code repository using `git` and `github`
3. Using data analysis quarto templates
:::
## GitHub Repository Naming Conventions
Repositories have the following nomenclature for easy searching:
1. Exploratory Data Analysis and Research Projects with collaborators:
    - `[YEAR-MONTH-DAY]_[DATA-TYPE]_[SHORT-DESCRIPTION]`
    - **Example**: '2025-06-02_RNAseq_Bulk_T-ALL'
    - **Example** with two data types: '2025-06-02_RNAseq_WGS_Bulk_T-ALL'
2. Genomic Data pipelines and workflows:
    - `[DATA-TYPE]_[DATA-PROCESSING]_[WORKFLOW-TYPE]`
    - **Example**: 'RNAseq_Fusion_Calling_Nextflow'
3. R packages:
    - `[SHORT-NAME]_R_Package`
    - **Example**: 'DeGSEA_R_Package'
4. Python packages:
    - `[SHORT-NAME]_Py_Package`
    - **Example**: 'DataValidator_Py_Package'
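
A candidate name can be checked against the first convention before any repository is created. The sketch below is one possible check and is not part of the lab's tooling: the function name `is_analysis_repo_name` and the regular expression are assumptions, and the pattern covers only the dated analysis format (a `YYYY-MM-DD` prefix followed by underscore-separated fields).

```bash
# Hypothetical helper (an assumption, not part of the lab's tooling): succeeds
# when the name looks like [YEAR-MONTH-DAY]_[DATA-TYPE]_[SHORT-DESCRIPTION].
is_analysis_repo_name() {
  # Assumed pattern: date prefix, one data-type field, then one or more
  # underscore-separated fields that may contain hyphens (e.g. T-ALL).
  printf '%s\n' "$1" |
    grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}_[A-Za-z0-9]+(_[A-Za-z0-9-]+)+$'
}

for candidate in "2025-06-02_RNAseq_Bulk_T-ALL" "RNAseq_Bulk_T-ALL"; do
  if is_analysis_repo_name "$candidate"; then
    echo "ok:      $candidate"
  else
    echo "invalid: $candidate"
  fi
done
```

The second candidate is reported as invalid because it lacks the date prefix. The pipeline and package conventions would need their own patterns.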
### Example
0. First, build a folder (directory) name that follows the standardized naming conventions listed above.
``` bash
DATA_TYPE="RNAseq"
DESCRIPTION="Example_Repo"
OUTDIR="$(date +%Y-%m-%d)_${DATA_TYPE}_${DESCRIPTION}"
echo $OUTDIR
```
This creates a variable whose value is a string such as `2025-06-17_RNAseq_Example_Repo`; the date prefix is the day you run the command.
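
If the description is typed as free text, spaces would break the underscore-separated convention. As a minimal sketch, the name can be built by a small function that replaces whitespace with underscores. The function name `make_outdir_name` and its optional third argument (a fixed date) are assumptions made for illustration.

```bash
# Hypothetical helper: build a directory name from a data type and a
# free-text description, turning runs of whitespace into single underscores.
make_outdir_name() {
  local data_type="$1"
  local description
  description="$(printf '%s' "$2" | tr -s '[:space:]' '_')"
  # An optional third argument fixes the date; otherwise today's date is used.
  local day="${3:-$(date +%Y-%m-%d)}"
  printf '%s_%s_%s\n' "$day" "$data_type" "$description"
}

OUTDIR="$(make_outdir_name "RNAseq" "Example Repo")"
echo "$OUTDIR"
```

Passing an explicit date, as in `make_outdir_name "RNAseq" "Example Repo" 2025-06-17`, makes the result reproducible when a directory is named after the fact.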
## Create the Data Analysis Notebook Template
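
The steps in this section call the `git`, `gh`, and `quarto` command-line tools, so it can save time to confirm they are installed first. The check below is only a sketch; the function name `missing_tools` is hypothetical.

```bash
# Hypothetical pre-flight check: print the name of every requested CLI that
# is not found on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing="$(missing_tools git gh quarto)"
if [ -n "$missing" ]; then
  echo "Install these tools before continuing:"
  echo "$missing"
else
  echo "git, gh, and quarto are all available."
fi
```

Once `gh` is installed, `gh auth status` reports whether the GitHub CLI is logged in to your account.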
::: panel-tabset
## R
1. Create a ***private*** remote repository in the Meshinchi-Lab GitHub organization.
The repository name format is the organization name `Meshinchi-Lab/` followed by the analysis directory name you just defined.
``` bash
REPO="Meshinchi-Lab/${OUTDIR}"
echo $REPO
```
This sets the variable to a string such as `Meshinchi-Lab/2025-06-17_RNAseq_Example_Repo`.
``` bash
gh repo create $REPO \
--description "An example R analysis template repo" \
--template "Meshinchi-Lab/r_analysis_template" \
--clone \
--private
```
If the `gh repo create` command succeeds, you will see output similar to:
> ✓ Created repository Meshinchi-Lab/2025-06-16_RNAseq_Example_Repo on GitHub
> https://github.com/Meshinchi-Lab/2025-06-16_RNAseq_Example_Repo
> ✓ Added remote https://github.com/Meshinchi-Lab/2025-06-16_RNAseq_Example_Repo.git
2. Change directories (`cd`) into the new GitHub repository you've just cloned (by using the `--clone` argument in step 1).
Make sure to use `git pull` so that your local repository is in sync with your new remote repository.
``` bash
cd $OUTDIR && git pull origin main
```
3. Use the `quarto` CLI to download the quarto markdown template and follow the prompts:
- ? Do you trust the authors of this template (Y/n) › **Yes**
- ? Create a subdirectory for template? (Y/n) › **No**
``` bash
quarto use template Meshinchi-Lab/r_analysis_template
```
You will see the following output:
> Downloading \[###################################\] 0.0
> \[✓\] Downloading
> \[✓\] Unzipping
> Preparing template files...
> \[✓\] Copying files...
> Files created:
> - 2025-06-16_RNAseq_Example_Repo.qmd
The `2025-06-16_RNAseq_Example_Repo.qmd` file will be the template for the EDA report. You can rename it if you want, and you can now open it in your text editor and start editing.
In this case, the qmd file has the same name as the directory you created, e.g., `2025-06-16_RNAseq_Example_Repo.qmd`. If you've renamed the qmd, be sure to set the `$FILE` variable to the new name.
``` bash
FILE="${OUTDIR}.qmd"
echo $FILE
```
## Python
1. Create a ***private*** remote repository in the Meshinchi-Lab GitHub organization.
The repository name format is the organization name `Meshinchi-Lab/` followed by the analysis directory name you just defined.
``` bash
REPO="Meshinchi-Lab/${OUTDIR}"
echo $REPO
```
This sets the variable to a string such as `Meshinchi-Lab/2025-06-17_RNAseq_Example_Repo`.
``` bash
gh repo create $REPO \
--description "An example py analysis template repo" \
--template "Meshinchi-Lab/py_analysis_template" \
--clone \
--private
```
If the `gh repo create` command succeeds, you will see output similar to:
> ✓ Created repository Meshinchi-Lab/2025-06-16_RNAseq_Example_Repo on GitHub
> https://github.com/Meshinchi-Lab/2025-06-16_RNAseq_Example_Repo
> ✓ Added remote https://github.com/Meshinchi-Lab/2025-06-16_RNAseq_Example_Repo.git
2. Change directories (`cd`) into the new GitHub repository you've just cloned (by using the `--clone` argument in step 1).
Make sure to use `git pull` so that your local repository is in sync with your new remote repository.
``` bash
cd $OUTDIR && git pull origin main
```
3. Use the `quarto` CLI to download the quarto markdown template and follow the prompts:
- ? Do you trust the authors of this template (Y/n) › **Yes**
- ? Create a subdirectory for template? (Y/n) › **No**
``` bash
quarto use template Meshinchi-Lab/py_analysis_template
```
You will see the following output:
> Downloading \[###################################\] 0.0
> \[✓\] Downloading
> \[✓\] Unzipping
> Preparing template files...
> \[✓\] Copying files...
> Files created:
> - 2025-06-16_RNAseq_Example_Repo.qmd
The `2025-06-16_RNAseq_Example_Repo.qmd` file will be the template for the EDA report. You can rename it if you want, and you can now open it in your text editor and start editing.
In this case, the qmd file has the same name as the directory you created, e.g., `2025-06-16_RNAseq_Example_Repo.qmd`. If you've renamed the qmd, be sure to set the `$FILE` variable to the new name.
``` bash
FILE="${OUTDIR}.qmd"
echo $FILE
```
Then convert the qmd template to a Jupyter notebook (`.ipynb`) that can be used for exploratory data analysis.
``` bash
quarto convert $FILE
```
:::
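
Later steps pass `$FILE` to `quarto render` and `git add`, so it can help to confirm that the variable points at a real qmd file, especially after a rename. The following is a sketch only: the function name `is_qmd_file` and the fallback name `example` are assumptions.

```bash
# Hypothetical check: succeed only if the path ends in .qmd and the file exists.
is_qmd_file() {
  case "$1" in
    *.qmd) [ -f "$1" ] ;;
    *) return 1 ;;
  esac
}

# Reuse FILE if it is already set; otherwise derive it from OUTDIR.
# "example" is a placeholder used only when neither variable is set.
FILE="${FILE:-${OUTDIR:-example}.qmd}"
if is_qmd_file "$FILE"; then
  echo "Found template: $FILE"
else
  echo "No qmd file named '$FILE' in $(pwd); check the file name."
fi
```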
## Install Dependencies and Render
::: {.panel-tabset}
## R
Activate the project's `renv` environment and then restore the project dependencies recorded in the lockfile.
``` bash
Rscript -e "renv::activate()"
Rscript -e "renv::restore()"
```
When you are ready to render the HTML report, run:
``` bash
quarto render $FILE
```
## Python
Create a virtual environment using `venv` and then install the initial project dependencies.
``` bash
python3 -m venv venv
source venv/bin/activate
python3 -m pip install -r requirements.txt
```
When you are ready to render the HTML report, run:
``` bash
IPYNB="${OUTDIR}.ipynb"
echo $IPYNB
quarto render $IPYNB
```
:::
## Commit Changes
1. First, add the quarto analysis template qmd file.
In this case, the qmd file has the same name as the directory you created, e.g., `2025-06-16_RNAseq_Example_Repo.qmd`. If you've renamed the qmd before committing the file, be sure to set the `$FILE` variable to the new name.
``` bash
FILE="${OUTDIR}.qmd"
git status
git add $FILE
git status
```
2. Commit the new template qmd file.
Then use `git log` to see what changes were made, for example when you come back later to continue.
``` bash
git commit -m "Added analysis template"
git log --oneline
```
When you make changes to the analysis qmd and create any new files, you will repeat steps 1 and 2 here. This is summarized in the section below.
## Summary
In summary, after creating the github repository with the analysis quarto template, you will be repeating the commands below as you make edits.
1. Ensure the virtual environment is up-to-date and activated.
For R
``` bash
Rscript -e "renv::activate()"
Rscript -e "renv::status()"
```
For Python
``` bash
source venv/bin/activate
which python
```
2. Use these 5 git commands as you change files or install software dependencies.
``` bash
git pull
git status
git add [MY FILE THAT CHANGED] [A NEW FILE]
git commit -m "describe what changed"
git status
git log --oneline
```
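
The add, commit, and log steps above could also be wrapped in a small shell function so they always run in the same order. This is a sketch under assumptions, not part of the lab's workflow: the function name `commit_changes` and its argument order (message first, then files) are hypothetical.

```bash
# Hypothetical helper: stage the given files, commit them with the given
# message, and show the most recent commits.
commit_changes() {
  if [ "$#" -lt 2 ]; then
    echo 'usage: commit_changes "message" FILE [FILE...]' >&2
    return 2
  fi
  local message="$1"
  shift
  git status --short            # show what changed before staging
  git add -- "$@" || return 1   # stage only the files that were named
  git commit -m "$message" || return 1
  git log --oneline -n 5        # confirm the new commit is recorded
}
```

For example, `commit_changes "Added analysis template" "$FILE"` stages and commits the template in one call. Running `git pull` beforehand keeps the local copy in sync, as in the list above.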