Commit b02e915

update jupyter notebooks for 3rd and 4th episode
1 parent 7acb007 commit b02e915

3 files changed: +41 additions, -118 deletions

content/06-supervised-ML-regression.md

Lines changed: 1 addition & 1 deletion
@@ -564,7 +564,7 @@ Code examples are availalbe in the [Jupyter Notebook](./jupyter-notebooks/6-ML-R
 :::{seealso}
 - [Hyperparameter optimization](https://en.wikipedia.org/wiki/Hyperparameter_optimization)
 - [Grid search](https://drbeane.github.io/python_dsci/pages/grid_search.html)
-- [Introduction To Cross-Validation in Machine Learning](https://thatdatatho.com/detailed-introduction-cross-validation-machine-learning/)
+- [Introduction to Cross-Validation in Machine Learning](https://thatdatatho.com/detailed-introduction-cross-validation-machine-learning/)
 :::
 
 

content/jupyter-notebooks/3-Tensor.ipynb

Lines changed: 16 additions & 32 deletions
@@ -12,8 +12,7 @@
 {
 "cell_type": "markdown",
 "metadata": {
-"id": "XhOKjEQHUFid",
-"jp-MarkdownHeadingCollapsed": true
+"id": "XhOKjEQHUFid"
 },
 "source": [
 "## 1. What is Tensor"
@@ -51,8 +50,7 @@
 {
 "cell_type": "markdown",
 "metadata": {
-"id": "t8j3e6sbTuOl",
-"jp-MarkdownHeadingCollapsed": true
+"id": "t8j3e6sbTuOl"
 },
 "source": [
 "## 2. Creat Tensors"
@@ -61,8 +59,7 @@
 {
 "cell_type": "markdown",
 "metadata": {
-"id": "q9cRkvIEZ_ZT",
-"jp-MarkdownHeadingCollapsed": true
+"id": "q9cRkvIEZ_ZT"
 },
 "source": [
 "### 2.1 From Python objects"
@@ -108,8 +105,7 @@
 {
 "cell_type": "markdown",
 "metadata": {
-"id": "081fqwJpaOM0",
-"jp-MarkdownHeadingCollapsed": true
+"id": "081fqwJpaOM0"
 },
 "source": [
 "### 2.2 From Numpy objects"
@@ -197,8 +193,7 @@
 {
 "cell_type": "markdown",
 "metadata": {
-"id": "qZcFBz8EaitD",
-"jp-MarkdownHeadingCollapsed": true
+"id": "qZcFBz8EaitD"
 },
 "source": [
 "### 2.3 From functions to create tensors"
@@ -396,8 +391,7 @@
 {
 "cell_type": "markdown",
 "metadata": {
-"id": "EoU7a1IDe3oZ",
-"jp-MarkdownHeadingCollapsed": true
+"id": "EoU7a1IDe3oZ"
 },
 "source": [
 "## 3. Tensor's properties"
@@ -406,8 +400,7 @@
 {
 "cell_type": "markdown",
 "metadata": {
-"id": "87MkSs5af_iC",
-"jp-MarkdownHeadingCollapsed": true
+"id": "87MkSs5af_iC"
 },
 "source": [
 "### 3.1 Tensor.shape"
@@ -451,8 +444,7 @@
 {
 "cell_type": "markdown",
 "metadata": {
-"id": "zSfQ7msOgg2R",
-"jp-MarkdownHeadingCollapsed": true
+"id": "zSfQ7msOgg2R"
 },
 "source": [
 "### 3.2 Tensor.ndim"
@@ -496,8 +488,7 @@
 {
 "cell_type": "markdown",
 "metadata": {
-"id": "uZCRMgjogz8x",
-"jp-MarkdownHeadingCollapsed": true
+"id": "uZCRMgjogz8x"
 },
 "source": [
 "### 3.3 Tensor.dtype"
@@ -548,8 +539,7 @@
 {
 "cell_type": "markdown",
 "metadata": {
-"id": "KoG4GgknmRy_",
-"jp-MarkdownHeadingCollapsed": true
+"id": "KoG4GgknmRy_"
 },
 "source": [
 "## 4. Tensor operations"
@@ -558,8 +548,7 @@
 {
 "cell_type": "markdown",
 "metadata": {
-"id": "UEDH0rsSmnZI",
-"jp-MarkdownHeadingCollapsed": true
+"id": "UEDH0rsSmnZI"
 },
 "source": [
 "### 4.1 Indexing"
@@ -609,8 +598,7 @@
 {
 "cell_type": "markdown",
 "metadata": {
-"id": "d8bEpG89pFhu",
-"jp-MarkdownHeadingCollapsed": true
+"id": "d8bEpG89pFhu"
 },
 "source": [
 "### 4.2 Combining Tensors"
@@ -662,8 +650,7 @@
 {
 "cell_type": "markdown",
 "metadata": {
-"id": "wFhC1jCvrG43",
-"jp-MarkdownHeadingCollapsed": true
+"id": "wFhC1jCvrG43"
 },
 "source": [
 "### 4.3 Split Tensors"
@@ -804,8 +791,7 @@
 {
 "cell_type": "markdown",
 "metadata": {
-"id": "aRyZ34QJyCnc",
-"jp-MarkdownHeadingCollapsed": true
+"id": "aRyZ34QJyCnc"
 },
 "source": [
 "### 4.4 Build-in Math Functions"
@@ -1386,8 +1372,7 @@
 {
 "cell_type": "markdown",
 "metadata": {
-"id": "Yo72y7ufULIS",
-"jp-MarkdownHeadingCollapsed": true
+"id": "Yo72y7ufULIS"
 },
 "source": [
 "### 4.5 Activation functions"
@@ -1466,8 +1451,7 @@
 {
 "cell_type": "markdown",
 "metadata": {
-"id": "H8aMRAtcHpph",
-"jp-MarkdownHeadingCollapsed": true
+"id": "H8aMRAtcHpph"
 },
 "source": [
 "### 5.1 Tensor.device (CPU vs CUDA)\n"
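
Every hunk in `3-Tensor.ipynb`, and most hunks in `4-Data-Preprocessing.ipynb`, make the same change: they drop the `jp-MarkdownHeadingCollapsed` key from a Markdown cell's metadata. Judging by its name, the key appears to be written by JupyterLab when a heading is collapsed in the UI. The commit does not say how the edit was made; the sketch below shows one possible way to do it with only the standard library. `strip_collapsed_headings` and `clean_notebook_file` are hypothetical helpers, not part of Jupyter or of this repository.

```python
import json

# Metadata key removed throughout this commit. Assumption: JupyterLab stores it
# on a Markdown cell whose heading has been collapsed in the UI.
COLLAPSED_KEY = "jp-MarkdownHeadingCollapsed"


def strip_collapsed_headings(nb: dict) -> int:
    """Remove COLLAPSED_KEY from every Markdown cell's metadata, in place.

    Returns the number of cells that were modified.
    """
    removed = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # only Markdown cells carry heading-collapse state
        metadata = cell.get("metadata", {})
        if metadata.pop(COLLAPSED_KEY, None) is not None:
            removed += 1
    return removed


def clean_notebook_file(path: str) -> int:
    """Hypothetical helper: load an .ipynb file, strip the key, write it back."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    removed = strip_collapsed_headings(nb)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")
    return removed


if __name__ == "__main__":
    # In-memory example mirroring the first hunk of 3-Tensor.ipynb.
    example = {
        "cells": [
            {
                "cell_type": "markdown",
                "metadata": {"id": "XhOKjEQHUFid", COLLAPSED_KEY: True},
                "source": ["## 1. What is Tensor"],
            }
        ]
    }
    print(strip_collapsed_headings(example))  # 1
    print(example["cells"][0]["metadata"])    # {'id': 'XhOKjEQHUFid'}
```

Writing with `indent=1` is an assumption meant to stay close to the formatting Jupyter typically uses for `.ipynb` files, so that a cleanup like this produces small diffs similar to the ones in this commit.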

content/jupyter-notebooks/4-Data-Preprocessing.ipynb

Lines changed: 24 additions & 85 deletions
@@ -5,7 +5,7 @@
 "id": "0f8d00c9-72e4-4712-b360-bcf0f3c64c5b",
 "metadata": {},
 "source": [
-"# Data Processing using the Palmer Penguins Dataset"
+"# Data Preparation using the Palmer Penguins Dataset"
 ]
 },
 {
@@ -43,9 +43,7 @@
 {
 "cell_type": "markdown",
 "id": "71dbbbee-d16b-4adc-a04f-a24548233472",
-"metadata": {
-"jp-MarkdownHeadingCollapsed": true
-},
+"metadata": {},
 "source": [
 "## 1. Loading Dataset"
 ]
@@ -133,9 +131,7 @@
 {
 "cell_type": "markdown",
 "id": "3750928c-7b7a-4b28-b8fc-c7afc8df52a8",
-"metadata": {
-"jp-MarkdownHeadingCollapsed": true
-},
+"metadata": {},
 "source": [
 "## 2. Handling Missing Values"
 ]
@@ -151,22 +147,10 @@
 "penguins_test.style.highlight_null(color = 'red')"
 ]
 },
-{
-"cell_type": "code",
-"execution_count": null,
-"id": "822adc7d-cb48-4a5c-a6fe-76ed399b04d5",
-"metadata": {},
-"outputs": [],
-"source": [
-"penguins_test.describe()"
-]
-},
 {
 "cell_type": "markdown",
 "id": "6d7dc18f-f2d8-470a-886c-13c08fa2463b",
-"metadata": {
-"jp-MarkdownHeadingCollapsed": true
-},
+"metadata": {},
 "source": [
 "### 2.1 Handling missing numerical data"
 ]
@@ -179,30 +163,6 @@
 "**Mean or Median Imputation**"
 ]
 },
-{
-"cell_type": "code",
-"execution_count": null,
-"id": "f7108013-9b25-4431-bcca-4a99ebcc8931",
-"metadata": {},
-"outputs": [],
-"source": [
-"# check if there are `NaN` in dataset if the dataset is too large\n",
-"\n",
-"penguins_test.style.highlight_null(color = 'red')"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"id": "4ad9f8a1-e3ae-49f3-9cf5-03bfb16164de",
-"metadata": {},
-"outputs": [],
-"source": [
-"print(penguins_test.info(), '\\n\\n')\n",
-"\n",
-"print(penguins_test.isnull().mean())"
-]
-},
 {
 "cell_type": "code",
 "execution_count": null,
@@ -302,13 +262,21 @@
 {
 "cell_type": "markdown",
 "id": "ab1553b9-cf42-441e-b9e8-52f7b8ca2e0c",
-"metadata": {
-"jp-MarkdownHeadingCollapsed": true
-},
+"metadata": {},
 "source": [
 "### 2.2 Handling missing categorical data"
 ]
 },
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "e4bf1eee-0f26-4238-a0b6-7dbff8ae4cbf",
+"metadata": {},
+"outputs": [],
+"source": [
+"penguins_test.style.highlight_null(color = 'red')"
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
@@ -435,9 +403,7 @@
 {
 "cell_type": "markdown",
 "id": "7c682fc0-b90c-406d-8e2a-2c7ad8961e5a",
-"metadata": {
-"jp-MarkdownHeadingCollapsed": true
-},
+"metadata": {},
 "source": [
 "### 2.3 Remove missing values"
 ]
@@ -466,9 +432,7 @@
 {
 "cell_type": "markdown",
 "id": "6dbdd32f-3f18-416d-8831-7e544b96ea68",
-"metadata": {
-"jp-MarkdownHeadingCollapsed": true
-},
+"metadata": {},
 "source": [
 "## 3. Handling Outliers"
 ]
@@ -492,9 +456,7 @@
 {
 "cell_type": "markdown",
 "id": "49d907e8-4224-495b-8cbc-2dd8b9e31a47",
-"metadata": {
-"jp-MarkdownHeadingCollapsed": true
-},
+"metadata": {},
 "source": [
 "### 3.1 How to define outlier?"
 ]
@@ -526,9 +488,7 @@
 {
 "cell_type": "markdown",
 "id": "7dfcbbe8-d12e-4eab-b0d3-8c38792873f8",
-"metadata": {
-"jp-MarkdownHeadingCollapsed": true
-},
+"metadata": {},
 "source": [
 "### 3.2 The Inter quartile range (IQR) method"
 ]
@@ -612,9 +572,7 @@
 {
 "cell_type": "markdown",
 "id": "0760f0a2-a0e7-4c13-8151-887856bd79b7",
-"metadata": {
-"jp-MarkdownHeadingCollapsed": true
-},
+"metadata": {},
 "source": [
 "### 3.3 The mean-standard deviation method"
 ]
@@ -653,17 +611,6 @@
 "# lower limt of IQR = 1703.125 and upper limit of IQR = 6628.125"
 ]
 },
-{
-"cell_type": "code",
-"execution_count": null,
-"id": "e7c43aec-f217-4366-a4a1-8ebf93e070fd",
-"metadata": {},
-"outputs": [],
-"source": [
-"# penguins_test_BMG_outlier_remove_IQR = penguins_test_BMG_outlier[penguins_test_BMG_outlier[\"body_mass_g\"] < upper_bmg_limit]\n",
-"# penguins_test_BMG_outlier_remove_IQR"
-]
-},
 {
 "cell_type": "code",
 "execution_count": null,
@@ -701,9 +648,7 @@
 {
 "cell_type": "markdown",
 "id": "4dceaa3a-a84e-4716-9b90-c9a744006eb5",
-"metadata": {
-"jp-MarkdownHeadingCollapsed": true
-},
+"metadata": {},
 "source": [
 "## 4. Encoding Categorical Variables"
 ]
@@ -722,9 +667,7 @@
 {
 "cell_type": "markdown",
 "id": "51527736-0701-4658-a28a-34134e760c5a",
-"metadata": {
-"jp-MarkdownHeadingCollapsed": true
-},
+"metadata": {},
 "source": [
 "### 4.1 One hot encoding (OHE)"
 ]
@@ -750,9 +693,7 @@
 {
 "cell_type": "markdown",
 "id": "1965e127-b284-4ff4-9f7b-444df5f343e1",
-"metadata": {
-"jp-MarkdownHeadingCollapsed": true
-},
+"metadata": {},
 "source": [
 "### 4.2 Label encoding"
 ]
@@ -787,9 +728,7 @@
 {
 "cell_type": "markdown",
 "id": "d9a4887f-c02f-4942-81e0-8eb647f0cd87",
-"metadata": {
-"jp-MarkdownHeadingCollapsed": true
-},
+"metadata": {},
 "source": [
 "### 4.3 The get_dummies() function in Pandas"
 ]
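
One of the cells deleted from `4-Data-Preprocessing.ipynb` (in the hunk starting at old line 179) printed `penguins_test.isnull().mean()`. As a reminder of what that expression reports, here is a minimal sketch on an invented DataFrame. The `toy` name and all of its values are hypothetical; only the column names are borrowed from the Palmer Penguins dataset.

```python
import numpy as np
import pandas as pd

# Invented toy values; column names borrowed from the Palmer Penguins dataset.
toy = pd.DataFrame(
    {
        "species": ["Adelie", "Gentoo", None, "Chinstrap"],
        "bill_length_mm": [39.1, np.nan, 46.5, np.nan],
        "body_mass_g": [3750.0, 5000.0, 3800.0, 3700.0],
    }
)

# isnull() marks each missing cell (NaN or None) as True; taking the mean of
# each boolean column then gives the fraction of missing values per column.
missing_fraction = toy.isnull().mean()
print(missing_fraction)
```

Here one of four `species` values and two of four `bill_length_mm` values are missing, so those columns report 0.25 and 0.5, while `body_mass_g` reports 0.0.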
