diff --git a/Notebooks/NB07__Dictionaries_gerson.ipynb b/Notebooks/NB07__Dictionaries_gerson.ipynb
new file mode 100644
index 000000000..29e256ba4
--- /dev/null
+++ b/Notebooks/NB07__Dictionaries_gerson.ipynb
@@ -0,0 +1,2681 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "name": "NB07__Dictionaries.ipynb",
+ "provenance": [],
+ "collapsed_sections": [
+ "n8BIbzQbNWUo",
+ "7eS94uQ4NhVR",
+ "SYOgJpGYVLUu",
+ "CaHFxk98W5if",
+ "ReWUyWiHXCnc",
+ "CqszHxaKHr2h",
+ "tXgF1Wl9gHKY",
+ "Fotx7XUquAo8",
+ "36kmLUYDvsUI",
+ "SWO2GdNovxAp",
+ "vpN54l4vxze5",
+ "u4HOf9SNytSq",
+ "6BQ9oZiD9hg5",
+ "tz5-QdrX9vct",
+ "p1muBgMX8NK4",
+ "FxTC2-U88ajk",
+ "z8EYn0pP25Rh"
+ ],
+ "include_colab_link": true
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "accelerator": "GPU"
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github",
+ "colab_type": "text"
+ },
+ "source": [
+ "
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "iBW6agsvqqAm"
+ },
+ "source": [
+ "
DICIONÁRIOS
\n",
+ "\n",
+ "* Coleção desordenada, mutável e indexada (estrutura do tipo {key: value}) de itens;\n",
+ "* Não permite itens duplicados;\n",
+ "* Usamos {key: value} para representar os itens do dicionário;\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "LFcr_2Xnq2ho"
+ },
+ "source": [
+ "# **AGENDA**:\n",
+ "\n",
+ "> Veja o **índice** dos itens que serão abordados neste capítulo.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "r8vR-lHJIhgM"
+ },
+ "source": [
+ "# **NOTAS E OBSERVAÇÕES**\n",
+ "* Levar os exemplos de lambda function daqui para o capítulo de Lambda Function.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "DkxCxjsbE5fL"
+ },
+ "source": [
+ "# **CHEETSHEET**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "cGUWTualFCOk"
+ },
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ublDMf3R_qMn"
+ },
+ "source": [
+ "A seguir, os principais métodos associados aos dicionários. Para isso, considere as listas l_frutas e l_precos_frutas que darão origem ao dicionário d_frutas a seguir:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "FxuJ7Awd8f5a"
+ },
+ "source": [
+ "# Definição da lista l_frutas:\n",
+ "l_frutas = ['Avocado', 'Apple', 'Apricot', 'Banana', 'Blackcurrant', 'Blackberry', 'Blueberry', 'Cherry', 'Coconut', 'Fig', 'Grape', 'Kiwi', 'Lemon', 'Mango', 'Nectarine', \n",
+ " 'Orange', 'Papaya','Passion Fruit','Peach','Pineapple','Plum','Raspberry','Strawberry','Watermelon']"
+ ],
+ "execution_count": 4,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "jJyxuMQc9Ewy"
+ },
+ "source": [
+ "# Definição da lista l_precos_frutas:\n",
+ "l_precos_frutas = [0.35, 0.40, 0.25, 0.30, 0.70, 0.55, 0.45, 0.50, 0.75, 0.60, 0.65, 0.20, 0.15, 0.80, 0.75, 0.25, 0.30,0.45,0.55,0.55,0.60,0.40,0.50,0.45]"
+ ],
+ "execution_count": 5,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "C59Z2LQpZ7DD",
+ "outputId": "fb61f9fc-7c46-418d-8e95-f5e9d1090dc2",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "a= ['abacate', 'ameixa']\n",
+ "p= [4, 8]\n",
+ "c= dict(zip(a,p))\n",
+ "c"
+ ],
+ "execution_count": 2,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "{'abacate': 4, 'ameixa': 8}"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 2
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "hXP3kxW4-AI1"
+ },
+ "source": [
+ "Observe abaixo o uso das funções dict() e zip() para criarmos o dicionário d_frutas:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "qT_4sYxA9dyn",
+ "outputId": "65e827a8-c58a-4191-816d-87585c794b85",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "# Definir o dicionário d_frutas: {chave: valor}\n",
+ "d_frutas = dict(zip(l_frutas, l_precos_frutas))\n",
+ "d_frutas"
+ ],
+ "execution_count": 6,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "{'Apple': 0.4,\n",
+ " 'Apricot': 0.25,\n",
+ " 'Avocado': 0.35,\n",
+ " 'Banana': 0.3,\n",
+ " 'Blackberry': 0.55,\n",
+ " 'Blackcurrant': 0.7,\n",
+ " 'Blueberry': 0.45,\n",
+ " 'Cherry': 0.5,\n",
+ " 'Coconut': 0.75,\n",
+ " 'Fig': 0.6,\n",
+ " 'Grape': 0.65,\n",
+ " 'Kiwi': 0.2,\n",
+ " 'Lemon': 0.15,\n",
+ " 'Mango': 0.8,\n",
+ " 'Nectarine': 0.75,\n",
+ " 'Orange': 0.25,\n",
+ " 'Papaya': 0.3,\n",
+ " 'Passion Fruit': 0.45,\n",
+ " 'Peach': 0.55,\n",
+ " 'Pineapple': 0.55,\n",
+ " 'Plum': 0.6,\n",
+ " 'Raspberry': 0.4,\n",
+ " 'Strawberry': 0.5,\n",
+ " 'Watermelon': 0.45}"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 6
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "bIJ4cYhlZ5oT"
+ },
+ "source": [
+ ""
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "iHKUaGNT_IDt"
+ },
+ "source": [
+ "A seguir, resumo dos principais métodos relacionados à dicionários:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "MQLZ1mwW_yiU"
+ },
+ "source": [
+ "| Método | Descrição | Exemplo | Resultado |\n",
+ "|-------------------------|----------------------------------------------------------------------------------------------------|------------------------------------------|--------------------------------------------------------------------------------|\n",
+ "| d_dicionario.clear() | Remove todos os itens de d_dicionario | d_frutas.clear() | {} |\n",
+ "| d_dicionario.copy() | Retorna uma cópia de d_dicionario | d_frutas2= d_frutas.copy() | d_frutas2 é uma cópia de d_frutas |\n",
+ "| d_dicionario.get(key) | Retorna o valor para key, se key estiver em d_dicionario | d_frutas.get('Passion Fruit') | 0.45 |\n",
+ "| | | d_frutas.get('XPTO') | O Python não apresenta nenhum retorno |\n",
+ "| d_dicionario.items() | Retorna um objeto com as tuplas (key, valor) de d_dicionario | d_frutas.items() | dict_items([('Avocado', 0.35), ..., ('Watermelon', 0.45)]) |\n",
+ "| d_dicionario.keys() | Retorna um objeto com as keys de d_dicionario | d_frutas.keys() | dict_keys(['Avocado', 'Apple', ..., 'Watermelon']) |\n",
+ "| d_dicionario.values() | Retorna um objeto com os valores de d_dicionario | d_frutas.values() | dict_values([0.35, 0.4, ..., 0.45]) |\n",
+ "| d_dicionario.popitem() | Retorna e remove um item de d_dicionario | d_frutas.popitem() | ('Watermelon', 0.45) |\n",
+ "| | | 'Watermelon' in d_frutas | False |\n",
+ "| d_dicionario.pop(key[, default]) | Retorna e remove o item de d_dicionario correspondente à key | d_frutas.pop('Orange') | 0.25 |\n",
+ "| | | 'Orange' in d_frutas | False |\n",
+ "| d_dicionario.update(d2) | Adiciona item(s) à d_dicionario se key não estiver em d_dicionario. Se key estiver em d_dicionario, atualizará key com o novo valor | d_frutas.update({'Cherimoya': 1.3}) | Adicionará o item {'Cherimoya': 1.3} à d_frutas, pois key= 'Cherimoya' não está em d_frutas. |\n",
+ "| | | d_frutas.update({'Orange': 0.55}) | Atualiza o valor de key= 'Orange' para 0.55. O valor anterior era 0.25 |\n",
+ "| d_dicionario.fromkeys(keys, value) | Retorna um dicionário com keys especificadas e valores | tFruits= ('Avocado', 'Apple', 'Apricot') | |\n",
+ "| | | d_frutas.fromkeys(tFruits, 0) | {'Apple': 0, 'Apricot': 0, 'Avocado': 0} |"
+ ]
+ },
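+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A célula a seguir é um pequeno esboço ilustrativo (adicionado apenas como complemento) de alguns dos métodos da tabela acima, assumindo que o dicionário d_frutas já foi criado. O segundo argumento de get() e de pop() é o valor default retornado quando a key não existe."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "# Esboço ilustrativo de alguns métodos da tabela acima (assume que d_frutas já existe)\n",
+ "print(d_frutas.get('Passion Fruit'))   # 0.45\n",
+ "print(d_frutas.get('XPTO'))            # None, pois a key não existe\n",
+ "print(d_frutas.get('XPTO', 0.0))       # 0.0, valor default informado\n",
+ "\n",
+ "d_copia = d_frutas.copy()              # cópia independente de d_frutas\n",
+ "print(d_copia.pop('Orange'))           # 0.25 - remove e retorna o valor da key 'Orange'\n",
+ "print(d_copia.popitem())               # remove e retorna o último item inserido\n",
+ "\n",
+ "t_frutas = ('Avocado', 'Apple', 'Apricot')\n",
+ "print(dict.fromkeys(t_frutas, 0))      # {'Avocado': 0, 'Apple': 0, 'Apricot': 0}"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },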
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "uH6cHnctDu2l"
+ },
+ "source": [
+ "A seguir, vamos apresentar mais alguns exemplos de dicionários e seus métodos associados:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "YeCPxCab4e4k"
+ },
+ "source": [
+ "___\n",
+ "# **EXEMPLO**\n",
+ "* Os dias da semana como dicionário."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "N_2J839X4lps",
+ "outputId": "3d356121-d4b7-424a-addb-949be5a9d193",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "d_dia_semana = {'Seg': 'Segunda', 'Ter': 'Terça', 'Qua': 'Quarta', 'Qui': 'Quinta', 'Sex': 'Sexta', 'Sab': 'Sabado', 'Dom': 'Domingo'}\n",
+ "d_dia_semana"
+ ],
+ "execution_count": 7,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "{'Dom': 'Domingo',\n",
+ " 'Qua': 'Quarta',\n",
+ " 'Qui': 'Quinta',\n",
+ " 'Sab': 'Sabado',\n",
+ " 'Seg': 'Segunda',\n",
+ " 'Sex': 'Sexta',\n",
+ " 'Ter': 'Terça'}"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 7
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "CnZLR-VX6FV4"
+ },
+ "source": [
+ "Observe que:\n",
+ "* os itens do dicionário d_dia_semana seguem a estrutura {key: value}.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "eHuvY7BWQKhQ",
+ "outputId": "3906f3f3-c849-4689-8da1-8d40e8f26369",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 35
+ }
+ },
+ "source": [
+ "d_dia_semana['Seg']"
+ ],
+ "execution_count": 8,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ },
+ "text/plain": [
+ "'Segunda'"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 8
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "j65BxhzGG0NA"
+ },
+ "source": [
+ "___\n",
+ "# **DECLARAR OU INICIALIZAR UM DICIONÁRIO VAZIO**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "LEGwQ0U-fKtL"
+ },
+ "source": [
+ "Por exemplo, o comando abaixo declara um dicionário vazio chamado d_paises:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "2iPWXPBLfOlr",
+ "outputId": "5687da81-8541-4169-eea4-65fd25044e4e",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "d_paises = {} # Também podemos usar a função dict() para criar o dicionário vazio da seguinte forma: d_paises= dict()\n",
+ "d_paises"
+ ],
+ "execution_count": 11,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "{}"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 11
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "vCxZv-jmG5y0"
+ },
+ "source": [
+ "___\n",
+ "# **OBTER O TIPO DO OBJETO**\n",
+ "> type(d_dicionario)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "voPYpGIGff3o",
+ "outputId": "174838de-d69c-40fe-fa79-6bd7082044e7",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "type(d_paises)"
+ ],
+ "execution_count": 12,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "dict"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 12
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "X3MvCkFiG-UO"
+ },
+ "source": [
+ "___\n",
+ "# **ADICIONAR ITENS AO DICIONÁRIO**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "fzP8iG5xfi0H"
+ },
+ "source": [
+ "Adicionar o valor 'Italy' à key = 1. Em outras palavras, estamos a adicionar o item {1: 'Italy'}"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "EXZ7eEZofnza",
+ "outputId": "b6789781-3d15-47cd-edae-90c7b9ab4013",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "d_paises[1] = 'Italy'\n",
+ "d_paises"
+ ],
+ "execution_count": 13,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "{1: 'Italy'}"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 13
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "rH51ORGHHREE"
+ },
+ "source": [
+ "Adicionar o valor 'Denmark' à key= 2. Em outras palavras, estamos a adicionar o item {2: 'Denmark'}"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "GAXSzSiufv1u",
+ "outputId": "ad237289-6397-438f-910d-c79b0acd9dea",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "d_paises[2] = 'Denmark'\n",
+ "d_paises"
+ ],
+ "execution_count": 14,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "{1: 'Italy', 2: 'Denmark'}"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 14
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Xqdc_IYoHVVQ"
+ },
+ "source": [
+ "Adicionar o valor 'Brazil' à key= 3. Em outras palavras, estamos a adicionar o item {3: 'Brazil'}"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "FN7km8C9gAjM",
+ "outputId": "0905f1e6-6f19-40c7-a467-12c162ac1cc3",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "d_paises[3]= 'Brazil'\n",
+ "d_paises"
+ ],
+ "execution_count": 15,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "{1: 'Italy', 2: 'Denmark', 3: 'Brazil'}"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 15
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "iwU8pJKRHapD"
+ },
+ "source": [
+ "___\n",
+ "# **ATUALIZAR VALORES DO DICIONÁRIO**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "CxXUV7TugLXn"
+ },
+ "source": [
+ "O que acontece quando eu atribuo à key 3 outro valor, por exemplo, 'France'. Vamos conferir abaixo:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Rr6DtJnDgU5I",
+ "outputId": "31925676-fdbe-4c98-cb3a-f59978009711",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "# Adicionar o valor 'France' à key= 3\n",
+ "d_paises[3]= 'France'\n",
+ "d_paises"
+ ],
+ "execution_count": 16,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "{1: 'Italy', 2: 'Denmark', 3: 'France'}"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 16
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "xB9G1l3_ggo-"
+ },
+ "source": [
+ "Como a key= 3 existe no dicionário d_paises, então o Python substitui o valor anterior 'Brazil' pelo novo valor, 'France'. \n",
+ "\n",
+ "* Lembre-se, os dicionários são mutáveis!"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "T8JBxySZHiOJ"
+ },
+ "source": [
+ "___\n",
+ "# **OBTER KEYS DO DICIONÁRIO**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "ALwbHwi4iwky",
+ "outputId": "bb0d57fb-2742-4eb1-9d82-9309142d21f5",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ }
+ },
+ "source": [
+ "d_paises.keys()"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "dict_keys([1, 2, 3])"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 10
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "FIvi0Li1Hng5"
+ },
+ "source": [
+ "___\n",
+ "# **OBTER VALORES DO DICIONÁRIO**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "cp0PPtl3jEKo",
+ "outputId": "c7b8739a-caa9-4e58-e6d3-0f86ccd2d950",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ }
+ },
+ "source": [
+ "d_paises.values()"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "dict_values(['Italy', 'Denmark', 'France'])"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 11
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "JUblZBMjHrwl"
+ },
+ "source": [
+ "___\n",
+ "# **OBTER ITENS (key, value) DO DICIONÁRIO**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "LraTwXjdjG3m",
+ "outputId": "b3d6d55e-20ad-4f88-a783-9ba1c4fd8654",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 162
+ }
+ },
+ "source": [
+ "d_paises.items()"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "error",
+ "ename": "NameError",
+ "evalue": "ignored",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
+ "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0md_Paises\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mitems\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
+ "\u001b[0;31mNameError\u001b[0m: name 'd_Paises' is not defined"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "IJEMg2LKHyGa"
+ },
+ "source": [
+ "___\n",
+ "# **OBTER VALOR PARA UMA KEY ESPECÍFICA**\n",
+ "* d_dicionario.get(key)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "dzgBhsphjSQm"
+ },
+ "source": [
+ "Qual o valor para key= 1?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "FUfTjqktjW60",
+ "outputId": "678ab629-6cff-4fe1-e03f-d90709a98f26",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ }
+ },
+ "source": [
+ "d_paises.get(1)"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'Italy'"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 11
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "tyJ0KsloIBoD"
+ },
+ "source": [
+ "___\n",
+ "# **COPIAR DICIONÁRIO**\n",
+ "* d_dicionario.copy()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "XL17EmvMkkky",
+ "outputId": "d3e9648a-ed03-47c2-e650-4a7a74dcaa38",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "d_paises2 = d_paises.copy()\n",
+ "d_paises2"
+ ],
+ "execution_count": 17,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "{1: 'Italy', 2: 'Denmark', 3: 'France'}"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 17
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "8V25l2ZoIG4B"
+ },
+ "source": [
+ "___\n",
+ "# **REMOVER TODOS OS ITENS DO DICIONÁRIO**\n",
+ "* d_dicionario.clear()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "r-8Gs1gYjqLN"
+ },
+ "source": [
+ "d_paises.clear()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "ro_42gzDjsdV",
+ "outputId": "a2c2a25b-40ef-4842-f2f7-3ac85404d195",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 35
+ }
+ },
+ "source": [
+ "d_paises"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "{}"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 13
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "pCzKkKoujv7G"
+ },
+ "source": [
+ "Como esperado, removemos todos os itens do dicionário d_paises. Entretanto, o dicionário d_paises continua a existir!"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "MKtPwGVsIaLQ"
+ },
+ "source": [
+ "___\n",
+ "# **DELETAR O DICIONÁRIO**\n",
+ "* del d_dicionario"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "8wvM-o7Lj7A0"
+ },
+ "source": [
+ "del d_paises"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "wK83ZURYkD_T",
+ "outputId": "03254461-9939-4ef9-de30-c4b59c920674",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 166
+ }
+ },
+ "source": [
+ "d_paises"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "error",
+ "ename": "NameError",
+ "evalue": "ignored",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
+ "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdCountries\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
+ "\u001b[0;31mNameError\u001b[0m: name 'dCountries' is not defined"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "aSe3veUB1lo_"
+ },
+ "source": [
+ "Como esperado, pois agora o dicionário já não existe mais. Ok?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "STtkGUvEg7d1"
+ },
+ "source": [
+ "___\n",
+ "# **ITERAR PELO DICIONÁRIO**\n",
+ "* Considere o dicionário d_frutas a seguir:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "IG8hKSvcfalZ"
+ },
+ "source": [
+ "# Definindo os valores iniciais do dicionário d_frutas:\n",
+ "d_frutas = {'Avocado': 0.35, \n",
+ " 'Apple': 0.40, \n",
+ " 'Apricot': 0.25, \n",
+ " 'Banana': 0.30, \n",
+ " 'Blackcurrant': 0.70, \n",
+ " 'Blackberry': 0.55, \n",
+ " 'Blueberry': 0.45, \n",
+ " 'Cherry': 0.50, \n",
+ " 'Coconut': 0.75, \n",
+ " 'Fig': 0.60, \n",
+ " 'Grape': 0.65, \n",
+ " 'Kiwi': 0.20, \n",
+ " 'Lemon': 0.15, \n",
+ " 'Mango': 0.80, \n",
+ " 'Nectarine': 0.75, \n",
+ " 'Orange': 0.25, \n",
+ " 'Papaya': 0.30,\n",
+ " 'Passion Fruit': 0.45,\n",
+ " 'Peach': 0.55,\n",
+ " 'Pineapple': 0.55,\n",
+ " 'Plum': 0.60,\n",
+ " 'Raspberry': 0.40,\n",
+ " 'Strawberry': 0.50,\n",
+ " 'Watermelon': 0.45}"
+ ],
+ "execution_count": 18,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ppRkK_jJJG6W"
+ },
+ "source": [
+ "Mostrando os itens do dicionário d_frutas:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "bI7Ctf0ohyz8",
+ "outputId": "05418ee0-ce00-439a-848a-de9d5084c900",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "d_frutas"
+ ],
+ "execution_count": 23,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "{'Apple': 0.4,\n",
+ " 'Apricot': 0.25,\n",
+ " 'Avocado': 0.35,\n",
+ " 'Banana': 0.3,\n",
+ " 'Blackberry': 0.55,\n",
+ " 'Blackcurrant': 0.7,\n",
+ " 'Blueberry': 0.45,\n",
+ " 'Cherry': 0.5,\n",
+ " 'Coconut': 0.75,\n",
+ " 'Fig': 0.6,\n",
+ " 'Grape': 0.65,\n",
+ " 'Kiwi': 0.2,\n",
+ " 'Lemon': 0.15,\n",
+ " 'Mango': 0.8,\n",
+ " 'Nectarine': 0.75,\n",
+ " 'Orange': 0.25,\n",
+ " 'Papaya': 0.3,\n",
+ " 'Passion Fruit': 0.45,\n",
+ " 'Peach': 0.55,\n",
+ " 'Pineapple': 0.55,\n",
+ " 'Plum': 0.6,\n",
+ " 'Raspberry': 0.4,\n",
+ " 'Strawberry': 0.5,\n",
+ " 'Watermelon': 0.45}"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 23
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "wXFfyiyPtD35"
+ },
+ "source": [
+ "Qual o valor para a fruta 'Apple'? Para responder à esta pergunta, basta lembrar que 'Apple' é uma key do dicionário d_frutas. Certo?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "JpreyE_LtCcU",
+ "outputId": "cee4be2d-7980-4a3d-85fb-17561d1bb1ff",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ }
+ },
+ "source": [
+ "d_frutas['Apple']"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "0.4"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 21
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "huu835LGcyHL",
+ "outputId": "02e958ca-4133-4363-9ab1-c3eb767b2d5e",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "for chave in d_frutas.keys():\n",
+ " print (chave)"
+ ],
+ "execution_count": 26,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Avocado\n",
+ "Apple\n",
+ "Apricot\n",
+ "Banana\n",
+ "Blackcurrant\n",
+ "Blackberry\n",
+ "Blueberry\n",
+ "Cherry\n",
+ "Coconut\n",
+ "Fig\n",
+ "Grape\n",
+ "Kiwi\n",
+ "Lemon\n",
+ "Mango\n",
+ "Nectarine\n",
+ "Orange\n",
+ "Papaya\n",
+ "Passion Fruit\n",
+ "Peach\n",
+ "Pineapple\n",
+ "Plum\n",
+ "Raspberry\n",
+ "Strawberry\n",
+ "Watermelon\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "JBMf8SbAJmiq"
+ },
+ "source": [
+ "## Iterar pelas keys do dicionário:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "aDDD-tbmdj0o",
+ "outputId": "5a9c4751-4fb6-4ee1-83a0-629ca32d0fba",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "d_frutas.keys()"
+ ],
+ "execution_count": 30,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "dict_keys(['Avocado', 'Apple', 'Apricot', 'Banana', 'Blackcurrant', 'Blackberry', 'Blueberry', 'Cherry', 'Coconut', 'Fig', 'Grape', 'Kiwi', 'Lemon', 'Mango', 'Nectarine', 'Orange', 'Papaya', 'Passion Fruit', 'Peach', 'Pineapple', 'Plum', 'Raspberry', 'Strawberry', 'Watermelon'])"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 30
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "gOROGkRqfeUp",
+ "outputId": "c4252748-c64b-4df9-d82a-5648279c7765",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "d_frutas.values()"
+ ],
+ "execution_count": 31,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "dict_values([0.35, 0.4, 0.25, 0.3, 0.7, 0.55, 0.45, 0.5, 0.75, 0.6, 0.65, 0.2, 0.15, 0.8, 0.75, 0.25, 0.3, 0.45, 0.55, 0.55, 0.6, 0.4, 0.5, 0.45])"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 31
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "2YgHJOref4Qe",
+ "outputId": "6dab6f4a-6380-4b43-828c-2d0b696236bd",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "d_frutas.items()"
+ ],
+ "execution_count": 32,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "dict_items([('Avocado', 0.35), ('Apple', 0.4), ('Apricot', 0.25), ('Banana', 0.3), ('Blackcurrant', 0.7), ('Blackberry', 0.55), ('Blueberry', 0.45), ('Cherry', 0.5), ('Coconut', 0.75), ('Fig', 0.6), ('Grape', 0.65), ('Kiwi', 0.2), ('Lemon', 0.15), ('Mango', 0.8), ('Nectarine', 0.75), ('Orange', 0.25), ('Papaya', 0.3), ('Passion Fruit', 0.45), ('Peach', 0.55), ('Pineapple', 0.55), ('Plum', 0.6), ('Raspberry', 0.4), ('Strawberry', 0.5), ('Watermelon', 0.45)])"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 32
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "F8r8GgxvdMJA",
+ "outputId": "e7637082-e428-4f32-f09f-0ee22f82cf6f",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "for i_valor in d_frutas.values():\n",
+ " print (i_valor)"
+ ],
+ "execution_count": 27,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "0.35\n",
+ "0.4\n",
+ "0.25\n",
+ "0.3\n",
+ "0.7\n",
+ "0.55\n",
+ "0.45\n",
+ "0.5\n",
+ "0.75\n",
+ "0.6\n",
+ "0.65\n",
+ "0.2\n",
+ "0.15\n",
+ "0.8\n",
+ "0.75\n",
+ "0.25\n",
+ "0.3\n",
+ "0.45\n",
+ "0.55\n",
+ "0.55\n",
+ "0.6\n",
+ "0.4\n",
+ "0.5\n",
+ "0.45\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "rMro_tY8kepo",
+ "outputId": "4488c243-6792-4efa-b271-e546270b129d",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 425
+ }
+ },
+ "source": [
+ "for key in d_frutas.keys():\n",
+ " print(key)"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Avocado\n",
+ "Apple\n",
+ "Apricot\n",
+ "Banana\n",
+ "Blackcurrant\n",
+ "Blackberry\n",
+ "Blueberry\n",
+ "Cherry\n",
+ "Coconut\n",
+ "Fig\n",
+ "Grape\n",
+ "Kiwi\n",
+ "Lemon\n",
+ "Mango\n",
+ "Nectarine\n",
+ "Orange\n",
+ "Papaya\n",
+ "Passion Fruit\n",
+ "Peach\n",
+ "Pineapple\n",
+ "Plum\n",
+ "Raspberry\n",
+ "Strawberry\n",
+ "Watermelon\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "yDkOLvRFJxco"
+ },
+ "source": [
+ "## Iterar pelos itens (key, value) do dicionário"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "DpFB1g-3kDSt",
+ "outputId": "f94dd133-3c61-4ac9-b8df-d5ca641a66e1",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "for item in d_frutas.items():\n",
+ " print(item) "
+ ],
+ "execution_count": 28,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "('Avocado', 0.35)\n",
+ "('Apple', 0.4)\n",
+ "('Apricot', 0.25)\n",
+ "('Banana', 0.3)\n",
+ "('Blackcurrant', 0.7)\n",
+ "('Blackberry', 0.55)\n",
+ "('Blueberry', 0.45)\n",
+ "('Cherry', 0.5)\n",
+ "('Coconut', 0.75)\n",
+ "('Fig', 0.6)\n",
+ "('Grape', 0.65)\n",
+ "('Kiwi', 0.2)\n",
+ "('Lemon', 0.15)\n",
+ "('Mango', 0.8)\n",
+ "('Nectarine', 0.75)\n",
+ "('Orange', 0.25)\n",
+ "('Papaya', 0.3)\n",
+ "('Passion Fruit', 0.45)\n",
+ "('Peach', 0.55)\n",
+ "('Pineapple', 0.55)\n",
+ "('Plum', 0.6)\n",
+ "('Raspberry', 0.4)\n",
+ "('Strawberry', 0.5)\n",
+ "('Watermelon', 0.45)\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "fVcEz1OMiUBu",
+ "outputId": "e7b5e949-2e02-4d22-cea8-980fc1844f43",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "d_frutas2 = {k: v for k,v in filter(lambda t: t[0] == 'Apple', d_frutas.items())} # o t[0] refere-se à chave do dicionário\n",
+ "d_frutas2"
+ ],
+ "execution_count": 33,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "{'Apple': 0.4}"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 33
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "V6j2Q0jngTc6"
+ },
+ "source": [
+ "for key, value in d_frutas.items():\n",
+ " "
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "8z6qO74fJ6Q1"
+ },
+ "source": [
+ "## Iterar pelos valores do dicionário"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "tjJ6qRF8nr4v",
+ "outputId": "55fe54a5-4702-4a07-c050-0fc83d2de5ca",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "for value in d_frutas.values():\n",
+ " print(value)"
+ ],
+ "execution_count": 29,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "0.35\n",
+ "0.4\n",
+ "0.25\n",
+ "0.3\n",
+ "0.7\n",
+ "0.55\n",
+ "0.45\n",
+ "0.5\n",
+ "0.75\n",
+ "0.6\n",
+ "0.65\n",
+ "0.2\n",
+ "0.15\n",
+ "0.8\n",
+ "0.75\n",
+ "0.25\n",
+ "0.3\n",
+ "0.45\n",
+ "0.55\n",
+ "0.55\n",
+ "0.6\n",
+ "0.4\n",
+ "0.5\n",
+ "0.45\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-LmEUroVKDUA"
+ },
+ "source": [
+ "## Iterar pela key e valor do dicionário"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "oRhZ_Zq9oQIg",
+ "outputId": "be168183-30b4-4f96-ae2c-3f313acbc558",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 425
+ }
+ },
+ "source": [
+ "for key, value in d_frutas.items():\n",
+ " print(\"%s --> %s\" %(key, value))"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Avocado --> 0.35\n",
+ "Apple --> 0.4\n",
+ "Apricot --> 0.25\n",
+ "Banana --> 0.3\n",
+ "Blackcurrant --> 0.7\n",
+ "Blackberry --> 0.55\n",
+ "Blueberry --> 0.45\n",
+ "Cherry --> 0.5\n",
+ "Coconut --> 0.75\n",
+ "Fig --> 0.6\n",
+ "Grape --> 0.65\n",
+ "Kiwi --> 0.2\n",
+ "Lemon --> 0.15\n",
+ "Mango --> 0.8\n",
+ "Nectarine --> 0.75\n",
+ "Orange --> 0.25\n",
+ "Papaya --> 0.3\n",
+ "Passion Fruit --> 0.45\n",
+ "Peach --> 0.55\n",
+ "Pineapple --> 0.55\n",
+ "Plum --> 0.6\n",
+ "Raspberry --> 0.4\n",
+ "Strawberry --> 0.5\n",
+ "Watermelon --> 0.45\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Fotx7XUquAo8"
+ },
+ "source": [
+ "___\n",
+ "# **VERIFICAR SE UMA KEY ESPECÍFICA PERTENCE AO DICIONÁRIO**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ju__WsSoKXtk"
+ },
+ "source": [
+ "A fruta 'Apple' (que em nosso caso, é uma key) existe no dicionário?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "-gkEKNZPTeMp",
+ "outputId": "3540aadd-996a-4abd-cfcb-c22e49b75aaa",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ }
+ },
+ "source": [
+ "'Apple' in d_frutas.keys()"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "True"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 75
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "fMzBeFMIusv7"
+ },
+ "source": [
+ "A fruta 'Coconut' pertence ao dicionário d_frutas?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "SKtEwmBCuxyi",
+ "outputId": "1df7263c-a64f-4eaf-8d4d-a55cac03d2bc",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ }
+ },
+ "source": [
+ "'Coconut' in fruits.keys()"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "True"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 77
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "rrH8ArqsK6Bd"
+ },
+ "source": [
+ "___\n",
+ "# **VERIFICAR SE VALOR PERTENCE AO DICIONÁRIO**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "DbWpbuLTK9sn",
+ "outputId": "e9fafa6d-284e-4862-8f25-9419ff702dec",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ }
+ },
+ "source": [
+ "0.4 in d_frutas.values()"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "True"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 14
+ }
+ ]
+ },
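+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Também é possível verificar se um par (key, value) específico pertence ao dicionário, usando items(). Abaixo, um esboço ilustrativo:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "# Esboço: verificar se o par ('Apple', 0.4) existe em d_frutas\n",
+ "('Apple', 0.4) in d_frutas.items()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },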
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "36kmLUYDvsUI"
+ },
+ "source": [
+ "## Adicionar novos itens ao dicionário\n",
+ "* Considere o dicionário d_frutas2 abaixo:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "5Rwq4-UG4--u"
+ },
+ "source": [
+ "d_frutas2 = {'Grapefruit': 1.0 }"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "vljceM6_5H9o"
+ },
+ "source": [
+ "O comando abaixo adiciona o dicionário d_frutas2 ao dicionário d_frutas."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "7BD_mYMM5O5o",
+ "outputId": "2b185546-255e-4ad0-e8c9-10564fcbe2b0",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 442
+ }
+ },
+ "source": [
+ "d_frutas.update(d_frutas2)\n",
+ "d_frutas"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "{'Apple': 0.4,\n",
+ " 'Apricot': 0.25,\n",
+ " 'Avocado': 0.35,\n",
+ " 'Banana': 0.3,\n",
+ " 'Blackberry': 0.55,\n",
+ " 'Blackcurrant': 0.7,\n",
+ " 'Blueberry': 0.45,\n",
+ " 'Cherry': 0.5,\n",
+ " 'Coconut': 0.75,\n",
+ " 'Fig': 0.6,\n",
+ " 'Grape': 0.65,\n",
+ " 'Grapefruit': 1.0,\n",
+ " 'Kiwi': 0.2,\n",
+ " 'Lemon': 0.15,\n",
+ " 'Mango': 0.8,\n",
+ " 'Nectarine': 0.75,\n",
+ " 'Orange': 0.25,\n",
+ " 'Papaya': 0.3,\n",
+ " 'Passion Fruit': 0.45,\n",
+ " 'Peach': 0.55,\n",
+ " 'Pineapple': 0.55,\n",
+ " 'Plum': 0.6,\n",
+ " 'Raspberry': 0.4,\n",
+ " 'Strawberry': 0.5,\n",
+ " 'Watermelon': 0.45}"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 79
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ffh-94lo55n4"
+ },
+ "source": [
+ "Agora, considere o dicionário d_frutas3 abaixo:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "JMAq_jbP5---"
+ },
+ "source": [
+ "d_frutas3 = {'Apple': 0.70}"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Jd6B2cy-6KmY"
+ },
+ "source": [
+ "Qual o resultado do comando abaixo?\n",
+ "\n",
+ "* Atenção: A fruta 'Apple' (é uma key do dicionário d_frutas) tem valor 0.40. E no dicionário d_frutas3 a fruta 'Apple' tem valor 0.70."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "E4GKdTw76PXI"
+ },
+ "source": [
+ "d_frutas.update(d_frutas3)\n",
+ "d_frutas"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "HMmDfrln6o0c"
+ },
+ "source": [
+ "Como esperado, como key= 'Apple' existe no dicionário d_frutas, então o Python atualizou o valor de key= 'Apple' para 0.70."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "SWO2GdNovxAp"
+ },
+ "source": [
+ "## Modificar keys e valores"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "DX9UTy4TwlAw"
+ },
+ "source": [
+ "Suponha que queremos aplicar um desconto de 10% para cada fruta do nosso dicionário.\n",
+ "\n",
+ "* Como fazemos isso?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "ZziGmKGmwqwn"
+ },
+ "source": [
+ "for key, value in d_frutas.items():\n",
+ " d_frutas[key] = round(value * 0.9, 2)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
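+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Alternativamente, o mesmo desconto pode ser escrito como uma dict comprehension, que cria um novo dicionário em vez de alterar d_frutas no lugar. Abaixo, um esboço ilustrativo apenas para comparação:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "# Esboço: dict comprehension equivalente ao loop acima, mas sem modificar d_frutas\n",
+ "d_frutas_desconto = {key: round(value * 0.9, 2) for key, value in d_frutas.items()}\n",
+ "d_frutas_desconto"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },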
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "s1B-yN8lM-C1"
+ },
+ "source": [
+ "Mostra d_frutas com os valores atualizados:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "zZLa85knxBtY",
+ "outputId": "2c7c12f8-8885-4f34-a0d1-1323e98a9437",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 442
+ }
+ },
+ "source": [
+ "d_frutas"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "{'Apple': 0.63,\n",
+ " 'Apricot': 0.23,\n",
+ " 'Avocado': 0.32,\n",
+ " 'Banana': 0.27,\n",
+ " 'Blackberry': 0.5,\n",
+ " 'Blackcurrant': 0.63,\n",
+ " 'Blueberry': 0.41,\n",
+ " 'Cherry': 0.45,\n",
+ " 'Coconut': 0.68,\n",
+ " 'Fig': 0.54,\n",
+ " 'Grape': 0.59,\n",
+ " 'Grapefruit': 0.9,\n",
+ " 'Kiwi': 0.18,\n",
+ " 'Lemon': 0.14,\n",
+ " 'Mango': 0.72,\n",
+ " 'Nectarine': 0.68,\n",
+ " 'Orange': 0.23,\n",
+ " 'Papaya': 0.27,\n",
+ " 'Passion Fruit': 0.41,\n",
+ " 'Peach': 0.5,\n",
+ " 'Pineapple': 0.5,\n",
+ " 'Plum': 0.54,\n",
+ " 'Raspberry': 0.36,\n",
+ " 'Strawberry': 0.45,\n",
+ " 'Watermelon': 0.41}"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 84
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "vpN54l4vxze5"
+ },
+ "source": [
+ "## Deletar keys do dicionário\n",
+ "* Deletar uma key significa deletar todo o item {key: value}, ok?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "eDlthLStNIwR"
+ },
+ "source": [
+ "Suponha que queremos deletar a fruta 'Avocado' do dicionário d_frutas.\n",
+ "\n",
+ "* Como fazer isso?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "fnpzHZU_x5Y1"
+ },
+ "source": [
+ "for key in list(d_frutas.keys()): # Dica: use a função list para melhorar a performance computacional\n",
+ " if key == 'Avocado':\n",
+ " del d_frutas[key] # Deleta key = 'Avocado'"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "VyPUrobONqvI"
+ },
+ "source": [
+ "Mostra o dicionário d_frutas atualizado:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "IwnsHejhyT4l",
+ "outputId": "b910699c-9729-4a27-bd78-3a283c82ac39",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 425
+ }
+ },
+ "source": [
+ "d_frutas"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "{'Apple': 0.63,\n",
+ " 'Apricot': 0.23,\n",
+ " 'Banana': 0.27,\n",
+ " 'Blackberry': 0.5,\n",
+ " 'Blackcurrant': 0.63,\n",
+ " 'Blueberry': 0.41,\n",
+ " 'Cherry': 0.45,\n",
+ " 'Coconut': 0.68,\n",
+ " 'Fig': 0.54,\n",
+ " 'Grape': 0.59,\n",
+ " 'Grapefruit': 0.9,\n",
+ " 'Kiwi': 0.18,\n",
+ " 'Lemon': 0.14,\n",
+ " 'Mango': 0.72,\n",
+ " 'Nectarine': 0.68,\n",
+ " 'Orange': 0.23,\n",
+ " 'Papaya': 0.27,\n",
+ " 'Passion Fruit': 0.41,\n",
+ " 'Peach': 0.5,\n",
+ " 'Pineapple': 0.5,\n",
+ " 'Plum': 0.54,\n",
+ " 'Raspberry': 0.36,\n",
+ " 'Strawberry': 0.45,\n",
+ " 'Watermelon': 0.41}"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 86
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "u4HOf9SNytSq"
+ },
+ "source": [
+ "## Filtrar/Selecionar itens baseado em condições\n",
+ "Em algumas situações você vai querer filtrar os itens do dicionário que satisfaçam alguma(s) condições.\n",
+ "\n",
+ "* Considere o exemplo a seguir: queremos selecionar/filtrar somente as frutas com preços maiores que 0.4."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "EwqxWiVlyvgH"
+ },
+ "source": [
+ "d_frutas_filtro = {}\n",
+ "for key, value in d_frutas.items():\n",
+ " if value > 0.5:\n",
+ " d_frutas_filtro.update({key: value})"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
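+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "O mesmo filtro também pode ser escrito como uma dict comprehension com if. Abaixo, um esboço ilustrativo que produz o mesmo d_frutas_filtro:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "# Esboço: versão do filtro acima usando dict comprehension\n",
+ "d_frutas_filtro = {key: value for key, value in d_frutas.items() if value > 0.5}"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },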
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "eb0jmAKWOtYt"
+ },
+ "source": [
+ "Mostra o resultado do dicionário d_frutas_Selected:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "SsStWM5k1s-Q",
+ "outputId": "f6af5b61-2333-41c7-a28a-0f6a67b0a949",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 170
+ }
+ },
+ "source": [
+ "d_frutas_filtro"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "{'Apple': 0.63,\n",
+ " 'Blackcurrant': 0.63,\n",
+ " 'Coconut': 0.68,\n",
+ " 'Fig': 0.54,\n",
+ " 'Grape': 0.59,\n",
+ " 'Grapefruit': 0.9,\n",
+ " 'Mango': 0.72,\n",
+ " 'Nectarine': 0.68,\n",
+ " 'Plum': 0.54}"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 89
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "u1ve6xIGOjrE"
+ },
+ "source": [
+ " Como se pode ver, somente a fruta 'Blackberry' satifaz esta condição."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "KJqpPrfkCk9L"
+ },
+ "source": [
+ "## Cálculos com os itens do dicionário"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "exD8HXodCqg6"
+ },
+ "source": [
+ "from collections import Counter"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
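+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Um exemplo (esboço ilustrativo) do uso do Counter importado acima: ele aceita um dicionário e o método most_common() retorna os itens ordenados do maior para o menor valor."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "# Esboço: as 3 frutas de maior preço no dicionário, usando collections.Counter\n",
+ "Counter(d_frutas).most_common(3)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },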
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "llCLTysdCuwB"
+ },
+ "source": [
+ "Somando os valores de todas as frutas"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "uG0VP1MNCroX",
+ "outputId": "8221b07b-610d-4a7c-cb14-86d6f63e5be3",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ }
+ },
+ "source": [
+ "sum(d_frutas.values())"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "11.450000000000001"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 22
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "a5MBNCF-C5-4"
+ },
+ "source": [
+ "Quantos itens existem no dicionário:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "AkvygR0PC9bT",
+ "outputId": "254eff41-8336-4fe6-d6ad-4d52544d74a9",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ }
+ },
+ "source": [
+ "len(list(d_frutas))"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "24"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 25
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "xBNFaklq8OC9"
+ },
+ "source": [
+ "## Sortear itens do dicionário - sorted(d_dicionario.items(), reverse= True/False)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "WULJMjHA-mal"
+ },
+ "source": [
+ "Ordem alfabética (por key):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "SH0WIKZ8-Ylr",
+ "outputId": "b9cea719-637e-40a5-9e79-eb67aeb47887",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 425
+ }
+ },
+ "source": [
+ "d_frutas_ordenadas = sorted(d_frutas.items(), reverse = False)\n",
+ "d_frutas_ordenadas"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[('Apple', 0.4),\n",
+ " ('Apricot', 0.25),\n",
+ " ('Avocado', 0.35),\n",
+ " ('Banana', 0.3),\n",
+ " ('Blackberry', 0.55),\n",
+ " ('Blackcurrant', 0.7),\n",
+ " ('Blueberry', 0.45),\n",
+ " ('Cherry', 0.5),\n",
+ " ('Coconut', 0.75),\n",
+ " ('Fig', 0.6),\n",
+ " ('Grape', 0.65),\n",
+ " ('Kiwi', 0.2),\n",
+ " ('Lemon', 0.15),\n",
+ " ('Mango', 0.8),\n",
+ " ('Nectarine', 0.75),\n",
+ " ('Orange', 0.25),\n",
+ " ('Papaya', 0.3),\n",
+ " ('Passion Fruit', 0.45),\n",
+ " ('Peach', 0.55),\n",
+ " ('Pineapple', 0.55),\n",
+ " ('Plum', 0.6),\n",
+ " ('Raspberry', 0.4),\n",
+ " ('Strawberry', 0.5),\n",
+ " ('Watermelon', 0.45)]"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 12
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "T4Li1Q2d-pnZ"
+ },
+ "source": [
+ "Ordem reversa (por key):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "PoBOmfpM_A_a",
+ "outputId": "4cd9a21c-a2ad-462c-acb0-26ba7a0a4e5d",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 425
+ }
+ },
+ "source": [
+ "d_frutas_ordenadas_reverse = sorted(d_frutas.items(), reverse = True)\n",
+ "d_frutas_ordenadas_reverse"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[('Watermelon', 0.45),\n",
+ " ('Strawberry', 0.5),\n",
+ " ('Raspberry', 0.4),\n",
+ " ('Plum', 0.6),\n",
+ " ('Pineapple', 0.55),\n",
+ " ('Peach', 0.55),\n",
+ " ('Passion Fruit', 0.45),\n",
+ " ('Papaya', 0.3),\n",
+ " ('Orange', 0.25),\n",
+ " ('Nectarine', 0.75),\n",
+ " ('Mango', 0.8),\n",
+ " ('Lemon', 0.15),\n",
+ " ('Kiwi', 0.2),\n",
+ " ('Grape', 0.65),\n",
+ " ('Fig', 0.6),\n",
+ " ('Coconut', 0.75),\n",
+ " ('Cherry', 0.5),\n",
+ " ('Blueberry', 0.45),\n",
+ " ('Blackcurrant', 0.7),\n",
+ " ('Blackberry', 0.55),\n",
+ " ('Banana', 0.3),\n",
+ " ('Avocado', 0.35),\n",
+ " ('Apricot', 0.25),\n",
+ " ('Apple', 0.4)]"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 11
+ }
+ ]
+ },
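+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Também é possível ordenar os itens pelos valores (preços), informando o argumento key de sorted(). Abaixo, um esboço ilustrativo com uma lambda que seleciona o segundo elemento de cada tupla (key, value):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "# Esboço: ordenar os itens do dicionário pelo valor (preço), do menor para o maior\n",
+ "d_frutas_por_preco = sorted(d_frutas.items(), key = lambda item: item[1])\n",
+ "d_frutas_por_preco"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },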
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "FxTC2-U88ajk"
+ },
+ "source": [
+ "## Função filter()\n",
+ "* A função filter() aplica um filtro no dicionário, retornando apenas os itens que satisfaz as condições do filtro."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "iJq1clvOHVG2",
+ "outputId": "16a779ef-48c9-497c-8c7c-a1612aa9aa03",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 425
+ }
+ },
+ "source": [
+ "d_frutas"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "{'Apple': 0.4,\n",
+ " 'Apricot': 0.25,\n",
+ " 'Avocado': 0.35,\n",
+ " 'Banana': 0.3,\n",
+ " 'Blackberry': 0.55,\n",
+ " 'Blackcurrant': 0.7,\n",
+ " 'Blueberry': 0.45,\n",
+ " 'Cherry': 0.5,\n",
+ " 'Coconut': 0.75,\n",
+ " 'Fig': 0.6,\n",
+ " 'Grape': 0.65,\n",
+ " 'Kiwi': 0.2,\n",
+ " 'Lemon': 0.15,\n",
+ " 'Mango': 0.8,\n",
+ " 'Nectarine': 0.75,\n",
+ " 'Orange': 0.25,\n",
+ " 'Papaya': 0.3,\n",
+ " 'Passion Fruit': 0.45,\n",
+ " 'Peach': 0.55,\n",
+ " 'Pineapple': 0.55,\n",
+ " 'Plum': 0.6,\n",
+ " 'Raspberry': 0.4,\n",
+ " 'Strawberry': 0.5,\n",
+ " 'Watermelon': 0.45}"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 2
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "qtTKvNeJNycl"
+ },
+ "source": [
+ "### Filtrando por key:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "uIDW5FhwAiSs",
+ "outputId": "52599d3f-ff13-4894-f697-ce7290bff9d5",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ }
+ },
+ "source": [
+ "d_frutas2 = {k: v for k, v in filter(lambda t: t[0] == 'Apple', d_frutas.items())}\n",
+ "d_frutas2"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "{'Apple': 0.4}"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 6
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "nUMGIzxeNt_U"
+ },
+ "source": [
+ "### Filtrando por valor:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "tvHcQatANltL",
+ "outputId": "8feaf5b1-1db8-4391-8950-248ba8ab46c5",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 187
+ }
+ },
+ "source": [
+ "d_frutas3 = {k: v for k, v in filter(lambda t: t[1] > 0.5, d_frutas.items())}\n",
+ "d_frutas3"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "{'Blackberry': 0.55,\n",
+ " 'Blackcurrant': 0.7,\n",
+ " 'Coconut': 0.75,\n",
+ " 'Fig': 0.6,\n",
+ " 'Grape': 0.65,\n",
+ " 'Mango': 0.8,\n",
+ " 'Nectarine': 0.75,\n",
+ " 'Peach': 0.55,\n",
+ " 'Pineapple': 0.55,\n",
+ " 'Plum': 0.6}"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 7
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "qA_XhCdmA6Gn"
+ },
+ "source": [
+ "___\n",
+ "# **EXERCÍCIOS**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "RSpyl_URgNyE"
+ },
+ "source": [
+ "## Exercício 1\n",
+ "* É possível sortear os itens de um dicionário? Explique sua resposta."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "CXqc9kHch6Mm"
+ },
+ "source": [
+ "## Exercício 2\n",
+ "* É possível termos um dicionário do tipo abaixo?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "0BBWO9Zth_mc",
+ "outputId": "330cd62b-9b7b-4b72-e3b8-1b1a5d3e9ee3",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "d_colaboradores= {'Gerentes': ['A', 'B', 'C'], 'Programadores': ['B', 'D', 'E', 'F', 'G'], 'Gerentes_Projeto': ['A', 'E']}\n",
+ "d_colaboradores"
+ ],
+ "execution_count": 34,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "{'Gerentes': ['A', 'B', 'C'],\n",
+ " 'Gerentes_Projeto': ['A', 'E'],\n",
+ " 'Programadores': ['B', 'D', 'E', 'F', 'G']}"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 34
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "TNiJSG_uiePb"
+ },
+ "source": [
+ "Como acessar o Gerente 'A'?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "k0YZg0gMjzCT",
+ "outputId": "333e147c-d9a0-452f-f152-a0dacf4182b8",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "d_colaboradores ['Gerentes']"
+ ],
+ "execution_count": 35,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "['A', 'B', 'C']"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 35
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "U7fAFy_8j48J",
+ "outputId": "84cd7173-35db-4329-e6d6-0d2ba45b60b6",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "d_colaboradores ['Programadores']"
+ ],
+ "execution_count": 36,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "['B', 'D', 'E', 'F', 'G']"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 36
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Wh61G2i2kE3j",
+ "outputId": "39297cee-ad6a-4df2-f0bf-3b21b82c48d4",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 35
+ }
+ },
+ "source": [
+ "s_gerente_A = d_colaboradores ['Gerentes']\n",
+ "s_gerente_A [0]\n",
+ "\n"
+ ],
+ "execution_count": 37,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ },
+ "text/plain": [
+ "'A'"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 37
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Ws8GtJr6nlqJ"
+ },
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "2zq4kU-smVju",
+ "outputId": "867cef53-26d9-47c2-9a4c-97124ade8fe1",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 163
+ }
+ },
+ "source": [
+ "d_colaboradores.values('A')\n"
+ ],
+ "execution_count": 41,
+ "outputs": [
+ {
+ "output_type": "error",
+ "ename": "TypeError",
+ "evalue": "ignored",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
+ "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0md_colaboradores\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'A'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
+ "\u001b[0;31mTypeError\u001b[0m: values() takes no arguments (1 given)"
+ ]
+ }
+ ]
+ },
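+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Como visto acima, values() não aceita argumentos. Para descobrir em quais grupos o colaborador 'A' aparece, uma alternativa (esboço ilustrativo) é percorrer os itens do dicionário e testar a pertinência com o operador in:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "# Esboço: listar as keys (grupos) cujas listas de valores contêm o colaborador 'A'\n",
+ "[grupo for grupo, colaboradores in d_colaboradores.items() if 'A' in colaboradores]"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },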
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ntVcr_3XwaQ-"
+ },
+ "source": [
+ "## Exercício 3\n",
+ "Consulte a página [Python Data Types: Dictionary - Exercises, Practice, Solution](https://www.w3resource.com/python-exercises/dictionary/) para mais exercícios relacionados à dicionários."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "PmW40kENj4NO"
+ },
+ "source": [
+ ""
+ ],
+ "execution_count": null,
+ "outputs": []
+ }
+ ]
+}
\ No newline at end of file
diff --git a/Notebooks/NB15_00_gerson__Machine_Learning___DSWP.ipynb b/Notebooks/NB15_00_gerson__Machine_Learning___DSWP.ipynb
new file mode 100644
index 000000000..b7b17b205
--- /dev/null
+++ b/Notebooks/NB15_00_gerson__Machine_Learning___DSWP.ipynb
@@ -0,0 +1,4554 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "colab": {
+ "name": "NB15_00__Machine_Learning.ipynb",
+ "provenance": [],
+ "include_colab_link": true
+ },
+ "accelerator": "TPU"
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github",
+ "colab_type": "text"
+ },
+ "source": [
+ "
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ShVXyGj9wkgN"
+ },
+ "source": [
+ "MACHINE LEARNING WITH PYTHON
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "aYQ4cDfcPu4e"
+ },
+ "source": [
+ "___\n",
+ "# **NOTAS E OBSERVAÇÕES**\n",
+ "* Abordar o impacto do desbalanceamento da amostra;\n",
+ "* Colocar AUROC no material e mostrar o cut off para classificação entre 0 e 1;\n",
+ "* Conceitos estatísticos de bias & variance;"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "5YvhLC_uf4_G"
+ },
+ "source": [
+ "___\n",
+ "# **AGENDA**\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "QgX6n2VDyY1O"
+ },
+ "source": [
+ "___\n",
+ "# **REFERÊNCIAS**\n",
+ "* [scikit-learn - Machine Learning With Python](https://scikit-learn.org/stable/);\n",
+ "* [An Introduction to Machine Learning Theory and Its Applications: A Visual Tutorial with Examples](https://www.toptal.com/machine-learning/machine-learning-theory-an-introductory-primer)\n",
+ "* [The Difference Between Artificial Intelligence, Machine Learning, and Deep Learning](https://medium.com/iotforall/the-difference-between-artificial-intelligence-machine-learning-and-deep-learning-3aa67bff5991)\n",
+ "* [A Gentle Guide to Machine Learning](https://blog.monkeylearn.com/a-gentle-guide-to-machine-learning/)\n",
+ "* [A Visual Introduction to Machine Learning](http://www.r2d3.us/visual-intro-to-machine-learning-part-1/)\n",
+ "* [Introduction to Machine Learning](http://alex.smola.org/drafts/thebook.pdf)\n",
+ "* [The 10 Statistical Techniques Data Scientists Need to Master](https://medium.com/cracking-the-data-science-interview/the-10-statistical-techniques-data-scientists-need-to-master-1ef6dbd531f7)\n",
+ "* [Tune: a library for fast hyperparameter tuning at any scale](https://towardsdatascience.com/fast-hyperparameter-tuning-at-scale-d428223b081c)\n",
+ "* [How to lie with Data Science](https://towardsdatascience.com/how-to-lie-with-data-science-5090f3891d9c)\n",
+ "* [5 Reasons “Logistic Regression” should be the first thing you learn when becoming a Data Scientist](https://towardsdatascience.com/5-reasons-logistic-regression-should-be-the-first-thing-you-learn-when-become-a-data-scientist-fcaae46605c4)\n",
+ "* [Machine learning on categorical variables](https://towardsdatascience.com/machine-learning-on-categorical-variables-3b76ffe4a7cb)\n",
+ "\n",
+ "## Deep Learning & Neural Networks\n",
+ "\n",
+ "- [An Introduction to Neural Networks](http://www.cs.stir.ac.uk/~lss/NNIntro/InvSlides.html)\n",
+ "- [An Introduction to Image Recognition with Deep Learning](https://medium.com/@ageitgey/machine-learning-is-fun-part-3-deep-learning-and-convolutional-neural-networks-f40359318721)\n",
+ "- [Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/index.html)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "TsCbZd2epfxo"
+ },
+ "source": [
+ "___\n",
+ "# **INTRODUÇÃO**\n",
+ "\n",
+ "* \"__Information is the oil of the 21st century, and analytics is the combustion engine__.\" - Peter Sondergaard, SVP, Garner Research;\n",
+ "\n",
+ "\n",
+ ">O foco deste capítulo será:\n",
+ "* Linear, Logistic Regression, Decision Tree, Random Forest, Support Vector Machine and XGBoost algorithms for building Machine Learning models;\n",
+ "* Entender como resolver problemas de classificação e Regressão;\n",
+ "* Aplicar técnicas de Ensemble como Bagging e Boosting;\n",
+ "* Como medir a acurácia dos modelos de Machine Learning;\n",
+ "* Aprender os principais algoritmos de Machine Learning tanto das técnicas de aprendizagem supervisionada quanto da não-supervisionada.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "HqqB2vaHXMGt"
+ },
+ "source": [
+ "___\n",
+ "# **ARTIFICIAL INTELLIGENCE VS MACHINE LEARNING VS DEEP LEARNING**\n",
+ "* **Machine Learning** - dá aos computadores a capacidade de aprender sem serem explicitamente programados. Os computadores podem melhorar sua capacidade de aprendizagem através da prática de uma tarefa, geralmente usando grandes conjuntos de dados.\n",
+ "* **Deep Learning** - é um método de Machine Learning que depende de redes neurais artificiais, permitindo que os sistemas de computadores aprendam pelo exemplo, assim como nós humanos aprendemos."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "P961GcguXFFA"
+ },
+ "source": [
+ "\n",
+ "\n",
+ "Source: [Artificial Intelligence vs. Machine Learning vs. Deep Learning](https://github.com/MathMachado/P4ML/blob/DS_Python/Material/Evolution%20of%20AI.PNG)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "lkqGtO88ZkPr"
+ },
+ "source": [
+ "\n",
+ "\n",
+ "Source: [Artificial Intelligence vs. Machine Learning vs. Deep Learning](https://towardsdatascience.com/artificial-intelligence-vs-machine-learning-vs-deep-learning-2210ba8cc4ac)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "xesQpzfmaqj6"
+ },
+ "source": [
+ "\n",
+ "\n",
+ "Source: [Artificial Intelligence vs. Machine Learning vs. Deep Learning](https://towardsdatascience.com/artificial-intelligence-vs-machine-learning-vs-deep-learning-2210ba8cc4ac)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "KeIVR59IIS7f"
+ },
+ "source": [
+ "___\n",
+ "# **MACHINE LEARNING - TECHNIQUES**\n",
+ "\n",
+ "* Supervised Learning\n",
+ "* Unsupervised Learning\n",
+ "\n",
+ "\n",
+ "\n",
+ "Source: [Machine Learning for Everyone](https://vas3k.com/blog/machine_learning/?source=post_page-----885aa35db58b----------------------)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "rvwp5UHdBiup"
+ },
+ "source": [
+ "___\n",
+ "# **NOSSO FOCO AQUI SERÁ...**\n",
+ "\n",
+ "\n",
+ "\n",
+ "Source: [Machine Learning for Everyone](https://vas3k.com/blog/machine_learning/?source=post_page-----885aa35db58b----------------------)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "cBLSvJTXHBjK"
+ },
+ "source": [
+ "___\n",
+ "# **CHEETSHEET**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ZdjR3nahUuKq"
+ },
+ "source": [
+ "\n",
+ "\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "MkBSvyorGXQz"
+ },
+ "source": [
+ "___\n",
+ "# **CROSS-VALIDATION**\n",
+ "* K-fold é o método de Cross-Validation (CV) mais conhecido e utilizado;\n",
+ "* Como funciona: divide o dataframe de treinamento em k partes;\n",
+ " * Usa k-1 partes para treinar o modelo e o restante para validar o modelo;\n",
+ " * repete este processo k vezes, sendo que em cada iteração calcula as métricas desejadas;\n",
+ " * Ao final das k iterações, teremos k métricas das quais calculamos média e desvio-padrão.\n",
+ "\n",
+ " A figura abaixo nos ajuda a entender como funciona CV:\n",
+ "\n",
+ "\n",
+ "\n",
+ "Source: [5 Reasons why you should use Cross-Validation in your Data Science Projects](https://towardsdatascience.com/5-reasons-why-you-should-use-cross-validation-in-your-data-science-project-8163311a1e79)\n",
+ "\n",
+ "* **valor de k**:\n",
+ " * valor de k (folds): entre 5 e 10 --> Não há regra geral para a escolha de k;\n",
+ " * Quanto maior o valor de k, menor o viés do CV;\n",
+ "\n",
+ "[Applied Predictive Modeling, 2013](https://www.amazon.com/Applied-Predictive-Modeling-Max-Kuhn/dp/1461468485/ref=as_li_ss_tl?ie=UTF8&qid=1520380699&sr=8-1&keywords=applied+predictive+modeling&linkCode=sl1&tag=inspiredalgor-20&linkId=1af1f3de89c11e4a7fd49de2b05e5ebf)."
+ ]
+ },
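+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "md_cv_sketch"
+ },
+ "source": [
+ "A minimal sketch of how k-fold CV splits the data and aggregates one score per fold. It assumes only scikit-learn; the names `X_demo`, `y_demo` and `kf` below are illustrative and are not used elsewhere in this notebook."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "code_cv_sketch"
+ },
+ "source": [
+ "# Minimal k-fold CV sketch (illustrative names, independent of the rest of the notebook)\n",
+ "from sklearn.datasets import make_classification\n",
+ "from sklearn.model_selection import KFold, cross_val_score\n",
+ "from sklearn.tree import DecisionTreeClassifier\n",
+ "\n",
+ "X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)\n",
+ "\n",
+ "kf = KFold(n_splits=5, shuffle=True, random_state=0)\n",
+ "for train_idx, valid_idx in kf.split(X_demo):\n",
+ "    # k-1 folds are used for training, the remaining fold for validation\n",
+ "    print(len(train_idx), len(valid_idx))\n",
+ "\n",
+ "# cross_val_score runs the loop above and returns one score per fold\n",
+ "scores = cross_val_score(DecisionTreeClassifier(random_state=0), X_demo, y_demo, cv=kf)\n",
+ "print(scores.mean(), scores.std())"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },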
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "HscfN-a1V043"
+ },
+ "source": [
+ "* **Vantagens do uso de CV**:\n",
+ " * Modelos com melhor acurácia;\n",
+ " * Melhor uso dos dados, pois todos os dados são utilizados como treinamento e validação. Portanto, qualquer problema com os dados serão encontrados nesta fase.\n",
+ "\n",
+ "* **Leitura Adicional**\n",
+ " * [Cross-Validation in Machine Learning](https://towardsdatascience.com/cross-validation-in-machine-learning-72924a69872f)\n",
+ " * [5 Reasons why you should use Cross-Validation in your Data Science Projects](https://towardsdatascience.com/5-reasons-why-you-should-use-cross-validation-in-your-data-science-project-8163311a1e79)\n",
+ " * [Cross-validation: evaluating estimator performance](https://scikit-learn.org/stable/modules/cross_validation.html)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "XRukccWQSklx"
+ },
+ "source": [
+ "## Medidas para avaliarmos a variabilidade presente nos dados\n",
+ "* As principais medidas para medirmos a variabilidade dos dados são amplitude, variância, desvio padrão e coeficiente de variação;\n",
+ "* Estas medidas nos permite concluir se os dados são homogêneos (menor dispersão/variabilidade) ou heterogêneos (maior variabilidade/dispersão).\n",
+ "\n",
+ "* **Na próxima versão, trazer estes conceitos para o Notebook e usar o Python para calcular estas medidas**."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "yBR8tWV_lhQq"
+ },
+ "source": [
+ "___\n",
+ "# **ENSEMBLE METHODS** (= Combinar modelos preditivos)\n",
+ "* Métodos\n",
+ " * **Bagging** (Bootstrap AGGregatING)\n",
+ " * **Boosting**\n",
+ " * Stacking --> Não é muito utilizado\n",
+ "* Evita overfitting (Overfitting é quando o modelo/função se ajusta muito bem ao dados de treinamento, sendo ineficiente para generalizar para outras amostras/população).\n",
+ "* Constroi meta-classificadores: combinar os resultados de vários algoritmos para produzir previsões mais precisas e robustas do que as previsões de cada classificador individual.\n",
+ "* Ensemble reduz/minimiza os efeitos das principais causas de erros nos modelos de Machine Learning:\n",
+ " * ruído;\n",
+ " * bias (viés);\n",
+ " * variância --> Principal medida para medir a variabilidade presente nos dados.\n",
+ "\n",
+ "# Referências\n",
+ "* [Simple guide for ensemble learning methods](https://towardsdatascience.com/simple-guide-for-ensemble-learning-methods-d87cc68705a2) - Explica didaticamente como funcionam ensembes."
+ ]
+ },
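+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "md_bagging_sketch"
+ },
+ "source": [
+ "A minimal sketch, assuming scikit-learn's `BaggingClassifier`, of how averaging many trees trained on bootstrap samples can stabilize predictions compared with a single tree. The data and parameter values are illustrative, not recommendations."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "code_bagging_sketch"
+ },
+ "source": [
+ "# Minimal bagging sketch (illustrative data and parameters)\n",
+ "from sklearn.datasets import make_classification\n",
+ "from sklearn.ensemble import BaggingClassifier\n",
+ "from sklearn.model_selection import cross_val_score\n",
+ "from sklearn.tree import DecisionTreeClassifier\n",
+ "\n",
+ "X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)\n",
+ "\n",
+ "single_tree = DecisionTreeClassifier(random_state=0)\n",
+ "# 50 trees, each trained on a bootstrap sample, with predictions combined by voting\n",
+ "bagged_trees = BaggingClassifier(DecisionTreeClassifier(random_state=0), n_estimators=50, random_state=0)\n",
+ "\n",
+ "print('single tree :', cross_val_score(single_tree, X_demo, y_demo, cv=5).mean())\n",
+ "print('bagged trees:', cross_val_score(bagged_trees, X_demo, y_demo, cv=5).mean())"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },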
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "25RW8u-Sj780"
+ },
+ "source": [
+ "### Leitura Adicional\n",
+ "* [Ensemble methods: bagging, boosting and stacking](https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205)\n",
+ "* [Ensemble Methods in Machine Learning: What are They and Why Use Them?](https://towardsdatascience.com/ensemble-methods-in-machine-learning-what-are-they-and-why-use-them-68ec3f9fef5f)\n",
+ "* [Ensemble Learning Using Scikit-learn](https://towardsdatascience.com/ensemble-learning-using-scikit-learn-85c4531ff86a)\n",
+ "* [Let’s Talk About Machine Learning Ensemble Learning In Python](https://medium.com/fintechexplained/lets-talk-about-machine-learning-ensemble-learning-in-python-382747e5fba8)\n",
+ "* [Boosting, Bagging, and Stacking — Ensemble Methods with sklearn and mlens](https://medium.com/@rrfd/boosting-bagging-and-stacking-ensemble-methods-with-sklearn-and-mlens-a455c0c982de)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "FugME1HSl4jJ"
+ },
+ "source": [
+ "___\n",
+ "# **PARAMETER TUNNING** (= Parâmetros ótimos dos modelos de Machine Learning)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "u_147cIRl9F1"
+ },
+ "source": [
+ "## GridSearch (Ferramenta ou meio que vamos utilizar para otimização dos parâmetros dos modelos de ML)\n",
+ "* Encontra os parâmetros ótimos (hyperparameter tunning) que melhoram a acurácia dos modelos.\n",
+ "* Necessita dos seguintes inputs:\n",
+ " * A matrix $X_{p}$ com as $p$ COLUNAS (variáveis ou atributos) do dataframe;\n",
+ " * A matriz $y_{p}$ com a COLUNA-target (vaiável resposta);\n",
+ " * Exemplo: DecisionTree, RandomForestClassifier, XGBoostClassificer e etc;\n",
+ " * Um dicionário com os parâmetros a serem otimizados;\n",
+ " * O número de folds para o método de Cross-validation."
+ ]
+ },
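+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "md_gridsearch_sketch"
+ },
+ "source": [
+ "A minimal sketch of the inputs listed above, assuming scikit-learn's `GridSearchCV`; the estimator, the parameter grid and the synthetic data are illustrative only."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "code_gridsearch_sketch"
+ },
+ "source": [
+ "# Minimal GridSearchCV sketch (illustrative grid, not a recommendation)\n",
+ "from sklearn.datasets import make_classification\n",
+ "from sklearn.model_selection import GridSearchCV\n",
+ "from sklearn.tree import DecisionTreeClassifier\n",
+ "\n",
+ "X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)\n",
+ "\n",
+ "# dictionary with the parameters to be optimized\n",
+ "param_grid = {'max_depth': [3, 5, None], 'min_samples_split': [2, 10, 50]}\n",
+ "\n",
+ "# estimator + grid + number of CV folds\n",
+ "grid = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)\n",
+ "grid.fit(X_demo, y_demo)\n",
+ "\n",
+ "print(grid.best_params_)\n",
+ "print(grid.best_score_)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },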
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "39Sg77fbTWCO"
+ },
+ "source": [
+ "___\n",
+ "# **MODEL SELECTION & EVALUATION**\n",
+ "> Nesta fase identificamos e aplicamos as melhores métricas (Accuracy, Sensitivity, Specificity, F-Score, AUC, R-Sq, Adj R-SQ, RMSE (Root Mean Square Error)) para avaliar o desempenho/acurácia/performance dos modelos de ML.\n",
+ ">> Treinamos os modelos de ML usando a amostra de treinamento e avaliamos o desempenho/acurácia/performance na amostra de teste/validação.\n",
+ "\n",
+ "* Leitura Adicional\n",
+ " * [The 5 Classification Evaluation metrics every Data Scientist must know](https://towardsdatascience.com/the-5-classification-evaluation-metrics-you-must-know-aa97784ff226)\n",
+ " * [Confusion matrix and other metrics in machine learning](https://medium.com/hugo-ferreiras-blog/confusion-matrix-and-other-metrics-in-machine-learning-894688cb1c0a)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "oQQVzZ2ZTYrB"
+ },
+ "source": [
+ "## Confusion Matrix\n",
+ "* Termos associados à Confusion Matrix:\n",
+ " * **Verdadeiro Positivo** (TP = True Positive): Quando o valor observado é True e o modelo estima como True. Ou seja, o modelo acertou na estimativa.\n",
+ " * Exemplo: **Observado**: Fraude (Positive); **Modelo**: Fraude (Positive) --> Modelo acertou!\n",
+ " * **Verdadeiro Negativo** (TN = True Negative): Quando o valor observado é False e o modelo estima como False. Ou seja, o modelo acertou na estimativa;\n",
+ " * Exemplo: **Observado**: NÃO-Fraude (Negative); **Modelo**: NÃO-Fraude (Negative) --> Modelo acertou!\n",
+ " * **Falso Positivo** (FP = False Positive): Quando o valor observado é False e o modelo estima como True. Ou seja, o modelo errou na estimativa. \n",
+ " * Exemplo: **Observado**: NÃO-Fraude (Negative); **Modelo**: Fraude (Positive) --> Modelo errou!\n",
+ " * **Falso Negativo** (FN = False Negative): Quando o valor observado é True e o modelo estima como False.\n",
+ " * Exemplo: **Observado**: Fraude (Positive); **Modelo**: NÃO-Fraude (Negative) --> Modelo errou!\n",
+ "\n",
+ "* Consulte [Confusion matrix](https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html#sphx-glr-auto-examples-model-selection-plot-confusion-matrix-py)\n",
+ "\n",
+ "\n",
+ "\n",
+ "Source: [Confusion Matrix](https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781838555078/6/ch06lvl1sec34/confusion-matrix)\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ci-6eiqBTgbL"
+ },
+ "source": [
+ "## Accuracy\n",
+ "> Accuracy - é o número de previsões corretas feitas pelo modelo.\n",
+ "\n",
+ "Responde à seguinte pergunta:\n",
+ "\n",
+ "```\n",
+ "Com que frequência o classificador (modelo preditivo) classifica corretamente?\n",
+ "```\n",
+ "\n",
+ "$$Accuracy= \\frac{TP+TN}{TP+TN+FP+FN}$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "F7YI8X5TRx-R"
+ },
+ "source": [
+ "## Precision (ou Specificity)\n",
+ "> **Precision** - fornece informações sobre o desempenho em relação a Falsos Positivos (quantos capturamos).\n",
+ "\n",
+ "Responde à seguinte pergunta:\n",
+ "\n",
+ "```\n",
+ "Com relação ao resultado Positivo, com que frequência o classificador está correto?\n",
+ "```\n",
+ "\n",
+ "\n",
+ "$$Precision= \\frac{TP}{TP+FP}$$\n",
+ "\n",
+ "**Exemplo**: Precison nos dirá a proporção de clientes que o modelo estimou como sendo Fraude quando, na verdade, são fraude.\n",
+ "\n",
+ "**Comentário**: Se nosso foco é minimizar Falso Negativos (FN), então precisamos nos esforçar para termos Recall próximo de 100%."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "zO39n8x_Sz3L"
+ },
+ "source": [
+ "## Recall (ou Sensitivity)\n",
+ "> **Recall** - nos fornece informações sobre o desempenho de um classificador em relação a Falsos Negativos (quantos perdemos).\n",
+ "\n",
+ "Responde à seguinte pergunta:\n",
+ "\n",
+ "```\n",
+ "Quando o valor observado é Positivo, com que frequência o classificador está correto?\n",
+ "```\n",
+ "\n",
+ "$$Recall = Sensitivity = \\frac{TP}{TP+FN}$$\n",
+ "\n",
+ "**Exemplo**: Recall é a proporção de clientes observados como Fraude e que o modelo estima como Fraude.\n",
+ "\n",
+ "**Comentário**: Se nosso foco for minimizar Falso Positivos (FP), então precisamos nos esforçar para fazer Precision mais próximo de 100% possível."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "htS6rdHVVXRG"
+ },
+ "source": [
+ "## Specificity\n",
+ "> **Specificity** - proporção de TN por TN+FP.\n",
+ "\n",
+ "Responde à seguinte pergunta:\n",
+ "\n",
+ "```\n",
+ "Quando o valor observado é Negativo, com que frequência o classificador está correto?\n",
+ "```\n",
+ "\n",
+ "**Exemplo**: Specificity é a proporção de clientes NÃO-Fraude que o modelo estima como NÃO-Fraude.\n",
+ "\n",
+ "$$Specificity= \\frac{TN}{TN+FP}$$\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "mNn0twadTacc"
+ },
+ "source": [
+ "## F1-Score\n",
+ "> F1-Score é a média harmônica entre Recall e Precision e é um número entre 0 e 1. Quanto mais próximo de 1, melhor. Quanto mais próximo de 0, pior. Ou seja, é um equilíbrio entre Recall e Precision.\n",
+ "\n",
+ "$$F1\\_Score= 2\\left(\\frac{Recall*Precision}{Recall+Precision}\\right)$$"
+ ]
+ },
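+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "md_metrics_sketch"
+ },
+ "source": [
+ "A minimal sketch that computes the metrics defined above with scikit-learn on made-up labels; `y_true` and `y_pred` are illustrative and are not taken from the fraud example."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "code_metrics_sketch"
+ },
+ "source": [
+ "# Toy example of the metrics above; y_true / y_pred are made-up labels\n",
+ "from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, precision_score, recall_score\n",
+ "\n",
+ "y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]\n",
+ "y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]\n",
+ "\n",
+ "tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()\n",
+ "\n",
+ "print('Accuracy   :', accuracy_score(y_true, y_pred))   # (TP+TN)/(TP+TN+FP+FN)\n",
+ "print('Precision  :', precision_score(y_true, y_pred))  # TP/(TP+FP)\n",
+ "print('Recall     :', recall_score(y_true, y_pred))     # TP/(TP+FN)\n",
+ "print('Specificity:', tn / (tn + fp))                    # TN/(TN+FP)\n",
+ "print('F1-Score   :', f1_score(y_true, y_pred))          # harmonic mean of Precision and Recall"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },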
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "rsH9dMxazWCg"
+ },
+ "source": [
+ "# **DATAFRAME-EXEMPLO USADO NESTE TUTORIAL**\n",
+ "> Gerar um dataframe com 18 colunas, sendo 9 informativas, 6 redundantes e 3 repetidas:\n",
+ "\n",
+ "Para saber mais sobre a geração de dataframes-exemplo (toy), consulte [Synthetic data generation — a must-have skill for new data scientists](https://towardsdatascience.com/synthetic-data-generation-a-must-have-skill-for-new-data-scientists-915896c0c1ae)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "GEyDo_EIV_jV"
+ },
+ "source": [
+ "## Definir variáveis globais"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "TdwgpZ76WFaT"
+ },
+ "source": [
+ "i_CV = 10 # Número de Cross-Validations\n",
+ "i_Seed = 20111974 # semente por questões de reproducibilidade\n",
+ "f_Test_Size = 0.3 # Proporção do dataframe de validação (outros valores poderiam ser 0.15, 0.20 ou 0.25)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "gJTJfpwWzykS"
+ },
+ "source": [
+ "from sklearn.datasets import make_classification\n",
+ "\n",
+ "X, y = make_classification(n_samples = 1000, \n",
+ " n_features = 18, \n",
+ " n_informative = 9, \n",
+ " n_redundant = 6, \n",
+ " n_repeated = 3, \n",
+ " n_classes = 2, \n",
+ " n_clusters_per_class = 1, \n",
+ " random_state=i_Seed)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "gWy2IZh3s-o3"
+ },
+ "source": [
+ "X"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "ccjhGnzxtAaV"
+ },
+ "source": [
+ "y[0:30] # Semelhante aos casos de fraude: {0, 1}"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "OHO2befKJxR3"
+ },
+ "source": [
+ "___\n",
+ "# **DECISION TREE**\n",
+ "> Decision Trees possuem estrutura em forma de árvores.\n",
+ "\n",
+ "* **Principais Vantagens**:\n",
+ " * São algoritmos fáceis de entender, visualizar e interpretar;\n",
+ " * Captura facilmente padrões não-lineares presentes nos dados;\n",
+ " * Requer pouco poder computacional --> Treinar Decision Trees não requer tanto recurso computacional!\n",
+ " * Lida bem com COLUNAS numéricas ou categóricas;\n",
+ " * Não requer os dados sejam normalizados;\n",
+ " * Pode ser utilizado como Feature Engineering ao lidar com Missing Values;\n",
+ " * Pode ser utilizado como Feature Selection;\n",
+ " * Não requer suposições sobre a distribuição dos dados por causa da natureza não-paramétrica do algoritmo\n",
+ "\n",
+ "* **Principais desvantagens**\n",
+ " * Propenso a Overfitting, pois Decision Trees podem construir árvores complexas que não sejam capazes de generalizar bem os dados. As coisas complicam muito se a amostra de treinamento possuir outliers. Portanto, **recomenda-se fortemente a tratar os outliers previamente**.\n",
+ " * Pode criar árvores viesadas se tivermos um dataframe não-balanceado ou que alguma classe seja dominante. Por conta disso, **recomenda-se balancear o dataframe previamente para se evitar esse problema**.\n",
+ "\n",
+ "* **Principais parâmetros**\n",
+ " * **Gini Index** - é uma métrica que mede a frequência com que um ponto/observação aleatoriamente selecionado seria incorretamente identificado.\n",
+ " * Portanto, quanto menor o valor de Gini Index, melhor a COLUNA;\n",
+ " * **Entropy** - é uma métrica que mede aleatoriedade da informação presente nos dados.\n",
+ " * Portanto, quanto maior a entropia da COLUNA, pior ela se torna para nos ajudar a tomar uma conclusão (classificar, por exemplo).\n",
+ "\n",
+ "## **Referências**:\n",
+ "* [1.10. Decision Trees](https://scikit-learn.org/stable/modules/tree.html).\n",
+ "* [Decision Tree Algorithm With Hands On Example](https://medium.com/datadriveninvestor/decision-tree-algorithm-with-hands-on-example-e6c2afb40d38) - ótimo tutorial para aprender, entender, interpretar e calcular os índices de Gini e entropia.\n",
+ "* [Intuitive Guide to Understanding Decision Trees](https://towardsdatascience.com/light-on-math-machine-learning-intuitive-guide-to-understanding-decision-trees-adb2165ccab7) - ótimo tutorial para aprender, entender, interpretar e calcular os índices de Gini e entropia.\n",
+ "* [The Complete Guide to Decision Trees](https://towardsdatascience.com/the-complete-guide-to-decision-trees-28a4e3c7be14)\n",
+ "* [Creating and Visualizing Decision Tree Algorithm in Machine Learning Using Sklearn](https://intellipaat.com/blog/decision-tree-algorithm-in-machine-learning/) - Muito didático!\n",
+ "* [Decision Trees in Machine Learning](https://towardsdatascience.com/decision-trees-in-machine-learning-641b9c4e8052)\n",
+ "\n",
+ "\n"
+ ]
+ },
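+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "md_impurity_sketch"
+ },
+ "source": [
+ "A minimal sketch, using made-up class counts, of how the Gini index and entropy quantify node impurity: a pure node scores 0, an evenly mixed node scores the maximum."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "code_impurity_sketch"
+ },
+ "source": [
+ "# Minimal Gini / entropy sketch; the class counts below are made up\n",
+ "import numpy as np\n",
+ "\n",
+ "def gini(counts):\n",
+ "    p = np.array(counts) / np.sum(counts)\n",
+ "    return 1.0 - np.sum(p ** 2)\n",
+ "\n",
+ "def entropy(counts):\n",
+ "    p = np.array(counts) / np.sum(counts)\n",
+ "    p = p[p > 0]\n",
+ "    return -np.sum(p * np.log2(p))\n",
+ "\n",
+ "# A pure node (all one class) vs an evenly mixed node\n",
+ "print(gini([10, 0]), entropy([10, 0]))   # 0.0, 0.0 -> best possible split node\n",
+ "print(gini([5, 5]), entropy([5, 5]))     # 0.5, 1.0 -> worst case for a binary node"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },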
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "FrMkPN5aLp0Y"
+ },
+ "source": [
+ "## Carregar as bibliotecas"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "FVU1CM0PKgO4"
+ },
+ "source": [
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "import seaborn as sns\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "import warnings\n",
+ "warnings.filterwarnings(\"ignore\")"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "15clh4XrISpz"
+ },
+ "source": [
+ "## Carregar/Ler os dados"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "UMPL46w2IWJw"
+ },
+ "source": [
+ "l_colunas = ['v1', 'v2', 'v3', 'v4', 'v5', 'v6', 'v7', 'v8', 'v9', 'v10', 'v11', 'v12', 'v13', 'v14', 'v15', 'v16', 'v17', 'v18']\n",
+ "\n",
+ "df_X = pd.DataFrame(X, columns = l_colunas)\n",
+ "df_y = pd.DataFrame(y, columns = ['target'])"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "MFaQF2MGFl_M"
+ },
+ "source": [
+ "df_X.head()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "s-ibdD2ZG7tm"
+ },
+ "source": [
+ "df_X.shape"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "f9cqRaywa_TR"
+ },
+ "source": [
+ "set(df_y['target'])"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "BN6jbpn6Iwmu"
+ },
+ "source": [
+ "## Estatísticas Descritivas básicas do dataframe - df.describe()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "KlwhxxUNIyYs"
+ },
+ "source": [
+ "df_X.describe()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "N_QhFqyZOKFB"
+ },
+ "source": [
+ "## Selecionar as amostras de treinamento e validação\n",
+ "\n",
+ "* Dividir os dados/amostra em:\n",
+ " * **Amostra de treinamento**: usado para treinar o modelo e otimizar os hiperparâmetros;\n",
+ " * **Amostra de teste**: usado para verificar se o modelo otimizado funciona em dados totalmente desconhecidos. É nesta amostra de teste que avaliamos a performance do modelo em termos de generalização (trabalhar com dados que não lhe foi apresentado);\n",
+ "* Geralmente usamos 70% da amostra para treinamento e 30% validação. Outras opções são usar os percentuais 80/20 ou 75/25 (default).\n",
+ "* Consulte [sklearn.model_selection.train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) para mais detalhes.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "8sKBgs-QOOfn"
+ },
+ "source": [
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "X_train, X_test, y_train, y_test = train_test_split(df_X, df_y, test_size = f_Test_Size, random_state = i_Seed)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "TPTKBBHgOpoA",
+ "outputId": "3c8ab56e-2746-4310-df58-9b16986b9413",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ }
+ },
+ "source": [
+ "X_train.shape"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "(700, 18)"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 15
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "lEn_LLs2OtRI",
+ "outputId": "7e53d785-2595-4ba6-c229-ac02b99d3c55",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ }
+ },
+ "source": [
+ "y_train.shape"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "(700, 1)"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 16
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "_uAw8EcyOvrG",
+ "outputId": "00356053-c127-40d1-8bdd-d769af9ef0e2",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ }
+ },
+ "source": [
+ "X_test.shape"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "(300, 18)"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 17
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "A2LYI-9hOyXI",
+ "outputId": "b4f4b728-0bee-435e-e697-27768787d43e",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ }
+ },
+ "source": [
+ "y_test.shape"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "(300, 1)"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 18
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "npgoBSX2dd4l"
+ },
+ "source": [
+ "## Treinar o algoritmo com os dados de treinamento\n",
+ "### Carregar os algoritmos/libraries"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "hcvzrtolGfnQ",
+ "outputId": "b0d2ab18-7386-461b-d5f5-8e1880496244",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 68
+ }
+ },
+ "source": [
+ "!pip install graphviz\n",
+ "!pip install pydotplus"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Requirement already satisfied: graphviz in /usr/local/lib/python3.6/dist-packages (0.10.1)\n",
+ "Requirement already satisfied: pydotplus in /usr/local/lib/python3.6/dist-packages (2.0.2)\n",
+ "Requirement already satisfied: pyparsing>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from pydotplus) (2.4.7)\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "v_pF-HH3JKL2"
+ },
+ "source": [
+ "from sklearn.metrics import accuracy_score # para medir a acurácia do modelo preditivo\n",
+ "#from sklearn.model_selection import train_test_split\n",
+ "#from sklearn.metrics import classification_report\n",
+ "from sklearn.metrics import confusion_matrix # para plotar a confusion matrix\n",
+ "\n",
+ "from sklearn.model_selection import GridSearchCV # para otimizar os parâmetros dos modelos preditivos\n",
+ "from sklearn.model_selection import cross_val_score\n",
+ "from time import time\n",
+ "from operator import itemgetter\n",
+ "from scipy.stats import randint\n",
+ "\n",
+ "from sklearn.tree import export_graphviz\n",
+ "from sklearn.externals.six import StringIO \n",
+ "from IPython.display import Image \n",
+ "import pydotplus\n",
+ "\n",
+ "np.set_printoptions(suppress=True)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "9ROlyvgij2yl"
+ },
+ "source": [
+ "Função para plotar a Confusion Matrix extraído de [Confusion Matrix Visualization](https://medium.com/@dtuk81/confusion-matrix-visualization-fc31e3f30fea)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "klQ0FLOIgeX1"
+ },
+ "source": [
+ "def mostra_confusion_matrix(cf, \n",
+ " group_names = None, \n",
+ " categories = 'auto', \n",
+ " count = True, \n",
+ " percent = True, \n",
+ " cbar = True, \n",
+ " xyticks = False, \n",
+ " xyplotlabels = True, \n",
+ " sum_stats = True, figsize = (8, 8), \n",
+ " cmap = 'Blues'):\n",
+ " '''\n",
+ " This function will make a pretty plot of an sklearn Confusion Matrix cm using a Seaborn heatmap visualization.\n",
+ " Arguments\n",
+ " ---------\n",
+ " cf: confusion matrix to be passed in\n",
+ " group_names: List of strings that represent the labels row by row to be shown in each square.\n",
+ " categories: List of strings containing the categories to be displayed on the x,y axis. Default is 'auto'\n",
+ " count: If True, show the raw number in the confusion matrix. Default is True.\n",
+ " normalize: If True, show the proportions for each category. Default is True.\n",
+ " cbar: If True, show the color bar. The cbar values are based off the values in the confusion matrix.\n",
+ " Default is True.\n",
+ " xyticks: If True, show x and y ticks. Default is True.\n",
+ " xyplotlabels: If True, show 'True Label' and 'Predicted Label' on the figure. Default is True.\n",
+ " sum_stats: If True, display summary statistics below the figure. Default is True.\n",
+ " figsize: Tuple representing the figure size. Default will be the matplotlib rcParams value.\n",
+ " cmap: Colormap of the values displayed from matplotlib.pyplot.cm. Default is 'Blues'\n",
+ " See http://matplotlib.org/examples/color/colormaps_reference.html\n",
+ " '''\n",
+ "\n",
+ " # CODE TO GENERATE TEXT INSIDE EACH SQUARE\n",
+ " blanks = ['' for i in range(cf.size)]\n",
+ "\n",
+ " if group_names and len(group_names)==cf.size:\n",
+ " group_labels = [\"{}\\n\".format(value) for value in group_names]\n",
+ " else:\n",
+ " group_labels = blanks\n",
+ "\n",
+ " if count:\n",
+ " group_counts = [\"{0:0.0f}\\n\".format(value) for value in cf.flatten()]\n",
+ " else:\n",
+ " group_counts = blanks\n",
+ "\n",
+ " if percent:\n",
+ " group_percentages = [\"{0:.2%}\".format(value) for value in cf.flatten()/np.sum(cf)]\n",
+ " else:\n",
+ " group_percentages = blanks\n",
+ "\n",
+ " box_labels = [f\"{v1}{v2}{v3}\".strip() for v1, v2, v3 in zip(group_labels,group_counts,group_percentages)]\n",
+ " box_labels = np.asarray(box_labels).reshape(cf.shape[0],cf.shape[1])\n",
+ "\n",
+ " # CODE TO GENERATE SUMMARY STATISTICS & TEXT FOR SUMMARY STATS\n",
+ " if sum_stats:\n",
+ " #Accuracy is sum of diagonal divided by total observations\n",
+ " accuracy = np.trace(cf) / float(np.sum(cf))\n",
+ "\n",
+ " #if it is a binary confusion matrix, show some more stats\n",
+ " if len(cf)==2:\n",
+ " #Metrics for Binary Confusion Matrices\n",
+ " precision = cf[1,1] / sum(cf[:,1])\n",
+ " recall = cf[1,1] / sum(cf[1,:])\n",
+ " f1_score = 2*precision*recall / (precision + recall)\n",
+ " stats_text = \"\\n\\nAccuracy={:0.3f}\\nPrecision={:0.3f}\\nRecall={:0.3f}\\nF1 Score={:0.3f}\".format(accuracy,precision,recall,f1_score)\n",
+ " else:\n",
+ " stats_text = \"\\n\\nAccuracy={:0.3f}\".format(accuracy)\n",
+ " else:\n",
+ " stats_text = \"\"\n",
+ "\n",
+ " # SET FIGURE PARAMETERS ACCORDING TO OTHER ARGUMENTS\n",
+ " if figsize==None:\n",
+ " #Get default figure size if not set\n",
+ " figsize = plt.rcParams.get('figure.figsize')\n",
+ "\n",
+ " if xyticks==False:\n",
+ " #Do not show categories if xyticks is False\n",
+ " categories=False\n",
+ "\n",
+ " # MAKE THE HEATMAP VISUALIZATION\n",
+ " plt.figure(figsize=figsize)\n",
+ " sns.heatmap(cf,annot=box_labels,fmt=\"\",cmap=cmap,cbar=cbar,xticklabels=categories,yticklabels=categories)\n",
+ "\n",
+ " if xyplotlabels:\n",
+ " plt.ylabel('True label')\n",
+ " plt.xlabel('Predicted label' + stats_text)\n",
+ " else:\n",
+ " plt.xlabel(stats_text)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "YJMS9ePQ6B6t"
+ },
+ "source": [
+ "**Atenção**: Para evitar overfitting nos algoritmos DecisionTreeClassifier, considere min_samples_split = 2 como default."
+ ]
+ },
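+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "md_overfit_sketch"
+ },
+ "source": [
+ "A minimal sketch, on synthetic data, of how constraining `min_samples_split` and `max_depth` narrows the gap between training accuracy and cross-validated accuracy; the values shown are illustrative, not tuned."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "code_overfit_sketch"
+ },
+ "source": [
+ "# Sketch (illustrative data): limiting tree growth usually narrows the train/CV accuracy gap\n",
+ "from sklearn.datasets import make_classification\n",
+ "from sklearn.model_selection import cross_val_score\n",
+ "from sklearn.tree import DecisionTreeClassifier\n",
+ "\n",
+ "X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)\n",
+ "\n",
+ "for params in [{'min_samples_split': 2}, {'min_samples_split': 50, 'max_depth': 5}]:\n",
+ "    tree = DecisionTreeClassifier(random_state=0, **params)\n",
+ "    train_acc = tree.fit(X_demo, y_demo).score(X_demo, y_demo)  # accuracy on the data it was trained on\n",
+ "    cv_acc = cross_val_score(tree, X_demo, y_demo, cv=5).mean()  # accuracy estimated by cross-validation\n",
+ "    print(params, 'train:', round(train_acc, 3), 'cv:', round(cv_acc, 3))"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },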
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "nNeRHYePJc-r"
+ },
+ "source": [
+ "from sklearn.tree import DecisionTreeClassifier # Library para Decision Tree (Classificação)\n",
+ "\n",
+ "# Instancia com os parâmetros sugeridos para se evitar overfitting:\n",
+ "ml_DT= DecisionTreeClassifier(criterion = 'gini', \n",
+ " splitter = 'best', \n",
+ " max_depth = None, \n",
+ " min_samples_split = 2, \n",
+ " min_samples_leaf = 1, \n",
+ " min_weight_fraction_leaf = 0.0, \n",
+ " max_features = None, \n",
+ " random_state = i_Seed, \n",
+ " max_leaf_nodes = None, \n",
+ " min_impurity_decrease = 0.0, \n",
+ " min_impurity_split = None, \n",
+ " class_weight = None, \n",
+ " presort = False)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "gVLZznprx2YX",
+ "outputId": "956487e9-beb3-4638-c305-786d7e06c0c0",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 120
+ }
+ },
+ "source": [
+ "# Objeto configurado\n",
+ "ml_DT"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',\n",
+ " max_depth=None, max_features=None, max_leaf_nodes=None,\n",
+ " min_impurity_decrease=0.0, min_impurity_split=None,\n",
+ " min_samples_leaf=1, min_samples_split=2,\n",
+ " min_weight_fraction_leaf=0.0, presort=False,\n",
+ " random_state=None, splitter='best')"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 30
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "OgAHfXVo-Nw8",
+ "outputId": "10fed276-0cf3-4149-e5d1-784e736a2841",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 120
+ }
+ },
+ "source": [
+ "# Treina o algoritmo: fit(df)\n",
+ "ml_DT.fit(X_train, y_train)"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',\n",
+ " max_depth=None, max_features=None, max_leaf_nodes=None,\n",
+ " min_impurity_decrease=0.0, min_impurity_split=None,\n",
+ " min_samples_leaf=1, min_samples_split=2,\n",
+ " min_weight_fraction_leaf=0.0, presort=False,\n",
+ " random_state=None, splitter='best')"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 33
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "ohmGCDpfyhvV",
+ "outputId": "fee641eb-64d0-4072-874c-f704c6a70cfe",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ }
+ },
+ "source": [
+ "i_CV"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "10"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 24
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "6exa9D8R2fDJ",
+ "outputId": "5bfc98af-bd00-440d-b504-ab499254c533",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 51
+ }
+ },
+ "source": [
+ "# Cross-Validation com 10 folds\n",
+ "a_scores_CV = cross_val_score(ml_DT, X_train, y_train, cv = i_CV)\n",
+ "\n",
+ "print(f'Média das Acurácias calculadas pelo CV....: {100*round(a_scores_CV.mean(),4)}')\n",
+ "print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Média das Acurácias calculadas pelo CV....: 91.43\n",
+ "std médio das Acurácias calculadas pelo CV: 3.8899999999999997\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Uxoplcea0byV",
+ "outputId": "578c5e51-c311-4cdf-c5ad-0de8fedd4e17",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 51
+ }
+ },
+ "source": [
+ "a_scores_CV # array com os scores a cada iteração do CV"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "array([0.87142857, 0.98571429, 0.85714286, 0.91428571, 0.9 ,\n",
+ " 0.95714286, 0.91428571, 0.92857143, 0.87142857, 0.94285714])"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 36
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "y3k-PcbN0o_i",
+ "outputId": "0334a08d-8d2b-4687-ccda-65c6eac86759",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ }
+ },
+ "source": [
+ "a_scores_CV.mean()"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "0.9142857142857144"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 37
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "6_rYker2gzeG"
+ },
+ "source": [
+ "**Interpretação**: Nosso classificador (DecisionTreeClassifier) tem uma acurácia média de 91,43% (base de treinamento). Além disso, o std é da ordem de 3,66%, ou seja, pequena. Vamos tentar melhorar a acurácia do classificador usando parameter tunning (GridSearchCV)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "tkwchmkP3p_A",
+ "outputId": "8b157dfc-f416-49d2-d185-3cf8ebfa13b0",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 51
+ }
+ },
+ "source": [
+ "print(f'Acurácias: {a_scores_CV}')"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Acurácias: [0.87142857 0.98571429 0.85714286 0.91428571 0.9 0.95714286\n",
+ " 0.91428571 0.92857143 0.87142857 0.94285714]\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "sI31WkZs2ht_"
+ },
+ "source": [
+ "# Faz predições...\n",
+ "y_pred = ml_DT.predict(X_test)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "rfapj3OG13PG",
+ "outputId": "af6e5144-5cdb-4017-885e-e398508d9cf5",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 51
+ }
+ },
+ "source": [
+ "y_pred[0:30]"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "array([1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0,\n",
+ " 1, 0, 0, 1, 1, 0, 1, 1])"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 40
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "sc88ofqh16RT",
+ "outputId": "4c2d7859-fa1a-4ecb-ea61-9ec399e439de",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 51
+ }
+ },
+ "source": [
+ "y[0:30]"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "array([1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1,\n",
+ " 1, 1, 0, 1, 0, 1, 0, 1])"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 41
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "fSaVzJ9xFpwW",
+ "outputId": "12eb1946-18c6-4369-af9d-916b5a0fc42d",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 538
+ }
+ },
+ "source": [
+ "# Confusion Matrix\n",
+ "cf_matrix = confusion_matrix(y_test, y_pred)\n",
+ "cf_labels = ['True_Negative', 'False_Positive', 'False_Negative', 'True_Positive']\n",
+ "cf_categories = ['Zero', 'One']\n",
+ "mostra_confusion_matrix(cf_matrix, group_names= cf_labels, categories= cf_categories)"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAccAAAIJCAYAAADQ9vbrAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdeZxP1R/H8ddnzNhCdlmz7wqllJQ9pOxbCqVECRWl9FNp30slVEIREm1KZKfsKmslsofszBhm5vz++H5NM19jjGvGzHzn/fw9vo++99x7zzl3+j2+nz7nnHuvOecQERGR/4SkdgdERETSGgVHERGRAAqOIiIiARQcRUREAig4ioiIBFBwFBERCRCaEpVmq9FH94dIundw+bup3QWRZJE1FEupulPi9z5i9bsp1t+kUuYoIiISIEUyRxERySAsOHOs4LwqERGRC6DMUUREvLNUnx5MEcocRUREAihzFBER74J0zlHBUUREvNOwqoiISMagzFFERLwL0mHV4LwqERGRC6DMUUREvNOco4iISAALSf5PUpo1G21me81sbQL7HjEzZ2b5/dtmZsPMbJOZ/WZmNc9Vv4KjiIikR2OApoGFZlYcaAJsi1PcDCjn//QE3j9X5QqOIiLinVnyf5LAObcAOJDArjeBR4G4bwtpCYxzPkuA3GZWOLH6FRxFRCQomFlLYKdz7teAXUWB7XG2d/jLzkoLckRExLsUuJXDzHriG/48bZRzbtQ5zskOPIFvSPWCKTiKiIh3KbBa1R8IEw2GCSgDlAJ+NV+figGrzOwaYCdQPM6xxfxlZ6VhVRERSfecc2uccwWdcyWdcyXxDZ3WdM79A3wNdPWvWq0NHHbO7U6sPmWOIiLiXSo9IcfMPgPqAfnNbAfwlHPuo7Mc/h3QHNgEhAN3nat+BUcREUl3nHOdz7G/ZJzvDnjgfOpXcBQREe/0hBwREZGMQZmjiIh4F6Rv5VBwFBER74I0OAbnVYmIiFwAZY4iIuJdiBbkiIiIZAjKHEVExLsgnXNUcBQREe90n6OIiEjGoMxRRES8C9Jh1eC8KhERkQugzFFERLwL0jlHBUcREfFOw6oiIiIZgzJHERHxLkiHVZU5ioiIBFDmKCIi3gXpnKOCo4iIeKdhVRERkYxBmaOIiHgXpMOqwXlVIiIiF0CZo4iIeKc5RxERkYxBmaOIiHgXpHOOCo4iIuJdkAbH4LwqERGRC6DMUUREvNOCHBERkYxBmaOIiHgXpHOOCo4iIuKdhlVFREQyBmWOIiLiXZAOqwbnVYmIiFwAZY4iIuJdkM45KjiKiIhnFqTBUcOqIiIiAZQ5ioiIZ8ocRUREMghljiIi4l1wJo7KHEVERAIpcxQREc+Cdc5RwVFERDwL1uCoYVUREZEAyhxFRMQzZY4iIiIZhDJHERHxLFgzRwVHERHxLjhjo4ZVRUREAilzFBERz4J1WFWZo4iISABljiIi4lmwZo4KjiIi4lmwBkcNq4qIiARQ5igiIp4pcxQREUkjzGy0me01s7Vxyl41s41m9puZTTOz3HH2PW5mm8zsdzO7+Vz1KziKiIh3lgKfpBkDNA0omwVUdc5dAfwBPA5gZpWBTkAV/znDzSxTYpUrOIqISLrjnFsAHAgom+mci/JvLgGK+b+3BCY65yKdc1uATcA1idWv4CgiIp6ZWUp8eprZijifnh66djfwvf97UWB7nH07/GVnpQU5IiLiWUosyHHOjQJGeT3fzAYDUcB4r3UoOIqISNAws+5AC6Chc875i3cCxeMcVsxfdlYaVhUREc9SYlj1AvrSFHgUuM05Fx5n19dAJzPLYmalgHLAssTqUuYoIiLpjpl9BtQD8pvZDuApfKtTswCz/EF2iXOul3NunZlNBtbjG259wDkXnVj9Co4iIuJdKj0DwDnXOYHijxI5/nng+aTWr+AoIiKe6Qk5IiIiGYQyRxER8UyZo4iISAahzFFERDwL1sxRwVFERDwL1uCoYVUREZEAyhxFRMS74EwclTmej7yXXsKSiYNYMnEQW2a9wF8/PBe7HRaa6KvBztvG6c/w2Wv3xG63blSdUc/ckaxtAPS5vR7ZsobFbk97pzeX5siW7O1I2lSjWiU6tGkZ+9m5c8dZj619dY1ka7dH9zu57Zabad/6Nrp16cTfWzafdx0P9LqXI0eOcOTIESZ99t/zpffu3cMj/fsmW18lY1LmeB4OHD5O7U4vATD4vuYcD4/krU9mx+7PlCmE6OiYZGuvRqXiVCx9GRs3/5NsdQbq06U+n323nIgTpwBo/eD7KdaWpD1ZsmRl8tSvUqXtF19+jSpVqzFl8iTeeO0Vhr034rzOf2/EBwDs3LmDSRM/o2PnLgAULFiI198aluz9lYRpzlESNOqZOxg2uBMLxg3ghf6tGHxfc/rf2TB2/4rPn6BE4bwAdGpei4WfDGDJxEG8M7gTISGJ/5/q7U/m8FiPm88oz541MyOe6sLCTwbw82eP0aJeNQCyZQ3j05fvZtUXg5n0+r0sGDeAmpVL+Op6oiOLxj/KyimDebJXcwDu73wThQtcyoxR/Zgxyvdf2hunP0O+3JfwbN/buK/DjbFtxr2uh7o2ZNGnA1k26fHYuiQ4hB8/zr13d6Nju9a0bXUrc+f8eMYx+/bt5a6uXejQpiVtWrZg1coVAPy0eBF33t6Rju1aM+ChvoQfP56kNq+6+mq2b9uGc443XnuZNi1b0LbVrcz4/rtE22vWuAEHDx7g7TdfZ8f2bXRo05I3XnuZnTt30KZlCwDu6NyBTZv+jG2rR/c7Wbd2DeHh4Qx58nFu79iODm1bJXidkrEpc0wGRQvmpl7314mJcQy+L+FgUaFUIdo1qUn9u94gKiqGtx7vQKfmtZjw7dkfDP/FzFX0bF+X0sXzxyt/7J6bmbf8D3o9M55Lc2Rj4acDmbPkd3q2r8vBI+HUbPs8lcsUZunEQbHnPP3uNxw8Ek5IiPH9yL5ULVeE4Z/Np+8dDWja8232H4r/Qzblh1W8OrAtIycvAKBtkxrcdv97NKxdkTIlCnLDHa9iZkx56z7q1CzD4lV/ef3zSSqKjDxBhzYtAShSrBivvfE2bw57jxw5cnDw4AHu7NyRevUbxssOvpv+LdfXuYF77+tNdHQ0J05EcPDgAT4Y+T4jP/yY7NmzM/rDUYwb+zG97u9zzj7MnzeXsuXLM3vWTH7fuJHPp37FoYMHub1jO666+uoE24ur30OPsOnPP2Mz4LhDwzc3bc7MGd9Ttk859u3by759e6lStRrD3nqDa66tzdDnXuTIkSN06dSea2tfT/bs2ZPjz5qhBGvmqOCYDKb+uJqYGJfoMfWvqUDNyiVY9OmjAGTLEsa+A8cSPSc6JoY3x/3IwLubMHPx+tjyhtdV4pabqtG/qy+Ty5o5lOKF83B9jdK8O2EeAOv/2s2aP3fFntO2SU3ublOH0EwhXFYgF5VKF2ZtnP2Bfv19BwXy5KRwgUvJnycHh46Es2PPIR64vT6NrqvIEn/gzZEtC2V
LFFRwTKcCh1VPnTrFsLfeYNXK5YRYCHv37mH/v/+Sv0CB2GOqVq3GU08+QVRUFPUbNKJipUqsWD6XzX9tovsdnWPruaJ69UTbfvyxAWTNkpUiRYsy6In/8cnYj2na/BYyZcpEvvz5uapWLdatWZNge0nVpGkzet17N/f36cvMGd/TuElTAH7+aRHz5s5h3MejATgZGck/u3dTukyZJNctPgqOclbhEZGx36Oio+MNl2bN7FvsYmZ8+s1Shrzz9XnVPWH6Mgbe3YT1m3bHlhnQecCH/Ll1b5LquLxIPvrf2ZAb7niFQ0cjGPXMHWTJfO5/9VN/XE3rRtUplC8XU2au8l8HvDp6Jh99sfi8rkPSh+++/YaDBw/w2eSphIWF0axxAyJPRsY75qqrazF63KcsnD+fIYMHcWe3u8iZKxe1r6vDy6+9keS2Ts85nktC7d3aslWS2ihUqBC5c+fmj9838sOM73lyyNMAOAdvvDWMkqVKJ7m/krFozjGZbd11gOqVfC+crl6xGCWL5gNg7rLfad2oOgXy5AAgT67slCic55z1RUXF8M6nc3mwS/3Ysh9/3sD9nW6K3b6yQjEAfv5lM22b1ASgYunLqFq2CAC5cmTl+IlIDh87QcG8OWlSp3LsuUePR5Ije9YE257yw0ra33wVrRvVYOqs1QDM+mkD3VpexyXZMgNQpMClsdck6d+xY0fJmzcfYWFhLFu6hF27znxZ+q5dO8mXLz9t23egddv2bFi/jiuurM4vq1exbetWAMLDw/n77y3n1XaNq67mh++/Jzo6mgMHDrBqxQqqVrsiwfbiuuSSSxKd37y5aXM+Hv0hR48epXyFigBcX+cGJoz/lNMvit+wYf1Zz5dzsBT4pAHKHJPZl7N/oUuLa1g5ZTDL1/wdm91t3PwPz7z3Ld+834cQM05FRfPQS5PZtvvgOesc8+XPDLq3aez2ix/M4NUBbVk++QlCQoy/d+6nbb8RjJy8kA+fvZNVXwzmjy17WL95N4ePRfDXtn38unEHv077Hzv+OciSX/5bNj966mK+fu9+du87TNOe8Vf4bdj8DzmyZ2XX3kP88+8RAGYv2UjFUpcxb+wAAI5HRHLX4LHsO5j4ELGkD81b3ErfB3rTttWtVK5SlVKlz8ysVixbxpiPPyI0NJTs2bPz3IsvkzdvXoY+/yKDBj7MyVMnAejzYH9KliyV5LYbNmrMb7+upn2blpgZ/R8ZSP4CBfj6y2lntBdX7tx5qF6jJm1atuCGunVjV62e1rjJzbzy0vP07HV/bFnPXvfzyksv0K71bcTExFC0WDHeHT7yfP5UEuTs9H85JadsNfokf6VyTiEhRlhoJiJPRlGqWH6+G9GHK1o9y6moRF94LWdxcPm7qd0FkWSRNTTl8rESD36d7L/32965LdXzR2WOQSR71szM+KAfYaEhGEa/FycrMIpIitKCHEkRC8YNIHPA4pgeT45j3aazryQ9m2PhkdzQ5ZXk6prIBevf9wF27Yj/1J1+Dw+gzg11U6lHIkmj4JjKbuz6WoLlI57qQrMbq7LvwFGubv9CvH397mzASw+3oVj9x9h/6Dh1ryrH52/25O9d+wH4as4vvDhqRor3XeRc3hr2XqL7Pxk7hqlffI6ZUa5ceYY+/yJZsmS5SL2T5BCsmaNWq6ZRn3yzhJYPnPnDUqxQbhrWrsS23QfilS9e/Re1O71E7U4vKTBKurBnzx4mjB/HZ5O/YOpX3xITE82M76andrdEAAXHNGvxqr84cDj8jPJXBrRl8NtfkhILqUQutujoaCJPnCAqKoqIEycoULBgandJzpOZJfsnLVBwTEda1KvGrr2HWPPHmfeeXXtFKZZOGsSX7/amUunLUqF3IuenUKFCdOt+Nzc3qk+jejeQM0cOrq9zQ2p3S85XkN7nqOCYTmTLGsajd9/M0PfPHHb6ZeN2KjT/H9d2fIn3J85n8ps9U6GHIufnyOHDzJ0zm+9mzmbW3IVERETw7Tep84YQkUAKjulE6WIFuLxoPpZNepyN05+haMHc/DzhMQrly8nR4yc4HuG78fqHResJC81EvtyXpHKPRRK3ZMlPFC1WjLx58xIWFkbDRk34dfXq1O6WnKdgHVbVatV0Yt2mXVze8PHY7Y3Tn6FOl1fYf+g4hfLlZM/+owBcXeVyQszOeMuGSFpzWeEi/Pbrr0RERJA1a1aWLvmZylWrpna3RAAFxzRr7IvdqXtVOfLnzsGmGc/y7IjvGPvlzwke27pRDe5tX5eo6GhOnDhF18c/vsi9FTl/V1xxJY2b3Eyn9q3JlCmUipUq0a59x9TulpyntJLpJTc9Pk7kLPT4OAkWKfn4uDKPfJ/sv/d/vd4s1SOuMkcREfEsSBNHBUcREfEuWIdVtVpVREQkgDLHVBISYiwe/yi79h6mbb8R1LumPC/0b01IiHE8PJJ7n/qEzdv/jXfO1VUu593/dQZ8QxnPj/iOr+f+RpbMofz4UX8yZw4lNFMmpv24mudGfAfAx893o0rZIny/cC1PvfsNAI/dczPrN+3mm3m/XdyLlqD1z+7dDH78UQ7s3w9mtGvfgS53djvjuOXLlvLqSy9wKiqKPHnyMHrsp/y9ZTOPPvJQ7DE7dmzn/j59uaNrd958/VUWL1pAhYqVeP5F30P1v/3mKw4dPMgdXbtfrMuTRARp4qjgmFr63F6f37fsIeclWQEY9kQn2j80kt+37KFn+7oMuqcpPZ/6NN456/7aRZ0urxAdHcNl+XOxdNLjTF+wlsiTUTTtOYzjEScJDQ1hzuiHmbl4PeEnThIReYprOr7It+/3IVeOrGTPmplaVUvy8oc/pMZlS5DKFJqJAY8OolLlKhw/foxO7dtS+7o6lClbNvaYI0eO8MKzzzB85IcULlKE/ft9D8ovWao0k6f6bv6Pjo6mcf0badCoMUePHmXjhvVMmfYNTw8ZzJ9//E7xEpfz1bSpDB/5Yapcp2QcGlZNBUUL5qbpDVX4eNpPsWXOOXL5A2WunNnYve/wGedFnDhFdHQMAFkyh8V7vurphwCEhWYiNDQTzjlORUWTLUsYZr6XIEdHx/C/3rfw3Ag93FmSV4ECBalUuQoAl1ySg9KlS7N37554x3w//RsaNmpM4SJFAMiXL98Z9Sxd8jPFixenSJGihIQYUVFROOc4EXGC0NBQxn78EZ273ElYWFjKX5QkiR4CIMnm1YG+h4fnyJ41tuz+oROY9s79nIg8yZHjJ7ip6+sJnlur6uWMePoOShTOS48nx8YGy5AQ46cJj1GmeAFGTlrA8rVbAfj34DF+/uwxPpu+jDLFCxASYvyycUeCdYskh507d7BxwwaqXXFlvPKtf/9NVFQUPbrfyfHjx+lyR1dubdkq3jEzvp9O0+YtAF+QvaHujXRs24pral9Hjpw5WbPmN+7r/cBFuxY5tzQSy5KdguNF1qxuVfYeOMrqDdupe1W52PIHu9Sn9YPDWb52Kw91bcjLj7Th/qETzjh/+dqtXNXueSqUKsSHQ+/kh8XriTwZRUyMo3anl7g0RzYmvXEvlcsUZv1fux
n42hex50556z4efH4ij/a4mSvKF2X2ko3xsleRCxV+/DiP9O/LwEFPkCNHjnj7oqKjWb9+HaM+GkNk5Am63t6JaldeScmSpQA4dfIk8+fOoV//R2LPuavHvdzV414Anh4ymAf69GXqlM/5+adFlCtfgZ697r94FycZioZVL7LrqpemxU3V2Dj9Gca9dBf1apVn6rBeVCtfNDbbmzJzFbWvLJVoPb9v2cOx8EiqlC0Sr/zwsQjmr/iDJtdXjlfeol41Vm/YziXZslC6WH7ueGw0rRvVIFtWDU9J8jh16hQP9+9L81tupVHjJmfsL1ToMq6vcwPZs2cnT5681Lz6av74fWPs/kWLFlCxchXy5c9/xrkbNqzHOcflJUsx84cZvPrG22zfvp2tW/9OyUuSJAgJsWT/pAUKjhfZkHe+pmzT/1HxlqfoOuhj5i3/g/YPjSJXjmyULeF7l12D2hX5fcueM869vEg+MmXy/SsrUTgPFUpdxtZd+8mfJweX5sgGQNYsYTS8tiK///3f+aGhIfS5vT5vjJ1FtqxhOHxzlZkyGZlDNXggF845x9NDBlO6dGm6dr8rwWPqN2jI6lUrfe9ujIhgzW+/Uap0mdj93383nWbNb0nw3PfeeZsHHuxHVFQUMTHRgO9H+UTEieS/GBE0rJomREfH8MCzE/jstXuIcTEcOhLBfU/7VqreclM1alYuwbPvT+f6GqUZcFcTTkVFExPj6PfCJPYfOk7VckX4YOidZAoJISTE+GLWKr5fuDa2/l4dbuTTb5YSceIUa/7YSfasmVk++Ql+WLSOw8ciUuuyJYisXrWSb7/+inLly9OhTUsAHuz/MLt37wKgQ8fOlC5Thjo31KV969uwkBDatG1HuXLlAQgPD2fJTz/xv6eGnlH3nNk/UqVKVQoWLARAhYqVaNvqVsqXL0+FihUv0hXK2QTrnKOerSpyFnq2qgSLlHy2atUnZyX77/3a5xqnesjVsKqIiEgADauKiIhnwTqsqsxRREQkgDJHERHxLK080Sa5KXMUEREJoMxRREQ8C9bMUcFRREQ8C9LYqGFVERGRQMocRUTEs2AdVlXmKCIiEkCZo4iIeBakiaOCo4iIeKdhVRERkQxCmaOIiHgWpImjMkcREUl/zGy0me01s7VxyvKa2Swz+9P/zzz+cjOzYWa2ycx+M7Oa56pfwVFERDwzs2T/JNEYoGlA2SBgtnOuHDDbvw3QDCjn//QE3j9X5QqOIiLimVnyf5LCObcAOBBQ3BIY6/8+FmgVp3yc81kC5DazwonVr+AoIiLBopBzbrf/+z9AIf/3osD2OMft8JedlRbkiIiIZylxK4eZ9cQ3/HnaKOfcqPOpwznnzMx57YOCo4iIpCn+QHhewdBvj5kVds7t9g+b7vWX7wSKxzmumL/srDSsKiIinqXWnONZfA1083/vBnwVp7yrf9VqbeBwnOHXBClzFBGRdMfMPgPqAfnNbAfwFPASMNnMegBbgQ7+w78DmgObgHDgrnPVr+AoIiKepdbj45xznc+yq2ECxzrggfOpX8FRREQ80xNyREREMghljiIi4pneyiEiIpJBKHMUERHPgjRxVHAUERHvNKwqIiKSQShzFBERz5Q5ioiIZBDKHEVExLMgTRwVHEVExDsNq4qIiGQQyhxFRMSzIE0clTmKiIgEUuYoIiKeBeuco4KjiIh4FqSxUcOqIiIigZQ5ioiIZyFBmjoqcxQREQmgzFFERDwL0sRRmaOIiEggZY4iIuKZbuUQEREJEBKcsVHDqiIiIoGUOYqIiGfBOqyqzFFERCSAMkcREfEsSBNHBUcREfHOCM7oqGFVERGRAMocRUTEM93KISIikkEocxQREc+C9VYOBUcREfEsSGOjhlVFREQCKXMUERHP9LJjERGRDEKZo4iIeBakiaMyRxERkUDKHEVExDPdyiEiIhIgSGOjhlVFREQCKXMUERHPdCuHiIhIBqHMUUREPAvOvFHBUURELkCwrlbVsKqIiEgAZY4iIuJZsL7s+KzB0czeAdzZ9jvn+qZIj0RERFJZYpnjiovWCxERSZeCdc7xrMHROTc27raZZXfOhad8l0REJL0I0th47gU5Znadma0HNvq3rzSz4SneMxERkVSSlNWqbwE3A/sBnHO/AjemZKdERCR9MLNk/6QFSbqVwzm3PaAoOgX6IiIikiYk5VaO7WZ2PeDMLAzoB2xI2W6JiEh6EKy3ciQlc+wFPAAUBXYB1f3bIiIiQemcmaNz7l+gy0Xoi4iIpDOpNUdoZg8B9+C7H38NcBdQGJgI5ANWAnc65056qT8pq1VLm9k3ZrbPzPaa2VdmVtpLYyIiElwsBT7nbNOsKNAXuNo5VxXIBHQCXgbedM6VBQ4CPbxeV1KGVScAk/FF5CLA58BnXhsUERFJBqFANjMLBbIDu4EGwBT//rFAK6+VJyU4ZnfOfeKci/J/PgWyem1QRESCR4hZsn/MrKeZrYjz6Rm3TefcTuA1YBu+oHgY3zDqIedclP+wHfjWyniS2LNV8/q/fm9mg/CN4zqgI/Cd1wZFREQS45wbBYw6234zywO0BEoBh/CNaDZNzj4ktiBnJb5geHoI+L44+xzweHJ2RERE0p9UWo/TCNjinNvn64NNBeoAuc0s1J89FgN2em0gsWerlvJaqYiIZAyptFp1G1DbzLIDEUBDfC/LmAu0wzfS2Q34ymsDSXqfo5lVBSoTZ67ROTfOa6MiIiJeOeeWmtkUYBUQBazGNww7HZhoZs/5yz7y2sY5g6OZPQXUwxccvwOaAYsABUcRkQwutR6F6px7CngqoHgzcE1y1J+U1art8KWs/zjn7gKuBC5NjsZFRETSoqQMq0Y452LMLMrMcgF7geIp3C8REUkHQtLIWzSSW1KC4wozyw18gG8F6zHg5xTtlYiIpAtBGhuT9GzV+/1fR5jZDCCXc+63lO2WiIhI6knsIQA1E9vnnFuVMl0SEZH0Iq28nDi5JZY5vp7IPofvGXYJ+uenYZ47JJJW5Gn+amp3QSRZRMwcmNpdSHcSewhA/YvZERERSX+ScstDehSs1yUiIuJZkp6QIyIikpCMOOcoIiKSqJDgjI3nHlY1nzvMbIh/u4SZJcvjeURERNKipMw5DgeuAzr7t48C76VYj0REJN0IseT/pAVJGVa91jlX08xWAzjnDppZ5hTul4iISKpJSnA8ZWaZ8N3biJkVAGJStFciIpIuZOQFOcOAaUBBM3se31s6nkzRXomISLqQVoZBk1tSnq063sxW4nttlQGtnHMbUrxnIiIiqSQpLzsuAYQD38Qtc85tS8mOiYhI2heko6pJGladjm++0YCsQCngd6BKCvZLREQk1SRlWLVa3G3/2zruP8vhIiKSgWTklx3H45xbZWbXpkRnREQkfQnWB3QnZc7x4TibIUBNYFeK9UhERCSVJSVzzBnnexS+OcgvUqY7IiKSngTpqGriwdF/839O59yAi9QfERGRVHfW4Ghmoc65KDOrczE7JCIi6UdGXJCzDN/84i9m9jXwOXD89E7n3NQU7puIiEiqS
MqcY1ZgP9CA/+53dICCo4hIBhekiWOiwbGgf6XqWv4Liqe5FO2ViIikCxnx2aqZgBzED4qnKTiKiEjQSiw47nbODb1oPRERkXQnWBfkJPZwg+C8YhERkXNILHNseNF6ISIi6VKQJo5nD47OuQMXsyMiIpL+BOuCnGB9ZqyIiIhn5/1WDhERkdMsSJenKHMUEREJoMxRREQ8C9Y5RwVHERHxLFiDo4ZVRUREAihzFBERzyxIb3RU5igiIhJAmaOIiHimOUcREZEMQpmjiIh4FqRTjgqOIiLiXUZ8ZZWIiEiGpMxRREQ804IcERGRDEKZo4iIeBakU44KjiIi4l2IXlklIiKSMShzFBERz4J1WFWZo4iISABljiIi4lmw3sqh4CgiIp7pCTkiIiJphJnlNrMpZrbRzDaY2XVmltfMZpnZn/5/5vFav4KjiIh4Zpb8nyR6G5jhnKsIXAlsAAYBs122WZcAACAASURBVJ1z5YDZ/m1PFBxFRCRdMbNLgRuBjwCccyedc4eAlsBY/2FjgVZe29Cco4iIeJZKc46lgH3Ax2Z2JbAS6AcUcs7t9h/zD1DIawPKHEVEJE0xs55mtiLOp2fAIaFATeB951wN4DgBQ6jOOQc4r31Q5igiIp6lROLonBsFjErkkB3ADufcUv/2FHzBcY+ZFXbO7TazwsBer31Q5igiIp6FpMDnXJxz/wDbzayCv6ghsB74GujmL+sGfOX1upQ5iohIevQgMN7MMgObgbvwxdbJZtYD2Ap08Fq5gqOIiHhmqfQQAOfcL8DVCexqmBz1a1hVREQkgDJHERHxLDgfHqfgKCIiF0DPVhUREckglDmKiIhnwZk3KnMUERE5gzJHERHxLEinHBUcRUTEu9S6zzGlaVhVREQkgDJHERHxLFgzrGC9LhEREc+UOYqIiGeacxQREckglDmKiIhnwZk3KjiKiMgF0LCqiIhIBqHMUUREPAvWDCtYr0tERMQzZY4iIuJZsM45KjiKiIhnwRkaNawqIiJyBmWOIiLiWZCOqipzFBERCaTMUUREPAsJ0llHBUcREfFMw6oiIiIZhDJHERHxzIJ0WFWZo4iISABljiIi4lmwzjkqOIqIiGfBulpVw6oiIiIBlDmKiIhnwTqsqsxRREQkgDJHERHxTJmjiIhIBqHMMYlq16xCmbLlY7dfffNdihQtmuCxN113FfN/Xpks7fbq0ZXwiHDGTZgCwPp1axn2xiuM+GhcstR/2rdfTePa6+pQoGBBAJ575kluv6M7pcuUTdZ2JO3JmzMr373SEYBCeS4hJiaGfYcjAKj74CeciopJtrY2juvJ0YiTOAd7Dh7nnle+Y8/B4+dVx9w3b6f+QxMoUSgX11UuyqS5GwCoWa4QXRpX4ZHhc5Ktv3JuwfoQAAXHJMqSJSvjJ09LlbYPHjjAT4sWcP0NN6ZYG99+PY3SZcvFBscnn3ouxdqStOXA0RPU7j0WgMF3Xs/xiFO8NWV57P5MIUZ0jEu29poOnMT+IxE8c1ddHu187XkHs/oPTQDg8kKX0qF+pdjguOrPPaz6c0+y9VOSJiQ4Y6OCo1fh4ccZ0L8PR48cJioqil4P9OOm+g3jHfPvvr088djDHD92nOjoKB4b/BQ1al7Nkp8WM2rEO5w6eZKixUowZOjzZM9+yVnbuqPb3Xz84cgzgmN0dDTvvf0GK1cs49Spk7TreDtt2nUkJiaGV198lhXLl1Ko0GWEhoZya6u2NGx8Mx+OfI+F8+cRGXmCK66sweP/e4Y5P85kw/p1DHliIFmyZOWjcZ/R/4Ge9H34UTasX8vO7dvp+/BAwJdhbli/loGP/4/vp3/NpAmfcurUKapWu4JHnxhCpkyZkv+PLRfdqAHNOHEyiuplC/Lzul0cCY+MFzRXjOpOm/9NZdueI3RqWJkHWtYkLCwTyzfupt87s4hJQjBdtGY797e6iixhmRjWtzE1y19GVHQMj42cy4Jft1Pp8nyMeqQZYWGZCDGj89Av+WvXIfZ91Y8CLd/muR43UqFEPpa8343xs9byy6a99G9Xi3ZPTWXD2J5c23ssh49HArDm43to+NAEYpzjnb5NKF4wJwAD35/Lz+t3ptwfUtItzTkmUWTkCbp0aE2XDq0Z+FAfMmfOwitvvMMnE6fy/gdjefuNV3Au/g/CD99Pp/Z1NzB+8jTGT/6S8hUqcejgQUZ/+D7vjRzNJxOnUqlKFSZ8MibRtqtdUZ3QsDBWLF8ar/zraV9wSc4cjJ3wOWPGf86XUz9n584dzJ09i927djJp6rc8/fzLrPnt19hz2nfqwtgJnzPxi2+IjDzBogXzaNj4ZipVrsLQF15l/ORpZM2aNfb4Bg2bMG/uj7Hbs2Z+T+Omzdmy+S9m/fA9H44Zz/jJ0wgJCWHGd99cwF9Y0pqi+XNSr/8EHhs596zHVCiel3Y3VaD+QxOo3Xss0TExdGpQOUn1N7+2DOu27KPXbTVwDmrdN4ZuL37LhwObkyUsE/feUp33vlxJ7d5jqdNnHDv/PRbv/Cc/WsDiNTuo3Xss70z9bxrDOfj2503cVqccALUqFmbbniPsPRTOa70b8M7UFdzw4Kd0HvoVwx++2cNfRuKyFPhfWqDMMYkCh1WjTp3i/XfeZPWqFZiFsG/vHvbv/5f8+QvEHlOpSlWee/pJoqKiqFe/IeUrVmLhymVs2fwX93Tr4qsn6hRVr7jynO3ffW8vRn8wgj79HoktW7pkMX/+8TtzZs0E4Nixo2zfupVfV6+kYeOmhISEkD9/Aa6qdU3sOSuXL+OTMR9x4kQERw4fpnSZctS9qf5Z282TNy9FixZjzW+/ULzE5fy9ZTNXVq/J55MmsHHDOrp16QD4/uMhT958SfxrSnowdeHv58wA69e4nJrlLmPRu3cCkC1zKPsOhSd6zoxXOxId41i7eR9Pj1nEqAHNGP7VKgD+2H6AbXuOUK5YXpZu2MWjnWtTNH9Ovlz0B3/tOpTkvk+Zv5HHu1zPJzPX0r5eRabM3+jrb83LqXh5/tjjcmXPzCVZwzh+4lSS65aMQcHRoxnffcvBgwcYN2EKoWFhtGzWkJORJ+MdU/OqWoz86BMWL5zHM0Oe4PY7u5Er16VcW/t6nnvp9fNqr9Y1tRnx7tusXfNfFuicY8CgJ7nu+hviHfvTovkJ1hEZGckrLwxl7ITPKXRZYUa9/y6RkZHnbLtx0+b8OHMGJUuWpl6DRpgZzjluubUVD/R9+LyuQ9KP8DgBIyo6hpA4a/azhvl+Oszg01lrGTJ6YZLrPT3neC6T5m5g2cbdNLu2NF8+344+b89k/i/bktTGkvW7KFMkN/kvzcat15flpfE/AxBixk19PyXyVHSS+yuJ060cEs+xY0fJkzdf7HDn7t27zjhm966d5M2Xj1ZtO9CyTTt+37CeqtWu5NdfVrN921YAIiLC2bp1S5LavPveXnwy5qPY7drX3cAXkycSdcr3I7Z16xYiIsK5onpN5syeSUxMDPv3/8uqFb55opP+QHhp7jyEhx9nzo8/xNaV/ZJLCA9PeNVgvQaNWDBvDjNnTKfJ
zc0BX7CeM+sHDhzYD8Dhw4fYvUtzN8Fq654jVC/nW6xVvWxBSl52KQBzV2+jdd0KFMidHYA8ObNSomCu86p78doddGpQCYCyRfNQvGBO/thxgJKXXcqW3YcY/uUqvv1pE9VKFYh33rGIk+TMnvms9X7905+8fF99Nm47wIGjJwCYvfJv7m9VM/aYK0oXPK++ypk0rCrxNG1+Kw/3603ndrdRqXJVSpYqfcYxK1cs59OxHxEaGka27Nl5+rmXyJM3L0OGvsCTgwZw6pQv0+z1QD8uv7zUOdusU/cm8uTJE7vdsk07du/ayZ2d2+KcI0+evLz65rs0aNSE5cuW0LFNCwoVuowKlSqRI0cOcubKRcs27ejc7jby5ctP5SrVYutqcVtrXnru6dgFOXHlynUpJUuVZsvmv6hS7QoASpcpS68+/Xiw1z04F0NoaCgDH/8fhYskfHuLpG9fLvyDLo2qsHLUXSzfuJs/dx4EYOO2/TwzZiHfvNieEDNORUfz0Ds/sm3vkSTXPfLr1Qzr25jlI7sTFR3Dva99z8lT0bS7qQKdG1bhVHQMew4c55WJS+Kdt2bzPqJjYlj6fjc+9S/IiWvKvI0sfq8r97z6XWzZI8Pn8FafRiwb0Z3QTMaiNTvoO2zWBfxlJFhZ4CKS5HA4IhnXfYsn4eHHyZ79Eg4dOshdd3TkgzHj482Hyrld1vL8hr5F0qqImQNTLB1b8MeBZP+9v7F83lRPH5U5BqmHH+zN0aNHiYo6RY97eyswioicBwXHNGLgQ33YtTP+nF2f/o+csdgmqZL7CToi52PBsC5kDov/89Lj5ems+/vfVOqRpJS0MkeY3BQc04hX33w3tbsgkmxu7Ds+tbsgF0mwrlZVcEwHnn1qMIsWzCNP3rxM/MJ3o/2I995mwbw5mIWQN29ehgx9MfbRbyJpyYiHm9Ksdmn2HQrn6p5jABjSrQ4tritHjHPsOxROz1e/Y/eB/1ZLX1X+Mua93YWuL3zDtIV/pFLPJSPTrRzpwC23teLt4aPild3RrQcTPv+K8ZOnccON9fhw1PBU6p1I4j6ZtZaWT0yJV/bm58u5ptcYavcey/dL/+LxO66P3RcSYjx3z438uPLvi9xT8cJS4JMWKDimAzWvqkWuXLnjleXIkSP2e0RERNAObUj6t3jNjtj7DE87Gv7fAzOyZw0j7qL5+1vW5MuFf57zSTsiKUnDqunY8Hfe4rtvvyJHjhy8/8HY1O6OyHl5uvsNdGlchcPHI2k6cBIARfLl4LY65bh54ERGVmiWyj2UpAgJ0v8yV+aYjt3/YH++/WEuTZvfyucTtQBC0penxyyiXJeRTJyzgV63+Z5a82rvBjz54XxS4PZrkfOi4BgEmjZvwZzZM1O7GyKeTJq9nlZ1fW/QqFm+EOOeuJWN43rSum553nqwEbderxdup2XBOueoYdV0atvWvylxeUkA5s+bk+Dj60TSqjJFcse+ZaPF9WX5Y/sBACp1/SD2mFEDmvH90r/45qdNqdJHSaK0Es2SmYJjOvDkoEdYuWIZhw4dokWTetzbuw8/LVrA1r+3EBISwmWFizBo8NOp3U2RBI19vAV1ryhO/kuzsWl8L579ZDFNa5WmXPE8xMTAtr2H6fu2nm8qaYuerSpyFnq2qgSLlHy26tK/Dif77/21ZS5NUn/NLBOwAtjpnGthZqWAiUA+YCVwp3PuZGJ1nI3mHEVEJL3qB2yIs/0y8KZzrixwEOjhtWIFRxER8cws+T9Ja9eKAbcAH/q3DWgAnH7ixFigldfrUnBMA6Kjo7mjYxseerDXGfu++HwindvdRpcOrbm3exc2/+VbnLBuzW906dCaLh1ac3uHVsyd45uzOXjgAPd270Kntrcyb86PsfUM6P8A+/buPaN+kQsVEmL8PLwrXwxtE6/89fsbsO+rfmc9r2qpAsx7q4vvHZEju5MlLBM5soWx5P1usZ/tnz/Aq73qA9C7ZQ1WjOrOtOfaEhbq++m6vkpRXvHvl9SREqtVzaynma2I8+mZQNNvAY8CMf7tfMAh51yUf3sH4PkFs1qQkwZMnPAJJUuV5vjxY2fsu7lZC9q27wTAgnlzeOv1lxk2/APKlC3H2AmfExoayr/79tKlQ2vq3lifmTOm06Z9R+o3aEz/PvdRr0EjFs6fS/kKlfTsVUkRfVpfxe/b9pMze5bYsprlCpE7R9aznpMpxBj92C30eGU6azbvI2/OrJyKjiHyVDS1e//3QIvF793Jl4v/BKBTg8rUum8Mj3auTeOrS/Hdkr8Y1OU6ur34bcpdnKQK59woYNTZ9ptZC2Cvc26lmdVLiT4oc0xle/b8w+KF82nZpl2C+898TJxvzCFrtmyEhvr+2yby5MnY8kyhoZyIOMHJUycJyZSJqKgoPhs/jq7dPQ+9i5xV0fw5aHpNaT6esSa2LCTEeOHeegz+cP5Zz2t0VUnWbtnHms37ADhw9AQxAev4yhbNQ8Hc2Vm8ZgfgG24LC81E9ixhnIqKpnPDysxcvoWDAY+mk4ssdW50rAPcZmZ/41uA0wB4G8htZqeTvmLAzoRPPzcFx1T25qsv8mD/AYTY2f9VfD5xPK1bNOGdt17jkUefiC1fu+ZXOrZpwe3tWvLYk08RGhpK02YtWDBvNn169aB7j558Mfkzmt9yG1mzZbsYlyMZzKu9GzD4w/nxAlvv22owfckm/onzlo1A5YrlxTnH1y+046f3uvJw+2vOOKZ9vYpMmfd77Pb7X61m/ttdKF4wFz+v20nXm6sy4uvVyXtBki445x53zhVzzpUEOgFznHNdgLnA6UyjG/CV1zYUHFPRwgVzyZMnL5UqV0n0uPadujDt25n06fcIoz8YEVtetdqVTJr6LWPGT2bsRx8QGRlJjpw5efPdkYybMIWKlSqzcP5cGjRuwvPP/I9BA/rx26/6MZHk0eza0uw9FM7qP/fElhXOewltbqzA8C9XJXpuaKYQrq9alLtemk7DhydwW51y1KteIt4x7etVZPK8/xYifjZ7PdfdP467X57Og22uZviXq7i5Vikm/O82XulVXw/fTyWWAv+7AI8BD5vZJnxzkB95rUjBMRX99stqFs6fS8tmDRk86BFWLF/KkCcePevxTZo2Z/682WeUlypdhmzZs/PXpj/jlX806n3uuqcXM7+fzpU1ruKpZ1/kgxHvJft1SMZ0XZWitKhdlo3jejLuiVupV70EKz+4m9JF8rBuzL1sHNeT7FnCWPvxPWecu/Pfoyxas4P9RyKIiIxixvLN1ChXKHZ/tdIFCM0UEi/wnlY47yVcXaEw3/y0iX7tanHH899w6Fgk9WtcnqLXKwlLrdWqpznn5jnnWvi/b3bOXeOcK+uca++ci/R6XVqQk4oe6PswD/R9GICVy5fx6bjRDH3hlXjHxH1M3OKF8ylewvcDsHPnDgoVuozQ0FB279rJ1r83U6RI0Xjn7d2zh6tqXcOff2wkV5YsGEZkpOZnJHkMGb2QIaMXAlD3iuL0b1eLtkOmxjtm31f9qHrXh2ecO2vFFh5qfw3ZsoRy8lQ0dasV552pK2L
3d6hXiclzNybcbvcbeHbcIgCyZQ7FOUeMc2TPEpZclyai4JgWjRw+jEqVq3JjvQZ8PnECy5b+RGhoGLly5eKpoS8C8OvqlYwd/QGhoWGEhBiPPj6E3HnyxNbx/rtv07uPbxl9k2a3MLB/H8aO/oD77u+bKtckckvtMtQsfxnPjlvMoWORDJu6gkXv3InD8cOyLcxYtjn22LY3VaDVk1+cUceVZXwrrn/Z5LstadLcDawYeRc79h3ljcnLLs6FSDzBOpqtx8eJnIUeHyfBIiUfH7fq7yPJ/ntfs2SuVI+5yhxFRMS7VA9jKUMLckRERAIocxQREc8u8NaLNEvBUUREPAvW+0s1rCoiIhJAmaOIiHgWpImjMkcREZFAyhxFRMS7IE0dFRxFRMSzYF2tqmFVERGRAMocRUTEM93KISIikkEocxQREc+CNHFUcBQRkQsQpNFRw6oiIiIBlDmKiIhnupVDREQkg1DmKCIinulWDhERkQxCmaOIiHgWpImjgqOIiFyAII2OGlYVEREJoMxRREQ8060cIiIiGYQyRxER8SxYb+VQcBQREc+CNDZqWFVERCSQMkcREfEuSFNHZY4iIiIBlDmKiIhnwXorh4KjiIh4FqyrVTWsKiIiEkCZo4iIeBakiaMyRxERkUDKHEVExLsgTR2VOYqIiARQ5igiIp7pVg4REZEAupVDREQkg1DmKCIingVp4qjMUUREJJAyRxER8S5IU0cFRxER8SxYV6tqWFVERCSAMkcREfFMt3KIiIhkEMocRUTEsyBNHBUcRUTEOw2rioiIZBDKHEVE5AIEZ+qozFFERCSAgqOIiHhmlvyfc7dpxc1srpmtN7N1ZtbPX57XzGaZ2Z/+f+bxel0KjiIikt5EAY845yoDtYEHzKwyMAiY7ZwrB8z2b3ui4CgiIp5ZCnzOxTm32zm3yv/9KLABKAq0BMb6DxsLtPJ6XVqQIyIinqX2rRxmVhKoASwFCjnndvt3/QMU8lqvMkcREUlTzKynma2I8+l5luNyAF8A/Z1zR+Luc845wHntgzJHERHxLCXeyuGcGwWMSrRdszB8gXG8c26qv3iPmRV2zu02s8LAXq99UOYoIiLpipkZ8BGwwTn3RpxdXwPd/N+7AV95bUOZo4iIeJc6c451gDuBNWb2i7/sCeAlYLKZ9QC2Ah28NqDgKCIinqVGbHTOLUqk6YbJ0YaGVUVERAIocxQREc9S+1aOlKLMUUREJIAyRxER8SwlbuVICxQcRUTEu+CMjRpWFRERCaTMUUREPAvSxFGZo4iISCBljiIi4plu5RAREckglDmKiIhnupVDREQkgIZVRUREMggFRxERkQAKjiIiIgE05ygiIp4F65yjgqOIiHgWrKtVNawqIiISQJmjiIh4FqzDqsocRUREAihzFBERz4I0cVRwFBGRCxCk0VHDqiIiIgGUOYqIiGe6lUNERCSDUOYoIiKe6VYOERGRDEKZo4iIeBakiaOCo4iIXIAgjY4aVhUREQmgzFFERDzTrRwiIiIZhDJHERHxLFhv5TDnXGr3QUREJE3RsKqIiEgABUcREZEACo4iIiIBFBwlTTGzaDP7xczWmtnnZpb9AuoaY2bt/N8/NLPKiRxbz8yu99DG32aWP6nlAcccO8+2njazAefbRxE5fwqOktZEOOeqO+eqAieBXnF3mpmnFdbOuXucc+sTOaQecN7BUUSCk4KjpGULgbL+rG6hmX0NrDezTGb2qpktN7PfzOw+APN518x+N7MfgYKnKzKzeWZ2tf97UzNbZWa/mtlsMyuJLwg/5M9a65pZATP7wt/GcjOr4z83n5nNNLN1ZvYhSXh4lpl9aWYr/ef0DNj3pr98tpkV8JeVMbMZ/nMWmlnF5PhjikjS6T5HSZP8GWIzYIa/qCZQ1Tm3xR9gDjvnaplZFmCxmc0EagAVgMpAIWA9MDqg3gLAB8CN/rryOucOmNkI4Jhz7jX/cROAN51zi8ysBPADUAl4CljknBtqZrcAPZJwOXf728gGLDezL5xz+4FLgBXOuYfMbIi/7j7AKKCXc+5PM7sWGA408PBnFBGPFBwlrclmZr/4vy8EPsI33LnMObfFX94EuOL0fCJwKVAOuBH4zDkXDewyszkJ1F8bWHC6LufcgbP0oxFQ2f67wzmXmeXwt9HGf+50MzuYhGvqa2at/d+L+/u6H4gBJvnLPwWm+tu4Hvg8TttZktCGiCQjBUdJayKcc9XjFviDxPG4RcCDzrkfAo5rnoz9CAFqO+dOJNCXJDOzevgC7XXOuXAzmwdkPcvhzt/uocC/gYhcXJpzlPToB6C3mYUBmFl5M7sEWAB09M9JFgbqJ3DuEuBGMyvlPzevv/wokDPOcTOBB09vmNnpYLUAuN1f1gzIc46+Xgoc9AfGivgy19NCgNPZ7+34hmuPAFvMrL2/DTOzK8/RhogkMwVHSY8+xDefuMrM1gIj8Y2CTAP+9O8bB/wceKJzbh/QE98Q5q/8N6z5DdD69IIcoC9wtX/Bz3r+WzX7DL7gug7f8Oq2c/R1BhBqZhuAl/AF59OOA9f4r6EBMNRf3gXo4e/fOqBlEv4mIpKM9GxVERGRAMocRUREAig4ioiIBFBwFBERCaDgKCIiEkDBUUREJICCo4iISAAFRxERkQAKjiIiIgEUHEVERAIoOIqIiARQcBQREQmg4CgiIhJAwVFERCSAgqOIiEgABUdJdWbWysyc/2XA6Z6ZXWVma8xsk5kNMzNL4Jg8ZjbN/77IZWZW1V+e1b/9q5mtM7Nn4pxTysyW+uudZGaZL+Z1iWQkCo6SFnQGFvn/mSLMLFNK1Z2A94F7gXL+T9MEjnkC+MU5dwXQFXjbXx4JNHDOXQlUB5qaWW3/vpeBN51zZYGDQI+UuwSRjE3BUVKVmeUAbsD3Q9/JX5bJzF4zs7X+zOpBf3ktM/vJn1UtM7OcZtbdzN6NU9+3ZlbP//2Ymb1uZr8C15nZEDNb7q931OmMzszKmtmP/npXmVkZMxtnZq3i1DvezFom4XoKA7mcc0uc703i44BWCRxaGZgD4JzbCJQ0s0LO55j/mDD/x/n72gCY4t839iz1ikgyCE3tDkiG1xKY4Zz7w8z2m9lVwDVASaC6cy7KzPL6hxAnAR2dc8vNLBcQcY66LwGWOuceATCz9c65of7vnwAtgG+A8cBLzrlpZpYV3380fgQ8BHxpZpcC1wPdzKyCvx8JqQcUBXbEKdvhLwv0K9AGWGhm1wCXA8WAPf4sdyVQFnjPObfUzPIDh5xzUeeoV0SSgYKjpLbO/DekONG/XQoYcToQOOcOmFk1YLdzbrm/7AhAAtN5cUUDX8TZrm9mjwLZgbzAOjObBxR1zk3z13vCf+x8MxtuZgWAtsAX/v78jm+4M0Hn6E9cLwFvm9kvwBpgtb+/OOeigepmlhuY5p+P/CepFYvIhVNwlFRjZnnxDRVWMzMHZAIcsPw8qoki/vRA1jjfT/gDDf6McDhwtXNuu5k9HXBsQsYBd+Ab7r3LX8+5Msed+DLA04r5y+LxB/fTdRqwBdgccMwhM5
uLb87ydSC3mYX6g3SC9YpI8tCco6SmdsAnzrnLnXMlnXPF8QWJX4H7zCwUYoPo70BhM6vlL8vp3/83viwrxMyK4xuSTcjpQPivf56zHYBz7iiw4/T8opllMbPs/mPHAP39x633//N351z1s3wOOed2A0fMrLY/6HUFvgrsjJnljrPa9B5ggXPuiJkV8GeMmFk2oDGw0T9/Ofd0v4H/t3fvwVaVdRjHv8/gDQQVMZHMomyKChUVsZzIG94VZSZTvKQpJpqJEkVNM2rOOHmpydHGtLyXMkZKomMiGiKheEUuwqgYWs6YOiIoFzXs1x/vb+Nmuc+Nc+Acxuczs2fv/a71rr32njnzO++6PO8pjbZrZh3DxdE600hgUqXtTqAf8C9gbl5Mc0JEfAAcB1ydbVMpBW8mpaAuAK4Cnmn0QRGxFPgDMB+Ywtqj05OBcyXNBR4Fdsg+rwMLgZva+L3OBq4HFgEvAX8DkDRa0uhc5yvAfEnPA4cBY7K9HzAt9+VJYGpE3JvLxgNjJS0C+lDOi5rZeqDyD6mZVeUIch6wR0Qs6+z9MbMNxyNHswYkDaOMGq92YTT75PHI0czMrMIjRzMzswoXR+tUkj6U9Gym1kysu1K0Pdu8OA+LNrV8tKTvtvdzmtn+Omer1i3vJmm2pHsb9L1K0vJqu5l1HBdH62yr8jaIgcAHwOj6hbXbOdoiIi6IiAebWX5tRNza9l1ttfZkq9aMoZzzXIukwUDvSndhWQAACFFJREFUDt1bM/sYF0frSmYAX5S0n6QZkiYDC3IUdUXmos6VdGatg6TxOUqbI+nSbLtZ0rfz9aWSFmS/X2XbRZLG5etBkmbl8kmSemf7w5Iuy1HdC5KGtuYLtDdbNbfxGeAIyu0g9dvuBlwB/KR1P6eZrSsn5FiXkCPEw4D7s2kPYGBELJb0fWBZROwlaXNgpqQHgAGUbNa9I2JlhgXUb7MPMAIYEBFRu7m+4lbghxExXdLFwIXkjf/AJhExRNLh2T5sQ2SrAldSCmCvSp9zgMkR8VobYurMbB24OFpn6575olBGjjdQQr6fiIjF2X4wsGttNAhsTTlcOQy4KSJWQslgrWx7GfAecEOeu1vr/J1KoPg2ETE9m24BJtatclc+P00JQici1mu2qqQjgTci4mnl7CK53U8Dx1IKsJmtZy6O1tlWRcRaxSYLzIr6JsrobkplvUOa23DO6DEEOJASu3YOJcu1td7P5w/Jv5UNkK16HDA8R6tbAFtJ+hMwgTJLx6L8fXpIWpRzO5pZB3NxtI3BFOAsSX+PiP9K+hKl4EwFLpB0W+2wav3oMTNUe0TEfZJm8vFg72WS3pY0NCJmUGLkptOMlkaOwFJJ76hMUPw45WKbq6sr5SHelRmLtyZbFfhZPsiR47iIOCm77VDXf7kLo9n64+JoG4PrKYc1n8lR1pvAMRFxv6RBwFOSPgDuo1wFWtMLuFtlRg4BYxts+xTg2ryF5J/kaK6dzqaElnen5KquyVaFcrUsJVv1FpXZSJ6jTPZsZl2EE3LMzMwqfCuHmZlZhYujmZlZhYujdVmVaLl7mrhPsT3bf1nSdvm61XFskj4v6fGMh7tDH01aXL/OZpJuqgso2K9u2SWS/l39TElj6wILHpL0uXZ8PTNrBxdH68rqo+WWAD/o7B1KlwG/yatF36bxxTRnAETELsBBwK8l1f7e7gGGNOgzGxickXJ/AS7v6B03s9ZxcbSNxWNk0oyknSXdL+npjJkbkO19MwJuTj72yfa/5rrPZdrOOsurZQ+gFC8owQEtxcO9ASwFBuf7WRHxWrVDREyrBRoAs1j7fkkz24B8K4d1eZkpeiAlPQfg98DoiHhR0t7ANZSCdRUwPSJGZJ+euf5pEbFEUnfgSUl3RsRbTXxWL0pSTyMnAG8ASyNidbY1Fw83XNIEYCdgz3x+opVf+3TyFhAz2/BcHK0rq0XL7UiZoWJq3ti/DzCxLqpt83w+gHLTPRHxISU+DuBcSSPy9U6U6LmGxTEi3qX5eLjtWrnvN1LuZXwKeAV4lJK00yJJJ1FGmfu28rPMrIO5OFpXtioiBuUN+lMo5xxvpozcmkupWSMvhBkGfCNTdB6mxLI1tX5LI8eFwDaSNsnRY1PxcKuB8+u2+yjwQiv2dxjwc2DfiHi/pfXNbP3wOUfr8vI83LnAj4CVwGJJx0I5Byhpt1z1IeCsbO+WweJbA29nYRwAfL2Fz3o3LwJq9FiQ01BNo2S1QknYubu6HUk9JG2Zrw8CVkfEguY+W9LuwHXA8DxPaWadxMXRNgoRMRuYC4wETgROlzSHEr12dK42Bthf0jzKTBpfpUyBtYmkhZSZMGZ1wO6MB8ZKWgT0Ic+FShquMu0VwPaUuLuFuf7Jtc6SLpf0KiU8/FVJF+WiKyjnSSfmLSyTO2BfzWwdOD7OzMyswiNHMzOzChdHMzOzChdHMzOzChdH63R1Gaq1R39JfSRNk7Rc0m+b6XukpNmZiLNA0pkbct8b7M+2kqZKejGfezex3mWZGTtf0nF17bdJej7bb5S0abb/uO73mZ+/2bYb6nuZfdL4ghzrdCqz2vestG0J7A4MBAZGxDkN+m1KucF+SES8KmlzoH9EPN+OfRHl7+J/69j/cmBJRFwq6adA74gYX1nnCOA84DBKgMHDwIER8Y6kw/koGed24JGI+F2l/1HA+RFxwLrso5m1zCNH65IiYkVE/AN4r5nVelGCLN7KPu/XCmMzOatj60Zs52Vb/xyt3QrMB3bKkdqTOUPGL9qw60dT8lah+dzVRyJidUSsoNyicmh+h/siUaLmGuWrjgQmtGGfzKyNXBytK+hed8hwUms7RcQSYDLwiqQJkk7URzNf1HJWdwP2AJ6TtCfwPWBvShjAGXnjPZRIuWsi4mvAl/P9EEqU3J6SvgWgEnT+bIPHsNxO37pQ8f8AfRvs+hzg0AwK2A7YnxJrt0aOik+m3KdZ396DUkjvbO3vZGZt5/g46wpWtTYOrioiRknahRIRN44yPdSpNMhZlfRNYFKO1pB0FzCULLARUQsIODgfs/N9T0qxfCQihrZh30LSx85bRMQDkvai5K2+SZlxpJq7ek1+XjXK7ihgZv5jYGbriYujbfQiYh4wT9IfgcWU4thWK+peC/hlRFxXXUnSDMrh3KpxEfEg8LqkfhHxmqR+lFk8Gu3zJcAluc3bqctdlXQh8Cmg0cVFx+NDqmbrnQ+r2kZLUs8MFq8ZRLlABxrnrM4AjqnLPR1B45DxKcBpKjOAIGlHSdsDRMTQJnJXH8y+kyl5q9B07mo3SX3y9a7ArsAD+X4UcAgwsnpRUH6HfRtt08w6lq9WtU7X6GrVbH8Z2ArYjDJZ8MH14d0qM2jcAewMrKKM/sZExFOS+lLmffwC5ZDlWRHxmKSxwGm5iesj4kpJ/YF7I2Jg3bbHAKPy7XLgpIh4qRXfpQ/wZ+CzlEL9nZxLcjBlDspRkrYAnsku72T7s9l/dfZ7N5ffFREX57JTgUMj4viW9sPM2sfF0czMrMKHVc3MzCpcHM3MzCpcHM3MzCpcHM3Mz
CpcHM3MzCpcHM3MzCpcHM3MzCpcHM3MzCr+D7LtdYZ12h17AAAAAElFTkSuQmCC\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": [],
+ "needs_background": "light"
+ }
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "p8D975NqsGtj"
+ },
+ "source": [
+ "## Parameter tunning\n",
+ "### Referência\n",
+ "* [Hyperparameter Tuning the Random Forest in Python](https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74)\n",
+ "* [Decision Tree Adventures 2 — Explanation of Decision Tree Classifier Parameters](https://medium.com/datadriveninvestor/decision-tree-adventures-2-explanation-of-decision-tree-classifier-parameters-84776f39a28) - Explica didaticamente e step by step como fazer parameter tunning."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Bfdq5zEhlVsk"
+ },
+ "source": [
+ "# Dicionário de parâmetros para o parameter tunning. Ao todo serão ajustados 2X13X5X5X7= 4.550 modelos. Contando com 10 folds no Cross-Validation, então são 45.500 modelos.\n",
+ "d_parametros_DT= {\"criterion\": [\"gini\", \"entropy\"]} #, \"min_samples_split\": [2, 5, 10, 30, 50, 70, 90, 120, 150, 180, 210, 240, 270, 350, 400], \"max_depth\": [None, 2, 5, 9, 15], \"min_samples_leaf\": [20, 40, 60, 80, 100], \"max_leaf_nodes\": [None, 2, 3, 4, 5, 10, 15]}"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "H8gNSs0G0A-L"
+ },
+ "source": [
+ "```\n",
+ "grid_search = GridSearchCV(ml_DT, param_grid= d_parametros_DT, cv = i_CV, n_jobs= -1)\n",
+ "start = time()\n",
+ "grid_search.fit(X_train, y_train)\n",
+ "tempo_elapsed= time()-start\n",
+ "print(f\"\\nGridSearchCV levou {tempo_elapsed:.2f} segundos para estimar {len(grid_search.cv_results_)} modelos candidatos\")\n",
+ "\n",
+ "GridSearchCV levou 1999.12 segundos para estimar 23 modelos candidatos\n",
+ "```\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "ap3WMXqDthu9"
+ },
+ "source": [
+ "# Definindo a função para o GridSearchCV\n",
+ "def GridSearchOptimizer(modelo, ml_Opt, d_Parametros, X_train, y_train, X_test, y_test, cv = i_CV):\n",
+ " ml_GridSearchCV = GridSearchCV(modelo, d_Parametros, cv = i_CV, n_jobs= -1, verbose= 10, scoring= 'accuracy')\n",
+ " start = time()\n",
+ " ml_GridSearchCV.fit(X_train, y_train)\n",
+ " tempo_elapsed= time()-start\n",
+ " #print(f\"\\nGridSearchCV levou {tempo_elapsed:.2f} segundos.\")\n",
+ "\n",
+ " # Parâmetros que otimizam a classificação:\n",
+ " print(f'\\nParametros otimizados: {ml_GridSearchCV.best_params_}')\n",
+ " \n",
+ " if ml_Opt == 'ml_DT2':\n",
+ " print(f'\\nDecisionTreeClassifier *********************************************************************************************************')\n",
+ " ml_Opt = DecisionTreeClassifier(criterion= ml_GridSearchCV.best_params_['criterion'], \n",
+ " max_depth= ml_GridSearchCV.best_params_['max_depth'],\n",
+ " max_leaf_nodes= ml_GridSearchCV.best_params_['max_leaf_nodes'],\n",
+ " min_samples_split= ml_GridSearchCV.best_params_['min_samples_leaf'],\n",
+ " min_samples_leaf= ml_GridSearchCV.best_params_['min_samples_split'], \n",
+ " random_state= i_Seed)\n",
+ " \n",
+ " elif ml_Opt == 'ml_RF2':\n",
+ " print(f'\\nRandomForestClassifier *********************************************************************************************************')\n",
+ " ml_Opt = RandomForestClassifier(bootstrap= ml_GridSearchCV.best_params_['bootstrap'], \n",
+ " max_depth= ml_GridSearchCV.best_params_['max_depth'],\n",
+ " max_features= ml_GridSearchCV.best_params_['max_features'],\n",
+ " min_samples_leaf= ml_GridSearchCV.best_params_['min_samples_leaf'],\n",
+ " min_samples_split= ml_GridSearchCV.best_params_['min_samples_split'],\n",
+ " n_estimators= ml_GridSearchCV.best_params_['n_estimators'],\n",
+ " random_state= i_Seed)\n",
+ " \n",
+ " elif ml_Opt == 'ml_AB2':\n",
+ " print(f'\\nAdaBoostClassifier *********************************************************************************************************')\n",
+ " ml_Opt = AdaBoostClassifier(algorithm='SAMME.R', \n",
+ " base_estimator=RandomForestClassifier(bootstrap = False, \n",
+ " max_depth = 10, \n",
+ " max_features = 'auto', \n",
+ " min_samples_leaf = 1, \n",
+ " min_samples_split = 2, \n",
+ " n_estimators = 400), \n",
+ " learning_rate = ml_GridSearchCV.best_params_['learning_rate'], \n",
+ " n_estimators = ml_GridSearchCV.best_params_['n_estimators'], \n",
+ " random_state = i_Seed)\n",
+ " \n",
+ " elif ml_Opt == 'ml_GB2':\n",
+ " print(f'\\nGradientBoostingClassifier *********************************************************************************************************')\n",
+ " ml_Opt = GradientBoostingClassifier(learning_rate = ml_GridSearchCV.best_params_['learning_rate'], \n",
+ " n_estimators = ml_GridSearchCV.best_params_['n_estimators'], \n",
+ " max_depth = ml_GridSearchCV.best_params_['max_depth'], \n",
+ " min_samples_split = ml_GridSearchCV.best_params_['min_samples_split'], \n",
+ " min_samples_leaf = ml_GridSearchCV.best_params_['min_samples_leaf'], \n",
+ " max_features = ml_GridSearchCV.best_params_['max_features'])\n",
+ " \n",
+ " elif ml_Opt == 'ml_XGB2':\n",
+ " print(f'\\nXGBoostingClassifier *********************************************************************************************************')\n",
+ " ml_Opt = XGBoostingClassifier(learning_rate= ml_GridSearchCV.best_params_['learning_rate'], \n",
+ " max_depth= ml_GridSearchCV.best_params_['max_depth'], \n",
+ " colsample_bytree= ml_GridSearchCV.best_params_['colsample_bytree'], \n",
+ " subsample= ml_GridSearchCV.best_params_['subsample'], \n",
+ " gamma= ml_GridSearchCV.best_params_['gamma'], \n",
+ " min_child_weight= ml_GridSearchCV.best_params_['min_child_weight'])\n",
+ " \n",
+ " # Treina novamente usando os parametros otimizados...\n",
+ " ml_Opt.fit(X_train, y_train)\n",
+ "\n",
+ " # Cross-Validation com 10 folds\n",
+ " print(f'\\n********* CROSS-VALIDATION ***********')\n",
+ " a_scores_CV = cross_val_score(ml_Opt, X_train, y_train, cv = i_CV)\n",
+ " print(f'Média das Acurácias calculadas pelo CV....: {100*round(a_scores_CV.mean(),4)}')\n",
+ " print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')\n",
+ "\n",
+ " # Faz predições com os parametros otimizados...\n",
+ " y_pred = ml_Opt.predict(X_test)\n",
+ " \n",
+ " # Importância das COLUNAS\n",
+ " print(f'\\n********* IMPORTÂNCIA DAS COLUNAS ***********')\n",
+ " df_importancia_variaveis = pd.DataFrame(zip(l_colunas, ml_Opt.feature_importances_), columns= ['coluna', 'importancia'])\n",
+ " df_importancia_variaveis = df_importancia_variaveis.sort_values(by= ['importancia'], ascending=False)\n",
+ " print(df_importancia_variaveis)\n",
+ "\n",
+ " # Matriz de Confusão\n",
+ " print(f'\\n********* CONFUSION MATRIX - PARAMETER TUNNING ***********')\n",
+ " cf_matrix = confusion_matrix(y_test, y_pred)\n",
+ " cf_labels = ['True_Negative', 'False_Positive', 'False_Negative', 'True_Positive']\n",
+ " cf_categories = ['Zero', 'One']\n",
+ " mostra_confusion_matrix(cf_matrix, group_names = cf_labels, categories = cf_categories)\n",
+ "\n",
+ " return ml_Opt, ml_GridSearchCV.best_params_"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "44-BRnNjBT25"
+ },
+ "source": [
+ "# Invoca a função\n",
+ "ml_DT2, best_params = GridSearchOptimizer(ml_DT, 'ml_DT2', d_parametros_DT, X_train, y_train, X_test, y_test, cv = i_CV)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "gmCkjGjPJMLr"
+ },
+ "source": [
+ "### Visualizar o resultado"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "cIc3ZgaISEd0"
+ },
+ "source": [
+ "from sklearn.tree import export_graphviz\n",
+ "from sklearn.externals.six import StringIO \n",
+ "from IPython.display import Image \n",
+ "import pydotplus\n",
+ "\n",
+ "dot_data = StringIO()\n",
+ "export_graphviz(ml_DT2, out_file = dot_data, filled = True, rounded = True, special_characters = True, feature_names = l_colunas, class_names = ['0','1'])\n",
+ "\n",
+ "graph = pydotplus.graph_from_dot_data(dot_data.getvalue()) \n",
+ "graph.write_png('DecisionTree.png')\n",
+ "Image(graph.create_png())"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "e1R2GBkbnV37"
+ },
+ "source": [
+ "## Selecionar as COLUNAS importantes/relevantes"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "vv7GKBvs6Ybf"
+ },
+ "source": [
+ "# Função desenvolvida para Selecionar COLUNAS relevantes\n",
+ "from sklearn.feature_selection import SelectFromModel\n",
+ "\n",
+ "def seleciona_colunas_relevantes(modelo, X_train, X_test, threshold = 0.05):\n",
+ " # Cria um seletor para selecionar as COLUNAS com importância > threshold\n",
+ " sfm = SelectFromModel(modelo, threshold)\n",
+ " \n",
+ " # Treina o seletor\n",
+ " sfm.fit(X_train, y_train)\n",
+ "\n",
+ " # Mostra o indice das COLUNAS mais importantes\n",
+ " print(f'\\n********** COLUNAS Relevantes ******')\n",
+ " print(sfm.get_support(indices=True))\n",
+ "\n",
+ " # Seleciona somente as COLUNAS relevantes\n",
+ " X_train_I = sfm.transform(X_train)\n",
+ " X_test_I = sfm.transform(X_test)\n",
+ " return X_train_I, X_test_I "
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "ukMLoEr7nbUf"
+ },
+ "source": [
+ "X_train_DT, X_test_DT = seleciona_colunas_relevantes(ml_DT2, X_train, X_test)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "8JjePRQAoqkk"
+ },
+ "source": [
+ "## Treina o classificador com as COLUNAS relevantes"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Gt3aCPpfKRxm"
+ },
+ "source": [
+ "best_params"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "zq6uCVtzovMt"
+ },
+ "source": [
+ "# Treina usando as COLUNAS relevantes...\n",
+ "ml_DT2.fit(X_train_DT, y_train)\n",
+ "\n",
+ "# Cross-Validation com 10 folds\n",
+ "a_scores_CV = cross_val_score(ml_DT2, X_train_DT, y_train, cv = i_CV)\n",
+ "print(f'Média das Acurácias calculadas pelo CV....: {100*round(a_scores_CV.mean(),4)}')\n",
+ "print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "znWy3LE1q-Z3"
+ },
+ "source": [
+ "ml_DT3, best_params2 = GridSearchOptimizer(ml_DT2, 'ml_DT2', d_parametros_DT, X_train_DT, y_train, X_test_DT, y_test, cv = i_CV)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "6IhCC6pfq-jL"
+ },
+ "source": [
+ "best_params"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "qw6Dk3kesT0q"
+ },
+ "source": [
+ "best_params2"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "SbS4ZKN8s-ee"
+ },
+ "source": [
+ "# Cross-Validation com 10 folds\n",
+ "a_scores_CV = cross_val_score(ml_DT3, X_train_DT, y_train, cv = i_CV)\n",
+ "print(f'Média das Acurácias calculadas pelo CV....: {100*round(a_scores_CV.mean(),4)}')\n",
+ "print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "MZ1-vGRcxJoN"
+ },
+ "source": [
+ "## Valida o modelo usando o dataframe X_test"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "ig9GiUAEw9jr"
+ },
+ "source": [
+ "y_pred_DT = ml_DT2.predict(X_test_DT)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "7UZz4UzHDqae"
+ },
+ "source": [
+ "# Calcula acurácia\n",
+ "accuracy_score(y_test, y_pred_DT)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "K3EUMAxxKBur"
+ },
+ "source": [
+ "___\n",
+ "# **RANDOM FOREST**\n",
+ "* Decision Trees possuem estrutura em forma de árvores.\n",
+ "* Random Forest pode ser utilizado tanto para classificação (RandomForestClassifier)quanto para Regressão (RandomForestRegressor).\n",
+ "\n",
+ "* **Vantagens**:\n",
+ " * Não requer tanto data preprocessing;\n",
+ " * Lida bem com COLUNAS categóricas e numéricas;\n",
+ " * É um Boosting Ensemble Method (pois constrói muitas árvores). Estes modelos aprendem com os próprios erros e ajustam as árvores de modo a fazer melhores classificações;\n",
+ " * Mais robusta que uma simples Decision Tree. **Porque?**\n",
+ " * Controla automaticamente overfitting (**porque?**) e frequentemente produz modelos muito robustos e de alta-performance.\n",
+ " * Pode ser utilizado como Feature Selection, pois gera a matriz de importância dos atributos (importance sample). A soma das importâncias soma 100;\n",
+ " * Assim como as Decision Trees, esses modelos capturam facilmente padrões não-lineares presentes nos dados;\n",
+ " * Não requer os dados sejam normalizados;\n",
+ " * Lida bem com Missing Values;\n",
+ " * Não requer suposições (assumptions) sobre a distribuição dos dados por causa da natureza não-paramétrica do algoritmo\n",
+ "\n",
+ "* **Desvantagens**\n",
+ " * **Recomenda-se balancear o dataframe previamente para se evitar esse problema**.\n",
+ "\n",
+ "* **Principais parâmetros**\n",
+ "\n",
+ "## **Referências**:\n",
+ "* [Running Random Forests? Inspect the feature importances with this code](https://towardsdatascience.com/running-random-forests-inspect-the-feature-importances-with-this-code-2b00dd72b92e)\n",
+ "* [Feature importances with forests of trees](https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html)\n",
+ "* [Understanding Random Forests Classifiers in Python](https://www.datacamp.com/community/tutorials/random-forests-classifier-python)\n",
+ "* [Understanding Random Forest](https://towardsdatascience.com/understanding-random-forest-58381e0602d2)\n",
+ "* [An Implementation and Explanation of the Random Forest in Python](https://towardsdatascience.com/an-implementation-and-explanation-of-the-random-forest-in-python-77bf308a9b76)\n",
+ "* [Random Forest Simple Explanation](https://medium.com/@williamkoehrsen/random-forest-simple-explanation-377895a60d2d)\n",
+ "* [Random Forest Explained](https://www.youtube.com/watch?v=eM4uJ6XGnSM)\n",
+ "* [Hyperparameter Tuning the Random Forest in Python](https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74) - Explica os principais parâmetros do Random Forest."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "cnfDw_GEKBuu"
+ },
+ "source": [
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "\n",
+ "# Instancia...\n",
+ "ml_RF= RandomForestClassifier(n_estimators=100, min_samples_split= 2, max_features=\"auto\", random_state= i_Seed)\n",
+ "\n",
+ "# Treina...\n",
+ "ml_RF.fit(X_train, y_train)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "lYa9oaZW__o6"
+ },
+ "source": [
+ "# Cross-Validation com 10 folds\n",
+ "a_scores_CV = cross_val_score(ml_RF, X_train, y_train, cv = i_CV)\n",
+ "print(f'Média das Acurácias calculadas pelo CV....: {100*round(a_scores_CV.mean(),4)}')\n",
+ "print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "AouWUu8vANdb"
+ },
+ "source": [
+ "**Interpretação**: Nosso classificador (RandomForestClassifier) tem uma acurácia média de 96,44% (base de treinamento). Além disso, o std é da ordem de 2,77%, ou seja, pequena. Vamos tentar melhorar a acurácia do classificador usando parameter tunning (GridSearchCV)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "vbducxlgAa85"
+ },
+ "source": [
+ "print(f'Acurácias: {a_scores_CV}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "_lxx-LUw_5sd"
+ },
+ "source": [
+ "# Faz predições...\n",
+ "y_pred = ml_RF.predict(X_test)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "pQIRO_LpGAkw"
+ },
+ "source": [
+ "# Confusion Matrix\n",
+ "cf_matrix = confusion_matrix(y_test, y_pred)\n",
+ "cf_labels = ['True_Negative','False_Positive','False_Negative','True_Positive']\n",
+ "cf_categories = ['Zero', 'One']\n",
+ "mostra_confusion_matrix(cf_matrix, group_names= cf_labels, categories= cf_categories)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "yKLHZ5_C6FJ8"
+ },
+ "source": [
+ "## Parameter tunning\n",
+ "### Referência\n",
+ "* [Hyperparameter Tuning the Random Forest in Python](https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74)\n",
+ "* [Decision Tree Adventures 2 — Explanation of Decision Tree Classifier Parameters](https://medium.com/datadriveninvestor/decision-tree-adventures-2-explanation-of-decision-tree-classifier-parameters-84776f39a28) - Explica didaticamente e step by step como fazer parameter tunning.\n",
+ "* [Optimizing Hyperparameters in Random Forest Classification](https://towardsdatascience.com/optimizing-hyperparameters-in-random-forest-classification-ec7741f9d3f6) - Outro approach para entender parameter tunning. Recomendo fortemente a leitura! "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "XOa9naju6FKA"
+ },
+ "source": [
+ "# Dicionário de parâmetros para o parameter tunning.\n",
+ "d_parametros_RF= {'bootstrap': [True, False]} #,\n",
+ "# 'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, None],\n",
+ "# 'max_features': ['auto', 'sqrt'],\n",
+ "# 'min_samples_leaf': [1, 2, 4],\n",
+ "# 'min_samples_split': [2, 5, 10],\n",
+ "# 'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]}"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "6__f2jZaTQat"
+ },
+ "source": [
+ "# Invoca a função\n",
+ "ml_RF2, best_params = GridSearchOptimizer(ml_RF, 'ml_RF2', d_parametros_RF, X_train, y_train, X_test, y_test, cv = i_CV)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "crfn-n--KG4n"
+ },
+ "source": [
+ "### Resultado da execução do Random Forest\n",
+ "\n",
+ "```\n",
+ "[Parallel(n_jobs=-1)]: Done 7920 out of 7920 | elapsed: 194.0min finished\n",
+ "best_params= {'bootstrap': False, 'max_depth': 10, 'max_features': 'auto', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 400}\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "SGTOe5PaRw59"
+ },
+ "source": [
+ "# Como o procedimento acima levou 194 minutos para executar, então vou estimar ml_RF2 abaixo usando os parâmetros acima estimados\n",
+ "best_params= {'bootstrap': False, 'max_depth': 10, 'max_features': 'auto', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 400}\n",
+ "\n",
+ "ml_RF2= RandomForestClassifier(bootstrap= best_params['bootstrap'], \n",
+ " max_depth= best_params['max_depth'], \n",
+ " max_features= best_params['max_features'], \n",
+ " min_samples_leaf= best_params['min_samples_leaf'], \n",
+ " min_samples_split= best_params['min_samples_split'], \n",
+ " n_estimators= best_params['n_estimators'], \n",
+ " random_state= i_Seed)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "HMJcAdLlTQa0"
+ },
+ "source": [
+ "## Visualizar o resultado\n",
+ "> Implementar a visualização do RandomForest."
+ ]
+ },
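+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "added_rf_tree_viz_note"
+ },
+ "source": [
+ "> A minimal sketch of one possible visualization, assuming `ml_RF2` has already been fitted: a Random Forest has no single tree to plot, so we export one of its fitted estimators (e.g. `ml_RF2.estimators_[0]`) the same way the Decision Tree was exported above."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "added_rf_tree_viz_code"
+ },
+ "source": [
+ "# Sketch: visualize ONE tree of the (already fitted) Random Forest ml_RF2.\n",
+ "from io import StringIO\n",
+ "from IPython.display import Image\n",
+ "from sklearn.tree import export_graphviz\n",
+ "import pydotplus\n",
+ "\n",
+ "dot_data = StringIO()\n",
+ "# feature_names= can also be passed, as long as the list matches the columns ml_RF2 was fitted on\n",
+ "export_graphviz(ml_RF2.estimators_[0], out_file = dot_data, filled = True, rounded = True,\n",
+ "                special_characters = True, class_names = ['0', '1'],\n",
+ "                max_depth = 3)  # limit the depth so the image stays readable\n",
+ "\n",
+ "graph = pydotplus.graph_from_dot_data(dot_data.getvalue())\n",
+ "graph.write_png('RandomForest_tree0.png')\n",
+ "Image(graph.create_png())"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },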
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "WWNiy7Z0TQa3"
+ },
+ "source": [
+ "## Selecionar as COLUNAS importantes/relevantes"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "kOi11YOKTQa4"
+ },
+ "source": [
+ "X_train_RF, X_test_RF = seleciona_colunas_relevantes(ml_RF2, X_train, X_test)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Zn_O7c_DTQbE"
+ },
+ "source": [
+ "## Treina o classificador com as COLUNAS relevantes"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "UwEOwzSGTQbF"
+ },
+ "source": [
+ "best_params"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Rr8qDrgvTQbL"
+ },
+ "source": [
+ "# Treina com as COLUNAS relevantes...\n",
+ "ml_RF2.fit(X_train_RF, y_train)\n",
+ "\n",
+ "# Cross-Validation com 10 folds\n",
+ "a_scores_CV = cross_val_score(ml_RF2, X_train_RF, y_train, cv = i_CV)\n",
+ "print(f'Acurácia Media: {100*a_scores_CV.mean():.2f}')\n",
+ "print(f'std médio.....: {100*a_scores_CV.std():.2f}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-mYfQLlsTQbQ"
+ },
+ "source": [
+ "## Valida o modelo usando o dataframe X_test"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "sSD5o1JQTQbR"
+ },
+ "source": [
+ "y_pred_RF = ml_RF2.predict(X_test_RF)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "wywF6LymDzKr"
+ },
+ "source": [
+ "# Calcula acurácia\n",
+ "accuracy_score(y_test, y_pred_RF)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "hJJsL0IJb6iO"
+ },
+ "source": [
+ "## Estudo do comportamento dos parametros do algoritmo\n",
+ "> Consulte [Optimizing Hyperparameters in Random Forest Classification](https://towardsdatascience.com/optimizing-hyperparameters-in-random-forest-classification-ec7741f9d3f6) para mais detalhes."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "navUWMwHi44D"
+ },
+ "source": [
+ "param_range = np.arange(1, 250, 2)\n",
+ "\n",
+ "# Calculate accuracy on training and test set using range of parameter values\n",
+ "train_a_scores_CV, test_a_scores_CV = validation_curve(RandomForestClassifier(), \n",
+ " X_train, \n",
+ " y_train, \n",
+ " param_name=\"n_estimators\", \n",
+ " param_range = param_range, \n",
+ " cv = i_CV, \n",
+ " scoring = \"accuracy\", \n",
+ " n_jobs = -1)\n",
+ "\n",
+ "\n",
+ "# Calculate mean and standard deviation for training set a_scores_CV\n",
+ "train_mean = np.mean(train_a_scores_CV, axis = 1)\n",
+ "train_std = np.std(train_a_scores_CV, axis = 1)\n",
+ "\n",
+ "# Calculate mean and standard deviation for test set a_scores_CV\n",
+ "test_mean = np.mean(test_a_scores_CV, axis = 1)\n",
+ "test_std = np.std(test_a_scores_CV, axis = 1)\n",
+ "\n",
+ "# Plot mean accuracy a_scores_CV for training and test sets\n",
+ "plt.plot(param_range, train_mean, label = \"Training score\", color = \"black\")\n",
+ "plt.plot(param_range, test_mean, label = \"Cross-validation score\", color = \"dimgrey\")\n",
+ "\n",
+ "# Plot accurancy bands for training and test sets\n",
+ "plt.fill_between(param_range, train_mean - train_std, train_mean + train_std, color = \"gray\")\n",
+ "plt.fill_between(param_range, test_mean - test_std, test_mean + test_std, color = \"gainsboro\")\n",
+ "\n",
+ "# Create plot\n",
+ "plt.title(\"Validation Curve With Random Forest\")\n",
+ "plt.xlabel(\"Number Of Trees\")\n",
+ "plt.ylabel(\"Accuracy Score\")\n",
+ "plt.tight_layout()\n",
+ "plt.legend(loc = \"best\")\n",
+ "plt.show()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "rv7TIM9kjsud"
+ },
+ "source": [
+ "param_range = np.arange(1, 250, 2)\n",
+ "\n",
+ "# Calculate accuracy on training and test set using range of parameter values\n",
+ "train_a_scores_CV, test_a_scores_CV = validation_curve(RandomForestClassifier(), \n",
+ " X_train, \n",
+ " y_train, \n",
+ " param_name = \"max_depth\", \n",
+ " param_range = param_range, \n",
+ " cv = i_CV, \n",
+ " scoring = \"accuracy\", \n",
+ " n_jobs = -1)\n",
+ "\n",
+ "# Calculate mean and standard deviation for training set a_scores_CV\n",
+ "train_mean = np.mean(train_a_scores_CV, axis = 1)\n",
+ "train_std = np.std(train_a_scores_CV, axis = 1)\n",
+ "\n",
+ "# Calculate mean and standard deviation for test set a_scores_CV\n",
+ "test_mean = np.mean(test_a_scores_CV, axis = 1)\n",
+ "test_std = np.std(test_a_scores_CV, axis = 1)\n",
+ "\n",
+ "# Plot mean accuracy a_scores_CV for training and test sets\n",
+ "plt.plot(param_range, train_mean, label=\"Training score\", color=\"black\")\n",
+ "plt.plot(param_range, test_mean, label=\"Cross-validation score\", color=\"dimgrey\")\n",
+ "\n",
+ "# Plot accurancy bands for training and test sets\n",
+ "plt.fill_between(param_range, train_mean - train_std, train_mean + train_std, color=\"gray\")\n",
+ "plt.fill_between(param_range, test_mean - test_std, test_mean + test_std, color=\"gainsboro\")\n",
+ "\n",
+ "# Create plot\n",
+ "plt.title(\"Validation Curve With Random Forest\")\n",
+ "plt.xlabel(\"Number Of Trees\")\n",
+ "plt.ylabel(\"Accuracy Score\")\n",
+ "plt.tight_layout()\n",
+ "plt.legend(loc=\"best\")\n",
+ "plt.show()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "lm_fPGYwkJYc"
+ },
+ "source": [
+ "param_range = np.arange(1, 250, 2)\n",
+ "\n",
+ "# Calculate accuracy on training and test set using range of parameter values\n",
+ "train_a_scores_CV, test_a_scores_CV = validation_curve(RandomForestClassifier(), \n",
+ " X_train, \n",
+ " y_train, \n",
+ " param_name='min_samples_leaf', \n",
+ " param_range=param_range,\n",
+ " cv = i_CV, \n",
+ " scoring=\"accuracy\", \n",
+ " n_jobs=-1)\n",
+ "\n",
+ "\n",
+ "# Calculate mean and standard deviation for training set a_scores_CV\n",
+ "train_mean = np.mean(train_a_scores_CV, axis = 1)\n",
+ "train_std = np.std(train_a_scores_CV, axis = 1)\n",
+ "\n",
+ "# Calculate mean and standard deviation for test set a_scores_CV\n",
+ "test_mean = np.mean(test_a_scores_CV, axis = 1)\n",
+ "test_std = np.std(test_a_scores_CV, axis = 1)\n",
+ "\n",
+ "# Plot mean accuracy a_scores_CV for training and test sets\n",
+ "plt.plot(param_range, train_mean, label=\"Training score\", color=\"black\")\n",
+ "plt.plot(param_range, test_mean, label=\"Cross-validation score\", color=\"dimgrey\")\n",
+ "\n",
+ "# Plot accurancy bands for training and test sets\n",
+ "plt.fill_between(param_range, train_mean - train_std, train_mean + train_std, color=\"gray\")\n",
+ "plt.fill_between(param_range, test_mean - test_std, test_mean + test_std, color=\"gainsboro\")\n",
+ "\n",
+ "# Create plot\n",
+ "plt.title(\"Validation Curve With Random Forest\")\n",
+ "plt.xlabel(\"Number Of Trees\")\n",
+ "plt.ylabel(\"Accuracy Score\")\n",
+ "plt.tight_layout()\n",
+ "plt.legend(loc=\"best\")\n",
+ "plt.show()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "CAqdiSaVlAB8"
+ },
+ "source": [
+ "param_range = np.arange(0.05, 1, 0.05)\n",
+ "\n",
+ "# Calculate accuracy on training and test set using range of parameter values\n",
+ "train_a_scores_CV, test_a_scores_CV = validation_curve(RandomForestClassifier(), \n",
+ " X_train, \n",
+ " y_train, \n",
+ " param_name='min_samples_split', \n",
+ " param_range=param_range,\n",
+ " cv = i_CV, \n",
+ " scoring=\"accuracy\", \n",
+ " n_jobs=-1)\n",
+ "\n",
+ "\n",
+ "# Calculate mean and standard deviation for training set a_scores_CV\n",
+ "train_mean = np.mean(train_a_scores_CV, axis = 1)\n",
+ "train_std = np.std(train_a_scores_CV, axis = 1)\n",
+ "\n",
+ "# Calculate mean and standard deviation for test set a_scores_CV\n",
+ "test_mean = np.mean(test_a_scores_CV, axis = 1)\n",
+ "test_std = np.std(test_a_scores_CV, axis = 1)\n",
+ "\n",
+ "# Plot mean accuracy a_scores_CV for training and test sets\n",
+ "plt.plot(param_range, train_mean, label=\"Training score\", color=\"black\")\n",
+ "plt.plot(param_range, test_mean, label=\"Cross-validation score\", color=\"dimgrey\")\n",
+ "\n",
+ "# Plot accurancy bands for training and test sets\n",
+ "plt.fill_between(param_range, train_mean - train_std, train_mean + train_std, color=\"gray\")\n",
+ "plt.fill_between(param_range, test_mean - test_std, test_mean + test_std, color=\"gainsboro\")\n",
+ "\n",
+ "# Create plot\n",
+ "plt.title(\"Validation Curve With Random Forest\")\n",
+ "plt.xlabel(\"Number Of Trees\")\n",
+ "plt.ylabel(\"Accuracy Score\")\n",
+ "plt.tight_layout()\n",
+ "plt.legend(loc=\"best\")\n",
+ "plt.show()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "cX_gfsbQSdNd"
+ },
+ "source": [
+ "___\n",
+ "# **BOOSTING MODELS**\n",
+ "* São algoritmos muito utilizados nas competições do Kaggle;\n",
+ "* São algoritmos utilizados para melhorar a performance dos algoritmos de Machine Learning;\n",
+ "* Modelos:\n",
+ " - [X] AdaBoost\n",
+ " - [X] XGBoost\n",
+ " - [X] LightGBM\n",
+ " - [X] GradientBoosting\n",
+ " - [X] CatBoost\n",
+ "\n",
+ "## Bagging vs Boosting vc Stacking\n",
+ "### **Bagging**\n",
+ "* Objetivo é reduzir a variância;\n",
+ "\n",
+ "#### Como funciona\n",
+ "* Seleciona várias amostras **COM REPOSIÇÃO** do dataframe de treinamento. Cada amostra é usada para treinar um modelo usando Decision Trees. Como resultado, temos um ensemble de muitas e diferentes modelos (Decision Trees). A média de desses muitos e diferentes modelos (Decision Trees) são usados para produzir o resultado final;\n",
+ "* O resultado final é mais robusto do que usarmos uma simples Decision Tree.\n",
+ "\n",
+ "\n",
+ "\n",
+ "Souce: [Boosting and Bagging: How To Develop A Robust Machine Learning Algorithm](https://hackernoon.com/how-to-develop-a-robust-algorithm-c38e08f32201).\n",
+ "\n",
+ "#### Steps\n",
+ "* Suponha um dataframe X_train (dataframe de treinamento) contendo N observações (instâncias, pontos, linhas) e M COLUNAS (features, atributos).\n",
+ " 1. Bagging seleciona aleatoriamente uma amostra **COM REPOSIÇÃO** de X_train;\n",
+ " 2. Bagging seleciona aleatoriamente M2 (M2 < M) COLUNAS do dataframe extraído do passo (1);\n",
+ " 3. Constroi uma Decision Tree com as M2 COLUNAS do passo (2) e o dataframe obtido no passo (1) e as COLUNAS são avaliadas pela sua habilidade de classificar as observações;\n",
+ " 4. Os passos (1)--> (2)-- (3) são repetidos K vezes (ou seja, K Decision Trees), de forma que as COLUNAS são ranqueadas pelo seu poder preditivo e o resultado final (acurácia, por exemplo) é obtido pela agregação das predições dos K Decision Trees.\n",
+ "\n",
+ "#### Vantagens\n",
+ "* Reduz overfitting;\n",
+ "* Lida bem com dataframes com muitas COLUNAS (high dimensionality);\n",
+ "* Lida automaticamente com Missing Values;\n",
+ "\n",
+ "#### Desvantagem\n",
+ "* A predição final é baseada na média das K Decision Trees, o que pode comprometer a acurácia final.\n",
+ "\n",
+ "___ \n",
+ "### **Boosting**\n",
+ "* Objetivo é melhorar acurácia;\n",
+ "\n",
+ "#### Como funciona\n",
+ "* Os classificadores são usados sequencialmente, de forma que o classificador no passo N aprende com os erros do classificador do passo N-1. Ou seja, o objetivo é melhorar a precisão/acurácia à cada passo aprendendo com o passado.\n",
+ "\n",
+ "\n",
+ "\n",
+ "Source: [Ensemble methods: bagging, boosting and stacking](https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205), Joseph Rocca\n",
+ ".\n",
+ "\n",
+ "#### Steps\n",
+ "* Suponha um dataframe X_train (dataframe de treinamento) contendo N observações (instâncias, pontos, linhas) e M COLUNAS (features, atributos).\n",
+ " 1. Boosting seleciona aleatoriamente uma amostra D1 SEM reposição de X_train;\n",
+ " 2. Boosting treina o classificador C1;\n",
+ " 3. Boosting seleciona aleatoriamente a SEGUNDA amostra D2 SEM reposição de X_train e acrescenta à D2 50% das observações que foram classificadas incorretamente para treinar o classificador C2;\n",
+ " 4. Boosting encontra em X_train a amostra D3 que os classificadores C1 e C2 discordam em classificar e treina C3;\n",
+ " 5. Combina (voto) as predições de C1, C2 e C3 para produzir o resultado final.\n",
+ "\n",
+ "#### Vantagens\n",
+ "* Lida bem com dataframes com muitas COLUNAS (high dimensionality);\n",
+ "* Lida automaticamente com Missing Values;\n",
+ "\n",
+ "#### Desvantagem\n",
+ "* Propenso a overfitting. Recomenda-se tratar outliers previamente.\n",
+ "* Requer ajuste cuidadoso dos hyperparameters;"
+ ]
+ },
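+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "added_bagging_boosting_note"
+ },
+ "source": [
+ "> A minimal sketch to make the Bagging vs Boosting contrast concrete, assuming `X_train`, `y_train`, `i_CV` and `i_Seed` were defined earlier in the notebook: the same shallow Decision Tree is used as the base estimator of scikit-learn's BaggingClassifier and AdaBoostClassifier, and both are scored with cross-validation."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "added_bagging_boosting_code"
+ },
+ "source": [
+ "# Sketch: Bagging vs Boosting with the same base estimator (assumes X_train, y_train, i_CV, i_Seed exist).\n",
+ "from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier\n",
+ "from sklearn.tree import DecisionTreeClassifier\n",
+ "from sklearn.model_selection import cross_val_score\n",
+ "\n",
+ "# Bagging: trees trained independently on bootstrap samples, predictions combined by voting\n",
+ "ml_bag = BaggingClassifier(DecisionTreeClassifier(max_depth=3), n_estimators=100, random_state=i_Seed)\n",
+ "\n",
+ "# Boosting: trees trained sequentially, each one focusing on the errors of the previous ones\n",
+ "ml_boost = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3), n_estimators=100, random_state=i_Seed)\n",
+ "\n",
+ "for nome, ml in [('Bagging', ml_bag), ('Boosting (AdaBoost)', ml_boost)]:\n",
+ "    scores = cross_val_score(ml, X_train, y_train, cv=i_CV)\n",
+ "    print(f'{nome}: mean CV accuracy = {100*scores.mean():.2f}% (std = {100*scores.std():.2f}%)')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },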
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "9fgUrkmPk4dr"
+ },
+ "source": [
+ "___\n",
+ "# STACKING\n",
+ "\n",
+ "\n",
+ "\n",
+ "Kd a referência desta figura???"
+ ]
+ },
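+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "added_stacking_note"
+ },
+ "source": [
+ "> A minimal stacking sketch, assuming `X_train`, `y_train`, `X_test`, `y_test` and `i_Seed` exist: two base classifiers are combined by a final LogisticRegression meta-model trained on their cross-validated predictions, using scikit-learn's StackingClassifier."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "added_stacking_code"
+ },
+ "source": [
+ "# Sketch: stacking two base models under a LogisticRegression meta-model\n",
+ "# (assumes X_train, y_train, X_test, y_test and i_Seed were defined earlier).\n",
+ "from sklearn.ensemble import StackingClassifier, RandomForestClassifier, GradientBoostingClassifier\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "from sklearn.metrics import accuracy_score\n",
+ "\n",
+ "l_estimadores_base = [\n",
+ "    ('rf', RandomForestClassifier(n_estimators=100, random_state=i_Seed)),\n",
+ "    ('gb', GradientBoostingClassifier(n_estimators=100, random_state=i_Seed))\n",
+ "]\n",
+ "\n",
+ "# The final_estimator is trained on the out-of-fold predictions of the base estimators\n",
+ "ml_stack = StackingClassifier(estimators=l_estimadores_base, final_estimator=LogisticRegression(max_iter=1000), cv=5)\n",
+ "ml_stack.fit(X_train, y_train)\n",
+ "print(f'Stacking accuracy on X_test: {100*accuracy_score(y_test, ml_stack.predict(X_test)):.2f}%')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },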
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "B0jxx3ETpOdm"
+ },
+ "source": [
+ "___\n",
+ "# **BOOTSTRAPPING METHODS**\n",
+ "> Antes de falarmos de Boosting ou Bagging, precisamos entender primeiro o que é Bootstrap, pois ambos (Boosting e Bagging) são baseados em Bootstrap.\n",
+ "\n",
+ "* Em Estatística (e em Machine Learning), Bootstrap se refere à extrair amostras aleatórias COM reposição da população X."
+ ]
+ },
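+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "added_bootstrap_note"
+ },
+ "source": [
+ "> A tiny, self-contained numerical illustration of what sampling WITH replacement means: we draw many bootstrap samples from a hypothetical population and look at how the sample mean varies across them."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "added_bootstrap_code"
+ },
+ "source": [
+ "# Sketch: bootstrap = repeated sampling WITH replacement from the 'population' X\n",
+ "import numpy as np\n",
+ "\n",
+ "rng = np.random.default_rng(42)\n",
+ "a_populacao = rng.normal(loc=50, scale=10, size=1000)   # hypothetical population X\n",
+ "\n",
+ "l_medias_bootstrap = []\n",
+ "for _ in range(500):\n",
+ "    a_amostra = rng.choice(a_populacao, size=len(a_populacao), replace=True)  # WITH replacement\n",
+ "    l_medias_bootstrap.append(a_amostra.mean())\n",
+ "\n",
+ "print(f'Population mean...........: {a_populacao.mean():.2f}')\n",
+ "print(f'Mean of bootstrap means...: {np.mean(l_medias_bootstrap):.2f}')\n",
+ "print(f'Std of the bootstrap means: {np.std(l_medias_bootstrap):.2f}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },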
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "SyqazmUuifkE"
+ },
+ "source": [
+ "___\n",
+ "# **ADABOOST(Adaptive Boosting)**\n",
+ "* Quando nada funciona, AdaBoost funciona!\n",
+ "* Foi um dos primeiros algoritmos de Boosting (1995);\n",
+ "* AdaBoost pode ser utilizado tanto para classificação (AdaBoostClassifier) quanto para Regressão (AdaBoostRegressor);\n",
+ "* AdaBoost usam algoritmos DecisionTree como base_estimator;"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "RU-vzkXqrFVw"
+ },
+ "source": [
+ "## Referências\n",
+ "* [AdaBoost Classifier Example In Python](https://towardsdatascience.com/machine-learning-part-17-boosting-algorithms-adaboost-in-python-d00faac6c464) - Didático e explica exatamente como o AdaBoost funciona.\n",
+ "* [Adaboost for Dummies: Breaking Down the Math (and its Equations) into Simple Terms](https://towardsdatascience.com/adaboost-for-dummies-breaking-down-the-math-and-its-equations-into-simple-terms-87f439757dcf) - Para quem quer entender a matemática por trás do algoritmo.\n",
+ "* [Gradient Boosting and XGBoost](https://medium.com/hackernoon/gradient-boosting-and-xgboost-90862daa6c77)\n",
+ "* [Understanding AdaBoost](https://towardsdatascience.com/understanding-adaboost-2f94f22d5bfe), Akash Desarda.\n",
+ "* [AdaBoost Classifier Example In Python](https://towardsdatascience.com/machine-learning-part-17-boosting-algorithms-adaboost-in-python-d00faac6c464)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "6EMrjQDZIMl_"
+ },
+ "source": [
+ "## O que é AdaBoost (Adaptive Boosting)?\n",
+ "* é um dos classificadores do tipo ensemble (combina vários classificadores para aumentar a precisão).\n",
+ "* AdaBoost é um classificador iterativo e forte que combina (ensemble) vários classificadores fracos para melhorar a precisão.\n",
+ "* Qualquer algoritmo de aprendizado de máquina pode ser usado como um classificador de base (parâmetro base_estimator);\n",
+ "\n",
+ "## Parâmetros mais importantes do AdaBoost:\n",
+ "* base_estimator - É um classificador usado para treinar o modelo. Como default, AdaBoost usa o DecisionTreeClassifier. Como dito anteriormente, pode-se utilizar diferentes algoritmos para esse fim.\n",
+ "* n_estimators - Número de base_estimator para treinar iterativamente.\n",
+ "* learning_rate - Controla a contribuição do base_estimator na solução/combinação final;"
+ ]
+ },
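+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "added_adaboost_default_note"
+ },
+ "source": [
+ "> A minimal sketch of AdaBoost with its defaults, assuming `X_train`, `y_train`, `i_CV` and `i_Seed` exist: when no base_estimator is given, scikit-learn uses a depth-1 DecisionTreeClassifier (a decision stump)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "added_adaboost_default_code"
+ },
+ "source": [
+ "# Sketch: AdaBoost with the default base estimator (a depth-1 decision stump).\n",
+ "# Assumes X_train, y_train, i_CV and i_Seed were defined earlier in the notebook.\n",
+ "from sklearn.ensemble import AdaBoostClassifier\n",
+ "from sklearn.model_selection import cross_val_score\n",
+ "\n",
+ "ml_AB_stump = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=i_Seed)\n",
+ "a_scores_stump = cross_val_score(ml_AB_stump, X_train, y_train, cv=i_CV)\n",
+ "print(f'AdaBoost (stumps) - mean CV accuracy: {100*a_scores_stump.mean():.2f}%')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },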
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "TzLtHzWNJBix"
+ },
+ "source": [
+ "## Usando diferentes algoritmos para base_estimator\n",
+ "> Como dito anteriormente, pode-se utilizar vários tipos de base_estimator em AdaBoost. Por exemplo, se quisermos usar SVM (Support Vector Machines), devemos proceder da seguinte forma:\n",
+ "\n",
+ "\n",
+ "```\n",
+ "# Importar a biblioteca base_estimator\n",
+ "from sklearn.svm import SVC\n",
+ "\n",
+ "# Treina o classificador (algoritmo)\n",
+ "ml_SVC= SVC(probability=True, kernel='linear')\n",
+ "\n",
+ "# Constroi o modelo AdaBoost\n",
+ "ml_AB = AdaBoostClassifier(n_estimators= 50, base_estimator=ml_SVC, learning_rate=1)\n",
+ "```\n",
+ "\n"
+ ]
+ },
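+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "added_adaboost_svc_note"
+ },
+ "source": [
+ "> A runnable version of the snippet above, with the missing imports and a fit/score step added. It assumes `X_train`, `y_train`, `X_test` and `y_test` exist; note that SVC with probability=True can be slow on larger dataframes."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "added_adaboost_svc_code"
+ },
+ "source": [
+ "# Sketch: AdaBoost with an SVC base estimator (assumes X_train, y_train, X_test, y_test exist).\n",
+ "# Note: newer scikit-learn versions rename the parameter base_estimator to estimator.\n",
+ "from sklearn.svm import SVC\n",
+ "from sklearn.ensemble import AdaBoostClassifier\n",
+ "from sklearn.metrics import accuracy_score\n",
+ "\n",
+ "ml_SVC = SVC(probability=True, kernel='linear')   # base estimator (can be slow!)\n",
+ "ml_AB_svc = AdaBoostClassifier(n_estimators=50, base_estimator=ml_SVC, learning_rate=1)\n",
+ "ml_AB_svc.fit(X_train, y_train)\n",
+ "print(f'AdaBoost(SVC) accuracy on X_test: {100*accuracy_score(y_test, ml_AB_svc.predict(X_test)):.2f}%')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },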
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "hrj4a4s6hMMB"
+ },
+ "source": [
+ "## Vantagens\n",
+ "* AdaBoost é fácil de implementar;\n",
+ "* AdaBoost corrige os erros do base_estimator iterativamente e melhora a acurácia;\n",
+ "* Faz o Feature Selection automaticamente (**Porque**?);\n",
+ "* Pode-se usar muitos algoritos como base_estimator ;\n",
+ "* Como é um método ensemble, então o modelo final é pouco propenso à overfitting.\n",
+ "\n",
+ "## Desvantagens\n",
+ "* AdaBoost é sensível a ruídos nos dados;\n",
+ "* Altamente impactado por outliers (contribui para overfitting), pois o algoritmo tenta se ajustr a cada ponto da mehor forma possível;\n",
+ "* AdaBoost é mais lento que XGBoost;"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "bgJmu7YLiyv7"
+ },
+ "source": [
+ "No exemplo a seguir, vou usar RandomForestClassifier com os parâmetros otimizados, ou seja:\n",
+ "\n",
+ "```\n",
+ "best_params= {'bootstrap': False, 'max_depth': 10, 'max_features': 'auto', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 400}\n",
+ "```\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "5VCRNyZT3qvc"
+ },
+ "source": [
+ "best_params= {'bootstrap': False, 'max_depth': 10, 'max_features': 'auto', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 400}"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "1gIboJdriq61"
+ },
+ "source": [
+ "from sklearn.ensemble import AdaBoostClassifier\n",
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "\n",
+ "# Instancia RandomForestClassifier - Parâmetros otimizados!\n",
+ "ml_RF2= RandomForestClassifier(bootstrap= best_params['bootstrap'], \n",
+ " max_depth= best_params['max_depth'], \n",
+ " max_features= best_params['max_features'], \n",
+ " min_samples_leaf= best_params['min_samples_leaf'], \n",
+ " min_samples_split= best_params['min_samples_split'], \n",
+ " n_estimators= best_params['n_estimators'], \n",
+ " random_state= i_Seed)\n",
+ "# Instancia AdaBoostClassifier\n",
+ "ml_AB= AdaBoostClassifier(n_estimators=100, base_estimator= ml_RF2, random_state= i_Seed)\n",
+ "\n",
+ "# Treina...\n",
+ "ml_AB.fit(X_train, y_train)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "A4Cs81OLD40y"
+ },
+ "source": [
+ "# Cross-Validation com 10 folds\n",
+ "a_scores_CV = cross_val_score(ml_AB, X_train, y_train, cv = i_CV)\n",
+ "print(f'Média das Acurácias calculadas pelo CV....: {100*round(a_scores_CV.mean(),4)}')\n",
+ "print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "F7Ce5L38ECoC"
+ },
+ "source": [
+ "**Interpretação**: Nosso classificador (AdaBoostClassifier) tem uma acurácia média de 96,72% (base de treinamento). Além disso, o std é da ordem de 2,54%, ou seja, pequena. Vamos tentar melhorar a acurácia do classificador usando parameter tunning (GridSearchCV)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "t5GfnBwEifkO"
+ },
+ "source": [
+ "print(f'Acurácias: {a_scores_CV}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Q9rSpuXyEPA5"
+ },
+ "source": [
+ "# Faz predições com os parametros otimizados...\n",
+ "y_pred = ml_AB.predict(X_test)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "2F9k-_eXGDLa"
+ },
+ "source": [
+ "# Confusion Matrix\n",
+ "cf_matrix = confusion_matrix(y_test, y_pred)\n",
+ "cf_labels = ['True_Negative','False_Positive','False_Negative','True_Positive']\n",
+ "cf_categories = ['Zero', 'One']\n",
+ "mostra_confusion_matrix(cf_matrix, group_names= cf_labels, categories= cf_categories)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "XweWTjQ9EXLw"
+ },
+ "source": [
+ "## Parameter tunning"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "fcrKzse9EbL_"
+ },
+ "source": [
+ "# Dicionário de parâmetros para o parameter tunning.\n",
+ "d_parametros_AB = {'n_estimators':[50, 100, 200], 'learning_rate':[.001, 0.01, 0.05, 0.1, 0.3,1]}"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Susc3I7mFDQX"
+ },
+ "source": [
+ "# Invoca a função\n",
+ "ml_AB2, best_params= GridSearchOptimizer(ml_AB, 'ml_AB2', d_parametros_AB, X_train, y_train, X_test, y_test, cv = i_CV)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "w4JjWsusjNS8"
+ },
+ "source": [
+ "___\n",
+ "# **GRADIENT BOOSTING**\n",
+ "* Gradient boosting pode ser usado para resolver problemas de classificação (GradientBoostingClassifier) e Regressão (GradientBoostingRegressor);\n",
+ "* Gradient boosting são um refinamento do AdaBoost (lembra que AdaBoost foi um dos primeiros métodos de Boosting - criado em 1995). O que Gradient Boosting faz adicionalmente ao AdaBoost é minimizar a loss (função perda), ie, minimizar a diferença entre os valores observados de y e os valores preditos.\n",
+ "* Usa Gradient Descent para encontrar as deficiências nas previsões do passo anterior. Gradient Descent é um algoritmo popular e poderoso e usado em Redes Neurais;\n",
+ "* O objetivo do Gradient Boosting é minimizar 'loss function'. Portanto, Gradient Boosting depende da \"loss function\".\n",
+ "* Gradient boosting usam algoritmos DecisionTree como base_estimator;\n",
+ "\n",
+ "## Vantagens\n",
+ "* Não há necessidade de pre-processing;\n",
+ "* Trabalha normalmente com COLUNAS numéricas ou categóricas;\n",
+ "* Trata automaticamente os Missing Values. Ou seja, não é necessário aplicar métodos de Missing Value Imputation;\n",
+ "\n",
+ "## Desvantagens\n",
+ "* Como Gradient Boosting tenta continuamente minimizar os erros à cada iteração, isso pode enfatizar os outliers e causar overfitting. Portanto, deve-se:\n",
+ " * Tratar os outliers previamente OU\n",
+ " * Usar Cross-Validation para neutralizar os efeitos dos outliers (**Eu prefiro este método, pois toma menos tempo**);\n",
+ "* Computacionalmene caro. Geralmente são necessários muitas árvores (> 1000) para se obter bons resultados;\n",
+ "* Devido à flexibilidade (muitos parâmetros para ajustar), então é necessário usar GridSearchCV para encontrar a combinação ótima dos hyperparameters;\n",
+ "\n",
+ "## Referências\n",
+ "* [Gradient Boosting Decision Tree Algorithm Explained](https://towardsdatascience.com/machine-learning-part-18-boosting-algorithms-gradient-boosting-in-python-ef5ae6965be4) - Didático e detalhista.\n",
+ "* [Predicting Wine Quality with Gradient Boosting Machines](https://towardsdatascience.com/predicting-wine-quality-with-gradient-boosting-machines-a-gmb-tutorial-d950b1542065)\n",
+ "* [Parameter Tuning in Gradient Boosting (GBM) with Python](https://www.datacareer.de/blog/parameter-tuning-in-gradient-boosting-gbm/)\n",
+ "* [Tune Learning Rate for Gradient Boosting with XGBoost in Python](https://machinelearningmastery.com/tune-learning-rate-for-gradient-boosting-with-xgboost-in-python/)\n",
+ "* [In Depth: Parameter tuning for Gradient Boosting](https://medium.com/all-things-ai/in-depth-parameter-tuning-for-gradient-boosting-3363992e9bae) - Muito bom\n",
+ "* [Complete Machine Learning Guide to Parameter Tuning in Gradient Boosting (GBM) in Python](https://www.analyticsvidhya.com/blog/2016/02/complete-guide-parameter-tuning-gradient-boosting-gbm-python/)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Q4bUCZs2jNTA"
+ },
+ "source": [
+ "from sklearn.ensemble import GradientBoostingClassifier\n",
+ "\n",
+ "# Instancia...\n",
+ "ml_GB=GradientBoostingClassifier(n_estimators=100, min_samples_split= 2)\n",
+ "\n",
+ "# Treina...\n",
+ "ml_GB.fit(X_train, y_train)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "-dr6dyjdXwvd"
+ },
+ "source": [
+ "# Cross-Validation com 10 folds\n",
+ "a_scores_CV = cross_val_score(ml_GB, X_train, y_train, cv = i_CV)\n",
+ "print(f'Média das Acurácias calculadas pelo CV....: {100*round(a_scores_CV.mean(),4)}')\n",
+ "print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "VlC3y3M5YaGG"
+ },
+ "source": [
+ "print(f'Acurácias: {a_scores_CV}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "vnLvQ0ZDYNjB"
+ },
+ "source": [
+ "**Interpretação**: Nosso classificador (GradientBoostingClassifier) tem uma acurácia média de 96,86% (base de treinamento). Além disso, o std é da ordem de 2,52%, ou seja, pequena. Vamos tentar melhorar a acurácia do classificador usando parameter tunning (GridSearchCV)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "D2n1RKZuXq3D"
+ },
+ "source": [
+ "# Faz precições...\n",
+ "y_pred = ml_GB.predict(X_test)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "8r6JCzQRGFa0"
+ },
+ "source": [
+ "# Confusion Matrix\n",
+ "cf_matrix = confusion_matrix(y_test, y_pred)\n",
+ "cf_labels = ['True_Negative','False_Positive','False_Negative','True_Positive']\n",
+ "cf_categories = ['Zero', 'One']\n",
+ "mostra_confusion_matrix(cf_matrix, group_names = cf_labels, categories = cf_categories)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "KFv-Q2AD5uCk"
+ },
+ "source": [
+ "## Parameter tunning\n",
+ "> Consulte [Complete Machine Learning Guide to Parameter Tuning in Gradient Boosting (GBM) in Python](https://www.analyticsvidhya.com/blog/2016/02/complete-guide-parameter-tuning-gradient-boosting-gbm-python/) para detalhes sobre os parâmetros, significado e etc."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "wgU040AcjNTF"
+ },
+ "source": [
+ "# Dicionário de parâmetros para o parameter tunning.\n",
+ "d_parametros_GB= {'learning_rate': [1, 0.5, 0.25, 0.1, 0.05, 0.01]} #,\n",
+ "# 'n_estimators': [1, 2, 4, 8, 16, 32, 64, 100, 200],\n",
+ "# 'max_depth': [5, 10, 15, 20, 25, 30],\n",
+ "# 'min_samples_split': [0.1, 0.3, 0.5, 0.7, 0.9],\n",
+ "# 'min_samples_leaf': [0.1, 0.2, 0.3, 0.4, 0.5],\n",
+ "# 'max_features': list(range(1, X_train.shape[1]))}"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "v5KLFlpTjNTH"
+ },
+ "source": [
+ "# Invoca a função\n",
+ "ml_GB2, best_params= GridSearchOptimizer(ml_GB, 'ml_GB2', d_parametros_GB, X_train, y_train, X_test, y_test, cv = i_CV)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "YQ6ERz3fi9i2"
+ },
+ "source": [
+ "### Resultado da execução do Gradient Boosting"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "RSa7uKw13mKG"
+ },
+ "source": [
+ "```\n",
+ "[Parallel(n_jobs=-1)]: Done 275400 out of 275400 | elapsed: 93.7min finished\n",
+ "\n",
+ "Parametros otimizados: {'learning_rate': 1, 'max_depth': 30, 'max_features': 11, 'min_samples_leaf': 0.1, 'min_samples_split': 0.1, 'n_estimators': 100}\n",
+ "```\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "wiJpA2PyjDjR"
+ },
+ "source": [
+ "# Como o procedimento acima levou 93 minutos para executar, então vou estimar ml_GB2 abaixo usando os parâmetros acima estimados\n",
+ "best_params= {'learning_rate': 1, 'max_depth': 30, 'max_features': 11, 'min_samples_leaf': 0.1, 'min_samples_split': 0.1, 'n_estimators': 100}\n",
+ "\n",
+ "#ml_GB2= GradientBoostingClassifier(learning_rate= best_params['learning_rate'], \n",
+ "# max_depth= best_params['max_depth'],\n",
+ "# max_features= best_params['max_features'],\n",
+ "# min_samples_leaf= best_params['min_samples_leaf'],\n",
+ "# min_samples_split= best_params['min_samples_split'],\n",
+ "# n_estimators= best_params['n_estimators'],\n",
+ "# random_state= i_Seed)\n",
+ "\n",
+ "ml_GB2= GradientBoostingClassifier(learning_rate= best_params['learning_rate'], \n",
+ " max_depth= best_params['max_depth'],\n",
+ " min_samples_leaf= best_params['min_samples_leaf'],\n",
+ " min_samples_split= best_params['min_samples_split'],\n",
+ " n_estimators= best_params['n_estimators'],\n",
+ " random_state= i_Seed)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "mb14gJ7-jbVM"
+ },
+ "source": [
+ "## Selecionar as COLUNAS importantes/relevantes"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "TAqGZIFYm2sU"
+ },
+ "source": [
+ "X_train_GB, X_test_GB = seleciona_colunas_relevantes(ml_GB2, X_train, X_test)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "6yiu6dahnBvC"
+ },
+ "source": [
+ "## Treina o classificador com as COLUNAS relevantes "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "APrtWN18nc4t"
+ },
+ "source": [
+ "best_params"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "VS0mLdOmnXAY"
+ },
+ "source": [
+ "# Treina com as COLUNAS relevantes\n",
+ "ml_GB2.fit(X_train_GB, y_train)\n",
+ "\n",
+ "# Cross-Validation com 10 folds\n",
+ "a_scores_CV = cross_val_score(ml_GB2, X_train_GB, y_train, cv = i_CV)\n",
+ "print(f'Acurácia Media: {100*a_scores_CV.mean():.2f}')\n",
+ "print(f'std médio.....: {100*a_scores_CV.std():.2f}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "vmc9PP_Rn1TN"
+ },
+ "source": [
+ "## Valida o modelo usando o dataframe X_test"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "e3mnIALvnzP2"
+ },
+ "source": [
+ "y_pred_GB = ml_GB2.predict(X_test_GB)\n",
+ "\n",
+ "# Calcula acurácia\n",
+ "accuracy_score(y_test, y_pred_GB)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "kwP9Z2GnkV7r"
+ },
+ "source": [
+ "___\n",
+ "# **XGBOOST (eXtreme Gradient Boosting)**\n",
+ "* XGBoost é uma melhoria de Gradient Boosting. As melhorias são em velocidade e performace, além de corrigir as ineficiências do GradientBoosting.\n",
+ "* Algoritmo preferido pelos Kaggle Grandmasters;\n",
+ "* Paralelizável;\n",
+ "* Estado-da-arte em termos de Machine Learning;\n",
+ "\n",
+ "## Parâmetros relevantes e seus valores iniciais\n",
+ "Consulte [Complete Guide to Parameter Tuning in XGBoost with codes in Python](https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/) para detalhes completos sobre os parâmetros, significado e etc.\n",
+ "\n",
+ "* n_estimators = 100 (100 caso o dataframe for grande. Se o dataframe for médio/pequeno, então 1000) - É o número de árvores desejamos construir;\n",
+ "* max_depth= 3 - Determina quão profundo cada árvore pode crescer durante qualquer round de treinamento. Valores típicos no intervalo [3, 10];\n",
+ "* learning rate= 0.01 - Usado para evitar overfitting, intervalo: [0, 1];\n",
+ "* alpha (somente para problemas de Regressão) - L1 regularization nos pesos. Valores altos resulta em mais regularization;\n",
+ "* lambda (somente para problemas de Regressão) - L2 regularization nos pesos.\n",
+ "* colsample_bytree: 1 - porcentagem de COLUNAS usados por cada árvore. Alto valor pode causar overfitting;\n",
+ "* subsample: 0.8 - porcentagem de amostras usadas por árvore. Um valor baixo pode levar a overfitting;\n",
+ "* gamma: 1 - Controla se um determinado nó será dividido com base na redução esperada na perda após a divisão. Um valor mais alto leva a menos divisões.\n",
+ "* objective: Define a \"loss function\". As opções são:\n",
+ " * reg:linear - Para resolver problemas de regressão;\n",
+ " * reg:logistic - Para resolver problemas de classificação;\n",
+ " * binary:logistic - Para resolver problemas de classificação com cálculo de probabilidades;\n",
+ "\n",
+ "# Referências\n",
+ "* [How exactly XGBoost Works?](https://medium.com/@pushkarmandot/how-exactly-xgboost-works-a320d9b8aeef)\n",
+ "* [Fine-tuning XGBoost in Python like a boss](https://towardsdatascience.com/fine-tuning-xgboost-in-python-like-a-boss-b4543ed8b1e)\n",
+ "* [Gentle Introduction of XGBoost Library](https://medium.com/@imoisharma/gentle-introduction-of-xgboost-library-2b1ac2669680)\n",
+ "* [A Beginner’s guide to XGBoost](https://towardsdatascience.com/a-beginners-guide-to-xgboost-87f5d4c30ed7)\n",
+ "* [Exploring XGBoost](https://towardsdatascience.com/exploring-xgboost-4baf9ace0cf6)\n",
+ "* [Feature Importance and Feature Selection With XGBoost in Python](https://machinelearningmastery.com/feature-importance-and-feature-selection-with-xgboost-in-python/)\n",
+ "* [Ensemble Learning case study: Running XGBoost on Google Colab free GPU](https://towardsdatascience.com/running-xgboost-on-google-colab-free-gpu-a-case-study-841c90fef101) - Recomendo\n",
+ "* [Predicting movie revenue with AdaBoost, XGBoost and LightGBM](https://towardsdatascience.com/predicting-movie-revenue-with-adaboost-xgboost-and-lightgbm-262eadee6daa)\n",
+ "* [Tuning XGBoost Hyperparameters with Scikit Optimize](https://towardsdatascience.com/how-to-improve-the-performance-of-xgboost-models-1af3995df8ad)\n",
+ "* [An Example of Hyperparameter Optimization on XGBoost, LightGBM and CatBoost using Hyperopt](https://towardsdatascience.com/an-example-of-hyperparameter-optimization-on-xgboost-lightgbm-and-catboost-using-hyperopt-12bc41a271e) - Interessante\n",
+ "* [XGBOOST vs LightGBM: Which algorithm wins the race !!!](https://towardsdatascience.com/lightgbm-vs-xgboost-which-algorithm-win-the-race-1ff7dd4917d) - LightGBM tem se mostrado interessante.\n",
+ "* [From Zero to Hero in XGBoost Tuning](https://towardsdatascience.com/from-zero-to-hero-in-xgboost-tuning-e48b59bfaf58) - Gostei\n",
+ "* [Build XGBoost / LightGBM models on large datasets — what are the possible solutions?](https://towardsdatascience.com/build-xgboost-lightgbm-models-on-large-datasets-what-are-the-possible-solutions-bf882da2c27d)\n",
+ "* [Selecting Optimal Parameters for XGBoost Model Training](https://towardsdatascience.com/selecting-optimal-parameters-for-xgboost-model-training-c7cd9ed5e45e) - Muito bom!\n",
+ "* [CatBoost vs. Light GBM vs. XGBoost](https://towardsdatascience.com/catboost-vs-light-gbm-vs-xgboost-5f93620723db)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "iMM_R4_ukV7x"
+ },
+ "source": [
+ "from xgboost import XGBClassifier\n",
+ "import xgboost as xgb\n",
+ "\n",
+ "# Instancia...\n",
+ "ml_XGB= XGBClassifier(silent=False, \n",
+ " scale_pos_weight=1,\n",
+ " learning_rate=0.01, \n",
+ " colsample_bytree = 1,\n",
+ " subsample = 0.8,\n",
+ " objective='binary:logistic', \n",
+ " n_estimators=1000, \n",
+ " reg_alpha = 0.3,\n",
+ " max_depth= 3, \n",
+ " gamma=1, \n",
+ " max_delta_step=5)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "E4wQMlDEFINR"
+ },
+ "source": [
+ "# Treina...\n",
+ "ml_XGB.fit(X_train, y_train)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "zAhsTtwGqMkG"
+ },
+ "source": [
+ "# Cross-Validation com 10 folds\n",
+ "a_scores_CV = cross_val_score(ml_XGB, X_train, y_train, cv = i_CV)\n",
+ "print(f'Média das Acurácias calculadas pelo CV....: {100*round(a_scores_CV.mean(),4)}')\n",
+ "print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "JNyKX6PkrXOk"
+ },
+ "source": [
+ "**Interpretação**: Nosso classificador (XGBClassifier) tem uma acurácia média de 96,72% (base de treinamento). Além disso, o std é da ordem de 2,02%, ou seja, pequena. Vamos tentar melhorar a acurácia do classificador usando parameter tunning (GridSearchCV)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "_h0QYv3FkV73"
+ },
+ "source": [
+ "print(f'Acurácias: {a_scores_CV}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "AKhhAZLjkV76"
+ },
+ "source": [
+ "# Faz predições...\n",
+ "y_pred = ml_XGB.predict(X_test)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Ir2Kd1PqGHgz"
+ },
+ "source": [
+ "# Confusion Matrix\n",
+ "cf_matrix = confusion_matrix(y_test, y_pred)\n",
+ "cf_labels = ['True_Negative','False_Positive','False_Negative','True_Positive']\n",
+ "cf_categories = ['Zero', 'One']\n",
+ "mostra_confusion_matrix(cf_matrix, group_names= cf_labels, categories= cf_categories)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "jEC7gW4qYpWw"
+ },
+ "source": [
+ "## Parameter tunning\n",
+ "### Leitura Adicional:\n",
+ "* [Fine-tuning XGBoost in Python like a boss](https://towardsdatascience.com/fine-tuning-xgboost-in-python-like-a-boss-b4543ed8b1e)\n",
+ "* [Complete Guide to Parameter Tuning in XGBoost with codes in Python](https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/)\n",
+ "\n",
+ "> Olhando para os resultados acima, qual o melhor modelo?\n",
+ "\n",
+ "XGBoost? Supondo que sim, agora vamos fazer o fine-tuning dos parâmetros do modelo."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "n3MsUONPwIV9"
+ },
+ "source": [
+ "# Dicionário de parâmetros para XGBoost:\n",
+ "d_parametros_XGB = {'min_child_weight': [i for i in np.arange(1, 13)]} #,\n",
+ "# 'gamma': [i for i in np.arange(0, 5, 0.5)],\n",
+ "# 'subsample': [0.6, 0.8, 1.0],\n",
+ "# 'colsample_bytree': [0.6, 0.8, 1.0],\n",
+ "# 'max_depth': [3, 4, 5, 7, 9],\n",
+ "# 'learning_rate': [i for i in np.arange(0.01, 1, 0.1)]}"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "CX27FCKmwSni"
+ },
+ "source": [
+ "# Invoca a função\n",
+ "ml_XGB, best_params= GridSearchOptimizer(ml_XGB, 'ml_XGB2', d_parametros_XGB, X_train, y_train, X_test, y_test, cv = i_CV)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "9b7uCuF74Hjv"
+ },
+ "source": [
+ "### Resultado da execução do XGBoostClassifier\n",
+ "\n",
+ "```\n",
+ "[Parallel(n_jobs=-1)]: Done 108000 out of 108000 | elapsed: 372.0min finished\n",
+ "\n",
+ "Parametros otimizados: {'colsample_bytree': 0.8, 'gamma': 0.5, 'learning_rate': 0.51, 'max_depth': 5, 'min_child_weight': 1, 'subsample': 0.6}\n",
+ "```\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "n7E0oyxEtbGi"
+ },
+ "source": [
+ "# Como o procedimento acima levou 372 minutos para executar, então vou estimar ml_XGB2 abaixo usando os parâmetros acima estimados\n",
+ "best_params= {'colsample_bytree': 0.8, 'gamma': 0.5, 'learning_rate': 0.51, 'max_depth': 5, 'min_child_weight': 1, 'subsample': 0.6}\n",
+ "\n",
+ "ml_XGB2= XGBClassifier(min_child_weight= best_params['min_child_weight'], \n",
+ " gamma= best_params['gamma'], \n",
+ " subsample= best_params['subsample'], \n",
+ " colsample_bytree= best_params['colsample_bytree'], \n",
+ " max_depth= best_params['max_depth'], \n",
+ " learning_rate= best_params['learning_rate'], \n",
+ " random_state= i_Seed)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "CuqyLHTU5Z-j"
+ },
+ "source": [
+ "## Selecionar as COLUNAS importantes/relevantes\n",
+ "* [The Multiple faces of ‘Feature importance’ in XGBoost](https://towardsdatascience.com/be-careful-when-interpreting-your-features-importance-in-xgboost-6e16132588e7)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "QPG3JZIpRZ-T"
+ },
+ "source": [
+ "# plot feature importance\n",
+ "from xgboost import plot_importance\n",
+ "\n",
+ "xgb.plot_importance(ml_XGB2, color = 'red')\n",
+ "plt.title('importance', fontsize = 20)\n",
+ "plt.yticks(fontsize = 10)\n",
+ "plt.ylabel('features', fontsize = 20)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "EmpRC2lHW-KP"
+ },
+ "source": [
+ "ml_XGB2"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "4f9MIEBiyq-5"
+ },
+ "source": [
+ "X_train_XGB, X_test_XGB= seleciona_colunas_relevantes(ml_XGB2, X_train, X_test)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "F6EayWaY5nMm"
+ },
+ "source": [
+ "## Treina o classificador com as COLUNAS relevantes"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Huy18gKI5qad"
+ },
+ "source": [
+ "best_params"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "E3-PaTdc5vZk"
+ },
+ "source": [
+ "# Treina com as COLUNAS relevantes...\n",
+ "ml_XGB2.fit(X_train_XGB, y_train)\n",
+ "\n",
+ "# Cross-Validation com 10 folds\n",
+ "a_scores_CV = cross_val_score(ml_XGB2, X_train_XGB, y_train, cv = i_CV)\n",
+ "print(f'Acurácia Media: {100*a_scores_CV.mean():.2f}')\n",
+ "print(f'std médio.....: {100*a_scores_CV.std():.2f}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "tBdYikDU6NhD"
+ },
+ "source": [
+ "## Valida o modelo usando o dataframe X_test"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "GcvY-VdL6VIZ"
+ },
+ "source": [
+ "y_pred_XGB = ml_XGB2.predict(X_test_XGB)\n",
+ "\n",
+ "# Calcula acurácia\n",
+ "accuracy_score(y_test, y_pred_XGB)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "8oLtdH-vTSbC"
+ },
+ "source": [
+ "xgb.to_graphviz(ml_XGB2)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "czXQG3MCHfHM"
+ },
+ "source": [
+ "# KNN - KNEIGHBORSCLASSIFIER"
+ ]
+ },
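+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> This section was left empty. Below is a minimal sketch (not a tuned model) of training and cross-validating a KNeighborsClassifier, assuming X_train, y_train and i_CV are defined as in the sections above."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "from sklearn.neighbors import KNeighborsClassifier\n",
+ "from sklearn.model_selection import cross_val_score\n",
+ "\n",
+ "# Instantiate the classifier (n_neighbors = 5 is just a starting value)\n",
+ "ml_KNN = KNeighborsClassifier(n_neighbors = 5)\n",
+ "\n",
+ "# Cross-validation with i_CV folds, as done for the other classifiers\n",
+ "a_scores_CV_KNN = cross_val_score(ml_KNN, X_train, y_train, cv = i_CV)\n",
+ "print(f'Mean accuracy: {100*a_scores_CV_KNN.mean():.2f}')\n",
+ "print(f'Std..........: {100*a_scores_CV_KNN.std():.2f}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },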
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "llTTXNeyHiwx"
+ },
+ "source": [
+ "# BAGGINGCLASSIFIER"
+ ]
+ },
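+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> This section was left empty. A minimal sketch of a BaggingClassifier (bagged decision trees by default), assuming X_train, y_train, i_CV and i_Seed are defined as above."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "from sklearn.ensemble import BaggingClassifier\n",
+ "from sklearn.model_selection import cross_val_score\n",
+ "\n",
+ "# Bagging: many trees trained on bootstrap samples, predictions aggregated by vote\n",
+ "ml_BAG = BaggingClassifier(n_estimators = 100, random_state = i_Seed)\n",
+ "\n",
+ "a_scores_CV_BAG = cross_val_score(ml_BAG, X_train, y_train, cv = i_CV)\n",
+ "print(f'Mean accuracy: {100*a_scores_CV_BAG.mean():.2f}')\n",
+ "print(f'Std..........: {100*a_scores_CV_BAG.std():.2f}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },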
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Fbkekd4QHoZO"
+ },
+ "source": [
+ "# EXTRATREESCLASSIFIER"
+ ]
+ },
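+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> This section was left empty. A minimal sketch of an ExtraTreesClassifier (extremely randomized trees), assuming X_train, y_train, i_CV and i_Seed are defined as above."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "from sklearn.ensemble import ExtraTreesClassifier\n",
+ "from sklearn.model_selection import cross_val_score\n",
+ "\n",
+ "# Like a Random Forest, but the split thresholds of each candidate feature are drawn at random\n",
+ "ml_ET = ExtraTreesClassifier(n_estimators = 100, random_state = i_Seed)\n",
+ "\n",
+ "a_scores_CV_ET = cross_val_score(ml_ET, X_train, y_train, cv = i_CV)\n",
+ "print(f'Mean accuracy: {100*a_scores_CV_ET.mean():.2f}')\n",
+ "print(f'Std..........: {100*a_scores_CV_ET.std():.2f}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },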
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "widavwR4HzwE"
+ },
+ "source": [
+ "# SVM\n",
+ "https://data-flair.training/blogs/svm-support-vector-machine-tutorial/"
+ ]
+ },
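+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> This section only contains a reference link. A minimal sketch of an SVM classifier, assuming X_train, y_train, i_CV and i_Seed are defined as above. Unlike tree-based models, SVMs are sensitive to feature scale, so the sketch scales the data inside a pipeline."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "from sklearn.svm import SVC\n",
+ "from sklearn.preprocessing import StandardScaler\n",
+ "from sklearn.pipeline import make_pipeline\n",
+ "from sklearn.model_selection import cross_val_score\n",
+ "\n",
+ "# Scale the features, then fit an RBF-kernel SVM (C and gamma are starting values, not tuned)\n",
+ "ml_SVM = make_pipeline(StandardScaler(), SVC(kernel = 'rbf', C = 1.0, gamma = 'scale', random_state = i_Seed))\n",
+ "\n",
+ "a_scores_CV_SVM = cross_val_score(ml_SVM, X_train, y_train, cv = i_CV)\n",
+ "print(f'Mean accuracy: {100*a_scores_CV_SVM.mean():.2f}')\n",
+ "print(f'Std..........: {100*a_scores_CV_SVM.std():.2f}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },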
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "id_Ubulns6We"
+ },
+ "source": [
+ "# NAIVE BAYES"
+ ]
+ },
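+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> This section was left empty. A minimal sketch of a Gaussian Naive Bayes classifier (it assumes features are conditionally independent and roughly Gaussian within each class), assuming X_train, y_train and i_CV are defined as above."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "from sklearn.naive_bayes import GaussianNB\n",
+ "from sklearn.model_selection import cross_val_score\n",
+ "\n",
+ "ml_NB = GaussianNB()\n",
+ "\n",
+ "a_scores_CV_NB = cross_val_score(ml_NB, X_train, y_train, cv = i_CV)\n",
+ "print(f'Mean accuracy: {100*a_scores_CV_NB.mean():.2f}')\n",
+ "print(f'Std..........: {100*a_scores_CV_NB.std():.2f}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },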
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "3e0m7lEnYOV9"
+ },
+ "source": [
+ "# **IMPORTANCIA DAS COLUNAS**\n",
+ "Source: [Plotting Feature Importances](https://www.kaggle.com/grfiv4/plotting-feature-importances)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "fjco0HnNYr-N"
+ },
+ "source": [
+ "def mostra_feature_importances(clf, X_train, y_train=None, \n",
+ " top_n=10, figsize=(8,8), print_table=False, title=\"Feature Importances\"):\n",
+ " '''\n",
+ " plot feature importances of a tree-based sklearn estimator\n",
+ " \n",
+ " Note: X_train and y_train are pandas DataFrames\n",
+ " \n",
+ " Note: Scikit-plot is a lovely package but I sometimes have issues\n",
+ " 1. flexibility/extendibility\n",
+ " 2. complicated models/datasets\n",
+ " But for many situations Scikit-plot is the way to go\n",
+ " see https://scikit-plot.readthedocs.io/en/latest/Quickstart.html\n",
+ " \n",
+ " Parameters\n",
+ " ----------\n",
+ " clf (sklearn estimator) if not fitted, this routine will fit it\n",
+ " \n",
+ " X_train (pandas DataFrame)\n",
+ " \n",
+ " y_train (pandas DataFrame) optional\n",
+ " required only if clf has not already been fitted \n",
+ " \n",
+ " top_n (int) Plot the top_n most-important features\n",
+ " Default: 10\n",
+ " \n",
+ " figsize ((int,int)) The physical size of the plot\n",
+ " Default: (8,8)\n",
+ " \n",
+ " print_table (boolean) If True, print out the table of feature importances\n",
+ " Default: False\n",
+ " \n",
+ " Returns\n",
+ " -------\n",
+ " the pandas dataframe with the features and their importance\n",
+ " \n",
+ " Author\n",
+ " ------\n",
+ " George Fisher\n",
+ " '''\n",
+ " \n",
+ " __name__ = \"mostra_feature_importances\"\n",
+ " \n",
+ " import pandas as pd\n",
+ " import numpy as np\n",
+ " import matplotlib.pyplot as plt\n",
+ " \n",
+ " from xgboost.core import XGBoostError\n",
+ " from lightgbm.sklearn import LightGBMError\n",
+ " \n",
+ " try: \n",
+ " if not hasattr(clf, 'feature_importances_'):\n",
+ " clf.fit(X_train.values, y_train.values.ravel())\n",
+ "\n",
+ " if not hasattr(clf, 'feature_importances_'):\n",
+ " raise AttributeError(\"{} does not have feature_importances_ attribute\".\n",
+ " format(clf.__class__.__name__))\n",
+ " \n",
+ " except (XGBoostError, LightGBMError, ValueError):\n",
+ " clf.fit(X_train.values, y_train.values.ravel())\n",
+ " \n",
+ " feat_imp = pd.DataFrame({'importance':clf.feature_importances_}) \n",
+ " feat_imp['feature'] = X_train.columns\n",
+ " feat_imp.sort_values(by ='importance', ascending = False, inplace = True)\n",
+ " feat_imp = feat_imp.iloc[:top_n]\n",
+ " \n",
+ " feat_imp.sort_values(by='importance', inplace = True)\n",
+ " feat_imp = feat_imp.set_index('feature', drop = True)\n",
+ " feat_imp.plot.barh(title=title, figsize=figsize)\n",
+ " plt.xlabel('Feature Importance Score')\n",
+ " plt.show()\n",
+ " \n",
+ " if print_table:\n",
+ " from IPython.display import display\n",
+ " print(\"Top {} features in descending order of importance\".format(top_n))\n",
+ " display(feat_imp.sort_values(by = 'importance', ascending = False))\n",
+ " \n",
+ " return feat_imp"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "ycu_EIGlYUYn"
+ },
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "from xgboost import XGBClassifier\n",
+ "from sklearn.ensemble import ExtraTreesClassifier\n",
+ "from sklearn.tree import ExtraTreeClassifier\n",
+ "from sklearn.tree import DecisionTreeClassifier\n",
+ "from sklearn.ensemble import GradientBoostingClassifier\n",
+ "from sklearn.ensemble import BaggingClassifier\n",
+ "from sklearn.ensemble import AdaBoostClassifier\n",
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "from lightgbm import LGBMClassifier\n",
+ "\n",
+ "clfs = [XGBClassifier(), LGBMClassifier(), \n",
+ " ExtraTreesClassifier(), ExtraTreeClassifier(),\n",
+ " BaggingClassifier(), DecisionTreeClassifier(),\n",
+ " GradientBoostingClassifier(), LogisticRegression(),\n",
+ " AdaBoostClassifier(), RandomForestClassifier()]\n",
+ "\n",
+ "for clf in clfs:\n",
+ " try:\n",
+ " _ = mostra_feature_importances(clf, X_train, y_train, top_n=X_train.shape[1], title=clf.__class__.__name__)\n",
+ " except AttributeError as e:\n",
+ " print(e)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "EwWkjfC8KEZH"
+ },
+ "source": [
+ "# ENSEMBLE METHODS\n",
+ "https://towardsdatascience.com/using-bagging-and-boosting-to-improve-classification-tree-accuracy-6d3bb6c95e5b\n",
+ "\n",
+ ""
+ ]
+ },
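+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> Besides bagging and boosting (covered above), another common ensemble strategy is voting: combine heterogeneous classifiers and aggregate their predictions. A minimal sketch with scikit-learn's VotingClassifier, assuming X_train, y_train, i_CV and i_Seed are defined as above."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "from sklearn.ensemble import VotingClassifier, RandomForestClassifier\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "from sklearn.naive_bayes import GaussianNB\n",
+ "from sklearn.model_selection import cross_val_score\n",
+ "\n",
+ "# Three heterogeneous base models\n",
+ "estimators = [('lr', LogisticRegression(max_iter = 1000)),\n",
+ " ('rf', RandomForestClassifier(random_state = i_Seed)),\n",
+ " ('nb', GaussianNB())]\n",
+ "\n",
+ "# Hard voting: each base model gets one vote; the majority class wins\n",
+ "ml_VOTE = VotingClassifier(estimators = estimators, voting = 'hard')\n",
+ "\n",
+ "a_scores_CV_VOTE = cross_val_score(ml_VOTE, X_train, y_train, cv = i_CV)\n",
+ "print(f'Mean accuracy: {100*a_scores_CV_VOTE.mean():.2f}')\n",
+ "print(f'Std..........: {100*a_scores_CV_VOTE.std():.2f}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },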
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "3Uf1RML7xETY"
+ },
+ "source": [
+ "# WOE e IV\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "TBNRfYZCyhMP"
+ },
+ "source": [
+ "## Construção do exemplo"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "gIIroyyP4ZRZ"
+ },
+ "source": [
+ "df_y.head()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "PzQQdrkf1ohX"
+ },
+ "source": [
+ "from random import choices\n",
+ "\n",
+ "df_X2= df_X.copy()\n",
+ "df_X2['tipo']= choices(['A', 'B', 'C', 'D'], k= 1000)\n",
+ "df_X2['idade']= np.random.randint(10, 15, size= 1000)\n",
+ "df_X2['target']= df_y['target']\n",
+ "df_X2.head()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "v-OpwIpx4hXJ"
+ },
+ "source": [
+ "df_X2['target'].value_counts()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "yZfqSvbKzeJ3"
+ },
+ "source": [
+ "def Constroi_Buckets(df, i, k= 10):\n",
+ " coluna= 'v'+ str(i)\n",
+ " df[coluna+'_Bucket']= pd.cut(df[coluna], bins= k, labels= np.arange(1, k+1))\n",
+ " df= df.drop(columns= [coluna], axis= 1)\n",
+ " return df"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "V6Nrpsx60HD3"
+ },
+ "source": [
+ "for i in np.arange(1,19):\n",
+ " df_X2= Constroi_Buckets(df_X2, i)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "J2Fbh41-03OB"
+ },
+ "source": [
+ "df_X2.head()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "O9r5BeWVxIr3"
+ },
+ "source": [
+ "# Função para calcular WOE e IV\n",
+ "def calculate_woe_iv(dataset, feature, target):\n",
+ "\n",
+ " def codethem(IV):\n",
+ " if IV < 0.02: return 'Useless'\n",
+ " elif IV >= 0.02 and IV < 0.1: return 'Weak'\n",
+ " elif IV >= 0.1 and IV < 0.3: return 'Medium'\n",
+ " elif IV >= 0.3 and IV < 0.5: return 'Strong'\n",
+ " elif IV >= 0.5: return 'Suspicious'\n",
+ " else: return 'None'\n",
+ "\n",
+ " lst = []\n",
+ " for i in range(dataset[feature].nunique()):\n",
+ " val = list(dataset[feature].unique())[i]\n",
+ " lst.append({\n",
+ " 'Value': val,\n",
+ " 'All': dataset[dataset[feature] == val].count()[feature],\n",
+ " 'Good': dataset[(dataset[feature] == val) & (dataset[target] == 0)].count()[feature],\n",
+ " 'Bad': dataset[(dataset[feature] == val) & (dataset[target] == 1)].count()[feature]\n",
+ " })\n",
+ " \n",
+ " dset = pd.DataFrame(lst)\n",
+ " dset['Distr_Good'] = dset['Good']/dset['Good'].sum()\n",
+ " dset['Distr_Bad'] = dset['Bad']/dset['Bad'].sum()\n",
+ " dset['Mean']= dset['All']/dset['All'].sum()\n",
+ " dset['WoE'] = np.log(dset['Distr_Good']/dset['Distr_Bad'])\n",
+ " dset = dset.replace({'WoE': {np.inf: 0, -np.inf: 0}})\n",
+ " dset['IV'] = (dset['Distr_Good'] - dset['Distr_Bad']) * dset['WoE']\n",
+ " #dset= dset.drop(columns= ['Distr_Good', 'Distr_Bad'], axis= 1)\n",
+ "\n",
+ " dset['Predictive_Power']= dset['IV'].map(codethem)\n",
+ " iv = dset['IV'].sum() \n",
+ " dset = dset.sort_values(by='IV') \n",
+ " return dset, iv"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Y8WGjWH63nx_"
+ },
+ "source": [
+ "df_Lab = df_X2.copy()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "-N6xr1MgxTiz"
+ },
+ "source": [
+ "def calcula_Predictive_Power(df_Lab, coluna):\n",
+ " print('WoE and IV for column: {}'.format(coluna))\n",
+ " df, iv = calculate_woe_iv(df_Lab, coluna, 'target')\n",
+ " print(df)\n",
+ " print('IV score: {:.2f}'.format(iv))\n",
+ " print('\\n')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "ayqN_7WnxVq9"
+ },
+ "source": [
+ "for i in np.arange(1,19):\n",
+ " coluna= 'v'+str(i)+'_Bucket'\n",
+ " calcula_Predictive_Power(df_Lab, coluna)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "qtoJVI4Pyx3I"
+ },
+ "source": [
+ "# **IMBALANCED SAMPLE**\n",
+ "> Alguns objetivos como detectar fraude em transações bancárias ou detecção de intrusão em network tem em comum o fato que a classe de interesse (o que queremos detectar), geralmente é um evento raro\n",
+ "\n",
+ "## Exemplo: Detectar fraude\n",
+ "A proporção de fraudes diante de NÃO-FRAUDES são mais ou menos 1%/99%. Neste caso, ao desenvovermos um modelo para detectar fraudes e o modelo classificar todas as instâncias como NÃO-FRAUDE, então o modelo terá uma acurácia de 99%. No entanto, este modelo não nos ajudará em nada.\n",
+ "\n",
+ "## Necessidade de se usar outras métricas \n",
+ "> Recomenda-se utilizar outras métricas (na verdade, é boa prática usar mais de 1 métrica para medir a performance dos modelos) como, por exemplo, F1-Score, Precision/Specificity, Recall/Sensitivity e AUROC.\n",
+ "\n",
+ "## Como lidar com a amostra desbalanceada?\n",
+ "* Under-sampling\n",
+ "> Seleciona aleatoriamente a classe MAJORITÁRIA (em nosso exemplo, NÃO-FRAUDE) até o número de instâncias da classe MINORITÁRIA (FRAUDE);\n",
+ "\n",
+ "* Over-sampling\n",
+ "> Resample aleatoriamente a classe MINORITÁRIA (em nosso exemplo, FRAUDE) até o número de instâncias da classe MAJORITÁRIA (NÃO-FRAUDE), ou uma proporção da classe MAJORITÁRIA. Veja a bibliotea SMOTE (Synthetic Minority Over-Sampling Techniques);\n",
+ "\n",
+ "\n"
+ ]
+ },
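+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> A minimal sketch of the two strategies above using the imbalanced-learn package (it may need to be installed with !pip install imbalanced-learn). It assumes X_train, y_train and i_Seed are defined as above; the resampled data would be used only for training, never for the test set."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "from collections import Counter\n",
+ "from imblearn.over_sampling import SMOTE\n",
+ "from imblearn.under_sampling import RandomUnderSampler\n",
+ "\n",
+ "print(f'Original class distribution..: {Counter(y_train)}')\n",
+ "\n",
+ "# Over-sampling: synthesize new MINORITY-class instances with SMOTE\n",
+ "X_over, y_over = SMOTE(random_state = i_Seed).fit_resample(X_train, y_train)\n",
+ "print(f'After SMOTE over-sampling....: {Counter(y_over)}')\n",
+ "\n",
+ "# Under-sampling: randomly drop MAJORITY-class instances\n",
+ "X_under, y_under = RandomUnderSampler(random_state = i_Seed).fit_resample(X_train, y_train)\n",
+ "print(f'After random under-sampling..: {Counter(y_under)}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },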
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "2o45zx8zw-aB"
+ },
+ "source": [
+ "## EFEITOS DA AMOSTRA DESBALANCEADA"
+ ]
+ },
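+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> This section was left empty. The self-contained toy example below illustrates the effect described above: on a 99%/1% sample, a model that always predicts the majority class reaches 99% accuracy while detecting nothing."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "import numpy as np\n",
+ "from sklearn.metrics import accuracy_score, recall_score, f1_score\n",
+ "\n",
+ "# Toy target: 990 NON-FRAUD (0) and 10 FRAUD (1) instances\n",
+ "y_true = np.array([0]*990 + [1]*10)\n",
+ "\n",
+ "# A useless model that predicts NON-FRAUD for every instance\n",
+ "y_pred = np.zeros_like(y_true)\n",
+ "\n",
+ "print(f'Accuracy: {accuracy_score(y_true, y_pred):.2%}')  # 99.00% - looks great\n",
+ "print(f'Recall..: {recall_score(y_true, y_pred):.2%}')    # 0.00% - no fraud detected\n",
+ "print(f'F1-Score: {f1_score(y_true, y_pred):.2%}')        # 0.00%"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },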
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "cCVTPCB-Xkbd"
+ },
+ "source": [
+ "# TPOT\n",
+ "https://towardsdatascience.com/tpot-automated-machine-learning-in-python-4c063b3e5de9"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "2ulXii6JXpWd"
+ },
+ "source": [
+ ""
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
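+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> The cell above was left empty. A minimal TPOT sketch (it may need !pip install tpot), assuming X_train, y_train, X_test, y_test, i_CV and i_Seed are defined as above; generations and population_size are kept small only to limit runtime."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "from tpot import TPOTClassifier\n",
+ "\n",
+ "# TPOT searches over full pipelines (preprocessing + model + hyperparameters) with genetic programming\n",
+ "tpot = TPOTClassifier(generations = 5, population_size = 20, cv = i_CV,\n",
+ " random_state = i_Seed, verbosity = 2, n_jobs = -1)\n",
+ "tpot.fit(X_train, y_train)\n",
+ "\n",
+ "print(tpot.score(X_test, y_test))\n",
+ "\n",
+ "# Export the best pipeline found as a Python script\n",
+ "tpot.export('tpot_best_pipeline.py')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },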
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "_TWUq-z4X4yZ"
+ },
+ "source": [
+ "___\n",
+ "# FEATURETOOLS\n",
+ "https://medium.com/@rrfd/simple-automatic-feature-engineering-using-featuretools-in-python-for-classification-b1308040e183\n",
+ "\n",
+ "https://www.analyticsvidhya.com/blog/2018/08/guide-automated-feature-engineering-featuretools-python/\n",
+ "\n",
+ "https://mlwhiz.com/blog/2019/05/19/feature_extraction/\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "aZUNOgmSgAmq"
+ },
+ "source": [
+ "!pip install featuretools"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "_sxdONzsh9rb"
+ },
+ "source": [
+ "df_X.head()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "p5_ynGo1dBJJ"
+ },
+ "source": [
+ "df_X.shape"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "TqJRJXUhiDqf"
+ },
+ "source": [
+ "from random import choices\n",
+ "\n",
+ "df_X2= df_X.copy()\n",
+ "df_X2['tipo'] = choices(['A', 'B', 'C', 'D'], k = 1000)\n",
+ "df_X2['idade'] = np.random.randint(10, 15, size = 1000)\n",
+ "df_X2['id'] = range(0,1000)\n",
+ "df_X2.head()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "nR56bGGngk-W"
+ },
+ "source": [
+ "# Automated feature engineering\n",
+ "import featuretools as ft\n",
+ "import featuretools.variable_types as vtypes\n",
+ "\n",
+ "es= ft.EntitySet(id = 'simulacao')\n",
+ "\n",
+ "# adding a dataframe \n",
+ "es.entity_from_dataframe(entity_id = 'df_X2', dataframe = df_X2, index = 'id')\n",
+ "es"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "IOJ4Tr5Ogk6M"
+ },
+ "source": [
+ "es['df_X2'].variables"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "1uXPqHDZgkys"
+ },
+ "source": [
+ "variable_types = {'idade': vtypes.Categorical}\n",
+ " \n",
+ "es.entity_from_dataframe(entity_id = 'df_X2', dataframe = df_X2, index = 'id', variable_types= variable_types)\n",
+ "\n",
+ "es = es.normalize_entity(base_entity_id='df_X2', new_entity_id= 'tipo', index='id')\n",
+ "es = es.normalize_entity(base_entity_id='df_X2', new_entity_id= 'idade', index='id')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "dnbYTBqugkvm"
+ },
+ "source": [
+ "es"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "I2v_jetdgkr7"
+ },
+ "source": [
+ "feature_matrix, feature_names = ft.dfs(entityset=es, target_entity = 'df_X2', max_depth = 3, verbose = 3, n_jobs= 1)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "zZiRBvHXgkoJ"
+ },
+ "source": [
+ "feature_matrix.head()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "aWiahwKe2d6U"
+ },
+ "source": [
+ "# **EXERCÍCIOS**\n",
+ "> Encontre algoritmos adequados para ser aplicados aos seguintes problemas:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "XbSLkbDB2mzK"
+ },
+ "source": [
+ "## Exercício 1 - Credit Card Fraud Detection\n",
+ "Source: [Credit Card Fraud Detection](https://www.kaggle.com/mlg-ulb/creditcardfraud)\n",
+ "\n",
+ "### Leitura suporte\n",
+ "* [Detecting Credit Card Fraud Using Machine Learning](https://towardsdatascience.com/detecting-credit-card-fraud-using-machine-learning-a3d83423d3b8)\n",
+ "* [Credit Card Fraud Detection](https://towardsdatascience.com/credit-card-fraud-detection-a1c7e1b75f59)\n",
+ "\n",
+ "### Dataframe\n",
+ "* [Creditcard.csv](https://raw.githubusercontent.com/MathMachado/DataFrames/master/creditcard.csv)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "sPNc6ouw2MRe"
+ },
+ "source": [
+ "import pandas as pd\n",
+ "import numpy as np"
+ ],
+ "execution_count": 2,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "RlGFLoPi2OFJ",
+ "outputId": "ec18dcc6-9703-4764-d781-9d2c5738c54f",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 224
+ }
+ },
+ "source": [
+ "url= 'https://raw.githubusercontent.com/gersonhenz/DSWP/master/Dataframes/creditcard.csv'\n",
+ "df_cc = pd.read_csv(url)\n",
+ "df_cc.head()"
+ ],
+ "execution_count": 4,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " Time | \n",
+ " V1 | \n",
+ " V2 | \n",
+ " V3 | \n",
+ " V4 | \n",
+ " V5 | \n",
+ " V6 | \n",
+ " V7 | \n",
+ " V8 | \n",
+ " V9 | \n",
+ " V10 | \n",
+ " V11 | \n",
+ " V12 | \n",
+ " V13 | \n",
+ " V14 | \n",
+ " V15 | \n",
+ " V16 | \n",
+ " V17 | \n",
+ " V18 | \n",
+ " V19 | \n",
+ " V20 | \n",
+ " V21 | \n",
+ " V22 | \n",
+ " V23 | \n",
+ " V24 | \n",
+ " V25 | \n",
+ " V26 | \n",
+ " V27 | \n",
+ " V28 | \n",
+ " Amount | \n",
+ " Class | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " | 0 | \n",
+ " 0 | \n",
+ " -1.359807 | \n",
+ " -0.072781 | \n",
+ " 2.536347 | \n",
+ " 1.378155 | \n",
+ " -0.338321 | \n",
+ " 0.462388 | \n",
+ " 0.239599 | \n",
+ " 0.098698 | \n",
+ " 0.363787 | \n",
+ " 0.090794 | \n",
+ " -0.551600 | \n",
+ " -0.617801 | \n",
+ " -0.991390 | \n",
+ " -0.311169 | \n",
+ " 1.468177 | \n",
+ " -0.470401 | \n",
+ " 0.207971 | \n",
+ " 0.025791 | \n",
+ " 0.403993 | \n",
+ " 0.251412 | \n",
+ " -0.018307 | \n",
+ " 0.277838 | \n",
+ " -0.110474 | \n",
+ " 0.066928 | \n",
+ " 0.128539 | \n",
+ " -0.189115 | \n",
+ " 0.133558 | \n",
+ " -0.021053 | \n",
+ " 149.62 | \n",
+ " 0.0 | \n",
+ "
\n",
+ " \n",
+ " | 1 | \n",
+ " 0 | \n",
+ " 1.191857 | \n",
+ " 0.266151 | \n",
+ " 0.166480 | \n",
+ " 0.448154 | \n",
+ " 0.060018 | \n",
+ " -0.082361 | \n",
+ " -0.078803 | \n",
+ " 0.085102 | \n",
+ " -0.255425 | \n",
+ " -0.166974 | \n",
+ " 1.612727 | \n",
+ " 1.065235 | \n",
+ " 0.489095 | \n",
+ " -0.143772 | \n",
+ " 0.635558 | \n",
+ " 0.463917 | \n",
+ " -0.114805 | \n",
+ " -0.183361 | \n",
+ " -0.145783 | \n",
+ " -0.069083 | \n",
+ " -0.225775 | \n",
+ " -0.638672 | \n",
+ " 0.101288 | \n",
+ " -0.339846 | \n",
+ " 0.167170 | \n",
+ " 0.125895 | \n",
+ " -0.008983 | \n",
+ " 0.014724 | \n",
+ " 2.69 | \n",
+ " 0.0 | \n",
+ "
\n",
+ " \n",
+ " | 2 | \n",
+ " 1 | \n",
+ " -1.358354 | \n",
+ " -1.340163 | \n",
+ " 1.773209 | \n",
+ " 0.379780 | \n",
+ " -0.503198 | \n",
+ " 1.800499 | \n",
+ " 0.791461 | \n",
+ " 0.247676 | \n",
+ " -1.514654 | \n",
+ " 0.207643 | \n",
+ " 0.624501 | \n",
+ " 0.066084 | \n",
+ " 0.717293 | \n",
+ " -0.165946 | \n",
+ " 2.345865 | \n",
+ " -2.890083 | \n",
+ " 1.109969 | \n",
+ " -0.121359 | \n",
+ " -2.261857 | \n",
+ " 0.524980 | \n",
+ " 0.247998 | \n",
+ " 0.771679 | \n",
+ " 0.909412 | \n",
+ " -0.689281 | \n",
+ " -0.327642 | \n",
+ " -0.139097 | \n",
+ " -0.055353 | \n",
+ " -0.059752 | \n",
+ " 378.66 | \n",
+ " 0.0 | \n",
+ "
\n",
+ " \n",
+ " | 3 | \n",
+ " 1 | \n",
+ " -0.966272 | \n",
+ " -0.185226 | \n",
+ " 1.792993 | \n",
+ " -0.863291 | \n",
+ " -0.010309 | \n",
+ " 1.247203 | \n",
+ " 0.237609 | \n",
+ " 0.377436 | \n",
+ " -1.387024 | \n",
+ " -0.054952 | \n",
+ " -0.226487 | \n",
+ " 0.178228 | \n",
+ " 0.507757 | \n",
+ " -0.287924 | \n",
+ " -0.631418 | \n",
+ " -1.059647 | \n",
+ " -0.684093 | \n",
+ " 1.965775 | \n",
+ " -1.232622 | \n",
+ " -0.208038 | \n",
+ " -0.108300 | \n",
+ " 0.005274 | \n",
+ " -0.190321 | \n",
+ " -1.175575 | \n",
+ " 0.647376 | \n",
+ " -0.221929 | \n",
+ " 0.062723 | \n",
+ " 0.061458 | \n",
+ " 123.50 | \n",
+ " 0.0 | \n",
+ "
\n",
+ " \n",
+ " | 4 | \n",
+ " 2 | \n",
+ " -1.158233 | \n",
+ " 0.877737 | \n",
+ " 1.548718 | \n",
+ " 0.403034 | \n",
+ " -0.407193 | \n",
+ " 0.095921 | \n",
+ " 0.592941 | \n",
+ " -0.270533 | \n",
+ " 0.817739 | \n",
+ " 0.753074 | \n",
+ " -0.822843 | \n",
+ " 0.538196 | \n",
+ " 1.345852 | \n",
+ " -1.119670 | \n",
+ " 0.175121 | \n",
+ " -0.451449 | \n",
+ " -0.237033 | \n",
+ " -0.038195 | \n",
+ " 0.803487 | \n",
+ " 0.408542 | \n",
+ " -0.009431 | \n",
+ " 0.798278 | \n",
+ " -0.137458 | \n",
+ " 0.141267 | \n",
+ " -0.206010 | \n",
+ " 0.502292 | \n",
+ " 0.219422 | \n",
+ " 0.215153 | \n",
+ " 69.99 | \n",
+ " 0.0 | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " Time V1 V2 V3 ... V27 V28 Amount Class\n",
+ "0 0 -1.359807 -0.072781 2.536347 ... 0.133558 -0.021053 149.62 0.0\n",
+ "1 0 1.191857 0.266151 0.166480 ... -0.008983 0.014724 2.69 0.0\n",
+ "2 1 -1.358354 -1.340163 1.773209 ... -0.055353 -0.059752 378.66 0.0\n",
+ "3 1 -0.966272 -0.185226 1.792993 ... 0.062723 0.061458 123.50 0.0\n",
+ "4 2 -1.158233 0.877737 1.548718 ... 0.219422 0.215153 69.99 0.0\n",
+ "\n",
+ "[5 rows x 31 columns]"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 4
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "AWBaTtKL5NG5",
+ "outputId": "cf3c7d2d-a208-4de2-91da-549df5dc9fbc",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ }
+ },
+ "source": [
+ "df_cc.shape"
+ ],
+ "execution_count": 9,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "(12842, 31)"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 9
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "dpisbkhv3f_p",
+ "outputId": "175f9013-b169-4a73-8af7-828a83eeeb24",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 68
+ }
+ },
+ "source": [
+ "df_cc['Class'].value_counts()"
+ ],
+ "execution_count": 7,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "0.0 12785\n",
+ "1.0 56\n",
+ "Name: Class, dtype: int64"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 7
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "XJcK_gRh3uzD",
+ "outputId": "8c3fdbb7-30aa-4ea9-c11e-dc8db3033c6f",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ }
+ },
+ "source": [
+ "56/12785\n"
+ ],
+ "execution_count": 8,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "0.004380132968322252"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 8
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "MDskpiqf4DHI"
+ },
+ "source": [
+ "# não precisa normalizar os campos neste exercício, pois o DECISION TREE não requer isso;\n",
+ "# aplicar as transformações (principais) e reestimar modelo\n",
+ "# qual o impacto das transformações? A conclusão mudou ou não?\n",
+ "\n"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "6HHkLchN2-Wh",
+ "outputId": "54742978-b903-4404-8b69-9dd8e7926bc5",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 561
+ }
+ },
+ "source": [
+ "df_cc.isna().sum()"
+ ],
+ "execution_count": 5,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "Time 0\n",
+ "V1 0\n",
+ "V2 0\n",
+ "V3 0\n",
+ "V4 0\n",
+ "V5 0\n",
+ "V6 0\n",
+ "V7 0\n",
+ "V8 0\n",
+ "V9 0\n",
+ "V10 1\n",
+ "V11 1\n",
+ "V12 1\n",
+ "V13 1\n",
+ "V14 1\n",
+ "V15 1\n",
+ "V16 1\n",
+ "V17 1\n",
+ "V18 1\n",
+ "V19 1\n",
+ "V20 1\n",
+ "V21 1\n",
+ "V22 1\n",
+ "V23 1\n",
+ "V24 1\n",
+ "V25 1\n",
+ "V26 1\n",
+ "V27 1\n",
+ "V28 1\n",
+ "Amount 1\n",
+ "Class 1\n",
+ "dtype: int64"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 5
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "qhObHtS63ecx",
+ "outputId": "c0265507-9ad7-4a11-c847-b82f952d66a7",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ }
+ },
+ "source": [
+ "df_cc2 = df_cc.copy()\n",
+ "df_cc2 = df_cc.dropna()\n",
+ "df_cc2.shape"
+ ],
+ "execution_count": 10,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "(12841, 31)"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 10
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "CBl1x6tM49iz"
+ },
+ "source": [
+ "# Definir as variáveis globais\n",
+ "i_CV = 10\n",
+ "i_Seed = 20111974\n",
+ "f_Test_Size = 0.3\n"
+ ],
+ "execution_count": 12,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "CSuzy52r7zoY",
+ "outputId": "b1077f26-a1ca-494d-c4a4-662a481de491",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 197
+ }
+ },
+ "source": [
+ "df_X = df_cc2.copy() # dataframe só com as coluna preditoras.... não consegui copiar tudo ....\n",
+ "df_X.drop(colums = ['Class'], axis=1, inplace = True)\n",
+ "df_X.head()"
+ ],
+ "execution_count": 14,
+ "outputs": [
+ {
+ "output_type": "error",
+ "ename": "TypeError",
+ "evalue": "ignored",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
+ "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mdf_X\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdf_cc2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcopy\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# dataframe só com as coluna preditoras.... não consegui copiar tudo ....\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mdf_X\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdrop\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcolums\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m'Class'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minplace\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0mdf_X\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhead\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;31mTypeError\u001b[0m: drop() got an unexpected keyword argument 'colums'"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "pjj8IffX7QAe"
+ },
+ "source": [
+ "df_X = df_cc2 [] # dataframe somente com as preditoras\n",
+ "df_y = df_cc2 ['Class'] # variável RESPOSTA\n",
+ "\n",
+ "\n"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "ZdfL-K7z7ZB5"
+ },
+ "source": [
+ ""
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "W1ymC93v54ti"
+ },
+ "source": [
+ "from sklearn.model_selection train_test_split\n",
+ "\n",
+ "X_treinamento, X_teste, y_treinamento, y_teste = train_test_split(df_X, df_y, test_size = f_Test_Size, )"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "oYgK6JXd3MgA"
+ },
+ "source": [
+ "## Exercício 2 - Predicting species on IRIS dataset\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "si0rsJvu3O6O"
+ },
+ "source": [
+ "from sklearn import datasets\n",
+ "import xgboost as xgb\n",
+ "\n",
+ "iris = datasets.load_iris()\n",
+ "X_iris = iris.data\n",
+ "y_iris = iris.target"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "zom8t4yWC_UC"
+ },
+ "source": [
+ "## Exercício 3 - Predict Wine Quality\n",
+ "> Estimar a qualidade dos vinhos, numa scala de 0–100. A seguir, a qualidade em função da escala:\n",
+ "\n",
+ "* 95–100 Classic: a great wine\n",
+ "* 90–94 Outstanding: a wine of superior character and style\n",
+ "* 85–89 Very good: a wine with special qualities\n",
+ "* 80–84 Good: a solid, well-made wine\n",
+ "* 75–79 Mediocre: a drinkable wine that may have minor flaws\n",
+ "* 50–74 Not recommended\n",
+ "\n",
+ "Source: [Wine Reviews](https://www.kaggle.com/zynicide/wine-reviews)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "klL2Q9Ria96n"
+ },
+ "source": [
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "from sklearn import datasets\n",
+ "\n",
+ "Wine = datasets.load_wine()\n",
+ "X_vinho = Wine.data\n",
+ "y_vinho = Wine.target"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "lhVhSWBgGijq"
+ },
+ "source": [
+ "## Exercício 4 - Predict Parkinson\n",
+ "Source: https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/"
+ ]
+ },
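+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> A loading sketch for this exercise. The UCI Parkinsons repository usually exposes a comma-separated file named parkinsons.data (the exact file name is an assumption; check the directory listed above if it differs)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "# Assumed file name inside the UCI directory referenced above\n",
+ "url_parkinson = 'https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data'\n",
+ "df_parkinson = pd.read_csv(url_parkinson)\n",
+ "\n",
+ "df_parkinson.head()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },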
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "SVCxHqv0VBJn"
+ },
+ "source": [
+ "## Exercício 5 - Predict survivors from Titanic tragedy\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "CwvB8us4eKNi"
+ },
+ "source": [
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "import seaborn as sns\n",
+ "\n",
+ "df_titanic = sns.load_dataset('titanic')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ZJrT9YIXVdtx"
+ },
+ "source": [
+ "## Exercício 6 - Predict Loan\n",
+ "> Os dados devem ser obtidos diretamente da fonte: [Loan Default Prediction - Imperial College London](https://www.kaggle.com/c/loan-default-prediction/data)\n",
+ "\n",
+ "* [Bank Loan Default Prediction](https://medium.com/@wutianhao910/bank-loan-default-prediction-94d4902db740)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "R8-GVu7ZWeA8"
+ },
+ "source": [
+ "## Exercício 7 - Predict the sales of a store.\n",
+ "* [Predicting expected sales for Bigmart’s stores](https://medium.com/diogo-menezes-borges/project-1-bigmart-sale-prediction-fdc04f07dc1e)\n",
+ "* Dataframes\n",
+ " * [Treinamento](https://raw.githubusercontent.com/MathMachado/DataFrames/master/Big_Mart_Sales_III_train.txt)\n",
+ " * [Validação](https://raw.githubusercontent.com/MathMachado/DataFrames/master/Big_Mart_Sales_III_test.txt)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "fv9w86j4Wnwj"
+ },
+ "source": [
+ "## Exercício 8 - [The Boston Housing Dataset](https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html)\n",
+ "> Predict the median value of owner occupied homes."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "5HYRt8-ug1BT"
+ },
+ "source": [
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "from sklearn import datasets\n",
+ "\n",
+ "Boston = datasets.load_boston()\n",
+ "X_boston = Boston.data\n",
+ "y_boston = Boston.target"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "1UDIaqmtXQ0T"
+ },
+ "source": [
+ "## Exercício 9 - Predict the height or weight of a person.\n",
+ "\n",
+ "http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-7R146nIXmMT"
+ },
+ "source": [
+ "## Exercício 10 - Black Friday Sales Prediction - Predict purchase amount.\n",
+ "\n",
+ "This dataset comprises of sales transactions captured at a retail store. It’s a classic dataset to explore and expand your feature engineering skills and day to day understanding from multiple shopping experiences. This is a regression problem. The dataset has 550,069 rows and 12 columns.\n",
+ "\n",
+ "https://github.com/MathMachado/DataFrames/blob/master/blackfriday.zip\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "mQ8FPbuLZlIh"
+ },
+ "source": [
+ "## Exercício 11 - Predict the income class of US population.\n",
+ "\n",
+ "http://archive.ics.uci.edu/ml/machine-learning-databases/census-income-mld/"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Af4NRrchgPlM"
+ },
+ "source": [
+ "## Exercício 12 - Predicting Cancer\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "c4LOlgZW3P40"
+ },
+ "source": [
+ "from sklearn import datasets\n",
+ "cancer = datasets.load_breast_cancer()\n",
+ "X_cancer = cancer.data\n",
+ "y_cancer = cancer.target"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "74PmpT8Ix0tD"
+ },
+ "source": [
+ "## Exercício 13\n",
+ "Source: [Complete Machine Learning Guide to Parameter Tuning in Gradient Boosting (GBM) in Python](https://www.analyticsvidhya.com/blog/2016/02/complete-guide-parameter-tuning-gradient-boosting-gbm-python/).\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "WY8GZMixZ9W9"
+ },
+ "source": [
+ "## Exercício 14 - Predict Diabetes"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "y92t6tbOge0S"
+ },
+ "source": [
+ "from sklearn import datasets\n",
+ "Diabetes= datasets.load_diabetes()\n",
+ "\n",
+ "X_diabetes = Diabetes.data\n",
+ "y_diabetes = Diabetes.target"
+ ],
+ "execution_count": null,
+ "outputs": []
+ }
+ ]
+}
\ No newline at end of file