diff --git a/Notebooks/NB07__Dictionaries_gerson.ipynb b/Notebooks/NB07__Dictionaries_gerson.ipynb new file mode 100644 index 000000000..29e256ba4 --- /dev/null +++ b/Notebooks/NB07__Dictionaries_gerson.ipynb @@ -0,0 +1,2681 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "NB07__Dictionaries.ipynb", + "provenance": [], + "collapsed_sections": [ + "n8BIbzQbNWUo", + "7eS94uQ4NhVR", + "SYOgJpGYVLUu", + "CaHFxk98W5if", + "ReWUyWiHXCnc", + "CqszHxaKHr2h", + "tXgF1Wl9gHKY", + "Fotx7XUquAo8", + "36kmLUYDvsUI", + "SWO2GdNovxAp", + "vpN54l4vxze5", + "u4HOf9SNytSq", + "6BQ9oZiD9hg5", + "tz5-QdrX9vct", + "p1muBgMX8NK4", + "FxTC2-U88ajk", + "z8EYn0pP25Rh" + ], + "include_colab_link": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "accelerator": "GPU" + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iBW6agsvqqAm" + }, + "source": [ + "

# **DICIONÁRIOS**

\n", + "\n", + "* Coleção desordenada, mutável e indexada (estrutura do tipo {key: value}) de itens;\n", + "* Não permite itens duplicados;\n", + "* Usamos {key: value} para representar os itens do dicionário;\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LFcr_2Xnq2ho" + }, + "source": [ + "# **AGENDA**:\n", + "\n", + "> Veja o **índice** dos itens que serão abordados neste capítulo.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "r8vR-lHJIhgM" + }, + "source": [ + "# **NOTAS E OBSERVAÇÕES**\n", + "* Levar os exemplos de lambda function daqui para o capítulo de Lambda Function.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DkxCxjsbE5fL" + }, + "source": [ + "# **CHEETSHEET**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cGUWTualFCOk" + }, + "source": [ + "![DataSctructures](https://github.com/MathMachado/Materials/blob/master/PythonDataStructures.png?raw=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ublDMf3R_qMn" + }, + "source": [ + "A seguir, os principais métodos associados aos dicionários. Para isso, considere as listas l_frutas e l_precos_frutas que darão origem ao dicionário d_frutas a seguir:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "FxuJ7Awd8f5a" + }, + "source": [ + "# Definição da lista l_frutas:\n", + "l_frutas = ['Avocado', 'Apple', 'Apricot', 'Banana', 'Blackcurrant', 'Blackberry', 'Blueberry', 'Cherry', 'Coconut', 'Fig', 'Grape', 'Kiwi', 'Lemon', 'Mango', 'Nectarine', \n", + " 'Orange', 'Papaya','Passion Fruit','Peach','Pineapple','Plum','Raspberry','Strawberry','Watermelon']" + ], + "execution_count": 4, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "jJyxuMQc9Ewy" + }, + "source": [ + "# Definição da lista l_precos_frutas:\n", + "l_precos_frutas = [0.35, 0.40, 0.25, 0.30, 0.70, 0.55, 0.45, 0.50, 0.75, 0.60, 0.65, 0.20, 0.15, 0.80, 0.75, 0.25, 0.30,0.45,0.55,0.55,0.60,0.40,0.50,0.45]" + ], + "execution_count": 5, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "C59Z2LQpZ7DD", + "outputId": "fb61f9fc-7c46-418d-8e95-f5e9d1090dc2", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "a= ['abacate', 'ameixa']\n", + "p= [4, 8]\n", + "c= dict(zip(a,p))\n", + "c" + ], + "execution_count": 2, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'abacate': 4, 'ameixa': 8}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 2 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hXP3kxW4-AI1" + }, + "source": [ + "Observe abaixo o uso das funções dict() e zip() para criarmos o dicionário d_frutas:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "qT_4sYxA9dyn", + "outputId": "65e827a8-c58a-4191-816d-87585c794b85", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "# Definir o dicionário d_frutas: {chave: valor}\n", + "d_frutas = dict(zip(l_frutas, l_precos_frutas))\n", + "d_frutas" + ], + "execution_count": 6, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Apple': 0.4,\n", + " 'Apricot': 0.25,\n", + " 'Avocado': 0.35,\n", + " 'Banana': 0.3,\n", + " 'Blackberry': 0.55,\n", + " 'Blackcurrant': 0.7,\n", + " 'Blueberry': 0.45,\n", + " 'Cherry': 0.5,\n", + " 'Coconut': 0.75,\n", + " 'Fig': 0.6,\n", + " 'Grape': 0.65,\n", + " 'Kiwi': 0.2,\n", + " 'Lemon': 0.15,\n", + " 'Mango': 0.8,\n", + " 'Nectarine': 0.75,\n", + " 'Orange': 
0.25,\n", + " 'Papaya': 0.3,\n", + " 'Passion Fruit': 0.45,\n", + " 'Peach': 0.55,\n", + " 'Pineapple': 0.55,\n", + " 'Plum': 0.6,\n", + " 'Raspberry': 0.4,\n", + " 'Strawberry': 0.5,\n", + " 'Watermelon': 0.45}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 6 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "bIJ4cYhlZ5oT" + }, + "source": [ + "" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iHKUaGNT_IDt" + }, + "source": [ + "A seguir, resumo dos principais métodos relacionados à dicionários:" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MQLZ1mwW_yiU" + }, + "source": [ + "| Método | Descrição | Exemplo | Resultado |\n", + "|-------------------------|----------------------------------------------------------------------------------------------------|------------------------------------------|--------------------------------------------------------------------------------|\n", + "| d_dicionario.clear() | Remove todos os itens de d_dicionario | d_frutas.clear() | {} |\n", + "| d_dicionario.copy() | Retorna uma cópia de d_dicionario | d_frutas2= d_frutas.copy() | d_frutas2 é uma cópia de d_frutas |\n", + "| d_dicionario.get(key) | Retorna o valor para key, se key estiver em d_dicionario | d_frutas.get('Passion Fruit') | 0.45 |\n", + "| | | d_frutas.get('XPTO') | O Python não apresenta nenhum retorno |\n", + "| d_dicionario.items() | Retorna um objeto com as tuplas (key, valor) de d_dicionario | d_frutas.items() | dict_items([('Avocado', 0.35), ..., ('Watermelon', 0.45)]) |\n", + "| d_dicionario.keys() | Retorna um objeto com as keys de d_dicionario | d_frutas.keys() | dict_keys(['Avocado', 'Apple', ..., 'Watermelon']) |\n", + "| d_dicionario.values() | Retorna um objeto com os valores de d_dicionario | d_frutas.values() | dict_values([0.35, 0.4, ..., 0.45]) |\n", + "| d_dicionario.popitem() | Retorna e remove um item de d_dicionario | d_frutas.popitem() | ('Watermelon', 0.45) |\n", + "| | | 'Watermelon' in d_frutas | False |\n", + "| d_dicionario.pop(key[, default]) | Retorna e remove o item de d_dicionario correspondente à key | d_frutas.pop('Orange') | 0.25 |\n", + "| | | 'Orange' in d_frutas | False |\n", + "| d_dicionario.update(d2) | Adiciona item(s) à d_dicionario se key não estiver em d_dicionario. Se key estiver em d_dicionario, atualizará key com o novo valor | d_frutas.update({'Cherimoya': 1.3}) | Adicionará o item {'Cherimoya': 1.3} à d_frutas, pois key= 'Cherimoya' não está em d_frutas. |\n", + "| | | d_frutas.update({'Orange': 0.55}) | Atualiza o valor de key= 'Orange' para 0.55. O valor anterior era 0.25 |\n", + "| d_dicionario.fromkeys(keys, value) | Retorna um dicionário com keys especificadas e valores | tFruits= ('Avocado', 'Apple', 'Apricot') | |\n", + "| | | d_frutas.fromkeys(tFruits, 0) | {'Apple': 0, 'Apricot': 0, 'Avocado': 0} |" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uH6cHnctDu2l" + }, + "source": [ + "A seguir, vamos apresentar mais alguns exemplos de dicionários e seus métodos associados:" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YeCPxCab4e4k" + }, + "source": [ + "___\n", + "# **EXEMPLO**\n", + "* Os dias da semana como dicionário." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "N_2J839X4lps", + "outputId": "3d356121-d4b7-424a-addb-949be5a9d193", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "d_dia_semana = {'Seg': 'Segunda', 'Ter': 'Terça', 'Qua': 'Quarta', 'Qui': 'Quinta', 'Sex': 'Sexta', 'Sab': 'Sabado', 'Dom': 'Domingo'}\n", + "d_dia_semana" + ], + "execution_count": 7, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Dom': 'Domingo',\n", + " 'Qua': 'Quarta',\n", + " 'Qui': 'Quinta',\n", + " 'Sab': 'Sabado',\n", + " 'Seg': 'Segunda',\n", + " 'Sex': 'Sexta',\n", + " 'Ter': 'Terça'}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 7 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CnZLR-VX6FV4" + }, + "source": [ + "Observe que:\n", + "* os itens do dicionário d_dia_semana seguem a estrutura {key: value}.\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "eHuvY7BWQKhQ", + "outputId": "3906f3f3-c849-4689-8da1-8d40e8f26369", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + } + }, + "source": [ + "d_dia_semana['Seg']" + ], + "execution_count": 8, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, + "text/plain": [ + "'Segunda'" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 8 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "j65BxhzGG0NA" + }, + "source": [ + "___\n", + "# **DECLARAR OU INICIALIZAR UM DICIONÁRIO VAZIO**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LEGwQ0U-fKtL" + }, + "source": [ + "Por exemplo, o comando abaixo declara um dicionário vazio chamado d_paises:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "2iPWXPBLfOlr", + "outputId": "5687da81-8541-4169-eea4-65fd25044e4e", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "d_paises = {} # Também podemos usar a função dict() para criar o dicionário vazio da seguinte forma: d_paises= dict()\n", + "d_paises" + ], + "execution_count": 11, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 11 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vCxZv-jmG5y0" + }, + "source": [ + "___\n", + "# **OBTER O TIPO DO OBJETO**\n", + "> type(d_dicionario)" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "voPYpGIGff3o", + "outputId": "174838de-d69c-40fe-fa79-6bd7082044e7", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "type(d_paises)" + ], + "execution_count": 12, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "dict" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 12 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "X3MvCkFiG-UO" + }, + "source": [ + "___\n", + "# **ADICIONAR ITENS AO DICIONÁRIO**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fzP8iG5xfi0H" + }, + "source": [ + "Adicionar o valor 'Italy' à key = 1. 
Em outras palavras, estamos a adicionar o item {1: 'Italy'}" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "EXZ7eEZofnza", + "outputId": "b6789781-3d15-47cd-edae-90c7b9ab4013", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "d_paises[1] = 'Italy'\n", + "d_paises" + ], + "execution_count": 13, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{1: 'Italy'}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 13 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rH51ORGHHREE" + }, + "source": [ + "Adicionar o valor 'Denmark' à key= 2. Em outras palavras, estamos a adicionar o item {2: 'Denmark'}" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "GAXSzSiufv1u", + "outputId": "ad237289-6397-438f-910d-c79b0acd9dea", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "d_paises[2] = 'Denmark'\n", + "d_paises" + ], + "execution_count": 14, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{1: 'Italy', 2: 'Denmark'}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 14 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Xqdc_IYoHVVQ" + }, + "source": [ + "Adicionar o valor 'Brazil' à key= 3. Em outras palavras, estamos a adicionar o item {3: 'Brazil'}" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "FN7km8C9gAjM", + "outputId": "0905f1e6-6f19-40c7-a467-12c162ac1cc3", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "d_paises[3]= 'Brazil'\n", + "d_paises" + ], + "execution_count": 15, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{1: 'Italy', 2: 'Denmark', 3: 'Brazil'}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 15 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iwU8pJKRHapD" + }, + "source": [ + "___\n", + "# **ATUALIZAR VALORES DO DICIONÁRIO**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CxXUV7TugLXn" + }, + "source": [ + "O que acontece quando eu atribuo à key 3 outro valor, por exemplo, 'France'. Vamos conferir abaixo:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Rr6DtJnDgU5I", + "outputId": "31925676-fdbe-4c98-cb3a-f59978009711", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "# Adicionar o valor 'France' à key= 3\n", + "d_paises[3]= 'France'\n", + "d_paises" + ], + "execution_count": 16, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{1: 'Italy', 2: 'Denmark', 3: 'France'}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 16 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xB9G1l3_ggo-" + }, + "source": [ + "Como a key= 3 existe no dicionário d_paises, então o Python substitui o valor anterior 'Brazil' pelo novo valor, 'France'. \n", + "\n", + "* Lembre-se, os dicionários são mutáveis!" 
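A tabela-resumo apresentada no início do capítulo descreve métodos como get(), pop() e fromkeys() sem executá-los. Abaixo, um esboço mínimo (com um dicionário pequeno e hipotético, definido na própria célula) apenas para ilustrar esses métodos:

```python
# Dicionário pequeno apenas para ilustrar os métodos da tabela-resumo:
d_exemplo = {'Apple': 0.40, 'Banana': 0.30}

d_exemplo.get('Apple')        # 0.4
d_exemplo.get('XPTO')         # None -> nenhum retorno visível quando a key não existe
d_exemplo.get('XPTO', 0.0)    # 0.0  -> o segundo argumento funciona como valor default

d_exemplo.pop('Banana')       # 0.3  -> retorna o valor e remove o item do dicionário

dict.fromkeys(['Fig', 'Kiwi'], 0)   # {'Fig': 0, 'Kiwi': 0}
```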
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "T8JBxySZHiOJ" + }, + "source": [ + "___\n", + "# **OBTER KEYS DO DICIONÁRIO**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ALwbHwi4iwky", + "outputId": "bb0d57fb-2742-4eb1-9d82-9309142d21f5", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "d_paises.keys()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "dict_keys([1, 2, 3])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 10 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FIvi0Li1Hng5" + }, + "source": [ + "___\n", + "# **OBTER VALORES DO DICIONÁRIO**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "cp0PPtl3jEKo", + "outputId": "c7b8739a-caa9-4e58-e6d3-0f86ccd2d950", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "d_paises.values()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "dict_values(['Italy', 'Denmark', 'France'])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 11 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JUblZBMjHrwl" + }, + "source": [ + "___\n", + "# **OBTER ITENS (key, value) DO DICIONÁRIO**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "LraTwXjdjG3m", + "outputId": "b3d6d55e-20ad-4f88-a783-9ba1c4fd8654", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 162 + } + }, + "source": [ + "d_paises.items()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "error", + "ename": "NameError", + "evalue": "ignored", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0md_Paises\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mitems\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mNameError\u001b[0m: name 'd_Paises' is not defined" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IJEMg2LKHyGa" + }, + "source": [ + "___\n", + "# **OBTER VALOR PARA UMA KEY ESPECÍFICA**\n", + "* d_dicionario.get(key)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dzgBhsphjSQm" + }, + "source": [ + "Qual o valor para key= 1?" 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "FUfTjqktjW60", + "outputId": "678ab629-6cff-4fe1-e03f-d90709a98f26", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "d_paises.get(1)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'Italy'" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 11 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tyJ0KsloIBoD" + }, + "source": [ + "___\n", + "# **COPIAR DICIONÁRIO**\n", + "* d_dicionario.copy()" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "XL17EmvMkkky", + "outputId": "d3e9648a-ed03-47c2-e650-4a7a74dcaa38", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "d_paises2 = d_paises.copy()\n", + "d_paises2" + ], + "execution_count": 17, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{1: 'Italy', 2: 'Denmark', 3: 'France'}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 17 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8V25l2ZoIG4B" + }, + "source": [ + "___\n", + "# **REMOVER TODOS OS ITENS DO DICIONÁRIO**\n", + "* d_dicionario.clear()" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "r-8Gs1gYjqLN" + }, + "source": [ + "d_paises.clear()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "ro_42gzDjsdV", + "outputId": "a2c2a25b-40ef-4842-f2f7-3ac85404d195", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + } + }, + "source": [ + "d_paises" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 13 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pCzKkKoujv7G" + }, + "source": [ + "Como esperado, removemos todos os itens do dicionário d_paises. Entretanto, o dicionário d_paises continua a existir!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MKtPwGVsIaLQ" + }, + "source": [ + "___\n", + "# **DELETAR O DICIONÁRIO**\n", + "* del d_dicionario" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "8wvM-o7Lj7A0" + }, + "source": [ + "del d_paises" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "wK83ZURYkD_T", + "outputId": "03254461-9939-4ef9-de30-c4b59c920674", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 166 + } + }, + "source": [ + "d_paises" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "error", + "ename": "NameError", + "evalue": "ignored", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdCountries\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mNameError\u001b[0m: name 'dCountries' is not defined" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aSe3veUB1lo_" + }, + "source": [ + "Como esperado, pois agora o dicionário já não existe mais. Ok?" 
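Uma observação sobre o método copy(), visto acima: ele produz uma cópia rasa (shallow copy). Se os valores forem objetos mutáveis (listas, por exemplo), a cópia e o original compartilham esses objetos internos. Um esboço mínimo, com um dicionário hipotético, mostrando a diferença para copy.deepcopy():

```python
import copy

d_original = {'precos': [0.35, 0.40]}

d_copia = d_original.copy()              # cópia rasa: a lista interna é compartilhada
d_copia['precos'].append(0.99)
print(d_original['precos'])              # [0.35, 0.4, 0.99] -> o original também foi alterado

d_profunda = copy.deepcopy(d_original)   # cópia profunda: estruturas internas independentes
d_profunda['precos'].append(1.50)
print(d_original['precos'])              # [0.35, 0.4, 0.99] -> o original não muda
```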
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "STtkGUvEg7d1" + }, + "source": [ + "___\n", + "# **ITERAR PELO DICIONÁRIO**\n", + "* Considere o dicionário d_frutas a seguir:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "IG8hKSvcfalZ" + }, + "source": [ + "# Definindo os valores iniciais do dicionário d_frutas:\n", + "d_frutas = {'Avocado': 0.35, \n", + " 'Apple': 0.40, \n", + " 'Apricot': 0.25, \n", + " 'Banana': 0.30, \n", + " 'Blackcurrant': 0.70, \n", + " 'Blackberry': 0.55, \n", + " 'Blueberry': 0.45, \n", + " 'Cherry': 0.50, \n", + " 'Coconut': 0.75, \n", + " 'Fig': 0.60, \n", + " 'Grape': 0.65, \n", + " 'Kiwi': 0.20, \n", + " 'Lemon': 0.15, \n", + " 'Mango': 0.80, \n", + " 'Nectarine': 0.75, \n", + " 'Orange': 0.25, \n", + " 'Papaya': 0.30,\n", + " 'Passion Fruit': 0.45,\n", + " 'Peach': 0.55,\n", + " 'Pineapple': 0.55,\n", + " 'Plum': 0.60,\n", + " 'Raspberry': 0.40,\n", + " 'Strawberry': 0.50,\n", + " 'Watermelon': 0.45}" + ], + "execution_count": 18, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ppRkK_jJJG6W" + }, + "source": [ + "Mostrando os itens do dicionário d_frutas:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "bI7Ctf0ohyz8", + "outputId": "05418ee0-ce00-439a-848a-de9d5084c900", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "d_frutas" + ], + "execution_count": 23, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Apple': 0.4,\n", + " 'Apricot': 0.25,\n", + " 'Avocado': 0.35,\n", + " 'Banana': 0.3,\n", + " 'Blackberry': 0.55,\n", + " 'Blackcurrant': 0.7,\n", + " 'Blueberry': 0.45,\n", + " 'Cherry': 0.5,\n", + " 'Coconut': 0.75,\n", + " 'Fig': 0.6,\n", + " 'Grape': 0.65,\n", + " 'Kiwi': 0.2,\n", + " 'Lemon': 0.15,\n", + " 'Mango': 0.8,\n", + " 'Nectarine': 0.75,\n", + " 'Orange': 0.25,\n", + " 'Papaya': 0.3,\n", + " 'Passion Fruit': 0.45,\n", + " 'Peach': 0.55,\n", + " 'Pineapple': 0.55,\n", + " 'Plum': 0.6,\n", + " 'Raspberry': 0.4,\n", + " 'Strawberry': 0.5,\n", + " 'Watermelon': 0.45}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 23 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wXFfyiyPtD35" + }, + "source": [ + "Qual o valor para a fruta 'Apple'? Para responder à esta pergunta, basta lembrar que 'Apple' é uma key do dicionário d_frutas. Certo?" 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "JpreyE_LtCcU", + "outputId": "cee4be2d-7980-4a3d-85fb-17561d1bb1ff", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "d_frutas['Apple']" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0.4" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 21 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "huu835LGcyHL", + "outputId": "02e958ca-4133-4363-9ab1-c3eb767b2d5e", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "for chave in d_frutas.keys():\n", + " print (chave)" + ], + "execution_count": 26, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Avocado\n", + "Apple\n", + "Apricot\n", + "Banana\n", + "Blackcurrant\n", + "Blackberry\n", + "Blueberry\n", + "Cherry\n", + "Coconut\n", + "Fig\n", + "Grape\n", + "Kiwi\n", + "Lemon\n", + "Mango\n", + "Nectarine\n", + "Orange\n", + "Papaya\n", + "Passion Fruit\n", + "Peach\n", + "Pineapple\n", + "Plum\n", + "Raspberry\n", + "Strawberry\n", + "Watermelon\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JBMf8SbAJmiq" + }, + "source": [ + "## Iterar pelas keys do dicionário:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "aDDD-tbmdj0o", + "outputId": "5a9c4751-4fb6-4ee1-83a0-629ca32d0fba", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "d_frutas.keys()" + ], + "execution_count": 30, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "dict_keys(['Avocado', 'Apple', 'Apricot', 'Banana', 'Blackcurrant', 'Blackberry', 'Blueberry', 'Cherry', 'Coconut', 'Fig', 'Grape', 'Kiwi', 'Lemon', 'Mango', 'Nectarine', 'Orange', 'Papaya', 'Passion Fruit', 'Peach', 'Pineapple', 'Plum', 'Raspberry', 'Strawberry', 'Watermelon'])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 30 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "gOROGkRqfeUp", + "outputId": "c4252748-c64b-4df9-d82a-5648279c7765", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "d_frutas.values()" + ], + "execution_count": 31, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "dict_values([0.35, 0.4, 0.25, 0.3, 0.7, 0.55, 0.45, 0.5, 0.75, 0.6, 0.65, 0.2, 0.15, 0.8, 0.75, 0.25, 0.3, 0.45, 0.55, 0.55, 0.6, 0.4, 0.5, 0.45])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 31 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "2YgHJOref4Qe", + "outputId": "6dab6f4a-6380-4b43-828c-2d0b696236bd", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "d_frutas.items()" + ], + "execution_count": 32, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "dict_items([('Avocado', 0.35), ('Apple', 0.4), ('Apricot', 0.25), ('Banana', 0.3), ('Blackcurrant', 0.7), ('Blackberry', 0.55), ('Blueberry', 0.45), ('Cherry', 0.5), ('Coconut', 0.75), ('Fig', 0.6), ('Grape', 0.65), ('Kiwi', 0.2), ('Lemon', 0.15), ('Mango', 0.8), ('Nectarine', 0.75), ('Orange', 0.25), ('Papaya', 0.3), ('Passion Fruit', 0.45), ('Peach', 0.55), ('Pineapple', 0.55), ('Plum', 0.6), ('Raspberry', 0.4), ('Strawberry', 0.5), ('Watermelon', 0.45)])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 32 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "F8r8GgxvdMJA", + "outputId": 
"e7637082-e428-4f32-f09f-0ee22f82cf6f", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "for i_valor in d_frutas.values():\n", + " print (i_valor)" + ], + "execution_count": 27, + "outputs": [ + { + "output_type": "stream", + "text": [ + "0.35\n", + "0.4\n", + "0.25\n", + "0.3\n", + "0.7\n", + "0.55\n", + "0.45\n", + "0.5\n", + "0.75\n", + "0.6\n", + "0.65\n", + "0.2\n", + "0.15\n", + "0.8\n", + "0.75\n", + "0.25\n", + "0.3\n", + "0.45\n", + "0.55\n", + "0.55\n", + "0.6\n", + "0.4\n", + "0.5\n", + "0.45\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "rMro_tY8kepo", + "outputId": "4488c243-6792-4efa-b271-e546270b129d", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 425 + } + }, + "source": [ + "for key in d_frutas.keys():\n", + " print(key)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Avocado\n", + "Apple\n", + "Apricot\n", + "Banana\n", + "Blackcurrant\n", + "Blackberry\n", + "Blueberry\n", + "Cherry\n", + "Coconut\n", + "Fig\n", + "Grape\n", + "Kiwi\n", + "Lemon\n", + "Mango\n", + "Nectarine\n", + "Orange\n", + "Papaya\n", + "Passion Fruit\n", + "Peach\n", + "Pineapple\n", + "Plum\n", + "Raspberry\n", + "Strawberry\n", + "Watermelon\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yDkOLvRFJxco" + }, + "source": [ + "## Iterar pelos itens (key, value) do dicionário" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "DpFB1g-3kDSt", + "outputId": "f94dd133-3c61-4ac9-b8df-d5ca641a66e1", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "for item in d_frutas.items():\n", + " print(item) " + ], + "execution_count": 28, + "outputs": [ + { + "output_type": "stream", + "text": [ + "('Avocado', 0.35)\n", + "('Apple', 0.4)\n", + "('Apricot', 0.25)\n", + "('Banana', 0.3)\n", + "('Blackcurrant', 0.7)\n", + "('Blackberry', 0.55)\n", + "('Blueberry', 0.45)\n", + "('Cherry', 0.5)\n", + "('Coconut', 0.75)\n", + "('Fig', 0.6)\n", + "('Grape', 0.65)\n", + "('Kiwi', 0.2)\n", + "('Lemon', 0.15)\n", + "('Mango', 0.8)\n", + "('Nectarine', 0.75)\n", + "('Orange', 0.25)\n", + "('Papaya', 0.3)\n", + "('Passion Fruit', 0.45)\n", + "('Peach', 0.55)\n", + "('Pineapple', 0.55)\n", + "('Plum', 0.6)\n", + "('Raspberry', 0.4)\n", + "('Strawberry', 0.5)\n", + "('Watermelon', 0.45)\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "fVcEz1OMiUBu", + "outputId": "e7b5e949-2e02-4d22-cea8-980fc1844f43", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "d_frutas2 = {k: v for k,v in filter(lambda t: t[0] == 'Apple', d_frutas.items())} # o t[0] refere-se à chave do dicionário\n", + "d_frutas2" + ], + "execution_count": 33, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Apple': 0.4}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 33 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "V6j2Q0jngTc6" + }, + "source": [ + "for key, value in d_frutas.items():\n", + " " + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8z6qO74fJ6Q1" + }, + "source": [ + "## Iterar pelos valores do dicionário" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "tjJ6qRF8nr4v", + "outputId": "55fe54a5-4702-4a07-c050-0fc83d2de5ca", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "for value in 
d_frutas.values():\n", + " print(value)" + ], + "execution_count": 29, + "outputs": [ + { + "output_type": "stream", + "text": [ + "0.35\n", + "0.4\n", + "0.25\n", + "0.3\n", + "0.7\n", + "0.55\n", + "0.45\n", + "0.5\n", + "0.75\n", + "0.6\n", + "0.65\n", + "0.2\n", + "0.15\n", + "0.8\n", + "0.75\n", + "0.25\n", + "0.3\n", + "0.45\n", + "0.55\n", + "0.55\n", + "0.6\n", + "0.4\n", + "0.5\n", + "0.45\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-LmEUroVKDUA" + }, + "source": [ + "## Iterar pela key e valor do dicionário" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "oRhZ_Zq9oQIg", + "outputId": "be168183-30b4-4f96-ae2c-3f313acbc558", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 425 + } + }, + "source": [ + "for key, value in d_frutas.items():\n", + " print(\"%s --> %s\" %(key, value))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Avocado --> 0.35\n", + "Apple --> 0.4\n", + "Apricot --> 0.25\n", + "Banana --> 0.3\n", + "Blackcurrant --> 0.7\n", + "Blackberry --> 0.55\n", + "Blueberry --> 0.45\n", + "Cherry --> 0.5\n", + "Coconut --> 0.75\n", + "Fig --> 0.6\n", + "Grape --> 0.65\n", + "Kiwi --> 0.2\n", + "Lemon --> 0.15\n", + "Mango --> 0.8\n", + "Nectarine --> 0.75\n", + "Orange --> 0.25\n", + "Papaya --> 0.3\n", + "Passion Fruit --> 0.45\n", + "Peach --> 0.55\n", + "Pineapple --> 0.55\n", + "Plum --> 0.6\n", + "Raspberry --> 0.4\n", + "Strawberry --> 0.5\n", + "Watermelon --> 0.45\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Fotx7XUquAo8" + }, + "source": [ + "___\n", + "# **VERIFICAR SE UMA KEY ESPECÍFICA PERTENCE AO DICIONÁRIO**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ju__WsSoKXtk" + }, + "source": [ + "A fruta 'Apple' (que em nosso caso, é uma key) existe no dicionário?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "-gkEKNZPTeMp", + "outputId": "3540aadd-996a-4abd-cfcb-c22e49b75aaa", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "'Apple' in d_frutas.keys()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "True" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 75 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fMzBeFMIusv7" + }, + "source": [ + "A fruta 'Coconut' pertence ao dicionário d_frutas?" 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "SKtEwmBCuxyi", + "outputId": "1df7263c-a64f-4eaf-8d4d-a55cac03d2bc", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "'Coconut' in fruits.keys()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "True" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 77 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rrH8ArqsK6Bd" + }, + "source": [ + "___\n", + "# **VERIFICAR SE VALOR PERTENCE AO DICIONÁRIO**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "DbWpbuLTK9sn", + "outputId": "e9fafa6d-284e-4862-8f25-9419ff702dec", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "0.4 in d_frutas.values()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "True" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 14 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "36kmLUYDvsUI" + }, + "source": [ + "## Adicionar novos itens ao dicionário\n", + "* Considere o dicionário d_frutas2 abaixo:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "5Rwq4-UG4--u" + }, + "source": [ + "d_frutas2 = {'Grapefruit': 1.0 }" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vljceM6_5H9o" + }, + "source": [ + "O comando abaixo adiciona o dicionário d_frutas2 ao dicionário d_frutas." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "7BD_mYMM5O5o", + "outputId": "2b185546-255e-4ad0-e8c9-10564fcbe2b0", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 442 + } + }, + "source": [ + "d_frutas.update(d_frutas2)\n", + "d_frutas" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Apple': 0.4,\n", + " 'Apricot': 0.25,\n", + " 'Avocado': 0.35,\n", + " 'Banana': 0.3,\n", + " 'Blackberry': 0.55,\n", + " 'Blackcurrant': 0.7,\n", + " 'Blueberry': 0.45,\n", + " 'Cherry': 0.5,\n", + " 'Coconut': 0.75,\n", + " 'Fig': 0.6,\n", + " 'Grape': 0.65,\n", + " 'Grapefruit': 1.0,\n", + " 'Kiwi': 0.2,\n", + " 'Lemon': 0.15,\n", + " 'Mango': 0.8,\n", + " 'Nectarine': 0.75,\n", + " 'Orange': 0.25,\n", + " 'Papaya': 0.3,\n", + " 'Passion Fruit': 0.45,\n", + " 'Peach': 0.55,\n", + " 'Pineapple': 0.55,\n", + " 'Plum': 0.6,\n", + " 'Raspberry': 0.4,\n", + " 'Strawberry': 0.5,\n", + " 'Watermelon': 0.45}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 79 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ffh-94lo55n4" + }, + "source": [ + "Agora, considere o dicionário d_frutas3 abaixo:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "JMAq_jbP5---" + }, + "source": [ + "d_frutas3 = {'Apple': 0.70}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Jd6B2cy-6KmY" + }, + "source": [ + "Qual o resultado do comando abaixo?\n", + "\n", + "* Atenção: A fruta 'Apple' (é uma key do dicionário d_frutas) tem valor 0.40. E no dicionário d_frutas3 a fruta 'Apple' tem valor 0.70." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "E4GKdTw76PXI" + }, + "source": [ + "d_frutas.update(d_frutas3)\n", + "d_frutas" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HMmDfrln6o0c" + }, + "source": [ + "Como esperado, como key= 'Apple' existe no dicionário d_frutas, então o Python atualizou o valor de key= 'Apple' para 0.70." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SWO2GdNovxAp" + }, + "source": [ + "## Modificar keys e valores" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DX9UTy4TwlAw" + }, + "source": [ + "Suponha que queremos aplicar um desconto de 10% para cada fruta do nosso dicionário.\n", + "\n", + "* Como fazemos isso?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ZziGmKGmwqwn" + }, + "source": [ + "for key, value in d_frutas.items():\n", + " d_frutas[key] = round(value * 0.9, 2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "s1B-yN8lM-C1" + }, + "source": [ + "Mostra d_frutas com os valores atualizados:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "zZLa85knxBtY", + "outputId": "2c7c12f8-8885-4f34-a0d1-1323e98a9437", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 442 + } + }, + "source": [ + "d_frutas" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Apple': 0.63,\n", + " 'Apricot': 0.23,\n", + " 'Avocado': 0.32,\n", + " 'Banana': 0.27,\n", + " 'Blackberry': 0.5,\n", + " 'Blackcurrant': 0.63,\n", + " 'Blueberry': 0.41,\n", + " 'Cherry': 0.45,\n", + " 'Coconut': 0.68,\n", + " 'Fig': 0.54,\n", + " 'Grape': 0.59,\n", + " 'Grapefruit': 0.9,\n", + " 'Kiwi': 0.18,\n", + " 'Lemon': 0.14,\n", + " 'Mango': 0.72,\n", + " 'Nectarine': 0.68,\n", + " 'Orange': 0.23,\n", + " 'Papaya': 0.27,\n", + " 'Passion Fruit': 0.41,\n", + " 'Peach': 0.5,\n", + " 'Pineapple': 0.5,\n", + " 'Plum': 0.54,\n", + " 'Raspberry': 0.36,\n", + " 'Strawberry': 0.45,\n", + " 'Watermelon': 0.41}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 84 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vpN54l4vxze5" + }, + "source": [ + "## Deletar keys do dicionário\n", + "* Deletar uma key significa deletar todo o item {key: value}, ok?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eDlthLStNIwR" + }, + "source": [ + "Suponha que queremos deletar a fruta 'Avocado' do dicionário d_frutas.\n", + "\n", + "* Como fazer isso?" 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "fnpzHZU_x5Y1" + }, + "source": [ + "for key in list(d_frutas.keys()): # Dica: use a função list para melhorar a performance computacional\n", + " if key == 'Avocado':\n", + " del d_frutas[key] # Deleta key = 'Avocado'" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VyPUrobONqvI" + }, + "source": [ + "Mostra o dicionário d_frutas atualizado:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "IwnsHejhyT4l", + "outputId": "b910699c-9729-4a27-bd78-3a283c82ac39", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 425 + } + }, + "source": [ + "d_frutas" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Apple': 0.63,\n", + " 'Apricot': 0.23,\n", + " 'Banana': 0.27,\n", + " 'Blackberry': 0.5,\n", + " 'Blackcurrant': 0.63,\n", + " 'Blueberry': 0.41,\n", + " 'Cherry': 0.45,\n", + " 'Coconut': 0.68,\n", + " 'Fig': 0.54,\n", + " 'Grape': 0.59,\n", + " 'Grapefruit': 0.9,\n", + " 'Kiwi': 0.18,\n", + " 'Lemon': 0.14,\n", + " 'Mango': 0.72,\n", + " 'Nectarine': 0.68,\n", + " 'Orange': 0.23,\n", + " 'Papaya': 0.27,\n", + " 'Passion Fruit': 0.41,\n", + " 'Peach': 0.5,\n", + " 'Pineapple': 0.5,\n", + " 'Plum': 0.54,\n", + " 'Raspberry': 0.36,\n", + " 'Strawberry': 0.45,\n", + " 'Watermelon': 0.41}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 86 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "u4HOf9SNytSq" + }, + "source": [ + "## Filtrar/Selecionar itens baseado em condições\n", + "Em algumas situações você vai querer filtrar os itens do dicionário que satisfaçam alguma(s) condições.\n", + "\n", + "* Considere o exemplo a seguir: queremos selecionar/filtrar somente as frutas com preços maiores que 0.4." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "EwqxWiVlyvgH" + }, + "source": [ + "d_frutas_filtro = {}\n", + "for key, value in d_frutas.items():\n", + " if value > 0.5:\n", + " d_frutas_filtro.update({key: value})" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eb0jmAKWOtYt" + }, + "source": [ + "Mostra o resultado do dicionário d_frutas_Selected:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "SsStWM5k1s-Q", + "outputId": "f6af5b61-2333-41c7-a28a-0f6a67b0a949", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 170 + } + }, + "source": [ + "d_frutas_filtro" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Apple': 0.63,\n", + " 'Blackcurrant': 0.63,\n", + " 'Coconut': 0.68,\n", + " 'Fig': 0.54,\n", + " 'Grape': 0.59,\n", + " 'Grapefruit': 0.9,\n", + " 'Mango': 0.72,\n", + " 'Nectarine': 0.68,\n", + " 'Plum': 0.54}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 89 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "u1ve6xIGOjrE" + }, + "source": [ + " Como se pode ver, somente a fruta 'Blackberry' satifaz esta condição." 
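O mesmo filtro (manter apenas as frutas com valor maior que 0.5, como no laço acima) pode ser escrito de forma mais compacta com uma dict comprehension. O esboço abaixo assume o dicionário d_frutas definido nas células anteriores:

```python
# Equivalente ao laço acima: mantém apenas os itens com valor > 0.5
d_frutas_filtro = {chave: valor for chave, valor in d_frutas.items() if valor > 0.5}
d_frutas_filtro
```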
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KJqpPrfkCk9L" + }, + "source": [ + "## Cálculos com os itens do dicionário" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "exD8HXodCqg6" + }, + "source": [ + "from collections import Counter" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "llCLTysdCuwB" + }, + "source": [ + "Somando os valores de todas as frutas" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "uG0VP1MNCroX", + "outputId": "8221b07b-610d-4a7c-cb14-86d6f63e5be3", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "sum(d_frutas.values())" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "11.450000000000001" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 22 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a5MBNCF-C5-4" + }, + "source": [ + "Quantos itens existem no dicionário:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "AkvygR0PC9bT", + "outputId": "254eff41-8336-4fe6-d6ad-4d52544d74a9", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "len(list(d_frutas))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "24" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 25 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xBNFaklq8OC9" + }, + "source": [ + "## Sortear itens do dicionário - sorted(d_dicionario.items(), reverse= True/False)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WULJMjHA-mal" + }, + "source": [ + "Ordem alfabética (por key):" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "SH0WIKZ8-Ylr", + "outputId": "b9cea719-637e-40a5-9e79-eb67aeb47887", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 425 + } + }, + "source": [ + "d_frutas_ordenadas = sorted(d_frutas.items(), reverse = False)\n", + "d_frutas_ordenadas" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "[('Apple', 0.4),\n", + " ('Apricot', 0.25),\n", + " ('Avocado', 0.35),\n", + " ('Banana', 0.3),\n", + " ('Blackberry', 0.55),\n", + " ('Blackcurrant', 0.7),\n", + " ('Blueberry', 0.45),\n", + " ('Cherry', 0.5),\n", + " ('Coconut', 0.75),\n", + " ('Fig', 0.6),\n", + " ('Grape', 0.65),\n", + " ('Kiwi', 0.2),\n", + " ('Lemon', 0.15),\n", + " ('Mango', 0.8),\n", + " ('Nectarine', 0.75),\n", + " ('Orange', 0.25),\n", + " ('Papaya', 0.3),\n", + " ('Passion Fruit', 0.45),\n", + " ('Peach', 0.55),\n", + " ('Pineapple', 0.55),\n", + " ('Plum', 0.6),\n", + " ('Raspberry', 0.4),\n", + " ('Strawberry', 0.5),\n", + " ('Watermelon', 0.45)]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 12 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "T4Li1Q2d-pnZ" + }, + "source": [ + "Ordem reversa (por key):" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "PoBOmfpM_A_a", + "outputId": "4cd9a21c-a2ad-462c-acb0-26ba7a0a4e5d", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 425 + } + }, + "source": [ + "d_frutas_ordenadas_reverse = sorted(d_frutas.items(), reverse = True)\n", + "d_frutas_ordenadas_reverse" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "[('Watermelon', 0.45),\n", + 
" ('Strawberry', 0.5),\n", + " ('Raspberry', 0.4),\n", + " ('Plum', 0.6),\n", + " ('Pineapple', 0.55),\n", + " ('Peach', 0.55),\n", + " ('Passion Fruit', 0.45),\n", + " ('Papaya', 0.3),\n", + " ('Orange', 0.25),\n", + " ('Nectarine', 0.75),\n", + " ('Mango', 0.8),\n", + " ('Lemon', 0.15),\n", + " ('Kiwi', 0.2),\n", + " ('Grape', 0.65),\n", + " ('Fig', 0.6),\n", + " ('Coconut', 0.75),\n", + " ('Cherry', 0.5),\n", + " ('Blueberry', 0.45),\n", + " ('Blackcurrant', 0.7),\n", + " ('Blackberry', 0.55),\n", + " ('Banana', 0.3),\n", + " ('Avocado', 0.35),\n", + " ('Apricot', 0.25),\n", + " ('Apple', 0.4)]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 11 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FxTC2-U88ajk" + }, + "source": [ + "## Função filter()\n", + "* A função filter() aplica um filtro no dicionário, retornando apenas os itens que satisfaz as condições do filtro." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "iJq1clvOHVG2", + "outputId": "16a779ef-48c9-497c-8c7c-a1612aa9aa03", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 425 + } + }, + "source": [ + "d_frutas" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Apple': 0.4,\n", + " 'Apricot': 0.25,\n", + " 'Avocado': 0.35,\n", + " 'Banana': 0.3,\n", + " 'Blackberry': 0.55,\n", + " 'Blackcurrant': 0.7,\n", + " 'Blueberry': 0.45,\n", + " 'Cherry': 0.5,\n", + " 'Coconut': 0.75,\n", + " 'Fig': 0.6,\n", + " 'Grape': 0.65,\n", + " 'Kiwi': 0.2,\n", + " 'Lemon': 0.15,\n", + " 'Mango': 0.8,\n", + " 'Nectarine': 0.75,\n", + " 'Orange': 0.25,\n", + " 'Papaya': 0.3,\n", + " 'Passion Fruit': 0.45,\n", + " 'Peach': 0.55,\n", + " 'Pineapple': 0.55,\n", + " 'Plum': 0.6,\n", + " 'Raspberry': 0.4,\n", + " 'Strawberry': 0.5,\n", + " 'Watermelon': 0.45}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 2 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qtTKvNeJNycl" + }, + "source": [ + "### Filtrando por key:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "uIDW5FhwAiSs", + "outputId": "52599d3f-ff13-4894-f697-ce7290bff9d5", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "d_frutas2 = {k: v for k, v in filter(lambda t: t[0] == 'Apple', d_frutas.items())}\n", + "d_frutas2" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Apple': 0.4}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 6 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nUMGIzxeNt_U" + }, + "source": [ + "### Filtrando por valor:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "tvHcQatANltL", + "outputId": "8feaf5b1-1db8-4391-8950-248ba8ab46c5", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 187 + } + }, + "source": [ + "d_frutas3 = {k: v for k, v in filter(lambda t: t[1] > 0.5, d_frutas.items())}\n", + "d_frutas3" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Blackberry': 0.55,\n", + " 'Blackcurrant': 0.7,\n", + " 'Coconut': 0.75,\n", + " 'Fig': 0.6,\n", + " 'Grape': 0.65,\n", + " 'Mango': 0.8,\n", + " 'Nectarine': 0.75,\n", + " 'Peach': 0.55,\n", + " 'Pineapple': 0.55,\n", + " 'Plum': 0.6}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 7 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qA_XhCdmA6Gn" + }, + 
"source": [ + "___\n", + "# **EXERCÍCIOS**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RSpyl_URgNyE" + }, + "source": [ + "## Exercício 1\n", + "* É possível sortear os itens de um dicionário? Explique sua resposta." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CXqc9kHch6Mm" + }, + "source": [ + "## Exercício 2\n", + "* É possível termos um dicionário do tipo abaixo?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "0BBWO9Zth_mc", + "outputId": "330cd62b-9b7b-4b72-e3b8-1b1a5d3e9ee3", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "d_colaboradores= {'Gerentes': ['A', 'B', 'C'], 'Programadores': ['B', 'D', 'E', 'F', 'G'], 'Gerentes_Projeto': ['A', 'E']}\n", + "d_colaboradores" + ], + "execution_count": 34, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Gerentes': ['A', 'B', 'C'],\n", + " 'Gerentes_Projeto': ['A', 'E'],\n", + " 'Programadores': ['B', 'D', 'E', 'F', 'G']}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 34 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TNiJSG_uiePb" + }, + "source": [ + "Como acessar o Gerente 'A'?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "k0YZg0gMjzCT", + "outputId": "333e147c-d9a0-452f-f152-a0dacf4182b8", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "d_colaboradores ['Gerentes']" + ], + "execution_count": 35, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "['A', 'B', 'C']" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 35 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "U7fAFy_8j48J", + "outputId": "84cd7173-35db-4329-e6d6-0d2ba45b60b6", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "d_colaboradores ['Programadores']" + ], + "execution_count": 36, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "['B', 'D', 'E', 'F', 'G']" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 36 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Wh61G2i2kE3j", + "outputId": "39297cee-ad6a-4df2-f0bf-3b21b82c48d4", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + } + }, + "source": [ + "s_gerente_A = d_colaboradores ['Gerentes']\n", + "s_gerente_A [0]\n", + "\n" + ], + "execution_count": 37, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, + "text/plain": [ + "'A'" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 37 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ws8GtJr6nlqJ" + }, + "source": [ + "" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "2zq4kU-smVju", + "outputId": "867cef53-26d9-47c2-9a4c-97124ade8fe1", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 163 + } + }, + "source": [ + "d_colaboradores.values('A')\n" + ], + "execution_count": 41, + "outputs": [ + { + "output_type": "error", + "ename": "TypeError", + "evalue": "ignored", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m 
\u001b[0md_colaboradores\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'A'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m: values() takes no arguments (1 given)" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ntVcr_3XwaQ-" + }, + "source": [ + "## Exercício 3\n", + "Consulte a página [Python Data Types: Dictionary - Exercises, Practice, Solution](https://www.w3resource.com/python-exercises/dictionary/) para mais exercícios relacionados à dicionários." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "PmW40kENj4NO" + }, + "source": [ + "" + ], + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file diff --git a/Notebooks/NB15_00_gerson__Machine_Learning___DSWP.ipynb b/Notebooks/NB15_00_gerson__Machine_Learning___DSWP.ipynb new file mode 100644 index 000000000..b7b17b205 --- /dev/null +++ b/Notebooks/NB15_00_gerson__Machine_Learning___DSWP.ipynb @@ -0,0 +1,4554 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "colab": { + "name": "NB15_00__Machine_Learning.ipynb", + "provenance": [], + "include_colab_link": true + }, + "accelerator": "TPU" + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ShVXyGj9wkgN" + }, + "source": [ + "

# **MACHINE LEARNING WITH PYTHON**

" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aYQ4cDfcPu4e" + }, + "source": [ + "___\n", + "# **NOTAS E OBSERVAÇÕES**\n", + "* Abordar o impacto do desbalanceamento da amostra;\n", + "* Colocar AUROC no material e mostrar o cut off para classificação entre 0 e 1;\n", + "* Conceitos estatísticos de bias & variance;" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5YvhLC_uf4_G" + }, + "source": [ + "___\n", + "# **AGENDA**\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QgX6n2VDyY1O" + }, + "source": [ + "___\n", + "# **REFERÊNCIAS**\n", + "* [scikit-learn - Machine Learning With Python](https://scikit-learn.org/stable/);\n", + "* [An Introduction to Machine Learning Theory and Its Applications: A Visual Tutorial with Examples](https://www.toptal.com/machine-learning/machine-learning-theory-an-introductory-primer)\n", + "* [The Difference Between Artificial Intelligence, Machine Learning, and Deep Learning](https://medium.com/iotforall/the-difference-between-artificial-intelligence-machine-learning-and-deep-learning-3aa67bff5991)\n", + "* [A Gentle Guide to Machine Learning](https://blog.monkeylearn.com/a-gentle-guide-to-machine-learning/)\n", + "* [A Visual Introduction to Machine Learning](http://www.r2d3.us/visual-intro-to-machine-learning-part-1/)\n", + "* [Introduction to Machine Learning](http://alex.smola.org/drafts/thebook.pdf)\n", + "* [The 10 Statistical Techniques Data Scientists Need to Master](https://medium.com/cracking-the-data-science-interview/the-10-statistical-techniques-data-scientists-need-to-master-1ef6dbd531f7)\n", + "* [Tune: a library for fast hyperparameter tuning at any scale](https://towardsdatascience.com/fast-hyperparameter-tuning-at-scale-d428223b081c)\n", + "* [How to lie with Data Science](https://towardsdatascience.com/how-to-lie-with-data-science-5090f3891d9c)\n", + "* [5 Reasons “Logistic Regression” should be the first thing you learn when becoming a Data Scientist](https://towardsdatascience.com/5-reasons-logistic-regression-should-be-the-first-thing-you-learn-when-become-a-data-scientist-fcaae46605c4)\n", + "* [Machine learning on categorical variables](https://towardsdatascience.com/machine-learning-on-categorical-variables-3b76ffe4a7cb)\n", + "\n", + "## Deep Learning & Neural Networks\n", + "\n", + "- [An Introduction to Neural Networks](http://www.cs.stir.ac.uk/~lss/NNIntro/InvSlides.html)\n", + "- [An Introduction to Image Recognition with Deep Learning](https://medium.com/@ageitgey/machine-learning-is-fun-part-3-deep-learning-and-convolutional-neural-networks-f40359318721)\n", + "- [Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/index.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TsCbZd2epfxo" + }, + "source": [ + "___\n", + "# **INTRODUÇÃO**\n", + "\n", + "* \"__Information is the oil of the 21st century, and analytics is the combustion engine__.\" - Peter Sondergaard, SVP, Garner Research;\n", + "\n", + "\n", + ">O foco deste capítulo será:\n", + "* Linear, Logistic Regression, Decision Tree, Random Forest, Support Vector Machine and XGBoost algorithms for building Machine Learning models;\n", + "* Entender como resolver problemas de classificação e Regressão;\n", + "* Aplicar técnicas de Ensemble como Bagging e Boosting;\n", + "* Como medir a acurácia dos modelos de Machine Learning;\n", + "* Aprender os principais algoritmos de Machine Learning tanto das técnicas de aprendizagem supervisionada quanto da não-supervisionada.\n", + 
"\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HqqB2vaHXMGt" + }, + "source": [ + "___\n", + "# **ARTIFICIAL INTELLIGENCE VS MACHINE LEARNING VS DEEP LEARNING**\n", + "* **Machine Learning** - dá aos computadores a capacidade de aprender sem serem explicitamente programados. Os computadores podem melhorar sua capacidade de aprendizagem através da prática de uma tarefa, geralmente usando grandes conjuntos de dados.\n", + "* **Deep Learning** - é um método de Machine Learning que depende de redes neurais artificiais, permitindo que os sistemas de computadores aprendam pelo exemplo, assim como nós humanos aprendemos." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "P961GcguXFFA" + }, + "source": [ + "![EvolutionOfAI](https://github.com/MathMachado/Materials/blob/master/Evolution%20of%20AI.PNG?raw=true)\n", + "\n", + "Source: [Artificial Intelligence vs. Machine Learning vs. Deep Learning](https://github.com/MathMachado/P4ML/blob/DS_Python/Material/Evolution%20of%20AI.PNG)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lkqGtO88ZkPr" + }, + "source": [ + "![AI_vs_ML_vs_DL](https://github.com/MathMachado/Materials/blob/master/AI_vs_ML_vs_DL.PNG?raw=true)\n", + "\n", + "Source: [Artificial Intelligence vs. Machine Learning vs. Deep Learning](https://towardsdatascience.com/artificial-intelligence-vs-machine-learning-vs-deep-learning-2210ba8cc4ac)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xesQpzfmaqj6" + }, + "source": [ + "![ML_vs_DL](https://github.com/MathMachado/Materials/blob/master/ML_vs_DL.PNG?raw=true)\n", + "\n", + "Source: [Artificial Intelligence vs. Machine Learning vs. Deep Learning](https://towardsdatascience.com/artificial-intelligence-vs-machine-learning-vs-deep-learning-2210ba8cc4ac)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KeIVR59IIS7f" + }, + "source": [ + "___\n", + "# **MACHINE LEARNING - TECHNIQUES**\n", + "\n", + "* Supervised Learning\n", + "* Unsupervised Learning\n", + "\n", + "![MachineLearning](https://github.com/MathMachado/Materials/blob/master/MachineLearningTechniques.jpg?raw=true)\n", + "\n", + "Source: [Machine Learning for Everyone](https://vas3k.com/blog/machine_learning/?source=post_page-----885aa35db58b----------------------)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rvwp5UHdBiup" + }, + "source": [ + "___\n", + "# **NOSSO FOCO AQUI SERÁ...**\n", + "\n", + "![ClassicalML](https://github.com/MathMachado/Materials/blob/master/ClassicalML.jpg?raw=true)\n", + "\n", + "Source: [Machine Learning for Everyone](https://vas3k.com/blog/machine_learning/?source=post_page-----885aa35db58b----------------------)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cBLSvJTXHBjK" + }, + "source": [ + "___\n", + "# **CHEETSHEET**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZdjR3nahUuKq" + }, + "source": [ + "\n", + "![Scikit-Learn](https://github.com/MathMachado/Materials/blob/master/scikit-learn-1.png?raw=true)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MkBSvyorGXQz" + }, + "source": [ + "___\n", + "# **CROSS-VALIDATION**\n", + "* K-fold é o método de Cross-Validation (CV) mais conhecido e utilizado;\n", + "* Como funciona: divide o dataframe de treinamento em k partes;\n", + " * Usa k-1 partes para treinar o modelo e o restante para validar o modelo;\n", + " * repete este processo k vezes, sendo que em cada iteração calcula as métricas desejadas;\n", + " * Ao final das k 
iterações, teremos k métricas das quais calculamos média e desvio-padrão.\n", + "\n", + " A figura abaixo nos ajuda a entender como funciona CV:\n", + "\n", + "![Cross-Validation](https://github.com/MathMachado/Materials/blob/master/CV2.PNG?raw=true)\n", + "\n", + "Source: [5 Reasons why you should use Cross-Validation in your Data Science Projects](https://towardsdatascience.com/5-reasons-why-you-should-use-cross-validation-in-your-data-science-project-8163311a1e79)\n", + "\n", + "* **valor de k**:\n", + " * valor de k (folds): entre 5 e 10 --> Não há regra geral para a escolha de k;\n", + " * Quanto maior o valor de k, menor o viés do CV;\n", + "\n", + "[Applied Predictive Modeling, 2013](https://www.amazon.com/Applied-Predictive-Modeling-Max-Kuhn/dp/1461468485/ref=as_li_ss_tl?ie=UTF8&qid=1520380699&sr=8-1&keywords=applied+predictive+modeling&linkCode=sl1&tag=inspiredalgor-20&linkId=1af1f3de89c11e4a7fd49de2b05e5ebf)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HscfN-a1V043" + }, + "source": [ + "* **Vantagens do uso de CV**:\n", + " * Modelos com melhor acurácia;\n", + " * Melhor uso dos dados, pois todos os dados são utilizados como treinamento e validação. Portanto, qualquer problema com os dados será encontrado nesta fase.\n", + "\n", + "* **Leitura Adicional**\n", + " * [Cross-Validation in Machine Learning](https://towardsdatascience.com/cross-validation-in-machine-learning-72924a69872f)\n", + " * [5 Reasons why you should use Cross-Validation in your Data Science Projects](https://towardsdatascience.com/5-reasons-why-you-should-use-cross-validation-in-your-data-science-project-8163311a1e79)\n", + " * [Cross-validation: evaluating estimator performance](https://scikit-learn.org/stable/modules/cross_validation.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XRukccWQSklx" + }, + "source": [ + "## Medidas para avaliarmos a variabilidade presente nos dados\n", + "* As principais medidas para medirmos a variabilidade dos dados são amplitude, variância, desvio padrão e coeficiente de variação;\n", + "* Estas medidas nos permitem concluir se os dados são homogêneos (menor dispersão/variabilidade) ou heterogêneos (maior variabilidade/dispersão).\n", + "\n", + "* **Na próxima versão, trazer estes conceitos para o Notebook e usar o Python para calcular estas medidas**." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yBR8tWV_lhQq" + }, + "source": [ + "___\n", + "# **ENSEMBLE METHODS** (= Combinar modelos preditivos)\n", + "* Métodos\n", + " * **Bagging** (Bootstrap AGGregatING)\n", + " * **Boosting**\n", + " * Stacking --> Não é muito utilizado\n", + "* Ajuda a evitar overfitting (overfitting é quando o modelo/função se ajusta muito bem aos dados de treinamento, sendo ineficiente para generalizar para outras amostras/população).\n", + "* Constrói meta-classificadores: combina os resultados de vários algoritmos para produzir previsões mais precisas e robustas do que as previsões de cada classificador individual.\n", + "* Ensemble reduz/minimiza os efeitos das principais causas de erros nos modelos de Machine Learning:\n", + " * ruído;\n", + " * bias (viés);\n", + " * variância --> Principal medida para medir a variabilidade presente nos dados.\n", + "\n", + "# Referências\n", + "* [Simple guide for ensemble learning methods](https://towardsdatascience.com/simple-guide-for-ensemble-learning-methods-d87cc68705a2) - Explica didaticamente como funcionam ensembles.",
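+ "\n", + "A título de ilustração (esboço que não faz parte do material original e supõe um par X_train/y_train já definido, como o que será criado mais adiante neste capítulo; nomes e valores são apenas ilustrativos), Bagging e Boosting podem ser instanciados assim no scikit-learn:\n", + "\n", + "```python\n", + "from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier\n", + "from sklearn.tree import DecisionTreeClassifier\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "# Bagging: árvores treinadas em amostras bootstrap e combinadas por votação\n", + "ml_bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=42)\n", + "\n", + "# Boosting: árvores rasas treinadas em sequência, dando mais peso aos erros anteriores\n", + "ml_boosting = AdaBoostClassifier(n_estimators=100, random_state=42)\n", + "\n", + "# Acurácia média via Cross-Validation (n_estimators e random_state apenas ilustrativos)\n", + "print(cross_val_score(ml_bagging, X_train, y_train, cv=10).mean())\n", + "print(cross_val_score(ml_boosting, X_train, y_train, cv=10).mean())\n", + "```"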
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "25RW8u-Sj780" + }, + "source": [ + "### Leitura Adicional\n", + "* [Ensemble methods: bagging, boosting and stacking](https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205)\n", + "* [Ensemble Methods in Machine Learning: What are They and Why Use Them?](https://towardsdatascience.com/ensemble-methods-in-machine-learning-what-are-they-and-why-use-them-68ec3f9fef5f)\n", + "* [Ensemble Learning Using Scikit-learn](https://towardsdatascience.com/ensemble-learning-using-scikit-learn-85c4531ff86a)\n", + "* [Let’s Talk About Machine Learning Ensemble Learning In Python](https://medium.com/fintechexplained/lets-talk-about-machine-learning-ensemble-learning-in-python-382747e5fba8)\n", + "* [Boosting, Bagging, and Stacking — Ensemble Methods with sklearn and mlens](https://medium.com/@rrfd/boosting-bagging-and-stacking-ensemble-methods-with-sklearn-and-mlens-a455c0c982de)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FugME1HSl4jJ" + }, + "source": [ + "___\n", + "# **PARAMETER TUNING** (= Parâmetros ótimos dos modelos de Machine Learning)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "u_147cIRl9F1" + }, + "source": [ + "## GridSearch (Ferramenta ou meio que vamos utilizar para otimização dos parâmetros dos modelos de ML)\n", + "* Encontra os parâmetros ótimos (hyperparameter tuning) que melhoram a acurácia dos modelos.\n", + "* Necessita dos seguintes inputs:\n", + " * A matriz $X$ com as $p$ COLUNAS (variáveis ou atributos) do dataframe;\n", + " * O vetor $y$ com a COLUNA-target (variável resposta);\n", + " * Um estimador (modelo). Exemplo: DecisionTreeClassifier, RandomForestClassifier, XGBoostClassifier etc;\n", + " * Um dicionário com os parâmetros a serem otimizados;\n", + " * O número de folds para o método de Cross-validation." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "39Sg77fbTWCO" + }, + "source": [ + "___\n", + "# **MODEL SELECTION & EVALUATION**\n", + "> Nesta fase identificamos e aplicamos as melhores métricas (Accuracy, Sensitivity, Specificity, F-Score, AUC, R-Sq, Adj R-SQ, RMSE (Root Mean Square Error)) para avaliar o desempenho/acurácia/performance dos modelos de ML.\n", + ">> Treinamos os modelos de ML usando a amostra de treinamento e avaliamos o desempenho/acurácia/performance na amostra de teste/validação.\n", + "\n", + "* Leitura Adicional\n", + " * [The 5 Classification Evaluation metrics every Data Scientist must know](https://towardsdatascience.com/the-5-classification-evaluation-metrics-you-must-know-aa97784ff226)\n", + " * [Confusion matrix and other metrics in machine learning](https://medium.com/hugo-ferreiras-blog/confusion-matrix-and-other-metrics-in-machine-learning-894688cb1c0a)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oQQVzZ2ZTYrB" + }, + "source": [ + "## Confusion Matrix\n", + "* Termos associados à Confusion Matrix:\n", + " * **Verdadeiro Positivo** (TP = True Positive): Quando o valor observado é True e o modelo estima como True. Ou seja, o modelo acertou na estimativa.\n", + " * Exemplo: **Observado**: Fraude (Positive); **Modelo**: Fraude (Positive) --> Modelo acertou!\n", + " * **Verdadeiro Negativo** (TN = True Negative): Quando o valor observado é False e o modelo estima como False. 
Ou seja, o modelo acertou na estimativa;\n", + " * Exemplo: **Observado**: NÃO-Fraude (Negative); **Modelo**: NÃO-Fraude (Negative) --> Modelo acertou!\n", + " * **Falso Positivo** (FP = False Positive): Quando o valor observado é False e o modelo estima como True. Ou seja, o modelo errou na estimativa. \n", + " * Exemplo: **Observado**: NÃO-Fraude (Negative); **Modelo**: Fraude (Positive) --> Modelo errou!\n", + " * **Falso Negativo** (FN = False Negative): Quando o valor observado é True e o modelo estima como False. Ou seja, o modelo errou na estimativa.\n", + " * Exemplo: **Observado**: Fraude (Positive); **Modelo**: NÃO-Fraude (Negative) --> Modelo errou!\n", + "\n", + "* Consulte [Confusion matrix](https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html#sphx-glr-auto-examples-model-selection-plot-confusion-matrix-py)\n", + "\n", + "![ConfusionMatrix](https://github.com/MathMachado/Materials/blob/master/ConfusionMatrix.PNG?raw=true)\n", + "\n", + "Source: [Confusion Matrix](https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781838555078/6/ch06lvl1sec34/confusion-matrix)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ci-6eiqBTgbL" + }, + "source": [ + "## Accuracy\n", + "> Accuracy - é a proporção de previsões corretas feitas pelo modelo.\n", + "\n", + "Responde à seguinte pergunta:\n", + "\n", + "```\n", + "Com que frequência o classificador (modelo preditivo) classifica corretamente?\n", + "```\n", + "\n", + "$$Accuracy= \\frac{TP+TN}{TP+TN+FP+FN}$$" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "F7YI8X5TRx-R" + }, + "source": [ + "## Precision (Precisão)\n", + "> **Precision** - fornece informações sobre o desempenho em relação a Falsos Positivos: dos casos que o classificador estima como Positivos, quantos realmente são Positivos.\n", + "\n", + "Responde à seguinte pergunta:\n", + "\n", + "```\n", + "Com relação ao resultado Positivo, com que frequência o classificador está correto?\n", + "```\n", + "\n", + "\n", + "$$Precision= \\frac{TP}{TP+FP}$$\n", + "\n", + "**Exemplo**: Precision nos dirá a proporção de clientes que o modelo estimou como sendo Fraude e que, na verdade, são Fraude.\n", + "\n", + "**Comentário**: Se nosso foco é minimizar Falsos Positivos (FP), então precisamos nos esforçar para termos Precision o mais próximo possível de 100%." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zO39n8x_Sz3L" + }, + "source": [ + "## Recall (ou Sensitivity)\n", + "> **Recall** - nos fornece informações sobre o desempenho de um classificador em relação a Falsos Negativos (quantos Positivos perdemos).\n", + "\n", + "Responde à seguinte pergunta:\n", + "\n", + "```\n", + "Quando o valor observado é Positivo, com que frequência o classificador está correto?\n", + "```\n", + "\n", + "$$Recall = Sensitivity = \\frac{TP}{TP+FN}$$\n", + "\n", + "**Exemplo**: Recall é a proporção de clientes observados como Fraude e que o modelo estima como Fraude.\n", + "\n", + "**Comentário**: Se nosso foco for minimizar Falsos Negativos (FN), então precisamos nos esforçar para termos Recall o mais próximo possível de 100%.",
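+ "\n", + "Como referência rápida (esboço ilustrativo que não faz parte do material original e supõe vetores y_test e y_pred como os calculados mais adiante neste capítulo), estas métricas - inclusive Specificity e F1-Score, apresentadas na sequência - podem ser obtidas diretamente do scikit-learn:\n", + "\n", + "```python\n", + "from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix\n", + "\n", + "# y_test: valores observados; y_pred: valores estimados pelo classificador\n", + "tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()\n", + "\n", + "print('Accuracy   :', accuracy_score(y_test, y_pred))\n", + "print('Precision  :', precision_score(y_test, y_pred))\n", + "print('Recall     :', recall_score(y_test, y_pred))\n", + "print('F1-Score   :', f1_score(y_test, y_pred))\n", + "print('Specificity:', tn / (tn + fp))\n", + "```"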
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "htS6rdHVVXRG" + }, + "source": [ + "## Specificity\n", + "> **Specificity** - proporção de TN por TN+FP.\n", + "\n", + "Responde à seguinte pergunta:\n", + "\n", + "```\n", + "Quando o valor observado é Negativo, com que frequência o classificador está correto?\n", + "```\n", + "\n", + "**Exemplo**: Specificity é a proporção de clientes NÃO-Fraude que o modelo estima como NÃO-Fraude.\n", + "\n", + "$$Specificity= \\frac{TN}{TN+FP}$$\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mNn0twadTacc" + }, + "source": [ + "## F1-Score\n", + "> F1-Score é a média harmônica entre Recall e Precision e é um número entre 0 e 1. Quanto mais próximo de 1, melhor. Quanto mais próximo de 0, pior. Ou seja, é um equilíbrio entre Recall e Precision.\n", + "\n", + "$$F1\\_Score= 2\\left(\\frac{Recall*Precision}{Recall+Precision}\\right)$$" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rsH9dMxazWCg" + }, + "source": [ + "# **DATAFRAME-EXEMPLO USADO NESTE TUTORIAL**\n", + "> Gerar um dataframe com 18 colunas, sendo 9 informativas, 6 redundantes e 3 repetidas:\n", + "\n", + "Para saber mais sobre a geração de dataframes-exemplo (toy), consulte [Synthetic data generation — a must-have skill for new data scientists](https://towardsdatascience.com/synthetic-data-generation-a-must-have-skill-for-new-data-scientists-915896c0c1ae)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GEyDo_EIV_jV" + }, + "source": [ + "## Definir variáveis globais" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "TdwgpZ76WFaT" + }, + "source": [ + "i_CV = 10 # Número de Cross-Validations\n", + "i_Seed = 20111974 # semente por questões de reproducibilidade\n", + "f_Test_Size = 0.3 # Proporção do dataframe de validação (outros valores poderiam ser 0.15, 0.20 ou 0.25)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "gJTJfpwWzykS" + }, + "source": [ + "from sklearn.datasets import make_classification\n", + "\n", + "X, y = make_classification(n_samples = 1000, \n", + " n_features = 18, \n", + " n_informative = 9, \n", + " n_redundant = 6, \n", + " n_repeated = 3, \n", + " n_classes = 2, \n", + " n_clusters_per_class = 1, \n", + " random_state=i_Seed)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "gWy2IZh3s-o3" + }, + "source": [ + "X" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "ccjhGnzxtAaV" + }, + "source": [ + "y[0:30] # Semelhante aos casos de fraude: {0, 1}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OHO2befKJxR3" + }, + "source": [ + "___\n", + "# **DECISION TREE**\n", + "> Decision Trees possuem estrutura em forma de árvores.\n", + "\n", + "* **Principais Vantagens**:\n", + " * São algoritmos fáceis de entender, visualizar e interpretar;\n", + " * Captura facilmente padrões não-lineares presentes nos dados;\n", + " * Requer pouco poder computacional --> Treinar Decision Trees não requer tanto recurso computacional!\n", + " * Lida bem com COLUNAS numéricas ou categóricas;\n", + " * Não requer os dados sejam normalizados;\n", + " * Pode ser utilizado como Feature Engineering ao lidar com Missing Values;\n", + " * Pode ser utilizado como Feature Selection;\n", + " * Não requer suposições sobre a distribuição dos dados por causa da natureza não-paramétrica do algoritmo\n", + 
"\n", + "* **Principais desvantagens**\n", + " * Propenso a Overfitting, pois Decision Trees podem construir árvores complexas que não sejam capazes de generalizar bem os dados. As coisas complicam muito se a amostra de treinamento possuir outliers. Portanto, **recomenda-se fortemente a tratar os outliers previamente**.\n", + " * Pode criar árvores viesadas se tivermos um dataframe não-balanceado ou que alguma classe seja dominante. Por conta disso, **recomenda-se balancear o dataframe previamente para se evitar esse problema**.\n", + "\n", + "* **Principais parâmetros**\n", + " * **Gini Index** - é uma métrica que mede a frequência com que um ponto/observação aleatoriamente selecionado seria incorretamente identificado.\n", + " * Portanto, quanto menor o valor de Gini Index, melhor a COLUNA;\n", + " * **Entropy** - é uma métrica que mede aleatoriedade da informação presente nos dados.\n", + " * Portanto, quanto maior a entropia da COLUNA, pior ela se torna para nos ajudar a tomar uma conclusão (classificar, por exemplo).\n", + "\n", + "## **Referências**:\n", + "* [1.10. Decision Trees](https://scikit-learn.org/stable/modules/tree.html).\n", + "* [Decision Tree Algorithm With Hands On Example](https://medium.com/datadriveninvestor/decision-tree-algorithm-with-hands-on-example-e6c2afb40d38) - ótimo tutorial para aprender, entender, interpretar e calcular os índices de Gini e entropia.\n", + "* [Intuitive Guide to Understanding Decision Trees](https://towardsdatascience.com/light-on-math-machine-learning-intuitive-guide-to-understanding-decision-trees-adb2165ccab7) - ótimo tutorial para aprender, entender, interpretar e calcular os índices de Gini e entropia.\n", + "* [The Complete Guide to Decision Trees](https://towardsdatascience.com/the-complete-guide-to-decision-trees-28a4e3c7be14)\n", + "* [Creating and Visualizing Decision Tree Algorithm in Machine Learning Using Sklearn](https://intellipaat.com/blog/decision-tree-algorithm-in-machine-learning/) - Muito didático!\n", + "* [Decision Trees in Machine Learning](https://towardsdatascience.com/decision-trees-in-machine-learning-641b9c4e8052)\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FrMkPN5aLp0Y" + }, + "source": [ + "## Carregar as bibliotecas" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "FVU1CM0PKgO4" + }, + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import seaborn as sns\n", + "import matplotlib.pyplot as plt\n", + "\n", + "import warnings\n", + "warnings.filterwarnings(\"ignore\")" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "15clh4XrISpz" + }, + "source": [ + "## Carregar/Ler os dados" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "UMPL46w2IWJw" + }, + "source": [ + "l_colunas = ['v1', 'v2', 'v3', 'v4', 'v5', 'v6', 'v7', 'v8', 'v9', 'v10', 'v11', 'v12', 'v13', 'v14', 'v15', 'v16', 'v17', 'v18']\n", + "\n", + "df_X = pd.DataFrame(X, columns = l_colunas)\n", + "df_y = pd.DataFrame(y, columns = ['target'])" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "MFaQF2MGFl_M" + }, + "source": [ + "df_X.head()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "s-ibdD2ZG7tm" + }, + "source": [ + "df_X.shape" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "f9cqRaywa_TR" + }, + "source": [ + "set(df_y['target'])" + ], + 
"execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BN6jbpn6Iwmu" + }, + "source": [ + "## Estatísticas Descritivas básicas do dataframe - df.describe()" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "KlwhxxUNIyYs" + }, + "source": [ + "df_X.describe()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "N_QhFqyZOKFB" + }, + "source": [ + "## Selecionar as amostras de treinamento e validação\n", + "\n", + "* Dividir os dados/amostra em:\n", + " * **Amostra de treinamento**: usado para treinar o modelo e otimizar os hiperparâmetros;\n", + " * **Amostra de teste**: usado para verificar se o modelo otimizado funciona em dados totalmente desconhecidos. É nesta amostra de teste que avaliamos a performance do modelo em termos de generalização (trabalhar com dados que não lhe foi apresentado);\n", + "* Geralmente usamos 70% da amostra para treinamento e 30% validação. Outras opções são usar os percentuais 80/20 ou 75/25 (default).\n", + "* Consulte [sklearn.model_selection.train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) para mais detalhes.\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "8sKBgs-QOOfn" + }, + "source": [ + "from sklearn.model_selection import train_test_split\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(df_X, df_y, test_size = f_Test_Size, random_state = i_Seed)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "TPTKBBHgOpoA", + "outputId": "3c8ab56e-2746-4310-df58-9b16986b9413", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "X_train.shape" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(700, 18)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 15 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "lEn_LLs2OtRI", + "outputId": "7e53d785-2595-4ba6-c229-ac02b99d3c55", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "y_train.shape" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(700, 1)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 16 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "_uAw8EcyOvrG", + "outputId": "00356053-c127-40d1-8bdd-d769af9ef0e2", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "X_test.shape" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(300, 18)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 17 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "A2LYI-9hOyXI", + "outputId": "b4f4b728-0bee-435e-e697-27768787d43e", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "y_test.shape" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(300, 1)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 18 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "npgoBSX2dd4l" + }, + "source": [ + "## Treinar o algoritmo com os dados de treinamento\n", + "### Carregar os algoritmos/libraries" + ] + }, + { + "cell_type": "code", 
+ "metadata": { + "id": "hcvzrtolGfnQ", + "outputId": "b0d2ab18-7386-461b-d5f5-8e1880496244", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 68 + } + }, + "source": [ + "!pip install graphviz\n", + "!pip install pydotplus" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Requirement already satisfied: graphviz in /usr/local/lib/python3.6/dist-packages (0.10.1)\n", + "Requirement already satisfied: pydotplus in /usr/local/lib/python3.6/dist-packages (2.0.2)\n", + "Requirement already satisfied: pyparsing>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from pydotplus) (2.4.7)\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "v_pF-HH3JKL2" + }, + "source": [ + "from sklearn.metrics import accuracy_score # para medir a acurácia do modelo preditivo\n", + "#from sklearn.model_selection import train_test_split\n", + "#from sklearn.metrics import classification_report\n", + "from sklearn.metrics import confusion_matrix # para plotar a confusion matrix\n", + "\n", + "from sklearn.model_selection import GridSearchCV # para otimizar os parâmetros dos modelos preditivos\n", + "from sklearn.model_selection import cross_val_score\n", + "from time import time\n", + "from operator import itemgetter\n", + "from scipy.stats import randint\n", + "\n", + "from sklearn.tree import export_graphviz\n", + "from sklearn.externals.six import StringIO \n", + "from IPython.display import Image \n", + "import pydotplus\n", + "\n", + "np.set_printoptions(suppress=True)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9ROlyvgij2yl" + }, + "source": [ + "Função para plotar a Confusion Matrix extraído de [Confusion Matrix Visualization](https://medium.com/@dtuk81/confusion-matrix-visualization-fc31e3f30fea)" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "klQ0FLOIgeX1" + }, + "source": [ + "def mostra_confusion_matrix(cf, \n", + " group_names = None, \n", + " categories = 'auto', \n", + " count = True, \n", + " percent = True, \n", + " cbar = True, \n", + " xyticks = False, \n", + " xyplotlabels = True, \n", + " sum_stats = True, figsize = (8, 8), \n", + " cmap = 'Blues'):\n", + " '''\n", + " This function will make a pretty plot of an sklearn Confusion Matrix cm using a Seaborn heatmap visualization.\n", + " Arguments\n", + " ---------\n", + " cf: confusion matrix to be passed in\n", + " group_names: List of strings that represent the labels row by row to be shown in each square.\n", + " categories: List of strings containing the categories to be displayed on the x,y axis. Default is 'auto'\n", + " count: If True, show the raw number in the confusion matrix. Default is True.\n", + " normalize: If True, show the proportions for each category. Default is True.\n", + " cbar: If True, show the color bar. The cbar values are based off the values in the confusion matrix.\n", + " Default is True.\n", + " xyticks: If True, show x and y ticks. Default is True.\n", + " xyplotlabels: If True, show 'True Label' and 'Predicted Label' on the figure. Default is True.\n", + " sum_stats: If True, display summary statistics below the figure. Default is True.\n", + " figsize: Tuple representing the figure size. Default will be the matplotlib rcParams value.\n", + " cmap: Colormap of the values displayed from matplotlib.pyplot.cm. 
Default is 'Blues'\n", + " See http://matplotlib.org/examples/color/colormaps_reference.html\n", + " '''\n", + "\n", + " # CODE TO GENERATE TEXT INSIDE EACH SQUARE\n", + " blanks = ['' for i in range(cf.size)]\n", + "\n", + " if group_names and len(group_names)==cf.size:\n", + " group_labels = [\"{}\\n\".format(value) for value in group_names]\n", + " else:\n", + " group_labels = blanks\n", + "\n", + " if count:\n", + " group_counts = [\"{0:0.0f}\\n\".format(value) for value in cf.flatten()]\n", + " else:\n", + " group_counts = blanks\n", + "\n", + " if percent:\n", + " group_percentages = [\"{0:.2%}\".format(value) for value in cf.flatten()/np.sum(cf)]\n", + " else:\n", + " group_percentages = blanks\n", + "\n", + " box_labels = [f\"{v1}{v2}{v3}\".strip() for v1, v2, v3 in zip(group_labels,group_counts,group_percentages)]\n", + " box_labels = np.asarray(box_labels).reshape(cf.shape[0],cf.shape[1])\n", + "\n", + " # CODE TO GENERATE SUMMARY STATISTICS & TEXT FOR SUMMARY STATS\n", + " if sum_stats:\n", + " #Accuracy is sum of diagonal divided by total observations\n", + " accuracy = np.trace(cf) / float(np.sum(cf))\n", + "\n", + " #if it is a binary confusion matrix, show some more stats\n", + " if len(cf)==2:\n", + " #Metrics for Binary Confusion Matrices\n", + " precision = cf[1,1] / sum(cf[:,1])\n", + " recall = cf[1,1] / sum(cf[1,:])\n", + " f1_score = 2*precision*recall / (precision + recall)\n", + " stats_text = \"\\n\\nAccuracy={:0.3f}\\nPrecision={:0.3f}\\nRecall={:0.3f}\\nF1 Score={:0.3f}\".format(accuracy,precision,recall,f1_score)\n", + " else:\n", + " stats_text = \"\\n\\nAccuracy={:0.3f}\".format(accuracy)\n", + " else:\n", + " stats_text = \"\"\n", + "\n", + " # SET FIGURE PARAMETERS ACCORDING TO OTHER ARGUMENTS\n", + " if figsize==None:\n", + " #Get default figure size if not set\n", + " figsize = plt.rcParams.get('figure.figsize')\n", + "\n", + " if xyticks==False:\n", + " #Do not show categories if xyticks is False\n", + " categories=False\n", + "\n", + " # MAKE THE HEATMAP VISUALIZATION\n", + " plt.figure(figsize=figsize)\n", + " sns.heatmap(cf,annot=box_labels,fmt=\"\",cmap=cmap,cbar=cbar,xticklabels=categories,yticklabels=categories)\n", + "\n", + " if xyplotlabels:\n", + " plt.ylabel('True label')\n", + " plt.xlabel('Predicted label' + stats_text)\n", + " else:\n", + " plt.xlabel(stats_text)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YJMS9ePQ6B6t" + }, + "source": [ + "**Atenção**: Para evitar overfitting nos algoritmos DecisionTreeClassifier, considere min_samples_split = 2 como default." 
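, + "\n", + "Observação adicional (nota que não faz parte do material original): min_samples_split = 2 é justamente o valor default do scikit-learn e tende a gerar as árvores mais profundas possíveis; para reduzir overfitting, o usual é aumentar min_samples_split/min_samples_leaf e/ou limitar max_depth. Um esboço com valores apenas ilustrativos:\n", + "\n", + "```python\n", + "from sklearn.tree import DecisionTreeClassifier\n", + "\n", + "# Valores apenas ilustrativos: árvores mais rasas e folhas maiores tendem a generalizar melhor\n", + "ml_DT_regularizada = DecisionTreeClassifier(criterion='gini',\n", + "                                            max_depth=5,\n", + "                                            min_samples_split=30,\n", + "                                            min_samples_leaf=20,\n", + "                                            random_state=i_Seed)\n", + "```"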
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "nNeRHYePJc-r" + }, + "source": [ + "from sklearn.tree import DecisionTreeClassifier # Library para Decision Tree (Classificação)\n", + "\n", + "# Instancia com os parâmetros sugeridos para se evitar overfitting:\n", + "ml_DT= DecisionTreeClassifier(criterion = 'gini', \n", + " splitter = 'best', \n", + " max_depth = None, \n", + " min_samples_split = 2, \n", + " min_samples_leaf = 1, \n", + " min_weight_fraction_leaf = 0.0, \n", + " max_features = None, \n", + " random_state = i_Seed, \n", + " max_leaf_nodes = None, \n", + " min_impurity_decrease = 0.0, \n", + " min_impurity_split = None, \n", + " class_weight = None, \n", + " presort = False)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "gVLZznprx2YX", + "outputId": "956487e9-beb3-4638-c305-786d7e06c0c0", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 120 + } + }, + "source": [ + "# Objeto configurado\n", + "ml_DT" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',\n", + " max_depth=None, max_features=None, max_leaf_nodes=None,\n", + " min_impurity_decrease=0.0, min_impurity_split=None,\n", + " min_samples_leaf=1, min_samples_split=2,\n", + " min_weight_fraction_leaf=0.0, presort=False,\n", + " random_state=None, splitter='best')" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 30 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "OgAHfXVo-Nw8", + "outputId": "10fed276-0cf3-4149-e5d1-784e736a2841", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 120 + } + }, + "source": [ + "# Treina o algoritmo: fit(df)\n", + "ml_DT.fit(X_train, y_train)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',\n", + " max_depth=None, max_features=None, max_leaf_nodes=None,\n", + " min_impurity_decrease=0.0, min_impurity_split=None,\n", + " min_samples_leaf=1, min_samples_split=2,\n", + " min_weight_fraction_leaf=0.0, presort=False,\n", + " random_state=None, splitter='best')" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 33 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ohmGCDpfyhvV", + "outputId": "fee641eb-64d0-4072-874c-f704c6a70cfe", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "i_CV" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "10" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 24 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "6exa9D8R2fDJ", + "outputId": "5bfc98af-bd00-440d-b504-ab499254c533", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 51 + } + }, + "source": [ + "# Cross-Validation com 10 folds\n", + "a_scores_CV = cross_val_score(ml_DT, X_train, y_train, cv = i_CV)\n", + "\n", + "print(f'Média das Acurácias calculadas pelo CV....: {100*round(a_scores_CV.mean(),4)}')\n", + "print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Média das Acurácias calculadas pelo CV....: 91.43\n", + "std médio das Acurácias calculadas pelo CV: 3.8899999999999997\n" + 
], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Uxoplcea0byV", + "outputId": "578c5e51-c311-4cdf-c5ad-0de8fedd4e17", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 51 + } + }, + "source": [ + "a_scores_CV # array com os scores a cada iteração do CV" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([0.87142857, 0.98571429, 0.85714286, 0.91428571, 0.9 ,\n", + " 0.95714286, 0.91428571, 0.92857143, 0.87142857, 0.94285714])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 36 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "y3k-PcbN0o_i", + "outputId": "0334a08d-8d2b-4687-ccda-65c6eac86759", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_scores_CV.mean()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0.9142857142857144" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 37 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6_rYker2gzeG" + }, + "source": [ + "**Interpretação**: Nosso classificador (DecisionTreeClassifier) tem uma acurácia média de 91,43% (base de treinamento). Além disso, o std é da ordem de 3,66%, ou seja, pequena. Vamos tentar melhorar a acurácia do classificador usando parameter tunning (GridSearchCV)." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "tkwchmkP3p_A", + "outputId": "8b157dfc-f416-49d2-d185-3cf8ebfa13b0", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 51 + } + }, + "source": [ + "print(f'Acurácias: {a_scores_CV}')" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Acurácias: [0.87142857 0.98571429 0.85714286 0.91428571 0.9 0.95714286\n", + " 0.91428571 0.92857143 0.87142857 0.94285714]\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "sI31WkZs2ht_" + }, + "source": [ + "# Faz predições...\n", + "y_pred = ml_DT.predict(X_test)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "rfapj3OG13PG", + "outputId": "af6e5144-5cdb-4017-885e-e398508d9cf5", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 51 + } + }, + "source": [ + "y_pred[0:30]" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0,\n", + " 1, 0, 0, 1, 1, 0, 1, 1])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 40 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "sc88ofqh16RT", + "outputId": "4c2d7859-fa1a-4ecb-ea61-9ec399e439de", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 51 + } + }, + "source": [ + "y[0:30]" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1,\n", + " 1, 1, 0, 1, 0, 1, 0, 1])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 41 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "fSaVzJ9xFpwW", + "outputId": "12eb1946-18c6-4369-af9d-916b5a0fc42d", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 538 + } + }, + "source": [ + "# Confusion Matrix\n", + "cf_matrix = confusion_matrix(y_test, y_pred)\n", + "cf_labels = 
['True_Negative', 'False_Positive', 'False_Negative', 'True_Positive']\n", + "cf_categories = ['Zero', 'One']\n", + "mostra_confusion_matrix(cf_matrix, group_names= cf_labels, categories= cf_categories)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAccAAAIJCAYAAADQ9vbrAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdeZxP1R/H8ddnzNhCdlmz7wqllJQ9pOxbCqVECRWl9FNp30slVEIREm1KZKfsKmslsofszBhm5vz++H5NM19jjGvGzHzn/fw9vo++99x7zzl3+j2+nz7nnHuvOecQERGR/4SkdgdERETSGgVHERGRAAqOIiIiARQcRUREAig4ioiIBFBwFBERCRCaEpVmq9FH94dIundw+bup3QWRZJE1FEupulPi9z5i9bsp1t+kUuYoIiISIEUyRxERySAsOHOs4LwqERGRC6DMUUREvLNUnx5MEcocRUREAihzFBER74J0zlHBUUREvNOwqoiISMagzFFERLwL0mHV4LwqERGRC6DMUUREvNOco4iISAALSf5PUpo1G21me81sbQL7HjEzZ2b5/dtmZsPMbJOZ/WZmNc9Vv4KjiIikR2OApoGFZlYcaAJsi1PcDCjn//QE3j9X5QqOIiLinVnyf5LAObcAOJDArjeBR4G4bwtpCYxzPkuA3GZWOLH6FRxFRCQomFlLYKdz7teAXUWB7XG2d/jLzkoLckRExLsUuJXDzHriG/48bZRzbtQ5zskOPIFvSPWCKTiKiIh3KbBa1R8IEw2GCSgDlAJ+NV+figGrzOwaYCdQPM6xxfxlZ6VhVRERSfecc2uccwWdcyWdcyXxDZ3WdM79A3wNdPWvWq0NHHbO7U6sPmWOIiLiXSo9IcfMPgPqAfnNbAfwlHPuo7Mc/h3QHNgEhAN3nat+BUcREUl3nHOdz7G/ZJzvDnjgfOpXcBQREe/0hBwREZGMQZmjiIh4F6Rv5VBwFBER74I0OAbnVYmIiFwAZY4iIuJdiBbkiIiIZAjKHEVExLsgnXNUcBQREe90n6OIiEjGoMxRRES8C9Jh1eC8KhERkQugzFFERLwL0jlHBUcREfFOw6oiIiIZgzJHERHxLkiHVZU5ioiIBFDmKCIi3gXpnKOCo4iIeKdhVRERkYxBmaOIiHgXpMOqwXlVIiIiF0CZo4iIeKc5RxERkYxBmaOIiHgXpHOOCo4iIuJdkAbH4LwqERGRC6DMUUREvNOCHBERkYxBmaOIiHgXpHOOCo4iIuKdhlVFREQyBmWOIiLiXZAOqwbnVYmIiFwAZY4iIuJdkM45KjiKiIhnFqTBUcOqIiIiAZQ5ioiIZ8ocRUREMghljiIi4l1wJo7KHEVERAIpcxQREc+Cdc5RwVFERDwL1uCoYVUREZEAyhxFRMQzZY4iIiIZhDJHERHxLFgzRwVHERHxLjhjo4ZVRUREAilzFBERz4J1WFWZo4iISABljiIi4lmwZo4KjiIi4lmwBkcNq4qIiARQ5igiIp4pcxQREUkjzGy0me01s7Vxyl41s41m9puZTTOz3HH2PW5mm8zsdzO7+Vz1KziKiIh3lgKfpBkDNA0omwVUdc5dAfwBPA5gZpWBTkAV/znDzSxTYpUrOIqISLrjnFsAHAgom+mci/JvLgGK+b+3BCY65yKdc1uATcA1idWv4CgiIp6ZWUp8eprZijifnh66djfwvf97UWB7nH07/GVnpQU5IiLiWUosyHHOjQJGeT3fzAYDUcB4r3UoOIqISNAws+5AC6Chc875i3cCxeMcVsxfdlYaVhUREc9SYlj1AvrSFHgUuM05Fx5n19dAJzPLYmalgHLAssTqUuYoIiLpjpl9BtQD8pvZDuApfKtTswCz/EF2iXOul3NunZlNBtbjG259wDkXnVj9Co4iIuJdKj0DwDnXOYHijxI5/nng+aTWr+AoIiKe6Qk5IiIiGYQyRxER8UyZo4iISAahzFFERDwL1sxRwVFERDwL1uCoYVUREZEAyhxFRMS74EwclTmej7yXXsKSiYNYMnEQW2a9wF8/PBe7HRaa6KvBztvG6c/w2Wv3xG63blSdUc/ckaxtAPS5vR7ZsobFbk97pzeX5siW7O1I2lSjWiU6tGkZ+9m5c8dZj619dY1ka7dH9zu57Zabad/6Nrp16cTfWzafdx0P9LqXI0eOcOTIESZ99t/zpffu3cMj/fsmW18lY1LmeB4OHD5O7U4vATD4vuYcD4/krU9mx+7PlCmE6OiYZGuvRqXiVCx9GRs3/5NsdQbq06U+n323nIgTpwBo/eD7KdaWpD1ZsmRl8tSvUqXtF19+jSpVqzFl8iTeeO0Vhr034rzOf2/EBwDs3LmDSRM/o2PnLgAULFiI198aluz9lYRpzlESNOqZOxg2uBMLxg3ghf6tGHxfc/rf2TB2/4rPn6BE4bwAdGpei4WfDGDJxEG8M7gTISGJ/5/q7U/m8FiPm88oz541MyOe6sLCTwbw82eP0aJeNQCyZQ3j05fvZtUXg5n0+r0sGDeAmpVL+Op6oiOLxj/KyimDebJXcwDu73wThQtcyoxR/Zgxyvdf2hunP0O+3JfwbN/buK/DjbFtxr2uh7o2ZNGnA1k26fHYuiQ4hB8/zr13d6Nju9a0bXUrc+f8eMYx+/bt5a6uXejQpiVtWrZg1coVAPy0eBF33t6Rju1aM+ChvoQfP56kNq+6+mq2b9uGc443XnuZNi1b0LbVrcz4/rtE22vWuAEHDx7g7TdfZ8f2bXRo05I3XnuZnTt30KZlCwDu6NyBTZv+jG2rR/c7Wbd2DeHh4Qx58nFu79iODm1bJXidkrEpc0wGRQvmpl7314mJcQy+L+FgUaFUIdo1qUn9u94gKiqGtx7vQKfmtZjw7dkfDP/FzFX0bF+X0sXzxyt/7J6bmbf8D3o9M55Lc2Rj4acDmbPkd3q2r8vBI+HUbPs8lcsUZunEQbHnPP3uNxw8Ek5IiPH9yL5ULVeE4Z/Np+8dDWja8232H4r/Qzblh1W8OrAtIycvAKBtkxrcdv97NKxdkTIlCnLDHa9iZkx56z7q1CzD4lV/ef3zSSqKjDxBhzYtAShSrBivvfE2bw57jxw5cnDw4AHu7NyRevUbxssOvpv+LdfXuYF77+tNdHQ0J05EcPDgAT4Y+T4jP/yY7NmzM/rDUYwb+zG97u9zzj7MnzeXsuXLM3vWTH7fuJHPp37FoYMHub1jO66
6+uoE24ur30OPsOnPP2Mz4LhDwzc3bc7MGd9Ttk859u3by759e6lStRrD3nqDa66tzdDnXuTIkSN06dSea2tfT/bs2ZPjz5qhBGvmqOCYDKb+uJqYGJfoMfWvqUDNyiVY9OmjAGTLEsa+A8cSPSc6JoY3x/3IwLubMHPx+tjyhtdV4pabqtG/qy+Ty5o5lOKF83B9jdK8O2EeAOv/2s2aP3fFntO2SU3ublOH0EwhXFYgF5VKF2ZtnP2Bfv19BwXy5KRwgUvJnycHh46Es2PPIR64vT6NrqvIEn/gzZEtC2VLFFRwTKcCh1VPnTrFsLfeYNXK5YRYCHv37mH/v/+Sv0CB2GOqVq3GU08+QVRUFPUbNKJipUqsWD6XzX9tovsdnWPruaJ69UTbfvyxAWTNkpUiRYsy6In/8cnYj2na/BYyZcpEvvz5uapWLdatWZNge0nVpGkzet17N/f36cvMGd/TuElTAH7+aRHz5s5h3MejATgZGck/u3dTukyZJNctPgqOclbhEZGx36Oio+MNl2bN7FvsYmZ8+s1Shrzz9XnVPWH6Mgbe3YT1m3bHlhnQecCH/Ll1b5LquLxIPvrf2ZAb7niFQ0cjGPXMHWTJfO5/9VN/XE3rRtUplC8XU2au8l8HvDp6Jh99sfi8rkPSh+++/YaDBw/w2eSphIWF0axxAyJPRsY75qqrazF63KcsnD+fIYMHcWe3u8iZKxe1r6vDy6+9keS2Ts85nktC7d3aslWS2ihUqBC5c+fmj9838sOM73lyyNMAOAdvvDWMkqVKJ7m/krFozjGZbd11gOqVfC+crl6xGCWL5gNg7rLfad2oOgXy5AAgT67slCic55z1RUXF8M6nc3mwS/3Ysh9/3sD9nW6K3b6yQjEAfv5lM22b1ASgYunLqFq2CAC5cmTl+IlIDh87QcG8OWlSp3LsuUePR5Ije9YE257yw0ra33wVrRvVYOqs1QDM+mkD3VpexyXZMgNQpMClsdck6d+xY0fJmzcfYWFhLFu6hF27znxZ+q5dO8mXLz9t23egddv2bFi/jiuurM4vq1exbetWAMLDw/n77y3n1XaNq67mh++/Jzo6mgMHDrBqxQqqVrsiwfbiuuSSSxKd37y5aXM+Hv0hR48epXyFigBcX+cGJoz/lNMvit+wYf1Zz5dzsBT4pAHKHJPZl7N/oUuLa1g5ZTDL1/wdm91t3PwPz7z3Ld+834cQM05FRfPQS5PZtvvgOesc8+XPDLq3aez2ix/M4NUBbVk++QlCQoy/d+6nbb8RjJy8kA+fvZNVXwzmjy17WL95N4ePRfDXtn38unEHv077Hzv+OciSX/5bNj966mK+fu9+du87TNOe8Vf4bdj8DzmyZ2XX3kP88+8RAGYv2UjFUpcxb+wAAI5HRHLX4LHsO5j4ELGkD81b3ErfB3rTttWtVK5SlVKlz8ysVixbxpiPPyI0NJTs2bPz3IsvkzdvXoY+/yKDBj7MyVMnAejzYH9KliyV5LYbNmrMb7+upn2blpgZ/R8ZSP4CBfj6y2lntBdX7tx5qF6jJm1atuCGunVjV62e1rjJzbzy0vP07HV/bFnPXvfzyksv0K71bcTExFC0WDHeHT7yfP5UEuTs9H85JadsNfokf6VyTiEhRlhoJiJPRlGqWH6+G9GHK1o9y6moRF94LWdxcPm7qd0FkWSRNTTl8rESD36d7L/32965LdXzR2WOQSR71szM+KAfYaEhGEa/FycrMIpIitKCHEkRC8YNIHPA4pgeT45j3aazryQ9m2PhkdzQ5ZXk6prIBevf9wF27Yj/1J1+Dw+gzg11U6lHIkmj4JjKbuz6WoLlI57qQrMbq7LvwFGubv9CvH397mzASw+3oVj9x9h/6Dh1ryrH52/25O9d+wH4as4vvDhqRor3XeRc3hr2XqL7Pxk7hqlffI6ZUa5ceYY+/yJZsmS5SL2T5BCsmaNWq6ZRn3yzhJYPnPnDUqxQbhrWrsS23QfilS9e/Re1O71E7U4vKTBKurBnzx4mjB/HZ5O/YOpX3xITE82M76andrdEAAXHNGvxqr84cDj8jPJXBrRl8NtfkhILqUQutujoaCJPnCAqKoqIEycoULBgandJzpOZJfsnLVBwTEda1KvGrr2HWPPHmfeeXXtFKZZOGsSX7/amUunLUqF3IuenUKFCdOt+Nzc3qk+jejeQM0cOrq9zQ2p3S85XkN7nqOCYTmTLGsajd9/M0PfPHHb6ZeN2KjT/H9d2fIn3J85n8ps9U6GHIufnyOHDzJ0zm+9mzmbW3IVERETw7Tep84YQkUAKjulE6WIFuLxoPpZNepyN05+haMHc/DzhMQrly8nR4yc4HuG78fqHResJC81EvtyXpHKPRRK3ZMlPFC1WjLx58xIWFkbDRk34dfXq1O6WnKdgHVbVatV0Yt2mXVze8PHY7Y3Tn6FOl1fYf+g4hfLlZM/+owBcXeVyQszOeMuGSFpzWeEi/Pbrr0RERJA1a1aWLvmZylWrpna3RAAFxzRr7IvdqXtVOfLnzsGmGc/y7IjvGPvlzwke27pRDe5tX5eo6GhOnDhF18c/vsi9FTl/V1xxJY2b3Eyn9q3JlCmUipUq0a59x9TulpyntJLpJTc9Pk7kLPT4OAkWKfn4uDKPfJ/sv/d/vd4s1SOuMkcREfEsSBNHBUcREfEuWIdVtVpVREQkgDLHVBISYiwe/yi79h6mbb8R1LumPC/0b01IiHE8PJJ7n/qEzdv/jXfO1VUu593/dQZ8QxnPj/iOr+f+RpbMofz4UX8yZw4lNFMmpv24mudGfAfAx893o0rZIny/cC1PvfsNAI/dczPrN+3mm3m/XdyLlqD1z+7dDH78UQ7s3w9mtGvfgS53djvjuOXLlvLqSy9wKiqKPHnyMHrsp/y9ZTOPPvJQ7DE7dmzn/j59uaNrd958/VUWL1pAhYqVeP5F30P1v/3mKw4dPMgdXbtfrMuTRARp4qjgmFr63F6f37fsIeclWQEY9kQn2j80kt+37KFn+7oMuqcpPZ/6NN456/7aRZ0urxAdHcNl+XOxdNLjTF+wlsiTUTTtOYzjEScJDQ1hzuiHmbl4PeEnThIReYprOr7It+/3IVeOrGTPmplaVUvy8oc/pMZlS5DKFJqJAY8OolLlKhw/foxO7dtS+7o6lClbNvaYI0eO8MKzzzB85IcULlKE/ft9D8ovWao0k6f6bv6Pjo6mcf0badCoMUePHmXjhvVMmfYNTw8ZzJ9//E7xEpfz1bSpDB/5Yapcp2QcGlZNBUUL5qbpDVX4eNpPsWXOOXL5A2WunNnYve/wGedFnDhFdHQMAFkyh8V7vurphwCEhWYiNDQTzjlORUWTLUsYZr6XIEdHx/C/3rfw3Ag93FmSV4ECBalUuQoAl1ySg9KlS7N37554x3w//RsaNmpM4SJFAMiXL98Z9Sxd8jPFixenSJGihIQYUVFROOc4EXGC0NBQxn78EZ273ElYWFjKX5QkiR4CIMnm1YG+h4fnyJ41tuz+oROY9s79nIg8yZHjJ7ip6+sJnlur6uWMeP
oOShTOS48nx8YGy5AQ46cJj1GmeAFGTlrA8rVbAfj34DF+/uwxPpu+jDLFCxASYvyycUeCdYskh507d7BxwwaqXXFlvPKtf/9NVFQUPbrfyfHjx+lyR1dubdkq3jEzvp9O0+YtAF+QvaHujXRs24pral9Hjpw5WbPmN+7r/cBFuxY5tzQSy5KdguNF1qxuVfYeOMrqDdupe1W52PIHu9Sn9YPDWb52Kw91bcjLj7Th/qETzjh/+dqtXNXueSqUKsSHQ+/kh8XriTwZRUyMo3anl7g0RzYmvXEvlcsUZv1fuxn42hex50556z4efH4ij/a4mSvKF2X2ko3xsleRCxV+/DiP9O/LwEFPkCNHjnj7oqKjWb9+HaM+GkNk5Am63t6JaldeScmSpQA4dfIk8+fOoV//R2LPuavHvdzV414Anh4ymAf69GXqlM/5+adFlCtfgZ697r94FycZioZVL7LrqpemxU3V2Dj9Gca9dBf1apVn6rBeVCtfNDbbmzJzFbWvLJVoPb9v2cOx8EiqlC0Sr/zwsQjmr/iDJtdXjlfeol41Vm/YziXZslC6WH7ueGw0rRvVIFtWDU9J8jh16hQP9+9L81tupVHjJmfsL1ToMq6vcwPZs2cnT5681Lz6av74fWPs/kWLFlCxchXy5c9/xrkbNqzHOcflJUsx84cZvPrG22zfvp2tW/9OyUuSJAgJsWT/pAUKjhfZkHe+pmzT/1HxlqfoOuhj5i3/g/YPjSJXjmyULeF7l12D2hX5fcueM869vEg+MmXy/SsrUTgPFUpdxtZd+8mfJweX5sgGQNYsYTS8tiK///3f+aGhIfS5vT5vjJ1FtqxhOHxzlZkyGZlDNXggF845x9NDBlO6dGm6dr8rwWPqN2jI6lUrfe9ujIhgzW+/Uap0mdj93383nWbNb0nw3PfeeZsHHuxHVFQUMTHRgO9H+UTEieS/GBE0rJomREfH8MCzE/jstXuIcTEcOhLBfU/7VqreclM1alYuwbPvT+f6GqUZcFcTTkVFExPj6PfCJPYfOk7VckX4YOidZAoJISTE+GLWKr5fuDa2/l4dbuTTb5YSceIUa/7YSfasmVk++Ql+WLSOw8ciUuuyJYisXrWSb7/+inLly9OhTUsAHuz/MLt37wKgQ8fOlC5Thjo31KV969uwkBDatG1HuXLlAQgPD2fJTz/xv6eGnlH3nNk/UqVKVQoWLARAhYqVaNvqVsqXL0+FihUv0hXK2QTrnKOerSpyFnq2qgSLlHy2atUnZyX77/3a5xqnesjVsKqIiEgADauKiIhnwTqsqsxRREQkgDJHERHxLK080Sa5KXMUEREJoMxRREQ8C9bMUcFRREQ8C9LYqGFVERGRQMocRUTEs2AdVlXmKCIiEkCZo4iIeBakiaOCo4iIeKdhVRERkQxCmaOIiHgWpImjMkcREUl/zGy0me01s7VxyvKa2Swz+9P/zzz+cjOzYWa2ycx+M7Oa56pfwVFERDwzs2T/JNEYoGlA2SBgtnOuHDDbvw3QDCjn//QE3j9X5QqOIiLimVnyf5LCObcAOBBQ3BIY6/8+FmgVp3yc81kC5DazwonVr+AoIiLBopBzbrf/+z9AIf/3osD2OMft8JedlRbkiIiIZylxK4eZ9cQ3/HnaKOfcqPOpwznnzMx57YOCo4iIpCn+QHhewdBvj5kVds7t9g+b7vWX7wSKxzmumL/srDSsKiIinqXWnONZfA1083/vBnwVp7yrf9VqbeBwnOHXBClzFBGRdMfMPgPqAfnNbAfwFPASMNnMegBbgQ7+w78DmgObgHDgrnPVr+AoIiKepdbj45xznc+yq2ECxzrggfOpX8FRREQ80xNyREREMghljiIi4pneyiEiIpJBKHMUERHPgjRxVHAUERHvNKwqIiKSQShzFBERz5Q5ioiIZBDKHEVExLMgTRwVHEVExDsNq4qIiGQQyhxFRMSzIE0clTmKiIgEUuYoIiKeBeuco4KjiIh4FqSxUcOqIiIigZQ5ioiIZyFBmjoqcxQREQmgzFFERDwL0sRRmaOIiEggZY4iIuKZbuUQEREJEBKcsVHDqiIiIoGUOYqIiGfBOqyqzFFERCSAMkcREfEsSBNHBUcREfHOCM7oqGFVERGRAMocRUTEM93KISIikkEocxQREc+C9VYOBUcREfEsSGOjhlVFREQCKXMUERHP9LJjERGRDEKZo4iIeBakiaMyRxERkUDKHEVExDPdyiEiIhIgSGOjhlVFREQCKXMUERHPdCuHiIhIBqHMUUREPAvOvFHBUURELkCwrlbVsKqIiEgAZY4iIuJZsL7s+KzB0czeAdzZ9jvn+qZIj0RERFJZYpnjiovWCxERSZeCdc7xrMHROTc27raZZXfOhad8l0REJL0I0th47gU5Znadma0HNvq3rzSz4SneMxERkVSSlNWqbwE3A/sBnHO/AjemZKdERCR9MLNk/6QFSbqVwzm3PaAoOgX6IiIikiYk5VaO7WZ2PeDMLAzoB2xI2W6JiEh6EKy3ciQlc+wFPAAUBXYB1f3bIiIiQemcmaNz7l+gy0Xoi4iIpDOpNUdoZg8B9+C7H38NcBdQGJgI5ANWAnc65056qT8pq1VLm9k3ZrbPzPaa2VdmVtpLYyIiElwsBT7nbNOsKNAXuNo5VxXIBHQCXgbedM6VBQ4CPbxeV1KGVScAk/FF5CLA58BnXhsUERFJBqFANjMLBbIDu4EGwBT//rFAK6+VJyU4ZnfOfeKci/J/PgWyem1QRESCR4hZsn/MrKeZrYjz6Rm3TefcTuA1YBu+oHgY3zDqIedclP+wHfjWyniS2LNV8/q/fm9mg/CN4zqgI/Cd1wZFREQS45wbBYw6234zywO0BEoBh/CNaDZNzj4ktiBnJb5geHoI+L44+xzweHJ2RERE0p9UWo/TCNjinNvn64NNBeoAuc0s1J89FgN2em0gsWerlvJaqYiIZAyptFp1G1DbzLIDEUBDfC/LmAu0wzfS2Q34ymsDSXqfo5lVBSoTZ67ROTfOa6MiIiJeOeeWmtkUYBUQBazGNww7HZhoZs/5yz7y2sY5g6OZPQXUwxccvwOaAYsABUcRkQwutR6F6px7CngqoHgzcE1y1J+U1art8KWs/zjn7gKuBC5NjsZFRETSoqQMq0Y452LMLMrMcgF7geIp3C8REUkHQtLIWzSSW1KC4wozyw18gG8F6zHg5xTtlYiIpAtBGhuT9GzV+/1fR5jZDCCXc+63lO2WiIhI6knsIQA1E9vnnFuVMl0SEZH0Iq28nDi5JZY5vp7IPofvGXYJ+uenYZ47JJJW5Gn+amp3QSRZRMwcmNpdSHcSewhA/YvZERERSX+ScstDehSs1yUiIuJZkp6QIyIikpCMOOcoIiKSqJDgjI3nHlY1nzvMbIh/u4SZJcvjeURERNKipMw5DgeuAzr7t48C76VYj0REJN0IseT/pAVJGVa91jlX08xWAzjnDppZ5hTul4iISKpJSnA8ZWaZ8N3biJkVAGJStFciIpIuZOQFOcOAaUBBM3se31s6nkzRXomIS
LqQVoZBk1tSnq063sxW4nttlQGtnHMbUrxnIiIiqSQpLzsuAYQD38Qtc85tS8mOiYhI2heko6pJGladjm++0YCsQCngd6BKCvZLREQk1SRlWLVa3G3/2zruP8vhIiKSgWTklx3H45xbZWbXpkRnREQkfQnWB3QnZc7x4TibIUBNYFeK9UhERCSVJSVzzBnnexS+OcgvUqY7IiKSngTpqGriwdF/839O59yAi9QfERGRVHfW4Ghmoc65KDOrczE7JCIi6UdGXJCzDN/84i9m9jXwOXD89E7n3NQU7puIiEiqSMqcY1ZgP9CA/+53dICCo4hIBhekiWOiwbGgf6XqWv4Liqe5FO2ViIikCxnx2aqZgBzED4qnKTiKiEjQSiw47nbODb1oPRERkXQnWBfkJPZwg+C8YhERkXNILHNseNF6ISIi6VKQJo5nD47OuQMXsyMiIpL+BOuCnGB9ZqyIiIhn5/1WDhERkdMsSJenKHMUEREJoMxRREQ8C9Y5RwVHERHxLFiDo4ZVRUREAihzFBERzyxIb3RU5igiIhJAmaOIiHimOUcREZEMQpmjiIh4FqRTjgqOIiLiXUZ8ZZWIiEiGpMxRREQ804IcERGRDEKZo4iIeBakU44KjiIi4l2IXlklIiKSMShzFBERz4J1WFWZo4iISABljiIi4lmw3sqh4CgiIp7pCTkiIiJphJnlNrMpZrbRzDaY2XVmltfMZpnZn/5/5vFav4KjiIh4Zpb8nyR6G5jhnKsIXAlsAAYBs122WZcAACAASURBVJ1z5YDZ/m1PFBxFRCRdMbNLgRuBjwCccyedc4eAlsBY/2FjgVZe29Cco4iIeJZKc46lgH3Ax2Z2JbAS6AcUcs7t9h/zD1DIawPKHEVEJE0xs55mtiLOp2fAIaFATeB951wN4DgBQ6jOOQc4r31Q5igiIp6lROLonBsFjErkkB3ADufcUv/2FHzBcY+ZFXbO7TazwsBer31Q5igiIp6FpMDnXJxz/wDbzayCv6ghsB74GujmL+sGfOX1upQ5iohIevQgMN7MMgObgbvwxdbJZtYD2Ap08Fq5gqOIiHhmqfQQAOfcL8DVCexqmBz1a1hVREQkgDJHERHxLDgfHqfgKCIiF0DPVhUREckglDmKiIhnwZk3KnMUERE5gzJHERHxLEinHBUcRUTEu9S6zzGlaVhVREQkgDJHERHxLFgzrGC9LhEREc+UOYqIiGeacxQREckglDmKiIhnwZk3KjiKiMgF0LCqiIhIBqHMUUREPAvWDCtYr0tERMQzZY4iIuJZsM45KjiKiIhnwRkaNawqIiJyBmWOIiLiWZCOqipzFBERCaTMUUREPAsJ0llHBUcREfFMw6oiIiIZhDJHERHxzIJ0WFWZo4iISABljiIi4lmwzjkqOIqIiGfBulpVw6oiIiIBlDmKiIhnwTqsqsxRREQkgDJHERHxTJmjiIhIBqHMMYlq16xCmbLlY7dfffNdihQtmuCxN113FfN/Xpks7fbq0ZXwiHDGTZgCwPp1axn2xiuM+GhcstR/2rdfTePa6+pQoGBBAJ575kluv6M7pcuUTdZ2JO3JmzMr373SEYBCeS4hJiaGfYcjAKj74CeciopJtrY2juvJ0YiTOAd7Dh7nnle+Y8/B4+dVx9w3b6f+QxMoUSgX11UuyqS5GwCoWa4QXRpX4ZHhc5Ktv3JuwfoQAAXHJMqSJSvjJ09LlbYPHjjAT4sWcP0NN6ZYG99+PY3SZcvFBscnn3ouxdqStOXA0RPU7j0WgMF3Xs/xiFO8NWV57P5MIUZ0jEu29poOnMT+IxE8c1ddHu187XkHs/oPTQDg8kKX0qF+pdjguOrPPaz6c0+y9VOSJiQ4Y6OCo1fh4ccZ0L8PR48cJioqil4P9OOm+g3jHfPvvr088djDHD92nOjoKB4b/BQ1al7Nkp8WM2rEO5w6eZKixUowZOjzZM9+yVnbuqPb3Xz84cgzgmN0dDTvvf0GK1cs49Spk7TreDtt2nUkJiaGV198lhXLl1Ko0GWEhoZya6u2NGx8Mx+OfI+F8+cRGXmCK66sweP/e4Y5P85kw/p1DHliIFmyZOWjcZ/R/4Ge9H34UTasX8vO7dvp+/BAwJdhbli/loGP/4/vp3/NpAmfcurUKapWu4JHnxhCpkyZkv+PLRfdqAHNOHEyiuplC/Lzul0cCY+MFzRXjOpOm/9NZdueI3RqWJkHWtYkLCwTyzfupt87s4hJQjBdtGY797e6iixhmRjWtzE1y19GVHQMj42cy4Jft1Pp8nyMeqQZYWGZCDGj89Av+WvXIfZ91Y8CLd/muR43UqFEPpa8343xs9byy6a99G9Xi3ZPTWXD2J5c23ssh49HArDm43to+NAEYpzjnb5NKF4wJwAD35/Lz+t3ptwfUtItzTkmUWTkCbp0aE2XDq0Z+FAfMmfOwitvvMMnE6fy/gdjefuNV3Au/g/CD99Pp/Z1NzB+8jTGT/6S8hUqcejgQUZ/+D7vjRzNJxOnUqlKFSZ8MibRtqtdUZ3QsDBWLF8ar/zraV9wSc4cjJ3wOWPGf86XUz9n584dzJ09i927djJp6rc8/fzLrPnt19hz2nfqwtgJnzPxi2+IjDzBogXzaNj4ZipVrsLQF15l/ORpZM2aNfb4Bg2bMG/uj7Hbs2Z+T+Omzdmy+S9m/fA9H44Zz/jJ0wgJCWHGd99cwF9Y0pqi+XNSr/8EHhs596zHVCiel3Y3VaD+QxOo3Xss0TExdGpQOUn1N7+2DOu27KPXbTVwDmrdN4ZuL37LhwObkyUsE/feUp33vlxJ7d5jqdNnHDv/PRbv/Cc/WsDiNTuo3Xss70z9bxrDOfj2503cVqccALUqFmbbniPsPRTOa70b8M7UFdzw4Kd0HvoVwx++2cNfRuKyFPhfWqDMMYkCh1WjTp3i/XfeZPWqFZiFsG/vHvbv/5f8+QvEHlOpSlWee/pJoqKiqFe/IeUrVmLhymVs2fwX93Tr4qsn6hRVr7jynO3ffW8vRn8wgj79HoktW7pkMX/+8TtzZs0E4Nixo2zfupVfV6+kYeOmhISEkD9/Aa6qdU3sOSuXL+OTMR9x4kQERw4fpnSZctS9qf5Z282TNy9FixZjzW+/ULzE5fy9ZTNXVq/J55MmsHHDOrp16QD4/uMhT958SfxrSnowdeHv58wA69e4nJrlLmPRu3cCkC1zKPsOhSd6zoxXOxId41i7eR9Pj1nEqAHNGP7VKgD+2H6AbXuOUK5YXpZu2MWjnWtTNH9Ovlz0B3/tOpTkvk+Zv5HHu1zPJzPX0r5eRabM3+jrb83LqXh5/tjjcmXPzCVZwzh+4lSS65aMQcHRoxnffcvBgwcYN2EKoWFhtGzWkJORJ+MdU/OqWoz86BMWL5zHM0Oe4PY7u5Er16VcW/t6nnvp9fNqr9Y1tRnx7tusXfNfFuicY8CgJ7nu+hviHfvTovkJ1hEZGckrLwxl7ITPKXRZYUa9/y6RkZHnbLtx0+b8OHMGJUuWpl6DRpgZzjluubUVD/R9+LyuQ9KP8DgBIyo6hpA4a/az
hvl+Oszg01lrGTJ6YZLrPT3neC6T5m5g2cbdNLu2NF8+344+b89k/i/bktTGkvW7KFMkN/kvzcat15flpfE/AxBixk19PyXyVHSS+yuJ060cEs+xY0fJkzdf7HDn7t27zjhm966d5M2Xj1ZtO9CyTTt+37CeqtWu5NdfVrN921YAIiLC2bp1S5LavPveXnwy5qPY7drX3cAXkycSdcr3I7Z16xYiIsK5onpN5syeSUxMDPv3/8uqFb55opP+QHhp7jyEhx9nzo8/xNaV/ZJLCA9PeNVgvQaNWDBvDjNnTKfJzc0BX7CeM+sHDhzYD8Dhw4fYvUtzN8Fq654jVC/nW6xVvWxBSl52KQBzV2+jdd0KFMidHYA8ObNSomCu86p78doddGpQCYCyRfNQvGBO/thxgJKXXcqW3YcY/uUqvv1pE9VKFYh33rGIk+TMnvms9X7905+8fF99Nm47wIGjJwCYvfJv7m9VM/aYK0oXPK++ypk0rCrxNG1+Kw/3603ndrdRqXJVSpYqfcYxK1cs59OxHxEaGka27Nl5+rmXyJM3L0OGvsCTgwZw6pQv0+z1QD8uv7zUOdusU/cm8uTJE7vdsk07du/ayZ2d2+KcI0+evLz65rs0aNSE5cuW0LFNCwoVuowKlSqRI0cOcubKRcs27ejc7jby5ctP5SrVYutqcVtrXnru6dgFOXHlynUpJUuVZsvmv6hS7QoASpcpS68+/Xiw1z04F0NoaCgDH/8fhYskfHuLpG9fLvyDLo2qsHLUXSzfuJs/dx4EYOO2/TwzZiHfvNieEDNORUfz0Ds/sm3vkSTXPfLr1Qzr25jlI7sTFR3Dva99z8lT0bS7qQKdG1bhVHQMew4c55WJS+Kdt2bzPqJjYlj6fjc+9S/IiWvKvI0sfq8r97z6XWzZI8Pn8FafRiwb0Z3QTMaiNTvoO2zWBfxlJFhZ4CKS5HA4IhnXfYsn4eHHyZ79Eg4dOshdd3TkgzHj482Hyrld1vL8hr5F0qqImQNTLB1b8MeBZP+9v7F83lRPH5U5BqmHH+zN0aNHiYo6RY97eyswioicBwXHNGLgQ33YtTP+nF2f/o+csdgmqZL7CToi52PBsC5kDov/89Lj5ems+/vfVOqRpJS0MkeY3BQc04hX33w3tbsgkmxu7Ds+tbsgF0mwrlZVcEwHnn1qMIsWzCNP3rxM/MJ3o/2I995mwbw5mIWQN29ehgx9MfbRbyJpyYiHm9Ksdmn2HQrn6p5jABjSrQ4tritHjHPsOxROz1e/Y/eB/1ZLX1X+Mua93YWuL3zDtIV/pFLPJSPTrRzpwC23teLt4aPild3RrQcTPv+K8ZOnccON9fhw1PBU6p1I4j6ZtZaWT0yJV/bm58u5ptcYavcey/dL/+LxO66P3RcSYjx3z438uPLvi9xT8cJS4JMWKDimAzWvqkWuXLnjleXIkSP2e0RERNAObUj6t3jNjtj7DE87Gv7fAzOyZw0j7qL5+1vW5MuFf57zSTsiKUnDqunY8Hfe4rtvvyJHjhy8/8HY1O6OyHl5uvsNdGlchcPHI2k6cBIARfLl4LY65bh54ERGVmiWyj2UpAgJ0v8yV+aYjt3/YH++/WEuTZvfyucTtQBC0penxyyiXJeRTJyzgV63+Z5a82rvBjz54XxS4PZrkfOi4BgEmjZvwZzZM1O7GyKeTJq9nlZ1fW/QqFm+EOOeuJWN43rSum553nqwEbderxdup2XBOueoYdV0atvWvylxeUkA5s+bk+Dj60TSqjJFcse+ZaPF9WX5Y/sBACp1/SD2mFEDmvH90r/45qdNqdJHSaK0Es2SmYJjOvDkoEdYuWIZhw4dokWTetzbuw8/LVrA1r+3EBISwmWFizBo8NOp3U2RBI19vAV1ryhO/kuzsWl8L579ZDFNa5WmXPE8xMTAtr2H6fu2nm8qaYuerSpyFnq2qgSLlHy26tK/Dif77/21ZS5NUn/NLBOwAtjpnGthZqWAiUA+YCVwp3PuZGJ1nI3mHEVEJL3qB2yIs/0y8KZzrixwEOjhtWIFRxER8cws+T9Ja9eKAbcAH/q3DWgAnH7ixFigldfrUnBMA6Kjo7mjYxseerDXGfu++HwindvdRpcOrbm3exc2/+VbnLBuzW906dCaLh1ac3uHVsyd45uzOXjgAPd270Kntrcyb86PsfUM6P8A+/buPaN+kQsVEmL8PLwrXwxtE6/89fsbsO+rfmc9r2qpAsx7q4vvHZEju5MlLBM5soWx5P1usZ/tnz/Aq73qA9C7ZQ1WjOrOtOfaEhbq++m6vkpRXvHvl9SREqtVzaynma2I8+mZQNNvAY8CMf7tfMAh51yUf3sH4PkFs1qQkwZMnPAJJUuV5vjxY2fsu7lZC9q27wTAgnlzeOv1lxk2/APKlC3H2AmfExoayr/79tKlQ2vq3lifmTOm06Z9R+o3aEz/PvdRr0EjFs6fS/kKlfTsVUkRfVpfxe/b9pMze5bYsprlCpE7R9aznpMpxBj92C30eGU6azbvI2/OrJyKjiHyVDS1e//3QIvF793Jl4v/BKBTg8rUum8Mj3auTeOrS/Hdkr8Y1OU6ur34bcpdnKQK59woYNTZ9ptZC2Cvc26lmdVLiT4oc0xle/b8w+KF82nZpl2C+898TJxvzCFrtmyEhvr+2yby5MnY8kyhoZyIOMHJUycJyZSJqKgoPhs/jq7dPQ+9i5xV0fw5aHpNaT6esSa2LCTEeOHeegz+cP5Zz2t0VUnWbtnHms37ADhw9AQxAev4yhbNQ8Hc2Vm8ZgfgG24LC81E9ixhnIqKpnPDysxcvoWDAY+mk4ssdW50rAPcZmZ/41uA0wB4G8htZqeTvmLAzoRPPzcFx1T25qsv8mD/AYTY2f9VfD5xPK1bNOGdt17jkUefiC1fu+ZXOrZpwe3tWvLYk08RGhpK02YtWDBvNn169aB7j558Mfkzmt9yG1mzZbsYlyMZzKu9GzD4w/nxAlvv22owfckm/onzlo1A5YrlxTnH1y+046f3uvJw+2vOOKZ9vYpMmfd77Pb7X61m/ttdKF4wFz+v20nXm6sy4uvVyXtBki445x53zhVzzpUEOgFznHNdgLnA6UyjG/CV1zYUHFPRwgVzyZMnL5UqV0n0uPadujDt25n06fcIoz8YEVtetdqVTJr6LWPGT2bsRx8QGRlJjpw5efPdkYybMIWKlSqzcP5cGjRuwvPP/I9BA/rx26/6MZHk0eza0uw9FM7qP/fElhXOewltbqzA8C9XJXpuaKYQrq9alLtemk7DhydwW51y1KteIt4x7etVZPK8/xYifjZ7PdfdP467X57Og22uZviXq7i5Vikm/O82XulVXw/fTyWWAv+7AI8BD5vZJnxzkB95rUjBMRX99stqFs6fS8tmDRk86BFWLF/KkCcePevxTZo2Z/682WeUlypdhmzZs/PXpj/jlX806n3uuqcXM7+fzpU1ruKpZ1/kgxHvJft1SMZ0XZWitKhdlo3jejLuiVupV70EKz+4m9JF8rBuzL1sHNeT7FnCWPvxPWecu/Pfoyxas4P
9RyKIiIxixvLN1ChXKHZ/tdIFCM0UEi/wnlY47yVcXaEw3/y0iX7tanHH899w6Fgk9WtcnqLXKwlLrdWqpznn5jnnWvi/b3bOXeOcK+uca++ci/R6XVqQk4oe6PswD/R9GICVy5fx6bjRDH3hlXjHxH1M3OKF8ylewvcDsHPnDgoVuozQ0FB279rJ1r83U6RI0Xjn7d2zh6tqXcOff2wkV5YsGEZkpOZnJHkMGb2QIaMXAlD3iuL0b1eLtkOmxjtm31f9qHrXh2ecO2vFFh5qfw3ZsoRy8lQ0dasV552pK2L3d6hXiclzNybcbvcbeHbcIgCyZQ7FOUeMc2TPEpZclyai4JgWjRw+jEqVq3JjvQZ8PnECy5b+RGhoGLly5eKpoS8C8OvqlYwd/QGhoWGEhBiPPj6E3HnyxNbx/rtv07uPbxl9k2a3MLB/H8aO/oD77u+bKtckckvtMtQsfxnPjlvMoWORDJu6gkXv3InD8cOyLcxYtjn22LY3VaDVk1+cUceVZXwrrn/Z5LstadLcDawYeRc79h3ljcnLLs6FSDzBOpqtx8eJnIUeHyfBIiUfH7fq7yPJ/ntfs2SuVI+5yhxFRMS7VA9jKUMLckRERAIocxQREc8u8NaLNEvBUUREPAvW+0s1rCoiIhJAmaOIiHgWpImjMkcREZFAyhxFRMS7IE0dFRxFRMSzYF2tqmFVERGRAMocRUTEM93KISIikkEocxQREc+CNHFUcBQRkQsQpNFRw6oiIiIBlDmKiIhnupVDREQkg1DmKCIinulWDhERkQxCmaOIiHgWpImjgqOIiFyAII2OGlYVEREJoMxRREQ8060cIiIiGYQyRxER8SxYb+VQcBQREc+CNDZqWFVERCSQMkcREfEuSFNHZY4iIiIBlDmKiIhnwXorh4KjiIh4FqyrVTWsKiIiEkCZo4iIeBakiaMyRxERkUDKHEVExLsgTR2VOYqIiARQ5igiIp7pVg4REZEAupVDREQkg1DmKCIingVp4qjMUUREJJAyRxER8S5IU0cFRxER8SxYV6tqWFVERCSAMkcREfFMt3KIiIhkEMocRUTEsyBNHBUcRUTEOw2rioiIZBDKHEVE5AIEZ+qozFFERCSAgqOIiHhmlvyfc7dpxc1srpmtN7N1ZtbPX57XzGaZ2Z/+f+bxel0KjiIikt5EAY845yoDtYEHzKwyMAiY7ZwrB8z2b3ui4CgiIp5ZCnzOxTm32zm3yv/9KLABKAq0BMb6DxsLtPJ6XVqQIyIinqX2rRxmVhKoASwFCjnndvt3/QMU8lqvMkcREUlTzKynma2I8+l5luNyAF8A/Z1zR+Luc845wHntgzJHERHxLCXeyuGcGwWMSrRdszB8gXG8c26qv3iPmRV2zu02s8LAXq99UOYoIiLpipkZ8BGwwTn3RpxdXwPd/N+7AV95bUOZo4iIeJc6c451gDuBNWb2i7/sCeAlYLKZ9QC2Ah28NqDgKCIinqVGbHTOLUqk6YbJ0YaGVUVERAIocxQREc9S+1aOlKLMUUREJIAyRxER8SwlbuVICxQcRUTEu+CMjRpWFRERCaTMUUREPAvSxFGZo4iISCBljiIi4plu5RAREckglDmKiIhnupVDREQkgIZVRUREMggFRxERkQAKjiIiIgE05ygiIp4F65yjgqOIiHgWrKtVNawqIiISQJmjiIh4FqzDqsocRUREAihzFBERz4I0cVRwFBGRCxCk0VHDqiIiIgGUOYqIiGe6lUNERCSDUOYoIiKe6VYOERGRDEKZo4iIeBakiaOCo4iIXIAgjY4aVhUREQmgzFFERDzTrRwiIiIZhDJHERHxLFhv5TDnXGr3QUREJE3RsKqIiEgABUcREZEACo4iIiIBFBwlTTGzaDP7xczWmtnnZpb9AuoaY2bt/N8/NLPKiRxbz8yu99DG32aWP6nlAcccO8+2njazAefbRxE5fwqOktZEOOeqO+eqAieBXnF3mpmnFdbOuXucc+sTOaQecN7BUUSCk4KjpGULgbL+rG6hmX0NrDezTGb2qpktN7PfzOw+APN518x+N7MfgYKnKzKzeWZ2tf97UzNbZWa/mtlsMyuJLwg/5M9a65pZATP7wt/GcjOr4z83n5nNNLN1ZvYhSXh4lpl9aWYr/ef0DNj3pr98tpkV8JeVMbMZ/nMWmlnF5PhjikjS6T5HSZP8GWIzYIa/qCZQ1Tm3xR9gDjvnaplZFmCxmc0EagAVgMpAIWA9MDqg3gLAB8CN/rryOucOmNkI4Jhz7jX/cROAN51zi8ysBPADUAl4CljknBtqZrcAPZJwOXf728gGLDezL5xz+4FLgBXOuYfMbIi/7j7AKKCXc+5PM7sWGA408PBnFBGPFBwlrclmZr/4vy8EPsI33LnMObfFX94EuOL0fCJwKVAOuBH4zDkXDewyszkJ1F8bWHC6LufcgbP0oxFQ2f67wzmXmeXwt9HGf+50MzuYhGvqa2at/d+L+/u6H4gBJvnLPwWm+tu4Hvg8TttZktCGiCQjBUdJayKcc9XjFviDxPG4RcCDzrkfAo5rnoz9CAFqO+dOJNCXJDOzevgC7XXOuXAzmwdkPcvhzt/uocC/gYhcXJpzlPToB6C3mYUBmFl5M7sEWAB09M9JFgbqJ3DuEuBGMyvlPzevv/wokDPOcTOBB09vmNnpYLUAuN1f1gzIc46+Xgoc9AfGivgy19NCgNPZ7+34hmuPAFvMrL2/DTOzK8/RhogkMwVHSY8+xDefuMrM1gIj8Y2CTAP+9O8bB/wceKJzbh/QE98Q5q/8N6z5DdD69IIcoC9wtX/Bz3r+WzX7DL7gug7f8Oq2c/R1BhBqZhuAl/AF59OOA9f4r6EBMNRf3gXo4e/fOqBlEv4mIpKM9GxVERGRAMocRUREAig4ioiIBFBwFBERCaDgKCIiEkDBUUREJICCo4iISAAFRxERkQAKjiIiIgEUHEVERAIoOIqIiARQcBQREQmg4CgiIhJAwVFERCSAgqOIiEgABUdJdWbWysyc/2XA6Z6ZXWVma8xsk5kNMzNL4Jg8ZjbN/77IZWZW1V+e1b/9q5mtM7Nn4pxTysyW+uudZGaZL+Z1iWQkCo6SFnQGFvn/mSLMLFNK1Z2A94F7gXL+T9MEjnkC+MU5dwXQFXjbXx4JNHDOXQlUB5qaWW3/vpeBN51zZYGDQI+UuwSRjE3BUVKVmeUAbsD3Q9/JX5bJzF4zs7X+zOpBf3ktM/vJn1UtM7OcZtbdzN6NU9+3ZlbP//2Ymb1uZr8C15nZEDNb7q931OmMzszKmtmP/npXmVkZMxtnZq3i1DvezFom4XoKA7mcc0uc703i44BWCRxaGZgD4JzbCJQ0s0LO55j/mDD/x/n72gCY4t839iz1ikgyCE3tDkiG1xKY4Zz7w8z2m9lVwDVASaC6cy7KzPL6hxAnAR2dc8vNLBcQcY66LwGWOuceATCz9c65of7vnwAtgG+A8cBLzrlpZpYV3380fgQ8BHxpZpcC1wPdzKyCvx8JqQcUBXbEKdvhLwv0K9AGWGhm1wCXA8WAPf
4sdyVQFnjPObfUzPIDh5xzUeeoV0SSgYKjpLbO/DekONG/XQoYcToQOOcOmFk1YLdzbrm/7AhAAtN5cUUDX8TZrm9mjwLZgbzAOjObBxR1zk3z13vCf+x8MxtuZgWAtsAX/v78jm+4M0Hn6E9cLwFvm9kvwBpgtb+/OOeigepmlhuY5p+P/CepFYvIhVNwlFRjZnnxDRVWMzMHZAIcsPw8qoki/vRA1jjfT/gDDf6McDhwtXNuu5k9HXBsQsYBd+Ab7r3LX8+5Msed+DLA04r5y+LxB/fTdRqwBdgccMwhM5uLb87ydSC3mYX6g3SC9YpI8tCco6SmdsAnzrnLnXMlnXPF8QWJX4H7zCwUYoPo70BhM6vlL8vp3/83viwrxMyK4xuSTcjpQPivf56zHYBz7iiw4/T8opllMbPs/mPHAP39x633//N351z1s3wOOed2A0fMrLY/6HUFvgrsjJnljrPa9B5ggXPuiJkV8GeMmFk2oDGw0T9/Ofd0v4H/t3fvwVaVdRjHv8/gDQQVMZHMomyKChUVsZzIG94VZSZTvKQpJpqJEkVNM2rOOHmpydHGtLyXMkZKomMiGiKheEUuwqgYWs6YOiIoFzXs1x/vb+Nmuc+Nc+Acxuczs2fv/a71rr32njnzO++6PO8pjbZrZh3DxdE600hgUqXtTqAf8C9gbl5Mc0JEfAAcB1ydbVMpBW8mpaAuAK4Cnmn0QRGxFPgDMB+Ywtqj05OBcyXNBR4Fdsg+rwMLgZva+L3OBq4HFgEvAX8DkDRa0uhc5yvAfEnPA4cBY7K9HzAt9+VJYGpE3JvLxgNjJS0C+lDOi5rZeqDyD6mZVeUIch6wR0Qs6+z9MbMNxyNHswYkDaOMGq92YTT75PHI0czMrMIjRzMzswoXR+tUkj6U9Gym1kysu1K0Pdu8OA+LNrV8tKTvtvdzmtn+Omer1i3vJmm2pHsb9L1K0vJqu5l1HBdH62yr8jaIgcAHwOj6hbXbOdoiIi6IiAebWX5tRNza9l1ttfZkq9aMoZzzXIukwUDvSndhWQAACFFJREFUDt1bM/sYF0frSmYAX5S0n6QZkiYDC3IUdUXmos6VdGatg6TxOUqbI+nSbLtZ0rfz9aWSFmS/X2XbRZLG5etBkmbl8kmSemf7w5Iuy1HdC5KGtuYLtDdbNbfxGeAIyu0g9dvuBlwB/KR1P6eZrSsn5FiXkCPEw4D7s2kPYGBELJb0fWBZROwlaXNgpqQHgAGUbNa9I2JlhgXUb7MPMAIYEBFRu7m+4lbghxExXdLFwIXkjf/AJhExRNLh2T5sQ2SrAldSCmCvSp9zgMkR8VobYurMbB24OFpn6575olBGjjdQQr6fiIjF2X4wsGttNAhsTTlcOQy4KSJWQslgrWx7GfAecEOeu1vr/J1KoPg2ETE9m24BJtatclc+P00JQici1mu2qqQjgTci4mnl7CK53U8Dx1IKsJmtZy6O1tlWRcRaxSYLzIr6JsrobkplvUOa23DO6DEEOJASu3YOJcu1td7P5w/Jv5UNkK16HDA8R6tbAFtJ+hMwgTJLx6L8fXpIWpRzO5pZB3NxtI3BFOAsSX+PiP9K+hKl4EwFLpB0W+2wav3oMTNUe0TEfZJm8vFg72WS3pY0NCJmUGLkptOMlkaOwFJJ76hMUPw45WKbq6sr5SHelRmLtyZbFfhZPsiR47iIOCm77VDXf7kLo9n64+JoG4PrKYc1n8lR1pvAMRFxv6RBwFOSPgDuo1wFWtMLuFtlRg4BYxts+xTg2ryF5J/kaK6dzqaElnen5KquyVaFcrUsJVv1FpXZSJ6jTPZsZl2EE3LMzMwqfCuHmZlZhYujmZlZhYujdVmVaLl7mrhPsT3bf1nSdvm61XFskj4v6fGMh7tDH01aXL/OZpJuqgso2K9u2SWS/l39TElj6wILHpL0uXZ8PTNrBxdH68rqo+WWAD/o7B1KlwG/yatF36bxxTRnAETELsBBwK8l1f7e7gGGNOgzGxickXJ/AS7v6B03s9ZxcbSNxWNk0oyknSXdL+npjJkbkO19MwJuTj72yfa/5rrPZdrOOsurZQ+gFC8owQEtxcO9ASwFBuf7WRHxWrVDREyrBRoAs1j7fkkz24B8K4d1eZkpeiAlPQfg98DoiHhR0t7ANZSCdRUwPSJGZJ+euf5pEbFEUnfgSUl3RsRbTXxWL0pSTyMnAG8ASyNidbY1Fw83XNIEYCdgz3x+opVf+3TyFhAz2/BcHK0rq0XL7UiZoWJq3ti/DzCxLqpt83w+gHLTPRHxISU+DuBcSSPy9U6U6LmGxTEi3qX5eLjtWrnvN1LuZXwKeAV4lJK00yJJJ1FGmfu28rPMrIO5OFpXtioiBuUN+lMo5xxvpozcmkupWSMvhBkGfCNTdB6mxLI1tX5LI8eFwDaSNsnRY1PxcKuB8+u2+yjwQiv2dxjwc2DfiHi/pfXNbP3wOUfr8vI83LnAj4CVwGJJx0I5Byhpt1z1IeCsbO+WweJbA29nYRwAfL2Fz3o3LwJq9FiQ01BNo2S1QknYubu6HUk9JG2Zrw8CVkfEguY+W9LuwHXA8DxPaWadxMXRNgoRMRuYC4wETgROlzSHEr12dK42Bthf0jzKTBpfpUyBtYmkhZSZMGZ1wO6MB8ZKWgT0Ic+FShquMu0VwPaUuLuFuf7Jtc6SLpf0KiU8/FVJF+WiKyjnSSfmLSyTO2BfzWwdOD7OzMyswiNHMzOzChdHMzOzChdHMzOzChdH63R1Gaq1R39JfSRNk7Rc0m+b6XukpNmZiLNA0pkbct8b7M+2kqZKejGfezex3mWZGTtf0nF17bdJej7bb5S0abb/uO73mZ+/2bYb6nuZfdL4ghzrdCqz2vestG0J7A4MBAZGxDkN+m1KucF+SES8KmlzoH9EPN+OfRHl7+J/69j/cmBJRFwq6adA74gYX1nnCOA84DBKgMHDwIER8Y6kw/koGed24JGI+F2l/1HA+RFxwLrso5m1zCNH65IiYkVE/AN4r5nVelGCLN7KPu/XCmMzOatj60Zs52Vb/xyt3QrMB3bKkdqTOUPGL9qw60dT8lah+dzVRyJidUSsoNyicmh+h/siUaLmGuWrjgQmtGGfzKyNXBytK+hed8hwUms7RcQSYDLwiqQJkk7URzNf1HJWdwP2AJ6TtCfwPWBvShjAGXnjPZRIuWsi4mvAl/P9EEqU3J6SvgWgEnT+bIPHsNxO37pQ8f8AfRvs+hzg0AwK2A7YnxJrt0aOik+m3KdZ396DUkjvbO3vZGZt5/g46wpWtTYOrioiRknahRIRN44yPdSpNMhZlfRNYFKO1pB0FzCULLARUQsIODgfs/N9T0qxfCQihrZh30LSx85bRMQDkvai5K2+SZlxpJq7ek1+XjXK7ihgZv5jYGbriYujbfQiYh4wT9IfgcWU4thWK+peC/hlRFxXXUnSDMrh3KpxEfEg8LqkfhHxmqR+lFk8Gu3zJcAluc3bqctdlXQh8Cmg0cVFx+NDqmbrnQ+r2kZLUs8MFq8ZRLlABxrnrM4AjqnLPR1B4
5DxKcBpKjOAIGlHSdsDRMTQJnJXH8y+kyl5q9B07mo3SX3y9a7ArsAD+X4UcAgwsnpRUH6HfRtt08w6lq9WtU7X6GrVbH8Z2ArYjDJZ8MH14d0qM2jcAewMrKKM/sZExFOS+lLmffwC5ZDlWRHxmKSxwGm5iesj4kpJ/YF7I2Jg3bbHAKPy7XLgpIh4qRXfpQ/wZ+CzlEL9nZxLcjBlDspRkrYAnsku72T7s9l/dfZ7N5ffFREX57JTgUMj4viW9sPM2sfF0czMrMKHVc3MzCpcHM3MzCpcHM3MzCpcHM3MzCpcHM3MzCpcHM3MzCpcHM3MzCpcHM3MzCr+D7LtdYZ12h17AAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [], + "needs_background": "light" + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "p8D975NqsGtj" + }, + "source": [ + "## Parameter tunning\n", + "### Referência\n", + "* [Hyperparameter Tuning the Random Forest in Python](https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74)\n", + "* [Decision Tree Adventures 2 — Explanation of Decision Tree Classifier Parameters](https://medium.com/datadriveninvestor/decision-tree-adventures-2-explanation-of-decision-tree-classifier-parameters-84776f39a28) - Explica didaticamente e step by step como fazer parameter tunning." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Bfdq5zEhlVsk" + }, + "source": [ + "# Dicionário de parâmetros para o parameter tunning. Ao todo serão ajustados 2X13X5X5X7= 4.550 modelos. Contando com 10 folds no Cross-Validation, então são 45.500 modelos.\n", + "d_parametros_DT= {\"criterion\": [\"gini\", \"entropy\"]} #, \"min_samples_split\": [2, 5, 10, 30, 50, 70, 90, 120, 150, 180, 210, 240, 270, 350, 400], \"max_depth\": [None, 2, 5, 9, 15], \"min_samples_leaf\": [20, 40, 60, 80, 100], \"max_leaf_nodes\": [None, 2, 3, 4, 5, 10, 15]}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "H8gNSs0G0A-L" + }, + "source": [ + "```\n", + "grid_search = GridSearchCV(ml_DT, param_grid= d_parametros_DT, cv = i_CV, n_jobs= -1)\n", + "start = time()\n", + "grid_search.fit(X_train, y_train)\n", + "tempo_elapsed= time()-start\n", + "print(f\"\\nGridSearchCV levou {tempo_elapsed:.2f} segundos para estimar {len(grid_search.cv_results_)} modelos candidatos\")\n", + "\n", + "GridSearchCV levou 1999.12 segundos para estimar 23 modelos candidatos\n", + "```\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ap3WMXqDthu9" + }, + "source": [ + "# Definindo a função para o GridSearchCV\n", + "def GridSearchOptimizer(modelo, ml_Opt, d_Parametros, X_train, y_train, X_test, y_test, cv = i_CV):\n", + " ml_GridSearchCV = GridSearchCV(modelo, d_Parametros, cv = i_CV, n_jobs= -1, verbose= 10, scoring= 'accuracy')\n", + " start = time()\n", + " ml_GridSearchCV.fit(X_train, y_train)\n", + " tempo_elapsed= time()-start\n", + " #print(f\"\\nGridSearchCV levou {tempo_elapsed:.2f} segundos.\")\n", + "\n", + " # Parâmetros que otimizam a classificação:\n", + " print(f'\\nParametros otimizados: {ml_GridSearchCV.best_params_}')\n", + " \n", + " if ml_Opt == 'ml_DT2':\n", + " print(f'\\nDecisionTreeClassifier *********************************************************************************************************')\n", + " ml_Opt = DecisionTreeClassifier(criterion= ml_GridSearchCV.best_params_['criterion'], \n", + " max_depth= ml_GridSearchCV.best_params_['max_depth'],\n", + " max_leaf_nodes= ml_GridSearchCV.best_params_['max_leaf_nodes'],\n", + " min_samples_split= ml_GridSearchCV.best_params_['min_samples_leaf'],\n", + " min_samples_leaf= ml_GridSearchCV.best_params_['min_samples_split'], \n", + " random_state= i_Seed)\n", + " \n", + " elif ml_Opt == 'ml_RF2':\n", + " print(f'\\nRandomForestClassifier *********************************************************************************************************')\n", + " ml_Opt = RandomForestClassifier(bootstrap= ml_GridSearchCV.best_params_['bootstrap'], \n", + " max_depth= ml_GridSearchCV.best_params_['max_depth'],\n", + " max_features= ml_GridSearchCV.best_params_['max_features'],\n", + " min_samples_leaf= 
ml_GridSearchCV.best_params_['min_samples_leaf'],\n", + " min_samples_split= ml_GridSearchCV.best_params_['min_samples_split'],\n", + " n_estimators= ml_GridSearchCV.best_params_['n_estimators'],\n", + " random_state= i_Seed)\n", + " \n", + " elif ml_Opt == 'ml_AB2':\n", + " print(f'\\nAdaBoostClassifier *********************************************************************************************************')\n", + " ml_Opt = AdaBoostClassifier(algorithm='SAMME.R', \n", + " base_estimator=RandomForestClassifier(bootstrap = False, \n", + " max_depth = 10, \n", + " max_features = 'auto', \n", + " min_samples_leaf = 1, \n", + " min_samples_split = 2, \n", + " n_estimators = 400), \n", + " learning_rate = ml_GridSearchCV.best_params_['learning_rate'], \n", + " n_estimators = ml_GridSearchCV.best_params_['n_estimators'], \n", + " random_state = i_Seed)\n", + " \n", + " elif ml_Opt == 'ml_GB2':\n", + " print(f'\\nGradientBoostingClassifier *********************************************************************************************************')\n", + " ml_Opt = GradientBoostingClassifier(learning_rate = ml_GridSearchCV.best_params_['learning_rate'], \n", + " n_estimators = ml_GridSearchCV.best_params_['n_estimators'], \n", + " max_depth = ml_GridSearchCV.best_params_['max_depth'], \n", + " min_samples_split = ml_GridSearchCV.best_params_['min_samples_split'], \n", + " min_samples_leaf = ml_GridSearchCV.best_params_['min_samples_leaf'], \n", + " max_features = ml_GridSearchCV.best_params_['max_features'])\n", + " \n", + " elif ml_Opt == 'ml_XGB2':\n", + " print(f'\\nXGBoostingClassifier *********************************************************************************************************')\n", + " ml_Opt = XGBoostingClassifier(learning_rate= ml_GridSearchCV.best_params_['learning_rate'], \n", + " max_depth= ml_GridSearchCV.best_params_['max_depth'], \n", + " colsample_bytree= ml_GridSearchCV.best_params_['colsample_bytree'], \n", + " subsample= ml_GridSearchCV.best_params_['subsample'], \n", + " gamma= ml_GridSearchCV.best_params_['gamma'], \n", + " min_child_weight= ml_GridSearchCV.best_params_['min_child_weight'])\n", + " \n", + " # Treina novamente usando os parametros otimizados...\n", + " ml_Opt.fit(X_train, y_train)\n", + "\n", + " # Cross-Validation com 10 folds\n", + " print(f'\\n********* CROSS-VALIDATION ***********')\n", + " a_scores_CV = cross_val_score(ml_Opt, X_train, y_train, cv = i_CV)\n", + " print(f'Média das Acurácias calculadas pelo CV....: {100*round(a_scores_CV.mean(),4)}')\n", + " print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')\n", + "\n", + " # Faz predições com os parametros otimizados...\n", + " y_pred = ml_Opt.predict(X_test)\n", + " \n", + " # Importância das COLUNAS\n", + " print(f'\\n********* IMPORTÂNCIA DAS COLUNAS ***********')\n", + " df_importancia_variaveis = pd.DataFrame(zip(l_colunas, ml_Opt.feature_importances_), columns= ['coluna', 'importancia'])\n", + " df_importancia_variaveis = df_importancia_variaveis.sort_values(by= ['importancia'], ascending=False)\n", + " print(df_importancia_variaveis)\n", + "\n", + " # Matriz de Confusão\n", + " print(f'\\n********* CONFUSION MATRIX - PARAMETER TUNNING ***********')\n", + " cf_matrix = confusion_matrix(y_test, y_pred)\n", + " cf_labels = ['True_Negative', 'False_Positive', 'False_Negative', 'True_Positive']\n", + " cf_categories = ['Zero', 'One']\n", + " mostra_confusion_matrix(cf_matrix, group_names = cf_labels, categories = cf_categories)\n", + "\n", + " return ml_Opt, 
ml_GridSearchCV.best_params_"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "44-BRnNjBT25"
+      },
+      "source": [
+        "# Invoca a função\n",
+        "ml_DT2, best_params = GridSearchOptimizer(ml_DT, 'ml_DT2', d_parametros_DT, X_train, y_train, X_test, y_test, cv = i_CV)"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "gmCkjGjPJMLr"
+      },
+      "source": [
+        "### Visualizar o resultado"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "cIc3ZgaISEd0"
+      },
+      "source": [
+        "from io import StringIO  # sklearn.externals.six foi removido do scikit-learn; usar io.StringIO\n",
+        "from sklearn.tree import export_graphviz\n",
+        "from IPython.display import Image\n",
+        "import pydotplus\n",
+        "\n",
+        "dot_data = StringIO()\n",
+        "export_graphviz(ml_DT2, out_file = dot_data, filled = True, rounded = True, special_characters = True, feature_names = l_colunas, class_names = ['0','1'])\n",
+        "\n",
+        "graph = pydotplus.graph_from_dot_data(dot_data.getvalue())\n",
+        "graph.write_png('DecisionTree.png')\n",
+        "Image(graph.create_png())"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "e1R2GBkbnV37"
+      },
+      "source": [
+        "## Selecionar as COLUNAS importantes/relevantes"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "vv7GKBvs6Ybf"
+      },
+      "source": [
+        "# Função desenvolvida para selecionar as COLUNAS relevantes\n",
+        "from sklearn.feature_selection import SelectFromModel\n",
+        "\n",
+        "def seleciona_colunas_relevantes(modelo, X_train, X_test, threshold = 0.05):\n",
+        "    # Cria um seletor para selecionar as COLUNAS com importância > threshold\n",
+        "    sfm = SelectFromModel(modelo, threshold = threshold)\n",
+        "\n",
+        "    # Treina o seletor (y_train vem do escopo global do notebook)\n",
+        "    sfm.fit(X_train, y_train)\n",
+        "\n",
+        "    # Mostra o índice das COLUNAS mais importantes\n",
+        "    print(f'\\n********** COLUNAS Relevantes ******')\n",
+        "    print(sfm.get_support(indices=True))\n",
+        "\n",
+        "    # Seleciona somente as COLUNAS relevantes\n",
+        "    X_train_I = sfm.transform(X_train)\n",
+        "    X_test_I = sfm.transform(X_test)\n",
+        "    return X_train_I, X_test_I"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "ukMLoEr7nbUf"
+      },
+      "source": [
+        "X_train_DT, X_test_DT = seleciona_colunas_relevantes(ml_DT2, X_train, X_test)"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "8JjePRQAoqkk"
+      },
+      "source": [
+        "## Treina o classificador com as COLUNAS relevantes"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "Gt3aCPpfKRxm"
+      },
+      "source": [
+        "best_params"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "zq6uCVtzovMt"
+      },
+      "source": [
+        "# Treina usando as COLUNAS relevantes...\n",
+        "ml_DT2.fit(X_train_DT, y_train)\n",
+        "\n",
+        "# Cross-Validation com 10 folds\n",
+        "a_scores_CV = cross_val_score(ml_DT2, X_train_DT, y_train, cv = i_CV)\n",
+        "print(f'Média das Acurácias calculadas pelo CV....: {100*round(a_scores_CV.mean(),4)}')\n",
+        "print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "Tc7esxqtq-Og"
+      },
+      "source": [
+        "# ****************************************************************"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "znWy3LE1q-Z3"
+      },
+      "source": [
+        "# Reexecuta o GridSearchOptimizer, agora usando apenas as COLUNAS relevantes\n",
+        "# (X_train_DT / X_test_DT) selecionadas acima, para comparar com o modelo completo.\n",
+        "ml_DT3, best_params2 = GridSearchOptimizer(ml_DT2, 'ml_DT2', d_parametros_DT, X_train_DT, y_train, X_test_DT, y_test, cv = i_CV)"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "6IhCC6pfq-jL"
+      },
+      "source": [
+        "best_params"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "qw6Dk3kesT0q"
+      },
+      "source": [
+        "best_params2"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "SbS4ZKN8s-ee"
+      },
+      "source": [
+        "# Cross-Validation com 10 folds\n",
+        "a_scores_CV = cross_val_score(ml_DT3, X_train_DT, y_train, cv = i_CV)\n",
+        "print(f'Média das Acurácias calculadas pelo CV....: {100*round(a_scores_CV.mean(),4)}')\n",
+        "print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "_at3XP1Bq-qb"
+      },
+      "source": [
+        "# ***************************************************************"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "MZ1-vGRcxJoN"
+      },
+      "source": [
+        "## Valida o modelo usando o dataframe X_test"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "ig9GiUAEw9jr"
+      },
+      "source": [
+        "y_pred_DT = ml_DT2.predict(X_test_DT)"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "7UZz4UzHDqae"
+      },
+      "source": [
+        "# Calcula acurácia\n",
+        "accuracy_score(y_test, y_pred_DT)"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "K3EUMAxxKBur"
+      },
+      "source": [
+        "___\n",
+        "# **RANDOM FOREST**\n",
+        "* Random Forest é um ensemble de várias Decision Trees (estruturas em forma de árvore) treinadas sobre amostras diferentes dos dados.\n",
+        "* Random Forest pode ser utilizado tanto para classificação (RandomForestClassifier) quanto para Regressão (RandomForestRegressor).\n",
+        "\n",
+        "* **Vantagens**:\n",
+        "  * Não requer tanto data preprocessing;\n",
+        "  * Lida bem com COLUNAS categóricas e numéricas;\n",
+        "  * É um Bagging Ensemble Method: constrói muitas árvores independentes sobre amostras bootstrap e agrega as predições por voto, o que reduz a variância e tende a produzir melhores classificações;\n",
+        "  * Mais robusto que uma simples Decision Tree, justamente porque a agregação de muitas árvores pouco correlacionadas reduz a variância do modelo final;\n",
+        "  * Ajuda a controlar o overfitting (pelo mesmo motivo: o voto de muitas árvores suaviza o ajuste excessivo de cada árvore individual) e frequentemente produz modelos robustos e de alta performance;\n",
+        "  * Pode ser utilizado como Feature Selection, pois gera a importância dos atributos (feature importances). A soma das importâncias é 1 (ou seja, 100%);\n",
+        "  * Assim como as Decision Trees, esses modelos capturam facilmente padrões não-lineares presentes nos dados;\n",
+        "  * Não requer que os dados sejam normalizados;\n",
+        "  * Não requer suposições (assumptions) sobre a distribuição dos dados, por causa da natureza não-paramétrica do algoritmo.\n",
+        "\n",
+        "* **Desvantagens**\n",
+        "  * Tende a favorecer a classe majoritária em dataframes desbalanceados; **recomenda-se balancear o dataframe previamente para evitar esse problema**;\n",
+        "  * Menos interpretável e mais caro de treinar e de predizer do que uma única Decision Tree;\n",
+        "  * Na implementação do scikit-learn, Missing Values precisam ser tratados/imputados antes do treinamento.\n",
+        "\n",
+        "* **Principais parâmetros**: n_estimators (número de árvores), max_depth, max_features, min_samples_split, min_samples_leaf e bootstrap.\n",
+        "\n",
+        "## **Referências**:\n",
+        "* [Running Random Forests? 
Inspect the feature importances with this code](https://towardsdatascience.com/running-random-forests-inspect-the-feature-importances-with-this-code-2b00dd72b92e)\n", + "* [Feature importances with forests of trees](https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html)\n", + "* [Understanding Random Forests Classifiers in Python](https://www.datacamp.com/community/tutorials/random-forests-classifier-python)\n", + "* [Understanding Random Forest](https://towardsdatascience.com/understanding-random-forest-58381e0602d2)\n", + "* [An Implementation and Explanation of the Random Forest in Python](https://towardsdatascience.com/an-implementation-and-explanation-of-the-random-forest-in-python-77bf308a9b76)\n", + "* [Random Forest Simple Explanation](https://medium.com/@williamkoehrsen/random-forest-simple-explanation-377895a60d2d)\n", + "* [Random Forest Explained](https://www.youtube.com/watch?v=eM4uJ6XGnSM)\n", + "* [Hyperparameter Tuning the Random Forest in Python](https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74) - Explica os principais parâmetros do Random Forest." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "cnfDw_GEKBuu" + }, + "source": [ + "from sklearn.ensemble import RandomForestClassifier\n", + "\n", + "# Instancia...\n", + "ml_RF= RandomForestClassifier(n_estimators=100, min_samples_split= 2, max_features=\"auto\", random_state= i_Seed)\n", + "\n", + "# Treina...\n", + "ml_RF.fit(X_train, y_train)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "lYa9oaZW__o6" + }, + "source": [ + "# Cross-Validation com 10 folds\n", + "a_scores_CV = cross_val_score(ml_RF, X_train, y_train, cv = i_CV)\n", + "print(f'Média das Acurácias calculadas pelo CV....: {100*round(a_scores_CV.mean(),4)}')\n", + "print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AouWUu8vANdb" + }, + "source": [ + "**Interpretação**: Nosso classificador (RandomForestClassifier) tem uma acurácia média de 96,44% (base de treinamento). Além disso, o std é da ordem de 2,77%, ou seja, pequena. Vamos tentar melhorar a acurácia do classificador usando parameter tunning (GridSearchCV)." 
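+        ,
+        "\n",
+        "A título de ilustração do cálculo acima (esboço, assumindo que `a_scores_CV` é o array de acurácias do Cross-Validation recém-calculado), um intervalo aproximado de ~95% para a acurácia pode ser obtido com média ± 2 desvios-padrão:\n",
+        "\n",
+        "```\n",
+        "import numpy as np\n",
+        "\n",
+        "media  = np.mean(a_scores_CV)   # ex.: ~0.9644\n",
+        "desvio = np.std(a_scores_CV)    # ex.: ~0.0277\n",
+        "print(f'Acurácia esperada: {100*media:.2f}% ± {200*desvio:.2f} pontos percentuais')\n",
+        "```\n"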
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "vbducxlgAa85" + }, + "source": [ + "print(f'Acurácias: {a_scores_CV}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "_lxx-LUw_5sd" + }, + "source": [ + "# Faz predições...\n", + "y_pred = ml_RF.predict(X_test)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "pQIRO_LpGAkw" + }, + "source": [ + "# Confusion Matrix\n", + "cf_matrix = confusion_matrix(y_test, y_pred)\n", + "cf_labels = ['True_Negative','False_Positive','False_Negative','True_Positive']\n", + "cf_categories = ['Zero', 'One']\n", + "mostra_confusion_matrix(cf_matrix, group_names= cf_labels, categories= cf_categories)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yKLHZ5_C6FJ8" + }, + "source": [ + "## Parameter tunning\n", + "### Referência\n", + "* [Hyperparameter Tuning the Random Forest in Python](https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74)\n", + "* [Decision Tree Adventures 2 — Explanation of Decision Tree Classifier Parameters](https://medium.com/datadriveninvestor/decision-tree-adventures-2-explanation-of-decision-tree-classifier-parameters-84776f39a28) - Explica didaticamente e step by step como fazer parameter tunning.\n", + "* [Optimizing Hyperparameters in Random Forest Classification](https://towardsdatascience.com/optimizing-hyperparameters-in-random-forest-classification-ec7741f9d3f6) - Outro approach para entender parameter tunning. Recomendo fortemente a leitura! " + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "XOa9naju6FKA" + }, + "source": [ + "# Dicionário de parâmetros para o parameter tunning.\n", + "d_parametros_RF= {'bootstrap': [True, False]} #,\n", + "# 'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, None],\n", + "# 'max_features': ['auto', 'sqrt'],\n", + "# 'min_samples_leaf': [1, 2, 4],\n", + "# 'min_samples_split': [2, 5, 10],\n", + "# 'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "6__f2jZaTQat" + }, + "source": [ + "# Invoca a função\n", + "ml_RF2, best_params = GridSearchOptimizer(ml_RF, 'ml_RF2', d_parametros_RF, X_train, y_train, X_test, y_test, cv = i_CV)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "crfn-n--KG4n" + }, + "source": [ + "### Resultado da execução do Random Forest\n", + "\n", + "```\n", + "[Parallel(n_jobs=-1)]: Done 7920 out of 7920 | elapsed: 194.0min finished\n", + "best_params= {'bootstrap': False, 'max_depth': 10, 'max_features': 'auto', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 400}\n", + "```" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "SGTOe5PaRw59" + }, + "source": [ + "# Como o procedimento acima levou 194 minutos para executar, então vou estimar ml_RF2 abaixo usando os parâmetros acima estimados\n", + "best_params= {'bootstrap': False, 'max_depth': 10, 'max_features': 'auto', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 400}\n", + "\n", + "ml_RF2= RandomForestClassifier(bootstrap= best_params['bootstrap'], \n", + " max_depth= best_params['max_depth'], \n", + " max_features= best_params['max_features'], \n", + " min_samples_leaf= best_params['min_samples_leaf'], \n", + " min_samples_split= 
best_params['min_samples_split'], \n", + " n_estimators= best_params['n_estimators'], \n", + " random_state= i_Seed)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HMJcAdLlTQa0" + }, + "source": [ + "## Visualizar o resultado\n", + "> Implementar a visualização do RandomForest." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WWNiy7Z0TQa3" + }, + "source": [ + "## Selecionar as COLUNAS importantes/relevantes" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "kOi11YOKTQa4" + }, + "source": [ + "X_train_RF, X_test_RF = seleciona_colunas_relevantes(ml_RF2, X_train, X_test)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Zn_O7c_DTQbE" + }, + "source": [ + "## Treina o classificador com as COLUNAS relevantes" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "UwEOwzSGTQbF" + }, + "source": [ + "best_params" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Rr8qDrgvTQbL" + }, + "source": [ + "# Treina com as COLUNAS relevantes...\n", + "ml_RF2.fit(X_train_RF, y_train)\n", + "\n", + "# Cross-Validation com 10 folds\n", + "a_scores_CV = cross_val_score(ml_RF2, X_train_RF, y_train, cv = i_CV)\n", + "print(f'Acurácia Media: {100*a_scores_CV.mean():.2f}')\n", + "print(f'std médio.....: {100*a_scores_CV.std():.2f}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-mYfQLlsTQbQ" + }, + "source": [ + "## Valida o modelo usando o dataframe X_test" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "sSD5o1JQTQbR" + }, + "source": [ + "y_pred_RF = ml_RF2.predict(X_test_RF)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "wywF6LymDzKr" + }, + "source": [ + "# Calcula acurácia\n", + "accuracy_score(y_test, y_pred_RF)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hJJsL0IJb6iO" + }, + "source": [ + "## Estudo do comportamento dos parametros do algoritmo\n", + "> Consulte [Optimizing Hyperparameters in Random Forest Classification](https://towardsdatascience.com/optimizing-hyperparameters-in-random-forest-classification-ec7741f9d3f6) para mais detalhes." 
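+        ,
+        "\n",
+        "Antes do estudo dos parâmetros, segue um esboço (não executado aqui) de como visualizar uma única árvore da floresta, complementando a seção 'Visualizar o resultado' acima, que ainda está por implementar. Assume-se que `ml_RF2` já foi treinado e que `l_colunas` contém os nomes das COLUNAS:\n",
+        "\n",
+        "```\n",
+        "from sklearn.tree import export_graphviz\n",
+        "from IPython.display import Image\n",
+        "import pydotplus\n",
+        "\n",
+        "# As árvores individuais do ensemble ficam no atributo .estimators_\n",
+        "arvore = ml_RF2.estimators_[0]\n",
+        "\n",
+        "dot_data = export_graphviz(arvore, out_file=None, feature_names=l_colunas,\n",
+        "                           class_names=['0', '1'], filled=True, rounded=True,\n",
+        "                           special_characters=True, max_depth=3)  # max_depth apenas limita o desenho\n",
+        "graph = pydotplus.graph_from_dot_data(dot_data)\n",
+        "Image(graph.create_png())\n",
+        "```\n"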
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "navUWMwHi44D" + }, + "source": [ + "param_range = np.arange(1, 250, 2)\n", + "\n", + "# Calculate accuracy on training and test set using range of parameter values\n", + "train_a_scores_CV, test_a_scores_CV = validation_curve(RandomForestClassifier(), \n", + " X_train, \n", + " y_train, \n", + " param_name=\"n_estimators\", \n", + " param_range = param_range, \n", + " cv = i_CV, \n", + " scoring = \"accuracy\", \n", + " n_jobs = -1)\n", + "\n", + "\n", + "# Calculate mean and standard deviation for training set a_scores_CV\n", + "train_mean = np.mean(train_a_scores_CV, axis = 1)\n", + "train_std = np.std(train_a_scores_CV, axis = 1)\n", + "\n", + "# Calculate mean and standard deviation for test set a_scores_CV\n", + "test_mean = np.mean(test_a_scores_CV, axis = 1)\n", + "test_std = np.std(test_a_scores_CV, axis = 1)\n", + "\n", + "# Plot mean accuracy a_scores_CV for training and test sets\n", + "plt.plot(param_range, train_mean, label = \"Training score\", color = \"black\")\n", + "plt.plot(param_range, test_mean, label = \"Cross-validation score\", color = \"dimgrey\")\n", + "\n", + "# Plot accurancy bands for training and test sets\n", + "plt.fill_between(param_range, train_mean - train_std, train_mean + train_std, color = \"gray\")\n", + "plt.fill_between(param_range, test_mean - test_std, test_mean + test_std, color = \"gainsboro\")\n", + "\n", + "# Create plot\n", + "plt.title(\"Validation Curve With Random Forest\")\n", + "plt.xlabel(\"Number Of Trees\")\n", + "plt.ylabel(\"Accuracy Score\")\n", + "plt.tight_layout()\n", + "plt.legend(loc = \"best\")\n", + "plt.show()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "rv7TIM9kjsud" + }, + "source": [ + "param_range = np.arange(1, 250, 2)\n", + "\n", + "# Calculate accuracy on training and test set using range of parameter values\n", + "train_a_scores_CV, test_a_scores_CV = validation_curve(RandomForestClassifier(), \n", + " X_train, \n", + " y_train, \n", + " param_name = \"max_depth\", \n", + " param_range = param_range, \n", + " cv = i_CV, \n", + " scoring = \"accuracy\", \n", + " n_jobs = -1)\n", + "\n", + "# Calculate mean and standard deviation for training set a_scores_CV\n", + "train_mean = np.mean(train_a_scores_CV, axis = 1)\n", + "train_std = np.std(train_a_scores_CV, axis = 1)\n", + "\n", + "# Calculate mean and standard deviation for test set a_scores_CV\n", + "test_mean = np.mean(test_a_scores_CV, axis = 1)\n", + "test_std = np.std(test_a_scores_CV, axis = 1)\n", + "\n", + "# Plot mean accuracy a_scores_CV for training and test sets\n", + "plt.plot(param_range, train_mean, label=\"Training score\", color=\"black\")\n", + "plt.plot(param_range, test_mean, label=\"Cross-validation score\", color=\"dimgrey\")\n", + "\n", + "# Plot accurancy bands for training and test sets\n", + "plt.fill_between(param_range, train_mean - train_std, train_mean + train_std, color=\"gray\")\n", + "plt.fill_between(param_range, test_mean - test_std, test_mean + test_std, color=\"gainsboro\")\n", + "\n", + "# Create plot\n", + "plt.title(\"Validation Curve With Random Forest\")\n", + "plt.xlabel(\"Number Of Trees\")\n", + "plt.ylabel(\"Accuracy Score\")\n", + "plt.tight_layout()\n", + "plt.legend(loc=\"best\")\n", + "plt.show()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "lm_fPGYwkJYc" + }, + "source": [ + "param_range = np.arange(1, 250, 2)\n", + "\n", + "# Calculate 
accuracy on training and test set using range of parameter values\n", + "train_a_scores_CV, test_a_scores_CV = validation_curve(RandomForestClassifier(), \n", + " X_train, \n", + " y_train, \n", + " param_name='min_samples_leaf', \n", + " param_range=param_range,\n", + " cv = i_CV, \n", + " scoring=\"accuracy\", \n", + " n_jobs=-1)\n", + "\n", + "\n", + "# Calculate mean and standard deviation for training set a_scores_CV\n", + "train_mean = np.mean(train_a_scores_CV, axis = 1)\n", + "train_std = np.std(train_a_scores_CV, axis = 1)\n", + "\n", + "# Calculate mean and standard deviation for test set a_scores_CV\n", + "test_mean = np.mean(test_a_scores_CV, axis = 1)\n", + "test_std = np.std(test_a_scores_CV, axis = 1)\n", + "\n", + "# Plot mean accuracy a_scores_CV for training and test sets\n", + "plt.plot(param_range, train_mean, label=\"Training score\", color=\"black\")\n", + "plt.plot(param_range, test_mean, label=\"Cross-validation score\", color=\"dimgrey\")\n", + "\n", + "# Plot accurancy bands for training and test sets\n", + "plt.fill_between(param_range, train_mean - train_std, train_mean + train_std, color=\"gray\")\n", + "plt.fill_between(param_range, test_mean - test_std, test_mean + test_std, color=\"gainsboro\")\n", + "\n", + "# Create plot\n", + "plt.title(\"Validation Curve With Random Forest\")\n", + "plt.xlabel(\"Number Of Trees\")\n", + "plt.ylabel(\"Accuracy Score\")\n", + "plt.tight_layout()\n", + "plt.legend(loc=\"best\")\n", + "plt.show()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "CAqdiSaVlAB8" + }, + "source": [ + "param_range = np.arange(0.05, 1, 0.05)\n", + "\n", + "# Calculate accuracy on training and test set using range of parameter values\n", + "train_a_scores_CV, test_a_scores_CV = validation_curve(RandomForestClassifier(), \n", + " X_train, \n", + " y_train, \n", + " param_name='min_samples_split', \n", + " param_range=param_range,\n", + " cv = i_CV, \n", + " scoring=\"accuracy\", \n", + " n_jobs=-1)\n", + "\n", + "\n", + "# Calculate mean and standard deviation for training set a_scores_CV\n", + "train_mean = np.mean(train_a_scores_CV, axis = 1)\n", + "train_std = np.std(train_a_scores_CV, axis = 1)\n", + "\n", + "# Calculate mean and standard deviation for test set a_scores_CV\n", + "test_mean = np.mean(test_a_scores_CV, axis = 1)\n", + "test_std = np.std(test_a_scores_CV, axis = 1)\n", + "\n", + "# Plot mean accuracy a_scores_CV for training and test sets\n", + "plt.plot(param_range, train_mean, label=\"Training score\", color=\"black\")\n", + "plt.plot(param_range, test_mean, label=\"Cross-validation score\", color=\"dimgrey\")\n", + "\n", + "# Plot accurancy bands for training and test sets\n", + "plt.fill_between(param_range, train_mean - train_std, train_mean + train_std, color=\"gray\")\n", + "plt.fill_between(param_range, test_mean - test_std, test_mean + test_std, color=\"gainsboro\")\n", + "\n", + "# Create plot\n", + "plt.title(\"Validation Curve With Random Forest\")\n", + "plt.xlabel(\"Number Of Trees\")\n", + "plt.ylabel(\"Accuracy Score\")\n", + "plt.tight_layout()\n", + "plt.legend(loc=\"best\")\n", + "plt.show()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cX_gfsbQSdNd" + }, + "source": [ + "___\n", + "# **BOOSTING MODELS**\n", + "* São algoritmos muito utilizados nas competições do Kaggle;\n", + "* São algoritmos utilizados para melhorar a performance dos algoritmos de Machine Learning;\n", + "* Modelos:\n", + " - 
[X] AdaBoost\n", + " - [X] XGBoost\n", + " - [X] LightGBM\n", + " - [X] GradientBoosting\n", + " - [X] CatBoost\n", + "\n", + "## Bagging vs Boosting vc Stacking\n", + "### **Bagging**\n", + "* Objetivo é reduzir a variância;\n", + "\n", + "#### Como funciona\n", + "* Seleciona várias amostras **COM REPOSIÇÃO** do dataframe de treinamento. Cada amostra é usada para treinar um modelo usando Decision Trees. Como resultado, temos um ensemble de muitas e diferentes modelos (Decision Trees). A média de desses muitos e diferentes modelos (Decision Trees) são usados para produzir o resultado final;\n", + "* O resultado final é mais robusto do que usarmos uma simples Decision Tree.\n", + "\n", + "![Bagging](https://github.com/MathMachado/Materials/blob/master/Bagging.png?raw=true)\n", + "\n", + "Souce: [Boosting and Bagging: How To Develop A Robust Machine Learning Algorithm](https://hackernoon.com/how-to-develop-a-robust-algorithm-c38e08f32201).\n", + "\n", + "#### Steps\n", + "* Suponha um dataframe X_train (dataframe de treinamento) contendo N observações (instâncias, pontos, linhas) e M COLUNAS (features, atributos).\n", + " 1. Bagging seleciona aleatoriamente uma amostra **COM REPOSIÇÃO** de X_train;\n", + " 2. Bagging seleciona aleatoriamente M2 (M2 < M) COLUNAS do dataframe extraído do passo (1);\n", + " 3. Constroi uma Decision Tree com as M2 COLUNAS do passo (2) e o dataframe obtido no passo (1) e as COLUNAS são avaliadas pela sua habilidade de classificar as observações;\n", + " 4. Os passos (1)--> (2)-- (3) são repetidos K vezes (ou seja, K Decision Trees), de forma que as COLUNAS são ranqueadas pelo seu poder preditivo e o resultado final (acurácia, por exemplo) é obtido pela agregação das predições dos K Decision Trees.\n", + "\n", + "#### Vantagens\n", + "* Reduz overfitting;\n", + "* Lida bem com dataframes com muitas COLUNAS (high dimensionality);\n", + "* Lida automaticamente com Missing Values;\n", + "\n", + "#### Desvantagem\n", + "* A predição final é baseada na média das K Decision Trees, o que pode comprometer a acurácia final.\n", + "\n", + "___ \n", + "### **Boosting**\n", + "* Objetivo é melhorar acurácia;\n", + "\n", + "#### Como funciona\n", + "* Os classificadores são usados sequencialmente, de forma que o classificador no passo N aprende com os erros do classificador do passo N-1. Ou seja, o objetivo é melhorar a precisão/acurácia à cada passo aprendendo com o passado.\n", + "\n", + "![Boosting](https://github.com/MathMachado/Materials/blob/master/Boosting.png?raw=true)\n", + "\n", + "Source: [Ensemble methods: bagging, boosting and stacking](https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205), Joseph Rocca\n", + ".\n", + "\n", + "#### Steps\n", + "* Suponha um dataframe X_train (dataframe de treinamento) contendo N observações (instâncias, pontos, linhas) e M COLUNAS (features, atributos).\n", + " 1. Boosting seleciona aleatoriamente uma amostra D1 SEM reposição de X_train;\n", + " 2. Boosting treina o classificador C1;\n", + " 3. Boosting seleciona aleatoriamente a SEGUNDA amostra D2 SEM reposição de X_train e acrescenta à D2 50% das observações que foram classificadas incorretamente para treinar o classificador C2;\n", + " 4. Boosting encontra em X_train a amostra D3 que os classificadores C1 e C2 discordam em classificar e treina C3;\n", + " 5. 
Combina (voto) as predições de C1, C2 e C3 para produzir o resultado final.\n", + "\n", + "#### Vantagens\n", + "* Lida bem com dataframes com muitas COLUNAS (high dimensionality);\n", + "* Lida automaticamente com Missing Values;\n", + "\n", + "#### Desvantagem\n", + "* Propenso a overfitting. Recomenda-se tratar outliers previamente.\n", + "* Requer ajuste cuidadoso dos hyperparameters;" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9fgUrkmPk4dr" + }, + "source": [ + "___\n", + "# STACKING\n", + "\n", + "![Stacking](https://github.com/MathMachado/Materials/blob/master/Stacking.png?raw=true)\n", + "\n", + "Kd a referência desta figura???" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "B0jxx3ETpOdm" + }, + "source": [ + "___\n", + "# **BOOTSTRAPPING METHODS**\n", + "> Antes de falarmos de Boosting ou Bagging, precisamos entender primeiro o que é Bootstrap, pois ambos (Boosting e Bagging) são baseados em Bootstrap.\n", + "\n", + "* Em Estatística (e em Machine Learning), Bootstrap se refere à extrair amostras aleatórias COM reposição da população X." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SyqazmUuifkE" + }, + "source": [ + "___\n", + "# **ADABOOST(Adaptive Boosting)**\n", + "* Quando nada funciona, AdaBoost funciona!\n", + "* Foi um dos primeiros algoritmos de Boosting (1995);\n", + "* AdaBoost pode ser utilizado tanto para classificação (AdaBoostClassifier) quanto para Regressão (AdaBoostRegressor);\n", + "* AdaBoost usam algoritmos DecisionTree como base_estimator;" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RU-vzkXqrFVw" + }, + "source": [ + "## Referências\n", + "* [AdaBoost Classifier Example In Python](https://towardsdatascience.com/machine-learning-part-17-boosting-algorithms-adaboost-in-python-d00faac6c464) - Didático e explica exatamente como o AdaBoost funciona.\n", + "* [Adaboost for Dummies: Breaking Down the Math (and its Equations) into Simple Terms](https://towardsdatascience.com/adaboost-for-dummies-breaking-down-the-math-and-its-equations-into-simple-terms-87f439757dcf) - Para quem quer entender a matemática por trás do algoritmo.\n", + "* [Gradient Boosting and XGBoost](https://medium.com/hackernoon/gradient-boosting-and-xgboost-90862daa6c77)\n", + "* [Understanding AdaBoost](https://towardsdatascience.com/understanding-adaboost-2f94f22d5bfe), Akash Desarda.\n", + "* [AdaBoost Classifier Example In Python](https://towardsdatascience.com/machine-learning-part-17-boosting-algorithms-adaboost-in-python-d00faac6c464)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6EMrjQDZIMl_" + }, + "source": [ + "## O que é AdaBoost (Adaptive Boosting)?\n", + "* é um dos classificadores do tipo ensemble (combina vários classificadores para aumentar a precisão).\n", + "* AdaBoost é um classificador iterativo e forte que combina (ensemble) vários classificadores fracos para melhorar a precisão.\n", + "* Qualquer algoritmo de aprendizado de máquina pode ser usado como um classificador de base (parâmetro base_estimator);\n", + "\n", + "## Parâmetros mais importantes do AdaBoost:\n", + "* base_estimator - É um classificador usado para treinar o modelo. Como default, AdaBoost usa o DecisionTreeClassifier. 
Como dito anteriormente, pode-se utilizar diferentes algoritmos para esse fim.\n", + "* n_estimators - Número de base_estimator para treinar iterativamente.\n", + "* learning_rate - Controla a contribuição do base_estimator na solução/combinação final;" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TzLtHzWNJBix" + }, + "source": [ + "## Usando diferentes algoritmos para base_estimator\n", + "> Como dito anteriormente, pode-se utilizar vários tipos de base_estimator em AdaBoost. Por exemplo, se quisermos usar SVM (Support Vector Machines), devemos proceder da seguinte forma:\n", + "\n", + "\n", + "```\n", + "# Importar a biblioteca base_estimator\n", + "from sklearn.svm import SVC\n", + "\n", + "# Treina o classificador (algoritmo)\n", + "ml_SVC= SVC(probability=True, kernel='linear')\n", + "\n", + "# Constroi o modelo AdaBoost\n", + "ml_AB = AdaBoostClassifier(n_estimators= 50, base_estimator=ml_SVC, learning_rate=1)\n", + "```\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hrj4a4s6hMMB" + }, + "source": [ + "## Vantagens\n", + "* AdaBoost é fácil de implementar;\n", + "* AdaBoost corrige os erros do base_estimator iterativamente e melhora a acurácia;\n", + "* Faz o Feature Selection automaticamente (**Porque**?);\n", + "* Pode-se usar muitos algoritos como base_estimator ;\n", + "* Como é um método ensemble, então o modelo final é pouco propenso à overfitting.\n", + "\n", + "## Desvantagens\n", + "* AdaBoost é sensível a ruídos nos dados;\n", + "* Altamente impactado por outliers (contribui para overfitting), pois o algoritmo tenta se ajustr a cada ponto da mehor forma possível;\n", + "* AdaBoost é mais lento que XGBoost;" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bgJmu7YLiyv7" + }, + "source": [ + "No exemplo a seguir, vou usar RandomForestClassifier com os parâmetros otimizados, ou seja:\n", + "\n", + "```\n", + "best_params= {'bootstrap': False, 'max_depth': 10, 'max_features': 'auto', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 400}\n", + "```\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "5VCRNyZT3qvc" + }, + "source": [ + "best_params= {'bootstrap': False, 'max_depth': 10, 'max_features': 'auto', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 400}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "1gIboJdriq61" + }, + "source": [ + "from sklearn.ensemble import AdaBoostClassifier\n", + "from sklearn.ensemble import RandomForestClassifier\n", + "\n", + "# Instancia RandomForestClassifier - Parâmetros otimizados!\n", + "ml_RF2= RandomForestClassifier(bootstrap= best_params['bootstrap'], \n", + " max_depth= best_params['max_depth'], \n", + " max_features= best_params['max_features'], \n", + " min_samples_leaf= best_params['min_samples_leaf'], \n", + " min_samples_split= best_params['min_samples_split'], \n", + " n_estimators= best_params['n_estimators'], \n", + " random_state= i_Seed)\n", + "# Instancia AdaBoostClassifier\n", + "ml_AB= AdaBoostClassifier(n_estimators=100, base_estimator= ml_RF2, random_state= i_Seed)\n", + "\n", + "# Treina...\n", + "ml_AB.fit(X_train, y_train)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "A4Cs81OLD40y" + }, + "source": [ + "# Cross-Validation com 10 folds\n", + "a_scores_CV = cross_val_score(ml_AB, X_train, y_train, cv = i_CV)\n", + "print(f'Média das Acurácias calculadas pelo CV....: 
{100*round(a_scores_CV.mean(),4)}')\n", + "print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "F7Ce5L38ECoC" + }, + "source": [ + "**Interpretação**: Nosso classificador (AdaBoostClassifier) tem uma acurácia média de 96,72% (base de treinamento). Além disso, o std é da ordem de 2,54%, ou seja, pequena. Vamos tentar melhorar a acurácia do classificador usando parameter tunning (GridSearchCV)." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "t5GfnBwEifkO" + }, + "source": [ + "print(f'Acurácias: {a_scores_CV}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Q9rSpuXyEPA5" + }, + "source": [ + "# Faz predições com os parametros otimizados...\n", + "y_pred = ml_AB.predict(X_test)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "2F9k-_eXGDLa" + }, + "source": [ + "# Confusion Matrix\n", + "cf_matrix = confusion_matrix(y_test, y_pred)\n", + "cf_labels = ['True_Negative','False_Positive','False_Negative','True_Positive']\n", + "cf_categories = ['Zero', 'One']\n", + "mostra_confusion_matrix(cf_matrix, group_names= cf_labels, categories= cf_categories)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XweWTjQ9EXLw" + }, + "source": [ + "## Parameter tunning" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "fcrKzse9EbL_" + }, + "source": [ + "# Dicionário de parâmetros para o parameter tunning.\n", + "d_parametros_AB = {'n_estimators':[50, 100, 200], 'learning_rate':[.001, 0.01, 0.05, 0.1, 0.3,1]}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Susc3I7mFDQX" + }, + "source": [ + "# Invoca a função\n", + "ml_AB2, best_params= GridSearchOptimizer(ml_AB, 'ml_AB2', d_parametros_AB, X_train, y_train, X_test, y_test, cv = i_CV)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "w4JjWsusjNS8" + }, + "source": [ + "___\n", + "# **GRADIENT BOOSTING**\n", + "* Gradient boosting pode ser usado para resolver problemas de classificação (GradientBoostingClassifier) e Regressão (GradientBoostingRegressor);\n", + "* Gradient boosting são um refinamento do AdaBoost (lembra que AdaBoost foi um dos primeiros métodos de Boosting - criado em 1995). O que Gradient Boosting faz adicionalmente ao AdaBoost é minimizar a loss (função perda), ie, minimizar a diferença entre os valores observados de y e os valores preditos.\n", + "* Usa Gradient Descent para encontrar as deficiências nas previsões do passo anterior. Gradient Descent é um algoritmo popular e poderoso e usado em Redes Neurais;\n", + "* O objetivo do Gradient Boosting é minimizar 'loss function'. Portanto, Gradient Boosting depende da \"loss function\".\n", + "* Gradient boosting usam algoritmos DecisionTree como base_estimator;\n", + "\n", + "## Vantagens\n", + "* Não há necessidade de pre-processing;\n", + "* Trabalha normalmente com COLUNAS numéricas ou categóricas;\n", + "* Trata automaticamente os Missing Values. Ou seja, não é necessário aplicar métodos de Missing Value Imputation;\n", + "\n", + "## Desvantagens\n", + "* Como Gradient Boosting tenta continuamente minimizar os erros à cada iteração, isso pode enfatizar os outliers e causar overfitting. 
Portanto, deve-se:\n", + " * Tratar os outliers previamente OU\n", + " * Usar Cross-Validation para neutralizar os efeitos dos outliers (**Eu prefiro este método, pois toma menos tempo**);\n", + "* Computacionalmene caro. Geralmente são necessários muitas árvores (> 1000) para se obter bons resultados;\n", + "* Devido à flexibilidade (muitos parâmetros para ajustar), então é necessário usar GridSearchCV para encontrar a combinação ótima dos hyperparameters;\n", + "\n", + "## Referências\n", + "* [Gradient Boosting Decision Tree Algorithm Explained](https://towardsdatascience.com/machine-learning-part-18-boosting-algorithms-gradient-boosting-in-python-ef5ae6965be4) - Didático e detalhista.\n", + "* [Predicting Wine Quality with Gradient Boosting Machines](https://towardsdatascience.com/predicting-wine-quality-with-gradient-boosting-machines-a-gmb-tutorial-d950b1542065)\n", + "* [Parameter Tuning in Gradient Boosting (GBM) with Python](https://www.datacareer.de/blog/parameter-tuning-in-gradient-boosting-gbm/)\n", + "* [Tune Learning Rate for Gradient Boosting with XGBoost in Python](https://machinelearningmastery.com/tune-learning-rate-for-gradient-boosting-with-xgboost-in-python/)\n", + "* [In Depth: Parameter tuning for Gradient Boosting](https://medium.com/all-things-ai/in-depth-parameter-tuning-for-gradient-boosting-3363992e9bae) - Muito bom\n", + "* [Complete Machine Learning Guide to Parameter Tuning in Gradient Boosting (GBM) in Python](https://www.analyticsvidhya.com/blog/2016/02/complete-guide-parameter-tuning-gradient-boosting-gbm-python/)" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Q4bUCZs2jNTA" + }, + "source": [ + "from sklearn.ensemble import GradientBoostingClassifier\n", + "\n", + "# Instancia...\n", + "ml_GB=GradientBoostingClassifier(n_estimators=100, min_samples_split= 2)\n", + "\n", + "# Treina...\n", + "ml_GB.fit(X_train, y_train)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "-dr6dyjdXwvd" + }, + "source": [ + "# Cross-Validation com 10 folds\n", + "a_scores_CV = cross_val_score(ml_GB, X_train, y_train, cv = i_CV)\n", + "print(f'Média das Acurácias calculadas pelo CV....: {100*round(a_scores_CV.mean(),4)}')\n", + "print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "VlC3y3M5YaGG" + }, + "source": [ + "print(f'Acurácias: {a_scores_CV}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vnLvQ0ZDYNjB" + }, + "source": [ + "**Interpretação**: Nosso classificador (GradientBoostingClassifier) tem uma acurácia média de 96,86% (base de treinamento). Além disso, o std é da ordem de 2,52%, ou seja, pequena. Vamos tentar melhorar a acurácia do classificador usando parameter tunning (GridSearchCV)." 
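+        ,
+        "\n",
+        "Como alternativa mais barata ao GridSearchCV para escolher o número de árvores, segue um esboço (assumindo scikit-learn >= 0.20; `ml_GB_es` é apenas um nome ilustrativo) usando o early stopping embutido do GradientBoostingClassifier, que interrompe o treinamento quando o score em uma fração de validação interna deixa de melhorar:\n",
+        "\n",
+        "```\n",
+        "from sklearn.ensemble import GradientBoostingClassifier\n",
+        "\n",
+        "ml_GB_es = GradientBoostingClassifier(n_estimators=1000,        # limite superior de árvores\n",
+        "                                      learning_rate=0.05,\n",
+        "                                      validation_fraction=0.1,  # fração do treino usada como validação interna\n",
+        "                                      n_iter_no_change=10,      # para se não houver melhora em 10 iterações\n",
+        "                                      random_state=i_Seed)\n",
+        "ml_GB_es.fit(X_train, y_train)\n",
+        "print(f'Árvores efetivamente treinadas: {ml_GB_es.n_estimators_}')\n",
+        "```\n"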
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "D2n1RKZuXq3D" + }, + "source": [ + "# Faz precições...\n", + "y_pred = ml_GB.predict(X_test)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "8r6JCzQRGFa0" + }, + "source": [ + "# Confusion Matrix\n", + "cf_matrix = confusion_matrix(y_test, y_pred)\n", + "cf_labels = ['True_Negative','False_Positive','False_Negative','True_Positive']\n", + "cf_categories = ['Zero', 'One']\n", + "mostra_confusion_matrix(cf_matrix, group_names = cf_labels, categories = cf_categories)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KFv-Q2AD5uCk" + }, + "source": [ + "## Parameter tunning\n", + "> Consulte [Complete Machine Learning Guide to Parameter Tuning in Gradient Boosting (GBM) in Python](https://www.analyticsvidhya.com/blog/2016/02/complete-guide-parameter-tuning-gradient-boosting-gbm-python/) para detalhes sobre os parâmetros, significado e etc." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "wgU040AcjNTF" + }, + "source": [ + "# Dicionário de parâmetros para o parameter tunning.\n", + "d_parametros_GB= {'learning_rate': [1, 0.5, 0.25, 0.1, 0.05, 0.01]} #,\n", + "# 'n_estimators': [1, 2, 4, 8, 16, 32, 64, 100, 200],\n", + "# 'max_depth': [5, 10, 15, 20, 25, 30],\n", + "# 'min_samples_split': [0.1, 0.3, 0.5, 0.7, 0.9],\n", + "# 'min_samples_leaf': [0.1, 0.2, 0.3, 0.4, 0.5],\n", + "# 'max_features': list(range(1, X_train.shape[1]))}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "v5KLFlpTjNTH" + }, + "source": [ + "# Invoca a função\n", + "ml_GB2, best_params= GridSearchOptimizer(ml_GB, 'ml_GB2', d_parametros_GB, X_train, y_train, X_test, y_test, cv = i_CV)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YQ6ERz3fi9i2" + }, + "source": [ + "### Resultado da execução do Gradient Boosting" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RSa7uKw13mKG" + }, + "source": [ + "```\n", + "[Parallel(n_jobs=-1)]: Done 275400 out of 275400 | elapsed: 93.7min finished\n", + "\n", + "Parametros otimizados: {'learning_rate': 1, 'max_depth': 30, 'max_features': 11, 'min_samples_leaf': 0.1, 'min_samples_split': 0.1, 'n_estimators': 100}\n", + "```\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "wiJpA2PyjDjR" + }, + "source": [ + "# Como o procedimento acima levou 93 minutos para executar, então vou estimar ml_GB2 abaixo usando os parâmetros acima estimados\n", + "best_params= {'learning_rate': 1, 'max_depth': 30, 'max_features': 11, 'min_samples_leaf': 0.1, 'min_samples_split': 0.1, 'n_estimators': 100}\n", + "\n", + "#ml_GB2= GradientBoostingClassifier(learning_rate= best_params['learning_rate'], \n", + "# max_depth= best_params['max_depth'],\n", + "# max_features= best_params['max_features'],\n", + "# min_samples_leaf= best_params['min_samples_leaf'],\n", + "# min_samples_split= best_params['min_samples_split'],\n", + "# n_estimators= best_params['n_estimators'],\n", + "# random_state= i_Seed)\n", + "\n", + "ml_GB2= GradientBoostingClassifier(learning_rate= best_params['learning_rate'], \n", + " max_depth= best_params['max_depth'],\n", + " min_samples_leaf= best_params['min_samples_leaf'],\n", + " min_samples_split= best_params['min_samples_split'],\n", + " n_estimators= best_params['n_estimators'],\n", + " random_state= i_Seed)" + ], + "execution_count": null, + 
"outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mb14gJ7-jbVM" + }, + "source": [ + "## Selecionar as COLUNAS importantes/relevantes" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "TAqGZIFYm2sU" + }, + "source": [ + "X_train_GB, X_test_GB = seleciona_colunas_relevantes(ml_GB2, X_train, X_test)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6yiu6dahnBvC" + }, + "source": [ + "## Treina o classificador com as COLUNAS relevantes " + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "APrtWN18nc4t" + }, + "source": [ + "best_params" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "VS0mLdOmnXAY" + }, + "source": [ + "# Treina com as COLUNAS relevantes\n", + "ml_GB2.fit(X_train_GB, y_train)\n", + "\n", + "# Cross-Validation com 10 folds\n", + "a_scores_CV = cross_val_score(ml_GB2, X_train_GB, y_train, cv = i_CV)\n", + "print(f'Acurácia Media: {100*a_scores_CV.mean():.2f}')\n", + "print(f'std médio.....: {100*a_scores_CV.std():.2f}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vmc9PP_Rn1TN" + }, + "source": [ + "## Valida o modelo usando o dataframe X_test" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "e3mnIALvnzP2" + }, + "source": [ + "y_pred_GB = ml_GB2.predict(X_test_GB)\n", + "\n", + "# Calcula acurácia\n", + "accuracy_score(y_test, y_pred_GB)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kwP9Z2GnkV7r" + }, + "source": [ + "___\n", + "# **XGBOOST (eXtreme Gradient Boosting)**\n", + "* XGBoost é uma melhoria de Gradient Boosting. As melhorias são em velocidade e performace, além de corrigir as ineficiências do GradientBoosting.\n", + "* Algoritmo preferido pelos Kaggle Grandmasters;\n", + "* Paralelizável;\n", + "* Estado-da-arte em termos de Machine Learning;\n", + "\n", + "## Parâmetros relevantes e seus valores iniciais\n", + "Consulte [Complete Guide to Parameter Tuning in XGBoost with codes in Python](https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/) para detalhes completos sobre os parâmetros, significado e etc.\n", + "\n", + "* n_estimators = 100 (100 caso o dataframe for grande. Se o dataframe for médio/pequeno, então 1000) - É o número de árvores desejamos construir;\n", + "* max_depth= 3 - Determina quão profundo cada árvore pode crescer durante qualquer round de treinamento. Valores típicos no intervalo [3, 10];\n", + "* learning rate= 0.01 - Usado para evitar overfitting, intervalo: [0, 1];\n", + "* alpha (somente para problemas de Regressão) - L1 regularization nos pesos. Valores altos resulta em mais regularization;\n", + "* lambda (somente para problemas de Regressão) - L2 regularization nos pesos.\n", + "* colsample_bytree: 1 - porcentagem de COLUNAS usados por cada árvore. Alto valor pode causar overfitting;\n", + "* subsample: 0.8 - porcentagem de amostras usadas por árvore. Um valor baixo pode levar a overfitting;\n", + "* gamma: 1 - Controla se um determinado nó será dividido com base na redução esperada na perda após a divisão. Um valor mais alto leva a menos divisões.\n", + "* objective: Define a \"loss function\". 
As opções são:\n", + " * reg:linear - Para resolver problemas de regressão;\n", + " * reg:logistic - Para resolver problemas de classificação;\n", + " * binary:logistic - Para resolver problemas de classificação com cálculo de probabilidades;\n", + "\n", + "# Referências\n", + "* [How exactly XGBoost Works?](https://medium.com/@pushkarmandot/how-exactly-xgboost-works-a320d9b8aeef)\n", + "* [Fine-tuning XGBoost in Python like a boss](https://towardsdatascience.com/fine-tuning-xgboost-in-python-like-a-boss-b4543ed8b1e)\n", + "* [Gentle Introduction of XGBoost Library](https://medium.com/@imoisharma/gentle-introduction-of-xgboost-library-2b1ac2669680)\n", + "* [A Beginner’s guide to XGBoost](https://towardsdatascience.com/a-beginners-guide-to-xgboost-87f5d4c30ed7)\n", + "* [Exploring XGBoost](https://towardsdatascience.com/exploring-xgboost-4baf9ace0cf6)\n", + "* [Feature Importance and Feature Selection With XGBoost in Python](https://machinelearningmastery.com/feature-importance-and-feature-selection-with-xgboost-in-python/)\n", + "* [Ensemble Learning case study: Running XGBoost on Google Colab free GPU](https://towardsdatascience.com/running-xgboost-on-google-colab-free-gpu-a-case-study-841c90fef101) - Recomendo\n", + "* [Predicting movie revenue with AdaBoost, XGBoost and LightGBM](https://towardsdatascience.com/predicting-movie-revenue-with-adaboost-xgboost-and-lightgbm-262eadee6daa)\n", + "* [Tuning XGBoost Hyperparameters with Scikit Optimize](https://towardsdatascience.com/how-to-improve-the-performance-of-xgboost-models-1af3995df8ad)\n", + "* [An Example of Hyperparameter Optimization on XGBoost, LightGBM and CatBoost using Hyperopt](https://towardsdatascience.com/an-example-of-hyperparameter-optimization-on-xgboost-lightgbm-and-catboost-using-hyperopt-12bc41a271e) - Interessante\n", + "* [XGBOOST vs LightGBM: Which algorithm wins the race !!!](https://towardsdatascience.com/lightgbm-vs-xgboost-which-algorithm-win-the-race-1ff7dd4917d) - LightGBM tem se mostrado interessante.\n", + "* [From Zero to Hero in XGBoost Tuning](https://towardsdatascience.com/from-zero-to-hero-in-xgboost-tuning-e48b59bfaf58) - Gostei\n", + "* [Build XGBoost / LightGBM models on large datasets — what are the possible solutions?](https://towardsdatascience.com/build-xgboost-lightgbm-models-on-large-datasets-what-are-the-possible-solutions-bf882da2c27d)\n", + "* [Selecting Optimal Parameters for XGBoost Model Training](https://towardsdatascience.com/selecting-optimal-parameters-for-xgboost-model-training-c7cd9ed5e45e) - Muito bom!\n", + "* [CatBoost vs. Light GBM vs. 
XGBoost](https://towardsdatascience.com/catboost-vs-light-gbm-vs-xgboost-5f93620723db)\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "iMM_R4_ukV7x" + }, + "source": [ + "from xgboost import XGBClassifier\n", + "import xgboost as xgb\n", + "\n", + "# Instancia...\n", + "ml_XGB= XGBClassifier(silent=False, \n", + " scale_pos_weight=1,\n", + " learning_rate=0.01, \n", + " colsample_bytree = 1,\n", + " subsample = 0.8,\n", + " objective='binary:logistic', \n", + " n_estimators=1000, \n", + " reg_alpha = 0.3,\n", + " max_depth= 3, \n", + " gamma=1, \n", + " max_delta_step=5)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "E4wQMlDEFINR" + }, + "source": [ + "# Treina...\n", + "ml_XGB.fit(X_train, y_train)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "zAhsTtwGqMkG" + }, + "source": [ + "# Cross-Validation com 10 folds\n", + "a_scores_CV = cross_val_score(ml_XGB, X_train, y_train, cv = i_CV)\n", + "print(f'Média das Acurácias calculadas pelo CV....: {100*round(a_scores_CV.mean(),4)}')\n", + "print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JNyKX6PkrXOk" + }, + "source": [ + "**Interpretação**: Nosso classificador (XGBClassifier) tem uma acurácia média de 96,72% (base de treinamento). Além disso, o std é da ordem de 2,02%, ou seja, pequena. Vamos tentar melhorar a acurácia do classificador usando parameter tunning (GridSearchCV)." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "_h0QYv3FkV73" + }, + "source": [ + "print(f'Acurácias: {a_scores_CV}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "AKhhAZLjkV76" + }, + "source": [ + "# Faz predições...\n", + "y_pred = ml_XGB.predict(X_test)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Ir2Kd1PqGHgz" + }, + "source": [ + "# Confusion Matrix\n", + "cf_matrix = confusion_matrix(y_test, y_pred)\n", + "cf_labels = ['True_Negative','False_Positive','False_Negative','True_Positive']\n", + "cf_categories = ['Zero', 'One']\n", + "mostra_confusion_matrix(cf_matrix, group_names= cf_labels, categories= cf_categories)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jEC7gW4qYpWw" + }, + "source": [ + "## Parameter tunning\n", + "### Leitura Adicional:\n", + "* [Fine-tuning XGBoost in Python like a boss](https://towardsdatascience.com/fine-tuning-xgboost-in-python-like-a-boss-b4543ed8b1e)\n", + "* [Complete Guide to Parameter Tuning in XGBoost with codes in Python](https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/)\n", + "\n", + "> Olhando para os resultados acima, qual o melhor modelo?\n", + "\n", + "XGBoost? Supondo que sim, agora vamos fazer o fine-tuning dos parâmetros do modelo." 
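+        ,
+        "\n",
+        "Como o grid completo abaixo tem milhares de combinações, um esboço de alternativa mais barata (assumindo os mesmos intervalos de parâmetros e o `ml_XGB` instanciado acima; `rs_XGB` é apenas um nome ilustrativo) é o RandomizedSearchCV, que sorteia um número fixo de combinações em vez de testar todas:\n",
+        "\n",
+        "```\n",
+        "import numpy as np\n",
+        "from sklearn.model_selection import RandomizedSearchCV\n",
+        "\n",
+        "d_parametros_XGB_rs = {'min_child_weight': np.arange(1, 13),\n",
+        "                       'gamma': np.arange(0, 5, 0.5),\n",
+        "                       'subsample': [0.6, 0.8, 1.0],\n",
+        "                       'colsample_bytree': [0.6, 0.8, 1.0],\n",
+        "                       'max_depth': [3, 4, 5, 7, 9],\n",
+        "                       'learning_rate': np.arange(0.01, 1, 0.1)}\n",
+        "\n",
+        "rs_XGB = RandomizedSearchCV(ml_XGB, param_distributions=d_parametros_XGB_rs,\n",
+        "                            n_iter=50,  # nº de combinações sorteadas\n",
+        "                            scoring='accuracy', cv=i_CV, n_jobs=-1, random_state=i_Seed)\n",
+        "rs_XGB.fit(X_train, y_train)\n",
+        "print(rs_XGB.best_params_)\n",
+        "```\n"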
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "n3MsUONPwIV9" + }, + "source": [ + "# Dicionário de parâmetros para XGBoost:\n", + "d_parametros_XGB = {'min_child_weight': [i for i in np.arange(1, 13)]} #,\n", + "# 'gamma': [i for i in np.arange(0, 5, 0.5)],\n", + "# 'subsample': [0.6, 0.8, 1.0],\n", + "# 'colsample_bytree': [0.6, 0.8, 1.0],\n", + "# 'max_depth': [3, 4, 5, 7, 9],\n", + "# 'learning_rate': [i for i in np.arange(0.01, 1, 0.1)]}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "CX27FCKmwSni" + }, + "source": [ + "# Invoca a função\n", + "ml_XGB, best_params= GridSearchOptimizer(ml_XGB, 'ml_XGB2', d_parametros_XGB, X_train, y_train, X_test, y_test, cv = i_CV)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9b7uCuF74Hjv" + }, + "source": [ + "### Resultado da execução do XGBoostClassifier\n", + "\n", + "```\n", + "[Parallel(n_jobs=-1)]: Done 108000 out of 108000 | elapsed: 372.0min finished\n", + "\n", + "Parametros otimizados: {'colsample_bytree': 0.8, 'gamma': 0.5, 'learning_rate': 0.51, 'max_depth': 5, 'min_child_weight': 1, 'subsample': 0.6}\n", + "```\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "n7E0oyxEtbGi" + }, + "source": [ + "# Como o procedimento acima levou 372 minutos para executar, então vou estimar ml_XGB2 abaixo usando os parâmetros acima estimados\n", + "best_params= {'colsample_bytree': 0.8, 'gamma': 0.5, 'learning_rate': 0.51, 'max_depth': 5, 'min_child_weight': 1, 'subsample': 0.6}\n", + "\n", + "ml_XGB2= XGBClassifier(min_child_weight= best_params['min_child_weight'], \n", + " gamma= best_params['gamma'], \n", + " subsample= best_params['subsample'], \n", + " colsample_bytree= best_params['colsample_bytree'], \n", + " max_depth= best_params['max_depth'], \n", + " learning_rate= best_params['learning_rate'], \n", + " random_state= i_Seed)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CuqyLHTU5Z-j" + }, + "source": [ + "## Selecionar as COLUNAS importantes/relevantes\n", + "* [The Multiple faces of ‘Feature importance’ in XGBoost](https://towardsdatascience.com/be-careful-when-interpreting-your-features-importance-in-xgboost-6e16132588e7)" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "QPG3JZIpRZ-T" + }, + "source": [ + "# plot feature importance\n", + "from xgboost import plot_importance\n", + "\n", + "xgb.plot_importance(ml_XGB2, color = 'red')\n", + "plt.title('importance', fontsize = 20)\n", + "plt.yticks(fontsize = 10)\n", + "plt.ylabel('features', fontsize = 20)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "EmpRC2lHW-KP" + }, + "source": [ + "ml_XGB2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "4f9MIEBiyq-5" + }, + "source": [ + "X_train_XGB, X_test_XGB= seleciona_colunas_relevantes(ml_XGB2, X_train, X_test)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "F6EayWaY5nMm" + }, + "source": [ + "## Treina o classificador com as COLUNAS relevantes" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Huy18gKI5qad" + }, + "source": [ + "best_params" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "E3-PaTdc5vZk" + }, + "source": [ + "# Treina com as COLUNAS relevantes...\n", + 
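"# (X_train_XGB e X_test_XGB foram criados por seleciona_colunas_relevantes, nas células acima)\n", +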
"ml_XGB2.fit(X_train_XGB, y_train)\n", + "\n", + "# Cross-Validation com 10 folds\n", + "a_scores_CV = cross_val_score(ml_XGB2, X_train_XGB, y_train, cv = i_CV)\n", + "print(f'Acurácia Media: {100*a_scores_CV.mean():.2f}')\n", + "print(f'std médio.....: {100*a_scores_CV.std():.2f}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tBdYikDU6NhD" + }, + "source": [ + "## Valida o modelo usando o dataframe X_test" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "GcvY-VdL6VIZ" + }, + "source": [ + "y_pred_XGB = ml_XGB2.predict(X_test_XGB)\n", + "\n", + "# Calcula acurácia\n", + "accuracy_score(y_test, y_pred_XGB)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "8oLtdH-vTSbC" + }, + "source": [ + "xgb.to_graphviz(ml_XGB2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "czXQG3MCHfHM" + }, + "source": [ + "# KNN - KNEIGHBORSCLASSIFIER" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "llTTXNeyHiwx" + }, + "source": [ + "# BAGGINGCLASSIFIER" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Fbkekd4QHoZO" + }, + "source": [ + "# EXTRATREESCLASSIFIER" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "widavwR4HzwE" + }, + "source": [ + "# SVM\n", + "https://data-flair.training/blogs/svm-support-vector-machine-tutorial/" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "id_Ubulns6We" + }, + "source": [ + "# NAIVE BAYES" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3e0m7lEnYOV9" + }, + "source": [ + "# **IMPORTANCIA DAS COLUNAS**\n", + "Source: [Plotting Feature Importances](https://www.kaggle.com/grfiv4/plotting-feature-importances)" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "fjco0HnNYr-N" + }, + "source": [ + "def mostra_feature_importances(clf, X_train, y_train=None, \n", + " top_n=10, figsize=(8,8), print_table=False, title=\"Feature Importances\"):\n", + " '''\n", + " plot feature importances of a tree-based sklearn estimator\n", + " \n", + " Note: X_train and y_train are pandas DataFrames\n", + " \n", + " Note: Scikit-plot is a lovely package but I sometimes have issues\n", + " 1. flexibility/extendibility\n", + " 2. 
complicated models/datasets\n", + " But for many situations Scikit-plot is the way to go\n", + " see https://scikit-plot.readthedocs.io/en/latest/Quickstart.html\n", + " \n", + " Parameters\n", + " ----------\n", + " clf (sklearn estimator) if not fitted, this routine will fit it\n", + " \n", + " X_train (pandas DataFrame)\n", + " \n", + " y_train (pandas DataFrame) optional\n", + " required only if clf has not already been fitted \n", + " \n", + " top_n (int) Plot the top_n most-important features\n", + " Default: 10\n", + " \n", + " figsize ((int,int)) The physical size of the plot\n", + " Default: (8,8)\n", + " \n", + " print_table (boolean) If True, print out the table of feature importances\n", + " Default: False\n", + " \n", + " Returns\n", + " -------\n", + " the pandas dataframe with the features and their importance\n", + " \n", + " Author\n", + " ------\n", + " George Fisher\n", + " '''\n", + " \n", + " __name__ = \"mostra_feature_importances\"\n", + " \n", + " import pandas as pd\n", + " import numpy as np\n", + " import matplotlib.pyplot as plt\n", + " \n", + " from xgboost.core import XGBoostError\n", + " from lightgbm.sklearn import LightGBMError\n", + " \n", + " try: \n", + " if not hasattr(clf, 'feature_importances_'):\n", + " clf.fit(X_train.values, y_train.values.ravel())\n", + "\n", + " if not hasattr(clf, 'feature_importances_'):\n", + " raise AttributeError(\"{} does not have feature_importances_ attribute\".\n", + " format(clf.__class__.__name__))\n", + " \n", + " except (XGBoostError, LightGBMError, ValueError):\n", + " clf.fit(X_train.values, y_train.values.ravel())\n", + " \n", + " feat_imp = pd.DataFrame({'importance':clf.feature_importances_}) \n", + " feat_imp['feature'] = X_train.columns\n", + " feat_imp.sort_values(by ='importance', ascending = False, inplace = True)\n", + " feat_imp = feat_imp.iloc[:top_n]\n", + " \n", + " feat_imp.sort_values(by='importance', inplace = True)\n", + " feat_imp = feat_imp.set_index('feature', drop = True)\n", + " feat_imp.plot.barh(title=title, figsize=figsize)\n", + " plt.xlabel('Feature Importance Score')\n", + " plt.show()\n", + " \n", + " if print_table:\n", + " from IPython.display import display\n", + " print(\"Top {} features in descending order of importance\".format(top_n))\n", + " display(feat_imp.sort_values(by = 'importance', ascending = False))\n", + " \n", + " return feat_imp" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "ycu_EIGlYUYn" + }, + "source": [ + "import pandas as pd\n", + "\n", + "from xgboost import XGBClassifier\n", + "from sklearn.ensemble import ExtraTreesClassifier\n", + "from sklearn.tree import ExtraTreeClassifier\n", + "from sklearn.tree import DecisionTreeClassifier\n", + "from sklearn.ensemble import GradientBoostingClassifier\n", + "from sklearn.ensemble import BaggingClassifier\n", + "from sklearn.ensemble import AdaBoostClassifier\n", + "from sklearn.ensemble import RandomForestClassifier\n", + "from sklearn.linear_model import LogisticRegression\n", + "from lightgbm import LGBMClassifier\n", + "\n", + "clfs = [XGBClassifier(), LGBMClassifier(), \n", + " ExtraTreesClassifier(), ExtraTreeClassifier(),\n", + " BaggingClassifier(), DecisionTreeClassifier(),\n", + " GradientBoostingClassifier(), LogisticRegression(),\n", + " AdaBoostClassifier(), RandomForestClassifier()]\n", + "\n", + "for clf in clfs:\n", + " try:\n", + " _ = mostra_feature_importances(clf, X_train, y_train, top_n=X_train.shape[1], title=clf.__class__.__name__)\n", + 
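"        # estimadores sem o atributo feature_importances_ (ex.: LogisticRegression) caem no except abaixo\n", +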
" except AttributeError as e:\n", + " print(e)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "EwWkjfC8KEZH" + }, + "source": [ + "# ENSEMBLE METHODS\n", + "https://towardsdatascience.com/using-bagging-and-boosting-to-improve-classification-tree-accuracy-6d3bb6c95e5b\n", + "\n", + "![Ensemble](https://github.com/MathMachado/Materials/blob/master/Ensemble.png?raw=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3Uf1RML7xETY" + }, + "source": [ + "# WOE e IV\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TBNRfYZCyhMP" + }, + "source": [ + "## Construção do exemplo" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "gIIroyyP4ZRZ" + }, + "source": [ + "df_y.head()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "PzQQdrkf1ohX" + }, + "source": [ + "from random import choices\n", + "\n", + "df_X2= df_X.copy()\n", + "df_X2['tipo']= choices(['A', 'B', 'C', 'D'], k= 1000)\n", + "df_X2['idade']= np.random.randint(10, 15, size= 1000)\n", + "df_X2['target']= df_y['target']\n", + "df_X2.head()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "v-OpwIpx4hXJ" + }, + "source": [ + "df_X2['target'].value_counts()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "yZfqSvbKzeJ3" + }, + "source": [ + "def Constroi_Buckets(df, i, k= 10):\n", + " coluna= 'v'+ str(i)\n", + " df[coluna+'_Bucket']= pd.cut(df[coluna], bins= k, labels= np.arange(1, k+1))\n", + " df= df.drop(columns= [coluna], axis= 1)\n", + " return df" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "V6Nrpsx60HD3" + }, + "source": [ + "for i in np.arange(1,19):\n", + " df_X2= Constroi_Buckets(df_X2, i)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "J2Fbh41-03OB" + }, + "source": [ + "df_X2.head()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "O9r5BeWVxIr3" + }, + "source": [ + "# Função para calcular WOE e IV\n", + "def calculate_woe_iv(dataset, feature, target):\n", + "\n", + " def codethem(IV):\n", + " if IV < 0.02: return 'Useless'\n", + " elif IV >= 0.02 and IV < 0.1: return 'Weak'\n", + " elif IV >= 0.1 and IV < 0.3: return 'Medium'\n", + " elif IV >= 0.3 and IV < 0.5: return 'Strong'\n", + " elif IV >= 0.5: return 'Suspicious'\n", + " else: return 'None'\n", + "\n", + " lst = []\n", + " for i in range(dataset[feature].nunique()):\n", + " val = list(dataset[feature].unique())[i]\n", + " lst.append({\n", + " 'Value': val,\n", + " 'All': dataset[dataset[feature] == val].count()[feature],\n", + " 'Good': dataset[(dataset[feature] == val) & (dataset[target] == 0)].count()[feature],\n", + " 'Bad': dataset[(dataset[feature] == val) & (dataset[target] == 1)].count()[feature]\n", + " })\n", + " \n", + " dset = pd.DataFrame(lst)\n", + " dset['Distr_Good'] = dset['Good']/dset['Good'].sum()\n", + " dset['Distr_Bad'] = dset['Bad']/dset['Bad'].sum()\n", + " dset['Mean']= dset['All']/dset['All'].sum()\n", + " dset['WoE'] = np.log(dset['Distr_Good']/dset['Distr_Bad'])\n", + " dset = dset.replace({'WoE': {np.inf: 0, -np.inf: 0}})\n", + " dset['IV'] = (dset['Distr_Good'] - dset['Distr_Bad']) * dset['WoE']\n", + " #dset= dset.drop(columns= ['Distr_Good', 'Distr_Bad'], axis= 1)\n", + "\n", + " 
dset['Predictive_Power']= dset['IV'].map(codethem)\n", + "    iv = dset['IV'].sum()\n", + "    dset = dset.sort_values(by='IV')\n", + "    return dset, iv" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Y8WGjWH63nx_" + }, + "source": [ + "df_Lab = df_X2.copy()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "-N6xr1MgxTiz" + }, + "source": [ + "def calcula_Predictive_Power(df_Lab, coluna):\n", + "    print('WoE and IV for column: {}'.format(coluna))\n", + "    df, iv = calculate_woe_iv(df_Lab, coluna, 'target')\n", + "    print(df)\n", + "    print('IV score: {:.2f}'.format(iv))\n", + "    print('\\n')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "ayqN_7WnxVq9" + }, + "source": [ + "for i in np.arange(1,19):\n", + "    coluna= 'v'+str(i)+'_Bucket'\n", + "    calcula_Predictive_Power(df_Lab, coluna)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qtoJVI4Pyx3I" + }, + "source": [ + "# **IMBALANCED SAMPLE**\n", + "> Alguns objetivos, como detectar fraude em transações bancárias ou detectar intrusão em redes, têm em comum o fato de que a classe de interesse (o que queremos detectar) geralmente é um evento raro.\n", + "\n", + "## Exemplo: Detectar fraude\n", + "A proporção de fraudes em relação a NÃO-FRAUDES é de aproximadamente 1%/99%. Neste caso, se desenvolvermos um modelo para detectar fraudes e ele classificar todas as instâncias como NÃO-FRAUDE, o modelo terá uma acurácia de 99%. No entanto, esse modelo não nos ajudará em nada.\n", + "\n", + "## Necessidade de se usar outras métricas\n", + "> Recomenda-se utilizar outras métricas (na verdade, é boa prática usar mais de uma métrica para medir a performance dos modelos) como, por exemplo, F1-Score, Precision, Recall/Sensitivity, Specificity e AUROC.\n", + "\n", + "## Como lidar com a amostra desbalanceada?\n", + "* Under-sampling\n", + "> Seleciona aleatoriamente instâncias da classe MAJORITÁRIA (em nosso exemplo, NÃO-FRAUDE) até o número de instâncias da classe MINORITÁRIA (FRAUDE);\n", + "\n", + "* Over-sampling\n", + "> Reamostra aleatoriamente a classe MINORITÁRIA (em nosso exemplo, FRAUDE) até o número de instâncias da classe MAJORITÁRIA (NÃO-FRAUDE), ou uma proporção da classe MAJORITÁRIA. 
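Como referência, um esboço hipotético de over-sampling (assumindo a biblioteca imbalanced-learn instalada e X_train/y_train já definidos):\n", + "\n", + "```\n", + "# !pip install imbalanced-learn\n", + "from imblearn.over_sampling import SMOTE\n", + "\n", + "sm = SMOTE(random_state=42)\n", + "# reamostrar somente a partição de treino, nunca a de teste\n", + "X_res, y_res = sm.fit_resample(X_train, y_train)\n", + "```\n", + "\n", + "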
Veja a bibliotea SMOTE (Synthetic Minority Over-Sampling Techniques);\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2o45zx8zw-aB" + }, + "source": [ + "## EFEITOS DA AMOSTRA DESBALANCEADA" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cCVTPCB-Xkbd" + }, + "source": [ + "# TPOT\n", + "https://towardsdatascience.com/tpot-automated-machine-learning-in-python-4c063b3e5de9" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "2ulXii6JXpWd" + }, + "source": [ + "" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_TWUq-z4X4yZ" + }, + "source": [ + "___\n", + "# FEATURETOOLS\n", + "https://medium.com/@rrfd/simple-automatic-feature-engineering-using-featuretools-in-python-for-classification-b1308040e183\n", + "\n", + "https://www.analyticsvidhya.com/blog/2018/08/guide-automated-feature-engineering-featuretools-python/\n", + "\n", + "https://mlwhiz.com/blog/2019/05/19/feature_extraction/\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "aZUNOgmSgAmq" + }, + "source": [ + "!pip install featuretools" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "_sxdONzsh9rb" + }, + "source": [ + "df_X.head()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "p5_ynGo1dBJJ" + }, + "source": [ + "df_X.shape" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "TqJRJXUhiDqf" + }, + "source": [ + "from random import choices\n", + "\n", + "df_X2= df_X.copy()\n", + "df_X2['tipo'] = choices(['A', 'B', 'C', 'D'], k = 1000)\n", + "df_X2['idade'] = np.random.randint(10, 15, size = 1000)\n", + "df_X2['id'] = range(0,1000)\n", + "df_X2.head()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "nR56bGGngk-W" + }, + "source": [ + "# Automated feature engineering\n", + "import featuretools as ft\n", + "import featuretools.variable_types as vtypes\n", + "\n", + "es= ft.EntitySet(id = 'simulacao')\n", + "\n", + "# adding a dataframe \n", + "es.entity_from_dataframe(entity_id = 'df_X2', dataframe = df_X2, index = 'id')\n", + "es" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "IOJ4Tr5Ogk6M" + }, + "source": [ + "es['df_X2'].variables" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "1uXPqHDZgkys" + }, + "source": [ + "variable_types = {'idade': vtypes.Categorical}\n", + " \n", + "es.entity_from_dataframe(entity_id = 'df_X2', dataframe = df_X2, index = 'id', variable_types= variable_types)\n", + "\n", + "es = es.normalize_entity(base_entity_id='df_X2', new_entity_id= 'tipo', index='id')\n", + "es = es.normalize_entity(base_entity_id='df_X2', new_entity_id= 'idade', index='id')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "dnbYTBqugkvm" + }, + "source": [ + "es" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "I2v_jetdgkr7" + }, + "source": [ + "feature_matrix, feature_names = ft.dfs(entityset=es, target_entity = 'df_X2', max_depth = 3, verbose = 3, n_jobs= 1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "zZiRBvHXgkoJ" + }, + "source": [ + "feature_matrix.head()" + ], + "execution_count": null, 
+ "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aWiahwKe2d6U" + }, + "source": [ + "# **EXERCÍCIOS**\n", + "> Encontre algoritmos adequados para ser aplicados aos seguintes problemas:" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XbSLkbDB2mzK" + }, + "source": [ + "## Exercício 1 - Credit Card Fraud Detection\n", + "Source: [Credit Card Fraud Detection](https://www.kaggle.com/mlg-ulb/creditcardfraud)\n", + "\n", + "### Leitura suporte\n", + "* [Detecting Credit Card Fraud Using Machine Learning](https://towardsdatascience.com/detecting-credit-card-fraud-using-machine-learning-a3d83423d3b8)\n", + "* [Credit Card Fraud Detection](https://towardsdatascience.com/credit-card-fraud-detection-a1c7e1b75f59)\n", + "\n", + "### Dataframe\n", + "* [Creditcard.csv](https://raw.githubusercontent.com/MathMachado/DataFrames/master/creditcard.csv)" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "sPNc6ouw2MRe" + }, + "source": [ + "import pandas as pd\n", + "import numpy as np" + ], + "execution_count": 2, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "RlGFLoPi2OFJ", + "outputId": "ec18dcc6-9703-4764-d781-9d2c5738c54f", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 224 + } + }, + "source": [ + "url= 'https://raw.githubusercontent.com/gersonhenz/DSWP/master/Dataframes/creditcard.csv'\n", + "df_cc = pd.read_csv(url)\n", + "df_cc.head()" + ], + "execution_count": 4, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
TimeV1V2V3V4V5V6V7V8V9V10V11V12V13V14V15V16V17V18V19V20V21V22V23V24V25V26V27V28AmountClass
00-1.359807-0.0727812.5363471.378155-0.3383210.4623880.2395990.0986980.3637870.090794-0.551600-0.617801-0.991390-0.3111691.468177-0.4704010.2079710.0257910.4039930.251412-0.0183070.277838-0.1104740.0669280.128539-0.1891150.133558-0.021053149.620.0
101.1918570.2661510.1664800.4481540.060018-0.082361-0.0788030.085102-0.255425-0.1669741.6127271.0652350.489095-0.1437720.6355580.463917-0.114805-0.183361-0.145783-0.069083-0.225775-0.6386720.101288-0.3398460.1671700.125895-0.0089830.0147242.690.0
21-1.358354-1.3401631.7732090.379780-0.5031981.8004990.7914610.247676-1.5146540.2076430.6245010.0660840.717293-0.1659462.345865-2.8900831.109969-0.121359-2.2618570.5249800.2479980.7716790.909412-0.689281-0.327642-0.139097-0.055353-0.059752378.660.0
31-0.966272-0.1852261.792993-0.863291-0.0103091.2472030.2376090.377436-1.387024-0.054952-0.2264870.1782280.507757-0.287924-0.631418-1.059647-0.6840931.965775-1.232622-0.208038-0.1083000.005274-0.190321-1.1755750.647376-0.2219290.0627230.061458123.500.0
42-1.1582330.8777371.5487180.403034-0.4071930.0959210.592941-0.2705330.8177390.753074-0.8228430.5381961.345852-1.1196700.175121-0.451449-0.237033-0.0381950.8034870.408542-0.0094310.798278-0.1374580.141267-0.2060100.5022920.2194220.21515369.990.0
\n", + "
" + ], + "text/plain": [ + " Time V1 V2 V3 ... V27 V28 Amount Class\n", + "0 0 -1.359807 -0.072781 2.536347 ... 0.133558 -0.021053 149.62 0.0\n", + "1 0 1.191857 0.266151 0.166480 ... -0.008983 0.014724 2.69 0.0\n", + "2 1 -1.358354 -1.340163 1.773209 ... -0.055353 -0.059752 378.66 0.0\n", + "3 1 -0.966272 -0.185226 1.792993 ... 0.062723 0.061458 123.50 0.0\n", + "4 2 -1.158233 0.877737 1.548718 ... 0.219422 0.215153 69.99 0.0\n", + "\n", + "[5 rows x 31 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 4 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "AWBaTtKL5NG5", + "outputId": "cf3c7d2d-a208-4de2-91da-549df5dc9fbc", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "df_cc.shape" + ], + "execution_count": 9, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(12842, 31)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 9 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "dpisbkhv3f_p", + "outputId": "175f9013-b169-4a73-8af7-828a83eeeb24", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 68 + } + }, + "source": [ + "df_cc['Class'].value_counts()" + ], + "execution_count": 7, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0.0 12785\n", + "1.0 56\n", + "Name: Class, dtype: int64" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 7 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "XJcK_gRh3uzD", + "outputId": "8c3fdbb7-30aa-4ea9-c11e-dc8db3033c6f", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "56/12785\n" + ], + "execution_count": 8, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0.004380132968322252" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 8 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "MDskpiqf4DHI" + }, + "source": [ + "# não precisa normalizar os campos neste exercício, pois o DECISION TREE não requer isso;\n", + "# aplicar as transformações (principais) e reestimar modelo\n", + "# qual o impacto das transformações? 
A conclusão mudou ou não?\n", + "\n" + ], + "execution_count": null, + "outputs": [] + }, + {
+ "cell_type": "code", + "metadata": { + "id": "6HHkLchN2-Wh", + "outputId": "54742978-b903-4404-8b69-9dd8e7926bc5", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 561 + } + }, + "source": [ + "df_cc.isna().sum()" + ], + "execution_count": 5, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "Time      0\n", + "V1        0\n", + "V2        0\n", + "V3        0\n", + "V4        0\n", + "V5        0\n", + "V6        0\n", + "V7        0\n", + "V8        0\n", + "V9        0\n", + "V10       1\n", + "V11       1\n", + "V12       1\n", + "V13       1\n", + "V14       1\n", + "V15       1\n", + "V16       1\n", + "V17       1\n", + "V18       1\n", + "V19       1\n", + "V20       1\n", + "V21       1\n", + "V22       1\n", + "V23       1\n", + "V24       1\n", + "V25       1\n", + "V26       1\n", + "V27       1\n", + "V28       1\n", + "Amount    1\n", + "Class     1\n", + "dtype: int64" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 5 + } + ] + }, + {
+ "cell_type": "code", + "metadata": { + "id": "qhObHtS63ecx", + "outputId": "c0265507-9ad7-4a11-c847-b82f952d66a7", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Remove as linhas com valores ausentes\n", + "df_cc2 = df_cc.dropna()\n", + "df_cc2.shape" + ], + "execution_count": 10, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(12841, 31)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 10 + } + ] + }, + {
+ "cell_type": "code", + "metadata": { + "id": "CBl1x6tM49iz" + }, + "source": [ + "# Definir as variáveis globais\n", + "i_CV = 10\n", + "i_Seed = 20111974\n", + "f_Test_Size = 0.3\n" + ], + "execution_count": 12, + "outputs": [] + }, + {
+ "cell_type": "code", + "metadata": { + "id": "CSuzy52r7zoY", + "outputId": "b1077f26-a1ca-494d-c4a4-662a481de491", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 197 + } + }, + "source": [ + "df_X = df_cc2.copy() # dataframe só com as colunas preditoras\n", + "df_X.drop(columns = ['Class'], inplace = True)\n", + "df_X.head()" + ], + "execution_count": null, + "outputs": [] + }, + {
+ "cell_type": "code", + "metadata": { + "id": "pjj8IffX7QAe" + }, + "source": [ + "df_X = df_cc2.drop(columns = ['Class']) # dataframe somente com as preditoras\n", + "df_y = df_cc2['Class'] # variável RESPOSTA\n" + ], + "execution_count": null, + "outputs": [] + }, + {
+ "cell_type": "code", + "metadata": { + "id": "ZdfL-K7z7ZB5" + }, + "source": [ + "" + ], + "execution_count": null, + "outputs": [] + }, + {
+ "cell_type": "code", + "metadata": { + "id": "W1ymC93v54ti" + }, + "source": [ + "from sklearn.model_selection import train_test_split\n", + "\n", + "# stratify preserva a proporção de fraudes nas partições de treino e teste\n", + "X_treinamento, X_teste, y_treinamento, y_teste = train_test_split(df_X, df_y, test_size = f_Test_Size, random_state = i_Seed, stratify = df_y)" + ], + "execution_count": null, + "outputs": [] + }, + {
+ "cell_type": "markdown", + "metadata": { + "id": "oYgK6JXd3MgA" + }, + "source": [ + "## Exercício 2 - Predicting species on IRIS dataset\n" + ] + }, + {
+ "cell_type": "code", + "metadata": { + "id": "si0rsJvu3O6O" + }, + "source": [ + "from sklearn import datasets\n", + "import xgboost as xgb\n", + "\n", + "iris = datasets.load_iris()\n", + "X_iris = iris.data\n", + "y_iris = iris.target" + ], + "execution_count": null, + "outputs": [] + }, + {
+ "cell_type": "markdown", + "metadata": { + "id": "zom8t4yWC_UC" + }, + "source": [ + "## Exercício 3 - Predict Wine Quality\n", + "> Estimar a qualidade dos vinhos, numa escala de 0–100. 
A seguir, a qualidade em função da escala:\n", + "\n", + "* 95–100 Classic: a great wine\n", + "* 90–94 Outstanding: a wine of superior character and style\n", + "* 85–89 Very good: a wine with special qualities\n", + "* 80–84 Good: a solid, well-made wine\n", + "* 75–79 Mediocre: a drinkable wine that may have minor flaws\n", + "* 50–74 Not recommended\n", + "\n", + "Source: [Wine Reviews](https://www.kaggle.com/zynicide/wine-reviews)" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "klL2Q9Ria96n" + }, + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn import datasets\n", + "\n", + "Wine = datasets.load_wine()\n", + "X_vinho = Wine.data\n", + "y_vinho = Wine.target" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lhVhSWBgGijq" + }, + "source": [ + "## Exercício 4 - Predict Parkinson\n", + "Source: https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SVCxHqv0VBJn" + }, + "source": [ + "## Exercício 5 - Predict survivors from Titanic tragedy\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "CwvB8us4eKNi" + }, + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import seaborn as sns\n", + "\n", + "df_titanic = sns.load_dataset('titanic')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZJrT9YIXVdtx" + }, + "source": [ + "## Exercício 6 - Predict Loan\n", + "> Os dados devem ser obtidos diretamente da fonte: [Loan Default Prediction - Imperial College London](https://www.kaggle.com/c/loan-default-prediction/data)\n", + "\n", + "* [Bank Loan Default Prediction](https://medium.com/@wutianhao910/bank-loan-default-prediction-94d4902db740)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "R8-GVu7ZWeA8" + }, + "source": [ + "## Exercício 7 - Predict the sales of a store.\n", + "* [Predicting expected sales for Bigmart’s stores](https://medium.com/diogo-menezes-borges/project-1-bigmart-sale-prediction-fdc04f07dc1e)\n", + "* Dataframes\n", + " * [Treinamento](https://raw.githubusercontent.com/MathMachado/DataFrames/master/Big_Mart_Sales_III_train.txt)\n", + " * [Validação](https://raw.githubusercontent.com/MathMachado/DataFrames/master/Big_Mart_Sales_III_test.txt)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fv9w86j4Wnwj" + }, + "source": [ + "## Exercício 8 - [The Boston Housing Dataset](https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html)\n", + "> Predict the median value of owner occupied homes." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "5HYRt8-ug1BT" + }, + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn import datasets\n", + "\n", + "Boston = datasets.load_boston()\n", + "X_boston = Boston.data\n", + "y_boston = Boston.target" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1UDIaqmtXQ0T" + }, + "source": [ + "## Exercício 9 - Predict the height or weight of a person.\n", + "\n", + "http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-7R146nIXmMT" + }, + "source": [ + "## Exercício 10 - Black Friday Sales Prediction - Predict purchase amount.\n", + "\n", + "This dataset comprises of sales transactions captured at a retail store. 
It’s a classic dataset to explore and expand your feature engineering skills and day to day understanding from multiple shopping experiences. This is a regression problem. The dataset has 550,069 rows and 12 columns.\n", + "\n", + "https://github.com/MathMachado/DataFrames/blob/master/blackfriday.zip\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mQ8FPbuLZlIh" + }, + "source": [ + "## Exercício 11 - Predict the income class of US population.\n", + "\n", + "http://archive.ics.uci.edu/ml/machine-learning-databases/census-income-mld/" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Af4NRrchgPlM" + }, + "source": [ + "## Exercício 12 - Predicting Cancer\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "c4LOlgZW3P40" + }, + "source": [ + "from sklearn import datasets\n", + "cancer = datasets.load_breast_cancer()\n", + "X_cancer = cancer.data\n", + "y_cancer = cancer.target" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "74PmpT8Ix0tD" + }, + "source": [ + "## Exercício 13\n", + "Source: [Complete Machine Learning Guide to Parameter Tuning in Gradient Boosting (GBM) in Python](https://www.analyticsvidhya.com/blog/2016/02/complete-guide-parameter-tuning-gradient-boosting-gbm-python/).\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WY8GZMixZ9W9" + }, + "source": [ + "## Exercício 14 - Predict Diabetes" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "y92t6tbOge0S" + }, + "source": [ + "from sklearn import datasets\n", + "Diabetes= datasets.load_diabetes()\n", + "\n", + "X_diabetes = Diabetes.data\n", + "y_diabetes = Diabetes.target" + ], + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file