From 30f1a4318dd4c882cf0197ba258b00dff9968e83 Mon Sep 17 00:00:00 2001 From: MariaJacobs70 <72224154+MariaJacobs70@users.noreply.github.com> Date: Tue, 6 Oct 2020 17:29:22 -0300 Subject: [PATCH 1/9] Criado usando o Colaboratory --- ...247\303\265es maria aula 06-10-2020.ipynb" | 5789 +++++++++++++++++ 1 file changed, 5789 insertions(+) create mode 100644 "Notebooks/NB02__Numpy - altera\303\247\303\265es maria aula 06-10-2020.ipynb" diff --git "a/Notebooks/NB02__Numpy - altera\303\247\303\265es maria aula 06-10-2020.ipynb" "b/Notebooks/NB02__Numpy - altera\303\247\303\265es maria aula 06-10-2020.ipynb" new file mode 100644 index 000000000..5fe378ba1 --- /dev/null +++ "b/Notebooks/NB02__Numpy - altera\303\247\303\265es maria aula 06-10-2020.ipynb" @@ -0,0 +1,5789 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "NB02__Numpy.ipynb", + "provenance": [], + "collapsed_sections": [ + "n8BIbzQbNWUo", + "7eS94uQ4NhVR", + "SYOgJpGYVLUu", + "CaHFxk98W5if", + "ReWUyWiHXCnc", + "CqszHxaKHr2h", + "tXgF1Wl9gHKY", + "Fotx7XUquAo8", + "36kmLUYDvsUI", + "SWO2GdNovxAp", + "vpN54l4vxze5", + "u4HOf9SNytSq", + "6BQ9oZiD9hg5", + "tz5-QdrX9vct", + "p1muBgMX8NK4", + "FxTC2-U88ajk", + "z8EYn0pP25Rh" + ], + "include_colab_link": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "accelerator": "GPU" + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6QhLXoatkvKR" + }, + "source": [ + "

NUMPY

\n", + "\n", + "> NumPy is a Python package for scientific computing and linear algebra.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "b8EZupp68vW8" + }, + "source": [ + "# **AGENDA**:\n", + "> In this chapter, we will cover the following topics:\n", + "\n", + "* NumPy\n", + "* Creating arrays\n", + "* Creating multidimensional arrays\n", + "* Selecting items\n", + "* Applying functions such as max(), min(), etc.\n", + "* Computing descriptive statistics: mean and variance\n", + "* Reshaping\n", + "* Transpose of an array\n", + "* Eigenvalues and eigenvectors\n", + "* Wrap Up\n", + "* Exercises" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cO5t3xCO8kyK" + }, + "source": [ + "___\n", + "# **NOTES AND REMARKS**\n", + "\n", + "* Our focus with NumPy is to make it easier to use Pandas." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "z2IFUG4GSB0Z" + }, + "source": [ + "___\n", + "# **CHEATSHEET**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jYLeDVH-SNCg" + }, + "source": [ + "![Numpy](https://github.com/MathMachado/Materials/blob/master/numpy_basics-1.png?raw=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0mKvExmgUFOk" + }, + "source": [ + "# **SCALARS, VECTORS, MATRICES AND TENSORS**\n", + "\n", + "![Tensor](https://github.com/MathMachado/Materials/blob/master/tensor.png?raw=true)\n", + "\n", + "Source: [PyTorch for Deep Learning: A Quick Guide for Starters](https://towardsdatascience.com/pytorch-for-deep-learning-a-quick-guide-for-starters-5b60d2dbb564)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "o00pYRIkXiAU" + }, + "source": [ + "## Import Statement - First examples\n", + "> As an example, consider drawing a random sample of size 10 from the Normal(0, 1) distribution:" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "l_XuvcUDWNDk" + }, + "source": [ + "## Importing the NumPy library" + ] + }, + { + "cell_type": "markdown", + 
"metadata": { + "id": "am_ZTIGaapCo" + }, + "source": [ + "### **Option 1**: Import the NumPy library WITH an alias" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "b4irLw6BWVVZ" + }, + "source": [ + "import numpy as np # NM added a comment on this line!" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "JK54ga7dXnJu", + "outputId": "1a31527c-f8b6-44d5-ecbd-9f08abc5f8d6", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 50 + } + }, + "source": [ + "# Set the number of decimal places NumPy prints:\n", + "np.set_printoptions(precision = 2, suppress = True)\n", + "\n", + "'''\n", + "Set the seed for reproducibility, that is,\n", + "to ensure that we all generate the same random numbers\n", + "'''\n", + "np.random.seed(seed = 20111974)\n", + "\n", + "# Generate 10 random numbers from the Normal(media, desvio_padrao) distribution\n", + "media = 0\n", + "desvio_padrao = 1\n", + "a_conjunto1 = np.random.normal(media, desvio_padrao, size = 10) # 1D array with size = 10\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([ 2.51, 1.11, 2.06, 0.56, 0.3 , 1.05, -0.13, 1.06, 1.14,\n", + " 1.38])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 2 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3-0934isZUm6" + }, + "source": [ + "**Note**: Change the value of [precision] to 4, 2 and 0 and observe what happens." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9ob_8S_bYYa2" + }, + "source": [ + "### **Option 2**: Import the NumPy library WITHOUT an alias" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "NcGd1ho_XDXU" + }, + "source": [ + "import numpy" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "zFYH6J5-Ydjl" + }, + "source": [ + "# Set the number of decimal places NumPy prints:\n", + "numpy.set_printoptions(precision = 2, suppress = True)\n", + "\n", + "'''\n", + "Set the seed for reproducibility, that is,\n", + "to ensure that we all generate the same random numbers\n", + "'''\n", + "numpy.random.seed(seed = 20111974)\n", + "\n", + "# Generate 10 random numbers from the Normal(media, desvio_padrao) distribution\n", + "media = 0\n", + "desvio_padrao = 1\n", + "numpy.random.normal(media, desvio_padrao, size = 10)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AwWSzYrZWfvA" + }, + "source": [ + "### **Option 3**: Import specific functions from the NumPy library" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "bfYJzcqRa5eu" + }, + "source": [ + "from numpy import set_printoptions\n", + "from numpy.random import seed, normal" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Xj6fbpvubH_p" + }, + "source": [ + "# Set the number of decimal places NumPy prints:\n", + "set_printoptions(precision = 2, suppress = True)\n", + "\n", + "'''\n", + "Set the seed for reproducibility, that is,\n", + "to ensure that we all generate the same random numbers\n", + "'''\n", + "seed(seed = 20111974)\n", + "\n", + "# Generate 10 random numbers from the Normal(media, desvio_padrao) distribution\n", + "media = 0\n", + "desvio_padrao = 1\n", + "normal(media, desvio_padrao, size = 10)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { 
+ "id": "00RerJPChnuP" + }, + "source": [ + "___\n", + "# **Descriptive Statistics with NumPy**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Qa6ro1VJlShd" + }, + "source": [ + "## Example 1\n", + "> Let us go back to the previous example, this time using option 1 (with an alias):\n", + "\n", + "* Draw a random sample of size 10 from the Normal(0, 1) distribution." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "31dSBU8khvFk" + }, + "source": [ + "# Set the number of decimal places NumPy prints:\n", + "np.set_printoptions(precision = 2, suppress = True)\n", + "\n", + "# Set the seed\n", + "np.random.seed(seed = 20111974)\n", + "\n", + "# Generate 10 random numbers from the Normal(media, desvio_padrao) distribution\n", + "media = 0\n", + "desvio_padrao = 1\n", + "\n", + "# np.random is NumPy's random sampling module\n", + "a_conjunto1 = np.random.normal(media, desvio_padrao, size = 10) # 1D array with size = 10\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wa2t0P3nevTh" + }, + "source": [ + "Checking the mean and standard deviation of the generated array:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "drUyk3f5ekDq" + }, + "source": [ + "f'Distribution N({np.mean(a_conjunto1)}, {np.std(a_conjunto1)})'" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XSp7Hd-Gib67" + }, + "source": [ + "We were expecting mean = 0 and sigma = 1, right? Why did that not happen?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HP_8VSgygXOF" + }, + "source": [ + "## **Lab 1**\n", + "> Change the value of [size] to 100, 1,000, 10,000, 100,000 and 1,000,000 and report what happens to the mean and standard deviation." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4TbmVbdcg6iU" + }, + "source": [ + "## **My solution**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "-qdiqBVHg-gd" + }, + "source": [ + "# Set the mean and the standard deviation\n", + "media = 0\n", + "desvio_padrao = 1\n", + "\n", + "# Set the seed\n", + "np.random.seed(seed = 20111974)\n", + "l_lista_conjunto = [10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000]\n", + "\n", + "for i_size in l_lista_conjunto:\n", + "    a_conjunto1 = np.random.normal(media, desvio_padrao, size = i_size)\n", + "    print(f'Size: {i_size} --> Distribution: N({np.mean(a_conjunto1)}, {np.std(a_conjunto1)})')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bp-YuviQwWqE" + }, + "source": [ + "Regarding the Normal($\\mu, \\sigma$) distribution, we have:\n", + "\n", + "![NormalDistribution](https://github.com/MathMachado/Materials/blob/master/NormalDistribution.PNG?raw=true)\n", + "\n", + "Source: [Normal Distribution](https://towardsdatascience.com/understanding-the-68-95-99-7-rule-for-a-normal-distribution-b7b7cbf760c2)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KwHBY3Enk04N" + }, + "source": [ + "## Strong Law of Large Numbers - SLLN\n", + "> Please read what the [Law of large numbers](https://en.wikipedia.org/wiki/Law_of_large_numbers) says. --> 3 minutes.\n", + "\n", + "* What did you learn from it?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BhwmSkAjlszT" + }, + "source": [ + "## Example 2\n", + "> Let us dig a little deeper into what the SLLN says. To do so, we will simulate rolling a die. As we know, a die has 6 faces numbered 1 to 6, each with equal probability, right?\n", + "\n", + "The SLLN tells us that as N (the sample size, here the number of rolls) grows, the mean of the rolls converges to the expected value. 
That is:\n", + "\n", + "$$\\frac{1+2+3+4+5+6}{6}= \\frac{21}{6}= 3.5$$\n", + "\n", + "In other words, as N (the sample size) grows, we expect the mean of the rolls to approach 3.5. OK?\n", + "\n", + "Let us see whether this is true..." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-QcJXf6roj0D" + }, + "source": [ + "We will use np.random.randint (the randint function defined in the np.random module), as follows:" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A2u0RzLOrRE2" + }, + "source": [ + "What does the result below mean, and how should we interpret it?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "B3-X_VBerUfa" + }, + "source": [ + "# Import NumPy and set the seed\n", + "import numpy as np\n", + "np.random.seed(seed = 20111974)\n", + "\n", + "# Simulate 100 rolls of a die:\n", + "a_dados_simulados = np.random.randint(1, 7, size = 100)\n", + "a_dados_simulados" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "m8Of2MMIrbF3" + }, + "source": [ + "# Import pandas, since we need the Series.value_counts() method (pd.value_counts() is deprecated in recent pandas):\n", + "import pandas as pd\n", + "pd.Series(a_dados_simulados).value_counts()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "54VwED8Br8rx" + }, + "source": [ + "**Interpretation**: We simulated rolling a die 100 times. The output above shows how often each face came up.\n", + "\n", + "I was expecting roughly the same frequency for each face, that is, around 16 or 17. In other words:\n", + "\n", + "$$\\frac{100}{6} \\approx 16.67$$\n", + "\n", + "But fine, let us carry on with our experiment..." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "HT_Dak-umC6I" + }, + "source": [ + "# Set the seed\n", + "np.random.seed(20111974)\n", + "\n", + "for i_size in [10, 30, 50, 75, 100, 1000, 10000, 100000, 1000000]:\n", + "    a_dados_simulados = np.random.randint(1, 7, size = i_size)\n", + "    print(f'Size: {i_size} --> Mean: {np.mean(a_dados_simulados)}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "edWNNOnXtbtd" + }, + "source": [ + "And now, how do you interpret these results?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eL6gXThkYcSf" + }, + "source": [ + "## Computing percentiles\n", + "> Boxplot" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jlGOQfXfPf0D" + }, + "source": [ + "![BoxPlot](https://github.com/MathMachado/Materials/blob/master/boxplot.png?raw=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "grtEXG2BoNRt" + }, + "source": [ + "Consider the following array of (simulated) returns:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "DjPKKq01YjF9" + }, + "source": [ + "import numpy as np\n", + "np.random.seed(20111974)\n", + "\n", + "# Simulate returns of financial assets with the Normal(0, 1) distribution:\n", + "a_retornos = np.random.normal(0, 1, 100)\n", + "print(f'Mean: {np.mean(a_retornos)}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "ajjlfqgssLVO" + }, + "source": [ + "a_retornos" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XZ3m06gv9lei" + }, + "source": [ + "Next, the boxplot of the array a_retornos:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "QtuwJP449tBQ" + }, + "source": [ + "# Import seaborn, one of the main data visualization libraries (another is matplotlib)\n", + "import seaborn as sns\n", + "\n", + "sns.boxplot(y = a_retornos)" 
+ ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "o9ujdjxNY6qE" + }, + "source": [ + "# We will use the function np.percentile(array, q = [p1, p2, p3, ..., p99])\n", + "percentis = np.percentile(a_retornos, q = [1, 5, 25, 50, 55, 75, 99])\n", + "\n", + "# First quartile (the 25th percentile is at index 2 of q)\n", + "q1 = percentis[2]" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c75g2Egco2lc" + }, + "source": [ + "At which position of the percentis array is Q3?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "nZr-A82Zo8Kb" + }, + "source": [ + "q3 = percentis[5]\n", + "\n", + "# or, counting from the end of the array:\n", + "q3_2 = percentis[-2]\n", + "print(q3, q3_2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "sWrnESPQT4JM" + }, + "source": [ + "# lim_inferior and lim_superior: lower and upper limits for outlier detection\n", + "lim_inferior = q1 - 1.5 * (q3 - q1)\n", + "lim_superior = q3 + 1.5 * (q3 - q1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Yb4-ZJlUUYsi" + }, + "source": [ + "f'Lower limit: {lim_inferior}; Upper limit: {lim_superior}'" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Jr6oXIHlUxOe" + }, + "source": [ + "np.min(a_retornos)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "UxE47cN0U54X" + }, + "source": [ + "np.max(a_retornos)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OTB9HnIac499" + }, + "source": [ + "___\n", + "# **Sorting the items of an array**\n", + "> Consider the following array:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Jgj8Yw46dBMx" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.random.random(10)\n", + 
"a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cC9272GFdRln" + }, + "source": [ + "Sorting the items of a_conjunto1..." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "YUP90nBVdUeF" + }, + "source": [ + "np.sort(a_conjunto1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lG763cDGj-yB" + }, + "source": [ + "___\n", + "# **Getting help**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ehxPlD3EkEYL" + }, + "source": [ + "help(np.random.normal)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1Q_konJVaBsV" + }, + "source": [ + "___\n", + "# **Creating 1D arrays**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "DddZT5kadYJ7" + }, + "source": [ + "import numpy as np\n", + "np.set_printoptions(precision = 2, suppress = True)\n", + "np.random.seed(seed = 20111974)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jaqd-VnF3yIt" + }, + "source": [ + "Create the 1D array a_conjunto1 with the following numbers:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "E3niz_zHaF3e" + }, + "source": [ + "a_conjunto1 = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DyfXbW_ZKJBS" + }, + "source": [ + "How many dimensions does a_conjunto1 have?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "gbHlydALKB3R" + }, + "source": [ + "# Number of dimensions of the array\n", + "a_conjunto1.ndim" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "am9otElpKNPa" + }, + "source": [ + "What is the shape of the array a_conjunto1?" 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "juJJ74d2wale" + }, + "source": [ + "# Number of items along each dimension of the array\n", + "a_conjunto1.shape" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BHg4Rre3GwPy" + }, + "source": [ + "The array a_conjunto1 could also have been created with the function np.arange(start, stop, step):" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "I3fyusN7G5Zn" + }, + "source": [ + "# Remember that the stop value, 10, is exclusive.\n", + "a_conjunto2 = np.arange(start = 0, stop = 10, step = 1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IHCEpmUxXsaK" + }, + "source": [ + "Another alternative is np.linspace(start = 0, stop = 9, num = 10). See below:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "JB9Y_x3RX1GX" + }, + "source": [ + "# With np.linspace, the stop value, 9, is inclusive.\n", + "a_conjunto3 = np.linspace(0, 9, 10)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "P6MR8MPeYOZm" + }, + "source": [ + "Compare the results of a_conjunto1, a_conjunto2 and a_conjunto3 below:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "tWEzge6HYSFu" + }, + "source": [ + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "lUNlFVKYYT9f" + }, + "source": [ + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Xo8Lid5fYVPW" + }, + "source": [ + "a_conjunto3" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "V9aW7C4vHAcF" + }, + "source": [ + "In other words, a_conjunto1, a_conjunto2 and a_conjunto3 hold the same values (although np.linspace returns floats rather than integers). 
OK?\n", + "\n", + "**ATTENTION**: Note that the syntax used to create a_conjunto3 is slightly different from the syntax used to create a_conjunto1 and a_conjunto2. Below is the syntax of np.linspace:\n", + "\n", + "![](https://github.com/MathMachado/Materials/blob/master/linspace_sintaxe.PNG?raw=true)\n", + "\n", + "Source: [HOW TO USE THE NUMPY LINSPACE FUNCTION](https://www.sharpsightlabs.com/blog/numpy-linspace/)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KNnwZa3uvYqE" + }, + "source": [ + "Add 2 to each item of a_conjunto1:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Jt2KVyviw0bp" + }, + "source": [ + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "arROkhWXbdTW" + }, + "source": [ + "a_conjunto2 = a_conjunto1 + 2\n", + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZJx2vG86vdVi" + }, + "source": [ + "Multiply each item of a_conjunto1 by 10:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Vm7abO6Ebkun" + }, + "source": [ + "a_conjunto1 = a_conjunto1*10\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0Ev1xnBwaYJG" + }, + "source": [ + "___\n", + "# **Creating Multidimensional Arrays**\n", + "> A 2D array, for example, is what we call a matrix." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gHaeAug5vjjd" + }, + "source": [ + "Create an array with 2 rows and 3 columns filled with random numbers:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "VDi0vIPSYR4F" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.random.randn(2, 3)\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DIdd-nA3tJjV" + }, + "source": [ + "## Shape of an array\n", + "> The shape is the number of rows and columns of the matrix." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "pKvjjnkrK-v7" + }, + "source": [ + "a_conjunto1.shape" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-DHS5jXELCfa" + }, + "source": [ + "a_conjunto1 is a 2D array (a matrix) with 2 rows, each row holding 3 elements." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HJI6X1wvv4Bg" + }, + "source": [ + "Create an array with 3 rows and 3 columns:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "hXPbWh3Tv26T" + }, + "source": [ + "a_conjunto2 = np.array([[1, 2, 3],[4, 5, 6],[7, 8, 9]])\n", + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "we6ZJOICc7bQ" + }, + "source": [ + "# Number of rows and columns of a_conjunto1:\n", + "a_conjunto1.shape" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "f0ocwuI1dED6" + }, + "source": [ + "# Number of rows and columns of a_conjunto2\n", + "a_conjunto2.shape" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "CApPtnW0YuRP" + }, + "source": [ + "# Add 2 to each element of a_conjunto2\n", + "a_conjunto2 = a_conjunto2+2\n", + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + 
"metadata": { + "id": "M87aGmxRY3RW" + }, + "source": [ + "# Multiply each element of a_conjunto2 by 10\n", + "a_conjunto2 = a_conjunto2*10\n", + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qZt93y1IL_v7" + }, + "source": [ + "___\n", + "# **Copying arrays**\n", + "> Consider the array below:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "sH2FTXj5MRRC" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.random.randn(2, 3)\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VtgKeMt6MYrr" + }, + "source": [ + "Making a copy of a_conjunto1..." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "K0hOHR3IMa-o" + }, + "source": [ + "a_salarios_copia = a_conjunto1.copy()\n", + "a_salarios_copia" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lFpmcR0HkCar" + }, + "source": [ + "___\n", + "# **Operations with arrays**\n", + "> Consider an array of temperatures in Fahrenheit given by:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "VnagcUqVkLhW" + }, + "source": [ + "# Set the seed\n", + "np.random.seed(20111974)\n", + "\n", + "a_temperatura_farenheit = np.array(np.random.randint(0, 100, 10))\n", + "a_temperatura_farenheit " + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "VrjNKfXxk1yv" + }, + "source": [ + "type(a_temperatura_farenheit)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "o1STejhrk0kZ" + }, + "source": [ + "Converting the temperature from Fahrenheit to Celsius..." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "E_jXflR_lNy3" + }, + "source": [ + "a_temperatura_celsius = 5*a_temperatura_farenheit/9 - 5*32/9\n", + "a_temperatura_celsius" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "U4pCv0pNqPZI" + }, + "source": [ + "# The same result, written in a different way:\n", + "a_temperatura_celsius = (5/9)*a_temperatura_farenheit - (160/9)\n", + "a_temperatura_celsius" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1UT4YD2FawUA" + }, + "source": [ + "___\n", + "# **Selecting items**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "pqOv8P1za1m8" + }, + "source": [ + "# Select the second item of a_conjunto1 (remember that indexing starts at 0 in Python; a_conjunto1 is 2D here, so this is its second row)\n", + "a_conjunto1[1]" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TIwVKk6AyRv6" + }, + "source": [ + "Given a_conjunto2 below:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "zoDmbXo6bCeu" + }, + "source": [ + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iJXSPp-0yb4w" + }, + "source": [ + "... select the item in row 2, column 3 of the array a_conjunto2:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "sJiVfnlzcjRv" + }, + "source": [ + "a_conjunto2[1, 2]" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Xl5HwJIMcv2e" + }, + "source": [ + "# Select the last element of a_conjunto1 --> Remember that a_conjunto1 is a 2D array, so this returns its last row!\n", + "a_conjunto1[-1]" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ezTH0HsyrnAl" + }, + "source": [ + "See..." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "OBv9EM54rYX3" + }, + "source": [ + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Po3WLFC-rod8" + }, + "source": [ + "a_temperatura_celsius[-1]" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4qJJ2HCedW4h" + }, + "source": [ + "___\n", + "# **Applying functions such as max(), min(), etc.**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "_meTJdUsda4e" + }, + "source": [ + "f'The maximum of a_conjunto1 is: {np.max(a_conjunto1)}'" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "m-wiBkAidnhN" + }, + "source": [ + "f'The minimum of a_conjunto1 is: {np.min(a_conjunto1)}'" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "lmupnRHQdtwh" + }, + "source": [ + "f'The maximum of a_conjunto2 is: {np.max(a_conjunto2)}'" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "H2z7oB6Bd786" + }, + "source": [ + "f'The maximum of each ROW of a_conjunto2 is: {np.max(a_conjunto2, axis = 1)}' # axis = 1 reduces across the columns, giving one maximum per row" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "gj2ZBDsWeMyk" + }, + "source": [ + "f'The maximum of each COLUMN of a_conjunto2 is: {np.max(a_conjunto2, axis = 0)}' # axis = 0 reduces across the rows, giving one maximum per column" 
+ ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7_tEfm2IecIU" + }, + "source": [ + "___\n", + "# **Computing descriptive statistics: mean and variance**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "lIY5jx3ueh7q" + }, + "source": [ + "f'The mean of a_conjunto1 is: {np.mean(a_conjunto1)}'" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "VmqSELRReuAW" + }, + "source": [ + "f'The mean of a_conjunto2 is: {np.mean(a_conjunto2)}'" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Gxap-Wg5e2_H" + }, + "source": [ + "f'The standard deviation of a_conjunto2 is: {np.std(a_conjunto2)}'" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "R0GcljGtfBvP" + }, + "source": [ + "___\n", + "# **Reshaping**\n", + "> Very useful in Machine Learning." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vfEmw01j8zux" + }, + "source": [ + "## Example 1\n", + "* The array a_conjunto2 has the following form:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "-Lb3VZCCfK_a" + }, + "source": [ + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "YWN_nN-4fD7u" + }, + "source": [ + "# Reshape into 9 rows and 1 column:\n", + "a_conjunto2.reshape(9, 1) # a_conjunto2.reshape(9,-1) gives the same result." + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "id9ILRRt7SwY" + }, + "source": [ + "## One more Reshape example\n", + "> Given the 1D array below, reshape it into a 2D array with 2 columns." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "9RA9Ht2b7Swd", + "outputId": "eadedfd5-fd6c-49c8-db5c-6f8f30d45f36", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Set the seed\n", + "np.random.seed(20111974)\n", + "a_conjunto1 = np.array(np.random.randint(1, 10, size = 15))\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([9, 9, 3, 9, 2, 9, 1, 5, 3, 1, 9, 4, 8, 2, 4])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 19 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8KxR4xZT7cRv" + }, + "source": [ + "### Solution\n", + "> We have 15 elements in a_conjunto1 to build (\"reshape\") a 2D array with 2 columns.\n", + "\n", + "At first glance, the solution would be..." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "VMdHl1Il7wLw", + "outputId": "d51c7263-f523-4af8-9606-ee93cab66f1c", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 163 + } + }, + "source": [ + "a_conjunto1.reshape(-1, 2) # The value \"-1\" in the rows position asks NumPy to work out the number of rows automatically." 
+ ], + "execution_count": null, + "outputs": [ + { + "output_type": "error", + "ename": "ValueError", + "evalue": "ignored", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0ma_numeros1\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreshape\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# O valor \"-1\" na posição das linhas pede ao NumPy para calcular o número de linhas automaticamente.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mValueError\u001b[0m: cannot reshape array of size 15 into shape (2)" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pZS4b4-y708q" + }, + "source": [ + "Why do we get this error?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4disywvR8HeH" + }, + "source": [ + "What if we do..." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "3oEAAXTp8I7Z", + "outputId": "e8c8a90f-c34a-4304-d9b4-fd7f04ce224f", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Set the seed\n", + "np.random.seed(20111974)\n", + "a_conjunto1 = np.array(np.random.randint(1, 10, size = 16)) # Note that we now have 16 elements\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([9, 9, 3, 9, 2, 9, 1, 5, 3, 1, 9, 4, 8, 2, 4, 3])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 21 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iUhth0QV8Rpt" + }, + "source": [ + "Reshaping..."
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "9D1y7uD88Qip", + "outputId": "e7d22bcd-c10f-4ea3-e41b-03f6f98a054f", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 153 + } + }, + "source": [ + "a_conjunto1.reshape(-1, 2) # The value \"-1\" in the row position asks NumPy to compute the number of rows automatically." + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[9, 9],\n", + " [3, 9],\n", + " [2, 9],\n", + " [1, 5],\n", + " [3, 1],\n", + " [9, 4],\n", + " [8, 2],\n", + " [4, 3]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 22 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ALh-sq7DMnN5", + "outputId": "db373349-7910-4f1f-93f3-8ac8f67da8b8", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 153 + } + }, + "source": [ + "# OR --> In this case, we are reshaping the array into 8 rows and 2 columns\n", + "a_conjunto1.reshape(8, -1)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[9, 9],\n", + " [3, 9],\n", + " [2, 9],\n", + " [1, 5],\n", + " [3, 1],\n", + " [9, 4],\n", + " [8, 2],\n", + " [4, 3]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 26 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yvTnrszn8Yk0" + }, + "source": [ + "Why did it work this time?"
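+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A minimal sketch of one way to answer the question above, added as an illustration and not taken from the original lesson: `reshape(-1, 2)` asks NumPy to infer the number of rows as `size / 2`, so it can only succeed when the number of elements is divisible by 2. The names `a_quinze`, `a_dezesseis`, `b_falhou` and `t_forma` are hypothetical and exist only in this example."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "# Hypothetical arrays for illustration only (not the seeded arrays used above)\n",
+ "a_quinze = np.arange(15) # 15 elements: 15 % 2 == 1\n",
+ "a_dezesseis = np.arange(16) # 16 elements: 16 % 2 == 0\n",
+ "\n",
+ "try:\n",
+ "    a_quinze.reshape(-1, 2) # no integer number of rows satisfies rows * 2 == 15\n",
+ "    b_falhou = False\n",
+ "except ValueError as erro:\n",
+ "    b_falhou = True\n",
+ "    print(erro)\n",
+ "\n",
+ "t_forma = a_dezesseis.reshape(-1, 2).shape # NumPy infers 16 / 2 = 8 rows\n",
+ "print(b_falhou, t_forma)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The same divisibility rule applies to any target shape: the product of the requested dimensions must equal the number of elements in the array."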
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LeQ9LqIE8baG" + }, + "source": [ + "## Final reshape example\n", + "> Consider the following array:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "OQOC9iiN8hZT" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.random.randn(2, 3)\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Cvce8qBl9Cvq" + }, + "source": [ + "We now want to turn it into an array with 3 rows and 2 columns." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "QDDsYoVt9Klz" + }, + "source": [ + "a_conjunto1.reshape(-1, 2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AdwU5ygt9Svq" + }, + "source": [ + "It could be..." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "5uBeokKc9Uo-" + }, + "source": [ + "a_conjunto1.reshape(3, -1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OeRBsobc9aKj" + }, + "source": [ + "And finally, it could also be..." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "MDt8UYYH9dBw" + }, + "source": [ + "a_conjunto1.reshape(3, 2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "91o5vycQfdKW" + }, + "source": [ + "___\n", + "# **Transpose**\n", + "* The array a_conjunto2 has the following form:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "RsZwyuhoffjb" + }, + "source": [ + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "A3MzTVoGfiyO" + }, + "source": [ + "# The transpose of the array a_conjunto2 is given by:\n", + "a_conjunto2.T" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ij-ZW5IyzXIb" + }, + "source": [ + "In other words, rows became columns. 
Ok?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qLy6ajgpt3lU" + }, + "source": [ + "# **Inverse of a square matrix**\n", + "> If a matrix is non-singular, then its inverse exists.\n", + "\n", + "* If the determinant of a matrix is different from 0, then the matrix is non-singular." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "-u7jRq34t9_x" + }, + "source": [ + "import numpy as np\n", + "\n", + "a_conjunto1 = np.array([[1, 2, 3],[4, 5, 6],[7, 8, 9]])\n", + "a_conjunto2 = np.array([[6, 2], [5, 3]])\n", + "a_conjunto3 = np.array([[1, 3, 5],[2, 5, 1],[2, 3, 8]])" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "7zmHHWWlvaYB" + }, + "source": [ + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "3fHKyhOJvcak" + }, + "source": [ + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "vQG7yyfjwLg9" + }, + "source": [ + "a_conjunto3" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qa2Yre2rwgRk" + }, + "source": [ + "## Determinants of square matrices" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "N6jwuC6twkyc" + }, + "source": [ + "np.linalg.det(a_conjunto1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "QSvViNwzwnhI" + }, + "source": [ + "np.linalg.det(a_conjunto2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "o8jwsnccw5id" + }, + "source": [ + "np.linalg.det(a_conjunto3)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kkVaTgzgw_XJ" + }, + "source": [ + "Next, we compute the inverses of the matrices defined above..."
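+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Before inverting, it can help to check whether each matrix is actually invertible. The cell below is a minimal sketch added for illustration, not part of the original lesson: it assumes that testing for full rank with `np.linalg.matrix_rank` is an acceptable stand-in for the determinant rule stated above. The names `A_singular`, `A_regular` and `e_invertivel` are hypothetical."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "# Hypothetical names; the values repeat two of the matrices defined above\n",
+ "A_singular = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\n",
+ "A_regular = np.array([[6, 2], [5, 3]])\n",
+ "\n",
+ "def e_invertivel(M):\n",
+ "    # A square matrix is non-singular exactly when it has full rank\n",
+ "    return np.linalg.matrix_rank(M) == M.shape[0]\n",
+ "\n",
+ "b_singular = e_invertivel(A_singular)\n",
+ "b_regular = e_invertivel(A_regular)\n",
+ "\n",
+ "# The floating-point determinant may come out as a tiny non-zero number,\n",
+ "# so it is compared with 0 using a tolerance instead of ==\n",
+ "b_det_zero = np.isclose(np.linalg.det(A_singular), 0.0)\n",
+ "print(b_singular, b_regular, b_det_zero)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Comparing a floating-point determinant with `== 0` is fragile because of rounding, which is why this sketch relies on the rank and on `np.isclose`."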
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "b9FgWvTYvpik" + }, + "source": [ + "np.linalg.inv(a_conjunto2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "KsdEt1kIvsM_" + }, + "source": [ + "np.linalg.inv(a_conjunto1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VA_F7_7kccpn" + }, + "source": [ + "Why doesn't a_conjunto1 have an inverse?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ANPBCnmVwOf4" + }, + "source": [ + "np.linalg.inv(a_conjunto3)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XAf9k1egxcdF" + }, + "source": [ + "# **Solving systems of linear equations**\n", + "> Consider the system of linear equations below:\n", + "\n", + "\\begin{equation}\n", + "x + 3y + 5z = 10\\\\\n", + "2x + 5y + z = 8 \\\\\n", + "2x + 3y + 8z = 3\n", + "\\end{equation}\n", + "\n", + "Or $Ax = b$. The solution of this system of equations is given by $x = A^{-1}b$." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oNf5nqaLxhBY" + }, + "source": [ + "In other words, we just need to find the inverse of A and multiply it by b." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "omzC5dGA0btc" + }, + "source": [ + "A= np.array([[1, 3, 5], [2, 5, 1], [2, 3, 8]])\n", + "np.linalg.inv(A)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AiXI3oxB05iE" + }, + "source": [ + "Now we just multiply the inverse matrix $A^{-1}$ above by b. 
" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "XoGebKDa2Fcd" + }, + "source": [ + "A_Inv = np.linalg.inv(A)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "sKaP0a1QZG-P" + }, + "source": [ + "b= np.array([10, 8, 3]).reshape(3, -1)\n", + "b" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "3dAVq8dg19VI" + }, + "source": [ + "A_Inv.dot(b)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zso6hTnB17cm" + }, + "source": [ + "An easier way to do this is to use the expression below:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ptQHIVll1E4P" + }, + "source": [ + "b= np.array([[10], [8], [3]])\n", + "b" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "X4VL8lyY1Xus" + }, + "source": [ + "np.linalg.solve(A, b)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fJKmwTS59-Bc" + }, + "source": [ + "# **Stacking arrays**\n", + "\n", + "## Example 1\n", + "\n", + "![Empilhar1](https://github.com/MathMachado/Materials/blob/master/Empilhar1.PNG?raw=true)\n", + "\n", + "## Example 2\n", + "\n", + "![Empilhar2](https://github.com/MathMachado/Materials/blob/master/Empilhar2.PNG?raw=true)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rhPTt3EwXden" + }, + "source": [ + "## Generate the arrays for Example 1" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "zEI-yBy3-E46" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.random.randn(5, 8)\n", + "\n", + "np.random.seed(19741120)\n", + "a_conjunto2 = np.random.randn(8, 8)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UYsAqBRp--79" + }, + "source": [ + "## Method 1 - np.concatenate([A, B])" + ] + }, + { 
+ "cell_type": "code", + "metadata": { + "id": "HgO1ujvhObyE", + "outputId": "c40e7ed9-255b-4886-dddf-3b17f2b1be2f", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 187 + } + }, + "source": [ + "a_conjunto1" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[ 2.5062768 , 1.11440422, 2.05565501, 0.56482376, 0.29897276,\n", + " 1.04930857, -0.12607366, 1.06227632],\n", + " [ 1.13807032, 1.37966044, -2.05995563, 0.67474814, 0.72722843,\n", + " -0.33923852, 0.43613107, 0.59135489],\n", + " [-1.29281877, 1.17712036, -0.98644163, -1.79034143, -1.08913605,\n", + " -0.90712825, -1.02291108, -1.36445713],\n", + " [-0.29429164, 0.06343709, -1.14196185, -0.50706079, -0.83539436,\n", + " -1.41492946, -0.2159062 , -1.16519474],\n", + " [-0.60767518, -0.61510925, 1.0771542 , 0.5043687 , 0.02674197,\n", + " 1.83494644, 0.34728874, -1.14671885]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 33 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "2aQY_klZOeg9", + "outputId": "14eb3d9c-d0fc-4b6a-fe19-1790695c838f", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 289 + } + }, + "source": [ + "a_conjunto2" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[-0.77337752, -1.10547465, 0.10062807, -1.14571729, -2.15266227,\n", + " -0.75255725, -2.1529949 , -0.33017773],\n", + " [-1.10465731, 0.32889675, 0.01010198, -1.33213633, -0.33945805,\n", + " -0.01299007, 0.05342823, -0.18641201],\n", + " [ 0.39473805, -0.89354231, -0.50667323, -0.74660913, 1.83586365,\n", + " -1.20536871, 1.20184886, 0.51160897],\n", + " [-0.56952286, -0.93343871, -0.24972528, 0.98487133, 1.19333367,\n", + " 2.29956497, 0.16657022, 0.71357415],\n", + " [-0.45251078, 0.92163918, 0.73421263, 2.17811191, -0.05655212,\n", + " 1.25326 , -0.37039248, 1.43855202],\n", + " [ 0.85646091, -0.11257239, 
-0.35400297, 0.94136671, -0.08696163,\n", + " -1.49000701, 0.00848666, 0.86705275],\n", + " [ 1.6340906 , 1.36321063, -0.02175361, -0.45301645, -0.37111236,\n", + " -0.04716069, -2.27337435, 0.95318738],\n", + " [ 0.7100548 , -0.79883269, -0.3165779 , -1.58352824, -0.37751484,\n", + " -0.29760341, -0.73424207, -0.55703223]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 34 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "bK70vaq8_KMH", + "outputId": "f6d400cf-4b54-4990-815b-052f5224aadd", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 459 + } + }, + "source": [ + "np.concatenate([a_conjunto1, a_conjunto2], axis = 0) # axis=0 tells NumPy to stack the arrays row-wise" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[ 2.5062768 , 1.11440422, 2.05565501, 0.56482376, 0.29897276,\n", + " 1.04930857, -0.12607366, 1.06227632],\n", + " [ 1.13807032, 1.37966044, -2.05995563, 0.67474814, 0.72722843,\n", + " -0.33923852, 0.43613107, 0.59135489],\n", + " [-1.29281877, 1.17712036, -0.98644163, -1.79034143, -1.08913605,\n", + " -0.90712825, -1.02291108, -1.36445713],\n", + " [-0.29429164, 0.06343709, -1.14196185, -0.50706079, -0.83539436,\n", + " -1.41492946, -0.2159062 , -1.16519474],\n", + " [-0.60767518, -0.61510925, 1.0771542 , 0.5043687 , 0.02674197,\n", + " 1.83494644, 0.34728874, -1.14671885],\n", + " [-0.77337752, -1.10547465, 0.10062807, -1.14571729, -2.15266227,\n", + " -0.75255725, -2.1529949 , -0.33017773],\n", + " [-1.10465731, 0.32889675, 0.01010198, -1.33213633, -0.33945805,\n", + " -0.01299007, 0.05342823, -0.18641201],\n", + " [ 0.39473805, -0.89354231, -0.50667323, -0.74660913, 1.83586365,\n", + " -1.20536871, 1.20184886, 0.51160897],\n", + " [-0.56952286, -0.93343871, -0.24972528, 0.98487133, 1.19333367,\n", + " 2.29956497, 0.16657022, 0.71357415],\n", + " [-0.45251078, 0.92163918, 0.73421263, 2.17811191, -0.05655212,\n", + " 1.25326 
, -0.37039248, 1.43855202],\n", + " [ 0.85646091, -0.11257239, -0.35400297, 0.94136671, -0.08696163,\n", + " -1.49000701, 0.00848666, 0.86705275],\n", + " [ 1.6340906 , 1.36321063, -0.02175361, -0.45301645, -0.37111236,\n", + " -0.04716069, -2.27337435, 0.95318738],\n", + " [ 0.7100548 , -0.79883269, -0.3165779 , -1.58352824, -0.37751484,\n", + " -0.29760341, -0.73424207, -0.55703223]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 35 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CpaXBkm8_BF8" + }, + "source": [ + "## Method 2 - np.r_[A, B]" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "3QnVUzAY_teZ", + "outputId": "e8adfd85-e760-40f5-d9ac-48353d24ccd2", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 459 + } + }, + "source": [ + "np.r_[a_conjunto1, a_conjunto2]" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[ 2.5062768 , 1.11440422, 2.05565501, 0.56482376, 0.29897276,\n", + " 1.04930857, -0.12607366, 1.06227632],\n", + " [ 1.13807032, 1.37966044, -2.05995563, 0.67474814, 0.72722843,\n", + " -0.33923852, 0.43613107, 0.59135489],\n", + " [-1.29281877, 1.17712036, -0.98644163, -1.79034143, -1.08913605,\n", + " -0.90712825, -1.02291108, -1.36445713],\n", + " [-0.29429164, 0.06343709, -1.14196185, -0.50706079, -0.83539436,\n", + " -1.41492946, -0.2159062 , -1.16519474],\n", + " [-0.60767518, -0.61510925, 1.0771542 , 0.5043687 , 0.02674197,\n", + " 1.83494644, 0.34728874, -1.14671885],\n", + " [-0.77337752, -1.10547465, 0.10062807, -1.14571729, -2.15266227,\n", + " -0.75255725, -2.1529949 , -0.33017773],\n", + " [-1.10465731, 0.32889675, 0.01010198, -1.33213633, -0.33945805,\n", + " -0.01299007, 0.05342823, -0.18641201],\n", + " [ 0.39473805, -0.89354231, -0.50667323, -0.74660913, 1.83586365,\n", + " -1.20536871, 1.20184886, 0.51160897],\n", + " [-0.56952286, -0.93343871, -0.24972528, 0.98487133, 1.19333367,\n", + " 
2.29956497, 0.16657022, 0.71357415],\n", + " [-0.45251078, 0.92163918, 0.73421263, 2.17811191, -0.05655212,\n", + " 1.25326 , -0.37039248, 1.43855202],\n", + " [ 0.85646091, -0.11257239, -0.35400297, 0.94136671, -0.08696163,\n", + " -1.49000701, 0.00848666, 0.86705275],\n", + " [ 1.6340906 , 1.36321063, -0.02175361, -0.45301645, -0.37111236,\n", + " -0.04716069, -2.27337435, 0.95318738],\n", + " [ 0.7100548 , -0.79883269, -0.3165779 , -1.58352824, -0.37751484,\n", + " -0.29760341, -0.73424207, -0.55703223]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 36 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XmSPbDP6_20W" + }, + "source": [ + "**Note**: I prefer this method!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dzVKW_wX_Dzw" + }, + "source": [ + "## Method 3 - np.vstack([A, B]) = np.r_[A, B]" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "uL7lEN_mABID", + "outputId": "d1ea4d86-2cc1-4e2d-af72-b3a292ef15fd", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 459 + } + }, + "source": [ + "np.vstack([a_conjunto1, a_conjunto2])" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[ 2.5062768 , 1.11440422, 2.05565501, 0.56482376, 0.29897276,\n", + " 1.04930857, -0.12607366, 1.06227632],\n", + " [ 1.13807032, 1.37966044, -2.05995563, 0.67474814, 0.72722843,\n", + " -0.33923852, 0.43613107, 0.59135489],\n", + " [-1.29281877, 1.17712036, -0.98644163, -1.79034143, -1.08913605,\n", + " -0.90712825, -1.02291108, -1.36445713],\n", + " [-0.29429164, 0.06343709, -1.14196185, -0.50706079, -0.83539436,\n", + " -1.41492946, -0.2159062 , -1.16519474],\n", + " [-0.60767518, -0.61510925, 1.0771542 , 0.5043687 , 0.02674197,\n", + " 1.83494644, 0.34728874, -1.14671885],\n", + " [-0.77337752, -1.10547465, 0.10062807, -1.14571729, -2.15266227,\n", + " -0.75255725, -2.1529949 , -0.33017773],\n", + " [-1.10465731, 0.32889675, 
0.01010198, -1.33213633, -0.33945805,\n", + " -0.01299007, 0.05342823, -0.18641201],\n", + " [ 0.39473805, -0.89354231, -0.50667323, -0.74660913, 1.83586365,\n", + " -1.20536871, 1.20184886, 0.51160897],\n", + " [-0.56952286, -0.93343871, -0.24972528, 0.98487133, 1.19333367,\n", + " 2.29956497, 0.16657022, 0.71357415],\n", + " [-0.45251078, 0.92163918, 0.73421263, 2.17811191, -0.05655212,\n", + " 1.25326 , -0.37039248, 1.43855202],\n", + " [ 0.85646091, -0.11257239, -0.35400297, 0.94136671, -0.08696163,\n", + " -1.49000701, 0.00848666, 0.86705275],\n", + " [ 1.6340906 , 1.36321063, -0.02175361, -0.45301645, -0.37111236,\n", + " -0.04716069, -2.27337435, 0.95318738],\n", + " [ 0.7100548 , -0.79883269, -0.3165779 , -1.58352824, -0.37751484,\n", + " -0.29760341, -0.73424207, -0.55703223]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 37 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "68icJ-2ZAdRj" + }, + "source": [ + "# Concatenating arrays\n", + "\n", + "## Example 1\n", + "\n", + "![Concatenar1](https://github.com/MathMachado/Materials/blob/master/Concatenar1.PNG?raw=true)\n", + "\n", + "## Example 2\n", + "\n", + "![Concatenar2](https://github.com/MathMachado/Materials/blob/master/Concatenar2.PNG?raw=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OplgK9YoQi9o" + }, + "source": [ + "## Concatenating the elements of two arrays - np.c_[A, B]" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "lpdsbTEKQ9EY" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.random.randint(0, 10, 100).reshape(-1, 10)\n", + "a_conjunto2 = np.random.randint(0, 2, 10).reshape(-1, 1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "JPxhGsaSSMk2", + "outputId": "47727fe9-05f1-4ff7-ec0a-04579120cf78", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 187 + } + }, + "source": [ + "a_conjunto1" + ], + "execution_count": null, + 
"outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[8, 8, 2, 8, 9, 1, 8, 0, 4, 2],\n", + " [0, 8, 9, 3, 7, 1, 3, 2, 9, 7],\n", + " [7, 9, 5, 6, 8, 7, 0, 9, 3, 9],\n", + " [3, 1, 8, 6, 3, 5, 4, 1, 2, 9],\n", + " [8, 6, 6, 1, 0, 9, 2, 0, 7, 5],\n", + " [5, 4, 4, 2, 7, 2, 7, 9, 3, 1],\n", + " [5, 0, 1, 2, 3, 8, 7, 5, 4, 0],\n", + " [5, 9, 6, 6, 1, 3, 6, 0, 4, 9],\n", + " [2, 1, 0, 9, 1, 4, 2, 9, 7, 9],\n", + " [5, 3, 7, 6, 3, 9, 8, 4, 3, 0]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 39 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "9ZyUPfybTfej", + "outputId": "ac27a20e-1622-4cb9-d6f6-74ee467bdb72", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 187 + } + }, + "source": [ + "a_conjunto2" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[1],\n", + " [0],\n", + " [0],\n", + " [0],\n", + " [0],\n", + " [1],\n", + " [0],\n", + " [0],\n", + " [0],\n", + " [1]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 40 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "nS1cPG3aRug1", + "outputId": "c70cf891-ae8f-445d-c271-c6b7f7da1738", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 187 + } + }, + "source": [ + "# place the array a_conjunto2 beside a_conjunto1.\n", + "np.c_[a_conjunto1, a_conjunto2]" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[8, 8, 2, 8, 9, 1, 8, 0, 4, 2, 1],\n", + " [0, 8, 9, 3, 7, 1, 3, 2, 9, 7, 0],\n", + " [7, 9, 5, 6, 8, 7, 0, 9, 3, 9, 0],\n", + " [3, 1, 8, 6, 3, 5, 4, 1, 2, 9, 0],\n", + " [8, 6, 6, 1, 0, 9, 2, 0, 7, 5, 0],\n", + " [5, 4, 4, 2, 7, 2, 7, 9, 3, 1, 1],\n", + " [5, 0, 1, 2, 3, 8, 7, 5, 4, 0, 0],\n", + " [5, 9, 6, 6, 1, 3, 6, 0, 4, 9, 0],\n", + " [2, 1, 0, 9, 1, 4, 2, 9, 7, 9, 0],\n", + " [5, 3, 7, 6, 3, 9, 8, 4, 3, 0, 1]])" + ] + }, + 
"metadata": { + "tags": [] + }, + "execution_count": 41 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kIgU1YBw0OeM" + }, + "source": [ + "___\n", + "# **Selecting items that satisfy conditions**\n", + "> Consider the following array:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "e2pL5anBV0DI", + "outputId": "f37cd827-ee00-49ba-994d-77cab3a24421", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_conjunto1 = np.arange(10, 0, -1)\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 42 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i9HuZZAfV302" + }, + "source": [ + "Select only the items > 7:" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZCESvr7iXMkV" + }, + "source": [ + "## Using np.where()" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "BdrAQLHkTS-v", + "outputId": "44a6e480-1b6c-4dad-ee29-2fcb4ada5097", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_conjunto1" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 45 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "O_ZBaWxfWA9o", + "outputId": "fae44244-ff29-4b04-cd2d-a4c768487e75", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Indices of the array that satisfy the condition\n", + "l_indices = np.where(a_conjunto1 > 7)\n", + "l_indices" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(array([0, 1, 2]),)" + ] + }, + "metadata": { + "tags": 
[] + }, + "execution_count": 44 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "EdWlfPOZWPME" + }, + "source": [ + "**Attention**: We captured the indices. To select the items, simply do:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "tOxs3iYQWWxu", + "outputId": "b402fdfd-c6e0-4170-b35c-c7c5cd2ca85e", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_conjunto2 = a_conjunto1[l_indices]\n", + "a_conjunto2" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([10, 9, 8])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 46 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PGsENqkaXRjh" + }, + "source": [ + "## Alternative: Using []" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "YbdRNk1WXTLT", + "outputId": "062b157c-00fb-4f8f-d207-a0c8e9871e48", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_conjunto1[a_conjunto1 > 7]" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([10, 9, 8])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 47 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jijpzFxcSQC8" + }, + "source": [ + "I think it is worth breaking this solution down to better understand how things work:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "rujhP2LQSWsq" + }, + "source": [ + "# First, evaluate the result of a_conjunto1 > 7:" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "FYZaBsasSb3N", + "outputId": "0a190896-249c-4d7c-ea0d-a20a53536446", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 51 + } + }, + "source": [ + "a_conjunto1 > 7" + ], + "execution_count": null, + "outputs": [ + { + "output_type": 
"execute_result", + "data": { + "text/plain": [ + "array([ True, True, True, False, False, False, False, False, False,\n", + " False])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 48 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "mvEof-UKaaVG" + }, + "source": [ + "a_conjunto1[a_conjunto1 > 7]" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "nO4FiBmDUZOT", + "outputId": "9f54e601-d95a-444c-bd59-28947e332248", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_conjunto1" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([-1, -1, -1, 7, 6, 5, 4, 3, 2, 1])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 52 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ci5lT9nmSfsX" + }, + "source": [ + "Now, with this result, it is easy to understand how Python selects the elements. Can you explain it?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1v5Lfin0GGKD" + }, + "source": [ + "# Replacing items based on conditions\n", + "> Replace the negative values in the array below with 0."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CLY_u0ePWdN7" + }, + "source": [ + "## Generate the example" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "NUANFy-fNXf5" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.array(np.random.randint(0, 10, size = 100))\n", + "\n", + "# Random list of indices that I will change\n", + "np.random.seed(20111974)\n", + "l_indices= np.random.randint(0, 99, 9)\n", + "\n", + "for i in l_indices:\n", + "    a_conjunto1[i] = -1*a_conjunto1[i]\n", + "\n", + "a_conjunto2 = a_conjunto1.copy()\n", + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "dWVyI40uN2d2" + }, + "source": [ + "# Indices to be multiplied by -1:\n", + "l_indices" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3Whuu854OJDZ" + }, + "source": [ + "## Replace the negative values with 0" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "sr268Rp8b-Se", + "outputId": "82514805-b350-45c4-a3fc-7cb24c847b7f", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_conjunto2 < 0" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([False, False, False])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 50 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "C-eKqPrfOQF6" + }, + "source": [ + "a_conjunto2[a_conjunto2 < 0] = 0\n", + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eDLM0_JSZlfB" + }, + "source": [ + "Note above that the negative values were replaced with 0, as we wanted."
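+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The replacement above can be written in more than one way. The cell below is a small sketch added for illustration, not part of the original lesson: it compares the boolean-mask assignment with `np.where` and `np.maximum` on a hypothetical array `a_exemplo` (the other names are also made up for this example)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "# Hypothetical data for illustration only\n",
+ "a_exemplo = np.array([3, -1, 0, -7, 5])\n",
+ "\n",
+ "# 1) Boolean-mask assignment: modifies the array it is applied to, so work on a copy\n",
+ "a_mascara = a_exemplo.copy()\n",
+ "a_mascara[a_mascara < 0] = 0\n",
+ "\n",
+ "# 2) np.where: returns a new array, 0 where the condition holds, the original value elsewhere\n",
+ "a_where = np.where(a_exemplo < 0, 0, a_exemplo)\n",
+ "\n",
+ "# 3) np.maximum: element-wise maximum between each value and 0\n",
+ "a_maximum = np.maximum(a_exemplo, 0)\n",
+ "\n",
+ "print(a_mascara, a_where, a_maximum)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "`np.where` and `np.maximum` return new arrays and leave the input untouched, whereas the boolean-mask assignment changes the array in place."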
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AEHJ0rA3dHHU" + }, + "source": [ + "## Replace the negative values with 0 and the positive ones with 1" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "y32J8SRNZwRF" + }, + "source": [ + "a_conjunto2 = a_conjunto1.copy()\n", + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "1bSD9Fs6P5wW" + }, + "source": [ + "a_conjunto2 = np.where(a_conjunto2 <= 0, 0, 1)\n", + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i027scjl0qkm" + }, + "source": [ + "___\n", + "# Outliers\n", + "> Any point/observation that is unusual when compared with all the other points/observations." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UnDTqRnZHQ3W" + }, + "source": [ + "## Z-Score\n", + "\n", + "* The Z-Score can be used to detect outliers.\n", + "* It is the difference between a value and the sample mean, expressed as a number of standard deviations.\n", + "* If the z-score is less than -2.5 or greater than 2.5, the value lies in the extreme tails of the distribution (under a Normal distribution, about 1.2% of the values in total, roughly 0.6% at each end). However, it is common practice to use 3 instead of 2.5.\n", + "\n", + "![Z_Score](https://github.com/MathMachado/Materials/blob/master/Z_Score.png?raw=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "N7gb2zhtd0uM" + }, + "source": [ + "## IQR Score\n", + "\n", + "* The interquartile range (IQR) is a measure of statistical dispersion, equal to the difference between the 75th (Q3) and 25th (Q1) percentiles, that is, between the upper and lower quartiles: IQR = Q3 - Q1."
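+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The two rules above can be sketched in a few lines of NumPy. This cell is an illustration added under stated assumptions, not part of the original lesson: the array `a_dados` is made up, the Z-Score cutoff of 2.5 and the 1.5 * IQR fences are the conventional choices mentioned in this section, and all variable names are hypothetical."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "# Hypothetical data: nine ordinary values and one unusually large value\n",
+ "a_dados = np.array([10., 12., 11., 13., 12., 11., 10., 12., 13., 50.])\n",
+ "\n",
+ "# Z-Score: distance from the mean measured in standard deviations\n",
+ "a_z = (a_dados - a_dados.mean()) / a_dados.std()\n",
+ "a_outliers_z = a_dados[np.abs(a_z) > 2.5]\n",
+ "\n",
+ "# IQR Score: points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are flagged\n",
+ "q1, q3 = np.percentile(a_dados, [25, 75])\n",
+ "iqr = q3 - q1\n",
+ "lim_inf = q1 - 1.5 * iqr\n",
+ "lim_sup = q3 + 1.5 * iqr\n",
+ "a_outliers_iqr = a_dados[(a_dados < lim_inf) | (a_dados > lim_sup)]\n",
+ "\n",
+ "print(a_outliers_z, a_outliers_iqr)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "With only 10 observations and the population standard deviation (`np.std` default), no point can have a z-score above 3 in absolute value, which is why this sketch uses the 2.5 cutoff."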
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lMmWOKNvghI7" + }, + "source": [ + "![BoxPlot](https://github.com/MathMachado/Materials/blob/master/boxplot.png?raw=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DUw_a-MjWvBc" + }, + "source": [ + "### Challenge for you to solve\n", + "> **Goal**: Randomly simulate the salaries of 1,000 people with distribution N(1045, 100).\n", + "* Identify the _outliers_ of the distribution we just simulated;\n", + "* What is the mean of the distribution we simulated?\n", + "* What is the standard deviation?\n", + "* Plot the boxplot of the data distribution;\n", + "* How many people are above Q3 + 1.5*(Q3-Q1)?\n", + "* Replace the outliers in the array with:\n", + "  * Q1-1.5*(Q3 - Q1), if point < Q1-1.5*(Q3-Q1)\n", + "  * Q3+1.5*(Q3 - Q1), if point > Q3+1.5*(Q3-Q1)\n", + "\n", + "Note: Use np.random.seed(20111974)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "L9ntAdS_oOAh" + }, + "source": [ + "### Random generation of the array a_salarios with distribution $N(\\mu, \\sigma)$" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "RL0Zb0fyDory", + "outputId": "2a3d2b33-579c-406d-d662-da4458f164e6", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 102 + } + }, + "source": [ + "import numpy as np\n", + "np.random.seed(20111974)\n", + "np.set_printoptions(precision = 2, suppress = True)\n", + "\n", + "media = 1045\n", + "desvio_padrao = 100\n", + "i_tamanho = 1000\n", + "\n", + "a_salarios = np.array(np.random.normal(media, desvio_padrao, size = i_tamanho))\n", + "a_salarios[:30]" + ], + "execution_count": 5, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([1295.63, 1156.44, 1250.57, 1101.48, 1074.9 , 1149.93, 1032.39,\n", + " 1151.23, 1158.81, 1182.97, 839. 
, 1112.47, 1117.72, 1011.08,\n", + " 1088.61, 1104.14, 915.72, 1162.71, 946.36, 865.97, 936.09,\n", + " 954.29, 942.71, 908.55, 1015.57, 1051.34, 930.8 , 994.29,\n", + " 961.46, 903.51])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 5 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Fc3a-yhViCTs" + }, + "source": [ + "### Random generation of the indices that will be (manually) altered" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Iakt6i1cgEcB", + "outputId": "9cc09094-5420-4078-a387-e22a58c13f7a", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Random array of indices to alter (randint's upper bound is exclusive, so index 999 is never drawn)\n", + "np.random.seed(19741120)\n", + "l_indices = np.random.randint(0, 999, 10)\n", + "\n", + "# These are the positions that will be altered\n", + "np.sort(l_indices)" + ], + "execution_count": 6, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([ 14, 105, 208, 349, 484, 567, 615, 616, 622, 847])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 6 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oXwME1rciHkw" + }, + "source": [ + "### Copy of the salaries so we can compare BEFORE and AFTER" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "BEtnua7sgp_y", + "outputId": "85b67195-c61a-4f05-ea8f-fc5b05d8973b", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 102 + } + }, + "source": [ + "# independent copies of the array a_salarios (one per solution below)\n", + "a_salarios_copia = a_salarios.copy()\n", + "a_salarios_copia2 = a_salarios.copy()\n", + "\n", + "a_salarios[:30]" + ], + "execution_count": 7, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([1295.63, 1156.44, 1250.57, 1101.48, 1074.9 , 1149.93, 1032.39,\n", + " 1151.23, 1158.81, 1182.97, 839. 
, 1112.47, 1117.72, 1011.08,\n", + " 1088.61, 1104.14, 915.72, 1162.71, 946.36, 865.97, 936.09,\n", + " 954.29, 942.71, 908.55, 1015.57, 1051.34, 930.8 , 994.29,\n", + " 961.46, 903.51])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 7 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "So8qj3Yrh-Az" + }, + "source": [ + "### Manual alteration of the salaries: 2 alternatives\n", + "> Let us time both solutions. Which one is faster?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Z0613on8z5VH" + }, + "source": [ + "from timeit import default_timer as timer\n", + "from datetime import timedelta" + ], + "execution_count": 8, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "NpvvholVxMhs", + "outputId": "2dbfff71-3249-4fd8-fd48-c9e1356dde34", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Indices to be altered\n", + "l_indices" + ], + "execution_count": 9, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([567, 14, 616, 484, 208, 105, 349, 615, 622, 847])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 9 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BqXsmMdm1yF-" + }, + "source": [ + "#### Solution 1" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "FiiOrlnbgKOD", + "outputId": "82a3c137-568d-4776-d952-11d00b1e40e7", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Double the salaries at the selected indices\n", + "start = timer()\n", + "for i_indice in l_indices:\n", + "    a_salarios_copia[i_indice] = 2*a_salarios[i_indice] # Loop over the indices to be (manually) altered\n", + "\n", + "a_salarios_copia[:30]\n", + "end = timer()\n", + "print(timedelta(seconds=end-start))" + ], + "execution_count": 10, + "outputs": [ + { + "output_type": "stream", + "text": [ + "0:00:00.000094\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FgvKC-aFzWpZ" + }, + "source": [ + "#### Solution 2" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "XWlQC5Jazt26", + "outputId": "8640d081-99ae-4235-e2de-620d6152193b", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "start = timer()\n", + "a_salarios_copia2[l_indices] = 2*a_salarios_copia2[l_indices]\n", + "a_salarios_copia2[:30]\n", + "end = timer()\n", + "print(timedelta(seconds=end-start))" + ], + "execution_count": 11, + "outputs": [ + { + "output_type": "stream", + "text": [ + "0:00:00.000090\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U92w03afhrmC" + }, + "source": [ + "### Compare" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Ls-jCFCYhtD8", + "outputId": "04b7eff2-67d0-4f8f-8812-0b4be9bb7cb7", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 51 + } + }, + "source": [ + "# Before\n", + "a_salarios[l_indices]" + ], + "execution_count": 12, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([ 826.43, 1088.61, 1121.95, 833.96, 1165.97, 1081.13, 1078.51,\n", + " 1094.67, 904.32, 1128.66])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 12 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "nwwU06OahzD2", + "outputId": "e18448a4-97f4-452c-da95-3d1db69b1033", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 51 + } + }, + "source": [ + "# After\n", + "a_salarios_copia[l_indices]" + ], + "execution_count": 13, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([1652.85, 2177.23, 2243.89, 1667.93, 2331.93, 2162.26, 2157.02,\n", + " 2189.34, 1808.63, 2257.32])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 13 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "qyUUdHmtisJS", + "outputId": "779e41e6-cb77-4966-b5d5-fe4d5b4f2ab2", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 102 + } + }, + "source": [ + "# First 30 elements of a_salarios\n", + "a_salarios[:30]" + ], + "execution_count": 14, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([1295.63, 1156.44, 1250.57, 1101.48, 1074.9 , 1149.93, 1032.39,\n", + " 1151.23, 1158.81, 1182.97, 839. , 1112.47, 1117.72, 1011.08,\n", + " 1088.61, 1104.14, 915.72, 1162.71, 946.36, 865.97, 936.09,\n", + " 954.29, 942.71, 908.55, 1015.57, 1051.34, 930.8 , 994.29,\n", + " 961.46, 903.51])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 14 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "CJ1FEjlCi0-n", + "outputId": "5c6c8845-8f83-4047-9f4f-3034d2ec2af7", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 102 + } + }, + "source": [ + "# First 30 elements of a_salarios_copia\n", + "a_salarios_copia[:30]" + ], + "execution_count": 15, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([1295.63, 1156.44, 1250.57, 1101.48, 1074.9 , 1149.93, 1032.39,\n", + " 1151.23, 1158.81, 1182.97, 839. 
, 1112.47, 1117.72, 1011.08,\n", + " 2177.23, 1104.14, 915.72, 1162.71, 946.36, 865.97, 936.09,\n", + " 954.29, 942.71, 908.55, 1015.57, 1051.34, 930.8 , 994.29,\n", + " 961.46, 903.51])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 15 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wKbSUgxxiOUL" + }, + "source": [ + "### Some descriptive statistics:\n", + "#### Before" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ZnmykyahLWX9", + "outputId": "b2c70db1-2870-48e7-c031-94f122415bc8", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + } + }, + "source": [ + "f'Média: {np.mean(a_salarios)}; Mediana: {np.median(a_salarios)}; STD: {np.std(a_salarios)}'" + ], + "execution_count": 16, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, + "text/plain": [ + "'Média: 1047.150212238584; Mediana: 1047.631166829137; STD: 101.18708333868835'" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 16 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "80H92CIjibYJ" + }, + "source": [ + "#### After" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "5iO-BAikieHJ", + "outputId": "ea72b3f5-5682-4971-e1f5-26aec68bc43c", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + } + }, + "source": [ + "f'Média: {np.mean(a_salarios_copia)}; Mediana: {np.median(a_salarios_copia)}; STD: {np.std(a_salarios_copia)}'" + ], + "execution_count": 18, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, + "text/plain": [ + "'Média: 1057.4744151862524; Mediana: 1048.089607774499; STD: 144.64306489539533'" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 18 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"ILhNe80xW5C6" + }, + "source": [ + "### Solução do desafio" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "U993i1GJg2hk", + "outputId": "bf91af51-aac7-4008-8cc3-342573752205", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 271 + } + }, + "source": [ + "# Import a biblioteca seaborn:\n", + "import seaborn as sns\n", + "sns.boxplot(y = a_salarios_copia)" + ], + "execution_count": 19, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 19 + }, + { + "output_type": "display_data", + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAADtCAYAAABTaKWmAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAWIklEQVR4nO3df4zcdZ3H8eeL3QUBT2m3a8Vtua1OvQv+OCUjkBhyKG3ZEmP9485gLnZQco0CpRgSA9jYAHrx1EhoT0l6oWF7IXCc4tmEdsvW846YXLFbDigtaCdQaNdS1imgpgjd7fv+mE9xWPbHzOx2Zybf1yOZ9Pt9fz8z8/4S9jWffL/fma8iAjMzy4bTGt2AmZnNHoe+mVmGOPTNzDLEoW9mliEOfTOzDHHom5llyJShL2mhpF9I2idpr6Q1Y7bfKCkkzUvrkrReUlHSk5IuqBhbkLQ/PQozvztmZjaZ9irGjAA3RsRjkv4C2C1pICL2SVoILANeqBi/HFicHhcBdwEXSZoLrAPyQKTX2RIRL8/g/piZ2SSmDP2IOAwcTst/kPQ00A3sA+4Avg78rOIpK4DNUf7W105J50g6F7gUGIiIowCSBoBe4L6J3nvevHnR09NTx26ZmWXX7t27fxcRXeNtq2am/yZJPcDHgUclrQCGIuIJSZXDuoGDFeuHUm2i+tj3WAWsAjjvvPMYHByspUUzs8yT9PxE26o+kSvpncBPgBsoH/K5BfjmtLsbIyI2RkQ+IvJdXeN+UJmZWZ2qCn1JHZQD/96IeBD4ALAIeELSAWAB8Jik9wJDwMKKpy9ItYnqZmY2S6q5ekfA3cDTEfEDgIjYExHviYieiOihfKjmgoh4EdgCrExX8VwMvJrOC2wHlkmaI2kO5RPA20/NbpmZ2XiqOab/SeCLwB5Jj6faLRGxdYLxW4ErgCJwDPgSQEQclXQ7sCuNu+3kSV0zM5sd1Vy980tAU4zpqVgO4NoJxm0CNtXWolnzKZVK3Hrrraxbt47Ozs5Gt2NWNX8j16wOfX197Nmzh82bNze6FbOaOPTNalQqlejv7yci6O/vp1QqNbols6o59M1q1NfXx4kTJwAYHR31bN9aikPfrEY7duxgZGQEgJGREQYGBhrckVn1HPpmNVqyZAnt7eVrINrb21m6dGmDOzKrnkPfrEaFQoHTTiv/6bS1tbFy5coGd2RWPYe+WY06Ozvp7e1FEr29vb5k01pKTT+4ZmZlhUKBAwcOeJZvLcczfTOzDHHom9XBX86yVuXQN6tR5Zeztm3b5i9nWUtx6JvVqK+vj+PHjwNw/Phxz/atpTj0zWo0MDBA+XcFISJ
4+OGHG9yRWfUc+mY1mj9//qTrZs3MoW9WoyNHjky6btbMHPpmNVq6dCnlG8qBJJYtW9bgjsyqV83tEhdK+oWkfZL2SlqT6t+T9IykJyX9VNI5Fc+5WVJR0q8lXV5R7021oqSbTs0umZ1ahULhLb+94y9oWSupZqY/AtwYEecDFwPXSjofGAA+HBEfBX4D3AyQtl0JfAjoBX4kqU1SG/BDYDlwPvCFNNaspXR2dtLd3Q1Ad3e3f4bBWsqUoR8RhyPisbT8B+BpoDsiHo6IkTRsJ7AgLa8A7o+I1yPiOcr3yr0wPYoR8WxEvAHcn8aatZRSqcTQ0BAAQ0NDvk7fWkpNx/Ql9QAfBx4ds+nLwLa03A0crNh2KNUmqo99j1WSBiUNDg8P19Ke2azo6+t7y+/p+zp9ayVVh76kdwI/AW6IiN9X1L9B+RDQvTPRUERsjIh8ROS7urpm4iXNZpSv07dWVlXoS+qgHPj3RsSDFfWrgM8A/xAn/wpgCFhY8fQFqTZR3ayl+Dp9a2XVXL0j4G7g6Yj4QUW9F/g68NmIOFbxlC3AlZLOkLQIWAz8CtgFLJa0SNLplE/2bpm5XTGbHYcPH5503ayZVfN7+p8EvgjskfR4qt0CrAfOAAbSNcs7I+IrEbFX0gPAPsqHfa6NiFEASdcB24E2YFNE7J3RvTGbBR0dHbz++utvWTdrFVOGfkT8EtA4m7ZO8pxvA98ep751sueZtYI//vGPk66bNTN/I9esRj09PZOumzUzh75ZjdauXTvpulkzc+ib1SiXy705u+/p6SGXyzW2IbMaOPTN6rB27VrOPvtsz/Kt5VRz9Y6ZjZHL5XjooYca3YZZzTzTNzPLEIe+mVmGOPTNzDLEoW9mliEOfTOzDHHom5lliEPfzCxDHPpmZhni0DczyxCHvplZhlRz56yFkn4haZ+kvZLWpPpcSQOS9qd/56S6JK2XVJT0pKQLKl6rkMbvl1Q4dbtlZmbjqWamPwLcGBHnAxcD10o6H7gJ+HlELAZ+ntYBllO+ReJiYBVwF5Q/JIB1wEXAhcC6kx8UZmY2O6YM/Yg4HBGPpeU/AE8D3cAKoC8N6wM+l5ZXAJujbCdwjqRzgcuBgYg4GhEvAwNA74zujZmZTaqmY/qSeoCPA48C8yPi5B2hXwTmp+Vu4GDF0w6l2kR1MzObJVWHvqR3Aj8BboiI31dui4gAYiYakrRK0qCkweHh4Zl4STMzS6oKfUkdlAP/3oh4MJWPpMM2pH9fSvUhYGHF0xek2kT1t4iIjRGRj4h8V1dXLftiZmZTqObqHQF3A09HxA8qNm0BTl6BUwB+VlFfma7iuRh4NR0G2g4skzQnncBdlmpmZjZLqrlz1ieBLwJ7JD2earcA3wEekHQ18Dzw+bRtK3AFUASOAV8CiIijkm4HdqVxt0XE0RnZCzMzq4rKh+ObUz6fj8HBwUa3YWbWUiTtjoj8eNv8jVwzswxx6JuZZYhD38wsQxz6ZmYZ4tA3M8sQh76ZWYY49M3MMsShb2aWIQ59M7MMceibmWWIQ9/MLEMc+mZmGeLQNzPLEIe+mVmGOPTN6lAqlbj++usplUqNbsWsJg59szr09fWxZ88eNm/e3OhWzGpSze0SN0l6SdJTFbWPSdop6fF0E/MLU12S1ksqSnpS0gUVzylI2p8ehfHey6wVlEol+vv7iQj6+/s927eWUs1M/x6gd0ztu8CtEfEx4JtpHWA5sDg9VgF3AUiaC6wDLgIuBNal++SatZy+vj5OnDgBwOjoqGf71lKmDP2IeAQYey/bAN6Vlt8N/DYtrwA2R9lO4BxJ5wKXAwMRcTQiXgYGePsHiVlL2LFjByMjIwCMjIwwMDDQ4I7MqlfvMf0bgO9JOgh8H7g51buBgxXjDqXaRPW3kbQqHTIaHB4errM9s1NnyZIltLe3A9De3s7SpUsb3JFZ9eoN/a8CX4uIhcDXgLtnqqGI2BgR+YjId3V1zdTLms2YQqHAaaeV/3T
a2tpYuXJlgzsyq169oV8AHkzL/0H5OD3AELCwYtyCVJuobtZyOjs76e3tRRK9vb10dnY2uiWzqtUb+r8F/jYtfxrYn5a3ACvTVTwXA69GxGFgO7BM0px0AndZqpm1pEKhwEc+8hHP8q3ltE81QNJ9wKXAPEmHKF+F84/AnZLagT9RvlIHYCtwBVAEjgFfAoiIo5JuB3alcbdFxNiTw2Yto7Ozk/Xr1ze6DbOaKSIa3cOE8vl8DA4ONroNM7OWIml3ROTH2+Zv5JqZZYhD38wsQxz6ZmYZ4tA3M8sQh76ZWYY49M3MMsShb1YH30TFWpVD36wOvomKtSqHvlmNSqUS27ZtIyLYtm2bZ/vWUhz6ZjXq6+t78/f0jx8/7tm+tRSHvlmNBgYGOPnzJRHBww8/3OCOzKrn0Der0fz58yddN2tmDn2zGh05cmTSdbNm5tA3q9HSpUuRBIAkli1b1uCOzKrn0DerUaFQoKOjA4COjg7fSMVaypShL2mTpJckPTWmvlrSM5L2SvpuRf1mSUVJv5Z0eUW9N9WKkm6a2d0wmz2Vt0tcvny5b5doLWXKO2cB9wD/Arx5XZqkTwErgL+JiNclvSfVzweuBD4EvA/YIemD6Wk/BJYCh4BdkrZExL6Z2hGz2VQoFDhw4IBn+dZypgz9iHhEUs+Y8leB70TE62nMS6m+Arg/1Z+TVOTPN00vRsSzAJLuT2Md+taSfLtEa1X1HtP/IHCJpEcl/Y+kT6R6N3CwYtyhVJuobmZms6iawzsTPW8ucDHwCeABSe+fiYYkrSLdaP28886biZc0M7Ok3pn+IeDBKPsVcAKYBwwBCyvGLUi1iepvExEbIyIfEfmurq462zMzs/HUG/r/CXwKIJ2oPR34HbAFuFLSGZIWAYuBXwG7gMWSFkk6nfLJ3i3Tbd7MzGoz5eEdSfcBlwLzJB0C1gGbgE3pMs43gEKUf4xkr6QHKJ+gHQGujYjR9DrXAduBNmBTROw9BftjZmaT0MkfjmpG+Xw+BgcHG92GmVlLkbQ7IvLjbfM3cs3MMsShb2aWIQ59M7MMceibmWWIQ9/MLEMc+mZmGeLQNzPLEIe+mVmGOPTNzDLEoW9mliEOfTOzDHHom5lliEPfzCxDHPpmZhni0DczyxCHvplZhkwZ+pI2SXop3SVr7LYbJYWkeWldktZLKkp6UtIFFWMLkvanR2Fmd8PMzKpRzUz/HqB3bFHSQmAZ8EJFeTnl++IuBlYBd6WxcynfZvEi4EJgnaQ502ncrJFKpRLXX389pVKp0a2Y1WTK0I+IR4Cj42y6A/g6UHm/xRXA5ijbCZwj6VzgcmAgIo5GxMvAAON8kJi1ir6+Pvbs2cPmzZsb3YpZTeo6pi9pBTAUEU+M2dQNHKxYP5RqE9XHe+1VkgYlDQ4PD9fTntkpVSqV6O/vJyLo7+/3bN9aSs2hL+ks4BbgmzPfDkTExojIR0S+q6vrVLyF2bT09fVx4sQJAEZHRz3bt5ZSz0z/A8Ai4AlJB4AFwGOS3gsMAQsrxi5ItYnqZi1nx44djIyMADAyMsLAwECDOzKrXs2hHxF7IuI9EdETET2UD9VcEBEvAluAlekqnouBVyPiMLAdWCZpTjqBuyzVzFrOJZdcMum6WTOr5pLN+4D/Bf5K0iFJV08yfCvwLFAE/hW4BiAijgK3A7vS47ZUM2s5ETH1ILMmpWb+Hzifz8fg4GCj2zB7iyuuuIJjx469uX7WWWexdevWBnZk9laSdkdEfrxt/kauWY2WLFlCW1sbAG1tbSxdurTBHZlVr73RDVjr2LBhA8VisdFtNNzx48cZHR0F4MSJE+zfv581a9Y0uKvGyuVyrF69utFtWBU80zerUUdHB+3t5fnS3Llz6ejoaHBHZtXzTN+q5pncn11zzTU8//zzbNy4kc7Ozka3Y1Y1z/TN6tDR0UEul3PgW8tx6JuZZYhD38wsQxz6ZmYZ4tA3M8sQh76ZWYY49M3MMsShb2aWIQ5
9M7MMceibmWWIQ9/MLEMc+mZmGVLNnbM2SXpJ0lMVte9JekbSk5J+Kumcim03SypK+rWkyyvqvalWlHTTzO+KmZlNpZqZ/j1A75jaAPDhiPgo8BvgZgBJ5wNXAh9Kz/mRpDZJbcAPgeXA+cAX0lgzM5tFU4Z+RDwCHB1TezgiRtLqTmBBWl4B3B8Rr0fEc5TvlXthehQj4tmIeAO4P401M7NZNBPH9L8MbEvL3cDBim2HUm2i+ttIWiVpUNLg8PDwDLRnZmYnTSv0JX0DGAHunZl2ICI2RkQ+IvJdXV0z9bJmZsY07pwl6SrgM8BlERGpPAQsrBi2INWYpG5mZrOkrpm+pF7g68BnI+JYxaYtwJWSzpC0CFgM/ArYBSyWtEjS6ZRP9m6ZXutmZlarKWf6ku4DLgXmSToErKN8tc4ZwIAkgJ0R8ZWI2CvpAWAf5cM+10bEaHqd64DtQBuwKSL2noL9MTOzSUwZ+hHxhXHKd08y/tvAt8epbwW21tSdmZnNKH8j18wsQxz6ZmYZ4tA3M8uQui/ZzIoNGzZQLBYb3YY1mZP/T6xZs6bBnVizyeVyrF69utFtTMihP4ViscjjTz3N6FlzG92KNZHT3ih/NWX3s0ca3Ik1k7ZjR6ce1GAO/SqMnjWX1/76ika3YWZN7sxnmv8CRR/TNzPLEIe+mVmGOPTNzDLEoW9mliEOfTOzDPHVO1MYGhqi7dirLXFW3swaq+1YiaGhkakHNpBn+mZmGeKZ/hS6u7t58fV2X6dvZlM685mtdHfPb3Qbk/JM38wsQ6YMfUmbJL0k6amK2lxJA5L2p3/npLokrZdUlPSkpAsqnlNI4/dLKpya3TEzs8lUM9O/B+gdU7sJ+HlELAZ+ntYBllO+ReJiYBVwF5Q/JCjfcesi4EJg3ckPCjMzmz1Thn5EPAKM/RWhFUBfWu4DPldR3xxlO4FzJJ0LXA4MRMTRiHgZGODtHyRmZnaK1XtMf35EHE7LLwInz1x0Awcrxh1KtYnqbyNplaRBSYPDw8N1tmdmZuOZ9onciAggZqCXk6+3MSLyEZHv6uqaqZc1MzPqv2TziKRzI+JwOnzzUqoPAQsrxi1ItSHg0jH1/67zvWdd27Gj/nKWvcVpf/o9ACfe8a4Gd2LNpPx7+s19yWa9ob8FKADfSf/+rKJ+naT7KZ+0fTV9MGwH/qni5O0y4Ob62549uVyu0S1YEyoW/wBA7v3N/Qdus21+02fGlKEv6T7Ks/R5kg5RvgrnO8ADkq4Gngc+n4ZvBa4AisAx4EsAEXFU0u3ArjTutoho/lvMQFPf9swa5+RtEu+8884Gd2JWmylDPyK+MMGmy8YZG8C1E7zOJmBTTd2ZmdmM8jdyzcwyxKFvZpYhDn0zswxx6JuZZYhD38wsQxz6ZmYZ4tA3M8sQh76ZWYY49M3MMsShb2aWIQ59M7MMceibmWWIQ9/MLEMc+mZmGeLQNzPLEIe+mVmGTCv0JX1N0l5JT0m6T9I7JC2S9KikoqR/l3R6GntGWi+m7T0zsQNmZla9ukNfUjdwPZCPiA8DbcCVwD8Dd0REDngZuDo95Wrg5VS/I40zM7NZNN3DO+3AmZLagbOAw8CngR+n7X3A59LyirRO2n6ZJE3z/c3MrAZ1h35EDAHfB16gHPavAruBVyJiJA07BHSn5W7gYHruSBrfOfZ1Ja2SNChpcHh4uN72zMxsHNM5vDOH8ux9EfA+4Gygd7oNRcTGiMhHRL6rq2u6L2dmZhWmc3hnCfBcRAxHxHHgQeCTwDnpcA/AAmAoLQ8BCwHS9ncDpWm8v5mZ1Wg6of8CcLGks9Kx+cuAfcAvgL9LYwrAz9LylrRO2v5fERHTeH8zM6tR+9RDxhcRj0r6MfAYMAL8H7AReAi4X9K3Uu3u9JS7gX+TVASOUr7Sx1rIhg0bKBaLjW6jKZz877BmzZoGd9Iccrkcq1evbnQbVoW6Qx8gItYB68aUnwUuHGfsn4C/n877mTWLjo4
OXnnlFV577TXOPPPMRrdjVrVphb5li2dyf3bVVVfxyiuv8MYbb7Bx48ZGt2NWNf8Mg1mNisUiBw4cAODAgQM+5GUtxaFvVqNvfetbk66bNTOHvlmNTs7yJ1o3a2YOfbMa9fT0TLpu1swc+mY1Wrt27aTrZs3MoW9Wo1wu9+bsvqenh1wu19iGzGrg0Derw9q1azn77LM9y7eW4+v0zeqQy+V46KGHGt2GWc080zczyxCHvplZhjj0zcwyxKFvZpYhauaftJc0DDzf6D7MJjAP+F2jmzAbx19GxLi3Hmzq0DdrZpIGIyLf6D7MauHDO2ZmGeLQNzPLEIe+Wf189xRrOT6mb2aWIZ7pm5lliEPfzCxDHPpmZhni0DczyxCHvplZhvw/5I+5LV0j8I0AAAAASUVORK5CYII=\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [], + "needs_background": "light" + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VtenLK1uK1Pi" + }, + "source": [ + "Consegue identificar os outliers do array?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e3sHuGVGFBdW" + }, + "source": [ + "## Objetivo\n", + "> Substituir os outliers por mediana. \n", + "\n", + "* Como fazer isso?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RSegPNKCI-dS" + }, + "source": [ + "### Siga os passos a seguir\n", + "1. Calcule estatísticas descritivas antes das transformações par avaliar o impacto;\n", + " * Calcule média, mediana e desvio-padrão dos dados originais;\n", + "2. Calcule os valores a seguir:\n", + " * Q1, Q3\n", + " * IQR = Q3-Q1\n", + " * lim_inferior = Q1-1.5\\*IQR\n", + " * lim_superior = Q3+1.5\\*IQR\n", + "3. Proceda à substituição:\n", + " * Se a_salarios_copia[i] < lim_inferior então a_salarios_copia[i]= Mediana\n", + " * Se a_salarios_copia[i] > lim_superior então a_salarios_copia[i]= Mediana\n", + "4. Calcule as estatísticas descritivas após as substituições e compare com os valores antes das transformações." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9DQ7YnWaFn4v" + }, + "source": [ + "### Minha solução\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RBXJbTeGLC7Q" + }, + "source": [ + "1. 
Descriptive statistics before the transformations:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "QueKYn7MLG12", + "outputId": "11daf3fe-c4c9-446e-cb46-cf6378c02779", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + } + }, + "source": [ + "# Some descriptive statistics:\n", + "f'Média: {np.mean(a_salarios_copia)}; Mediana: {np.median(a_salarios_copia)}; STD: {np.std(a_salarios_copia)}'" + ], + "execution_count": 20, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, + "text/plain": [ + "'Média: 1057.4744151862524; Mediana: 1048.089607774499; STD: 144.64306489539533'" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 20 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oOBJ8INWL5fo" + }, + "source": [ + "Note how far the data have drifted from the values originally used: doubling just ten salaries raised the mean from about 1047 to about 1057 and the standard deviation from about 101 to about 145, while the median barely moved." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MX-fJeh2MBTD" + }, + "source": [ + "2. 
Compute Q1, Q3 and the IQR" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "JlsPiQeGMGeU" + }, + "source": [ + "# q is passed as a one-element list, so each result is a one-element array (hence the [0] indexing later on)\n", + "Q1 = np.percentile(a_salarios_copia, q = [25])\n", + "Q3 = np.percentile(a_salarios_copia, q = [75])\n", + "Q2 = np.percentile(a_salarios_copia, q = [50]) # the median\n", + "IQR = Q3-Q1\n", + "lim_inferior = Q1-1.5*IQR\n", + "lim_superior = Q3+1.5*IQR" + ], + "execution_count": 21, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "VF2NJ3rCeI1_", + "outputId": "e8e38919-ee69-4d21-db00-1abb7bd4fb9b", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + } + }, + "source": [ + "f'Q1: {Q1}; Q3: {Q3}; lim_inferior: {lim_inferior}; lim_superior: {lim_superior}'" + ], + "execution_count": 22, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, + "text/plain": [ + "'Q1: [974.41]; Q3: [1119.81]; lim_inferior: [756.33]; lim_superior: [1337.89]'" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 22 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JjnwJ7HwMxcl" + }, + "source": [ + "3. 
Substituir\n", + "* Se a_conjunto1[i] < lim_inferior então a_conjunto1[i] = Mediana\n", + "* Se a_conjunto1[i] > Lia_Sup então a_conjunto1[i] = Mediana" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "hcAn-IwVfbcI" + }, + "source": [ + "a_salarios2 = a_salarios_copia.copy()" + ], + "execution_count": 23, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "J3SSE45oM9oh", + "outputId": "53db1a1d-8483-40f9-8cbc-196b79e449ff", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 102 + } + }, + "source": [ + "a_salarios2[a_salarios2 < lim_inferior[0]] = Q2[0]\n", + "#para todos que atendam essa condição ele vai receber o valor da mediana\n", + "a_salarios2[a_salarios2 > lim_superior[0]] = Q2[0]\n", + "a_salarios2[:30]" + ], + "execution_count": 24, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([1295.63, 1156.44, 1250.57, 1101.48, 1074.9 , 1149.93, 1032.39,\n", + " 1151.23, 1158.81, 1182.97, 839. , 1112.47, 1117.72, 1011.08,\n", + " 1048.09, 1104.14, 915.72, 1162.71, 946.36, 865.97, 936.09,\n", + " 954.29, 942.71, 908.55, 1015.57, 1051.34, 930.8 , 994.29,\n", + " 961.46, 903.51])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 24 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VEGFio0Nfj7O" + }, + "source": [ + "4. 
Estatísticas Descritivas para avaliarmos o impacto:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "gX1LZHFqfjFQ", + "outputId": "31a986a8-79bd-4c04-daf2-3fa1e8ca9f44", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + } + }, + "source": [ + "# Algumas estatísticas descritivas:\n", + "f'Média: {np.mean(a_salarios2)}; Mediana: {np.median(a_salarios2)}; STD: {np.std(a_salarios2)}'" + ], + "execution_count": 25, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, + "text/plain": [ + "'Média: 1047.3019702056902; Mediana: 1048.089607774499; STD: 98.3265929249586'" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 25 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "-xnguZ7XgyvK", + "outputId": "98be8554-e55a-4d5b-e418-4712377c627e", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 269 + } + }, + "source": [ + "# Import a biblioteca seaborn:\n", + "import seaborn as sns\n", + "sns.boxplot(y = a_salarios2)" + ], + "execution_count": 26, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 26 + }, + { + "output_type": "display_data", + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAX0AAADrCAYAAACFMUa7AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAO5klEQVR4nO3df6zddX3H8eer90YEzQa0d40WGCxt5tC4xTVIsriw8asQsxo3DWRJ7xxZY4Klwz82jMmaYEg0Lhpo1KQJDW3iYGSbsW5dsbBl/IVSFgJFUU5QpA3K9RZxWRW57Xt/3C/x7nJv7z333PZc/Dwfycn5nvf3c77nfQj3dT/9fL/nnlQVkqQ2rBp2A5KkM8fQl6SGGPqS1BBDX5IaYuhLUkMMfUlqyOiwGziVNWvW1MUXXzzsNiTpDeWxxx77cVWNzbVvRYf+xRdfzKFDh4bdhiS9oSR5br59Lu9IUkMMfUlqiKEvSQ0x9CWpIYa+tASTk5PccsstTE5ODrsVqS+GvrQEe/bs4cknn2Tv3r3DbkXqi6Ev9WlycpIDBw5QVRw4cMDZvt5QDH2pT3v27OHkyZMAnDhxwtm+3lAMfalPDz74IFNTUwBMTU1x8ODBIXckLZ6hL/XpqquuYnR0+sPso6OjXH311UPuSFo8Q1/q0/j4OKtWTf/ojIyMsGXLliF3JC3eiv7bO1pZdu7cSa/XG3YbK0ISAN761rdy++23D7mb4Vu/fj3btm0bdhtaBGf60hKsWrWKVatWsXbt2mG3IvXFmb4WzZncL23fvh2AO++8c8idSP1xpi9JDVkw9JPsTvJiksMzap9K8kSSx5N8Pcnbu3qS3JWk1+1/z4znjCd5pruNn563I0k6lcXM9O8BNs2qfbaq3l1Vvwf8K/B3Xf06YEN32wp8CSDJ+cAO4L3AZcCOJOcN3L0kqS8Lhn5VPQwcm1X76YyHbwGq294M7K1pjwDnJnkbcC1wsKqOVdVLwEFe/4tEknSaLflEbpI7gC3Ay8AfdeV1wPMzhh3pavPVJUln0JJP5FbVJ6vqQuDLwMeWq6EkW5McSnJoYmJiuQ4rSWJ5rt75MvCn3fZR4MIZ+y7oavPVX6eqdlXVxqraODY255e5S5KWaEmhn2TDjIebgae77X3Alu4qnsuBl6vqBeAB4Jok53UncK/papKkM2jBNf0k9wJXAGuSHGH6Kpzrk/w2cBJ4DvhoN3w/cD3QA44DHwGoqmNJPgU82o27var+38lhSdLpt2DoV9WNc5TvnmdsATfPs283sLuv7iRJy8pP5EpSQwx9SWqIoS9JDTH0Jakhhr4kNcTQl6SGGPqS1BBDX5IaYuhLUkMMfUlqiKEvSQ0x9CWpIYa+JDXE0Jekhhj6ktQQQ1+SGmLoS1JDDH1JaoihL0kNMfQlqSGGviQ1xNCXpIYY+pLUEENfkhpi6EtSQwx9SWrIgqGfZHeSF5McnlH7bJKnkzyR5CtJzp2x7xNJekm+k+TaGfVNXa2X5LblfyuSpIUsZqZ/D7BpVu0g8K6qejfwXeATAEkuBW4A3tk954tJRpKMAF8ArgMuBW7sxkqSzqAFQ7+qHgaOzap9vaqmuoePABd025uB+6rqlar6HtADLutuvap6tqp+AdzXjZUknUHLsab/l8C/d9vrgOdn7DvS1earS5LOoIFCP8kngSngy8vTDiTZmuRQkkMTExPLdVhJEgOEfpK/AN4P/HlVVVc+Clw4Y9gFXW2++utU1a6q2lhVG8fGxpbaniRpDksK/SSbgL8B/qSqjs/YtQ+4IclZSS4BNgDfBB4FNiS5JMmbmD7Zu2+w1iVJ/RpdaECSe4ErgDVJjgA7mL5a5yzgYBKAR6rqo1X1VJL7gW8xvexzc1Wd6I7zMeABYATYXVVPnYb3I0k6hQVDv6punKN89ynG3wHcMUd9P7C/r+4kScvKT+RKUkMMfUlqiKEvSQ0x9CWpIYa+JDXE0Jekhhj
6ktQQQ1+SGmLoS1JDDH1JaoihL0kNMfQlqSGGviQ1xNCXpIYY+pLUEENfkhpi6EtSQwx9SWqIoS9JDTH0Jakhhr4kNcTQl6SGGPqS1JDRYTew0u3cuZNerzfsNrTCvPb/xPbt24fciVaa9evXs23btmG3MS9DfwG9Xo/HD3+bE+ecP+xWtIKs+kUB8NizPxpyJ1pJRo4fG3YLCzL0F+HEOefzs3dcP+w2JK1wZz+9f9gtLGjBNf0ku5O8mOTwjNqHkjyV5GSSjbPGfyJJL8l3klw7o76pq/WS3La8b0OStBiLOZF7D7BpVu0w8EHg4ZnFJJcCNwDv7J7zxSQjSUaALwDXAZcCN3ZjJUln0ILLO1X1cJKLZ9W+DZBk9vDNwH1V9QrwvSQ94LJuX6+qnu2ed1839luDNC9J6s9yX7K5Dnh+xuMjXW2++usk2ZrkUJJDExMTy9yeJLVtxV2nX1W7qmpjVW0cGxsbdjuS9Ctlua/eOQpcOOPxBV2NU9QlSWfIcs/09wE3JDkrySXABuCbwKPAhiSXJHkT0yd79y3za0uSFrDgTD/JvcAVwJokR4AdwDFgJzAG/FuSx6vq2qp6Ksn9TJ+gnQJurqoT3XE+BjwAjAC7q+qp0/GGJEnzW8zVOzfOs+sr84y/A7hjjvp+YOV/ckGSfoWtuBO5kqTTx9CXpIYY+pLUEENfkhpi6EtSQwx9SWqIoS9JDTH0Jakhhr4kNcTQl6SGGPqS1BC/GH0BR48eZeT4y2+ILzyWNFwjxyc5enRq2G2ckjN9SWqIM/0FrFu3jh++MsrP3nH9sFuRtMKd/fR+1q1bO+w2TsmZviQ1xNCXpIYY+pLUEENfkhpi6EtSQwx9SWqIoS9JDTH0Jakhhr4kNcTQl6SGGPqS1JAFQz/J7iQvJjk8o3Z+koNJnunuz+vqSXJXkl6SJ5K8Z8ZzxrvxzyQZPz1vR5J0KouZ6d8DbJpVuw14qKo2AA91jwGuAzZ0t63Al2D6lwSwA3gvcBmw47VfFJKkM2fB0K+qh4Fjs8qbgT3d9h7gAzPqe2vaI8C5Sd4GXAscrKpjVfUScJDX/yKRJJ1mS13TX1tVL3TbPwRe+1ui64DnZ4w70tXmq0uSzqCBT+RWVQG1DL0AkGRrkkNJDk1MTCzXYSVJLD30f9Qt29Ddv9jVjwIXzhh3QVebr/46VbWrqjZW1caxsbEltidJmstSQ38f8NoVOOPAV2fUt3RX8VwOvNwtAz0AXJPkvO4E7jVdTZJ0Bi34dYlJ7gWuANYkOcL0VTifBu5PchPwHPDhbvh+4HqgBxwHPgJQVceSfAp4tBt3e1XNPjksSTrNFgz9qrpxnl1XzjG2gJvnOc5uYHdf3UmSlpWfyJWkhhj6ktQQQ1+SGrLgmr5g5Pgxzn56/7Db0Aqy6uc/BeDkm39tyJ1oJRk5foxfflZ1ZTL0F7B+/fpht6AVqNf7HwDW/9bK/gHXmbZ2xWeGob+Abdu2DbsFrUDbt28H4M477xxyJ1J/XNOXpIYY+pLUEENfkhpi6EtSQwx9SWqIoS9JDTH0Jakhhr4kNcTQl6SGGPqS1BBDX5IaYuhLUkMMfUlqiKEvSQ0x9CWpIYa+JDXE0Jekhhj6ktQQQ1+SGmLoS1JDBgr9JNuTHE7yVJK/7mrnJzmY5Jnu/ryuniR3JekleSLJe5bjDUiSFm/JoZ/kXcBfAZcBvwu8P8l64DbgoaraADzUPQa4DtjQ3bYCXxqgb0nSEgwy0/8d4BtVdbyqpoD/Aj4IbAb2dGP2AB/otjcDe2vaI8C5Sd42wOtLkvo0SOgfBt6XZHWSc4DrgQuBtVX1Qjfmh8Dabnsd8PyM5x/papKkM2R0qU+sqm8n+QzwdeB/gceBE7PGVJLq57hJtjK9/MNFF1201PYkSXMY6ERuVd1dVb9fVX8IvAR8F/jRa8s23f2L3fCjTP9L4DUXdLXZx9xVVRurauPY2Ngg7UmSZhn06p3f6O4vYno9/x+AfcB
4N2Qc+Gq3vQ/Y0l3Fcznw8oxlIEnSGbDk5Z3OPydZDbwK3FxVP0nyaeD+JDcBzwEf7sbuZ3rdvwccBz4y4GtLkvo0UOhX1fvmqE0CV85RL+DmQV5PkjQYP5ErSQ0x9CWpIYa+JDXE0Jekhhj6ktQQQ1+SGmLoS1JDDH1JaoihL0kNMfQlqSGGviQ1xNCXpIYY+pLUEENfkhpi6EtSQwx9SWqIoS9JDTH0Jakhhr4kNcTQl6SGGPqS1BBDX5IaYuhLUkMMfUlqiKEvSQ0x9CWpIQOFfpJbkzyV5HCSe5O8OcklSb6RpJfkH5O8qRt7Vve41+2/eDnegCRp8ZYc+knWAbcAG6vqXcAIcAPwGeDzVbUeeAm4qXvKTcBLXf3z3ThJ0hk06PLOKHB2klHgHOAF4I+Bf+r27wE+0G1v7h7T7b8ySQZ8fUlSH5Yc+lV1FPh74AdMh/3LwGPAT6pqqht2BFjXba8Dnu+eO9WNXz37uEm2JjmU5NDExMRS25MkzWGQ5Z3zmJ69XwK8HXgLsGnQhqpqV1VtrKqNY2Njgx5OkjTDIMs7VwHfq6qJqnoV+BfgD4Bzu+UegAuAo932UeBCgG7/rwOTA7y+JKlPg4T+D4DLk5zTrc1fCXwL+E/gz7ox48BXu+193WO6/f9RVTXA60uS+jTImv43mD4h+9/Ak92xdgF/C3w8SY/pNfu7u6fcDazu6h8Hbhugb0nSEowuPGR+VbUD2DGr/Cxw2Rxjfw58aJDXkyQNxk/kSlJDDH1JashAyztqy86dO+n1esNuY0V47b/D9u3bh9zJyrB+/Xq2bds27Da0CM70pSU466yzeOWVV3j11VeH3YrUF2f6WjRncr/0uc99jq997Wts2LCBW2+9ddjtSIvmTF/q0+TkJAcOHKCqOHDgAJOTfsZQbxyGvtSnPXv2cPLkSQBOnDjB3r17h9yRtHiGvtSnBx98kKmp6b8pODU1xcGDB4fckbR4hr7Up6uuuorR0enTYaOjo1x99dVD7khaPENf6tP4+DirVk3/6IyMjLBly5YhdyQtnqEv9Wn16tVs2rSJJGzatInVq1/3tRDSiuUlm9ISjI+P8/3vf99Zvt5wDH1pCVavXs1dd9017Dakvrm8I0kNMfQlqSGGviQ1xNCXpIZkJX9NbZIJ4Llh9yHNYw3w42E3Ic3hN6tqbK4dKzr0pZUsyaGq2jjsPqR+uLwjSQ0x9CWpIYa+tHS7ht2A1C/X9CWpIc70Jakhhr4kNcTQl6SGGPqS1BBDX5Ia8n+y6aH62hucLAAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [], + "needs_background": "light" + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uEPFcBjFhETQ" + }, + "source": [ + "Como podem ver, os outliers desapareceram, como queríamos." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tHfzjW_ymKuR" + }, + "source": [ + "___\n", + "# **Valores únicos**\n", + "> Considere o array a seguir:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "HzmQgWZVmUUD" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.random.randint(0, 100, 100)\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Dm9ky1F1mrNA" + }, + "source": [ + "Quem são os valores únicos do array?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "G-LPRqc-mS5j" + }, + "source": [ + "np.unique(a_conjunto1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uXZZoTd6nMuq" + }, + "source": [ + "___\n", + "# **Diferença entre dois arrays**\n", + "> O resultado é um array com os **valores únicos de A que não estão em B**. Na teoria de conjuntos escrevemos $A - B = A - A \\cap B$.\n", + "\n", + "![Difference](https://github.com/MathMachado/Materials/blob/master/set_Difference.PNG?raw=true)\n", + "\n", + "Fonte: [Python Set](https://www.learnbyexample.org/python-set/)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uW6i3m9q1ZNs" + }, + "source": [ + "\n", + "* Vamos ver como isso funciona na prática:" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vw05sfe22mfk" + }, + "source": [ + "## Exemplo 1" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Qqw2do90nQ7k" + }, + "source": [ + "a_conjunto1 = np.array([0, 1, 2, 4, 5, 7, 8, 8]) # array de valores que serão excluidos em a_conjunto1. 
Note that '3' does not belong to a_conjunto1.\n", + "a_conjunto2 = np.array([1, 6, 7, 3])" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "zXJ00pOMorM-" + }, + "source": [ + "np.setdiff1d(a_conjunto1, a_conjunto2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8GXZNgjfo8lO" + }, + "source": [ + "Note that the result contains the elements of a_conjunto1 that do not belong to a_conjunto2. But what happens to the '3'?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aJSu6VKb2oc_" + }, + "source": [ + "## Example 2" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "N1wahElXTqoB" + }, + "source": [ + "a_conjunto1 = np.arange(10)\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "nxDpCMg7T7Rj" + }, + "source": [ + "a_conjunto2 = np.array([1, 5, 7])\n", + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "3LU3qYyiUXqm" + }, + "source": [ + "np.setdiff1d(a_conjunto1, a_conjunto2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mzZEytrRUioU" + }, + "source": [ + "Note that the result is a_conjunto1 without the elements of a_conjunto2 (np.setdiff1d returns a new array; a_conjunto1 itself is unchanged). Ok?"
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gJRcoVRUnaY9" + }, + "source": [ + "___\n", + "# Symmetric Difference\n", + "* In set theory, this is called the symmetric difference, written $(A \\cup B) - (A \\cap B)$: the elements that belong to exactly one of the two sets.\n", + "\n", + "![DifferenceSymetric](https://github.com/MathMachado/Materials/blob/master/set_DifferenceSymetric.PNG?raw=true)\n", + "\n", + "Source: [Python Set](https://www.learnbyexample.org/python-set/)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2Uzzm85Kup3H" + }, + "source": [ + "* Let's see how this works in practice:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "1z5wZ8VwpsWN" + }, + "source": [ + "import numpy as np\n", + "a_conjunto1 = np.array([0, 1, 2, 4, 5, 7, 8]) # Note that 1, 4 and 7 belong to a_conjunto1, but 3 does not.\n", + "a_conjunto2 = np.array([1, 4, 7, 3])" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Tqd_9XO5p7bo" + }, + "source": [ + "np.setxor1d(a_conjunto1, a_conjunto2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_meurG3mqS5Y" + }, + "source": [ + "How do we explain or interpret this result?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Kc8JoKe2nj2n" + }, + "source": [ + "___\n", + "# **Union of two arrays**\n", + "> Returns the sorted **unique** values that appear in either of the two arrays. In set theory, we write:\n", + "\n", + "$$A \\cup B$$\n", + "\n", + "![Union](https://github.com/MathMachado/Materials/blob/master/set_Union.PNG?raw=true)\n", + "\n", + "Source: [Python Set](https://www.learnbyexample.org/python-set/)" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "1LZxorw2p2mg" + }, + "source": [ + "a_conjunto1 = np.array([0, 1, 2, 4, 5, 7, 8, 8])\n", + "\n", + "# Note that 1, 4 and 7 belong to a_conjunto1, but 3 does not.\n",
+ "a_conjunto2 = np.array([1, 4, 7, 3])" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "COsZEmSwuY5L" + }, + "source": [ + "np.union1d(a_conjunto1, a_conjunto2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "b53bR-GYRu_3" + }, + "source": [ + "___\n", + "# **Select the items common to arrays X and Y**\n", + "* In set theory, this is called the intersection, written $X \\cap Y$.\n", + "\n", + "![Intersection](https://github.com/MathMachado/Materials/blob/master/set_Intersection.PNG?raw=true)\n", + "\n", + "Source: [Python Set](https://www.learnbyexample.org/python-set/)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "n2ec2tqqR1Gw" + }, + "source": [ + "* Consider the following arrays:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "rXVQQvBqR4J-" + }, + "source": [ + "a_conjunto1 = np.arange(10)\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "pZTHhHxGSRfB" + }, + "source": [ + "a_conjunto2 = np.arange(8, 18)\n", + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MxB2_qHpScMB" + }, + "source": [ + "Which elements are common to a_conjunto1 and a_conjunto2 (X and Y above)?"
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "e-rncJHtSfw0" + }, + "source": [ + "np.intersect1d(a_conjunto1, a_conjunto2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3Bb39sWdfqaF" + }, + "source": [ + "___\n", + "# **Eigenvalues and Eigenvectors**\n", + "> Eigenvalues and eigenvectors are among the most important topics in Machine Learning.\n", + "\n", + "By definition, the scalar $\\lambda$ and the nonzero vector $v$ are an eigenvalue and an eigenvector of the matrix $A$ if\n", + "\n", + "$$Av = \\lambda v$$\n", + "\n", + "## Further Reading:\n", + "\n", + "* [Machine Learning & Linear Algebra — Eigenvalue and eigenvector](https://medium.com/@jonathan_hui/machine-learning-linear-algebra-eigenvalue-and-eigenvector-f8d0493564c9)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XZBKq8nGCUbL" + }, + "source": [ + "* The array a_conjunto2 has the following form:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "iYlZGKFUfw-R" + }, + "source": [ + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "6EfvIbBNf02Z" + }, + "source": [ + "# Compute the eigenvalues and eigenvectors.\n", + "# Note: np.linalg.eig requires a square 2D array, so a_conjunto2 must be one at this point.\n", + "a_autovalores, a_autovetores = np.linalg.eig(a_conjunto2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "v3GtQQvAz9QU" + }, + "source": [ + "The eigenvalues of the array a_conjunto2 are:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "WvZGyBR1f9vP" + }, + "source": [ + "a_autovalores" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AuuDRJVh0FC8" + }, + "source": [ + "The eigenvectors of the array a_conjunto2 are:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "6m4YFAwsf_rA" + }, + "source": [ + "a_autovetores" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"DASn2Un9ZNV-" + }, + "source": [ + "___\n", + "# **Finding Missing Values (NaN)**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TKilWBsSXtR4" + }, + "source": [ + "## Generate the example" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "lqLI2ER_ZUMY" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.random.random(100)\n", + "\n", + "# Draw 15 random indices and set those positions to NaN (a repeated index yields fewer than 15 NaNs):\n", + "np.random.seed(20111974)\n", + "l_indices_aleatorios= np.random.randint(0, 100, size = 15)\n", + "\n", + "for i_indices in l_indices_aleatorios:\n", + " #print(i_indices)\n", + " a_conjunto1[i_indices] = np.nan" + ], + "execution_count": 27, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "2ZkbMPXMawYh", + "outputId": "af5865c5-95fe-4df1-8712-b77543e860c5", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 187 + } + }, + "source": [ + "a_conjunto1" + ], + "execution_count": 28, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([0.53, 0.57, nan, 0.65, 0.86, 0.6 , 0.87, 0.46, nan, 0.64, 0.55,\n", + " 0.35, 0.32, nan, 0.85, 0.76, 0.66, 0.33, 0.35, 0.42, 0.31, 0.27,\n", + " 0.31, 0.36, 0.6 , 0.02, 0.36, nan, 0.28, 0.37, nan, 0.44, 0.2 ,\n", + " 0.21, 0.65, 0.82, 0.72, 0.5 , 0.17, 0.6 , nan, 0.14, nan, 0.71,\n", + " 0.07, 0.56, nan, 0.84, 0.21, 0.85, 0.63, 0.38, 0.91, 0.34, 0.07,\n", + " 0.1 , 0.85, 0.12, 0.94, 0.16, nan, 0.91, 0.59, 0.37, 0.72, 0.07,\n", + " 0.48, 0.78, 0.97, 0.72, 0.29, 0.33, 0.95, 0.24, 0.98, 0.85, 0.63,\n", + " 0.57, 0.67, 0.88, nan, nan, nan, 0.68, 0.29, 0.33, 0.98, 0.17,\n", + " nan, 0.92, 0.98, 0.76, 0.31, 0.97, 0.08, 0.56, nan, 0.49, 0.07,\n", + " 0.11])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 28 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Z7Bs75NvbSjx" + }, + "source": [ + "We drew 15 random indices, but one of them repeats, so a_conjunto1 ends up with 14 NaNs. Now let's count the NaNs (we already know the answer!)."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hL1Wn0vdX8ur" + }, + "source": [ + "## Identify the NaNs" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "5R-n3H0xbd6d", + "outputId": "9dc32bc4-bb41-4c02-bc64-c98461f3cefd", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "np.isnan(a_conjunto1).sum()\n", + "## isnan returns a boolean for each element (True where it is NaN, False otherwise); sum() counts the True values" + ], + "execution_count": 29, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "14" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 29 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "81IoQ-EVbI5X", + "outputId": "fcddc325-8c6c-4545-cf1c-349af38ca954", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "np.isnan\n", + "## np.isnan is a function" + ], + "execution_count": 31, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 31 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "PB-GAJ71bc7i", + "outputId": "0a9a431e-986d-40da-c841-544d25727b38", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 221 + } + }, + "source": [ + "array_nulos = np.isnan(a_conjunto1)\n", + "array_nulos" + ], + "execution_count": 32, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([False, False, True, False, False, False, False, False, True,\n", + " False, False, False, False, True, False, False, False, False,\n", + " False, False, False, False, False, False, False, False, False,\n", + " True, False, False, True, False, False, False, False, False,\n", + " False, False, False, False, True, False, True, False, False,\n", + " False, True, False, False, False, False, False, False, False,\n", + " False, False, False, False, False, False, True, False, False,\n", + " False, False, False, False, 
False, False, False, False, False,\n", + " False, False, False, False, False, False, False, False, True,\n", + " True, True, False, False, False, False, False, True, False,\n", + " False, False, False, False, False, False, True, False, False,\n", + " False])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 32 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Y7hh5uowoa3U" + }, + "source": [ + "So a_conjunto1 has 14 NaNs." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iVLQf_bqbyNU" + }, + "source": [ + "Now I want to know the indices of these NaNs." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "kJHxjZiwb5HM", + "outputId": "57d4bf56-64bd-4969-9281-d311ac926119", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "i_indices= np.where(np.isnan(a_conjunto1))\n", + "## np.where returns the positions of the array where the condition is True\n", + "i_indices" + ], + "execution_count": 30, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(array([ 2, 8, 13, 27, 30, 40, 42, 46, 60, 80, 81, 82, 88, 96]),)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 30 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "W_jHGNImok7L", + "outputId": "cbdcf2d2-3edf-4b8a-9a42-29aa20cbcd94", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Checking that position 2 is NaN\n", + "a_conjunto1[2]" + ], + "execution_count": 33, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "nan" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 33 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iPhHAhDYcMWO" + }, + "source": [ + "Let's check that this is correct. To do so, just compare with l_indices_aleatorios:"
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gxQYslRCe11G" + }, + "source": [ + "___\n", + "# **Delete NaNs from an array**\n", + "> Consider the same array we just worked with. Now I want to remove the NaNs we identified." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "AeBARFqNfNnN", + "outputId": "eb361064-326a-451c-c4fe-0d14b861fbe8", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 187 + } + }, + "source": [ + "a_conjunto1" + ], + "execution_count": 34, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([0.53, 0.57, nan, 0.65, 0.86, 0.6 , 0.87, 0.46, nan, 0.64, 0.55,\n", + " 0.35, 0.32, nan, 0.85, 0.76, 0.66, 0.33, 0.35, 0.42, 0.31, 0.27,\n", + " 0.31, 0.36, 0.6 , 0.02, 0.36, nan, 0.28, 0.37, nan, 0.44, 0.2 ,\n", + " 0.21, 0.65, 0.82, 0.72, 0.5 , 0.17, 0.6 , nan, 0.14, nan, 0.71,\n", + " 0.07, 0.56, nan, 0.84, 0.21, 0.85, 0.63, 0.38, 0.91, 0.34, 0.07,\n", + " 0.1 , 0.85, 0.12, 0.94, 0.16, nan, 0.91, 0.59, 0.37, 0.72, 0.07,\n", + " 0.48, 0.78, 0.97, 0.72, 0.29, 0.33, 0.95, 0.24, 0.98, 0.85, 0.63,\n", + " 0.57, 0.67, 0.88, nan, nan, nan, 0.68, 0.29, 0.33, 0.98, 0.17,\n", + " nan, 0.92, 0.98, 0.76, 0.31, 0.97, 0.08, 0.56, nan, 0.49, 0.07,\n", + " 0.11])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 34 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "e497B492fFru", + "outputId": "68f697f4-4778-43e9-9b4f-87d930eccc46", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 153 + } + }, + "source": [ + "a_conjunto1[~np.isnan(a_conjunto1)]\n", + "## the tilde (~) negates the boolean mask\n", + "## np.isnan produces an array of True and False; negating it and indexing with the result keeps only the positions that are not NaN,\n", + "## so the NaNs are dropped from the result" + ], + "execution_count": 36, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + 
"array([0.53, 0.57, 0.65, 0.86, 0.6 , 0.87, 0.46, 0.64, 0.55, 0.35, 0.32,\n", + " 0.85, 0.76, 0.66, 0.33, 0.35, 0.42, 0.31, 0.27, 0.31, 0.36, 0.6 ,\n", + " 0.02, 0.36, 0.28, 0.37, 0.44, 0.2 , 0.21, 0.65, 0.82, 0.72, 0.5 ,\n", + " 0.17, 0.6 , 0.14, 0.71, 0.07, 0.56, 0.84, 0.21, 0.85, 0.63, 0.38,\n", + " 0.91, 0.34, 0.07, 0.1 , 0.85, 0.12, 0.94, 0.16, 0.91, 0.59, 0.37,\n", + " 0.72, 0.07, 0.48, 0.78, 0.97, 0.72, 0.29, 0.33, 0.95, 0.24, 0.98,\n", + " 0.85, 0.63, 0.57, 0.67, 0.88, 0.68, 0.29, 0.33, 0.98, 0.17, 0.92,\n", + " 0.98, 0.76, 0.31, 0.97, 0.08, 0.56, 0.49, 0.07, 0.11])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 36 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RpvKfJU_fmA6" + }, + "source": [ + "Observe que os NaN's foram excluidos." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7ereghZPcdh4" + }, + "source": [ + "EXERCÍCIO - ATRIBUIR A MEDIANA AOS VALORES DA AMOSTRA\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_Dv8MmNYg8zN" + }, + "source": [ + "___\n", + "# **Converter lista em array**\n", + "> Considere a lista a seguir:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "but6T9dVhFYb", + "outputId": "001bc55c-fe58-40ab-ff3c-3ea90d4e71d7", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "l_Lista = [np.random.randint(0, 10, 10)]\n", + "l_Lista" + ], + "execution_count": 37, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "[array([8, 9, 3, 7, 1, 3, 2, 9, 7, 7])]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 37 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "xytj4Eo4hTh9", + "outputId": "4c2d1778-13ec-4717-a3a8-d538cfcf896a", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "type(l_Lista)" + ], + "execution_count": 38, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + 
"list" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 38 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qrINdcruhWcH" + }, + "source": [ + "Convertendo a minha lista para array:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "RoSyaX0OhZSE", + "outputId": "8b194262-97d5-45ee-9336-3a5c0a5ad802", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_conjunto = np.asarray(l_Lista)\n", + "a_conjunto" + ], + "execution_count": 39, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[8, 9, 3, 7, 1, 3, 2, 9, 7, 7]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 39 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "dMjTdbBUhlrk", + "outputId": "4d140e4f-2e4c-4aa7-8e99-481cb4d1cdc3", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "type(a_conjunto)" + ], + "execution_count": 40, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "numpy.ndarray" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 40 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Mbm3ZP9DhxDI" + }, + "source": [ + "___\n", + "# Converter tupla em array\n", + "> Considere a tupla a seguir:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "cZxEFYLAh3S_" + }, + "source": [ + "np.random.seed(20111974)\n", + "t_numeros = ([np.random.randint(0, 10, 3)], [np.random.randint(0, 10, 3)], [np.random.randint(0, 10, 3)])\n", + "t_numeros" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "vlTXUJviiAml" + }, + "source": [ + "type(t_numeros)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "yEaOlq8oh3oh" + }, + "source": [ + "a_conjunto = np.asarray(t_numeros)\n", + "a_conjunto" + ], + "execution_count": null, + "outputs": 
[] + }, + { + "cell_type": "code", + "metadata": { + "id": "PSgQDmRWh3g5" + }, + "source": [ + "type(a_conjunto)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pH-Ht6yMiqJN" + }, + "source": [ + "___\n", + "# Append elements to an array\n", + "> Consider the following array:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "dFaDZInZiwoo" + }, + "source": [ + "a_conjunto1 = np.arange(5)\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "d3zrlf_Ci73Z" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.append(a_conjunto1, [np.random.randint(0, 10, 3), np.random.randint(0, 10, 3), np.random.randint(0, 10, 3)])\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eFRhtk13ojqA" + }, + "source": [ + "___\n", + "# **Convert 1D arrays into a 2D array**\n", + "> Consider the following arrays:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "wYhBgW9Zu6ZP" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.array(np.random.randint(0, 10, 6))\n", + "\n", + "np.random.seed(19741120)\n", + "a_conjunto2 = np.array(np.random.randint(0, 10, 6))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "febs9AUHvs6n" + }, + "source": [ + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "C9OEd-iavvBm" + }, + "source": [ + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "KJWjtaWKv0MJ" + }, + "source": [ + "np.column_stack((a_conjunto1, a_conjunto2)) # Note the inner parentheses: the arrays are passed as a single tuple (a_conjunto1, a_conjunto2)."
+ ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xr_WZXJ7pi2D" + }, + "source": [ + "___\n", + "# **Excluir um elemento específico do array usando indices**\n", + "> Considere os arrays a seguir:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "tS0ZzOs8w0dw" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.array(np.random.randint(0, 10, 6))\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7bOJiKDKxEsC" + }, + "source": [ + "Suponha que eu queira excluir os valores '8' de a_conjunto1. Os índices dos valores '8' são: [0, 1, 3]. Portanto, temos:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "SSjueEvjxTJO" + }, + "source": [ + "a_conjunto1 = np.delete(a_conjunto1, [0, 1, 3])\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mZkGZ2Rgp--5" + }, + "source": [ + "___\n", + "# **Frequência dos valores únicos de um array**\n", + "> Considere o array a seguir:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Z2BWKfH0xvQ8", + "outputId": "0405171f-5590-434a-87be-39d44e18ce17", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 102 + } + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.array(np.random.randint(0, 10, 100))\n", + "a_conjunto1" + ], + "execution_count": 41, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([8, 8, 2, 8, 9, 1, 8, 0, 4, 2, 0, 8, 9, 3, 7, 1, 3, 2, 9, 7, 7, 9,\n", + " 5, 6, 8, 7, 0, 9, 3, 9, 3, 1, 8, 6, 3, 5, 4, 1, 2, 9, 8, 6, 6, 1,\n", + " 0, 9, 2, 0, 7, 5, 5, 4, 4, 2, 7, 2, 7, 9, 3, 1, 5, 0, 1, 2, 3, 8,\n", + " 7, 5, 4, 0, 5, 9, 6, 6, 1, 3, 6, 0, 4, 9, 2, 1, 0, 9, 1, 4, 2, 9,\n", + " 7, 9, 5, 3, 7, 6, 3, 9, 8, 4, 3, 0])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 41 + } + ] + 
}, + { + "cell_type": "markdown", + "metadata": { + "id": "s_tdQBsax4rQ" + }, + "source": [ + "Suponha que eu queira saber quantas vezes o número/elemento '2' aparece em a_conjunto1." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "6yIlk7pWyAtf", + "outputId": "01f739af-34ea-448c-992f-ee587482c359", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "l_itens_unicos, i_count = np.unique(a_conjunto1, return_counts=True)\n", + "## é uma função que retorna os itens unicos e a quantidade que aparecem\n", + "l_itens_unicos" + ], + "execution_count": 42, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 42 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DyvrIwS9yZIR" + }, + "source": [ + "O que significa o output acima?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "uO-MPMhXyV9H", + "outputId": "4f477738-6362-4177-a6ec-dd559dc9dc71", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "i_count" + ], + "execution_count": 43, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([10, 10, 10, 11, 8, 8, 8, 10, 10, 15])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 43 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zwoezXrPyofK" + }, + "source": [ + "Qual a interpretação do output acima?" 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "HgYycSG7yr5e", + "outputId": "02fe1140-2976-4715-e211-20d27baf3c87", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 51 + } + }, + "source": [ + "np.asarray((l_itens_unicos, i_count))" + ], + "execution_count": 44, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],\n", + " [10, 10, 10, 11, 8, 8, 8, 10, 10, 15]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 44 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SwIZiJAiy06T" + }, + "source": [ + "Qual a interpretação do output acima?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Wy_tqAPgdchD" + }, + "source": [ + "é a frequencia com que cada um aparece" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JpNRpN2Dql3N" + }, + "source": [ + "___\n", + "# **Combinações possíveis de outros arrays**\n", + "> Considere o exemplo a seguir:\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "BUr89dH4zLXD" + }, + "source": [ + "a_conjunto1 = [2, 4, 6]\n", + "a_conjunto2 = [0, 8]\n", + "a_conjunto4 = [1, 5]" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "cEZH6l-Czx7y" + }, + "source": [ + "np.meshgrid(a_conjunto1, a_conjunto2, a_conjunto4)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "btvmDkEcz0tH" + }, + "source": [ + "np.array(np.meshgrid(a_conjunto1, a_conjunto2, a_conjunto4))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Z0xhO7rGz059" + }, + "source": [ + "np.array(np.meshgrid(a_conjunto1, a_conjunto2, a_conjunto4)).T" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "eMv4lFnD0Enn" + }, + "source": [ + "# Resultado final\n", + "a_conjunto3 = 
np.array(np.meshgrid(a_conjunto1, a_conjunto2, a_conjunto4)).T.reshape(-1,3)\n", + "a_conjunto3" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Rz80YANfAh2k" + }, + "source": [ + "___\n", + "# **Wrap Up**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_cyhMsAVXxGC" + }, + "source": [ + "___\n", + "# **Exercícios**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kNjovMw3uJ3R" + }, + "source": [ + "## Exercício 1 - Selecionar os números pares\n", + "> Dado o 1D array abaixo, selecionar somente os números pares." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "isDzQjwjBX3V", + "outputId": "ad54cd80-fa6e-4772-a869-1b7f98b09725", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_conjunto1 = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])\n", + "a_conjunto1" + ], + "execution_count": 45, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 45 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Kq1zt-uO1HXv" + }, + "source": [ + "### **Minha solução**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "YFmK_n2M1Ks9", + "outputId": "496556f7-9ff2-40f7-8cbf-2a1588793e40", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_conjunto1[a_conjunto1 % 2 == 0]" + ], + "execution_count": 46, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([0, 2, 4, 6, 8])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 46 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sScYG0hp05vb" + }, + "source": [ + "___\n", + "## Exercício 2 - Substituir pela mediana\n", + "> Dado o array 1D abaixo, substituir os números pares pela mediana de a_conjunto1." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "XLZ-DIWU1WFs", + "outputId": "aebcdfd3-b244-4cb2-f33b-a8083d241d17", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_conjunto1 = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])\n", + "a_conjunto1" + ], + "execution_count": 47, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 47 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9c4QWJno1WVB" + }, + "source": [ + "### **My solution**\n", + "* First, we need to compute the median.\n", + "* Then, we replace the even values of a_conjunto1 with that median. Note that the median of 0..9 is 4.5, but a_conjunto1 is an integer array, so the assigned value is stored as 4." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "rx7NGAO01Wfb", + "outputId": "575c869f-1a28-49db-9516-47c84cecacc2", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_conjunto1[a_conjunto1 % 2 == 0] = np.median(a_conjunto1)\n", + "a_conjunto1" + ], + "execution_count": 48, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([4, 1, 4, 3, 4, 5, 4, 7, 4, 9])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 48 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2c_AphX82qp8" + }, + "source": [ + "Checking..."
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "9kVta0Cr13Z9", + "outputId": "cb8af387-8353-49f1-cf9e-37ee7c77b607", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + } + }, + "source": [ + "f'The median of a_conjunto1 is: {np.median(a_conjunto1)}'" + ], + "execution_count": 49, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, + "text/plain": [ + "'The median of a_conjunto1 is: 4.0'" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 49 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "L9O-Hf5x26TY" + }, + "source": [ + "___\n", + "## Exercise 3 - Reshape\n", + "> Given the 1D array below, reshape it into a 2D array with 3 columns." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "0_laUvtB4Wl-", + "outputId": "34954fdf-6e28-477c-ac93-03b510461a21", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Set the seed\n", + "np.random.seed(20111974)\n", + "a_conjunto1 = np.array(np.random.randint(1, 10, size = 15))\n", + "a_conjunto1" + ], + "execution_count": 50, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([9, 9, 3, 9, 2, 9, 1, 5, 3, 1, 9, 4, 8, 2, 4])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 50 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dKzEX8TK5b4Z" + }, + "source": [ + "### **My solution**\n", + "* The 1D array a_conjunto1 above has 15 elements. Since we want to turn it into a 2D array with 3 columns, each column will have 5 elements, that is, the result has 5 rows."
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "r5hJ-wMwjXPR", + "outputId": "2eadc741-755c-46d2-bd8c-ad0d3cc8ad8e", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 102 + } + }, + "source": [ + "a_conjunto1.reshape(-1,3)" + ], + "execution_count": 51, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[9, 9, 3],\n", + " [9, 2, 9],\n", + " [1, 5, 3],\n", + " [1, 9, 4],\n", + " [8, 2, 4]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 51 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "I-j5yVD04249" + }, + "source": [ + "a_conjunto1.reshape(5, 3) \n", + "# This could also be a_conjunto1.reshape(-1, 3), where \"-1\" asks NumPy to compute the number of rows. " + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "F1vfS8jE6L0_" + }, + "source": [ + "___\n", + "## Exercise 4 - Reshape\n", + "> Given the 1D array below, reshape it into a 2D array with 2 columns." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "xcN-bez56L1D" + }, + "source": [ + "# Set the seed\n", + "np.random.seed(20111974)\n", + "a_conjunto1 = np.array(np.random.randint(1, 10, size = 16))\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7iICnOyG6fcj" + }, + "source": [ + "### **My solution**\n", + "* The 1D array a_conjunto1 above has 16 elements. We want to turn it into a 2D array with 2 columns (and therefore 8 rows)." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "vdq5ybuD6fcn" + }, + "source": [ + "a_conjunto1.reshape(-1, 2) # The value \"-1\" in the row position asks NumPy to compute the number of rows automatically."
+ ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "haQfWPcCs_H0" + }, + "source": [ + "## Exercício 5\n", + "Para mais exercícios envolvendo arrays, visite a página [Python: Array Exercises, Practice, Solution](https://www.w3resource.com/python-exercises/array/)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LQQL0JS2tnc0" + }, + "source": [ + "## Exercício 6\n", + "Para mais exercícios envolvendo matemática, viste a página [Python Math: - Exercises, Practice, Solution](https://www.w3resource.com/python-exercises/math/index.php)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qNskKFy9t4D5" + }, + "source": [ + "## Exercício 7\n", + "Para mais exercícios envolvendo NumPy em geral, visite a página [NumPy Exercises, Practice, Solution](https://www.w3resource.com/python-exercises/numpy/index.php)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qqc1AiHXuKZ5" + }, + "source": [ + "## Exercício 8\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "jYrgc3KvtmLy" + }, + "source": [ + "" + ], + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file From b7eb0e404bb8b4b6ed419eb273f7e9410c25dc13 Mon Sep 17 00:00:00 2001 From: MariaJacobs70 <72224154+MariaJacobs70@users.noreply.github.com> Date: Wed, 7 Oct 2020 14:42:36 -0300 Subject: [PATCH 2/9] Criado usando o Colaboratory --- Notebooks/NB02__Numpy_alterado.ipynb | 5789 ++++++++++++++++++++++++++ 1 file changed, 5789 insertions(+) create mode 100644 Notebooks/NB02__Numpy_alterado.ipynb diff --git a/Notebooks/NB02__Numpy_alterado.ipynb b/Notebooks/NB02__Numpy_alterado.ipynb new file mode 100644 index 000000000..4ea92acae --- /dev/null +++ b/Notebooks/NB02__Numpy_alterado.ipynb @@ -0,0 +1,5789 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "NB02__Numpy.ipynb", + "provenance": [], + "collapsed_sections": [ + "n8BIbzQbNWUo", + "7eS94uQ4NhVR", 
+ "SYOgJpGYVLUu", + "CaHFxk98W5if", + "ReWUyWiHXCnc", + "CqszHxaKHr2h", + "tXgF1Wl9gHKY", + "Fotx7XUquAo8", + "36kmLUYDvsUI", + "SWO2GdNovxAp", + "vpN54l4vxze5", + "u4HOf9SNytSq", + "6BQ9oZiD9hg5", + "tz5-QdrX9vct", + "p1muBgMX8NK4", + "FxTC2-U88ajk", + "z8EYn0pP25Rh" + ], + "include_colab_link": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "accelerator": "GPU" + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6QhLXoatkvKR" + }, + "source": [ + "

NUMPY

\n", + "\n", + "> NumPy é um pacote para computação científica e álgebra linear para Python.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "b8EZupp68vW8" + }, + "source": [ + "# **AGENDA**:\n", + "> Neste capítulo, vamos abordar os seguintes assuntos:\n", + "\n", + "* NumPy\n", + "* Criar arrays\n", + "* Criar Arrays Multidimensionais\n", + "* Selecionar itens\n", + "* Aplicar funções como max(), min() etc.\n", + "* Calcular Estatísticas Descritivas: média e variância\n", + "* Reshaping\n", + "* Transposta de um array\n", + "* Autovalores e Autovetores\n", + "* Wrap Up\n", + "* Exercícios" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cO5t3xCO8kyK" + }, + "source": [ + "___\n", + "# **NOTAS E OBSERVAÇÕES**\n", + "\n", + "* Nosso foco com o NumPy é facilitar o uso do Pandas." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "z2IFUG4GSB0Z" + }, + "source": [ + "___\n", + "# **CHEATSHEET**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jYLeDVH-SNCg" + }, + "source": [ + "![Numpy](https://github.com/MathMachado/Materials/blob/master/numpy_basics-1.png?raw=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0mKvExmgUFOk" + }, + "source": [ + "# **ESCALAR, VETORES, MATRIZES E TENSORES**\n", + "\n", + "![Tensor](https://github.com/MathMachado/Materials/blob/master/tensor.png?raw=true)\n", + "\n", + "Source: [PyTorch for Deep Learning: A Quick Guide for Starters](https://towardsdatascience.com/pytorch-for-deep-learning-a-quick-guide-for-starters-5b60d2dbb564)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "o00pYRIkXiAU" + }, + "source": [ + "## Import Statement - Primeiros exemplos\n", + "> Como exemplo, considere gerar uma amostra aleatória de tamanho 10 da Distribuição Normal(0, 1):" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "l_XuvcUDWNDk" + }, + "source": [ + "## Importar a library NumPy" + ] + }, + { + "cell_type": "markdown", +
"metadata": { + "id": "am_ZTIGaapCo" + }, + "source": [ + "### **Opção 1**: Importar a biblioteca NumPy COM alias" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "b4irLw6BWVVZ" + }, + "source": [ + "import numpy as np # NM incluiu um comentário nesta linha!" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "JK54ga7dXnJu", + "outputId": "1a31527c-f8b6-44d5-ecbd-9f08abc5f8d6", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 50 + } + }, + "source": [ + "# Set up o número de casas decimais para o NumPy:\n", + "np.set_printoptions(precision = 2, suppress = True)\n", + "\n", + "'''\n", + "Define seed por questões de reproducibilidade, ou seja, \n", + "garante que todos vamos gerar os mesmos números aleatórios\n", + "'''\n", + "np.random.seed(seed = 20111974)\n", + "\n", + "# Gera 10 números aleatórios a partir da Distribuição Normal(media, desvio_padrao)\n", + "media = 0\n", + "desvio_padrao = 1\n", + "a_conjunto1 = np.random.normal(media, desvio_padrao, size = 10) # Array 1D de size = 10\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([ 2.51, 1.11, 2.06, 0.56, 0.3 , 1.05, -0.13, 1.06, 1.14,\n", + " 1.38])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 2 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3-0934isZUm6" + }, + "source": [ + "**Observação**: Altere o valor de [precision] para 4, 2 e 0 e observe o que acontece." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9ob_8S_bYYa2" + }, + "source": [ + "### **Opção 2**: Importar a biblioteca NumPy SEM alias" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "NcGd1ho_XDXU" + }, + "source": [ + "import numpy" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "zFYH6J5-Ydjl" + }, + "source": [ + "# Set up o número de casas decimais para o NumPy:\n", + "numpy.set_printoptions(precision = 2, suppress = True)\n", + "\n", + "'''\n", + "Define seed por questões de reproducibilidade, ou seja, \n", + "garante que todos vamos gerar os mesmos números aleatórios\n", + "'''\n", + "numpy.random.seed(seed = 20111974)\n", + "\n", + "# Gera 10 números aleatórios a partir da Distribuição Normal(media, desvio_padrao)\n", + "media = 0\n", + "desvio_padrao = 1\n", + "numpy.random.normal(media, desvio_padrao, size = 10)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AwWSzYrZWfvA" + }, + "source": [ + "### **Opção 3**: Importar funções específicas da biblioteca NumPy" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "bfYJzcqRa5eu" + }, + "source": [ + "from numpy import set_printoptions\n", + "from numpy.random import seed, normal" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Xj6fbpvubH_p" + }, + "source": [ + "# Set up o número de casas decimais para o NumPy:\n", + "set_printoptions(precision = 2, suppress = True)\n", + "\n", + "'''\n", + "Define seed por questões de reproducibilidade, ou seja, \n", + "garante que todos vamos gerar os mesmos números aleatórios\n", + "'''\n", + "seed(seed = 20111974)\n", + "\n", + "# Gera 10 números aleatórios a partir da Distribuição Normal(media, desvio_padrao)\n", + "media = 0\n", + "desvio_padrao = 1 \n", + "normal(media, desvio_padrao, size = 10) # usa a função importada diretamente, sem o prefixo np.random" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { 
+ "id": "00RerJPChnuP" + }, + "source": [ + "___\n", + "# **Estatísticas Descritivas com NumPy**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Qa6ro1VJlShd" + }, + "source": [ + "## Exemplo 1\n", + "> Vamos voltar ao mesmo exemplo anterior, mas desta vez, usando a opção 1 (com alias):\n", + "\n", + "* Gerar uma amostra aleatória de tamanho 10 da Distribuição Normal(0, 1)." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "31dSBU8khvFk" + }, + "source": [ + "# Set up o número de casas decimais para o NumPy:\n", + "np.set_printoptions(precision = 2, suppress = True)\n", + "\n", + "# Define seed\n", + "np.random.seed(seed = 20111974)\n", + "\n", + "# Gera 10 números aleatórios a partir da Distribuição Normal(media, desvio_padrao)\n", + "media = 0\n", + "desvio_padrao = 1\n", + "\n", + "# Argumentos de np.random.normal: loc (média), scale (desvio-padrão) e size\n", + "a_conjunto1 = np.random.normal(media, desvio_padrao, size = 10) # Array 1D de size = 10\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wa2t0P3nevTh" + }, + "source": [ + "Conferindo a média e o desvio-padrão do array gerado:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "drUyk3f5ekDq" + }, + "source": [ + "f'Distribuição N({np.mean(a_conjunto1)}, {np.std(a_conjunto1)})'" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XSp7Hd-Gib67" + }, + "source": [ + "Estávamos à espera de média = 0 e desvio-padrão = 1. Certo? Por que isso não aconteceu?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HP_8VSgygXOF" + }, + "source": [ + "## **Laboratório 1**\n", + "> Altere os valores de [size] para 100, 1.000, 10.000, 100.000 e 1.000.000 e relate o que acontece com a média e o desvio-padrão." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4TbmVbdcg6iU" + }, + "source": [ + "## **Minha solução**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "-qdiqBVHg-gd" + }, + "source": [ + "# Define a média e o desvio-padrão\n", + "media = 0\n", + "desvio_padrao = 1\n", + "\n", + "# Define seed\n", + "np.random.seed(seed = 20111974)\n", + "l_lista_conjunto = [10, 100, 1000, 10000, 100000, 1000000] # Tamanhos do Laboratório 1; valores muito maiores (ex.: 10**9) podem esgotar a memória\n", + "\n", + "for i_size in l_lista_conjunto:\n", + " a_conjunto1 = np.random.normal(media, desvio_padrao, size = i_size)\n", + " print(f'Size: {i_size} --> Distribuição: N({np.mean(a_conjunto1)}, {np.std(a_conjunto1)})')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bp-YuviQwWqE" + }, + "source": [ + "Com relação à Distribuição Normal($\\mu, \\sigma$), temos que:\n", + "\n", + "![NormalDistribution](https://github.com/MathMachado/Materials/blob/master/NormalDistribution.PNG?raw=true)\n", + "\n", + "Fonte: [Normal Distribution](https://towardsdatascience.com/understanding-the-68-95-99-7-rule-for-a-normal-distribution-b7b7cbf760c2)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KwHBY3Enk04N" + }, + "source": [ + "## Lei Forte dos Grandes Números - LFGN\n", + "> Por favor, leia o que diz a [Law of large numbers](https://en.wikipedia.org/wiki/Law_of_large_numbers). --> 3 minutos.\n", + "\n", + "* O que você aprendeu com isso?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BhwmSkAjlszT" + }, + "source": [ + "## Exemplo 2\n", + "> Vamos nos aprofundar um pouco mais no que diz a LFGN. Para isso, vamos simular o lançamento de dados. Como sabemos, os dados possuem 6 lados numerados de 1 a 6, com igual probabilidade. Certo?\n", + "\n", + "A LFGN nos diz que, à medida que N (o tamanho da amostra, ou número de lançamentos) cresce, a média dos resultados converge para o valor esperado. 
Isso quer dizer que:\n", + "\n", + "$$\\frac{1+2+3+4+5+6}{6}= \\frac{21}{6}= 3,5$$\n", + "\n", + "Ou seja, à medida que N (o tamanho da amostra) cresce, espera-se que a média dos lançamentos se aproxime de 3,5. Ok?\n", + "\n", + "Vamos ver se isso é verdade..." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-QcJXf6roj0D" + }, + "source": [ + "Vamos usar a função np.random.randint (= função randint definida no módulo np.random), a seguir:" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A2u0RzLOrRE2" + }, + "source": [ + "O que significa ou qual é a interpretação do resultado abaixo?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "B3-X_VBerUfa" + }, + "source": [ + "# Define seed\n", + "import numpy as np\n", + "np.random.seed(seed = 20111974)\n", + "\n", + "# Simular 100 lançamentos de um dado (o limite superior 7 não é incluído):\n", + "a_dados_simulados = np.random.randint(1, 7, size = 100)\n", + "a_dados_simulados" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "m8Of2MMIrbF3" + }, + "source": [ + "# Importar o pandas, pois vamos precisar do método value_counts() de pd.Series (pd.value_counts() está obsoleto em versões recentes):\n", + "import pandas as pd\n", + "pd.Series(a_dados_simulados).value_counts()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "54VwED8Br8rx" + }, + "source": [ + "**Interpretação**: Isso quer dizer que fizemos a simulação de lançamento de um dado 100 vezes. Acima, a frequência com que cada lado do dado aparece.\n", + "\n", + "Eu estava à espera de frequência igual para cada um dos lados, isto é, por volta dos 16 ou 17. Ou seja:\n", + "\n", + "$$\\frac{100}{6} \\approx 16,67$$\n", + "\n", + "Mas ok, vamos continuar com nosso experimento..." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "HT_Dak-umC6I" + }, + "source": [ + "# Definir a semente\n", + "np.random.seed(20111974)\n", + "\n", + "for i_size in [10, 30, 50, 75, 100, 1000, 10000, 100000, 1000000]:\n", + " a_dados_simulados = np.random.randint(1, 7, size = i_size)\n", + " print(f'Size: {i_size} --> Média: {np.mean(a_dados_simulados)}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "edWNNOnXtbtd" + }, + "source": [ + "E agora, como você interpreta esses resultados?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eL6gXThkYcSf" + }, + "source": [ + "## Calcular percentis\n", + "> Boxplot" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jlGOQfXfPf0D" + }, + "source": [ + "![BoxPlot](https://github.com/MathMachado/Materials/blob/master/boxplot.png?raw=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "grtEXG2BoNRt" + }, + "source": [ + "Considere o array de retornos (simulados) a seguir:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "DjPKKq01YjF9" + }, + "source": [ + "import numpy as np\n", + "np.random.seed(20111974)\n", + "\n", + "# Simulando Retornos de ativos financeiros com a distribuição Normal(0, 1):\n", + "a_retornos = np.random.normal(0, 1, 100)\n", + "print(f'Média: {np.mean(a_retornos)}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "ajjlfqgssLVO" + }, + "source": [ + "a_retornos" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XZ3m06gv9lei" + }, + "source": [ + "A seguir, o boxplot do array a_retornos:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "QtuwJP449tBQ" + }, + "source": [ + "# Import da biblioteca seaborn: Uma das principais libraries para Data Visualization (outras: matplotlib)\n", + "import seaborn as sns\n", + "\n", + "sns.boxplot(y = a_retornos)" 
+ ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "o9ujdjxNY6qE" + }, + "source": [ + "# Vamos usar o método np.percentile(array, q = [p1, p2, p3, ..., p99])\n", + "percentis = np.percentile(a_retornos, q = [1, 5, 25, 50, 55, 75, 99])\n", + "\n", + "# Primeiro Quartil\n", + "q1 = percentis[2]" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c75g2Egco2lc" + }, + "source": [ + "Em qual posição do array a_retornos se encontra Q3?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "nZr-A82Zo8Kb" + }, + "source": [ + "q3 = percentis[5]\n", + "\n", + "# ou de trás para a frente do conteúdo da lista:\n", + "q3_2 = percentis[-2]\n", + "print(q3, q3_2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "sWrnESPQT4JM" + }, + "source": [ + "# lim_inferior e lim_superior para detecção de outliers\n", + "lim_inferior = q1 - 1.5 * (q3 - q1)\n", + "lim_superior = q3 + 1.5 * (q3 - q1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Yb4-ZJlUUYsi" + }, + "source": [ + "f'Limite Inferior: {lim_inferior}; Limite Superior: {lim_superior}'" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Jr6oXIHlUxOe" + }, + "source": [ + "np.min(a_retornos)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "UxE47cN0U54X" + }, + "source": [ + "np.max(a_retornos)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OTB9HnIac499" + }, + "source": [ + "___\n", + "# **Ordenar itens de um array**\n", + "> Considere o array a seguir:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Jgj8Yw46dBMx" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.random.random(10)\n", + 
"a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cC9272GFdRln" + }, + "source": [ + "Ordenando os itens de a_conjunto1..." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "YUP90nBVdUeF" + }, + "source": [ + "np.sort(a_conjunto1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lG763cDGj-yB" + }, + "source": [ + "___\n", + "# **Obter ajuda**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ehxPlD3EkEYL" + }, + "source": [ + "help(np.random.normal)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1Q_konJVaBsV" + }, + "source": [ + "___\n", + "# **Criar arrays 1D**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "DddZT5kadYJ7" + }, + "source": [ + "import numpy as np\n", + "np.set_printoptions(precision = 2, suppress = True)\n", + "np.random.seed(seed = 20111974)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jaqd-VnF3yIt" + }, + "source": [ + "Criar o array 1D a_conjunto1, com os seguintes números:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "E3niz_zHaF3e" + }, + "source": [ + "a_conjunto1 = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DyfXbW_ZKJBS" + }, + "source": [ + "Qual a dimensão de a_conjunto1?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "gbHlydALKB3R" + }, + "source": [ + "# Dimensão do array\n", + "a_conjunto1.ndim" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "am9otElpKNPa" + }, + "source": [ + "Qual o shape (dimensão) do array a_conjunto1?" 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "juJJ74d2wale" + }, + "source": [ + "# Shape do array: número de itens em cada dimensão\n", + "a_conjunto1.shape" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BHg4Rre3GwPy" + }, + "source": [ + "O array a_conjunto1 poderia ter sido criado usando a função np.arange(inicio, fim, step):" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "I3fyusN7G5Zn" + }, + "source": [ + "# Lembre-se que o valor de stop (10) não é incluído.\n", + "a_conjunto2 = np.arange(start = 0, stop = 10, step = 1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IHCEpmUxXsaK" + }, + "source": [ + "Outra alternativa seria usar np.linspace(start = 0, stop = 9, num = 10). Acompanhe a seguir:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "JB9Y_x3RX1GX" + }, + "source": [ + "# Com np.linspace, o valor de stop (9) é incluído.\n", + "a_conjunto3 = np.linspace(0, 9, 10)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "P6MR8MPeYOZm" + }, + "source": [ + "Compare os resultados de a_conjunto1, a_conjunto2 e a_conjunto3 a seguir:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "tWEzge6HYSFu" + }, + "source": [ + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "lUNlFVKYYT9f" + }, + "source": [ + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Xo8Lid5fYVPW" + }, + "source": [ + "a_conjunto3" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "V9aW7C4vHAcF" + }, + "source": [ + "Ou seja, a_conjunto1 tem os mesmos valores de a_conjunto2 e de a_conjunto3 (este último, porém, com dtype float, pois np.linspace retorna floats). 
Ok?\n", + "\n", + "**ATENÇÃO**: Observe que a sintaxe para criar a_conjunto3 é ligeiramente diferente da sintaxe usada para criar a_conjunto1 e a_conjunto2. Abaixo, a sintaxe do comando np.linspace:\n", + "\n", + "![](https://github.com/MathMachado/Materials/blob/master/linspace_sintaxe.PNG?raw=true)\n", + "\n", + "Source: [HOW TO USE THE NUMPY LINSPACE FUNCTION](https://www.sharpsightlabs.com/blog/numpy-linspace/)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KNnwZa3uvYqE" + }, + "source": [ + "Somar 2 a cada item de a_conjunto1:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Jt2KVyviw0bp" + }, + "source": [ + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "arROkhWXbdTW" + }, + "source": [ + "a_conjunto2 = a_conjunto1 + 2\n", + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZJx2vG86vdVi" + }, + "source": [ + "Multiplicar por 10 cada item de a_conjunto1:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Vm7abO6Ebkun" + }, + "source": [ + "a_conjunto1 = a_conjunto1*10\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0Ev1xnBwaYJG" + }, + "source": [ + "___\n", + "# **Criar Arrays Multidimensionais**\n", + "> Um array 2D, por exemplo, é chamado de matriz." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gHaeAug5vjjd" + }, + "source": [ + "Criar o array com 2 linhas e 3 colunas usando números aleatórios:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "VDi0vIPSYR4F" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.random.randn(2, 3)\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DIdd-nA3tJjV" + }, + "source": [ + "## Dimensão de um array\n", + "> Dimensão é o número de linhas e colunas da matriz." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "pKvjjnkrK-v7" + }, + "source": [ + "a_conjunto1.shape" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-DHS5jXELCfa" + }, + "source": [ + "a_conjunto1 é um array 2D (ou matriz), ou seja, 2 linhas, onde cada linha tem 3 elementos." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HJI6X1wvv4Bg" + }, + "source": [ + "Criar um array com 3 linhas e 3 colunas:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "hXPbWh3Tv26T" + }, + "source": [ + "a_conjunto2 = np.array([[1, 2, 3],[4, 5, 6],[7, 8, 9]])\n", + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "we6ZJOICc7bQ" + }, + "source": [ + "# Número de linhas e colunas de a_conjunto1:\n", + "a_conjunto1.shape" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "f0ocwuI1dED6" + }, + "source": [ + "# Número de linhas e colunas de a_conjunto2\n", + "a_conjunto2.shape" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "CApPtnW0YuRP" + }, + "source": [ + "# Somar 2 à cada elemento de a_conjunto2\n", + "a_conjunto2 = a_conjunto2+2\n", + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + 
"metadata": { + "id": "M87aGmxRY3RW" + }, + "source": [ + "# Multiplicar por 10 cada elemento de a_conjunto2\n", + "a_conjunto2 = a_conjunto2*10\n", + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qZt93y1IL_v7" + }, + "source": [ + "___\n", + "# **Copiar arrays**\n", + "> Considere o array abaixo:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "sH2FTXj5MRRC" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.random.randn(2, 3)\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VtgKeMt6MYrr" + }, + "source": [ + "Fazendo a cópia de a_conjunto1..." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "K0hOHR3IMa-o" + }, + "source": [ + "a_salarios_copia = a_conjunto1.copy()\n", + "a_salarios_copia" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lFpmcR0HkCar" + }, + "source": [ + "___\n", + "# **Operações com arrays**\n", + "> Considere um array com temperaturas em Farenheit dado por:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "VnagcUqVkLhW" + }, + "source": [ + "# Define a seed\n", + "np.random.seed(20111974)\n", + "\n", + "a_temperatura_farenheit = np.array(np.random.randint(0, 100, 10))\n", + "a_temperatura_farenheit " + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "VrjNKfXxk1yv" + }, + "source": [ + "type(a_temperatura_farenheit)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "o1STejhrk0kZ" + }, + "source": [ + "Transformando a temperatura Fahrenheit em Celsius..." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "E_jXflR_lNy3" + }, + "source": [ + "a_temperatura_celsius = 5*a_temperatura_farenheit/9 - 5*32/9\n", + "a_temperatura_celsius" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "U4pCv0pNqPZI" + }, + "source": [ + "# O mesmo resultado, porém, escrito de forma diferente:\n", + "a_temperatura_celsius = (5/9)*a_temperatura_farenheit - (160/9)\n", + "a_temperatura_celsius" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1UT4YD2FawUA" + }, + "source": [ + "___\n", + "# **Selecionar itens**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "pqOv8P1za1m8" + }, + "source": [ + "# Selecionar o segundo item de a_conjunto1 (lembre-se que no Python arrays começam com indice = 0)\n", + "a_conjunto1[1]" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TIwVKk6AyRv6" + }, + "source": [ + "Dado a_conjunto2 abaixo:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "zoDmbXo6bCeu" + }, + "source": [ + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iJXSPp-0yb4w" + }, + "source": [ + "... selecionar o item da linha 2, coluna 3 do array a_conjunto2:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "sJiVfnlzcjRv" + }, + "source": [ + "a_conjunto2[1, 2]" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Xl5HwJIMcv2e" + }, + "source": [ + "# Selecionar o último elemento de a_conjunto1 --> Lembre-se que a_conjunto1 é um array. Desta forma, teremos o último elemento do array!\n", + "a_conjunto1[-1]" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ezTH0HsyrnAl" + }, + "source": [ + "Veja..." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "OBv9EM54rYX3" + }, + "source": [ + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Po3WLFC-rod8" + }, + "source": [ + "a_temperatura_celsius[-1]" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4qJJ2HCedW4h" + }, + "source": [ + "___\n", + "# **Aplicar funções como max(), min() e etc**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "_meTJdUsda4e" + }, + "source": [ + "f'O máximo de a_conjunto1 é: {np.max(a_conjunto1)}'" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "m-wiBkAidnhN" + }, + "source": [ + "f'O mínimo de a_conjunto1 é: {np.min(a_conjunto1)}'" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "lmupnRHQdtwh" + }, + "source": [ + "f'O máximo de a_conjunto2 é: {np.max(a_conjunto2)}'" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "H2z7oB6Bd786" + }, + "source": [ + "f'O máximo de cada LINHA de a_conjunto2 é: {np.max(a_conjunto2, axis = 1)}' # Aqui, axis = 1 é que diz ao numpy que estamos interessados nas linhas" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "gj2ZBDsWeMyk" + }, + "source": [ + "f'O máximo de cada COLUNA de a_conjunto2 é: {np.max(a_conjunto2, axis = 0)}' # axis = 0, diz ao numpy que estamos interessados nas colunas." 
+ ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7_tEfm2IecIU" + }, + "source": [ + "___\n", + "# **Calcular Estatísticas Descritivas: média e variância**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "lIY5jx3ueh7q" + }, + "source": [ + "f'A média de a_conjunto1 é: {np.mean(a_conjunto1)}'" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "VmqSELRReuAW" + }, + "source": [ + "f'A média de a_conjunto2 é: {np.mean(a_conjunto2)}'" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Gxap-Wg5e2_H" + }, + "source": [ + "f'O Desvio Padrão de a_conjunto2 é: {np.std(a_conjunto2)}'" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "R0GcljGtfBvP" + }, + "source": [ + "___\n", + "# **Reshaping**\n", + "> Muito útil em Machine Learning." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vfEmw01j8zux" + }, + "source": [ + "## Exemplo 1\n", + "* O array a_conjunto2 tem a seguinte forma:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "-Lb3VZCCfK_a" + }, + "source": [ + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "YWN_nN-4fD7u" + }, + "source": [ + "# reshaping para 9 linhas e 1 coluna:\n", + "a_conjunto2.reshape(9, 1) # a_conjunto2.reshape(9,-1) produz o mesmo resultado." + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "id9ILRRt7SwY" + }, + "source": [ + "## Mais um exemplo de Reshape\n", + "> Dado o array 1D abaixo, reshape para um array 2D com 2 colunas." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "9RA9Ht2b7Swd", + "outputId": "eadedfd5-fd6c-49c8-db5c-6f8f30d45f36", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Define seed\n", + "np.random.seed(20111974)\n", + "a_conjunto1 = np.array(np.random.randint(1, 10, size = 15))\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([9, 9, 3, 9, 2, 9, 1, 5, 3, 1, 9, 4, 8, 2, 4])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 19 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8KxR4xZT7cRv" + }, + "source": [ + "### Solução\n", + "> Temos 15 elementos em a_conjunto1 para construir (\"reshape\") um array 2D com 2 colunas.\n", + "\n", + "A princípio, a solução seria..." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "VMdHl1Il7wLw", + "outputId": "d51c7263-f523-4af8-9606-ee93cab66f1c", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 163 + } + }, + "source": [ + "a_conjunto1.reshape(-1, 2) # O valor \"-1\" na posição das linhas pede ao NumPy para calcular o número de linhas automaticamente." 
+ ], + "execution_count": null, + "outputs": [ + { + "output_type": "error", + "ename": "ValueError", + "evalue": "ignored", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0ma_numeros1\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreshape\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# O valor \"-1\" na posição das linhas pede ao NumPy para calcular o número de linhas automaticamente.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mValueError\u001b[0m: cannot reshape array of size 15 into shape (2)" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pZS4b4-y708q" + }, + "source": [ + "Por que temos esse erro?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4disywvR8HeH" + }, + "source": [ + "E se fizermos..." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "3oEAAXTp8I7Z", + "outputId": "e8c8a90f-c34a-4304-d9b4-fd7f04ce224f", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Define seed\n", + "np.random.seed(20111974)\n", + "a_conjunto1 = np.array(np.random.randint(1, 10, size = 16)) # Observe que agora temos 16 elementos\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([9, 9, 3, 9, 2, 9, 1, 5, 3, 1, 9, 4, 8, 2, 4, 3])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 21 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iUhth0QV8Rpt" + }, + "source": [ + "Reshaping..." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "9D1y7uD88Qip", + "outputId": "e7d22bcd-c10f-4ea3-e41b-03f6f98a054f", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 153 + } + }, + "source": [ + "a_conjunto1.reshape(-1, 2) # The value \"-1\" in the row position asks NumPy to compute the number of rows automatically." + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[9, 9],\n", + " [3, 9],\n", + " [2, 9],\n", + " [1, 5],\n", + " [3, 1],\n", + " [9, 4],\n", + " [8, 2],\n", + " [4, 3]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 22 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ALh-sq7DMnN5", + "outputId": "db373349-7910-4f1f-93f3-8ac8f67da8b8", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 153 + } + }, + "source": [ + "# OR --> Here we reshape the array into 8 rows and let NumPy infer the 2 columns\n", + "a_conjunto1.reshape(8, -1)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[9, 9],\n", + " [3, 9],\n", + " [2, 9],\n", + " [1, 5],\n", + " [3, 1],\n", + " [9, 4],\n", + " [8, 2],\n", + " [4, 3]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 26 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yvTnrszn8Yk0" + }, + "source": [ + "Why did it work this time?" 
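+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The two outcomes above come down to divisibility: `reshape(-1, 2)` can only infer the number of rows when the total number of elements is a multiple of 2. The cell below is a minimal sketch of that rule. The helper `can_reshape_to_columns` and the arrays `a_15` and `a_16` are hypothetical names introduced only for this illustration."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "def can_reshape_to_columns(a, n_cols):\n",
+ "    # reshape(-1, n_cols) only works when a.size is a multiple of n_cols\n",
+ "    return a.size % n_cols == 0\n",
+ "\n",
+ "a_15 = np.arange(15)\n",
+ "a_16 = np.arange(16)\n",
+ "\n",
+ "print(can_reshape_to_columns(a_15, 2)) # 15 is not a multiple of 2\n",
+ "print(can_reshape_to_columns(a_16, 2)) # 16 is a multiple of 2\n",
+ "print(a_16.reshape(-1, 2).shape)\n",
+ "\n",
+ "try:\n",
+ "    a_15.reshape(-1, 2)\n",
+ "except ValueError as err:\n",
+ "    print('ValueError:', err)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The same check applies to any target shape: the product of the known dimensions must divide the array size exactly."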
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LeQ9LqIE8baG" + }, + "source": [ + "## Last example with reshape\n", + "> Consider the following array:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "OQOC9iiN8hZT" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.random.randn(2, 3)\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Cvce8qBl9Cvq" + }, + "source": [ + "We now want to turn it into an array with 3 rows and 2 columns." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "QDDsYoVt9Klz" + }, + "source": [ + "a_conjunto1.reshape(-1, 2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AdwU5ygt9Svq" + }, + "source": [ + "It could also be..." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "5uBeokKc9Uo-" + }, + "source": [ + "a_conjunto1.reshape(3, -1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OeRBsobc9aKj" + }, + "source": [ + "And finally, it could also be..." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "MDt8UYYH9dBw" + }, + "source": [ + "a_conjunto1.reshape(3, 2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "91o5vycQfdKW" + }, + "source": [ + "___\n", + "# **Transpose**\n", + "* The array a_conjunto2 looks like this:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "RsZwyuhoffjb" + }, + "source": [ + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "A3MzTVoGfiyO" + }, + "source": [ + "# The transpose of the array a_conjunto2 is given by:\n", + "a_conjunto2.T" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ij-ZW5IyzXIb" + }, + "source": [ + "In other words, rows became columns. 
OK?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qLy6ajgpt3lU" + }, + "source": [ + "# **Inverse of a square matrix**\n", + "> If a matrix is non-singular, then its inverse exists.\n", + "\n", + "* If the determinant of a matrix is not equal to 0, then the matrix is non-singular." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "-u7jRq34t9_x" + }, + "source": [ + "import numpy as np\n", + "\n", + "a_conjunto1 = np.array([[1, 2, 3],[4, 5, 6],[7, 8, 9]])\n", + "a_conjunto2 = np.array([[6, 2], [5, 3]])\n", + "a_conjunto3 = np.array([[1, 3, 5],[2, 5, 1],[2, 3, 8]])" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "7zmHHWWlvaYB" + }, + "source": [ + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "3fHKyhOJvcak" + }, + "source": [ + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "vQG7yyfjwLg9" + }, + "source": [ + "a_conjunto3" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qa2Yre2rwgRk" + }, + "source": [ + "## Determinants of the square matrices" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "N6jwuC6twkyc" + }, + "source": [ + "np.linalg.det(a_conjunto1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "QSvViNwzwnhI" + }, + "source": [ + "np.linalg.det(a_conjunto2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "o8jwsnccw5id" + }, + "source": [ + "np.linalg.det(a_conjunto3)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kkVaTgzgw_XJ" + }, + "source": [ + "Next, we compute the inverses of the matrices defined above..." 
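+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Because of floating-point rounding, `np.linalg.det` may return a tiny nonzero number for a matrix that is mathematically singular, so comparing the determinant with 0 can be misleading. One possible complementary check, sketched below, is to look at the rank with `np.linalg.matrix_rank`. The names `m_singular` and `m_regular` are illustrative copies of two of the matrices above, kept separate so the notebook variables are not overwritten."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "m_singular = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\n",
+ "m_regular = np.array([[6, 2], [5, 3]])\n",
+ "\n",
+ "for m in (m_singular, m_regular):\n",
+ "    det = np.linalg.det(m)\n",
+ "    rank = np.linalg.matrix_rank(m)\n",
+ "    # A square matrix is invertible only if its rank equals its number of rows\n",
+ "    print('det =', det, '| rank =', rank, '| invertible:', rank == m.shape[0])\n",
+ "\n",
+ "# For an invertible matrix, M @ inv(M) should be numerically the identity\n",
+ "m_regular_inv = np.linalg.inv(m_regular)\n",
+ "print(np.allclose(m_regular @ m_regular_inv, np.eye(2)))"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Keep this in mind when reading the results of the inversions in the next cells."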
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "b9FgWvTYvpik" + }, + "source": [ + "np.linalg.inv(a_conjunto2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "KsdEt1kIvsM_" + }, + "source": [ + "np.linalg.inv(a_conjunto1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VA_F7_7kccpn" + }, + "source": [ + "Why does a_conjunto1 have no inverse?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ANPBCnmVwOf4" + }, + "source": [ + "np.linalg.inv(a_conjunto3)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XAf9k1egxcdF" + }, + "source": [ + "# **Solving systems of linear equations**\n", + "> Consider the system of linear equations below:\n", + "\n", + "\\begin{equation}\n", + "x + 3y + 5z = 10\\\\\n", + "2x + 5y + z = 8\\\\\n", + "2x + 3y + 8z = 3\n", + "\\end{equation}\n", + "\n", + "In matrix form, $Ax = b$. The solution of this system of equations is given by $A^{-1}b$." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oNf5nqaLxhBY" + }, + "source": [ + "That is, we just need to find the inverse of A and multiply it by b." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "omzC5dGA0btc" + }, + "source": [ + "A = np.array([[1, 3, 5], [2, 5, 1], [2, 3, 8]])\n", + "np.linalg.inv(A)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AiXI3oxB05iE" + }, + "source": [ + "Now we just multiply the inverse matrix $A^{-1}$ above by b. 
" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "XoGebKDa2Fcd" + }, + "source": [ + "A_Inv = np.linalg.inv(A)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "sKaP0a1QZG-P" + }, + "source": [ + "b = np.array([10, 8, 3]).reshape(3, -1)\n", + "b" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "3dAVq8dg19VI" + }, + "source": [ + "A_Inv.dot(b)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zso6hTnB17cm" + }, + "source": [ + "An easier way to do this is to let np.linalg.solve handle the system directly, as in the expression below:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ptQHIVll1E4P" + }, + "source": [ + "b = np.array([[10], [8], [3]])\n", + "b" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "X4VL8lyY1Xus" + }, + "source": [ + "np.linalg.solve(A, b)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fJKmwTS59-Bc" + }, + "source": [ + "# **Stacking arrays**\n", + "\n", + "## Example 1\n", + "\n", + "![Empilhar1](https://github.com/MathMachado/Materials/blob/master/Empilhar1.PNG?raw=true)\n", + "\n", + "## Example 2\n", + "\n", + "![Empilhar2](https://github.com/MathMachado/Materials/blob/master/Empilhar2.PNG?raw=true)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rhPTt3EwXden" + }, + "source": [ + "## Generate the arrays for Example 1" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "zEI-yBy3-E46" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.random.randn(5, 8)\n", + "\n", + "np.random.seed(19741120)\n", + "a_conjunto2 = np.random.randn(8, 8)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UYsAqBRp--79" + }, + "source": [ + "## Method 1 - np.concatenate([A, B])" + ] + }, + { 
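+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Row stacking relies on a simple shape rule: the arrays must have the same number of columns, and the result has the sum of their rows. The cell below is a small sketch of that rule with placeholder arrays `m_top` and `m_bottom` (illustrative names, not part of the original example), using the same shapes as the arrays generated above."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "m_top = np.zeros((5, 8))\n",
+ "m_bottom = np.ones((8, 8))\n",
+ "\n",
+ "# Stacking along axis 0 adds the row counts and keeps the 8 columns\n",
+ "m_stacked = np.concatenate([m_top, m_bottom], axis=0)\n",
+ "print(m_stacked.shape)\n",
+ "\n",
+ "# np.r_ and np.vstack are expected to give the same result for 2D arrays\n",
+ "print(np.array_equal(m_stacked, np.r_[m_top, m_bottom]))\n",
+ "print(np.array_equal(m_stacked, np.vstack([m_top, m_bottom])))"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {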
+ "cell_type": "code", + "metadata": { + "id": "HgO1ujvhObyE", + "outputId": "c40e7ed9-255b-4886-dddf-3b17f2b1be2f", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 187 + } + }, + "source": [ + "a_conjunto1" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[ 2.5062768 , 1.11440422, 2.05565501, 0.56482376, 0.29897276,\n", + " 1.04930857, -0.12607366, 1.06227632],\n", + " [ 1.13807032, 1.37966044, -2.05995563, 0.67474814, 0.72722843,\n", + " -0.33923852, 0.43613107, 0.59135489],\n", + " [-1.29281877, 1.17712036, -0.98644163, -1.79034143, -1.08913605,\n", + " -0.90712825, -1.02291108, -1.36445713],\n", + " [-0.29429164, 0.06343709, -1.14196185, -0.50706079, -0.83539436,\n", + " -1.41492946, -0.2159062 , -1.16519474],\n", + " [-0.60767518, -0.61510925, 1.0771542 , 0.5043687 , 0.02674197,\n", + " 1.83494644, 0.34728874, -1.14671885]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 33 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "2aQY_klZOeg9", + "outputId": "14eb3d9c-d0fc-4b6a-fe19-1790695c838f", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 289 + } + }, + "source": [ + "a_conjunto2" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[-0.77337752, -1.10547465, 0.10062807, -1.14571729, -2.15266227,\n", + " -0.75255725, -2.1529949 , -0.33017773],\n", + " [-1.10465731, 0.32889675, 0.01010198, -1.33213633, -0.33945805,\n", + " -0.01299007, 0.05342823, -0.18641201],\n", + " [ 0.39473805, -0.89354231, -0.50667323, -0.74660913, 1.83586365,\n", + " -1.20536871, 1.20184886, 0.51160897],\n", + " [-0.56952286, -0.93343871, -0.24972528, 0.98487133, 1.19333367,\n", + " 2.29956497, 0.16657022, 0.71357415],\n", + " [-0.45251078, 0.92163918, 0.73421263, 2.17811191, -0.05655212,\n", + " 1.25326 , -0.37039248, 1.43855202],\n", + " [ 0.85646091, -0.11257239, 
-0.35400297, 0.94136671, -0.08696163,\n", + " -1.49000701, 0.00848666, 0.86705275],\n", + " [ 1.6340906 , 1.36321063, -0.02175361, -0.45301645, -0.37111236,\n", + " -0.04716069, -2.27337435, 0.95318738],\n", + " [ 0.7100548 , -0.79883269, -0.3165779 , -1.58352824, -0.37751484,\n", + " -0.29760341, -0.73424207, -0.55703223]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 34 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "bK70vaq8_KMH", + "outputId": "f6d400cf-4b54-4990-815b-052f5224aadd", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 459 + } + }, + "source": [ + "np.concatenate([a_conjunto1, a_conjunto2], axis = 0) # axis = 0 tells NumPy to stack the arrays row-wise" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[ 2.5062768 , 1.11440422, 2.05565501, 0.56482376, 0.29897276,\n", + " 1.04930857, -0.12607366, 1.06227632],\n", + " [ 1.13807032, 1.37966044, -2.05995563, 0.67474814, 0.72722843,\n", + " -0.33923852, 0.43613107, 0.59135489],\n", + " [-1.29281877, 1.17712036, -0.98644163, -1.79034143, -1.08913605,\n", + " -0.90712825, -1.02291108, -1.36445713],\n", + " [-0.29429164, 0.06343709, -1.14196185, -0.50706079, -0.83539436,\n", + " -1.41492946, -0.2159062 , -1.16519474],\n", + " [-0.60767518, -0.61510925, 1.0771542 , 0.5043687 , 0.02674197,\n", + " 1.83494644, 0.34728874, -1.14671885],\n", + " [-0.77337752, -1.10547465, 0.10062807, -1.14571729, -2.15266227,\n", + " -0.75255725, -2.1529949 , -0.33017773],\n", + " [-1.10465731, 0.32889675, 0.01010198, -1.33213633, -0.33945805,\n", + " -0.01299007, 0.05342823, -0.18641201],\n", + " [ 0.39473805, -0.89354231, -0.50667323, -0.74660913, 1.83586365,\n", + " -1.20536871, 1.20184886, 0.51160897],\n", + " [-0.56952286, -0.93343871, -0.24972528, 0.98487133, 1.19333367,\n", + " 2.29956497, 0.16657022, 0.71357415],\n", + " [-0.45251078, 0.92163918, 0.73421263, 2.17811191, -0.05655212,\n", + " 1.25326 
, -0.37039248, 1.43855202],\n", + " [ 0.85646091, -0.11257239, -0.35400297, 0.94136671, -0.08696163,\n", + " -1.49000701, 0.00848666, 0.86705275],\n", + " [ 1.6340906 , 1.36321063, -0.02175361, -0.45301645, -0.37111236,\n", + " -0.04716069, -2.27337435, 0.95318738],\n", + " [ 0.7100548 , -0.79883269, -0.3165779 , -1.58352824, -0.37751484,\n", + " -0.29760341, -0.73424207, -0.55703223]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 35 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CpaXBkm8_BF8" + }, + "source": [ + "## Method 2 - np.r_[A, B]" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "3QnVUzAY_teZ", + "outputId": "e8adfd85-e760-40f5-d9ac-48353d24ccd2", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 459 + } + }, + "source": [ + "np.r_[a_conjunto1, a_conjunto2]" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[ 2.5062768 , 1.11440422, 2.05565501, 0.56482376, 0.29897276,\n", + " 1.04930857, -0.12607366, 1.06227632],\n", + " [ 1.13807032, 1.37966044, -2.05995563, 0.67474814, 0.72722843,\n", + " -0.33923852, 0.43613107, 0.59135489],\n", + " [-1.29281877, 1.17712036, -0.98644163, -1.79034143, -1.08913605,\n", + " -0.90712825, -1.02291108, -1.36445713],\n", + " [-0.29429164, 0.06343709, -1.14196185, -0.50706079, -0.83539436,\n", + " -1.41492946, -0.2159062 , -1.16519474],\n", + " [-0.60767518, -0.61510925, 1.0771542 , 0.5043687 , 0.02674197,\n", + " 1.83494644, 0.34728874, -1.14671885],\n", + " [-0.77337752, -1.10547465, 0.10062807, -1.14571729, -2.15266227,\n", + " -0.75255725, -2.1529949 , -0.33017773],\n", + " [-1.10465731, 0.32889675, 0.01010198, -1.33213633, -0.33945805,\n", + " -0.01299007, 0.05342823, -0.18641201],\n", + " [ 0.39473805, -0.89354231, -0.50667323, -0.74660913, 1.83586365,\n", + " -1.20536871, 1.20184886, 0.51160897],\n", + " [-0.56952286, -0.93343871, -0.24972528, 0.98487133, 1.19333367,\n", + " 
2.29956497, 0.16657022, 0.71357415],\n", + " [-0.45251078, 0.92163918, 0.73421263, 2.17811191, -0.05655212,\n", + " 1.25326 , -0.37039248, 1.43855202],\n", + " [ 0.85646091, -0.11257239, -0.35400297, 0.94136671, -0.08696163,\n", + " -1.49000701, 0.00848666, 0.86705275],\n", + " [ 1.6340906 , 1.36321063, -0.02175361, -0.45301645, -0.37111236,\n", + " -0.04716069, -2.27337435, 0.95318738],\n", + " [ 0.7100548 , -0.79883269, -0.3165779 , -1.58352824, -0.37751484,\n", + " -0.29760341, -0.73424207, -0.55703223]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 36 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XmSPbDP6_20W" + }, + "source": [ + "**Note**: I prefer this method!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dzVKW_wX_Dzw" + }, + "source": [ + "## Method 3 - np.vstack([A, B]) = np.r_[A, B]" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "uL7lEN_mABID", + "outputId": "d1ea4d86-2cc1-4e2d-af72-b3a292ef15fd", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 459 + } + }, + "source": [ + "np.vstack([a_conjunto1, a_conjunto2])" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[ 2.5062768 , 1.11440422, 2.05565501, 0.56482376, 0.29897276,\n", + " 1.04930857, -0.12607366, 1.06227632],\n", + " [ 1.13807032, 1.37966044, -2.05995563, 0.67474814, 0.72722843,\n", + " -0.33923852, 0.43613107, 0.59135489],\n", + " [-1.29281877, 1.17712036, -0.98644163, -1.79034143, -1.08913605,\n", + " -0.90712825, -1.02291108, -1.36445713],\n", + " [-0.29429164, 0.06343709, -1.14196185, -0.50706079, -0.83539436,\n", + " -1.41492946, -0.2159062 , -1.16519474],\n", + " [-0.60767518, -0.61510925, 1.0771542 , 0.5043687 , 0.02674197,\n", + " 1.83494644, 0.34728874, -1.14671885],\n", + " [-0.77337752, -1.10547465, 0.10062807, -1.14571729, -2.15266227,\n", + " -0.75255725, -2.1529949 , -0.33017773],\n", + " [-1.10465731, 0.32889675, 
0.01010198, -1.33213633, -0.33945805,\n", + " -0.01299007, 0.05342823, -0.18641201],\n", + " [ 0.39473805, -0.89354231, -0.50667323, -0.74660913, 1.83586365,\n", + " -1.20536871, 1.20184886, 0.51160897],\n", + " [-0.56952286, -0.93343871, -0.24972528, 0.98487133, 1.19333367,\n", + " 2.29956497, 0.16657022, 0.71357415],\n", + " [-0.45251078, 0.92163918, 0.73421263, 2.17811191, -0.05655212,\n", + " 1.25326 , -0.37039248, 1.43855202],\n", + " [ 0.85646091, -0.11257239, -0.35400297, 0.94136671, -0.08696163,\n", + " -1.49000701, 0.00848666, 0.86705275],\n", + " [ 1.6340906 , 1.36321063, -0.02175361, -0.45301645, -0.37111236,\n", + " -0.04716069, -2.27337435, 0.95318738],\n", + " [ 0.7100548 , -0.79883269, -0.3165779 , -1.58352824, -0.37751484,\n", + " -0.29760341, -0.73424207, -0.55703223]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 37 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "68icJ-2ZAdRj" + }, + "source": [ + "# Concatenating arrays\n", + "\n", + "## Example 1\n", + "\n", + "![Concatenar1](https://github.com/MathMachado/Materials/blob/master/Concatenar1.PNG?raw=true)\n", + "\n", + "## Example 2\n", + "\n", + "![Concatenar2](https://github.com/MathMachado/Materials/blob/master/Concatenar2.PNG?raw=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OplgK9YoQi9o" + }, + "source": [ + "## Concatenate the elements of two arrays - np.c_[A, B]" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "lpdsbTEKQ9EY" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.random.randint(0, 10, 100).reshape(-1, 10)\n", + "a_conjunto2 = np.random.randint(0, 2, 10).reshape(-1, 1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "JPxhGsaSSMk2", + "outputId": "47727fe9-05f1-4ff7-ec0a-04579120cf78", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 187 + } + }, + "source": [ + "a_conjunto1" + ], + "execution_count": null, + 
"outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[8, 8, 2, 8, 9, 1, 8, 0, 4, 2],\n", + " [0, 8, 9, 3, 7, 1, 3, 2, 9, 7],\n", + " [7, 9, 5, 6, 8, 7, 0, 9, 3, 9],\n", + " [3, 1, 8, 6, 3, 5, 4, 1, 2, 9],\n", + " [8, 6, 6, 1, 0, 9, 2, 0, 7, 5],\n", + " [5, 4, 4, 2, 7, 2, 7, 9, 3, 1],\n", + " [5, 0, 1, 2, 3, 8, 7, 5, 4, 0],\n", + " [5, 9, 6, 6, 1, 3, 6, 0, 4, 9],\n", + " [2, 1, 0, 9, 1, 4, 2, 9, 7, 9],\n", + " [5, 3, 7, 6, 3, 9, 8, 4, 3, 0]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 39 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "9ZyUPfybTfej", + "outputId": "ac27a20e-1622-4cb9-d6f6-74ee467bdb72", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 187 + } + }, + "source": [ + "a_conjunto2" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[1],\n", + " [0],\n", + " [0],\n", + " [0],\n", + " [0],\n", + " [1],\n", + " [0],\n", + " [0],\n", + " [0],\n", + " [1]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 40 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "nS1cPG3aRug1", + "outputId": "c70cf891-ae8f-445d-c271-c6b7f7da1738", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 187 + } + }, + "source": [ + "# Place the array a_conjunto2 next to a_conjunto1 (as an extra column).\n", + "np.c_[a_conjunto1, a_conjunto2]" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[8, 8, 2, 8, 9, 1, 8, 0, 4, 2, 1],\n", + " [0, 8, 9, 3, 7, 1, 3, 2, 9, 7, 0],\n", + " [7, 9, 5, 6, 8, 7, 0, 9, 3, 9, 0],\n", + " [3, 1, 8, 6, 3, 5, 4, 1, 2, 9, 0],\n", + " [8, 6, 6, 1, 0, 9, 2, 0, 7, 5, 0],\n", + " [5, 4, 4, 2, 7, 2, 7, 9, 3, 1, 1],\n", + " [5, 0, 1, 2, 3, 8, 7, 5, 4, 0, 0],\n", + " [5, 9, 6, 6, 1, 3, 6, 0, 4, 9, 0],\n", + " [2, 1, 0, 9, 1, 4, 2, 9, 7, 9, 0],\n", + " [5, 3, 7, 6, 3, 9, 8, 4, 3, 0, 1]])" + ] + }, + 
"metadata": { + "tags": [] + }, + "execution_count": 41 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kIgU1YBw0OeM" + }, + "source": [ + "___\n", + "# **Selecting items that satisfy conditions**\n", + "> Consider the following array:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "e2pL5anBV0DI", + "outputId": "f37cd827-ee00-49ba-994d-77cab3a24421", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_conjunto1 = np.arange(10, 0, -1)\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 42 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i9HuZZAfV302" + }, + "source": [ + "Select only the items > 7:" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZCESvr7iXMkV" + }, + "source": [ + "## Using np.where()" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "BdrAQLHkTS-v", + "outputId": "44a6e480-1b6c-4dad-ee29-2fcb4ada5097", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_conjunto1" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 45 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "O_ZBaWxfWA9o", + "outputId": "fae44244-ff29-4b04-cd2d-a4c768487e75", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Indices of the array that satisfy the condition\n", + "l_indices = np.where(a_conjunto1 > 7)\n", + "l_indices" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(array([0, 1, 2]),)" + ] + }, + "metadata": { + "tags": 
[] + }, + "execution_count": 44 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "EdWlfPOZWPME" + }, + "source": [ + "**Note**: So far we have only captured the indices. To select the items themselves, just do:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "tOxs3iYQWWxu", + "outputId": "b402fdfd-c6e0-4170-b35c-c7c5cd2ca85e", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_conjunto2 = a_conjunto1[l_indices]\n", + "a_conjunto2" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([10, 9, 8])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 46 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PGsENqkaXRjh" + }, + "source": [ + "## Alternative: Using []" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "YbdRNk1WXTLT", + "outputId": "062b157c-00fb-4f8f-d207-a0c8e9871e48", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_conjunto1[a_conjunto1 > 7]" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([10, 9, 8])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 47 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jijpzFxcSQC8" + }, + "source": [ + "It is worth breaking this solution down to better understand how things work:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "rujhP2LQSWsq" + }, + "source": [ + "# First, evaluate the result of a_conjunto1 > 7:" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "FYZaBsasSb3N", + "outputId": "0a190896-249c-4d7c-ea0d-a20a53536446", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 51 + } + }, + "source": [ + "a_conjunto1 > 7" + ], + "execution_count": null, + "outputs": [ + { + "output_type": 
"execute_result", + "data": { + "text/plain": [ + "array([ True, True, True, False, False, False, False, False, False,\n", + " False])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 48 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "mvEof-UKaaVG" + }, + "source": [ + "a_conjunto1[a_conjunto1 > 7]" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "nO4FiBmDUZOT", + "outputId": "9f54e601-d95a-444c-bd59-28947e332248", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_conjunto1" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([-1, -1, -1, 7, 6, 5, 4, 3, 2, 1])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 52 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ci5lT9nmSfsX" + }, + "source": [ + "Now, with this result, it is easy to understand how NumPy selects the elements. Can you explain it?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1v5Lfin0GGKD" + }, + "source": [ + "# Replacing items based on conditions\n", + "> Replace the negative values of the array below with 0." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CLY_u0ePWdN7" + }, + "source": [ + "## Generate the example" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "NUANFy-fNXf5" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.array(np.random.randint(0, 10, size = 100))\n", + "\n", + "# Random list of indices to be changed\n", + "np.random.seed(20111974)\n", + "l_indices = np.random.randint(0, 99, 9)\n", + "\n", + "for i in l_indices:\n", + " a_conjunto1[i] = -1*a_conjunto1[i]\n", + "\n", + "a_conjunto2 = a_conjunto1.copy()\n", + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "dWVyI40uN2d2" + }, + "source": [ + "# Indices whose values were multiplied by -1:\n", + "l_indices" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3Whuu854OJDZ" + }, + "source": [ + "## Replace the negative values with 0" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "sr268Rp8b-Se", + "outputId": "82514805-b350-45c4-a3fc-7cb24c847b7f", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_conjunto2 < 0" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([False, False, False])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 50 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "C-eKqPrfOQF6" + }, + "source": [ + "a_conjunto2[a_conjunto2 < 0] = 0\n", + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eDLM0_JSZlfB" + }, + "source": [ + "Note above that the negative values were replaced with 0, as intended." 
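+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The mask assignment above changes the array in place. As a sketch of two alternatives that return a new array instead, the cell below uses `np.where` and `np.clip` on a small made-up array `a_demo` (an illustrative name and data, not part of the original example)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "a_demo = np.array([3, -1, 0, -7, 5])\n",
+ "\n",
+ "# np.where and np.clip return new arrays and leave a_demo untouched\n",
+ "via_where = np.where(a_demo < 0, 0, a_demo)\n",
+ "via_clip = np.clip(a_demo, 0, None)\n",
+ "\n",
+ "# Boolean-mask assignment changes the array in place, so work on a copy\n",
+ "a_inplace = a_demo.copy()\n",
+ "a_inplace[a_inplace < 0] = 0\n",
+ "\n",
+ "print(via_where)\n",
+ "print(np.array_equal(via_where, via_clip), np.array_equal(via_where, a_inplace))\n",
+ "print(a_demo)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The three-argument form of `np.where` is also what the next example relies on."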
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AEHJ0rA3dHHU" + }, + "source": [ + "## Replace the negative (and zero) values with 0 and the positive values with 1" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "y32J8SRNZwRF" + }, + "source": [ + "a_conjunto2 = a_conjunto1.copy()\n", + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "1bSD9Fs6P5wW" + }, + "source": [ + "a_conjunto2 = np.where(a_conjunto2 <= 0, 0, 1)\n", + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i027scjl0qkm" + }, + "source": [ + "___\n", + "# Outliers\n", + "> Any point/observation that is unusual when compared with all the other points/observations." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UnDTqRnZHQ3W" + }, + "source": [ + "## Z-Score\n", + "\n", + "* The Z-Score can be used to detect outliers.\n", + "* It is the difference between the value and the sample mean, expressed as a number of standard deviations.\n", + "* If the z-score is below -2.5 or above 2.5, the value lies among roughly the 1.2% most extreme values of a normal distribution (about 0.6% in each tail). In practice, however, it is common to use 3 instead of 2.5 as the threshold.\n", + "\n", + "![Z_Score](https://github.com/MathMachado/Materials/blob/master/Z_Score.png?raw=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "N7gb2zhtd0uM" + }, + "source": [ + "## IQR Score\n", + "\n", + "* The interquartile range (IQR) is a measure of statistical dispersion, equal to the difference between the 75th (Q3) and 25th (Q1) percentiles, that is, between the upper and lower quartiles: IQR = Q3 - Q1." 
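+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Both criteria can be sketched on a small synthetic sample. The cell below is only an illustration under stated assumptions: the array `a_demo_sample`, its planted value 120, the |z| > 3 threshold and the 1.5*IQR fences are choices made here for the example, following the conventions described above."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "rng = np.random.default_rng(0)\n",
+ "# 200 draws from N(50, 5) plus one planted outlier (120)\n",
+ "a_demo_sample = np.append(rng.normal(50, 5, size=200), 120.0)\n",
+ "\n",
+ "# Z-score: distance from the mean in units of standard deviation\n",
+ "z_scores = (a_demo_sample - a_demo_sample.mean()) / a_demo_sample.std()\n",
+ "outliers_z = a_demo_sample[np.abs(z_scores) > 3]\n",
+ "\n",
+ "# IQR fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR\n",
+ "q1, q3 = np.percentile(a_demo_sample, [25, 75])\n",
+ "iqr = q3 - q1\n",
+ "lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr\n",
+ "outliers_iqr = a_demo_sample[(a_demo_sample < lower) | (a_demo_sample > upper)]\n",
+ "\n",
+ "print(outliers_z)\n",
+ "print(outliers_iqr)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The two fences computed here are the limits beyond which the boxplot shown next flags points as outliers."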
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lMmWOKNvghI7" + }, + "source": [ + "![BoxPlot](https://github.com/MathMachado/Materials/blob/master/boxplot.png?raw=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DUw_a-MjWvBc" + }, + "source": [ + "### Challenge for you to solve\n", + "> **Goal**: Randomly simulate the salaries of 1,000 people following an N(1045, 100) distribution (mean 1045, standard deviation 100).\n", + "* Identify the _outliers_ of the distribution we just simulated;\n", + "* What is the mean of the simulated distribution?\n", + "* What is its standard deviation?\n", + "* Plot the boxplot of the data;\n", + "* How many people have a salary > Q3 + 1.5*(Q3-Q1)?\n", + "* Replace the outliers in the array with:\n", + "  * Q1-1.5*(Q3 - Q1), if the point < Q1-1.5*(Q3-Q1)\n", + "  * Q3+1.5*(Q3 - Q1), if the point > Q3+1.5*(Q3-Q1)\n", + "\n", + "Note: Use np.random.seed(20111974)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "L9ntAdS_oOAh" + }, + "source": [ + "### Random generation of the array a_salarios with distribution $N(\\mu, \\sigma)$" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "RL0Zb0fyDory", + "outputId": "2a3d2b33-579c-406d-d662-da4458f164e6", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 102 + } + }, + "source": [ + "import numpy as np\n", + "np.random.seed(20111974)\n", + "np.set_printoptions(precision = 2, suppress = True)\n", + "\n", + "media = 1045\n", + "desvio_padrao = 100\n", + "i_tamanho = 1000\n", + "\n", + "a_salarios = np.array(np.random.normal(media, desvio_padrao, size = i_tamanho))\n", + "a_salarios[:30]" + ], + "execution_count": 5, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([1295.63, 1156.44, 1250.57, 1101.48, 1074.9 , 1149.93, 1032.39,\n", + " 1151.23, 1158.81, 1182.97, 839. 
, 1112.47, 1117.72, 1011.08,\n", + " 1088.61, 1104.14, 915.72, 1162.71, 946.36, 865.97, 936.09,\n", + " 954.29, 942.71, 908.55, 1015.57, 1051.34, 930.8 , 994.29,\n", + " 961.46, 903.51])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 5 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Fc3a-yhViCTs" + }, + "source": [ + "### Random generation of the indices that will be (manually) changed" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Iakt6i1cgEcB", + "outputId": "9cc09094-5420-4078-a387-e22a58c13f7a", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Random list of indices to be changed\n", + "np.random.seed(19741120)\n", + "l_indices = np.random.randint(0, 999, 10)\n", + "\n", + "# These are the positions that will be changed\n", + "np.sort(l_indices)" + ], + "execution_count": 6, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([ 14, 105, 208, 349, 484, 567, 615, 616, 622, 847])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 6 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oXwME1rciHkw" + }, + "source": [ + "### Copy of the salaries so we can compare BEFORE and AFTER" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "BEtnua7sgp_y", + "outputId": "85b67195-c61a-4f05-ea8f-fc5b05d8973b", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 102 + } + }, + "source": [ + "# Copies of the array a_salarios\n", + "a_salarios_copia = a_salarios.copy()\n", + "a_salarios_copia2 = a_salarios.copy()\n", + "\n", + "a_salarios[:30]" + ], + "execution_count": 7, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([1295.63, 1156.44, 1250.57, 1101.48, 1074.9 , 1149.93, 1032.39,\n", + " 1151.23, 1158.81, 1182.97, 839. 
, 1112.47, 1117.72, 1011.08,\n", + " 1088.61, 1104.14, 915.72, 1162.71, 946.36, 865.97, 936.09,\n", + " 954.29, 942.71, 908.55, 1015.57, 1051.34, 930.8 , 994.29,\n", + " 961.46, 903.51])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 7 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "So8qj3Yrh-Az" + }, + "source": [ + "### Changing the salaries (manually): 2 alternatives\n", + "> Let's time both solutions to see which one is faster. A single run like this is noisy, so treat the timings as a rough indication only." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Z0613on8z5VH" + }, + "source": [ + "from timeit import default_timer as timer\n", + "from datetime import timedelta" + ], + "execution_count": 8, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "NpvvholVxMhs", + "outputId": "2dbfff71-3249-4fd8-fd48-c9e1356dde34", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Indices to be changed\n", + "l_indices" + ], + "execution_count": 9, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([567, 14, 616, 484, 208, 105, 349, 615, 622, 847])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 9 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BqXsmMdm1yF-" + }, + "source": [ + "#### Solution 1" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "FiiOrlnbgKOD", + "outputId": "82a3c137-568d-4776-d952-11d00b1e40e7", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Change the salaries at the chosen indices\n", + "start = timer()\n", + "for i_indice in l_indices:\n", + " a_salarios_copia[i_indice] = 2*a_salarios[i_indice] # Loop over the indices to be changed (manually)\n", + "\n", + "a_salarios_copia[:30]\n", + "end = timer()\n", + "print(timedelta(seconds=end-start))" + ], + "execution_count": 10, + "outputs": [ + { + "output_type": "stream", + "text": [ + 
"0:00:00.000094\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FgvKC-aFzWpZ" + }, + "source": [ + "#### Solution 2" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "XWlQC5Jazt26", + "outputId": "8640d081-99ae-4235-e2de-620d6152193b", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "start = timer()\n", + "a_salarios_copia2[l_indices] = 2*a_salarios_copia2[l_indices]\n", + "a_salarios_copia2[:30]\n", + "end = timer()\n", + "print(timedelta(seconds=end-start))" + ], + "execution_count": 11, + "outputs": [ + { + "output_type": "stream", + "text": [ + "0:00:00.000090\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U92w03afhrmC" + }, + "source": [ + "### Compare" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Ls-jCFCYhtD8", + "outputId": "04b7eff2-67d0-4f8f-8812-0b4be9bb7cb7", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 51 + } + }, + "source": [ + "# Before\n", + "a_salarios[l_indices]" + ], + "execution_count": 12, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([ 826.43, 1088.61, 1121.95, 833.96, 1165.97, 1081.13, 1078.51,\n", + " 1094.67, 904.32, 1128.66])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 12 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "nwwU06OahzD2", + "outputId": "e18448a4-97f4-452c-da95-3d1db69b1033", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 51 + } + }, + "source": [ + "# After\n", + "a_salarios_copia[l_indices]" + ], + "execution_count": 13, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([1652.85, 2177.23, 2243.89, 1667.93, 2331.93, 2162.26, 2157.02,\n", + " 2189.34, 1808.63, 2257.32])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 13 + } + ] + }, + { + "cell_type": "code", + "metadata": { + 
"id": "qyUUdHmtisJS", + "outputId": "779e41e6-cb77-4966-b5d5-fe4d5b4f2ab2", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 102 + } + }, + "source": [ + "# First 30 elements of a_salarios\n", + "a_salarios[:30]" + ], + "execution_count": 14, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([1295.63, 1156.44, 1250.57, 1101.48, 1074.9 , 1149.93, 1032.39,\n", + " 1151.23, 1158.81, 1182.97, 839. , 1112.47, 1117.72, 1011.08,\n", + " 1088.61, 1104.14, 915.72, 1162.71, 946.36, 865.97, 936.09,\n", + " 954.29, 942.71, 908.55, 1015.57, 1051.34, 930.8 , 994.29,\n", + " 961.46, 903.51])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 14 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "CJ1FEjlCi0-n", + "outputId": "5c6c8845-8f83-4047-9f4f-3034d2ec2af7", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 102 + } + }, + "source": [ + "# First 30 elements of a_salarios_copia\n", + "a_salarios_copia[:30]" + ], + "execution_count": 15, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([1295.63, 1156.44, 1250.57, 1101.48, 1074.9 , 1149.93, 1032.39,\n", + " 1151.23, 1158.81, 1182.97, 839. 
, 1112.47, 1117.72, 1011.08,\n", + " 2177.23, 1104.14, 915.72, 1162.71, 946.36, 865.97, 936.09,\n", + " 954.29, 942.71, 908.55, 1015.57, 1051.34, 930.8 , 994.29,\n", + " 961.46, 903.51])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 15 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wKbSUgxxiOUL" + }, + "source": [ + "### Some descriptive statistics:\n", + "#### Before" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ZnmykyahLWX9", + "outputId": "b2c70db1-2870-48e7-c031-94f122415bc8", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + } + }, + "source": [ + "f'Mean: {np.mean(a_salarios)}; Median: {np.median(a_salarios)}; STD: {np.std(a_salarios)}'" + ], + "execution_count": 16, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, + "text/plain": [ + "'Mean: 1047.150212238584; Median: 1047.631166829137; STD: 101.18708333868835'" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 16 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "80H92CIjibYJ" + }, + "source": [ + "#### After" + ], + "execution_count": 17, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "5iO-BAikieHJ", + "outputId": "ea72b3f5-5682-4971-e1f5-26aec68bc43c", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + } + }, + "source": [ + "f'Mean: {np.mean(a_salarios_copia)}; Median: {np.median(a_salarios_copia)}; STD: {np.std(a_salarios_copia)}'" + ], + "execution_count": 18, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, + "text/plain": [ + "'Mean: 1057.4744151862524; Median: 1048.089607774499; STD: 144.64306489539533'" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 18 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"ILhNe80xW5C6" + }, + "source": [ + "### Solução do desafio" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "U993i1GJg2hk", + "outputId": "bf91af51-aac7-4008-8cc3-342573752205", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 271 + } + }, + "source": [ + "# Import a biblioteca seaborn:\n", + "import seaborn as sns\n", + "sns.boxplot(y = a_salarios_copia)" + ], + "execution_count": 19, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 19 + }, + { + "output_type": "display_data", + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAADtCAYAAABTaKWmAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAWIklEQVR4nO3df4zcdZ3H8eeL3QUBT2m3a8Vtua1OvQv+OCUjkBhyKG3ZEmP9485gLnZQco0CpRgSA9jYAHrx1EhoT0l6oWF7IXCc4tmEdsvW846YXLFbDigtaCdQaNdS1imgpgjd7fv+mE9xWPbHzOx2Zybf1yOZ9Pt9fz8z8/4S9jWffL/fma8iAjMzy4bTGt2AmZnNHoe+mVmGOPTNzDLEoW9mliEOfTOzDHHom5llyJShL2mhpF9I2idpr6Q1Y7bfKCkkzUvrkrReUlHSk5IuqBhbkLQ/PQozvztmZjaZ9irGjAA3RsRjkv4C2C1pICL2SVoILANeqBi/HFicHhcBdwEXSZoLrAPyQKTX2RIRL8/g/piZ2SSmDP2IOAwcTst/kPQ00A3sA+4Avg78rOIpK4DNUf7W105J50g6F7gUGIiIowCSBoBe4L6J3nvevHnR09NTx26ZmWXX7t27fxcRXeNtq2am/yZJPcDHgUclrQCGIuIJSZXDuoGDFeuHUm2i+tj3WAWsAjjvvPMYHByspUUzs8yT9PxE26o+kSvpncBPgBsoH/K5BfjmtLsbIyI2RkQ+IvJdXeN+UJmZWZ2qCn1JHZQD/96IeBD4ALAIeELSAWAB8Jik9wJDwMKKpy9ItYnqZmY2S6q5ekfA3cDTEfEDgIjYExHviYieiOihfKjmgoh4EdgCrExX8VwMvJrOC2wHlkmaI2kO5RPA20/NbpmZ2XiqOab/SeCLwB5Jj6faLRGxdYLxW4ErgCJwDPgSQEQclXQ7sCuNu+3kSV0zM5sd1Vy980tAU4zpqVgO4NoJxm0CNtXWolnzKZVK3Hrrraxbt47Ozs5Gt2NWNX8j16wOfX197Nmzh82bNze6FbOaOPTNalQqlejv7yci6O/vp1QqNbols6o59M1q1NfXx4kTJwAYHR31bN9aikPfrEY7duxgZGQEgJGREQYGBhrckVn1HPpmNVqyZAnt7eVrINrb21m6dGmDOzKrnkPfrEaFQoHTTiv/6bS1tbFy5coGd2RWPYe+WY06Ozvp7e1FEr29vb5k01pKTT+4ZmZlhUKBAwcOeJZvLcczfTOzDHHom9XBX86yVuXQN6tR5Zeztm3b5i9nWUtx6JvVqK+vj+PHjwNw/Phxz/atpTj0zWo0MDBA+XcFISJ
4+OGHG9yRWfUc+mY1mj9//qTrZs3MoW9WoyNHjky6btbMHPpmNVq6dCnlG8qBJJYtW9bgjsyqV83tEhdK+oWkfZL2SlqT6t+T9IykJyX9VNI5Fc+5WVJR0q8lXV5R7021oqSbTs0umZ1ahULhLb+94y9oWSupZqY/AtwYEecDFwPXSjofGAA+HBEfBX4D3AyQtl0JfAjoBX4kqU1SG/BDYDlwPvCFNNaspXR2dtLd3Q1Ad3e3f4bBWsqUoR8RhyPisbT8B+BpoDsiHo6IkTRsJ7AgLa8A7o+I1yPiOcr3yr0wPYoR8WxEvAHcn8aatZRSqcTQ0BAAQ0NDvk7fWkpNx/Ql9QAfBx4ds+nLwLa03A0crNh2KNUmqo99j1WSBiUNDg8P19Ke2azo6+t7y+/p+zp9ayVVh76kdwI/AW6IiN9X1L9B+RDQvTPRUERsjIh8ROS7urpm4iXNZpSv07dWVlXoS+qgHPj3RsSDFfWrgM8A/xAn/wpgCFhY8fQFqTZR3ayl+Dp9a2XVXL0j4G7g6Yj4QUW9F/g68NmIOFbxlC3AlZLOkLQIWAz8CtgFLJa0SNLplE/2bpm5XTGbHYcPH5503ayZVfN7+p8EvgjskfR4qt0CrAfOAAbSNcs7I+IrEbFX0gPAPsqHfa6NiFEASdcB24E2YFNE7J3RvTGbBR0dHbz++utvWTdrFVOGfkT8EtA4m7ZO8pxvA98ep751sueZtYI//vGPk66bNTN/I9esRj09PZOumzUzh75ZjdauXTvpulkzc+ib1SiXy705u+/p6SGXyzW2IbMaOPTN6rB27VrOPvtsz/Kt5VRz9Y6ZjZHL5XjooYca3YZZzTzTNzPLEIe+mVmGOPTNzDLEoW9mliEOfTOzDHHom5lliEPfzCxDHPpmZhni0DczyxCHvplZhlRz56yFkn4haZ+kvZLWpPpcSQOS9qd/56S6JK2XVJT0pKQLKl6rkMbvl1Q4dbtlZmbjqWamPwLcGBHnAxcD10o6H7gJ+HlELAZ+ntYBllO+ReJiYBVwF5Q/JIB1wEXAhcC6kx8UZmY2O6YM/Yg4HBGPpeU/AE8D3cAKoC8N6wM+l5ZXAJujbCdwjqRzgcuBgYg4GhEvAwNA74zujZmZTaqmY/qSeoCPA48C8yPi5B2hXwTmp+Vu4GDF0w6l2kR1MzObJVWHvqR3Aj8BboiI31dui4gAYiYakrRK0qCkweHh4Zl4STMzS6oKfUkdlAP/3oh4MJWPpMM2pH9fSvUhYGHF0xek2kT1t4iIjRGRj4h8V1dXLftiZmZTqObqHQF3A09HxA8qNm0BTl6BUwB+VlFfma7iuRh4NR0G2g4skzQnncBdlmpmZjZLqrlz1ieBLwJ7JD2earcA3wEekHQ18Dzw+bRtK3AFUASOAV8CiIijkm4HdqVxt0XE0RnZCzMzq4rKh+ObUz6fj8HBwUa3YWbWUiTtjoj8eNv8jVwzswxx6JuZZYhD38wsQxz6ZmYZ4tA3M8sQh76ZWYY49M3MMsShb2aWIQ59M7MMceibmWWIQ9/MLEMc+mZmGeLQNzPLEIe+mVmGOPTN6lAqlbj++usplUqNbsWsJg59szr09fWxZ88eNm/e3OhWzGpSze0SN0l6SdJTFbWPSdop6fF0E/MLU12S1ksqSnpS0gUVzylI2p8ehfHey6wVlEol+vv7iQj6+/s927eWUs1M/x6gd0ztu8CtEfEx4JtpHWA5sDg9VgF3AUiaC6wDLgIuBNal++SatZy+vj5OnDgBwOjoqGf71lKmDP2IeAQYey/bAN6Vlt8N/DYtrwA2R9lO4BxJ5wKXAwMRcTQiXgYGePsHiVlL2LFjByMjIwCMjIwwMDDQ4I7MqlfvMf0bgO9JOgh8H7g51buBgxXjDqXaRPW3kbQqHTIaHB4errM9s1NnyZIltLe3A9De3s7SpUsb3JFZ9eoN/a8CX4uIhcDXgLtnqqGI2BgR+YjId3V1zdTLms2YQqHAaaeV/3T
a2tpYuXJlgzsyq169oV8AHkzL/0H5OD3AELCwYtyCVJuobtZyOjs76e3tRRK9vb10dnY2uiWzqtUb+r8F/jYtfxrYn5a3ACvTVTwXA69GxGFgO7BM0px0AndZqpm1pEKhwEc+8hHP8q3ltE81QNJ9wKXAPEmHKF+F84/AnZLagT9RvlIHYCtwBVAEjgFfAoiIo5JuB3alcbdFxNiTw2Yto7Ozk/Xr1ze6DbOaKSIa3cOE8vl8DA4ONroNM7OWIml3ROTH2+Zv5JqZZYhD38wsQxz6ZmYZ4tA3M8sQh76ZWYY49M3MMsShb1YH30TFWpVD36wOvomKtSqHvlmNSqUS27ZtIyLYtm2bZ/vWUhz6ZjXq6+t78/f0jx8/7tm+tRSHvlmNBgYGOPnzJRHBww8/3OCOzKrn0Der0fz58yddN2tmDn2zGh05cmTSdbNm5tA3q9HSpUuRBIAkli1b1uCOzKrn0DerUaFQoKOjA4COjg7fSMVaypShL2mTpJckPTWmvlrSM5L2SvpuRf1mSUVJv5Z0eUW9N9WKkm6a2d0wmz2Vt0tcvny5b5doLWXKO2cB9wD/Arx5XZqkTwErgL+JiNclvSfVzweuBD4EvA/YIemD6Wk/BJYCh4BdkrZExL6Z2hGz2VQoFDhw4IBn+dZypgz9iHhEUs+Y8leB70TE62nMS6m+Arg/1Z+TVOTPN00vRsSzAJLuT2Md+taSfLtEa1X1HtP/IHCJpEcl/Y+kT6R6N3CwYtyhVJuobmZms6iawzsTPW8ucDHwCeABSe+fiYYkrSLdaP28886biZc0M7Ok3pn+IeDBKPsVcAKYBwwBCyvGLUi1iepvExEbIyIfEfmurq462zMzs/HUG/r/CXwKIJ2oPR34HbAFuFLSGZIWAYuBXwG7gMWSFkk6nfLJ3i3Tbd7MzGoz5eEdSfcBlwLzJB0C1gGbgE3pMs43gEKUf4xkr6QHKJ+gHQGujYjR9DrXAduBNmBTROw9BftjZmaT0MkfjmpG+Xw+BgcHG92GmVlLkbQ7IvLjbfM3cs3MMsShb2aWIQ59M7MMceibmWWIQ9/MLEMc+mZmGeLQNzPLEIe+mVmGOPTNzDLEoW9mliEOfTOzDHHom5lliEPfzCxDHPpmZhni0DczyxCHvplZhkwZ+pI2SXop3SVr7LYbJYWkeWldktZLKkp6UtIFFWMLkvanR2Fmd8PMzKpRzUz/HqB3bFHSQmAZ8EJFeTnl++IuBlYBd6WxcynfZvEi4EJgnaQ502ncrJFKpRLXX389pVKp0a2Y1WTK0I+IR4Cj42y6A/g6UHm/xRXA5ijbCZwj6VzgcmAgIo5GxMvAAON8kJi1ir6+Pvbs2cPmzZsb3YpZTeo6pi9pBTAUEU+M2dQNHKxYP5RqE9XHe+1VkgYlDQ4PD9fTntkpVSqV6O/vJyLo7+/3bN9aSs2hL+ks4BbgmzPfDkTExojIR0S+q6vrVLyF2bT09fVx4sQJAEZHRz3bt5ZSz0z/A8Ai4AlJB4AFwGOS3gsMAQsrxi5ItYnqZi1nx44djIyMADAyMsLAwECDOzKrXs2hHxF7IuI9EdETET2UD9VcEBEvAluAlekqnouBVyPiMLAdWCZpTjqBuyzVzFrOJZdcMum6WTOr5pLN+4D/Bf5K0iFJV08yfCvwLFAE/hW4BiAijgK3A7vS47ZUM2s5ETH1ILMmpWb+Hzifz8fg4GCj2zB7iyuuuIJjx469uX7WWWexdevWBnZk9laSdkdEfrxt/kauWY2WLFlCW1sbAG1tbSxdurTBHZlVr73RDVjr2LBhA8VisdFtNNzx48cZHR0F4MSJE+zfv581a9Y0uKvGyuVyrF69utFtWBU80zerUUdHB+3t5fnS3Llz6ejoaHBHZtXzTN+q5pncn11zzTU8//zzbNy4kc7Ozka3Y1Y1z/TN6tDR0UEul3PgW8tx6JuZZYhD38wsQxz6ZmYZ4tA3M8sQh76ZWYY49M3MMsShb2aWIQ5
9M7MMceibmWWIQ9/MLEMc+mZmGVLNnbM2SXpJ0lMVte9JekbSk5J+Kumcim03SypK+rWkyyvqvalWlHTTzO+KmZlNpZqZ/j1A75jaAPDhiPgo8BvgZgBJ5wNXAh9Kz/mRpDZJbcAPgeXA+cAX0lgzM5tFU4Z+RDwCHB1TezgiRtLqTmBBWl4B3B8Rr0fEc5TvlXthehQj4tmIeAO4P401M7NZNBPH9L8MbEvL3cDBim2HUm2i+ttIWiVpUNLg8PDwDLRnZmYnTSv0JX0DGAHunZl2ICI2RkQ+IvJdXV0z9bJmZsY07pwl6SrgM8BlERGpPAQsrBi2INWYpG5mZrOkrpm+pF7g68BnI+JYxaYtwJWSzpC0CFgM/ArYBSyWtEjS6ZRP9m6ZXutmZlarKWf6ku4DLgXmSToErKN8tc4ZwIAkgJ0R8ZWI2CvpAWAf5cM+10bEaHqd64DtQBuwKSL2noL9MTOzSUwZ+hHxhXHKd08y/tvAt8epbwW21tSdmZnNKH8j18wsQxz6ZmYZ4tA3M8uQui/ZzIoNGzZQLBYb3YY1mZP/T6xZs6bBnVizyeVyrF69utFtTMihP4ViscjjTz3N6FlzG92KNZHT3ih/NWX3s0ca3Ik1k7ZjR6ce1GAO/SqMnjWX1/76ika3YWZN7sxnmv8CRR/TNzPLEIe+mVmGOPTNzDLEoW9mliEOfTOzDPHVO1MYGhqi7dirLXFW3swaq+1YiaGhkakHNpBn+mZmGeKZ/hS6u7t58fV2X6dvZlM685mtdHfPb3Qbk/JM38wsQ6YMfUmbJL0k6amK2lxJA5L2p3/npLokrZdUlPSkpAsqnlNI4/dLKpya3TEzs8lUM9O/B+gdU7sJ+HlELAZ+ntYBllO+ReJiYBVwF5Q/JCjfcesi4EJg3ckPCjMzmz1Thn5EPAKM/RWhFUBfWu4DPldR3xxlO4FzJJ0LXA4MRMTRiHgZGODtHyRmZnaK1XtMf35EHE7LLwInz1x0Awcrxh1KtYnqbyNplaRBSYPDw8N1tmdmZuOZ9onciAggZqCXk6+3MSLyEZHv6uqaqZc1MzPqv2TziKRzI+JwOnzzUqoPAQsrxi1ItSHg0jH1/67zvWdd27Gj/nKWvcVpf/o9ACfe8a4Gd2LNpPx7+s19yWa9ob8FKADfSf/+rKJ+naT7KZ+0fTV9MGwH/qni5O0y4Ob62549uVyu0S1YEyoW/wBA7v3N/Qdus21+02fGlKEv6T7Ks/R5kg5RvgrnO8ADkq4Gngc+n4ZvBa4AisAx4EsAEXFU0u3ArjTutoho/lvMQFPf9swa5+RtEu+8884Gd2JWmylDPyK+MMGmy8YZG8C1E7zOJmBTTd2ZmdmM8jdyzcwyxKFvZpYhDn0zswxx6JuZZYhD38wsQxz6ZmYZ4tA3M8sQh76ZWYY49M3MMsShb2aWIQ59M7MMceibmWWIQ9/MLEMc+mZmGeLQNzPLEIe+mVmGTCv0JX1N0l5JT0m6T9I7JC2S9KikoqR/l3R6GntGWi+m7T0zsQNmZla9ukNfUjdwPZCPiA8DbcCVwD8Dd0REDngZuDo95Wrg5VS/I40zM7NZNN3DO+3AmZLagbOAw8CngR+n7X3A59LyirRO2n6ZJE3z/c3MrAZ1h35EDAHfB16gHPavAruBVyJiJA07BHSn5W7gYHruSBrfOfZ1Ja2SNChpcHh4uN72zMxsHNM5vDOH8ux9EfA+4Gygd7oNRcTGiMhHRL6rq2u6L2dmZhWmc3hnCfBcRAxHxHHgQeCTwDnpcA/AAmAoLQ8BCwHS9ncDpWm8v5mZ1Wg6of8CcLGks9Kx+cuAfcAvgL9LYwrAz9LylrRO2v5fERHTeH8zM6tR+9RDxhcRj0r6MfAYMAL8H7AReAi4X9K3Uu3u9JS7gX+TVASOUr7Sx1rIhg0bKBaLjW6jKZz877BmzZoGd9Iccrkcq1evbnQbVoW6Qx8gItYB68aUnwUuHGfsn4C/n877mTWLjo4
OXnnlFV577TXOPPPMRrdjVrVphb5li2dyf3bVVVfxyiuv8MYbb7Bx48ZGt2NWNf8Mg1mNisUiBw4cAODAgQM+5GUtxaFvVqNvfetbk66bNTOHvlmNTs7yJ1o3a2YOfbMa9fT0TLpu1swc+mY1Wrt27aTrZs3MoW9Wo1wu9+bsvqenh1wu19iGzGrg0Derw9q1azn77LM9y7eW4+v0zeqQy+V46KGHGt2GWc080zczyxCHvplZhjj0zcwyxKFvZpYhauaftJc0DDzf6D7MJjAP+F2jmzAbx19GxLi3Hmzq0DdrZpIGIyLf6D7MauHDO2ZmGeLQNzPLEIe+Wf189xRrOT6mb2aWIZ7pm5lliEPfzCxDHPpmZhni0DczyxCHvplZhvw/5I+5LV0j8I0AAAAASUVORK5CYII=\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [], + "needs_background": "light" + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VtenLK1uK1Pi" + }, + "source": [ + "Consegue identificar os outliers do array?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e3sHuGVGFBdW" + }, + "source": [ + "## Objetivo\n", + "> Substituir os outliers por mediana. \n", + "\n", + "* Como fazer isso?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RSegPNKCI-dS" + }, + "source": [ + "### Siga os passos a seguir\n", + "1. Calcule estatísticas descritivas antes das transformações par avaliar o impacto;\n", + " * Calcule média, mediana e desvio-padrão dos dados originais;\n", + "2. Calcule os valores a seguir:\n", + " * Q1, Q3\n", + " * IQR = Q3-Q1\n", + " * lim_inferior = Q1-1.5\\*IQR\n", + " * lim_superior = Q3+1.5\\*IQR\n", + "3. Proceda à substituição:\n", + " * Se a_salarios_copia[i] < lim_inferior então a_salarios_copia[i]= Mediana\n", + " * Se a_salarios_copia[i] > lim_superior então a_salarios_copia[i]= Mediana\n", + "4. Calcule as estatísticas descritivas após as substituições e compare com os valores antes das transformações." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9DQ7YnWaFn4v" + }, + "source": [ + "### Minha solução\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RBXJbTeGLC7Q" + }, + "source": [ + "1. 
Descriptive statistics before the transformations:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "QueKYn7MLG12", + "outputId": "11daf3fe-c4c9-446e-cb46-cf6378c02779", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + } + }, + "source": [ + "# Some descriptive statistics:\n", + "f'Mean: {np.mean(a_salarios_copia)}; Median: {np.median(a_salarios_copia)}; STD: {np.std(a_salarios_copia)}'" + ], + "execution_count": 20, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, + "text/plain": [ + "'Mean: 1057.4744151862524; Median: 1048.089607774499; STD: 144.64306489539533'" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 20 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oOBJ8INWL5fo" + }, + "source": [ + "Note how far the data drifted from the original values: the mean rose from about 1047.15 to 1057.47 and the standard deviation from about 101.19 to 144.64, while the median barely moved." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MX-fJeh2MBTD" + }, + "source": [ + "2. 
Compute Q1, Q3, IQR and the outlier limits" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "JlsPiQeGMGeU" + }, + "source": [ + "Q1 = np.percentile(a_salarios_copia, q = [25]) # q is a list, so each result is a 1-element array\n", + "Q3 = np.percentile(a_salarios_copia, q = [75])\n", + "Q2 = np.percentile(a_salarios_copia, q = [50]) # median\n", + "IQR = Q3-Q1\n", + "lim_inferior = Q1-1.5*IQR\n", + "lim_superior = Q3+1.5*IQR" + ], + "execution_count": 21, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "VF2NJ3rCeI1_", + "outputId": "e8e38919-ee69-4d21-db00-1abb7bd4fb9b", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + } + }, + "source": [ + "f'Q1: {Q1}; Q3: {Q3}; lim_inferior: {lim_inferior}; lim_superior: {lim_superior}'" + ], + "execution_count": 22, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, + "text/plain": [ + "'Q1: [974.41]; Q3: [1119.81]; lim_inferior: [756.33]; lim_superior: [1337.89]'" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 22 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JjnwJ7HwMxcl" + }, + "source": [ + "3. 
Replace\n", + "* If a_salarios2[i] < lim_inferior then a_salarios2[i] = median\n", + "* If a_salarios2[i] > lim_superior then a_salarios2[i] = median" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "hcAn-IwVfbcI" + }, + "source": [ + "a_salarios2 = a_salarios_copia.copy()" + ], + "execution_count": 23, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "J3SSE45oM9oh", + "outputId": "53db1a1d-8483-40f9-8cbc-196b79e449ff", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 102 + } + }, + "source": [ + "a_salarios2[a_salarios2 < lim_inferior[0]] = Q2[0]\n", + "# every element that satisfies the condition (boolean mask) receives the median\n", + "a_salarios2[a_salarios2 > lim_superior[0]] = Q2[0]\n", + "a_salarios2[:30]" + ], + "execution_count": 24, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([1295.63, 1156.44, 1250.57, 1101.48, 1074.9 , 1149.93, 1032.39,\n", + " 1151.23, 1158.81, 1182.97, 839. , 1112.47, 1117.72, 1011.08,\n", + " 1048.09, 1104.14, 915.72, 1162.71, 946.36, 865.97, 936.09,\n", + " 954.29, 942.71, 908.55, 1015.57, 1051.34, 930.8 , 994.29,\n", + " 961.46, 903.51])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 24 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VEGFio0Nfj7O" + }, + "source": [ + "4. 
Descriptive statistics to assess the impact:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "gX1LZHFqfjFQ", + "outputId": "31a986a8-79bd-4c04-daf2-3fa1e8ca9f44", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + } + }, + "source": [ + "# Some descriptive statistics:\n", + "f'Mean: {np.mean(a_salarios2)}; Median: {np.median(a_salarios2)}; STD: {np.std(a_salarios2)}'" + ], + "execution_count": 25, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, + "text/plain": [ + "'Mean: 1047.3019702056902; Median: 1048.089607774499; STD: 98.3265929249586'" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 25 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "-xnguZ7XgyvK", + "outputId": "98be8554-e55a-4d5b-e418-4712377c627e", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 269 + } + }, + "source": [ + "# Import the seaborn library:\n", + "import seaborn as sns\n", + "sns.boxplot(y = a_salarios2)" + ], + "execution_count": 26, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 26 + }, + { + "output_type": "display_data", + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAX0AAADrCAYAAACFMUa7AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAO5klEQVR4nO3df6zddX3H8eer90YEzQa0d40WGCxt5tC4xTVIsriw8asQsxo3DWRJ7xxZY4Klwz82jMmaYEg0Lhpo1KQJDW3iYGSbsW5dsbBl/IVSFgJFUU5QpA3K9RZxWRW57Xt/3C/x7nJv7z333PZc/Dwfycn5nvf3c77nfQj3dT/9fL/nnlQVkqQ2rBp2A5KkM8fQl6SGGPqS1BBDX5IaYuhLUkMMfUlqyOiwGziVNWvW1MUXXzzsNiTpDeWxxx77cVWNzbVvRYf+xRdfzKFDh4bdhiS9oSR5br59Lu9IUkMMfUlqiKEvSQ0x9CWpIYa+tASTk5PccsstTE5ODrsVqS+GvrQEe/bs4cknn2Tv3r3DbkXqi6Ev9WlycpIDBw5QVRw4cMDZvt5QDH2pT3v27OHkyZMAnDhxwtm+3lAMfalPDz74IFNTUwBMTU1x8ODBIXckLZ6hL/XpqquuYnR0+sPso6OjXH311UPuSFo8Q1/q0/j4OKtWTf/ojIyMsGXLliF3JC3eiv7bO1pZdu7cSa/XG3YbK0ISAN761rdy++23D7mb4Vu/fj3btm0bdhtaBGf60hKsWrWKVatWsXbt2mG3IvXFmb4WzZncL23fvh2AO++8c8idSP1xpi9JDVkw9JPsTvJiksMzap9K8kSSx5N8Pcnbu3qS3JWk1+1/z4znjCd5pruNn563I0k6lcXM9O8BNs2qfbaq3l1Vvwf8K/B3Xf06YEN32wp8CSDJ+cAO4L3AZcCOJOcN3L0kqS8Lhn5VPQwcm1X76YyHbwGq294M7K1pjwDnJnkbcC1wsKqOVdVLwEFe/4tEknSaLflEbpI7gC3Ay8AfdeV1wPMzhh3pavPVJUln0JJP5FbVJ6vqQuDLwMeWq6EkW5McSnJoYmJiuQ4rSWJ5rt75MvCn3fZR4MIZ+y7oavPVX6eqdlXVxqraODY255e5S5KWaEmhn2TDjIebgae77X3Alu4qnsuBl6vqBeAB4Jok53UncK/papKkM2jBNf0k9wJXAGuSHGH6Kpzrk/w2cBJ4DvhoN3w/cD3QA44DHwGoqmNJPgU82o27var+38lhSdLpt2DoV9WNc5TvnmdsATfPs283sLuv7iRJy8pP5EpSQwx9SWqIoS9JDTH0Jakhhr4kNcTQl6SGGPqS1BBDX5IaYuhLUkMMfUlqiKEvSQ0x9CWpIYa+JDXE0Jekhhj6ktQQQ1+SGmLoS1JDDH1JaoihL0kNMfQlqSGGviQ1xNCXpIYY+pLUEENfkhpi6EtSQwx9SWrIgqGfZHeSF5McnlH7bJKnkzyR5CtJzp2x7xNJekm+k+TaGfVNXa2X5LblfyuSpIUsZqZ/D7BpVu0g8K6qejfwXeATAEkuBW4A3tk954tJRpKMAF8ArgMuBW7sxkqSzqAFQ7+qHgaOzap9vaqmuoePABd025uB+6rqlar6HtADLutuvap6tqp+AdzXjZUknUHLsab/l8C/d9vrgOdn7DvS1earS5LOoIFCP8kngSngy8vTDiTZmuRQkkMTExPLdVhJEgOEfpK/AN4P/HlVVVc+Clw4Y9gFXW2++utU1a6q2lhVG8fGxpbaniRpDksK/SSbgL8B/qSqjs/YtQ+4IclZSS4BNgDfBB4FNiS5JMmbmD7Zu2+w1iVJ/RpdaECSe4ErgDVJjgA7mL5a5yzgYBKAR6rqo1X1VJL7gW8xvexzc1Wd6I7zMeABYATYXVVPnYb3I0k6hQVDv6punKN89ynG3wHcMUd9P7C/r+4kScvKT+RKUkMMfUlqiKEvSQ0x9CWpIYa+JDXE0Jekhhj
6ktQQQ1+SGmLoS1JDDH1JaoihL0kNMfQlqSGGviQ1xNCXpIYY+pLUEENfkhpi6EtSQwx9SWqIoS9JDTH0Jakhhr4kNcTQl6SGGPqS1JDRYTew0u3cuZNerzfsNrTCvPb/xPbt24fciVaa9evXs23btmG3MS9DfwG9Xo/HD3+bE+ecP+xWtIKs+kUB8NizPxpyJ1pJRo4fG3YLCzL0F+HEOefzs3dcP+w2JK1wZz+9f9gtLGjBNf0ku5O8mOTwjNqHkjyV5GSSjbPGfyJJL8l3klw7o76pq/WS3La8b0OStBiLOZF7D7BpVu0w8EHg4ZnFJJcCNwDv7J7zxSQjSUaALwDXAZcCN3ZjJUln0ILLO1X1cJKLZ9W+DZBk9vDNwH1V9QrwvSQ94LJuX6+qnu2ed1839luDNC9J6s9yX7K5Dnh+xuMjXW2++usk2ZrkUJJDExMTy9yeJLVtxV2nX1W7qmpjVW0cGxsbdjuS9Ctlua/eOQpcOOPxBV2NU9QlSWfIcs/09wE3JDkrySXABuCbwKPAhiSXJHkT0yd79y3za0uSFrDgTD/JvcAVwJokR4AdwDFgJzAG/FuSx6vq2qp6Ksn9TJ+gnQJurqoT3XE+BjwAjAC7q+qp0/GGJEnzW8zVOzfOs+sr84y/A7hjjvp+YOV/ckGSfoWtuBO5kqTTx9CXpIYY+pLUEENfkhpi6EtSQwx9SWqIoS9JDTH0Jakhhr4kNcTQl6SGGPqS1BC/GH0BR48eZeT4y2+ILzyWNFwjxyc5enRq2G2ckjN9SWqIM/0FrFu3jh++MsrP3nH9sFuRtMKd/fR+1q1bO+w2TsmZviQ1xNCXpIYY+pLUEENfkhpi6EtSQwx9SWqIoS9JDTH0Jakhhr4kNcTQl6SGGPqS1JAFQz/J7iQvJjk8o3Z+koNJnunuz+vqSXJXkl6SJ5K8Z8ZzxrvxzyQZPz1vR5J0KouZ6d8DbJpVuw14qKo2AA91jwGuAzZ0t63Al2D6lwSwA3gvcBmw47VfFJKkM2fB0K+qh4Fjs8qbgT3d9h7gAzPqe2vaI8C5Sd4GXAscrKpjVfUScJDX/yKRJJ1mS13TX1tVL3TbPwRe+1ui64DnZ4w70tXmq0uSzqCBT+RWVQG1DL0AkGRrkkNJDk1MTCzXYSVJLD30f9Qt29Ddv9jVjwIXzhh3QVebr/46VbWrqjZW1caxsbEltidJmstSQ38f8NoVOOPAV2fUt3RX8VwOvNwtAz0AXJPkvO4E7jVdTZJ0Bi34dYlJ7gWuANYkOcL0VTifBu5PchPwHPDhbvh+4HqgBxwHPgJQVceSfAp4tBt3e1XNPjksSTrNFgz9qrpxnl1XzjG2gJvnOc5uYHdf3UmSlpWfyJWkhhj6ktQQQ1+SGrLgmr5g5Pgxzn56/7Db0Aqy6uc/BeDkm39tyJ1oJRk5foxfflZ1ZTL0F7B+/fpht6AVqNf7HwDW/9bK/gHXmbZ2xWeGob+Abdu2DbsFrUDbt28H4M477xxyJ1J/XNOXpIYY+pLUEENfkhpi6EtSQwx9SWqIoS9JDTH0Jakhhr4kNcTQl6SGGPqS1BBDX5IaYuhLUkMMfUlqiKEvSQ0x9CWpIYa+JDXE0Jekhhj6ktQQQ1+SGmLoS1JDBgr9JNuTHE7yVJK/7mrnJzmY5Jnu/ryuniR3JekleSLJe5bjDUiSFm/JoZ/kXcBfAZcBvwu8P8l64DbgoaraADzUPQa4DtjQ3bYCXxqgb0nSEgwy0/8d4BtVdbyqpoD/Aj4IbAb2dGP2AB/otjcDe2vaI8C5Sd42wOtLkvo0SOgfBt6XZHWSc4DrgQuBtVX1Qjfmh8Dabnsd8PyM5x/papKkM2R0qU+sqm8n+QzwdeB/gceBE7PGVJLq57hJtjK9/MNFF1201PYkSXMY6ERuVd1dVb9fVX8IvAR8F/jRa8s23f2L3fCjTP9L4DUXdLXZx9xVVRurauPY2Ngg7UmSZhn06p3f6O4vYno9/x+AfcB
4N2Qc+Gq3vQ/Y0l3Fcznw8oxlIEnSGbDk5Z3OPydZDbwK3FxVP0nyaeD+JDcBzwEf7sbuZ3rdvwccBz4y4GtLkvo0UOhX1fvmqE0CV85RL+DmQV5PkjQYP5ErSQ0x9CWpIYa+JDXE0Jekhhj6ktQQQ1+SGmLoS1JDDH1JaoihL0kNMfQlqSGGviQ1xNCXpIYY+pLUEENfkhpi6EtSQwx9SWqIoS9JDTH0Jakhhr4kNcTQl6SGGPqS1BBDX5IaYuhLUkMMfUlqiKEvSQ0x9CWpIQOFfpJbkzyV5HCSe5O8OcklSb6RpJfkH5O8qRt7Vve41+2/eDnegCRp8ZYc+knWAbcAG6vqXcAIcAPwGeDzVbUeeAm4qXvKTcBLXf3z3ThJ0hk06PLOKHB2klHgHOAF4I+Bf+r27wE+0G1v7h7T7b8ySQZ8fUlSH5Yc+lV1FPh74AdMh/3LwGPAT6pqqht2BFjXba8Dnu+eO9WNXz37uEm2JjmU5NDExMRS25MkzWGQ5Z3zmJ69XwK8HXgLsGnQhqpqV1VtrKqNY2Njgx5OkjTDIMs7VwHfq6qJqnoV+BfgD4Bzu+UegAuAo932UeBCgG7/rwOTA7y+JKlPg4T+D4DLk5zTrc1fCXwL+E/gz7ox48BXu+193WO6/f9RVTXA60uS+jTImv43mD4h+9/Ak92xdgF/C3w8SY/pNfu7u6fcDazu6h8Hbhugb0nSEowuPGR+VbUD2DGr/Cxw2Rxjfw58aJDXkyQNxk/kSlJDDH1JashAyztqy86dO+n1esNuY0V47b/D9u3bh9zJyrB+/Xq2bds27Da0CM70pSU466yzeOWVV3j11VeH3YrUF2f6WjRncr/0uc99jq997Wts2LCBW2+9ddjtSIvmTF/q0+TkJAcOHKCqOHDgAJOTfsZQbxyGvtSnPXv2cPLkSQBOnDjB3r17h9yRtHiGvtSnBx98kKmp6b8pODU1xcGDB4fckbR4hr7Up6uuuorR0enTYaOjo1x99dVD7khaPENf6tP4+DirVk3/6IyMjLBly5YhdyQtnqEv9Wn16tVs2rSJJGzatInVq1/3tRDSiuUlm9ISjI+P8/3vf99Zvt5wDH1pCVavXs1dd9017Dakvrm8I0kNMfQlqSGGviQ1xNCXpIZkJX9NbZIJ4Llh9yHNYw3w42E3Ic3hN6tqbK4dKzr0pZUsyaGq2jjsPqR+uLwjSQ0x9CWpIYa+tHS7ht2A1C/X9CWpIc70Jakhhr4kNcTQl6SGGPqS1BBDX5Ia8n+y6aH62hucLAAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [], + "needs_background": "light" + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uEPFcBjFhETQ" + }, + "source": [ + "Como podem ver, os outliers desapareceram, como queríamos." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tHfzjW_ymKuR" + }, + "source": [ + "___\n", + "# **Valores únicos**\n", + "> Considere o array a seguir:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "HzmQgWZVmUUD" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.random.randint(0, 100, 100)\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Dm9ky1F1mrNA" + }, + "source": [ + "Quem são os valores únicos do array?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "G-LPRqc-mS5j" + }, + "source": [ + "np.unique(a_conjunto1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uXZZoTd6nMuq" + }, + "source": [ + "___\n", + "# **Diferença entre dois arrays**\n", + "> O resultado é um array com os **valores únicos de A que não estão em B**. Na teoria de conjuntos escrevemos $A - B = A - A \\cap B$.\n", + "\n", + "![Difference](https://github.com/MathMachado/Materials/blob/master/set_Difference.PNG?raw=true)\n", + "\n", + "Fonte: [Python Set](https://www.learnbyexample.org/python-set/)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uW6i3m9q1ZNs" + }, + "source": [ + "\n", + "* Vamos ver como isso funciona na prática:" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vw05sfe22mfk" + }, + "source": [ + "## Exemplo 1" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Qqw2do90nQ7k" + }, + "source": [ + "a_conjunto1 = np.array([0, 1, 2, 4, 5, 7, 8, 8]) # array de valores que serão excluidos em a_conjunto1. 
Note that 3 and 6 do not belong to a_conjunto1.\n", + "a_conjunto2 = np.array([1, 6, 7, 3])" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "zXJ00pOMorM-" + }, + "source": [ + "np.setdiff1d(a_conjunto1, a_conjunto2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8GXZNgjfo8lO" + }, + "source": [ + "Note that the result holds the unique elements of a_conjunto1 that do not belong to a_conjunto2: [0, 2, 4, 5, 8]. And what about 3 (and 6)? Values that appear only in a_conjunto2 are simply ignored." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aJSu6VKb2oc_" + }, + "source": [ + "## Example 2" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "N1wahElXTqoB" + }, + "source": [ + "a_conjunto1 = np.arange(10)\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "nxDpCMg7T7Rj" + }, + "source": [ + "a_conjunto2 = np.array([1, 5, 7])\n", + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "3LU3qYyiUXqm" + }, + "source": [ + "np.setdiff1d(a_conjunto1, a_conjunto2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mzZEytrRUioU" + }, + "source": [ + "Note that the elements of a_conjunto2 were left out of the result; a_conjunto1 itself is not modified." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gJRcoVRUnaY9" + }, + "source": [ + "___\n", + "# **Symmetric difference**\n", + "* In set theory, the symmetric difference is the set of elements that belong to exactly one of the two sets, and we write $(A \\cup B) - (A \\cap B)$.\n", + "\n", + "![DifferenceSymetric](https://github.com/MathMachado/Materials/blob/master/set_DifferenceSymetric.PNG?raw=true)\n", + "\n", + "Source: [Python Set](https://www.learnbyexample.org/python-set/)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2Uzzm85Kup3H" + }, + "source": [ + "* Let's see how this works in practice:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "1z5wZ8VwpsWN" + }, + "source": [ + "import numpy as np\n", + "a_conjunto1 = np.array([0, 1, 2, 4, 5, 7, 8]) # Note that [1, 4, 7] belong to a_conjunto1, but 3 does not. So:\n", + "a_conjunto2 = np.array([1, 4, 7, 3])" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Tqd_9XO5p7bo" + }, + "source": [ + "np.setxor1d(a_conjunto1, a_conjunto2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_meurG3mqS5Y" + }, + "source": [ + "How do we interpret this result? It holds the values that appear in exactly one of the two arrays: [0, 2, 3, 5, 8]." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Kc8JoKe2nj2n" + }, + "source": [ + "___\n", + "# **Union of two arrays**\n", + "> Returns the sorted **unique** values of the two arrays. 
In set theory, we write:\n", + "\n", + "$$A \\cup B$$\n", + "\n", + "![Union](https://github.com/MathMachado/Materials/blob/master/set_Union.PNG?raw=true)\n", + "\n", + "Source: [Python Set](https://www.learnbyexample.org/python-set/)" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "1LZxorw2p2mg" + }, + "source": [ + "a_conjunto1 = np.array([0, 1, 2, 4, 5, 7, 8, 8])\n", + "\n", + "# Note that [1, 4, 7] belong to a_conjunto1, but 3 does not. 
Therefore:\n", + "a_conjunto2 = np.array([1, 4, 7, 3])" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "COsZEmSwuY5L" + }, + "source": [ + "np.union1d(a_conjunto1, a_conjunto2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "b53bR-GYRu_3" + }, + "source": [ + "___\n", + "# **Select the items common to arrays X and Y**\n", + "* In set theory, this is called the intersection, written $X \\cap Y$.\n", + "\n", + "![Intersection](https://github.com/MathMachado/Materials/blob/master/set_Intersection.PNG?raw=true)\n", + "\n", + "Source: [Python Set](https://www.learnbyexample.org/python-set/)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "n2ec2tqqR1Gw" + }, + "source": [ + "* Consider the following arrays, where a_conjunto1 plays the role of X and a_conjunto2 the role of Y:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "rXVQQvBqR4J-" + }, + "source": [ + "a_conjunto1 = np.arange(10)\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "pZTHhHxGSRfB" + }, + "source": [ + "a_conjunto2 = np.arange(8, 18)\n", + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MxB2_qHpScMB" + }, + "source": [ + "Which elements do X (a_conjunto1) and Y (a_conjunto2) have in common?"
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "e-rncJHtSfw0" + }, + "source": [ + "np.intersect1d(a_conjunto1, a_conjunto2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3Bb39sWdfqaF" + }, + "source": [ + "___\n", + "# **Eigenvalues and Eigenvectors**\n", + "> Eigenvalues and eigenvectors are among the most important topics in Machine Learning.\n", + "\n", + "By definition, the scalar $\\lambda$ and the nonzero vector $v$ are an eigenvalue and an eigenvector of the square matrix $A$ if\n", + "\n", + "$$Av = \\lambda v$$\n", + "\n", + "## Further Reading:\n", + "\n", + "* [Machine Learning & Linear Algebra — Eigenvalue and eigenvector](https://medium.com/@jonathan_hui/machine-learning-linear-algebra-eigenvalue-and-eigenvector-f8d0493564c9)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XZBKq8nGCUbL" + }, + "source": [ + "* The array a_conjunto2 looks like this (np.linalg.eig expects a square 2D array):" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "iYlZGKFUfw-R" + }, + "source": [ + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "6EfvIbBNf02Z" + }, + "source": [ + "# Compute the eigenvalues and eigenvectors (eig returns them in this order):\n", + "a_autovalores, a_autovetores = np.linalg.eig(a_conjunto2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "v3GtQQvAz9QU" + }, + "source": [ + "The eigenvalues of the array a_conjunto2 are:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "WvZGyBR1f9vP" + }, + "source": [ + "a_autovalores" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AuuDRJVh0FC8" + }, + "source": [ + "The eigenvectors of the array a_conjunto2 (one per column) are:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "6m4YFAwsf_rA" + }, + "source": [ + "a_autovetores" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"DASn2Un9ZNV-" + }, + "source": [ + "___\n", + "# **Encontrar Missing Values (NaN)**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TKilWBsSXtR4" + }, + "source": [ + "## Gerar o exemplo" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "lqLI2ER_ZUMY" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.random.random(100)\n", + "\n", + "# Inserindo 15 NaN's no array:\n", + "np.random.seed(20111974)\n", + "l_indices_aleatorios= np.random.randint(0, 100, size = 15)\n", + "\n", + "for i_indices in l_indices_aleatorios:\n", + " #print(i_indices)\n", + " a_conjunto1[i_indices] = np.nan" + ], + "execution_count": 27, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "2ZkbMPXMawYh", + "outputId": "af5865c5-95fe-4df1-8712-b77543e860c5", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 187 + } + }, + "source": [ + "a_conjunto1" + ], + "execution_count": 28, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([0.53, 0.57, nan, 0.65, 0.86, 0.6 , 0.87, 0.46, nan, 0.64, 0.55,\n", + " 0.35, 0.32, nan, 0.85, 0.76, 0.66, 0.33, 0.35, 0.42, 0.31, 0.27,\n", + " 0.31, 0.36, 0.6 , 0.02, 0.36, nan, 0.28, 0.37, nan, 0.44, 0.2 ,\n", + " 0.21, 0.65, 0.82, 0.72, 0.5 , 0.17, 0.6 , nan, 0.14, nan, 0.71,\n", + " 0.07, 0.56, nan, 0.84, 0.21, 0.85, 0.63, 0.38, 0.91, 0.34, 0.07,\n", + " 0.1 , 0.85, 0.12, 0.94, 0.16, nan, 0.91, 0.59, 0.37, 0.72, 0.07,\n", + " 0.48, 0.78, 0.97, 0.72, 0.29, 0.33, 0.95, 0.24, 0.98, 0.85, 0.63,\n", + " 0.57, 0.67, 0.88, nan, nan, nan, 0.68, 0.29, 0.33, 0.98, 0.17,\n", + " nan, 0.92, 0.98, 0.76, 0.31, 0.97, 0.08, 0.56, nan, 0.49, 0.07,\n", + " 0.11])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 28 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Z7Bs75NvbSjx" + }, + "source": [ + "Ok, inserimos aleatoriamente 14 NaN's no array a_conjunto1. Agora, vamos contar quantos NaN's (já sabemos a resposta!)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hL1Wn0vdX8ur" + }, + "source": [ + "## Identify the NaN's" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "5R-n3H0xbd6d", + "outputId": "9dc32bc4-bb41-4c02-bc64-c98461f3cefd", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "np.isnan(a_conjunto1).sum()\n", + "## isnan returns a boolean for each element (True if it is NaN, False otherwise); sum() counts the True values" + ], + "execution_count": 29, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "14" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 29 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "81IoQ-EVbI5X", + "outputId": "fcddc325-8c6c-4545-cf1c-349af38ca954", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "np.isnan\n", + "## it is a function (a NumPy ufunc)" + ], + "execution_count": 31, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 31 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "PB-GAJ71bc7i", + "outputId": "0a9a431e-986d-40da-c841-544d25727b38", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 221 + } + }, + "source": [ + "array_nulos = np.isnan(a_conjunto1)\n", + "array_nulos" + ], + "execution_count": 32, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([False, False, True, False, False, False, False, False, True,\n", + " False, False, False, False, True, False, False, False, False,\n", + " False, False, False, False, False, False, False, False, False,\n", + " True, False, False, True, False, False, False, False, False,\n", + " False, False, False, False, True, False, True, False, False,\n", + " False, True, False, False, False, False, False, False, False,\n", + " False, False, False, False, False, False, True, False, False,\n", + " False, False, False, False, 
False, False, False, False, False,\n", + " False, False, False, False, False, False, False, False, True,\n", + " True, True, False, False, False, False, False, True, False,\n", + " False, False, False, False, False, False, True, False, False,\n", + " False])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 32 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Y7hh5uowoa3U" + }, + "source": [ + "Ok, temos 14 NaN's em a_conjunto1." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iVLQf_bqbyNU" + }, + "source": [ + "Ok, agora eu quero saber os índices desses NaN's." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "kJHxjZiwb5HM", + "outputId": "57d4bf56-64bd-4969-9281-d311ac926119", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "i_indices= np.where(np.isnan(a_conjunto1))\n", + "## o where retorna a posiçao do array que é true\n", + "i_indices" + ], + "execution_count": 30, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(array([ 2, 8, 13, 27, 30, 40, 42, 46, 60, 80, 81, 82, 88, 96]),)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 30 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "W_jHGNImok7L", + "outputId": "cbdcf2d2-3edf-4b8a-9a42-29aa20cbcd94", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Checando... que a posição 2 é nan\n", + "a_conjunto1[2]" + ], + "execution_count": 33, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "nan" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 33 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iPhHAhDYcMWO" + }, + "source": [ + "Vamos conferir se está correto? 
To do that, just compare with l_indices_aleatorios:" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gxQYslRCe11G" + }, + "source": [ + "___\n", + "# **Remove NaN's from an array**\n", + "> Consider the same array we have just worked with. Now we want to drop the NaN's we identified." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "AeBARFqNfNnN", + "outputId": "eb361064-326a-451c-c4fe-0d14b861fbe8", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 187 + } + }, + "source": [ + "a_conjunto1" + ], + "execution_count": 34, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([0.53, 0.57, nan, 0.65, 0.86, 0.6 , 0.87, 0.46, nan, 0.64, 0.55,\n", + " 0.35, 0.32, nan, 0.85, 0.76, 0.66, 0.33, 0.35, 0.42, 0.31, 0.27,\n", + " 0.31, 0.36, 0.6 , 0.02, 0.36, nan, 0.28, 0.37, nan, 0.44, 0.2 ,\n", + " 0.21, 0.65, 0.82, 0.72, 0.5 , 0.17, 0.6 , nan, 0.14, nan, 0.71,\n", + " 0.07, 0.56, nan, 0.84, 0.21, 0.85, 0.63, 0.38, 0.91, 0.34, 0.07,\n", + " 0.1 , 0.85, 0.12, 0.94, 0.16, nan, 0.91, 0.59, 0.37, 0.72, 0.07,\n", + " 0.48, 0.78, 0.97, 0.72, 0.29, 0.33, 0.95, 0.24, 0.98, 0.85, 0.63,\n", + " 0.57, 0.67, 0.88, nan, nan, nan, 0.68, 0.29, 0.33, 0.98, 0.17,\n", + " nan, 0.92, 0.98, 0.76, 0.31, 0.97, 0.08, 0.56, nan, 0.49, 0.07,\n", + " 0.11])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 34 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "e497B492fFru", + "outputId": "68f697f4-4778-43e9-9b4f-87d930eccc46", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 153 + } + }, + "source": [ + "a_conjunto1[~np.isnan(a_conjunto1)]\n", + "## the tilde (~) negates the boolean mask, so True now marks the values that are not NaN\n", + "## np.isnan builds the array of True/False; by negating it and indexing with the result,\n", + "## we keep only the non-NaN values, i.e. the NaN's are removed" + ], + "execution_count": 36, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + 
"array([0.53, 0.57, 0.65, 0.86, 0.6 , 0.87, 0.46, 0.64, 0.55, 0.35, 0.32,\n", + " 0.85, 0.76, 0.66, 0.33, 0.35, 0.42, 0.31, 0.27, 0.31, 0.36, 0.6 ,\n", + " 0.02, 0.36, 0.28, 0.37, 0.44, 0.2 , 0.21, 0.65, 0.82, 0.72, 0.5 ,\n", + " 0.17, 0.6 , 0.14, 0.71, 0.07, 0.56, 0.84, 0.21, 0.85, 0.63, 0.38,\n", + " 0.91, 0.34, 0.07, 0.1 , 0.85, 0.12, 0.94, 0.16, 0.91, 0.59, 0.37,\n", + " 0.72, 0.07, 0.48, 0.78, 0.97, 0.72, 0.29, 0.33, 0.95, 0.24, 0.98,\n", + " 0.85, 0.63, 0.57, 0.67, 0.88, 0.68, 0.29, 0.33, 0.98, 0.17, 0.92,\n", + " 0.98, 0.76, 0.31, 0.97, 0.08, 0.56, 0.49, 0.07, 0.11])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 36 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RpvKfJU_fmA6" + }, + "source": [ + "Observe que os NaN's foram excluidos." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7ereghZPcdh4" + }, + "source": [ + "EXERCÍCIO - ATRIBUIR A MEDIANA AOS VALORES DA AMOSTRA\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_Dv8MmNYg8zN" + }, + "source": [ + "___\n", + "# **Converter lista em array**\n", + "> Considere a lista a seguir:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "but6T9dVhFYb", + "outputId": "001bc55c-fe58-40ab-ff3c-3ea90d4e71d7", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "l_Lista = [np.random.randint(0, 10, 10)]\n", + "l_Lista" + ], + "execution_count": 37, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "[array([8, 9, 3, 7, 1, 3, 2, 9, 7, 7])]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 37 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "xytj4Eo4hTh9", + "outputId": "4c2d1778-13ec-4717-a3a8-d538cfcf896a", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "type(l_Lista)" + ], + "execution_count": 38, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + 
"list" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 38 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qrINdcruhWcH" + }, + "source": [ + "Convertendo a minha lista para array:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "RoSyaX0OhZSE", + "outputId": "8b194262-97d5-45ee-9336-3a5c0a5ad802", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_conjunto = np.asarray(l_Lista)\n", + "a_conjunto" + ], + "execution_count": 39, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[8, 9, 3, 7, 1, 3, 2, 9, 7, 7]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 39 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "dMjTdbBUhlrk", + "outputId": "4d140e4f-2e4c-4aa7-8e99-481cb4d1cdc3", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "type(a_conjunto)" + ], + "execution_count": 40, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "numpy.ndarray" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 40 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Mbm3ZP9DhxDI" + }, + "source": [ + "___\n", + "# Converter tupla em array\n", + "> Considere a tupla a seguir:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "cZxEFYLAh3S_" + }, + "source": [ + "np.random.seed(20111974)\n", + "t_numeros = ([np.random.randint(0, 10, 3)], [np.random.randint(0, 10, 3)], [np.random.randint(0, 10, 3)])\n", + "t_numeros" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "vlTXUJviiAml" + }, + "source": [ + "type(t_numeros)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "yEaOlq8oh3oh" + }, + "source": [ + "a_conjunto = np.asarray(t_numeros)\n", + "a_conjunto" + ], + "execution_count": null, + "outputs": 
[] + }, + { + "cell_type": "code", + "metadata": { + "id": "PSgQDmRWh3g5" + }, + "source": [ + "type(a_conjunto)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pH-Ht6yMiqJN" + }, + "source": [ + "___\n", + "# Append elements to an array\n", + "> Consider the following array:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "dFaDZInZiwoo" + }, + "source": [ + "a_conjunto1 = np.arange(5)\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "d3zrlf_Ci73Z" + }, + "source": [ + "np.random.seed(20111974)\n", + "# np.append returns a new array; with no axis given, the appended values are flattened.\n", + "a_conjunto1 = np.append(a_conjunto1, [np.random.randint(0, 10, 3), np.random.randint(0, 10, 3), np.random.randint(0, 10, 3)])\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eFRhtk13ojqA" + }, + "source": [ + "___\n", + "# **Convert 1D arrays into a 2D array**\n", + "> Consider the following arrays:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "wYhBgW9Zu6ZP" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.array(np.random.randint(0, 10, 6))\n", + "\n", + "np.random.seed(19741120)\n", + "a_conjunto2 = np.array(np.random.randint(0, 10, 6))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "febs9AUHvs6n" + }, + "source": [ + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "C9OEd-iavvBm" + }, + "source": [ + "a_conjunto2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "KJWjtaWKv0MJ" + }, + "source": [ + "np.column_stack((a_conjunto1, a_conjunto2)) # Note the inner parentheses: the arrays are passed as a single tuple (a_conjunto1, a_conjunto2)."
+ ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xr_WZXJ7pi2D" + }, + "source": [ + "___\n", + "# **Excluir um elemento específico do array usando indices**\n", + "> Considere os arrays a seguir:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "tS0ZzOs8w0dw" + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.array(np.random.randint(0, 10, 6))\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7bOJiKDKxEsC" + }, + "source": [ + "Suponha que eu queira excluir os valores '8' de a_conjunto1. Os índices dos valores '8' são: [0, 1, 3]. Portanto, temos:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "SSjueEvjxTJO" + }, + "source": [ + "a_conjunto1 = np.delete(a_conjunto1, [0, 1, 3])\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mZkGZ2Rgp--5" + }, + "source": [ + "___\n", + "# **Frequência dos valores únicos de um array**\n", + "> Considere o array a seguir:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Z2BWKfH0xvQ8", + "outputId": "0405171f-5590-434a-87be-39d44e18ce17", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 102 + } + }, + "source": [ + "np.random.seed(20111974)\n", + "a_conjunto1 = np.array(np.random.randint(0, 10, 100))\n", + "a_conjunto1" + ], + "execution_count": 41, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([8, 8, 2, 8, 9, 1, 8, 0, 4, 2, 0, 8, 9, 3, 7, 1, 3, 2, 9, 7, 7, 9,\n", + " 5, 6, 8, 7, 0, 9, 3, 9, 3, 1, 8, 6, 3, 5, 4, 1, 2, 9, 8, 6, 6, 1,\n", + " 0, 9, 2, 0, 7, 5, 5, 4, 4, 2, 7, 2, 7, 9, 3, 1, 5, 0, 1, 2, 3, 8,\n", + " 7, 5, 4, 0, 5, 9, 6, 6, 1, 3, 6, 0, 4, 9, 2, 1, 0, 9, 1, 4, 2, 9,\n", + " 7, 9, 5, 3, 7, 6, 3, 9, 8, 4, 3, 0])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 41 + } + ] + 
}, + { + "cell_type": "markdown", + "metadata": { + "id": "s_tdQBsax4rQ" + }, + "source": [ + "Suponha que eu queira saber quantas vezes o número/elemento '2' aparece em a_conjunto1." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "6yIlk7pWyAtf", + "outputId": "01f739af-34ea-448c-992f-ee587482c359", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "l_itens_unicos, i_count = np.unique(a_conjunto1, return_counts=True)\n", + "## é uma função que retorna os itens unicos e a quantidade que aparecem\n", + "l_itens_unicos" + ], + "execution_count": 42, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 42 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DyvrIwS9yZIR" + }, + "source": [ + "O que significa o output acima?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "uO-MPMhXyV9H", + "outputId": "4f477738-6362-4177-a6ec-dd559dc9dc71", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "i_count" + ], + "execution_count": 43, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([10, 10, 10, 11, 8, 8, 8, 10, 10, 15])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 43 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zwoezXrPyofK" + }, + "source": [ + "Qual a interpretação do output acima?" 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "HgYycSG7yr5e", + "outputId": "02fe1140-2976-4715-e211-20d27baf3c87", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 51 + } + }, + "source": [ + "np.asarray((l_itens_unicos, i_count))" + ], + "execution_count": 44, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],\n", + " [10, 10, 10, 11, 8, 8, 8, 10, 10, 15]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 44 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SwIZiJAiy06T" + }, + "source": [ + "Qual a interpretação do output acima?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Wy_tqAPgdchD" + }, + "source": [ + "é a frequencia com que cada um aparece" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JpNRpN2Dql3N" + }, + "source": [ + "___\n", + "# **Combinações possíveis de outros arrays**\n", + "> Considere o exemplo a seguir:\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "BUr89dH4zLXD" + }, + "source": [ + "a_conjunto1 = [2, 4, 6]\n", + "a_conjunto2 = [0, 8]\n", + "a_conjunto4 = [1, 5]" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "cEZH6l-Czx7y" + }, + "source": [ + "np.meshgrid(a_conjunto1, a_conjunto2, a_conjunto4)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "btvmDkEcz0tH" + }, + "source": [ + "np.array(np.meshgrid(a_conjunto1, a_conjunto2, a_conjunto4))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Z0xhO7rGz059" + }, + "source": [ + "np.array(np.meshgrid(a_conjunto1, a_conjunto2, a_conjunto4)).T" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "eMv4lFnD0Enn" + }, + "source": [ + "# Resultado final\n", + "a_conjunto3 = 
np.array(np.meshgrid(a_conjunto1, a_conjunto2, a_conjunto4)).T.reshape(-1,3)\n", + "a_conjunto3" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Rz80YANfAh2k" + }, + "source": [ + "___\n", + "# **Wrap Up**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_cyhMsAVXxGC" + }, + "source": [ + "___\n", + "# **Exercícios**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kNjovMw3uJ3R" + }, + "source": [ + "## Exercício 1 - Selecionar os números pares\n", + "> Dado o 1D array abaixo, selecionar somente os números pares." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "isDzQjwjBX3V", + "outputId": "ad54cd80-fa6e-4772-a869-1b7f98b09725", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_conjunto1 = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])\n", + "a_conjunto1" + ], + "execution_count": 45, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 45 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Kq1zt-uO1HXv" + }, + "source": [ + "### **Minha solução**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "YFmK_n2M1Ks9", + "outputId": "496556f7-9ff2-40f7-8cbf-2a1588793e40", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_conjunto1[a_conjunto1 % 2 == 0]" + ], + "execution_count": 46, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([0, 2, 4, 6, 8])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 46 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sScYG0hp05vb" + }, + "source": [ + "___\n", + "## Exercício 2 - Substituir pela mediana\n", + "> Dado o array 1D abaixo, substituir os números pares pela mediana de a_conjunto1." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "XLZ-DIWU1WFs", + "outputId": "aebcdfd3-b244-4cb2-f33b-a8083d241d17", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_conjunto1 = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])\n", + "a_conjunto1" + ], + "execution_count": 47, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 47 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9c4QWJno1WVB" + }, + "source": [ + "### **Minha solução**\n", + "* Primeiramente, precisamos calcular a mediana.\n", + "* Depois, substituimos os valores pares de a_conjunto1 pela mediana encontrada anteriormente. Ok?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "rx7NGAO01Wfb", + "outputId": "575c869f-1a28-49db-9516-47c84cecacc2", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "a_conjunto1[a_conjunto1 % 2 == 0] = np.median(a_conjunto1)\n", + "a_conjunto1" + ], + "execution_count": 48, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([4, 1, 4, 3, 4, 5, 4, 7, 4, 9])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 48 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2c_AphX82qp8" + }, + "source": [ + "Verificando..." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "9kVta0Cr13Z9", + "outputId": "cb8af387-8353-49f1-cf9e-37ee7c77b607", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + } + }, + "source": [ + "f'The median of a_conjunto1 is: {np.median(a_conjunto1)}'" + ], + "execution_count": 49, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, + "text/plain": [ + "'The median of a_conjunto1 is: 4.0'" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 49 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "L9O-Hf5x26TY" + }, + "source": [ + "___\n", + "## Exercise 3 - Reshape\n", + "> Given the 1D array below, reshape it into a 2D array with 3 columns." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "0_laUvtB4Wl-", + "outputId": "34954fdf-6e28-477c-ac93-03b510461a21", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Set the seed\n", + "np.random.seed(20111974)\n", + "a_conjunto1 = np.array(np.random.randint(1, 10, size = 15))\n", + "a_conjunto1" + ], + "execution_count": 50, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([9, 9, 3, 9, 2, 9, 1, 5, 3, 1, 9, 4, 8, 2, 4])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 50 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dKzEX8TK5b4Z" + }, + "source": [ + "### **My solution**\n", + "* The 1D array a_conjunto1 above has 15 elements. Since we want a 2D array with 3 columns, the result has 5 rows, so each column holds 5 elements."
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "r5hJ-wMwjXPR", + "outputId": "2eadc741-755c-46d2-bd8c-ad0d3cc8ad8e", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 102 + } + }, + "source": [ + "a_conjunto1.reshape(-1,3)" + ], + "execution_count": 51, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[9, 9, 3],\n", + " [9, 2, 9],\n", + " [1, 5, 3],\n", + " [1, 9, 4],\n", + " [8, 2, 4]])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 51 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "I-j5yVD04249" + }, + "source": [ + "a_conjunto1.reshape(5, 3) \n", + "# Equivalent to a_conjunto1.reshape(-1, 3), where \"-1\" asks NumPy to compute the number of rows. " + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "F1vfS8jE6L0_" + }, + "source": [ + "___\n", + "## Exercise 4 - Reshape\n", + "> Given the 1D array below, reshape it into a 2D array with 2 columns." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "xcN-bez56L1D" + }, + "source": [ + "# Set the seed\n", + "np.random.seed(20111974)\n", + "a_conjunto1 = np.array(np.random.randint(1, 10, size = 16))\n", + "a_conjunto1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7iICnOyG6fcj" + }, + "source": [ + "### **My solution**\n", + "* The 1D array a_conjunto1 above has 16 elements. We want a 2D array with 2 columns, so the result has 8 rows." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "vdq5ybuD6fcn" + }, + "source": [ + "a_conjunto1.reshape(-1, 2) # The value \"-1\" in the row position asks NumPy to compute the number of rows automatically."
+ ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "haQfWPcCs_H0" + }, + "source": [ + "## Exercise 5\n", + "For more exercises on arrays, visit [Python: Array Exercises, Practice, Solution](https://www.w3resource.com/python-exercises/array/)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LQQL0JS2tnc0" + }, + "source": [ + "## Exercise 6\n", + "For more exercises on mathematics, visit [Python Math: - Exercises, Practice, Solution](https://www.w3resource.com/python-exercises/math/index.php)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qNskKFy9t4D5" + }, + "source": [ + "## Exercise 7\n", + "For more exercises on NumPy in general, visit [NumPy Exercises, Practice, Solution](https://www.w3resource.com/python-exercises/numpy/index.php)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qqc1AiHXuKZ5" + }, + "source": [ + "## Exercise 8\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "jYrgc3KvtmLy" + }, + "source": [ + "" + ], + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file From a568ce97c04618fae60156c588d5f3bea7a057a1 Mon Sep 17 00:00:00 2001 From: MariaJacobs70 <72224154+MariaJacobs70@users.noreply.github.com> Date: Wed, 7 Oct 2020 15:45:59 -0300 Subject: [PATCH 3/9] Criado usando o Colaboratory --- Notebooks/NB07__Dictionaries_alterado.ipynb | 2270 +++++++++++++++++++ 1 file changed, 2270 insertions(+) create mode 100644 Notebooks/NB07__Dictionaries_alterado.ipynb diff --git a/Notebooks/NB07__Dictionaries_alterado.ipynb b/Notebooks/NB07__Dictionaries_alterado.ipynb new file mode 100644 index 000000000..aeea3db03 --- /dev/null +++ b/Notebooks/NB07__Dictionaries_alterado.ipynb @@ -0,0 +1,2270 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "NB07__Dictionaries.ipynb", + "provenance": [], + "collapsed_sections": [ + 
"n8BIbzQbNWUo", + "7eS94uQ4NhVR", + "SYOgJpGYVLUu", + "CaHFxk98W5if", + "ReWUyWiHXCnc", + "CqszHxaKHr2h", + "tXgF1Wl9gHKY", + "Fotx7XUquAo8", + "36kmLUYDvsUI", + "SWO2GdNovxAp", + "vpN54l4vxze5", + "u4HOf9SNytSq", + "6BQ9oZiD9hg5", + "tz5-QdrX9vct", + "p1muBgMX8NK4", + "FxTC2-U88ajk", + "z8EYn0pP25Rh" + ], + "include_colab_link": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "accelerator": "GPU" + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iBW6agsvqqAm" + }, + "source": [ + "

DICTIONARIES

\n", + "\n", + "* Mutable collection of items indexed by key (structure of the form {key: value}); since Python 3.7, dictionaries preserve insertion order;\n", + "* Duplicate keys are not allowed;\n", + "* We use {key: value} to represent the items of the dictionary;\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LFcr_2Xnq2ho" + }, + "source": [ + "# **AGENDA**:\n", + "\n", + "> See the **table of contents** for the topics covered in this chapter.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "r8vR-lHJIhgM" + }, + "source": [ + "# **NOTES AND REMARKS**\n", + "* Move the lambda function examples from here to the Lambda Function chapter.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DkxCxjsbE5fL" + }, + "source": [ + "# **CHEATSHEET**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cGUWTualFCOk" + }, + "source": [ + "![DataStructures](https://github.com/MathMachado/Materials/blob/master/PythonDataStructures.png?raw=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ublDMf3R_qMn" + }, + "source": [ + "Below are the main methods associated with dictionaries. 
To illustrate them, consider the lists l_frutas and l_precos_frutas, which are used to build the dictionary d_frutas below:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "FxuJ7Awd8f5a" + }, + "source": [ + "# Definition of the list l_frutas:\n", + "l_frutas = ['Avocado', 'Apple', 'Apricot', 'Banana', 'Blackcurrant', 'Blackberry', 'Blueberry', 'Cherry', 'Coconut', 'Fig', 'Grape', 'Kiwi', 'Lemon', 'Mango', 'Nectarine',\n", + "            'Orange', 'Papaya', 'Passion Fruit', 'Peach', 'Pineapple', 'Plum', 'Raspberry', 'Strawberry', 'Watermelon']" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "jJyxuMQc9Ewy" + }, + "source": [ + "# Definition of the list l_precos_frutas:\n", + "l_precos_frutas = [0.35, 0.40, 0.25, 0.30, 0.70, 0.55, 0.45, 0.50, 0.75, 0.60, 0.65, 0.20, 0.15, 0.80, 0.75, 0.25, 0.30, 0.45, 0.55, 0.55, 0.60, 0.40, 0.50, 0.45]" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hXP3kxW4-AI1" + }, + "source": [ + "Note below how the dict() and zip() functions are used to create the dictionary d_frutas:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "qT_4sYxA9dyn", + "outputId": "a8badcb1-7f11-4b2e-8629-48a36cc95f9d", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 425 + } + }, + "source": [ + "d_frutas = dict(zip(l_frutas, l_precos_frutas))\n", + "d_frutas" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Apple': 0.4,\n", + " 'Apricot': 0.25,\n", + " 'Avocado': 0.35,\n", + " 'Banana': 0.3,\n", + " 'Blackberry': 0.55,\n", + " 'Blackcurrant': 0.7,\n", + " 'Blueberry': 0.45,\n", + " 'Cherry': 0.5,\n", + " 'Coconut': 0.75,\n", + " 'Fig': 0.6,\n", + " 'Grape': 0.65,\n", + " 'Kiwi': 0.2,\n", + " 'Lemon': 0.15,\n", + " 'Mango': 0.8,\n", + " 'Nectarine': 0.75,\n", + " 'Orange': 0.25,\n", + " 'Papaya': 0.3,\n", + " 'Passion Fruit': 0.45,\n", + " 'Peach': 0.55,\n", + " 'Pineapple': 0.55,\n", 
+ " 'Plum': 0.6,\n", + " 'Raspberry': 0.4,\n", + " 'Strawberry': 0.5,\n", + " 'Watermelon': 0.45}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 36 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iHKUaGNT_IDt" + }, + "source": [ + "Below is a summary of the main dictionary methods:" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MQLZ1mwW_yiU" + }, + "source": [ + "| Method | Description | Example | Result |\n", + "|---|---|---|---|\n", + "| d_dicionario.clear() | Removes all items from d_dicionario | d_frutas.clear() | {} |\n", + "| d_dicionario.copy() | Returns a (shallow) copy of d_dicionario | d_frutas2 = d_frutas.copy() | d_frutas2 is a copy of d_frutas |\n", + "| d_dicionario.get(key) | Returns the value for key if key is in d_dicionario; otherwise returns None | d_frutas.get('Passion Fruit') | 0.45 |\n", + "| | | d_frutas.get('XPTO') | None (the notebook displays nothing, because the key does not exist) |\n", + "| d_dicionario.items() | Returns a view object with the (key, value) tuples of d_dicionario | d_frutas.items() | dict_items([('Avocado', 0.35), ..., ('Watermelon', 0.45)]) |\n", + "| d_dicionario.keys() | Returns a view object with the keys of d_dicionario | d_frutas.keys() | dict_keys(['Avocado', 'Apple', ..., 'Watermelon']) |\n", + "| d_dicionario.values() | Returns a view object with the values of d_dicionario | d_frutas.values() | dict_values([0.35, 0.4, ..., 0.45]) |\n", + "| d_dicionario.popitem() | Removes and returns the last inserted item of d_dicionario as a (key, value) tuple | d_frutas.popitem() | ('Watermelon', 0.45) |\n", + "| | | 'Watermelon' in d_frutas | False |\n", + "| d_dicionario.pop(key[, default]) | Removes the item corresponding to key from d_dicionario and returns its value | d_frutas.pop('Orange') | 0.25 |\n", + "| | | 'Orange' in d_frutas | False 
|\n", + "| d_dicionario.update(d2) | Adds the items of d2 to d_dicionario when their keys are not in d_dicionario. If a key is already in d_dicionario, its value is replaced by the new one | d_frutas.update({'Cherimoya': 1.3}) | Adds the item {'Cherimoya': 1.3} to d_frutas, because key = 'Cherimoya' is not in d_frutas. |\n", + "| | | d_frutas.update({'Orange': 0.55}) | Updates the value of key = 'Orange' to 0.55. The previous value was 0.25 |\n", + "| d_dicionario.fromkeys(keys, value) | Returns a new dictionary with the given keys, each mapped to value | tFruits = ('Avocado', 'Apple', 'Apricot') | |\n", + "| | | d_frutas.fromkeys(tFruits, 0) | {'Apple': 0, 'Apricot': 0, 'Avocado': 0} |" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uH6cHnctDu2l" + }, + "source": [ + "Next, we present a few more examples of dictionaries and their associated methods:" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YeCPxCab4e4k" + }, + "source": [ + "___\n", + "# **EXAMPLE**\n", + "* The days of the week as a dictionary." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "N_2J839X4lps", + "outputId": "54ca32b8-7dda-4907-8aa5-b327444bd458", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 134 + } + }, + "source": [ + "d_dia_semana = {'Seg': 'Segunda', 'Ter': 'Terça', 'Qua': 'Quarta', 'Qui': 'Quinta', 'Sex': 'Sexta', 'Sab': 'Sabado', 'Dom': 'Domingo'}\n", + "d_dia_semana" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Dom': 'Domingo',\n", + " 'Qua': 'Quarta',\n", + " 'Qui': 'Quinta',\n", + " 'Sab': 'Sabado',\n", + " 'Seg': 'Segunda',\n", + " 'Sex': 'Sexta',\n", + " 'Ter': 'Terça'}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 1 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CnZLR-VX6FV4" + }, + "source": [ + "Note that:\n", + "* the items of the dictionary d_dia_semana follow the structure {key: value}.\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "eHuvY7BWQKhQ", + "outputId": "87fabef2-0891-4994-a4ce-cdd1e23218b1", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "d_dia_semana['Seg']" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'Segunda'" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 2 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "j65BxhzGG0NA" + }, + "source": [ + "___\n", + "# **DECLARE OR INITIALIZE AN EMPTY DICTIONARY**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LEGwQ0U-fKtL" + }, + "source": [ + "For example, the command below declares an empty dictionary named d_paises:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "2iPWXPBLfOlr", + "outputId": "7925813c-77e2-4651-bdb2-f0e1144aecdb", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "d_paises = {} # We can also use the dict() function 
to create the empty dictionary, as follows: d_paises = dict()\n", + "d_paises" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 4 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vCxZv-jmG5y0" + }, + "source": [ + "___\n", + "# **GET THE TYPE OF THE OBJECT**\n", + "> type(d_dicionario)" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "voPYpGIGff3o", + "outputId": "7fab37f5-8ed1-46d8-b47b-62a100ee2196", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "type(d_paises)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "dict" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 5 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "X3MvCkFiG-UO" + }, + "source": [ + "___\n", + "# **ADD ITEMS TO THE DICTIONARY**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fzP8iG5xfi0H" + }, + "source": [ + "Add the value 'Italy' under key = 1. In other words, we are adding the item {1: 'Italy'}" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "EXZ7eEZofnza", + "outputId": "bacf6377-b5cc-4f29-f6c0-347516550f37", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "d_paises[1] = 'Italy'\n", + "d_paises" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{1: 'Italy'}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 6 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rH51ORGHHREE" + }, + "source": [ + "Add the value 'Denmark' under key = 2. 
In other words, we are adding the item {2: 'Denmark'}" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "GAXSzSiufv1u", + "outputId": "94f7e900-0452-4908-c0fe-2783a735bd01", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "d_paises[2] = 'Denmark'\n", + "d_paises" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{1: 'Italy', 2: 'Denmark'}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 7 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Xqdc_IYoHVVQ" + }, + "source": [ + "Add the value 'Brazil' under key = 3. In other words, we are adding the item {3: 'Brazil'}" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "FN7km8C9gAjM", + "outputId": "7863cccb-b0aa-47b2-901b-ba72a5574d4f", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "d_paises[3] = 'Brazil'\n", + "d_paises" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{1: 'Italy', 2: 'Denmark', 3: 'Brazil'}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 8 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iwU8pJKRHapD" + }, + "source": [ + "___\n", + "# **UPDATE VALUES OF THE DICTIONARY**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CxXUV7TugLXn" + }, + "source": [ + "What happens when we assign another value, for example 'France', to key 3? 
Let's check below:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Rr6DtJnDgU5I", + "outputId": "f02c2c47-e5aa-43cd-886b-345c52ed31bd", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Assign the value 'France' to key = 3\n", + "d_paises[3] = 'France'\n", + "d_paises" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{1: 'Italy', 2: 'Denmark', 3: 'France'}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 9 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xB9G1l3_ggo-" + }, + "source": [ + "Since key = 3 already exists in the dictionary d_paises, Python replaces the previous value, 'Brazil', with the new value, 'France'. \n", + "\n", + "* Remember, dictionaries are mutable!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "T8JBxySZHiOJ" + }, + "source": [ + "___\n", + "# **GET THE KEYS OF THE DICTIONARY**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ALwbHwi4iwky", + "outputId": "bb0d57fb-2742-4eb1-9d82-9309142d21f5", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "d_paises.keys()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "dict_keys([1, 2, 3])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 10 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FIvi0Li1Hng5" + }, + "source": [ + "___\n", + "# **GET THE VALUES OF THE DICTIONARY**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "cp0PPtl3jEKo", + "outputId": "c7b8739a-caa9-4e58-e6d3-0f86ccd2d950", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "d_paises.values()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "dict_values(['Italy', 'Denmark', 'France'])" 
+ ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 11 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JUblZBMjHrwl" + }, + "source": [ + "___\n", + "# **GET THE ITEMS (key, value) OF THE DICTIONARY**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "LraTwXjdjG3m", + "outputId": "b3d6d55e-20ad-4f88-a783-9ba1c4fd8654", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 162 + } + }, + "source": [ + "d_paises.items()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "error", + "ename": "NameError", + "evalue": "ignored", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0md_Paises\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mitems\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mNameError\u001b[0m: name 'd_Paises' is not defined" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IJEMg2LKHyGa" + }, + "source": [ + "___\n", + "# **GET THE VALUE FOR A SPECIFIC KEY**\n", + "* d_dicionario.get(key)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dzgBhsphjSQm" + }, + "source": [ + "What is the value for key = 1?" 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "FUfTjqktjW60", + "outputId": "678ab629-6cff-4fe1-e03f-d90709a98f26", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "d_paises.get(1)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'Italy'" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 11 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tyJ0KsloIBoD" + }, + "source": [ + "___\n", + "# **COPY THE DICTIONARY**\n", + "* d_dicionario.copy()" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "XL17EmvMkkky", + "outputId": "65846bc2-87a2-42cf-eb17-e3fccb00c9a4", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + } + }, + "source": [ + "d_paises2 = d_paises.copy()\n", + "d_paises2" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{1: 'Italy', 2: 'Denmark', 3: 'France'}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 28 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8V25l2ZoIG4B" + }, + "source": [ + "___\n", + "# **REMOVE ALL ITEMS FROM THE DICTIONARY**\n", + "* d_dicionario.clear()" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "r-8Gs1gYjqLN" + }, + "source": [ + "d_paises.clear()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "ro_42gzDjsdV", + "outputId": "a2c2a25b-40ef-4842-f2f7-3ac85404d195", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + } + }, + "source": [ + "d_paises" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 13 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pCzKkKoujv7G" + }, + "source": [ + "As expected, 
all items were removed from the dictionary d_paises. However, the dictionary d_paises itself still exists!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MKtPwGVsIaLQ" + }, + "source": [ + "___\n", + "# **DELETE THE DICTIONARY**\n", + "* del d_dicionario" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "8wvM-o7Lj7A0" + }, + "source": [ + "del d_paises" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "wK83ZURYkD_T", + "outputId": "03254461-9939-4ef9-de30-c4b59c920674", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 166 + } + }, + "source": [ + "d_paises" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "error", + "ename": "NameError", + "evalue": "ignored", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdCountries\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mNameError\u001b[0m: name 'dCountries' is not defined" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aSe3veUB1lo_" + }, + "source": [ + "As expected, Python raises a NameError, because the dictionary no longer exists." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "STtkGUvEg7d1" + }, + "source": [ + "___\n", + "# **ITERATE OVER THE DICTIONARY**\n", + "* Consider the dictionary d_frutas below:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "IG8hKSvcfalZ" + }, + "source": [ + "# Define the initial values of the dictionary d_frutas:\n", + "d_frutas = {'Avocado': 0.35, \n", + " 'Apple': 0.40, \n", + " 'Apricot': 0.25, \n", + " 'Banana': 0.30, \n", + " 'Blackcurrant': 0.70, \n", + " 'Blackberry': 0.55, \n", + " 'Blueberry': 0.45, \n", + " 'Cherry': 0.50, \n", + " 'Coconut': 0.75, \n", + " 'Fig': 0.60, \n", + " 'Grape': 0.65, \n", + " 'Kiwi': 0.20, \n", + " 'Lemon': 0.15, \n", + " 'Mango': 0.80, \n", + " 'Nectarine': 0.75, \n", + " 'Orange': 0.25, \n", + " 'Papaya': 0.30,\n", + " 'Passion Fruit': 0.45,\n", + " 'Peach': 0.55,\n", + " 'Pineapple': 0.55,\n", + " 'Plum': 0.60,\n", + " 'Raspberry': 0.40,\n", + " 'Strawberry': 0.50,\n", + " 'Watermelon': 0.45}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ppRkK_jJJG6W" + }, + "source": [ + "Showing the items of the dictionary d_frutas:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "bI7Ctf0ohyz8", + "outputId": "5af38cf4-aaf8-4efa-f682-502bfb5022a6", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 425 + } + }, + "source": [ + "d_frutas" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Apple': 0.4,\n", + " 'Apricot': 0.25,\n", + " 'Avocado': 0.35,\n", + " 'Banana': 0.3,\n", + " 'Blackberry': 0.55,\n", + " 'Blackcurrant': 0.7,\n", + " 'Blueberry': 0.45,\n", + " 'Cherry': 0.5,\n", + " 'Coconut': 0.75,\n", + " 'Fig': 0.6,\n", + " 'Grape': 0.65,\n", + " 'Kiwi': 0.2,\n", + " 'Lemon': 0.15,\n", + " 'Mango': 0.8,\n", + " 'Nectarine': 0.75,\n", + " 'Orange': 0.25,\n", + " 'Papaya': 0.3,\n", + " 'Passion Fruit': 0.45,\n", + " 'Peach': 0.55,\n", + " 'Pineapple': 
0.55,\n", + " 'Plum': 0.6,\n", + " 'Raspberry': 0.4,\n", + " 'Strawberry': 0.5,\n", + " 'Watermelon': 0.45}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 8 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wXFfyiyPtD35" + }, + "source": [ + "What is the value for the fruit 'Apple'? To answer this question, just remember that 'Apple' is a key of the dictionary d_frutas." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "JpreyE_LtCcU", + "outputId": "cee4be2d-7980-4a3d-85fb-17561d1bb1ff", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "d_frutas['Apple']" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0.4" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 21 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JBMf8SbAJmiq" + }, + "source": [ + "## Iterate over the keys of the dictionary" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "rMro_tY8kepo", + "outputId": "4488c243-6792-4efa-b271-e546270b129d", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 425 + } + }, + "source": [ + "for key in d_frutas.keys():\n", + "    print(key)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Avocado\n", + "Apple\n", + "Apricot\n", + "Banana\n", + "Blackcurrant\n", + "Blackberry\n", + "Blueberry\n", + "Cherry\n", + "Coconut\n", + "Fig\n", + "Grape\n", + "Kiwi\n", + "Lemon\n", + "Mango\n", + "Nectarine\n", + "Orange\n", + "Papaya\n", + "Passion Fruit\n", + "Peach\n", + "Pineapple\n", + "Plum\n", + "Raspberry\n", + "Strawberry\n", + "Watermelon\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yDkOLvRFJxco" + }, + "source": [ + "## Iterate over the items (key, value) of the dictionary" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "DpFB1g-3kDSt", + "outputId": 
"7ac51581-edfa-418d-a1e0-d297ebdffca7", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 425 + } + }, + "source": [ + "for item in d_frutas.items():\n", + "    print(item)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "('Avocado', 0.35)\n", + "('Apple', 0.4)\n", + "('Apricot', 0.25)\n", + "('Banana', 0.3)\n", + "('Blackcurrant', 0.7)\n", + "('Blackberry', 0.55)\n", + "('Blueberry', 0.45)\n", + "('Cherry', 0.5)\n", + "('Coconut', 0.75)\n", + "('Fig', 0.6)\n", + "('Grape', 0.65)\n", + "('Kiwi', 0.2)\n", + "('Lemon', 0.15)\n", + "('Mango', 0.8)\n", + "('Nectarine', 0.75)\n", + "('Orange', 0.25)\n", + "('Papaya', 0.3)\n", + "('Passion Fruit', 0.45)\n", + "('Peach', 0.55)\n", + "('Pineapple', 0.55)\n", + "('Plum', 0.6)\n", + "('Raspberry', 0.4)\n", + "('Strawberry', 0.5)\n", + "('Watermelon', 0.45)\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8z6qO74fJ6Q1" + }, + "source": [ + "## Iterate over the values of the dictionary" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "tjJ6qRF8nr4v", + "outputId": "3e75843b-2d45-4b4c-a3f2-24ffc0e60a7a", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 425 + } + }, + "source": [ + "for value in d_frutas.values():\n", + "    print(value)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "0.35\n", + "0.4\n", + "0.25\n", + "0.3\n", + "0.7\n", + "0.55\n", + "0.45\n", + "0.5\n", + "0.75\n", + "0.6\n", + "0.65\n", + "0.2\n", + "0.15\n", + "0.8\n", + "0.75\n", + "0.25\n", + "0.3\n", + "0.45\n", + "0.55\n", + "0.55\n", + "0.6\n", + "0.4\n", + "0.5\n", + "0.45\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-LmEUroVKDUA" + }, + "source": [ + "## Iterate over the key and value of each item" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "oRhZ_Zq9oQIg", + "outputId": "be168183-30b4-4f96-ae2c-3f313acbc558", + "colab": { + 
"base_uri": "https://localhost:8080/", + "height": 425 + } + }, + "source": [ + "for key, value in d_frutas.items():\n", + "    print(\"%s --> %s\" %(key, value))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Avocado --> 0.35\n", + "Apple --> 0.4\n", + "Apricot --> 0.25\n", + "Banana --> 0.3\n", + "Blackcurrant --> 0.7\n", + "Blackberry --> 0.55\n", + "Blueberry --> 0.45\n", + "Cherry --> 0.5\n", + "Coconut --> 0.75\n", + "Fig --> 0.6\n", + "Grape --> 0.65\n", + "Kiwi --> 0.2\n", + "Lemon --> 0.15\n", + "Mango --> 0.8\n", + "Nectarine --> 0.75\n", + "Orange --> 0.25\n", + "Papaya --> 0.3\n", + "Passion Fruit --> 0.45\n", + "Peach --> 0.55\n", + "Pineapple --> 0.55\n", + "Plum --> 0.6\n", + "Raspberry --> 0.4\n", + "Strawberry --> 0.5\n", + "Watermelon --> 0.45\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Fotx7XUquAo8" + }, + "source": [ + "___\n", + "# **CHECK WHETHER A SPECIFIC KEY BELONGS TO THE DICTIONARY**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ju__WsSoKXtk" + }, + "source": [ + "Does the fruit 'Apple' (which in our case is a key) exist in the dictionary?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "-gkEKNZPTeMp", + "outputId": "3540aadd-996a-4abd-cfcb-c22e49b75aaa", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "'Apple' in d_frutas.keys()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "True" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 75 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fMzBeFMIusv7" + }, + "source": [ + "Does the fruit 'Coconut' belong to the dictionary d_frutas?" 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "SKtEwmBCuxyi", + "outputId": "1df7263c-a64f-4eaf-8d4d-a55cac03d2bc", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "'Coconut' in d_frutas.keys()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "True" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 77 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rrH8ArqsK6Bd" + }, + "source": [ + "___\n", + "# **CHECK WHETHER A VALUE BELONGS TO THE DICTIONARY**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "DbWpbuLTK9sn", + "outputId": "e9fafa6d-284e-4862-8f25-9419ff702dec", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "0.4 in d_frutas.values()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "True" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 14 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "36kmLUYDvsUI" + }, + "source": [ + "## Add new items to the dictionary\n", + "* Consider the dictionary d_frutas2 below:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "5Rwq4-UG4--u" + }, + "source": [ + "d_frutas2 = {'Grapefruit': 1.0}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vljceM6_5H9o" + }, + "source": [ + "The command below adds the items of the dictionary d_frutas2 to the dictionary d_frutas." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "7BD_mYMM5O5o", + "outputId": "2b185546-255e-4ad0-e8c9-10564fcbe2b0", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 442 + } + }, + "source": [ + "d_frutas.update(d_frutas2)\n", + "d_frutas" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Apple': 0.4,\n", + " 'Apricot': 0.25,\n", + " 'Avocado': 0.35,\n", + " 'Banana': 0.3,\n", + " 'Blackberry': 0.55,\n", + " 'Blackcurrant': 0.7,\n", + " 'Blueberry': 0.45,\n", + " 'Cherry': 0.5,\n", + " 'Coconut': 0.75,\n", + " 'Fig': 0.6,\n", + " 'Grape': 0.65,\n", + " 'Grapefruit': 1.0,\n", + " 'Kiwi': 0.2,\n", + " 'Lemon': 0.15,\n", + " 'Mango': 0.8,\n", + " 'Nectarine': 0.75,\n", + " 'Orange': 0.25,\n", + " 'Papaya': 0.3,\n", + " 'Passion Fruit': 0.45,\n", + " 'Peach': 0.55,\n", + " 'Pineapple': 0.55,\n", + " 'Plum': 0.6,\n", + " 'Raspberry': 0.4,\n", + " 'Strawberry': 0.5,\n", + " 'Watermelon': 0.45}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 79 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ffh-94lo55n4" + }, + "source": [ + "Now, consider the dictionary d_frutas3 below:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "JMAq_jbP5---" + }, + "source": [ + "d_frutas3 = {'Apple': 0.70}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Jd6B2cy-6KmY" + }, + "source": [ + "What is the result of the command below?\n", + "\n", + "* Note: the fruit 'Apple' (a key of the dictionary d_frutas) has the value 0.40 in d_frutas, while in the dictionary d_frutas3 the fruit 'Apple' has the value 0.70." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "E4GKdTw76PXI" + }, + "source": [ + "d_frutas.update(d_frutas3)\n", + "d_frutas" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HMmDfrln6o0c" + }, + "source": [ + "As expected, since key = 'Apple' already exists in the dictionary d_frutas, Python updated the value of key = 'Apple' to 0.70." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SWO2GdNovxAp" + }, + "source": [ + "## Modify values" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DX9UTy4TwlAw" + }, + "source": [ + "Suppose we want to apply a 10% discount to every fruit in our dictionary.\n", + "\n", + "* How do we do that?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ZziGmKGmwqwn" + }, + "source": [ + "for key, value in d_frutas.items():\n", + "    d_frutas[key] = round(value * 0.9, 2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "s1B-yN8lM-C1" + }, + "source": [ + "Show d_frutas with the updated values:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "zZLa85knxBtY", + "outputId": "2c7c12f8-8885-4f34-a0d1-1323e98a9437", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 442 + } + }, + "source": [ + "d_frutas" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Apple': 0.63,\n", + " 'Apricot': 0.23,\n", + " 'Avocado': 0.32,\n", + " 'Banana': 0.27,\n", + " 'Blackberry': 0.5,\n", + " 'Blackcurrant': 0.63,\n", + " 'Blueberry': 0.41,\n", + " 'Cherry': 0.45,\n", + " 'Coconut': 0.68,\n", + " 'Fig': 0.54,\n", + " 'Grape': 0.59,\n", + " 'Grapefruit': 0.9,\n", + " 'Kiwi': 0.18,\n", + " 'Lemon': 0.14,\n", + " 'Mango': 0.72,\n", + " 'Nectarine': 0.68,\n", + " 'Orange': 0.23,\n", + " 'Papaya': 0.27,\n", + " 'Passion Fruit': 0.41,\n", + " 'Peach': 0.5,\n", + " 'Pineapple': 
0.5,\n", + " 'Plum': 0.54,\n", + " 'Raspberry': 0.36,\n", + " 'Strawberry': 0.45,\n", + " 'Watermelon': 0.41}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 84 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vpN54l4vxze5" + }, + "source": [ + "## Delete keys from the dictionary\n", + "* Deleting a key means deleting the whole item {key: value}." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eDlthLStNIwR" + }, + "source": [ + "Suppose we want to delete the fruit 'Avocado' from the dictionary d_frutas.\n", + "\n", + "* How do we do that?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "fnpzHZU_x5Y1" + }, + "source": [ + "for key in list(d_frutas.keys()): # list() copies the keys, so items can be deleted safely while iterating\n", + "    if key == 'Avocado':\n", + "        del d_frutas[key] # Deletes key = 'Avocado'" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VyPUrobONqvI" + }, + "source": [ + "Show the updated dictionary d_frutas:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "IwnsHejhyT4l", + "outputId": "b910699c-9729-4a27-bd78-3a283c82ac39", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 425 + } + }, + "source": [ + "d_frutas" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Apple': 0.63,\n", + " 'Apricot': 0.23,\n", + " 'Banana': 0.27,\n", + " 'Blackberry': 0.5,\n", + " 'Blackcurrant': 0.63,\n", + " 'Blueberry': 0.41,\n", + " 'Cherry': 0.45,\n", + " 'Coconut': 0.68,\n", + " 'Fig': 0.54,\n", + " 'Grape': 0.59,\n", + " 'Grapefruit': 0.9,\n", + " 'Kiwi': 0.18,\n", + " 'Lemon': 0.14,\n", + " 'Mango': 0.72,\n", + " 'Nectarine': 0.68,\n", + " 'Orange': 0.23,\n", + " 'Papaya': 0.27,\n", + " 'Passion Fruit': 0.41,\n", + " 'Peach': 0.5,\n", + " 'Pineapple': 0.5,\n", + " 'Plum': 0.54,\n", + " 'Raspberry': 0.36,\n", + " 'Strawberry': 0.45,\n", + " 
'Watermelon': 0.41}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 86 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "u4HOf9SNytSq" + }, + "source": [ + "## Filter/select items based on conditions\n", + "In some situations you will want to filter the dictionary items that satisfy one or more conditions.\n", + "\n", + "* Consider the following example: we want to select only the fruits with prices greater than 0.5." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "EwqxWiVlyvgH" + }, + "source": [ + "d_frutas_filtro = {}\n", + "for key, value in d_frutas.items():\n", + " if value > 0.5:\n", + " d_frutas_filtro.update({key: value})" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eb0jmAKWOtYt" + }, + "source": [ + "Show the resulting dictionary d_frutas_filtro:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "SsStWM5k1s-Q", + "outputId": "f6af5b61-2333-41c7-a28a-0f6a67b0a949", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 170 + } + }, + "source": [ + "d_frutas_filtro" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Apple': 0.63,\n", + " 'Blackcurrant': 0.63,\n", + " 'Coconut': 0.68,\n", + " 'Fig': 0.54,\n", + " 'Grape': 0.59,\n", + " 'Grapefruit': 0.9,\n", + " 'Mango': 0.72,\n", + " 'Nectarine': 0.68,\n", + " 'Plum': 0.54}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 89 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "u1ve6xIGOjrE" + }, + "source": [ + "As you can see, nine fruits satisfy this condition; 'Blackberry' (0.5) is not among them." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KJqpPrfkCk9L" + }, + "source": [ + "## Calculations with the dictionary items" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "exD8HXodCqg6" + }, + "source": [ + "from collections import Counter" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "llCLTysdCuwB" + }, + "source": [ + "Sum the prices of all fruits:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "uG0VP1MNCroX", + "outputId": "8221b07b-610d-4a7c-cb14-86d6f63e5be3", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "sum(d_frutas.values())" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "11.450000000000001" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 22 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a5MBNCF-C5-4" + }, + "source": [ + "Count how many items the dictionary holds:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "AkvygR0PC9bT", + "outputId": "254eff41-8336-4fe6-d6ad-4d52544d74a9", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "len(list(d_frutas))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "24" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 25 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xBNFaklq8OC9" + }, + "source": [ + "## Sort dictionary items - sorted(d_dicionario.items(), reverse= True/False)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WULJMjHA-mal" + }, + "source": [ + "Alphabetical order (by key):" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "SH0WIKZ8-Ylr", + "outputId": "b9cea719-637e-40a5-9e79-eb67aeb47887", + "colab": { + "base_uri": "https://localhost:8080/", + 
"height": 425 + } + }, + "source": [ + "d_frutas_ordenadas = sorted(d_frutas.items(), reverse = False)\n", + "d_frutas_ordenadas" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "[('Apple', 0.4),\n", + " ('Apricot', 0.25),\n", + " ('Avocado', 0.35),\n", + " ('Banana', 0.3),\n", + " ('Blackberry', 0.55),\n", + " ('Blackcurrant', 0.7),\n", + " ('Blueberry', 0.45),\n", + " ('Cherry', 0.5),\n", + " ('Coconut', 0.75),\n", + " ('Fig', 0.6),\n", + " ('Grape', 0.65),\n", + " ('Kiwi', 0.2),\n", + " ('Lemon', 0.15),\n", + " ('Mango', 0.8),\n", + " ('Nectarine', 0.75),\n", + " ('Orange', 0.25),\n", + " ('Papaya', 0.3),\n", + " ('Passion Fruit', 0.45),\n", + " ('Peach', 0.55),\n", + " ('Pineapple', 0.55),\n", + " ('Plum', 0.6),\n", + " ('Raspberry', 0.4),\n", + " ('Strawberry', 0.5),\n", + " ('Watermelon', 0.45)]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 12 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "T4Li1Q2d-pnZ" + }, + "source": [ + "Ordem reversa (por key):" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "PoBOmfpM_A_a", + "outputId": "4cd9a21c-a2ad-462c-acb0-26ba7a0a4e5d", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 425 + } + }, + "source": [ + "d_frutas_ordenadas_reverse = sorted(d_frutas.items(), reverse = True)\n", + "d_frutas_ordenadas_reverse" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "[('Watermelon', 0.45),\n", + " ('Strawberry', 0.5),\n", + " ('Raspberry', 0.4),\n", + " ('Plum', 0.6),\n", + " ('Pineapple', 0.55),\n", + " ('Peach', 0.55),\n", + " ('Passion Fruit', 0.45),\n", + " ('Papaya', 0.3),\n", + " ('Orange', 0.25),\n", + " ('Nectarine', 0.75),\n", + " ('Mango', 0.8),\n", + " ('Lemon', 0.15),\n", + " ('Kiwi', 0.2),\n", + " ('Grape', 0.65),\n", + " ('Fig', 0.6),\n", + " ('Coconut', 0.75),\n", + " ('Cherry', 0.5),\n", + " ('Blueberry', 
0.45),\n", + " ('Blackcurrant', 0.7),\n", + " ('Blackberry', 0.55),\n", + " ('Banana', 0.3),\n", + " ('Avocado', 0.35),\n", + " ('Apricot', 0.25),\n", + " ('Apple', 0.4)]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 11 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FxTC2-U88ajk" + }, + "source": [ + "## The filter() function\n", + "* The filter() function applies a predicate to an iterable (here, d_frutas.items()) and yields only the items that satisfy it." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "iJq1clvOHVG2", + "outputId": "16a779ef-48c9-497c-8c7c-a1612aa9aa03", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 425 + } + }, + "source": [ + "d_frutas" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Apple': 0.4,\n", + " 'Apricot': 0.25,\n", + " 'Avocado': 0.35,\n", + " 'Banana': 0.3,\n", + " 'Blackberry': 0.55,\n", + " 'Blackcurrant': 0.7,\n", + " 'Blueberry': 0.45,\n", + " 'Cherry': 0.5,\n", + " 'Coconut': 0.75,\n", + " 'Fig': 0.6,\n", + " 'Grape': 0.65,\n", + " 'Kiwi': 0.2,\n", + " 'Lemon': 0.15,\n", + " 'Mango': 0.8,\n", + " 'Nectarine': 0.75,\n", + " 'Orange': 0.25,\n", + " 'Papaya': 0.3,\n", + " 'Passion Fruit': 0.45,\n", + " 'Peach': 0.55,\n", + " 'Pineapple': 0.55,\n", + " 'Plum': 0.6,\n", + " 'Raspberry': 0.4,\n", + " 'Strawberry': 0.5,\n", + " 'Watermelon': 0.45}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 2 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qtTKvNeJNycl" + }, + "source": [ + "### Filtering by key:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "uIDW5FhwAiSs", + "outputId": "52599d3f-ff13-4894-f697-ce7290bff9d5", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "d_frutas2 = {k: v for k, v in filter(lambda t: t[0] == 'Apple', d_frutas.items())}\n", + "d_frutas2" + ], + "execution_count": null, + "outputs": [ + 
{ + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Apple': 0.4}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 6 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nUMGIzxeNt_U" + }, + "source": [ + "### Filtrando por valor:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "tvHcQatANltL", + "outputId": "8feaf5b1-1db8-4391-8950-248ba8ab46c5", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 187 + } + }, + "source": [ + "d_frutas3 = {k: v for k, v in filter(lambda t: t[1] > 0.5, d_frutas.items())}\n", + "d_frutas3" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Blackberry': 0.55,\n", + " 'Blackcurrant': 0.7,\n", + " 'Coconut': 0.75,\n", + " 'Fig': 0.6,\n", + " 'Grape': 0.65,\n", + " 'Mango': 0.8,\n", + " 'Nectarine': 0.75,\n", + " 'Peach': 0.55,\n", + " 'Pineapple': 0.55,\n", + " 'Plum': 0.6}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 7 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qA_XhCdmA6Gn" + }, + "source": [ + "___\n", + "# **EXERCÍCIOS**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RSpyl_URgNyE" + }, + "source": [ + "## Exercício 1\n", + "* É possível sortear os itens de um dicionário? Explique sua resposta." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CXqc9kHch6Mm" + }, + "source": [ + "## Exercício 2\n", + "* É possível termos um dicionário do tipo abaixo?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "0BBWO9Zth_mc" + }, + "source": [ + "d_colaboradores= {'Gerentes': ['A', 'B', 'C'], 'Programadores': ['B', 'D', 'E', 'F', 'G'], 'Gerentes_Projeto': ['A', 'E']}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TNiJSG_uiePb" + }, + "source": [ + "Como acessar o Gerente 'A'?" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ntVcr_3XwaQ-" + }, + "source": [ + "## Exercício 3\n", + "Consulte a página [Python Data Types: Dictionary - Exercises, Practice, Solution](https://www.w3resource.com/python-exercises/dictionary/) para mais exercícios relacionados à dicionários." + ] + } + ] +} \ No newline at end of file From 27328b97f577baf578c03c9f20f6c635ab609551 Mon Sep 17 00:00:00 2001 From: MariaJacobs70 <72224154+MariaJacobs70@users.noreply.github.com> Date: Wed, 7 Oct 2020 16:52:23 -0300 Subject: [PATCH 4/9] Criado usando o Colaboratory --- Notebooks/NB07__Dictionaries_alterado.ipynb | 692 ++++++++++++++------ 1 file changed, 494 insertions(+), 198 deletions(-) diff --git a/Notebooks/NB07__Dictionaries_alterado.ipynb b/Notebooks/NB07__Dictionaries_alterado.ipynb index aeea3db03..96c177534 100644 --- a/Notebooks/NB07__Dictionaries_alterado.ipynb +++ b/Notebooks/NB07__Dictionaries_alterado.ipynb @@ -109,15 +109,28 @@ { "cell_type": "code", "metadata": { - "id": "FxuJ7Awd8f5a" + "id": "FxuJ7Awd8f5a", + "outputId": "2b09e013-655c-492e-8902-c0706b3f7b4d", + "colab": { + "base_uri": "https://localhost:8080/" + } }, "source": [ "# Definição da lista l_frutas:\n", "l_frutas = ['Avocado', 'Apple', 'Apricot', 'Banana', 'Blackcurrant', 'Blackberry', 'Blueberry', 'Cherry', 'Coconut', 'Fig', 'Grape', 'Kiwi', 'Lemon', 'Mango', 'Nectarine', \n", - " 'Orange', 'Papaya','Passion Fruit','Peach','Pineapple','Plum','Raspberry','Strawberry','Watermelon']" + " 'Orange', 'Papaya','Passion Fruit','Peach','Pineapple','Plum','Raspberry','Strawberry','Watermelon']\n", + "print(l_frutas)" ], - "execution_count": null, - "outputs": [] + "execution_count": 2, + "outputs": [ + { + "output_type": "stream", + "text": [ + "['Avocado', 'Apple', 'Apricot', 'Banana', 'Blackcurrant', 'Blackberry', 'Blueberry', 'Cherry', 'Coconut', 'Fig', 'Grape', 'Kiwi', 'Lemon', 'Mango', 'Nectarine', 'Orange', 'Papaya', 'Passion Fruit', 'Peach', 'Pineapple', 'Plum', 'Raspberry', 
'Strawberry', 'Watermelon']\n" + ], + "name": "stdout" + } + ] }, { "cell_type": "code", @@ -128,7 +141,7 @@ "# Definição da lista l_precos_frutas:\n", "l_precos_frutas = [0.35, 0.40, 0.25, 0.30, 0.70, 0.55, 0.45, 0.50, 0.75, 0.60, 0.65, 0.20, 0.15, 0.80, 0.75, 0.25, 0.30,0.45,0.55,0.55,0.60,0.40,0.50,0.45]" ], - "execution_count": null, + "execution_count": 3, "outputs": [] }, { @@ -144,17 +157,16 @@ "cell_type": "code", "metadata": { "id": "qT_4sYxA9dyn", - "outputId": "a8badcb1-7f11-4b2e-8629-48a36cc95f9d", + "outputId": "e8276718-3d1c-4dc2-c4ee-0ac975b7ff18", "colab": { - "base_uri": "https://localhost:8080/", - "height": 425 + "base_uri": "https://localhost:8080/" } }, "source": [ "d_frutas = dict(zip(l_frutas, l_precos_frutas))\n", "d_frutas" ], - "execution_count": null, + "execution_count": 4, "outputs": [ { "output_type": "execute_result", @@ -189,7 +201,7 @@ "metadata": { "tags": [] }, - "execution_count": 36 + "execution_count": 4 } ] }, @@ -251,17 +263,16 @@ "cell_type": "code", "metadata": { "id": "N_2J839X4lps", - "outputId": "54ca32b8-7dda-4907-8aa5-b327444bd458", + "outputId": "56ab2a1f-4c41-4335-9697-38bce1eb51f6", "colab": { - "base_uri": "https://localhost:8080/", - "height": 134 + "base_uri": "https://localhost:8080/" } }, "source": [ "d_dia_semana = {'Seg': 'Segunda', 'Ter': 'Terça', 'Qua': 'Quarta', 'Qui': 'Quinta', 'Sex': 'Sexta', 'Sab': 'Sabado', 'Dom': 'Domingo'}\n", "d_dia_semana" ], - "execution_count": null, + "execution_count": 5, "outputs": [ { "output_type": "execute_result", @@ -279,7 +290,7 @@ "metadata": { "tags": [] }, - "execution_count": 1 + "execution_count": 5 } ] }, @@ -297,20 +308,23 @@ "cell_type": "code", "metadata": { "id": "eHuvY7BWQKhQ", - "outputId": "87fabef2-0891-4994-a4ce-cdd1e23218b1", + "outputId": "649df9a0-587f-4e6c-a19a-4654e919bd93", "colab": { "base_uri": "https://localhost:8080/", - "height": 34 + "height": 35 } }, "source": [ "d_dia_semana['Seg']" ], - "execution_count": null, + "execution_count": 6, 
"outputs": [ { "output_type": "execute_result", "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, "text/plain": [ "'Segunda'" ] @@ -318,7 +332,7 @@ "metadata": { "tags": [] }, - "execution_count": 2 + "execution_count": 6 } ] }, @@ -345,17 +359,16 @@ "cell_type": "code", "metadata": { "id": "2iPWXPBLfOlr", - "outputId": "7925813c-77e2-4651-bdb2-f0e1144aecdb", + "outputId": "550e6ef2-9a6a-41b1-8b36-95c3e2e5efbe", "colab": { - "base_uri": "https://localhost:8080/", - "height": 34 + "base_uri": "https://localhost:8080/" } }, "source": [ "d_paises = {} # Também podemos usar a função dict() para criar o dicionário vazio da seguinte forma: d_paises= dict()\n", "d_paises" ], - "execution_count": null, + "execution_count": 7, "outputs": [ { "output_type": "execute_result", @@ -367,7 +380,7 @@ "metadata": { "tags": [] }, - "execution_count": 4 + "execution_count": 7 } ] }, @@ -386,16 +399,15 @@ "cell_type": "code", "metadata": { "id": "voPYpGIGff3o", - "outputId": "7fab37f5-8ed1-46d8-b47b-62a100ee2196", + "outputId": "d0748385-60a7-4b87-b7e2-794fbb8e9abe", "colab": { - "base_uri": "https://localhost:8080/", - "height": 34 + "base_uri": "https://localhost:8080/" } }, "source": [ "type(d_paises)" ], - "execution_count": null, + "execution_count": 8, "outputs": [ { "output_type": "execute_result", @@ -407,7 +419,7 @@ "metadata": { "tags": [] }, - "execution_count": 5 + "execution_count": 8 } ] }, @@ -434,29 +446,28 @@ "cell_type": "code", "metadata": { "id": "EXZ7eEZofnza", - "outputId": "bacf6377-b5cc-4f29-f6c0-347516550f37", + "outputId": "fcbde537-da11-4573-ed38-bd4d48a96f60", "colab": { - "base_uri": "https://localhost:8080/", - "height": 34 + "base_uri": "https://localhost:8080/" } }, "source": [ "d_paises[1] = 'Italy'\n", "d_paises" ], - "execution_count": null, + "execution_count": 12, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ - "{1: 'Italy'}" + "{1: 'Italy', 2: 'Denmark', 3: 'Brazil'}" ] }, 
"metadata": { "tags": [] }, - "execution_count": 6 + "execution_count": 12 } ] }, @@ -473,29 +484,28 @@ "cell_type": "code", "metadata": { "id": "GAXSzSiufv1u", - "outputId": "94f7e900-0452-4908-c0fe-2783a735bd01", + "outputId": "ba60506d-647b-4d40-b316-dc2c693c6f8c", "colab": { - "base_uri": "https://localhost:8080/", - "height": 34 + "base_uri": "https://localhost:8080/" } }, "source": [ "d_paises[2] = 'Denmark'\n", "d_paises" ], - "execution_count": null, + "execution_count": 13, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ - "{1: 'Italy', 2: 'Denmark'}" + "{1: 'Italy', 2: 'Denmark', 3: 'Brazil'}" ] }, "metadata": { "tags": [] }, - "execution_count": 7 + "execution_count": 13 } ] }, @@ -512,17 +522,16 @@ "cell_type": "code", "metadata": { "id": "FN7km8C9gAjM", - "outputId": "7863cccb-b0aa-47b2-901b-ba72a5574d4f", + "outputId": "9927805c-54bf-4a7b-8e1e-134353f3ff40", "colab": { - "base_uri": "https://localhost:8080/", - "height": 34 + "base_uri": "https://localhost:8080/" } }, "source": [ "d_paises[3]= 'Brazil'\n", "d_paises" ], - "execution_count": null, + "execution_count": 14, "outputs": [ { "output_type": "execute_result", @@ -534,7 +543,7 @@ "metadata": { "tags": [] }, - "execution_count": 8 + "execution_count": 14 } ] }, @@ -561,10 +570,9 @@ "cell_type": "code", "metadata": { "id": "Rr6DtJnDgU5I", - "outputId": "f02c2c47-e5aa-43cd-886b-345c52ed31bd", + "outputId": "763dc0a4-de82-428c-d9d8-940412330956", "colab": { - "base_uri": "https://localhost:8080/", - "height": 34 + "base_uri": "https://localhost:8080/" } }, "source": [ @@ -572,7 +580,7 @@ "d_paises[3]= 'France'\n", "d_paises" ], - "execution_count": null, + "execution_count": 15, "outputs": [ { "output_type": "execute_result", @@ -584,7 +592,7 @@ "metadata": { "tags": [] }, - "execution_count": 9 + "execution_count": 15 } ] }, @@ -613,16 +621,16 @@ "cell_type": "code", "metadata": { "id": "ALwbHwi4iwky", - "outputId": "bb0d57fb-2742-4eb1-9d82-9309142d21f5", + "outputId": 
"8aa9b44e-313b-42a6-e885-523185c0f4c5", "colab": { - "base_uri": "https://localhost:8080/", - "height": 34 + "base_uri": "https://localhost:8080/" } }, "source": [ - "d_paises.keys()" + "d_paises.keys()\n", + "d_paises." ], - "execution_count": null, + "execution_count": 16, "outputs": [ { "output_type": "execute_result", @@ -634,7 +642,7 @@ "metadata": { "tags": [] }, - "execution_count": 10 + "execution_count": 16 } ] }, @@ -691,27 +699,27 @@ "cell_type": "code", "metadata": { "id": "LraTwXjdjG3m", - "outputId": "b3d6d55e-20ad-4f88-a783-9ba1c4fd8654", + "outputId": "f7e69608-c6bf-45c2-a08a-f7e76e1c4b2f", "colab": { - "base_uri": "https://localhost:8080/", - "height": 162 + "base_uri": "https://localhost:8080/" } }, "source": [ "d_paises.items()" ], - "execution_count": null, + "execution_count": 17, "outputs": [ { - "output_type": "error", - "ename": "NameError", - "evalue": "ignored", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0md_Paises\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mitems\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", - "\u001b[0;31mNameError\u001b[0m: name 'd_Paises' is not defined" - ] + "output_type": "execute_result", + "data": { + "text/plain": [ + "dict_items([(1, 'Italy'), (2, 'Denmark'), (3, 'France')])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 17 } ] }, @@ -739,20 +747,23 @@ "cell_type": "code", "metadata": { "id": "FUfTjqktjW60", - "outputId": "678ab629-6cff-4fe1-e03f-d90709a98f26", + "outputId": "acf5b8d0-7b30-4f3d-87ef-a1c9d259652b", "colab": { "base_uri": "https://localhost:8080/", - "height": 34 + "height": 35 } }, "source": [ "d_paises.get(1)" ], - "execution_count": null, + "execution_count": 18, 
"outputs": [ { "output_type": "execute_result", "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, "text/plain": [ "'Italy'" ] @@ -760,7 +771,7 @@ "metadata": { "tags": [] }, - "execution_count": 11 + "execution_count": 18 } ] }, @@ -779,17 +790,16 @@ "cell_type": "code", "metadata": { "id": "XL17EmvMkkky", - "outputId": "65846bc2-87a2-42cf-eb17-e3fccb00c9a4", + "outputId": "9aea7f6c-bc39-4d39-bbf7-78c2ff4d7c8c", "colab": { - "base_uri": "https://localhost:8080/", - "height": 35 + "base_uri": "https://localhost:8080/" } }, "source": [ "d_paises2 = d_paises.copy()\n", "d_paises2" ], - "execution_count": null, + "execution_count": 19, "outputs": [ { "output_type": "execute_result", @@ -801,7 +811,7 @@ "metadata": { "tags": [] }, - "execution_count": 28 + "execution_count": 19 } ] }, @@ -824,23 +834,22 @@ "source": [ "d_paises.clear()" ], - "execution_count": null, + "execution_count": 20, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "ro_42gzDjsdV", - "outputId": "a2c2a25b-40ef-4842-f2f7-3ac85404d195", + "outputId": "355082b3-19d2-4d8b-8347-c2a6b08d6487", "colab": { - "base_uri": "https://localhost:8080/", - "height": 35 + "base_uri": "https://localhost:8080/" } }, "source": [ "d_paises" ], - "execution_count": null, + "execution_count": 21, "outputs": [ { "output_type": "execute_result", @@ -852,7 +861,7 @@ "metadata": { "tags": [] }, - "execution_count": 13 + "execution_count": 21 } ] }, @@ -967,7 +976,7 @@ " 'Strawberry': 0.50,\n", " 'Watermelon': 0.45}" ], - "execution_count": null, + "execution_count": 22, "outputs": [] }, { @@ -983,16 +992,15 @@ "cell_type": "code", "metadata": { "id": "bI7Ctf0ohyz8", - "outputId": "5af38cf4-aaf8-4efa-f682-502bfb5022a6", + "outputId": "f9aa2e05-8a41-45c0-a32e-22b1b8976c69", "colab": { - "base_uri": "https://localhost:8080/", - "height": 425 + "base_uri": "https://localhost:8080/" } }, "source": [ "d_frutas" ], - "execution_count": null, + "execution_count": 23, "outputs": [ 
{ "output_type": "execute_result", @@ -1027,7 +1035,7 @@ "metadata": { "tags": [] }, - "execution_count": 8 + "execution_count": 23 } ] }, @@ -1044,16 +1052,15 @@ "cell_type": "code", "metadata": { "id": "JpreyE_LtCcU", - "outputId": "cee4be2d-7980-4a3d-85fb-17561d1bb1ff", + "outputId": "f2f7d1e4-51ff-40ff-af58-5577a34dd137", "colab": { - "base_uri": "https://localhost:8080/", - "height": 34 + "base_uri": "https://localhost:8080/" } }, "source": [ "d_frutas['Apple']" ], - "execution_count": null, + "execution_count": 24, "outputs": [ { "output_type": "execute_result", @@ -1065,7 +1072,7 @@ "metadata": { "tags": [] }, - "execution_count": 21 + "execution_count": 24 } ] }, @@ -1082,17 +1089,16 @@ "cell_type": "code", "metadata": { "id": "rMro_tY8kepo", - "outputId": "4488c243-6792-4efa-b271-e546270b129d", + "outputId": "dec61759-d1b5-4379-8846-cce26973209d", "colab": { - "base_uri": "https://localhost:8080/", - "height": 425 + "base_uri": "https://localhost:8080/" } }, "source": [ "for key in d_frutas.keys():\n", " print(key)" ], - "execution_count": null, + "execution_count": 25, "outputs": [ { "output_type": "stream", @@ -1139,17 +1145,16 @@ "cell_type": "code", "metadata": { "id": "DpFB1g-3kDSt", - "outputId": "7ac51581-edfa-418d-a1e0-d297ebdffca7", + "outputId": "a0b41867-9ca7-414d-a451-34aa0f6e4232", "colab": { - "base_uri": "https://localhost:8080/", - "height": 425 + "base_uri": "https://localhost:8080/" } }, "source": [ "for item in d_frutas.items():\n", " print(item) " ], - "execution_count": null, + "execution_count": 26, "outputs": [ { "output_type": "stream", @@ -1183,6 +1188,17 @@ } ] }, + { + "cell_type": "code", + "metadata": { + "id": "KxGaOrYSdLxp" + }, + "source": [ + "" + ], + "execution_count": null, + "outputs": [] + }, { "cell_type": "markdown", "metadata": { @@ -1196,17 +1212,16 @@ "cell_type": "code", "metadata": { "id": "tjJ6qRF8nr4v", - "outputId": "3e75843b-2d45-4b4c-a3f2-24ffc0e60a7a", + "outputId": "6798f545-3f24-40b9-83d9-8c4a5e073389", 
"colab": { - "base_uri": "https://localhost:8080/", - "height": 425 + "base_uri": "https://localhost:8080/" } }, "source": [ "for value in d_frutas.values():\n", " print(value)" ], - "execution_count": null, + "execution_count": 27, "outputs": [ { "output_type": "stream", @@ -1297,6 +1312,34 @@ } ] }, + { + "cell_type": "code", + "metadata": { + "id": "bDb1IHuddfwH", + "outputId": "b66c5df5-6028-4cad-e480-60c92cdaa9f8", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "d_frutas.items()" + ], + "execution_count": 28, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "dict_items([('Avocado', 0.35), ('Apple', 0.4), ('Apricot', 0.25), ('Banana', 0.3), ('Blackcurrant', 0.7), ('Blackberry', 0.55), ('Blueberry', 0.45), ('Cherry', 0.5), ('Coconut', 0.75), ('Fig', 0.6), ('Grape', 0.65), ('Kiwi', 0.2), ('Lemon', 0.15), ('Mango', 0.8), ('Nectarine', 0.75), ('Orange', 0.25), ('Papaya', 0.3), ('Passion Fruit', 0.45), ('Peach', 0.55), ('Pineapple', 0.55), ('Plum', 0.6), ('Raspberry', 0.4), ('Strawberry', 0.5), ('Watermelon', 0.45)])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 28 + } + ] + }, { "cell_type": "markdown", "metadata": { @@ -1397,16 +1440,15 @@ "cell_type": "code", "metadata": { "id": "DbWpbuLTK9sn", - "outputId": "e9fafa6d-284e-4862-8f25-9419ff702dec", + "outputId": "f10aeaa8-4cd7-41a9-b9f9-7e7bbe6a7fd5", "colab": { - "base_uri": "https://localhost:8080/", - "height": 34 + "base_uri": "https://localhost:8080/" } }, "source": [ "0.4 in d_frutas.values()" ], - "execution_count": null, + "execution_count": 29, "outputs": [ { "output_type": "execute_result", @@ -1418,7 +1460,7 @@ "metadata": { "tags": [] }, - "execution_count": 14 + "execution_count": 29 } ] }, @@ -1440,7 +1482,7 @@ "source": [ "d_frutas2 = {'Grapefruit': 1.0 }" ], - "execution_count": null, + "execution_count": 30, "outputs": [] }, { @@ -1456,17 +1498,16 @@ "cell_type": "code", "metadata": { "id": "7BD_mYMM5O5o", - 
"outputId": "2b185546-255e-4ad0-e8c9-10564fcbe2b0", + "outputId": "39cdea31-d4ff-4064-e59f-adf178840a82", "colab": { - "base_uri": "https://localhost:8080/", - "height": 442 + "base_uri": "https://localhost:8080/" } }, "source": [ "d_frutas.update(d_frutas2)\n", "d_frutas" ], - "execution_count": null, + "execution_count": 31, "outputs": [ { "output_type": "execute_result", @@ -1502,7 +1543,7 @@ "metadata": { "tags": [] }, - "execution_count": 79 + "execution_count": 31 } ] }, @@ -1523,7 +1564,7 @@ "source": [ "d_frutas3 = {'Apple': 0.70}" ], - "execution_count": null, + "execution_count": 32, "outputs": [] }, { @@ -1540,14 +1581,55 @@ { "cell_type": "code", "metadata": { - "id": "E4GKdTw76PXI" + "id": "E4GKdTw76PXI", + "outputId": "6a14210d-d1e4-4dce-9465-72af35a60671", + "colab": { + "base_uri": "https://localhost:8080/" + } }, "source": [ "d_frutas.update(d_frutas3)\n", "d_frutas" ], - "execution_count": null, - "outputs": [] + "execution_count": 33, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Apple': 0.7,\n", + " 'Apricot': 0.25,\n", + " 'Avocado': 0.35,\n", + " 'Banana': 0.3,\n", + " 'Blackberry': 0.55,\n", + " 'Blackcurrant': 0.7,\n", + " 'Blueberry': 0.45,\n", + " 'Cherry': 0.5,\n", + " 'Coconut': 0.75,\n", + " 'Fig': 0.6,\n", + " 'Grape': 0.65,\n", + " 'Grapefruit': 1.0,\n", + " 'Kiwi': 0.2,\n", + " 'Lemon': 0.15,\n", + " 'Mango': 0.8,\n", + " 'Nectarine': 0.75,\n", + " 'Orange': 0.25,\n", + " 'Papaya': 0.3,\n", + " 'Passion Fruit': 0.45,\n", + " 'Peach': 0.55,\n", + " 'Pineapple': 0.55,\n", + " 'Plum': 0.6,\n", + " 'Raspberry': 0.4,\n", + " 'Strawberry': 0.5,\n", + " 'Watermelon': 0.45}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 33 + } + ] }, { "cell_type": "markdown", @@ -1587,7 +1669,7 @@ "for key, value in d_frutas.items():\n", " d_frutas[key] = round(value * 0.9, 2)" ], - "execution_count": null, + "execution_count": 34, "outputs": [] }, { @@ -1603,16 +1685,15 @@ "cell_type": "code", 
"metadata": { "id": "zZLa85knxBtY", - "outputId": "2c7c12f8-8885-4f34-a0d1-1323e98a9437", + "outputId": "ee34b820-6961-4d88-d0e2-628efd922e70", "colab": { - "base_uri": "https://localhost:8080/", - "height": 442 + "base_uri": "https://localhost:8080/" } }, "source": [ "d_frutas" ], - "execution_count": null, + "execution_count": 35, "outputs": [ { "output_type": "execute_result", @@ -1648,7 +1729,7 @@ "metadata": { "tags": [] }, - "execution_count": 84 + "execution_count": 35 } ] }, @@ -1683,7 +1764,7 @@ " if key == 'Avocado':\n", " del d_frutas[key] # Deleta key = 'Avocado'" ], - "execution_count": null, + "execution_count": 36, "outputs": [] }, { @@ -1699,16 +1780,15 @@ "cell_type": "code", "metadata": { "id": "IwnsHejhyT4l", - "outputId": "b910699c-9729-4a27-bd78-3a283c82ac39", + "outputId": "e5029d7e-9c8d-4e37-940e-a375ba57c78d", "colab": { - "base_uri": "https://localhost:8080/", - "height": 425 + "base_uri": "https://localhost:8080/" } }, "source": [ "d_frutas" ], - "execution_count": null, + "execution_count": 37, "outputs": [ { "output_type": "execute_result", @@ -1743,7 +1823,7 @@ "metadata": { "tags": [] }, - "execution_count": 86 + "execution_count": 37 } ] }, @@ -1770,7 +1850,21 @@ " if value > 0.5:\n", " d_frutas_filtro.update({key: value})" ], - "execution_count": null, + "execution_count": 38, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "fu1jgbfUhPRD" + }, + "source": [ + "d_frutas_filtro = {}\n", + "for key, value in d_frutas.items():\n", + " if value == 0.5:\n", + " d_frutas_filtro.update({key: value})" + ], + "execution_count": 40, "outputs": [] }, { @@ -1786,36 +1880,27 @@ "cell_type": "code", "metadata": { "id": "SsStWM5k1s-Q", - "outputId": "f6af5b61-2333-41c7-a28a-0f6a67b0a949", + "outputId": "3fc3a8ef-4627-40d7-e24e-f2bee8e22e66", "colab": { - "base_uri": "https://localhost:8080/", - "height": 170 + "base_uri": "https://localhost:8080/" } }, "source": [ "d_frutas_filtro" ], - "execution_count": null, + 
"execution_count": 41, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ - "{'Apple': 0.63,\n", - " 'Blackcurrant': 0.63,\n", - " 'Coconut': 0.68,\n", - " 'Fig': 0.54,\n", - " 'Grape': 0.59,\n", - " 'Grapefruit': 0.9,\n", - " 'Mango': 0.72,\n", - " 'Nectarine': 0.68,\n", - " 'Plum': 0.54}" + "{'Blackberry': 0.5, 'Peach': 0.5, 'Pineapple': 0.5}" ] }, "metadata": { "tags": [] }, - "execution_count": 89 + "execution_count": 41 } ] }, @@ -2071,51 +2156,50 @@ "cell_type": "code", "metadata": { "id": "iJq1clvOHVG2", - "outputId": "16a779ef-48c9-497c-8c7c-a1612aa9aa03", + "outputId": "2c89ba89-3377-4337-87f1-050360788da9", "colab": { - "base_uri": "https://localhost:8080/", - "height": 425 + "base_uri": "https://localhost:8080/" } }, "source": [ "d_frutas" ], - "execution_count": null, + "execution_count": 42, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ - "{'Apple': 0.4,\n", - " 'Apricot': 0.25,\n", - " 'Avocado': 0.35,\n", - " 'Banana': 0.3,\n", - " 'Blackberry': 0.55,\n", - " 'Blackcurrant': 0.7,\n", - " 'Blueberry': 0.45,\n", - " 'Cherry': 0.5,\n", - " 'Coconut': 0.75,\n", - " 'Fig': 0.6,\n", - " 'Grape': 0.65,\n", - " 'Kiwi': 0.2,\n", - " 'Lemon': 0.15,\n", - " 'Mango': 0.8,\n", - " 'Nectarine': 0.75,\n", - " 'Orange': 0.25,\n", - " 'Papaya': 0.3,\n", - " 'Passion Fruit': 0.45,\n", - " 'Peach': 0.55,\n", - " 'Pineapple': 0.55,\n", - " 'Plum': 0.6,\n", - " 'Raspberry': 0.4,\n", - " 'Strawberry': 0.5,\n", - " 'Watermelon': 0.45}" + "{'Apple': 0.63,\n", + " 'Apricot': 0.23,\n", + " 'Banana': 0.27,\n", + " 'Blackberry': 0.5,\n", + " 'Blackcurrant': 0.63,\n", + " 'Blueberry': 0.41,\n", + " 'Cherry': 0.45,\n", + " 'Coconut': 0.68,\n", + " 'Fig': 0.54,\n", + " 'Grape': 0.59,\n", + " 'Grapefruit': 0.9,\n", + " 'Kiwi': 0.18,\n", + " 'Lemon': 0.14,\n", + " 'Mango': 0.72,\n", + " 'Nectarine': 0.68,\n", + " 'Orange': 0.23,\n", + " 'Papaya': 0.27,\n", + " 'Passion Fruit': 0.41,\n", + " 'Peach': 0.5,\n", + " 'Pineapple': 
0.5,\n", + " 'Plum': 0.54,\n", + " 'Raspberry': 0.36,\n", + " 'Strawberry': 0.45,\n", + " 'Watermelon': 0.41}" ] }, "metadata": { "tags": [] }, - "execution_count": 2 + "execution_count": 42 } ] }, @@ -2171,38 +2255,36 @@ "cell_type": "code", "metadata": { "id": "tvHcQatANltL", - "outputId": "8feaf5b1-1db8-4391-8950-248ba8ab46c5", + "outputId": "6ca05107-f13c-4175-9d60-41a7615fd233", "colab": { - "base_uri": "https://localhost:8080/", - "height": 187 + "base_uri": "https://localhost:8080/" } }, "source": [ "d_frutas3 = {k: v for k, v in filter(lambda t: t[1] > 0.5, d_frutas.items())}\n", "d_frutas3" ], - "execution_count": null, + "execution_count": 43, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ - "{'Blackberry': 0.55,\n", - " 'Blackcurrant': 0.7,\n", - " 'Coconut': 0.75,\n", - " 'Fig': 0.6,\n", - " 'Grape': 0.65,\n", - " 'Mango': 0.8,\n", - " 'Nectarine': 0.75,\n", - " 'Peach': 0.55,\n", - " 'Pineapple': 0.55,\n", - " 'Plum': 0.6}" + "{'Apple': 0.63,\n", + " 'Blackcurrant': 0.63,\n", + " 'Coconut': 0.68,\n", + " 'Fig': 0.54,\n", + " 'Grape': 0.59,\n", + " 'Grapefruit': 0.9,\n", + " 'Mango': 0.72,\n", + " 'Nectarine': 0.68,\n", + " 'Plum': 0.54}" ] }, "metadata": { "tags": [] }, - "execution_count": 7 + "execution_count": 43 } ] }, @@ -2239,13 +2321,33 @@ { "cell_type": "code", "metadata": { - "id": "0BBWO9Zth_mc" + "id": "0BBWO9Zth_mc", + "outputId": "6f948288-be8e-4541-94b9-b067c17f13e4", + "colab": { + "base_uri": "https://localhost:8080/" + } }, "source": [ - "d_colaboradores= {'Gerentes': ['A', 'B', 'C'], 'Programadores': ['B', 'D', 'E', 'F', 'G'], 'Gerentes_Projeto': ['A', 'E']}" + "d_colaboradores= {'Gerentes': ['A', 'B', 'C'], 'Programadores': ['B', 'D', 'E', 'F', 'G'], 'Gerentes_Projeto': ['A', 'E']}\n", + "d_colaboradores" ], - "execution_count": null, - "outputs": [] + "execution_count": 45, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Gerentes': ['A', 'B', 'C'],\n", + " 
'Gerentes_Projeto': ['A', 'E'],\n", + " 'Programadores': ['B', 'D', 'E', 'F', 'G']}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 45 + } + ] }, { "cell_type": "markdown", @@ -2256,6 +2358,182 @@ "Como acessar o Gerente 'A'?" ] }, + { + "cell_type": "code", + "metadata": { + "id": "XH3BgvYfi01G", + "outputId": "d2fe69fa-5c0b-47b9-9413-e9c2019c19d1", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 129 + } + }, + "source": [ + "d_colaboradores{'Gerentes'[0]}" + ], + "execution_count": 46, + "outputs": [ + { + "output_type": "error", + "ename": "SyntaxError", + "evalue": "ignored", + "traceback": [ + "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m d_colaboradores{'Gerentes'[0]}\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "kCILrtFPjBTe", + "outputId": "4d8c8aca-341d-4528-8ba2-53a898683c49", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "d_colaboradores.get('Gerentes')\n" + ], + "execution_count": 50, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "['A', 'B', 'C']" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 50 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "emYGFLuJk65D", + "outputId": "8fe63e1f-8421-4d10-916e-58520b8d71fe", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + } + }, + "source": [ + "list_gerentes=d_colaboradores.get('Gerentes')\n", + "list_gerentes[0]" + ], + "execution_count": 53, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, + "text/plain": [ + "'A'" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 53 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "B8ZbnSYNlZgt", + "outputId": 
"06d1ea72-8a6d-49fd-9d1d-83cf8a080b24", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + } + }, + "source": [ + "d_colaboradores.get('Gerentes')[0]\n", + "\n" + ], + "execution_count": 57, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, + "text/plain": [ + "'A'" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 57 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4JnYRazemjq5" + }, + "source": [ + "retornar os cargos que o funcionario !A! ocupa\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "SX2h28bXj33b", + "outputId": "8be812f8-d86a-4a47-d439-55c229c216c9", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "d_colaboradores['Gerentes']" + ], + "execution_count": 51, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "['A', 'B', 'C']" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 51 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XCkq6tN9mqgn" + }, + "source": [ + "retornar do dicinario d_colaboradores somente o progarmador cujo nome seja 'E'" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BSXBKuQvm0y0" + }, + "source": [ + "quais são os colaboradores que \n", + "a)são ao mesmo tempo gerente e gerente de projeto\n", + "b) gerentes de projetos e programadores" + ] + }, { "cell_type": "markdown", "metadata": { @@ -2265,6 +2543,24 @@ "## Exercício 3\n", "Consulte a página [Python Data Types: Dictionary - Exercises, Practice, Solution](https://www.w3resource.com/python-exercises/dictionary/) para mais exercícios relacionados à dicionários." 
] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZWRUI3Q0m0GQ" + }, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eKwYMbxymyai" + }, + "source": [ + "" + ] } ] } \ No newline at end of file From 40a63768549d78be4d8544162889e06e0beae38c Mon Sep 17 00:00:00 2001 From: MariaJacobs70 <72224154+MariaJacobs70@users.noreply.github.com> Date: Wed, 7 Oct 2020 21:59:25 -0300 Subject: [PATCH 5/9] Criado usando o Colaboratory --- Notebooks/NB09_01__Functions_alterado.ipynb | 1549 +++++++++++++++++++ 1 file changed, 1549 insertions(+) create mode 100644 Notebooks/NB09_01__Functions_alterado.ipynb diff --git a/Notebooks/NB09_01__Functions_alterado.ipynb b/Notebooks/NB09_01__Functions_alterado.ipynb new file mode 100644 index 000000000..5859a10e7 --- /dev/null +++ b/Notebooks/NB09_01__Functions_alterado.ipynb @@ -0,0 +1,1549 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "NB09_01__Functions.ipynb", + "provenance": [], + "private_outputs": true, + "include_colab_link": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d_YndS20uqkK" + }, + "source": [ + "

FUNÇÕES

\n", + "\n", + "\n", + "\n", + "# **AGENDA**:\n", + "\n", + "> Veja o **índice** dos itens que serão abordados neste capítulo.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e0UKAZQvJ_c2" + }, + "source": [ + "___\n", + "# **INTRODUÇÃO ÀS FUNÇÕES**\n", + "> Funções são uma sequência de comandos para executar uma tarefa.\n", + ">> Atenção ao que recomenda o PEP8 sobre como escrever funções." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Z4-gPTjZUP50" + }, + "source": [ + "# Não executar este codigo!\n", + "def funcao(arg1, arg2, ..., argN):\n", + " " + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "etxNlyRYo39A" + }, + "source": [ + "def show_hello_world():\n", + " print('Hello World!')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "G6I9PFvZpBgR" + }, + "source": [ + "type(show_hello_world)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "_meNdNygpIbv" + }, + "source": [ + "show_hello_world()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6zfLd8HwpPpg" + }, + "source": [ + "___\n", + "# **DOCUMENTAR FUNÇÕES COM COMMENTS/DOCSTRING**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "3yzgBxtNpRi_" + }, + "source": [ + "def show_hello_world():\n", + " '''\n", + " Esta função faz um cumprimento: 'Hello World!'\n", + " Inputs: \n", + " param1: djdjdjdjdj\n", + " param2: fjrjirjjirjir\n", + " '''\n", + " print('Hello World!')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "0rBaxjpmpbm1" + }, + "source": [ + "show_hello_world()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "6ThOwDQp4TfR" + }, + "source": [ + "# Se quisermos ver a documentação da função, basta 
invocar o statement __doc__ da seguinte forma:\n", + "show_hello_world.__doc__" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9YZ2afpNA4st" + }, + "source": [ + "OU..." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "uSnwA4BVA5_t" + }, + "source": [ + "help(show_hello_world)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "whbnnMA5p1Jw" + }, + "source": [ + "___\n", + "# **FUNÇÕES COM ARGUMENTOS**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "O3bSjLA_qTTc" + }, + "source": [ + "Definir a função mostra_nome com dois argumentos: s_primeiro_nome e s_ultimo_nome:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "9jWyCCPPp4yS" + }, + "source": [ + "def mostra_nome(s_primeiro_nome, s_ultimo_nome):\n", + " print(f'Olá, meu nome é {s_primeiro_nome} {s_ultimo_nome}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "VOB3Ip63qIzr" + }, + "source": [ + "mostra_nome('Nelio', 'Machado')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Oi0c_GuesfcL" + }, + "source": [ + "Neste caso, o primeiro argumento da função (s_primeiro_nome) vai receber o valor 'Nelio' e o segundo argumento da função (s_ultimo_nome) vai receber 'Machado'." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qkMblpnLsITO" + }, + "source": [ + "No entanto, também podemos invocar a função da seguinte forma:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "TTli7e6xsMCo" + }, + "source": [ + "mostra_nome(s_ultimo_nome = 'Machado', s_primeiro_nome = 'Nelio')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rmatMmhTsaVc" + }, + "source": [ + "Observe que o resultado é o mesmo. 
No entanto, desta forma, estamos dizendo o valor específico que cada parâmetro irá receber." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PnNYrgJ6VQo9" + }, + "source": [ + "## PEP8 + Annotations = Códigos mais fáceis de entender e atualizar\n", + "\n", + "> Observe abaixo como combinamos PEP8 + Annotations para tornar o código Python ainda mais detalhado. O objetivo de _Annotations_ é deixar o código mais claro, sem mudar o comportamento da função. No exemplo abaixo, os argumentos da função s_primeiro_nome e s_ultimo_nome são argumentos do tipo _str_ e a função, que apenas imprime a mensagem, não retorna valor (_None_)." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "aU2Sob37VVmi" + }, + "source": [ + "def mostra_nome2(s_primeiro_nome: str, s_ultimo_nome: str) -> None:\n", + " print(f'Olá, meu nome é {s_primeiro_nome} {s_ultimo_nome}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "iIvqS73mXNam" + }, + "source": [ + "mostra_nome2(s_ultimo_nome = 'Machado', s_primeiro_nome = 'Nelio')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rSnrtFNtXrbN" + }, + "source": [ + "# **\\*args**\n", + "> \\*args permite que você passe mais argumentos do que o número de argumentos formais que você definiu anteriormente." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aT0_PeuEvXiP" + }, + "source": [ + "## Exemplo 1\n", + "> Considere a função (simples) para imprimir o nome completo de um cliente." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Npbi_Hy0bUec" + }, + "source": [ + "# Definimos a função mostra_nome3 da seguinte forma:\n", + "def mostra_nome3(*args):\n", + " nome = ' '.join(args)\n", + " \n", + " print(f'Olá, meu nome é {nome}.')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "dFzM0gA3_9za" + }, + "source": [ + "mostra_nome3('Nelio', 'Machado')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "370bpgaSvDbJ" + }, + "source": [ + "E agora, a função recebe qualquer quantidade de parâmetros." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "4kYcu6PEX-Nz" + }, + "source": [ + "mostra_nome3('Pedro', 'de', 'Alcantara', 'Francisco', 'Antonio', 'Joao', 'Carlos', 'Xavier', 'de', 'Paula', 'Miguel', 'Rafael', 'Joaquim', 'Jose', 'Gonzaga', 'Pascoal', 'Cipriano', 'Serafim')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KMgngPmFimxb" + }, + "source": [ + "Observe que, desta forma, pouco importa a quantidade de parâmetros que passamos à função." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Y9pDa6ZRjo0U" + }, + "source": [ + "## Exemplo 2\n", + "* Suponha que estamos interessados em desenvolver uma função que multiplica dois números (passados como parâmetros)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1A-vhsHxv1YE" + }, + "source": [ + "Antes de vermos a solução usando \\*args, vamos ver como seria nossa função se \\*args não existisse." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cCDwruF8j5i5" + }, + "source": [ + "### Forma \"Normal\"" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "_R03BiwLjtwB" + }, + "source": [ + "# Definição da função\n", + "def multiplicar_numeros(x1, x2):\n", + " '''\n", + " Objetivo: Esta função multiplica DOIS números passados como argumentos.\n", + " Autor: Nelio Machado\n", + " Data: 04/10/2020\n", + " '''\n", + " return x1 * x2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "0eVm1Qj9kDtd" + }, + "source": [ + "print(multiplicar_numeros(3, 4))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4h9Nhkickf_8" + }, + "source": [ + "### Usando \\*args" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "9Kf89meJkjw8" + }, + "source": [ + "def multiplicar_numeros2(*args):\n", + " '''\n", + " Objetivo: Esta função multiplica vários números passados como argumentos.\n", + " Autor: Nelio Machado\n", + " Data: 04/10/2020\n", + " '''\n", + " print(args)\n", + " print(type(args))\n", + " x = 1\n", + " for N in args:\n", + " x *= N\n", + " \n", + " return x" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "ZuIzwitWk7by" + }, + "source": [ + "print(multiplicar_numeros2(1, 2, 3, 4, 5))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U5kyPu792gMN" + }, + "source": [ + "Eu também posso fazer da seguinte forma:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "oc2NJmJf2s7X" + }, + "source": [ + "args= (1, 2, 3, 4, 5)\n", + "print(multiplicar_numeros2(*args))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "38jVie_IjMXI" + }, + "source": [ + "# \\**kwargs\n", + "\n", + "* \\**kwargs é usado para passar um dicionário de comprimento variável 
para uma função.\n", + "* Argumento do tipo {chave: valor};\n", + "\n", + "* Para exemplificar o uso de \\**kwargs, vou usar parte do dicionário d_frutas que definimos na seção [Dictionaries](Dictionaries.ipynb). Em caso de dúvida, volte àquele capítulo para relembrar os principais conceitos." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "yAntQ724nMbv" + }, + "source": [ + "# Definindo a função para receber parâmetros em forma de dicionário:\n", + "def imprime_frutas(**kwargs):\n", + " '''\n", + " Objetivo: Esta função imprime as frutas contidas em kwargs.\n", + " Autor: Nelio Machado\n", + " Data: 04/10/2020\n", + " '''\n", + " for key, value in kwargs.items():\n", + " print(f'O valor de {key} é {value}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jpmSk9mfxww3" + }, + "source": [ + "Atenção à forma como os itens são passados à função!" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "88-1lStInaVs" + }, + "source": [ + "imprime_frutas(Avocado = 0.35, Apple = 0.4, Apricot = 0.25, Banana = 0.30)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-jb_kkLiyQt8" + }, + "source": [ + "No entanto, também posso passar um dicionário definido da forma usual, da seguinte maneira:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "JZJNiLz7wgCy" + }, + "source": [ + "d_frutas = {'Apple': 0.4, 'Avocado': 0.3, 'Orange': 0.5, 'Lemon': 0.25}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "eUCum4JPEcxD" + }, + "source": [ + "imprime_frutas(**d_frutas)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iK8-e7a1sXmn" + }, + "source": [ + "___\n", + "# **Python return**\n", + "> Uma função Python pode ou não retornar um valor." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "HS0dGA55siWw" + }, + "source": [ + "def par_ou_impar(i_numero1, i_numero2):\n", + " '''\n", + " Esta função somente avalia se a soma de dois números é par ou impar. \n", + " A função retorna odd ou even.\n", + " '''\n", + " i_soma = i_numero1+i_numero2\n", + " i_modulo = i_soma % 2\n", + " print(f'A soma é {i_soma}')\n", + " if i_modulo > 0:\n", + " return 'Odd'\n", + " else:\n", + " return 'Even' " + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "mZTG2tDJuIZQ" + }, + "source": [ + "i_numero1 = int(input('Por favor, informe o primeiro número: '))\n", + "i_numero2 = int(input('Por favor, informe o segundo número.: '))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "7p_9pq3Du18a" + }, + "source": [ + "type(i_numero1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "4oO7aAjcvCAe" + }, + "source": [ + "type(i_numero2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Br7yT8UHuKYY" + }, + "source": [ + "s_resultado = par_ou_impar(i_numero1, i_numero2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "601QnggJuhf-" + }, + "source": [ + "print(f'O resultado é {s_resultado}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "t6HNf9j9yKcT" + }, + "source": [ + "Mostra o valor de i_modulo ou i_soma:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Yu8RsyDAyXne" + }, + "source": [ + "i_modulo" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nx3twrLRyaeJ" + }, + "source": [ + "Python reporta que i_modulo não existe.\n", + "Está correta esta informação?" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "imkyRO4kyvgV" + }, + "source": [ + "Considere o exemplo a seguir:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "kwRiXDA5y19h" + }, + "source": [ + "i_modulo = 0\n", + "\n", + "def par_ou_impar_v2(i_numero1, i_numero2):\n", + " '''\n", + " Esta função somente avalia se a soma de dois números é par ou impar. \n", + " A função retorna odd ou even.\n", + " '''\n", + " i_soma = i_numero1+i_numero2\n", + " i_modulo = i_soma % 2\n", + " print(f'A soma é {i_soma}')\n", + " if i_modulo > 0:\n", + " return 'Odd'\n", + " else:\n", + " return 'Even' " + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "GYxLSGQLy_Ai" + }, + "source": [ + "i_numero1 = int(input('Por favor, informe o primeiro número: '))\n", + "i_numero2 = int(input('Por favor, informe o segundo número.: '))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "NMtv99fjzHGs" + }, + "source": [ + "s_resultado = par_ou_impar_v2(i_numero1, i_numero2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "qjOHnYDVzNGK" + }, + "source": [ + "print(f'O resultado é {s_resultado}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pPTecxRfzQUc" + }, + "source": [ + "Agora, vamos checar o valor de i_modulo..." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "jkQb2mQzzTEo" + }, + "source": [ + "i_modulo" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oOlyGxBAzjE3" + }, + "source": [ + "Porque agora o Python reconhece a variável i_modulo?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dceSkt9Z0BZh" + }, + "source": [ + "___\n", + "# **ESCOPO DE VARIÁVEIS: LOCAL & GLOBAL**\n", + "* **Local** - Variável declarada dentro da função. 
Em outras palavras, é uma variável de uso local da função.\n", + "\n", + "* **Global** - Variável declarada fora da função. Neste caso, a variável é visível a todo o programa. Entretanto, por padrão, não se pode reatribuir o valor da variável dentro da função (uma atribuição cria uma nova variável local). Caso queira alterar o valor da variável dentro da função, então é necessário declarar a variável usando a palavra reservada 'global'." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0tIjI9GScPxu" + }, + "source": [ + "## Exemplo 1" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "QRojHHJ20iTY" + }, + "source": [ + "def exemplo1():\n", + " i_valor = 20\n", + " i_valor += 1\n", + " print(i_valor)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "RdhElmTs0y1c" + }, + "source": [ + "exemplo1()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Tytq7PnH08pz" + }, + "source": [ + "O escopo da variável 'i_valor' é local, ou seja, de uso restrito à função. " + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "299AK0PA1lIg" + }, + "source": [ + "i_valor" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gGP4cx17y8EZ" + }, + "source": [ + "Portanto, o erro acima faz sentido, pois a variável i_valor é restrita à função. Ou seja, fora da função o Python não conhece esta variável." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KTV_6Gzxfvpc" + }, + "source": [ + "## Exemplo 2" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "zyi9AyJwfxTm" + }, + "source": [ + "i_valor= 100\n", + "\n", + "def exemplo2():\n", + " i_valor = 20\n", + " i_valor += 1\n", + " print(i_valor)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "iEWrboG6gBSs" + }, + "source": [ + "exemplo2()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JPvT0BHG-vxE" + }, + "source": [ + "Isso é um tanto estranho! Definimos, fora da função, i_valor= 100 e, dentro da função, redefinimos i_valor= 20. Entretanto, como vimos, exemplo2() retorna 21 como resultado." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "N_t8tIDC-149" + }, + "source": [ + "Agora, a seguir, fora da função, pedimos para ver o valor de i_valor e temos, como resposta, o valor 100." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "I46Bn4FlgJLu" + }, + "source": [ + "i_valor" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IQlP5nbngL6E" + }, + "source": [ + "Saberia nos explicar o que está acontecendo?" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "h8PHd6rLgtwK" + }, + "source": [ + "## Exemplo 3" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "qB7_zPQVgvVT" + }, + "source": [ + "i_valor = 100\n", + "\n", + "def exemplo3():\n", + " global i_valor\n", + " i_valor = 20\n", + " i_valor += 1\n", + " print(i_valor)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "2KgQSbYCg8Eq" + }, + "source": [ + "exemplo3()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Y7yWoojrg_9Z" + }, + "source": [ + "i_valor" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cGlmbIJGzWG6" + }, + "source": [ + "Saberia explicar o que acontece neste exemplo?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "X8qFfIoxhFOp" + }, + "source": [ + "## Exemplo 4" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ZM-yTLuO1bFh" + }, + "source": [ + "i_valor = 20\n", + "\n", + "def exemplo4():\n", + " i_valor += 1\n", + " print(i_valor)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "oLvfPO8w1zwL" + }, + "source": [ + "exemplo4()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2V7QzpZp2QcM" + }, + "source": [ + "Qual a razão deste erro?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "w9qI8kln1_C7" + }, + "source": [ + "i_valor" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AQFFGqLI1FWn" + }, + "source": [ + "___\n", + "# **ARGUMENTOS DEFAULT**\n", + "> Considere o exemplo a seguir: toda vez que vai ao supermercado compra 1 pack de leite (contendo 4 garrafas) e 1 garrafão de água de 5L. 
Portanto, de forma simples, podemos definir nossa função da seguinte forma:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "HbcSTiBI4nOj" + }, + "source": [ + "# Define a função para receber os parâmetros arroz, feijao, leite e agua.\n", + "def lista_de_compras(arroz, feijao, leite=1, agua=1):\n", + " '''\n", + " Documentação da função: objetivos, autor e data.\n", + " '''\n", + " print('Lista de Compras:')\n", + " print(f'Quantidade de arroz.: {arroz} quilos.')\n", + " print(f'Quantidade de feijão: {feijao} quilos.')\n", + " print(f'Quantidade de leite.: {leite} pack com 4.')\n", + " print(f'Quantidade de água..: {agua} garrafa de 5 litros.')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "vwZnDgoq5pgB" + }, + "source": [ + "lista_de_compras(5, 3)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "l7bY5BSO7eJF" + }, + "source": [ + "Como leite=1 e agua=1 são valores default, não precisamos passar esses parâmetros, pois já informamos ao Python o valor default. 
No entanto, se numa determinada semana precisarmos de 2 pack's de leite, ao invés de 1, devemos informar ao Python o novo valor:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "YY4OrFuH7yXi" + }, + "source": [ + "lista_de_compras(5, 3, 2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-nfrZAvN73YT" + }, + "source": [ + "Da mesma forma, se numa outra semana precisarmos de 2 garrafões de água ao invés de 1, informamos ao Python da seguinte forma:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Vpoh6TdM7_xb" + }, + "source": [ + "lista_de_compras(5, 3, 2, 2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "q3qZn9FuVQly" + }, + "source": [ + "___\n", + "# **map()**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Dav8k0JYWi4B" + }, + "source": [ + "## Exemplo 1\n", + "> Suponha que queremos o quadrado de cada número passado à uma função." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "R6NC0i2OVktM" + }, + "source": [ + "l_numeros= [0, 1, 2, 3, 4, 5]" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "AVjYlN44Vw2k" + }, + "source": [ + "def quadrado_do_numero(i_numero):\n", + " return i_numero**2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "i_4CHiehV7lD" + }, + "source": [ + "list(map(quadrado_do_numero, l_numeros))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5tq8QDSPWNf6" + }, + "source": [ + "OU..." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ZAfkybybWOcG" + }, + "source": [ + "for i in map(quadrado_do_numero, l_numeros):\n", + " print(i)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c01V5CEzWlGF" + }, + "source": [ + "## Exemplo 2\n", + "> Substituir todos os valores True da lista abaixo por 1 e False por 0." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "qH1ackDZWvKp" + }, + "source": [ + "import random\n", + "\n", + "l_dados = []\n", + "for i in range(50):\n", + " random.seed(i)\n", + " l_dados.append(random.choice([True, False]))\n", + " \n", + "l_dados" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Dt2UKC-WXsxr" + }, + "source": [ + "def substituir_true(s_String):\n", + " if s_String == True:\n", + " return 1\n", + " else:\n", + " return 0" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "BIIkPuDEXaM0" + }, + "source": [ + "list(map(substituir_true, l_dados))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TzkLIH1gYpFQ" + }, + "source": [ + "___\n", + "# **filter()**\n", + "* Filtra elementos com base em condições." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cjU8YznfZai1" + }, + "source": [ + "Suponha que agora eu quero filtrar os itens True da lista l_dados." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "a3SeaKJgZlAZ" + }, + "source": [ + "def filtrar_true(item):\n", + " if item == True:\n", + " return True\n", + " else:\n", + " return False" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "1Z1APDQtZyXs" + }, + "source": [ + "list(filter(filtrar_true, l_dados))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xPpFqVUnKEH7" + }, + "source": [ + "___\n", + "# **EXERCÍCIOS**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RDgCRPRs0W6C" + }, + "source": [ + "## Exercício 1\n", + "Construa uma função para retornar o dia da semana a partir de um número, sendo:\n", + "\n", + "* 1 - Dom\n", + "* 2 - Seg\n", + "* 3 - Ter\n", + "* 4 - Qua\n", + "* 5 - Qui\n", + "* 6 - Sex\n", + "* 7 - Sab" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "H17JO6sLOrG7" + }, + "source": [ + "### Minha solução" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "wX_7XDyB0XSy" + }, + "source": [ + "def dia_da_semana(dia):\n", + " d_palavra = {1: 'Domingo',\n", + " 2: 'Segunda',\n", + " 3: 'Terça',\n", + " 4: 'Quarta',\n", + " 5: 'Quinta',\n", + " 6: 'Sexta',\n", + " 7: 'Sábado'}\n", + " return d_palavra.get(dia, \"Dia da semana inválido. Informe um número de 1 a 7\")" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "39toyCRU1Q5T" + }, + "source": [ + "dia_da_semana(1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "wt5hQq__1UEd" + }, + "source": [ + "dia_da_semana(0)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "N53NOsZjOv9m" + }, + "source": [ + "## Exercício 2\n", + "* Desenvolver uma função que retorna True se s_palavra pertence a uma string e False caso contrário." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "7j-HHsxFrX5t" + }, + "source": [ + "def palavra_está_string (s_palavra, s_string):\n", + " if s_palavra in s_string:\n", + " return True\n", + " else:\n", + " return False\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "EOTAE-oMrYJW" + }, + "source": [ + "s_string = 'O amor é o fogo que arde sem se ver. É ferida que dói e não se sente. É um contentamento descontente. É dor que desatina sem doer'\n", + "s_string" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "tfzOT6w0thZg" + }, + "source": [ + "s_palavra = 'fogo'\n", + "palavra_está_string (s_palavra, s_string)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "xxunBr3ttnji" + }, + "source": [ + "" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vrBZ_68-PBWl" + }, + "source": [ + "### Minha solução:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "m4Pi4S8hPC_u" + }, + "source": [ + "def check_palavra(s_frase, s_palavra):\n", + " if s_palavra in s_frase:\n", + " return True\n", + " else:\n", + " return False" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NJeqwxDjPxub" + }, + "source": [ + "A frase abaixo foi extraída de [+ Bíblia + Camões + Legião Urbana - (Guerra) = Monte Castelo](http://compondoletras.blogspot.com/2013/11/biblia-camoes-legiao-urbana-guerra.html)" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Dj_n_beIPRBN" + }, + "source": [ + "s_frase = 'O amor é o fogo que arde sem se ver. É ferida que dói e não se sente. É um contentamento descontente. 
É dor que desatina sem doer'\n", + "s_frase" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "s40FJ9iCPPY0" + }, + "source": [ + "s_palavra = 'fogo'" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tzc2eaM7QUFE" + }, + "source": [ + "A palavra s_palavra está em s_frase?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "2tlravrMQXn2" + }, + "source": [ + "check_palavra(s_frase, s_palavra)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "XFBVXsW_rVG2" + }, + "source": [ + "" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pMx9E0xMu1lc" + }, + "source": [ + "## Exercício 3\n", + "Para mais exercícios envolvendo funções, consulte [Python functions - Exercises, Practice, Solution](https://www.w3resource.com/python-exercises/python-functions-exercises.php)." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Mw6Wg5hFvFMR" + }, + "source": [ + "" + ], + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file From 3c6e45809d6d65537cd280ff84c1557dc497c7bd Mon Sep 17 00:00:00 2001 From: MariaJacobs70 <72224154+MariaJacobs70@users.noreply.github.com> Date: Wed, 7 Oct 2020 22:56:24 -0300 Subject: [PATCH 6/9] Criado usando o Colaboratory --- Notebooks/NB07__Dictionaries_alterado.ipynb | 1266 +++++++++++++------ 1 file changed, 874 insertions(+), 392 deletions(-) diff --git a/Notebooks/NB07__Dictionaries_alterado.ipynb b/Notebooks/NB07__Dictionaries_alterado.ipynb index 96c177534..49d128332 100644 --- a/Notebooks/NB07__Dictionaries_alterado.ipynb +++ b/Notebooks/NB07__Dictionaries_alterado.ipynb @@ -110,7 +110,7 @@ "cell_type": "code", "metadata": { "id": "FxuJ7Awd8f5a", - "outputId": "2b09e013-655c-492e-8902-c0706b3f7b4d", + "outputId": "aa77d55b-989e-4472-c3cb-6865e05fb0f3", "colab": { "base_uri": "https://localhost:8080/" } @@ -118,10 +118,11 @@ "source": [ "# Definição da lista l_frutas:\n", "l_frutas = ['Avocado', 'Apple', 'Apricot', 'Banana', 'Blackcurrant', 'Blackberry', 'Blueberry', 'Cherry', 'Coconut', 'Fig', 'Grape', 'Kiwi', 'Lemon', 'Mango', 'Nectarine', \n", - " 'Orange', 'Papaya','Passion Fruit','Peach','Pineapple','Plum','Raspberry','Strawberry','Watermelon']\n", + " 'Orange', 'Papaya', 'Passion Fruit', 'Peach', 'Pineapple', 'Plum', 'Raspberry', 'Strawberry', 'Watermelon']\n", + "\n", "print(l_frutas)" ], - "execution_count": 2, + "execution_count": null, "outputs": [ { "output_type": "stream", @@ -135,14 +136,55 @@ { "cell_type": "code", "metadata": { - "id": "jJyxuMQc9Ewy" + "id": "jJyxuMQc9Ewy", + "outputId": "fc4b18d6-b85b-4ad8-e41d-c53f6314a31d", + "colab": { + "base_uri": "https://localhost:8080/" + } }, "source": [ "# Definição da lista l_precos_frutas:\n", - "l_precos_frutas = [0.35, 0.40, 0.25, 0.30, 0.70, 0.55, 0.45, 0.50, 0.75, 0.60, 0.65, 0.20, 0.15, 0.80, 0.75, 0.25, 
0.30,0.45,0.55,0.55,0.60,0.40,0.50,0.45]" + "l_precos_frutas = [0.35, 0.40, 0.25, 0.30, 0.70, 0.55, 0.45, 0.50, 0.75, 0.60, 0.65, 0.20, 0.15, 0.80, 0.75, 0.25, 0.30,0.45,0.55,0.55,0.60,0.40,0.50,0.45]\n", + "l_precos_frutas" ], - "execution_count": 3, - "outputs": [] + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "[0.35,\n", + " 0.4,\n", + " 0.25,\n", + " 0.3,\n", + " 0.7,\n", + " 0.55,\n", + " 0.45,\n", + " 0.5,\n", + " 0.75,\n", + " 0.6,\n", + " 0.65,\n", + " 0.2,\n", + " 0.15,\n", + " 0.8,\n", + " 0.75,\n", + " 0.25,\n", + " 0.3,\n", + " 0.45,\n", + " 0.55,\n", + " 0.55,\n", + " 0.6,\n", + " 0.4,\n", + " 0.5,\n", + " 0.45]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 4 + } + ] }, { "cell_type": "markdown", @@ -157,16 +199,17 @@ "cell_type": "code", "metadata": { "id": "qT_4sYxA9dyn", - "outputId": "e8276718-3d1c-4dc2-c4ee-0ac975b7ff18", + "outputId": "1952df07-193f-4827-dc41-5f9068b96436", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ + "# Definir o dicionário d_frutas: estrutura do tipo {chave1: valor1, chave2: valor2, ..., chaveN: valorN} --> JSON\n", "d_frutas = dict(zip(l_frutas, l_precos_frutas))\n", "d_frutas" ], - "execution_count": 4, + "execution_count": null, "outputs": [ { "output_type": "execute_result", @@ -201,7 +244,7 @@ "metadata": { "tags": [] }, - "execution_count": 4 + "execution_count": 5 } ] }, @@ -263,7 +306,7 @@ "cell_type": "code", "metadata": { "id": "N_2J839X4lps", - "outputId": "56ab2a1f-4c41-4335-9697-38bce1eb51f6", + "outputId": "ddfb58dd-dcd8-4a57-b886-6eed048a6b2f", "colab": { "base_uri": "https://localhost:8080/" } @@ -272,7 +315,7 @@ "d_dia_semana = {'Seg': 'Segunda', 'Ter': 'Terça', 'Qua': 'Quarta', 'Qui': 'Quinta', 'Sex': 'Sexta', 'Sab': 'Sabado', 'Dom': 'Domingo'}\n", "d_dia_semana" ], - "execution_count": 5, + "execution_count": null, "outputs": [ { "output_type": "execute_result", @@ -290,7 +333,7 @@ "metadata": { 
"tags": [] }, - "execution_count": 5 + "execution_count": 6 } ] }, @@ -308,16 +351,16 @@ "cell_type": "code", "metadata": { "id": "eHuvY7BWQKhQ", - "outputId": "649df9a0-587f-4e6c-a19a-4654e919bd93", + "outputId": "aabce6c1-1914-435e-b21b-4f845f82d53f", "colab": { "base_uri": "https://localhost:8080/", "height": 35 } }, "source": [ - "d_dia_semana['Seg']" + "d_dia_semana['Seg'] # A chave aqui é 'Seg'" ], - "execution_count": 6, + "execution_count": null, "outputs": [ { "output_type": "execute_result", @@ -332,7 +375,7 @@ "metadata": { "tags": [] }, - "execution_count": 6 + "execution_count": 7 } ] }, @@ -359,7 +402,7 @@ "cell_type": "code", "metadata": { "id": "2iPWXPBLfOlr", - "outputId": "550e6ef2-9a6a-41b1-8b36-95c3e2e5efbe", + "outputId": "dfe22769-059e-4e95-8dad-85c1654530de", "colab": { "base_uri": "https://localhost:8080/" } @@ -368,7 +411,7 @@ "d_paises = {} # Também podemos usar a função dict() para criar o dicionário vazio da seguinte forma: d_paises= dict()\n", "d_paises" ], - "execution_count": 7, + "execution_count": null, "outputs": [ { "output_type": "execute_result", @@ -380,7 +423,7 @@ "metadata": { "tags": [] }, - "execution_count": 7 + "execution_count": 8 } ] }, @@ -399,7 +442,7 @@ "cell_type": "code", "metadata": { "id": "voPYpGIGff3o", - "outputId": "d0748385-60a7-4b87-b7e2-794fbb8e9abe", + "outputId": "d28c315e-b0a3-46e8-e7e8-4392f5525e94", "colab": { "base_uri": "https://localhost:8080/" } @@ -407,7 +450,7 @@ "source": [ "type(d_paises)" ], - "execution_count": 8, + "execution_count": null, "outputs": [ { "output_type": "execute_result", @@ -419,7 +462,7 @@ "metadata": { "tags": [] }, - "execution_count": 8 + "execution_count": 9 } ] }, @@ -446,7 +489,7 @@ "cell_type": "code", "metadata": { "id": "EXZ7eEZofnza", - "outputId": "fcbde537-da11-4573-ed38-bd4d48a96f60", + "outputId": "1716826d-e92b-4d21-a56e-1fa6b1df6663", "colab": { "base_uri": "https://localhost:8080/" } @@ -455,19 +498,19 @@ "d_paises[1] = 'Italy'\n", "d_paises" ], - 
"execution_count": 12, + "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ - "{1: 'Italy', 2: 'Denmark', 3: 'Brazil'}" + "{1: 'Italy'}" ] }, "metadata": { "tags": [] }, - "execution_count": 12 + "execution_count": 10 } ] }, @@ -484,7 +527,7 @@ "cell_type": "code", "metadata": { "id": "GAXSzSiufv1u", - "outputId": "ba60506d-647b-4d40-b316-dc2c693c6f8c", + "outputId": "0a3ab152-c82b-41d4-e6b8-ec4adf30493f", "colab": { "base_uri": "https://localhost:8080/" } @@ -493,19 +536,19 @@ "d_paises[2] = 'Denmark'\n", "d_paises" ], - "execution_count": 13, + "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ - "{1: 'Italy', 2: 'Denmark', 3: 'Brazil'}" + "{1: 'Italy', 2: 'Denmark'}" ] }, "metadata": { "tags": [] }, - "execution_count": 13 + "execution_count": 11 } ] }, @@ -522,7 +565,7 @@ "cell_type": "code", "metadata": { "id": "FN7km8C9gAjM", - "outputId": "9927805c-54bf-4a7b-8e1e-134353f3ff40", + "outputId": "fabb845b-cdff-423c-ef51-2d1f6cd2ee30", "colab": { "base_uri": "https://localhost:8080/" } @@ -531,7 +574,7 @@ "d_paises[3]= 'Brazil'\n", "d_paises" ], - "execution_count": 14, + "execution_count": null, "outputs": [ { "output_type": "execute_result", @@ -543,7 +586,7 @@ "metadata": { "tags": [] }, - "execution_count": 14 + "execution_count": 12 } ] }, @@ -570,7 +613,7 @@ "cell_type": "code", "metadata": { "id": "Rr6DtJnDgU5I", - "outputId": "763dc0a4-de82-428c-d9d8-940412330956", + "outputId": "9d79ee40-e0a6-4f47-dfc0-fe0010102b73", "colab": { "base_uri": "https://localhost:8080/" } @@ -580,7 +623,7 @@ "d_paises[3]= 'France'\n", "d_paises" ], - "execution_count": 15, + "execution_count": null, "outputs": [ { "output_type": "execute_result", @@ -592,7 +635,7 @@ "metadata": { "tags": [] }, - "execution_count": 15 + "execution_count": 13 } ] }, @@ -617,20 +660,47 @@ "# **OBTER KEYS DO DICIONÁRIO**" ] }, + { + "cell_type": "code", + "metadata": { + "id": "FQtAHjJdb0xK", + 
"outputId": "db5582e2-d3cf-47b6-fa62-c503bca4c05b", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "d_paises" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{1: 'Italy', 2: 'Denmark', 3: 'France'}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 14 + } + ] + }, { "cell_type": "code", "metadata": { "id": "ALwbHwi4iwky", - "outputId": "8aa9b44e-313b-42a6-e885-523185c0f4c5", + "outputId": "ddc7b4c8-5c59-44a3-8ae7-fdc21524b043", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ - "d_paises.keys()\n", - "d_paises." + "d_paises.keys()" ], - "execution_count": 16, + "execution_count": null, "outputs": [ { "output_type": "execute_result", @@ -642,7 +712,7 @@ "metadata": { "tags": [] }, - "execution_count": 16 + "execution_count": 15 } ] }, @@ -660,14 +730,13 @@ "cell_type": "code", "metadata": { "id": "cp0PPtl3jEKo", - "outputId": "c7b8739a-caa9-4e58-e6d3-0f86ccd2d950", + "outputId": "68a71557-c44a-4a0f-89c6-ffd9db6421b3", "colab": { - "base_uri": "https://localhost:8080/", - "height": 34 + "base_uri": "https://localhost:8080/" } }, "source": [ - "d_paises.values()" + "d_paises.items()" ], "execution_count": null, "outputs": [ @@ -675,13 +744,13 @@ "output_type": "execute_result", "data": { "text/plain": [ - "dict_values(['Italy', 'Denmark', 'France'])" + "dict_items([(1, 'Italy'), (2, 'Denmark'), (3, 'France')])" ] }, "metadata": { "tags": [] }, - "execution_count": 11 + "execution_count": 16 } ] }, @@ -699,27 +768,27 @@ "cell_type": "code", "metadata": { "id": "LraTwXjdjG3m", - "outputId": "f7e69608-c6bf-45c2-a08a-f7e76e1c4b2f", + "outputId": "b3d6d55e-20ad-4f88-a783-9ba1c4fd8654", "colab": { - "base_uri": "https://localhost:8080/" + "base_uri": "https://localhost:8080/", + "height": 162 } }, "source": [ "d_paises.items()" ], - "execution_count": 17, + "execution_count": null, "outputs": [ { - "output_type": "execute_result", - "data": { 
- "text/plain": [ - "dict_items([(1, 'Italy'), (2, 'Denmark'), (3, 'France')])" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 17 + "output_type": "error", + "ename": "NameError", + "evalue": "ignored", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0md_Paises\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mitems\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mNameError\u001b[0m: name 'd_Paises' is not defined" + ] } ] }, @@ -747,23 +816,20 @@ "cell_type": "code", "metadata": { "id": "FUfTjqktjW60", - "outputId": "acf5b8d0-7b30-4f3d-87ef-a1c9d259652b", + "outputId": "678ab629-6cff-4fe1-e03f-d90709a98f26", "colab": { "base_uri": "https://localhost:8080/", - "height": 35 + "height": 34 } }, "source": [ "d_paises.get(1)" ], - "execution_count": 18, + "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { - "application/vnd.google.colaboratory.intrinsic+json": { - "type": "string" - }, "text/plain": [ "'Italy'" ] @@ -771,7 +837,7 @@ "metadata": { "tags": [] }, - "execution_count": 18 + "execution_count": 11 } ] }, @@ -790,16 +856,17 @@ "cell_type": "code", "metadata": { "id": "XL17EmvMkkky", - "outputId": "9aea7f6c-bc39-4d39-bbf7-78c2ff4d7c8c", + "outputId": "65846bc2-87a2-42cf-eb17-e3fccb00c9a4", "colab": { - "base_uri": "https://localhost:8080/" + "base_uri": "https://localhost:8080/", + "height": 35 } }, "source": [ "d_paises2 = d_paises.copy()\n", "d_paises2" ], - "execution_count": 19, + "execution_count": null, "outputs": [ { "output_type": "execute_result", @@ -811,7 +878,7 @@ "metadata": { "tags": [] }, - "execution_count": 19 + "execution_count": 28 } ] }, @@ -834,22 +901,23 @@ "source": [ 
"d_paises.clear()" ], - "execution_count": 20, + "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "ro_42gzDjsdV", - "outputId": "355082b3-19d2-4d8b-8347-c2a6b08d6487", + "outputId": "a2c2a25b-40ef-4842-f2f7-3ac85404d195", "colab": { - "base_uri": "https://localhost:8080/" + "base_uri": "https://localhost:8080/", + "height": 35 } }, "source": [ "d_paises" ], - "execution_count": 21, + "execution_count": null, "outputs": [ { "output_type": "execute_result", @@ -861,7 +929,7 @@ "metadata": { "tags": [] }, - "execution_count": 21 + "execution_count": 13 } ] }, @@ -976,7 +1044,7 @@ " 'Strawberry': 0.50,\n", " 'Watermelon': 0.45}" ], - "execution_count": 22, + "execution_count": null, "outputs": [] }, { @@ -992,7 +1060,7 @@ "cell_type": "code", "metadata": { "id": "bI7Ctf0ohyz8", - "outputId": "f9aa2e05-8a41-45c0-a32e-22b1b8976c69", + "outputId": "909b3b73-00da-487e-b981-a62073c78fda", "colab": { "base_uri": "https://localhost:8080/" } @@ -1000,7 +1068,7 @@ "source": [ "d_frutas" ], - "execution_count": 23, + "execution_count": null, "outputs": [ { "output_type": "execute_result", @@ -1035,7 +1103,7 @@ "metadata": { "tags": [] }, - "execution_count": 23 + "execution_count": 18 } ] }, @@ -1052,7 +1120,7 @@ "cell_type": "code", "metadata": { "id": "JpreyE_LtCcU", - "outputId": "f2f7d1e4-51ff-40ff-af58-5577a34dd137", + "outputId": "8b2e9f8f-66f9-4fa1-eae9-95b8e6df1abb", "colab": { "base_uri": "https://localhost:8080/" } @@ -1060,7 +1128,7 @@ "source": [ "d_frutas['Apple']" ], - "execution_count": 24, + "execution_count": null, "outputs": [ { "output_type": "execute_result", @@ -1072,7 +1140,35 @@ "metadata": { "tags": [] }, - "execution_count": 24 + "execution_count": 19 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "RMWau2TOclHr", + "outputId": "741f0735-17a1-4f4f-ec4c-4df8bc82c987", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 166 + } + }, + "source": [ + "d_frutas['blablabla'] # Isso significa que 
'blablabla' não faz parte do dicionário!" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "error", + "ename": "KeyError", + "evalue": "ignored", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0md_frutas\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'blablabla'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;31m# Isso significa que 'blablabla' não faz parte do dicionário!\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mKeyError\u001b[0m: 'blablabla'" + ] } ] }, @@ -1088,17 +1184,17 @@ { "cell_type": "code", "metadata": { - "id": "rMro_tY8kepo", - "outputId": "dec61759-d1b5-4379-8846-cce26973209d", + "id": "i-6-pNQCcyXY", + "outputId": "a8408cf3-3e7f-4cb5-d19a-2e8e243dba6b", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ - "for key in d_frutas.keys():\n", - " print(key)" + "for chave in d_frutas.keys():\n", + " print(chave)" ], - "execution_count": 25, + "execution_count": null, "outputs": [ { "output_type": "stream", @@ -1132,29 +1228,114 @@ } ] }, + { + "cell_type": "markdown", + "metadata": { + "id": "9u4xJ0FfdCxm" + }, + "source": [ + "## Iterar pelos valores do dicionário:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "vrFPwQPDdFP3", + "outputId": "cb739da2-4407-47fe-8495-9c86247bffad", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "for i_valor in d_frutas.values():\n", + " print(i_valor)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "0.35\n", + "0.4\n", + "0.25\n", + "0.3\n", + "0.7\n", + "0.55\n", + "0.45\n", + "0.5\n", + "0.75\n", + "0.6\n", + "0.65\n", + "0.2\n", + "0.15\n", + "0.8\n", + "0.75\n", + "0.25\n", + "0.3\n", + "0.45\n", + "0.55\n", + "0.55\n", + 
"0.6\n", + "0.4\n", + "0.5\n", + "0.45\n" + ], + "name": "stdout" + } + ] + }, { "cell_type": "markdown", "metadata": { "id": "yDkOLvRFJxco" }, "source": [ - "## Iterar pelos itens (key, value) do dicionário" + "## Iterar pelos itens (chave, valor) do dicionário" ] }, { "cell_type": "code", "metadata": { - "id": "DpFB1g-3kDSt", - "outputId": "a0b41867-9ca7-414d-a451-34aa0f6e4232", + "id": "H8BCC6qodU6o", + "outputId": "9744058d-976e-4309-d878-679af58013aa", "colab": { "base_uri": "https://localhost:8080/" } }, + "source": [ + "d_frutas.items()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "dict_items([('Avocado', 0.35), ('Apple', 0.4), ('Apricot', 0.25), ('Banana', 0.3), ('Blackcurrant', 0.7), ('Blackberry', 0.55), ('Blueberry', 0.45), ('Cherry', 0.5), ('Coconut', 0.75), ('Fig', 0.6), ('Grape', 0.65), ('Kiwi', 0.2), ('Lemon', 0.15), ('Mango', 0.8), ('Nectarine', 0.75), ('Orange', 0.25), ('Papaya', 0.3), ('Passion Fruit', 0.45), ('Peach', 0.55), ('Pineapple', 0.55), ('Plum', 0.6), ('Raspberry', 0.4), ('Strawberry', 0.5), ('Watermelon', 0.45)])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 24 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "DpFB1g-3kDSt", + "outputId": "7ac51581-edfa-418d-a1e0-d297ebdffca7", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 425 + } + }, "source": [ "for item in d_frutas.items():\n", " print(item) " ], - "execution_count": 26, + "execution_count": null, "outputs": [ { "output_type": "stream", @@ -1188,17 +1369,6 @@ } ] }, - { - "cell_type": "code", - "metadata": { - "id": "KxGaOrYSdLxp" - }, - "source": [ - "" - ], - "execution_count": null, - "outputs": [] - }, { "cell_type": "markdown", "metadata": { @@ -1212,16 +1382,17 @@ "cell_type": "code", "metadata": { "id": "tjJ6qRF8nr4v", - "outputId": "6798f545-3f24-40b9-83d9-8c4a5e073389", + "outputId": "3e75843b-2d45-4b4c-a3f2-24ffc0e60a7a", "colab": { - "base_uri": 
"https://localhost:8080/" + "base_uri": "https://localhost:8080/", + "height": 425 } }, "source": [ "for value in d_frutas.values():\n", " print(value)" ], - "execution_count": 27, + "execution_count": null, "outputs": [ { "output_type": "stream", @@ -1313,35 +1484,7 @@ ] }, { - "cell_type": "code", - "metadata": { - "id": "bDb1IHuddfwH", - "outputId": "b66c5df5-6028-4cad-e480-60c92cdaa9f8", - "colab": { - "base_uri": "https://localhost:8080/" - } - }, - "source": [ - "d_frutas.items()" - ], - "execution_count": 28, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "dict_items([('Avocado', 0.35), ('Apple', 0.4), ('Apricot', 0.25), ('Banana', 0.3), ('Blackcurrant', 0.7), ('Blackberry', 0.55), ('Blueberry', 0.45), ('Cherry', 0.5), ('Coconut', 0.75), ('Fig', 0.6), ('Grape', 0.65), ('Kiwi', 0.2), ('Lemon', 0.15), ('Mango', 0.8), ('Nectarine', 0.75), ('Orange', 0.25), ('Papaya', 0.3), ('Passion Fruit', 0.45), ('Peach', 0.55), ('Pineapple', 0.55), ('Plum', 0.6), ('Raspberry', 0.4), ('Strawberry', 0.5), ('Watermelon', 0.45)])" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 28 - } - ] - }, - { - "cell_type": "markdown", + "cell_type": "markdown", "metadata": { "id": "Fotx7XUquAo8" }, @@ -1363,10 +1506,9 @@ "cell_type": "code", "metadata": { "id": "-gkEKNZPTeMp", - "outputId": "3540aadd-996a-4abd-cfcb-c22e49b75aaa", + "outputId": "6c248a84-4912-446e-a197-7dc8ea670b85", "colab": { - "base_uri": "https://localhost:8080/", - "height": 34 + "base_uri": "https://localhost:8080/" } }, "source": [ @@ -1384,7 +1526,7 @@ "metadata": { "tags": [] }, - "execution_count": 75 + "execution_count": 25 } ] }, @@ -1401,14 +1543,13 @@ "cell_type": "code", "metadata": { "id": "SKtEwmBCuxyi", - "outputId": "1df7263c-a64f-4eaf-8d4d-a55cac03d2bc", + "outputId": "cf52b903-90e0-432f-8b7e-bc878d5c5f5f", "colab": { - "base_uri": "https://localhost:8080/", - "height": 34 + "base_uri": "https://localhost:8080/" } }, "source": [ - "'Coconut' in 
fruits.keys()" + "'Coconut' in d_frutas.keys()" ], "execution_count": null, "outputs": [ @@ -1422,7 +1563,7 @@ "metadata": { "tags": [] }, - "execution_count": 77 + "execution_count": 29 } ] }, @@ -1440,7 +1581,7 @@ "cell_type": "code", "metadata": { "id": "DbWpbuLTK9sn", - "outputId": "f10aeaa8-4cd7-41a9-b9f9-7e7bbe6a7fd5", + "outputId": "97999d69-9bce-463c-f980-e5a447ccf7a4", "colab": { "base_uri": "https://localhost:8080/" } @@ -1448,7 +1589,7 @@ "source": [ "0.4 in d_frutas.values()" ], - "execution_count": 29, + "execution_count": null, "outputs": [ { "output_type": "execute_result", @@ -1460,7 +1601,7 @@ "metadata": { "tags": [] }, - "execution_count": 29 + "execution_count": 30 } ] }, @@ -1477,13 +1618,31 @@ { "cell_type": "code", "metadata": { - "id": "5Rwq4-UG4--u" + "id": "5Rwq4-UG4--u", + "outputId": "4b157602-fd78-41a9-ae5b-de42c189d805", + "colab": { + "base_uri": "https://localhost:8080/" + } }, "source": [ - "d_frutas2 = {'Grapefruit': 1.0 }" + "d_frutas2 = {'Grapefruit': 1.0}\n", + "d_frutas2" ], - "execution_count": 30, - "outputs": [] + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Grapefruit': 1.0}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 32 + } + ] }, { "cell_type": "markdown", @@ -1498,7 +1657,7 @@ "cell_type": "code", "metadata": { "id": "7BD_mYMM5O5o", - "outputId": "39cdea31-d4ff-4064-e59f-adf178840a82", + "outputId": "ea96653d-04e7-4809-a24f-ff97ecdf4a81", "colab": { "base_uri": "https://localhost:8080/" } @@ -1507,7 +1666,7 @@ "d_frutas.update(d_frutas2)\n", "d_frutas" ], - "execution_count": 31, + "execution_count": null, "outputs": [ { "output_type": "execute_result", @@ -1543,7 +1702,7 @@ "metadata": { "tags": [] }, - "execution_count": 31 + "execution_count": 33 } ] }, @@ -1564,7 +1723,7 @@ "source": [ "d_frutas3 = {'Apple': 0.70}" ], - "execution_count": 32, + "execution_count": null, "outputs": [] }, { @@ -1581,55 +1740,14 @@ { "cell_type": 
"code", "metadata": { - "id": "E4GKdTw76PXI", - "outputId": "6a14210d-d1e4-4dce-9465-72af35a60671", - "colab": { - "base_uri": "https://localhost:8080/" - } + "id": "E4GKdTw76PXI" }, "source": [ "d_frutas.update(d_frutas3)\n", "d_frutas" ], - "execution_count": 33, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "{'Apple': 0.7,\n", - " 'Apricot': 0.25,\n", - " 'Avocado': 0.35,\n", - " 'Banana': 0.3,\n", - " 'Blackberry': 0.55,\n", - " 'Blackcurrant': 0.7,\n", - " 'Blueberry': 0.45,\n", - " 'Cherry': 0.5,\n", - " 'Coconut': 0.75,\n", - " 'Fig': 0.6,\n", - " 'Grape': 0.65,\n", - " 'Grapefruit': 1.0,\n", - " 'Kiwi': 0.2,\n", - " 'Lemon': 0.15,\n", - " 'Mango': 0.8,\n", - " 'Nectarine': 0.75,\n", - " 'Orange': 0.25,\n", - " 'Papaya': 0.3,\n", - " 'Passion Fruit': 0.45,\n", - " 'Peach': 0.55,\n", - " 'Pineapple': 0.55,\n", - " 'Plum': 0.6,\n", - " 'Raspberry': 0.4,\n", - " 'Strawberry': 0.5,\n", - " 'Watermelon': 0.45}" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 33 - } - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", @@ -1663,14 +1781,113 @@ { "cell_type": "code", "metadata": { - "id": "ZziGmKGmwqwn" + "id": "RV-YOkrffa3h", + "outputId": "6a7a1330-7ecd-4924-bf73-f3e0bc16baae", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "d_frutas.keys()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "dict_keys(['Avocado', 'Apple', 'Apricot', 'Banana', 'Blackcurrant', 'Blackberry', 'Blueberry', 'Cherry', 'Coconut', 'Fig', 'Grape', 'Kiwi', 'Lemon', 'Mango', 'Nectarine', 'Orange', 'Papaya', 'Passion Fruit', 'Peach', 'Pineapple', 'Plum', 'Raspberry', 'Strawberry', 'Watermelon', 'Grapefruit'])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 34 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "tV8k5w2Bf1Oq", + "outputId": "4ee0496f-ab86-4c6d-fc9f-b04863017bb0", + "colab": 
{ + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "d_frutas.items()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "dict_items([('Avocado', 0.35), ('Apple', 0.4), ('Apricot', 0.25), ('Banana', 0.3), ('Blackcurrant', 0.7), ('Blackberry', 0.55), ('Blueberry', 0.45), ('Cherry', 0.5), ('Coconut', 0.75), ('Fig', 0.6), ('Grape', 0.65), ('Kiwi', 0.2), ('Lemon', 0.15), ('Mango', 0.8), ('Nectarine', 0.75), ('Orange', 0.25), ('Papaya', 0.3), ('Passion Fruit', 0.45), ('Peach', 0.55), ('Pineapple', 0.55), ('Plum', 0.6), ('Raspberry', 0.4), ('Strawberry', 0.5), ('Watermelon', 0.45), ('Grapefruit', 1.0)])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 37 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ZziGmKGmwqwn", + "outputId": "30ba046d-0815-4df5-d36b-1459fae1bb9a", + "colab": { + "base_uri": "https://localhost:8080/" + } }, "source": [ "for key, value in d_frutas.items():\n", - " d_frutas[key] = round(value * 0.9, 2)" + " d_frutas[key] = round((value * 0.9), 2) # Isso representa um desconto de 10% no valor das frutas\n", + "\n", + "d_frutas" ], - "execution_count": 34, - "outputs": [] + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Apple': 0.32,\n", + " 'Apricot': 0.21,\n", + " 'Avocado': 0.29,\n", + " 'Banana': 0.24,\n", + " 'Blackberry': 0.45,\n", + " 'Blackcurrant': 0.57,\n", + " 'Blueberry': 0.37,\n", + " 'Cherry': 0.41,\n", + " 'Coconut': 0.61,\n", + " 'Fig': 0.49,\n", + " 'Grape': 0.53,\n", + " 'Grapefruit': 0.81,\n", + " 'Kiwi': 0.16,\n", + " 'Lemon': 0.13,\n", + " 'Mango': 0.65,\n", + " 'Nectarine': 0.61,\n", + " 'Orange': 0.21,\n", + " 'Papaya': 0.24,\n", + " 'Passion Fruit': 0.37,\n", + " 'Peach': 0.45,\n", + " 'Pineapple': 0.45,\n", + " 'Plum': 0.49,\n", + " 'Raspberry': 0.32,\n", + " 'Strawberry': 0.41,\n", + " 'Watermelon': 0.37}" + ] + }, + "metadata": { + "tags": [] + }, 
+ "execution_count": 39 + } + ] }, { "cell_type": "markdown", @@ -1685,15 +1902,16 @@ "cell_type": "code", "metadata": { "id": "zZLa85knxBtY", - "outputId": "ee34b820-6961-4d88-d0e2-628efd922e70", + "outputId": "2c7c12f8-8885-4f34-a0d1-1323e98a9437", "colab": { - "base_uri": "https://localhost:8080/" + "base_uri": "https://localhost:8080/", + "height": 442 } }, "source": [ "d_frutas" ], - "execution_count": 35, + "execution_count": null, "outputs": [ { "output_type": "execute_result", @@ -1729,7 +1947,7 @@ "metadata": { "tags": [] }, - "execution_count": 35 + "execution_count": 84 } ] }, @@ -1764,7 +1982,7 @@ " if key == 'Avocado':\n", " del d_frutas[key] # Deleta key = 'Avocado'" ], - "execution_count": 36, + "execution_count": null, "outputs": [] }, { @@ -1780,7 +1998,7 @@ "cell_type": "code", "metadata": { "id": "IwnsHejhyT4l", - "outputId": "e5029d7e-9c8d-4e37-940e-a375ba57c78d", + "outputId": "f0f819ca-117e-4a52-8458-196476c52aad", "colab": { "base_uri": "https://localhost:8080/" } @@ -1788,42 +2006,42 @@ "source": [ "d_frutas" ], - "execution_count": 37, + "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ - "{'Apple': 0.63,\n", - " 'Apricot': 0.23,\n", - " 'Banana': 0.27,\n", - " 'Blackberry': 0.5,\n", - " 'Blackcurrant': 0.63,\n", - " 'Blueberry': 0.41,\n", - " 'Cherry': 0.45,\n", - " 'Coconut': 0.68,\n", - " 'Fig': 0.54,\n", - " 'Grape': 0.59,\n", - " 'Grapefruit': 0.9,\n", - " 'Kiwi': 0.18,\n", - " 'Lemon': 0.14,\n", - " 'Mango': 0.72,\n", - " 'Nectarine': 0.68,\n", - " 'Orange': 0.23,\n", - " 'Papaya': 0.27,\n", - " 'Passion Fruit': 0.41,\n", - " 'Peach': 0.5,\n", - " 'Pineapple': 0.5,\n", - " 'Plum': 0.54,\n", - " 'Raspberry': 0.36,\n", - " 'Strawberry': 0.45,\n", - " 'Watermelon': 0.41}" + "{'Apple': 0.32,\n", + " 'Apricot': 0.21,\n", + " 'Banana': 0.24,\n", + " 'Blackberry': 0.45,\n", + " 'Blackcurrant': 0.57,\n", + " 'Blueberry': 0.37,\n", + " 'Cherry': 0.41,\n", + " 'Coconut': 0.61,\n", + " 'Fig': 
0.49,\n", + " 'Grape': 0.53,\n", + " 'Grapefruit': 0.81,\n", + " 'Kiwi': 0.16,\n", + " 'Lemon': 0.13,\n", + " 'Mango': 0.65,\n", + " 'Nectarine': 0.61,\n", + " 'Orange': 0.21,\n", + " 'Papaya': 0.24,\n", + " 'Passion Fruit': 0.37,\n", + " 'Peach': 0.45,\n", + " 'Pineapple': 0.45,\n", + " 'Plum': 0.49,\n", + " 'Raspberry': 0.32,\n", + " 'Strawberry': 0.41,\n", + " 'Watermelon': 0.37}" ] }, "metadata": { "tags": [] }, - "execution_count": 37 + "execution_count": 41 } ] }, @@ -1846,25 +2064,12 @@ }, "source": [ "d_frutas_filtro = {}\n", + "\n", "for key, value in d_frutas.items():\n", " if value > 0.5:\n", " d_frutas_filtro.update({key: value})" ], - "execution_count": 38, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": { - "id": "fu1jgbfUhPRD" - }, - "source": [ - "d_frutas_filtro = {}\n", - "for key, value in d_frutas.items():\n", - " if value == 0.5:\n", - " d_frutas_filtro.update({key: value})" - ], - "execution_count": 40, + "execution_count": null, "outputs": [] }, { @@ -1880,7 +2085,7 @@ "cell_type": "code", "metadata": { "id": "SsStWM5k1s-Q", - "outputId": "3fc3a8ef-4627-40d7-e24e-f2bee8e22e66", + "outputId": "a473e0a5-1ece-47ff-b8ae-2a460d0b3dc2", "colab": { "base_uri": "https://localhost:8080/" } @@ -1888,19 +2093,28 @@ "source": [ "d_frutas_filtro" ], - "execution_count": 41, + "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ - "{'Blackberry': 0.5, 'Peach': 0.5, 'Pineapple': 0.5}" + "{'Blackberry': 0.55,\n", + " 'Blackcurrant': 0.7,\n", + " 'Coconut': 0.75,\n", + " 'Fig': 0.6,\n", + " 'Grape': 0.65,\n", + " 'Mango': 0.8,\n", + " 'Nectarine': 0.75,\n", + " 'Peach': 0.55,\n", + " 'Pineapple': 0.55,\n", + " 'Plum': 0.6}" ] }, "metadata": { "tags": [] }, - "execution_count": 41 + "execution_count": 3 } ] }, @@ -2031,10 +2245,9 @@ "cell_type": "code", "metadata": { "id": "SH0WIKZ8-Ylr", - "outputId": "b9cea719-637e-40a5-9e79-eb67aeb47887", + "outputId": "0352afbe-7189-4b41-9428-7980a76c8442", 
"colab": { - "base_uri": "https://localhost:8080/", - "height": 425 + "base_uri": "https://localhost:8080/" } }, "source": [ @@ -2047,36 +2260,36 @@ "output_type": "execute_result", "data": { "text/plain": [ - "[('Apple', 0.4),\n", - " ('Apricot', 0.25),\n", - " ('Avocado', 0.35),\n", - " ('Banana', 0.3),\n", - " ('Blackberry', 0.55),\n", - " ('Blackcurrant', 0.7),\n", - " ('Blueberry', 0.45),\n", - " ('Cherry', 0.5),\n", - " ('Coconut', 0.75),\n", - " ('Fig', 0.6),\n", - " ('Grape', 0.65),\n", - " ('Kiwi', 0.2),\n", - " ('Lemon', 0.15),\n", - " ('Mango', 0.8),\n", - " ('Nectarine', 0.75),\n", - " ('Orange', 0.25),\n", - " ('Papaya', 0.3),\n", - " ('Passion Fruit', 0.45),\n", - " ('Peach', 0.55),\n", - " ('Pineapple', 0.55),\n", - " ('Plum', 0.6),\n", - " ('Raspberry', 0.4),\n", - " ('Strawberry', 0.5),\n", - " ('Watermelon', 0.45)]" + "[('Apple', 0.32),\n", + " ('Apricot', 0.21),\n", + " ('Banana', 0.24),\n", + " ('Blackberry', 0.45),\n", + " ('Blackcurrant', 0.57),\n", + " ('Blueberry', 0.37),\n", + " ('Cherry', 0.41),\n", + " ('Coconut', 0.61),\n", + " ('Fig', 0.49),\n", + " ('Grape', 0.53),\n", + " ('Grapefruit', 0.81),\n", + " ('Kiwi', 0.16),\n", + " ('Lemon', 0.13),\n", + " ('Mango', 0.65),\n", + " ('Nectarine', 0.61),\n", + " ('Orange', 0.21),\n", + " ('Papaya', 0.24),\n", + " ('Passion Fruit', 0.37),\n", + " ('Peach', 0.45),\n", + " ('Pineapple', 0.45),\n", + " ('Plum', 0.49),\n", + " ('Raspberry', 0.32),\n", + " ('Strawberry', 0.41),\n", + " ('Watermelon', 0.37)]" ] }, "metadata": { "tags": [] }, - "execution_count": 12 + "execution_count": 46 } ] }, @@ -2156,7 +2369,7 @@ "cell_type": "code", "metadata": { "id": "iJq1clvOHVG2", - "outputId": "2c89ba89-3377-4337-87f1-050360788da9", + "outputId": "66747a5b-c319-4b99-eda0-afcb187ea867", "colab": { "base_uri": "https://localhost:8080/" } @@ -2164,42 +2377,42 @@ "source": [ "d_frutas" ], - "execution_count": 42, + "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ 
- "{'Apple': 0.63,\n", - " 'Apricot': 0.23,\n", - " 'Banana': 0.27,\n", - " 'Blackberry': 0.5,\n", - " 'Blackcurrant': 0.63,\n", - " 'Blueberry': 0.41,\n", - " 'Cherry': 0.45,\n", - " 'Coconut': 0.68,\n", - " 'Fig': 0.54,\n", - " 'Grape': 0.59,\n", - " 'Grapefruit': 0.9,\n", - " 'Kiwi': 0.18,\n", - " 'Lemon': 0.14,\n", - " 'Mango': 0.72,\n", - " 'Nectarine': 0.68,\n", - " 'Orange': 0.23,\n", - " 'Papaya': 0.27,\n", - " 'Passion Fruit': 0.41,\n", - " 'Peach': 0.5,\n", - " 'Pineapple': 0.5,\n", - " 'Plum': 0.54,\n", - " 'Raspberry': 0.36,\n", - " 'Strawberry': 0.45,\n", - " 'Watermelon': 0.41}" + "{'Apple': 0.32,\n", + " 'Apricot': 0.21,\n", + " 'Banana': 0.24,\n", + " 'Blackberry': 0.45,\n", + " 'Blackcurrant': 0.57,\n", + " 'Blueberry': 0.37,\n", + " 'Cherry': 0.41,\n", + " 'Coconut': 0.61,\n", + " 'Fig': 0.49,\n", + " 'Grape': 0.53,\n", + " 'Grapefruit': 0.81,\n", + " 'Kiwi': 0.16,\n", + " 'Lemon': 0.13,\n", + " 'Mango': 0.65,\n", + " 'Nectarine': 0.61,\n", + " 'Orange': 0.21,\n", + " 'Papaya': 0.24,\n", + " 'Passion Fruit': 0.37,\n", + " 'Peach': 0.45,\n", + " 'Pineapple': 0.45,\n", + " 'Plum': 0.49,\n", + " 'Raspberry': 0.32,\n", + " 'Strawberry': 0.41,\n", + " 'Watermelon': 0.37}" ] }, "metadata": { "tags": [] }, - "execution_count": 42 + "execution_count": 47 } ] }, @@ -2216,14 +2429,13 @@ "cell_type": "code", "metadata": { "id": "uIDW5FhwAiSs", - "outputId": "52599d3f-ff13-4894-f697-ce7290bff9d5", + "outputId": "b266365b-417a-4f9f-9fda-c033446472e8", "colab": { - "base_uri": "https://localhost:8080/", - "height": 34 + "base_uri": "https://localhost:8080/" } }, "source": [ - "d_frutas2 = {k: v for k, v in filter(lambda t: t[0] == 'Apple', d_frutas.items())}\n", + "d_frutas2 = {chave: valor for chave, valor in filter(lambda t: t[0] == 'Apple', d_frutas.items())}\n", "d_frutas2" ], "execution_count": null, @@ -2238,7 +2450,50 @@ "metadata": { "tags": [] }, - "execution_count": 6 + "execution_count": 4 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + 
"id": "3XmPlpNqBVMl" + }, + "source": [ + "### A expressão acima é equivalente à expressão abaixo:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "_5j19I7tiHgp", + "outputId": "87e3bd82-8ec6-4f59-c8e2-74aaa80858d3", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "d_filtro = {}\n", + "\n", + "for chave, valor in d_frutas.items():\n", + " if chave == 'Apple':\n", + " d_filtro.update({chave: valor})\n", + "\n", + "d_filtro" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'Apple': 0.4}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 8 } ] }, @@ -2248,43 +2503,55 @@ "id": "nUMGIzxeNt_U" }, "source": [ - "### Filtrando por valor:" + "### Filtrando por valor:\n", + "\n", + "Equivalente a:\n", + "\n", + "```\n", + "d_frutas3 = {}\n", + "\n", + "for key, value in d_frutas.items():\n", + " if value > 0.5:\n", + " d_frutas3.update({key: value})\n", + "```" ] }, { "cell_type": "code", "metadata": { "id": "tvHcQatANltL", - "outputId": "6ca05107-f13c-4175-9d60-41a7615fd233", + "outputId": "8feaf5b1-1db8-4391-8950-248ba8ab46c5", "colab": { - "base_uri": "https://localhost:8080/" + "base_uri": "https://localhost:8080/", + "height": 187 } }, "source": [ "d_frutas3 = {k: v for k, v in filter(lambda t: t[1] > 0.5, d_frutas.items())}\n", "d_frutas3" ], - "execution_count": 43, + "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ - "{'Apple': 0.63,\n", - " 'Blackcurrant': 0.63,\n", - " 'Coconut': 0.68,\n", - " 'Fig': 0.54,\n", - " 'Grape': 0.59,\n", - " 'Grapefruit': 0.9,\n", - " 'Mango': 0.72,\n", - " 'Nectarine': 0.68,\n", - " 'Plum': 0.54}" + "{'Blackberry': 0.55,\n", + " 'Blackcurrant': 0.7,\n", + " 'Coconut': 0.75,\n", + " 'Fig': 0.6,\n", + " 'Grape': 0.65,\n", + " 'Mango': 0.8,\n", + " 'Nectarine': 0.75,\n", + " 'Peach': 0.55,\n", + " 'Pineapple': 0.55,\n", + " 'Plum': 0.6}" ] }, "metadata": { "tags": 
[] }, - "execution_count": 43 + "execution_count": 7 } ] }, @@ -2322,16 +2589,16 @@ "cell_type": "code", "metadata": { "id": "0BBWO9Zth_mc", - "outputId": "6f948288-be8e-4541-94b9-b067c17f13e4", + "outputId": "18783570-d8b5-4fa1-9f19-747b0db288da", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ - "d_colaboradores= {'Gerentes': ['A', 'B', 'C'], 'Programadores': ['B', 'D', 'E', 'F', 'G'], 'Gerentes_Projeto': ['A', 'E']}\n", + "d_colaboradores = {'Gerentes': ['A', 'B', 'C'], 'Programadores': ['B', 'D', 'E', 'F', 'G'], 'Gerentes_Projeto': ['A', 'E']}\n", "d_colaboradores" ], - "execution_count": 45, + "execution_count": 2, "outputs": [ { "output_type": "execute_result", @@ -2345,7 +2612,7 @@ "metadata": { "tags": [] }, - "execution_count": 45 + "execution_count": 2 } ] }, @@ -2361,71 +2628,74 @@ { "cell_type": "code", "metadata": { - "id": "XH3BgvYfi01G", - "outputId": "d2fe69fa-5c0b-47b9-9413-e9c2019c19d1", + "id": "rGvVgyz7jxwn", + "outputId": "c4e02509-6910-46c5-d906-b7d6f542dfb3", "colab": { - "base_uri": "https://localhost:8080/", - "height": 129 + "base_uri": "https://localhost:8080/" } }, "source": [ - "d_colaboradores{'Gerentes'[0]}" + "d_colaboradores['Gerentes']" ], - "execution_count": 46, + "execution_count": null, "outputs": [ { - "output_type": "error", - "ename": "SyntaxError", - "evalue": "ignored", - "traceback": [ - "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m d_colaboradores{'Gerentes'[0]}\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" - ] + "output_type": "execute_result", + "data": { + "text/plain": [ + "['A', 'B', 'C']" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 50 } ] }, { "cell_type": "code", "metadata": { - "id": "kCILrtFPjBTe", - "outputId": "4d8c8aca-341d-4528-8ba2-53a898683c49", + "id": "c-VwXvdij3QQ", + "outputId": "f4344858-8ebf-4e0c-b336-e7a6ed4a43a2", "colab": { "base_uri": 
"https://localhost:8080/" } }, "source": [ - "d_colaboradores.get('Gerentes')\n" + "d_colaboradores['Programadores']" ], - "execution_count": 50, + "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ - "['A', 'B', 'C']" + "['B', 'D', 'E', 'F', 'G']" ] }, "metadata": { "tags": [] }, - "execution_count": 50 + "execution_count": 51 } ] }, { "cell_type": "code", "metadata": { - "id": "emYGFLuJk65D", - "outputId": "8fe63e1f-8421-4d10-916e-58520b8d71fe", + "id": "WV0WaGB4kCiP", + "outputId": "171e4ea0-c66f-49c2-f4ea-deb44b315d43", "colab": { "base_uri": "https://localhost:8080/", "height": 35 } }, "source": [ - "list_gerentes=d_colaboradores.get('Gerentes')\n", - "list_gerentes[0]" + "s_gerentes = d_colaboradores['Gerentes']\n", + "s_gerentes[0]" ], - "execution_count": 53, + "execution_count": null, "outputs": [ { "output_type": "execute_result", @@ -2440,127 +2710,339 @@ "metadata": { "tags": [] }, - "execution_count": 53 + "execution_count": 62 } ] }, { "cell_type": "code", "metadata": { - "id": "B8ZbnSYNlZgt", - "outputId": "06d1ea72-8a6d-49fd-9d1d-83cf8a080b24", + "id": "yRrG7wUgkf6K", + "outputId": "122c0ff9-47af-4a50-874e-42779aa3c068", "colab": { - "base_uri": "https://localhost:8080/", - "height": 35 + "base_uri": "https://localhost:8080/" } }, "source": [ - "d_colaboradores.get('Gerentes')[0]\n", - "\n" + "s_gerente_A = d_colaboradores.values()\n", + "s_gerente_A" ], - "execution_count": 57, + "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { - "application/vnd.google.colaboratory.intrinsic+json": { - "type": "string" - }, "text/plain": [ - "'A'" + "dict_values([['A', 'B', 'C'], ['B', 'D', 'E', 'F', 'G'], ['A', 'E']])" ] }, "metadata": { "tags": [] }, - "execution_count": 57 + "execution_count": 55 } ] }, { "cell_type": "markdown", "metadata": { - "id": "4JnYRazemjq5" + "id": "ntVcr_3XwaQ-" + }, + "source": [ + "## Exercício 3\n", + "Consulte a página [Python Data Types: Dictionary 
- Exercises, Practice, Solution](https://www.w3resource.com/python-exercises/dictionary/) para mais exercícios relacionados à dicionários." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7u5-o8dzlryA" }, "source": [ - "retornar os cargos que o funcionario !A! ocupa\n" + "## **Exercício 4**\n", + "\n", + "Retornar do dicionário d_colaboradores somente o Programador cujo nome seja 'E'. " ] }, { "cell_type": "code", "metadata": { - "id": "SX2h28bXj33b", - "outputId": "8be812f8-d86a-4a47-d439-55c229c216c9", + "id": "W97EV00T1ejw", + "outputId": "f043e621-a387-4db4-bd19-a1410af98f46", "colab": { - "base_uri": "https://localhost:8080/" + "base_uri": "https://localhost:8080/", + "height": 35 } }, "source": [ - "d_colaboradores['Gerentes']" + "chave='Gerentes'\n", + "d_colaboradores['Gerentes'][0]\n", + "d_colaboradores[chave][0]" ], - "execution_count": 51, + "execution_count": 24, "outputs": [ { "output_type": "execute_result", "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, "text/plain": [ - "['A', 'B', 'C']" + "'A'" ] }, "metadata": { "tags": [] }, - "execution_count": 51 + "execution_count": 24 } ] }, { - "cell_type": "markdown", + "cell_type": "code", "metadata": { - "id": "XCkq6tN9mqgn" + "id": "1CGVsatkzuu4", + "outputId": "d6beb963-ec37-4f40-944b-4b9a5bf264b0", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 129 + } }, "source": [ - "retornar do dicinario d_colaboradores somente o progarmador cujo nome seja 'E'" + "for key in d_colaboradores.keys():\n", + " i_posição=0\n", + " for i_indice in d_colaboradores[key][i_indice]:\n", + " if 'E' in d_colaboradores.get(key):\n", + " valor = d_colaboradores.get(key)\n", + " for i_posição < valor:\n", + " d_colaboradores.get(key)[i_posição]\n", + " print(d_colaboradores.get(key)[i_posição])\n", + " valor = d_colaboradores.get(key)\n", + " tamanho=len(valor)\n", + " print(tamanho)\n", + " print(valor)\n", + " valor[valor == 'E']\n", + " 
print(valor)\n", + " print(key)\n", + " print(i_indice)\n", + " #print(d_colaboradores[key][i_indice])\n", + " print(d_colaboradores.values())" + ], + "execution_count": 42, + "outputs": [ + { + "output_type": "error", + "ename": "SyntaxError", + "evalue": "ignored", + "traceback": [ + "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m6\u001b[0m\n\u001b[0;31m for i_posição < valor:\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" + ] + } ] }, { "cell_type": "markdown", "metadata": { - "id": "BSXBKuQvm0y0" + "id": "zfnP-CArmPb4" }, "source": [ - "quais são os colaboradores que \n", - "a)são ao mesmo tempo gerente e gerente de projeto\n", - "b) gerentes de projetos e programadores" + "## **Exercício 5**\n", + "\n", + "Retornar qual é o cargo do funcionário (todas as pessoas da organização) que se chama 'A'." ] }, { "cell_type": "markdown", "metadata": { - "id": "ntVcr_3XwaQ-" + "id": "qc6xDbMGvwX9" }, "source": [ - "## Exercício 3\n", - "Consulte a página [Python Data Types: Dictionary - Exercises, Practice, Solution](https://www.w3resource.com/python-exercises/dictionary/) para mais exercícios relacionados à dicionários." 
+ "'A' in " ] }, { - "cell_type": "markdown", + "cell_type": "code", "metadata": { - "id": "ZWRUI3Q0m0GQ" + "id": "zIUF3u6Bv7r5", + "outputId": "8d5657c7-71d9-452e-ea1e-1e6a24f212a9", + "colab": { + "base_uri": "https://localhost:8080/" + } }, "source": [ - "" + "for key in d_colaboradores.keys():\n", + " if 'A' in d_colaboradores[key]:\n", + " print(key)\n" + ], + "execution_count": 5, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Gerentes\n", + "Gerentes_Projeto\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "O0QqPmYdxDHY", + "outputId": "bb30b080-9356-4238-d33c-9a8222963096", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "cargos = []\n", + "for key in d_colaboradores.keys():\n", + " if 'A' in d_colaboradores[key]:\n", + " cargos.append(key)\n", + "print(cargos)" + ], + "execution_count": 6, + "outputs": [ + { + "output_type": "stream", + "text": [ + "['Gerentes', 'Gerentes_Projeto']\n" + ], + "name": "stdout" + } ] }, { "cell_type": "markdown", "metadata": { - "id": "eKwYMbxymyai" + "id": "VzjLVeFvmnjk" + }, + "source": [ + "## **Exercício 6**\n", + "\n", + "* Quais são os colaboradores que são ao mesmo tempo:\n", + " * Gerente de Projeto e Gerente (funcional)?\n", + " * Gerentes de Projeto e Programadores?"
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "wnCi3kWcl8Sb", + "outputId": "4a9f4bdd-6e59-494f-c4bb-e093c409f842", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + " import numpy as np\n", + " gerente_e_gerenteproj=[]\n", + " gerente_e_gerenteproj=np.intersect1d(d_colaboradores['Gerentes'],d_colaboradores['Gerentes_Projeto'])\n", + " print(gerente_e_gerenteproj, 'são gerentes e gerentes de projeto')" + ], + "execution_count": 11, + "outputs": [ + { + "output_type": "stream", + "text": [ + "['A'] são gerentes e gerentes de projeto\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "hKS4X7c2zfX9", + "outputId": "ecf5c9cd-2c86-4cb9-9060-b517c37ddd7f", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "print(np.intersect1d(d_colaboradores['Gerentes'],d_colaboradores['Gerentes_Projeto']), 'são gerentes e gerentes de projeto')" + ], + "execution_count": 15, + "outputs": [ + { + "output_type": "stream", + "text": [ + "['A'] são gerentes e gerentes de projeto\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "k1N4T6fVyMCR", + "outputId": "f11a1554-ef09-49f7-a0ef-0618e1112dbe", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "gerente_e_programador=np.intersect1d(d_colaboradores['Gerentes'],d_colaboradores['Programadores'])\n", + "print(gerente_e_programador, 'são gerentes e programadores')" + ], + "execution_count": 13, + "outputs": [ + { + "output_type": "stream", + "text": [ + "['B'] são gerentes e programadores\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "PmJHD32dzeV2" }, "source": [ "" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "GT82fPS0zGJ0", + "outputId": "69184b7c-6027-4bd0-ee64-5e7962608c14", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + 
"print(np.intersect1d(d_colaboradores['Gerentes'],d_colaboradores['Programadores']), 'são gerentes e programadores')" + ], + "execution_count": 14, + "outputs": [ + { + "output_type": "stream", + "text": [ + "['B'] são gerentes e programadores\n" + ], + "name": "stdout" + } ] + }, + { + "cell_type": "code", + "metadata": { + "id": "smjv-gaKzaUl" + }, + "source": [ + "" + ], + "execution_count": null, + "outputs": [] } ] } \ No newline at end of file From 497bf8229fec9c3732e3a4ddfc5a13e951216a0f Mon Sep 17 00:00:00 2001 From: MariaJacobs70 <72224154+MariaJacobs70@users.noreply.github.com> Date: Thu, 8 Oct 2020 15:39:56 -0300 Subject: [PATCH 7/9] Criado usando o Colaboratory --- Notebooks/NB09_01__Functions_alterado2.ipynb | 1662 ++++++++++++++++++ 1 file changed, 1662 insertions(+) create mode 100644 Notebooks/NB09_01__Functions_alterado2.ipynb diff --git a/Notebooks/NB09_01__Functions_alterado2.ipynb b/Notebooks/NB09_01__Functions_alterado2.ipynb new file mode 100644 index 000000000..092a9649b --- /dev/null +++ b/Notebooks/NB09_01__Functions_alterado2.ipynb @@ -0,0 +1,1662 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "NB09_01__Functions.ipynb", + "provenance": [], + "private_outputs": true, + "include_colab_link": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d_YndS20uqkK" + }, + "source": [ + "

FUNÇÕES

\n", + "\n", + "\n", + "\n", + "# **AGENDA**:\n", + "\n", + "> Veja o **índice** dos itens que serão abordados neste capítulo.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e0UKAZQvJ_c2" + }, + "source": [ + "___\n", + "# **INTRODUÇÃO ÀS FUNÇÕES**\n", + "> Funções são uma sequência de comandos para executar uma tarefa.\n", + ">> Atenção ao que recomenda o PEP8 sobre como escrever funções." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Z4-gPTjZUP50" + }, + "source": [ + "# Não executar este codigo!\n", + "def funcao(arg1, arg2, ..., argN):\n", + " " + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "etxNlyRYo39A" + }, + "source": [ + "def show_hello_world():\n", + " print('Hello World!')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "G6I9PFvZpBgR" + }, + "source": [ + "type(show_hello_world)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "_meNdNygpIbv" + }, + "source": [ + "show_hello_world()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6zfLd8HwpPpg" + }, + "source": [ + "___\n", + "# **DOCUMENTAR FUNÇÕES COM COMMENTS/DOCSTRING**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "3yzgBxtNpRi_" + }, + "source": [ + "def show_hello_world():\n", + " '''\n", + " Esta função faz um cumprimento: 'Hello World!'\n", + " Inputs: \n", + " param1: djdjdjdjdj\n", + " param2: fjrjirjjirjir\n", + " '''\n", + " print('Hello World!')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "0rBaxjpmpbm1" + }, + "source": [ + "show_hello_world()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "6ThOwDQp4TfR" + }, + "source": [ + "# Se quisermos ver a documentação da função, basta 
acessar o atributo __doc__ da seguinte forma:\n", + "show_hello_world.__doc__" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9YZ2afpNA4st" + }, + "source": [ + "OU..." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "uSnwA4BVA5_t" + }, + "source": [ + "help(show_hello_world)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "whbnnMA5p1Jw" + }, + "source": [ + "___\n", + "# **FUNÇÕES COM ARGUMENTOS**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "O3bSjLA_qTTc" + }, + "source": [ + "Definir a função mostra_nome com dois argumentos: s_primeiro_nome e s_ultimo_nome:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "9jWyCCPPp4yS" + }, + "source": [ + "def mostra_nome(s_primeiro_nome, s_ultimo_nome):\n", + " print(f'Olá, meu nome é {s_primeiro_nome} {s_ultimo_nome}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "VOB3Ip63qIzr" + }, + "source": [ + "mostra_nome('Nelio', 'Machado')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Oi0c_GuesfcL" + }, + "source": [ + "Neste caso, o primeiro argumento da função (s_primeiro_nome) vai receber o valor 'Nelio' e o segundo argumento da função (s_ultimo_nome) vai receber 'Machado'." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qkMblpnLsITO" + }, + "source": [ + "No entanto, também podemos invocar a função da seguinte forma:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "TTli7e6xsMCo" + }, + "source": [ + "mostra_nome(s_ultimo_nome = 'Machado', s_primeiro_nome = 'Nelio')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rmatMmhTsaVc" + }, + "source": [ + "Observe que o resultado é o mesmo. 
No entanto, desta forma, informamos explicitamente, pelo nome, o valor que cada parâmetro irá receber; por isso a ordem dos argumentos deixa de importar." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PnNYrgJ6VQo9" + }, + "source": [ + "## PEP8 + Annotations = Códigos mais fáceis de entender e atualizar\n", + "\n", + "> Observe abaixo como combinamos PEP8 + Annotations para tornar o código Python ainda mais detalhado. O objetivo de _Annotations_ é deixar o código mais claro, sem mudar o comportamento da função. No exemplo abaixo, os argumentos da função s_primeiro_nome e s_ultimo_nome são do tipo _str_ e, como a função apenas imprime a mensagem (não há _return_), a anotação de retorno é _None_." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "aU2Sob37VVmi" + }, + "source": [ + "def mostra_nome2(s_primeiro_nome: str, s_ultimo_nome: str) -> None:\n", + " print(f'Olá, meu nome é {s_primeiro_nome} {s_ultimo_nome}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "iIvqS73mXNam" + }, + "source": [ + "mostra_nome2(s_ultimo_nome = 'Machado', s_primeiro_nome = 'Nelio')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rSnrtFNtXrbN" + }, + "source": [ + "# **\*args**\n", + "> \*args permite que você passe mais argumentos do que o número de argumentos formais que você definiu anteriormente." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aT0_PeuEvXiP" + }, + "source": [ + "## Exemplo 1\n", + "> Considere a função (simples) para imprimir o nome completo de um cliente." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Npbi_Hy0bUec" + }, + "source": [ + "# Definimos a função mostra_nome3 da seguinte forma:\n", + "def mostra_nome3(*args):\n", + " nome = ' '.join(args)\n", + " \n", + " print(f'Olá, meu nome é {nome}.')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "dFzM0gA3_9za" + }, + "source": [ + "mostra_nome3('Nelio', 'Machado')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "370bpgaSvDbJ" + }, + "source": [ + "E agora, a função recebe qualquer quantidade de parâmetros." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "4kYcu6PEX-Nz" + }, + "source": [ + "mostra_nome3('Pedro', 'de', 'Alcantara', 'Francisco', 'Antonio', 'Joao', 'Carlos', 'Xavier', 'de', 'Paula', 'Miguel', 'Rafael', 'Joaquim', 'Jose', 'Gonzaga', 'Pascoal', 'Cipriano', 'Serafim')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KMgngPmFimxb" + }, + "source": [ + "Observe que, desta forma, pouco importa a quantidade de parâmetros que passamos à função." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Y9pDa6ZRjo0U" + }, + "source": [ + "## Exemplo 2\n", + "* Suponha que estamos interessados em desenvolver uma função que multiplica dois números (passados como parâmetros)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1A-vhsHxv1YE" + }, + "source": [ + "Antes de vermos a solução usando \*args, vamos ver como seria nossa função se \*args não existisse." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cCDwruF8j5i5" + }, + "source": [ + "### Forma \"Normal\"" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "_R03BiwLjtwB" + }, + "source": [ + "# Definição da função\n", + "def multiplicar_numeros(x1, x2):\n", + " '''\n", + " Objetivo: Esta função multiplica DOIS números passados como argumentos.\n", + " Autor: Nelio Machado\n", + " Data: 04/10/2020\n", + " '''\n", + " return x1 * x2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "0eVm1Qj9kDtd" + }, + "source": [ + "print(multiplicar_numeros(3, 4))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4h9Nhkickf_8" + }, + "source": [ + "### Usando \\*args" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "9Kf89meJkjw8" + }, + "source": [ + "def multiplicar_numeros2(*args):\n", + " '''\n", + " Objetivo: Esta função multiplica vários números passados como argumentos.\n", + " Autor: Nelio Machado\n", + " Data: 04/10/2020\n", + " '''\n", + " print(args)\n", + " print(type(args))\n", + " x = 1\n", + " for N in args:\n", + " x *= N\n", + " \n", + " return x" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "ZuIzwitWk7by" + }, + "source": [ + "print(multiplicar_numeros2(1, 2, 3, 4, 5))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U5kyPu792gMN" + }, + "source": [ + "Eu também posso fazer da seguinte forma:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "oc2NJmJf2s7X" + }, + "source": [ + "args= (1, 2, 3, 4, 5)\n", + "print(multiplicar_numeros2(*args))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "38jVie_IjMXI" + }, + "source": [ + "# \\**kwargs\n", + "\n", + "* \\**kwargs é usado para passar um dicionário de comprimento variável 
para uma função.\n", + "* Argumento do tipo {chave: valor};\n", + "\n", + "* Para exemplificar o uso de \**kwargs, vou usar parte do dicionário d_frutas que definimos na seção [Dictionaries](Dictionaries.ipynb). Qualquer dúvida, volte àquele capítulo para relembrar os principais conceitos." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "yAntQ724nMbv" + }, + "source": [ + "# Definindo a função para receber parâmetros em forma de dicionário:\n", + "def imprime_frutas(**kwargs):\n", + " '''\n", + " Objetivo: Esta função imprime as frutas contidas em kwargs.\n", + " Autor: Nelio Machado\n", + " Data: 04/10/2020\n", + " '''\n", + " for key, value in kwargs.items(): # .items() devolve pares (chave, valor)\n", + " print(f'O valor de {key} é {value}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_FaKEXk_d9jQ" + }, + "source": [ + "[15:25] Vinicius Fabri Brenck\n", + "Isso aqui também funciona normalmente:\n", + "\n", + "[15:25] Vinicius Fabri Brenck\n", + "\n", + "    def imprimir_normal(dicionario):\n", + "        for key, value in dicionario.items():\n", + "            print('O valor de {} é {}'.format(key, value))\n", + "\n", + "    imprimir_normal(d_frutas)\n", + "\n", + "[15:25] Vinicius Fabri Brenck\n", + "O resultado é exatamente o mesmo\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "sdGIG-c-cuOW" + }, + "source": [ + "## O próprio construtor de dicionário pode ser chamado com argumentos nomeados:\n", + "## dict(a=1, b=2) devolve {'a': 1, 'b': 2}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "eYI0dxKyeBZJ" + }, + "source": [ + "##[15:25] Vinicius Fabri Brenck\n", + "##Isso aqui também funciona normalmente\n", + "\n", + "##[15:25] Vinicius Fabri Brenck\n", + "def imprimir_normal(dicionario):\n", + "    for key, value in dicionario.items():\n", + "        print('O valor de {} é {}'.format(key, value))\n", + "\n", + "imprimir_normal(d_frutas)\n", + "\n", + "##[15:25] Vinicius Fabri Brenck\n", + "##O resultado é 
exatamente o mesmo\n", + "\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "jFrnCnJWdJjw" + }, + "source": [ + "d_frutas['blblba'] = 0.1\n", + "d_frutas" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jpmSk9mfxww3" + }, + "source": [ + "Atenção à forma como os itens são passados à função!" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "88-1lStInaVs" + }, + "source": [ + "imprime_frutas(Avocado = 0.35, Apple = 0.4, Apricot = 0.25, Banana = 0.30) # cada par nome=valor vira um item do dicionário kwargs" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-jb_kkLiyQt8" + }, + "source": [ + "No entanto, posso passar um dicionário na forma como estamos acostumados, da seguinte forma:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "JZJNiLz7wgCy" + }, + "source": [ + "d_frutas = {'Apple': 0.4, 'Avocado': 0.3, 'Orange': 0.5, 'Lemon': 0.25}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "eUCum4JPEcxD" + }, + "source": [ + "imprime_frutas(**d_frutas) # a sintaxe é ** seguido do nome do dicionário" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3VuoDBftdnfi" + }, + "source": [ + "Cristiano:\n", + "eu fazia a seguinte analogia:\n", + "\n", + "`*iterable` explode um iterável em argumentos posicionais.\n", + "\n", + "`**mapper` explode um mapeamento (dicionário) em argumentos nomeados." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iK8-e7a1sXmn" + }, + "source": [ + "___\n", + "# **Python return**\n", + "> Uma função Python pode ou não retornar um valor." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "HS0dGA55siWw" + }, + "source": [ + "def par_ou_impar(i_numero1, i_numero2):\n", + " '''\n", + " Esta função somente avalia se a soma de dois números é par ou ímpar. \n", + " A função retorna 'Odd' (ímpar) ou 'Even' (par).\n", + " '''\n", + " i_soma = i_numero1+i_numero2\n", + " i_modulo = i_soma % 2\n", + " print(f'A soma é {i_soma}')\n", + " if i_modulo > 0:\n", + " return 'Odd'\n", + " else:\n", + " return 'Even' " + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "mZTG2tDJuIZQ" + }, + "source": [ + "i_numero1 = int(input('Por favor, informe o primeiro número: '))\n", + "i_numero2 = int(input('Por favor, informe o segundo número.: '))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "7p_9pq3Du18a" + }, + "source": [ + "type(i_numero1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "4oO7aAjcvCAe" + }, + "source": [ + "type(i_numero2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Br7yT8UHuKYY" + }, + "source": [ + "s_resultado = par_ou_impar(i_numero1, i_numero2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "601QnggJuhf-" + }, + "source": [ + "print(f'O resultado é {s_resultado}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "t6HNf9j9yKcT" + }, + "source": [ + "Vamos tentar mostrar o valor de i_modulo (ou de i_soma) fora da função:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Yu8RsyDAyXne" + }, + "source": [ + "i_modulo" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nx3twrLRyaeJ" + }, + "source": [ + "Python reporta que i_modulo não existe.\n", + "Está correta esta informação?" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "imkyRO4kyvgV" + }, + "source": [ + "Considere o exemplo a seguir:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "kwRiXDA5y19h" + }, + "source": [ + "i_modulo = 0\n", + "\n", + "def par_ou_impar_v2(i_numero1, i_numero2):\n", + " '''\n", + " Esta função somente avalia se a soma de dois números é par ou ímpar. \n", + " A função retorna 'Odd' (ímpar) ou 'Even' (par).\n", + " '''\n", + " i_soma = i_numero1+i_numero2\n", + " i_modulo = i_soma % 2\n", + " print(f'A soma é {i_soma}')\n", + " if i_modulo > 0:\n", + " return 'Odd'\n", + " else:\n", + " return 'Even' " + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "GYxLSGQLy_Ai" + }, + "source": [ + "i_numero1 = int(input('Por favor, informe o primeiro número: '))\n", + "i_numero2 = int(input('Por favor, informe o segundo número.: '))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "NMtv99fjzHGs" + }, + "source": [ + "s_resultado = par_ou_impar_v2(i_numero1, i_numero2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "qjOHnYDVzNGK" + }, + "source": [ + "print(f'O resultado é {s_resultado}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pPTecxRfzQUc" + }, + "source": [ + "Agora, vamos checar o valor de i_modulo..." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "jkQb2mQzzTEo" + }, + "source": [ + "i_modulo" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oOlyGxBAzjE3" + }, + "source": [ + "Por que agora o Python reconhece a variável i_modulo?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dceSkt9Z0BZh" + }, + "source": [ + "___\n", + "# **ESCOPO DE VARIÁVEIS: LOCAL & GLOBAL**\n", + "* **Local** - Variável declarada dentro da função. 
In other words, it exists only for the function's own use.\n", + "\n", + "* **Global** - A variable declared outside the function. It is visible to the whole program, so a function can read it. However, assigning to that name inside a function creates a new local variable instead of changing the global one. To change the global value from inside the function, declare the variable there with the reserved word `global`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0tIjI9GScPxu" + }, + "source": [ + "## Example 1" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "QRojHHJ20iTY" + }, + "source": [ + "def exemplo1():\n", + "    i_valor = 20\n", + "    i_valor += 1\n", + "    print(i_valor)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "RdhElmTs0y1c" + }, + "source": [ + "exemplo1()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Tytq7PnH08pz" + }, + "source": [ + "The scope of the variable 'i_valor' is local, that is, restricted to the function." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "299AK0PA1lIg" + }, + "source": [ + "i_valor" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gGP4cx17y8EZ" + }, + "source": [ + "So the error above makes sense: the variable i_valor is restricted to the function. Outside the function, Python does not know this name."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KTV_6Gzxfvpc" + }, + "source": [ + "## Example 2" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "zyi9AyJwfxTm" + }, + "source": [ + "i_valor = 100\n", + "\n", + "def exemplo2():\n", + "    i_valor = 20\n", + "    i_valor += 1\n", + "    print(i_valor)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "iEWrboG6gBSs" + }, + "source": [ + "exemplo2()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JPvT0BHG-vxE" + }, + "source": [ + "This looks a bit strange! Outside the function we set i_valor = 100 and, inside the function, we set i_valor = 20. As we saw, exemplo2() prints 21." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "N_t8tIDC-149" + }, + "source": [ + "Next, outside the function, we ask for the value of i_valor and get 100." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "I46Bn4FlgJLu" + }, + "source": [ + "i_valor" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IQlP5nbngL6E" + }, + "source": [ + "Can you explain what is happening?"
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "h8PHd6rLgtwK" + }, + "source": [ + "## Example 3" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "qB7_zPQVgvVT" + }, + "source": [ + "i_valor = 100\n", + "\n", + "def exemplo3():\n", + "    global i_valor\n", + "    i_valor = 20\n", + "    i_valor += 1\n", + "    print(i_valor)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "2KgQSbYCg8Eq" + }, + "source": [ + "exemplo3()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Y7yWoojrg_9Z" + }, + "source": [ + "i_valor" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cGlmbIJGzWG6" + }, + "source": [ + "Can you explain what happens in this example?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "X8qFfIoxhFOp" + }, + "source": [ + "## Example 4" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ZM-yTLuO1bFh" + }, + "source": [ + "i_valor = 20\n", + "\n", + "def exemplo4():  ## i_valor is not defined inside exemplo4; for this to work, `global i_valor` is needed\n", + "    i_valor += 1\n", + "    print(i_valor)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "oLvfPO8w1zwL" + }, + "source": [ + "exemplo4()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "WvA2CjOGfPr9" + }, + "source": [ + "i_valor = 20\n", + "\n", + "def exemplo4():  ## corrected version: declare i_valor as global before changing it\n", + "    global i_valor\n", + "    i_valor += 1\n", + "    print(i_valor)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Qyl5gKHEfTZN" + }, + "source": [ + "exemplo4()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2V7QzpZp2QcM" + }, + "source": [ + "What was the reason for the error in the first version of exemplo4?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "w9qI8kln1_C7" + }, + "source": [ + "i_valor" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AQFFGqLI1FWn" + }, + "source": [ + "___\n", + "# **DEFAULT ARGUMENTS**\n", + "> Consider the following example: every time you go to the supermarket you buy 1 pack of milk (containing 4 bottles) and 1 five-liter jug of water. So, in a simple way, we can define our function as follows:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "HbcSTiBI4nOj" + }, + "source": [ + "# Define the function with the parameters arroz (rice), feijao (beans), leite (milk) and agua (water).\n", + "def lista_de_compras(arroz, feijao, leite=1, agua=1):\n", + "    '''\n", + "    Function documentation: purpose, author and date.\n", + "    '''\n", + "    print('Shopping list:')\n", + "    print(f'Quantity of rice.: {arroz} kg.')\n", + "    print(f'Quantity of beans: {feijao} kg.')\n", + "    print(f'Quantity of milk.: {leite} pack(s) of 4.')\n", + "    print(f'Quantity of water: {agua} 5-liter jug(s).')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "vwZnDgoq5pgB" + }, + "source": [ + "lista_de_compras(5, 3)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "l7bY5BSO7eJF" + }, + "source": [ + "Since leite=1 and agua=1 are default values, we do not need to pass these arguments: Python already knows the defaults. 
However, if in a given week we need 2 packs of milk instead of 1, we must pass the new value to Python:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "YY4OrFuH7yXi" + }, + "source": [ + "lista_de_compras(5, 3, 2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-nfrZAvN73YT" + }, + "source": [ + "Likewise, if in another week we also need 2 jugs of water instead of 1, we pass the fourth argument as well (to change only the water, the keyword form lista_de_compras(5, 3, agua=2) works):" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Vpoh6TdM7_xb" + }, + "source": [ + "lista_de_compras(5, 3, 2, 2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "q3qZn9FuVQly" + }, + "source": [ + "___\n", + "# **map()**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Dav8k0JYWi4B" + }, + "source": [ + "## Example 1\n", + "> Suppose we want the square of each number passed to a function."
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "R6NC0i2OVktM" + }, + "source": [ + "l_numeros = [0, 1, 2, 3, 4, 5]" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "AVjYlN44Vw2k" + }, + "source": [ + "def quadrado_do_numero(i_numero):\n", + "    return i_numero**2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "i_4CHiehV7lD" + }, + "source": [ + "list(map(quadrado_do_numero, l_numeros))  ### map takes two arguments: a function and an iterable (list, tuple, array)\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "ckT2g4CcgNKV" + }, + "source": [ + "map(quadrado_do_numero, l_numeros)  ## without list(), map returns a lazy map object (an iterator) instead of the computed values" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5tq8QDSPWNf6" + }, + "source": [ + "OR..." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ZAfkybybWOcG" + }, + "source": [ + "for i in map(quadrado_do_numero, l_numeros):\n", + "    print(i)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c01V5CEzWlGF" + }, + "source": [ + "## Example 2\n", + "> Replace every True value in the list below with 1 and every False with 0."
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "qH1ackDZWvKp" + }, + "source": [ + "import random\n", + "\n", + "l_dados = []\n", + "for i in range(50):\n", + "    random.seed(i)\n", + "    l_dados.append(random.choice([True, False]))\n", + "\n", + "l_dados" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Dt2UKC-WXsxr" + }, + "source": [ + "def substituir_true(s_String):\n", + "    if s_String == True:\n", + "        return 1\n", + "    else:\n", + "        return 0" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "BIIkPuDEXaM0" + }, + "source": [ + "list(map(substituir_true, l_dados))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TzkLIH1gYpFQ" + }, + "source": [ + "___\n", + "# **filter()**\n", + "* Filters elements based on a condition." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cjU8YznfZai1" + }, + "source": [ + "Suppose we now want to keep only the True items of the list l_dados."
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "a3SeaKJgZlAZ" + }, + "source": [ + "def filtrar_true(item):\n", + "    if item == True:\n", + "        return True\n", + "    else:\n", + "        return False" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "1Z1APDQtZyXs" + }, + "source": [ + "list(filter(filtrar_true, l_dados))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xPpFqVUnKEH7" + }, + "source": [ + "___\n", + "# **EXERCISES**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RDgCRPRs0W6C" + }, + "source": [ + "## Exercise 1\n", + "Write a function that returns the day of the week for a given number, where:\n", + "\n", + "* 1 - Sun\n", + "* 2 - Mon\n", + "* 3 - Tue\n", + "* 4 - Wed\n", + "* 5 - Thu\n", + "* 6 - Fri\n", + "* 7 - Sat" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "H17JO6sLOrG7" + }, + "source": [ + "### My solution" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "wX_7XDyB0XSy" + }, + "source": [ + "def dia_da_semana(dia):\n", + "    d_palavra = {1: 'Sunday',\n", + "                 2: 'Monday',\n", + "                 3: 'Tuesday',\n", + "                 4: 'Wednesday',\n", + "                 5: 'Thursday',\n", + "                 6: 'Friday',\n", + "                 7: 'Saturday'}\n", + "    return d_palavra.get(dia, \"Invalid day of the week. Enter a number from 1 to 7\")" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "39toyCRU1Q5T" + }, + "source": [ + "dia_da_semana(1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "wt5hQq__1UEd" + }, + "source": [ + "dia_da_semana(0)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "N53NOsZjOv9m" + }, + "source": [ + "## Exercise 2\n", + "* Write a function that returns True if s_palavra occurs in a string and False otherwise."
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "7j-HHsxFrX5t" + }, + "source": [ + "def palavra_está_string(s_palavra, s_string):\n", + "    if s_palavra in s_string:\n", + "        return True\n", + "    else:\n", + "        return False\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "EOTAE-oMrYJW" + }, + "source": [ + "s_string = 'O amor é o fogo que arde sem se ver. É ferida que dói e não se sente. É um contentamento descontente. É dor que desatina sem doer'\n", + "s_string" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "tfzOT6w0thZg" + }, + "source": [ + "s_palavra = 'fogo'\n", + "palavra_está_string(s_palavra, s_string)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "xxunBr3ttnji" + }, + "source": [ + "" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vrBZ_68-PBWl" + }, + "source": [ + "### My solution:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "m4Pi4S8hPC_u" + }, + "source": [ + "def check_palavra(s_frase, s_palavra):\n", + "    if s_palavra in s_frase:\n", + "        return True\n", + "    else:\n", + "        return False" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NJeqwxDjPxub" + }, + "source": [ + "The sentence below was taken from [+ Bíblia + Camões + Legião Urbana - (Guerra) = Monte Castelo](http://compondoletras.blogspot.com/2013/11/biblia-camoes-legiao-urbana-guerra.html)" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Dj_n_beIPRBN" + }, + "source": [ + "s_frase = 'O amor é o fogo que arde sem se ver. É ferida que dói e não se sente. É um contentamento descontente. 
É dor que desatina sem doer'\n", + "s_frase" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "s40FJ9iCPPY0" + }, + "source": [ + "s_palavra = 'fogo'" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tzc2eaM7QUFE" + }, + "source": [ + "Does the word s_palavra occur in s_frase?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "2tlravrMQXn2" + }, + "source": [ + "check_palavra(s_frase, s_palavra)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "XFBVXsW_rVG2" + }, + "source": [ + "" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pMx9E0xMu1lc" + }, + "source": [ + "## Exercise 3\n", + "For more exercises involving functions, see [Python functions - Exercises, Practice, Solution](https://www.w3resource.com/python-exercises/python-functions-exercises.php)."
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Mw6Wg5hFvFMR" + }, + "source": [ + "" + ], + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file From 5c20b86b727ad16f8b875d54bc8b407001a65315 Mon Sep 17 00:00:00 2001 From: MariaJacobs70 <72224154+MariaJacobs70@users.noreply.github.com> Date: Wed, 21 Oct 2020 07:34:55 -0300 Subject: [PATCH 8/9] Criado usando o Colaboratory --- ...0_04__3DP_4_Anomaly_Detection mexido.ipynb | 4445 +++++++++++++++++ 1 file changed, 4445 insertions(+) create mode 100644 Notebooks/NB10_04__3DP_4_Anomaly_Detection mexido.ipynb diff --git a/Notebooks/NB10_04__3DP_4_Anomaly_Detection mexido.ipynb b/Notebooks/NB10_04__3DP_4_Anomaly_Detection mexido.ipynb new file mode 100644 index 000000000..43cdff8b9 --- /dev/null +++ b/Notebooks/NB10_04__3DP_4_Anomaly_Detection mexido.ipynb @@ -0,0 +1,4445 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "NB10_04__3DP_4_Anomaly_Detection.ipynb", + "provenance": [], + "collapsed_sections": [], + "include_colab_link": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "EAqSDJGzyYrx" + }, + "source": [ + "

# **3DP_4 - ANOMALY/OUTLIER DETECTION**\n", + "OUTLIER ANALYSIS\n", + "\n", + "The code 3DP_4 means this is step 3 of the process, Data Preparation (DP), and 4 is the notebook's order within that step.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "H-VrOjTTymSK" + }, + "source": [ + "# **AGENDA**:\n", + "\n", + "> See the **Table of contents**." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wSAsbafemNax" + }, + "source": [ + "# **Improvements for this session**\n", + "* Show the plots of the Anomaly Score region side by side with the probability distribution of the variables involved.\n", + "* Deprecation messages --> review and replace the deprecated methods and functions.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7qK6Yx0tBqUz" + }, + "source": [ + "___\n", + "# **References**\n", + "* [Comparing anomaly detection algorithms for outlier detection on toy datasets](https://scikit-learn.org/stable/auto_examples/plot_anomaly_comparison.html#sphx-glr-auto-examples-plot-anomaly-comparison-py)\n", + "* [Outlier detection with several methods](https://scikit-learn.org/0.18/auto_examples/covariance/plot_outlier_detection.html)\n", + "* [anomaly-detection-resources](https://github.com/MathMachado/anomaly-detection-resources)\n", + "* [Outlier Detection with Extended Isolation Forest](https://towardsdatascience.com/outlier-detection-with-extended-isolation-forest-1e248a3fe97b)\n", + "* [Outlier Detection with Isolation Forest](https://towardsdatascience.com/outlier-detection-with-isolation-forest-3d190448d45e)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f7tTnUJ6B2UG" + }, + "source": [ + "___\n", + "## What is Anomaly Detection?\n", + "> An anomaly (outlier) is any point/observation that is unusual when compared with all the other points/observations. Anomaly detection is the task of finding such points."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7VJZf1U5Ds_w" + }, + "source": [ + "___\n", + "# **Machine Learning with Python (Scikit-Learn)**\n", + "\n", + "![Scikit-Learn](https://github.com/MathMachado/Materials/blob/master/scikit-learn-1.png?raw=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rpHJ1qVUEwOn" + }, + "source": [ + "___\n", + "# **Traditional techniques for outlier detection**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OOI_VTo3E3sv" + }, + "source": [ + "## Boxplot\n", + "\n", + "![BoxPlot](https://github.com/MathMachado/Materials/blob/master/boxplot.png?raw=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vivFsmJGFVC0" + }, + "source": [ + "## Z-Score\n", + "* The Z-Score can be used to detect outliers.\n", + "* It is the difference between the value and the sample mean, expressed as a number of standard deviations.\n", + "* If the z-score is below -2.5 or above 2.5, the value lies in the extreme tails: for approximately normal data, only about 1.2% of values fall that far out (roughly 0.6% in each tail). 
However, it is common practice to use 3 instead of 2.5 as the threshold.\n", + "\n", + "![Z_Score](https://github.com/MathMachado/Materials/blob/master/Z_Score.png?raw=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hUw6W3SSFiwj" + }, + "source": [ + "## IQR Score\n", + "\n", + "* The interquartile range (IQR) is a measure of statistical dispersion equal to the difference between the 75th and 25th percentiles, that is, between the upper and lower quartiles: IQR = Q3 - Q1.\n", + "* A common rule flags as outliers the values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR.\n", + "\n", + "![BoxPlot](https://github.com/MathMachado/Materials/blob/master/boxplot.png?raw=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7_YohlTIF8zi" + }, + "source": [ + "___\n", + "# **Hands-On**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OrXdGg8t0V_D" + }, + "source": [ + "## Load the required libraries" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "7pYqwxIe1Hcq", + "outputId": "a5238044-16ea-47c3-c2d5-ef8054a493aa", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 663 + } + }, + "source": [ + "!pip install pyod" + ], + "execution_count": 1, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Collecting pyod\n", + "  Downloading https://files.pythonhosted.org/packages/a3/4b/d2edd1e85b132d480feced17f044267b3e330391240779d78b1c3d378b24/pyod-0.8.3.tar.gz (96kB)\n", + "Collecting combo\n", + "  Downloading 
https://files.pythonhosted.org/packages/0a/2a/61b6ac584e75d8df16dc27962aa5fe99d76b09da5b6710e83d4862c84001/combo-0.1.1.tar.gz\n", + "Requirement already satisfied: joblib in /usr/local/lib/python3.6/dist-packages (from pyod) (0.16.0)\n", + "Requirement already satisfied: matplotlib in /usr/local/lib/python3.6/dist-packages (from pyod) (3.2.2)\n", + "Requirement already satisfied: numpy>=1.13 in /usr/local/lib/python3.6/dist-packages (from pyod) (1.18.5)\n", + "Requirement already satisfied: numba>=0.35 in /usr/local/lib/python3.6/dist-packages (from pyod) (0.48.0)\n", + "Requirement already satisfied: pandas>=0.25 in /usr/local/lib/python3.6/dist-packages (from pyod) (1.1.2)\n", + "Requirement already satisfied: scipy>=0.19.1 in /usr/local/lib/python3.6/dist-packages (from pyod) (1.4.1)\n", + "Requirement already satisfied: scikit_learn>=0.19.1 in /usr/local/lib/python3.6/dist-packages (from pyod) (0.22.2.post1)\n", + "Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from pyod) (1.15.0)\n", + "Requirement already satisfied: statsmodels in /usr/local/lib/python3.6/dist-packages (from pyod) (0.10.2)\n", + "Collecting suod\n", + "  Downloading https://files.pythonhosted.org/packages/a1/87/9170cabe1b5e10a7d095c0e28f2e30e7c1886a13f063de85d3cfacc06f4b/suod-0.0.4.tar.gz (2.1MB)\n", + "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.6/dist-packages (from matplotlib->pyod) (0.10.0)\n", + "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->pyod) (2.8.1)\n", + "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->pyod) (1.2.0)\n", + "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->pyod) (2.4.7)\n", + "Requirement already satisfied: setuptools 
in /usr/local/lib/python3.6/dist-packages (from numba>=0.35->pyod) (50.3.0)\n", + "Requirement already satisfied: llvmlite<0.32.0,>=0.31.0dev0 in /usr/local/lib/python3.6/dist-packages (from numba>=0.35->pyod) (0.31.0)\n", + "Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.6/dist-packages (from pandas>=0.25->pyod) (2018.9)\n", + "Requirement already satisfied: patsy>=0.4.0 in /usr/local/lib/python3.6/dist-packages (from statsmodels->pyod) (0.5.1)\n", + "Building wheels for collected packages: pyod, combo, suod\n", + "  Building wheel for pyod (setup.py) ... done\n", + "  Created wheel for pyod: filename=pyod-0.8.3-cp36-none-any.whl size=110349 sha256=8a3d12b45521f4215a1a25ab5641035110c2e231b63217ef8d77d5f4b37e9f9e\n", + "  Stored in directory: /root/.cache/pip/wheels/29/46/95/86facd235cce1d58ae6747ab1aea2b3742564325a66a60863a\n", + "  Building wheel for combo (setup.py) ... done\n", + "  Created wheel for combo: filename=combo-0.1.1-cp36-none-any.whl size=42113 sha256=98b1e51b50dfc241776ed56aec23e7bbe693938654b828ea7a7720fcf4ec3c0f\n", + "  Stored in directory: /root/.cache/pip/wheels/55/ec/e5/a2331372c676c467e70c6646e646edf6997d5c4905b8c0f5e6\n", + "  Building wheel for suod (setup.py) ... 
done\n", + "  Created wheel for suod: filename=suod-0.0.4-cp36-none-any.whl size=2167158 sha256=1caa776362bcbb19af861f1d849b5885230a55792d5d4ac1ad0ec6df9433d7d0\n", + "  Stored in directory: /root/.cache/pip/wheels/57/55/e5/a4fca65bba231f6d0115059b589148774b41faea25b3f2aa27\n", + "Successfully built pyod combo suod\n", + "Installing collected packages: combo, suod, pyod\n", + "Successfully installed combo-0.1.1 pyod-0.8.3 suod-0.0.4\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "gxBgvhA4mowO" + }, + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "from numpy import percentile\n", + "import matplotlib.pyplot as plt\n", + "import seaborn as sns\n", + "import matplotlib\n", + "\n", + "from sklearn.ensemble import IsolationForest\n", + "\n", + "# Scaling variables\n", + "from sklearn.preprocessing import StandardScaler\n", + "from sklearn.preprocessing import MinMaxScaler\n", + "\n", + "from pyod.models.abod import ABOD  ### useful for outlier detection\n", + "from pyod.models.cblof import CBLOF\n", + "#from pyod.models.feature_bagging import FeatureBagging\n", + "from pyod.models.hbos import HBOS\n", + "from pyod.models.iforest import IForest  ### useful for outlier detection\n", + "from pyod.models.knn import KNN\n", + "#from pyod.models.lof import LOF\n", + "from scipy import stats\n", + "\n", + "# remove warnings to keep notebook clean\n", + "import warnings\n", + "warnings.filterwarnings('ignore')" + ], + "execution_count": 2, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WLf_c29t0ekj" + }, + "source": [ + "## Load the dataframe" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "YL_VQljA0gxZ", + "outputId": "35857212-e5e8-4065-a37f-e83958c34819", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 204 + } + }, + "source": [ + "df_titanic = sns.load_dataset('titanic')  ### load_dataset comes from the seaborn (sns) library\n", + "df_titanic = df_titanic.dropna()  ## this is a simplification; dropping every row with missing values is not the ideal approach\n", + "df_titanic.head()" + ], + "execution_count": 3, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr><th></th><th>survived</th><th>pclass</th><th>sex</th><th>age</th><th>sibsp</th><th>parch</th><th>fare</th><th>embarked</th><th>class</th><th>who</th><th>adult_male</th><th>deck</th><th>embark_town</th><th>alive</th><th>alone</th></tr>\n", + "  </thead>\n", + "  <tbody>\n", + 
"    <tr><th>1</th><td>1</td><td>1</td><td>female</td><td>38.0</td><td>1</td><td>0</td><td>71.2833</td><td>C</td><td>First</td><td>woman</td><td>False</td><td>C</td><td>Cherbourg</td><td>yes</td><td>False</td></tr>\n", + 
"    <tr><th>3</th><td>1</td><td>1</td><td>female</td><td>35.0</td><td>1</td><td>0</td><td>53.1000</td><td>S</td><td>First</td><td>woman</td><td>False</td><td>C</td><td>Southampton</td><td>yes</td><td>False</td></tr>\n", + 
"    <tr><th>6</th><td>0</td><td>1</td><td>male</td><td>54.0</td><td>0</td><td>0</td><td>51.8625</td><td>S</td><td>First</td><td>man</td><td>True</td><td>E</td><td>Southampton</td><td>no</td><td>True</td></tr>\n", + 
"    <tr><th>10</th><td>1</td><td>3</td><td>female</td><td>4.0</td><td>1</td><td>1</td><td>16.7000</td><td>S</td><td>Third</td><td>child</td><td>False</td><td>G</td><td>Southampton</td><td>yes</td><td>False</td></tr>\n", + 
"    <tr><th>11</th><td>1</td><td>1</td><td>female</td><td>58.0</td><td>0</td><td>0</td><td>26.5500</td><td>S</td><td>First</td><td>woman</td><td>False</td><td>C</td><td>Southampton</td><td>yes</td><td>True</td></tr>\n", + 
"  </tbody>\n", + "</table>\n", + "</div>" + ], + "text/plain": [ + " survived pclass sex age ... deck embark_town alive alone\n", + "1 1 1 female 38.0 ... C Cherbourg yes False\n", + "3 1 1 female 35.0 ... C Southampton yes False\n", + "6 0 1 male 54.0 ... E Southampton no True\n", + "10 1 3 female 4.0 ... G Southampton yes False\n", + "11 1 1 female 58.0 ... C Southampton yes True\n", + "\n", + "[5 rows x 15 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 3 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Q2oxyyQWB-uz" + }, + "source": [ + "# Standardize the variables 'age' and 'fare' (zero mean, unit variance)\n", + "df_titanic_ss = df_titanic.copy()\n", + "df_titanic_ss[['fare', 'age']] = StandardScaler().fit_transform(df_titanic_ss[['fare','age']])" + ], + "execution_count": 6, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "rAKnKtil9Oz1", + "outputId": "99989e0f-a80b-4eda-d0b9-6016114dca67", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Shape (rows, columns) of df_titanic_ss\n", + "df_titanic_ss.shape" + ], + "execution_count": 7, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(182, 15)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 7 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "sHSYUkEQFIwS" + }, + "source": [ + "# Function to draw the boxplot\n", + "def boxplot_sobreviveu(df, column):\n", + "    plt.rcdefaults()\n", + "    sns.catplot(x = 'survived', y = column, kind = \"box\", data = df, height = 4, aspect = 1.5)\n", + "\n", + "    # add data points to boxplot with stripplot\n", + "    sns.stripplot(x = 'survived', y = column, data = df, alpha = 0.3, jitter = 0.2, color = 'k');\n", + "    plt.show()" + ], + "execution_count": 8, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "o9-VgcNnFNb1", + "outputId": "9f5cd781-1066-43ff-dfb9-4167ac817895", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 426 + } + }, + "source": [ + "boxplot_sobreviveu(df_titanic, 'fare')  ## univariate version\n", + "## Who paid the higher fare, those who died or those who survived? The survivors (survived = 1)\n", + "## the orange box is taller and its median is also higher\n", + "## looking at the outliers we can say that, " + ], + "execution_count": 9, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "8FIo9tD1FQ0u", + "outputId": "b8761d55-6258-405e-934f-97d477538e9b", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 426 + } + }, + "source": [ + "boxplot_sobreviveu(df_titanic, 'age')" + ], + "execution_count": 10, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "fCqj102y9Kfo", + "outputId": "e3580cba-ce0b-4a2b-cdaf-e3421d9ef741", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 170 + } + }, + "source": [ + "# Describe the 'fare' variable of the dataframe\n", + "df_titanic_ss['fare'].describe()" + ], + "execution_count": 11, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "count 1.820000e+02\n", + "mean 2.537653e-16\n", + "std 1.002759e+00\n", + "min -1.034601e+00\n", + "25% -6.452479e-01\n", + "50% -2.873576e-01\n", + "75% 1.452571e-01\n", + "max 5.681797e+00\n", + "Name: fare, dtype: float64" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 11 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "SMcvIb1K_69n", + "outputId": "975fc065-ffdd-4e67-f3c0-103d02276998", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 472 + } + }, + "source": [ + "plt.scatter(range(df_titanic_ss.shape[0]), np.sort(df_titanic_ss['fare'].values))  ## this uses the standardized dataframe\n", + "plt.xlabel('index')\n", + "plt.ylabel('Fares')\n", + "plt.title(\"Distribution of the Fare variable\")\n", + "\n", + "sns.despine()" + ], + "execution_count": 12, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "NCcXnHHYIlM4", + "outputId": "c49ba369-ac5a-47b2-85a0-b25cffdfe4fe", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 284 + } + }, + "source": [ + "df_titanic_ss.describe()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
survivedpclassagesibspparchfare
count182.000000182.0000001.820000e+02182.000000182.0000001.820000e+02
mean0.6758241.1923081.464030e-170.4670330.4780222.537653e-16
std0.4693570.5164111.002759e+000.6450070.7558691.002759e+00
min0.0000001.000000-2.220506e+000.0000000.000000-1.034601e+00
25%0.0000001.000000-7.437173e-010.0000000.000000-6.452479e-01
50%1.0000001.0000002.411064e-020.0000000.000000-2.873576e-01
75%1.0000001.0000007.759421e-011.0000001.0000001.452571e-01
max1.0000003.0000002.839480e+003.0000004.0000005.681797e+00
\n", + "
" + ], + "text/plain": [ + " survived pclass ... parch fare\n", + "count 182.000000 182.000000 ... 182.000000 1.820000e+02\n", + "mean 0.675824 1.192308 ... 0.478022 2.537653e-16\n", + "std 0.469357 0.516411 ... 0.755869 1.002759e+00\n", + "min 0.000000 1.000000 ... 0.000000 -1.034601e+00\n", + "25% 0.000000 1.000000 ... 0.000000 -6.452479e-01\n", + "50% 1.000000 1.000000 ... 0.000000 -2.873576e-01\n", + "75% 1.000000 1.000000 ... 1.000000 1.452571e-01\n", + "max 1.000000 3.000000 ... 4.000000 5.681797e+00\n", + "\n", + "[8 rows x 6 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 11 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "7pzTvLleGpWc", + "outputId": "15cc9af9-d48d-4354-b409-26a0b3446f7b", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 472 + } + }, + "source": [ + "# Distribuição da variável 'fare'\n", + "\n", + "sns.distplot(df_titanic_ss['fare'])\n", + "plt.title(\"Distribuição da variável Fare\")\n", + "sns.despine()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Qa28Hc3ZC6FV" + }, + "source": [ + "___\n", + "## Kurtosis\n", + "> Kurtosis is a statistical measure of how strongly the tails of a distribution differ from the tails of a Normal distribution. In other words, kurtosis indicates whether the tails of a given distribution contain extreme values.\n", + ">> The kurtosis of a Normal distribution is 3. The quantity Kurtosis - 3 is called excess kurtosis; a positive value means the tails are heavier than those of a Normal distribution.\n", + ">>> **High kurtosis indicates that the data have heavy tails or outliers**.\n", + "\n", + "* **Very important tip**: Normalize the data first!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ynyNHZqmD-tb" + }, + "source": [ + "___\n", + "## Skewness\n", + "> Skewness is the degree of distortion of a distribution: it measures the lack of symmetry in the data by contrasting the extreme values in one tail with those in the other. A symmetric distribution has a skewness of 0.\n", + "\n", + "![Skewness](https://github.com/MathMachado/Materials/blob/master/Skewness.png?raw=true)\n", + "\n", + "Source: [Skew and Kurtosis: 2 Important Statistics terms you need to know in Data Science](https://codeburst.io/2-important-statistics-terms-you-need-to-know-in-data-science-skewness-and-kurtosis-388fef94eeaa)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Uoo3xVhBFixi" + }, + "source": [ + "### Interpreting Skewness (Rule of Thumb)\n", + "* If -0.5 < Skewness < 0.5: the data are fairly symmetric;\n", + "* If -1 < Skewness < -0.5: the data are moderately negatively skewed;\n", + "* If 0.5 < Skewness < 1: the data are moderately positively skewed;\n", + "* If Skewness < -1: the data are highly negatively skewed;\n", + "* If Skewness > 1: the data are highly positively skewed.\n", + "\n", + "> **Tip**: Normalize the data first!" 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "oHg3nyjUTiRu", + "outputId": "eaacbc04-e67b-4aa6-fec1-5d9cc97fa449", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 50 + } + }, + "source": [ + "# Compute Skewness and Kurtosis for 'fare' (pandas .kurt() returns excess kurtosis, so a Normal distribution scores 0)\n", + "print(f\"Skewness: {df_titanic_ss['fare'].skew()}\")\n", + "print(f\"Kurtosis: {df_titanic_ss['fare'].kurt()}\")" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Skewness: 2.7073683146429004\n", + "Kurtosis: 10.690697893681472\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "V2jCZLGVH3Qu" + }, + "source": [ + "Looking at the Skewness and Kurtosis values just above, what is the conclusion?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "0nnFS8vi_rOe", + "outputId": "dcd9091c-3c1d-4388-caaf-efb28f8eba55", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 170 + } + }, + "source": [ + "# Summary statistics for the 'age' column\n", + "df_titanic_ss['age'].describe()" + ], + "execution_count": 13, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "count 1.820000e+02\n", + "mean 1.464030e-17\n", + "std 1.002759e+00\n", + "min -2.220506e+00\n", + "25% -7.437173e-01\n", + "50% 2.411064e-02\n", + "75% 7.759421e-01\n", + "max 2.839480e+00\n", + "Name: age, dtype: float64" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 13 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "h9ZmvO1b_4sF", + "outputId": "a5ca3bfd-2d10-4f88-88c0-01d58d543c16", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 472 + } + }, + "source": [ + "plt.scatter(range(df_titanic_ss.shape[0]), np.sort(df_titanic_ss['age'].values))\n", + "plt.xlabel('index')\n", + "plt.ylabel('age')\n", + "plt.title(\"Distribuição da variável age\")\n", + "sns.despine()" + ], + "execution_count": 14, + "outputs": [ + { + "output_type": "display_data", + 
"data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "GIAYrDJyCT6r", + "outputId": "7355ec62-b4d1-4586-d92c-3f7f2b16f222", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 472 + } + }, + "source": [ + "sns.distplot(df_titanic_ss['age'])\n", + "plt.title(\"Distribuição da variável age\")\n", + "sns.despine()" + ], + "execution_count": 15, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "USy48-H2UXqB", + "outputId": "3abad211-8b21-4a78-ea46-0e3587866c92", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 51 + } + }, + "source": [ + "# Compute Skewness and Kurtosis for 'age' (pandas .kurt() returns excess kurtosis, so a Normal distribution scores 0)\n", + "print(f\"Skewness: {df_titanic_ss['age'].skew()}\")\n", + "print(f\"Kurtosis: {df_titanic_ss['age'].kurt()}\")" + ], + "execution_count": 16, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Skewness: 0.01841894050949496\n", + "Kurtosis: -0.2309427735598728\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ENQaVw2lItVL" + }, + "source": [ + "Looking at the Skewness and Kurtosis values just above, what is the conclusion?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Nt0PQIjW-wXd" + }, + "source": [ + "___\n", + "## **Isolation Forest Region**\n", + "* Source: [Outlier Detection with Isolation Forest](https://towardsdatascience.com/outlier-detection-with-isolation-forest-3d190448d45e)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tM6Xht76KmUN" + }, + "source": [ + "### Anomaly Detection for 'fare'" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "uFuAUh5S778M", + "outputId": "7ea4f291-c983-4736-b0c7-2b18a65c17e8", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 388 + } + }, + "source": [ + "isolation_forest = IsolationForest(n_estimators = 100)\n", + "isolation_forest.fit(df_titanic['fare'].values.reshape(-1, 1))\n", + "xx = np.linspace(df_titanic['fare'].min(), df_titanic['fare'].max(), len(df_titanic)).reshape(-1, 1)\n", + "anomaly_score = isolation_forest.decision_function(xx)\n", + "outlier = isolation_forest.predict(xx)\n", + "plt.figure(figsize = (10, 4))\n", + "plt.plot(xx, anomaly_score, label = 'anomaly score')\n", + "plt.fill_between(xx.T[0], np.min(anomaly_score), np.max(anomaly_score), where = outlier 
== -1, color = 'r', alpha = .4, label = 'outlier region')\n", + "plt.legend()\n", + "plt.ylabel('anomaly score')\n", + "plt.xlabel('fare')\n", + "plt.show();" + ], + "execution_count": 17, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "FkhRwo1cgYtK", + "outputId": "771f93e4-9d22-41cf-d6f0-6ee611326a47", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 195 + } + }, + "source": [ + "# Inspect the rows with fare > 200, for example\n", + "df_titanic.loc[df_titanic['fare'] > 200].head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
survivedpclasssexagesibspparchfareembarkedclasswhoadult_maledeckembark_townalivealone
2701male19.032263.0000SFirstmanTrueCSouthamptonnoFalse
8811female23.032263.0000SFirstwomanFalseCSouthamptonyesFalse
11801male24.001247.5208CFirstmanTrueBCherbourgnoFalse
29911female50.001247.5208CFirstwomanFalseBCherbourgyesFalse
31111female18.022262.3750CFirstwomanFalseBCherbourgyesFalse
\n", + "
" + ], + "text/plain": [ + " survived pclass sex age ... deck embark_town alive alone\n", + "27 0 1 male 19.0 ... C Southampton no False\n", + "88 1 1 female 23.0 ... C Southampton yes False\n", + "118 0 1 male 24.0 ... B Cherbourg no False\n", + "299 1 1 female 50.0 ... B Cherbourg yes False\n", + "311 1 1 female 18.0 ... B Cherbourg yes False\n", + "\n", + "[5 rows x 15 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 19 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "XFbRlmrYgtTS", + "outputId": "c59e564a-392c-4d63-8743-08f9ca4300b2", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 286 + } + }, + "source": [ + "# Zoom in on row 27\n", + "df_titanic.loc[27]" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "survived 0\n", + "pclass 1\n", + "sex male\n", + "age 19\n", + "sibsp 3\n", + "parch 2\n", + "fare 263\n", + "embarked S\n", + "class First\n", + "who man\n", + "adult_male True\n", + "deck C\n", + "embark_town Southampton\n", + "alive no\n", + "alone False\n", + "Name: 27, dtype: object" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 20 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bH4o-CL-N9Np" + }, + "source": [ + "The region where data points are unlikely to occur lies on the right-hand side of the distribution." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7HK9cBvwGOqG" + }, + "source": [ + "### Anomaly Detection para 'age'" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "PoDzs4DTFSY-", + "outputId": "21bdf5d8-71fd-449f-d644-2f8ef94c2605", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 388 + } + }, + "source": [ + "isolation_forest = IsolationForest(n_estimators = 100)\n", + "isolation_forest.fit(df_titanic['age'].values.reshape(-1, 1))\n", + "xx = np.linspace(df_titanic['age'].min(), df_titanic['age'].max(), len(df_titanic)).reshape(-1, 1)\n", + "anomaly_score = isolation_forest.decision_function(xx)\n", + "outlier = isolation_forest.predict(xx)\n", + "plt.figure(figsize = (10, 4))\n", + "plt.plot(xx, anomaly_score, label='anomaly score')\n", + "plt.fill_between(xx.T[0], np.min(anomaly_score), np.max(anomaly_score), where = outlier == -1, color = 'r', alpha = .4, label = 'outlier region')\n", + "plt.legend()\n", + "plt.ylabel('anomaly score')\n", + "plt.xlabel('age')\n", + "plt.show();" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GivF2cSFS208" + }, + "source": [ + "Note in the plot above that there are two regions where data points are unlikely to occur: one on the left-hand side of the distribution and another on the right-hand side." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "XtizVySOlPUT", + "outputId": "0b387228-ac36-4617-9e06-a231e7b0fbdf", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 195 + } + }, + "source": [ + "# Inspect the data in the left tail\n", + "df_titanic.loc[df_titanic['age'] < 15].head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
survivedpclasssexagesibspparchfareembarkedclasswhoadult_maledeckembark_townalivealone
1013female4.01116.7000SThirdchildFalseGSouthamptonyesFalse
18312male1.02139.0000SSecondchildFalseFSouthamptonyesFalse
19312male3.01126.0000SSecondchildFalseFSouthamptonyesFalse
20503female2.00110.4625SThirdchildFalseGSouthamptonnoFalse
29701female2.012151.5500SFirstchildFalseCSouthamptonnoFalse
\n", + "
" + ], + "text/plain": [ + " survived pclass sex age ... deck embark_town alive alone\n", + "10 1 3 female 4.0 ... G Southampton yes False\n", + "183 1 2 male 1.0 ... F Southampton yes False\n", + "193 1 2 male 3.0 ... F Southampton yes False\n", + "205 0 3 female 2.0 ... G Southampton no False\n", + "297 0 1 female 2.0 ... C Southampton no False\n", + "\n", + "[5 rows x 15 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 22 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "YGnZlzDDlyZO", + "outputId": "d5e0eb2c-2637-4021-f6e7-885b3e74161d", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 286 + } + }, + "source": [ + "# Zoom in on row 10\n", + "df_titanic.loc[10]" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "survived 1\n", + "pclass 3\n", + "sex female\n", + "age 4\n", + "sibsp 1\n", + "parch 1\n", + "fare 16.7\n", + "embarked S\n", + "class Third\n", + "who child\n", + "adult_male False\n", + "deck G\n", + "embark_town Southampton\n", + "alive yes\n", + "alone False\n", + "Name: 10, dtype: object" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 23 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "YVhBJua_sG-u", + "outputId": "2b9f3aad-9f81-41e6-882d-cf5b3c0a0d02", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 136 + } + }, + "source": [ + "# Inspect the data in the right tail\n", + "df_titanic.loc[df_titanic['age'] > 65].head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
survivedpclasssexagesibspparchfareembarkedclasswhoadult_maledeckembark_townalivealone
9601male71.00034.6542CFirstmanTrueACherbourgnoTrue
63011male80.00030.0000SFirstmanTrueASouthamptonyesTrue
74501male70.01171.0000SFirstmanTrueBSouthamptonnoFalse
\n", + "
" + ], + "text/plain": [ + " survived pclass sex age ... deck embark_town alive alone\n", + "96 0 1 male 71.0 ... A Cherbourg no True\n", + "630 1 1 male 80.0 ... A Southampton yes True\n", + "745 0 1 male 70.0 ... B Southampton no False\n", + "\n", + "[3 rows x 15 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 24 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "LRkUWSletcq-", + "outputId": "6fb001bd-c1d8-407b-ffd3-15ea7405dda0", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 286 + } + }, + "source": [ + "# Zoom in on row 96\n", + "df_titanic.loc[96]" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "survived 0\n", + "pclass 1\n", + "sex male\n", + "age 71\n", + "sibsp 0\n", + "parch 0\n", + "fare 34.6542\n", + "embarked C\n", + "class First\n", + "who man\n", + "adult_male True\n", + "deck A\n", + "embark_town Cherbourg\n", + "alive no\n", + "alone True\n", + "Name: 96, dtype: object" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 25 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "JQKECo0BSefE", + "outputId": "46a0911a-3cf8-43d1-8232-f05c710e54dc", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 454 + } + }, + "source": [ + "sns.regplot(x = \"age\", y = \"fare\", data = df_titanic_ss)\n", + "sns.despine()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "AChZpGY4Ghc9", + "outputId": "eed40cf6-a0e9-438c-e674-4bac462b872f", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 195 + } + }, + "source": [ + "cols = ['fare', 'age']\n", + "df_titanic_ss[cols].head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
fareage
1-0.1001100.152082
3-0.338485-0.039875
6-0.3547081.175852
10-0.815672-2.023430
11-0.6865431.431795
\n", + "
" + ], + "text/plain": [ + " fare age\n", + "1 -0.100110 0.152082\n", + "3 -0.338485 -0.039875\n", + "6 -0.354708 1.175852\n", + "10 -0.815672 -2.023430\n", + "11 -0.686543 1.431795" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 27 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "s2tddgHcUiAF" + }, + "source": [ + "___\n", + "## **CBLOF - Cluster-based Local Outlier Factor**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "fbJ7k1bbbfr4" + }, + "source": [ + "# Rescale 'age' and 'fare' to the [0, 1] range with MinMaxScaler\n", + "df_titanic_ss = df_titanic.copy()\n", + "df_titanic_ss[['fare', 'age']] = MinMaxScaler().fit_transform(df_titanic_ss[['fare', 'age']])" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "il0LFdCFJEsw" + }, + "source": [ + "X1 = df_titanic_ss['age'].values.reshape(-1, 1)\n", + "X2 = df_titanic_ss['fare'].values.reshape(-1, 1)\n", + "X = np.concatenate((X1,X2), axis = 1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "QtBn0u7CKlS6", + "outputId": "16701bb5-dcf1-4335-878b-545575f79f73", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 755 + } + }, + "source": [ + "outliers_fraction = 0.01\n", + "xx , yy = np.meshgrid(np.linspace(0, 1, 100), np.linspace(0, 1, 100))\n", + "clf = CBLOF(contamination = outliers_fraction, check_estimator = False, random_state = 0)\n", + "clf.fit(X)\n", + "# predict raw anomaly score\n", + "scores_pred = clf.decision_function(X) * -1\n", + " \n", + "# prediction of a datapoint category outlier or inlier\n", + "y_pred = clf.predict(X)\n", + "n_inliers = len(y_pred) - np.count_nonzero(y_pred)\n", + "n_outliers = np.count_nonzero(y_pred == 1)\n", + "\n", + "plt.figure(figsize = (8, 8))\n", + "\n", + "df1 = df_titanic_ss\n", + "df1['outlier'] = y_pred.tolist()\n", + "\n", + "inliers_fare = np.array(df1['fare'][df1['outlier'] == 0]).reshape(-1,1)\n", + "inliers_age = 
np.array(df1['age'][df1['outlier'] == 0]).reshape(-1,1)\n", + " \n", + "outliers_fare = df1['fare'][df1['outlier'] == 1].values.reshape(-1,1)\n", + "outliers_age = df1['age'][df1['outlier'] == 1].values.reshape(-1,1)\n", + " \n", + "print('OUTLIERS:',n_outliers,'INLIERS:',n_inliers)\n", + " \n", + "# Use the threshold to classify a point as inlier or outlier\n", + "# threshold = stats.scoreatpercentile(scores_pred,100 * outliers_fraction)\n", + "threshold = percentile(scores_pred, 100 * outliers_fraction)\n", + " \n", + "# Compute the anomaly score over the grid\n", + "Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]) * -1\n", + "Z = Z.reshape(xx.shape)\n", + "\n", + "plt.contourf(xx, yy, Z, levels = np.linspace(Z.min(), threshold, 7), cmap = plt.cm.Blues_r)\n", + " \n", + "# Draw the red line where the anomaly score equals the threshold\n", + "a = plt.contour(xx, yy, Z, levels = [threshold], linewidths = 2, colors = 'red')\n", + " \n", + "# Orange region where threshold < anomaly score < max(anomaly score)\n", + "plt.contourf(xx, yy, Z, levels= [threshold, Z.max()], colors='orange')\n", + "# X was built as (age, fare), so age goes on the x-axis to match the contours\n", + "b = plt.scatter(inliers_age, inliers_fare, c = 'white', s = 20, edgecolor = 'k')\n", + " \n", + "c = plt.scatter(outliers_age, outliers_fare, c = 'black', s = 20, edgecolor = 'k')\n", + " \n", + "plt.axis('tight') \n", + "plt.legend([a.collections[0], b, c], ['learned decision function', 'inliers', 'outliers'],\n", + " prop = matplotlib.font_manager.FontProperties(size = 10), loc = 'upper center', frameon = False, bbox_to_anchor = (0.5, -0.05),\n", + " fancybox = True, shadow = True, ncol = 5)\n", + " \n", + "plt.xlim((0, 1))\n", + "plt.ylim((0, 1))\n", + "plt.title('Cluster-based Local Outlier Factor (CBLOF)')\n", + "plt.show()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "OUTLIERS: 2 INLIERS: 180\n" + ], + "name": "stdout" + }, + { + "output_type": "display_data", + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAqoAAALRCAYAAACTYIFoAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAPYQAAD2EBqD+naQAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdd3xT9f7H8VdSuii0UCgUkD2UKYqKzIIoiAy3ArUMwXGRoV714gRx/RQFZInrgt5et96ryJQh4L6KiAMtlFZwALLKKNB1fn+EpA1dSZvknKTv5+PRB+TkJPmmSZt3P+f7/RybYRgGIiIiIiIWYzd7ACIiIiIiJVFQFRERERFLUlAVEREREUtSUBURERERS1JQFRERERFLUlAVEREREUtSUBURERERS1JQFRERERFLUlAVEREREUtSUJUqp1mzZowePdrsYfjM6NGjqVGjhtnD8JrNZmPatGlmD6PC+vTpQ58+fcweRqlOf59//PHH2Gw2Pv74Y9PGFGreeust4uPjOXr0qNlD8avc3FwaN27MggULzB6KVEEKqhIy0tPTueWWW2jRogVRUVHExsbSo0cPnn32WY4fPx6QMWRnZzNt2jSFAR/JzMzEZrPx9NNPmz2USsnNzWXOnDmcf/751KxZkxo1anD++eczZ84ccnNzK3y/n332GdOmTePQoUM+HG3lOUNxSV/Dhg3z6WOZ9TOXn5/P1KlTmThxYrE/FPPz81m0aBF9+vQhPj6eyMhImjVrxpgxY/j6669d+y1evLjY96devXr07duX5cuXF3tMm83GhAkTyh3bjz/+yA033ECjRo2IjIykYcOGJCcn8+OPPxbbt6QxOL+mTJkCQHh4OHfeeSePPfYYJ06c8PZbJVIp1cwegIgvLF26lGuvvZbIyEhGjhxJhw4dyMnJ4ZNPPuHuu+/mxx9/5IUXXvD7OLKzs3n44YcBLF1tk8A5duwYgwYNYv369QwePJjRo0djt9tZsWIFkydP5r333mPp0qXExMR4fd+fffYZDz/8MKNHj6ZWrVpl7tu7d2+OHz9ORERERZ+K1yZNmsT555/vtq1Zs2Y+fQyzfuaWLFnCL7/8ws033+y2/fjx41x11VWsWLGC3r17c9999xEfH09mZiZvvfUWr7zyCjt37uSMM85w3Wb69Ok0b94cwzDYs2cPixcv5rLLLmPJkiUMHjzYq3G99957DB8+nPj4eMaOHUvz5s3JzMzk5Zdf5p133uGNN97gyiuvLHY75xiK6tChg+v/Y8aMYcqUKbz22mvceOONXo1JpDIUVCXoZWRkMGzYMJo2bcratWtp0KCB67rbbruN7du3s3TpUhNHWHnHjh2rUJAR8915552sX7+euXPnulXD/va3vzF//nwmTJjAXXfdxXPPPefXcdjtdqKionx2f568J3v16sU111zjs8cMpPKe36JFi+jRoweNGjVy23733XezYsUKZs2axe233+523dSpU5k1a1ax+xo4cCDnnXee6/LYsWOpX78+r7/+uldBNT09nZSUFFq0aMGGDRtISEhwXTd58mR69epFSkoKW7ZsoUWLFmWO4XS1atWif//+LF68WEFVAkqH/iXoPfXUUxw9epSXX37ZLaQ6tWrVismTJ5d6+2nTpmGz2Yptdx4Sy8zMdG37+uuvGTBgAHXr1iU6OprmzZu7fmlnZma6Phgefvhh1+GzovMwf/75Z6655hri4+OJiorivPPO44MPPijxcdevX8/48eOpV6+eW/WlNDt27GDAgAHExMTQsGFDpk+fjmEYbvs8/fTTdO/enTp16hAdHU2XLl145513it3XRx99RM+ePalVqxY1atTgzDPP5L777nPb5+TJk0ydOpVWrVoRGRlJ48aNueeeezh58mSx/e644w4SEhKoWbMmQ4cO5bfffiv3+Xhj7969rg/3qKgozj77bF555ZVi+xUUFPDss8/SsWNHoqKiSEhI4NJLL3U
7HLto0SIuuugi6tWrR2RkJO3atatwiPztt994+eWXueiii0o8ZHvbbbfRt29fXnrpJdf3xDndYfHixcX2L/p+mjZtGnfffTcAzZs3d73fir5fiyptjuqXX37JpZdeSlxcHNWrVycpKYlPP/3UbR/nz8hPP/3EiBEjqF27Nj179vTum1HEgQMHuOuuu+jYsSM1atQgNjaWgQMH8t133xXb98SJE0ybNo02bdoQFRVFgwYNuOqqq0hPT/foZ27t2rX06tWLmJgYatWqxeWXX87WrVsr9fxOnDjBihUruPjii922//bbbzz//PNccsklxUIqQFhYGHfddVe5P8+1atUiOjqaatW8qyXNmDGD7OxsXnjhBbeQClC3bl2ef/55jh07xlNPPeXV/TpdcsklfPLJJxw4cKBCtxepCFVUJegtWbKEFi1a0L17d78+zt69e+nfvz8JCQlMmTKFWrVqkZmZyXvvvQdAQkICzz33HH/729+48sorueqqqwDo1KkT4Jg35qzATJkyhZiYGN566y2uuOIK3n333WKH48aPH09CQgIPPfQQx44dK3Ns+fn5XHrppVx44YU89dRTrFixgqlTp5KXl8f06dNd+z377LMMHTqU5ORkcnJyeOONN7j22mv58MMPGTRokGucgwcPplOnTkyfPp3IyEi2b9/uFl4KCgoYOnQon3zyCTfffDNt27bl+++/Z9asWaSlpfHf//7Xte+4ceNITU1lxIgRdO/enbVr17oeyxeOHz9Onz592L59OxMmTKB58+a8/fbbjB49mkOHDrn9kTJ27FgWL17MwIEDGTduHHl5eWzcuJEvvvjCVU167rnnaN++PUOHDqVatWosWbKE8ePHU1BQwG233ebV2JYvX05+fj4jR44sdZ+RI0eybt06VqxYwbhx4zy+76uuuoq0tDRef/11Zs2aRd26dQGKBZSyrF27loEDB9KlSxemTp2K3W53BfWNGzdywQUXuO1/7bXX0rp1ax5//PFifwSV5MiRI+zbt89tW3x8PDt27OC///0v1157Lc2bN2fPnj08//zzJCUl8dNPP9GwYUPA8b4ePHgwa9asYdiwYUyePJkjR47w0Ucf8cMPP3DxxReX+TO3evVqBg4cSIsWLZg2bRrHjx9n7ty59OjRg02bNhWbhuDp8/vmm2/Iycnh3HPPddu+fPly8vLySElJKfd7U1RWVhb79u3DMAz27t3L3LlzOXr0KDfccINX97NkyRKaNWtGr169Sry+d+/eNGvWrMQjTM4xFOV8Tzl16dIFwzD47LPPvJ6SIFJhhkgQy8rKMgDj8ssv9/g2TZs2NUaNGuW6PHXqVKOkH4VFixYZgJGRkWEYhmH85z//MQDjf//7X6n3/ddffxmAMXXq1GLX9evXz+jYsaNx4sQJ17aCggKje/fuRuvWrYs9bs+ePY28vLxyn8+oUaMMwJg4caLb/Q4aNMiIiIgw/vrrL9f27Oxst9vm5OQYHTp0MC666CLXtlmzZhmA2+1O969//cuw2+3Gxo0b3bYvXLjQAIxPP/3UMAzD2Lx5swEY48ePd9tvxIgRpX6fisrIyDAAY8aMGaXuM3v2bAMwUlNT3Z5Xt27djBo1ahiHDx82DMMw1q5dawDGpEmTit1HQUGB6/+nf48MwzAGDBhgtGjRwm1bUlKSkZSUVOb4b7/9dgMwvv3221L32bRpkwEYd955p2EYhc950aJFxfY9/Xs2Y8YMt/doUae/z9etW2cAxrp16wzDcDzn1q1bGwMGDCj2/Js3b25ccsklrm3On5Hhw4eX+XxPf6ySvjIyMowTJ04Y+fn5brfJyMgwIiMjjenTp7u2/fOf/zQAY+bMmcUewznmsn7mOnfubNSrV8/Yv3+/a9t3331n2O12Y+TIkRV+fi+99JIBGN9//73b9jvuuKPc17so58/66V+RkZHG4sWLi+0PGLfddluJ93Xo0CGPfhcOHTrUAFw/F6WNoaTfiX/88YcBGE8++aR
Hz0/EF3ToX4La4cOHAahZs6bfH8u5WOXDDz/0eqX2gQMHWLt2Ldddd52ryrRv3z7279/PgAED2LZtG7///rvbbW666SbCwsI8foyih5adq4NzcnJYvXq1a3t0dLTr/wcPHiQrK4tevXqxadOmYs/z/fffp6CgoMTHevvtt2nbti1nnXWW67ns27ePiy66CIB169YBsGzZMsCxqKaokg6LVtSyZctITExk+PDhrm3h4eFMmjSJo0ePsn79egDeffddbDYbU6dOLXYfRad+FP0eOatMSUlJ7Nixg6ysLK/GduTIEaDs96fzOud7OVA2b97Mtm3bGDFiBPv373e9hseOHaNfv35s2LCh2Ot/6623evUYDz30EB999JHbV2JiIpGRkdjtjo+f/Px89u/f75piUvS9+O6771K3bl0mTpxY7L5Lmq5T1J9//snmzZsZPXo08fHxru2dOnXikksucb03K/L89u/fD0Dt2rXdtlf099H8+fNd35/U1FT69u3LuHHjXEdrPOHJe63o9ae/34qOwfl1OufzPb3yKuJPOvQvQS02NhYo/CXtT0lJSVx99dU8/PDDzJo1iz59+nDFFVcwYsQIIiMjy7zt9u3bMQyDBx98kAcffLDEffbu3eu2MKPoCtycnJxi88ISEhJcQdZutxdbHNGmTRsAtzmLH374IY8++iibN292m0ta9EP/+uuv56WXXmLcuHFMmTKFfv36cdVVV3HNNde4wsW2bdvYunVrqYeZ9+7dC8Cvv/6K3W6nZcuWbtefeeaZJd6uIn799Vdat27tGptT27ZtXdeDY6FJw4YN3UJLST799FOmTp3K559/TnZ2ttt1WVlZxMXFeTw2Zygo6/3pacDwtW3btgEwatSoUvfJyspyC2OnrwovT8eOHYvN44TCucILFiwgIyOD/Px813V16tRx/T89PZ0zzzzT67maUPi6l/Rea9u2LStXriy2YMrb52ecNj2gor+PLrjgAreFTMOHD+ecc85hwoQJDB482KNODZ6814pef/r77fQxlMT5fMv7I0HElxRUJajFxsbSsGFDfvjhhwrfR2m/dIt+eDr3e+edd/jiiy9YsmQJK1eu5MYbb+SZZ57hiy++KLPpvrMydddddzFgwIAS92nVqpXb5aKVvc8++4y+ffu6XZ+RkeFVq5+NGzcydOhQevfuzYIFC2jQoAHh4eEsWrSI1157ze1xN2zYwLp161i6dCkrVqzgzTff5KKLLmLVqlWEhYVRUFBAx44dmTlzZomP1bhxY4/HZSXp6en069ePs846i5kzZ9K4cWMiIiJYtmwZs2bNKrXCXBpnWN6yZQudO3cucZ8tW7YA0K5dO8Dz92NlOZ/LjBkzSh3b6e/pou/Jynj88cd58MEHufHGG3nkkUeIj4/Hbrdz++23e/099iVPn58zTB88eNBtYdRZZ50FwPfff1/q99QTdrudvn378uyzz7Jt2zbat29f7m3i4uJo0KCB6/1Umi1bttCoUSNXqPbGwYMHgeJzV0X8SUFVgt7gwYN54YUX+Pzzz+nWrZvXt3dWjA4dOuTWi9JZkTndhRdeyIUXXshjjz3Ga6+9RnJyMm+88Qbjxo0rNWQ4q53h4eElVpjKc/bZZxc7FJeYmOj6f0FBATt27HBVUQHS0tKAwr6V7777LlFRUaxcudKtArxo0aJij2e32+nXrx/9+vVj5syZPP7449x///2sW7eOiy++mJYtW/Ldd9/Rr1+/MqsrTZs2paCgwFUZc/rll1+8+waUoWnTpmzZsoWCggK3qurPP//suh6gZcuWrFy5kgMHDpRaVV2yZAknT57kgw8+oEmTJq7tzqkM3ho4cCBhYWH861//KnVB1auvvkq1atW49NJLAff3Y1ElvR8rU9lyVrljY2Mr9J6sjHfeeYe+ffvy8ssvu20/dOiQWwhq2bIlX375Jbm5uYSHh5d4X6V9D5yve0nvtZ9//pm6detWuOWbM5BmZGTQsWNH13bn652
amur1gqrT5eXlAXh11qvBgwfz4osv8sknn5TYtWDjxo1kZmZyyy23VGhMGRkZQOEfYCKBoDmqEvTuueceYmJiGDduHHv27Cl2fXp6Os8++2ypt3d+YG/YsMG17dixY8XaGx08eLDYoT5n1cR5GL169epA8ZBRr149+vTpw/PPP8+ff/5ZbAx//fVXqeMDR3i5+OKL3b5O74k5b9481/8Nw2DevHmEh4fTr18/wNEax2azuVXmMjMz3VboAyW2njn9eV533XX8/vvvvPjii8X2PX78uKtLwcCBAwGYM2eO2z6zZ88u8/l647LLLmP37t28+eabrm15eXnMnTuXGjVqkJSUBMDVV1+NYRiu5vBFOV9X51SKoq9zVlZWiWHeE40bN2bMmDGsXr26xBZXCxcuZO3atYwdO9ZVmYuNjaVu3bpu70egxNNXOoNWRc5M1aVLF1q2bMnTTz9dYhgq7z1ZGWFhYcV+lt5+++1i87Svvvpq9u3b5/bednLevrSfuQYNGtC5c2deeeUVt+t++OEHVq1axWWXXVbh8Xfp0oWIiAi3tmbgeL1vuukmVq1axdy5c4vdrqCggGeeeabc9my5ubmsWrWKiIgIr0Lh3XffTXR0NLfccotrHq3TgQMHuPXWW6levbqrrZm3vvnmG2w2W4UKAiIVpYqqBL2WLVvy2muvcf3119O2bVu3M1N99tlnrlZFpenfvz9NmjRh7Nix3H333YSFhfHPf/6ThIQEdu7c6drvlVdeYcGCBVx55ZW0bNmSI0eO8OKLLxIbG+v60IuOjqZdu3a8+eabtGnThvj4eDp06ECHDh2YP38+PXv2pGPHjtx00020aNGCPXv28Pnnn/Pbb7+V2EPSU1FRUaxYsYJRo0bRtWtXli9fztKlS7nvvvtc80gHDRrEzJkzufTSSxkxYgR79+5l/vz5tGrVyu1w4fTp09mwYQODBg2iadOm7N27lwULFnDGGWe4qjQpKSm89dZb3Hrrraxbt44ePXqQn5/Pzz//zFtvvcXKlSs577zz6Ny5M8OHD2fBggVkZWXRvXt31qxZw/bt2716fmvWrCnx1I1XXHEFN998M88//zyjR4/mm2++oVmzZrzzzjt8+umnzJ492zUXr2/fvqSkpDBnzhy2bdvGpZdeSkFBARs3bqRv375MmDCB/v37ExERwZAhQ7jllls4evQoL774IvXq1SvxDwxPzJo1i59//pnx48ezYsUKV+V05cqVvP/++yQlJfHMM8+43WbcuHH83//9H+PGjeO8885jw4YNrgp5UV26dAHg/vvvZ9iwYYSHhzNkyBCPKoV2u52XXnqJgQMH0r59e8aMGUOjRo34/fffWbduHbGxsSxZsqRCz7k8gwcPZvr06YwZM4bu3bvz/fff8+9//7vYPOuRI0fy6quvcuedd/LVV1/Rq1cvjh07xurVqxk/fjyXX355mT9zM2bMYODAgXTr1o2xY8e62lPFxcW59Vr1VlRUFP3792f16tVu7d8AnnnmGdLT05k0aRLvvfcegwcPpnbt2uzcuZO3336bn3/+udhpZJcvX+46ArB3715ee+01tm3bxpQpU4odov/666959NFHi42pT58+9OzZk1deeYXk5GQ6duxY7MxU+/bt4/XXXy82Z9xTH330ET169HCbRyzid2a1GxDxtbS0NOOmm24ymjVrZkRERBg1a9Y0evToYcydO9etJdTpbXsMwzC++eYbo2vXrkZERITRpEkTY+bMmcXaU23atMkYPny40aRJEyMyMtKoV6+eMXjwYOPrr792u6/PPvvM6NKlixEREVGsbU56eroxcuRIIzEx0QgPDzcaNWpkDB482HjnnXdc+zgft6w2WEWNGjXKiImJMdLT043+/fsb1atXN+rXr29MnTq1WAugl19+2WjdurURGRlpnHXWWcaiRYuKtedas2aNcfnllxsNGzY0IiIijIYNGxrDhw830tLS3O4rJyfHePLJJ4327dsbkZGRRu3
atY0uXboYDz/8sJGVleXa7/jx48akSZOMOnXqGDExMcaQIUOMXbt2edWeqrSvf/3rX4ZhGMaePXuMMWPGGHXr1jUiIiKMjh07ltjeKS8vz5gxY4Zx1llnGREREUZCQoIxcOBA45tvvnHt88EHHxidOnUyoqKijGbNmhlPPvmkq01S0TZQnrSncjp58qQxa9Yso0uXLkZMTIxRvXp149xzzzVmz55t5OTkFNs/OzvbGDt2rBEXF2fUrFnTuO6664y9e/eW+D175JFHjEaNGhl2u91tjOW1p3L69ttvjauuusqoU6eOERkZaTRt2tS47rrrjDVr1rj2cb5HympZVpTzsd5+++0Srz9x4oTx97//3WjQoIERHR1t9OjRw/j8889L/J5mZ2cb999/v9G8eXMjPDzcSExMNK655hojPT3dtU9ZP3OrV682evToYURHRxuxsbHGkCFDjJ9++sntMbx9foZhGO+9955hs9mMnTt3FrsuLy/PeOmll4xevXoZcXFxRnh4uNG0aVNjzJgxbq2rSmoNFRUVZXTu3Nl47rnn3NqGGYZR5s/CI4884tpvy5YtxvDhw40GDRq4vmfDhw8v1k6r6BjK+31z6NAhIyIiwnjppZc8/h6J+ILNMDzo2iwiIiIu+fn5tGvXjuuuu45HHnnE7OH43ezZs3nqqadIT0/32aI6EU9ojqqIiIiXwsLCmD59OvPnz/dqwVMwys3NZebMmTzwwAMKqRJwqqiKiIiIiCWpoioiIiIiluR1UN2wYQNDhgyhYcOG2Gy2Yq1tSvLxxx9z7rnnEhkZSatWrVi8eHFFxioiIiIiVYjXQfXYsWOcffbZzJ8/36P9MzIyGDRoEH379mXz5s3cfvvtjBs3jpUrV3o9WBERERGpOio1R9Vms/Gf//yHK664otR9/vGPf7B06VK3U1wOGzaMQ4cOsWLFioo+tIiIiIiEOL83/P/888+LnZ5vwIAB3H777aXe5uTJk64z4IDjbB4HDhygTp06lTploIiIiIj4h2EYHDlyhIYNG7qd0roy/B5Ud+/eTf369d221a9fn8OHD3P8+PESW1088cQTJZ7mUERERESsbdeuXa7TQleWJU+heu+993LnnXe6LmdlZdGkSRMefOcToqrXMGVM+377lVk3X84LL7zA9ddf79r+5ptvcvPNN3PHi+9Tt1FTU8ZWnq6Jtc0egkiZvtx90OwhiI99ul2vaUX9sPUvs4dQpez55RezhxAyjLwT5Gx4zHXqal/we1BNTExkz549btv27NlDbGxsqY2DIyMjiYyMLLY9qnoNomJ89+S9ccaZHWh3YR/+8Y8pREdHk5SUxPr16/nHP6bQ7sI+nNGmgynjKkn3hvFmD0HEKxfVdD+f+Wd/HDBpJOIr/c52/K7ekKbX0lvnnFuD737cU/6O4hMNOp0DwO6ftpo8ktDhy2mafg+q3bp1Y9myZW7bPvroI7p16+bvh/a54Q/M5PVH7yQlJcW1rd2FfRj+wEwTR6VgKqGn6HtaoTW49W4Tr7BaAWe3r6+wGmCJ7doCCqxW43VQPXr0KNu3b3ddzsjIYPPmzcTHx9OkSRPuvfdefv/9d1599VUAbr31VubNm8c999zDjTfeyNq1a3nrrbdYunSp755FgFSvGcfYJ1/mr98y2Pfbr9Q9oykJZzQ3ZSwKp1JVnP5eV3ANPr3bOF5DBVbvKKyaQ4HVWrxuT/Xxxx/Tt2/fYttHjRrF4sWLGT16NJmZmXz88cdut7njjjv46aefOOOMM3jwwQcZPXq0x495+PBh4uLieGzZZtMO/VuBwqmIO4XW4KOw6j2FVXMpsHrOyDvBybUPkpWVRWxsbPk38ECl+qgGSlUNqgqmIp5TaA0uCqzeUVg1l8KqZ/wRVC256r8qUzgVqRhNEQgumg7gHU0DMJemA5hHQdUCFE5FfE8LsoKDFlt57uz2jp7kCqzmUWANPAVVkyicigSO8+dNgdWaVF31jqqr5lNgDRzfnN9
KPNK9YbzrS0QCTz+D1uYMrFI+Z3VVzOUMrOI/qqgGgD4URaxHUwOsSdVVz6myag2qrvqXgqqfKJyKBA+FVuvR3FXPKKxahwKrf+jQvw/psKJI8NPPsHVoKoBnNA3AWjQdwLdUUfUBfaiJhB5VWa1BUwE8o8qqtai66juqqFaQqqciVYd+1s2n6mr5VFm1nsR2bVVhrSQFVS/pA0uk6tIfqObq3SZegbUcCqvWpLBacQqqHtKHk4gUpd8J5lFYLdvZ7esrsFqQqqsVo6BaBlVPRKQ8+j1hDoXV8imsWpPCqncUVEugDx0RqQj97ggsTQUon8KqNam66jmt+i9CHzAi4gs6ZWtgqedq2YqGVXUGsBZ1ByhflQ+qCqci4i8KrIGjsOoZhVZrSmzXVmG1FFU2qCqgikigqCdrYKjnqnecoVWB1RpUXS1ZlZujqjlkImIm/Q7yP81b9Y6zS4Dms1qD5q66qzJBVR8OImIl+p3kXwqrFaPAag1abFUo5IOqPgxExMr0O8p/1BWg4hRYrUFhNYTnqOoXv4gEEy288h8ttKo4Lb4yX1WfuxpyFVVVJ0QkmOl3mH+oslp5qrKaq6pWV0MmqOqXu4iEEv1O8z2FVd9QYDVPVZy7GvRBVb/MRSSU6Xecb2nequ8osJqnKoXVoJ2jql/cIlKVaA6rb2nequ9oHqs5qsrc1aCrqKq6ICJVmX7/+Y4qq76nKmvghXp1NaiCatfE2mYPQUTEdPqD3XcUVv1DgTWwQjmsBlVQFRGRQgqsvqF5q/6jwBo4obrQSkFVRCTIKbD6hsKq/yiwBk6ohVUFVRGREKHAWnkKq/6lwBoYoVRdVVAVEQkxCquVo7DqfwqsgREKYTVo21OJVEazhBjTHjvzr2OmPbZUHWpnVTlqXxUYzrCqtlb+k9iubVC3sFJQlaBnZuisiIqMV+FWKkqBteIUVgNHgdW/grnnqoKqWFqwhVB/8eT7oDArZeneMF5htQKc0wAUWANDgdW/grG6qqAqplMY9Y2yvo8KsQKqrlaGqquBpcDqP8EWVhVUJWAUSM1T2vdeAbZqUmCtGIXVwFNg9Y9gmgqgoCo+p0AaPBRgqzZNB/Cewqo5FFj9Ixiqq7XQjfwAACAASURBVAqqUikKpaHp9NdVwTV0qbrqPYVV8yiw+p7Vw6qCqnhFwbRqKul1V3gNLaquekdh1Vxnt6+vsOpDVp4KoKAqZVIwldKo6hp6VF31jsKquVRd9T0rVld1Zipx0ywhxu1LxFN674QOndnKczqLlfl0livfstrZrFRRreIUKMRfVHENbqquek6VVWtQhdV3rFRZVVCtYhRMxSxF33sKrcFDc1c9oxMDWIfmr/qGVeatKqhWAQqnYjWqtgYXVVc9p+qqNai66jtmV1c1RzVEaa6gBBO9X4OD5q56RvNWrUPzV33DzHmrCqohRB/2Egr0Pra27g3jFVg9oLBqLQqslWdWWFVQDXL6UJdQpve3dSmslk9h1XoUVisnsV3bgAdWBdUgpA9vqYr0vrceVVfLp7BqPaquVl4gw6qCapDQh7RIIf08WIvCatkUVq1JgbVyAhVWFVQtTB/GIuXTz4k1KKyWTWHVuhRWKy4QYVVB1YL0oStSMfrZMZemApRNYdW6VF2tOH+HVQVVi1BVSMR39PNkLoXV0imsWpvCasX4c5GVgqrJ9GEq4l/6GTOHwmrpereJV2C1MFVXK67+mWf6/D51ZioT6ENTJPB0CtfA0xmtyqazWFmbzm5lDaqoBpAqOyLWoJ/FwFJ1tXTO6qoqrNal6qq5VFH1M30YiliX8+dTFVb/694wXpXVchQNq6q0Wouqq+ZRUPUTBVSR4KFpAYGhqQCec4ZWBVZrObt9fYXVAFNQ9TEFVJHgpiqr/6m66jlVWa1H1dXA0hxVH9GcN5HQop9p/9K8Ve9pLqu1aO5qYKiiWkn6IBMJbaqw+o8qqxWjKqt1qLrqf6qoVlBJ1Zb07Wms+WgFO9K3mTQ
qEfEXVVj9Q2ezqhxVWa1B1VX/UUXVSyV9UB08eICJt4xhzarlrm39+g9k3guLqVWrdiCHJyJ+pgqrf6i6WjmqsppP1VX/UEXVQ2VVUybeMoZvv/6S1NRUdu7cSWpqKt9+/SUTbh4d2EGKSMCowup7qqz6hqqs5lJ11bdUUS1HeR9E6dvTWLNqOampqSQnJwOQnJyMYRikpKSwI30bLVq2DsRQRcQEqrD6liqrvqMWV+ZRGyvfUVAthaeVksyMHQD07t3bbXtSUhIAGTvSFVRFqgAFVt9Rv1Xf0rQAc2gqgG/o0P9pvD2c16x5CwA2bNjgtn39+vUANG/R0neDExHL05QA39FUAN/TtIDA01SAylFF9ZSKfrC0bNWGfv0HMmnSJAzDICkpifXr1zN58mT69R+oaqpIFaUKq29oKoB/aFpAYKm6WnE2wzAMswdRnsOHDxMXF8fqTb8SUzPWp/fti8rHoUMHmXDzaK36F5FSKbBWjsKqfymwBk4oh9WCk9nsfO46srKyiI31TV6rskHVH4fmdqRvI2NHOs1btAyKSmr69jQyM3YEzXhFgp3CauUorPqfAmtghGpYVVD1UVCt6vPH1PdVxFwKrBWnsBoYCqyBEWqB1R9BtUotptIiBwf1fRUxl34XVZwWWAWGc9GVFl75lxZala9KLKbSB0Ih9X0VsQ4tuKoYta8KLC288i/1XC1byFdUFVLdedL3VUQCS7+nKkbV1cBShdV/zm5fX9XVUoRsUNWhtZKp76uINel3VsUorAaewqr/KKwWF3KH/vWLvmzq+ypibZoO4D31Wg08TQfwH00FcBdSq/4VUj2jvq8iwUFh1TsKq+ZRYPWPYAusak9VSlBVQK2YYOv76jMFBdgOH8Z+6CC2I4exZWdTEB9Pfusz3Xar/sICbNnHIC8P8vKw5eVBfh7Y7WCzQ1gYht3OyUFDyevQyXU7W1YWkatXYERFY0RHY8TUwIiNpSCuFkZsHEZMDNhsgX7WEsQUWD2nsGouBVbfC6awqqB6WlBVQBUn27Fj2Pf9RX7TZm7bY559mogNH2M/sB/boUOOcJp1CFtBgdt+2TeM5vC8FwBIXBrh2Hgz4Ek++BvQs/DivsZfUbf3BaXuboSFYcTGURBXiwMfrKSgSVPXddW2/ki1n34gv0EjChqdQX6DhhAR4cEgJNQprHpOYdV8Cqy+FSxh1R9BNWjnqCqkVjEnTxKWuYNq6dsJy9hB2G+7CNv1K2G7dhK2ayf2A/vJb9iIv37KAIqEzZXAZ+XfffX0xVRfuth9Y5iHYzutOFp3bekhFcCWn4/t4AHsBw9Q74vW8H2RK98H3iq8aNhsFNSrT36jMxzBtUkz8lq0JO/MtuT27H36XUsI09xVz2nOqvk0h9W3qvK81aALqgqooc22fx9GjZoQGenaFvXum8TdNKpYFfR0YXt+J3FJhHsvizjnHQPVgRpATJH/RwORQPMS7vAmIB9HYHV+OUNpAWCc+rfpaberA9wA5AC5wPFTX8dO+/f4qXEUddj9os0wCNuzm7A9u2HT167tOV3O58CaT932jf7XYgDy2rYjr81ZGD76a1aspVlCjMKqBxRWrUGB1XecHQGqWmANqqDapK5CasjIzaXa1h8J/24z1X76wXHIe+uPhO3ZzYEPVhF/pH/hvpk4AmFJbEA8UBdHQMzFETydLj/1FYP3zdjO9XJ/p3hgYAVve8Gp2x8A9hf5NwtHMD4lIuJ/hVVjp0eBIr+/8s9o7Ki8duxEXqfO5J59DvnNWzrm2EpQU3XVMwqr1qHA6jtVrboaVHNU03buo6aqREErOvUVwr/9mmqbNxH+wxZsJ0+WvONIYECRy1nADCDx1Fd9HMG0Lo5Q5+kh+mCWBxwE9uIIo3WBTkWuzwFuxC3MlqSgZk2y5r3Aycuv9s84JeAUVsunsGotCqu+YcWwWuUXUymoBgfboUPY9/xJ/plt3bYnto2AP8u4YQ2gMdAP6ObHAYaiPGAr8Dv
wW5F/s0vY9yHgVIOD3YNyqLb5W2o88TC553d1fJ17PkbNmoEZt/iMAmvZFFatR4G18qwWVrWYSizJ/tdeIj5ZT8TG9YR//gnhW38it+PZhE/5zn3H5hQG1QZAs1PbmuIIqLEUW5gkHqoGdDz15WTgmDrwK47pE7+e+ioypzZxaQQsB1ZC1MpljpuFhZHb+VxyevQit0dvci7sgREXh1ib5q6WTdMArEfTASqvKsxbVUVVvGY7eICITzYQsXE9ERvXEb71pxJ2Al7EsVjJKQPHAqJmFF9EJOZ5HthQ+tWG3U5ep86cGDSUY3ffF7BhScUpsJZOYdWaFFYrzwphVYf+FVRNF7FuNbWvHlz6Cnw7jiDaGsciJhXirM/AMe91G7Ad+BnHtIHTdYXdK3PcNtl3/kpB4yY6gYEFKayWTmHVuhRYK8fssKpD/xIwtoMHiPxoBfktWpF7nqMvaOLSCEdrpaJ/2thwHL5vD7QD2gBRgR6tVIqNwoVqvU5tO4wjsG499bUL6Ih7p4GTYPwtkoI6dTnZfyAnBw7mZO++EB2NmE9TAUqnaQDWpekAlROKHQFUURUX+2+7iFq2hMgP3yfi0w3Y8vMhCccZmopaiKPdU3vgLHQYvyo4guPP2qIZdAvwpPtuBdWrk3PRJZwcOIQTAwZi1E0I3BilVAqsJVNYtT4F1ooxK6zq0L+Cqs+Fpf1M1PvvEbX0A8I3byq+Qy1gHlrkJMVtBT449W9u8asNm43cC3tw4spryB5zE4SHB3iAUpTCaskUVq1PYbXiAh1YFVQVVH0mYt1qaj44hfAftpS8Qz3gPKAz0Bbvm+VL1XES+AHYdOrrtLNrUQ92/3JS81gtQoG1OIXV4KDAWjGBDKuaoyo+E//tZY5wUVQzHOG0C452UcoV4olIHO+ZLjjOIJYOfAN8jaMdWVdIXFZ4urDdg3Koedck8s4+lxOXX6VTvQaY5q4WpzmrwaF3m3iF1QoI9nmrqqiGsuPHiVr6PtGvp3LiymuJq31T4XUFwO04Du13wxFQNZ1QfMnAsQgrBsfpbZ3+wvHeA4yoKE4MGsrx65PJuegSqKa/nQNFYbVkCqzBQYHVe4EIqzr0r6BaPsMg/KsviH7tVaL+8zb2w6eOw56J44xERWWjhVASeKuBRcU35yfU48Q1wzg+cgx5bdsHfFhVkcJqyRRWg4PCqvf8HVYVVBVUS2U7sJ/o1/5F9Vdeotq2tOI7JABP4L5qW8QMBrAD+AT4DDhafJecrt3IHjWWE8NTNLc1ABRYi1NYDR4KrN7xZ1hVUFVQLSYsM4MaT0wn6r/vYDt50v3KSOBCHL0xz0QLosR68oDvcITWTacuO7WF3Z/nlHgz8T2F1dIptAYHBVbP+SusajGVFGOEhRH91r/dm/C3BXoDF6Dm+2Jt1ShciHUU+BRYi+PMWH0LTzCwe1AOGAaRK5Zy8uIBanXlB1pkVbruDeNd/1dotS4ttvJcMC2wUkU1iIRlZhD2ayY5SX3dzxD0NJCGI5xeBDQ0Z3wiPmHgOJVrU6DI25wfgcchv2Ejsm+8mezR43RCAT9QWPWcQqt1KbB6zpeBVYf+q2JQNQzCP/+UmAXPErlsCbZaBTAL91r4fqAm7h/qIqFmJo62V6cYkZEcv/p6sm+dQF6nzqYNK1QpsHpHodV6FFY956uw6o+gqlmLVpWXR9Tbr1Onz4XUuewioj58H1tBARwA/nfavnVQSJXQdymONmqn1lbZTp6k+muvUrf3BdS+YiAR61aD9f/uDhrNEmLMHkJQ6d4w3m2KgJivd5t4erfRa+KJs9vXN3sIpVJF1WpOnCD6368QM2cm1X7NcL+uFnAJjsP7If5tECnVXhwtrtbhaLFWRG6nzhx+4hlye/QyYWChSZXVilOV1TpUXfVMZSurWkwV4qovnEfMrKcI27Pb/YrmwECgK3rFROoBI4CrgI3AMhzhFQjfslknDfAxLbKqOC3Csg4ttPKMFRdZqaJqIYnXRDg
qRU4dgaE4VvGrlaRIyQpwTIdZgmMKTJETW+welEPYr5nkJzaAyMiSby8eUVj1DQVW8ymwlq+iYVWLqUIoqNqysjAiIyEqqnAF/1/A33G06hmKo5IqIp4xcEwFiDlt20OQf6IRx+64h+yUMRClnm2VocDqOwqt5lFYLV9FwqoWU4UA25EjxDz9BAmdWpN4d6x7m6kEYC4wGYVUEW/ZcA+p4DiZwA4I++N3Yu+eTMI5Z1H9hQVw+skxxGNaZOU7WoBlHi20Kp9VFlgpqAZKdjYxs2eQ0Kk1NR+dij3rEHwAnP55GWfG4ERCVB3g3MKLYX/+Qew9t5PQpT3Rqa9Afr5pQwtmCqu+5QysCq2Bp7BaNiuEVQVVf8vPJ/rVRSR0aUfNafdjP3jqcIMd6AzoDJEi/tMYx3SaR3FMqTkl7LedxE24iTo9ziVy2RK1taoAhVX/UGANPIXVspkdVjVH1V8Mg8hVy6kx7T7Ct/5UuN0GdMexYjnRpLGJVFUZwNs4pgQUcWz8ZI48PsOMEQU9zVn1L81jDSzNXS2dJ3NWNUfVJOnb01jz0Qp2pG/z7AYFBdS+Zgi1r7/CPaR2Af4PGI/fQmran7B8M2zbXf6+IlVOc+Ae4AGgVeHmmMRnTRpQ8FNl1b9UYQ0sVVdLZ1ZlVQ0Hy3Dw4AEm3jKGNauWu7b16z+QeS8splat2qXf0G4nMnxV4eVWOPo+num3oXLgKKQstLPs2wLXtsvOsZP6twJq63NExF1bYBrwNbAdaIlrYePuQTmEpW+jIKE+RrAcwTGZeq36n3qyBo56rpbOjD6rqqiWYeItY/j26y9JTU1l586dpKam8u3XXzLh5tHuO544ATmOyaaJSyMcH3hXAC1wrOCfhl9DKjhC6heZNd3G+kVmTW54Ti+xSIlswPnAcPfNiUsiSLiyPXXPa0/0q4u04MpDqqwGjqqs/qeuAKULdGVVc1RLkb49jZ7ndSA1NZXk5GTX9tTUVFJSUvj0mx9p0aIVkUs/oOYD/6Ba9x0w6LQ7MQhIo/60P+HMuyh1rGnPQGvNhxXxzDrgpcKLOi2rd1RZDTxVWP1L1dWSlVRZ1RzVAMrM2AFA79693bYnJSUBsG/jempfNYjaN1xLtcwd8B6QddqdBOhsUumn3iuljXW75quKeK49jkrrKeFbNlNnUD/ibhmD/a+9pg0rWKiyGniqsPqXKqslC1RlVUG1FM2atwBgw4YNbts3fvQR04DBd00icl2R8522oHhP1ABpeeq9cvpY169fD0ArVVNFPFcPuB24H2hSuDn6zX9T97wORC9+CQoKSrmxgMKqWRRY/UdTAUoWiLBaoaA6f/58mjVrRlRUFF27duWrr74qc//Zs2dz5plnEh0dTePGjbnjjjs4ceJEhQYcKC1btaFf/4FMmjSJ1NRUdu3axUf33ccFN93EVMCel+fYsQ6Oeaj34fiAM0GbBo6FU5Mm3uYaa2pqKpMnTeCyc+w67C9SEe2Ax4AxQHXHJnvWIeJuH0/8gCSqbf3RxMFZn8KqeRRY/UdhtTh/h1Wv56i++eabjBw5koULF9K1a1dmz57N22+/zS+//EK9esWT2muvvcaNN97IP//5T7p3705aWhqjR49m2LBhzJw506PHNKuP6qFDB5lw82g2r1rO08DooleG4ZiTegUQGbAhlergMbjhOa36F/GLLOA14JNTl22wb90X5HU+t4wbCWjOqhVoDqvvad5qcd/9uMcvc1S9Dqpdu3bl/PPPZ968eQAUFBTQuHFjJk6cyJQpU4rtP2HCBLZu3cqaNWtc2/7+97/z5Zdf8sknnxTbvyRmN/zPn3wrjV75Z+GG1sBYHGe9sZhtux1zUlslagGViM/9CCwCOgKjHJt2D/LP6eXSt6eRmbGD5i1a0qJla788RiApsJpPgdW3FFaL+3ZThrmLqXJycvjmm2+4+OKLC+/Abufiiy/m888/L/E23bt355tvvnFND9ixYwfLli3jsssuK/VxTp4
8yeHDh92+zNSo5z+hNo7DfzcCD2HJkAqOcDqws0KqiF+0B54Ari/clLg0AvLziXn6CWyHDlX6IQ4ePMAN111Oz/M6cMO1Q+nRpT03XHc5hw4drPR9m0lTAcynKQG+pXmrxXVom+Dz+/QqqO7bt4/8/Hzq13efj1C/fn127y55afmIESOYPn06PXv2JDw8nJYtW9KnTx/uu+++Uh/niSeeIC4uzvXVuHEAU6FhELY9DSjSE7U6jsUVM4B+aAmaSFUWDkS5b0qcGE3NR6dSt1tnIlcsrdTde9y/OQgprFqDAqtvKaz6l98j18cff8zjjz/OggUL2LRpE++99x5Lly7lkUceKfU29957L1lZWa6vXbt2+XuYANh//43a1wyhTt9uJL4S4X5lK6BWQIYhIsEkB/jQ8d+wP/+g9rAribv1xgpVV9O3p7Fm1XLmzJlDcnIyjRs3Jjk5mWeffZY1q5Z7fhpnEQ8osPqOwqr/eBVU69atS1hYGHv2uDd53bNnD4mJJR9rfvDBB0lJSWHcuHF07NiRK6+8kscff5wnnniCglJavERGRhIbG+v25VeGQdQbqdTtdg6Ra1ZhP3IEXsTRsF9EpCwRwCNAp8JN0W+kUrdnFyI2rvfqrsrr35yxI70yI7UEVVWtR2HVNxRW/cOroBoREUGXLl3cFkYVFBSwZs0aunXrVuJtsrOzsdvdHyYsLAwAK5wUy3bwALVGD6fWrTdiP3yqY38tYCABa9gvIkGuLnAPcDMQ7dgU9tsuag/tT42HpsBJz5osl9a/2dkTuXmLlj4asLkUVq1H1VXf0LxV36vm7Q3uvPNORo0axXnnnccFF1zA7NmzOXbsGGPGjAFg5MiRNGrUiCeeeAKAIUOGMHPmTM455xy6du3K9u3befDBBxkyZIgrsJolYsPHxN06hrA/fi/c2AMYCdQwa1QiEpRsQBLQAVgI/AQ2w6DGnJlErl1N1ouvkNe2fZl3UbR/s2EYJCUlsX79eiZPnky//gNDYvW/U7OEGHUCsCBnWFWHgMrp3SZeXQF8xOugev311/PXX3/x0EMPsXv3bjp37syKFStcC6x27tzpVkF94IEHsNlsPPDAA/z+++8kJCQwZMgQHnvsMd89C2/l5FDjsanEzJmJzVnVrYGj5dQF5g1LREJAHeBeYDnwFpAH4T9sIfrl5zny9Jxybz7vhcVMuHk0KSkprm39+g9k3guL/TRg8yisWpcCa+UprPqG131UzeDrPqq1rxlC5OqVhRvaA7cCqta7SfsT0veoJ6tIhf0KLADycJzlKsrzvqs70reRsSM9ZPqolkVh1doUViunKoXV3ONHWTapr7kN/83g66Ca+EgEPIPj7FLX45iPqpZTLgeOQspCneVKxCdygINAka5+uwflYMvKwoiLM2tUlqOwan0KrBVXVcKqP4JqlYtniUsj4FzgOmA6jtOgVrnvQtlSFtr5IrOmWx/HLzJrcsNz+kaJeC0Ct5AKkLgogoRzziLmmf+DUrqfVDVaYGV9WnBVcVpgVXFez1ENNmEZ6US9/QY12z3svor/ctOGZGlpf8KybwtITZ1PcnIyAMnJyRiGQUpKCtt2axqASKXkAnPBfmA/NR95iIjPP+XQ84sw6tQ1e2Sm05zV4NC9YbyqqxXgDKtVpbrqKyFdIov88H3qJF1IzccfhnVmjyY4pJ9qkVtaH8ftJZ+ATEQ8FQZ0xvWHc+TqldTtdT7hX3xm5qgsQ5XV4KDqasWpuuqd0AyqubnUfHAKtW+4trA36kdAvqmjCgotTx2iLK2PYytVU0Uqxw5cDUwBTk3hCvvjd+IHX0z1hfPA+ssG/E5hNXgosFaMwqrnQi6o2vfuIf7yS4mZO7NwY1fgQRyVDClTmwaOhVOTJt5Gamoqu3btIjU1lcmTJnDZOXYd9hfxlQ7A40Bbx0VbXh6xU+4k7pYxkJ1t5sgsQWE1uCisek9h1TMhteq/2uZvqZ18NWG//+bYEAaMAAZQqbNMVbU
2TQePwQ3PadW/SEDk4+i3+mHhptxOnTn473coaNzErFFZhuasBh/NX/VOKM1Z9ceq/5BZTBX533eo9bex2I4fd2yoDUwGKtF+sKq2aaodA0vvKmDbbsecVEdA18pkEb8IA4YDLYDngZMQvmUz4d9+w0kFVS2wCkJabOUdnRigbCFx6D869RVqjx5RGFJbAY9SqZAKatPUOhEGdq4aVWQR03XF0TKvPnAF1A673uQBWUezhBhNBQgymrvqHU0DKF1IJK44+02O0xYC9ALuB2pV7j6dbZrmzHW0aWrcuDHJyck8O2cey751VBtFRHzqDOARHIutONX32cn6s7T8TmE1+Cisek5htWRBH1QTl0ZAHHAncANwC44G25WkNk0iYooY3H4zJy6NIPGeCGoNuxLbkSOmDcsqFFaDj6qrnlNYLS4og2q1LZuxHTrkXm1ohuNUqJVYNFWU2jSJiCVsBV6GqJXLiL+0D/ZdO80ekemcUwE0JSC4KKx6RmHVXdAtpopcuYy4MSOwN8+Ge/DbMyjapskwDJKSkli/fn2RNk1aXCQiAWAAUcAxCP/xe+pc3JODb75PXudzzB6ZZRQNq1p4ZW3OsKrFVmXTWawKBVV7qp1PzuKM++7Cln+qc/8wYIj/HldtmkTEEv4AngZOTUkqiInh0OLXybnkUjNHZXkKrdamsOqZYAqr/mhPFVRBNQvXiVwcK2RvxSfzUcvj3qbJ/bq0P2H9VrDZIKmtVsiLiJ8cAWYCaY6LRlgYh2ct4PjIMWaOKqgouFqTAmv5giWsKqhyKqgOxNHI38QZtgeOwrB5dj76vrDaarfb6dfe4M2JhiquIuJ7OcBzwFeFm45MfZRjt9/t+GtZPKLAaj0Kq+ULhrDqj6AafIupbjj1ZfLIUxba+d9O9x6rcXFxbEiLqDJ9VkUkwCKAiUCRI/41H36AGo8/bNaIgpIWYlmPOgOUr6ousgquRHULjmqqyZw9VufNd++xOmfOHE6ePKk+qyLiP3Ycf6wPO3U5AmpEP27igIKbQqu1KKyWrSqG1eAKqueZPQCH8nqsgvqsiogf2XAsJL0JmAScedrJAaRCFFqtQWG1bFUtrAZXULWI8nqsgvqsikgA9AGKdKlKXBrhOINVdrZZIwoZCqzm0lSAslWlsBp0fVStwNljdcJt7j1WJ02aRGRkJP3a5arPqoiYInFkJDm/n8fB95Zi1Kpt9nCCnnq0mqt7w3gttCpF7zbxQbHAqrKCa9X/ixBb3ezROBw8BtfP1ap/EbGQ5UCq47+5Z5/Dgf8sw4ivY+qQQpVCa2AprJbOSmHVH6v+VVGtoNoxsGqKY9HU+q2ObUltC4K+j2ran445uEV7xpa0TUQsqAOOHn6HIfy7b4kfOoAD76/AqFPX7JGFHGelVYE1MHRGq9KFemVVFVUBHH1hUxa6n4Xr4o42bNjcqsY6M5eIxf0OPA4cclzMbdeBg++voCChnpmjCnkKrIGjsFoyK4RV9VEVv0lZaOeLTPe+sBvTIor1iv0is6b6xIpYWSPgfuDU9NTwn34gfvDF2PeoFYk/qWNA4GiRVclCdYGVEoe4+sLOmVvYF/b888/n5MmTxXrFPjtnnvrEilhdQ+AB4NTnVrVffqb25QOw/7XXzFFVGQqs/qeuACULxbCqoCol9oVNT08vtg0Ke8WqT6yIxSUCDwKn1lKF/7yV2pdfiu3AfjNHVaUosPqfwmpxoRZWFVSlxL6wLVu2LLYNCnvFqk+sSBCoh1tlNfyPH7Dv32fmiKokBVb/UlgNbVr1L66+sJMmFvaF/eqrr4iMjCzWK3bypAlcdo5dfWJFgkU9HHNWFwLjISGtI7tb55g8qKpJnQL8R/1W3YVSJwCt+hfA0Rf2hufcV/1f0tEGWvUvEhoMHKdePWX3IIVVsymw+ocCa6FAh1V/rPpXUBU323Y75p8W7Zla0jYRCXL5cHj3bLJvvBmq6eCamRRYfU9htVAgw6raU4nftU6EgZ3dA2l
J20QkiOUCcyD2ntuJvX08WL9eEdI0h9X3NG+1ULAvrlJQFRGpan4FvnX8pa+ttgAAIABJREFUt3rqYmo88qCpwxEHBVbfUlgtFMxhVUFVRKSqaQWMxzVntcbMp6i+cJ6ZI5IiFFh9R2G1ULCGVQVVEZGq6EJgVOHFmvf+ncgP/mPacKQ4BVbf0MkBCgVjWFVQFRGpqi4BLnf812YY1Lp5FOFffm7qkKQ4BVbfUFh1CLawqqAqIlKVXQv0cvzXduIEtYddSdi2X0wdkpRMgbXyFFYdgimsKqiKiFRlNmAc0MFx0X7wALWvvRzbvr/MHJWUQYG1chRWHYIlrCqoeiDtT1i+2dFPtKTLYj6rvybljc/q45cQVw2YDDRxXtyBLTfXzBGJBxRWK05hNXioy3MZDhyFlIXuZ2uqXyuMPYfyXZd1piZzlfQaWek1KW98Vh+/VCHVgbuA/wLJUG9TM529KgjotKwVp9OuBsepVlVRLUPKQjtfZNYkNTWVnTt30rlzZ04YNVyXU1NT+SKzJjc8p2+jWU5/jaz2mpQ3PquPX6qYOsBYIMpxMXFphJmjES9oOkDFqLJq/SkAqqiWIu1PWPZtAamp80lOTiYtLY3NmzeTmppKcnIyAMnJyRiGQUpKCtt268xNgXb6awTWek3KG9+q7609fhEA25EjVNv6I7kXXGj2UMQDzRJiVF31kjOsVuXqqpUrqyrblCJ9j+Pf3r17Oy6np7tddkpKSgJgu+YWBtzpr5GTVV6T8sb3xbayrzd7/CLshfrd6lD7qsuotvVHs0cjHlJ1tWKqenXVqpVVBdVStKzv+HfDhg2Oyy1bul12Wr9+PQCtVPkKuNNfIyervCblje/C1mVfb/b4RVgC/Ab2o0epNfxqbAetWXGRkimwek9h1XrPX4f+S9GmgWNRy6SJt2EYBklJSXTu3Jnbbiu8vH79eiZPmsBl59hpnVhQ/p2KT5X0GlnpNSlvfP07Flh6/CLcAOwAMqFa5g5q3TSKg2+9D3bVOIKJpgN4p6ovsrLaNACbYRiG2YMoz+HDh4mLiyPrRYitXv7+aX86Dru2SqzcHL+Dx+CG57Tq38pKeo2s9JqUNz6rj1+E/cADwGHHxaP33M/R+6aaOSKpBAVWz1XlsFrRoJp7/CjLJvUlKyuL2NhYn4wlpIKqv1r9bNvtmC/oDL6nXxbzWf01KW98Vh+/VHE/Ak8Apz4tDr7xH05eOsjMEUklKKx6TmHVOwqq5QTVQU87Wv3MmTuf3r17s2HDBiZNvI0Lmx1h6V06jCoiUmFLgDcc/y2Iq8X+jz8nv3lLU4cklaPA6hmFVc/5I6iGzEQjZyugOXMdrX4aN25McnIyz86Zx7JvC3TGHxGRyhgMnO/4rz3rELWvHkK1zZtMHZJUjhZaeaYqL7CywuKqkAmqVm9VJCIS1GzAzUADx8VqO7YT9sfvZo5IfECdATyjsGqekAmqVm9VJCIS9KoD9wDNgF5Q27ja7exVYZkZkJ9fyo3FyhRWy1eVw6qZQqY9ldVbFYmIhIR6wMNAXuGmxKURUAD59zaBvFxyuvck98Ie5HTrQV67DhAWZtZoxQvOsKq5q6Wrqq2rzGxZFVKLqdTqR0TEJF8Ds4pvLoiNI7drN3K69SCnd19yO58L1UKmRhKyFFbLVhXDKpS/uEqr/j3soxqIVj+l9Wr1VQ9XK6sKz1FEvJQGvA/8DJwofbeC2Dj+2vQTRt2EAA1MKkOBtXQKq8X5I6iG5J+1rf0YoErr1Tp3ZAETXw3taq6/+tSKSAhoA9wN5AM7gV9whNZfcJ0sAMAelUX9LxsBsHtQDgCRSz+A/Hxy+vTD8NGHm/iGzmpVOk0DCIyQWUwVKCkLHb1aU1NT2blzJ6mpqXyRWZPuD4eVuP2G50LnW1zacw+l5ygilRQGNAcuBW4HFgDPAGOBrqe+TklcGkHi0ghq338NtUdeT72WDag9dADV580mbHta4McuJdJCq9JpgZX/heShf39
J+xPOvAtSU1NJTk52bZ8xYwb33HNPse2pqamkpKSQ9kzwHyIv7bmH0nMUERMcBCaUfFVe6zacGHolJ4ZcSd7Z54DNFtChSXGqrpasKlZWS6qqquG/yUrr1Vq/fv0St4dSD1f1qRURv6iBY8rAJcBp01arbUujxjNPUrfPhSR0aqMTDFiAqqslq4qV1UD1V1VQ9UJpvVr37NlT4vZQ6uGqPrUi4hfhQGdgNI6uAU8BI4AzcZxk4JSwP36lbtqF7rfNzgbrHxQMOQqrJVNY9Y+QXEzlL6X1av2/Jx6jfq2wkO7hqj61IuJ3NqDRqa9BQBbwDfAVjk+rGNxOMJC9+iYiNn7M8WuGceLaYeS3bG3CoKsm9VwtWVVdYOVPmqPqpdJ6tc4bWcCEEF/1rz61ImKaAtyPAeYBtwFHCzflntOF4yNGcvzaYRi1agd2fFWYwmpxVS2sOuerqo+qBYKq06rv4Ytt0K01XNKxcHsgeriarSo8RxGxuP3AczhaYJ32KWZERnJi8OUcv2E0OUkXgV2z3PxNYbW4qhhWFVQtEFTVS1RExEL2A18AnwGZxa/OP6MJB5asJL95y8COq4pSYHVX1cLqmu92atW/2dRLVETEQurgmM/6GPAEjv6tNQuvDju2k4Qf2poytKpIC63E17SYygtpf8KybwtITZ3v6iWanJyMYRikpKSwbbcOhYuImKYJkAIMBzYB63GcfMBeuAhr96Acat5zB/nNW3B8eApGrVqmDTdU6WxWhara4qoerWqzzMf3qTKgF9RLVEQkCFQDLsDRn/Vq96sSF0cQ8+J8Yu/9OwntmhF7x22Epf1swiBDmyqrhapi2ypfUlD1gnqJiogEmdNPZvUdrsVX9uxsqi96kYQLOlHr+iuI2PCx+rL6ULOEGAXWUxRWK05B1QtFe4mmpqaya9cuUlNTi/QSNXuEIiJSpouBJ3GcCSuqcHPUymXED+1PnaSuRL35b8jNNWmAoUdh1UFhtWK06t9L6iUqIhIisoF1wEoc3QOKOHnRJRx8b6kJgwpdmrfqEMpzVk8cO8L9l3VWeyorCEQv0bQ/HfNizehXmvYnrN8KNhsktdUiMREJYXnA/4BlwI5T28YDPRyLr8R3FFYdQjWs+iOoatV/BbX2Y3g0s1frgaNw/Twba3+0UVBQ+PiXdLTz5kRVjUUkBFUDugEX4jiBwKen/k9ht4B9jb8icsVSsm++TZ0CKkEdARyqWjeAytAcVQsys1drykI7G9MiiIuLc3v8/+1Ur1gRCXE2oC0wDghzv6ru3y+g5uMPk9CpFTUefxhbVpYJAwwNmrPqoDmrnlFF1WLM7NXqfGw4ycsvv6xesSIiAAdx9GUF7IcPU+Opx6j+4nMcm/x3sm8ajxGj4OUtZ1hVdVXKoxKZxZjZq9X52GY9voiIJdUGngb64qq02g8eoOa0+6l7zllUf34+nDxp4gCDV1WvrqqqWj4FVYsxs1er87HNenwREcuqh2NKwAygJ67+rGF79xD7jztI6NKO6H+/Cvn55o0xSCmsKqyWRYf+LaZor1bDMEhKSmL9+vVFerUWlH8nlXzsNT+FM3HiRLfHnzjB/48vImJ59YG/AUOAd4GvHJvDfttFzLMzOH79CPPGFsSq+iIrLa4qndpTWZCZvVoPHoPr59pYo1X/IiLlywTeBjYDfwfOVUuryqjKYRWCv22V2lNVEWb+6VA7BlZNMdi222D9Vsc2Rx9VVVJFRIppBtwNZJz6P0VaWjX5H9GvvMzRKQ9i1E0wZ3xBRpVVVVZPpzmqFmRmeyqn1okwrq/jS6v8RUTK0RzXvFUADKh76/nEvLSQhHPbUX3hPPh/9u49Por63v/4OxtIwi0XRRL0xFIhKCoaBUGkErVQKtbWntMjrSEix2KLXKwcvFCriFZRsR4rUqhUqqdri5fT6mmhKkUJXgBbhZ+cigYQiu0hQQ+XcJEEsvP7Y9mQhE12N9mZ+c7M6/l45AGZ7M585hL98J35vvf
IEbeq85SgP7OK5mhUDROLiHpsXjSeqri4WOXl5frpY49r2bqINjHrHgDM96mkzdG/hmr3Kvf26TpxxAXKeqPS1bK8IsjNKpOrmqNRNYyb8VQAgDTpJekRSWXHFnX+4K864cpRyrt+nEL/+w+3KvMMmlVINKrGcTOeCgCQRvmSbpA0W9JpxxZ3+a/n1HPIQHV9/FEeB0iAZhU0qoZpGk8VDof1ySefKBwON4mncrtCAEBK+inarH5XUo/ootD+/cr90a3Kv36ci4V5A81qsDHr30DhSRGNW7BPFRUVjcti8VQAAA8KKfrJVhdIelbS65IsKees37pallcEPQ0gyMhRbaeqHdHnSfsV2TcrflN19JlUO7fRltg+ZoakhkjH6nDieAGAZ2yW9D+Srjq2qPqK+uhHsWZnu1WV8YLarHolsoocVQPs2h+Nj3IijL/EpaYu3j6GQiFFIpGU99XJ4wUAntHv6FcTRb/PUt3PyxQpPlW198+VVcBt35aCOrIa5HxVnlFNkQkZp3aLt4/5+fkqLS1NeV+DcLwAIC0qpew3K9XlN79SzwtLlf3HP7hdkZGC+sxqUJ9XZUQ1BbGM03A4mnEqSeXl5bIsSxUVFdpU7f3b2on2ce7cubrllluS2tcgHC8ASJtOkrpI+lzKrKlWwXf+WQfHXad99z8sK023Uf0iqCOrQcSwVgqCkHGaaB979eolKbl9DcLxAoC0uVjSQ5JKjy3qGn5KJ35psDq/uaq1dwVWEEdWgziqSqOagiBknCbax507d0pKbl+DcLwAIK1OkDRD0kRJOdFFnbZv0wlXjlKPO26VDh1ysTjz0Kz6H7f+U9A049SyLJWVlamysrJJxqn346Na3cebblJpaanm3P/jpPc1CMcLANIuQ9Ilks6U9HNJH0oZlqVu8x9V1muvatfLlbLy8lwt0SRBfAwgSJOriKdK0e4D0rgF/p7FHm8f2zvrPwjHCwBsE5H0R0nPSToi6WKp+vf17tZkqKA1qyY2qnbEU9GoJiFeBqjbGadOeHWDtGaTdOqJUmFex/Y1CMcLAGzzd0nPS/q+pC5HM1dxHJpVd9GoOtyoBjUDNKj7DQBes7vTi4rk5unwsOFul2IMmlX32NGoMpmqDUHNAA3qfgOAp+yUCiZcpRO+NlLdHnlQivDcvxTMCVZ+xmSqVgQ1AzSo+w0AnvOypINShhrU45471fmdNdq78Jey8vPdrsx1QZpg5feJVQyRtSKoGaBB3W8A8JxySd9UNCVAUs7LS3XipcPUacP/c7MqYwRpZNXPkVU0qq0IagZoUPcbADwnU9K3JN0qqXt0UaetW3TiV0YoZ0nYxcLMQbPqfdz6b0VQM0CDut8A4FnnSPqxpJ9K2iplfP658r//bzrwl3e07/6HpawslwsE2o9Z/20IagZoUPcbADytXtJ/Snr92KK6y0Zp93/9QcrIcKsqIwTleVXJ3RQAO2b9M6LahoJu0tIZkcYM0MyQ1BCJ6LN9cq1hi5fpmm4t9zu6LXdHUp3Yb6CjuE7hqixJ35XUT9JTkg5L2aXLA9+kSkyu8jIa1SSc2F36wZ/cHWF0I9u0xID/2ZLpCi/gOoVRLpH0BUlVkgZJRUuz+IAA0ax6FZOpkmBCrqgJNbghqPsNb+E6hXG+KGn0sW+LlkafU816bblk/hN/tmFylfcwopqACbmiJtTghqDuN7yF6xReUfSDLOkp6fOrr9HexxZKOTlul+QKRla9hX/uJ2BCrqgJNbghqPsNb+E6hSd8KulX0b92ee7XOuEboxX6dKerJbkpaCOrXh5dpVFNwIRcURNqcENQ9xvewnUKTzhJ0lRJ2dFvs9au1gkjL1bmpo/crAoO8mqzyq3/BEzIFTWhBjcEdb/hLVyn8IwLJPWU9BNJu6VOf9uqE0dfot3PvaTDg4e4XJzzgvQIQEzTZtUrjwSQo5oEE3JFTajBDUHdb3gL1yk8ZZekuZK2R7+NdO2qPU8vUf2or7pZlWuC1qy
2lM6G1Y4cVRrVBJrmIkpNc0WdrSOmMdM1U2posK+WjuRB2pEl2TzT1b7tAB0R7zoFjHRQ0iOSNka/tTIztXfez3XommvdrMo1QW9WYzratNKoOtiompqLaHddHVm/U8fM1HMDAJ5yWNICSWuj3zb0PlmfvbNBVo8eblblGprVY9rbsNrRqDKZqhWm5iLaXVdH1u/UMTP13ACAp3SWNEXSVyR1kTKn/W9gm1Q0Z1JSAJOp4jA1F9HuujqyfqeOmannBgA8KSTpWkljJJ107IMBgvhJVkGcXJWICZOv2jUENX/+fPXp00c5OTkaOnSo3nnnnTZfv2fPHk2ePFm9e/dWdna2+vfvr2XLlrWrYCeYmotod10dWb9Tx8zUcwMAnpWhaHxVE0W/z1LXRQukw4ddKcktQcpXTZVbo6wpN6rPPvuspk+frlmzZum9997Tueeeq9GjR2vnzvjBwfX19Ro1apS2bdumF154QR999JEWLVqkU045pcPF28XUXES76+rI+p06ZqaeGwDwjYikX0i5t9yk/PHflurq3K7IUTSrbXO6YU351v8jjzyiiRMnasKECZKkhQsXaunSpVq8eLFuv/32416/ePFi7dq1S2+//bY6d+4sSerTp0/HqraZqbmIdtfVkfU7dcxMPTcA4Bt/l/R29K85y36vgnH/qt3/+azUpYurZTmJxwASizWrdj8SkNKs//r6enXt2lUvvPCCrrrqqsbl48eP1549e/TSSy8d954xY8bohBNOUNeuXfXSSy/ppJNO0jXXXKPbbrtNmZmZSW3XjVn/8XIRy84M6Xc/SH1meTqjnuzOa+zI+p3KkiSzEgBs9j+KfjDA0UdV68ou055f/5esbsH5jyyNamre/t9dtsz6T2lE9bPPPlNDQ4MKCwubLS8sLNSHH34Y9z0ff/yxXnvtNZWXl2vZsmXavHmzbrzxRh0+fFizZs2K+566ujrVNbnVUFtbm0qZaWFZ0uEjzZe98aE0dl6Gnp1qJdUQ2RX1tHRGpEVeY/pGETsSVlbQTbbW5vR2ACCwzpZ0q6SHJR2SsitfU8G3rtTu514KTDIAo6qpuejkE3RgX/rn6Nue5xOJRNSrVy898cQTGjRokMaOHas77rhDCxcubPU9c+bMUV5eXuNXcXGx3WUep2JhSH/e3jwCKS8vT6uqspKOQbIz6qmkSLq8NP0z3NMR/WRXbW5tBwACaYCk2yUdvZOZtfpNFVz9DWXs3+9mVY7ieVX3pdT69uzZU5mZmaqpqWm2vKamRkVF8buF3r17q3Pnzs1u8w8YMEDV1dWqr69XVlbWce+ZOXOmpk+f3vh9bW2to81qogikZeuUMAbJC1FPpmwXAGCoEkk/lDRH0oFos5r/7W9q9/P/HZhnVhlZdVdKI6pZWVkaNGiQVqxY0bgsEoloxYoVGjZsWNz3DB8+XJs3b1YkcuzWbFVVlXr37h23SZWk7Oxs5ebmNvtyUqIIJClxDJIXop5M2S4AwGBflDRTjSOr2W9WKvdHt7pZEQIk5Vv/06dP16JFi/T0009r48aNmjRpkg4cONCYAnDttddq5syZja+fNGmSdu3apZtuuklVVVVaunSp7r//fk2ePDl9e5FmiSKQpMQxSF6IejJluwAAw31R0ccAciT9k9T1/J+7XJCzeATAPSk/9Tp27Fh9+umnuuuuu1RdXa3S0lK9/PLLjROstm/frlDoWP9bXFysV155RTfffLPOOeccnXLKKbrpppt02223pW8v0iwWgTRlcvMIpGnTpik7O1tfPvNwwsk7Xoh6MmW7AAAP6KvoyGqhpB7RT7EK0idY8QiAO1KKp3KLW/FUY+eFtHzDseYsFArpy2dZSc/690LUkynbBQB4U/WXa6XsbCkjw+1SbEej2rYD+2o18vwvpDWeikY1gVffl5aul3rlSldf2PYEqJZZqbFlnTKlIw3ty1FdvFJ6/QPpy2dJ15UlfHnaNI9+cm67AAAP+VzST6QDl96kffc
9RLMacHY0qukPvPKJeDmmb28+fmQx3utGDQxJsrR8w7F/A8RGJZO1pUYafk+mavY0SJLCb0m3P5ep1bMa9MVe7d+vZJXQoAIA2hKR9JCkKqnbxp8qctJJOnCz/ydZ8QiAs2zPUfWqZPNE473uz9t7aFVVVoeySIffk6lDVvdm6zhkddew2cl9mhcAALYKSWpyp6/H7B8pZ0nYtXLgT4yoxpFsnmii1w0ZMkTFxcUpZ5G+8r5Us6eh1fUu3yCNGmjrIQAAILFLJO2V9Fz027wpNyhSWKT6S0e6WJT9GFV1DiOqcSSbJ5rwdZs3t/retqzd3PZ6V29KvA4AABzxdUlH+9KMI0eUX3G1Ov2/da6W5AQiq5xBoxpHsnmiCV/Xr1+r723L0H5tr3dYSeJ1AADgiAxJ4yUNjn4b2r9fBVd/Q5l/2+ZiUfALbv3HkWyeaGuvmzplirKzs7V27Vrl5OSknEU6+hypMD9Tk1vkuE6ZMkWF+ZkaNbDB7kMAAEDyQpImS7pf0iYps6ZaBd/6mv7vlUpZJ5zocnH24REA+xFP1Ypk80Tjva6tWf/JZpFu3SkNm31s1r8UbV6dmvUPAEDK9kmaLWlH9NsDE2/UvrmPulmRI2hWo8hRdSFHNdk80XivS0cW6fIN0WdSh5UknkAVL8s1HexaLwDAhz6VdLek0yV9X6r+pv8/vYpGNYpG1YVG1QviZbmm49Ok7FovAMDnPpN0gqSQAvMxqzSr9jSqTKbygWQzX01ZLwDA53qqscMoWpoV/Yv542IdQgqAPZhM5XHJZr6asl4AQPAU/TJL9b+5UHvn/VwNpw9wuxx4CENjKajaIf1xffTZU1Mkm/lqynoBAAHzN0l3SlnvrFHBt/9ZGbt3uV2RbRhVTT8a1STs2i9d8XBIp8+QxsyV+v979PvdBjyOkmzmqynrBQAETKGiz6tK6rR1i/LHf0c6fNjVkuxEs5peNKpJMPlZzaZZruFwWJ988onC4XCT3Faz1gsACJgcSf8u6ejcmuxVr6vHXTPdrAgewqz/BKp2SKfPkMLhcOOzmlL0+4qKClX9xP1nNZPNfDVlvQCAAPpI0n2SjsaD71kc1qF/vtrNimwVxBQAO2b9M5kqgdizmg0NDdq0aZNKSqKfX9r0WU23G9WCbtLSGZEWua2JPwGrI+slWxUAkJLTJV0r6ZfRb3OnfV+Hzz5HDf3PcLMq2/CpVelBo9qGXfulB5eGJEU0fvx4SdKYMWMUDoeNfFazxKamsel6yVYFALTblyVVSXpLCu3fr/zx39auP70lq5s//wdCs9px7j9kabCKhSFt2NHi2dQ1a3TZZZcF9llNk5/XBQAYLkPSv0n6p+i3nTd+oNybb/R1xmqfk7oxwaoDGFFtRaIc0bIBGQpP8u8vVjxkqwIAOixH0k2S7pR0SGr4p+Joo5qR4XJh9mJ0tX0YBmtFohzR275mBe5WN9mqAIC0OFnS9yX9u9R90ENSKBjtCCOrqQvGldEO5Igej2MCAEibCySdH/1r48esBkDsUQCa1uRw678VTXNELctSWVmZKisrmzyb2vFZ9V7DMQEA2KVoaZZ2nrNZkeJT3S7FMU2bVR4LiI8c1TaQI3o8jgkAIO0sScsla0m29s77uQ5dfY3bFbnGyw2rHTmqNKpJ2FQtrfwg+px32QBzJgy5mWXaPFvVrNoAAB7zgaIfBiAp0qOH/u+Nv6ihzxddLckEXmtaCfx3wa790g/CZo0gmpBl2lpmqwm1AQA85kxJX5L0phTat09537tOu5aukDoFu03h0QAmUyVkYm6oiTV5oTYAgMHGSzop+testavV7ZEHXS3HNEGdgBXsf6okYGJuqIk1eaE2AIDhukqaLOkeSRGp+0P3qW70GB059zyXCzNL0EZZGeZqg4m5oSbWFGNybQAADyiR9PXoXzOOHFHepOulujpXSzJZEEZZaVTbYGJuqIk1xZhcGwDAI74p6WhCVecP/kfdH/qxq+V4gZ+zWbn
13wYTc0NNrMkLtQEAPKKTop9adaekBqnbf8xV3eVX6vDgIS4X5g2xZtUvjwUQT5WAibmhJtYUY3Jt8B5izoAAe1HS85J6Sv8XXqnDF17kdkWe5GTDSo6qSzmqUuLcUDeYWFOMybXBfMScAVCDpD9IGiWpq1R9Rb3LBXmf3U2rHY0qz6gmqaRIurzUrKbLxJpiTK4N5iPmDIAyJX1D0TQApIUXn2PlGVUARiHmDEA8RUuzVD2mTrIsKcQ/WjvCS8+xcqYBGIWYMwBx7ZSKyrLV9WePuV2Jb3hhhJVGFYBRiDkDcJxaSTMlvS91v/9uZf5tm8sF+YvJDSu3/gEYhZgzAMfJlXSxpOVS6OBB5c6Ypt3PvSRlZLhdma+Y+EgAjSoA44QnRTRuwT5VVFQ0LovN+gcQUFdL+ouk3VL28peV87vndeifr3a7Kl8yqWH1TTyVnXmLbmY5kiOJICPmDEAzf5b0aPSvDUW99dk7G2SlKQYJrUu2YbUjnsrzI6p25i26meVIjiQQbU5pUAE0ukDS+ZLekzKrd6j73Pu1794H3K7K99wcYfX8ZCo78xbdzHIkRxIAgDgqJHWO/rXrgseU+dFGV8sJEjcmXXl6RNXOvEU3sxzJkQQAoBW9JF0p6bdSxpEjyr31Zu1+8Y9MrHJQn5O6OTa66unhOTvzFt3MciRHEgCANlwp6aToX7MrX1Pn1W+5Wk4QOTW66ulG1c68RTezHMmRBACgDVmSyiUVSZohHb7oSy4XFFx2N6yevvVvZ96im1mO5EgCAJDAYEnnSep09ONVr6h3u6JA63NSN+3Lbkj7ej0fT7X7gDRugT2z4+1ct8nbBgDAa2hU3bevtlb9T+2Z1ngqzzeqMXbmLbqZ5UiOpDnItAUAg1nSp/3fV0P/M9yuJLDsaFQ9feu/KTvzFt3MciRH0n1k2gKA4bZJ+o3U88Pz9dma9Wro19/tipAmnp5MBTiBTFsAMNy7kv4nGlfVY/aP3K6/ED6bAAAgAElEQVQGaeSbEVXADmTaAoAHXCHpNUl7pJzfv6jOa97W4QsvcrsqpAFDQkAbyLQFAA/IkfQvx77tcdftkvlTcJAEGlWgDWTaAoBHlEk6JfrXrHfWKGv5y66Wg/SgUQXa0DTTNhwO65NPPlE4HG6Saet2hQAASVKmpH899m2PH8+SGtKf6wln0agCCYQnRXRhn32qqKjQqaeeqoqKCl3YZ5/Ck/jgBQAwymBJfaJ/7fz+enWffYeb1SANPDWZastO6bw+blcRRaZmcBR0k5bOiOjV96U1m6VhJdKogeltUrmeACANMiRdK+k+SQ1S98ce0ZGzz9Ghq69xuTC0l6ca1fPvcD+/kkzN4LHznHM9AUCanS5pnKSnJWVIoU8/dbkgdISnbv0/8cQTrudXkqkZPHaec64nALDBKEmXS5oh5fa5xe1q0AGeGlEdO3asunTp4lp+JZmawWPnOed6AgCbZCg6qnpU0dIsVV9Rr9DOGmVu2azDQ4dJIQYEvMBzZ8nN/EoyNYPHznPO9QQAzilamqVeE4t14uWXqlfJPylvwjXq8tQvlLl1C5mrBvNco+pmfiWZmsFj5znnegIAB22TFP3Pq0L/95m6/O4F5f3gRp103gCddE5/5U79nrJf+i9l7N3rZpVowVO3/p999lndftstR/MrnY8GapqpaVmWysrKVFlZ2SRTk7giv7HznHM9AYCDiiVNlPSupI2SPj/2o8xP/qauv/qluv7ql7I6ddKeJ3+lum/8S/z1wFEZlmX+eHdtba3y8vIkuT8jevcBadwCZmkHiZ3nnOsJAFzQIGmrpL8e/aqSdLjJz/9DUi+p+op6SVLmtq3K/GS76ocNlzp5aozPUftqa9X/1J7au3evcnNz07JOTzWqv7tZumqw29VEbaqOPkPodO6lG3mbLbdpSuan03XYec7dup4AAJLqJX0oab2kakm3tvj5s5L+W4qc2FOHLv+aDn39m6ovu0zKzna6UqMFvlG
Vgjva5EbeZrxtFuZnqmbPsY+kc+N8kD0KAHDULZL+t/miSI8eqvvqFTr0rW+r7rJRUufOrpRmEjsaVU9NpjIhR9UtbuRtttxmaWmpDlndXc/8JHsUAOCYiKRvShoiqckAamjfPnV5fokKxl6lXqefqtzpU9TprxtcKtK/PPWghds5qm5xI2+z5Tarqqq0fv16hcNhVzM/yR4FADgqJOmio1/1kjZI+rOik7IOHn3Jrv9T18VPqK7sUh05a6BLhfqT54aggpgx6UbeZsttbtmyxfEakqnLrToAAAGUJWmQpO9LWiDp3yUNO7o8RyqIfEdFS7NUtDRLktT5nTXq8tQviLzqAM81qkHMmHQjb7PlNvv27et4DcnU5VYdAICA6yTpfElTFG1ab1e0YT2qaGmWTrxzhPJ+cKN6nXGq8m64TlmrVkoRogdT4alb/27nqLrFjbzNeNssLS3V5MnuZn6SPQoAME6OpJIWyw4o+niApIzPP1eX536tLs/9WkdO66eDEybq83HjZRWc4HCh3sOsf4+wI28zUbxTvG0mO+vfzugoP2aPmhL55XccZwCOsRT9NKxVkt5StHFt+uOcHH3+L2N18Lvf15HzBjlenh0CH0/13n3SeX3crsZd6cjbTDXeqeU226rByegoP2SPErXlDI4zAFfVS3pP0muKfsBAC59VrtWRc89zuKj0C3w8Vd9eblfgvpIi6fLSjjVmqcY7tdxmWzU4GR2VjmPhNqK2nMFxBuCqLEkXSvqhpLmSviqp69GffUHq+cnQxglYkiTzxxAd46lnVNFxdsY7ER2VGo6XMzjOAIxysqQKSf8qaY2k7pIyoj+KNat1i76sIyX9dXDSVDWc1s+dOg3BcELA2BnvRHRUajhezuA4AzBSjqRLJLX8aPhNUvbKFeq2aIF6DjpL+eP+VZ3e+4vz9RmCRjVg7Ix3IjoqNRwvZ3CcAXjK/6rxE7AyLEs5f3hJPS+7SAXfHKOsNyoD91gAt/4Dxs54J6KjUsPxcgbHGYCnlCk6yvqapFck7Y4uzn79T8p+/U+qv/Ai7f/321U/crSUkeFenQ7x1Kz/vYuk3K6JX4+22Rnv5MfoKDtxvJzBcQbgSYclvSHp95J2Nv/RwfHXq/anC1woqnWBj6cypVF1MouxaodUuTH6j6ayAendnp3xTn6IjnISx8sZHGcAntSg6MSrlyT94+iyH0rVt9a7V1McNKouN6pOZjHu2i99+/GQlm84tq1QKKQvn2Xp2akWo0AAAARNRNFPu/qrpOuOLa6+ol6d3vuLMo4c0eEhF7pTm8hRdZ2TWYwVC0P68/bm28rLy9OqqiyyHwEACKKQpAvUrEmVpKI/ZKnnDRfpxK+MUP7V31CnDf/PheLswWSqJDmZxZhoW8vWiexHAAAQtUHS5uhfc179o7KXv6xDY8u170ezFfmnYldL6yiG5hKo2iH9cb206sPo905kMSbKfUz39gAAgIedKem7kk6MfpthWeqyJKyTBp+l7vfeqYx9+9ysrkNoVFuxa790xcMhnT5DGjNXmviL6HInshgT5T6me3sAAMDDOkm6VNJPJJVLOjqPJePQIXX/yYPqef4AdfnlIunIEfdqbCdu/bfi2POo8zVixAitWrVK119/vaZMtj+LMZb72HJb06ZNU3Z2tr585mGyHwEAQHOdJY1RNIv1RUmvSjoiZX66U3k3T1bW6re094mn3KwwZcz6j6Nqh3T6DCkcDjc+IypJP//5z3XjjTcqErF/1v/uA9LYecz6BwAA7bRT0hJJa49+f7dU/QP7Iq3smPXv+xHVV96X1m6WhpVIowYm956Wz4i+8sorWrt2rfr166dIJKJF35VOKYhlMSYe2WxP7mpBN+nV2yPaVB3NUZWksgGRtE6gSqYuJzNj7dSe66A1fjkmAACf6yVpmqQqSR9KKpGKlmZJikZahT7ZrkhhkZSVlZbNbf14c1rW05RvG9UtNdLwezJVs6ehcVlhfqZWz2rQF3u
1/d7YM6LPP/+8HnroIdXU1DT+LDMzU6f1atBlZyWuIR25qyU2NEPJ1OVkZqydOnIdtOSXYwIACJj+R7+aKPp9lnSndKRTiWof/A/Vf/kr7V797t27dON3r9XKFa92rM44fDuZavg9mTpkdW+WQ3rI6q5hszMTvjf2jOhtt92mQ4cONVtH9+7ddc3PEq9DcjZ3NRXJ1GVq7anqyHXQkl+OCQAAel3SNqnT5k064V++prwJ1yi0sybRu+K68bvX6o2VK9JaXowvR1RfeV+q2dPQag7p8g2Jb/9OGBHRsnURzZ/fvnU4mbuaimTqsiwza09VOq6DGFPPJwAA7dJX0VHWqui3XX73grJfX6F99z6oz8eNj352exK2bK6yZSQ1xpdDQWuPPiLRWg7p6k2J1/HB3zu2jkRZqG7loCZTl6m1pyod10GMX44JAACSpD6S7pL0PUndo4tCe3Yrb+oNKvjGV5WZ5POm27Z+bFOBR2uyde0uGdov+mdrOaTDSuxfR6IsVLdyUJOpy9TaU5WO6yDGL8cEAIBGGZJGSJorafixxdmrXlfPi85Xt0fnJsxe7fPF0+ys0J+3/kefE50wM7lFDumUKVNUmJ+pUQMbbF9H7DnXaVPtz11NRbJ1mVh7qtJxHcSYej4BAOiwXEk3KtqsLpb0WfTDAnrcfYfqv1Smw4OHtPrWvv3665Ivf0VvrFyhhobk/7+aLN/mqG7dKQ2b3bHZ3h1dx+4D0rgF5s0ST6YuU2tPVTqugxi/HBMAAFp1SNILkl6WNFpSRTTKqi179uzWpOsrGp9VTWeOqm8b1ZjlG6LPInYkP/OXldJrf5W+fJZ0XVnq799UHX2Gsb25m3bldiZTV2uvSVSTaVmj6bgOYl59X1qTpkxWAACM9LGkUyRlH1tUffkhhf7+iSKnfiHuW5b94SVdP+5faVSd4nZuptvbb09NJtacLn7eNwAAEnpVsp7rotp7H9Tn13+vMRmgZY5qOhtVX06mShe3czPd3n57ajKx5nTx874BANCmHZJ+I2V8/rnyZkxTwdXfUOjTnZLIUXWF27mZbm+/PTW9usG8mtPFxPMBAIBjChRNCPhT9Nvs5S/rxOGDtPGue8lRdYPbuZlubz+eRDWt2dT2z72cNWri+QAAwDE5kiZIulVSXnRR5s4anT3lBj0gqbNNm6VRbYXbuZlubz+eRDVdWNL2z72cNWri+QAAwHHnSpoj6Zxji26T9KainyGQbtz6b4XbuZlub789NX1lYMS4mtPFxPMBAIAr8iTdomiE1RJJDdIQSW9IKk7zpnw5679pNJJltT8mye3cTLe3356aTKw5Xfy8bwAAtMtWqeExKXOntEHRgVbiqVoRLz4oFAopEjn2aUvtaSo6moPaUW5vP55ENZlYc7r4ed8AAEjZIUn/KX1wunTWEzSqrbri4Wh80GPz5mvEiBFatWqVpk6dqvPOO0//9m//pmlTJ+vCPvu0dAa3aQEAANKp9qCUNzG9japvnlFNFB+0cOFC/fSxx4kSAgAA8AjfzPpPGB+0eTNRQgAAAB7im0Y1YXxQv35ECQEAAHiIb279txYfNG3aNF122WVau3YtUUIAAAAe4qvJVPHig9Ix6x8AAABtYzJVC03zUkuKpIJu0tIZkWbxQVKkSZSQN0dSW+6naeLVZ3rNAADAfJ5sVOPlpTYdLS1p0Rx5tVFKtJ9ui1ffqIEhSZaWbzg2UG9SzQAAwDs8OZmqYmE0LzUcDmv79u0Kh8Nas62Hxi3w5O60yvT9jFffn7f30KqqLGNrBgAA3tGu7mH+/Pnq06ePcnJyNHToUL3zzjtJvW/JkiXKyMjQVVdd1Z7NSjqWl/rYvGheanFxscrLy/XTxx7XsnXR2/5+YPp+tlbfvMcfV11dnYYMGWJczQAAwFtSblSfffZZTZ8+XbNmzdJ7772nc889V6NHj9bOnTvbfN+2bds0Y8YMXXzxxe0uVkoiL7VFM1S1Q/rjenmuSUp1P52WTG7
tccs8dg4AAIC7Um5UH3nkEU2cOFETJkzQmWeeqYULF6pr165avHhxq+9paGhQeXm5Zs+erdNOO61DBSfMSz36POqu/dGPVD19hjRmrtT/36Pf7z7Qoc07Jtn9dEsyubXHLfPos8IAAMAdKU2mqq+v17vvvquZM2c2LguFQho5cqRWr17d6vvuuece9erVS9dff73eeOON9ler1vNSW2akHnt+cr5GjBihVatWadrUyRq3YJ+WzjB/9n+y+2lafVOnTFF2drbWrl2rnJwco2oGAADeklKj+tlnn6mhoUGFhYXNlhcWFurDDz+M+54333xTTz75pNavX5/0durq6lRXV9f4fW1tbbOfhydFNG7BPlVUVDQui80sl449PxkOR5+flKTy8nJZlqWKigptqvZGEkCi/XRbvPpis/5NrRkAAHiHrfFU+/ZFm5hFixapZ8+eSb9vzpw5mj17dqs/j5eXalkRrdkU/Xsyz3d2tFF1Iic03n46MSqZ7L61VZ/TNQMAAP9JqVHt2bOnMjMzVVNT02x5TU2NioqO72i2bNmibdu26corr2xcFvuUqE6dOumjjz5S3759j3vfzJkzNX369Mbva2trVVxcfNzrSoqkE7sfn+VZdmZIUkSrVq1qHFGV0vOspBvZpi1zYe3S3n2LV59TNQMAAP9KaTJVVlaWBg0apBUrVjQui0QiWrFihYYNG3bc68844wxt2LBB69evb/z6+te/rksvvVTr16+P23xKUnZ2tnJzc5t9tSZelueGHT1UmJ+paVMnKxwO65NPPlE4HG7yrGQqe514e37JCfXzvgEAAO9J+db/9OnTNX78eA0ePFhDhgzRo48+qgMHDmjChAmSpGuvvVannHKK5syZo5ycHJ199tnN3p+fny9Jxy1vj0TPopYNqE3rs5J+efY1Hj/vGwAA8KaUG9WxY8fq008/1V133aXq6mqVlpbq5ZdfbpxgtX37doVCzozAJXoW9bavWVr03fQ9K+nEs69u8fO+AQAAb2pXRzllyhT97W9/U11dndauXauhQ4c2/mzlypV66qmnWn3vU089pRdffLE9mz1OMlmjJUXS5aXpabJMzzbtCD/vGwAA8CZbZ/3bzemsUdOzTTvCz/sGAAC8KcOyLMvtIhKpra1VXl6e9i6Scrs2/9nuA9K4Bc7Nwnd6e07y874BAAB71R6U8iZKe/fubXMifCo836jGNM/ttL8mu7cXyzLNDEkNkeO3Y2eOq9PHMh2cyLU1UVD3GwBgHjsaVU/f+m/K6dxOu7YXL8s0FAopEolozHkhzbs2oqn/ae+op5cyUN3ItTVBUPcbABAsBGQaJl6WaX5+vkpLS7VmWw9dNDuTrNMmgpr9GtT9BgAEi29GVP0gUZbpLbfcorlz55J1elRQs1+Dut8AgOBh+MUgibJMGxoa2vz55mqbCzRMMtmvfhTU/QYABA+NqkESZZlmZma2+fOgZZ0GNfs1qPsNAAgebv0bpNUs05tuUmlpqZ78xRMqzM8k6/SooGa/BnW/AQDB45t4Kr+Il2XadNb/49dGNMXmWf9eEtTs16DuNwDAXOSoutioOp1XGcsy7ZQpHWk4fruvvi+t2SwNK5FGDbS/nrYkc2zsPn5ezH5Nh6DuNwDAPOSousCtvMrWskxNys9Mphan6vVS9ms6BXW/AQDBwGSqBEzLqzSpnmRqMaleAADgLYyotsG0vEqT6kmmFssyp14AAOA9DGu1wbS8SpPqSaYWk+oFAADeQ6PaBtPyKk2qJ5laTKoXAAB4D7f+22BaXqVJ9SRbiyn1AgAA7/FlPNUr70tr0xTdFC+vcsSADE0eaem8Ps4/Y9me/Ey7oqGSqYW8T3SE07FwAID2I0c1QaO6pUYafk+mavY0NC4rzM/U6lkN+mKvjtWwqVpat1X62WshVX7gftOVTH6mU9FQydRC3idSYVIMGwAgOXY0qr56RnX4PZk6ZHVvFoV0yOquYbMzO7zukiLp6bdC2rDDjKi
lkiLp8tK2mz6noqGSqSWZ1wAxxJoBACQfPaP6yvtSzZ6GVqOQlm/o2GMAJkVDJcNr9QIxXLsAgBjfDE+s3Rz9s7UopNWbOrZ+r0Utea1eIIZrFwAQ45tGdWi/6J+tRSENK+nY+r0WteS1eoEYrl0AQIxvbv2PPic6cWry5OZRSFOmTFFhfqZGDWxIvJI2mBQNlQyv1QvEcO0CAGJ8Net/605p2Gx7Zv1L3ota8lq9QAzXLgB4D/FUSeaoLt8QfSY1HTmq0vFZjm1FLZmY+0g0FLyKaxcAvINGNclGNV1SyXIk9xEAAAQZOaoOSyXLkdxHAACA9PLNZKp0SyXLkdxHAACA9GO4rxWpZDmS+wgAAJB+NKqtSCXLkdxHAACA9OPWfytSyXIk9xEAACD9mPXfhlSyHMl9BAAAQWbHrH9fjqg2zTK1rPbnmhZ0k5bOiLTIcow/OprKa1PlRjariXmwCCauRQAILl81qi2zTEOhkCKRjo9wlqTwP8hUXpuIG9ms5MHCFFyLAABfTaZqmmV62WWXKS8vz9O5pm5ks5IHC1NwLQIAfDOi2jTL9IILLtC4ceMUDoc9m2vqRjYrebAwBdciAEDy0Yhq0yzTLVu2NP69KS/lmrqRzUoeLEzBtQgAkHzUqDbNMu3bt2/j35vyUq6pG9ms5MHCFFyLAADJR7f+m2aZ/vSxx3XZZZdp6tSpns01dSOblTxYmIJrEQAg+SxHtWWWabpm/bvFjWxW8mBhCq5FAPAWO3JUfdWoxjTNMpWa5pqmpx6ncx2bZ7Pavz23tgnEw7UIAN5Ao+rwJ1O1RK4jAABAfHY0qr6ZTOUEch0BAACc45vJVHYj1xEAAMBZDAUmiVxHAAAAZ9GoJolcRwAAAGdx6z9J5DoCAAA4i0Y1BeFJEY1bsE8VFRWNy2Kz/gEAAJBevmlUncg2LegmLZ0RaZHr6GyT6nSGa0d4qVa3cIwAAGid5xtVN7JNS1xoKryU4eqlWt3CMQIAIDHPT6YKSrapl/bTS7W6hWMEAEBinh5RDUq2qZf200u1uoVjBABAcjw9fBOUbFMv7aeXanULxwgAgOR4ulENSrapl/bTS7W6hWMEAEByPH3rPyjZpl7aTy/V6haOEQAAycmwLMtyu4hEamtrlZeXp72LpNyuzX+2+4A0bkHrs6fjxf+YHgkUr75E+5mKV96X1m6WhpVIowamXk+i45fOWv2KY+RP6fxvi+n/nQKAlmoPSnkTpb179yo3Nzct6/R8oxrTPNs0fvzPyIEZylCGlm8wszlIJrKo5X6mYkuNNPyeTNXsaWhcVpifqdWzGvTFXsnVU5jf/P1tHb+O1BoUHCN/SGfcGNFlALzKjkbV08+oNlVSJF1eeux/9vHif96oytKft5sbCZRMZFHL/UzF8Hsydcjq3mz9h6zuGjY7M6l6SktLj3t/W8evI7UGBcfIH9IZN0Z0GQAc4+lnVFsTL/7nggsuUF1dnZ588kkjI4Hsjix65X2pZk9Dq+tfvqH5YwAt66mqqtL69esVDoeNPH6AW9L5u0t0GQA058t/oseL/9myZctxyyRzIoHsjixau7nt9a/e1HY9ph8/wC3p/N0lugwAmvNloxov/qdv377HLZPMiQSyO7JoaL+21z+spO16TD9+gFvS+btLdBkANOfLW//x4n/eeecdZWdna8pkMyOB7I4sGn1OdCLU5Bb7P2XKFBXmZ2rUwIZmr49XT2lp6XHvN+X4AW5J5+8u0WUA0JxvZv23FC/+Z9TADMngWf92RxZt3SkNm538rP949aQy6x8IinT+7hJdBsCriKdKoVGN2VQtrfxAysiQygZEJyKYHgnU0foS5S8u3xB9JjXZHNWW9Zh4/MichAnS+bvx6vvSmhTyjgHAbTSqKTaqQcsjDNr+SsHcZ/gb1zQAryJHNUVByyMM2v5Kwdxn+BvXNAAc48vJVFLw8giDtr9SMPcZ/sY1DQDN+faf6EHLIwza/krB3Gf4G9c0ADT
n20Y1aHmEQdtfKZj7DH/jmgaA5nx76z9oeYRB218pmPsMf+OaBoDmfD3rP2h5hEHbXymY+wx/45oG4FXEU7UjR1UyM/ezLa1lgiabFdpyf4OQMeq1cwwkwjUNwGtoVNvZqHpFa/mJ8yoimvqr1EdYyGMEAABOIUfV51rLT7zonsx25SqSxwgAALzMt5OpvCZRfuLDD9+ZUq4ieYwAAMDrGFozRKL8xJNOOinu8tZyFcljBAAAXkejaohE+Ymffvpp3OWt5SqSxwgAALyOW/+GaCs/sTA/U/ffd68KCwuTzlUkjxEAAHgds/4N0lp+4uPXRjTlP1OfvU8eIwAAcArxVD5vVGNay09sb65ie95XtUOq3ChlZEhlA5h4BTgpCNnHAPzHjkaVW/8GKmnlf06tLW/v+uLZtV8a+3iGXvtrhiKRYyOxowaG9OxURmIBO5F9DADNMZkKzVQsDOmNqizl5eU1y1/983byVwG7kX0MAM0xoopGsexVqU5PPvkk+auAg8g+BoDj8c90NIplr0rkrwJOI/sYAI5Ho4pGsexVifxVwGlkHwPA8bj1j0ax7NUVH3TW1KlTm+WvTp1C/ipgJ7KPAeB4xFOhmd0HpLHzMrSCWf+A48g+BuBlxFPBdgXdpFdvt7Sp2lLlxuiyaI4qozmA3Qq6SUtnRFpkH/O7ByC4aFQRV3szWwF0HL9/ABDFZCoAAAAYiUYVAAAARqJRBQAAgJFoVAEAAGAkGlUAAAAYyTez/qt2RD+CsJ8Ns2VbrtvObSWzfa/wat2ACfj9AQAfNKq79ksVC+0JyI637sL8TNXsaUj7tpLdvhfCv71aN2ACfn8A4BjP3/qvWBjSmm09FA6HtX37doXDYa3Z1kPjFnR811quu7S0VIes7rZsK5nt2729dPFq3YAJ+P0BgGM8PaJatUNati6icHi+ysvLJUnl5eWyLEsVFRXaVN3+W2Yt111VVaX169crHA6nfVvJbN/u7aWLV+sGTMDvDwA05+l/om+pif45YsSIZsvLysokRT+CMF3r3rJli23bSmb7dm8vXbxaN2ACfn8AoDlPN6p9C6N/rlq1qtnyyspKSdFJCOlad9++fW3bVjLbt3t76eLVugET8PsDAM15+tZ//97RSQbTpk6WZVkqKytTZWWlbpo2RWPOC6mkKJJ4JSmsu7S0VJMnp39byW7fzu2li1frBkzA7w8ANJdhWZbldhGJ1NbWKi8vT3sXSbldm/9s9wFp3AJ7ZsjGW7eTs/7t3Dc7ebVuwAT8/gDwqtqDUt5Eae/evcrNzU3LOj3fqMZsqo4+v2VH5mDLddu5rWS277T25jnaVTf5kmgvr1w7VTukVR9G/142wOxaASCGRrWNRhXpZ1qeo2n1wDu8cu14pU4AiMeORtXTk6lgL9PyHE2rB97hlWvHK3UCgFM8PZkK9jEtz9G0euAdXrl2vFInADiJf6YjLtPyHE2rB97hlWvHK3UCgJNoVBGXaXmOptUD7/DKteOVOgHASdz6R1ym5TmaVg+8wyvXjlfqBAAnMesfrTItz9G0euAdXrl2vFInAMRDPBWNaiMn8yDdznFt6dX3pTWbpeITpaI8c+qC+Uy7llvjlToBoCk7GlVu/XuMGzmLJYb8z7LpvodCIUUijDohNaZcy4l4pU4AsFu7JlPNnz9fffr0UU5OjoYOHap33nmn1dcuWrRIF198sQoKClRQUKCRI0e2+Xq0Lcg5i7F9Ly0tVX5+fiCPAQAAQZLy/9mfffZZTZ8+XbNmzdJ7772nc889V6NHj9bOnTvjvn7lypX6zne+o9dff12rV69WcXGxvvKVr+gf//hHh4sPmljO4mPzojmLxcXFKi8v108fe1zL1kW0ycfxNbF9n/nDH2n9+vV67LHHAncMAAAImpQb1UceeUQTJ07UhAkTdOaZZ2rhwoXq2gkgUPkAACAASURBVLWrFi9eHPf1zzzzjG688UaVlpbqjDPO0C9+8QtFIhGtWLGiw8U
HTZBzFmP73qtXL0nBPAYAAARNSo1qfX293n33XY0cOfLYCkIhjRw5UqtXr05qHQcPHtThw4d1wgkntPqauro61dbWNvtCsHMWY/seG7kP4jEAACBoUppM9dlnn6mhoUGFhYXNlhcWFurDDz9Mah233XabTj755GbNbktz5szR7NmzUyktEIKcsxjb9zn3/1ilpaWaNm1a4I4BAABB4+is/wceeEBLlizRypUrlZOT0+rrZs6cqenTpzd+X1tbq+LiYidKNF54UkTjFuxTRUVF47LYjHe/i+37snXrFQqFAnkMAAAIkpQa1Z49eyozM1M1NTXNltfU1KioqO17rg8//LAeeOAB/elPf9I555zT5muzs7OVnZ2dSmnGSnfeaUE3aemM6KShlR9IGRlS2QB3Y5mcynRtuu+bqyPqlCkdaYhtlyYVAAC/SalRzcrK0qBBg7RixQpdddVVktQ4MWrKlCmtvu+hhx7Sfffdp1deeUWDBw/uWMUeYWfe6a790g/C7n96jRuZrhIZkwAABEXKs/6nT5+uRYsW6emnn9bGjRs1adIkHThwQBMmTJAkXXvttZo5c2bj6x988EHdeeedWrx4sfr06aPq6mpVV1dr//796dsLA9mZd2pKlqopdQAAAH9K+RnVsWPH6tNPP9Vdd92l6upqlZaW6uWXX26cYLV9+3aFQscalQULFqi+vl7f+ta3mq1n1qxZuvvuuztWvaFimZ/hcDTvVJLKy8tlWZYqKiq0qbr9I4J2rtuLdQAAAP9q12SqKVOmtHqrf+XKlc2+37ZtW3s24WnJ5J22t4mzc91erAMAAPgX92htYGfeqSlZqqbUAQAA/MvReKqgsDPv1JQsVVPqAAAA/pVhWZbldhGJ1NbWKi8vT3sXSbld3a4mObsPSOMW2DMj3s51e7EO+JdT0WcAgI6rPSjlTZT27t2r3NzctKyTRtVm0cxPe/5Ha+e6vVgH/MOt6DMAQPvZ0ajyjKrNSoqky0vtaeDsXLcX64B/EH0GAJB4RhWAYYg+AwDEMDwBwCjJRJ8BAIKBRhWAUYg+AwDEcOsfgFGIPgMAxNCoAjBOeFJE4xbsU0VFReOy2Kx/AEBw0KgiEMjj9JaCbtLSGZEW0Wc0qQAQNDSq8DXyOL2thH9YAECgMZkKvkYeJwAA3sWIKnyLPE4AALyNYSX4FnmcAAB4G40qfIs8TgAAvI1b//At8jgBAPA2GlX4GnmcAAB4F40qfI08TgAAvItGFYFAHicAAN7DZCoAAAAYiUYVAAAARqJRBQAAgJFoVAEAAGAkGlUAAAAYiUYVAAAARqJRBQAAgJFoVAEAAGAkGlUAAAAYiUYVAAAARqJRBQAAgJFoVAEAAGAkGlUAAAAYiUYVAAAARurkdgGAE6p2SFtqpH5FUkmR29UAAIBk0KjC13btlyoWhrRsXaRx2ZjzQgpPiqigm4uFAQCAhLj1D1+rWBjSmm09FA6HtX37doXDYa3Z1kPjFnDpAwBgOkZU4VtVO6Rl6yIKh+ervLxcklReXi7LslRRUaFN1TwGAACAyRhWgm9tqYn+OWLEiGbLy8rKJEmbq52uCAAApIJGFb7VtzD656pVq5otr6yslBSdWAUAAMzFrX/4Vv/e0YlT06ZOlmVZKisrU2VlpW6aNkVjzguppCiSeCUAAMA1NKrwtfCkiMYt2KeKiorGZbFZ/wAAwGw0qvC1gm7S0hkRbaqOPpMazVGlSQUAwAtoVBEIJQT9AwDgOUymAgAAgJFoVAEAAGAkGlUAAAAYiUYVAAAARqJRBQAAgJFoVAEAAGAkGlUAAAAYiUYVAAAARqJRBQAAgJFoVAEAAGAkGlUAAAAYiUYVAAAARqJRBQAAgJFoVAEAAGAkGlUAAAAYiUYVAAAARqJRBQAAgJFoVAEAAGAkGlUAAAAYiUYVAAAARqJRBQAAgJFoVAEAAGAkGlUAAAAYiUYVAAAARqJRBQAAgJFoVAEAAGAkGlUAAAAYiUYVAAA
ARqJRBQAAgJE6uV2A3ap2SFtqpH5FUkmR29UAAAAgWb5tVHftlyoWhrRsXaRx2ZjzQgpPiqigm4uFAQAAICm+vfVfsTCkNdt6KBwOa/v27QqHw1qzrYfGLfDtLgMAAPiKL0dUq3ZIy9ZFFA7PV3l5uSSpvLxclmWpoqJCm6p5DAAAAMB0vhxe3FIT/XPEiBHNlpeVlUmSNlc7XREAAABS5ctGtW9h9M9Vq1Y1W15ZWSkpOrEKAAAAZvPlrf/+vaMTp6ZNnSzLslRWVqbKykrdNG2KxpwXUklRJPFKAAAA4CpfNqqSFJ4U0bgF+1RRUdG4LDbrHwAAAObzbaNa0E1aOiOiTdXRZ1KjOao0qQAAAF7h20Y1poSgfwAAAE/y5WQqAAAAeB+NKgAAAIxEowoAAAAj0agCAADASDSqAAAAMJLvZ/3HU7Uj+jGr/RxIBHByW07w2/4AAABzBapR3bVfqlgY0rJ1x/JUYx8CUNDNu9tygt/2BwAAmC9Qt/4rFoa0ZlsPhcNhbd++XeFwWGu29dC4Bek/DE5uywl+2x8AAGC+wIyoVu2Qlq2LKByer/LycklSeXm5LMtSRUWFNlWn71a2k9tygt/2BwAAeENghsO21ET/HDFiRLPlZWVlkqIfs+rFbTnBb/sDAAC8ITCNat/C6J+rVq1qtryyslJSdHKQF7flBL/tDwAA8IbA3Prv3zs6+Wfa1MmyLEtlZWWqrKzUTdOmaMx5IZUURRKvxMBtOcFv+wMAALwhw7Isy+0iEqmtrVVeXp72LpJyu7Z/PbsPSOMWODNz3cltOcFv+wMAANKr9qCUN1Hau3evcnNz07LOQDWqMZuqo89VJpMF2tHc0FS25QV+2x8AAJAedjSqgbn131RJEk1WunJDk9mWl/htfwAAgLkCM5kqVeSGAgAAuCuQI6qJkBsKAADgPoYH4yA3FAAAwH00qnGQGwoAAOA+bv3HQW4oAACA+2hUWxGeFNG4BftUUVHRuCw26x8AAAD282Wj2tHsU0kq6CYtnRHRqxukNZukYSXSqIE0qUivdFyrAAD4la8a1XRln6Z7XUBLXF8AACTmq8lU6cw+JUcVduL6AgAgMd+MqKYz+5QcVdiJ6wsAgOT4Zvgmndmn5KjCTlxfAAAkxzeNajqzT8lRhZ24vgAASI5vbv2nM/uUHFXYiesLAIDkZFiWZbldRCK1tbXKy8vT3kVSbtfWX7f7gDRuQXpmUqdzXUBLXF8AAL+pPSjlTZT27t2r3NzctKyzXY3q/PnzNXfuXFVXV+vcc8/VvHnzNGTIkFZf//zzz+vOO+/Utm3bVFJSogcffFBjxoxJenvJNqoxm6qjz/mlI5uyo+siJxNtaZ7T63Y1AAC0nx2Nasq3/p999llNnz5dCxcu1NChQ/Xoo49q9OjR+uijj9SrV6/jXv/222/rO9/5jubMmaOvfe1r+vWvf62rrrpK7733ns4+++y07ERLJWlsCtu7LnIy0RauDwAAEkt5MtUjjzyiiRMnasKECTrzzDO1cOFCde3aVYsXL477+p/+9Kf66le/qltuuUUDBgzQvffeq/PPP1+PP/54h4s3GTmZaAvXBwAAiaU0olpfX693331XM2fObFwWCoU0cuRIrV69Ou57Vq9erenTpzdbNnr0aL344outbqeurk51dXWN3+/du1eSVPt5KtW6Z3NNNCfziSfm6sorr5QkXXnllTp48KBuuOEGrdsm9T1+8BkBwfUBAPCjWJ+W1ulPVgr+8Y9/WJKst99+u9nyW265xRoyZEjc93Tu3Nn69a9/3WzZ/PnzrV69erW6nVmzZlmS+OKLL7744osvvvjy2NeWLVtSaS/bZGQ81cyZM5uNwu7Zs0df+MIXtH37duXl5blYGZxQW1ur4uJiffLJJ2l7GBvm4nwHC+c7WDjfwbJ3716deuqpOuGEE9K2zpQa1Z49eyozM1M1NTXNltfU1KioKP6Mo6KiopReL0nZ2dnKzs4+bnleXh4
XeoDk5uZyvgOE8x0snO9g4XwHSyiUvvkWKa0pKytLgwYN0ooVKxqXRSIRrVixQsOGDYv7nmHDhjV7vSQtX7681dcDAAAAUjviqaZPn67x48dr8ODBGjJkiB599FEdOHBAEyZMkCRde+21OuWUUzRnzhxJ0k033aSysjL95Cc/0RVXXKElS5boL3/5i5544on07gkAAAB8JfPuu+++O5U3nH322crPz9d9992nhx9+WJL0zDPP6PTTT5cUjaPq1KmTrrrqKklScXGxBgwYoLlz5+qBBx5QTU2NnnzySQ0fPjy1QjMzdckll6hTJyMfq0Wacb6DhfMdLJzvYOF8B0u6z7cnPkIVAAAAwUO6OAAAAIxEowoAAAAj0agCAADASDSqAAAAMJIxjer8+fPVp08f5eTkaOjQoXrnnXfafP3zzz+vM844Qzk5ORo4cKCWLVvmUKVIh1TO96JFi3TxxReroKBABQUFGjlyZMLrA2ZJ9fc7ZsmSJcrIyGhMEYE3pHq+9+zZo8mTJ6t3797Kzs5W//79+W+6h6R6vh999FGdfvrp6tKli4qLi3XzzTfr0KFDDlWL9lq1apWuvPJKnXzyycrIyNCLL76Y8D0rV67U+eefr+zsbPXr109PPfVU6htO24exdsCSJUusrKwsa/HixdZf//pXa+LEiVZ+fr5VU1MT9/VvvfWWlZmZaT300EPWBx98YP3oRz+yOnfubG3YsMHhytEeqZ7va665xpo/f761bt06a+PGjdZ1111n5eXlWX//+98drhztker5jtm6dat1yimnWBdffLH1jW98w6Fq0VGpnu+6ujpr8ODB1pgxY6w333zT2rp1q7Vy5Upr/fr1DleO9kj1fD/zzDNWdna29cwzz1hbt261XnnlFat3797WzTff7HDlSNWyZcusO+64w/rtb39rSbJ+97vftfn6jz/+2Oratas1ffp064MPPrDmzZtnZWZmWi+//HJK2zWiUR0yZIg1efLkxu8bGhqsk08+2ZozZ07c11999dXWFVdc0WzZ0KFDre9973u21on0SPV8t3TkyBGrR48e1tNPP21XiUij9pzvI0eOWBdddJH1i1/8who/fjyNqoeker4XLFhgnXbaaVZ9fb1TJSKNUj3fkydPti677LJmy6ZPn24NHz7c1jqRXsk0qrfeeqt11llnNVs2duxYa/To0Slty/Vb//X19Xr33Xc1cuTIxmWhUEgjR47U6tWr475n9erVzV4vSaNHj2719TBHe853SwcPHtThw4d1wgkn2FUm0qS95/uee+5Rr169dP311ztRJtKkPef7v//7vzVs2DBNnjxZhYWFOvvss3X//feroaHBqbLRTu053xdddJHefffdxscDPv74Yy1btkxjxoxxpGY4J129musfE/HZZ5+poaFBhYWFzZYXFhbqww8/jPue6urquK+vrq62rU6kR3vOd0u33XabTj755ON+AWCe9pzvN998U08++aTWr1/vRIlIo/ac748//livvfaaysvLtWzZMm3evFk33nijDh8+rFmzZjlRNtqpPef7mmuu0WeffaYvfelLsixLR44c0fe//3398Ic/dKJkOKi1Xq22tlaff/65unTpktR6XB9RBVLxwAMPaMmSJfrd736nnJwct8tBmu3bt08VFRVatGiRevbs6XY5cEAkElGvXr30xBNPaNCgQRo7dqzuuOMOLVy40O3SYIOVK1fq/vvv189+9jO99957+u1vf6ulS5fq3nvvdbs0GMr1EdWePXsqMzNTNTU1zZbX1NSoqKgo7nuKiopSej3M0Z7zHfPwww/rgQce0J/+9Cedc845dpaJNEn1fG/ZskXbtm3TlVde2bgsEolIkjp16qSPPvpIffv2tbdotFt7fr979+6tzp07KzMzs3HZgAEDVF1drfr6emVlZdlaM9qvPef7zjvvVEVFhb773e9KkgYOHKgDBw7ohhtu0B133KFQiPEzv2itV8vNzU16NFUyYEQ1Kyt
LgwYN0ooVKxqXRSIRrVixQsOGDYv7nmHDhjV7vSQtX7681dfDHO0535L00EMP6d5779XLL7+swYMHO1Eq0iDV833GGWdow4YNWr9+fePX17/+dV166aVav369iouLnSwfKWrP7/fw4cO1efPmxn+QSFJVVZV69+5Nk2q49pzvgwcPHteMxv6REp2jA79IW6+W4kQvWyxZssTKzs62nnrqKeuDDz6wbrjhBis/P9+qrq62LMuyKioqrNtvv73x9W+99ZbVqVMn6+GHH7Y2btxozZo1i3gqD0n1fD/wwANWVlaW9cILL1g7duxo/Nq3b59bu4AUpHq+W2LWv7eker63b99u9ejRw5oyZYr10UcfWX/4wx+sXr16WT/+8Y/d2gWkINXzPWvWLKtHjx7Wb37zG+vjjz+2Xn31Vatv377W1Vdf7dYuIEn79u2z1q1bZ61bt86SZD3yyCPWunXrrL/97W+WZVnW7bffblVUVDS+PhZPdcstt1gbN2605s+f7914KsuyrHnz5lmnnnqqlZWVZQ0ZMsRas2ZN48/Kysqs8ePHN3v9c889Z/Xv39/KysqyzjrrLGvp0qUOV4yOSOV8f+ELX7AkHfc1a9Ys5wtHu6T6+90Ujar3pHq+3377bWvo0KFWdna2ddppp1n33XefdeTIEYerRnulcr4PHz5s3X333Vbfvn2tnJwcq7i42Lrxxhut3bt3u1A5UvH666/H/X9x7PyOHz/eKisrO+49paWlVlZWlnXaaadZv/zlL1PeboZlMdYOAAAA87j+jCoAAAAQD40qAAAAjESjCgAAACPRqAIAAMBINKoAAAAwEo0qAAAAjESjCgAAACPRqAIAAMBINKoAAAAwEo0qAAAAjESjCgAAACPRqAIAAMBINKoAAAAwEo0qAAAAjESjCgAAACPRqAIAAMBINKoAAAAwEo0qAAAAjESjCgAAACPRqAIAAMBINKoAAAAwEo0qAAAAjESjCs+45JJL9IMf/MDtMjokIyNDL774YkrvSfd+p7I+p475W2+9pYEDB6pz58666qqrbN9eW6677jrXa/CzVK6pbdu2KSMjQ+vXr5ckrVy5UhkZGdqzZ4+dJcIhTf972PJcAzGd3C4AgLN++9vfqvP/b+/OY6I43ziAf7mW3eVGueVQYSkaUFYUFFGqq7uCG6yK1ChoxFoPiqaI1loRamstiFd6SG0CRqzWiqJFEIQqKrUUKqIWshyl1RjiBQ3iAcq+vz8MU1ZWWBTo8uvzSUjced+Z933nnXf22Zl3RgODXs/7Ot5//32MHj0aOTk5MDY27vPygOdfjEOHDkVZWRlGjx7NLd+9ezcYY/1Sh/+i1zmmJkyYgPr6epiZmfVyrUhfio+PR2ZmZpdBqKOjI+rr6zF48OB+rBkZCChQJaSD1tZW8Hi8f7safcrS0rJP8r6O2tpaLF++HEOGDOmX8rpCQVDfep1jisfjwdbW9rXK/y+M8YFIT0+P+paoRbf+yYDV0tKCtWvXwsHBAUZGRvD19cW5c+e49Pv372P+/PlwcHCAUCiEp6cnDh06pLKNwMBAREVFYc2aNRg8eDCkUil3e7GgoAA+Pj4QCoWYMGECFAqFyronTpyAWCwGn8/HsGHDkJCQgGfPnnHp1dXVmDRpEvh8PkaMGIEzZ85026aHDx8iIiICxsbGsLOzQ3Jyco/bDTy/lR4YGAihUAgLCwtIpVI0NjZybe546/Wrr76Cm5sb+Hw+bGxsMHfuXJX90zFvY2MjIiIiYGFhAaFQiBkzZqC6uppLT0tLg7m5OXJzc+Hh4QFjY2PIZDLU19erbW/77b779+9jyZIl0NHRQVpaGredjjIzM6Gjo8N9jo+Px+jRo3HgwAG4uLjAzMwMb7/9Nh48eMDlUSqVSExMhKurKwwNDeHk5IRPP/0UADB06FAAgLe3N3R0dBAYGAig863/lpYWREdHw9raGnw+HxMnTkRJSQmXrunxos2qqqqQk5Oj0pd9peMx5eL
igq1bt2LJkiUwMTGBk5MTvvnmm5euq+7W/8WLFxEQEACBQABHR0dER0fj4cOHXLqLiwu2bNmCiIgImJqaYtmyZWhtbUVUVBTs7OzA5/Ph7OyMzz77rO8a3cv6s7+ArsdAd2M1LS0NCQkJKC8vh46ODjfGX6Tu1v/169cxY8YMGBsbw8bGBuHh4bh37x6Xru78zRhDfHw8nJycYGhoCHt7e0RHR/fBXiH9hQJVMmBFRUXh0qVLOHz4MK5evYrQ0FDIZDLu5P3kyROMGTMGp06dwvXr17Fs2TKEh4fj119/VdnO/v37wePxUFRUhL1793LLN27ciOTkZJSWlkJfXx9Llizh0i5cuICIiAisXr0aFRUVSElJQVpaGhcEKZVKzJ49GzweD8XFxdi7dy/Wr1/fbZtiY2NRWFiIEydOIC8vD+fOncPly5d71O4rV65g6tSpGDFiBC5duoSLFy9CLpejra2tU3mlpaWIjo7Gxx9/DIVCgdOnT2PSpEkvrd/ixYtRWlqKkydP4tKlS2CMISgoCE+fPuXyPHr0CNu3b8eBAwdw/vx53LhxA2vXrlW7vfbbfaampti1axfq6+sRFhbW7X5qV1tbi8zMTGRlZSErKwuFhYXYtm0bl75hwwZs27YNmzZtQkVFBb777jvY2NgAAHcc5Ofno76+HseOHVNbxrp165CRkYH9+/fj8uXLcHV1hVQqRUNDg0q+ro4XbdXQ0IDg4GC4u7sjKCgIIpEIwcHB3I+a/pCcnAwfHx+UlZVh5cqVWLFihcZBfm1tLWQyGebMmYOrV6/i+++/x8WLFxEVFaWSb/v27Rg1ahTKysqwadMm7NmzBydPnsSRI0egUChw8OBBuLi49EHreldDQwNkMplKf8lksj7vL03HgDphYWGIiYnByJEjUV9fr/EY//vvvzFlyhR4e3ujtLQUp0+fxu3btzFv3jyVfC+evzMyMrBz506kpKSguroamZmZ8PT0fOW2Ey3ACGmXnMyYg0P3f3J553Xlcs3WTU5+5epNnjyZrV69mjHG2F9//cX09PTYrVu3VPJMnTqVbdiw4aXbCA4OZjExMSrb9Pb2Vslz9uxZBoDl5+dzy06dOsUAsMePH3PlbN26VWW9AwcOMDs7O8YYY7m5uUxfX1+lfjk5OQwAO378uNq6PXjwgPF4PHbkyBFu2f3795lAIOhRu+fPn8/8/f1fug867seMjAxmamrKmpqaus1bVVXFALCioiIu/d69e0wgEHB1Tk1NZQBYTU0Nl+fLL79kNjY2L60PY4yZmZmx1NRU7nNqaiozMzNTyXP8+HHW8ZS1efNmJhQKVeoeGxvLfH19GWOMNTU1MUNDQ7Zv3z61ZdbV1TEArKysTGX5okWLWEhICGOMsebmZmZgYMAOHjzIpbe2tjJ7e3uWmJjIGNPseNFWQUFBzNLSkqWnp7MbN26w9PR0ZmlpyYKCgvqszI7HlLOzM1u4cCGXplQqmbW1Nfv6668ZY537qH1fNzY2MsYYi4yMZMuWLVPZ/oULF5iuri63752dndmsWbNU8rz33ntsypQpTKlU9k0j+4hUKmV6enoMAPenp6fHpFJpn5XZ3RjQdKyOGjWq07Y7ng9f7OstW7aw6dOnq+S/efMmA8AUCgVjTP35Ozk5mYlEItba2voarSbahOaokn80NQG3bnWfz9Gx87K7dzVbt6mp5/VS49q1a2hra4NIJFJZ3tLSgkGDBgEA2trasHXrVhw5cgS3bt1Ca2srWlpaIBQKVdYZM2aM2jK8vLy4f9vZ2QEA7ty5AycnJ5SXl6OoqIi7gtpe3pMnT/Do0SNUVlbC0dER9vb2XPr48eO7bFNtbS1aW1vh6+vLLbO0tIS7u3uP2n3lyhWEhoZ2WVa7adOmwdnZGcOGDYNMJoNMJsNbb73VaR8BQGVlJfT19VXqN2jQILi7u6OyspJbJhQKMXz4cO6znZ0d7ty5o1F9esrFxQUmJiZqy6qsrERLSwumTp36ytu
vra3F06dP4e/vzy0zMDDAuHHjVNoMdH28aKOqqipkZ2cjPT0dCxYsAAAsWLAAjDGEh4ejuroabm5ufV6PjvtNR0cHtra2Gh8v5eXluHr1Kg4ePMgtY4xBqVSirq4OHh4eAAAfHx+V9RYvXoxp06bB3d0dMpkMM2fOxPTp03uhNX2nqqoKubm5nZa3tbUhNze3z/qruzFgZWXV62UCz/v27Nmzah+urK2t5c6BL56/Q0NDsWvXLu6cFhQUBLlcDn19CncGKuo58g9TU8DBoft86k5MVlaarWtq2vN6qdHc3Aw9PT389ttv0NPTU0lrP7ElJSVh9+7d2LVrFzw9PWFkZIQ1a9agtbVVJb+RkZHaMjo+mdw+30qpVHLlJyQkYPbs2Z3W4/P5r96wbmjSboFAoPH2TExMcPnyZZw7dw55eXmIi4tDfHw8SkpKOs0709SLT3Tr6Oj0+Cl6XV3dTut0nF7QVVntfdST/dAbujpetFFtbS0AdJrqMXnyZABATU1NvwSqXfVhd5qbm/Huu++qnYPY8QfCi2NcLBajrq4OOTk5yM/Px7x58yCRSHD06NFXaEH/aO+vl+mv/nqRpmO1p5qbmyGXy/H55593Smv/IQh07ltHR0coFArk5+fjzJkzWLlyJZKSklBYWNgvbzAhvY8CVfKP999//vcqTp7s3bp0w9vbG21tbbhz5w4CAgLU5ikqKkJISAgWLlwI4HnQUFVVhREjRrx2+WKxGAqFAq6urmrTPTw8cPPmTdTX13Mn1V9++aXLbQ4fPhwGBgYoLi7mvmQbGxtRVVXFBQ+atNvLywsFBQVISEjQqC36+vqQSCSQSCTYvHkzzM3N8dNPP3UKwj08PPDs2TMUFxdjhC3F1QAABUtJREFUwoQJAJ4/sKZQKHpln3ZkZWWFBw8e4OHDh9wXUU/fr+jm5gaBQICCggIsXbq0U3r708Hq5u62Gz58ODf/zdnZGcDzL+GSkpIB/07f9qve58+f566oAkBhYSEAvPTY1iZisRgVFRWvVFdTU1OEhYUhLCwMc+fOhUwmQ0NDQ7+96aKnOt6lUKev+qu7MaDJWOXxeF2OM3XEYjEyMjLg4uLS46uhAoEAcrkccrkcq1atwhtvvIFr165BLBb3aDtEO1CgSgYkkUiEBQsWICIiAsnJyfD29sbdu3dRUFAALy8vBAcHw83NDUePHsXPP/8MCwsL7NixA7dv3+6VoCouLg4zZ86Ek5MT5s6dC11dXZSXl+P69ev45JNPIJFIIBKJsGjRIiQlJaGpqQkbN27scpvGxsaIjIxEbGwsBg0aBGtra2zcuBG6uv8886hJuzds2ABPT0+sXLkSy5cvB4/Hw9mzZxEaGtrpHYVZWVn4448/MGnSJFhYWCA7OxtKpVJlukE7Nzc3hISE4J133kFKSgpMTEzwwQcfwMHBASEhIa+9Tzvy9fWFUCjEhx9+iOjoaBQXF6t9UrgrfD4f69evx7p168Dj8eDv74+7d+/i999/R2RkJKytrSEQCHD69GkMGTIEfD6/06upjIyMsGLFCsTGxsLS0hJOTk5ITEzEo0ePEBkZ2Yst7n8ikQhBQUGIjo4GYwyTJ09GYWEhVq9ejaCgoH/l6lxPrV+/Hn5+foiKisLSpUthZGSEiooKnDlzBl988cVL19uxYwfs7Ozg7e0NXV1d/PDDD7C1tX3luwj9QSQSQSqVIj8/XyXo09PTg0Qi6bP+6m4MMMa6HasuLi6oq6vDlStXMGTIEJiYmMDQ0LDLcletWoV9+/Zh/vz5WLduHSwtLVFTU4PDhw/j22+/7XRHqV1aWhra2tq4c0h6ejoEAgEXZJOBh576JwNWamoqIiIiEBMTA3d3d8yaNQslJSXc1ciPPvoIYrEYUqkUgYGBsLW17bX/cUgqlSIrKwt5eXkYO3Ys/Pz8sHPnTu5kqKuri+PHj+Px48cYN24cli5dqjKf9WWSkpIQEBAAuVwOiUSCiRMndpqD1V27RSIR8vL
yUF5ejnHjxmH8+PE4ceKE2qsS5ubmOHbsGKZMmQIPDw/s3bsXhw4dwsiRI9XWLzU1FWPGjMHMmTMxfvx4MMaQnZ3d67fULC0tkZ6ejuzsbO61YvHx8T3ezqZNmxATE4O4uDh4eHggLCyMm/+or6+PPXv2ICUlBfb29i8Ntrdt24Y5c+YgPDwcYrEYNTU1yM3NhYWFxes0USukp6fDz88P4eHhcHJyQnh4OPz8/JCenv5vV00jXl5eKCwsRFVVFQICAuDt7Y24uDiVueHqmJiYIDExET4+Phg7diz+/PNPZGdnq/wo1EaHDh2CRCJRWSaRSDq9dq+3dTUGNBmrc+bMgUwmw5tvvgkrKyuN6mtvb4+ioiK0tbVh+vTp8PT0xJo1a2Bubt5lP5mbm2Pfvn3w9/eHl5cX8vPz8eOPP3Jz+MnAo8N6OnmMEELI/5Xq6mrU1NTA1dV1QFxJ/a+j/iL/JRSoEkIIIYQQraTd9zkIIYQQQsh/FgWqhBBCCCFEK1GgSgghhBBCtBIFqoQQQgghRCtRoEoIIYQQQrQSBaqEEEIIIUQrUaBKCCGEEEK0EgWqhBBCCCFEK1GgSgghhBBCtBIFqoQQQgghRCtRoEoIIYQQQrQSBaqEEEIIIUQrUaBKCCGEEEK0EgWqhBBCCCFEK1GgSgghhBBCtBIFqoQQQgghRCv9DwgRA6AHHc9zAAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "O7NmDgjRm5EE", + "outputId": "0d0379c1-b9ce-4aed-bf59-feb9e9d99091", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 106 + } + }, + "source": [ + "# Zoom em alguns outliers...\n", + "df1.loc[df1['outlier'] == 1].head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
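<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n", + "      <th>survived</th>\n", + "      <th>pclass</th>\n", + "      <th>sex</th>\n", + "      <th>age</th>\n", + "      <th>sibsp</th>\n", + "      <th>parch</th>\n", + "      <th>fare</th>\n", + "      <th>embarked</th>\n", + "      <th>class</th>\n", + "      <th>who</th>\n", + "      <th>adult_male</th>\n", + "      <th>deck</th>\n", + "      <th>embark_town</th>\n", + "      <th>alive</th>\n", + "      <th>alone</th>\n", + "      <th>outlier</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr>\n", + "      <th>679</th>\n", + "      <td>1</td>\n", + "      <td>1</td>\n", + "      <td>male</td>\n", + "      <td>0.443601</td>\n", + "      <td>0</td>\n", + "      <td>1</td>\n", + "      <td>1.0</td>\n", + "      <td>C</td>\n", + "      <td>First</td>\n", + "      <td>man</td>\n", + "      <td>True</td>\n", + "      <td>B</td>\n", + "      <td>Cherbourg</td>\n", + "      <td>yes</td>\n", + "      <td>False</td>\n", + "      <td>1</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>737</th>\n", + "      <td>1</td>\n", + "      <td>1</td>\n", + "      <td>male</td>\n", + "      <td>0.430956</td>\n", + "      <td>0</td>\n", + "      <td>0</td>\n", + "      <td>1.0</td>\n", + "      <td>C</td>\n", + "      <td>First</td>\n", + "      <td>man</td>\n", + "      <td>True</td>\n", + "      <td>B</td>\n", + "      <td>Cherbourg</td>\n", + "      <td>yes</td>\n", + "      <td>True</td>\n", + "      <td>1</td>\n", + "    </tr>\n",
+ "  </tbody>\n", + "</table>\n", + "</div>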
" + ], + "text/plain": [ + " survived pclass sex age ... embark_town alive alone outlier\n", + "679 1 1 male 0.443601 ... Cherbourg yes False 1\n", + "737 1 1 male 0.430956 ... Cherbourg yes True 1\n", + "\n", + "[2 rows x 16 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 31 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "HIRxOj93nVXu", + "outputId": "8b95aa23-b3f7-4ed6-83bd-31e971b6fa62", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 286 + } + }, + "source": [ + "# Zoom na linha 679\n", + "df_titanic.loc[679]" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "survived 1\n", + "pclass 1\n", + "sex male\n", + "age 36\n", + "sibsp 0\n", + "parch 1\n", + "fare 512.329\n", + "embarked C\n", + "class First\n", + "who man\n", + "adult_male True\n", + "deck B\n", + "embark_town Cherbourg\n", + "alive yes\n", + "alone False\n", + "Name: 679, dtype: object" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 32 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "euxK-4K1oKs0", + "outputId": "24a8459e-163d-497c-ba60-b48a792bfff6", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 166 + } + }, + "source": [ + "# Algumas medidas para compararmos\n", + "df_resumo = df_titanic.groupby('sex').agg({'age': ['mean'], 'fare': ['mean']}).round(0)\n", + "df_resumo" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
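<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n",
+ "    <tr>\n", + "      <th></th>\n", + "      <th>age</th>\n", + "      <th>fare</th>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th></th>\n", + "      <th>mean</th>\n", + "      <th>mean</th>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>sex</th>\n", + "      <th></th>\n", + "      <th></th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr>\n", + "      <th>female</th>\n", + "      <td>33.0</td>\n", + "      <td>89.0</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>male</th>\n", + "      <td>38.0</td>\n", + "      <td>69.0</td>\n", + "    </tr>\n",
+ "  </tbody>\n", + "</table>\n", + "</div>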
" + ], + "text/plain": [ + " age fare\n", + " mean mean\n", + "sex \n", + "female 33.0 89.0\n", + "male 38.0 69.0" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 33 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "nuNxqgWMtMHC", + "outputId": "d9f8600f-c28d-4c63-9a14-4045efd4dc1d", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Média Geral de 'age'\n", + "round(df_titanic['age'].mean())" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "36" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 34 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "bLIZcvyuuU2R", + "outputId": "772602e6-5c1a-4a83-cb88-857a9b905fba", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Média Geral de 'fare'\n", + "round(df_titanic['fare'].mean())" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "79" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 35 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fFd-D1HTVhE7" + }, + "source": [ + "___\n", + "## **HBOS - Histogram-based Outlier Detection**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Q5Hh5iMEXuhM", + "outputId": "8ac4d33b-c4bd-4b03-e7ae-af052af8b2a7", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 755 + } + }, + "source": [ + "outliers_fraction = 0.01\n", + "xx , yy = np.meshgrid(np.linspace(0, 1, 100), np.linspace(0, 1, 100))\n", + "clf = HBOS(contamination = outliers_fraction)\n", + "clf.fit(X)\n", + "# predict raw anomaly score\n", + "scores_pred = clf.decision_function(X) * -1\n", + " \n", + "# prediction of a datapoint category outlier or inlier\n", + "y_pred = clf.predict(X)\n", + "n_inliers = len(y_pred) - np.count_nonzero(y_pred)\n", + "n_outliers = 
np.count_nonzero(y_pred == 1)\n", + "plt.figure(figsize = (8, 8))\n", + "# copy of dataframe\n", + "df1 = df_titanic_ss.copy()\n", + "df1['outlier'] = y_pred.tolist()\n", + " \n", + "inliers_fare = np.array(df1['fare'][df1['outlier'] == 0]).reshape(-1, 1)\n", + "inliers_age = np.array(df1['age'][df1['outlier'] == 0]).reshape(-1, 1)\n", + " \n", + "outliers_fare = df1['fare'][df1['outlier'] == 1].values.reshape(-1, 1)\n", + "outliers_age = df1['age'][df1['outlier'] == 1].values.reshape(-1, 1)\n", + " \n", + "print('OUTLIERS:', n_outliers, 'INLIERS:', n_inliers)\n", + " \n", + "# threshold value to consider a datapoint inlier or outlier\n", + "threshold = percentile(scores_pred, 100 * outliers_fraction)\n", + " \n", + "# decision function calculates the raw anomaly score for every point\n", + "Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]) * -1\n", + "Z = Z.reshape(xx.shape)\n", + "\n", + "# fill blue colormap from minimum anomaly score to threshold value\n", + "plt.contourf(xx, yy, Z, levels = np.linspace(Z.min(), threshold, 7), cmap = plt.cm.Blues_r)\n", + " \n", + "# draw red contour line where anomaly score is equal to threshold\n", + "a = plt.contour(xx, yy, Z, levels = [threshold], linewidths = 2, colors = 'red')\n", + " \n", + "# fill orange region where anomaly score ranges from threshold to maximum anomaly score\n", + "plt.contourf(xx, yy, Z, levels = [threshold, Z.max()],colors='orange')\n", + "b = plt.scatter(inliers_fare, inliers_age, c='white',s=20, edgecolor='k')\n", + " \n", + "c = plt.scatter(outliers_fare, outliers_age, c='black',s=20, edgecolor='k')\n", + " \n", + "plt.axis('tight') \n", + " \n", + "plt.legend([a.collections[0], b, c], ['learned decision function', 'inliers', 'outliers'],\n", + " prop=matplotlib.font_manager.FontProperties(size = 10), loc ='upper center', frameon = False, bbox_to_anchor = (0.5, -0.05),\n", + " fancybox = True, shadow = True, ncol = 5)\n", + " \n", + "plt.xlim((0, 1))\n", + "plt.ylim((0, 1))\n", + "plt.title('Histogram-based Outlier Detection (HBOS)')\n", + "plt.show();"
+ ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "OUTLIERS: 2 INLIERS: 180\n" + ], + "name": "stdout" + }, + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "gHRoON0BnLVb", + "outputId": "73ec2b80-9edf-445b-8276-baa505178432", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 106 + } + }, + "source": [ + "# Zoom em alguns outliers...\n", + "df1.loc[df1['outlier'] == 1].head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
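<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n", + "      <th>survived</th>\n", + "      <th>pclass</th>\n", + "      <th>sex</th>\n", + "      <th>age</th>\n", + "      <th>sibsp</th>\n", + "      <th>parch</th>\n", + "      <th>fare</th>\n", + "      <th>embarked</th>\n", + "      <th>class</th>\n", + "      <th>who</th>\n", + "      <th>adult_male</th>\n", + "      <th>deck</th>\n", + "      <th>embark_town</th>\n", + "      <th>alive</th>\n", + "      <th>alone</th>\n", + "      <th>outlier</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr>\n", + "      <th>318</th>\n", + "      <td>1</td>\n", + "      <td>1</td>\n", + "      <td>female</td>\n", + "      <td>0.380374</td>\n", + "      <td>0</td>\n", + "      <td>2</td>\n", + "      <td>0.321798</td>\n", + "      <td>S</td>\n", + "      <td>First</td>\n", + "      <td>woman</td>\n", + "      <td>False</td>\n", + "      <td>C</td>\n", + "      <td>Southampton</td>\n", + "      <td>yes</td>\n", + "      <td>False</td>\n", + "      <td>1</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>689</th>\n", + "      <td>1</td>\n", + "      <td>1</td>\n", + "      <td>female</td>\n", + "      <td>0.178048</td>\n", + "      <td>0</td>\n", + "      <td>1</td>\n", + "      <td>0.412503</td>\n", + "      <td>S</td>\n", + "      <td>First</td>\n", + "      <td>child</td>\n", + "      <td>False</td>\n", + "      <td>B</td>\n", + "      <td>Southampton</td>\n", + "      <td>yes</td>\n", + "      <td>False</td>\n", + "      <td>1</td>\n", + "    </tr>\n",
+ "  </tbody>\n", + "</table>\n", + "</div>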
" + ], + "text/plain": [ + " survived pclass sex age ... embark_town alive alone outlier\n", + "318 1 1 female 0.380374 ... Southampton yes False 1\n", + "689 1 1 female 0.178048 ... Southampton yes False 1\n", + "\n", + "[2 rows x 16 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 37 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "YblU2tnxnXi7", + "outputId": "c96e85f5-7565-46bf-c355-223297a1d980", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 286 + } + }, + "source": [ + "# Zoom na linha 689\n", + "df_titanic.loc[689]" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "survived 1\n", + "pclass 1\n", + "sex female\n", + "age 15\n", + "sibsp 0\n", + "parch 1\n", + "fare 211.338\n", + "embarked S\n", + "class First\n", + "who child\n", + "adult_male False\n", + "deck B\n", + "embark_town Southampton\n", + "alive yes\n", + "alone False\n", + "Name: 689, dtype: object" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 38 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "AkWj5aQ-uzxB", + "outputId": "1a12b0f5-1cec-4a43-8fef-29c2ca67e779", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 166 + } + }, + "source": [ + "# Algumas medidas para compararmos\n", + "df_resumo = df_titanic.groupby('sex').agg({'age': ['mean'], 'fare': ['mean']}).round(0)\n", + "df_resumo" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
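<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n",
+ "    <tr>\n", + "      <th></th>\n", + "      <th>age</th>\n", + "      <th>fare</th>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th></th>\n", + "      <th>mean</th>\n", + "      <th>mean</th>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>sex</th>\n", + "      <th></th>\n", + "      <th></th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr>\n", + "      <th>female</th>\n", + "      <td>33.0</td>\n", + "      <td>89.0</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>male</th>\n", + "      <td>38.0</td>\n", + "      <td>69.0</td>\n", + "    </tr>\n",
+ "  </tbody>\n", + "</table>\n", + "</div>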
" + ], + "text/plain": [ + " age fare\n", + " mean mean\n", + "sex \n", + "female 33.0 89.0\n", + "male 38.0 69.0" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 39 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "EVy5NDrFujgD", + "outputId": "e3e2dbd7-b4fd-445a-fac4-3dbf9139ba08", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Média Geral de 'age'\n", + "round(df_titanic['age'].mean())" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "36" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 40 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Hgcp_LU6ujgJ", + "outputId": "e6959579-db19-4fbd-cc87-0b42421a0b90", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Média Geral de 'fare'\n", + "round(df_titanic['fare'].mean())" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "79" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 41 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KyPUT9JmWeN-" + }, + "source": [ + "___\n", + "## **Isolation Forest**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Lrx85bG0YOqM", + "outputId": "84485503-9084-42e2-f251-56626897b391", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 755 + } + }, + "source": [ + "outliers_fraction = 0.01\n", + "xx , yy = np.meshgrid(np.linspace(0, 1, 100), np.linspace(0, 1, 100))\n", + "clf = IForest(contamination = outliers_fraction,random_state = 0)\n", + "clf.fit(X)\n", + "# predict raw anomaly score\n", + "scores_pred = clf.decision_function(X) * -1\n", + " \n", + "# prediction of a datapoint category outlier or inlier\n", + "y_pred = clf.predict(X)\n", + "n_inliers = len(y_pred) - np.count_nonzero(y_pred)\n", + "n_outliers = np.count_nonzero(y_pred 
== 1)\n", + "plt.figure(figsize = (8, 8))\n", + "# copy of dataframe\n", + "df1 = df_titanic_ss.copy()\n", + "df1['outlier'] = y_pred.tolist()\n", + " \n", + "# fare - inlier feature 1, age - inlier feature 2\n", + "inliers_fare = np.array(df1['fare'][df1['outlier'] == 0]).reshape(-1,1)\n", + "inliers_age = np.array(df1['age'][df1['outlier'] == 0]).reshape(-1,1)\n", + " \n", + "# fare - outlier feature 1, age - outlier feature 2\n", + "outliers_fare = df1['fare'][df1['outlier'] == 1].values.reshape(-1,1)\n", + "outliers_age = df1['age'][df1['outlier'] == 1].values.reshape(-1,1)\n", + " \n", + "print('OUTLIERS: ', n_outliers,'INLIERS: ', n_inliers)\n", + " \n", + "# threshold value to consider a datapoint inlier or outlier\n", + "threshold = percentile(scores_pred, 100 * outliers_fraction)\n", + " \n", + "# decision function calculates the raw anomaly score for every point\n", + "Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]) * -1\n", + "Z = Z.reshape(xx.shape)\n", + "# fill blue colormap from minimum anomaly score to threshold value\n", + "plt.contourf(xx, yy, Z, levels=np.linspace(Z.min(), threshold, 7),cmap=plt.cm.Blues_r)\n", + " \n", + "# draw red contour line where anomaly score is equal to threshold\n", + "a = plt.contour(xx, yy, Z, levels= [threshold],linewidths=2, colors='red')\n", + " \n", + "# fill orange region where anomaly score ranges from threshold to maximum anomaly score\n", + "plt.contourf(xx, yy, Z, levels= [threshold, Z.max()],colors='orange')\n", + "b = plt.scatter(inliers_fare, inliers_age, c='white',s=20, edgecolor='k')\n", + " \n", + "c = plt.scatter(outliers_fare, outliers_age, c='black',s=20, edgecolor='k')\n", + " \n", + "plt.axis('tight')\n", + "plt.legend([a.collections[0], b,c], ['learned decision function', 'inliers', 'outliers'],\n", + " prop=matplotlib.font_manager.FontProperties(size = 10), loc='upper center', frameon= False, bbox_to_anchor = (0.5, -0.05),\n", + " fancybox = True, shadow = True, ncol=5)\n", + " 
\n", + "plt.xlim((0, 1))\n", + "plt.ylim((0, 1))\n", + "plt.title('Isolation Forest')\n", + "plt.show();" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "OUTLIERS: 2 INLIERS: 180\n" + ], + "name": "stdout" + }, + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "HLVraGcCnNTA", + "outputId": "957c19be-cec9-4d4a-9902-0f49f806198f", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 106 + } + }, + "source": [ + "# Zoom em alguns outliers...\n", + "df1.loc[df1['outlier'] == 1].head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
survivedpclasssexagesibspparchfareembarkedclasswhoadult_maledeckembark_townalivealoneoutlier
67911male0.443601011.0CFirstmanTrueBCherbourgyesFalse1
73711male0.430956001.0CFirstmanTrueBCherbourgyesTrue1
\n", + "
" + ], + "text/plain": [ + " survived pclass sex age ... embark_town alive alone outlier\n", + "679 1 1 male 0.443601 ... Cherbourg yes False 1\n", + "737 1 1 male 0.430956 ... Cherbourg yes True 1\n", + "\n", + "[2 rows x 16 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 43 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "y0WBmFOonZKY", + "outputId": "06cdcee2-c8fc-44ec-aa7b-f33d16d4d7c2", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 286 + } + }, + "source": [ + "# Zoom na linha 679\n", + "df_titanic.loc[679]" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "survived 1\n", + "pclass 1\n", + "sex male\n", + "age 36\n", + "sibsp 0\n", + "parch 1\n", + "fare 512.329\n", + "embarked C\n", + "class First\n", + "who man\n", + "adult_male True\n", + "deck B\n", + "embark_town Cherbourg\n", + "alive yes\n", + "alone False\n", + "Name: 679, dtype: object" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 44 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "auSy5b6Du3PH", + "outputId": "b1aae710-3d55-4ff1-81e1-635fdb96a8ca", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 166 + } + }, + "source": [ + "# Algumas medidas para compararmos\n", + "df_resumo = df_titanic.groupby('sex').agg({'age': ['mean'], 'fare': ['mean']}).round(0)\n", + "df_resumo" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
agefare
meanmean
sex
female33.089.0
male38.069.0
\n", + "
" + ], + "text/plain": [ + " age fare\n", + " mean mean\n", + "sex \n", + "female 33.0 89.0\n", + "male 38.0 69.0" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 45 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "fIQg2D6fuoSG", + "outputId": "94244010-8314-481c-a428-b526d3b70ca8", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Média Geral de 'age'\n", + "round(df_titanic['age'].mean())" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "36" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 46 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "pNUds1oDuoSO", + "outputId": "64f51541-4329-404d-8d0b-f65faf2b397e", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Média Geral de 'fare'\n", + "round(df_titanic['fare'].mean())" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "79" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 47 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QbpzXB2RV4sq" + }, + "source": [ + "___\n", + "## **KNN - K-Nearest Neighbors**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "6gtIWWbRYxEj", + "outputId": "8bc4ae18-4a33-489e-d13e-98a3160a9e13", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 755 + } + }, + "source": [ + "outliers_fraction = 0.01\n", + "xx , yy = np.meshgrid(np.linspace(0, 1, 100), np.linspace(0, 1, 100))\n", + "clf = KNN(contamination = outliers_fraction)\n", + "clf.fit(X)\n", + "# predict raw anomaly score\n", + "scores_pred = clf.decision_function(X) * -1\n", + " \n", + "# prediction of a datapoint category outlier or inlier\n", + "y_pred = clf.predict(X)\n", + "n_inliers = len(y_pred) - np.count_nonzero(y_pred)\n", + "n_outliers = np.count_nonzero(y_pred == 1)\n", + 
"plt.figure(figsize = (8, 8))\n", + "# copy of dataframe\n", + "df1 = df_titanic_ss\n", + "df1['outlier'] = y_pred.tolist()\n", + " \n", + "inliers_fare = np.array(df1['fare'][df1['outlier'] == 0]).reshape(-1,1)\n", + "inliers_age = np.array(df1['age'][df1['outlier'] == 0]).reshape(-1,1)\n", + " \n", + "outliers_fare = df1['fare'][df1['outlier'] == 1].values.reshape(-1,1)\n", + "outliers_age = df1['age'][df1['outlier'] == 1].values.reshape(-1,1)\n", + " \n", + "print('OUTLIERS: ',n_outliers, 'INLIERS: ', n_inliers)\n", + " \n", + "# threshold value to consider a datapoint inlier or outlier\n", + "threshold = percentile(scores_pred, 100 * outliers_fraction)\n", + " \n", + "# decision function calculates the raw anomaly score for every point\n", + "Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]) * -1\n", + "Z = Z.reshape(xx.shape)\n", + "# fill blue map colormap from minimum anomaly score to threshold value\n", + "plt.contourf(xx, yy, Z, levels=np.linspace(Z.min(), threshold, 7),cmap=plt.cm.Blues_r)\n", + " \n", + "# draw red contour line where anomaly score is equal to thresold\n", + "a = plt.contour(xx, yy, Z, levels= [threshold],linewidths=2, colors='red')\n", + " \n", + "# fill orange contour lines where range of anomaly score is from threshold to maximum anomaly score\n", + "plt.contourf(xx, yy, Z, levels= [threshold, Z.max()],colors='orange')\n", + "b = plt.scatter(inliers_fare, inliers_age, c='white',s=20, edgecolor='k')\n", + " \n", + "c = plt.scatter(outliers_fare, outliers_age, c='black',s=20, edgecolor='k')\n", + " \n", + "plt.axis('tight') \n", + " \n", + "plt.legend([a.collections[0], b,c], ['learned decision function', 'inliers', 'outliers'],\n", + " prop=matplotlib.font_manager.FontProperties(size=10), loc='upper center', frameon= False, bbox_to_anchor = (0.5, -0.05),\n", + " fancybox = True, shadow = True, ncol = 5)\n", + " \n", + "plt.xlim((0, 1))\n", + "plt.ylim((0, 1))\n", + "plt.title('K-Nearest Neighbors (KNN)')\n", + "plt.show();" + ], 
+ "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "OUTLIERS: 2 INLIERS: 180\n" + ], + "name": "stdout" + }, + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "6B-L7MwXg25Z", + "outputId": "0f8d806d-9e19-4133-c038-01d96c26b168", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 195 + } + }, + "source": [ + "df1.head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
survivedpclasssexagesibspparchfareembarkedclasswhoadult_maledeckembark_townalivealoneoutlier
111female0.468892100.139136CFirstwomanFalseCCherbourgyesFalse0
311female0.430956100.103644SFirstwomanFalseCSouthamptonyesFalse0
601male0.671219000.101229SFirstmanTrueESouthamptonnoTrue0
1013female0.038948110.032596SThirdchildFalseGSouthamptonyesFalse0
1111female0.721801000.051822SFirstwomanFalseCSouthamptonyesTrue0
\n", + "
" + ], + "text/plain": [ + " survived pclass sex age ... embark_town alive alone outlier\n", + "1 1 1 female 0.468892 ... Cherbourg yes False 0\n", + "3 1 1 female 0.430956 ... Southampton yes False 0\n", + "6 0 1 male 0.671219 ... Southampton no True 0\n", + "10 1 3 female 0.038948 ... Southampton yes False 0\n", + "11 1 1 female 0.721801 ... Southampton yes True 0\n", + "\n", + "[5 rows x 16 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 49 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "gvXGH0BHBBNN", + "outputId": "b44557de-c87c-4012-adb9-63597c401390", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 106 + } + }, + "source": [ + "# Zoom em alguns outliers...\n", + "df1.loc[df1['outlier'] == 1].head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
survivedpclasssexagesibspparchfareembarkedclasswhoadult_maledeckembark_townalivealoneoutlier
67911male0.443601011.0CFirstmanTrueBCherbourgyesFalse1
73711male0.430956001.0CFirstmanTrueBCherbourgyesTrue1
\n", + "
" + ], + "text/plain": [ + " survived pclass sex age ... embark_town alive alone outlier\n", + "679 1 1 male 0.443601 ... Cherbourg yes False 1\n", + "737 1 1 male 0.430956 ... Cherbourg yes True 1\n", + "\n", + "[2 rows x 16 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 50 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "MYbNaaO7D3NY", + "outputId": "bc9fcf92-8937-42e7-b79f-84a17a50ecb3", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 286 + } + }, + "source": [ + "# Zoom na linha 679\n", + "df_titanic.loc[679]" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "survived 1\n", + "pclass 1\n", + "sex male\n", + "age 36\n", + "sibsp 0\n", + "parch 1\n", + "fare 512.329\n", + "embarked C\n", + "class First\n", + "who man\n", + "adult_male True\n", + "deck B\n", + "embark_town Cherbourg\n", + "alive yes\n", + "alone False\n", + "Name: 679, dtype: object" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 51 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "-juEvWvru5jp", + "outputId": "6f5f85aa-7249-4c32-8473-db43dc01eec3", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 166 + } + }, + "source": [ + "# Algumas medidas para compararmos\n", + "df_resumo = df_titanic.groupby('sex').agg({'age': ['mean'], 'fare': ['mean']}).round(0)\n", + "df_resumo" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
agefare
meanmean
sex
female33.089.0
male38.069.0
\n", + "
" + ], + "text/plain": [ + " age fare\n", + " mean mean\n", + "sex \n", + "female 33.0 89.0\n", + "male 38.0 69.0" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 52 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "B6NXG6oDusSg", + "outputId": "707873f7-a36b-45a5-e184-2aaf789985fc", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Média Geral de 'age'\n", + "round(df_titanic['age'].mean())" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "36" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 53 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "cgHJb3iBusSl", + "outputId": "47a1c2c1-498e-4f85-8176-5464ac8077c4", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "# Média Geral de 'fare'\n", + "round(df_titanic['fare'].mean())" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "79" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 54 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1w7MIkoAG2Qr" + }, + "source": [ + "___\n", + "# **Exercícios**\n", + "Para cada um dos dataframes a seguir, faça uma análise de outlier utilizando uma das técnicas apresentadas e explique seus resultados." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ep_Z3iQIG56r" + }, + "source": [ + "## Exercise 1 - Predict Breast Cancer" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "v-Lvzrt7HN2l", + "outputId": "035c44f6-200d-41ab-c89d-4622f5b0bec6", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 249 + } + }, + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "from sklearn.datasets import load_breast_cancer\n", + "\n", + "cancer = load_breast_cancer()\n", + "X = cancer['data']\n", + "y = cancer['target']\n", + "\n", + "df_cancer = pd.DataFrame(np.c_[X, y], columns= np.append(cancer['feature_names'], ['target']))\n", + "df_cancer['target'] = df_cancer['target'].map({0: 'malign', 1: 'benign'})\n", + "df_cancer.head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
mean radiusmean texturemean perimetermean areamean smoothnessmean compactnessmean concavitymean concave pointsmean symmetrymean fractal dimensionradius errortexture errorperimeter errorarea errorsmoothness errorcompactness errorconcavity errorconcave points errorsymmetry errorfractal dimension errorworst radiusworst textureworst perimeterworst areaworst smoothnessworst compactnessworst concavityworst concave pointsworst symmetryworst fractal dimensiontarget
017.9910.38122.801001.00.118400.277600.30010.147100.24190.078711.09500.90538.589153.400.0063990.049040.053730.015870.030030.00619325.3817.33184.602019.00.16220.66560.71190.26540.46010.11890malign
120.5717.77132.901326.00.084740.078640.08690.070170.18120.056670.54350.73393.39874.080.0052250.013080.018600.013400.013890.00353224.9923.41158.801956.00.12380.18660.24160.18600.27500.08902malign
219.6921.25130.001203.00.109600.159900.19740.127900.20690.059990.74560.78694.58594.030.0061500.040060.038320.020580.022500.00457123.5725.53152.501709.00.14440.42450.45040.24300.36130.08758malign
311.4220.3877.58386.10.142500.283900.24140.105200.25970.097440.49561.15603.44527.230.0091100.074580.056610.018670.059630.00920814.9126.5098.87567.70.20980.86630.68690.25750.66380.17300malign
420.2914.34135.101297.00.100300.132800.19800.104300.18090.058830.75720.78135.43894.440.0114900.024610.056880.018850.017560.00511522.5416.67152.201575.00.13740.20500.40000.16250.23640.07678malign
\n", + "
" + ], + "text/plain": [ + " mean radius mean texture ... worst fractal dimension target\n", + "0 17.99 10.38 ... 0.11890 malign\n", + "1 20.57 17.77 ... 0.08902 malign\n", + "2 19.69 21.25 ... 0.08758 malign\n", + "3 11.42 20.38 ... 0.17300 malign\n", + "4 20.29 14.34 ... 0.07678 malign\n", + "\n", + "[5 rows x 31 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 55 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cEHLrU0gHRtu" + }, + "source": [ + "## Exercício 2 - Boston Housing Price" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "8G9GZnubHYjy", + "outputId": "c83f182b-c3f4-4d55-9ac8-ce1690d71acf", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 195 + } + }, + "source": [ + "from sklearn.datasets import load_boston\n", + "\n", + "boston = load_boston()\n", + "X = boston['data']\n", + "y = boston['target']\n", + "\n", + "df_boston = pd.DataFrame(np.c_[X, y], columns = np.append(boston['feature_names'], ['target']))\n", + "df_boston.head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
CRIMZNINDUSCHASNOXRMAGEDISRADTAXPTRATIOBLSTATtarget
00.0063218.02.310.00.5386.57565.24.09001.0296.015.3396.904.9824.0
10.027310.07.070.00.4696.42178.94.96712.0242.017.8396.909.1421.6
20.027290.07.070.00.4697.18561.14.96712.0242.017.8392.834.0334.7
30.032370.02.180.00.4586.99845.86.06223.0222.018.7394.632.9433.4
40.069050.02.180.00.4587.14754.26.06223.0222.018.7396.905.3336.2
\n", + "
" + ], + "text/plain": [ + " CRIM ZN INDUS CHAS NOX ... TAX PTRATIO B LSTAT target\n", + "0 0.00632 18.0 2.31 0.0 0.538 ... 296.0 15.3 396.90 4.98 24.0\n", + "1 0.02731 0.0 7.07 0.0 0.469 ... 242.0 17.8 396.90 9.14 21.6\n", + "2 0.02729 0.0 7.07 0.0 0.469 ... 242.0 17.8 392.83 4.03 34.7\n", + "3 0.03237 0.0 2.18 0.0 0.458 ... 222.0 18.7 394.63 2.94 33.4\n", + "4 0.06905 0.0 2.18 0.0 0.458 ... 222.0 18.7 396.90 5.33 36.2\n", + "\n", + "[5 rows x 14 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 56 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QlAdIYfmHaE8" + }, + "source": [ + "## Exercício 3 - Iris\n", + "* [Aqui](https://en.wikipedia.org/wiki/Iris_flower_data_set) você obterá mais informações sobre o dataframe iris." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Rke4C3wFHfYU", + "outputId": "7a1966b5-c787-4130-9ff0-55cc65bdbba2", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 195 + } + }, + "source": [ + "from sklearn.datasets import load_iris\n", + "\n", + "iris = load_iris()\n", + "X= iris['data']\n", + "y= iris['target']\n", + "\n", + "df_iris = pd.DataFrame(np.c_[X, y], columns = np.append(iris['feature_names'], ['target']))\n", + "df_iris['target'] = df_iris['target'].map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})\n", + "df_iris.head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
sepal length (cm)sepal width (cm)petal length (cm)petal width (cm)target
05.13.51.40.2setosa
14.93.01.40.2setosa
24.73.21.30.2setosa
34.63.11.50.2setosa
45.03.61.40.2setosa
\n", + "
" + ], + "text/plain": [ + " sepal length (cm) sepal width (cm) ... petal width (cm) target\n", + "0 5.1 3.5 ... 0.2 setosa\n", + "1 4.9 3.0 ... 0.2 setosa\n", + "2 4.7 3.2 ... 0.2 setosa\n", + "3 4.6 3.1 ... 0.2 setosa\n", + "4 5.0 3.6 ... 0.2 setosa\n", + "\n", + "[5 rows x 5 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 57 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6qn3gC4NHj-p" + }, + "source": [ + "## Exercícios 4 - Diabetes" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "P-esq5TSHnf6", + "outputId": "eb042bc6-ad2f-49cb-eb35-2e1ba208e93d", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 195 + } + }, + "source": [ + "from sklearn.datasets import load_diabetes\n", + "\n", + "diabetes = load_diabetes()\n", + "X = diabetes['data']\n", + "y = diabetes['target']\n", + "\n", + "df_diabetes = pd.DataFrame(np.c_[X, y], columns = np.append(diabetes['feature_names'], ['target']))\n", + "df_diabetes.head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
agesexbmibps1s2s3s4s5s6target
00.0380760.0506800.0616960.021872-0.044223-0.034821-0.043401-0.0025920.019908-0.017646151.0
1-0.001882-0.044642-0.051474-0.026328-0.008449-0.0191630.074412-0.039493-0.068330-0.09220475.0
20.0852990.0506800.044451-0.005671-0.045599-0.034194-0.032356-0.0025920.002864-0.025930141.0
3-0.089063-0.044642-0.011595-0.0366560.0121910.024991-0.0360380.0343090.022692-0.009362206.0
40.005383-0.044642-0.0363850.0218720.0039350.0155960.008142-0.002592-0.031991-0.046641135.0
\n", + "
" + ], + "text/plain": [ + " age sex bmi bp ... s4 s5 s6 target\n", + "0 0.038076 0.050680 0.061696 0.021872 ... -0.002592 0.019908 -0.017646 151.0\n", + "1 -0.001882 -0.044642 -0.051474 -0.026328 ... -0.039493 -0.068330 -0.092204 75.0\n", + "2 0.085299 0.050680 0.044451 -0.005671 ... -0.002592 0.002864 -0.025930 141.0\n", + "3 -0.089063 -0.044642 -0.011595 -0.036656 ... 0.034309 0.022692 -0.009362 206.0\n", + "4 0.005383 -0.044642 -0.036385 0.021872 ... -0.002592 -0.031991 -0.046641 135.0\n", + "\n", + "[5 rows x 11 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 58 + } + ] + } + ] +} \ No newline at end of file From f5d9812ae9f7d5994eddd05221b7e93e900093a1 Mon Sep 17 00:00:00 2001 From: MariaJacobs70 <72224154+MariaJacobs70@users.noreply.github.com> Date: Wed, 21 Oct 2020 19:12:45 -0300 Subject: [PATCH 9/9] Criado usando o Colaboratory --- ...5_00__Machine_Learning___DSWP mexido.ipynb | 4311 +++++++++++++++++ 1 file changed, 4311 insertions(+) create mode 100644 Notebooks/NB15_00__Machine_Learning___DSWP mexido.ipynb diff --git a/Notebooks/NB15_00__Machine_Learning___DSWP mexido.ipynb b/Notebooks/NB15_00__Machine_Learning___DSWP mexido.ipynb new file mode 100644 index 000000000..d894862e9 --- /dev/null +++ b/Notebooks/NB15_00__Machine_Learning___DSWP mexido.ipynb @@ -0,0 +1,4311 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "colab": { + "name": "NB15_00__Machine_Learning.ipynb", + "provenance": [], + "include_colab_link": true + }, + "accelerator": "TPU" + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ShVXyGj9wkgN" + }, + "source": [ + "

MACHINE LEARNING WITH PYTHON

" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aYQ4cDfcPu4e" + }, + "source": [ + "___\n", + "# **NOTAS E OBSERVAÇÕES**\n", + "* Abordar o impacto do desbalanceamento da amostra;\n", + "* Colocar AUROC no material e mostrar o cut off para classificação entre 0 e 1." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5YvhLC_uf4_G" + }, + "source": [ + "___\n", + "# **AGENDA**\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QgX6n2VDyY1O" + }, + "source": [ + "___\n", + "# **REFERÊNCIAS**\n", + "* [scikit-learn - Machine Learning With Python](https://scikit-learn.org/stable/);\n", + "* [An Introduction to Machine Learning Theory and Its Applications: A Visual Tutorial with Examples](https://www.toptal.com/machine-learning/machine-learning-theory-an-introductory-primer)\n", + "* [The Difference Between Artificial Intelligence, Machine Learning, and Deep Learning](https://medium.com/iotforall/the-difference-between-artificial-intelligence-machine-learning-and-deep-learning-3aa67bff5991)\n", + "* [A Gentle Guide to Machine Learning](https://blog.monkeylearn.com/a-gentle-guide-to-machine-learning/)\n", + "* [A Visual Introduction to Machine Learning](http://www.r2d3.us/visual-intro-to-machine-learning-part-1/)\n", + "* [Introduction to Machine Learning](http://alex.smola.org/drafts/thebook.pdf)\n", + "* [The 10 Statistical Techniques Data Scientists Need to Master](https://medium.com/cracking-the-data-science-interview/the-10-statistical-techniques-data-scientists-need-to-master-1ef6dbd531f7)\n", + "* [Tune: a library for fast hyperparameter tuning at any scale](https://towardsdatascience.com/fast-hyperparameter-tuning-at-scale-d428223b081c)\n", + "* [How to lie with Data Science](https://towardsdatascience.com/how-to-lie-with-data-science-5090f3891d9c)\n", + "* [5 Reasons “Logistic Regression” should be the first thing you learn when becoming a Data 
Scientist](https://towardsdatascience.com/5-reasons-logistic-regression-should-be-the-first-thing-you-learn-when-become-a-data-scientist-fcaae46605c4)\n", + "* [Machine learning on categorical variables](https://towardsdatascience.com/machine-learning-on-categorical-variables-3b76ffe4a7cb)\n", + "\n", + "## Deep Learning & Neural Networks\n", + "\n", + "- [An Introduction to Neural Networks](http://www.cs.stir.ac.uk/~lss/NNIntro/InvSlides.html)\n", + "- [An Introduction to Image Recognition with Deep Learning](https://medium.com/@ageitgey/machine-learning-is-fun-part-3-deep-learning-and-convolutional-neural-networks-f40359318721)\n", + "- [Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/index.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TsCbZd2epfxo" + }, + "source": [ + "___\n", + "# **INTRODUCTION**\n", + "\n", + "* \"__Information is the oil of the 21st century, and analytics is the combustion engine__.\" - Peter Sondergaard, SVP, Gartner Research;\n", + "\n", + "\n", + ">This chapter focuses on:\n", + "* Using Linear Regression, Logistic Regression, Decision Tree, Random Forest, Support Vector Machine and XGBoost algorithms to build Machine Learning models;\n", + "* Understanding how to solve classification and regression problems;\n", + "* Applying Ensemble techniques such as Bagging and Boosting;\n", + "* Measuring the accuracy of Machine Learning models;\n", + "* Learning the main Machine Learning algorithms for both supervised and unsupervised learning.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HqqB2vaHXMGt" + }, + "source": [ + "___\n", + "# **ARTIFICIAL INTELLIGENCE VS MACHINE LEARNING VS DEEP LEARNING**\n", + "* **Machine Learning** - gives computers the ability to learn without being explicitly programmed. 
Computers can improve their ability to learn by practicing a task, usually using large datasets.\n", + "* **Deep Learning** - is a Machine Learning method that relies on artificial neural networks, allowing computer systems to learn by example, much as humans do." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "P961GcguXFFA" + }, + "source": [ + "![EvolutionOfAI](https://github.com/MathMachado/Materials/blob/master/Evolution%20of%20AI.PNG?raw=true)\n", + "\n", + "Source: [Artificial Intelligence vs. Machine Learning vs. Deep Learning](https://github.com/MathMachado/P4ML/blob/DS_Python/Material/Evolution%20of%20AI.PNG)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lkqGtO88ZkPr" + }, + "source": [ + "![AI_vs_ML_vs_DL](https://github.com/MathMachado/Materials/blob/master/AI_vs_ML_vs_DL.PNG?raw=true)\n", + "\n", + "Source: [Artificial Intelligence vs. Machine Learning vs. Deep Learning](https://towardsdatascience.com/artificial-intelligence-vs-machine-learning-vs-deep-learning-2210ba8cc4ac)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xesQpzfmaqj6" + }, + "source": [ + "![ML_vs_DL](https://github.com/MathMachado/Materials/blob/master/ML_vs_DL.PNG?raw=true)\n", + "\n", + "Source: [Artificial Intelligence vs. Machine Learning vs. 
Deep Learning](https://towardsdatascience.com/artificial-intelligence-vs-machine-learning-vs-deep-learning-2210ba8cc4ac)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KeIVR59IIS7f" + }, + "source": [ + "___\n", + "# **MACHINE LEARNING - TECHNIQUES**\n", + "\n", + "* Supervised Learning\n", + "* Unsupervised Learning\n", + "\n", + "![MachineLearning](https://github.com/MathMachado/Materials/blob/master/MachineLearningTechniques.jpg?raw=true)\n", + "\n", + "Source: [Machine Learning for Everyone](https://vas3k.com/blog/machine_learning/?source=post_page-----885aa35db58b----------------------)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rvwp5UHdBiup" + }, + "source": [ + "___\n", + "# **OUR FOCUS HERE WILL BE...**\n", + "\n", + "![ClassicalML](https://github.com/MathMachado/Materials/blob/master/ClassicalML.jpg?raw=true)\n", + "\n", + "Source: [Machine Learning for Everyone](https://vas3k.com/blog/machine_learning/?source=post_page-----885aa35db58b----------------------)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cBLSvJTXHBjK" + }, + "source": [ + "___\n", + "# **CHEATSHEET**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZdjR3nahUuKq" + }, + "source": [ + "\n", + "![Scikit-Learn](https://github.com/MathMachado/Materials/blob/master/scikit-learn-1.png?raw=true)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MkBSvyorGXQz" + }, + "source": [ + "___\n", + "# **CROSS-VALIDATION**\n", + "> Cross-validation (CV) is a technique in which we train the model on one subset of the training dataframe X and validate it on a different, held-out subset of the same training dataframe X. 
The figure below helps us understand how CV works:\n", + "\n", + "![Cross-Validation](https://github.com/MathMachado/Materials/blob/master/CV2.PNG?raw=true)\n", + "\n", + "Source: [5 Reasons why you should use Cross-Validation in your Data Science Projects](https://towardsdatascience.com/5-reasons-why-you-should-use-cross-validation-in-your-data-science-project-8163311a1e79)\n", + "\n", + "* **Advantages of using CV**:\n", + "  * A more reliable estimate of model accuracy, which supports choosing better models;\n", + "  * Better use of the data, since every observation is used for both training and validation. As a result, any problem with the data tends to surface at this stage.\n", + "\n", + "* **Further Reading**\n", + "  * [Cross-Validation in Machine Learning](https://towardsdatascience.com/cross-validation-in-machine-learning-72924a69872f)\n", + "  * [5 Reasons why you should use Cross-Validation in your Data Science Projects](https://towardsdatascience.com/5-reasons-why-you-should-use-cross-validation-in-your-data-science-project-8163311a1e79)\n", + "  * [Cross-validation: evaluating estimator performance](https://scikit-learn.org/stable/modules/cross_validation.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yBR8tWV_lhQq" + }, + "source": [ + "___\n", + "# **ENSEMBLE METHODS**\n", + "* Methods\n", + "  * Bagging (Bootstrap AGGregatING)\n", + "  * Boosting\n", + "  * Stacking\n", + "* Helps reduce overfitting (overfitting is when the model/function fits the training data too closely and therefore generalizes poorly to other samples/the population).\n", + "* Builds meta-classifiers: combines the results of several algorithms to produce predictions that are more accurate and robust than those of each individual classifier.\n", + "* Ensembles reduce the effects of the main causes of error in Machine Learning models:\n", + "  * noise;\n", + "  * bias;\n", + "  * variance.\n", + "\n", + "# References\n", + "* [Simple guide for ensemble learning 
methods](https://towardsdatascience.com/simple-guide-for-ensemble-learning-methods-d87cc68705a2) - A didactic explanation of how ensembles work." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "25RW8u-Sj780" + }, + "source": [ + "### Further Reading\n", + "* [Ensemble methods: bagging, boosting and stacking](https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205)\n", + "* [Ensemble Methods in Machine Learning: What are They and Why Use Them?](https://towardsdatascience.com/ensemble-methods-in-machine-learning-what-are-they-and-why-use-them-68ec3f9fef5f)\n", + "* [Ensemble Learning Using Scikit-learn](https://towardsdatascience.com/ensemble-learning-using-scikit-learn-85c4531ff86a)\n", + "* [Let’s Talk About Machine Learning Ensemble Learning In Python](https://medium.com/fintechexplained/lets-talk-about-machine-learning-ensemble-learning-in-python-382747e5fba8)\n", + "* [Boosting, Bagging, and Stacking — Ensemble Methods with sklearn and mlens](https://medium.com/@rrfd/boosting-bagging-and-stacking-ensemble-methods-with-sklearn-and-mlens-a455c0c982de)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FugME1HSl4jJ" + }, + "source": [ + "___\n", + "# **PARAMETER TUNING**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "u_147cIRl9F1" + }, + "source": [ + "## GridSearch\n", + "* Finds the optimal parameters (hyperparameter tuning) that improve the accuracy of the models.\n", + "* Requires the following inputs:\n", + "  * The matrix $X_{p}$ with the $p$ COLUMNS (variables or attributes) of the dataframe;\n", + "  * The vector $y$ with the target COLUMN;\n", + "  * The estimator (algorithm) to be tuned, for example DecisionTreeClassifier, RandomForestClassifier or XGBClassifier;\n", + "  * A dictionary with the parameters to be optimized;\n", + "  * The number of folds for the Cross-validation method." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "39Sg77fbTWCO" + }, + "source": [ + "___\n", + "# **MODEL SELECTION & EVALUATION**\n", + "> At this stage we identify and apply the most suitable metrics (Accuracy, Sensitivity, Specificity, F-Score, AUC, R-Sq, Adj R-Sq, RMSE (Root Mean Square Error)) to evaluate the performance of the ML models.\n", + ">> We train the ML models on the training sample and evaluate their performance on the test/validation sample.\n", + "\n", + "* Further Reading\n", + "  * [The 5 Classification Evaluation metrics every Data Scientist must know](https://towardsdatascience.com/the-5-classification-evaluation-metrics-you-must-know-aa97784ff226)\n", + "  * [Confusion matrix and other metrics in machine learning](https://medium.com/hugo-ferreiras-blog/confusion-matrix-and-other-metrics-in-machine-learning-894688cb1c0a)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oQQVzZ2ZTYrB" + }, + "source": [ + "## Confusion Matrix\n", + "* Terms associated with the Confusion Matrix:\n", + "  * **True Positive** (TP): when the observed value is True and the model estimates True. In other words, the model's estimate was correct.\n", + "    * Example: **Observed**: Fraud (Positive); **Model**: Fraud (Positive) --> The model got it right!\n", + "    * That is, the dataframe row was labeled as fraud and the predictive model also estimated it as fraud, which is exactly what we want.\n", + "\n", + "  * **True Negative** (TN): when the observed value is False and the model estimates False. In other words, the model's estimate was correct;\n", + "    * Example: **Observed**: NON-Fraud (Negative); **Model**: NON-Fraud (Negative) --> The model got it right!\n", + "  * **False Positive** (FP): when the observed value is False and the model estimates True. In other words, the model's estimate was wrong. 
\n", + "    * Example: **Observed**: NON-Fraud (Negative); **Model**: Fraud (Positive) --> The model got it wrong!\n", + "  * **False Negative** (FN): when the observed value is True and the model estimates False. In other words, the model's estimate was wrong.\n", + "    * Example: **Observed**: Fraud (Positive); **Model**: NON-Fraud (Negative) --> The model got it wrong!\n", + "\n", + "* See [Confusion matrix](https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html#sphx-glr-auto-examples-model-selection-plot-confusion-matrix-py)\n", + "\n", + "![ConfusionMatrix](https://github.com/MathMachado/Materials/blob/master/ConfusionMatrix.PNG?raw=true)\n", + "\n", + "Source: [Confusion Matrix](https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781838555078/6/ch06lvl1sec34/confusion-matrix)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ci-6eiqBTgbL" + }, + "source": [ + "## Accuracy\n", + "> Accuracy - is the proportion of predictions that the model gets right.\n", + "\n", + "It answers the following question:\n", + "\n", + "```\n", + "How often does the classifier classify correctly?\n", + "```\n", + "\n", + "$$Accuracy= \\frac{TP+TN}{TP+TN+FP+FN}$$" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "F7YI8X5TRx-R" + }, + "source": [ + "## Precision (or Positive Predictive Value)\n", + "> **Precision** - gives information about performance with respect to False Positives (how many of our positive predictions are correct).\n", + "\n", + "It answers the following question:\n", + "\n", + "```\n", + "When the model predicts Positive, how often is the classifier correct?\n", + "```\n", + "\n", + "\n", + "$$Precision= \\frac{TP}{TP+FP}$$\n", + "\n", + "**Example**: Precision tells us the proportion of customers that the model estimated as Fraud that really are Fraud.\n", + "\n", + "**Comment**: If our focus is to minimize False Negatives (FN), then we need to strive for a Recall close to 100%." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zO39n8x_Sz3L" + }, + "source": [ + "## Recall (or Sensitivity)\n", + "> **Recall** - gives information about the performance of a classifier with respect to False Negatives (how many positives we missed).\n", + "\n", + "It answers the following question:\n", + "\n", + "```\n", + "When the observed value is Positive, how often is the classifier correct?\n", + "```\n", + "\n", + "$$Recall = Sensitivity = \\frac{TP}{TP+FN}$$\n", + "\n", + "**Example**: Recall is the proportion of customers observed as Fraud that the model also estimates as Fraud.\n", + "\n", + "**Comment**: If our focus is to minimize False Positives (FP), then we need to strive for a Precision as close to 100% as possible." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "htS6rdHVVXRG" + }, + "source": [ + "## Specificity\n", + "> **Specificity** - the ratio of TN to TN+FP.\n", + "\n", + "It answers the following question:\n", + "\n", + "```\n", + "When the observed value is Negative, how often is the classifier correct?\n", + "```\n", + "\n", + "**Example**: Specificity is the proportion of NON-Fraud customers that the model estimates as NON-Fraud.\n", + "\n", + "$$Specificity= \\frac{TN}{TN+FP}$$\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mNn0twadTacc" + }, + "source": [ + "## F1-Score\n", + "> F1-Score is the harmonic mean of Recall and Precision and is a number between 0 and 1. The closer to 1, the better. The closer to 0, the worse. 
In other words, it is a balance between Recall and Precision.\n", + "\n", + "$$F1\\_Score= 2\\left(\\frac{Recall*Precision}{Recall+Precision}\\right)$$" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rsH9dMxazWCg" + }, + "source": [ + "# **EXAMPLE DATAFRAME USED IN THIS TUTORIAL**\n", + "> Generate a dataframe with 18 columns: 9 informative, 6 redundant and 3 repeated:\n", + "\n", + "To learn more about generating example (toy) dataframes, see [Synthetic data generation — a must-have skill for new data scientists](https://towardsdatascience.com/synthetic-data-generation-a-must-have-skill-for-new-data-scientists-915896c0c1ae)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GEyDo_EIV_jV" + }, + "source": [ + "## Define global variables" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "TdwgpZ76WFaT" + }, + "source": [ + "i_CV= 10 # Number of cross-validation folds\n", + "i_Seed= 20111974 # seed, for reproducibility\n", + "f_Test_Size= 0.3 # Proportion of the dataframe used for validation" + ], + "execution_count": 1, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "gJTJfpwWzykS" + }, + "source": [ + "from sklearn.datasets import make_classification\n", + "X, y = make_classification(n_samples = 1000, n_features = 18, n_informative = 9, n_redundant = 6, n_repeated = 3, n_classes = 2, n_clusters_per_class = 1, random_state=i_Seed)" + ], + "execution_count": 2, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OHO2befKJxR3" + }, + "source": [ + "___\n", + "# **DECISION TREE**\n", + "> Decision Trees have a tree-shaped structure.\n", + "\n", + "* **Main advantages**:\n", + "  * They are easy to understand, visualize and interpret;\n", + "  * They easily capture non-linear patterns present in the data;\n", + "  * They require little computational power;\n", + "  * They handle both numerical and categorical COLUMNS well;\n", + "  * They do not require the data to be normalized;\n", 
+ "  * They can be used for Feature Engineering when dealing with Missing Values;\n", + "  * They can be used for Feature Selection;\n", + "  * They require no assumptions about the distribution of the data, because the algorithm is non-parametric.\n", + "\n", + "* **Main disadvantages**\n", + "  * Prone to Overfitting, since Decision Trees can build complex trees that do not generalize well. Things get much worse if the training sample contains outliers. Therefore, **treating outliers beforehand is strongly recommended**.\n", + "  * Can produce biased trees if the dataframe is imbalanced or one class is dominant. For this reason, **balancing the dataframe beforehand is recommended to avoid this problem**.\n", + "\n", + "* **Main parameters (splitting criteria)**\n", + "  * **Gini Index** - a metric that measures how often a randomly selected point/observation would be incorrectly classified.\n", + "    * Therefore, the lower the Gini Index, the better the COLUMN;\n", + "  * **Entropy** - a metric that measures the randomness of the information present in the data.\n", + "    * Therefore, the higher the entropy of a COLUMN, the less it helps us reach a conclusion (to classify, for example).\n", + "\n", + "## **References**:\n", + "* [1.10. 
Decision Trees](https://scikit-learn.org/stable/modules/tree.html).\n", + "* [Decision Tree Algorithm With Hands On Example](https://medium.com/datadriveninvestor/decision-tree-algorithm-with-hands-on-example-e6c2afb40d38) - a great tutorial for learning, understanding, interpreting and computing the Gini and entropy indices.\n", + "* [Intuitive Guide to Understanding Decision Trees](https://towardsdatascience.com/light-on-math-machine-learning-intuitive-guide-to-understanding-decision-trees-adb2165ccab7) - a great tutorial for learning, understanding, interpreting and computing the Gini and entropy indices.\n", + "* [The Complete Guide to Decision Trees](https://towardsdatascience.com/the-complete-guide-to-decision-trees-28a4e3c7be14)\n", + "* [Creating and Visualizing Decision Tree Algorithm in Machine Learning Using Sklearn](https://intellipaat.com/blog/decision-tree-algorithm-in-machine-learning/) - Very didactic!\n", + "* [Decision Trees in Machine Learning](https://towardsdatascience.com/decision-trees-in-machine-learning-641b9c4e8052)\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FrMkPN5aLp0Y" + }, + "source": [ + "## Load the libraries" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "FVU1CM0PKgO4" + }, + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import seaborn as sns\n", + "import matplotlib.pyplot as plt\n", + "\n", + "import warnings\n", + "warnings.filterwarnings(\"ignore\")" + ], + "execution_count": 6, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "15clh4XrISpz" + }, + "source": [ + "## Load/Read the data" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "UMPL46w2IWJw" + }, + "source": [ + "l_colunas= ['v1', 'v2', 'v3', 'v4', 'v5', 'v6', 'v7', 'v8', 'v9', 'v10', 'v11', 'v12', 'v13', 'v14', 'v15', 'v16', 'v17', 'v18']\n", + "df_X = pd.DataFrame(X, columns = l_colunas)\n", + "df_y = pd.DataFrame(y, columns = ['target'])" + ], + "execution_count": 
8, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "MFaQF2MGFl_M", + "outputId": "0b90cdfd-888e-43f6-d020-a6c7b296acf5", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 224 + } + }, + "source": [ + "df_X.head()" + ], + "execution_count": 9, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
v1v2v3v4v5v6v7v8v9v10v11v12v13v14v15v16v17v18
00.0684414.211842-2.5583023.665482-3.8351583.4998512.4908563.6654820.2451170.8671722.8655460.493956-5.1485962.8655463.499851-0.630619-0.978320-0.888270
1-4.8240210.179509-2.9844731.033618-3.8934263.428734-3.3346051.033618-0.882780-0.7532811.441522-1.395514-4.0028801.4415223.4287340.3399201.891538-6.109676
21.389530-0.2264761.8774002.7134264.6302570.516455-3.7430272.7134261.2840392.030797-1.0955361.560159-1.014211-1.0955360.516455-1.4778450.9605262.060204
31.1458092.2559460.2073644.6658172.2946786.5013060.9647704.6658170.1194103.1963541.8947873.519138-4.7578071.8947876.501306-3.7890290.5794911.397106
4-0.9366463.697163-3.3636173.805126-1.7544304.9543460.4066053.805126-0.8247381.3825911.665704-0.649758-3.5130361.6657044.9543460.2570520.904244-3.071354
\n", + "
" + ], + "text/plain": [ + " v1 v2 v3 ... v16 v17 v18\n", + "0 0.068441 4.211842 -2.558302 ... -0.630619 -0.978320 -0.888270\n", + "1 -4.824021 0.179509 -2.984473 ... 0.339920 1.891538 -6.109676\n", + "2 1.389530 -0.226476 1.877400 ... -1.477845 0.960526 2.060204\n", + "3 1.145809 2.255946 0.207364 ... -3.789029 0.579491 1.397106\n", + "4 -0.936646 3.697163 -3.363617 ... 0.257052 0.904244 -3.071354\n", + "\n", + "[5 rows x 18 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 9 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "s-ibdD2ZG7tm", + "outputId": "85710f30-784a-4157-8122-2ceb89b66525", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "df_X.shape" + ], + "execution_count": 10, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(1000, 18)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 10 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "f9cqRaywa_TR", + "outputId": "50b3b58a-6717-4459-dc21-9d79a699d56b", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "set(df_y['target'])" + ], + "execution_count": 11, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{0, 1}" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 11 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BN6jbpn6Iwmu" + }, + "source": [ + "## Basic descriptive statistics of the dataframe - df.describe()" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "KlwhxxUNIyYs", + "outputId": "df125f8e-1c1d-4225-8e2b-1b59033864d9", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 317 + } + }, + "source": [ + "df_X.describe()" + ], + "execution_count": 12, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
v1v2v3v4v5v6v7v8v9v10v11v12v13v14v15v16v17v18
count1000.0000001000.0000001000.0000001000.0000001000.0000001000.0000001000.0000001000.0000001000.0000001000.0000001000.0000001000.0000001000.0000001000.0000001000.0000001000.0000001000.0000001000.000000
mean-0.0851591.0342270.6574081.4053170.6872791.1315600.1080531.4053171.0070231.0488010.0792480.001650-0.3654380.0792481.131560-0.0277510.9846060.633624
std2.0022471.6315073.6087722.2568574.0195984.4818321.9813072.2568571.8632881.6439001.9492731.9326414.1606681.9492734.4818322.0654551.8505933.552991
min-6.944169-4.620754-16.300139-6.235192-12.454256-14.305401-6.152747-6.235192-5.484992-3.293216-7.135349-5.705500-9.120941-7.135349-14.305401-6.009023-5.035184-11.439074
25%-1.305566-0.089052-1.623657-0.152888-1.854645-1.684751-1.216983-0.152888-0.240908-0.012710-1.209675-1.292162-3.555363-1.209675-1.684751-1.436673-0.261610-1.691346
50%0.0525230.9941500.5738491.4499310.8123641.2815040.1670911.4499311.0661251.0128990.1803440.035237-0.9666380.1803441.281504-0.0001900.9757930.844784
75%1.3838532.0719953.0385862.8871413.4139524.0081031.4387192.8871412.2881882.1872021.4391991.3153422.7458061.4391994.0081031.3653692.2565043.109330
max4.9971727.35486011.7201658.49456612.84441815.9998036.2935508.4945668.1465596.5231806.2524485.53821611.2593506.25244815.9998036.5315617.64680212.090528
\n", + "
" + ], + "text/plain": [ + " v1 v2 ... v17 v18\n", + "count 1000.000000 1000.000000 ... 1000.000000 1000.000000\n", + "mean -0.085159 1.034227 ... 0.984606 0.633624\n", + "std 2.002247 1.631507 ... 1.850593 3.552991\n", + "min -6.944169 -4.620754 ... -5.035184 -11.439074\n", + "25% -1.305566 -0.089052 ... -0.261610 -1.691346\n", + "50% 0.052523 0.994150 ... 0.975793 0.844784\n", + "75% 1.383853 2.071995 ... 2.256504 3.109330\n", + "max 4.997172 7.354860 ... 7.646802 12.090528\n", + "\n", + "[8 rows x 18 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 12 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "N_QhFqyZOKFB" + }, + "source": [ + "## Select the training and validation samples\n", + "* At this stage we select a training sample, used to train the Machine Learning model, and a validation sample, used to validate it.\n", + "* We usually use 70% of the sample for training and 30% for validation. Other options are an 80/20 split or 75/25 (the default of train_test_split).\n", + "* See [sklearn.model_selection.train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) for more details.\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "8sKBgs-QOOfn" + }, + "source": [ + "from sklearn.model_selection import train_test_split\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(df_X, df_y, test_size = f_Test_Size, random_state = i_Seed)" + ], + "execution_count": 18, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "TPTKBBHgOpoA", + "outputId": "57bc1b9b-58fb-4cf5-e9cb-7d13beeb2a49", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "X_train.shape" + ], + "execution_count": 19, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(700, 18)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 19 + } + ] + }, + { 
+ "cell_type": "code", + "metadata": { + "id": "lEn_LLs2OtRI", + "outputId": "2076fb6c-1c03-4fa9-dadc-556957aaa75b", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "y_train.shape" + ], + "execution_count": 20, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(700, 1)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 20 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "_uAw8EcyOvrG", + "outputId": "3a6d8556-0fcf-4e00-a884-4cc53470f4f0", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "X_test.shape" + ], + "execution_count": 16, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(300, 18)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 16 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "A2LYI-9hOyXI", + "outputId": "12582890-711b-4200-ffbf-e431931818f7", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + } + }, + "source": [ + "y_test.shape" + ], + "execution_count": 21, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(300, 1)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 21 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "npgoBSX2dd4l" + }, + "source": [ + "## Train the algorithm on the training data\n", + "### Load the algorithms/libraries" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "hcvzrtolGfnQ", + "outputId": "95e01ab3-f0e1-4715-939e-22045236925f", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 68 + } + }, + "source": [ + "!pip install graphviz\n", + "!pip install pydotplus" + ], + "execution_count": 22, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Requirement already satisfied: graphviz in /usr/local/lib/python3.6/dist-packages (0.10.1)\n", + "Requirement already satisfied: 
pydotplus in /usr/local/lib/python3.6/dist-packages (2.0.2)\n", + "Requirement already satisfied: pyparsing>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from pydotplus) (2.4.7)\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "v_pF-HH3JKL2" + }, + "source": [ + "from sklearn.metrics import accuracy_score\n", + "#from sklearn.model_selection import train_test_split\n", + "#from sklearn.metrics import classification_report\n", + "from sklearn.metrics import confusion_matrix\n", + "\n", + "from sklearn.model_selection import GridSearchCV\n", + "from sklearn.model_selection import cross_val_score\n", + "from time import time\n", + "from operator import itemgetter\n", + "from scipy.stats import randint\n", + "\n", + "from sklearn.tree import export_graphviz\n", + "# sklearn.externals.six was removed from scikit-learn; the standard library provides StringIO\n", + "from io import StringIO\n", + "from IPython.display import Image\n", + "import pydotplus\n", + "\n", + "np.set_printoptions(suppress=True)" + ], + "execution_count": 24, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9ROlyvgij2yl" + }, + "source": [ + "Function to plot the Confusion Matrix, taken from [Confusion Matrix Visualization](https://medium.com/@dtuk81/confusion-matrix-visualization-fc31e3f30fea)" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "klQ0FLOIgeX1" + }, + "source": [ + "def mostra_confusion_matrix(cf, \n", + "                            group_names = None, \n", + "                            categories = 'auto', \n", + "                            count = True, \n", + "                            percent = True, \n", + "                            cbar = True, \n", + "                            xyticks = False, \n", + "                            xyplotlabels = True, \n", + "                            sum_stats = True, figsize = (8, 8), \n", + "                            cmap = 'Blues'):\n", + "    '''\n", + "    This function will make a pretty plot of an sklearn Confusion Matrix cm using a Seaborn heatmap visualization.\n", + "    Arguments\n", + "    ---------\n", + "    cf: confusion matrix to be passed in\n", + "    group_names: List of strings that represent the labels row by row to be shown in each square.\n", + "    categories: 
List of strings containing the categories to be displayed on the x,y axis. Default is 'auto'\n", + "    count: If True, show the raw number in the confusion matrix. Default is True.\n", + "    percent: If True, show each cell as a proportion of all observations. Default is True.\n", + "    cbar: If True, show the color bar. The cbar values are based off the values in the confusion matrix.\n", + "          Default is True.\n", + "    xyticks: If True, show x and y ticks. Default is False.\n", + "    xyplotlabels: If True, show 'True Label' and 'Predicted Label' on the figure. Default is True.\n", + "    sum_stats: If True, display summary statistics below the figure. Default is True.\n", + "    figsize: Tuple representing the figure size. Default is (8, 8); if None, the matplotlib rcParams value is used.\n", + "    cmap: Colormap of the values displayed from matplotlib.pyplot.cm. Default is 'Blues'\n", + "          See http://matplotlib.org/examples/color/colormaps_reference.html\n", + "    '''\n", + "\n", + "    # CODE TO GENERATE TEXT INSIDE EACH SQUARE\n", + "    blanks = ['' for i in range(cf.size)]\n", + "\n", + "    if group_names and len(group_names)==cf.size:\n", + "        group_labels = [\"{}\\n\".format(value) for value in group_names]\n", + "    else:\n", + "        group_labels = blanks\n", + "\n", + "    if count:\n", + "        group_counts = [\"{0:0.0f}\\n\".format(value) for value in cf.flatten()]\n", + "    else:\n", + "        group_counts = blanks\n", + "\n", + "    if percent:\n", + "        group_percentages = [\"{0:.2%}\".format(value) for value in cf.flatten()/np.sum(cf)]\n", + "    else:\n", + "        group_percentages = blanks\n", + "\n", + "    box_labels = [f\"{v1}{v2}{v3}\".strip() for v1, v2, v3 in zip(group_labels,group_counts,group_percentages)]\n", + "    box_labels = np.asarray(box_labels).reshape(cf.shape[0],cf.shape[1])\n", + "\n", + "    # CODE TO GENERATE SUMMARY STATISTICS & TEXT FOR SUMMARY STATS\n", + "    if sum_stats:\n", + "        #Accuracy is sum of diagonal divided by total observations\n", + "        accuracy = np.trace(cf) / float(np.sum(cf))\n", + "\n", + "        #if it is a binary 
confusion matrix, show some more stats\n", + "        if len(cf)==2:\n", + "            #Metrics for Binary Confusion Matrices\n", + "            precision = cf[1,1] / sum(cf[:,1])\n", + "            recall = cf[1,1] / sum(cf[1,:])\n", + "            f1_score = 2*precision*recall / (precision + recall)\n", + "            stats_text = \"\\n\\nAccuracy={:0.3f}\\nPrecision={:0.3f}\\nRecall={:0.3f}\\nF1 Score={:0.3f}\".format(accuracy,precision,recall,f1_score)\n", + "        else:\n", + "            stats_text = \"\\n\\nAccuracy={:0.3f}\".format(accuracy)\n", + "    else:\n", + "        stats_text = \"\"\n", + "\n", + "    # SET FIGURE PARAMETERS ACCORDING TO OTHER ARGUMENTS\n", + "    if figsize is None:\n", + "        #Get default figure size if not set\n", + "        figsize = plt.rcParams.get('figure.figsize')\n", + "\n", + "    if not xyticks:\n", + "        #Do not show categories if xyticks is False\n", + "        categories=False\n", + "\n", + "    # MAKE THE HEATMAP VISUALIZATION\n", + "    plt.figure(figsize=figsize)\n", + "    sns.heatmap(cf,annot=box_labels,fmt=\"\",cmap=cmap,cbar=cbar,xticklabels=categories,yticklabels=categories)\n", + "\n", + "    if xyplotlabels:\n", + "        plt.ylabel('True label')\n", + "        plt.xlabel('Predicted label' + stats_text)\n", + "    else:\n", + "        plt.xlabel(stats_text)" + ], + "execution_count": 27, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YJMS9ePQ6B6t" + }, + "source": [ + "**Note**: min_samples_split = 2 is the default value of DecisionTreeClassifier. Because the defaults let the tree grow until its leaves are pure, reducing overfitting generally means increasing min_samples_split or min_samples_leaf, or limiting max_depth." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "nNeRHYePJc-r" + }, + "source": [ + "from sklearn.tree import DecisionTreeClassifier\n", + "\n", + "# Instantiate the classifier with the scikit-learn default values, as a baseline:\n", + "## ml_DT is an object; 
the call below configures it.\n", + "# min_impurity_split and presort are omitted: both were deprecated and later removed from scikit-learn.\n", + "ml_DT= DecisionTreeClassifier(criterion = 'gini', \n", + "                              splitter = 'best', \n", + "                              max_depth = None, \n", + "                              min_samples_split = 2, \n", + "                              min_samples_leaf = 1, \n", + "                              min_weight_fraction_leaf = 0.0, \n", + "                              max_features = None, \n", + "                              random_state = i_Seed, \n", + "                              max_leaf_nodes = None, \n", + "                              min_impurity_decrease = 0.0, \n", + "                              class_weight = None)" + ], + "execution_count": 28, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "OgAHfXVo-Nw8", + "outputId": "f422e8b3-9aff-40db-a5a8-ac5cf8b10680", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 119 + } + }, + "source": [ + "# Train the algorithm\n", + "## fit takes the configured object and trains it on the training sample and the corresponding answers (y)\n", + "ml_DT.fit(X_train, y_train)" + ], + "execution_count": 29, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',\n", + " max_depth=None, max_features=None, max_leaf_nodes=None,\n", + " min_impurity_decrease=0.0, min_impurity_split=None,\n", + " min_samples_leaf=1, min_samples_split=2,\n", + " min_weight_fraction_leaf=0.0, presort=False,\n", + " random_state=20111974, splitter='best')" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 29 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "6exa9D8R2fDJ", + "outputId": "debe52a1-3ecc-4ec9-fc67-2c162e28b375", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 51 + } + }, + "source": [ + "# Cross-Validation with 10 folds\n", + "a_scores_CV = cross_val_score(ml_DT, X_train, y_train, cv = i_CV)\n", + "print(f'Média das Acurácias calculadas pelo CV....: {100*round(a_scores_CV.mean(),4)}')\n", + "print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')" + ], + 
"execution_count": 30, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Média das Acurácias calculadas pelo CV....: 91.43\n", + "std médio das Acurácias calculadas pelo CV: 3.44\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6_rYker2gzeG" + }, + "source": [ + "**Interpretação**: Nosso classificador (DecisionTreeClassifier) tem uma acurácia média de 91,43% (base de treinamento). Além disso, o std é da ordem de 3,66%, ou seja, pequena. Vamos tentar melhorar a acurácia do classificador usando parameter tunning (GridSearchCV)." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "tkwchmkP3p_A", + "outputId": "9d2f51c5-d500-4508-a756-f0863aa36bf1", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 51 + } + }, + "source": [ + "print(f'Acurácias: {a_scores_CV}')" + ], + "execution_count": 31, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Acurácias: [0.9 0.98571429 0.85714286 0.92857143 0.88571429 0.94285714\n", + " 0.92857143 0.9 0.88571429 0.92857143]\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "sI31WkZs2ht_" + }, + "source": [ + "# Faz predições...\n", + "y_pred = ml_DT.predict(X_test)" + ], + "execution_count": 34, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "fSaVzJ9xFpwW", + "outputId": "f60635b5-1a3a-41eb-aec9-9985eef28b42", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 538 + } + }, + "source": [ + "# Confusion Matrix\n", + "cf_matrix = confusion_matrix(y_test, y_pred)\n", + "cf_labels = ['True_Negative', 'False_Positive', 'False_Negative', 'True_Positive']\n", + "cf_categories = ['Zero', 'One']\n", + "mostra_confusion_matrix(cf_matrix, group_names= cf_labels, categories= cf_categories)" + ], + "execution_count": 33, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [], + "needs_background": "light" + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "p8D975NqsGtj" + }, + "source": [ + "## Parameter tunning\n", + "### Referência\n", + "* [Hyperparameter Tuning the Random Forest in Python](https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74)\n", + "* [Decision Tree Adventures 2 — Explanation of Decision Tree Classifier Parameters](https://medium.com/datadriveninvestor/decision-tree-adventures-2-explanation-of-decision-tree-classifier-parameters-84776f39a28) - Explica didaticamente e step by step como fazer parameter tunning." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Bfdq5zEhlVsk" + }, + "source": [ + "# Dicionário de parâmetros para o parameter tunning. Ao todo serão ajustados 2X13X5X5X7= 4.550 modelos. Contando com 10 folds no Cross-Validation, então são 45.500 modelos.\n", + "d_parametros_DT= {\"criterion\": [\"gini\", \"entropy\"]} #, \"min_samples_split\": [2, 5, 10, 30, 50, 70, 90, 120, 150, 180, 210, 240, 270, 350, 400], \"max_depth\": [None, 2, 5, 9, 15], \"min_samples_leaf\": [20, 40, 60, 80, 100], \"max_leaf_nodes\": [None, 2, 3, 4, 5, 10, 15]}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "H8gNSs0G0A-L" + }, + "source": [ + "```\n", + "grid_search = GridSearchCV(ml_DT, param_grid= d_parametros_DT, cv = i_CV, n_jobs= -1)\n", + "start = time()\n", + "grid_search.fit(X_train, y_train)\n", + "tempo_elapsed= time()-start\n", + "print(f\"\\nGridSearchCV levou {tempo_elapsed:.2f} segundos para estimar {len(grid_search.cv_results_)} modelos candidatos\")\n", + "\n", + "GridSearchCV levou 1999.12 segundos para estimar 23 modelos candidatos\n", + "```\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ap3WMXqDthu9" + }, + "source": [ + "# Definindo a função para o GridSearchCV\n", + "def 
GridSearchOptimizer(modelo, ml_Opt, d_Parametros, X_train, y_train, X_test, y_test, cv = i_CV):\n", + " ml_GridSearchCV = GridSearchCV(modelo, d_Parametros, cv = cv, n_jobs= -1, verbose= 10, scoring= 'accuracy')\n", + " start = time()\n", + " ml_GridSearchCV.fit(X_train, y_train)\n", + " tempo_elapsed= time()-start\n", + " #print(f\"\\nGridSearchCV levou {tempo_elapsed:.2f} segundos.\")\n", + "\n", + " # Parameters that optimize the classification:\n", + " print(f'\\nParametros otimizados: {ml_GridSearchCV.best_params_}')\n", + " \n", + " if ml_Opt == 'ml_DT2':\n", + " print(f'\\nDecisionTreeClassifier *********************************************************************************************************')\n", + " ml_Opt = DecisionTreeClassifier(criterion= ml_GridSearchCV.best_params_['criterion'], \n", + " max_depth= ml_GridSearchCV.best_params_['max_depth'],\n", + " max_leaf_nodes= ml_GridSearchCV.best_params_['max_leaf_nodes'],\n", + " min_samples_split= ml_GridSearchCV.best_params_['min_samples_split'],\n", + " min_samples_leaf= ml_GridSearchCV.best_params_['min_samples_leaf'], \n", + " random_state= i_Seed)\n", + " \n", + " elif ml_Opt == 'ml_RF2':\n", + " print(f'\\nRandomForestClassifier *********************************************************************************************************')\n", + " ml_Opt = RandomForestClassifier(bootstrap= ml_GridSearchCV.best_params_['bootstrap'], \n", + " max_depth= ml_GridSearchCV.best_params_['max_depth'],\n", + " max_features= ml_GridSearchCV.best_params_['max_features'],\n", + " min_samples_leaf= ml_GridSearchCV.best_params_['min_samples_leaf'],\n", + " min_samples_split= ml_GridSearchCV.best_params_['min_samples_split'],\n", + " n_estimators= ml_GridSearchCV.best_params_['n_estimators'],\n", + " random_state= i_Seed)\n", + " \n", + " elif ml_Opt == 'ml_AB2':\n", + " print(f'\\nAdaBoostClassifier *********************************************************************************************************')\n", + " 
ml_Opt = AdaBoostClassifier(algorithm='SAMME.R', \n", + " base_estimator=RandomForestClassifier(bootstrap = False, \n", + " max_depth = 10, \n", + " max_features = 'auto', \n", + " min_samples_leaf = 1, \n", + " min_samples_split = 2, \n", + " n_estimators = 400), \n", + " learning_rate = ml_GridSearchCV.best_params_['learning_rate'], \n", + " n_estimators = ml_GridSearchCV.best_params_['n_estimators'], \n", + " random_state = i_Seed)\n", + " \n", + " elif ml_Opt == 'ml_GB2':\n", + " print(f'\\nGradientBoostingClassifier *********************************************************************************************************')\n", + " ml_Opt = GradientBoostingClassifier(learning_rate = ml_GridSearchCV.best_params_['learning_rate'], \n", + " n_estimators = ml_GridSearchCV.best_params_['n_estimators'], \n", + " max_depth = ml_GridSearchCV.best_params_['max_depth'], \n", + " min_samples_split = ml_GridSearchCV.best_params_['min_samples_split'], \n", + " min_samples_leaf = ml_GridSearchCV.best_params_['min_samples_leaf'], \n", + " max_features = ml_GridSearchCV.best_params_['max_features'])\n", + " \n", + " elif ml_Opt == 'ml_XGB2':\n", + " print(f'\\nXGBoostingClassifier *********************************************************************************************************')\n", + " ml_Opt = XGBClassifier(learning_rate= ml_GridSearchCV.best_params_['learning_rate'], \n", + " max_depth= ml_GridSearchCV.best_params_['max_depth'], \n", + " colsample_bytree= ml_GridSearchCV.best_params_['colsample_bytree'], \n", + " subsample= ml_GridSearchCV.best_params_['subsample'], \n", + " gamma= ml_GridSearchCV.best_params_['gamma'], \n", + " min_child_weight= ml_GridSearchCV.best_params_['min_child_weight'])\n", + " \n", + " # Retrain using the optimized parameters...\n", + " ml_Opt.fit(X_train, y_train)\n", + "\n", + " # Cross-Validation with 10 folds\n", + " print(f'\\n********* CROSS-VALIDATION ***********')\n", + " a_scores_CV = cross_val_score(ml_Opt, 
X_train, y_train, cv = i_CV)\n", + " print(f'Média das Acurácias calculadas pelo CV....: {100*round(a_scores_CV.mean(),4)}')\n", + " print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')\n", + "\n", + " # Faz predições com os parametros otimizados...\n", + " y_pred = ml_Opt.predict(X_test)\n", + " \n", + " # Importância das COLUNAS\n", + " print(f'\\n********* IMPORTÂNCIA DAS COLUNAS ***********')\n", + " df_importancia_variaveis = pd.DataFrame(zip(l_colunas, ml_Opt.feature_importances_), columns= ['coluna', 'importancia'])\n", + " df_importancia_variaveis = df_importancia_variaveis.sort_values(by= ['importancia'], ascending=False)\n", + " print(df_importancia_variaveis)\n", + "\n", + " # Matriz de Confusão\n", + " print(f'\\n********* CONFUSION MATRIX - PARAMETER TUNNING ***********')\n", + " cf_matrix = confusion_matrix(y_test, y_pred)\n", + " cf_labels = ['True_Negative', 'False_Positive', 'False_Negative', 'True_Positive']\n", + " cf_categories = ['Zero', 'One']\n", + " mostra_confusion_matrix(cf_matrix, group_names = cf_labels, categories = cf_categories)\n", + "\n", + " return ml_Opt, ml_GridSearchCV.best_params_" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "44-BRnNjBT25", + "outputId": "da9fa734-cd1d-4731-d6c6-2ff2cbc1d379", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 520 + } + }, + "source": [ + "# Invoca a função\n", + "ml_DT2, best_params = GridSearchOptimizer(ml_DT, 'ml_DT2', d_parametros_DT, X_train, y_train, X_test, y_test, cv = i_CV)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Fitting 10 folds for each of 2 candidates, totalling 20 fits\n" + ], + "name": "stdout" + }, + { + "output_type": "stream", + "text": [ + "[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.\n", + "[Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 1.0s\n", + "[Parallel(n_jobs=-1)]: Done 4 tasks | 
elapsed: 1.1s\n", + "[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.2s\n", + "[Parallel(n_jobs=-1)]: Batch computation too fast (0.1813s.) Setting batch_size=2.\n", + "[Parallel(n_jobs=-1)]: Done 14 tasks | elapsed: 1.2s\n" + ], + "name": "stderr" + }, + { + "output_type": "stream", + "text": [ + "\n", + "Parametros otimizados: {'criterion': 'entropy'}\n", + "\n", + "DecisionTreeClassifier *********************************************************************************************************\n" + ], + "name": "stdout" + }, + { + "output_type": "stream", + "text": [ + "[Parallel(n_jobs=-1)]: Done 20 out of 20 | elapsed: 1.3s remaining: 0.0s\n", + "[Parallel(n_jobs=-1)]: Done 20 out of 20 | elapsed: 1.3s finished\n" + ], + "name": "stderr" + }, + { + "output_type": "error", + "ename": "KeyError", + "evalue": "ignored", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# Invoca a função\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mml_DT2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbest_params\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mGridSearchOptimizer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mml_DT\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'ml_DT2'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0md_parametros_DT\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mX_train\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my_train\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mX_test\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my_test\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcv\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mi_CV\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m\u001b[0m in \u001b[0;36mGridSearchOptimizer\u001b[0;34m(modelo, ml_Opt, 
d_Parametros, X_train, y_train, X_test, y_test, cv)\u001b[0m\n\u001b[1;32m 13\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf'\\nDecisionTreeClassifier *********************************************************************************************************'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 14\u001b[0m ml_Opt = DecisionTreeClassifier(criterion= ml_GridSearchCV.best_params_['criterion'], \n\u001b[0;32m---> 15\u001b[0;31m \u001b[0mmax_depth\u001b[0m\u001b[0;34m=\u001b[0m \u001b[0mml_GridSearchCV\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbest_params_\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'max_depth'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 16\u001b[0m \u001b[0mmax_leaf_nodes\u001b[0m\u001b[0;34m=\u001b[0m \u001b[0mml_GridSearchCV\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbest_params_\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'max_leaf_nodes'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 17\u001b[0m \u001b[0mmin_samples_split\u001b[0m\u001b[0;34m=\u001b[0m \u001b[0mml_GridSearchCV\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbest_params_\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'min_samples_leaf'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mKeyError\u001b[0m: 'max_depth'" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gmCkjGjPJMLr" + }, + "source": [ + "### Visualizar o resultado" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "cIc3ZgaISEd0" + }, + "source": [ + "from sklearn.tree import export_graphviz\n", + "from sklearn.externals.six import StringIO \n", + "from IPython.display import Image \n", + "import pydotplus\n", + "\n", + "dot_data = StringIO()\n", + "export_graphviz(ml_DT2, out_file = dot_data, filled = True, 
rounded = True, special_characters = True, feature_names = l_colunas, class_names = ['0','1'])\n", + "\n", + "graph = pydotplus.graph_from_dot_data(dot_data.getvalue()) \n", + "graph.write_png('DecisionTree.png')\n", + "Image(graph.create_png())" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e1R2GBkbnV37" + }, + "source": [ + "## Select the important/relevant COLUMNS" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "vv7GKBvs6Ybf" + }, + "source": [ + "# Helper function to select the relevant COLUMNS\n", + "from sklearn.feature_selection import SelectFromModel\n", + "\n", + "def seleciona_colunas_relevantes(modelo, X_train, X_test, threshold = 0.05):\n", + " # Create a selector that keeps the COLUMNS with importance >= threshold\n", + " sfm = SelectFromModel(modelo, threshold = threshold)\n", + " \n", + " # Train the selector (y_train is read from the notebook's global scope)\n", + " sfm.fit(X_train, y_train)\n", + "\n", + " # Show the indices of the most important COLUMNS\n", + " print(f'\\n********** COLUNAS Relevantes ******')\n", + " print(sfm.get_support(indices=True))\n", + "\n", + " # Keep only the relevant COLUMNS\n", + " X_train_I = sfm.transform(X_train)\n", + " X_test_I = sfm.transform(X_test)\n", + " return X_train_I, X_test_I " + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "ukMLoEr7nbUf" + }, + "source": [ + "X_train_DT, X_test_DT = seleciona_colunas_relevantes(ml_DT2, X_train, X_test)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8JjePRQAoqkk" + }, + "source": [ + "## Train the classifier with the relevant COLUMNS" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Gt3aCPpfKRxm" + }, + "source": [ + "best_params" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "zq6uCVtzovMt" + }, + "source": [ + "# Treina usando as COLUNAS 
relevantes...\n", + "ml_DT2.fit(X_train_DT, y_train)\n", + "\n", + "# Cross-Validation with 10 folds\n", + "a_scores_CV = cross_val_score(ml_DT2, X_train_DT, y_train, cv = i_CV)\n", + "print(f'Média das Acurácias calculadas pelo CV....: {100*round(a_scores_CV.mean(),4)}')\n", + "print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Tc7esxqtq-Og" + }, + "source": [ + "# ****************************************************************" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "znWy3LE1q-Z3" + }, + "source": [ + "ml_DT3, best_params2 = GridSearchOptimizer(ml_DT2, 'ml_DT2', d_parametros_DT, X_train_DT, y_train, X_test_DT, y_test, cv = i_CV)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "6IhCC6pfq-jL" + }, + "source": [ + "best_params" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "qw6Dk3kesT0q" + }, + "source": [ + "best_params2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "SbS4ZKN8s-ee" + }, + "source": [ + "# Cross-Validation with 10 folds\n", + "a_scores_CV = cross_val_score(ml_DT3, X_train_DT, y_train, cv = i_CV)\n", + "print(f'Média das Acurácias calculadas pelo CV....: {100*round(a_scores_CV.mean(),4)}')\n", + "print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "_at3XP1Bq-qb" + }, + "source": [ + "# ***************************************************************" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MZ1-vGRcxJoN" + }, + "source": [ + "## Validate the model using the X_test dataframe" + ] + }, 
+ { + "cell_type": "code", + "metadata": { + "id": "ig9GiUAEw9jr" + }, + "source": [ + "y_pred_DT = ml_DT2.predict(X_test_DT)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "7UZz4UzHDqae" + }, + "source": [ + "# Compute the accuracy\n", + "accuracy_score(y_test, y_pred_DT)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "K3EUMAxxKBur" + }, + "source": [ + "___\n", + "# **RANDOM FOREST**\n", + "* Decision Trees have a tree-shaped structure.\n", + "* Random Forest can be used for both classification (RandomForestClassifier) and regression (RandomForestRegressor).\n", + "\n", + "* **Advantages**:\n", + " * Requires little data preprocessing;\n", + " * Handles both categorical and numerical COLUMNS well;\n", + " * It is a Bagging Ensemble Method: it builds many trees, each trained on a bootstrap sample of the rows and considering a random subset of the columns at each split, and combines their predictions by majority vote (classification) or averaging (regression);\n", + " * More robust than a single Decision Tree. **Why?**\n", + " * Automatically controls overfitting (**why?**) and often produces very robust, high-performance models.\n", + " * Can be used for Feature Selection, since it produces the feature importances. 
In scikit-learn, `feature_importances_` sums to 1 (that is, 100%);\n", + " * Like Decision Trees, these models easily capture non-linear patterns in the data;\n", + " * Does not require the data to be normalized;\n", + " * Handles Missing Values well;\n", + " * Requires no assumptions about the distribution of the data, because the algorithm is non-parametric.\n", + "\n", + "* **Disadvantages**\n", + " * Predictions can be biased toward the majority class when the classes are imbalanced. **Balancing the dataframe beforehand is recommended to avoid this problem**.\n", + "\n", + "* **Main parameters**\n", + "\n", + "## **References**:\n", + "* [Running Random Forests? 
Inspect the feature importances with this code](https://towardsdatascience.com/running-random-forests-inspect-the-feature-importances-with-this-code-2b00dd72b92e)\n", + "* [Feature importances with forests of trees](https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html)\n", + "* [Understanding Random Forests Classifiers in Python](https://www.datacamp.com/community/tutorials/random-forests-classifier-python)\n", + "* [Understanding Random Forest](https://towardsdatascience.com/understanding-random-forest-58381e0602d2)\n", + "* [An Implementation and Explanation of the Random Forest in Python](https://towardsdatascience.com/an-implementation-and-explanation-of-the-random-forest-in-python-77bf308a9b76)\n", + "* [Random Forest Simple Explanation](https://medium.com/@williamkoehrsen/random-forest-simple-explanation-377895a60d2d)\n", + "* [Random Forest Explained](https://www.youtube.com/watch?v=eM4uJ6XGnSM)\n", + "* [Hyperparameter Tuning the Random Forest in Python](https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74) - Explica os principais parâmetros do Random Forest." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "cnfDw_GEKBuu" + }, + "source": [ + "from sklearn.ensemble import RandomForestClassifier\n", + "\n", + "# Instancia...\n", + "ml_RF= RandomForestClassifier(n_estimators=100, min_samples_split= 2, max_features=\"auto\", random_state= i_Seed)\n", + "\n", + "# Treina...\n", + "ml_RF.fit(X_train, y_train)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "lYa9oaZW__o6", + "outputId": "ba94936d-6a1d-49d4-9bb0-23cb56f7fa49", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 214 + } + }, + "source": [ + "# Cross-Validation com 10 folds\n", + "a_scores_CV = cross_val_score(ml_RF, X_train, y_train, cv = i_CV)\n", + "print(f'Média das Acurácias calculadas pelo CV....: {100*round(a_scores_CV.mean(),4)}')\n", + "print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')" + ], + "execution_count": 3, + "outputs": [ + { + "output_type": "error", + "ename": "NameError", + "evalue": "ignored", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# Cross-Validation com 10 folds\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0ma_scores_CV\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcross_val_score\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mml_RF\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mX_train\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my_train\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcv\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mi_CV\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf'Média das Acurácias calculadas 
pelo CV....: {100*round(a_scores_CV.mean(),4)}'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mNameError\u001b[0m: name 'cross_val_score' is not defined" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AouWUu8vANdb" + }, + "source": [ + "**Interpretação**: Nosso classificador (RandomForestClassifier) tem uma acurácia média de 96,44% (base de treinamento). Além disso, o std é da ordem de 2,77%, ou seja, pequena. Vamos tentar melhorar a acurácia do classificador usando parameter tunning (GridSearchCV)." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "vbducxlgAa85", + "outputId": "a916adc2-7038-4d7b-8ac5-96f41f971340", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 163 + } + }, + "source": [ + "print(f'Acurácias: {a_scores_CV}')" + ], + "execution_count": 4, + "outputs": [ + { + "output_type": "error", + "ename": "NameError", + "evalue": "ignored", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf'Acurácias: {a_scores_CV}'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mNameError\u001b[0m: name 'a_scores_CV' is not defined" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "_lxx-LUw_5sd", + "outputId": "c7cccd43-813b-4765-f396-1a70540b3f05", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 180 + } + }, + "source": [ + "# Faz 
predições...\n", + "y_pred = ml_RF.predict(X_test)" + ], + "execution_count": 5, + "outputs": [ + { + "output_type": "error", + "ename": "NameError", + "evalue": "ignored", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# Faz predições...\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0my_pred\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mml_RF\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpredict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mX_test\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mNameError\u001b[0m: name 'ml_RF' is not defined" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "pQIRO_LpGAkw" + }, + "source": [ + "# Confusion Matrix\n", + "cf_matrix = confusion_matrix(y_test, y_pred)\n", + "cf_labels = ['True_Negative','False_Positive','False_Negative','True_Positive']\n", + "cf_categories = ['Zero', 'One']\n", + "mostra_confusion_matrix(cf_matrix, group_names= cf_labels, categories= cf_categories)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yKLHZ5_C6FJ8" + }, + "source": [ + "## Parameter tunning\n", + "### Referência\n", + "* [Hyperparameter Tuning the Random Forest in Python](https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74)\n", + "* [Decision Tree Adventures 2 — Explanation of Decision Tree Classifier Parameters](https://medium.com/datadriveninvestor/decision-tree-adventures-2-explanation-of-decision-tree-classifier-parameters-84776f39a28) - Explica didaticamente e step by step como fazer parameter tunning.\n", + "* [Optimizing Hyperparameters in 
Random Forest Classification](https://towardsdatascience.com/optimizing-hyperparameters-in-random-forest-classification-ec7741f9d3f6) - Outro approach para entender parameter tunning. Recomendo fortemente a leitura! " + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "XOa9naju6FKA" + }, + "source": [ + "# Dicionário de parâmetros para o parameter tunning.\n", + "d_parametros_RF= {'bootstrap': [True, False]} #,\n", + "# 'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, None],\n", + "# 'max_features': ['auto', 'sqrt'],\n", + "# 'min_samples_leaf': [1, 2, 4],\n", + "# 'min_samples_split': [2, 5, 10],\n", + "# 'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "6__f2jZaTQat" + }, + "source": [ + "# Invoca a função\n", + "ml_RF2, best_params = GridSearchOptimizer(ml_RF, 'ml_RF2', d_parametros_RF, X_train, y_train, X_test, y_test, cv = i_CV)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "crfn-n--KG4n" + }, + "source": [ + "### Resultado da execução do Random Forest\n", + "\n", + "```\n", + "[Parallel(n_jobs=-1)]: Done 7920 out of 7920 | elapsed: 194.0min finished\n", + "best_params= {'bootstrap': False, 'max_depth': 10, 'max_features': 'auto', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 400}\n", + "```" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "SGTOe5PaRw59" + }, + "source": [ + "# Como o procedimento acima levou 194 minutos para executar, então vou estimar ml_RF2 abaixo usando os parâmetros acima estimados\n", + "best_params= {'bootstrap': False, 'max_depth': 10, 'max_features': 'auto', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 400}\n", + "\n", + "ml_RF2= RandomForestClassifier(bootstrap= best_params['bootstrap'], \n", + " max_depth= best_params['max_depth'], \n", + " max_features= best_params['max_features'], 
\n", + " min_samples_leaf= best_params['min_samples_leaf'], \n", + " min_samples_split= best_params['min_samples_split'], \n", + " n_estimators= best_params['n_estimators'], \n", + " random_state= i_Seed)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HMJcAdLlTQa0" + }, + "source": [ + "## Visualizar o resultado\n", + "> Implementar a visualização do RandomForest." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WWNiy7Z0TQa3" + }, + "source": [ + "## Selecionar as COLUNAS importantes/relevantes" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "kOi11YOKTQa4" + }, + "source": [ + "X_train_RF, X_test_RF = seleciona_colunas_relevantes(ml_RF2, X_train, X_test)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Zn_O7c_DTQbE" + }, + "source": [ + "## Treina o classificador com as COLUNAS relevantes" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "UwEOwzSGTQbF" + }, + "source": [ + "best_params" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Rr8qDrgvTQbL" + }, + "source": [ + "# Treina com as COLUNAS relevantes...\n", + "ml_RF2.fit(X_train_RF, y_train)\n", + "\n", + "# Cross-Validation com 10 folds\n", + "a_scores_CV = cross_val_score(ml_RF2, X_train_RF, y_train, cv = i_CV)\n", + "print(f'Acurácia Media: {100*a_scores_CV.mean():.2f}')\n", + "print(f'std médio.....: {100*a_scores_CV.std():.2f}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-mYfQLlsTQbQ" + }, + "source": [ + "## Valida o modelo usando o dataframe X_test" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "sSD5o1JQTQbR" + }, + "source": [ + "y_pred_RF = ml_RF2.predict(X_test_RF)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "wywF6LymDzKr" + }, + "source": [ + 
"# Calcula acurácia\n", + "accuracy_score(y_test, y_pred_RF)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hJJsL0IJb6iO" + }, + "source": [ + "## Estudo do comportamento dos parâmetros do algoritmo\n", + "> Consulte [Optimizing Hyperparameters in Random Forest Classification](https://towardsdatascience.com/optimizing-hyperparameters-in-random-forest-classification-ec7741f9d3f6) para mais detalhes." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "navUWMwHi44D" + }, + "source": [ + "param_range = np.arange(1, 250, 2)\n", + "\n", + "# Calculate accuracy on training and test set using range of parameter values\n", + "train_a_scores_CV, test_a_scores_CV = validation_curve(RandomForestClassifier(), \n", + " X_train, \n", + " y_train, \n", + " param_name=\"n_estimators\", \n", + " param_range = param_range, \n", + " cv = i_CV, \n", + " scoring = \"accuracy\", \n", + " n_jobs = -1)\n", + "\n", + "\n", + "# Calculate mean and standard deviation for training set a_scores_CV\n", + "train_mean = np.mean(train_a_scores_CV, axis = 1)\n", + "train_std = np.std(train_a_scores_CV, axis = 1)\n", + "\n", + "# Calculate mean and standard deviation for test set a_scores_CV\n", + "test_mean = np.mean(test_a_scores_CV, axis = 1)\n", + "test_std = np.std(test_a_scores_CV, axis = 1)\n", + "\n", + "# Plot mean accuracy a_scores_CV for training and test sets\n", + "plt.plot(param_range, train_mean, label = \"Training score\", color = \"black\")\n", + "plt.plot(param_range, test_mean, label = \"Cross-validation score\", color = \"dimgrey\")\n", + "\n", + "# Plot accuracy bands for training and test sets\n", + "plt.fill_between(param_range, train_mean - train_std, train_mean + train_std, color = \"gray\")\n", + "plt.fill_between(param_range, test_mean - test_std, test_mean + test_std, color = \"gainsboro\")\n", + "\n", + "# Create plot\n", + "plt.title(\"Validation Curve With Random Forest\")\n", + 
"plt.xlabel(\"Number Of Trees\")\n", + "plt.ylabel(\"Accuracy Score\")\n", + "plt.tight_layout()\n", + "plt.legend(loc = \"best\")\n", + "plt.show()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "rv7TIM9kjsud" + }, + "source": [ + "param_range = np.arange(1, 250, 2)\n", + "\n", + "# Calculate accuracy on training and test set using range of parameter values\n", + "train_a_scores_CV, test_a_scores_CV = validation_curve(RandomForestClassifier(), \n", + " X_train, \n", + " y_train, \n", + " param_name = \"max_depth\", \n", + " param_range = param_range, \n", + " cv = i_CV, \n", + " scoring = \"accuracy\", \n", + " n_jobs = -1)\n", + "\n", + "# Calculate mean and standard deviation for training set a_scores_CV\n", + "train_mean = np.mean(train_a_scores_CV, axis = 1)\n", + "train_std = np.std(train_a_scores_CV, axis = 1)\n", + "\n", + "# Calculate mean and standard deviation for test set a_scores_CV\n", + "test_mean = np.mean(test_a_scores_CV, axis = 1)\n", + "test_std = np.std(test_a_scores_CV, axis = 1)\n", + "\n", + "# Plot mean accuracy a_scores_CV for training and test sets\n", + "plt.plot(param_range, train_mean, label=\"Training score\", color=\"black\")\n", + "plt.plot(param_range, test_mean, label=\"Cross-validation score\", color=\"dimgrey\")\n", + "\n", + "# Plot accuracy bands for training and test sets\n", + "plt.fill_between(param_range, train_mean - train_std, train_mean + train_std, color=\"gray\")\n", + "plt.fill_between(param_range, test_mean - test_std, test_mean + test_std, color=\"gainsboro\")\n", + "\n", + "# Create plot (the x axis is max_depth here, not the number of trees)\n", + "plt.title(\"Validation Curve With Random Forest\")\n", + "plt.xlabel(\"Maximum Tree Depth\")\n", + "plt.ylabel(\"Accuracy Score\")\n", + "plt.tight_layout()\n", + "plt.legend(loc=\"best\")\n", + "plt.show()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "lm_fPGYwkJYc" + }, + "source": [ + "param_range = 
np.arange(1, 250, 2)\n", + "\n", + "# Calculate accuracy on training and test set using range of parameter values\n", + "train_a_scores_CV, test_a_scores_CV = validation_curve(RandomForestClassifier(), \n", + " X_train, \n", + " y_train, \n", + " param_name='min_samples_leaf', \n", + " param_range=param_range,\n", + " cv = i_CV, \n", + " scoring=\"accuracy\", \n", + " n_jobs=-1)\n", + "\n", + "\n", + "# Calculate mean and standard deviation for training set a_scores_CV\n", + "train_mean = np.mean(train_a_scores_CV, axis = 1)\n", + "train_std = np.std(train_a_scores_CV, axis = 1)\n", + "\n", + "# Calculate mean and standard deviation for test set a_scores_CV\n", + "test_mean = np.mean(test_a_scores_CV, axis = 1)\n", + "test_std = np.std(test_a_scores_CV, axis = 1)\n", + "\n", + "# Plot mean accuracy a_scores_CV for training and test sets\n", + "plt.plot(param_range, train_mean, label=\"Training score\", color=\"black\")\n", + "plt.plot(param_range, test_mean, label=\"Cross-validation score\", color=\"dimgrey\")\n", + "\n", + "# Plot accuracy bands for training and test sets\n", + "plt.fill_between(param_range, train_mean - train_std, train_mean + train_std, color=\"gray\")\n", + "plt.fill_between(param_range, test_mean - test_std, test_mean + test_std, color=\"gainsboro\")\n", + "\n", + "# Create plot (the x axis is min_samples_leaf here, not the number of trees)\n", + "plt.title(\"Validation Curve With Random Forest\")\n", + "plt.xlabel(\"Minimum Samples Per Leaf\")\n", + "plt.ylabel(\"Accuracy Score\")\n", + "plt.tight_layout()\n", + "plt.legend(loc=\"best\")\n", + "plt.show()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "CAqdiSaVlAB8" + }, + "source": [ + "param_range = np.arange(0.05, 1, 0.05)\n", + "\n", + "# Calculate accuracy on training and test set using range of parameter values\n", + "train_a_scores_CV, test_a_scores_CV = validation_curve(RandomForestClassifier(), \n", + " X_train, \n", + " y_train, \n", + " param_name='min_samples_split', \n", + " 
param_range=param_range,\n", + " cv = i_CV, \n", + " scoring=\"accuracy\", \n", + " n_jobs=-1)\n", + "\n", + "\n", + "# Calculate mean and standard deviation for training set a_scores_CV\n", + "train_mean = np.mean(train_a_scores_CV, axis = 1)\n", + "train_std = np.std(train_a_scores_CV, axis = 1)\n", + "\n", + "# Calculate mean and standard deviation for test set a_scores_CV\n", + "test_mean = np.mean(test_a_scores_CV, axis = 1)\n", + "test_std = np.std(test_a_scores_CV, axis = 1)\n", + "\n", + "# Plot mean accuracy a_scores_CV for training and test sets\n", + "plt.plot(param_range, train_mean, label=\"Training score\", color=\"black\")\n", + "plt.plot(param_range, test_mean, label=\"Cross-validation score\", color=\"dimgrey\")\n", + "\n", + "# Plot accuracy bands for training and test sets\n", + "plt.fill_between(param_range, train_mean - train_std, train_mean + train_std, color=\"gray\")\n", + "plt.fill_between(param_range, test_mean - test_std, test_mean + test_std, color=\"gainsboro\")\n", + "\n", + "# Create plot (the x axis is min_samples_split, given as a fraction of the samples)\n", + "plt.title(\"Validation Curve With Random Forest\")\n", + "plt.xlabel(\"Min Samples Split (Fraction Of Samples)\")\n", + "plt.ylabel(\"Accuracy Score\")\n", + "plt.tight_layout()\n", + "plt.legend(loc=\"best\")\n", + "plt.show()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cX_gfsbQSdNd" + }, + "source": [ + "___\n", + "# **BOOSTING MODELS**\n", + "* São algoritmos muito utilizados nas competições do Kaggle;\n", + "* São algoritmos utilizados para melhorar a performance dos algoritmos de Machine Learning;\n", + "* Modelos:\n", + " - [X] AdaBoost\n", + " - [X] XGBoost\n", + " - [X] LightGBM\n", + " - [X] GradientBoosting\n", + " - [X] CatBoost\n", + "\n", + "## Bagging vs Boosting vs Stacking\n", + "### **Bagging**\n", + "* Objetivo é reduzir a variância;\n", + "\n", + "#### Como funciona\n", + "* Seleciona várias amostras **COM REPOSIÇÃO** do dataframe de treinamento. 
Cada amostra é usada para treinar uma Decision Tree. Como resultado, temos um ensemble de muitos modelos diferentes (Decision Trees). A média das predições desses modelos é usada para produzir o resultado final;\n", + "* O resultado final é mais robusto do que o de uma única Decision Tree.\n", + "\n", + "![Bagging](https://github.com/MathMachado/Materials/blob/master/Bagging.png?raw=true)\n", + "\n", + "Source: [Boosting and Bagging: How To Develop A Robust Machine Learning Algorithm](https://hackernoon.com/how-to-develop-a-robust-algorithm-c38e08f32201).\n", + "\n", + "#### Steps\n", + "* Suponha um dataframe X_train (dataframe de treinamento) contendo N observações (instâncias, pontos, linhas) e M COLUNAS (features, atributos).\n", + " 1. Bagging seleciona aleatoriamente uma amostra **COM REPOSIÇÃO** de X_train;\n", + " 2. Bagging seleciona aleatoriamente M2 (M2 < M) COLUNAS do dataframe extraído do passo (1);\n", + " 3. Constrói uma Decision Tree com as M2 COLUNAS do passo (2) e o dataframe obtido no passo (1), e as COLUNAS são avaliadas pela sua habilidade de classificar as observações;\n", + " 4. 
Os passos (1) --> (2) --> (3) são repetidos K vezes (ou seja, K Decision Trees), de forma que as COLUNAS são ranqueadas pelo seu poder preditivo e o resultado final (acurácia, por exemplo) é obtido pela agregação das predições das K Decision Trees.\n", + "\n", + "#### Vantagens\n", + "* Reduz overfitting;\n", + "* Lida bem com dataframes com muitas COLUNAS (high dimensionality);\n", + "* Lida automaticamente com Missing Values;\n", + "\n", + "#### Desvantagem\n", + "* A predição final é baseada na média das K Decision Trees, o que pode comprometer a acurácia final.\n", + "\n", + "___ \n", + "### **Boosting**\n", + "* Objetivo é melhorar a acurácia;\n", + "\n", + "#### Como funciona\n", + "* Os classificadores são usados sequencialmente, de forma que o classificador no passo N aprende com os erros do classificador do passo N-1. Ou seja, o objetivo é melhorar a precisão/acurácia a cada passo, aprendendo com o passado.\n", + "\n", + "![Boosting](https://github.com/MathMachado/Materials/blob/master/Boosting.png?raw=true)\n", + "\n", + "Source: [Ensemble methods: bagging, boosting and stacking](https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205), Joseph Rocca.\n", + "\n", + "#### Steps\n", + "* Suponha um dataframe X_train (dataframe de treinamento) contendo N observações (instâncias, pontos, linhas) e M COLUNAS (features, atributos).\n", + " 1. Boosting seleciona aleatoriamente uma amostra D1 SEM reposição de X_train;\n", + " 2. Boosting treina o classificador C1 com D1;\n", + " 3. Boosting seleciona aleatoriamente a SEGUNDA amostra D2 SEM reposição de X_train e acrescenta a D2 50% das observações que foram classificadas incorretamente para treinar o classificador C2;\n", + " 4. Boosting encontra em X_train a amostra D3 em que os classificadores C1 e C2 discordam e treina C3;\n", + " 5. 
Combina (voto) as predições de C1, C2 e C3 para produzir o resultado final.\n", + "\n", + "#### Vantagens\n", + "* Lida bem com dataframes com muitas COLUNAS (high dimensionality);\n", + "* Lida automaticamente com Missing Values;\n", + "\n", + "#### Desvantagens\n", + "* Propenso a overfitting. Recomenda-se tratar outliers previamente.\n", + "* Requer ajuste cuidadoso dos hyperparameters;" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9fgUrkmPk4dr" + }, + "source": [ + "___\n", + "# STACKING\n", + "\n", + "![Stacking](https://github.com/MathMachado/Materials/blob/master/Stacking.png?raw=true)\n", + "\n", + "TODO: incluir a referência desta figura." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "B0jxx3ETpOdm" + }, + "source": [ + "___\n", + "# **BOOTSTRAPPING METHODS**\n", + "> Antes de falarmos de Boosting ou Bagging, precisamos entender primeiro o que é Bootstrap, pois ambos (Boosting e Bagging) são baseados em Bootstrap.\n", + "\n", + "* Em Estatística (e em Machine Learning), Bootstrap se refere a extrair amostras aleatórias COM reposição da população X." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SyqazmUuifkE" + }, + "source": [ + "___\n", + "# **ADABOOST (Adaptive Boosting)**\n", + "* Quando nada funciona, AdaBoost funciona!\n", + "* Foi um dos primeiros algoritmos de Boosting (1995);\n", + "* AdaBoost pode ser utilizado tanto para classificação (AdaBoostClassifier) quanto para Regressão (AdaBoostRegressor);\n", + "* Por padrão, AdaBoost usa uma DecisionTree como base_estimator;" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RU-vzkXqrFVw" + }, + "source": [ + "## Referências\n", + "* [AdaBoost Classifier Example In Python](https://towardsdatascience.com/machine-learning-part-17-boosting-algorithms-adaboost-in-python-d00faac6c464) - Didático e explica exatamente como o AdaBoost funciona.\n", + "* [Adaboost for Dummies: Breaking Down the Math (and its Equations) into Simple Terms](https://towardsdatascience.com/adaboost-for-dummies-breaking-down-the-math-and-its-equations-into-simple-terms-87f439757dcf) - Para quem quer entender a matemática por trás do algoritmo.\n", + "* [Gradient Boosting and XGBoost](https://medium.com/hackernoon/gradient-boosting-and-xgboost-90862daa6c77)\n", + "* [Understanding AdaBoost](https://towardsdatascience.com/understanding-adaboost-2f94f22d5bfe), Akash Desarda." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6EMrjQDZIMl_" + }, + "source": [ + "## O que é AdaBoost (Adaptive Boosting)?\n", + "* É um dos classificadores do tipo ensemble (combina vários classificadores para aumentar a precisão).\n", + "* AdaBoost é um classificador iterativo e forte que combina (ensemble) vários classificadores fracos para melhorar a precisão.\n", + "* Qualquer algoritmo de aprendizado de máquina que aceite pesos por observação (sample_weight) no treinamento pode ser usado como classificador de base (parâmetro base_estimator);\n", + "\n", + "## Parâmetros 
mais importantes do AdaBoost:\n", + "* base_estimator - É um classificador usado para treinar o modelo. Como default, AdaBoost usa o DecisionTreeClassifier. Como dito anteriormente, pode-se utilizar diferentes algoritmos para esse fim.\n", + "* n_estimators - Número de base_estimator para treinar iterativamente.\n", + "* learning_rate - Controla a contribuição de cada base_estimator na solução/combinação final;" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TzLtHzWNJBix" + }, + "source": [ + "## Usando diferentes algoritmos para base_estimator\n", + "> Como dito anteriormente, pode-se utilizar vários tipos de base_estimator em AdaBoost. Por exemplo, se quisermos usar SVM (Support Vector Machines), devemos proceder da seguinte forma:\n", + "\n", + "\n", + "```\n", + "# Importar o AdaBoost e o algoritmo usado como base_estimator\n", + "from sklearn.ensemble import AdaBoostClassifier\n", + "from sklearn.svm import SVC\n", + "\n", + "# Instancia o classificador base (o treinamento ocorre no fit do AdaBoost)\n", + "ml_SVC= SVC(probability=True, kernel='linear')\n", + "\n", + "# Constrói o modelo AdaBoost\n", + "ml_AB = AdaBoostClassifier(n_estimators= 50, base_estimator=ml_SVC, learning_rate=1)\n", + "```\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hrj4a4s6hMMB" + }, + "source": [ + "## Vantagens\n", + "* AdaBoost é fácil de implementar;\n", + "* AdaBoost corrige os erros do base_estimator iterativamente e melhora a acurácia;\n", + "* Faz o Feature Selection automaticamente (**Por quê**?);\n", + "* Pode-se usar muitos algoritmos como base_estimator;\n", + "* Como é um método ensemble, o modelo final é pouco propenso a overfitting.\n", + "\n", + "## Desvantagens\n", + "* AdaBoost é sensível a ruídos nos dados;\n", + "* Altamente impactado por outliers (contribui para overfitting), pois o algoritmo tenta se ajustar a cada ponto da melhor forma possível;\n", + "* AdaBoost é mais lento que XGBoost;" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bgJmu7YLiyv7" + }, + "source": [ + "No exemplo a seguir, vou usar 
RandomForestClassifier com os parâmetros otimizados, ou seja:\n", + "\n", + "```\n", + "best_params= {'bootstrap': False, 'max_depth': 10, 'max_features': 'auto', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 400}\n", + "```\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "5VCRNyZT3qvc" + }, + "source": [ + "best_params= {'bootstrap': False, 'max_depth': 10, 'max_features': 'auto', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 400}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "1gIboJdriq61" + }, + "source": [ + "from sklearn.ensemble import AdaBoostClassifier\n", + "from sklearn.ensemble import RandomForestClassifier\n", + "\n", + "# Instancia RandomForestClassifier - Parâmetros otimizados!\n", + "ml_RF2= RandomForestClassifier(bootstrap= best_params['bootstrap'], \n", + " max_depth= best_params['max_depth'], \n", + " max_features= best_params['max_features'], \n", + " min_samples_leaf= best_params['min_samples_leaf'], \n", + " min_samples_split= best_params['min_samples_split'], \n", + " n_estimators= best_params['n_estimators'], \n", + " random_state= i_Seed)\n", + "# Instancia AdaBoostClassifier\n", + "ml_AB= AdaBoostClassifier(n_estimators=100, base_estimator= ml_RF2, random_state= i_Seed)\n", + "\n", + "# Treina...\n", + "ml_AB.fit(X_train, y_train)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "A4Cs81OLD40y" + }, + "source": [ + "# Cross-Validation com 10 folds\n", + "a_scores_CV = cross_val_score(ml_AB, X_train, y_train, cv = i_CV)\n", + "print(f'Média das Acurácias calculadas pelo CV....: {100*round(a_scores_CV.mean(),4)}')\n", + "print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "F7Ce5L38ECoC" + }, + "source": [ + "**Interpretação**: Nosso 
classificador (AdaBoostClassifier) tem uma acurácia média de 96,72% (base de treinamento). Além disso, o std é da ordem de 2,54%, ou seja, pequena. Vamos tentar melhorar a acurácia do classificador usando parameter tunning (GridSearchCV)." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "t5GfnBwEifkO" + }, + "source": [ + "print(f'Acurácias: {a_scores_CV}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Q9rSpuXyEPA5" + }, + "source": [ + "# Faz predições com os parametros otimizados...\n", + "y_pred = ml_AB.predict(X_test)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "2F9k-_eXGDLa" + }, + "source": [ + "# Confusion Matrix\n", + "cf_matrix = confusion_matrix(y_test, y_pred)\n", + "cf_labels = ['True_Negative','False_Positive','False_Negative','True_Positive']\n", + "cf_categories = ['Zero', 'One']\n", + "mostra_confusion_matrix(cf_matrix, group_names= cf_labels, categories= cf_categories)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XweWTjQ9EXLw" + }, + "source": [ + "## Parameter tunning" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "fcrKzse9EbL_" + }, + "source": [ + "# Dicionário de parâmetros para o parameter tunning.\n", + "d_parametros_AB = {'n_estimators':[50, 100, 200], 'learning_rate':[.001, 0.01, 0.05, 0.1, 0.3,1]}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Susc3I7mFDQX" + }, + "source": [ + "# Invoca a função\n", + "ml_AB2, best_params= GridSearchOptimizer(ml_AB, 'ml_AB2', d_parametros_AB, X_train, y_train, X_test, y_test, cv = i_CV)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "w4JjWsusjNS8" + }, + "source": [ + "___\n", + "# **GRADIENT BOOSTING**\n", + "* Gradient boosting pode ser usado para resolver problemas de 
classificação (GradientBoostingClassifier) e Regressão (GradientBoostingRegressor);\n", + "* Gradient Boosting é um refinamento do AdaBoost (lembre que AdaBoost foi um dos primeiros métodos de Boosting, criado em 1995). O que Gradient Boosting faz adicionalmente ao AdaBoost é minimizar a loss (função perda), isto é, minimizar a diferença entre os valores observados de y e os valores preditos.\n", + "* Usa Gradient Descent para encontrar as deficiências nas previsões do passo anterior. Gradient Descent é um algoritmo popular e poderoso, também usado em Redes Neurais;\n", + "* O objetivo do Gradient Boosting é minimizar a \"loss function\". Portanto, Gradient Boosting depende da \"loss function\".\n", + "* Gradient Boosting usa DecisionTree como base_estimator;\n", + "\n", + "## Vantagens\n", + "* Não há necessidade de pre-processing;\n", + "* Trabalha normalmente com COLUNAS numéricas ou categóricas;\n", + "* Trata automaticamente os Missing Values. Ou seja, não é necessário aplicar métodos de Missing Value Imputation;\n", + "\n", + "## Desvantagens\n", + "* Como Gradient Boosting tenta continuamente minimizar os erros a cada iteração, isso pode enfatizar os outliers e causar overfitting. Portanto, deve-se:\n", + " * Tratar os outliers previamente OU\n", + " * Usar Cross-Validation para neutralizar os efeitos dos outliers (**Eu prefiro este método, pois toma menos tempo**);\n", + "* Computacionalmente caro. 
Geralmente são necessários muitas árvores (> 1000) para se obter bons resultados;\n", + "* Devido à flexibilidade (muitos parâmetros para ajustar), então é necessário usar GridSearchCV para encontrar a combinação ótima dos hyperparameters;\n", + "\n", + "## Referências\n", + "* [Gradient Boosting Decision Tree Algorithm Explained](https://towardsdatascience.com/machine-learning-part-18-boosting-algorithms-gradient-boosting-in-python-ef5ae6965be4) - Didático e detalhista.\n", + "* [Predicting Wine Quality with Gradient Boosting Machines](https://towardsdatascience.com/predicting-wine-quality-with-gradient-boosting-machines-a-gmb-tutorial-d950b1542065)\n", + "* [Parameter Tuning in Gradient Boosting (GBM) with Python](https://www.datacareer.de/blog/parameter-tuning-in-gradient-boosting-gbm/)\n", + "* [Tune Learning Rate for Gradient Boosting with XGBoost in Python](https://machinelearningmastery.com/tune-learning-rate-for-gradient-boosting-with-xgboost-in-python/)\n", + "* [In Depth: Parameter tuning for Gradient Boosting](https://medium.com/all-things-ai/in-depth-parameter-tuning-for-gradient-boosting-3363992e9bae) - Muito bom\n", + "* [Complete Machine Learning Guide to Parameter Tuning in Gradient Boosting (GBM) in Python](https://www.analyticsvidhya.com/blog/2016/02/complete-guide-parameter-tuning-gradient-boosting-gbm-python/)" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Q4bUCZs2jNTA" + }, + "source": [ + "from sklearn.ensemble import GradientBoostingClassifier\n", + "\n", + "# Instancia...\n", + "ml_GB=GradientBoostingClassifier(n_estimators=100, min_samples_split= 2)\n", + "\n", + "# Treina...\n", + "ml_GB.fit(X_train, y_train)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "-dr6dyjdXwvd" + }, + "source": [ + "# Cross-Validation com 10 folds\n", + "a_scores_CV = cross_val_score(ml_GB, X_train, y_train, cv = i_CV)\n", + "print(f'Média das Acurácias calculadas pelo CV....: 
{100*round(a_scores_CV.mean(),4)}')\n", + "print(f'std médio das Acurácias calculadas pelo CV: {100*round(a_scores_CV.std(),4)}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "VlC3y3M5YaGG" + }, + "source": [ + "print(f'Acurácias: {a_scores_CV}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vnLvQ0ZDYNjB" + }, + "source": [ + "**Interpretação**: Nosso classificador (GradientBoostingClassifier) tem uma acurácia média de 96,86% (base de treinamento). Além disso, o std é da ordem de 2,52%, ou seja, pequeno. Vamos tentar melhorar a acurácia do classificador usando parameter tuning (GridSearchCV)." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "D2n1RKZuXq3D" + }, + "source": [ + "# Faz predições...\n", + "y_pred = ml_GB.predict(X_test)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "8r6JCzQRGFa0" + }, + "source": [ + "# Confusion Matrix\n", + "cf_matrix = confusion_matrix(y_test, y_pred)\n", + "cf_labels = ['True_Negative','False_Positive','False_Negative','True_Positive']\n", + "cf_categories = ['Zero', 'One']\n", + "mostra_confusion_matrix(cf_matrix, group_names = cf_labels, categories = cf_categories)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KFv-Q2AD5uCk" + }, + "source": [ + "## Parameter tuning\n", + "> Consulte [Complete Machine Learning Guide to Parameter Tuning in Gradient Boosting (GBM) in Python](https://www.analyticsvidhya.com/blog/2016/02/complete-guide-parameter-tuning-gradient-boosting-gbm-python/) para detalhes sobre os parâmetros, seus significados etc." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "wgU040AcjNTF" + }, + "source": [ + "# Dicionário de parâmetros para o parameter tunning.\n", + "d_parametros_GB= {'learning_rate': [1, 0.5, 0.25, 0.1, 0.05, 0.01]} #,\n", + "# 'n_estimators': [1, 2, 4, 8, 16, 32, 64, 100, 200],\n", + "# 'max_depth': [5, 10, 15, 20, 25, 30],\n", + "# 'min_samples_split': [0.1, 0.3, 0.5, 0.7, 0.9],\n", + "# 'min_samples_leaf': [0.1, 0.2, 0.3, 0.4, 0.5],\n", + "# 'max_features': list(range(1, X_train.shape[1]))}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "v5KLFlpTjNTH" + }, + "source": [ + "# Invoca a função\n", + "ml_GB2, best_params= GridSearchOptimizer(ml_GB, 'ml_GB2', d_parametros_GB, X_train, y_train, X_test, y_test, cv = i_CV)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YQ6ERz3fi9i2" + }, + "source": [ + "### Resultado da execução do Gradient Boosting" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RSa7uKw13mKG" + }, + "source": [ + "```\n", + "[Parallel(n_jobs=-1)]: Done 275400 out of 275400 | elapsed: 93.7min finished\n", + "\n", + "Parametros otimizados: {'learning_rate': 1, 'max_depth': 30, 'max_features': 11, 'min_samples_leaf': 0.1, 'min_samples_split': 0.1, 'n_estimators': 100}\n", + "```\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "wiJpA2PyjDjR" + }, + "source": [ + "# Como o procedimento acima levou 93 minutos para executar, então vou estimar ml_GB2 abaixo usando os parâmetros acima estimados\n", + "best_params= {'learning_rate': 1, 'max_depth': 30, 'max_features': 11, 'min_samples_leaf': 0.1, 'min_samples_split': 0.1, 'n_estimators': 100}\n", + "\n", + "#ml_GB2= GradientBoostingClassifier(learning_rate= best_params['learning_rate'], \n", + "# max_depth= best_params['max_depth'],\n", + "# max_features= best_params['max_features'],\n", + "# min_samples_leaf= 
best_params['min_samples_leaf'],\n", + "# min_samples_split= best_params['min_samples_split'],\n", + "# n_estimators= best_params['n_estimators'],\n", + "# random_state= i_Seed)\n", + "\n", + "ml_GB2= GradientBoostingClassifier(learning_rate= best_params['learning_rate'], \n", + " max_depth= best_params['max_depth'],\n", + " min_samples_leaf= best_params['min_samples_leaf'],\n", + " min_samples_split= best_params['min_samples_split'],\n", + " n_estimators= best_params['n_estimators'],\n", + " random_state= i_Seed)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mb14gJ7-jbVM" + }, + "source": [ + "## Selecionar as COLUNAS importantes/relevantes" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "TAqGZIFYm2sU" + }, + "source": [ + "X_train_GB, X_test_GB = seleciona_colunas_relevantes(ml_GB2, X_train, X_test)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6yiu6dahnBvC" + }, + "source": [ + "## Treina o classificador com as COLUNAS relevantes " + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "APrtWN18nc4t" + }, + "source": [ + "best_params" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "VS0mLdOmnXAY" + }, + "source": [ + "# Treina com as COLUNAS relevantes\n", + "ml_GB2.fit(X_train_GB, y_train)\n", + "\n", + "# Cross-Validation com 10 folds\n", + "a_scores_CV = cross_val_score(ml_GB2, X_train_GB, y_train, cv = i_CV)\n", + "print(f'Acurácia Media: {100*a_scores_CV.mean():.2f}')\n", + "print(f'std médio.....: {100*a_scores_CV.std():.2f}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vmc9PP_Rn1TN" + }, + "source": [ + "## Valida o modelo usando o dataframe X_test" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "e3mnIALvnzP2" + }, + "source": [ + "y_pred_GB = ml_GB2.predict(X_test_GB)\n", + 
"\n", + "# Calcula acurácia\n", + "accuracy_score(y_test, y_pred_GB)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kwP9Z2GnkV7r" + }, + "source": [ + "___\n", + "# **XGBOOST (eXtreme Gradient Boosting)**\n", + "* XGBoost é uma melhoria de Gradient Boosting. As melhorias são em velocidade e performance, além de corrigir as ineficiências do GradientBoosting.\n", + "* Algoritmo preferido pelos Kaggle Grandmasters;\n", + "* Paralelizável;\n", + "* Estado-da-arte em termos de Machine Learning;\n", + "\n", + "## Parâmetros relevantes e seus valores iniciais\n", + "Consulte [Complete Guide to Parameter Tuning in XGBoost with codes in Python](https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/) para detalhes completos sobre os parâmetros, seus significados etc.\n", + "\n", + "* n_estimators = 100 (100 caso o dataframe seja grande; se o dataframe for médio/pequeno, 1000) - É o número de árvores que desejamos construir;\n", + "* max_depth= 3 - Determina quão profunda cada árvore pode crescer durante qualquer round de treinamento. Valores típicos no intervalo [3, 10];\n", + "* learning_rate= 0.01 - Usado para evitar overfitting, intervalo: [0, 1];\n", + "* alpha (reg_alpha na API scikit-learn) - L1 regularization nos pesos. Valores altos resultam em mais regularization;\n", + "* lambda (reg_lambda na API scikit-learn) - L2 regularization nos pesos.\n", + "* colsample_bytree: 1 - porcentagem de COLUNAS usadas por cada árvore. Um valor alto pode causar overfitting;\n", + "* subsample: 0.8 - porcentagem de amostras usadas por árvore. Valores menores ajudam a evitar overfitting, mas um valor baixo demais pode levar a underfitting;\n", + "* gamma: 1 - Controla se um determinado nó será dividido com base na redução esperada na perda após a divisão. Um valor mais alto leva a menos divisões.\n", + "* objective: Define a \"loss function\". 
The options include:\n", + " * reg:linear - For regression problems (deprecated in recent XGBoost releases in favor of reg:squarederror);\n", + " * reg:logistic - For classification problems;\n", + " * binary:logistic - For binary classification problems, with probability output;\n", + "\n", + "# References\n", + "* [How exactly XGBoost Works?](https://medium.com/@pushkarmandot/how-exactly-xgboost-works-a320d9b8aeef)\n", + "* [Fine-tuning XGBoost in Python like a boss](https://towardsdatascience.com/fine-tuning-xgboost-in-python-like-a-boss-b4543ed8b1e)\n", + "* [Gentle Introduction of XGBoost Library](https://medium.com/@imoisharma/gentle-introduction-of-xgboost-library-2b1ac2669680)\n", + "* [A Beginner’s guide to XGBoost](https://towardsdatascience.com/a-beginners-guide-to-xgboost-87f5d4c30ed7)\n", + "* [Exploring XGBoost](https://towardsdatascience.com/exploring-xgboost-4baf9ace0cf6)\n", + "* [Feature Importance and Feature Selection With XGBoost in Python](https://machinelearningmastery.com/feature-importance-and-feature-selection-with-xgboost-in-python/)\n", + "* [Ensemble Learning case study: Running XGBoost on Google Colab free GPU](https://towardsdatascience.com/running-xgboost-on-google-colab-free-gpu-a-case-study-841c90fef101) - Recommended\n", + "* [Predicting movie revenue with AdaBoost, XGBoost and LightGBM](https://towardsdatascience.com/predicting-movie-revenue-with-adaboost-xgboost-and-lightgbm-262eadee6daa)\n", + "* [Tuning XGBoost Hyperparameters with Scikit Optimize](https://towardsdatascience.com/how-to-improve-the-performance-of-xgboost-models-1af3995df8ad)\n", + "* [An Example of Hyperparameter Optimization on XGBoost, LightGBM and CatBoost using Hyperopt](https://towardsdatascience.com/an-example-of-hyperparameter-optimization-on-xgboost-lightgbm-and-catboost-using-hyperopt-12bc41a271e) - Interesting\n", + "* [XGBOOST vs LightGBM: Which algorithm wins the race !!!](https://towardsdatascience.com/lightgbm-vs-xgboost-which-algorithm-win-the-race-1ff7dd4917d) - LightGBM has 
proven interesting.\n", + "* [From Zero to Hero in XGBoost Tuning](https://towardsdatascience.com/from-zero-to-hero-in-xgboost-tuning-e48b59bfaf58) - I liked this one\n", + "* [Build XGBoost / LightGBM models on large datasets — what are the possible solutions?](https://towardsdatascience.com/build-xgboost-lightgbm-models-on-large-datasets-what-are-the-possible-solutions-bf882da2c27d)\n", + "* [Selecting Optimal Parameters for XGBoost Model Training](https://towardsdatascience.com/selecting-optimal-parameters-for-xgboost-model-training-c7cd9ed5e45e) - Very good!\n", + "* [CatBoost vs. Light GBM vs. XGBoost](https://towardsdatascience.com/catboost-vs-light-gbm-vs-xgboost-5f93620723db)\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "iMM_R4_ukV7x" + }, + "source": [ + "from xgboost import XGBClassifier\n", + "import xgboost as xgb\n", + "\n", + "# Instantiate...\n", + "ml_XGB= XGBClassifier(silent=False, \n", + " scale_pos_weight=1,\n", + " learning_rate=0.01, \n", + " colsample_bytree = 1,\n", + " subsample = 0.8,\n", + " objective='binary:logistic', \n", + " n_estimators=1000, \n", + " reg_alpha = 0.3,\n", + " max_depth= 3, \n", + " gamma=1, \n", + " max_delta_step=5)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "E4wQMlDEFINR" + }, + "source": [ + "# Train...\n", + "ml_XGB.fit(X_train, y_train)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "zAhsTtwGqMkG" + }, + "source": [ + "# Cross-validation with 10 folds\n", + "a_scores_CV = cross_val_score(ml_XGB, X_train, y_train, cv = i_CV)\n", + "print(f'Mean of the CV accuracies: {100*a_scores_CV.mean():.2f}')\n", + "print(f'Std of the CV accuracies.: {100*a_scores_CV.std():.2f}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JNyKX6PkrXOk" + }, + "source": [ + "**Interpretation**: Our 
classifier (XGBClassifier) has a mean accuracy of 96.72% (training set). In addition, the std is about 2.02%, i.e. small. Let's try to improve the classifier's accuracy with parameter tuning (GridSearchCV)." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "_h0QYv3FkV73" + }, + "source": [ + "print(f'Accuracies: {a_scores_CV}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "AKhhAZLjkV76" + }, + "source": [ + "# Make predictions...\n", + "y_pred = ml_XGB.predict(X_test)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Ir2Kd1PqGHgz" + }, + "source": [ + "# Confusion Matrix\n", + "cf_matrix = confusion_matrix(y_test, y_pred)\n", + "cf_labels = ['True_Negative','False_Positive','False_Negative','True_Positive']\n", + "cf_categories = ['Zero', 'One']\n", + "mostra_confusion_matrix(cf_matrix, group_names= cf_labels, categories= cf_categories)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jEC7gW4qYpWw" + }, + "source": [ + "## Parameter tuning\n", + "### Further reading:\n", + "* [Fine-tuning XGBoost in Python like a boss](https://towardsdatascience.com/fine-tuning-xgboost-in-python-like-a-boss-b4543ed8b1e)\n", + "* [Complete Guide to Parameter Tuning in XGBoost with codes in Python](https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/)\n", + "\n", + "> Looking at the results above, which model is best?\n", + "\n", + "XGBoost? Assuming so, let's now fine-tune the model's parameters." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "n3MsUONPwIV9" + }, + "source": [ + "# Parameter dictionary for XGBoost:\n", + "# Only min_child_weight is active here; the other entries are commented out\n", + "d_parametros_XGB = {'min_child_weight': [i for i in np.arange(1, 13)]} #,\n", + "# 'gamma': [i for i in np.arange(0, 5, 0.5)],\n", + "# 'subsample': [0.6, 0.8, 1.0],\n", + "# 'colsample_bytree': [0.6, 0.8, 1.0],\n", + "# 'max_depth': [3, 4, 5, 7, 9],\n", + "# 'learning_rate': [i for i in np.arange(0.01, 1, 0.1)]}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "CX27FCKmwSni" + }, + "source": [ + "# Call the function\n", + "ml_XGB, best_params= GridSearchOptimizer(ml_XGB, 'ml_XGB2', d_parametros_XGB, X_train, y_train, X_test, y_test, cv = i_CV)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9b7uCuF74Hjv" + }, + "source": [ + "### Output of the XGBClassifier run\n", + "\n", + "```\n", + "[Parallel(n_jobs=-1)]: Done 108000 out of 108000 | elapsed: 372.0min finished\n", + "\n", + "Parametros otimizados: {'colsample_bytree': 0.8, 'gamma': 0.5, 'learning_rate': 0.51, 'max_depth': 5, 'min_child_weight': 1, 'subsample': 0.6}\n", + "```\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "n7E0oyxEtbGi" + }, + "source": [ + "# Since the procedure above took 372 minutes to run, ml_XGB2 is built below directly from the parameters estimated above\n", + "best_params= {'colsample_bytree': 0.8, 'gamma': 0.5, 'learning_rate': 0.51, 'max_depth': 5, 'min_child_weight': 1, 'subsample': 0.6}\n", + "\n", + "ml_XGB2= XGBClassifier(min_child_weight= best_params['min_child_weight'], \n", + " gamma= best_params['gamma'], \n", + " subsample= best_params['subsample'], \n", + " colsample_bytree= best_params['colsample_bytree'], \n", + " max_depth= best_params['max_depth'], \n", + " learning_rate= best_params['learning_rate'], \n", + " random_state= i_Seed)" + ], + "execution_count": null, + 
"outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CuqyLHTU5Z-j" + }, + "source": [ + "## Select the important/relevant COLUMNS\n", + "* [The Multiple faces of ‘Feature importance’ in XGBoost](https://towardsdatascience.com/be-careful-when-interpreting-your-features-importance-in-xgboost-6e16132588e7)" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "QPG3JZIpRZ-T" + }, + "source": [ + "# plot feature importance\n", + "from xgboost import plot_importance\n", + "\n", + "# Fit first: plot_importance needs a trained model\n", + "ml_XGB2.fit(X_train, y_train)\n", + "\n", + "xgb.plot_importance(ml_XGB2, color = 'red')\n", + "plt.title('importance', fontsize = 20)\n", + "plt.yticks(fontsize = 10)\n", + "plt.ylabel('features', fontsize = 20)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "EmpRC2lHW-KP" + }, + "source": [ + "ml_XGB2" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "4f9MIEBiyq-5" + }, + "source": [ + "X_train_XGB, X_test_XGB= seleciona_colunas_relevantes(ml_XGB2, X_train, X_test)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "F6EayWaY5nMm" + }, + "source": [ + "## Train the classifier on the relevant COLUMNS" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Huy18gKI5qad" + }, + "source": [ + "best_params" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "E3-PaTdc5vZk" + }, + "source": [ + "# Train on the relevant COLUMNS...\n", + "ml_XGB2.fit(X_train_XGB, y_train)\n", + "\n", + "# Cross-validation with 10 folds\n", + "a_scores_CV = cross_val_score(ml_XGB2, X_train_XGB, y_train, cv = i_CV)\n", + "print(f'Mean accuracy: {100*a_scores_CV.mean():.2f}')\n", + "print(f'Accuracy std.: {100*a_scores_CV.std():.2f}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tBdYikDU6NhD" + }, + "source": [ + "## Validate the model 
using the X_test dataframe" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "GcvY-VdL6VIZ" + }, + "source": [ + "y_pred_XGB = ml_XGB2.predict(X_test_XGB)\n", + "\n", + "# Compute accuracy\n", + "accuracy_score(y_test, y_pred_XGB)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "8oLtdH-vTSbC" + }, + "source": [ + "xgb.to_graphviz(ml_XGB2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "czXQG3MCHfHM" + }, + "source": [ + "# KNN - KNEIGHBORSCLASSIFIER" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "llTTXNeyHiwx" + }, + "source": [ + "# BAGGINGCLASSIFIER" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Fbkekd4QHoZO" + }, + "source": [ + "# EXTRATREESCLASSIFIER" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "widavwR4HzwE" + }, + "source": [ + "# SVM\n", + "https://data-flair.training/blogs/svm-support-vector-machine-tutorial/" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "id_Ubulns6We" + }, + "source": [ + "# NAIVE BAYES" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3e0m7lEnYOV9" + }, + "source": [ + "# **COLUMN IMPORTANCE**\n", + "Source: [Plotting Feature Importances](https://www.kaggle.com/grfiv4/plotting-feature-importances)" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "fjco0HnNYr-N" + }, + "source": [ + "def mostra_feature_importances(clf, X_train, y_train=None, \n", + " top_n=10, figsize=(8,8), print_table=False, title=\"Feature Importances\"):\n", + " '''\n", + " plot feature importances of a tree-based sklearn estimator\n", + " \n", + " Note: X_train and y_train are pandas DataFrames\n", + " \n", + " Note: Scikit-plot is a lovely package but I sometimes have issues\n", + " 1. flexibility/extendibility\n", + " 2. 
complicated models/datasets\n", + " But for many situations Scikit-plot is the way to go\n", + " see https://scikit-plot.readthedocs.io/en/latest/Quickstart.html\n", + " \n", + " Parameters\n", + " ----------\n", + " clf (sklearn estimator) if not fitted, this routine will fit it\n", + " \n", + " X_train (pandas DataFrame)\n", + " \n", + " y_train (pandas DataFrame) optional\n", + " required only if clf has not already been fitted \n", + " \n", + " top_n (int) Plot the top_n most-important features\n", + " Default: 10\n", + " \n", + " figsize ((int,int)) The physical size of the plot\n", + " Default: (8,8)\n", + " \n", + " print_table (boolean) If True, print out the table of feature importances\n", + " Default: False\n", + " \n", + " Returns\n", + " -------\n", + " the pandas dataframe with the features and their importance\n", + " \n", + " Author\n", + " ------\n", + " George Fisher\n", + " '''\n", + " \n", + " __name__ = \"mostra_feature_importances\"\n", + " \n", + " import pandas as pd\n", + " import numpy as np\n", + " import matplotlib.pyplot as plt\n", + " \n", + " from xgboost.core import XGBoostError\n", + " from lightgbm.sklearn import LightGBMError\n", + " \n", + " try: \n", + " if not hasattr(clf, 'feature_importances_'):\n", + " clf.fit(X_train.values, y_train.values.ravel())\n", + "\n", + " if not hasattr(clf, 'feature_importances_'):\n", + " raise AttributeError(\"{} does not have feature_importances_ attribute\".\n", + " format(clf.__class__.__name__))\n", + " \n", + " except (XGBoostError, LightGBMError, ValueError):\n", + " clf.fit(X_train.values, y_train.values.ravel())\n", + " \n", + " feat_imp = pd.DataFrame({'importance':clf.feature_importances_}) \n", + " feat_imp['feature'] = X_train.columns\n", + " feat_imp.sort_values(by ='importance', ascending = False, inplace = True)\n", + " feat_imp = feat_imp.iloc[:top_n]\n", + " \n", + " feat_imp.sort_values(by='importance', inplace = True)\n", + " feat_imp = feat_imp.set_index('feature', drop = 
True)\n", + " feat_imp.plot.barh(title=title, figsize=figsize)\n", + " plt.xlabel('Feature Importance Score')\n", + " plt.show()\n", + " \n", + " if print_table:\n", + " from IPython.display import display\n", + " print(\"Top {} features in descending order of importance\".format(top_n))\n", + " display(feat_imp.sort_values(by = 'importance', ascending = False))\n", + " \n", + " return feat_imp" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "ycu_EIGlYUYn" + }, + "source": [ + "import pandas as pd\n", + "\n", + "from xgboost import XGBClassifier\n", + "from sklearn.ensemble import ExtraTreesClassifier\n", + "from sklearn.tree import ExtraTreeClassifier\n", + "from sklearn.tree import DecisionTreeClassifier\n", + "from sklearn.ensemble import GradientBoostingClassifier\n", + "from sklearn.ensemble import BaggingClassifier\n", + "from sklearn.ensemble import AdaBoostClassifier\n", + "from sklearn.ensemble import RandomForestClassifier\n", + "from sklearn.linear_model import LogisticRegression\n", + "from lightgbm import LGBMClassifier\n", + "\n", + "clfs = [XGBClassifier(), LGBMClassifier(), \n", + " ExtraTreesClassifier(), ExtraTreeClassifier(),\n", + " BaggingClassifier(), DecisionTreeClassifier(),\n", + " GradientBoostingClassifier(), LogisticRegression(),\n", + " AdaBoostClassifier(), RandomForestClassifier()]\n", + "\n", + "for clf in clfs:\n", + " try:\n", + " _ = mostra_feature_importances(clf, X_train, y_train, top_n=X_train.shape[1], title=clf.__class__.__name__)\n", + " except AttributeError as e:\n", + " print(e)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "EwWkjfC8KEZH" + }, + "source": [ + "# ENSEMBLE METHODS\n", + "https://towardsdatascience.com/using-bagging-and-boosting-to-improve-classification-tree-accuracy-6d3bb6c95e5b\n", + "\n", + "![Ensemble](https://github.com/MathMachado/Materials/blob/master/Ensemble.png?raw=true)" + ] + 
}, + { + "cell_type": "markdown", + "metadata": { + "id": "3Uf1RML7xETY" + }, + "source": [ + "# WOE (Weight of Evidence) and IV (Information Value)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TBNRfYZCyhMP" + }, + "source": [ + "## Building the example" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "gIIroyyP4ZRZ" + }, + "source": [ + "df_y.head()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "PzQQdrkf1ohX" + }, + "source": [ + "from random import choices\n", + "\n", + "df_X2= df_X.copy()\n", + "df_X2['tipo']= choices(['A', 'B', 'C', 'D'], k= 1000)\n", + "df_X2['idade']= np.random.randint(10, 15, size= 1000)\n", + "df_X2['target']= df_y['target']\n", + "df_X2.head()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "v-OpwIpx4hXJ" + }, + "source": [ + "df_X2['target'].value_counts()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "yZfqSvbKzeJ3" + }, + "source": [ + "def Constroi_Buckets(df, i, k= 10):\n", + " # Bin column v<i> into k equal-width buckets (labels 1..k), then drop the original column\n", + " coluna= 'v'+ str(i)\n", + " df[coluna+'_Bucket']= pd.cut(df[coluna], bins= k, labels= np.arange(1, k+1))\n", + " df= df.drop(columns= [coluna])\n", + " return df" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "V6Nrpsx60HD3" + }, + "source": [ + "for i in np.arange(1,19):\n", + " df_X2= Constroi_Buckets(df_X2, i)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "J2Fbh41-03OB" + }, + "source": [ + "df_X2.head()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "O9r5BeWVxIr3" + }, + "source": [ + "# Function to compute WOE and IV\n", + "def calculate_woe_iv(dataset, feature, target):\n", + "\n", + " def codethem(IV):\n", + " if IV < 0.02: return 'Useless'\n", + " elif IV >= 0.02 and IV < 0.1: return 'Weak'\n", + " elif IV >= 
0.1 and IV < 0.3: return 'Medium'\n", + " elif IV >= 0.3 and IV < 0.5: return 'Strong'\n", + " elif IV >= 0.5: return 'Suspicious'\n", + " else: return 'None'\n", + "\n", + " lst = []\n", + " for i in range(dataset[feature].nunique()):\n", + " val = list(dataset[feature].unique())[i]\n", + " lst.append({\n", + " 'Value': val,\n", + " 'All': dataset[dataset[feature] == val].count()[feature],\n", + " 'Good': dataset[(dataset[feature] == val) & (dataset[target] == 0)].count()[feature],\n", + " 'Bad': dataset[(dataset[feature] == val) & (dataset[target] == 1)].count()[feature]\n", + " })\n", + " \n", + " dset = pd.DataFrame(lst)\n", + " dset['Distr_Good'] = dset['Good']/dset['Good'].sum()\n", + " dset['Distr_Bad'] = dset['Bad']/dset['Bad'].sum()\n", + " dset['Mean']= dset['All']/dset['All'].sum()\n", + " dset['WoE'] = np.log(dset['Distr_Good']/dset['Distr_Bad'])\n", + " dset = dset.replace({'WoE': {np.inf: 0, -np.inf: 0}})\n", + " dset['IV'] = (dset['Distr_Good'] - dset['Distr_Bad']) * dset['WoE']\n", + " #dset= dset.drop(columns= ['Distr_Good', 'Distr_Bad'], axis= 1)\n", + "\n", + " dset['Predictive_Power']= dset['IV'].map(codethem)\n", + " iv = dset['IV'].sum() \n", + " dset = dset.sort_values(by='IV') \n", + " return dset, iv" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Y8WGjWH63nx_" + }, + "source": [ + "df_Lab = df_X2.copy()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "-N6xr1MgxTiz" + }, + "source": [ + "def calcula_Predictive_Power(df_Lab, coluna):\n", + " print('WoE and IV for column: {}'.format(coluna))\n", + " df, iv = calculate_woe_iv(df_Lab, coluna, 'target')\n", + " print(df)\n", + " print('IV score: {:.2f}'.format(iv))\n", + " print('\\n')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "ayqN_7WnxVq9" + }, + "source": [ + "for i in np.arange(1,19):\n", + " coluna= 
'v'+str(i)+'_Bucket'\n", + " calcula_Predictive_Power(df_Lab, coluna)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qtoJVI4Pyx3I" + }, + "source": [ + "# **IMBALANCED SAMPLE**\n", + "> Some tasks, such as detecting fraud in bank transactions or detecting network intrusions, have in common that the class of interest (what we want to detect) is usually a rare event.\n", + "\n", + "## Example: Detecting fraud\n", + "The ratio of frauds to NON-FRAUDS is roughly 1%/99%. In this case, if we build a fraud-detection model and it classifies every instance as NON-FRAUD, the model will have an accuracy of 99%. However, such a model is of no help at all, since it never flags a single fraud.\n", + "\n", + "## The need for other metrics\n", + "> It is recommended to use other metrics (in fact, it is good practice to use more than one metric to measure model performance) such as F1-Score, Precision, Specificity, Recall/Sensitivity and AUROC.\n", + "\n", + "## How to deal with an imbalanced sample?\n", + "* Under-sampling\n", + "> Randomly samples the MAJORITY class (in our example, NON-FRAUD) down to the number of instances of the MINORITY class (FRAUD);\n", + "\n", + "* Over-sampling\n", + "> Randomly resamples the MINORITY class (in our example, FRAUD) up to the number of instances of the MAJORITY class (NON-FRAUD), or to a proportion of the MAJORITY class. 
See also SMOTE (Synthetic Minority Over-sampling Technique), available in the imbalanced-learn library;\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2o45zx8zw-aB" + }, + "source": [ + "## EFFECTS OF THE IMBALANCED SAMPLE" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cCVTPCB-Xkbd" + }, + "source": [ + "# TPOT\n", + "https://towardsdatascience.com/tpot-automated-machine-learning-in-python-4c063b3e5de9" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "2ulXii6JXpWd" + }, + "source": [ + "" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_TWUq-z4X4yZ" + }, + "source": [ + "___\n", + "# FEATURETOOLS\n", + "https://medium.com/@rrfd/simple-automatic-feature-engineering-using-featuretools-in-python-for-classification-b1308040e183\n", + "\n", + "https://www.analyticsvidhya.com/blog/2018/08/guide-automated-feature-engineering-featuretools-python/\n", + "\n", + "https://mlwhiz.com/blog/2019/05/19/feature_extraction/\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "aZUNOgmSgAmq" + }, + "source": [ + "!pip install featuretools" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "_sxdONzsh9rb" + }, + "source": [ + "df_X.head()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "p5_ynGo1dBJJ" + }, + "source": [ + "df_X.shape" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "TqJRJXUhiDqf" + }, + "source": [ + "from random import choices\n", + "\n", + "df_X2= df_X.copy()\n", + "df_X2['tipo'] = choices(['A', 'B', 'C', 'D'], k = 1000)\n", + "df_X2['idade'] = np.random.randint(10, 15, size = 1000)\n", + "df_X2['id'] = range(0,1000)\n", + "df_X2.head()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "nR56bGGngk-W" + }, + "source": [ + "# Automated 
feature engineering\n", + "import featuretools as ft\n", + "import featuretools.variable_types as vtypes\n", + "\n", + "es= ft.EntitySet(id = 'simulacao')\n", + "\n", + "# adding a dataframe \n", + "es.entity_from_dataframe(entity_id = 'df_X2', dataframe = df_X2, index = 'id')\n", + "es" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "IOJ4Tr5Ogk6M" + }, + "source": [ + "es['df_X2'].variables" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "1uXPqHDZgkys" + }, + "source": [ + "variable_types = {'idade': vtypes.Categorical}\n", + " \n", + "es.entity_from_dataframe(entity_id = 'df_X2', dataframe = df_X2, index = 'id', variable_types= variable_types)\n", + "\n", + "# The new entity is indexed by the column being split out, not by the base entity's own index ('id')\n", + "es = es.normalize_entity(base_entity_id='df_X2', new_entity_id= 'tipo', index='tipo')\n", + "es = es.normalize_entity(base_entity_id='df_X2', new_entity_id= 'idade', index='idade')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "dnbYTBqugkvm" + }, + "source": [ + "es" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "I2v_jetdgkr7" + }, + "source": [ + "feature_matrix, feature_names = ft.dfs(entityset=es, target_entity = 'df_X2', max_depth = 3, verbose = 3, n_jobs= 1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "zZiRBvHXgkoJ" + }, + "source": [ + "feature_matrix.head()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aWiahwKe2d6U" + }, + "source": [ + "# **EXERCISES**\n", + "> Find suitable algorithms to apply to the following problems:" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XbSLkbDB2mzK" + }, + "source": [ + "## Exercise 1 - Credit Card Fraud Detection\n", + "Source: [Credit Card Fraud 
Detection](https://www.kaggle.com/mlg-ulb/creditcardfraud)\n", + "\n", + "### Supporting reading\n", + "* [Detecting Credit Card Fraud Using Machine Learning](https://towardsdatascience.com/detecting-credit-card-fraud-using-machine-learning-a3d83423d3b8)\n", + "* [Credit Card Fraud Detection](https://towardsdatascience.com/credit-card-fraud-detection-a1c7e1b75f59)\n", + "\n", + "### Dataframe\n", + "* [Creditcard.csv](https://raw.githubusercontent.com/MathMachado/DataFrames/master/creditcard.csv)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oYgK6JXd3MgA" + }, + "source": [ + "## Exercise 2 - Predicting species on IRIS dataset\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "si0rsJvu3O6O" + }, + "source": [ + "from sklearn import datasets\n", + "import xgboost as xgb\n", + "\n", + "iris = datasets.load_iris()\n", + "X_iris = iris.data\n", + "y_iris = iris.target" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zom8t4yWC_UC" + }, + "source": [ + "## Exercise 3 - Predict Wine Quality\n", + "> Estimate wine quality on a scale of 0–100. 
The quality categories by score are:\n", + "\n", + "* 95–100 Classic: a great wine\n", + "* 90–94 Outstanding: a wine of superior character and style\n", + "* 85–89 Very good: a wine with special qualities\n", + "* 80–84 Good: a solid, well-made wine\n", + "* 75–79 Mediocre: a drinkable wine that may have minor flaws\n", + "* 50–74 Not recommended\n", + "\n", + "Source: [Wine Reviews](https://www.kaggle.com/zynicide/wine-reviews)" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "klL2Q9Ria96n" + }, + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn import datasets\n", + "\n", + "Wine = datasets.load_wine()\n", + "X_vinho = Wine.data\n", + "y_vinho = Wine.target" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lhVhSWBgGijq" + }, + "source": [ + "## Exercise 4 - Predict Parkinson\n", + "Source: https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SVCxHqv0VBJn" + }, + "source": [ + "## Exercise 5 - Predict survivors from Titanic tragedy\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "CwvB8us4eKNi" + }, + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import seaborn as sns\n", + "\n", + "df_titanic = sns.load_dataset('titanic')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZJrT9YIXVdtx" + }, + "source": [ + "## Exercise 6 - Predict Loan\n", + "> The data must be obtained directly from the source: [Loan Default Prediction - Imperial College London](https://www.kaggle.com/c/loan-default-prediction/data)\n", + "\n", + "* [Bank Loan Default Prediction](https://medium.com/@wutianhao910/bank-loan-default-prediction-94d4902db740)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "R8-GVu7ZWeA8" + }, + "source": [ + "## Exercise 7 - Predict the sales of a store.\n", + "* 
[Predicting expected sales for Bigmart’s stores](https://medium.com/diogo-menezes-borges/project-1-bigmart-sale-prediction-fdc04f07dc1e)\n", + "* Dataframes\n", + " * [Training](https://raw.githubusercontent.com/MathMachado/DataFrames/master/Big_Mart_Sales_III_train.txt)\n", + " * [Validation](https://raw.githubusercontent.com/MathMachado/DataFrames/master/Big_Mart_Sales_III_test.txt)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fv9w86j4Wnwj" + }, + "source": [ + "## Exercise 8 - [The Boston Housing Dataset](https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html)\n", + "> Predict the median value of owner-occupied homes." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "5HYRt8-ug1BT" + }, + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn import datasets\n", + "\n", + "Boston = datasets.load_boston()\n", + "X_boston = Boston.data\n", + "y_boston = Boston.target" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1UDIaqmtXQ0T" + }, + "source": [ + "## Exercise 9 - Predict the height or weight of a person.\n", + "\n", + "http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-7R146nIXmMT" + }, + "source": [ + "## Exercise 10 - Black Friday Sales Prediction - Predict purchase amount.\n", + "\n", + "This dataset comprises sales transactions captured at a retail store. It’s a classic dataset for exploring and expanding your feature engineering skills and day-to-day understanding from multiple shopping experiences. This is a regression problem. 
The dataset has 550,069 rows and 12 columns.\n", + "\n", + "https://github.com/MathMachado/DataFrames/blob/master/blackfriday.zip\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mQ8FPbuLZlIh" + }, + "source": [ + "## Exercise 11 - Predict the income class of US population.\n", + "\n", + "http://archive.ics.uci.edu/ml/machine-learning-databases/census-income-mld/" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Af4NRrchgPlM" + }, + "source": [ + "## Exercise 12 - Predicting Cancer\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "c4LOlgZW3P40" + }, + "source": [ + "from sklearn import datasets\n", + "cancer = datasets.load_breast_cancer()\n", + "X_cancer = cancer.data\n", + "y_cancer = cancer.target" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "74PmpT8Ix0tD" + }, + "source": [ + "## Exercise 13\n", + "Source: [Complete Machine Learning Guide to Parameter Tuning in Gradient Boosting (GBM) in Python](https://www.analyticsvidhya.com/blog/2016/02/complete-guide-parameter-tuning-gradient-boosting-gbm-python/).\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WY8GZMixZ9W9" + }, + "source": [ + "## Exercise 14 - Predict Diabetes" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "y92t6tbOge0S" + }, + "source": [ + "from sklearn import datasets\n", + "Diabetes= datasets.load_diabetes()\n", + "\n", + "X_diabetes = Diabetes.data\n", + "y_diabetes = Diabetes.target" + ], + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file