--- Overview
This README is intended to help anyone interested in running the code developed for inflation forecasting. The project provides deep learning models and benchmarks for forecasting inflation using a wide array of macroeconomic variables. The goal is to test whether deep learning is more effective at modeling inflation, a far-from-trivial task, as the related literature shows. To this end, LSTM-based models, in particular ConvLSTM networks, were implemented together with autoencoders. Compared with the performance of popular benchmarks, the results obtained are encouraging.
Note that there are two notebooks with different purposes. The first, "Inflation_Data", prepares the raw input data: it receives the macroeconomic time series and runs the necessary data manipulation routines. The autoencoders and VAEs are implemented here as well. Its outputs feed the second notebook, "Inflation_Forecasting", which fits every model using the outputs of the first notebook. This second notebook also compiles the forecasts produced by each model and returns them to the user along with performance metrics.
--- Directories
In both notebooks, at the beginning of the code, the user will find string variables containing multiple directories. Some of these directories hold the input data for the notebooks, while the others store the outputs generated by the routines.
In the Inflation_Data notebook, the directories and file names are the following:
- str_Dir_Plan_FRED --> folder where the FRED data is located
- str_Dir_Plan_Data --> folder where the consolidated data (original FRED variables and their lags, paired by date) are located
- str_Dir_Plan_PC --> folder where the principal components and encoded variables are stored, already split into train, validation, and test samples (generated by this notebook and consumed by "Inflation_Forecasting")
- str_Nome_Plan_FRED_MD --> name of the spreadsheet containing the monthly FRED data
- str_Nome_Plan_FRED_QD --> name of the spreadsheet containing the quarterly FRED data
- str_Nome_Plan_FRED_MD_Desc --> name of the spreadsheet containing the data description (monthly time series)
- str_Nome_Plan_FRED_QD_Desc --> name of the spreadsheet containing the data description (quarterly time series)
In the Inflation_Forecasting notebook, the directories and file names are the following (a sketch of how these variables might be filled in appears after this list):
- str_Dir_Plan_FRED --> folder where the FRED data is located
- str_Dir_Plan_Data --> folder where the consolidated data (original FRED variables and their lags, paired by date) are located
- str_Dir_Plan_PC --> folder where the principal components and encoded variables should be found, already split into train, validation, and test samples (output from the other notebook)
- str_Dir_Results --> folder where the results are going to be stored
- str_Nome_Plan_FRED_MD --> name of the spreadsheet containing the monthly FRED data
- str_Nome_Plan_FRED_QD --> name of the spreadsheet containing the quarterly FRED data
- str_Nome_Plan_FRED_MD_Desc --> name of the spreadsheet containing the data description (monthly time series)
- str_Nome_Plan_FRED_QD_Desc --> name of the spreadsheet containing the data description (quarterly time series)
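As an illustration, the assignments below show how these variables might be filled in; every path and file name here is a placeholder chosen for this example, not a value shipped with the project.

```python
# Placeholder paths and file names; adjust them to your own folders.
str_Dir_Plan_FRED = r"C:\Inflation\FRED"   # raw FRED spreadsheets
str_Dir_Plan_Data = r"C:\Inflation\Data"   # consolidated variables and lags
str_Dir_Plan_PC = r"C:\Inflation\PC"       # principal components / encodings
str_Dir_Results = r"C:\Inflation\Results"  # Inflation_Forecasting outputs

str_Nome_Plan_FRED_MD = "2020-09.csv"            # hypothetical FRED-MD vintage
str_Nome_Plan_FRED_QD = "2020-09Q.csv"           # hypothetical FRED-QD vintage
str_Nome_Plan_FRED_MD_Desc = "FRED_MD_desc.csv"  # hypothetical description file
str_Nome_Plan_FRED_QD_Desc = "FRED_QD_desc.csv"  # hypothetical description file
```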
--- R Functions
In both notebooks, an environment is set up at the beginning to import libraries from R. Since some models were only available in R libraries, we employ an existing API (rpy2) to call functions from these libraries directly in Python. The paths can be changed according to the folder in which the user installed R on their machine.
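A minimal sketch of this setup is shown below; the R installation path is an assumption and must be replaced with the actual folder on your machine.

```python
import os

# Point Python at the local R installation BEFORE importing rpy2;
# this path is a placeholder for wherever R is installed.
os.environ["R_HOME"] = r"C:\Program Files\R\R-4.3.1"

import rpy2.robjects as robjects
from rpy2.robjects.packages import importr

stats = importr("stats")  # load an R package and call it from Python
```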
--- Libraries
Before running the notebooks, the user must install the following Python libraries:
- pandas
- scipy
- datetime
- xarray
- pandas_datareader
- rpy2
- pykalman
- pywt
- tensorflow
- keras
- sklearn
- statsmodels
- arch
- matplotlib
- seaborn
- pydot
- warnings
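A quick way to confirm the environment is ready is to import everything once, as in the sketch below (note that datetime and warnings ship with Python's standard library and need no installation):

```python
# If any import fails, install the missing package with pip or conda.
import pandas, scipy, datetime, xarray, pandas_datareader
import rpy2, pykalman, pywt
import tensorflow, keras, sklearn, statsmodels, arch
import matplotlib, seaborn, pydot, warnings
```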
--- Input Data
The main input data consists of the macroeconomic time series compiled by Michael W. McCracken, which can be freely accessed through his webpage at https://research.stlouisfed.org/econ/mccracken/fred-databases/. The other input is the table containing the description of the data and the transformations applied to the macroeconomic time series, following McCracken and Ng (2016). We have added the corresponding .csv file to ease the process for new users interested in our project. When running the code, the user just needs to save this .csv file in a folder of their choice and point the code to it using the string variables described above (please refer to the section "Directories").
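As an illustration, the snippet below shows one way to load a FRED-MD spreadsheet with pandas, assuming the published FRED-MD layout in which the first data row ("Transform:") holds the transformation codes; the path and file name are placeholders.

```python
import pandas as pd

csv_path = r"C:\Inflation\FRED\2020-09.csv"  # placeholder FRED-MD vintage

raw = pd.read_csv(csv_path)
tcodes = raw.iloc[0, 1:].astype(int)  # "Transform:" row with the codes
data = raw.iloc[1:].copy()            # the actual monthly observations
data["sasdate"] = pd.to_datetime(data["sasdate"])
data = data.set_index("sasdate").astype(float)
```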
--- Notebook "Inflation_Data"
This notebook contains routines for data preparation, normalization, and denoising. In addition, before fitting the models implemented in the second notebook, we run PCA, autoencoders, and VAEs to encode the input data. Overall, the code in this notebook does the following:
- Reads the macroeconomic time series provided by FRED and compiled by Michael W. McCracken;
- Transforms the time series as suggested by McCracken and Ng (2016); these transformations involve, for instance, taking logs and differencing (a sketch of the transformation codes appears after this list);
- Normalizes the time series, splits them into different samples (training and testing), and assigns them to different variables. The splitting process combines k-fold CV with MC to generate multiple samples and improve the out-of-sample performance analysis; we use different seeds (see the loop in the "Training and Test Samples" part of the code) to generate these samples. Within each sample, date ordering and the lags are preserved, because lags of each variable are added as inputs (that is, together with variable x(t), we also have x(t-1), x(t-2), etc.). This choice facilitates the implementation of block MC and avoids breaking the autocorrelation of the series. Validation samples are carved out of the training samples in the notebook "Inflation_Forecasting";
- Runs PCA to derive principal components and, in parallel, encodes the original transformed variables using autoencoders and VAEs (see the autoencoder sketch below). These techniques are applied to variables grouped according to their nature (employment, income, etc.), which is more appropriate than applying them just once to the entire database and mixing variables of different natures;
- Separates the outputs of PCA, autoencoders, and VAEs and saves them in individual .csv files, which are used in the next notebook.
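For readers unfamiliar with the transformation codes, the sketch below applies the seven FRED-MD codes from McCracken and Ng (2016) to a single series; the helper `transform_series` is written here for illustration and is not a function from the notebooks.

```python
import numpy as np
import pandas as pd

def transform_series(x: pd.Series, tcode: int) -> pd.Series:
    """Apply a FRED-MD transformation code to one series."""
    if tcode == 1:  # level, no transformation
        return x
    if tcode == 2:  # first difference
        return x.diff()
    if tcode == 3:  # second difference
        return x.diff().diff()
    if tcode == 4:  # log level
        return np.log(x)
    if tcode == 5:  # log first difference
        return np.log(x).diff()
    if tcode == 6:  # log second difference
        return np.log(x).diff().diff()
    if tcode == 7:  # first difference of the percentage change
        return x.pct_change().diff()
    raise ValueError(f"unknown transformation code: {tcode}")
```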
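The encoding step can be pictured with the minimal Keras autoencoder below; the group size, bottleneck width, random data, and training settings are illustrative assumptions, not the notebook's actual configuration.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_obs, n_vars, n_codes = 500, 20, 4  # hypothetical: one group of 20 series

x = np.random.rand(n_obs, n_vars).astype("float32")  # normalized group data

inputs = keras.Input(shape=(n_vars,))
encoded = layers.Dense(n_codes, activation="relu")(inputs)    # bottleneck
decoded = layers.Dense(n_vars, activation="linear")(encoded)

autoencoder = keras.Model(inputs, decoded)
encoder = keras.Model(inputs, encoded)  # extracts the low-dimensional codes

autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x, x, epochs=10, batch_size=32, verbose=0)

codes = encoder.predict(x)  # encoded variables, analogous to the PCs
```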
--- Notebook "Inflation_Forecasting"
This notebook presents the code that implements the deep learning models and the benchmarks selected for performance testing. Essentially, the code executes the steps below (a model-fitting sketch follows the list):
- Reads the outputs produced by the code in the notebook "Inflation_Data";
- Organizes those outputs according to the requirements of each function that implements a model tested in our project;
- Fits each model to the data;
- Evaluates the out-of-sample performance of each model using common metrics, such as MSE;
- Saves the results in .csv files.
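To make the model-fitting step concrete, here is a minimal ConvLSTM forecaster in Keras evaluated out of sample with MSE; the shapes, layer sizes, and random data are illustrative assumptions only, not the project's actual architecture.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.metrics import mean_squared_error

n_samples, n_steps, n_features = 256, 12, 8  # hypothetical dimensions

# ConvLSTM2D expects 5-D input (samples, time, rows, cols, channels);
# each period's predictors are arranged here as a 1 x n_features "image".
x = np.random.rand(n_samples, n_steps, 1, n_features, 1).astype("float32")
y = np.random.rand(n_samples, 1).astype("float32")  # next-period inflation

model = keras.Sequential([
    layers.Input(shape=(n_steps, 1, n_features, 1)),
    layers.ConvLSTM2D(filters=16, kernel_size=(1, 3), padding="same"),
    layers.Flatten(),
    layers.Dense(1),  # point forecast
])
model.compile(optimizer="adam", loss="mse")
model.fit(x[:200], y[:200], epochs=2, batch_size=32, verbose=0)

mse = mean_squared_error(y[200:], model.predict(x[200:]))  # out-of-sample MSE
print(f"out-of-sample MSE: {mse:.4f}")
```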