COBEM 2019
25th International Congress of Mechanical Engineering
Support vector approach using data augmentation with copula function applied to load forecasting
Submission Author:
Leandro dos Santos Coelho, PR, Brazil
Co-Authors:
Gabriel Ribeiro, Marcos Yamasaki, Helon Vicente Hultmann Ayala, Leandro dos Santos Coelho, Viviana Mariani
Presenter: Helon Vicente Hultmann Ayala
doi://10.26678/ABCM.COBEM2019.COB2019-1042
Abstract
Training a machine learning algorithm requires enough data to reduce the training and test errors to their minimum irreducible value. In practice, data are limited, and it may not be possible to obtain more because of missing or corrupted historical records, the unavailability of experiments, or other reasons. Furthermore, even when more data are available, achieving the same result with less data may be advantageous. From this perspective, dataset augmentation is an approach that adds synthetic data, transformed from existing data, to the training set. In regression problems, data augmentation may be implemented by creating additional training samples drawn from the probability density function of the samples. To this end, functions called copulas have been used to couple multivariate distribution functions to their marginal distribution functions. This paper investigates the use of the t-copula function for data augmentation in regression. Specifically, the approach is tested on a load forecasting problem for the Brazilian interconnected power system using Support Vector Regression and Least-Squares regression. Models trained on the actual measured load history are compared with models trained on a blend of measured data and copula-augmented data. As the number of training samples taken from the original training set increases, the benefits of the copula tend to vanish, since the augmented data have a high probability of being very similar to original samples already used in training. However, if additional original samples are not available, data augmentation with the t-copula appears to be an alternative for improving model performance.
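The augmentation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `t_copula_augment`, the fixed degrees of freedom `df=4.0`, the correlation estimate taken from t-scores of the ranks, the empirical-quantile marginals, and the toy two-column data are all assumptions made for illustration only.

```python
import numpy as np
from scipy import stats


def t_copula_augment(data, n_new, df=4.0, seed=0):
    """Draw n_new synthetic rows whose dependence follows a t-copula fitted to data.

    data is an (n_samples, n_features) array. df is assumed fixed, not estimated.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n, d = data.shape

    # 1. Pseudo-observations: map each column into (0, 1) through its ranks.
    u = stats.rankdata(data, axis=0) / (n + 1)

    # 2. Copula correlation matrix, estimated from the t-scores of the ranks
    #    (a simple choice; Kendall's tau is a common alternative).
    corr = np.corrcoef(stats.t.ppf(u, df), rowvar=False)

    # 3. Multivariate t draws: correlated normals divided by sqrt(chi2 / df).
    normals = rng.multivariate_normal(np.zeros(d), corr, size=n_new)
    chi2 = rng.chisquare(df, size=(n_new, 1))
    t_samples = normals / np.sqrt(chi2 / df)

    # 4. Back to uniforms with the t CDF, then to the data scale using the
    #    empirical quantiles of each original column (assumed marginals).
    u_new = stats.t.cdf(t_samples, df)
    return np.column_stack(
        [np.quantile(data[:, j], u_new[:, j]) for j in range(d)]
    )


# Toy example: two correlated columns standing in for (feature, load).
rng = np.random.default_rng(1)
x = rng.normal(size=200)
train = np.column_stack([x, 0.8 * x + 0.2 * rng.normal(size=200)])

synthetic = t_copula_augment(train, n_new=100)
augmented = np.vstack([train, synthetic])  # blend of measured and augmented rows
```

The `augmented` array would then replace `train` when fitting the regressor. Because the marginals here are empirical quantiles, synthetic values never leave the range of the original columns, which is consistent with the abstract's remark that augmented samples tend to resemble samples already in the training set.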
Keywords
data augmentation, t-copula, machine learning, load forecasting

