
What is Forecasting for Symbolic Data?


With computing now present in almost every area of human activity, there is a trend to store huge quantities of data. In many cases this information conceals knowledge that can help us better understand a phenomenon or guide future decisions. Nevertheless, extracting knowledge from such a large volume of data is a complicated task, so there is an increasing need for data analysis capacity. In these cases it is crucial to have tools that summarize the information efficiently and extract knowledge from it and, according to Berthold et al. (2004), obtaining smart approximations in the analysis of large datasets remains of great importance in many current real-world applications.

Symbolic data, so named because they are structured and contain internal variation, are a new paradigm that addresses this need. Symbolic data analysis (SDA) is a relatively recent area that arose in 1988 with the presentation by E. Diday (1988) at the conference of the International Federation of Classification Societies (IFCS). It has since been developed and spread widely by Diday's studies and by research groups in different countries (France, Brazil, Italy, Japan, Spain ...), which have laid the bases for making trustworthy analyses of symbolic data. There is also a European Esprit project on this topic, called Symbolic Official Data Analysis System (SODAS). The foundations of SDA were set out by Bock (2000) and Bock and Diday (2000), together with several methods to analyze and visualize such data.
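As an illustration of what "internal variation" can mean, the following minimal Python sketch aggregates classical single-valued observations into interval-valued symbolic data, one common kind of symbolic variable. The function name `to_interval_data` and the temperature records are hypothetical and chosen only for this example; they do not come from SODAS or from any of the cited works.

```python
from collections import defaultdict


def to_interval_data(records):
    """Aggregate (unit, value) pairs into one (min, max) interval per unit.

    Each unit (here, a day) ends up described by an interval rather than by
    a single number, so the variation inside the unit is preserved.
    """
    grouped = defaultdict(list)
    for unit, value in records:
        grouped[unit].append(value)
    return {unit: (min(vals), max(vals)) for unit, vals in grouped.items()}


# Hypothetical micro-data: temperature readings tagged with their day.
records = [
    ("mon", 11.0), ("mon", 17.5), ("mon", 14.2),
    ("tue", 9.8), ("tue", 15.1),
]

intervals = to_interval_data(records)
for day, (low, high) in intervals.items():
    print(f"{day}: [{low}, {high}]")
```

Taking the minimum and maximum is only one possible summary; a histogram or a set of categories per unit would be other ways to keep the internal variation.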

According to Clements (2003), the recent forecasting literature places emphasis on providing a more complete description of the uncertainty around the central tendency of the variable to be predicted, and on techniques for evaluating approaches such as interval and density forecasts. A density forecast of the realization of a random variable at some future point in time is an estimate of the probability distribution of the values that the variable may take in the future. It is a well-known area in economics and finance (Granger et al. (1989), Diebold et al. (1998), Tay and Wallis (1999), Timmermann (2000)). Density and interval forecasting (recently emphasized in Christoffersen (1999) and Clements (2003)) is considered to be related to the forecasting of symbolic data. Nevertheless, no theory laying the foundations for building forecasting models with symbolic data has been developed to date. By contrast, classical forecasting theory, which originated in the 1950s, is now a mature and well-consolidated area in which very important theoretical advances have been made and whose effectiveness has been shown empirically in many fields. For example, Abraham and Ledolter (1983), O'Donovan (1983), Box et al. (1994), Hamilton (1994), Makridakis et al. (1998), Armstrong (2001), Peña et al. (2001) and Tsay (2002) provide a solid knowledge base on classical forecasting. In Spanish, most of the untranslated references are linked to econometrics; among them we can mention Aznar and Trívez (1993), Otero (1993) and Uriel (1995).
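The difference between a density forecast and an interval forecast can be made concrete with a small Python sketch. It assumes, purely for illustration, that the future value is normally distributed with the mean and standard deviation of the past observations. This naive model, the function names and the sample data are assumptions of the example, not a method taken from the cited literature.

```python
from statistics import NormalDist, mean, stdev


def gaussian_density_forecast(history):
    """Density forecast: a full probability distribution for the next value.

    Assumption (illustrative only): the next value is Gaussian with the
    sample mean and sample standard deviation of the history.
    """
    return NormalDist(mu=mean(history), sigma=stdev(history))


def interval_forecast(density, coverage=0.95):
    """Interval forecast: the central interval with the given coverage."""
    tail = (1.0 - coverage) / 2.0
    return density.inv_cdf(tail), density.inv_cdf(1.0 - tail)


# Hypothetical past observations of the variable to be forecast.
history = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4]

density = gaussian_density_forecast(history)
lower, upper = interval_forecast(density, coverage=0.95)
print(f"point forecast: {density.mean:.2f}")
print(f"95% interval forecast: [{lower:.2f}, {upper:.2f}]")
```

The interval forecast is a summary derived from the density forecast: the density carries the whole distribution, while the interval keeps only two of its quantiles. An interval-valued forecast of this kind is one reason these techniques are seen as related to forecasting symbolic data.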

It is also important to mention the developments made in the area of Artificial Intelligence (AI), where, from different perspectives, the processing of symbolic data for knowledge extraction and inference has been investigated since the second half of the 20th century. Among these developments, fuzzy logic (Mamdani (1977), Zadeh (1965, 1988 and 1989)) stands out as a way to represent knowledge under uncertainty; its extension to the processing of symbolic data is one of the approaches put forward in this proposal. Also born within AI, Artificial Neural Networks (ANN) are another data processing paradigm (Hertz et al. (1991) contains a good introduction to these models). In contrast to fuzzy logic, which allows knowledge to be represented directly in symbolic terms, ANN allow processes to be modelled using sets of numerical data as the only source of information. These architectures have become reference models for nonlinear dynamic processes and are especially suited to time series forecasting when a sufficient volume of information is available (Elman (1990), Horne and Giles (1995), Jordan (1986), Moody (1998), Mozer (1994), Weigend and Gershenfeld (1994)). Along these lines, we propose to investigate the extension of these models to symbolic data by means of models fitted to numerical data.


References

  • Abraham, B. and Ledolter, J. (1983), 'Statistical Methods for Forecasting'. John Wiley & Sons. New York.
  • O'Donovan, T. M. (1983), 'Short Term Forecasting. An Introduction to the Box-Jenkins Approach'. John Wiley & Sons. New York.
  • Makridakis, S., Wheelwright, S. C. and Hyndman, R. J. (1998), 'Forecasting: Methods and Applications'. Third edition. John Wiley & Sons. New York.
  • Peña, D., Tiao, G. C. and Tsay, R. S. (2001), 'A Course in Time Series Analysis'. John Wiley & Sons. New York.
  • Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (1994), 'Time Series Analysis, Forecasting and Control'. Third edition. Prentice-Hall, Inc. Englewood Cliffs, New Jersey.
  • Tsay, R. S. (2002), 'Analysis of Financial Time Series'. John Wiley & Sons. New York.
  • Diebold, F. X. (1998), 'Elements of Forecasting'. South-Western College Publishing. Cincinnati.


© Comillas Pontifical University
C/ Alberto Aguilera 23 - 28015 Madrid - Tel. (34) 91 542 28 00