Files: Description Format
Fulltext PDF (requires Acrobat Reader)
Fulltext part 1 PostScript (requires a PostScript Reader)
Fulltext part 2 PostScript (requires a PostScript Reader)
Author: Christian Zirkelbach
Article title: Using the Real Dimension of the Data
Publ. type: Article
Volume: 5
Article No: 4
Language: English
Abstract [en]: This paper presents a method for extracting the real dimension of a large data set in a high-dimensional data cube and indicates its use for visual data mining. A similarity measure structures a data set in a general, but weak sense. If the elements are part of a high-dimensional host space (primary space), for instance a data warehouse cube, the resulting structure doesn't necessarily reflect the real dimension of the embedded (secondary) space. We show that a metric-structured set has, in general, a fractal dimension. This means that the data set is a finite subset of a fractal secondary space of lower dimension.

Mapping the set into the secondary space of lower dimension will not result in loss of information with regard to the semantics defined by the measure. However, it helps to reduce storage and computing efforts. Additionally, the secondary space itself reveals much about the set's structure and can facilitate data mining.

The main problem with the secondary space is that it is unknown, and if it is not a linear sub-space of R^d, then there is no algorithm to determine it. We propose adding the property of a dimension to a metric and show that this is compatible with our customized understanding of a dimension. We present an algorithm which computes, in optimal time, an index on the data set. The index can be regarded as a materialized view representing the structure of the elements with respect to the given metric. The algorithm works independently of the dimension of the primary space, i.e. pumping up the dimension of the primary space will not affect the result or the computational effort. Thus, besides supporting navigation and answering neighborhood queries, the algorithm also determines the real dimension of the underlying data set, even if the set is not linearly structured. We prove that the algorithm is robust with respect to skewed data or a distortion of the distance measure.
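
The idea that a dimension can be read off a metric alone, independently of the host space, can be illustrated with a small sketch. The code below is not the index algorithm announced in the abstract. It uses the standard correlation-dimension estimate (Grassberger-Procaccia), which runs in quadratic time and needs only pairwise distances. The function names, the helper `pad`, the sample sets and the two radii are all assumptions chosen for illustration.

```python
import math
from itertools import combinations


def correlation_dimension(points, r_small, r_large, dist=math.dist):
    """Estimate a dimension from pairwise distances only.

    C(r) is the fraction of point pairs closer than r. For a set of
    dimension D, C(r) grows roughly like r**D, so D is estimated as the
    slope of log C(r) between the two radii. This is a stand-in
    technique, not the paper's index algorithm.
    """
    pairs = close_small = close_large = 0
    for p, q in combinations(points, 2):
        d = dist(p, q)
        pairs += 1
        if d < r_large:
            close_large += 1
            if d < r_small:
                close_small += 1
    if close_small == 0:
        raise ValueError("r_small is too small: no pair of points is that close")
    c_small = close_small / pairs
    c_large = close_large / pairs
    return math.log(c_large / c_small) / math.log(r_large / r_small)


def pad(points, extra):
    """Embed points in a larger primary space by appending zero coordinates."""
    return [tuple(p) + (0.0,) * extra for p in points]


# Hypothetical sample sets: a 1-D line and a 2-D grid.
line = [(i / 200,) for i in range(200)]
grid = [(i / 30, j / 30) for i in range(30) for j in range(30)]

d_line = correlation_dimension(line, 0.107, 0.214)
d_line_padded = correlation_dimension(pad(line, 9), 0.107, 0.214)
d_grid = correlation_dimension(grid, 0.107, 0.214)

print("line in 1-D primary space: ", round(d_line, 2))
print("line in 10-D primary space:", round(d_line_padded, 2))
print("grid in 2-D primary space: ", round(d_grid, 2))
```

Because the estimate depends only on the distances, padding the line with nine zero coordinates leaves it unchanged, which mirrors the claim that enlarging the primary space does not affect the result. On finite samples the estimates fall somewhat below the ideal values 1 and 2 because of boundary effects.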

Publisher: Linköping University Electronic Press
Year: 2000
Available: 2000-12-05
No. of pages: 14
Series: Linköping Electronic Articles in Computer and Information Science
ISSN: 1401-9841
Note: First posting 2000-03-08 in ETAI area "Concept Based Knowledge Representation"

Responsible for this page: Peter Berkesand
Last updated: 2017-02-21