Growth function
The growth function, also called the shatter coefficient or the shattering number, measures the richness of a set family. It is especially used in the context of statistical learning theory, where it measures the complexity of a hypothesis class. The term 'growth function' was coined by Vapnik and Chervonenkis in their 1968 paper, where they also proved many of its properties.[1] It is a basic concept in machine learning.[2] [3]
Definitions
Set-family definition
Let be a set family (a set of sets) and a set. Their intersection is defined as the following set-family:
The intersection-size (also called the index) of with respect to is . If a set has elements then the index is at most . If the index is exactly 2m then the set is said to be shattered by , because contains all the subsets of , i.e.:
The growth function measures the size of as a function of . Formally:
Examples
1. The domain is the real line . The set-family contains all the half-lines (rays) from a given number to positive infinity, i.e., all sets of the form for some . For any set of real numbers, the intersection contains sets: the empty set, the set containing the largest element of , the set containing the two largest elements of , and so on. Therefore: .[1]:Ex.1 The same is true whether contains open half-lines, closed half-lines, or both.
2. The domain is the segment . The set-family contains all the open sets. For any finite set of real numbers, the intersection contains all possible subsets of . There are such subsets, so . [1]:Ex.2
3. The domain is the Euclidean space . The set-family contains all the half-spaces of the form: , where is a fixed vector. Then , where Comp is the number of number of components in a partitioning of an n-dimensional space by m hyperplanes.[1]:Ex.3
4. The domain is the real line . The set-family contains all the real intervals, i.e., all sets of the form for some . For any set of real numbers, the intersection contains all runs of between 0 and consecutive elements of . The number of such runs is , so .
Polynomial or exponential
The main property that makes the growth function interesting is that it can be either polynomial or exponential - nothing in-between.
The following is a property of the intersection-size:[1]:Lem.1
- If, for some set of size , and for some number , -
- then, there exists a subset of size such that .
This implies the following property of the Growth function.[1]:Th.1 For every family there are two cases:
- The exponential case: identically.
- The polynomial case: is majorized by , where is the smallest integer for which .
Other properties
Trivial upper bound
For any finite :
since for every , the number of elements in is at most . Therefore, the growth function is mainly interesting when is infinite.
Exponential upper bound
For any nonempty :
I.e, the growth function has an exponential upper-bound.
We say that a set-family shatters a set if their intersection contains all possible subsets of , i.e. . If shatters of size , then , which is the upper bound.
VC dimension
The VC dimension of is defined according to these two cases:
- In the polynomial case, = the largest integer for which .
- In the exponential case .
So if-and-only-if .
The growth function can be regarded as a refinement of the concept of VC dimension. The VC dimension only tells us whether is equal to or smaller than , while the growth function tells us exactly how changes as a function of .
Another connection between the growth function and the VC dimension is given by the Sauer–Shelah lemma:[3]:49
- If , then:
- for all :
In particular,
- for all :
- so when the VC dimension is finite, the growth function grows polynomially with .
This upper bound is tight, i.e, for all there exists with VC dimension such that:[2]:56
Entropy
While the growth-function is related to the maximum intersection-size, the entropy is related to the average intersection size:[1]:272–273
The intersection-size has the following property. For every set-family :
Hence:
Moreover, the sequence converges to a constant when .
Moreover, the random-variable is concentrated near .
Applications in probability theory
Let be a set on which a probability measure is defined. Let be family of subsets of (= a family of events).
Suppose we choose a set that contains elements of , where each element is chosen at random according to the probability measure , independently of the others (i.e, with replacements). For each event , we compare the following two quantities:
- Its relative frequency in , i.e, ;
- Its probability .
We are interested in the difference, . This difference satisfies the following upper bound:
which is equivalent to:[1]:Th.2
In words: the probability that for all events in , the relative-frequency is near the probability, is lower-bounded by an expression that depends on the growth-function of .
A corollary of this is that, iff the growth function is polynomial in (i.e, there exists some such that ), then the above probability approaches 1 as . I.e, the family enjoys uniform convergence in probability.
References
- Vapnik, V. N.; Chervonenkis, A. Ya. (1971). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Theory of Probability & Its Applications. 16 (2): 264. doi:10.1137/1116025. This is an English translation, by B. Seckler, of the Russian paper: "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Dokl. Akad. Nauk. 181 (4): 781. 1968. The translation was reproduced as: Vapnik, V. N.; Chervonenkis, A. Ya. (2015). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Measures of Complexity. p. 11. doi:10.1007/978-3-319-21852-6_3. ISBN 978-3-319-21851-9.
- Mohri, Mehryar; Rostamizadeh, Afshin; Talwalkar, Ameet (2012). Foundations of Machine Learning. USA, Massachusetts: MIT Press. ISBN 9780262018258., especially Section 3.2
- Shalev-Shwartz, Shai; Ben-David, Shai (2014). Understanding Machine Learning – from Theory to Algorithms. Cambridge University Press. ISBN 9781107057135.