You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In this section we pick up the topic of VC-dimensions again, adding some technical but famous results. In particular we present Sauer's Lemma and (without a proof) one of its applications. We will also define a way to analyse the expressive power of a hypothesis space.
Remark: Let in the following $\mathcal{H}$ be a hypothesis space on a domain $X$, where $X$ is given an arbitrary probability distribution $\mathcal{D}$.
Definition: We define the restriction of $\mathcal{H}$ to a sample set $S\subset X$ as
$$ \mathcal{H}_S := {h|_S : S \rightarrow { 0,1} : h \in \mathcal{H} }\text{.} $$
Definition: We define the growth function of $\mathcal{H}$ as
$$ \tau_{\mathcal{H}}: \mathbb{N}\rightarrow\mathbb{N}\text{,} \quad \tau_{\mathcal{H}}(m) := \max_{S\subset X\text{,} \ |S|=m} |\mathcal{H}_S|\text{.}$$
The growth function tells us how the expressive power of the hypothesis spaces evolves when enlarging the sample size. If it grows exponentially, $\mathcal{H}$ is very general. On the other hand, if $\tau_{\mathcal{H}}$ is eventually constant, we have a clearly restricted hypothesis space. A very famous result is Sauer's Lemma which bounds the growth function with respect to the VC-dimension.
Lemma (Sauer et al.):
Let $VCdim(\mathcal{H}) \leq d < \infty $. Then
$$ \tau_{\mathcal{H}}(m)\leq \sum^d_{i=0} \binom{m}{i}$$ for all $m\in \mathbb{N}$.
Proof: First we observe that for any $S\subset X$$$|{ B\subset S: \mathcal{H} \ shatters \ B }| \leq \sum^d_{i=0} \binom{|S|}{i}\text{,}$$ since no set of size $>d$ can be shattered and $S$ contains $\binom{|S|}{i}$ subsets of size $i$.
Proof of the claim: Proof by induction on the sample complexity of $S$.
$|S|= 1$: The empty set is vacuously shattered. This leaves us with
$$ |\mathcal{H}S|= 1+ \Big{^{1 \quad \text{if $S$ is shattered}}{0 \quad \text{otherwise.}}$$
$1 <|S|$: Assume $|S|=m$ and $S={s_1, \dots, s_m}$ for some $m\in\mathbb{N}$ and that the claim holds for sets of size $<m$. Let furthermore $S':= S\setminus {s_m}$. We define
Note that for every $h\in\mathcal{H}_1\subset\mathcal{H}_0$ there is ${h\cup(s_m,0), h\cup(s_m,1)}\subset \mathcal{H}_S$. Hence
$$| \mathcal{H}_S |=| \mathcal{H}_0\setminus\mathcal{H}_1 |+2 | \mathcal{H}_1 |$$ and therefore $$| \mathcal{H}_S |= | \mathcal{H}_0 |+| \mathcal{H}_1 | \text{.}$$
We can apply the induction hypothesis on $\mathcal{H}_0$ and obtain
$$ |\mathcal{H}_0|\leq|{ B\subset S': \mathcal{H} \ shatters \ B}|=|{ B\subset S: s_m \notin B\wedge \mathcal{H} \ shatters \ B}|\text{.}$$
For $$\mathcal{H}_1$$ we can construct a hypothesis space $\mathcal{H}'$ such that $$\mathcal{H}1=\mathcal{H}'{S'}$$. Namely, we define
$$\mathcal{H}' := {h\cup(s_m,0): h\in \mathcal{H}_1}\cup{h\cup (s_m,1): h\in\mathcal{H}_1}\text{.}$$
$=|{B\subset S: s_m\in B \wedge \mathcal{H} \ shatters \ B}|$.
Putting both inequalities together yields
$$|\mathcal{H}_S|=|\mathcal{H}_0|+|\mathcal{H}_1|\leq |{B\subset S: \mathcal{H} \ shatters \ B}|\text{.}$$
This concludes the proof.
We now state without a proof an important theorem that puts the growth function into the context of learnability.
Theorem: Let $S\subset X$ be drawn i.i.d.. Then for every $\delta\in(0,1)$ we have with probability at least $1-\delta$ that
$$ |L_{\mathcal{D}}(h)-L_S(h)|\leq \frac{4\sqrt{\log{\tau_{\mathcal{H}}(2|S|)}}}{\delta\sqrt{2|S|}}\text{.}$$