# Universal Approximation Theorem

#### The basic concept of the universal approximation theorem

The universal approximation theorem is one of the foundational mathematical results about neural networks. It states that, given appropriate parameters, simple neural networks can represent a wide range of interesting functions. More precisely, under mild assumptions on the activation function, a feedforward network with a single hidden layer containing a finite number of neurons can approximate any continuous function on compact subsets of ℝⁿ to arbitrary accuracy. In short, simple neural networks are universal approximators. The theorem is an existence result, however: it says nothing about whether those parameters can actually be learned by any particular training algorithm.
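As a rough illustration of the statement, the sketch below builds a network of the form F(x) = c₀ + Σ cᵢ σ(wᵢ x + bᵢ) with one sigmoid hidden layer and a linear output, and checks how closely it tracks a continuous target on [0, 1]. The weights are picked by hand (a staircase of steep sigmoids) rather than trained, which mirrors the point that the theorem is about existence, not learnability. The function names `build_network` and `max_error`, the steepness constant, and the sine target are illustrative assumptions, not part of the theorem or of any library.

```python
import math


def sigmoid(z):
    """Logistic function, arranged so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_network(f, n_cells, a=0.0, b=1.0):
    """Hand-pick weights for a one-hidden-layer sigmoid network tracking f on [a, b].

    The interval is cut into n_cells equal cells. Hidden unit i switches on at
    the boundary between cell i-1 and cell i and adds the jump in f between
    the two cell midpoints. No training is involved.
    """
    width = (b - a) / n_cells
    mids = [a + (i + 0.5) * width for i in range(n_cells)]
    steepness = 20.0 / width  # assumed value: large enough to make each unit step-like
    out_bias = f(mids[0])
    hidden = []  # (input weight, hidden bias, output weight) per hidden unit
    for i in range(1, n_cells):
        knot = a + i * width
        hidden.append((steepness, -steepness * knot, f(mids[i]) - f(mids[i - 1])))

    def network(x):
        # Linear output unit applied to the sigmoid hidden layer.
        return out_bias + sum(c * sigmoid(w * x + h) for w, h, c in hidden)

    return network


def max_error(f, g, a=0.0, b=1.0, n_points=1001):
    """Largest |f(x) - g(x)| over an evenly spaced grid on [a, b]."""
    xs = [a + (b - a) * i / (n_points - 1) for i in range(n_points)]
    return max(abs(f(x) - g(x)) for x in xs)


if __name__ == "__main__":
    target = lambda x: math.sin(2.0 * math.pi * x)
    for n_cells in (10, 100):
        net = build_network(target, n_cells)
        print(n_cells - 1, "hidden units, max error:", max_error(target, net))
```

Adding hidden units shrinks the worst-case error on the grid, which is the behaviour the theorem guarantees is possible. This particular construction is only one convenient way to exhibit it.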

#### History of the universal approximation theorem

Most scholars trace the theorem back to George Cybenko, who in 1989 proved one of its first versions for sigmoidal activation functions. Cybenko showed that, setting aside whether the parameters can be learned algorithmically, a neural network can approximate an arbitrary continuous function. This ability to represent a wide range of functions, given appropriate parameters, is what established neural networks as universal approximators.

Kurt Hornik developed the theorem further in 1991. Although Hornik agreed with Cybenko that universal approximators exist, he showed that the specific choice of activation function is not what makes universal approximation possible.

Instead, it is the multilayer feedforward architecture itself that gives neural networks the potential to be universal approximators. His analysis assumed throughout that the output units are linear.

#### Significance of the universal approximation theorem

For notational convenience, it is easier to show only the single-output case, because the general case can be deduced from it. The output units are assumed to be linear throughout.

The accuracy of an approximation depends on how one measures the closeness between functions, and the appropriate measure varies significantly with the problem at hand. In some applications the network must perform well on all input samples simultaneously, which corresponds to measuring the worst-case (uniform) error.

In other applications, the inputs are treated as random variables and the focus is on average performance. Closeness is then measured by an average of the p-th power of the error, the most popular choice being p = 2, which corresponds to the mean square error.
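The difference between performing well on every input at once and performing well on average can be sketched with two distance functions. The example below is a minimal illustration under stated assumptions: a finite list of sample points stands in for the input distribution, and the helper names `uniform_distance` and `lp_distance` are hypothetical, not taken from any library.

```python
import math


def uniform_distance(f, g, samples):
    """Worst-case error: the largest |f(x) - g(x)| over the samples."""
    return max(abs(f(x) - g(x)) for x in samples)


def lp_distance(f, g, samples, p=2):
    """Average error: the p-th root of the mean of |f(x) - g(x)|**p.

    With p = 2 this is the root of the mean square error over the samples.
    """
    mean = sum(abs(f(x) - g(x)) ** p for x in samples) / len(samples)
    return mean ** (1.0 / p)


if __name__ == "__main__":
    samples = [i / 100.0 for i in range(101)]
    f = lambda x: x * x
    # g agrees with f everywhere except at one sample, where it is off by 1.
    g = lambda x: f(x) + (1.0 if x == 0.5 else 0.0)
    print("uniform distance:", uniform_distance(f, g, samples))
    print("L2 distance:     ", lp_distance(f, g, samples, p=2))
```

A single bad point dominates the uniform distance but barely moves the average, which is why the choice of measure should follow the application.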

There are still other ways to measure the closeness of functions, and the right one depends on the application. In many applications it is also important that the derivatives of the approximating function implemented by the network closely match those of the function being approximated, up to some order.
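To see why matching derivatives is a stronger requirement than matching values, the sketch below compares two functions order by order using central finite differences. It is only an illustration: the step size `h`, the helper names, and the example functions are assumptions chosen for the demonstration.

```python
import math


def central_difference(f, x, order, h=1e-3):
    """Estimate the order-th derivative of f at x with a central difference.

    For order 0 this returns f(x) itself.
    """
    total = 0.0
    for k in range(order + 1):
        coeff = (-1) ** k * math.comb(order, k)
        total += coeff * f(x + (order / 2.0 - k) * h)
    return total / h ** order


def distances_up_to_order(f, g, samples, max_order):
    """Worst-case gap between f and g for each derivative order 0..max_order."""
    return [
        max(
            abs(central_difference(f, x, m) - central_difference(g, x, m))
            for x in samples
        )
        for m in range(max_order + 1)
    ]


if __name__ == "__main__":
    samples = [i / 200.0 for i in range(201)]
    f = math.sin
    # g stays within 0.01 of f but carries a fast, small wiggle.
    g = lambda x: math.sin(x) + 0.01 * math.sin(50.0 * x)
    print(distances_up_to_order(f, g, samples, max_order=1))
```

Here the two functions are nearly indistinguishable in value, yet their first derivatives differ substantially, so an approximation that is good in the uniform sense may still be poor once derivatives matter.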
