# Optimization Using (Stochastic) Gradient Descent

Gradient Descent is applied to optimization problems with (locally) differentiable objective functions. The algorithm defines an iterative descent towards a minimum of the objective function `f(x)`; if needed, it can be trivially adapted to search for a maximum instead. Starting from a more or less well-educated guess `x₀`, we use the negative gradient `-∇f(x)` to step in the direction of steepest descent.

`xₙ₊₁ = xₙ - γ∇f(xₙ)`

The step size along the negative gradient is defined by `γ`. This parameter determines how fast the algorithm can approach the minimum, especially when `x₀` is far from it. Unfortunately, a `γ` that is too high can lead to oscillation effects. Since the choice of `γ` is crucial for the success of the algorithm, one common variation is to let `γ` decay slowly with each iteration.
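The update rule with a decaying step size can be sketched as follows; the objective `f(x) = (x - 3)²`, the function name, and the parameter values are illustrative assumptions, not part of the original text:

```python
def gradient_descent(grad, x0, gamma=0.1, decay=0.99, steps=100):
    """Iterate x <- x - gamma * grad(x), shrinking gamma after each step."""
    x = x0
    for _ in range(steps):
        x = x - gamma * grad(x)
        gamma *= decay  # slow decay of the step size counteracts oscillation
    return x

# Assumed toy objective f(x) = (x - 3)**2 with gradient 2 * (x - 3);
# the minimum lies at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

With a fixed `gamma` instead of a decaying one, the same sketch would overshoot and oscillate once `gamma` exceeds the stability threshold of the objective.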

The stochastic extension of Gradient Descent is well suited for optimization problems where the objective function is composed of a large number `m` of summed subfunctions:

`f(x) = m⁻¹∑ᵢ₌₁...ₘ fᵢ(x)`
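Instead of evaluating the full gradient `∇f(x)` over all `m` subfunctions per step, each update uses the gradient of a single, randomly chosen subfunction `fᵢ`. A minimal sketch under an assumed toy objective with subfunctions `fᵢ(x) = (x - yᵢ)²`, whose overall minimum is the mean of the `yᵢ`; all names and parameters here are illustrative:

```python
import random

def sgd(grads, x0, gamma=0.05, epochs=200, seed=0):
    """Each update uses the gradient of one randomly chosen subfunction."""
    rng = random.Random(seed)
    x = x0
    for _ in range(epochs):
        # visit the m subfunction gradients in a fresh random order
        for grad in rng.sample(grads, len(grads)):
            x = x - gamma * grad(x)
    return x

data = [1.0, 2.0, 3.0, 4.0]
# gradient of the subfunction (x - y)**2 is 2 * (x - y)
grads = [lambda x, y=y: 2 * (x - y) for y in data]
x_min = sgd(grads, x0=0.0)  # should settle near the mean, 2.5
```

With a constant `gamma`, the iterate fluctuates in a small neighborhood of the minimum rather than converging exactly; combining this with the step-size decay described above shrinks that neighborhood over time.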