Abstract
In this paper, we show how the familiar concept of gradient descent can be extended in the presence of binary neurons. The procedure we devised formally operates on generic feed-forward networks of logistic-like neurons whose activations are rescaled by an arbitrarily large gauge. Whereas the magnitude of the gradient decays exponentially as the gauge grows, the sign of each of its components eventually settles to a constant value. These values can be computed by means of a “twin” network of binary neurons, which allows the application of any “Manhattan”-style training algorithm, such as Resilient Propagation.
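As a minimal sketch of the two quantities at play (the notation below, a logistic activation $\sigma$, gauge $\lambda$, pre-activation $a$, error $E$, step size $\eta$, and weight $w_{ij}$, is our own illustration and not taken from the paper), consider a single rescaled logistic unit:
\[
  y = \sigma(\lambda a), \qquad
  \frac{\partial y}{\partial a}
    = \lambda\,\sigma(\lambda a)\bigl(1-\sigma(\lambda a)\bigr)
    \;\le\; \lambda\, e^{-\lambda |a|},
\]
so for $a \neq 0$ this local derivative, and hence every gradient component routed through the unit, vanishes exponentially as $\lambda$ grows. A Manhattan-style update discards the magnitude and keeps only the sign of each component,
\[
  \Delta w_{ij} \;=\; -\,\eta\,\operatorname{sign}\!\left(\frac{\partial E}{\partial w_{ij}}\right),
\]
which is exactly the information the binary “twin” network is said to supply in the large-gauge limit.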