@dtxer

dtxer@lemmy.world · 1 year ago

The specifics are a bit different, but the main ideas are much older than this, I’ll leave here the Wikipedia

"Frank Rosenblatt, who published the Perceptron in 1958,[10] also introduced an MLP with 3 layers: an input layer, a hidden layer with randomized weights that did not learn, and an output layer.[11][12] Since only the output layer had learning connections, this was not yet deep learning. It was what later was called an extreme learning machine.[13][12]

The first deep learning MLP was published by Alexey Grigorevich Ivakhnenko and Valentin Lapa in 1965, as the Group Method of Data Handling.[14][15][12]

The first deep learning MLP trained by stochastic gradient descent[16] was published in 1967 by Shun’ichi Amari.[17][12] In computer experiments conducted by Amari’s student Saito, a five layer MLP with two modifiable layers learned internal representations required to classify non-linearily separable pattern classes.[12]

In 1970, Seppo Linnainmaa published the general method for automatic differentiation of discrete connected networks of nested differentiable functions.[3][18] This became known as backpropagation or reverse mode of automatic differentiation. It is an efficient application of the chain rule derived by Gottfried Wilhelm Leibniz in 1673[2][19] to networks of differentiable nodes.[12] The terminology “back-propagating errors” was actually introduced in 1962 by Rosenblatt himself,[11] but he did not know how to implement this,[12] although Henry J. Kelley had a continuous precursor of backpropagation[4] already in 1960 in the context of control theory.[12] In 1982, Paul Werbos applied backpropagation to MLPs in the way that has become standard.[6][12] In 1985, David E. Rumelhart et al. published an experimental analysis of the technique.[7] Many improvements have been implemented in subsequent decades.[12]"

dtxer@lemmy.world · 1 year ago

To the second question it’s not novel at all. The models used were invented decades ago. What changed is Moores Law striked and we got stronger computational power especially graphics cards. It seems that there is some resource barrier that when surpassed turns these models from useless to useful.

dtxer@lemmy.world · 1 year ago

I use Arsch btw

dtxer@lemmy.world · 1 year ago

I don’t agree. It depends really on what you’re using it for. I have one machine with Xubuntu installed and it’s been used just for browsing or putting movies to the tv and it’s been just plug and play so far.

There was the one issue, of not being able to use some streaming services, because of drm protection meaning you can only stream via their windows desktop app not via browser. I took it as an invitation for streaming the stuff from somewhere else for free shrug

dtxer@lemmy.world · 1 year ago

On my labs cluster they are named after famous physicists

dtxer@lemmy.world · 1 year ago

I was providing Linux distros and Machine Learning datasets some time ago, because official servers where slow. I’m the meme I guess