Links on this page take you to Cornell University’s archives on Jurgen Schmidhuber

  1. arXiv:2111.02216  cs.CL cs.LG cs.MM cs.SD eess.AS
     Automatic Embedding of Stories Into Collections of Independent Media
     Authors: Dylan R. Ashley, Vincent Herrmann, Zachary Friggstad, Kory W. Mathewson, Jürgen Schmidhuber
     Abstract: We look at how machine learning techniques that derive properties of items in a collection of independent media can be used to automatically embed stories into such collections. To do so, we use models that extract the tempo of songs to make a music playlist follow a narrative arc. Our work specifies an open-source tool that uses pre-trained neural network models to extract the global tempo of a s…
     Submitted 3 November, 2021; originally announced November 2021.
     Comments: 2 pages in main text + 1 page of references + 6 pages of appendices, 2 figures in main text + 3 figures in appendices, 1 algorithm in appendices; source code available at https://gist.github.com/dylanashley/1387a99deb85bfc0bce11286810cd98b
     ACM Class: H.5.5; I.2.6; J.5
  2. arXiv:2111.01400  cs.HC  doi:10.1109/ICHMS53169.2021.9582445
     Cognitive Load and Productivity Implications in Human-Chatbot Interaction
     Authors: Johanna Schmidhuber, Stephan Schlögl, Christian Ploder
     Abstract: The increasing progress in artificial intelligence and respective machine learning technology has fostered the proliferation of chatbots to the point where today they are being embedded into various human-technology interaction tasks. In enterprise contexts, the use of chatbots seeks to reduce labor costs and consequently increase productivity. For simple, repetitive customer service tasks such al…
     Submitted 2 November, 2021; originally announced November 2021.
     Comments: 6 pages
  3. arXiv:2110.07732  cs.LG cs.AI cs.NE
     The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization
     Authors: Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
     Abstract: Despite progress across a broad range of applications, Transformers have limited success in systematic generalization. The situation is especially frustrating in the case of algorithmic tasks, where they often fail to find intuitive solutions that route relevant information to the right node/operation at the right time in the grid represented by Transformer columns. To facilitate the learning of u…
     Submitted 5 May, 2022; v1 submitted 14 October, 2021; originally announced October 2021.
     Comments: Accepted to ICLR 2022
  4. arXiv:2108.12284  cs.LG cs.AI cs.NE
     The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers
     Authors: Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
     Abstract: Recently, many datasets have been proposed to test the systematic generalization ability of neural networks. The companion baseline Transformers, typically trained with default hyper-parameters from standard tasks, are shown to fail dramatically. Here we demonstrate that by revisiting model configurations as basic as scaling of embeddings, early stopping, relative positional embedding, and Univers…
     Submitted 14 February, 2022; v1 submitted 26 August, 2021; originally announced August 2021.
     Comments: Accepted to EMNLP 2021
  5. arXiv:2107.09088  stat.ML cs.AI cs.LG
     Reward-Weighted Regression Converges to a Global Optimum
     Authors: Miroslav Štrupl, Francesco Faccio, Dylan R. Ashley, Rupesh Kumar Srivastava, Jürgen Schmidhuber
     Abstract: Reward-Weighted Regression (RWR) belongs to a family of widely known iterative Reinforcement Learning algorithms based on the Expectation-Maximization framework. In this family, learning at each iteration consists of sampling a batch of trajectories using the current policy and fitting a new policy to maximize a return-weighted log-likelihood of actions. Although RWR is known to yield monotonic im…
     Submitted 23 February, 2022; v1 submitted 19 July, 2021; originally announced July 2021.
     Comments: 7 pages in main text + 2 pages of references + 6 pages of appendices, 1 figure in main text + 1 figure in appendices; source code available at https://github.com/dylanashley/reward-weighted-regression
     MSC Class: 68T05
     ACM Class: I.2.6
  6. arXiv:2107.05438  q-bio.NC cs.AI
     Bayesian brains and the Rényi divergence
     Authors: Noor Sajid, Francesco Faccio, Lancelot Da Costa, Thomas Parr, Jürgen Schmidhuber, Karl Friston
     Abstract: Under the Bayesian brain hypothesis, behavioural variations can be attributed to different priors over generative model parameters. This provides a formal explanation for why individuals exhibit inconsistent behavioural preferences when confronted with similar choices. For example, greedy preferences are a consequence of confident (or precise) beliefs over certain outcomes. Here, we offer an alter…
     Submitted 12 July, 2021; originally announced July 2021.
     Comments: 23 pages, 5 figures
  7. arXiv:2107.03857  q-fin.ST hep-th nlin.CG physics.soc-ph q-fin.GN  doi:10.1016/j.physa.2022.126873
     Financial Markets and the Phase Transition between Water and Steam
     Authors: Christof Schmidhuber
     Abstract: Motivated by empirical observations on the interplay of trends and reversion, a lattice gas model of financial markets is presented. The shares of an asset are modeled by gas molecules that are distributed across a hidden social network of investors. The model is equivalent to the Ising model on this network, whose magnetization represents the deviation of the asset price from its value. Moreover,…
     Submitted 15 December, 2021; v1 submitted 8 July, 2021; originally announced July 2021.
     Comments: 34 pages, 7 figures, significantly revised section 4, added predictions for Hurst exponents
  8. arXiv:2106.06295  cs.LG
     Going Beyond Linear Transformers with Recurrent Fast Weight Programmers
     Authors: Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber
     Abstract: Transformers with linearised attention (“linear Transformers”) have demonstrated the practical scalability and effectiveness of outer product-based Fast Weight Programmers (FWPs) from the ’90s. However, the original FWP formulation is more general than the one of linear Transformers: a slow neural network (NN) continually reprograms the weights of a fast NN with arbitrary architecture. In existi…
     Submitted 26 October, 2021; v1 submitted 11 June, 2021; originally announced June 2021.
     Comments: Accepted to NeurIPS 2021
  9. arXiv:2103.11715  cs.AI cs.LG cs.NE
     Transforming Exploratory Creativity with DeLeNoX
     Authors: Antonios Liapis, Hector P. Martinez, Julian Togelius, Georgios N. Yannakakis
     Abstract: …in procedural content generation in games. We also situate DeLeNoX in relation to the distinction between exploratory and transformational creativity, and in relation to Schmidhuber’s theory of creativity through the drive for compression progress.
     Submitted 22 March, 2021; originally announced March 2021.
     Comments: 8 pages
     Journal ref: Proceedings of the Fourth International Conference on Computational Creativity, 2013, pages 56-63
  10. arXiv:2103.09108  cs.CV cs.LG  doi:10.3389/fcomp.2022.1041703
      Is it enough to optimize CNN architectures on ImageNet?
      Authors: Lukas Tuggener, Jürgen Schmidhuber, Thilo Stadelmann
      Abstract: Classification performance based on ImageNet is the de-facto standard metric for CNN development. In this work we challenge the notion that CNN architecture design solely based on ImageNet leads to generally effective convolutional neural network (CNN) architectures that perform well on a diverse set of datasets and application domains. To this end, we investigate and ultimately improve ImageNet a…
      Submitted 6 March, 2023; v1 submitted 16 March, 2021; originally announced March 2021.
      Journal ref: Frontiers in Computer Science, Volume 4, 2022
  11. arXiv:2103.08877  cs.CV cs.AI cs.LG
      Spatial Dependency Networks: Neural Layers for Improved Generative Image Modeling
      Authors: Đorđe Miladinović, Aleksandar Stanić, Stefan Bauer, Jürgen Schmidhuber, Joachim M. Buhmann
      Abstract: How to improve generative modeling by better exploiting spatial regularities and coherence in images? We introduce a novel neural network for building image generators (decoders) and apply it to variational autoencoders (VAEs). In our spatial dependency networks (SDNs), feature maps at each level of a deep neural net are computed in a spatially coherent way, using a sequential gating-based mechani…
      Submitted 16 March, 2021; originally announced March 2021.
      Journal ref: International Conference on Learning Representations (2021)
  12. arXiv:2102.11174  cs.LG
      Linear Transformers Are Secretly Fast Weight Programmers
      Authors: Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber
      Abstract: We show the formal equivalence of linearised self-attention mechanisms and fast weight controllers from the early ’90s, where a “slow” neural net learns by gradient descent to program the “fast weights” of another net through sequences of elementary programming instructions which are additive outer products of self-invented activation patterns (today called keys and values). Such Fast Weight Pro…
      Submitted 9 June, 2021; v1 submitted 22 February, 2021; originally announced February 2021.
  13. arXiv:2012.14905  cs.LG cs.AI cs.NE stat.ML
      Meta Learning Backpropagation And Improving It
      Authors: Louis Kirsch, Jürgen Schmidhuber
      Abstract: Many concepts have been proposed for meta learning with neural networks (NNs), e.g., NNs that learn to reprogram fast weights, Hebbian plasticity, learned learning rules, and meta recurrent NNs. Our Variable Shared Meta Learning (VSML) unifies the above and demonstrates that simple weight-sharing and sparsity in an NN is sufficient to express powerful learning algorithms (LAs) in a reusable fashio…
      Submitted 13 March, 2022; v1 submitted 29 December, 2020; originally announced December 2020.
      Comments: Updated to the NeurIPS 2021 camera ready; fixed typo in eq 4
      Journal ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)
  14. arXiv:2012.05208  cs.NE cs.AI cs.LG
      On the Binding Problem in Artificial Neural Networks
      Authors: Klaus Greff, Sjoerd van Steenkiste, Jürgen Schmidhuber
      Abstract: Contemporary neural networks still fall short of human-level generalization, which extends far beyond our direct experiences. In this paper, we argue that the underlying cause for this shortcoming is their inability to dynamically and flexibly bind information that is distributed throughout the network. This binding problem affects their capacity to acquire a compositional understanding of the wor…
      Submitted 9 December, 2020; originally announced December 2020.
      ACM Class: I.2.6
  15. arXiv:2011.12930  cs.CV cs.AI cs.LG cs.NE
      Unsupervised Object Keypoint Learning using Local Spatial Predictability
      Authors: Anand Gopalakrishnan, Sjoerd van Steenkiste, Jürgen Schmidhuber
      Abstract: We propose PermaKey, a novel approach to representation learning based on object keypoints. It leverages the predictability of local image regions from spatial neighborhoods to identify salient regions that correspond to object parts, which are then converted to keypoints. Unlike prior approaches, it utilizes predictability to discover object keypoints, an intrinsic property of objects. This ensur…
      Submitted 8 March, 2021; v1 submitted 25 November, 2020; originally announced November 2020.
      Comments: Accepted to ICLR 2021
  16. arXiv:2011.07831  cs.LG cs.NE
      Learning Associative Inference Using Fast Weight Memory
      Authors: Imanol Schlag, Tsendsuren Munkhdalai, Jürgen Schmidhuber
      Abstract: Humans can quickly associate stimuli to solve problems in novel contexts. Our novel neural network model learns state representations of facts that can be composed to perform such associative inference. To this end, we augment the LSTM model with an associative memory, dubbed Fast Weight Memory (FWM). Through differentiable operations at every step of a given input sequence, the LSTM updates and m…
      Submitted 23 February, 2021; v1 submitted 16 November, 2020; originally announced November 2020.
  17. arXiv:2010.03635  cs.LG cs.AI stat.ML
      Hierarchical Relational Inference
      Authors: Aleksandar Stanić, Sjoerd van Steenkiste, Jürgen Schmidhuber
      Abstract: Common-sense physical reasoning in the real world requires learning about the interactions of objects and their dynamics. The notion of an abstract object, however, encompasses a wide variety of physical objects that differ greatly in terms of the complex behaviors they support. To address this, we propose a novel approach to physical reasoning that models objects as hierarchies of parts that may…
      Submitted 14 December, 2020; v1 submitted 7 October, 2020; originally announced October 2020.
      Comments: Accepted to AAAI 2021
      ACM Class: I.2.6
  18. arXiv:2010.02066  cs.NE cs.AI cs.LG
      Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks
      Authors: Róbert Csordás, Sjoerd van Steenkiste, Jürgen Schmidhuber
      Abstract: Neural networks (NNs) whose subnetworks implement reusable functions are expected to offer numerous advantages, including compositionality through efficient recombination of functional building blocks, interpretability, preventing catastrophic interference, etc. Understanding if and how NNs are modular could provide insights into how to improve them. Current inspection methods, however, fail to li…
      Submitted 6 March, 2021; v1 submitted 5 October, 2020; originally announced October 2020.
  19. arXiv:2007.04750  cs.LG stat.ML  doi:10.1162/neco_a_01539
      Recurrent Neural-Linear Posterior Sampling for Nonstationary Contextual Bandits
      Authors: Aditya Ramesh, Paulo Rauber, Michelangelo Conserva, Jürgen Schmidhuber
      Abstract: An agent in a nonstationary contextual bandit problem should balance between exploration and the exploitation of (periodic or structured) patterns present in its previous experiences. Handcrafting an appropriate historical context is an attractive alternative to transform a nonstationary problem into a stationary problem that can be solved efficiently. However, even a carefully designed historical…
      Submitted 3 November, 2023; v1 submitted 9 July, 2020; originally announced July 2020.
      Journal ref: Neural Computation. 2022 Oct 7;34(11):2232-72
  20. arXiv:2006.09226  cs.LG cs.AI stat.ML
      Parameter-Based Value Functions
      Authors: Francesco Faccio, Louis Kirsch, Jürgen Schmidhuber
      Abstract: Traditional off-policy actor-critic Reinforcement Learning (RL) algorithms learn value functions of a single target policy. However, when value functions are updated to track the learned policy, they forget potentially useful information about old policies. We introduce a class of value functions called Parameter-Based Value Functions (PBVFs) whose inputs include the policy parameters. They can ge…
      Submitted 13 August, 2021; v1 submitted 16 June, 2020; originally announced June 2020.
      Comments: Published as a conference paper at ICLR 2021
  21. arXiv:2006.07847  q-fin.ST cond-mat.stat-mech hep-th q-fin.PM q-fin.TR  doi:10.1016/j.physa.2020.125642
      Trends, Reversion, and Critical Phenomena in Financial Markets
      Authors: Christof Schmidhuber
      Abstract: Financial markets across all asset classes are known to exhibit trends. These trends have been exploited by traders for decades. Here, we empirically measure when trends revert, based on 30 years of daily futures prices for equity indices, interest rates, currencies and commodities. We find that trends tend to revert once they reach a critical level of statistical significance. Based on polynomial…
      Submitted 11 December, 2020; v1 submitted 14 June, 2020; originally announced June 2020.
      Comments: 28 pages, 6 figures, minor corrections and additional references in revised version
  22. arXiv:2005.05744  cs.NE
      Deep Learning: Our Miraculous Year 1990-1991
      Authors: Juergen Schmidhuber
      Abstract: In 2020-2021, we celebrated that many of the basic ideas behind the deep learning revolution were published three decades ago within fewer than 12 months in our “Annus Mirabilis” or “Miraculous Year” 1990-1991 at TU Munich. Back then, few people were interested, but a quarter century later, neural networks based on these ideas were on over 3 billion devices such as smartphones, and used many billi…
      Submitted 28 December, 2022; v1 submitted 12 May, 2020; originally announced May 2020.
      Comments: 39 pages, 279 references, 16 illustrations, based on work of 4 Oct 2019
  23. arXiv:1912.02877  cs.LG cs.AI cs.RO
      Training Agents using Upside-Down Reinforcement Learning
      Authors: Rupesh Kumar Srivastava, Pranav Shyam, Filipe Mutz, Wojciech Jaśkowski, Jürgen Schmidhuber
      Abstract: We develop Upside-Down Reinforcement Learning (UDRL), a method for learning to act using only supervised learning techniques. Unlike traditional algorithms, UDRL does not use reward prediction or search for an optimal policy. Instead, it trains agents to follow commands such as “obtain so much total reward in so much time.” Many of its general principles are outlined in a companion report; the goa…
      Submitted 3 September, 2021; v1 submitted 5 December, 2019; originally announced December 2019.
      Comments: Extends NeurIPS 2019 Deep Reinforcement Learning workshop presentation
  24. arXiv:1912.02875  cs.AI cs.LG
      Reinforcement Learning Upside Down: Don’t Predict Rewards -- Just Map Them to Actions
      Authors: Juergen Schmidhuber
      Abstract: We transform reinforcement learning (RL) into a form of supervised learning (SL) by turning traditional RL on its head, calling this Upside Down RL (UDRL). Standard RL predicts rewards, while UDRL instead uses rewards as task-defining inputs, together with representations of time horizons and other computable functions of historic and desired future data. UDRL learns to interpret these input obser…
      Submitted 23 June, 2020; v1 submitted 5 December, 2019; originally announced December 2019.
      Comments: 22 pages, 81 references
  25. arXiv:1912.00058  cs.LG stat.ML
      A Reparameterization-Invariant Flatness Measure for Deep Neural Networks
      Authors: Henning Petzka, Linara Adilova, Michael Kamp, Cristian Sminchisescu
      Abstract: …why this leads to solutions with good generalization, even in cases where the number of parameters is larger than the number of samples. Back in the 90s, Hochreiter and Schmidhuber observed that flatness of the loss surface around a local minimum correlates with low generalization error. For several flatness measures, this correlation has been empirically v…
      Submitted 29 November, 2019; originally announced December 2019.
      Comments: 14 pages; accepted at Workshop “Science meets Engineering of Deep Learning”, 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)
  26. arXiv:1910.06611  cs.LG stat.ML
      Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving
      Authors: Imanol Schlag, Paul Smolensky, Roland Fernandez, Nebojsa Jojic, Jürgen Schmidhuber, Jianfeng Gao
      Abstract: We incorporate Tensor-Product Representations within the Transformer in order to better support the explicit representation of relation structure. Our Tensor-Product Transformer (TP-Transformer) sets a new state of the art on the recently-introduced Mathematics Dataset containing 56 categories of free-form math word-problems. The essential component of the model is a novel attention mechanism, cal…
      Submitted 4 November, 2020; v1 submitted 15 October, 2019; originally announced October 2019.
  27. arXiv:1910.05231  cs.LG stat.ML
      R-SQAIR: Relational Sequential Attend, Infer, Repeat
      Authors: Aleksandar Stanić, Jürgen Schmidhuber
      Abstract: Traditional sequential multi-object attention models rely on a recurrent mechanism to infer object relations. We propose a relational extension (R-SQAIR) of one such attention model (SQAIR) by endowing it with a module with strong relational inductive bias that computes in parallel pairwise interactions between inferred objects. Two recently proposed relational modules are studied on tasks of unsu…
      Submitted 11 October, 2019; originally announced October 2019.
      Comments: 4 page workshop paper accepted at the NeurIPS 2019 Workshop on Perception as Generative Reasoning: Structure, Causality, Probability
      ACM Class: I.2.6
  28. arXiv:1910.04098  cs.LG cs.AI cs.NE stat.ML
      Improving Generalization in Meta Reinforcement Learning using Learned Objectives
      Authors: Louis Kirsch, Sjoerd van Steenkiste, Jürgen Schmidhuber
      Abstract: Biological evolution has distilled the experiences of many learners into the general learning algorithms of humans. Our novel meta reinforcement learning algorithm MetaGenRL is inspired by this process. MetaGenRL distills the experiences of many complex agents to meta-learn a low-complexity neural objective function that decides how future individuals will learn. Unlike recent meta-RL algorithms,…
      Submitted 14 February, 2020; v1 submitted 9 October, 2019; originally announced October 2019.
      Comments: Accepted to ICLR 2020
      ACM Class: I.2.6
  29. arXiv:1909.09231  cs.CC cs.IT hep-th math-ph math.LO
      Chaitin’s Omega and an Algorithmic Phase Transition
      Authors: Christof Schmidhuber
      Abstract: We consider the statistical mechanical ensemble of bit string histories that are computed by a universal Turing machine. The role of the energy is played by the program size. We show that this ensemble has a first-order phase transition at a critical temperature, at which the partition function equals Chaitin’s halting probability Ω. This phase transition has curious properties: the free energy…
      Submitted 5 February, 2021; v1 submitted 9 September, 2019; originally announced September 2019.
      Comments: 29 pages, 5 figures. Added references, a literature review, and a section on analogies with quantum mechanics and field theory (previous title: “Logical Quantum Field Theory”)
  30. arXiv:1906.05915  cs.LG stat.ML
      Recurrent Neural Processes
      Authors: Timon Willi, Jonathan Masci, Jürgen Schmidhuber, Christian Osendorfer
      Abstract: We extend Neural Processes (NPs) to sequential data through Recurrent NPs or RNPs, a family of conditional state space models. RNPs model the state space with Neural Processes. Given time series observed on fast real-world time scales but containing slow long-term variabilities, RNPs may derive appropriate slow latent time scales. They do so in an efficient manner by establishing conditional indep…
      Submitted 5 November, 2019; v1 submitted 13 June, 2019; originally announced June 2019.
  31. arXiv:1906.04493  cs.NE cs.LG
      Generative Adversarial Networks are Special Cases of Artificial Curiosity (1990) and also Closely Related to Predictability Minimization (1991)
      Authors: Juergen Schmidhuber
      Abstract: I review unsupervised or self-supervised neural networks playing minimax games in game-theoretic settings: (i) Artificial Curiosity (AC, 1990) is based on two such networks. One network learns to generate a probability distribution over outputs, the other learns to predict effects of the outputs. Each network minimizes the objective function maximized by the other. (ii) Generative Adversarial Netw…
      Submitted 22 April, 2020; v1 submitted 11 June, 2019; originally announced June 2019.
      Comments: 15 pages, 1 figure, 104 references
      Journal ref: Neural Networks, Volume 127, July 2020, Pages 58-66
  32. arXiv:1906.01035  cs.LG cs.AI cs.NE stat.ML
      A Perspective on Objects and Systematic Generalization in Model-Based RL
      Authors: Sjoerd van Steenkiste, Klaus Greff, Jürgen Schmidhuber
      Abstract: In order to meet the diverse challenges in solving many real-world problems, an intelligent agent has to be able to dynamically construct a model of its environment. Objects facilitate the modular reuse of prior knowledge and the combinatorial construction of such models. In this work, we argue that dynamically bound features (objects) do not simply emerge in connectionist models of the world. We…
      Submitted 3 June, 2019; originally announced June 2019.
      Comments: Accepted to the ICML 2019 Workshop on Generative Modeling and Model-Based Reasoning for Robotics and AI
      ACM Class: I.2.6
  33. arXiv:1905.12506  cs.LG cs.CV cs.NE stat.ML
      Are Disentangled Representations Helpful for Abstract Visual Reasoning?
      Authors: Sjoerd van Steenkiste, Francesco Locatello, Jürgen Schmidhuber, Olivier Bachem
      Abstract: A disentangled representation encodes information about the salient factors of variation in the data independently. Although it is often argued that this representational format is useful in learning to solve many real-world down-stream tasks, there is little empirical evidence that supports this claim. In this paper, we conduct a large-scale study that investigates whether disentangled representa…
      Submitted 7 January, 2020; v1 submitted 29 May, 2019; originally announced May 2019.
      Comments: Accepted to NeurIPS 2019
      MSC Class: I.2.6
      ACM Class: I.2.6
  34. arXiv:1905.07357  cs.LG stat.ML
      Recurrent Kalman Networks: Factorized Inference in High-Dimensional Deep Feature Spaces
      Authors: Philipp Becker, Harit Pandya, Gregor Gebhardt, Cheng Zhao, James Taylor, Gerhard Neumann
      Abstract: …next time step. The resulting network architecture, which we call Recurrent Kalman Network (RKN), can be used for any time-series data, similar to a LSTM (Hochreiter & Schmidhuber, 1997) but uses an explicit representation of uncertainty. As shown by our experiments, the RKN obtains much more accurate uncertainty estimates than an LSTM or Gated Recurrent…
      Submitted 17 May, 2019; originally announced May 2019.
      Comments: accepted at ICML 2019
  35. arXiv:1904.10278  cs.NE
      Improving Differentiable Neural Computers Through Memory Masking, De-allocation, and Link Distribution Sharpness Control
      Authors: Róbert Csordás, Jürgen Schmidhuber
      Abstract: The Differentiable Neural Computer (DNC) can learn algorithmic and question answering tasks. An analysis of its internal activation patterns reveals three problems: Most importantly, the lack of key-value separation makes the address distribution resulting from content-based look-up noisy and flat, since the value influences the score calculation, although only the key should. Second, DNC’s de-all…
      Submitted 21 April, 2022; v1 submitted 23 April, 2019; originally announced April 2019.
  36. arXiv:1811.12143  cs.LG cs.NE stat.ML
      Learning to Reason with Third-Order Tensor Products
      Authors: Imanol Schlag, Jürgen Schmidhuber
      Abstract: We combine Recurrent Neural Networks with Tensor Product Representations to learn combinatorial representations of sequential data. This improves symbolic interpretation and systematic generalisation. Our architecture is trained end-to-end through gradient descent on a variety of simple natural language reasoning tasks, significantly outperforming the latest state-of-the-art models in single-task…
      Submitted 8 January, 2019; v1 submitted 29 November, 2018; originally announced November 2018.
  37. arXiv:1810.10340  cs.CV cs.NE  doi:10.1016/j.neunet.2020.07.007
      Investigating Object Compositionality in Generative Adversarial Networks
      Authors: Sjoerd van Steenkiste, Karol Kurach, Jürgen Schmidhuber, Sylvain Gelly
      Abstract: Deep generative models seek to recover the process with which the observed data was generated. They may be used to synthesize new samples or to subsequently extract representations. Successful approaches in the domain of images are driven by several core inductive biases. However, a bias to account for the compositional way in which humans structure a visual scene in terms of objects has frequentl…
      Submitted 24 July, 2020; v1 submitted 17 October, 2018; originally announced October 2018.
      Comments: A preliminary version of this work (arXiv v1) appeared under the title “A Case for Object Compositionality in Deep Generative Models of Images” as a workshop paper at the NeurIPS2018 workshop on “Modeling the Physical World: Perception, Learning, and Control”, and at the NeurIPS2018 workshop on “Relational Representation Learning”
      MSC Class: I.2.6
      ACM Class: I.2.6
  38. arXiv:1809.01999  cs.LG stat.ML
      Recurrent World Models Facilitate Policy Evolution
      Authors: David Ha, Jürgen Schmidhuber
      Abstract: A generative recurrent neural network is quickly trained in an unsupervised manner to model popular reinforcement learning environments through compressed spatio-temporal representations. The world model’s extracted features are fed into compact and simple policies trained by evolution, achieving state of the art results in various environments. We also train our agent entirely inside of an enviro…
      Submitted 4 September, 2018; originally announced September 2018.
      Comments: To appear at NIPS 2018, selected for an oral presentation. arXiv admin note: substantial text overlap with arXiv:1803.10122
  39. arXiv:1805.10548  cs.CV cs.AI
      Deep Watershed Detector for Music Object Recognition
      Authors: Lukas Tuggener, Ismail Elezi, Jurgen Schmidhuber, Thilo Stadelmann
      Abstract: Optical Music Recognition (OMR) is an important and challenging area within music information retrieval, the accurate detection of music symbols in digital images is a core functionality of any OMR pipeline. In this paper, we introduce a novel object detection method, based on synthetic energy maps and the watershed transform, called Deep Watershed Detector (DWD). Our method is specifically tailor…
      Submitted 26 May, 2018; originally announced May 2018.
      Comments: Accepted on The 19th International Society for Music Information Retrieval Conference 2018
  40. arXiv:1804.11127  cs.CV cs.NE
      Investigations on End-to-End Audiovisual Fusion
      Authors: Michael Wand, Ngoc Thang Vu, Juergen Schmidhuber
      Abstract: Audiovisual speech recognition (AVSR) is a method to alleviate the adverse effect of noise in the acoustic signal. Leveraging recent developments in deep neural network-based speech recognition, we present an AVSR neural network architecture which is trained end-to-end, without the need to separately model the process of decision fusion as in conventional (e.g. HMM-based) systems. The fusion syste…
      Submitted 30 April, 2018; originally announced April 2018.
      Comments: Published at ICASSP 2018
      Journal ref: Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 3041 – 3045
  41. arXiv:1804.00525  cs.CV cs.LG
      DeepScores -- A Dataset for Segmentation, Detection and Classification of Tiny Objects
      Authors: Lukas Tuggener, Ismail Elezi, Jürgen Schmidhuber, Marcello Pelillo, Thilo Stadelmann
      Abstract: We present the DeepScores dataset with the goal of advancing the state-of-the-art in small objects recognition, and by placing the question of object recognition in the context of scene understanding. DeepScores contains high quality images of musical scores, partitioned into 300,000 sheets of written music that contain symbols of different shapes and sizes. With close to a hundred millions of sma…
      Submitted 26 May, 2018; v1 submitted 27 March, 2018; originally announced April 2018.
      Comments: 6 pages, accepted on IEEE International Conference on Pattern Recognition 2018
  42. arXiv:1803.10122  cs.LG stat.ML  doi:10.5281/zenodo.1207631
      World Models
      Authors: David Ha, Jürgen Schmidhuber
      Abstract: We explore building generative neural network models of popular reinforcement learning environments. Our world model can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the environment. By using features extracted from the world model as inputs to an agent, we can train a very compact and simple policy that can solve the required task. We c…
      Submitted 9 May, 2018; v1 submitted 27 March, 2018; originally announced March 2018.
  43. arXiv:1802.10353  cs.LG cs.AI cs.NE
      Relational Neural Expectation Maximization: Unsupervised Discovery of Objects and their Interactions
      Authors: Sjoerd van Steenkiste, Michael Chang, Klaus Greff, Jürgen Schmidhuber
      Abstract: Common-sense physical reasoning is an essential ingredient for any intelligent agent operating in the real-world. For example, it can be used to simulate the environment, or to infer the state of parts of the world that are currently unobserved. In order to match real-world conditions this causal knowledge must be learned without access to supervised data. To address this problem we present a nove…
      Submitted 28 February, 2018; originally announced February 2018.
      Comments: Accepted to ICLR 2018
      ACM Class: I.2.6
  44. arXiv:1802.08864 [pdf, ps, other] cs.AI
      One Big Net For Everything
      Authors: Juergen Schmidhuber
      Abstract: I apply recent work on “learning to think” (2015) and on PowerPlay (2011) to the incremental training of an increasingly general problem solver, continually learning to solve new tasks without forgetting previous skills. The problem solver is a single recurrent neural network (or similar general purpose computer) called ONE. ONE is unusual in the sense that it is trained in various ways, e.g., by…
      Submitted 24 February, 2018; originally announced February 2018.
      Comments: 17 pages, 107 references
  45. arXiv:1711.06006 [pdf, other] cs.LG cs.AI cs.NE cs.RO
      Hindsight policy gradients
      Authors: Paulo Rauber, Avinash Ummadisingu, Filipe Mutz, Juergen Schmidhuber
      Abstract: A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved…
      Submitted 20 February, 2019; v1 submitted 16 November, 2017; originally announced November 2017.
      Comments: Accepted to ICLR 2019
  46. arXiv:1708.08100 [pdf, ps, other] cs.CC cs.IT math.LO
      Plain stopping time and conditional complexities revisited
      Authors: Mikhail Andreev, Gleb Posobin, Alexander Shen
      Abstract: …may differ. We also answer an open question from Chernov, Hutter and Schmidhuber
      Submitted 3 October, 2017; v1 submitted 27 August, 2017; originally announced August 2017.
      MSC Class: 68Q30
      ACM Class: H.1.1
  47. arXiv:1708.03498 [pdf, other] cs.LG cs.NE stat.ML
      Neural Expectation Maximization
      Authors: Klaus Greff, Sjoerd van Steenkiste, Jürgen Schmidhuber
      Abstract: Many real world tasks such as reasoning and physical interaction require identification and manipulation of conceptual entities. A first step towards solving these tasks is the automated discovery of distributed symbol-like representations. In this paper, we explicitly formalize this problem as inference in a spatial mixture model where each component is parametrized by a neural network. Based on…
      Submitted 4 November, 2017; v1 submitted 11 August, 2017; originally announced August 2017.
      Comments: Accepted to NIPS 2017
      ACM Class: I.2.6
  48. arXiv:1708.01565 [pdf, other] cs.CV cs.CL
      Improving Speaker-Independent Lipreading with Domain-Adversarial Training
      Authors: Michael Wand, Juergen Schmidhuber
      Abstract: We present a Lipreading system, i.e. a speech recognition system using only visual features, which uses domain-adversarial training for speaker independence. Domain-adversarial training is integrated into the optimization of a lipreader based on a stack of feedforward and LSTM (Long Short-Term Memory) recurrent neural networks, yielding an end-to-end trainable system which only requires a very sma…
      Submitted 4 August, 2017; originally announced August 2017.
      Comments: Accepted at Interspeech 2017
  49. arXiv:1703.04933 [pdf, other] cs.LG
      Sharp Minima Can Generalize For Deep Nets
      Authors: Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio
      Abstract: …deployed in practice. However, explaining why this is the case is still an open area of research. One standing hypothesis that is gaining popularity, e.g. Hochreiter & Schmidhuber (1997); Keskar et al. (2017), is that the flatness of minima of the loss function found by stochastic gradient based methods results in good generalization. This paper argues t…
      Submitted 15 May, 2017; v1 submitted 15 March, 2017; originally announced March 2017.
      Comments: 8.5 pages of main content, 2.5 of bibliography and 1 page of appendix
  50. arXiv:1612.07771 [pdf, other] cs.NE cs.AI cs.LG
      Highway and Residual Networks learn Unrolled Iterative Estimation
      Authors: Klaus Greff, Rupesh K. Srivastava, Jürgen Schmidhuber
      Abstract: The past year saw the introduction of new architectures such as Highway networks and Residual networks which, for the first time, enabled the training of feedforward networks with dozens to hundreds of layers using simple gradient descent. While depth of representation has been posited as a primary reason for their success, there are indications that these architectures defy a popular view of deep…
      Submitted 14 March, 2017; v1 submitted 22 December, 2016; originally announced December 2016.
      Comments: 10 + 4 pages, accepted for ICLR 2017
      ACM Class: I.2.6; I.5.1
