<!-- Canonical: https://neuralcosmology.com/es/lectures/vanchurin-jaimungal-2026 -->
<!-- Transcript shown in en — no es version yet. -->

# Transcript — vanchurin-jaimungal-2026

Source: https://www.youtube.com/watch?v=73IdQGgfxas
Language: en
Origin: Human-made (YouTube)

---

The universe is tuning itself. It likes to be observed. And so observers emerge not because there are carefully chosen constants of nature, but because if they were not carefully chosen, then they would learn to evolve towards being carefully chosen. Five years ago, an unintuitive and startling result was dropped like a bombshell. Cosmologist Professor Vitaly Vanchurin found a way to model the universe as a neural network where the learning dynamics are the physics. This has huge implications for what the cosmos is, what you are, and potentially what consciousness is and its relationship to everything. As you’ll see in this conversation, this is not another way of saying that you can use neural networks to simulate general relativity or the standard model. Well, that’s been done. Instead, the professor shows that the universe’s own learning is the physics. What happens is gravity falls out. The Dirac equation falls out. Klein-Gordon falls out. The algorithm behind most modern AI, popularly named the Adam optimizer, implicitly carries a curved metric on parameter space. The presence of the curved space is essential, essential for convergence. Space-time curvature is actually there precisely because it makes the universe’s learning efficient. This conversation spans natural selection at the subatomic scale, the Boltzmann brain paradox, Karl Friston’s free energy principle, consciousness as learning efficiency, and the ramshackle state of observer physics, which Vanchurin argues demands a three-way unification of quantum mechanics, general relativity, and observers. My name’s Curt Jaimungal, and on this channel, I interview researchers about their theories of reality with rigor and technical depth, even at the risk of limiting the audience, because this slow, meticulous, candid approach is superior to a fast, flashy, potentially misleading approach. The universe is a black box, but today, Vanchurin opens it. 
We’ll definitely have  a part two, so leave your questions in the comments section. There’s plenty more to explore.  Enjoy today’s episode of Theories of Everything. “Professor, you claim the universe  is literally a neural net, so not that it’s a useful model. It literally is,  ontologically. Justify yourself, young man.” “Okay, not so young anymore, but I’ll try to do  my best. Now, when you’re saying that I claim the universe is a neural network and not just  a model—well, if I did say that at some point, I want to take this back. As a physicist, I am  not allowed to say what the universe actually is. What I am allowed to say is what is a good way to  model it. Because at the end of the day, I cannot really know or check or test, prove or disprove  whether this is how the universe works. But I can test and check whether any given mathematical  model is good for modeling certain phenomena. So if I did say that at some point or somebody  misinterpreted, no. I’m always talking about a good model of describing it. And at that  point, yeah, I have to say it looks like it’s a promising candidate. It’s not a final verdict  yet, but it’s a promising candidate that should be explored—whether it is a good way, a convenient  way, a compact way of describing phenomena in the universe using neural networks. Perhaps how  exactly I want to do it we can discuss later, but I just want to open all my cards and say I  would never claim that this is how the universe works. Now, if we get to philosophical questions,  of course, we can say, ‘What if? What does it mean if this is really how the universe is? What  kind of philosophical conclusion can we reach from that?’ But as a physicist with my physicist  hat on, I can only say this is an interesting, good model, and it works remarkably well in the  places where I wouldn’t expect it to work well.” “Okay, now people who know some things about  neural nets know that they’re universal function approximators. 
So why would it be surprising  that neural nets can satisfy functions, given the universe is described  by functions? It would be more surprising if you had the counterclaim: the  universe cannot be modeled by a neural net.” “I think this is an excellent question. Now I can  actually put my finger on exactly what I mean. It is true. Why should we be surprised? Neural  networks are universal approximators. Why should we be surprised that you can use neural networks,  well-trained neural networks, to reproduce the dynamics that we observe in classical or  quantum systems? With quantum systems, I’ll have to take it back; it’s not so obvious.  But at least for classical systems, yeah, I would say why should we be surprised? The difference  that I’m proposing is that I’m not only saying that the trained network is good at describing a  given function or a given dynamics, but actually the process of training, the process of learning,  is a part of dynamics. So it is different, right? So now it’s not for me enough to just  show you, ‘Here it is. Here’s a trained network. It describes, well, a harmonic  oscillator.’ No. Because to train it, I’ve used trainable variables. I used some kind  of learning algorithms. And that dynamics didn’t disappear. It must be there. It must be part of  me telling you why a harmonic oscillator can be described by a neural network. So if we remove  learning, it’s an almost trivial state. If we say, ‘No, no, no, let’s talk about the entire  dynamical system of a neural network that comes with learning dynamics, that comes  with activation dynamics,’ can that thing together—that combined thing—can that be a  useful model for describing the universe?” “Okay, and what learning algorithm are you using?” “Right. So now, at the very beginning,  when I started studying this subject, I just took the most popular: stochastic gradient  descent. And let’s just see. It looks like the one that used little resources and still does an  amazing thing. 
For me, it was interesting that even a very simple algorithm can produce behaviors  that we physicists don’t have tools to describe, just because it’s a learning dynamic. So that  was like five years ago. And that was all. And it was already some very spectacular results that  would come out of this with the collaborators. We’ve showed that quantum behavior can  emerge. We can discuss all that later. Now, I’ve learned more. I knew about this, but  now I actually understood more that it isn’t just stochastic gradient descent that is interesting  and important for modeling the phenomena, but there are other well-known algorithms that  anybody working in machine learning knows about and uses daily. Adam Optimizer—this  is one example. And it works for many, many problems. It works much better. It  has much better learning efficiency. You train your model and the loss function goes  down much faster. So I wanted to understand more recently: why is that the case? What  is the physical reason why it works better? But now it isn’t just stochastic gradient  descent, but it’s a whole class of learning algorithms. We call it covariant gradient  descent. And covariance comes from the physics definition of covariance that we  can again discuss. And those algorithms, Adam-like algorithms and their generalizations,  give something I haven’t again expected. There’s always something you study in machine learning you  don’t expect and you get it. And in particular, the emergence of curved space and space-time  comes naturally if you are actually thinking about covariant gradient descent algorithms, such as  Adam. So the presence of metric there—whether the people in the machine learning  community know that there is a curved metric or don’t know about this—the presence  of the curved space is essential. Essential for convergence. Essential for the algorithm to be  efficient. 
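
As a rough illustration of the remark that Adam carries a metric, the sketch below takes one plain gradient step and one Adam step on a toy loss whose curvature differs by a factor of 100 between its two coordinates. Adam divides each gradient component by a running root-mean-square of that component, which can be read as applying the inverse of a diagonal, position-dependent matrix to the gradient. The toy loss, the starting point, and the reading of that diagonal scaling as "the metric" are assumptions made here for illustration; this is not the covariant gradient descent construction mentioned in the conversation.

```python
import math

def grad(theta):
    """Gradient of the toy loss 0.5 * (100 * x**2 + y**2) (illustrative choice)."""
    x, y = theta
    return [100.0 * x, y]

def sgd_step(theta, lr=0.01):
    """Plain gradient descent: one scalar step size shared by every direction."""
    g = grad(theta)
    return [th - lr * gi for th, gi in zip(theta, g)]

def adam_step(theta, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; also returns the per-coordinate scaling it applies."""
    g = grad(theta)
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
    m_hat = [mi / (1 - beta1 ** t) for mi in m]   # bias-corrected first moment
    v_hat = [vi / (1 - beta2 ** t) for vi in v]   # bias-corrected second moment
    # Diagonal scaling: each gradient component is divided by its own entry.
    scale = [math.sqrt(vh) + eps for vh in v_hat]
    theta = [th - lr * mh / s for th, mh, s in zip(theta, m_hat, scale)]
    return theta, m, v, scale

start = [1.0, 1.0]
sgd_next = sgd_step(start)
adam_next, m, v, scale = adam_step(start, [0.0, 0.0], [0.0, 0.0], t=1)

sgd_move = [a - b for a, b in zip(start, sgd_next)]
adam_move = [a - b for a, b in zip(start, adam_next)]
print("SGD displacement: ", sgd_move)    # proportional to the raw gradient
print("Adam displacement:", adam_move)   # nearly equal in both coordinates
print("Adam diagonal scaling:", scale)
```

In this sketch the plain step moves 100 times further along the steep coordinate than along the shallow one, while the first Adam step moves about the same distance along both, because the scaling entries absorb the difference in gradient size.
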
So yes, originally it was stochastic gradient descent, but there are new things  coming up every month that I learn about.” “So the audience of this podcast are researchers  in computer science, but also researchers in physics and math, and that’s more on the  hardcore STEM side. But then there’s a large swath of artists and miscellaneous laypeople.  It’s quite interesting because it’s one lump here that’s quite hardcore, and then another  lump on the softcore, let’s call it, side. And it’s interesting that the overlap is small.  It just goes extremely nitty-gritty PhD level or much more layman wondering about ontology  and philosophy and so forth. I forgot to mention there’s also researchers in philosophy.  Okay. To those who are not computer scientists, a neural net is what? What is the  minimum someone needs to know?” “Right. So I mentioned it earlier, but let  me just discuss it one more time. The neural networks come with this one feature that even  I as a researcher in physics wouldn’t know, and that’s the learning dynamics. And  that’s the dynamics that is taking some function that machine learning researchers  call a loss function—some cost function, you can call it—some function that you’re  trying to optimize. What is it that you’re trying to do? You don’t have to be a  scientist or a researcher to understand that there is some kind of optimization.  There is something that you optimize. And so the neural network dynamics  comes with that something, that goal, you can say. What’s the goal of the system?  What is it trying to do? It’s trying to speak the English language without mistakes. Or  it’s trying to do speech-to-text recognition. Whatever it is trying to do, this is the  difference. The presence of that objective function, the loss function, is something  that’s essential. And that is, well, I’ll stress once again, it isn’t something  we’re used to in physics. 
And it is something that machine learning people are used to  in machine learning research, but not us. So I had to kind of try to get all of the nice  experimental results obtained from machine learning, try to use our toolbox that we use in  physics, and try to understand it. But yes, if we are trying to tell this story to people who are  not running models every day or writing equations every day, then that is the difference. So you  have a system that has this one boring dynamics that we know about before: activation dynamics.  There’s some state and it keeps changing according to some law. And then there is this learning  dynamics, that there is some objective function that the system is trying to optimize. So the  presence of those two things is essential.” “Now for a neural net, okay, well, firstly,  optimization—physicists do know about it if they’re doing any minimization of the Lagrangian  or extremal point of the Lagrangian. So is there something particular about the way that  the technique of optimization from neural nets compared to other optimizations  that’s well-suited for describing the fundamental laws in such generality  that you’ve been able to find out?” “Good point. Yeah. So we do use the variational  principle, right? So we study the extrema of the Lagrangian’s action, actually. So we  are interested to find certain solutions. We take this beast, which is called action, we  vary it with respect to degrees of freedom, and we are interested in its minimum or  maxima. Now, what is new here is that you are not only interested in the minimum or  maxima. You are interested in the entire trajectory from whatever you started with to  whatever complicated state you’re going to get. And that complicated state will certainly  satisfy some variational principle in some sense. And that’s where you will kind of,  because of that, see the emergence of some kind of classical-like behavior. 
But even  out of this equilibrium, right, there is this whole evolution that takes you—learning  optimization evolution—to reach the minimum or maximum that is present in optimizing machine  learning systems and isn’t present in physics. Now, I have to correct a little bit because,  you know, right now physicists adapt machine learning to solve lots of problems as a  tool. Not as a model of physics, but as a tool, right? So let’s say you have a very  complex quantum many-body system. You’re trying to find its ground state. You’re  doing some difficult problem. And so, of course, you’re going to use all the tools there  are, all the computational tools there are, including using machine learning. Now,  when I’m saying that physicists aren’t used to optimization as a model of a system  that they’re trying to study, as a tool, absolutely. This is a great tool, and it’s  been used by physicists more and more now.” “What is the input into this neural net?” “Right. Okay. So if we are talking about the  neural network as a model of the entire universe, let’s say, then that’s all there is. This is the  state. You describe the state of all neurons. You describe the state of all connection weights.  And that’s the state of the system. That’s your input. This is your initial state. So in physics,  we actually have a very nice setup of modeling everything. We say, ‘Well, you need two  things. You need to know the state and how it evolves.’ Now, quantum mechanics again  putting aside, those are the only two things you need. So the state or the input of this  neural network is the state of all neurons, and the state of all of the trainable and  non-trainable variables. And then they evolve according to, on the one side, activation  dynamics and, on the other side, the learning. So this setup that physicists came up  with—actually mathematicians have a much more general setup, a dynamical system setup. 
Then  they don’t even bother whether the dynamics is Hamiltonian or, you know, satisfies some kind of  constraints. There’s like an energy-like function; they don’t care about that. So what I’m talking  about, it is a dynamical system, but it isn’t a dynamical system in a sense where you restrict  yourself to classical Hamiltonian-like dynamics.” “In traditional physics, the input may be the  state at time zero, and then the output may be the state at time T. In neural nets, let me  just talk about an image classifier. So an image, let’s say you’re given an image of a dog and  it’s just a square image and maybe it’s 20 by 20 pixels. And so it’s 400 pixels. Then  you have 400 numbers as your input—I mean, if it’s a grayscale. And then  at the end, you want to know: is it a dog? Is it a cat? Is it a flower or  what have you? So however many categories you have here is your output. So what is the input  on this side? Is it the whole state of the whole universe? And then the output is the whole  state of the whole universe again? What is it?” “No, no, but very good question again. This  is—so now you put your machine learning head on and said, ‘Okay, here’s like, I  understand what you’re talking about.’ And I’m saying no. So in this sense, the  entire network with the input and the output—this is the output before you even started  propagating your image through and figuring out whether it’s a cat or a dog. This whole thing,  the state of all of the degrees of freedom, is the state of the system,  not just input—the whole thing. Now, in the case of the cats and dogs classifier,  it happened to be that in your problem, there is a clear distinction between what you are  calling input and what you are calling output. So there is a kind of flow of information in  this direction. But this is just because you set up your network that way. You didn’t  have to do that; you could have used recurrent networks, you could have used a lot more  complicated loss functions. 
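
The point that the state is "the whole thing" and not just the input can be sketched with the 20 by 20 image example. In the minimal sketch below, the state holds the 400 pixel values, the 400 trainable weights, and the output neuron together, and time evolution alternates an activation step with a learning step. The single-layer logistic model, the random pixel values, the label, and the learning rate are all hypothetical choices made for this illustration, not anything specified in the conversation.

```python
import math
import random

rng = random.Random(0)

# The full state: every neuron value and every weight, not only the "input".
state = {
    "pixels": [rng.random() for _ in range(400)],  # 20 x 20 grayscale image
    "weights": [0.0] * 400,                        # trainable variables
    "output": 0.5,                                 # output neuron before propagation
}
LABEL = 1.0  # hypothetical target: 1.0 stands for "dog"

def activation_step(s):
    """Activation dynamics: propagate the pixels through the current weights."""
    z = sum(w * x for w, x in zip(s["weights"], s["pixels"]))
    return {**s, "output": 1.0 / (1.0 + math.exp(-z))}

def learning_step(s, lr=0.01):
    """Learning dynamics: one gradient step on the cross-entropy loss."""
    err = s["output"] - LABEL  # derivative of the loss w.r.t. the pre-activation
    new_w = [w - lr * err * x for w, x in zip(s["weights"], s["pixels"])]
    return {**s, "weights": new_w}

def loss(s):
    """Cross-entropy for the case LABEL == 1."""
    return -math.log(s["output"])

losses = []
for step in range(5):
    state = activation_step(state)   # the state changes by activation
    losses.append(loss(state))
    state = learning_step(state)     # the state changes by learning
print(losses)
```

Both steps take a full state and return a full state, so the classifier's "input" and "output" are just parts of one evolving object; the one-way flow of information is a property of how this particular network was wired.
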
So for example, in this case, your loss function would be, ‘Well, did I get it right? Is it a dog?’ Zero or one at the end. But it is just that you’re talking about a restricted class of machine learning problems. In this case, the information really flows in one way. Now, if we have the entire network, the entire universe described by a neural network, it may happen that at some place there is like only a left-going wave or right-going wave, where information only goes one direction. But that’s because of the initial conditions that you set up, not because your network cannot start up with some other states. And so imagine in your example with the cats or dogs, imagine that the zero and one that you got at the end, you loop it back to the input. And then now it may not do something that you want it to do, but it will run. You will get this pixel change, changing one of the pixels in what you call input and then going through. So in this case, still the whole thing is the state of the system in the previous time step. And then once one step of activation took place, in the next time step, another step of activation took place in the third time step and so forth. And that’s kind of the time evolution of the activation dynamics. And then there is learning, right? So then they have to upgrade your weights, which is a game. And you can just keep going, keep activating and learning.” 
“Some of the key equations in physics are general relativity’s—also Einstein’s field equations—or Dirac or Klein-Gordon. I know that you’re not able to, with your words, say how you derive them exactly in such a way that it’s rigorous. But we can, of course, point to your papers and lectures on screen right now. Now, either way, can you just walk us through as much as you can with your words as to what you started as your input and how were you able to get these as outputs?” “Sure. So let’s start with the field theory. So we know very well that the standard model of high energy physics is very well described by the collection of fields. So if you want to get that physics out of your mathematical framework, you want to show how fields will emerge, how we would get fields out of it. Now, it’s a difficult task. So let me just put it right away, and it’s not something that I can say, ‘Well, here it is. I get quarks, three generations. I get everything and it’s simple and I can write one paper and go home.’ No. It’s not even close. It took years to get the Dirac equation out of it. Okay? 
So  Klein-Gordon was easier. Hamiltonian mechanics was easier. Getting fermions, getting the Dirac  equation—it turned out to be a difficult task. So just since we talked about this direction  of information flow, it turns out that for the Dirac field, some tensor factor, something  in your neural network setup has to have an antisymmetry in it. So it has to be  antisymmetric. And so if you put that in, if you put this constraint—now, why would you put  this constraint? I don’t know. Is that something that this constraint was learned because of  some kind of microscopic optimization algorithm that’s running? Great. Can I show it? No. What  I can show is if I assume a certain constraint, if I only take into account certain  trainable degrees of freedom—that’s essential. So we cannot throw away trainable and  certain non-trainable. Then the dynamics resembles lattice field theory where individual nodes  would be like neurons and they would have some very precisely defined connections to each  other. It’s not like any connections would do the trick. So as I said, getting Klein-Gordon or  scalar field equations was easier. It’s more generic. Getting something like Dirac is harder.  And I’m not there; I’m not ready to write down the standard model Lagrangian and say, ‘Well,  here it is.’ So that’s for the field theory. Now the other part is the  Einstein equation. Once again, telling you that I have finalized my understanding  of how the Einstein equation emerged from this framework would be a lie. This is not true.  But what I do know, I do know how to get emergent space—curved space—from it. I also know  how to get emergent space-time from it. That again—I mean it like a subtle difference  that most people wouldn’t pick up on.” “Okay. Expand on that, please.” “Sure. So, you know, space you can probably  understand by showing the surface of an apple or a potato chip. 
And you’ll say, ‘Well,  it looks like a two-dimensional surface.’ And since we are three-dimensional beings, it  was easy to look and say, ‘Well, yeah, it’s curved.’ It’s not something flat. I cannot put it  on my table, which is flat. And the same for the potato chip. If I take a potato chip, which has  negative curvature—an apple would have positive curvature—if I put it on the table, it  wouldn’t be lying down. So that’s kind of our understanding as three-dimensional  creatures of what curvature looks like. Now, this concept can be generalized to 3D. Now,  I cannot actually draw it or move my hands because I am in three dimensions, but we know the tricks.  We know the tricks how to do these calculations, how to imagine. We even know how to draw  three-dimensional objects on two-dimensional pieces of paper. So it’s not so surprising  that we are able to carry out calculations in 3D. And so when I’m saying the 3D curvature, I  mean the three-dimensional space which is curved. And that turns out to be actually not some  feature of this theory; it should be a feature of any theory of everything. So if your theory  doesn’t produce in some limit the emergence of the curved space, then you are against  Einstein. And of course, this is one of the most beautiful theories that we have, and we  cannot just throw it out of our considerations. Okay, so that’s three-dimensional space. Now for  the space-time, that again involves a little bit of—if you want to explain it correctly,  you have to write equations. But since the audience is by a bimodal distribution, we  should try to explain what space-time means even in that sense. So what turns out  to be is that when you’re talking about space-time or space, you have to tell how  you measure distances. So what do you mean by distances between two points? If you have  that definition of how to measure distances, which has to satisfy certain requirements,  then you know what kind of space you’re dealing with. 
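
To make "how you measure distances" concrete, the small sketch below contrasts the ordinary Pythagorean rule for space with the rule for space-time, where the squared time separation enters with the opposite sign. The function names, the choice of units with c = 1, and the sign convention (time term negative, space term positive) are assumptions made here; the conversation only says that one of the squares has to be subtracted.

```python
def euclidean_distance_squared(dx, dy):
    """Space: add the squares (Pythagorean theorem)."""
    return dx ** 2 + dy ** 2

def spacetime_interval_squared(dt, dx, c=1.0):
    """Space-time: the time square is subtracted instead of added.

    Convention assumed here: time term negative, space term positive.
    """
    return -(c * dt) ** 2 + dx ** 2

print(euclidean_distance_squared(3.0, 4.0))   # squares are added
print(spacetime_interval_squared(1.0, 1.0))   # light-like separation
print(spacetime_interval_squared(2.0, 1.0))   # time-like separation
```

With the plus sign, a squared distance is zero only when the two points coincide. With the minus sign, distinct events can be separated by a zero or negative squared interval, which is the "huge difference" that single sign introduces.
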
And the apparatus for that, we  call it a metric tensor. Not very important. And space-time comes with something very,  very strange at first. You tell any student that this is how distances should be measured  and they will question why this looks bizarre. It turns out that to measure distances  in the Euclidean space or in just space, you take like X squared plus Y squared and take a  square root of this—the Pythagorean theorem. Well, it turns out that if you are working in  space-time, this isn’t true. You should not be adding the two squares and taking a square  root. You should be subtracting. So one of those squares which corresponds to time coordinates has  to be subtracted. And because of this stupid sign difference, there is a huge difference between  space and space-time. And so it took some time to get the curved space. But if you cannot get  space-time, then again your theory is not in agreement with observation. And we do observe  a curved space-time. You know, my background is in cosmology and space-time is important  there. I hope I wasn’t too technical because—” “No, no, no. And I have a technical question.  You’re absolutely right. Actually, I love that you said that because this is always true. There  are those who actually know the terminology and would appreciate me speaking more like  using the physicist or machine learning terms, and those who don’t and you don’t want to bore  any one of those. So the aim of this podcast is to aim toward researchers, toward postdocs  and graduate-level PhDs and professors and so forth. And the advantage of this podcast, or  the niche part of it, the difference in it, is that it’s as if for that other distribution  of people, they finally get to peer into what it looks like when professors are talking. I’m  not a professor, but you get the idea. So, okay, you mentioned lattice field theory,  and lattice field theory has a problem with fermion doubling. 
So I’m curious if anything  about your approach helps solve that problem.” “No, and we’re not there yet. Not even close to  actually fitting the lattice-like field theory. I shouldn’t say it’s lattice field theory because  it isn’t, but it is lattice in a sense of how their weight matrix is arranged. So you have a  lattice. Now, this is actually why I’m not happy with this particular model of how fields emerge.  Okay? There is now another one, another approach, which I wasn’t able to take as far as getting  fermions out of this. But the approach is that it’s closer to particles as opposed to  fields. Now, we do know the fields work better than particles and particles are kind of  only a good description of certain limits, right? But so speaking of that second approach, neurons  or some subnetworks, they behave like particles in the emergent space. And that emergent  space is the space of actually trainable variables. And I already mentioned the  Adam-like algorithm. And so, you know, machine learning people would say, ‘Okay,  now I know that Adam comes with metric and there is curvature.’ But from the point  of view of the physicist, it’s more like there is a second approach again. As I said,  the theory is not final. And so you take all approaches you can and you’re just trying to say,  ‘Okay, well, what can I say? Can I get fermions?’ And in the second approach, it is as if what  you have is some kind of sub-network of neurons that are doing their usual business—activation  learning dynamics. But the motion is considered in the space of trainable variables. And that space  does not have any lattice structure. It is just a completely continuous space. And then you do have  places in that space where no states are occupied, like vacuum. And even if there are, once in a  while, certain neurons appear to have such and such configurations of the trainable variables,  this is not a field. 
It’s a kind of discretized, more like similar to particles,  as I said, but also strings, right? Strings are assumed not to actually be  fields in the sense of occupying—they’re like one-dimensional objects. So I think it’s probably  a good idea to say fields that work extremely well are three-dimensional objects plus one  time. Strings are one-dimensional objects plus time. And those neurons in this second picture  are like zero-dimensional objects plus time.” “Right, right. I think it will be super  useful for people if on screen right now, the video editor will place in what a  neural net looks like. So we’re then giving tutorials on what neural nets are.  And then I think what’s useful would be for you to say what your theory is not saying.  So for instance, in the beginning I said, ‘Why at all is this surprising if a neural net  can approximate any function?’ And you’re like, ‘Well, but that’s not what we’re saying.’ Okay.  Something else you’re not saying, and again, referencing this image, is you’re not saying  that each one of these nodes is somehow space discretized. You’re not saying that. Neither are  you saying that this is a hypergraph model like a Wolfram model. Okay. So when you start to talk  about this with your colleagues, what else do they think you’re saying, but you’re like, ‘No,  no, no, that’s not what I’m saying. It’s this’?” “Yeah. So the first thing you identified right  away and you were absolutely right. That’s what people think. And I’m saying no. Learning  must be there and you’re absolutely right. Now, the second thing is—so I am in  the superposition of saying and not saying it. So I’m saying there are two  possibilities, and both of them are being explored. One possibility is that it  is like lattice space, whether it is a square lattice or some other lattice where it  has triangulations, some hypergraph-like model, and that is a possibility. And that  is a possibility that I’m exploring.

In distinction from the models, other models where  you have this network or graph-like structure, is that I am constrained to how this network  will evolve. I’m not able to just say, ‘Look, you have a graph. Now I want it to form  a torus,’ or ‘I want it to be flat.’ I’m not able to just impose rules without saying  where they come from. So where they come from is for me to specify actually the one most  important object in this entire theory. Like, you know, in physics we have one object  that kind of describes the entire series: the action or Lagrangian. You give it  Hamiltonian and you’re done. So here you have to specify a loss function. And  a loss function is a very strict—it’s not, I cannot write, you know, it’s a scalar,  right? So you have to pay attention to that. And so if you want to use this hypergraph-like  structure and you want to see how it evolves and you know that experiments suggest that you have  to form such and such approximated geometry, you have to go back and say, ‘All right, what  loss function would give you that?’ And that kind of puts the limits. So this is one approach which,  again, I say I don’t say because in this approach, I do say that. In this alternative approach—it’s  like two types of neural network theory, if you wish, right? Type one is that, yes, you  discretize it and you work with it. Type two, your space is the space of trainable  variables and things evolve in that space. And there are pros and cons of both approaches.  And in the first approach, your space is discrete. There is nothing between the nodes. In the  second approach, your space is continuous. Continuous trainable variables—you know, if they  were not continuous, you wouldn’t be able to use gradient descent or Adam or whatever. And  you try both things. And you see that one approach helps you. And it’s very similar to what  we do: particles versus lattice field theory. Both come with problems. 
But yeah, I don’t want  to say that I don’t say that, but I say that in addition to that, I also investigate this other  possibility that actually recently proved to work better in a sense because the curvature emerges  not because I assembled my graph in a certain way, but because it is an algorithm which is more  efficient. So the curvature is a way for the learning to be efficient. So it’s like a  direct way of saying where the curvature, where the geometry comes from. I do know  that in machine learning, in the literature, people are using Adam. They’re not using the  terminology of the curved space of trainable variables which emerge from learning. It’s again,  it’s not something you specify ahead of time. It emerges as an efficient algorithm. But this is  all the way to do it, what I’m saying here.” “So it’s not the curvature of the loss landscape that corresponds to the curved  space-time of our universe?” “Oh, absolutely not. No. It’s like  you were saying—I think it’s a very useful analogy when you’re talking about a loss  function—is to think about a Lagrangian. So it’s not the curvature of the Lagrangian landscape  that gives you the curvature of space. No. It is the degrees of freedom in the space. So  there’s a Lagrangian, which we call metric, which describes a space which is curved. So, yeah.  So there is a big distinction. And same here. Now, maybe this is a good point to pause. Since I’m  drawing this connection between Lagrangian and loss function, originally I was associating  the loss function with more of the energy. And because, you know, there were like—so there’s  a stochastic thermodynamic description where a canonical ensemble naturally would emerge from  that picture. Later, I understood that adding a kinetic-like term to the loss function actually  makes learning in certain situations better. 
So like, you know, you add one more term, which isn’t  maybe the term that you are trying to optimize, but once you add it to the loss function,  and once the loss function uses this term, it learns fast. So there is this little bit of  a new twist. If you want to minimize something, you may actually use a stochastic gradient descent  with an additional term. And so in this case, I think it’s a very good analogy  to think about a loss function as a Lagrangian, although they are  different objects, of course.”
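
The remark about Adam and a curved space of trainable variables can be made concrete with a small sketch. It rests on one common reading that the speaker does not spell out here: Adam's running average of squared gradients acts like a diagonal, position-dependent metric that rescales each coordinate of the step. The names `grad`, `plain_gd_step`, `adam_step` and the quadratic toy loss are illustrative assumptions, not anything taken from the lecture.

```python
import numpy as np

def grad(theta):
    """Gradient of the toy loss 0.5 * (100 * x**2 + y**2): steep in x, shallow in y."""
    return np.array([100.0 * theta[0], theta[1]])

def plain_gd_step(theta, lr=0.1):
    """Ordinary gradient descent: the same flat metric in every direction."""
    return theta - lr * grad(theta)

def adam_step(theta, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with the diagonal preconditioner made explicit."""
    g = grad(theta)
    m = beta1 * m + (1.0 - beta1) * g         # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * g ** 2    # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)            # bias corrections for early steps
    v_hat = v / (1.0 - beta2 ** t)
    metric_diag = np.sqrt(v_hat) + eps        # assumed reading: diagonal metric g_ii
    theta = theta - lr * m_hat / metric_diag  # step rescaled per coordinate
    return theta, m, v, metric_diag

if __name__ == "__main__":
    start = np.array([1.0, 1.0])
    print("plain GD after one step:", plain_gd_step(start))
    theta, m, v, metric = adam_step(start, np.zeros(2), np.zeros(2), t=1)
    print("Adam after one step:    ", theta)
    print("diagonal metric:        ", metric)
```

On the first step the steep and the shallow direction move by the same amount, because each coordinate is divided by its own gradient scale, while plain gradient descent with the same learning rate overshoots in the steep direction. Whether this diagonal rescaling is the metric the speaker has in mind is an assumption of the sketch.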

“You said that the quantum dynamics were extremely difficult or what have you. So walk us through the insight when you were studying machine learning. Why were you even studying machine learning? You mentioned cosmology; I don’t know about its connections there. So you and your particular use case. Anyhow, walk us through it. Why? Circa six years ago or so.”

“Yeah. Okay. So, six years ago, I was on sabbatical leave. So when you’re on sabbatical, you can do whatever you want, right? So first, I finished the project that I was interested in at that time, and that had to do with certain dualities of quantum mechanical and strongly coupled systems that I thought would be a good candidate for describing curved spaces and the quantum gravity aspects of that. And then, I had time. And so, I attended many, many talks by machine learning people who would present nice slides, nice results, and no formulas. You know, like stochastic gradient, in a sense. Something that kind of looks trivial. And about neural networks they would always say, ‘Well, it’s like a black box.’ Black box meaning it works, we don’t really know why. So I had time, I had a few months, and I said, ‘Okay, why not just try to open this black box?’ You know, because the universe is also a black box. Nobody in the beginning told us that this is the Standard Model. And somehow we came up with the tools of Lagrangian and Hamiltonian mechanics to actually understand why it works. Maybe not understand why this particular Lagrangian, but understand at least how to model it. So that was my motivation. And at that point it had nothing to do with quantum mechanics and so on. At this point, I see the system with many degrees of freedom. They evolve according to learning and activation dynamics.
So I knew that some of  the physics will be relevant, but not all of it, and because it would be more complex. But if  you see a system with many degrees of freedom, your first reaction will be maybe you can discuss  ensembles, statistical ensembles. And maybe in some limit, you can understand how the system  behaves in what we call an emergent regime. So something like: can we have a thermodynamics of  machine learning? So something along this line. I knew it was different because of the  long-distance time, so that I figured out sooner. But can we have a certain thermodynamic  description which can be verified? Now, is there a notion of temperature? Is there a notion of  entropy? Are there the first and second laws of thermodynamics? Would they still hold or do they  have to be modified? So that was kind of—you know, you have a toolbox that you think should  be the first you try to model the system. And that’s the direction I went. Before, as I  said, quantum mechanics was not on the horizon. But then I saw that because of the learning  dynamics, the system just doesn’t go to a boring canonical ensemble distribution and stays  there. It has a very interesting behavior even in equilibrium because of the presence of those  two different dynamics: activation and learning. And so the idea was to set  up some kind of variational principles. So we’d be able to do a principle  that maybe describes it beyond thermodynamics. So first, you know, thermodynamics—are there any  macroscopic objects like temperature or entropy that we describe? But maybe we can go beyond  it. Since this equilibrium—I call it learning equilibrium—is kind of boiling, and then, you  know, things fall out of the equilibrium and go back and it’s kind of not—can you really  zoom in and say, ‘Well, I don’t want to just calculate temperature, pressure, volume,’  whatever you usually do? Although you have to still define all those things for a machine  learning system. 
Can you zoom in and say, ‘I’ll pay attention to, let’s say, only trainable  variables. I will still integrate out and kind of coarse-grain over non-trainable,  but will pay attention to trainable’? Well, the reason for that was that we know that  the non-trainable, they flip very fast. You know, you’ll put your input image and get  cats and dogs out in your example, right? Zeroes or ones. So this activation  goes fast. And the learning goes slowly, and then you calculate your loss function  and gradually propagate changes back. So I knew there were two scales, and if anything we  know in physics, that’s what we should do. We should integrate out, remove the irrelevant  information, and keep the relevant. And so that’s what I did. So I integrated out this  and said, ‘Okay, how does this system behave?’ And it turned out that the behavior of these  trainable variables—if you assume, you have to assume a certain principle for how entropy  changes—so I assumed maximum entropy production, extreme, stationary entropy production principle.  But if you assume that, the equations you derive from that are the Madelung equations. Now,  Madelung equations, again, those who know, know, but those who don’t know, it’s close to quantum,  but it isn’t quantum. So there is still this step of—you know, quantum is this theory where  what propagates is not the probability, but a square root of probability, and that’s the  relevant degree of freedom. And that comes with this quantum phase, complex numbers, right? So we  all know that. And so if you only pay attention to the Madelung equation as your limit, this  isn’t quantum yet, but it gives you a hope. So maybe you can actually understand why this complex  phase would emerge. What is the physical meaning of the complex phase? Not like exactly quantum,  but it gives you a hope for quantum mechanics, but maybe as an emerging quantum mechanics. 
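
For readers who have not met them, the Madelung equations mentioned above are written out here in their standard textbook form for a single particle. In the learning setting the coordinates would be the trainable variables, but the conversation does not spell out how the mass $m$, the potential $V$, or $\hbar$ map onto learning quantities, so the symbols below are the textbook ones and their use here is an assumption.

```latex
% Polar (Madelung) form of the wave function
\psi(\mathbf{x},t) = \sqrt{\rho(\mathbf{x},t)}\; e^{\,i S(\mathbf{x},t)/\hbar}

% Continuity equation for the probability density
\frac{\partial \rho}{\partial t} + \nabla \cdot \left( \rho\, \frac{\nabla S}{m} \right) = 0

% Hamilton-Jacobi equation with the quantum potential term
\frac{\partial S}{\partial t} + \frac{|\nabla S|^2}{2m} + V
  - \frac{\hbar^2}{2m}\,\frac{\nabla^2 \sqrt{\rho}}{\sqrt{\rho}} = 0

% Single-valuedness of the phase, needed to recover the Schrodinger equation
\oint \nabla S \cdot d\boldsymbol{\ell} = 2\pi n \hbar, \qquad n \in \mathbb{Z}
```

The first three lines describe a probability fluid with density $\rho$ and velocity $\nabla S / m$. Only together with the last condition, that the phase returns to itself up to a multiple of $2\pi$, are they equivalent to $i\hbar\,\partial_t \psi = -\frac{\hbar^2}{2m}\nabla^2\psi + V\psi$, which is one way to read the statement that the Madelung equations are close to quantum but not yet quantum.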
And then I collaborated with Mikhail Katsnelson, and he pointed out, correctly pointed out, that we need the discreteness of the phase for this to work. We don’t have to call it a phase, but something has to be discrete. Something you change discretely and your loss function, let’s say, doesn’t change much, or the dynamics doesn’t change. So that’s what the meaning of the complex phase is. You rotate it by two pi and you come to the very same point. So having this within the system, it’s like having h-bar, you know. Without this, it isn’t quite quantum yet. And so it took this little extension to the original derivation of the Madelung equations, which are like almost classical quantum equations. And that came from a very interesting picture that we suggested, one that you can actually explain to artists, to anyone. Okay? So you have your system, a learning system, and yes, you pay attention to trainable variables, and then, yes, they follow almost a Schrödinger equation. But for it to follow a Schrödinger equation, the system has to have access to a bath, to a reservoir of neurons that it can borrow. It’s like, you know, external resources. You run your machine learning system, but you say, ‘Well, if you need, here are a few more neurons you can use, you can plug in. If you don’t need it, just give it back.’ And so if the system has this access (in physics, we call it a grand canonical ensemble), you move from a canonical ensemble to the grand canonical. So if you do have that, if in your algorithm you provide that option, whether it is an option that you provided by actually programming it this way, or whether it’s an emergent phenomenon where certain neurons stop working and start working, stop working and start working. If you do that, then it turns out that you do get a Schrödinger equation.
In some limits, again, it’s not an exact Schrödinger equation, which means that in certain  limits it should be violated. But with that little twist, with this space of neurons that you can  kind of hire—like you hire to do some work, and then once you don’t need it, you put it  back—with that, your dynamics effectively becomes linear, because Schrödinger is linear. So you  have a nonlinear system that’s not linear and described by the Schrödinger equation. And then  everything fell into place. And at that point, it was actually more than just a proposal for  some abstract theory that in certain limits can describe Madelung-Planck equations, which isn’t  quite—but it is actually—you could see that, well, maybe this quantumness can emerge from  a completely—it was a completely classical system. You have to understand. So there’s  no—it’s not quantum machine learning. It’s classical machine learning, but with this little  twist, it acquires this quantum-like behavior.” “I do want to get to consciousness. Before  we get to that, I have a sort of a silly question. So in machine learning nowadays,  there’s a huge field of interpretability. So people want to peer into the black box. And it  makes sense to some degree when it comes to LLMs, because there’s semantics underneath  of what LLMs are trying to capture. So you’re trying to understand what are  the LLMs doing. But when it comes to the universe more abstractly here in your  model, is our universe interpretable?” “Right. So, let me start with machine  learning, then move on to physics, and then maybe to a more general answer  to the question. So in machine learning, the fact that we need to interpret how machine  learning systems work, what’s happening inside, is essential. And I already said that physicists  have dealt with this problem. And our tool was, yeah, model the dynamics of this system,  but then model it as a Hamiltonian system or as a Lagrangian system. So that  was our way to dig into the black box. 
Now, with LLM-like models, or with any  other machine learning systems, you start with designing a certain architecture. So  architecture can be written also in a certain way that is interpretable in a sense—you will know  exactly which blocks do what. And that would be similar to writing not just  something that produces results, but something that produces results  and you know why it produces results, because there is a term in the loss function or  in the Lagrangian. And if you remove that term, then it would produce different results. Maybe  your LLM would not work. Your ChatGPT would break. And in that sense, this is a way to  model and understand how the interiors work. So maybe this is the term that is responsible  for math being right in the large language models. This is a term that is responsible for  a good translation between certain languages. This is a term that’s responsible—and then you  can kind of concentrate on the term and say, ‘Okay, well, I don’t like my LLM model  is not producing good calculations of tensors,’ which it isn’t. So if you look  and try to kind of do tensor calculus, all models I tried, they all at some point start  producing garbage results. And so you have to kind of locate it. There are certain tasks that you  don’t know how to do. It may be just the term that you have to tune in your loss function,  in the Lagrangian, in the architecture. Now, the same thing we—and this is interpretability.  How do we interpret why certain things work and certain things don’t? What is it inside of the  state of the neural network that is responsible for one thing and another? How to do alignment,  right? What is it there that I can say to my neural network and then in some sense it will  align with my interest, with my loss function? Yes. So we know how to do it with physics. I  think this toolbox in physics can be useful to enhance our interpretability—interpret how  the LLMs work, how machine learning systems work. 
Now, coming back to the physics, well, we should also be able to dig deeper than just writing down a symmetry group and saying, ‘Okay, well, this is the Standard Model. This is it.’ You know, we can start asking questions: why is that the case? You know, what is it? Because in this description, the field theories and everything we observe at microscopic levels are emergent phenomena. They would come from some microscopic loss function where, just to throw out one example, you know, maybe each neuron wants to minimize entropy or maximize certain local laws. So if it wants to minimize entropy, it’s not a good idea to connect to all neurons because then it will be chaotic. Every state, every next time step will be chaotic. But not connecting to anything is also not good. So maybe each neuron individually, on a microscopic level, will try to find some balance. And then there’ll be some kind of RG flow where this is going to be a good idea. This microscopic loss function would give rise to more macroscopic behavior, and then we would say, ‘Okay, well, at this level, it is described by the Standard Model.’ But the flow doesn’t stop there. And then you go to the biology level. And a lot of the work that we’ve done, again, we may touch upon this during the discussion of consciousness. But there, again, I mean, it doesn’t mean that at a biological level, it should be the Standard Model that governs the correct description. But if you want to identify how the interiors work, you can kind of RG flow it and understand how things work. Now, for the more general audience, this is just a way of saying we are describing the universe around us, and depending on the scale on which it is discussed, different languages are appropriate, more or less. So on a very microscopic level, the language is maybe neural networks. On a bigger level, maybe the right language is field theories.
Now even bigger: genotype or  phenotype, and then even bigger. And so you should not be surprised in this approach that on each  level there is just a different language which is correct for describing certain things. On each  scale, on each energy, different configurations, there’d be different languages which are more  appropriate. So that’s kind of what we are saying here. And it shouldn’t stop here. You know, once  we move on to gravity, cosmological scale, yes, we are trying right now to use the language  of field theories to describe cosmology, inflation, gravity, but things don’t quite  work on a cosmological level. There is this dark energy problem, there is the dark  matter problem. And so maybe once we adjust the language and start describing  those scales using a different language, different modeling, then you may have  more agreement with the experiment.”

“So would now be a good time to talk  about your second law of learning?” “Sure, sure. I mean, I touched upon  this, so I think it’s important to emphasize that just like the second  law of thermodynamics, we really like it, but we should understand that it doesn’t work.  It works all the time until it doesn’t. So we should not take any macroscopic laws—they  should be taken with a grain of salt. Because—but nevertheless, it is useful for  many, many different calculations. And so the second law of thermodynamics tells us that  the entropy should grow. And then if you really take it literally, then you will have  a hard time explaining the emergence of life. You’ll have a hard time explaining many  biological phenomena if you want to be honest.” “Wait a moment. Why would you have a difficult  time explaining the origins of life? Because it’s always that global entropy increases, but  you’re going to have local pockets of order.” “Right. Okay. But if you’re only talking  about global entropy, there is just one equation, there’s only one number. It’s not very  interesting. What’s interesting is what—and okay, it increases or decreases, it doesn’t matter.  What really happens locally, because, you know, in thermodynamics, it says you have any subsystem  come big enough and locally entropy will grow. And so instead of just having one number in the entire  universe that you’ve observed that grows, more interesting is to pay attention to what happens in  the different subsystems, and then explain how the entropy—and how you define the entropy. It’s also  like, you know, because if you talk about gravity, it’s very important to think about how you  actually define entropy. This is—now you have a space that is curved. Now, if gravity pulls  things together, wouldn’t it mean that the entropy decreases, right? So you have, kind of, before  there would be things distributed everywhere and they’re pulled together. 
Now, naively, you would  say it decreases, but then you say, ‘Well, no, no, no, no, no. I’ll define entropy a different  way. I’ll catch the problem so that it doesn’t.’ But again, if you only pay attention to one number  that you want so badly to increase, then okay, that’ll be—I’m saying that the usefulness of the  second law of thermodynamics is that it can be applied for many subsystems, very successful.  And then when it—but then there are certain things that with this you will have a hard time  explaining. And as a cosmologist, I have to say, you know, we have one beautiful theory in  cosmology called the theory of cosmic inflation that is kind of hopeless. It doesn’t know what  to do if you try to assign probabilities for observers to emerge. And if you use the usual  classical mechanical or physical approaches, what’s the probability of a certain observer to  emerge? Is it easier for us to go through this, you know, inflation, then galaxy formation, then  biology formation, or it’s just easier for us to form Boltzmann brains that are floating in the  empty space with the memory of us thinking that we are in the meeting right now? And you  will see that in this very successful—for otherwise—models of cosmology, you will see  that very often you get the answer that, yeah, all this what we think is correct, it just gives  you a lower probability. It’s a higher probability for you just to nucleate out of nothing. And  that’s called the Boltzmann brain problem. So no, it’s—first of all, once you  enter gravity territory, cosmology, you have to be careful about what you call  entropy. And then there is this problem of defining probabilities for us observing what  we observe. So people employ different ideas, one of which is an anthropic principle. Well,  let’s put that in mind. So maybe we should not be just a random point in space because if we  were, we probably would not be observing what we’re observing. 
And now maybe we should only look  at the places that are actually tuned for life, smart enough observers who can ask—I ask  you a question, the anthropic principle.” “To be clear, the anthropic principle is  different from an entropic principle, which is—” “Oh, good point. Very good point. Actually, it’s  a lot of confusion about that, and partially because, yeah, because entropy is used in  those conclusions. But you’re absolutely right. Anthropic principle—starts with an A.  And that’s people, you know, many, very large fraction of physicists, scientists don’t like the  principle. They disregard it as non-scientific. Now in cosmology, that was kind of the only  game in town. Now, until Lee Smolin proposed his natural selection approach, which is, I  think, closely related to what I’m trying to say. Interesting. And with the only twist, I’m actually  giving a mechanism. Yes, yes. A mechanism of how, instead of a universe being fine-tuned for  life, it is self-tuned for life. So you start with whatever you want, because the universe is  learning. So it consists of learning subsystems, and the learning subsystems, they all try to  learn what’s around them in the most stupid way. Because of that, you should not—you’re  not tuning anything. It is self-tuning itself. It likes to be observed. And so observers  emerge, not because there are carefully chosen constants of nature, but because if they  were not carefully chosen, then they would be learned to evolve towards being carefully  chosen. And so that’s kind of giving you a physical mechanism of how ideas that were—and  actually, Lee Smolin told me that this idea came before him. There were philosophers.  There’s always—any idea you describe, there’s always some philosopher in the  past who said the same. Okay? Sure, sure.”

“And then in the comments section, there’ll be, ‘Oh, and this philosopher is  predated by the Vedic texts.’” “Yeah, there’s always someone before. Yeah, but I don’t think there is competition here. Just  with every new time you rediscover something, what you’re trying to do is trying to make it more  rigorous using the tools you currently have. So, you know, there were no machine learning  systems 100 years ago. There were no neural network dynamics back then. Yes, exactly. Now we  have that, and so those are the tools. You know, can you use those tools? Well, speaking about  the tools, there’s one little problem that kind of may be unique, maybe not. Many  times physicists came to the realization that new tools are needed. Einstein is a great  example. You know, who would have thought that curved spaces or Riemann geometry is important for  modeling the universe? Nobody. But Einstein came around and said, ‘No, this is the mathematics you  need. You need differential geometry to describe.’ So, but at least at that time, there was  already a body of work where you can just take this framework. With neural networks, the  situation is a little bit different. We have so many experiments, so many experiments. So every  time you’re amazed by what a neural network does, it’s an experiment. And not so much of the  actual theory of you being able to actually predict ahead of time, ‘Yes, this architecture  will work. And no, that will not work for such and such reasons.’ So that’s kind of—the theory  is a bit behind here, which is like a perfect playground for the theorists because I can, you  know, set up an experiment, write down my theory, and then test it experimentally, numerically  right away. So that’s great. Okay. But that’s a little diversion from—I want to get to where  you said that the universe likes to be observed.” “We’re going to get to that and we’re going  to get to consciousness. 
But before we do, I recall you saying that natural selection  operates on the level of subatomic particles. Am I shaky in my recollection? Not from  this conversation, but somewhere else.” “No, no, you absolutely—it may have not been  in this conversation that I said that, but I certainly wrote about it in the papers. And  yeah, and so this natural selection-like way—by natural selection, I mean the more  useful technical configurations of networks survive because they help the loss function to  be minimized better. And the other configuration will not survive. So in this sense, natural  structure—those architectures that are useful for learning will stay. And those who are not  useful for learning will be removed because their loss function is not as low as it should be.  So in this sense, yeah. But it doesn’t work just on the level of particles, the metric particles.  Remember again about this analogy of the right language on different scales. So if you want to  talk about this at the level of particles, yes, you would say particles are such—the way they  are is because they underwent the series of natural selection of their scales and figured out  that these are the states, this is the state of their neural networks that describe them, that are  allowing their loss function to be the smallest. But this argument, this natural selection  argument, or I call it more a learning argument, right—so you’re trying to minimize something—it  can be applied to any scales. It can be applied to scales of biology, which we usually do. When we  say natural selection, we usually think about the scales of organisms, right? Organisms are again  some kind of configurations. They’re more maybe fluid configurations because no two organisms are  alike. But so maybe particles, right? So we—yeah, they look very similar. Like all electrons look  very similar, but we don’t know. Maybe there is some tiny difference. 
And then, the way we are trying to understand the tiny difference... or maybe it has already gone through this very long period of natural selection, and then that is the value. That’s what an electron mass should be. And there’s nothing else I can do at this point.”

“I imagine if you could put a mark on an electron and a boson or distinguish them, then the spin-statistics theorem wouldn’t apply anymore. And we would see some effects of that.”

“Right. So the loss function would be... it’s just not convenient not to have fermions. And again, we kind of understand that. I think one good example I can give, that maybe a very general audience will understand and machine learning people will definitely understand, is cars. Cars moving in traffic. They are trying to self-drive. So assume it is 10 years from now and all the cars are self-driving, maybe sooner, maybe later, I don’t know. And then all of them are driving and they all are trying to optimize their loss function. But to optimize their loss function, they have to do some calculations about the environment. They have to scan the environment, find something, plug it into maybe their network, and then the network will say, ‘Turn left, turn right.’ And that information that each of the cars collects, in the physics of electrodynamics we actually call that a bosonic field. So an electromagnetic field, or a Green’s function of other electrons that I scan around, propagates to me. And that gives me relevant information for what I should do as an electron. So you can see that in the background. Cars can scan for other cars, get that information, and say, ‘Okay, that’s the relevant information for me to make a left or right turn, or accelerate.’ So in this sense, cars are like fermions. They are advanced enough to be able to process not the state of the entire universe around them, because they are tiny, but relevant information for them to optimize their loss.
And if they would  be doing something else, they would not behave as an electron, and then they’re not going to be  able to do it. So that would create some kind of unstable behavior and this whole system would  not work as it should. So just like all the cars converge to some—all self-driving cars are using  the same software just because that’s useful. You know, electrons kind of using all the same  software of how to navigate an electromagnetic field. So this is this analogy—maybe helps to  think about electrons as self-driving cars. So there’s a lot of different particles that  have already established what is the best. Maybe they haven’t established it exactly, and so there  are still internal degrees of freedom. You know, there’s spin and there are other things.  So, you know, in some one circumstance, I will be doing this and other  that. And the same for the cars, right? So you will have different cars driving  in the UK and the US, right? Because left-side, right-side driving. So the state of the  self-driving software in the cars still has to be different. We have to agree. But other  than that, a lot of similarities in describing those cars would be—and the same for—so if  it’s useful, this is a correct analogy.” “There’s a structural similarity between the  cosmos, if you zoom out far enough, and neurons. And some people use this to suggest there’s  a cosmic brain. Now, I want to talk about what you’re not saying. Are you not  saying that? Or are you saying that?” “No, absolutely not. I’m sorry if I  interrupted you, but I’m not saying that.” “No, that’s great. It seems to me like  this comports with your theory. So it would seem like you’d be like, ‘Oh, that’s great  evidence for my theory.’ Maybe, maybe not.” “Yeah. So both of the things that you said are  true. So first of all, I’m not saying that because I haven’t confirmed. Visually I’ve confirmed they  look similar. Right. Exactly. So we’ve all done that. 
There are papers of people who actually  tried to do statistical analysis, which is the right thing to do, and statistically showed  that there are certain things that are similar. There is a well-known critical-like phenomenon  where you have some kind of scale invariance that is observed in the cosmic web and observed in  the biological networks. So now I haven’t mapped out exactly the dynamics of the galaxy’s formation  and how all this would come around. I’ve only done calculations suggesting that self-organized  criticality or a critical state is something that you should expect to see in the learning  system. And it’s a good thing for the learning system to have criticality. So there is this  calculation that tells you, yes, criticality is good. And then you can say it and say, ‘Okay,  once it’s good, doesn’t this confirm the observed criticality in the brain activity or the observed  criticality that we see?’ Yes. But this is indirect evidence. So maybe this is—actually we  are talking about this cosmological scale where the system is performing very slowly, maybe some  kind of very, very complex calculations. Or maybe we just say it’s slowly, but actually doing some  important learning task. So yeah. Because of that, I do not say that. Usually I show those pictures  when I give public talks, but I do not say that I’ve done enough rigorous calculations for the  structure formation. And I’m curious to say that this is—I know how to do those calculations,  but there’s just only 24 hours a day.” “Yes. You have a rare quality where you will  assert something and then say, ‘And here’s why it’s either not fully the case, or I don’t believe  it, or here’s the limitation in my own model, and here’s the counter-evidence.’ I haven’t seen that  in almost any of the people that I interview.” “Okay. So, I also hated that when I was a student.  
Because even when you’re a student and you come to the class and then they tell you something that  they’ve been taught and they take it without actually trying to question it, I think this is  a horrible quality. We physicists actually can do better. I think, I don’t remember who exactly  said that, but we should be doubting everything. So we should be doubting our own models, our own  calculations, and the calculations that other people had done. Even if a hundred people come  to you and say that general relativity is wrong, it doesn’t mean it’s wrong. And we know all that  story when the hundred physicists wrote a letter to Einstein saying that general relativity is  wrong. And his reply was brilliant. I mean, again, you don’t need a hundred. If you have a point,  show me a point and I will consider it. So yeah, I think this is a quality that you absolutely must  have. You have to show all the good and bad things about it. Because I’ve thought about this. I mean,  of course, I’ve tried to answer. I don’t have all the answers. So I think this is a more honest and  correct way of doing this. And we should be doing it not just with new theories. There are a lot  of problems with existing theories. Classical theories are not as pure as they thought.  There are divergences and things we don’t fully understand. And we should be telling people and  students about them when we discuss all this. So, don’t put anything under the rug. That’s  kind of my approach. Because sooner or later, smarter people will find what’s under the rug.  And that’s certainly very, very important.” “With that out of the way, and thank you for  that, by the way, let’s get to consciousness. Is the universe itself in your model, with  the universe being the neural net, conscious?” “Okay. Very good question. And of course,  I get this question all the time. Now, here is my—maybe a bit longer answer, but I think  I need to say that. 
You come with a mathematical framework, a new mathematical framework, which is  very rich, which relies on neural networks and the learning dynamics. And you’re trying to use that  to describe some phenomenon. This phenomenon may be a physical phenomenon, like we talked about,  or the phenomenon that people are discussing in other branches. So they already have a term for  something, and you’re bringing a toolbox. In this toolbox that I have, there is nothing that I  would call consciousness. But I’m trying to use it to describe what people mean by consciousness.  And I can have many attempts. So I may suggest something, and they will say, ‘Well, this looks  like not a good definition of consciousness, because here the system that we all agreed—a  hundred of us agreed—that it’s conscious, but your system, your definition tells it  it’s not.’ Then okay, then either I say, ‘Well, maybe you should adjust your  notion of what consciousness is,’ or maybe I should adjust my definition of  consciousness. And both ways are fine. My definition of consciousness  comes from how I understand it, how I can build it within a framework that I  understand—a mathematical framework. And in this mathematical framework, you know, the system  undergoes learning dynamics. And there are three macroscopic things that are directly related  to learning that I can calculate. So one of the things is how fast the system adapts to the  new dataset, to the new environment—how fast it learns. So this is the—I can actually calculate  it as the decay rate of the loss function.” “Right, right. That sounds to me more  like intelligence than consciousness.” “Right. And, well, okay. So, but then I  say, hold on the intelligence because I have a comment about it. I’m going to talk  about that as well. So, and then I’ll say, I want to, as a hypothesis, call the rate of  decay—how fast it learns—I want to call it consciousness. 
Maybe it will be wrong, but I will call it that, because I come with a new framework and in this framework I can calculate it. But I say right away, there are two more quantities that are also macroscopic. Some people may relate them to consciousness, but I would relate them to intelligence. And I would say that actually three things contribute to intelligence. If you judge how well a learning system behaves, then there are three quantities you have to calculate. The first is how fast it learns. I call it consciousness; maybe you would call it intelligence. The second is how low the loss function goes asymptotically. If I had infinite time, how low would it go? It may learn fast, but then just halt. So that’s another important ingredient: how low the loss function gets. And the third is stability. Once the system reaches this asymptotic loss, it’s not going to stay there. It’s going to fluctuate, because that’s what learning dynamics is. You never stop. You’re always in this learning equilibrium, and the loss goes a little bit up, down, up, down. So you have a learning system and I can calculate three things: how fast you learn, how well you would learn if you had infinite time, and how stably you learn. Those three things describe what I think intelligence is. Not just one IQ number. Three things. And then you can have different people. Some people learn very fast, but then they stop. They are not learning more and more. Their loss function has halted. Other people may take a very long time to learn, and then eventually they end up knowing all of differential geometry and quantum field theory and whatnot. 
And then there is the third type of people, who maybe also learn fast, and maybe they know all of the advanced mathematics, but they’re very unstable unless they keep repeating it. They keep forgetting and then opening the book again. We all forget stuff, right? You learn something and you forget it. I forget the things I wrote in my own papers; there are many papers I have to look at again. So there is some degree to which my knowledge, my loss function, is actually fluctuating. So I would put those three things together and say, that’s intelligence. At least three, maybe more. Actually there are more because, you know, the loss is a stochastic variable. There are statistical moments: first, second, third. So I’m simplifying things. You can describe these fluctuations with an infinite number of parameters. But at least those three things are very important. And I think when you look at different systems, or different people, say students, you can say, ‘Well, yeah, this one has a very good learning efficiency. I would say he’s more conscious.’ And this one has a very bad learning efficiency; he is less conscious. But again, this is just a definition. If somebody tells me that, even within my framework, it has to be the square root of two times the first number plus the square root of two, I’m okay with that. As long as I declare what I mean by it, I’m happy.”

“I understand the first two. But the last one, about stability: why does that have anything to do with your intelligence, when it could be that the universe is changing? That doesn’t seem to have anything to do with you.”

“Yeah. So, under the assumption that the universe doesn’t change at all... well, it’s always changing. Okay. So it’s always changing. This is what processing a dataset is. Your dataset: you always get different images of cats and dogs. 
You look around and every day there is like a new shape of trees and leaves arranged  themselves. So you always have that. But I say, let’s integrate that. So it will be just some  state, a statistical state of the universe. So no major events. There is no major—nothing hits  the Earth and creates a sequence of earthquakes.

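The three quantities described above (how fast the loss decays, how low it ends up, and how much it fluctuates once there) can be estimated from any recorded loss curve. The following is a minimal sketch, not the speaker’s calculation: the toy quadratic loss, the noisy gradient-descent loop, and the names `simulate_loss_curve` and `learning_summary` are all illustrative assumptions.

```python
import numpy as np


def simulate_loss_curve(lr=0.05, noise=0.5, floor=0.1, steps=4000, seed=0):
    """Noisy gradient descent on the toy loss L(w) = floor + 0.5 * w**2.

    Hypothetical stand-in for a learning system; not taken from the talk.
    """
    rng = np.random.default_rng(seed)
    w = 3.0
    losses = np.empty(steps)
    for t in range(steps):
        losses[t] = floor + 0.5 * w**2
        grad = w + noise * rng.standard_normal()  # exact gradient plus noise
        w -= lr * grad
    return losses


def learning_summary(losses, tail_fraction=0.25):
    """Estimate decay rate, asymptotic loss, and fluctuation size."""
    losses = np.asarray(losses, dtype=float)
    n = len(losses)

    # Late part of the curve: treated as the "learning equilibrium".
    tail = losses[int(n * (1.0 - tail_fraction)):]
    asymptotic_loss = tail.mean()   # how low the loss ends up
    fluctuation = tail.std()        # how much it jitters once there

    # Early part: fit log(excess loss) against time while the excess is
    # still well above the equilibrium fluctuations.
    excess = losses - asymptotic_loss
    threshold = max(10.0 * fluctuation, 1e-12)
    below = np.nonzero(excess < threshold)[0]
    cut = max(int(below[0]) if below.size else n, 2)
    log_excess = np.log(np.clip(excess[:cut], 1e-12, None))
    slope = np.polyfit(np.arange(cut), log_excess, 1)[0]

    return {
        "decay_rate": -slope,  # the quantity the speaker proposes to call consciousness
        "asymptotic_loss": asymptotic_loss,
        "fluctuation": fluctuation,
    }


if __name__ == "__main__":
    summary = learning_summary(simulate_loss_curve())
    for name, value in summary.items():
        print(f"{name}: {value:.4f}")
```

Changing `lr` or `noise` in this toy shifts the three numbers independently of one another, which is the sense in which a single score would not summarize the curve. The interpretation of the numbers is the speaker’s proposed definition, not something the code establishes.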
Nothing major. So I’m more or less in this learning equilibrium. If that’s the case, if nothing major changes, then maybe a good example would be a bacterium as an observer, placed in some kind of controlled environment where you keep the temperature the same, the amount of light the same, and so on. If you work with this ensemble, the loss function will still change. It will still change because of stochastic gradient descent. You know, sometimes the light appears on the left, and the bacterium moves to the left. Sometimes it appears on the right, and it moves to the right. So there are some changes, and it always updates its trainable variables. Why would it do that? Because if the statistical state were to change, it would be able to adapt. So your ability to adapt actually backfires on you, and it creates more fluctuations. So there is a trade-off. And people actually know about that. If you set the learning rate to be smaller in some algorithm, then it will go to a very stable minimum. But, you know, it will not be as good a minimum. So these fluctuations should not be treated as a bug. They are actually a feature that lets you get out of the local equilibrium. And that happens all the time.”

“Am I correct in saying that you said at some point that we need to unify not two things, quantum theory and general relativity, but three: quantum theory, general relativity, and observers? Most physicists tend to think of observers as coming from the physics, as something emergent. Why do you think that we have to unify these three at the same time?”

“Right. So, most physicists will tell you that biology will somehow emerge once string theory is completely done and gravity is quantized. I call it wishful thinking. There is no evidence for that. In a way, we are sweeping things under the rug. 
We’re just saying, if something is complex, well, yeah, yeah, yeah, but if I do long enough calculations, if I have enough time, that’s how it’s going to work. For example, most physicists are convinced that quantum mechanics plays no role in how we function, in how our brain works. Right? It seems to make sense: for macroscopic objects, why would quantum mechanics matter? But we have no proof of that. And I think it is wishful thinking again, similar to how it is wishful thinking that the second law of thermodynamics has to really be working. So I don’t think so.

But the other answer is that the fact that observers are very special and should be understood is, I think, realized by most physicists who pay attention to two important problems in physics. One is the measurement problem. Every single physicist who has seriously thought about the foundations of quantum mechanics, not the person who is just doing ‘shut up and calculate’ and following the manual, but someone who is trying to study the problem, will necessarily realize that there is a measurement problem. The measurement problem is about this third postulate of quantum mechanics, which is something new, because in classical physics all we need is the state and how it evolves. Here you need the state, how it evolves, and how it is observed. So that’s one. And then you have to say that maybe there is something in addition to quantum mechanics that you have to describe. Maybe it is an observer. An observer may play a special role. And if you realize that quantum mechanics is incomplete and that measurement seems to be playing a special role, then you are stuck with trying to describe it.

Now, another problem in physics that also has to do with observers comes from cosmology. It’s called the measure problem. 
Essentially, more  or less a different problem, but more or less the same complication coming from observers. So if  you’re trying to assign probabilities to different observations in cosmology, what should be the  right probability? We discussed the Boltzmann brain problems; it’s a part of it. So you have  to specify the rules. You have to describe how to deal with observers. And in both cases, if  you actually think about this, the complexity comes from the fact that we are trying to put the  observer into the system. So when the observer is outside of the system, we all know what to do. We  know there is a Hamiltonian; I would describe it. Once you put the observer into the system—so  this is, you know, for more general audience, people know about the Schrödinger’s cat problem.  So if you put something and then the Wigner’s Friend problem. And if you start putting  observers inside, things start to break.” “And is it important that  the observer is conscious?” “Right. So at this point, no. There’s something  fishy about observers, but we don’t have a model of observers. So once you say, ‘Okay, let’s...’  Some people would claim, yes, consciousness is important. And they have their own definition. I  say it’s important to model the observer. So if you want to put it inside in the system, then  you really want to model how it behaves and model not just saying, ‘Well, maybe some kind  of emergent phenomenon of biology will happen and maybe some kind of wave function collapse will  happen.’ This isn’t going to work if you’re trying to do the calculation. So my answer is, yes,  observers are very important. And that’s why you really have to describe them if you want to do  calculations, even separately in quantum mechanics and gravity. And more so, and maybe because those  two problems persist in cosmology, where there is gravity and quantum mechanics, where there is  the measurement problem, maybe the solution is actually to try to understand how observers work.  
And then once you understand that, both areas may somehow be unified. And then I think the observer would be unified with them as well, because it seems to be the problem with both theories. Again, you can certainly sweep it under the rug. You can stop and ignore it. Maybe it’s the elephant in the room: everybody knows it’s there, but we are trying to look the other way. And I, as you said, I see this elephant. I say, ‘Well, it’s there.’”

“At what point do you imagine observers entered into physics? Is it at the Planck epoch? Is it prior?”

“So in this model, everything is conscious. There are observers everywhere. Every subsystem is an observer. Some of them are efficient observers. They have an efficient architecture, so their loss function falls quickly. Some of them are stable observers. They’ve already reached a very low value of the loss function. And some of them are not stable and they always fluctuate out of it. This holds for any subsystem, because the building blocks in this model are neurons, and they come with trainable and non-trainable variables. Because of that, everything is learning. So everything is, in this sense, an observer. It’s just that some observers are capable of asking perhaps more complex questions than others. Although we don’t know, right? Maybe inside an electron there is a whole complex neural network that has already solved the problem of quantum gravity and is just looking at us and laughing, saying, ‘Well, guys, I mean, it’s simple.’ Maybe. We don’t know about that. But in this model, that is where we started. I’m not saying this is how it works. But as a model, that could very well be.”

“As far as I understand, there’s a number that you can associate with consciousness: how conscious is this subsystem? But consciousness to us is far more than just a number. We care about how conscious someone is, most of the time when it comes to health. Are they alive? Should we pull the plug? Are they going to wake up? 
You can’t wake up. But our consciousness—we’re conscious of so much.  So in your model, do you have any qualia?” “Right. So I was in one of the FQXi conferences  and there was a heated discussion. Every time consciousness is discussed, and if there are  physicists and non-physicists in the room, it’s always a heated discussion. And so: should  we call consciousness the person who is actually conscious in the sense of talking and interacting  with you, another conscious observer? Or should we call consciousness something else? So is it like  a discrete thing? ‘I talk to you, then I call you conscious. And you reply. I talk to a dog and he  replies, it’s conscious.’ So, yeah, absolutely. You can then say consciousness will be defined  as a coupling, a strength of the coupling of the organism with the sound wave or light or some  electromagnetic phenomenon. You could do that. You can do that. And that would be your definition of  conscious. Maybe it’s better than mine, right? And then we would not be arguing. You say, ‘Okay,  look, this is a person. He’s not conscious.’ But then there’ll be people who say, ‘Yes, I  have such and such friend or a relative who is in a coma, but he is conscious.’ And you  would say, ‘No. The fact that the person is in a coma and isn’t interacting with you the  way you want it to interact, it doesn’t mean that he’s not interacting with you some other  way.’ And actually, I’ve mentioned—I mean, I have to say this speculative idea because, you  know, maybe philosophers will love it or maybe not—but we discussed for quantum mechanics,  you need this bath of neurons. Otherwise it just doesn’t behave like that. So it could be this  bath of neurons is always there, but it’s not in our physical space. It’s in the hidden space, as  I call it. And if it’s there, then nothing stops a person who is not interacting with you in the  physical space from interacting with the hidden space. Okay? Maybe that’s what you do in your  dreams. 
Or maybe, you know, people are interacting through this hidden space all the time. And people do claim that. There are a lot of people who claim they have these special abilities, right, to interact. And I think we physicists don’t take it seriously for two reasons: first, we don’t have a good enough framework for modeling this, and second, we don’t have controlled enough experiments. But I think we should not disregard it once we are equipped with a better mathematical model and with better experiments. So, yeah, I wouldn’t like a definition where a person is conscious only if he can hear and reply back. By that definition, ChatGPT would be conscious, because it certainly replies when you write. So again, there’s a lot of discussion about definitions, maybe not very important, but we need to do this. We need to define terms before we can make statements. And there is just not enough time to do that.”

“What distinguishes trainable from hidden variables? Do physical entities, or even mental entities, correspond to one and not the other, or what?”

“So the hidden variables in this case are like hidden neurons, and their states are described by non-trainable variables. But all of the non-trainable variables in a neural network are connected by trainable variables, which are called weights. So you cannot just draw a sharp line and say here are the trainable and here are the non-trainable. Very much like in physics, you cannot say here’s an electromagnetic wave and here are electrons. They’re coupled. They’re all communicating with each other.

Now, the difference between hidden non-trainable variables and physical non-trainable variables is that the physical ones have organized themselves into these three-dimensional structures. 
And they’ve kind of discovered the effectiveness of using three-dimensional space for exchanging information and minimizing their loss function. The hidden ones, at this point, can have arbitrary connections to each other. Maybe a good way to think about it: initially you have a soup of neurons. Everything is connected to everything. And it’s all hidden, in the sense that no physical space has yet emerged. And then, you know, a bubble forms, a Big Bang. A certain number of those neurons figure out that they can learn a lot more, and can minimize their microscopic loss function, if they arrange themselves this way. It’s like a phase transition. Once that phase transition has taken place, you still have the hidden variables, which you can always hire if you need to do calculations. But they need not be present in your physical space. They need not interact with the classical degrees of freedom. They still interact by providing you with this quantumness, but they need not be directly observable and coupled. So that’s the model for them. And it is correct to call them hidden variables. Hidden variables is called an interpretation of quantum mechanics, but I guess it’s really an attempt to make quantum mechanics a little less mysterious. It comes with its own problems, and we can certainly talk about them, but at least it tries to sweep less stuff under the rug. We physicists keep doing that, keep lying without saying that we are lying, in a good sense. We’re not doing it intentionally. We’re simplifying things, right? One of those things is that someone can say, ‘I believe in the many-worlds interpretation.’ But once you corner the person, he will admit that it’s not as clear as he proposed. 
So, yeah.” “If the universe is learning,  what is it learning toward?” “Right. So, if the universe is learning and there  is nothing but the universe, so that’s it. That’s all there is. It’s unsupervised learning. And so  the only thing it can learn—every subsystem can learn the rest of the universe. So you put an  arbitrary boundary: me and the rest. So I am a subsystem. The only thing I can learn—I can  try to learn about myself if I’m in a coma, maybe I can do that. But that’s what I will be  interested to do. And that will actually help me also to survive. The more I learn—now we are  moving to the biology level where we’ve done a lot of work trying to understand how this exactly  works. But basically, you know, an organism has to learn its environment, model its environment, in  order to better predict how our environment will behave. And then once it’s able to predict it,  it’s more likely to survive. So this is what you have to do in order to survive. This is  very—actually, so first of all, it’s similar, of course, to natural selection, but it’s similar  to the phenomenological model that Karl Friston is constructing where he’s trying to say, ‘Okay,  well, let’s define some phenomenological function that maybe our ability to predict the state of  the environment is what...’ And what I’m adding to this story is that, yes, that’s great, but  let’s actually dig deeper and give a microscopic interpretation of that. So, it’s like, you know,  you can have thermodynamics, but then you can have a derivation of thermodynamics from statistical  mechanics. So I’m kind of saying, well, you can also derive it through statistical mechanics of  how neural networks work. So that’s the idea. So then, coming back to your question, every  subsystem is learning the rest, right? And we are not different. Our cells are not different. Each  cell is learning how to fit best into the organism so that it can optimize its own loss function.  And the society isn’t different. 
It’s all still learning, but on different scales. And the language we use to describe it changes with scale. Addressing the physicist audience: there is an RG flow. The loss function changes as you start renormalizing and generalizing the concept of neurons. On a small scale, it can be fundamental neurons. On a bigger scale, it can be subnetworks like particles, then you can have something like cells, people, societies, and civilizations.”

“If I recall correctly, your second law of learning is that learning efficiency is proportional to the Laplacian of the free energy. Is that correct?”

“Yes, in very, very simple limits. Just as in standard physics, where we can derive thermodynamics in very simple limits. We physicists are only good at doing Gaussian integrals and doing calculations for very simplified systems. For those systems you can do the calculations, and you can show that the learning efficiency is related to the Laplacian of the free energy, where the free energy is defined microscopically. You start with an energy function, which is the loss function, and from that you define the free energy. You do not start from phenomenology; you start from the microscopic description. In that particular limit, that was the answer. In more complex systems we are not dealing with Gaussians, and in the critical systems that we discussed we are not dealing with Gaussians. There, many, many scales are important, and things are much more complex. You cannot really give this formula and say that is it. You have to work perturbatively: in this limit the Laplacian of the free energy was important, and more generally it may have lots of corrections. Or, as you know from perturbation theory, it may happen that they are not just corrections but dominate everything. Then your zeroth order is wrong. It’s a non-perturbative limit, and the answer is completely wrong. 
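The construction described above, taking the loss function as an energy and defining a free energy from it, can be sketched in standard statistical-mechanics notation. This is only an illustration under assumptions not stated in the conversation: the symbols $q$ (trainable variables), $H(q)$ (loss), $\beta$ (an inverse-temperature-like parameter), and $A$ (a positive-definite Hessian) are chosen here, and a Boltzmann-type weighting is assumed.

$$
Z(\beta) = \int d^N q \; e^{-\beta H(q)}, \qquad F(\beta) = -\frac{1}{\beta}\,\ln Z(\beta)
$$

In the Gaussian case, the kind of limit where such integrals can be done in closed form, a quadratic loss $H(q) = H_0 + \tfrac{1}{2}\, q^{\top} A\, q$ gives

$$
Z = e^{-\beta H_0}\left(\frac{2\pi}{\beta}\right)^{N/2} (\det A)^{-1/2}, \qquad F = H_0 + \frac{1}{2\beta}\ln\det A - \frac{N}{2\beta}\ln\frac{2\pi}{\beta}
$$

This sketch does not attempt to reproduce the relation between learning efficiency and the Laplacian of the free energy; it only shows what “defined microscopically” can mean in the simplest case.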
And there are reasons  to believe that the system is, in this sense, non-perturbative because of the criticality that  we observe. So there symmetry-breaking transitions take place. There are lots of complex things that,  of course, I won’t be able to talk about here, but yeah. Analytical calculations are hard, but I  guess without them we are not going to understand what’s actually happening, what’s the relevant  language of describing different phenomena.” “Does Karl Friston’s free  energy—is it an independent claim from yours or does it  emerge from your framework?” “No, I completely agree with him. So  there has to be a free energy in this
setup. It’s a phenomenological way of saying that  there is a function that you will be minimizing, optimizing. And that’s right. I mean,  you started with the very beginning: how come classical mechanics also optimizes?  Yeah, but it only deals with the very close—when you’re close to the equilibrium in the  sense of the learning dynamics. And the same with him. He has a phenomenological model that is very  intuitive and very nice and it describes that such a function must exist. It’s a kind of existence.  Now, he doesn’t start with a learning theory, but he starts with his understanding of how organisms  behave. It makes sense. It makes total sense. Now, I’ll give you an example how we can—his free  energy may be corrected to be much better. So, for example, you can have an organism  that isn’t interested in predicting; he’s interested in quantizing gravity. Okay? So for  that organism, he will spend all his resources, maybe locked in a room with no windows, trying  to quantize gravity, writing equations. So this organism will have its own free energy. Now it  will not be the one that tries to, you know, predict how the environment—it doesn’t matter  how—maybe a little bit. I mean, I want to make sure that I survive. So on higher levels, the free  energy can be different for different organisms. The question is whether you can derive them  always from the microscopic dynamics. So when you can do this RG flow and actually, starting  from some microscopic loss function assumption, derive it. And it’s an open question. The  only thing I can really add to the free energy principle that Karl Friston is advocating  is that we can model it using these trainable and non-trainable variables. And think about what  you get once the non-trainable are integrated out and you pay attention to a handful of  trainable variables. And then the system becomes something you can calculate and then you can  model. And now we can model it phenomenologically. 
So if you have a controlled enough experiment, you don’t care what microscopic physics gives rise to this free energy. You can just calculate it by seeing how the system responds to a changing environment. For example, you look at how the system responds to sound, or light, or temperature, and then you just model it as a function of those parameters. And that may give you a hint as to how such a system would emerge from some microscopic physics. So I don’t think it contradicts what Karl Friston said. I’m just saying that we can dig deeper if we assume that this learning dynamics is happening on all scales.”

“What is a piece of advice that you found inspirational, that you keep coming back to?”

“Advice that somebody gave me?”

“It could also be something you read in a book. It could be from an advisor. It could be from a movie. Something you found that’s helped you.”

“Yeah. Well, one piece of advice that has been very, very useful to me, and maybe not something I would advise students to do, but it works for me, is to doubt everything. Do not trust anything that is relevant for your work until you have done as much of the calculation and as much of the understanding yourself as you can. Now, why is this bad advice? Because it may not be optimal for a student who is trying to get a professor position, a tenured position, or whatever. If you are going to be doubting everything and doing all the calculations yourself, you may just not publish enough. So that’s bad advice, but it is something I couldn’t avoid. I could not not do it. Once I figured out that there is no problem that I cannot solve myself, I said, ‘Okay, I’ll be doing that.’ And of course, I haven’t done all the calculations, but as much as I can. The arXiv is full of the calculations I have done. 
So doubting is, I think, one piece of advice. And there is a more concrete piece of advice that was given to me by my advisor, Alex Vilenkin. You know, I would come every day with some new idea, and he gave me advice that somebody had given to him, I don’t know how long ago. The advice was: you come up with some idea, some theory, some equations, and then the next day you should try to criticize it as much as you can. You flip. You act as if you were an opponent of that idea. It’s very helpful to look really objectively at what you have done and say, ‘No, no, no. I don’t like what you’ve done.’ Devil’s advocate, right? So I try to disprove it and find all of the problems with it. And that’s why, whenever I’m talking to you, I’m saying, ‘Look, I know why it may or may not work.’ I’m not trying to sell you a used car without telling you that something is broken, because that wouldn’t be fair. I wouldn’t feel right. And I wouldn’t be confident about the calculation that I had done. So, yeah, constantly flipping: one day you come up with ideas and do calculations; the next day you ask why this is wrong, and try to criticize as much as you can.

And maybe a last statement here. ChatGPT and other LLMs are horrible at doing that. They tend to agree with everything you say. So my advice is not to use ChatGPT for correcting your work without verifying it yourself. Using it for correcting your work is fine. Using it for suggesting ideas is fine. It’s an excellent tool. We just don’t know how to use it yet. We are all still students of LLMs. But even once we learn how to use it: never just trust, verify. As we say in Russian, you should always verify. And you should do it to the point that you redo the same calculations many times because, honestly, how many times do we make mistakes when we do calculations? 
Well, I make mistakes all the time. We all make mistakes. So you should keep questioning. It’s related to doubt: doubt even your own ideas. I guess that’s the advice that was given to me.”

“Professor, thank you so much for spending two hours with us.”

“It was two hours?”

“Yes. It went by like that. It went by so quick. But space and time don’t exist anyhow, at least not fundamentally.”

“Right, right. Well, Curt, that was fun. That was a lot of fun. I appreciate you having me on your podcast. It was very nice talking. Very nice questions, by the way.”

“I have so many more.”

“Let me just say, full disclosure: in addition to providing information, this was an experiment for me, because from the time I conjectured that the world is a neural network, every time I talk to a person I conduct an experiment on how this person reacts to what I say. And you’ve been a great opponent, a great person to talk to. A great guinea pig. So I’ve been experimenting with you, whether you know it or not. And at some point, when I am constructing not just a theory of biology, which we’ve done, but a theory of psychology, I might use some of this discussion as experimental evidence of certain psychological phenomena.”

“All right. I’ll take that as a compliment.”

“It is. It is. No, no. It was really great. I mean, it’s exceptional. I’m very happy with the questions. That was very good. Thank you. I’m being honest in telling you this: I’ve given interviews on podcasts where the people were not equipped at all with any of the physics lingo or the background, and they were just pushing their own worldviews without trying to understand what I was trying to say. And that was really torture for me, because the point of a podcast is to try to do both: try to understand what I’m trying to say, and then try to point me in the right direction. So that’s what I appreciate. 
And I’ve had good experiences with people who have actually, you know, done their homework. So it was obvious you have a physics degree. That helps a lot, because at least certain things I may say between the lines and you would put me back and say, ‘Okay, I’ll clarify that more often.’ So that was very useful. And so, thank you for that. So it was great. Great experience.”

“All right. Okay. Take care, sir. And I’m sure we’ll talk again. The audience is going to love you. I guarantee you.”

Hi there. Curt here. If you’d like more content from Theories of Everything and the very best listening experience, then be sure to check out my Substack at CurtJaimungal.org. Some of the top perks are that every week you get brand new episodes ahead of time. You also get bonus written content exclusively for our members. That’s C-U-R-T-J-A-I-M-U-N-G-A-L. You can also just search my name and the word Substack on Google. Since I started that Substack, it somehow already became number two in the science category. Now, Substack, for those who are unfamiliar, is like a newsletter, one that’s beautifully formatted. There’s zero spam. This is the best place to follow the content of this channel that isn’t anywhere else. It’s not on YouTube. It’s not on Patreon. It’s exclusive to the Substack. It’s free. There are ways for you to support me on Substack if you want, and you’ll get special bonuses if you do.

Several people ask me, like, ‘Hey, Curt, you’ve spoken to so many people in the field of theoretical physics, of philosophy, of consciousness. What are your thoughts, man?’ Well, while I remain impartial in interviews, this Substack is a way to peer into my present deliberations on these topics. And it’s the perfect way to support me directly. CurtJaimungal.org or search Curt Jaimungal Substack on Google.
Oh, and I’ve received several messages, emails, and comments from professors and researchers saying that they recommend Theories of Everything to their students. That’s fantastic. If you’re a professor or a lecturer or what have you, and there’s a particular standout episode that students can benefit from, or your friends, please do share.

And of course, a huge thank you to our advertising sponsor, The Economist. Visit economist.com/TOE to get a massive discount on their annual subscription. I subscribe to The Economist, and you’ll love it as well. TOE is actually the only podcast that they currently partner with. So it’s a huge honor for me, and for you, you’re getting an exclusive discount. That’s economist.com/TOE.

And finally, you should know this podcast is on iTunes. It’s on Spotify. It’s on all the audio platforms. All you have to do is type in Theories of Everything and you’ll find it. I know my last name is complicated, so maybe you don’t want to type in Jaimungal, but you can type in Theories of Everything and you’ll find it. Personally, I gain from rewatching lectures and podcasts. I also read in the comments that TOE listeners gain from replaying. So how about instead you re-listen on one of those platforms, like iTunes, Spotify, Google Podcasts, whatever podcast catcher you use. I’m there with you. Thank you for listening.