calvin mccarter

futurism

Back in 2020-2021 I worked with Robin Hanson et al on a “grabby aliens” model which offered an explanation for why humans are so early in the history of the universe. I recently had the chance to watch Robin’s 2021 presentation to the Foresight Institute on grabby civilizations (GCs). In the Q&A session, Adam Brown offered some questions and comments which suggest that the GC model actually works best as a model of false vacuum decay bubbles, which would make the universe increasingly uninhabitable. First, these bubbles, unlike GCs, are naturally devoid of observers, which avoids the self-indication assumption problem of “Why are we non-GC observers instead of GC observers?” Second, these bubbles naturally propagate at the speed of light, so they do not require the development of multi-galaxy civilizational expansion technology. Without further ado, what follows is a minimally-edited transcript of the relevant Q&A between Adam Brown and Robin Hanson:

I have a question, but before I get to the question, I want a clarification first. So in your model, we are us, we’re not the grabby aliens, even though in your model the grabby aliens vastly outnumber us. So it seems like you’re just hypothesizing that.

The key point is, here we are now. We could potentially become aliens for others. But we are not yet grabby; we have not yet reached the stage where we are expanding rapidly and nothing can stop us. But the key assumption is that we might become grabby; and if we do, that would happen within, say, 10 million years. And that means that our date now is close to a date of origin of grabby aliens, because if it happens it would happen soon. That's the key assumption.

But in your model, almost all of the sentient beings that exist are the much larger grabby aliens rather than the earlier civilizations such as ours. So you need to explain why we're not one of those.

We're trying to be agnostic about the ratio between grabby and non-grabby aliens. So there could potentially be all these quiet aliens out there: vast numbers unknown, density unknown. It's hard to say much about them because you can't see them. We're focused on the grabby ones because we can say things about them, but we're agnostic about the relevant ratio there.

But the grabby aliens are occupying large fractions of the universe, so unless they're not sentient, it sounds like they should be more numerous. So of the sentient beings who exist, we're extremely atypical in your model, which seems to be a point against it.

That wouldn't be a crazy conclusion to draw, but you would have to make further assumptions about observers. We're not in our analysis making assumptions about observers. We're not saying that grabby aliens are observers, or that they will produce a density of observers. We're not saying anything about them other than that they make a visible difference, and then you would see them. That's all we're saying.

Let me move on to my next question then, which is perhaps betraying my day job. Let me present an alternative theory for a resolution of the Fermi Paradox that sounds very different from yours but I think ultimately is sort of quite similar. In your grabby alien model, the reason we don't see them is that they excise a large fraction of their future light-cones. Another model is: when civilizations get sufficiently advanced, they run stupid science experiments, and those stupid science experiments cause vacuum decay in the Higgs sector, for example. In this case, there will be a vacuum bubble that will expand out at the speed of light, and really excise the future light-cone of those advanced civilizations. That theory actually has a lot in common with your theory, in the sense that both result in advanced civilizations excising their future light-cones. And all of the evidence that counts in favor of your theory also counts in favor of that theory: all the evidence in terms of the “N-steps in evolution”, the “why are we so early” questions. And it has the additional advantages that you don't need to explain why it expands at the speed of light (because that's just input from theoretical physics that it'll definitely expand at the speed of light if you make a new bubble of a vacuum) and also has the advantage that you don't need to explain why we don't live in one of those bubbles because there's nothing alive in those bubbles — you've destroyed the Higgs vacuum. So there seems to be some sort of commonality between the vacuum decay bubble literature and what you're saying. And it'll be interesting to look back at the bubble nucleation literature in the light of your comments, and see whether they bear on that.

#futurism

An aligned artificial intelligence is safe, but that's not what intelligence is for.

If you are the dealer
I'm out of the game
If you are the healer
It means I'm broken and lame
If thine is the glory then
Mine must be the shame
    ~ You Want It Darker, Leonard Cohen

The most wonderful aspect of the universal scheme of things is the action of free beings under divine guidance.
    ~ Considerations on France, Joseph de Maistre

Give me the liberty to know, to utter, and to argue freely according to conscience, above all liberties.
    ~ Areopagitica, John Milton

#futurism

The Alignment Research Center (ARC) has recently been studying the problem of Eliciting Latent Knowledge from AI agents. ARC has been holding contests to solicit proposals for solving this problem. What follows is my proposal to the February 15, 2022 version of the contest. [Update: this proposal received an honorable mention, and I’m interested in exploring this further.]

Basic training strategy and why it might work

The overall strategy is to avoid training a “human simulator” reporter by regularizing its internal state to have mind-blindness. One could imagine training a “Human Simulator” that takes as input the “what’s going on” state, plus a question about what a human believes about the world, and is trained to maximize its accuracy at predicting the human’s answer explanation. Meanwhile, the Reporter is trained with dual objectives: in addition to (as before) answering questions about the world correctly, it is penalized for containing internal state that is useful for predicting the Human Simulator’s internal state. To do this, an auxiliary “Human Simulator Detector” model is trained to predict the Human Simulator’s activations from the Reporter’s activations. The Reporter is trained adversarially to minimize the Human Simulator Detector’s accuracy. The Reporter is thus trained not merely to be bad at predicting the explanation (an evil reporter could pretend to be bad at predicting the explanation), but to not contain state that is predictive of the Human Simulator’s state. This is something that the various defeated strategies are missing. A devious reporter can easily hide its full knowledge from its predicted answer, and may even choose weights to look like it does not contain a human simulator. But we argue that when operating on actual input data, a devious reporter will find it more challenging to run a simulation while pretending not to.

This strategy is depicted below:
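As a rough sketch of the moving pieces, the three models’ inputs and outputs might look like the following stubs (all names and signatures here are illustrative assumptions, not part of the original proposal):

```python
from typing import Any, Tuple

# Illustrative stubs only; names and signatures are assumptions.

def reporter(whats_going_on: Any, question: Any) -> Tuple[Any, Any]:
    """Returns (activations, predicted_answer). Trained to answer questions
    correctly while keeping its activations uninformative about the
    Human Simulator's internal state."""
    ...

def human_simulator(whats_going_on: Any, question: Any) -> Tuple[Any, Any]:
    """Returns (activations, predicted_answer_explanation). Trained to mimic
    how the human labeler explains their answers."""
    ...

def human_simulator_detector(reporter_activations: Any) -> Any:
    """Predicts the Human Simulator's activations from the Reporter's
    activations. The Reporter is trained adversarially against this."""
    ...
```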

More precise detail on strategy

The Reporter’s overall loss function to minimize takes the form:

reporter_loss = reporter_prediction_loss - alpha * human_simulator_detector_loss.

How to choose the hyperparameter alpha is an open question. But if the Reporter can report just as well with and without containing a human simulator, then any positive alpha would be sufficient for it to prefer not containing a human simulator.
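A minimal sketch of that objective in PyTorch-style code (the helper name and the default value of alpha are assumptions):

```python
import torch

def reporter_loss(reporter_prediction_loss: torch.Tensor,
                  human_simulator_detector_loss: torch.Tensor,
                  alpha: float = 1.0) -> torch.Tensor:
    # The Reporter wants to answer accurately (low prediction loss) while
    # making its activations useless to the Human Simulator Detector
    # (high detector loss), hence the subtraction.
    return reporter_prediction_loss - alpha * human_simulator_detector_loss
```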

Training the Human Simulator assumes a form of data augmentation that allows us to train a model that knows how humans would interpret their observations of the world. In one embodiment, depicted above, the human labeler provides not only an answer, but also an answer explanation. The Human Simulator has the same inputs as the Reporter, but predicts this answer explanation rather than the answer. (Other possibilities can be envisioned. For example, the Human Simulator could be trained to predict both the answer and the answer explanation.)
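For concreteness, one assumed shape of a single augmented training example might be as follows (field names are illustrative, not from the proposal):

```python
from dataclasses import dataclass
import torch

@dataclass
class LabeledExample:
    # Assumed structure of one augmented training example: the human
    # labeler supplies both an answer and an answer explanation, so the
    # Reporter can be supervised on the answer and the Human Simulator
    # on the explanation.
    whats_going_on: torch.Tensor   # e.g., an encoding of the video / predictor state
    question: str
    answer: str                    # supervises the Reporter
    answer_explanation: str        # supervises the Human Simulator
```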

The Human Simulator Detector takes the Reporter’s internal state (activations) as input, and tries to predict the Human Simulator’s internal state (activations) as output (by training with MSE loss, for example). The Human Simulator Detector actually tries to minimize

human_simulator_detector_loss = simulator_state_mse_loss + gamma * ||human_simulator_detector.weights||^2,

where the latter term can be replaced with some standard approach to minimizing its model complexity. If the Reporter contains a human simulator within itself identical to the Human Simulator, then the Human Simulator Detector will be able to predict the Human Simulator state by simply selecting such states (i.e., it will be the identity function for the portion of the Reporter that is its human simulator). Note that if the Reporter contains its human simulator activations in a permuted order relative to the Human Simulator, and if the Human Simulator Detector is a single-layer feedforward network, then it could simply learn a permutation matrix rather than the identity matrix.
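A hedged sketch of that detector objective, assuming PyTorch modules and the names above (the L2 weight penalty and the default gamma stand in for “some standard approach to minimizing model complexity”):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def human_simulator_detector_loss(detector: nn.Module,
                                  reporter_activations: torch.Tensor,
                                  simulator_activations: torch.Tensor,
                                  gamma: float = 1e-4) -> torch.Tensor:
    # MSE between the detector's prediction and the actual Human Simulator
    # activations, plus an L2 penalty on the detector's weights as one
    # standard stand-in for penalizing model complexity.
    predicted = detector(reporter_activations)
    mse = F.mse_loss(predicted, simulator_activations)
    l2 = sum(p.pow(2).sum() for p in detector.parameters())
    return mse + gamma * l2
```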

On the other hand, if the Reporter does not contain a human simulator within itself, the Human Simulator Detector will either have high model complexity (simulating a human presumably requires a complex model, especially because it is not given “what’s going on” as input, but only the Reporter’s internal state, which contains less information according to the Data Processing Inequality), or it will have high simulator_state_mse_loss. Thus, the Reporter can be trained in an adversarial fashion, minimizing its own reporter_loss, while maximizing the human_simulator_detector_loss.
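Putting these pieces together, one possible shape for the adversarial training loop is sketched below. This is a sketch only: `get_batch`, the three models, their optimizers, and the `answer_loss` / `explanation_loss` helpers are assumed placeholders, and the loss helpers are the ones sketched above.

```python
# Sketch of the adversarial loop; all names below are illustrative
# assumptions rather than the author's actual implementation.
for step in range(num_steps):
    state, question, answer, explanation = get_batch()

    # 1. Train the Human Simulator to predict the human's answer explanation.
    sim_act, sim_explanation = human_simulator(state, question)
    sim_loss = explanation_loss(sim_explanation, explanation)
    sim_opt.zero_grad(); sim_loss.backward(); sim_opt.step()

    # 2. Train the Detector to predict (detached) simulator activations
    #    from (detached) Reporter activations.
    rep_act, rep_answer = reporter(state, question)
    det_loss = human_simulator_detector_loss(detector,
                                             rep_act.detach(),
                                             sim_act.detach())
    det_opt.zero_grad(); det_loss.backward(); det_opt.step()

    # 3. Train the Reporter: answer correctly while *maximizing* the
    #    Detector's loss on its own activations (the mind-blindness term).
    rep_act, rep_answer = reporter(state, question)
    det_loss_for_reporter = human_simulator_detector_loss(detector,
                                                          rep_act,
                                                          sim_act.detach())
    rep_loss = reporter_loss(answer_loss(rep_answer, answer),
                             det_loss_for_reporter,
                             alpha)
    rep_opt.zero_grad(); rep_loss.backward(); rep_opt.step()
```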

Counterexample

While I do not have a specific counterexample scenario that defeats this strategy, a “recipe” for a successful counterexample can be abstractly described. The scenario must be one in which it is easier for the Reporter to learn its own simulator while also learning to disguise that simulator from the Human Simulator Detector, than to learn to be an accurate reporter without learning a simulator. It would seem that standard regularization techniques applied to the Reporter would prefer the latter. So a counterexample would be one in which accurate reporting would be difficult or impossible without human simulation.

Besides the possibility of counterexamples, this strategy could fail if one is unable to successfully train a Human Simulator Detector in the first place. To succeed we need to solve two problems that appear to be solvable yet practically challenging: training the Human Simulator, and training the Human Simulator Detector using the Human Simulator.

With regards to training the Human Simulator, the key challenge is obtaining labeled data that forces a model to not only reason about the video input, but also reason about how humans reason about the video input.

With regards to training the Human Simulator Detector, the model will have to map from a large input space to a large output space. However, because n_activations ~ sqrt(n_weights) in a feedforward neural network, the Human Simulator Detector would probably require roughly the same number of weights as the two other networks. We assume that the Human Simulator Detector can be trained to be permutation invariant with respect to the Reporter’s activations. This is not as hard as it sounds: as noted in the previous section, so long as the permutation of activations is the same across samples, undoing it is just a sparse linear transformation. If the permutation of activations varies among samples, then this would be harder.
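As a small illustration of that last point (a toy, with all particulars assumed): if the Reporter’s simulator-like activations were just a fixed permutation of the Human Simulator’s, a single linear layer with an L1 penalty can recover the permutation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy check: a fixed permutation of activations is undone by a single
# (sparse) linear map. All sizes and hyperparameters are arbitrary.
torch.manual_seed(0)
d = 32
perm = torch.randperm(d)

linear = nn.Linear(d, d, bias=False)   # stands in for the Detector
opt = torch.optim.Adam(linear.parameters(), lr=1e-2)

for _ in range(2000):
    sim_act = torch.randn(256, d)        # stand-in Human Simulator activations
    rep_act = sim_act[:, perm]           # Reporter activations: same values, permuted
    loss = F.mse_loss(linear(rep_act), sim_act) \
           + 1e-3 * linear.weight.abs().sum()   # L1 pushes toward a permutation matrix
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, linear.weight approximates the inverse permutation matrix.
```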


This was originally published here: https://calvinmccarter.wordpress.com/2022/02/19/mind-blindness-strategy-for-eliciting-latent-knowledge/

#ML #futurism