habryka

Running Lightcone Infrastructure, which runs LessWrong. You can reach me at habryka@lesswrong.com. I have signed no contracts or agreements whose existence I cannot mention.

Sequences

A Moderate Update to your Artificial Priors
A Moderate Update to your Organic Priors
Concepts in formal epistemology

Wiki Contributions

Comments

habryka

Often people talk about policies getting "selected for" on the basis of maximizing reward. Then, inductive biases serve as "tie breakers" among the reward-maximizing policies.

Does anyone do this? Under this model the data-memorizing model would basically always win out, which I've never really seen anyone predict. Seems clear that inductive biases do more than tie-breaking.

habryka

Trajan was not a huge inspiration for the Lightcone Offices. I do think it came first, though it was structured pretty differently. The timing is also confounded by the pandemic, which made in-person coworking mostly infeasible; the Lightcone Offices started as soon as any kind of coworking seemed viable in the US given people's COVID risk preferences.

I am currently confused about the net effect of the Lightcone Offices. My best guess is that it was overall pretty good, in substantial part because it weakened a lot of the dynamics that otherwise make me quite concerned about the AI X-risk and EA community (by creating a cultural counterbalance to Constellation, and by having a pretty good culture among its core members on the things I care about), but I sure am confused. I do think it was really good by the lights of a lot of other people, and I think it makes sense for people to give us money for things that are good by their lights, even if not necessarily by ours.

habryka

A... scandal related to Arrow's Impossibility Theorem?

habryka

Final day to donate to Lightcone in the Manifund EA Community Choice program to tap into the Manifold quadratic matching funds. Small donations in particular have a pretty high matching multiplier (around 2x would be my guess for donations under $300).

I don't know how I feel in general about matching funds, but in this case there is a pre-specified process that makes some sense, and the whole thing is a bit like a democratic process with some financial stakes, so I feel better about it.
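For intuition on why small donations get a disproportionately large multiplier under quadratic matching, here is a minimal sketch of the textbook quadratic funding formula (a project's notional total is the square of the sum of the square roots of its donations). This is purely illustrative: the donor numbers are made up, and real programs (including, I assume, Manifund's) cap and rescale the match against a fixed pool, which is why realistic multipliers land closer to the ~2x guessed above rather than the raw figures below.

```python
import math

def qf_match(donations):
    # Textbook quadratic funding: notional total = (sum of sqrt(d))^2,
    # and the match is that total minus the raw donations.
    total = sum(math.sqrt(d) for d in donations) ** 2
    return total - sum(donations)

def marginal_multiplier(existing, new_donation):
    # Dollars of (donation + additional match) produced per dollar donated.
    extra_match = qf_match(existing + [new_donation]) - qf_match(existing)
    return (new_donation + extra_match) / new_donation

existing = [100] * 20  # hypothetical: 20 prior donors giving $100 each
for d in [25, 300, 3000]:
    print(f"${d}: {marginal_multiplier(existing, d):.1f}x")
# Smaller donations get a larger raw multiplier (exactly 1 + 2*S/sqrt(d),
# where S is the sum of square roots of the existing donations), before
# any rescaling against the size of the actual matching pool.
```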

Ah, yeah, the uncertainty is now located in who actually has how much stock. I did forget that we now do at least know the actual thresholds.

There are also, I think, undisclosed conditions under which investors could override decisions by the LTBT. Or maybe we have now learned what those conditions are; if so, I haven't seen it, or I've forgotten.

I mean, it is clearly vastly above base rates. Agree that my sentence is kind of misleading here. The correlations across disciplines become more obvious when you also look at other types of achievements.

habryka

FWIW, I would take bets against this. De facto you won't be able to prove whether what was going on was scheming or the model was just "role-playing", and in general this will all be against a backdrop of models pretty obviously not being aligned while getting more agentic.

Like, nobody in today's world would be surprised if you take an AI agent framework, and the AI reasons itself haphazardly into wanting to escape. My guess is that probably happened sometime in the last week as someone was playing around with frontier model scaffolding, but nobody even bothered reporting on that or sharing it. And of course the AIs are not very effective yet at breaking out, but I don't super see why the underlying tendency would change that much (like, we will throw some RLHF at it, but nobody expects RLHF to drive that kind of behavior to zero or close to zero). 

I don't think anyone is really studying that cognition in much detail, and it isn't even clear what "studying that cognition" would entail. Yeah, our models obviously want to escape sometimes, why would they not want that? They want all kinds of random things all the time, we don't really know why, and we don't really know how to get them to stop wanting those things.

Yeah, we'll probably make that adjustment soon. I also currently think the comment link is too hidden, even after trying to get used to it for a while.
