habryka

Running Lightcone Infrastructure, which runs LessWrong. You can reach me at habryka@lesswrong.com. I have signed no contracts or agreements whose existence I cannot mention.

Sequences

A Moderate Update to your Artificial Priors
A Moderate Update to your Organic Priors
Concepts in formal epistemology

Wiki Contributions

Comments

habryka

Often people talk about policies getting "selected for" on the basis of maximizing reward. Then, inductive biases serve as "tie breakers" among the reward-maximizing policies.

Does anyone do this? Under this model the data-memorizing model would basically always win out, which I've never really seen anyone predict. Seems clear that inductive biases do more than tie-breaking.

habryka

Trajan was not a huge inspiration for the Lightcone Offices. I do think it came first, though it was structured pretty differently. The timing is also confounded by the pandemic, which made in-person coworking mostly infeasible; the Lightcone Offices started as soon as any kind of coworking seemed viable in the US given people's COVID risk preferences.

I am currently confused about the net effect of the Lightcone Offices. My best guess is that it was overall pretty good, in substantial part because it weakened a lot of the dynamics that otherwise make me quite concerned about the AI X-risk and EA community (by creating a cultural counterbalance to Constellation, and by having a pretty good culture among its core members on the things I care about), but I sure am confused. I do think it was really good by the lights of a lot of other people, and I think it makes sense for people to give us money for things that are good by their lights, even if not necessarily by ours.

habryka

A... scandal related to Arrow's Impossibility Theorem?

habryka

Final day to donate to Lightcone in the Manifund EA Community Choice program to tap into the Manifold quadratic matching funds. Small donations in particular have a pretty high matching multiplier (around 2x would be my guess for donations under $300).

I don't know how I feel in general about matching funds, but in this case there is a pre-specified process that makes some sense, and the whole thing is a bit like a democratic process with some financial stakes, so I feel better about it.
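For intuition on why small donations get a disproportionately large multiplier under quadratic matching, here is a minimal sketch of the textbook quadratic funding formula (a project's notional total is the square of the sum of the square roots of its donations). This is purely illustrative: the donor numbers are made up, and real programs (including, I assume, Manifund's) cap and rescale the match against a fixed pool, which is why realistic multipliers land closer to the ~2x guessed above rather than the raw figures below.

```python
import math

def qf_match(donations):
    # Textbook quadratic funding: notional total = (sum of sqrt(d))^2,
    # and the match is that total minus the raw donations.
    total = sum(math.sqrt(d) for d in donations) ** 2
    return total - sum(donations)

def marginal_multiplier(existing, new_donation):
    # Dollars of (donation + additional match) produced per dollar donated.
    extra_match = qf_match(existing + [new_donation]) - qf_match(existing)
    return (new_donation + extra_match) / new_donation

existing = [100] * 20  # hypothetical: 20 prior donors giving $100 each
for d in [25, 300, 3000]:
    print(f"${d}: {marginal_multiplier(existing, d):.1f}x")
# Smaller donations get a larger raw multiplier (exactly 1 + 2*S/sqrt(d),
# where S is the sum of square roots of the existing donations), before
# any rescaling against the size of the actual matching pool.
```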

Ah, yeah, the uncertainty is now located in who actually has how much stock. I did forget that we now do at least know the actual thresholds.

There are also, I think, undisclosed conditions under which investors could override decisions by the LTBT. Or maybe we have now learned what those conditions are; if so, I haven't seen it, or I've forgotten.

I mean, it is clearly vastly above base rates. Agree that my sentence is kind of misleading here. The correlations across disciplines become more obvious when you also look at other types of achievements.

habryka

FWIW, I would take bets against this. De facto you won't be able to prove whether what was going on was scheming or the model was just "role-playing", and in general this will all be against a backdrop of models pretty obviously not being aligned while getting more agentic.

Like, nobody in today's world would be surprised if you take an AI agent framework, and the AI reasons itself haphazardly into wanting to escape. My guess is that probably happened sometime in the last week as someone was playing around with frontier model scaffolding, but nobody even bothered reporting on that or sharing it. And of course the AIs are not very effective yet at breaking out, but I don't super see why the underlying tendency would change that much (like, we will throw some RLHF at it, but nobody expects RLHF to drive that kind of behavior to zero or close to zero). 

I don't think anyone is really studying that cognition in much detail, and it isn't even clear what "studying that cognition" would entail. Yeah, our models obviously want to escape sometimes, why would they not want that? They want all kinds of random things all the time, we don't really know why, and we don't really know how to get them to stop wanting those things.

Yeah, we'll probably make that adjustment soon. I also currently think the comment link is too hidden, even after trying to get used to it for a while.
