
Open Source Initiative announces Open Source AI Definition 1.0

[Posted October 28, 2024 by jzb]

The Open Source Initiative (OSI) has announced the release of version 1.0 of the Open Source AI Definition:

The OSAID offers a standard by which community-led, open and public evaluations will be conducted to validate whether or not an AI system can be deemed Open Source AI. This first stable version of the OSAID is the result of multiple years of research and collaboration, an international roadshow of workshops, and a year-long co-design process led by the Open Source Initiative (OSI).

LWN covered the OSAID process, and final release candidate, on October 25.

OSI is now officially obsolete.

Posted Oct 29, 2024 1:06 UTC (Tue) by mirabilos (subscriber, #84359) [Link] (5 responses)

This marks the removal of OSI from the list of organisations the Open Source community can trust to watch over common interests.

Someone has already put up http://osd.fyi/ though I haven’t read it in detail yet.

Again, I’ve said everything that needs to be said, on my webpages, here and on Fedi, and have disabled reply notification for once.

OSI is now officially obsolete.

Posted Oct 29, 2024 8:50 UTC (Tue) by zoobab (guest, #9945) [Link] (3 responses)

Totally agree with you.

OSI cannot be trusted anymore.

OSI is now officially obsolete.

Posted Oct 29, 2024 8:52 UTC (Tue) by zoobab (guest, #9945) [Link] (2 responses)

I think the FSF will come up with a stricter definition that will be compatible with Debian rules:

https://www.fsf.org/news/fsf-is-working-on-freedom-in-mac...

The training data is the core of the system; OSI surrendered to corporate lobbyists.

OSI is now officially obsolete.

Posted Oct 29, 2024 13:19 UTC (Tue) by coriordan (guest, #7544) [Link] (1 responses)

I disagree.

The OS AI definition has problems. The main one is that the "data information" part, which is *fundamental*, needs serious work. But that's why it has a version number. The work will continue.

Delaying the 1.0 would have been pointless. Firstly, the term "open source AI" is already being abused by a few proprietary software companies. It's important to have a definition so that we can ask people to stop abusing the term. Secondly, the discussions were kinda stuck with the same people. And thirdly, I don't think anyone actually knows right now what type of data information is needed (to enable studying and modification of an AI model).

Now there's a 1.0, and a lot of people will see that the "data information" section is insufficient, and hopefully there will be lots of discussions and we'll see some great ideas for a 2.0 that will really do what it is meant to.

A lot of free software projects are currently using AI-as-a-service, which is worse than proprietary software. This might be partly caused by the lack of a definition.

I'm also very interested in what FSF is doing.

OSI is now officially obsolete.

Posted Oct 29, 2024 17:49 UTC (Tue) by NYKevin (subscriber, #129325) [Link]

> Firstly, the term "open source AI" is already being abused by a few proprietary software companies.

I think it really is important to stress that point. For example, OpenRAIL-M has been bandied about as a supposedly "open" AI license, and models distributed under this license have been informally described as "open source," but it is certainly not an open source license by any reasonable definition. It contains explicit field-of-use restrictions that blatantly violate OSD#6. I think we can all be a little bit more charitable towards OSI here. They are trying to put out a branding fire, not issue the last word on what is and is not "open source AI."

OSI is now officially obsolete.

Posted Oct 29, 2024 10:57 UTC (Tue) by rettichschnidi (subscriber, #93261) [Link]

> This marks the removal of OSI from the list of organisations the Open Source community can trust to watch over common interests.

Codeberg is already discussing whether to limit their hosting to OSD and exclude OSAID: https://codeberg.org/Codeberg/Community/issues/1654

What's going on here?

Posted Oct 30, 2024 14:06 UTC (Wed) by apoelstra (subscriber, #75205) [Link] (73 responses)

It sounds like there is a lot of history here, but I am not involved in LLM stuff and am completely in the dark.

Can somebody give a bullet point or two describing what's going on? I see multiple "I no longer trust OSI" comments, a link to an "open-source declaration" which contains far more text explaining how to sign using a plain-text email client than it does explaining what the issues at stake are, and references to posts elsewhere on Fediverse and such.

I know enough about this to know it'd be contentious, just from the LWN summary -- anyone trying to publish a "definition" for something that still isn't conceptually clear is going to get pushback -- but beyond that I'm in the dark.

Maybe this deserves a long-form LWN article.

What's going on here?

Posted Oct 30, 2024 14:27 UTC (Wed) by daroc (editor, #160859) [Link]

There is a long-form article about the process and final draft that may interest you.

What's going on here?

Posted Oct 30, 2024 15:08 UTC (Wed) by jzb (editor, #7867) [Link] (2 responses)

> Maybe this deserves a long-form LWN article.

In the brief here I had added this:

> LWN covered the OSAID process, and final release candidate, on October 25.

Curious - was that missed, or was that not what you had in mind by a long-form LWN article? (e.g., were there things you were hoping would be covered in that article that were not?) Thanks!

What's going on here?

Posted Oct 30, 2024 16:43 UTC (Wed) by apoelstra (subscriber, #75205) [Link] (1 responses)

>Curious - was that missed,

My apologies. I simply missed it.

What's going on here?

Posted Oct 30, 2024 17:06 UTC (Wed) by jzb (editor, #7867) [Link]

No worries! Hope it's helpful.

What's going on here?

Posted Oct 30, 2024 15:55 UTC (Wed) by coriordan (guest, #7544) [Link] (68 responses)

I think the problem is that some people think OSI's definition is too weak (too easy to abuse), and a secondary issue is that some might think it's more complex than necessary.

On the issue of the definition being too weak, I would say that this version 1.0, yes, is too weak. But everyone involved is aware of this. To make a strong definition, more information is needed about how to ensure that people can study (understand) and modify an AI model. That info just doesn't exist right now, but people are working on it. And version 1.1 or 2.0 will be stronger.

Delaying the release until these aspects get "finished" would have been a strategically bad idea because the term "open source AI" is already being misused and is being written into law, so it's better for us to assert control over the term before others establish a common practice of using it to describe AI models which are clearly not open or free.

(So, while the people who want a stronger definition might think that waiting is a good idea, waiting would actually lead to a weaker definition, because during the delay certain companies are establishing the idea that any AI model which can be downloaded and used non-commercially is "open source AI".)

FSF is also working on a definition, and theirs might be stronger from the start. And FSF's definition will surely be very useful, directly, and will also hopefully lead to good ideas which will improve OSI's text.

On the secondary issue of the definition being more complex than necessary, I do wonder if certain parts could be removed. In particular, I wonder if the definition of "AI System" should be reduced to "a software system that includes an AI model". But some wording in the document is there to connect this document to some laws that already exist. So there might be words that look superfluous but which have a purpose when the context is explained.

(Can anyone comment on whether I've correctly summarised the complaints of the critics?)

What's going on here?

Posted Oct 30, 2024 19:52 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (67 responses)

> (Can anyone comment on whether I've correctly summarised the complaints of the critics?)

I mostly agree with you, but to play devil's advocate for a moment: Most of the critics would argue that there is nothing more to study, and that OSI could and should have written 1.0 to require the full release of all training data, just as the regular OSD requires the full release of all source code.

Personally, I think that is an unrealistic position to take in practice, because the regulatory and legal hurdles would make it impractical for any reasonably large model to actually do that, so you would end up defining open source AI into a small corner where it would not be able to meaningfully compete with proprietary AI.

The other issue is that I'm not entirely convinced this sideshow is even necessary in the first place. If the model is a derivative work of the training inputs, then OSAID 1.0 already requires the license to cover those training inputs anyway, because otherwise you would violate this requirement:

> Parameters: The model parameters, such as weights or other configuration settings. Parameters shall be made available under OSI-approved terms.
> [...]
> The Open Source AI Definition does not require a specific legal mechanism for assuring that the model parameters are freely available to all. They may be free by their nature or a license or other legal instrument may be required to ensure their freedom. We expect this will become clearer over time, once the legal system has had more opportunity to address Open Source AI systems.

That would imply that you already have permission to use and make derivative works of those inputs, since you're licensing it downstream to third parties. It should not be too much of a stretch to also get a license to redistribute those inputs verbatim.

OTOH, if the model is not a derivative work of the training inputs, then they are not a "form" of the model at all, let alone the "preferred" form of the model, so in that hypothetical, the critics would IMHO have no leg to stand on.

So putting those two together, OSI actually said this:

> Data Information: Sufficiently detailed information about the data used to train the system so that a skilled person can build a substantially equivalent system. Data Information shall be made available under OSI-approved terms.

They could have added something along the lines of the following, and I think it would strike a good balance, without putting open source AI into a tiny box:

> [...] In addition, if the model is a derivative work of one or more specific training inputs, those inputs must be provided verbatim on request from any recipient of the model, under OSI-approved terms, but if those inputs are collectively very large, a fee may be charged to defray the actual and reasonable logistical costs of providing them.

What's going on here?

Posted Oct 31, 2024 7:42 UTC (Thu) by smurf (subscriber, #17840) [Link] (63 responses)

> the regulatory and legal hurdles would make it impractical for any reasonably large model to actually do that,

which is the point of everybody who sues (or contemplates suing) those AI companies for their more-or-less-blatant copyright violation.

Yes, models trained with that heap of text wouldn't be "open source" if they had released a "strict" OSAID instead. But for all practical purposes they are neither Open nor Source anyway.

All OSAID 1.0 does in practice is to further muddy the waters.

What's going on here?

Posted Oct 31, 2024 8:32 UTC (Thu) by Wol (subscriber, #4433) [Link] (10 responses)

> which is the point of everybody who sues (or contemplates suing) those AI companies for their more-or-less-blatant copyright violation.

Except the meatspace equivalent of what the *AI COMPANIES* are doing, is going down the library and reading all the books. Where's the copyright violation in that?

Which is why EU regs explicitly say "no harm no foul".

What the users do with their output from the AI is another matter ...

Cheers,
Wol

What's going on here?

Posted Oct 31, 2024 11:31 UTC (Thu) by smurf (subscriber, #17840) [Link]

> Except the meatspace equivalent of what the *AI COMPANIES* are doing, is going down the library and reading all the books. Where's the copyright violation in that?

They don't just "read all the books". They have a zillion copies of all the books, presumably without having paid for any of them, and they speed-read them over and over.

Also, this is not how copyright works. An author who writes and publishes a book confers some rights to the publisher, some implied (people who buy books read them, libraries lend them *one person at a time* as archive.org found out to their detriment) and some not (you can't share that Klingon translation you made with others, let alone sell it, without asking the publisher and/or author). Selling a translation isn't explicitly excluded from the set of rights you get when you buy a book (yes I know some books explicitly exclude these but most don't) — but you may not do it anyway.

The concept of teaching a machine how people string along their sentences by feeding it a heap of books didn't even exist when most of these books were written. You thus can't argue that that's included in the set of rights you get by default when you buy a book. Precedent: when CDs came up, the right to publish music on CDs alongside LPs and tapes wasn't included in the set the music publisher got from the artists either. That had to be re-negotiated. And that's just a different medium (albeit the first lossless one), let alone a different concept.

… and that presumes that the AI companies actually paid for a sufficient number of all the books the training set includes, which they presumably didn't, let alone in a format that doesn't have DRM on it (the removal of which is kind of illegal, as I'm sure you know). Why the heck should OpenAI (and Meta and whoever) be allowed to do *any* of that, when people who actually *buy* their ebooks can't?

> Which is why EU regs explicitly say "no harm no foul".

Well, quite a few people seem to think that this kind of regs is utter baloney.

Politics doesn't understand the implications of copyright. Case in point: The last couple of decades' mountain of unreasonably-long copyright extensions, just because Disney (who appropriated much of their oeuvre from public-domain sources) spent a mountain of lobbying $$$ to prevent their own works from becoming PD.

Copyright Law and the internet

Posted Oct 31, 2024 15:40 UTC (Thu) by jjs (guest, #10315) [Link] (8 responses)

As mentioned by smurf, it's not reading the books - it's copying them. On computer networks, when you request a page (or a document), you don't get the original, the server keeps the original and sends a copy (as if a library in lending books would make a copy and give you the copy - which they don't do).

CopyRIGHT is about the RIGHT to make COPIES. US copyright law (17 USC 117) allows the copying of a computer program onto storage or into memory in order to use the program (17 USC 117(a)), but specifically requires authorization of the copyright owner to "Lease, Sale, or Other Transfer of Additional Copy or Adaptation" (17 USC 117(b)). Source: https://www.law.cornell.edu/uscode/text/17/117. Copyright office circular 61 (https://www.copyright.gov/circs/circ61.pdf) provides more detail.

Their FAQ (https://www.copyright.gov/help/faq/faq-digital.html) specifically spells out "Uploading or downloading works protected by copyright without the authority of the copyright owner is an infringement of the copyright owner's exclusive rights of reproduction and/or distribution. Anyone found to have infringed a copyrighted work may be liable for statutory damages up to $30,000 for each work infringed and, if willful infringement is proven by the copyright owner, that amount may be increased up to $150,000 for each work infringed. In addition, an infringer of a work may also be liable for the attorney's fees incurred by the copyright owner to enforce his or her rights."

Note that I can copy that paragraph because US government works are not eligible for copyright (17 USC 105 - https://www.law.cornell.edu/uscode/text/17/105).

I realize many people DO copy stuff off the internet, including copyrighted works. Often, they are not prosecuted. That's not because they're not breaking the law, but rather because it would cost the copyright owner more to prosecute than they believe it's worth.

Imagine someone making copies of all NY Times issues (used for example only, could be any major newspaper) for the past 50 years and publishing them as a book. I'm confident the NY Times would prosecute the person who did that for copyright violation and win.

Copyright Law and the internet

Posted Oct 31, 2024 17:25 UTC (Thu) by Wol (subscriber, #4433) [Link] (4 responses)

> As mentioned by smurf, it's not reading the books - it's copying them. On computer networks, when you request a page (or a document), you don't get the original, the server keeps the original and sends a copy (as if a library in lending books would make a copy and give you the copy - which they don't do).

And what this misses, is that if your computer did NOT copy it, then the web would have no content whatsoever. What should happen is you copy to your computer, you copy the contents into the neural net called a brain, and then you throw the computer copy away. Which is a pretty good description of training an AI engine.

This is the brain-deadness we had years ago when companies tried to charge extra for the privilege of copying the program you'd bought from disk into ram ...

> Imagine someone making copies of all NY Times issues (used for example only, could be any major newspaper) for the past 50 years and publishing them as a book. I'm confident the NY Times would prosecute the person who did that for copyright violation and win.

I take it you've forgotten the Belgian News Agency incident? Where they took Google to court (or tried to) "for copying our members web sites". Google said "fine, you want us to pay you to access your site? What you've got isn't worth it to us", and promptly blocked their spiders from all the News Agency sites. It didn't take long for the Belgians to start begging Google to come back.

I'm not a fan of modern copyright. It has its uses, but it's gone mad. Probably 99% of stuff under copyright is worthless. And of the stuff no longer under copyright, the majority of freely-available stuff is probably worth the same as the 1% of copyright material.

But when pretty much all I hear about the AI copyright maximalists is that they are behaving as "agent provocateur" - pretty much using as INPUT the copyright material they want to extract to claim injury - one wonders what the agenda is.

Personally, I would be very careful about using an AI. They hallucinate, they assume fake authority, I wouldn't trust them as far as I could throw them. Why should I care what they ingest, when they're about as useful to me as a GIGO machine?

All I can hear is the luddites screaming "AI is stealing our livelihood", when I'm far more concerned about the far deeper problem - AI is consuming huge amounts of energy to create minimal value, while destroying the planet in the process. Valencia, anyone?

Cheers,
Wol

Copyright Law and the internet

Posted Oct 31, 2024 23:44 UTC (Thu) by jjs (guest, #10315) [Link] (3 responses)

1. The exemption for copying is for programs, not data. Yes, we copy data all the time - normally with permission from the owner (they put it on the web to copy to our computer for us to read), but NOT the right to recopy. Looking at 17 USC, I don't find anything that I can see that addresses the AI issue, but, as I'm not a lawyer, I won't try to figure out what that means, or if I missed something. I will point out, copyright default in the USA is you're not allowed to make copies without explicit permission, either via 17 USC, or from the copyright holder. Also, in accordance with the Berne Convention, US Copyright is automatic upon fixing material in a form (paper, computer, etc.) - no registration needed (although that helps with lawsuits & damages). Which is why the US Government has to specifically disclaim copyright on USG produced material.

Regarding your other answer to pizza regarding US Government copyright, let me quote 17 USC 105(a):
"(a) In General.—
Copyright protection under this title is not available for any work of the United States Government, but the United States Government is not precluded from receiving and holding copyrights transferred to it by assignment, bequest, or otherwise."

2. Yep, familiar with several incidents like the one you mention. Technically, Google was copying. However, with what I saw of Google News, they would include the title, the first line (sometimes only part of it), and maybe an image. If you wanted the story, you needed to click on the link and go to the news site's website. Which means Google was directing traffic to their website. While they can sue about that, as they discovered, also under US Freedom of Speech (US Constitution, Amendment 1), Google was free to say "OK, we're not copying it, good luck getting traffic". They apparently decided that the money they got from visitors directed via Google was more than what they could have gotten from trying to enforce their copyright. Your point actually supports my argument, IMO.

3. Guess what? 99% of everything under copyright at any time is probably worthless. That doesn't mean you have the right to copy it.

I'll agree that there are problems with modern copyright law - mainly related to how long it lasts, but that's a separate discussion. We're discussing what copyright law IS, not what it should be.

Copyright Law and the internet

Posted Nov 1, 2024 0:09 UTC (Fri) by Wol (subscriber, #4433) [Link] (2 responses)

> Copyright protection under this title is not available for any work of the United States Government, but the United States Government is not precluded from receiving and holding copyrights transferred to it by assignment, bequest, or otherwise."

So USG works are not protected by *US*LAW*, but there's nothing stopping them from suing me in an "England and Wales" court under English law ...

Unlikely, sure, but they could ... (note that the whole point of Berne is that I can sue you, in a (presumably) US court, *as if* I were a (presumably) US Citizen). (Note also, that was a direct result of American law explicitly discriminating against foreign works and authors.)

Cheers,
Wol

Copyright Law and the internet

Posted Nov 1, 2024 0:17 UTC (Fri) by pizza (subscriber, #46) [Link] (1 responses)

> So USG works are not protected by *US*LAW*, but there's nothing stopping them from suing me in an "England and Wales" court under English law ...

Only if they independently hold copyrights on those works in the UK.

(Again, "no copyright" means "no standing to sue")

Copyright Law and the internet

Posted Nov 3, 2024 22:14 UTC (Sun) by Wol (subscriber, #4433) [Link]

> Only if they independently hold copyrights on those works in the UK.

Which I presume they do. In the UK copyright is automatic, and I'm not aware of any mechanism to automatically place works into the public domain. In the UK, it's UK laws that apply, not US.

Just because US law says those works are explicitly UNprotected by US law, doesn't deny those works automatic protection under UK law. Berne says (afaik) absolutely nothing about the special status of those works in the US - it merely says local laws may not discriminate against foreigners - they MUST be treated like locals. In fact, I think Berne could be read as we can not UNprotect US works without unprotecting our own government works!

Cheers,
Wol

Copyright Law and the internet

Posted Oct 31, 2024 17:28 UTC (Thu) by Wol (subscriber, #4433) [Link] (2 responses)

> Note that I can copy that paragraph because US government works are not eligible for copyright (17 USC 105 - https://www.law.cornell.edu/uscode/text/17/105).

Are you sure about that? Istr back in the day actually reading the law (or a synopsis of it) - probably on Groklaw - and concluding that the US Government COULD sue me for copyright infringement ...

Cheers,
Wol

Copyright Law and the internet

Posted Oct 31, 2024 17:55 UTC (Thu) by pizza (subscriber, #46) [Link] (1 responses)

> and concluding that the US Government COULD sue me for copyright infringement ...

Works _created_ by the US Federal Government (including all government employees operating in their official capacity) are explicitly placed in the public domain, i.e. not eligible for copyright protection.

However, works can be created by companies or private individuals and _assigned_ to the Federal Government. And the government, like any other rightsholder, can enforce those rights as it sees fit.

Copyright Law and the internet

Posted Oct 31, 2024 21:01 UTC (Thu) by Wol (subscriber, #4433) [Link]

> Works _created_ by the US Federal Government (including all government employees operating in their official capacity) are explicitly placed in the public domain, i.e. not eligible for copyright protection.

Istr that's only half the story. "placed in the public domain IN THE UNITED STATES". I'm not American, and why are those words there, if they don't mean anything?

Cheers,
Wol

What's going on here?

Posted Oct 31, 2024 15:06 UTC (Thu) by kleptog (subscriber, #1183) [Link] (50 responses)

> which is the point of everybody who sues (or contemplates suing) those AI companies for their more-or-less-blatant copyright violation.

In the US anyway. Suing in the US is attractive because (a) statutory damages if you can convince a judge of copyright infringement, and (b) winning creates a precedent which is binding on future cases.

Most of Europe has neither of those. You have to demonstrate actual damages (e.g. Tolkien selling fewer books because of OpenAI) and even if you win for one author, it would not necessarily help with any other authors.

A different approach is to make robots.txt a clear statement of intent (essentially done by Directive (EU) 2019/790). This gives companies leverage in negotiations, since despite no statutory damages, a judge can order OpenAI to remove the copyrighted data from the model under threat of some penalty. That would cost OpenAI millions, and so would force them to the table. Once the rightsholders have established a market value for their dataset it makes demonstration of damages much easier.
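As a rough illustration of what treating robots.txt as a machine-readable statement of intent could look like on the crawler side, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot name `ExampleAIBot`, the rules, and the `example.org` URL are all made up for illustration; nothing here reflects what the Directive itself requires or how any real crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site opts out of one (made-up) AI training
# crawler while leaving everything open to all other user agents.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/articles/1"
    print("ExampleAIBot allowed:", may_fetch(ROBOTS_TXT, "ExampleAIBot", page))
    print("SomeBrowser allowed:", may_fetch(ROBOTS_TXT, "SomeBrowser", page))
```

A crawler that honours the file would skip the page when `may_fetch` returns False; whether ignoring it has legal consequences is exactly the policy question being discussed.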

Obviously it's going to take time to determine a proper valuation of datasets (e.g. the contents of Reddit), but from a public policy perspective it's not clear to me how a bunch of court cases to determine if a model is a derivative of the inputs is going to help.

What's going on here?

Posted Oct 31, 2024 17:37 UTC (Thu) by smurf (subscriber, #17840) [Link] (49 responses)

> You have to demonstrate actual damages

Counterexample: everybody who's been forced to pay a nontrivial amount of €€€ for P2P movie file sharing during the last two decades or so.

What's going on here?

Posted Oct 31, 2024 21:11 UTC (Thu) by kleptog (subscriber, #1183) [Link] (48 responses)

> Counterexample: everybody who's been forced to pay a nontrivial amount of €€€ for P2P movie file sharing during the last two decades or so.

Your point? Downloading a movie and watching it is clearly depriving the author/publisher of a sale that they could otherwise have had.

A computer cannot 'watch' a movie by itself, so training an AI on movies or books doesn't seem to be depriving anyone of anything. Unless you argue that you expect authors/publishers to get paid for copies that no human ever looked at. That seems like a form of copyright maximalisation we really should be avoiding.

The purpose of copyright is to promote certain public policy objectives, like encouraging authors to publish books. It's not at all clear to me that either allowing or denying AI training on copyrighted data has any impact on those objectives either way. Especially since we have already decided that for non-commercial and educational purposes it's no problem.

What's going on here?

Posted Oct 31, 2024 22:42 UTC (Thu) by smurf (subscriber, #17840) [Link]

> Your point? Downloading a movie and watching it is clearly depriving the author/publisher of a sale that they could otherwise have had.

That's not nearly as clear as you think it is. A teen who watches ten movies off P2P typically doesn't have the pocket money to go to the cinema ten times. On the flip side, maybe one or two of these motivate them to go to the cinema with their friends, instead of hanging out at the skating rink?

> Especially since we have already decided that for non-commercial and educational purposes it's no problem.

Have we?

Purpose of Copyright

Posted Nov 1, 2024 2:06 UTC (Fri) by jjs (guest, #10315) [Link] (46 responses)

Specifically, the purpose of copyright, since the Statute of Anne in 1709/10 is to have written works put into the public arena, by rewarding the author with income for producing the work, with the work becoming public domain after lapse of Copyright. Originally only for written works, it's steadily expanded. But, fundamentally, the goal has not changed.

And Copyright law for the most part doesn't care WHY the copy was made - making a copy is reserved for the copyright holder, to grant as they see fit (normally for books and written works to a publisher who reimburses them for each copy sold). Look up 17 USC (https://www.law.cornell.edu/uscode/text/17) for the US code.

Purpose of Copyright

Posted Nov 1, 2024 16:51 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (38 responses)

Re pure copying: In practice, most if not all works in a typical training set are downloaded from the web. Web browsers make copies of absolutely everything you look at, even if caching is disabled (the content that appears on your computer screen is a copy of whatever is on the origin server), so if taken to its logical conclusion, this argument would "prove" that the web is illegal.

Of course, no judge is actually going to rule that way (at least in the US, anyway). Instead, they are going to rule that people who intentionally make content accessible over the web grant an implied license to copy that content (essentially saying "if you don't want people to make short-lived copies of your work, then don't post your work on the web where everybody has to make copies just to look at it"). Then the question becomes whether that license covers ML training.

You might respond that an implied license should not be extended to commercial activity. As a Google employee, I do not think it is wise for me to continue this line of reasoning beyond that point, but I would invite you to consider the distinction between search engine indexing and ML training, the fact that Google Search is a for-profit product, and the fact that some early ML training was carried out by ostensibly non-profit research labs, and then explain to me again exactly where you're going to draw the line. From where I sit, it does not look at all obvious.

***

And yes, I am aware of the Civil Law system, in which the legislature writes narrow exceptions for each and every situation instead of judges making up the rules as they go. I'm sure that there are clearer answers to this question in those jurisdictions. But I don't know much if anything about those laws, so I have nothing useful to say about them.

Purpose of Copyright

Posted Nov 1, 2024 18:03 UTC (Fri) by intelfx (subscriber, #130118) [Link] (37 responses)

> As a Google employee, I do not think it is wise for me to continue this line of reasoning beyond that point

...and that right here is the reason why I would never think of becoming a Google employee, nor would I advise anyone whom I consider a friend (or closer) to do so. ;)

> but I would invite you to consider the distinction between search engine indexing and ML training, the fact that Google Search is a for-profit product

As far as I know, one of the arguments here (I don't remember if it was an argument ever made in court or "just" a thought-experiment argument) is that Google Search makes limited verbatim copies of content to direct customers *towards* that content. Nothing of that sort happens when content is used to train models which are then used to create (likely) *competing* content.

Purpose of Copyright

Posted Nov 1, 2024 18:10 UTC (Fri) by Wol (subscriber, #4433) [Link] (35 responses)

> > As a Google employee, I do not think it is wise for me to continue this line of reasoning beyond that point

> ...and that right here is the reason why I would never think of becoming a Google employee, nor would I advise anyone whom I consider a friend (or closer) to do so. ;)

So are you saying that you would not engage in reasoning thought? I guess pretty much any employer you could name, you would find it easy to create arguments like that against becoming their employee.

Unfortunately, too little thought is as bad as too much thought. The GP has clearly decided his line of reasoning will go the way of all others - down the White Rabbit hole - the fate of anyone who actually tries to REASON about the consequences of their actions. Better to step back and say "this isn't worth it".

Cheers,
Wol

Purpose of Copyright

Posted Nov 1, 2024 23:56 UTC (Fri) by intelfx (subscriber, #130118) [Link] (34 responses)

> So are you saying that you would not engage in reasoning thought? I guess pretty much any employer you could name, you would find it easy to create arguments like that against becoming their employee.

No, I'm saying that I _like_ to engage in reasoning thought, and if my employment had any kind of a chilling effect on the "lines of reasoning" I might want to publicly entertain[1], then I wouldn't want to be anywhere near such employment.

[1]: Within the common moral norms, obviously.

Purpose of Copyright

Posted Nov 2, 2024 8:35 UTC (Sat) by NYKevin (subscriber, #129325) [Link] (33 responses)

I must admit that I am rather shocked that this is even considered a problem specific to Google. I am not aware of any company in the history of the world that would be happy for its employees to publicly speculate about whether said company's core business model is legal.

Purpose of Copyright

Posted Nov 2, 2024 14:54 UTC (Sat) by Wol (subscriber, #4433) [Link] (11 responses)

>> [1]: Within the common moral norms, obviously.

The problem I have with this is *WHOSE* moral norms - as a site with readers of many nationalities, there is no guarantee that your and my moral norms agree. Even among the English-speaking world I find the difference in moralities shocking ...

> I am not aware of any company in the history of the world that would be happy for its employees to publicly speculate about whether said company's core business model is legal.

And given that - as any Philosopher will tell you - reasoned thought will eventually and inevitably lead you to the conclusion "true = false", is such speculation even productive?

My employer is a supermarket. As part of the "save the planet" initiative, we are under pressure to reduce plastics use. The poster demon for which is carrier bags. OUR BUSINESS MODEL REQUIRES CARRIER BAGS. Reducing their use would massively and harmfully increase waste elsewhere. Reducing their use will sharply increase customer DISsatisfaction (as has been shown by other supermarkets trying it). Reducing their use is probably actually illegal under the various disability acts :-) although I really can't imagine us being sued over it - other than by disability rights activists demanding we keep them :-)

I still hate Microsoft. Someone described that as "the 1990s are calling, they want their prejudices back". Unfortunately I lived through that, and given that I was one of the people harmed by the fact their leader's moral compass pointed downwards, I'm quite happy with my long memory. Those harms still aren't undone. But unfortunately we can't change history.

HOWEVER. My view of Google is that - inasmuch as it has a moral compass - it points UP! It was founded by people who had a moral compass that points up, and it has kept that moral focus. I don't trust any company to "do the right thing", but I do trust the people at Google to do so. (Likewise with my employer.)

There are plenty of companies, and I'm sure people here could have a field day naming them, where the corporate culture leaves you very DIStrustful of whether the employees can be trusted. And no way would I put Google in that list.

Cheers,
Wol

Purpose of Copyright

Posted Nov 2, 2024 15:35 UTC (Sat) by intelfx (subscriber, #130118) [Link]

> The problem I have with this is *WHOSE* moral norms - as a site with readers of many nationalities, there is no guarantee that your and my moral norms agree. Even among the English-speaking world I find the difference in moralities shocking ...

This remark was meant to prevent "smart" retorts like "what if you publicly engage in talking about [insert heinous activity], would you also expect your employer to be OK with that".

> And given that - as any Philosopher will tell you - reasoned thought will eventually and inevitably lead you to the conclusion "true = false", is such speculation even productive?

Be that as it may, that's not for the employer to decide, or punish (even hypothetically).

Purpose of Copyright

Posted Nov 2, 2024 16:11 UTC (Sat) by smurf (subscriber, #17840) [Link] (9 responses)

> as any Philosopher will tell you - reasoned thought will eventually and inevitably lead you to the conclusion "true = false"

You're listening to the wrong kind of philosopher. There are plenty who tell us no such thing.

Purpose of Copyright

Posted Nov 3, 2024 22:22 UTC (Sun) by Wol (subscriber, #4433) [Link] (8 responses)

> > as any Philosopher will tell you - reasoned thought will eventually and inevitably lead you to the conclusion "true = false"

> You're listening to the wrong kind of philosopher. There are plenty who tell us no such thing.

"If you define a religion as an irrational belief in the unprovable, then not only is Mathematics (Philosophy - the study of logic) a religion, it's the only one that can *prove* it's a religion".

Where's the proof that one plus one equals two? Was Epimenides *really* a liar?

Why was there so much debate about angels dancing on the head of a pin? (An argument which - LOGICALLY - makes perfect sense ...(read Terry Pratchett :-))

Cheers,
Wol

Purpose of Copyright

Posted Nov 4, 2024 4:21 UTC (Mon) by smurf (subscriber, #17840) [Link] (6 responses)

> Where's the proof that one plus one equals two?

The last 100 years of Math groundwork seem to have passed you by. If you attach the conventional meanings to "one", "two", "plus" and "equals" then [ the mathematical equivalent of ] your sentence is perfectly provable.

If you choose not to, well, for certain values of "table" and "chair" it's trivial to argue that a table is a chair, but that doesn't get us anywhere. It also doesn't erase the common distinction between tables and chairs.

The applicability of the above to the current discussion is left as an exercise to the reader.

Purpose of Copyright

Posted Nov 5, 2024 13:14 UTC (Tue) by Wol (subscriber, #4433) [Link] (5 responses)

> The last 100 years of Math groundwork seem to have passed you by. If you attach the conventional meanings to "one", "two", "plus" and "equals" then [ the mathematical equivalent of ] your sentence is perfectly provable.

Actually, courtesy of Peano, I think you will find that the *belief* that one plus one equals two is an AXIOM, not a THEOREM. As in, "it cannot be proven".

And I believe the maths groundwork you refer to is rather older than you assume; Gödel dates from this decade last century, I believe, which is the proof that we cannot prove that logic is logical. (Any form of logic that is capable of reasoning about itself, will end up proving that true = false.)

Note the use of the word *cannot*. It is impossible to prove that logic is logical, it is impossible to prove that 1+1=2. We just have to believe it, because we cannot find any evidence to the contrary.

Cheers,
Wol

Purpose of Copyright

Posted Nov 5, 2024 13:42 UTC (Tue) by daroc (editor, #160859) [Link] (2 responses)

It's true that it's impossible for a logic system to prove its own correctness[1], but it is in fact possible for a stronger system to prove that a weaker system is consistent.

[1]: https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompletene...

Purpose of Copyright

Posted Nov 5, 2024 15:41 UTC (Tue) by Wol (subscriber, #4433) [Link] (1 responses)

> it is in fact possible for a stronger system to prove that a weaker system is consistent.

Turtles all the way down it is :-) The problem comes when you don't have a stronger system.

Cheers.
Wol

Purpose of Copyright

Posted Nov 5, 2024 16:43 UTC (Tue) by daroc (editor, #160859) [Link]

You can construct infinite hierarchies of stronger systems! By selectively adding axioms asserting the consistency of subsystems, for example.

... but yes, that really doesn't make it less a case of turtles all the way down. Eventually, you have to ground your beliefs at some level.

Purpose of Copyright

Posted Nov 5, 2024 19:13 UTC (Tue) by mathstuf (subscriber, #69389) [Link] (1 responses)

> (Any form of logic that is capable of reasoning about itself, will end up proving that true = false.)

Slightly inaccurate…the alternative is that there are things that are true that the system cannot prove true or false. Generally that is considered better than having an inconsistency, so incompleteness is what we live with. But there are logical systems that blur the line…can't remember the name right now though.

Purpose of Copyright

Posted Nov 6, 2024 8:01 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

Heh, they are "inconsistent logic" systems which try to contain the fallout of having things like `false = true` be reachable. https://plato.stanford.edu/entries/mathematics-inconsistent/

1+1=2 Proof

Posted Nov 5, 2024 21:30 UTC (Tue) by jjs (guest, #10315) [Link]

Not hard to find -
https://proofwiki.org/wiki/1%2B1_%3D_2

If you want the 378-page gory addition, check out the Principia Mathematica - https://en.wikipedia.org/wiki/Principia_Mathematica for an overview of the book. https://plato.stanford.edu/entries/principia-mathematica/... directs to discussion of the proof, which took until pg 83 of Vol 2 of the work to reach the conclusion that, given the axioms used, 1+1=2.
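
For anyone who just wants the short version, here is a rough sketch of the usual Peano-style argument (not the Principia one). It assumes the conventional definitions, none of which come from the links above: S is the successor function, 1 is shorthand for S(0), 2 is shorthand for S(S(0)), and addition is defined by a + 0 = a and a + S(b) = S(a + b).

```latex
\begin{align*}
1 + 1 &= 1 + S(0)     && \text{definition of } 1 \\
      &= S(1 + 0)     && \text{since } a + S(b) = S(a + b) \\
      &= S(1)         && \text{since } a + 0 = a \\
      &= S(S(0)) = 2  && \text{definitions of } 1 \text{ and } 2
\end{align*}
```

Under those assumptions the statement is derived in three rewriting steps from the definition of addition; with different definitions of "1", "2" or "+" the steps would of course look different.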

Purpose of Copyright

Posted Nov 2, 2024 16:04 UTC (Sat) by intelfx (subscriber, #130118) [Link] (20 responses)

> I am not aware of any company in the history of the world that would be happy for its employees to publicly speculate about whether said company's core business model is legal.

Strictly _happy_? No, never said that. But I'm certain that there were/are many companies in the history of the world that don't need/require/intend their employees to self-censor in this way.

Purpose of Copyright

Posted Nov 3, 2024 21:09 UTC (Sun) by NYKevin (subscriber, #129325) [Link] (19 responses)

Again in the "I'm stunned I need to explain this" category: I am not going to make my employer unhappy just so that I can argue with random people on the internet.

Purpose of Copyright

Posted Nov 3, 2024 21:22 UTC (Sun) by intelfx (subscriber, #130118) [Link] (18 responses)

> "I'm stunned I need to explain this"

Stunned is good, because yes, you _do_ need to explain something that is very much not a universal truth.

Anyway, this discussion is clearly going in circles, with my points not being responded to (in substance, rather than in emotions), so I'm going to wrap it up here.

Purpose of Copyright

Posted Nov 5, 2024 9:14 UTC (Tue) by farnz (subscriber, #17727) [Link] (17 responses)

This is less a Google thing, and more a US thing; most US states are "at will employment" (i.e. no notice required to quit or fire someone unless the firing is for an unlawful reason), and being fired for being seen as criticising your employer is not an unlawful reason.

The result is that people in the US are very much more cautious of risking being seen to criticise their employers than people elsewhere; I've seen this talking (in person) to friends who work at Google, where the UK employees were fine to go down paths that had their colleagues from the US telling them to shut up because it might upset their common employer, and I've since seen it with other multinationals, including Intel and Arm.

Purpose of Copyright

Posted Nov 5, 2024 12:24 UTC (Tue) by pizza (subscriber, #46) [Link] (16 responses)

> I've since seen it with other multinationals, including Intel and Arm.

It is _extremely_ common for your employment (and severance) agreement to include some sort of anti-disparagement clause. Even though these clauses are largely unenforceable in jurisdictions with strong worker protection rules and/or unions, publicly badmouthing your employer rarely goes well in the short or longer term.

Anti-disparagement clauses and effect on speech

Posted Nov 5, 2024 13:15 UTC (Tue) by farnz (subscriber, #17727) [Link] (15 responses)

Specifically what I've noticed is that US-based employees tend to construe anti-disparagement clauses much more broadly than non-US employees.

A non-US employee would happily consider whether Google's training of search indexing ought to be legal or not; they'd still assume that it is legal, since their employer obviously wouldn't deliberately break the law, but they'd happily go down the line of "should the law be changed so that an implied licence does not cover commercial use", where a US employee will often refuse to speculate for fear of going over that line. The non-US employee is quite likely to bring up their employer's products or services as an example of what you'd lose if you changed the law to make the thing they do illegal, but will happily have that discussion even though it veers close to "maybe my employer's behaviour is not legal".

Anti-disparagement clauses and effect on speech

Posted Nov 11, 2024 13:05 UTC (Mon) by kleptog (subscriber, #1183) [Link] (14 responses)

> Specifically what I've noticed is that US-based employees tend to construe anti-disparagement clauses much more broadly than non-US employees.

I did some searching and it seems that while anti-disparagement clauses do exist around the world, their actual usage varies a lot. I'm in NL and I've never had one, but apparently they're more common for senior management positions.

One big difference appears to be that in much of the world such clauses are limited by (constitutional) freedom of expression which severely limits their applicability for most employees. But in the US since the constitutional freedom of speech protection is only against government censorship, employers can muzzle employees much more easily. As long as I'm not representing my employer I can express my opinion about them as much as I like.

(It's funny the comment about how people in the UK appear much more open to talking about their employer than Americans. I find UK employees very reticent compared to Dutch people.)

Anti-disparagement clauses and effect on speech

Posted Nov 11, 2024 15:46 UTC (Mon) by Wol (subscriber, #4433) [Link]

> I find UK employees very reticent compared Dutch people.

I can be very critical of my employer. But it's always in a "We should be doing better" context. If people outside of work criticize them, I would defend them strongly, because while Americans seem to see companies as amoral sociopaths, I (and I hope most Brits) tend to see companies as collections of people who all individually want to do "the right thing".

In part I think it's because we have far fewer "Megastar" CEOs, and those we do have are generally viewed as pretty bad - Robert Maxwell, Philip Green, ...

Cheers,
Wol

Anti-disparagement clauses and effect on speech

Posted Nov 11, 2024 17:15 UTC (Mon) by paulj (subscriber, #341) [Link] (12 responses)

> I find UK employees very reticent compared Dutch people.

The Netherlands has something of an ingrained culture of upfront "honesty" though. To the extent that people from other countries - Anglo ones especially - pretty much have to get cultural-man-up training when moving to NL. ;) You've grown up with it, but the dutch, uhm, .... "lack of reticence" is something foreigners definitely notice. ;) Possibly only Germans can walk into that without culture shock.

I notice it and I partially grew up with it. This is something that is quite specific to certain Germanic countries - NL particularly, DE too. (I wonder what SE and DK are like - no experience. My limited data points are that Swedes and Danish are a touch more.... "reticent").

Dutch upfront honesty

Posted Nov 11, 2024 18:03 UTC (Mon) by rschroev (subscriber, #4164) [Link] (1 responses)

It's one of the major, or most noticeable, differences in culture between the Dutch and us, their Flemish neighbours. We are much more likely to not speak up to the relevant people when it matters, and then complain to each other behind their back.

Dutch upfront honesty

Posted Nov 12, 2024 10:26 UTC (Tue) by paulj (subscriber, #341) [Link]

I was going to bring in the historical background of Calvinism v Catholicism and its effect on culture as a possible explanation for the difference between the southern and northern Dutch (and the Vlaams are as Dutch as any Nederlanders, if not more so, surely? ;) Historically and linguistically).

Anti-disparagement clauses and effect on speech

Posted Nov 11, 2024 21:37 UTC (Mon) by kleptog (subscriber, #1183) [Link] (9 responses)

The thing is, I don't understand how you can get anything done if people don't say what they mean. It drives me up the wall. Things like the Anglo-EU translation guide help, but only so far. Like we had a meeting to discuss some survey results about how the business could be improved. Attendance was 50% UK/50% NL, but I think we got three words from the UK side. How on earth do you expect things to get better if you can't even bring yourself to say when something is crap?

Especially when one of the foundations of good software engineering is that you can give each other honest reviews so we all learn and get better. If you let bad choices pass it will cost everyone more in the long run.

The funny thing is, management in the UK had to really get used to the Dutch directness. Later they actually started to appreciate it, because when we said something was a good idea we meant it and weren't just boot-licking. Which meant projects actually moved forward because there was buy-in, rather than being subtly held back by people who disagreed.

Apparently it's historically due to Calvinism, which explains why our Flemish neighbours don't do it. Also, seeing the tide come in and not daring to point out the dike is shoddy gets people killed.

Anti-disparagement clauses and effect on speech

Posted Nov 11, 2024 22:05 UTC (Mon) by Wol (subscriber, #4433) [Link] (6 responses)

> The thing is, I don't understand how you can get anything done if people don't say what they mean. It drives me up the wall. Things like the Anglo-EU translation guide help, but only so far. Like we had a meeting to discuss some survey results about how the business could be improved. Attendance was 50% UK/50% NL, but I think we got three words from the UK side. How on earth do you expect things to get better if you can't even bring yourself to say when something is crap?

Well, quite often I find that when I try to say something, people jump in, talk over me, and put words in my mouth that I would never ever say. Maybe that's why I say far too much here :-)

But it's quite likely that Dutch directness has a quite chilling effect on Brits - if the Dutch kept their mouths shut they might find the Brits spoke out much more. If the imbalance is really that bad in your meetings, you need to ask the Brits their opinion, and if any of your Dutch guys tries to talk over them, you tell them in no uncertain words to keep their trap shut and LISTEN, DON'T SPEAK.

Can't remember where I came across it - many many years ago - but there was a story about a board of directors called in a management consultancy to help them improve their board meetings. And a lot of the board members were quite puzzled as to the value one guy provided - "Why's he on the board, what's the point of having him". Until the consultants asked them where all the board's ideas came from, and pointed out that nearly all the board's "good ideas" came from him.

I'd seriously suggest that if the Dutch are doing all the speaking, they need to learn how to listen. Sorry.

Cheers,
Wol

Anti-disparagement clauses and effect on speech

Posted Nov 12, 2024 10:41 UTC (Tue) by taladar (subscriber, #68407) [Link] (4 responses)

It seems to me (as a German) that the UK is pretty bad at criticizing established systems in general considering how many parts of e.g. the UK political system really could do with a reform as they have recently shown severe deficiencies when someone doesn't follow some unwritten rules and yet the Brits seem to mostly be busy trying to avoid talking about those failures and necessary reform at all costs.

And I say that coming from a country that is pretty backwards itself when it comes to implementing necessary changes.

Anti-disparagement clauses and effect on speech

Posted Nov 12, 2024 12:59 UTC (Tue) by Wol (subscriber, #4433) [Link] (3 responses)

Yup. I agree we could do with SOME political reform. But as an example of a seriously botched reform I'd take our referendum on Proportional Representation as an example. We were presented with a simple choice - the existing First Past The Post or, (MASQUERADING as Proportional Representation) a Single Transferable Vote.

I want PR but not STV! Which way do I vote? I want to keep our existing system, but with a proportional top-up - open ONLY to people who came second.

The snag is, the ENGLISH like to think everything dates from time immemorial and before - "Time Immemorial" being twelve hundred and something, and "before" being "1066 and all that". The Scots, Welsh and Irish would beg to differ, never mind that most of us are Britons/Welsh ...

But with the United Kingdom dominated by petty small-minded little-englanders, it's quite hard to make people look out at all these good ideas in the wider world - the NHS is a wonderful example of a sacred cow that - maybe shouldn't be shot - but deserves to be put out to pasture!

Cheers,
Wol

Anti-disparagement clauses and effect on speech

Posted Nov 12, 2024 15:05 UTC (Tue) by farnz (subscriber, #17727) [Link] (2 responses)

Note that in the case of the existing FPTP system for Westminster, "time immemorial and before" is the Representation of the People Act 1948, which is the last major change to our voting system (there have been minor changes since, like lowering the voting age and controlling election expenses, but nothing significant).

A lot of the people who feel that things "shouldn't" change don't realise just how recently they did change…

Anti-disparagement clauses and effect on speech

Posted Nov 12, 2024 16:02 UTC (Tue) by Wol (subscriber, #4433) [Link] (1 responses)

"Time Immemorial" is, I believe, a legally defined term. As in "when our current legislative system was created", ie Magna Carta-ish in age. Long before 1948.

But yes, I think our current electoral system has probably only really been in existence since "back to 1948 and the same again". When did they abolish the "rotten boroughs"? 1850-ish? and then Universal Suffrage about 1918 along with the gutting of the House of Lords?

(Or is Universal Suffrage technically the grant of the vote to all MEN over the age of 30? Again 1850-ish?)

Our modern electoral system is *mahousively* younger than Time Immemorial.

Cheers,
Wol

Anti-disparagement clauses and effect on speech

Posted Nov 12, 2024 16:52 UTC (Tue) by amacater (subscriber, #790) [Link]

In English law, "Time immemorial" == the date of the coronation of Richard 1st of England in 1189.

Anything prior is assumed to have been there for forever unless there is documentary evidence proving things one way or another. Otherwise, it's a convenient pivot date: anything before is considered true by default- "whereof the memory of man runneth not to the contrary"

[Legal history was part of my undergraduate degree]

Anti-disparagement clauses and effect on speech

Posted Nov 12, 2024 12:15 UTC (Tue) by kleptog (subscriber, #1183) [Link]

> But it's quite likely that Dutch directness has a quite chilling effect on Brits

Maybe, but it's not like they don't get the opportunity. Even when explicitly asked you get long silences. Maybe we just hire shy people. Even my manager (who is British) says this is normal for them.

Anyway, this is going quite far afield :)

Anti-disparagement clauses and effect on speech

Posted Nov 12, 2024 11:30 UTC (Tue) by paulj (subscriber, #341) [Link] (1 responses)

The approach elsewhere of not saying anything and letting issues fester can be much worse, I agree. The direct culture just takes a little getting used to! :)

I'm in a weird place, cause culturally I'm not really Dutch, however there must be a genetic component to that directness or something (or I infused enough of the culture in the time I did spend in NL), cause I have it a bit myself and do appreciate it. It's ultimately better than not having it. Though, I do think the Dutch sometimes could do with learning how to sweeten the directness a bit.

An interesting culture is the Chinese one. It avoids directness and confrontation. However, they do still bring up issues, just in a very indirect way. If someone from such a culture seems to be talking to you about something unrelated to anything at hand, and almost abstract - start looking for the allegory! They're probably trying to send you an important message. It's an interesting approach, as it has the advantage of avoiding personal hurt and egos - the issue is never raised in a personal, direct way - and so possibly makes it easier for people to bring up issues.

(That said, there /seems/ to me to be another level to dealing with issues in Chinese culture, which can involve frank and loud exchanges of views, usually in a group - at least a 3rd party present - yet people don't get outright angry, and they don't seem offended with each other at the end either. I don't speak Mandarin, but it seems quite direct. I haven't figured out how this works, and how the culture manages this escalation from the indirect raising of issues to that more direct, outspoken exchange).

I think it'd be really funny to work in a mixed Chinese-Dutch culture company.

Anti-disparagement clauses and effect on speech

Posted Nov 12, 2024 18:19 UTC (Tue) by raven667 (subscriber, #5198) [Link]

> I haven't figured out how this works, and how the culture manages this escalation from the indirect raising of issues to that more direct, outspoken exchange

I have no special insight, but the first thing that comes to mind when presented with this behavior description is that the difference in attitudes would most likely be due to differences in relative social status: a small group of closely knit peers is a very different relationship than a hierarchical one (family or business) or the relationship with professional colleagues and acquaintances that may be similar status but is not personal.

Purpose of Copyright

Posted Nov 2, 2024 22:11 UTC (Sat) by kleptog (subscriber, #1183) [Link]

> As far as I know, one of the arguments here (I don't remember if it was an argument ever made in court or "just" a thought-experiment argument) is that Google Search makes limited verbatim copies of content to direct customers *towards* that content. Nothing of that sort happens when content is used to train models which are then used to create (likely) *competing* content.

If I mirror a website and then train an AI with it, that's a problem, unless I somehow generate traffic for that website, then it's suddenly ok? That sounds like an awfully fuzzy way to do this. I don't think you can make a meaningful distinction between the processing Google does for its search engine and the processing OpenAI does. I don't see the problem with creating "competing" content, that's how we get better content, except normally there's more human labour involved. LLMs don't really produce better output by themselves yet, though with a human guiding them they do.

Purpose of Copyright

Posted Nov 1, 2024 18:03 UTC (Fri) by Wol (subscriber, #4433) [Link] (6 responses)

> Specifically, the purpose of copyright, since the Statute of Anne in 1709/10 is to have written works put into the public arena, by rewarding the author with income for producing the work, with the work becoming public domain after lapse of Copyright. Originally only for written works, it's steadily expanded. But, fundamentally, the goal has not changed.

Actually, no. The whole point of the Statute of Anne was to impose censorship and keep works locked up.

Your "justification", aka "to have written works put into the public arena, by rewarding the author with income", comes from the US constitution, which POSTdates the Statute of Anne by pretty much an entire normal lifetime ...

Cheers,
Wol

Purpose of Copyright

Posted Nov 1, 2024 18:14 UTC (Fri) by jjs (guest, #10315) [Link] (5 responses)

Great Britain already had censorship available - prior to the Statute of Anne, printers held what could be considered copyright, not authors. The Government could control what printers printed. The Statute of Anne changed that.

From https://www.britannica.com/topic/copyright - "The Statute of Anne, passed in England in 1710, was a milestone in the history of copyright law. It recognized that authors should be the primary beneficiaries of copyright law and established the idea that such copyrights should have only limited duration (then set at 28 years), after which works would pass into the public domain." The rest of the article gives a good account of the history of copyright.

Statute of Anne

Posted Nov 2, 2024 14:20 UTC (Sat) by jjs (guest, #10315) [Link] (4 responses)

I've discovered Yale Law School has digitized a number of old laws - not just of the US. Here's the exact wording of the Statute of Anne, for anyone interested in exactly what it says: https://avalon.law.yale.edu/18th_century/anne_1710.asp

Statute of Anne

Posted Nov 2, 2024 15:05 UTC (Sat) by Wol (subscriber, #4433) [Link] (3 responses)

INTERESTING.

Seeing as I've repeatedly seen it said that the Statute of Anne was what gave the printers the copyright ...

And I notice also that the American Constitution copied it - I wonder to what extent that was down to the fact the Statute appears to have been watered down pretty quickly. I'm not sure how much later but there's that quote from Hansard about an MP trying to "extend copyright to prevent authors dying in poverty", and referencing a well-known author's daughter I believe it was - only for it to be pointed out her father's works were owned by a publisher and she would benefit nothing.

Shades of what's happening today ...

(And a re-run of Magna Carta - which as a declaration of Human Rights is looked up to today, but as legislation it was a damp squib which was effectively worthless inside just a few years.)

Cheers,
Wol

Statute of Anne

Posted Nov 2, 2024 16:16 UTC (Sat) by jjs (guest, #10315) [Link]

Original US copyright was very similar to the Statute of Anne copyright - needed to be registered, was only good for 14 years, extensible for another 14. https://library.tc.columbia.edu/blog/content/2024/may/tod...

The US Constitution actually DIDN'T create copyright law - it only allowed it: "The Congress shall have Power . . . To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries;"

Which enabled Congress to create copyright and patent, but didn't actually mandate it. https://www.archives.gov/founding-docs/constitution-trans...

Thomas Jefferson, for one, may not have been in favor of copyright & patents, from my understanding - https://www.goodreads.com/quotes/4276-he-who-receives-an-... - "He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me."
― Thomas Jefferson, Selected Writings

Here's a history of US copyright law I found: https://www.etblaw.com/history-of-copyright-law-in-the-un...

I leave it to others to track that history and current US law to see how they impact this discussion of AI use of copyrighted materials for learning - IANAL.

Statute of Anne

Posted Nov 3, 2024 8:36 UTC (Sun) by NYKevin (subscriber, #129325) [Link] (1 responses)

> Seeing as I've repeatedly seen it said that the Statute of Anne was what gave the printers the copyright ...

This is a simplification.

Under the old (pre-Anne) system, licenses were issued by the Crown, and gave printers exactly the same exclusivity they would later enjoy under copyright. This was seen as a form of censorship, so when it went up for renewal, Parliament allowed it to lapse, in spite of the publishers' protests that they all rather liked this system. The publishers spent a decade or two having to (gasp) compete with one another, but eventually they managed to find a way of restructuring the old system so that it superficially looked like a benefit to the author instead of a means of censorship (by having the author sign the papers instead of the Crown). They then presented that to Parliament, and it became the Statute of Anne.

After it was enacted, they told all of the non-famous authors "either you sign this ridiculously one-sided publishing contract which hands us full control of your copyright, or else your book will never see the light of day," thus effectively reviving the same system that Parliament had just allowed to expire, but with the author substituted for the Crown. That problem did not get properly addressed until the US hit upon the notion of termination rights in 1976 (which mostly don't exist in other countries). See 17 USC 203 and 304(c) for the legalese on that, but the TL;DR is that they gave the author an inalienable right to revoke any license (or outright transfer of copyright) during a five-year period beginning 35 years after it was executed (the idea being that a successful author can negotiate a more equitable license with the publisher, or with a different publisher, just by threatening to terminate the existing license).
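As a rough illustration of the arithmetic only (not legal advice), the window described above can be sketched in Python. This is a simplified sketch that assumes the window is counted purely from the execution date of the grant; it ignores any other variants or notice requirements in the statute, and the function names `add_years` and `termination_window` are hypothetical.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only reachable for Feb 29 when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def termination_window(executed: date) -> tuple[date, date]:
    """Simplified window: opens 35 years after execution, lasts five years.

    Assumption: counts only from the execution date, as in the summary above.
    """
    start = add_years(executed, 35)
    end = add_years(start, 5)
    return start, end


if __name__ == "__main__":
    opens, closes = termination_window(date(1990, 6, 1))
    print(f"Window opens {opens} and closes {closes}")
```

Under these assumptions, a license executed on 1 June 1990 would become terminable between 1 June 2025 and 1 June 2030.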

Unfortunately, termination rights apply to all licenses, even FOSS licenses,* and the law very intentionally provides no loopholes** other than "be someone's employee when you wrote the thing" or "put it in your will and die." Sooner or later, someone is going to revoke a FOSS license, and a bunch of lawyers will descend and extract a large sum of time, energy, and probably money from the FOSS community as a whole, all because some book publishers in the 18th century were unwilling to participate in the free market.

***

* To be fair, termination rights can only be exercised against specific, named licensees. So you can revoke the license for BigTechCo. Inc., or revoke the license for John Doe, but you can't revoke the license for "everyone." But that's still a big problem in the case of FOSS developed primarily by a small number of contributors or companies. It is as yet unclear whether BigTechCo./John Doe can immediately turn around and download another copy, and thereby obtain a fresh license, or if the law would somehow "see through" this arrangement.
** No, you can't put "don't terminate the license" in the license, nor in a separate agreement, Congress thought of that rather obvious loophole and explicitly closed it (the law says you can terminate the license even if you have agreed not to terminate the license). Nor can you sell or gift the whole copyright to a trust or other entity (you can terminate both licenses and transfers of copyright). Any contract which tends to frustrate termination rights in some more elaborate fashion is possibly void as contrary to public policy, unless it is an employment contract (works made for hire are not subject to termination rights).

Statute of Anne

Posted Nov 3, 2024 22:29 UTC (Sun) by Wol (subscriber, #4433) [Link]

> ** No, you can't put "don't terminate the license" in the license, nor in a separate agreement, Congress thought of that rather obvious loophole and explicitly closed it

For Open Source, you could always argue that the Constitution holds that loophole open. Open Source "advances the arts", and revoking a licence actively hinders the pre-conditions for Copyright.

But then, I'd argue that "life plus 50" is unconstitutional, so that argument might well not fly.

Cheers,
Wol

What's going on here?

Posted Oct 31, 2024 18:22 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

If you had read the rest of my comment, you would have noticed that I already discussed the possibility of the model being a derivative work of the training inputs, and the fact that OSAID obligations do indeed attach in such a situation (i.e. the OSAID 1.0 already implicitly requires open source models which are derivative works of part or all of their training data to license said training data). I also specifically recommended that OSAID should require the training inputs to be distributed verbatim on request in such a case, since by assumption they already have (or should have) a license anyway.

In other words, OSAID 1.0 already accounts for the problem you have identified. Piratical (infringing) models are not and can never be OSAID 1.0 compliant.

What's going on here?

Posted Oct 31, 2024 13:51 UTC (Thu) by raven667 (subscriber, #5198) [Link] (2 responses)

> Personally, I think that is an unrealistic position to take in practice, because the regulatory and legal hurdles would make it impractical for any reasonably large model to actually do that, so you would end up defining open source AI into a small corner where it would not be able to meaningfully compete with proprietary AI.

Personally I think, based on what I've read about it so far, that training data for ML should require explicit opt-in from the people who own that data (the writers/copyright holders), and maybe even some regulatory limits on how many national resources can be spent on ML training, even if that delays advancements in ML technology by 50 years. I don't think that impracticality is a problem, because nothing that ML is capable of providing solves any real problem. We have all the technological capabilities we need to solve the serious problems of our time right now; adding expensive ML/LLM/"AI" on top of that just causes more problems.

As for a FOSS definition of "AI", I think that the training data set is necessary for something to be fully Open, so that the model can be completely replicated; that's analogous to the source code. But you could also have a lesser definition where the model training is secret but the weights can be adjusted and the model can be copied.

What's going on here?

Posted Oct 31, 2024 23:51 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (1 responses)

> and maybe even some regulatory limits on how many national resources can be spent on ML training,

Personally, I am strongly opposed to the government telling me what code I may or may not run on my hardware, especially if it's purely on the basis of someone thinking it's not a good use of "national resources."

(Some of the early LLM and latent diffusion research was done with public funding, which I suppose is what you could've meant, but that's mostly been replaced with private VC money by now.)

What's going on here?

Posted Nov 1, 2024 12:01 UTC (Fri) by paulj (subscriber, #341) [Link]

The huge energy use required for training on AI data-clusters causes externalities. The scraping or acquisition of public data for use in training, by a few very rich corporates able to afford the huge CapEx for the AI data-centres, and OpEx for the energy, invokes issues of social good - particularly when they're using that AI just to figure out how to serve us better advertising (a lot of the public hype is about AI for generating videos of cats, but within those tech corps, recommendation systems are a big use-case for AI).

I err towards minimal government regulation, but it's still a goldilocks thing - too little is as bad as too much.

In Ireland we are effectively limiting big tech from building more massive AI through planning laws, because data-centres are now sucking up ~20% of power generation in the state, and the continued expansion of data-centres (some, or much, of it AI) is just not sustainable at all.