• 0 Posts
  • 29 Comments
Joined 1 year ago
Cake day: July 1st, 2023









  • I’d welcome you to offer a rigorous definition of this supposedly well-known distinction. Computers don’t generate anything spontaneously. They always require some level of direction.

    Are the outputs of VSTs not “computer generated”? You can fumble around on a keyboard just moving up and down until you find the pitch you want, and the software will output an orchestral swell of dozens of instruments that take years and years to master, with none of that effort expended by the one mashing the keyboard.

    Is that sound computer-assisted or computer-generated in your estimation? Much the same with AI images. It’s not fundamentally different from any other computerized tool.


  • Depends on the workflow, in my opinion. There are people who just type “1girl lol” into a text box and there are some people who set up workflows with hundreds of steps including significant manual work done in Photoshop or GIMP.

    Similarly, nearly all music these days is made with a DAW, which lets you selectively edit and combine performances in ways you otherwise couldn’t achieve. Drummer off beat? Quantize it. Want a string section but don’t know how to play violin? Use a synth. And certainly there are people who are overly reliant on those tools because their core music abilities aren’t very strong.

    If you think any amount of computer assistance means that something isn’t art, then basically all music made since the 90s would also not be art. It’s not a binary. Any tool can be used tastefully or be used to mask an underlying lack of talent.










  • Critical to understanding whether this applies is to understand “use” in the first place. I would argue it’s even more important, because it’s a threshold question that determines whether you even need to read section 107.

    17 U.S. Code § 106 - Exclusive rights in copyrighted works

    Subject to sections 107 through 122, the owner of copyright under this title has the exclusive rights to do and to authorize any of the following:

    (1) to reproduce the copyrighted work in copies or phonorecords;

    (2) to prepare derivative works based upon the copyrighted work;

    (3) to distribute copies or phonorecords of the copyrighted work to the public by sale or other transfer of ownership, or by rental, lease, or lending;

    (4) in the case of literary, musical, dramatic, and choreographic works, pantomimes, and motion pictures and other audiovisual works, to perform the copyrighted work publicly;

    (5) in the case of literary, musical, dramatic, and choreographic works, pantomimes, and pictorial, graphic, or sculptural works, including the individual images of a motion picture or other audiovisual work, to display the copyrighted work publicly; and

    (6) in the case of sound recordings, to perform the copyrighted work publicly by means of a digital audio transmission.

    Copyright protects just what it sounds like: the right to “copy” or reproduce a work, along the lines of the rights listed above. It is not clear that use in training AI falls into any of these categories. The question mainly relates to items 1 and 2.

    If you read through the court filings against OpenAI and Stability AI, much of the argument is based around trying to make a claim under case 1. If you put a model into an output loop you can get it to reproduce small sections of training data that include passages from copyrighted works, although of course nowhere near the full corpus can be retrieved because the model doesn’t contain anything close to a full data set - the models are much too small and that’s also not how the transformer architecture works. But in some cases, models can preserve and output brief sections of text or distorted images that appear highly similar to at least portions of training data. Even so, it’s not clear that this output infringes copyright, because these are small snippets that are not substitutes for the original work and don’t affect the market for it.

    Case 2 would be relevant if an LLM were classified as a derivative work. But LLMs are also not derivative works in the conventional definition, which is things like translated or abridged versions, or different musical arrangements in the case of music.

    For these reasons, it is extremely unclear whether copyright protections are even invoked, because the nature of the use in model training does not clearly fall under any of the enumerated rights. This is not the first time this has happened, either - the DMCA of 1998 amended the Copyright Act of 1976 to add cases relating to online music distribution as the previous copyright definitions did not clearly address online filesharing.

    There are a lot of strong opinions about the ethics of training models and many people are firm believers that either it should or shouldn’t be allowed. But the legal question is much more hazy, because AI model training was not contemplated even in the DMCA. I’m watching these cases with interest because I don’t think the law is at all settled here. My personal view is that an act of Congress would be necessary to establish whether use of copyrighted works in training data, even for purposes of developing a commercial product, should be one of the enumerated protections of copyright. Under current law, I’m not certain that it is.