Hey, in case their transphobia wasn't enough for you, @swheritage is yoinking all the code on GitHub -- regardless of license -- to train a generative AI that plagiarizes code.
No matter how many times they say "ethical", it isn't.
@swheritage To find out if they have appropriated your code, you can check "Am I in The Stack?": https://huggingface.co/datasets/bigcode/the-stack-v2
However, _do not believe their supposed opt-out_. I mean, sure, submit an opt-out if you want, but I know how they operate -- they'll just keep doing whatever they want and never process any takedowns unless the law makes them.
@ryanc @swheritage I'm already seeing in their list of opt-out GitHub issues that they've included some people's code that is "all rights reserved", and some people's GPL code.
@arborelia @swheritage Clearly, they aren't talking to their IP lawyer enough.
https://github.com/ryancdotorg/goatsefloppy
Its copyright/license statement is:
"Written by 2004-2005 kometbomb (and some other people, thanks to them) Feel free to treat like your own kids. Sicko."
Which would make any competent lawyer scream.
@arborelia @swheritage Is there content I can post to github that is illegal in France which also won't get me banned from github? 🤔
@arborelia @swheritage Also, this is associated with a university, which should have ethics people who are very risk-averse...
@arborelia @swheritage Also, lest anyone think I don't really care about the copyright infringement...
GitHub's terms of service don't require that I allow copies of my code to be hosted there, only forks (which aren't really copies), and I've DMCA'd copies before.
https://github.com/github/dmca/blob/master/2021/08/2021-08-03-brainflyer.md
"To the best of our knowledge, all files contained in the dataset are licensed with one of the permissive licenses (see list in Licensing information) _or no license_."
Emphasis mine.
What the cinnamon toast fuck?
@ryanc @arborelia @swheritage The ethics committee is here: https://www.inria.fr/fr/comite-operationnel-devaluation-des-risques-legaux-et-ethiques
@ryanc @swheritage Also, no language model is capable of obeying an attribution clause, which is in almost every license.