oh hey just as a PSA, any code any of y'all may have on github might have been scraped by @swheritage, a self-proclaimed "preservation" org which
- just hoovered up vast amounts of data without asking or telling anyone
- insists on deadnaming trans people forever for "integrity" reasons
- used it to build an LLM training data set
https://huggingface.co/datasets/bigcode/the-stack-v2 to check and for opt-out instructions
@outie @swheritage shit, they got all of my repos. 😡
@jaredwhite @outie Time to sue then. Those "BuT oPt-OuT iS a FoRm Of CoNsEnT!" shitfaces need to learn a valuable lesson, *especially* if what they do wouldn't be possible with opt-in. If @swheritage can't live without it then the organisation deserves to become history.
@Natanox @jaredwhite @outie @swheritage If the projects had an OSS license I don't see what grounds you have for a lawsuit.
@ao @Natanox @jaredwhite @outie @swheritage Open source isn’t public domain. open source licenses only count if you follow their terms, which an AI doesn’t.
@ao @Natanox @jaredwhite @outie @swheritage their dataset also _doesn’t_ follow licenses. they just took stuff with no license. They openly say this.
On the AI model side, asking users to obey licenses for them (so they don’t have to) sure is a gambit.
@arborelia @Natanox @jaredwhite @outie @swheritage hm yeah I missed that part. yeah guess that's a good point.