Hey, in case their transphobia wasn't enough for you, @swheritage is yoinking all the code on GitHub -- regardless of license -- to train a generative AI that plagiarizes code.

No matter how many times they say "ethical", it isn't.

mstdn.social/@swheritage/11204

@swheritage To find out if they have appropriated your code, you can check "Am I in The Stack?": huggingface.co/datasets/bigcod

However, _do not believe their supposed opt-out_. I mean, sure, submit an opt-out if you want, but I know how they operate -- they'll just keep doing whatever they want and never process any takedowns unless the law makes them.

@arborelia @swheritage Interesting! They appear to do license checks for this one. The repo I have which doesn't have an open source license is not included there.

However, they do not operate with a list of allowed licenses: they've got a repo in there that uses a completely custom license (to prevent people from doing stupid shit with it).

Their training data may not comply with the license.

This also makes me almost regret not putting hello.jpg on github, as I rehosted some code there that originally included it.

@ryanc @swheritage I'm already seeing in their list of opt-out GitHub issues that they've included some people's code that is "all rights reserved", and some people's GPL code.

@ryanc @swheritage Also, no language model is capable of obeying an attribution clause, which is in almost every license.

@arborelia @swheritage

github.com/ryancdotorg/goatsef

copyright/license is

"Written by 2004-2005 kometbomb (and some other people, thanks to them) Feel free to treat like your own kids. Sicko."

which would make any competent lawyer scream

@arborelia @swheritage Is there content I can post to github that is illegal in France which also won't get me banned from github? 🤔

@arborelia @swheritage Also, this is associated with a university, which should have ethics people who are very risk averse...

@arborelia @swheritage Also, lest anyone think I don't really care about the copyright infringement...

GitHub's terms of service don't require that I allow copies of my code to be hosted there, only forks (which aren't really copies), and I've DMCA'd copies before.

github.com/github/dmca/blob/ma

@arborelia @swheritage

To the best of our knowledge, all files contained in the dataset are licensed with one of the permissive licenses (see list in Licensing information) _or no license_.

Emphasis mine.

What the cinnamon toast fuck?

@oreolek lol we're coming up on the 1 year anniversary of them not getting around to opt-out requests

@arborelia@computerfairi.es @swheritage@mstdn.social Yeah... I might go back to self-hosting git repos. Been an AGE since I did that, so maybe it's time to take a whack at it. I'm already using GitLab. Why not self-host? Fuck it. The only thing on this domain is this #Sharkey instance.

@arborelia @swheritage I know it is slightly warped, but for some reason, I first ran into the term "face-eating leopard" at the same time I was trying to access Hugging Face. Every time I see the logo, I think I'm seeing the happiest face-eating leopard.
Now, reading the comments, I may not have been wrong.

Computer Fairies

Computer Fairies is a Mastodon instance that aims to be as queer, friendly and furry as possible. We welcome all kinds of computer fairies!