@swheritage To find out if they have appropriated your code, you can check "Am I in The Stack?": huggingface.co/datasets/bigcod

However, _do not believe their supposed opt-out_. I mean, sure, submit an opt-out if you want, but I know how they operate -- they'll just keep doing whatever they want and never process any takedowns unless the law makes them.

@arborelia @swheritage Interesting! They appear to do license checks for this one. The repo I have which doesn't have an open source license is not included there.

However, they do not operate with a list of allowed licenses - they've got a repo listed that uses a completely custom license (to prevent people from doing stupid shit with it) in there.

Their training data may not comply with the license.

This also makes me almost regret not putting hello.jpg on github, as I rehosted some code there that originally included it.

@ryanc @swheritage I'm already seeing in their list of opt-out GitHub issues that they've included some people's code that is "all rights reserved", and some people's GPL code.

@ryanc @swheritage Also, no language model is capable of obeying an attribution clause, which is in almost every license.

@arborelia @swheritage Clearly, they aren't talking to their IP lawyer enough.

@arborelia @swheritage Is there content I can post to github that is illegal in France which also won't get me banned from github? 🤔

@arborelia @swheritage Also, this is associated with a university, which should have ethics people who are very risk averse...

@arborelia @swheritage

"To the best of our knowledge, all files contained in the dataset are licensed with one of the permissive licenses (see list in Licensing information) or no license."

Emphasis mine.

What the cinnamon toast fuck?
