https://youtube.com/watch?v=zprSxCMlECA
There’s #demoscene, and then there’s this abomination of nature and genius
Somebody must have bet him that he couldn’t make a demo without the computer itself
It's late enough to be hacker hours, if you're as old as I am. Gonna write down a bunch of rambly thoughts about #xz and #autoconf and capital-F Free Software sustainability and all that jazz. Plan is to edit it into a Proper Blog Post™ tomorrow. Rest of the thread will be unlisted but boosts and responses are encouraged.
They seem to have changed development priorities so that handling takedowns and name changes has more than the 0 priority they gave it before.
But they're doing it their way, which means they have to code themselves out of the corner they coded themselves into. I'm glad to see the development but disappointed that they're just _starting_ after two years.
No updates that I know of on the "feeding everyone's code, regardless of license, into a HuggingFace AI" front.
The situation with the Software Heritage Archive might be improving, and they stopped deadnaming me in particular.
I don't want to give them credit for this until they document a process that _other_ trans software authors can use, but here are the updates:
Imagine if every time you looked up anything in the Yellow Pages or encyclopedia, there was this big section at the top of the page called "Notes from Gary" where some guy named Gary who seems to know about 20% less than the average person just sorta gets to say whatever he wants about the subject, no matter how irrelevant it is.
This is what it's like to use Google now.
@braid @wikimediafoundation i facilitated a couple of those art+feminism wiki edit-a-thons a few years ago and men were overly criticizing pages we made, saying these notable women were not notable, threatening to remove their pages despite all the awards these artists and architects had won, even before the workshops were over. it was so discouraging that i don't do it anymore.
Wait, it's even worse. The dataset is based on @swheritage's archive, containing way more than just GitHub (e.g. @Codeberg is archived, too).
I assumed they were somewhat neutral, but they're praising the LLM usage of this unlicensed code:
https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
Also, they're refusing to remove deadnames, even outright ignoring GDPR demands for it:
https://cohost.org/arborelia/post/5169338-the-software-heritag
I can only conclude that they're a bad actor and should be considered harmful by the #OpenSource community.
The Software Heritage Archive wants to deadname me forever: part 3.
https://cohost.org/arborelia/post/5169338-the-software-heritag
sometimes the Steam Community pages are good
[from the Steam community page for alpha centauri uploaded by someone whose steam name won’t render in my client]
oh hey just as a PSA, any code any of y'all may have on github might have been scraped by @swheritage, a self-proclaimed "preservation" org which
- just hoovered up vast amounts of data without asking or telling anyone
- insists on deadnaming trans people forever for "integrity" reasons
- used it to build an LLM training data set
See https://huggingface.co/datasets/bigcode/the-stack-v2 to check whether your code is included and for opt-out instructions
Every time I boot to a flash drive I'm reminded of Jersey Jack's pinball update process
You flash the new software to a flash drive, open up the coin door, put t' plug int' 'ole and wait a bit until it's done.
And then you IMMEDIATELY TAKE THAT FLASH DRIVE AWAY AND WIPE IT CLEAN right then and there before you do anything else
Because if you leave it lying around, and someone (MAYBE FUTURE-YOU) goes "Huh what's on this" and plugs it in, and then forgets to unplug it between reboots, it will format your hard drive and turn your computer into The Hobbit Pinball *completely automatically and without any input from you whatsoever*
I erased those 18 minutes on Nixon’s tapes. there was some funny stuff in there but it’s gone. Sorry
To the best of our knowledge, all files contained in the dataset are licensed with one of the permissive licenses (see list in Licensing information) _or no license_.
Emphasis mine.
What the cinnamon toast fuck?
@swheritage To find out if they have appropriated your code, you can check "Am I in The Stack?": https://huggingface.co/datasets/bigcode/the-stack-v2
However, _do not believe their supposed opt-out_. I mean, sure, submit an opt-out if you want, but I know how they operate -- they'll just keep doing whatever they want and never process any takedowns unless the law makes them.
Hey, in case their transphobia wasn't enough for you, @swheritage is yoinking all the code on GitHub -- regardless of license -- to train a generative AI that plagiarizes code.
No matter how many times they say "ethical", it isn't.
I like games that you can play again and they're different the next time: such as randomizers, roguelikes, and gender expression! Twitch stream: https://twitch.tv/arborelia
also at: https://cohost.org/arborelia