Sorry to disappoint you, but they themselves admit in their Discord that at least part of the data comes from LAION. And “curated” just means that they did additional cleaning of the data, not that they didn’t use data randomly scraped from the internet, potentially violating the copyright of others. I don’t have proof per se, as they are very secretive about the data. What suggests that the data is not ‘pure’:
- David himself admitted multiple times during “office hours” (weekly livestreams) that V3 ingested a lot of pornographic and violent images (like the ones in LAION5B)
- The post I mentioned above, which you chose to ignore, where one of the people associated with MJ points to the LAION dataset
- This post from one of the official MJ guides:
- Getting a lot of images that look like data scraped from e-commerce shops like Etsy:
I even put one of the images into Google reverse image search and got this:
- Getting images with watermarks (this one is from one of the Discord users; I don’t get them frequently now, as I follow the MJ guides that say to use --no watermark, which reduces the problem)
- Getting images like this:
And vice versa: you are assuming a lot about how OpenAI operates, without proof, aside from shady licensing terms that force you to defend them in court, etc.
It’s not “AI researchers”. It’s some AI researchers working for startups and some of the less ethical corporations. I’ve worked as an AI researcher for 18 years (mostly in areas unrelated to vision) in both academic and corporate settings. And I can assure you that researchers in academia are very careful about the ethical side of their research. The corporation I worked for had slightly more relaxed (but still very strict) ethical standards than academia during the research phases of projects. But during the commercialization phase we were even stricter than academia.
Also, this, while not strictly aimed at AI art generators, will probably affect them a lot: