Version 3.0 is the result of hundreds of hours of training, tweaking, block merging, and refining. The training data consisted of a set of over 1,000 images, carefully selected to be high quality and representative, with hand-curated and hand-written captions. The set was drawn from photos I took myself, a sample of 200 photos selected using LAION-5B KNN search, and a hand-curated collection from a variety of sources, all of which were paid for.
Version 3.0 also includes a brand-new VAE, trained with a custom loss I developed that focuses on spectral similarity using a wavelet loss. It also uses LPIPS perceptual similarity to enhance very fine details. The new VAE produces more realistic results than the standard vae-ft-mse-840000-ema-pruned, though it occasionally produces artifacts in the form of orange highlights. These are uncommon and can easily be resolved through variation: slightly changing the prompt, using a new seed, or running image2image.
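The exact formulation of the custom loss is not published here, so the following is only an illustrative sketch of the general idea of a wavelet-based spectral loss, not the loss actually used to train this VAE. It assumes a multi-level 2D Haar decomposition, an L1 distance per subband, and hypothetical per-level weights. The LPIPS term is left as a pluggable callable (for example one built on the `lpips` package) rather than reimplemented, and `vae_reconstruction_loss`, `wavelet_weight`, and `perceptual_weight` are hypothetical names.

```python
import numpy as np


def haar_dwt2(x):
    """One level of an orthonormal 2D Haar transform over the last two axes.

    Works for (H, W) or (C, H, W) arrays with even H and W. Returns the
    approximation band followed by three detail bands, each half-size.
    """
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    approx = (a + b + c + d) / 2.0
    detail_1 = (a - b + c - d) / 2.0  # differences across columns
    detail_2 = (a + b - c - d) / 2.0  # differences across rows
    detail_3 = (a - b - c + d) / 2.0  # diagonal differences
    return approx, detail_1, detail_2, detail_3


def wavelet_loss(pred, target, levels=3, weights=None):
    """L1 distance between Haar subbands of pred and target (hypothetical weighting)."""
    h, w = pred.shape[-2:]
    if h % 2**levels or w % 2**levels:
        raise ValueError("height and width must be divisible by 2**levels")
    if weights is None:
        weights = [1.0] * levels
    p = pred.astype(np.float64)
    t = target.astype(np.float64)
    loss = 0.0
    for lvl in range(levels):
        p_approx, *p_details = haar_dwt2(p)
        t_approx, *t_details = haar_dwt2(t)
        # Detail bands carry the high-frequency (fine texture) content at this scale.
        for pd, td in zip(p_details, t_details):
            loss += weights[lvl] * np.mean(np.abs(pd - td))
        p, t = p_approx, t_approx
    # Coarsest approximation band: overall color and brightness.
    loss += np.mean(np.abs(p - t))
    return loss


def vae_reconstruction_loss(pred, target, perceptual_fn=None,
                            wavelet_weight=1.0, perceptual_weight=0.1):
    """Hypothetical combination of the wavelet term with an optional perceptual term."""
    loss = wavelet_weight * wavelet_loss(pred, target)
    if perceptual_fn is not None:
        loss += perceptual_weight * perceptual_fn(pred, target)
    return loss
```

Because the Haar transform here is orthonormal, it preserves image energy while splitting it by frequency and scale, so the loss can penalize missing fine detail separately from color shifts. A uniform brightness shift, for instance, only affects the coarsest approximation band.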
The final model offers the following improvements:
Significantly improved realism
Significantly improved skin texture
Better lighting
More natural colors (v2.0 suffered a lot from color shift)
Less overfitting than v2.0
Better subject integration when using the new VAE (fewer dark halos)