The question then becomes whether it is even possible to avoid that kind of image replication - and I think it will be very hard to keep the model from settling on certain "eigenimages" as the internal representation within its weights.
The correlations between certain keywords and individual images (like "scream" and the Munch painting) are simply too strong to prevent this sensibly. In other cases it is probably impossible to increase coverage for certain unique keywords, which may be connected to only a single image in the training data.
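To make the coverage point concrete: one could scan a captioned training set and flag keywords that co-occur with very few distinct images - those are the prompts where the model has little choice but to reproduce the one image it saw. This is a hypothetical minimal sketch (the function name, threshold, and toy corpus are my own illustration, not anyone's actual pipeline):

```python
from collections import defaultdict

def keywords_at_risk(captions, min_images=5):
    """Flag caption keywords that co-occur with fewer than
    `min_images` distinct training images - candidates for
    near-verbatim memorization."""
    images_per_keyword = defaultdict(set)
    for image_id, caption in captions:
        for word in caption.lower().split():
            images_per_keyword[word].add(image_id)
    return {word: len(ids)
            for word, ids in images_per_keyword.items()
            if len(ids) < min_images}

# Toy corpus: "scream" is tied to exactly one image,
# while "dog" has some coverage.
corpus = [
    ("img1", "The Scream painting by Munch"),
    ("img2", "a dog on a beach"),
    ("img3", "a dog in a park"),
]
print(keywords_at_risk(corpus, min_images=2))
```

In a real web-scale dataset the long tail of such single-image keywords is enormous, which is exactly why "just add more coverage" is not a practical fix.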
In addition, the computational (and cost) requirements of training from scratch are so high that many techniques for preventing overfitting are not easily applicable.
That's why I doubt that these are "technical problems which can easily be addressed".