tl;dr AI companies are slowly running out of data to train their models; synthetic data is not a viable alternative.
I can't remember where I saw it, but someone on YouTube speculated that the next step for OpenAI and similar companies would be to collect user data directly, recording users' conversations and using that data to train their models further.
If I find the video, I will add a link here.