scientific and professional communities involved in AI, but with an important caveat: "It is not data as a whole that has been exhausted, but public text data from the Internet. In my opinion, there are three main directions for solving the 'end of data' problem. The first is generating synthetic data, for example by building simulators. The second is AI models that use data more efficiently. And the third is training and additional data collection through the interaction of models with the world."
Kirill Kotov, Technical Director of the IT company "KodTech", told a ComNews correspondent that data sets generated by the models themselves are increasingly used to train and develop models on text data: "That is, to train the next-generation model, data is generated by the current-generation models. But there is a large body of data that is either difficult to generate at the necessary quality or too resource-intensive to produce: for example, images, video, and voice. The shortage of such high-quality data is really acute. Good training and tuning of a model requires more of it, and there is a particular need for specialized data sets for specific industries and tasks."
Vladimir Fadeev, head of the AI laboratory at the IT solutions integrator "First Bit", believes that the solution to the data shortage for large language models lies in three key areas:
1. Creating synthetic data. Existing models can generate texts for training future versions. This makes it possible not only to increase the volume of data, but also to control its quality and content.
2. Working with existing data. Improving its quality through careful labeling, cleaning and structuring makes it possible to use current resources more effectively.
3. Training on multimodal data. This means incorporating information from other formats, such as audio, video, and images. For example, audio can be transcribed into text, and images or videos can be analyzed and interpreted. This expands data sets and helps models better understand context.
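As a rough illustration of the "working with existing data" direction, here is a minimal Python sketch of one possible cleaning pass over a text corpus: whitespace normalization, a length filter, and exact-duplicate removal. The function names `normalize` and `clean_corpus`, the `min_words` threshold, and the sample texts are all hypothetical and chosen only for this example. None of the experts quoted describes a specific pipeline.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def clean_corpus(docs, min_words=5):
    """Return normalized documents, dropping short texts and exact duplicates.

    min_words is an illustrative threshold, not a recommended value.
    """
    seen = set()
    cleaned = []
    for doc in docs:
        text = normalize(doc)
        if len(text.split()) < min_words:
            continue  # too short to be a useful training sample
        # Hash the case-folded text so copies that differ only in
        # letter case or spacing are treated as the same document.
        key = hashlib.sha256(text.casefold().encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned


# Hypothetical sample documents, for illustration only.
raw = [
    "Models can  generate texts for training future versions.",
    "models can generate texts for training future versions.",
    "Too short.",
    "Audio can be transcribed into text and added to the data set.",
]
cleaned = clean_corpus(raw)
print(len(cleaned))
```

Real pipelines usually go further, for example with near-duplicate detection and quality scoring, but the same filter-then-deduplicate shape applies.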