Microsoft's algorithm that describes images for the blind
Posted: Sun Feb 02, 2025 3:57 am
On October 14, Microsoft introduced the second version of its automatic image captioning system, which it says describes images with human-level accuracy.
The algorithm generates image descriptions (captions) for web pages and documents, making life easier for people with disabilities.
Example from the before-and-after image: the old caption read "Portrait of a cat"; the new one reads "Gray cat with closed eyes."
Microsoft used the first version in the Seeing AI app for people with visual impairments. The app uses the smartphone camera to read text, identify people, and describe objects and surroundings. Seeing AI can also describe the content of images in email clients, social networks, and messengers.
Demonstration of the new algorithm
The company launched Seeing AI in 2017. The AppleVis community for the blind and visually impaired has recognized the development as the “best primary or assistive service for people with disabilities” for three consecutive years.
The new system will increase the accuracy of Seeing AI. It will be able not only to identify objects, but also to describe the relationships between them and how they interact. For example, instead of "person, chair, guitar," the algorithm will say "person sitting on a chair and playing a guitar."
Microsoft will integrate the improved recognition system into Word, Outlook, and PowerPoint, for example, to automatically caption attached images.
Example from a second before-and-after image: the old caption read "Man in blue shirt"; the new one reads "Several people in surgical masks."
The model will also be available to developers through the Azure Cognitive Services computer vision tools, and Seeing AI will receive an update.
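The Computer Vision service in Azure Cognitive Services already exposes image description to developers today. Below is a minimal, hedged sketch of how such a call might look with the Python SDK; the endpoint, key, and image URL are placeholders, and exact method names can vary between SDK versions.

```python
# Sketch only: requesting an image description from Azure Computer Vision.
# Endpoint, key, and image URL below are placeholders, not real credentials.
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

endpoint = "https://<your-resource-name>.cognitiveservices.azure.com/"  # placeholder
key = "<your-subscription-key>"                                         # placeholder

client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(key))

# Ask the service to describe a publicly reachable image.
image_url = "https://example.com/sample.jpg"  # placeholder image
description = client.describe_image(image_url, max_candidates=3)

# Each candidate caption comes with a confidence score.
for caption in description.captions:
    print(f"{caption.text} (confidence: {caption.confidence:.2f})")
```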
The company achieved the improvement by pre-training a large AI model on a set of images paired with word tags, rather than full captions (which are less efficient to generate). Each tag is tied to a specific object in the image.
The pre-trained model was then fine-tuned on a set of captioned images, teaching it to construct complete sentences. The resulting "visual dictionary" lets it caption images containing objects it has never seen in a caption before.
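The post does not include Microsoft's code, but the two-stage idea can be illustrated with a toy sketch: stage one pre-trains an image encoder to predict word tags, and stage two reuses that encoder while training a small decoder on full captions. Everything below (model sizes, random data, vocabulary) is a dummy placeholder, not the actual system.

```python
# Toy illustration of tag pre-training followed by caption fine-tuning (PyTorch).
# All data is random; this is a sketch of the idea, not Microsoft's model.
import torch
import torch.nn as nn

NUM_TAGS, VOCAB, EMBED = 50, 100, 64

# Shared image encoder: maps a 2048-d image feature vector to an embedding.
encoder = nn.Sequential(nn.Linear(2048, EMBED), nn.ReLU())

# ---- Stage 1: pre-training on (image, tag) pairs ----
tag_head = nn.Linear(EMBED, NUM_TAGS)  # multi-label tag prediction
opt1 = torch.optim.Adam(list(encoder.parameters()) + list(tag_head.parameters()))
bce = nn.BCEWithLogitsLoss()

for _ in range(100):                                   # dummy pre-training loop
    feats = torch.randn(8, 2048)                       # fake image features
    tags = torch.randint(0, 2, (8, NUM_TAGS)).float()  # fake per-object tag labels
    loss = bce(tag_head(encoder(feats)), tags)
    opt1.zero_grad(); loss.backward(); opt1.step()

# ---- Stage 2: fine-tuning on (image, caption) pairs ----
word_embed = nn.Embedding(VOCAB, EMBED)
decoder = nn.GRU(EMBED, EMBED, batch_first=True)
word_head = nn.Linear(EMBED, VOCAB)                    # predicts the next caption word
opt2 = torch.optim.Adam(
    list(encoder.parameters()) + list(word_embed.parameters())
    + list(decoder.parameters()) + list(word_head.parameters())
)
ce = nn.CrossEntropyLoss()

for _ in range(100):                                   # dummy fine-tuning loop
    feats = torch.randn(8, 2048)
    captions = torch.randint(0, VOCAB, (8, 12))        # fake caption token ids
    h0 = encoder(feats).unsqueeze(0)                   # image embedding as initial GRU state
    out, _ = decoder(word_embed(captions[:, :-1]), h0) # teacher-forced decoding
    loss = ce(word_head(out).reshape(-1, VOCAB), captions[:, 1:].reshape(-1))
    opt2.zero_grad(); loss.backward(); opt2.step()
```

The point of the two stages is that tag labels are cheap to collect at scale, so the encoder learns a broad vocabulary of objects before the smaller captioned dataset teaches it to form sentences.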
Microsoft claims the new algorithm is twice as good as the previous one, which had been in use since 2015. It ranks first on nocaps, the industry's main image-captioning benchmark, and outscores the benchmark's own development team.
The nocaps benchmark consists of 166,000 human-written captions for 15,000 images, with scenes ranging from sports to food and holidays. To score well, an algorithm's captions must match the quality of the human ones.