Facebook is pouring a lot of time and money into augmented reality, including building its own AR glasses with Ray-Ban. Right now, these gadgets can only record and share photos, but what does the company think such devices will be used for in the future?
A new research project led by Facebook’s AI team suggests the extent of the company’s ambitions. It imagines AI systems that constantly analyze people’s lives using first-person video, recording what they see, do, and hear in order to help them with everyday tasks. Facebook researchers have outlined a number of skills they want these systems to develop, including “episodic memory” (answering questions such as “where did I leave my keys?”) and “audiovisual diarisation” (remembering who said what, and when).
Right now, no AI system can reliably accomplish the tasks described above, and Facebook emphasizes that this is a research project rather than a commercial development. However, it is clear that the company sees functionality like this as the future of AR computing. “Absolutely, when we think about augmented reality and what we would like to be able to do with it, there are opportunities down the road where we would take advantage of this kind of research,” Facebook AI researcher Kristen Grauman told The Verge.
Such ambitions have enormous implications for privacy. Privacy experts are already concerned about how Facebook’s AR glasses allow wearers to covertly record members of the public. Such concerns will only be exacerbated if future versions of the hardware not only record footage but also analyze and transcribe it, turning wearers into walking surveillance machines.
The name of Facebook’s research project is Ego4D, which refers to the analysis of first-person, or “egocentric,” video. It consists of two main components: an open dataset of egocentric video and a number of benchmarks that Facebook believes AI systems should be able to handle in the future.
The dataset is the largest of its kind ever created, and Facebook partnered with 13 universities around the world to collect the data. In total, about 3,205 hours of footage were recorded by 855 participants living in nine different countries. The universities, rather than Facebook, were responsible for collecting the data. Participants, some of whom were paid, wore GoPro cameras and AR glasses to record video of unscripted activity. This ranges from construction work to baking to playing with pets and socializing with friends. All footage was de-identified by the universities, which included blurring the faces of bystanders and removing personally identifiable information.
Grauman says the dataset is “the first of its kind in both scope and diversity.” The closest comparable project, she says, includes 100 hours of first-person footage shot entirely in kitchens. “We have opened up the eyes of these AI systems to more than just kitchens in the UK and Sicily, but [to footage from] Saudi Arabia, Tokyo, Los Angeles and Colombia.”
The second component of Ego4D is a series of benchmarks, or tasks, that Facebook wants researchers around the world to try to solve using AI systems trained on its dataset. The company describes these as:
Episodic memory: What happened when (e.g. “Where did I leave my keys?”)?
Forecasting: What am I likely to do next (e.g. “Wait, you have already added salt to this recipe”)?
Hand and object manipulation: What am I doing (e.g. “Teach me to play the drums”)?
Audiovisual diarisation: Who said what when (e.g. “What was the main topic of the class?”)?
Social interaction: Who is interacting with whom (e.g. “Help me better hear the person talking to me at this noisy restaurant”)?
Right now, AI systems would find tackling any of these problems incredibly difficult, but creating datasets and benchmarks is a proven method of spurring development in AI.
In fact, the creation of one particular dataset and an associated annual competition, known as ImageNet, is often credited with kickstarting the recent AI boom. The ImageNet dataset consists of images of a large number of objects that researchers trained AI systems to identify. In 2012, the winning entry in the competition used a particular deep learning method to blow past its rivals, inaugurating the current era of research.
Facebook hopes that its Ego4D project will have a similar effect on the world of augmented reality. The company says that systems trained on Ego4D may one day be used not only in wearable cameras but also in home assistant robots, which likewise rely on first-person cameras to navigate the world around them.
“The project has the chance to really catalyze the work in this area in a way that has not really been possible yet,” says Grauman. “To move our field from the ability to analyze piles of photos and videos taken by people with a very special purpose, to this fluid, ongoing first-person visual stream that AR systems, robots, must understand in the context of ongoing activity.”
While the tasks that Facebook outlines certainly seem practical, the company’s interest in this area will worry many. Facebook’s record on privacy is abysmal, spanning data leaks and a $5 billion fine from the FTC. It has also been shown repeatedly that the company values growth and engagement over user well-being in many domains. With this in mind, it is worrying that the benchmarks in the Ego4D project do not include prominent privacy safeguards. For example, the “audiovisual diarisation” task (transcribing what different people say) never mentions removing data about people who do not want to be recorded.
When asked about these issues, a Facebook spokesperson told The Verge that it expected privacy safeguards to be introduced further down the line. “We expect that to the extent that companies use this dataset and benchmark to develop commercial applications, they will develop safeguards for such applications,” the spokesperson said. “For example, before AR glasses can amplify a person’s voice, there may be a protocol in place that they follow to ask someone else’s glasses for permission, or they may limit the device’s range so that it can only pick up sounds from people with whom I am already having a conversation or who are in my immediate vicinity.”
So far, such safeguards are only hypothetical.