Facebook如何教电脑“看”人 - 手机财富中文网

Facebook如何教电脑“看”人

Stacey Higginbotham | 2015-06-25 21:25

Facebook today launched its Moments product, which uses Facebook’s image recognition abilities to scan your photos for your friends and then lets people create private photo albums with a particular group, such as the people in the photo. The idea is to make it easier to share photos from a big event among attendees without the cumbersome process of emailing snapshots to everyone or the awkward end-of-event huddle while six people take the exact same group shot. It’s not a cure for cancer, but behind the scenes of this new feature is an impressive technology that Facebook has been working on for years.

A key element of the Moments feature is the ability for Facebook’s algorithms to recognize people’s faces across different photos, so that Moments knows who was at the event. This requires computer vision expertise that companies such as Google, Microsoft, Baidu, and others are currently researching for everything from self-driving cars to silly web products such as Microsoft’s How Old Do I Look?

In launching the Moments product, Facebook is sharing data about its own successes in computer vision research: it can recognize faces with 98% accuracy, and it can do so quickly. The company says it can identify you in one picture out of 800 million in less than 5 seconds. Finally, it can do all of this even if it doesn’t have a full frontal shot of your face (or even if your face isn’t in the photo at all), thanks to a machine learning algorithm that can look at other elements in the picture and at data associated with the photo.

Facebook近日发布了一款名叫Moments的产品。它使用Facebook的图像识别技术，在你的照片中找出你的朋友，然后让人们与一个特定群组（比如照片中的人）创建私人相册。这样一来，在大型活动结束后，人们就免去了用电子邮件一张张地互相发照片的麻烦，也免去了活动结束时六个人挤在一起轮流拍同一张合影的尴尬。它并非是像治愈癌症那样重大的发明，但在这项新功能的背后，却是一项Facebook已经苦心钻研了多年的技术。

Moments功能的核心元素之一，是在不同的照片中识别人脸所用的算法，这样Moments才能知道谁出席了这次活动。要做到这一点，就需要高超的计算机视觉技术。目前包括谷歌、微软和百度在内的多家科技公司都在研究该技术，因为它的应用前景极其广泛，既可用于自动驾驶汽车，也可用于像微软的“颜龄神器”这种除了卖萌耍宝之外，不知道干什么用的产品。

在Moments发布的过程中，Facebook也分享了它在计算机视觉研究方面取得的进展。目前Facebook的人脸识别技术达到了98%的准确性，而且识别所需的时间很短。该公司称，只需要不到5秒钟的时间，它的人脸识别技术就能在8亿张照片中迅速地找到你的脸。另外，哪怕你在照片中出现的不是正脸（或者甚至你根本没在照片里露面），计算机也能识别出来。这要归功于Facebook开发的一种机器学习算法，它能参考照片中的其它元素，以及与照片相关联的数据。


Inside Moments

Fortune spoke with Yann LeCun, Facebook’s director of artificial intelligence research, to understand how his team helped a computer understand who you are, and where Facebook is heading next with its AI research. Perhaps the first thing to understand is that when LeCun discusses computer vision, it’s not the same as how a person sees, although the process of teaching software how to recognize an object has some similarities.

For example, Facebook’s facial recognition, which is the basis of the current efforts, can’t identify you. It can only recognize whether a person in one photo is the same as a person in another photo. Identification is a completely separate step.
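To make the distinction concrete, here is a minimal sketch of the "same person or not" step. It assumes each face has already been turned into a short list of numbers; the vectors, the cosine-similarity measure, the 0.8 threshold, and the function names are all illustrative assumptions, not details of Facebook's system.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_person(face_a, face_b, threshold=0.8):
    """Verification only: do two face vectors likely show one person?

    The threshold is an assumed value chosen for this toy example.
    """
    return cosine_similarity(face_a, face_b) >= threshold

# Hypothetical 4-number face vectors. A real system would compute much
# longer vectors from the pixels with a trained network.
photo_1 = [0.9, 0.1, 0.3, 0.7]
photo_2 = [0.8, 0.2, 0.4, 0.6]  # same person, different photo
photo_3 = [0.1, 0.9, 0.8, 0.1]  # someone else

print(same_person(photo_1, photo_2))
print(same_person(photo_1, photo_3))
```

Nothing here attaches a name to a face. Putting a name on a matched face would be the separate identification step the article describes.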

Because Facebook is about connecting people, its computer vision efforts have focused on recognizing faces as opposed to cats, cars, or other non-human subjects. To do this, it uses a database of photos of celebrities and politicians called Labeled Faces in the Wild. This collection has 13,000 photos of people with different hairdos and different outfits, sometimes wearing glasses, and more. Facebook used this collection to train its machine learning algorithms. Other companies have used this data set as well, and some universities have even trained systems with a higher than 98% accuracy rate using Labeled Faces.

So how did Facebook get from giving a machine a picture of Angelina Jolie to somehow using that photo to help identify your sister across different photo albums on Facebook? LeCun is the man to ask. About 20 years ago when he was working at Bell Labs (now AT&T’s Image Processing Research Department), he happened upon a way of thinking about teaching computers to see that wasn’t really used outside of academia until about three years ago.

Moments的背后

《财富》采访了Facebook的人工智能研究主管严恩·乐昆，以了解他的团队是如何让计算机学会人脸识别的，以及Facebook的人工智能研究下一步将向什么方向发展。首先我们需要了解的是，计算机视觉和人类看东西的方式是不一样的，不过教软件学习识别物体的过程与人类的视觉模式倒是有些相似。

比如，Facebook的面部识别技术其实无法辨识“你”这个人，它只能识别出一张照片中的“你”和另一张照片中的“你”是不是同一个人。真正意义上的鉴定身份，则完全是另一个阶段的事。

由于Facebook是一个人际社交网络，它的计算机视觉技术一直专注于识别人脸，而不是识别猫猫狗狗、汽车或者其他非人物体。Facebook使用了一个由名人和政客照片组成的数据库，名为Labeled Faces in the Wild（“户外脸部检测”）。该数据库拥有超过1.3万张人物照片，他们的发型和穿着各不相同，有的还戴着眼镜等等。Facebook用这些照片来训练它的机器学习算法。除了Facebook之外，还有其他公司也在使用这个数据库，一些大学甚至用它训练出了识别准确率高于98%的系统。

那么，从让电脑看安吉丽娜·朱莉的照片开始，Facebook是怎样做到能够从Facebook上的不同相册中找到你妹妹的照片的呢？这个问题就得让严恩·乐昆来回答了。大约20年前他还在贝尔实验室（现在已经变成AT&T的图像处理研究部门）工作的时候，他偶然想到了一种教电脑“看”东西的办法，但这项技术直到大约3年前才开始在学术界以外得到应用。

How computers learn to see

That technique is called a convolutional neural network, and it takes its name both from a mathematical operation called a convolution and from the way the human brain learns. The brain learns by establishing connections between neurons, and the more often a signal is sent over those neurons, the denser those connections get. In a similar vein, when a computer establishes similarities between two images, it assigns a weight to those similarities. In convolutional neural networks, the goal is to train the machine to adjust the weights on those connections so it can tell with increasing accuracy whether the images match.
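For readers curious about the operation behind the name, here is a minimal sketch of a two-dimensional convolution on a toy image. The 4x4 image and the hand-picked kernel weights are assumptions for illustration; in a real network the weights are learned, and the networks stack many such filters.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1).

    Each output value is the weighted sum of the pixels under the kernel.
    As is common in neural networks, the kernel is not flipped, so this is
    strictly a cross-correlation.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Toy grayscale image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked weights that respond to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output is large only in the middle column, where the kernel sits on the boundary between dark and bright. That is the sense in which a set of weights "detects" a feature.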

The process of doing this is incredibly complicated and involves different calculations that establish how important certain aspects of the image are to recognizing what the image is. For example, if you want to train a computer to recognize faces, the pixels related to the background are less important. The tricky, and frankly amazing, part of this is that the machine learns on its own how to tell what part of the image is most relevant. It still takes a lot of human effort to nudge the computer into recognizing the right way to weight the similarities, but once the model is built, it can generalize those relationships going forward.
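A toy sketch can show how training settles on which inputs matter. It is not a convolutional network, just a single artificial neuron trained by gradient descent, and the two-"pixel" data set, the learning rate, and the variable names are all assumptions for illustration. One input tracks the label, and the other is background that is unrelated to it.

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Each sample: (face_pixel, background_pixel, label).
# The face pixel follows the label; the background pixel does not.
samples = [
    (1.0, 1.0, 1),
    (1.0, -1.0, 1),
    (-1.0, 1.0, 0),
    (-1.0, -1.0, 0),
]

def train(samples, steps=200, learning_rate=0.5):
    """Fit one neuron with plain gradient descent on the log loss."""
    w_face, w_background, bias = 0.0, 0.0, 0.0
    for _ in range(steps):
        g_face = g_background = g_bias = 0.0
        for face, background, label in samples:
            predicted = sigmoid(w_face * face + w_background * background + bias)
            error = predicted - label
            g_face += error * face
            g_background += error * background
            g_bias += error
        # Move each weight a small step against its gradient.
        w_face -= learning_rate * g_face
        w_background -= learning_rate * g_background
        bias -= learning_rate * g_bias
    return w_face, w_background, bias

w_face, w_background, bias = train(samples)
print(w_face, w_background)
```

Nobody tells the neuron that the background is irrelevant. The weight on the face pixel grows with each pass, while the weight on the background pixel stays at zero because its contributions cancel out across the samples.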

The process can take a few days on a powerful computer.

Convolutional neural networks have become the basis for almost all of the computer vision research done today, after a team of researchers led by Geoffrey Hinton at the University of Toronto used the technique to win a competition where image recognition algorithms vie to be most accurate. Hinton, whose team and startup were later acquired by Google, won the competition with a test error rate of 15.3%, compared to 26.2% for the second-place entry.

Don’t post that photo!

As research continues, the opportunities for use in our day-to-day life are significant. Yes, there is the ability to match people’s faces in a crowd that might lead to greater government surveillance, but there is also an opportunity to use better facial recognition to manage your privacy. For example, with automatic facial recognition at scale, any picture of you uploaded to Facebook (or perhaps even the web) could result in a notification.

For example, if you are somehow captured in the background of a tourist shot of Times Square, you could get a notification and the option to blur your face. Applied to children, the blurring or removal could be automated. LeCun notes that Facebook is interested in such tools, but also stresses that Facebook’s interest in machine learning goes far beyond image recognition.

Facebook’s goal is to get a computer to understand empathy. Obviously, it won’t be able to feel what humans do, but it can be trained to recognize what emotions are and how people will react. With that level of understanding, Facebook could, say, offer a warning when you are about to post a photo of you drunk and ask if you really want to do that.

“This would not be face recognition,” said LeCun. “We don’t care who is in the picture. We would use other types of image recognition and train them differently to say that this looks embarrassing and then tap you on the shoulder to make sure you want to post this publicly.”

This isn’t something Facebook can do today, but LeCun offered these concepts as a thought experiment to show where Facebook could head with its AI research. Of course, this sort of expertise informed by an algorithm can make people deeply uncomfortable. Today Facebook doesn’t turn on its auto-tagging features in places like Canada and the EU because of privacy concerns, and there’s a certain creep factor in having a computer second-guess your photo-sharing choices or having software parse your jokes to try to understand what you find funny.

“What we’d like to do is make machines more intelligent, understanding text, images, videos and posts,” LeCun said. “Anything that can happen in the digital world we want to understand the context.” Because there is so much digital content, people could easily become overwhelmed by the information flooding their feeds. The efforts of LeCun’s team will help connect people with the content that is most relevant to their interests and priorities. It’s a complex solution to a simple goal: to make sure that you see what you want to see on Facebook.

“That’s the big mission that we at Facebook are trying to fulfill,” LeCun said. “Machines that understand people.”

电脑如何学会“看”东西

这项技术叫做“卷积神经网络”，名字取自一种名叫“卷积”的数学运算，同时它也从人脑的学习方式中吸取了灵感。人脑的学习主要依靠在神经元之间建立连接，信号在神经元之间传输得越多，这些连接就越密集。同理，当电脑在两幅图像之间建立相似点后，它就向这些相似点分配了一个权数。在卷积神经网络中，我们的目标是训练电脑调整这些关联的权数，从而越来越准确地判断图像是否匹配。

这个过程是极为复杂的，它还涉及各种数学运算，以确定图像的不同方面对于识别过程的影响程度。比如，如果你想训练一台计算机学会识别人脸，背景像素其实并不重要。真正重要和惊人的部分在于，机器会自行学会辨别图像的哪一部分最重要，然后还能对这种关系进行归纳。虽然目前还需要很多人力来教会计算机如何正确地为那些相似点分配权数，但只要这个模型建成了，它就会不断自动归纳。

这个过程在一台强大的电脑上大概需要几天时间。

自从在一场比拼图像识别算法准确率的竞赛上，多伦多大学教授杰弗里·辛顿领衔的研究团队利用卷积神经网络技术赢得冠军以来，该技术基本上已经成为目前所有计算机视觉研究的基础。在那次竞赛上，辛顿团队的测试错误率为15.3%，相比之下，获得第二名的团队的测试错误率则为26.2%。辛顿的团队和他们创办的公司后来被谷歌收购。

不要上传那张照片！

随着这类研究不断深入，它可能会对我们的日常生活产生重大影响。当然，在人群中识别人脸的技术有可能会导致政府监控的加强，但另一方面，更好的人脸识别技术也能帮助你管理隐私。比如，随着自动人脸识别技术的大规模应用，任何一张有你的照片被上传到Facebook（甚至整个网络）之后，你都可能会收到一条通知。

比如，如果一名游客在纽约时代广场拍照时，不小心把你也拍到了背景里，那么他在上传这张照片后，你就会收到一条提醒，你也可以选择在那张照片里给你的脸打个码。如果是儿童的话，甚至可能自动打码或删除其形象。乐昆指出，Facebook对这种工具非常感兴趣，不过他也强调，Facebook对机器学习的兴趣要远远超出图像识别本身。

Facebook的目标是教会电脑学会识别人的情感。显然电脑不可能拥有人类的感受，但人类可以教电脑识别感情以及人们对各种感情的反应。如果电脑达到了这种理解程度，那么如果你喝醉了酒，要上传一张醉态照片的时候，Facebook可能就会提醒你是否真的想这样做。

“这将不是人脸识别技术”，乐昆表示：“我们不在乎照片里的人是谁。我们会使用其它类型的图像识别技术，然后以不同的方法训练机器，比如说某张照片看起来非常尴尬，我们就会提醒你，确保你是真的想把这张照片公开到网上。”

目前Facebook尚不具备这样的技术，乐昆也只是假设性地提出了这些概念，以介绍Facebook的人工智能技术会朝着什么方向发展。当然，这种高端的算法技术可能也会令人深感不适。目前Facebook在加拿大和欧盟各国没有启用自动标签功能，就是出于隐私方面的考虑。另外，一想到你在打算上传照片时，电脑还要替你再审查一遍，或是一想到你在发段子的时候，电脑正在努力理解你的笑点，这种感觉的确是令人浑身不舒服。

乐昆表示：“我们希望让机器变得更加智能，能够理解文字、图像、视频和帖子。总之，任何有可能发生在数字世界的事情，我们都想了解其语境。”由于数字内容太多，人们很容易被涌入信息流的海量信息所淹没。而乐昆团队的努力将帮助人们找到与其兴趣和关注重点最相关的内容。如此复杂的解决方案，只是为了实现一个简单的目标：确保你在Facebook上能看到你想看的东西。

乐昆表示：“这才是我们在Facebook想要实现的大目标，也就是让机器理解人。”（财富中文网）

译者：朴成奎

审校：任文科
