Weekly analysis, news and randomness from the future of transportation.
Have you ever wanted to know how hard it is to teach computers to see the world as humans do? Now you have the chance.
Microsoft has teamed with a host of public partners to crowdsource video annotation. Annotating videos is labor-intensive but crucial to improving artificial intelligence, because the tags and information entered by humans teach computers what they are seeing. The program, called Video Analytics towards Vision Zero, is part of a global effort targeting zero road fatalities.
The annotation task is simple enough. You watch a short video taken from a road camera at an intersection and draw boxes around the pedestrians, cars, motorcycles and buses moving through the field of vision. Thinking I could whip one out in 20 minutes or so, I started this morning. Forty-five minutes later, I'd tagged only 10 objects: one car, one motorcycle and eight pedestrians. I still had at least six more pedestrians to go, one more moving car and about a dozen stationary cars.
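Each box an annotator draws typically becomes a labeled coordinate record that a training pipeline can consume. Here is a minimal sketch in Python of what that morning's tally might look like as data; the schema, field names and coordinates are illustrative assumptions, not the Vision Zero tool's actual format.

```python
from dataclasses import dataclass

@dataclass
class BoxAnnotation:
    """One human-drawn box on a single video frame (hypothetical schema)."""
    frame: int      # frame index within the clip
    label: str      # e.g. "pedestrian", "car", "motorcycle", "bus"
    x: float        # top-left corner, in pixels
    y: float
    width: float
    height: float

# The morning's tally from the article: one car, one motorcycle, eight pedestrians.
annotations = (
    [BoxAnnotation(0, "car", 112.0, 80.0, 64.0, 40.0),
     BoxAnnotation(0, "motorcycle", 200.0, 95.0, 30.0, 22.0)]
    + [BoxAnnotation(0, "pedestrian", 50.0 + 15 * i, 120.0, 12.0, 30.0)
       for i in range(8)]
)

# A training pipeline would group boxes by label to build per-class examples.
counts = {}
for a in annotations:
    counts[a.label] = counts.get(a.label, 0) + 1

print(counts)
```

Multiply records like these across every frame of every clip and the scale of the labeling effort becomes clear: forty-five minutes of human work here produced just ten of them.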
"Data annotation is super labor-intensive," Sameep Tandon, CEO of self-driving startup Drive.ai, told Automotive News' Katie Burke for an article we published this week on the difficulties in training artificial intelligence systems.
It is easy to collect raw data by slapping cameras on cars and at intersections, but someone needs to interpret those images for computers. Without annotation, a computer might mistake pictures of bikes printed on the back of a commercial van for actual bikes riding on the road.
To better understand the challenge, annotate a video on the Vision Zero site. Given the group's goal of saving lives, you can also consider it charitable work that you can do from your desk. And maybe in the process you can invent a better solution than the methods now in use.
— Sharon Silke Carty