As more companies wade into the business of building artificial intelligence systems to help you drive (or do the driving for you), a startup founded by an ex-Apple computer vision specialist is open-sourcing a huge dataset that can help them on their road to autonomy.
Mapillary, a Swedish startup backed by Sequoia, Atomico and others that has built a database of 130 million images through crowdsourcing (think open-source Street View), is releasing a free dataset of 25,000 street-level images from 190 countries, with pixel-level annotations that can be used to train automotive AI systems.
The Mapillary Vistas Dataset claims to be the world's largest, most diverse dataset for object recognition on street-level imagery. As with the rest of Mapillary's photos, the startup builds its image database on top of Mapbox and OpenStreetMap maps.
The dataset is free for both academic and commercial researchers, but anyone who wants to build the results into commercial products must pay for a commercial license.
As Jan-Erik Solem, the CEO and co-founder, explained, while there are other datasets that companies are using to train the machine learning algorithms for their in-car systems, these fall short because they do not have enough variability and coverage to be useful in real-world scenarios.
"This Vistas dataset is built on top of regular Mapillary images, where most of the images come from crowdsourcing. What we have done here is that we manually selected 25,000 images with the variability we wanted from the 130+ million available on Mapillary," Solem explained. "Then we manually annotated them to label all the pixels in the images. This is a tedious and expensive manual labor process."
Expensive, and yet now free to use, because of the companies that are sponsoring the work, Solem said.
"Sponsors of this dataset are Lyft, Toyota and Daimler, some of whom received pre-release data," he added. It's not clear exactly how these three companies may be using the datasets, beyond making their own autonomous driving systems smarter and more fail-safe.
"This is our main dataset for training our own algorithms. Our own need was one of the reasons for creating this dataset," Solem noted.
You can see a visual progression of how ordinary pictures transform into pixel-level annotated datasets in the gif above, and how a final product looks in the image below, ready to plug into your machine learning engine.
Mapillary, it should be pointed out, has yet to reveal much about who its paying customers are these days. "We're just about to tell the world," Solem said when I asked him about this, although the three companies sponsoring the Vistas dataset are probably good guesses.
Mapillary describes its wider dataset as one that is used to help build smart cities, future maps, and autonomous vehicles. Using computer vision, Mapillary reads images that have been uploaded to its database to identify locations in 3D and recognize and order objects within them.
When we wrote about Mapillary's most recent funding round ($8 million from Sequoia, Atomico, LDV Capital, and PlayFair in March 2016), we noted that the company had signed up various organizations to use its data. They included the Swedish town of Helsingborg, Los Angeles County, the World Bank and the Red Cross (although, again, whether these are paying or free users is not clear).
The company does have a set of pricing tiers that point to its B2B focus: the database is free for the first 50,000 views of images with no data requests; $250 per month for up to 500,000 views and 250 data requests; and then priced on a case-by-case basis if you are using more than this.
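For readers who want to see where a given usage level would land, the tiers described above can be sketched as a small Python function. This is only an illustration built from the figures quoted in this article; the function name `pricing_tier`, its parameters, and the assumption that views and data requests are both counted per month are ours, not Mapillary's.

```python
def pricing_tier(monthly_views: int, data_requests: int) -> str:
    """Return the tier a usage level falls into, per the figures quoted above.

    Hypothetical helper: the name, parameters and the per-month counting
    of views and data requests are assumptions made for illustration.
    """
    if monthly_views < 0 or data_requests < 0:
        raise ValueError("usage counts must be non-negative")

    # Free tier: first 50,000 image views, with no data requests.
    if monthly_views <= 50_000 and data_requests == 0:
        return "free"

    # Paid tier: up to 500,000 views and up to 250 data requests.
    if monthly_views <= 500_000 and data_requests <= 250:
        return "$250 per month"

    # Anything above the paid tier's limits is priced individually.
    return "case-by-case"


if __name__ == "__main__":
    for views, requests in [(10_000, 0), (10_000, 5), (750_000, 0)]:
        print(views, requests, "->", pricing_tier(views, requests))
```

Note that under this reading, a single data request is enough to move a low-traffic user out of the free tier.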
"As a business we provide images, data automatically extracted from these images, and processing services for clients that have their own imagery but don't want to share that on public Mapillary," Solem said. "Our markets are mapping, automotive, and GIS (Geographical Information Systems). We're in early stages revenue-wise and 2017 will be a very interesting year for us as a business."
While crowdsourcing can be a tricky and inconsistent way to build a database, it's notable that Mapillary's crowdsourcing is something of a closed loop.
Those who use the platform also contribute to Mapillary's wider database, which means that the system is building stronger datasets exactly in the locations where there is demand at the moment, without filling in the blanks for other places. Those will be populated as and when the need arises, not unlike how Waze was built in its early days, well before getting acquired by Google.
"When it comes to image contributions to Mapillary in general, people are self-motivated and contribute because we help them solve problems they have," Solem explained. "This can be sharing of places and place data, inventory work, mapping work, and map editing."
Solem has impressive credentials in the area of computer vision. His previous company, the Malmo, Sweden-based facial recognition startup Polar Rose, was quietly acquired by Apple in 2010. He subsequently joined the iPhone maker to work on computer vision and other projects for several years after that.
Mapillary's existence is an interesting development in the bigger world of digital mapping services. These are used not just as cornerstones in how smartphones work, but are central to how a lot of the next wave of computing is being shaped. That leads to inevitable questions of who should rightfully own this kind of potentially very central and crucial data.
Interestingly, although Solem put in significant time at Apple, one of potentially only a few big commercial players in digital mapping alongside Google and the car consortium that now owns Here (formerly Nokia), he is now singing a different tune when it comes to the creation and long-term ownership of mapping datasets.
"I think it is worrisome that in mapping things are being consolidated into a few players," he told me back when Mapillary raised its last round of funding. "It's bad because it means that data moves into silos and very little is shared again. When Apple picks up companies and puts their data into Apple Maps, they disappear. A lot of the data that used to be provided is gone. And Apple has no interest in providing that info to anyone else. There are certain things that you should keep independent."
Independent, and ready for many and any others to use as they will.
Read more: https://techcrunch.com