Open Data 500

Models for API driven startups built around public data

I had a conversation with a venture capitalist recently who was looking for information on startups who had APIs and had built their company around public data.

The two companies referenced in the original contact email were Eligible API and Clever API: two similar, yet very different, approaches to aggregating public data into a viable startup (Clever used to aggregate data from school districts, but now just provides login services).

During our conversation, we talked about the world of APIs and open data, both in government and across the private sector. I spent 30 minutes helping them understand the landscape, and told them that when I was done I would generate a list of APIs I thought were interesting and that I would categorize into a similar space as Eligible and Clever, something that was much more difficult to quantify than I expected. Nonetheless, I learned a lot and, as I do with all my research, I wanted to share the story of my experience.

I started with the companies that, off the top of my head, had built interesting businesses around publicly available facts and data, a definition that would expand as I continued.

I started with a couple of APIs I know provide some common data sources (via APIs):

Next, I wanted to look at a couple of the business data APIs I depend on daily, and while I was searching I found a third. What I think is interesting about these business data providers is their very different business models and approaches to gathering and making the data available.

Immediately after looking through Crunchbase and OpenCorporates, I queried my brain for other leading APIs that are pulling content or data from public sources and developing a business around them. It makes sense to look at the social data and content realm, but this is where I stop. I don’t want to venture too far down the social media rabbit hole, but I think that these two providers are important to consider.

While Twitter data isn’t the usual business, industry, product, and other common facts, it has grown to be the new data that is defining not just Twitter’s business, but a whole ecosystem of aggregators and other services that are built on consuming, aggregating and often publishing public raw or enriched social data.

I also wanted to step back and look at Clever again, and think about their pivot from aggregating school data to being a login service. There was another API I was tracking that offered a similar service to Clever, aggregating school data, and I think it is important to list it alongside Clever.

As far as I know, both Clever and LearnSprout are adjusting to find the sweet spot in what they do, but I keep them on the list because of what they did when I was originally introduced to their services. I think we can safely say that there will be no shortage of startups to come, following in Clever and LearnSprout’s footsteps, unaware of their predecessors and of the challenges they will face when aggregating data across school districts.

Healthcare data

After taking another look at Clever, I also took another walk through the Eligible API, and spent time looking for similar data-driven APIs in the healthcare space. I think that Eligible is a shining example of what this particular VC was looking for, and a good model for startups looking to not just build a company and API around public data, but to do it in a way that makes a significant impact on an industry.

I know there are more healthcare data platforms out there, but these are a handful of the ones with APIs that I track. Healthcare is one of those heavily regulated industries where there is a huge opportunity to aggregate data from multiple public and private sector sources and build an API-driven business on top of it.

Energy data

After healthcare, my mind immediately moved into the world of energy data, because there is a task on my task list to study open data licensing as part of a conversation I’m having with Genability. I think this latest wave of energy API providers, and the work they do with the data of individual customers as well as power companies and state and federal sources, is very interesting.

When I was in Maryland this last May, moderating a panel with folks from the Department of Energy, the conversation turned to the value of Department of Energy data to the private sector. I’d say that the Department of Energy is among the top five agencies when it comes to the viability of its data for private sector use and for making a significant economic impact.

Libraries

Pushing the boundaries of this definition again, I stumbled onto the concept of launching APIs for libraries, built around public or private collections. While not an exact match to the other APIs in this story, I think what DPLA is doing reflects what we are talking about: building a platform around public and private datasets (collections in this case).

Just like government agencies, public and private institutions possess an amazing amount of data, content and media that is not readily available online and provides a pretty significant opportunity to build API driven startups and organizations around these collections.

Scientific data

There is a wealth of valuable scientific data being made available via public APIs, from various organizations, and institutions. I’m not sure where these groups are gathering their data from, but I’m sure there is a lot of public funding and sources included in some of the APIs I track.

These are just two of the numerous scientific data APIs I keep an eye on, and I agree that this is a little outside the bounds of what we are looking for. However, I think that the opportunity for designing, deploying and managing high-performing, high-value APIs from publicly and privately owned scientific data is pretty huge.

Government data

As I look at these energy, and scientific APIs across my monitoring system, I’m presented with other government APIs that are consumer focused and often have the look and feel of a private sector startup, while also having a significant impact on private sector industries.

While all of these APIs are clearly .gov initiatives, they provide clear value to consumers, and I think there is opportunity for startups to offer complementary, or even competing, services built on this government-generated, industry-focused open data, going further than these platforms do.

Quasi-government data

Alongside those very consumer- and industry-oriented government efforts, I can’t help but look at the quasi-government APIs I’m familiar with that are providing similar data-driven APIs to the government ones above.

While these may not be models for startups, I think they provide potential ideas that private sector non-profit groups can take action on. Providing mortgage, energy, environmental, or even healthcare services, developed around public and private sector data, will continue to grow as a viable business model for startups and organizations in coming years.

Watchers of government data

Adding another layer to government data, I have to include the organizations that keep an eye on government, a segment of organizations that have evolved around building operational models for aggregating, generating meaning from, then republishing data that helps us understand how government is working (or not working).

These are all nonprofit organizations doing this work, but when it comes to journalism, politics, and other areas, there are some viable services that can be offered surrounding, and on top of the valuable open data being liberated, generated and published by the watchers of our government.

School data again

One interesting model for building a business around government data is Great Schools. There are some high-value datasets available at the Department of Education as well as the Census Bureau, and using these sources has become a common model for building a company around public data:

I’m not exactly a fan of Great Schools, but I think it is worth noting. I’ve talked with them, and they don’t really have as open a business model and platform as I would like to see. I feel it is important to “pay it forward” when building a for-profit company around public data. I don’t have a problem with building businesses and generating revenue around public data, but if you don’t contribute to it being more accessible than you found it, I have a problem.

News

After spending time looking through the APIs I monitor, I remembered the use of public data by leading news sources. These papers are using data from federal, state and city data sources, and serving them via APIs, right alongside the news.

These news sources don’t make money off the APIs themselves. Like software-as-a-service providers, they use the APIs as a value-add to their core business. Census surveys, congressional voting, economic numbers, and other public data are extremely relevant to the everyday news that impacts us.

Been doing this for a while

When we talk about building businesses around publicly available data, there are some companies who have been doing this a while. The concept really isn’t that new, so I think it is important to bring these legacy providers into the conversation.

Most of these data providers have been doing it for over a decade. They all get the API game and offer a wide range of API services for developers, providing access to data that is taken directly from, derived from, or enhanced from public sources. When it comes to building a business around public data, I don’t think these four have the exact model I’m looking for, but there are many lessons on how to do it right, and wrong.

Weather is an age-old model

When you think about it, one of the original areas we built services around government data is weather. Weather data is a common reference you will experience when you hear any official talk about the potential of government data. There are numerous weather API services available that are doing very well when it comes to digesting public data and making it relevant to developers.

Weather is the most relevant API resource I know of in the space. Weather impacts everyone, making it a resource all web and mobile applications will need. With the growing concern around climate change, this model for using public data, and generating valuable APIs will only grow more important.

Time zone data

Right there behind weather, I would say that time and date information is something that impacts everyone. Time shapes our world, and government sets the tone of the conversation when it comes to date and time data, something that is driving many API-driven business models.

What I like about time and date APIs is that they provide an essential ingredient in all application development. It is an example of how government can generate and guide data sources, while allowing the private sector to help manage vital services around this data, that developers will depend on for building apps.

Currency conversion

Along with time zone data, currency conversion is a simple, valuable, API-driven service that is needed across our economy. You have to know what time it is in different time zones, and know the conversion rate between different currencies, to do business in the global API economy.

In our increasingly global, online world, currency conversion is only going to grow more important. Workforces will be spread across the globe, and paying employees, buying goods and services will increasingly span the globe, requiring seamless currency conversion in all applications.
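To make the idea concrete, here is a minimal sketch of the lookup at the heart of a currency conversion service. The rate table and the `convert` function are hypothetical, the rates are made-up illustrative values rather than real market data, and a real service would refresh them from a published source.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical rates: units of each currency per 1 USD.
# These are illustrative values only, not real exchange rates.
RATES_PER_USD = {
    "USD": Decimal("1"),
    "EUR": Decimal("0.80"),
    "JPY": Decimal("100"),
}


def convert(amount, source, target, rates=RATES_PER_USD):
    """Convert amount from source to target currency via the USD base rate."""
    if source not in rates or target not in rates:
        raise KeyError(f"no rate for {source} or {target}")
    # Go through the base currency: source -> USD -> target.
    in_usd = Decimal(str(amount)) / rates[source]
    converted = in_usd * rates[target]
    # Round to two decimal places for display.
    return converted.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(convert(250, "EUR", "JPY"))
```

Using `Decimal` rather than floats keeps rounding predictable, which matters once the numbers represent money.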

Transit data

Another important area that is increasingly impacting our everyday lives is transit APIs, which provide real-time bus, train, subway and other public transit data to developers.

Transit data will always be a tug of war between the public and private sector. Some data will be generated in each sphere, with some projects incentivized by the government, where the private sector is unwilling to go. Establishing clear models for public and private sector partnerships around transit data will be critical to society functioning.

Real estate

While I won’t be covering every example of building a business around public data in this story, I would be remiss if I didn’t talk about the real estate industry, one of the oldest businesses built on public data and facts.

I’m not a big fan of the real estate industry. One of my startups in the past was built around aggregating MLS data, and I can safely say that the real estate industry is one of the shadiest industries I know of that is built on top of public data. I don’t think this industry is a model that we should be following, but again, I do think there are huge lessons to be learned from the space as we move forward building business models around public data.

That is as far as I’m going to go in exploring API-driven businesses built on public data. My goal wasn’t to be comprehensive; I was just looking to answer some questions for myself around who else is playing in the space.

This list of businesses came out of my API monitoring system, so is somewhat limited in its focus, requiring the company who is building on top of public data to also have an API, which creates quite a blind spot for this research. However, this is a blind spot I’m willing to live in, because I think my view represents the good in the space, and where we should be headed.

Open Data 500

For the next edition of this story, I’d like to look through the 500 companies listed in the Open Data 500 project. I like the focus of the project from GovLab out of New York University.

Their description from the site sums up the project:

The Open Data 500 is the first comprehensive study of U.S. companies that use open government data to generate new business and develop new products and services. Open Data is free, public data that can be used to launch commercial and nonprofit ventures, do research, make data-driven decisions, and solve complex problems.

I see a few of the companies I’ve listed above in the Open Data 500. I’m stoked that they provide both a JSON and CSV version of the Open Data 500, making it much easier to process and make sense of the companies listed. I’d like to make a JavaScript slideshow from the JSON file and browse through the list of companies, adding my own set of tags to help me better understand the good from the bad examples, as well as where the trends and opportunities are in developing APIs around public data.
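As a rough sketch of that kind of processing (in Python here, rather than the JavaScript slideshow), the snippet below loads a list of company records from JSON, groups them by category, and attaches a set of personal tags. The field names `company_name` and `company_category`, the sample records, and the tags are all assumptions for illustration, not the actual Open Data 500 schema or data.

```python
import json
from collections import defaultdict

# Stand-in for the downloaded JSON file. The field names and records
# are assumptions for illustration, not the real Open Data 500 schema.
SAMPLE_JSON = """
[
  {"company_name": "Example Energy Co", "company_category": "Energy"},
  {"company_name": "Example Health Inc", "company_category": "Healthcare"},
  {"company_name": "Example Grid LLC", "company_category": "Energy"}
]
"""


def group_by_category(companies):
    """Map each category to the list of company names in it."""
    groups = defaultdict(list)
    for company in companies:
        category = company.get("company_category", "Unknown")
        groups[category].append(company["company_name"])
    return dict(groups)


def add_tags(companies, tags_by_name):
    """Return copies of the records with a sorted 'tags' list attached."""
    return [
        {**company, "tags": sorted(tags_by_name.get(company["company_name"], []))}
        for company in companies
    ]


companies = json.loads(SAMPLE_JSON)
by_category = group_by_category(companies)
tagged = add_tags(companies, {"Example Energy Co": {"has-api", "good-example"}})

print(by_category)
print(tagged[0]["tags"])
```

Keeping the tags in a separate mapping keyed by company name means the original file can be re-downloaded without losing the annotations.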

I’m pretty convinced that we have a lot of work to do in making machine-readable government data at the federal, state, county and city level more available before we can fully realize the potential of the API economy.

Without high-quality, real-time, valuable public data, we won’t be able to satisfy the needs of the next wave of web, single-page, mobile and Internet of Things application developers. I’m also hoping we can work to establish some healthy blueprints for developing private sector businesses and organizations around public data, by reviewing some of the existing startups that are finding success with this model, and build on, or complement, this existing work rather than re-invent the wheel.

‘Open Data Now’ author Joel Gurin on how businesses and government are building the data economy

What compelled you to write this book?

My interest in the public uses of data goes way back. For over a decade I was the editorial director and then executive vice president of Consumer Reports, where we developed our own expert data to help consumers make important decisions. Then a few years ago I went to the Federal Communications Commission as chief of the consumer bureau, where we tried to figure out how to use data about cellphone plans to improve consumer choice. That work led to my chairing the White House Task Force on Smart Disclosure – the term we used for releasing data to help consumer decision-making – and that, in turn, got me interested in open data more broadly.

As I started talking to dozens of people in government, business and nonprofits, I realized that we’re in the middle of an open data revolution that’s starting to change our society.

The heart of it is the open government data that’s now being released in increasingly valuable ways. I wrote Open Data Now to document this new open data movement, show how it’s creating new business opportunities, and encourage government agencies to make even more government data available.

What is the Open Data 500 study and why is it important?

The Open Data 500 study – which I’m leading at the GovLab at NYU, where I am senior advisor – is the first comprehensive study of companies that use open government data as a key business resource. We set out last fall to see whether we could find 500 of these companies, far more than anyone had documented before. We found them and we researched them – 190 filled out the surveys we sent them, and we learned about the rest from public sources.

We’ve found a tremendous diversity of companies in 15 different categories (such as healthcare, finance and energy), operating all over the country, using different revenue models and different kinds of government data. Perhaps most important, we’ve been able to map the connections between government agencies and the companies that use their data. If you explore the Open Data Compass on the home page of OpenData500.com, you’ll see what we’ve learned. Our next step is to use this understanding to help government agencies and businesses work together to use data more effectively.

What are the key challenges to getting data open?

Government officials have compared the state of federal data to the last scene of Raiders of the Lost Ark – it’s like a warehouse filled with unlabeled crates that contain treasure somewhere, but nobody knows where. That may be an overstatement, but it’s not far off. A lot of data is trapped in legacy systems – there are an estimated 10,000 data systems in the federal government – that are hard to use and are not interoperable with each other. The Obama administration’s Open Data Policy requires agencies to make almost all their data open and machine-readable, but that’s a very tall order.

At the GovLab, we think we can help by building on the results of the Open Data 500 study. The key is prioritization. Rather than trying to open up all the data at once, what if we identify the 10 percent of an agency’s datasets that may hold 90 percent of the public value and make those really usable first? The GovLab is now planning to convene and facilitate a series of Open Data Roundtables that will bring companies to the table with the agencies that provide their data. We want to help them work together so they can figure out the best Open Data strategies. The Department of Commerce has been especially enthusiastic and will help us plan the first Roundtable; Labor, Transportation, Treasury and the USDA have also committed to participate in the future.
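The prioritization idea can be sketched in a few lines. This is a minimal illustration under stated assumptions: "public value" is approximated by a hypothetical count of companies that report using each dataset, and the dataset names and numbers are invented. It picks the most-used datasets until a target share of total usage is covered.

```python
# Hypothetical usage counts: how many companies report using each dataset.
# Names and numbers are invented for illustration.
USAGE = {
    "weather-observations": 60,
    "census-tracts": 30,
    "dataset-c": 4,
    "dataset-d": 3,
    "dataset-e": 2,
    "dataset-f": 1,
}


def prioritize(datasets, target_percent=90):
    """Pick the most-used datasets until target_percent of total usage is covered."""
    total = sum(datasets.values())
    if total == 0:
        return []
    chosen, covered = [], 0
    # Highest usage first; ties are broken alphabetically for a stable order.
    for name, uses in sorted(datasets.items(), key=lambda kv: (-kv[1], kv[0])):
        # Integer comparison avoids floating-point rounding at the threshold.
        if covered * 100 >= target_percent * total:
            break
        chosen.append(name)
        covered += uses
    return chosen


print(prioritize(USAGE))
```

With these made-up numbers, two of the six datasets cover 90 percent of reported usage, so those two would be made usable first.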

Who’s doing open data right?

The Department of Commerce does seem to be a leader here; their data from NOAA and Census is especially widely used. More companies in our study use data from Commerce than from any other federal department, and Commerce is the only one that serves companies in all 15 categories we studied. Health and Human Services was an early Open Data leader and keeps releasing datasets with high impact, like the new data that names and sometimes shames individual Medicare providers. Other departments and agencies, like the ones that have signed on for our Roundtables, are also doing increasingly exciting open data work.

You highlight great examples of how open data has fostered new business models and empowered economic growth. What are some of your favorite examples that highlight this?

Everyone talks about the Climate Corporation, which was just sold to Monsanto for about a billion dollars. I’ve been following them for over a year, and was lucky to interview their CEO, David Friedberg, for my book and my website OpenDataNow.com; you can read the interview in print or online and hear a podcast of it on my website as well. But it’s not just the big companies like this that are proving the economic value of open data. After all, we’ve just published information on 499 more of them.

If I had to choose, I’d highlight the companies that are using open data not just to build their business, but to provide a social benefit. I think it’s something about the nature of open data, but I’ve noticed that a lot of open data companies are doing well by doing good. We have healthcare companies that are helping people find better post-hospital care and better, affordable healthcare overall. Energy-focused companies are using open data to reduce energy use and carbon emissions, while consumer shopping applications help people choose products with a low carbon footprint. Financial websites and apps, powered by open data, help consumers protect their credit ratings and help small businesses get loans more easily. In education, new startups are helping college-bound students figure out how to get the most return for the money they’ll spend on education.

When we talk about the economic benefit of open data, we have to remember the social benefit as well. The good news is that the two are closely connected – and we can expect to see a lot more companies that will generate value on many levels.