In 2011 European Commissioner Neelie Kroes put forward a proposal to open up Europe’s public data for everyone to use. Kroes is a strong supporter of the use of public datasets, and she called on governments to put online the datasets that were created with public money. The United States also sees many opportunities for public data: in 2012 Obama launched a big data initiative worth $200 million to investigate big data opportunities and technologies.
Thanks to these initiatives, public datasets are becoming more widely available to organisations, which drives innovation and new solutions for (unknown) problems in the world. More and more private initiatives are being launched as well. These marketplaces collect public and private datasets for organisations. Visitors can buy the datasets or download them for free. On some websites organisations can also sell their own datasets. Google and Amazon are also developing big data marketplaces, albeit still on a relatively small scale.
This section of Datafloq is dedicated to these (public) initiatives, and we will collect all websites and companies that make (public) datasets available. This is an ongoing section, as new websites are launched every day. So if you know of a website that is not shown here, please contact us and let us know.
DataMarket is a data supermarket offering more than 45,000 datasets from around the world, delivered by, among others, 42 governments. DataMarket’s objective is to find all available (public) datasets and make them accessible and understandable.
DataStreamX is the global marketplace for commercial data. Founded in 2014, the company aims to accelerate data access worldwide by bringing together buyers and vendors of data on one simple-to-use platform. DataStreamX helps transform its clients’ businesses by delivering actionable data to buyers and creating new revenue opportunities for vendors.
QunB is developing a data marketplace. Companies are encouraged to upload their own data to QunB and to combine it with other datasets. These datasets can be sold or given away for free.
Knoema is a knowledge platform that provides access to over 100 million time series. All available data is interactive and can be exported if needed. In addition, it provides visualization tools (with over 1,000 different visualizations) to analyse the public data. Visualizations can be made public or kept private.
LexisNexis is a paid subscription platform offering libraries of statutes, case judgments and opinions for jurisdictions. Customers have access to billions of searchable documents and records from more than 45,000 legal, news and business sources.
Via Google Public Data visitors can delve into 104 different datasets and download them for their own use. Visitors can also upload their own datasets to visualize and explore them. Currently available datasets come from, among others, the World Economic Forum, Eurostat and the IMF.
Via Amazon Web Services another 54 datasets are made available to the public. These include the 1000 Genomes Project and the Common Crawl Corpus, which covers data from over 5 billion web pages.
Enigma.io is a big data startup that offers access to public data sources. The New York based company offers over 100,000 databases that can easily be searched or exported. Users can download everything from import bills of lading to aircraft ownership, lobbying activity, real estate assessments, spectrum licenses, financial filings, liens, etc.
Quandl is a public data set startup currently in beta and offering over five million financial, economic and social data sets from all over the world for free. Visitors can embed graphs on their own website or download the data set via Python, Stata, Excel or R.
Figshare is a platform for researchers, who can make their outputs available for anyone to use. Figures, datasets, media, papers, posters, presentations and filesets can all be made public. All data is automatically published in a citable, searchable and sharable manner.
Datahub.io is a community-run catalogue of useful sets of data on the Internet. Users can collect links to data found on the web or store data on the platform itself. Users can also search the data collected by other users. The platform runs on the open-source software CKAN. Most of the data indexed is openly licensed, so users are free to use or re-use it however they like.
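As a rough illustration of searching a CKAN-based catalogue, the sketch below builds a request for CKAN's `package_search` action and reads dataset titles from the JSON it returns. The base URL `https://ckan.example.org` is a placeholder, the helper names are made up for this example, and the sample response is invented data in the general shape CKAN uses; whether a given site exposes this API is an assumption to check first.

```python
import json
from urllib.parse import urlencode

# Placeholder base URL (an assumption): replace with the CKAN site you want to query.
CKAN_BASE_URL = "https://ckan.example.org"


def build_search_url(query, rows=5, base_url=CKAN_BASE_URL):
    """Return the URL for a CKAN package_search request (hypothetical helper)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{params}"


def extract_titles(response_text):
    """Pull dataset titles out of a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    # Fall back to the dataset's machine name when no title is set.
    return [d.get("title") or d.get("name") for d in payload["result"]["results"]]


# Invented sample data, trimmed to the fields used above.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "air-quality-2013", "title": "Air Quality 2013"},
            {"name": "city-air-sensors", "title": ""},
        ],
    },
})

print(build_search_url("air quality"))
print(extract_titles(SAMPLE_RESPONSE))
```

To query a live catalogue, the URL from `build_search_url` could be passed to `urllib.request.urlopen` and the response body handed to `extract_titles`.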
Open Science Data Cloud is a platform providing petabyte-scale cloud resources that enable users to easily analyze, manage, and share data. The OSDC currently hosts about 450 TB of data and plans to increase this to the petabyte level.
OpenData is a platform with a large collection of open datasets by Socrata. Socrata provides social data discovery services for opening government data. It has collected over 200,000 datasets from around the world. The datasets are divided into five categories: Business, Education, Fun, Government and Personal.
Freebase is a community-curated database of well-known people, places, and things. The website offers almost 2 billion facts divided over 40 million topics and 76 domains. Every fact and entity is available as an RDF dump, which enables users to analyze the entire database on their own computer.
Thinknum is working on indexing all financial data and exposing it through a simple API. It has over 10 million data series, all of which can be downloaded for free. Thinknum uses the data to build applications that help strategists analyse financial markets. It currently supports two applications: Thinknum Plotter and Thinknum Cashflow Engine. Thinknum Plotter allows users to manipulate time-series data using mathematical expressions, so users can analyze data without having to write code. Thinknum Cashflow Engine allows users to view discounted cashflow models online.
xDayta is a marketplace to buy and sell data. xDayta is an open platform allowing anyone to sell any type of data to any buyer. It’s free to list data on xDayta for sale. Anyone can register and sell data on xDayta. Anyone looking for data to buy can use xDayta. The xDayta exchange facilitates over-the-counter data trades, brokers transactions, indexes data pricing and regulates trading.
Red Lion Data is a Canadian marketplace that offers location datasets for retail and restaurants in the USA and Canada. Datasets are sold for between $15 and $55, and annual subscriptions are also available. The datasets are especially relevant for mobile app developers, real estate professionals, online portals and web directories, and marketing professionals.
The ArcGIS Open Data site allows people to search for all the authoritative open geospatial data that has been shared by the users of ArcGIS Online, Esri’s cloud-based mapping platform. Since ArcGIS Open Data was launched, more than 1,500 authoritative organizations (governments at all levels, commercial organizations, nonprofits, etc.) have shared more than 25,000 high-quality datasets as open data. People can search this data by topic or location and then download it for their own use and analysis.
Big Data Exchange is a real-time data exchange that enables data buyers and sellers to connect and exchange user profile datasets. Data sellers can use its APIs to monetize their data, and data buyers can access user interest data in real time using a set of API tools geared towards helping companies learn more about their users in order to increase ROI.
So, please help us grow this section and let us know if you have found another place online where (public) datasets can be bought or downloaded.
The article was originally published here.