The European startup and VC landscape has grown substantially over the past decades; however, reliable, centralised information about key players, deals and emerging companies remains hard to access. For this reason, the Bocconi Students for Alternative Investments (BSAI) Association developed a European Venture Capital & Startup Database. It contains up-to-date information built mainly from Crunchbase data and validated through manual verification checks. The database gives users a holistic view of the most recent European startups, VC investors and the deals they made, making it a valuable research tool for investment analysis and strategic decision-making.
The main difficulty in understanding Europe’s innovation landscape is that the data is vast and fragmented. In the US, VC hubs follow relatively centralised reporting standards. The European market operates differently, spanning dozens of countries and regional regulations. Early-stage companies often lack public visibility, and deal details are either undisclosed or inconsistent. Databases such as Crunchbase provide a strong foundation for data gathering, but their raw data often requires cleaning and validation before it can be used effectively. This is exactly where BSAI’s database provides value: it consolidates, filters, and standardises key data points from multiple sources into a single, user-friendly dataset.
Creating the database required a combination of automated data extraction and manual refinement. The initial dataset was built through targeted web scraping, which allowed variables to be formatted and updated consistently. Automated extraction let BSAI members obtain thousands of records quickly; the team then manually verified the critical elements (funding round dates, company headquarters, and investor names) to ensure accuracy. Combining the two methods produced a dataset that is both comprehensive and accurate, addressing a significant limitation of datasets derived purely from automated web scraping.
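As a rough illustration of how automated standardisation and manual checks can fit together, the sketch below normalises one scraped record and lists the critical fields that would go to a human reviewer. It is not BSAI's actual pipeline: the field names (`round_date`, `headquarters`, `investors`), the accepted date formats, and the sample record are all assumptions made for the example. It also flags only fields that fail automated checks, which is narrower than the manual verification described above.

```python
from datetime import datetime

# Hypothetical schema: the text names these three critical elements,
# but the actual column names are an assumption.
CRITICAL_FIELDS = ("round_date", "headquarters", "investors")

# Assumed input formats; a real scrape may need more.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def parse_date(raw):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except (ValueError, AttributeError):
            # ValueError: wrong format; AttributeError: raw is None.
            continue
    return None


def standardise(record):
    """Normalise one scraped record and list critical fields needing review."""
    investors = record.get("investors") or []
    clean = {
        "company": (record.get("company") or "").strip(),
        "round_date": parse_date(record.get("round_date")),
        "headquarters": (record.get("headquarters") or "").strip().title(),
        # Deduplicate and sort investor names for consistent output.
        "investors": sorted({n.strip() for n in investors if n.strip()}),
    }
    # A critical field that is empty or unparseable goes to manual review.
    needs_review = [f for f in CRITICAL_FIELDS if not clean[f]]
    return clean, needs_review


if __name__ == "__main__":
    # Made-up sample record, not taken from the database.
    sample = {
        "company": " Acme ",
        "round_date": "Q1 2023",
        "headquarters": "milan, italy",
        "investors": ["Fund B ", "Fund A", "Fund A"],
    }
    clean, needs_review = standardise(sample)
    print(clean)
    print("Needs manual review:", needs_review)
```

In this sketch the unparseable round date is left empty and flagged rather than guessed, so automated cleaning never silently overwrites a value that a person should confirm.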
Project Leaders: Federico Galli & Luca Lupano
Analysts: Mihály Jirkovszky-Bari, Maria Luiza Lyra, Zsigmond Faltay
