How can we effectively crawl the web to find specific data?

Novarica conducts research and consulting work in the insurance industry. The company maintains a proprietary dataset of insurance companies and related information and uses a variety of methods to keep it up to date.

We are interested in scalable methods of crawling the web for publicly listed information. Specifically, we are looking for (a) annual premiums or (b) annual revenue figures.


Using the attached list of URLs as a starting point, create a script that crawls these webpages and finds annual premium or total annual revenue numbers.

Search specifically in annual reports, press releases, company About pages, or other relevant pages you discover in your own research.
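A minimal sketch of such a crawler, using only the Python standard library. The phrase vocabulary and regular expression here are illustrative assumptions that would need tuning against real pages, not part of the brief:

```python
import re
import urllib.request

# Phrases that often precede premium/revenue figures. This vocabulary is an
# assumption; real annual reports and press releases will vary.
PATTERN = re.compile(
    r"(?:written premiums?|premiums? written|annual revenue|total revenue)"
    r"[^$\d]{0,60}"                       # allow a few filler words in between
    r"\$?\s*([\d,]+(?:\.\d+)?)\s*(million|billion|M|B)?",
    re.IGNORECASE,
)

def extract_figures(text):
    """Return (number, unit) tuples for premium/revenue mentions in text."""
    return [(m.group(1), m.group(2) or "") for m in PATTERN.finditer(text)]

def crawl(urls):
    """Fetch each URL and report any candidate figures found (needs network access)."""
    results = {}
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except Exception as exc:
            results[url] = f"error: {exc}"
            continue
        # Crude tag stripping; a production crawler would use a real HTML parser
        # and also follow links to annual-report and press-release pages.
        text = re.sub(r"<[^>]+>", " ", html)
        results[url] = extract_figures(text)
    return results
```

Candidate matches would still need manual review, since phrases like "total revenue" also appear in commentary that does not state the company-wide annual figure.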

Example (About Section):

Example (Annual Report):

Info on Premiums and Revenue Numbers:

These links go to the financial statements of Hastings and Arbella. Both are fairly standard financial reports (though no two reports are ever exactly alike). The figures break down each company's balance sheet; we do not need these breakdowns. We remain interested primarily in Revenue and Written Premium.

For Hastings, the 2013 written premium appears in the bar chart at the bottom of the page (393.644M).
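Figures like Hastings' "393.644M" appear in several notations ("$1,200 million", "393.644M", etc.) and need normalizing before they can be compared across companies. A small helper might look like this; the unit vocabulary is an assumption covering the formats seen so far:

```python
import re

# Multipliers for common unit suffixes; extend as new notations turn up.
UNIT_MULTIPLIERS = {"m": 1e6, "million": 1e6, "b": 1e9, "billion": 1e9, "": 1.0}

def parse_amount(raw):
    """Parse strings like '393.644M' or '$1,200 million' into a plain float."""
    m = re.fullmatch(
        r"\$?\s*([\d,]+(?:\.\d+)?)\s*(million|billion|m|b)?",
        raw.strip(),
        re.IGNORECASE,
    )
    if not m:
        raise ValueError(f"unrecognized amount: {raw!r}")
    number = float(m.group(1).replace(",", ""))
    unit = (m.group(2) or "").lower()
    return number * UNIT_MULTIPLIERS[unit]
```

For example, `parse_amount("393.644M")` yields the full-dollar figure for the Hastings 2013 written premium.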