Data Services

Gathering data from the web and external sources for accounting firms

Much of the data accounting and finance teams need exists online or in external platforms — but pulling it by hand doesn't scale. This guide explains what programmatic data gathering makes possible.

A significant amount of the data that accounting and finance teams work with lives somewhere external: Companies House filings, market prices, supplier information, property data, regulatory registers, publicly available financial records. When you need one or two data points, looking them up manually is fine. When you need data for fifty clients, or a hundred companies, or a recurring set of records that changes month to month, doing it by hand stops being practical.

Programmatic data gathering — writing code to collect data from websites, registers, and platforms rather than visiting them one at a time — is the alternative. This guide explains where it’s applicable in accounting and finance contexts and what the output looks like.


Common use cases in accounting and finance

Company information at scale. Due diligence, client onboarding, credit checks and conflict searches all require gathering information about companies: registered address, SIC codes, directors, filing history and persons with significant control (PSC). For a handful of companies this is fine to do manually via Companies House. For a pipeline of thirty new clients per month, or a one-off batch of two hundred companies, it isn’t.

The data is publicly available and consistently structured. Gathering it programmatically returns a spreadsheet with every field populated for every company, rather than a person working through them individually.
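
As a minimal sketch of the company-information case, the code below assumes the Companies House public data API: the company profile endpoint (/company/{company_number}) and the field names used (company_name, company_status, sic_codes, registered_office_address and so on) reflect that API as we understand it and should be checked against its current reference. An API key is required and is sent as the HTTP Basic auth username with a blank password. The helper names are hypothetical, and directors and PSC data would need further calls that are not shown here. The fetch function is passed in so the flattening logic can be tested without network access.

```python
import base64
import json
import urllib.request
from typing import Callable

# Assumed base URL of the Companies House public data API.
API_ROOT = "https://api.company-information.service.gov.uk"


def normalise_number(number: str) -> str:
    """Company numbers are 8 characters; pad purely numeric ones with leading zeros."""
    number = number.strip().upper()
    return number.zfill(8) if number.isdigit() else number


def fetch_company_profile(company_number: str, api_key: str) -> dict:
    """Fetch one company profile as JSON (API key sent as the Basic auth username)."""
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    request = urllib.request.Request(
        f"{API_ROOT}/company/{company_number}",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def profile_to_row(profile: dict) -> dict:
    """Flatten a company profile into one spreadsheet row."""
    address = profile.get("registered_office_address", {})
    address_parts = ("address_line_1", "address_line_2", "locality", "postal_code")
    return {
        "company_number": profile.get("company_number", ""),
        "company_name": profile.get("company_name", ""),
        "status": profile.get("company_status", ""),
        "incorporated": profile.get("date_of_creation", ""),
        "sic_codes": ";".join(profile.get("sic_codes", [])),
        "registered_address": ", ".join(
            address[k] for k in address_parts if address.get(k)
        ),
    }


def gather(company_numbers: list[str], fetch: Callable[[str], dict]) -> list[dict]:
    """One row per company; `fetch` is injected so it can be stubbed in tests."""
    return [profile_to_row(fetch(normalise_number(n))) for n in company_numbers]


# Example (needs your own API key and network access):
# rows = gather(company_numbers, lambda n: fetch_company_profile(n, api_key))
```

Keeping the network call separate from the flattening step means a source that changes its response shape only affects one small function.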

Market and financial data. Exchange rates, commodity prices, index values, interest rates — accounting teams regularly need these for valuations, translations, and disclosures. Many of these are available from public or low-cost sources, but extracting them in a usable format, for specific dates, in the right currency pairs, often requires more work than it should.
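
The "specific dates" and "right currency pairs" problems can be sketched as two small functions. This assumes rates have already been downloaded into a date-keyed dictionary, and that where no rate is published for a date (a weekend year-end, say) the most recent earlier rate is used; that fallback is an assumption, and a firm's own policy may differ. The cross-rate helper assumes the source quotes every currency as units per one unit of a single base currency, as some central-bank reference rates do.

```python
from bisect import bisect_right
from datetime import date


def rate_on(rates: dict[date, float], when: date) -> float:
    """Rate for `when`, or the latest rate published before it (assumed fallback policy)."""
    days = sorted(rates)
    i = bisect_right(days, when)  # number of published dates on or before `when`
    if i == 0:
        raise KeyError(f"no rate published on or before {when}")
    return rates[days[i - 1]]


def cross_rate(per_base: dict[str, float], from_ccy: str, to_ccy: str, base: str) -> float:
    """Units of `to_ccy` per 1 unit of `from_ccy`, from rates quoted per 1 unit of `base`."""
    quotes = {**per_base, base: 1.0}  # the base currency is 1.0 against itself
    return quotes[to_ccy] / quotes[from_ccy]
```

Raising an error when no earlier rate exists, rather than guessing, keeps a gap in the source visible in the output.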

Supplier and counterparty research. VAT registration status, credit information, website data, contact details — information that’s scattered across multiple sources but needs to be in one place for a client file or a due diligence schedule.

Property and land data. For firms with property clients or real estate work, land registry data, planning records, and property valuations are often relevant. Much of this is publicly accessible but not easy to extract at volume.

Regulatory and compliance data. Sanctions lists, PEP databases, insolvency registers — compliance workflows often require checking names or entities against external sources. Doing this programmatically is faster and more consistent than manual lookups.
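
A simplified sketch of name screening follows, using fuzzy string similarity from Python's standard difflib module. It is illustrative only: real sanctions and PEP lists carry aliases, dates of birth and other identifiers that a proper check would use, the suffix list and the 0.85 threshold are arbitrary assumptions, and a match should be treated as something for a person to review, not a conclusion.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Assumed set of company suffixes to ignore when comparing names.
SUFFIXES = {"ltd", "limited", "plc", "llp", "inc"}


def normalise(name: str) -> str:
    """Strip accents, punctuation and common suffixes so trivial differences don't block a match."""
    text = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(word for word in text.split() if word not in SUFFIXES)


def screen(names: list[str], watchlist: list[str], threshold: float = 0.85) -> list[tuple]:
    """Return (name, listed_entry, score) for every pair scoring at or above `threshold`."""
    listed = [(entry, normalise(entry)) for entry in watchlist]
    hits = []
    for name in names:
        candidate = normalise(name)
        for entry, normalised_entry in listed:
            score = SequenceMatcher(None, candidate, normalised_entry).ratio()
            if score >= threshold:
                hits.append((name, entry, round(score, 3)))
    return hits
```

Every name is compared with every entry, which is fine for a client list against a list of modest size; larger jobs would need an indexing step first.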


What this looks like in practice

The model is the same as other data processing work: you describe what you need, we gather it and return the data.

A typical job starts with a list — a column of company numbers, a set of names, a list of addresses — and ends with a populated spreadsheet. Every row has the data you asked for, gathered from the appropriate source, in the format you specified.
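
The list-in, spreadsheet-out shape can be sketched generically. This is a minimal illustration, not our production tooling: the function names are hypothetical, the lookup function stands in for whatever source the job uses, and lookups run on a small thread pool so a long list isn't processed strictly one at a time. A real job would also need to respect the source's published rate limits, which this sketch does not handle.

```python
import csv
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def read_keys(path: str, column: str) -> list[str]:
    """Read the non-blank values of one column from the input CSV."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return [row[column].strip() for row in csv.DictReader(f) if row[column].strip()]


def run_job(keys: list[str], lookup: Callable[[str], dict], fields: list[str],
            workers: int = 4) -> list[dict]:
    """Look up every key and return one row per key, in input order.

    A failed lookup produces a row with an 'error' value instead of
    stopping the whole batch.
    """
    def one(key: str) -> dict:
        row = {"key": key, **{f: "" for f in fields}, "error": ""}
        try:
            data = lookup(key)
            row.update({f: data.get(f, "") for f in fields})
        except Exception as exc:  # record the failure and carry on
            row["error"] = f"{type(exc).__name__}: {exc}"
        return row

    # pool.map returns results in the same order as the input keys
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one, keys))


def write_csv(rows: list[dict], path: str) -> None:
    """Write rows to CSV with a BOM so Excel detects the UTF-8 encoding."""
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

Recording errors per row, rather than stopping at the first failure, means a batch of a thousand companies with three bad numbers still returns 997 populated rows and three that say exactly what went wrong.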

The practical details:

Volume is handled. The same code that gathers data for ten companies runs for a thousand. Run time still grows with the number of records, but staff time doesn’t: nobody has to work through the list by hand.

Sources are combined where needed. If you need company data from Companies House, VAT status from HMRC, and credit information from a third source, all combined into one record per company — that’s one job, not three.

The output is what you need, not what the source provides. Data from external sources often comes in awkward formats. The output you receive is clean, structured, and ready to use — not raw data that needs further processing.
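
The last two points, combining sources and cleaning their output, can be sketched together. This assumes each source has already returned a list of records sharing a key field such as a company number; the source labels are hypothetical, and prefixing each column with its source is one design choice among several. The parsers assume UK conventions: day-first dates, and negative amounts shown in brackets.

```python
import re
from datetime import date, datetime
from decimal import Decimal
from typing import Optional


def combine(key_field: str, sources: dict[str, list[dict]]) -> list[dict]:
    """Merge records from several sources into one row per key, prefixing columns by source."""
    merged: dict[str, dict] = {}
    columns: list[str] = [key_field]
    for source_name, records in sources.items():
        for record in records:
            row = merged.setdefault(record[key_field], {key_field: record[key_field]})
            for field, value in record.items():
                if field == key_field:
                    continue
                column = f"{source_name}_{field}"
                if column not in columns:
                    columns.append(column)
                row[column] = value
    # Give every row every column so the spreadsheet has no ragged records
    return [{c: row.get(c, "") for c in columns} for row in merged.values()]


def parse_amount(text: str) -> Optional[Decimal]:
    """'£1,234.50' -> 1234.50; '(500.00)' -> -500.00; blank, '-' or 'n/a' -> None."""
    s = text.strip()
    if s.lower() in ("", "-", "n/a"):
        return None
    negative = s.startswith("(") and s.endswith(")")
    value = Decimal(re.sub(r"[^\d.\-]", "", s))  # drop currency symbols, commas, brackets
    return -value if negative else value


def parse_date(text: str, formats=("%d/%m/%Y", "%d %B %Y", "%Y-%m-%d")) -> date:
    """Try each assumed format in turn (day-first, as in UK sources)."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")
```

Using Decimal rather than float keeps monetary amounts exact, and failing loudly on an unrecognised date avoids silently swapping day and month.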


Where generic tools fall short

AI assistants can look up individual pieces of information, but they can’t reliably gather data at scale. They may not have live access to current records, they can’t be relied on to query APIs or work through websites systematically, and their output tends to vary in format across a batch.

Spreadsheet-based lookup tools (INDEX-MATCH against a downloaded register, Power Query connections to a specific data source) work for standard cases but require the data to be in a specific place and format, and need maintaining when the source changes. They also typically cover one source at a time.

For one-off or high-volume jobs that combine multiple sources, bespoke data gathering is more reliable and produces cleaner output.


A note on what’s in scope

Not all data gathering is the same. Publicly available information — Companies House, land registry, published financial data, open government datasets — is straightforward. Data behind login walls or paywalls is a different matter and depends on the terms of the source.

If you’re not sure whether the data you need is accessible, that’s a good starting point for a conversation.


Getting started

A data gathering job usually starts with a list of entities and a description of the data you need for each of them. From there, scoping tends to be quick: it’s a question of which sources are relevant and what the output needs to look like.

Get in touch with a description of what you need — a rough indication of the volume and the data points involved is enough to start.

Try it in Excel

The tech+bash Add-in works in Excel Desktop (Windows) and Excel Online. Install takes under two minutes.