
Data Services

Gathering data from the web and external sources for accounting firms

Much of the data accounting and finance teams need exists online or in external platforms — but pulling it by hand doesn't scale. This guide explains what programmatic data gathering makes possible.

27 April 2026 · tech+bash · 6 min read

A significant amount of the data that accounting and finance teams work with lives somewhere external: Companies House filings, market prices, supplier information, property data, regulatory registers, publicly available financial records. When you need one or two data points, looking them up manually is fine. When you need data for fifty clients, or a hundred companies, or a recurring set of records that changes month to month, doing it by hand stops being practical.

Programmatic data gathering — writing code to collect data from websites, registers, and platforms rather than visiting them one at a time — is the alternative. This guide explains where it’s applicable in accounting and finance contexts and what the output looks like.

Common use cases in accounting and finance

Company information at scale. Due diligence, client onboarding, credit checks and conflict searches all require information about companies: registered address, SIC codes, directors, filing history and PSC (persons with significant control) data. For a handful of companies, looking this up manually on Companies House is fine. For a pipeline of thirty new clients per month, or a one-off batch of two hundred companies, it isn't.

The data is publicly available and consistently structured. Gathering it programmatically produces a spreadsheet with the requested fields filled in for every company, instead of someone working through the companies one at a time.
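As an illustration, here is a minimal Python sketch of pulling a company profile from the Companies House public data API and flattening it into one spreadsheet row. It assumes you have registered for an API key, which is sent as the HTTP Basic username with a blank password, and it only uses the company profile endpoint. Directors and PSC data come from separate endpoints (`/officers` and `/persons-with-significant-control` under the same company path) and would be handled in the same way. The output column names are our own illustrative choice, not something the API prescribes.

```python
import base64
import json
import urllib.request

# Companies House public data API base URL.
BASE_URL = "https://api.company-information.service.gov.uk"


def fetch_company_profile(company_number: str, api_key: str) -> dict:
    """Fetch one company profile as JSON (API key sent as the Basic-auth username)."""
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    request = urllib.request.Request(
        f"{BASE_URL}/company/{company_number}",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def flatten_profile(profile: dict) -> dict:
    """Pick out the fields we want as one flat spreadsheet row.

    The keys on the left are illustrative column names chosen for this sketch.
    """
    address = profile.get("registered_office_address", {})
    return {
        "company_number": profile.get("company_number", ""),
        "company_name": profile.get("company_name", ""),
        "status": profile.get("company_status", ""),
        "incorporated": profile.get("date_of_creation", ""),
        # Join whichever address parts are present into one cell.
        "address": ", ".join(
            address[key]
            for key in ("address_line_1", "address_line_2", "locality", "postal_code")
            if address.get(key)
        ),
        # A company can have several SIC codes; keep them in one cell.
        "sic_codes": ";".join(profile.get("sic_codes", [])),
    }
```

Calling `flatten_profile(fetch_company_profile(number, api_key))` for each number in a list gives one row per company, ready to write to a spreadsheet.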

Market and financial data. Exchange rates, commodity prices, index values, interest rates — accounting teams regularly need these for valuations, translations, and disclosures. Many of these are available from public or low-cost sources, but extracting them in a usable format, for specific dates, in the right currency pairs, often requires more work than it should.
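A common snag with "specific dates" is that no rate is published on weekends or bank holidays. One simple policy, sketched below in Python, is to take the latest rate published on or before the date in question. Whether that is the right convention for a particular valuation or translation is an accounting judgement, and the input format (a dict mapping dates to rates, already fetched from whichever source you use) is an assumption for illustration.

```python
from bisect import bisect_right
from datetime import date


def rate_on_or_before(rates: dict[date, float], target: date) -> tuple[date, float]:
    """Return (publication date, rate) for the latest rate on or before `target`.

    Covers dates on which nothing was published, such as weekends.
    """
    published = sorted(rates)
    # bisect_right finds the insertion point after any exact match,
    # so index - 1 is the last publication date <= target.
    index = bisect_right(published, target)
    if index == 0:
        raise ValueError(f"No rate published on or before {target}")
    as_of = published[index - 1]
    return as_of, rates[as_of]
```

Returning the publication date alongside the rate makes it easy to show, in the working papers, which day's rate was actually used.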

Supplier and counterparty research. VAT registration status, credit information, website data, contact details — information that’s scattered across multiple sources but needs to be in one place for a client file or a due diligence schedule.

Property and land data. For firms with property clients or real estate work, land registry data, planning records, and property valuations are often relevant. Much of this is publicly accessible but not easy to extract at volume.

Regulatory and compliance data. Sanctions lists, PEP databases, insolvency registers — compliance workflows often require checking names or entities against external sources. Doing this programmatically is faster and more consistent than manual lookups.
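To show what checking names against an external list involves, here is a deliberately simple Python sketch: normalise both names (case, accents, punctuation) and score their similarity with `difflib.SequenceMatcher` from the standard library. This is a stand-in for the matching logic of a real screening tool, which would also handle aliases, transliterations and word order. The 0.85 threshold is an arbitrary illustrative value, and any hit would still need human review.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalise_name(name: str) -> str:
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode()  # drops accent marks
    cleaned = re.sub(r"[^a-z0-9 ]", " ", ascii_only.lower())
    return " ".join(cleaned.split())


def screen(
    name: str, watchlist: list[str], threshold: float = 0.85
) -> list[tuple[str, float]]:
    """Return watchlist entries whose similarity to `name` meets the threshold.

    Results are sorted with the strongest match first.
    """
    target = normalise_name(name)
    hits = []
    for entry in watchlist:
        score = SequenceMatcher(None, target, normalise_name(entry)).ratio()
        if score >= threshold:
            hits.append((entry, round(score, 3)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Because every name goes through the same normalisation and scoring, the same input always produces the same result, which is the consistency advantage over manual lookups.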

What this looks like in practice

The model is the same as our other data processing work: you describe what you need, and we gather the data and return it to you.

A typical job starts with a list — a column of company numbers, a set of names, a list of addresses — and ends with a populated spreadsheet. Every row has the data you asked for, gathered from the appropriate source, in the format you specified.
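The list-in, spreadsheet-out shape can be sketched in a few lines of Python. Here `lookup` stands for a hypothetical function that fetches the data for one ID from the relevant source. A failed lookup is recorded on its own row instead of stopping the batch, so no entity silently drops out of the output.

```python
import csv
from typing import Callable, Iterable


def enrich_rows(
    ids: Iterable[str], lookup: Callable[[str], dict], columns: list[str]
) -> list[dict]:
    """Look up each ID in turn; a failure is noted on its row, not raised."""
    rows = []
    for entity_id in ids:
        row = {"id": entity_id}
        try:
            data = lookup(entity_id)
            row.update({col: data.get(col, "") for col in columns})
        except Exception as exc:  # keep the batch going; record what went wrong
            row["error"] = str(exc)
        rows.append(row)
    return rows


def write_csv(rows: list[dict], columns: list[str], path: str) -> None:
    """Write rows with a fixed column order; absent values become blanks."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", *columns, "error"])
        writer.writeheader()
        writer.writerows(rows)
```

Keeping an `error` column in the output means the person reviewing the spreadsheet can see exactly which rows need a second look.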

The practical details:

Volume is handled. The same code that gathers data for ten companies runs for a thousand. Unlike manual lookups, the human effort doesn't grow with every extra record; the run time is mostly governed by how quickly each source allows requests.
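At volume, the main constraint is usually each source's request limit. Companies House, for example, documents a limit of 600 requests in any five-minute period (worth re-checking against its current documentation). A rolling-window limiter like this Python sketch keeps a batch job under such a limit. The `clock` and `sleep` parameters are injectable only so the behaviour can be tested without real waiting.

```python
import time
from collections import deque
from typing import Callable


class RateLimiter:
    """Allow at most `max_calls` calls in any rolling `period` seconds."""

    def __init__(
        self,
        max_calls: int,
        period: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls: deque[float] = deque()  # timestamps of recent calls

    def wait(self) -> None:
        """Block until another call is allowed, then record it."""
        now = self.clock()
        # Forget calls that have dropped out of the rolling window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call in the window expires, freeing a slot.
            self.sleep(self.period - (now - self.calls[0]))
            self.calls.popleft()
            now = self.clock()
        self.calls.append(now)
```

In use, you would create `RateLimiter(600, 300)` once and call `limiter.wait()` before each request.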

Sources are combined where needed. If you need company data from Companies House, VAT status from HMRC, and credit information from a third source, all combined into one record per company — that’s one job, not three.
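Combining sources comes down to joining each source's results on a shared key, such as the company number. This Python sketch assumes each source's results have already been fetched into a dict keyed by ID; the source names used in practice (for example `ch` or `vat`) are hypothetical labels. Prefixing each field with its source name stops one source's `status` from silently overwriting another's, and a `missing_from` column makes gaps visible.

```python
from typing import Mapping


def combine_sources(
    ids: list[str], sources: Mapping[str, Mapping[str, dict]]
) -> list[dict]:
    """Build one record per ID from several sources, each keyed by ID.

    Fields are prefixed with the source name; sources with no data for an
    ID are listed in the record's 'missing_from' column.
    """
    records = []
    for entity_id in ids:
        record = {"id": entity_id}
        missing = []
        for source_name, data in sources.items():
            if entity_id not in data:
                missing.append(source_name)
                continue
            for field, value in data[entity_id].items():
                record[f"{source_name}_{field}"] = value
        record["missing_from"] = ";".join(missing)
        records.append(record)
    return records
```

Driving the loop from the original list of IDs, not from any one source, guarantees every requested entity appears in the output even if a source knows nothing about it.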

The output is what you need, not what the source provides. Data from external sources often comes in awkward formats. The output you receive is clean, structured, and ready to use — not raw data that needs further processing.
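Much of this clean-up is unglamorous normalisation. As a sketch, these Python helpers turn a few common date and amount formats into proper `date` and `Decimal` values. The list of formats is an assumption (UK day-first dates, bracketed negatives); a real job would be built around the formats the sources actually use. `Decimal` is used instead of `float` so monetary amounts are not subject to binary rounding.

```python
import re
from datetime import date, datetime
from decimal import Decimal

# Formats tried in order; slash dates are assumed to be day-first (UK style).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y", "%d %b %Y")


def parse_date(text: str) -> date:
    """Try each known format in turn and return the first that fits."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date: {text!r}")


def parse_amount(text: str) -> Decimal:
    """Turn '£1,234.50', '(1,234)' or '-12' into a Decimal; brackets mean negative."""
    cleaned = text.strip()
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    # Keep only digits, the decimal point and a minus sign.
    digits = re.sub(r"[^0-9.\-]", "", cleaned)
    value = Decimal(digits)
    return -value if negative else value
```

Raising an error on an unrecognised date, instead of guessing, means odd values surface during the job and not later in someone's spreadsheet.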

Where generic tools fall short

AI assistants can look up individual pieces of information, but they aren't reliable for gathering data at scale. They may not have live access to current records, they aren't built to query APIs or scrape websites systematically, and their output tends to vary from one item in a batch to the next.

Spreadsheet-based lookup tools (INDEX-MATCH against a downloaded register, Power Query connections to a specific data source) work for standard cases but require the data to be in a specific place and format, and need maintaining when the source changes. They also typically cover one source at a time.

For one-off or high-volume jobs that combine multiple sources, bespoke data gathering is more reliable and produces cleaner output.

A note on what’s in scope

Not all data gathering is the same. Publicly available information — Companies House, land registry, published financial data, open government datasets — is straightforward. Data behind login walls or paywalls is a different matter and depends on the terms of the source.

If you’re not sure whether the data you need is accessible, that’s a good starting point for a conversation.

Getting started

A data gathering job usually starts with a list of entities and a description of the data you need for each of them. From there, scoping is typically short: it's a question of which sources are relevant and what the output needs to look like.

Get in touch with a description of what you need — a rough indication of the volume and the data points involved is enough to start.

Try it in Excel

The tech+bash Add-in works in Excel Desktop (Windows) and Excel Online. Install takes under two minutes.

