
Composing one defensible estimate from many sources

The Touchstone method: a source credibility prior, a resolution rule, liveness and recency adjustment, de-duplication, and weighted Monte-Carlo composition.

Abstract

Market sizing in disciplined entrepreneurship asks a practitioner to commit to a single number, for example the total count of end users in a defined market, while the evidence for that number is scattered across sources of very uneven quality. We describe the method Cambridge Cyber International uses to compose one defensible estimate from several heterogeneous inputs, each retrieved with the help of a language model and each expressed as a three-point estimate with citations. The method attaches to every input a weight that is the product of five factors: the credibility of the publishing source, the model's stated confidence, the precision of the estimate, a liveness check on the citation, and a recency decay. Weights are normalised across the sources an operator selects, and the inputs are combined by a weighted Monte-Carlo mixture that reports a central figure together with a measure of agreement. This paper concentrates on the first factor, source credibility. We set out a nine-class taxonomy with editorial priors, a deterministic algorithm that maps any cited address to a prior with a fallback for unknown sources, and the governance that keeps the priors honest. We are explicit that the priors are editorial starting values subject to calibration, not measured rates of accuracy.

1. Problem

A practitioner following the disciplined entrepreneurship method (Aulet 2013; Aulet and Snyder 2017) must produce concrete figures early, before primary research is affordable, and must defend them. A language model can retrieve candidate figures quickly, but it returns them with three defects that make naive averaging unsafe. First, the sources differ enormously in trustworthiness, from a national statistics office to a press-release-driven market-research reseller, yet a plain mean treats them alike. Second, several of the returned citations often trace back to a single underlying report, so a plain mean silently double counts. Third, a model may state a figure with unwarranted confidence, or cite a page that no longer exists. The task is to combine such inputs into one figure that a reasonable reader would accept, while exposing why the figure came out as it did.

2. Background

We treat each candidate input as a three-point estimate (a low, a most likely, and a high value), the form long used in programme evaluation and review (Malcolm, Roseboom, Clark and Fazar 1959) and in quantitative risk analysis, where the triangular and PERT distributions are standard tools for turning three points into a distribution (Vose 2008). We propagate uncertainty by simulation rather than by closed-form algebra, following the Monte-Carlo method (Metropolis and Ulam 1949). We retrieve evidence with grounded, citation-bearing generation rather than free generation, in the spirit of retrieval-augmented approaches that tie model output to named documents (Lewis et al. 2020). Combining several distributions into one is itself a studied problem, and the risk-analysis literature warns that naive pooling ignores both source quality and dependence between sources (Clemen and Winkler 1999); our weighting and de-duplication steps are a direct response to that warning.
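For reference, the two standard constructions give the following moments for a three-point estimate with low a, most likely m, and high b (symbols introduced here for illustration). The factor of 4 on the mode is the conventional PERT choice; the text does not state which construction the method uses.

```latex
\text{Triangular: } \mu = \frac{a + m + b}{3}, \qquad
\sigma^2 = \frac{a^2 + m^2 + b^2 - am - ab - mb}{18}

\text{PERT: } X = a + (b - a)\,Y,\quad Y \sim \mathrm{Beta}(\alpha, \beta),\quad
\alpha = 1 + 4\,\frac{m - a}{b - a},\quad
\beta = 1 + 4\,\frac{b - m}{b - a},\quad
\mu = \frac{a + 4m + b}{6}
```

For a symmetric estimate the triangular variance is (b - a)^2 / 24 and the PERT variance (b - a)^2 / 28, so PERT concentrates more mass near the most likely value.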

3. The weighting model

Each selected source i receives a weight that is the product of five bounded factors:

weight_i = credibility_i × confidence_i × precision_i × liveness_i × recencyDecay_i

The weights are then normalised across the operator-selected set so that they sum to one. The credibility factor is a prior on the publisher, described in sections 4 to 6. The confidence factor reflects the model's stated certainty about the specific claim. The precision factor rewards a tighter three-point spread, in the spirit of inverse-variance weighting, where more precise estimates carry more influence (Borenstein, Hedges, Higgins and Rothstein 2009). The liveness factor is near zero when the cited page cannot be reached, and the recency factor decays as the citation ages. Because the factors multiply, any one of them approaching zero pulls the whole weight toward zero, which is the behaviour we want: a current, precise, confidently stated figure on a dead link should not dominate.
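A minimal Python sketch of the product-and-normalise step follows. The field names, the assumption that every factor lies in [0, 1], and the hypothetical `precision_factor` formula (one over one plus the relative spread) are illustrative choices, not the firm's implementation; the text says only that precision rewards a tighter spread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceScore:
    """The five factors for one selected source, each assumed to lie in [0, 1]."""
    name: str
    credibility: float  # publisher prior from the registry
    confidence: float   # the model's stated certainty about this claim
    precision: float    # reward for a tight low / likely / high spread
    liveness: float     # near zero when the cited page cannot be reached
    recency: float      # decays as the cited material ages

    def raw_weight(self) -> float:
        # Multiplicative: any factor near zero pulls the whole weight toward zero.
        return (self.credibility * self.confidence * self.precision
                * self.liveness * self.recency)


def precision_factor(low: float, likely: float, high: float) -> float:
    """Hypothetical precision factor: 1 / (1 + relative spread), bounded in (0, 1]."""
    if not (low <= likely <= high) or likely <= 0:
        raise ValueError("expected low <= likely <= high and likely > 0")
    return 1.0 / (1.0 + (high - low) / likely)


def normalise(scores: list[SourceScore]) -> dict[str, float]:
    """Normalise raw weights across the operator-selected set so they sum to one."""
    raw = {s.name: s.raw_weight() for s in scores}
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("every selected source has zero weight")
    return {name: w / total for name, w in raw.items()}


# Usage with made-up inputs: a live official figure, and a tighter, more
# confident figure whose citation no longer resolves.
weights = normalise([
    SourceScore("stats-office", 0.9, 0.8, precision_factor(90, 100, 110), 1.0, 0.95),
    SourceScore("dead-link", 0.9, 0.9, precision_factor(95, 100, 105), 0.01, 1.0),
])
```

With these made-up inputs the dead-link source keeps only about 1% of the normalised weight, despite its higher confidence and tighter spread.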

4. Why credibility is a prior, not an accuracy

The credibility factor is a prior about the publisher, set before the specific claim is examined. It expresses how much benefit of the doubt a class of source has earned, in the way a careful reader trusts a central bank release more than an anonymous blog before reading either. It is not a measured probability that a given figure is correct. We hold this distinction firmly because the alternative, presenting a coefficient as if it were a validated accuracy rate, would overstate what the registry knows and would invite misuse. The published registry and this paper therefore describe the values as editorial priors subject to calibration. The calibration roadmap in section 12 explains how the priors are intended to move over time as evidence accumulates about which sources predicted good estimates.

5. Source credibility taxonomy and coefficient assignment

We sort sources into nine classes, ordered by descending starting prior: (1) peer-reviewed academia; (2) official statistics, intergovernmental organisations, central banks and regulated filings, treated as a single class; (3) government open data; (4) independent analysts; (5) transparent statistics vendors; (6) trade bodies, chambers of commerce and standards organisations; (7) quality business press; (8) market-research resellers; and (9) a residual class for open-source intelligence and any source the registry does not recognise. Each class has an explicit starting coefficient, expressed as a percentage, and a resolution signal: the observable feature of an address that places a source in the class. The coefficients are held on the backend as tunable, versioned values; the published registry exposes the class and a coarse tier label but withholds the number, for the reasons given in section 11.
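The taxonomy and its versioned coefficient table can be sketched as below. The enum names are our own shorthand for the nine classes, and the values (100 minus 10 times the rank) are placeholders chosen only to exercise the checks; they are not the registry's coefficients, which are withheld. Reading "ordered by descending starting prior" as a strict ordering is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum


class SourceClass(IntEnum):
    """The nine classes, ordered by descending starting prior (1 = highest)."""
    PEER_REVIEWED_ACADEMIA = 1
    OFFICIAL_STATS_IGO_CENTRAL_BANK_FILINGS = 2
    GOVERNMENT_OPEN_DATA = 3
    INDEPENDENT_ANALYST = 4
    TRANSPARENT_STATISTICS_VENDOR = 5
    TRADE_BODY_CHAMBER_STANDARDS = 6
    QUALITY_BUSINESS_PRESS = 7
    MARKET_RESEARCH_RESELLER = 8
    OSINT_OR_UNKNOWN = 9


@dataclass(frozen=True)
class CoefficientTable:
    """A versioned set of starting priors held on the backend (hypothetical shape)."""
    version: str
    percent: dict[SourceClass, int]

    def validate(self) -> None:
        # Every class needs a value, values must follow the taxonomy order,
        # and the residual class must stay admissible (non-zero).
        missing = set(SourceClass) - set(self.percent)
        if missing:
            raise ValueError(f"no coefficient for {sorted(missing)}")
        ordered = [self.percent[c] for c in sorted(SourceClass)]
        if any(a <= b for a, b in zip(ordered, ordered[1:])):
            raise ValueError("coefficients must strictly decrease down the taxonomy")
        if self.percent[SourceClass.OSINT_OR_UNKNOWN] <= 0:
            raise ValueError("the fallback class needs a non-zero prior")

    def prior(self, cls: SourceClass) -> float:
        return self.percent[cls] / 100.0


# PLACEHOLDER values only, NOT the registry's withheld coefficients.
PLACEHOLDER = CoefficientTable(
    "placeholder-0", {c: 100 - 10 * c.value for c in SourceClass})
PLACEHOLDER.validate()
```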

Three rules override the raw class coefficient and matter more than the exact percentage. Outlet credibility is not claim credibility, so the prior is only a starting point that liveness and recency then adjust. Co-citation is de-duplicated, so several outlets that trace to one report count once. Primary sources outrank secondary ones at equal class, so an original release outranks a re-report of it. A sector layer raises the prior for sources inside the cyber and regulatory-technology focus of the firm, such as the European Union Agency for Cybersecurity, the United States National Institute of Standards and Technology, and national cyber agencies. A country layer seeds, for each country, the statistics institute, the open-data portal, the central bank or financial regulator, the principal chambers of commerce, and the business registry.
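The sector and country layers might be sketched as follows. The two sector hosts are the public domains of ENISA and NIST, listed only as an illustration; the multiplicative 1.2 uplift, the cap at 1.0, the role names, and the rule that every country must have all five roles seeded are hypothetical.

```python
SECTOR_HOSTS = frozenset({"enisa.europa.eu", "nist.gov"})  # illustrative subset only

COUNTRY_ROLES = (
    "statistics_institute",
    "open_data_portal",
    "central_bank_or_regulator",
    "chamber_of_commerce",
    "business_registry",
)


def in_sector(host: str, sector_hosts: frozenset[str]) -> bool:
    """True for a listed sector host or any of its subdomains."""
    host = host.lower().rstrip(".")
    return any(host == h or host.endswith("." + h) for h in sector_hosts)


def apply_sector_layer(prior: float, host: str,
                       sector_hosts: frozenset[str] = SECTOR_HOSTS,
                       uplift: float = 1.2) -> float:
    """Raise the prior for in-sector sources; the uplift and the cap are hypothetical."""
    if not in_sector(host, sector_hosts):
        return prior
    return min(1.0, prior * uplift)


def seed_country(country: str,
                 hosts_by_role: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """Return (host, role, scope) seed rows, insisting that every role is covered."""
    missing = [role for role in COUNTRY_ROLES if not hosts_by_role.get(role)]
    if missing:
        raise ValueError(f"{country}: no seed for {', '.join(missing)}")
    return [(host, role, f"country:{country}")
            for role in COUNTRY_ROLES
            for host in hosts_by_role[role]]
```

Matching on a dot boundary means a subdomain such as a NIST sub-site inherits the uplift, while a look-alike host that merely ends in the same letters does not.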

6. Resolution algorithm

Any address a model cites is resolved to a prior after the call, never by injecting the registry into the prompt. Resolution proceeds in four ordered steps and returns the first match. It first seeks an exact match of the host, or of the host and path together when a source is only credible on a specific path. It then seeks a parent-domain match, so that a subdomain of a recognised source inherits its class. It then applies a class rule keyed on the domain suffix, whether a top-level domain or a reserved second-level suffix, so that any academic or government domain resolves to the right class even when the specific institution is not individually listed. If none of these match, it returns the open-source-intelligence fallback at the lowest non-zero prior, flagged as unknown provenance, so that an unrecognised source stays admissible at low weight rather than being discarded. The resolver returns the identifier, name, class, coefficient, tier, scope, and a record of which step matched, the last of these kept for audit. A promotion path exists for unknown sources: a source that repeatedly appears in citations that predicted good estimates becomes a candidate for curation into a named class.
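A sketch of the four-step resolver, using only the standard library. The registry shapes (an `exact` map keyed by host or by "host/path", and a `suffix_rules` map keyed by domain suffix), the step labels, and every coefficient shown are hypothetical; example hosts use the reserved `.example` domain.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class RegistryEntry:
    identifier: str
    name: str
    source_class: str
    coefficient: float
    tier: str
    scope: str


@dataclass(frozen=True)
class Resolution:
    entry: RegistryEntry
    matched_step: str         # kept for audit
    unknown_provenance: bool


FALLBACK = RegistryEntry("osint-unknown", "Unrecognised source",
                         "osint_or_unknown", 0.10, "low", "global")


def _path_match(key: str, host: str, path: str) -> bool:
    """Does a 'host/path' key cover this host and path on a segment boundary?"""
    key_host, _, key_path = key.partition("/")
    key_path = "/" + key_path.rstrip("/")
    return key_host == host and (path == key_path or path.startswith(key_path + "/"))


def resolve(url: str,
            exact: dict[str, RegistryEntry],
            suffix_rules: dict[str, RegistryEntry],
            fallback: RegistryEntry = FALLBACK) -> Resolution:
    parts = urlsplit(url)
    host = (parts.hostname or "").rstrip(".")  # .hostname is already lower-cased
    path = parts.path or "/"
    labels = host.split(".")

    # Step 1: exact match, most specific 'host/path' key first, then the bare host.
    for key in sorted((k for k in exact if "/" in k), key=len, reverse=True):
        if _path_match(key, host, path):
            return Resolution(exact[key], "exact_path", False)
    if host in exact:
        return Resolution(exact[host], "exact_host", False)

    # Step 2: parent domains, nearest first, stopping before the bare last label.
    for i in range(1, len(labels) - 1):
        parent = ".".join(labels[i:])
        if parent in exact:
            return Resolution(exact[parent], "parent_domain", False)

    # Step 3: class rule keyed on the domain suffix, longest suffix first.
    for i in range(1, len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in suffix_rules:
            return Resolution(suffix_rules[suffix], "suffix_rule", False)

    # Step 4: admissible fallback at the lowest non-zero prior, flagged unknown.
    return Resolution(fallback, "fallback", True)


# Usage with made-up entries and coefficients.
stats = RegistryEntry("stats-xx", "Example statistics office",
                      "official_statistics", 0.85, "high", "country:XX")
academic = RegistryEntry("rule-academic", "Academic domain rule",
                         "peer_reviewed_academia", 0.90, "high", "global")
exact = {"stats.example": stats}
rules = {"edu": academic, "ac.uk": academic}
print(resolve("https://data.stats.example/t1", exact, rules).matched_step)  # parent_domain
print(resolve("https://www.cam.ac.uk/research", exact, rules).matched_step)  # suffix_rule
print(resolve("https://blog.example/post", exact, rules).matched_step)  # fallback
```

Splitting on dots is a simplification; a production resolver would more likely consult the Public Suffix List so that suffixes such as co.uk are never mistaken for a registrable parent domain.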

7. Liveness verification and recency decay

A high prior on a dead citation is worthless, so the liveness factor reduces the weight sharply when the cited page does not resolve or returns a not-found status. Liveness is checked at scoring time rather than trusted from the registry, because pages move. The recency factor decays the weight as the cited material ages, on the reasoning that a market figure from several years ago is weaker evidence for a current market than a recent one, all else equal. The decay is gentle for slow-moving structural statistics and steeper for fast-moving technology figures; the rate is a tunable parameter rather than a fixed constant.
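Liveness and recency can be sketched as a network probe plus two small scoring functions. The probe sends a HEAD request with `urllib.request`; the mapping from status to factor (0.01 for dead links, 0.5 for inconclusive errors) and the exponential decay with hypothetical half-lives of 10 and 2 years stand in for the tunable parameters the text describes.

```python
import urllib.error
import urllib.request
from datetime import date
from typing import Optional

# Hypothetical, tunable half-lives (in years) per kind of figure.
HALF_LIFE_YEARS = {"structural": 10.0, "technology": 2.0}


def fetch_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """HTTP status seen at scoring time, or None if the host cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # the server answered with an error status
        return err.code
    except OSError:                         # DNS failure, refused connection, timeout
        return None


def liveness_factor(status: Optional[int], floor: float = 0.01) -> float:
    """Map a status to a factor; the floor and the 0.5 middle value are hypothetical."""
    if status is None or status in (404, 410):
        return floor           # dead citation: near zero, not exactly zero
    if 200 <= status < 400:
        return 1.0
    return 0.5                 # other errors (for example 403 or 503): inconclusive


def recency_factor(published: date, as_of: date, half_life_years: float) -> float:
    """Exponential decay: the factor halves every `half_life_years` years."""
    age_years = max(0.0, (as_of - published).days / 365.25)
    return 0.5 ** (age_years / half_life_years)
```

Some servers refuse HEAD with a 405, which this sketch scores as inconclusive; a real check might retry with GET before deciding.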

8. De-duplication of co-citation and the primary-over-secondary rule

Before any weighting, citations are grouped so that those tracing to one underlying source collapse into a single contributor. This prevents a popular report, re-reported by many outlets, from acquiring artificial weight through repetition. Where the registry knows that one source re-reports another, the original is preferred and the re-report is suppressed or down-weighted. Independent unknown sources are deliberately kept separate, so that genuine corroboration from distinct origins is rewarded while echo is not. This step is the practical form of the warning that pooling without attention to dependence overstates confidence (Clemen and Winkler 1999).
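De-duplication can be sketched as grouping followed by one representative per group. The `rereports` map (secondary source to original), the optional `traces_to` report identifier, and the choice to suppress rather than down-weight re-reports are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    url: str
    source_id: str                   # registry identifier returned by the resolver
    known: bool                      # False when the resolver fell back to "unknown"
    traces_to: Optional[str] = None  # hypothetical: id of a detected underlying report


def group_key(c: Citation, rereports: dict[str, str]) -> str:
    """Key under which citations of the same underlying source collapse."""
    if c.traces_to:
        return c.traces_to
    if c.source_id in rereports:     # known re-reporter: group under its original
        return rereports[c.source_id]
    if not c.known:                  # independent unknowns are never merged
        return "unknown:" + c.url
    return c.source_id


def deduplicate(citations: list[Citation], rereports: dict[str, str]) -> list[Citation]:
    """One contributor per group, preferring a primary source over re-reports."""
    groups: dict[str, list[Citation]] = {}
    for c in citations:
        groups.setdefault(group_key(c, rereports), []).append(c)
    contributors = []
    for members in groups.values():
        primaries = [c for c in members if c.source_id not in rereports]
        contributors.append(primaries[0] if primaries else members[0])
    return contributors
```

Two press outlets echoing one analyst report collapse into the analyst's citation, while two unrelated unknown blogs remain two contributors.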

9. Composition by weighted Monte-Carlo mixture

With weights fixed and normalised, we draw samples from each source's three-point distribution in proportion to its weight, forming a mixture, and we summarise the mixture by a central figure and a dispersion measure (Metropolis and Ulam 1949; Vose 2008). We report the dispersion alongside the central figure rather than hiding it, because the spread tells the reader how much the sources agree. A tight mixture signals consensus; a wide one signals that the estimate rests on contested ground and should be treated with care. This honesty about agreement is, in our view, as important as the central number itself.
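The composition step is sketched below with Python's `random` and `statistics` modules. Using a triangular distribution per source, the median as the central figure, and the P10 to P90 range divided by the median as the agreement measure are our choices for illustration; the text does not fix any of them.

```python
import random
import statistics
from typing import Optional


def mixture_summary(estimates: dict[str, tuple[float, float, float]],
                    weights: dict[str, float],
                    n_samples: int = 100_000,
                    seed: Optional[int] = None) -> dict[str, float]:
    """Sample a weighted mixture of triangular distributions and summarise it.

    estimates maps a source to (low, most_likely, high); weights holds the
    normalised weights for the same sources.
    """
    if set(estimates) != set(weights):
        raise ValueError("estimates and weights must cover the same sources")
    rng = random.Random(seed)
    names = list(estimates)
    # Pick a source for each draw in proportion to its weight ...
    picks = rng.choices(names, weights=[weights[s] for s in names], k=n_samples)
    # ... then draw from that source's three-point (triangular) distribution.
    samples = []
    for name in picks:
        low, likely, high = estimates[name]
        samples.append(rng.triangular(low, high, likely))
    deciles = statistics.quantiles(samples, n=10)  # 9 cut points: P10 ... P90
    p10, median, p90 = deciles[0], deciles[4], deciles[8]
    return {
        "median": median,
        "p10": p10,
        "p90": p90,
        # Small relative spread: the sources agree. Large: contested ground.
        "relative_spread": (p90 - p10) / median,
    }
```

Two sources that agree give a small relative spread; two that disagree by an order of magnitude give a spread several times larger, which is the signal the text asks to report alongside the central figure.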

10. Governance

The priors are tunable without a rebuild, so curation can respond to new evidence quickly. An operator may override a resolved prior for a specific analysis, and the override is recorded. Every resolved citation is stored with its matched step and resolved value, so that any composed estimate can be reconstructed and audited later. The taxonomy is reviewed on a fixed cadence, with a default review interval per record, and curation ownership is named so that staleness has an owner. Bias mitigation is addressed by keeping the class signals observable and the rules public, so that a reader can see why a source scored as it did rather than trusting an opaque score.
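The audit and review rules can be sketched as two records. The 180-day default interval and all field names are hypothetical; what the sketch shows is that an override is stored next to, not instead of, the resolved value and its matched step.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class RegistryRecord:
    identifier: str
    owner: str                       # named curator, so staleness has an owner
    last_reviewed: date
    review_interval_days: int = 180  # hypothetical default interval

    def is_stale(self, today: date) -> bool:
        return today - self.last_reviewed > timedelta(days=self.review_interval_days)


@dataclass(frozen=True)
class AuditedCitation:
    url: str
    identifier: str
    matched_step: str                # which resolution step fired
    resolved_value: float            # prior as resolved, never overwritten
    override_value: Optional[float] = None
    override_by: Optional[str] = None
    override_reason: Optional[str] = None

    @property
    def effective_value(self) -> float:
        return self.resolved_value if self.override_value is None else self.override_value


def override(c: AuditedCitation, value: float, operator: str, reason: str) -> AuditedCitation:
    """Return a copy carrying the override, keeping the resolved value for audit."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("override must lie in [0, 1]")
    return replace(c, override_value=value, override_by=operator, override_reason=reason)
```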

11. Openness and intellectual-property posture

The registry is published at a public address. A published list of domains is public by design and cannot be a trade secret, so we do not pretend otherwise. We publish the domains and their classes openly and we publish this method in full, because doing so is good thought leadership and invites scrutiny that improves the registry. We withhold the numeric coefficients and their calibration, which remain the maintained, versioned product surface. The defensible value therefore sits not in the list of domains but in the curation methodology, the calibration of the priors, the cadence and freshness of maintenance, the depth of country and sector coverage, and the feedback loop that learns which sources predicted good estimates across many analyses. As a European entity we also rely on the sui generis right that the database directive grants the maker of a database against extraction or re-utilisation of the whole or a substantial part of its contents, a right that applies even when the database is published, provided the maker can show a substantial investment in obtaining, verifying, or presenting those contents (European Parliament and Council 1996). We therefore keep an investment log as evidence of that investment and pair the published page with clear terms of use. Where the published page shows a tier rather than a number, the tier still lets a reader rank sources without revealing the weighting.
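Publishing a tier instead of a number can be sketched as a view that maps each coefficient to a coarse label and never emits the coefficient itself. The 0.7 and 0.4 cut points and the three tier names are hypothetical.

```python
def tier_for(coefficient: float, cut_points: tuple[float, float] = (0.7, 0.4)) -> str:
    """Coarse tier label; the cut points are hypothetical."""
    high, medium = cut_points
    if coefficient >= high:
        return "high"
    if coefficient >= medium:
        return "medium"
    return "low"


def public_view(registry: dict[str, tuple[str, float]]) -> list[dict[str, str]]:
    """Publish domain, class and tier; the numeric coefficient stays on the backend."""
    return [
        {"domain": domain, "class": source_class, "tier": tier_for(coefficient)}
        for domain, (source_class, coefficient) in sorted(registry.items())
    ]
```

Because tiers are coarse, sources in the same tier appear tied in the public view, so the ranking it allows is partial by design.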

12. Limitations and the calibration roadmap

The priors are editorial, so they encode judgement and can be wrong. The nine classes also compress real heterogeneity: a strong domain can host a weak page, and the class alone will not catch it, which is why liveness, recency, and the primary-over-secondary rule sit downstream of the prior. The decay rates and the confidence and precision factors are parameters we have set by reasoning rather than by fitting. The calibration roadmap closes these gaps through a feedback loop: as composed estimates are later checked against outcomes or against better evidence, the record of which sources contributed to accurate estimates becomes data with which to move the priors from editorial values toward calibrated ones, and to promote good unknown sources into named classes. Until that loop has accumulated enough evidence, the priors should be read as defensible starting values, not as settled measurements.
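One way the feedback loop could move a prior, offered as a swapped-in technique rather than the firm's method, is a Beta-Binomial update: treat the editorial value as the mean of a Beta distribution worth a fixed number of pseudo-observations, then add each later judgement of whether a source contributed to an accurate estimate. This reads the prior as a rate, which section 4 warns the editorial value is not, so the rule only becomes meaningful once outcome data exist. The strength of 20 and the promotion thresholds (5 hits at a 70% rate) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CalibrationState:
    """Hypothetical per-source calibration record."""
    editorial: float        # editorial starting prior, in (0, 1)
    strength: float = 20.0  # pseudo-observations the editorial value is worth
    hits: int = 0           # contributed to estimates later judged accurate
    misses: int = 0         # contributed to estimates later judged inaccurate

    def record(self, accurate: bool) -> None:
        if accurate:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def value(self) -> float:
        # Posterior mean of Beta(a0 + hits, b0 + misses), with a0 + b0 = strength.
        a0 = self.editorial * self.strength
        b0 = (1.0 - self.editorial) * self.strength
        return (a0 + self.hits) / (a0 + b0 + self.hits + self.misses)


def promotion_candidates(unknown: dict[str, CalibrationState],
                         min_hits: int = 5, min_rate: float = 0.7) -> list[str]:
    """Unknown sources that repeatedly contributed to accurate estimates."""
    out = []
    for host, state in unknown.items():
        total = state.hits + state.misses
        if state.hits >= min_hits and state.hits / total >= min_rate:
            out.append(host)
    return sorted(out)
```

The strength parameter controls how quickly evidence overrides editorial judgement: with 20 pseudo-observations, ten consistent outcomes move an even prior of 0.5 only to about 0.67.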

13. Conclusion

A single market figure is only as defensible as the account of how it was reached. The Touchstone method makes that account explicit: it states what it trusts and why, it refuses to let a dead or stale citation dominate, it counts corroboration once, and it reports agreement alongside the central number. The source credibility registry is the visible part of this account, and the method described here is the reasoning behind it.

Acronyms

Acronym Expansion
ADR Architecture Decision Record
CCI Cambridge Cyber International
CISA Cybersecurity and Infrastructure Security Agency
DOI Digital Object Identifier
ENISA European Union Agency for Cybersecurity
IGO Intergovernmental Organisation
NIST National Institute of Standards and Technology
OSINT Open-Source Intelligence
PERT Program Evaluation and Review Technique
URL Uniform Resource Locator

References

Aulet, B. (2013). Disciplined Entrepreneurship: 24 Steps to a Successful Startup. Hoboken: Wiley. https://www.wiley.com/en-us/9781118692288

Aulet, B., and Snyder, B. (2017). Disciplined Entrepreneurship Workbook. Hoboken: Wiley. https://www.wiley.com/en-us/9781119365792

Borenstein, M., Hedges, L. V., Higgins, J. P. T., and Rothstein, H. R. (2009). Introduction to Meta-Analysis. Chichester: Wiley. https://doi.org/10.1002/9780470743386

Clemen, R. T., and Winkler, R. L. (1999). Combining probability distributions from experts in risk analysis. Risk Analysis, 19(2), 187 to 203. https://doi.org/10.1111/j.1539-6924.1999.tb00399.x

European Parliament and Council (1996). Directive 96/9/EC of 11 March 1996 on the legal protection of databases. https://eur-lex.europa.eu/eli/dir/1996/9/oj

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., and Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33. https://arxiv.org/abs/2005.11401

Malcolm, D. G., Roseboom, J. H., Clark, C. E., and Fazar, W. (1959). Application of a technique for research and development program evaluation. Operations Research, 7(5), 646 to 669. https://doi.org/10.1287/opre.7.5.646

Metropolis, N., and Ulam, S. (1949). The Monte Carlo method. Journal of the American Statistical Association, 44(247), 335 to 341. https://doi.org/10.2307/2280232

Vose, D. (2008). Risk Analysis: A Quantitative Guide (3rd ed.). Chichester: Wiley. https://www.wiley.com/en-us/9780470512845