How We Validate Property Data Before It Reaches You

Every property screening result on this platform goes through a 5-pass validation process before it reaches you. Across 7 pipelines and tens of thousands of lines of code, this process has found and fixed 40 bugs — including false data source attributions, database connection leaks, and language that overstated what the data could support.

This page explains what those 5 passes are, why they matter for property data specifically, and what we found when we applied them.

Why property data needs more than unit tests

Property screening tools sit in a specific legal position under Australian law. Section 18 of the Australian Consumer Law prohibits conduct in trade or commerce that is misleading or deceptive, or likely to mislead or deceive. Under the principle established in Shaddock & Associates Pty Ltd v Parramatta City Council (1981), a party that provides information knowing it will be relied upon owes a duty of care in how that information is presented.

In practice, a liability chain forms when a property screening report says a lot “qualifies” for a secondary dwelling, the buyer relies on that statement to purchase, and the data later proves wrong or the language overstated what the data supported. Unit tests catch code bugs. They do not catch false data attributions, misleading language, or silent failures where wrong data is served without any error.

The 5-pass framework addresses all of these.

The 5-pass validation framework

Data source verification

Are we querying the right source, with the right parameters?

Every external data source — government APIs, satellite imagery providers, spatial databases — is traced from endpoint URL to the value that appears in your report. If a source is listed in your report, it was actually queried. If it was not queried, it does not appear.

Algorithm correctness

Does the logic produce the right answer for known inputs?

Scoring algorithms, signal computations, and derived fields are verified against expected outputs. Test suites cover normal cases, boundary conditions, and adversarial inputs. Across 7 pipelines, we maintain over 200 automated tests.

Null and edge case hardening

What happens when data is missing, malformed, or unexpected?

Government APIs routinely return null fields, empty arrays, and unexpected formats. Every field that comes from an external source is null-checked before use. Missing data produces a clear "unavailable" indicator, never a false positive or silent failure.
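
A minimal sketch of this kind of null check, assuming a GeoJSON-like response shape. The `UNAVAILABLE` sentinel, the `extract_field` helper, and the response layout are illustrative assumptions, not the platform's actual code.

```python
from typing import Any

# Hypothetical sentinel; the platform's actual "unavailable" indicator may differ.
UNAVAILABLE = "unavailable"


def extract_field(response: Any, field: str) -> Any:
    """Return the named property of the first feature, or UNAVAILABLE.

    Assumes a GeoJSON-like shape: {"features": [{"properties": {...}}]}.
    Any missing, null, or malformed level yields UNAVAILABLE rather than
    a default value that could be mistaken for a real result.
    """
    if not isinstance(response, dict):
        return UNAVAILABLE
    features = response.get("features")
    if not isinstance(features, list) or not features:
        return UNAVAILABLE
    first = features[0]
    if not isinstance(first, dict):
        return UNAVAILABLE
    properties = first.get("properties")
    if not isinstance(properties, dict):
        return UNAVAILABLE
    value = properties.get(field)
    if value is None or value == "":
        return UNAVAILABLE
    return value  # legitimate falsy values such as 0 are passed through
```

The design choice worth noting is that a missing value never collapses into `0`, `False`, or an empty list, since any of those could read as a genuine "nothing found" result.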

Connection and resource safety

Are database connections and API handles properly released?

A leaked database connection under load can take down an entire service. Every connection follows a strict open-try-finally-close pattern. External API calls have timeouts. Long-running queries have statement-level time limits.
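
The open-try-finally-close pattern can be sketched as follows. This uses Python's built-in `sqlite3` module as a stand-in for the real database, which is an assumption made so the example runs anywhere; `fetch_rows` is a hypothetical helper name.

```python
import sqlite3
from contextlib import closing


def fetch_rows(db_path: str, sql: str, params: tuple = ()) -> list:
    """Run a query and always release the connection.

    sqlite3 stands in for the real database here (an assumption).
    `timeout` bounds how long the connection waits on a locked database.
    """
    conn = sqlite3.connect(db_path, timeout=5.0)  # open
    try:
        with closing(conn.cursor()) as cursor:
            cursor.execute(sql, params)
            return cursor.fetchall()
    finally:
        conn.close()  # runs on success and on error
```

The same shape applies to any driver. Statement-level limits are database-specific (PostgreSQL, for example, exposes a `statement_timeout` setting) and are not shown here.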

Output defensibility

Does the language in the report stay within what the data supports?

Every word of user-facing text is audited against a substitution ruleset. "Qualifies" becomes "meets criteria based on data sources checked". "Risk assessment" becomes "screening". No statement crosses the line from factual information into advice, recommendation, or assurance.

What we found

When we applied this framework systematically across all 7 property screening pipelines, we found 40 bugs. None of them would have been caught by standard unit tests alone.

Recurring patterns found across pipelines

False data source attribution

PDF reports listing data sources the pipeline did not actually query. Found and fixed across 5 of 7 pipelines.

Fix: Every source in the report now maps 1:1 to a verified API call with a recorded response hash.
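
One way to enforce that mapping is to compare the sources a report lists against the calls actually recorded for it. The sketch below assumes a simplified record shape; `SourceCall` and `unverified_sources` are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceCall:
    """One recorded API call (hypothetical, simplified audit-record shape)."""
    source_name: str
    response_sha256: str
    succeeded: bool


def unverified_sources(listed: list[str], calls: list[SourceCall]) -> set[str]:
    """Return sources listed in a report that have no successful, hashed call."""
    verified = {
        call.source_name
        for call in calls
        if call.succeeded and call.response_sha256
    }
    return set(listed) - verified
```

A report would only be released when `unverified_sources` returns an empty set; anything else means the report names a source that cannot be tied to a recorded response.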

Connection leaks

Database connections closed inside the success path but not on the error path — meaning an API failure would leak a connection.

Fix: All connections now follow an open-try-finally-close pattern. Connections are released regardless of whether the query succeeds or fails.

Liability-creating language

Words like "qualifies", "passes all checks", "risk assessment", and "will require" that imply a formal professional determination.

Fix: Systematic replacement with factual alternatives: "meets criteria", "meets", "screening", "may require". Automated grep patterns enforce this in code review.
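
A check of this kind can be sketched with a few regular expressions. The ruleset below contains only the four phrases named above, paired with the alternatives in the order they are listed (the pairing is an assumption); the real ruleset is not reproduced here, and `find_flagged_language` is a hypothetical name.

```python
import re

# Illustrative ruleset only: phrases and alternatives taken from the text above,
# paired in the order listed (an assumption).
SUBSTITUTIONS = {
    r"\bqualifies\b": "meets criteria",
    r"\bpasses all checks\b": "meets",
    r"\brisk assessment\b": "screening",
    r"\bwill require\b": "may require",
}


def find_flagged_language(text: str) -> list[tuple[str, str]]:
    """Return (matched phrase, suggested alternative) pairs found in text."""
    hits = []
    for pattern, suggestion in SUBSTITUTIONS.items():
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.group(0), suggestion))
    return hits
```

The function flags rather than rewrites, so a reviewer still decides whether the suggested wording fits the sentence.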

The single most common issue was false data source attribution — PDF reports claiming data came from a source that the pipeline never actually queried. This is exactly the kind of issue that creates legal exposure: a user sees “source: NSW Building Footprints” in their report, assumes the data came from that dataset, and makes a decision based on that assumption. In reality, the data came from a different source entirely.

What every report records

Every report generated on this platform creates an append-only audit trail record. Here is exactly what is captured:

✓

Every data source queried

Recorded: Source name, endpoint URL, query parameters

Proves: Which government APIs and datasets were actually consulted for this specific address

✓

Query timestamp

Recorded: UTC timestamp for each data source query

Proves: Exactly when each source was checked — not a cached result from weeks ago

✓

Response time

Recorded: Milliseconds from request to response for each source

Proves: The query actually executed (not a timeout or silent failure)

✓

Response hash

Recorded: SHA-256 cryptographic hash of each API response body

Proves: Whether the stored raw data still matches what was received, so any later alteration is detectable (tamper-evident)

✓

Features returned

Recorded: Count of records/features returned by each source

Proves: Whether the source returned data or came back empty for this location

✓

Error tracking

Recorded: Error message and stack trace if a source query failed

Proves: Failures are recorded, not silently swallowed — you see what was not available

✓

Pipeline version

Recorded: Git commit SHA of the deployed code

Proves: Which exact version of the analysis logic produced the result

✓

Disclaimer version

Recorded: Version ID of the disclaimer active at report generation time

Proves: What the user was told about limitations when they received the report

If a data source is later questioned — “did you actually check the flood overlay for this address?” — we can produce the exact response, when it was received, and what it contained.
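
As an illustration, a per-query audit record covering several of the fields above could be assembled like this. The field names, the `query_with_audit` helper, and the injected `fetch` callable are all assumptions; pipeline version, disclaimer version, and stack traces are omitted to keep the sketch short.

```python
import hashlib
import json
import time
from datetime import datetime, timezone
from typing import Callable


def query_with_audit(source_name: str, url: str, params: dict,
                     fetch: Callable[[str, dict], bytes]) -> dict:
    """Call `fetch` and return an audit record describing the call.

    `fetch` is a hypothetical callable that returns the raw response body.
    The features count assumes a JSON body with a top-level "features" list.
    """
    record = {
        "source_name": source_name,
        "endpoint_url": url,
        "query_params": params,
        "queried_at_utc": datetime.now(timezone.utc).isoformat(),
        "response_ms": None,
        "response_sha256": None,
        "features_returned": None,
        "error": None,
    }
    started = time.perf_counter()
    try:
        body = fetch(url, params)
    except Exception as exc:  # record the failure instead of swallowing it
        body = None
        record["error"] = f"{type(exc).__name__}: {exc}"
    record["response_ms"] = round((time.perf_counter() - started) * 1000, 1)

    if body is not None:
        # Hash the raw bytes exactly as received, before any parsing.
        record["response_sha256"] = hashlib.sha256(body).hexdigest()
        try:
            payload = json.loads(body)
        except ValueError:
            payload = None
        if isinstance(payload, dict) and isinstance(payload.get("features"), list):
            record["features_returned"] = len(payload["features"])
    return record
```

Passing `fetch` in as a parameter keeps the record-building logic independent of any particular HTTP client, which also makes it straightforward to test without network access.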

Versioned disclaimers

Each product has its own disclaimer text, stored in a versioned database table. Disclaimers are never deleted — when the text changes, a new version is inserted and the old version is marked as superseded. Every audit trail record captures which disclaimer version was active when the report was generated.

This matters because the legal question is not “what does the disclaimer say today?” — it is “what did the user see when they received their report?” Versioned disclaimers answer that question definitively.
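
The insert-and-supersede behaviour can be sketched with a small table. The schema, column names, and helper functions below are hypothetical, and `sqlite3` is used only so the example is self-contained.

```python
import sqlite3
from typing import Optional

# Hypothetical schema; the real table layout is not documented here.
SCHEMA = """
CREATE TABLE disclaimer_versions (
    version_id INTEGER PRIMARY KEY AUTOINCREMENT,
    product    TEXT NOT NULL,
    body       TEXT NOT NULL,
    superseded INTEGER NOT NULL DEFAULT 0
)
"""


def publish_disclaimer(conn: sqlite3.Connection, product: str, body: str) -> int:
    """Insert a new version and mark earlier ones superseded; never delete."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "UPDATE disclaimer_versions SET superseded = 1 "
            "WHERE product = ? AND superseded = 0",
            (product,),
        )
        cursor = conn.execute(
            "INSERT INTO disclaimer_versions (product, body) VALUES (?, ?)",
            (product, body),
        )
    return cursor.lastrowid


def active_version(conn: sqlite3.Connection, product: str) -> Optional[int]:
    """Return the version ID to stamp on a new audit record, if any."""
    row = conn.execute(
        "SELECT version_id FROM disclaimer_versions "
        "WHERE product = ? AND superseded = 0",
        (product,),
    ).fetchone()
    return row[0] if row else None
```

Because old rows are kept, a version ID stored on an audit record can always be resolved back to the exact text that was in force at the time.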

Ongoing data source monitoring

Government APIs go down, change their schema, or return stale data without warning. An automated daily monitor probes every external data source across all screening tools. Each probe checks:

Is the endpoint reachable?
Does the response match the expected format?
Has the response structure changed since the last check?
Is the response time within acceptable limits?

If a source goes down or changes its format, we know within 24 hours — not when a user receives a broken report. Failures trigger an immediate alert.
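
The four probe checks can be sketched as below, assuming JSON responses. The `structure_signature` and `probe` names, the result keys, and the 5-second default limit are illustrative assumptions; the `fetch` callable stands in for whatever HTTP client is in use.

```python
import json
import time
from typing import Any, Callable


def structure_signature(payload: Any, depth: int = 3) -> Any:
    """Summarise key names and value types, ignoring the values themselves."""
    if depth == 0:
        return type(payload).__name__
    if isinstance(payload, dict):
        return {key: structure_signature(value, depth - 1)
                for key, value in payload.items()}
    if isinstance(payload, list):
        # Only the first element is sampled; an empty list has no element shape.
        return [structure_signature(payload[0], depth - 1)] if payload else []
    return type(payload).__name__


def probe(fetch: Callable[[], bytes], expected_signature: Any,
          max_ms: float = 5000.0) -> dict:
    """Run one probe and report the four checks listed above."""
    result = {"reachable": False, "valid_format": False,
              "structure_changed": None, "within_time_limit": None}
    started = time.perf_counter()
    try:
        body = fetch()
    except Exception:
        return result  # endpoint unreachable or request failed
    elapsed_ms = (time.perf_counter() - started) * 1000
    result["reachable"] = True
    result["within_time_limit"] = elapsed_ms <= max_ms
    try:
        payload = json.loads(body)
    except ValueError:
        return result  # reachable, but not the expected format
    result["valid_format"] = True
    result["structure_changed"] = structure_signature(payload) != expected_signature
    return result
```

Comparing a signature of key names and types, rather than the values, lets the probe tolerate normal data changes while still flagging a renamed field or a changed type.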

The same daily check also looks for two types of silent failure that are harder to catch:

Missing audit trails — if a report was generated but the audit record was not written, the gap is flagged. A report without an audit trail cannot be defended if questioned.
Empty or incomplete outputs — if a screening tool ran but produced no results, no confidence rating, or recorded no data sources, it is flagged. This catches the case where a tool appears to succeed but actually returned nothing useful.

This is not a one-off validation — it runs every day, automatically. The results are recorded so we can show, for any given day, which sources were checked and what their status was.
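
The two silent-failure checks described above can be sketched as a single pass over the day's reports. The report field names (`id`, `results`, `confidence`, `sources`) and the `find_silent_failures` helper are hypothetical.

```python
def find_silent_failures(reports: list[dict],
                         audited_report_ids: set[str]) -> list[tuple[str, str]]:
    """Return (report_id, reason) pairs for reports that need attention.

    Field names are hypothetical. `audited_report_ids` is the set of report
    IDs that have a matching audit trail record.
    """
    flagged = []
    for report in reports:
        report_id = report["id"]
        if report_id not in audited_report_ids:
            flagged.append((report_id, "missing audit trail"))
        if not report.get("results"):
            flagged.append((report_id, "no results"))
        if report.get("confidence") is None:
            flagged.append((report_id, "no confidence rating"))
        if not report.get("sources"):
            flagged.append((report_id, "no data sources recorded"))
    return flagged
```

A report can be flagged for more than one reason, which keeps each gap visible instead of stopping at the first problem found.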

What this means for you

When you receive a property screening report from this platform:

Every data source listed in the report was actually queried — the attribution is verified, not copied from a template
Missing data is clearly labelled as “unavailable” — you will never see a false “all clear” when data is actually missing
The language describes what the data shows — it does not make compliance determinations, recommendations, or assurances
A complete audit trail exists for every report, linking the output to the exact data that produced it

These are screening tools, not formal planning certificates. They are designed to surface the right questions early — before you are committed — so you can get qualified professional advice where it matters.


This content is general information about NSW planning and property matters. It is not planning advice, legal advice, financial advice, or insurance advice, and should not be relied upon as a substitute for professional assessment. Planning controls and regulatory instruments change — verify current provisions at planning.nsw.gov.au and legislation.nsw.gov.au.