The vast majority of financial data APIs rely on XBRL tags to extract fundamentals from SEC filings. Here is why that approach is fundamentally flawed — and how AI extraction produces far more accurate and complete data.
If you have ever pulled financial data from an API and compared it against the actual SEC filing, you have probably found discrepancies. Missing line items, incorrect totals, quarters that do not add up to the annual figure, or entire sections of the financial statements simply absent.
This is not a bug in any specific provider — it is a systemic problem with how nearly every financial data API extracts data. The root cause is XBRL: the tagging standard that the SEC requires companies to use when filing financial statements electronically.
Understanding why XBRL fails — and what the alternative looks like — is critical for anyone who relies on financial data for investment decisions, research, or application development.
XBRL (eXtensible Business Reporting Language) is a standard that requires companies to tag each line item in their financial statements with a machine-readable label. In theory, this makes it trivial to extract structured data. In practice, it creates a cascade of accuracy problems:
The XBRL taxonomy provides standard tags, but companies routinely create custom extensions for line items that do not fit neatly into the standard categories. A company might report "Adjusted Operating Income Before Restructuring Charges" with a custom tag that no automated system knows how to map. These custom tags account for a significant portion of all tags used in practice.
Two companies reporting the exact same type of revenue might use completely different XBRL tags. One uses the standard "Revenue" tag, another uses "RevenueFromContractWithCustomerExcludingAssessedTax", and a third creates a custom extension. Any system that maps XBRL tags to a normalized schema must handle thousands of these variations — and inevitably misses some.
XBRL data frequently has broken relationships between quarterly and annual figures. Q1 + Q2 + Q3 + Q4 should equal the full year, but differences in how companies tag restated figures, reclassifications, or mid-year accounting changes mean the numbers often do not reconcile. Most APIs simply pass through these inconsistencies without catching them.
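The reconciliation check itself is trivial, which makes its absence in most pipelines all the more striking. A minimal sketch, with invented figures:

```python
# Sketch of the quarterly-to-annual consistency check that most
# XBRL-based APIs skip. All figures are invented for illustration.

def reconciles(quarters: list[float], full_year: float, tol: float = 0.005) -> bool:
    """True if Q1..Q4 sum to the annual figure within a relative tolerance."""
    total = sum(quarters)
    return abs(total - full_year) <= tol * max(abs(full_year), 1e-9)

# Clean filer: the quarters add up to the annual figure.
print(reconciles([250.0, 260.0, 245.0, 245.0], 1_000.0))  # True

# Filer restated Q2 mid-year, but the annual figure reflects the
# original number: the series no longer reconciles.
print(reconciles([250.0, 310.0, 245.0, 245.0], 1_000.0))  # False: flag for review
```

A provider that passes XBRL data through without running a check like this ships the second case to its users unflagged.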
When a company uses a non-standard reporting format or groups line items differently than the taxonomy expects, XBRL-based extraction simply skips those values. You end up with incomplete financial statements — a balance sheet missing key asset categories, or an income statement with gaps between revenue and net income.
Unlike an independent audit, XBRL tagging is done by the filing company (or their filing agent). There is no verification that tags are applied correctly. Studies have found significant error rates in XBRL filings, and the SEC's own reviews have flagged widespread tagging quality issues.
AI extraction takes a fundamentally different approach: instead of reading machine-generated XBRL tags, it reads the actual filing — the same tables and text that a human analyst would read. This is how StockAInsights processes every SEC filing:
AI models process the full text of 10-K and 10-Q filings, including financial statement tables, footnotes, and MD&A sections. This means every line item that appears in the filing is captured, regardless of how it is tagged (or whether it is tagged at all) in XBRL.
Companies with unusual reporting formats, custom line items, or industry-specific financial structures are extracted just as accurately as standard filers. The AI understands context — it can identify "Net revenues", "Total net sales", and "Revenue from operations" as the same concept without needing a tag mapping table.
StockAInsights does not just extract numbers in isolation. The system cross-references quarterly figures against annual totals, flags discrepancies, and resolves them by referring back to the source filing. This catches restatements, reclassifications, and reporting changes that XBRL-based systems miss entirely.
Because AI reads the full filing rather than cherry-picking tagged values, the extracted data includes every line item the company reported. Income statements, balance sheets, and cash flow statements are complete — not just the subset of fields that happened to have standard XBRL tags.
Beyond raw numbers, AI extraction captures strategic analysis from management discussion sections, risk factors, and segment breakdowns — contextual information that is never available through XBRL tags but is essential for understanding the numbers.
| Dimension | XBRL-Based Extraction | AI Extraction (StockAInsights) |
|---|---|---|
| Data source | Machine-generated XBRL tags | Actual filing text and tables |
| Custom line items | Frequently missed or misclassified | Captured from document context |
| Completeness | Only tagged values included | All reported line items included |
| Quarterly/annual consistency | Often breaks on restatements | Cross-validated and reconciled |
| Non-standard filers | Poor accuracy | Same accuracy as standard filers |
| Strategic analysis | Not available | Extracted from MD&A and risk factors |
| Tagging errors | Inherited from filer | Bypassed entirely |
Inaccurate financial data compounds through every layer of analysis built on top of it. Here is what is at stake:
DCF models, comparable company analysis, and ratio-based screens all depend on accurate fundamental data. A missing or incorrect line item can shift a valuation by double digits. If your data source silently drops a major expense category, your margins look better than reality.
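To make the distortion concrete, here is the arithmetic with invented numbers, where a single expense line reported under a custom tag gets dropped by tag-based extraction:

```python
# Illustration of how one silently dropped expense line inflates margins.
# All numbers are invented.

revenue = 10_000.0
cogs = 6_000.0
sga = 2_000.0
restructuring = 800.0  # reported under a custom tag; dropped by the extractor

margin_full = (revenue - cogs - sga - restructuring) / revenue
margin_missing = (revenue - cogs - sga) / revenue

print(f"operating margin, complete data: {margin_full:.1%}")    # 12.0%
print(f"operating margin, line dropped:  {margin_missing:.1%}")  # 20.0%
```

Any model keyed off operating margin, from a DCF to a simple comps multiple, inherits that 8-point error without any visible sign that the input was incomplete.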
Quantitative screens that filter stocks by financial metrics will produce false positives and false negatives when the underlying data has gaps. A company might appear to meet your criteria because a cost line was missing from the data, or be excluded because revenue was misclassified.
Comparing a company's performance across quarters and years requires consistent data. When XBRL tag mappings change between filings — which happens frequently — trend analysis breaks down. What looks like a sudden revenue jump might just be a tag reclassification.
XBRL relies on companies tagging their own financial data using standardized taxonomies. In practice, companies frequently use custom extensions, inconsistent tags, or incorrect mappings. This means XBRL-based APIs often have missing line items, misclassified values, or broken quarterly-to-annual relationships — problems that are invisible unless you cross-reference every number against the original filing.
AI extraction reads the actual text and tables in SEC 10-K and 10-Q filings — the same way a human analyst would. Instead of relying on XBRL tags that companies self-assign, AI models parse the filing content directly, identifying income statement lines, balance sheet items, and cash flow components from the document structure and context. This produces more complete and accurate results, especially for companies with non-standard reporting formats.
To get started, sign up for a free StockAInsights account and obtain an API key. StockAInsights provides REST endpoints for income statements, balance sheets, and cash flow statements extracted from SEC filings; all responses are JSON. See the API documentation for authentication and endpoint details.
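A request sketch using only the Python standard library. The host, endpoint path, query parameter, and auth header below are hypothetical placeholders; the actual names are in the StockAInsights API documentation:

```python
# Minimal client sketch. BASE_URL, the /v1/income-statements path, the
# ticker parameter, and the Bearer auth scheme are all assumptions for
# illustration; consult the API documentation for the real values.
import json
import urllib.request

BASE_URL = "https://api.stockainsights.com"  # placeholder host

def build_request(ticker: str, api_key: str) -> urllib.request.Request:
    url = f"{BASE_URL}/v1/income-statements?ticker={ticker}"  # hypothetical path
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})

def fetch_income_statements(ticker: str, api_key: str) -> dict:
    with urllib.request.urlopen(build_request(ticker, api_key)) as resp:
        return json.load(resp)  # responses are JSON

# usage (requires a valid key):
# data = fetch_income_statements("AAPL", "YOUR_API_KEY")
```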
SEC EDGAR data is completely free. The SEC provides public access to all company filings through EDGAR and its API endpoints. However, the data is in raw formats like XBRL and HTML, so you need to build your own parser or use a service like StockAInsights that extracts structured data from the filings for you.
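For example, the SEC's documented `companyfacts` endpoint returns a company's raw XBRL facts as JSON, keyed by the CIK zero-padded to 10 digits. Building the URL is the easy part; parsing and reconciling what comes back is where the real work described above begins:

```python
# Sketch of pulling raw XBRL facts straight from SEC EDGAR.

def companyfacts_url(cik: int) -> str:
    # EDGAR expects the CIK zero-padded to 10 digits.
    return f"https://data.sec.gov/api/xbrl/companyfacts/CIK{cik:010d}.json"

print(companyfacts_url(320193))  # Apple's CIK
# -> https://data.sec.gov/api/xbrl/companyfacts/CIK0000320193.json

# Actually fetching requires a descriptive User-Agent, per SEC guidelines:
# import urllib.request
# req = urllib.request.Request(companyfacts_url(320193),
#                              headers={"User-Agent": "you@example.com"})
# facts = urllib.request.urlopen(req).read()
```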
Most providers rely on XBRL tags to extract data, but companies apply these tags inconsistently. One provider might map a custom XBRL extension correctly while another misses it entirely. Companies also restate figures, change segment reporting, or use non-standard line items — all of which trip up tag-based extraction. AI extraction avoids this by reading the filing itself rather than depending on metadata.
Browse AI-extracted financial data from SEC filings. Compare the completeness and accuracy against any other source — then decide.
See the data in action: NVIDIA, Apple, or browse all companies.