The vast majority of financial data APIs rely on XBRL tags to extract fundamentals from SEC filings. Here is why that approach is fundamentally flawed — and how AI extraction produces far more accurate and complete data.
If you have ever pulled financial data from an API and compared it against the actual SEC filing, you have probably found discrepancies. Missing line items, incorrect totals, quarters that do not add up to the annual figure, or entire sections of the financial statements simply absent.
This is not a bug in any specific provider — it is a systemic problem with how nearly every financial data API extracts data. The root cause is XBRL: the tagging standard that the SEC requires companies to use when filing financial statements electronically.
Understanding why XBRL fails — and what the alternative looks like — is critical for anyone who relies on financial data for investment decisions, research, or application development.
XBRL (eXtensible Business Reporting Language) is a standard that requires companies to tag each line item in their financial statements with a machine-readable label. In theory, this makes it trivial to extract structured data. In practice, it creates a cascade of accuracy problems:
The XBRL taxonomy provides standard tags, but companies routinely create custom extensions for line items that do not fit neatly into the standard categories. A company might report "Adjusted Operating Income Before Restructuring Charges" with a custom tag that no automated system knows how to map. These custom tags account for a significant portion of all tags used in practice.
Two companies reporting the exact same type of revenue might use completely different XBRL tags. One uses the standard "Revenue" tag, another uses "RevenueFromContractWithCustomerExcludingAssessedTax", and a third creates a custom extension. Any system that maps XBRL tags to a normalized schema must handle thousands of these variations — and inevitably misses some.
XBRL data frequently has broken relationships between quarterly and annual figures. Q1 + Q2 + Q3 + Q4 should equal the full year, but differences in how companies tag restated figures, reclassifications, or mid-year accounting changes mean the numbers often do not reconcile. Most APIs simply pass through these inconsistencies without catching them.
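The reconciliation check itself is trivial, which makes its absence in most pipelines all the more striking. A minimal sketch, with invented figures:

```python
# Sketch of the quarterly-to-annual consistency check that most
# XBRL-based APIs skip. All figures are invented for illustration.

def reconciles(quarters: list[float], full_year: float, tol: float = 0.005) -> bool:
    """True if Q1..Q4 sum to the annual figure within a relative tolerance."""
    total = sum(quarters)
    return abs(total - full_year) <= tol * max(abs(full_year), 1e-9)

# Clean filer: the quarters add up to the annual figure.
print(reconciles([250.0, 260.0, 245.0, 245.0], 1_000.0))  # True

# Filer restated Q2 mid-year, but the annual figure reflects the
# original number: the series no longer reconciles.
print(reconciles([250.0, 310.0, 245.0, 245.0], 1_000.0))  # False: flag for review
```

A provider that passes XBRL data through without running a check like this ships the second case to its users unflagged.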
When a company uses a non-standard reporting format or groups line items differently than the taxonomy expects, XBRL-based extraction simply skips those values. You end up with incomplete financial statements — a balance sheet missing key asset categories, or an income statement with gaps between revenue and net income.
Unlike an independent audit, XBRL tagging is done by the filing company (or their filing agent). There is no verification that tags are applied correctly. Studies have found significant error rates in XBRL filings, and the SEC's own reviews have flagged widespread tagging quality issues.
AI extraction takes a fundamentally different approach: instead of reading machine-generated XBRL tags, it reads the actual filing — the same tables and text that a human analyst would read. This is how StockAInsights processes every SEC filing:
AI models process the full text of 10-K and 10-Q filings, including financial statement tables, footnotes, and MD&A sections. This means every line item that appears in the filing is captured, regardless of how it is tagged (or whether it is tagged at all) in XBRL.
Companies with unusual reporting formats, custom line items, or industry-specific financial structures are extracted just as accurately as standard filers. The AI understands context — it can identify "Net revenues", "Total net sales", and "Revenue from operations" as the same concept without needing a tag mapping table.
StockAInsights does not just extract numbers in isolation. The system cross-references quarterly figures against annual totals, flags discrepancies, and resolves them by referring back to the source filing. This catches restatements, reclassifications, and reporting changes that XBRL-based systems miss entirely.
Because AI reads the full filing rather than cherry-picking tagged values, the extracted data includes every line item the company reported. Income statements, balance sheets, and cash flow statements are complete — not just the subset of fields that happened to have standard XBRL tags.
Beyond raw numbers, AI extraction captures strategic analysis from management discussion sections, risk factors, and segment breakdowns — contextual information that is never available through XBRL tags but is essential for understanding the numbers.
| Dimension | XBRL-Based Extraction | AI Extraction (StockAInsights) |
|---|---|---|
| Data source | Machine-generated XBRL tags | Actual filing text and tables |
| Custom line items | Frequently missed or misclassified | Captured from document context |
| Completeness | Only tagged values included | All reported line items included |
| Quarterly/annual consistency | Often breaks on restatements | Cross-validated and reconciled |
| Non-standard filers | Poor accuracy | Same accuracy as standard filers |
| Strategic analysis | Not available | Extracted from MD&A and risk factors |
| Tagging errors | Inherited from filer | Bypassed entirely |
Inaccurate financial data compounds through every layer of analysis built on top of it. Here is what is at stake:
DCF models, comparable company analysis, and ratio-based screens all depend on accurate fundamental data. A missing or incorrect line item can shift a valuation by double digits. If your data source silently drops a major expense category, your margins look better than reality.
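To make the distortion concrete, here is the arithmetic with invented numbers, where a single expense line reported under a custom tag gets dropped by tag-based extraction:

```python
# Illustration of how one silently dropped expense line inflates margins.
# All numbers are invented.

revenue = 10_000.0
cogs = 6_000.0
sga = 2_000.0
restructuring = 800.0  # reported under a custom tag; dropped by the extractor

margin_full = (revenue - cogs - sga - restructuring) / revenue
margin_missing = (revenue - cogs - sga) / revenue

print(f"operating margin, complete data: {margin_full:.1%}")    # 12.0%
print(f"operating margin, line dropped:  {margin_missing:.1%}")  # 20.0%
```

Any model keyed off operating margin, from a DCF to a simple comps multiple, inherits that 8-point error without any visible sign that the input was incomplete.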
Quantitative screens that filter stocks by financial metrics will produce false positives and false negatives when the underlying data has gaps. A company might appear to meet your criteria because a cost line was missing from the data, or be excluded because revenue was misclassified.
Comparing a company's performance across quarters and years requires consistent data. When XBRL tag mappings change between filings — which happens frequently — trend analysis breaks down. What looks like a sudden revenue jump might just be a tag reclassification.
XBRL relies on companies tagging their own financial data using standardized taxonomies. In practice, companies frequently use custom extensions, inconsistent tags, or incorrect mappings. This means XBRL-based APIs often have missing line items, misclassified values, or broken quarterly-to-annual relationships — problems that are invisible unless you cross-reference every number against the original filing.
AI extraction reads the actual text and tables in SEC 10-K and 10-Q filings — the same way a human analyst would. Instead of relying on XBRL tags that companies self-assign, AI models parse the filing content directly, identifying income statement lines, balance sheet items, and cash flow components from the document structure and context. This produces more complete and accurate results, especially for companies with non-standard reporting formats.
To get started, sign up for a free StockAInsights account and obtain an API key. StockAInsights provides REST endpoints for income statements, balance sheets, and cash flow statements extracted from SEC filings; all responses are JSON. See the API documentation for authentication and endpoint details.
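A request sketch using only the Python standard library. The host, endpoint path, query parameter, and auth header below are hypothetical placeholders; the actual names are in the StockAInsights API documentation:

```python
# Minimal client sketch. BASE_URL, the /v1/income-statements path, the
# ticker parameter, and the Bearer auth scheme are all assumptions for
# illustration; consult the API documentation for the real values.
import json
import urllib.request

BASE_URL = "https://api.stockainsights.com"  # placeholder host

def build_request(ticker: str, api_key: str) -> urllib.request.Request:
    url = f"{BASE_URL}/v1/income-statements?ticker={ticker}"  # hypothetical path
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})

def fetch_income_statements(ticker: str, api_key: str) -> dict:
    with urllib.request.urlopen(build_request(ticker, api_key)) as resp:
        return json.load(resp)  # responses are JSON

# usage (requires a valid key):
# data = fetch_income_statements("AAPL", "YOUR_API_KEY")
```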
SEC EDGAR data is completely free. The SEC provides public access to all company filings through EDGAR and its API endpoints. However, the data is in raw formats like XBRL and HTML, so you need to build your own parser or use a service like StockAInsights that extracts structured data from the filings for you.
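For example, the SEC's documented `companyfacts` endpoint returns a company's raw XBRL facts as JSON, keyed by the CIK zero-padded to 10 digits. Building the URL is the easy part; parsing and reconciling what comes back is where the real work described above begins:

```python
# Sketch of pulling raw XBRL facts straight from SEC EDGAR.

def companyfacts_url(cik: int) -> str:
    # EDGAR expects the CIK zero-padded to 10 digits.
    return f"https://data.sec.gov/api/xbrl/companyfacts/CIK{cik:010d}.json"

print(companyfacts_url(320193))  # Apple's CIK
# -> https://data.sec.gov/api/xbrl/companyfacts/CIK0000320193.json

# Actually fetching requires a descriptive User-Agent, per SEC guidelines:
# import urllib.request
# req = urllib.request.Request(companyfacts_url(320193),
#                              headers={"User-Agent": "you@example.com"})
# facts = urllib.request.urlopen(req).read()
```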
Most providers rely on XBRL tags to extract data, but companies apply these tags inconsistently. One provider might map a custom XBRL extension correctly while another misses it entirely. Companies also restate figures, change segment reporting, or use non-standard line items — all of which trip up tag-based extraction. AI extraction avoids this by reading the filing itself rather than depending on metadata.
Browse AI-extracted financial data from SEC filings. Compare the completeness and accuracy against any other source — then decide.
See the data in action: NVIDIA, Apple, or browse all companies.