Tweetorial: Reconstruction-abetted re-identification attacks and other traditional vulnerabilities

  1. First, let’s get the facts straight: the U.S. Census Bureau reconstructed 100% of the 2010 Census micro-data records (308,745,538 persons).
  2. Those records contained: full census block id (all 15 digits), voting age (yes/no), sex, age (in years), race (all 63 OMB categories), and ethnicity (Hispanic or not).
  3. The reconstructed records matched the confidential data (2010 CEF) exactly (every single bit) for 46% of the population (142 million people) and allowing age +/- 1 year for 71% of the population (219 million people).
  4. Those match rates are salient because in the confidential data, more than 50% of the population is unique on those variables (block, sex, age, race and ethnicity).
  5. This makes the confidential data vulnerable to a re-identification attack by linkage to external data with some or all of the same variables.
  6. This is precisely the reason that the Census Bureau has never released public-use micro-data with detailed geography, but if you can reconstruct the detailed geography from the published tables, the vulnerability is still there.
  7. Using commercial databases harvested between 2009 and 2011 in support of the 2010 Census, the Census Bureau linked PII (name and address; technically, PIK and MAFID) to the reconstructed micro-data.
  8. This linkage resulted in putative re-identification of 138 million persons (45% of the population).
  9. This estimate of the success rate (also called the recall rate) is almost certainly conservative because neither the 2010 Census nor the commercial databases has PII for all 309 million persons.
  10. When the Census Bureau linked the PII-laden reconstructed data (putative re-identifications) to the 2010 Census CEF, it confirmed that 52 million of those re-identifications were correct (confirmation rate 38%).
  11. This confirmation rate is also conservative because no use was made of the relationship-to-householder or household composition data in the published tables.
  12. The last time Census Bureau researchers published a re-identification study, the putative re-identification rate was 0.017% (389 of 2.3 million), and the confirmation rate was 22% (87 of 389).
  13. That’s an aggregate vulnerability (product of the two rates) of 0.0038%.
  14. The aggregate vulnerability for the 2010 Census, based on these more recent studies, turns out to be 17%: roughly four orders of magnitude greater.
  15. Re-identification risk is only one part of the Census Bureau’s statutory obligation to protect confidentiality. The statute also requires protection against exact attribute disclosure.
  16. Neither the census block nor voting age received any confidentiality protection in the tabular summaries from the 2010 Census (this is public information in all of the relevant technical documentation).
  17. Consequently, the micro-data reconstruction of block and voting-age is always exactly correct, and exactly matches the confidential data.
  18. Any block where the voting-age data are either all “yes” or all “no” is an exact attribute disclosure assignable to all persons living in that block on April 1, 2010.
  19. There is widespread recognition in the official statistics community that both reconstruction-abetted re-identification and reconstruction-abetted exact attribute disclosure are unacceptable vulnerabilities for 2020 Census publications.
  20. Those publications may include a block-level citizen voting age population by race and ethnicity table.
  21. Former Census Bureau Director John Thompson and former BLS Commissioner Erica Groshen have both publicly said that these vulnerabilities must be addressed.
  22. Differential privacy, as implemented for the 2020 Census, directly addresses both of these traditional vulnerabilities, and allows the publisher to manage the accuracy of the resulting tables to ensure fitness-for-use.
  23. No traditional SDL method can make that claim, accompanied by proof.
  24. You are free to take issue with this risk assessment, but the statutory confidentiality protection obligation is the domain of the Census Bureau, and the protections of Title 13, section 9 are not subject to a “when convenient” exception.
  25. This tweetorial and the slides from my recent AAAS and AAG talks are published here.
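
The rate arithmetic in items 8 through 14 can be checked with a few lines of Python. This is only a sketch using the rounded figures quoted above (138 million, 52 million, 389, 87, 2.3 million); the variable and function names are mine, not the Census Bureau's, and the exact unrounded counts may differ slightly.

```python
import math

# Rounded figures as quoted in the thread (assumed inputs, not exact counts).
POPULATION_2010 = 308_745_538


def rate(numerator, denominator):
    """Plain ratio; kept as a function so each rate reads like its definition."""
    return numerator / denominator


# 2010 Census reconstruction-abetted attack (items 8 and 10).
putative_2010 = rate(138_000_000, POPULATION_2010)   # putative re-identification rate
confirmation_2010 = rate(52_000_000, 138_000_000)    # confirmation rate
aggregate_2010 = putative_2010 * confirmation_2010   # product of the two rates

# Earlier published re-identification study (items 12 and 13).
putative_prior = rate(389, 2_300_000)
confirmation_prior = rate(87, 389)
aggregate_prior = putative_prior * confirmation_prior

# How much larger the newer aggregate vulnerability is.
ratio = aggregate_2010 / aggregate_prior
orders_of_magnitude = math.log10(ratio)

print(f"2010 aggregate vulnerability:  {aggregate_2010:.1%}")
print(f"prior aggregate vulnerability: {aggregate_prior:.4%}")
print(f"ratio: {ratio:,.0f}x (log10 = {orders_of_magnitude:.2f})")
```

With these rounded inputs the ratio is a few thousand, which rounds to four orders of magnitude on a log scale.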
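
To make items 4 and 18 concrete, here is a toy illustration of the two checks: the share of records that are unique on (block, sex, age, race, ethnicity), and the blocks where voting age is all "yes" or all "no". The six records are entirely made up for illustration, and the helper names `unique_share` and `homogeneous_blocks` are hypothetical; this is not the Census Bureau's code or data.

```python
from collections import Counter, defaultdict

# Made-up toy records: (block, voting_age, sex, age, race, ethnicity).
records = [
    ("B1", True, "F", 34, "White", "Not Hispanic"),
    ("B1", True, "M", 34, "White", "Not Hispanic"),
    ("B1", True, "F", 34, "White", "Not Hispanic"),
    ("B2", True, "M", 71, "Asian", "Not Hispanic"),
    ("B2", False, "F", 9, "Asian", "Not Hispanic"),
    ("B3", False, "M", 12, "Black", "Hispanic"),
]


def unique_share(rows):
    """Fraction of rows whose (block, sex, age, race, ethnicity) occurs exactly once."""
    keys = [(block, sex, age, race, eth) for block, _, sex, age, race, eth in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


def homogeneous_blocks(rows):
    """Map each block whose residents all share one voting-age value to that value."""
    values = defaultdict(set)
    for block, voting_age, *_ in rows:
        values[block].add(voting_age)
    return {block: next(iter(v)) for block, v in values.items() if len(v) == 1}


print(f"unique on quasi-identifiers: {unique_share(records):.0%}")
print(f"blocks with uniform voting age: {homogeneous_blocks(records)}")
```

A record that is unique on those variables can be singled out by anyone holding an external file with the same variables, and every resident of a block returned by `homogeneous_blocks` has their voting-age status disclosed exactly.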