Date: 2021-08-02

A plan to protect the confidentiality of Americans’ responses to the 2020 census by injecting small, calculated distortions into the results is raising concerns that it will erode their usability for research and distribution of state and federal funds.

The Census Bureau is due to release the first major results of the decennial count in mid-August. They will offer the first detailed look at the population and racial makeup of thousands of counties and cities, as well as tribal areas, neighborhoods, school districts and smaller areas that will be used to redraw congressional, legislative and local districts to balance their populations.

The bureau will adjust most of those statistics to prevent someone from recombining them in a way that would disclose information about an individual respondent. Testing by the bureau shows that improvements in data science, computing power and commercial databases make that feasible.

Last week the bureau’s acting director said the plan was a necessary update of older methods to protect confidentiality. Ron Jarmin said the agency searched for alternatives before settling on differential privacy, a systematic approach to add statistical noise to data, something it has done in some fashion for years.

“I’m pretty confident that it’s going to meet users’ expectations,” Mr. Jarmin said at a panel during an online conference of government data users. “We have to deal with the technology as it is and as it evolves.”

Using a complex algorithm, the agency will first add small random numbers, both negative and positive, to most of the census totals that it compiled after trying to count every American last year. Then it will square up the inconsistent subtotals created by the first step.
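The two steps described above can be illustrated with a short Python sketch. It is not the bureau's algorithm: the noise generator (a two-sided geometric draw), the `epsilon` setting, the block counts and the round-robin reconciliation are all assumptions chosen to keep the example small, and the county total is assumed to be held fixed.

```python
import math
import random


def two_sided_geometric(epsilon, rng):
    """Draw integer noise, positive or negative, centered on zero."""
    def geometric():
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is defined
        return int(-math.log(u) / epsilon)
    # The difference of two geometric draws is symmetric around zero.
    return geometric() - geometric()


def add_noise(counts, epsilon, rng):
    """Step one: adjust every count by an independent random integer."""
    return [c + two_sided_geometric(epsilon, rng) for c in counts]


def reconcile(noisy_counts, total):
    """Step two (simplified stand-in): remove negative counts, then move
    one person at a time until the parts add up to the fixed total.
    Assumes total >= 0 and at least one count."""
    parts = [max(0, c) for c in noisy_counts]
    diff = total - sum(parts)
    i = 0
    while diff != 0:
        k = i % len(parts)
        if diff > 0:
            parts[k] += 1
            diff -= 1
        elif parts[k] > 0:
            parts[k] -= 1
            diff += 1
        i += 1
    return parts


rng = random.Random(2020)              # fixed seed so the run is repeatable
true_blocks = [12, 0, 45, 3, 140]      # hypothetical block populations
county_total = sum(true_blocks)        # assumption: this total is not altered
noisy = add_noise(true_blocks, epsilon=0.5, rng=rng)
published = reconcile(noisy, county_total)
print("noisy:    ", noisy)
print("published:", published)
```

After the first step the block counts may be negative and will generally no longer add up to the county total. The second step restores consistency, but the individual block counts can still differ from the true ones.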

The process should yield fairly accurate totals for most groups and places but improbable or impossible results for very small groups or areas, the bureau concluded recently. Testing with 2010 census data showed that the algorithm would have changed the population of the average county by two people, or about 0.02%, according to the bureau. But for 6,000 towns with fewer than 500 people, the error would have been 1.5%.

That worries local officials because billions of dollars in federal and state aid are allocated using population-based formulas. Census data documenting small minority communities is often the crux of civil-rights litigation over proposed election districts.

Distortion will be greater for small groups and areas such as census blocks, bureau officials say. Census blocks, the smallest areas for which data is published, are typically the size of a city block but are much larger in rural areas.
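A rough way to see why small areas are hit harder: noise of a given size is a much larger share of a small count than of a large one. The snippet below is only an arithmetic illustration. The five-person shift and the three population sizes are hypothetical, not bureau figures.

```python
def relative_error_pct(true_count, noise):
    """Size of the noise as a percentage of the true count (must be > 0)."""
    return 100 * abs(noise) / true_count


# Hypothetical example: the same five-person shift applied to three areas.
for population in (50, 500, 50_000):
    share = relative_error_pct(population, noise=5)
    print(f"population {population:>6}: {share:.2f}% error")
```

The same five-person change amounts to a tenth of a 50-person block but is barely visible in an area of 50,000.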

Testing showed that if the algorithm had been used on the 2010 census, it would have produced more occupied housing units than people in 5% of census blocks. In 8% of census blocks, it would have produced counts of at least some occupants but no occupied housing units.

The National Congress of American Indians, noting potential effects on small, scattered tribal settlements, said in a May analysis that “error rates have improved over time, but tribal nations must determine if errors of 5% or even 10% of their total population are an acceptable outcome of these enhanced privacy protections.”

More than two years of development, including five rounds of testing and user feedback, have reduced distortion significantly. Results from a sixth test, using the algorithm and settings to be used on the 2020 data, will be released along with the census results.

In comments on the approach to the privacy tool filed after the fifth test, analysts at a major census data center at the University of Minnesota said that “major discrepancies remain for minority populations.”

“This level of error will severely compromise demographic and policy analyses,” according to the analysis led by David Van Riper of the university’s IPUMS-NHGIS. Mr. Van Riper said last week that the bureau acknowledged the issue by reducing the level of statistical noise that it plans to add. He also said the bureau was sacrificing some accuracy among census blocks to preserve more for small groups in larger areas.

The pending census results are designed primarily for state and local officials who must redraw election districts, often on tight schedules because of legal deadlines for the 2022 election cycle.


