Rice University Researchers Discover Inexpensive Data Privacy Protocol

Anshumali Shrivastava, an associate professor of computer science at Rice, believes that one of the major challenges in leveraging large amounts of personal data is that privacy methods do not scale. If they did, training machine learning systems "to search for patterns in large databases of medical or financial records" could substantially improve the delivery of medical services.

The Solution

Shrivastava and Rice graduate student Ben Coleman will present a solution built on locality sensitive hashing at CCS 2021, the Association for Computing Machinery's annual conference on computer and communications security. The core idea is to create a small summary, or sketch, of an enormous database of sensitive records. They have named the method Repeated Array of Count Estimators, or RACE.
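To give a sense of the building block involved, the sketch below is a toy illustration of locality sensitive hashing via signed random projections — not the RACE implementation itself. The idea is that similar vectors are likely to land in the same hash bucket, so aggregate counts can be gathered without storing the raw records.

```python
import numpy as np

# Toy locality sensitive hashing (LSH) with signed random projections.
# Illustrative only; RACE builds on LSH but is not shown here.
rng = np.random.default_rng(seed=0)
DIM, N_BITS = 8, 4
planes = rng.standard_normal((N_BITS, DIM))  # random hyperplanes

def lsh_hash(vector):
    """Map a vector to one of 2**N_BITS buckets by projection signs."""
    bits = ((planes @ vector) > 0).astype(int)
    return int(sum(b << i for i, b in enumerate(bits)))

a = rng.standard_normal(DIM)
b = a + 0.01 * rng.standard_normal(DIM)   # a near-duplicate of a
# Nearby points usually collide into the same bucket:
print(lsh_hash(a), lsh_hash(b))
```

Because close-by vectors tend to collide, a table of per-bucket counts serves as a compact summary of where the data concentrates.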

These RACE sketches are safe to share, allowing companies to reap the benefits of large-scale, distributed machine learning while upholding a strict form of data privacy called differential privacy.

Differential privacy works by adding random noise to obscure individual information. Until now, however, there has been no practical technique for applying differential privacy standards at scale. RACE addresses this by producing sketches of high-dimensional data that are small and whose computational and memory requirements are easily distributed during construction.
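The noise-addition step can be sketched in a few lines. This is a generic differential-privacy illustration, not the RACE algorithm: Laplace noise is added to each cell of a count array, assuming each record affects at most one count by one (sensitivity 1), so no individual record can be inferred from the published counts.

```python
import numpy as np

# Toy differential privacy: perturb a count array with Laplace noise.
# Assumes sensitivity 1 (one record changes at most one count by 1).
rng = np.random.default_rng(seed=1)

def privatize_counts(counts, epsilon):
    """Return counts with Laplace(1/epsilon) noise added to each cell."""
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=len(counts))
    return np.asarray(counts, dtype=float) + noise

true_counts = [120, 45, 7, 300]
noisy = privatize_counts(true_counts, epsilon=0.5)
print(noisy)  # perturbed versions of the true counts
```

Smaller values of `epsilon` mean larger noise and stronger privacy; the choice of `epsilon=0.5` here is arbitrary.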

According to Anshumali, RACE is “simple, fast and 100 times less expensive to run than existing methods.” 

Shrivastava and his students have developed many algorithmic strategies to make machine learning faster and more scalable. Some of their achievements include: finding a more efficient way to help social media companies stop the spread of misinformation, training large-scale deep learning systems ten times faster, precisely estimating the number of victims killed in the Syrian civil war, making it possible to train deep neural networks 15 times faster on general-purpose CPUs than on GPUs, and reducing the time required to search large metagenomic databases.

CDO Magazine
www.cdomagazine.tech