EarSpy attack eavesdrops on Android phones via motion sensors

A team of researchers has developed a spying attack for Android devices that can, to varying degrees, recognize a caller’s gender and identity and even discern private speech.

Called EarSpy, the side-channel attack aims to explore new possibilities for eavesdropping by capturing motion sensor data readings caused by reverberations from ear speakers on mobile devices.

EarSpy is an academic effort by researchers from five US universities (Texas A&M University, New Jersey Institute of Technology, Temple University, University of Dayton, and Rutgers University).

While this type of attack has been explored on smartphone loudspeakers, ear speakers were considered too weak to generate enough vibration to make such a side-channel attack practical.

However, modern smartphones use more powerful stereo speakers than models from a few years ago, and these produce much better sound quality and stronger vibrations.

Similarly, modern devices use more sensitive motion sensors and gyroscopes that can register even the smallest speaker resonances.

The researchers' spectrograms illustrate this progress: the earpiece of a 2016 OnePlus 3T barely registers, while the stereo ear speakers of a 2019 OnePlus 7T produce much more data.

Left to right: ear speaker of the OnePlus 3T, ear speaker of the OnePlus 7T, and loudspeaker of the OnePlus 7T
Source: arxiv.org

Experiment and results

The researchers used a OnePlus 7T and a OnePlus 9 in their experiments, along with several sets of pre-recorded audio played only through the ear speakers of the two devices.

The team also used the third-party application ‘Physics Toolbox Sensor Suite’ to capture accelerometer data during a simulated call and then sent it to MATLAB for analysis and to extract features from the audio stream.
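The kind of time-domain and frequency-domain feature extraction described above can be sketched in a few lines. This is a minimal illustration in Python with NumPy, not the researchers' MATLAB pipeline: the function name `extract_features`, the chosen features, and the synthetic 120 Hz test signal are all assumptions made for the example.

```python
import numpy as np

def extract_features(signal, sample_rate_hz):
    """Return a few time- and frequency-domain features of a 1-D accelerometer trace.

    Hypothetical helper for illustration; the paper's actual feature set is not shown here.
    """
    signal = np.asarray(signal, dtype=float)
    centered = signal - signal.mean()  # remove the constant offset (e.g. gravity)

    # Time-domain features
    rms = np.sqrt(np.mean(centered ** 2))
    peak_to_peak = centered.max() - centered.min()

    # Frequency-domain features from the magnitude spectrum
    spectrum = np.abs(np.fft.rfft(centered))
    freqs = np.fft.rfftfreq(centered.size, d=1.0 / sample_rate_hz)
    total = spectrum.sum()
    dominant_hz = freqs[np.argmax(spectrum)]
    centroid_hz = float(np.sum(freqs * spectrum) / total) if total > 0 else 0.0

    return {
        "rms": float(rms),
        "peak_to_peak": float(peak_to_peak),
        "dominant_hz": float(dominant_hz),
        "spectral_centroid_hz": centroid_hz,
    }

# Synthetic stand-in for a real capture (assumption): one second of a
# 120 Hz vibration sampled at 500 Hz, riding on a constant gravity offset.
rate = 500
t = np.arange(rate) / rate
trace = 9.81 + 0.02 * np.sin(2 * np.pi * 120 * t)
features = extract_features(trace, rate)
print(features)
```

A real capture would replace the synthetic `trace` with one accelerometer axis exported from the logging app, and the resulting feature dictionaries would feed a classifier.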

A machine learning (ML) algorithm was trained using readily available data sets to recognize speech content, caller identity, and gender.
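As a rough illustration of what training a classical classifier on such features involves, the sketch below fits a nearest-centroid model to synthetic two-class data. The article does not name the algorithm the researchers used, so nearest-centroid is a stand-in chosen for brevity, and the data, class means, and helper names (`fit_centroids`, `predict`) are all invented for the example.

```python
import numpy as np

def fit_centroids(features, labels):
    """Compute one mean feature vector (centroid) per class."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(features, classes, centroids):
    """Assign each sample to the class with the nearest centroid."""
    distances = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(distances, axis=1)]

# Synthetic data (assumption): two well-separated classes in a 2-D feature space.
rng = np.random.default_rng(0)
class_means = np.array([[0.0, 0.0], [1.0, 1.0]])
labels = np.repeat([0, 1], 100)
features = class_means[labels] + rng.normal(scale=0.1, size=(200, 2))

# Train on even-indexed samples, evaluate on odd-indexed ones.
train = np.arange(200) % 2 == 0
classes, centroids = fit_centroids(features[train], labels[train])
accuracy = np.mean(predict(features[~train], classes, centroids) == labels[~train])
print(f"held-out accuracy: {accuracy:.2f}")
```

Real accelerometer features overlap far more than these synthetic clusters, which is why the accuracies reported in the paper vary so widely between tasks.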

Results varied by data set and device, but were promising overall for eavesdropping via the ear speaker.

Caller gender identification on the OnePlus 7T ranged from 77.7% to 98.7%, caller identity classification ranged from 63.0% to 91.2%, and speech recognition ranged from 51.8% to 56.4%.

Test results on the OnePlus 7T (arxiv.org)

“We evaluated time and frequency domain features with classical ML algorithms, which show the highest accuracy of 56.42%,” the researchers explain in their paper.

“As there are ten different classes here, the accuracy still shows five times the accuracy of a random guess, implying that the vibration due to the ear speaker induced a reasonable amount of distinguishable shock in the accelerometer data” – EarSpy White Paper
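The comparison with a random guess in the quote can be checked with a short calculation. The sketch below assumes the ten classes are equally likely, so that guessing is right one time in ten; the variable names are illustrative.

```python
num_classes = 10            # ten classes, as stated in the quote
reported_accuracy = 0.5642  # highest accuracy quoted from the paper

# Assumption: balanced classes, so a random guess is right 1 time in num_classes.
chance_accuracy = 1 / num_classes
ratio = reported_accuracy / chance_accuracy

print(f"chance: {chance_accuracy:.0%}, reported: {reported_accuracy:.2%}, ratio: {ratio:.2f}x")
```

Under that assumption the reported figure is a little over five times the chance level, which matches the researchers' wording.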

On the OnePlus 9, gender identification peaked at 88.7%, speaker identification dropped to an average of 73.6%, and speech recognition ranged from 33.3% to 41.6%.

Test results on the OnePlus 9 (arxiv.org)

Using the loudspeaker and the ‘Spearphone’ app that the researchers developed while experimenting with a similar attack in 2020, caller gender and identity accuracy reached 99%, while speech recognition reached 80% accuracy.

Limitations and solutions

One thing that could reduce the effectiveness of the EarSpy attack is the volume that users choose for their ear speakers. A lower volume might prevent eavesdropping through this side-channel attack and is also more comfortable on the ear.

The arrangement of the hardware components of the device and the tightness of the assembly also affect the diffusion of reverberation from the speaker.

Finally, movement of the user or vibrations introduced from the environment reduce the accuracy of the derived voice data.

Android 13 has introduced a restriction that prevents apps from collecting sensor data at sampling rates above 200 Hz without permission. While this prevents speech recognition at the default sampling rate (400 Hz to 500 Hz), it only reduces the accuracy by approximately 10% if the attack is performed at 200 Hz.
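Why the sampling rate matters can be illustrated with the Nyquist limit: a sensor sampled at a given rate can only represent vibrations up to half that rate, and faster vibrations fold back (alias) to lower apparent frequencies. The sketch below is a generic signal-processing illustration, not code from the paper; the helper names and the 150 Hz example tone are assumptions.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency that can be represented at the given sampling rate."""
    return sample_rate_hz / 2.0

def alias_hz(tone_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling at sample_rate_hz."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

# Rates taken from the text: the 200 Hz cap and the upper default rate of 500 Hz.
for rate in (200, 500):
    print(f"{rate} Hz sampling: Nyquist limit {nyquist_hz(rate):.0f} Hz, "
          f"a 150 Hz vibration appears at {alias_hz(150, rate):.0f} Hz")
```

At 500 Hz the illustrative 150 Hz vibration is captured at its true frequency, while at 200 Hz it folds down to 50 Hz. Some information survives in distorted form, which is consistent with the attack losing accuracy at the capped rate rather than failing outright.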

The researchers suggest that phone manufacturers should ensure that sound pressure remains stable during calls and place motion sensors where vibrations from internal sources do not affect them, or at least have as little impact as possible.
