A third of PyPi software packages contain a flaw that lets code execute when downloaded

Python programming code is seen on a computer screen. Checkmarx published research Friday showing that approximately one-third of software packages from the Python Package Index (PyPi) are vulnerable to a design feature that allows an attacker to execute code. (Image credit: traffic_analyzer via Getty)

Approximately one-third of software packages from the Python Package Index (PyPi) are vulnerable to a design feature that allows an attacker to automatically execute code when a package is downloaded to a computer.

The findings, discovered by Checkmarx and published Friday, underscore how open source software repositories like PyPi are increasingly being targeted and leveraged by malicious actors. The company said that “a large number of the malicious packages we are finding in the wild use this feature of code execution upon installation to achieve higher infection rates.”

According to Tzachi Zorenshtain, head of supply chain security at Checkmarx, when developers install a software package from repositories like PyPi, most understand there’s also a risk of installing any malicious code that goes with it.

“When we actually examined the behavior and looked for new attack vectors, we discovered that if you download a malicious package — just download it — it will automatically run on your computer,” he told SC Media in an interview from Israel. “So we tried to understand why, because for us the word download doesn’t necessarily mean that the code will automatically run.”

But for PyPi, it does. The commands for both processes run a tool called pip, which executes another file called setup.py that is designed to give the package manager a data structure describing how to handle the package. That file is itself Python code that runs automatically, meaning an attacker can insert malicious code that executes on the device of anyone who downloads the package.
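
To make the mechanism concrete, here is a minimal and deliberately harmless sketch. It does not invoke pip or setuptools. It assumes a hypothetical stand-in for setup.py and runs it with Python's standard runpy module, only to illustrate that such a file is ordinary Python whose top-level statements run before any package metadata is read. The names FAKE_SETUP_PY, METADATA and run_build_script are made up for this illustration.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical, harmless stand-in for a package's setup.py. A real one would
# normally call setuptools.setup(...); that call is left out so this sketch
# runs without any third-party dependency.
FAKE_SETUP_PY = '''
from pathlib import Path

# Any top-level statement runs as soon as the file is executed.
Path(__file__).with_name("side_effect.txt").write_text("ran before metadata was read")

# Stand-in for the metadata a real setup.py would hand to the package manager.
METADATA = {"name": "example-package", "version": "0.0.1"}
'''


def run_build_script(workdir: Path) -> dict:
    """Write the stand-in script into workdir and execute it as plain Python."""
    script = workdir / "setup.py"
    script.write_text(FAKE_SETUP_PY)
    # run_path executes the file and returns its resulting global variables.
    script_globals = runpy.run_path(str(script))
    return script_globals["METADATA"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        metadata = run_build_script(Path(tmp))
        print("metadata:", metadata)
        print("side effect happened:", (Path(tmp) / "side_effect.txt").exists())
```

In this sketch the side effect is a small text file written next to the script. In a malicious package the same position in the file could hold any code the attacker chooses.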

In fact, this specific vulnerability was called out as far back as 2014 on GitHub, but hasn’t been directly addressed because the flaw is more a feature of how software is frequently downloaded and installed from the repository than a bug and cannot be directly patched.

“It's an unfortunate fact of the Python packaging ecosystem that anything related to packaging always involves arbitrary code execution (referring to setup.py),” one GitHub user wrote in July 2014.

In recent years, PyPi has introduced a new wheel (.whl) file type that removes the need to run setup.py altogether, but for compatibility reasons it still allows contributors to choose their preferred format. That means many packages on PyPi, up to a third according to Checkmarx, still use the vulnerable tar.gz format, and malicious actors can be expected to deliberately choose the older format in order to spread their code.
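
The difference between the two formats is visible in the file name alone. The following is a rough sketch, not part of Checkmarx's research, that sorts a release's files into wheels and source archives by extension. The function names and the example file names are hypothetical, and treating .zip as a source archive is an assumption made for this illustration.

```python
def distribution_kind(filename: str) -> str:
    """Classify a distribution file by extension (illustrative heuristic only)."""
    name = filename.lower()
    if name.endswith(".whl"):
        return "wheel"  # prebuilt format; no setup.py is run at install time
    if name.endswith((".tar.gz", ".zip")):
        return "sdist"  # source archive; may carry a setup.py that gets executed
    return "unknown"


def is_source_only(filenames: list[str]) -> bool:
    """True if a release offers at least one source archive and no wheel."""
    kinds = {distribution_kind(f) for f in filenames}
    return "sdist" in kinds and "wheel" not in kinds


if __name__ == "__main__":
    # Hypothetical file names for two made-up releases.
    print(is_source_only(["example_pkg-1.0.tar.gz"]))
    print(is_source_only(["example_pkg-1.0.tar.gz", "example_pkg-1.0-py3-none-any.whl"]))
```

On the installer side, pip has an --only-binary=:all: option that tells it not to use source distributions at all, so packages published only in the older format will fail to install rather than run their setup.py.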

There are other workarounds, such as downloading the package through your browser, that can avoid using the setup.py process altogether. Beyond that, Zorenshtain expects the vulnerability to be exploited in packages using the older file format for years to come.

“What is most alarming for us is this isn’t a vulnerability that’s going to be fixed easily,” said Zorenshtain, later adding, “If we magically changed all the formats and everything is resubmitted and filed to the new format, then it would be easy to remove this behavior. We understand that this behavior will probably be with us for a little while, so at least [building] awareness is what was important to us.”

A request for comment and questions sent to the Python Software Foundation, which manages PyPi as a free community resource, were not returned at press time.