November 30, 2021

Magecart Classification Using a Deep Learning Approach

By Jscrambler

Strict data protection regulations such as the GDPR make it imperative for web applications to be as secure as possible. Adopting secure practices is advisable not only from a legal point of view but also from a business perspective, since keeping full control over your application helps protect the profitability of the business behind it.

When it comes to client-side threats, which often result in the loss of data and customers, the Finance and E-commerce sectors are among the biggest victims. In E-commerce specifically, web skimmers (e.g. Magecart) are known to illegally capture payment information, such as credit card numbers and other critical data.

There are other relevant threats, such as the injection of price comparison ads, but the bottom line is that both the E-commerce and Financial Services sectors are prime targets for attackers looking to steal personal user data and transaction-related information. In all of these scenarios, leaving the client side unprotected represents a compliance breach, often leading to huge penalties.

To answer this key business need of securing the client side, Jscrambler Webpage Integrity (WPI) provides a holistic approach that aims to mitigate every client-side threat, including:

  • Supply chain attacks, such as Magecart and formjacking
  • Man-in-the-Browser (MitB) trojans, bots, zero-day threats, and Advanced Persistent Threats (APTs)
  • Customer journey hijacking

Magecart Detection Experiment

In line with this approach, our team ran an experiment focused specifically on Magecart detection as part of the AppOwl project, a cooperation between Jscrambler and INESC TEC. The project consists of a Magecart classification system built on data generated in-house. To gather this data, our team relied on the WPI embedded agent to track webpage changes and to record details about each change, including the stack of function calls and the script that triggered it, among others.
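
As a rough illustration of the kind of record such an agent could produce, the Python sketch below models one observed change and flattens a visit into an ordered token sequence. The field names (`change_type`, `target`, `script_url`, `call_stack`), the token format, and the example URL are assumptions made for this example, not the WPI agent's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChangeEvent:
    """One observed webpage change (hypothetical schema, not WPI's real one)."""
    change_type: str   # e.g. "script_injected", "listener_changed"
    target: str        # affected element, e.g. "head", "button"
    script_url: str    # script that triggered the change
    call_stack: List[str] = field(default_factory=list)  # calls that led to it


def to_token(event: ChangeEvent) -> str:
    """Collapse an event into a coarse token a sequence model could consume."""
    return f"{event.change_type}:{event.target}"


def visit_to_sequence(events: List[ChangeEvent]) -> List[str]:
    """Preserve the order in which changes were observed during one visit."""
    return [to_token(e) for e in events]


# Made-up events for illustration only.
visit = [
    ChangeEvent("script_injected", "head", "https://example.test/skimmer.js", ["inject"]),
    ChangeEvent("listener_changed", "button", "https://example.test/skimmer.js", ["hook", "bind"]),
]
print(visit_to_sequence(visit))
```

Keeping the events in observation order matters here because a sequential model such as an LSTM learns from the order of changes, not just from which changes occurred.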

Our team used a browser plugin to trigger the download of a real Magecart script tailored to each webpage. For each visit, browser automation software navigated through the set of webpages that a normal user would typically visit to sign in and check out or pay for items, filling in the fields on these webpages. Once the Magecart script was inserted into the browser, it stole the data in those fields and sent it to a local server. During each webpage request, however, the Magecart script runs together with the rest of the webpage’s scripts, and depending on the webpage and the request, different scripts run and make changes to the webpage. This is where the detection comes in.

Our team used two models for detecting Magecart: a dense neural network and a sequential LSTM (long short-term memory) model, both using their default implementations in Keras (TensorFlow). Each model was trained for a different number of epochs over the training data, stopping at the point where an additional epoch would yield little improvement in performance.
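
The stopping criterion can be sketched in plain Python as follows. This is a minimal illustration under assumptions: the function name, the `min_delta` threshold, and the score values are invented for the example, and the post does not state which metric the team monitored. Keras offers a comparable built-in mechanism through its `EarlyStopping` callback, though the source does not say whether it was used.

```python
from typing import Sequence


def epochs_until_plateau(scores: Sequence[float], min_delta: float = 0.005) -> int:
    """Return the epoch count after which one more epoch improves the score
    by less than `min_delta`.

    `scores[i]` is the score measured after epoch i + 1 (higher is better).
    """
    if not scores:
        raise ValueError("scores must contain at least one epoch")
    for i in range(1, len(scores)):
        if scores[i] - scores[i - 1] < min_delta:
            return i  # epoch i + 1 added too little, so stop after epoch i
    return len(scores)  # still improving at the end of the recorded history


# Made-up per-epoch accuracies for illustration only.
history = [0.71, 0.83, 0.90, 0.902, 0.903]
print(epochs_until_plateau(history))
```

With these made-up numbers the fourth epoch improves the score by only about 0.002, so the sketch settles on three epochs. A different model with a slower learning curve would settle on a different count, which is one way to end up with a different number of epochs per model.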

Experiment Results

The test results show that the classifiers correctly detect a specific set of sequences labeled as Magecart, which we can safely interpret as Magecart behavior. Those sequences include the injection of the script into the document’s head, its download and execution, the modification (poisoning) of the “onclick” event on different objects in the DOM (e.g. button, div), and finally the collection of user data and its exfiltration from the browser. Our team’s classifier is also able to detect Magecart behavior even when some occurrences are missing because of data capture problems.
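
To make the shape of such a sequence concrete, here is a small rule-based sketch in Python. It is deliberately not the neural classifier described in this post: it only counts how many of the stages above appear in the expected order, ignoring unrelated events and tolerating missing ones. The stage labels and the `min_stages` threshold are assumptions chosen for the example.

```python
from typing import List, Sequence

# Illustrative labels for the stages described above (not real agent output).
MAGECART_STAGES = [
    "script_injected_in_head",
    "script_downloaded",
    "script_executed",
    "onclick_changed",
    "data_exfiltrated",
]


def stages_matched_in_order(events: Sequence[str]) -> int:
    """Size of the longest subsequence of stages found in the expected order.

    Unrelated events are ignored and missing stages are tolerated.
    """
    indices = [MAGECART_STAGES.index(e) for e in events if e in MAGECART_STAGES]
    best: List[int] = []  # best[k] = longest increasing subsequence ending at k
    for k, idx in enumerate(indices):
        prev = [best[j] for j in range(k) if indices[j] < idx]
        best.append(1 + max(prev, default=0))
    return max(best, default=0)


def looks_like_magecart(events: Sequence[str], min_stages: int = 4) -> bool:
    """Flag a visit when enough stages occur in order (arbitrary threshold)."""
    return stages_matched_in_order(events) >= min_stages


observed = [
    "script_injected_in_head",
    "style_changed",      # unrelated page activity
    "script_executed",    # the download event was not captured
    "onclick_changed",
    "onclick_changed",
    "data_exfiltrated",
]
print(stages_matched_in_order(observed), looks_like_magecart(observed))
```

In this made-up visit the download event is missing, yet four of the five stages still appear in order. A learned classifier can tolerate this kind of gap without a hand-written threshold, which is part of the appeal of the deep learning approach.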

Then, after training a model on a given website, our team applied it to another website targeted by the same Magecart script to understand whether the model could be reused. The results show that model reuse is feasible, although not every model trained on one website generalizes correctly to other websites. The nature of the website and the sequences of occurrences that characterize non-Magecart behavior might influence the classification results when reusing models.
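
A reuse check of this kind can be organized as a cross-site score table. The Python sketch below is an assumption-laden illustration: the site names, event labels, and the two toy "models" (simple rules standing in for trained classifiers) are all invented, and accuracy is used only because it is the simplest metric to show.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[str], bool]      # (event sequence, is_magecart label)
Model = Callable[[Sequence[str]], bool]  # returns True when it predicts Magecart


def accuracy(model: Model, samples: List[Sample]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(model(seq) == label for seq, label in samples) / len(samples)


def cross_site_accuracy(
    models: Dict[str, Model], data: Dict[str, List[Sample]]
) -> Dict[Tuple[str, str], float]:
    """Score each model (keyed by its training site) on every site's samples."""
    return {
        (trained_on, tested_on): accuracy(model, samples)
        for trained_on, model in models.items()
        for tested_on, samples in data.items()
    }


# Toy stand-ins for trained classifiers: simple rules, not neural networks.
models = {
    "shop_a": lambda seq: "data_exfiltrated" in seq,
    "shop_b": lambda seq: "onclick_changed" in seq,
}
data = {
    "shop_a": [(["onclick_changed", "data_exfiltrated"], True), (["onclick_changed"], False)],
    "shop_b": [(["onclick_changed", "data_exfiltrated"], True), (["style_changed"], False)],
}
for pair, score in cross_site_accuracy(models, data).items():
    print(pair, score)
```

In this toy setup the rule "trained" on shop_b misfires on shop_a, because shop_a legitimately changes click handlers during normal use. That mirrors the observation above that a website's non-Magecart behavior can affect how well a reused model performs.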

Final Thoughts

Overall, the experiments conducted by Jscrambler’s Research team show that Jscrambler Webpage Integrity is capable of accurately detecting Magecart behavior in websites, allowing companies to effectively block these behaviors and prevent web skimming attacks regardless of the attack vector.

Interested in getting a copy of the full research? Please email: [email protected]


Author
Jscrambler
The leader in client-side Web security. With Jscrambler, JavaScript applications become self-defensive and capable of detecting and blocking client-side attacks like Magecart.