terewmen.blogg.se - Market Basket Analysis In R

# Market Basket Analysis in R, a Free Programming Language

R is a free programming language for statistical computing and graphics, widely used in the data science community for data analysis. In this article, we show how to perform a market basket analysis using R and Neural Designer.

Market basket analysis using machine learning, by Pablo Martin, Artelnics.

In a previous blog post, we discussed how supermarkets use data to better understand consumer needs and, ultimately, increase overall spend. One of the key techniques used by large retailers is called Market Basket Analysis (MBA), which uncovers associations between products by looking for combinations of products that frequently co-occur in transactions. In other words, it allows supermarkets to identify relationships between the products that people buy. For example, customers who buy a pencil and paper are likely to also buy a rubber or a ruler.

Data Mining and Business Analytics with R. Published by John Wiley & Sons, 2013.

Retailers use MBA in a number of ways, including:

- Grouping products that co-occur in the design of a store’s layout to increase the chance of cross-selling;
- Driving online recommendation engines (“customers who purchased this product also viewed this product”); and
- Targeting marketing campaigns by sending out promotional coupons to customers for products related to items they recently purchased.

Given how popular and valuable MBA is, we thought we’d produce the following step-by-step guide describing how it works and how you could go about undertaking your own Market Basket Analysis.

## How does Market Basket Analysis work?

To carry out an MBA you’ll first need a data set of transactions. Each transaction represents a group of items or products that have been bought together and is often referred to as an “itemset”. For example, one itemset might be {pencil, paper, rubber}. MBA then looks for rules such as {pencil, paper} ⇒ {rubber}: customers who buy the items on the left-hand side are likely to also buy the item on the right-hand side.
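
A transaction data set is essentially a list of itemsets. The minimal base-R sketch below stores a few hypothetical baskets (the items and the `count_itemset()` helper are made up for illustration) as a list of character vectors, converts them to a logical transaction-by-item matrix, and counts how many baskets contain a given itemset.

```r
# Hypothetical toy data: each list element is one transaction (an itemset).
transactions <- list(
  c("pencil", "paper", "rubber"),
  c("pencil", "paper"),
  c("paper", "ruler"),
  c("pencil", "paper", "rubber", "ruler"),
  c("bread", "milk")
)

# All distinct items, sorted so that the matrix columns have a stable order.
items <- sort(unique(unlist(transactions)))

# Logical incidence matrix: one row per transaction, one column per item.
# An entry is TRUE when the item appears in that transaction.
incidence <- t(vapply(transactions,
                      function(basket) items %in% basket,
                      logical(length(items))))
colnames(incidence) <- items

# Hypothetical helper: number of transactions containing every item in `itemset`.
count_itemset <- function(itemset, incidence) {
  sum(rowSums(incidence[, itemset, drop = FALSE]) == length(itemset))
}

count_itemset(c("pencil", "paper", "rubber"), incidence)  # 2
```

If the arules package is installed, the same list can be coerced to its transaction class with `as(transactions, "transactions")`.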

To decide which rules are useful, we measure the strength of each rule using the following three metrics (other metrics are available, but these are the three most commonly used):

- Support: the percentage of transactions that contain all of the items in an itemset (e.g., pencil, paper and rubber). The higher the support, the more frequently the itemset occurs. Rules with high support are preferred since they are likely to apply to a large number of future transactions.
- Confidence: the probability that a transaction containing the items on the left-hand side of the rule (in our example, pencil and paper) also contains the item on the right-hand side (a rubber). The higher the confidence, the greater the likelihood that the item on the right-hand side will be purchased or, in other words, the greater the return rate you can expect for a given rule.
- Lift: the probability of all of the items in a rule occurring together (otherwise known as the support) divided by the product of the probabilities of the left-hand and right-hand sides, which is the probability of them occurring together if there were no association between them. For example, if pencil, paper and rubber occurred together in 2.5% of all transactions, pencil and paper in 10% of transactions and rubber in 8% of transactions, then the lift would be 0.025 / (0.1 × 0.08) = 3.125. A lift greater than 1 suggests that the presence of pencil and paper increases the probability that a rubber will also occur in the transaction.
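
For a rule $X \Rightarrow Y$ over a set of transactions $T$, the three metrics can be written as follows (the notation is ours, not the original post's):

$$
\operatorname{supp}(X) = \frac{|\{t \in T : X \subseteq t\}|}{|T|}, \qquad
\operatorname{conf}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}, \qquad
\operatorname{lift}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)\,\operatorname{supp}(Y)}
$$

A minimal base-R sketch of the same definitions on hypothetical toy baskets follows. `support()`, `confidence()` and `lift()` are illustrative helpers defined here, not package functions; arules computes these measures itself when it mines rules.

```r
# Hypothetical toy baskets, one character vector per transaction.
transactions <- list(
  c("pencil", "paper", "rubber"),
  c("pencil", "paper"),
  c("paper", "ruler"),
  c("pencil", "paper", "rubber", "ruler"),
  c("bread", "milk")
)

# Support: share of transactions that contain every item in the itemset.
support <- function(itemset, transactions) {
  mean(vapply(transactions, function(tr) all(itemset %in% tr), logical(1)))
}

# Confidence of lhs => rhs: supp(lhs together with rhs) / supp(lhs).
confidence <- function(lhs, rhs, transactions) {
  support(c(lhs, rhs), transactions) / support(lhs, transactions)
}

# Lift of lhs => rhs: supp(lhs together with rhs) / (supp(lhs) * supp(rhs)).
lift <- function(lhs, rhs, transactions) {
  support(c(lhs, rhs), transactions) /
    (support(lhs, transactions) * support(rhs, transactions))
}

lhs <- c("pencil", "paper")
rhs <- "rubber"
support(c(lhs, rhs), transactions)  # 2 of 5 baskets: 0.4
confidence(lhs, rhs, transactions)  # 0.4 / 0.6, approx. 0.667
lift(lhs, rhs, transactions)        # 0.4 / (0.6 * 0.4), approx. 1.667

# The worked example from the text, using the quoted percentages directly.
0.025 / (0.10 * 0.08)               # 3.125
```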

Rules are typically generated using the Apriori algorithm, which works in two steps:

1. Systematically identify itemsets that occur frequently in the data set, i.e. with a support greater than a pre-specified threshold.
2. Calculate the confidence of all possible rules given the frequent itemsets, and keep only those with a confidence greater than a pre-specified threshold.

The support and confidence thresholds are user-specified and are likely to vary between transaction data sets. The arules package in R does have default values, but we recommend that you experiment with these to see how they affect the number of rules returned (more on this below). Finally, although the Apriori algorithm does not use lift to establish rules, you’ll see in the following that we use lift when exploring the rules that the algorithm returns.

## Performing Market Basket Analysis in R

To demonstrate how to carry out an MBA, we’ve chosen to use R and, in particular, the arules package. For those who are interested, we’ve included the R code we used at the end of this blog. Here, we follow the same example used in the arulesViz vignette and use a data set of grocery sales that contains 9,835 individual transactions with 169 items.
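
A minimal sketch of loading the grocery data and running Apriori with arules, assuming the package is installed. The `Groceries` data set ships with arules, and `apriori()` takes the support and confidence thresholds through its `parameter` list. The threshold values below are illustrative and not necessarily the ones behind the results in this post.

```r
# Assumes arules is installed, e.g. via install.packages("arules").
library(arules)

# Groceries ships with arules: 9,835 transactions over 169 item categories.
data("Groceries")
summary(Groceries)

# apriori() with its defaults (support = 0.1, confidence = 0.8) ...
rules_default <- apriori(Groceries)
length(rules_default)

# ... versus user-specified thresholds (illustrative values).
rules <- apriori(Groceries,
                 parameter = list(support = 0.001, confidence = 0.5))
length(rules)

# Count the rules returned over a small grid of support thresholds,
# holding confidence at 0.5, to see how sensitive the result is.
supports <- c(0.01, 0.005, 0.001)
n_rules <- sapply(supports, function(s) {
  length(apriori(Groceries,
                 parameter = list(support = s, confidence = 0.5),
                 control = list(verbose = FALSE)))
})
data.frame(support = supports, rules = n_rules)
```

Lowering either threshold can only add rules, never remove them, so the counts in the final table grow as the support threshold falls.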

We start by looking at how frequently each item is purchased. This is equivalent to the support of each item when each itemset contains only that single item. The resulting bar plot illustrates the groceries that are frequently bought at this store, and it is notable that the support of even the most frequent items is relatively low (for example, the most frequent item, whole milk, occurs in only around 25% of transactions). We use these insights to inform the minimum threshold when running the Apriori algorithm; for example, we know that for the algorithm to return a reasonable number of rules, we’ll need to set the support threshold well below 0.25.

Rather than using the thresholds to reduce the rules down to a small set, it is usual to return a larger set of rules so that there is a greater chance of generating relevant rules. One way to find the useful ones is to rank the rules by lift.

Table 1: The five rules with the largest lift.

These rules seem to make intuitive sense. For example, the first rule might represent the sort of items purchased for a BBQ, the second for a movie night and the third for baking.

Alternatively, we can use visualisation techniques to inspect the set of rules returned and identify those that are likely to be useful. Using the arulesViz package, we plot the rules by confidence, support and lift in Figure 2.
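
The steps in this section can be sketched with arules and arulesViz, assuming both packages are installed: compute single-item supports, mine rules with a low support threshold, rank them by lift, and draw a scatter plot and a graph plot similar in spirit to the figures described here. The threshold values are illustrative only.

```r
# Assumes arules and arulesViz are installed.
library(arules)
library(arulesViz)

data("Groceries")

# Relative item frequencies, i.e. the support of each single-item itemset.
freq <- itemFrequency(Groceries, type = "relative")
head(sort(freq, decreasing = TRUE), 5)
itemFrequencyPlot(Groceries, topN = 20, type = "relative")

# Support threshold well below the single-item supports seen above
# (illustrative values, not necessarily those used for Table 1).
rules <- apriori(Groceries,
                 parameter = list(support = 0.001, confidence = 0.5),
                 control = list(verbose = FALSE))

# The five rules with the largest lift.
top5 <- sort(rules, by = "lift")[1:5]
inspect(top5)

# Scatter plot of support against confidence, shaded by lift.
plot(rules, measure = c("support", "confidence"), shading = "lift")

# Graph-based view of the ten rules with the highest lift.
plot(sort(rules, by = "lift")[1:10], method = "graph")
```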

It has been shown that the optimal rules are those that lie on what’s known as the “support-confidence boundary”. Essentially, these are the rules on the right-hand border of the plot: for each of them, no other rule has both higher support and higher confidence. The plot function in the arulesViz package has a useful interactive mode that allows you to select individual rules (by clicking on the associated data point), which makes the rules on the border easy to identify.

Figure 3: Graph-based visualisation of the top ten rules in terms of lift.

Market Basket Analysis is a useful tool for retailers who want to better understand the relationships between the products that people buy. The trickiest parts of the analysis are usually setting the support and confidence thresholds and identifying which of the returned rules are worth pursuing. Typically the latter is done by measuring the rules in terms of metrics that summarise how interesting they are, using visualisation techniques and also more formal multivariate statistics. Ultimately, the key to MBA is to extract value from your transaction data by building up an understanding of the needs of your consumers. This type of information is invaluable if you are interested in marketing activities such as cross-selling or targeted campaigns.

If you’d like to find out more about how to analyse your transaction data, please contact us and we’d be happy to help.