RFM

http://ijmbr.srbiau.ac.ir/article_3557_dc97e3dbfddcaa76b88f9958dce27d3b.pdf

https://www.retentionscience.com/blog/rfm-king/

https://dl.acm.org/citation.cfm?id=2983281

https://www.analyticsindiamag.com/a-heuristic-approach-to-predictive-modeling-rfm-analysis/

https://econpapers.repec.org/paper/haljournl/hal-00788060.htm

http://www.simafore.com/blog/bid/159575/How-to-use-RFM-analysis-for-customer-segmentation-and-classification

https://www.academia.edu/16622416/Segmentation_approaches_in_data-mining_A_comparison_of_RFM_CHAID_and_logistic_regression?auto=download

https://ideas.repec.org/a/eee/jbrese/v67y2014i1p2751-2758.html

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.197.8595&rep=rep1&type=pdf

https://link.springer.com/chapter/10.1007%2F978-0-387-72579-6_12

https://springml.com/blog/customer-segmentation-combining-rfm-and-predictive-algorithms/

https://www.researchgate.net/publication/259098195_Data_Accuracy's_Impact_on_Segmentation_Performance_Benchmarking_RFM_Analysis_Logistic_Regression_and_Decision_Trees

https://www.semanticscholar.org/paper/Data-accuracy's-impact-on-segmentation-performance%3A-Coussement-Bossche/8ad8703f06cdb62682f5e6c2099339cf9bd45699

https://www.sciencedirect.com/science/article/abs/pii/S0148296312002615

http://aircconline.com/ijcsit/V11N1/11119ijcsit04.pdf

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4986115/

https://www.academia.edu/8466246/Cluster_analysis_new_approach_for_RFM

https://online-journals.org/index.php/i-jes/article/viewFile/6532/4749

https://ijms.ut.ac.ir/article_65616_522808ea990c016e17f60ff114300f1d.pdf

https://www.optimove.com/resources/learning-center/customer-segmentation-via-cluster-analysis

https://www.kaggle.com/hendraherviawan/customer-segmentation-using-rfm-analysis-r

https://www.quora.com/Does-it-make-sense-use-RFM-+-KMeans-+-Cohort

http://iaiest.com/dl/journals/5-%20IAJ%20of%20Accounting%20and%20Financial%20Management/v3-i6-jun2016/paper3.pdf

https://m.scirp.org/papers/58942

https://file.scirp.org/pdf/JDAIP_2015082109475463.pdf

https://scialert.net/fulltextmobile/?doi=itj.2012.1193.1201

http://www.irjabs.com/files_site/paperlist/r_289_121025150620.pdf

https://dl.acm.org/ft_gateway.cfm?id=3007873&ftid=1821105&dwn=1&CFID=707617967&CFTOKEN=73373672

https://www.researchgate.net/publication/321948055_Study_on_Customer_Rating_Using_RFM_and_K-Means

https://github.com/anonyth/customer-segmentation

https://www.semanticscholar.org/paper/Customer-clustering-using-RFM-analysis-Aggelis-Christodoulakis/0ecc47793934afa8570054e13c02c97e49ecb710

http://www.ijceas.com/index.php/ijceas/article/download/174/pdf/

https://s3.amazonaws.com/assets.datacamp.com/production/course_10628/slides/chapter4.pdf


https://earlconf.com/2017/downloads/london/presentations/EARL2017_-_London_-_Alexander_Campbell_-_Customer_segmentation.pdf

https://www.sciencedirect.com/science/article/pii/S1877050910003868

https://www.sciencedirect.com/science/article/pii/S1319157818304178

https://link.springer.com/chapter/10.1007/978-1-84882-762-2_63

http://www.kimberlycoffey.com/blog/2016/8/k-means-clustering-for-customer-segmentation

https://medium.com/@vijaya.a.patil/rfm-analysis-for-customer-segmentation-using-hierarchical-k-means-clustering-c89b92b55ba9

https://sureoptimize.com/targeted-marketing-with-customer-segmentation-and-rfm-analysis-part-1

https://sureoptimize.com/customer-segmentation-and-rfm-analysis-kmeans-clustering-part-2


https://www.wheatongroup.com/articles/superiority-of-tree-analysis-over-rfm-how-it-enhances-regression


https://www.wheatongroup.com/articles/the-superiority-of-statistics-based-predictive-models-versus-rfm-cells


https://www.brighttalk.com/webcast/12529/345822/demo-of-customer-segmentation-using-rfm-analysis-for-retailers


https://link.springer.com/article/10.1007/s10997-018-9447-3

https://link.springer.com/content/pdf/10.1057/palgrave.jt.5740131.pdf

https://oroinc.com/orocrm/doc/2.6/user-guide-marketing-tools/magento/rfm-user


https://www.blastam.com/blog/rfm-analysis-boosts-sales


https://www.optimove.com/resources/learning-center/rfm-segmentation


https://books.google.co.in/books?id=vztXDQAAQBAJ&pg=PR20&lpg=PR20&dq=rfm+analysis+advanced+topics+and+use+cases&source=bl&ots=O-balKJmf_&sig=ACfU3U0x6-WGmM8KdizU5-vLT96qXP_sjA&hl=en&sa=X&ved=2ahUKEwiRsay7wdrkAhWUaCsKHfVRA244KBDoATAEegQICRAB#v=onepage&q=rfm%20analysis%20advanced%20topics%20and%20use%20cases&f=false


https://www.chegg.com/tutors/Statistics-questions/Tuscan-RFM-data-as-attached-only-answer-below-question-is-fine-Q4-Examine-the-first-20-or-so-observations-in-the-database-What-do-you-notice-about-the-RFMSEQ-and-RFMIND-values-That-is-do-the-two-approaches-generally-yield-the-same-RFM-index-for-any-given---CS7BJ/


https://dl.acm.org/citation.cfm?id=2401614


http://w3.salemstate.edu/~gsmith/help/files/SPSS17/SPSS%20EZ%20RFM%2017.pdf


http://www.whitecapers.com/travel-and-hospitality.php


https://esputnik.com/en/blog/practical-rfm-analysis-increase-repeat-sales


http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.468.4579&rep=rep1&type=pdf


https://www.kdnuggets.com/2019/05/golden-goose-cohort-analysis.html/2


http://www.diva-portal.org/smash/get/diva2:1017684/FULLTEXT02


https://www.jimnovo.com/RFM-book.htm


http://www.b-eye-network.com/view/10256


https://stackoverflow.com/questions/42990566/rfm-analysis-with-postgresql


https://cooldata.wordpress.com/2014/03/25/an-all-sql-way-to-automate-rfm-scoring/

Analytics Topics in Healthcare

1. Access to healthcare

2. Automation and augmentation

3. Behavioral Health

4. Biosensor devices and data

5. Chronic disease management

6. Claims processing

7. Clinical decision support

8. Cognitive functioning/decline

9. Disease Onset and Progression Detection (early, low-cost, etc.)

10. Digital therapeutics (e.g., use of technology to augment or replace drugs in disease treatment)

11. Fraud detection & prevention

12. Healthcare data (including EHRs & EMRs)

13. Patient engagement

14. Patient experience (including Stars ratings and other metrics)

15. Patient monitoring (including in-home)

16. Patient Reported Outcomes (PROs)

17. Pharmacy and Rx related services

18. Population Health

19. Precision Medicine

20. Provider services and support (including practice management, growth strategies, etc.)

21. Risk management

22. Improving Affordability by Eliminating Surprise Billing

23. Self-diagnosis and self-care

24. Social Determinants of Health (SDOH)

25. Value-based care (and other outcomes-based payment models)

26. Virtual care/telehealth

27. Wellness and well-being

Markov Boundary Feature Selection - Causal Feature Selection

GitHub page of Sisi Ma: https://github.com/SisiMa1729/Causal_Feature_Selection

GitHub demo on lung cancer: https://github.com/SisiMa1729/Causal_Feature_Selection/tree/master/Demo/Bhattacharjee2001

GitHub demo on survival analysis: https://github.com/SisiMa1729/Causal_Feature_Selection/tree/master/Demo/Vijver2002

Book to refer to: Guyon I, Aliferis C. Causal feature selection. In: Computational Methods of Feature Selection. Chapman and Hall/CRC, 2007, pp. 75-97.

Predictive modeling uses statistics to predict an outcome: the predictor can be the data itself, the standard deviation of the data, or in fact any statistic we can use to predict the outcome.

Types of models:

Diagnostic models: determining subtypes of breast cancer.

Prognostic models: predicting time-to-death or time-to-relapse for cancer patients.

Risk assessment models: predicting the risk of PTSD after trauma.

The goal of a model is to use a high-dimensional feature space to predict an outcome. Two problems arise: (1) underfitting and (2) overfitting. Overfitting is the more severe problem when the number of features is high and the sample size is limited or small.

Overfitting is dealt with via regularization: support vector machines, random forests, and lasso regression all have built-in regularization that penalizes model complexity. An overfitted model may not work well on new data.

Feature selection: selecting a subset of features, from all the available features, for constructing the predictive model.

Question: How many subsets can I make out of N total features?

Answer: 2^N distinct feature subsets (e.g., N = 20 features already gives over a million subsets).

Follow-up question: Which subset of features would be best for predicting the outcome?

Answer: In principle, this can be found by cross-validating candidate subsets, as in the sketch below.
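
As an illustration, here is a minimal sketch of that exhaustive search, assuming scikit-learn, a small synthetic dataset, and a logistic-regression learner (all of these are my choices, not prescribed above); it only works for small N because the 2^N subsets explode quickly:

```python
# Exhaustive feature-subset search scored by cross-validation.
# Feasible only for small N, since there are 2^N candidate subsets.
from itertools import combinations

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           random_state=0)
n_features = X.shape[1]

best_score, best_subset = -np.inf, None
for k in range(1, n_features + 1):
    for subset in combinations(range(n_features), k):
        score = cross_val_score(LogisticRegression(max_iter=1000),
                                X[:, list(subset)], y, cv=5).mean()
        if score > best_score:
            best_score, best_subset = score, subset

print(f"Best subset {best_subset} with CV accuracy {best_score:.3f}")
```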

Goals of feature selection:

  • Improve model predictive performance
  • Enhance model interpretability
  • Increase cost-efficiency of model development


For selecting features, an intuitive and popular strategy is to select features that are univariately associated with the target of interest.

However, features that are useless for prediction by themselves can become useful when combined with other features (i.e., interactions among features can be predictive). Selection based on the predictivity/relevance of individual features may therefore miss useful features.

It is critical to consider combinations of features and the information they jointly carry with respect to the target of interest. (This looks to me like an LDA or supervised-PCA style of approach; need to confirm.) The toy sketch below makes the point concrete.
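
A toy illustration of this point, assuming scikit-learn and an XOR-style synthetic target (my own construction, not from the notes): each feature alone is uninformative, but the two together predict the target almost perfectly.

```python
# Two features that are individually useless but jointly predictive
# (XOR-style target): why univariate screening can miss useful features.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
x1 = rng.integers(0, 2, size=2000)
x2 = rng.integers(0, 2, size=2000)
y = x1 ^ x2                       # target depends only on the interaction
X = np.column_stack([x1, x2])

for cols, label in [([0], "x1 alone"), ([1], "x2 alone"), ([0, 1], "x1 and x2")]:
    acc = cross_val_score(DecisionTreeClassifier(), X[:, cols], y, cv=5).mean()
    print(f"{label}: CV accuracy = {acc:.2f}")
# x1 or x2 alone scores about 0.5 (chance); together they score about 1.0.
```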


What would be really nice is if we could determine in advance how small the optimal feature set will be.


If a feature is strongly relevant, it means that the part of the information it provides for explaining the target cannot be replaced by any other feature.

 

By classifying every feature as strongly relevant, weakly relevant, or irrelevant, we can formally put all features into one of three categories and define a strategy.

Weak features do not improve predictive performance if all strong features are already selected, but keeping a few weak features does not deteriorate performance.


How to select features: methodology

  • In recursive feature selection we rank the features by importance/strength using general methods such as the following (see the sketch after this list):
    • Support vector machines can be used for feature selection: the coefficients on the normalized features indicate feature importance.
    • Random forests provide Gini-based importance scores that separate strong, weak, and irrelevant features.
    • These methods require a nested cross-validation setup to extract the features.
    • Sometimes features are dropped/eliminated one at a time, but we should also try dropping multiple weak or irrelevant features together, with cross-validation for testing, to get the optimal subset.
    • Lasso regularization presses for a simple model by penalizing complex models.
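
A minimal sketch of these non-causal selectors, assuming scikit-learn and a synthetic dataset (the estimators, parameters, and data are illustrative choices, not the only options):

```python
# Non-causal feature selection sketches: recursive feature elimination with a
# linear SVM, random-forest (Gini) importances, and L1/lasso regularization.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=30, n_informative=5,
                           random_state=0)

# Recursive feature elimination: repeatedly drop the weakest features
# (smallest SVM coefficients), with cross-validation picking how many to keep.
rfe = RFECV(estimator=LinearSVC(dual=False), step=1, cv=5).fit(X, y)
print("RFE kept features:", list(rfe.get_support(indices=True)))

# Random forest: Gini-based importance scores rank strong vs. weak features.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print("Top RF importances:", rf.feature_importances_.argsort()[::-1][:5])

# Lasso-style (L1) regularization presses for a simple model by zeroing out
# the coefficients of weak or irrelevant features.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("Non-zero L1 coefficients:", int((lasso.coef_ != 0).sum()))
```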


Recursive feature selection, random forests, SVMs, and lasso regression are non-causal feature selection methods: they try to maximize predictive performance while minimizing the size of the feature subset used by the model. These methods do not directly use the concepts of the different types of feature relevance, but in practice they produce models with excellent predictive performance in various application domains.

But is this the optimal subset? The methods above do not check whether weak or irrelevant features might become relevant in the presence of other features. This motivates causal feature selection. Causal feature selection improves interpretability but not necessarily predictive performance, because the non-causal methods above already produce models with high predictive performance. With causal feature selection we try to determine an optimal feature set, with similar predictive performance, such that all other features become independent of the target (that is, variables outside the Markov boundary give no additional information).

Finding the Markov boundary is an NP-complete problem, even for linear regression.

The Markov boundary (MB) identified in practice depends on the learner and on the evaluation metric: it changes if the learning algorithm changes or if the evaluation metric/threshold changes.


Causal Feature Selection

Causal feature selection (CFS) works on the concepts of feature relevance and the joint distribution.

Ideally, to use CFS we would know the data-generating process, so that we could draw the causal structure among the variables and then obtain the joint distribution.

The graph below shows that, if we know the data-generating process, CFS can find the optimal subset of features to describe the feature of interest (i.e., the target).

For the graph above, if I know X1, X2, X3, X4 and X5 then I do not need to know the other variables: information passes through these variables, which lie inside the Markov boundary and can explain the variable of interest (i.e., the target). The information in variables outside the Markov boundary is already embedded in the variables inside it.

The Markov boundary (direct causes + direct effects + direct causes of the direct effects) is the minimal feature set that contains all information regarding the target of interest. Causal feature selection is also sometimes equated with Markov boundary discovery when the two overlap; this happens when a special condition is met.

In the image above, the direct causes are X1, X2 and X3, and the direct effects are X4 and X5. These variables give the best performance because they capture all the information about the target variable.
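
As a sketch of the definition above, the snippet below reads the Markov boundary off a small hypothetical DAG (the graph, including an extra spouse variable X6 and the outside variables X7 and X8, is my own illustration, not the figure from the notes), assuming the networkx library:

```python
# Markov boundary of a target in a known DAG: direct causes (parents) +
# direct effects (children) + direct causes of the direct effects (spouses).
import networkx as nx

g = nx.DiGraph()
g.add_edges_from([
    ("X1", "T"), ("X2", "T"), ("X3", "T"),   # direct causes of T
    ("T", "X4"), ("T", "X5"),                # direct effects of T
    ("X6", "X4"),                            # direct cause of an effect (spouse)
    ("X7", "X1"), ("X5", "X8"),              # outside the Markov boundary
])

def markov_boundary(dag, target):
    parents = set(dag.predecessors(target))
    children = set(dag.successors(target))
    spouses = {p for c in children for p in dag.predecessors(c)} - {target}
    return parents | children | spouses

print(markov_boundary(g, "T"))  # X1..X6; X7 and X8 carry no extra information
```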

Benefit of Causal Feature Selection:

  • Selected features have causal interpretations.
  • The selected feature set is generally smaller than with non-causal methods such as random forest, SVM, or lasso regression.
  • Models generalize better under certain types of distribution shift.


Causal feature selection selects variables that lie closely around the target and does a better job than non-causal feature selection. In the image below we see an example of lasso (recursive feature selection), where the selected variables are spread all around, compared with causal feature selection, shown in the small zoomed-in box on the right-hand side of the image. The Markov blanket / causal feature selection tries to find the features that make all other features independent of the target. These features are neighbors of the target variable and cannot be blocked, since they have a parent-child relationship with it.

Benefits of Causal Feature Selection in comparison to Non Causal Feature Selection


Assumptions:

  1. Markov condition: every variable is independent of its non-descendants given its parents.
  2. Faithfulness assumption: (in)dependencies stem only from the network structure and not from the parameterization of the distribution.
    1. Some independencies are stated explicitly by the Markov condition; others are entailed using probability theory.
    2. In Bayesian networks, the d-separation criterion determines all conditional (in)dependencies entailed by the Markov condition: it identifies the open (unblocked) paths through which information passes and the closed (blocked) paths where information does not pass.
    3. Under the faithfulness assumption the Markov blanket/boundary is unique; the variables inside the Markov blanket cannot be replaced.


We need to learn about the partial correlation test to understand whether a variable is connected to another variable only through a middle variable.

If the residuals used in the partial correlation still show structure, there is a remaining (conditional) relationship, as in the sketch below.
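
A minimal sketch of that residual view of partial correlation, using numpy and synthetic data in which X and Y are linked only through Z (a construction of mine for illustration):

```python
# Partial correlation of X and Y given Z, computed from residuals:
# regress X on Z and Y on Z, then correlate the two residual series.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)
x = 2.0 * z + rng.normal(size=n)   # X depends on Z
y = -1.5 * z + rng.normal(size=n)  # Y depends on Z, not directly on X

def residuals(a, b):
    """Residuals of regressing a on b (with an intercept)."""
    design = np.column_stack([np.ones_like(b), b])
    coef, *_ = np.linalg.lstsq(design, a, rcond=None)
    return a - design @ coef

r_xy = np.corrcoef(x, y)[0, 1]
r_xy_given_z = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]
print(f"corr(X, Y)     = {r_xy:.3f}")          # strong, induced entirely by Z
print(f"corr(X, Y | Z) = {r_xy_given_z:.3f}")  # near zero: X independent of Y given Z
```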

How to find the Markov boundary (the strongly relevant features):

Conditional independence tests. (Two dependent quantities can become conditionally independent once we learn about a third variable; the opposite can also happen, with two independent quantities becoming conditionally dependent in the context of other variables.) Common tests include the following (a sketch of a conditional test for categorical variables follows this list):
    • Fisher's test
    • X^2 (chi-square) and G^2 (G-square) tests for categorical variables
    • Conditional mutual information and distance correlation (for non-linear tests)
    • Comparison of nested models
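
A sketch of one such test for categorical variables, assuming scipy and synthetic binary data that are conditionally independent given Z (my own construction): a G^2 (log-likelihood ratio) test of X vs. Y is run within each stratum of Z and the statistics are pooled.

```python
# Conditional independence test G^2(X, Y | Z) for categorical data:
# test X vs. Y within each stratum of Z, then pool statistics and dof.
import numpy as np
from scipy.stats import chi2, chi2_contingency

rng = np.random.default_rng(1)
n = 6000
z = rng.integers(0, 3, size=n)                          # conditioning variable
x = (rng.random(n) < 0.2 + 0.3 * (z > 0)).astype(int)   # X depends on Z
y = (rng.random(n) < 0.2 + 0.3 * (z > 0)).astype(int)   # Y depends on Z only

def g2_test(a, b):
    """G^2 test of independence between two binary variables."""
    table = np.zeros((2, 2), dtype=int)
    np.add.at(table, (a, b), 1)
    g2, p, dof, _ = chi2_contingency(table, lambda_="log-likelihood")
    return g2, p, dof

# Marginal test: X and Y look dependent (they share the common cause Z).
print("marginal p-value:", g2_test(x, y)[1])

# Conditional test: pool the per-stratum statistics -> X independent of Y given Z.
g2_total = dof_total = 0
for level in np.unique(z):
    g2, _, dof = g2_test(x[z == level], y[z == level])
    g2_total, dof_total = g2_total + g2, dof_total + dof
print("conditional p-value:", chi2.sf(g2_total, dof_total))
```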


      The p-values of these statistical tests can be read as follows:

      • A small p-value means dependence is highly likely.
      • A large p-value means independence, or simply that we cannot tell.
      • Conditioning on large sets makes p-values unreliable (it loses statistical power).


      Some algorithms that implement this are:

      Forward-Backward Selection algorithm

      How it works (see the sketch after this list):

      • We start forward selection with all candidate variables and compute, from the data, the p-value of each variable's association with the target T.
      • Select the variable with the smallest p-value.
        • At every step, recompute the p-values of the remaining variables given the variables selected so far.
        • Try to remove variables that have become independent (large p-values).
      • Then, in backward selection, we remove the false-positive variables that were picked up during forward selection.
        • This removes variables from the forward-selection set that have become independent given the other selected variables, again by checking for large p-values.
      • The entire algorithm works by computing statistical p-values from the data.
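
      A compact sketch of this procedure, assuming statsmodels, a synthetic linear dataset, and the t-test p-value of each candidate's coefficient (given the already selected variables) as the conditional independence test; the details are my choices, not the only way to implement it:

```python
# Forward-backward selection driven by p-values (sketch).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, p, alpha = 500, 15, 0.05
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - 1.5 * X[:, 3] + X[:, 7] + rng.normal(size=n)  # true set {0, 3, 7}

def pvalue(j, cond):
    """p-value of feature j's coefficient given the features in `cond`."""
    cols = sorted(cond) + [j]
    fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
    return fit.pvalues[-1]

selected = set()

# Forward phase: repeatedly add the candidate with the smallest p-value.
while True:
    remaining = [j for j in range(p) if j not in selected]
    if not remaining:
        break
    pvals = {j: pvalue(j, selected) for j in remaining}
    best = min(pvals, key=pvals.get)
    if pvals[best] >= alpha:
        break
    selected.add(best)

# Backward phase: drop false positives that became independent of y
# given the other selected features.
for j in sorted(selected):
    if pvalue(j, selected - {j}) >= alpha:
        selected.discard(j)

print("Selected features:", sorted(selected))  # typically [0, 3, 7]
```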

      Explanation:

      • Simple, easy, general algorithm.
      • Suitable when one has the statistical power to condition on all the selected features.
      • Theorem: Forward-Backward Search returns the Markov blanket of T in distributions faithful to a Bayesian network with latent variables, given perfect tests of conditional independence (CI).
      • Complexity (in number of CI tests): O(n * s), where n is the total number of features and s is the number of selected features.
      • Rediscovered as Incremental Association Markov Blanket (Tsamardinos et al., 2003a) and as Grow-Shrink (Margaritis & Thrun, 2000), the latter with a static ordering of features.
      • When one has enough sample size to condition on all features, just perform backward search.
      • A variation of Forward-Backward Selection is Forward-Backward Selection with Early Dropping (single run) (Borboudakis & Tsamardinos, 2017).
        • The disadvantage of this fast algorithm is the appearance of false negatives.

      Max-Min Parents and Children (MMPC): conditioning on subsets (Tsamardinos et al., 2003b)

      • Here we first fix a subset size k; for each variable, conditional independence is tested given subsets of size up to k, and if the resulting p-value is high the variable is dropped.
      • K should be greater than 5.
      • It may return features that are not parents or children (false positives), but it is computationally faster.
      • The algorithm is greedy in some sense.
      • Explanation (a sketch of the max-min heuristic follows this list):
        • Suitable when one has the statistical power to condition only on up to k features.
        • Theorem: MMPC (with symmetry correction and large-enough k) returns the neighbors (parents and children) of T in distributions faithful to a Bayesian network with latent variables.
          • Without the symmetry correction it returns a small superset.
        • A simple extension (MMMB) returns the full Markov blanket.
        • Complexity (in number of CI tests): O(n * s^k).
        • Typically conditioning on only k = 3 or 4 suffices for excellent results.
        • Max-Min heuristic: select the variable that has the largest minimum conditional association (or smallest maximum p-value) with T, conditioned on all possible subsets of the selected features.
          • The smallest maximum p-value minimizes a bound on the false discovery rate of neighbors (Tsamardinos and Brown, 2008).
        • Works very well for binary prediction, survival analysis, and high-dimensional datasets (e.g., genomics) for dimension reduction.
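
      A simplified sketch of the max-min heuristic only (not the full MMPC algorithm, which also has a symmetry/backward step), reusing the regression-based conditional independence test and conditioning on subsets of size up to k; the dataset and parameters are illustrative assumptions:

```python
# Max-min heuristic (sketch): for each candidate, take its maximum p-value over
# all conditioning subsets (size <= k) of the selected set; then add the
# candidate whose maximum p-value is smallest; stop when even that exceeds alpha.
from itertools import combinations

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, p, k, alpha = 500, 12, 2, 0.05
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 1] - 2.0 * X[:, 4] + rng.normal(size=n)   # true neighbors {1, 4}

def ci_pvalue(j, cond):
    """p-value of X_j vs. y conditioned on the features in `cond`."""
    cols = list(cond) + [j]
    fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
    return fit.pvalues[-1]

def max_pvalue(j, selected):
    subsets = [()]
    for size in range(1, min(k, len(selected)) + 1):
        subsets += list(combinations(selected, size))
    return max(ci_pvalue(j, s) for s in subsets)

selected = []
while True:
    remaining = [j for j in range(p) if j not in selected]
    if not remaining:
        break
    scores = {j: max_pvalue(j, selected) for j in remaining}
    best = min(scores, key=scores.get)
    if scores[best] >= alpha:
        break
    selected.append(best)

print("MMPC-style selection:", sorted(selected))  # typically [1, 4]
```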

      The PC-Simple algorithm is the simplest implementation:

      • It examines the univariate association (or correlation) of each variable with the target.
      • It removes all variables that show no (univariate) association.
      • It then runs conditional independence tests of each remaining variable conditioned on one other variable.
      • It then iterates, running conditional independence tests conditioned on sets of two or more variables.

      In the example, the Markov boundary ends up containing variables B, C, F and H to explain the target T.


      The R package is MXM. The algorithms implemented in the package include backward search, forward-backward search, FBED, MMPC, MMMB, and SES (for multiple solutions).


      For very large data, the challenge is how to calculate the local p-values.

      1. We can use Early Dropping
        1. Same as Forward-Backward with Early Dropping
        2. Filter out features as soon as deemed conditionally independent of T
      2. Early Stopping
        1. Stop computing local statistics as soon as it is deemed that a feature will not be selected for inclusion/exclusion in the forward/backward phase
      3. Early Return
        1. Stop computations as soon as enough samples have been seen to determine a good enough feature for inclusion/exclusion.

      Causal feature selection models help us understand the feature selection problem in a non-parametric way and are arguably the main tool in knowledge discovery and in designing new algorithms.

      A past anecdote I learned from a book:



      There are several ways to describe the centre and spread of a distribution. One way to present this information is with a five-number summary. It uses the median as its centre value and gives a brief picture of the other important distribution values. Another measure of spread uses the mean and standard deviation to decipher the spread of data. This technique, however, is best used with symmetrical distributions with no outliers.


      Despite this restriction, the mean and standard deviation measures are used more commonly than the five-number summary. The reason for this is that many natural phenomena can be approximately described by a normal distribution. And for normal distributions, the mean and standard deviation are the best measures of centre and spread respectively.


      Standard deviation takes every value into account, has extremely useful properties when used with a normal distribution, and is mathematically manageable. But the standard deviation is not a good measure of spread in highly skewed distributions and, in these instances, should be supplemented by other measures such as the semi-quartile range.


      The semi-quartile range is rarely used as a measure of spread, partly because it is not as manageable as the others. Still, it is a useful statistic because it is less influenced by extreme values than the standard deviation, is less subject to sampling fluctuations in highly skewed distributions, and is based on only two values, Q1 and Q3. However, it cannot stand alone as a measure of spread.
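
      For concreteness, a small sketch (numpy, with a synthetic right-skewed sample of my choosing) computing the five-number summary, the mean and standard deviation, and the semi-quartile range:

```python
# Five-number summary vs. mean/standard deviation vs. semi-quartile range
# on a right-skewed sample, where the mean and SD are less representative.
import numpy as np

rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # highly skewed

minimum, q1, median, q3, maximum = np.percentile(data, [0, 25, 50, 75, 100])
print("Five-number summary:", minimum, q1, median, q3, maximum)
print("Mean / std dev     :", data.mean(), data.std(ddof=1))
print("Semi-quartile range:", (q3 - q1) / 2)
```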

      Designing a Digital Nudge

      While a number of researchers have suggested guidelines for selecting and implementing nudges in offline contexts, information systems present unique opportunities for harnessing the power of nudging. For example, Web technologies allow real-time tracking and analysis of user behavior, as well as personalization of the user interface, and both can help test and optimize the effectiveness of digital nudges; moreover, mobile apps can provide a wealth of information about the context (such as location and movement) in which a choice is made. Given these advantages, information systems allow rapid content modification and visualization to achieve the desired nudging effect.

      Drawing on guidelines for implementing nudges in offline contexts, we now highlight how designers can create digital nudges by exploiting the inherent advantages of information systems. Just as developing an information system follows a cycle, as in, say, the systems development life cycle—planning, analysis, design, and implementation—so does designing choices to nudge users (see Figure 5)—define the goal, understand the users, design the nudge, and test the nudge. We discuss each step in turn, focusing on the decisions designers must make.

      Studies have demonstrated that when consumers miss an opportunity to purchase a product for a significantly reduced price, they are less likely to purchase this product later for its regular price or for a less significantly reduced price. Two possible explanations for this inaction-inertia effect were considered: avoidance of regret (reluctance to purchase the product represents an attempt to avoid regret over missing the better price) and price contrast (reluctance to purchase the product results from a simple price comparison process). The results of 3 experiments favored the avoidance-of-regret explanation.

      Figure 5. Designing digital nudges follows a cycle; based on Datta and Mullainathan and Ly et al.

      Step 1: Define the goal. Designers must first understand an organization's overall goals and keep them in mind when designing particular choice situations. For instance, the goal of an e-commerce platform is to increase sales, the goal of a governmental taxing authority's platform is to make filing taxes easier and encourage citizens to be honest, and the goal of project creators on crowdfunding platforms is to increase pledges and overall donation amounts. These goals determine how choices are to be designed, particularly the type of choice to be made. For example, subscribing to a newsletter is a binary choice—yes/no, agree/disagree—selecting between items is a discrete choice, and donating monetary amounts is a continuous choice, though it could also be presented as a discrete choice. The type of choice determines the nudge to be used (see the table here). The choice architect, however, must consider not only the goals but also the ethical implications of deliberately nudging people into making particular choices, as nudging people toward decisions that are detrimental to them or their wellbeing is unethical and might thus backfire, leading to long-term negative effects for the organization providing the choice. In short, overall organizational goals and ethical considerations drive the design of choice situations, a high-level step that influences all subsequent design decisions.

      Step 2: Understand the users. People's decision making is susceptible to heuristics and biases. Heuristics, commonly defined as "rules of thumb," can facilitate human decision making by reducing the amount of information to be processed when addressing simple, recurrent problems. Conversely, heuristics can influence decisions negatively by introducing cognitive biases (systematic errors) when one faces complex judgments or decisions that should require more extensive deliberation. Researchers have studied a wide range of psychological effects that subconsciously influence people's behavior and decision making. In addition to the middle-option bias, decoy effect, and scarcity effect described earlier, common heuristics like the "anchoring-and-adjustment" heuristic, or people being influenced by an externally provided value, even if unrelated; the "availability" heuristic, or people being influenced by the vividness of events that are more easily remembered; and the "representativeness" heuristic, or people relying on stereotypes when encountering and assessing novel situations, influence how alternatives are evaluated and what options are ultimately selected. Other heuristics and biases that can have a strong effect on choices include the "status quo bias," or people tending to favor the status quo so they are less inclined to change default options; the "primacy and recency effect," or people recalling options presented first or last more vividly, so those options have a stronger influence on choice; and "appeals to norms," or people tending to be influenced by the behavior of others. Understanding these heuristics and biases and the potential effects of digital nudges can thus help designers guide people's online choices and avoid the trap of inadvertently nudging them into decisions that might not align with the organization's overall goals.

      Step 3: Design the nudge. Once the goals are defined (see Step 1: Define the goal) and the heuristics and biases are understood (see Step 2: Understand the users), the designer can select the appropriate nudging mechanism(s) to guide users' decisions in the designer's intended direction. Common nudging frameworks a designer could use to select appropriate nudges include the Behavior Change Technique Taxonomy,NUDGE, MINDSPACE, and Tools of a Choice Architecture. Selecting an appropriate nudge and how to implement it through available design elements, or user-interface patterns, is determined by both the type of choice to be made—binary, discrete, or continuous—and the heuristics and biases at play; see the table for examples. For example, a commonly used nudge in binary choices is to preselect the desired option to exploit the status quo bias. When attempting to nudge people in discrete choices, choice architects can choose from a variety of nudges to nudge people toward a desired option. For example, in the context of crowdfunding, with the goal of increasing pledge amounts, choice architects could present the desired reward option as the default option; add (unattractive) choices as decoys; present the desired option first or last to leverage primacy and recency effects; or arrange the options so as to present the preferred reward as the middle option. When attempting to nudge people in continuous choices (such as when soliciting monetary donations), choice architects could pre-populate input fields (text boxes) with a particular value so as to exploit the "anchoring and adjustment" effect. Likewise, when using a slider to elicit numerical responses, the position of the slider and the slider endpoints serve as implicit anchors. Presenting others' choices next to rewards to leverage people's tendency to conform to norms or presenting limited availability of rewards to exploit the scarcity effect can be used to nudge people in binary, discrete, or continuous choices.

      As the same heuristic can be addressed through multiple nudges, in most situations, designers have a variety of "nudge implementations" at their disposal. Unlike in offline environments, implementing nudges in digital environments can be done at relatively low cost, as system designers can easily modify a system's user interface (such as by setting defaults, displaying/hiding design elements, or providing information on others' pledges). Likewise, digital environments enable dynamic adjustment of the options presented on the basis of certain attributes or characteristics of the individual user (such as when a crowdfunding platform presents particular rewards depending on the backers' income, gender, or age). Notwithstanding the choice of nudges, designers should follow commonly accepted design guidelines for the respective platforms (such as Apple's Human Interface Guidelines and Microsoft's Universal Windows Platform design guidelines) to ensure consistency and usability.



      Step 4: Test the nudge. Digital environments allow alternative designs to be generated easily, so their effects can be tested quickly, especially when designing websites. The effectiveness of digital nudges can be tested through online experiments (such as A/B testing and split testing). Testing is particularly important, as the effectiveness of a nudge is likely to depend on both the context and goal of the choice environment and the target audiences. For example, a digital nudge that works well in one context (such as a hotel-booking site like https://www.booking.com) may not work as well in a different context (such as a car-hailing service like https://www.uber.com); such differences may be due to different target users, the unique nature of the decision processes, or even different layouts or color schemes on the webpages; a hotel may use colors and shapes that evoke calmness and cleanliness, whereas a car-hailing service may use colors and shapes that evoke speed and efficiency. As choice architects have various nudge implementations at their disposal, thorough testing is thus imperative for finding the nudge that works best for a given context and users.
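
      As a concrete illustration of such a test, the sketch below compares the conversion rates of two hypothetical nudge variants with a chi-square test of independence (the counts and variant descriptions are made up, and scipy is assumed):

```python
# Evaluating an A/B test of two nudge variants: compare conversion rates
# with a chi-square test of independence. All counts are hypothetical.
from scipy.stats import chi2_contingency

conversions_a, visitors_a = 130, 2400   # variant A (e.g., default preselected)
conversions_b, visitors_b = 180, 2400   # variant B (e.g., social-norm message)

table = [
    [conversions_a, visitors_a - conversions_a],
    [conversions_b, visitors_b - conversions_b],
]
chi2_stat, p_value, dof, expected = chi2_contingency(table)

print(f"Variant A rate: {conversions_a / visitors_a:.3%}")
print(f"Variant B rate: {conversions_b / visitors_b:.3%}")
print(f"chi2 = {chi2_stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests the difference between variants is unlikely to be
# chance alone; a large one means keep testing or try another nudge design.
```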

      Especially in light of the increasing focus on integrating user-interface design and agile methodologies, using discount usability techniques (such as heuristic evaluation, as introduced by Nielsen) is often recommended to support rapid development cycles (see, for example, Jurca et al.). Likewise, agile methodologies include the quick collection of feedback from real users. However, such feedback from conscious evaluations should be integrated with caution because the effects of nudges are based on subconscious influences on behavior, and experimental evaluations can provide more reliable results. If a particular nudge does not produce the desired effect, a first step for system designers is to evaluate the nudge implementation to determine whether the nudge is, say, too obvious or not obvious enough (see Step 3: Design the nudge). In some instances, though, reexamining the heuristics or biases that influence the decision-making process (see Step 2: Understand the users) or even returning to Step 1: Define the goal and redefining the goals may be necessary (see the sidebar, "Questions Designers Need to Address").


      Conclusion

      Understanding digital nudges is important for the overall field of computing because user-interface designers create most of today's choice environments. With increasing numbers of people making choices through digital devices, user-interface designers become choice architects who knowingly or unknowingly influence people's decisions. However, user-interface design often focuses primarily on usability and aesthetics, neglecting the potential behavioral effects of alternative designs. Extending the body of knowledge of the computing profession through insights into digital nudging will help choice architects leverage the effects of digital nudges to support organizational goals. Choice architects can use the digital nudging design cycle we have described here to deliberately develop such choice environments.

      Figure. Applying the digital nudging design cycle (selected examples).

      One final note of caution is that the design of nudges should not follow a "one-size-fits-all" approach, as their effectiveness often depends on a decision maker's personal characteristics. In digital environments, characteristics of users and their environment can be inferred from a large amount of data, allowing nudges to be tailored. System designers might design the choice environment to be adaptive on the basis of, say, users' past decisions or demographic characteristics. Likewise, big-data analytics can be used to analyze behavioral patterns observed in real time to infer users' personalities, cognitive styles, or even emotional states. For example, Bayesian updating can be used to infer cognitive styles from readily available clickstream data and automatically match customers' cognitive styles to the characteristics of the website (such as through "morphing"). Designers of digital choice environments can attempt to "morph" digital nudges on the basis of not only the organizational goals but also users' personal characteristics.

      Any designer of a digital choice environment must be aware of its effects on users' choices. In particular, when developing a choice environment, designers should carefully define the goals, understand the users, design the nudges, and test those nudges. Following the digital-nudging design cycle we have laid out here can help choice architects achieve their organizational goals by understanding both the users and the potential nudging effects so intended effects can be maximized and/or unintended effects minimized.