Statistical Process Control and Process Capability: key highlights
1. Reduction of process variability
2. Monitoring and surveillance of a process
3. Estimation of product or process parameters
If a product is to meet or exceed customer expectations, it should generally be produced by a
process that is stable or repeatable.
The seven major tools of SPC are:
1. Histogram or stem-and-leaf plot
2. Check sheet
3. Pareto chart
4. Cause-and-effect diagram
5. Defect concentration diagram
6. Scatter diagram
7. Control chart (Shewhart chart) - technically the most sophisticated
The proper deployment of SPC helps create an environment
in which all individuals in an organization seek continuous improvement in quality
and productivity. This environment is best developed when management becomes involved in
the process. Once this environment is established, routine application of the magnificent
seven becomes part of the usual manner of doing business, and the organization is well on its
way to achieving its quality improvement objectives.
To understand the statistical concepts that form the basis of SPC, we must first describe Shewhart's theory of variability.
In any production process, regardless of how well designed or carefully maintained it is, a certain
amount of inherent or natural variability will always exist. This natural variability or
“background noise” is the cumulative effect of many small, essentially unavoidable causes. In
the framework of statistical quality control, this natural variability is often called a “stable
system of chance causes.” A process that is operating with only chance causes of variation
present is said to be in statistical control. In other words, the chance causes are an inherent
part of the process.
Other kinds of variability may occasionally be present in the output of a process. This
variability in key quality characteristics usually arises from three sources: improperly
adjusted or controlled machines, operator errors, or defective raw material. Such variability is
generally large when compared to the background noise, and it usually represents an unacceptable
level of process performance. We refer to these sources of variability that are not part
of the chance cause pattern as assignable causes of variation. A process that is operating in
the presence of assignable causes is said to be an out-of-control process.
Processes will often operate in the in-control state for relatively long periods of time.
However, no process is truly stable forever, and, eventually, assignable causes will occur,
seemingly at random, resulting in a shift to an out-of-control state where a larger proportion
of the process output does not conform to requirements.
The chart contains a center line that represents the average value of
the quality characteristic corresponding to the in-control state. (That is, only chance
causes are present.) Two other horizontal lines, called the upper control limit (UCL) and
the lower control limit (LCL), are also shown on the chart. These control limits are chosen
so that if the process is in control, nearly all of the sample points will fall between
them.
If the process is in control, all the
plotted points should have an essentially random pattern.
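As a minimal sketch (with hypothetical sample data, not from the text), the center line and three-sigma control limits can be computed and each plotted point checked against them:

```python
import statistics

# Hypothetical sample means from a process assumed to be in control.
sample_means = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.1]

center = statistics.mean(sample_means)       # center line (in-control average)
sigma = statistics.stdev(sample_means)       # estimated standard deviation

# Three-sigma control limits around the center line.
ucl = center + 3 * sigma
lcl = center - 3 * sigma

# Points outside the limits would signal an out-of-control condition.
out_of_control = [x for x in sample_means if not (lcl <= x <= ucl)]
print(f"CL={center:.3f}  UCL={ucl:.3f}  LCL={lcl:.3f}")
print("out-of-control points:", out_of_control)
```

In practice the limits would be estimated from a reference sample taken while the process is believed to be in control, not from the same points being monitored.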
There is a close connection between control charts and hypothesis testing.
The hypothesis testing framework is useful in many ways, but there are some differences
in viewpoint between control charts and hypothesis tests. For example, when testing statistical
hypotheses, we usually check the validity of assumptions, whereas control charts are used to
detect departures from an assumed state of statistical control.
In general, we should not worry
too much about assumptions such as the form of the distribution or independence when we are
applying control charts to a process to reduce variability and achieve statistical control.
Furthermore, an assignable cause can result in many different types of shifts in the process
parameters. For example, the mean could shift instantaneously to a new value and remain there
(this is sometimes called a sustained shift); or it could shift abruptly; but the assignable cause
could be short-lived and the mean could then return to its nominal or in-control value; or the
assignable cause could result in a steady drift or trend in the value of the mean. Only the sustained
shift fits nicely within the usual statistical hypothesis testing model.
One place where the hypothesis testing framework is useful is in analyzing the performance
of a control chart. For example, we may think of the probability of type I error of the
control chart (concluding the process is out of control when it is really in control) and the
probability of type II error of the control chart (concluding the process is in control when it
is really out of control). It is occasionally helpful to use the operating-characteristic curve of
a control chart to display its probability of type II error. This would be an indication of the
ability of the control chart to detect process shifts of different magnitudes. This can be of
value in determining which type of control chart to apply in certain situations. For more discussion
of hypothesis testing, the role of statistical theory, and control charts, see Woodall
(2000).
We may give a general model for a control chart. Let w be a sample statistic that measures
some quality characteristic of interest, and suppose that the mean of w is μw and the
standard deviation of w is σw. Then the center line and control limits become

UCL = μw + L·σw
Center line = μw
LCL = μw - L·σw

where L is the distance of the control limits from the center line, expressed in standard
deviation units of w.
A very important part of the corrective action process associated with control chart
usage is the out-of-control-action plan (OCAP). An OCAP is a flow chart or text-based
description of the sequence of activities that must take place following the occurrence of an
activating event. These are usually out-of-control signals from the control chart. The OCAP
consists of checkpoints, which are potential assignable causes, and terminators, which are
actions taken to resolve the out-of-control condition, preferably by eliminating the assignable
cause. It is very important that the OCAP specify as complete a set as possible of checkpoints
and terminators, and that these be arranged in an order that facilitates process diagnostic
activities. Often, analysis of prior failure modes of the process and/or product can be helpful
in designing this aspect of the OCAP. Furthermore, an OCAP is a living document in the sense
that it will be modified over time as more knowledge and understanding of the process is
gained. Consequently, when a control chart is introduced, an initial OCAP should accompany
it. Control charts without an OCAP are not likely to be useful as a process improvement tool.
Designing a control chart requires specifying three things: the sample size, the width of the
control limits, and the sampling frequency. In the x-bar chart example, we specified a sample
size of five measurements, three-sigma control limits, and a sampling frequency of once every
hour. Increasing the sample size reduces the probability of a type II error.
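The effect of sample size on the type II error can be shown with the standard x-bar chart result β = Φ(L − δ√n) − Φ(−L − δ√n), where δ is the mean shift in process standard deviation units:

```python
import math

def phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def beta_risk(delta, n, L=3.0):
    """Type II error of an x-bar chart with L-sigma limits for a mean
    shift of delta process standard deviations."""
    root_n = math.sqrt(n)
    return phi(L - delta * root_n) - phi(-L - delta * root_n)

# For a one-sigma shift, beta falls as the sample size grows,
# i.e. the chart detects the shift more readily.
for n in (1, 5, 10):
    print(n, round(beta_risk(1.0, n), 4))
```

Plotting β against the shift magnitude for a fixed n gives the operating-characteristic curve mentioned above.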
implementing integration management
managing scope
Integration management is Chapter 4 of the PMBOK Guide (scope management is Chapter 5); we will get into more detail in that guide.
domain - initiating - 13 percent
26 questions
conduct project selection methods
define the scope
document project risks, assumptions and constraints
identify and perform stakeholder analysis
develop the project charter
obtain project charter approval
domain - planning - 24 percent
48 questions
define and record requirements, constraints and assumptions
create the WBS
create a budget plan
develop the project schedule and timeline
create the human resource management plan
create the communications plan
develop the project procurement plan
establish the project quality management plan
define the change management plan
create the project risk management plan
present the project management plan to the key stakeholders
host the project kick-off meeting
domain: executing - 30 percent (getting things done; the most heavily weighted domain on the PMP exam)
60 questions
manage project resources for project execution
enforce the quality management plan
implement approved changes as directed by the change management plan
execute the risk management plan to manage and respond to risk events
develop the project team through mentoring, coaching and motivation
domain: monitoring and controlling - 25 percent
50 questions
measure project performance
verify and manage changes to the project
ensure project deliverables conform to quality standards
monitor all risks and update the risk register
review corrective actions and assess issues
manage project communications to ensure stakeholder engagement
https://www.edx.org/course/healthcare-finance-economics-and-risk
https://ocw.mit.edu/courses/economics/14-01sc-principles-of-microeconomics-fall-2011/index.htm
https://www.jhsph.edu/academics/online-learning-and-courses/
https://www.pce.uw.edu/certificates/health-care-analytics
https://www.pce.uw.edu/degrees/masters-health-informatics-health-information-management
https://www.pce.uw.edu/degrees/executive-masters-health-administration
https://marksmanhealthcare.com/
Modeling Term | Description |
Features | A set of explanatory variables collected on subjects or samples. Commonly referred to as the independent variables or covariates in the statistical and epidemiological literature |
Labels | The outcome or response of interest. Also referred to as dependent variable or target variable. |
Supervised Learning | Algorithms that map a set of input variables (e.g. features) to output variables (e.g. labels). Describes the vast majority of tasks in machine learning in healthcare. |
Unsupervised Learning | Algorithms that attempt to extract hidden or latent structure in a set of features. Popular examples of unsupervised learning include clustering (e.g. k-means clustering) and dimensionality reduction techniques (e.g. principal components analysis (PCA)). In contrast to supervised learning (see above). |
Causal Inference | Statistical methods that attempt to estimate the effect of an intervention. When using observational (non-experimental data), these methods require additional modeling assumptions drawn from domain knowledge. |
Zero-shot Learning | Using a model to make predictions for a task despite having no training data for that task. |
Bias (Statistical) | Systematic difference between the true value of a parameter in a model and the value of that parameter as estimated from data. Can also refer to the systematic difference between the predicted values from a model and the true values of the labels. |
Word Sense Disambiguation | Learning which similar sounding words might have different meanings. For example, “discharge” can indicate the time a patient leaves the hospital or it might refer to the flow of fluid from part of the body. |
Generative Adversarial Networks (GANs) | Class of machine learning systems that allows for creation of synthetic data similar to a provided dataset through the use of two neural networks functioning as a discriminator and a generator |
Generative Models | Class of models that allow for modeling of both the features and label variables together, as opposed to discriminative models which model the conditional probability of the label given the features |
Matrix Factorization | Mathematical technique that factorizes one large and dense matrix (e.g. patient biomarker values) into lower-dimensional matrices |
Data Term | Description |
Bias (Fairness) | Variation in human or model performance based on features of the data that reflect societal biases. |
Confounding | Variables (potentially unmeasured) that affect both the treatment and outcome of interest. Confounding can cause bias in the statistical sense if not controlled or accounted for. |
Missing Data or Missingness | Portions of the data that are unobserved. Missingness can refer to the scenario when values are missing for certain patients (e.g. a missing lab value for a patient) or to the scenario when a potentially relevant variable is not measured at all across every patient. |
Training Data | Data that was used to build a model. |
Measurement Drift | When the data gathered on a population may change noticeably over time (e.g. world population becoming more obese). |
Imputation | Replacing missing values in the dataset (e.g. with the mean) in order to do analysis with data points with missing features |
Sparsity | Rareness of certain events resulting in few observations of "positive" examples. Sparsity can occur in both the features and labels. |
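The imputation entry above can be illustrated with a minimal sketch (the lab values are hypothetical); missing entries are replaced with the mean of the observed values:

```python
# Mean imputation: fill missing lab values with the feature mean
# so downstream analysis can use every patient record.
values = [7.1, None, 6.8, None, 7.4]  # hypothetical lab results; None = missing

observed = [v for v in values if v is not None]
mean = sum(observed) / len(observed)

imputed = [v if v is not None else mean for v in values]
print(imputed)
```

Mean imputation is only one simple option; it can distort variance and correlations, which is why the table flags missingness as a modeling challenge in its own right.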
Common Problems in ML | Problem | Short-Term Solution | Long-Term Outlook |
Complex Data Challenges | Data Quality Matters: sparsity, missingness, and biased sampling make modeling difficult. | Data aggregation and imputation techniques, such as sparse encoding methods or matrix factorization, can be used to deal with a lack of "full" data. Synthetic data which preserves privacy allows the sharing of EHR data. | Creation of high-quality research data containing robust documentation of all aspects of the data generation process. |
Complex Data Challenges | Disease Data Imbalances: health conditions are the result of sporadic diseases, leading to highly unbalanced data. | Modified loss functions for important classes and data subsampling are often quick fixes. | Patient self-reporting and passive data collection are needed to create a robust understanding of "normal" baselines. |
Complex Data Challenges | Data Only For The Few: limited access to datasets stymies research. | Standardized performance metrics, learning with anonymized data sharing, and privacy-preserving machine learning are all important areas of research growth. | Engaging patients can create voluntarily shared data pools, and more datasets can be created that respect medical regulations. |
Robustness to the Unseen | Same Name, Different Measure: measurement drift as equipment ages or changes. | Transfer learning and domain adaptation have attempted to compensate for these trends. | Better devices should be made to capture additional signals, or self-diagnose when the signal is no longer calibrated. |
Robustness to the Unseen | Anticipating New Data: generalizability of models to new input data, e.g., "X" values not seen before. | Model interpretability, domain adaptation, and manifold learning are used to learn the common spaces that may connect new variables to prior ones. | Regulatory incentives should be created to ensure and fund generalizability of data inputs. |
Robustness to the Unseen | Handling the next Zika: zero-shot learning in new disease targets, e.g., "Y" values not seen before. | Abnormality detection and human-in-the-loop modeling are used to detect when a model may be poorly calibrated for a novel condition. | Expedited clinical capture is key for detecting new conditions, especially if they are fast-moving. |
Unknown Knowns | Difficult Disease Endotyping: diseases have underlying heterogeneity, and may have undiscovered subtypes. | Generative modeling and unsupervised clustering with outcome-based loss measures have been previously attempted. | Additional data sources as well as fundamental biomedical research are needed to create robust clinical endophenotyping for machine learning targets. |
Unknown Knowns | Creating Common Ground: there is no consensus on meaningful model targets or inputs. | Causal inference and diagnostic baselines are often employed to understand potential directionalities of process and establish useful tasks. | Patient self-reporting of outcomes combined with traditional expert-verified diagnoses may be more meaningful for many conditions of interest. |
Methods used : k-Means, RFM Model, K-means clustering algorithm, EM clustering, Generalized Differential RFM Method (GDRFM)
Customer segmentation provides a full management perspective, gives enterprises a better chance to communicate with customers, and enhances customer return rates.
Commonly used ones are : RFM Method, Customer Value matrix and CLV Method.
It costs five times more to gain a new customer than to keep an existing one, and ten times more to win back a dissatisfied customer (Marcus, 1998) - Harvard.
Statistical clustering algorithms include: partition clustering, density-based clustering, fuzzy clustering, and hierarchical clustering.
In RFM analysis, collinearity is sometimes found between Frequency and Monetary. The founder of RFM suggested using the average purchase value rather than the total sum as Monetary, with Frequency expressed as the number of purchases.
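A minimal RFM sketch following the suggestion above (the transactions, customer IDs, and reference date are hypothetical) computes recency in days, frequency as a purchase count, and monetary as the average purchase amount:

```python
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount)
transactions = [
    ("A", date(2024, 1, 5), 120.0),
    ("A", date(2024, 3, 1), 80.0),
    ("B", date(2023, 11, 20), 300.0),
    ("B", date(2024, 2, 14), 60.0),
    ("B", date(2024, 3, 10), 90.0),
]
today = date(2024, 4, 1)  # assumed reference date

# Aggregate per customer: most recent purchase, count, and total spend.
stats = {}
for cust, d, amt in transactions:
    rec = stats.setdefault(cust, {"last": d, "count": 0, "total": 0.0})
    rec["last"] = max(rec["last"], d)
    rec["count"] += 1
    rec["total"] += amt

result = {}
for cust, rec in stats.items():
    recency = (today - rec["last"]).days
    frequency = rec["count"]
    # Average purchase amount instead of total spend, to reduce the
    # collinearity between Frequency and Monetary noted above.
    monetary = rec["total"] / rec["count"]
    result[cust] = (recency, frequency, monetary)

print(result)
```

The resulting (R, F, M) triples would then be binned or scaled into scores before segmentation.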
A customer value matrix is used by the Boston Consulting Group: Frequency of Purchase (F) and Average Purchase Amount (M) are used for segmentation in a 2x2 matrix, analogous to BCG's Growth-Share matrix.
Data mining consists of more than collecting and managing data; it also includes analysis and prediction. Data mining includes association, sequence or path analysis, classification, clustering and future activities.
Data mining is the main step of the knowledge discovery in databases (KDD) process. Data mining tasks are very distinct and diverse because many patterns exist in a huge database. The data mining functionalities and the variety of knowledge they discover are: characterization, discrimination, association analysis, classification, prediction and clustering.
Clustering Methods can be categorized into two different types of algorithms which are Hierarchical Algorithms and Non-Hierarchical or Partition Algorithms.
In hierarchical algorithms, the number of clusters need not be known at the start, which is a strong advantage over non-hierarchical methods. On the other hand, once an instance is assigned to a cluster, the assignment is irrevocable. Therefore, the output of hierarchical methods can be used to generate interpretations of the data set and may serve as an input for a non-hierarchical method, in order to improve the resulting clusters. (Using RFM first and then K-means is what I am proposing.)
Non-hierarchical or partition algorithms (NHC) typically determine all clusters up front, but they can also be used as divisive algorithms within hierarchical clustering. Their advantage is that the algorithm iterates over all possible movements of data points between the formed clusters until a stopping criterion is met. NHC algorithms are sensitive to the initial partition and, because of this, can get trapped in one of many local minima.
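A minimal pure-Python k-means sketch illustrates the partition-algorithm loop (assign points to the nearest centroid, then move centroids to cluster means); the points stand in for hypothetical scaled RFM-style features:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means (a partition/NHC algorithm). Results are
    sensitive to the randomly chosen initial centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial partition
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[j])))
            clusters[i].append(p)
        # Update step: move each centroid to its cluster mean.
        centroids = [
            tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical scaled (frequency, monetary) pairs forming two groups.
pts = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9),
       (5.0, 5.1), (4.9, 5.0), (5.1, 4.9)]
centroids, clusters = kmeans(pts, k=2)
print(sorted(len(c) for c in clusters))
```

In practice one would run several random restarts (or seed the centroids from a hierarchical pass, as proposed above) to mitigate the local-minimum sensitivity.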
http://ijmbr.srbiau.ac.ir/article_3557_dc97e3dbfddcaa76b88f9958dce27d3b.pdf
https://www.retentionscience.com/blog/rfm-king/
https://dl.acm.org/citation.cfm?id=2983281
https://www.analyticsindiamag.com/a-heuristic-approach-to-predictive-modeling-rfm-analysis/
https://econpapers.repec.org/paper/haljournl/hal-00788060.htm
https://ideas.repec.org/a/eee/jbrese/v67y2014i1p2751-2758.html
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.197.8595&rep=rep1&type=pdf
https://link.springer.com/chapter/10.1007%2F978-0-387-72579-6_12
https://springml.com/blog/customer-segmentation-combining-rfm-and-predictive-algorithms/
https://www.sciencedirect.com/science/article/abs/pii/S0148296312002615
http://aircconline.com/ijcsit/V11N1/11119ijcsit04.pdf
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4986115/
https://www.academia.edu/8466246/Cluster_analysis_new_approach_for_RFM
https://online-journals.org/index.php/i-jes/article/viewFile/6532/4749
https://ijms.ut.ac.ir/article_65616_522808ea990c016e17f60ff114300f1d.pdf
https://www.optimove.com/resources/learning-center/customer-segmentation-via-cluster-analysis
https://www.kaggle.com/hendraherviawan/customer-segmentation-using-rfm-analysis-r
https://www.quora.com/Does-it-make-sense-use-RFM-+-KMeans-+-Cohort
https://m.scirp.org/papers/58942
https://file.scirp.org/pdf/JDAIP_2015082109475463.pdf
https://scialert.net/fulltextmobile/?doi=itj.2012.1193.1201
http://www.irjabs.com/files_site/paperlist/r_289_121025150620.pdf
https://dl.acm.org/ft_gateway.cfm?id=3007873&ftid=1821105&dwn=1&CFID=707617967&CFTOKEN=73373672
https://www.researchgate.net/publication/321948055_Study_on_Customer_Rating_Using_RFM_and_K-Means
https://github.com/anonyth/customer-segmentation
http://www.ijceas.com/index.php/ijceas/article/download/174/pdf/
https://s3.amazonaws.com/assets.datacamp.com/production/course_10628/slides/chapter4.pdf
https://www.sciencedirect.com/science/article/pii/S1877050910003868
https://www.sciencedirect.com/science/article/pii/S1319157818304178
https://link.springer.com/chapter/10.1007/978-1-84882-762-2_63
http://www.kimberlycoffey.com/blog/2016/8/k-means-clustering-for-customer-segmentation
https://sureoptimize.com/targeted-marketing-with-customer-segmentation-and-rfm-analysis-part-1
https://sureoptimize.com/customer-segmentation-and-rfm-analysis-kmeans-clustering-part-2
https://link.springer.com/article/10.1007/s10997-018-9447-3
https://link.springer.com/content/pdf/10.1057/palgrave.jt.5740131.pdf
https://oroinc.com/orocrm/doc/2.6/user-guide-marketing-tools/magento/rfm-user
https://www.blastam.com/blog/rfm-analysis-boosts-sales
https://www.optimove.com/resources/learning-center/rfm-segmentation
https://dl.acm.org/citation.cfm?id=2401614
http://w3.salemstate.edu/~gsmith/help/files/SPSS17/SPSS%20EZ%20RFM%2017.pdf
http://www.whitecapers.com/travel-and-hospitality.php
https://esputnik.com/en/blog/practical-rfm-analysis-increase-repeat-sales
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.468.4579&rep=rep1&type=pdf
https://www.kdnuggets.com/2019/05/golden-goose-cohort-analysis.html/2
http://www.diva-portal.org/smash/get/diva2:1017684/FULLTEXT02
https://www.jimnovo.com/RFM-book.htm
http://www.b-eye-network.com/view/10256
https://stackoverflow.com/questions/42990566/rfm-analysis-with-postgresql
https://cooldata.wordpress.com/2014/03/25/an-all-sql-way-to-automate-rfm-scoring/