CAP study notes

INFORMS defines analytics as the scientific process of transforming data into insight for making better decisions. It is seen as an end-to-end process, running from identifying the business problem to evaluating and drawing conclusions about the prescribed solution arrived at through the use of analytics. Analytics professionals are skilled at this process.

  • Operations Research is a collection of tools such as optimization, simulation, and decision analysis.
  • Advanced Analytics is the intersection of Analytics and Operations Research.
  • The Analytics Maturity Model lets organizations assess the maturity of their analytics processes.
  • OR is a toolkit and Analytics is a process.


Job Task Analysis

A job task analysis (JTA) is a comprehensive description of the duties and responsibilities of a profession, occupation, or specialty area. Our approach consists of four elements: 1) domains of practice, 2) tasks performed, 3) knowledge required for effective performance on the job, and 4) domain weights that account for the importance of and frequency with which tasks are performed.


Domain - Approximate Weight

1. Business Problem (Question) Framing - 12%-18%

2. Analytics Problem Framing - 14%-20%  

3. Data - 18%-26%

4. Methodology (Approach) Selection - 12%-18%

5. Model Building - 13%-19%

6. Deployment - 7%-11%

7. Model Life Cycle Management - 4%-8%
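The weights above are ranges, so it can help to reduce each one to a midpoint when planning study time. The short sketch below does that arithmetic; the ranges are copied from the list above, while the dictionary and function names are only illustrative.

```python
# Domain weight ranges (low %, high %) copied from the list above.
DOMAIN_WEIGHTS = {
    "Business Problem (Question) Framing": (12, 18),
    "Analytics Problem Framing": (14, 20),
    "Data": (18, 26),
    "Methodology (Approach) Selection": (12, 18),
    "Model Building": (13, 19),
    "Deployment": (7, 11),
    "Model Life Cycle Management": (4, 8),
}


def midpoint(low, high):
    """Return the middle of a weight range."""
    return (low + high) / 2


midpoints = {name: midpoint(low, high) for name, (low, high) in DOMAIN_WEIGHTS.items()}
total_low = sum(low for low, _ in DOMAIN_WEIGHTS.values())
total_high = sum(high for _, high in DOMAIN_WEIGHTS.values())
total_mid = sum(midpoints.values())

# List the domains from heaviest to lightest by midpoint weight.
for name, mid in sorted(midpoints.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {mid:.0f}%")
print(f"low total = {total_low}%, midpoint total = {total_mid:.0f}%, high total = {total_high}%")
```

The low ends sum to 80% and the high ends to 120%, while the midpoints sum to exactly 100%, with Data carrying the largest share.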


(12%-18%) Domain 1 - Business Problem (Question) Framing - The ability to understand a business problem and determine whether the problem is amenable to an analytics solution.

T-1 Obtain or receive problem statement and usability requirements

T-2 Identify stakeholders

T-3 Determine whether the problem is amenable to an analytics solution

T-4 Refine the problem statement and delineate constraints

T-5 Define an initial set of business benefits

T-6 Obtain stakeholder agreement on the business problem statement


(14%-20%) Domain 2 - Analytics Problem Framing - The ability to reformulate a business problem into an analytics problem with a potential analytics solution.

T-1 Reformulate problem statement as an analytics problem.

T-2 Develop a proposed set of drivers and relationships to outputs.

T-3 State the set of assumptions related to the problem.

T-4 Obtain stakeholder agreement on the approach.


(18%-26%) Domain 3 - Data - The ability to work effectively with data to help identify potential relationships that will lead to the refinement of the business and analytics problem.

T-1 Identify and prioritize data needs and sources.

T-2 Acquire data.

T-3 Harmonize, rescale, clean, and share data.

T-4 Identify relationships in the data.

T-5 Document and report findings (e.g., insights, results, business performance)

T-6 Refine the business and analytics problem statements.


(12%-18%) Domain 4 - Methodology (Approach) Selection - The ability to identify and select potential approaches for solving the business problem.

T-1 Identify available problem-solving approaches (methods).

T-2 Select software tools.

T-3 Test approaches (methods).

T-4 Select approaches (methods).


(13%-19%) Domain 5 - Model Building - The ability to identify and build effective model structures to help solve the business problem.

T-1 Identify model structures.

T-2 Run and evaluate the models.

T-3 Calibrate models and data.

T-4 Integrate the models.

T-5 Document and communicate findings (including assumptions, limitations, and constraints).


(7%-11%) Domain 6 - Deployment - The ability to deploy the selected model to help solve the business problem.

T-1 Perform business validation of the model.

T-2 Deliver report with findings; or

T-3 Create model, usability, and system requirements for production.

T-4 Deliver production model/system.

T-5 Support deployment


(4%-8%) Domain 7 - Model Life Cycle Management - The ability to manage the model lifecycle to evaluate the business benefit of the model over time.

T-1 Document initial structure.

T-2 Track model quality.

T-3 Recalibrate and maintain the model.

T-4 Support training activities.

T-5 Evaluate the business benefit of the model over time.


Knowledge Statements 

K-1 Characteristics of a business problem statement (i.e., a clear and concise statement of the problem describing the situation and stating the desired end state or goal).

K-2 Interviewing (questioning) techniques (i.e. the process by which a practitioner elicits information and understanding from business experts, including strategies for the success of the project).

K-3 Client business processes (i.e., the processes used by the client or project sponsor that are related to the problem).

K-4 Client and client-related organizational structures.

K-5 Modeling options (i.e., the analytic approaches available for seeking a solution to the problem or answer to the question including optimization, simulation, forecasting, statistical analysis, data mining, machine learning, etc).

K-6 Resources necessary for analytics solutions (e.g., human, data, computing, software).

K-7 Performance measurement (i.e., the technical and business metrics by which the client and the analyst measure the success of the project).

K-8 Risk/return (i.e., trade-offs between prioritizing the primary objective and minimizing the likelihood of significant penalty taking into account the risk attitude of the decision maker).

K-9 Presentation techniques (i.e. strategies for communicating analytics problems and solutions to a broad audience of business clients).

K-10 Structure of decisions (e.g., influence diagrams, decision trees, system structures).

K-11 Negotiation techniques (i.e., strategies and methods that allow the analytics professional to reach a shared understanding with the client).

K-12 Data rules (e.g., privacy, intellectual property, security, governance, copyright, sharing).

K-13 Data architectures (i.e., a description of how data are processed, stored, and used in organizational systems including conceptual, logical, and physical aspects).

K-14 Data Architecture (i.e., a description of how data are processed, stored, and used in organizational systems including conceptual, logical, and physical aspects).

K-15 Visualization techniques (i.e., any technique for creating images, diagrams, or animations to communicate a message, including data visualization, information visualization, statistical graphics, presentation graphics, etc.).

K-16 Statistics (descriptive, correlation, regression, etc.)

K-17 Software tools.

The five E's are ethics, education, experience, examination, and effectiveness. These are the five pillars of CAP.

Effectiveness is the art of applying your knowledge and skill in a way that enables the achievement of your organization's goals. The soft skills required are dealt with more fully.


Domain 1 - Business Problem Framing

A business problem statement generally starts by describing a business opportunity or threat, or an issue in broad terms. 

Do get definitions of all terms, as meanings change between organizations.

Five W's - who, what, where, when, and why:

  • Who: who are the stakeholders? A stakeholder is anyone who funds, uses, creates, or is affected by the project's outcome.
  • What: what problem is the project meant to solve, or what function is it meant to perform?
  • Where: where does the problem occur, or where does the function need to be performed? Are the physical and spatial characteristics articulated?
  • When: when does the problem occur, or when does the function need to be performed? When does the project need to be completed?
  • Why: why does the problem occur, or why does the function need to be performed?

Work out whether the stakeholder's problem is likely to have an analytics solution. First, do the answer and the change process needed to get there lie within the organization's control? Second, do we have data on the inputs and outputs? Third, can the problem be modeled? Lastly, can the organization accept and deploy the answer?

Refine the problem statement to make it more accurate, more appropriate to the stakeholders, or more amenable to available analytic tools/methods. It is also necessary to define what constraints the project will operate under. These constraints could be analytical, financial, or political in nature.

Define an initial set of business benefits. These benefits may be determined quantitatively or qualitatively. Together they make up the business case.


Domain 2 - Analytics Problem Framing

 

Decomposition of requirements using QFD - Quality Function Deployment. In decomposing, it is critical to account for tacit requirements (understood or implied without being stated) as well as formal requirements. The best-known model for this is the Kano model.

The Kano model distinguishes between unexpected customer delights, known customer requirements, and customer must-haves that are not explicitly stated.

When asked for a list of their requirements, business stakeholders tend to focus on the "normal requirements," not the "expected requirements". As the analytics professional charged with translating business requirements into the problem statement, you need to probe to make sure that you have the entire appropriate context as well, including the expected requirements.

Your input/output functions are strongly related to your assumptions about what is important about this problem as well as the key metrics by which you'll measure the organizational response to the problem.

Simple black box sketches make the inputs visible and illustrate the concept that we are simulating. This helps in getting agreement among the team on the direction and scale of the relationships, to bound the problem and to create the related hypotheses that you'll use later to attack the data. A point to emphasize to the team is that these are preliminary assumptions: your best estimate is needed, but it is still just an estimate and is subject to change depending on what reality turns out to be. The danger we're trying to avoid here is what Kahneman calls "anchoring". People have a tendency to hang on to views that they've seen and held before, even if they are incorrect. Reminding them that these are initial and preliminary, rather than finalized views, helps mitigate the anchoring effect.

"What is measured, improves". This ties directly to the business problem statement but goes down one level further to the items that comprise the key success metric.

Many people tend to think of stakeholders as people in positions "above" the analytics team. It is true that there is a group of stakeholders who have the business need and who are paying for the effort. But just as importantly, you must also have an agreement with the people executing the analytics work that your methods and hypotheses are workable in the time and budget allocated to get the work done. The output of this stakeholder agreement will vary by organization but should include the budget, timeline, interim milestones (if any), goals, and any known effort that is excluded as out of scope. Otherwise, errors will creep in and what is delivered will miss critical unstated requirements. If you allow your project to rely on written communication only, you've missed the opportunity to correct misapprehensions when it is still cheap to do so.

Decomposition: the act of breaking down a higher-level requirement to multiple lower-level requirements.

Requirements: a requirement should be unitary (no conjunctions such as and, but, or), positive, and testable.



Data Visualization: Individuals process information in different ways

  • Some individuals are visual and others are analytic
  • Some individuals want to see the big picture, others want to see details and some want anecdotes
  • Some individuals want demonstrations or even the ability to slice and dice the data themselves.


Think hard about your visualization; the less blind faith it demands, the better. It should be a white box instead of a black box: it should show the relationships between the variables and reduce the dimensions in the data. Other business people will use your visualization, and they may replicate the results or take them out of context.


Importance of Data Reduction:

  • Product Segmentation


The results have to be communicated so that people who have not worked with the data as much as we have can follow them easily.

Don't be mechanical in giving recommendations. Look for the small insights; recommendations do not have to be big in scope.

Prototyping: Take a small example and see what the data implies. It is an iterative task and helps to refine the problem statement. It shows whether the expected outputs can be achieved or whether we need to go back and revisit the data plan.

Consideration for Model Selection:

  • Modeling Options
  • Data Architecture 

Prescriptive Models include:

  • Optimization (Linear Programming, Non-Linear Programming, Integer Programming, etc.)
  • Stochastic Optimization (randomness of the system comes into play)
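As a rough illustration of what a prescriptive model produces, the sketch below solves a tiny integer program. Everything in it is assumed for the example: the two products, the profit figures, and the labor and material limits are made up, and brute-force enumeration stands in for the real solver that a production problem would need.

```python
from itertools import product

# Hypothetical product-mix problem (all numbers are invented for illustration):
#   maximize   30*x + 50*y          profit
#   subject to 2*x + 4*y <= 40      labor hours
#              3*x + 2*y <= 36      material units
#              x, y non-negative integers


def profit(x, y):
    return 30 * x + 50 * y


def feasible(x, y):
    return 2 * x + 4 * y <= 40 and 3 * x + 2 * y <= 36


def solve_by_enumeration():
    """Try every integer plan inside the bounding box and keep the best feasible one."""
    best_plan, best_profit = None, float("-inf")
    # x <= 12 follows from the material limit, y <= 10 from the labor limit.
    for x, y in product(range(13), range(11)):
        if feasible(x, y) and profit(x, y) > best_profit:
            best_plan, best_profit = (x, y), profit(x, y)
    return best_plan, best_profit


best_plan, best_profit = solve_by_enumeration()
print(best_plan, best_profit)  # (8, 6) 540
```

The output is a recommended decision (make 8 of x and 6 of y), not just a forecast, which is what separates a prescriptive model from a predictive one.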

Predictive Models include:

  • Simulation
  • Regression
  • Statistical Inferences
  • Classification
  • Clustering
  • Artificial Intelligence
  • Game Theory
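To make the predictive side concrete, here is a minimal sketch of regression, one of the methods listed above: a one-variable least-squares fit written out by hand. The data points are made up, and the function names are only illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    return intercept + slope * x


# Made-up observations, e.g. advertising spend (x) against units sold (y).
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

intercept, slope = fit_line(xs, ys)
forecast = predict(intercept, slope, 6)
print(intercept, slope, forecast)  # 1.0 2.0 13.0
```

The model estimates what is likely to happen at x = 6; deciding what to do about it is left to the decision maker or to a prescriptive model.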

High Value, High Impact, High Level of Data Accuracy, Causal Understanding - Prescriptive and Predictive

Medium Level of Data Needs - Descriptive Analytics


For business needs, two types of compare-and-contrast analysis must be done before selecting a particular method or modeling approach.

  • Type I - Between Approach - Prescriptive vs Predictive vs Descriptive or a combination method
  • Type II - Within Approach - Within each of the three, which method is best suited


Select Methodology:

  • Selection criteria depend on:
  1. Time Available / Constraint 
  2. Accuracy Needed
  3. The relevance of methodology & Scope of Project
  4. The accuracy of the data - if the data is not good, we cannot use a highly data-intensive method
  5. Data availability & readiness
  6. Resource Available
  7. Methodology popularity/acceptance
  8. Match approach to accuracy

Important areas to focus on in method selection are:

  1. Know what a method can do;
  2. Know what a method cannot do;
  3. Stay unbiased in method selection, and do not pick a method just because the practitioner knows it well.

Good meta-knowledge is knowing what software is on the market, which software can handle the inputs, and which is the best fit for the business problem.


In Document and Communicate Findings:

  • Document how the findings impact the original business problem.
  • Rather than walking hungry customers through the kitchen or the restaurant before feeding them, put the food on the table the way they want it presented, let them eat, and then answer their questions.


Every model has a lifecycle and needs lifecycle maintenance. For some models, the lifecycle may be three weeks or six months. If the model can no longer give the answers, we need to do maintenance on it. Documentation of the structure of the model is important.

The documentation should include the assumptions, so that the business knows where assumptions were made and can judge when to use the model and when to rely on its output.
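Tracking model quality and deciding when maintenance is due can be sketched as a simple error check. This is a minimal illustration under assumed values: the function name, the window of four periods, the tolerance of 1.5, and the data are all hypothetical, and a real monitoring rule would be agreed with the business.

```python
from statistics import mean


def needs_recalibration(actuals, predictions, baseline_mae, tolerance=1.5, window=4):
    """Return True when the recent mean absolute error exceeds tolerance * baseline_mae."""
    errors = [abs(actual - predicted) for actual, predicted in zip(actuals, predictions)]
    recent_errors = errors[-window:]
    return mean(recent_errors) > tolerance * baseline_mae


# Made-up history: the model tracks well at first, then drifts.
actuals = [100, 102, 98, 105, 110, 120, 131, 140]
predictions = [101, 100, 99, 104, 104, 108, 115, 120]

# Baseline error measured on the first four periods, right after deployment.
baseline_mae = mean(abs(a - p) for a, p in zip(actuals[:4], predictions[:4]))

early_flag = needs_recalibration(actuals[:4], predictions[:4], baseline_mae)
late_flag = needs_recalibration(actuals, predictions, baseline_mae)
print(baseline_mae, early_flag, late_flag)  # 1.25 False True
```

The baseline error and the threshold belong in the model documentation alongside the assumptions, so that whoever maintains the model knows what "still good enough" means.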