Model Management and the Era of the Model-Driven Business

Over the past few years, we’ve seen a new community of data science leaders emerge.

Regardless of their industry, we have heard three themes emerge over and over:  1) Companies are recognizing that data science is a competitive differentiator. 2) People are worried their companies are falling behind — that other companies are doing a better job with data science. 3) Data scientists and data science leaders are struggling to explain to executives why data science is different from other types of work, and the implications of these differences on how to equip and organize data science teams.

Introduction

Since we started Domino five years ago, we have talked to hundreds of companies that are investing in data science, and heard all about their successes and their challenges.

At various points during that time, we focused on different aspects of the challenges that face data scientists and data science teams.

  • When we first launched Domino in 2014, we focused on automating much of the “dev ops” work that data scientists must do, in order to accelerate their work.
  • In 2015, we broadened our aperture to address data scientists’ need to track and organize their research.
  • In 2016, we added capabilities to deploy models, creating a unified platform to support the data science lifecycle from development to deployment.
  • And in 2017, we emphasized how collaboration, reproducibility, and reusability are the foundation that allows data science teams to scale effectively.

At every point along the way, we felt like there was something larger we wanted to say, but we didn’t quite know how. Like the parable of the blind men describing different parts of an elephant, we knew we were describing pieces but not the whole.

So about a year ago we took a step back. We had long discussions with our customers to distill and synthesize what makes data science different and what differentiates companies who apply it most effectively.

What do data scientists make?

Our major insight came when we asked ourselves: “what do data scientists make?”

Beyond the hype about AI and machine learning, at the heart of data science is something called a model. By “model,” I mean an algorithm that makes a prediction or recommendation or prescribes some action based on a probabilistic assessment.

Models can make decisions and take action autonomously and with speed and sophistication that humans can’t usually match. That makes models a new type of digital life.

Data scientists make models.

And if you look at the most successful companies in the world, you’ll find models at the heart of their business driving that success.

An example that everyone is familiar with is the Netflix recommendation model. It has driven subscriber engagement, retention, and operational efficiency at Netflix. In 2016, Netflix indicated that their recommendation model is worth more than $1B per year.

Coca-Cola uses a model to optimize orange juice production. Stitch Fix uses models to recommend clothing to its customers. Insurance companies are beginning to use models to make automated damage estimates from accident photos, reducing dependence on claims adjusters.

The Model Myth

Though obvious in one sense, the realization that data scientists make models is powerful because it explains most of the challenges that companies have making effective use of data science.

Fundamentally, the reasons companies struggle with data science all stem from misunderstandings about how models are different from other types of assets they’ve built in the past.

Many companies try to develop and deploy models like they develop and deploy software. And many companies try to equip data scientists with technology like they were equipping business analysts to do queries and build business intelligence dashboards.

It’s easy to see why companies fall into this trap: models involve code and data, so it’s easy to mistake them for software or data assets.

We call this the Model Myth: it’s the misconception that because models involve code and data, companies can treat them like they have traditionally treated software or data assets.

Models are fundamentally different, in three ways:

  1. The materials used to develop them are different. They involve code, but they use different techniques and different tools than software engineering. They use more computationally intensive algorithms, so they benefit from scalable compute and specialized hardware like GPUs. They use far more data than software projects. And they leverage packages from a vibrant open source ecosystem that’s innovating every day. So data scientists need extremely agile technology infrastructure, to accelerate research.
  2. The process to build them is different. Data science is research — it’s experimental and iterative and exploratory. You might try dozens or hundreds of ideas before getting something that works. So data scientists need tools that allow for quick exploration and iteration to make them productive and facilitate breakthroughs.
  3. Models’ behavior is different. Models are probabilistic. They have no “correct” answer — they can just have better or worse answers once they’re live in the real world. And while nobody needs to “retrain” software, models can change as the world changes around them. So organizations need different ways to review, quality control, and monitor them.

Model Management

The companies who make the most effective use of data science — ones who consistently drive competitive advantage through data science — are the ones who recognize that models are different and treat them differently.

We’ve studied the various ways these companies treat models differently and organized that into a framework we call Model Management.

Historically, “model management” has referred narrowly to practices for monitoring models once they are running in production. We mean it as something much broader.

Model Management encompasses a set of processes and technologies that allow companies to consistently and safely drive competitive advantage from data science at scale.

Model Management has five parts to it:

  1. Model Development allows data scientists to rapidly develop models, experiment, and drive breakthrough research.
  2. Model Production is how data scientists’ work gets operationalized: how it goes from a cool project to a live product integrated into business processes, affecting real decisions.
  3. Model Technology encompasses the compute infrastructure and software tooling that gives data scientists the agility they need to develop and deploy innovative models.
  4. Model Governance is how a company can keep a finger on the pulse of the activity and impact of data science work across its organization, to know what’s going on with projects, production models, and the underlying infrastructure supporting them.
  5. Model Context is at the heart of these capabilities. It is all the knowledge, insights, and all the artifacts that are generated while building or using models. This is often a company’s most valuable IP, and the ability to find, reuse, and build upon it is critical to driving rapid innovation.

Each of these facets of managing models requires unique processes and products. When integrated together, they unlock the full potential of data science for organizations.

Computing revolutions separate winners from losers

Data science is a new era of computing. The first era was hardware, where engineers made chips and boards. The second era was software, where engineers made applications. In the third era, data scientists make models.

And like past revolutions in computing, two things are true about the data science era:

  1. Companies’ ability to adopt and effectively apply the new approach will determine their competitiveness over the coming years. Just as “software ate the world” and “every company needed to be a software company”, every company will need to become a data science company if they want to stay competitive.
  2. The methodologies and tooling and processes that worked for the previous era will not work for this new era. The rise of software engineering led to new methodologies, new job titles, and new tools — what worked for developing, delivering and managing hardware didn’t work for software. The same is true for data science: what worked for software will not work for models.

Model Management is the set of processes and technologies a company needs to put models at the heart of their business. It’s required because models are different from software, so they need new ways to develop, deliver and manage them. And by adopting Model Management, organizations can unlock the full potential of data science, becoming model-driven businesses.

The five-component framework can lead to success in advanced analytics

1. The source of business value

Every analytics project should start by identifying the business value that can lead to revenue growth and increased profitability (for example, selecting customers, controlling operating expenses, lowering risk, or improving pricing). To make the selection, business-unit managers and the frontline functional managers who will be using the tools need to jointly define the business problem and the value of the analytics. Too often, analytics teams begin building models before users in sales, underwriting, claims, and customer service have provided their input.

2. The data ecosystem

It is not enough for analytics teams to be “builders” of models. These advanced-analytics experts also need to be “architects” and “general contractors” who can quickly assess what resources are available inside and outside the company. Unlocking the business potential of advanced analytics often requires the integration of numerous internal and external data assets. For instance, risk pricing and selection often can be improved significantly by mapping the data from internal customer-management systems with traditional third-party data providers such as credit bureaus and data exhaust from new digital sources. Given the diversity of data sources and vendors, carriers must continually scan the ecosystem for technologies and partners to take full advantage of new analytical opportunities.

3. Modeling insights

Building a robust predictive model has many layers: identifying and clarifying the business problem and source of value, creatively incorporating the business insights of everyone with an informed opinion about the problem and the outcome, reducing the complexity of the solution path, and validating the model with data.

Close collaboration among the analytics professionals who build the models and the functional decision makers who use them combines a “black box” data-modeling process (pure statistical analyses of large amounts of data) and a “smart box” filled with the knowledge of experienced practitioners. Experienced claims adjusters, for instance, have an intuitive sense about which injuries have the highest probability of escalating. Often, a hypothesis based on judgment still needs to be validated against external data. Data from claims histories will not reveal that employee relations with management or the commuting time between home and the workplace can also be factors in how long claimants stay away.

4. Transformation: Work-flow integration

The goal is always to design the integration of new decision-support tools to be as simple and user friendly as possible. The way analytics are deployed depends on how the work is done. A key issue is to determine the appropriate level of automation. A high-volume, low-value decision process lends itself to automation. A centralized underwriting group, for example, which had manually reviewed thousands of insurance-policy applications, needed to review only 1 percent of them after it adopted a rules engine. At the other end of the spectrum, automation can never replace the expertise and judgment of managers handling multimillion-dollar commercial accounts.

Integrating a new decision-support tool into a work flow can pose significant behavioral challenges. One insurer in commercial- and specialty-insurance lines tested three different ways to display information—a numerical score, a letter grade, and colored flags—to see which one led to the highest adoption and most accurate results. This kind of detail might seem minor, but such choices determine whether a decision maker uses a model or ignores it. Claims adjusters, underwriters, and call-center representatives will only incorporate analytics into their decisions if the tools address the issues in ways that make sense to them and if it is easy to integrate the tools into their work flow.

5. Transformation: Adoption

Successful adoption requires employees to accept and trust the tools, understand how they work, and use them consistently. That is why managing the adoption phase well is critical to achieving optimal analytics impact. All the right steps may have been taken up to this point, but if frontline decision makers do not use the analytics the way they are intended to be used, the value to the business evaporates.

An insurance carrier developed a model to predict which injury claims would escalate based on the conditions and circumstances of the claimants. The system provided claims adjusters with different ways to work with claimants to help them with their recovery. The model was painstakingly constructed and efficacious, but getting adjusters to use the model proved as difficult as constructing the model itself. Successful adoption requires collaboration up front, follow-up communication as to the model’s value, and investment in training people to use it. Equally important, the heads of sales, underwriting, and claims need to be engaged so that their visions of success and expected results are built into their business plans. Business leadership is needed to ensure that all players are asking the right questions: What does successful adoption look like? Where will it have the most impact?

A center of excellence

In any major change effort, there is value in starting small and experimenting in order to learn what will work in a given company. Several companies achieved success by forming a small team that demonstrated to specific user groups the impact of analytics in two or three use cases.

The advantages of this approach are that it builds conviction and provides insights into what works and what does not. It also helps expose business needs and build an understanding of how a centralized analytics group might help meet them. Where should analysts and data scientists reside? Where should data management reside? How should the business be supported with work-flow integration and adoption? These questions can be best explored by an internal analytics center of excellence (see sidebar, “Building an advanced-analytics center of excellence”).

Weaving analytics into the fabric of an organization is a journey. Every organization will progress at its own pace, from fragmented beginnings to emerging influence to world-class corporate capability. As participants gain experience, pilots help shape an operating model for future rollouts. In the discipline of analytics, the more testing that is performed, learning that is achieved, and new data and knowledge that is applied within the organization, the better the decisions and the outcomes will be.


Latest Research and Blogs in Insurance

http://www.iii.org/insuranceindustryblog/


https://www.iii.org/resource-center/iii-glossary


Spotlight on Marijuana and Employment

Insurance impacts: Workers compensation

Workers compensation insurance generally offers the exclusive remedy for employee injuries sustained during the scope of their employment. There are at least two workers compensation issues to consider related to marijuana:

Does workers compensation cover a workplace accident in which the injured employee tested positive for marijuana? THC persistence complicates this question, and state courts have differed on this issue, depending on the individual details of each case. For example, in 2015 the Ohio 5th District Court of Appeals found that an injured worker was eligible for workers compensation benefits despite failing a drug test after the accident. The court ruled, in part, that the worker was eligible unless his marijuana use was the proximate cause of injury.

Does workers compensation cover medical marijuana expenses incurred by an injured employee? Similarly, states differ on this question – some say that medical marijuana reimbursement is permitted, some that it is prohibited, and some are silent on the matter. Courts have also come to different conclusions – some have found that workers compensation can reimburse medical marijuana expenses, others that it can’t.

Homeowners and Renters Insurance Claims Payout

After a disaster, you want to get back to normal as soon as possible, and your insurance company wants that too! You may get multiple checks from your insurer as you make temporary repairs, permanent repairs and replace damaged belongings. Here's what you need to know about claims payments.


The initial payment isn't final

In most instances, an adjuster will inspect the damage to your home and offer you a certain sum of money for repairs, based on the terms and limits of your homeowner's policy. The first check you get from your insurance company is often an advance against the total settlement amount, not the final payment.

If you're offered an on-the-spot settlement, you can accept the check right away. Later, if you find other damage, you can reopen the claim and file for an additional amount. Most policies require claims to be filed within one year from the date of disaster; check with your state insurance department for the laws that apply to your area.

You may receive multiple checks

When both the structure of your home and your personal belongings are damaged, you generally receive two separate checks from your insurance company, one for each category of damage. If your home is uninhabitable, you'll also receive a check for the additional living expenses (ALE) you incur if you can’t live in your home while it is being repaired. If you have flood insurance and experienced flood damage, that means a separate check as well.

 
Your lender or management company might have control over your payment

If you have a mortgage on your house, the check for repairs will generally be made out to both you and the mortgage lender. As a condition of granting a mortgage, lenders usually require that they are named in the homeowner's policy and that they are a party to any insurance payments related to the structure. Similarly, if you live in a coop or condominium your management company may have required that the building's financial entity be named as a co-insured.

This is so the lender (and/or, in the case of a coop or condo, the overall building), who has a financial interest in your property, can ensure that the necessary repairs are made.

When a financial backer is a co-insured, they will have to endorse the claims payment check before you can cash it.

Depending on the circumstances, lenders may also put the money in an escrow account and pay for the repairs as the work is completed. Show the mortgage lender your contractor's bid and let the lender know how much the contractor wants up front to start the job. Your mortgage company may want to inspect the finished job before releasing the funds for payment to the contractor.

If your home has been destroyed, the amount of the settlement and who gets it are driven by your policy type, its specific limits and the terms of your mortgage. For example, part of the insurance proceeds may be used to pay off the balance due on the mortgage. How the remaining proceeds are spent depends on your own decisions, such as whether you want to rebuild on the same lot, rebuild in a different location or not rebuild at all. These decisions are also driven by state law.

 
Your insurance company may pay your contractor directly

Some contractors may ask you to sign a "direction to pay" form that allows your insurance company to pay the firm directly. This form is a legal document, so you should read it carefully to be sure you are not also assigning your entire claim over to the contractor. When in doubt, call your insurance professional before you sign. Assigning your entire insurance claim to a third party takes you out of the process and gives control of your claim to the contractor.

When work is completed to restore your property, make certain the job has been completed to your satisfaction before you let your insurer make the final payment to the contractor.

 
Your ALE check should be made out to you

Your check for additional living expenses (ALE) has nothing to do with repairs to your home. So, ensure that this check is made out to you alone and not your lender. The ALE check covers your expenses for hotels, car rental, meals out and other expenses you may incur while your home is being fixed.

Your personal belongings will be calculated on cash value, first

You'll have to submit a list of your damaged belongings to your insurance company (having a home inventory will make this a lot easier). Even if you have a replacement value policy, the first check you receive from your insurer will be based on the cash value of the items, which is the depreciated amount based on the age of each item. Why do insurance companies do this? Holding back the depreciated portion lets them match the final claim payment to what you actually spend on replacements. If you decide not to replace an item, you’ll be paid the actual cash value (depreciated) amount for it.

 
To get replacement value for your items, you must actually replace them

To get fully reimbursed for damaged items, most insurance companies will require you to purchase replacements. Your company will ask for copies of receipts as proof of purchase, then pay the difference between the cash value you initially received and the full cost of the replacement with an item of similar size and quality. You'll generally have several months from the date of the cash value payment to purchase replacements; consult with your agent regarding the timeframe.  
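As a rough illustration of the two-step payout described above, the sketch below uses entirely hypothetical numbers and a simple straight-line depreciation schedule; real adjusters use their own schedules and your policy's terms.

```python
# Hypothetical replacement-cost claim, paid in two steps:
# actual cash value (ACV) first, recoverable depreciation after replacement.
replacement_cost = 1200.00   # price of a comparable new item (assumed)
age_years = 4
useful_life_years = 10       # straight-line depreciation schedule (assumed)

depreciation = replacement_cost * min(age_years / useful_life_years, 1.0)
actual_cash_value = replacement_cost - depreciation            # first check
recoverable_depreciation = replacement_cost - actual_cash_value  # second check, after receipts

print(f"First payment (ACV):             ${actual_cash_value:,.2f}")
print(f"Holdback paid after replacement: ${recoverable_depreciation:,.2f}")
print(f"Total reimbursed:                ${actual_cash_value + recoverable_depreciation:,.2f}")
```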

In the case of a total loss, where the entire house and its contents are damaged beyond repair, insurers generally pay the policy limits, according to the laws in your state. That means you can receive a check for what the home and contents were insured for at the time of the disaster.

AI: Different Scenarios where we can apply algorithms

1.) Naive Bayes Classifier Algorithm
If we’re planning to automatically classify web pages, forum posts, blog snippets and tweets without manually going through them, then the Naive Bayes Classifier Algorithm will make our life easier. 
It classifies text using the well-known Bayes’ theorem of probability and is used in applications such as disease prediction, document classification, spam filters and sentiment analysis projects.
We can use the Naive Bayes Classifier Algorithm for ranking pages, indexing relevancy scores and classifying data categorically.
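As a rough sketch of the idea, the snippet below trains a Naive Bayes text classifier with scikit-learn on a tiny made-up spam/ham corpus; the library choice and example texts are assumptions, not part of the notes above.

```python
# Minimal Naive Bayes text classification sketch (hypothetical corpus).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

posts = [
    "Win a free prize, click now",         # spam
    "Limited offer, claim your reward",    # spam
    "Meeting moved to 3pm tomorrow",       # ham
    "Please review the attached report",   # ham
]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words counts feed directly into the Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(posts, labels)

print(model.predict(["claim your free reward now"]))            # likely 'spam'
print(model.predict(["see the report before the meeting"]))     # likely 'ham'
```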

2.) K-Means Clustering Algorithm

K-Means Clustering Algorithm is frequently used in applications such as grouping images into different categories, detecting different activity types in motion sensors and for monitoring whether tracked data points change between different groups over time. There are business use cases of this algorithm as well such as segmenting data by purchase history, classifying persons based on different interests, grouping inventories by manufacturing and sales metrics, etc.

The K-Means Clustering Algorithm is an unsupervised machine learning algorithm used in cluster analysis. It works by partitioning unlabeled data into k groups, where k is the number of clusters chosen in advance. Each data point is described by a collection of features, and the algorithm groups together the points whose features are most similar.
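A minimal sketch of the purchase-history segmentation idea, using scikit-learn's KMeans on synthetic data; the features, group centers, and the choice of k=3 are all illustrative assumptions.

```python
# K-Means on synthetic "annual spend / number of orders" customer data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
customers = np.vstack([
    rng.normal([200, 5],   [30, 2],   size=(50, 2)),   # low-spend group
    rng.normal([800, 20],  [80, 4],   size=(50, 2)),   # mid-spend group
    rng.normal([2000, 40], [150, 6],  size=(50, 2)),   # high-spend group
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(kmeans.cluster_centers_)   # one centroid per segment
print(kmeans.labels_[:10])       # cluster assignment for the first customers
```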

3.) Support Vector Machine (SVM) Learning Algorithm
Support Vector Machine Learning Algorithm is used in business applications such as comparing the relative performance of stocks over a period of time. These comparisons are later used to make wiser investment choices. 
The SVM Algorithm is a supervised learning algorithm that works by separating data sets into different classes with a hyperplane. It maximizes the margin between the classes, keeping the boundary as far as possible from the nearest points of each class, to provide clear distinctions. We can use this algorithm for classification tasks that require high accuracy and efficiency.
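A minimal sketch of an SVM classifier with a linear-kernel hyperplane, using scikit-learn and the built-in iris data set as a stand-in; the data set and parameters are assumptions for illustration, not a stock-comparison example.

```python
# Linear-kernel SVM on the iris data set (stand-in data).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A linear kernel keeps the decision boundary a plain hyperplane;
# C trades off margin width against misclassification.
clf = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```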

4.) Recommender System Algorithm
The Recommender Algorithm works by filtering and predicting user ratings and preferences for items by using collaborative and content-based techniques. The algorithm filters information and identifies groups with similar tastes to a target user and combines the ratings of that group for making recommendations to that user. It makes global product-based associations and gives personalized recommendations based on a user’s own rating.
For example, if a user likes the TV series ‘The Flash’ and likes the Netflix channel, then the algorithm would recommend shows of a similar genre to the user.
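A minimal sketch of the user-based collaborative filtering idea described above, with a made-up ratings matrix and plain NumPy cosine similarity; the users, items, and ratings are all hypothetical.

```python
# User-based collaborative filtering on a tiny hypothetical ratings matrix.
import numpy as np

# Rows = users, columns = shows; 0 means "not rated yet".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine_sim(a, b):
    # Cosine similarity between two rating vectors.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

target = 0  # recommend for the first user
sims = np.array([cosine_sim(ratings[target], ratings[u]) for u in range(len(ratings))])
sims[target] = 0.0  # ignore self-similarity

# Score every item as a similarity-weighted average of the other users' ratings,
# then pick the best item the target user has not rated yet.
scores = sims @ ratings / (sims.sum() + 1e-9)
unrated = ratings[target] == 0
best = int(np.argmax(np.where(unrated, scores, -np.inf)))
print("recommend item index:", best)
```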

5.1) Linear Regression
Linear regression is widely used for applications such as sales forecasting and risk assessment analysis in health insurance companies, and it requires minimal tuning.
It models the relationship between dependent and independent variables, showing what happens to the dependent variable when changes are made to the independent variables.
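A minimal sketch of linear regression for a sales-forecasting-style problem with scikit-learn; the synthetic relationship between ad spend and sales is an assumption for illustration.

```python
# Linear regression on synthetic ad-spend vs. sales data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
ad_spend = rng.uniform(0, 100, size=(200, 1))                   # independent variable
sales = 50 + 3.0 * ad_spend[:, 0] + rng.normal(0, 10, 200)      # dependent variable

model = LinearRegression().fit(ad_spend, sales)
print("intercept:", model.intercept_, "slope:", model.coef_[0])
print("forecast at spend=80:", model.predict([[80]])[0])
```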

5.2)Logistic Regression
Logistic regression is used in applications such as:
1. Identifying risk factors for diseases and planning preventive measures
2. Classifying words as nouns, pronouns, and verbs
3. Weather forecasting applications for predicting rainfall and weather conditions
4. Voting applications that predict whether voters will vote for a particular candidate or not
A good example of logistic regression is when credit card companies develop models that decide whether a customer will default on their loan EMIs or not.
The best part of logistic regression is that we can include several explanatory (independent) variables, such as dichotomous, ordinal and continuous variables, to model binomial outcomes.
Logistic regression is a statistical analysis technique used for predictive analysis. It performs binary classification by modeling the probability that an observation belongs to the default class.
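A minimal sketch of logistic regression for a default/no-default style problem; the synthetic features and the rule used to generate the labels are assumptions, not real credit data.

```python
# Logistic regression on synthetic "will this customer default?" data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 500
utilization = rng.uniform(0, 1, n)       # share of credit limit used (synthetic)
late_payments = rng.poisson(1.0, n)      # count of late payments (synthetic)
X = np.column_stack([utilization, late_payments])

# Hypothetical ground truth: higher utilization and more late payments raise default risk.
logit = -3 + 3 * utilization + 0.8 * late_payments
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

clf = LogisticRegression().fit(X, y)
print("coefficients:", clf.coef_[0])
print("default probability at 90% utilization, 3 late payments:",
      clf.predict_proba([[0.9, 3]])[0, 1])
```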

6.) Decision Tree Machine Learning Algorithm
Applications of the Decision Tree Machine Learning Algorithm range from data exploration and pattern recognition to option pricing in finance and identifying disease and risk trends.
Suppose we want to buy a video game DVD for our best friend’s birthday but aren’t sure whether he will like it. We ask the Decision Tree Machine Learning Algorithm, and it will ask us a set of questions about his preferences, such as what console he uses and what the budget is. It will also ask whether he likes RPGs or first-person shooters, whether he prefers single-player or multiplayer games, how much time he spends gaming daily and his track record for completing games.
The model is operational in nature: depending on our answers, the algorithm will use forward and backward calculation steps to arrive at different conclusions.
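A minimal sketch of the gift-buying example as a decision tree in scikit-learn; the feature encoding and toy labels are made up for illustration.

```python
# Decision tree on made-up "will my friend like this game?" data.
from sklearn.tree import DecisionTreeClassifier, export_text

# Columns: owns_console (0/1), budget_ok (0/1), likes_rpg (0/1), plays_multiplayer (0/1)
X = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
]
y = [1, 0, 0, 1, 0, 1]  # 1 = liked the gift, 0 = did not

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=[
    "owns_console", "budget_ok", "likes_rpg", "plays_multiplayer"]))
print("prediction for a new friend profile:", tree.predict([[1, 1, 1, 0]]))
```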

7.) Random Forest ML Algorithm
The random forest algorithm is used in industrial applications such as finding out whether a loan applicant is low-risk or high-risk, predicting the failure of mechanical parts in automobile engines and predicting social media share scores and performance scores.
The Random Forest ML Algorithm is a versatile supervised learning algorithm that’s used for both classification and regression analysis tasks. It creates a forest with a number of trees and makes them random. Although similar to the decision trees algorithm, the key difference is that it runs processes related to finding root nodes and splitting feature nodes randomly.
It essentially takes the features, constructs many randomly created decision trees to predict outcomes, lets each tree vote, and takes the outcome with the most votes as the final prediction.
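A minimal sketch of a random forest “voting” over many randomized trees, using a synthetic stand-in for loan-applicant data; the generated features and labels are assumptions.

```python
# Random forest on a synthetic loan-risk-style data set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           random_state=0)  # stand-in for applicant features
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 200 trees sees a bootstrap sample and random feature subsets;
# the forest's prediction is the majority vote across trees.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
print("feature importances:", forest.feature_importances_.round(3))
```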

8.) Principal Component Analysis (PCA) Algorithm
The PCA algorithm is used in applications such as gene expression analysis, stock market prediction and pattern classification tasks that ignore class labels.
Principal Component Analysis (PCA) is a dimensionality reduction algorithm used for speeding up learning algorithms and for making compelling visualizations of complex datasets. It identifies patterns and correlations among the variables in the data and projects the data onto a smaller-dimensional subspace that preserves as much of that structure as possible.
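A minimal sketch of PCA as a dimensionality reducer, using scikit-learn's digits data set as a stand-in for any high-dimensional table.

```python
# PCA: project 64 pixel features down to 2 components for visualization.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)     # 64 features per sample
pca = PCA(n_components=2)               # keep the top 2 principal components
X_2d = pca.fit_transform(X)

print("original shape:", X.shape, "reduced shape:", X_2d.shape)
print("variance explained by 2 components:",
      pca.explained_variance_ratio_.sum().round(3))
```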

9.) Artificial Neural Networks
Essentially, deep learning networks are collectively used in a wide variety of applications such as handwriting analysis, colorization of black and white images, computer vision processes and describing or captioning photos based on visual features.
Artificial Neural Network algorithms consist of different layers that analyze data. Hidden layers detect patterns in the data, and adding layers can make the outcomes more accurate. Neural networks learn from data, adjusting the weights of their connections each time the network processes training examples.
Convolutional Neural Networks and Recurrent Neural Networks are two popular Artificial Neural Network Algorithms.
Convolutional Neural Networks are feed-forward Neural networks which take in fixed inputs and give fixed outputs. For example – image feature classification and video processing tasks.
Recurrent Neural Networks use internal memory and are versatile since they take in arbitrary length sequences and use time-series information for giving outputs. For example – language processing tasks and text and speech analysis
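A minimal sketch of a plain feed-forward network (not a CNN or RNN) using scikit-learn's MLPClassifier, with the digits data set standing in for a handwriting task; the layer sizes are arbitrary assumptions.

```python
# Feed-forward neural network (multi-layer perceptron) on handwritten digits.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers; connection weights are adjusted on every training pass.
net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
net.fit(X_train, y_train)
print("test accuracy:", net.score(X_test, y_test))
```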

10.) K-Nearest Neighbors Algorithm
KNN algorithm is used in industrial applications in tasks such as when a user wants to look for similar items in comparison to others. It’s even used in handwriting detection applications and image/video recognition tasks.
The best way to advance our understanding of these algorithms is to try our hand in image classification, stock analysis, and similar beginner data science projects.
The K-Nearest Neighbors Algorithm is a lazy algorithm that takes a non-parametric approach to predictive analysis. If we have unstructured data or lack knowledge regarding the distribution of the data, then the K-Nearest Neighbors Algorithm will come to our rescue. The training phase is very fast, and there is little generalization in the training process. The algorithm works by finding examples similar to our unknown example and using the properties of those neighboring examples to estimate the properties of the unknown example.

The only downside is that its accuracy can be affected because it is sensitive to outliers and noisy data points.
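A minimal sketch of k-nearest neighbors classification with scikit-learn; the iris data set stands in for any “find similar items” task, and k=5 is an arbitrary choice.

```python
# k-nearest neighbors: store the training examples, vote among the 5 closest.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Training" just stores the examples; prediction looks up the nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
print("predicted class for one new sample:", knn.predict(X_test[:1]))
```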

CAP study notes

INFORMS defines analytics as the scientific process of transforming data into insight for making better decisions. It is seen as an end-to-end process beginning with identifying the business problem to evaluating and drawing conclusions about the prescribed solution arrived at through the use of analytics. Analytics professionals are skilled at this process.

  • Operations Research is a collection of tools such as optimization, simulation, and decision analysis.
  • Advanced Analytics is the intersection of Analytics and Operations Research.
  • The Analytics Maturity Model lets organizations introspect on the maturity of their analytics processes.
  • OR is a toolkit and Analytics is a process.


Job Task Analysis

A job task analysis (JTA) is a comprehensive description of the duties and responsibilities of a profession, occupation, or specialty area. The approach consists of four elements: 1) domains of practice, 2) tasks performed, 3) knowledge required for effective performance on the job, and 4) domain weights that account for the importance of and frequency with which tasks are performed.


Domain - Approximate Weight

1. Business Problem (Question) Framing - 12%-18%

2. Analytics Problem Framing - 14%-20%  

3. Data - 18%-26%

4. Methodology (Approach) Selection - 12%-18%

5. Model Building - 13%-19%

6. Deployment - 7%-11%

7. Model Life Cycle Management - 4%-8%


(12%-18%) Domain 1 - Business Problem (Question) Framing - The ability to understand a business problem and determine whether the problem is amenable to an analytics solution.

T-1 Obtain or receive problem statement and usability requirements

T-2 Identify stakeholders

T-3 Determine whether the problem is amenable to an analytics solution

T-4 Refine the problem statement and delineate constraints

T-5 Define an initial set of business benefits

T-6 Obtain stakeholder agreement on the business problem statement


(14%-20%) Domain 2 - Analytics Problem Framing - The ability to reformulate a business problem into an analytics problem with a potential analytics solution.

T-1 Reformulate problem statement as an analytics problem.

T-2 Develop a proposed set of drivers and relationships to outputs.

T-3 State the set of assumptions related to the problem.

T-4 Obtain stakeholder agreement on the approach.


(18%-26%) Domain 3 - Data - The ability to work effectively with data to help identify potential relationships that will lead to the refinement of the business and analytics problem.

T-1 Identify and prioritize data needs and sources.

T-2 Acquire data.

T-3 Harmonize, rescale, clean and share data.

T-4 Identify relationships in the data.

T-5 Document and report findings (e.g., insights, results, business performance)

T-6 Refine the business and analytics problem statements.


(12%-18%) Domain 4 - Methodology (Approach) Selection - The ability to identify and select potential approaches for solving the business problem.

T-1 Identify available problem-solving approaches (methods).

T-2 Select software tools.

T-3 Test approaches (methods).

T-4 Select approaches (methods).


(13%-19%) Domain 5 - Model Building - The ability to identify and build effective model structures to help solve the business problem.

T-1 Identify model structures.

T-2 Run and evaluate the models.

T-3 Calibrate models and data.

T-4 Integrate the models.

T-5 Document and communicate findings (including assumptions, limitations, and constraints).


(7%-11%) Domain 6 - Deployment - The ability to deploy the selected model to help solve the business problem.

T-1 Perform business validation of the model.

T-2 Deliver report with findings; or

T-3 Create model, usability, and system requirements for production.

T-4 Deliver production model/system.

T-5 Support deployment


(4%-8%) Domain 7 - Model Life Cycle Management - The ability to manage the model lifecycle to evaluate the business benefit of the model over time.

T-1 Document initial structure.

T-2 Track model quality.

T-3 Recalibrate and maintain the model.

T-4 Support training activities.

T-5 Evaluate the business benefit of the model over time.


Knowledge Statements 

K-1 Characteristics of a business problem statement (i.e., a clear and concise statement of the problem describing the situation and stating the desired end state or goal).

K-2 Interviewing (questioning) techniques (i.e. the process by which a practitioner elicits information and understanding from business experts, including strategies for the success of the project).

K-3 Client business processes (i.e., the processes used by the client or project sponsor that are related to the problem).

K-4 Client and client-related organizational structures.

K-5 Modeling options (i.e., the analytic approaches available for seeking a solution to the problem or answer to the question including optimization, simulation, forecasting, statistical analysis, data mining, machine learning, etc).

K-6 Resources necessary for analytics solutions (e.g., human, data, computing, software).

K-7 Performance measurement (i.e., the technical and business metrics by which the client and the analyst measure the success of the project).

K-8 Risk/return (i.e., trade-offs between prioritizing the primary objective and minimizing the likelihood of significant penalty taking into account the risk attitude of the decision maker).

K-9 Presentation techniques (i.e. strategies for communicating analytics problems and solutions to a broad audience of business clients).

K-10 Structure of decisions (e.g., influence diagrams, decision trees, system structures).

K-11 Negotiation techniques (i.e., strategies and methods that allow the analytics professional to reach a shared understanding with the client).

K-12 Data rules (e.g., privacy, intellectual property, security, governance, copyright, sharing).

K-13 Data architectures (i.e., a description of how data are processed, stored, and used in organizational systems including conceptual, logical, and physical aspects).

K-15 Visualization techniques (i.e., any technique for creating images, diagrams or animations to communicate a message, including data visualization, information visualization, statistical graphics, presentation graphics, etc.)

K-16 Statistics (descriptive, correlation, regression, etc.)

K-17 Software tools.

The five E's are ethics, education, experience, examination, and effectiveness. These are the five pillars of CAP.

Effectiveness is the art of applying your knowledge and skill in a way that enables the achievement of your organization's goals. The soft skills required are dealt with more fully.


Domain 1 - Business Problem Framing

A business problem statement generally starts by describing a business opportunity or threat, or an issue in broad terms. 

Do get definitions of all terms, as meanings change between organizations.

Five W's - who, what, where, when and why;

  • Who are the stakeholders? Anyone who funds, uses, or creates the project, or is affected by its outcome.
  • What problem or function is the project meant to solve or perform?
  • Where does the problem occur, or where does the function need to be performed? Are the physical and spatial characteristics articulated?
  • When does the problem occur or the function need to be performed? When does the project need to be completed?
  • Why does the problem occur, or why does the function need to occur?

First, figure out whether the stakeholder's problem is likely to have an analytics solution, and whether the answer and the change process to get there lie within the organization's control. Second, do we have the data on inputs and outputs? Third, can the problem be modeled? Lastly, can the organization accept and deploy the answer?

Refine the problem statement to make it more accurate, more appropriate to the stakeholders, or more amenable to available analytic tools/methods. It is also necessary to define what constraints the project will operate under. These constraints could be analytical, financial, or political in nature.

Define an initial set of business benefits. These benefits may be determined quantitatively or qualitatively, and together they define the business case.


Domain 2 - Analytics Problem Framing

 

Decomposition of requirements uses QFD (Quality Function Deployment). In decomposing, it is critically important to account for tacit requirements (understood or implied without being stated) as well as formal requirements. The best-known model for this is the KANO Model.

KANO Model distinguishes between unexpected customer delights, known customer requirements, and customer must-haves that are not explicitly stated. 

When business stakeholders are asked for a list of their requirements, they will tend to focus on the "normal requirements," not the "expected requirements." As the analytics professional charged with translating business requirements into the problem statement, you really need to probe to make sure that you have the entire appropriate context as well, including the expected requirements.

Your input/output functions are strongly related to your assumptions about what is important about this problem as well as the key metrics by which you'll measure the organizational response to the problem.

Simple black-box sketches make the inputs visible and illustrate the concept being simulated. This helps in getting agreement among the team on the direction and scale of the relationships, in bounding the problem, and in creating the related hypotheses that you'll use later to attack the data. A point to emphasize to the team is that these are preliminary assumptions: while your best estimate is needed, it is still just an estimate and is subject to change depending on what reality turns out to be. The danger we're trying to avoid here is what Kahneman calls "anchoring". People have a tendency to hang on to views that they've seen and held before, even if they are incorrect. Reminding them that these are initial and preliminary, rather than finalized, views helps mitigate the anchoring effect.

" What is measured, improves". This ties directly to the business problem statement but goes down one level further to the items that compromise the key success metric.

Many people tend to think of stakeholders as people in positions "above" the analytics team. It is true that there is a group of stakeholders who have the business need and who are paying for the effort. But just as importantly, you must also have agreement from the people executing the analytics work that your methods and hypotheses are workable in the time and budget allocated to get the work done. The output of this stakeholder agreement will vary by organization but should include the budget, timeline, interim milestones (if any), goals, and any known effort that is excluded as out of scope. Otherwise, errors will creep in and what is delivered will miss critical unstated requirements. If you allow your project to rely on written communication only, you've missed the opportunity to correct misapprehensions when it is still cheap to do so.

Decomposition: the act of breaking down a higher-level requirement to multiple lower-level requirements.

Requirements: a requirement should be unitary (containing no conjunctions such as "and," "but," or "or"), positive, and testable.



Data Visualization: Individuals process information in different ways

  • Some individuals are visual and others are analytical
  • Some individuals want to see the big picture, others want to see details and some want anecdotes
  • Some individuals want demonstrations or even the ability to slice and dice the data themselves.


Think hard about your visualization: the less blind faith it demands, the better. It should be a white box rather than a black box, one that shows the relationships between the variables and reduces the dimensions in the data. Remember that your visualization will be used by other business people, who may replicate the results and take them out of context.


Importance of Data Reduction:

  • Product Segmentation


The results have to be communicated in a way that is easy to follow for people who have not worked with the data as much as we have.

Don't be mechanical in giving recommendations; find the little insights, as recommendations do not have to be big in scope.

Prototyping: take a small example and see what the data implies. It is an iterative task and helps to refine the problem statement. It helps in understanding whether the expected outputs can be achieved or whether we need to go back and revisit the data plan.

Consideration for Model Selection:

  • Modeling Options
  • Data Architecture 

Prescriptive Models include:

  • Optimization (Linear Programming, Non-Linear Programming, Integer Programming, etc.)
  • Stochastic - Optimization (randomness of the system comes into play)

Predictive Models include:

  • Simulation
  • Regression
  • Statistical Inferences
  • Classification
  • Clustering
  • Artificial Intelligence
  • Game Theory

High Value, High Impact, High Level of Data Accuracy, Causal Understanding - Prescriptive and Predictive

Medium Level of Data Needs - Descriptive Analytics


For Business Needs  - Two types of compare and contrast analysis must be done before selecting a particular method or modeling approach.

  • Type I - Between Approach - Prescriptive vs Predictive vs Descriptive or a combination method
  • Type II - Within Approach - Within the three which method is best suitable


Select Methodology :

  • Selection criteria depend on:
  1. Time available / constraints
  2. Accuracy needed
  3. Relevance of the methodology and scope of the project
  4. Accuracy of the data: if the data is not good, we cannot use a highly data-intensive method
  5. Data availability and readiness
  6. Resources available
  7. Methodology popularity/acceptance
  8. Match of the approach to the accuracy required

Important areas to focus on in method selection are:

  1. Know what a method can do;
  2. Know what a method cannot do;
  3. Stay unbiased in method selection, and do not default to a method simply because the practitioner knows it well.

Good meta-knowledge means knowing what software is available in the market, what inputs each package can handle, and which is the best fit for the business problem.


In Document and Communicate Findings, 

  • Need to document how findings impact the original business problem
  • Rather than giving hungry customers a walkthrough of the kitchen or the restaurant and only then feeding them, it is better to put the food on the table presented the way they want it, let them eat, and then answer their questions.


Every model has a lifecycle and requires lifecycle maintenance. For some models the lifecycle may be three weeks; for others, six months. If the model can no longer give reliable answers, then we need to do maintenance on the model. Documentation of the structure of the model is important.

The documentation should include the assumptions, so that the business knows where assumptions were made and can judge when to use the model and when to rely on its output.

Best Certifications in IT

The 15 top-paying certifications of 2018

  • Certified in the Governance of Enterprise IT (CGEIT)
  • AWS Certified Solutions Architect – Associate
  • Project Management Professional (PMP)
  • AWS Certified Developer – Associate
  • Certified Information Systems Security Professional (CISSP)
  • Certified in Risk and Information Systems Control (CRISC)
  • Certified Information Security Manager (CISM)
  • Certified ScrumMaster
  • Certified Ethical Hacker (CEH)
  • Six Sigma Green Belt
  • Citrix Certified Professional – Virtualization (CCP-V)
  • Microsoft Certified Solutions Expert (MCSE) – Server Infrastructure
  • Certified Information Systems Auditor (CISA)
  • Cisco Certified Networking Professional (CCNP) Routing and Switching
  • Citrix Certified Associate – Networking (CCA-N)


Risk and Compliance Certifications

  • Certified in Risk and Information Systems Control (CRISC)
  • Certified in the Governance of Enterprise IT (CGEIT)
  • Project Management Institute - Risk Management Professional (PMI-RMP)


Top 15 data science certifications

  • Applied AI with DeepLearning, IBM Watson IoT Data Science Certificate
  • Big Data Certification, UC San Diego Extension School
  • Certified Analytics Professional (CAP)
  • Cloudera Certified Associate - Data Analyst
  • Cloudera Certified Professional: CCP Data Engineer
  • Certification of Professional Achievement in Data Sciences, Columbia University
  • Certification in Data Science, Georgetown University School of Continuing Studies
  • Data Science Certificate, Harvard Extension School
  • Data Science A-Z: Real Life Data Science Exercises
  • Data Science for Executives, Columbia University
  • Dell EMC Proven Professional
  • Microsoft Certified Solutions Expert
  • Microsoft Professional Program in Data Science
  • SAS Academy for Data Science
  • Springboard Introduction to Data Science
  • Data Science Council of America (DASCA)


Top 10 IT management certifications

  • Certified Associate in Project Management (CAPM)
  • Certified in the Governance of Enterprise IT (CGEIT)
  • Certified ScrumMaster (CSM)
  • Certified Information Systems Security Professional (CISSP)
  • COBIT 5 Foundation Certification
  • CompTIA Project+
  • Information Technology Infrastructure Library (ITIL)
  • PMI Agile Certified Practitioner (PMI-ACP)
  • Six Sigma Certification
  • TOGAF 9 Certification
  • Certified Change Management Professional (CCMP)
  • Certified Professional Facilitator(CPF)



Cyber Coverages Commonly Available

Third-party liability for damage and defense costs resulting from:*

Network security liability

–   Unauthorized disclosure of private information (privacy liability)

–   Destruction of digital assets

–   Unintentional transmission of malicious code

–   Unintentional participation in denial-of-service attack

Failure to promptly report unauthorized disclosure of private information

Failure to comply with statutory requirement that insured manage an identity-theft prevention program

(Note: Not all jurisdictions have statutory requirement.)

Electronic media liability

(Note: Defamation and infringement of intellectual property rights optional in some policies.)

Technology errors and omissions liability

*In some policies:

•      Payment of defense costs reduces policy limit, or such costs are paid in addition to the policy limit.

•      Selection of defense counsel is mutually agreed upon, or counsel is selected solely by the insurer.

•      Insured can refuse to settle and be responsible for 30% to 50% of claim, or insured must settle if insurer chooses to settle.

 First-party expenses for:

Notification of customers regarding breach

(Note: For costs incurred within one year of notice to insurer.)

Forensic study to determine scope and cause of breach

Hiring attorney to ensure compliance with notification-of-breach laws

Regulatory action

(Note: Fines and penalties considered a third-party liability in some policies.)

Crisis management to mitigate damage to reputation

(Note: Public relations and credit monitoring sublimit of $100,000 on one policy; limit usually agreed on.)

Business interruption and additional expenses

(Note: Optional in some policies; also known as Business Income [and Extra Expense].)

Electronic data protection/remediation

(Note: Optional in some policies; difficult to insure because of prohibitive cost.)

Cyber extortion

(Note: Various threats asserted: introduction of a virus, denial of service, and transfer of funds available on some policies.)

Cyber crime

(Note: Insured's financial institution transfers funds on a thief's instructions; available on some policies.)

Trends in InsureTech Space

There are interesting projects across the globe—projects related to digital transformation in terms of distribution, product and services.


Here are my 4 top takeaways:

It’s happening and it’s happening now.

The insurance technology industry worldwide received around US$2.3B in investment funds in 2017 according to a report from CB Insights, a data and research company. This figure represents a 36% increase from 2016.

The combination of a low cost of capital, increased smartphone use and IoT adoption, and the “entrepreneurship” phenomenon has facilitated the proliferation of interesting insurtech projects across the insurance ecosystem.

There was also a proliferation of dynamic, vibrant projects, backed by smart folks, interesting technology and investors with deep pockets. It reminds me a lot of 2008-2010 and the smartphone tsunami, which opened the window to the application movement. There will be people who won’t make it, but this is going to be an unstoppable movement.

  • Direct to Consumer (D2C) models are hard to sustain. They need a lot of capital.

I love the value proposition of many insurtech players. But they’re doing things the hard way, and they’re going to have a bumpy road ahead. Many of those players have high valuations and strong user growth, but also high combined ratios. That’s good but not great.

As with many startups worldwide, especially those focused on the direct-to-consumer market, several speakers suggested that some of these D2C projects probably won’t make it. Eventually, we’ll see them pivot instead into new technology backed by incumbent service providers.

It was also mentioned that to succeed, these companies need to take a long-term approach and have deep pockets of capital, enabling them to sustain losses over a long time. And we’ll see major tech players entering the space soon, betting on some of these companies just as they’re getting traction. They have the capital to invest.

  • It’s not a zero-sum game.

This effort isn’t a zero-sum game, with startups attacking incumbents. Instead, it will be startups plus incumbents, with both players adding something to the mix. That will help them maximize their strengths and minimize their weaknesses.

But the incumbents will have to learn from other industries and pay close attention to what innovations are taking place. We’re all tired of the “Kodak” speech. But the insurance industry could be the latest industry to be disrupted, with incumbents driving innovation in the marketplace.

Many incumbents have already realized that undertaking this transformation alone using legacy IT systems won’t fly. Instead, they’ll need to adapt existing technology to meet their needs with partners providing investment funds. It’s the smart move to make.

On the other hand, many insurtech investors are technology people, not insurance people. Insurance is highly regulated and complex. Going D2C is an option, but it requires a lot of capital to do it successfully.

They also have to assume the risk of being the first mover. That’s fine as is. But the chances of failing are great, so many companies are pivoting to become B2B and B2C players, providing technology support to incumbents transforming the industry.

That will reduce the undertaking’s complexity and risk. These companies will focus just on technology, with incumbents having the necessary assets to deal with the market, regulation, etc. This approach is a win-win for everyone involved.

Large legacy insurers have been very active in this marketplace during the last several years. They’ve invested in and partnered with the industry’s best insurtech players in this win-win approach.

For example, Chubb has provided insurance and APIs for SMEs in the USA through partners like Coverwallet; developed new products, such as life insurance covering just the length of a flight, with partners like Sure; and adopted technology to enrich the user experience and provide better customer service.

The biggest beneficiary: The customer

There is no doubt that, with all this change, customers will be the biggest beneficiaries. All this focus on new technology will generate better user experiences, increase corporate transparency, enhance tailor-made offerings and produce better claims and servicing experiences, with each new development setting a new standard in the industry.

Also, I see incumbents serving clients better across the entire customer journey. What I didn’t see early on was incumbents working with carriers in other industries. But now I’m excited about what companies in this sector are doing for customers.

There’s never been a better time to work in insurance. We’ll see a real industry transformation in the next five years with technology starting to play a critical role in defining new products and providing better customer experiences across the entire insurance value chain.