10 Valuable Artificial Intelligence Certifications for 2024 (analyticsinsight.net)
10 AI Certifications for 2024: Build Your Skills and Career | Upwork
Jetson AI Courses and Certifications | NVIDIA Developer
Microsoft Certified: Azure AI Engineer Associate - Certifications | Microsoft Learn
Artificial Intelligence Certification | AI Certification | ARTIBA
Certified Artificial Intelligence Scientist | CAIS™ | USAII®
Fundamentals of Machine Learning for Healthcare | Coursera
Bioinformatics Specialization [7 courses] (UCSD) | Coursera
rabbit in a hat
soggy
perseus
usagi
jackalope
atlas
athena
eden academy ohdsi
Understanding Large Language Models:
1. DALL-E 2 (OpenAI)
Topics where a user can contribute:
Prompt Engineering Overview:
At the most basic level we have an interface to interact with a language model: we pass in some instruction and the model returns a response generated by the language model.
A prompt is composed with the following components:
Settings to keep in mind:
Designing prompts for Different Tasks:
Tasks Covered:
Tools & IDEs: Tools, libraries and platforms with different capabilities and functionalities include:
Example of LLMs with external tools:
Opportunities and Future Directions:
A token in ChatGPT is roughly 4 characters (about three-quarters of a word).
Some notes on Recurrent Neural Networks: a neural network that maintains a high-dimensional hidden state. When a new observation arrives, it updates this hidden state.
In machine learning there is a lot of unity in the principles applied to different data modalities: we use the same neural-net architectures, gradients and the Adam optimizer to update the weights. For RNNs we add some extra tools to reduce the variance of the gradients. Examples of modality-specific choices are CNNs for image learning or Transformers for NLP problems. Years back in NLP, every tiny problem had its own architecture.
Question: Where does vision stop and language begin?
With deep learning we are looking at a static problem: there is a probability distribution and we apply the model to that distribution.
Backpropagation is a useful algorithm and is not going away, because it helps in finding a neural circuit subject to some constraints.
For natural language modelling, very large datasets work because we are predicting the next word first from broad strokes and surface-level patterns. As the language model becomes large, it understands characters, spacing, punctuation and words, and finally it learns the semantics and the facts.
Transformers are the most important advance in neural networks. A Transformer is a combination of multiple ideas, of which attention is the key one. It is designed to run really fast on a GPU. It is not recurrent, so it is shallower (less deep) and much easier to optimize.
After Transformers, to build AGI, research is ongoing in self-play and active learning.
GANs don't have a mathematical cost function that they optimize by gradient descent. Instead there is a game between two networks, defined through mathematical functions, which tries to find an equilibrium.
Another example of deep learning without an explicit cost function is reinforcement learning with self-play and surprise-based actions.
Double Descent:
When we make a neural network larger it becomes better, which runs contrary to classical statistical intuition. But there is a phenomenon called the double descent bump, described below:
Double descent occurs for all practical deep learning systems. Take a neural network and start increasing its size slowly while keeping the dataset size fixed. If you keep increasing the network size and don't do early stopping, performance first improves and then gets worse. The point where the model is worst is precisely the point where it reaches zero training error (zero training loss); as you make it larger still, it starts to get better again. It is counter-intuitive because we expect deep learning performance to improve monotonically with model size.
The intuition is as follows:
"When we have a large data and a small model then small model is not sensitive to randomness/uncertainty in the training dataset. As the model gets large it achieves zero training error at approximately the point with the smallest norm in that subspace. At the point the dimensionality of the training data is equal to the dimensionality of the neural network model (one-to-one correspondence or degrees of freedom of dataset is same as degrees of freedom of model) at that point random fluctuation in the data worsens the performance (i.e. small changes in the data leads to noticeable changes in the model). But this double descent bump can be removed by regularization and early stopping."
If we have more data than parameters, or more parameters than data, then the model will be insensitive to random changes in the dataset.
Overfitting: when the model is very sensitive to small, random, unimportant details in the training dataset.
Early Stopping: we train the model while monitoring validation performance, and when the validation performance starts to get worse we stop training (i.e. we decide the model is good enough).
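Early stopping is easiest to see with a learner that reports validation error as it trains. A minimal R sketch using xgboost (the `data` frame, `Target` column and 80/20 split are hypothetical): training halts once validation RMSE has not improved for 20 rounds.

```r
library(xgboost)

set.seed(42)
idx    <- sample(nrow(data), 0.8 * nrow(data))                 # hypothetical train/validation split
feats  <- setdiff(names(data), "Target")
dtrain <- xgb.DMatrix(as.matrix(data[idx,  feats]), label = data$Target[idx])
dvalid <- xgb.DMatrix(as.matrix(data[-idx, feats]), label = data$Target[-idx])

# Stop once validation RMSE has not improved for 20 consecutive rounds
model <- xgb.train(params = list(objective = "reg:squarederror", eta = 0.1),
                   data = dtrain,
                   nrounds = 1000,
                   watchlist = list(train = dtrain, valid = dvalid),
                   early_stopping_rounds = 20,
                   verbose = 0)
model$best_iteration   # the round at which validation performance peaked
```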
ChatGPT:
ChatGPT has become a watershed moment for organizations because all companies are inherently language-based companies. Whether it is text, video, audio or financial records, all of it can be described as tokens which can be fed to large language models.
A good example of this: during the training of ChatGPT on Amazon reviews, they found that after a large amount of training the model became an excellent classifier of sentiment. So the model went from predicting the next word (token) in a sentence to understanding the semantics of the sentence, and could tell whether a review was positive or negative.
With the advancement of AI, we can have the likeness of a particular person as a separate bot, and that person will get a say, a cut and licensing opportunities for their likeness.
All the material is for getting certified in Google Universal Analytics (GA3), but it will also help to prepare for GA4. Unfortunately GA4 is very new and very few people are using it.
Udemy:
https://www.udemy.com/share/101YUA3@1ZQpoeanMxxthiBi3TRUePtvhK8jpKedLNfathrLsI_5x8FtERy5aZusAp5R/
This one is an excellent resource before the exam:
https://www.udemy.com/share/1057WK3@B0vqy8cXKsPzaotyxGtf8OMJUbk6LabDRa9MvahhOqCaaXBprgawEPRvwRFK/
Google Material
https://skillshop.exceedlms.com/student/catalog/list?category_ids=6431-google-analytics-4
https://calendly.com/yourknowledgebuddyuk/1-2-1?month=2022-08
https://www.efinancialcareers.com/
https://www.jobs.nhs.uk/xi/search_vacancy/
A Wald/score chi-square test can be used for continuous and categorical variables, whereas the Pearson chi-square is used for categorical variables. The p-value indicates whether a coefficient is significantly different from zero. In logistic regression, we can select the top variables based on their high Wald chi-square values, as sketched below.
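A minimal R sketch of screening variables by their Wald chi-square in a logistic regression, assuming a hypothetical data frame `data` with a binary `Target` column; the squared z value reported by summary() is the Wald chi-square for each coefficient.

```r
fit   <- glm(Target ~ ., data = data, family = binomial)
coefs <- summary(fit)$coefficients

wald_chisq <- coefs[, "z value"]^2            # Wald chi-square statistic per coefficient
p_values   <- coefs[, "Pr(>|z|)"]             # significance of each coefficient
sort(wald_chisq, decreasing = TRUE)           # rank variables by Wald chi-square
```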
Gain: Gain at a given decile level is the ratio of the cumulative number of targets (events) up to that decile to the total number of targets (events) in the entire dataset. This is also called the CAP (Cumulative Accuracy Profile) in finance / credit risk scoring.
Interpretation: the % of targets (events) covered at a given decile level. For example, 80% of targets are covered in the top 20% of data ranked by the model. In the case of a propensity-to-buy model, we can identify and target 80% of the customers who are likely to buy the product by sending email to just 20% of all customers.
Interpretation: a cumulative lift of 4.03 for the top two deciles means that when selecting 20% of the records based on the model, one can expect 4.03 times the number of targets (events) found by randomly selecting 20% of the file without a model.
| Decile Rank | Number of Cases | Number of Responses | Cumulative Responses | % of Events | Gain | Cumulative Lift | Cumulative % of Data (divides Gain to give Lift) |
|---|---|---|---|---|---|---|---|
| 1 | 2500 | 2179 | 2179 | 44.71% | 44.71% | 4.47 | 10% |
| 2 | 2500 | 1753 | 3932 | 35.97% | 80.67% | 4.03 | 20% |
| 3 | 2500 | 396 | 4328 | 8.12% | 88.80% | 2.96 | 30% |
| 4 | 2500 | 111 | 4439 | 2.28% | 91.08% | 2.28 | 40% |
| 5 | 2500 | 110 | 4549 | 2.26% | 93.33% | 1.87 | 50% |
| 6 | 2500 | 85 | 4634 | 1.74% | 95.08% | 1.58 | 60% |
| 7 | 2500 | 67 | 4701 | 1.37% | 96.45% | 1.38 | 70% |
| 8 | 2500 | 69 | 4770 | 1.42% | 97.87% | 1.22 | 80% |
| 9 | 2500 | 49 | 4819 | 1.01% | 98.87% | 1.10 | 90% |
| 10 | 2500 | 55 | 4874 | 1.13% | 100.00% | 1.00 | 100% |
| Total | 25000 | 4874 | | | | | |
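A minimal R sketch that reproduces this kind of gain/lift table, assuming a hypothetical scored data frame `scored` with predicted probabilities `prob` and actual outcomes `target` (0/1):

```r
library(dplyr)

gain_table <- scored %>%
  mutate(decile = ntile(-prob, 10)) %>%               # decile 1 = highest scores
  group_by(decile) %>%
  summarise(cases = n(), responses = sum(target)) %>%
  arrange(decile) %>%
  mutate(cum_responses = cumsum(responses),
         gain     = cum_responses / sum(responses),   # cumulative % of events captured
         cum_pop  = cumsum(cases) / sum(cases),       # cumulative % of records targeted
         cum_lift = gain / cum_pop)                   # e.g. 0.8067 / 0.20 = 4.03
gain_table
```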
Detecting Outliers
IQR is the interquartile range; it measures dispersion or variation. IQR = Q3 - Q1. Some researchers use 3 times the interquartile range instead of 1.5 as the cutoff. If a high percentage of values appear as outliers when you use 1.5*IQR as the cutoff, then you can use the wider rule below.
Standard rule (1.5 × IQR):
Lower limit of acceptable range = Q1 - 1.5 * (Q3 - Q1)
Upper limit of acceptable range = Q3 + 1.5 * (Q3 - Q1)
Wider rule (3 × IQR):
Lower limit of acceptable range = Q1 - 3 * (Q3 - Q1)
Upper limit of acceptable range = Q3 + 3 * (Q3 - Q1)
Acceptable range (alternative rule): the mean plus or minus three standard deviations.
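A minimal R sketch of both rules, assuming a hypothetical numeric vector `x`:

```r
q1  <- quantile(x, 0.25)
q3  <- quantile(x, 0.75)
iqr <- q3 - q1

lower <- q1 - 1.5 * iqr          # switch 1.5 to 3 for the wider rule
upper <- q3 + 1.5 * iqr
outliers_iqr <- x[x < lower | x > upper]

# Alternative rule: mean plus or minus three standard deviations
outliers_sd <- x[abs(x - mean(x)) > 3 * sd(x)]
```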
4. Weight of Evidence: Logistic regression is one of the most commonly used statistical techniques for solving binary classification problems, and is an accepted technique in almost all domains. Two concepts - weight of evidence (WOE) and information value (IV) - evolved from this same logistic regression technique. These two terms have existed in the credit scoring world for more than 4-5 decades. They have been used as a benchmark to screen variables in credit risk modeling projects such as probability of default. They help to explore data and screen variables, and are also used in marketing analytics projects such as customer attrition models, campaign response models, etc.
The weight of evidence tells us the predictive power of an independent variable in relation to the dependent variable. Since it evolved from the credit scoring world, it is generally described as a measure of the separation of good and bad customers. "Bad customers" refers to customers who defaulted on a loan, and "good customers" refers to customers who paid back the loan.
Distribution of Goods - % of Good Customers in a particular group
Distribution of Bads - % of Bad Customers in a particular group
ln - Natural Log
Positive WOE means Distribution of Goods > Distribution of Bads
Negative WOE means Distribution of Goods < Distribution of Bads
Hint: a log of a number > 1 gives a positive value; if it is less than 1, it gives a negative value.
WOE = ln(% of non-events / % of events)
1. Fine Classing: Create 10/20 bins/groups for a continuous independent variable and then calculate the WOE and IV of the variable.
2. Coarse Classing: Combine adjacent categories with similar WOE scores.
Categorical independent variables: Combine categories with similar WOE and then create new categories of the independent variable with continuous WOE values. In other words, use WOE values rather than raw categories in your model. The transformed variable will be a continuous variable with WOE values, the same as any other continuous variable.
Why combine categories with similar WOE? Because categories with similar WOE have almost the same proportion of events and non-events; in other words, the behavior of both categories is the same. A sketch of the calculation follows.
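A minimal R sketch of the WOE/IV calculation for one grouped predictor, assuming a hypothetical data frame `data` with a binary `Target` (1 = event/bad, 0 = non-event/good) and a factor column `grp` holding the fine/coarse classes:

```r
library(dplyr)

woe_tab <- data %>%
  group_by(grp) %>%
  summarise(events = sum(Target == 1), non_events = sum(Target == 0)) %>%
  mutate(pct_events     = events / sum(events),
         pct_non_events = non_events / sum(non_events),
         woe = log(pct_non_events / pct_events),        # WOE = ln(% non-events / % events)
         iv  = (pct_non_events - pct_events) * woe)     # each group's contribution to IV
woe_tab
sum(woe_tab$iv)   # total Information Value of the variable
```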
Boruta shuffles the predictors' values, joins these shadow copies with the original predictors, and then builds a random forest on the merged dataset. It then compares the original variables with the randomised (shadow) variables to measure variable importance. Only variables with higher importance than the randomised variables are considered important.
Major Disadvantages: Boruta does not treat collinearity while selecting important variables. It is because of the way algorithm works.
If a variable has a very low rank for Spearman (coefficient close to 0) and a very high rank for Hoeffding, it indicates a non-monotonic relationship.
If a variable has a very low rank for Pearson (coefficient close to 0) and a very high rank for Hoeffding, it indicates a non-linear relationship.
Marketing Mix
Product: includes all product items marketed by the marketer, their features, quality, brand, packaging, labelling, product life cycle, and all decisions related to the product
Product assortment: offered to customers by the entire industry
Product line: a group of similar featured items marketed by a marketer
The total number of lines is referred to as the breadth (width) of the product mix
Product depth (item depth) refers to the number of versions offered of each product in the line
Distribution channel – is very important to Netflix
Price: brings revenue; the act of determining the value of a product
Includes pricing objectives, price-setting strategies, general pricing policies, discounts, allowances, rebates, etc. The price mix also includes cash and credit policy, price discrimination, cost and contribution
Place: location, distance, transport
Direct marketing: no intermediary is involved
Promotion: the combination of all activities concerned with informing and persuading actual and potential customers about the merits of a product, with the intention of achieving sales goals
Sales promotion involves offering short-term incentives to promote buying and increase sales
The most popular forms of sales promotion are free gifts, discounts, exchange offers, free home delivery, after-sales service, guarantees, warranties, various purchase schemes, etc.
Favourable relations between organizations and public
Modifications and extensions to the 4 P's
Product, price, place and promotion (marketer-oriented approach)
Consumer-oriented approach (4 C's)
Commodity - Product
Cost - Cost
Channel - Place
Communication - Promotion
Services are fundamentally different from products, so the mix is extended for services:
Process : procedures / mechanisms for delivering services and monitoring
People : human factor as they interact with the consumer using the services
Physical Evidence:
Extension of 4c’s
Consumer solution
Cost convenience
Communication
Elements of the marketing mix are mutually dependent
Marketing mix elements are meant for attaining the target markets
The essence of the marketing mix is ensuring profitability through customer satisfaction
Elements help the marketer in attaining marketing objectives
Customer is the central focus of marketing mix
Purpose and objectives of marketing mix
Marketing mix aims at customer satisfaction
Success of each and every product
Aims at assisting the marketers in creating effective marketing strategy
Profit maximization, image building, creation of goodwill, maintaining better customer relations
Marketing mix is the link between business and customers
Marketing mix helps to increase sales and profit
For Netflix: a reduction in price could be attributed to diminishing returns from advertising
Marketing Mix Modelling (MMM) is a method that helps quantify the impact of several marketing inputs on sales or market share. The purpose of MMM is to understand how much each marketing input contributes to sales, and how much to spend on each marketing input.
MMM relies on statistical analysis such as multivariate regressions on sales and marketing time series data to estimate the impact of various marketing tactics (marketing mix) on sales and then forecast the impact of future sets of tactics. It is often used to optimize the advertising mix and promotional tactics with respect to sales and profits.
Marketing Mix Modeling (MMM) is one of the most popular analyses under Marketing Analytics. It helps organisations estimate the effects of spend on different advertising channels (TV, radio, print, online ads, etc.) as well as other factors (price, competition, weather, inflation, unemployment) on sales. In simple words, it helps companies optimize the marketing investments they make in different marketing mediums (both online and offline).
Types of Marketing Mediums
Let's break it into two parts - offline and online.

| Offline Marketing | Online Marketing |
|---|---|
| Print Media: Newspaper, Magazine | Search Engine Marketing like Content Marketing, Backlink building etc. |
| TV | Pay per Click, Pay per Impression |
| Radio | Email Marketing |
| Out-of-home (OOH) Advertising like Billboards, ads in public places | Social Media Marketing (Facebook, YouTube, Instagram, LinkedIn Ads) |
| Direct Mail like catalogs, letters | Affiliate Marketing |
| Telemarketing | |
| Below The Line Promotions like free product samples or vouchers | |
| Sponsorship | |
MMM has had a place in marketers’ analytics toolkit for decades. This is due to the unique insights marketing mix models can provide. By leveraging regression analysis, MMM provides a “top down” view into the marketing landscape and the high-level insights that indicate where media is driving the most impact.
For example: by gathering long-term, aggregate data over several months, marketers can identify the mediums consumers engage with the most. MMM provides a report of where and when media is engaged over a long stretch of time.
Background: Marketing Mix Modeling (MMM)
The beginning of the offline measurement
Marketing Mix Modelling is a decades-old process developed in the earliest days of modern marketing that applies regression analysis to historical sales data to analyse the effects of changing marketing activities. Many marketers still use MMM for top-level media planning and budgeting; it delivers a broad view of variables both inside and outside of the marketer's control.
Some of the factors are:
The analytical and statistical methods used to quantify the effect of media and marketing efforts on a product's performance are called Marketing Mix Modeling.
"It helps to maximize investment and grow ROI"
ROI = (Incremental returns from investment) / Cost of Investment
Marketing ROI = (Incremental Dollar Sales from Marketing Investment) / Spend on Marketing Investment
Why is MMM Needed? Guiding Decisions for Improved Effectiveness
How does MMM work?
Example Marketing Mix Model Output
Detailed output includes:
Market Contribution vs. Base
ROI Assessment:
We measure ROI because not all ads convert to sales; the aim is to find which ones are cost-effective and deliver the most bang for the buck.
MMM Strengths:
MMM Limitations:
Critical Success Factors of MMM:
Media Mix Modeling as Econometric Modeling:
Strengths:
Weaknesses:
For working with Marketing Mix Modeling, a good understanding of econometric modelling is needed.
The objective before starting this approach is: how can we maximize the value and minimize the harm of marketing mix models, such as store-based models or shopper-based multi-touch attribution models?
Marketing end users are the root cause of most marketing mix model problems.
Tip: Most attribution projects begin long after the strategy has already been set. So it's important to understand what the client did, why they did it, and what they expected to happen. Only then can you answer their questions in a way they'll be happy with. Remember they hired you because the results weren't what they expected... or because they never thought about how to measure them in the first place.
As we all know weekly variation is the lifeblood of marketing mix models.
Some of the problems are continuity bias
Very interesting article on using Market Mix Modelling during COVID-19.
Market Mix Modeling (MMM) in times of Covid-19 | by Ridhima Kumar | Aryma Labs | Medium
In the article, I read that there was a sudden demand for essential items during the pandemic, but this deviation cannot be attributed to the existing advertisement factors.
In the regression model we can see that there will be;
Another very interesting article on Marketing Analytics using Markov chain
Marketing Analytics through Markov Chain | LinkedIn
In the article, I read how we can use a transition matrix to understand the change in states. It explains this very neatly.
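As an illustration of the idea (not the article's own code), a minimal R sketch of a channel-transition matrix with made-up probabilities, showing how matrix multiplication gives the state distribution after several steps:

```r
# Hypothetical 3-state transition matrix estimated from customer journeys
P <- matrix(c(0.6, 0.3, 0.1,
              0.2, 0.5, 0.3,
              0.1, 0.2, 0.7),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Display", "Search", "Email"),
                            c("Display", "Search", "Email")))

# Probability of being in each state after two steps, starting from Display
start <- c(Display = 1, Search = 0, Email = 0)
start %*% P %*% P
```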
Article on Conjoint Analysis : Conjoint Analysis: What type of chocolates do the Indian customers prefer? | LinkedIn
Marketing Mix Modeling (MMM) is the use of statistical analysis to estimate the past impact and predict the future impact of various marketing tactics on sales. Your Marketing Mix Modeling project needs to have goals, just like your marketing campaigns.
The main goal of any Marketing Mix Modeling project is to measure past marketing performance so you can use it to improve future Marketing Return on Investment (MROI).
The insights you gain from your project can help you reallocate your marketing budget across your tactics, products, segments, time and markets for a better future return. All of the marketing tactics you use should be included in your project, assuming there is high-quality data with sufficient time, product, demographic, and/or market variability.
Each project has four distinct phases, starting with data collection and ending with optimization of future strategies. Let's take a look at each phase in depth:
Phase 1 : Data Collection and Integrity : It can be tempting to request as much data as possible, but it's important to note that every request has a very real cost to the client. In this case the task could be simplified down to just marketing spend by day, by channel, as well as sales revenue.
Phase 2 : Modeling. Before modelling we need to:
Phase 4 : Optimization & Strategies
Pitfalls in Market Mix Modeling:
1. Why MMM vendors being "personally objective" is not the same as their being "statistically unbiased".
Some points about Marketing Mix Modeling:
Your Marketing Return on Investment (MROI) will be a key metric to look at during your Marketing Mix Modeling project, whether that be Marginal Marketing Return on Investment for future planning or Average Marketing Return on Investment for past interpretation. The best projects also gauge the quality of their marketing mix model using Mean Absolute Percent Error (MAPE) and R^2.
1. Ad creative is very important to your sales top line and your MROI, especially if you can tailor it to a segmented audience. This paper presents five best Spanish language creative practices to drive MROI, which should also impact top-of-the-funnel marketing measures.
2. The long-term impact of marketing on sales is hard to nail down, but we have found that ads that don’t generate sales lift in the near-term usually don’t in the long-term either. You can also expect long-term Marketing Return on Investment to be about 1.5 to 2.5 times the near-term Marketing Return on Investment.
3. Modeled sales may not be equivalent to total sales. Understand how marketing to targeted segments will be modeled.
4. Brand size matters. As most brand managers know firsthand, the economics of advertising favor large brands over small brands. The same brand TV expenditure and TV lift produce larger incremental margin dollars, and thus a larger Marketing Return on Investment, for the large brand than for the small brand.
5. One medium's Marketing Return on Investment does not dominate consistently. Since flighting, media weight, targeted audience, timing, copy and geographic execution vary by medium for a brand, each medium's Marketing Return on Investment can also vary significantly.
Define the Variables
Sales
Media Variables:
Control Variables
Pick Functional Form of Demand Equation
Quantity Demanded = f(media variables, control variables)
Most Common Functional Forms
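The material under this heading appears to have been an image. As a stand-in, a minimal R sketch of the two functional forms most commonly cited for the demand equation, linear (additive) and multiplicative (log-log), assuming a hypothetical weekly data frame `mmm` with `sales`, `tv_spend`, `digital_spend` and `price` columns:

```r
# Linear (additive) form: Sales = b0 + b1*TV + b2*Digital + b3*Price + error
linear_fit <- lm(sales ~ tv_spend + digital_spend + price, data = mmm)

# Multiplicative (log-log) form: coefficients read directly as elasticities
loglog_fit <- lm(log(sales) ~ log(tv_spend + 1) + log(digital_spend + 1) + log(price),
                 data = mmm)
summary(loglog_fit)
```

In the log-log form the fitted coefficients can be interpreted directly as elasticities, which is one reason it is often preferred in MMM work.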
Modelling Issues
Market-Mix Modeling Econometrics
Multiple Factors that Affect Outcome (Incremental Sales) :
Marketing Mix Modelling is designed to pick up short-term effects; it is not able to model long-term effects such as the effect of the brand. Advertising helps in building a brand, but this is difficult to model.
Attribution Modeling is different from Media/Marketing Mix Modeling as it offers additional insight. In this type of modelling, we measure the contribution of earlier touchpoints in the customer's digital journey to the final sale. Attribution Modeling is a bottom-up approach, but it is becoming difficult to do because third-party cookies are being phased out.
Multi-Touch Attribution modelling is more advanced than top-down Marketing Mix Modeling because there is an instant feedback loop to understand what is working; whereas in Marketing Mix Modeling we would just determine the percentage change in x needed to drive sales, and then in next year's model make the adjustment again, without getting any real on-the-ground feedback on whether we reached the target we set out to achieve.
Nielsen is the largest Marketing Mix Modeling provider in the world.
When it comes to initial marketing strategy or understanding external factors that can influence the success of a campaign, marketing mix modeling shines. Given that MMM leverages long-term data collection to provide its insights, marketers can measure the impact of holidays, seasonality, weather, brand authority, etc. on overall marketing success.
As consumers engage with brands across a variety of print, digital, and broadcast channels, marketers need to understand how each touchpoint drives consumers toward conversion. Simply put, marketers need measurements at the person-level that can measure an individual consumer’s engagement across the entire customer journey in order to tailor marketing efforts accordingly.
Unfortunately, marketing mix modeling can’t provide this level of insight. While MMM has a variety of pros and cons, the biggest pitfall of MMM is its inability to keep up with the trends, changes, and online and offline media optimization opportunities for marketing efforts in-campaign.
This distinction is provided by the Association of Data Scientists (ADaSci). This designation is awarded to candidates who pass the CDS exam and hold a minimum of two years of work experience as a data scientist. However, candidates who do not have the experience can also take the exam and keep the result; their charter, in this case, is put on hold until they attain the two years of experience. There is no training or course required to earn this award. The cost of taking this exam is 250 US Dollars. This charter has lifetime validity and hence does not expire.
Chartered Financial Data Scientist
The Chartered Financial Data Scientist program is organized by the Society of Investment Professionals in Germany. They first provide a training course conducted by the Swiss Training Centre for Investment Professionals. After completing this training, the candidates are allowed to earn this designation. It costs around 8,690 Euro.
Certified Analytics Professional
This professional certification is offered by INFORMS. It is supported by the Canadian Operational Research Society and 3 more professional societies. There are various levels of certification. Each level has different eligibility requirements, from graduate to postgraduate etc. To earn this certification, the cost starts from 495 US Dollar. To take this exam, the candidate needs to be available in-person in the designated test centres. It is valid for three years only.
Cloudera Certified Associate Data Analyst
This certification program is organized by Cloudera. It is more specific to SQL and databases and more suitable for Data Analysts. It costs around 295 US Dollars and there is no specific eligibility requirement for this certification. This certification is valid only for two years.
EMC Proven Professional Data Scientist Associate
This certification program is organized by Dell EMC. To earn this distinction, it is mandatory to attend a training program, either in-class or online. It costs around 230 US Dollar. To take this exam, the candidate needs to be available in-person in the designated test centres.
It is organized by the Open Group. The members of the Open Group include HCL, Huawei, IBM, Oracle etc. There are 3 levels of this certification, and a different amount of experience is required for each level. The cost for this certification starts from 295 US Dollars. To take this exam, the candidate needs to be available in-person at the specified place.
This certification program is provided by the Data Science Council of America (DASCA). It requires 6+ years of experience of Big Data Analytics / Big Data Engineering. It costs around 650 US Dollar. This certification has 5 years of validity.
This certification program is provided by the Data Science Council of America (DASCA). It requires 10+ years of experience of Big Data Analytics / Big Data Engineering. There are various tracks of this exam. It costs between 850-950 US Dollar depending on the track.
It is organized by SAS. To get this certification, you need to pass two other exams first: SAS Big Data Professional and SAS Advanced Analytics Professional. Along with this, you need to take 18 courses as well. It costs around 4,400 US Dollars.
Financial Data Professional program is organized by Financial Data Professional Institute (FDPI). It is more suitable for financial professionals who apply AI and data science in finance. It opens the exam window with a fixed registration period. The cost of the FDP exam is 1350 US Dollar. To take this exam, the candidate needs to be available in-person in the designated test centres.
So, here we have listed the top certification exams in data science across the world. To choose from the list, a candidate should analyze the requirements in the coming future, the suitability of certification, contents covered in the exam so that it can meet the job requirements, exam cost, exam dates and time flexibility etc. The candidate should take one such certification which meets all their expectations instead of taking multiple certification exams.
There are also more certifications provided by insurance bodies such as the IFoA and CAS, which are in development but need strong insurance domain knowledge.
If you are a member of Pega Academy, then Pega has its own Data Science Program.
Machine Learning Problem Framing -
Define a ML Problem and propose a solution
We have three major types of models:
| Type of ML Problem | Description | Example |
|---|---|---|
| Classification | Pick one of N labels | Cat, dog, horse, or bear |
| Regression | Predict numerical values | Click-through rate |
| Clustering | Group similar examples | Most relevant documents (unsupervised) |
| Association rule learning | Infer likely association patterns in data | If you buy hamburger buns, you're likely to buy hamburgers (unsupervised) |
| Structured output | Create complex output | Natural language parse trees, image recognition bounding boxes |
| Ranking | Identify position on a scale or status | Search result ranking |
In traditional software engineering, you can reason from requirements to a workable design, but with machine learning, it will be necessary to experiment to find a workable model.
Models will make mistakes that are difficult to debug, due to anything from skewed training data to unexpected interpretations of data during training. Furthermore, when machine-learned models are incorporated into products, the interactions can be complicated, making it difficult to predict and test all possible situations. These challenges require product teams to spend a lot of time figuring out what their machine learning systems are doing and how to improve them.
If you understand the problem clearly, you should be able to list some potential solutions to test in order to generate the best model. Understand that you will likely have to try out a few solutions before you land on a good working model.
Exploratory data analysis can help you understand your data, but you can't yet claim that patterns you find generalize until you check those patterns against previously unseen data. Failure to check could lead you in the wrong direction or reinforce stereotypes or bias.
Artificial Intelligence: machines that perform jobs that mimic human behavior.
Machine Learning: machines that get better at a task without explicit programming. It is a subset of artificial intelligence that uses technologies (such as deep learning) that enable machines to use experience to improve at tasks.
Deep Learning: machines that use an artificial neural network inspired by the human brain to solve complex problems. It is a subset of machine learning based on artificial neural networks.
Data Scientist: a person with multi-disciplinary skills in math, statistics, predictive modeling and machine learning who makes future predictions.
1. Reliability and Safety: Ensure that AI systems operate as they were originally designed, respond to unanticipated conditions and resist harmful manipulation. If AI is making mistakes, it is important to release a report of quantified risks and harms to end-users so they are informed of the shortcomings of an AI solution.
2. Fairness: Implementing processes to ensure that decisions made by AI systems can be overridden by humans.
3. Privacy and Security : Provide customers with information and controls over the collection, use and storage of the data.
4. Inclusiveness: AI systems should empower everyone and engage people especially minority groups based on:
5. Transparency: AI systems should be understandable. Interpretability/intelligibility is when end-users can understand the behavior of the AI. Adopting an open-source framework for AI can provide transparency (at least from the technical perspective) on the internal workings of an AI system.
6. Accountability: People should be responsible for AI systems, with structures put in place to consistently enact AI principles and take them into account. AI systems should work with the:
Dataset: A dataset is a logical grouping of units of data that are closely related and/or share the same data structure.
Data labeling : process of identifying raw data and adding one or more meaningful and informative labels to provide context so that a machine learning model can learn.
Ground Truth: a properly labeled dataset that you use as the objective standard to train and assess a given model is often called the 'ground truth'. The accuracy of your trained model will depend on the accuracy of the ground truth.
Machine learning in Microsoft Azure
Microsoft Azure provides the Azure Machine Learning service - a cloud-based platform for creating, managing, and publishing machine learning models. Azure Machine Learning provides the following features and capabilities:
| Feature | Capability |
|---|---|
| Automated machine learning | This feature enables non-experts to quickly create an effective machine learning model from data. |
| Azure Machine Learning designer | A graphical interface enabling no-code development of machine learning solutions. |
| Data and compute management | Cloud-based data storage and compute resources that professional data scientists can use to run data experiment code at scale. |
| Pipelines | Data scientists, software engineers, and IT operations professionals can define pipelines to orchestrate model training, deployment, and management tasks. |
Other Features of Azure Machine Learning Services :
A service that simplifies running AI/ML-related workloads, allowing you to build flexible automated ML pipelines, use Python or R, and run deep learning workloads such as TensorFlow.
1. Jupyter Notebooks
2. Azure Machine Learning SDK for Python
3. MLOps
4. Azure Machine Learning Designer
5. Data Labeling Service
6. Responsible Machine Learning
Performance/Evaluation Metrics are used to evaluate different Machine Learning Algorithms
For different types of problems, different metrics matter.
There are two categories of evaluation metrics:
One of the benefits of using a Random Forest model is:
1. In regression, when the variables may be highly correlated with each other, the Random Forest approach really helps in understanding feature importance. The trick is that Random Forest selects explanatory variables at each variable split in the learning process, which means it trains on a random subset of the features instead of the full set of features. This is called feature bagging. This process reduces the correlation between trees; without it, strong predictors would be selected by many of the trees, making them correlated.
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
How to find the most important variables in R
Find the most important variables that contribute most significantly to a response variable
Selecting the most important predictor variables that explain the major part of the variance of the response variable can be key to identifying and building high-performing models.
1. Random Forest Method
Random forest can be very effective to find a set of predictors that best explains the variance in the response variable.
library(caret)
library(randomForest)
library(varImp)
regressor <- randomForest(Target ~ ., data = data, importance = TRUE)  # fit the random forest with default parameters
varImp(regressor)                      # variable importance, based on mean decrease in accuracy
varImp(regressor, conditional = TRUE)  # conditional = TRUE adjusts for correlations between predictors
varimpAUC(regressor)                   # more robust towards class imbalance
2. xgboost Method
library(caret)
library(xgboost)
regressor <- train(Target ~ ., data = data, method = "xgbTree",
                   trControl = trainControl("cv", number = 10), scale = TRUE)
varImp(regressor)
3. Relative Importance Method
Using calc.relimp {relaimpo}, the relative importance of variables fed into lm model can be determined as a relative percentage.
library(relaimpo)
regressor <- lm(Target ~ ., data = data)                            # fit lm() model
relImportance <- calc.relimp(regressor, type = "lmg", rela = TRUE)  # relative importance scaled to 100
sort(relImportance$lmg, decreasing = TRUE)                          # relative importance
4. MARS (earth package) Method
The earth package implements variable importance based on Generalized cross validation (GCV), number of subset models the variable occurs (nsubsets) and residual sum of squares (RSS).
library(earth)
regressor <- earth(Target ~ ., data = data)  # build model
ev <- evimp(regressor)                       # estimate variable importance
plot(ev)
5. Step-wise Regression Method
If you have a large number of predictors, split the data into chunks of 10 predictors, with each chunk also holding the response variable.
base.mod <- lm(Target ~ 1, data = data)  # base intercept-only model
all.mod  <- lm(Target ~ ., data = data)  # full model with all predictors
stepMod  <- step(base.mod, scope = list(lower = base.mod, upper = all.mod),
                 direction = "both", trace = 1, steps = 1000)           # perform the step-wise algorithm
shortlistedVars <- names(unlist(stepMod[[1]]))                          # get the shortlisted variables
shortlistedVars <- shortlistedVars[!shortlistedVars %in% "(Intercept)"] # remove intercept
The output might include levels within categorical variables, since ‘stepwise’ is a linear regression based technique.
If you have a large number of predictor variables, the above code may need to be placed in a loop that runs stepwise on sequential chunks of predictors. The shortlisted variables can be accumulated for further analysis at the end of each iteration. This can be a very effective method if you want to:
· Be highly selective about discarding valuable predictor variables.
· Build multiple models on the response variable.
6. Boruta Method
The ‘Boruta’ method can be used to decide if a variable is important or not.
library(Boruta)
# Decide if a variable is important or not using Boruta
boruta_output <- Boruta(Target ~ ., data = data, doTrace = 2)  # perform Boruta search
boruta_signif <- names(boruta_output$finalDecision[boruta_output$finalDecision %in% c("Confirmed", "Tentative")])  # collect Confirmed and Tentative variables

# For faster calculation (classification only)
library(rFerns)
boruta.train <- Boruta(factor(Target) ~ ., data = data, doTrace = 2,
                       getImp = getImpFerns, holdHistory = FALSE)
boruta.train
boruta_signif <- names(boruta.train$finalDecision[boruta.train$finalDecision %in% c("Confirmed", "Tentative")])  # collect Confirmed and Tentative variables
boruta_signif
## getSelectedAttributes(boruta.train, withTentative = FALSE)
boruta.df <- attStats(boruta.train)  # attStats() expects the Boruta object, not the vector of names
print(boruta.df)
7. Information value and Weight of evidence Method
library(devtools)
library(woe)
library(riv)
iv_df <- iv.mult(data, y = "Target", summary = TRUE,  verbose = TRUE)
iv    <- iv.mult(data, y = "Target", summary = FALSE, verbose = TRUE)
iv_df
iv.plot.summary(iv_df)  # plot information value summary
# Calculate weight of evidence variables
data_iv <- iv.replace.woe(data, iv, verbose = TRUE)  # add WOE variables to the original data frame
The newly created WOE variables can alternatively be used in place of the original factor variables.
8. Learning Vector Quantization (LVQ) Method
library(caret)
control <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
# train the model
regressor <- train(Target ~ ., data = data, method = "lvq",
                   preProcess = "scale", trControl = control)
# estimate variable importance
importance <- varImp(regressor, scale = FALSE)
9. Recursive Feature Elimination RFE Method
library(caret)
# define the control using a random forest selection function
control <- rfeControl(functions = rfFuncs, method = "cv", number = 10)
# run the RFE algorithm (note the parentheses: 1:(n - 1), not 1:n - 1)
results <- rfe(data[, 1:(n - 1)], data[, n], sizes = c(1:8), rfeControl = control)
# list the chosen features
predictors(results)
# plot the results
plot(results, type = c("g", "o"))
10. DALEX Method
library(randomForest)
library(DALEX)
regressor <- randomForest(Target ~ ., data = data, importance = TRUE)  # fit the random forest with default parameters
# Variable importance with DALEX
explained_rf <- explain(regressor, data = data, y = data$Target)
# Get the variable importances
varimps <- variable_dropout(explained_rf, type = 'raw')
print(varimps)
plot(varimps)
11. VITA
library(vita)
regressor <- randomForest(Target ~ ., data = data, importance = TRUE)  # fit the random forest with default parameters
pimp.varImp.reg <- PIMP(data, data$Target, regressor, S = 10, parallel = TRUE)
pimp.varImp.reg$VarImp
sort(pimp.varImp.reg$VarImp, decreasing = TRUE)
12. Genetic Algorithm
library(caret)
# Define control function
ga_ctrl <- gafsControl(functions = rfGA,  # another option is `caretGA`
                       method = "cv", repeats = 3)
# Genetic Algorithm feature selection
ga_obj <- gafs(x = data[, 1:(n - 1)], y = data[, n],
               iters = 3,  # normally much higher (100+)
               gafsControl = ga_ctrl)
ga_obj
# Optimal variables
ga_obj$optVariables
13. Simulated Annealing
library(caret)
# Define control function
sa_ctrl <- safsControl(functions = rfSA, method = "repeatedcv", repeats = 3,
                       improve = 5)  # n iterations without improvement before a reset
# Simulated Annealing feature selection
set.seed(100)
sa_obj <- safs(x = data[, 1:(n - 1)], y = data[, n], safsControl = sa_ctrl)
sa_obj
# Optimal variables
print(sa_obj$optVariables)
14. Correlation Method
library(caret)
# calculate the correlation matrix of the predictors
correlationMatrix <- cor(data[, 1:(n - 1)])
# summarize the correlation matrix
print(correlationMatrix)
# find attributes that are highly correlated (ideally > 0.75)
highlyCorrelated <- findCorrelation(correlationMatrix, cutoff = 0.5)
# print indexes of highly correlated attributes
print(highlyCorrelated)
https://www.cio.com/article/3222879/15-data-science-certifications-that-will-pay-off.html
https://www.codespaces.com/best-data-science-certifications-courses-tutorials.html
https://www.codespaces.com/best-artificial-intelligence-courses-certifications.html
Domo Certificate
Tableau Certificate
Insofe : https://lms.insofe.com/courses
Coursera : Reinforcement Learning at Alberta
1. R for Health Data Science (ed.ac.uk)
3. Data Analysis and Visualization in R for Ecologists (datacarpentry.org)
4. The Effect: An Introduction to Research Design and Causality | The Effect (theeffectbook.net)
5. Chapter 1 Introduction | ISLR tidymodels Labs (emilhvitfeldt.github.io)
6. R for applied epidemiology and public health | The Epidemiologist R Handbook (epirhandbook.com)
7. The lidR package (jean-romain.github.io)
8. Earth Lab: Free, online courses, tutorials and tools | Earth Data Science - Earth Lab
9. Collaborative Data Science for Healthcare
10. https://www.mltut.com/best-online-courses-for-data-science-with-r/
12. https://www.educateai.org/the-most-popular-machine-learning-courses/
14. https://github.com/addy1997/Machine_Learning_Resources
15. https://bookdown.org/mwheymans/bookmi/
16. https://www.routledge.com/go/ids -- paid Book Series
17. https://www.routledge.com/Chapman--HallCRC-The-R-Series/book-series/CRCTHERSER -- paid Book Series
For frequentists, a probability is a measure of the frequency of repeated events → parameters are fixed (but unknown), and data are random.
For Bayesians, a probability is a measure of the degree of certainty about values → parameters are random and data are fixed.
Bayesians: Given our observed data, there is a 95% probability that the true value of θ falls within the credible region.
vs.
Frequentists: There is a 95% probability that when I compute a confidence interval from data of this sort, the true value of θ will fall within it.
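A minimal R sketch of the two kinds of interval for a simple proportion, assuming hypothetical data of 40 successes in 100 trials and a flat Beta(1, 1) prior for the Bayesian version:

```r
x <- 40; n <- 100

# Frequentist 95% confidence interval
binom.test(x, n)$conf.int

# Bayesian 95% credible interval: posterior is Beta(x + 1, n - x + 1) under a flat prior
qbeta(c(0.025, 0.975), x + 1, n - x + 1)
```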
Difference between CHI-Square and Proportions Testing
The chi-squared test of independence (or association) and the two-sample proportions test are related. The main difference is that the chi-squared test is more general while the 2-sample proportions test is more specific. And, it happens that the proportions test is more targeted at specifically the type of data you have.
The chi-squared test handles two categorical variables where each one can have two or more values. And, it tests whether there is an association between the categorical variables. However, it does not provide an estimate of the effect size or a CI. If you used the chi-squared test with the Pfizer data, you’d presumably obtain significant results and know that an association exists, but not the nature or strength of that association.
The two proportions test also works with categorical data but you must have two variables that each have two levels. In other words, you’re dealing with binary data and, hence, the binomial distribution. The Pfizer data you had fits this exactly. One of the variables is experimental group: control or vaccine. The other variable is COVID status: infected or not infected. Where it really shines in comparison to the chi-squared test is that it gives you an effect size and a CI for the effect size. Proportions and percentages are basically the same thing, but displayed differently: 0.75 vs. 75%.
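A minimal R sketch contrasting the two tests on a hypothetical 2×2 table (the counts below are illustrative, not the actual Pfizer data):

```r
# Hypothetical counts: 8 infections among 20,000 vaccinated vs 160 among 20,000 controls
counts <- matrix(c(8,   19992,
                   160, 19840),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("vaccine", "control"),
                                 c("infected", "not_infected")))

chisq.test(counts)                               # association only, no effect size
prop.test(x = c(8, 160), n = c(20000, 20000))    # difference in proportions plus a 95% CI
```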
Difference between 2-Sample t-test and CHI-Square
CHI-Square is for categorical data and the t-test is for continuous data
https://htmlcolorcodes.com/color-picker/
https://www.w3schools.com/colors/colors_hexadecimal.asp
https://sourceforge.net/directory/os:windows/?q=hex+color
https://www.softpedia.com/get/Multimedia/Graphic/Graphic-Others/HEX-RGB-color-codes.shtml
https://www.umsiko.co.za/links/RGB-ColourNamesHex.pdf
http://www.workwithcolor.com/color-chart-full-01.htm
https://weschool.files.wordpress.com/2016/03/rgb-colournameshex.pdf
Sampling Methods | Types and Techniques Explained: https://www.scribbr.com/methodology/sampling-methods/
Introduction to Machine Learning by Duke University: https://exploreroftruth.medium.com/free-coursera-course-introduction-to-machine-learning-offered-by-duke-university-f229534e1e8e
Zero-Inflated Regression: https://towardsdatascience.com/zero-inflated-regression-c7dfc656d8af
Logistic Regression, Sigmoid Function: https://towardsdatascience.com/logistic-regression-cebee0728cbf