What XAI Fails to Explain

“Many machine decisions are still poorly understood,” says IEEE Fellow Cuntai Guan. Many papers even suggest a rigid dichotomy between accuracy and interpretability. 

Explainable AI (XAI) attempts to bridge this divide, but as we explain below, XAI justifies decisions without interpreting the model directly. This means practitioners in applications such as finance and medicine are forced into a dilemma: pick an uninterpretable, accurate model or an inaccurate, interpretable model. 

“Interpretable”

Defining explainability or interpretability for computer vision is challenging: What does it even mean to explain a classification for high-dimensional inputs like images? As we discuss below, two popular definitions involve saliency maps and decision trees, but both approaches have their weaknesses. 

What Explainable AI Doesn’t Explain 

Saliency Maps¹ 

Many XAI methods produce heatmaps known as saliency maps, which highlight important input pixels that influence the prediction. However, saliency maps focus on the input and neglect to explain how the model makes decisions. 
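
To ground this, here is a minimal sketch of one common saliency method, a vanilla input-gradient map in PyTorch. The untrained model and random input are placeholders for illustration, not a specific method from the XAI literature:

```python
# A minimal vanilla-gradient saliency sketch in PyTorch (one of many saliency methods).
# The model is untrained and the input is random; both are placeholders for illustration.
import torch
import torchvision.models as models

model = models.resnet18().eval()
image = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in for a preprocessed image

logits = model(image)
top_class = logits.argmax(dim=1).item()
logits[0, top_class].backward()          # gradient of the top logit w.r.t. input pixels

# Saliency: gradient magnitude, reduced over the color channels.
saliency = image.grad.abs().max(dim=1).values.squeeze(0)  # shape (224, 224)
print(saliency.shape)  # a heatmap of which pixels most influence this prediction
```

Note that the heatmap only tells us which pixels mattered, not how the model combined them into a decision, which is exactly the gap discussed next.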

For more on saliency maps, see these saliency tutorials and GitHub repositories.

What Saliency Maps Fail to Explain 

To illustrate why saliency maps do not fully explain how the model predicts, consider the example below: the saliency maps are identical, but the predictions differ. Why? Even though both saliency maps highlight the correct object, one prediction is incorrect. How? Answering these questions could help us improve the model, but as shown below, saliency maps fail to explain the model’s decision process. 

Decision Trees

Another approach is to replace neural networks with interpretable models. Before deep learning, decision trees were the gold standard for accuracy and interpretability. Below, we illustrate the interpretability of decision trees: each prediction is broken into a sequence of intermediate decisions. 
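
As a concrete reminder of what that interpretability looks like, here is a minimal sketch (assuming scikit-learn) that trains a small decision tree on a toy tabular dataset and prints the sequence of decisions behind one prediction:

```python
# A minimal sketch of decision-tree interpretability, using scikit-learn.
# The dataset (iris) and its features are illustrative, not from the NBDT paper.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

sample = data.data[:1]                          # one flower to classify
node_ids = tree.decision_path(sample).indices   # nodes visited, root to leaf
feature, threshold = tree.tree_.feature, tree.tree_.threshold

for node in node_ids:
    if feature[node] < 0:                       # leaf node: no decision rule
        print("Predict:", data.target_names[tree.predict(sample)[0]])
    else:
        name, value = data.feature_names[feature[node]], sample[0, feature[node]]
        side = "<=" if value <= threshold[node] else ">"
        print(f"Decision: {name} = {value:.2f} {side} {threshold[node]:.2f}")
```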

For accuracy, however, decision trees lag behind neural networks by up to 40% on image classification datasets². Neural-network-and-decision-tree hybrids also underperform, failing to match neural networks even on CIFAR10, a dataset of tiny 32x32 images like the one below. 

Neural-Backed Decision Trees 

We challenge this false dichotomy by building models that are both interpretable and accurate. Our key insight is to combine neural networks with decision trees, preserving high-level interpretability while using neural networks for low-level decisions, as shown below. We call these models Neural-Backed Decision Trees (NBDTs) and show they can match neural network accuracy while preserving the interpretability of a decision tree. 

NBDTs are as interpretable as decision trees. Unlike today’s neural networks, NBDTs output intermediate decisions alongside each prediction. For example, given an image, a neural network may output only Dog, whereas an NBDT outputs both Dog and the intermediate decisions Animal, Chordate, Carnivore (below). 

NBDTs achieve neural network accuracy. Unlike any other decision-tree-based method, NBDTs match neural network accuracy (< 1% difference) on 3 image classification datasets³. NBDTs also achieve accuracy within 2% of neural networks on ImageNet, one of the largest image classification datasets with 1.2 million 224x224 images. 

Furthermore, NBDTs set new state-of-the-art accuracies for interpretable models. The NBDT’s ImageNet accuracy of 75.30% outperforms the best competing decision-tree-based method by roughly 14%. To contextualize this gain: a similar ~14% improvement for non-interpretable neural networks took three years of research. 

Justifications for Individual Predictions 

The most insightful justifications are for objects the model has never seen before. For example, consider an NBDT (below) and run inference on a Zebra. Although this model has never seen a Zebra, the intermediate decisions shown below are correct: Zebras are both Animals and Ungulates (hoofed animals). This ability to justify individual predictions is especially valuable for unseen objects. 

Justifications for Model Behavior

Furthermore, we find that with NBDTs, interpretability improves with accuracy. This runs contrary to the dichotomy in the introduction: NBDTs not only achieve both accuracy and interpretability; they make accuracy and interpretability the same objective. 

For example, the lower-accuracy ResNet⁶ hierarchy (left) makes less sense, grouping Frog, Cat, and Airplane together. This is “less sensible” because it is difficult to find an obvious visual feature shared by all three classes. By contrast, the higher-accuracy WideResNet hierarchy (right) makes more sense, cleanly separating Animal from Vehicle. In short, the higher the accuracy, the more interpretable the NBDT. 

Understanding Decision Rules

With low-dimensional tabular data, decision rules in a decision tree are simple to interpret, e.g., if the dish contains a bun, then pick the right child, as shown below. However, decision rules are not as straightforward for high-dimensional inputs like images. 
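
As a toy illustration of such a tabular rule (the dish features and classes below are hypothetical, mirroring the “contains a bun” example above):

```python
# A hand-written decision rule over low-dimensional tabular features is easy to read.
# The feature names and classes here are hypothetical illustrations.
def classify_dish(dish: dict) -> str:
    if dish["contains_bun"]:        # root decision: pick the right child
        return "Hamburger" if dish["has_patty"] else "Hotdog"
    else:                           # otherwise pick the left child
        return "Salad" if dish["has_greens"] else "Soup"

print(classify_dish({"contains_bun": True, "has_patty": False}))  # -> Hotdog
```

No comparably readable rule exists over raw pixels, which is why interpreting image-level decision rules requires the analysis below.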

As we qualitatively find in the paper (Sec 5.3), the model’s decision rules are based not only on object type but also on context, shape, and color. 

To interpret decision rules quantitatively, we leverage an existing hierarchy of nouns called WordNet⁷; with this hierarchy, we can find the most specific shared meaning between classes. For example, given the classes Cat and Dog, WordNet would provide Mammal. In our paper (Sec 5.2) and pictured below, we quantitatively verify these WordNet hypotheses. 
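
For example, here is a minimal sketch using NLTK’s WordNet interface to find the most specific shared hypernym of two classes; it assumes nltk is installed, and the exact synset returned depends on WordNet’s hierarchy:

```python
# A minimal sketch: find the most specific shared WordNet hypernym of two classes.
# Assumes `pip install nltk`; the corpus download below is a no-op if already present.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

cat = wn.synset("cat.n.01")
dog = wn.synset("dog.n.01")

# Most specific common ancestor(s) in the WordNet hierarchy,
# e.g. a carnivore/mammal-level synset for Cat and Dog.
print(cat.lowest_common_hypernyms(dog))
```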

Note that for small datasets with 10 classes, e.g., CIFAR10, we can find WordNet hypotheses for all nodes. However, for large datasets with 1000 classes, e.g., ImageNet, we can only find WordNet hypotheses for a subset of nodes. 

How it Works 

The training and inference process for a Neural-Backed Decision Tree breaks down into four steps:

1. Construct a hierarchy for the decision tree. This hierarchy determines which sets of classes the NBDT must decide between. We refer to this hierarchy as an Induced Hierarchy.

2. This hierarchy yields a loss function that we call the Tree Supervision Loss. Train the original neural network, without any modifications, using this new loss.

3. Start inference by passing the sample through the neural network backbone. The backbone is all neural network layers before the final fully-connected layer.

4. Finish inference by running the final fully-connected layer as a sequence of decision rules, which we call Embedded Decision Rules. These decisions culminate in the final prediction (see the sketch below).
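
To make step 4 concrete, here is a simplified, self-contained sketch of this kind of hierarchical inference. The toy hierarchy, class names, and the choice to represent each inner node by the average of its leaves’ fully-connected weight rows are our own illustrative assumptions, not the paper’s exact implementation:

```python
# A simplified sketch of "embedded decision rules": run the final fully-connected
# layer as a sequence of decisions over a toy class hierarchy. The hierarchy,
# classes, and random weights below are illustrative assumptions, not the paper's.
import torch

num_classes, feat_dim = 4, 8
classes = ["Cat", "Dog", "Car", "Plane"]
fc_weight = torch.randn(num_classes, feat_dim)   # rows = class weight vectors

# Toy hierarchy: root splits Animal vs. Vehicle; each node is represented by the
# average of its leaves' fully-connected weight rows.
hierarchy = {
    "Root":    ["Animal", "Vehicle"],
    "Animal":  ["Cat", "Dog"],
    "Vehicle": ["Car", "Plane"],
}
leaves = {"Animal": [0, 1], "Vehicle": [2, 3],
          "Cat": [0], "Dog": [1], "Car": [2], "Plane": [3]}
node_vec = {name: fc_weight[idx].mean(dim=0) for name, idx in leaves.items()}

def nbdt_infer(features: torch.Tensor):
    """Traverse the hierarchy, at each node picking the child whose
    representative vector has the largest inner product with the features."""
    node, path = "Root", []
    while node in hierarchy:                     # stop once we reach a leaf class
        children = hierarchy[node]
        scores = torch.stack([features @ node_vec[c] for c in children])
        node = children[scores.argmax().item()]
        path.append(node)
    return node, path                            # final class + intermediate decisions

features = torch.randn(feat_dim)                 # stand-in for the backbone output (step 3)
prediction, decisions = nbdt_infer(features)
print("Intermediate decisions:", decisions, "-> prediction:", prediction)
```

The point of the sketch is that every prediction comes with a root-to-leaf path of decisions, which is what gives the NBDT its decision-tree-style interpretability.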

Conclusion

Explainable AI does not fully explain how the neural network reaches a prediction: Existing methods explain the image’s impact on model predictions but do not explain the decision process. Decision trees address this, but unfortunately, images⁷ are kryptonite for decision tree accuracy.
