SAS Viya Machine Learning - Figure 6: Heat map analysis for two correctly classified images

Classifying COVID from Non-COVID Using SAS Viya Machine Learning: An Enhanced Approach

Introduction:

In this blog post, we will demonstrate how to use SAS Viya Machine Learning to train a convolutional neural network (CNN) for accurately detecting patients with COVID-19. We will be implementing the transfer learning technique, following the methodology established by Tuan D. Pham.

For this demonstration, we use CT images from 80 patients with a confirmed COVID-19 diagnosis and 542 non-COVID subjects. The images were obtained from several sources, including Harvard Dataverse, Mendeley Data, and the Cancer Imaging Archive. The CT images were originally stored as 3-D DICOM files and were converted to 2-D PNG format for analysis.

To begin our pipeline, we import and preprocess the images using the image action set in SAS Viya. The images are resized to 224×224 and normalized to a range of 0 to 255, then converted to ImageTable format for deep learning analysis using the DLPy library.

Next, we split the processed images into training, validation, and test sets. The training set consists of 64% of the images, while the validation and test sets consist of 16% and 20% of the images, respectively.
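The 64/16/20 proportions can be illustrated with a small, library-free sketch. It is not the article's DLPy code: the helper name `split_indices`, the seed, and the choice to split per image (rather than per patient) are assumptions made for illustration, and the image total is taken from the counts reported in the data set description.

```python
import random

def split_indices(n_items, train=0.64, valid=0.16, seed=0):
    """Shuffle indices 0..n_items-1 and cut them into train/valid/test lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_train = round(n_items * train)
    n_valid = round(n_items * valid)
    return (indices[:n_train],
            indices[n_train:n_train + n_valid],
            indices[n_train + n_valid:])  # the remainder (about 20%) is the test set

# 1392 COVID + 1120 non-COVID 2-D images, the counts reported in the article
train_idx, valid_idx, test_idx = split_indices(1392 + 1120)
print(len(train_idx), len(valid_idx), len(test_idx))
```

Because the test set is simply whatever remains after the first two cuts, the three parts always add up to the full image count, even when the percentages do not divide it evenly.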

For the classification model, we use ResNet-50, which has been shown to achieve 93% accuracy in detecting COVID-19 cases. To leverage learned representations, we start from a ResNet-50 model pretrained on the ImageNet dataset. The model is trained using the fit function from the DLPy library, with VanillaSolver as the optimizer and a cyclic learning rate scheduler.

After training the model, we evaluate its performance using the test dataset. The model classifies COVID-19 subjects with 99.4% accuracy and non-COVID subjects with 96.3% accuracy. The results can be examined in more detail using a confusion matrix.
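To make the relationship between a confusion matrix and per-class accuracy concrete, here is a minimal plain-Python sketch. The helper names and the toy labels are invented for illustration; they are not the article's test results or DLPy's API.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Return {actual_label: {predicted_label: count}} for the given labels."""
    counts = Counter(zip(actual, predicted))
    return {a: {p: counts[(a, p)] for p in labels} for a in labels}

def per_class_accuracy(matrix):
    """Fraction of each actual class that was predicted correctly."""
    return {a: row[a] / sum(row.values()) for a, row in matrix.items()}

# Toy labels for illustration only; these are NOT the article's results.
actual = ["covid", "covid", "covid", "covid",
          "non_covid", "non_covid", "non_covid", "non_covid"]
predicted = ["covid", "covid", "covid", "non_covid",
             "non_covid", "non_covid", "non_covid", "covid"]

matrix = confusion_matrix(actual, predicted, labels=["covid", "non_covid"])
accuracy = per_class_accuracy(matrix)
print(matrix)
print(accuracy)
```

Each row of the matrix holds one actual class, so the diagonal entry divided by the row total gives that class's accuracy, which is how separate COVID and non-COVID figures can be read off a single table.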

Additionally, we utilize the heat_map_analysis function from DLPy to identify regions of interest in the images that provide the most useful information in distinguishing between COVID-19 and non-COVID subjects. This helps us understand the model’s predictions and why misclassifications occur.

In conclusion, SAS Viya Machine Learning, along with the transfer learning technique, allows us to build a robust and accurate image classification model for detecting COVID-19 cases. The pretrained ResNet-50 model and the various functionalities provided by SAS Viya make the pipeline efficient and effective.

Full Article: Classifying COVID from Non-COVID Using SAS Viya Machine Learning: An Enhanced Approach

SAS Viya Machine Learning Provides Tools to Train Convolutional Neural Network for COVID-19 Detection

SAS Viya Machine Learning, and specifically its image action set, can import and preprocess images that are then used to train machine learning classification models. Combined with the transfer learning technique, this makes it possible to accurately detect patients with COVID-19. The process follows the methodology established in a study by Tuan D. Pham.

Data Sets Used for Training

To demonstrate the effectiveness of SAS Viya Machine Learning, CT images were obtained from various sources, including Harvard Dataverse, Mendeley Data, and the Cancer Imaging Archive. The data covered 80 COVID subjects, all with confirmed positive COVID-19 diagnoses, and 542 non-COVID subjects. Initially, the COVID subjects’ CT images were in 3-D DICOM format with over 100 image slices. These images were converted to 2-D PNG format, and any images that did not include enough lung regions were excluded from further analysis. In total, 1392 COVID and 1120 non-COVID 2-D CT images were used to train the classification model.

Importing and Preprocessing Data in SAS Viya

To begin the machine learning pipeline, the image action set in SAS Viya is used to import all the images. It is important to set the ‘decode’ parameter to ‘False’ for deep learning analysis. The input images are then resized to 224×224 and normalized to a range of 0 to 255 using the MINMAX normalization technique. These processed images are then converted to ImageTable for further deep learning analysis.
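The rescaling step can be sketched without SAS Viya as a plain min-max mapping onto 0 to 255. This is an assumption-laden illustration: the function name `minmax_rescale` is hypothetical, the pixel values are invented, and the article does not say whether the image action set computes the minimum and maximum per image, which is what this sketch does.

```python
def minmax_rescale(pixels, new_min=0.0, new_max=255.0):
    """Linearly map a 2-D list of pixel values onto [new_min, new_max]."""
    flat = [v for row in pixels for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image has no contrast to stretch; map everything to new_min.
        return [[new_min for _ in row] for row in pixels]
    span = hi - lo
    return [[new_min + (v - lo) / span * (new_max - new_min) for v in row]
            for row in pixels]

# Hypothetical 2x3 slice with raw intensities (values invented for illustration).
ct_slice = [[-1000, -500, 0],
            [250, 500, 1000]]
rescaled = minmax_rescale(ct_slice)
print(rescaled)
```

The darkest pixel always lands on 0 and the brightest on 255, so images with different raw intensity ranges end up on a common scale before they reach the network.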

Building a Classification Model in SAS Viya

Given the complexity of the classification task, a deeper model is needed to detect COVID-19 accurately. ResNet-50 has reached 93% accuracy in similar studies. However, training such a model from scratch requires a large amount of data. To address this, a ResNet-50 model pretrained on the ImageNet dataset can be used to leverage learned representations. The pretrained weights are specified when the ResNet50_Caffe model is created with DLPy.

Training the Model and Evaluation

The model is trained using the fit function from the DLPy library in SAS Viya. A VanillaSolver with a Cyclic Learning Scheduler is used as the optimizer, and the model is trained for a specific number of epochs. After training, the model is evaluated using the evaluate function, which provides performance metrics such as accuracy. Additionally, a confusion matrix can be created to compare predicted and actual classes, providing insight into misclassifications.
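For readers unfamiliar with cyclic learning rates, the sketch below shows the generic triangular policy, in which the rate rises linearly to a maximum and falls back again in repeating cycles. It is offered only as an illustration of the idea: the article does not state which cyclic policy or hyperparameters were used, so the function name, `base_lr`, `max_lr`, and `step_size` values here are all assumptions.

```python
import math

def triangular_cyclic_lr(iteration, base_lr, max_lr, step_size):
    """Learning rate at a given iteration under a triangular cyclic schedule.

    The rate climbs linearly from base_lr to max_lr over step_size iterations,
    falls back to base_lr over the next step_size iterations, then repeats.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)  # 1 at cycle edges, 0 at the peak
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

# Hypothetical hyperparameters, chosen only to make one full cycle visible.
schedule = [triangular_cyclic_lr(i, base_lr=0.001, max_lr=0.01, step_size=4)
            for i in range(9)]
for i, lr in enumerate(schedule):
    print(i, round(lr, 5))
```

With these values the rate starts at the base, peaks at iteration 4, and returns to the base at iteration 8. Periodically raising the rate in this way is one reason such schedules are often credited with faster convergence.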

Visualizing Model Performance

To further visualize the performance of the classification model, plots can be generated using the DLPy library. These plots show correctly classified images and incorrectly classified images along with predicted probability bar charts. Heat map analysis can also be applied to identify regions of interest that contribute to the model’s decision-making process.
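One common way to produce this kind of heat map is occlusion sensitivity: mask a small patch of the image, re-score it, and record how much the model's confidence drops. The sketch below illustrates that general idea in plain Python. The article does not describe how the DLPy heat map function works internally, so this should not be read as its implementation; `occlusion_heat_map`, `toy_score`, and the tiny 4×4 image are hypothetical stand-ins.

```python
def occlusion_heat_map(image, score_fn, mask_size, step, fill=0.0):
    """Average score drop per pixel when square patches of the image are masked."""
    height, width = len(image), len(image[0])
    baseline = score_fn(image)
    drop_sum = [[0.0] * width for _ in range(height)]
    hits = [[0] * width for _ in range(height)]
    for top in range(0, height - mask_size + 1, step):
        for left in range(0, width - mask_size + 1, step):
            masked = [row[:] for row in image]  # copy so the input stays intact
            for r in range(top, top + mask_size):
                for c in range(left, left + mask_size):
                    masked[r][c] = fill
            drop = baseline - score_fn(masked)
            # Credit the drop to every pixel that this patch covered.
            for r in range(top, top + mask_size):
                for c in range(left, left + mask_size):
                    drop_sum[r][c] += drop
                    hits[r][c] += 1
    return [[drop_sum[r][c] / hits[r][c] if hits[r][c] else 0.0
             for c in range(width)] for r in range(height)]

def toy_score(image):
    """Stand-in for a model: mean intensity of the centre 2x2 patch."""
    return (image[1][1] + image[1][2] + image[2][1] + image[2][2]) / 4.0

toy_image = [[1.0] * 4 for _ in range(4)]
heat = occlusion_heat_map(toy_image, toy_score, mask_size=2, step=1)
for row in heat:
    print([round(v, 4) for v in row])
```

In this toy setup the centre pixels receive the highest values because masking them changes the score the most, which mirrors how a heat map highlights the regions a classifier relies on.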

Discussion

The use of an optimizer and learning rate scheduler played a crucial role in achieving exceptional results in a short period. SAS Viya Machine Learning, with its image action set and deep learning capabilities, provides a comprehensive pipeline for training convolutional neural networks to accurately detect patients with COVID-19. By leveraging transfer learning and pretrained models, it is possible to mitigate the need for large amounts of data and achieve high accuracy in classification tasks.

Summary: Classifying COVID from Non-COVID Using SAS Viya Machine Learning: An Enhanced Approach

This blog post demonstrates how to use SAS Viya Machine Learning to train a convolutional neural network (CNN) that accurately detects patients with COVID-19 from CT images. The post follows a methodology established in a study by Tuan D. Pham. The CT images used for training come from COVID and non-COVID subjects, and they are preprocessed and resized before being fed into the CNN. A pretrained ResNet-50 model is used to leverage representations learned from the ImageNet dataset. The model is evaluated on a test dataset, and its accuracy and confusion matrix are analyzed. The post concludes with a discussion of how the optimizer and learning rate scheduler contributed to the results.

Frequently Asked Questions:

Q1: What is data science and why is it important?
A1: Data science is an interdisciplinary field that involves extracting meaningful insights and knowledge from structured and unstructured data. It combines various techniques from statistics, mathematics, computer science, and domain knowledge to analyze and interpret data. Data science is crucial in today’s data-driven world as it helps organizations make informed decisions, identify patterns and trends, and discover valuable insights that can lead to improved business strategies and innovation.

Q2: What are the main steps involved in the data science process?
A2: The data science process typically includes five main steps: 1) Defining the problem or objective, 2) Collecting and preprocessing data, 3) Exploratory data analysis to understand the data, 4) Building and validating models using statistical and machine learning techniques, and 5) Communicating the results and findings to stakeholders. This iterative process allows data scientists to extract meaningful insights and make data-driven decisions.

Q3: What skills are required for a career in data science?
A3: Data science requires a combination of technical and soft skills. Technical skills include proficiency in programming languages such as Python or R, understanding of data manipulation and analysis tools, knowledge of statistical and machine learning techniques, and familiarity with data visualization. Soft skills such as critical thinking, problem-solving, and effective communication are equally important in order to effectively interpret and communicate data insights to non-technical stakeholders.

Q4: What are some real-world applications of data science?
A4: Data science finds applications in various industries. Some notable examples include: 1) Predictive modeling for fraud detection and prevention in the banking sector, 2) Recommendation systems that personalize online shopping experiences for customers, 3) Healthcare analytics to improve patient outcomes and optimize resource allocation, 4) Predictive maintenance to identify and prevent equipment failures in manufacturing, and 5) Sentiment analysis to gauge customer opinions and feedback on social media platforms.

Q5: What are the ethical considerations in data science?
A5: Data science raises ethical concerns related to privacy, bias, and transparency. Privacy concerns arise when individuals’ data is collected without their knowledge or used for unintended purposes. Bias can be introduced in data science models when the data used to train them contains discriminatory patterns or fails to represent diverse populations. Transparency matters as well: clearly communicating to users and stakeholders how data is collected, used, and protected builds trust in data-driven decision-making.