Unlocking the Power of AI for Malaria Detection in Cell Images

a Machine Learning Project

Github

our team

Julia Liou | Project Manager | Data Analyst & Data Engineer

Julia is a versatile professional with a strong background in healthcare and a burgeoning expertise in data analytics. With her foundation in nursing and biology, she excels in collaboration, communication, and empathy, while also mastering data analytics tools like Python, SQL, and Tableau. Julia's career spans from delivering patient-centric care to actively contributing to vital research, and her ability to bridge the gap between healthcare and technology makes her a standout talent in both fields.

email

julialiou@alumni.ubc.ca

Manpreet Sharma | Data Scientist

Manpreet is renowned for her exceptional skills in enhancing operational efficiency and refining reporting structures. With a robust background in logistics management and a keen eye for process improvement, she has consistently streamlined complex workflows and optimized supply chain operations. Her expertise extends beyond logistics, demonstrating a profound understanding of data analytics and its pivotal role in refining logistical processes. Leveraging tools such as Excel, Python, and Power BI, Manpreet has transformed raw data into actionable insights, enabling data-driven decisions that have significantly impacted the organizations she's worked with. Her distinct forte lies in her remarkable capacity to harmonize the realms of logistics and data analytics, empowering her to forge pioneering solutions.

email

mehpree88@icloud.com

Srinivas Jayaram | Machine Learning Engineer

Srinivas Jayaram is a dedicated professional with a rich background in the oil and gas industry and an academic foundation in life sciences, highlighted by his Master's degree. His expertise expands into machine learning, demonstrated through impactful work in a project focused on predicting malaria outcomes. This project reflects his ability to skillfully bridge technology with his knowledge in life sciences. Srinivas' well-rounded skill set not only makes him adept at tackling complex challenges but also positions him as a valuable team player in any data-centric or tech-oriented role. His strong communication skills, both in writing and speaking, further augment his capability to lead and collaborate effectively in diverse project environments.

email

srinivasj1987@outlook.com

Kevin Wan | Software Developer | Web Developer

Hello! I’m Kevin Wan, a dedicated person with a financial background turned data enthusiast. With a rich foundation in finance and consumer behavior, I recently stepped into the world of data analytics, fueled by a passion for turning numbers into narratives. My journey in banking has honed my attention to detail, ethical standards, and problem-solving acumen, skills I now apply to data analysis with the same rigor and dedication. As I embrace the intricacies of data patterns and predictive analytics, I'm on the lookout for opportunities that challenge me to blend my analytical skills with my unwavering commitment to accuracy and actionable insights. Whether it’s through meticulous data storytelling or collaborative project success, I'm excited to contribute to a team that values integrity and innovation. Away from data analysis, I'm often absorbed in novels or video games, which kindle my creativity and zest for life. They inspire admiration for the unseen talents behind these arts, motivating me to refine my own skills. Let’s connect and explore how we can turn data into decisions that drive success!

email

kevin.wan083@gmail.com

overview

This project leverages machine learning to analyze cell images from individuals with and without malaria, ultimately aiming to predict malaria infections. It encompasses the development of a web application designed for educational purposes in the fields of science and medicine. We have developed an engaging game where users can select from a random set of images and challenge themselves to identify images with malaria. In this interactive experience, users compete against our machine learning model's predictions.


purpose

The project is designed to assist medical students in learning how to detect malaria in cell images through an interactive and educational game. Users can refine their diagnostic skills while playing against the machine learning model, making the learning process both informative and engaging.


Project Timeline

This project was completed within 2-3 weeks in September and October 2023.


Tools

The following are some of the tools we utilized throughout our project.

Python
The primary programming language utilized throughout the project.
Data Cleaning + EDA Creation
Pandas
Data Visualization
EDA: Tableau and Matplotlib. Web application: Plotly.js
Machine Learning Python Libraries
scikit-learn, TensorFlow, Keras
Other Python libraries
shutil, io, random, PIL, scipy.stats, NumPy
Back end, database, and storage for hosting the web application
Flask, SQLite, AWS S3 bucket
Front End Developer Tools
HTML, CSS, JavaScript
Libraries utilized: Jinja2, Bootstrap, Plotly, SweetAlert2.
Project Management Tools
Miro, Trello, Canva, GitHub
-----

dataset

Exploratory Data Analysis (EDA)

Image Preprocessing

  • Data Reduction

    Tools: Python (PIL, os)

    Reduced the dataset from 27600 to 5000 images: 2500 infected and 2500 uninfected.
    70% of the images go into the training folder and 30% go into the testing folder.

  • Image Resizing

    Tools: Python (os)

    Reduced the pixel dimensions of each image from 150x150 to 25x25.
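The two preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the team's actual script: the function names `sample_and_split` and `downsample` are hypothetical, the split returns lists instead of copying files into folders, and the resize uses simple NumPy block averaging, which may differ from the resampling method the project used.

```python
import random
import numpy as np

def sample_and_split(infected, uninfected, per_class=2500, train_frac=0.7, seed=0):
    """Draw a balanced sample and split each class into train/test lists."""
    rng = random.Random(seed)
    split = {"train": [], "test": []}
    for label, paths in (("infected", infected), ("uninfected", uninfected)):
        chosen = rng.sample(list(paths), per_class)  # sample without replacement
        cut = round(per_class * train_frac)          # 70% boundary by default
        split["train"] += [(p, label) for p in chosen[:cut]]
        split["test"] += [(p, label) for p in chosen[cut:]]
    return split

def downsample(image, out_size=25):
    """Shrink a square H x W x C array by averaging equal blocks of pixels."""
    h, w, c = image.shape
    factor = h // out_size
    if h != w or factor * out_size != h:
        raise ValueError("expects a square image whose side is a multiple of out_size")
    # Group pixels into out_size x out_size blocks, then average within each block.
    blocks = image.reshape(out_size, factor, out_size, factor, c)
    return blocks.mean(axis=(1, 3))
```

With the defaults, each class contributes 1750 training and 750 testing images, and a 150x150 image is reduced by averaging 6x6 pixel blocks.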


EDA

The Exploratory Data Analysis (EDA) on cell images of malaria compares unprocessed vs. processed and uninfected vs. infected cells. It employs various techniques:

  • Blob Detection: Measures mean and max blob sizes for both cell types, assessing statistical differences using histograms and t-tests.

  • Edge Detection: Visualizes edge structure differences between cell types.

  • Edge Density: Compares edge density, using histograms and t-tests for analysis.

  • Average RGB Color: Compares RGB color distribution, using histograms and t-tests to find distinctions.

  • Principal Component Analysis (PCA): A dimensionality reduction technique used in image analysis to extract and represent the most important features or components of an image.
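As a rough illustration of the edge density and average RGB color comparisons listed above, the sketch below computes both features from an image array and compares two groups with a Welch t statistic. The function names, the gradient-based edge measure, and the threshold are assumptions made for this example, not the project's actual code. The project lists scipy.stats, whose `ttest_ind` would also return a p-value; here the statistic is computed by hand with NumPy to keep the example small.

```python
import numpy as np

def average_rgb(image):
    """Mean of each color channel of an H x W x 3 array."""
    return image.reshape(-1, 3).mean(axis=0)

def edge_density(image, threshold=0.1):
    """Fraction of pixels whose gradient magnitude exceeds a threshold."""
    gray = image.mean(axis=2)             # collapse RGB to one channel
    gy, gx = np.gradient(gray)            # finite-difference gradients
    magnitude = np.hypot(gx, gy)
    return float((magnitude > threshold).mean())

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return float((a.mean() - b.mean()) / se)
```

In use, one would compute `edge_density` (or one channel of `average_rgb`) for every infected and uninfected image and pass the two lists of values to `welch_t`.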

The EDA relies on statistical analysis to differentiate cell characteristics. We used the EDA to derive numeric features from the cell images, which we then fed into our machine learning models. Explore the Tableau Dashboard for visual insights.
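The PCA technique listed among the EDA methods is one way to turn images into numeric features. The sketch below shows the general idea using NumPy's SVD on flattened images. The function name `pca_project` and the choice of two components are assumptions for illustration, and the project may well have used a library implementation instead.

```python
import numpy as np

def pca_project(images, n_components=2):
    """Project flattened images onto their top principal components."""
    X = np.asarray(images, dtype=float).reshape(len(images), -1)
    X_centered = X - X.mean(axis=0)       # PCA requires mean-centered data
    # Rows of vt are the principal directions, ordered by explained variance.
    _, singular_values, vt = np.linalg.svd(X_centered, full_matrices=False)
    scores = X_centered @ vt[:n_components].T
    explained = singular_values**2 / np.sum(singular_values**2)
    return scores, explained[:n_components]
```

Each row of `scores` is a compact numeric summary of one image, and the second return value gives the share of total variance captured by each kept component.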

Machine Learning Models

The following is a summary of the machine learning model that performed best in this project. We tested several other models, but used this one for our web application. To learn more about the other models, see our GitHub repo.

CNN
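This page does not spell out the CNN's layers, so the following is only a minimal sketch of what a small Keras CNN for 25x25 RGB cell images could look like. The `build_cnn` name, the layer sizes, and the training settings are all assumptions, not the team's actual architecture.

```python
from tensorflow.keras import layers, models

def build_cnn(input_shape=(25, 25, 3)):
    """Small binary classifier: infected (1) vs. uninfected (0)."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),                 # 25x25 -> 12x12
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),                 # 12x12 -> 6x6
        layers.Flatten(),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # probability of infection
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

A model like this would be trained with `model.fit` on the resized training images scaled to the 0-1 range, then evaluated on the held-out testing images.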

Web Application

Feel free to watch the video on how our web application works.