This project is an exploratory data analysis (EDA) of the Breast Cancer Diagnostic dataset. The goal is to gain insights into the dataset and prepare it for further analysis and modeling.

rasikasrimal/TumorDiagnosis

Tumor Diagnosis: Exploratory Data Analysis

Kaggle Notebook

About the Dataset

The Breast Cancer Diagnostic data is sourced from the UCI Machine Learning Repository. It is also accessible through the UW CS FTP server.

This dataset comprises features computed from digitized images of fine needle aspirates (FNA) of breast masses. The features describe characteristics of the cell nuclei present in each image. The linear program used to separate the two classes is described in: K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34.

Attribute Information

  • ID number: Unique identifier for each instance.
  • Diagnosis: Class label (M = malignant, B = benign).
  • Features: Derived from the cell nuclei in the images, including:
    1. Radius (mean of distances from center to points on the perimeter)
    2. Texture (standard deviation of gray-scale values)
    3. Perimeter
    4. Area
    5. Smoothness (local variation in radius lengths)
    6. Compactness (perimeter^2 / area - 1.0)
    7. Concavity (severity of concave portions of the contour)
    8. Concave Points (number of concave portions of the contour)
    9. Symmetry
    10. Fractal Dimension ("coastline approximation" - 1)
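
As a rough illustration of the compactness formula in the list above (perimeter^2 / area - 1.0), the sketch below evaluates it for two simple shapes. The function name `compactness` and the example shapes are assumptions made for illustration; they are not taken from the notebook or the dataset.

```python
import math


def compactness(perimeter: float, area: float) -> float:
    """Compactness as given in the attribute list: perimeter^2 / area - 1.0."""
    if area <= 0:
        raise ValueError("area must be positive")
    return perimeter ** 2 / area - 1.0


# A circle of any radius gives 4*pi - 1, the smallest value this formula can take.
r = 3.0
circle_value = compactness(2 * math.pi * r, math.pi * r ** 2)

# A 1 x 4 rectangle (perimeter 10, area 4) is less round, so its value is larger.
rectangle_value = compactness(perimeter=10.0, area=4.0)

print(circle_value, rectangle_value)
```

Under this definition, rounder outlines score lower and more irregular outlines score higher.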

For each of the ten measurements above, the mean, the standard error, and the "worst" value (the mean of the three largest values) are computed per image, giving 30 features in total.
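
The count of 30 follows from ten measurements times three statistics. A minimal sketch of that layout is below. The column names (such as `radius_mean` or `area_worst`) and the `se` suffix are hypothetical naming choices, so check them against the actual file before relying on them.

```python
# Ten base measurements from the attribute list (names are illustrative).
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]

# Three statistics per measurement: mean, standard error, and "worst".
STATISTICS = ["mean", "se", "worst"]


def feature_columns() -> list[str]:
    """Build one column name per (statistic, measurement) pair."""
    return [f"{feature}_{stat}" for stat in STATISTICS for feature in BASE_FEATURES]


columns = feature_columns()
print(len(columns))
```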

  • Data Quality: No missing values.
  • Class Distribution: 357 benign cases, 212 malignant cases.
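
Both points can be checked with pandas once the data is loaded. The sketch below is one possible way to do it, not the notebook's own code: the helper `summarize`, the `diagnosis` column name, and the three-row toy frame are all assumptions made for illustration.

```python
import pandas as pd


def summarize(df: pd.DataFrame, label_col: str = "diagnosis") -> dict:
    """Return the total number of missing cells and the per-class row counts."""
    counts = df[label_col].value_counts()
    return {
        "missing_values": int(df.isna().sum().sum()),
        "class_counts": {label: int(n) for label, n in counts.items()},
    }


# Made-up rows standing in for the real file; values are not from the dataset.
toy = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "diagnosis": ["B", "M", "B"],
        "radius_mean": [12.0, 20.5, 11.3],
    }
)
summary = summarize(toy)
print(summary)
```

Run against the full dataset, the same check should report no missing values and the class counts listed above, provided the label column is named as assumed.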

Installation

To run the notebook, you need the following dependencies:

  • Python 3.x
  • Jupyter Notebook
  • pandas
  • seaborn
  • matplotlib
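
Before opening the notebook, a short check like the following can confirm that the packages are importable. This is a sketch only: the helper `missing_packages` is hypothetical, and `notebook` is assumed to be the import name of the Jupyter Notebook package.

```python
import importlib.util

# Import names assumed for the dependencies listed above.
REQUIRED = ["notebook", "pandas", "seaborn", "matplotlib"]


def missing_packages(names: list[str]) -> list[str]:
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```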
