Data Science with Deep Learning
MODULE 1 : PYTHON FOR DATA ANALYSIS & DATA SCIENCE
Introduction to Python for Data Analysis (1 hour)
- Overview of Python and its applications in data science.
- Basics of Python programming (variables, data types, basic operations).
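A minimal sketch of the basics listed above (variables, data types, basic operations). The variable names are illustrative; the values are taken from the schedule figures in this outline (45 hours, one-hour sessions, three classes per week).

```python
# Variables and basic data types
course_name = "Data Science with Deep Learning"  # str
total_hours = 45                                  # int
session_length = 1.0                              # float (hours per class)
classes_per_week = 3                              # int

# Basic operations
sessions = total_hours / session_length               # true division gives a float
weeks, leftover = divmod(total_hours, classes_per_week)  # quotient and remainder
title_length = len(course_name)                       # number of characters

print(type(total_hours).__name__, type(session_length).__name__)
print(sessions, weeks, leftover, title_length)
```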
Working with Libraries (1 hour)
- Introduction to essential libraries: NumPy, Pandas, and Matplotlib.
- Basic operations using NumPy arrays.
- Data manipulation with Pandas DataFrames.
- Basic data visualization with Matplotlib.
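The NumPy and Pandas topics above could look roughly like this in practice. The heights and names are made-up sample values, and plotting is left out so the sketch runs without a display.

```python
import numpy as np
import pandas as pd

# NumPy: element-wise arithmetic and aggregation on an array
heights_cm = np.array([150.0, 160.0, 170.0, 180.0])
heights_m = heights_cm / 100      # the scalar is broadcast over every element
mean_height = heights_m.mean()

# Pandas: build a DataFrame, derive a column, filter rows
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "height_cm": heights_cm})
df["height_m"] = df["height_cm"] / 100
tall = df[df["height_cm"] > 165]  # boolean mask keeps matching rows only

print(mean_height)
print(tall)
```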
Data Cleaning and Preprocessing (1 hour)
- Handling missing data.
- Removing duplicates.
- Data normalization and scaling.
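One possible way to walk through the three cleaning steps above with Pandas. The table is hypothetical, and filling with the median is only one of several reasonable strategies for missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one missing value and one duplicated row
raw = pd.DataFrame({
    "age":    [22.0, np.nan, 35.0, 35.0, 58.0],
    "income": [30.0, 42.0, 55.0, 55.0, 90.0],
})

# 1. Handling missing data: fill the gap with the column median
clean = raw.fillna({"age": raw["age"].median()})

# 2. Removing duplicates: keep the first copy of each identical row
clean = clean.drop_duplicates().reset_index(drop=True)

# 3a. Min-max scaling: every column mapped to the range [0, 1]
scaled = (clean - clean.min()) / (clean.max() - clean.min())

# 3b. Standardization: zero mean and unit variance per column
standardized = (clean - clean.mean()) / clean.std(ddof=0)

print(clean)
print(scaled)
```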
Exploratory Data Analysis (1 hour)
- Descriptive statistics.
- Data visualization for analysis.
- Correlation and covariance.
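As a small illustration of descriptive statistics, correlation, and covariance, here is a sketch on a made-up table of study hours and exam scores. The column names and values are assumptions for the example only.

```python
import pandas as pd

# Hypothetical data: hours studied and the resulting exam score
df = pd.DataFrame({
    "study_hours": [1, 2, 3, 4, 5],
    "exam_score":  [52, 58, 61, 70, 74],
})

summary = df.describe()  # count, mean, std, min, quartiles, max per column
corr = df.corr()         # Pearson correlation matrix
cov = df.cov()           # sample covariance matrix (divides by n - 1)

print(summary)
print(corr.loc["study_hours", "exam_score"])
print(cov.loc["study_hours", "exam_score"])
```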
MODULE 2 : DATA ANALYSIS WITH PRACTICAL DATASETS (a further 5 hours)
Recap of Python for Data Analysis (1 hour)
- Brief review of Python basics and key libraries (NumPy, Pandas, Matplotlib).
Importing and Exploring Datasets (1 hour)
- Reading data from various sources (CSV, Excel, SQL).
- Exploring dataset structure, dimensions, and basic statistics.
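A sketch of reading and exploring a dataset. To stay self-contained it reads CSV text from memory; the rows are illustrative values in an Iris-like layout, not a real file. Excel and SQL sources are typically read with pd.read_excel and pd.read_sql in the same way.

```python
import io
import pandas as pd

# Illustrative CSV content held in memory instead of a file on disk
csv_text = """species,sepal_length,petal_length
setosa,5.1,1.4
setosa,4.9,1.4
versicolor,7.0,4.7
virginica,6.3,6.0
"""

df = pd.read_csv(io.StringIO(csv_text))

n_rows, n_cols = df.shape               # dimensions
dtypes = df.dtypes                      # column types
counts = df["species"].value_counts()   # rows per category
stats = df.describe()                   # basic statistics for numeric columns

print(n_rows, n_cols)
print(dtypes)
print(counts)
print(stats)
```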
Data Cleaning and Pre-processing (1 hour)
- Handling missing values.
- Dealing with outliers.
- Data transformation and feature engineering.
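Outlier handling and a simple transformation might be sketched as follows. The fare values are hypothetical, and the 1.5 x IQR fence is a common rule of thumb rather than the only option.

```python
import numpy as np
import pandas as pd

# Hypothetical ticket fares with one extreme value
df = pd.DataFrame({"fare": [7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 500.0]})

# Outlier detection with the interquartile range (IQR) rule
q1, q3 = df["fare"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df["is_outlier"] = ~df["fare"].between(lower, upper)

# Data transformation: log(1 + x) compresses the long right tail
df["log_fare"] = np.log1p(df["fare"])

print(df)
```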
Advanced Data Visualization (1 hour)
- Utilizing Seaborn for advanced visualization.
- Creating interactive visualizations with Plotly.
Statistical Analysis and Hypothesis Testing (1 hour)
- Introduction to statistical concepts.
- Performing hypothesis tests using Python (e.g., t-tests).
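To show what a two-sample t-test computes, the sketch below derives Welch's t statistic and its degrees of freedom using only the standard library. The two score lists are made up. In practice the p-value is usually obtained from scipy.stats.ttest_ind(a, b, equal_var=False).

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical exam scores for two groups of students
group_a = [72, 75, 78, 80, 85]
group_b = [65, 68, 70, 72, 75]

def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, dof

t_stat, dof = welch_t(group_a, group_b)
print(t_stat, dof)
```

A larger absolute t statistic means the difference between the group means is large relative to the variation within the groups.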
Practical Data Analysis Project (1 hour)
- Guided analysis of a real-world dataset.
- Applying learned concepts to solve a specific problem.
DATASETS FOR DATA ANALYSIS
Iris Dataset:
Description: Measurements of sepal length, sepal width, petal length, and petal width for three species of iris flowers.
Domain: Botany
Use: Classification, basic statistical analysis, visualization.
Titanic Dataset:
Description: Passenger information on the Titanic, including survival status, class, gender, and age.
Domain: Transportation
Use: Survival analysis, categorical analysis, visualization.
Boston Housing Dataset:
Description: Housing prices and various factors affecting them in Boston suburbs.
Domain: Real Estate
Use: Regression analysis, correlation analysis, visualization.
Wine Dataset:
Description: Chemical analysis results of wines from three different cultivars.
Domain: Food and Beverage
Use: Classification, cluster analysis, visualization.
Diabetes Dataset:
Description: Various health metrics for diabetes patients.
Domain: Healthcare
Use: Regression analysis, correlation analysis, visualization.
Heart Disease UCI Dataset:
Description: Patient data related to heart disease.
Domain: Healthcare
Use: Classification, statistical analysis, visualization.
Breast Cancer Wisconsin (Diagnostic) Dataset:
Description: Biopsy results for breast cancer diagnosis.
Domain: Healthcare
Use: Classification, statistical analysis, visualization.
Penguins Dataset:
Description: Measurements and characteristics of penguin species.
Domain: Biology
Use: Classification, cluster analysis, visualization.
Advertising Dataset:
Description: Sales and advertising spending data.
Domain: Marketing
Use: Regression analysis, correlation analysis, visualization.
Student Exam Scores Dataset:
Description: Exam scores of students with attributes like study hours, attendance, etc.
Domain: Education
Use: Regression analysis, correlation analysis, visualization.
MODULE 3: MACHINE LEARNING
Supervised Learning Algorithms:
Decision Trees and Random Forest:
Dataset: Titanic Dataset
Objective: Predicting survival status based on features such as class, gender, and age.
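A decision tree chooses each split by how well it separates the classes. The sketch below shows only that split criterion (weighted Gini impurity), not a full tree or forest, on a handful of hypothetical Titanic-style rows.

```python
from collections import Counter

# Hypothetical rows: (sex, passenger_class, survived)
rows = [
    ("female", 1, 1), ("female", 2, 1), ("female", 3, 1), ("female", 3, 0),
    ("male", 1, 0), ("male", 2, 0), ("male", 3, 0), ("male", 3, 0),
]

def gini(labels):
    """Gini impurity: 0 for a pure group, 0.5 for an even two-class mix."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(data, feature_index, value):
    """Weighted impurity after splitting on feature == value."""
    left = [r[-1] for r in data if r[feature_index] == value]
    right = [r[-1] for r in data if r[feature_index] != value]
    n = len(data)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

candidates = {
    ("sex", "female"): split_gini(rows, 0, "female"),
    ("class", 1): split_gini(rows, 1, 1),
}
best = min(candidates, key=candidates.get)  # lowest impurity wins
print(candidates, best)
```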
Linear Regression:
Dataset: Boston Housing Dataset
Objective: Predicting housing prices based on various factors like crime rate, room numbers, etc.
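Linear regression can be sketched with ordinary least squares in NumPy. The data here is hypothetical and uses a single feature (average number of rooms); prices are generated from price = 5 * rooms - 10 so the fit is easy to check.

```python
import numpy as np

# Hypothetical data: average rooms per dwelling and price (in $1000s)
rooms = np.array([4.0, 5.0, 6.0, 7.0, 8.0])
price = np.array([10.0, 15.0, 20.0, 25.0, 30.0])

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(rooms), rooms])

# Least-squares solution of X @ coef = price
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
intercept, slope = coef

predicted = intercept + slope * 6.5  # prediction for a 6.5-room dwelling
print(intercept, slope, predicted)
```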
K-Nearest Neighbors (KNN):
Dataset: Breast Cancer Wisconsin (Diagnostic) Dataset
Objective: Classifying tumors as malignant or benign based on various features.
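The idea behind KNN is a majority vote among the closest training points. A minimal sketch follows, assuming two made-up numeric features per tumor; the function name knn_predict is illustrative.

```python
from collections import Counter
import numpy as np

# Hypothetical training data: two features per sample
X_train = np.array([
    [1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # benign examples
    [6.0, 6.0], [6.5, 7.0], [7.0, 6.5],   # malignant examples
])
y_train = np.array(["benign"] * 3 + ["malignant"] * 3)

def knn_predict(x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances
    nearest = np.argsort(distances)[:k]              # indices of the k closest
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

print(knn_predict(np.array([1.2, 1.4])))
print(knn_predict(np.array([6.4, 6.6])))
```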
Support Vector Machines (SVM):
Dataset: Heart Disease UCI Dataset
Objective: Predicting the presence or absence of heart disease based on patient data.
Logistic Regression:
Dataset: Student Exam Scores Dataset
Objective: Predicting the likelihood of a student passing an exam based on study hours and attendance.
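Logistic regression outputs a probability between 0 and 1. The sketch below fits a one-feature model by gradient descent; the data is hypothetical, only study hours are used (attendance is left out for brevity), and the learning rate and iteration count are arbitrary choices.

```python
import numpy as np

# Hypothetical data: hours studied and whether the student passed (1) or not (0)
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

# Standardize the feature so gradient descent behaves well
mu, sigma = hours.mean(), hours.std()
x = (hours - mu) / sigma

w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))        # sigmoid of the linear score
    w -= learning_rate * np.mean((p - passed) * x)  # gradient of the log-loss in w
    b -= learning_rate * np.mean(p - passed)        # gradient of the log-loss in b

def pass_probability(h):
    """Predicted probability of passing after h hours of study."""
    z = w * (h - mu) / sigma + b
    return 1.0 / (1.0 + np.exp(-z))

print(pass_probability(2.0), pass_probability(7.0))
```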
MODULE 4
Unsupervised Learning Algorithms:
K-Means Clustering:
Dataset: Wine Dataset
Objective: Grouping wines into clusters based on their chemical analysis results.
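K-means alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. A minimal sketch on hypothetical two-feature points (the real Wine data has more features); the evenly spaced initial centroids are a simplification to keep the run deterministic.

```python
import numpy as np

# Hypothetical measurements: two features per sample, two visible groups
points = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1],
])

def kmeans(data, k, n_iter=10):
    """Plain k-means; initial centroids are evenly spaced rows of the data."""
    centroids = data[np.linspace(0, len(data) - 1, k).astype(int)].copy()
    for _ in range(n_iter):
        # Assignment step: distance from every point to every centroid
        distances = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        for j in range(k):
            if np.any(labels == j):   # leave an empty cluster's centroid in place
                centroids[j] = data[labels == j].mean(axis=0)
    return labels, centroids

labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```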
Hierarchical Clustering:
Dataset: Penguins Dataset
Objective: Discovering natural groupings of penguin species based on their measurements.
Principal Component Analysis (PCA):
Dataset: Iris Dataset
Objective: Reducing dimensionality and visualizing relationships among iris species.
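PCA can be sketched with a singular value decomposition of the centered data. The six rows below are illustrative values in the Iris layout (sepal length, sepal width, petal length, petal width), not the full dataset.

```python
import numpy as np

# Illustrative rows: sepal length, sepal width, petal length, petal width
X = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
])

# Center each feature, then decompose
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

components = Vt[:2]                    # first two principal directions
X_2d = X_centered @ components.T       # data projected onto two dimensions
explained = S ** 2 / np.sum(S ** 2)    # share of variance per component

print(X_2d)
print(explained)
```

The two columns of X_2d can then be drawn as a scatter plot to compare species visually.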
Association Rule Mining (Apriori Algorithm):
Dataset: Advertising Dataset
Objective: Discovering associations between different advertising channels and sales.
Gaussian Mixture Model (GMM):
Dataset: Diabetes Dataset
Objective: Identifying patterns and clusters within health metrics.
MODULE 5 : DEEP LEARNING
COVID-19 Open Research Dataset (CORD-19):
Description: A collection of research papers, articles, and information related to COVID-19.
Use: Natural Language Processing (NLP), text mining, and information retrieval for understanding and analyzing research on the coronavirus.
Link: CORD-19 Dataset
Common Objects in Context (COCO):
Description: A large-scale object detection, segmentation, and captioning dataset.
Use: Computer Vision tasks such as image classification, object detection, and image segmentation.
Link: COCO Dataset
ImageNet:
Description: A large-scale image dataset with millions of labeled images across thousands of classes.
Use: Image classification, object detection, and transfer learning in deep learning projects.
Link: ImageNet Dataset
MODULE 6: AI MODELING
Natural Language Processing (NLP) for Sentiment Analysis:
Application: Analyzing Social Media Sentiments
Description: Build a sentiment analysis model using NLP techniques to analyze and understand the sentiment expressed in social media posts, customer reviews, or comments. This model can help businesses gauge public opinion, monitor brand perception, and identify areas for improvement.
Computer Vision for Object Detection:
Application: Autonomous Vehicles
Description: Develop a computer vision model for object detection to be used in autonomous vehicles. The model can identify and classify objects such as pedestrians, vehicles, traffic signs, and obstacles in real time, allowing the vehicle to make informed decisions based on its surroundings.
Recommender System for E-Commerce:
Application: Personalized Product Recommendations
Description: Implement a recommender system using collaborative filtering or content-based filtering techniques for an e-commerce platform. This model can analyze user behavior, preferences, and purchase history to recommend products tailored to individual users, enhancing the user experience and increasing the likelihood of successful transactions.
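A user-based collaborative filtering step might be sketched as below. The rating matrix is hypothetical, 0 is assumed to mean "not rated", and the function name recommend is illustrative. The sketch assumes every user has rated at least one product, since an all-zero row would make the cosine similarity undefined.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are products, 0 = not rated
ratings = np.array([
    [5, 4, 0, 0, 1],
    [5, 5, 1, 4, 0],
    [1, 0, 5, 2, 4],
    [0, 1, 4, 1, 5],
], dtype=float)

def recommend(user):
    """Return the index of the unrated product with the highest predicted appeal."""
    norms = np.linalg.norm(ratings, axis=1)
    # Cosine similarity between this user and every user
    sims = ratings @ ratings[user] / (norms * norms[user])
    sims[user] = 0.0                      # ignore the user's own ratings
    scores = sims @ ratings               # similarity-weighted sum of ratings
    scores[ratings[user] > 0] = -np.inf   # skip products already rated
    return int(np.argmax(scores))

print(recommend(0))
```

Here user 0 is most similar to user 1, who rated product 3 highly, so product 3 ranks above the other unrated product.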
TOTAL NO. OF HOURS : 45 to 50 hours
Classes : MON-WED-FRI (TUE to SAT, Indian Calendar)
TIME : 8AM to 9AM IST