'Machine Learning/course'에 해당되는 글 30건

  1. 2020.09.15 [self] PR(Precision Recall) curve VS ROC curve
  2. 2020.07.24 For the benefit of new folks in the field, here's a list of some well known and
  3. 2020.06.24 【자기주도온라인평생학습 】기초 탈출 파이썬 판다스(Python Pandas) 무료 온라인 강의 리스트 – 8개 교육과정 컴퓨터 프로그래밍에서
  4. 2020.03.16 파면 팔수록 쉽고 재미있는 머신러닝
  5. 2020.03.04 Christoper M. Bishop의 저서인 "Pattern Recognition and Machine Learning" 에 등장하는 알고리즘
  6. 2020.03.03 PRML algorithms implemented in Python Python codes implementing algorithms descr
  7. 2020.01.20 #OpenSyllabus 세계 80개국에서 영어로 작성된 7백만 여개의 Syllabus를 수집해 각 전공별로 어떠한 전공 서적이 수업에 가장
  8. 2020.01.07 제목: 랜덤 포레스트를 통한 주가 움직임 예측 금융 분야에서 모든 사람은 주가를 예측하는데 있어 우위를 점할 무언가를 찾고 있습니다. 머신 러닝
  9. 2019.12.23 Recently, I came across this excellent machine learning playlist by Prof. Arti R
  10. 2019.11.17 그룹에 Mathematics for Machine Learning(MML) 책을 보고 공부하시는 분들이 많을 것 같습니다. 동명의 Coursera 강의를 듣는 분들도 계실테고요. 강의를 진행하는 4명의 강사 중 한 명인 Sam Cooper는 Coursera..
  11. 2019.11.14 Gradient Boost와 관련 모델 정리 (하나씩 점을 찍어 나가며 / 미스터 흠시) * 영상 : StatQuest with Josh Starmer 의 StatQuest: Gradient Boost Part1 영상 미스터 흠시 님께서 "StatQuest with Josh Starmer 의 StatQuest: Gra..
  12. 2019.11.07 Machine Learning Tutorial 6 - Decision Tree Classifier and Decision Tree Regression in Python
  13. 2019.10.21 Machine Learning for Intelligent Systems: Cornell CS4780/CS5780 Lectures: https://www.youtube.com/playlist?list=PLl8OlHZGYOQ7bkVbuRthEsaLr7bONzbXS Course Website: http://www.cs.cornell.edu/courses/cs4780/2018fa/syllabus/index.html Find The Most Updated..
  14. 2019.09.24 데이터사이언스 공부순서
  15. 2019.09.03 머신러닝 통계학 확률론 등 동영상강의 연세대 수학과 이승철교수님
  16. 2019.07.02 Stanford has an AI graduate certificate program on their website. They have the following classes:
  17. 2019.01.22 머신러닝 공부 순서, 방법 및 강의 정리(HowUse 곰씨네)
  18. 2018.12.03 머신러닝 공부순서
  19. 2018.02.26 Recommended Books - UC Berkeley Statistics Graduate Student Association
  20. 2017.12.16 통계 추천 동영상강좌
  21. 2017.11.08 Pattern Recognition and Machine Learning 책 2장 식(2.117)까지 정리한 노트입니다.
  22. 2017.08.03 앤드류 교수의 코세라 강의를 정주행 후 영상 슬라이드를 정리하여
  23. 2017.04.11 Machine Learning Part 1 | SciPy 2016 Tutorial | Andreas Mueller & Sebastian Raschka
  24. 2017.03.26 Time Series Forecasting with Python 7-Day Mini-Course - Machine Learning Mastery
  25. 2017.03.18 How do I start AI/ML/DL?
  26. 2017.03.16 Machine Learning/Neural Networks recommended courses
  27. 2016.12.30 Machine Learning with MATLAB video lectures
  28. 2016.12.15 machine learning course by Professor Andrew Ng - Lecture notes and exercise code
  29. 2016.12.14 Andrew Ng's Machine Learning lectures list
  30. 2016.10.28 Machine Learning Part I

PR(Precision Recall) curve VS ROC curve에 대한 근본적인 이해를 돕는 글들이 있다. 

                       | predicted negative | predicted positive |
|-----------------|---------------------|---------------------|
| actual negative | True Negative (TN) | False Positive (FP) |
| actual positive  | False Negative (FN) | True Positive (TP) |

 

[1]과 [2]에서 설명하는 내용은 비슷하다. ROC 커브는 Recall(=Sensitivity)과  False Positive Rate(= 1-Specificity)인데, 이 둘은 모두 actual class가 1인 것중에서 1로 분류한 비율과 0인 것 중에서 1로 분류한 것의 비율을 보는 것인데, PR 커브는 Precision과 Recall(=Sensitivity)을 사용하는데, Recall은 같은데 Precision은 1로 분류된 것 중에서 실제 1의 비율을 보는 것이다. Precision은 실제클래스가 1인 샘플들의 base probability 또는 prior probability가 달라짐에 따라서 Precision이 달라진다. 반면, Recall(=Sensitivity)과 FPR은 base probability 또는 prior probability가 항상 같다. [3]에 PR, ROC를 잘 설명한 그림이 있어 복사해왔다. 

 

src: https://towardsdatascience.com/on-roc-and-precision-recall-curves-c23e9b63820c

 

위 그림을 예로들어 설명하면 (분홍색 선과 청록색 선은 threshold를 나타냄), 

Recall(=Sensitivity) : 전체 노란 샘플개수 중 분홍색 선 위쪽에 존재하는 노란 샘플개수 비율.

False Positive Rate(= 1-Specificity) : 전체 파란 샘플개수 중 분홍색 선 위쪽에 존재하는 파란 샘플개수 비율.

Precision : 분홍색 선 위쪽에 존재하는 노란색 샘플과 파란색 샘플의 합 중에서 분홍색 선 위쪽에 존재하는 노란 샘플개수 비율.

 

Recall(=Sensitivity)과 FPR은 threshold가 청록cyan색에서 분홍색 라인으로 올라가도 분자는 바뀌어도 분모(base probability 또는 prior probability)는 그대로이다. 반면, Precision은 threshold가 바뀌면 분모(base probability 또는 prior probability)가 바뀌고 분자도 바뀐다. 위와 같이 imbalanced data인 경우 threshold가 위로 올라가면서 Precision의 분모가 급격히 바뀌게 된다. 위 그림에서 클래스 1인 노란색 샘플이 파란색 샘플만큼 많은 경우를 상상해보자. 즉 파란색 샘플들은 그대로인데, 노란색 샘플들이 prediction score가 대략 [-0.3, 1] 범위에 노란 샘플들이 많이 존재하는 경우다. 그러면 threshold가 올라가면서 Precision의 분모가 급격히 변하지 않고 smooth하게 변할 것이다. 윗부분이 노란색 샘플들이 많으므로. 그러면 threshold가 위로 갈수록 분모와 분자는 거의 비슷한 값을 가질 것이다. threshold가 변함에 따라 Precision이 급격하게 변하지 않는 것이다. 따라서 Precision은 1클래스가 rare한 경우에 민감하여 분류 performance를 잘 반영해준다. 

 

 

[1] stats.stackexchange.com/questions/7207/roc-vs-precision-and-recall-curves

 

The key difference is that ROC curves will be the same no matter what the baseline probability is, but PR curves may be more useful in practice for needle-in-haystack type problems or problems where the "positive" class is more interesting than the negative class.

To show this, first let's start with a very nice way to define precision, recall and specificity. Assume you have a "positive" class called 1 and a "negative" class called 0. Y^Y^ is your estimate of the true class label YY. Then:

 

Precision            =P(Y=1|Y^=1)

Recall=Sensitivity =P(Y^=1|Y=1)

Specificity           =P(Y^=0|Y=0)

 

The key thing to note is that sensitivity/recall and specificity, which make up the ROC curve, are probabilities conditioned on the true class label. Therefore, they will be the same regardless of what P(Y=1)P(Y=1) is. Precision is a probability conditioned on your estimate of the class label and will thus vary if you try your classifier in different populations with different baseline P(Y=1). However, it may be more useful in practice if you only care about one population with known background probability and the "positive" class is much more interesting than the "negative" class. (IIRC precision is popular in the document retrieval field, where this is the case.) This is because it directly answers the question, "What is the probability that this is a real hit given my classifier says it is?".

Interestingly, by Bayes' theorem you can work out cases where specificity can be very high and precision very low simultaneously. All you have to do is assume P(Y=1) is very close to zero. In practice I've developed several classifiers with this performance characteristic when searching for needles in DNA sequence haystacks.

IMHO when writing a paper you should provide whichever curve answers the question you want answered (or whichever one is more favorable to your method, if you're cynical). If your question is: "How meaningful is a positive result from my classifier given the baseline probabilities of my problem?", use a PR curve. If your question is, "How well can this classifier be expected to perform in general, at a variety of different baseline probabilities?", go with a ROC curve.

 

[2] www.quora.com/What-is-the-difference-between-a-ROC-curve-and-a-precision-recall-curve-When-should-I-use-each

 

There is a very important difference between what a ROC curve represents vs that of a PRECISION vs RECALL curve.

Remember, a ROC curve represents a relation between sensitivity (RECALL) and False Positive Rate (NOT PRECISION). Sensitivity is the other name for recall but the False Positive Rate is not PRECISION.

Recall/Sensitivity is the measure of the probability that your estimate is 1 given all the samples whose true class label is 1. It is a measure of how many of the positive samples have been identified as being positive.

Specificity is the measure of the probability that your estimate is 0 given all the samples whose true class label is 0. It is a measure of how many of the negative samples have been identified as being negative.

PRECISION on the other hand is different. It is a measure of the probability that a sample is a true positive class given that your classifier said it is positive. It is a measure of how many of the samples predicted by the classifier as positive is indeed positive. Note here that this changes when the base probability or prior probability of the positive class changes. Which means PRECISION depends on how rare is the positive class. In other words, it is used when positive class is more interesting than the negative class.

So, if your problem involves kind of searching a needle in the haystack when for ex: the positive class samples are very rare compared to the negative classes, use a precision recall curve. Othwerwise use a ROC curve because a ROC curve remains the same regardless of the baseline prior probability of your positive class (the important rare class).

 

 

[3] towardsdatascience.com/on-roc-and-precision-recall-curves-c23e9b63820c

 
 
 
 
 
 
Posted by uniqueone
,

For the benefit of new folks in the field, here's a list of some well known and prestigious free online courses/books on Data Science and machine learning (for certificate you might have to pay though in some of these). Do add if you know any good free courses/resources in comments:

1) Machine Learning (Stanford University, Andrew Ng Course, Coursera)

This is a very respected course and widely accepted in machine learning industry.

https://www.coursera.org/learn/machine-learning

Do the assignments of this course in python as the course is in Octave language which isn't used that much in industry. Assignment solutions in Python.

https://github.com/dibgerge/ml-coursera-python-assignments

https://github.com/jdwittenauer/ipython-notebooks

2) Python Data Science Handbook (Jake VanderPlas)

Famous Python Data Science handbook topic-wise:

https://jakevdp.github.io/PythonDataScienceHandbook/

3) Fast.ai ML course (Jeremy Howard)

http://course18.fast.ai/ml

This course is designed by Kaggle 2 time competition winner Jeremy Howard and he is also chief scientist at Kaggle.

4) ML Course by IBM (edX)

https://www.edx.org/course/machine-learning-with-python-a-practical-introduct

5) Machine Learning Crash Course by Google

https://developers.google.com/machine-learning/crash-course/ml-intro

6) The Analytics Edge - Massachusetts Institute of Technology (MIT)

https://courses.edx.org/courses/course-v1:MITx+15.071x+1T2020/course/

7) Deep Learning (Andrew Ng, Coursera)

https://www.coursera.org/specializations/deep-learning

Posted by uniqueone
,

【자기주도온라인평생학습 】기초 탈출 파이썬 판다스(Python Pandas) 무료 온라인 강의 리스트 – 8개 교육과정

컴퓨터 프로그래밍에서 pandas는 데이터 조작 및 분석을 위해 Python 프로그래밍 언어로 작성된 소프트웨어 라이브러리입니다. 특히, 숫자 테이블 및 시계열 조작을위한 데이터 구조 및 조작을 제공합니다. 3 절 BSD 라이센스에 따라 공개 된 무료 소프트웨어입니다. 위키백과(영어)

Pandas 튜토리얼 1 – 데이콘 (YouTube)
: http://bitly.kr/gxleF15yLk

Pandas 팬더스 강의 기초 실습 – Minsuk Heo 허민석 (YouTube)
: http://bitly.kr/CUDIQ7q3z0E

토크ON 44차. Pandas로 하는 시계열 데이터분석 | T아카데미 – SKplanet Tacademy (YouTube)
: http://bitly.kr/Pjg0hLuDTJ

파이썬 판다스로 데이터 분석하고 엑셀로 뽑기 – Kyeongrok Kim (YouTube)
: http://bitly.kr/tzB9dPbQOJB

데이터분석/판다스 강의 – NeoWizard (YouTube)
: http://bitly.kr/EIPbWyAadFK

Python을 활용한 데이터분석 기초 | T아카데미 – SKplanet Tacademy (YouTube)
: http://bitly.kr/XXjLP2C0n6

판다스 – Sungchul Lee (YouTube)
: http://bitly.kr/aPAoUoVOPg

빅데이터 프로그래밍 - 건국대학교 이정환교수 (KOCW)
: http://bitly.kr/tmI1VuiDMA

출처 : http://bitly.kr/XMGZ6Ntol3

【자기주도온라인평생학습】 http://withmooc.com/courses/

Posted by uniqueone
,

https://www.youtube.com/watch?v=7RtWlPpk348&list=PLwvr-xPygMX9TaQFW3C1UGEuD0zJF7pCk

[ 유튜브 채널 : K-ICT 빅데이터센터 ]
 

* 학습목차
1 . 파이썬 머신러닝 강의 01-1 – 머신러닝(Machine Learning) 개요
머신러닝의 정의를 소개하는 강의입니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395976711
주요내용
– 머신러닝의 정의
– 머신러닝 사례
 

2 . 파이썬 머신러닝 강의 01-2 – 머신러닝을 사용하는 이유
머신러닝을 사용하는 이유를 전통적인 학습방법과 비교하여 설명합니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395948777
주요내용
– 전통적인 접근 방법
– 머신러닝 접근 방법
– 자동으로 변화에 적응
– 머신러닝을 통해 새로운 패턴을 발견
 

3 . 파이썬 머신러닝 강의 01-3 – 머신러닝 기법의 분류
머신러닝의 학습 특성에 따른 분류를 소개하는 강의입니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395909366
주요내용
– 지도학습
– 비지도학습
– 강화학습
– 준지도학습
 

4 . 파이썬 머신러닝 강의 01-4 – 파이썬 아나콘다 및 Jupyter Notebook 설치 실습
머신러닝 개발환경 구성을 위한 프로그램 설치 실습 강의입니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395874550
주요내용
– 파이썬과 Anaconda 설치 및 환경설정하기
– Jupyter Notebook 설정 및 실행하기
 

5 . 파이썬 머신러닝 강의 01-5 – 머신러닝 개념 및 개발환경 구성 주요정리
머신러닝 개념 및 머신러닝 개발환경 구성 강의 주요내용 정리 영상입니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395846022
6 . 파이썬 머신러닝 강의 02-1 – 파이썬 NumPy 소개
파이썬의 연산처리에 활용되는 NumPy 라이브러리를 소개하는 강의입니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395819694
주요내용
– NumPy(Numerical Python) 소개 및 특징
 

7 . 파이썬 머신러닝 강의 02-2 – 파이썬 NumPy 주요 함수 및 기능 정리
파이썬 NumPy 배열 객체의 기본적 활용을 위한 함수 및 기능을 정리한 강의입니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395793992
주요내용
– NumPy 패키지와 배열(ndarray) 객체
– NumPy 배열 객체의 주요 속성
– 인덱싱 및 슬라이싱
– 내장 함수 사용
– 브로드캐스팅
– 부울 배열과 마스킹 연산
– 배열 객체 정렬
 

8 . 파이썬 머신러닝 강의 02-3 – 파이썬 Pandas 소개
파이썬에서 데이터 처리 및 연산에 사용되는 Pandas 라이브러리 특징을 소개하는 강의입니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395765166
주요내용
– Pandas 소개 및 특징
 

9 . 파이썬 머신러닝 강의 02-4 – 파이썬 Pandas 주요 함수 및 기능 정리
파이썬 Pandas 주요 기능 및 함수 활용방법을 설명하는 강의입니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395733821
주요내용
– Pandas 패키지와 Series, DataFrame 객체
– DataFrame 객체와 loc 인덱서, iloc 인덱서의 활용
– DataFrame 객체의 행과 열 조작
– DataFrame 객체의 널 값 연산
– DataFrame 객체의 조인 연산
 

10 . 파이썬 머신러닝 강의 02-5 – 파이썬 NumPy, Pandas 주요내용 정리
파이썬 NumPy, Pandas 라이브러리 주요내용 정리 영상입니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395702947
 

11 . 파이썬 머신러닝 강의 03-1 – 파이썬 데이터 시각화 Matplotlib 소개 및 활용준비
파이썬 데이터 시각화 라이브러리인 Matplotlib를 소개하고 활용 준비과정을 설명합니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395672528
주요내용
– matplotlib 임포트와 Jupyter Notebook에 플롯 삽입 준비
– 플롯 스타일과 설정
 

12 . 파이썬 머신러닝 강의 03-2 – 파이썬 Matplotlib의 라인 플롯 만들기
파이썬 Matplotlib 라이브러리를 활용한 라인 플롯을 그리는 방법을 설명합니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395646866
주요내용
– 라인 플롯과 주요 속성 설정
– 스타일 컨텍스트 매니저를 이용한 플롯 스타일의 일시적 설정 변경
– 객체지향 인터페이스 Figure와 Axes 객체의 사용
– Matlab 스타일 pyplot 인터페이스의 사용
 

13 . 파이썬 머신러닝 강의 03-3 – 파이썬 Matplotlib의 스캐터 플롯 만들기
파이썬 Matplotlib 라이브러리를 활용한 스캐터 플롯을 그리는 방법을 설명합니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395603551
주요내용
– plt.plot()을 이용한 스캐터 플롯
– plt.scatter()를 이용한 스캐터 플롯
 

14 . 파이썬 머신러닝 강의 03-4 – 파이썬 Matplotlib의 히스토그램 만들기
파이썬 Matplotlib 라이브러리를 활용한 히스토그램을 그리는 방법을 설명합니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395578518
주요내용
– 데이터의 분포 형상을 식별하는 히스토그램
– 상대도수 히스토그램과 확률밀도 함수
 

15 . 파이썬 머신러닝 강의 03-5 – 파이썬 Matplotlib의 박스 플롯 만들기
파이썬 Matplotlib 라이브러리를 활용한 박스 플롯을 그리는 방법을 설명합니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395553918
주요내용
– 박스 플롯과 사분위수
– 박스 플롯과 IQR
 

16 . 파이썬 머신러닝 강의 03-6 – 파이썬 Matplotlib의 이미지 플롯 만들기
파이썬 Matplotlib 라이브러리를 활용한 이미지 플롯을 그리는 방법을 설명합니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395524370
주요내용
– plt.imshow() 를 이용한 이미지 플롯
 

17 . 파이썬 머신러닝 강의 03-7 – 파이썬 Matplotlib 데이터 시각화 주요 정리
파이썬 Matplotlib을 활용한 데이터 시각화의 주요내용 정리 영상입니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395484060
 

18 . 파이썬 머신러닝 강의 04-1 – 머신러닝을 위한 통계학 소개
머신러닝 이해를 위한 통계학의 개념을 소개합니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395443167
주요내용
– 기술 통계학
– 추론 통계학
 

19 . 파이썬 머신러닝 강의 04-2 – 머신러닝을 위한 통계학 핵심개념 – 기술통계
머신러닝 이해를 위한 기술통계 핵심 개념을 설명합니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395380016
주요내용
– 모집단과 표본
– 모수와 통계량
– 평균, 중간값, 최빈값
– 분산, 표준편차, 범위, 사분위수
 

20 . 파이썬 머신러닝 강의 04-3 – 머신러닝을 위한 통계학 핵심개념 – 통계적 추론
머신러닝 이해를 위한 통계적 추론의 핵심 개념을 설명합니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395348554
주요내용
– 가설 검정과 p-value
– 정규분포
– 카이제곱 독립성 검정
– ANOVA 분산 분석
 

21 . 파이썬 머신러닝 강의 04-4 – 머신러닝을 위한 통계학 핵심개념 주요 정리
머신러닝을 위한 통계학 핵심개념 주요내용 정리 영상입니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395324352
 

22 . 파이썬 머신러닝 강의 05-1 – Scikit-Learn 기초 및 데이터 표현 방식 이해
파이썬의 머신러닝 라이브러리인 Scikit-Learn을 소개하고 데이터 표현방식을 설명합니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395295645
주요내용
– Scikit-Learn 소개
– Scikit-Learn 데이터 표현 방식
– 특징행렬과 대상벡터의 데이터 레이아웃
– Numpy 배열을 이용한 특징 행렬(X), 대상 벡터(y)의 생성
– Pandas DataFrame을 이용한 특징 행렬(X), 대상 벡터(y)의 생성
– Bunch 객체를 이용한 특징 행렬(X), 대상 벡터(y)의 생성
 

23 . 파이썬 머신러닝 강의 05-2 – Scikit-Learn을 활용한 머신러닝 모델 만들기
Scikit-Learn을 활용하여 데이터 준비부터 모델적합, 예측, 평가에 이르는 전 과정을 설명합니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395272454
주요내용
– Scikit-Learn Estimator API 기본 활용 절차
– 데이터 준비
– 모델 클래스 선택
– 모델 인스턴스 생성과 하이퍼파라미터 선택
– 특징 행렬과 대상 벡터 준비
– 모델을 데이터에 적합
– 새로운 데이터를 이용해 예측
– 모델 평가
 

24 . 파이썬 머신러닝 강의 05-3 – Scikit-Learn 활용 훈련, 테스트 데이터 분할
훈련 데이터와 테스트 데이터 분할 개념 및 Scikit-Learn의 활용방법을 설명합니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395246037
주요내용
– iris 데이터 소개
– 정확도가 정말 1.0 인가?
– 훈련 데이터와 테스트 데이터의 분리
– 테스트 데이터를 이용한 모델의 성능 측정
 

25 . 파이썬 머신러닝 강의 05-4 – Scikit-Learn의 하이퍼파라미터 선택
머신러닝 학습 최적화 및 학습수준 변화 관찰을 위한 하이퍼파라미터(Hyper Parameter) 조정방법을 설명합니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395220276
주요내용
– 하이퍼파라미터의 선택
 

26. 파이썬 머신러닝 강의 05-5 – 파이썬 Scikit-Learn 주요내용 정리
파이썬 Scikit-Learn 라이브러리의 주요내용 정리 영상입니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395197153

 

27. 파이썬 머신러닝 강의 06-1 – 머신러닝 교차검증 개념 및 기법
머신러닝 모델의 정확한 성능 평가를 위한 교차검증 개념과 기법에 대해 설명합니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395166775

주요내용
– 교차 검증의 기본 절차와 필요성
– 교차 검증 기법

 

28. 파이썬 머신러닝 강의 06-2 – 머신러닝 최적의 모델 개념
머신러닝 최적의 모델에 대한 개념을 설명합니다.

강의자료, 소스코드 다운받기 :

주요내용
– 최적의 모델 선택 방법

 

29. 파이썬 머신러닝 강의 06-3 – 편향-분산 트레이드오프(Trade-off)
머신러닝 모델의 성능평가에 따르는 편향-분산 간 상관관계(Trade-off)를 이해하고 Scikit-Learn으로 테스트하는 방법을 설명합니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583395105344

주요내용
– 고편향 모델과 고분산 모델
– 검증 곡선
– 최적 모델

 

30. 파이썬 머신러닝 강의 06-4 – 머신러닝 학습곡선 특성
머신러닝에서 학습곡선에 영향을 주는 요인을 살펴봅니다.

강의자료, 소스코드 다운받기 :

주요내용
– 학습곡선 : 훈련집합의 크기에 따른 훈련 점수/검증 점수의 플롯

 

31. 파이썬 머신러닝 강의 06-5 – Scikit-Learn활용 그리드 서치 실습
Scikit-Learn을 활용한 그리드 서치(Grid Search) 기반 학습 최적화를 해보는 실습강의입니다.

강의자료, 소스코드 다운받기 :

주요내용
– 최적의 다항식 모델을 구하기 위한 그리드 서치 모듈
– 최적 모델과 데이터 적합
– 실습1 : Scikit-Learn을 활용한 그리드 서치 구현 및 학습곡선 시각화

 

32. 파이썬 머신러닝 강의 06-6 – 특징 공학 및 데이터 변환 실습
머신러닝 모델 학습을 위한 데이터 변환 방법론인 특징 공학에 대해 알아보고, Scikit-Learn으로 실습합니다.

강의자료, 소스코드 다운받기 :

주요내용
– 범주 특징
– 텍스트 특징
– 유도 특징
– 누락 데이터 대체
– 특징 파이프라인
– 실습2 : 텍스트 데이터 변환 및 학습 적용

 

33. 파이썬 머신러닝 강의 06-7 – 교차검증, 그리드 서치, 특징 공학 주요내용 정리
협업 필터링 추천 시스템 강의 주요내용 정리 영상입니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583394941714

 

34. 파이썬 머신러닝 강의 07-1 – 와인 품질 데이터 탐색
와인 품질 예측 모델 만들기 전 데이터에 대한 탐색적 분석을 진행하는 강의입니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583394911307

주요내용
– 와인품질 데이터 데이터프레임의 구성
– 요약통계량

 

35. 파이썬 머신러닝 강의 07-2 – 와인 품질 데이터의 기술통계 및 통계적 검정
와인 품질 예측 모델 만들기 전 데이터에 대한 기술통계 및 통계적 검정을 진행하는 강의입니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583394878451

주요내용
– 와인 종류별 품질의 기술통계량
– 와인 종류별 품질의 사분위수
– 와인 종류별 품질의 분포
– 와인 종류별 품질 차이의 통계적 유의성 검정

 

36. 파이썬 머신러닝 강의 07-3 – 와인 품질 데이터 상관 분석
와인 품질 예측 모델 만들기 전 변수 사이의 상관관계를 분석하는 강의입니다.

강의자료, 소스코드 다운받기 :

주요내용
– 변수들 사이의 상관계수 계산

 

37. 파이썬 머신러닝 강의 07-4 – 와인 품질 데이터 탐색적 분석 실습
와인 품질 데이터에 대한 탐색적 분석 실습 강의입니다.

강의자료, 소스코드 다운받기 : https://www.youtube.com/watch?v=g3ErwN8AKkw&list=PLwvr-xPygMX9TaQFW3C1UGEuD0zJF7pCk&index=37

주요내용
– 산점도 행렬
– 실습 1 : 와인 품질 데이터 불러오기
– 실습 1 : 기술통계
– 실습 1 : 와인종류에 대한 t-검정
– 실습 1 : 와인종류별 상관분석

 

38. 파이썬 머신러닝 강의 07-5 – 단순 선형 회귀(OLS) 개념
단순 선형 회귀(OLS)의 개념 및 수학적 정의를 설명하는 강의입니다.

강의자료, 소스코드 다운받기 :

주요내용
– 단순 선형 회귀의 개요
– 최소 제곱법
– 비용함수를 사용한 모델의 적합화 척도:RSS
– 단순 선형 회귀를 위한 OLS의 계산
– 단순 선형 회귀 모델의 평가

 

39. 파이썬 머신러닝 강의 07-6 – 다중 선형 회귀 개념
다중 선형 회귀의 개념 및 수학적 정의에 대해 설명하는 강의입니다.

강의자료, 소스코드 다운받기 :

주요내용
– 다중 선형 회귀의 개요
– 다중 선형 회귀의 벡터 정리

 

40. 파이썬 머신러닝 강의 07-7 – 다항 선형 회귀 개념
다항 선형 회귀 개념 및 수학적 정의를 설명하는 강의입니다.

강의자료, 소스코드 다운받기 :

주요내용
– 다항 회귀의 개요

 

41. 파이썬 머신러닝 강의 07-8 – 선형 회귀 주요내용 정리
선형 회귀(단순선형회귀, 다중선형회귀, 다항선형회귀) 주요내용 정리 영상입니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583394688813

 

42. 파이썬 머신러닝 강의 08-1 – 와인 품질 예측 모델의 학습 데이터 구성
선형회귀 모델 기반 와인 품질 예측 모델 구현을 위한 와인 품질 데이터 구성을 설명합니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583394647729

주요내용
– 와인 품질 데이터의 구성
– 와인 품질 데이터 프레임 준비

 

43. 파이썬 머신러닝 강의 08-2 – 선형회귀 모델 기반 와인 품질 예측 모델 학습
선형회귀 모델 기반 와인 품질 예측 모델 학습을 위한 데이터 준비 및 학습 방법 강의입니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583394616470

주요내용
– 모델 클래스와 모델 파라미터의 선택
– 특징행렬과 대상 벡터의 추출
– 훈련 데이터와 테스트 데이터의 분리
– 모델의 데이터 적합
– 모델을 새로운 데이터에 적용

 

44. 파이썬 머신러닝 강의 08-3 – 선형회귀 모델 성능 측정 및 와인 품질 예측 모델 만들기 실습
선형회귀 모델 성능 측정 및 와인 품질 예측 모델 구현 실습 강의입니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583394564427

주요내용
– 선형회귀 모델의 성능 측정
– 와인 품질 예측 모델 만들기 실습

 

45. 파이썬 머신러닝 강의 08-4 – 규제 선형회귀 모델 개념 및 학습
규제가 있는 선형회귀 모델의 개념 및 학습 방법에 대한 강의입니다.

강의자료, 소스코드 다운받기 :

주요내용
– 규제가 없는 선형 회귀 모델의 적합
– 규제가 있는 선형 회귀 모델의 적합
– 규제가 있는 선형 회귀 모델의 필요성

 

46. 파이썬 머신러닝 강의 08-5 – 리지(Ridge)회귀 모델 개념 및 와인 품질 예측 모델 만들기 실습
리지(Ridge)회귀 모델 개념 및 와인 품질 예측 모델 구현 실습 강의입니다.

강의자료, 소스코드 다운받기 :

주요내용
– 리지 회귀 모델 개요
– 선형 회귀 모델과 계수의 특징
– 실습 1 : 와인 품질 예측 모델 만들기

 

47. 파이썬 머신러닝 강의 08-6 – 라쏘(Lasso)회귀 모델 기반 와인 품질 예측 모델 만들기 실습
라쏘회귀 모델의 개념 및 라쏘회귀 모델 기반 와인 품질 예측 모델 만들기 실습 강의입니다.

강의자료, 소스코드 다운받기 :

주요내용
– 라쏘 회귀 모델 개요
– 라쏘 회귀 모델의 계수의 특징
– 실습2 : 와인 품질 예측 모델 만들기

 

48. 파이썬 머신러닝 강의 08-7 – 규제가 있는 선형 회귀 모델 주요 정리
규제가 있는 선형 회귀 모델의 주요내용 정리 영상입니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583394439968

 

49. 파이썬 머신러닝 강의 09-1 – 의사결정트리 개요
의사결정트리 개요를 설명하는 영상입니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583394031209

주요내용
– 의사결정 트리의 개요

 

50. 파이썬 머신러닝 강의 09-2 – 의사결정트리 생성 개념
의사결정트리의 특징을 설명합니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583393122489

주요내용
– 의사결정 트리의 생성

 

53. 파이썬 머신러닝 강의 09-5 – 의사결정트리 기반 광고 클릭 예측 모델 학습 데이터 준비
나이브 베이즈 기반 스팸메일 필터 모델의 성능 측정 방법 강의입니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583393010065

주요내용
– 온라인 광고 클릭 예측 데이터 개요
– 데이터 필드 구성
– 데이터 준비
– 원 핫 인코딩 벡터 변환

 

54. 파이썬 머신러닝 강의 09-6 – 의사결정트리 기반 광고 클릭 예측 모델의 학습
의사결정트리 기반 광고 클릭 예측 모델의 학습 방법 강의입니다.

강의자료, 소스코드 다운받기 :

주요내용
– 그리드 서치를 이용한 의사결정 트리 모델 학습
– 의사결정 트리 모데르이 파일 출력

 

55. 파이썬 머신러닝 강의 09-7 – 의사결정트리 기반 광고 클릭 예측 모델의 성능 측정
의사결정트리 기반 광고 클릭 예측 모델의 학습 방법 강의입니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583392925252

주요내용
– 그리드 서치를 이용한 의사결정 트리 모델 학습
– 의사결정 트리 모데르이 파일 출력

 

56. 파이썬 머신러닝 강의 09-8 – 앙상블 기법 활용 광고 클릭 예측 모델 개선
앙상블 기법을 활용한 광고 클릭 예측 모델 개선 강의입니다.

강의자료, 소스코드 다운받기 :

주요내용
– 앙상블 학습과 배깅
– 앙상블 학습과 랜덤 프레스트
– 랜덤 포레스트의 특징 중요도
– 랜덤 포레스트의 성능 개선을 위한 주요 파라미터
– 랜덤포레스트 모델의 성능 측정 : 정확도, 혼동행렬, ROC의 AUC

 

57. 파이썬 머신러닝 강의 09-9 – 의사결정트리 주요내용 정리
분류모델 성능 평가지표 및 의사결정트리 주요내용 정리 영상입니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583392863387

 

58. 파이썬 머신러닝 강의 10-1 – 로지스틱 회귀 개념
로지스틱 회귀의 개념을 설명합니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583392828821

 

59. 파이썬 머신러닝 강의 10-2 – 로지스틱 회귀의 특징
로지스틱 회귀 모델과 시그모이드 함수의 특징을 설명합니다.

강의자료, 소스코드 다운받기 :

주요내용
– 로지스틱 회귀와 분류
– 시그모이드 함수 특징

 

60. 파이썬 머신러닝 강의 10-3 – 광고 클릭 예측 모델 데이터 준비
로지스틱 회귀 기반 온라인 광고 클릭 예측 모델 학습을 위한 데이터 준비 방법 강의입니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583392764053

주요내용
– 온라인 광고 클릭 예측을 위한 데이터 준비
– 원 핫 인코딩(One Hot Encoding) 벡터 변환

 

61. 파이썬 머신러닝 강의 10-4 – 로지스틱 회귀 기반 광고 클릭 예측 모델 학습
로지스틱 회귀 모델을 활용한 온라인 광고 클릭 예측 모델 학습 방법 강의입니다.

강의자료, 소스코드 다운받기 :

주요내용
– 그리드 서치를 이용한 로지스틱 회귀 모델 학습

 

62. 파이썬 머신러닝 강의 10-5 -로지스틱 회귀 모델 성능 측정 및 온라인 광고 클릭 예측 모델 만들기 실습
로지스틱 회귀 모델 성능 측정 및 온라인 광고 클릭 예측 모델 구현 실습 강의입니다.

강의자료, 소스코드 다운받기 :

주요내용
– 로지스틱 회귀 모델의 성능 측정
– 실습 1 : 온라인 광고 클릭 예측 모델 구현

 

63. 파이썬 머신러닝 강의 10-6 – 로지스틱 회귀 주요내용 정리
로지스틱 회귀 주요내용 정리 영상입니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583392129801

 

64. 파이썬 머신러닝 강의 11-1 – K-최근접 이웃(KNN) 알고리즘 개념
K-최근접 이웃(KNN) 분류 알고리즘과 차원의 저주 개념을 설명합니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583392099634

주요내용
– K-최근접 이웃의 개요
– 차원의 저주

 

65. 파이썬 머신러닝 강의 11-2 – 암 진단 분류 모델 데이터 준비
K-최근접 이웃(KNN) 알고리즘을 활용한 유방암 진단 분류 모델 학습을 위한 데이터 준비 방법 강의입니다.

강의자료, 소스코드 다운받기 :

주요내용
– 데이터프레임의 생성
– 누락 값의 대체 및 클래스 레이블을 0과 1로 변환
– 불필요한 변수 제거 및 표준화 적용

 

66. 파이썬 머신러닝 강의 11-3 – K-최근접 이웃(KNN) 기반 암 진단 분류 모델 학습
K-최근접 이웃(KNN) 알고리즘 기반 암 진단 분류 모델 학습 강의입니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583392040049

주요내용
– 머신러닝 모델 클래스 KNeighborsClassifier를 이용한 학습

 

67. 파이썬 머신러닝 강의 11-4 – K-최근접 이웃(KNN) 모델 성능 개선 및 암 진단 분류 모델 만들기 실습
K-최근접 이웃(KNN) 모델 성능 개선 및 유방암 진단 분류 모델 만들기 방법 실습 강의입니다.

강의자료, 소스코드 다운받기 :

주요내용
– 유방암 진단 분류를 위한 모델 성능 측정
– 유방암 진단 분류 모델의 성능 개선
– 그리드 서치를 이용한 하이퍼파라미터의 최적 값 선택
– 실습 1 : 암 진단 분류 모델 구현

 

68. 파이썬 머신러닝 강의 11-5 – K-최근접 이웃(KNN) 알고리즘 주요내용 정리
K-최근접 이웃(K-Nearest Neighbor; KNN) 알고리즘 개념 및 실습 강의의 주요내용 정리 영상입니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583391983899

 

69. 파이썬 머신러닝 강의 12-1 – 나이브 베이즈 개념 및 스팸메일 필터 데이터 구축
나이브 베이즈 개념 및 스팸메일 필터 구현을 위한 데이터 전처리, 구축 방법 강의입니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583391948833

주요내용
– 나이브 베이즈 분류기란?
– 베이즈 정리와 나이브 베이즈 동작 원리의 이해
– Enron 이메일 데이터 개요
– 파일 경로로 메일/스팸 기본 분류하기
– 숫자, 구두점, 사람 이름 제거
– 불용어 제거와 단어의 출현 빈도 특징을 추출

 

70. 파이썬 머신러닝 강의 12-2 – 나이브 베이즈 기반 스팸메일 필터 모델 학습
나이브 베이즈 기반 스팸메일 필터 모델 학습 방법 강의입니다.

강의자료, 소스코드 다운받기 :

주요내용
– 훈련 데이터와 테스트 데이터의 분리 및 변환
– 나이브 베이즈 모델 생성 및 학습

 

71. 파이썬 머신러닝 강의 12-3 – 나이브 베이즈 기반 스팸메일 필터 모델의 성능 측정
나이브 베이즈 기반 스팸메일 필터 모델의 성능 측정 방법 강의입니다.

강의자료, 소스코드 다운받기 :

주요내용
– 나이브 베이즈 모델의 성능 측정

 

72. 파이썬 머신러닝 강의 12-4 – 나이브 베이즈 성능 개선 및 스팸메일 필터 만들기 실습
나이브 베이즈 분류기 성능 개선 방법 및 스팸메일 필터 구현 실습 강의입니다.

강의자료, 소스코드 다운받기 :

주요내용
– 나이브베이즈 분류모델의 성능 개선
– 실습 1 : 스팸메일 필터 만들기

 

73. 파이썬 머신러닝 강의 12-5 – 나이브 베이즈 분류기 주요내용 정리
나이브 베이즈 분류기 강의 주요내용 정리 영상입니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583391842770

 

74. 파이썬 머신러닝 강의 13-1 – 추천 엔진의 개념과 종류
추천 엔진의 개념과 여러 유형의 추천 시스템의 특성을 설명합니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583391803590
주요내용
– 추천 엔진의 개요
– 추천 엔진의 종류
 

75. 파이썬 머신러닝 강의 13-2 – 추천 엔진 개발을 위한 협업 필터링 개념
추천 엔진에 사용되는 협업 필터링 개념과 추천 엔진 모델 평가방법을 설명합니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583391590570
주요내용
– 협업 필터링 개요
– 추천 엔진 모델의 평가
 

76. 파이썬 머신러닝 강의 13-3 – 협업 필터링을 이용한 영화 추천 엔진 데이터 준비
협업 필터링을 이용한 영화 추천 엔진 구현을 위한 데이터 준비 방법 강의입니다.
강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583391532308
주요내용
– 데이터 준비
– 데이터 프레임 구성
– 데이터 탐색
– 평가 행렬(ratings)의 생성
– 훈련 데이터와 테스트 데이터의 분리
 

77. 파이썬 머신러닝 강의 13-4 – 사용자 기반 협업 필터링을 이용한 영화 추천 엔진 만들기
사용자 기반 협업 필터링 기법의 개념과 영화 추천 엔진 적용 방법 강의입니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583386776949

주요내용
– 사용자 간 유사도 행렬 생성
– 평가 예측 및 모델의 성능 측정
– 가장 비슷한 n명을 찾는 비지도 방식의 이웃 검색
– 선택된 n명의 사용자들의 평가 가중치 합을 사용한 예측 및 모델의 성능 측정

 

78. 파이썬 머신러닝 강의 13-5 – 아이템 기반 협업 필터링 활용 영화 추천 엔진 만들기 실습
아이템 기반 협업 필터링 활용 방법 및 협업 필터링 기법으로 영화 추천 엔진을 만들어보는 실습 강의입니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583386604673

주요내용
– 영화 간 유사도 행렬 계산
– 평가 예측 및 모델 성능 측정
– 실습 1 : 협업 필터링 기반 영화 추천 엔진 만들기

 

79. 파이썬 머신러닝 강의 13-6 – 협업 필터링 기반 추천 시스템 주요내용 정리
협업 필터링 기반 추천 시스템 강의 주요내용 정리 영상입니다.

강의자료, 소스코드 다운받기 : https://kbig.kr/board/board.brd?boardId=movie_python&bltnNo=11583385578629

 

 

 

 

Posted by uniqueone
,

Christoper M. Bishop의 저서인 "Pattern Recognition and Machine Learning" 에 등장하는 알고리즘을 Python으로 (주피터 노트북으로) 구현한 저장소가 있군요.

저는 이 책을 보진 않았지만, 많은 분들께 유용할 수 있을것 같아서 공유드립니다. 코드는 매우 clean 한것 같네요 :)

깃헙 저장소: https://github.com/ctgk/PRML

Posted by uniqueone
,

PRML algorithms implemented in Python
Python codes implementing algorithms described in Bishop's book "Pattern Recognition and Machine Learning" : https://github.com/ctgk/PRML
#DeepLearning #MachineLearning #Python

Posted by uniqueone
,

#OpenSyllabus

세계 80개국에서 영어로 작성된 7백만 여개의 Syllabus를 수집해 각 전공별로 어떠한 전공 서적이 수업에 가장 많이 참고되었는지를 기록으로 제공해주는 Open Syllabus 사이트를 소개해드립니다.

요즘에는 학부 때와 다른 전공으로 진로를 정하고자 하시는 분들이 많이 계시지만, 비전공자라는 환경적 제약상 어떠한 서적으로 해당 전공을 심도있게 공부할 수 있는지에 대한 고민을 많이 하게 되는 것 같습니다. 리소스는 넘쳐나지만 그 반대 급부로 선택의 고민이 많아지고 있는 것이죠.

물론 모든 서적을 비교해보며 자신에게 가장 맞는 서적을 기반으로 학습하는 것이 가장 이상적일 수 있겠지만, 현실적인 타협안은 가장 많은 전공자들이 필독하는 바이블과 같은 서적을 시작점으로 학습하는 것일 것입니다. 그리고 Open Syllabus가 해당 선택을 도와줄 수 있는 좋은 지침서가 될 수 있을 것 같습니다.

기계학습과 자연어 처리를 공부하는 저의 경우, Open Syllabus에서 수학, 전산학 그리고 추가적 공부를 한다면 언어학에서 자주 사용되는 교재를 기반으로 학습을 시작해볼 수 있겠네요 !

자연어 처리와 직접적인 관련은 없는 게시물일 수 있지만 여러분의 학습 교재 선택에 있어 도움이 될 수 있는 자료인 것 같아 꼭 공유를 드리고 싶었습니다 😆

P.S 전산학의 1위 레퍼런스 교재가 피터 노빅의 인공지능 서적인 점이 재밌네요 ! 그 외 공룡책(OS), 탑다운 네트워크 서적 등도 높은 순위에 랭크되어 있습니다.

Open Syllabus: https://opensyllabus.org/results-list/fields?size=100

Posted by uniqueone
,

제목: 랜덤 포레스트를 통한 주가 움직임 예측
금융 분야에서 모든 사람은 주가를 예측하는데 있어 우위를 점할 무언가를 찾고 있습니다. 머신 러닝의 세계는 주가를 정확하게 예측할 수 있는 방법을 열심히 찾고 있는데요. 일부 존재하긴 하지만 퍼포먼스를 보장하기는 어렵습니다.
이 시리즈에서는 인기 있는 머신 러닝 모델 인 랜덤 포레스트를 다룰 것입니다. 랜덤 포레스트에서는 다양한 지표를 기반으로 주식의 움직임을 예측하는 머신 러닝 모델이 될 것입니다. 시리즈의 첫 번째 부분에서는 모델 개요를 설명합니다.

Video 1: https://youtu.be/V8jZuOtckn8
Video 2: https://youtu.be/W2hXbqnrUyY
Video 3: https://youtu.be/bdEQwJ6SPnA
Video 4: https://youtu.be/E2LX_hUHMn0
Video 5: https://youtu.be/iJmteST6fP8
Video 6: https://youtu.be/ioUtR92tDAA

GitHub: https://github.com/areed1192/sigma_coding_youtube

Posted by uniqueone
,

Recently, I came across this excellent machine learning playlist by Prof. Arti Ramesh from SUNY Binghamton. It contains videos on all topics covered in a machine learning course. I think folks here will like it.

https://m.youtube.com/channel/UCt8HFaRhijEKuKY7qzvdA3A


Posted by uniqueone
,
그룹에 Mathematics for Machine Learning(MML) 책을 보고 공부하시는 분들이 많을 것 같습니다. 동명의 Coursera 강의를 듣는 분들도 계실테고요.

강의를 진행하는 4명의 강사 중 한 명인 Sam Cooper는 Coursera의 MML 강좌를 등록한 학생 수가 15만명이 넘은 것을 자축하며, 해당 강좌를 유튜브에 무료로 풀기로 결정했다는 소식을 알렸습니다. 확인해보니 Chapter.3인 PCA 재생목록은 아직 올라와있지 않긴 하지만 '선형대수'와 '다변수미적분학'은 빠짐없이 올라와있네요.

+) 검색해보니 MML 강좌가 이미 Youtube 내 다른 재생목록으로 올라와있긴 한데 해당 강좌는 Unofficial 업로드이고, 빠진 영상도 더러 있는 것 같습니다. ICL 측에서 Official하게 올려준 강좌의 도움을 받아 수학 공부를 합시다..! (영어 자막도 제공해줍니다 크)
https://www.facebook.com/groups/TensorFlowKR/permalink/1038633053144419/?sfnsn=mo
Posted by uniqueone
,
Gradient Boost와 관련 모델 정리 (하나씩 점을 찍어 나가며 / 미스터 흠시)

* 영상 :  StatQuest with Josh Starmer 의 StatQuest: Gradient Boost Part1 영상

미스터 흠시 님께서 "StatQuest with Josh Starmer 의 StatQuest: Gradient Boost Part1" 영상을 정리해주신 내용을 공유합니다.

좋은 자료 감사드리며, 사이트 소개해 드립니다.

* Decision Tree
  : https://dailyheumsi.tistory.com/113?category=815369
 
* ML Model Ensemble 과 Bagging, Boosting 개념
  : https://dailyheumsi.tistory.com/111?category=815369

* Random Forest
  : https://dailyheumsi.tistory.com/114?category=815369

* AdaBoost
  : https://dailyheumsi.tistory.com/115?category=815369

* Gradient Boost
  : https://dailyheumsi.tistory.com/116?category=815369

* Catboost
  : https://dailyheumsi.tistory.com/136
https://www.facebook.com/groups/DataScienceGroup/permalink/2765440956851110/?sfnsn=mo
Posted by uniqueone
,
Machine Learning Tutorial 6 - Decision Tree Classifier and Decision Tree Regression in Python
https://www.facebook.com/groups/DataScienceGroup/permalink/2755799524481920/?sfnsn=mo
Posted by uniqueone
,
Machine Learning for Intelligent Systems: Cornell CS4780/CS5780

Lectures: https://www.youtube.com/playlist?list=PLl8OlHZGYOQ7bkVbuRthEsaLr7bONzbXS
Course Website: http://www.cs.cornell.edu/courses/cs4780/2018fa/syllabus/index.html
Find The Most Updated and Free Artificial Intelligence, Machine Learning, Data Science, Deep Learning, Mathematics, Python, R Programming Resources https://www.marktechpost.com/free-resources/
Posted by uniqueone
,
Here's the best order. In order Linear Algebra, Optimisation Theory, Probability Theory, Statistical Machine Learning, Classical Machine Learning, Computer Vision, Deep Learning

https://www.facebook.com/groups/DeepLearnng/permalink/2497747397125570/
Posted by uniqueone
,

https://www.facebook.com/groups/modulabs/permalink/2425272630871239/?sfnsn=mo

연대 수학과 이승철 교수님이 파이썬 기반으로 영상을 많이 찍어놓으셨네요. 보시면 좋을거 같아요!

머신러닝 통계학 확률론 등 동영상강의


Posted by uniqueone
,

https://www.facebook.com/groups/DeepNetGroup/permalink/896380944088122/?sfnsn=mo

For those of you interested, Stanford has an AI graduate certificate program on their website. They have the following classes:

[CS157 Computational Logic
](https://scpd.stanford.edu/search/publicCourseSearchDetails.do?method=load&courseId=11730)

[CS221 Artificial Intelligence: Principles and Techniques](https://scpd.stanford.edu/search/publicCourseSearchDetails.do;jsessionid=CAF3C13A982DFC017DA382B408276A56?method=load&courseId=11747)

[CS223A Introduction to Robotics
](https://scpd.stanford.edu/search/publicCourseSearchDetails.do?method=load&courseId=11750)

[CS224N Natural Language Processing with Deep Learning
](https://scpd.stanford.edu/search/publicCourseSearchDetails.do?method=load&courseId=11754)

[CS224U Natural Language Understanding
](https://scpd.stanford.edu/search/publicCourseSearchDetails.do?method=load&courseId=1180067)

[CS228 Probabilistic Graphical Models: Principles and Techniques
](https://scpd.stanford.edu/search/publicCourseSearchDetails.do?method=load&courseId=11761)

[CS229 Machine Learning
](https://scpd.stanford.edu/search/publicCourseSearchDetails.do?method=load&courseId=11763)

[CS230 Deep Learning
](https://scpd.stanford.edu/search/publicCourseSearchDetails.do?method=load&courseId=82209222)

[CS231A Computer Vision: From 3D Reconstruction to Recognition
](https://scpd.stanford.edu/search/publicCourseSearchDetails.do?method=load&courseId=10796915)

[CS231N Convolutional Neural Networks for Visual Recognition
](https://scpd.stanford.edu/search/publicCourseSearchDetails.do?method=load&courseId=42262144)

[AA228 Decision Making Under Uncertainty
](https://scpd.stanford.edu/search/publicCourseSearchDetails.do?method=load&courseId=39254289)

[CS234 Reinforcement Learning
](https://scpd.stanford.edu/search/publicCourseSearchDetails.do?method=load&courseId=75632440)

[https://scpd.stanford.edu/public/category/courseCategoryCertificateProfile.do?method=load&certificateId=1226717](https://scpd.stanford.edu/public/category/courseCategoryCertificateProfile.do?method=load&certificateId=1226717)
Posted by uniqueone
,
머신러닝 공부 순서, 방법 및 강의 정리(HowUse 곰씨네)

상세한 강의 내용 및 강의 후기 등은 "HowUse 곰씨네" 홈페이지를 참조하시기를 바랍니다.

"머신러닝 공부를 시작하면서 들었던 강의와 머신러닝 공부 방법에 대해서 정리해보려고 한다. 필자도 아직 머신러닝을 마스터하려면 갈 길이 멀었지만, 그간 공부했던 경험을 토대로 머신러닝 입문자들에게 조금이나마 도움이 됐으면 하는 마음으로 적어봤다."

https://statwith.tistory.com/entry/머신러닝-공부-순서-방법-및-강의-정리HowUse-곰씨네?category=768467

******************************************************
- 통계분석연구회
- 카페 : http://cafe.daum.net/statsas
- 통계분석연구회(Statistics Analysis Study) 그룹
: https://www.facebook.com/groups/statsas
- STATWITH : http://statwith.tistory.com/
- RSS : https://statwith.tistory.com/rss

1. 앤드류 응(Andrew Ng) 머신러닝 강좌

머신러닝 공부를 시작하기 위해 구글 검색을 해보면 열이면 아홉은 Andrew Ng 교수의 머신러닝 강좌부터 볼 것을 추천하고 있다. 앤드류 응 교수는 구글 브래인팀을 이끌었던 세계적인 AI 권위자로 스탠포드 대학 교수이자 코세라 창립자이다.

참고로 코세라 강의는 월 $45를 결제하면 Specializations에 있는 모든 과목을 무제한을 들을 수 있는데, 유명한 머신러닝 강의는 대부분 코세라에 있다. 가입 후 7일 동안은 무료라서 일단 가입했다.



앤드류 응 교수의 강의는 머신러닝 기본 강의라고 보면 될 것 같다. 수업은 원하는 때에 들을 수 있었고, 다만 숙제가 있다. 숙제는 Octave(옥타브)라는 스크립트 언어로 나왔다. 개인적으로 이 강의를 보고 난 후 파이썬을 공부했는데, 파이썬을 이미 공부한 사람들은 강의 숙제를 할 때 GitHub에 파이썬 코드로 재작성된 자료를 참고하면 될 것이다.

2. 파이썬(Python) 공부

어떤 머신러닝 전문가는 머신러닝을 배울 때 코딩부터 배우지 말라고 한다. 그런데 필자는 앤드류 응 교수의 머신러닝 수업을 대강 마무리 하고 바로 파이썬 문법을 공부했다. 삽질부터 해보는 개발자여서 그런지 이론보다는 코드에 먼저 눈이 갔던 것 같다. 

파이썬은 머신러닝에 즐겨 쓰이는 프로그래밍 언어이다. R이나 Matlab 같은 것도 있는데 머신러닝 언어 중 대세는 파이썬이라고 한다.

필자는 파이썬 공부를 하기 위해 파이썬 공식 사이트로 가서 문서들을 한 번 쭉 훑어보고 유데미(Udemy)에서 제일 짧은 강의부터 찾았다. 강의 이름은 김왼손의 유기농냠냠파이썬이라는 강좌였는데 파이썬 문법 부분만 빠르게 넘겨 보았다. 여러가지 프로그래밍 언어를 다뤄봐서 그런지 몇몇 파이썬 만의 독특한 문법들 빼고는 크게 어렵지는 않았다. 개인적으로 파이썬 문법 공부는 하루면 충분했던 것 같다.

만약 이미 코세라를 구독하고 있고 프로그래밍이 처음이거나 파이썬을 기초부터 제대로 배우고 싶다면 Python 3 Programming 강의를 추천한다.

3. 그래프 모형, 인공신경망 강의

머신러닝을 공부할 때는 머신러닝 개론 -> 그래프 모형 -> 인공신경망 순으로 공부하면 된다고 한다. 그래프 모형(Graphical Model)이란 머신러닝의 근간을 이루는 모델로 변수간 상호 의존 관계를 설명한다.



그래프 모형에 대한 강의는 Daphne Koller 교수의 Probabilistic Graphical Models 강의가 가장 유명하다. 이 역시 코세라 강의이다.

다음으로 최근 머신러닝의 대세가 된 알고리즘인 인공신경망(Neural Network) & 딥러닝(Deep Learning) 공부를 했다. AI, 머신러닝, 인공신경망, 딥러닝 개념이 어렵다면 아래 그림과 같은 관계라고 보면 된다.



AI(Artificial Intelligence)란 인간의 지능을 기계로 만든 것을 의미하며, 그 구체적인 방법 중 하나가 머신러닝(Machine Learning)인 것이다. 그리고 머신러닝을 구현하는 알고리즘 중의 하나가 인공신경망(Neural Network)과 딥러닝(Deep Learning)인 것이다.

딥러닝은 인공신경망에서 발전된 형태로 심화신경망 또는 개선된 인공신경망 등으로 불리기도 한다.

인공신경망 강의 역시 앤드류 응 교수의 코세라 강의인 Neural Networks and Deep Learning 강의를 들었다. 참고로 아직 보지는 않았지만 인공신경망 쪽에서 휴고 라로첼(Hugo Larochelle)의 유튜브 강의도 괜찮다고 한다.

4. 머신러닝 실습 강의

코세라 강의를 들으면서 잘 이해되지 않은 부분도 있고, 영어로 수업이 진행되다 보니 놓치는 부분도 많았던 것 같다. 그래서 조금 더 쉽고 실용적인 강의가 없나 찾다가 추가로 유데미에 있는 머신러닝 강의를 들었다.

참고로 유데미 강의는 프로그래밍을 전혀 해보지 않은 사람은 다소 따라가기 어려워 보였다. 강의는 텐서플로우와 케라스를 통해 인공 신경망 개발 환경을 구축해보고 딥러닝을 통한 이미지, 데이터 분류 실습을 해본다. 

또한 강화학습에 대한 내용과 Apache Sparks MLlib을 통한 대량 데이터 머신러닝 처리에 대한 내용도 배울 수 있었다.

5. 추가 학습

유데미 강의는 아직도 틈틈히 수강하고 있다. 그 와중에 다른 머신러닝/딥러닝 강의를 알아보다가 홍콩과기대 김성 교수님의 강의를 보게 되었다. 뭔가 이전에 배웠던 내용을 recap 하는 차원에서 보게 되었는데 머신러닝 이론에 대해 깔금하게 정리되어 있다. 머신러닝 공부를 시작하거나 공부 중이라면 참고하면 괜찮은 강의이지 않을까 싶다.

6. 머신러닝 공부에 도움 될 만한 URL 모음

머신러닝 공부에 도움 될 만한 사이트나 자료에 대한 URL은 이곳에 계속 업데이트할 예정이다. 

- 딥러닝을 위한 기초 수학 : https://www.slideshare.net/theeluwin/ss-69596991
- 텐서플로우 연습 코드 모음 : https://github.com/golbin/TensorFlow-Tutorials
- 구글 딥러닝 강의 : https://www.udacity.com/course/deep-learning--ud730
- 머신러닝 오픈소스 튜토리얼 : https://scikit-learn.org/stable/tutorial/
- 옥스포드 머신러닝 수업자료 : https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/
- 머신러닝 용어집 : https://developers.google.com/machine-learning/glossary/?hl=ko


아리스토텔레스의 "시작이 반이다" 라는 명언이 있다. 그런데 영어 원문은 "Well begun, is half done." 이다. 한국어로 번역되면서 Well의 의미가 삭제된 것 같다. 제대로 해석하면 "좋은"시작이 반을 차지한다는 것이지 무작정 시작만 하면 된다는 의미는 아니다.

머신러닝, 딥러닝 공부 역시 마찬가지인 것 같다. 제대로 된 강의와 가이드로 공부를 시작해야 한다. 그리고 그 첫 시작은 앤드류 응 교수의 coursera 강의라고 생각한다. 아직도 머신러닝 공부를 망설이고 있다면, 일단 코세라에 접속해서 무료 강의부터 들어보자.
Posted by uniqueone
,
Learn:
1. linear algebra well (e.g. matrix math)
2. calculus to an ok level (not advanced stuff)
3. prob. theory and stats to a good level
4. theoretical computer science basics
5. to code well in Python and ok in C++

Then read and implement ML papers and *play* with stuff!  :-)

H / T : Shane Legg
Posted by uniqueone
,
http://sgsa.berkeley.edu/current-students/recommended-books

 

Applied Statistics

Categorical Data

  • ''Categorical Data Analysis'' by Alan Agresti
    • Well-written, go-to reference for all things involving categorical data.

Linear models

  • ''Generalized Linear Models'' by McCullagh and Nelder
    • Theoretical take on GLMs. Does not have a lot of concrete data examples.
  • ''Statistical Models'' by David A. Freedman
    • ...Berkeley classic!
  • ''Linear Models with R'' by Julian Faraway
    • Undergraduate-level textbook, has been used previously as a textbook for Stat 151A. Appropriate for beginners to R who would like to learn how to use linear models in practice. Does not cover GLMs. 

Experimental Design

  • ''Design of Comparative Experiments'' by RA Bailey
    • Classic, approachable text, free for download here

Machine Learning

  • ''The Elements of Statistical Learning'' by Hastie, Tibshirani, and Friedman.
    • Comprehensive but superficial coverage of all modern machine learning techniques for handling data. Introduces PCA, EM algorithm, k-means/hierarchical clustering, boosting, classification and regression trees, random forest, neural networks, etc. ...the list goes on. Download the book here.
  • ''Computer Age Statistical Inference: Algorithms, Evidence, and Data Science'' by Hastie and Efron.
  • ''Pattern Recognition and Machine Learning'' by Bishop.
  • ''Bayesian Reasoning and Machine Learning'' by Barber. Available online.
  • ''Probabilistic Graphical Models'' by Koller and Friedman.

Theoretical Statistics

  • ''Theoretical Statistics: Topics for a Core Course'' by Keener
    • The primary text for Stat 210A. Download from SpringerLink.
  • ''Theory of Point Estimation'' by Lehmann and Casella
    • A good reference for Stat 210A.
  • ''Testing Statistical Hypotheses'' by Lehmann and Romano
    • A good reference for Stat 210A.
  • ''Empirical Processes in M-Estimation'' by van de Geer
    • Some students find this helpful to supplement the material in 210B.

Probability

Undergraduate Level Probability

  • ''Probability'' by Pitman
    • What the majority of Berkeley undergraduates use to learn probability.
  • ''Introduction to Probability Theory'' by Hoel, Port and Stone
    • This text is more mathematically inclined than Pitman's, and more concise, but not as good at teaching probabilistic thinking.
  • ''Probability and Computing'' by Upfal and Mitzenmacher
    • What students in EECS use to learn about randomized algorithms and applied probability.

Measure Theoretic Probability

  • ''Probability: Theory and Examples'' by Durrett
    • This is the standard text for learning measure theoretic probability. Its style of presentation can be confusing at times, but the aim is to present the material in a manner that emphasizes understanding rather than mathematical clarity. It has become the standard text in Stat 205A and Stat 205B for good reason. Online here.
  • ''Foundations of Modern Probability'' by Olav Kallenberg
    • This epic tome is the ultimate research level reference for fundamental probability. It starts from scratch, building up the appropriate measure theory and then going through all the material found in 205A and 205B before powering on through to stochastic calculus and a variety of other specialized topics. The author put much effort into making every proof as concise as possible, and thus the reader must put in a similar amount of effort to understand the proofs. This might sound daunting, but the rewards are great. This book has sometimes been used as the text for 205A.
  • ''Probability and Measure'' by Billingsley
    • This text is often a useful supplement for students taking 205 who have not previously done measure theory. Download here.
  • ''Probability with Martingales'' by David Williams
    • This delightful and entertaining book is the fastest way to learn measure theoretic probability, but far from the most thorough. A great way to learn the essentials.

Stochastic Calculus

Stochastic Calculus is an advanced topic that interested students can learn by themselves or in a reading group. There are three classic texts:

  • ''Continuous Martingales and Brownian Motion'' by Revuz and Yor
  • ''Diffusions, Markov Processes and Martingales (Volumes 1 and 2)'' by Rogers and Williams
  • ''Brownian Motion and Stochastic Calculus'' by Karatzas and Shreve

Random Walk and Markov Chains

These are indispensable tools of probability. Some nice references are

  • ''Markov Chain and Mixing Times'' by Levin, Peres and Wilmer. Online here.
  • ''Markov Chains'' by Norris
    • Starting with elementary examples, this book gives very good hints on how to think about Markov Chains.
  • ''Continuous time Markov Processes'' by Liggett
    • A theoretical perspective on this important topic in stochastic processes. The text uses Brownian motion as the motivating example.

Mathematics 

Convex Optimization

  • ''Convex Optimization'' by Boyd and Vandenberghe.  : You can download the book here
  • ''Introductory Lectures on Convex Optimization'' by Nesterov.

Linear Algebra

  • ''The Matrix Cookbook'' by Petersen and Pedersen: ''Matrix identities, relations and approximations. A desktop reference for quick overview of mathematics of matrices.'' Download here.
  • ''Matrix Analysis'' and ''Topics in Matrix Analysis'' by Horn and Johnson
    • Second book is more advanced than the first. Everything you need to know about matrix analysis.

Convex Analysis

  • ''A course in Convexity'' by Barvinok. 
    • A great book for self study and reference. It starts with the basis of convex analysis, then moves on to duality, Krein-Millman theorem, duality, concentration of measure, ellipsoid method and ends with Minkowski bodies, lattices and integer programming. Fairly theoretical and has many fun exercises. 

Measure Theory

  • Real Analysis and Probability - Dudley
    • Very comprehensive. 
  • Probability and Measure Theory - Ash
    • Nice and easy to digest. Good as companion for 205A

Combinatorics

  • ''Enumerative Combinatorics Vol I and II'' - Richard Stanley.
    • There's also a course on combinatorics this semester in the math department called Math249: Algebraic Combinatorics. Despite the scary "algebraic" prefix it's really fun. Download here.

Computational Biology

  • ''Statistical Methods in Bioinformatics'' by Ewens and Grant
    • Great overview of sequencing technology for the unacquainted.
  • ''Computational Genome Analysis: An Introduction'' by Deonier, Tavare, and Waterman
    • Great R code examples from computational biology. Discusses the basics, such as the greedy algorithm, etc.

Population Genetics

  • ''Probability Models for DNA Sequence Evolution'' by Durrett
  • ''Mathematical Population Genetics'' by Ewens

Computer Science

Numerical Analysis

  • Numerical Analysis by Burden and Faires
    • This book is a good overview of numerical computation methods for everything you'd need to know about implementing most computational methods you'll run into in statistics. It is filled with pseudo-code  but does use Maple as it's exemplary language sometimes. It has been a great resource for the Computational Statistics courses (243/244). Depending on what happens with this course, this may be a good place to look when you're lost in computation.

Algorithms

  • ''Introduction to Algorithms'', Third Edition, by Cormen, Leiserson, Rivest, and Stein.
  • ''Algorithm Design'', by Jon Kleinberg and Éva Tardos.

 

Posted by uniqueone
,

I. 2014 2학기 기초통계학II 숙명여대 여인권
http://www.kocw.net/home/cview.do?cid=762d14861eb306ac&ar=link_nvrc
2013 2학기 기초통계학II 숙명여대 여인권
http://www.kocw.net/home/cview.do?cid=5b001db40374469f&ar=link_nvrc
네이버 '통계학 인터넷강의'로 검색하면 온라인 공개 강좌 섹션에 강의 목록이 나온다. 조회수 순으로 정렬하면 좋은 강의들이 나온다 

 

ㅣ. 확률 및 통계, 2014년 1학기 : HanyangUniversity : 이상화, 2014/04/21 ... 동영상 21개
ㅣ. 확률통계론, 2014년 2학기 : HanyangUniversity : 박재우, 2015/06/08 ... 동영상 18개
2014-2 확률통계론: http://www.youtube.com/watch?v=4JsAKaTEQMs&list=PLSN_PltQeOyjGOCnBz402iwXeki2wVXMJ
http://www.aistudy.com/math/probability.htm

ㅣ. 확률및통계
한양대 이상화교수
ㅣ. 확률통계론
한양대학교 공과대학 건설환경공학고 박재우교수의 확률통계론 강의. 총 18강 1시간씩
ㅣ. 기초통계학
제대로 시작하는 기초통계학(한빛아카데미)의 저자 노경섭의 강의
ㅣ. 통계학개론
방송통신대학교 통계학개론강의. 총 15강 약 50분씩
http://blplab.iptime.org:4321/course-cat/statistics/

ㅣ.  모두를 위한 프로그래밍 입문과 머신러닝 교수님께서 수강 전 이수를 권장하는 추천 책 목록입니다 :)
-세상에서 가장 쉬운 통계학(고지마 히로유키, 2009)
-세상에서 가장 쉬운 베이즈통계학입문(고지마 히로유키, 2017)
-확률과통계(한양대학교 이상화 교수, 2014)
-Reading Materials: Data Science from the Scratch - Ch.5, Ch.6, Ch.7
https://www.wadiz.kr/web/campaign/detail/qa/13991
Posted by uniqueone
,
https://m.facebook.com/groups/255834461424286?view=permalink&id=556808447993551

안녕하세요. 유령회원이 오랜만에 글을 적습니다. ;;;
올해도 다 지나갔네요....ㅠㅠ

Pattern Recognition and Machine Learning 책 2장 식(2.117)까지 정리한 노트입니다.

딥러닝 공부하면서 통계학 지식이 너무 없어서 혼자서 책보고 정리한 자료인데 원서라서 저같이 영알못 수알못에다 기억력까지 좋지 못한 경우는 다시보면 정말 첨보는 듯 한 느낌....다시 첨부터 읽어야하는듯한 자괴감을 막기위해 조금씩 주피터로 정리를 했습니다.

처음 관련 내용보았을 때 식이 너무 복잡해서 와 이거 뭐지..먼소리하는지 모르겠는데 싶었는데 어찌어찌 읽기는했네요.

식2.117까지는 꼭알아야겠다 싶어서 정리를 했는데 혹시나 보시고 계신분들 계시면 도움이 되면 좋겠습니다. 좀 어이없을 정도로 식을 풀어적어서 읽기 짜증나실 수 도 있습니다. 그냥 참고삼아..... ㅠㅠ

혹시 오류있으면 지적해주세요. 감사합니다.

http://nbviewer.jupyter.org/github/metamath1/ml-simple-works/blob/master/PRML/prml-chap2.ipynb
Posted by uniqueone
,
https://m.facebook.com/groups/255834461424286?view=permalink&id=513220265685703

안녕하세요, 앤드류 교수의 코세라 강의를 정주행 후 영상 슬라이드를 정리하여 보았습니다.

지난주에 이어 3주차 슬라이드를 공유 드립니다.
많은분들께 도움이 되었으면 합니다.

감사합니다. :)

3주차 : http://www.kwangsiklee.com/ko/2017/07/corsera-machine-learning%EC%9C%BC%EB%A1%9C-%EA%B8%B0%EA%B3%84%ED%95%99%EC%8A%B5-%EB%B0%B0%EC%9A%B0%EA%B8%B0-week3/
2주차 : http://www.kwangsiklee.com/ko/2017/07/corsera-machine-learning%EC%9C%BC%EB%A1%9C-%EA%B8%B0%EA%B3%84%ED%95%99%EC%8A%B5-%EB%B0%B0%EC%9A%B0%EA%B8%B0-week2/
1주차 : http://www.kwangsiklee.com/ko/2017/07/corsera-machine-learning-week1-%EC%A0%95%EB%A6%AC/
Posted by uniqueone
,

https://youtu.be/OB1reY6IX-o

 

Tutorial materials found here: https://github.com/amueller/scipy-2016-sklearn

See the complete SciPy 2016 Conference talk & tutorial playlist here: https://www.youtube.com/playlist?list=PLYx7XA2nY5Gf37zYZMw6OqGFRPjB1jCy6

Posted by uniqueone
,

Time Series Forecasting with Python 7-Day Mini-Course - Machine Learning Mastery
http://machinelearningmastery.com/time-series-forecasting-python-mini-course/
Posted by uniqueone
,

https://www.facebook.com/groups/DeepNetGroup/permalink/385843868475168/ 

 

News Flash: Check out Issue #5 (http://aidl.io/) of AIDL Weekly!  We have a Special Issue on Self-driving Car this week. 
Also woohoo! Check out Episode 4 of AIDL Office Hours (https://www.youtube.com/watch?v=Qqtc9_05yLc)! We have Han Shu from airbnb talks about his very interesting experience in data science and speech recognition.

Welcome! Welcome! We are the most active FB group for Artificial Intelligence/Deep Learning, or AIDL.  Many of our members are knowledgeable so feel free to ask questions. 

We have a tied-in newsletter: https://aidlweekly.curated.co/  and

a YouTube-channel, with (kinda) weekly show "AIDL Office Hour",
https://www.youtube.com/channel/UC3YM5TEbSqIpFGH85d6gjKg

Posting is strict at AIDL, your post has to be relevant, accurate and non-commerical (FAQ Q12).   Commercial posts are only allowed on Saturday.  If you don't follow this rule, you might be banned.

FAQ:

Q1: How do I start AI/ML/DL?
A:
Step 1: Learn some Math and Programming,
Step 2: Take some beginner classes. e.g. Try out Ng's Machine Learning.
Step 3: Find some problem to play with. Kaggle has tons of such tasks.
Iterate the above 3 steps until you become bored. From time to time you can share what you learn.

Here's a post which one of us (Arthur) wrote up.  It summarizes his experience in learning machine learning and you might find it useful. (http://thegrandjanitor.com/2016/05/21/learning-machine-learning-some-personal-experience/)

Q2: What is your recommended first class for ML?
A: Ng's Coursera, the CalTech edX class, the UW Coursera class is also pretty good.

Q3: What are your recommended classes for DL?
A: Go through at least 1 or 2 ML class, then go for Hinton's, Karparthay's, Socher's, LaRochelle's and de Freitas. For deep reinforcement learning, go with Silver's and Schulmann's lectures. Also see Q4.

Q4: How do you compare different resources on machine learning/deep learning?
A: (Shameless self-promoting plug by Arthur) Here is an article, "Learning Deep Learning - Top-5 Resources" written by me (Arthur) on different resources and their prerequisites. I refer to it couple of times at AIDL, and you might find it useful: http://thegrandjanitor.com/2016/08/15/learning-deep-learning-my-top-five-resource/ .

Other than my own list, here are some very good lists I recommend you to read through as well,
* YerevaNN Lab's "A Guide to Deep Learning" (http://yerevann.com/a-guide-to-deep-learning/)
* Reddit's machine learning FAQ has another list of great resources as well.

Q5: How do I use machine learning technique X with language L?
A: Google is your friend. You might also see a lot of us referring you to Google from time to time. That's because your question is best to be solved by Google.

Q6: Explain concept Y. List 3 properties of concept Y.
A: Google. Also we don't do your homework. If you couldn't Google the term though, it's fair to ask questions.

Q7: What is the most recommended resources on deep learning on computer vision?
A: cs231n. 2016 is the one I will recommend. Most other resources you will find are derivative in nature or have glaring problems.

Q8: What is the prerequisites of Machine Learning/Deep Learning?
A: Mostly Linear Algebra and Calculus I-III. In Linear Algebra, you should be good at eigenvectors and matrix operation. In Calculus, you should be quite comfortable with differentiation. You might also want to have a primer on matrix differentiation before you start because it's a topic which is seldom touched in an undergraduate curriculum.
Some people will also argue Topology as important and having a Physics and Biology background could help. But they are not crucial to start.

Q9: What are the cool research papers to read in Deep Learning?
A: We think songrotek's list is pretty good: https://github.com/songrotek/Deep-Learning-Papers-Reading-Roadmap. Another classic is deeplearning.net's reading list: http://deeplearning.net/reading-list/.

Q10: What is the best/most recommended language in Deep Learning/AI?
A: Python is usually cited as a good language because it has the best support of libraries. Most ML libraries from python links with C/C++. So you get the best of both flexibility and speed.
Other also cites Java (deeplearning4j), Lua (Torch), Lisp, Golang, R. It really depends on your purpose. Practical concerns such as code integration, your familiarity with a language usually dictates your choice. R deserves special mention because it was widely used in some brother fields such as data science and it is gaining popularity.

Q11: I am bad at Math/Programming. Can I still learn A.I/D.L?
A: Mostly you can tag along, but at a certain point, if you don't understand the underlying Math, you won't be able to understand what you are doing. Same for programming, if you never implement one, or trace one yourself, you will never truly understand why an algorithm behave a certain way.
So what if you feel you are bad at Math? Don't beat yourself too much. Take Barbara Oakley's class on "Learning How to Learn", you will learn more about tough subjects such as Mathematics, Physics and Programming.

Q12: Would you explain more about AIDL's posting requirement?
A: This is a frustrating topic for many posters, albeit their good intention. I suggest you read through this blog post http://thegrandjanitor.com/2017/01/26/posting-on-aidl/ before you start any posting.

Q13: What is the list of common public database?
A: Take a look of the collection from our members: https://www.facebook.com/groups/DeepNetGroup/permalink/394240667635488/
Posted by uniqueone
,

Facebook, Data Mining / Machine Learning / AI

I'm a programmer and I'm serious about learning Machine Learning/Neural Networks. Issue is that I have no idea where to start. Anyone can suggest maybe a certain course or generally a place to start from?

----------------------------------------------------

1. Andrew NG's Machine Learning course in Coursera

https://www.coursera.org/learn/machine-learning

2. Essence of linear algebra preview

https://www.youtube.com/watch?v=kjBOesZCoqc&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab

3. 7 Steps to Mastering Machine Learning With Python

http://www.kdnuggets.com/2015/11/seven-steps-machine-learning-python.html/2

4. youtube sirajology

https://www.youtube.com/channel/UCWN3xxRkmTPmbKwht9FuE5A

5. Michael Nielsen

http://neuralnetworksanddeeplearning.com/chap1.html

6. https://bigdatauniversity.com/

 

7. udacity Deep Learning Fundations

 

8. Udemy Course by Kirill and hadelen if you are extremely new to data science.

superdatascience.com

9. Geoffrey Hinton's neural net course on coursera

 

10. welch labs Neural Networks Demystified

https://www.youtube.com/watch?v=bxe2T-V8XRs&list=PLiaHhY2iBX9hdHaRr6b7XevZtgZRa1PoU

11. 초보자를 위한 조언 및 사이트 소개

https://vmanju.com/how-to-get-started-with-artificial-intelligence-90f14b2bc321#.k4vef6fd3

caffe를 어떻게 설치하고 어떻게 동작하는지 설명돼 있음.

https://github.com/humphd/have-fun-with-machine-learning/blob/master/README.md

 

12. An Introduction to Statistical Learning with Applications in R

http://www-bcf.usc.edu/~gareth/ISL/

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Posted by uniqueone
,
https://www.mathworks.com/campaigns/products/offer/machine-learning-with-matlab-conf.html?elqsid=1483072843376&potential_use=Commercial

 

 

Learn more about machine learning:

Posted by uniqueone
,

machine learning course by Professor Andrew Ng

 

Lecture notes and exercise code

https://github.com/ShokuninSan/coursera-ml-class

 

exercise code

https://github.com/yhyap/machine-learning-coursera

 

Lecture notes

http://www.holehouse.org/mlclass/

 

 

 

Posted by uniqueone
,

https://www.youtube.com/playlist?list=PLLH73N9cB21V_O2JqILVX557BST2cqJw4

Posted by uniqueone
,
https://nathanielng.github.io/machine-learning/perceptron/perceptron.html

 

 


Machine Learning Part I

1. Introduction

1.1 What is Machine Learning?

Machine learning was described by Arthur Samuel as the "field of study that gives computers the ability to learn without being explicitly programmed".

The following three principles are central to Machine Learning:

  1. a pattern exists: machine learning will not work on data that is completely random.
  2. the pattern cannot be "pinned down" mathematically: if a pattern can be pinned down mathematically (for example Newton's equations), it is probably more appropriate to use the original equations
  3. we have data on that pattern: machine learning algorithms require relevant data, in order to search for patterns in that data.

The last point raises the following question - how much data is needed in order to learn?

1.2 The Feasibility of Learning - how much data do we need in order to learn?

Two related concepts can give us insight as to why we are able to learn from data, and how does having more data help us to learn--these are Hoeffding's Inequality and the law of large numbers.

1.2.1 Hoeffding's Inequality

Consider an infinite bin, from which we take a sample of size N, which we find to have a sample frequency, ν. However, the bin frequency, μ, is unknown. However, Hoeffding's Inequality allows us to calculate a probability bound between these two quantities, i.e.:

P[|νμ|>ϵ]2e2ϵ2N

This inequality applies to a single bin, and is valid for all N and ϵ and the bound does not depend on μ.

It illustrates the tradeoff between N, ϵ, and the bound, i.e. the larger the sample size, N, the smaller our probability bound. On the other hand, the smaller the tolerance, ϵ, the harder it will be to keep the probability small.

1.2.2 The Law of Large Numbers

A related concept is the weak law of large numbers, given by:

limmP[|EXP[X]1mi=1mxi|>ϵ]=0

As the sample size, m, approaches infinity, the probability approaches zero.

Further Reading:

  1. Machine Learning Theory - Part 1: Introduction | Mostafa Samir
  2. Machine Learning Theory - Part 2: Generalization Bounds | Mostafa Samir

1.3 Libraries used in this exercise

The following Python libraries are used in this exercise

In [1]:
from __future__ import print_function, division
import numpy as np
import matplotlib.pyplot as plt
import itertools
np.set_printoptions(edgeitems=3,infstr='inf',linewidth=75,nanstr='nan',precision=4,suppress=False,threshold=1000,formatter=None)
%matplotlib inline

2. Machine Learning Example

2.1 Learning Boolean Target Function

In this example, we intend to "learn" a Boolean target function. The function takes in input vector of size 3, say [0, 1, 0] or [0, 1, 1] and outputs a single output, yn, which can be zero or 1. If we enumerate the set of all possible target functions, we would have 223=256 distinct Boolean functions on 3 Boolean inputs.

In this example, we wish to score 4 different hypotheses, ga,gb,gc,gd based on how well they perform on "out of sample points". The scoring system is 1 point per correct point, i.e. if the function gets 3 "out of sample points" correct, it will get a score of 3. We wish to calcuate the total score when enumerating over the entire set of all possible functions.

In the code below, the 256 distinct Boolean functions are enumerated using itertools.product([0,1],repeat=8) and a score is calculated for each of 4 possible cases. It is determined that the score for each of these cases is identical.

In [2]:
def calculate_score_last3(y,g):
    total_score=0
    for y_last in y[:,5:]:
        total_score += np.sum(y_last==g)
    return total_score
In [3]:
g_a = np.vstack(np.array([1,1,1,1,1]))
g_b = np.vstack(np.array([0,0,0,0,0]))
g_c = np.vstack(np.array([0,1,1,0,1]))
g_d = np.logical_not(g_a).astype(int)
y = np.array(list(itertools.product([0,1],repeat=8)))

print("Scores:\n(a) {}\n(b) {}\n(c) {}\n(d) {}".format(
    calculate_score_last3(y,[1,1,1]),
    calculate_score_last3(y,[0,0,0]),
    calculate_score_last3(y,[0,0,1]),
    calculate_score_last3(y,[1,1,0])))
Scores:
(a) 384
(b) 384
(c) 384
(d) 384

3. The Perceptron

3.1 Introduction & Background

The Perceptron is attributed to the work of Frank Rosenblatt in 1957. It is a binary classifier that works on the basis of whether a dot product, wx, exceeds a certain threshold:

f(x)={1,if wx>00,otherwise

This can be written in the form of a hypothesis given by:

h(x)=sign(wTx)=sign(i=1Nwixi)

3.2 Generating Sample Data

To generate the sample data for the perceptron classifier above, a three-column random matrix with the first column as ones is created as follows:

In [4]:
def generate_data(n,seed=None):
    if seed is not None:
        np.random.seed(seed)
    x0 = np.ones(n)
    x1 = np.random.uniform(low=-1,high=1,size=(2,n))
    return np.vstack((x0,x1)).T

3.3 Creating a random line inside the region of interest

The region of interest is X=[1,1]×[1,1], where × denotes the Cartesian Product.

A random line is created, and to ensure that it falls within the region of interest, it is created from two random points, (x0,y0) and (x1,y1) which are generated within X. The equation for this line in slope-intercept form and in the hypothesis / weights form are derived as follows.

3.3.1 Equation of the random line in the slope-intercept form, y=mx+c

Slope, m=y1y0x1x0
Intercept, c=y0mx0

3.3.2 Equation of the random line in hypothesis [weights] form

The hypothesis, in the two dimensional case (N=2), and since x0=1:

h(x)=sign(w0+w1x1+w2x2)

The decision boundary, h(x)=0 is given by:

w0+w1x1+w2x2=0

This can be converted to the slope-intercept form if we take x1 to be the x axis, and x2 to be the y axis, i.e.:

w0+w1x+w2y=0
y=w1w2xw0w2

Comparison with the equation y=mx+c yields the following relationships:

m=w1w2
c=w0w2

If we arbitrarily set w2=1, we arrive at the following set of weights, which is consistent with the decision boundary denoted by y=mx+c:

w=(c,m,1)
In [5]:
def get_random_line(seed=None):
    X = generate_data(2,seed=seed)
    x = X[:,1]
    y = X[:,2]
    m = (y[1]-y[0])/(x[1]-x[0])
    c = y[0] - m*x[0]
    return np.array([-c,-m,1])

def draw_line(ax,w,marker='g--',label=None):
    m = -w[1]/w[2]
    c = -w[0]/w[2]
    x = np.linspace(-1,1,20)
    y = m*x + c
    if label is None:
        ax.plot(x,y,marker)
    else:
        ax.plot(x,y,marker,label=label)
    
def get_hypothesis(X,w):
    h=np.dot(X,w)
    return np.sign(h).astype(int)

3.4 Generating a Training Dataset

The following code generates a training dataset (separated into positive and negative classes according to a random line in X) and plots it.

In [6]:
def plot_data(fig,plot_id,X,y=None,w_arr=None,my_x=None,title=None):
    ax = fig.add_subplot(plot_id)
    if y is None:
        ax.plot(X[:,1],X[:,2],'gx')
    else:
        ax.plot(X[y > 0,1],X[y > 0,2],'b+',label='Positive (+)')
        ax.plot(X[y < 0,1],X[y < 0,2],'ro',label='Negative (-)')
    ax.set_xlim(-1,1)
    ax.set_ylim(-1,1)
    ax.grid(True)
    if w_arr is not None:
        if isinstance(w_arr,list) is not True:
            w_arr=[w_arr]
        for i,w in enumerate(w_arr):
            if i==0:
                draw_line(ax,w,'g-',label='Theoretical')
            else:
                draw_line(ax,w,'g--')
    if my_x is not None:
        ax.plot([my_x[0]],[my_x[1]],'kx',markersize=10)
    if title is not None:
        ax.set_title(title)
    ax.legend(loc='best',frameon=True)
In [7]:
def create_dataset(N,make_plot=True,seed=None):
    X = generate_data(N,seed=seed)
    w_theoretical = get_random_line()
    y = get_hypothesis(X,w_theoretical)
    if make_plot is True:
        fig = plt.figure(figsize=(7,5))
        plot_data(fig,111,X,y,w_theoretical,title="Initial Dataset")
    return X,y,w_theoretical

3.5 The Perceptron Learning Algorithm

The Preceptron Learning Algorithm (PLA) is implemented in the following steps

  1. Calculate h(x)=sign(wTx) which can take on values of -1, 0, or 1 for each sample.
  2. Compare h(x) with y to find misclassified point(s) if any.
  3. Pick one misclassified point at random.
  4. Iterate the weights according to the PLA: w=w+ynxn, where yn is the correct classification for the misclassified point, and xn is the misclassified point.

The code below also keeps track of the weights, misclassification error, and misclassified point at each iteration.

In [8]:
def PLA(w,X,y0,n_iterations=10,verbose=True):
    assert len(y0)==X.shape[0]
    n=len(y0)
    x_arr = list()
    w_arr = list()
    m_arr = list()
    for i in range(n_iterations):
        h   = get_hypothesis(X,w)
        bad = h != y0
        bad = np.argwhere(bad).flatten()
        if len(bad) > 0:
            idx  = np.random.choice(bad,1)[0]
            my_x = X[idx,:]
            m_arr.append(100.0*len(bad)/n)
            x_arr.append(my_x)
            w_arr.append(np.copy(w))
            w += np.dot(y0[idx],my_x)
            if verbose is True:
                print("iter {}: {}% misclassified, w={}" \
                      .format(i,m_arr[-1],w_arr[-1]))
        else:
            m_arr.append(0.0)
            x_arr.append(np.array([1.0, np.nan,np.nan]))
            w_arr.append(np.copy(w))
            if verbose is True:
                print("iter {}: zero misclassified (PLA has converged)".format(i))
            return w,w_arr,m_arr,x_arr
    print("PLA failed to converge after {} iterations".format(i))
    return None,None,None,None

3.6 Implementing the PLA

Here, we generate a sample dataset of 10 points and plot it. The perceptron learning algorithm, starting from an initial weight of (0,0,0), converges in less than 10 iterations.

In [9]:
X,y,w_theoretical = create_dataset(N=10,make_plot=True,seed=247)
w0 = np.array([0,0,0],dtype=float)
In [10]:
w,w_arr,m_arr,x_arr = PLA(w0,X,y,n_iterations=100,verbose=True)
iter 0: 100.0% misclassified, w=[ 0.  0.  0.]
iter 1: 40.0% misclassified, w=[ 1.     -0.2676 -0.3799]
iter 2: 10.0% misclassified, w=[ 0.     -1.0827  0.2181]
iter 3: 20.0% misclassified, w=[ 1.     -0.83    0.9908]
iter 4: 20.0% misclassified, w=[ 0.     -1.4921  1.3024]
iter 5: zero misclassified (PLA has converged)
In [11]:
def draw_plot_steps(fig,plot_id,X,y,w_theoretical,w_arr,x_arr,idx_arr):
    assert len(idx_arr) <= 9
    for idx in idx_arr:
        print("w_arr[{}] = {}, x_arr[{}] = {}".format(idx,w_arr[idx],idx,x_arr[idx][1:]))
        plot_data(fig,plot_id,X,y,[w_theoretical] + [w_arr[idx]],
                  x_arr[idx][1:],title="iteration {}".format(idx))
        plot_id += 1

fig = plt.figure(figsize=(10,15))
draw_plot_steps(fig,421,X,y,w_theoretical,w_arr,x_arr,np.arange(len(w_arr)-1)+1)
w_arr[1] = [ 1.     -0.2676 -0.3799], x_arr[1] = [ 0.8151 -0.598 ]
w_arr[2] = [ 0.     -1.0827  0.2181], x_arr[2] = [ 0.2527  0.7728]
w_arr[3] = [ 1.     -0.83    0.9908], x_arr[3] = [ 0.6621 -0.3116]
w_arr[4] = [ 0.     -1.4921  1.3024], x_arr[4] = [-0.2676 -0.3799]
w_arr[5] = [ 1.     -1.7598  0.9225], x_arr[5] = [ nan  nan]
In [12]:
def plot_convergence(m_arr):
    fig = plt.figure(figsize=(7,5))
    ax = fig.add_subplot(111)
    ax.plot(m_arr,'g+-',markersize=10)
    ax.grid(True)
    ax.set_title("Convergence")
    ax.set_xlabel("Iterations")
    ax.set_ylabel("Misclassification Error (%)")
plot_convergence(m_arr)

3.7 Number of iterations required for convergence

The following code calculates the number of iterations required for convergence and plots its distribution.

In [13]:
def plot_histogram(my_count,bins=200,x_max=80):
    fig = plt.figure(figsize=(7,5))
    ax = fig.add_subplot(111)
    ax.hist(my_count,bins=bins);
    ax.set_xlim(0,x_max)
    ax.grid(True)
    
def get_iteration_distribution(N,n_trials=1000,max_iterations=10000,summary=True):
    n_iterations=np.zeros(n_trials,dtype=int)
    w0 = np.array([0,0,0],dtype=float)
    for i in range(n_trials):
        X,y,w_theoretical = create_dataset(N=N,make_plot=False,seed=None)
        w,w_arr,m_arr,x_arr = PLA(w0,X,y,n_iterations=max_iterations,verbose=False)
        n_iterations[i]=len(w_arr)
        
    if summary is True:
        print("Minumum iterations: {}".format(np.min(n_iterations)))
        print("Maximum iterations: {}".format(np.max(n_iterations)))
        print("Mean    iterations: {}".format(np.mean(n_iterations)))
    return n_iterations
In [14]:
n_iterations = get_iteration_distribution(N=10,n_trials=1000)
plot_histogram(n_iterations,bins=200,x_max=50)
Minumum iterations: 1
Maximum iterations: 240
Mean    iterations: 9.006

3.8 Calculate the misclassification error for the converged weights

If we know the theoretical decision boundary, w_theoretical, that 'knows' the correct classification of points, we can calculate the number of points misclassified by w (the final weights after convergence using the PLA) via random sampling. The misclassification error is slightly less than 20%.

In [15]:
def calculate_misclassification(w_theoretical,w,n_samples=1000,verbose=True):
    X  = generate_data(n_samples,seed=None)
    y0 = get_hypothesis(X,w_theoretical)
    y  = get_hypothesis(X,w)
    n_correct = np.sum(y == y0)
    if verbose is True:
        if w_theoretical[0] != 0.0:
            print("Theoretical Weights  : {}".format(w_theoretical/w_theoretical[0]))
        else:
            print("Theoretical Weights  : {}".format(w_theoretical))
        print("PLA Predicted Weights: {}".format(w))
        print("Correct points   = {}".format(n_correct))
        print("Incorrect points = {}".format(n_samples-n_correct))
        print("Misclassification= {}%".format(np.round(100 * (n_samples-n_correct)/n_samples, 4)))
        fig = plt.figure(figsize=(7,5))
        plot_data(fig,111,X,y0,[w_theoretical,w])
    return (n_samples-n_correct)/n_samples
In [16]:
misclassification = calculate_misclassification(w_theoretical,w,verbose=True)
misclassification
Theoretical Weights  : [ 1.     -2.8719  4.1079]
PLA Predicted Weights: [ 1.     -1.7598  0.9225]
Correct points   = 825
Incorrect points = 175
Misclassification= 17.5%
Out[16]:
0.17499999999999999
In [17]:
def plot_misclassification(my_count,bins=20,x_max=0.25):
    fig = plt.figure(figsize=(7,5))
    ax = fig.add_subplot(111)
    ax.hist(my_count,bins=bins);
    ax.set_xlim(0,x_max)
    ax.grid(True)
In [18]:
def get_misclassification_distribution(N=10,n_trials=1000,max_iterations=10000,summary=True):
    w0 = np.array([0,0,0],dtype=float)
    misclassification=np.zeros(n_trials)
    for i in range(n_trials):
        X,y,w_theoretical = create_dataset(N,make_plot=False,seed=None)
        w,w_arr,m_arr,x_arr = PLA(w0,X,y,n_iterations=max_iterations,verbose=False)
        misclassification[i]=calculate_misclassification(w_theoretical,w,n_samples=1000,verbose=False)
        
    if summary is True:
        print("Minumum misclassification: {}".format(np.min(misclassification)))
        print("Maximum misclassification: {}".format(np.max(misclassification)))
        print("Mean    misclassification: {}".format(np.mean(misclassification)))
    return misclassification

3.9 Iteration distribution and misclassification distribution for N=10

Here, we find that for N=10, an average of about 10 iterations is required for convergence. The average misclassification error is about 10%.

In [19]:
n_iterations = get_iteration_distribution(N=10,n_trials=1000)
plot_histogram(n_iterations,bins=100,x_max=40)
Minumum iterations: 1
Maximum iterations: 265
Mean    iterations: 9.471
In [20]:
misclassification = get_misclassification_distribution(N=10)
plot_misclassification(misclassification,bins=20,x_max=0.4)
Minumum misclassification: 0.002
Maximum misclassification: 0.474
Mean    misclassification: 0.108496

3.10 Iteration distribution and misclassification distribution for N=100

For N=100, an average of about 80~100 iterations is required for convergence. The average misclassification error is slightly over ~1%.

In [21]:
n_iterations = get_iteration_distribution(N=100,n_trials=1000)
plot_histogram(n_iterations,bins=300,x_max=300)
Minumum iterations: 1
Maximum iterations: 6713
Mean    iterations: 81.904
In [22]:
misclassification = get_misclassification_distribution(N=100)
plot_misclassification(misclassification,bins=40,x_max=0.05)
Minumum misclassification: 0.0
Maximum misclassification: 0.075
Mean    misclassification: 0.014019000000000002

3.11 Convergence Plot for N=100

In [23]:
X,y,w_theoretical = create_dataset(N=100,make_plot=True,seed=12345)
w0 = np.array([0,0,0],dtype=float)
w,w_arr,m_arr,x_arr = PLA(w0,X,y,n_iterations=1000,verbose=False)
plot_convergence(m_arr)
In [24]:
fig = plt.figure(figsize=(8,7))
plot_data(fig,111,X,y,[w_theoretical] + w_arr)
//anaconda/lib/python3.5/site-packages/ipykernel/__main__.py:10: RuntimeWarning: invalid value encountered in double_scalars
//anaconda/lib/python3.5/site-packages/ipykernel/__main__.py:11: RuntimeWarning: invalid value encountered in double_scalars
In [ ]:
 
Posted by uniqueone
,