101 posts filed under 'Machine Learning'

  1. 2018.02.26 Recommended Books - UC Berkeley Statistics Graduate Student Association
  2. 2017.12.31 A Guide to Artificial Intelligence in Healthcare: a comprehensive e-book of how artificial intelligence can be used for improving medicine and healthcare
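  3. 2017.12.16 Recommended video lectures on statistics
  4. 2017.11.08 Notes summarizing Chapter 2 of Pattern Recognition and Machine Learning, up to Eq. (2.117)
  5. 2017.10.13 A collection of Korean-language machine learning materials (lecture notes + videos)
  6. 2017.10.12 We will discuss some of the mathematics of posterior probability, known as Bayes' theorem
  7. 2017.09.28 Machine learning glossary https://developers.google.com/machine-learning/glossary/
  8. 2017.09.26 30 essential cheat sheets for data science, machine learning, and deep learning
  9. 2017.08.03 After working through Professor Andrew's Coursera lectures, the video slides were compiled
  10. 2017.07.04 Datasets for data science and deep learning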
  11. 2017.06.20 Beginning Machine Learning – A few Resources [Subjective]
  12. 2017.04.16 PDF or E-book which is helpful for beginners to learn Statistical concepts such as Regression , Clustering etc.
  13. 2017.04.14 Presentation materials collected since the first cohort of the SNU TF study group
  14. 2017.04.13 A Korean-language site summarizing Pattern Recognition & Machine Learning by Bishop
  15. 2017.04.11 Machine Learning Part 1 | SciPy 2016 Tutorial | Andreas Mueller & Sebastian Raschka
  16. 2017.04.06 List of Free Must-Read Books for Machine Learning
  17. 2017.04.05 Regularization resources in matlab
  18. 2017.03.26 Time Series Forecasting with Python 7-Day Mini-Course - Machine Learning Mastery
  19. 2017.03.18 How do I start AI/ML/DL?
  20. 2017.03.17 R, Python, Machine Learning, Dataviz: Most Popular Resources - Data Science Central
  21. 2017.03.16 Machine Learning/Neural Networks recommended courses
  22. 2017.03.08 A quick taste of the Generalized Linear Model (using logistic regression analysis as an example)
  23. 2017.03.08 An Interactive Tutorial on Numerical Optimization
  24. 2017.03.07 Books for a mathematical understanding of big data (machine learning / pattern analysis)
  25. 2017.03.05 Here are 13 books on Machine Learning and Data Mining
  26. 2017.02.28 box plot
  27. 2017.02.25 [Book introduction] Statistical learning for beginners (An Introduction to Statistical Learning with Applications in R)
  28. 2017.02.07 Machine Learning with MATLAB Examples
  29. 2017.02.06 How set miscalculation cost in MATLAB SVM model? - Stack Overflow
  30. 2017.02.03 Practical Guide to deal with Imbalanced Classification Problems in R
http://sgsa.berkeley.edu/current-students/recommended-books

 

Applied Statistics

Categorical Data

  • ''Categorical Data Analysis'' by Alan Agresti
    • Well-written, go-to reference for all things involving categorical data.

Linear models

  • ''Generalized Linear Models'' by McCullagh and Nelder
    • Theoretical take on GLMs. Does not have a lot of concrete data examples.
  • ''Statistical Models'' by David A. Freedman
    • ...Berkeley classic!
  • ''Linear Models with R'' by Julian Faraway
    • Undergraduate-level textbook, has been used previously as a textbook for Stat 151A. Appropriate for beginners to R who would like to learn how to use linear models in practice. Does not cover GLMs. 

Experimental Design

  • ''Design of Comparative Experiments'' by RA Bailey
    • Classic, approachable text, free for download here

Machine Learning

  • ''The Elements of Statistical Learning'' by Hastie, Tibshirani, and Friedman.
    • Comprehensive but superficial coverage of all modern machine learning techniques for handling data. Introduces PCA, EM algorithm, k-means/hierarchical clustering, boosting, classification and regression trees, random forest, neural networks, etc. ...the list goes on. Download the book here.
  • ''Computer Age Statistical Inference: Algorithms, Evidence, and Data Science'' by Efron and Hastie.
  • ''Pattern Recognition and Machine Learning'' by Bishop.
  • ''Bayesian Reasoning and Machine Learning'' by Barber. Available online.
  • ''Probabilistic Graphical Models'' by Koller and Friedman.

Theoretical Statistics

  • ''Theoretical Statistics: Topics for a Core Course'' by Keener
    • The primary text for Stat 210A. Download from SpringerLink.
  • ''Theory of Point Estimation'' by Lehmann and Casella
    • A good reference for Stat 210A.
  • ''Testing Statistical Hypotheses'' by Lehmann and Romano
    • A good reference for Stat 210A.
  • ''Empirical Processes in M-Estimation'' by van de Geer
    • Some students find this helpful to supplement the material in 210B.

Probability

Undergraduate Level Probability

  • ''Probability'' by Pitman
    • What the majority of Berkeley undergraduates use to learn probability.
  • ''Introduction to Probability Theory'' by Hoel, Port and Stone
    • This text is more mathematically inclined than Pitman's, and more concise, but not as good at teaching probabilistic thinking.
  • ''Probability and Computing'' by Upfal and Mitzenmacher
    • What students in EECS use to learn about randomized algorithms and applied probability.

Measure Theoretic Probability

  • ''Probability: Theory and Examples'' by Durrett
    • This is the standard text for learning measure theoretic probability. Its style of presentation can be confusing at times, but the aim is to present the material in a manner that emphasizes understanding rather than mathematical clarity. It has become the standard text in Stat 205A and Stat 205B for good reason. Online here.
  • ''Foundations of Modern Probability'' by Olav Kallenberg
    • This epic tome is the ultimate research level reference for fundamental probability. It starts from scratch, building up the appropriate measure theory and then going through all the material found in 205A and 205B before powering on through to stochastic calculus and a variety of other specialized topics. The author put much effort into making every proof as concise as possible, and thus the reader must put in a similar amount of effort to understand the proofs. This might sound daunting, but the rewards are great. This book has sometimes been used as the text for 205A.
  • ''Probability and Measure'' by Billingsley
    • This text is often a useful supplement for students taking 205 who have not previously done measure theory. Download here.
  • ''Probability with Martingales'' by David Williams
    • This delightful and entertaining book is the fastest way to learn measure theoretic probability, but far from the most thorough. A great way to learn the essentials.

Stochastic Calculus

Stochastic Calculus is an advanced topic that interested students can learn by themselves or in a reading group. There are three classic texts:

  • ''Continuous Martingales and Brownian Motion'' by Revuz and Yor
  • ''Diffusions, Markov Processes and Martingales (Volumes 1 and 2)'' by Rogers and Williams
  • ''Brownian Motion and Stochastic Calculus'' by Karatzas and Shreve

Random Walk and Markov Chains

These are indispensable tools of probability. Some nice references are

  • ''Markov Chains and Mixing Times'' by Levin, Peres and Wilmer. Online here.
  • ''Markov Chains'' by Norris
    • Starting with elementary examples, this book gives very good hints on how to think about Markov Chains.
  • ''Continuous time Markov Processes'' by Liggett
    • A theoretical perspective on this important topic in stochastic processes. The text uses Brownian motion as the motivating example.

Mathematics 

Convex Optimization

  • ''Convex Optimization'' by Boyd and Vandenberghe. You can download the book here.
  • ''Introductory Lectures on Convex Optimization'' by Nesterov.

Linear Algebra

  • ''The Matrix Cookbook'' by Petersen and Pedersen: ''Matrix identities, relations and approximations. A desktop reference for quick overview of mathematics of matrices.'' Download here.
  • ''Matrix Analysis'' and ''Topics in Matrix Analysis'' by Horn and Johnson
    • Second book is more advanced than the first. Everything you need to know about matrix analysis.

Convex Analysis

  • ''A course in Convexity'' by Barvinok. 
    • A great book for self-study and reference. It starts with the basics of convex analysis, then moves on to duality, the Krein-Milman theorem, concentration of measure, and the ellipsoid method, and ends with Minkowski bodies, lattices, and integer programming. Fairly theoretical and has many fun exercises.

Measure Theory

  • Real Analysis and Probability - Dudley
    • Very comprehensive. 
  • Probability and Measure Theory - Ash
    • Nice and easy to digest. Good as companion for 205A

Combinatorics

  • ''Enumerative Combinatorics Vol I and II'' - Richard Stanley.
    • There's also a course on combinatorics this semester in the math department called Math249: Algebraic Combinatorics. Despite the scary "algebraic" prefix it's really fun. Download here.

Computational Biology

  • ''Statistical Methods in Bioinformatics'' by Ewens and Grant
    • Great overview of sequencing technology for the unacquainted.
  • ''Computational Genome Analysis: An Introduction'' by Deonier, Tavare, and Waterman
    • Great R code examples from computational biology. Discusses the basics, such as the greedy algorithm, etc.

Population Genetics

  • ''Probability Models for DNA Sequence Evolution'' by Durrett
  • ''Mathematical Population Genetics'' by Ewens

Computer Science

Numerical Analysis

  • Numerical Analysis by Burden and Faires
    • This book is a good overview of numerical computation, covering what you'd need to know to implement most computational methods you'll run into in statistics. It is filled with pseudo-code but sometimes uses Maple as its example language. It has been a great resource for the Computational Statistics courses (243/244). Depending on what happens with this course, this may be a good place to look when you're lost in computation.

Algorithms

  • ''Introduction to Algorithms'', Third Edition, by Cormen, Leiserson, Rivest, and Stein.
  • ''Algorithm Design'', by Jon Kleinberg and Éva Tardos.

 

Posted by uniqueone
https://www.facebook.com/groups/aiIDEASandINNOVATIONS/permalink/1818670738166732/

A Guide to Artificial Intelligence in Healthcare:  a comprehensive e-book of how artificial intelligence can be used for improving medicine and healthcare! http://bit.ly/2AYmxo3 #ebook #digitalhealth #future #AI
Posted by uniqueone

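I. 2014 2nd semester, Basic Statistics II, Sookmyung Women's University, Yeo In-kwon
http://www.kocw.net/home/cview.do?cid=762d14861eb306ac&ar=link_nvrc
2013 2nd semester, Basic Statistics II, Sookmyung Women's University, Yeo In-kwon
http://www.kocw.net/home/cview.do?cid=5b001db40374469f&ar=link_nvrc
Searching Naver for '통계학 인터넷강의' ("online statistics lectures") lists the available courses in its open online course section. Sorting by view count brings the good ones to the top.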

 

I. Probability and Statistics, 2014 1st semester : HanyangUniversity : Lee Sang-hwa, 2014/04/21 ... 21 videos
I. Probability and Statistics Theory, 2014 2nd semester : HanyangUniversity : Park Jae-woo, 2015/06/08 ... 18 videos
2014-2 Probability and Statistics Theory: http://www.youtube.com/watch?v=4JsAKaTEQMs&list=PLSN_PltQeOyjGOCnBz402iwXeki2wVXMJ
http://www.aistudy.com/math/probability.htm

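I. Probability and Statistics
Professor Lee Sang-hwa, Hanyang University
I. Probability and Statistics Theory
Probability and Statistics Theory lectures by Professor Park Jae-woo, Department of Civil and Environmental Engineering, College of Engineering, Hanyang University. 18 lectures of about one hour each.
I. Basic Statistics
Lectures by Noh Kyung-seop, author of 제대로 시작하는 기초통계학 ("Basic Statistics, Started Properly", Hanbit Academy)
I. Introduction to Statistics
Introduction to Statistics lectures from Korea National Open University. 15 lectures of about 50 minutes each.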
http://blplab.iptime.org:4321/course-cat/statistics/

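I. A list of books that the professor of "Introduction to Programming and Machine Learning for Everyone" recommends reading before taking the course :)
- The World's Easiest Statistics (Hiroyuki Kojima, 2009)
- The World's Easiest Introduction to Bayesian Statistics (Hiroyuki Kojima, 2017)
- Probability and Statistics (Professor Lee Sang-hwa, Hanyang University, 2014)
- Reading materials: Data Science from Scratch - Ch. 5, Ch. 6, Ch. 7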
https://www.wadiz.kr/web/campaign/detail/qa/13991
Posted by uniqueone
https://m.facebook.com/groups/255834461424286?view=permalink&id=556808447993551

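Hello. A longtime lurker here, posting for the first time in a while. ;;;
Another year is almost over.... T_T

These are notes summarizing Chapter 2 of Pattern Recognition and Machine Learning, up to Eq. (2.117).

While studying deep learning I realized how little statistics I knew, so I worked through the book on my own and wrote these notes. The book is in English, and for someone like me, who is weak in both English and math and has a poor memory on top of that, rereading it feels like seeing it for the very first time.... To spare myself the frustration of having to start over from the beginning, I wrote things up bit by bit in Jupyter.

When I first saw the material, the equations were so complicated that I thought, wow, what is this.. I have no idea what it is saying. Somehow I managed to get through it anyway.

I felt I really had to understand everything up to Eq. (2.117), so that is what I wrote up. If anyone is reading the same material, I hope it helps. I expanded the equations to an almost absurd degree, so it may be annoying to read. Just treat it as a reference..... T_T

If you spot any errors, please point them out. Thank you.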

http://nbviewer.jupyter.org/github/metamath1/ml-simple-works/blob/master/PRML/prml-chap2.ipynb
Posted by uniqueone
https://m.facebook.com/groups/869457839786067?view=permalink&id=1478681038863741

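As an industrial engineering student who has studied machine learning, I have collected Korean-language materials (lecture notes + videos) that are worth studying. Wherever the material below involves hands-on coding, the language is Python throughout (the Introduction to Statistics course is the only exception). For Korean-language videos and materials on deep learning, there is Professor Kim Sung-hoon's "모두의 딥러닝" (Deep Learning for Everyone). (Thank you so much.) There are many good lectures in English (cs231n, cs224d, the RL course by David Silver, the neural network course by Hugo Larochelle), but anyone who wants to pick up machine learning quickly in a familiar language before studying deep learning in earnest in English may find this list useful.

cf. I included the web programming course out of personal interest.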

[k-mooc]
Calculus 1 (Professor Chae Young-do, Sungkyunkwan University)
- http://www.kmooc.kr/courses/course-v1:SKKUk+SKKU_EXGB506.01K+2017_SKKU22/about

Calculus 2 (Professor Chae Young-do, Sungkyunkwan University)
- http://www.kmooc.kr/courses/course-v1:SKKUk+SKKU_2017_05-01+2017_SKKU01/about

Linear Algebra (Professor Lee Sang-gu, Sungkyunkwan University)
- http://www.kmooc.kr/courses/course-v1:SKKUk+SKKU_2017_01+2017_SKKU01/about

Introduction to Statistics with R (Professor Kim Choong-rak, Pusan National University)
- http://www.kmooc.kr/courses/course-v1:PNUk+RS_C01+2017_KM_009/about

Artificial Intelligence and Machine Learning (Professor Oh Hye-yeon, KAIST)
- http://www.kmooc.kr/courses/course-v1:KAISTk+KCS470+2017_K0202/about

[kooc]
Data Structures in Python (Professor Moon Il-chul, KAIST)
- http://kooc.kaist.ac.kr/datastructure-2017f

Optimization for Image Understanding (Professor Kim Chang-ick, KAIST)
- http://kooc.kaist.ac.kr/optimization2017/lecture/10543

Introduction to Artificial Intelligence and Machine Learning 1 (Professor Moon Il-chul, KAIST)
- http://kooc.kaist.ac.kr/machinelearning1_17/lecture/10574
- http://seslab.kaist.ac.kr/xe2/page_GBex27

Introduction to Artificial Intelligence and Machine Learning 2 (Professor Moon Il-chul, KAIST)
- http://kooc.kaist.ac.kr/machinelearning2__17/lecture/10573
- http://seslab.kaist.ac.kr/xe2/page_Dogc43
 
Advanced Artificial Intelligence and Machine Learning (Professor Moon Il-chul, KAIST)
- http://kooc.kaist.ac.kr/machinelearning3
- http://seslab.kaist.ac.kr/xe2/page_lMmY25

[TeamLab]
Introduction to Python for Data Science (Professor Choi Sung-chul, Gachon University)
- https://github.com/TeamLab/Gachon_CS50_Python_KMOOC

Machine Learning from Scratch (Professor Choi Sung-chul, Gachon University)
- https://github.com/TeamLab/machine_learning_from_scratch_with_python

Management Science (Professor Choi Sung-chul, Gachon University)
- https://github.com/TeamLab/Gachon_CS50_OR_KMOOC

Web Programming (Professor Choi Sung-chul, Gachon University)
- https://github.com/TeamLab/cs50_web_programming

Posted by uniqueone
https://m.facebook.com/story.php?story_fbid=383487222087279&id=303538826748786

Naive Bayes Classification
We will discuss some of the mathematics of posterior probability, known as Bayes' theorem. This is the core of the Naive Bayes classifier. We then explore Python's sklearn library and write Naive Bayes classifier code in Python for the problem under discussion.
This article is divided into two parts. Part 1 explains how the naive Bayes classifier works. Part 2 is a programming exercise using the sklearn library, which provides a Naive Bayes classifier in Python. We then discuss the accuracy of the program we train.

Original article
https://medium.com/machine-learning-101/chapter-1-supervised-learning-and-naive-bayes-classification-part-1-theory-8b9e361897d5
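
As a rough illustration of the posterior-probability idea summarized above, here is a minimal pure-Python sketch of Bayes' theorem for a two-class spam example. The function name `posterior` and all of the probabilities are made-up assumptions for illustration; they are not taken from the linked article, which works with sklearn.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(class | data) = P(data | class) * P(class) / P(data)."""
    return likelihood * prior / evidence


# Hypothetical numbers: 20% of mail is spam; the word "offer" appears in
# 50% of spam messages and in 5% of non-spam messages.
p_spam = 0.2
p_word_given_spam = 0.5
p_word_given_ham = 0.05

# Probability of seeing the word at all (law of total probability).
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Probability that a message containing the word is spam.
p_spam_given_word = posterior(p_spam, p_word_given_spam, p_word)
print(round(p_spam_given_word, 3))
```

A naive Bayes classifier repeats this calculation for every feature, multiplying the per-feature likelihoods under the assumption that the features are conditionally independent given the class.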
Posted by uniqueone

Machine Learning  |  Google Developers
https://developers.google.com/machine-learning/glossary/




A


accuracy

The fraction of predictions that a classification model got right. In multi-class classification, accuracy is defined as follows:

$$\text{Accuracy} = \frac{\text{Correct Predictions}}{\text{Total Number Of Examples}}$$
In binary classification, accuracy has the following definition:

$$\text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{Total Number Of Examples}}$$
See true positive and true negative.
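
The two accuracy definitions above can be sketched in a few lines of Python. The function names and the toy labels are hypothetical, chosen only to show that both definitions agree on a binary problem.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the labels (works for any number of classes)."""
    correct = sum(1 for y, y_hat in zip(labels, predictions) if y == y_hat)
    return correct / len(labels)


def binary_accuracy(tp, tn, fp, fn):
    """(true positives + true negatives) / total number of examples."""
    return (tp + tn) / (tp + tn + fp + fn)


# Toy data: 1 is the positive class, 0 the negative class.
labels = [1, 0, 1, 1, 0]
predictions = [1, 0, 0, 1, 1]

# Here there are 2 true positives, 1 true negative, 1 false positive, 1 false negative.
print(accuracy(labels, predictions))
print(binary_accuracy(tp=2, tn=1, fp=1, fn=1))
```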


activation function

A function (for example, ReLU or sigmoid) that takes in the weighted sum of all of the inputs from the previous layer and then generates and passes an output value (typically nonlinear) to the next layer.


AdaGrad

A sophisticated gradient descent algorithm that rescales the gradients of each parameter, effectively giving each parameter an independent learning rate. For a full explanation, see this paper.


AUC (Area under the ROC Curve)

An evaluation metric that considers all possible classification thresholds.

The Area Under the ROC curve is the probability that a classifier will be more confident that a randomly chosen positive example is actually positive than that a randomly chosen negative example is positive.
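
The probabilistic reading of AUC above suggests a direct, if slow, way to compute it: compare every positive example with every negative example and count how often the positive one gets the higher score. The sketch below assumes labels of 1 and 0 and counts ties as half a win, which is a common convention rather than something stated in the glossary; `auc_by_pairs` is a hypothetical name.

```python
def auc_by_pairs(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive scores higher.

    Ties count as half a win (an assumed convention).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
# Three of the four positive/negative pairs are ranked correctly.
print(auc_by_pairs(labels, scores))
```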

B


backpropagation

The primary algorithm for performing gradient descent on neural networks. First, the output values of each node are calculated (and cached) in a forward pass. Then, the partial derivative of the error with respect to each parameter is calculated in a backward pass through the graph.


baseline

A simple model or heuristic used as reference point for comparing how well a model is performing. A baseline helps model developers quantify the minimal, expected performance on a particular problem.


batch

The set of examples used in one iteration (that is, one gradient update) of model training.


batch size

The number of examples in a batch. For example, the batch size of SGD is 1, while the batch size of a mini-batch is usually between 10 and 1000. Batch size is usually fixed during training and inference; however, TensorFlow does permit dynamic batch sizes.


bias

An intercept or offset from an origin. Bias (also known as the bias term) is referred to as b or w0 in machine learning models. For example, bias is the b in the following formula:

$$y' = b + w_1 x_1 + w_2 x_2 + \ldots + w_n x_n$$
Not to be confused with prediction bias.


binary classification

A type of classification task that outputs one of two mutually exclusive classes. For example, a machine learning model that evaluates email messages and outputs either "spam" or "not spam" is a binary classifier.


binning

See bucketing.


bucketing

Converting a (usually continuous) feature into multiple binary features called buckets or bins, typically based on value range. For example, instead of representing temperature as a single continuous floating-point feature, you could chop ranges of temperatures into discrete bins. Given temperature data sensitive to a tenth of a degree, all temperatures between 0.0 and 15.0 degrees could be put into one bin, 15.1 to 30.0 degrees could be a second bin, and 30.1 to 50.0 degrees could be a third bin.
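
The temperature example above might look like this in Python. This is a sketch that assumes inputs lie between 0.0 and 50.0 and are recorded to a tenth of a degree, as in the description; `bucketize` is a hypothetical helper name.

```python
def bucketize(temperature):
    """Return a one-hot list for the three temperature bins described above.

    Bins: 0.0-15.0, 15.1-30.0, 30.1-50.0. Values above 50.0 fall in no bin
    and yield all zeros (an assumption; the description does not cover them).
    """
    upper_bounds = [15.0, 30.0, 50.0]  # inclusive upper edge of each bin
    one_hot = [0, 0, 0]
    for i, upper in enumerate(upper_bounds):
        if temperature <= upper:
            one_hot[i] = 1
            break
    return one_hot


print(bucketize(12.3))
print(bucketize(15.1))
print(bucketize(42.0))
```

Each position in the returned list is one binary feature, so a single continuous value becomes three features of which exactly one is set.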

C


calibration layer

A post-prediction adjustment, typically to account for prediction bias. The adjusted predictions and probabilities should match the distribution of an observed set of labels.


candidate sampling

A training-time optimization in which a probability is calculated for all the positive labels, using, for example, softmax, but only for a random sample of negative labels. For example, if we have an example labeled beagle and dog, candidate sampling computes the predicted probabilities and corresponding loss terms for the beagle and dog class outputs in addition to a random subset of the remaining classes (cat, lollipop, fence). The idea is that the negative classes can learn from less frequent negative reinforcement as long as positive classes always get proper positive reinforcement, and this is indeed observed empirically. The motivation for candidate sampling is a computational efficiency win from not computing predictions for all negatives.


checkpoint

Data that captures the state of the variables of a model at a particular time. Checkpoints enable exporting model weights, as well as performing training across multiple sessions. Checkpoints also enable training to continue past errors (for example, job preemption). Note that the graph itself is not included in a checkpoint.


class

One of a set of enumerated target values for a label. For example, in a binary classification model that detects spam, the two classes are spam and not spam. In a multi-class classification model that identifies dog breeds, the classes would be poodle, beagle, pug, and so on.


class-imbalanced data set

A binary classification problem in which the labels for the two classes have significantly different frequencies. For example, a disease data set in which 0.0001 of examples have positive labels and 0.9999 have negative labels is a class-imbalanced problem, but a football game predictor in which 0.51 of examples label one team winning and 0.49 label the other team winning is not a class-imbalanced problem.


classification model

A type of machine learning model for distinguishing among two or more discrete classes. For example, a natural language processing classification model could determine whether an input sentence was in French, Spanish, or Italian. Compare with regression model.


classification threshold

A scalar-value criterion that is applied to a model's predicted score in order to separate the positive class from the negative class. Used when mapping logistic regression results to binary classification. For example, consider a logistic regression model that determines the probability of a given email message being spam. If the classification threshold is 0.9, then logistic regression values above 0.9 are classified as spam and those below 0.9 are classified as not spam.
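
A minimal sketch of applying a classification threshold to a predicted spam probability. The function name `classify` is hypothetical, and treating a score exactly equal to the threshold as "not spam" is an assumption, since the description only covers values above and below it.

```python
def classify(spam_probability, threshold=0.9):
    """Map a predicted probability to a class label using a threshold."""
    # Scores strictly above the threshold are positive (spam); a score
    # exactly at the threshold is treated as "not spam" (an assumption).
    return "spam" if spam_probability > threshold else "not spam"


print(classify(0.95))
print(classify(0.60))
# Lowering the threshold flips the same score to the positive class.
print(classify(0.60, threshold=0.5))
```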


confusion matrix

An NxN table that summarizes how successful a classification model's predictions were; that is, the correlation between the label and the model's classification. One axis of a confusion matrix is the label that the model predicted, and the other axis is the actual label. N represents the number of classes. In a binary classification problem, N=2. For example, here is a sample confusion matrix for a binary classification problem:

                      Tumor (predicted)   Non-Tumor (predicted)
Tumor (actual)        18                  1
Non-Tumor (actual)    6                   452
The preceding confusion matrix shows that of the 19 samples that actually had tumors, the model correctly classified 18 as having tumors (18 true positives), and incorrectly classified 1 as not having a tumor (1 false negative). Similarly, of 458 samples that actually did not have tumors, 452 were correctly classified (452 true negatives) and 6 were incorrectly classified (6 false positives).

The confusion matrix for a multi-class classification problem can help you determine mistake patterns. For example, a confusion matrix could reveal that a model trained to recognize handwritten digits tends to mistakenly predict 9 instead of 4, or 1 instead of 7. The confusion matrix contains sufficient information to calculate a variety of performance metrics, including precision and recall.
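
To make the last point concrete, the following sketch computes precision, recall, and accuracy from the four cells of the binary tumor confusion matrix above, using the standard definitions of those metrics. The variable names are illustrative.

```python
# Cells of the binary tumor confusion matrix shown above.
tp, fn = 18, 1    # actual tumor: predicted tumor / predicted non-tumor
fp, tn = 6, 452   # actual non-tumor: predicted tumor / predicted non-tumor

precision = tp / (tp + fp)                  # of predicted tumors, how many were real
recall = tp / (tp + fn)                     # of real tumors, how many were found
accuracy = (tp + tn) / (tp + fn + fp + tn)  # fraction of all predictions that were right

print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f}")
```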


continuous feature

A floating-point feature with an infinite range of possible values. Contrast with discrete feature.


convergence

Informally, often refers to a state reached during training in which training loss and validation loss change very little or not at all with each iteration after a certain number of iterations. In other words, a model reaches convergence when additional training on the current data will not improve the model. In deep learning, loss values sometimes stay constant or nearly so for many iterations before finally descending, temporarily producing a false sense of convergence.

See also early stopping.

See also Convex Optimization by Boyd and Vandenberghe.


convex function

A function typically shaped approximately like the letter U or a bowl. However, in degenerate cases, a convex function is shaped like a line. For example, the following are all convex functions:

  • L2 loss
  • Log Loss
  • L1 regularization
  • L2 regularization

Convex functions are popular loss functions. That's because when a minimum value exists (as is often the case), many variations of gradient descent are guaranteed to find a point close to the minimum point of the function. Similarly, many variations of stochastic gradient descent have a high probability (though, not a guarantee) of finding a point close to the minimum.

The sum of two convex functions (for example, L2 loss + L1 regularization) is a convex function.

Deep models are usually not convex functions. Remarkably, algorithms designed for convex optimization tend to work reasonably well on deep networks anyway, even though they rarely find a minimum.


cost

Synonym for loss.


cross-entropy

A generalization of Log Loss to multi-class classification problems. Cross-entropy quantifies the difference between two probability distributions. See also perplexity.
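
As a small check of the claim that cross-entropy generalizes Log Loss, the sketch below computes both for a two-class example and shows that they coincide. It assumes the natural logarithm and predicted probabilities strictly between 0 and 1; the function names are hypothetical.

```python
import math


def cross_entropy(true_dist, predicted_dist):
    """H(p, q) = -sum_i p_i * log(q_i), skipping terms where p_i is 0."""
    return -sum(p * math.log(q) for p, q in zip(true_dist, predicted_dist) if p > 0)


def log_loss(y, p):
    """Binary log loss for a label y in {0, 1} and predicted probability p of class 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


# True class is the first of two classes; the model assigns it probability 0.8.
two_class = cross_entropy([1.0, 0.0], [0.8, 0.2])
binary = log_loss(1, 0.8)
print(two_class, binary)

# The same function handles three or more classes.
print(cross_entropy([0.0, 1.0, 0.0], [0.1, 0.7, 0.2]))
```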

D


data set

A collection of examples.


decision boundary

The separator between classes learned by a model in a binary or multi-class classification problem. For example, in the following image representing a binary classification problem, the decision boundary is the frontier between the orange class and the blue class:

[Figure: a well-defined boundary between one class and another.]


deep model

A type of neural network containing multiple hidden layers. Deep models rely on trainable nonlinearities.

Contrast with wide model.


dense feature

A feature in which most values are non-zero, typically a Tensor of floating-point values. Contrast with sparse feature.


derived feature

Synonym for synthetic feature.


discrete feature

A feature with a finite set of possible values. For example, a feature whose values may only be animal, vegetable, or mineral is a discrete (or categorical) feature. Contrast with continuous feature.


dropout regularization

A form of regularization useful in training neural networks. Dropout regularization works by removing a random selection of a fixed number of the units in a network layer for a single gradient step. The more units dropped out, the stronger the regularization. This is analogous to training the network to emulate an exponentially large ensemble of smaller networks. For full details, see Dropout: A Simple Way to Prevent Neural Networks from Overfitting.
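
A minimal sketch of the dropout step as described above: a fixed number of randomly chosen units in a layer's output are zeroed for one gradient step. The names are hypothetical. Note that many practical implementations instead drop each unit independently with some probability and rescale the surviving activations; that variant is not shown here.

```python
import random


def dropout(activations, num_dropped, rng):
    """Zero out `num_dropped` randomly chosen units for a single gradient step."""
    dropped = set(rng.sample(range(len(activations)), num_dropped))
    return [0.0 if i in dropped else a for i, a in enumerate(activations)]


rng = random.Random(0)  # seeded only to make the sketch repeatable
layer_output = [0.5, 1.2, -0.3, 0.8, 2.0]
masked = dropout(layer_output, num_dropped=2, rng=rng)
print(masked)
```

A fresh random selection is drawn for each gradient step, so over many steps the network is trained as if it were many different smaller networks sharing weights.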


dynamic model

A model that is trained online in a continuously updating fashion. That is, data is continuously entering the model.

E


early stopping

A method for regularization that involves ending model training before training loss finishes decreasing. In early stopping, you end model training when the loss on a validation data set starts to increase, that is, when generalization performance worsens.
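
The stopping rule above can be sketched as a scan over per-epoch validation losses. The `patience` parameter (how many non-improving epochs to tolerate before stopping) is an assumption added for illustration and is not part of the glossary definition; the loss values are made up.

```python
def early_stopping_epoch(validation_losses, patience=1):
    """Return the index of the epoch with the best validation loss seen before stopping.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs.
    """
    best_epoch = 0
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss started to increase: stop training
    return best_epoch


validation_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.58]
print(early_stopping_epoch(validation_losses))
```

In a real training loop the losses would arrive one epoch at a time, and the weights from the best epoch would typically be kept in a checkpoint.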


embeddings

A categorical feature represented as a continuous-valued feature. Typically, an embedding is a translation of a high-dimensional vector into a low-dimensional space. For example, you can represent the words in an English sentence in either of the following two ways:

  • As a million-element (high-dimensional) sparse vector in which all elements are integers. Each cell in the vector represents a separate English word; the value in a cell represents the number of times that word appears in a sentence. Since a single English sentence is unlikely to contain more than 50 words, nearly every cell in the vector will contain a 0. The few cells that aren't 0 will contain a low integer (usually 1) representing the number of times that word appeared in the sentence.
  • As a several-hundred-element (low-dimensional) dense vector in which each element holds a floating-point value between 0 and 1.

In TensorFlow, embeddings are trained by backpropagating loss just like any other parameter in a neural network.


empirical risk minimization (ERM)

Choosing the model function that minimizes loss on the training set. Contrast with structural risk minimization.


ensemble

A merger of the predictions of multiple models. You can create an ensemble via one or more of the following:

  • different initializations
  • different hyperparameters
  • different overall structure

Deep and wide models are a kind of ensemble.


Estimator

An instance of the tf.Estimator class, which encapsulates logic that builds a TensorFlow graph and runs a TensorFlow session. You may create your own Estimators (as described here) or instantiate pre-made Estimators created by others.


example

One row of a data set. An example contains one or more features and possibly a label. See also labeled example and unlabeled example.

F


false negative (FN)

An example in which the model mistakenly predicted the negative class. For example, the model inferred that a particular email message was not spam (the negative class), but that email message actually was spam.


false positive (FP)

An example in which the model mistakenly predicted the positive class. For example, the model inferred that a particular email message was spam (the positive class), but that email message was actually not spam.


false positive rate (FP rate)

The x-axis in an ROC curve. The FP rate is defined as follows:

FalsePositiveRate=FalsePositivesFalsePositives+TrueNegatives

feature

An input variable used in making predictions.


feature columns (FeatureColumns)

A set of related features, such as the set of all possible countries in which users might live. An example may have one or more features present in a feature column.

Feature columns in TensorFlow also encapsulate metadata such as:

  • the feature's data type
  • whether a feature is fixed length or should be converted to an embedding

A feature column can contain a single feature.

"Feature column" is Google-specific terminology. A feature column is referred to as a "namespace" in the VW system (at Yahoo/Microsoft), or a field.


feature cross

A synthetic feature formed by crossing (multiplying or taking a Cartesian product of) individual features. Feature crosses help represent nonlinear relationships.
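
Both kinds of crossing mentioned above, multiplying numeric features and taking a Cartesian product of categorical ones, can be sketched as follows. The function names, the `_x_` separator, and the example values are hypothetical.

```python
from itertools import product


def cross_numeric(x1, x2):
    """Cross two numeric features by multiplying them."""
    return x1 * x2


def cross_categorical(values_a, values_b):
    """Cross two categorical features: one new category per pair of values."""
    return [f"{a}_x_{b}" for a, b in product(values_a, values_b)]


print(cross_numeric(3, 4))
print(cross_categorical(["US", "KR"], ["mobile", "desktop"]))
```

A linear model given the crossed feature can assign a separate weight to each combination, which is how the cross lets it represent a relationship that is nonlinear in the original features.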


feature engineering

The process of determining which features might be useful in training a model, and then converting raw data from log files and other sources into said features. In TensorFlow, feature engineering often means converting raw log file entries to tf.Example protocol buffers. See also tf.Transform.

Feature engineering is sometimes called feature extraction.


feature set

The group of features your machine learning model trains on. For example, postal code, property size, and property condition might comprise a simple feature set for a model that predicts housing prices.


feature spec

Describes the information required to extract features data from the tf.Example protocol buffer. Because the tf.Example protocol buffer is just a container for data, you must specify the following:

  • the data to extract (that is, the keys for the features)
  • the data type (for example, float or int)
  • the length (fixed or variable)

The Estimator API provides facilities for producing a feature spec from a list of FeatureColumns.


full softmax

See softmax. Contrast with candidate sampling.

G


generalization

Refers to your model's ability to make correct predictions on new, previously unseen data as opposed to the data used to train the model.


generalized linear model

A generalization of least squares regression models, which are based on Gaussian noise, to other types of models based on other types of noise, such as Poisson noise or categorical noise. Examples of generalized linear models include:

  • logistic regression
  • multi-class regression
  • least squares regression

The parameters of a generalized linear model can be found through convex optimization.

Generalized linear models exhibit the following properties:

  • The average prediction of the optimal least squares regression model is equal to the average label on the training data.
  • The average probability predicted by the optimal logistic regression model is equal to the average label on the training data.

The power of a generalized linear model is limited by its features. Unlike a deep model, a generalized linear model cannot "learn new features."
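
The property stated above for least squares regression can be checked numerically. This sketch fits a one-feature model with an intercept using the closed-form solution and compares the mean prediction with the mean label; the data and function name are made up for illustration.

```python
def fit_least_squares(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
b, w = fit_least_squares(xs, ys)

predictions = [b + w * x for x in xs]
mean_prediction = sum(predictions) / len(predictions)
mean_label = sum(ys) / len(ys)
# The two means agree up to floating-point rounding.
print(mean_prediction, mean_label)
```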


gradient

The vector of partial derivatives with respect to all of the independent variables. In machine learning, the gradient is the vector of partial derivatives of the model function. The gradient points in the direction of steepest ascent.


gradient clipping

Capping gradient values before applying them. Gradient clipping helps ensure numerical stability and prevents exploding gradients.
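
One simple form of gradient clipping caps each gradient component at a fixed magnitude before it is applied. The sketch below shows that element-wise variant; the function name and the limit are illustrative assumptions, and other variants (for example, rescaling by the overall norm) also exist.

```python
def clip_by_value(gradients, limit):
    """Cap every gradient component to the range [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in gradients]


gradients = [0.5, -12.0, 3.0, 250.0]
print(clip_by_value(gradients, limit=5.0))
```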


gradient descent

A technique to minimize loss by computing the gradients of loss with respect to the model's parameters, conditioned on training data. Informally, gradient descent iteratively adjusts parameters, gradually finding the best combination of weights and bias to minimize loss.
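
A minimal sketch of gradient descent for a one-feature linear model with mean squared error loss. The learning rate, step count, and data are assumptions picked so that this toy problem converges; they are not recommendations.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by repeatedly stepping against the gradient of the MSE loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Training data generated from y = 2x + 1, so the ideal weights are w = 2, b = 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = gradient_descent(xs, ys)
print(round(w, 4), round(b, 4))
```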


graph

In TensorFlow, a computation specification. Nodes in the graph represent operations. Edges are directed and represent passing the result of an operation (a Tensor) as an operand to another operation. Use TensorBoard to visualize a graph.

H


heuristic

A practical and nonoptimal solution to a problem, which is sufficient for making progress or for learning from.


hidden layer

A synthetic layer in a neural network between the input layer (that is, the features) and the output layer (the prediction). A neural network contains one or more hidden layers.


hinge loss

A family of loss functions for classification designed to find the decision boundary as distant as possible from each training example, thus maximizing the margin between examples and the boundary. KSVMs use hinge loss (or a related function, such as squared hinge loss). For binary classification, the hinge loss function is defined as follows:

$$\text{loss} = \max(0,\ 1 - (y' \cdot y))$$
where y' is the raw output of the classifier model:

$$y' = b + w_1 x_1 + w_2 x_2 + \dots + w_n x_n$$
and y is the true label, either -1 or +1.

Consequently, a plot of hinge loss vs. (y * y') looks as follows:

[Figure: a plot of hinge loss vs. raw classifier score, showing a distinct hinge at the coordinate (1,0).]
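
The binary hinge loss defined above can be sketched in a few lines. The function and argument names are hypothetical; `raw_score` stands for y' and `y_true` for the label in {-1, +1}.

```python
def hinge_loss(y_true, raw_score):
    """Hinge loss for one example; y_true must be -1 or +1."""
    return max(0.0, 1.0 - y_true * raw_score)

confident_correct = hinge_loss(+1, 2.0)   # beyond the margin, no loss
inside_margin = hinge_loss(+1, 0.5)       # correct side but inside the margin
wrong_side = hinge_loss(-1, 0.5)          # wrong side of the boundary
```

The loss is zero once y * y' reaches 1, which is where the hinge in the plot sits.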


holdout data

Examples intentionally not used ("held out") during training. The validation data set and test data set are examples of holdout data. Holdout data helps evaluate your model's ability to generalize to data other than the data it was trained on. The loss on the holdout set provides a better estimate of the loss on an unseen data set than does the loss on the training set.


hyperparameter

The "knobs" that you tweak during successive runs of training a model. For example, learning rate is a hyperparameter.

Contrast with parameter.

I


independently and identically distributed (i.i.d.)

Data drawn from a distribution that doesn't change, and where each value drawn doesn't depend on values that have been drawn previously. The i.i.d. assumption is the ideal gas of machine learning: a useful mathematical construct but almost never exactly found in the real world. For example, the distribution of visitors to a web page may be i.i.d. over a brief window of time; that is, the distribution doesn't change during that brief window and one person's visit is generally independent of another's visit. However, if you expand that window of time, seasonal differences in the web page's visitors may appear.


inference

In machine learning, often refers to the process of making predictions by applying the trained model to unlabeled examples. In statistics, inference refers to the process of fitting the parameters of a distribution conditioned on some observed data. (See the Wikipedia article on statistical inference.)


input layer

The first layer (the one that receives the input data) in a neural network.


instance

Synonym for example.


inter-rater agreement

A measurement of how often human raters agree when doing a task. If raters disagree, the task instructions may need to be improved. Also sometimes called inter-annotator agreement or inter-rater reliability. See also Cohen's kappa, which is one of the most popular inter-rater agreement measurements.

K


Kernel Support Vector Machines (KSVMs)

A classification algorithm that seeks to maximize the margin between positive and negative classes by mapping input data vectors to a higher dimensional space. For example, consider a classification problem in which the input data set consists of a hundred features. In order to maximize the margin between positive and negative classes, KSVMs could internally map those features into a million-dimension space. KSVMs use a loss function called hinge loss.

L


L1 loss

Loss function based on the absolute value of the difference between the values that a model is predicting and the actual values of the labels. L1 loss is less sensitive to outliers than L2 loss.
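
To make the outlier claim concrete, here is a small sketch that sums absolute and squared errors over the same predictions. The data values are made up for illustration and the function names are hypothetical.

```python
def l1_loss(labels, preds):
    """Sum of absolute differences between labels and predictions."""
    return sum(abs(y - p) for y, p in zip(labels, preds))

def l2_loss(labels, preds):
    """Sum of squared differences between labels and predictions."""
    return sum((y - p) ** 2 for y, p in zip(labels, preds))

labels = [10.0, 12.0, 11.0, 13.0]
preds = [11.0, 11.0, 12.0, 23.0]   # the last prediction is off by 10

l1 = l1_loss(labels, preds)   # errors 1 + 1 + 1 + 10
l2 = l2_loss(labels, preds)   # errors 1 + 1 + 1 + 100
```

The single bad prediction accounts for 10 of 13 units of the absolute-error total but 100 of 103 units of the squared-error total.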


L1 regularization

A type of regularization that penalizes weights in proportion to the sum of the absolute values of the weights. In models relying on sparse features, L1 regularization helps drive the weights of irrelevant or barely relevant features to exactly 0, which removes those features from the model. Contrast with L2 regularization.


L2 loss

See squared loss.


L2 regularization

A type of regularization that penalizes weights in proportion to the sum of the squares of the weights. L2 regularization helps drive outlier weights (those with high positive or low negative values) closer to 0 but not quite to 0. (Contrast with L1 regularization.) L2 regularization typically improves generalization in linear models.
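
The two penalty terms can be compared side by side. This is only a sketch with made-up weights; the function names are hypothetical.

```python
def l1_penalty(weights):
    """Sum of absolute values of the weights."""
    return sum(abs(w) for w in weights)

def l2_penalty(weights):
    """Sum of squares of the weights."""
    return sum(w * w for w in weights)

weights = [0.5, -0.1, 4.0]
l1_value = l1_penalty(weights)   # 0.5 + 0.1 + 4.0
l2_value = l2_penalty(weights)   # 0.25 + 0.01 + 16.0
```

Under the squared penalty the one large weight contributes almost the entire total, which is why L2 pushes hardest on outlier weights.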


label

In supervised learning, the "answer" or "result" portion of an example. Each example in a labeled data set consists of one or more features and a label. For instance, in a housing data set, the features might include the number of bedrooms, the number of bathrooms, and the age of the house, while the label might be the house's price. In a spam detection data set, the features might include the subject line, the sender, and the email message itself, while the label would probably be either "spam" or "not spam."


labeled example

An example that contains features and a label. In supervised training, models learn from labeled examples.


lambda

Synonym for regularization rate.

(This is an overloaded term. Here we're focusing on the term's definition within regularization.)


layer

A set of neurons in a neural network that process a set of input features, or the output of those neurons.

Also, an abstraction in TensorFlow. Layers are Python functions that take Tensors and configuration options as input and produce other tensors as output. Once the necessary Tensors have been composed, the user can convert the result into an Estimator via a model function.


learning rate

A scalar used to train a model via gradient descent. During each iteration, the gradient descent algorithm multiplies the learning rate by the gradient. The resulting product is called the gradient step.

Learning rate is a key hyperparameter.


least squares regression

A linear regression model trained by minimizing L2 Loss.


linear regression

A type of regression model that outputs a continuous value from a linear combination of input features.


logistic regression

A model that generates a probability for each possible discrete label value in classification problems by applying a sigmoid function to a linear prediction. Although logistic regression is often used in binary classification problems, it can also be used in multi-class classification problems (where it is called multi-class logistic regression or multinomial regression).


Log Loss

The loss function used in binary logistic regression.
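
The entry does not spell out the formula. A minimal sketch, assuming the standard definition -(y log p + (1 - y) log(1 - p)) for a label y in {0, 1} and a predicted probability p, might look like this. The `eps` clamp is an added safeguard against log(0), not part of the definition.

```python
import math

def log_loss(y, p, eps=1e-15):
    """Log Loss for one example with label y (0 or 1) and predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)   # keep p strictly inside (0, 1)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

uncertain = log_loss(1, 0.5)        # equals ln 2
confident_right = log_loss(1, 0.9)
confident_wrong = log_loss(1, 0.1)
```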


loss

A measure of how far a model's predictions are from its label. Or, to phrase it more pessimistically, a measure of how bad the model is. To determine this value, a model must define a loss function. For example, linear regression models typically use mean squared error for a loss function, while logistic regression models use Log Loss.

M


machine learning

A program or system that builds (trains) a predictive model from input data. The system uses the learned model to make useful predictions from new (never-before-seen) data drawn from the same distribution as the one used to train the model. Machine learning also refers to the field of study concerned with these programs or systems.


Mean Squared Error (MSE)

The average squared loss per example. MSE is calculated by dividing the squared loss by the number of examples. The values that TensorFlow Playground displays for "Training loss" and "Test loss" are MSE.
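
A short sketch of the calculation, with made-up numbers. The root of this value is the RMSE defined later in the glossary; the function names are hypothetical.

```python
import math

def mean_squared_error(labels, preds):
    """Total squared loss divided by the number of examples."""
    squared = sum((y - p) ** 2 for y, p in zip(labels, preds))
    return squared / len(labels)

def root_mean_squared_error(labels, preds):
    """Square root of the MSE, in the same units as the label."""
    return math.sqrt(mean_squared_error(labels, preds))

labels = [1.0, 2.0, 3.0, 4.0]
preds = [1.0, 2.0, 3.0, 8.0]
mse = mean_squared_error(labels, preds)         # (0 + 0 + 0 + 16) / 4
rmse = root_mean_squared_error(labels, preds)
```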


metric

A number that you care about. May or may not be directly optimized in a machine-learning system. A metric that your system tries to optimize is called an objective.


mini-batch

A small, randomly selected subset of the entire batch of examples run together in a single iteration of training or inference. The batch size of a mini-batch is usually between 10 and 1,000. It is much more efficient to calculate the loss on a mini-batch than on the full training data.


mini-batch stochastic gradient descent (SGD)

A gradient descent algorithm that uses mini-batches. In other words, mini-batch SGD estimates the gradient based on a small subset of the training data. Vanilla SGD uses a mini-batch of size 1.
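
The following sketch fits a single weight w in the model y = w * x, estimating the gradient of the mean squared error from one mini-batch at a time. Shuffling and slicing is one common way to draw the random subsets; the data, batch size, and function name are assumptions made for illustration.

```python
import random

def minibatch_sgd(xs, ys, batch_size=4, learning_rate=0.01, epochs=200, seed=0):
    """Fit y = w * x by gradient descent on mini-batches of examples."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)                      # new random batches each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this mini-batch only.
            grad = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= learning_rate * grad
    return w

xs = [float(x) for x in range(1, 9)]
ys = [2.0 * x for x in xs]          # noiseless data generated with w = 2
w_fit = minibatch_sgd(xs, ys)
```

Setting `batch_size=1` gives the size-1 case that the entry calls vanilla SGD.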


ML

Abbreviation for machine learning.


model

The representation of what an ML system has learned from the training data. This is an overloaded term, which can have either of the following two related meanings:

The TensorFlow graph that expresses the structure of how a prediction will be computed.
The particular weights and biases of that TensorFlow graph, which are determined by training.

model training

The process of determining the best model.


Momentum

A sophisticated gradient descent algorithm in which a learning step depends not only on the derivative in the current step, but also on the derivatives in the step(s) that immediately preceded it. Momentum involves computing an exponentially weighted moving average of the gradients over time, analogous to momentum in physics. Momentum sometimes prevents learning from getting stuck in local minima.
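
A minimal sketch of the idea, assuming the exponentially weighted moving average form of the update (other formulations scale the terms differently). The names `momentum_descent`, `beta`, and `velocity` are hypothetical, and the loss is a toy quadratic.

```python
def momentum_descent(grad_fn, w0, learning_rate=0.1, beta=0.9, steps=500):
    """Gradient descent whose step follows a moving average of past gradients."""
    w = w0
    velocity = 0.0
    for _ in range(steps):
        # Exponentially weighted moving average of the gradients seen so far.
        velocity = beta * velocity + (1.0 - beta) * grad_fn(w)
        w -= learning_rate * velocity
    return w

# Toy loss (w - 3)^2 with gradient 2 * (w - 3).
w_mom = momentum_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
```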


multi-class

Classification problems that distinguish among more than two classes. For example, there are approximately 128 species of maple trees, so a model that categorized maple tree species would be multi-class. Conversely, a model that divided emails into only two categories (spam and not spam) would be a binary classification model.

N


NaN trap

When one number in your model becomes a NaN during training, which causes many or all other numbers in your model to eventually become a NaN.

NaN is an abbreviation for "Not a Number."


negative class

In binary classification, one class is termed positive and the other is termed negative. The positive class is the thing we're looking for and the negative class is the other possibility. For example, the negative class in a medical test might be "not tumor." The negative class in an email classifier might be "not spam." See also positive class.


neural network

A model that, taking inspiration from the brain, is composed of layers (at least one of which is hidden) consisting of simple connected units or neurons followed by nonlinearities.


neuron

A node in a neural network, typically taking in multiple input values and generating one output value. The neuron calculates the output value by applying an activation function (nonlinear transformation) to a weighted sum of input values.


normalization

The process of converting an actual range of values into a standard range of values, typically -1 to +1 or 0 to 1. For example, suppose the natural range of a certain feature is 800 to 6,000. Through subtraction and division, you can normalize those values into the range -1 to +1.

See also scaling.
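
The subtraction and division mentioned in the normalization entry can be sketched as follows, using the 800 to 6,000 range from the example. The function name is hypothetical.

```python
def normalize(value, lo, hi):
    """Map a value from the range [lo, hi] to the range [-1, +1]."""
    return 2.0 * (value - lo) / (hi - lo) - 1.0

low_end = normalize(800.0, 800.0, 6000.0)
midpoint = normalize(3400.0, 800.0, 6000.0)
high_end = normalize(6000.0, 800.0, 6000.0)
```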


numpy

An open-source math library that provides efficient array operations in Python. pandas is built on numpy.

O


objective

A metric that your algorithm is trying to optimize.


offline inference

Generating a group of predictions, storing those predictions, and then retrieving those predictions on demand. Contrast with online inference.


one-hot encoding

A sparse vector in which:

One element is set to 1.
All other elements are set to 0.
One-hot encoding is commonly used to represent strings or identifiers that have a finite set of possible values. For example, suppose a given botany data set chronicles 15,000 different species, each denoted with a unique string identifier. As part of feature engineering, you'll probably encode those string identifiers as one-hot vectors in which the vector has a size of 15,000.
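
A small sketch of the encoding, using a three-entry vocabulary instead of 15,000 species. The species identifiers and the function name are made up for illustration.

```python
def one_hot(value, vocabulary):
    """Return a vector with a 1 at the position of value and 0 elsewhere."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(value)] = 1   # raises ValueError for unknown values
    return vector

species = ["acer_rubrum", "acer_saccharum", "quercus_alba"]  # hypothetical identifiers
encoded = one_hot("acer_saccharum", species)
```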


one-vs.-all

Given a classification problem with N possible solutions, a one-vs.-all solution consists of N separate binary classifiers—one binary classifier for each possible outcome. For example, given a model that classifies examples as animal, vegetable, or mineral, a one-vs.-all solution would provide the following three separate binary classifiers:

animal vs. not animal
vegetable vs. not vegetable
mineral vs. not mineral
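
One common way to turn the N binary classifiers into a single prediction is to pick the class whose classifier is most confident. That selection rule is an assumption here, since the entry does not specify it, and the scores below are invented stand-ins for real classifier outputs.

```python
def one_vs_all_predict(binary_scores):
    """Pick the class whose 'class vs. not class' classifier scored highest."""
    return max(binary_scores, key=binary_scores.get)

# Hypothetical outputs of the three binary classifiers for one example.
scores = {"animal": 0.2, "vegetable": 0.7, "mineral": 0.1}
predicted = one_vs_all_predict(scores)
```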

online inference

Generating predictions on demand. Contrast with offline inference.


Operation (op)

A node in the TensorFlow graph. In TensorFlow, any procedure that creates, manipulates, or destroys a Tensor is an operation. For example, a matrix multiply is an operation that takes two Tensors as input and generates one Tensor as output.


optimizer

A specific implementation of the gradient descent algorithm. TensorFlow's base class for optimizers is tf.train.Optimizer. Different optimizers (subclasses of tf.train.Optimizer) account for concepts such as:

momentum (Momentum)
update frequency (AdaGrad = ADAptive GRADient descent; Adam = ADAptive with Momentum; RMSProp)
sparsity/regularization (Ftrl)
more complex math (Proximal, and others)
You might even imagine an NN-driven optimizer.


outliers

Values distant from most other values. In machine learning, any of the following are outliers:

Weights with high absolute values.
Predicted values relatively far away from the actual values.
Input data whose values are more than roughly 3 standard deviations from the mean.
Outliers often cause problems in model training.


output layer

The "final" layer of a neural network. The layer containing the answer(s).


overfitting

Creating a model that matches the training data so closely that the model fails to make correct predictions on new data.

P


pandas

A column-oriented data analysis API. Many ML frameworks, including TensorFlow, support pandas data structures as input. See pandas documentation.


parameter

A variable of a model that the ML system trains on its own. For example, weights are parameters whose values the ML system gradually learns through successive training iterations. Contrast with hyperparameter.


Parameter Server (PS)

A job that keeps track of a model's parameters in a distributed setting.


parameter update

The operation of adjusting a model's parameters during training, typically within a single iteration of gradient descent.


partial derivative

A derivative in which all but one of the variables is considered a constant. For example, the partial derivative of f(x, y) with respect to x is the derivative of f considered as a function of x alone (that is, keeping y constant). The partial derivative of f with respect to x focuses only on how x is changing and ignores all other variables in the equation.
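
Holding the other variables constant can be illustrated numerically with a central difference. This is only a sketch: the function f(x, y) = x^2 * y + y and the step size `h` are arbitrary choices, and the helper name is hypothetical.

```python
def partial_derivative(f, point, index, h=1e-5):
    """Estimate the partial derivative of f at point with respect to one variable."""
    plus = list(point)
    minus = list(point)
    plus[index] += h      # nudge only the chosen variable
    minus[index] -= h     # all other variables stay fixed
    return (f(*plus) - f(*minus)) / (2.0 * h)

def f(x, y):
    return x * x * y + y

d_dx = partial_derivative(f, (3.0, 2.0), index=0)   # analytic value: 2xy = 12
d_dy = partial_derivative(f, (3.0, 2.0), index=1)   # analytic value: x^2 + 1 = 10
```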


partitioning strategy

The algorithm by which variables are divided across parameter servers.


performance

Overloaded term with the following meanings:

The traditional meaning within software engineering. Namely: How fast (or efficiently) does this piece of software run?
The meaning within ML. Here, performance answers the following question: How correct is this model? That is, how good are the model's predictions?

perplexity

One measure of how well a model is accomplishing its task. For example, suppose your task is to read the first few letters of a word a user is typing on a smartphone keyboard, and to offer a list of possible completion words. Perplexity, P, for this task is approximately the number of guesses you need to offer in order for your list to contain the actual word the user is trying to type.

Perplexity is related to cross-entropy as follows:

$$P = 2^{\text{cross entropy}}$$

pipeline

The infrastructure surrounding a machine learning algorithm. A pipeline includes gathering the data, putting the data into training data files, training one or more models, and exporting the models to production.


positive class

In binary classification, the two possible classes are labeled as positive and negative. The positive outcome is the thing we're testing for. (Admittedly, we're simultaneously testing for both outcomes, but play along.) For example, the positive class in a medical test might be "tumor." The positive class in an email classifier might be "spam."

Contrast with negative class.


precision

A metric for classification models. Precision identifies the frequency with which a model was correct when predicting the positive class. That is:

$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$
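
As a quick sketch, precision can be computed directly from counts of true and false positives. The counts are invented, and returning 0.0 when the model made no positive predictions is a convention assumed here, not something the glossary states.

```python
def precision(true_positives, false_positives):
    """Fraction of positive predictions that were actually positive."""
    predicted_positive = true_positives + false_positives
    if predicted_positive == 0:
        return 0.0   # assumed convention when nothing was predicted positive
    return true_positives / predicted_positive

# 10 emails flagged as spam, 8 of which really were spam.
spam_precision = precision(true_positives=8, false_positives=2)
```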

prediction

A model's output when provided with an input example.


prediction bias

A value indicating how far apart the average of predictions is from the average of labels in the data set.


pre-made Estimator

An Estimator that someone has already built. TensorFlow provides several pre-made Estimators, including DNNClassifier, DNNRegressor, and LinearClassifier. You may build your own pre-made Estimators by following these instructions.


pre-trained model

Models or model components (such as embeddings) that have already been trained. Sometimes, you'll feed pre-trained embeddings into a neural network. Other times, your model will train the embeddings itself rather than rely on the pre-trained embeddings.


prior belief

What you believe about the data before you begin training on it. For example, L2 regularization relies on a prior belief that weights should be small and normally distributed around zero.

Q


queue

A TensorFlow Operation that implements a queue data structure. Typically used in I/O.

R


rank

Overloaded term in ML that can mean either of the following:

The number of dimensions in a Tensor. For instance, a scalar has rank 0, a vector has rank 1, and a matrix has rank 2.
The ordinal position of a class in an ML problem that categorizes classes from highest to lowest. For example, a behavior ranking system could rank a dog's rewards from highest (a steak) to lowest (wilted kale).

rater

A human who provides labels in examples. Sometimes called an "annotator."


recall

A metric for classification models that answers the following question: Out of all the possible positive labels, how many did the model correctly identify? That is:

$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$
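
A matching sketch for recall, again with invented counts. Returning 0.0 when there are no actual positives is an assumed convention.

```python
def recall(true_positives, false_negatives):
    """Fraction of actual positives that the model identified."""
    actual_positive = true_positives + false_negatives
    if actual_positive == 0:
        return 0.0   # assumed convention when there are no actual positives
    return true_positives / actual_positive

# 16 spam emails in total, of which the model caught 8.
spam_recall = recall(true_positives=8, false_negatives=8)
```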

Rectified Linear Unit (ReLU)

An activation function with the following rules:

If input is negative or zero, output is 0.
If input is positive, output is equal to input.
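
The two rules reduce to a single expression, sketched here for a scalar input.

```python
def relu(x):
    """Rectified Linear Unit: 0 for non-positive input, the input itself otherwise."""
    return max(0.0, x)

outputs = [relu(x) for x in (-2.0, 0.0, 3.5)]
```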

regression model

A type of model that outputs continuous (typically, floating-point) values. Compare with classification models, which output discrete values, such as "day lily" or "tiger lily."


regularization

The penalty on a model's complexity. Regularization helps prevent overfitting. Different kinds of regularization include:

L1 regularization
L2 regularization
dropout regularization
early stopping (this is not a formal regularization method, but can effectively limit overfitting)

regularization rate

A scalar value, represented as lambda, specifying the relative importance of the regularization function. The following simplified loss equation shows the regularization rate's influence:

minimize(loss function + λ(regularization function))
Raising the regularization rate reduces overfitting but may make the model less accurate.
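
The simplified equation can be sketched with an L2 penalty standing in for the regularization function. The choice of L2, the numbers, and the function name are assumptions made for illustration.

```python
def regularized_loss(data_loss, weights, reg_rate):
    """loss function + lambda * (regularization function), using an L2 penalty."""
    l2_term = sum(w * w for w in weights)
    return data_loss + reg_rate * l2_term

weights = [1.0, -2.0]                       # L2 term is 1 + 4 = 5
weak = regularized_loss(1.0, weights, reg_rate=0.1)
strong = regularized_loss(1.0, weights, reg_rate=1.0)
```

With a larger rate the same weights cost more, so the optimizer is pushed harder toward small weights.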


representation

The process of mapping data to useful features.


ROC (receiver operating characteristic) Curve

A curve of true positive rate vs. false positive rate at different classification thresholds. See also AUC.


root directory

The directory you specify for hosting subdirectories of the TensorFlow checkpoint and events files of multiple models.


Root Mean Squared Error (RMSE)

The square root of the Mean Squared Error.

S


Saver

A TensorFlow object responsible for saving model checkpoints.


scaling

A commonly used practice in feature engineering to tame a feature's range of values to match the range of other features in the data set. For example, suppose that you want all floating-point features in the data set to have a range of 0 to 1. Given a particular feature's range of 0 to 500, you could scale that feature by dividing each value by 500.

See also normalization.


scikit-learn

A popular open-source ML platform. See www.scikit-learn.org.


sequence model

A model whose inputs have a sequential dependence. For example, predicting the next video watched from a sequence of previously watched videos.


session

Maintains state (for example, variables) within a TensorFlow program.


sigmoid function

A function that maps logistic or multinomial regression output (log odds) to probabilities, returning a value between 0 and 1. The sigmoid function has the following formula:

$$y = \frac{1}{1 + e^{-\sigma}}$$
where σ in logistic regression problems is simply:

$$\sigma = b + w_1 x_1 + w_2 x_2 + \dots + w_n x_n$$
In other words, the sigmoid function converts σ into a probability between 0 and 1.

In some neural networks, the sigmoid function acts as the activation function.
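
A sketch of the two formulas above: a linear score followed by the sigmoid. The branch on the sign of the score is an added numerical safeguard against overflow for large negative inputs, not part of the definition; the function names and sample values are hypothetical.

```python
import math

def linear_score(bias, weights, features):
    """b + w1*x1 + w2*x2 + ... + wn*xn"""
    return bias + sum(w * x for w, x in zip(weights, features))

def sigmoid(score):
    """Map a real-valued score to a value between 0 and 1."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)           # avoids overflow when score is very negative
    return e / (1.0 + e)

score = linear_score(0.0, [1.0, -1.0], [2.0, 2.0])   # 0 + 2 - 2 = 0
probability = sigmoid(score)
```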


softmax

A function that provides probabilities for each possible class in a multi-class classification model. The probabilities add up to exactly 1.0. For example, softmax might determine that the probability of a particular image being a dog is 0.9, a cat 0.08, and a horse 0.02. (Also called full softmax.)

Contrast with candidate sampling.
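
A minimal sketch of full softmax over a list of raw scores. Subtracting the maximum score before exponentiating is a standard stability trick that does not change the result; the input values are made up.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]   # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

probabilities = softmax([2.0, 1.0, 0.1])
```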


sparse feature

Feature vector whose values are predominately zero or empty. For example, a vector containing a single 1 value and a million 0 values is sparse. As another example, words in a search query could also be a sparse feature—there are many possible words in a given language, but only a few of them occur in a given query.

Contrast with dense feature.


squared loss

The loss function used in linear regression. (Also known as L2 Loss.) This function calculates the squares of the difference between a model's predicted value for a labeled example and the actual value of the label. Due to squaring, this loss function amplifies the influence of bad predictions. That is, squared loss reacts more strongly to outliers than L1 loss.


static model

A model that is trained offline.


stationarity

A property of data in a data set, in which the data distribution stays constant across one or more dimensions. Most commonly, that dimension is time, meaning that data exhibiting stationarity doesn't change over time. For example, data that exhibits stationarity doesn't change from September to December.


step

A forward and backward evaluation of one batch.


step size

Synonym for learning rate.


stochastic gradient descent (SGD)

A gradient descent algorithm in which the batch size is one. In other words, SGD relies on a single example chosen uniformly at random from a data set to calculate an estimate of the gradient at each step.


structural risk minimization (SRM)

An algorithm that balances two goals:

The desire to build the most predictive model (for example, lowest loss).
The desire to keep the model as simple as possible (for example, strong regularization).
For example, a model function that minimizes loss+regularization on the training set is a structural risk minimization algorithm.

For more information, see http://www.svms.org/srm/.

Contrast with empirical risk minimization.


summary

In TensorFlow, a value or set of values calculated at a particular step, usually used for tracking model metrics during training.


supervised machine learning

Training a model from input data and its corresponding labels. Supervised machine learning is analogous to a student learning a subject by studying a set of questions and their corresponding answers. After mastering the mapping between questions and answers, the student can then provide answers to new (never-before-seen) questions on the same topic. Compare with unsupervised machine learning.


synthetic feature

A feature that is not present among the input features, but is derived from one or more of them. Kinds of synthetic features include the following:

Multiplying one feature by itself or by other feature(s). (These are termed feature crosses.)
Dividing one feature by a second feature.
Bucketing a continuous feature into range bins.
Features created by normalizing or scaling alone are not considered synthetic features.

T


target

Synonym for label.


Tensor

The primary data structure in TensorFlow programs. Tensors are N-dimensional (where N could be very large) data structures, most commonly scalars, vectors, or matrices. The elements of a Tensor can hold integer, floating-point, or string values.


Tensor Processing Unit (TPU)

An ASIC (application-specific integrated circuit) that optimizes the performance of TensorFlow programs.


Tensor rank

See rank.


Tensor shape

The number of elements a Tensor contains in various dimensions. For example, a [5, 10] Tensor has a shape of 5 in one dimension and 10 in another.


Tensor size

The total number of scalars a Tensor contains. For example, a [5, 10] Tensor has a size of 50.
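
Rank, shape, and size are related in a simple way, sketched here on a plain Python list standing in for a Tensor shape. The helper names are hypothetical and are not TensorFlow functions.

```python
import math

def tensor_rank(shape):
    """Number of dimensions, given the shape."""
    return len(shape)

def tensor_size(shape):
    """Total number of scalars: the product of the entries in the shape."""
    return math.prod(shape)

shape = [5, 10]
rank = tensor_rank(shape)
size = tensor_size(shape)
```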


TensorBoard

The dashboard that displays the summaries saved during the execution of one or more TensorFlow programs.


TensorFlow

A large-scale, distributed, machine learning platform. The term also refers to the base API layer in the TensorFlow stack, which supports general computation on dataflow graphs.

Although TensorFlow is primarily used for machine learning, you may also use TensorFlow for non-ML tasks that require numerical computation using dataflow graphs.


TensorFlow Playground

A program that visualizes how different hyperparameters influence model (primarily neural network) training. Go to http://playground.tensorflow.org to experiment with TensorFlow Playground.


TensorFlow Serving

A platform to deploy trained models in production.


test set

The subset of the data set that you use to test your model after the model has gone through initial vetting by the validation set.

Contrast with training set and validation set.


tf.Example

A standard protocol buffer for describing input data for machine learning model training or inference.


training

The process of determining the ideal parameters comprising a model.


training set

The subset of the data set used to train a model.

Contrast with validation set and test set.


true negative (TN)

An example in which the model correctly predicted the negative class. For example, the model inferred that a particular email message was not spam, and that email message really was not spam.


true positive (TP)

An example in which the model correctly predicted the positive class. For example, the model inferred that a particular email message was spam, and that email message really was spam.


true positive rate (TP rate)

Synonym for recall. That is:

$$\text{True Positive Rate} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$
True positive rate is the y-axis in an ROC curve.

U


unlabeled example

An example that contains features but no label. Unlabeled examples are the input to inference. In semi-supervised and unsupervised learning, unlabeled examples are used during training.


unsupervised machine learning

Training a model to find patterns in a data set, typically an unlabeled data set.

The most common use of unsupervised machine learning is to cluster data into groups of similar examples. For example, an unsupervised machine learning algorithm can cluster songs together based on various properties of the music. The resulting clusters can become an input to other machine learning algorithms (for example, to a music recommendation service). Clustering can be helpful in domains where true labels are hard to obtain. For example, in domains such as anti-abuse and fraud, clusters can help humans better understand the data.

Another example of unsupervised machine learning is principal component analysis (PCA). For example, applying PCA on a data set containing the contents of millions of shopping carts might reveal that shopping carts containing lemons frequently also contain antacids.

Compare with supervised machine learning.

V


validation set

A subset of the data set (disjoint from the training set) that you use to adjust hyperparameters.

Contrast with training set and test set.

W


weight

A coefficient for a feature in a linear model, or an edge in a deep network. The goal of training a linear model is to determine the ideal weight for each feature. If a weight is 0, then its corresponding feature does not contribute to the model.


wide model

A linear model that typically has many sparse input features. We refer to it as "wide" since such a model is a special type of neural network with a large number of inputs that connect directly to the output node. Wide models are often easier to debug and inspect than deep models. Although wide models cannot express nonlinearities through hidden layers, they can use transformations such as feature crossing and bucketization to model nonlinearities in different ways.

Contrast with deep model.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 3.0 License, and code samples are licensed under the Apache 2.0 License. For details, see our Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated September 19, 2017.
Posted by uniqueone
https://m.facebook.com/story.php?story_fbid=377364479366220&id=303538826748786

30 Essential Data Science, Machine Learning, and Deep Learning Cheat Sheets

Python for Data Science
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/PythonForDataScience.pdf

Pandas Basics
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/PandasPythonForDataScience+(1).pdf

Pandas
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Pandas_Cheat_Sheet_2.pdf

Numpy
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf

Scipy
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_SciPy_Cheat_Sheet_Linear_Algebra.pdf

Scikit-learn
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Scikit_Learn_Cheat_Sheet_Python.pdf

Matplotlib
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Matplotlib_Cheat_Sheet.pdf

Bokeh
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Bokeh_Cheat_Sheet.pdf

Base R
https://www.rstudio.com/resources/cheatsheets/

Advanced R
https://www.rstudio.com/resources/cheatsheets/

Caret
https://www.rstudio.com/resources/cheatsheets/

Data Import
https://www.rstudio.com/resources/cheatsheets/

Data Transformation with dplyr
https://www.rstudio.com/resources/cheatsheets/

R Markdown
https://www.rstudio.com/resources/cheatsheets/

R Studio IDE
https://github.com/rstudio/cheatsheets/raw/master/source/pdfs/rstudio-IDE-cheatsheet.pdf

Data Visualization
https://github.com/rstudio/cheatsheets/raw/master/source/pdfs/ggplot2-cheatsheet-2.1.pdf

Neural Network Architectures
http://www.asimovinstitute.org/neural-network-zoo/

Neural Network Cells
http://www.asimovinstitute.org/neural-network-zoo-prequel-cells-layers/

Neural Network Graphs
http://www.asimovinstitute.org/neural-network-zoo-prequel-cells-layers/

TensorFlow
https://www.altoros.com/tensorflow-cheat-sheet.html

Keras
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Keras_Cheat_Sheet_Python.pdf

Probability
https://static1.squarespace.com/static/54bf3241e4b0f0d81bf7ff36/t/55e9494fe4b011aed10e48e5/1441352015658/probability_cheatsheet.pdf

Statistics
http://web.mit.edu/~csvoss/Public/usabo/stats_handout.pdf

Linear Algebra
https://minireference.com/static/tutorials/linear_algebra_in_4_pages.pdf

Big O Complexity
http://bigocheatsheet.com/

Common Data Structure Operations
http://bigocheatsheet.com/

Common Sorting Algorithms
http://bigocheatsheet.com/

Data Structures
https://www.clear.rice.edu/comp160/data_cheat.html

SQL
http://www.sql-tutorial.net/sql-cheat-sheet.pdf
Posted by uniqueone
https://m.facebook.com/groups/255834461424286?view=permalink&id=513220265685703

Hello. After working through Professor Andrew Ng's Coursera course from start to finish, I organized the lecture slides.

Following last week's post, here are the slides for Week 3.
I hope they are helpful to many of you.

Thank you. :)

Week 3: http://www.kwangsiklee.com/ko/2017/07/corsera-machine-learning%EC%9C%BC%EB%A1%9C-%EA%B8%B0%EA%B3%84%ED%95%99%EC%8A%B5-%EB%B0%B0%EC%9A%B0%EA%B8%B0-week3/
Week 2: http://www.kwangsiklee.com/ko/2017/07/corsera-machine-learning%EC%9C%BC%EB%A1%9C-%EA%B8%B0%EA%B3%84%ED%95%99%EC%8A%B5-%EB%B0%B0%EC%9A%B0%EA%B8%B0-week2/
Week 1: http://www.kwangsiklee.com/ko/2017/07/corsera-machine-learning-week1-%EC%A0%95%EB%A6%AC/
Posted by uniqueone
,
https://m.facebook.com/story.php?story_fbid=340227976413204&id=303538826748786

Datasets for Data Science and Deep Learning
Most people learning data science skills work with real data. Working with the wrong data, however, can turn into a time-consuming and frustrating adventure.

While learning data science skills, the author wrote down rules to follow for choosing the right type of data.

1. Regression
-. Auto MPG dataset
https://archive.ics.uci.edu/ml/datasets/auto+mpg

-. UCI Machine Learning Repository
http://archive.ics.uci.edu/ml/index.php

2. Classification
-. Kaggle
http://www.kaggle.com/

3. Time series analysis
-. Time Series Data Library
https://datamarket.com/data/list/?q=provider:tsdl

-. Quandl
https://www.quandl.com/

4. Visualization
-. FlowingData
http://flowingdata.com/

-. subreddit r/dataisbeautiful
https://www.reddit.com/r/dataisbeautiful/

5. Natural language
-. Reddit comments
https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/

-. Twitter sentiment analysis
https://github.com/jonbruner/twitter-analysis

-. Spoken English database
http://www.linguistics.ucsb.edu/research/santa-barbara-corpus

-. SNAP datasets
https://snap.stanford.edu/data/index.html

6. Large datasets
-. ImageNet
http://www.image-net.org/

-. Face recognition
http://www.face-rec.org/databases/

-. Dogs vs. Cats
https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition

-. Tiny Image
http://horatio.cs.nyu.edu/mit/tiny/data/index.html

-. Indian Movie Face
http://cvit.iiit.ac.in/projects/IMFDB/

**By category and format
https://github.com/rasbt/pattern_classification/blob/master/resources/dataset_collections.md#dataset-repositories

**By topic
https://dreamtolearn.com/ryan/1001_datasets

**By location
https://opendatainception.io/

**Datasets on a variety of topics
https://www.reddit.com/r/datasets/

Source:
https://medium.com/startup-data-science/data-sets-to-play-with-while-learning-data-science-and-deep-learning-43eb92f28448
Posted by uniqueone
,

http://cristivlad.com/beginning-machine-learning-a-few-resources-subjective/

I’ve been meaning to write this post for a while now, because many people following the scikit-learn video tutorials and the ML group are asking for direction, as in resources for those who are just starting out.

So, I decided to put up a short and subjective list with some of the resources I’d recommend for this purpose. I’ve used some of these resources when I started out with ML. Practically, there are unlimited free resources online. You just have to search, pick something, and start putting in the work, which is probably one of the most important aspects of learning and developing any skill.

Since most of these resources involve knowledge of programming (especially Python), I am assuming you have decent skills. If you don’t, I’d suggest learning to program first. I’ll write a post about that in the future, but until then, you could start, hands-on, with the free Sololearn platform.

The following resources include, but are not limited to books, courses, lectures, posts, and Jupyter notebooks, just to name a few.

In my opinion, skill development with more than one type of resource can be fruitful. Spreading yourself too thin by trying to learn from too many places at once could be detrimental though. To illustrate, here’s what I consider a potentially good approach to studying ML on any given day (and I’d try to do skill development 6-7 days a week, for several hours each day):

– 1-2 hours reading from a programming book and coding along
– 1 hour watching a lecture, or a talk, or reading a research paper
– 30 minutes to 1 hour working through a course
– optional: reading 1-2 posts.

This would be a very intensive approach and it may lead to good results. These results are dependent on good sleep – in terms of quality (at night, at the right hours) and quantity (7-8 hours consistently).

Now, the short-list…

Starter Resources

  1. A few Courses

1.1. Machine Learning with Python – From CognitiveClass.ai

I put this at the top of the list because it not only goes through the basics of ML such as supervised vs. unsupervised learning, types of algorithms and popular models, but it also provides LABs, which are Jupyter notebooks where you practice what you learned during the video lectures. If you pass all weekly quizzes and the final exam, you’ll obtain a free course certificate. I took this course a while ago.

1.2. Intro to Machine Learning – From Udacity

Taught by Sebastian Thrun and Katie Malone, this is a ~10-week, self-paced, very interactive course. The videos are very short, but engaging; there are more than 400 videos that you have to go through. I specifically like this one because it is very hands-on and engaging, so it requires your active input. I enjoyed going through the Enron dataset.

1.3. Machine Learning – From Udacity

Taught by Michael Littman, Charles Isbell, and Pushkar Kolhe, this is a ~4-month, self-paced course, offered as CS7641 at Georgia Tech and it’s part of their Online Masters Degree.

1.4. Principles of Machine Learning – From EDX, part of a Microsoft Program

Taught by Dr. Steve Elston and Cynthia Rudin. It’s a 6-week, intermediate level course.

1.5. Machine Learning Crash Course – from Berkeley

A 3-part series going through some of the most important concepts of ML. The accompanying graphics are ‘stellar’ and aid the learning process tremendously.

Of course, there are many more ML courses on these online learning platforms and on other platforms as well (do a search with your preferred search engine).

If you’re ML savvy, you may be wondering why I am not mentioning Ng’s course. It’s not that I don’t recommend it; on the contrary, I do. But I’d suggest going through it only after you have a solid knowledge of the basics.

Additionally, here are the materials from two machine learning courses, Stanford’s and MIT’s. Some video lectures can be found in their YouTube playlists. Other big universities publish their courses openly on YouTube or on other video-sharing platforms. Find one or two and go through them diligently.

  2. Books

2.1. Python for Data Analysis – Wes McKinney

– to lay the foundation of working with ML related tools and libraries

2.2. Python Machine Learning – Sebastian Raschka

– reference book.

2.3. Introduction to Machine Learning with Python – Andreas Müller and Sarah Guido

– I’ve been using this book as inspiration material in my ML Youtube video series.

Going through these books hands-on (coding along) is critical. Each of them has its own GitHub repository of Jupyter notebooks, which makes it even easier to get your hands on the code.

Strong ML skills imply solid knowledge of the mathematics, statistics and probability theory behind the algorithms, on top of the programming skills. Once you have a conceptual understanding of ML, you should study its complexities.

Here’s a list of free books and resources to help you along. It is relevant to ML and data mining, deep learning, natural language processing, neural networks, linear algebra (!!!), and probability and statistics (!!!).

  3. Videos and Playlists

3.1. Luis Serrano – A friendly Introduction to Machine Learning

– one of the best-explained video tutorials that I went through. No wonder Luis teaches with Udacity. His other videos on neural networks bring the same level of quality!

3.2. Roshan – Machine Learning – Video Series

– from setting up the environment to hands-on. Notebooks are also available.

3.3. Machine Learning with Scikit-Learn (Scipy 2016) – Part 1 and Part 2

– taught, hands-on, by Muller and Raschka. Notebooks are available in the description of the videos. Similar videos by these authors are available in the ‘recommended’ section (on the right of the video).

At this point I realized I’ve been using the word ‘hands-on’ way too much. But that’s okay. I guess you get the point.

3.4. Machine Learning with Python – Sentdex Playlist

– Sentdex needs no introduction. His current ML playlist consists of 72 videos.

3.5. Machine Learning with Scikit-Learn – Cristi Vlad Playlist

This is my own playlist. It currently has 27 videos and I’m posting new ones every few days. I’m working with scikit-learn on the cancer dataset and I explore different ML algorithms and models.

3.6. Machine Learning APIs by Example – Google Developers

– presented at the 2017 Google I/O Conference.

3.7. Practical Introduction to Machine Learning – Sam Hames

– tutorial from 2016 PyCon Australia.

3.8. Machine Learning Recipes – with Josh Gordon

– from Google Developers.

To find similar channels you can search for anything related to ‘pycon’, ‘pydata’, ‘enthought python’, etc. I also remind you that many top universities and companies post their courses, lectures, and talks on their video channels. Look them up.

  4. Others

4.1. Machine Learning 101 – from BigML

“In this page you will find a set of useful articles, videos and blog posts from independent experts around the world that will gently introduce you to the basic concepts and techniques of Machine Learning.”

4.2. Learning Machine Learning – EliteDataScience

4.3. Top-down learning path: Machine Learning for Software Engineers

– a collection of resources from a self-taught software engineer, Nam Vu, who set out to study roughly 4 hours a night.

4.4. Machine Learning Mastery – by Dr. Jason Brownlee

Concluding Thoughts

To reiterate, there is an unlimited number of free and paid resources that you can learn from. To try to include too many is futile and could be counterproductive. Here I only presented a few personal picks and I suggested ways to search for others if these do not appeal to you.

Remember, to be successful in skill development, I’d recommend an eclectic approach by learning and practicing from a combination of different types of resources at the same time (just a few) for a couple of hours every day.

Learning from courses, hands-on lectures, talks, and presentations, books (hands-on) and Jupyter notebooks is a very demanding and intensive approach that could lead to good results if you are consistent. Good sleep is crucial for skill development. Enjoy the ride!

Posted by uniqueone
,
https://www.facebook.com/groups/bigdatastatistics/permalink/1933151076737646/

Is there any PDF or E-book which is helpful for beginners to learn Statistical concepts such as Regression , Clustering etc.

Elementary Statistics: A Step by Step Approach

Yeah, Introduction to Statistical Learning is a very good book. It is not heavy on the math, and takes a more hands-on approach (in R). If you want to know more of the math, try "Elements of Statistical Learning" by the same authors.

Introduction to Statistical Learning, and once you are done, Elements of Statistical Learning + Pattern Recognition and Machine Learning (Bishop). The first two are freely available online (PDF) - legally.

Links:
http://www-bcf.usc.edu/~gareth/ISL/
https://statweb.stanford.edu/~tibs/ElemStatLearn/
https://books.google.com.au/books/about/Pattern_Recognition_and_Machine_Learning.html?id=kTNoQgAACAAJ&source=kp_cover&redir_esc=y&hl=en

cheers

PS Only the first is for beginners. Sort of ...

cheers guys.

Petru Daniel Tudosiu, it really depends on your math background. ISL has much less material than ESL but is more pedagogical in nature, with easy worked examples. If you feel strong in math, I suggest going directly to ESL (that's what I did as well) and, if you need to, checking topics in ISL.

Also, alongside what Foivos suggested (very good books), I would add All of Statistics by Larry Wasserman (same level as Elements of Statistical Learning) and Python Machine Learning by Sebastian Raschka (good for learning to implement ML).

Additionally, it might be worth it to look into iPython/Jupyter notebooks or R markdown files that accompany some of these books. An example (easily found with Google): https://github.com/JWarmenhoven/ISLR-python

I found this reference to be much more appropriate than the canonical texts like Elements of statistical learning, which is not for beginners at all.

Data Science for Business strikes a nice balance between the foundational concepts and how to actually use them

https://books.google.com/books/about/Data_Science_for_Business.html?id=_1b4nAEACAAJ

Posted by uniqueone
,
https://www.facebook.com/groups/TensorFlowKR/permalink/454824168191980/

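SNU TF study group

https://goo.gl/ihvrGV

If you follow the link above, you will find the presentation materials collected since the first cohort, and you are free to download them. (They will continue to be updated.^^) If you have any questions about the study group, please leave a comment or contact me. :)

Have a great day, everyone. Thank you!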

Posted by uniqueone
,

PRML
http://norman3.github.io/prml/
Posted by uniqueone
,

https://youtu.be/OB1reY6IX-o

Tutorial materials found here: https://github.com/amueller/scipy-2016-sklearn

See the complete SciPy 2016 Conference talk & tutorial playlist here: https://www.youtube.com/playlist?list=PLYx7XA2nY5Gf37zYZMw6OqGFRPjB1jCy6

Posted by uniqueone
,

List of Free Must-Read Books for Machine Learning
http://blog.paralleldots.com/technology/machine-learning/list-of-free-must-read-books-for-machine-learning/?utm_source=forum&utm_medium=group_post&utm_campaign=Data+Tau+

 

In this article, we have listed some of the best free machine learning books that you should consider going through (no order in particular).

Mining of Massive Datasets

Jure Leskovec, Anand Rajaraman, Jeff Ullman

Based on the Stanford Computer Science courses CS246 and CS345A, this book is aimed at Computer Science undergraduates and demands no prerequisites. It has been published by Cambridge University Press.

An Introduction to Statistical Learning (with applications in R)

Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

This book provides an introduction to statistical learning methods and includes a number of R labs.

Deep Learning

Ian Goodfellow and Yoshua Bengio and Aaron Courville

This Deep Learning textbook is designed for those in the early stages of Machine Learning and Deep learning in particular. The online version of the book is available now for free.

Bayesian methods for hackers

Cam Davidson-Pilon

This book introduces you to Bayesian methods and probabilistic programming from a computational point of view. The book is basically a godsend for those having a loose grip on mathematics.

Understanding Machine Learning: From Theory to Algorithms

Shai Shalev-Shwartz and Shai Ben-David

For mathematically savvy readers, this is one of the most recommended books for understanding the magic behind Machine Learning.

Deep Learning Tutorial

LISA lab, University of Montreal

The Deep Learning tutorial using Theano is a must-read if you want to enter this field, and it is absolutely free.

Scikit-Learn Tutorial: Statistical-Learning for Scientific Data Processing

Andreas Mueller

Exploring statistical learning, this tutorial explains the use of machine learning techniques with the aim of statistical inference. The tutorial can be accessed online for free.

Machine Learning (An Algorithmic Perspective)

Stephen Marsland

This book has a lot to offer to Engineering and Computer Science students studying Machine Learning and Artificial Intelligence. Published by CRC Press and written by Stephen Marsland, this book is unfortunately not free. However, we highly recommend investing in it. Also, all the Python code is available online and is a great reference for learning Python.

Building Machine Learning Systems with Python

Willi Richert and Luis Pedro Coelho

This book is also not available for free, but including it does our list justice. It is an ultimate hands-on guide to getting the most out of Machine Learning with Python.

These are some of the finest books that we recommend. Have something else in mind? Comment below and contribute to the list. 

Posted by uniqueone
,
https://www.mathworks.com/discovery/regularization.html

Prevent statistical overfitting with regularization techniques

Regularization techniques are used to prevent statistical overfitting in a predictive model. By introducing additional information into the model, regularization algorithms can deal with multicollinearity and redundant predictors, making the model more parsimonious and accurate. These algorithms typically work by applying a penalty for complexity, for example by adding a term based on the model coefficients to the quantity being minimized, or by including a roughness penalty.

Techniques and algorithms important for regularization include ridge regression (also known as Tikhonov regularization), lasso and elastic net algorithms, as well as trace plots and cross-validated mean square error. You can also apply the Akaike Information Criterion (AIC) as a goodness-of-fit metric.
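As a rough illustration of the penalty idea, here is a minimal Python sketch of ridge regression with a single predictor. It is not MATLAB code and does not use the Statistics and Machine Learning Toolbox; the function name `ridge_slope` and the toy data are invented for this example, and the closed form assumes centered data with an unpenalized intercept.

```python
def ridge_slope(x, y, lam):
    """Slope of a one-predictor ridge fit on centered data.

    Minimizes sum((y_c - b * x_c)**2) + lam * b**2, whose closed-form
    solution is b = sum(x_c * y_c) / (sum(x_c**2) + lam).
    """
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / (sxx + lam)


# Hypothetical toy data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]

# lam = 0 is ordinary least squares; larger penalties shrink the slope.
for lam in (0.0, 1.0, 10.0, 100.0):
    print(f"lambda={lam:6.1f}  slope={ridge_slope(x, y, lam):.4f}")
```

Increasing `lam` pulls the slope toward zero, which is the shrinkage that makes a regularized model more parsimonious.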

For more information on regularization techniques, please see Statistics and Machine Learning Toolbox.

See also: Statistics and Machine Learning Toolbox, Machine Learning

Posted by uniqueone
,

Time Series Forecasting with Python 7-Day Mini-Course - Machine Learning Mastery
http://machinelearningmastery.com/time-series-forecasting-python-mini-course/
Posted by uniqueone
,

https://www.facebook.com/groups/DeepNetGroup/permalink/385843868475168/

News Flash: Check out Issue #5 (http://aidl.io/) of AIDL Weekly! We have a Special Issue on Self-Driving Cars this week.
Also woohoo! Check out Episode 4 of AIDL Office Hours (https://www.youtube.com/watch?v=Qqtc9_05yLc)! Han Shu from Airbnb talks about his very interesting experience in data science and speech recognition.

Welcome! Welcome! We are the most active FB group for Artificial Intelligence/Deep Learning, or AIDL.  Many of our members are knowledgeable so feel free to ask questions. 

We have a tied-in newsletter: https://aidlweekly.curated.co/  and

a YouTube-channel, with (kinda) weekly show "AIDL Office Hour",
https://www.youtube.com/channel/UC3YM5TEbSqIpFGH85d6gjKg

Posting is strict at AIDL: your post has to be relevant, accurate and non-commercial (FAQ Q12). Commercial posts are only allowed on Saturday. If you don't follow this rule, you might be banned.

FAQ:

Q1: How do I start AI/ML/DL?
A:
Step 1: Learn some Math and Programming,
Step 2: Take some beginner classes. e.g. Try out Ng's Machine Learning.
Step 3: Find some problem to play with. Kaggle has tons of such tasks.
Iterate the above 3 steps until you become bored. From time to time you can share what you learn.

Here's a post which one of us (Arthur) wrote up.  It summarizes his experience in learning machine learning and you might find it useful. (http://thegrandjanitor.com/2016/05/21/learning-machine-learning-some-personal-experience/)

Q2: What is your recommended first class for ML?
A: Ng's Coursera class and the Caltech edX class. The UW Coursera class is also pretty good.

Q3: What are your recommended classes for DL?
A: Go through at least 1 or 2 ML classes, then go for Hinton's, Karpathy's, Socher's, Larochelle's and de Freitas's. For deep reinforcement learning, go with Silver's and Schulman's lectures. Also see Q4.

Q4: How do you compare different resources on machine learning/deep learning?
A: (Shameless self-promoting plug by Arthur) Here is an article, "Learning Deep Learning - Top-5 Resources", written by me (Arthur) on different resources and their prerequisites. I have referred to it a couple of times at AIDL, and you might find it useful: http://thegrandjanitor.com/2016/08/15/learning-deep-learning-my-top-five-resource/ .

Other than my own list, here are some very good lists I recommend you to read through as well,
* YerevaNN Lab's "A Guide to Deep Learning" (http://yerevann.com/a-guide-to-deep-learning/)
* Reddit's machine learning FAQ has another list of great resources as well.

Q5: How do I use machine learning technique X with language L?
A: Google is your friend. You might also see a lot of us referring you to Google from time to time. That's because your question is best answered by Google.

Q6: Explain concept Y. List 3 properties of concept Y.
A: Google. Also we don't do your homework. If you couldn't Google the term though, it's fair to ask questions.

Q7: What are the most recommended resources on deep learning for computer vision?
A: cs231n. The 2016 edition is the one I recommend. Most other resources you will find are derivative in nature or have glaring problems.

Q8: What are the prerequisites of Machine Learning/Deep Learning?
A: Mostly Linear Algebra and Calculus I-III. In Linear Algebra, you should be good at eigenvectors and matrix operations. In Calculus, you should be quite comfortable with differentiation. You might also want a primer on matrix differentiation before you start, because it's a topic that is seldom touched in an undergraduate curriculum.
Some people will also argue that Topology is important and that a Physics or Biology background could help. But they are not crucial to start.

Q9: What are the cool research papers to read in Deep Learning?
A: We think songrotek's list is pretty good: https://github.com/songrotek/Deep-Learning-Papers-Reading-Roadmap. Another classic is deeplearning.net's reading list: http://deeplearning.net/reading-list/.

Q10: What is the best/most recommended language in Deep Learning/AI?
A: Python is usually cited as a good language because it has the best library support. Most Python ML libraries link against C/C++, so you get the best of both flexibility and speed.
Others also cite Java (deeplearning4j), Lua (Torch), Lisp, Golang, R. It really depends on your purpose. Practical concerns such as code integration and your familiarity with a language usually dictate your choice. R deserves special mention because it has been widely used in sibling fields such as data science and it is gaining popularity.

Q11: I am bad at Math/Programming. Can I still learn A.I/D.L?
A: Mostly you can tag along, but at a certain point, if you don't understand the underlying Math, you won't be able to understand what you are doing. The same goes for programming: if you never implement an algorithm, or trace one yourself, you will never truly understand why it behaves a certain way.
So what if you feel you are bad at Math? Don't beat yourself up too much. Take Barbara Oakley's class on "Learning How to Learn"; you will learn more about how to approach tough subjects such as Mathematics, Physics and Programming.

Q12: Would you explain more about AIDL's posting requirement?
A: This is a frustrating topic for many posters, despite their good intentions. I suggest you read through this blog post http://thegrandjanitor.com/2017/01/26/posting-on-aidl/ before you start any posting.

Q13: Is there a list of common public datasets?
A: Take a look at the collection from our members: https://www.facebook.com/groups/DeepNetGroup/permalink/394240667635488/
Posted by uniqueone
,


http://www.datasciencecentral.com/profiles/blogs/r-python-machine-learning-dataviz-most-popular-resources


A simple way to find great articles and resources on popular subjects such as data science, machine learning, deep learning, Python, R , data sets, dataviz, IoT, AI - or even Excel - is to use our data science search engine. This page, populated with pre-selected queries, is an excellent starting point. The search box can be found on DSC and all our channels, on all pages, at the very top, on the right-hand side. For instance, click here to find results for Python.


Below is a selection of highly popular articles posted on DSC, over the last few years. Enjoy the reading!
66 job interview questions for data scientists
20 short tutorials all data scientists should read (and practice)
Data Science Cheat Sheet
16 analytic disciplines compared to data science
24 Data Science, R, Python, Excel, and Machine Learning Cheat Sheets
4 easy steps to becoming a data scientist
27 free data mining books
Tutorial: How to detect spurious correlations, and how to find the ...
Visualizations: Comparing Tableau, SPSS, R, Excel, Matlab, JS, Pyth...
R Tutorial for Beginners: A Quick Start-Up Kit
Great Github list of public data sets
38 Seminal Articles Every Data Scientist Should Read
Data scientist paid $500k can barely code!
Jackknife logistic and linear regression for clustering and predict...
10 Python Machine Learning Projects on GitHub
Cheat Sheet: Data Visualization with R
Python (and R) for Data Science - sample code, libraries, projects,...
Posted by uniqueone
,

Facebook, Data Mining / Machine Learning / AI

I'm a programmer and I'm serious about learning Machine Learning/Neural Networks. The issue is that I have no idea where to start. Can anyone suggest a particular course, or more generally a place to start from?

----------------------------------------------------

1. Andrew Ng's Machine Learning course on Coursera

https://www.coursera.org/learn/machine-learning

2. Essence of linear algebra preview

https://www.youtube.com/watch?v=kjBOesZCoqc&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab

3. 7 Steps to Mastering Machine Learning With Python

http://www.kdnuggets.com/2015/11/seven-steps-machine-learning-python.html/2

4. youtube sirajology

https://www.youtube.com/channel/UCWN3xxRkmTPmbKwht9FuE5A

5. Michael Nielsen

http://neuralnetworksanddeeplearning.com/chap1.html

6. https://bigdatauniversity.com/

7. Udacity Deep Learning Foundations

8. Udemy course by Kirill and Hadelin if you are extremely new to data science.

superdatascience.com

9. Geoffrey Hinton's neural net course on Coursera

10. Welch Labs Neural Networks Demystified

https://www.youtube.com/watch?v=bxe2T-V8XRs&list=PLiaHhY2iBX9hdHaRr6b7XevZtgZRa1PoU

11. Advice and recommended sites for beginners

https://vmanju.com/how-to-get-started-with-artificial-intelligence-90f14b2bc321#.k4vef6fd3

Explains how to install Caffe and how it works.

https://github.com/humphd/have-fun-with-machine-learning/blob/master/README.md

12. An Introduction to Statistical Learning with Applications in R

http://www-bcf.usc.edu/~gareth/ISL/

Posted by uniqueone
,

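http://geum.tistory.com/3

Among multivariable analysis methods, three are widely used in medical statistics.

1) Multiple linear regression
  : y_i = α + β_1 x_i1 + β_2 x_i2 + β_3 x_i3 + ... + β_k x_ik + ε

2) Logistic regression
  : ln[p/(1-p)] = α + β_1 x_i1 + β_2 x_i2 + β_3 x_i3 + ... + β_k x_ik

3) Proportional hazards regression
  : λ_i[t] = λ_0[t] exp(α + β_1 x_i1 + β_2 x_i2 + β_3 x_i3 + ... + β_k x_ik)

Multivariable analysis, as the name suggests, puts several independent variables into one statistical model so that their effects can be adjusted for one another, interactions can be computed, and the dependent variable can be predicted more accurately.

All three models predict the quantity on the left-hand side (y_i, ln[p/(1-p)], λ_i[t]) with a linear model (α + β_1 x_i1 + β_2 x_i2 + β_3 x_i3 + ... + β_k x_ik + ε). The Generalized Linear Model (GLM) is the framework that generalizes this kind of linear model.

Here, "generalizing" means that such a linear model can be used to build and analyze not only the three statistical models above but a wide variety of others.

However, the "functional form" that ties the linear model to the dependent variable differs among the three models. In logistic regression 2), for example, the tie has the form ln[p/(1-p)]. The function that connects the linear model of the independent variables to the dependent variable is called the link function.

The distribution of the dependent variable also varies (normal, binomial, Poisson, and so on), and GLM lets you combine different distributions and link functions to build and analyze a variety of statistical models.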
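The link functions mentioned here can be written down directly. The sketch below is in Python rather than Stata, and it only illustrates the idea: identity, logit, and log are the canonical links for the normal, binomial, and Poisson families, while the dictionary layout and the helper name `round_trip` are invented for this example.

```python
import math

# Each entry: family -> (link g, inverse link g^-1).
# g maps the mean of the dependent variable to the linear predictor.
LINKS = {
    "normal (identity)": (lambda mu: mu, lambda eta: eta),
    "binomial (logit)": (lambda p: math.log(p / (1.0 - p)),
                         lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "poisson (log)": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
}


def round_trip(name, mean):
    """Apply the link and then its inverse; this should return the mean."""
    link, inverse = LINKS[name]
    return inverse(link(mean))


for name, (link, _) in LINKS.items():
    print(f"{name:18s} g(0.25) = {link(0.25):+.4f}")
```

The inverse link is what turns a fitted linear predictor back into a mean, for example a probability of death in the logistic case.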


Let us analyze a simple logistic regression with GLM.

apache  fate
0         Alive
2         Alive
3         Alive
4         Alive
5         Alive
6         Alive
7         Alive
8         Alive
9         Alive
10       Alive
11       Alive
12       Alive
13       Alive
14       Alive
15       Alive
16       Alive
17       Dead
18       Dead
19       Alive
20       Alive
21       Dead
22       Dead
23       Alive
24       Dead
25       Dead
26       Dead
27       Alive
28       Dead
29       Dead
30       Dead
31       Dead
32       Dead
33       Dead
34       Dead
35       Dead
36       Dead
37       Dead
41       Alive

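These are data from 38 patients; the independent variable is the APACHE score and the dependent variable is whether the patient died.
Because the dependent variable is binary, one could simply run a logistic regression and obtain the odds ratio (OR, based on the odds p/(1-p)) to see how well the APACHE score predicts death, but the same analysis is also possible with GLM.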

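(The following is the command that runs the GLM in STATA.)

glm fate apache, family(binomial) link(logit)

-> glm [command that runs a GLM] fate [name of the dependent variable] apache [name of the independent variable], family(binomial) [distribution] link(logit) [link function]

Running it gives the regression coefficients of the linear model, and a simple formula then yields each patient's probability of death. Plotted as a graph, the result is as follows.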


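The points mark the APACHE score and the actual outcome of each observed patient, and the curve shows the probability of death that the fitted GLM predicts for each APACHE score. (This differs slightly from the classification histogram that SPSS shows when you run a logistic regression.)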
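For readers without Stata, the same binomial/logit fit can be sketched in plain Python. This is a minimal illustration under stated assumptions, not what Stata's glm does internally: it uses Newton's method (equivalent to iteratively reweighted least squares for this model), the data are copied from the table above, and the helper names `sigmoid` and `fit_logistic` are invented for the example.

```python
import math

# APACHE score and outcome (1 = Dead, 0 = Alive), copied from the table above.
apache = [0] + list(range(2, 38)) + [41]
dead_scores = {17, 18, 21, 22, 24, 25, 26, 28, 29, 30,
               31, 32, 33, 34, 35, 36, 37}
fate = [1 if s in dead_scores else 0 for s in apache]


def sigmoid(z):
    """Inverse of the logit link, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(x, y, iterations=25):
    """Fit ln[p/(1-p)] = a + b*x by Newton's method."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [sigmoid(a + b * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Gradient of the log-likelihood with respect to (a, b).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Entries of the 2x2 information matrix.
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


a, b = fit_logistic(apache, fate)
print(f"intercept = {a:.3f}, slope = {b:.3f}, "
      f"odds ratio per APACHE point = {math.exp(b):.3f}")
for score in (5, 20, 35):
    print(f"APACHE {score}: P(death) = {sigmoid(a + b * score):.2f}")
```

The "simple formula" for each patient's probability of death is the inverse logit of a + b × score, and exp(b) is the odds ratio for a one-point increase in the APACHE score.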


cf) In practice, GLM is rarely used for proportional hazards regression.
In medical statistics, GLM is mainly used for log-linear models and Poisson regression models.

Posted by uniqueone
,
An Interactive Tutorial on Numerical Optimization
http://www.benfrederickson.com/numerical-optimization/
Posted by uniqueone
,

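http://shb.skku.edu/bigs/menu3/sub01.jsp

Books for a mathematical understanding of big data (machine learning / pattern analysis)

- Yoon Mo Jung (정윤모), Professor of Mathematics, Sungkyunkwan University

  1. Data science

    Mathematics needed for data science

    List of books for studying data science

Data science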


  The broad terms for this field are big data and artificial intelligence; narrowing down to the mathematical part, the focus becomes machine learning or pattern recognition/classification. As the importance of big data keeps growing, the field is also called data science, and as the data science Venn diagram (see the link below) shows, mathematics is a core component of data science.

  Here, hacking skills means computer programming ability, in particular numerical coding ability and algorithmic thinking, while substantive expertise means knowledge of the data itself: how the data were obtained and what content and information they carry. Another name for it is domain knowledge. For details, see Drew Conway's homepage:

http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

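Mathematics needed for data science


  The essential mathematics for big data or data science can be summarized as probability/statistics, numerical linear algebra, and optimization. The hacking skills in the data science Venn diagram can be shortened to programming ability; basic programming ability in R or Python, or both, is enough.

The order in which the mathematical knowledge is needed can be laid out schematically as in the chart below.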

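List of books for studying data science


  This list picks books worth reading to build knowledge in the big data field, preferring books that can be obtained free of charge over the internet. The selection assumes the mathematical level of a third- or fourth-year mathematics undergraduate or an entering graduate student. To learn anything, you need a minimum of background knowledge.

  Reading a book from the first chapter to the last is absolutely not recommended. This is not reading a novel. It is a waste of time. Work out what you need and what you have to read. Knowledge learned and used when it was needed is knowledge properly learned, and it lasts. What matters most is not 'what you know' but 'what you can do'.

  When choosing books, very thick ones are not a good choice. With books in English especially, even if your English is fluent, your reading comprehension and speed will inevitably be far below those of native readers. So it is better to choose a book that explains just the essentials concisely, with good equations, figures, and tables, than one that explains everything at length in words. Prefer books that are close to lecture notes, containing only the essentials, over encyclopedic, comprehensive ones. When you have no choice but to read a long-winded book, you need the ability to pick out and read the parts that are necessary and important to you. Looking up just the part you need each time you hit something you do not know is also a way to learn quickly and well.

  Short lectures on YouTube, lecture notes available on the internet, and lecture slides are good for grasping the core content quickly or for finding out which parts are truly necessary. Using massive open online courses (MOOCs) is also a good option.

  Experience and personal taste have no doubt had an influence, but the selection draws on various sources on the internet and focuses on widely used books. You can find similar lists on the internet and will see that the lists have a lot in common. Recommendations of important books missing through carelessness, mistake, or ignorance are always welcome, and I would be grateful to hear of any broken links or outdated information.

  The book lists at the following links are also useful.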

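A list of recommended statistics books made by graduate students of the UC Berkeley Department of Statistics:

https://www.stat.berkeley.edu/mediawiki/index.php/Recommended_Books#Theoretical_Statistics

The following page collects free books in the data science field. It probably also links to books recommended here.

https://www.analyticsvidhya.com/blog/2016/02/free-read-books-statistics-mathematics-data-science/

  If you have a reasonable amount of mathematical knowledge, you can simply start right away. If you find your mathematics lacking while studying, fill the gap and come back. Whether it is pattern analysis or mathematics, study whatever you need as the need arises. This field is currently one of the hottest topics, so there are many good lecture notes, books, and online courses such as MOOCs on the internet; using them is also a good option.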

Pattern Recognition and Machine Learning, by Christopher Bishop, Springer

- 대표적인 교재이지만 조금 두껍고 장황하다. 영어가 능숙하다면 말로 잘 풀어서 설명했으므로 초보자가 이해하기 좋지만 그렇지 않다면 읽는데 시간이 걸린다.

 

The Elements of Statistical Learning, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman, Springer

http://statweb.stanford.edu/~tibs/ElemStatLearn/

- 대학원 수준의 표준 교재이다. 이 분야의 바이블같은 책이지만 초심자에게는 어려울 수 있다.

 

An Introduction to Statistical Learning: with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, Springer

http://www-bcf.usc.edu/~gareth/ISL/

- 위 책의 저자가 쓴 학부 수준의 교재로 R을 같이 공부할 수 있다.

 

Pattern Classification, by Richard O. Duda, Peter E. Hart, and David G. Stork, Wiley-Interscience

- 과거 이 분야에서 바이블처럼 보던 교재이다. 번역도 돼있고, Bishop 책보다는 읽기 쉬울 수 있다.

 

Data Mining And Analysis, by Mohammed J. Zaki, and Wagner Meira Jr, Cambridge University Press

http://www.dataminingbook.info/pmwiki.php/Main/BookDownload

- In this field many published books are also freely available online, and using them is a good approach. This one can be downloaded for personal online use.

 

Deep Learning, by Ian Goodfellow, Yoshua Bengio, Aaron Courville, MIT Press

https://github.com/HFTrader/DeepLearningBook

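- A textbook on deep learning, which gives the best results on many machine learning problems and has recently become known to the general public through AlphaGo and similar systems. It is scheduled for publication at the end of 2016.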

 

Understanding Machine Learning: From Theory to Algorithms, by Shai Shalev-Shwartz, Shai Ben-David, Cambridge University Press

http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/

- A good book once you understand the basics and want to approach the subject from a more mathematical standpoint.

 

A Probabilistic Theory of Pattern Recognition, by Luc Devroye, Laszlo Gyorfi, and Gabor Lugosi, Springer

- A full-fledged book on the mathematical theory of pattern analysis.

 

 

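  The probability and statistics material summarized in the opening chapters of machine learning and pattern analysis textbooks is also helpful. It presents the probability and statistics needed for the field in an organized way and, in particular, includes many good figures that aid understanding. One example is the following book, already recommended above under pattern analysis and machine learning.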

 

Data Mining And Analysis, by Mohammed J. Zaki, and Wagner Meira Jr, Cambridge University Press

http://www.dataminingbook.info/pmwiki.php/Main/BookDownload

www.cs.rpi.edu/~zaki/dataminingbook

 

All of Statistics: A Concise Course in Statistical Inference, by Larry Wasserman, Springer

- An introductory statistics textbook written with applications in machine learning and data science in mind.

 

Theoretical Statistics, by Robert W. Keener, Springer

- A basic graduate-level statistics textbook.

 

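For probability, I will not list introductory books that do not use measure theory. Refer instead to the course materials, lecture videos, and other resources available online. What matters is grasping the meaning of randomness and of a random variable, and then the meaning of basic results such as the law of large numbers and the central limit theorem.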

 

Probability with Martingales, by David Williams, Cambridge University Press

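- Once you understand randomness and random variables and want a more mathematical approach, you need to study measure-theoretic books. This one is concise and summarizes the key points well.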

 

Probability: Theory and Examples, by Rick Durrett, Cambridge University Press

https://services.math.duke.edu/~rtd/PTE/pte.html

- The standard graduate-level measure-theoretic textbook, used at many universities.

 

Once you understand basic probability and statistics, it is worth learning the basics of stochastic processes, such as random walks and Markov chains.

 

Markov Chains, by Norris, Cambridge University Press

- A good introduction to Markov chains.

 

Introduction to Stochastic Processes, by Gregory F. Lawler, Chapman and Hall

- A basic textbook on stochastic processes.

 

Markov Chains and Mixing Times, by Levin, Peres and Wilmer, American Mathematical Society

http://pages.uoregon.edu/dlevin/MARKOV/

- A basic textbook on stochastic processes, available for download online.

 

 

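  Structured data is represented as matrices. The ability to understand the properties of matrices and compute with them is therefore fundamental to data processing. It is also not far off to say that all scientific computing ultimately reduces to problems in numerical linear algebra.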

 

Numerical Linear Algebra, by Lloyd N. Trefethen, and David Bau III, SIAM

- The standard textbook on numerical linear algebra.

 

Applied Numerical Linear Algebra, by James W. Demmel, SIAM

- A standard textbook that introduces modeling and applications well.

 

Iterative Methods for Sparse Linear Systems, by Yousef Saad, SIAM

http://www-users.cs.umn.edu/~saad/IterMethBook_2ndEd.pdf

- The basic textbook on iterative methods for solving large sparse linear systems. Sparsity and iteration are fundamental ideas in processing big data.

 

Matrix Computations, by Gene H. Golub, and Charles F. Van Loan, Johns Hopkins University Press

- A classic that compiles the methods of matrix computation.

 

Matrix Analysis, by Roger A. Horn, and Charles R. Johnson, Cambridge University Press

- A theoretical book for studying matrix theory in depth. Imported copies are available in Korea.

 

 

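  In pattern analysis and machine learning, the answer is in many cases the candidate with the highest probability, so optimization problems arise at a fundamental level.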

 

Convex Optimization, by Stephen Boyd, Lieven Vandenberghe, Cambridge University Press

http://stanford.edu/~boyd/cvxbook/

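- The basic textbook, available for download online. It is also used as a graduate engineering textbook, so it can be read without a deep mathematical background. Reading Chapter 5, which covers the theoretical background, and Chapters 9, 10, and 11, which cover algorithms, is enough.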

 

Nonlinear Programming, by Dimitri P. Bertsekas, Athena Scientific

- A graduate-level textbook for studying optimization in more depth.

 

Numerical Optimization, by Jorge Nocedal, Stephen Wright, Springer

- A graduate-level textbook.

 

Convex Analysis, by Ralph Tyrrell Rockafellar, Princeton University Press

- The classic of convex analysis, and the book to consult when you need the theory or run into something you do not know. It is very formal and dry to use as a textbook; treat it as a reference.

 

 

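  Vector calculus was not mentioned above because, like trigonometry or calculus, it should be known as a matter of course. The problem is that many people, mathematics graduates in particular, do not know the subject well.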

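Working with matrices is closely tied to multidimensional linear analysis, so a basic grounding in understanding and analyzing multidimensional vectors is essential. Basic multidimensional concepts such as gradient, divergence, and curl are also essential for stochastic-process approaches such as random walks and diffusion processes, and for understanding the geometry of data.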

 

Vector Calculus, by Jerrold E. Marsden, and Anthony Tromba, W. H. Freeman

- The standard textbook at the second- to third-year undergraduate level. It has tended to grow thicker and wordier with each edition.

 

Advanced Engineering Mathematics, by Erwin Kreyszig, Wiley

- Reading only the vector calculus part of this book is also a good option. It explains the material well, together with its physical meaning.

 

 

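Because of the traditional computer science curriculum, many people think they must know and use C/C++ or Java, but this is a serious misconception. What matters is not a language that can be optimized for the hardware but one that lets you work at a high level of abstraction and has many relevant packages and libraries, so that developer time is reduced. In other words, use a high-level programming language rather than one close to a low-level programming language, and a language that is comfortable for humans rather than one that is easy to optimize for hardware. Implementing your idea and checking that it is correct comes first; fast code comes after. If you really need fast code, you can simply hand it off to a programming specialist. See also my other post, 'Learning Computer Languages Quickly'.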

 

The following is a 2014 survey of which languages are used in the data field.

http://www.kdnuggets.com/2014/08/four-main-languages-analytics-data-mining-data-science.html

 

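As a first language, R or Python, or both, is enough, and there is a good chance these two will be all you ever need. If you want to go further toward the hardware side, SQL for database management and Hadoop for distributed processing on clusters are worth mentioning. For the pros and cons of R and Python and a comparison between them, see the links below.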

http://dataconomy.com/r-vs-python-the-data-science-wars/

http://www.kdnuggets.com/2015/05/r-vs-python-data-science.html

 

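I will not list reference books for R. R is a language specialized for statistics, arguably closer to a package than to a programming language, so you can pick up basic usage within a few hours. It should be enough to skim 'An Introduction to R' from the manuals page of the R website and try it out:

https://cran.r-project.org/manuals.html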

Another option is to learn R while studying An Introduction to Statistical Learning: with Applications in R, introduced above under pattern analysis and machine learning.

 

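Search for Python on amazon.com and the first result is Learning Python by Mark Lutz, which runs to 1,600 pages. Reading a 1,600-page book in English, not even in your native language, just to learn a computer language is madness. The same goes for Korean-language books that run to hundreds of pages. Above all, do not be fooled by the word 'bestseller'; most bestsellers are books for the computer-illiterate. Python is a language that is easy to learn.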

 

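A computer language is a language too. Just as you improve in a foreign language not by reading and memorizing but by speaking and writing it constantly, you improve in a computer language only by programming constantly. So if you truly do not know the syntax, find concise, cleanly organized lecture notes or a tutorial online, read them, and implement things yourself. The following is the Python documentation page; look for material that suits your level.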

https://www.python.org/doc/

 

See the tutorial as well.

https://docs.python.org/3/tutorial/index.html

https://docs.python.org/3/download.html

It was written by Guido van Rossum, the creator of the language; you can sense the philosophy behind the language's design, but it is not an easy read.

 

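Keep in mind that you should learn Python for scientific computing and data processing rather than as a general-purpose language. In that spirit, I recommend the following scientific computing lecture notes.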

 

Python Scientific lecture notes-EuroScipy tutorial team

www.scipy-lectures.org/_downloads/PythonScientific-simple.pdf

 

For data analysis I recommend the following book. A Korean translation is also available.

 

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, by Wes McKinney, O'Reilly Media

 

 

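  Applications of data science, such as self-driving cars, call for a variety of techniques related to signals, images, and video. Analyzing social networks requires graph theory, and more accurate computation requires the theory of scientific computing. I have selected a few books that seem important.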

 

Digital Image Processing, by Rafael C. Gonzalez, and Richard E. Woods, Pearson

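- The basic textbook on image processing. One of the fundamental goals of machine learning and artificial intelligence is to understand and imitate human vision. Imported copies are available in Korea.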

 

Computer Vision: Algorithms and Applications, by Richard Szeliski, Springer

http://szeliski.org/Book

- The basic textbook on computer vision. If image processing is the electrical engineering approach, computer vision is the computer science approach.

 

A Wavelet Tour of Signal Processing, by Stephane Mallat, Academic Press

- A masterpiece on wavelets and signal processing. It covers a wide range of topics, including image processing, machine learning, and sparsity theory.

 

Spectral Graph Theory, by Fan R. K. Chung, American Mathematical Society

- The basic textbook for understanding graph spectra. The first four chapters can be downloaded from the link below.

http://www.math.ucsd.edu/~fan/research/revised.html

 

Introduction to Algorithms, by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein, MIT Press

- The basic textbook on algorithms. Instead of arguing about C/C++ or Java, build up your algorithmic thinking. A Korean translation is available.

 

Scientific Computing, by Michael T. Heath, McGraw-Hill

- The basic textbook on scientific computing and numerical analysis, at the level of fourth-year undergraduates or first-year graduate students.

 

Numerical Analysis, by Richard L. Burden, J. Douglas Faires, and Annette M. Burden, Cengage Learning

- An undergraduate-level textbook on scientific computing and numerical analysis.

 

Elementary Differential Geometry, by A.N. Pressley, Springer

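- A basic textbook on differential geometry. Handling data well requires understanding the geometry of multidimensional spaces, all the more so when the nonlinearity of the data is prominent.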

 

 

 

 

Posted by uniqueone
Here are 13 books on Machine Learning and Data Mining that are great resources, references, and refreshers for Data Scientists. (This is definitely a small selective subsample of the many excellent books available.)
  1.  The Top Ten Algorithms in Data Mining, by Xindong Wu and Vipin Kumar (editors)
  2.  Learning from Data, by Y.Abu-Mostafa, M.Magdon-Ismail, H-S.Lin
  3.  Mining of Massive Datasets, by Jeffrey David Ullman and Anand Rajaraman
  4.  Handbook of Statistical Analysis and Data Mining Applications, by G.Miner, J.Elder, R.Nisbet
  5.  Machine Learning for Hackers, by Drew Conway and John Myles White
  6.  Mahout in Action, by S.Owen, R.Anil, T.Dunning, E.Friedman
  7.  Statistical and Machine-Learning Data Mining: Techniques for Better..., by Bruce Ratner
  8.  Networks, Crowds, and Markets: Reasoning About a Highly Connected W..., by David Easley and Jon Kleinberg
  9.  Bayesian Reasoning and Machine Learning, by David Barber
 10.  Ensemble Methods in Data Mining: Improving Accuracy Through Combini..., by Giovanni Seni and John Elder (Older Edition is also available)
 11.  Data Mining with R: Learning with Case Studies, by Luis Torgo
 12.  Using R for Data Management, Statistical Analysis, and Graphics, by Nicholas Horton and Ken Kleinman
 13.  Introduction to Data Mining, by P-N.Tan, M.Steinbach, V.Kumar
And for my astronomer friends, here are a couple of additional suggestions:
 14.  Statistics, Data Mining, and Machine Learning in Astronomy: A Pract..., by Z.Ivezic, A.Connolly, J.VanderPlas, A.Gray
 15.  Advances in Machine Learning and Data Mining for Astronomy, by M.Way, J.Scargle, K.Ali, A.Srivastava
Posted by uniqueone

box plot

Machine Learning 2017. 2. 28. 10:23
http://whatis.techtarget.com/definition/box-plot

 

box plot

Contributor(s): Stan Gibilisco

A box plot is a graphical rendition of statistical data based on the minimum, first quartile, median, third quartile, and maximum. The term "box plot" comes from the fact that the graph looks like a rectangle with lines extending from the top and bottom. Because of the extending lines, this type of graph is sometimes called a box-and-whisker plot.

In a typical box plot, the top of the rectangle indicates the third quartile, a horizontal line near the middle of the rectangle indicates the median, and the bottom of the rectangle indicates the first quartile. A vertical line extends from the top of the rectangle to indicate the maximum value, and another vertical line extends from the bottom of the rectangle to indicate the minimum value. The illustration shows a generic example of a box plot with the maximum, third quartile, median, first quartile, and minimum values labeled. The relative vertical spacing between the labels reflects the values of the variable in proportion.

A box plot can be placed on a coordinate plane resembling the Cartesian system, so that the five values, arranged vertically one above the other, run parallel to the dependent-variable or y axis. In some situations, two or more box plots can be placed side-by-side on a Cartesian coordinate plane to show how a phenomenon or scenario evolves with time, which is plotted along the independent-variable or x axis. Once in a while, a single box plot is tilted on its side, so the values run from left-to-right (minimum to maximum) instead of bottom-to-top.
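
The five values a box plot is built from can be computed directly. The following is a minimal sketch in plain Python, not taken from the quoted article; the sample data is hypothetical, and the quartile convention (`method="inclusive"`) is an assumption, since other conventions give slightly different quartiles.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    ordered = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    # method="inclusive" is one of several common quartile conventions.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


sample = [2, 4, 4, 5, 7, 8, 9, 11, 12]  # hypothetical measurements
low, q1, median, q3, high = five_number_summary(sample)
print("box spans", q1, "to", q3, "with the median line at", median)
print("whiskers reach down to", low, "and up to", high)
```

The box is drawn from the first to the third quartile, and the whiskers extend to the minimum and maximum, as described above.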

Posted by uniqueone
[Book Review] Statistical Learning for Beginners (An Introduction to Statistical Learning with Applications in R) (http://bahnsville.tistory.com/950)

 

 

(A Korean translation has been published: http://book.daum.net/detail/book.do?bookid=KOR9791186710050)

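Early this year I heard that a neighboring team was running a study group on Kevin P. Murphy's "Machine Learning: A Probabilistic Perspective". For various reasons they did not allow people from other teams to sit in, apart from a limited few, so I simply found the book online and started reading. (The PDF can be found by googling.) At roughly 1,000 pages it covers most topics in machine learning, including recent hot issues, so I plowed ahead, expecting that mastering it would make me somewhat more comfortable with machine learning. But besides being long, it was far too difficult. I read the first three chapters or so word for word, but I kept hitting my limits and read more and more loosely, until at some point I was only checking the table of contents and the boldface text before moving on. It is a hardcore textbook that is hard to follow unless you majored in mathematics or probability. If you did not, you are better off not reading it, for the sake of your mental health.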

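Just as I was getting through the last of those 1,000 pages, I came across a Facebook post introducing a new book. A site titled '그대안의 작은 호수' ("A Small Lake Within You") had a post with the same title as the book, "An Introduction to Statistical Learning with R", saying that the book could currently be downloaded for free, so I downloaded it right away and started reading. The PDF is available through the link in that post or by googling.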

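This book is useful for beginners who are interested in data mining and machine learning but did not major in mathematics, statistics, or computer science. It is especially suitable for people doing data analysis in applied fields such as industrial engineering, chemical engineering, and bioinformatics. Anyone with an undergraduate education can follow it well enough (though it may feel unfamiliar at first). Every chapter explains the concepts and then ends with worked examples that analyze the chapter's material in R, which makes it ideal for engineers in applied fields, as opposed to mathematicians and statisticians who derive formulas and theory, or computer science majors who must implement new algorithms and applications.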

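It has drawbacks, of course. Unlike Murphy's book, it does not cover nearly the whole field, so I would not particularly recommend it to those who need in-depth study. The book is also heavily focused on supervised learning, regression, and linearity; that is, it does not cover unsupervised or nonlinear problems and methods much. (For beginners this may actually be an advantage.) I believe that building further on this foundation will pay off. Most data mining books are organized around classification, so it is somewhat unusual that this one is organized around regression. Its coverage of ridge regression and the lasso, relatively recent developments in regression that I had been curious about, was personally very helpful. I wish some algorithms were covered in more detail, but that is no obstacle to using them in practice with R.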

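If you did not major in mathematics, statistics, or computer science and are interested in data mining or data analysis, this is an ideal book to start with. If you want to study further, you can turn to Murphy's book or others.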

===

Update.

A Korean translation has been published.

http://book.daum.net/detail/book.do?bookid=KOR9791186710050

==

Facebook page: https://www.facebook.com/unexperienced



Source: http://bahnsville.tistory.com/950 [nthought]

Posted by uniqueone

https://www.mathworks.com/solutions/machine-learning/examples.html?s_eid=PSM_da

 

 

Classification Examples

 

Clustering Examples

 

Posted by uniqueone
How set miscalculation cost in MATLAB SVM model? - Stack Overflow

 


http://stackoverflow.com/questions/35523723/how-set-miscalculation-cost-in-matlab-svm-model
Posted by uniqueone

https://www.analyticsvidhya.com/blog/2016/03/practical-guide-deal-imbalanced-classification-problems/

 

 

 

Introduction

We have several machine learning algorithms at our disposal for model building, and data-based prediction is now easier than ever before. Whether the problem is regression or classification, one can often reach reasonably high accuracy with a suitable algorithm. But this is not always the case. Classification problems can sometimes get a bit tricky.

ML algorithms tend to struggle when faced with imbalanced classification data sets: they produce biased predictions and misleading accuracy figures. But why does this happen? What factors degrade their performance?

The answer is simple. With an imbalanced data set, an algorithm does not get enough information about the minority class to predict it accurately. Hence, it is desirable to train ML algorithms on balanced data sets. How, then, should we deal with imbalanced data sets? The methods are simple to state but tricky to apply, as described in this article.

In this article, I’ve shared the important things you need to know to tackle imbalanced classification problems. In particular, I’ve kept my focus on imbalance in binary classification problems. As usual, I’ve kept the explanation simple and informative. Towards the end, I’ve provided a practical view of dealing with such data sets in R with the ROSE package.

Imbalanced classification in R

 

What is Imbalanced Classification?

Imbalanced classification is a supervised learning problem in which one class outnumbers the other by a large proportion. This problem is faced more frequently in binary classification problems than in multi-level classification problems.

The term imbalanced refers to the disparity encountered in the dependent (response) variable. Therefore, an imbalanced classification problem is one in which the dependent variable has an imbalanced proportion of classes. In other words, a data set that exhibits an unequal distribution between its classes is considered to be imbalanced.

For example: Consider a data set with 100,000 observations. This data set consists of candidates who applied for an internship at Harvard. Harvard is well known for its extremely low acceptance rate. The dependent variable represents whether a candidate has been shortlisted (1) or not shortlisted (0). After analyzing the data, it was found that ~98% were not shortlisted and only ~2% got lucky. This is a perfect case of imbalanced classification.

Do such situations arise often in real life? Yes! For better understanding, here are some real-life examples. Please note that the degree of imbalance varies from situation to situation:

  1. An automated inspection machine that checks products coming off a manufacturing assembly line may find the number of defective products to be significantly lower than the number of non-defective products.
  2. A test done to detect cancer in residents of a chosen area may find the number of people with cancer to be significantly lower than the number of unaffected people.
  3. In credit card fraud detection, fraudulent transactions will be far fewer than legitimate transactions.
  4. A manufacturing operation run under six sigma principles may encounter 10 defective products in a million.

There are many more real-life situations that result in imbalanced data sets. As you can see, the chances of encountering imbalanced data are quite high. Hence, it’s important for every analyst to learn to deal with such problems.

 

Why do standard ML algorithms struggle with accuracy on imbalanced data?

This is an interesting experiment to do. Try it! This way you will understand the importance of learning the ways to restructure imbalanced data. I’ve shown this in the practical section below.

Below are the reasons that lead to reduced accuracy of ML algorithms on imbalanced data sets:

  1. ML algorithms struggle with accuracy because of the unequal distribution of the dependent variable.
  2. This causes the performance of existing classifiers to become biased towards the majority class.
  3. The algorithms are accuracy driven, i.e. they aim to minimize the overall error, to which the minority class contributes very little.
  4. Many standard ML algorithms implicitly assume that the data set has balanced class distributions.
  5. They also assume that errors obtained from different classes have the same cost (explained below in detail).
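
To make the third point concrete, here is a minimal sketch in plain Python (not part of the original article, and not using any R package). It assumes a hypothetical data set with the same 98% / 2% split as the Harvard example and a degenerate "classifier" that always predicts the majority class.

```python
def accuracy(actual, predicted):
    """Fraction of observations whose predicted label matches the actual one."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def minority_found(actual, predicted, positive=1):
    """Fraction of actual positive (minority) observations predicted as positive."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    total = sum(1 for a in actual if a == positive)
    return hits / total


# Hypothetical labels: 980 negatives (0) and 20 positives (1).
labels = [0] * 980 + [1] * 20
# A model that has learned nothing about the minority class.
always_negative = [0] * len(labels)

print("accuracy:", accuracy(labels, always_negative))
print("minority cases found:", minority_found(labels, always_negative))
```

The overall accuracy looks excellent even though not a single minority observation is identified, which is why accuracy-driven training says so little about the minority class.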

 

What are the methods to deal with imbalanced data sets?

The methods are widely known as ‘Sampling Methods’. Generally, these methods aim to turn an imbalanced data set into a balanced one using some mechanism. They do so by altering the size of the original data set until the classes are present in roughly equal proportion.

These methods have gained importance since many studies have shown that balanced data results in improved overall classification performance compared to an imbalanced data set. Hence, it’s important to learn them.

Below are the methods used to treat imbalanced datasets:

  1. Undersampling
  2. Oversampling
  3. Synthetic Data Generation
  4. Cost Sensitive Learning

Let’s understand them one by one.

 

1. Undersampling

This method works with the majority class. It reduces the number of majority-class observations to make the data set balanced. It is best used when the data set is huge, since reducing the number of training samples also improves run time and eases storage problems.

Undersampling methods are of 2 types: Random and Informative.

Random undersampling randomly chooses observations from the majority class and eliminates them until the data set is balanced. Informative undersampling follows a pre-specified selection criterion to remove observations from the majority class.

Within informative undersampling, the EasyEnsemble and BalanceCascade algorithms are known to produce good results. These algorithms are easy to understand and straightforward too.

EasyEnsemble: At first, it extracts several subsets of independent samples (with replacement) from the majority class. Then, it develops multiple classifiers, each based on the combination of one subset with the minority class. As you can see, it works just like an unsupervised learning algorithm.

BalanceCascade: It takes a supervised learning approach in which it develops an ensemble of classifiers and systematically selects which majority-class observations to include in the ensemble.

Do you see any problem with undersampling methods? Removing observations may cause the training data to lose important information pertaining to the majority class.

 

2. Oversampling

This method works with the minority class. It replicates observations from the minority class to balance the data. It is also known as upsampling. Like undersampling, this method can be divided into two types: Random Oversampling and Informative Oversampling.

Random oversampling balances the data by randomly oversampling the minority class. Informative oversampling uses a pre-specified criterion and synthetically generates minority class observations.

An advantage of this method is that it leads to no information loss. The disadvantage is that, since oversampling simply adds replicated observations to the original data set, it ends up adding multiple copies of several observations, which leads to overfitting. The training accuracy on such a data set will be high, but the accuracy on unseen data will be worse.
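
Random undersampling and random oversampling are simple enough to sketch in a few lines. The following plain-Python illustration is an assumption-laden sketch rather than the article's method or any package's implementation: the rows are hypothetical placeholders, and the helper names `random_undersample` and `random_oversample` are made up for this example.

```python
import random


def random_undersample(majority, minority, rng):
    """Keep a random subset of the majority class, drawn without replacement."""
    kept = rng.sample(majority, len(minority))
    return kept + minority


def random_oversample(majority, minority, rng):
    """Add minority rows, drawn with replacement, until both classes match."""
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return majority + minority + extra


rng = random.Random(1)  # fixed seed so the sketch is repeatable
majority = [("neg", i) for i in range(980)]  # hypothetical majority rows
minority = [("pos", i) for i in range(20)]   # hypothetical minority rows

under = random_undersample(majority, minority, rng)
over = random_oversample(majority, minority, rng)
print(len(under), "rows after undersampling,", len(over), "rows after oversampling")
```

Undersampling discards 960 majority rows here, while oversampling adds 960 repeated minority rows, which illustrates the information loss and the overfitting risk discussed above.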

 

3. Synthetic Data Generation

In simple words, instead of replicating observations from the minority class, this method overcomes the imbalance by generating artificial data. It is also a type of oversampling technique.

With regard to synthetic data generation, the synthetic minority oversampling technique (SMOTE) is a powerful and widely used method. The SMOTE algorithm creates artificial data based on feature space (rather than data space) similarities between minority samples. We can also say that it generates a random set of minority class observations to shift the classifier’s learning bias towards the minority class.

To generate artificial data, it uses bootstrapping and k-nearest neighbors. Precisely, it works this way:

  1. Take the difference between the feature vector (sample) under consideration and its nearest neighbor.
  2. Multiply this difference by a random number between 0 and 1.
  3. Add it to the feature vector under consideration.
  4. This selects a random point along the line segment between the two feature vectors.
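
The four steps above can be sketched as follows. This is a simplified plain-Python illustration of the interpolation idea, not the reference SMOTE implementation and not the R package mentioned below; the function names, the choice of a random neighbor among the k nearest, and the sample points are all assumptions made for the example.

```python
import math
import random


def nearest_neighbors(point, others, k):
    """Return the k points in `others` closest to `point` (Euclidean distance)."""
    return sorted(others, key=lambda q: math.dist(point, q))[:k]


def smote_sample(minority, k, rng):
    """Create one synthetic minority point by interpolating towards a neighbor."""
    base = rng.choice(minority)
    candidates = [p for p in minority if p is not base]
    neighbor = rng.choice(nearest_neighbors(base, candidates, k))
    gap = rng.random()  # step 2: a random number between 0 and 1
    # steps 1 and 3: base + gap * (neighbor - base), coordinate by coordinate
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbor))


rng = random.Random(1)
# Hypothetical minority-class feature vectors (x1, x2).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
synthetic = [smote_sample(minority, k=2, rng=rng) for _ in range(3)]
for point in synthetic:
    print(point)
```

Each synthetic point lies on the segment between a real minority observation and one of its near neighbors, so the new data stays inside the region already occupied by the minority class.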

R has a very well defined package which incorporates this technique. We’ll look at it in the practical section below.

 

4. Cost Sensitive Learning (CSL)

It is another commonly used method to handle classification problems with imbalanced data. It’s an interesting method. In simple words, this method evaluates the cost associated with misclassifying observations.

It does not create a balanced data distribution. Instead, it addresses the imbalanced learning problem by using cost matrices, which describe the cost of misclassification in a particular scenario. Recent research has shown that cost sensitive learning often outperforms sampling methods. Therefore, this method provides a viable alternative to sampling methods.

Let’s understand it using an interesting example: A data set of passengers is given. We are interested in knowing whether a person has a bomb. The data set contains all the necessary information. A person carrying a bomb is labeled as the positive class, and a person not carrying a bomb is labeled as the negative class. The problem is to identify which class a person belongs to. Now, understand the cost matrix.

There is no cost associated with identifying a person with a bomb as positive and a person without one as negative. Right? But the cost of identifying a person with a bomb as negative (False Negative) is much higher than the cost of identifying a person without a bomb as positive (False Positive).

A cost matrix is similar to a confusion matrix, except that here we are more concerned about false positives and false negatives (shown below). There is no cost penalty associated with True Positives and True Negatives, as they are correctly identified.

Figure: Cost Matrix

The goal of this method is to choose a classifier with the lowest total cost.

Total Cost = C(FN) × FN + C(FP) × FP

where,

  1. FN is the number of positive observations wrongly predicted as negative
  2. FP is the number of negative observations wrongly predicted as positive
  3. C(FN) and C(FP) correspond to the costs associated with a False Negative and a False Positive respectively. Remember that in this example C(FN) > C(FP).
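
As a small worked example of the total cost formula, the sketch below compares two hypothetical classifiers. The error counts and the costs (a false negative taken to be 50 times as costly as a false positive) are invented purely for illustration.

```python
def total_cost(fn_count, fp_count, cost_fn, cost_fp):
    """Total Cost = C(FN) x FN + C(FP) x FP."""
    return cost_fn * fn_count + cost_fp * fp_count


# Hypothetical costs: missing a positive is far worse than a false alarm.
COST_FN = 50
COST_FP = 1

# Hypothetical error counts for two classifiers on the same test set.
cost_a = total_cost(fn_count=2, fp_count=30, cost_fn=COST_FN, cost_fp=COST_FP)
cost_b = total_cost(fn_count=10, fp_count=5, cost_fn=COST_FN, cost_fp=COST_FP)

print("classifier A:", 2 + 30, "errors, total cost", cost_a)
print("classifier B:", 10 + 5, "errors, total cost", cost_b)
```

Classifier A makes more than twice as many errors as classifier B, yet its total cost is much lower because it misses fewer positives. Under cost sensitive learning, A would be the preferred model.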

There are other advanced methods as well for balancing imbalanced data sets. These include cluster based sampling, adaptive synthetic sampling, borderline SMOTE, SMOTEBoost, DataBoost-IM, kernel based methods and many more. The basic working of these algorithms is similar to what is explained above. There are also more intuitive methods which you can try for improved predictions:

  1. Using clustering, divide the majority class into K distinct clusters, with no overlap of observations among them. Train a model on each of these clusters together with all observations from the minority class. Finally, average the predictions of these models.
  2. Collect more data. Aim for data with a higher proportion of the minority class. Otherwise, adding more data will not improve the class imbalance.

 

Which performance metrics to use to evaluate accuracy?

Choosing a performance metric is a critical aspect of working with imbalanced data. Most classification algorithms calculate accuracy as the percentage of observations correctly classified. With imbalanced data, the results are highly deceiving, since the minority class has minimal effect on overall accuracy.

Figure: Confusion Matrix

The difference between a confusion matrix and a cost matrix is that the cost matrix provides information only about the misclassification cost, whereas the confusion matrix describes the entire set of possibilities using TP, TN, FP, FN. In a cost matrix, the diagonal elements are zero. The most frequently used metrics are Accuracy and Error Rate.

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Error Rate = 1 - Accuracy = (FP + FN) / (TP + TN + FP + FN)

As mentioned above, these metrics may provide deceiving results and are highly sensitive to changes in data. Further, various metrics can be derived from the confusion matrix. The resulting metrics provide a better measure of performance when working on an imbalanced data set:

Precision: It is a measure of the correctness achieved in positive prediction, i.e. of the observations predicted as positive, how many are actually positive.

Precision = TP / (TP + FP)

Recall: It is a measure of actual observations which are labeled (predicted) correctly i.e. how many observations of positive class are labeled correctly. It is also known as ‘Sensitivity’.

Recall = TP / (TP + FN)

F measure: It combines precision and recall into a single measure of classification effectiveness, with the relative weight given to recall versus precision determined by the β coefficient.

F measure = ((1 + β²) × Precision × Recall) / (β² × Precision + Recall)

β is usually taken as 1.
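
The metrics above can be computed directly from confusion-matrix counts. The sketch below is plain Python with hypothetical counts; it uses the standard F-beta definition, (1 + β²) × Precision × Recall / (β² × Precision + Recall), which reduces to the harmonic mean of precision and recall when β = 1.

```python
def precision(tp, fp):
    """Of the observations predicted positive, the fraction that are positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Of the actual positive observations, the fraction predicted positive."""
    return tp / (tp + fn)


def f_measure(p, r, beta=1.0):
    """Standard F-beta score; beta > 1 weights recall more heavily."""
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


# Hypothetical confusion-matrix counts for the positive (minority) class.
tp, fp, fn = 8, 2, 12

p = precision(tp, fp)
r = recall(tp, fn)
print("precision:", round(p, 3))
print("recall:", round(r, 3))
print("F1:", round(f_measure(p, r), 3))
print("F2:", round(f_measure(p, r, beta=2.0), 3))
```

With these counts the classifier is usually right when it predicts positive, but it finds fewer than half of the actual positives, and the F measure reflects that gap.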

Though these metrics are better than accuracy and error rate, they are still ineffective in answering the important questions on classification. For example, precision does not tell us about negative prediction accuracy, and recall is concerned only with the actual positives. This suggests that we can still have a better metric to cater to our accuracy needs.

Fortunately, we have the ROC (Receiver Operating Characteristic) curve to measure the accuracy of a classification prediction. It is among the most widely used evaluation metrics. The ROC curve is formed by plotting the TP rate (Sensitivity) against the FP rate (1 - Specificity).

Specificity = TN / (TN + FP)

Any point on the ROC graph corresponds to the performance of a single classifier on a given distribution. It is useful because it provides a visual representation of the benefits (TP) and costs (FP) of a classification. The larger the area under the ROC curve, the higher the accuracy will be.
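
To show how ROC points and the area under the curve are obtained from predicted scores, here is a minimal plain-Python sketch. It is not the `roc.curve` function used later in the article; the labels and scores are hypothetical, the FP rate is computed as FP / (FP + TN), and the area is taken with the trapezoidal rule.

```python
def roc_points(labels, scores):
    """Return (FP rate, TP rate) pairs, one for each distinct score threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Everything scoring at or above the threshold is predicted positive.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical true labels (1 = positive) and predicted scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(labels, scores)
print("AUC:", round(auc(curve), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives, so values close to 0.5 indicate that the scores carry little information about the classes.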

There may be situations when ROC fails to deliver trustworthy performance. It has a few shortcomings, such as:

  1. It may provide overly optimistic performance results on highly skewed data.
  2. It does not provide a confidence interval on a classifier’s performance.
  3. It fails to infer the significance of differences in classifier performance.

As alternatives, we can use other visual representations such as the PR curve and cost curves. Specifically, cost curves are known to describe a classifier’s performance over varying misclassification costs and class distributions in a visual format. In more than 90% of instances, the ROC curve is known to perform quite well.

 

Imbalanced Classification in R

So far, we’ve learnt about some essential theoretical aspects of imbalanced classification. It’s time to learn to implement these techniques practically. In R, packages such as ROSE and DMwR help us to perform sampling strategies quickly. We’ll work on a binary classification problem.

The ROSE (Random Over Sampling Examples) package helps us to generate artificial data based on sampling methods and a smoothed bootstrap approach. This package has well defined accuracy functions to do the tasks quickly.

Let’s get started

#set path
> path <- "C:/Users/manish/desktop/Data/March 2016"

#set working directory
> setwd(path)

#install packages
> install.packages("ROSE")
> library(ROSE)

The package ROSE comes with an inbuilt imbalanced data set named hacide. It comprises two data frames: hacide.train and hacide.test. Let’s load it in the R environment:

> data(hacide)
> str(hacide.train)
'data.frame': 1000 obs. of 3 variables:
$ cls: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ x1 : num 0.2008 0.0166 0.2287 0.1264 0.6008 ...
$ x2 : num 0.678 1.5766 -0.5595 -0.0938 -0.2984 ...

As you can see, the data set contains 3 variables and 1000 observations. cls is the response variable. x1 and x2 are the independent (predictor) variables. Let’s check the severity of imbalance in this data set:

#check table
table(hacide.train$cls)
  0     1
980    20

#check classes distribution
prop.table(table(hacide.train$cls))
  0      1  
0.98   0.02

As we see, this data set contains only 2% of positive cases and 98% of negative cases. This is a severely imbalanced data set. So, how badly can this affect our prediction accuracy? Let’s build a model on this data. I’ll be using the decision tree algorithm for modeling.

> library(rpart)
> treeimb <- rpart(cls ~ ., data = hacide.train)
> pred.treeimb <- predict(treeimb, newdata = hacide.test)

Let’s check the accuracy of this prediction. For this, the ROSE package has a function named accuracy.meas, which computes important metrics such as precision, recall and F measure.

> accuracy.meas(hacide.test$cls, pred.treeimb[,2])
   Call:
   accuracy.meas(response = hacide.test$cls, predicted = pred.treeimb[, 2])
   Examples are labelled as positive when predicted is greater than 0.5 

   precision: 1.000
   recall: 0.200
   F: 0.167

These metrics provide an interesting interpretation. With a threshold value of 0.5, Precision = 1 says there are no false positives. Recall = 0.20 is very low and indicates that we have a high number of false negatives. The threshold value can also be altered. F = 0.167 is also low and suggests weak accuracy of this model.

We’ll check the final accuracy of this model using the ROC curve. This will give us a clear picture of whether this model is worth using. Using the function roc.curve available in this package:

> roc.curve(hacide.test$cls, pred.treeimb[,2], plotit = F)
Area under the curve (AUC): 0.600

AUC = 0.60 is a terribly low score. Therefore, it is necessary to balance the data before applying a machine learning algorithm. In this case, the algorithm gets biased toward the majority class and fails to learn the minority class.

We’ll use the sampling techniques and try to improve this prediction accuracy. This package provides a function named ovun.sample which enables oversampling, undersampling in one go.

Let’s start with oversampling and balance the data.

#over sampling
> data_balanced_over <- ovun.sample(cls ~ ., data = hacide.train, method = "over", N = 1960)$data
> table(data_balanced_over$cls)
0    1
980 980

In the code above, method = "over" instructs the algorithm to perform oversampling. N refers to the number of observations in the resulting balanced set. Originally we had 980 negative observations, so I set N so that the minority class is oversampled until it reaches 980 and the total data set comprises 1960 samples.
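If you want to see what random oversampling amounts to, here is a rough base R equivalent on a small toy data frame. The function oversample_minority and the toy data are hypothetical and only mimic the idea of drawing minority rows with replacement until both classes have the same size; they are not the ovun.sample code.

```r
set.seed(1)

# Toy imbalanced data: 98 negatives, 2 positives (names are illustrative).
toy <- data.frame(cls = c(rep(0, 98), rep(1, 2)), x = rnorm(100))

# Hypothetical helper: duplicate minority rows at random until the
# minority class is as large as the majority class.
oversample_minority <- function(data, label = "cls", minority = 1) {
  is_min <- data[[label]] == minority
  n_maj <- sum(!is_min)
  min_rows <- which(is_min)
  # Index through sample.int() so a single minority row is handled correctly.
  drawn <- min_rows[sample.int(length(min_rows), n_maj, replace = TRUE)]
  rbind(data[!is_min, , drop = FALSE], data[drawn, , drop = FALSE])
}

balanced <- oversample_minority(toy)
print(table(balanced$cls))
```

Every positive row in the result is an exact copy of one of the two original positives, which is the repetition that oversampling is known for.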

Similarly, we can perform undersampling as well. Remember, undersampling is done without replacement.

> data_balanced_under <- ovun.sample(cls ~ ., data = hacide.train, method = "under", N = 40, seed = 1)$data
> table(data_balanced_under$cls)
0    1
20  20

Now the data set is balanced, but we’ve lost significant information from the sample. Let’s do both undersampling and oversampling on this imbalanced data. This can be achieved using method = "both". In this case, the minority class is oversampled with replacement and the majority class is undersampled without replacement.

> data_balanced_both <- ovun.sample(cls ~ ., data = hacide.train, method = "both", p = 0.5, N = 1000, seed = 1)$data
> table(data_balanced_both$cls)
0    1
520 480

p refers to the probability of the positive class in the newly generated sample.
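Because p is a probability rather than a fixed share, the class counts need not come out exactly equal, which fits the 520 / 480 split above. The sketch below shows one way this could work in base R. It assumes the number of positives is drawn at random with probability p; that is my reading of the output, not something I have checked against the package source, and resample_both is a hypothetical helper.

```r
set.seed(1)

# Toy data with the same class counts as hacide.train (values are made up).
toy <- data.frame(cls = c(rep(0, 980), rep(1, 20)), x = rnorm(1000))

# Hypothetical helper: oversample the minority with replacement and
# undersample the majority without replacement, for a total of N rows.
resample_both <- function(data, N, p = 0.5, label = "cls", minority = 1) {
  n_min <- rbinom(1, size = N, prob = p)  # assumed: random minority count
  n_maj <- N - n_min
  min_rows <- which(data[[label]] == minority)
  maj_rows <- which(data[[label]] != minority)
  stopifnot(n_maj <= length(maj_rows))    # cannot undersample beyond what exists
  keep_min <- min_rows[sample.int(length(min_rows), n_min, replace = TRUE)]
  keep_maj <- maj_rows[sample.int(length(maj_rows), n_maj, replace = FALSE)]
  data[c(keep_maj, keep_min), , drop = FALSE]
}

both <- resample_both(toy, N = 1000, p = 0.5)
print(table(both$cls))
```

The majority rows in the result are all distinct, while the minority rows are repeats of the original 20 positives.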

The data generated by oversampling contain the expected number of repeated observations, while the data generated by undersampling lose important information from the original data. Both lead to inaccuracies in the resulting performance. To counter these issues, ROSE can also generate data synthetically. The data generated using ROSE is considered to provide a better estimate of the original data.
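As a rough picture of what "synthetic" means here, the sketch below picks an existing observation from a class and then draws a new point in its neighbourhood, in the spirit of a smoothed bootstrap. It is a simplified illustration only: synth_sample, the Gaussian noise and the bandwidth factor h are my assumptions, and the package chooses its smoothing differently.

```r
set.seed(1)

# Toy data with two numeric features (names and values are illustrative).
toy <- data.frame(cls = c(rep(0, 980), rep(1, 20)),
                  x1 = rnorm(1000), x2 = rnorm(1000))

# Hypothetical helper: generate N synthetic rows, each a jittered copy of a
# randomly chosen row from the class drawn for it.
synth_sample <- function(data, N, p = 0.5, h = 0.5) {
  features <- setdiff(names(data), "cls")
  new_cls <- rbinom(N, size = 1, prob = p)   # class of each new row
  parts <- lapply(c(0, 1), function(k) {
    pool <- data[data$cls == k, features, drop = FALSE]
    n_k <- sum(new_cls == k)
    seeds <- pool[sample.int(nrow(pool), n_k, replace = TRUE), , drop = FALSE]
    # Jitter each feature with noise scaled by its within-class spread.
    for (f in features) {
      seeds[[f]] <- seeds[[f]] + rnorm(n_k, mean = 0, sd = h * sd(pool[[f]]))
    }
    cbind(cls = rep(k, n_k), seeds)
  })
  do.call(rbind, parts)
}

synthetic <- synth_sample(toy, N = 1000)
print(table(synthetic$cls))
```

Unlike plain oversampling, the positive rows produced this way are new points rather than exact copies. The package call itself is much shorter: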

> data.rose <- ROSE(cls ~ ., data = hacide.train, seed = 1)$data
> table(data.rose$cls)
0    1
520 480

This generated data set has the same size as the original data set (1000 observations). Now we’ve balanced the data using 4 techniques. Let’s build a model on each data set and evaluate its accuracy.

#build decision tree models
> tree.rose <- rpart(cls ~ ., data = data.rose)
> tree.over <- rpart(cls ~ ., data = data_balanced_over)
> tree.under <- rpart(cls ~ ., data = data_balanced_under)
> tree.both <- rpart(cls ~ ., data = data_balanced_both)

#make predictions on unseen data
> pred.tree.rose <- predict(tree.rose, newdata = hacide.test)
> pred.tree.over <- predict(tree.over, newdata = hacide.test)
> pred.tree.under <- predict(tree.under, newdata = hacide.test)
> pred.tree.both <- predict(tree.both, newdata = hacide.test)

It’s time to evaluate the accuracy of the respective predictions. The built-in function roc.curve lets us compute the ROC metric (AUC).

#AUC ROSE
> roc.curve(hacide.test$cls, pred.tree.rose[,2])
Area under the curve (AUC): 0.989

#AUC Oversampling
> roc.curve(hacide.test$cls, pred.tree.over[,2])
Area under the curve (AUC): 0.798

#AUC Undersampling
> roc.curve(hacide.test$cls, pred.tree.under[,2])
Area under the curve (AUC): 0.867

#AUC Both
> roc.curve(hacide.test$cls, pred.tree.both[,2])
Area under the curve (AUC): 0.798

Here is the resultant ROC curve where:

  • Black color represents ROSE curve
  • Red color represents oversampling curve
  • Green color represents undersampling curve
  • Blue color represents both sampling curve

[Figure: ROC curve]

Hence, we get the highest AUC from the data obtained using the ROSE algorithm. The data generated using synthetic methods result in higher accuracy than the sampling methods. This technique, combined with a more robust algorithm (random forest, boosting), can lead to exceptionally high accuracy.

This package also provides methods to check model accuracy using holdout and bootstrap. This helps ensure that our resulting predictions don’t suffer from high variance.

> ROSE.holdout <- ROSE.eval(cls ~ ., data = hacide.train, learner = rpart, method.assess = "holdout", extr.pred = function(obj)obj[,2], seed = 1)
> ROSE.holdout

Call:
ROSE.eval(formula = cls ~ ., data = hacide.train, learner = rpart,
extr.pred = function(obj) obj[, 2], method.assess = "holdout",
seed = 1)

Holdout estimate of auc: 0.985

We see that the AUC holds at ~0.98, which shows that our predictions aren’t suffering from high variance. Similarly, you can use bootstrapping by setting method.assess to "BOOT". The parameter extr.pred is a function that extracts the column of probabilities belonging to the positive class.
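To get a feel for what a variance check looks like, here is a generic bootstrap of the AUC in base R on simulated scores. This is not what ROSE.eval does internally; auc_rank, the simulated data and the stratified resampling are all my own illustrative choices.

```r
set.seed(1)

# Hypothetical helper: AUC via the rank-sum identity.
auc_rank <- function(actual, score) {
  r <- rank(score)
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Simulated test set: positives tend to score higher than negatives.
actual <- c(rep(0, 200), rep(1, 50))
score <- c(rnorm(200, mean = 0), rnorm(50, mean = 2))

neg_idx <- which(actual == 0)
pos_idx <- which(actual == 1)

# Resample within each class so every replicate contains both classes.
boot_auc <- replicate(200, {
  idx <- c(sample(neg_idx, replace = TRUE), sample(pos_idx, replace = TRUE))
  auc_rank(actual[idx], score[idx])
})

print(c(mean = mean(boot_auc), sd = sd(boot_auc)))
```

A small standard deviation across replicates is the kind of evidence that supports the claim that the estimate is stable.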

 

End Notes

When faced with an imbalanced data set, one might need to experiment with these methods to find the best-suited sampling technique. In our case, the synthetic sampling technique outperformed the traditional oversampling and undersampling methods. For better results, you can use advanced sampling methods, which include synthetic sampling combined with boosting.

In this article, I’ve discussed the important things one should know to deal with imbalanced data sets. For R users, dealing with such situations isn’t difficult since we are blessed with some powerful and awesome packages.

Did you find this article helpful? Have you used these methods before? Do share your experience / suggestions in the comments section below.

 

Posted by uniqueone