1027 posts under 'All Categories'

  1. 2017.11.25 TensorFlow Speech Recognition - Kaggle competition keras
  2. 2017.11.24 Kaggle-knowhow: a collection of Kaggle resources for Korean users
  3. 2017.11.24 Probabilistic Graphical Models Tutorial — Part 2 – Stats and Bots
  4. 2017.11.22 An Introduction to different Types of Convolutions in Deep Learning
  5. 2017.11.18 A translation of the machine learning glossary released by Google (https://developers.google.com/machine-learning/glossary/)
  6. 2017.11.18 Top 10 Videos on Deep Learning in Python
  7. 2017.11.08 Notes on Pattern Recognition and Machine Learning, Chapter 2, up to Eq. (2.117)
  8. 2017.11.08 NumPy for MATLAB users – Mathesaurus
  9. 2017.10.31 Machine Learning · Artificial Inteligence web book; helpful for deep learning basics
  10. 2017.10.31 How a 22 year old from Shanghai won a global deep learning challenge (a segmentation competition for self-driving cars)
  11. 2017.10.27 How to Use the Keras Functional API for Deep Learning
  12. 2017.10.26 Swish activation: gets slightly better results than ReLU
  13. 2017.10.19 MoCoGAN: Decomposing Motion and Content for Video Generation
  14. 2017.10.18 Keras Tutorial: The Ultimate Beginner's Guide to Deep Learning in Python
  15. 2017.10.16 Introduction to Numpy - 1: An absolute beginner's guide to Machine Learning and Data science
  16. 2017.10.13 A collection of Korean-language machine learning materials (lecture notes and videos)
  17. 2017.10.12 A discussion of some of the mathematics behind the posterior probability, known as Bayes' Theorem
  18. 2017.09.28 Machine learning glossary: https://developers.google.com/machine-learning/glossary/
  19. 2017.09.26 30 essential data science, machine learning, and deep learning cheat sheets
  20. 2017.09.23 Paper-reading videos of the 12-person group, made by our admin Kyung Mo Kweon
  21. 2017.09.20 ZhuSuan: a Python-based probabilistic programming library for Bayesian Deep Learning, combining Bayesian methods with the advantages of deep learning
  22. 2017.09.10 India's IIS and NIT Develop AI To Identify Protesters With Their Faces Partly Covered With Scarves or Hats
  23. 2017.08.13 Deep Learning Lecture Collection (Spring 2017): http://www.youtube.com/playlist?list=PL3FW7Lu3i5JvHM8ljYj-zLfQRF3EO8sYv
  24. 2017.08.06 Grammarly: a program that takes an English sentence and returns a grammatically corrected version
  25. 2017.08.03 Slides from Professor Andrew Ng's Coursera course, organized after watching it all the way through
  26. 2017.07.27 17 Colors APIs
  27. 2017.07.27 An introduction to how programmers who gave up on math can start learning machine learning
  28. 2017.07.26 Quick complete Tensorflow tutorial to understand and run Alexnet, VGG, Inceptionv3, Resnet and squeezeNet networks
  29. 2017.07.25 How to install caffe in windows in just 5 minutes! (YouTube - Jun 3, 2017)
  30. 2017.07.25 Installing Caffe on Windows
https://m.facebook.com/groups/107107546348803?view=permalink&id=532112330514987

TensorFlow Speech Recognition - Kaggle competition is going on. I wrote a basic tutorial on speech (word) recognition using some of the datasets from the competition.
.
Hope it will be helpful for some of you. Thanks in advance for reading!
.

https://blog.manash.me/building-a-dead-simple-word-recognition-engine-using-convnet-in-keras-25e72c19c12b
Posted by uniqueone
,

Kaggle-knowhow/README.md at master · zzsza/Kaggle-knowhow · GitHub
https://github.com/zzsza/Kaggle-knowhow/blob/master/README.md
Posted by uniqueone
,

Probabilistic Graphical Models Tutorial — Part 2 – Stats and Bots
https://blog.statsbot.co/probabilistic-graphical-models-tutorial-d855ba0107d1


Prasoon Goyal
PhD candidate at UT Austin. For more content on machine learning by me, check my Quora profile (https://www.quora.com/profile/Prasoon-Goyal).
Nov 23
Probabilistic Graphical Models Tutorial — Part 2
Parameter estimation and inference algorithms

In the previous part of this probabilistic graphical models tutorial for the Statsbot team, we looked at the two types of graphical models, namely Bayesian networks and Markov networks. We also explored the problem setting, conditional independences, and an application to the Monty Hall problem. In this post, we will cover parameter estimation and inference, and look at another application.

Parameter Estimation
Bayesian networks
Estimating the numbers in the CPD tables of a Bayesian network simply amounts to counting how many times that event occurred in our training data. That is, to estimate p(SAT=s1 | Intelligence = i1), we simply count the fraction of data points where SAT=s1 and Intelligence = i1, out of the total data points where Intelligence = i1. While this approach may appear ad hoc, it turns out that the parameters so obtained maximize the likelihood of the observed data.
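As a toy illustration of this counting (the data below is made up for the sketch, not taken from the article):

# each data point is an (Intelligence, SAT) pair
data = [("i1", "s1"), ("i1", "s1"), ("i1", "s0"), ("i0", "s1"), ("i0", "s0")]

n_i1 = sum(1 for intel, sat in data if intel == "i1")
n_i1_s1 = sum(1 for intel, sat in data if intel == "i1" and sat == "s1")
p_s1_given_i1 = n_i1_s1 / n_i1   # fraction of Intelligence=i1 points with SAT=s1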
Markov networks
For Markov networks, unfortunately, the above counting approach does not have a statistical justification (and will therefore lead to suboptimal parameters). So, we need to use more sophisticated techniques. The basic idea behind most of these techniques is gradient descent — we define parameters that describe the probability distribution, and then use gradient descent to find values for these parameters that maximize the likelihood of the observed data.
Finally, now that we have the parameters of our model, we want to use them on new data, to perform inference!
Inference
The bulk of the literature in probabilistic graphical models focuses on inference. The reasons are two-fold:
Inference is why we came up with this entire framework — being able to make predictions from what we already know.
Inference is computationally hard! In some specific kinds of graphs, we can perform inference fairly efficiently, but on general graphs, it is intractable. So we need to use approximate algorithms that trade off accuracy for efficiency.
There are several questions we can answer with inference:
Marginal inference: Finding the probability distribution of a specific variable. For instance, given a graph with variables A, B, C, and D, where A takes values 1, 2, and 3, find p(A=1), p(A=2) and p(A=3).
Posterior inference: Given some observed variables v_E (E for evidence) that take values e, finding the posterior distribution p(v_H | v_E=e) for some hidden variables v_H.
Maximum-a-posteriori (MAP) inference: Given some observed variables v_E that take values e, finding the setting of other variables v_H that have the highest probability.
Answers to these questions may be useful by themselves, or may need to be used as part of larger tasks.
In what follows, we are going to look at some of the popular algorithms for answering these questions, both exact and approximate. All these algorithms are applicable on both Bayesian networks and Markov networks.
Variable Elimination
Using the definition of conditional probability, we can write the posterior distribution as:

Let’s see how we can compute the numerator and the denominator above, using a simple example. Consider a network with three variables, and the joint distribution defined as follows:

Let’s say we want to compute p(A | B=1). Note that this means that we want to compute the values p(A=0 | B=1) and p(A=1 | B=1), which should sum to one. Using the above equation, we can write

The numerator is the probability that A = 0 and B = 1. We don’t care about the values of C. So we would sum over all the values of C. (This comes from basic probability — p(A=0, B=1, C=0) and p(A=0, B=1, C=1) are mutually exclusive events, so their union p(A = 0, B=1) is just the sum of the individual probabilities.)
So we add rows 3 and 4 to get p(A=0, B=1) = 0.15. Similarly, adding rows 7 and 8 gives us p(A=1, B=1) = 0.40. Also, we can compute the denominator by summing over all rows that contain B=1, that is, rows 3, 4, 7, and 8, to get p(B=1) = 0.55. This gives us the following:
p(A = 0 | B = 1) = 0.15 / 0.55 = 0.27
p(A = 1 | B = 1) = 0.40 / 0.55 = 0.73
If you look at the above computation closely, you would notice that we did some repeated computations — adding rows 3 & 4, and 7 & 8 twice. A more efficient way to compute p(B=1) would have been to simply add the values p(A=0, B=1) and p(A=1, B=1). This is the basic idea of variable elimination.
In general, when you have a lot of variables, not only can you use the values of the numerator to compute the denominator, but the numerator by itself will contain repeated computations, if evaluated naively. You can use dynamic programming to use precomputed values efficiently.
Because we are summing over one variable at a time, thereby eliminating it, the process of summing out multiple variables amounts to eliminating these variables one at a time. Hence, the name “variable elimination.”
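A small NumPy sketch of this example, assuming the joint table is stored as an array indexed by (A, B, C); the individual entries below are made up for illustration, but their B=1 sums match the 0.15, 0.40, and 0.55 quoted in the text:

import numpy as np

joint = np.array([
    [[0.20, 0.20],    # p(A=0, B=0, C=0), p(A=0, B=0, C=1)
     [0.05, 0.10]],   # p(A=0, B=1, C=0), p(A=0, B=1, C=1)
    [[0.02, 0.03],    # p(A=1, B=0, C=0), p(A=1, B=0, C=1)
     [0.15, 0.25]],   # p(A=1, B=1, C=0), p(A=1, B=1, C=1)
])

p_ab = joint.sum(axis=2)          # eliminate C: p(A, B)
p_b1 = p_ab[:, 1].sum()           # p(B=1) = 0.55, reusing the numerator sums
p_a_given_b1 = p_ab[:, 1] / p_b1  # approximately [0.27, 0.73]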
It is straightforward to extend the above process to solve the marginal inference or MAP inference problems as well. Similarly, it is easy to generalize the above idea to apply it to Markov networks too.
The time complexity of variable elimination depends on the graph structure, and the order in which you eliminate the variables. In the worst case, it has exponential time complexity.
Belief Propagation
The VE algorithm that we just saw gives us only one final distribution. Suppose we want to find the marginal distributions for all variables. Instead of running variable elimination multiple times, we can do something smarter.
Suppose you have a graph structure. To compute a marginal, you need to sum the joint distribution over all other variables, which amounts to aggregating information from the entire graph. Here’s an alternate way of aggregating information from the entire graph — each node looks at its neighbors, and approximates the distribution of variables locally.
Then, every pair of neighboring nodes exchanges “messages,” where the messages contain the local distributions. Now, every node looks at the messages it receives and aggregates them to update its probability distributions over the variables.

In the figure above, C aggregates information from its neighbors A and B, and sends a message to D. Then, D aggregates this message with the information from E and F.
The advantage of this approach is that if you save the messages that you are sending at every node, one forward pass of messages followed by one backward pass gives all nodes information about all other nodes. That information can then be used to compute all the marginals, which was not possible in variable elimination.
If the graph does not contain cycles, then this process converges after a forward and a backward pass. If the graph contains cycles, then this process may or may not converge, but it can often be used to get an approximate answer.
Approximate inference
Because exact inference may be prohibitively time consuming for large graphical models, numerous approximate inference algorithms have been developed for graphical models, most of which fall into one of the following two categories:
Sampling-based
These algorithms estimate the desired probability using sampling. As a simple example, consider the following scenario — given a coin, how would you determine the probability of getting heads when the coin is tossed? The simplest thing is to flip the coin, say, 100 times, and find out the fraction of tosses in which you get heads.
This is a sampling-based algorithm to estimate the probability of heads. For more complex questions in probabilistic graphical models, you can use a similar procedure. Sampling-based algorithms can further be divided into two classes. In the first one, the samples are independent of each other, as in the coin toss example above. These algorithms are called Monte Carlo methods.
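For the coin example, the Monte Carlo estimate is just a few lines (a fair coin is assumed):

import random

tosses = [random.random() < 0.5 for _ in range(100)]   # 100 independent tosses of a fair coin
p_heads = sum(tosses) / len(tosses)                    # fraction of heads, close to 0.5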
For problems with many variables, generating good quality independent samples is difficult, and therefore, we generate dependent samples, that is, each new sample is random, but close to the last sample. Such algorithms are called Markov Chain Monte Carlo (MCMC) methods, because the samples form a “Markov chain.” Once we have the samples, we can use them to answer various inference questions.
Variational methods
Instead of using sampling, variational methods try to approximate the required distribution analytically. Suppose you write out the expression for computing the distribution of interest — marginal probability distribution or posterior probability distribution.
Often, these expressions have summations or integrals in them that are computationally expensive to evaluate exactly. A good way to approximate these expressions is to then solve for an alternate expression, and somehow ensure that this alternate expression is close to the original expression. This is the basic idea behind variational methods.
When we are trying to estimate a complex probability distribution p_complex, we define a separate set of probability distributions P_simple, which are easier to work with, and then find the probability distribution p_approx from P_simple that is closest to p_complex.
Application: Image denoising
Let us now use some of the ideas we just discussed on a real problem. Let’s say you have the following image:

Now suppose that it got corrupted by random noise, so that your noisy image looks as follows:

The goal is to recover the original image. Let’s see how we can use probabilistic graphical models to do this.
The first step is to think about what our observed and unobserved variables are, and how we can connect them to form a graph. Let us define each pixel in the noisy image as an observed random variable, and each pixel in the ground truth image as an unobserved variable. So, if the image is M x N, then there are MN observed variables and MN unobserved variables. Let us denote observed variables as X_ij and unobserved variables as Y_ij. Each variable takes values +1 and -1 (corresponding to black and white pixels, respectively). Given the observed variables, we want to find the most likely values of the unobserved variables. This corresponds to MAP inference.
Now, let us use some domain knowledge to build the graph structure. Clearly, the observed variable at position (i, j) in the noisy image depends on the unobserved variable at position (i, j) in the ground truth image. This is because most of the time, they are identical.
What more can we say? For ground truth images, the neighboring pixels usually have the same values — this is not true at the boundaries of color change, but inside a single-colored region, this property holds. Therefore, we connect Y_ij and Y_kl if they are neighboring pixels.
So, our graph structure looks as follows:

Here, the white nodes denote the unobserved variables Y_ij and the grey nodes denote observed variables X_ij. Each X_ij is connected to the corresponding Y_ij, and each Y_ij is connected to its neighbors.
Note that this is a Markov network, because there is no cause-effect relation between pixels of an image, and therefore, defining directions of arrows in Bayesian networks is unnatural here.
Our MAP inference problem can be mathematically written as follows:

Here, we used some standard simplification techniques common in maximum log likelihood computation. We will use X and Y (without subscripts) to denote the collection of all X_ij and Y_ij values, respectively.
Now, we need to define our joint distribution P(X, Y) based on our graph structure. Let’s assume that P(X, Y) consists of two kinds of factors — ϕ(X_ij, Y_ij) and ϕ(Y_ij,Y_kl), corresponding to the two kinds of edges in our graph. Next, we define the factors as follows:
ϕ(X_ij, Y_ij) = exp(w_e X_ij Y_ij), where w_e is a parameter greater than zero. This factor takes large values when X_ij and Y_ij are identical, and takes small values when X_ij and Y_ij are different.
ϕ(Y_ij, Y_kl) = exp(w_s Y_ij Y_kl), where w_s is a parameter greater than zero, as before. This factor favors identical values of Y_ij and Y_kl.
Therefore, our joint distribution is given by:

where (i, j) and (k, l) in the second product are adjacent pixels, and Z is a normalization constant.
Plugging this into our MAP inference equation gives:

Note that we have dropped the term containing Z since it does not affect the solution.
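The equation images did not survive this copy; written out from the factor definitions above, the joint distribution and the resulting MAP objective are (up to notational details, with (i,j) \sim (k,l) denoting adjacent pixels):

P(X, Y) = \frac{1}{Z} \prod_{(i,j)} \phi(X_{ij}, Y_{ij}) \prod_{(i,j) \sim (k,l)} \phi(Y_{ij}, Y_{kl})

\hat{Y} = \arg\max_Y \; \sum_{(i,j)} w_e X_{ij} Y_{ij} + \sum_{(i,j) \sim (k,l)} w_s Y_{ij} Y_{kl}

where the second line follows from taking the logarithm and dropping the constant log Z.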
The values of w_e and w_s are obtained using parameter estimation techniques from pairs of ground truth and noisy images. This process is fairly mathematically involved (although, at the end of the day, it is just gradient descent on a complicated function), and therefore, we shall not delve into it here. We will assume that we have obtained the following values of these parameters — w_e = 8 and w_s = 10.
The main focus of this example will be inference. Given these parameters, we want to solve the MAP inference problem above. We can use a variant of belief propagation to do this, but it turns out that there is a much simpler algorithm called Iterated conditional modes (ICM) for graphs with this specific structure.
The basic idea is that at each step, you choose one node, Y_ij, look at the value of the MAP inference expression for both Y_ij = -1 and Y_ij = 1, and pick the one with the higher value. Repeating this process for a fixed number of iterations or until convergence usually works reasonably well.
You can use this Python code to do this for our model.
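The linked code is not included in this copy; purely as a sketch of the idea (not the author's implementation), ICM for the objective above could look like the following, where the noisy image X has entries in {-1, +1}:

import numpy as np

def icm_denoise(X, w_e=8.0, w_s=10.0, n_iters=10):
    """Iterated conditional modes on the grid Markov network described above."""
    Y = X.copy()
    H, W = X.shape
    for _ in range(n_iters):
        for i in range(H):
            for j in range(W):
                # sum of the 4-neighborhood of Y[i, j]
                nbr = 0.0
                if i > 0:     nbr += Y[i - 1, j]
                if i < H - 1: nbr += Y[i + 1, j]
                if j > 0:     nbr += Y[i, j - 1]
                if j < W - 1: nbr += Y[i, j + 1]
                # local terms of the MAP objective that involve Y[i, j]
                score = w_e * X[i, j] + w_s * nbr
                Y[i, j] = 1 if score >= 0 else -1
    return Y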
This is the denoised image returned by the algorithm:

Pretty good, isn’t it? Of course, you can use more fancy techniques, both within graphical models, and outside, to generate something better, but the takeaway from this example is that a simple Markov network with a simple inference algorithm already gives you reasonably good results.
Quantitatively, the noisy image has about 10% of the pixels that are different from the original image, while the denoised image produced by our algorithm has about 0.6% of the pixels that are different from the original image.
It is important to note that the graph that we used is fairly large — the image size is about 440 x 300, so the total number of nodes is close to 264,000. Therefore, exact inference in such models is essentially infeasible, and what we get out of most algorithms, including ICM, is a local optimum.
Let’s recap
In this section, let us briefly review the key concepts we covered in this two-part series:
Graphical models: A graphical model consists of a graph structure where nodes represent random variables and edges represent dependencies between variables.
Bayesian networks: These are directed graphical models, with a conditional probability distribution table associated with each node.
Markov networks: These are undirected graphical models, with a potential function associated with each clique.
Conditional independences: Based on how the nodes in the graph are connected, we can write conditional independence statements of the form “X is independent of Y given Z.”
Parameter estimation: Given some data and the graph structure, we want to fill the CPD tables or compute the potential functions.
Inference: Given a graphical model, we want to answer questions about unobserved variables. These questions are usually one of the following — Marginal inference, posterior inference, and MAP inference.
Inference on general graphical models is computationally intractable. We can divide inference algorithms into two broad categories — exact and approximate. Variable elimination and belief propagation in acyclic graphs are examples of exact inference algorithms. Approximate inference algorithms are necessary for large-scale graphs, and usually fall into sampling-based methods or variational methods.
Conclusions
We looked at some of the core ideas in probabilistic graphical models in this two-part tutorial. As you should be able to appreciate at this point, graphical models provide an interpretable way to model many real-world tasks, where there are dependencies. Using graphical models gives us a way to work on such tasks in a principled manner.
Before we close, it is important to point out that this tutorial, by no means, is complete — many details have been skipped to keep the content intuitive and simple. The standard textbook on probabilistic graphical models is over a thousand pages! This tutorial is meant to serve as a starting point, to get you interested in the field, so that you can look up more rigorous resources.
Here are some additional resources that you can use to dig deeper into the field:
Graphical Models in a Nutshell
Graphical Models textbook
You should also be able to find a few chapters on graphical models in standard machine learning textbooks.

Posted by uniqueone
,

An Introduction to different Types of Convolutions in Deep Learning
https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d

Paul-Louis Pröve
Artificial Intelligence @ PwC
Jul 22
An Introduction to different Types of Convolutions in Deep Learning

Let me give you a quick overview of different types of convolutions and what their benefits are. For the sake of simplicity, I’m focussing on 2D convolutions only.
Convolutions
First we need to agree on a few parameters that define a convolutional layer.

2D convolution using a kernel size of 3, stride of 1 and padding
Kernel Size: The kernel size defines the field of view of the convolution. A common choice for 2D is 3 — that is 3x3 pixels.
Stride: The stride defines the step size of the kernel when traversing the image. While its default is usually 1, we can use a stride of 2 for downsampling an image similar to MaxPooling.
Padding: The padding defines how the border of a sample is handled. A (half) padded convolution will keep the spatial output dimensions equal to the input, whereas unpadded convolutions will crop away some of the borders if the kernel is larger than 1.
Input & Output Channels: A convolutional layer takes a certain number of input channels (I) and calculates a specific number of output channels (O). The needed parameters for such a layer can be calculated by I*O*K, where K equals the number of values in the kernel.
Dilated Convolutions
(a.k.a. atrous convolutions)

2D convolution using a 3 kernel with a dilation rate of 2 and no padding
Dilated convolutions introduce another parameter to convolutional layers called the dilation rate. This defines a spacing between the values in a kernel. A 3x3 kernel with a dilation rate of 2 will have the same field of view as a 5x5 kernel, while only using 9 parameters. Imagine taking a 5x5 kernel and deleting every second column and row.
This delivers a wider field of view at the same computational cost. Dilated convolutions are particularly popular in the field of real-time segmentation. Use them if you need a wide field of view and cannot afford multiple convolutions or larger kernels.
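As a small illustration (the filter count is arbitrary), Keras exposes this through the dilation_rate argument of Conv2D:

from keras.layers import Conv2D

# 3x3 kernel with dilation rate 2: still 9 weights per filter,
# but a 5x5 receptive field
dilated = Conv2D(filters=32, kernel_size=3, dilation_rate=2, padding='same')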
Transposed Convolutions
(a.k.a. deconvolutions or fractionally strided convolutions)
Some sources use the name deconvolution, which is inappropriate because it’s not a deconvolution. To make things worse, deconvolutions do exist, but they’re not common in the field of deep learning. An actual deconvolution reverts the process of a convolution. Imagine inputting an image into a single convolutional layer. Now take the output, throw it into a black box and out comes your original image again. This black box does a deconvolution. It is the mathematical inverse of what a convolutional layer does.
A transposed convolution is somewhat similar because it produces the same spatial resolution a hypothetical deconvolutional layer would. However, the actual mathematical operation that’s being performed on the values is different. A transposed convolutional layer carries out a regular convolution but reverts its spatial transformation.

2D convolution with no padding, stride of 2 and kernel of 3
At this point you should be pretty confused, so let’s look at a concrete example. An image of 5x5 is fed into a convolutional layer. The stride is set to 2, the padding is deactivated and the kernel is 3x3. This results in a 2x2 image.
If we wanted to reverse this process, we’d need the inverse mathematical operation so that 9 values are generated from each pixel we input. Afterward, we traverse the output image with a stride of 2. This would be a deconvolution.

Transposed 2D convolution with no padding, stride of 2 and kernel of 3
A transposed convolution does not do that. The only thing the two have in common is that the output is guaranteed to be a 5x5 image as well, while a normal convolution operation is still being performed. To achieve this, we need to perform some fancy padding on the input.
As you can imagine now, this step will not reverse the process from above. At least not concerning the numeric values.
It merely reconstructs the spatial resolution from before and performs a convolution. This may not be the mathematical inverse, but for Encoder-Decoder architectures, it’s still very helpful. This way we can combine the upscaling of an image with a convolution, instead of doing two separate processes.
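For illustration (the filter count is arbitrary), the 2x2 to 5x5 upsampling from this example can be written with a transposed convolution layer in Keras:

from keras.layers import Conv2DTranspose

# with a 2x2 input, kernel 3, stride 2 and no padding the output is 5x5:
# (2 - 1) * 2 + 3 = 5
upsample = Conv2DTranspose(filters=16, kernel_size=3, strides=2, padding='valid')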
Separable Convolutions
In a separable convolution, we can split the kernel operation into multiple steps. Let’s express a convolution as y = conv(x, k) where y is the output image, x is the input image, and k is the kernel. Easy. Next, let’s assume k can be calculated by: k = k1.dot(k2). This would make it a separable convolution because instead of doing a 2D convolution with k, we could get to the same result by doing 2 1D convolutions with k1 and k2.

Sobel X and Y filters
Take the Sobel kernel for example, which is often used in image processing. You could get the same kernel by multiplying the vectors [1, 2, 1].T and [1, 0, -1]. This would require 6 instead of 9 parameters while doing the same operation.
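A quick NumPy check of this factorization:

import numpy as np

k1 = np.array([[1], [2], [1]])   # [1, 2, 1].T as a column vector (3 parameters)
k2 = np.array([[1, 0, -1]])      # [1, 0, -1] as a row vector (3 parameters)
sobel_x = k1 @ k2                # 3x3 Sobel X kernel: [[1,0,-1],[2,0,-2],[1,0,-1]]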
The example above shows what’s called a spatial separable convolution, which to my knowledge isn’t used in deep learning. I just wanted to make sure you don’t get confused when stumbling upon those. In neural networks, we commonly use something called a depthwise separable convolution.
This will perform a spatial convolution while keeping the channels separate and then follow with a pointwise (1x1) convolution to mix them. In my opinion, it can be best understood with an example.
Let’s say we have a 3x3 convolutional layer on 16 input channels and 32 output channels. What happens in detail is that each of the 16 channels is traversed by 32 3x3 kernels, resulting in 512 (16x32) feature maps. Next, we merge 1 feature map out of every input channel by adding them up. Since we can do that 32 times, we get the 32 output channels we wanted.
For a depthwise separable convolution on the same example, we traverse the 16 channels with 1 3x3 kernel each, giving us 16 feature maps. Now, before merging anything, we traverse these 16 feature maps with 32 1x1 convolutions each and only then add them together. This results in 656 (16x3x3 + 16x32x1x1) parameters as opposed to the 4608 (16x32x3x3) parameters from above.
The example is a specific implementation of a depthwise separable convolution where the so called depth multiplier is 1. This is by far the most common setup for such layers.
We do this because of the hypothesis that spatial and depthwise information can be decoupled. Looking at the performance of the Xception model this theory seems to work. Depthwise separable convolutions are also used for mobile devices because of their efficient use of parameters.
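To see these parameter counts directly, here is a small Keras sketch comparing a regular and a depthwise separable 3x3 convolution with 16 input and 32 output channels (the input size and the use_bias=False choice are just for the illustration):

from keras.models import Sequential
from keras.layers import Conv2D, SeparableConv2D

regular = Sequential([Conv2D(32, 3, use_bias=False, input_shape=(64, 64, 16))])
regular.summary()     # 16 * 32 * 3 * 3 = 4608 parameters

separable = Sequential([SeparableConv2D(32, 3, use_bias=False, input_shape=(64, 64, 16))])
separable.summary()   # 16 * 3 * 3 + 16 * 32 * 1 * 1 = 656 parameters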
Questions?
This concludes our little tour through different types of convolutions. I hope it helped to get a brief overview of the matter. Drop a comment if you have any remaining questions and check out this GitHub page for more convolution animations.
Posted by uniqueone
,
As usual, Lablup is releasing a by-product of other work in progress.

This is a translation of the machine learning glossary released by Google (https://developers.google.com/machine-learning/glossary/). It was put together for terminology consistency and as a reference while preparing online TensorFlow and Keras lectures and labs, and we are releasing it early, partly to gather feedback on what should be fixed.

We hope you find it helpful.

P.S. Experts, please send plenty of feedback!
P.S. 2: If possible, we will try to release one by-product of backend.AI and CodeOnWeb per day through the end of the year!

https://www.codeonweb.com/@mookiekim/ml-glossary
Posted by uniqueone
,

Top 10 Videos on Deep Learning in Python
https://www.kdnuggets.com/2017/11/top-10-videos-deep-learning-python.html?utm_content=buffer765d8&utm_medium=social&utm_source=facebook.com&utm_campaign=buffer


This ‘Top 10’ list has been created on the basis of the best content, not simply the number of views. To help you choose an appropriate framework, we first start with a video that compares a few of the popular Python DL libraries. I have included the highlights and my views on the pros and cons of each of these 10 items, so you can choose one that best suits your needs. I have saved the best for last: the most comprehensive yet free YouTube course on DL ☺. Let’s begin!

1. Overview: Deep Learning Frameworks compared (96K views) - 5 minutes

Before I actually list the best DL in Python videos, it is important that one understands the differences between the 5 most popular deep learning frameworks: scikit-learn, TensorFlow, Theano, Keras, and Caffe. This 5-minute video by Siraj Raval gives you the best possible comparison of the pros and cons of each framework and even presents the structure of code samples to help you decide. Start with this.

2. Playlist: TensorFlow tutorial by Sentdex (114 K views) - 4.5 hours

This playlist of 14 videos by Sentdex is the most well-organized, thoroughly explained, concise yet easy-to-follow tutorial on Deep Learning in Python. It includes TensorFlow implementations of a Recurrent Neural Network and a Convolutional Neural Network on the MNIST dataset.

3. Individual tutorial: TensorFlow tutorial 02: Convolutional Neural Network (69.7 K views) - 36 minutes

This tutorial by Magnus Pedersen on the YouTube channel Hvass Laboratories is worth its weight in gold: excellent comments in the code, and the instructor speaks without interruption. Watch this video to understand scripts in TensorFlow. Thank me later ☺

4. Overview : How to predict stock prices easily (210 K views) - 9 minutes

In this video, Siraj Raval uses a special type of recurrent neural network called an LSTM network. He uses the Keras library with a TensorFlow backend. He explains the reason behind using recurrent nets for time series data and later, uses it to predict the daily closing price of the S&P 500 based on training data for 16 years. The link to the Github code is given in its description box.

5. Tutorial: Introduction to Deep Learning with Python and the Theano library (201 K views) - 52 minutes

If you want a talk on Python with the Theano library in under an hour, targeted towards beginners, then you can refer to this talk by Alec Radford. Unlike most other talks on this topic, this one compares the features of an ‘old’ net versus a ‘modern’ net, i.e., nets prior to 2000 versus nets post-2012.

6. Playlist: PyTorch Zero to All (3 K views) - 2 hours 15 minutes

In this series of 11 videos, Sung Kim teaches you PyTorch from the ground up. A highlight of this series is Lecture 10, where he teaches you to build a basic CNN, with detailed emphasis on understanding the concept of CNNs using his diagrams.

7. Individual tutorial: TensorFlow tutorial (43.9 K views) - 49 minutes

This single tutorial by Edureka implements DL using TensorFlow. It is a very good tutorial for beginners in TensorFlow. It teaches TensorFlow basics and data structures. It also includes a use case for using DL as a naval mine identifier: determining whether an underwater obstacle is a rock or a mine.

8. Playlist: Deep Learning with Python (1.8K views) - 83 minutes

The YouTube channel ‘Machine Learning TV‘ has published a series of 15 videos totaling 83 minutes using Theano and Keras to use DL for automatic image captioning. It shows you how to train your first deep neural net for classifying digits from the MNIST dataset. It also has a good explanation on loading and reusing pre-trained models in Theano.

9. Playlist: Deep Learning with Keras- Python (30.3 K views) - 85 minutes

The YouTube channel ‘The SemiColon‘ has published a series of 11 tutorial videos using Theano and Keras to implement a chatbot with DL. It includes explanations of Convolutional Neural Networks and Recurrent Neural Networks in Theano with Keras, as well as Neural Networks and Backpropagation with the scikit-learn library on the handwriting recognition (MNIST) dataset.

The speaking is punctuated by ‘umms’ and ‘ahhs’, but there is a good explanation on Word2Vec used to build chatbots.

10. Free online course: Deep Learning by Andrew Ng (Full course) (28 K views) - 4 week course

As in my previous Top 10 videos post on ML in Finance, I have saved the best for last ☺. If you want to learn Deep Learning as an online course from arguably the most famous ML instructor, Andrew Ng, then this playlist is for you. Intended as a 4-week course covering 98 videos, it teaches you DL, Neural Networks, binary classification, derivatives, gradient descent, activation functions, backpropagation, regularization, RMSprop, tuning, dropout, and training and testing on different distributions, among others, using Python code in a Jupyter notebook.

 
Posted by uniqueone
,
https://m.facebook.com/groups/255834461424286?view=permalink&id=556808447993551

Hello. A lurker here, posting for the first time in a while. ;;;
This year has almost flown by... ㅠㅠ

These are notes covering Chapter 2 of Pattern Recognition and Machine Learning, up to Eq. (2.117).

While studying deep learning I realized my statistics background was far too weak, so I worked through the book and organized notes on my own. Since the book is in English, someone like me, weak in both English and math and with a poor memory, feels like they are seeing it for the first time on every re-read. To avoid the despair of having to start over from the beginning, I gradually wrote the notes up in Jupyter.

When I first saw this material, the equations looked so complicated that I thought, "whoa, what is this, I have no idea what it's saying," but somehow I managed to read through it.

I felt I absolutely had to understand everything up to Eq. (2.117), so I wrote it up; if anyone else is working through the book, I hope it helps. I spelled the equations out in almost absurd detail, so it may be tedious to read. Just for reference... ㅠㅠ

If you spot any errors, please point them out. Thank you.

http://nbviewer.jupyter.org/github/metamath1/ml-simple-works/blob/master/PRML/prml-chap2.ipynb
Posted by uniqueone
,

NumPy for MATLAB users – Mathesaurus

http://mathesaurus.sourceforge.net/matlab-numpy.html

NumPy for MATLAB users

Help

MATLAB/Octave Python Description
doc
help -i % browse with Info
help() Browse help interactively
help help or doc doc help Help on using help
help plot help(plot) or ?plot Help for a function
help splines or doc splines help(pylab) Help for a toolbox/library package
demo Demonstration examples

Searching available documentation

MATLAB/Octave Python Description
lookfor plot Search help files
help help(); modules [Numeric] List available packages
which plot help(plot) Locate functions

Using interactively

MATLAB/Octave Python Description
octave -q ipython -pylab Start session
TAB or M-? TAB Auto completion
foo(.m) execfile('foo.py') or run foo.py Run code from file
history hist -n Command history
diary on [..] diary off Save command history
exit or quit CTRL-D
CTRL-Z # windows
sys.exit()
End session

Operators

MATLAB/Octave Python Description
help - Help on operator syntax

Arithmetic operators

MATLAB/Octave Python Description
a=1; b=2; a=1; b=1 Assignment; defining a number
a + b a + b or add(a,b) Addition
a - b a - b or subtract(a,b) Subtraction
a * b a * b or multiply(a,b) Multiplication
a / b a / b or divide(a,b) Division
a .^ b a ** b
power(a,b)
pow(a,b)
Power, $a^b$
rem(a,b) a % b
remainder(a,b)
fmod(a,b)
Remainder
a+=1 a+=b or add(a,b,a) In place operation to save array creation overhead
factorial(a) Factorial, $n!$

Relational operators

MATLAB/Octave Python Description
a == b a == b or equal(a,b) Equal
a < b a < b or less(a,b) Less than
a > b a > b or greater(a,b) Greater than
a <= b a <= b or less_equal(a,b) Less than or equal
a >= b a >= b or greater_equal(a,b) Greater than or equal
a ~= b a != b or not_equal(a,b) Not Equal

Logical operators

MATLAB/Octave Python Description
a && b a and b Short-circuit logical AND
a || b a or b Short-circuit logical OR
a & b or and(a,b) logical_and(a,b) or a and b Element-wise logical AND
a | b or or(a,b) logical_or(a,b) or a or b Element-wise logical OR
xor(a, b) logical_xor(a,b) Logical EXCLUSIVE OR
~a or not(a)
~a or !a
logical_not(a) or not a Logical NOT
any(a) True if any element is nonzero
all(a) True if all elements are nonzero

root and logarithm

MATLAB/Octave Python Description
sqrt(a) math.sqrt(a) Square root
log(a) math.log(a) Logarithm, base $e$ (natural)
log10(a) math.log10(a) Logarithm, base 10
log2(a) math.log(a, 2) Logarithm, base 2 (binary)
exp(a) math.exp(a) Exponential function

Round off

MATLAB/Octave Python Description
round(a) around(a) or math.round(a) Round
ceil(a) ceil(a) Round up
floor(a) floor(a) Round down
fix(a) fix(a) Round towards zero

Mathematical constants

MATLAB/Octave Python Description
pi math.pi $\pi=3.141592$
exp(1) math.e or math.exp(1) $e=2.718281$

Missing values; IEEE-754 floating point status flags

MATLAB/Octave Python Description
NaN nan Not a Number
Inf inf Infinity, $\infty$
plus_inf Infinity, $+\infty$
minus_inf Infinity, $-\infty$
plus_zero Plus zero, $+0$
minus_zero Minus zero, $-0$

Complex numbers

MATLAB/Octave Python Description
i z = 1j Imaginary unit
z = 3+4i z = 3+4j or z = complex(3,4) A complex number, $3+4i$
abs(z) abs(3+4j) Absolute value (modulus)
real(z) z.real Real part
imag(z) z.imag Imaginary part
arg(z) Argument
conj(z) z.conj(); z.conjugate() Complex conjugate

Trigonometry

MATLAB/Octave Python Description
atan(a,b) atan2(b,a) Arctangent, $\arctan(b/a)$
hypot(x,y) Hypotenus; Euclidean distance

Generate random numbers

MATLAB/Octave Python Description
rand(1,10) random.random((10,))
random.uniform((10,))
Uniform distribution
2+5*rand(1,10) random.uniform(2,7,(10,))
Uniform: Numbers between 2 and 7
rand(6) random.uniform(0,1,(6,6))
Uniform: 6,6 array
randn(1,10) random.standard_normal((10,))
Normal distribution

Vectors

MATLAB/Octave Python Description
a=[2 3 4 5]; a=array([2,3,4,5]) Row vector, $1 \times n$-matrix
adash=[2 3 4 5]'; array([2,3,4,5])[:,NewAxis]
array([2,3,4,5]).reshape(-1,1)
r_[1:10,'c']
Column vector, $m \times 1$-matrix

Sequences

MATLAB/Octave Python Description
1:10 arange(1,11, dtype=Float)
range(1,11)
1,2,3, ... ,10
0:9 arange(10.) 0.0,1.0,2.0, ... ,9.0
1:3:10 arange(1,11,3) 1,4,7,10
10:-1:1 arange(10,0,-1) 10,9,8, ... ,1
10:-3:1 arange(10,0,-3) 10,7,4,1
linspace(1,10,7) linspace(1,10,7) Linearly spaced vector of n=7 points
reverse(a) a[::-1] or Reverse
a(:) = 3 a.fill(3), a[:] = 3 Set all values to same scalar value

Concatenation (vectors)

MATLAB/Octave Python Description
[a a] concatenate((a,a)) Concatenate two vectors
[1:4 a] concatenate((range(1,5),a), axis=1)

Repeating

MATLAB/Octave Python Description
[a a] concatenate((a,a)) 1 2 3, 1 2 3
a.repeat(3) or 1 1 1, 2 2 2, 3 3 3
a.repeat(a) or 1, 2 2, 3 3 3

Miss those elements out

MATLAB/Octave Python Description
a(2:end) a[1:] miss the first element
a([1:9]) miss the tenth element
a(end) a[-1] last element
a(end-1:end) a[-2:] last two elements

Maximum and minimum

MATLAB/Octave Python Description
max(a,b) maximum(a,b) pairwise max
max([a b]) concatenate((a,b)).max() max of all values in two vectors
[v,i] = max(a) v,i = a.max(0),a.argmax(0)

Vector multiplication

MATLAB/Octave Python Description
a.*a a*a Multiply two vectors
dot(u,v) dot(u,v) Vector dot product, $u \cdot v$

Matrices

MATLAB/Octave Python Description
a = [2 3;4 5] a = array([[2,3],[4,5]]) Define a matrix

Concatenation (matrices); rbind and cbind

MATLAB/Octave Python Description
[a ; b] concatenate((a,b), axis=0)
vstack((a,b))
Bind rows
[a , b] concatenate((a,b), axis=1)
hstack((a,b))
Bind columns
concatenate((a,b), axis=2)
dstack((a,b))
Bind slices (three-way arrays)
[a(:), b(:)] concatenate((a,b), axis=None) Concatenate matrices into one vector
[1:4 ; 1:4] concatenate((r_[1:5],r_[1:5])).reshape(2,-1)
vstack((r_[1:5],r_[1:5]))
Bind rows (from vectors)
[1:4 ; 1:4]' Bind columns (from vectors)

Array creation

MATLAB/Octave Python Description
zeros(3,5) zeros((3,5),Float) 0 filled array
zeros((3,5)) 0 filled array of integers
ones(3,5) ones((3,5),Float) 1 filled array
ones(3,5)*9 Any number filled array
eye(3) identity(3) Identity matrix
diag([4 5 6]) diag((4,5,6)) Diagonal
magic(3) Magic squares; Lo Shu
a = empty((3,3)) Empty array

Reshape and flatten matrices

MATLAB/Octave Python Description
reshape(1:6,3,2)'; arange(1,7).reshape(2,-1)
a.setshape(2,3)
Reshaping (rows first)
reshape(1:6,2,3); arange(1,7).reshape(-1,2).transpose() Reshaping (columns first)
a'(:) a.flatten() or Flatten to vector (by rows, like comics)
a(:) a.flatten(1)
Flatten to vector (by columns)
vech(a) Flatten upper triangle (by columns)

Shared data (slicing)

MATLAB/Octave Python Description
b = a b = a.copy() Copy of a

Indexing and accessing elements (Python: slicing)

MATLAB/Octave Python Description
a = [ 11 12 13 14 ...
21 22 23 24 ...
31 32 33 34 ]
a = array([[ 11, 12, 13, 14 ],
[ 21, 22, 23, 24 ],
[ 31, 32, 33, 34 ]])
Input is a 3,4 array
a(2,3) a[1,2] Element 2,3 (row,col)
a(1,:) a[0,] First row
a(:,1) a[:,0] First column
a([1 3],[1 4]); a.take([0,2]).take([0,3], axis=1)
Array as indices
a(2:end,:) a[1:,] All, except first row
a(end-1:end,:) a[-2:,] Last two rows
a(1:2:end,:) a[::2,:] Strides: Every other row
a[...,2] Third in last dimension (axis)
a(:,[1 3 4]) a.take([0,2,3],axis=1)
Remove one column
a.diagonal(offset=0) Diagonal

Assignment

MATLAB/Octave Python Description
a(:,1) = 99 a[:,0] = 99
a(:,1) = [99 98 97]' a[:,0] = array([99,98,97])
a(a>90) = 90; (a>90).choose(a,90)
a.clip(min=None, max=90)
Clipping: Replace all elements over 90
a.clip(min=2, max=5)
Clip upper and lower values

Transpose and inverse

MATLAB/Octave Python Description
a' a.conj().transpose()
Transpose
a.' or transpose(a) a.transpose() Non-conjugate transpose
det(a) linalg.det(a) or Determinant
inv(a) linalg.inv(a) or Inverse
pinv(a) linalg.pinv(a) Pseudo-inverse
norm(a) norm(a) Norms
eig(a) linalg.eig(a)[0]
Eigenvalues
svd(a) linalg.svd(a)
Singular values
chol(a) linalg.cholesky(a) Cholesky factorization
[v,l] = eig(a) linalg.eig(a)[1]
Eigenvectors
rank(a) rank(a) Rank

Sum

MATLAB/Octave Python Description
sum(a) a.sum(axis=0) Sum of each column
sum(a') a.sum(axis=1) Sum of each row
sum(sum(a)) a.sum() Sum of all elements
a.trace(offset=0) Sum along diagonal
cumsum(a) a.cumsum(axis=0) Cumulative sum (columns)

Sorting

MATLAB/Octave Python Description
a = [ 4 3 2 ; 2 8 6 ; 1 4 7 ] a = array([[4,3,2],[2,8,6],[1,4,7]]) Example data
sort(a(:)) a.ravel().sort() or Flat and sorted
sort(a) a.sort(axis=0) or msort(a) Sort each column
sort(a')' a.sort(axis=1) Sort each row
sortrows(a,1) a[a[:,0].argsort(),] Sort rows (by first row)
a.ravel().argsort() Sort, return indices
a.argsort(axis=0) Sort each column, return indices
a.argsort(axis=1) Sort each row, return indices

Maximum and minimum

MATLAB/Octave Python Description
max(a) a.max(0) or amax(a [,axis=0]) max in each column
max(a') a.max(1) or amax(a, axis=1) max in each row
max(max(a)) a.max() or max in array
[v i] = max(a) return indices, i
max(b,c) maximum(b,c) pairwise max
cummax(a)
a.ptp(); a.ptp(0) max-to-min range

Matrix manipulation

MATLAB/Octave Python Description
fliplr(a) fliplr(a) or a[:,::-1] Flip left-right
flipud(a) flipud(a) or a[::-1,] Flip up-down
rot90(a) rot90(a) Rotate 90 degrees
repmat(a,2,3)
kron(ones(2,3),a)
kron(ones((2,3)),a) Repeat matrix: [ a a a ; a a a ]
triu(a) triu(a) Triangular, upper
tril(a) tril(a) Triangular, lower

Equivalents to "size"

MATLAB/Octave Python Description
size(a) a.shape or a.getshape() Matrix dimensions
size(a,2) or length(a) a.shape[1] or size(a, axis=1) Number of columns
length(a(:)) a.size or size(a[, axis=None]) Number of elements
ndims(a) a.ndim Number of dimensions
a.nbytes Number of bytes used in memory

Matrix- and elementwise- multiplication

MATLAB/Octave Python Description
a .* b a * b or multiply(a,b) Elementwise operations
a * b matrixmultiply(a,b) Matrix product (dot product)
inner(a,b) or Inner matrix vector multiplication $a\cdot b'$
outer(a,b) or Outer product
kron(a,b) kron(a,b) Kronecker product
a / b Matrix division, $b{\cdot}a^{-1}$
a \ b linalg.solve(a,b)
Left matrix division, $b^{-1}{\cdot}a$ \newline (solve linear equations)
vdot(a,b) Vector dot product
cross(a,b) Cross product

Find; conditional indexing

MATLAB/Octave Python Description
find(a) a.ravel().nonzero()
Non-zero elements, indices
[i j] = find(a) (i,j) = a.nonzero()
(i,j) = where(a!=0)
Non-zero elements, array indices
[i j v] = find(a) v = a.compress((a!=0).flat)
v = extract(a!=0,a)
Vector of non-zero values
find(a>5.5) (a>5.5).nonzero()
Condition, indices
a.compress((a>5.5).flat)
Return values
a .* (a>5.5) where(a>5.5,0,a) or a * (a>5.5) Zero out elements above 5.5
a.put(2,indices) Replace values

Multi-way arrays

MATLAB/Octave Python Description
a = cat(3, [1 2; 1 2],[3 4; 3 4]); a = array([[[1,2],[1,2]], [[3,4],[3,4]]]) Define a 3-way array
a(1,:,:) a[0,...]

File input and output

MATLAB/Octave Python Description
f = load('data.txt') f = fromfile("data.txt")
f = load("data.txt")
Reading from a file (2d)
f = load('data.txt') f = load("data.txt") Reading from a file (2d)
x = dlmread('data.csv', ';') f = load('data.csv', delimiter=';') Reading fram a CSV file (2d)
save -ascii data.txt f save('data.csv', f, fmt='%.6f', delimiter=';') Writing to a file (2d)
f.tofile(file='data.csv', format='%.6f', sep=';') Writing to a file (1d)
f = fromfile(file='data.csv', sep=';') Reading from a file (1d)

Plotting

Basic x-y plots

MATLAB/Octave Python Description
plot(a) plot(a) 1d line plot
plot(x(:,1),x(:,2),'o') plot(x[:,0],x[:,1],'o') 2d scatter plot
plot(x1,y1, x2,y2) plot(x1,y1,'bo', x2,y2,'go') Two graphs in one plot
plot(x1,y1)
hold on
plot(x2,y2)
plot(x1,y1,'o')
plot(x2,y2,'o')
show() # as normal
Overplotting: Add new plots to current
subplot(211) subplot(211) subplots
plot(x,y,'ro-') plot(x,y,'ro-') Plotting symbols and color

Axes and titles

MATLAB/Octave Python Description
grid on grid() Turn on grid lines
axis equal
axis('equal')
replot
figure(figsize=(6,6)) 1:1 aspect ratio
axis([ 0 10 0 5 ]) axis([ 0, 10, 0, 5 ]) Set axes manually
title('title')
xlabel('x-axis')
ylabel('y-axis')
Axis labels and titles
text(2,25,'hello') Insert text

Log plots

MATLAB/Octave Python Description
semilogy(a) semilogy(a) logarithmic y-axis
semilogx(a) semilogx(a) logarithmic x-axis
loglog(a) loglog(a) logarithmic x and y axes

Filled plots and bar plots

MATLAB/Octave Python Description
fill(t,s,'b', t,c,'g')
% fill has a bug?
fill(t,s,'b', t,c,'g', alpha=0.2) Filled plot

Functions

MATLAB/Octave Python Description
f = inline('sin(x/3) - cos(x/5)') Defining functions
ezplot(f,[0,40])
fplot('sin(x/3) - cos(x/5)',[0,40])
% no ezplot
x = arrayrange(0,40,.5)
y = sin(x/3) - cos(x/5)
plot(x,y, 'o')
Plot a function for given range

Polar plots

MATLAB/Octave Python Description
theta = 0:.001:2*pi;
r = sin(2*theta);
theta = arange(0,2*pi,0.001)
r = sin(2*theta)
polar(theta, rho) polar(theta, rho)

Histogram plots

MATLAB/Octave Python Description
hist(randn(1000,1))
hist(randn(1000,1), -4:4)
plot(sort(a))

3d data

Contour and image plots

MATLAB/Octave Python Description
contour(z) levels, colls = contour(Z, V,
origin='lower', extent=(-3,3,-3,3))
clabel(colls, levels, inline=1,
fmt='%1.1f', fontsize=10)
Contour plot
contourf(z); colormap(gray) contourf(Z, V,
cmap=cm.gray,
origin='lower',
extent=(-3,3,-3,3))
Filled contour plot
image(z)
colormap(gray)
im = imshow(Z,
interpolation='bilinear',
origin='lower',
extent=(-3,3,-3,3))
Plot image data
# imshow() and contour() as above Image with contours
quiver() quiver() Direction field vectors

Perspective plots of surfaces over the x-y plane

MATLAB/Octave Python Description
n=-2:.1:2;
[x,y] = meshgrid(n,n);
z=x.*exp(-x.^2-y.^2);
n=arrayrange(-2,2,.1)
[x,y] = meshgrid(n,n)
z = x*power(math.e,-x**2-y**2)
mesh(z) Mesh plot
surf(x,y,z) or surfl(x,y,z)
% no surfl()
Surface plot

Scatter (cloud) plots

MATLAB/Octave Python Description
plot3(x,y,z,'k+') 3d scatter plot

Save plot to a graphics file

MATLAB/Octave Python Description
plot(1:10)
print -depsc2 foo.eps
gset output "foo.eps"
gset terminal postscript eps
plot(1:10)
savefig('foo.eps') PostScript
savefig('foo.pdf') PDF
savefig('foo.svg') SVG (vector graphics for www)
print -dpng foo.png savefig('foo.png') PNG (raster graphics)

Data analysis

Set membership operators

MATLAB/Octave Python Description
a = [ 1 2 2 5 2 ];
b = [ 2 3 4 ];
a = array([1,2,2,5,2])
b = array([2,3,4])
a = set([1,2,2,5,2])
b = set([2,3,4])
Create sets
unique(a) unique1d(a)
unique(a)
set(a)
Set unique
union(a,b) union1d(a,b)
a.union(b)
Set union
intersect(a,b) intersect1d(a)
a.intersection(b)
Set intersection
setdiff(a,b) setdiff1d(a,b)
a.difference(b)
Set difference
setxor(a,b) setxor1d(a,b)
a.symmetric_difference(b)
Set exclusion
ismember(2,a) 2 in a
setmember1d(2,a)
contains(a,2)
True for set member

Statistics

MATLAB/Octave Python Description
mean(a) a.mean(axis=0)
mean(a [,axis=0])
Average
median(a) median(a) or median(a [,axis=0]) Median
std(a) a.std(axis=0) or std(a [,axis=0]) Standard deviation
var(a) a.var(axis=0) or var(a) Variance
corr(x,y) correlate(x,y) or corrcoef(x,y) Correlation coefficient
cov(x,y) cov(x,y) Covariance

Interpolation and regression

MATLAB/Octave Python Description
z = polyval(polyfit(x,y,1),x)
plot(x,y,'o', x,z ,'-')
(a,b) = polyfit(x,y,1)
plot(x,y,'o', x,a*x+b,'-')
Straight line fit
a = x\y linalg.lstsq(x,y)
Linear least squares $y = ax + b$
polyfit(x,y,3) polyfit(x,y,3) Polynomial fit

Non-linear methods

Polynomials, root finding

MATLAB/Octave Python Description
poly() Polynomial
roots([1 -1 -1]) roots() Find zeros of polynomial
f = inline('1/x - (x-1)')
fzero(f,1)
Find a zero near $x = 1$
solve('1/x = x-1') Solve symbolic equations
polyval([1 2 1 2],1:10) polyval(array([1,2,1,2]),arange(1,11)) Evaluate polynomial

Differential equations

MATLAB/Octave Python Description
diff(a) diff(x, n=1, axis=0) Discrete difference function and approximate derivative
Solve differential equations

Fourier analysis

MATLAB/Octave Python Description
fft(a) fft(a) or Fast fourier transform
ifft(a) ifft(a) or Inverse fourier transform
convolve(x,y) Linear convolution

Symbolic algebra; calculus

MATLAB/Octave Python Description
factor() Factorization

Programming

MATLAB/Octave Python Description
.m .py Script file extension
%
% or #
# Comment symbol (rest of line)
% must be in MATLABPATH
% must be in LOADPATH
from pylab import * Import library functions
string='a=234';
eval(string)
string="a=234"
eval(string)
Eval

Loops

MATLAB/Octave Python Description
for i=1:5; disp(i); end for i in range(1,6): print(i) for-statement
for i=1:5
disp(i)
disp(i*2)
end
for i in range(1,6):
print(i)
print(i*2)
Multiline for statements

Conditionals

MATLAB/Octave Python Description
if 1>0 a=100; end if 1>0: a=100 if-statement
if 1>0 a=100; else a=0; end if-else-statement

Debugging

MATLAB/Octave Python Description
ans Most recent evaluated expression
whos or who List variables loaded into memory
clear x or clear [all] Clear variable $x$ from memory
disp(a) print a Print

Working directory and OS

MATLAB/Octave Python Description
dir or ls os.listdir(".") List files in directory
what grep.grep("*.py") List script files in directory
pwd os.getcwd() Displays the current working directory
cd foo os.chdir('foo') Change working directory
!notepad
system("notepad")
os.system('notepad')
os.popen('notepad')
Invoke a System Command

Time-stamp: "2007-11-09T16:46:36 vidar"
©2006 Vidar Bronken Gundersen, /mathesaurus.sf.net
Permission is granted to copy, distribute and/or modify this document as long as the above attribution is retained.

Posted by uniqueone
,

Machine Learning · Artificial Inteligence
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/machine_learning.html
Posted by uniqueone
,

How a 22 year old from Shanghai won a global deep learning challenge
https://blog.getnexar.com/how-a-22-year-old-from-shanghai-won-a-global-deep-learning-challenge-76f2299446a1
Posted by uniqueone
,

https://machinelearningmastery.com/keras-functional-api-deep-learning/

 

How to Use the Keras Functional API for Deep Learning

The Keras Python library makes creating deep learning models fast and easy.

The sequential API allows you to create models layer-by-layer for most problems. It is limited in that it does not allow you to create models that share layers or have multiple inputs or outputs.

The functional API in Keras is an alternate way of creating models that offers a lot more flexibility, including creating more complex models.

In this tutorial, you will discover how to use the more flexible functional API in Keras to define deep learning models.

After completing this tutorial, you will know:

  • The difference between the Sequential and Functional APIs.
  • How to define simple Multilayer Perceptron, Convolutional Neural Network, and Recurrent Neural Network models using the functional API.
  • How to define more complex models with shared layers and multiple inputs and outputs.

Let’s get started.

Tutorial Overview

This tutorial is divided into 6 parts; they are:

  1. Keras Sequential Models
  2. Keras Functional Models
  3. Standard Network Models
  4. Shared Layers Model
  5. Multiple Input and Output Models
  6. Best Practices

1. Keras Sequential Models

As a review, Keras provides a Sequential model API.

This is a way of creating deep learning models where an instance of the Sequential class is created and model layers are created and added to it.

For example, the layers can be defined and passed to the Sequential as an array:
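The code listing did not survive this copy; a minimal sketch of that style (layer sizes are illustrative) would be:

from keras.models import Sequential
from keras.layers import Dense

# define the layers up front and pass them to Sequential as an array
model = Sequential([Dense(2, input_dim=1), Dense(1)])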

Layers can also be added piecewise:
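And the equivalent piecewise style, again only as a sketch:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(2, input_dim=1))
model.add(Dense(1))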

The Sequential model API is great for developing deep learning models in most situations, but it also has some limitations.

For example, it is not straightforward to define models that may have multiple different input sources, produce multiple output destinations or models that re-use layers.

2. Keras Functional Models

The Keras functional API provides a more flexible way for defining models.

It specifically allows you to define multiple input or output models as well as models that share layers. More than that, it allows you to define ad hoc acyclic network graphs.

Models are defined by creating instances of layers and connecting them directly to each other in pairs, then defining a Model that specifies the layers to act as the input and output to the model.

Let’s look at the three unique aspects of Keras functional API in turn:

1. Defining Input

Unlike the Sequential model, you must create and define a standalone Input layer that specifies the shape of input data.

The input layer takes a shape argument that is a tuple that indicates the dimensionality of the input data.

When input data is one-dimensional, such as for a Multilayer Perceptron, the shape must explicitly leave room for the mini-batch dimension used when splitting the data during training. Therefore, the shape tuple is always defined with a hanging last dimension, for example (2,):
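For instance, a sketch of a two-feature input layer (the variable name visible is only illustrative):

from keras.layers import Input

visible = Input(shape=(2,))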

 

2. Connecting Layers

The layers in the model are connected pairwise.

This is done by specifying where the input comes from when defining each new layer. A bracket notation is used, such that after a layer is created, the layer it takes its input from is specified in brackets.

Let’s make this clear with a short example. We can create the input layer as above, then create a hidden layer as a Dense that receives input only from the input layer.
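A sketch of that idea (the names visible and hidden are illustrative):

from keras.layers import Input, Dense

visible = Input(shape=(2,))
hidden = Dense(2)(visible)   # the bracketed (visible) wires the input layer into the Dense layer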

Note the (visible) after the creation of the Dense layer that connects the input layer output as the input to the dense hidden layer.

It is this way of connecting layers piece by piece that gives the functional API its flexibility. For example, you can see how easy it would be to start defining ad hoc graphs of layers.

3. Creating the Model

After creating all of your model layers and connecting them together, you must define the model.

As with the Sequential API, the model is the thing you can summarize, fit, evaluate, and use to make predictions.

Keras provides a Model class that you can use to create a model from your created layers. It requires that you only specify the input and output layers. For example:
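A sketch reusing the two layers from above:

from keras.models import Model
from keras.layers import Input, Dense

visible = Input(shape=(2,))
hidden = Dense(2)(visible)
model = Model(inputs=visible, outputs=hidden)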

Now that we know all of the key pieces of the Keras functional API, let’s work through defining a suite of different models and build up some practice with it.

Each example is executable and prints the structure and creates a diagram of the graph. I recommend doing this for your own models to make it clear what exactly you have defined.

My hope is that these examples provide templates for you when you want to define your own models using the functional API in the future.

3. Standard Network Models

When getting started with the functional API, it is a good idea to see how some standard neural network models are defined.

In this section, we will look at defining a simple multilayer Perceptron, convolutional neural network, and recurrent neural network.

These examples will provide a foundation for understanding the more elaborate examples later.

Multilayer Perceptron

In this section, we define a multilayer Perceptron model for binary classification.

The model has 10 inputs, 3 hidden layers with 10, 20, and 10 neurons, and an output layer with 1 output. Rectified linear activation functions are used in each hidden layer and a sigmoid activation function is used in the output layer, for binary classification.
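A sketch that follows this description (the plot_model call assumes pydot and graphviz are installed; the output file name is arbitrary):

from keras.models import Model
from keras.layers import Input, Dense
from keras.utils import plot_model

visible = Input(shape=(10,))
hidden1 = Dense(10, activation='relu')(visible)
hidden2 = Dense(20, activation='relu')(hidden1)
hidden3 = Dense(10, activation='relu')(hidden2)
output = Dense(1, activation='sigmoid')(hidden3)
model = Model(inputs=visible, outputs=output)
model.summary()
plot_model(model, to_file='multilayer_perceptron_graph.png')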

Running the example prints the structure of the network.

A plot of the model graph is also created and saved to file.

Multilayer Perceptron Network Graph

Multilayer Perceptron Network Graph

Convolutional Neural Network

In this section, we will define a convolutional neural network for image classification.

The model receives black and white 64×64 images as input, then has a sequence of two convolutional and pooling layers as feature extractors, followed by a fully connected layer to interpret the features and an output layer with a sigmoid activation for two-class predictions.
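A sketch matching this description (the filter counts and kernel sizes are illustrative assumptions):

from keras.models import Model
from keras.layers import Input, Dense, Flatten, Conv2D, MaxPooling2D

visible = Input(shape=(64, 64, 1))
conv1 = Conv2D(32, kernel_size=4, activation='relu')(visible)
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
conv2 = Conv2D(16, kernel_size=4, activation='relu')(pool1)
pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
flat = Flatten()(pool2)
hidden1 = Dense(10, activation='relu')(flat)
output = Dense(1, activation='sigmoid')(hidden1)
model = Model(inputs=visible, outputs=output)
model.summary()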

Running the example summarizes the model layers.

A plot of the model graph is also created and saved to file.

Convolutional Neural Network Graph

Convolutional Neural Network Graph

Recurrent Neural Network

In this section, we will define a long short-term memory recurrent neural network for sequence classification.

The model expects 100 time steps of one feature as input. The model has a single LSTM hidden layer to extract features from the sequence, followed by a fully connected layer to interpret the LSTM output, followed by an output layer for making binary predictions.
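A sketch matching this description (the LSTM and dense layer widths are assumptions):

from keras.models import Model
from keras.layers import Input, Dense, LSTM

visible = Input(shape=(100, 1))
hidden1 = LSTM(10)(visible)
hidden2 = Dense(10, activation='relu')(hidden1)
output = Dense(1, activation='sigmoid')(hidden2)
model = Model(inputs=visible, outputs=output)
model.summary()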

Running the example summarizes the model layers.

A plot of the model graph is also created and saved to file.

Recurrent Neural Network Graph

Recurrent Neural Network Graph

4. Shared Layers Model

Multiple layers can share the output from one layer.

For example, there may be multiple different feature extraction layers from an input, or multiple layers used to interpret the output from a feature extraction layer.

Let’s look at both of these examples.

Shared Input Layer

In this section, we define multiple convolutional layers with differently sized kernels to interpret an image input.

The model takes black and white images with the size 64×64 pixels. There are two CNN feature extraction submodels that share this input; the first has a kernel size of 4 and the second a kernel size of 8. The outputs from these feature extraction submodels are flattened into vectors and concatenated into one long vector and passed on to a fully connected layer for interpretation before a final output layer makes a binary classification.
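A sketch following that description (the filter counts are assumptions; the kernel sizes 4 and 8 come from the text):

from keras.models import Model
from keras.layers import Input, Dense, Flatten, Conv2D, MaxPooling2D, concatenate

visible = Input(shape=(64, 64, 1))
# first feature extraction submodel: kernel size 4
conv1 = Conv2D(32, kernel_size=4, activation='relu')(visible)
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
flat1 = Flatten()(pool1)
# second feature extraction submodel: kernel size 8
conv2 = Conv2D(16, kernel_size=8, activation='relu')(visible)
pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
flat2 = Flatten()(pool2)
# concatenate both flattened vectors, interpret, and classify
merge = concatenate([flat1, flat2])
hidden1 = Dense(10, activation='relu')(merge)
output = Dense(1, activation='sigmoid')(hidden1)
model = Model(inputs=visible, outputs=output)
model.summary()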

Running the example summarizes the model layers.

A plot of the model graph is also created and saved to file.

Neural Network Graph With Shared Inputs

Neural Network Graph With Shared Inputs

Shared Feature Extraction Layer

In this section, we will use two parallel submodels to interpret the output of an LSTM feature extractor for sequence classification.

The input to the model is 100 time steps of 1 feature. An LSTM layer with 10 memory cells interprets this sequence. The first interpretation model is a shallow single fully connected layer, the second is a deep 3 layer model. The output of both interpretation models are concatenated into one long vector that is passed to the output layer used to make a binary prediction.
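A sketch following that description (the layer widths in the deep interpretation branch are assumptions):

from keras.models import Model
from keras.layers import Input, Dense, LSTM, concatenate

visible = Input(shape=(100, 1))
extract = LSTM(10)(visible)
# first interpretation model: a single fully connected layer
interp1 = Dense(10, activation='relu')(extract)
# second interpretation model: three stacked fully connected layers
interp2 = Dense(10, activation='relu')(extract)
interp2 = Dense(20, activation='relu')(interp2)
interp2 = Dense(10, activation='relu')(interp2)
# concatenate both interpretations and make the binary prediction
merge = concatenate([interp1, interp2])
output = Dense(1, activation='sigmoid')(merge)
model = Model(inputs=visible, outputs=output)
model.summary()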

Running the example summarizes the model layers.

A plot of the model graph is also created and saved to file.

Neural Network Graph With Shared Feature Extraction Layer

Neural Network Graph With Shared Feature Extraction Layer

5. Multiple Input and Output Models

The functional API can also be used to develop more complex models with multiple inputs, possibly with different modalities. It can also be used to develop models that produce multiple outputs.

We will look at examples of each in this section.

Multiple Input Model

We will develop an image classification model that takes two versions of the image as input, each of a different size. Specifically a black and white 64×64 version and a color 32×32 version. Separate feature extraction CNN models operate on each, then the results from both models are concatenated for interpretation and ultimate prediction.

Note that in the creation of the Model() instance, we define the two input layers as an array. Specifically:
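With visible1 and visible2 naming the two input layers (illustrative names, matching the sketch below), the relevant line looks like:

model = Model(inputs=[visible1, visible2], outputs=output)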

The complete example is listed below.
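A sketch of such a two-input model under the assumptions above (each branch is reduced to a single convolutional and pooling layer for brevity):

from keras.models import Model
from keras.layers import Input, Dense, Flatten, Conv2D, MaxPooling2D, concatenate

# first input model: 64x64 black and white image
visible1 = Input(shape=(64, 64, 1))
conv1 = Conv2D(32, kernel_size=4, activation='relu')(visible1)
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
flat1 = Flatten()(pool1)
# second input model: 32x32 color image
visible2 = Input(shape=(32, 32, 3))
conv2 = Conv2D(32, kernel_size=4, activation='relu')(visible2)
pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
flat2 = Flatten()(pool2)
# concatenate both feature extractors, interpret, and predict
merge = concatenate([flat1, flat2])
hidden1 = Dense(10, activation='relu')(merge)
output = Dense(1, activation='sigmoid')(hidden1)
model = Model(inputs=[visible1, visible2], outputs=output)
model.summary()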

Running the example summarizes the model layers.

A plot of the model graph is also created and saved to file.

Neural Network Graph With Multiple Inputs

Neural Network Graph With Multiple Inputs

Multiple Output Model

In this section, we will develop a model that makes two different types of predictions. Given an input sequence of 100 time steps of one feature, the model will both classify the sequence and output a new sequence with the same length.

An LSTM layer interprets the input sequence and returns the hidden state for each time step. The first output model creates a stacked LSTM, interprets the features, and makes a binary prediction. The second output model uses the same output layer to make a real-valued prediction for each input time step.
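A sketch matching that description (wrapping a Dense layer in TimeDistributed is one way to emit one value per time step; the layer sizes are assumptions):

from keras.models import Model
from keras.layers import Input, Dense, LSTM, TimeDistributed

visible = Input(shape=(100, 1))
extract = LSTM(10, return_sequences=True)(visible)
# first output: stacked LSTM, interpretation layer, binary classification
class1 = LSTM(10)(extract)
class2 = Dense(10, activation='relu')(class1)
output1 = Dense(1, activation='sigmoid')(class2)
# second output: one real-valued prediction per input time step
output2 = TimeDistributed(Dense(1, activation='linear'))(extract)
model = Model(inputs=visible, outputs=[output1, output2])
model.summary()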

Running the example summarizes the model layers.

A plot of the model graph is also created and saved to file.

Neural Network Graph With Multiple Outputs

Neural Network Graph With Multiple Outputs

6. Best Practices

In this section, I want to give you some tips to get the most out of the functional API when you are defining your own models.

  • Consistent Variable Names. Use the same variable name for the input (visible) and output layers (output) and perhaps even the hidden layers (hidden1, hidden2). It will help to connect things together correctly.
  • Review Layer Summary. Always print the model summary and review the layer outputs to ensure that the model was connected together as you expected.
  • Review Graph Plots. Always create a plot of the model graph and review it to ensure that everything was put together as you intended.
  • Name the layers. You can assign names to layers that are used when reviewing summaries and plots of the model graph. For example: Dense(1, name='hidden1').
  • Separate Submodels. Consider separating out the development of submodels and combine the submodels together at the end.

Do you have your own best practice tips when using the functional API?
Let me know in the comments.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Summary

In this tutorial, you discovered how to use the functional API in Keras for defining simple and complex deep learning models.

Specifically, you learned:

  • The difference between the Sequential and Functional APIs.
  • How to define simple Multilayer Perceptron, Convolutional Neural Network, and Recurrent Neural Network models using the functional API.
  • How to define more complex models with shared layers and multiple inputs and outputs.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

Posted by uniqueone
,

MNIST Kaggle submission with CNN Keras Swish activation
https://medium.com/@shahariarrabby/mnist-kaggle-submission-with-cnn-keras-switch-activation-62108f9463df
Posted by uniqueone
,
https://github.com/sergeytulyakov/mocogan

 

 

MoCoGAN: Decomposing Motion and Content for Video Generation

This repository contains an implementation and further details of MoCoGAN: Decomposing Motion and Content for Video Generation by Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, Jan Kautz.

Representation

MoCoGAN is a generative model for videos, which generates videos from random inputs. It features separated representations of motion and content, offering control over what is generated. For example, MoCoGAN can generate the same object performing different actions, as well as the same action performed by different objects.

MoCoGAN Representation

Examples of generated videos

We trained MoCoGAN on the MUG Facial Expression Database to generate facial expressions. When fixing the content code and changing the motion code, it generated the same person performing different expressions. When fixing the motion code and changing the content code, it generated different people performing the same expression. In the figure shown below, each column has a fixed identity and each row shows the same action:

Facial expressions

We trained MoCoGAN on a human action dataset where content is represented by the performer, executing several actions. When fixing the content code and changing the motion code, it generated the same person performing different actions. When fixing the motion code and changing the content code, it generated different people performing the same action. Each pair of images represents the same action executed by different people:

Human actions

We have collected a large-scale TaiChi dataset including 4.5K videos of TaiChi performers. Below are videos generated by MoCoGAN.

TaiChi

Training MoCoGAN

Please refer to the wiki page.

Citation

If you use MoCoGAN in your research please cite our paper:

Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, Jan Kautz, "MoCoGAN: Decomposing Motion and Content for Video Generation"

Posted by uniqueone
,

Keras Tutorial: The Ultimate Beginner's Guide to Deep Learning in Python
https://elitedatascience.com/keras-tutorial-deep-learning-in-python?utm_content=bufferbce2c&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
Posted by uniqueone
,

Introduction to Numpy -1 : An absolute beginners guide to Machine Learning and Data science.

 

https://hackernoon.com/introduction-to-numpy-1-an-absolute-beginners-guide-to-machine-learning-and-data-science-5d87f13f0d51

 

 

Let's get started quickly. Numpy is a math library for Python. It enables us to do computation efficiently and effectively. It is better than regular Python lists for numerical work because of its speed and convenience.

In this article I’m just going to introduce you to the basics of what is mostly required for machine learning and datascience. I’m not going to cover everything that’s possible with numpy library. This is the part one of numpy tutorial series.

The first thing I want to introduce you to is the way you import it.

import numpy as np

Okay, now we’re telling python that “np” is the official reference to numpy from further on.

Let’s create python array and np array.

# python array
a = [1,2,3,4,5,6,7,8,9]
# numpy array
A = np.array([1,2,3,4,5,6,7,8,9])

If I were to print them, I wouldn’t see much difference.

print(a)
print(A)
====================================================================
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1 2 3 4 5 6 7 8 9]

Okay, but why do I have to use an np array instead of a regular array?

The answer is that np arrays are better in terms of faster computation and ease of manipulation.

More on those details here, if you’re interested:

Let’s proceed further with more cool stuff. Wait, there was nothing cool we saw yet! Okay, here’s something:

np.arange()

np.arange(0,10,2)
====================================================================array([0, 2, 4, 6, 8])

What arange([start], stop, [step]) does is arrange numbers from start to stop, in steps of step. Here is what it means for np.arange(0,10,2):

return an np array starting from 0 all the way up to 10, but don't include 10, and increment numbers by 2 each time.

So, that’s how we get :

array([0, 2, 4, 6, 8])

The important thing to remember here is that the stopping number is not going to be included in the list.

another example:

np.arange(2,29,5)
====================================================================
array([ 2, 7, 12, 17, 22, 27])

Before I proceed further, I'll have to warn you that this "array" is interchangeably called a "matrix" or a "vector". So don't panic when I say, for example, "Matrix shape is 2 X 3". All it means is that the array looks something like this:

array([[ 2,  7, 12],
       [17, 22, 27]])

Now, Let’s talk about the shape of a default np array.

Shape is an attribute of an np array. When an array, say A, is asked for its shape, here is how it looks.

A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
A.shape
====================================================================
(9,)

This is a rank 1 matrix(array), where it just has 9 elements in a row. 
Ideally it should be a 1 X 9 matrix right?

I agree with you, so that’s where reshape() comes into play. It is a method that changes the dimensions of your original matrix into your desired dimension.

Let’s look at reshape in action. You can pass a tuple of whatever dimension you want as long as the reshaped matrix and original matrix have the same number of elements.

A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
A.reshape(1,9)
====================================================================
array([[1, 2, 3, 4, 5, 6, 7, 8, 9]])

Notice that reshape returns a multi-dim matrix. Two square brackets in the beginning indicate that. [[1, 2, 3, 4, 5, 6, 7, 8, 9]] is a potentially multi-dim matrix as opposed to [1, 2, 3, 4, 5, 6, 7, 8, 9].

Another example:

B = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
B = B.reshape(3,3)
B
====================================================================
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

If I look at B’s shape, it’s going to be (3,3):

B.shape
====================================================================
(3,3)

Perfect. Let’s proceed to np.zeros().

This time it’s your job to tell me what happens looking at this code:

np.zeros((4,3))
====================================================================
???????????

Good, if you thought it’s going to print a 4 X 3 matrix filled with zeros. Here’s the output:

np.zeros((4,3))
====================================================================
array([[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]])

np.zeros((n,m)) returns an n x m matrix that contains zeros. It’s as simple as that.

Let’s guess again here: what does is np.eye() do?

Hint: eye() stands for Identity.

np.eye(5)
====================================================================
array([[ 1., 0., 0., 0., 0.],
[ 0., 1., 0., 0., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 1.]])

np.eye() returns an identity matrix with the specified dimensions.

What if we have to multiply 2 matrices?

No problem, we have np.dot().

np.dot() performs matrix multiplication, provided both the matrices are “multiply-able”. It just means that the number of columns of the first matrix must match the number of rows in second matrix.

ex: A = (2,3) & B=(3,2). Here number of cols in A= 3. Number of rows in B = 3. Since they match, multiplication is possible.

Let’s illustrate multiplication via np code:

# generate an identity matrix of (3 x 3)
I = np.eye(3)
I
====================================================================
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
# generate another (3 x 3) matrix to be multiplied.
D = np.arange(1,10).reshape(3,3)
D
====================================================================
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])

We now prepared both the matrices to be multiplied. Let’s see them in action.

# perform actual dot product.
M = np.dot(D,I)
M
====================================================================
array([[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.]])

Great! Now you know how easy and possible it is to multiply matrices! Also, notice that the entire array is now float type.

What about adding Elements of the matrix?

# add all the elements of matrix.
sum_val = np.sum(M)
sum_val
====================================================================
45.0

np.sum() adds all the elements of the matrix.

However, there are 2 variants.

1. Sum along the rows.

# sum along the rows
np.sum(M,axis=1)
====================================================================
array([ 6., 15., 24.])

6 is the sum of 1st row (1, 2, 3).

15 is the sum of 2nd row (4, 5, 6).

24 is the sum of 3rd row (7, 8, 9).

2. Sum along the columns.

# sum along the cols
np.sum(M,axis=0)
====================================================================
array([ 12., 15., 18.])

12 is the sum of 1st col (1, 4, 7).

15 is the sum of 2nd col (2, 5, 8).

18 is the sum of 3rd col (3, 6, 9).

Here is the follow up tutorial — part 2 . That’s it at this point.

Here's a video tutorial explaining everything that I did, if you prefer to consume it via video.

If you liked this article, a clap/recommendation would be really appreciated. It helps me to write more such articles.

 

Posted by uniqueone
,
https://m.facebook.com/groups/869457839786067?view=permalink&id=1478681038863741

From the perspective of someone from industrial engineering who has studied machine learning, I have gathered Korean-language materials (lecture notes + videos) that are worth studying. For the materials below, whenever code practice is involved the language is Python throughout (the introductory statistics course is the only exception). For Korean-language videos and materials on deep learning, there is Prof. 김성훈's "모두의 딥러닝" (Deep Learning for Everyone), for which I am truly grateful. There are also many good lectures in English (cs231n, cs224d, the RL course by David Silver, the Neural Network course by Hugo Larochelle), but if you want to study machine learning quickly in a familiar language before tackling deep learning in English in earnest, the list below may be a useful reference.

cf. I included web programming out of personal interest.

[k-mooc]
Calculus 1 (Prof. 채영도, Sungkyunkwan University)
- http://www.kmooc.kr/courses/course-v1:SKKUk+SKKU_EXGB506.01K+2017_SKKU22/about

Calculus 2 (Prof. 채영도, Sungkyunkwan University)
- http://www.kmooc.kr/courses/course-v1:SKKUk+SKKU_2017_05-01+2017_SKKU01/about

Linear Algebra (Prof. 이상구, Sungkyunkwan University)
- http://www.kmooc.kr/courses/course-v1:SKKUk+SKKU_2017_01+2017_SKKU01/about

Introduction to Statistics Using R (Prof. 김충락, Pusan National University)
- http://www.kmooc.kr/courses/course-v1:PNUk+RS_C01+2017_KM_009/about

Artificial Intelligence and Machine Learning (Prof. 오혜연, KAIST)
- http://www.kmooc.kr/courses/course-v1:KAISTk+KCS470+2017_K0202/about

[kooc]
Python Data Structures (Prof. 문일철, KAIST)
- http://kooc.kaist.ac.kr/datastructure-2017f

Optimization for Image Understanding (Prof. 김창익, KAIST)
- http://kooc.kaist.ac.kr/optimization2017/lecture/10543

Introduction to Artificial Intelligence and Machine Learning 1 (Prof. 문일철, KAIST)
- http://kooc.kaist.ac.kr/machinelearning1_17/lecture/10574
- http://seslab.kaist.ac.kr/xe2/page_GBex27

Introduction to Artificial Intelligence and Machine Learning 2 (Prof. 문일철, KAIST)
- http://kooc.kaist.ac.kr/machinelearning2__17/lecture/10573
- http://seslab.kaist.ac.kr/xe2/page_Dogc43
 
Advanced Artificial Intelligence and Machine Learning (Prof. 문일철, KAIST)
- http://kooc.kaist.ac.kr/machinelearning3
- http://seslab.kaist.ac.kr/xe2/page_lMmY25

[TeamLab]
Introduction to Python for Data Science (Prof. 최성철, Gachon University)
- https://github.com/TeamLab/Gachon_CS50_Python_KMOOC

Machine Learning from Scratch (Prof. 최성철, Gachon University)
- https://github.com/TeamLab/machine_learning_from_scratch_with_python

Management Science (Prof. 최성철, Gachon University)
- https://github.com/TeamLab/Gachon_CS50_OR_KMOOC

Web Programming (Prof. 최성철, Gachon University)
- https://github.com/TeamLab/cs50_web_programming

Posted by uniqueone
,
https://m.facebook.com/story.php?story_fbid=383487222087279&id=303538826748786

Naive Bayes Classification
We will discuss some of the mathematics of posterior probability, known through Bayes' Theorem. This is the core of the Naive Bayes Classifier. We will then explore Python's sklearn library and write code for a Naive Bayes Classifier in Python for the problem under discussion.
This article is divided into two parts. Part 1 explains how the Naive Bayes classifier works. Part 2 is a programming exercise using the sklearn library, which provides a Naive Bayes Classifier in Python, and discusses the accuracy of the program we train.
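As a quick sketch of the sklearn workflow described above (the 20 newsgroups data set here is only an illustrative choice, not necessarily the one used in the original post):

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train = fetch_20newsgroups(subset='train')
test = fetch_20newsgroups(subset='test')

# turn raw text into word-count features, then fit a Naive Bayes classifier
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train.data)
X_test = vectorizer.transform(test.data)

clf = MultinomialNB().fit(X_train, train.target)
print("test accuracy:", clf.score(X_test, test.target))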

Original article:
https://medium.com/machine-learning-101/chapter-1-supervised-learning-and-naive-bayes-classification-part-1-theory-8b9e361897d5
Posted by uniqueone
,

Machine Learning  |  Google Developers
https://developers.google.com/machine-learning/glossary/




A


accuracy

The fraction of predictions that a classification model got right. In multi-class classification, accuracy is defined as follows:

\text{Accuracy} = \frac{\text{Correct Predictions}}{\text{Total Number of Examples}}
In binary classification, accuracy has the following definition:

\text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{Total Number of Examples}}
See true positive and true negative.


activation function

A function (for example, ReLU or sigmoid) that takes in the weighted sum of all of the inputs from the previous layer and then generates and passes an output value (typically nonlinear) to the next layer.


AdaGrad

A sophisticated gradient descent algorithm that rescales the gradients of each parameter, effectively giving each parameter an independent learning rate. For a full explanation, see this paper.


AUC (Area under the ROC Curve)

An evaluation metric that considers all possible classification thresholds.

The Area Under the ROC curve is the probability that a classifier will be more confident that a randomly chosen positive example is actually positive than that a randomly chosen negative example is positive.

B


backpropagation

The primary algorithm for performing gradient descent on neural networks. First, the output values of each node are calculated (and cached) in a forward pass. Then, the partial derivative of the error with respect to each parameter is calculated in a backward pass through the graph.


baseline

A simple model or heuristic used as reference point for comparing how well a model is performing. A baseline helps model developers quantify the minimal, expected performance on a particular problem.


batch

The set of examples used in one iteration (that is, one gradient update) of model training.


batch size

The number of examples in a batch. For example, the batch size of SGD is 1, while the batch size of a mini-batch is usually between 10 and 1000. Batch size is usually fixed during training and inference; however, TensorFlow does permit dynamic batch sizes.


bias

An intercept or offset from an origin. Bias (also known as the bias term) is referred to as b or w0 in machine learning models. For example, bias is the b in the following formula:

y' = b + w_1 x_1 + w_2 x_2 + \ldots + w_n x_n
Not to be confused with prediction bias.


binary classification

A type of classification task that outputs one of two mutually exclusive classes. For example, a machine learning model that evaluates email messages and outputs either "spam" or "not spam" is a binary classifier.


binning

See bucketing.


bucketing

Converting a (usually continuous) feature into multiple binary features called buckets or bins, typically based on value range. For example, instead of representing temperature as a single continuous floating-point feature, you could chop ranges of temperatures into discrete bins. Given temperature data sensitive to a tenth of a degree, all temperatures between 0.0 and 15.0 degrees could be put into one bin, 15.1 to 30.0 degrees could be a second bin, and 30.1 to 50.0 degrees could be a third bin.
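As a rough numpy illustration of the temperature example (the sample readings and bin edges below are made up for the sketch), np.digitize returns the bucket index for each value:

import numpy as np

temps = np.array([3.4, 12.9, 21.7, 33.3, 49.9])   # made-up temperature readings
edges = np.array([15.0, 30.0])                    # boundaries between the three buckets
print(np.digitize(temps, edges))                  # [0 0 1 2 2] -> bucket ids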

C


calibration layer

A post-prediction adjustment, typically to account for prediction bias. The adjusted predictions and probabilities should match the distribution of an observed set of labels.


candidate sampling

A training-time optimization in which a probability is calculated for all the positive labels, using, for example, softmax, but only for a random sample of negative labels. For example, if we have an example labeled beagle and dog, candidate sampling computes the predicted probabilities and corresponding loss terms for the beagle and dog class outputs in addition to a random subset of the remaining classes (cat, lollipop, fence). The idea is that the negative classes can learn from less frequent negative reinforcement as long as positive classes always get proper positive reinforcement, and this is indeed observed empirically. The motivation for candidate sampling is a computational efficiency win from not computing predictions for all negatives.


checkpoint

Data that captures the state of the variables of a model at a particular time. Checkpoints enable exporting model weights, as well as performing training across multiple sessions. Checkpoints also enable training to continue past errors (for example, job preemption). Note that the graph itself is not included in a checkpoint.


class

One of a set of enumerated target values for a label. For example, in a binary classification model that detects spam, the two classes are spam and not spam. In a multi-class classification model that identifies dog breeds, the classes would be poodle, beagle, pug, and so on.


class-imbalanced data set

A binary classification problem in which the labels for the two classes have significantly different frequencies. For example, a disease data set in which 0.0001 of examples have positive labels and 0.9999 have negative labels is a class-imbalanced problem, but a football game predictor in which 0.51 of examples label one team winning and 0.49 label the other team winning is not a class-imbalanced problem.


classification model

A type of machine learning model for distinguishing among two or more discrete classes. For example, a natural language processing classification model could determine whether an input sentence was in French, Spanish, or Italian. Compare with regression model.


classification threshold

A scalar-value criterion that is applied to a model's predicted score in order to separate the positive class from the negative class. Used when mapping logistic regression results to binary classification. For example, consider a logistic regression model that determines the probability of a given email message being spam. If the classification threshold is 0.9, then logistic regression values above 0.9 are classified as spam and those below 0.9 are classified as not spam.


confusion matrix

An NxN table that summarizes how successful a classification model's predictions were; that is, the correlation between the label and the model's classification. One axis of a confusion matrix is the label that the model predicted, and the other axis is the actual label. N represents the number of classes. In a binary classification problem, N=2. For example, here is a sample confusion matrix for a binary classification problem:

                      Tumor (predicted)   Non-Tumor (predicted)
Tumor (actual)                18                      1
Non-Tumor (actual)             6                    452
The preceding confusion matrix shows that of the 19 samples that actually had tumors, the model correctly classified 18 as having tumors (18 true positives), and incorrectly classified 1 as not having a tumor (1 false negative). Similarly, of 458 samples that actually did not have tumors, 452 were correctly classified (452 true negatives) and 6 were incorrectly classified (6 false positives).

The confusion matrix of a multi-class confusion matrix can help you determine mistake patterns. For example, a confusion matrix could reveal that a model trained to recognize handwritten digits tends to mistakenly predict 9 instead of 4, or 1 instead of 7. The confusion matrix contains sufficient information to calculate a variety of performance metrics, including precision and recall.


continuous feature

A floating-point feature with an infinite range of possible values. Contrast with discrete feature.


convergence

Informally, often refers to a state reached during training in which training loss and validation loss change very little or not at all with each iteration after a certain number of iterations. In other words, a model reaches convergence when additional training on the current data will not improve the model. In deep learning, loss values sometimes stay constant or nearly so for many iterations before finally descending, temporarily producing a false sense of convergence.

See also early stopping.

See also Convex Optimization by Boyd and Vandenberghe.


convex function

A function typically shaped approximately like the letter U or a bowl. However, in degenerate cases, a convex function is shaped like a line. For example, the following are all convex functions:

L2 loss
Log Loss
L1 regularization
L2 regularization
Convex functions are popular loss functions. That's because when a minimum value exists (as is often the case), many variations of gradient descent are guaranteed to find a point close to the minimum point of the function. Similarly, many variations of stochastic gradient descent have a high probability (though, not a guarantee) of finding a point close to the minimum.

The sum of two convex functions (for example, L2 loss + L1 regularization) is a convex function.

Deep models are usually not convex functions. Remarkably, algorithms designed for convex optimization tend to work reasonably well on deep networks anyway, even though they rarely find a minimum.


cost

Synonym for loss.


cross-entropy

A generalization of Log Loss to multi-class classification problems. Cross-entropy quantifies the difference between two probability distributions. See also perplexity.

D


data set

A collection of examples.


decision boundary

The separator between classes learned by a model in binary class or multi-class classification problems. For example, in the following image representing a binary classification problem, the decision boundary is the frontier between the orange class and the blue class:

[Image: a well-defined boundary between one class and another.]


deep model

A type of neural network containing multiple hidden layers. Deep models rely on trainable nonlinearities.

Contrast with wide model.


dense feature

A feature in which most values are non-zero, typically a Tensor of floating-point values. Contrast with sparse feature.


derived feature

Synonym for synthetic feature.


discrete feature

A feature with a finite set of possible values. For example, a feature whose values may only be animal, vegetable, or mineral is a discrete (or categorical) feature. Contrast with continuous feature.


dropout regularization

A form of regularization useful in training neural networks. Dropout regularization works by removing a random selection of a fixed number of the units in a network layer for a single gradient step. The more units dropped out, the stronger the regularization. This is analogous to training the network to emulate an exponentially large ensemble of smaller networks. For full details, see Dropout: A Simple Way to Prevent Neural Networks from Overfitting.


dynamic model

A model that is trained online in a continuously updating fashion. That is, data is continuously entering the model.

E


early stopping

A method for regularization that involves ending model training before training loss finishes decreasing. In early stopping, you end model training when the loss on a validation data set starts to increase, that is, when generalization performance worsens.


embeddings

A categorical feature represented as a continuous-valued feature. Typically, an embedding is a translation of a high-dimensional vector into a low-dimensional space. For example, you can represent the words in an English sentence in either of the following two ways:

As a million-element (high-dimensional) sparse vector in which all elements are integers. Each cell in the vector represents a separate English word; the value in a cell represents the number of times that word appears in a sentence. Since a single English sentence is unlikely to contain more than 50 words, nearly every cell in the vector will contain a 0. The few cells that aren't 0 will contain a low integer (usually 1) representing the number of times that word appeared in the sentence.
As a several-hundred-element (low-dimensional) dense vector in which each element holds a floating-point value between 0 and 1.
In TensorFlow, embeddings are trained by backpropagating loss just like any other parameter in a neural network.


empirical risk minimization (ERM)

Choosing the model function that minimizes loss on the training set. Contrast with structural risk minimization.


ensemble

A merger of the predictions of multiple models. You can create an ensemble via one or more of the following:

different initializations
different hyperparameters
different overall structure
Deep and wide models are a kind of ensemble.


Estimator

An instance of the tf.Estimator class, which encapsulates logic that builds a TensorFlow graph and runs a TensorFlow session. You may create your own Estimators (as described here) or instantiate pre-made Estimators created by others.


example

One row of a data set. An example contains one or more features and possibly a label. See also labeled example and unlabeled example.

F


false negative (FN)

An example in which the model mistakenly predicted the negative class. For example, the model inferred that a particular email message was not spam (the negative class), but that email message actually was spam.


false positive (FP)

An example in which the model mistakenly predicted the positive class. For example, the model inferred that a particular email message was spam (the positive class), but that email message was actually not spam.


false positive rate (FP rate)

The x-axis in an ROC curve. The FP rate is defined as follows:

\text{False Positive Rate} = \frac{\text{False Positives}}{\text{False Positives} + \text{True Negatives}}

feature

An input variable used in making predictions.


feature columns (FeatureColumns)

A set of related features, such as the set of all possible countries in which users might live. An example may have one or more features present in a feature column.

Feature columns in TensorFlow also encapsulate metadata such as:

the feature's data type
whether a feature is fixed length or should be converted to an embedding
A feature column can contain a single feature.

"Feature column" is Google-specific terminology. A feature column is referred to as a "namespace" in the VW system (at Yahoo/Microsoft), or a field.


feature cross

A synthetic feature formed by crossing (multiplying or taking a Cartesian product of) individual features. Feature crosses help represent nonlinear relationships.


feature engineering

The process of determining which features might be useful in training a model, and then converting raw data from log files and other sources into said features. In TensorFlow, feature engineering often means converting raw log file entries to tf.Example protocol buffers. See also tf.Transform.

Feature engineering is sometimes called feature extraction.


feature set

The group of features your machine learning model trains on. For example, postal code, property size, and property condition might comprise a simple feature set for a model that predicts housing prices.


feature spec

Describes the information required to extract features data from the tf.Example protocol buffer. Because the tf.Example protocol buffer is just a container for data, you must specify the following:

the data to extract (that is, the keys for the features)
the data type (for example, float or int)
The length (fixed or variable)
The Estimator API provides facilities for producing a feature spec from a list of FeatureColumns.


full softmax

See softmax. Contrast with candidate sampling.

G


generalization

Refers to your model's ability to make correct predictions on new, previously unseen data as opposed to the data used to train the model.


generalized linear model

A generalization of least squares regression models, which are based on Gaussian noise, to other types of models based on other types of noise, such as Poisson noise or categorical noise. Examples of generalized linear models include:

logistic regression
multi-class regression
least squares regression
The parameters of a generalized linear model can be found through convex optimization.

Generalized linear models exhibit the following properties:

The average prediction of the optimal least squares regression model is equal to the average label on the training data.
The average probability predicted by the optimal logistic regression model is equal to the average label on the training data.
The power of a generalized linear model is limited by its features. Unlike a deep model, a generalized linear model cannot "learn new features."


gradient

The vector of partial derivatives with respect to all of the independent variables. In machine learning, the gradient is the vector of partial derivatives of the model function. The gradient points in the direction of steepest ascent.


gradient clipping

Capping gradient values before applying them. Gradient clipping helps ensure numerical stability and prevents exploding gradients.


gradient descent

A technique to minimize loss by computing the gradients of loss with respect to the model's parameters, conditioned on training data. Informally, gradient descent iteratively adjusts parameters, gradually finding the best combination of weights and bias to minimize loss.
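A toy sketch of the idea (not from the glossary), fitting a single weight w in y = w * x by repeatedly stepping against the gradient of the mean squared error:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x                                # data generated from y = 3x, so w should approach 3

w, learning_rate = 0.0, 0.01
for _ in range(200):
    grad = np.mean(2 * (w * x - y) * x)    # derivative of mean((w*x - y)^2) with respect to w
    w -= learning_rate * grad              # step in the direction of steepest descent
print(w)                                   # close to 3.0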


graph

In TensorFlow, a computation specification. Nodes in the graph represent operations. Edges are directed and represent passing the result of an operation (a Tensor) as an operand to another operation. Use TensorBoard to visualize a graph.

H


heuristic

A practical and nonoptimal solution to a problem, which is sufficient for making progress or for learning from.


hidden layer

A synthetic layer in a neural network between the input layer (that is, the features) and the output layer (the prediction). A neural network contains one or more hidden layers.


hinge loss

A family of loss functions for classification designed to find the decision boundary as distant as possible from each training example, thus maximizing the margin between examples and the boundary. KSVMs use hinge loss (or a related function, such as squared hinge loss). For binary classification, the hinge loss function is defined as follows:

\text{loss} = \max(0, 1 - (y' \cdot y))
where y' is the raw output of the classifier model:

y' = b + w_1 x_1 + w_2 x_2 + \ldots + w_n x_n
and y is the true label, either -1 or +1.

Consequently, a plot of hinge loss vs. (y * y') looks as follows:

[Image: a plot of hinge loss vs. raw classifier score shows a distinct hinge at the coordinate (1,0).]


holdout data

Examples intentionally not used ("held out") during training. The validation data set and test data set are examples of holdout data. Holdout data helps evaluate your model's ability to generalize to data other than the data it was trained on. The loss on the holdout set provides a better estimate of the loss on an unseen data set than does the loss on the training set.


hyperparameter

The "knobs" that you tweak during successive runs of training a model. For example, learning rate is a hyperparameter.

Contrast with parameter.

I


independently and identically distributed (i.i.d)

Data drawn from a distribution that doesn't change, and where each value drawn doesn't depend on values that have been drawn previously. An i.i.d. is the ideal gas of machine learning—a useful mathematical construct but almost never exactly found in the real world. For example, the distribution of visitors to a web page may be i.i.d. over a brief window of time; that is, the distribution doesn't change during that brief window and one person's visit is generally independent of another's visit. However, if you expand that window of time, seasonal differences in the web page's visitors may appear.


inference

In machine learning, often refers to the process of making predictions by applying the trained model to unlabeled examples. In statistics, inference refers to the process of fitting the parameters of a distribution conditioned on some observed data. (See the Wikipedia article on statistical inference.)


input layer

The first layer (the one that receives the input data) in a neural network.


instance

Synonym for example.


inter-rater agreement

A measurement of how often human raters agree when doing a task. If raters disagree, the task instructions may need to be improved. Also sometimes called inter-annotator agreement or inter-rater reliability. See also Cohen's kappa, which is one of the most popular inter-rater agreement measurements.

K


Kernel Support Vector Machines (KSVMs)

A classification algorithm that seeks to maximize the margin between positive and negative classes by mapping input data vectors to a higher dimensional space. For example, consider a classification problem in which the input data set consists of a hundred features. In order to maximize the margin between positive and negative classes, KSVMs could internally map those features into a million-dimension space. KSVMs uses a loss function called hinge loss.

L


L1 loss

Loss function based on the absolute value of the difference between the values that a model is predicting and the actual values of the labels. L1 loss is less sensitive to outliers than L2 loss.


L1 regularization

A type of regularization that penalizes weights in proportion to the sum of the absolute values of the weights. In models relying on sparse features, L1 regularization helps drive the weights of irrelevant or barely relevant features to exactly 0, which removes those features from the model. Contrast with L2 regularization.


L2 loss

See squared loss.


L2 regularization

A type of regularization that penalizes weights in proportion to the sum of the squares of the weights. L2 regularization helps drive outlier weights (those with high positive or low negative values) closer to 0 but not quite to 0. (Contrast with L1 regularization.) L2 regularization always improves generalization in linear models.


label

In supervised learning, the "answer" or "result" portion of an example. Each example in a labeled data set consists of one or more features and a label. For instance, in a housing data set, the features might include the number of bedrooms, the number of bathrooms, and the age of the house, while the label might be the house's price. In a spam detection dataset, the features might include the subject line, the sender, and the email message itself, while the label would probably be either "spam" or "not spam."


labeled example

An example that contains features and a label. In supervised training, models learn from labeled examples.


lambda

Synonym for regularization rate.

(This is an overloaded term. Here we're focusing on the term's definition within regularization.)


layer

A set of neurons in a neural network that process a set of input features, or the output of those neurons.

Also, an abstraction in TensorFlow. Layers are Python functions that take Tensors and configuration options as input and produce other tensors as output. Once the necessary Tensors have been composed, the user can convert the result into an Estimator via a model function.


learning rate

A scalar used to train a model via gradient descent. During each iteration, the gradient descent algorithm multiplies the learning rate by the gradient. The resulting product is called the gradient step.

Learning rate is a key hyperparameter.


least squares regression

A linear regression model trained by minimizing L2 Loss.


linear regression

A type of regression model that outputs a continuous value from a linear combination of input features.


logistic regression

A model that generates a probability for each possible discrete label value in classification problems by applying a sigmoid function to a linear prediction. Although logistic regression is often used in binary classification problems, it can also be used in multi-class classification problems (where it is called multi-class logistic regression or multinomial regression).


Log Loss

The loss function used in binary logistic regression.


loss

A measure of how far a model's predictions are from its label. Or, to phrase it more pessimistically, a measure of how bad the model is. To determine this value, a model must define a loss function. For example, linear regression models typically use mean squared error for a loss function, while logistic regression models use Log Loss.

M


machine learning

A program or system that builds (trains) a predictive model from input data. The system uses the learned model to make useful predictions from new (never-before-seen) data drawn from the same distribution as the one used to train the model. Machine learning also refers to the field of study concerned with these programs or systems.


Mean Squared Error (MSE)

The average squared loss per example. MSE is calculated by dividing the squared loss by the number of examples. The values that TensorFlow Playground displays for "Training loss" and "Test loss" are MSE.


metric

A number that you care about. May or may not be directly optimized in a machine-learning system. A metric that your system tries to optimize is called an objective.


mini-batch

A small, randomly selected subset of the entire batch of examples run together in a single iteration of training or inference. The batch size of a mini-batch is usually between 10 and 1,000. It is much more efficient to calculate the loss on a mini-batch than on the full training data.


mini-batch stochastic gradient descent (SGD)

A gradient descent algorithm that uses mini-batches. In other words, mini-batch SGD estimates the gradient based on a small subset of the training data. Vanilla SGD uses a mini-batch of size 1.


ML

Abbreviation for machine learning.


model

The representation of what an ML system has learned from the training data. This is an overloaded term, which can have either of the following two related meanings:

The TensorFlow graph that expresses the structure of how a prediction will be computed.
The particular weights and biases of that TensorFlow graph, which are determined by training.

model training

The process of determining the best model.


Momentum

A sophisticated gradient descent algorithm in which a learning step depends not only on the derivative in the current step, but also on the derivatives in the step(s) that immediately preceded it. Momentum involves computing an exponentially weighted moving average of the gradients over time, analogous to momentum in physics. Momentum sometimes prevents learning from getting stuck in local minima.


multi-class

Classification problems that distinguish among more than two classes. For example, there are approximately 128 species of maple trees, so a model that categorized maple tree species would be multi-class. Conversely, a model that divided emails into only two categories (spam and not spam) would be a binary classification model.

N


NaN trap

When one number in your model becomes a NaN during training, which causes many or all other numbers in your model to eventually become a NaN.

NaN is an abbreviation for "Not a Number."


negative class

In binary classification, one class is termed positive and the other is termed negative. The positive class is the thing we're looking for and the negative class is the other possibility. For example, the negative class in a medical test might be "not tumor." The negative class in an email classifier might be "not spam." See also positive class.


neural network

A model that, taking inspiration from the brain, is composed of layers (at least one of which is hidden) consisting of simple connected units or neurons followed by nonlinearities.


neuron

A node in a neural network, typically taking in multiple input values and generating one output value. The neuron calculates the output value by applying an activation function (nonlinear transformation) to a weighted sum of input values.


normalization

The process of converting an actual range of values into a standard range of values, typically -1 to +1 or 0 to 1. For example, suppose the natural range of a certain feature is 800 to 6,000. Through subtraction and division, you can normalize those values into the range -1 to +1.
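A small sketch of that 800-to-6,000 example, using a standard min-max rescale onto [-1, +1] (the sample values are made up):

import numpy as np

values = np.array([800.0, 2500.0, 6000.0])
lo, hi = 800.0, 6000.0
normalized = 2 * (values - lo) / (hi - lo) - 1   # maps 800 -> -1 and 6000 -> +1
print(normalized)                                # [-1.  -0.34615385  1.]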

See also scaling.


numpy

An open-source math library that provides efficient array operations in Python. pandas is built on numpy.

O


objective

A metric that your algorithm is trying to optimize.


offline inference

Generating a group of predictions, storing those predictions, and then retrieving those predictions on demand. Contrast with online inference.


one-hot encoding

A sparse vector in which:

One element is set to 1.
All other elements are set to 0.
One-hot encoding is commonly used to represent strings or identifiers that have a finite set of possible values. For example, suppose a given botany data set chronicles 15,000 different species, each denoted with a unique string identifier. As part of feature engineering, you'll probably encode those string identifiers as one-hot vectors in which the vector has a size of 15,000.
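A minimal numpy sketch of the idea, using a vocabulary of 5 instead of 15,000 purely for illustration:

import numpy as np

vocab_size = 5
species_id = 2                     # index of one species in the vocabulary
one_hot = np.zeros(vocab_size)
one_hot[species_id] = 1.0
print(one_hot)                     # [0. 0. 1. 0. 0.]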


one-vs.-all

Given a classification problem with N possible solutions, a one-vs.-all solution consists of N separate binary classifiers—one binary classifier for each possible outcome. For example, given a model that classifies examples as animal, vegetable, or mineral, a one-vs.-all solution would provide the following three separate binary classifiers:

animal vs. not animal
vegetable vs. not vegetable
mineral vs. not mineral

online inference

Generating predictions on demand. Contrast with offline inference.


Operation (op)

A node in the TensorFlow graph. In TensorFlow, any procedure that creates, manipulates, or destroys a Tensor is an operation. For example, a matrix multiply is an operation that takes two Tensors as input and generates one Tensor as output.


optimizer

A specific implementation of the gradient descent algorithm. TensorFlow's base class for optimizers is tf.train.Optimizer. Different optimizers (subclasses of tf.train.Optimizer) account for concepts such as:

momentum (Momentum)
update frequency (AdaGrad = ADAptive GRADient descent; Adam = ADAptive with Momentum; RMSProp)
sparsity/regularization (Ftrl)
more complex math (Proximal, and others)
You might even imagine an NN-driven optimizer.


outliers

Values distant from most other values. In machine learning, any of the following are outliers:

Weights with high absolute values.
Predicted values relatively far away from the actual values.
Input data whose values are more than roughly 3 standard deviations from the mean.
Outliers often cause problems in model training.


output layer

The "final" layer of a neural network. The layer containing the answer(s).


overfitting

Creating a model that matches the training data so closely that the model fails to make correct predictions on new data.

P


pandas

A column-oriented data analysis API. Many ML frameworks, including TensorFlow, support pandas data structures as input. See pandas documentation.


parameter

A variable of a model that the ML system trains on its own. For example, weights are parameters whose values the ML system gradually learns through successive training iterations. Contrast with hyperparameter.


Parameter Server (PS)

A job that keeps track of a model's parameters in a distributed setting.


parameter update

The operation of adjusting a model's parameters during training, typically within a single iteration of gradient descent.


partial derivative

A derivative in which all but one of the variables is considered a constant. For example, the partial derivative of f(x, y) with respect to x is the derivative of f considered as a function of x alone (that is, keeping y constant). The partial derivative of f with respect to x focuses only on how x is changing and ignores all other variables in the equation.


partitioning strategy

The algorithm by which variables are divided across parameter servers.


performance

Overloaded term with the following meanings:

The traditional meaning within software engineering. Namely: How fast (or efficiently) does this piece of software run?
The meaning within ML. Here, performance answers the following question: How correct is this model? That is, how good are the model's predictions?

perplexity

One measure of how well a model is accomplishing its task. For example, suppose your task is to read the first few letters of a word a user is typing on a smartphone keyboard, and to offer a list of possible completion words. Perplexity, P, for this task is approximately the number of guesses you need to offer in order for your list to contain the actual word the user is trying to type.

Perplexity is related to cross-entropy as follows:

P = 2^{-\text{cross entropy}}

pipeline

The infrastructure surrounding a machine learning algorithm. A pipeline includes gathering the data, putting the data into training data files, training one or more models, and exporting the models to production.


positive class

In binary classification, the two possible classes are labeled as positive and negative. The positive outcome is the thing we're testing for. (Admittedly, we're simultaneously testing for both outcomes, but play along.) For example, the positive class in a medical test might be "tumor." The positive class in an email classifier might be "spam."

Contrast with negative class.


precision

A metric for classification models. Precision identifies the frequency with which a model was correct when predicting the positive class. That is:

\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}

prediction

A model's output when provided with an input example.


prediction bias

A value indicating how far apart the average of predictions is from the average of labels in the data set.


pre-made Estimator

An Estimator that someone has already built. TensorFlow provides several pre-made Estimators, including DNNClassifier, DNNRegressor, and LinearClassifier. You may build your own pre-made Estimators by following these instructions.


pre-trained model

Models or model components (such as embeddings) that have already been trained. Sometimes, you'll feed pre-trained embeddings into a neural network. Other times, your model will train the embeddings itself rather than rely on the pre-trained embeddings.


prior belief

What you believe about the data before you begin training on it. For example, L2 regularization relies on a prior belief that weights should be small and normally distributed around zero.

Q


queue

A TensorFlow Operation that implements a queue data structure. Typically used in I/O.

R


rank

Overloaded term in ML that can mean either of the following:

The number of dimensions in a Tensor. For instance, a scalar has rank 0, a vector has rank 1, and a matrix has rank 2.
The ordinal position of a class in an ML problem that categorizes classes from highest to lowest. For example, a behavior ranking system could rank a dog's rewards from highest (a steak) to lowest (wilted kale).

rater

A human who provides labels in examples. Sometimes called an "annotator."


recall

A metric for classification models that answers the following question: Out of all the possible positive labels, how many did the model correctly identify? That is:

\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}

Rectified Linear Unit (ReLU)

An activation function with the following rules:

If input is negative or zero, output is 0.
If input is positive, output is equal to input.
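A minimal NumPy sketch of these two rules (the example inputs are made up):

import numpy as np

def relu(x):
    # 0 for negative or zero input, the input itself for positive input
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, 0.0, 3.0])))   # [0. 0. 3.]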

regression model

A type of model that outputs continuous (typically, floating-point) values. Compare with classification models, which output discrete values, such as "day lily" or "tiger lily."


regularization

The penalty on a model's complexity. Regularization helps prevent overfitting. Different kinds of regularization include:

L1 regularization
L2 regularization
dropout regularization
early stopping (this is not a formal regularization method, but can effectively limit overfitting)

regularization rate

A scalar value, represented as lambda, specifying the relative importance of the regularization function. The following simplified loss equation shows the regularization rate's influence:

minimize(loss function + λ(regularization function))
Raising the regularization rate reduces overfitting but may make the model less accurate.
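For instance, a minimal Python sketch of this objective, assuming L2 regularization as the regularization function and made-up weights and data loss:

import numpy as np

weights = np.array([0.5, -1.2, 3.0])   # hypothetical model weights
data_loss = 0.37                       # hypothetical loss on the training data
lam = 0.01                             # regularization rate (lambda)

l2_penalty = np.sum(weights ** 2)      # L2 regularization function
total_loss = data_loss + lam * l2_penalty
print(total_loss)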


representation

The process of mapping data to useful features.


ROC (receiver operating characteristic) Curve

A curve of true positive rate vs. false positive rate at different classification thresholds. See also AUC.


root directory

The directory you specify for hosting subdirectories of the TensorFlow checkpoint and events files of multiple models.


Root Mean Squared Error (RMSE)

The square root of the Mean Squared Error.
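A minimal NumPy sketch with made-up labels and predictions:

import numpy as np

def rmse(labels, predictions):
    labels = np.asarray(labels, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    return np.sqrt(np.mean((labels - predictions) ** 2))

print(rmse([3.0, 5.0, 2.0], [2.5, 5.5, 2.0]))   # about 0.41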

S


Saver

A TensorFlow object responsible for saving model checkpoints.


scaling

A commonly used practice in feature engineering to tame a feature's range of values to match the range of other features in the data set. For example, suppose that you want all floating-point features in the data set to have a range of 0 to 1. Given a particular feature's range of 0 to 500, you could scale that feature by dividing each value by 500.

See also normalization.
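Continuing the 0-to-500 example above, a minimal NumPy sketch (the feature values are made up):

import numpy as np

feature = np.array([0.0, 125.0, 250.0, 500.0])   # raw values in the range 0 to 500
scaled = feature / 500.0                         # now in the range 0 to 1
print(scaled)                                    # [0.   0.25 0.5  1.  ]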


scikit-learn

A popular open-source ML platform. See www.scikit-learn.org.


sequence model

A model whose inputs have a sequential dependence. For example, predicting the next video watched from a sequence of previously watched videos.


session

Maintains state (for example, variables) within a TensorFlow program.


sigmoid function

A function that maps logistic or multinomial regression output (log odds) to probabilities, returning a value between 0 and 1. The sigmoid function has the following formula:

y = 1 / (1 + e^(-σ))
where σ in logistic regression problems is simply:

σ = b + w1x1 + w2x2 + … + wnxn
In other words, the sigmoid function converts σ into a probability between 0 and 1.

In some neural networks, the sigmoid function acts as the activation function.
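A minimal NumPy sketch of the formula above (the example inputs are made up):

import numpy as np

def sigmoid(sigma):
    # sigma = b + w1*x1 + ... + wn*xn in logistic regression
    return 1.0 / (1.0 + np.exp(-sigma))

print(sigmoid(0.0))    # 0.5
print(sigmoid(4.0))    # about 0.98
print(sigmoid(-4.0))   # about 0.02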


softmax

A function that provides probabilities for each possible class in a multi-class classification model. The probabilities add up to exactly 1.0. For example, softmax might determine that the probability of a particular image being a dog is 0.9, a cat 0.08, and a horse 0.02. (Also called full softmax.)

Contrast with candidate sampling.
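A minimal NumPy sketch (the logit values are made up):

import numpy as np

def softmax(logits):
    exps = np.exp(logits - np.max(logits))   # subtract the max for numerical stability
    return exps / np.sum(exps)

probs = softmax(np.array([3.0, 0.5, -1.0]))
print(probs)        # roughly [0.91 0.07 0.02]
print(probs.sum())  # 1.0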


sparse feature

Feature vector whose values are predominately zero or empty. For example, a vector containing a single 1 value and a million 0 values is sparse. As another example, words in a search query could also be a sparse feature—there are many possible words in a given language, but only a few of them occur in a given query.

Contrast with dense feature.


squared loss

The loss function used in linear regression. (Also known as L2 Loss.) This function calculates the square of the difference between a model's predicted value for a labeled example and the actual value of the label. Due to squaring, this loss function amplifies the influence of bad predictions. That is, squared loss reacts more strongly to outliers than L1 loss.
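A minimal Python sketch with made-up numbers:

def squared_loss(label, prediction):
    return (label - prediction) ** 2

print(squared_loss(3.0, 2.5))   # 0.25
print(squared_loss(3.0, 9.0))   # 36.0 -- squaring amplifies an outlier-sized error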


static model

A model that is trained offline.


stationarity

A property of data in a data set, in which the data distribution stays constant across one or more dimensions. Most commonly, that dimension is time, meaning that data exhibiting stationarity doesn't change over time. For example, data that exhibits stationarity doesn't change from September to December.


step

A forward and backward evaluation of one batch.


step size

Synonym for learning rate.


stochastic gradient descent (SGD)

A gradient descent algorithm in which the batch size is one. In other words, SGD relies on a single example chosen uniformly at random from a data set to calculate an estimate of the gradient at each step.


structural risk minimization (SRM)

An algorithm that balances two goals:

The desire to build the most predictive model (for example, lowest loss).
The desire to keep the model as simple as possible (for example, strong regularization).
For example, a model function that minimizes loss+regularization on the training set is a structural risk minimization algorithm.

For more information, see http://www.svms.org/srm/.

Contrast with empirical risk minimization.


summary

In TensorFlow, a value or set of values calculated at a particular step, usually used for tracking model metrics during training.


supervised machine learning

Training a model from input data and its corresponding labels. Supervised machine learning is analogous to a student learning a subject by studying a set of questions and their corresponding answers. After mastering the mapping between questions and answers, the student can then provide answers to new (never-before-seen) questions on the same topic. Compare with unsupervised machine learning.


synthetic feature

A feature that is not present among the input features, but is derived from one or more of them. Kinds of synthetic features include the following:

Multiplying one feature by itself or by other feature(s). (These are termed feature crosses.)
Dividing one feature by a second feature.
Bucketing a continuous feature into range bins.
Features created by normalizing or scaling alone are not considered synthetic features.
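A minimal NumPy sketch of the kinds listed above, using made-up raw features:

import numpy as np

rooms = np.array([2.0, 3.0, 5.0])
persons = np.array([1.0, 4.0, 2.0])
latitude = np.array([37.1, 37.8, 38.4])

rooms_x_persons = rooms * persons                       # multiplying two features (a feature cross)
rooms_per_person = rooms / persons                      # dividing one feature by another
lat_bucket = np.digitize(latitude, bins=[37.5, 38.0])   # bucketing a continuous feature into range bins
print(rooms_x_persons, rooms_per_person, lat_bucket)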

T


target

Synonym for label.


Tensor

The primary data structure in TensorFlow programs. Tensors are N-dimensional (where N could be very large) data structures, most commonly scalars, vectors, or matrices. The elements of a Tensor can hold integer, floating-point, or string values.


Tensor Processing Unit (TPU)

An ASIC (application-specific integrated circuit) that optimizes the performance of TensorFlow programs.


Tensor rank

See rank.


Tensor shape

The number of elements a Tensor contains in various dimensions. For example, a [5, 10] Tensor has a shape of 5 in one dimension and 10 in another.


Tensor size

The total number of scalars a Tensor contains. For example, a [5, 10] Tensor has a size of 50.
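For illustration, the same ideas sketched with a NumPy array (a rough analogy; the glossary entries themselves describe TensorFlow Tensors):

import numpy as np

t = np.zeros((5, 10))
print(t.shape)   # (5, 10) -- the shape
print(t.size)    # 50      -- the size: total number of scalar elements
print(t.ndim)    # 2       -- the rank: number of dimensions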


TensorBoard

The dashboard that displays the summaries saved during the execution of one or more TensorFlow programs.


TensorFlow

A large-scale, distributed, machine learning platform. The term also refers to the base API layer in the TensorFlow stack, which supports general computation on dataflow graphs.

Although TensorFlow is primarily used for machine learning, you may also use TensorFlow for non-ML tasks that require numerical computation using dataflow graphs.


TensorFlow Playground

A program that visualizes how different hyperparameters influence model (primarily neural network) training. Go to http://playground.tensorflow.org to experiment with TensorFlow Playground.


TensorFlow Serving

A platform to deploy trained models in production.


test set

The subset of the data set that you use to test your model after the model has gone through initial vetting by the validation set.

Contrast with training set and validation set.


tf.Example

A standard protocol buffer for describing input data for machine learning model training or inference.


training

The process of determining the ideal parameters comprising a model.


training set

The subset of the data set used to train a model.

Contrast with validation set and test set.


true negative (TN)

An example in which the model correctly predicted the negative class. For example, the model inferred that a particular email message was not spam, and that email message really was not spam.


true positive (TP)

An example in which the model correctly predicted the positive class. For example, the model inferred that a particular email message was spam, and that email message really was spam.


true positive rate (TP rate)

Synonym for recall. That is:

True Positive Rate = True Positives / (True Positives + False Negatives)
True positive rate is the y-axis in an ROC curve.

U


unlabeled example

An example that contains features but no label. Unlabeled examples are the input to inference. In semi-supervised and unsupervised learning, unlabeled examples are used during training.


unsupervised machine learning

Training a model to find patterns in a data set, typically an unlabeled data set.

The most common use of unsupervised machine learning is to cluster data into groups of similar examples. For example, an unsupervised machine learning algorithm can cluster songs together based on various properties of the music. The resulting clusters can become an input to other machine learning algorithms (for example, to a music recommendation service). Clustering can be helpful in domains where true labels are hard to obtain. For example, in domains such as anti-abuse and fraud, clusters can help humans better understand the data.

Another example of unsupervised machine learning is principal component analysis (PCA). For example, applying PCA on a data set containing the contents of millions of shopping carts might reveal that shopping carts containing lemons frequently also contain antacids.

Compare with supervised machine learning.

V


validation set

A subset of the data set—disjunct from the training set—that you use to adjust hyperparameters.

Contrast with training set and test set.

W


weight

A coefficient for a feature in a linear model, or an edge in a deep network. The goal of training a linear model is to determine the ideal weight for each feature. If a weight is 0, then its corresponding feature does not contribute to the model.


wide model

A linear model that typically has many sparse input features. We refer to it as "wide" since such a model is a special type of neural network with a large number of inputs that connect directly to the output node. Wide models are often easier to debug and inspect than deep models. Although wide models cannot express nonlinearities through hidden layers, they can use transformations such as feature crossing and bucketization to model nonlinearities in different ways.

Contrast with deep model.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 3.0 License, and code samples are licensed under the Apache 2.0 License. For details, see our Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated September 19, 2017.
Posted by uniqueone
,
https://m.facebook.com/story.php?story_fbid=377364479366220&id=303538826748786

30 essential data science, machine learning, and deep learning cheat sheets

Python for Data Science
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/PythonForDataScience.pdf

Pandas Basics
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/PandasPythonForDataScience+(1).pdf

Pandas
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Pandas_Cheat_Sheet_2.pdf

Numpy
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf

Scipy
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_SciPy_Cheat_Sheet_Linear_Algebra.pdf

Scikit-learn
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Scikit_Learn_Cheat_Sheet_Python.pdf

Matplotlib
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Matplotlib_Cheat_Sheet.pdf

Bokeh
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Bokeh_Cheat_Sheet.pdf

Base R
https://www.rstudio.com/resources/cheatsheets/

Advanced R
https://www.rstudio.com/resources/cheatsheets/

Caret
https://www.rstudio.com/resources/cheatsheets/

Data Import
https://www.rstudio.com/resources/cheatsheets/

Data Transformation with dplyr
https://www.rstudio.com/resources/cheatsheets/

R Markdown
https://www.rstudio.com/resources/cheatsheets/

R Studio IDE
https://github.com/rstudio/cheatsheets/raw/master/source/pdfs/rstudio-IDE-cheatsheet.pdf

Data Visualization
https://github.com/rstudio/cheatsheets/raw/master/source/pdfs/ggplot2-cheatsheet-2.1.pdf

Neural Network Architectures
http://www.asimovinstitute.org/neural-network-zoo/

Neural Network Cells
http://www.asimovinstitute.org/neural-network-zoo-prequel-cells-layers/

Neural Network Graphs
http://www.asimovinstitute.org/neural-network-zoo-prequel-cells-layers/

TensorFlow
https://www.altoros.com/tensorflow-cheat-sheet.html

Keras
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Keras_Cheat_Sheet_Python.pdf

Probability
https://static1.squarespace.com/static/54bf3241e4b0f0d81bf7ff36/t/55e9494fe4b011aed10e48e5/1441352015658/probability_cheatsheet.pdf

Statistics
http://web.mit.edu/~csvoss/Public/usabo/stats_handout.pdf

Linear Algebra
https://minireference.com/static/tutorials/linear_algebra_in_4_pages.pdf

Big O Complexity
http://bigocheatsheet.com/

Common Data Structure Operations
http://bigocheatsheet.com/

Common Sorting Algorithms
http://bigocheatsheet.com/

Data Structures
https://www.clear.rice.edu/comp160/data_cheat.html

SQL
http://www.sql-tutorial.net/sql-cheat-sheet.pdf
Posted by uniqueone
,
https://www.facebook.com/groups/TensorFlowKR/permalink/536521580022238/

https://kkweon.github.io/pr12-web-app-elm/

I found this in a post shared by JunHo Kim.

The PR12 paper-reading group video index built by our admin Kyung Mo Kweon, https://kkweon.github.io/pr12-web-app-elm/, is extremely convenient to browse. (How does he even build things like this? Truly impressive!)

I will keep it updated promptly as well: http://bit.ly/TFPR12



Ask me anything: Dynamic memory networks for natural language processing
PR037
Presenter: 곽근봉
Learning to Remember Rare Events
PR036
Presenter: 전태균
Understanding Black-box Predictions via Influence Functions
PR035
Presenter: 엄태웅
Inception and Xception
PR034
Presenter: 유재준
PVANet: Lightweight Deep Neural Networks for Real-time Object Detection
PR033
Presenter: 이진원
Deep Visual-Semantic Alignments for Generating Image Descriptions
PR032
Presenter: 강지양
Learning to learn by gradient descent by gradient descent
PR031
Presenter: 차준범
Photo-Realistic Single Image Super Resolution Using a Generative Adversarial Network
PR030
Presenter: 김승일
Apprenticeship Learning via Inverse Reinforcement Learning
PR029
Presenter: 서기호
Densely Connected Convolutional Networks (CVPR 2017, Best Paper Award) by Gao Huang et al.
PR028
Presenter: 김성훈
GloVe - Global vectors for word representation
PR027
Presenter: 곽근봉
Notes for CVPR Machine Learning Session
PR026
Presenter: 전태균
Learning with side information through modality hallucination (2016)
PR025
Presenter: 엄태웅
Pixel Recurrent Neural Network
PR024
Presenter: 유재준
YOLO9000: Better, Faster, Stronger
PR023
Presenter: 이진원
InfoGAN (OpenAI)
PR022
Presenter: 차준범
Batch Normalization
PR021
Presenter: 청영재
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
PR020
Presenter: 강지양
Continuous Control with Deep Reinforcement Learning
PR019
Presenter: 김승일
A Simple Neural Network Module for Relational Reasoning (DeepMind)
PR018
Presenter: 김성훈
Neural Architecture Search with Reinforcement Learning
PR017
Presenter: 서기호
You only look once: Unified, real-time object detection
PR016
Presenter: 전태균
Convolutional Neural Networks for Sentence Classification
PR015
Presenter: 곽근봉
On Human Motion Prediction using RNNs (2017)
PR014
Presenter: 엄태웅
Domain Adversarial Training of Neural Network
PR013
Presenter: 유재준
Faster R-CNN : Towards Real-Time Object Detection with Region Proposal Networks
PR012
Presenter: 이진원
Spatial Transformer Networks
PR011
Presenter: 강지양
Auto-Encoding Variational Bayes, ICLR 2014
PR010
Presenter: 차준범
Distilling the Knowledge in a Neural Network (Slide: English, Speaking: Korean)
PR009
Presenter: 청영재
Reverse Classification Accuracy
PR008
Presenter: 정동준
Deep Photo Style Transfer
PR007
Presenter: 김승일
Neural Turing Machine
PR006
Presenter: 서기호
Playing Atari with Deep Reinforcement Learning (NIPS 2013 Deep Learning Workshop)
PR005
Presenter: 김성훈
Image Super-Resolution Using Deep Convolutional Networks
PR004
Presenter: 전태균
Learning phrase representations using RNN encoder-decoder for statistical machine translation
PR003
Presenter: 곽근봉
Deformable Convolutional Networks (2017)
PR002
Presenter: 엄태웅
Generative adversarial nets by Jaejun Yoo (2017/4/13)
PR001
Presenter: 유재준
Kicking off: our resolve to read papers.
PR000
Presenter: all
Posted by uniqueone
,
https://m.facebook.com/story.php?story_fbid=375342342901767&id=303538826748786

ZhuSuan: a Python-based probabilistic programming library for Bayesian deep learning, combining the strengths of Bayesian methods and deep learning.
ZhuSuan is built on top of TensorFlow. Unlike existing deep learning libraries, which are mainly designed for deterministic neural networks and supervised tasks, ZhuSuan provides deep-learning-style algorithms for building probabilistic models and applying Bayesian inference.

With ZhuSuan, users not only get powerful fitting and multi-GPU training for complex learning tasks, but can also use generative models to model the complex world, exploit unlabeled data, and handle uncertainty by performing principled Bayesian inference.

Download
https://github.com/thu-ml/zhusuan

Online documentation
http://zhusuan.readthedocs.io/
Posted by uniqueone
,
https://www.indianweb2.com/2017/09/08/indias-iis-nit-develops-ai-identify-protesters-faces-partly-covered-scarves-hat/

If you’re planning on becoming a part of a protest or a rally but don’t want to reveal your identity at the same time, you might want to think about your participation again as the latter might no longer be possible. Researchers, from Cambridge University, India’s National Institute of Technology, and the Indian Institute of Science have successfully developed a deep-learning algorithm that is capable of identifying an individual even when part of their face is obscured or covered by sunglasses or bandanas, as is seen during many protests, rallies and agitations.

Posted by uniqueone
,
https://m.facebook.com/story.php?story_fbid=360526647716670&id=303538826748786

Deep Learning Lecture Collection (Spring 2017):

Posted by uniqueone
,
https://m.facebook.com/groups/1003321396368637?view=permalink&id=1625076837526420

I'd like to share a recent experience (I'm not sure how much it has to do with AI). I just finished a book in my field. Since it is written in English, I wanted the grammar proofread, and while shopping around for editing services, a company called Editage stood out. I got a quote: so much per character. Calculated that way, the amount is considerable. I reluctantly handed it over anyway.

In fact, since I write papers in English all the time, I put English sentences together without much trouble; I only asked in case there were small grammatical slips. But editing companies don't know such circumstances, so they simply price in proportion to length.

So I searched the internet and found a program called Grammarly. You feed it an English sentence and it spits the sentence back with the grammar cleaned up. Give it a whole paragraph and, after a brief pause, it produces the corrected paragraph. It points out basic article errors, comma placement issues, and so on quite well. According to the company's website, it seems to have a proprietary, machine-learned program built on English sentences gathered from the web.

A few things I was not satisfied with:

Since it is a science textbook, it is full of equations. It would be nice if I could upload the whole PDF and have it separate the equations from the English sentences, leave the equations alone (they can't be proofread anyway), and correct only the English, but no such feature exists. Grammarly doesn't seem to accept PDF files at all; apparently you can only upload Word files.

Every field has its own terminology, but Grammarly doesn't understand such domain-specific language and wants to replace it with more common words. I keep thinking it would be nice to have a grammar checker specialized for each field: Grammarly for physics, Grammarly for medicine, and so on. Would it be feasible to build field-specific grammar correction programs? Is someone already attempting this somewhere, or does it already exist?

I've heard that simultaneous interpreters and translators are among the jobs that will disappear when the AI era arrives. The way I see it, this kind of grammar-correction business will be replaced by machine learning even before that.
Posted by uniqueone
,
https://m.facebook.com/groups/255834461424286?view=permalink&id=513220265685703

Hello, after working straight through Professor Andrew Ng's Coursera course, I organized the lecture video slides.

Following last week's post, I am sharing the Week 3 slides.
I hope they are helpful to many of you.

Thank you. :)

Week 3: http://www.kwangsiklee.com/ko/2017/07/corsera-machine-learning%EC%9C%BC%EB%A1%9C-%EA%B8%B0%EA%B3%84%ED%95%99%EC%8A%B5-%EB%B0%B0%EC%9A%B0%EA%B8%B0-week3/
Week 2: http://www.kwangsiklee.com/ko/2017/07/corsera-machine-learning%EC%9C%BC%EB%A1%9C-%EA%B8%B0%EA%B3%84%ED%95%99%EC%8A%B5-%EB%B0%B0%EC%9A%B0%EA%B8%B0-week2/
Week 1: http://www.kwangsiklee.com/ko/2017/07/corsera-machine-learning-week1-%EC%A0%95%EB%A6%AC/
Posted by uniqueone
,
https://www.programmableweb.com/category/colors/api

Colors APIs

The following is a list of APIs from ProgrammableWeb's API directory that matched your search term. The ProgrammableWeb API directory lists APIs of different types. For example, Web/Internet APIs, browser APIs, and certain product APIs. From many of our API profiles, you can find your way to related SDKs, Tutorials, and sample source code for consuming those APIs. If your favorite API or SDK is missing or you have an idea for contributing content, be sure to check our guidelines for making such contributions to ProgrammableWeb.
Name Description Category Date
Icons8
Icons8 provides an extensive ISO compliant icon library. The API allows developers to search and retrieve icons that can be used for template customization, build graphic and text editors, and to... Images 06.15.2017
PrintCalc
The PrintCalc API returns the percentage of a .pdf, .eps or .ps file's CMYK and Spot Color coverage. The API supports PDF, EPS, and PS files. HTTP POST is the preferred request method. The... PDF 04.01.2017
TinEye MulticolorEngine
The TinEye MulticolorEngine API allows developers to make their image collections searchable by color. The API can extract color palettes from images, identify and search images by color, and support... Search 01.19.2017
W3C CSS Painting
The W3C CSS Painting API is a specification that describes an API for allowing developers to use an additional function to their CSS code. This affects the paint stage of CSS, which is... Images 09.27.2016
Image Color Extraction
Use this HTTP/JSON API to extract colors from any image. The Image Color Extraction API can return colors in multiple formats, such as: RGB, HEX, HSL, HSB or RGBA. Sign up with Mashape to receive... Colors 02.18.2015
Coinprism Colored Coins
Coinprism is a service that allows for the tokenization of cryptocurrency. Using Coinprism's Colored Coins, users are able to trade shares, bonds, and commodities without regulation by coloring... Bitcoin 08.25.2014
Croar.net RGB Picker
Croar.net provides a simple RGB color picker widget that users can add to any webpage by inserting a few lines of JavaScript code. A demo of this widget is available with the documentation. The Croar... Widgets 01.02.2014
APICloud.Me ColorTag
APICloud.Me is a cloud-based API provider that aims to deliver scalable APIs that are simple to consume, reliable, and well documented. ColorTag is an API capable of detecting colors within an image... Tools 11.17.2013
MyELearningSpace Web Accessibility
The service provides review and validation of a website's accessibility for all users, including those with impaired eyesight, hearing, and motor skills. It helps designers to make content... Colors 07.23.2012
Pictaculous
Pictaculous is a color palette generator service from MailChimps. Users can upload PNG, GIF, or JPG image files and Pictaculous will analyze their colors to return a fitting scheme. Pictaculous'... Tools 05.07.2012
AChecker
The service provides analysis and validation of accessibility of web resources for users with disabilities. It can perform an automated review of resources at a specified URL, with a validation... Colors 03.02.2012
Image Color Summarizer
The web service provides statistics describing color characteristics of an image identified by URL. Summary data indicate the single RGB color that best represents the image, along with average hue... Photos 11.29.2011
Colorfy It
Colorfy It is a web application that lets users copy and paste website URLs into a box, and it returns the colors, CSS, and color ID information from the website for color analysis. The Colorfy It... Colors 10.11.2011
Colr.org
Colr.org is an online service that allows users to search for images, colors, and color schemes. Users can edit colors and color schemes, tag them, and download them. Users can also search for... Other 08.07.2011
Empora Evergreen
Fashion search API that returns clothes and accessories data based on search parameters including price, brand, color, and title/description. Developers can earn revenue when people click through to... Search 10.01.2010
ColoRotate
Bring 3D color into your web site or blog using the ColoRotate API. Use it to display palettes of color on your site in 3D, or create complex mashups. With the ColoRotate and JavaScript, you can get... Tools 05.29.2010
COLOURlovers
From their site: With the release of the COLOURlovers API, you can now access almost 1 million named colors and more than 325,000 color palettes for your creative projects and applications. Creating... Tools 04.20.2008
Posted by uniqueone
,

MORE AGILE: A study method for working programmers who gave up on math to get started with machine learning
http://www.moreagile.net/2015/05/how-to-start-machine-learning-study.html?m=1


A study method for working programmers who gave up on math to get started with machine learning
Today I would like to introduce a translation of "数学を避けてきた社会人プログラマが機械学習の勉強を始める際の最短経路" (The shortest path for a working programmer who has avoided math to start studying machine learning), which has been hugely popular on Qiita, Japan's well-known knowledge-sharing service for developers. My thanks go to Ryuichi Danno (だんの りゅういち), who kindly gave permission for the translation.
Overall, it centers on an easy way to learn linear algebra and, above all, on a walkthrough of Professor Andrew Ng's Machine Learning course, widely considered the essential course for machine learning.
I think it is well worth a read for programmers who have wanted to study big data or machine learning but, frustrated by the math, keep hovering at the doorstep.


These days words like deep learning are heard everywhere, so you think you might give this machine learning thing a try, work up the courage to buy that thick, colorful book, and then can hardly bring yourself to open it; the formulas alone give you a headache. If you find your eyes closing with a sigh of self-deprecation, thinking "there's no way someone like me could do machine learning anyway," please take a moment and read this article to the end.
Who this is for

Working programmers who find it hard to invest much time in studying
People who sense they will soon hear from a boss or client, "Wouldn't this be simple if you used machine learning?"
People who studied math as a science major, but get flustered when asked about things like derivatives or matrices
What this article covers

An introduction to a math primer for programmers that does not open with math, to get you comfortable with the basic concepts
An introduction to an online course (MOOC) on machine learning that suits beginners
Environment

We use a tool called MATLAB/Octave, available on Windows, Mac, and Linux.
Having paper and a pen ready helps with understanding.
Terms and abbreviations

PRML = an abbreviation that always comes up when you search the web for machine learning. It stands for Pattern Recognition and Machine Learning, a book that could be called the bible of machine learning. The colorful book mentioned in the introduction is this very book.
MOOC/MOOCs = short for Massive Open Online Course(s): open courses you can take for free on the internet. They are a great help here in many ways.
A beginner's machine learning study flow

For programmers who say, "I have never studied machine learning, and whenever I try to read a book the formulas appear and I don't know where to start," here is what I recommend, based on my own experience.
If you don't know matrices and vectors, you will lose track of what the videos are saying partway through, so you first need to read a math book aimed at programmers.
Sign up for the online course platform Coursera and take Stanford professor Andrew Ng's Machine Learning course (free).
Now let's look at each step in detail.
Math books for programmers

I think programmers, whether it's games or software, have developed the habit of writing some code first and thinking while watching it run, rather than reading the manual.
But studying machine learning without a mathematical foundation is, in gaming terms, like walking into a hard dungeon without a decent weapon: you quickly hit a wall.
Machine learning books assume you know calculus and linear algebra, so when an unfamiliar formula suddenly appears, the missing knowledge in those areas is exactly the cause.
The Coursera machine learning course described here is designed so that you can follow it even without that math background, and it explains things along the way, making it a programmer-friendly course.
Even so, if you are not at least comfortable handling matrices and vectors, it is easy to end up staring blankly at the matrix manipulations that pop up here and there, so it is a good idea to warm up with them beforehand.
Here are linear algebra courses/books worth recommending to programmers. (Translator's note: the original introduces a Japanese book, Linear Algebra for Programming by Hira Kazuyuki, so I am substituting linear algebra courses/books that are accessible in Korea.)
Khan Academy Linear Algebra lectures (Korean subtitles)
No explanation needed. Like the line from some ad, these are superb lectures that go straight into your head just by watching, no chewing required.
Coding the Matrix
It explains how to solve various linear algebra problems using Python. Many machine learning examples are written in Python, so if you are not yet familiar with it, this is a good chance to pick it up. The Coursera course of the same name covers the same material and treats linear algebra for programmers in detail, so it is also recommended.
Personally, I think the minimum knowledge needed to understand the machine learning course introduced next is the matrix product. What matters is knowing how to picture it in your head.
I could never understand where in a matrix expression I was supposed to focus, or how, but I came to see that the true value of matrices lies in transforming sizes. (Translator's note: work thoroughly through the examples in Khan Academy's scaling-vectors material.)
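As a quick illustration of that "transforming sizes" idea, here is a minimal NumPy sketch; the shapes and values below are made up purely for illustration:

import numpy as np

W = np.random.rand(3, 5)          # a hypothetical 3x5 matrix
x = np.random.rand(5)             # a length-5 vector

y = W @ x                         # matrix-vector product
print(x.shape, '->', y.shape)     # (5,) -> (3,): the matrix turns a size-5 vector into a size-3 vector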
Taking the machine learning course

What I recommend to machine learning beginners is the online course introduced below. I won't cover the registration process here, but creating an account and enrolling is not difficult.
The reasons I recommend this course are as follows.
It's free.
It explains the knowledge a beginner needs for machine learning through concepts.
The lectures are delivered in English that is not difficult. (Translator's note: as of May 2015 all of the videos have Japanese subtitles, but Korean subtitles cover only part of the introduction.)
There are quizzes as well as video lectures, and you actually have to submit program results, so it is easy to tell whether you really understand.
You can submit the quizzes multiple times, so there is no need to worry about doing poorly.
There is no deadline, so you can study at your own pace.
The videos can be downloaded, so they are easy to watch even on the train.
It is a course with many strengths, but as with any online course, some things are beyond your control.
Because there is no deadline and no money on the line, it is easy to quit partway through.
In fact, the data reportedly show that the share of people who finish is very low compared with the number who start. So the last thing you need is a "will of steel." If it is hard on your own, it may help to gather friends or colleagues and learn together.
Translator's note: indeed, this course is genuinely hard to take alone. The sheer volume runs to 18 weeks (a full four and a half months!), and when you get stuck on the exercises it is hard to work through them by yourself. One good approach is to rope in programmers with spare time, or math or statistics majors, form an online study group, and check each other's progress and encourage one another once a week.
Contents of the Machine Learning course

This is the introduction page for the Machine Learning class.

The same introduction appears on various web pages, but the number of modules and the conditions seem to change over time. This article is based on the course that was open as of May 2015.
1. Introduction

It explains what machine learning even is and where it is used, and also how to install the tools you will be using.
The videos have subtitles, of course; Korean subtitles are not yet provided beyond the introduction, but you can turn on English subtitles or, if you know Japanese, choose Japanese subtitles.
The instructions start with installing MATLAB on Windows (it is free to use while you take this course); on Linux / Mac OS X you install Octave instead.
Each chapter has something like a discussion board, but you can complete the course without ever posting, so apart from the installation you can safely ignore it.
2. Linear Regression with One Variable

Here the real work begins. You learn that linear regression is the starting point of machine learning.
This chapter also introduces a quiz called a Review, which the previous chapter did not have.
You need to get at least 4 out of 5 questions right to pass, and the questions are not simple multiple choice: some allow multiple selections, and some require you to compute and enter a number. What makes it harder is that the order and content change each time, so even if you passed once, you can fail a retake.
Furthermore, to prevent gaming the quiz, if you fail to pass (submit at least 4 correct answers) within 3 attempts, you are penalized by being locked out of the quiz for 8 hours. (This is the only penalty.)
When you are penalized, a notice appears telling you how many minutes remain, as shown in the screenshot in the original post.

Failing a quiz does not block you from moving on, so if you do get penalized, it is best simply to continue with the next section for now.
3. Linear Algebra Review

This chapter makes it clear that you cannot proceed without linear algebra. It generously spends time on the basics; this is where you should pin down the concepts of matrices and vectors used in machine learning.
The Review in this chapter is a bit different: it is the same assessment as the check quizzes you take once or twice within the chapter, just in Review form, so clearing it does not count toward your overall grade. Be careful not to confuse the two; this is the only quiz of that kind in the rest of the course.
4. Linear Regression with Multiple Variables

Having looked at linear regression in chapter 2, this lecture covers multivariate linear regression. There is discussion of feature scaling and parameter tuning when handling multiple variables.
This chapter also adds programming assignments where you write the code yourself. They are somewhat painful if you are not used to MATLAB/Octave, but since MATLAB/Octave is explained starting in the next chapter, it is better to watch chapter 5 first and then come back.
Incidentally, you can upload your programs directly from MATLAB/Octave, and once uploaded you can see your grade on the web. You can upload as many times as you like, so don't feel pressured.

5. Octave Tutorial

It says Octave, but the same applies to MATLAB. If you are not used to working with matrices, mastering the "Vectorization" video in this chapter lets you build computations that need no for loops. Personally, this was the moment in the course when I was glad I took this chapter.
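The course demonstrates vectorization in Octave/MATLAB; as a rough sketch of the same idea in Python with NumPy (the data and parameter values below are made up, not taken from the course):

import numpy as np

X = np.random.rand(1000, 3)           # 1000 examples with 3 features (made-up data)
theta = np.array([0.5, -1.0, 2.0])    # made-up parameters

# Loop version: one prediction at a time.
preds_loop = np.zeros(len(X))
for i in range(len(X)):
    preds_loop[i] = np.dot(X[i], theta)

# Vectorized version: a single matrix-vector product, no explicit for loop.
preds_vec = X @ theta

print(np.allclose(preds_loop, preds_vec))   # True -- same result, far less code (and usually faster)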
Translator's note: this course is taught mainly in MATLAB/Octave, but the popular languages driving machine learning in industry are Python and R. For Python or R tutorials, check the links below.
Kaggle R Tutorial on Machine Learning (taught in R, free)
Intro to Machine Learning (taught in Python, free)
6. Logistic Regression

Even in the videos the instructor seems to race ahead on his own, but this chapter teaches logistic regression, which is used for classification despite having "regression" in its name. It resembles linear regression, and checking where it differs is what reveals the depth of machine learning.
There is also a programming submission here, and awkwardly it contains a problem (Regularization) that cannot be solved without knowledge taught in the next chapter. In my case I could not solve it, moved on to the next chapter, and had to come back to finish it. Perhaps they thought it could be solved without explanation; in any case it is a chapter with plenty of depth.
7. Regularization

Things start to feel more like real machine learning: this chapter teaches regularization, one of those "add this to the formula to improve X" elements. It is short as chapters go, but if you skim it you will have to revisit a lot later, so understanding it properly matters.
8. Neural Networks: Representation

This chapter covers neural networks, the biggest hurdle of the first half. Because they are relatively complex, the material is split into two chapters; this one first explains how the computation proceeds from input to output.
9. Neural Networks: Learning

This is the chapter that explains backpropagation, which is essential for applying neural networks to machine learning. No matter what other books or web articles I read, I could never quite understand how a neural network learns, but after watching the videos and the implementation I finally felt I understood which computations do the learning.
10. Advice for Applying Machine Learning

An extremely important chapter on what you must watch out for when actually applying machine learning, even if you know the theory.
It serves as a guide when putting machine learning into practice, covering underfitting, overfitting, and how learning curves behave.
11. Machine Learning System Design

Like the previous one, this chapter examines the problems that arise when bringing machine learning into real work and presents solutions.
Specifically, it discusses how machine learning metrics should be set up for phenomena that do not occur 99% of the time but have a 1% chance of occurring.
12. Support Vector Machines (SVM)

A chapter on SVM, one of the classification algorithms. The instructor was saying that SVM is a popular algorithm, so it is worth learning well.
The last video also discusses when the various algorithms, SVM included, should be used, which is a point well worth remembering.
13. Unsupervised Learning

From this chapter through chapter 15 the topic is unsupervised learning. This chapter teaches the popular clustering algorithm known as K-Means.
14. Dimensionality Reduction

This chapter is about dimensionality reduction; the algorithm actually taught under that concept is PCA (principal component analysis).
Hearing that whether you start with 100 or 1,000 dimensions, reducing down to 2 or 3 makes plotting possible gave me the motivation to learn it.
I also wondered what would happen if you reduced the dimensions to an extremely small number; a proper answer is provided, and I realized it is important to grasp how much information is lost depending on how far you compress relative to the original.
15. Anomaly Detection

A chapter on learning algorithms that detect anomalies.
This is where the normal distribution (a fairly involved formula) first appears, but having grown used to all sorts of formulas by then, it did not look all that monstrous.
What I found interesting was the comparison of supervised learning with anomaly detection, and which to use depending on the state of your data; topics like that should be helpful going forward.
16. Recommender Systems

A chapter explaining a very practical topic: how to handle the ratings users have given to movies.
I really liked the flow of the discussion, starting from small, limited data and ending up computing all the parameters in one go even when some entries are missing.
17. Large Scale Machine Learning

A chapter that thinks through what to do when processing all of a large dataset every time becomes slow.
The MapReduce concept makes its first appearance here, but only at the obvious level of "splitting the work makes it faster," so if you want more detail I recommend other references.
18. Application Example: Photo OCR

The final chapter is an application example: a pipeline combining pieces of machine learning logic to extract text from photos.
I expected to write an actual program, but there was no program submission in this chapter.
What happens when you pass all the quizzes?

For session-based courses they appear to issue something like a certificate for a fee, but for this fully open course the result is simply displayed at the top of the course page.
Translator's note: Coursera offers the courses themselves for free and makes paid certificates its business model.
When I actually passed the whole course, it was displayed as in the screenshot in the original post; I hoped there might be some link behind the "Course Passed" label, but... there wasn't.

I've finished the machine learning course! Now what?

If you are a programmer, get your hands moving and implement various things. You now understand what the sample code means, and you can at least picture in your head what would happen if you changed this or that.
Starting something like a study group, pulling in people who have no machine learning experience yet, and growing your circle of peers is also a good approach.
This field does not advance unless the number of participants grows, so be sure to spread the word around you.
Added by the translator: machine learning reading material

Everything about deep learning, explained simply
PredictionIO: an open-source machine learning server
Getting Started with Microsoft Azure Machine Learning: an online course on AzureML, which offers powerful functionality despite its simple interface.
DL4J: an open-source project that makes Word2Vec and various other machine learning algorithms usable from Java.
Posted by uniqueone
,

Quick complete Tensorflow tutorial to understand and run Alexnet, VGG, Inceptionv3, Resnet and squeezeNet networks – CV-Tricks.com
http://cv-tricks.com/tensorflow-tutorial/understanding-alexnet-resnet-squeezenetand-running-on-tensorflow/
Posted by uniqueone
,

Installing Caffe ๑•‿•๑ (1)
http://kimering.blogspot.kr/2017/03/caffe-1.html?m=1
Posted by uniqueone
,