1027 posts under 'All Categories'

  1. 2017.11.25 TensorFlow Speech Recognition - Kaggle competition keras
  2. 2017.11.24 Kaggle-knowhow: a collection of Kaggle resources for Korean users
  3. 2017.11.24 Probabilistic Graphical Models Tutorial — Part 2 – Stats and Bots
  4. 2017.11.22 An Introduction to different Types of Convolutions in Deep Learning
  5. 2017.11.18 A translation of the machine learning glossary released by Google (https://developers.google.com/machine-learning/glossary/)
  6. 2017.11.18 Top 10 Videos on Deep Learning in Python
  7. 2017.11.08 Notes on Pattern Recognition and Machine Learning, Chapter 2, up to Eq. (2.117)
  8. 2017.11.08 NumPy for MATLAB users – Mathesaurus
  9. 2017.10.31 Machine Learning · Artificial Inteligence web book; helpful for deep learning basics
  10. 2017.10.31 How a 22 year old from Shanghai won a global deep learning challenge (a segmentation competition for self-driving cars)
  11. 2017.10.27 How to Use the Keras Functional API for Deep Learning
  12. 2017.10.26 Swish activation: gets slightly better results than ReLU
  13. 2017.10.19 MoCoGAN: Decomposing Motion and Content for Video Generation
  14. 2017.10.18 Keras Tutorial: The Ultimate Beginner's Guide to Deep Learning in Python
  15. 2017.10.16 Introduction to Numpy - 1: An absolute beginner's guide to Machine Learning and Data science
  16. 2017.10.13 A collection of Korean-language machine learning materials (lecture notes and videos)
  17. 2017.10.12 A discussion of some of the mathematics behind the posterior probability, known as Bayes' Theorem
  18. 2017.09.28 Machine learning glossary: https://developers.google.com/machine-learning/glossary/
  19. 2017.09.26 30 essential data science, machine learning, and deep learning cheat sheets
  20. 2017.09.23 Paper-reading videos of the 12-person group, made by our admin Kyung Mo Kweon
  21. 2017.09.20 ZhuSuan: a Python-based probabilistic programming library for Bayesian Deep Learning, combining Bayesian methods with the advantages of deep learning
  22. 2017.09.10 India's IIS and NIT Develop AI To Identify Protesters With Their Faces Partly Covered With Scarves or Hats
  23. 2017.08.13 Deep Learning Lecture Collection (Spring 2017): http://www.youtube.com/playlist?list=PL3FW7Lu3i5JvHM8ljYj-zLfQRF3EO8sYv
  24. 2017.08.06 Grammarly: a program that takes an English sentence and returns a grammatically corrected version
  25. 2017.08.03 Slides from Professor Andrew Ng's Coursera course, organized after watching it all the way through
  26. 2017.07.27 17 Colors APIs
  27. 2017.07.27 An introduction to how programmers who gave up on math can start learning machine learning
  28. 2017.07.26 Quick complete Tensorflow tutorial to understand and run Alexnet, VGG, Inceptionv3, Resnet and squeezeNet networks
  29. 2017.07.25 How to install caffe in windows in just 5 minutes! (YouTube - Jun 3, 2017)
  30. 2017.07.25 Installing Caffe on Windows
https://m.facebook.com/groups/107107546348803?view=permalink&id=532112330514987

TensorFlow Speech Recognition - Kaggle competition is going on. I wrote a basic tutorial on speech (word) recognition using some of the datasets from the competition.
.
Hope it will be helpful for some of you. Thanks in advance for reading!
.

https://blog.manash.me/building-a-dead-simple-word-recognition-engine-using-convnet-in-keras-25e72c19c12b
Posted by uniqueone
,

Kaggle-knowhow/README.md at master · zzsza/Kaggle-knowhow · GitHub
https://github.com/zzsza/Kaggle-knowhow/blob/master/README.md
Posted by uniqueone
,

Probabilistic Graphical Models Tutorial — Part 2 – Stats and Bots
https://blog.statsbot.co/probabilistic-graphical-models-tutorial-d855ba0107d1


Prasoon Goyal
PhD candidate at UT Austin. For more content on machine learning by me, check my Quora profile (https://www.quora.com/profile/Prasoon-Goyal).
Nov 23
Probabilistic Graphical Models Tutorial — Part 2
Parameter estimation and inference algorithms

In the previous part of this probabilistic graphical models tutorial for the Statsbot team, we looked at the two types of graphical models, namely Bayesian networks and Markov networks. We also explored the problem setting, conditional independences, and an application to the Monty Hall problem. In this post, we will cover parameter estimation and inference, and look at another application.

Parameter Estimation
Bayesian networks
Estimating the numbers in the CPD tables of a Bayesian network simply amounts to counting how many times that event occurred in our training data. That is, to estimate p(SAT=s1 | Intelligence = i1), we simply count the fraction of data points where SAT=s1 and Intelligence = i1, out of the total data points where Intelligence = i1. While this approach may appear ad hoc, it turns out that the parameters so obtained maximize the likelihood of the observed data.
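As a toy illustration of this counting (the data below is made up for the sketch, not taken from the article):

# each data point is an (Intelligence, SAT) pair
data = [("i1", "s1"), ("i1", "s1"), ("i1", "s0"), ("i0", "s1"), ("i0", "s0")]

n_i1 = sum(1 for intel, sat in data if intel == "i1")
n_i1_s1 = sum(1 for intel, sat in data if intel == "i1" and sat == "s1")
p_s1_given_i1 = n_i1_s1 / n_i1   # fraction of Intelligence=i1 points with SAT=s1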
Markov networks
For Markov networks, unfortunately, the above counting approach does not have a statistical justification (and will therefore lead to suboptimal parameters). So, we need to use more sophisticated techniques. The basic idea behind most of these techniques is gradient descent — we define parameters that describe the probability distribution, and then use gradient descent to find values for these parameters that maximize the likelihood of the observed data.
Finally, now that we have the parameters of our model, we want to use them on new data, to perform inference!
Inference
The bulk of the literature in probabilistic graphical models focuses on inference. The reasons are two-fold:
Inference is why we came up with this entire framework — being able to make predictions from what we already know.
Inference is computationally hard! In some specific kinds of graphs, we can perform inference fairly efficiently, but on general graphs, it is intractable. So we need to use approximate algorithms that trade off accuracy for efficiency.
There are several questions we can answer with inference:
Marginal inference: Finding the probability distribution of a specific variable. For instance, given a graph with variables A, B, C, and D, where A takes values 1, 2, and 3, find p(A=1), p(A=2) and p(A=3).
Posterior inference: Given some observed variables v_E (E for evidence) that take values e, finding the posterior distribution p(v_H | v_E=e) for some hidden variables v_H.
Maximum-a-posteriori (MAP) inference: Given some observed variables v_E that take values e, finding the setting of other variables v_H that have the highest probability.
Answers to these questions may be useful by themselves, or may need to be used as part of larger tasks.
In what follows, we are going to look at some of the popular algorithms for answering these questions, both exact and approximate. All these algorithms are applicable on both Bayesian networks and Markov networks.
Variable Elimination
Using the definition of conditional probability, we can write the posterior distribution as:

Let’s see how we can compute the numerator and the denominator above, using a simple example. Consider a network with three variables, and the joint distribution defined as follows:

Let’s say we want to compute p(A | B=1). Note that this means that we want to compute the values p(A=0 | B=1) and p(A=1 | B=1), which should sum to one. Using the above equation, we can write

The numerator is the probability that A = 0 and B = 1. We don’t care about the values of C. So we would sum over all the values of C. (This comes from basic probability — p(A=0, B=1, C=0) and p(A=0, B=1, C=1) are mutually exclusive events, so their union p(A = 0, B=1) is just the sum of the individual probabilities.)
So we add rows 3 and 4 to get p(A=0, B=1) = 0.15. Similarly, adding rows 7 and 8 gives us p(A=1, B=1) = 0.40. Also, we can compute the denominator by summing over all rows that contain B=1, that is, rows 3, 4, 7, and 8, to get p(B=1) = 0.55. This gives us the following:
p(A = 0 | B = 1) = 0.15 / 0.55 = 0.27
p(A = 1 | B = 1) = 0.40 / 0.55 = 0.73
If you look at the above computation closely, you would notice that we did some repeated computations — adding rows 3 & 4, and 7 & 8 twice. A more efficient way to compute p(B=1) would have been to simply add the values p(A=0, B=1) and p(A=1, B=1). This is the basic idea of variable elimination.
In general, when you have a lot of variables, not only can you use the values of the numerator to compute the denominator, but the numerator by itself will contain repeated computations, if evaluated naively. You can use dynamic programming to use precomputed values efficiently.
Because we are summing over one variable at a time, thereby eliminating it, the process of summing out multiple variables amounts to eliminating these variables one at a time. Hence, the name “variable elimination.”
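A small NumPy sketch of this example, assuming the joint table is stored as an array indexed by (A, B, C); the individual entries below are made up for illustration, but their B=1 sums match the 0.15, 0.40, and 0.55 quoted in the text:

import numpy as np

joint = np.array([
    [[0.20, 0.20],    # p(A=0, B=0, C=0), p(A=0, B=0, C=1)
     [0.05, 0.10]],   # p(A=0, B=1, C=0), p(A=0, B=1, C=1)
    [[0.02, 0.03],    # p(A=1, B=0, C=0), p(A=1, B=0, C=1)
     [0.15, 0.25]],   # p(A=1, B=1, C=0), p(A=1, B=1, C=1)
])

p_ab = joint.sum(axis=2)          # eliminate C: p(A, B)
p_b1 = p_ab[:, 1].sum()           # p(B=1) = 0.55, reusing the numerator sums
p_a_given_b1 = p_ab[:, 1] / p_b1  # approximately [0.27, 0.73]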
It is straightforward to extend the above process to solve the marginal inference or MAP inference problems as well. Similarly, it is easy to generalize the above idea to apply it to Markov networks too.
The time complexity of variable elimination depends on the graph structure, and the order in which you eliminate the variables. In the worst case, it has exponential time complexity.
Belief Propagation
The VE algorithm that we just saw gives us only one final distribution. Suppose we want to find the marginal distributions for all variables. Instead of running variable elimination multiple times, we can do something smarter.
Suppose you have a graph structure. To compute a marginal, you need to sum the joint distribution over all other variables, which amounts to aggregating information from the entire graph. Here’s an alternate way of aggregating information from the entire graph — each node looks at its neighbors, and approximates the distribution of variables locally.
Then, every pair of neighboring nodes exchanges “messages,” where the messages contain the local distributions. Now, every node looks at the messages it receives and aggregates them to update its probability distributions over the variables.

In the figure above, C aggregates information from its neighbors A and B, and sends a message to D. Then, D aggregates this message with the information from E and F.
The advantage of this approach is that if you save the messages that you are sending at every node, one forward pass of messages followed by one backward pass gives all nodes information about all other nodes. That information can then be used to compute all the marginals, which was not possible in variable elimination.
If the graph does not contain cycles, then this process converges after a forward and a backward pass. If the graph contains cycles, then this process may or may not converge, but it can often be used to get an approximate answer.
Approximate inference
Because exact inference may be prohibitively time consuming for large graphical models, numerous approximate inference algorithms have been developed for graphical models, most of which fall into one of the following two categories:
Sampling-based
These algorithms estimate the desired probability using sampling. As a simple example, consider the following scenario — given a coin, how would you determine the probability of getting heads when the coin is tossed? The simplest thing is to flip the coin, say, 100 times, and find out the fraction of tosses in which you get heads.
This is a sampling-based algorithm to estimate the probability of heads. For more complex questions in probabilistic graphical models, you can use a similar procedure. Sampling-based algorithms can further be divided into two classes. In the first one, the samples are independent of each other, as in the coin toss example above. These algorithms are called Monte Carlo methods.
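For the coin example, the Monte Carlo estimate is just a few lines (a fair coin is assumed):

import random

tosses = [random.random() < 0.5 for _ in range(100)]   # 100 independent tosses of a fair coin
p_heads = sum(tosses) / len(tosses)                    # fraction of heads, close to 0.5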
For problems with many variables, generating good quality independent samples is difficult, and therefore, we generate dependent samples, that is, each new sample is random, but close to the last sample. Such algorithms are called Markov Chain Monte Carlo (MCMC) methods, because the samples form a “Markov chain.” Once we have the samples, we can use them to answer various inference questions.
Variational methods
Instead of using sampling, variational methods try to approximate the required distribution analytically. Suppose you write out the expression for computing the distribution of interest — marginal probability distribution or posterior probability distribution.
Often, these expressions have summations or integrals in them that are computationally expensive to evaluate exactly. A good way to approximate these expressions is to then solve for an alternate expression, and somehow ensure that this alternate expression is close to the original expression. This is the basic idea behind variational methods.
When we are trying to estimate a complex probability distribution p_complex, we define a separate set of probability distributions P_simple, which are easier to work with, and then find the probability distribution p_approx from P_simple that is closest to p_complex.
Application: Image denoising
Let us now use some of the ideas we just discussed on a real problem. Let’s say you have the following image:

Now suppose that it got corrupted by random noise, so that your noisy image looks as follows:

The goal is to recover the original image. Let’s see how we can use probabilistic graphical models to do this.
The first step is to think about what our observed and unobserved variables are, and how we can connect them to form a graph. Let us define each pixel in the noisy image as an observed random variable, and each pixel in the ground truth image as an unobserved variable. So, if the image is M x N, then there are MN observed variables and MN unobserved variables. Let us denote observed variables as X_ij and unobserved variables as Y_ij. Each variable takes values +1 and -1 (corresponding to black and white pixels, respectively). Given the observed variables, we want to find the most likely values of the unobserved variables. This corresponds to MAP inference.
Now, let us use some domain knowledge to build the graph structure. Clearly, the observed variable at position (i, j) in the noisy image depends on the unobserved variable at position (i, j) in the ground truth image. This is because most of the time, they are identical.
What more can we say? For ground truth images, the neighboring pixels usually have the same values — this is not true at the boundaries of color change, but inside a single-colored region, this property holds. Therefore, we connect Y_ij and Y_kl if they are neighboring pixels.
So, our graph structure looks as follows:

Here, the white nodes denote the unobserved variables Y_ij and the grey nodes denote observed variables X_ij. Each X_ij is connected to the corresponding Y_ij, and each Y_ij is connected to its neighbors.
Note that this is a Markov network, because there is no cause-effect relation between pixels of an image, and therefore, defining directions of arrows in Bayesian networks is unnatural here.
Our MAP inference problem can be mathematically written as follows:

Here, we used some standard simplification techniques common in maximum log likelihood computation. We will use X and Y (without subscripts) to denote the collection of all X_ij and Y_ij values, respectively.
Now, we need to define our joint distribution P(X, Y) based on our graph structure. Let’s assume that P(X, Y) consists of two kinds of factors — ϕ(X_ij, Y_ij) and ϕ(Y_ij,Y_kl), corresponding to the two kinds of edges in our graph. Next, we define the factors as follows:
ϕ(X_ij, Y_ij) = exp(w_e X_ij Y_ij), where w_e is a parameter greater than zero. This factor takes large values when X_ij and Y_ij are identical, and takes small values when X_ij and Y_ij are different.
ϕ(Y_ij, Y_kl) = exp(w_s Y_ij Y_kl), where w_s is a parameter greater than zero, as before. This factor favors identical values of Y_ij and Y_kl.
Therefore, our joint distribution is given by:

where (i, j) and (k, l) in the second product are adjacent pixels, and Z is a normalization constant.
Plugging this into our MAP inference equation gives:

Note that we have dropped the term containing Z since it does not affect the solution.
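The equation images did not survive this copy; written out from the factor definitions above, the joint distribution and the resulting MAP objective are (up to notational details, with (i,j) \sim (k,l) denoting adjacent pixels):

P(X, Y) = \frac{1}{Z} \prod_{(i,j)} \phi(X_{ij}, Y_{ij}) \prod_{(i,j) \sim (k,l)} \phi(Y_{ij}, Y_{kl})

\hat{Y} = \arg\max_Y \; \sum_{(i,j)} w_e X_{ij} Y_{ij} + \sum_{(i,j) \sim (k,l)} w_s Y_{ij} Y_{kl}

where the second line follows from taking the logarithm and dropping the constant log Z.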
The values of w_e and w_s are obtained using parameter estimation techniques from pairs of ground truth and noisy images. This process is fairly mathematically involved (although, at the end of the day, it is just gradient descent on a complicated function), and therefore, we shall not delve into it here. We will assume that we have obtained the following values of these parameters — w_e = 8 and w_s = 10.
The main focus of this example will be inference. Given these parameters, we want to solve the MAP inference problem above. We can use a variant of belief propagation to do this, but it turns out that there is a much simpler algorithm called Iterated conditional modes (ICM) for graphs with this specific structure.
The basic idea is that at each step, you choose one node, Y_ij, look at the value of the MAP inference expression for both Y_ij = -1 and Y_ij = 1, and pick the one with the higher value. Repeating this process for a fixed number of iterations or until convergence usually works reasonably well.
You can use this Python code to do this for our model.
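The linked code is not included in this copy; purely as a sketch of the idea (not the author's implementation), ICM for the objective above could look like the following, where the noisy image X has entries in {-1, +1}:

import numpy as np

def icm_denoise(X, w_e=8.0, w_s=10.0, n_iters=10):
    """Iterated conditional modes on the grid Markov network described above."""
    Y = X.copy()
    H, W = X.shape
    for _ in range(n_iters):
        for i in range(H):
            for j in range(W):
                # sum of the 4-neighborhood of Y[i, j]
                nbr = 0.0
                if i > 0:     nbr += Y[i - 1, j]
                if i < H - 1: nbr += Y[i + 1, j]
                if j > 0:     nbr += Y[i, j - 1]
                if j < W - 1: nbr += Y[i, j + 1]
                # local terms of the MAP objective that involve Y[i, j]
                score = w_e * X[i, j] + w_s * nbr
                Y[i, j] = 1 if score >= 0 else -1
    return Y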
This is the denoised image returned by the algorithm:

Pretty good, isn’t it? Of course, you can use more fancy techniques, both within graphical models, and outside, to generate something better, but the takeaway from this example is that a simple Markov network with a simple inference algorithm already gives you reasonably good results.
Quantitatively, the noisy image has about 10% of the pixels that are different from the original image, while the denoised image produced by our algorithm has about 0.6% of the pixels that are different from the original image.
It is important to note that the graph that we used is fairly large — the image size is about 440 x 300, so the total number of nodes is close to 264,000. Therefore, exact inference in such models is essentially infeasible, and what we get out of most algorithms, including ICM, is a local optimum.
Let’s recap
In this section, let us briefly review the key concepts we covered in this two-part series:
Graphical models: A graphical model consists of a graph structure where nodes represent random variables and edges represent dependencies between variables.
Bayesian networks: These are directed graphical models, with a conditional probability distribution table associated with each node.
Markov networks: These are undirected graphical models, with a potential function associated with each clique.
Conditional independences: Based on how the nodes in the graph are connected, we can write conditional independence statements of the form “X is independent of Y given Z.”
Parameter estimation: Given some data and the graph structure, we want to fill the CPD tables or compute the potential functions.
Inference: Given a graphical model, we want to answer questions about unobserved variables. These questions are usually one of the following — Marginal inference, posterior inference, and MAP inference.
Inference on general graphical models is computationally intractable. We can divide inference algorithms into two broad categories — exact and approximate. Variable elimination and belief propagation in acyclic graphs are examples of exact inference algorithms. Approximate inference algorithms are necessary for large-scale graphs, and usually fall into sampling-based methods or variational methods.
Conclusions
We looked at some of the core ideas in probabilistic graphical models in this two-part tutorial. As you should be able to appreciate at this point, graphical models provide an interpretable way to model many real-world tasks, where there are dependencies. Using graphical models gives us a way to work on such tasks in a principled manner.
Before we close, it is important to point out that this tutorial, by no means, is complete — many details have been skipped to keep the content intuitive and simple. The standard textbook on probabilistic graphical models is over a thousand pages! This tutorial is meant to serve as a starting point, to get you interested in the field, so that you can look up more rigorous resources.
Here are some additional resources that you can use to dig deeper into the field:
Graphical Models in a Nutshell
Graphical Models textbook
You should also be able to find a few chapters on graphical models in standard machine learning textbooks.

Posted by uniqueone
,

An Introduction to different Types of Convolutions in Deep Learning
https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d

Paul-Louis Pröve
Artificial Intelligence @ PwC
Jul 22
An Introduction to different Types of Convolutions in Deep Learning

Let me give you a quick overview of different types of convolutions and what their benefits are. For the sake of simplicity, I’m focussing on 2D convolutions only.
Convolutions
First we need to agree on a few parameters that define a convolutional layer.

2D convolution using a kernel size of 3, stride of 1 and padding
Kernel Size: The kernel size defines the field of view of the convolution. A common choice for 2D is 3 — that is 3x3 pixels.
Stride: The stride defines the step size of the kernel when traversing the image. While its default is usually 1, we can use a stride of 2 for downsampling an image similar to MaxPooling.
Padding: The padding defines how the border of a sample is handled. A (half) padded convolution will keep the spatial output dimensions equal to the input, whereas unpadded convolutions will crop away some of the borders if the kernel is larger than 1.
Input & Output Channels: A convolutional layer takes a certain number of input channels (I) and calculates a specific number of output channels (O). The needed parameters for such a layer can be calculated by I*O*K, where K equals the number of values in the kernel.
Dilated Convolutions
(a.k.a. atrous convolutions)

2D convolution using a 3 kernel with a dilation rate of 2 and no padding
Dilated convolutions introduce another parameter to convolutional layers called the dilation rate. This defines a spacing between the values in a kernel. A 3x3 kernel with a dilation rate of 2 will have the same field of view as a 5x5 kernel, while only using 9 parameters. Imagine taking a 5x5 kernel and deleting every second column and row.
This delivers a wider field of view at the same computational cost. Dilated convolutions are particularly popular in the field of real-time segmentation. Use them if you need a wide field of view and cannot afford multiple convolutions or larger kernels.
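As a small illustration (the filter count is arbitrary), Keras exposes this through the dilation_rate argument of Conv2D:

from keras.layers import Conv2D

# 3x3 kernel with dilation rate 2: still 9 weights per filter,
# but a 5x5 receptive field
dilated = Conv2D(filters=32, kernel_size=3, dilation_rate=2, padding='same')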
Transposed Convolutions
(a.k.a. deconvolutions or fractionally strided convolutions)
Some sources use the name deconvolution, which is inappropriate because it’s not a deconvolution. To make things worse, deconvolutions do exist, but they’re not common in the field of deep learning. An actual deconvolution reverts the process of a convolution. Imagine inputting an image into a single convolutional layer. Now take the output, throw it into a black box and out comes your original image again. This black box does a deconvolution. It is the mathematical inverse of what a convolutional layer does.
A transposed convolution is somewhat similar because it produces the same spatial resolution a hypothetical deconvolutional layer would. However, the actual mathematical operation that’s being performed on the values is different. A transposed convolutional layer carries out a regular convolution but reverts its spatial transformation.

2D convolution with no padding, stride of 2 and kernel of 3
At this point you should be pretty confused, so let’s look at a concrete example. An image of 5x5 is fed into a convolutional layer. The stride is set to 2, the padding is deactivated and the kernel is 3x3. This results in a 2x2 image.
If we wanted to reverse this process, we’d need the inverse mathematical operation so that 9 values are generated from each pixel we input. Afterward, we traverse the output image with a stride of 2. This would be a deconvolution.

Transposed 2D convolution with no padding, stride of 2 and kernel of 3
A transposed convolution does not do that. The only thing the two have in common is that the output is guaranteed to be a 5x5 image as well, while a normal convolution operation is still being performed. To achieve this, we need to perform some fancy padding on the input.
As you can imagine now, this step will not reverse the process from above. At least not concerning the numeric values.
It merely reconstructs the spatial resolution from before and performs a convolution. This may not be the mathematical inverse, but for Encoder-Decoder architectures, it’s still very helpful. This way we can combine the upscaling of an image with a convolution, instead of doing two separate processes.
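For illustration (the filter count is arbitrary), the 2x2 to 5x5 upsampling from this example can be written with a transposed convolution layer in Keras:

from keras.layers import Conv2DTranspose

# with a 2x2 input, kernel 3, stride 2 and no padding the output is 5x5:
# (2 - 1) * 2 + 3 = 5
upsample = Conv2DTranspose(filters=16, kernel_size=3, strides=2, padding='valid')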
Separable Convolutions
In a separable convolution, we can split the kernel operation into multiple steps. Let’s express a convolution as y = conv(x, k) where y is the output image, x is the input image, and k is the kernel. Easy. Next, let’s assume k can be calculated by: k = k1.dot(k2). This would make it a separable convolution because instead of doing a 2D convolution with k, we could get to the same result by doing 2 1D convolutions with k1 and k2.

Sobel X and Y filters
Take the Sobel kernel for example, which is often used in image processing. You could get the same kernel by multiplying the vectors [1, 2, 1].T and [1, 0, -1]. This would require 6 instead of 9 parameters while doing the same operation.
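A quick NumPy check of this factorization:

import numpy as np

k1 = np.array([[1], [2], [1]])   # [1, 2, 1].T as a column vector (3 parameters)
k2 = np.array([[1, 0, -1]])      # [1, 0, -1] as a row vector (3 parameters)
sobel_x = k1 @ k2                # 3x3 Sobel X kernel: [[1,0,-1],[2,0,-2],[1,0,-1]]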
The example above shows what’s called a spatial separable convolution, which to my knowledge isn’t used in deep learning. I just wanted to make sure you don’t get confused when stumbling upon those. In neural networks, we commonly use something called a depthwise separable convolution.
This will perform a spatial convolution while keeping the channels separate and then follow with a pointwise (1x1) convolution to mix them. In my opinion, it can be best understood with an example.
Let’s say we have a 3x3 convolutional layer on 16 input channels and 32 output channels. What happens in detail is that each of the 16 channels is traversed by 32 3x3 kernels, resulting in 512 (16x32) feature maps. Next, we merge 1 feature map out of every input channel by adding them up. Since we can do that 32 times, we get the 32 output channels we wanted.
For a depthwise separable convolution on the same example, we traverse the 16 channels with 1 3x3 kernel each, giving us 16 feature maps. Now, before merging anything, we traverse these 16 feature maps with 32 1x1 convolutions each and only then add them together. This results in 656 (16x3x3 + 16x32x1x1) parameters as opposed to the 4608 (16x32x3x3) parameters from above.
The example is a specific implementation of a depthwise separable convolution where the so called depth multiplier is 1. This is by far the most common setup for such layers.
We do this because of the hypothesis that spatial and depthwise information can be decoupled. Looking at the performance of the Xception model this theory seems to work. Depthwise separable convolutions are also used for mobile devices because of their efficient use of parameters.
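To see these parameter counts directly, here is a small Keras sketch comparing a regular and a depthwise separable 3x3 convolution with 16 input and 32 output channels (the input size and the use_bias=False choice are just for the illustration):

from keras.models import Sequential
from keras.layers import Conv2D, SeparableConv2D

regular = Sequential([Conv2D(32, 3, use_bias=False, input_shape=(64, 64, 16))])
regular.summary()     # 16 * 32 * 3 * 3 = 4608 parameters

separable = Sequential([SeparableConv2D(32, 3, use_bias=False, input_shape=(64, 64, 16))])
separable.summary()   # 16 * 3 * 3 + 16 * 32 * 1 * 1 = 656 parameters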
Questions?
This concludes our little tour through different types of convolutions. I hope it helped to get a brief overview of the matter. Drop a comment if you have any remaining questions and check out this GitHub page for more convolution animations.
Posted by uniqueone
,
As usual, Lablup is releasing a by-product of other work in progress.

This is a translation of the machine learning glossary released by Google (https://developers.google.com/machine-learning/glossary/). It was put together for terminology consistency and as a reference while preparing online TensorFlow and Keras lectures and labs, and we are releasing it early, partly to gather feedback on what should be fixed.

We hope you find it helpful.

P.S. Experts, please send plenty of feedback!
P.S. 2: If possible, we will try to release one by-product of backend.AI and CodeOnWeb per day through the end of the year!

https://www.codeonweb.com/@mookiekim/ml-glossary
Posted by uniqueone
,

Top 10 Videos on Deep Learning in Python
https://www.kdnuggets.com/2017/11/top-10-videos-deep-learning-python.html?utm_content=buffer765d8&utm_medium=social&utm_source=facebook.com&utm_campaign=buffer


This ‘Top 10’ list has been created on the basis of the best content, not simply the number of views. To help you choose an appropriate framework, we first start with a video that compares a few of the popular Python DL libraries. I have included the highlights and my views on the pros and cons of each of these 10 items, so you can choose one that best suits your needs. I have saved the best for last: the most comprehensive yet free YouTube course on DL ☺. Let’s begin!

1. Overview: Deep Learning Frameworks compared (96K views) - 5 minutes

Before I actually list the best DL in Python videos, it is important that one understands the differences between the 5 most popular deep learning frameworks: scikit-learn, TensorFlow, Theano, Keras, and Caffe. This 5-minute video by Siraj Raval gives you the best possible comparison of the pros and cons of each framework and even presents the structure of code samples to help you decide. Start with this.

2. Playlist: TensorFlow tutorial by Sentdex (114 K views) - 4.5 hours

This playlist of 14 videos by Sentdex is the most well-organized, thoroughly explained, concise yet easy-to-follow tutorial on Deep Learning in Python. It includes TensorFlow implementations of a Recurrent Neural Network and a Convolutional Neural Network on the MNIST dataset.

3. Individual tutorial: TensorFlow tutorial 02: Convolutional Neural Network (69.7 K views) - 36 minutes

This tutorial by Magnus Pedersen on the YouTube channel Hvass Laboratories is worth its weight in gold: excellent comments in the code, and the instructor speaks without interruption. Watch this video to understand scripts in TensorFlow. Thank me later ☺

4. Overview : How to predict stock prices easily (210 K views) - 9 minutes

In this video, Siraj Raval uses a special type of recurrent neural network called an LSTM network. He uses the Keras library with a TensorFlow backend. He explains the reason behind using recurrent nets for time series data and later, uses it to predict the daily closing price of the S&P 500 based on training data for 16 years. The link to the Github code is given in its description box.

5. Tutorial: Introduction to Deep Learning with Python and the Theano library (201 K views) - 52 minutes

If you want a talk on Python with the Theano library in under an hour, targeted towards beginners, then you can refer to this talk by Alec Radford. Unlike most other talks on this topic, this one compares the features of an ‘old’ net versus a ‘modern’ net, i.e., nets prior to 2000 versus nets post-2012.

6. Playlist: PyTorch Zero to All (3 K views) - 2 hours 15 minutes

In this series of 11 videos, Sung Kim teaches you PyTorch from the ground up. A highlight of this series is Lecture 10, where he teaches you to build a basic CNN, with detailed emphasis on understanding the concept of CNNs using his diagrams.

7. Individual tutorial: TensorFlow tutorial (43.9 K views) - 49 minutes

This single tutorial by Edureka implements DL using TensorFlow. It is a very good tutorial for beginners in TensorFlow. It teaches TensorFlow basics and data structures. It also includes a use case for using DL as a naval mine identifier: determining whether an underwater obstacle is a rock or a mine.

8. Playlist: Deep Learning with Python (1.8K views) - 83 minutes

The YouTube channel ‘Machine Learning TV‘ has published a series of 15 videos totaling 83 minutes using Theano and Keras to use DL for automatic image captioning. It shows you how to train your first deep neural net for classifying digits from the MNIST dataset. It also has a good explanation on loading and reusing pre-trained models in Theano.

9. Playlist: Deep Learning with Keras- Python (30.3 K views) - 85 minutes

The YouTube channel ‘The SemiColon‘ has published a series of 11 tutorial videos using Theano and Keras to implement a chatbot with DL. It includes explanations of Convolutional Neural Networks and Recurrent Neural Networks in Theano with Keras, as well as Neural Networks and Backpropagation with the scikit-learn library on the handwriting recognition (MNIST) dataset.

The speaking is punctuated by ‘umms’ and ‘ahhs’, but there is a good explanation on Word2Vec used to build chatbots.

10. Free online course: Deep Learning by Andrew Ng (Full course) (28 K views) - 4 week course

As in my previous Top 10 videos post on ML in Finance, I have saved the best for last ☺. If you want to learn Deep Learning as an online course from arguably the most famous ML instructor, Andrew Ng, then this playlist is for you. Intended as a 4-week course covering 98 videos, it teaches you DL, Neural Networks, binary classification, derivatives, gradient descent, activation functions, backpropagation, regularization, RMSprop, tuning, dropout, and training and testing on different distributions, among others, using Python code in a Jupyter notebook.

 
Posted by uniqueone
,
https://m.facebook.com/groups/255834461424286?view=permalink&id=556808447993551

Hello. A lurker here, posting for the first time in a while. ;;;
This year has almost flown by... ㅠㅠ

These are notes covering Chapter 2 of Pattern Recognition and Machine Learning, up to Eq. (2.117).

While studying deep learning I realized my statistics background was far too weak, so I worked through the book and organized notes on my own. Since the book is in English, someone like me, weak in both English and math and with a poor memory, feels like they are seeing it for the first time on every re-read. To avoid the despair of having to start over from the beginning, I gradually wrote the notes up in Jupyter.

When I first saw this material, the equations looked so complicated that I thought, "whoa, what is this, I have no idea what it's saying," but somehow I managed to read through it.

I felt I absolutely had to understand everything up to Eq. (2.117), so I wrote it up; if anyone else is working through the book, I hope it helps. I spelled the equations out in almost absurd detail, so it may be tedious to read. Just for reference... ㅠㅠ

If you spot any errors, please point them out. Thank you.

http://nbviewer.jupyter.org/github/metamath1/ml-simple-works/blob/master/PRML/prml-chap2.ipynb
Posted by uniqueone
,

NumPy for MATLAB users – Mathesaurus

http://mathesaurus.sourceforge.net/matlab-numpy.html

NumPy for MATLAB users

Help

MATLAB/Octave Python Description
doc
help -i % browse with Info
help() Browse help interactively
help help or doc doc help Help on using help
help plot help(plot) or ?plot Help for a function
help splines or doc splines help(pylab) Help for a toolbox/library package
demo Demonstration examples

Searching available documentation

MATLAB/Octave Python Description
lookfor plot Search help files
help help(); modules [Numeric] List available packages
which plot help(plot) Locate functions

Using interactively

MATLAB/Octave Python Description
octave -q ipython -pylab Start session
TAB or M-? TAB Auto completion
foo(.m) execfile('foo.py') or run foo.py Run code from file
history hist -n Command history
diary on [..] diary off Save command history
exit or quit CTRL-D
CTRL-Z # windows
sys.exit()
End session

Operators

MATLAB/Octave Python Description
help - Help on operator syntax

Arithmetic operators

MATLAB/Octave Python Description
a=1; b=2; a=1; b=1 Assignment; defining a number
a + b a + b or add(a,b) Addition
a - b a - b or subtract(a,b) Subtraction
a * b a * b or multiply(a,b) Multiplication
a / b a / b or divide(a,b) Division
a .^ b a ** b
power(a,b)
pow(a,b)
Power, $a^b$
rem(a,b) a % b
remainder(a,b)
fmod(a,b)
Remainder
a+=1 a+=b or add(a,b,a) In place operation to save array creation overhead
factorial(a) Factorial, $n!$

Relational operators

MATLAB/Octave Python Description
a == b a == b or equal(a,b) Equal
a < b a < b or less(a,b) Less than
a > b a > b or greater(a,b) Greater than
a <= b a <= b or less_equal(a,b) Less than or equal
a >= b a >= b or greater_equal(a,b) Greater than or equal
a ~= b a != b or not_equal(a,b) Not Equal

Logical operators

MATLAB/Octave Python Description
a && b a and b Short-circuit logical AND
a || b a or b Short-circuit logical OR
a & b or and(a,b) logical_and(a,b) or a and b Element-wise logical AND
a | b or or(a,b) logical_or(a,b) or a or b Element-wise logical OR
xor(a, b) logical_xor(a,b) Logical EXCLUSIVE OR
~a or not(a)
~a or !a
logical_not(a) or not a Logical NOT
any(a) True if any element is nonzero
all(a) True if all elements are nonzero

root and logarithm

MATLAB/Octave Python Description
sqrt(a) math.sqrt(a) Square root
log(a) math.log(a) Logarithm, base $e$ (natural)
log10(a) math.log10(a) Logarithm, base 10
log2(a) math.log(a, 2) Logarithm, base 2 (binary)
exp(a) math.exp(a) Exponential function

Round off

MATLAB/Octave Python Description
round(a) around(a) or math.round(a) Round
ceil(a) ceil(a) Round up
floor(a) floor(a) Round down
fix(a) fix(a) Round towards zero

Mathematical constants

MATLAB/Octave Python Description
pi math.pi $\pi=3.141592$
exp(1) math.e or math.exp(1) $e=2.718281$

Missing values; IEEE-754 floating point status flags

MATLAB/Octave Python Description
NaN nan Not a Number
Inf inf Infinity, $\infty$
plus_inf Infinity, $+\infty$
minus_inf Infinity, $-\infty$
plus_zero Plus zero, $+0$
minus_zero Minus zero, $-0$

Complex numbers

MATLAB/Octave Python Description
i z = 1j Imaginary unit
z = 3+4i z = 3+4j or z = complex(3,4) A complex number, $3+4i$
abs(z) abs(3+4j) Absolute value (modulus)
real(z) z.real Real part
imag(z) z.imag Imaginary part
arg(z) Argument
conj(z) z.conj(); z.conjugate() Complex conjugate

Trigonometry

MATLAB/Octave Python Description
atan(a,b) atan2(b,a) Arctangent, $\arctan(b/a)$
hypot(x,y) Hypotenus; Euclidean distance

Generate random numbers

MATLAB/Octave Python Description
rand(1,10) random.random((10,))
random.uniform((10,))
Uniform distribution
2+5*rand(1,10) random.uniform(2,7,(10,))
Uniform: Numbers between 2 and 7
rand(6) random.uniform(0,1,(6,6))
Uniform: 6,6 array
randn(1,10) random.standard_normal((10,))
Normal distribution

Vectors

MATLAB/Octave Python Description
a=[2 3 4 5]; a=array([2,3,4,5]) Row vector, $1 \times n$-matrix
adash=[2 3 4 5]'; array([2,3,4,5])[:,NewAxis]
array([2,3,4,5]).reshape(-1,1)
r_[1:10,'c']
Column vector, $m \times 1$-matrix

Sequences

MATLAB/Octave Python Description
1:10 arange(1,11, dtype=Float)
range(1,11)
1,2,3, ... ,10
0:9 arange(10.) 0.0,1.0,2.0, ... ,9.0
1:3:10 arange(1,11,3) 1,4,7,10
10:-1:1 arange(10,0,-1) 10,9,8, ... ,1
10:-3:1 arange(10,0,-3) 10,7,4,1
linspace(1,10,7) linspace(1,10,7) Linearly spaced vector of n=7 points
reverse(a) a[::-1] or Reverse
a(:) = 3 a.fill(3), a[:] = 3 Set all values to same scalar value

Concatenation (vectors)

MATLAB/Octave Python Description
[a a] concatenate((a,a)) Concatenate two vectors
[1:4 a] concatenate((range(1,5),a), axis=1)

Repeating

MATLAB/Octave Python Description
[a a] concatenate((a,a)) 1 2 3, 1 2 3
a.repeat(3) or 1 1 1, 2 2 2, 3 3 3
a.repeat(a) or 1, 2 2, 3 3 3

Miss those elements out

MATLAB/Octave Python Description
a(2:end) a[1:] miss the first element
a([1:9]) miss the tenth element
a(end) a[-1] last element
a(end-1:end) a[-2:] last two elements

Maximum and minimum

MATLAB/Octave Python Description
max(a,b) maximum(a,b) pairwise max
max([a b]) concatenate((a,b)).max() max of all values in two vectors
[v,i] = max(a) v,i = a.max(0),a.argmax(0)

Vector multiplication

MATLAB/Octave Python Description
a.*a a*a Multiply two vectors
dot(u,v) dot(u,v) Vector dot product, $u \cdot v$

Matrices

MATLAB/Octave Python Description
a = [2 3;4 5] a = array([[2,3],[4,5]]) Define a matrix

Concatenation (matrices); rbind and cbind

MATLAB/Octave Python Description
[a ; b] concatenate((a,b), axis=0)
vstack((a,b))
Bind rows
[a , b] concatenate((a,b), axis=1)
hstack((a,b))
Bind columns
concatenate((a,b), axis=2)
dstack((a,b))
Bind slices (three-way arrays)
[a(:), b(:)] concatenate((a,b), axis=None) Concatenate matrices into one vector
[1:4 ; 1:4] concatenate((r_[1:5],r_[1:5])).reshape(2,-1)
vstack((r_[1:5],r_[1:5]))
Bind rows (from vectors)
[1:4 ; 1:4]' Bind columns (from vectors)

Array creation

MATLAB/Octave Python Description
zeros(3,5) zeros((3,5),Float) 0 filled array
zeros((3,5)) 0 filled array of integers
ones(3,5) ones((3,5),Float) 1 filled array
ones(3,5)*9 Any number filled array
eye(3) identity(3) Identity matrix
diag([4 5 6]) diag((4,5,6)) Diagonal
magic(3) Magic squares; Lo Shu
a = empty((3,3)) Empty array

Reshape and flatten matrices

MATLAB/Octave Python Description
reshape(1:6,3,2)'; arange(1,7).reshape(2,-1)
a.setshape(2,3)
Reshaping (rows first)
reshape(1:6,2,3); arange(1,7).reshape(-1,2).transpose() Reshaping (columns first)
a'(:) a.flatten() or Flatten to vector (by rows, like comics)
a(:) a.flatten(1)
Flatten to vector (by columns)
vech(a) Flatten upper triangle (by columns)

Shared data (slicing)

MATLAB/Octave Python Description
b = a b = a.copy() Copy of a

Indexing and accessing elements (Python: slicing)

MATLAB/Octave Python Description
a = [ 11 12 13 14 ...
21 22 23 24 ...
31 32 33 34 ]
a = array([[ 11, 12, 13, 14 ],
[ 21, 22, 23, 24 ],
[ 31, 32, 33, 34 ]])
Input is a 3,4 array
a(2,3) a[1,2] Element 2,3 (row,col)
a(1,:) a[0,] First row
a(:,1) a[:,0] First column
a([1 3],[1 4]); a.take([0,2]).take([0,3], axis=1)
Array as indices
a(2:end,:) a[1:,] All, except first row
a(end-1:end,:) a[-2:,] Last two rows
a(1:2:end,:) a[::2,:] Strides: Every other row
a[...,2] Third in last dimension (axis)
a(:,[1 3 4]) a.take([0,2,3],axis=1)
Remove one column
a.diagonal(offset=0) Diagonal

Assignment

MATLAB/Octave Python Description
a(:,1) = 99 a[:,0] = 99
a(:,1) = [99 98 97]' a[:,0] = array([99,98,97])
a(a>90) = 90; (a>90).choose(a,90)
a.clip(min=None, max=90)
Clipping: Replace all elements over 90
a.clip(min=2, max=5)
Clip upper and lower values

Transpose and inverse

MATLAB/Octave Python Description
a' a.conj().transpose()
Transpose
a.' or transpose(a) a.transpose() Non-conjugate transpose
det(a) linalg.det(a) or Determinant
inv(a) linalg.inv(a) or Inverse
pinv(a) linalg.pinv(a) Pseudo-inverse
norm(a) norm(a) Norms
eig(a) linalg.eig(a)[0]
Eigenvalues
svd(a) linalg.svd(a)
Singular values
chol(a) linalg.cholesky(a) Cholesky factorization
[v,l] = eig(a) linalg.eig(a)[1]
Eigenvectors
rank(a) rank(a) Rank

Sum

MATLAB/Octave Python Description
sum(a) a.sum(axis=0) Sum of each column
sum(a') a.sum(axis=1) Sum of each row
sum(sum(a)) a.sum() Sum of all elements
a.trace(offset=0) Sum along diagonal
cumsum(a) a.cumsum(axis=0) Cumulative sum (columns)

Sorting

MATLAB/Octave Python Description
a = [ 4 3 2 ; 2 8 6 ; 1 4 7 ] a = array([[4,3,2],[2,8,6],[1,4,7]]) Example data
sort(a(:)) a.ravel().sort() or Flat and sorted
sort(a) a.sort(axis=0) or msort(a) Sort each column
sort(a')' a.sort(axis=1) Sort each row
sortrows(a,1) a[a[:,0].argsort(),] Sort rows (by first row)
a.ravel().argsort() Sort, return indices
a.argsort(axis=0) Sort each column, return indices
a.argsort(axis=1) Sort each row, return indices

Maximum and minimum

MATLAB/Octave Python Description
max(a) a.max(0) or amax(a [,axis=0]) max in each column
max(a') a.max(1) or amax(a, axis=1) max in each row
max(max(a)) a.max() or max in array
[v i] = max(a) return indices, i
max(b,c) maximum(b,c) pairwise max
cummax(a)
a.ptp(); a.ptp(0) max-to-min range

Matrix manipulation

MATLAB/Octave Python Description
fliplr(a) fliplr(a) or a[:,::-1] Flip left-right
flipud(a) flipud(a) or a[::-1,] Flip up-down
rot90(a) rot90(a) Rotate 90 degrees
repmat(a,2,3)
kron(ones(2,3),a)
kron(ones((2,3)),a) Repeat matrix: [ a a a ; a a a ]
triu(a) triu(a) Triangular, upper
tril(a) tril(a) Triangular, lower

Equivalents to "size"

MATLAB/Octave Python Description
size(a) a.shape or a.getshape() Matrix dimensions
size(a,2) or length(a) a.shape[1] or size(a, axis=1) Number of columns
length(a(:)) a.size or size(a[, axis=None]) Number of elements
ndims(a) a.ndim Number of dimensions
a.nbytes Number of bytes used in memory

Matrix- and elementwise- multiplication

MATLAB/Octave Python Description
a .* b a * b or multiply(a,b) Elementwise operations
a * b matrixmultiply(a,b) Matrix product (dot product)
inner(a,b) or Inner matrix vector multiplication $a\cdot b'$
outer(a,b) or Outer product
kron(a,b) kron(a,b) Kronecker product
a / b Matrix division, $b{\cdot}a^{-1}$
a \ b linalg.solve(a,b)
Left matrix division, $b^{-1}{\cdot}a$ \newline (solve linear equations)
vdot(a,b) Vector dot product
cross(a,b) Cross product

Find; conditional indexing

MATLAB/Octave Python Description
find(a) a.ravel().nonzero()
Non-zero elements, indices
[i j] = find(a) (i,j) = a.nonzero()
(i,j) = where(a!=0)
Non-zero elements, array indices
[i j v] = find(a) v = a.compress((a!=0).flat)
v = extract(a!=0,a)
Vector of non-zero values
find(a>5.5) (a>5.5).nonzero()
Condition, indices
a.compress((a>5.5).flat)
Return values
a .* (a>5.5) where(a>5.5,0,a) or a * (a>5.5) Zero out elements above 5.5
a.put(2,indices) Replace values

Multi-way arrays

MATLAB/Octave Python Description
a = cat(3, [1 2; 1 2],[3 4; 3 4]); a = array([[[1,2],[1,2]], [[3,4],[3,4]]]) Define a 3-way array
a(1,:,:) a[0,...]

File input and output

MATLAB/Octave Python Description
f = load('data.txt') f = fromfile("data.txt")
f = load("data.txt")
Reading from a file (2d)
f = load('data.txt') f = load("data.txt") Reading from a file (2d)
x = dlmread('data.csv', ';') f = load('data.csv', delimiter=';') Reading fram a CSV file (2d)
save -ascii data.txt f save('data.csv', f, fmt='%.6f', delimiter=';') Writing to a file (2d)
f.tofile(file='data.csv', format='%.6f', sep=';') Writing to a file (1d)
f = fromfile(file='data.csv', sep=';') Reading from a file (1d)

Plotting

Basic x-y plots

MATLAB/Octave Python Description
plot(a) plot(a) 1d line plot
plot(x(:,1),x(:,2),'o') plot(x[:,0],x[:,1],'o') 2d scatter plot
plot(x1,y1, x2,y2) plot(x1,y1,'bo', x2,y2,'go') Two graphs in one plot
plot(x1,y1)
hold on
plot(x2,y2)
plot(x1,y1,'o')
plot(x2,y2,'o')
show() # as normal
Overplotting: Add new plots to current
subplot(211) subplot(211) subplots
plot(x,y,'ro-') plot(x,y,'ro-') Plotting symbols and color

Axes and titles

MATLAB/Octave Python Description
grid on grid() Turn on grid lines
axis equal
axis('equal')
replot
figure(figsize=(6,6)) 1:1 aspect ratio
axis([ 0 10 0 5 ]) axis([ 0, 10, 0, 5 ]) Set axes manually
title('title')
xlabel('x-axis')
ylabel('y-axis')
Axis labels and titles
text(2,25,'hello') Insert text

Log plots

MATLAB/Octave Python Description
semilogy(a) semilogy(a) logarithmic y-axis
semilogx(a) semilogx(a) logarithmic x-axis
loglog(a) loglog(a) logarithmic x and y axes

Filled plots and bar plots

MATLAB/Octave Python Description
fill(t,s,'b', t,c,'g')
% fill has a bug?
fill(t,s,'b', t,c,'g', alpha=0.2) Filled plot

Functions

MATLAB/Octave Python Description
f = inline('sin(x/3) - cos(x/5)') Defining functions
ezplot(f,[0,40])
fplot('sin(x/3) - cos(x/5)',[0,40])
% no ezplot
x = arrayrange(0,40,.5)
y = sin(x/3) - cos(x/5)
plot(x,y, 'o')
Plot a function for given range

Polar plots

MATLAB/Octave Python Description
theta = 0:.001:2*pi;
r = sin(2*theta);
theta = arange(0,2*pi,0.001)
r = sin(2*theta)
polar(theta, rho) polar(theta, rho)

Histogram plots

MATLAB/Octave Python Description
hist(randn(1000,1))
hist(randn(1000,1), -4:4)
plot(sort(a))

3d data

Contour and image plots

MATLAB/Octave Python Description
contour(z) levels, colls = contour(Z, V,
origin='lower', extent=(-3,3,-3,3))
clabel(colls, levels, inline=1,
fmt='%1.1f', fontsize=10)
Contour plot
contourf(z); colormap(gray) contourf(Z, V,
cmap=cm.gray,
origin='lower',
extent=(-3,3,-3,3))
Filled contour plot
image(z)
colormap(gray)
im = imshow(Z,
interpolation='bilinear',
origin='lower',
extent=(-3,3,-3,3))
Plot image data
# imshow() and contour() as above Image with contours
quiver() quiver() Direction field vectors

Perspective plots of surfaces over the x-y plane

MATLAB/Octave Python Description
n=-2:.1:2;
[x,y] = meshgrid(n,n);
z=x.*exp(-x.^2-y.^2);
n=arrayrange(-2,2,.1)
[x,y] = meshgrid(n,n)
z = x*power(math.e,-x**2-y**2)
mesh(z) Mesh plot
surf(x,y,z) or surfl(x,y,z)
% no surfl()
Surface plot

Scatter (cloud) plots

MATLAB/Octave Python Description
plot3(x,y,z,'k+') 3d scatter plot

Save plot to a graphics file

MATLAB/Octave Python Description
plot(1:10)
print -depsc2 foo.eps
gset output "foo.eps"
gset terminal postscript eps
plot(1:10)
savefig('foo.eps') PostScript
savefig('foo.pdf') PDF
savefig('foo.svg') SVG (vector graphics for www)
print -dpng foo.png savefig('foo.png') PNG (raster graphics)

Data analysis

Set membership operators

MATLAB/Octave Python Description
a = [ 1 2 2 5 2 ];
b = [ 2 3 4 ];
a = array([1,2,2,5,2])
b = array([2,3,4])
a = set([1,2,2,5,2])
b = set([2,3,4])
Create sets
unique(a) unique1d(a)
unique(a)
set(a)
Set unique
union(a,b) union1d(a,b)
a.union(b)
Set union
intersect(a,b) intersect1d(a)
a.intersection(b)
Set intersection
setdiff(a,b) setdiff1d(a,b)
a.difference(b)
Set difference
setxor(a,b) setxor1d(a,b)
a.symmetric_difference(b)
Set exclusion
ismember(2,a) 2 in a
setmember1d(2,a)
contains(a,2)
True for set member

Statistics

MATLAB/Octave Python Description
mean(a) a.mean(axis=0)
mean(a [,axis=0])
Average
median(a) median(a) or median(a [,axis=0]) Median
std(a) a.std(axis=0) or std(a [,axis=0]) Standard deviation
var(a) a.var(axis=0) or var(a) Variance
corr(x,y) correlate(x,y) or corrcoef(x,y) Correlation coefficient
cov(x,y) cov(x,y) Covariance

Interpolation and regression

MATLAB/Octave Python Description
z = polyval(polyfit(x,y,1),x)
plot(x,y,'o', x,z ,'-')
(a,b) = polyfit(x,y,1)
plot(x,y,'o', x,a*x+b,'-')
Straight line fit
a = x\y linalg.lstsq(x,y)
Linear least squares $y = ax + b$
polyfit(x,y,3) polyfit(x,y,3) Polynomial fit

Non-linear methods

Polynomials, root finding

MATLAB/Octave Python Description
poly() Polynomial
roots([1 -1 -1]) roots() Find zeros of polynomial
f = inline('1/x - (x-1)')
fzero(f,1)
Find a zero near $x = 1$
solve('1/x = x-1') Solve symbolic equations
polyval([1 2 1 2],1:10) polyval(array([1,2,1,2]),arange(1,11)) Evaluate polynomial

Differential equations

MATLAB/Octave Python Description
diff(a) diff(x, n=1, axis=0) Discrete difference function and approximate derivative
Solve differential equations

Fourier analysis

MATLAB/Octave Python Description
fft(a) fft(a) or Fast fourier transform
ifft(a) ifft(a) or Inverse fourier transform
convolve(x,y) Linear convolution

Symbolic algebra; calculus

MATLAB/Octave Python Description
factor() Factorization

Programming

MATLAB/Octave Python Description
.m .py Script file extension
%
% or #
# Comment symbol (rest of line)
% must be in MATLABPATH
% must be in LOADPATH
from pylab import * Import library functions
string='a=234';
eval(string)
string="a=234"
eval(string)
Eval

Loops

MATLAB/Octave Python Description
for i=1:5; disp(i); end for i in range(1,6): print(i) for-statement
for i=1:5
disp(i)
disp(i*2)
end
for i in range(1,6):
print(i)
print(i*2)
Multiline for statements

Conditionals

MATLAB/Octave Python Description
if 1>0 a=100; end if 1>0: a=100 if-statement
if 1>0 a=100; else a=0; end if-else-statement

Debugging

MATLAB/Octave Python Description
ans Most recent evaluated expression
whos or who List variables loaded into memory
clear x or clear [all] Clear variable $x$ from memory
disp(a) print a Print

Working directory and OS

MATLAB/Octave Python Description
dir or ls os.listdir(".") List files in directory
what grep.grep("*.py") List script files in directory
pwd os.getcwd() Displays the current working directory
cd foo os.chdir('foo') Change working directory
!notepad
system("notepad")
os.system('notepad')
os.popen('notepad')
Invoke a System Command

Time-stamp: "2007-11-09T16:46:36 vidar"
©2006 Vidar Bronken Gundersen, /mathesaurus.sf.net
Permission is granted to copy, distribute and/or modify this document as long as the above attribution is retained.

Posted by uniqueone
,

Machine Learning · Artificial Inteligence
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/machine_learning.html
Posted by uniqueone
,

How a 22 year old from Shanghai won a global deep learning challenge
https://blog.getnexar.com/how-a-22-year-old-from-shanghai-won-a-global-deep-learning-challenge-76f2299446a1
Posted by uniqueone
,

https://machinelearningmastery.com/keras-functional-api-deep-learning/

 

How to Use the Keras Functional API for Deep Learning

The Keras Python library makes creating deep learning models fast and easy.

The sequential API allows you to create models layer-by-layer for most problems. It is limited in that it does not allow you to create models that share layers or have multiple inputs or outputs.

The functional API in Keras is an alternate way of creating models that offers a lot more flexibility, including creating more complex models.

In this tutorial, you will discover how to use the more flexible functional API in Keras to define deep learning models.

After completing this tutorial, you will know:

  • The difference between the Sequential and Functional APIs.
  • How to define simple Multilayer Perceptron, Convolutional Neural Network, and Recurrent Neural Network models using the functional API.
  • How to define more complex models with shared layers and multiple inputs and outputs.

Let’s get started.

Tutorial Overview

This tutorial is divided into 6 parts; they are:

  1. Keras Sequential Models
  2. Keras Functional Models
  3. Standard Network Models
  4. Shared Layers Model
  5. Multiple Input and Output Models
  6. Best Practices

1. Keras Sequential Models

As a review, Keras provides a Sequential model API.

This is a way of creating deep learning models where an instance of the Sequential class is created and model layers are created and added to it.

For example, the layers can be defined and passed to the Sequential as an array:
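The code listing did not survive this copy; a minimal sketch of that style (layer sizes are illustrative) would be:

from keras.models import Sequential
from keras.layers import Dense

# define the layers up front and pass them to Sequential as an array
model = Sequential([Dense(2, input_dim=1), Dense(1)])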

Layers can also be added piecewise:
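And the equivalent piecewise style, again only as a sketch:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(2, input_dim=1))
model.add(Dense(1))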

The Sequential model API is great for developing deep learning models in most situations, but it also has some limitations.

For example, it is not straightforward to define models that may have multiple different input sources, produce multiple output destinations or models that re-use layers.

2. Keras Functional Models

The Keras functional API provides a more flexible way for defining models.

It specifically allows you to define multiple input or output models as well as models that share layers. More than that, it allows you to define ad hoc acyclic network graphs.

Models are defined by creating instances of layers and connecting them directly to each other in pairs, then defining a Model that specifies the layers to act as the input and output to the model.

Let’s look at the three unique aspects of Keras functional API in turn:

1. Defining Input

Unlike the Sequential model, you must create and define a standalone Input layer that specifies the shape of input data.

The input layer takes a shape argument that is a tuple that indicates the dimensionality of the input data.

When input data is one-dimensional, such as for a Multilayer Perceptron, the shape must explicitly leave room for the mini-batch dimension used when splitting the data during training. Therefore, the shape tuple is always defined with a hanging last dimension, for example (2,):
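For instance, a sketch of a two-feature input layer (the variable name visible is only illustrative):

from keras.layers import Input

visible = Input(shape=(2,))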

 

2. Connecting Layers

The layers in the model are connected pairwise.

This is done by specifying where the input comes from when defining each new layer. A bracket notation is used, such that after a layer is created, the layer it takes its input from is specified in brackets.

Let’s make this clear with a short example. We can create the input layer as above, then create a hidden layer as a Dense that receives input only from the input layer.
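A sketch of that idea (the names visible and hidden are illustrative):

from keras.layers import Input, Dense

visible = Input(shape=(2,))
hidden = Dense(2)(visible)   # the bracketed (visible) wires the input layer into the Dense layer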

Note the (visible) after the creation of the Dense layer that connects the input layer output as the input to the dense hidden layer.

It is this way of connecting layers piece by piece that gives the functional API its flexibility. For example, you can see how easy it would be to start defining ad hoc graphs of layers.

3. Creating the Model

After creating all of your model layers and connecting them together, you must define the model.

As with the Sequential API, the model is the thing you can summarize, fit, evaluate, and use to make predictions.

Keras provides a Model class that you can use to create a model from your created layers. It requires that you only specify the input and output layers. For example:
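A sketch reusing the two layers from above:

from keras.models import Model
from keras.layers import Input, Dense

visible = Input(shape=(2,))
hidden = Dense(2)(visible)
model = Model(inputs=visible, outputs=hidden)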

Now that we know all of the key pieces of the Keras functional API, let’s work through defining a suite of different models and build up some practice with it.

Each example is executable and prints the structure and creates a diagram of the graph. I recommend doing this for your own models to make it clear what exactly you have defined.

My hope is that these examples provide templates for you when you want to define your own models using the functional API in the future.

3. Standard Network Models

When getting started with the functional API, it is a good idea to see how some standard neural network models are defined.

In this section, we will look at defining a simple multilayer Perceptron, convolutional neural network, and recurrent neural network.

These examples will provide a foundation for understanding the more elaborate examples later.

Multilayer Perceptron

In this section, we define a multilayer Perceptron model for binary classification.

The model has 10 inputs, 3 hidden layers with 10, 20, and 10 neurons, and an output layer with 1 output. Rectified linear activation functions are used in each hidden layer and a sigmoid activation function is used in the output layer, for binary classification.
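A sketch that follows this description (the plot_model call assumes pydot and graphviz are installed; the output file name is arbitrary):

from keras.models import Model
from keras.layers import Input, Dense
from keras.utils import plot_model

visible = Input(shape=(10,))
hidden1 = Dense(10, activation='relu')(visible)
hidden2 = Dense(20, activation='relu')(hidden1)
hidden3 = Dense(10, activation='relu')(hidden2)
output = Dense(1, activation='sigmoid')(hidden3)
model = Model(inputs=visible, outputs=output)
model.summary()
plot_model(model, to_file='multilayer_perceptron_graph.png')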

Running the example prints the structure of the network.

A plot of the model graph is also created and saved to file.

Multilayer Perceptron Network Graph

Multilayer Perceptron Network Graph

Convolutional Neural Network

In this section, we will define a convolutional neural network for image classification.

The model receives black and white 64×64 images as input, then has a sequence of two convolutional and pooling layers as feature extractors, followed by a fully connected layer to interpret the features and an output layer with a sigmoid activation for two-class predictions.
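A sketch matching this description (the filter counts and kernel sizes are illustrative assumptions):

from keras.models import Model
from keras.layers import Input, Dense, Flatten, Conv2D, MaxPooling2D

visible = Input(shape=(64, 64, 1))
conv1 = Conv2D(32, kernel_size=4, activation='relu')(visible)
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
conv2 = Conv2D(16, kernel_size=4, activation='relu')(pool1)
pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
flat = Flatten()(pool2)
hidden1 = Dense(10, activation='relu')(flat)
output = Dense(1, activation='sigmoid')(hidden1)
model = Model(inputs=visible, outputs=output)
model.summary()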

Running the example summarizes the model layers.

A plot of the model graph is also created and saved to file.

Convolutional Neural Network Graph

Convolutional Neural Network Graph

Recurrent Neural Network

In this section, we will define a long short-term memory recurrent neural network for sequence classification.

The model expects 100 time steps of one feature as input. The model has a single LSTM hidden layer to extract features from the sequence, followed by a fully connected layer to interpret the LSTM output, followed by an output layer for making binary predictions.
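A sketch matching this description (the LSTM and dense layer widths are assumptions):

from keras.models import Model
from keras.layers import Input, Dense, LSTM

visible = Input(shape=(100, 1))
hidden1 = LSTM(10)(visible)
hidden2 = Dense(10, activation='relu')(hidden1)
output = Dense(1, activation='sigmoid')(hidden2)
model = Model(inputs=visible, outputs=output)
model.summary()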

Running the example summarizes the model layers.

A plot of the model graph is also created and saved to file.

Recurrent Neural Network Graph

Recurrent Neural Network Graph

4. Shared Layers Model

Multiple layers can share the output from one layer.

For example, there may be multiple different feature extraction layers from an input, or multiple layers used to interpret the output from a feature extraction layer.

Let’s look at both of these examples.

Shared Input Layer

In this section, we define multiple convolutional layers with differently sized kernels to interpret an image input.

The model takes black and white images with the size 64×64 pixels. There are two CNN feature extraction submodels that share this input; the first has a kernel size of 4 and the second a kernel size of 8. The outputs from these feature extraction submodels are flattened into vectors and concatenated into one long vector and passed on to a fully connected layer for interpretation before a final output layer makes a binary classification.
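A sketch following that description (the filter counts are assumptions; the kernel sizes 4 and 8 come from the text):

from keras.models import Model
from keras.layers import Input, Dense, Flatten, Conv2D, MaxPooling2D, concatenate

visible = Input(shape=(64, 64, 1))
# first feature extraction submodel: kernel size 4
conv1 = Conv2D(32, kernel_size=4, activation='relu')(visible)
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
flat1 = Flatten()(pool1)
# second feature extraction submodel: kernel size 8
conv2 = Conv2D(16, kernel_size=8, activation='relu')(visible)
pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
flat2 = Flatten()(pool2)
# concatenate both flattened vectors, interpret, and classify
merge = concatenate([flat1, flat2])
hidden1 = Dense(10, activation='relu')(merge)
output = Dense(1, activation='sigmoid')(hidden1)
model = Model(inputs=visible, outputs=output)
model.summary()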

Running the example summarizes the model layers.

A plot of the model graph is also created and saved to file.

Neural Network Graph With Shared Inputs

Neural Network Graph With Shared Inputs

Shared Feature Extraction Layer

In this section, we will use two parallel submodels to interpret the output of an LSTM feature extractor for sequence classification.

The input to the model is 100 time steps of 1 feature. An LSTM layer with 10 memory cells interprets this sequence. The first interpretation model is a shallow single fully connected layer, the second is a deep 3 layer model. The output of both interpretation models are concatenated into one long vector that is passed to the output layer used to make a binary prediction.
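A sketch following that description (the layer widths in the deep interpretation branch are assumptions):

from keras.models import Model
from keras.layers import Input, Dense, LSTM, concatenate

visible = Input(shape=(100, 1))
extract = LSTM(10)(visible)
# first interpretation model: a single fully connected layer
interp1 = Dense(10, activation='relu')(extract)
# second interpretation model: three stacked fully connected layers
interp2 = Dense(10, activation='relu')(extract)
interp2 = Dense(20, activation='relu')(interp2)
interp2 = Dense(10, activation='relu')(interp2)
# concatenate both interpretations and make the binary prediction
merge = concatenate([interp1, interp2])
output = Dense(1, activation='sigmoid')(merge)
model = Model(inputs=visible, outputs=output)
model.summary()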

Running the example summarizes the model layers.

A plot of the model graph is also created and saved to file.

Neural Network Graph With Shared Feature Extraction Layer

Neural Network Graph With Shared Feature Extraction Layer

5. Multiple Input and Output Models

The functional API can also be used to develop more complex models with multiple inputs, possibly with different modalities. It can also be used to develop models that produce multiple outputs.

We will look at examples of each in this section.

Multiple Input Model

We will develop an image classification model that takes two versions of the image as input, each of a different size. Specifically a black and white 64×64 version and a color 32×32 version. Separate feature extraction CNN models operate on each, then the results from both models are concatenated for interpretation and ultimate prediction.

Note that in the creation of the Model() instance, we define the two input layers as an array. Specifically:
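With visible1 and visible2 naming the two input layers (illustrative names, matching the sketch below), the relevant line looks like:

model = Model(inputs=[visible1, visible2], outputs=output)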

The complete example is listed below.
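A sketch of such a two-input model under the assumptions above (each branch is reduced to a single convolutional and pooling layer for brevity):

from keras.models import Model
from keras.layers import Input, Dense, Flatten, Conv2D, MaxPooling2D, concatenate

# first input model: 64x64 black and white image
visible1 = Input(shape=(64, 64, 1))
conv1 = Conv2D(32, kernel_size=4, activation='relu')(visible1)
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
flat1 = Flatten()(pool1)
# second input model: 32x32 color image
visible2 = Input(shape=(32, 32, 3))
conv2 = Conv2D(32, kernel_size=4, activation='relu')(visible2)
pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
flat2 = Flatten()(pool2)
# concatenate both feature extractors, interpret, and predict
merge = concatenate([flat1, flat2])
hidden1 = Dense(10, activation='relu')(merge)
output = Dense(1, activation='sigmoid')(hidden1)
model = Model(inputs=[visible1, visible2], outputs=output)
model.summary()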

Running the example summarizes the model layers.

A plot of the model graph is also created and saved to file.

Neural Network Graph With Multiple Inputs

Neural Network Graph With Multiple Inputs

Multiple Output Model

In this section, we will develop a model that makes two different types of predictions. Given an input sequence of 100 time steps of one feature, the model will both classify the sequence and output a new sequence with the same length.

An LSTM layer interprets the input sequence and returns the hidden state for each time step. The first output model creates a stacked LSTM, interprets the features, and makes a binary prediction. The second output model uses the same output layer to make a real-valued prediction for each input time step.
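A sketch matching that description (wrapping a Dense layer in TimeDistributed is one way to emit one value per time step; the layer sizes are assumptions):

from keras.models import Model
from keras.layers import Input, Dense, LSTM, TimeDistributed

visible = Input(shape=(100, 1))
extract = LSTM(10, return_sequences=True)(visible)
# first output: stacked LSTM, interpretation layer, binary classification
class1 = LSTM(10)(extract)
class2 = Dense(10, activation='relu')(class1)
output1 = Dense(1, activation='sigmoid')(class2)
# second output: one real-valued prediction per input time step
output2 = TimeDistributed(Dense(1, activation='linear'))(extract)
model = Model(inputs=visible, outputs=[output1, output2])
model.summary()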

Running the example summarizes the model layers.

A plot of the model graph is also created and saved to file.

Neural Network Graph With Multiple Outputs

Neural Network Graph With Multiple Outputs

6. Best Practices

In this section, I want to give you some tips to get the most out of the functional API when you are defining your own models.

  • Consistent Variable Names. Use the same variable name for the input (visible) and output layers (output) and perhaps even the hidden layers (hidden1, hidden2). It will help to connect things together correctly.
  • Review Layer Summary. Always print the model summary and review the layer outputs to ensure that the model was connected together as you expected.
  • Review Graph Plots. Always create a plot of the model graph and review it to ensure that everything was put together as you intended.
  • Name the layers. You can assign names to layers that are used when reviewing summaries and plots of the model graph. For example: Dense(1, name='hidden1').
  • Separate Submodels. Consider separating out the development of submodels and combine the submodels together at the end.

Do you have your own best practice tips when using the functional API?
Let me know in the comments.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Summary

In this tutorial, you discovered how to use the functional API in Keras for defining simple and complex deep learning models.

Specifically, you learned:

  • The difference between the Sequential and Functional APIs.
  • How to define simple Multilayer Perceptron, Convolutional Neural Network, and Recurrent Neural Network models using the functional API.
  • How to define more complex models with shared layers and multiple inputs and outputs.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

Posted by uniqueone
,

MNIST Kaggle submission with CNN Keras Swish activation
https://medium.com/@shahariarrabby/mnist-kaggle-submission-with-cnn-keras-switch-activation-62108f9463df
Posted by uniqueone
,
https://github.com/sergeytulyakov/mocogan

 

 

MoCoGAN: Decomposing Motion and Content for Video Generation

This repository contains an implementation and further details of MoCoGAN: Decomposing Motion and Content for Video Generation by Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, Jan Kautz.

Representation

MoCoGAN is a generative model for videos, which generates videos from random inputs. It features separated representations of motion and content, offering control over what is generated. For example, MoCoGAN can generate the same object performing different actions, as well as the same action performed by different objects.

MoCoGAN Representation

Examples of generated videos

We trained MoCoGAN on the MUG Facial Expression Database to generate facial expressions. When fixing the content code and changing the motion code, it generated the same person performing different expressions. When fixing the motion code and changing the content code, it generated different people performing the same expression. In the figure shown below, each column has a fixed identity and each row shows the same action:

Facial expressions

We trained MoCoGAN on a human action dataset where content is represented by the performer, executing several actions. When fixing the content code and changing the motion code, it generated the same person performing different actions. When fixing the motion code and changing the content code, it generated different people performing the same action. Each pair of images represents the same action executed by different people:

Human actions

We have collected a large-scale TaiChi dataset including 4.5K videos of TaiChi performers. Below are videos generated by MoCoGAN.

TaiChi

Training MoCoGAN

Please refer to the wiki page.

Citation

If you use MoCoGAN in your research please cite our paper:

Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, Jan Kautz, "MoCoGAN: Decomposing Motion and Content for Video Generation"

Posted by uniqueone
,

Keras Tutorial: The Ultimate Beginner's Guide to Deep Learning in Python
https://elitedatascience.com/keras-tutorial-deep-learning-in-python?utm_content=bufferbce2c&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
Posted by uniqueone
,

Introduction to Numpy -1 : An absolute beginners guide to Machine Learning and Data science.

 

https://hackernoon.com/introduction-to-numpy-1-an-absolute-beginners-guide-to-machine-learning-and-data-science-5d87f13f0d51

 

 

Let's get started quickly. Numpy is a math library for Python. It enables us to do computation efficiently and effectively. It is better than regular Python lists for numerical work because of its speed and convenience.

In this article I’m just going to introduce you to the basics of what is mostly required for machine learning and datascience. I’m not going to cover everything that’s possible with numpy library. This is the part one of numpy tutorial series.

The first thing I want to introduce you to is the way you import it.

import numpy as np

Okay, now we’re telling python that “np” is the official reference to numpy from further on.

Let’s create python array and np array.

# python array
a = [1,2,3,4,5,6,7,8,9]
# numpy array
A = np.array([1,2,3,4,5,6,7,8,9])

If I were to print them, I wouldn’t see much difference.

print(a)
print(A)
====================================================================
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1 2 3 4 5 6 7 8 9]

Okay, but why do I have to use an np array instead of a regular array?

The answer is that np arrays are better in terms of faster computation and ease of manipulation.

More on those details here, if you’re interested:

Let’s proceed further with more cool stuff. Wait, there was nothing cool we saw yet! Okay, here’s something:

np.arange()

np.arange(0,10,2)
====================================================================array([0, 2, 4, 6, 8])

What arange([start], stop, [step]) does is arrange numbers from start to stop, in steps of step. Here is what it means for np.arange(0,10,2):

return an np array starting from 0 all the way up to 10, but don't include 10, and increment numbers by 2 each time.

So, that’s how we get :

array([0, 2, 4, 6, 8])

The important thing to remember here is that the stopping number is not going to be included in the list.

another example:

np.arange(2,29,5)
====================================================================
array([ 2, 7, 12, 17, 22, 27])

Before I proceed further, I'll have to warn you that this "array" is interchangeably called a "matrix" or a "vector". So don't panic when I say, for example, "Matrix shape is 2 X 3". All it means is that the array looks something like this:

array([[ 2,  7, 12],
       [17, 22, 27]])

Now, Let’s talk about the shape of a default np array.

Shape is an attribute of an np array. When an array, say A, is asked for its shape, here is how it looks.

A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
A.shape
====================================================================
(9,)

This is a rank 1 matrix(array), where it just has 9 elements in a row. 
Ideally it should be a 1 X 9 matrix right?

I agree with you, so that’s where reshape() comes into play. It is a method that changes the dimensions of your original matrix into your desired dimension.

Let’s look at reshape in action. You can pass a tuple of whatever dimension you want as long as the reshaped matrix and original matrix have the same number of elements.

A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
A.reshape(1,9)
====================================================================
array([[1, 2, 3, 4, 5, 6, 7, 8, 9]])

Notice that reshape returns a multi-dim matrix. Two square brackets in the beginning indicate that. [[1, 2, 3, 4, 5, 6, 7, 8, 9]] is a potentially multi-dim matrix as opposed to [1, 2, 3, 4, 5, 6, 7, 8, 9].

Another example:

B = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
B = B.reshape(3,3)
B
====================================================================
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

If I look at B’s shape, it’s going to be (3,3):

B.shape
====================================================================
(3,3)

Perfect. Let’s proceed to np.zeros().

This time it’s your job to tell me what happens looking at this code:

np.zeros((4,3))
====================================================================
???????????

Good, if you thought it’s going to print a 4 X 3 matrix filled with zeros. Here’s the output:

np.zeros((4,3))
====================================================================
array([[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]])

np.zeros((n,m)) returns an n x m matrix that contains zeros. It’s as simple as that.

Let’s guess again here: what does is np.eye() do?

Hint: eye() stands for Identity.

np.eye(5)
====================================================================
array([[ 1., 0., 0., 0., 0.],
[ 0., 1., 0., 0., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 1.]])

np.eye() returns an identity matrix with the specified dimensions.

What if we have to multiply 2 matrices?

No problem, we have np.dot().

np.dot() performs matrix multiplication, provided both the matrices are “multiply-able”. It just means that the number of columns of the first matrix must match the number of rows in second matrix.

ex: A = (2,3) & B=(3,2). Here number of cols in A= 3. Number of rows in B = 3. Since they match, multiplication is possible.

Let’s illustrate multiplication via np code:

# generate an identity matrix of (3 x 3)
I = np.eye(3)
I
====================================================================
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
# generate another (3 x 3) matrix to be multiplied.
D = np.arange(1,10).reshape(3,3)
D
====================================================================
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])

We now prepared both the matrices to be multiplied. Let’s see them in action.

# perform actual dot product.
M = np.dot(D,I)
M
====================================================================
array([[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.]])

Great! Now you know how easy and possible it is to multiply matrices! Also, notice that the entire array is now float type.

What about adding Elements of the matrix?

# add all the elements of matrix.
sum_val = np.sum(M)
sum_val
====================================================================
45.0

np.sum() adds all the elements of the matrix.

However, there are 2 variants.

1. Sum along the rows.

# sum along the rows
np.sum(M,axis=1)
====================================================================
array([ 6., 15., 24.])

6 is the sum of 1st row (1, 2, 3).

15 is the sum of 2nd row (4, 5, 6).

24 is the sum of 3rd row (7, 8, 9).

2. Sum along the columns.

# sum along the cols
np.sum(M,axis=0)
====================================================================
array([ 12., 15., 18.])

12 is the sum of 1st col (1, 4, 7).

15 is the sum of 2nd col (2, 5, 8).

18 is the sum of 3rd col (3, 6, 9).

Here is the follow up tutorial — part 2 . That’s it at this point.

Here's a video tutorial explaining everything that I did, if you prefer to consume it via video.

If you liked this article, a clap/recommendation would be really appreciated. It helps me to write more such articles.

 

Posted by uniqueone
,
https://m.facebook.com/groups/869457839786067?view=permalink&id=1478681038863741

From the perspective of someone from industrial engineering who has studied machine learning, I have gathered Korean-language materials (lecture notes + videos) that are worth studying. For the materials below, whenever code practice is involved the language is Python throughout (the introductory statistics course is the only exception). For Korean-language videos and materials on deep learning, there is Prof. 김성훈's "모두의 딥러닝" (Deep Learning for Everyone), for which I am truly grateful. There are also many good lectures in English (cs231n, cs224d, the RL course by David Silver, the Neural Network course by Hugo Larochelle), but if you want to study machine learning quickly in a familiar language before tackling deep learning in English in earnest, the list below may be a useful reference.

cf. I included web programming out of personal interest.

[k-mooc]
Calculus 1 (Prof. 채영도, Sungkyunkwan University)
- http://www.kmooc.kr/courses/course-v1:SKKUk+SKKU_EXGB506.01K+2017_SKKU22/about

Calculus 2 (Prof. 채영도, Sungkyunkwan University)
- http://www.kmooc.kr/courses/course-v1:SKKUk+SKKU_2017_05-01+2017_SKKU01/about

Linear Algebra (Prof. 이상구, Sungkyunkwan University)
- http://www.kmooc.kr/courses/course-v1:SKKUk+SKKU_2017_01+2017_SKKU01/about

Introduction to Statistics Using R (Prof. 김충락, Pusan National University)
- http://www.kmooc.kr/courses/course-v1:PNUk+RS_C01+2017_KM_009/about

Artificial Intelligence and Machine Learning (Prof. 오혜연, KAIST)
- http://www.kmooc.kr/courses/course-v1:KAISTk+KCS470+2017_K0202/about

[kooc]
Python Data Structures (Prof. 문일철, KAIST)
- http://kooc.kaist.ac.kr/datastructure-2017f

Optimization for Image Understanding (Prof. 김창익, KAIST)
- http://kooc.kaist.ac.kr/optimization2017/lecture/10543

Introduction to Artificial Intelligence and Machine Learning 1 (Prof. 문일철, KAIST)
- http://kooc.kaist.ac.kr/machinelearning1_17/lecture/10574
- http://seslab.kaist.ac.kr/xe2/page_GBex27

Introduction to Artificial Intelligence and Machine Learning 2 (Prof. 문일철, KAIST)
- http://kooc.kaist.ac.kr/machinelearning2__17/lecture/10573
- http://seslab.kaist.ac.kr/xe2/page_Dogc43
 
Advanced Artificial Intelligence and Machine Learning (Prof. 문일철, KAIST)
- http://kooc.kaist.ac.kr/machinelearning3
- http://seslab.kaist.ac.kr/xe2/page_lMmY25

[TeamLab]
Introduction to Python for Data Science (Prof. 최성철, Gachon University)
- https://github.com/TeamLab/Gachon_CS50_Python_KMOOC

Machine Learning from Scratch (Prof. 최성철, Gachon University)
- https://github.com/TeamLab/machine_learning_from_scratch_with_python

Management Science (Prof. 최성철, Gachon University)
- https://github.com/TeamLab/Gachon_CS50_OR_KMOOC

Web Programming (Prof. 최성철, Gachon University)
- https://github.com/TeamLab/cs50_web_programming

Posted by uniqueone
,
https://m.facebook.com/story.php?story_fbid=383487222087279&id=303538826748786

Naive Bayes Classification
We will discuss some of the mathematics of posterior probability, known through Bayes' Theorem. This is the core of the Naive Bayes Classifier. We will then explore Python's sklearn library and write code for a Naive Bayes Classifier in Python for the problem under discussion.
This article is divided into two parts. Part 1 explains how the Naive Bayes classifier works. Part 2 is a programming exercise using the sklearn library, which provides a Naive Bayes Classifier in Python, and discusses the accuracy of the program we train.
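As a quick sketch of the sklearn workflow described above (the 20 newsgroups data set here is only an illustrative choice, not necessarily the one used in the original post):

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train = fetch_20newsgroups(subset='train')
test = fetch_20newsgroups(subset='test')

# turn raw text into word-count features, then fit a Naive Bayes classifier
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train.data)
X_test = vectorizer.transform(test.data)

clf = MultinomialNB().fit(X_train, train.target)
print("test accuracy:", clf.score(X_test, test.target))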

Original article:
https://medium.com/machine-learning-101/chapter-1-supervised-learning-and-naive-bayes-classification-part-1-theory-8b9e361897d5
Posted by uniqueone
,

Machine Learning  |  Google Developers
https://developers.google.com/machine-learning/glossary/




A


accuracy

The fraction of predictions that a classification model got right. In multi-class classification, accuracy is defined as follows:

\text{Accuracy} = \frac{\text{Correct Predictions}}{\text{Total Number of Examples}}
In binary classification, accuracy has the following definition:

\text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{Total Number of Examples}}
See true positive and true negative.


activation function

A function (for example, ReLU or sigmoid) that takes in the weighted sum of all of the inputs from the previous layer and then generates and passes an output value (typically nonlinear) to the next layer.


AdaGrad

A sophisticated gradient descent algorithm that rescales the gradients of each parameter, effectively giving each parameter an independent learning rate. For a full explanation, see this paper.


AUC (Area under the ROC Curve)

An evaluation metric that considers all possible classification thresholds.

The Area Under the ROC curve is the probability that a classifier will be more confident that a randomly chosen positive example is actually positive than that a randomly chosen negative example is positive.

B


backpropagation

The primary algorithm for performing gradient descent on neural networks. First, the output values of each node are calculated (and cached) in a forward pass. Then, the partial derivative of the error with respect to each parameter is calculated in a backward pass through the graph.


baseline

A simple model or heuristic used as reference point for comparing how well a model is performing. A baseline helps model developers quantify the minimal, expected performance on a particular problem.


batch

The set of examples used in one iteration (that is, one gradient update) of model training.


batch size

The number of examples in a batch. For example, the batch size of SGD is 1, while the batch size of a mini-batch is usually between 10 and 1000. Batch size is usually fixed during training and inference; however, TensorFlow does permit dynamic batch sizes.


bias

An intercept or offset from an origin. Bias (also known as the bias term) is referred to as b or w0 in machine learning models. For example, bias is the b in the following formula:

y' = b + w_1 x_1 + w_2 x_2 + \ldots + w_n x_n
Not to be confused with prediction bias.


binary classification

A type of classification task that outputs one of two mutually exclusive classes. For example, a machine learning model that evaluates email messages and outputs either "spam" or "not spam" is a binary classifier.


binning

See bucketing.


bucketing

Converting a (usually continuous) feature into multiple binary features called buckets or bins, typically based on value range. For example, instead of representing temperature as a single continuous floating-point feature, you could chop ranges of temperatures into discrete bins. Given temperature data sensitive to a tenth of a degree, all temperatures between 0.0 and 15.0 degrees could be put into one bin, 15.1 to 30.0 degrees could be a second bin, and 30.1 to 50.0 degrees could be a third bin.
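As a rough numpy illustration of the temperature example (the sample readings and bin edges below are made up for the sketch), np.digitize returns the bucket index for each value:

import numpy as np

temps = np.array([3.4, 12.9, 21.7, 33.3, 49.9])   # made-up temperature readings
edges = np.array([15.0, 30.0])                    # boundaries between the three buckets
print(np.digitize(temps, edges))                  # [0 0 1 2 2] -> bucket ids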

C


calibration layer

A post-prediction adjustment, typically to account for prediction bias. The adjusted predictions and probabilities should match the distribution of an observed set of labels.


candidate sampling

A training-time optimization in which a probability is calculated for all the positive labels, using, for example, softmax, but only for a random sample of negative labels. For example, if we have an example labeled beagle and dog, candidate sampling computes the predicted probabilities and corresponding loss terms for the beagle and dog class outputs in addition to a random subset of the remaining classes (cat, lollipop, fence). The idea is that the negative classes can learn from less frequent negative reinforcement as long as positive classes always get proper positive reinforcement, and this is indeed observed empirically. The motivation for candidate sampling is a computational efficiency win from not computing predictions for all negatives.


checkpoint

Data that captures the state of the variables of a model at a particular time. Checkpoints enable exporting model weights, as well as performing training across multiple sessions. Checkpoints also enable training to continue past errors (for example, job preemption). Note that the graph itself is not included in a checkpoint.


class

One of a set of enumerated target values for a label. For example, in a binary classification model that detects spam, the two classes are spam and not spam. In a multi-class classification model that identifies dog breeds, the classes would be poodle, beagle, pug, and so on.


class-imbalanced data set

A binary classification problem in which the labels for the two classes have significantly different frequencies. For example, a disease data set in which 0.0001 of examples have positive labels and 0.9999 have negative labels is a class-imbalanced problem, but a football game predictor in which 0.51 of examples label one team winning and 0.49 label the other team winning is not a class-imbalanced problem.


classification model

A type of machine learning model for distinguishing among two or more discrete classes. For example, a natural language processing classification model could determine whether an input sentence was in French, Spanish, or Italian. Compare with regression model.


classification threshold

A scalar-value criterion that is applied to a model's predicted score in order to separate the positive class from the negative class. Used when mapping logistic regression results to binary classification. For example, consider a logistic regression model that determines the probability of a given email message being spam. If the classification threshold is 0.9, then logistic regression values above 0.9 are classified as spam and those below 0.9 are classified as not spam.


confusion matrix

An NxN table that summarizes how successful a classification model's predictions were; that is, the correlation between the label and the model's classification. One axis of a confusion matrix is the label that the model predicted, and the other axis is the actual label. N represents the number of classes. In a binary classification problem, N=2. For example, here is a sample confusion matrix for a binary classification problem:

                      Tumor (predicted)   Non-Tumor (predicted)
Tumor (actual)                18                      1
Non-Tumor (actual)             6                    452
The preceding confusion matrix shows that of the 19 samples that actually had tumors, the model correctly classified 18 as having tumors (18 true positives), and incorrectly classified 1 as not having a tumor (1 false negative). Similarly, of 458 samples that actually did not have tumors, 452 were correctly classified (452 true negatives) and 6 were incorrectly classified (6 false positives).

The confusion matrix of a multi-class confusion matrix can help you determine mistake patterns. For example, a confusion matrix could reveal that a model trained to recognize handwritten digits tends to mistakenly predict 9 instead of 4, or 1 instead of 7. The confusion matrix contains sufficient information to calculate a variety of performance metrics, including precision and recall.


continuous feature

A floating-point feature with an infinite range of possible values. Contrast with discrete feature.


convergence

Informally, often refers to a state reached during training in which training loss and validation loss change very little or not at all with each iteration after a certain number of iterations. In other words, a model reaches convergence when additional training on the current data will not improve the model. In deep learning, loss values sometimes stay constant or nearly so for many iterations before finally descending, temporarily producing a false sense of convergence.

See also early stopping.

See also Convex Optimization by Boyd and Vandenberghe.


convex function

A function typically shaped approximately like the letter U or a bowl. However, in degenerate cases, a convex function is shaped like a line. For example, the following are all convex functions:

L2 loss
Log Loss
L1 regularization
L2 regularization
Convex functions are popular loss functions. That's because when a minimum value exists (as is often the case), many variations of gradient descent are guaranteed to find a point close to the minimum point of the function. Similarly, many variations of stochastic gradient descent have a high probability (though, not a guarantee) of finding a point close to the minimum.

The sum of two convex functions (for example, L2 loss + L1 regularization) is a convex function.

Deep models are usually not convex functions. Remarkably, algorithms designed for convex optimization tend to work reasonably well on deep networks anyway, even though they rarely find a minimum.


cost

Synonym for loss.


cross-entropy

A generalization of Log Loss to multi-class classification problems. Cross-entropy quantifies the difference between two probability distributions. See also perplexity.

D


data set

A collection of examples.


decision boundary

The separator between classes learned by a model in binary class or multi-class classification problems. For example, in the following image representing a binary classification problem, the decision boundary is the frontier between the orange class and the blue class:

[Image: a well-defined boundary between one class and another.]


deep model

A type of neural network containing multiple hidden layers. Deep models rely on trainable nonlinearities.

Contrast with wide model.


dense feature

A feature in which most values are non-zero, typically a Tensor of floating-point values. Contrast with sparse feature.


derived feature

Synonym for synthetic feature.


discrete feature

A feature with a finite set of possible values. For example, a feature whose values may only be animal, vegetable, or mineral is a discrete (or categorical) feature. Contrast with continuous feature.


dropout regularization

A form of regularization useful in training neural networks. Dropout regularization works by removing a random selection of a fixed number of the units in a network layer for a single gradient step. The more units dropped out, the stronger the regularization. This is analogous to training the network to emulate an exponentially large ensemble of smaller networks. For full details, see Dropout: A Simple Way to Prevent Neural Networks from Overfitting.


dynamic model

A model that is trained online in a continuously updating fashion. That is, data is continuously entering the model.

E


early stopping

A method for regularization that involves ending model training before training loss finishes decreasing. In early stopping, you end model training when the loss on a validation data set starts to increase, that is, when generalization performance worsens.


embeddings

A categorical feature represented as a continuous-valued feature. Typically, an embedding is a translation of a high-dimensional vector into a low-dimensional space. For example, you can represent the words in an English sentence in either of the following two ways:

As a million-element (high-dimensional) sparse vector in which all elements are integers. Each cell in the vector represents a separate English word; the value in a cell represents the number of times that word appears in a sentence. Since a single English sentence is unlikely to contain more than 50 words, nearly every cell in the vector will contain a 0. The few cells that aren't 0 will contain a low integer (usually 1) representing the number of times that word appeared in the sentence.
As a several-hundred-element (low-dimensional) dense vector in which each element holds a floating-point value between 0 and 1.
In TensorFlow, embeddings are trained by backpropagating loss just like any other parameter in a neural network.


empirical risk minimization (ERM)

Choosing the model function that minimizes loss on the training set. Contrast with structural risk minimization.


ensemble

A merger of the predictions of multiple models. You can create an ensemble via one or more of the following:

different initializations
different hyperparameters
different overall structure
Deep and wide models are a kind of ensemble.


Estimator

An instance of the tf.Estimator class, which encapsulates logic that builds a TensorFlow graph and runs a TensorFlow session. You may create your own Estimators (as described here) or instantiate pre-made Estimators created by others.


example

One row of a data set. An example contains one or more features and possibly a label. See also labeled example and unlabeled example.

F


false negative (FN)

An example in which the model mistakenly predicted the negative class. For example, the model inferred that a particular email message was not spam (the negative class), but that email message actually was spam.


false positive (FP)

An example in which the model mistakenly predicted the positive class. For example, the model inferred that a particular email message was spam (the positive class), but that email message was actually not spam.


false positive rate (FP rate)

The x-axis in an ROC curve. The FP rate is defined as follows:

\text{False Positive Rate} = \frac{\text{False Positives}}{\text{False Positives} + \text{True Negatives}}

feature

An input variable used in making predictions.


feature columns (FeatureColumns)

A set of related features, such as the set of all possible countries in which users might live. An example may have one or more features present in a feature column.

Feature columns in TensorFlow also encapsulate metadata such as:

the feature's data type
whether a feature is fixed length or should be converted to an embedding
A feature column can contain a single feature.

"Feature column" is Google-specific terminology. A feature column is referred to as a "namespace" in the VW system (at Yahoo/Microsoft), or a field.


feature cross

A synthetic feature formed by crossing (multiplying or taking a Cartesian product of) individual features. Feature crosses help represent nonlinear relationships.


feature engineering

The process of determining which features might be useful in training a model, and then converting raw data from log files and other sources into said features. In TensorFlow, feature engineering often means converting raw log file entries to tf.Example protocol buffers. See also tf.Transform.

Feature engineering is sometimes called feature extraction.


feature set

The group of features your machine learning model trains on. For example, postal code, property size, and property condition might comprise a simple feature set for a model that predicts housing prices.


feature spec

Describes the information required to extract features data from the tf.Example protocol buffer. Because the tf.Example protocol buffer is just a container for data, you must specify the following:

the data to extract (that is, the keys for the features)
the data type (for example, float or int)
The length (fixed or variable)
The Estimator API provides facilities for producing a feature spec from a list of FeatureColumns.


full softmax

See softmax. Contrast with candidate sampling.

G


generalization

Refers to your model's ability to make correct predictions on new, previously unseen data as opposed to the data used to train the model.


generalized linear model

A generalization of least squares regression models, which are based on Gaussian noise, to other types of models based on other types of noise, such as Poisson noise or categorical noise. Examples of generalized linear models include:

logistic regression
multi-class regression
least squares regression
The parameters of a generalized linear model can be found through convex optimization.

Generalized linear models exhibit the following properties:

The average prediction of the optimal least squares regression model is equal to the average label on the training data.
The average probability predicted by the optimal logistic regression model is equal to the average label on the training data.
The power of a generalized linear model is limited by its features. Unlike a deep model, a generalized linear model cannot "learn new features."


gradient

The vector of partial derivatives with respect to all of the independent variables. In machine learning, the gradient is the vector of partial derivatives of the model function. The gradient points in the direction of steepest ascent.


gradient clipping

Capping gradient values before applying them. Gradient clipping helps ensure numerical stability and prevents exploding gradients.


gradient descent

A technique to minimize loss by computing the gradients of loss with respect to the model's parameters, conditioned on training data. Informally, gradient descent iteratively adjusts parameters, gradually finding the best combination of weights and bias to minimize loss.
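A toy sketch of the idea (not from the glossary), fitting a single weight w in y = w * x by repeatedly stepping against the gradient of the mean squared error:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x                                # data generated from y = 3x, so w should approach 3

w, learning_rate = 0.0, 0.01
for _ in range(200):
    grad = np.mean(2 * (w * x - y) * x)    # derivative of mean((w*x - y)^2) with respect to w
    w -= learning_rate * grad              # step in the direction of steepest descent
print(w)                                   # close to 3.0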


graph

In TensorFlow, a computation specification. Nodes in the graph represent operations. Edges are directed and represent passing the result of an operation (a Tensor) as an operand to another operation. Use TensorBoard to visualize a graph.

H


heuristic

A practical and nonoptimal solution to a problem, which is sufficient for making progress or for learning from.


hidden layer

A synthetic layer in a neural network between the input layer (that is, the features) and the output layer (the prediction). A neural network contains one or more hidden layers.


hinge loss

A family of loss functions for classification designed to find the decision boundary as distant as possible from each training example, thus maximizing the margin between examples and the boundary. KSVMs use hinge loss (or a related function, such as squared hinge loss). For binary classification, the hinge loss function is defined as follows:

\text{loss} = \max(0, 1 - (y' \cdot y))
where y' is the raw output of the classifier model:

y' = b + w_1 x_1 + w_2 x_2 + \ldots + w_n x_n
and y is the true label, either -1 or +1.

Consequently, a plot of hinge loss vs. (y * y') looks as follows:

[Image: a plot of hinge loss vs. raw classifier score shows a distinct hinge at the coordinate (1,0).]


holdout data

Examples intentionally not used ("held out") during training. The validation data set and test data set are examples of holdout data. Holdout data helps evaluate your model's ability to generalize to data other than the data it was trained on. The loss on the holdout set provides a better estimate of the loss on an unseen data set than does the loss on the training set.


hyperparameter

The "knobs" that you tweak during successive runs of training a model. For example, learning rate is a hyperparameter.

Contrast with parameter.

I


independently and identically distributed (i.i.d)

Data drawn from a distribution that doesn't change, and where each value drawn doesn't depend on values that have been drawn previously. An i.i.d. is the ideal gas of machine learning—a useful mathematical construct but almost never exactly found in the real world. For example, the distribution of visitors to a web page may be i.i.d. over a brief window of time; that is, the distribution doesn't change during that brief window and one person's visit is generally independent of another's visit. However, if you expand that window of time, seasonal differences in the web page's visitors may appear.


inference

In machine learning, often refers to the process of making predictions by applying the trained model to unlabeled examples. In statistics, inference refers to the process of fitting the parameters of a distribution conditioned on some observed data. (See the Wikipedia article on statistical inference.)


input layer

The first layer (the one that receives the input data) in a neural network.


instance

Synonym for example.


inter-rater agreement

A measurement of how often human raters agree when doing a task. If raters disagree, the task instructions may need to be improved. Also sometimes called inter-annotator agreement or inter-rater reliability. See also Cohen's kappa, which is one of the most popular inter-rater agreement measurements.

K


Kernel Support Vector Machines (KSVMs)

A classification algorithm that seeks to maximize the margin between positive and negative classes by mapping input data vectors to a higher dimensional space. For example, consider a classification problem in which the input data set consists of a hundred features. In order to maximize the margin between positive and negative classes, KSVMs could internally map those features into a million-dimension space. KSVMs uses a loss function called hinge loss.

L


L1 loss

Loss function based on the absolute value of the difference between the values that a model is predicting and the actual values of the labels. L1 loss is less sensitive to outliers than L2 loss.


L1 regularization

A type of regularization that penalizes weights in proportion to the sum of the absolute values of the weights. In models relying on sparse features, L1 regularization helps drive the weights of irrelevant or barely relevant features to exactly 0, which removes those features from the model. Contrast with L2 regularization.


L2 loss

See squared loss.


L2 regularization

A type of regularization that penalizes weights in proportion to the sum of the squares of the weights. L2 regularization helps drive outlier weights (those with high positive or low negative values) closer to 0 but not quite to 0. (Contrast with L1 regularization.) L2 regularization always improves generalization in linear models.


label

In supervised learning, the "answer" or "result" portion of an example. Each example in a labeled data set consists of one or more features and a label. For instance, in a housing data set, the features might include the number of bedrooms, the number of bathrooms, and the age of the house, while the label might be the house's price. In a spam detection dataset, the features might include the subject line, the sender, and the email message itself, while the label would probably be either "spam" or "not spam."


labeled example

An example that contains features and a label. In supervised training, models learn from labeled examples.


lambda

Synonym for regularization rate.

(This is an overloaded term. Here we're focusing on the term's definition within regularization.)


layer

A set of neurons in a neural network that process a set of input features, or the output of those neurons.

Also, an abstraction in TensorFlow. Layers are Python functions that take Tensors and configuration options as input and produce other tensors as output. Once the necessary Tensors have been composed, the user can convert the result into an Estimator via a model function.


learning rate

A scalar used to train a model via gradient descent. During each iteration, the gradient descent algorithm multiplies the learning rate by the gradient. The resulting product is called the gradient step.

Learning rate is a key hyperparameter.


least squares regression

A linear regression model trained by minimizing L2 Loss.


linear regression

A type of regression model that outputs a continuous value from a linear combination of input features.


logistic regression

A model that generates a probability for each possible discrete label value in classification problems by applying a sigmoid function to a linear prediction. Although logistic regression is often used in binary classification problems, it can also be used in multi-class classification problems (where it is called multi-class logistic regression or multinomial regression).


Log Loss

The loss function used in binary logistic regression.


loss

A measure of how far a model's predictions are from its label. Or, to phrase it more pessimistically, a measure of how bad the model is. To determine this value, a model must define a loss function. For example, linear regression models typically use mean squared error for a loss function, while logistic regression models use Log Loss.

M


machine learning

A program or system that builds (trains) a predictive model from input data. The system uses the learned model to make useful predictions from new (never-before-seen) data drawn from the same distribution as the one used to train the model. Machine learning also refers to the field of study concerned with these programs or systems.


Mean Squared Error (MSE)

The average squared loss per example. MSE is calculated by dividing the squared loss by the number of examples. The values that TensorFlow Playground displays for "Training loss" and "Test loss" are MSE.


metric

A number that you care about. May or may not be directly optimized in a machine-learning system. A metric that your system tries to optimize is called an objective.


mini-batch

A small, randomly selected subset of the entire batch of examples run together in a single iteration of training or inference. The batch size of a mini-batch is usually between 10 and 1,000. It is much more efficient to calculate the loss on a mini-batch than on the full training data.


mini-batch stochastic gradient descent (SGD)

A gradient descent algorithm that uses mini-batches. In other words, mini-batch SGD estimates the gradient based on a small subset of the training data. Vanilla SGD uses a mini-batch of size 1.


ML

Abbreviation for machine learning.


model

The representation of what an ML system has learned from the training data. This is an overloaded term, which can have either of the following two related meanings:

The TensorFlow graph that expresses the structure of how a prediction will be computed.
The particular weights and biases of that TensorFlow graph, which are determined by training.

model training

The process of determining the best model.


Momentum

A sophisticated gradient descent algorithm in which a learning step depends not only on the derivative in the current step, but also on the derivatives in the step(s) that immediately preceded it. Momentum involves computing an exponentially weighted moving average of the gradients over time, analogous to momentum in physics. Momentum sometimes prevents learning from getting stuck in local minima.


multi-class

Classification problems that distinguish among more than two classes. For example, there are approximately 128 species of maple trees, so a model that categorized maple tree species would be multi-class. Conversely, a model that divided emails into only two categories (spam and not spam) would be a binary classification model.

N


NaN trap

When one number in your model becomes a NaN during training, which causes many or all other numbers in your model to eventually become a NaN.

NaN is an abbreviation for "Not a Number."


negative class

In binary classification, one class is termed positive and the other is termed negative. The positive class is the thing we're looking for and the negative class is the other possibility. For example, the negative class in a medical test might be "not tumor." The negative class in an email classifier might be "not spam." See also positive class.


neural network

A model that, taking inspiration from the brain, is composed of layers (at least one of which is hidden) consisting of simple connected units or neurons followed by nonlinearities.


neuron

A node in a neural network, typically taking in multiple input values and generating one output value. The neuron calculates the output value by applying an activation function (nonlinear transformation) to a weighted sum of input values.


normalization

The process of converting an actual range of values into a standard range of values, typically -1 to +1 or 0 to 1. For example, suppose the natural range of a certain feature is 800 to 6,000. Through subtraction and division, you can normalize those values into the range -1 to +1.
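A small sketch of that 800-to-6,000 example, using a standard min-max rescale onto [-1, +1] (the sample values are made up):

import numpy as np

values = np.array([800.0, 2500.0, 6000.0])
lo, hi = 800.0, 6000.0
normalized = 2 * (values - lo) / (hi - lo) - 1   # maps 800 -> -1 and 6000 -> +1
print(normalized)                                # [-1.  -0.34615385  1.]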

See also scaling.


numpy

An open-source math library that provides efficient array operations in Python. pandas is built on numpy.

O


objective

A metric that your algorithm is trying to optimize.


offline inference

Generating a group of predictions, storing those predictions, and then retrieving those predictions on demand. Contrast with online inference.


one-hot encoding

A sparse vector in which:

One element is set to 1.
All other elements are set to 0.
One-hot encoding is commonly used to represent strings or identifiers that have a finite set of possible values. For example, suppose a given botany data set chronicles 15,000 different species, each denoted with a unique string identifier. As part of feature engineering, you'll probably encode those string identifiers as one-hot vectors in which the vector has a size of 15,000.
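A minimal numpy sketch of the idea, using a vocabulary of 5 instead of 15,000 purely for illustration:

import numpy as np

vocab_size = 5
species_id = 2                     # index of one species in the vocabulary
one_hot = np.zeros(vocab_size)
one_hot[species_id] = 1.0
print(one_hot)                     # [0. 0. 1. 0. 0.]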


one-vs.-all

Given a classification problem with N possible solutions, a one-vs.-all solution consists of N separate binary classifiers—one binary classifier for each possible outcome. For example, given a model that classifies examples as animal, vegetable, or mineral, a one-vs.-all solution would provide the following three separate binary classifiers:

animal vs. not animal
vegetable vs. not vegetable
mineral vs. not mineral

online inference

Generating predictions on demand. Contrast with offline inference.


Operation (op)

A node in the TensorFlow graph. In TensorFlow, any procedure that creates, manipulates, or destroys a Tensor is an operation. For example, a matrix multiply is an operation that takes two Tensors as input and generates one Tensor as output.


optimizer

A specific implementation of the gradient descent algorithm. TensorFlow's base class for optimizers is tf.train.Optimizer. Different optimizers (subclasses of tf.train.Optimizer) account for concepts such as:

momentum (Momentum)
update frequency (AdaGrad = ADAptive GRADient descent; Adam = ADAptive with Momentum; RMSProp)
sparsity/regularization (Ftrl)
more complex math (Proximal, and others)
You might even imagine an NN-driven optimizer.


outliers

Values distant from most other values. In machine learning, any of the following are outliers:

Weights with high absolute values.
Predicted values relatively far away from the actual values.
Input data whose values are more than roughly 3 standard deviations from the mean.
Outliers often cause problems in model training.


output layer

The "final" layer of a neural network. The layer containing the answer(s).


overfitting

Creating a model that matches the training data so closely that the model fails to make correct predictions on new data.

P


pandas

A column-oriented data analysis API. Many ML frameworks, including TensorFlow, support pandas data structures as input. See pandas documentation.


parameter

A variable of a model that the ML system trains on its own. For example, weights are parameters whose values the ML system gradually learns through successive training iterations. Contrast with hyperparameter.


Parameter Server (PS)

A job that keeps track of a model's parameters in a distributed setting.


parameter update

The operation of adjusting a model's parameters during training, typically within a single iteration of gradient descent.


partial derivative

A derivative in which all but one of the variables is considered a constant. For example, the partial derivative of f(x, y) with respect to x is the derivative of f considered as a function of x alone (that is, keeping y constant). The partial derivative of f with respect to x focuses only on how x is changing and ignores all other variables in the equation.


partitioning strategy

The algorithm by which variables are divided across parameter servers.


performance

Overloaded term with the following meanings:

The traditional meaning within software engineering. Namely: How fast (or efficiently) does this piece of software run?
The meaning within ML. Here, performance answers the following question: How correct is this model? That is, how good are the model's predictions?

perplexity

One measure of how well a model is accomplishing its task. For example, suppose your task is to read the first few letters of a word a user is typing on a smartphone keyboard, and to offer a list of possible completion words. Perplexity, P, for this task is approximately the number of guesses you need to offer in order for your list to contain the actual word the user is trying to type.

Perplexity is related to cross-entropy as follows:

P = 2^{-\text{cross entropy}}

pipeline

The infrastructure surrounding a machine learning algorithm. A pipeline includes gathering the data, putting the data into training data files, training one or more models, and exporting the models to production.


positive class

In binary classification, the two possible classes are labeled as positive and negative. The positive outcome is the thing we're testing for. (Admittedly, we're simultaneously testing for both outcomes, but play along.) For example, the positive class in a medical test might be "tumor." The positive class in an email classifier might be "spam."

Contrast with negative class.


precision

A metric for classification models. Precision identifies the frequency with which a model was correct when predicting the positive class. That is:

\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}

prediction

A model's output when provided with an input example.


prediction bias

A value indicating how far apart the average of predictions is from the average of labels in the data set.


pre-made Estimator

An Estimator that someone has already built. TensorFlow provides several pre-made Estimators, including DNNClassifier, DNNRegressor, and LinearClassifier. You may build your own pre-made Estimators by following these instructions.


pre-trained model

Models or model components (such as embeddings) that have already been trained. Sometimes, you'll feed pre-trained embeddings into a neural network. Other times, your model will train the embeddings itself rather than rely on the pre-trained embeddings.


prior belief

What you believe about the data before you begin training on it. For example, L2 regularization relies on a prior belief that weights should be small and normally distributed around zero.

Q


queue

A TensorFlow Operation that implements a queue data structure. Typically used in I/O.

R


rank

Overloaded term in ML that can mean either of the following:

The number of dimensions in a Tensor. For instance, a scalar has rank 0, a vector has rank 1, and a matrix has rank 2.
The ordinal position of a class in an ML problem that categorizes classes from highest to lowest. For example, a behavior ranking system could rank a dog's rewards from highest (a steak) to lowest (wilted kale).

rater

A human who provides labels in examples. Sometimes called an "annotator."


recall

A metric for classification models that answers the following question: Out of all the possible positive labels, how many did the model correctly identify? That is:

\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}

Rectified Linear Unit (ReLU)

An activation function with the following rules:

If input is negative or zero, output is 0.
If input is positive, output is equal to input.
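A minimal NumPy sketch of these two rules (the example inputs are made up):

import numpy as np

def relu(x):
    # 0 for negative or zero input, the input itself for positive input
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, 0.0, 3.0])))   # [0. 0. 3.]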

regression model

A type of model that outputs continuous (typically, floating-point) values. Compare with classification models, which output discrete values, such as "day lily" or "tiger lily."


regularization

The penalty on a model's complexity. Regularization helps prevent overfitting. Different kinds of regularization include:

L1 regularization
L2 regularization
dropout regularization
early stopping (this is not a formal regularization method, but can effectively limit overfitting)

regularization rate

A scalar value, represented as lambda, specifying the relative importance of the regularization function. The following simplified loss equation shows the regularization rate's influence:

minimize(loss function + λ(regularization function))
Raising the regularization rate reduces overfitting but may make the model less accurate.
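For instance, a minimal Python sketch of this objective, assuming L2 regularization as the regularization function and made-up weights and data loss:

import numpy as np

weights = np.array([0.5, -1.2, 3.0])   # hypothetical model weights
data_loss = 0.37                       # hypothetical loss on the training data
lam = 0.01                             # regularization rate (lambda)

l2_penalty = np.sum(weights ** 2)      # L2 regularization function
total_loss = data_loss + lam * l2_penalty
print(total_loss)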


representation

The process of mapping data to useful features.


ROC (receiver operating characteristic) Curve

A curve of true positive rate vs. false positive rate at different classification thresholds. See also AUC.


root directory

The directory you specify for hosting subdirectories of the TensorFlow checkpoint and events files of multiple models.


Root Mean Squared Error (RMSE)

The square root of the Mean Squared Error.
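A minimal NumPy sketch with made-up labels and predictions:

import numpy as np

def rmse(labels, predictions):
    labels = np.asarray(labels, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    return np.sqrt(np.mean((labels - predictions) ** 2))

print(rmse([3.0, 5.0, 2.0], [2.5, 5.5, 2.0]))   # about 0.41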

S


Saver

A TensorFlow object responsible for saving model checkpoints.


scaling

A commonly used practice in feature engineering to tame a feature's range of values to match the range of other features in the data set. For example, suppose that you want all floating-point features in the data set to have a range of 0 to 1. Given a particular feature's range of 0 to 500, you could scale that feature by dividing each value by 500.

See also normalization.
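Continuing the 0-to-500 example above, a minimal NumPy sketch (the feature values are made up):

import numpy as np

feature = np.array([0.0, 125.0, 250.0, 500.0])   # raw values in the range 0 to 500
scaled = feature / 500.0                         # now in the range 0 to 1
print(scaled)                                    # [0.   0.25 0.5  1.  ]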


scikit-learn

A popular open-source ML platform. See www.scikit-learn.org.


sequence model

A model whose inputs have a sequential dependence. For example, predicting the next video watched from a sequence of previously watched videos.


session

Maintains state (for example, variables) within a TensorFlow program.


sigmoid function

A function that maps logistic or multinomial regression output (log odds) to probabilities, returning a value between 0 and 1. The sigmoid function has the following formula:

y = 1 / (1 + e^(-σ))
where σ in logistic regression problems is simply:

σ = b + w1x1 + w2x2 + … + wnxn
In other words, the sigmoid function converts σ into a probability between 0 and 1.

In some neural networks, the sigmoid function acts as the activation function.
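A minimal NumPy sketch of the formula above (the example inputs are made up):

import numpy as np

def sigmoid(sigma):
    # sigma = b + w1*x1 + ... + wn*xn in logistic regression
    return 1.0 / (1.0 + np.exp(-sigma))

print(sigmoid(0.0))    # 0.5
print(sigmoid(4.0))    # about 0.98
print(sigmoid(-4.0))   # about 0.02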


softmax

A function that provides probabilities for each possible class in a multi-class classification model. The probabilities add up to exactly 1.0. For example, softmax might determine that the probability of a particular image being a dog is 0.9, a cat 0.08, and a horse 0.02. (Also called full softmax.)

Contrast with candidate sampling.
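A minimal NumPy sketch (the logit values are made up):

import numpy as np

def softmax(logits):
    exps = np.exp(logits - np.max(logits))   # subtract the max for numerical stability
    return exps / np.sum(exps)

probs = softmax(np.array([3.0, 0.5, -1.0]))
print(probs)        # roughly [0.91 0.07 0.02]
print(probs.sum())  # 1.0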


sparse feature

Feature vector whose values are predominately zero or empty. For example, a vector containing a single 1 value and a million 0 values is sparse. As another example, words in a search query could also be a sparse feature—there are many possible words in a given language, but only a few of them occur in a given query.

Contrast with dense feature.


squared loss

The loss function used in linear regression. (Also known as L2 Loss.) This function calculates the square of the difference between a model's predicted value for a labeled example and the actual value of the label. Due to squaring, this loss function amplifies the influence of bad predictions. That is, squared loss reacts more strongly to outliers than L1 loss.
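A minimal Python sketch with made-up numbers:

def squared_loss(label, prediction):
    return (label - prediction) ** 2

print(squared_loss(3.0, 2.5))   # 0.25
print(squared_loss(3.0, 9.0))   # 36.0 -- squaring amplifies an outlier-sized error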


static model

A model that is trained offline.


stationarity

A property of data in a data set, in which the data distribution stays constant across one or more dimensions. Most commonly, that dimension is time, meaning that data exhibiting stationarity doesn't change over time. For example, data that exhibits stationarity doesn't change from September to December.


step

A forward and backward evaluation of one batch.


step size

Synonym for learning rate.


stochastic gradient descent (SGD)

A gradient descent algorithm in which the batch size is one. In other words, SGD relies on a single example chosen uniformly at random from a data set to calculate an estimate of the gradient at each step.


structural risk minimization (SRM)

An algorithm that balances two goals:

The desire to build the most predictive model (for example, lowest loss).
The desire to keep the model as simple as possible (for example, strong regularization).
For example, a model function that minimizes loss+regularization on the training set is a structural risk minimization algorithm.

For more information, see http://www.svms.org/srm/.

Contrast with empirical risk minimization.


summary

In TensorFlow, a value or set of values calculated at a particular step, usually used for tracking model metrics during training.


supervised machine learning

Training a model from input data and its corresponding labels. Supervised machine learning is analogous to a student learning a subject by studying a set of questions and their corresponding answers. After mastering the mapping between questions and answers, the student can then provide answers to new (never-before-seen) questions on the same topic. Compare with unsupervised machine learning.


synthetic feature

A feature that is not present among the input features, but is derived from one or more of them. Kinds of synthetic features include the following:

Multiplying one feature by itself or by other feature(s). (These are termed feature crosses.)
Dividing one feature by a second feature.
Bucketing a continuous feature into range bins.
Features created by normalizing or scaling alone are not considered synthetic features.
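A minimal NumPy sketch of the kinds listed above, using made-up raw features:

import numpy as np

rooms = np.array([2.0, 3.0, 5.0])
persons = np.array([1.0, 4.0, 2.0])
latitude = np.array([37.1, 37.8, 38.4])

rooms_x_persons = rooms * persons                       # multiplying two features (a feature cross)
rooms_per_person = rooms / persons                      # dividing one feature by another
lat_bucket = np.digitize(latitude, bins=[37.5, 38.0])   # bucketing a continuous feature into range bins
print(rooms_x_persons, rooms_per_person, lat_bucket)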

T


target

Synonym for label.


Tensor

The primary data structure in TensorFlow programs. Tensors are N-dimensional (where N could be very large) data structures, most commonly scalars, vectors, or matrices. The elements of a Tensor can hold integer, floating-point, or string values.


Tensor Processing Unit (TPU)

An ASIC (application-specific integrated circuit) that optimizes the performance of TensorFlow programs.


Tensor rank

See rank.


Tensor shape

The number of elements a Tensor contains in various dimensions. For example, a [5, 10] Tensor has a shape of 5 in one dimension and 10 in another.


Tensor size

The total number of scalars a Tensor contains. For example, a [5, 10] Tensor has a size of 50.
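For illustration, the same ideas sketched with a NumPy array (a rough analogy; the glossary entries themselves describe TensorFlow Tensors):

import numpy as np

t = np.zeros((5, 10))
print(t.shape)   # (5, 10) -- the shape
print(t.size)    # 50      -- the size: total number of scalar elements
print(t.ndim)    # 2       -- the rank: number of dimensions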


TensorBoard

The dashboard that displays the summaries saved during the execution of one or more TensorFlow programs.


TensorFlow

A large-scale, distributed, machine learning platform. The term also refers to the base API layer in the TensorFlow stack, which supports general computation on dataflow graphs.

Although TensorFlow is primarily used for machine learning, you may also use TensorFlow for non-ML tasks that require numerical computation using dataflow graphs.


TensorFlow Playground

A program that visualizes how different hyperparameters influence model (primarily neural network) training. Go to http://playground.tensorflow.org to experiment with TensorFlow Playground.


TensorFlow Serving

A platform to deploy trained models in production.


test set

The subset of the data set that you use to test your model after the model has gone through initial vetting by the validation set.

Contrast with training set and validation set.


tf.Example

A standard protocol buffer for describing input data for machine learning model training or inference.


training

The process of determining the ideal parameters comprising a model.


training set

The subset of the data set used to train a model.

Contrast with validation set and test set.


true negative (TN)

An example in which the model correctly predicted the negative class. For example, the model inferred that a particular email message was not spam, and that email message really was not spam.


true positive (TP)

An example in which the model correctly predicted the positive class. For example, the model inferred that a particular email message was spam, and that email message really was spam.


true positive rate (TP rate)

Synonym for recall. That is:

True Positive Rate = True Positives / (True Positives + False Negatives)
True positive rate is the y-axis in an ROC curve.

U


unlabeled example

An example that contains features but no label. Unlabeled examples are the input to inference. In semi-supervised and unsupervised learning, unlabeled examples are used during training.


unsupervised machine learning

Training a model to find patterns in a data set, typically an unlabeled data set.

The most common use of unsupervised machine learning is to cluster data into groups of similar examples. For example, an unsupervised machine learning algorithm can cluster songs together based on various properties of the music. The resulting clusters can become an input to other machine learning algorithms (for example, to a music recommendation service). Clustering can be helpful in domains where true labels are hard to obtain. For example, in domains such as anti-abuse and fraud, clusters can help humans better understand the data.

Another example of unsupervised machine learning is principal component analysis (PCA). For example, applying PCA on a data set containing the contents of millions of shopping carts might reveal that shopping carts containing lemons frequently also contain antacids.

Compare with supervised machine learning.

V


validation set

A subset of the data set—disjunct from the training set—that you use to adjust hyperparameters.

Contrast with training set and test set.

W


weight

A coefficient for a feature in a linear model, or an edge in a deep network. The goal of training a linear model is to determine the ideal weight for each feature. If a weight is 0, then its corresponding feature does not contribute to the model.


wide model

A linear model that typically has many sparse input features. We refer to it as "wide" since such a model is a special type of neural network with a large number of inputs that connect directly to the output node. Wide models are often easier to debug and inspect than deep models. Although wide models cannot express nonlinearities through hidden layers, they can use transformations such as feature crossing and bucketization to model nonlinearities in different ways.

Contrast with deep model.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 3.0 License, and code samples are licensed under the Apache 2.0 License. For details, see our Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated September 19, 2017.
Posted by uniqueone
,
https://m.facebook.com/story.php?story_fbid=377364479366220&id=303538826748786

30 essential data science, machine learning, and deep learning cheat sheets

Python for Data Science
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/PythonForDataScience.pdf

Pandas Basics
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/PandasPythonForDataScience+(1).pdf

Pandas
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Pandas_Cheat_Sheet_2.pdf

Numpy
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf

Scipy
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_SciPy_Cheat_Sheet_Linear_Algebra.pdf

Scikit-learn
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Scikit_Learn_Cheat_Sheet_Python.pdf

Matplotlib
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Matplotlib_Cheat_Sheet.pdf

Bokeh
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Bokeh_Cheat_Sheet.pdf

Base R
https://www.rstudio.com/resources/cheatsheets/

Advanced R
https://www.rstudio.com/resources/cheatsheets/

Caret
https://www.rstudio.com/resources/cheatsheets/

Data Import
https://www.rstudio.com/resources/cheatsheets/

Data Transformation with dplyr
https://www.rstudio.com/resources/cheatsheets/

R Markdown
https://www.rstudio.com/resources/cheatsheets/

R Studio IDE
https://github.com/rstudio/cheatsheets/raw/master/source/pdfs/rstudio-IDE-cheatsheet.pdf

Data Visualization
https://github.com/rstudio/cheatsheets/raw/master/source/pdfs/ggplot2-cheatsheet-2.1.pdf

Neural Network Architectures
http://www.asimovinstitute.org/neural-network-zoo/

Neural Network Cells
http://www.asimovinstitute.org/neural-network-zoo-prequel-cells-layers/

Neural Network Graphs
http://www.asimovinstitute.org/neural-network-zoo-prequel-cells-layers/

TensorFlow
https://www.altoros.com/tensorflow-cheat-sheet.html

Keras
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Keras_Cheat_Sheet_Python.pdf

Probability
https://static1.squarespace.com/static/54bf3241e4b0f0d81bf7ff36/t/55e9494fe4b011aed10e48e5/1441352015658/probability_cheatsheet.pdf

Statistics
http://web.mit.edu/~csvoss/Public/usabo/stats_handout.pdf

Linear Algebra
https://minireference.com/static/tutorials/linear_algebra_in_4_pages.pdf

Big O Complexity
http://bigocheatsheet.com/

Common Data Structure Operations
http://bigocheatsheet.com/

Common Sorting Algorithms
http://bigocheatsheet.com/

Data Structures
https://www.clear.rice.edu/comp160/data_cheat.html

SQL
http://www.sql-tutorial.net/sql-cheat-sheet.pdf
Posted by uniqueone
,
https://www.facebook.com/groups/TensorFlowKR/permalink/536521580022238/

https://kkweon.github.io/pr12-web-app-elm/

I found this in a post shared by JunHo Kim.

The PR12 paper-reading group video index built by our admin Kyung Mo Kweon, https://kkweon.github.io/pr12-web-app-elm/, is extremely convenient to browse. (How does he even build things like this? Truly impressive!)

I will keep it updated promptly as well: http://bit.ly/TFPR12



Ask me anything: Dynamic memory networks for natural language processing
PR037
Presenter: 곽근봉
Learning to Remember Rare Events
PR036
Presenter: 전태균
Understanding Black-box Predictions via Influence Functions
PR035
Presenter: 엄태웅
Inception and Xception
PR034
Presenter: 유재준
PVANet: Lightweight Deep Neural Networks for Real-time Object Detection
PR033
Presenter: 이진원
Deep Visual-Semantic Alignments for Generating Image Descriptions
PR032
Presenter: 강지양
Learning to learn by gradient descent by gradient descent
PR031
Presenter: 차준범
Photo-Realistic Single Image Super Resolution Using a Generative Adversarial Network
PR030
Presenter: 김승일
Apprenticeship Learning via Inverse Reinforcement Learning
PR029
Presenter: 서기호
Densely Connected Convolutional Networks (CVPR 2017, Best Paper Award) by Gao Huang et al.
PR028
Presenter: 김성훈
GloVe - Global vectors for word representation
PR027
Presenter: 곽근봉
Notes for CVPR Machine Learning Session
PR026
Presenter: 전태균
Learning with side information through modality hallucination (2016)
PR025
Presenter: 엄태웅
Pixel Recurrent Neural Network
PR024
Presenter: 유재준
YOLO9000: Better, Faster, Stronger
PR023
Presenter: 이진원
InfoGAN (OpenAI)
PR022
Presenter: 차준범
Batch Normalization
PR021
Presenter: 청영재
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
PR020
Presenter: 강지양
Continuous Control with Deep Reinforcement Learning
PR019
Presenter: 김승일
A Simple Neural Network Module for Relational Reasoning (DeepMind)
PR018
Presenter: 김성훈
Neural Architecture Search with Reinforcement Learning
PR017
Presenter: 서기호
You only look once: Unified, real-time object detection
PR016
Presenter: 전태균
Convolutional Neural Networks for Sentence Classification
PR015
Presenter: 곽근봉
On Human Motion Prediction using RNNs (2017)
PR014
Presenter: 엄태웅
Domain Adversarial Training of Neural Network
PR013
Presenter: 유재준
Faster R-CNN : Towards Real-Time Object Detection with Region Proposal Networks
PR012
Presenter: 이진원
Spatial Transformer Networks
PR011
Presenter: 강지양
Auto-Encoding Variational Bayes, ICLR 2014
PR010
Presenter: 차준범
Distilling the Knowledge in a Neural Network (Slide: English, Speaking: Korean)
PR009
Presenter: 청영재
Reverse Classification Accuracy
PR008
Presenter: 정동준
Deep Photo Style Transfer
PR007
Presenter: 김승일
Neural Turing Machine
PR006
Presenter: 서기호
Playing Atari with Deep Reinforcement Learning (NIPS 2013 Deep Learning Workshop)
PR005
Presenter: 김성훈
Image Super-Resolution Using Deep Convolutional Networks
PR004
Presenter: 전태균
Learning phrase representations using RNN encoder-decoder for statistical machine translation
PR003
Presenter: 곽근봉
Deformable Convolutional Networks (2017)
PR002
Presenter: 엄태웅
Generative adversarial nets by Jaejun Yoo (2017/4/13)
PR001
Presenter: 유재준
Kicking off: our resolve to read papers.
PR000
Presenter: all
Posted by uniqueone
,
https://m.facebook.com/story.php?story_fbid=375342342901767&id=303538826748786

ZhuSuan: a Python-based probabilistic programming library for Bayesian deep learning, combining the strengths of Bayesian methods and deep learning.
ZhuSuan is built on top of TensorFlow. Unlike existing deep learning libraries, which are mainly designed for deterministic neural networks and supervised tasks, ZhuSuan provides deep-learning-style algorithms for building probabilistic models and applying Bayesian inference.

With ZhuSuan, users not only get powerful fitting and multi-GPU training for complex learning tasks, but can also use generative models to model the complex world, exploit unlabeled data, and handle uncertainty by performing principled Bayesian inference.

Download
https://github.com/thu-ml/zhusuan

Online documentation
http://zhusuan.readthedocs.io/
Posted by uniqueone
,
https://www.indianweb2.com/2017/09/08/indias-iis-nit-develops-ai-identify-protesters-faces-partly-covered-scarves-hat/

If you’re planning on becoming a part of a protest or a rally but don’t want to reveal your identity at the same time, you might want to think about your participation again as the latter might no longer be possible. Researchers, from Cambridge University, India’s National Institute of Technology, and the Indian Institute of Science have successfully developed a deep-learning algorithm that is capable of identifying an individual even when part of their face is obscured or covered by sunglasses or bandanas, as is seen during many protests, rallies and agitations.

Posted by uniqueone
,
https://m.facebook.com/story.php?story_fbid=360526647716670&id=303538826748786

Deep Learning Lecture Collection (Spring 2017):

Posted by uniqueone
,
https://m.facebook.com/groups/1003321396368637?view=permalink&id=1625076837526420

I'd like to share a recent experience (I'm not sure how much it has to do with AI). I just finished a book in my field. Since it is written in English, I wanted the grammar proofread, and while shopping around for editing services, a company called Editage stood out. I got a quote: so much per character. Calculated that way, the amount is considerable. I reluctantly handed it over anyway.

In fact, since I write papers in English all the time, I put English sentences together without much trouble; I only asked in case there were small grammatical slips. But editing companies don't know such circumstances, so they simply price in proportion to length.

So I searched the internet and found a program called Grammarly. You feed it an English sentence and it spits the sentence back with the grammar cleaned up. Give it a whole paragraph and, after a brief pause, it produces the corrected paragraph. It points out basic article errors, comma placement issues, and so on quite well. According to the company's website, it seems to have a proprietary, machine-learned program built on English sentences gathered from the web.

A few things I was not satisfied with:

Since it is a science textbook, it is full of equations. It would be nice if I could upload the whole PDF and have it separate the equations from the English sentences, leave the equations alone (they can't be proofread anyway), and correct only the English, but no such feature exists. Grammarly doesn't seem to accept PDF files at all; apparently you can only upload Word files.

Every field has its own terminology, but Grammarly doesn't understand such domain-specific language and wants to replace it with more common words. I keep thinking it would be nice to have a grammar checker specialized for each field: Grammarly for physics, Grammarly for medicine, and so on. Would it be feasible to build field-specific grammar correction programs? Is someone already attempting this somewhere, or does it already exist?

I've heard that simultaneous interpreters and translators are among the jobs that will disappear when the AI era arrives. The way I see it, this kind of grammar-correction business will be replaced by machine learning even before that.
Posted by uniqueone
,
https://m.facebook.com/groups/255834461424286?view=permalink&id=513220265685703

Hello, after working straight through Professor Andrew Ng's Coursera course, I organized the lecture video slides.

Following last week's post, I am sharing the Week 3 slides.
I hope they are helpful to many of you.

Thank you. :)

Week 3: http://www.kwangsiklee.com/ko/2017/07/corsera-machine-learning%EC%9C%BC%EB%A1%9C-%EA%B8%B0%EA%B3%84%ED%95%99%EC%8A%B5-%EB%B0%B0%EC%9A%B0%EA%B8%B0-week3/
Week 2: http://www.kwangsiklee.com/ko/2017/07/corsera-machine-learning%EC%9C%BC%EB%A1%9C-%EA%B8%B0%EA%B3%84%ED%95%99%EC%8A%B5-%EB%B0%B0%EC%9A%B0%EA%B8%B0-week2/
Week 1: http://www.kwangsiklee.com/ko/2017/07/corsera-machine-learning-week1-%EC%A0%95%EB%A6%AC/
Posted by uniqueone
,
https://www.programmableweb.com/category/colors/api

Colors APIs

The following is a list of APIs from ProgrammableWeb's API directory that matched your search term. The ProgrammableWeb API directory lists APIs of different types. For example, Web/Internet APIs, browser APIs, and certain product APIs. From many of our API profiles, you can find your way to related SDKs, Tutorials, and sample source code for consuming those APIs. If your favorite API or SDK is missing or you have an idea for contributing content, be sure to check our guidelines for making such contributions to ProgrammableWeb.
Name Description Category Date
Icons8
Icons8 provides an extensive ISO compliant icon library. The API allows developers to search and retrieve icons that can be used for template customization, build graphic and text editors, and to... Images 06.15.2017
PrintCalc
The PrintCalc API returns the percentage of a .pdf, .eps or .ps file's CMYK and Spot Color coverage. The API supports PDF, EPS, and PS files. HTTP POST is the preferred request method. The... PDF 04.01.2017
TinEye MulticolorEngine
The TinEye MulticolorEngine API allows developers to make their image collections searchable by color. The API can extract color palettes from images, identify and search images by color, and support... Search 01.19.2017
W3C CSS Painting
The W3C CSS Painting API is a specification that describes an API for allowing developers to use an additional function to their CSS code. This affects the paint stage of CSS, which is... Images 09.27.2016
Image Color Extraction
Use this HTTP/JSON API to extract colors from any image. The Image Color Extraction API can return colors in multiple formats, such as: RGB, HEX, HSL, HSB or RGBA. Sign up with Mashape to receive... Colors 02.18.2015
Coinprism Colored Coins
Coinprism is a service that allows for the tokenization of cryptocurrency. Using Coinprism's Colored Coins, users are able to trade shares, bonds, and commodities without regulation by coloring... Bitcoin 08.25.2014
Croar.net RGB Picker
Croar.net provides a simple RGB color picker widget that users can add to any webpage by inserting a few lines of JavaScript code. A demo of this widget is available with the documentation. The Croar... Widgets 01.02.2014
APICloud.Me ColorTag
APICloud.Me is a cloud-based API provider that aims to deliver scalable APIs that are simple to consume, reliable, and well documented. ColorTag is an API capable of detecting colors within an image... Tools 11.17.2013
MyELearningSpace Web Accessibility
The service provides review and validation of a website's accessibility for all users, including those with impaired eyesight, hearing, and motor skills. It helps designers to make content... Colors 07.23.2012
Pictaculous
Pictaculous is a color palette generator service from MailChimps. Users can upload PNG, GIF, or JPG image files and Pictaculous will analyze their colors to return a fitting scheme. Pictaculous'... Tools 05.07.2012
AChecker
The service provides analysis and validation of accessibility of web resources for users with disabilities. It can perform an automated review of resources at a specified URL, with a validation... Colors 03.02.2012
Image Color Summarizer
The web service provides statistics describing color characteristics of an image identified by URL. Summary data indicate the single RGB color that best represents the image, along with average hue... Photos 11.29.2011
Colorfy It
Colorfy It is a web application that lets users copy and paste website URLs into a box, and it returns the colors, CSS, and color ID information from the website for color analysis. The Colorfy It... Colors 10.11.2011
Colr.org
Colr.org is an online service that allows users to search for images, colors, and color schemes. Users can edit colors and color schemes, tag them, and download them. Users can also search for... Other 08.07.2011
Empora Evergreen
Fashion search API that returns clothes and accessories data based on search parameters including price, brand, color, and title/description. Developers can earn revenue when people click through to... Search 10.01.2010
ColoRotate
Bring 3D color into your web site or blog using the ColoRotate API. Use it to display palettes of color on your site in 3D, or create complex mashups. With the ColoRotate and JavaScript, you can get... Tools 05.29.2010
COLOURlovers
From their site: With the release of the COLOURlovers API, you can now access almost 1 million named colors and more than 325,000 color palettes for your creative projects and applications. Creating... Tools 04.20.2008
Posted by uniqueone
,

MORE AGILE: A study method for working programmers who gave up on math to get started with machine learning
http://www.moreagile.net/2015/05/how-to-start-machine-learning-study.html?m=1


A study method for working programmers who gave up on math to get started with machine learning
Today I would like to introduce a translation of "数学を避けてきた社会人プログラマが機械学習の勉強を始める際の最短経路" (The shortest path for a working programmer who has avoided math to start studying machine learning), which has been hugely popular on Qiita, Japan's well-known knowledge-sharing service for developers. My thanks go to Ryuichi Danno (だんの りゅういち), who kindly gave permission for the translation.
Overall, it centers on an easy way to learn linear algebra and, above all, on a walkthrough of Professor Andrew Ng's Machine Learning course, widely considered the essential course for machine learning.
I think it is well worth a read for programmers who have wanted to study big data or machine learning but, frustrated by the math, keep hovering at the doorstep.


These days words like deep learning are heard everywhere, so you think you might give this machine learning thing a try, work up the courage to buy that thick, colorful book, and then can hardly bring yourself to open it; the formulas alone give you a headache. If you find your eyes closing with a sigh of self-deprecation, thinking "there's no way someone like me could do machine learning anyway," please take a moment and read this article to the end.
Who this is for

Working programmers who find it hard to invest much time in studying
People who sense they will soon hear from a boss or client, "Wouldn't this be simple if you used machine learning?"
People who studied math as a science major, but get flustered when asked about things like derivatives or matrices
What this article covers

An introduction to a math primer for programmers that does not open with math, to get you comfortable with the basic concepts
An introduction to an online course (MOOC) on machine learning that suits beginners
Environment

We use a tool called MATLAB/Octave, available on Windows, Mac, and Linux.
Having paper and a pen ready helps with understanding.
Terms and abbreviations

PRML = an abbreviation that always comes up when you search the web for machine learning. It stands for Pattern Recognition and Machine Learning, a book that could be called the bible of machine learning. The colorful book mentioned in the introduction is this very book.
MOOC/MOOCs = short for Massive Open Online Course(s): open courses you can take for free on the internet. They are a great help here in many ways.
A beginner's machine learning study flow

For programmers who say, "I have never studied machine learning, and whenever I try to read a book the formulas appear and I don't know where to start," here is what I recommend, based on my own experience.
If you don't know matrices and vectors, you will lose track of what the videos are saying partway through, so you first need to read a math book aimed at programmers.
Sign up for the online course platform Coursera and take Stanford professor Andrew Ng's Machine Learning course (free).
Now let's look at each step in detail.
Math books for programmers

I think programmers, whether it's games or software, have developed the habit of writing some code first and thinking while watching it run, rather than reading the manual.
But studying machine learning without a mathematical foundation is, in gaming terms, like walking into a hard dungeon without a decent weapon: you quickly hit a wall.
Machine learning books assume you know calculus and linear algebra, so when an unfamiliar formula suddenly appears, the missing knowledge in those areas is exactly the cause.
The Coursera machine learning course described here is designed so that you can follow it even without that math background, and it explains things along the way, making it a programmer-friendly course.
Even so, if you are not at least comfortable handling matrices and vectors, it is easy to end up staring blankly at the matrix manipulations that pop up here and there, so it is a good idea to warm up with them beforehand.
Here are linear algebra courses/books worth recommending to programmers. (Translator's note: the original introduces a Japanese book, Linear Algebra for Programming by Hira Kazuyuki, so I am substituting linear algebra courses/books that are accessible in Korea.)
Khan Academy Linear Algebra lectures (Korean subtitles)
No explanation needed. Like the line from some ad, these are superb lectures that go straight into your head just by watching, no chewing required.
Coding the Matrix
It explains how to solve various linear algebra problems using Python. Many machine learning examples are written in Python, so if you are not yet familiar with it, this is a good chance to pick it up. The Coursera course of the same name covers the same material and treats linear algebra for programmers in detail, so it is also recommended.
Personally, I think the minimum knowledge needed to understand the machine learning course introduced next is the matrix product. What matters is knowing how to picture it in your head.
I could never understand where in a matrix expression I was supposed to focus, or how, but I came to see that the true value of matrices lies in transforming sizes. (Translator's note: work thoroughly through the examples in Khan Academy's scaling-vectors material.)
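As a quick illustration of that "transforming sizes" idea, here is a minimal NumPy sketch; the shapes and values below are made up purely for illustration:

import numpy as np

W = np.random.rand(3, 5)          # a hypothetical 3x5 matrix
x = np.random.rand(5)             # a length-5 vector

y = W @ x                         # matrix-vector product
print(x.shape, '->', y.shape)     # (5,) -> (3,): the matrix turns a size-5 vector into a size-3 vector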
Taking the machine learning course

What I recommend to machine learning beginners is the online course introduced below. I won't cover the registration process here, but creating an account and enrolling is not difficult.
The reasons I recommend this course are as follows.
It's free.
It explains the knowledge a beginner needs for machine learning through concepts.
The lectures are delivered in English that is not difficult. (Translator's note: as of May 2015 all of the videos have Japanese subtitles, but Korean subtitles cover only part of the introduction.)
There are quizzes as well as video lectures, and you actually have to submit program results, so it is easy to tell whether you really understand.
You can submit the quizzes multiple times, so there is no need to worry about doing poorly.
There is no deadline, so you can study at your own pace.
The videos can be downloaded, so they are easy to watch even on the train.
It is a course with many strengths, but as with any online course, some things are beyond your control.
Because there is no deadline and no money on the line, it is easy to quit partway through.
In fact, the data reportedly show that the share of people who finish is very low compared with the number who start. So the last thing you need is a "will of steel." If it is hard on your own, it may help to gather friends or colleagues and learn together.
Translator's note: indeed, this course is genuinely hard to take alone. The sheer volume runs to 18 weeks (a full four and a half months!), and when you get stuck on the exercises it is hard to work through them by yourself. One good approach is to rope in programmers with spare time, or math or statistics majors, form an online study group, and check each other's progress and encourage one another once a week.
Contents of the Machine Learning course

This is the introduction page for the Machine Learning class.

The same introduction appears on various web pages, but the number of modules and the conditions seem to change over time. This article is based on the course that was open as of May 2015.
1. Introduction

It explains what machine learning even is and where it is used, and also how to install the tools you will be using.
The videos have subtitles, of course; Korean subtitles are not yet provided beyond the introduction, but you can turn on English subtitles or, if you know Japanese, choose Japanese subtitles.
The instructions start with installing MATLAB on Windows (it is free to use while you take this course); on Linux / Mac OS X you install Octave instead.
Each chapter has something like a discussion board, but you can complete the course without ever posting, so apart from the installation you can safely ignore it.
2. Linear Regression with One Variable

Here the real work begins. You learn that linear regression is the starting point of machine learning.
This chapter also introduces a quiz called a Review, which the previous chapter did not have.
You need to get at least 4 out of 5 questions right to pass, and the questions are not simple multiple choice: some allow multiple selections, and some require you to compute and enter a number. What makes it harder is that the order and content change each time, so even if you passed once, you can fail a retake.
Furthermore, to prevent gaming the quiz, if you fail to pass (submit at least 4 correct answers) within 3 attempts, you are penalized by being locked out of the quiz for 8 hours. (This is the only penalty.)
When you are penalized, a notice appears telling you how many minutes remain, as shown in the screenshot in the original post.

Failing a quiz does not block you from moving on, so if you do get penalized, it is best simply to continue with the next section for now.
3. Linear Algebra Review

This chapter makes it clear that you cannot proceed without linear algebra. It generously spends time on the basics; this is where you should pin down the concepts of matrices and vectors used in machine learning.
The Review in this chapter is a bit different: it is the same assessment as the check quizzes you take once or twice within the chapter, just in Review form, so clearing it does not count toward your overall grade. Be careful not to confuse the two; this is the only quiz of that kind in the rest of the course.
4. Linear Regression with Multiple Variables

Having looked at linear regression in chapter 2, this lecture covers multivariate linear regression. There is discussion of feature scaling and parameter tuning when handling multiple variables.
This chapter also adds programming assignments where you write the code yourself. They are somewhat painful if you are not used to MATLAB/Octave, but since MATLAB/Octave is explained starting in the next chapter, it is better to watch chapter 5 first and then come back.
Incidentally, you can upload your programs directly from MATLAB/Octave, and once uploaded you can see your grade on the web. You can upload as many times as you like, so don't feel pressured.

5. Octave Tutorial

It says Octave, but the same applies to MATLAB. If you are not used to working with matrices, mastering the "Vectorization" video in this chapter lets you build computations that need no for loops. Personally, this was the moment in the course when I was glad I took this chapter.
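The course demonstrates vectorization in Octave/MATLAB; as a rough sketch of the same idea in Python with NumPy (the data and parameter values below are made up, not taken from the course):

import numpy as np

X = np.random.rand(1000, 3)           # 1000 examples with 3 features (made-up data)
theta = np.array([0.5, -1.0, 2.0])    # made-up parameters

# Loop version: one prediction at a time.
preds_loop = np.zeros(len(X))
for i in range(len(X)):
    preds_loop[i] = np.dot(X[i], theta)

# Vectorized version: a single matrix-vector product, no explicit for loop.
preds_vec = X @ theta

print(np.allclose(preds_loop, preds_vec))   # True -- same result, far less code (and usually faster)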
Translator's note: this course is taught mainly in MATLAB/Octave, but the popular languages driving machine learning in industry are Python and R. For Python or R tutorials, check the links below.
Kaggle R Tutorial on Machine Learning (taught in R, free)
Intro to Machine Learning (taught in Python, free)
6. Logistic Regression

Even in the videos the instructor seems to race ahead on his own, but this chapter teaches logistic regression, which is used for classification despite having "regression" in its name. It resembles linear regression, and checking where it differs is what reveals the depth of machine learning.
There is also a programming submission here, and awkwardly it contains a problem (Regularization) that cannot be solved without knowledge taught in the next chapter. In my case I could not solve it, moved on to the next chapter, and had to come back to finish it. Perhaps they thought it could be solved without explanation; in any case it is a chapter with plenty of depth.
7. Regularization

Things start to feel more like real machine learning: this chapter teaches regularization, one of those "add this to the formula to improve X" elements. It is short as chapters go, but if you skim it you will have to revisit a lot later, so understanding it properly matters.
8. Neural Networks: Representation

This chapter covers neural networks, the biggest hurdle of the first half. Because they are relatively complex, the material is split into two chapters; this one first explains how the computation proceeds from input to output.
9. Neural Networks: Learning

This is the chapter that explains backpropagation, which is essential for applying neural networks to machine learning. No matter what other books or web articles I read, I could never quite understand how a neural network learns, but after watching the videos and the implementation I finally felt I understood which computations do the learning.
10. Advice for Applying Machine Learning

An extremely important chapter on what you must watch out for when actually applying machine learning, even if you know the theory.
It serves as a guide when putting machine learning into practice, covering underfitting, overfitting, and how learning curves behave.
11. Machine Learning System Design

Like the previous one, this chapter examines the problems that arise when bringing machine learning into real work and presents solutions.
Specifically, it discusses how machine learning metrics should be set up for phenomena that do not occur 99% of the time but have a 1% chance of occurring.
12. Support Vector Machines (SVM)

A chapter on SVM, one of the classification algorithms. The instructor was saying that SVM is a popular algorithm, so it is worth learning well.
The last video also discusses when the various algorithms, SVM included, should be used, which is a point well worth remembering.
13. Unsupervised Learning

From this chapter through chapter 15 the topic is unsupervised learning. This chapter teaches the popular clustering algorithm known as K-Means.
14. Dimensionality Reduction

This chapter is about dimensionality reduction; the algorithm actually taught under that concept is PCA (principal component analysis).
Hearing that whether you start with 100 or 1,000 dimensions, reducing down to 2 or 3 makes plotting possible gave me the motivation to learn it.
I also wondered what would happen if you reduced the dimensions to an extremely small number; a proper answer is provided, and I realized it is important to grasp how much information is lost depending on how far you compress relative to the original.
15. Anomaly Detection

A chapter on learning algorithms that detect anomalies.
This is where the normal distribution (a fairly involved formula) first appears, but having grown used to all sorts of formulas by then, it did not look all that monstrous.
What I found interesting was the comparison of supervised learning with anomaly detection, and which to use depending on the state of your data; topics like that should be helpful going forward.
16. Recommender Systems

A chapter explaining a very practical topic: how to handle the ratings users have given to movies.
I really liked the flow of the discussion, starting from small, limited data and ending up computing all the parameters in one go even when some entries are missing.
17. Large Scale Machine Learning

A chapter that thinks through what to do when processing all of a large dataset every time becomes slow.
The MapReduce concept makes its first appearance here, but only at the obvious level of "splitting the work makes it faster," so if you want more detail I recommend other references.
18. Application Example: Photo OCR

The final chapter is an application example: a pipeline combining pieces of machine learning logic to extract text from photos.
I expected to write an actual program, but there was no program submission in this chapter.
What happens when you pass all the quizzes?

For session-based courses they appear to issue something like a certificate for a fee, but for this fully open course the result is simply displayed at the top of the course page.
Translator's note: Coursera offers the courses themselves for free and makes paid certificates its business model.
When I actually passed the whole course, it was displayed as in the screenshot in the original post; I hoped there might be some link behind the "Course Passed" label, but... there wasn't.

I've finished the machine learning course! Now what?

If you are a programmer, get your hands moving and implement various things. You now understand what the sample code means, and you can at least picture in your head what would happen if you changed this or that.
Starting something like a study group, pulling in people who have no machine learning experience yet, and growing your circle of peers is also a good approach.
This field does not advance unless the number of participants grows, so be sure to spread the word around you.
Added by the translator: machine learning reading material

Everything about deep learning, explained simply
PredictionIO: an open-source machine learning server
Getting Started with Microsoft Azure Machine Learning: an online course on AzureML, which offers powerful functionality despite its simple interface.
DL4J: an open-source project that makes Word2Vec and various other machine learning algorithms usable from Java.
Posted by uniqueone
,

Quick complete Tensorflow tutorial to understand and run Alexnet, VGG, Inceptionv3, Resnet and squeezeNet networks – CV-Tricks.com
http://cv-tricks.com/tensorflow-tutorial/understanding-alexnet-resnet-squeezenetand-running-on-tensorflow/
Posted by uniqueone
,

Installing Caffe ๑•‿•๑ (1)
http://kimering.blogspot.kr/2017/03/caffe-1.html?m=1
Posted by uniqueone
,