Self-Tuning Spectral Clustering

http://www.vision.caltech.edu/lihi/Demos/SelfTuningClustering.html

 

I tried to run the ZPclustering code, but it produced the errors below. They occur because I'm using 64-bit MATLAB (on Windows 7).

------------------------------------------

>> test_segimage
Building affinity matrix took 0.063324 second

Error using dist2aff
Function "mxGetIr_700" is obsolete.
(64-bit mex files using sparse matrices must be rebuilt with the "-largeArrayDims" option.  See the R2006b release notes for more details.)

Error in segment_image (line 65)
    tic; W = dist2aff(D,SS); ttt = toc;

Error in test_segimage (line 11)
[mask] = segment_image(IM,R,G1,'SS','KM',0.1);

------------------------------------------

I modified the four MEX source files: dist2aff.cpp, evrot.cpp, scale_dist.cpp, and zero_diag.cpp. (With -largeArrayDims, the 64-bit sparse-matrix API expects index arrays typed as mwIndex/mwSize rather than int, which is the usual change such files need.)

Then I typed the following in the MATLAB command window:

>> mex -O -largeArrayDims -c dist2aff.cpp
>> mex -O -largeArrayDims -c scale_dist.cpp
>> mex -O -largeArrayDims -c zero_diag.cpp
>> mex -O -largeArrayDims -c evrot.cpp

 

>> mex -O -largeArrayDims dist2aff.obj
>> mex -O -largeArrayDims scale_dist.obj
>> mex -O -largeArrayDims zero_diag.obj
>> mex -O -largeArrayDims evrot.obj

 

After that, it operates correctly.
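Note that the compile and link steps can also be collapsed into one call per file, since mex both compiles and links when given a source file directly (this assumes a C++ compiler has already been configured via "mex -setup"):

>> mex -O -largeArrayDims dist2aff.cpp
>> mex -O -largeArrayDims scale_dist.cpp
>> mex -O -largeArrayDims zero_diag.cpp
>> mex -O -largeArrayDims evrot.cpp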

Posted by uniqueone
http://dogmas.tistory.com/trackback/141

A Brief Explanation of Logistic Regression

Linear regression is typically used when the dependent variable is a continuous quantity; when the dependent variable takes only the values 0 and 1, logistic regression is the better choice.

For example, suppose we survey graduates of a law school and record each student's GPA, wealth, age, and whether they passed the bar exam. GPA, wealth, and age are continuous quantities, but bar-exam passage is a binary variable: pass is coded as 1 and fail as 0.

 

Consider the following linear model:

$$Y = b_0 + b_1 x + e$$

Here Y is the dependent variable, which takes only the values 0 and 1, x is the independent variable, and e is the error term.

Suppose Y is a Bernoulli random variable with probabilities

$$P(Y = 1) = \pi, \qquad P(Y = 0) = 1 - \pi$$

In that case the error in the linear model above cannot be normally distributed, and its variance is not constant: it changes with the probability that Y equals 1. Moreover, since Y ranges only from 0 to 1, ordinary linear regression cannot be used.

 

Empirically, when Y is a binary variable the response curve is S-shaped, so we use the logit response function

$$E(Y) = \pi(x) = \frac{e^{b_0 + b_1 x}}{1 + e^{b_0 + b_1 x}}$$

or, equivalently,

$$1 - \pi(x) = \frac{1}{1 + e^{b_0 + b_1 x}}$$

This can be rewritten as

$$\frac{\pi(x)}{1 - \pi(x)} = e^{b_0 + b_1 x}$$

The quantity π(x)/(1 − π(x)) in this last equation is called the odds ratio.

If the odds ratio equals 2 at some value x = x1, then at x = x1 the probability that Y is 1 is twice the probability that Y is 0. Also, each time x increases by 1, the odds ratio is multiplied by exp(b1).
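To see why this effect is multiplicative, compare the odds at x + 1 with the odds at x:

$$\frac{\pi(x+1)/(1-\pi(x+1))}{\pi(x)/(1-\pi(x))} = \frac{e^{b_0 + b_1 (x+1)}}{e^{b_0 + b_1 x}} = e^{b_1}$$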

 

A logistic regression example

Suppose that performing a logistic regression on data of this kind (here, O-ring failure versus launch temperature) yields

  • Odds ratio = 0.84

 

Since the estimate b1 divided by its standard error follows an approximately standard normal distribution, we can test H0: b1 = 0; this gives p = 0.04, which is statistically significant. The odds ratio of 0.84 means that each one-degree increase in temperature multiplies the odds of O-ring failure (relative to success) by 0.84; equivalently, the odds of failure grow as the temperature drops.
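For reference, here is a minimal sketch of fitting such a model in MATLAB, assuming the Statistics Toolbox's glmfit is available. The data are simulated purely for illustration (they are not the actual O-ring measurements):

% Simulate binary outcomes from a known logistic model, then fit it.
rng(0);
x = linspace(50, 85, 200)';                      % predictor, e.g. temperature
ptrue = 1 ./ (1 + exp(-(10 - 0.17*x)));          % true P(Y = 1 | x)
y = binornd(1, ptrue);                           % Bernoulli responses
[b, dev, stats] = glmfit(x, y, 'binomial', 'link', 'logit');
oddsRatio = exp(b(2))                            % odds multiply by this per +1 in x
pValue = stats.p(2)                              % Wald test of H0: b1 = 0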

 

 

Posted by uniqueone
http://www.cs.utexas.edu/~grauman/courses/spring2010/schedule.html

CS395T: Special Topics in Computer Vision, Spring 2010

Object Recognition





Meets:
Wednesdays 3:30-6:30 pm
ACES 3.408
Unique # 54470
 
Instructor: Kristen Grauman 
Email: grauman@cs
Office: CSA 114
 
TA: Sudheendra Vijayanarasimhan
Email: svnaras@cs
Office: CSA 106

When emailing us, please put CS395 in the subject line.

Announcements:

See the schedule for current reading assignments. 

Project paper drafts due Friday April 30.

Course overview:


Topics: This is a graduate seminar course in computer vision.   We will survey and discuss current vision papers relating to object recognition, auto-annotation of images, and scene understanding.  The goals of the course will be to understand current approaches to some important problems, to actively analyze their strengths and weaknesses, and to identify interesting open questions and possible directions for future research.

See the syllabus for an outline of the main topics we'll be covering.

Requirements: Students will be responsible for writing paper reviews each week, participating in discussions, completing one programming assignment, presenting once or twice in class (depending on enrollment, and possibly done in teams), and completing a project (done in pairs). 

Note that presentations are due one week before the slot your presentation is scheduled.  This means you will need to read the papers, prepare experiments, make plans with your partner, create slides, etc. more than one week before the date you are signed up for.  The idea is to meet and discuss ahead of time, so that we can iterate as needed the week leading up to your presentation. 

More details on the requirements and grading breakdown are here.  Information on the projects and project proposals is here.

Prereqs:  Courses in computer vision and/or machine learning (378 Computer Vision and/or 391 Machine Learning, or similar); ability to understand and analyze conference papers in this area; programming required for experiment presentations and projects. 

Please talk to me if you are unsure if the course is a good match for your background.  I generally recommend scanning through a few papers on the syllabus to gauge what kind of background is expected.  I don't assume you are already familiar with every single algorithm/tool/image feature a given paper mentions, but you should feel comfortable following the key ideas.


Syllabus overview:

  1. Single-object recognition fundamentals: representation, matching, and classification
    1. Specific objects
    2. Classification and global models
    3. Objects composed of parts
    4. Region-based methods
  2. Beyond single objects: recognizing categories in context and learning their properties
    1. Context
    2. Attributes
    3. Actions and objects/scenes
  3. Scalability issues in category learning, detection, and search
    1. Too many pixels!
    2. Too many categories!
    3. Too many images!
  4. Recognition and "everyday" visual data
    1. Landmarks, locations, and tourists
    2. Alignment with text
    3. Pictures of people


Schedule and papers:


Note:  * = required reading. 
Additional papers are provided for reference, and as a starting point for background reading for projects.
Paper presentations: focus on starred papers (additionally mentioning ideas from others is ok but not necessary).
Experiment presentations: Pick from only among the starred papers.
Date
Topics
Papers and links
Presenters
Items due
Jan 20
Course intro  handout


Topic preferences due via email by Monday Jan 25
I. Single-object recognition fundamentals: representation, matching, and classification
Jan 27
Recognizing specific objects:

Invariant local features, instance recognition, bag-of-words models

  • *Object Recognition from Local Scale-Invariant Features, Lowe, ICCV 1999.  [pdf]  [code] [other implementations of SIFT] [IJCV]
  • *Local Invariant Feature Detectors: A Survey, Tuytelaars and Mikolajczyk.  Foundations and Trends in Computer Graphics and Vision, 2008. [pdf]  [Oxford code] [Read pp. 178-188, 216-220, 254-255]
  • *Video Google: A Text Retrieval Approach to Object Matching in Videos, Sivic and Zisserman, ICCV 2003.  [pdf]  [demo]
  • Scalable Recognition with a Vocabulary Tree, D. Nister and H. Stewenius, CVPR 2006. [pdf]
  • SURF: Speeded Up Robust Features, Bay, Ess, Tuytelaars, and Van Gool, CVIU 2008.  [pdf] [code]
  • Robust Wide Baseline Stereo from Maximally Stable Extremal Regions, J. Matas, O. Chum, U. Martin, and T. Pajdla, BMVC 2002.  [pdf]
  • A Performance Evaluation of Local Descriptors. K. Mikolajczyk and C. Schmid.  CVPR 2003 [pdf]
  • Oxford group interest point software
  • Andrea Vedaldi's code, including SIFT, MSER, hierarchical k-means.
  • INRIA LEAR team's software, including interest points, shape features
  • Semantic Robot Vision Challenge links
lecture slides [ppt] [pdf]

Feb 3
Recognition via classification and global models:

Global appearance models for category and scene recognition, sliding window detection, detection as a binary decision.

  • *Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, CVPR 2005.  [pdf]  [video] [code] [PASCAL datasets]
  • *Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, Lazebnik, Schmid, and Ponce, CVPR 2006. [pdf]  [15 scenes dataset]  [libpmk] [Matlab]
  • *Rapid Object Detection Using a Boosted Cascade of Simple Features, Viola and Jones, CVPR 2001.  [pdf]  [code]
  • Modeling the Shape of the Scene: a Holistic Representation of the Spatial Envelope, Oliva and Torralba, IJCV 2001.  [pdf]  [Gist code]
  • Visual Categorization with Bags of Keypoints, C. Dance, J. Willamowski, L. Fan, C. Bray, and G. Csurka, ECCV International Workshop on Statistical Learning in Computer Vision, 2004.  [pdf]
  • Pedestrian Detection in Crowded Scenes, Leibe, Seemann, and Schiele, CVPR 2005.  [pdf]
  • Pyramids of Histograms of Oriented Gradients (pHOG), Bosch and Zisserman. [code]
  • Eigenfaces for Recognition, Turk and Pentland, 1991.  [pdf]
  • Sampling Strategies for Bag-of-Features Image Classification.  E. Nowak, F. Jurie, and B. Triggs.  ECCV 2006. [pdf]
  • A Trainable System for Object Detection, C. Papageorgiou and T. Poggio, IJCV 2000.  [pdf]
  • Object Recognition with Features Inspired by Visual Cortex. T. Serre, L. Wolf and T. Poggio. CVPR 2005.  [pdf]
  • LIBPMK feature extraction code, includes dense sampling
  • LIBSVM library for support vector machines
lecture slides [ppt] [pdf]

Feb 10
Class begins at 5 pm today.
Objects composed of parts:

Part-based models for category recognition, and local feature matching for correspondence-based recognition

  • *A Discriminatively Trained, Multiscale, Deformable Part Model, by P. Felzenszwalb, D. McAllester and D. Ramanan.  CVPR 2008.  [pdf]  [code]
  • *Combined Object Categorization and Segmentation with an Implicit Shape Model, by B. Leibe, A. Leonardis, and B. Schiele.   ECCV Workshop on Statistical Learning in Computer Vision, 2004.   [pdf]  [code]  [IJCV extended version]
  • *Learning a Dense Multi-View Representation for Detection, Viewpoint Classification and Synthesis of Object Categories, H. Su, M. Sun, L. Fei-Fei, S. Savarese.  ICCV 2009.  [pdf]
  • Shape Matching and Object Recognition with Low Distortion Correspondences, A. Berg, T. Berg, and J. Malik, CVPR 2005.  [pdf]  [web]
  • Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification, Frome, Singer, Sha, Malik.  ICCV 2007.  [pdf]
  • Matching Local Self-Similarities Across Images and Videos, Shechtman and Irani, CVPR 2007.  [pdf]
  • The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features, Grauman and Darrell.  ICCV 2005.  [pdf]  [web]  [code]
  • Shape Matching  and Object Recognition Using Shape Contexts.  S. Belongie, J. Malik, J. Puzicha.  PAMI 2002.  [pdf]
  • Multiple Component Learning for Object Detection, Dollar, Babenko, Belongie, Perona, and Tu, ECCV 2008.  [pdf]
  • Object Class Recognition by Unsupervised Scale Invariant Learning, by R. Fergus, P. Perona, and A. Zisserman.  CVPR 2003.  [pdf]  [datasets]
  • Efficient Matching of Pictorial Structures. P. Felzenszwalb and D. Huttenlocher. CVPR 2000.  [pdf] [related code]
  • A Boundary-Fragment-Model for Object Detection, Opelt, Pinz, and Zisserman, ECCV 2006.  [pdf]

Implementation assignment due Friday Feb 12, 5 PM
Feb 17
Region-based models:

Regions as parts, multi-label segmentation, integrated classification and segmentation

  • *Recognition Using Regions.  C. Gu, J. Lim, P. Arbelaez, J. Malik, CVPR 2009.  [pdf]  [slides] [seg code]
  • *Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, B. C. Russell, A. A. Efros, J. Sivic, W. T. Freeman, and A. Zisserman.  CVPR 2006.  [pdf] [code]
  • *Combining Top-down and Bottom-up Segmentation. E. Borenstein, E. Sharon, and S. Ullman.  CVPR  workshop 2004.  [pdf]  [data]
  • Extracting Subimages of an Unknown Category from a Set of Images, S. Todorovic and N. Ahuja, CVPR 2006.  [pdf]
  • Class-Specific, Top-Down Segmentation, E. Borenstein and S. Ullman, ECCV 2002.  [pdf]
  • Object Recognition by Integrating Multiple Image Segmentations, C. Pantofaru, C. Schmid, and M. Hebert, ECCV 2008  [pdf]
  • Image Parsing: Unifying Segmentation, Detection, and Recognition. Tu, Z., Chen, Z., Yuille, A.L., Zhu, S.C. ICCV 2003  [pdf]
  • Robust Higher Order Potentials for Enforcing Label Consistency, P. Kohli, L. Ladicky, and P. Torr. CVPR 2008.  
  • Co-segmentation of Image Pairs by Histogram Matching --Incorporating a Global Constraint into MRFs, C. Rother, V. Kolmogorov, T. Minka, and A. Blake.  CVPR 2006.  [pdf]
  • An Efficient Algorithm for Co-segmentation, D. Hochbaum, V. Singh, ICCV 2009.  [pdf]
  • Normalized Cuts and Image Segmentation, J. Shi and J. Malik.  PAMI 2000.  [pdf]  [code]
  • Greg Mori's superpixel code
  • Berkeley Segmentation Dataset and code
  • Pedro Felzenszwalb's graph-based segmentation code
  • Michael Maire's segmentation code and paper
  • Mean-shift: a Robust Approach Towards Feature Space Analysis [pdf]  [code, Matlab interface by Shai Bagon]
  • David Blei's Topic modeling code
papers: John [pdf]
demo: Sudheendra [ppt]

II. Beyond single objects: recognizing categories in context and learning their properties
Feb 24
Context:

Inter-object relationships, objects within scenes, geometric context, understanding scene layout

  • *Discriminative Models for Multi-Class Object Layout, C. Desai, D. Ramanan, C. Fowlkes. ICCV 2009.  [pdf]  [slides]  [SVM struct code] [data]
  • *TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation.  J. Shotton, J. Winn, C. Rother, A. Criminisi.  ECCV 2006.  [pdf] [web] [data]
  • *Geometric Context from a Single Image, by D. Hoiem, A. Efros, and M. Hebert, ICCV 2005. [pdf]  [web]  [code]
  • *Contextual Priming for Object Detection, A. Torralba.  IJCV 2003.  [pdf] [web] [code]
  • Putting Objects in Perspective, by D. Hoiem, A. Efros, and M. Hebert, CVPR 2006.  [pdf] [web]
  • Decomposing a Scene into Geometric and Semantically Consistent Regions, S. Gould, R. Fulton, and D. Koller, ICCV 2009.  [pdf]  [slides]
  • Learning Spatial Context: Using Stuff to Find Things, by G. Heitz and D. Koller, ECCV 2008.  [pdf] [code]
  • An Empirical Study of Context in Object Detection, S. Divvala, D. Hoiem, J. Hays, A. Efros, M. Hebert, CVPR 2009.  [pdf]  [web]
  • Object Categorization using Co-Occurrence, Location and Appearance, by C. Galleguillos, A. Rabinovich and S. Belongie, CVPR 2008.[ pdf]
  • Context Based Object Categorization: A Critical Survey, C. Galleguillos and S. Belongie.  [pdf]
  • What, Where and Who? Classifying Events by Scene and Object Recognition, L.-J. Li and L. Fei-Fei, ICCV 2007. [pdf]
  • Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Unsupervised Framework, L-J. Li, R. Socher, L. Fei-Fei, CVPR 2009.  [pdf]
Piyush [ppt]
Robert [pdf]

Mar 3
Attributes:

Visual properties, learning from natural language descriptions, intermediate representations

  • *Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, C. Lampert, H. Nickisch, and S. Harmeling, CVPR 2009  [pdf] [web] [data]
  • *Describing Objects by Their Attributes, A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, CVPR 2009.  [pdf]  [web] [data]
  • *Attribute and Simile Classifiers for Face Verification, N. Kumar, A. Berg, P. Belhumeur, S. Nayar.  ICCV 2009.  [pdf] [web] [data]
  • Learning Visual Attributes, V. Ferrari and A. Zisserman, NIPS 2007.  [pdf]
  • Learning Color Names for Real-World Applications, J. van de Weijer, C. Schmid, J. Verbeek, and D. Larlus.  IEEE TIP 2009.  [pdf]  [web]
  • Learning Models for Object Recognition from Natural Language Descriptions, J. Wang, K. Markert, and M. Everingham, BMVC 2009.[pdf]
Brian [ppt]
Adam [pdf]

Friday
Mar 5
Prof. David Forsyth, UIUC
Forum for AI Talk
11 AM in ACES 2.302



Monday
Mar 8




Project proposal abstract due
Mar 10
Actions and objects/scenes:

Recognizing human actions and objects simultaneously, objects and scenes as context for the activity

  • *Actions in Context, M. Marszalek, I. Laptev, C. Schmid.  CVPR 2009.  [pdf] [web]
  • *Objects in Action: An Approach for Combining Action Understanding and Object Perception.   A. Gupta and L. Davis.  CVPR, 2007.  [pdf]  [data]
  • Exploiting Human Actions and Object Context for Recognition Tasks.  D. Moore, I. Essa, and M. Hayes.  ICCV 1999.  [pdf]
  • A Scalable Approach to Activity Recognition Based on Object Use. J. Wu, A. Osuntogun, T. Choudhury, M. Philipose, and J. Rehg.  ICCV 2007.  [pdf]
  • Towards Using Multiple Cues for Robust Object Recognition, S. Aboutalib and M. Veloso, AAMAS 2007.  [pdf]
Aibo [ppt]

Mar 17
Spring break (no class)



III. Scalability issues in category learning, detection, and search
Mar 24
Too many pixels!

Bottom-up and top-down saliency measures to prioritize features, object importance, saliency in visual search tasks

  • *A Model of Saliency-based Visual Attention for Rapid Scene Analysis.  L. Itti, C. Koch, and E. Niebur.  PAMI 1998  [pdf]
  • *Some Objects are More Equal Than Others: Measuring and Predicting Importance, M. Spain and P. Perona.  ECCV 2008.  [pdf]
  • *Optimal Scanning for Faster Object Detection,  N. Butko, J. Movellan.  CVPR 2009.  [pdf]
  • Reading Between the Lines: Object Localization Using Implicit Cues from Image Tags.  S. J. Hwang and K. Grauman.  CVPR 2010.  [pdf]
  • Beyond Sliding Windows: Object Localization by Efficient Subwindow Search, C. Lampert, M. Blaschko, T. Hofmann.  CVPR 2008.  [pdf]
  • Peripheral-Foveal Vision for Real-time Object Recognition and Tracking in Video.  S. Gould, J. Arfvidsson, A. Kaehler, B. Sapp, M. Messner, G. Bradski, P. Baumstrack,S. Chung, A. Ng.  IJCAI 2007.  [pdf]
  • Peekaboom: A Game for Locating Objects in Images, by L. von Ahn, R. Liu and M. Blum, CHI 2006. [pdf]  [web]
  • Determining Patch Saliency Using Low-Level Context, D. Parikh, L. Zitnick, and T. Chen. ECCV 2008.  [pdf]
  • Learning to Predict Where Humans Look, T. Judd, K. Ehinger, F. Durand, A. Torralba.  ICCV 2009.  [pdf] [web]
  • Visual Recognition and Detection Under Bounded Computational Resources, S. Vijayanarasimhan and A. Kapoor.  CVPR 2010.
  • Torralba Global Features and Attention
  • The Role of Top-down and Bottom-up Processes in Guiding Eye Movements during Visual Search, G. Zelinsky, W. Zhang, B. Yu, X. Chen, D. Samaras, NIPS 2005.  [pdf]

  • Amazon Mechanical Turk
  • Using Mechanical Turk with LabelMe
Anush [pdf]
Project update and extended outline due Friday Mar 26
Mar 31
Too many categories!

Scalable recognition with many object categories


  • *Sharing Visual Features for Multiclass and Multiview Object Detection, A. Torralba, K. Murphy, W. Freeman, PAMI 2007.  [pdf]  [code]
  • *Cross-Generalization: Learning Novel Classes from a Single Example by Feature Replacement.  E. Bart and S. Ullman.  CVPR 2005.  [pdf]
  • *Constructing Category Hierarchies for Visual Recognition, M. Marszalek and C. Schmid.  ECCV 2008.  [pdf]  [web] [Caltech256]
  • Learning Generative Visual Models from Few Training Examples: an Incremental Bayesian Approach Tested on 101 Object Categories. L. Fei-Fei, R. Fergus, and P. Perona. CVPR Workshop on Generative-Model Based Vision. 2004.  [pdf] [Caltech101]
  • Towards Scalable Representations of Object Categories: Learning a Hierarchy of Parts. S. Fidler and A. Leonardis.  CVPR 2007  [pdf]
  • Exploiting Object Hierarchy: Combining Models from Different Category Levels, A. Zweig and D. Weinshall, ICCV 2007 [pdf]
  • Learning and Using Taxonomies for Fast Visual Categorization, G. Griffin and P. Perona, CVPR 2008.  [pdf]
  • Incremental Learning of Object Detectors Using a Visual Shape Alphabet.  Opelt, Pinz, and Zisserman, CVPR 2006.  [pdf]
  • Sequential Learning of Reusable Parts for Object Detection.  S. Krempp, D. Geman, and Y. Amit.  2002  [pdf]
  • ImageNet: A Large-Scale Hierarchical Image Database, J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, CVPR 2009 [pdf]  [data]
Rui [ppt]
Patrick [ppt]
Week of Mar 29 - Apr 2:
Individual project update meetings (by appt)
Apr 7
Too many images!

Scalable image search with large databases


  • *Kernelized Locality Sensitive Hashing for Scalable Image Search, by B. Kulis and K. Grauman, ICCV 2009 [pdf]  [code]
  • *Geometric Min-Hashing: Finding a (Thick) Needle in a Haystack, O. Chum, M. Perdoch, and J. Matas.  CVPR 2009.  [pdf]
  • *Detecting Objects in Large Image Collections and Videos by Efficient Subimage Retrieval, C. Lampert, ICCV 2009.  [pdf]  [code] [code]
  • 80 Million Tiny Images: A Large Dataset for Non-Parametric Object and Scene Recognition, by A. Torralba, R. Fergus, and W. Freeman.  PAMI 2008.  [pdf] [web]
  • Fast Image Search for Learned Metrics, P. Jain, B. Kulis, and K. Grauman, CVPR 2008.  [pdf]
  • Small Codes and Large Image Databases for Recognition, A. Torralba, R. Fergus, and Y. Weiss, CVPR 2008.  [pdf]
  • Object Retrieval with Large Vocabularies and Fast Spatial Matching.  J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, CVPR 2007.  [pdf]
  • LSH homepage
  • Nearest Neighbor Methods in Learning and Vision, Shakhnarovich, Darrell, and Indyk, editors.
Muhibur [ppt]

IV: Recognition and "everyday" visual data
Apr 14
Landmarks, locations, and tourist photographers:

Location recognition, cues from tourist photos, photographer biases, retrieval for landmarks, browsing and visualization

  • *Landmark Classification in Large-Scale Image Collections.  Y. Li, D. Crandall, D. Huttenlocher.  ICCV 2009.  [pdf]
  • *Image Sequence Geolocation with Human Travel Priors, E. Kalogerakis, O. Vesselova, J. Hays, A. Efros, A. Hertzmann.  ICCV 2009.  [pdf]  [web]
  • *Scene Summarization for Online Image Collections.  I. Simon, N. Snavely, S. Seitz.  ICCV 2007.  [pdf]  [web]
  • Mapping the World's Photos, D. Crandall, L. Backstrom, D. Huttenlocher, J. Kleinberg, WWW 2009.  [pdf]  [web]
  • Im2GPS: Estimating Geographic Information from a Single Image, J. Hays and A. Efros.  CVPR 2008.  [pdf]  [web]
  • Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval, Chum, Philbin, Sivic, Isard, and Zisserman, ICCV 2007.  [pdf]
  • Scene Segmentation Using the Wisdom of Crowds, by I. Simon and S. Seitz.  ECCV 2008.  [pdf]
  • Photo Tourism: Exploring Photo Collections in 3D, by N. Snavely, S. Seitz, and R. Szeliski, SIGGRAPH 2006.  [pdf] [web]
  • Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs, by X. Li, C. Wu, C. Zach, S. Lazebnik, and J. Frahm, ECCV 2008.  [pdf]  [web]
  • City-Scale Location Recognition, G. Schindler, M. Brown, and R. Szeliski, CVPR 2007.  [pdf]
  • Parsing Images of Architectural Scenes, A. Berg, F. Grabler, J. Malik.  ICCV 2007.  [pdf]
  • I Know What You Did Last Summer: Object-Level Auto-annotation of Holiday Snaps, S. Gammeter, L. Bossard, T.Quack, L. van Gool, ICCV 2009.  [pdf]
  • CVPR 2009 Workshop on Visual Place Categorization
  • Code for downloading Flickr images, by James Hays
  • UW Community Photo Collections homepage
Sarah [ppt]
Suyog [pdf]

Apr 21
Alignment with text:

Discovering the correspondence between words (and other language constructs) to images or video, using captions or subtitles as weak labels.

  • *"'Who are you?' - Learning Person Specific Classifiers from Video, J. Sivic, M. Everingham, and A. Zisserman, CVPR 2009.  [pdf]
  • *Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers, A. Gupta and L. Davis, ECCV 2008.  [pdf]
  • *Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary, P. Duygulu, K. Barnard, N. de Freitas, D. Forsyth. ECCV 2002.  [pdf]  [data]
  • The Mathematics of Statistical Machine Translation: Parameter Estimation.  P. Brown, S. Della Pietro, V. Della Pietra, R. Mercer.  Association for Computational Linguistics, 1993.  [pdf]
  • Who's Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation.  L. Jie, B. Caputo, and V. Ferrari.  NIPS 2009.  [pdf]
  • Names and Faces in the News, by T. Berg, A. Berg, J. Edwards, M. Maire, R. White, Y. Teh, E. Learned-Miller and D. Forsyth, CVPR 2004.  [pdf]  [web]
  • Learning Sign Language by Watching TV (using weakly aligned subtitles), P. Buehler, M. Everingham, and A. Zisserman. CVPR 2009.  [pdf]  [data]
  • “Hello! My name is... Buffy” – Automatic Naming of Characters in TV Video, by M. Everingham, J. Sivic and A. Zisserman, BMVC 2006.  [pdf]  [web]  [data]
  • Using Closed Captions to Train Activity Recognizers that Improve Video Retrieval, S. Gupta and R. Mooney. CVPR Visual and Contextual Learning Workshop, 2009.  [pdf]
  • Systematic Evaluation of Machine Translation Methods for Image and Video Annotation, P. Virga, P. Duygulu, CIVR 2005.  [pdf]
  • Subrip for subtitle extraction
  • Reuters captioned photos
  • Sonal Gupta's data for commentary+video
Anish [pdf]
Chao-Yeh [ppt]

Friday
April 23
Prof. Martial Hebert, CMU
Forum for AI Talk
11 AM, TAY 3.128



Apr 28
Pictures of people:

Faces, consumer photo collections, tagging

  • *Understanding Images of Groups of People, A. Gallagher and T. Chen, CVPR 2009.  [pdf]
  • *Contextual Identity Recognition in Personal Photo Albums. D. Anguelov, K.-C. Lee, S. Burak, Gokturk, and B. Sumengen. CVPR 2007.  [pdf]
  • *A Face Annotation Framework with Partial Clustering and Interactive Labeling.  Y. Tian, W. Liu, R. Xiao, F. Wen, and X. Tang.  CVPR 2007.  [pdf] [web]
  • Autotagging Facebook: Social Network Context Improves Photo Annotation, by  Z. Stone, T. Zickler, and T. Darrell.  CVPR Internet Vision Workshop 2008.   [pdf]
  • Efficient Propagation for Face Annotation in Family Albums. L. Zhang, Y. Hu, M. Li, and H. Zhang.  MM 2004.  [pdf]
  • Using Group Prior to Identify People in Consumer Images, A. Gallagher, T. Chen,  CVPR Workshop on Semantic Learning Applications in Multimedia, 2007.  [pdf]
  • Leveraging Archival Video for Building Face Datasets, by D. Ramanan, S. Baker, and S. Kakade.  ICCV 2007.  [pdf]
  • Names and Faces in the News, by T. Berg, A. Berg, J. Edwards, M. Maire, R. White, Y. Teh, E. Learned-Miller and D. Forsyth, CVPR 2004.  [pdf]  [web]
  • Face detection code in OpenCV
  • Gallagher's Person Dataset
MingJun

Friday
April 30




Final paper drafts due
May 5
Course wrap-up
Project presentations, part I


Presentations due
May 13



Final papers due


Posted by uniqueone
http://www.cs.utexas.edu/~grauman/courses/spring2011/index.html

CS 376: Computer Vision
Spring 2011

Mon/Wed 11:00 am - 12:15 pm
UTC 3.124



Instructor: Kristen Grauman
Office location: ACE 3.446
Office hours: Wed 5-6 pm, and by appointment.

TA: Shalini Sahoo shalini@cs.utexas.edu
Office location: PAI 5.33 TA station, desk 3
Office hours: Tues/Thurs 5-6 pm

TA (office hours only): Yong Jae Lee
Office location: PAI 5.33 TA station, desk 3
Office hours: Mon 5-6 pm
Please come to any of our office hours for questions about assignments or lectures.

Questions via email about an assignment should be sent to:
cv-spring2011@cs.utexas.edu, with "CS376" at the beginning of the subject line.
This will ensure the most timely response from the instructor or TA.



Announcements

The final exam slot has been confirmed by the registrar: Monday May 16, 2-5 pm, in JGB 2.102.  You may bring two sheets of notes on 8.5 x 11" paper.  The exam is comprehensive.

View all current grades and late days used on Blackboard.


Overview

Course description: Billions of images are hosted publicly on the web---how can you find one that “looks like” some image you are interested in?  Could we interact with a computer in richer ways than a keyboard and mouse, perhaps with natural gestures or simply facial expressions?  How can a robot identify objects in complex environments, or navigate uncharted territory?  How can a video camera in the operating room help a surgeon plan a procedure more safely, or assist a radiologist in more efficiently detecting a tumor?  Given some video sequence of a scene, can we synthesize new virtual views from arbitrary viewpoints that make a viewer feel as if they are in the movie?

In computer vision, the goal is to develop methods that enable a machine to “understand” or analyze images and videos.   In this introductory computer vision course, we will explore various fundamental topics in the area, including image formation, feature detection, segmentation, multiple view geometry, recognition and learning, and video processing.  This course is intended for upper-level undergraduate students. 

Textbook: The textbook is Computer Vision: Algorithms and Applications, by Rick Szeliski.  It is currently available for purchase, e.g. at Amazon for ~$65.  An electronic copy is also available free online here.  I will also select some background reading on object recognition from this short book on Visual Object Recognition that I prepared together with Bastian Leibe.

Syllabus: Details on prerequisites, course requirements, textbooks, and grading policy are posted here.  A high-level summary of the syllabus is here.

Problem set deadlines: Assignments are due about every two weeks.  The dates below are tentative and are provided to help your planning.  They are subject to minor shifts if the lecture plan needs to be adjusted slightly according to our pace in class. 
  • Pset 0 due Jan 28
  • Pset 1 due Feb 14 (tentative)
  • Pset 2 due Mar 2 (tentative)
  • Pset 3 due Mar 28 (tentative)
  • Pset 4 due April 18 (tentative)
  • Pset 5 due May 4 (tentative)

Schedule


Dates
Topic
Readings and links
Lectures
Assignments, exams

Wed Jan 19
Course intro
Sec 1.1-1.3
Intro
[pdf]

Pset 0 out Friday Jan 21
Mon Jan 24
Features and filters
Sec 3.1.1-2, 3.2
Linear filters
[ppt] [pdf] [outline]


Wed Jan 26

Sec 3.2.3, 4.2

Seam carving paper
Seam carving video
Gradients and edges
[ppt] [pdf] [outline]
Pset 0 due Friday Jan 28
Mon Jan 31

Sec 3.3.2-4
Binary image analysis
[ppt] [pdf] [outline]
Pset 1 out [class results]
Wed Feb 2

Sec 10.5

Texture Synthesis
Texture
[ppt] [pdf] [outline]

Mon Feb 7
Sec 2.3.2

Foundations of Color, B. Wandell

Lotto Lab illusions
Color
[ppt] [pdf] [outline]
Pset 0 grades and solutions returned in class

Wed Feb 9
Grouping and fitting
Sec 5.2-5.4

k-means demo

Segmentation and clustering
[ppt] [pdf] [outline]

Mon Feb 14

Sec 4.3.2

Hough Transform demo

Excerpt from Ballard & Brown

Hough transform
[ppt] [pdf] [outline]


Pset 1 due Monday Feb 14
 
Pset 2 out
Wed Feb 16
Mon Feb 21

Sec 5.1.1
Deformable contours
[ppt] [pdf] [outline]

Wed Feb 23

Sec 2.1.1, 2.1.2, 6.1.1
Alignment and 2d image transformations
[ppt] [pdf] [outline]
Pset 1 grades and solutions returned in class

Mon Feb 28
Multiple views and motion
Sec 3.6.1, 6.1.4
Homography and image warping
[ppt] [pdf] [outline]

Wed Mar 2

Sec 4.1
Local invariant features 1
[ppt] [pdf] [outline]
Pset 2 due Wednesday Mar 2
Mon Mar 7

(Sec 4.1) Local invariant features 2
[ppt] [pdf] [outline]

Wed Mar 9



Midterm exam

Pset 2 grades and solutions returned in class
Spring break



Pset 3 out  [class results]
Mon Mar 21

Sec 11.1.1, 11.2-11.5
Image formation (and local feature matching wrap-up)
[ppt] [pdf] [outline]

Wed Mar 23

Sec 11.1.1, 11.2-11.5

Epipolar geometry demo

Audio camera, O'Donovan et al.
Stereo 1: Epipolar geometry
[ppt] [pdf] [outline]


Mon Mar 28
Virtual viewpoint video, Zitnick et al.
Stereo 2: Correspondence and calibration
[ppt] [pdf] [outline]


Wed Mar 30
Recognition
Grauman & Leibe Ch 1-4 (3 is review)

Indexing local features
[ppt] [pdf] [outline]
Pset 3 due Wed March 30
Mon April 4

Grauman & Leibe Ch 5, 6

Szeliski 14.3

Video Google demo by Sivic et al., paper
Instance recognition
[ppt] [pdf] [outline]

Wed April 6

Grauman & Leibe Ch 7, 8.1, 9.1, 11.1

Szeliski 14.1
Intro to category recognition
[ppt] [pdf] [outline]
Pset 3 grades and solutions returned in class
Pset 4 out
Mon April 11

Grauman & Leibe Ch 7, 8.1, 9.1, 11.1

Szeliski 14.1

Viola-Jones face detection paper (for additional reference)
Face detection
[ppt] [pdf] [outline]

Wed April 13

Grauman & Leibe
11.3, 11.4

Szeliski 14.4
Discriminative classifiers for image recognition
[ppt] [pdf] [outline]

Mon April 18

Grauman & Leibe
11.3, 11.4

Szeliski 14.4
Part-based models
[ppt] [pdf] [outline]


Wed April 20
Video processing
8.4, 12.6.4
Motion
[ppt] [pdf] [outline]

Pset 4 due Wed April 20
Mon April 25

8.4, 12.6.4

Davis & Bobick paper: The Representation and Recognition of Action Using Temporal Templates

Stauffer & Grimson paper: Adaptive Background Mixture Models for Real-Time Tracking.


Background subtraction, Action recognition
[ppt] [pdf] [outline]

Pset 5 out
Wed April 27

5.1.2, 4.1.4
Tracking
[ppt] [pdf]

Pset 4 grades and solutions returned
Mon May 2


Course wrap-up and review

Wed May 4











Pset 5 due Sun May 8
Mon May 16
2-5 pm



Final exam in JGB 2.102




Posted by uniqueone
http://www.cs.utexas.edu/~grauman/courses/fall2011/schedule.html

CS395T: Visual Recognition, Fall 2011





Meets:
Wednesdays 4:00-7:00 pm
ACES 3.408

Instructor: Kristen Grauman 
Email: grauman@cs
Office: ACES 3.446 

Office hours: by appointment

When emailing me, please put CS395 in the subject line.

Announcements:

See the schedule for weekly reading assignments.

Project paper drafts due Nov 23.  Details on projects are here.

Course overview:


Topics: This is a graduate seminar course in computer vision.   We will survey and discuss current vision papers relating to object recognition, auto-annotation of images, and scene understanding.  The goals of the course will be to understand current approaches to some important problems, to actively analyze their strengths and weaknesses, and to identify interesting open questions and possible directions for future research.

See the syllabus for an outline of the main topics we'll be covering.

Requirements: Students will be responsible for writing paper reviews each week, participating in discussions, completing one programming assignment, presenting once or twice in class (depending on enrollment), and completing a project (done in pairs). 

Note that presentations are due one week before the slot your presentation is scheduled.  This means you will need to read the papers, prepare experiments, create slides, etc. more than one week before the date you are signed up for.  The idea is to meet and discuss ahead of time, so that we can iterate as needed the week leading up to your presentation. 

More details on the requirements and grading breakdown are here.

Prereqs:  Courses in computer vision and/or machine learning (378/376 Computer Vision and/or 391 Machine Learning, or similar); ability to understand and analyze conference papers in this area; programming required for experiment presentations and projects. 

Please talk to me if you are unsure if the course is a good match for your background.  I generally recommend scanning through a few papers on the syllabus to gauge what kind of background is expected.  I don't assume you are already familiar with every single algorithm/tool/image feature a given paper mentions, but you should feel comfortable following the key ideas.



Syllabus overview:

  1. Single-object recognition fundamentals: representation, matching, and classification
    1. Specific objects
    2. Classification and global models
    3. Regions and mid-level representations
  2. Beyond single objects: scenes and properties
    1. Context and scenes
    2. Saliency, importance, attention
    3. Attributes
  3. External input in recognition
    1. Language and text
    2. Interactive learning and recognition
  4. Activity in images and videos
    1. Pictures of people
    2. Activity recognition
  5. Dealing with lots of data/categories
    1. Scaling with a large number of categories
    2. Large-scale search and mining
    3. Automatic summarization

Important dates:
    • Monday, Aug 29: paper topic preferences due
    • Friday, Sept 16: implementation assignment due
    • Friday, Oct 7: project proposals due
    • Wednesday, Nov 23: final project paper drafts due
    • Tuesday, Dec 6: final papers due


    Schedule and papers:


    Note:  * = required reading. 
    Additional papers are provided for reference, and as a starting point for background reading for projects.
    Paper presentations: focus on starred papers (additionally mentioning ideas from others is ok but not necessary).
    Experiment presentations: Pick from only among the starred papers.
    Date
    Topics
    Papers and links
    Presenters
    Items due
    Aug 24
    Course intro 

    [slides]
    Topic preferences due via email by Monday August 29
    I. Single-object recognition fundamentals: representation, matching, and classification
    Aug 31
    Recognizing specific objects:

    Invariant local features, instance recognition, bag-of-words models

    • *Object Recognition from Local Scale-Invariant Features, Lowe, ICCV 1999.  [pdf]  [code] [other implementations of SIFT] [IJCV]

    • *Local Invariant Feature Detectors: A Survey, Tuytelaars and Mikolajczyk.  Foundations and Trends in Computer Graphics and Vision, 2008. [pdf]  [Oxford code] [Read pp. 178-188, 216-220, 254-255]

    • *Video Google: A Text Retrieval Approach to Object Matching in Videos, Sivic and Zisserman, ICCV 2003.  [pdf]  [demo]


    • For more background on feature extraction: Szeliski book: Sec 3.2 Linear filtering, 4.1 Points and patches, 4.2 Edges

    • Scalable Recognition with a Vocabulary Tree, D. Nister and H. Stewenius, CVPR 2006. [pdf]

    • SURF: Speeded Up Robust Features, Bay, Ess, Tuytelaars, and Van Gool, CVIU 2008.  [pdf] [code]

    • Bundling Features for Large Scale Partial-Duplicate Web Image Search.  Z. Wu, Q. Ke, M. Isard, and J. Sun.  CVPR 2009.  [pdf]

    • Robust Wide Baseline Stereo from Maximally Stable Extremal Regions, J. Matas, O. Chum, U. Martin, and T. Pajdla, BMVC 2002.  [pdf]

    • City-Scale Location Recognition, G. Schindler, M. Brown, and R. Szeliski, CVPR 2007.  [pdf]

    • Object Retrieval with Large Vocabularies and Fast Spatial Matching.  J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, CVPR 2007.  [pdf]

    • I Know What You Did Last Summer: Object-Level Auto-annotation of Holiday Snaps, S. Gammeter, L. Bossard, T.Quack, L. van Gool, ICCV 2009.  [pdf]

    • Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval.  O. Chum et al. CVPR 2007.  [pdf]

    • A Performance Evaluation of Local Descriptors. K. Mikolajczyk and C. Schmid.  CVPR 2003 [pdf]


    [slides]

    Sept 7
    Recognition via classification and global models:

    Global appearance models for category and scene recognition, sliding window detection, detection as a binary decision.

    • *A Discriminatively Trained, Multiscale, Deformable Part Model, by P. Felzenszwalb, D. McAllester and D. Ramanan.  CVPR 2008.  [pdf]  [code]

    • *Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, Lazebnik, Schmid, and Ponce, CVPR 2006. [pdf]  [15 scenes dataset]  [libpmk] [Matlab]

    • *Rapid Object Detection Using a Boosted Cascade of Simple Features, Viola and Jones, CVPR 2001.  [pdf]  [code]


    • Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, CVPR 2005.  [pdf]  [video] [code] [PASCAL datasets]

    • Modeling the Shape of the Scene: a Holistic Representation of the Spatial Envelope, Oliva and Torralba, IJCV 2001.  [pdf]  [Gist code]

    • Locality-Constrained Linear Coding for Image Classification.  J. Wang, J. Yang, K. Yu,  and T. Huang  CVPR 2010. [pdf] [code]

    • Visual Categorization with Bags of Keypoints, C. Dance, J. Willamowski, L. Fan, C. Bray, and G. Csurka, ECCV International Workshop on Statistical Learning in Computer Vision, 2004.  [pdf]

    • Pedestrian Detection in Crowded Scenes, Leibe, Seemann, and Schiele, CVPR 2005.  [pdf]

    • Pyramids of Histograms of Oriented Gradients (pHOG), Bosch and Zisserman. [code]

    • Eigenfaces for Recognition, Turk and Pentland, 1991.  [pdf]

    • Sampling Strategies for Bag-of-Features Image Classification.  E. Nowak, F. Jurie, and B. Triggs.  ECCV 2006. [pdf]

    • Beyond Sliding Windows: Object Localization by Efficient Subwindow Search.  C. Lampert, M. Blaschko, and T. Hofmann.  CVPR 2008.  [pdf]  [code]

    • A Trainable System for Object Detection, C. Papageorgiou and T. Poggio, IJCV 2000.  [pdf]

    • Object Recognition with Features Inspired by Visual Cortex. T. Serre, L. Wolf and T. Poggio. CVPR 2005.  [pdf]


    [slides]

    Sept 14
    Regions and mid-level representations

    Segmentation, grouping, surface estimation

    • *Constrained Parametric Min-Cuts for Automatic Object Segmentation. J. Carreira and C. Sminchisescu. CVPR 2010.  [pdf] [code]

    • *Geometric Context from a Single Image, by D. Hoiem, A. Efros, and M. Hebert, ICCV 2005. [pdf]  [web]  [code]

    • *Contour Detection and Hierarchical Image Segmentation.  P. Arbelaez,  M. Maire, C. Fowlkes, and J. Malik. PAMI 2011.  [pdf] [data and code]


    • From Contours to Regions: An Empirical Evaluation.  P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik.  CVPR 2009.  [pdf] [code]

    • Boundary-Preserving Dense Local Regions.  J. Kim and K. Grauman.  CVPR 2011.  [pdf]  [code]

    • Object Recognition as Ranking Holistic Figure-Ground Hypotheses. F. Li, J. Carreira, and C. Sminchisescu. CVPR 2010. [pdf]

    • Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, B. C. Russell, A. A. Efros, J. Sivic, W. T. Freeman, and A. Zisserman.  CVPR 2006.  [pdf] [code]

    • Combining Top-down and Bottom-up Segmentation. E. Borenstein, E. Sharon, and S. Ullman.  CVPR  workshop 2004.  [pdf]  [data]

    • Efficient Region Search for Object Detection.  S. Vijayanarasimhan and K. Grauman. CVPR 2011.  [pdf] [code] [data]

    • Extracting Subimages of an Unknown Category from a Set of Images, S. Todorovic and N. Ahuja, CVPR 2006.  [pdf]

    • Learning Mid-level Features for Recognition. Y.-L. Boureau, F. Bach, Y. LeCun, and J. Ponce. CVPR, 2010. 

    • Class-Specific, Top-Down Segmentation, E. Borenstein and S. Ullman, ECCV 2002.  [pdf]

    • Object Recognition by Integrating Multiple Image Segmentations, C. Pantofaru, C. Schmid, and M. Hebert, ECCV 2008  [pdf]

    • Image Parsing: Unifying Segmentation, Detection, and Recognition. Tu, Z., Chen, Z., Yuille, A.L., Zhu, S.C. ICCV 2003  [pdf]

    • GrabCut: Interactive Foreground Extraction using Iterated Graph Cuts, by C. Rother, V. Kolmogorov, A. Blake, SIGGRAPH 2004.  [pdf]  [project page]

    • Recognition Using Regions.  C. Gu, J. Lim, P. Arbelaez, J. Malik, CVPR 2009.  [pdf] [code]

    • Robust Higher Order Potentials for Enforcing Label Consistency, P. Kohli, L. Ladicky, and P. Torr. CVPR 2008.  

    • Co-segmentation of Image Pairs by Histogram Matching --Incorporating a Global Constraint into MRFs, C. Rother, V. Kolmogorov, T. Minka, and A. Blake.  CVPR 2006.  [pdf]

    • Collect-Cut: Segmentation with Top-Down Cues Discovered in Multi-Object Images.  Y. J. Lee and K. Grauman. CVPR 2010.  [pdf] [data]

    • An Efficient Algorithm for Co-segmentation, D. Hochbaum, V. Singh, ICCV 2009.  [pdf]

    • Normalized Cuts and Image Segmentation, J. Shi and J. Malik.  PAMI 2000.  [pdf]  [code]


    • Greg Mori's superpixel code
    • Berkeley Segmentation Dataset and code
    • Pedro Felzenszwalb's graph-based segmentation code
    • Michael Maire's segmentation code and paper
    • Mean-shift: a Robust Approach Towards Feature Space Analysis [pdf]  [code, Matlab interface by Shai Bagon]
    • David Blei's Topic modeling code
    [slides]
    Expts: Brian, Cho-Jui
    Implementation assignment due Friday Sept 16, 5 PM
    II. Beyond single objects: scenes and properties
    Sept 21
    Context and scenes

    Multi-object scenes, inter-object relationships, understanding scenes' spatial layout, 3d context

    • *Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces.  D. Lee, A. Gupta, M. Hebert, and T. Kanade.  NIPS 2010.  [pdf] [code]

    • *Multi-Class Segmentation with Relative Location Prior.  S. Gould, J. Rodgers, D. Cohen, G. Elidan and D.  Koller.  IJCV 2008. [pdf] [code]

    • *Using the Forest to See the Trees: Exploiting Context for Visual Object Detection and Localization.  Torralba, Murphy, and Freeman.  CACM 2009.  [pdf] [related code]


    • Contextual Priming for Object Detection, A. Torralba.  IJCV 2003.  [pdf] [web] [code]

    • TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation.  J. Shotton, J. Winn, C. Rother, A. Criminisi.  ECCV 2006.  [pdf] [web] [data] [code]

    • Recognition Using Visual Phrases.  M. Sadeghi and A. Farhadi.  CVPR 2011.  [pdf]

    • Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry.  V. Hedau, D. Hoiem, and D. Forsyth.  ECCV 2010 [pdf] [code and data]

    • Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics, A. Gupta, A. Efros, and M. Hebert.  ECCV 2010. [pdf]

    • Object-Graphs for Context-Aware Category Discovery.  Y. J. Lee and K. Grauman.  CVPR 2010.  [pdf] [code]

    • Geometric Reasoning for Single Image Structure Recovery.  D. Lee, M. Hebert, and T. Kanade.  CVPR 2009.  [pdf]  [web] [code]

    • Putting Objects in Perspective, by D. Hoiem, A. Efros, and M. Hebert, CVPR 2006.  [pdf] [web]

    • Discriminative Models for Multi-Class Object Layout, C. Desai, D. Ramanan, C. Fowlkes. ICCV 2009.  [pdf]  [slides]  [SVM struct code] [data]

    • Closing the Loop in Scene Interpretation.  D. Hoiem, A. Efros, and M. Hebert.  CVPR 2008.  [pdf]

    • Decomposing a Scene into Geometric and Semantically Consistent Regions, S. Gould, R. Fulton, and D. Koller, ICCV 2009.  [pdf]  [slides]

    • Learning Spatial Context: Using Stuff to Find Things, by G. Heitz and D. Koller, ECCV 2008.  [pdf] [code]

    • An Empirical Study of Context in Object Detection, S. Divvala, D. Hoiem, J. Hays, A. Efros, M. Hebert, CVPR 2009.  [pdf]  [web]

    • Object Categorization using Co-Occurrence, Location and Appearance, by C. Galleguillos, A. Rabinovich and S. Belongie, CVPR 2008.[ pdf]

    • Context Based Object Categorization: A Critical Survey, C. Galleguillos and S. Belongie.  [pdf]

    • What, Where and Who? Classifying Events by Scene and Object Recognition, L.-J. Li and L. Fei-Fei, ICCV 2007. [pdf]

    • Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Unsupervised Framework, L-J. Li, R. Socher, L. Fei-Fei, CVPR 2009.  [pdf]

    Papers: Nishant, Jung
    Expts: Saurajit


    Sept 28
    Saliency and attention

    Among all items in the scene, which deserve attention (first)?

    • *A Model of Saliency-based Visual Attention for Rapid Scene Analysis.  L. Itti, C. Koch, and E. Niebur.  PAMI 1998  [pdf]

    • *Learning to Detect a Salient Object.  T. Liu et al. CVPR 2007.  [pdf]  [results]  [data]  [code by Vicente Ordonez]

    • *Figure-Ground Segmentation Improves Handled Object Recognition in Egocentric Video.  X. Ren and C. Gu.  CVPR 2010 [pdf] [videos] [data]

    • *What Do We Perceive in a Glance of a Real-World Scene?  L. Fei-Fei, A. Iyer, C. Koch, and P. Perona.  Journal of Vision, 2007.  [pdf]


    • Interesting Objects are Visually Salient.  L. Elazary and L. Itti.  Journal of Vision, 8(3):1–15, 2008.  [pdf]

    • Accounting for the Relative Importance of Objects in Image Retrieval.  S. J. Hwang and K. Grauman.  BMVC 2010.  [pdf] [web] [data]

    • Some Objects are More Equal Than Others: Measuring and Predicting Importance, M. Spain and P. Perona.  ECCV 2008.  [pdf]

    • What Makes an Image Memorable?  P. Isola et al. CVPR 2011. [pdf]
    • The Discriminant Center-Surround Hypothesis for Bottom-Up Saliency. D. Gao, V.Mahadevan, and N. Vasconcelos. NIPS, 2007.  [pdf]

    • Category-Independent Object Proposals.  I. Endres and D. Hoiem.  ECCV 2010.  [pdf]  [code]

    • What is an Object?  B. Alexe, T. Deselaers, and V. Ferrari.  CVPR 2010.  [pdf] [code]

    • A Principled Approach to Detecting Surprising Events in Video.  L. Itti and P. Baldi.  CVPR 2005  [pdf]

    • Optimal Scanning for Faster Object Detection,  N. Butko, J. Movellan.  CVPR 2009.  [pdf]

    • What Attributes Guide the Deployment of Visual Attention and How Do They Do It? J. Wolfe and T. Horowitz. Neuroscience, 5:495–501, 2004.  [pdf]

    • Visual Correlates of Fixation Selection: Effects of Scale and Time. B. Tatler, R. Baddeley, and I. Gilchrist. Vision Research, 45:643, 2005.  [pdf]

    • Objects Predict Fixations Better than Early Saliency.  W. Einhauser, M. Spain, and P. Perona. Journal of Vision, 8(14):1–26, 2008.  [pdf]

    • Reading Between the Lines: Object Localization Using Implicit Cues from Image Tags.  S. J. Hwang and K. Grauman.  CVPR 2010.  [pdf]  [data]

    • Peripheral-Foveal Vision for Real-time Object Recognition and Tracking in Video.  S. Gould, J. Arfvidsson, A. Kaehler, B. Sapp, M. Messner, G. Bradski, P. Baumstrack,S. Chung, A. Ng.  IJCAI 2007.  [pdf]

    • Peekaboom: A Game for Locating Objects in Images, by L. von Ahn, R. Liu and M. Blum, CHI 2006. [pdf]  [web]

    • Determining Patch Saliency Using Low-Level Context, D. Parikh, L. Zitnick, and T. Chen. ECCV 2008.  [pdf]

    • Visual Recognition and Detection Under Bounded Computational Resources, S. Vijayanarasimhan and A. Kapoor.  CVPR 2010.

    • Key-Segments for Video Object Segmentation.  Y. J. Lee, J. Kim, and K. Grauman.  ICCV 2011  [pdf]

    • Contextual Guidance of Eye Movements and Attention in Real-World Scenes: The Role of Global Features on Object Search.  A. Torralba, A. Oliva, M. Castelhano, J. Henderson.  [pdf] [web]

    • The Role of Top-down and Bottom-up Processes in Guiding Eye Movements during Visual Search, G. Zelinsky, W. Zhang, B. Yu, X. Chen, D. Samaras, NIPS 2005.  [pdf]

    Papers: Lu Xia
    Expts: Larry


    Oct 5
    Attributes:

    Visual properties, learning from natural language descriptions, intermediate representations

    • *Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, C. Lampert, H. Nickisch, and S. Harmeling, CVPR 2009  [pdf] [web] [data]

    • *Describing Objects by Their Attributes, A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, CVPR 2009.  [pdf]  [web] [data]

    • *Attribute and Simile Classifiers for Face Verification, N. Kumar, A. Berg, P. Belhumeur, S. Nayar.  ICCV 2009.  [pdf] [web] [lfw data] [pubfig data]


    • Relative Attributes.  D. Parikh and K. Grauman.  ICCV 2011.  [pdf]  [data]

    • A Discriminative Latent Model of Object Classes and Attributes.  Y. Wang and G. Mori.  ECCV, 2010.  [pdf]

    • Learning Visual Attributes, V. Ferrari and A. Zisserman, NIPS 2007.  [pdf] 

    • Learning Models for Object Recognition from Natural Language Descriptions, J. Wang, K. Markert, and M. Everingham, BMVC 2009.[pdf]

    • FaceTracer: A Search Engine for Large Collections of Images with Faces.  N. Kumar, P. Belhumeur, and S. Nayar.  ECCV 2008.  [pdf]

    • Attribute-Centric Recognition for Cross-Category Generalization.  A. Farhadi, I. Endres, D. Hoiem.  CVPR 2010.  [pdf]

    • Automatic Attribute Discovery and Characterization from Noisy Web Data.  T. Berg et al.  ECCV 2010.  [pdf]  [data]

    • Attributes-Based People Search in Surveillance Environments.  D. Vaquero, R. Feris, D. Tran, L. Brown, A. Hampapur, and M. Turk.  WACV 2009.  [pdf] [project page]

    • Image Region Entropy: A Measure of "Visualness" of Web Images Associated with One Concept.  K. Yanai and K. Barnard.  ACM MM 2005.  [pdf]

    • What Helps Where And Why? Semantic Relatedness for Knowledge Transfer. M. Rohrbach, M. Stark, G. Szarvas, I. Gurevych and B. Schiele. CVPR 2010.  [pdf]

    • Recognizing Human Actions by Attributes.  J. Liu, B. Kuipers, S. Savarese, CVPR 2011.  [pdf]

    • Interactively Building a Discriminative Vocabulary of Nameable Attributes.  D. Parikh and K. Grauman.  CVPR 2011.  [pdf] [web]

      Papers: Saurajit
      Expts: Qiming, Harsh
      Proposal abstracts due Friday Oct 7, 5 PM
      III. External input in recognition
      Oct 12
      Language and description

      Discovering the correspondence between words and other language constructs and images, generating descriptions

      • *Baby Talk: Understanding and Generating Image Descriptions.  Kulkarni et al.  CVPR 2011.  [pdf]

      • *Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers, A. Gupta and L. Davis, ECCV 2008.  [pdf]

      • *Learning Sign Language by Watching TV (using weakly aligned subtitles), P. Buehler, M. Everingham, and A. Zisserman. CVPR 2009.  [pdf]  [data] [web]


      • Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary, P. Duygulu, K. Barnard, N. de Freitas, D. Forsyth. ECCV 2002.  [pdf]  [data]

      • The Mathematics of Statistical Machine Translation: Parameter Estimation.  P. Brown, S. Della Pietro, V. Della Pietra, R. Mercer.  Association for Computational Linguistics, 1993.  [pdf] (background for Duygulu et al paper)

      • How Many Words is a Picture Worth?  Automatic Caption Generation for News Images.  Y. Feng and M. Lapata.  ACL 2010.  [pdf]
      • Matching words and pictures. K. Barnard, P. Duygulu, N. de Freitas, D. Forsyth, D. Blei, and M. Jordan.  JMLR, 3:1107–1135, 2003.  [pdf]

      • Who's Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation.  L. Jie, B. Caputo, and V. Ferrari.  NIPS 2009.  [pdf]

      • Watch, Listen & Learn: Co-training on Captioned Images and Videos.  S. Gupta, J. Kim, K. Grauman, and R. Mooney.  ECML 2008.  [pdf]

      • Systematic Evaluation of Machine Translation Methods for Image and Video Annotation, P. Virga, P. Duygulu, CIVR 2005.  [pdf]
      • Localizing Objects and Actions in Videos Using Accompanying Text.  Johns Hopkins University Summer Workshop Report.  J. Neumann et al.  2010.  [pdf]  [web]
      Papers: Chris
      Expts: Jae, Naga


      Oct 19
      Interactive learning and recognition

      Human-in-the-loop learning, active annotation collection, crowdsourcing



      • *Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds.  S. Vijayanarasimhan and K. Grauman.  CVPR 2011.  [pdf]

      • *Visual Recognition with Humans in the Loop.  Branson S., Wah C., Babenko B., Schroff F., Welinder P., Perona P., Belongie S.  ECCV 2010. [pdf]  [Caltech/UCSD Visipedia project]  [data]

      • *The Multidimensional Wisdom of Crowds.  Welinder P., Branson S., Belongie S., Perona, P. NIPS 2010. [pdf]  [code]

      • *What’s It Going to Cost You? : Predicting Effort vs. Informativeness for Multi-Label Image Annotations.  S. Vijayanarasimhan and K. Grauman.  CVPR 2009 [pdf] [data] [code]


      • iCoseg: Interactive Co-segmentation with Intelligent Scribble Guidance, D. Batra, A. Kowdle, D. Parikh, J. Luo and T. Chen. CVPR 2010.  [pdf] [web]

      • Labeling Images with a Computer Game. L. von Ahn and L. Dabbish. CHI, 2004.

      • Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise.  J. Whitehill et al.  NIPS 2009.  [pdf]
      • Utility Data Annotation with Amazon Mechanical Turk. A. Sorokin and D. Forsyth. Wkshp on Internet Vision, 2008.

      • Far-Sighted Active Learning on a Budget for Image and Video Recognition.  S. Vijayanarasimhan, P. Jain, and K. Grauman.  CVPR 2010.  [pdf]  [code]

      • Multiclass Recognition and Part Localization with Humans in the Loop.  C. Wah et al. ICCV 2011. [pdf]

      • Multi-Level Active Prediction of Useful Image Annotations for Recognition.  S. Vijayanarasimhan and K. Grauman.  NIPS 2008. [pdf] 

      • Active Learning from Crowds.  Y. Yan, R. Rosales, G. Fung, J. Dy.  ICML 2011.  [pdf]

      • Proactive Learning: Cost-Sensitive Active Learning with Multiple Imperfect Oracles.  P. Donmez and J. Carbonell.  CIKM 2008.  [pdf]
      • Inactive Learning?  Difficulties Employing Active Learning in Practice.  J. Attenberg and F. Provost.  SIGKDD 2011. [pdf]

      • Annotator Rationales for Visual Recognition.  J. Donahue and K. Grauman.  ICCV 2011. [pdf]

      • Interactively Building a Discriminative Vocabulary of Nameable Attributes.  D. Parikh and K. Grauman.  CVPR 2011.  [pdf] [web]

      • Actively Selecting Annotations Among Objects and Attributes.  A. Kovashka, S. Vijayanarasimhan, and K. Grauman.  ICCV 2011  [pdf]

      • Supervised Learning from Multiple Experts: Whom to Trust When Everyone Lies a Bit.  V. Raykar et al.  ICML 2009.  [pdf]
      • Multi-class Active Learning for Image Classification.  A. J. Joshi, F. Porikli, and N. Papanikolopoulos.  CVPR 2009.  [pdf]

      • GrabCut -Interactive Foreground Extraction using Iterated Graph Cuts, by C. Rother, V. Kolmogorov, A. Blake, SIGGRAPH 2004.  [pdf]  [project page]

      • Active Learning for Piecewise Planar 3D Reconstruction.  A. Kowdle, Y.-J. Chang, A. Gallagher and T. Chen. CVPR 2011 [pdf] [web]

      • Amazon Mechanical Turk
      • Using Mechanical Turk with LabelMe
        Papers: Brian, Harsh
        Expts: Yunsik

        Proposal extended outline due Friday Oct 21, 5 PM
        IV. Activity in images and video
        Oct 26
        Pictures of people

        Finding people and their poses, automatic face tagging

        pose

        • *Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations, L. Bourdev and J. Malik.  ICCV 2009  [pdf] [code]

        • *Understanding Images of Groups of People, A. Gallagher and T. Chen, CVPR 2009.  [pdf]  [web] [data]

        • *Real-Time Human Pose Recognition in Parts from a Single Depth Image.  J. Shotton et al.  CVPR 2011. [pdf] [video]

        • *"'Who are you?' - Learning Person Specific Classifiers from Video, J. Sivic, M. Everingham, and A. Zisserman, CVPR 2009.  [pdf] [data] [KLT tracking code]


        • Contextual Identity Recognition in Personal Photo Albums. D. Anguelov, K.-C. Lee, S. Burak, Gokturk, and B. Sumengen. CVPR 2007.  [pdf]

        • Fast Pose Estimation with Parameter Sensitive Hashing.  G. Shakhnarovich, P. Viola, T. Darrell, ICCV 2003.[pdf]

        • Finding and Tracking People From the Bottom Up.  D. Ramanan, D. A. Forsyth.  CVPR 2003.  [pdf]

        • Where’s Waldo: Matching People in Images of Crowds.  R. Garg, D. Ramanan, S. Seitz, N. Snavely. CVPR 2011.  [pdf]

        • Autotagging Facebook: Social Network Context Improves Photo Annotation, by  Z. Stone, T. Zickler, and T. Darrell.  CVPR Internet Vision Workshop 2008.   [pdf]

        • Efficient Propagation for Face Annotation in Family Albums. L. Zhang, Y. Hu, M. Li, and H. Zhang.  MM 2004.  [pdf]

        • Progressive Search Space Reduction for Human Pose Estimation.  Ferrari, V., Marin-Jimenez, M. and Zisserman, A.  CVPR 2008.  [pdf] [web] [code]

        • Leveraging Archival Video for Building Face Datasets, by D. Ramanan, S. Baker, and S. Kakade.  ICCV 2007.  [pdf]
        • Names and Faces in the News, by T. Berg, A. Berg, J. Edwards, M. Maire, R. White, Y. Teh, E. Learned-Miller and D. Forsyth, CVPR 2004.  [pdf]  [web]

        • Face Discovery with Social Context.  Y. J. Lee and K. Grauman.  BMVC 2011.  [pdf]

        • “Hello! My name is... Buffy” – Automatic Naming of Characters in TV Video, by M. Everingham, J. Sivic and A. Zisserman, BMVC 2006.  [pdf]  [web]  [data]

        • Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities. Yao, B., Fei-Fei, L.  CVPR 2010.

        • A Face Annotation Framework with Partial Clustering and Interactive Labeling.  Y. Tian, W. Liu, R. Xiao, F. Wen, and X. Tang.  CVPR 2007.  [pdf] [web]

        • From 3D Scene Geometry to Human Workspace.  A. Gupta et al.  CVPR 2011.  [pdf] [web]

        • Pictorial Structures Revisited: People Detection and Articulated Pose Estimation.  M. Andriluka et al. CVPR 2009.  [pdf]  [code]

        Papers: Sunil, Larry
        Expts: Nishant, Jung


        Nov 2
        Activity recognition

        Recognizing and localizing human actions in video

        actions
        • *Actions in Context, M. Marszalek, I. Laptev, C. Schmid.  CVPR 2009.  [pdf] [web] [data]

        • *A Hough Transform-Based Voting Framework for Action Recognition.  A. Yao, J. Gall, L. Van Gool.  CVPR 2010.  [pdf] [code/data]

        • *Beyond Actions: Discriminative Models for Contextual Group Activities.  T. Lan, Y. Wang, W. Yang, and G. Mori.  NIPS 2010.  [pdf] [data]


        • Objects in Action: An Approach for Combining Action Understanding and Object Perception.   A. Gupta and L. Davis.  CVPR, 2007.  [pdf]  [data]

        • Learning Realistic Human Actions from Movies.  I. Laptev, M. Marszałek, C. Schmid and B. Rozenfeld.  CVPR 2008.  [pdf]  [data]

        • Understanding Egocentric Activities.  A. Fathi, A. Farhadi, J. Rehg.  ICCV 2011. [pdf]

        • Exploiting Human Actions and Object Context for Recognition Tasks.  D. Moore, I. Essa, and M. Hayes.  ICCV 1999.  [pdf]

        • A Scalable Approach to Activity Recognition Based on Object Use. J. Wu, A. Osuntogun, T. Choudhury, M. Philipose, and J. Rehg.  ICCV 2007.  [pdf]

        • Recognizing Actions at a Distance.  A. Efros, G. Mori, J. Malik.  ICCV 2003.  [pdf] [web]

        • Activity Recognition from First Person Sensing.  E. Taralova, F. De la Torre, M. Hebert  CVPR 2009 Workshop on Egocentric Vision  [pdf]

        • Action Recognition from a Distributed Representation of Pose and Appearance, S. Maji, L. Bourdev, J.  Malik, CVPR 2011.  [pdf]  [code]

        • Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition.  A. Kovashka and K. Grauman.  CVPR 2010.  [pdf]

        • Temporal Causality for the Analysis of Visual Events.  K. Prabhakar, S. Oh, P. Wang, G. Abowd, and J. Rehg.  CVPR 2010.  [pdf] [Georgia Tech Computational Behavior Science project]

        • Modeling Activity Global Temporal Dependencies using Time Delayed Probabilistic Graphical Model.  Loy, Xiang & Gong ICCV 2009.  [pdf]

        • What's Going on?: Discovering Spatio-Temporal Dependencies in Dynamic Scenes.  D. Kuettel et al.  CVPR 2010.  [pdf]

        • Learning Actions From the Web.  N. Ikizler-Cinbis, R. Gokberk Cinbis, S. Sclaroff.  ICCV 2009.  [pdf]

        • Content-based Retrieval of Functional Objects in Video Using Scene Context.  S. Oh, A. Hoogs, M. Turek, and R. Collins.  ECCV 2010.  [pdf]
        Papers: Qiming, Yunsik
        Expts: Lu Xia


        V. Dealing with lots of data/categories
        Nov 9
        Scaling with a large number of categories

        Sharing features between classes, transfer, taxonomy, learning from few examples, exploiting class relationships

        shared
        • *Sharing Visual Features for Multiclass and Multiview Object Detection, A. Torralba, K. Murphy, W. Freeman, PAMI 2007.  [pdf]  [code]

        • *What Does Classifying More than 10,000 Image Categories Tell Us? J. Deng, A. Berg, K. Li and L. Fei-Fei.  ECCV 2010.  [pdf]

        • *Discriminative Learning of Relaxed Hierarchy for Large-scale Visual Recognition.  T. Gao and D. Koller.  ICCV 2011.  [pdf] [code]


        • Comparative Object Similarity for Improved Recognition with Few or Zero Examples. G. Wang, D. Forsyth, and D. Hoiem. CVPR 2010. [pdf]

        • Learning and Using Taxonomies for Fast Visual Categorization, G. Griffin and P. Perona, CVPR 2008.  [pdf] [data]

        • Cross-Generalization: Learning Novel Classes from a Single Example by Feature Replacement.  E. Bart and S. Ullman.  CVPR 2005.  [pdf]

        • 80 Million Tiny Images: A Large Dataset for Non-Parametric Object and Scene Recognition, by A. Torralba, R. Fergus, and W. Freeman.  PAMI 2008.  [pdf] [web]

        • Constructing Category Hierarchies for Visual Recognition, M. Marszalek and C. Schmid.  ECCV 2008.  [pdf]  [web] [Caltech256]

        • Learning Generative Visual Models from Few Training Examples: an Incremental Bayesian Approach Tested on 101 Object Categories. L. Fei-Fei, R. Fergus, and P. Perona. CVPR Workshop on Generative-Model Based Vision. 2004.  [pdf] [Caltech101]

        • Towards Scalable Representations of Object Categories: Learning a Hierarchy of Parts. S. Fidler and A. Leonardis.  CVPR 2007  [pdf]

        • Exploiting Object Hierarchy: Combining Models from Different Category Levels, A. Zweig and D. Weinshall, ICCV 2007 [pdf]

        • Incremental Learning of Object Detectors Using a Visual Shape Alphabet.  Opelt, Pinz, and Zisserman, CVPR 2006.  [pdf]

        • Sequential Learning of Reusable Parts for Object Detection.  S. Krempp, D. Geman, and Y. Amit.  2002  [pdf]

        • ImageNet: A Large-Scale Hierarchical Image Database, J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, CVPR 2009 [pdf]  [data]

        • Semantic Label Sharing for Learning with Many Categories.  R. Fergus et al.  ECCV 2010.  [pdf]

        • Learning a Tree of Metrics with Disjoint Visual Features.  S. J. Hwang, K. Grauman, F. Sha.  NIPS 2011. 

        Papers: Cho-Jui, Si Si
        Expts: Lu Pan


        Nov 16
        Large-scale search and mining

        Scalable retrieval algorithms for massive databases, mining for themes

        hash
        • *VisualRank: Applying PageRank to Large-Scale Image Search.  Y. Jing and S. Baluja.  PAMI 2008.  [pdf]

        • *Kernelized Locality Sensitive Hashing for Scalable Image Search, by B. Kulis and K. Grauman, ICCV 2009 [pdf]  [code]

        • *Video Mining with Frequent Itemset Configurations.  T. Quack, V. Ferrari, and L. Van Gool.  CIVR 2006.  [pdf]


        • Learning Binary Projections for Large-Scale Image Search.  K. Grauman and R. Fergus.  Chapter (draft) to appear in Registration, Recognition, and Video Analysis, R. Cipolla, S. Battiato, and G. Farinella, Editors.  [pdf]

        • World-scale Mining of Objects and Events from Community Photo Collections.  T. Quack, B. Leibe, and L. Van Gool.  CIVR 2008.  [pdf]

        • Interest Seam Image.  X. Zhang, G. Hua, L. Zhang, H. Shum.  CVPR 2010.  [pdf]

        • Detecting Objects in Large Image Collections and Videos by Efficient Subimage Retrieval, C. Lampert, ICCV 2009.  [pdf]  [code]

        • Geometric Min-Hashing: Finding a (Thick) Needle in a Haystack, O. Chum, M. Perdoch, and J. Matas.  CVPR 2009.  [pdf]

        • FaceTracer: A Search Engine for Large Collections of Images with Faces.  N. Kumar, P. Belhumeur, and S. Nayar.  ECCV 2008.  [pdf]

        • Efficiently Searching for Similar Images.  K. Grauman.  Communications of the ACM, 2009.  [CACM link]

        • Fast Image Search for Learned Metrics, P. Jain, B. Kulis, and K. Grauman, CVPR 2008.  [pdf]

        • Small Codes and Large Image Databases for Recognition, A. Torralba, R. Fergus, and Y. Weiss, CVPR 2008.  [pdf]

        • Object Retrieval with Large Vocabularies and Fast Spatial Matching.  J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, CVPR 2007.  [pdf]


        Papers: Naga, Jae
        Expts: Si Si


        Nov 23
        Summarization

        Video synopsis, discovering repeated objects, visualization

        synopsis
        • *Webcam Synopsis: Peeking Around the World, by Y. Pritch, A. Rav-Acha, A. Gutman, and S. Peleg, ICCV 2007.  [pdf] [web]

        • *Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, B. C. Russell, A. A. Efros, J. Sivic, W. T. Freeman, and A. Zisserman.  CVPR 2006.  [pdf] [code]

        • *Summarizing Visual Data Using Bi-Directional Similarity.  D. Simakov, Y. Caspi, E. Shechtman, M. Irani.  CVPR 2008.  [pdf] [video]


        • Fast Unsupervised Ego-Action Learning for First-Person Sports Video.  K. Kitani, T. Okabe, Y. Sato, A. Sugimoto.  CVPR 2011.  [pdf]

        • Scene Summarization for Online Image Collections.  I. Simon, N. Snavely, S. Seitz.  ICCV 2007.  [pdf]  [web]

        • VideoCut: Removing Irrelevant Frames by Discovering the Object of Interest. D. Liu, G. Hua, T. Chen.  ECCV 2010.  [pdf]

        • Video Epitomes. V. Cheung, B. J. Frey, and N. Jojic.  CVPR 2005. [pdf] [web] [code]

        • Making a Long Video Short. A. Rav-Acha, Y. Pritch, and S. Peleg.  CVPR 2006. [pdf]

        • Structural Epitome: A Way to Summarize One's Visual Experience.  N. Jojic, A. Perina, V. Murino.  NIPS 2010.  [pdf] [data]

        • Video Abstraction: A Systematic Review and Classification.  B. Truong and S. Venkatesh.  ACM 2007.  [pdf]

        • Shape Discovery from Unlabeled Image Collections.  Y. J. Lee and K. Grauman.  CVPR 2009.  [pdf]
        • Detecting and Sketching the Common.  S. Bagon, O. Brostovski, M. Galun, M. Irani.  CVPR 2010.  [pdf]
        • Object-Graphs for Context-Aware Category Discovery.  Y. J. Lee and K. Grauman.  CVPR 2010.  [pdf] [code]

        • Unsupervised Object Discovery: A Comparison.  T. Tuytelaars et al.  IJCV 2009.  [pdf]

        Papers: Lu Pan
        Expts: Sunil, Chris

        Final paper drafts due Wed Nov 23
        Nov 30
        Final project presentations in class


        Final papers due Tues Dec 6, 5 PM



        schedule_CS395T Visual Recognition (Fall 2012).htm

         

        http://www.cs.utexas.edu/~cv-fall2012/schedule.html 

         

        CS395T: Visual Recognition, Fall 2012



        Course overview        Useful links        Syllabus        Detailed schedule          Blackboard


        Meets:
        Fridays 1-4 pm in ACES 3.408

        Instructor: Kristen Grauman 
        Email: grauman@cs
        Office: ACES 3.446 
        Office hours: by appointment (send email)


        TA: Austin Waters
        Email: austin@cs
        Office hours: by appointment (send email)

        When emailing us, please put CS395 in the subject line.

        Announcements:

        See the schedule for weekly reading assignments. 

        Project extended outlines due Wed Oct 31.  See handout for guidelines.


        Course overview:


        Topics: This is a graduate seminar course in computer vision.   We will survey and discuss current vision papers relating to object and activity recognition, auto-annotation of images, and scene understanding.  The goals of the course will be to understand current approaches to some important problems, to actively analyze their strengths and weaknesses, and to identify interesting open questions and possible directions for future research.

        See the syllabus for an outline of the main topics we'll be covering.

        Requirements: Students will be responsible for writing paper reviews each week, participating in discussions, completing two programming assignments, presenting once or twice in class (depending on enrollment), and completing a project (done in pairs). 

        Note that presentations are due one week before the slot your presentation is scheduled.  This means you will need to read the papers, prepare experiments, create slides, etc. more than one week before the date you are signed up for.  The idea is to meet and discuss ahead of time, so that we can iterate as needed the week leading up to your presentation. 

        More details on the requirements and grading breakdown are here.

        Prereqs:  Courses in computer vision and/or machine learning (378/376 Computer Vision and/or 391 Machine Learning, or similar); ability to understand and analyze conference papers in this area; programming required for experiment presentations and projects. 

        Please talk to me if you are unsure if the course is a good match for your background.  I generally recommend scanning through a few papers on the syllabus to gauge what kind of background is expected.  I don't assume you are already familiar with every single algorithm/tool/image feature a given paper mentions, but you should feel comfortable following the key ideas.



        Syllabus overview:


        A. Object recognition fundamentals
        1. Local features and matching for object instances
        2. Large-scale image/object search and mining
        3. Classification and detection for object categories
        4. Mid-level representations
        B. Beyond modeling individual objects
        1. Context and scenes
        2. Dealing with many categories
        3. Describing objects with attributes
        4. Importance and saliency

        C. Human-centered recognition

        1. Pictures of people
        2. Activity recognition
        3. Egocentric cameras
        4. Human-in-the-loop interactive systems

          Important dates:
            • Wednesday, Sept 5: paper topic preferences due
            • Friday, Sept 21: first coding assignment due
            • Friday Oct 5: second coding assignment due
            • Wednesday, Oct 17: project proposal abstracts due
            • Wednesday, Oct 31: project extended outlines due
            • Friday Dec 7: final papers due


            Schedule and papers:


            Note:  * = required reading. 
            Additional papers are provided for reference, and as a starting point for background reading for projects.
            Paper presentations: Cover the starred papers.
            Experiment presentations: Pick one from among the starred papers.
            Date
            Topics
            Papers and links
            Presenters
            Items due
            Aug 31
            Course intro 

            [slides]
            Topic preferences due via email to Austin (austin@cs) by Wed Sept 5 at 5 pm
            A. Object recognition fundamentals
            Sept 7
            Local features and matching for object instances:

            Invariant local features, instance recognition, visual vocabularies and bag-of-words

            sift
            • *Object Recognition from Local Scale-Invariant Features, Lowe, ICCV 1999.  [pdf]  [code] [other implementations of SIFT] [IJCV]

            • *Selected pages from: Local Invariant Feature Detectors: A Survey, Tuytelaars and Mikolajczyk.  Foundations and Trends in Computer Graphics and Vision, 2008. [pdf]  [Oxford code] [Read pp. 178-188, 216-220, 254-255]

            • *Video Google: A Text Retrieval Approach to Object Matching in Videos, Sivic and Zisserman, ICCV 2003.  [pdf]  [demo]

            • For more background on feature extraction: Szeliski book: Sec 3.2 Linear filtering, 4.1 Points and patches, 4.2 Edges

            • Scalable Recognition with a Vocabulary Tree, D. Nister and H. Stewenius, CVPR 2006. [pdf]

            • SURF: Speeded Up Robust Features, Bay, Ess, Tuytelaars, and Van Gool, CVIU 2008.  [pdf] [code]

            • Robust Wide Baseline Stereo from Maximally Stable Extremal Regions, J. Matas, O. Chum, U. Martin, and T. Pajdla, BMVC 2002.  [pdf]

            • A Performance Evaluation of Local Descriptors. K. Mikolajczyk and C. Schmid.  CVPR 2003 [pdf]

            [outline]
            [filters]
            [local features]
            [matching and spatial verification]

            Sept 14
            Large-scale image/object search and mining:

            Scalable retrieval algorithms, mining for visual themes, particularly for object instances

            query expansion
            • *Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval.  O. Chum et al. CVPR 2007.  [pdf] [Oxford buildings dataset]

            • *Discovering Favorite Views of Popular Places with Iconoid Shift.  T. Weyand and B. Leibe.  ICCV 2011.  [pdf] [Paris 500K dataset]

            • *Supervised Hashing with Kernels.  W. Liu, J. Wang, R. Ji, Y. Jiang, S.-F. Chang.  CVPR 2012 [pdf]

            • Kernelized Locality Sensitive Hashing for Scalable Image Search, by B. Kulis and K. Grauman, ICCV 2009 [pdf]  [code] [80M Tiny Images data]

            • Image Webs: Computing and Exploiting Connectivity in Image Collections.  K. Heath, N. Gelfand, M. Ovsjanikov, M. Aanjaneya, and L. Guibas.  CVPR 2010.  [pdf]
            • World-scale Mining of Objects and Events from Community Photo Collections.  T. Quack, B. Leibe, and L. Van Gool.  CIVR 2008.  [pdf]

            • Total Recall II: Query Expansion Revisited.  O. Chum, A. Mikulik, M. Perdoch, and J. Matas.  CVPR 2011.  [pdf]

            • Geometric Min-Hashing: Finding a (Thick) Needle in a Haystack, O. Chum, M. Perdoch, and J. Matas.  CVPR 2009.  [pdf]

            • Three Things Everyone Should Know to Improve Object Retrieval.  R. Arandjelovic and A. Zisserman.  CVPR 2012.  [pdf]

            • Video Mining with Frequent Itemset Configurations.  T. Quack, V. Ferrari, and L. Van Gool.  CIVR 2006.  [pdf]

            • Bundling Features for Large Scale Partial-Duplicate Web Image Search.  Z. Wu, Q. Ke, M. Isard, and J. Sun.  CVPR 2009.  [pdf]

            • Improving Image-based Localization by Active Correspondence Search. T. Sattler, B. Leibe, L. Kobbelt.  ECCV 2012.  [pdf]

            • Learning Binary Projections for Large-Scale Image Search.  K. Grauman and R. Fergus.  Chapter to appear in Registration, Recognition, and Video Analysis, R. Cipolla, S. Battiato, and G. Farinella, Editors.  [pdf]

            • Learning Query-dependent Prefilters for Scalable Image Retrieval.  L. Torresani, M. Szummer, and A. Fitzgibbon.  CVPR 2009.  [pdf]

            • Detecting Objects in Large Image Collections and Videos by Efficient Subimage Retrieval, C. Lampert, ICCV 2009.  [pdf]  [code]

            • Efficiently Searching for Similar Images.  K. Grauman.  Communications of the ACM, 2009.  [CACM link]

            • Fast Image Search for Learned Metrics, P. Jain, B. Kulis, and K. Grauman, CVPR 2008.  [pdf]

            • Small Codes and Large Image Databases for Recognition, A. Torralba, R. Fergus, and Y. Weiss, CVPR 2008.  [pdf]

            • Object Retrieval with Large Vocabularies and Fast Spatial Matching.  J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, CVPR 2007.  [pdf] [approx k-means code]

            • City-Scale Location Recognition, G. Schindler, M. Brown, and R. Szeliski, CVPR 2007.  [pdf]
            [outline]
            [wrap-up on instance recognition, large-scale search]

            Sept 21
            Classification and detection for object categories

            Global appearance models for category and scene recognition; sliding window detection, voting-based detection, detection as a binary decision problem.

            dpm
            • *A Discriminatively Trained, Multiscale, Deformable Part Model, by P. Felzenszwalb, D. McAllester and D. Ramanan.  CVPR 2008.  [pdf]  [code]

            • *Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, Lazebnik, Schmid, and Ponce, CVPR 2006. [pdf]  [15 scenes dataset]  [libpmk] [Matlab]

            • *Class-specific Hough Forests for Object Detection.  J. Gall and V. Lempitsky.  CVPR 2009.  [pdf] [slides] [code]

            • Robust Object Detection with Interleaved Categorization and Segmentation.  B. Leibe, A. Leonardis, and B. Schiele.  IJCV 2008.  [pdf]  [code]

            • The Devil is in the Details: an Evaluation of Recent Feature Encoding Methods.  K. Chatfield, V. Lempitsky, A. Vedaldi, A. Zisserman.  BMVC 2011.  [pdf] [code]

            • Rapid Object Detection Using a Boosted Cascade of Simple Features, Viola and Jones, CVPR 2001.  [pdf]  [code]

            • Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, CVPR 2005.  [pdf]  [video] [code] [PASCAL datasets]

            • Modeling the Shape of the Scene: a Holistic Representation of the Spatial Envelope, Oliva and Torralba, IJCV 2001.  [pdf]  [Gist code]

            • Locality-Constrained Linear Coding for Image Classification.  J. Wang, J. Yang, K. Yu,  and T. Huang  CVPR 2010. [pdf] [code]

            • Visual Categorization with Bags of Keypoints, C. Dance, J. Willamowski, L. Fan, C. Bray, and G. Csurka, ECCV International Workshop on Statistical Learning in Computer Vision, 2004.  [pdf]

            • Pedestrian Detection in Crowded Scenes, Leibe, Seemann, and Schiele, CVPR 2005.  [pdf]

            • Pyramids of Histograms of Oriented Gradients (pHOG), Bosch and Zisserman. [code]

            • Sampling Strategies for Bag-of-Features Image Classification.  E. Nowak, F. Jurie, and B. Triggs.  ECCV 2006. [pdf]

            • Beyond Sliding Windows: Object Localization by Efficient Subwindow Search.  C. Lampert, M. Blaschko, and T. Hofmann.  CVPR 2008.  [pdf]  [code]

            • Diagnosing Error in Object Detectors.  D. Hoiem et al. ECCV 2012.  [pdf]
            [outline]
            [slides part 1]
            Heath-expt
            Nona-paper

            HW1 due Friday Sept 21, 11:59 pm
            Sept 28
            Mid-level representations

            Segmentation into regions, grouping, surface estimation

            surfaces


            • *Constrained Parametric Min-Cuts for Automatic Object Segmentation. J. Carreira and C. Sminchisescu. CVPR 2010.  [pdf] [code]

            • *From Contours to Regions: An Empirical Evaluation.  P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik.  CVPR 2009.  [pdf] [code and data] [journal paper]

            • *Indoor Segmentation and Support Inference from RGBD Images.  N. Silberman, D. Hoiem, P. Kohli, and R. Fergus.  ECCV 2012.  [pdf] [NYU depth dataset]
            • Geometric Context from a Single Image, D. Hoiem, A. Efros, and M. Hebert, ICCV 2005. [pdf]  [web]  [code]

            • Category Independent Object Proposals.  I. Endres and D. Hoiem.  ECCV 2010.  [pdf] [code/data]

            • Geometric reasoning for single image structure recovery.  D. Lee, M. Hebert, T. Kanade.  CVPR 2009.  [pdf]  [code]

            • Boundary-Preserving Dense Local Regions.  J. Kim and K. Grauman.  CVPR 2011.  [pdf]  [code]

            • Object Recognition as Ranking Holistic Figure-Ground Hypotheses. F. Li, J. Carreira, and C. Sminchisescu. CVPR 2010. [pdf]

            • People Watching: Human Actions as a Cue for Single View Geometry.  D. Fouhey et al. ECCV 2012.  [pdf]
            • Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, B. C. Russell, A. A. Efros, J. Sivic, W. T. Freeman, and A. Zisserman.  CVPR 2006.  [pdf] [code]

            • Combining Top-down and Bottom-up Segmentation. E. Borenstein, E. Sharon, and S. Ullman.  CVPR  workshop 2004.  [pdf]  [data]

            • Learning Mid-level Features for Recognition. Y.-L. Boureau, F. Bach, Y. LeCun, and J. Ponce. CVPR, 2010. 

            • Class-Specific, Top-Down Segmentation, E. Borenstein and S. Ullman, ECCV 2002.  [pdf]

            • GrabCut -Interactive Foreground Extraction using Iterated Graph Cuts, by C. Rother, V. Kolmogorov, A. Blake, SIGGRAPH 2004.  [pdf]  [project page]

            • Robust Higher Order Potentials for Enforcing Label Consistency, P. Kohli, L. Ladicky, and P. Torr. CVPR 2008.  

            • Collect-Cut: Segmentation with Top-Down Cues Discovered in Multi-Object Images.  Y. J. Lee and K. Grauman. CVPR 2010.  [pdf] [data]

            • Shape Sharing for Object Segmentation.  J. Kim and K. Grauman.  ECCV 2012.  [pdf]
            • Normalized Cuts and Image Segmentation, J. Shi and J. Malik.  PAMI 2000.  [pdf]  [code]

            [outline]
            [slides]
            Che-Chun-expt
            Elad-expt
            Sanmit-paper
            Islam-paper
            Chao-paper


            B. Beyond modeling individual objects
            Oct 5
            Context and scenes

            Multi-object scenes, inter-object relationships, understanding scenes' spatial layout

            context
            • *Scene Semantics from Long-term Observation of People.  V. Delaitre, D. Fouhey, I. Laptev, J. Sivic, A. Gupta, A. Efros.  ECCV 2012 [pdf] [web] [pose code]
            • *Multi-Class Segmentation with Relative Location Prior.  S. Gould, J. Rodgers, D. Cohen, G. Elidan and D.  Koller.  IJCV 2008. [pdf] [code]

            • *Using the Forest to See the Trees: Exploiting Context for Visual Object Detection and Localization.  Torralba, Murphy, and Freeman.  CACM 2009.  [pdf] [related code]

            • Object-Graphs for Context-Aware Category Discovery.  Y. J. Lee and K. Grauman.  CVPR 2010.  [pdf] [code]

            • Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces.  D. Lee, A. Gupta, M. Hebert, and T. Kanade.  NIPS 2010.  [pdf] [code]

            • Contextual Priming for Object Detection, A. Torralba.  IJCV 2003.  [pdf] [web] [code]

            • Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification.  L-J. Li, H. Su, E. Xing, L. Fei-Fei.  NIPS 2010.  [pdf]  [code]

            • RGB-D scene labeling: features and algorithms. X. Ren, L. Bo, and D. Fox.  CVPR 2012. [pdf] [code]

            • TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation.  J. Shotton, J. Winn, C. Rother, A. Criminisi.  ECCV 2006.  [pdf] [web] [data] [code]

            • Recognition Using Visual Phrases.  M. Sadeghi and A. Farhadi.  CVPR 2011.  [pdf]

            • Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry.  V. Hedau, D. Hoiem, and D. Forsyth.  ECCV 2010 [pdf] [code and data]

            • Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics, A. Gupta, A. Efros, and M. Hebert.  ECCV 2010. [pdf]  [code]

            • Geometric Reasoning for Single Image Structure Recovery.  D. Lee, M. Hebert, and T. Kanade.  CVPR 2009.  [pdf]  [web] [code]

            • Putting Objects in Perspective, by D. Hoiem, A. Efros, and M. Hebert, CVPR 2006.  [pdf] [web]

            • Discriminative Models for Multi-Class Object Layout, C. Desai, D. Ramanan, C. Fowlkes. ICCV 2009.  [pdf]  [slides]  [SVM struct code] [data]

            • Closing the Loop in Scene Interpretation.  D. Hoiem, A. Efros, and M. Hebert.  CVPR 2008.  [pdf]

            • Decomposing a Scene into Geometric and Semantically Consistent Regions, S. Gould, R. Fulton, and D. Koller, ICCV 2009.  [pdf]  [slides]

            • Learning Spatial Context: Using Stuff to Find Things, by G. Heitz and D. Koller, ECCV 2008.  [pdf] [code]

            • An Empirical Study of Context in Object Detection, S. Divvala, D. Hoiem, J. Hays, A. Efros, M. Hebert, CVPR 2009.  [pdf]  [web]

            • Object Categorization using Co-Occurrence, Location and Appearance, by C. Galleguillos, A. Rabinovich and S. Belongie, CVPR 2008.[ pdf]

            • Context Based Object Categorization: A Critical Survey.  C. Galleguillos and S. Belongie.  [pdf]

            • What, Where and Who? Classifying Events by Scene and Object Recognition, L.-J. Li and L. Fei-Fei, ICCV 2007. [pdf]

            • Simultaneous Visual Recognition of Manipulation Actions and Manipulated Objects.  H. Kjellstrom et al. ECCV 2008.  [pdf]

            • Modeling mutual context of object and human pose in human-object interaction activities.   B. Yao and L. Fei-Fei.  CVPR 2010.  [pdf]

            Jacob-paper
            Aron-paper
            Aashish-expt
            David-expt
            HW2 due, Friday Oct 5, 11:59 pm
            Oct 12
            Dealing with many categories

            Sharing features between classes, transfer, taxonomy, learning from few examples, exploiting class relationships

            shared features
            • *Sharing Visual Features for Multiclass and Multiview Object Detection, A. Torralba, K. Murphy, W. Freeman, PAMI 2007.  [pdf]  [code]

            • *Hedging Your Bets: Optimizing Accuracy-Specificity Trade-offs in Large Scale Visual Recognition.  J. Deng, J. Krause, A. Berg, L. Fei-Fei.  CVPR 2012 [pdf] [supp] [ILSVRC data]

            • *Tabula Rasa: Model Transfer for Object Category Detection. Y. Aytar and A. Zisserman.  ICCV 2011. [pdf] [HoG code]

            • What Does Classifying More than 10,000 Image Categories Tell Us? J. Deng, A. Berg, K. Li and L. Fei-Fei.  ECCV 2010.  [pdf]

            • Discriminative Learning of Relaxed Hierarchy for Large-scale Visual Recognition.  T. Gao and D. Koller.  ICCV 2011.  [pdf] [code]

            • Comparative Object Similarity for Improved Recognition with Few or Zero Examples. G. Wang, D. Forsyth, and D. Hoiem. CVPR 2010. [pdf]

            • Learning and Using Taxonomies for Fast Visual Categorization, G. Griffin and P. Perona, CVPR 2008.  [pdf] [data]

            • 80 Million Tiny Images: A Large Dataset for Non-Parametric Object and Scene Recognition, by A. Torralba, R. Fergus, and W. Freeman.  PAMI 2008.  [pdf] [web]

            • Constructing Category Hierarchies for Visual Recognition, M. Marszalek and C. Schmid.  ECCV 2008.  [pdf]  [web] [Caltech256]

            • Learning Generative Visual Models from Few Training Examples: an Incremental Bayesian Approach Tested on 101 Object Categories. L. Fei-Fei, R. Fergus, and P. Perona. CVPR Workshop on Generative-Model Based Vision. 2004.  [pdf] [Caltech101]

            • Towards Scalable Representations of Object Categories: Learning a Hierarchy of Parts. S. Fidler and A. Leonardis.  CVPR 2007  [pdf]

            • Exploiting Object Hierarchy: Combining Models from Different Category Levels, A. Zweig and D. Weinshall, ICCV 2007 [pdf]

            • Incremental Learning of Object Detectors Using a Visual Shape Alphabet.  Opelt, Pinz, and Zisserman, CVPR 2006.  [pdf]

            • ImageNet: A Large-Scale Hierarchical Image Database, J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, CVPR 2009 [pdf]  [data]

            • Semantic Label Sharing for Learning with Many Categories.  R. Fergus et al.  ECCV 2010.  [pdf]

            • Learning a Tree of Metrics with Disjoint Visual Features.  S. J. Hwang, K. Grauman, F. Sha.  NIPS 2011. 

            Elad-paper
            Gary-expt


            Wed Oct 17 project proposal abstract due
            Oct 19
            Describing objects with attributes

            Visual properties, learning from natural language descriptions, intermediate representations

            attributes
            • *Describing Objects by Their Attributes, A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, CVPR 2009.  [pdf]  [web] [data]

            • *FaceTracer: A Search Engine for Large Collections of Images with Faces.  N. Kumar, P. Belhumeur, and S. Nayar.  ECCV 2008.  [pdf] [code, data, demo]
            • *Relative Attributes.  D. Parikh and K. Grauman.  ICCV 2011.  [pdf]  [code/data]

            • Attribute and Simile Classifiers for Face Verification, N. Kumar, A. Berg, P. Belhumeur, S. Nayar.  ICCV 2009.  [pdf] [web] [lfw data] [pubfig data]

            • Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, C. Lampert, H. Nickisch, and S. Harmeling, CVPR 2009  [pdf] [web] [data]

            • A Joint Learning Framework for Attribute Models and Object Descriptions.  D. Mahajan, S. Sellamanickam, V. Nair.  ICCV 2011.  [pdf]

            • WhittleSearch: Image Search with Relative Attribute Feedback.  A. Kovashka, D. Parikh, K. Grauman.  CVPR 2012.  [pdf] [data]
            • SUN Attribute Database: Discovering, Annotating, and Recognizing Scene Attributes.  G. Patterson and J. Hays.  CVPR 2012.  [pdf] [data]

            • Multi-Attribute Spaces: Calibration for Attribute Fusion and Similarity Search.  W. Scheirer, N. Kumar, P. Belhumeur, T. Boult.  CVPR 2012  [pdf]

            • A Discriminative Latent Model of Object Classes and Attributes.  Y. Wang and G. Mori.  ECCV, 2010.  [pdf]

            • Learning Visual Attributes, V. Ferrari and A. Zisserman, NIPS 2007.  [pdf] 

            • Learning Models for Object Recognition from Natural Language Descriptions, J. Wang, K. Markert, and M. Everingham, BMVC 2009.[pdf]

            • Attribute-Centric Recognition for Cross-Category Generalization.  A. Farhadi, I. Endres, D. Hoiem.  CVPR 2010.  [pdf]

            • Automatic Attribute Discovery and Characterization from Noisy Web Data.  T. Berg et al.  ECCV 2010.  [pdf]  [data]

            • Attributes-Based People Search in Surveillance Environments.  D. Vaquero, R. Feris, D. Tran, L. Brown, A. Hampapur, and M. Turk.  WACV 2009.  [pdf] [project page]

            • Image Region Entropy: A Measure of "Visualness" of Web Images Associated with One Concept.  K. Yanai and K. Barnard.  ACM MM 2005.  [pdf]

            • What Helps Where And Why? Semantic Relatedness for Knowledge Transfer. M. Rohrbach, M. Stark, G. Szarvas, I. Gurevych and B. Schiele. CVPR 2010.  [pdf]

            • Recognizing Human Actions by Attributes.  J. Liu, B. Kuipers, S. Savarese, CVPR 2011.  [pdf]

            • Interactively Building a Discriminative Vocabulary of Nameable Attributes.  D. Parikh and K. Grauman.  CVPR 2011.  [pdf] [web]

            Aashish-paper
            Girish-paper
            Sanmit-expt
            Nona-expt



            Oct 26
            Importance and saliency

            Among all items in the scene, which deserve attention (first)?  What makes images interesting or memorable?

            saliency
            • *Understanding and Predicting Importance in Images.  A. Berg et al.  CVPR 2012.  [pdf] [UIUC sentence dataset] [ImageClef dataset]
            • *Learning to Detect a Salient Object.  T. Liu et al. CVPR 2007.  [pdf]  [results]  [data]  [code]

            • *What Makes an Image Memorable?  P. Isola, J. Xiao, A. Torralba, A. Oliva. CVPR 2011. [pdf] [web] [code/data]

            • What Do We Perceive in a Glance of a Real-World Scene?  L. Fei-Fei, A. Iyer, C. Koch, and P. Perona.  Journal of Vision, 2007.  [pdf]

            • A Model of Saliency-based Visual Attention for Rapid Scene Analysis.  L. Itti, C. Koch, and E. Niebur.  PAMI 1998  [pdf]

            • Interesting Objects are Visually Salient.  L. Elazary and L. Itti.  Journal of Vision, 8(3):1–15, 2008.  [pdf]

            • Accounting for the Relative Importance of Objects in Image Retrieval.  S. J. Hwang and K. Grauman.  BMVC 2010.  [pdf] [web] [data]

            • Some Objects are More Equal Than Others: Measuring and Predicting Importance, M. Spain and P. Perona.  ECCV 2008.  [pdf]

            • The Discriminant Center-Surround Hypothesis for Bottom-Up Saliency. D. Gao, V.Mahadevan, and N. Vasconcelos. NIPS, 2007.  [pdf]

            • What is an Object?  B. Alexe, T. Deselaers, and V. Ferrari.  CVPR 2010.  [pdf] [code]

            • A Principled Approach to Detecting Surprising Events in Video.  L. Itti and P. Baldi.  CVPR 2005  [pdf]

            • What Attributes Guide the Deployment of Visual Attention and How Do They Do It? J. Wolfe and T. Horowitz. Neuroscience, 5:495–501, 2004.  [pdf]

            • Visual Correlates of Fixation Selection: Effects of Scale and Time. B. Tatler, R. Baddeley, and I. Gilchrist. Vision Research, 45:643, 2005.  [pdf]

            • Objects Predict Fixations Better than Early Saliency.  W. Einhauser, M. Spain, and P. Perona. Journal of Vision, 8(14):1–26, 2008.  [pdf]

            • Reading Between the Lines: Object Localization Using Implicit Cues from Image Tags.  S. J. Hwang and K. Grauman.  CVPR 2010.  [pdf]

            • Peripheral-Foveal Vision for Real-time Object Recognition and Tracking in Video.  S. Gould, J. Arfvidsson, A. Kaehler, B. Sapp, M. Messner, G. Bradski, P. Baumstrack,S. Chung, A. Ng.  IJCAI 2007.  [pdf]

            • Determining Patch Saliency Using Low-Level Context, D. Parikh, L. Zitnick, and T. Chen. ECCV 2008.  [pdf]

            • Key-Segments for Video Object Segmentation.  Y. J. Lee, J. Kim, and K. Grauman.  ICCV 2011  [pdf]

            • Contextual Guidance of Eye Movements and Attention in Real-World Scenes: The Role of Global Features on Object Search.  A. Torralba, A. Oliva, M. Castelhano, J. Henderson.  [pdf] [web]

            • The Role of Top-down and Bottom-up Processes in Guiding Eye Movements during Visual Search, G. Zelinsky, W. Zhang, B. Yu, X. Chen, D. Samaras, NIPS 2005.  [pdf]

            Islam-expt
            Chao-expt
            Che-Chun-paper
            Niveda-paper
            Wed Oct 31: project extended outlines due
            C. Human-centered recognition
            Nov 2
            Pictures of people

            Finding people, predicting their poses and attributes, automatic face tagging

            poselets
            • *Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations, L. Bourdev and J. Malik.  ICCV 2009  [pdf] [code]

            • *Real-Time Human Pose Recognition in Parts from a Single Depth Image.  J. Shotton et al.  CVPR 2011. [pdf] [video] [web]

            • *Where’s Waldo: Matching People in Images of Crowds.  R. Garg, D. Ramanan, S. Seitz, N. Snavely. CVPR 2011.  [pdf] [web]

            • *Understanding Images of Groups of People, A. Gallagher and T. Chen, CVPR 2009.  [pdf]  [web] [data]

            • Parsing Clothing in Fashion Photographs.  K. Yamaguchi et al. CVPR 2012.  [pdf] [data]
            • Contextual Identity Recognition in Personal Photo Albums. D. Anguelov, K.-C. Lee, S. Burak, Gokturk, and B. Sumengen. CVPR 2007.  [pdf]

            • Recognizing Proxemics in Personal Photos.  Y. Yang, S. Baker, A. Kannan, D. Ramanan.  CVPR 2012.  [pdf]
            • Who are you? - Learning Person Specific Classifiers from Video, J. Sivic, M. Everingham, and A. Zisserman, CVPR 2009.  [pdf] [data] [KLT tracking code]

            • Describing Clothing by Semantic Attributes.  A. Gallagher et al.  ECCV 2012.  [pdf]

            • Describing People: A Poselet-Based Approach to Attribute Classification.  L. Bourdev, S. Maji, J. Malik.  ICCV 2011.  [pdf]

            • Weakly Supervised Learning of Interactions between Humans and Objects.  Prest et al. PAMI 2012. [pdf]
            • Finding and Tracking People From the Bottom Up.  D. Ramanan, D. A. Forsyth.  CVPR 2003.  [pdf]

            • Autotagging Facebook: Social Network Context Improves Photo Annotation, by  Z. Stone, T. Zickler, and T. Darrell.  CVPR Internet Vision Workshop 2008.   [pdf]

            • Efficient Propagation for Face Annotation in Family Albums. L. Zhang, Y. Hu, M. Li, and H. Zhang.  MM 2004.  [pdf]

            • Progressive Search Space Reduction for Human Pose Estimation.  Ferrari, V., Marin-Jimenez, M. and Zisserman, A.  CVPR 2008.  [pdf] [web] [code]

            • Names and Faces in the News, by T. Berg, A. Berg, J. Edwards, M. Maire, R. White, Y. Teh, E. Learned-Miller and D. Forsyth, CVPR 2004.  [pdf]  [web]

            • Face Discovery with Social Context.  Y. J. Lee and K. Grauman.  BMVC 2011.  [pdf]

            • “Hello! My name is... Buffy” – Automatic Naming of Characters in TV Video, by M. Everingham, J. Sivic and A. Zisserman, BMVC 2006.  [pdf]  [web]  [data]

            • Pictorial Structures Revisited: People Detection and Articulated Pose Estimation.  M. Andriluka et al. CVPR 2009.  [pdf]  [code]

            • Exploring Photobios.  I. Kemelmacher-Shlizerman, E. Shechtman, R. Garg, S. Seitz.  SIGGRAPH 2011.  [pdf]
            Deepti-paper
            Randall-expt
            Aron-expt
            Dinesh-paper

            Nov 9
            Activity recognition

            Recognizing and localizing human actions in video or static images

            actions
            • *Learning Realistic Human Actions from Movies.  I. Laptev, M. Marszałek, C. Schmid and B. Rozenfeld.  CVPR 2008.  [pdf]  [data] [code]

            • *A Unified Framework for Multi-Target Tracking and Collective Activity Recognition.  W. Choi and S. Savarese.  ECCV 2012.  [pdf] [web] [video] [data]

            • *Detecting Actions, Poses, and Objects with Relational Phraselets.  C. Desai and D. Ramanan.  ECCV 2012.  [pdf] [data] [code]

            • Beyond Actions: Discriminative Models for Contextual Group Activities.  T. Lan, Y. Wang, W. Yang, and G. Mori.  NIPS 2010.  [pdf] [data]

            • Efficient Activity Detection with Max-Subgraph Search.  C.-Y. Chen and K. Grauman. CVPR 2012.  [pdf] [project page]  [code]

            • Action Bank: a High-Level Representation of Activity in Video.  S. Sadanand and J. Corso.  CVPR 2012 [pdf]  [code/data]

            • A Hough Transform-Based Voting Framework for Action Recognition.  A. Yao, J. Gall, L. Van Gool.  CVPR 2010.  [pdf] [code/data]

            • Actions in Context, M. Marszalek, I. Laptev, C. Schmid.  CVPR 2009.  [pdf] [web] [data]

            • Objects in Action: An Approach for Combining Action Understanding and Object Perception.   A. Gupta and L. Davis.  CVPR, 2007.  [pdf]  [data]

            • Exemplar-based Action Recognition in Video. G. Willems, J. Becker, T. Tuytelaars, and L. V. Gool. BMVC, 2009.
            • A Scalable Approach to Activity Recognition Based on Object Use. J. Wu, A. Osuntogun, T. Choudhury, M. Philipose, and J. Rehg.  ICCV 2007.  [pdf]

            • Recognizing Actions at a Distance.  A. Efros, G. Mori, J. Malik.  ICCV 2003.  [pdf] [web]

            • Action Recognition from a Distributed Representation of Pose and Appearance, S. Maji, L. Bourdev, J.  Malik, CVPR 2011.  [pdf]  [code]

            • Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition.  A. Kovashka and K. Grauman.  CVPR 2010.  [pdf]

            • Temporal Causality for the Analysis of Visual Events.  K. Prabhakar, S. Oh, P. Wang, G. Abowd, and J. Rehg.  CVPR 2010.  [pdf] [Georgia Tech Computational Behavior Science project]

            • What's Going on?: Discovering Spatio-Temporal Dependencies in Dynamic Scenes.  D. Kuettel et al.  CVPR 2010.  [pdf]

            • Learning Actions From the Web.  N. Ikizler-Cinbis, R. Gokberk Cinbis, S. Sclaroff.  ICCV 2009.  [pdf]

            Girish-expt
            Gary-paper
            David-paper


            Nov 16
            Egocentric cameras

            Analyzing data from wearable, mobile cameras; "first person" vision

            camera
            • *Social Interactions: A First-Person Perspective. A. Fathi, J. Hodgins, J. Rehg.  CVPR 2012  [pdf] [data]
            • *Recognizing Activities of Daily Living in First-Person Camera Views.  H. Pirsiavash and D. Ramanan.  CVPR 2012.  [pdf] [data/code]

            • *Novelty Detection from an Egocentric Perspective. O. Aghazadeh, J. Sullivan, and S. Carlsson. CVPR 2011 [pdf] [web/data]

            • Discovering Important People and Objects for Egocentric Video Summarization.  Y. J. Lee, J. Ghosh, and K. Grauman.  CVPR 2012.  [pdf]  [web]
            • Understanding Egocentric Activities.  A. Fathi, A. Farhadi, J. Rehg.  ICCV 2011. [pdf] [data]

            • Learning to Recognize Objects in Egocentric Activities.  A. Fathi, X. Ren, J. Rehg.  CVPR 2011.  [pdf]
            • Figure-Ground Segmentation Improves Handled Object Recognition in Egocentric Video.  X. Ren and C. Gu.  CVPR 2010 [pdf] [videos] [data]

            • Egocentric Recognition of Handled Objects: Benchmark and Analysis.  X. Ren and M. Philipose.  Egovision Workshop 2009.  [pdf] [data]

            • Activity Recognition from First Person Sensing.  E. Taralova, F. De la Torre, M. Hebert  CVPR 2009 Workshop on Egocentric Vision  [pdf]

            • Close-Range Human Detection for Head-Mounted Cameras.  D. Mitzel and B. Leibe.  BMVC 2012.  [pdf]

            • Structural Epitome: A Way to Summarize One’s Visual Experience. N. Jojic, A. Perina, and V. Murino. NIPS 2010.  [pdf]

            • Fast Unsupervised Ego-Action Learning for First-Person Sports Video. K. Kitani, T. Okabe, Y. Sato, and A. Sugimoto. CVPR 2011. [pdf]

            • Wearable Hand Activity Recognition for Event Summarization. W. Mayol and D. Murray. International Symposium on Wearable Computers. IEEE, 2005.  [pdf]

            • Illumination-free Gaze Estimation Method for First-Person Vision Wearable Device.  A. Tsukada, M. Shino, M. Devyver, T. Kanade.  ICCV Workshop 2011.  [pdf]

            • Egovision workshop at CVPR 2012
            Jake-expt
            Randall-paper
            Dinesh-expt



            Nov 30
            Human-in-the-loop interactive systems

            Human-in-the-loop learning, active annotation collection, crowdsourcing

            bird



            • *Multiclass Recognition and Part Localization with Humans in the Loop.  C. Wah et al. ICCV 2011. [pdf] [Caltech/UCSD Visipedia project]  [data]

            • *What’s It Going to Cost You? : Predicting Effort vs. Informativeness for Multi-Label Image Annotations.  S. Vijayanarasimhan and K. Grauman.  CVPR 2009 [pdf] [data] [code]

            • *The Multidimensional Wisdom of Crowds.  Welinder P., Branson S., Belongie S., Perona, P. NIPS 2010. [pdf]  [code]

            • Visual Recognition with Humans in the Loop.  Branson S., Wah C., Babenko B., Schroff F., Welinder P., Perona P., Belongie S.  ECCV 2010. [pdf]  

            • Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds.  S. Vijayanarasimhan and K. Grauman.  CVPR 2011.  [pdf]

            • WhittleSearch: Image Search with Relative Attribute Feedback.  A. Kovashka, D. Parikh, K. Grauman.  CVPR 2012.  [pdf] [data]

            • Crowdclustering.  R. Gomes, P. Welinder, A. Krause, P. Perona.  NIPS 2011.  [pdf]

            • Adaptively Learning the Crowd Kernel.  O. Tamuz, C. Liu, S. Belongie, O. Shamir, A. Kalai.  ICML 2011 [pdf]

            • LeafSnap: A Computer Vision System for Automatic Plant Species Identification.  N. Kumar et al.  ECCV 2012.  [pdf]

            • Interactive Object Detection.  A. Yao, J. Gall, C. Leistner, L. Van Gool. CVPR 2012.  [pdf]
            • Efficiently Scaling Up Video Annotation with Crowdsourced Marketplaces.  C. Vondrick, D. Ramanan, D. Patterson.  ECCV 2010.  [pdf] [data/code]

            • Video Annotation and Tracking with Active Learning.  C. Vondrick, D. Patterson, D. Ramanan.  NIPS 2011.  [pdf]  [code]

            • Active Frame Selection for Label Propagation in Videos.  S. Vijayanarasimhan and K. Grauman.  ECCV 2012.  [pdf]

            • Annotator Rationales for Visual Recognition.  J. Donahue and K. Grauman.  ICCV 2011. [pdf]

            • Attributes for Classifier Feedback.  A. Parkash and D. Parikh.  ECCV 2012.  [pdf]
            • Combining Self Training and Active Learning for Video Segmentation.  A. Fathi, M. Balcan, X. Ren, J. Rehg.  BMVC 2011.  [pdf]

            • Labeling Images with a Computer Game. L. von Ahn and L. Dabbish. CHI, 2004.

            • Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise.  J. Whitehill et al.  NIPS 2009.  [pdf]
            • Utility Data Annotation with Amazon Mechanical Turk. A. Sorokin and D. Forsyth. Wkshp on Internet Vision, 2008.

            • Far-Sighted Active Learning on a Budget for Image and Video Recognition.  S. Vijayanarasimhan, P. Jain, and K. Grauman.  CVPR 2010.  [pdf]  [code]

            • Active Learning from Crowds.  Y. Yan, R. Rosales, G. Fung, J. Dy.  ICML 2011.  [pdf]

            • Proactive Learning: Cost-Sensitive Active Learning with Multiple Imperfect Oracles.  P. Donmez and J. Carbonell.  CIKM 2008.  [pdf]
            • Inactive Learning?  Difficulties Employing Active Learning in Practice.  J. Attenberg and F. Provost.  SIGKDD 2011. [pdf]

            • Actively Selecting Annotations Among Objects and Attributes.  A. Kovashka, S. Vijayanarasimhan, and K. Grauman.  ICCV 2011  [pdf]

            • Supervised Learning from Multiple Experts: Whom to Trust When Everyone Lies a Bit.  V. Raykar et al.  ICML 2009.  [pdf]
            • Multi-class Active Learning for Image Classification.  A. J. Joshi, F. Porikli, and N. Papanikolopoulos.  CVPR 2009.  [pdf]

            • GrabCut -Interactive Foreground Extraction using Iterated Graph Cuts, by C. Rother, V. Kolmogorov, A. Blake, SIGGRAPH 2004.  [pdf]  [project page]

            • Peekaboom: A Game for Locating Objects in Images, by L. von Ahn, R. Liu and M. Blum, CHI 2006. [pdf]  [web]
              Deepti-expt
              Heath-paper
              Niveda-expt

              Dec 7
              Final project presentations in class


              Final papers due



              http://www.aishack.in/2010/05/sift-scale-invariant-feature-transform/4/

              Up till now, we have generated a scale space and used it to calculate the Difference of Gaussians. Those are then used to calculate Laplacian of Gaussian approximations that are scale invariant. I told you that they produce great key points. Here’s how it’s done!

              Finding key points is a two part process

              1. Locate maxima/minima in DoG images
              2. Find subpixel maxima/minima

              Locate maxima/minima in DoG images

              The first step is to coarsely locate the maxima and minima. This is simple. You iterate through each pixel and check all of its neighbours. The check is done within the current image, and also the one above and below it. Something like this:

              X marks the current pixel. The green circles mark the neighbours. This way, a total of 26 checks are made. X is marked as a “key point” if it is the greatest or least of all 26 neighbours.

              Usually, a non-maximum or non-minimum position won’t have to go through all 26 checks. A few initial checks are usually sufficient to discard it.

              Note that keypoints are not detected in the lowermost and topmost scales. There simply aren’t enough neighbours to do the comparison. So simply skip them!
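
              To make the check concrete, here is a minimal NumPy sketch (my own illustration, not the SIFT author's code; it assumes dog is a list of at least three same-sized 2D DoG arrays for one octave, and it skips the early-exit optimization for clarity):

              import numpy as np

              def find_scale_space_extrema(dog):
                  # dog: list of >= 3 same-sized 2D DoG images for one octave
                  keypoints = []
                  for s in range(1, len(dog) - 1):          # skip the lowermost and topmost scales
                      below, cur, above = dog[s - 1], dog[s], dog[s + 1]
                      for y in range(1, cur.shape[0] - 1):
                          for x in range(1, cur.shape[1] - 1):
                              v = cur[y, x]
                              # 3x3x3 neighbourhood: the 26 neighbours plus the pixel itself
                              cube = np.stack([img[y - 1:y + 2, x - 1:x + 2]
                                               for img in (below, cur, above)])
                              if v == cube.max() or v == cube.min():   # ties accepted, for simplicity
                                  keypoints.append((x, y, s))
                  return keypoints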

              Once this is done, the marked points are the approximate maxima and minima. They are “approximate” because a maximum or minimum almost never lies exactly on a pixel. It lies somewhere between the pixels. But we simply cannot access data “between” pixels. So, we must mathematically locate the subpixel location.

              Here’s what I mean:

              The red crosses mark pixels in the image. But the actual extreme point is the green one.

              Find subpixel maxima/minima

              Using the available pixel data, subpixel values are generated. This is done by the Taylor expansion of the image around the approximate key point.

              Mathematically, it’s like this:
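
              With x = (x, y, σ)ᵀ the offset from the candidate point (notation as in Lowe's paper):

              D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T} \mathbf{x} + \frac{1}{2}\, \mathbf{x}^{T}\, \frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\, \mathbf{x}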

              We can easily find the extreme points of this equation (differentiate and equate to zero). On solving, we’ll get subpixel key point locations. These subpixel values increase chances of matching and stability of the algorithm.
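
              Explicitly, setting the derivative of the expansion to zero gives the offset

              \hat{\mathbf{x}} = -\left( \frac{\partial^{2} D}{\partial \mathbf{x}^{2}} \right)^{-1} \frac{\partial D}{\partial \mathbf{x}}

              which is added to the candidate's integer location (and scale) to get the refined key point.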

              Example

              Here’s a result I got from the example image I’ve been using till now:

              The author of SIFT recommends generating two such extrema images. So, you need exactly 4 DoG images. To generate 4 DoG images, you need 5 Gaussian blurred images. Hence the 5 levels of blur in each octave.

              In the image, I’ve shown just one octave. This is done for all octaves. Also, this image just shows the first part of keypoint detection. The Taylor series part has been skipped.

              Summary

              Here, we detected the maxima and minima in the DoG images generated in the previous step. This is done by comparing neighbouring pixels in the current scale, the scale “above” and the scale “below”.

              Next, we’ll reject some keypoints detected here. This is because they either don’t have enough contrast or they lie on an edge.

              Got questions or suggestions? Leave a comment below!


              http://www.aishack.in/2010/05/sift-scale-invariant-feature-transform/3/

              In the previous step, we created the scale space of the image. The idea was to blur an image progressively, shrink it, blur the small image progressively, and so on. Now we use those blurred images to generate another set of images, the Difference of Gaussians (DoG). These DoG images are great for finding out interesting key points in the image.

              Laplacian of Gaussian

              The Laplacian of Gaussian (LoG) operation goes like this. You take an image, and blur it a little. And then, you calculate second order derivatives on it (or, the “laplacian”). This locates edges and corners on the image. These edges and corners are good for finding keypoints.

              But the second order derivative is extremely sensitive to noise. The blur smoothes out the noise and stabilizes the second order derivative.

              The problem is, calculating all those second order derivatives is computationally intensive. So we cheat a bit.

              The Con

              To generate Laplacian of Gaussian images quickly, we use the scale space. We calculate the difference between two consecutive scales. Or, the Difference of Gaussians. Here’s how:

              These Difference of Gaussian images are approximately equivalent to the Laplacian of Gaussian. And we’ve replaced a computationally intensive process with a simple subtraction (fast and efficient). Awesome!
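
              In code, the whole step is one subtraction per adjacent pair of blur levels. A minimal sketch, assuming octave is a list of progressively blurred images stored as NumPy float arrays (least blurred first):

              def difference_of_gaussians(octave):
                  # one DoG image per adjacent pair of blur levels in the octave
                  return [octave[i + 1] - octave[i] for i in range(len(octave) - 1)]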

              These DoG images come with another little goodie. These approximations are also “scale invariant”. What does that mean?

              The Benefits

Just the Laplacian of Gaussian images aren’t great. They are not scale invariant. That is, they depend on the amount of blur you do. This is because of the Gaussian expression. (Don’t panic ;) )

G(x, y, σ) = (1 / (2πσ²)) · e^(−(x² + y²) / (2σ²))

See the σ² in the denominator? That’s the scale. If we somehow get rid of it, we’ll have true scale independence. So, if the Laplacian of a Gaussian is represented like this:

∇²G

Then the scale invariant Laplacian of Gaussian would look like this:

σ²∇²G

But all these complexities are taken care of by the Difference of Gaussian operation. The resultant images after the DoG operation are already multiplied by the σ². Great eh!

              Oh! And it has also been proved that this scale invariant thingy produces much better trackable points! Even better!

              Side effects

              You can’t have benefits without side effects >.<

You know the DoG result is multiplied by σ². But it’s also multiplied by another number. That number is (k−1). This is the k we discussed in the previous step.

              But we’ll just be looking for the location of the maximums and minimums in the images. We’ll never check the actual values at those locations. So, this additional factor won’t be a problem to us. (Even if you multiply throughout by some constant, the maxima and minima stay at the same location)
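If you’re curious where both factors come from, the standard argument uses the heat-equation property of the Gaussian, ∂G/∂σ = σ∇²G, so G(x, y, kσ) − G(x, y, σ) ≈ (k−1)σ²∇²G. Here’s a quick numerical sanity check of that relationship (my own check, assuming NumPy and SciPy are available; not part of the original article):

import numpy as np
from scipy.ndimage import gaussian_filter, laplace

# Check:  G(k*sigma) - G(sigma)  ≈  (k - 1) * sigma^2 * (Laplacian of G(sigma))
rng = np.random.default_rng(0)
img = rng.random((128, 128))

sigma, k = 2.0, np.sqrt(2)
dog = gaussian_filter(img, k * sigma) - gaussian_filter(img, sigma)
approx = (k - 1) * sigma ** 2 * laplace(gaussian_filter(img, sigma))

# The two should be nearly proportional point-by-point; the correlation
# coefficient should come out close to 1.
print(np.corrcoef(dog.ravel(), approx.ravel())[0, 1])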

              Example

              Here’s a gigantic image to demonstrate how this difference of Gaussians works.

              In the image, I’ve done the subtraction for just one octave. The same thing is done for all octaves. This generates DoG images of multiple sizes.

              Summary

              Two consecutive images in an octave are picked and one is subtracted from the other. Then the next consecutive pair is taken, and the process repeats. This is done for all octaves. The resulting images are an approximation of scale invariant laplacian of gaussian (which is good for detecting keypoints). There are a few “drawbacks” due to the approximation, but they won’t affect the algorithm.

              Next, we’ll actually find some interesting keypoints. Maxima and Minima. Or, Maximums and Minimums of the image.

              Got any questions or suggestions? Leave a comment below!

               

Posted by uniqueone
              http://www.aishack.in/2010/05/sift-scale-invariant-feature-transform/2/

Real world objects are meaningful only at a certain scale. You might see a sugar cube perfectly on a table. But if you’re looking at the entire Milky Way, it simply does not exist. This multi-scale nature of objects is quite common in nature. And a scale space attempts to replicate this concept on digital images.

              Scale spaces

              Do you want to look at a leaf or the entire tree? If it’s a tree, get rid of some detail from the image (like the leaves, twigs, etc) intentionally.

              While getting rid of these details, you must ensure that you do not introduce new false details. The only way to do that is with the Gaussian Blur (it was proved mathematically, under several reasonable assumptions).

              So to create a scale space, you take the original image and generate progressively blurred out images. Here’s an example:

Look at how the cat’s helmet loses detail. So do its whiskers.

              Scale spaces in SIFT

              SIFT takes scale spaces to the next level. You take the original image, and generate progressively blurred out images. Then, you resize the original image to half size. And you generate blurred out images again. And you keep repeating.

              Here’s what it would look like in SIFT:

              Images of the same size (vertical) form an octave. Above are four octaves. Each octave has 5 images. The individual images are formed because of the increasing “scale” (the amount of blur).

              The technical details

              Now that you know things the intuitive way, I’ll get into a few technical details.

              Octaves and Scales

The number of octaves and scales depends on the size of the original image. While programming SIFT, you’ll have to decide for yourself how many octaves and scales you want. However, the creator of SIFT suggests that 4 octaves and 5 blur levels are ideal for the algorithm.
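A minimal sketch of that construction in Python with OpenCV, using the parameters suggested above (4 octaves, 5 blur levels) plus two values I’m assuming for illustration: a base blur of 1.6 and k = √2. Note that real implementations downsample an already-blurred image from the previous octave rather than re-blurring a raw downsampled base; this sketch skips that for simplicity:

import cv2
import numpy as np

def build_scale_space(image, n_octaves=4, n_scales=5, sigma0=1.6, k=np.sqrt(2)):
    # Returns a list of octaves; each octave is a list of progressively
    # blurred images, and each octave starts at half the previous size.
    octaves = []
    base = image.astype(np.float32)
    for _ in range(n_octaves):
        octave = []
        for i in range(n_scales):
            sigma = sigma0 * (k ** i)
            # ksize=(0, 0) lets OpenCV derive the kernel size from sigma.
            octave.append(cv2.GaussianBlur(base, (0, 0), sigma))
        octaves.append(octave)
        # Downsample to half size for the next octave.
        base = cv2.resize(base, (base.shape[1] // 2, base.shape[0] // 2),
                          interpolation=cv2.INTER_NEAREST)
    return octaves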

              The first octave

If the original image is doubled in size and antialiased a bit (by blurring it), then the algorithm produces four times more keypoints. The more the keypoints, the better!

              Blurring

Mathematically, “blurring” is referred to as the convolution of the Gaussian operator and the image. Gaussian blur has a particular expression or “operator” that is applied to each pixel. What results is the blurred image.

L(x, y, σ) = G(x, y, σ) * I(x, y)

The symbols:

              • L is a blurred image
              • G is the Gaussian Blur operator
              • I is an image
              • x,y are the location coordinates
              • σ is the “scale” parameter. Think of it as the amount of blur. Greater the value, greater the blur.
              • The * is the convolution operation in x and y. It “applies” gaussian blur G onto the image I.

This is the actual Gaussian Blur operator:

G(x, y, σ) = (1 / (2πσ²)) · e^(−(x² + y²) / (2σ²))

              Amount of blurring

              The amount of blurring in each image is important. It goes like this. Assume the amount of blur in a particular image is σ. Then, the amount of blur in the next image will be k*σ. Here k is whatever constant you choose.

This is a table of σ’s for my current example. See how each σ differs by a factor of √2 from the previous one.
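If you want to generate such a table yourself, here’s a tiny sketch. The starting value σ0 = 1/√2 is only an assumption for illustration; the point is the factor-of-√2 progression within an octave and the doubling between octaves:

import numpy as np

def sigma_table(n_octaves=4, n_scales=5, sigma0=1 / np.sqrt(2), k=np.sqrt(2)):
    # Within an octave each sigma is k times the previous one; each octave
    # starts at double the previous octave's base sigma (half-size image).
    for octave in range(n_octaves):
        base = sigma0 * (2 ** octave)
        print(["%.6f" % (base * k ** i) for i in range(n_scales)])

sigma_table()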

              Summary

              In the first step of SIFT, you generate several octaves of the original image. Each octave’s image size is half the previous one. Within an octave, images are progressively blurred using the Gaussian Blur operator.

              In the next step, we’ll use all these octaves to generate Difference of Gaussian images.

               

Posted by uniqueone
              http://www.aishack.in/2010/05/sift-scale-invariant-feature-transform/

Matching features across different images is a common problem in computer vision. When all images are similar in nature (same scale, orientation, etc.), simple corner detectors can work. But when you have images of different scales and rotations, you need to use the Scale Invariant Feature Transform.

              Why care about SIFT

              SIFT isn’t just scale invariant. You can change the following, and still get good results:

              • Scale (duh)
              • Rotation
              • Illumination
              • Viewpoint

              Here’s an example. We’re looking for these:

              And we want to find these objects in this scene:

              Here’s the result:

              Now that’s some real robust image matching going on. The big rectangles mark matched images. The smaller squares are for individual features in those regions. Note how the big rectangles are skewed. They follow the orientation and perspective of the object in the scene.

              The algorithm

SIFT is quite an involved algorithm. It has a lot going on and can become confusing, so I’ve split up the entire algorithm into multiple parts. Here’s an outline of what happens in SIFT.

              1. Constructing a scale space
                This is the initial preparation. You create internal representations of the original image to ensure scale invariance. This is done by generating a “scale space”.
              2. LoG Approximation
                The Laplacian of Gaussian is great for finding interesting points (or key points) in an image. But it’s computationally expensive. So we cheat and approximate it using the representation created earlier.
              3. Finding keypoints
With the super fast approximation, we now try to find key points. These are maxima and minima in the Difference of Gaussian images we calculated in step 2.
              4. Get rid of bad key points
                Edges and low contrast regions are bad keypoints. Eliminating these makes the algorithm efficient and robust. A technique similar to the Harris Corner Detector is used here.
              5. Assigning an orientation to the keypoints
                An orientation is calculated for each key point. Any further calculations are done relative to this orientation. This effectively cancels out the effect of orientation, making it rotation invariant.
              6. Generate SIFT features
Finally, with scale and rotation invariance in place, one more representation is generated. This helps uniquely identify features. Let’s say you have 50,000 features. With this representation, you can easily identify the feature you’re looking for (say, a particular eye, or a sign board).

              That was an overview of the entire algorithm. Over the next few days, I’ll go through each step in detail. Finally, I’ll show you how to implement SIFT in OpenCV!

              What do I do with SIFT features?

              After you run through the algorithm, you’ll have SIFT features for your image. Once you have these, you can do whatever you want.

              Track images, detect and identify objects (which can be partly hidden as well), or whatever you can think of. We’ll get into this later as well.
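As a taste of what that looks like in practice, here’s a minimal matching sketch using OpenCV’s built-in SIFT (available as cv2.SIFT_create() in recent OpenCV builds; the image file names are placeholders, and the 0.75 ratio test is the usual heuristic, not something specific to this article):

import cv2

# Placeholder file names; substitute your own object and scene images.
obj = cv2.imread('object.png', cv2.IMREAD_GRAYSCALE)
scene = cv2.imread('scene.png', cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(obj, None)
kp2, des2 = sift.detectAndCompute(scene, None)

# Brute-force matching plus Lowe's ratio test to drop ambiguous matches.
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

out = cv2.drawMatches(obj, kp1, scene, kp2, good, None)
cv2.imwrite('matches.png', out)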

              But the catch is, this algorithm is patented.

              >.<

              So, it’s good enough for academic purposes. But if you’re looking to make something commercial, look for something else! [Thanks to aLu for pointing out SURF is patented too]

Posted by uniqueone