http://www.cs.utexas.edu/~grauman/courses/fall2011/schedule.html

CS395T: Visual Recognition, Fall 2011





Meets:
Wednesdays 4:00-7:00 pm
ACES 3.408

Instructor: Kristen Grauman 
Email: grauman@cs
Office: ACES 3.446 

Office hours: by appointment

When emailing me, please put CS395 in the subject line.

Announcements:

See the schedule for weekly reading assignments.

Project paper drafts due Nov 23.  Details on projects are here.

Course overview:


Topics: This is a graduate seminar course in computer vision.   We will survey and discuss current vision papers relating to object recognition, auto-annotation of images, and scene understanding.  The goals of the course will be to understand current approaches to some important problems, to actively analyze their strengths and weaknesses, and to identify interesting open questions and possible directions for future research.

See the syllabus for an outline of the main topics we'll be covering.

Requirements: Students will be responsible for writing paper reviews each week, participating in discussions, completing one programming assignment, presenting once or twice in class (depending on enrollment), and completing a project (done in pairs). 

Note that presentations are due one week before your scheduled presentation slot.  This means you will need to read the papers, prepare experiments, create slides, etc. more than one week before the date you are signed up for.  The idea is to meet and discuss ahead of time, so that we can iterate as needed during the week leading up to your presentation. 

More details on the requirements and grading breakdown are here.

Prereqs:  Courses in computer vision and/or machine learning (378/376 Computer Vision and/or 391 Machine Learning, or similar); ability to understand and analyze conference papers in this area; programming required for experiment presentations and projects. 

Please talk to me if you are unsure if the course is a good match for your background.  I generally recommend scanning through a few papers on the syllabus to gauge what kind of background is expected.  I don't assume you are already familiar with every single algorithm/tool/image feature a given paper mentions, but you should feel comfortable following the key ideas.



Syllabus overview:

  1. Single-object recognition fundamentals: representation, matching, and classification
    1. Specific objects
    2. Classification and global models
    3. Regions and mid-level representations
  2. Beyond single objects: scenes and properties
    1. Context and scenes
    2. Saliency, importance, attention
    3. Attributes
  3. External input in recognition
    1. Language and text
    2. Interactive learning and recognition
  4. Activity in images and videos
    1. Pictures of people
    2. Activity recognition
  5. Dealing with lots of data/categories
    1. Scaling with a large number of categories
    2. Large-scale search and mining
    3. Automatic summarization

Important dates:
    • Monday, Aug 29: paper topic preferences due
    • Friday, Sept 16: implementation assignment due
    • Friday, Oct 7: project proposals due
    • Wednesday, Nov 23: final project paper drafts due
    • Tuesday, Dec 6: final papers due


    Schedule and papers:


    Note:  * = required reading. 
    Additional papers are provided for reference, and as a starting point for background reading for projects.
    Paper presentations: focus on starred papers (additionally mentioning ideas from others is ok but not necessary).
    Experiment presentations: Pick from only among the starred papers.
    Date | Topics | Papers and links | Presenters | Items due
    Aug 24
    Course intro 

    [slides]
    Topic preferences due via email by Monday August 29
    I. Single-object recognition fundamentals: representation, matching, and classification
    Aug 31
    Recognizing specific objects:

    Invariant local features, instance recognition, bag-of-words models

    • *Object Recognition from Local Scale-Invariant Features, Lowe, ICCV 1999.  [pdf]  [code] [other implementations of SIFT] [IJCV]

    • *Local Invariant Feature Detectors: A Survey, Tuytelaars and Mikolajczyk.  Foundations and Trends in Computer Graphics and Vision, 2008. [pdf]  [Oxford code] [Read pp. 178-188, 216-220, 254-255]

    • *Video Google: A Text Retrieval Approach to Object Matching in Videos, Sivic and Zisserman, ICCV 2003.  [pdf]  [demo]


    • For more background on feature extraction: Szeliski book: Sec 3.2 Linear filtering, 4.1 Points and patches, 4.2 Edges

    • Scalable Recognition with a Vocabulary Tree, D. Nister and H. Stewenius, CVPR 2006. [pdf]

    • SURF: Speeded Up Robust Features, Bay, Ess, Tuytelaars, and Van Gool, CVIU 2008.  [pdf] [code]

    • Bundling Features for Large Scale Partial-Duplicate Web Image Search.  Z. Wu, Q. Ke, M. Isard, and J. Sun.  CVPR 2009.  [pdf]

    • Robust Wide Baseline Stereo from Maximally Stable Extremal Regions, J. Matas, O. Chum, M. Urban, and T. Pajdla, BMVC 2002.  [pdf]

    • City-Scale Location Recognition, G. Schindler, M. Brown, and R. Szeliski, CVPR 2007.  [pdf]

    • Object Retrieval with Large Vocabularies and Fast Spatial Matching.  J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, CVPR 2007.  [pdf]

    • I Know What You Did Last Summer: Object-Level Auto-annotation of Holiday Snaps, S. Gammeter, L. Bossard, T. Quack, L. Van Gool, ICCV 2009.  [pdf]

    • Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval.  O. Chum et al. CVPR 2007.  [pdf]

    • A Performance Evaluation of Local Descriptors. K. Mikolajczyk and C. Schmid.  CVPR 2003 [pdf]


    [slides]

    Sept 7
    Recognition via classification and global models:

    Global appearance models for category and scene recognition, sliding window detection, detection as a binary decision.

    • *A Discriminatively Trained, Multiscale, Deformable Part Model, by P. Felzenszwalb, D. McAllester and D. Ramanan.  CVPR 2008.  [pdf]  [code]

    • *Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, Lazebnik, Schmid, and Ponce, CVPR 2006. [pdf]  [15 scenes dataset]  [libpmk] [Matlab]

    • *Rapid Object Detection Using a Boosted Cascade of Simple Features, Viola and Jones, CVPR 2001.  [pdf]  [code]


    • Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, CVPR 2005.  [pdf]  [video] [code] [PASCAL datasets]

    • Modeling the Shape of the Scene: a Holistic Representation of the Spatial Envelope, Oliva and Torralba, IJCV 2001.  [pdf]  [Gist code]

    • Locality-Constrained Linear Coding for Image Classification.  J. Wang, J. Yang, K. Yu,  and T. Huang  CVPR 2010. [pdf] [code]

    • Visual Categorization with Bags of Keypoints, C. Dance, J. Willamowski, L. Fan, C. Bray, and G. Csurka, ECCV International Workshop on Statistical Learning in Computer Vision, 2004.  [pdf]

    • Pedestrian Detection in Crowded Scenes, Leibe, Seemann, and Schiele, CVPR 2005.  [pdf]

    • Pyramids of Histograms of Oriented Gradients (pHOG), Bosch and Zisserman. [code]

    • Eigenfaces for Recognition, Turk and Pentland, 1991.  [pdf]

    • Sampling Strategies for Bag-of-Features Image Classification.  E. Nowak, F. Jurie, and B. Triggs.  ECCV 2006. [pdf]

    • Beyond Sliding Windows: Object Localization by Efficient Subwindow Search.  C. Lampert, M. Blaschko, and T. Hofmann.  CVPR 2008.  [pdf]  [code]

    • A Trainable System for Object Detection, C. Papageorgiou and T. Poggio, IJCV 2000.  [pdf]

    • Object Recognition with Features Inspired by Visual Cortex. T. Serre, L. Wolf and T. Poggio. CVPR 2005.  [pdf]


    [slides]

    Sept 14
    Regions and mid-level representations

    Segmentation, grouping, surface estimation

    • *Constrained Parametric Min-Cuts for Automatic Object Segmentation. J. Carreira and C. Sminchisescu. CVPR 2010.  [pdf] [code]

    • *Geometric Context from a Single Image, by D. Hoiem, A. Efros, and M. Hebert, ICCV 2005. [pdf]  [web]  [code]

    • *Contour Detection and Hierarchical Image Segmentation.  P. Arbelaez,  M. Maire, C. Fowlkes, and J. Malik. PAMI 2011.  [pdf] [data and code]


    • From Contours to Regions: An Empirical Evaluation.  P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik.  CVPR 2009.  [pdf] [code]

    • Boundary-Preserving Dense Local Regions.  J. Kim and K. Grauman.  CVPR 2011.  [pdf]  [code]

    • Object Recognition as Ranking Holistic Figure-Ground Hypotheses. F. Li, J. Carreira, and C. Sminchisescu. CVPR 2010. [pdf]

    • Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, B. C. Russell, A. A. Efros, J. Sivic, W. T. Freeman, and A. Zisserman.  CVPR 2006.  [pdf] [code]

    • Combining Top-down and Bottom-up Segmentation. E. Borenstein, E. Sharon, and S. Ullman.  CVPR  workshop 2004.  [pdf]  [data]

    • Efficient Region Search for Object Detection.  S. Vijayanarasimhan and K. Grauman. CVPR 2011.  [pdf] [code] [data]

    • Extracting Subimages of an Unknown Category from a Set of Images, S. Todorovic and N. Ahuja, CVPR 2006.  [pdf]

    • Learning Mid-level Features for Recognition. Y.-L. Boureau, F. Bach, Y. LeCun, and J. Ponce. CVPR, 2010. 

    • Class-Specific, Top-Down Segmentation, E. Borenstein and S. Ullman, ECCV 2002.  [pdf]

    • Object Recognition by Integrating Multiple Image Segmentations, C. Pantofaru, C. Schmid, and M. Hebert, ECCV 2008  [pdf]

    • Image Parsing: Unifying Segmentation, Detection, and Recognition. Tu, Z., Chen, Z., Yuille, A.L., Zhu, S.C. ICCV 2003  [pdf]

    • GrabCut - Interactive Foreground Extraction using Iterated Graph Cuts, by C. Rother, V. Kolmogorov, A. Blake, SIGGRAPH 2004.  [pdf]  [project page]

    • Recognition Using Regions.  C. Gu, J. Lim, P. Arbelaez, J. Malik, CVPR 2009.  [pdf] [code]

    • Robust Higher Order Potentials for Enforcing Label Consistency, P. Kohli, L. Ladicky, and P. Torr. CVPR 2008.  

    • Co-segmentation of Image Pairs by Histogram Matching - Incorporating a Global Constraint into MRFs, C. Rother, V. Kolmogorov, T. Minka, and A. Blake.  CVPR 2006.  [pdf]

    • Collect-Cut: Segmentation with Top-Down Cues Discovered in Multi-Object Images.  Y. J. Lee and K. Grauman. CVPR 2010.  [pdf] [data]

    • An Efficient Algorithm for Co-segmentation, D. Hochbaum, V. Singh, ICCV 2009.  [pdf]

    • Normalized Cuts and Image Segmentation, J. Shi and J. Malik.  PAMI 2000.  [pdf]  [code]


    • Greg Mori's superpixel code
    • Berkeley Segmentation Dataset and code
    • Pedro Felzenszwalb's graph-based segmentation code
    • Michael Maire's segmentation code and paper
    • Mean-shift: a Robust Approach Towards Feature Space Analysis [pdf]  [code, Matlab interface by Shai Bagon]
    • David Blei's Topic modeling code
    [slides]
    Expts: Brian, Cho-Jui
    Implementation assignment due Friday Sept 16, 5 PM
    II. Beyond single objects: scenes and properties
    Sept 21
    Context and scenes

    Multi-object scenes, inter-object relationships, understanding scenes' spatial layout, 3d context

    • *Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces.  D. Lee, A. Gupta, M. Hebert, and T. Kanade.  NIPS 2010.  [pdf] [code]

    • *Multi-Class Segmentation with Relative Location Prior.  S. Gould, J. Rodgers, D. Cohen, G. Elidan and D.  Koller.  IJCV 2008. [pdf] [code]

    • *Using the Forest to See the Trees: Exploiting Context for Visual Object Detection and Localization.  Torralba, Murphy, and Freeman.  CACM 2009.  [pdf] [related code]


    • Contextual Priming for Object Detection, A. Torralba.  IJCV 2003.  [pdf] [web] [code]

    • TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation.  J. Shotton, J. Winn, C. Rother, A. Criminisi.  ECCV 2006.  [pdf] [web] [data] [code]

    • Recognition Using Visual Phrases.  M. Sadeghi and A. Farhadi.  CVPR 2011.  [pdf]

    • Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry.  V. Hedau, D. Hoiem, and D. Forsyth.  ECCV 2010 [pdf] [code and data]

    • Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics, A. Gupta, A. Efros, and M. Hebert.  ECCV 2010. [pdf]

    • Object-Graphs for Context-Aware Category Discovery.  Y. J. Lee and K. Grauman.  CVPR 2010.  [pdf] [code]

    • Geometric Reasoning for Single Image Structure Recovery.  D. Lee, M. Hebert, and T. Kanade.  CVPR 2009.  [pdf]  [web] [code]

    • Putting Objects in Perspective, by D. Hoiem, A. Efros, and M. Hebert, CVPR 2006.  [pdf] [web]

    • Discriminative Models for Multi-Class Object Layout, C. Desai, D. Ramanan, C. Fowlkes. ICCV 2009.  [pdf]  [slides]  [SVM struct code] [data]

    • Closing the Loop in Scene Interpretation.  D. Hoiem, A. Efros, and M. Hebert.  CVPR 2008.  [pdf]

    • Decomposing a Scene into Geometric and Semantically Consistent Regions, S. Gould, R. Fulton, and D. Koller, ICCV 2009.  [pdf]  [slides]

    • Learning Spatial Context: Using Stuff to Find Things, by G. Heitz and D. Koller, ECCV 2008.  [pdf] [code]

    • An Empirical Study of Context in Object Detection, S. Divvala, D. Hoiem, J. Hays, A. Efros, M. Hebert, CVPR 2009.  [pdf]  [web]

    • Object Categorization using Co-Occurrence, Location and Appearance, by C. Galleguillos, A. Rabinovich and S. Belongie, CVPR 2008.  [pdf]

    • Context Based Object Categorization: A Critical Survey, C. Galleguillos and S. Belongie.  [pdf]

    • What, Where and Who? Classifying Events by Scene and Object Recognition, L.-J. Li and L. Fei-Fei, ICCV 2007. [pdf]

    • Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Unsupervised Framework, L-J. Li, R. Socher, L. Fei-Fei, CVPR 2009.  [pdf]

    Papers: Nishant, Jung
    Expts: Saurajit


    Sept 28
    Saliency and attention

    Among all items in the scene, which deserve attention (first)?

    • *A Model of Saliency-based Visual Attention for Rapid Scene Analysis.  L. Itti, C. Koch, and E. Niebur.  PAMI 1998  [pdf]

    • *Learning to Detect a Salient Object.  T. Liu et al. CVPR 2007.  [pdf]  [results]  [data]  [code by Vicente Ordonez]

    • *Figure-Ground Segmentation Improves Handled Object Recognition in Egocentric Video.  X. Ren and C. Gu.  CVPR 2010 [pdf] [videos] [data]

    • *What Do We Perceive in a Glance of a Real-World Scene?  L. Fei-Fei, A. Iyer, C. Koch, and P. Perona.  Journal of Vision, 2007.  [pdf]


    • Interesting Objects are Visually Salient.  L. Elazary and L. Itti.  Journal of Vision, 8(3):1–15, 2008.  [pdf]

    • Accounting for the Relative Importance of Objects in Image Retrieval.  S. J. Hwang and K. Grauman.  BMVC 2010.  [pdf] [web] [data]

    • Some Objects are More Equal Than Others: Measuring and Predicting Importance, M. Spain and P. Perona.  ECCV 2008.  [pdf]

    • What Makes an Image Memorable?  P. Isola et al. CVPR 2011. [pdf]
    • The Discriminant Center-Surround Hypothesis for Bottom-Up Saliency. D. Gao, V. Mahadevan, and N. Vasconcelos. NIPS, 2007.  [pdf]

    • Category-Independent Object Proposals.  I. Endres and D. Hoiem.  ECCV 2010.  [pdf]  [code]

    • What is an Object?  B. Alexe, T. Deselaers, and V. Ferrari.  CVPR 2010.  [pdf] [code]

    • A Principled Approach to Detecting Surprising Events in Video.  L. Itti and P. Baldi.  CVPR 2005  [pdf]

    • Optimal Scanning for Faster Object Detection,  N. Butko, J. Movellan.  CVPR 2009.  [pdf]

    • What Attributes Guide the Deployment of Visual Attention and How Do They Do It? J. Wolfe and T. Horowitz. Nature Reviews Neuroscience, 5:495–501, 2004.  [pdf]

    • Visual Correlates of Fixation Selection: Effects of Scale and Time. B. Tatler, R. Baddeley, and I. Gilchrist. Vision Research, 45:643, 2005.  [pdf]

    • Objects Predict Fixations Better than Early Saliency.  W. Einhauser, M. Spain, and P. Perona. Journal of Vision, 8(14):1–26, 2008.  [pdf]

    • Reading Between the Lines: Object Localization Using Implicit Cues from Image Tags.  S. J. Hwang and K. Grauman.  CVPR 2010.  [pdf]  [data]

    • Peripheral-Foveal Vision for Real-time Object Recognition and Tracking in Video.  S. Gould, J. Arfvidsson, A. Kaehler, B. Sapp, M. Messner, G. Bradski, P. Baumstarck, S. Chung, A. Ng.  IJCAI 2007.  [pdf]

    • Peekaboom: A Game for Locating Objects in Images, by L. von Ahn, R. Liu and M. Blum, CHI 2006. [pdf]  [web]

    • Determining Patch Saliency Using Low-Level Context, D. Parikh, L. Zitnick, and T. Chen. ECCV 2008.  [pdf]

    • Visual Recognition and Detection Under Bounded Computational Resources, S. Vijayanarasimhan and A. Kapoor.  CVPR 2010.

    • Key-Segments for Video Object Segmentation.  Y. J. Lee, J. Kim, and K. Grauman.  ICCV 2011  [pdf]

    • Contextual Guidance of Eye Movements and Attention in Real-World Scenes: The Role of Global Features on Object Search.  A. Torralba, A. Oliva, M. Castelhano, J. Henderson.  [pdf] [web]

    • The Role of Top-down and Bottom-up Processes in Guiding Eye Movements during Visual Search, G. Zelinsky, W. Zhang, B. Yu, X. Chen, D. Samaras, NIPS 2005.  [pdf]

    Papers: Lu Xia
    Expts: Larry


    Oct 5
    Attributes:

    Visual properties, learning from natural language descriptions, intermediate representations

    • *Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, C. Lampert, H. Nickisch, and S. Harmeling, CVPR 2009  [pdf] [web] [data]

    • *Describing Objects by Their Attributes, A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, CVPR 2009.  [pdf]  [web] [data]

    • *Attribute and Simile Classifiers for Face Verification, N. Kumar, A. Berg, P. Belhumeur, S. Nayar.  ICCV 2009.  [pdf] [web] [lfw data] [pubfig data]


    • Relative Attributes.  D. Parikh and K. Grauman.  ICCV 2011.  [pdf]  [data]

    • A Discriminative Latent Model of Object Classes and Attributes.  Y. Wang and G. Mori.  ECCV, 2010.  [pdf]

    • Learning Visual Attributes, V. Ferrari and A. Zisserman, NIPS 2007.  [pdf] 

    • Learning Models for Object Recognition from Natural Language Descriptions, J. Wang, K. Markert, and M. Everingham, BMVC 2009.  [pdf]

    • FaceTracer: A Search Engine for Large Collections of Images with Faces.  N. Kumar, P. Belhumeur, and S. Nayar.  ECCV 2008.  [pdf]

    • Attribute-Centric Recognition for Cross-Category Generalization.  A. Farhadi, I. Endres, D. Hoiem.  CVPR 2010.  [pdf]

    • Automatic Attribute Discovery and Characterization from Noisy Web Data.  T. Berg et al.  ECCV 2010.  [pdf]  [data]

    • Attribute-Based People Search in Surveillance Environments.  D. Vaquero, R. Feris, D. Tran, L. Brown, A. Hampapur, and M. Turk.  WACV 2009.  [pdf] [project page]

    • Image Region Entropy: A Measure of "Visualness" of Web Images Associated with One Concept.  K. Yanai and K. Barnard.  ACM MM 2005.  [pdf]

    • What Helps Where And Why? Semantic Relatedness for Knowledge Transfer. M. Rohrbach, M. Stark, G. Szarvas, I. Gurevych and B. Schiele. CVPR 2010.  [pdf]

    • Recognizing Human Actions by Attributes.  J. Liu, B. Kuipers, S. Savarese, CVPR 2011.  [pdf]

    • Interactively Building a Discriminative Vocabulary of Nameable Attributes.  D. Parikh and K. Grauman.  CVPR 2011.  [pdf] [web]

      Papers: Saurajit
      Expts: Qiming, Harsh
      Proposal abstracts due Friday Oct 7, 5 PM
      III. External input in recognition
      Oct 12
      Language and description

      Discovering the correspondence between words and other language constructs and images, generating descriptions

      • *Baby Talk: Understanding and Generating Image Descriptions.  Kulkarni et al.  CVPR 2011.  [pdf]

      • *Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers, A. Gupta and L. Davis, ECCV 2008.  [pdf]

      • *Learning Sign Language by Watching TV (using weakly aligned subtitles), P. Buehler, M. Everingham, and A. Zisserman. CVPR 2009.  [pdf]  [data] [web]


      • Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary, P. Duygulu, K. Barnard, N. de Freitas, D. Forsyth. ECCV 2002.  [pdf]  [data]

      • The Mathematics of Statistical Machine Translation: Parameter Estimation.  P. Brown, S. Della Pietra, V. Della Pietra, R. Mercer.  Association for Computational Linguistics, 1993.  [pdf] (background for the Duygulu et al. paper)

      • How Many Words is a Picture Worth?  Automatic Caption Generation for News Images.  Y. Feng and M. Lapata.  ACL 2010.  [pdf]
      • Matching words and pictures. K. Barnard, P. Duygulu, N. de Freitas, D. Forsyth, D. Blei, and M. Jordan.  JMLR, 3:1107–1135, 2003.  [pdf]

      • Who's Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation.  L. Jie, B. Caputo, and V. Ferrari.  NIPS 2009.  [pdf]

      • Watch, Listen & Learn: Co-training on Captioned Images and Videos.  S. Gupta, J. Kim, K. Grauman, and R. Mooney.  ECML 2008.  [pdf]

      • Systematic Evaluation of Machine Translation Methods for Image and Video Annotation, P. Virga, P. Duygulu, CIVR 2005.  [pdf]
      • Localizing Objects and Actions in Videos Using Accompanying Text.  Johns Hopkins University Summer Workshop Report.  J. Neumann et al.  2010.  [pdf]  [web]
      Papers: Chris
      Expts: Jae, Naga


      Oct 19
      Interactive learning and recognition

      Human-in-the-loop learning, active annotation collection, crowdsourcing



      • *Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds.  S. Vijayanarasimhan and K. Grauman.  CVPR 2011.  [pdf]

      • *Visual Recognition with Humans in the Loop.  Branson S., Wah C., Babenko B., Schroff F., Welinder P., Perona P., Belongie S.  ECCV 2010. [pdf]  [Caltech/UCSD Visipedia project]  [data]

      • *The Multidimensional Wisdom of Crowds.  Welinder P., Branson S., Belongie S., Perona, P. NIPS 2010. [pdf]  [code]

      • *What’s It Going to Cost You? : Predicting Effort vs. Informativeness for Multi-Label Image Annotations.  S. Vijayanarasimhan and K. Grauman.  CVPR 2009 [pdf] [data] [code]


      • iCoseg: Interactive Co-segmentation with Intelligent Scribble Guidance, D. Batra, A. Kowdle, D. Parikh, J. Luo and T. Chen. CVPR 2010.  [pdf] [web]

      • Labeling Images with a Computer Game. L. von Ahn and L. Dabbish. CHI, 2004.

      • Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise.  J. Whitehill et al.  NIPS 2009.  [pdf]
      • Utility Data Annotation with Amazon Mechanical Turk. A. Sorokin and D. Forsyth. Wkshp on Internet Vision, 2008.

      • Far-Sighted Active Learning on a Budget for Image and Video Recognition.  S. Vijayanarasimhan, P. Jain, and K. Grauman.  CVPR 2010.  [pdf]  [code]

      • Multiclass Recognition and Part Localization with Humans in the Loop.  C. Wah et al. ICCV 2011. [pdf]

      • Multi-Level Active Prediction of Useful Image Annotations for Recognition.  S. Vijayanarasimhan and K. Grauman.  NIPS 2008. [pdf] 

      • Active Learning from Crowds.  Y. Yan, R. Rosales, G. Fung, J. Dy.  ICML 2011.  [pdf]

      • Proactive Learning: Cost-Sensitive Active Learning with Multiple Imperfect Oracles.  P. Donmez and J. Carbonell.  CIKM 2008.  [pdf]
      • Inactive Learning?  Difficulties Employing Active Learning in Practice.  J. Attenberg and F. Provost.  SIGKDD 2011. [pdf]

      • Annotator Rationales for Visual Recognition.  J. Donahue and K. Grauman.  ICCV 2011. [pdf]

      • Interactively Building a Discriminative Vocabulary of Nameable Attributes.  D. Parikh and K. Grauman.  CVPR 2011.  [pdf] [web]

      • Actively Selecting Annotations Among Objects and Attributes.  A. Kovashka, S. Vijayanarasimhan, and K. Grauman.  ICCV 2011  [pdf]

      • Supervised Learning from Multiple Experts: Whom to Trust When Everyone Lies a Bit.  V. Raykar et al.  ICML 2009.  [pdf]
      • Multi-class Active Learning for Image Classification.  A. J. Joshi, F. Porikli, and N. Papanikolopoulos.  CVPR 2009.  [pdf]

      • GrabCut - Interactive Foreground Extraction using Iterated Graph Cuts, by C. Rother, V. Kolmogorov, A. Blake, SIGGRAPH 2004.  [pdf]  [project page]

      • Active Learning for Piecewise Planar 3D Reconstruction.  A. Kowdle, Y.-J. Chang, A. Gallagher and T. Chen. CVPR 2011 [pdf] [web]

      • Amazon Mechanical Turk
      • Using Mechanical Turk with LabelMe
        Papers: Brian, Harsh
        Expts: Yunsik

        Proposal extended outline due Friday Oct 21, 5 PM
        IV. Activity in images and video
        Oct 26
        Pictures of people

        Finding people and their poses, automatic face tagging


        • *Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations, L. Bourdev and J. Malik.  ICCV 2009  [pdf] [code]

        • *Understanding Images of Groups of People, A. Gallagher and T. Chen, CVPR 2009.  [pdf]  [web] [data]

        • *Real-Time Human Pose Recognition in Parts from a Single Depth Image.  J. Shotton et al.  CVPR 2011. [pdf] [video]

        • *"Who are you?" - Learning Person Specific Classifiers from Video, J. Sivic, M. Everingham, and A. Zisserman, CVPR 2009.  [pdf] [data] [KLT tracking code]


        • Contextual Identity Recognition in Personal Photo Albums. D. Anguelov, K.-C. Lee, S. Burak Gokturk, and B. Sumengen. CVPR 2007.  [pdf]

        • Fast Pose Estimation with Parameter Sensitive Hashing.  G. Shakhnarovich, P. Viola, T. Darrell, ICCV 2003.  [pdf]

        • Finding and Tracking People From the Bottom Up.  D. Ramanan, D. A. Forsyth.  CVPR 2003.  [pdf]

        • Where’s Waldo: Matching People in Images of Crowds.  R. Garg, D. Ramanan, S. Seitz, N. Snavely. CVPR 2011.  [pdf]

        • Autotagging Facebook: Social Network Context Improves Photo Annotation, by  Z. Stone, T. Zickler, and T. Darrell.  CVPR Internet Vision Workshop 2008.   [pdf]

        • Efficient Propagation for Face Annotation in Family Albums. L. Zhang, Y. Hu, M. Li, and H. Zhang.  MM 2004.  [pdf]

        • Progressive Search Space Reduction for Human Pose Estimation.  Ferrari, V., Marin-Jimenez, M. and Zisserman, A.  CVPR 2008.  [pdf] [web] [code]

        • Leveraging Archival Video for Building Face Datasets, by D. Ramanan, S. Baker, and S. Kakade.  ICCV 2007.  [pdf]
        • Names and Faces in the News, by T. Berg, A. Berg, J. Edwards, M. Maire, R. White, Y. Teh, E. Learned-Miller and D. Forsyth, CVPR 2004.  [pdf]  [web]

        • Face Discovery with Social Context.  Y. J. Lee and K. Grauman.  BMVC 2011.  [pdf]

        • “Hello! My name is... Buffy” – Automatic Naming of Characters in TV Video, by M. Everingham, J. Sivic and A. Zisserman, BMVC 2006.  [pdf]  [web]  [data]

        • Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities. Yao, B., Fei-Fei, L.  CVPR 2010.

        • A Face Annotation Framework with Partial Clustering and Interactive Labeling.  R. X. Y. Tian, W. Liu, F. Wen, and X. Tang.  CVPR 2007.  [pdf] [web]

        • From 3D Scene Geometry to Human Workspace.  A. Gupta et al.  CVPR 2011.  [pdf] [web]

        • Pictorial Structures Revisited: People Detection and Articulated Pose Estimation.  M. Andriluka et al. CVPR 2009.  [pdf]  [code]

        Papers: Sunil, Larry
        Expts: Nishant, Jung


        Nov 2
        Activity recognition

        Recognizing and localizing human actions in video

        • *Actions in Context, M. Marszalek, I. Laptev, C. Schmid.  CVPR 2009.  [pdf] [web] [data]

        • *A Hough Transform-Based Voting Framework for Action Recognition.  A. Yao, J. Gall, L. Van Gool.  CVPR 2010.  [pdf] [code/data]

        • *Beyond Actions: Discriminative Models for Contextual Group Activities.  T. Lan, Y. Wang, W. Yang, and G. Mori.  NIPS 2010.  [pdf] [data]


        • Objects in Action: An Approach for Combining Action Understanding and Object Perception.   A. Gupta and L. Davis.  CVPR, 2007.  [pdf]  [data]

        • Learning Realistic Human Actions from Movies.  I. Laptev, M. Marszałek, C. Schmid and B. Rozenfeld.  CVPR 2008.  [pdf]  [data]

        • Understanding Egocentric Activities.  A. Fathi, A. Farhadi, J. Rehg.  ICCV 2011. [pdf]

        • Exploiting Human Actions and Object Context for Recognition Tasks.  D. Moore, I. Essa, and M. Hayes.  ICCV 1999.  [pdf]

        • A Scalable Approach to Activity Recognition Based on Object Use. J. Wu, A. Osuntogun, T. Choudhury, M. Philipose, and J. Rehg.  ICCV 2007.  [pdf]

        • Recognizing Actions at a Distance.  A. Efros, G. Mori, J. Malik.  ICCV 2003.  [pdf] [web]

        • Activity Recognition from First Person Sensing.  E. Taralova, F. De la Torre, M. Hebert  CVPR 2009 Workshop on Egocentric Vision  [pdf]

        • Action Recognition from a Distributed Representation of Pose and Appearance, S. Maji, L. Bourdev, J.  Malik, CVPR 2011.  [pdf]  [code]

        • Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition.  A. Kovashka and K. Grauman.  CVPR 2010.  [pdf]

        • Temporal Causality for the Analysis of Visual Events.  K. Prabhakar, S. Oh, P. Wang, G. Abowd, and J. Rehg.  CVPR 2010.  [pdf] [Georgia Tech Computational Behavior Science project]

        • Modeling Activity Global Temporal Dependencies using Time Delayed Probabilistic Graphical Model.  Loy, Xiang & Gong ICCV 2009.  [pdf]

        • What's Going on?: Discovering Spatio-Temporal Dependencies in Dynamic Scenes.  D. Kuettel et al.  CVPR 2010.  [pdf]

        • Learning Actions From the Web.  N. Ikizler-Cinbis, R. Gokberk Cinbis, S. Sclaroff.  ICCV 2009.  [pdf]

        • Content-based Retrieval of Functional Objects in Video Using Scene Context.  S. Oh, A. Hoogs, M. Turek, and R. Collins.  ECCV 2010.  [pdf]
        Papers: Qiming, Yunsik
        Expts: Lu Xia


        V. Dealing with lots of data/categories
        Nov 9
        Scaling with a large number of categories

        Sharing features between classes, transfer, taxonomy, learning from few examples, exploiting class relationships

        • *Sharing Visual Features for Multiclass and Multiview Object Detection, A. Torralba, K. Murphy, W. Freeman, PAMI 2007.  [pdf]  [code]

        • *What Does Classifying More than 10,000 Image Categories Tell Us? J. Deng, A. Berg, K. Li and L. Fei-Fei.  ECCV 2010.  [pdf]

        • *Discriminative Learning of Relaxed Hierarchy for Large-scale Visual Recognition.  T. Gao and D. Koller.  ICCV 2011.  [pdf] [code]


        • Comparative Object Similarity for Improved Recognition with Few or Zero Examples. G. Wang, D. Forsyth, and D. Hoiem. CVPR 2010. [pdf]

        • Learning and Using Taxonomies for Fast Visual Categorization, G. Griffin and P. Perona, CVPR 2008.  [pdf] [data]

        • Cross-Generalization: Learning Novel Classes from a Single Example by Feature Replacement.  E. Bart and S. Ullman.  CVPR 2005.  [pdf]

        • 80 Million Tiny Images: A Large Dataset for Non-Parametric Object and Scene Recognition, by A. Torralba, R. Fergus, and W. Freeman.  PAMI 2008.  [pdf] [web]

        • Constructing Category Hierarchies for Visual Recognition, M. Marszalek and C. Schmid.  ECCV 2008.  [pdf]  [web] [Caltech256]

        • Learning Generative Visual Models from Few Training Examples: an Incremental Bayesian Approach Tested on 101 Object Categories. L. Fei-Fei, R. Fergus, and P. Perona. CVPR Workshop on Generative-Model Based Vision. 2004.  [pdf] [Caltech101]

        • Towards Scalable Representations of Object Categories: Learning a Hierarchy of Parts. S. Fidler and A. Leonardis.  CVPR 2007  [pdf]

        • Exploiting Object Hierarchy: Combining Models from Different Category Levels, A. Zweig and D. Weinshall, ICCV 2007 [pdf]

        • Incremental Learning of Object Detectors Using a Visual Shape Alphabet.  Opelt, Pinz, and Zisserman, CVPR 2006.  [pdf]

        • Sequential Learning of Reusable Parts for Object Detection.  S. Krempp, D. Geman, and Y. Amit.  2002  [pdf]

        • ImageNet: A Large-Scale Hierarchical Image Database, J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, CVPR 2009 [pdf]  [data]

        • Semantic Label Sharing for Learning with Many Categories.  R. Fergus et al.  ECCV 2010.  [pdf]

        • Learning a Tree of Metrics with Disjoint Visual Features.  S. J. Hwang, K. Grauman, F. Sha.  NIPS 2011. 

        Papers: Cho-Jui, Si Si
        Expts: Lu Pan


        Nov 16
        Large-scale search and mining

        Scalable retrieval algorithms for massive databases, mining for themes

        • *VisualRank: Applying PageRank to Large-Scale Image Search.  Y. Jing and S. Baluja.  PAMI 2008.  [pdf]

        • *Kernelized Locality Sensitive Hashing for Scalable Image Search, by B. Kulis and K. Grauman, ICCV 2009 [pdf]  [code]

        • *Video Mining with Frequent Itemset Configurations.  T. Quack, V. Ferrari, and L. Van Gool.  CIVR 2006.  [pdf]


        • Learning Binary Projections for Large-Scale Image Search.  K. Grauman and R. Fergus.  Chapter (draft) to appear in Registration, Recognition, and Video Analysis, R. Cipolla, S. Battiato, and G. Farinella, Editors.  [pdf]

        • World-scale Mining of Objects and Events from Community Photo Collections.  T. Quack, B. Leibe, and L. Van Gool.  CIVR 2008.  [pdf]

        • Interest Seam Image.  X. Zhang, G. Hua, L. Zhang, H. Shum.  CVPR 2010.  [pdf]

        • Detecting Objects in Large Image Collections and Videos by Efficient Subimage Retrieval, C. Lampert, ICCV 2009.  [pdf]  [code]

        • Geometric Min-Hashing: Finding a (Thick) Needle in a Haystack, O. Chum, M. Perdoch, and J. Matas.  CVPR 2009.  [pdf]

        • FaceTracer: A Search Engine for Large Collections of Images with Faces.  N. Kumar, P. Belhumeur, and S. Nayar.  ECCV 2008.  [pdf]

        • Efficiently Searching for Similar Images.  K. Grauman.  Communications of the ACM, 2009.  [CACM link]

        • Fast Image Search for Learned Metrics, P. Jain, B. Kulis, and K. Grauman, CVPR 2008.  [pdf]

        • Small Codes and Large Image Databases for Recognition, A. Torralba, R. Fergus, and Y. Weiss, CVPR 2008.  [pdf]

        • Object Retrieval with Large Vocabularies and Fast Spatial Matching.  J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, CVPR 2007.  [pdf]


        Papers: Naga, Jae
        Expts: Si Si


        Nov 23
        Summarization

        Video synopsis, discovering repeated objects, visualization

        • *Webcam Synopsis: Peeking Around the World, by Y. Pritch, A. Rav-Acha, A. Gutman, and S. Peleg, ICCV 2007.  [pdf] [web]

        • *Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, B. C. Russell, A. A. Efros, J. Sivic, W. T. Freeman, and A. Zisserman.  CVPR 2006.  [pdf] [code]

        • *Summarizing Visual Data Using Bi-Directional Similarity.  D. Simakov, Y. Caspi, E. Shechtman, M. Irani.  CVPR 2008.  [pdf] [video]


        • Fast Unsupervised Ego-Action Learning for First-Person Sports Video.  K. Kitani, T. Okabe, Y. Sato, A. Sugimoto.  CVPR 2011.  [pdf]

        • Scene Summarization for Online Image Collections.  I. Simon, N. Snavely, S. Seitz.  ICCV 2007.  [pdf]  [web]

        • VideoCut: Removing Irrelevant Frames by Discovering the Object of Interest. D. Liu, G. Hua, T. Chen.  ECCV 2010.  [pdf]

        • Video Epitomes. V. Cheung, B. J. Frey, and N. Jojic.  CVPR 2005. [pdf] [web] [code]

        • Making a Long Video Short. A. Rav-Acha, Y. Pritch, and S. Peleg.  CVPR 2006. [pdf]

        • Structural Epitome: A Way to Summarize One's Visual Experience.  N. Jojic, A. Perina, V. Murino.  NIPS 2010.  [pdf] [data]

        • Video Abstraction: A Systematic Review and Classification.  B. Truong and S. Venkatesh.  ACM 2007.  [pdf]

        • Shape Discovery from Unlabeled Image Collections.  Y. J. Lee and K. Grauman.  CVPR 2009.  [pdf]
        • Detecting and Sketching the Common.  S. Bagon, O. Brostovski, M. Galun, M. Irani.  CVPR 2010.  [pdf]
        • Object-Graphs for Context-Aware Category Discovery.  Y. J. Lee and K. Grauman.  CVPR 2010.  [pdf] [code]

        • Unsupervised Object Discovery: A Comparison.  T. Tuytelaars et al.  IJCV 2009.  [pdf]

        Papers: Lu Pan
        Expts: Sunil, Chris

        Final paper drafts due Wed Nov 23
        Nov 30
        Final project presentations in class


        Final papers due Tues Dec 6, 5 PM


        Other useful links:

         
         

         


        schedule_CS395T Visual Recognition (Fall 2012).htm

         

        http://www.cs.utexas.edu/~cv-fall2012/schedule.html 

         

        CS395T: Visual Recognition, Fall 2012





        Meets: Fridays 1-4 pm in ACES 3.408

        Instructor: Kristen Grauman 
        Email: grauman@cs
        Office: ACES 3.446 
        Office hours: by appointment (send email)


        TA: Austin Waters
        Email: austin@cs
        Office hours: by appointment (send email)

        When emailing us, please put CS395 in the subject line.

        Announcements:

        See the schedule for weekly reading assignments. 

        Project extended outlines due Wed Oct 31.  See handout for guidelines.


        Course overview:


        Topics: This is a graduate seminar course in computer vision.   We will survey and discuss current vision papers relating to object and activity recognition, auto-annotation of images, and scene understanding.  The goals of the course will be to understand current approaches to some important problems, to actively analyze their strengths and weaknesses, and to identify interesting open questions and possible directions for future research.

        See the syllabus for an outline of the main topics we'll be covering.

        Requirements: Students will be responsible for writing paper reviews each week, participating in discussions, completing two programming assignments, presenting once or twice in class (depending on enrollment), and completing a project (done in pairs). 

        Note that presentations are due one week before the slot your presentation is scheduled.  This means you will need to read the papers, prepare experiments, create slides, etc. more than one week before the date you are signed up for.  The idea is to meet and discuss ahead of time, so that we can iterate as needed the week leading up to your presentation. 

        More details on the requirements and grading breakdown are here.

        Prereqs:  Courses in computer vision and/or machine learning (378/376 Computer Vision and/or 391 Machine Learning, or similar); ability to understand and analyze conference papers in this area; programming required for experiment presentations and projects. 

        Please talk to me if you are unsure if the course is a good match for your background.  I generally recommend scanning through a few papers on the syllabus to gauge what kind of background is expected.  I don't assume you are already familiar with every single algorithm/tool/image feature a given paper mentions, but you should feel comfortable following the key ideas.



        Syllabus overview:


        A. Object recognition fundamentals
        1. Local features and matching for object instances
        2. Large-scale image/object search and mining
        3. Classification and detection for object categories
        4. Mid-level representations
        B. Beyond modeling individual objects
        1. Context and scenes
        2. Dealing with many categories
        3. Describing objects with attributes
        4. Importance and saliency

        C. Human-centered recognition

        1. Pictures of people
        2. Activity recognition
        3. Egocentric cameras
        4. Human-in-the-loop interactive systems

          Important dates:
            • Wednesday, Sept 5: paper topic preferences due
            • Friday, Sept 21: first coding assignment due
            • Friday, Oct 5: second coding assignment due
            • Wednesday, Oct 17: project proposal abstracts due
            • Wednesday, Oct 31: project extended outlines due
            • Friday, Dec 7: final papers due


            Schedule and papers:


            Note:  * = required reading. 
            Additional papers are provided for reference, and as a starting point for background reading for projects.
            Paper presentations: Cover the starred papers.
            Experiment presentations: Pick one from among the starred papers.
            Date
            Topics
            Papers and links
            Presenters
            Items due
            Aug 31
            Course intro 

            [slides]
            Topic preferences due via email to Austin (austin@cs) by Wed Sept 5 at 5 pm
            A. Object recognition fundamentals
            Sept 7
            Local features and matching for object instances:

            Invariant local features, instance recognition, visual vocabularies and bag-of-words

            • *Object Recognition from Local Scale-Invariant Features, Lowe, ICCV 1999.  [pdf]  [code] [other implementations of SIFT] [IJCV]

            • *Selected pages from: Local Invariant Feature Detectors: A Survey, Tuytelaars and Mikolajczyk.  Foundations and Trends in Computer Graphics and Vision, 2008. [pdf]  [Oxford code] [Read pp. 178-188, 216-220, 254-255]

            • *Video Google: A Text Retrieval Approach to Object Matching in Videos, Sivic and Zisserman, ICCV 2003.  [pdf]  [demo]

            • For more background on feature extraction: Szeliski book: Sec 3.2 Linear filtering, 4.1 Points and patches, 4.2 Edges

            • Scalable Recognition with a Vocabulary Tree, D. Nister and H. Stewenius, CVPR 2006. [pdf]

            • SURF: Speeded Up Robust Features, Bay, Ess, Tuytelaars, and Van Gool, CVIU 2008.  [pdf] [code]

            • Robust Wide Baseline Stereo from Maximally Stable Extremal Regions, J. Matas, O. Chum, M. Urban, and T. Pajdla, BMVC 2002.  [pdf]

            • A Performance Evaluation of Local Descriptors. K. Mikolajczyk and C. Schmid.  CVPR 2003 [pdf]

            [outline]
            [filters]
            [local features]
            [matching and spatial verification]

            Sept 14
            Large-scale image/object search and mining:

            Scalable retrieval algorithms, mining for visual themes, particularly for object instances

            • *Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval.  O. Chum et al. CVPR 2007.  [pdf] [Oxford buildings dataset]

            • *Discovering Favorite Views of Popular Places with Iconoid Shift.  T. Weyand and B. Leibe.  ICCV 2011.  [pdf] [Paris 500K dataset]

            • *Supervised Hashing with Kernels.  W. Liu, J. Wang, R. Ji, Y. Jiang, S.-F. Chang.  CVPR 2012 [pdf]

            • Kernelized Locality Sensitive Hashing for Scalable Image Search, by B. Kulis and K. Grauman, ICCV 2009 [pdf]  [code] [80M Tiny Images data]

            • Image Webs: Computing and Exploiting Connectivity in Image Collections.  K. Heath, N. Gelfand, M. Ovsjanikov, M. Aanjaneya, and L. Guibas.  CVPR 2010.  [pdf]
            • World-scale Mining of Objects and Events from Community Photo Collections.  T. Quack, B. Leibe, and L. Van Gool.  CIVR 2008.  [pdf]

            • Total Recall II: Query Expansion Revisited.  O. Chum, A. Mikulik, M. Perdoch, and J. Matas.  CVPR 2011.  [pdf]

            • Geometric Min-Hashing: Finding a (Thick) Needle in a Haystack, O. Chum, M. Perdoch, and J. Matas.  CVPR 2009.  [pdf]

            • Three Things Everyone Should Know to Improve Object Retrieval.  R. Arandjelovic and A. Zisserman.  CVPR 2012.  [pdf]

            • Video Mining with Frequent Itemset Configurations.  T. Quack, V. Ferrari, and L. Van Gool.  CIVR 2006.  [pdf]

            • Bundling Features for Large Scale Partial-Duplicate Web Image Search.  Z. Wu, Q. Ke, M. Isard, and J. Sun.  CVPR 2009.  [pdf]

            • Improving Image-based Localization by Active Correspondence Search. T. Sattler, B. Leibe, L. Kobbelt.  ECCV 2012.  [pdf]

            • Learning Binary Projections for Large-Scale Image Search.  K. Grauman and R. Fergus.  Chapter to appear in Registration, Recognition, and Video Analysis, R. Cipolla, S. Battiato, and G. Farinella, Editors.  [pdf]

            • Learning Query-dependent Prefilters for Scalable Image Retrieval.  L. Torresani, M. Szummer, and A. Fitzgibbon.  CVPR 2009.  [pdf]

            • Detecting Objects in Large Image Collections and Videos by Efficient Subimage Retrieval, C. Lampert, ICCV 2009.  [pdf]  [code]

            • Efficiently Searching for Similar Images.  K. Grauman.  Communications of the ACM, 2009.  [CACM link]

            • Fast Image Search for Learned Metrics, P. Jain, B. Kulis, and K. Grauman, CVPR 2008.  [pdf]

            • Small Codes and Large Image Databases for Recognition, A. Torralba, R. Fergus, and Y. Weiss, CVPR 2008.  [pdf]

            • Object Retrieval with Large Vocabularies and Fast Spatial Matching.  J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, CVPR 2007.  [pdf] [approx k-means code]

            • City-Scale Location Recognition, G. Schindler, M. Brown, and R. Szeliski, CVPR 2007.  [pdf]
            [outline]
            [wrap-up on instance recognition, large-scale search]

            Sept 21
            Classification and detection for object categories

            Global appearance models for category and scene recognition; sliding window detection, voting-based detection, detection as a binary decision problem.

            • *A Discriminatively Trained, Multiscale, Deformable Part Model, by P. Felzenszwalb, D. McAllester and D. Ramanan.  CVPR 2008.  [pdf]  [code]

            • *Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, Lazebnik, Schmid, and Ponce, CVPR 2006. [pdf]  [15 scenes dataset]  [libpmk] [Matlab]

            • *Class-specific Hough Forests for Object Detection.  J. Gall and V. Lempitsky.  CVPR 2009.  [pdf] [slides] [code]

            • Robust Object Detection with Interleaved Categorization and Segmentation.  B. Leibe, A. Leonardis, and B. Schiele.  IJCV 2008.  [pdf]  [code]

            • The Devil is in the Details: an Evaluation of Recent Feature Encoding Methods.  K. Chatfield, V. Lempitsky, A. Vedaldi, A. Zisserman.  BMVC 2011.  [pdf] [code]

            • Rapid Object Detection Using a Boosted Cascade of Simple Features, Viola and Jones, CVPR 2001.  [pdf]  [code]

            • Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, CVPR 2005.  [pdf]  [video] [code] [PASCAL datasets]

            • Modeling the Shape of the Scene: a Holistic Representation of the Spatial Envelope, Oliva and Torralba, IJCV 2001.  [pdf]  [Gist code]

            • Locality-Constrained Linear Coding for Image Classification.  J. Wang, J. Yang, K. Yu, and T. Huang.  CVPR 2010. [pdf] [code]

            • Visual Categorization with Bags of Keypoints, C. Dance, J. Willamowski, L. Fan, C. Bray, and G. Csurka, ECCV International Workshop on Statistical Learning in Computer Vision, 2004.  [pdf]

            • Pedestrian Detection in Crowded Scenes, Leibe, Seemann, and Schiele, CVPR 2005.  [pdf]

            • Pyramids of Histograms of Oriented Gradients (pHOG), Bosch and Zisserman. [code]

            • Sampling Strategies for Bag-of-Features Image Classification.  E. Nowak, F. Jurie, and B. Triggs.  ECCV 2006. [pdf]

            • Beyond Sliding Windows: Object Localization by Efficient Subwindow Search.  C. Lampert, M. Blaschko, and T. Hofmann.  CVPR 2008.  [pdf]  [code]

            • Diagnosing Error in Object Detectors.  D. Hoiem et al. ECCV 2012.  [pdf]
            [outline]
            [slides part 1]
            Heath-expt
            Nona-paper

            HW1 due Friday Sept 21, 11:59 pm
            Sept 28
            Mid-level representations

            Segmentation into regions, grouping, surface estimation



            • *Constrained Parametric Min-Cuts for Automatic Object Segmentation. J. Carreira and C. Sminchisescu. CVPR 2010.  [pdf] [code]

            • *From Contours to Regions: An Empirical Evaluation.  P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik.  CVPR 2009.  [pdf] [code and data] [journal paper]

            • *Indoor Segmentation and Support Inference from RGBD Images.  N. Silberman, D. Hoiem, P. Kohli, and R. Fergus.  ECCV 2012.  [pdf] [NYU depth dataset]
            • Geometric Context from a Single Image, D. Hoiem, A. Efros, and M. Hebert, ICCV 2005. [pdf]  [web]  [code]

            • Category Independent Object Proposals.  I. Endres and D. Hoiem.  ECCV 2010.  [pdf] [code/data]

            • Geometric Reasoning for Single Image Structure Recovery.  D. Lee, M. Hebert, T. Kanade.  CVPR 2009.  [pdf]  [code]

            • Boundary-Preserving Dense Local Regions.  J. Kim and K. Grauman.  CVPR 2011.  [pdf]  [code]

            • Object Recognition as Ranking Holistic Figure-Ground Hypotheses. F. Li, J. Carreira, and C. Sminchisescu. CVPR 2010. [pdf]

            • People Watching: Human Actions as a Cue for Single View Geometry.  D. Fouhey et al. ECCV 2012.  [pdf]
            • Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, B. C. Russell, A. A. Efros, J. Sivic, W. T. Freeman, and A. Zisserman.  CVPR 2006.  [pdf] [code]

            • Combining Top-down and Bottom-up Segmentation. E. Borenstein, E. Sharon, and S. Ullman.  CVPR  workshop 2004.  [pdf]  [data]

            • Learning Mid-level Features for Recognition. Y.-L. Boureau, F. Bach, Y. LeCun, and J. Ponce. CVPR, 2010. 

            • Class-Specific, Top-Down Segmentation, E. Borenstein and S. Ullman, ECCV 2002.  [pdf]

            • GrabCut - Interactive Foreground Extraction using Iterated Graph Cuts, by C. Rother, V. Kolmogorov, A. Blake, SIGGRAPH 2004.  [pdf]  [project page]

            • Robust Higher Order Potentials for Enforcing Label Consistency, P. Kohli, L. Ladicky, and P. Torr. CVPR 2008.  

            • Collect-Cut: Segmentation with Top-Down Cues Discovered in Multi-Object Images.  Y. J. Lee and K. Grauman. CVPR 2010.  [pdf] [data]

            • Shape Sharing for Object Segmentation.  J. Kim and K. Grauman.  ECCV 2012.  [pdf]
            • Normalized Cuts and Image Segmentation, J. Shi and J. Malik.  PAMI 2000.  [pdf]  [code]

            [outline]
            [slides]
            Che-Chun-expt
            Elad-expt
            Sanmit-paper
            Islam-paper
            Chao-paper


            B. Beyond modeling individual objects
            Oct 5
            Context and scenes

            Multi-object scenes, inter-object relationships, understanding scenes' spatial layout

            • *Scene Semantics from Long-term Observation of People.  V. Delaitre, D. Fouhey, I. Laptev, J. Sivic, A. Gupta, A. Efros.  ECCV 2012 [pdf] [web] [pose code]
            • *Multi-Class Segmentation with Relative Location Prior.  S. Gould, J. Rodgers, D. Cohen, G. Elidan and D.  Koller.  IJCV 2008. [pdf] [code]

            • *Using the Forest to See the Trees: Exploiting Context for Visual Object Detection and Localization.  Torralba, Murphy, and Freeman.  CACM 2009.  [pdf] [related code]

            • Object-Graphs for Context-Aware Category Discovery.  Y. J. Lee and K. Grauman.  CVPR 2010.  [pdf] [code]

            • Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces.  D. Lee, A. Gupta, M. Hebert, and T. Kanade.  NIPS 2010.  [pdf] [code]

            • Contextual Priming for Object Detection, A. Torralba.  IJCV 2003.  [pdf] [web] [code]

            • Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification.  L-J. Li, H. Su, E. Xing, L. Fei-Fei.  NIPS 2010.  [pdf]  [code]

            • RGB-D Scene Labeling: Features and Algorithms. X. Ren, L. Bo, and D. Fox.  CVPR 2012. [pdf] [code]

            • TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation.  J. Shotton, J. Winn, C. Rother, A. Criminisi.  ECCV 2006.  [pdf] [web] [data] [code]

            • Recognition Using Visual Phrases.  M. Sadeghi and A. Farhadi.  CVPR 2011.  [pdf]

            • Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry.  V. Hedau, D. Hoiem, and D. Forsyth.  ECCV 2010 [pdf] [code and data]

            • Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics, A. Gupta, A. Efros, and M. Hebert.  ECCV 2010. [pdf]  [code]

            • Geometric Reasoning for Single Image Structure Recovery.  D. Lee, M. Hebert, and T. Kanade.  CVPR 2009.  [pdf]  [web] [code]

            • Putting Objects in Perspective, by D. Hoiem, A. Efros, and M. Hebert, CVPR 2006.  [pdf] [web]

            • Discriminative Models for Multi-Class Object Layout, C. Desai, D. Ramanan, C. Fowlkes. ICCV 2009.  [pdf]  [slides]  [SVM struct code] [data]

            • Closing the Loop in Scene Interpretation.  D. Hoiem, A. Efros, and M. Hebert.  CVPR 2008.  [pdf]

            • Decomposing a Scene into Geometric and Semantically Consistent Regions, S. Gould, R. Fulton, and D. Koller, ICCV 2009.  [pdf]  [slides]

            • Learning Spatial Context: Using Stuff to Find Things, by G. Heitz and D. Koller, ECCV 2008.  [pdf] [code]

            • An Empirical Study of Context in Object Detection, S. Divvala, D. Hoiem, J. Hays, A. Efros, M. Hebert, CVPR 2009.  [pdf]  [web]

            • Object Categorization using Co-Occurrence, Location and Appearance, by C. Galleguillos, A. Rabinovich and S. Belongie, CVPR 2008.  [pdf]

            • Context Based Object Categorization: A Critical Survey.  C. Galleguillos and S. Belongie.  [pdf]

            • What, Where and Who? Classifying Events by Scene and Object Recognition, L.-J. Li and L. Fei-Fei, ICCV 2007. [pdf]

            • Simultaneous Visual Recognition of Manipulation Actions and Manipulated Objects.  H. Kjellstrom et al. ECCV 2008.  [pdf]

            • Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities.  B. Yao and L. Fei-Fei.  CVPR 2010.  [pdf]

            Jacob-paper
            Aron-paper
            Aashish-expt
            David-expt
            HW2 due Friday Oct 5, 11:59 pm
            Oct 12
            Dealing with many categories

            Sharing features between classes, transfer, taxonomy, learning from few examples, exploiting class relationships

            • *Sharing Visual Features for Multiclass and Multiview Object Detection, A. Torralba, K. Murphy, W. Freeman, PAMI 2007.  [pdf]  [code]

            • *Hedging Your Bets: Optimizing Accuracy-Specificity Trade-offs in Large Scale Visual Recognition.  J. Deng, J. Krause, A. Berg, L. Fei-Fei.  CVPR 2012 [pdf] [supp] [ILSVRC data]

            • *Tabula Rasa: Model Transfer for Object Category Detection. Y. Aytar and A. Zisserman.  ICCV 2011. [pdf] [HoG code]

            • What Does Classifying More than 10,000 Image Categories Tell Us? J. Deng, A. Berg, K. Li and L. Fei-Fei.  ECCV 2010.  [pdf]

            • Discriminative Learning of Relaxed Hierarchy for Large-scale Visual Recognition.  T. Gao and D. Koller.  ICCV 2011.  [pdf] [code]

            • Comparative Object Similarity for Improved Recognition with Few or Zero Examples. G. Wang, D. Forsyth, and D. Hoiem. CVPR 2010. [pdf]

            • Learning and Using Taxonomies for Fast Visual Categorization, G. Griffin and P. Perona, CVPR 2008.  [pdf] [data]

            • 80 Million Tiny Images: A Large Dataset for Non-Parametric Object and Scene Recognition, by A. Torralba, R. Fergus, and W. Freeman.  PAMI 2008.  [pdf] [web]

            • Constructing Category Hierarchies for Visual Recognition, M. Marszalek and C. Schmid.  ECCV 2008.  [pdf]  [web] [Caltech256]

            • Learning Generative Visual Models from Few Training Examples: an Incremental Bayesian Approach Tested on 101 Object Categories. L. Fei-Fei, R. Fergus, and P. Perona. CVPR Workshop on Generative-Model Based Vision. 2004.  [pdf] [Caltech101]

            • Towards Scalable Representations of Object Categories: Learning a Hierarchy of Parts. S. Fidler and A. Leonardis.  CVPR 2007  [pdf]

            • Exploiting Object Hierarchy: Combining Models from Different Category Levels, A. Zweig and D. Weinshall, ICCV 2007 [pdf]

            • Incremental Learning of Object Detectors Using a Visual Shape Alphabet.  Opelt, Pinz, and Zisserman, CVPR 2006.  [pdf]

            • ImageNet: A Large-Scale Hierarchical Image Database, J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, CVPR 2009 [pdf]  [data]

            • Semantic Label Sharing for Learning with Many Categories.  R. Fergus et al.  ECCV 2010.  [pdf]

            • Learning a Tree of Metrics with Disjoint Visual Features.  S. J. Hwang, K. Grauman, F. Sha.  NIPS 2011. 

            Elad-paper
            Gary-expt


            Wed Oct 17: project proposal abstracts due
            Oct 19
            Describing objects with attributes

            Visual properties, learning from natural language descriptions, intermediate representations

            • *Describing Objects by Their Attributes, A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, CVPR 2009.  [pdf]  [web] [data]

            • *FaceTracer: A Search Engine for Large Collections of Images with Faces.  N. Kumar, P. Belhumeur, and S. Nayar.  ECCV 2008.  [pdf] [code, data, demo]
            • *Relative Attributes.  D. Parikh and K. Grauman.  ICCV 2011.  [pdf]  [code/data]

            • Attribute and Simile Classifiers for Face Verification, N. Kumar, A. Berg, P. Belhumeur, S. Nayar.  ICCV 2009.  [pdf] [web] [lfw data] [pubfig data]

            • Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, C. Lampert, H. Nickisch, and S. Harmeling, CVPR 2009  [pdf] [web] [data]

            • A Joint Learning Framework for Attribute Models and Object Descriptions.  D. Mahajan, S. Sellamanickam, V. Nair.  ICCV 2011.  [pdf]

            • WhittleSearch: Image Search with Relative Attribute Feedback.  A. Kovashka, D. Parikh, K. Grauman.  CVPR 2012.  [pdf] [data]
            • SUN Attribute Database: Discovering, Annotating, and Recognizing Scene Attributes.  G. Patterson and J. Hays.  CVPR 2012.  [pdf] [data]

            • Multi-Attribute Spaces: Calibration for Attribute Fusion and Similarity Search.  W. Scheirer, N. Kumar, P. Belhumeur, T. Boult.  CVPR 2012  [pdf]

            • A Discriminative Latent Model of Object Classes and Attributes.  Y. Wang and G. Mori.  ECCV, 2010.  [pdf]

            • Learning Visual Attributes, V. Ferrari and A. Zisserman, NIPS 2007.  [pdf] 

            • Learning Models for Object Recognition from Natural Language Descriptions, J. Wang, K. Markert, and M. Everingham, BMVC 2009.  [pdf]

            • Attribute-Centric Recognition for Cross-Category Generalization.  A. Farhadi, I. Endres, D. Hoiem.  CVPR 2010.  [pdf]

            • Automatic Attribute Discovery and Characterization from Noisy Web Data.  T. Berg et al.  ECCV 2010.  [pdf]  [data]

            • Attributes-Based People Search in Surveillance Environments.  D. Vaquero, R. Feris, D. Tran, L. Brown, A. Hampapur, and M. Turk.  WACV 2009.  [pdf] [project page]

            • Image Region Entropy: A Measure of "Visualness" of Web Images Associated with One Concept.  K. Yanai and K. Barnard.  ACM MM 2005.  [pdf]

            • What Helps Where And Why? Semantic Relatedness for Knowledge Transfer. M. Rohrbach, M. Stark, G. Szarvas, I. Gurevych and B. Schiele. CVPR 2010.  [pdf]

            • Recognizing Human Actions by Attributes.  J. Liu, B. Kuipers, S. Savarese, CVPR 2011.  [pdf]

            • Interactively Building a Discriminative Vocabulary of Nameable Attributes.  D. Parikh and K. Grauman.  CVPR 2011.  [pdf] [web]

            Aashish-paper
            Girish-paper
            Sanmit-expt
            Nona-expt



            Oct 26
            Importance and saliency

            Among all items in the scene, which deserve attention (first)?  What makes images interesting or memorable?

            • *Understanding and Predicting Importance in Images.  A. Berg et al.  CVPR 2012.  [pdf] [UIUC sentence dataset] [ImageClef dataset]
            • *Learning to Detect a Salient Object.  T. Liu et al. CVPR 2007.  [pdf]  [results]  [data]  [code]

            • *What Makes an Image Memorable?  P. Isola, J. Xiao, A. Torralba, A. Oliva. CVPR 2011. [pdf] [web] [code/data]

            • What Do We Perceive in a Glance of a Real-World Scene?  L. Fei-Fei, A. Iyer, C. Koch, and P. Perona.  Journal of Vision, 2007.  [pdf]

            • A Model of Saliency-based Visual Attention for Rapid Scene Analysis.  L. Itti, C. Koch, and E. Niebur.  PAMI 1998  [pdf]

            • Interesting Objects are Visually Salient.  L. Elazary and L. Itti.  Journal of Vision, 8(3):1–15, 2008.  [pdf]

            • Accounting for the Relative Importance of Objects in Image Retrieval.  S. J. Hwang and K. Grauman.  BMVC 2010.  [pdf] [web] [data]

            • Some Objects are More Equal Than Others: Measuring and Predicting Importance, M. Spain and P. Perona.  ECCV 2008.  [pdf]

            • The Discriminant Center-Surround Hypothesis for Bottom-Up Saliency. D. Gao, V. Mahadevan, and N. Vasconcelos. NIPS, 2007.  [pdf]

            • What is an Object?  B. Alexe, T. Deselaers, and V. Ferrari.  CVPR 2010.  [pdf] [code]

            • A Principled Approach to Detecting Surprising Events in Video.  L. Itti and P. Baldi.  CVPR 2005  [pdf]

            • What Attributes Guide the Deployment of Visual Attention and How Do They Do It? J. Wolfe and T. Horowitz. Nature Reviews Neuroscience, 5:495–501, 2004.  [pdf]

            • Visual Correlates of Fixation Selection: Effects of Scale and Time. B. Tatler, R. Baddeley, and I. Gilchrist. Vision Research, 45:643, 2005.  [pdf]

            • Objects Predict Fixations Better than Early Saliency.  W. Einhauser, M. Spain, and P. Perona. Journal of Vision, 8(14):1–26, 2008.  [pdf]

            • Reading Between the Lines: Object Localization Using Implicit Cues from Image Tags.  S. J. Hwang and K. Grauman.  CVPR 2010.  [pdf]

            • Peripheral-Foveal Vision for Real-time Object Recognition and Tracking in Video.  S. Gould, J. Arfvidsson, A. Kaehler, B. Sapp, M. Messner, G. Bradski, P. Baumstarck, S. Chung, A. Ng.  IJCAI 2007.  [pdf]

            • Determining Patch Saliency Using Low-Level Context, D. Parikh, L. Zitnick, and T. Chen. ECCV 2008.  [pdf]

            • Key-Segments for Video Object Segmentation.  Y. J. Lee, J. Kim, and K. Grauman.  ICCV 2011  [pdf]

            • Contextual Guidance of Eye Movements and Attention in Real-World Scenes: The Role of Global Features on Object Search.  A. Torralba, A. Oliva, M. Castelhano, J. Henderson.  [pdf] [web]

            • The Role of Top-down and Bottom-up Processes in Guiding Eye Movements during Visual Search, G. Zelinsky, W. Zhang, B. Yu, X. Chen, D. Samaras, NIPS 2005.  [pdf]

            Islam-expt
            Chao-expt
            Che-Chun-paper
            Niveda-paper
            Wed Oct 31: project extended outlines due
            C. Human-centered recognition
            Nov 2
            Pictures of people

            Finding people, predicting their poses and attributes, automatic face tagging

            • *Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations, L. Bourdev and J. Malik.  ICCV 2009  [pdf] [code]

            • *Real-Time Human Pose Recognition in Parts from a Single Depth Image.  J. Shotton et al.  CVPR 2011. [pdf] [video] [web]

            • *Where’s Waldo: Matching People in Images of Crowds.  R. Garg, D. Ramanan, S. Seitz, N. Snavely. CVPR 2011.  [pdf] [web]

            • *Understanding Images of Groups of People, A. Gallagher and T. Chen, CVPR 2009.  [pdf]  [web] [data]

            • Parsing Clothing in Fashion Photographs.  K. Yamaguchi et al. CVPR 2012.  [pdf] [data]
            • Contextual Identity Recognition in Personal Photo Albums. D. Anguelov, K.-C. Lee, S. Burak Gokturk, and B. Sumengen. CVPR 2007.  [pdf]

            • Recognizing Proxemics in Personal Photos.  Y. Yang, S. Baker, A. Kannan, D. Ramanan.  CVPR 2012.  [pdf]
            • Who are you? - Learning Person Specific Classifiers from Video, J. Sivic, M. Everingham, and A. Zisserman, CVPR 2009.  [pdf] [data] [KLT tracking code]

            • Describing Clothing by Semantic Attributes.  A. Gallagher et al.  ECCV 2012.  [pdf]

            • Describing People: A Poselet-Based Approach to Attribute Classification.  L. Bourdev, S. Maji, J. Malik.  ICCV 2011.  [pdf]

            • Weakly Supervised Learning of Interactions between Humans and Objects.  Prest et al. PAMI 2012. [pdf]
            • Finding and Tracking People From the Bottom Up.  D. Ramanan, D. A. Forsyth.  CVPR 2003.  [pdf]

            • Autotagging Facebook: Social Network Context Improves Photo Annotation, by  Z. Stone, T. Zickler, and T. Darrell.  CVPR Internet Vision Workshop 2008.   [pdf]

            • Efficient Propagation for Face Annotation in Family Albums. L. Zhang, Y. Hu, M. Li, and H. Zhang.  MM 2004.  [pdf]

            • Progressive Search Space Reduction for Human Pose Estimation.  Ferrari, V., Marin-Jimenez, M. and Zisserman, A.  CVPR 2008.  [pdf] [web] [code]

            • Names and Faces in the News, by T. Berg, A. Berg, J. Edwards, M. Maire, R. White, Y. Teh, E. Learned-Miller and D. Forsyth, CVPR 2004.  [pdf]  [web]

            • Face Discovery with Social Context.  Y. J. Lee and K. Grauman.  BMVC 2011.  [pdf]

            • “Hello! My name is... Buffy” – Automatic Naming of Characters in TV Video, by M. Everingham, J. Sivic and A. Zisserman, BMVC 2006.  [pdf]  [web]  [data]

            • Pictorial Structures Revisited: People Detection and Articulated Pose Estimation.  M. Andriluka et al. CVPR 2009.  [pdf]  [code]

            • Exploring Photobios.  I. Kemelmacher-Shlizerman, E. Shechtman, R. Garg, S. Seitz.  SIGGRAPH 2011.  [pdf]
            Deepti-paper
            Randall-expt
            Aron-expt
            Dinesh-paper

            Nov 9
            Activity recognition

            Recognizing and localizing human actions in video or static images

            • *Learning Realistic Human Actions from Movies.  I. Laptev, M. Marszałek, C. Schmid and B. Rozenfeld.  CVPR 2008.  [pdf]  [data] [code]

            • *A Unified Framework for Multi-Target Tracking and Collective Activity Recognition.  W. Choi and S. Savarese.  ECCV 2012.  [pdf] [web] [video] [data]

            • *Detecting Actions, Poses, and Objects with Relational Phraselets.  C. Desai and D. Ramanan.  ECCV 2012.  [pdf] [data] [code]

            • Beyond Actions: Discriminative Models for Contextual Group Activities.  T. Lan, Y. Wang, W. Yang, and G. Mori.  NIPS 2010.  [pdf] [data]

            • Efficient Activity Detection with Max-Subgraph Search.  C.-Y. Chen and K. Grauman. CVPR 2012.  [pdf] [project page]  [code]

            • Action Bank: a High-Level Representation of Activity in Video.  S. Sadanand and J. Corso.  CVPR 2012 [pdf]  [code/data]

            • A Hough Transform-Based Voting Framework for Action Recognition.  A. Yao, J. Gall, L. Van Gool.  CVPR 2010.  [pdf] [code/data]

            • Actions in Context, M. Marszalek, I. Laptev, C. Schmid.  CVPR 2009.  [pdf] [web] [data]

            • Objects in Action: An Approach for Combining Action Understanding and Object Perception.   A. Gupta and L. Davis.  CVPR, 2007.  [pdf]  [data]

            • Exemplar-based Action Recognition in Video. G. Willems, J. Becker, T. Tuytelaars, and L. V. Gool. BMVC, 2009.
            • A Scalable Approach to Activity Recognition Based on Object Use. J. Wu, A. Osuntogun, T. Choudhury, M. Philipose, and J. Rehg.  ICCV 2007.  [pdf]

            • Recognizing Actions at a Distance.  A. Efros, G. Mori, J. Malik.  ICCV 2003.  [pdf] [web]

            • Action Recognition from a Distributed Representation of Pose and Appearance, S. Maji, L. Bourdev, J.  Malik, CVPR 2011.  [pdf]  [code]

            • Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition.  A. Kovashka and K. Grauman.  CVPR 2010.  [pdf]

            • Temporal Causality for the Analysis of Visual Events.  K. Prabhakar, S. Oh, P. Wang, G. Abowd, and J. Rehg.  CVPR 2010.  [pdf] [Georgia Tech Computational Behavior Science project]

            • What's Going on?: Discovering Spatio-Temporal Dependencies in Dynamic Scenes.  D. Kuettel et al.  CVPR 2010.  [pdf]

            • Learning Actions From the Web.  N. Ikizler-Cinbis, R. Gokberk Cinbis, S. Sclaroff.  ICCV 2009.  [pdf]

            Girish-expt
            Gary-paper
            David-paper


            Nov 16
            Egocentric cameras

            Analyzing data from wearable, mobile cameras; "first person" vision

            • *Social Interactions: A First-Person Perspective. A. Fathi, J. Hodgins, J. Rehg.  CVPR 2012  [pdf] [data]
            • *Recognizing Activities of Daily Living in First-Person Camera Views.  H. Pirsiavash and D. Ramanan.  CVPR 2012.  [pdf] [data/code]

            • *Novelty Detection from an Egocentric Perspective. O. Aghazadeh, J. Sullivan, and S. Carlsson. CVPR 2011 [pdf] [web/data]

            • Discovering Important People and Objects for Egocentric Video Summarization.  Y. J. Lee, J. Ghosh, and K. Grauman.  CVPR 2012.  [pdf]  [web]
            • Understanding Egocentric Activities.  A. Fathi, A. Farhadi, J. Rehg.  ICCV 2011. [pdf] [data]

            • Learning to Recognize Objects in Egocentric Activities.  A. Fathi, X. Ren, J. Rehg.  CVPR 2011.  [pdf]
            • Figure-Ground Segmentation Improves Handled Object Recognition in Egocentric Video.  X. Ren and C. Gu.  CVPR 2010 [pdf] [videos] [data]

            • Egocentric Recognition of Handled Objects: Benchmark and Analysis.  X. Ren and M. Philipose.  Egovision Workshop 2009.  [pdf] [data]

            • Activity Recognition from First Person Sensing.  E. Taralova, F. De la Torre, M. Hebert  CVPR 2009 Workshop on Egocentric Vision  [pdf]

            • Close-Range Human Detection for Head-Mounted Cameras.  D. Mitzel and B. Leibe.  BMVC 2012.  [pdf]

            • Structural Epitome: A Way to Summarize One’s Visual Experience. N. Jojic, A. Perina, and V. Murino. NIPS 2010.  [pdf]

            • Fast Unsupervised Ego-Action Learning for First-Person Sports Video. K. Kitani, T. Okabe, Y. Sato, and A. Sugimoto. CVPR 2011. [pdf]

            • Wearable Hand Activity Recognition for Event Summarization. W. Mayol and D. Murray. International Symposium on Wearable Computers. IEEE, 2005.  [pdf]

            • Illumination-free Gaze Estimation Method for First-Person Vision Wearable Device.  A. Tsukada, M. Shino, M. Devyver, T. Kanade.  ICCV Workshop 2011.  [pdf]

            • Egovision workshop at CVPR 2012
            Jake-expt
            Randall-paper
            Dinesh-expt



            Nov 30
            Human-in-the-loop interactive systems

            Human-in-the-loop learning, active annotation collection, crowdsourcing




            • *Multiclass Recognition and Part Localization with Humans in the Loop.  C. Wah et al. ICCV 2011. [pdf] [Caltech/UCSD Visipedia project]  [data]

            • *What’s It Going to Cost You? : Predicting Effort vs. Informativeness for Multi-Label Image Annotations.  S. Vijayanarasimhan and K. Grauman.  CVPR 2009 [pdf] [data] [code]

            • *The Multidimensional Wisdom of Crowds.  Welinder P., Branson S., Belongie S., Perona, P. NIPS 2010. [pdf]  [code]

            • Visual Recognition with Humans in the Loop.  Branson S., Wah C., Babenko B., Schroff F., Welinder P., Perona P., Belongie S.  ECCV 2010. [pdf]  

            • Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds.  S. Vijayanarasimhan and K. Grauman.  CVPR 2011.  [pdf]

            • WhittleSearch: Image Search with Relative Attribute Feedback.  A. Kovashka, D. Parikh, K. Grauman.  CVPR 2012.  [pdf] [data]

            • Crowdclustering.  R. Gomes, P. Welinder, A. Krause, P. Perona.  NIPS 2011.  [pdf]

            • Adaptively Learning the Crowd Kernel.  O. Tamuz, C. Liu, S. Belongie, O. Shamir, A. Kalai.  ICML 2011 [pdf]

            • LeafSnap: A Computer Vision System for Automatic Plant Species Identification.  N. Kumar et al.  ECCV 2012.  [pdf]

            • Interactive Object Detection.  A. Yao, J. Gall, C. Leistner, L. Van Gool. CVPR 2012.  [pdf]
            • Efficiently Scaling Up Video Annotation with Crowdsourced Marketplaces.  C. Vondrick, D. Ramanan, D. Patterson.  ECCV 2010.  [pdf] [data/code]

            • Video Annotation and Tracking with Active Learning.  C. Vondrick, D. Patterson, D. Ramanan.  NIPS 2011.  [pdf]  [code]

            • Active Frame Selection for Label Propagation in Videos.  S. Vijayanarasimhan and K. Grauman.  ECCV 2012.  [pdf]

            • Annotator Rationales for Visual Recognition.  J. Donahue and K. Grauman.  ICCV 2011. [pdf]

            • Attributes for Classifier Feedback.  A. Parkash and D. Parikh.  ECCV 2012.  [pdf]
            • Combining Self Training and Active Learning for Video Segmentation.  A. Fathi, M. Balcan, X. Ren, J. Rehg.  BMVC 2011.  [pdf]

            • Labeling Images with a Computer Game. L. von Ahn and L. Dabbish. CHI, 2004.

            • Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise.  J. Whitehill et al.  NIPS 2009.  [pdf]
            • Utility Data Annotation with Amazon Mechanical Turk. A. Sorokin and D. Forsyth. Wkshp on Internet Vision, 2008.

            • Far-Sighted Active Learning on a Budget for Image and Video Recognition.  S. Vijayanarasimhan, P. Jain, and K. Grauman.  CVPR 2010.  [pdf]  [code]

            • Active Learning from Crowds.  Y. Yan, R. Rosales, G. Fung, J. Dy.  ICML 2011.  [pdf]

            • Proactive Learning: Cost-Sensitive Active Learning with Multiple Imperfect Oracles.  P. Donmez and J. Carbonell.  CIKM 2008.  [pdf]
            • Inactive Learning?  Difficulties Employing Active Learning in Practice.  J. Attenberg and F. Provost.  SIGKDD 2011. [pdf]

            • Actively Selecting Annotations Among Objects and Attributes.  A. Kovashka, S. Vijayanarasimhan, and K. Grauman.  ICCV 2011  [pdf]

            • Supervised Learning from Multiple Experts: Whom to Trust When Everyone Lies a Bit.  V. Raykar et al.  ICML 2009.  [pdf]
            • Multi-class Active Learning for Image Classification.  A. J. Joshi, F. Porikli, and N. Papanikolopoulos.  CVPR 2009.  [pdf]

            • GrabCut - Interactive Foreground Extraction using Iterated Graph Cuts, by C. Rother, V. Kolmogorov, A. Blake, SIGGRAPH 2004.  [pdf]  [project page]

            • Peekaboom: A Game for Locating Objects in Images, by L. von Ahn, R. Liu and M. Blum, CHI 2006. [pdf]  [web]
              Deepti-expt
              Heath-paper
              Niveda-expt

              Dec 7
              Final project presentations in class


              Final papers due


              Other useful links:

               
               

               

              Posted by uniqueone
              ,

              Independent Component Analysis and its Extensions as Models of Natural Image Statistics

              Patrik Hoyer and Aapo Hyvärinen, currently of the Neuroinformatics Group at University of Helsinki.

              Get the MATLAB package for estimating ICA, ISA, and TICA bases from image data!


              Consider a typical natural image:

              [Figure: a typical natural image]

              When we do ICA on image data, that means we are simply trying to find an expansion of the form

              [image window] = s1 × [basis window 1] + s2 × [basis window 2] + ... + sk × [basis window k]

              such that for any given window from the image, information about one of the coefficients gives as little information as possible about the others. In other words, they are independent. In the standard ICA model x = As, the coefficients correspond to realizations of the signals s and the basis windows are the column vectors of A.
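The expansion above is simply a weighted sum of basis windows, which is what x = As computes. Here is a minimal sketch in plain Python. The 2x2 window size, the two basis windows, and the coefficient values are all made up for illustration and do not come from the estimated bases mentioned on this page.

```python
def synthesize_window(basis_windows, coefficients):
    """Return x = A s: the sum of basis windows weighted by their coefficients.

    Each basis window is a flat list of pixel values (one column of A).
    """
    n_pixels = len(basis_windows[0])
    window = [0.0] * n_pixels
    for basis, s in zip(basis_windows, coefficients):
        for p in range(n_pixels):
            window[p] += s * basis[p]
    return window


# Hypothetical 2x2 basis windows, flattened row by row.
basis_windows = [
    [1.0, 1.0, -1.0, -1.0],   # horizontal edge
    [1.0, -1.0, 1.0, -1.0],   # vertical edge
]
coefficients = [0.5, 2.0]     # illustrative values of s1 and s2
window = synthesize_window(basis_windows, coefficients)
```

ICA estimation goes the other way: given many windows x, it looks for the basis A that makes the coefficients s as independent as possible. That estimation step is not shown here.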

              Several investigations by different research groups have indicated that such an objective gives basis windows which are localized both in space and in frequency, resembling the wavelets of signal processing. This is an example of such a basis.

              Thus, one may see ICA (and sparse coding, which is closely related to ICA for images) as a way of choosing a basis which is custom-tailored to the data.


              Independent Subspace Analysis and Complex Cell Properties

              We have also introduced modifications of the basic ICA model that describe further aspects of natural image statistics. The modifications use a linear decomposition as illustrated above, but the components si are not assumed to be all independent.

              The first model in this direction is independent subspace analysis, in which the components are divided into groups or subspaces so that components in different subspaces are independent, but components in the same subspace are not. In particular, the distribution of the components in a subspace is assumed to depend only on the norm of the projection on that subspace. Typically, this implies that the components of a subspace tend to be active simultaneously.
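As a rough illustration of the quantity the subspace model depends on, the sketch below groups components into subspaces and computes the norm of each group. Grouping consecutive components into fixed-size subspaces is an assumption made here for brevity, and the component values are invented.

```python
import math


def subspace_norms(components, subspace_size):
    """Split components into consecutive groups and return each group's norm."""
    norms = []
    for start in range(0, len(components), subspace_size):
        group = components[start:start + subspace_size]
        norms.append(math.sqrt(sum(s * s for s in group)))
    return norms


# Hypothetical components s_i: the first subspace is strongly active,
# the second is silent, the third is weakly active.
components = [3.0, 4.0, 0.0, 0.0, 1.0, 0.0]
norms = subspace_norms(components, 2)
```

A large norm means the components of that subspace are active together, whatever their individual signs or relative sizes.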

              When estimated from natural image data, the model shows emergence of complex cell properties, in particular phase and translation invariance, together with orientation and frequency selectivity. Here are the estimated basis vectors, grouped according to the subspace structure.


              Topographic Independent Component Analysis

              Furthermore, we have generalized the independent subspace model so that it models more general dependency structures.

              The point is to define a topographic order using the higher-order correlations of the components. Basically, we use correlations of energies, i.e. squares, of the components. Thus we order the basis vectors so that components that are nearby in the topographic representation tend to be active, i.e. non-zero, at the same time. This can be considered as a generalization of independent subspaces, so that every neighbourhood corresponds to one subspace. Thus we obtain a linear representation in which the coefficients and the basis vectors have a topographic organization that gives us information on the statistical higher-order structure of the data.
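To make "every neighbourhood corresponds to one subspace" concrete, here is a small sketch that sums the energies (squares) of the components inside each neighbourhood. The page does not say how the topography is laid out, so the 1-D ring and the neighbourhood radius used here are assumptions chosen only to keep the example short.

```python
def neighbourhood_energies(components, radius):
    """Sum squared components within `radius` positions on a 1-D ring."""
    n = len(components)
    energies = []
    for i in range(n):
        total = 0.0
        for offset in range(-radius, radius + 1):
            total += components[(i + offset) % n] ** 2
        energies.append(total)
    return energies


# Hypothetical components: two nearby components are active, the rest are zero.
components = [0.0, 2.0, 1.0, 0.0, 0.0, 0.0]
energies = neighbourhood_energies(components, 1)
```

Overlapping neighbourhoods share components, so activity in one place raises the energy of its neighbours too. That overlap is what distinguishes this from the disjoint groups of independent subspace analysis.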

              When estimated from natural image data, the model shows simultaneous emergence of topography and complex cell properties. This is because every neighbourhood corresponds to one feature subspace as in independent subspace analysis, i.e. one complex cell.


              For details, see the articles available on the publication pages of
              Aapo Hyvärinen and Patrik Hoyer

              Some data we are currently using can be found here.

              Neuroinformatics Group at University of Helsinki



              Patrik Hoyer & Aapo Hyvarinen
              15 Feb 2000

               

              Posted by uniqueone
              ,

              http://www.aishack.in/2010/05/sift-scale-invariant-feature-transform/4/

              Up till now, we have generated a scale space and used the scale space to calculate the Difference of Gaussians. Those are then used as Laplacian of Gaussian approximations that are scale invariant. I told you that they produce great key points. Here’s how it’s done!

              Finding key points is a two part process

              1. Locate maxima/minima in DoG images
              2. Find subpixel maxima/minima

              Locate maxima/minima in DoG images

              The first step is to coarsely locate the maxima and minima. This is simple. You iterate through each pixel and check all its neighbours. The check is done within the current image, and also the one above and below it. Something like this:

              X marks the current pixel. The green circles mark the neighbours. This way, a total of 26 checks are made. X is marked as a “key point” if it is the greatest or least of all 26 neighbours.

              Usually, a non-maxima or non-minima position won’t have to go through all 26 checks. A few initial checks will usually be sufficient to discard it.

              Note that keypoints are not detected in the lowermost and topmost scales. There simply aren’t enough neighbours to do the comparison. So simply skip them!
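The 26-neighbour check can be sketched in plain Python as follows. The function names and the list-of-lists image layout are choices made for this example. Requiring the pixel to be strictly greater or strictly less than every neighbour is also an assumption, since the text only says "greatest or least".

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly above or strictly below all 26 neighbours.

    dog is a list of same-sized DoG images (lists of rows) from one octave.
    The caller must keep s, y and x away from the borders.
    """
    value = dog[s][y][x]
    is_max = True
    is_min = True
    for ds in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if ds == 0 and dy == 0 and dx == 0:
                    continue  # skip the pixel itself
                neighbour = dog[s + ds][y + dy][x + dx]
                if neighbour >= value:
                    is_max = False
                if neighbour <= value:
                    is_min = False
                if not is_max and not is_min:
                    return False  # early exit: most pixels fail after a few checks
    return True


def find_extrema(dog):
    """Scan the middle scales only; the lowermost and topmost DoG images are skipped."""
    found = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[s]) - 1):
            for x in range(1, len(dog[s][y]) - 1):
                if is_extremum(dog, s, y, x):
                    found.append((s, y, x))
    return found
```

Calling `find_extrema` on the DoG images of one octave returns `(scale, row, column)` triples for the coarse key points.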

              Once this is done, the marked points are the approximate maxima and minima. They are “approximate” because a maximum or minimum almost never lies exactly on a pixel. It lies somewhere between pixels. But we simply cannot access data “between” pixels. So, we must mathematically locate the subpixel location.

              Here’s what I mean:

              The red crosses mark pixels in the image. But the actual extreme point is the green one.

              Find subpixel maxima/minima

              Using the available pixel data, subpixel values are generated. This is done by the Taylor expansion of the image around the approximate key point.

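              Mathematically, it’s like this (the second-order Taylor expansion of the DoG function D around the approximate key point, where x is the offset from that point):

              D(x) = D + (∂D/∂x)ᵀ x + ½ xᵀ (∂²D/∂x²) x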

              We can easily find the extreme points of this equation (differentiate and equate to zero). On solving, we’ll get subpixel key point locations. These subpixel values increase chances of matching and stability of the algorithm.
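Here is one way the "differentiate and equate to zero" step could look in code. Setting the derivative of the second-order Taylor expansion to zero gives the offset x̂ = −H⁻¹g, where g is the gradient and H the Hessian of the DoG values at the coarse key point. Estimating g and H with finite differences on a 3x3x3 block of samples is an assumption of this sketch, and the helper names are made up.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 linear system a x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("singular Hessian")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(det3(m) / d)
    return solution


def subpixel_offset(cube):
    """Offset (d_scale, d_row, d_col) of the refined extremum from the centre.

    cube[s][y][x] is a 3x3x3 block of DoG values centred on the coarse key point.
    """
    c = cube[1][1][1]
    # Gradient: central differences along scale, row and column.
    g = [(cube[2][1][1] - cube[0][1][1]) / 2.0,
         (cube[1][2][1] - cube[1][0][1]) / 2.0,
         (cube[1][1][2] - cube[1][1][0]) / 2.0]
    # Hessian: second differences on the diagonal, mixed differences elsewhere.
    dss = cube[2][1][1] - 2.0 * c + cube[0][1][1]
    dyy = cube[1][2][1] - 2.0 * c + cube[1][0][1]
    dxx = cube[1][1][2] - 2.0 * c + cube[1][1][0]
    dsy = (cube[2][2][1] - cube[2][0][1] - cube[0][2][1] + cube[0][0][1]) / 4.0
    dsx = (cube[2][1][2] - cube[2][1][0] - cube[0][1][2] + cube[0][1][0]) / 4.0
    dyx = (cube[1][2][2] - cube[1][2][0] - cube[1][0][2] + cube[1][0][0]) / 4.0
    h = [[dss, dsy, dsx],
         [dsy, dyy, dyx],
         [dsx, dyx, dxx]]
    # Zero of the derivative of the Taylor expansion: H * offset = -g.
    return solve3(h, [-v for v in g])
```

Adding the returned offset to the integer key point position gives the subpixel location.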

              Example

              Here’s a result I got from the example image I’ve been using till now:

              The author of SIFT recommends generating two such extrema images. Each extrema image needs a DoG image above and below it, so you need exactly 4 DoG images. To generate 4 DoG images, you need 5 Gaussian blurred images. Hence the 5 levels of blur in each octave.
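The counting works out as follows. This is just the arithmetic from the paragraph above written as a function, and the function name is made up.

```python
def images_per_octave(num_blurred):
    """Return (number of DoG images, number of extrema images) for one octave."""
    num_dog = num_blurred - 1      # one DoG image per consecutive pair of blurred images
    num_extrema = num_dog - 2      # the top and bottom DoG images have no neighbour on one side
    return num_dog, num_extrema


counts = images_per_octave(5)
```

With 5 blurred images this gives 4 DoG images and 2 extrema images, matching the numbers quoted above.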

              In the image, I’ve shown just one octave. This is done for all octaves. Also, this image just shows the first part of keypoint detection. The Taylor series part has been skipped.

              Summary

              Here, we detected the maxima and minima in the DoG images generated in the previous step. This is done by comparing neighbouring pixels in the current scale, the scale “above” and the scale “below”.

              Next, we’ll reject some keypoints detected here. This is because they either don’t have enough contrast or they lie on an edge.


              Posted by uniqueone
              ,

              http://www.aishack.in/2010/05/sift-scale-invariant-feature-transform/3/

              In the previous step, we created the scale space of the image. The idea was to blur an image progressively, shrink it, blur the small image progressively and so on. Now we use those blurred images to generate another set of images, the Difference of Gaussians (DoG). These DoG images are great for finding interesting key points in the image.

              Laplacian of Gaussian

              The Laplacian of Gaussian (LoG) operation goes like this. You take an image, and blur it a little. And then, you calculate second order derivatives on it (or, the “laplacian”). This locates edges and corners on the image. These edges and corners are good for finding keypoints.

              But the second order derivative is extremely sensitive to noise. The blur smooths out the noise and stabilizes the second order derivative.

              The problem is, calculating all those second order derivatives is computationally intensive. So we cheat a bit.

              The Con

              To generate Laplacian of Gaussian images quickly, we use the scale space. We calculate the difference between two consecutive scales. Or, the Difference of Gaussians. Here’s how:

              These Difference of Gaussian images are approximately equivalent to the Laplacian of Gaussian. And we’ve replaced a computationally intensive process with a simple subtraction (fast and efficient). Awesome!
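The subtraction itself is short. In this sketch an octave is a list of same-sized blurred images stored as lists of rows, which is a layout chosen for the example. The order of subtraction (more blurred minus less blurred) is an assumption, since the text only says that consecutive scales are subtracted.

```python
def difference_of_gaussians(octave):
    """Subtract each blurred image from the next, more blurred one in the octave."""
    dogs = []
    for lower, upper in zip(octave, octave[1:]):
        dogs.append([[u - l for l, u in zip(row_l, row_u)]
                     for row_l, row_u in zip(lower, upper)])
    return dogs


# Three tiny hypothetical "blurred images" of size 1x2.
octave = [[[1.0, 2.0]], [[1.5, 1.0]], [[2.0, 0.0]]]
dogs = difference_of_gaussians(octave)
```

An octave with n blurred images yields n - 1 DoG images.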

              These DoG images come with another little goodie. These approximations are also “scale invariant”. What does that mean?

              The Benefits

              Just the Laplacian of Gaussian images aren’t great. They are not scale invariant. That is, they depend on the amount of blur you do. This is because of the Gaussian expression. (Don’t panic ;) )

              G(x, y, σ) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))

              See the σ² in the denominator? That’s the scale. If we somehow get rid of it, we’ll have true scale independence. So, if the Laplacian of a Gaussian is represented like this:

              ∇²G

              Then the scale invariant Laplacian of Gaussian would look like this:

              σ²∇²G

              But all these complexities are taken care of by the Difference of Gaussian operation. The resultant images after the DoG operation are already multiplied by the σ². Great eh!

              Oh! And it has also been proved that this scale invariant thingy produces much better trackable points! Even better!

              Side effects

              You can’t have benefits without side effects >.<

              You know the DoG result is multiplied with σ². But it’s also multiplied by another number. That number is (k-1). This is the k we discussed in the previous step.

              But we’ll just be looking for the location of the maximums and minimums in the images. We’ll never check the actual values at those locations. So, this additional factor won’t be a problem to us. (Even if you multiply throughout by some constant, the maxima and minima stay at the same location)
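A quick check of that claim: scaling every value by the same positive constant leaves the positions of the largest and smallest values unchanged. The row of values is invented, and k = sqrt(2) is used only as an example value greater than 1, so that (k-1) is positive.

```python
def argmax_and_argmin(values):
    """Return the indices of the largest and smallest value."""
    hi = max(range(len(values)), key=lambda i: values[i])
    lo = min(range(len(values)), key=lambda i: values[i])
    return hi, lo


k = 2 ** 0.5                                  # example value; any k > 1 works
row = [0.1, -0.4, 0.9, 0.3]                   # hypothetical DoG values along one row
scaled = [(k - 1) * v for v in row]           # the same row with the extra (k-1) factor

positions_before = argmax_and_argmin(row)
positions_after = argmax_and_argmin(scaled)
```

A negative factor would swap maxima and minima, but since both kinds of extrema are kept as key points, the set of locations would still be the same.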

              Example

              Here’s a gigantic image to demonstrate how this difference of Gaussians works.

              In the image, I’ve done the subtraction for just one octave. The same thing is done for all octaves. This generates DoG images of multiple sizes.

              Summary

              Two consecutive images in an octave are picked and one is subtracted from the other. Then the next consecutive pair is taken, and the process repeats. This is done for all octaves. The resulting images are an approximation of scale invariant laplacian of gaussian (which is good for detecting keypoints). There are a few “drawbacks” due to the approximation, but they won’t affect the algorithm.

              Next, we’ll actually find some interesting keypoints. Maxima and Minima. Or, Maximums and Minimums of the image.


               

              Posted by uniqueone
              ,
              http://www.aishack.in/2010/05/sift-scale-invariant-feature-transform/2/

              Real world objects are meaningful only at a certain scale. You might see a sugar cube perfectly on a table. But if you’re looking at the entire Milky Way, then it simply does not exist. This multi-scale nature of objects is quite common in nature. And a scale space attempts to replicate this concept on digital images.

              Scale spaces

              Do you want to look at a leaf or the entire tree? If it’s a tree, get rid of some detail from the image (like the leaves, twigs, etc) intentionally.

              While getting rid of these details, you must ensure that you do not introduce new false details. The only way to do that is with the Gaussian Blur (it was proved mathematically, under several reasonable assumptions).

              So to create a scale space, you take the original image and generate progressively blurred out images. Here’s an example:

              Look at how the cat’s helmet loses detail. So do its whiskers.

              Scale spaces in SIFT

              SIFT takes scale spaces to the next level. You take the original image, and generate progressively blurred out images. Then, you resize the original image to half size. And you generate blurred out images again. And you keep repeating.

              Here’s what it would look like in SIFT:

              Images of the same size (vertical) form an octave. Above are four octaves. Each octave has 5 images. The individual images are formed because of the increasing “scale” (the amount of blur).
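The octave structure can be sketched with the standard library alone. Everything below is an illustration rather than a reference implementation: the starting blur `sigma0 = 1.0`, the 3σ kernel radius, clamping at the image borders, and halving by dropping every second pixel are all assumptions made to keep the code short.

```python
import math


def gaussian_kernel_1d(sigma):
    """Normalised 1-D Gaussian weights out to about three standard deviations."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Blur every row with the 1-D kernel, clamping coordinates at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for j, w in enumerate(kernel):
                xx = min(max(x + j - radius, 0), width - 1)
                acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable blur: rows first, then columns."""
    kernel = gaussian_kernel_1d(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


def halve(image):
    """Shrink to half size by keeping every second pixel."""
    return [row[::2] for row in image[::2]]


def build_scale_space(image, num_octaves=4, num_scales=5,
                      sigma0=1.0, k=math.sqrt(2)):
    """Return a list of octaves; each octave is a list of blurred images."""
    octaves = []
    base = image
    for _ in range(num_octaves):
        octaves.append([gaussian_blur(base, sigma0 * k ** i)
                        for i in range(num_scales)])
        base = halve(base)
    return octaves
```

For a 16x16 input, `build_scale_space` returns 4 octaves of 5 images each, with sizes 16, 8, 4 and 2 pixels on a side.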

              The technical details

              Now that you know things the intuitive way, I’ll get into a few technical details.

              Octaves and Scales

              The number of octaves and scales depends on the size of the original image. While programming SIFT, you’ll have to decide for yourself how many octaves and scales you want. However, the creator of SIFT suggests that 4 octaves and 5 blur levels are ideal for the algorithm.

              The first octave

              If the original image is doubled in size and antialiased a bit (by blurring it) then the algorithm produces four times more keypoints. The more the keypoints, the better!

              Blurring

              Mathematically, “blurring” is referred to as the convolution of the gaussian operator and the image. Gaussian blur has a particular expression or “operator” that is applied to each pixel. What results is the blurred image.

              L(x, y, σ) = G(x, y, σ) * I(x, y)

              The symbols:

              • L is a blurred image
              • G is the Gaussian Blur operator
              • I is an image
              • x,y are the location coordinates
              • σ is the “scale” parameter. Think of it as the amount of blur. Greater the value, greater the blur.
              • The * is the convolution operation in x and y. It “applies” gaussian blur G onto the image I.

              This is the actual Gaussian Blur operator:

              G(x, y, σ) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))
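To make the operator concrete, here is a small sketch that evaluates the 2-D Gaussian G(x, y, σ) = exp(−(x² + y²) / (2σ²)) / (2πσ²), samples it on a grid, and applies it at one pixel. The kernel radius, the normalisation of the sampled weights, and the border clamping are assumptions of this example.

```python
import math


def gaussian(x, y, sigma):
    """Value of the 2-D Gaussian at offset (x, y) for scale sigma."""
    return (math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            / (2.0 * math.pi * sigma * sigma))


def gaussian_kernel(sigma, radius):
    """Sample G on a (2*radius+1) x (2*radius+1) grid; weights are rescaled to sum to 1."""
    kernel = [[gaussian(x, y, sigma) for x in range(-radius, radius + 1)]
              for y in range(-radius, radius + 1)]
    total = sum(sum(row) for row in kernel)
    return [[w / total for w in row] for row in kernel]


def blurred_value(image, kernel, cx, cy):
    """L at pixel (cx, cy): weighted sum of the neighbouring pixels of I.

    The kernel is symmetric, so this sum equals the convolution G * I there.
    """
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    acc = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy = min(max(cy + dy, 0), height - 1)   # clamp at the borders
            xx = min(max(cx + dx, 0), width - 1)
            acc += kernel[dy + radius][dx + radius] * image[yy][xx]
    return acc
```

A larger σ spreads the weights over more neighbours, which is why a greater value means a greater blur.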

              Amount of blurring

              The amount of blurring in each image is important. It goes like this. Assume the amount of blur in a particular image is σ. Then, the amount of blur in the next image will be k*σ. Here k is whatever constant you choose.

              This is a table of σ’s for my current example. See how each σ differs by a factor of sqrt(2) from the previous one.
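The σ values within one octave can be generated like this. The starting value `sigma0 = 1.0` is a placeholder and not necessarily the one used in the table, while k = sqrt(2) is the factor mentioned above.

```python
import math


def octave_sigmas(sigma0=1.0, k=math.sqrt(2), num_scales=5):
    """Blur amounts for one octave: sigma0, k*sigma0, k^2*sigma0, ..."""
    return [sigma0 * k ** i for i in range(num_scales)]


sigmas = octave_sigmas()
```

Because k² = 2, every second entry doubles the blur of the entry two steps before it.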

              Summary

              In the first step of SIFT, you generate several octaves of the original image. Each octave’s image size is half the previous one. Within an octave, images are progressively blurred using the Gaussian Blur operator.

              In the next step, we’ll use all these octaves to generate Difference of Gaussian images.

               

              Posted by uniqueone
              ,
              http://www.aishack.in/2010/05/sift-scale-invariant-feature-transform/

              Matching features across different images is a common problem in computer vision. When all images are similar in nature (same scale, orientation, etc) simple corner detectors can work. But when you have images of different scales and rotations, you need to use the Scale Invariant Feature Transform.

              Why care about SIFT

              SIFT isn’t just scale invariant. You can change the following, and still get good results:

              • Scale (duh)
              • Rotation
              • Illumination
              • Viewpoint

              Here’s an example. We’re looking for these:

              And we want to find these objects in this scene:

              Here’s the result:

              Now that’s some real robust image matching going on. The big rectangles mark matched images. The smaller squares are for individual features in those regions. Note how the big rectangles are skewed. They follow the orientation and perspective of the object in the scene.

              The algorithm

              SIFT is quite an involved algorithm. It has a lot going on and can become confusing, so I’ve split up the entire algorithm into multiple parts. Here’s an outline of what happens in SIFT.

              1. Constructing a scale space
                This is the initial preparation. You create internal representations of the original image to ensure scale invariance. This is done by generating a “scale space”.
              2. LoG Approximation
                The Laplacian of Gaussian is great for finding interesting points (or key points) in an image. But it’s computationally expensive. So we cheat and approximate it using the representation created earlier.
              3. Finding keypoints
                With the super fast approximation, we now try to find key points. These are maxima and minima in the Difference of Gaussian image we calculate in step 2.
              4. Get rid of bad key points
                Edges and low contrast regions are bad keypoints. Eliminating these makes the algorithm efficient and robust. A technique similar to the Harris Corner Detector is used here.
              5. Assigning an orientation to the keypoints
                An orientation is calculated for each key point. Any further calculations are done relative to this orientation. This effectively cancels out the effect of orientation, making it rotation invariant.
              6. Generate SIFT features
                Finally, with scale and rotation invariance in place, one more representation is generated. This helps uniquely identify features. Let’s say you have 50,000 features. With this representation, you can easily identify the feature you’re looking for (say, a particular eye, or a sign board).

              That was an overview of the entire algorithm. Over the next few days, I’ll go through each step in detail. Finally, I’ll show you how to implement SIFT in OpenCV!

              What do I do with SIFT features?

              After you run through the algorithm, you’ll have SIFT features for your image. Once you have these, you can do whatever you want.

              Track images, detect and identify objects (which can be partly hidden as well), or whatever you can think of. We’ll get into this later as well.

              But the catch is, this algorithm is patented.

              >.<

              So, it’s good enough for academic purposes. But if you’re looking to make something commercial, look for something else! [Thanks to aLu for pointing out SURF is patented too]

              Posted by uniqueone
              ,