Eye Movement Model (EMM)

I collaborated with psychologists to design a computational model that simulates human eye movements when detecting and locating targets in images.

First, I improved the basic EMM, which uses a bottom-up strategy driven by low-level features such as the responses of Gabor-like filters, and tested it on Google Earth images with UFOs as targets. However, this model can only perform global matching over the whole image. I wanted the model to do "semantic-based" search the way humans do. For example, when you are asked to search for a UFO on roofs or on roads, your visual system filters out all unrelated regions such as water and foliage, as experiments on human subjects have shown.

How, then, can the model learn the semantics? I proposed patch-scale learning, i.e., dividing images into small patches and determining their labels from their features. To this end, I used the method proposed in this paper for training. Simply put, it extracts low-level features and uses a Gaussian mixture model (GMM) to capture the distribution of the patches.
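To make the patch-scale idea concrete, here is a minimal sketch in Python with NumPy. It is not the method from the paper mentioned above; it only illustrates the general pattern under several assumptions of mine: each patch is described by two placeholder features (mean and standard deviation of intensity), one diagonal-covariance GMM is fitted per label with a plain EM loop, and a new patch receives the label whose GMM gives it the highest likelihood. All function names (patch_features, fit_gmm, label_patches) are hypothetical.

```python
import numpy as np

def patch_features(image, patch_size):
    """Split a 2-D grayscale image into non-overlapping square patches.

    Returns an (n_patches, 2) feature array and the (rows, cols) patch grid.
    The features (patch mean and standard deviation) are placeholders for
    whatever low-level features the real system extracts.
    """
    rows, cols = image.shape[0] // patch_size, image.shape[1] // patch_size
    cropped = image[:rows * patch_size, :cols * patch_size]
    patches = cropped.reshape(rows, patch_size, cols, patch_size).swapaxes(1, 2)
    feats = np.stack([patches.mean(axis=(2, 3)), patches.std(axis=(2, 3))], axis=-1)
    return feats.reshape(rows * cols, 2), (rows, cols)

def _log_gauss(X, means, variances):
    """Log density of every sample under every diagonal Gaussian, shape (n, k)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2.0 * np.pi * variances), axis=1))

def fit_gmm(X, k, iters=50, seed=0, floor=1e-6):
    """Fit a k-component diagonal-covariance GMM to X with plain EM."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    means = X[rng.choice(n, size=k, replace=False)]
    variances = np.tile(X.var(axis=0) + floor, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        log_p = np.log(weights) + _log_gauss(X, means, variances)
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, floor)
    return weights, means, variances

def log_likelihood(X, gmm):
    """Per-sample log-likelihood under a fitted GMM."""
    weights, means, variances = gmm
    return np.logaddexp.reduce(np.log(weights) + _log_gauss(X, means, variances), axis=1)

def label_patches(X, class_gmms):
    """Assign each feature vector the label whose GMM scores it highest."""
    names = list(class_gmms)
    scores = np.stack([log_likelihood(X, class_gmms[c]) for c in names], axis=1)
    return [names[i] for i in scores.argmax(axis=1)]
```

In use, one would call fit_gmm once per label on the features of hand-labeled training patches, collect the results in a dictionary such as {"water": ..., "roof": ...}, and then pass the features of a new image's patches to label_patches.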

Based on the trained patch distribution model, the patches of a new image can be labeled automatically. I then added criteria to the basic EMM so that it searches only regions inside or close to the relevant patches.
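One way to picture such a criterion is as a mask over the model's matching map. The sketch below is an illustration under my own assumptions, not the actual criteria used in the EMM: the matching map is a 2-D array of target-similarity scores, the image size is an exact multiple of the patch size, "close" means within a given number of patches (margin), and the next fixation is simply the highest-scoring pixel that survives the mask. The names relevance_mask and next_fixation are hypothetical.

```python
import numpy as np

def relevance_mask(patch_labels, relevant, margin=1):
    """Boolean mask over the patch grid.

    True for patches whose label is in `relevant`, and for patches within
    `margin` grid steps (8-connected) of such a patch.
    """
    mask = np.isin(patch_labels, list(relevant))
    h, w = mask.shape
    for _ in range(margin):
        # Grow the mask by one patch in every direction (3x3 dilation).
        padded = np.pad(mask, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(mask)
        for dy in range(3):
            for dx in range(3):
                grown |= padded[dy:dy + h, dx:dx + w]
        mask = grown
    return mask

def next_fixation(match_map, patch_labels, relevant, patch_size, margin=1):
    """Return (row, col) of the best-matching pixel inside the allowed region."""
    mask = relevance_mask(patch_labels, relevant, margin)
    # Expand the patch-level mask to pixel resolution.
    block = np.ones((patch_size, patch_size), dtype=np.uint8)
    pixel_mask = np.kron(mask.astype(np.uint8), block).astype(bool)
    if pixel_mask.shape != match_map.shape:
        raise ValueError("match_map must cover exactly the patch grid")
    if not pixel_mask.any():
        raise ValueError("no relevant patches in this image")
    # Pixels outside the allowed region can never win the argmax.
    gated = np.where(pixel_mask, match_map, -np.inf)
    return tuple(int(i) for i in np.unravel_index(np.argmax(gated), gated.shape))
```

With margin=0 the search is confined strictly to the relevant patches; a larger margin also admits their surroundings, which corresponds to the "close to" part of the criterion.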