
Exploring image and video by classification and clustering on global and local visual features.


Abstract

Images and videos are complex 2-dimensional spatially correlated data patterns or 3-dimensional spatio-temporally correlated data volumes. Associating the correlations between visual data signals (acquired by imaging sensors) and high-level semantic human knowledge is the core challenge of supervised pattern recognition and computer vision. Finding the underlying correlations among large amounts of image or video data themselves is a separate, unsupervised self-structuring problem. The previous literature, and our own research using computing machines as tools, contain many efforts to address these two tasks statistically, by making good use of recently developed supervised (a.k.a. classification) and unsupervised (a.k.a. clustering) statistical machine learning paradigms.

In this dissertation, we study four specific computer vision problems involving unsupervised visual data partitioning, discriminative multiple-class classification, and online adaptive appearance learning, using statistical machine learning techniques. The four tasks are based on extracting both global and local visual appearance patterns in general image and video domains. First, we develop a new clustering algorithm that exploits temporal video structure to partition a video into piecewise elements (a.k.a. video shot segmentation) by combining central and subspace constraints into a unified solution. The proposed algorithm is also shown to be applicable to illumination-invariant face clustering. Second, we detect and recognize spatio-temporal video subvolumes as action units using a trained 3D-surface action model via multi-scale temporal search. The dynamic 3D-surface action model is built as an empirical distribution over basic static posture elements, in the spirit of texton representations; the action matching process is therefore based on similarity measurements between histograms.
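Since action matching here reduces to comparing histograms over posture elements, a minimal sketch of one common histogram similarity measure (histogram intersection; the abstract does not specify which measure the dissertation actually uses) is:

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two histograms of posture-element counts.

    Illustrative only: the choice of intersection (rather than, e.g.,
    chi-square distance) is an assumption, not the dissertation's method.
    """
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    # Normalize each histogram to sum to 1 so lengths of sequences don't matter.
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    # Intersection: sum of bin-wise minima; identical histograms score 1.
    return float(np.minimum(h1, h2).sum())
```

Identical posture-element distributions score 1.0, disjoint ones score 0.0, which makes the measure usable directly as a matching score in a multi-scale temporal search.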
The basic posture units are intermediate visual representations learned by a three-stage clustering algorithm from figure-segmented image sequences. Third, we train a discriminative-probabilistic multi-modal density classifier to evaluate the responses of 20 semantic material classes on a large collection of challenging home photos. The task of learning photo categories is then based on global image features extracted from the material class-specific density response maps over the spatial domain. We adopt a classifier combination technique over a set of random weak discriminators to handle the complex multi-modal photo-feature distributions in a high-dimensional parameter space. Fourth, we propose a unified nonparametric approach for three applications: location-based dynamic-template video tracking in low to medium resolution, segmentation-based object-level image matching across viewpoints, and binary foreground/background segmentation tracking. The main contributions are in three areas: (1) we demonstrate that an online classification framework allows very flexible constructions of image density matching functions to address the general data-driven classification problem; (2) we devise an effective dynamic appearance modeling algorithm requiring only simple nonparametric computations (mean, median, standard deviation) for easy implementation; (3) we present a random-patch-based computational representation for classifying image segments in object-specific matching and tracking that is highly descriptive and discriminative compared with general image segment descriptors. The proposed approach has been extensively demonstrated to maintain effective object-level appearance models robustly over time under a variety of challenging conditions, such as severely changing, occluded, and deformable appearance templates and moving cameras.
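The fourth contribution's appearance model is said to need only mean, median, and standard deviation. A minimal sketch of such nonparametric statistics, computed per pixel over a stack of recent frames (the per-pixel formulation and window layout are illustrative assumptions, not the dissertation's exact model):

```python
import numpy as np

def appearance_stats(frames):
    """Per-pixel mean, median, and standard deviation over recent frames.

    `frames` is array-like of shape (T, H, W), where T is the window of
    recent frames; this windowed per-pixel setup is an assumption made
    for illustration.
    """
    stack = np.asarray(frames, dtype=float)
    return {
        "mean": stack.mean(axis=0),      # average appearance per pixel
        "median": np.median(stack, axis=0),  # robust central appearance
        "std": stack.std(axis=0),        # per-pixel variability
    }
```

All three statistics are cheap to recompute as new frames arrive, which is what makes this kind of model attractive for online tracking under changing appearance.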
