
Multi-view approaches to tracking, three-dimensional reconstruction and object class detection.


Abstract

Multi-camera systems are becoming ubiquitous and have found application in a variety of domains, including surveillance, immersive visualization, sports entertainment and movie special effects. From a computer vision perspective, the challenge is how to fuse information from multiple views most efficiently in the absence of detailed calibration information and with a minimum of human intervention. This thesis presents a new approach that fuses foreground likelihood information from multiple views onto a reference view without explicit processing in 3D space, thereby circumventing the need for complete calibration. Our approach uses a homographic occupancy constraint (HOC), which states that if a foreground pixel has a piercing point occupied by a foreground object, then the pixel warps to foreground regions in every view under the homographies induced by the reference plane; in effect, the cameras are used as occupancy detectors. Using the HOC we are able to resolve occlusions and robustly determine ground-plane localizations of the people in the scene. To find tracks, we obtain ground localizations over a window of frames and stack them, creating a space-time volume. Regions belonging to the same person form contiguous spatio-temporal tracks, which are clustered using a graph-cuts segmentation approach.

Second, we demonstrate that the HOC is equivalent to performing visual hull intersection in the image plane, resulting in a cross-sectional slice of the object. The process is extended to multiple planes parallel to the reference plane in the framework of plane-to-plane homologies. Slices from multiple planes are accumulated, and the 3D structure of the object is segmented out. Unlike other visual-hull-based approaches that use 3D constructs such as visual cones, voxels or polygonal meshes requiring calibrated views, ours is purely image-based and uses only 2D constructs, i.e. planar homographies between views.
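The fusion step behind the HOC can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the thesis's implementation: the function name, the nearest-neighbour lookup, and the choice of a hard product over foreground *likelihoods* are our own simplifications. The idea it demonstrates is the one stated above: a reference-plane cell counts as occupied only if it warps into foreground in every view.

```python
import numpy as np

def fuse_foreground(masks, homographies, ref_shape):
    """Fuse binary foreground masks from multiple views onto a
    reference ground-plane grid via the homographic occupancy
    constraint: a cell is occupied only if its warp lands on
    foreground in *every* view (cameras as occupancy detectors).

    masks:        list of HxW binary foreground masks, one per view
    homographies: list of 3x3 arrays mapping reference-plane points
                  (x, y, 1) into each view's pixel coordinates
    ref_shape:    (rows, cols) of the reference occupancy grid
    """
    rows, cols = ref_shape
    ys, xs = np.mgrid[0:rows, 0:cols]
    # Homogeneous coordinates of every reference-plane cell, 3 x N.
    pts = np.stack([xs.ravel().astype(float),
                    ys.ravel().astype(float),
                    np.ones(rows * cols)])

    occupancy = np.ones(rows * cols)
    for mask, H in zip(masks, homographies):
        warped = H @ pts
        u = warped[0] / warped[2]
        v = warped[1] / warped[2]
        ui = np.round(u).astype(int)
        vi = np.round(v).astype(int)
        # Cells that warp outside a view count as unoccupied there.
        inside = (ui >= 0) & (ui < mask.shape[1]) & \
                 (vi >= 0) & (vi < mask.shape[0])
        vals = np.zeros(rows * cols)
        vals[inside] = mask[vi[inside], ui[inside]]
        occupancy *= vals  # intersection across views
    return occupancy.reshape(rows, cols)
```

Replacing the binary masks with foreground likelihood maps and the product with a likelihood fusion rule recovers the soft version described in the text; the slice-by-slice visual hull construction then amounts to repeating this fusion with per-plane homographies.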
This feature also renders it conducive to graphics hardware acceleration. The current GPU implementation of our approach is capable of fusing 60 views (480x720 pixels) at a rate of 50 slices per second. We then present an extension of this approach to reconstructing non-rigid, articulated objects from monocular video sequences. The basic premise is that, due to the motion of the object, scene occupancies are blurred together with non-occupancies in a manner analogous to motion-blurred imagery. Using our HOC and a novel construct, the temporal occupancy point (TOP), we are able to fuse multiple views of non-rigid objects obtained from a monocular video sequence. The result is a set of blurred scene-occupancy images in the corresponding views, where the value at each pixel corresponds to the fraction of the total time duration for which the pixel observed an occupied scene location. We then use a motion de-blurring approach to de-blur the occupancy images and obtain the 3D structure of the non-rigid object.

In the final part of this thesis, we present an object class detection method employing 3D models of rigid objects constructed using the above 3D reconstruction approach. Instead of using a complicated mechanism for relating multiple 2D training views, our approach establishes spatial connections between these views by mapping them directly onto the surface of a 3D model. To generalize the model for object class detection, features from supplemental views (obtained from Google Image search) are also considered. Given a 2D test image, correspondences between the 3D feature model and the test view are identified by matching the detected features. Based on the 3D locations of the corresponding features, several viewing-plane hypotheses can be made. The one with the highest confidence is then used to detect the object via feature-location matching. The performance of the proposed method has been evaluated on the PASCAL VOC challenge dataset, and promising results are demonstrated.
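The blurred scene-occupancy images described above have a simple arithmetic core that can be sketched directly. Again this is only an illustration under our own assumptions (the function name and the toy one-dimensional scene are hypothetical): averaging per-frame occupancy grids yields, at each cell, the fraction of the sequence during which that cell observed an occupied scene location, which is the quantity the motion de-blurring step then inverts.

```python
import numpy as np

def blurred_occupancy(occupancy_frames):
    """Accumulate per-frame occupancy grids from a monocular sequence
    into a blurred scene-occupancy image. Each cell ends up holding
    the fraction of the total duration for which it was occupied,
    the analogue of a motion-blurred photograph of the occupancy.
    """
    stack = np.stack([np.asarray(f, dtype=float) for f in occupancy_frames])
    return stack.mean(axis=0)

# Toy example: a 2-cell-wide object sweeping across a 1-D, 5-cell
# scene, one cell per frame, over 4 frames. Cells the object dwelt
# in longer receive proportionally higher occupancy fractions.
frames = []
for t in range(4):
    f = np.zeros(5)
    f[t:t + 2] = 1.0
    frames.append(f)
blurred = blurred_occupancy(frames)
```

The streak-like fall-off at the edges of `blurred` is exactly the blur that the de-blurring stage removes to recover the object's true extent.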
