Intelligence Research

2020 Research Hotspots in Artificial Intelligence (Part 1): Deep Learning
  Published: 2020-12-14

This article analyzes publications from 2018-2020 and, drawing on indicators such as usage counts and citation counts, identifies the Top 20 research hotspots in artificial intelligence for 2020. This installment presents the data profile of one of these hotspots: deep learning.

In recent years, the field of deep learning has produced both theoretical breakthroughs and promising applications across many domains. The leading research institutions in deep learning (ranked by citations) include Google, the Chinese Academy of Sciences, the University of London, the University of Oxford, University College London, Facebook, Johns Hopkins University, the University of Chinese Academy of Sciences, the Institute of Automation of the Chinese Academy of Sciences, and the University of California system.


The leading researchers in deep learning (ranked by citations) include Chen, Liang-Chieh (Google Incorporated); Shen, Li (University of Oxford); Hu, Jie; Sun, Gang; Papandreou, George (Google Incorporated); Yuille, Alan L. (Johns Hopkins University); Murphy, Kevin (Google Incorporated); Kokkinos, Iasonas (University College London); He, Kaiming (Facebook Inc); and Girshick, Ross (Facebook Inc).


The most prominent keywords in deep learning include: Convolutional neural network, Object detection, Neural networks, Image classification, Transfer learning, Feature extraction, Semantic segmentation, Generative adversarial network, Domain adaptation, Deep neural networks, Computer vision, Task analysis, Image segmentation, Training, and Machine learning.

The most influential papers in deep learning in 2020 include:

1. Title: Focal Loss for Dense Object Detection

Authors: Lin, Tsung-Yi; Goyal, Priya; Girshick, Ross; et al.

Journal: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Conference: 16th IEEE International Conference on Computer Vision (ICCV)  Location: Venice, ITALY  Dates: OCT 22-29, 2017

Times Cited: 1,341

Abstract: The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In this paper, we investigate why this is the case. We discover that the extreme foreground-background class imbalance encountered during training of dense detectors is the central cause. We propose to address this class imbalance by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples. Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training. To evaluate the effectiveness of our loss, we design and train a simple dense detector we call RetinaNet. Our results show that when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors.

Code is at: https://github.com/facebookresearch/Detectron.
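
For reference, the reshaped loss the abstract describes is FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). Below is a minimal PyTorch sketch of the binary form, assuming float 0/1 targets and the paper's commonly cited settings alpha = 0.25, gamma = 2; it is an illustration, not the reference implementation linked above.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy, well-classified examples."""
    # Per-example binary cross entropy (targets are 0./1. floats).
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class-balancing weight
    # The modulating factor (1 - p_t)^gamma shrinks the loss of confident predictions.
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```

With gamma = 0 this reduces to ordinary alpha-weighted cross entropy, which makes the down-weighting effect easy to ablate.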


2. Title: Squeeze-and-Excitation Networks

Authors: Hu, Jie; Shen, Li; Albanie, Samuel; et al.

Journal: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Conference: 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  Location: Salt Lake City, UT  Dates: JUN 18-23, 2018

Times Cited: 1,186

Abstract: The central building block of convolutional neural networks (CNNs) is the convolution operator, which enables networks to construct informative features by fusing both spatial and channel-wise information within local receptive fields at each layer. A broad range of prior research has investigated the spatial component of this relationship, seeking to strengthen the representational power of a CNN by enhancing the quality of spatial encodings throughout its feature hierarchy. In this work, we focus instead on the channel relationship and propose a novel architectural unit, which we term the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. We show that these blocks can be stacked together to form SENet architectures that generalise extremely effectively across different datasets. We further demonstrate that SE blocks bring significant improvements in performance for existing state-of-the-art CNNs at slight additional computational cost. Squeeze-and-Excitation Networks formed the foundation of our ILSVRC 2017 classification submission which won first place and reduced the top-5 error to 2.251 percent, surpassing the winning entry of 2016 by a relative improvement of similar to 25 percent. Models and code are available at https://github.com/hujie-frank/SENet.
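
The squeeze (global average pooling) and excitation (bottleneck gating) steps described above fit in a few lines; here is a minimal PyTorch sketch, assuming the reduction ratio of 16 reported in the paper:

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation block: recalibrates channel-wise responses."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # squeeze to a bottleneck
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                                # per-channel gates in (0, 1)
        )

    def forward(self, x):                    # x: (N, C, H, W)
        n, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))               # squeeze: global average pooling -> (N, C)
        w = self.fc(s).view(n, c, 1, 1)      # excitation: learned channel weights
        return x * w                         # rescale each feature map
```

Because the block only rescales existing feature maps, it can be dropped into most CNN architectures (e.g., after each residual block) at little extra cost, consistent with the abstract's claim.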


3. Title: Mask R-CNN

Authors: He, Kaiming; Gkioxari, Georgia; Dollar, Piotr; et al.

Journal: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Conference: 16th IEEE International Conference on Computer Vision (ICCV)  Location: Venice, ITALY  Dates: OCT 22-29, 2017

Times Cited: 431

Abstract: We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code has been made available at: https://github.com/facebookresearch/Detectron.
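
To make the "parallel mask branch" concrete, here is a toy PyTorch sketch of a mask head operating on per-RoI features; the sizes (14x14 RoIAlign output, 256 channels, 80 classes) are illustrative assumptions, not the paper's exact configuration:

```python
import torch.nn as nn

class MaskHead(nn.Module):
    """Toy mask branch: a small FCN over per-RoI features that predicts
    one binary mask per class, in parallel with the box/class heads."""
    def __init__(self, in_channels=256, num_classes=80):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 256, 2, stride=2), nn.ReLU(inplace=True),  # 14x14 -> 28x28
            nn.Conv2d(256, num_classes, 1),            # per-class mask logits
        )

    def forward(self, roi_features):   # (num_rois, 256, 14, 14), e.g. from RoIAlign
        return self.net(roi_features)  # (num_rois, num_classes, 28, 28)
```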


4. Title: Deep Learning for Generic Object Detection: A Survey

Authors: Liu, Li; Ouyang, Wanli; Wang, Xiaogang; et al.

Journal: INTERNATIONAL JOURNAL OF COMPUTER VISION

Times Cited: 76

Abstract: Object detection, one of the most fundamental and challenging problems in computer vision, seeks to locate object instances from a large number of predefined categories in natural images. Deep learning techniques have emerged as a powerful strategy for learning feature representations directly from data and have led to remarkable breakthroughs in the field of generic object detection. Given this period of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought about by deep learning techniques. More than 300 research contributions are included in this survey, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics. We finish the survey by identifying promising directions for future research.


5. Title: Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization

Authors: Selvaraju, Ramprasaath R.; Cogswell, Michael; Das, Abhishek; et al.

Journal: INTERNATIONAL JOURNAL OF COMPUTER VISION

Conference: 16th IEEE International Conference on Computer Vision (ICCV)  Location: Venice, ITALY  Dates: OCT 22-29, 2017

Times Cited: 62

Abstract: We propose a technique for producing 'visual explanations' for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and explainable. Our approach, Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept (say 'dog' in a classification network or a sequence of words in captioning network) flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept. Unlike previous approaches, Grad-CAM is applicable to a wide variety of CNN model-families: (1) CNNs with fully-connected layers (e.g. VGG), (2) CNNs used for structured outputs (e.g. captioning), (3) CNNs used in tasks with multi-modal inputs (e.g. visual question answering) or reinforcement learning, all without architectural changes or re-training. We combine Grad-CAM with existing fine-grained visualizations to create a high-resolution class-discriminative visualization, Guided Grad-CAM, and apply it to image classification, image captioning, and visual question answering (VQA) models, including ResNet-based architectures. In the context of image classification models, our visualizations (a) lend insights into failure modes of these models (showing that seemingly unreasonable predictions have reasonable explanations), (b) outperform previous methods on the ILSVRC-15 weakly-supervised localization task, (c) are robust to adversarial perturbations, (d) are more faithful to the underlying model, and (e) help achieve model generalization by identifying dataset bias. For image captioning and VQA, our visualizations show that even non-attention based models learn to localize discriminative regions of input image. We devise a way to identify important neurons through Grad-CAM and combine it with neuron names (Bau et al. in Computer vision and pattern recognition, 2017) to provide textual explanations for model decisions. Finally, we design and conduct human studies to measure if Grad-CAM explanations help users establish appropriate trust in predictions from deep networks and show that Grad-CAM helps untrained users successfully discern a 'stronger' deep network from a 'weaker' one even when both make identical predictions. Our code is available at , along with a demo on CloudCV (Agrawal et al., in: Mobile cloud visual media computing, pp 265-290. Springer, 2015) () and a video at .
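
The gradient-weighting step described above is compact enough to sketch. The following PyTorch snippet assumes the final conv-layer activations have been kept in the autograd graph (e.g., captured with a forward hook) so gradients can be taken with respect to them:

```python
import torch
import torch.nn.functional as F

def grad_cam(features, logits, class_idx):
    """Minimal Grad-CAM. `features`: final conv activations (N, C, H, W)
    still attached to the graph; `logits`: class scores computed from them."""
    score = logits[:, class_idx].sum()
    grads, = torch.autograd.grad(score, features)        # d(score) / d(features)
    weights = grads.mean(dim=(2, 3), keepdim=True)       # GAP -> per-channel weights
    cam = F.relu((weights * features).sum(dim=1))        # keep positive evidence only
    return cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)  # normalize to [0, 1]
```

The resulting map has the spatial resolution of the last conv layer and is typically upsampled to the input size for display.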


6. Title: ClothingOut: a category-supervised GAN model for clothing segmentation and retrieval

Authors: Zhang, Haijun; Sun, Yanfang; Liu, Linlin; et al.

Journal: NEURAL COMPUTING & APPLICATIONS

Times Cited: 61

Abstract: This paper presents a new framework, ClothingOut, which utilizes generative adversarial network (GAN) to generate tiled clothing images automatically. Specifically, we design a novel category-supervised GAN model by learning transformation rules between clothes on wearers and clothes that are tiled. Our method features in adding category attribute to a traditional GAN model. For model training, we built a large-scale dataset containing over 20,000 pairs of wearer images and their corresponding tiled clothing images. The learned model can be straightforwardly applied to video advertising and cross-scenario clothing image retrieval. We evaluated our generated images which can be regarded as the segmentation from the wearer images from two aspects: authenticity and retrieval performance. Experimental results demonstrate the effectiveness of our method.
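
The paper's architecture is not detailed in the abstract, but "adding category attribute to a traditional GAN model" follows the familiar conditional-GAN pattern. Below is a generic PyTorch sketch of a category-conditioned generator; all names and sizes are illustrative, not the authors' model:

```python
import torch
import torch.nn as nn

class CategoryConditionedGenerator(nn.Module):
    """Generic conditional GAN generator: the category label is embedded
    and concatenated with the noise vector before upsampling."""
    def __init__(self, z_dim=100, num_categories=10, emb_dim=16):
        super().__init__()
        self.embed = nn.Embedding(num_categories, emb_dim)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim + emb_dim, 256, 4),      # 1x1 -> 4x4
            nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1),            # 4x4 -> 8x8
            nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 3, 4, 2, 1), nn.Tanh(),   # 8x8 -> 16x16 RGB (toy size)
        )

    def forward(self, z, category):        # z: (N, z_dim), category: (N,) int labels
        h = torch.cat([z, self.embed(category)], dim=1)
        return self.net(h.unsqueeze(-1).unsqueeze(-1))
```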

 

7. Title: Hierarchical LSTMs with Adaptive Attention for Visual Captioning

Authors: Gao, Lianli; Li, Xiangpeng; Song, Jingkuan; et al.

Journal: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Times Cited: 42

Abstract: Recent progress has been made in using attention based encoder-decoder framework for image and video captioning. Most existing decoders apply the attention mechanism to every generated word including both visual words (e.g., "gun" and "shooting") and non-visual words (e.g., "the", "a"). However, these non-visual words can be easily predicted using natural language model without considering visual signals or attention. Imposing attention mechanism on non-visual words could mislead and decrease the overall performance of visual captioning. Furthermore, the hierarchy of LSTMs enables more complex representation of visual data, capturing information at different scales. Considering these issues, we propose a hierarchical LSTM with adaptive attention (hLSTMat) approach for image and video captioning. Specifically, the proposed framework utilizes the spatial or temporal attention for selecting specific regions or frames to predict the related words, while the adaptive attention is for deciding whether to depend on the visual information or the language context information. Also, a hierarchical LSTMs is designed to simultaneously consider both low-level visual information and high-level language context information to support the caption generation. We design the hLSTMat model as a general framework, and we first instantiate it for the task of video captioning. Then, we further instantiate our hLSTMat, refine it and apply it to the image captioning task. To demonstrate the effectiveness of our proposed framework, we test our method on both video and image captioning tasks. Experimental results show that our approach achieves the state-of-the-art performance for most of the evaluation metrics on both tasks. The effect of important components is also well exploited in the ablation study.
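
A toy sketch of the adaptive-attention idea: a learned gate decides how much the next word should draw on attended visual features versus a language-context ("sentinel") vector. Names and dimensions are illustrative, not the paper's exact hLSTMat formulation:

```python
import torch
import torch.nn as nn

class AdaptiveAttention(nn.Module):
    """Mixes attended visual features with a language-context vector,
    so non-visual words can lean on the language model instead of vision."""
    def __init__(self, dim):
        super().__init__()
        self.att = nn.Linear(dim, 1)
        self.gate = nn.Linear(dim, 1)

    def forward(self, regions, hidden, sentinel):
        # regions: (N, R, D) visual features; hidden, sentinel: (N, D)
        scores = self.att(torch.tanh(regions + hidden.unsqueeze(1)))  # (N, R, 1)
        alpha = scores.softmax(dim=1)                  # attention over regions
        visual = (alpha * regions).sum(dim=1)          # attended visual vector
        beta = torch.sigmoid(self.gate(hidden))        # (N, 1): rely on vision or not
        return beta * visual + (1 - beta) * sentinel   # context for word prediction
```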

 

8. Title: Alcoholism identification via convolutional neural network based on parametric ReLU, dropout, and batch normalization

Authors: Wang, Shui-Hua; Muhammad, Khan; Hong, Jin; et al.

Journal: NEURAL COMPUTING & APPLICATIONS

Times Cited: 40

Abstract: Alcoholism changes the structure of brain. Several somatic marker hypothesis network-related regions are known to be damaged in chronic alcoholism. Neuroimaging approach can help us better understanding the impairment discovered in alcohol-dependent subjects. In this research, we recruited subjects from participating hospitals. In total, 188 abstinent long-term chronic alcoholic participants (95 men and 93 women) and 191 non-alcoholic control participants (95 men and 96 women) were enrolled in our experiment via computerized diagnostic interview schedule version IV and medical history interview employed to determine whether the applicants can be enrolled or excluded. The Siemens Verio Tim 3.0 T MR scanner (Siemens Medical Solutions, Erlangen, Germany) was employed to scan the subjects. Then, we proposed a 10-layer convolutional neural network for the diagnosis based on imaging, including three advanced techniques: parametric rectified linear unit (PReLU); batch normalization; and dropout. The structure of network is fine-tuned. The results show that our method secured a sensitivity of 97.73 +/- 1.04%, a specificity of 97.69 +/- 0.87%, and an accuracy of 97.71 +/- 0.68%. We observed the PReLU gives better performance than ordinary ReLU, clipped ReLU, and leaky ReLU. The batch normalization and dropout gained enhanced performance as batch normalization overcame the internal covariate shift and dropout got over the overfitting. The results of our proposed 10-layer CNN model show its performance better than seven state-of-the-art approaches.
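
A minimal PyTorch sketch of one convolutional stage combining the three techniques the paper highlights (batch normalization, parametric ReLU, dropout); the actual 10-layer arrangement and hyperparameters are not reproduced here:

```python
import torch.nn as nn

def conv_stage(in_ch, out_ch, p_drop=0.5):
    """One conv stage: BN counters internal covariate shift, PReLU learns a
    per-channel negative slope, dropout regularizes against overfitting."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.PReLU(out_ch),        # parametric ReLU: learnable negative slope
        nn.MaxPool2d(2),
        nn.Dropout2d(p_drop),
    )
```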

 

9. Title: MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation

Authors: Ibtehaz, Nabil; Rahman, M. Sohel

Journal: NEURAL NETWORKS

Times Cited: 39

Abstract: In recent years Deep Learning has brought about a breakthrough in Medical Image Segmentation. In this regard, U-Net has been the most popular architecture in the medical imaging community. Despite outstanding overall performance in segmenting multimodal medical images, through extensive experimentations on some challenging datasets, we demonstrate that the classical U-Net architecture seems to be lacking in certain aspects. Therefore, we propose some modifications to improve upon the already state-of-the-art U-Net model. Following these modifications, we develop a novel architecture, MultiResUNet, as the potential successor to the U-Net architecture. We have tested and compared MultiResUNet with the classical U-Net on a vast repertoire of multimodal medical images. Although only slight improvements in the cases of ideal images are noticed, remarkable gains in performance have been attained for the challenging ones. We have evaluated our model on five different datasets, each with their own unique challenges, and have obtained a relative improvement in performance of 10.15%, 5.07%, 2.63%, 1.41%, and 0.62% respectively. We have also discussed and highlighted some qualitatively superior aspects of MultiResUNet over classical U-Net that are not really reflected in the quantitative measures. (C) 2019 Elsevier Ltd. All rights reserved.
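
The modifications are not spelled out in the abstract; in the full paper the core unit is the MultiRes block, which chains 3x3 convolutions (approximating 5x5 and 7x7 receptive fields), concatenates their outputs, and adds a 1x1 residual path. A simplified sketch, with batch normalization and the paper's exact channel split omitted:

```python
import torch
import torch.nn as nn

class MultiResBlock(nn.Module):
    """Simplified MultiRes block: three chained 3x3 convs, concatenated,
    plus a 1x1 shortcut (a lightweight stand-in for stacked 3x3/5x5/7x7)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        c = out_ch // 3
        self.c1 = nn.Conv2d(in_ch, c, 3, padding=1)
        self.c2 = nn.Conv2d(c, c, 3, padding=1)
        self.c3 = nn.Conv2d(c, out_ch - 2 * c, 3, padding=1)  # remainder of out_ch
        self.shortcut = nn.Conv2d(in_ch, out_ch, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        a = self.act(self.c1(x))   # ~3x3 receptive field
        b = self.act(self.c2(a))   # ~5x5
        d = self.act(self.c3(b))   # ~7x7
        return self.act(torch.cat([a, b, d], dim=1) + self.shortcut(x))
```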

 

10. Title: Feature Boosting Network For 3D Pose Estimation

Authors: Liu, Jun; Ding, Henghui; Shahroudy, Amir; et al.

Journal: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Times Cited: 29

Abstract: In this paper, a feature boosting network is proposed for estimating 3D hand pose and 3D body pose from a single RGB image. In this method, the features learned by the convolutional layers are boosted with a new long short-term dependence-aware (LSTD) module, which enables the intermediate convolutional feature maps to perceive the graphical long short-term dependency among different hand (or body) parts using the designed Graphical ConvLSTM. Learning a set of features that are reliable and discriminatively representative of the pose of a hand (or body) part is difficult due to the ambiguities, texture and illumination variation, and self-occlusion in the real application of 3D pose estimation. To improve the reliability of the features for representing each body part and enhance the LSTD module, we further introduce a context consistency gate (CCG) in this paper, with which the convolutional feature maps are modulated according to their consistency with the context representations. We evaluate the proposed method on challenging benchmark datasets for 3D hand pose estimation and 3D full body pose estimation. Experimental results show the effectiveness of our method that achieves state-of-the-art performance on both of the tasks.
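
The Graphical ConvLSTM is too involved to reproduce here, but the context consistency gate idea (modulating feature maps by their agreement with a context representation) can be illustrated generically. This sketch is loosely inspired by the description above and is not the paper's exact CCG:

```python
import torch
import torch.nn as nn

class ContextGate(nn.Module):
    """Generic context gating: scale feature-map channels by a gate
    computed from a context vector, suppressing inconsistent channels."""
    def __init__(self, channels, ctx_dim):
        super().__init__()
        self.proj = nn.Linear(ctx_dim, channels)

    def forward(self, feats, context):          # feats: (N, C, H, W); context: (N, ctx_dim)
        g = torch.sigmoid(self.proj(context))   # (N, C) per-channel consistency gate
        return feats * g.view(*g.shape, 1, 1)   # broadcast over spatial dims
```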

 

Concept Explanation

A research hotspot is a research topic that, over a period of time, produces a large volume of output and carries high research value. In the literature, a research hotspot manifests as rapid growth in both publications and citations. A research hotspot differs from a research front: research fronts are difficult to measure and capture, and even a captured front requires expert judgment, though a research hotspot still in its embryonic stage can be regarded as a research front. Once a research topic becomes a hotspot, it typically ceases to be a research front. Even so, identifying and analyzing research hotspots remains meaningful: a hotspot is a research front that has stood the test of time and practice, matters for the innovation and development of scientific theory, and has high application value.

Data Sources

Based on the Web of Science platform, we selected papers in the subject category COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE published between 2018 and 2020, identified research topics with the Citation Topics module of the InCites database, and thereby obtained the Top 20 research hotspots in artificial intelligence.

 

Written by: June

Reviewed by: Intelligence Analysis and Research Department













