Research

Recent Publications

1. Arif Mahmood, A Basit, M A Munir, M Ali, “Detection and Localization of Firearm Carriers in Complex Scenes for Improved Safety Measures”, accepted in IEEE Transactions on Computational Social Systems (TCSS), Sep. 2023. (IF 5.0)

2. H Yaseen, Arif Mahmood, “Learning Structure Aware Deep Spectral Embedding”, accepted in IEEE Transactions on Image Processing (TIP), May 2023. (IF 11.042)

3. M Z Zaheer, Arif Mahmood, M Astrid, S I Lee, “Clustering Aided Weakly Supervised Training to Detect Anomalous Events in Surveillance Videos”, accepted in IEEE Transactions on Neural Networks and Learning Systems (TNNLS), May 2023. (IF 14.225)

4. S Javed, Arif Mahmood, T Qaiser, N Werghi, “Knowledge Distillation in Histology Landscape by Multi-Layer Features Supervision”, in IEEE Journal of Biomedical and Health Informatics (JBHI), April 2023. (IF 7.021)

5. S Aldhaheri, R Alotaibi, B Alzahrani, A Hadi, Arif Mahmood, A Alhothali, A Barnawi, “MACC Net: Multi-task Attention Crowd Counting Network”, in Applied Intelligence, 2023. (IF 5.019)

6. M S Saeed, S Nawaz, M H Khan, M Z Zaheer, K Nandakumar, M H Yousaf, Arif Mahmood, “Single-branch Network for Multimodal Training”, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.

7. J H Giraldo, S Javed, Arif Mahmood, F D Malliaros, T Bouwmans, “Higher-Order Sparse Convolutions in Graph Neural Networks”, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.

8. M Awan, M H Khan, S Baliah, M A Waseem, S Khan, F S Khan, Arif Mahmood, “Unsupervised Landmark Discovery Using Consistency-Guided Bottleneck”, in British Machine Vision Conference (BMVC), 2023.

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment (IEEE/CVF CVPR, 2024)

Abstract:

This paper proposes Comprehensive Pathology Language Image Pre-training (CPLIP), a new unsupervised technique designed to enhance the alignment of images and text in histopathology for tasks such as classification and segmentation. This methodology enriches vision-language models by leveraging extensive data without needing ground truth annotations. CPLIP involves constructing a pathology-specific dictionary, generating textual descriptions for images using language models, and retrieving relevant images for each text snippet via a pre-trained model. The model is then fine-tuned using a many-to-many contrastive learning method to align complex interrelated concepts across both modalities. Evaluated across multiple histopathology tasks, CPLIP shows notable improvements in zero-shot learning scenarios, outperforming existing methods in both interpretability and robustness and setting a higher benchmark for the application of vision-language models in the field.
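As context for the many-to-many alignment objective, the sketch below shows the standard one-to-one symmetric contrastive loss (CLIP-style) that CPLIP generalizes. It is illustrative only; the function name, temperature, and shapes are placeholders, not the paper's implementation.

import torch
import torch.nn.functional as F

def symmetric_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # Cosine similarities between every image and every text in the batch.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    # Cross-entropy in both directions; CPLIP instead aligns many texts with
    # many images per concept (many-to-many) rather than this one-to-one form.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))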

Model Diagram

Paper Link:

Citation:

@INPROCEEDINGS{sajid2024cplip,
title={CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment},
author={Sajid Javed and Arif Mahmood and Iyyakutti Iyappan Ganapathi and Fayaz Ali Dharejo and Naoufel Werghi and Mohammed Bennamoun},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}

DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models (IEEE/CVF CVPR, 2024)

Abstract:

Recently, a number of image-mixing-based augmentation techniques have been introduced to improve the generalization of deep neural networks. In these techniques, two or more randomly selected natural images are mixed together to generate an augmented image. Such methods may not only omit important portions of the input images but also introduce label ambiguities by mixing images across labels, resulting in misleading supervisory signals. To address these limitations, we propose DIFFUSEMIX, a novel data augmentation technique that leverages a diffusion model to reshape training images, supervised by our bespoke conditional prompts. First, a concatenation of a partial natural image and its generated counterpart is obtained, which helps in avoiding the generation of unrealistic images or label ambiguities. Then, to enhance resilience against adversarial attacks and improve safety measures, a randomly selected structural pattern from a set of fractal images is blended into the concatenated image to form the final augmented image for training. Our empirical results on seven different datasets reveal that DIFFUSEMIX achieves superior performance compared to existing state-of-the-art methods on tasks including general classification, fine-grained classification, fine-tuning, data scarcity, and adversarial robustness.
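A minimal sketch of the two steps described above, assuming `natural`, `generated`, and `fractal` are PIL images holding a training sample, its diffusion-generated counterpart, and a fractal pattern; the half-split and blend weight are illustrative choices, not the authors' code.

import numpy as np
from PIL import Image

def diffusemix_style_augment(natural, generated, fractal, lam=0.2):
    nat = np.asarray(natural, dtype=np.float32)
    gen = np.asarray(generated, dtype=np.float32)
    h, w = nat.shape[:2]
    frac = np.asarray(fractal.resize((w, h)), dtype=np.float32)
    # Keep one half of the natural image and one half of its generated
    # counterpart, so part of the original (label-defining) content survives.
    mixed = nat.copy()
    mixed[:, w // 2:] = gen[:, w // 2:]
    # Blend a randomly chosen fractal structural pattern into the result.
    out = (1.0 - lam) * mixed + lam * frac
    return Image.fromarray(np.clip(out, 0, 255).astype(np.uint8))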

Model Diagram

Paper Link:

Citation:

@INPROCEEDINGS{diffuseMix2024,
title={DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models},
author={Khawar Islam and Muhammad Zaigham Zaheer and Arif Mahmood and Karthik Nandakumar},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}}

Pose-Guided Self-Training with Two-Stage Clustering for Unsupervised Landmark Discovery (IEEE/CVF CVPR, 2024)

Abstract:

Unsupervised landmark discovery (ULD) for an object category is a challenging computer vision problem. In pursuit of developing a robust ULD framework, we explore the potential of a recent paradigm of self-supervised learning algorithms, known as diffusion models. Some recent works have shown that these models implicitly contain important correspondence cues. Towards harnessing the potential of diffusion models for the ULD task, we make the following core contributions. First, we propose a ZeroShot ULD baseline based on simple clustering of random pixel locations with nearest neighbour matching. It delivers better results than existing ULD methods. Second, motivated by the ZeroShot performance, we develop a ULD algorithm based on diffusion features using self-training and clustering which also outperforms prior methods by notable margins. Third, we introduce a new proxy task based on generating latent pose codes and also propose a two-stage clustering mechanism to facilitate effective pseudo-labeling, resulting in a significant performance improvement. Overall, our approach consistently outperforms state-of-the-art methods on four challenging benchmarks, AFLW, MAFL, CatHeads and LS3D, by significant margins.

Model Diagram

Paper Link:

Citation:

@INPROCEEDINGS{DULD2024,
title={Pose-Guided Self-Training with Two-Stage Clustering for Unsupervised Landmark Discovery},
author={S Tourani and A Alwheibi and Arif Mahmood and M H Khan},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}}

Generative Cooperative Learning for Unsupervised Video Anomaly Detection (IEEE/CVF CVPR, 2022)

Abstract:

Video anomaly detection is well investigated in weakly supervised and one-class classification (OCC) settings. However, unsupervised video anomaly detection methods are quite sparse, likely because anomalies are less frequent in occurrence and usually not well-defined, which when coupled with the absence of ground truth supervision, could adversely affect the performance of the learning algorithms. This problem is challenging yet rewarding as it can completely eradicate the costs of obtaining laborious annotations and enable such systems to be deployed without human intervention. To this end, we propose a novel unsupervised Generative Cooperative Learning (GCL) approach for video anomaly detection that exploits the low frequency of anomalies towards building a cross-supervision between a generator and a discriminator. In essence, both networks get trained in a cooperative fashion, thereby allowing unsupervised learning. We conduct extensive experiments on two large-scale video anomaly detection datasets, UCF-Crime and ShanghaiTech. Consistent improvement over the existing state-of-the-art unsupervised and OCC methods corroborates the effectiveness of our approach.
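The cross-supervision idea can be sketched as below, assuming G is a feature autoencoder and D a scoring network (hypothetical modules); the paper's negative learning and thresholding details are omitted, so treat this only as the shape of one cooperative step.

import torch
import torch.nn.functional as F

def cooperative_step(G, D, feats, g_opt, d_opt, q=0.8):
    # 1) The generator's reconstruction error provides pseudo-labels for D:
    with torch.no_grad():
        err = F.mse_loss(G(feats), feats, reduction='none').mean(dim=1)
        pseudo = (err > err.quantile(q)).float()      # high error -> anomalous
    d_loss = F.binary_cross_entropy(D(feats).squeeze(1), pseudo)
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()
    # 2) The discriminator's scores select pseudo-normal samples to train G:
    with torch.no_grad():
        keep = D(feats).squeeze(1) < 0.5
    if keep.any():
        sub = feats[keep]
        g_loss = F.mse_loss(G(sub), sub)
        g_opt.zero_grad(); g_loss.backward(); g_opt.step()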

Model Diagram

Paper Link:

Citation:

@inproceedings{GCL2022,
title={Generative Cooperative Learning for Unsupervised Video Anomaly Detection},
author={M Z Zaheer and Arif Mahmood and M H Khan and M Segu and F Yu and S Lee},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2022}}

Semi-supervised Spectral Clustering for Image Set Classification (IEEE/CVF CVPR, 2014)

Abstract:

We present an image set classification algorithm based on unsupervised clustering of labeled training and unlabeled test data where labels are only used in the stopping criterion. The probability distribution of each class over the set of clusters is used to define a true set based similarity measure. To this end, we propose an iterative sparse spectral clustering algorithm. In each iteration, a proximity matrix is efficiently recomputed to better represent the local subspace structure. Initial clusters capture the global data structure and finer clusters at the later stages capture the subtle class differences not visible at the global scale. Image sets are compactly represented with multiple Grassmannian manifolds which are subsequently embedded in Euclidean space with the proposed spectral clustering algorithm. We also propose an efficient eigenvector solver which not only reduces the computational cost of spectral clustering by many folds but also improves the clustering quality and final classification results. Experiments on five standard datasets and comparison with seven existing techniques show the efficacy of our algorithm.
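A toy version of the stopping criterion, using scikit-learn in place of the paper's iterative sparse spectral clustering and proximity-matrix updates: clusters are refined until every cluster is pure with respect to the labeled training points, which are otherwise never used.

import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_until_pure(X, y_train, n_train, k=2, k_max=64):
    # X stacks labeled training and unlabeled test points; labels enter
    # only the stopping test, never the clustering itself.
    assign = None
    while k <= k_max:
        assign = SpectralClustering(n_clusters=k,
                                    affinity='nearest_neighbors').fit_predict(X)
        train_assign = assign[:n_train]
        pure = all(len(np.unique(y_train[train_assign == c])) <= 1
                   for c in np.unique(train_assign))
        if pure:
            break
        k *= 2          # finer clusters expose subtler class differences
    return assign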

Model Diagram

Paper Link:

Citation:

@INPROCEEDINGS {6909417,
author = {Arif Mahmood and A. Mian and R. Owens},
booktitle = {2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
title = {Semi-supervised Spectral Clustering for Image Set Classification},
year = {2014},
issn = {1063-6919},
pages = {121-128}}

European Conference on Computer Vision (ECCV)

CLAWS: Clustering Assisted Weakly Supervised Learning with Normalcy Suppression for Anomalous Event Detection (ECCV, 2020)

Abstract:

Learning to detect real-world anomalous events through video level labels is a challenging task due to the rare occurrence of anomalies as well as noise in the labels. In this work, we propose a weakly supervised anomaly detection method which has manifold contributions including 1) a random batch based training procedure to reduce inter-batch correlation, 2) a normalcy suppression mechanism to minimize anomaly scores of the normal regions of a video by taking into account the overall information available in one training batch, and 3) a clustering distance based loss to contribute towards mitigating the label noise and to produce better anomaly representations by encouraging our model to generate distinct normal and anomalous clusters. The proposed method obtains 83.03% and 89.67% frame-level AUC performance on the UCF-Crime and ShanghaiTech datasets respectively, demonstrating its superiority over the existing state-of-the-art algorithms.
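Contribution 2) can be pictured with a tiny module: weights computed over all segments available in a training batch damp the features of the (dominant) normal regions. This is a simplified sketch under assumed shapes, not the released implementation.

import torch
import torch.nn as nn

class NormalcySuppression(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):               # feats: (num_segments, dim), one batch
        # Softmax over every segment in the batch: because normal segments
        # dominate, each receives a small weight, suppressing its response.
        w = torch.softmax(self.score(feats).squeeze(1), dim=0)
        return feats * w.unsqueeze(1)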

Model Diagram

Paper Link:

Citation:

@inproceedings{zaheer2020claws,
title={CLAWS: Clustering Assisted Weakly Supervised Learning with Normalcy Suppression for Anomalous Event Detection},
author={Zaheer, Muhammad Zaigham and Mahmood, Arif and Astrid, Marcella and Lee, Seung-Ik},
booktitle={European Conference on Computer Vision},
pages={358--376},
year={2020},
organization={Springer}}

HOPC: Histogram of oriented principal components of 3D point clouds for action recognition (ECCV, 2014)

Abstract:

Existing techniques for 3D action recognition are sensitive to viewpoint variations because they extract features from depth images which change significantly with viewpoint. In contrast, we directly process the point clouds and propose a new technique for action recognition which is more robust to noise, action speed and viewpoint variations. Our technique consists of a novel descriptor and keypoint detection algorithm. The proposed descriptor is extracted at a point by encoding the Histogram of Oriented Principal Components (HOPC) within an adaptive spatio-temporal support volume around that point. Based on this descriptor, we present a novel method to detect Spatio-Temporal Key-Points (STKPs) in 3D point cloud sequences. Experimental results show that the proposed descriptor and STKP detector outperform state-of-the-art algorithms on three benchmark human activity datasets. We also introduce a new multiview public dataset and show the robustness of our proposed method to viewpoint variations.
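A simplified numpy sketch of the descriptor idea: eigenvectors of the covariance of a local support volume, weighted by their eigenvalues and quantized onto a fixed set of directions (`bins`, e.g. vertices of a regular polyhedron). Sign disambiguation and the adaptive spatio-temporal support of the paper are omitted.

import numpy as np

def hopc_at_point(points, center, radius, bins):
    # points: (n, 3) cloud; bins: (m, 3) unit direction vectors.
    nbrs = points[np.linalg.norm(points - center, axis=1) < radius]
    cov = np.cov((nbrs - nbrs.mean(axis=0)).T)
    evals, evecs = np.linalg.eigh(cov)              # ascending order
    hist = []
    for lam, v in zip(evals[::-1], evecs.T[::-1]):  # principal component first
        votes = np.maximum(bins @ v, 0.0) * lam     # vote into aligned directions
        hist.append(votes)
    h = np.concatenate(hist)
    return h / (np.linalg.norm(h) + 1e-9)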

Model Diagram

Paper Link:

Citation:

@inproceedings{rahmani2014hopc,
title={HOPC: Histogram of oriented principal components of 3D point clouds for action recognition},
author={Rahmani, Hossein and Mahmood, Arif and Huynh, Du Q and Mian, Ajmal},
booktitle={European Conference on Computer Vision},
pages={742--757},
year={2014},
organization={Springer}}

British Machine Vision Conference (BMVC)

Unsupervised Landmark Discovery Using Consistency-Guided Bottleneck (BMVC, 2023)

Abstract:

We study a challenging problem of unsupervised discovery of object landmarks. Many recent methods rely on bottlenecks to generate 2D Gaussian heat maps however, these are limited in generating informed heatmaps while training, presumably due to the lack of effective structural cues. Also, it is assumed that all predicted landmarks are semantically relevant despite having no ground truth supervision. In the current work, we introduce a consistency-guided bottleneck in an image reconstruction-based pipeline that leverages landmark consistency – a measure of compatibility score with the pseudo ground truth – to generate adaptive heatmaps. We propose obtaining pseudo-supervision via forming landmark correspondence across images. The consistency then modulates the uncertainty of the discovered landmarks in the generation of adaptive heatmaps which rank consistent landmarks above their noisy counterparts, providing effective structural information for improved robustness. Evaluations on five diverse datasets including MAFL, AFLW, LS3D, Cats, and Shoes demonstrate excellent performance of the proposed approach compared to the existing state-of-the-art methods. Our code is publicly available at https://github.com/MamonaAwan/CGB_ULD.
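The role of consistency in heatmap generation can be pictured with a small sketch: each landmark's Gaussian is sharpened or broadened by its compatibility score, so consistent landmarks dominate the reconstruction signal. The shapes and the sigma rule here are assumptions for illustration only.

import numpy as np

def adaptive_heatmaps(landmarks, consistency, size=64, base_sigma=2.0):
    # landmarks: (K, 2) array of (x, y); consistency: (K,) scores in (0, 1].
    ys, xs = np.mgrid[0:size, 0:size]
    maps = []
    for (x, y), c in zip(landmarks, consistency):
        sigma = base_sigma / max(c, 1e-3)   # low consistency -> diffuse, uncertain
        maps.append(np.exp(-((xs - x)**2 + (ys - y)**2) / (2 * sigma**2)))
    return np.stack(maps)                   # (K, size, size)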

Model Diagram

Paper Link:

Citation:

@InProceedings{awan2023unsupervised,
title={Unsupervised Landmark Discovery Using Consistency-Guided Bottleneck},
author={Mamona Awan and Muhammad Haris Khan and Sanoojan Baliah and Muhammad Ahmad Waseem and Salman Khan and Fahad Shahbaz Khan and Arif Mahmood},
booktitle={Proceedings of the British Machine Vision Conference},
year={2023},
}

Face Pyramid Vision Transformer (BMVC, 2022)

Abstract:

A novel Face Pyramid Vision Transformer (FPVT) is proposed to learn discriminative multi-scale facial representations for face recognition and verification. In FPVT, Face Spatial Reduction Attention (FSRA) and Dimensionality Reduction (FDR) layers are employed to make the feature maps compact, thus reducing the computations. An Improved Patch Embedding (IPE) algorithm is proposed to exploit the benefits of CNNs in ViTs (e.g., shared weights, local context, and receptive fields) to model lower-level edges to higher-level semantic primitives. Within the FPVT framework, a Convolutional Feed-Forward Network (CFFN) is proposed that extracts locality information to learn low-level facial information. The proposed FPVT is evaluated on seven benchmark datasets and compared with ten existing state-of-the-art methods, including CNNs, pure ViTs, and Convolutional ViTs. Despite fewer parameters, FPVT has demonstrated excellent performance over the compared methods. The project page is available at https://khawar-islam.github.io/fpvt/.
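As a rough illustration of spatial-reduction attention (in the spirit of FSRA, though not the paper's exact layer), the keys and values can be downsampled before attention so the cost drops with the reduction ratio:

import torch
import torch.nn as nn

class SpatialReductionAttention(nn.Module):
    def __init__(self, dim, heads=8, sr_ratio=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Strided conv shrinks the key/value token grid by sr_ratio per side.
        self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, h, w):              # x: (B, h*w, dim)
        b, n, d = x.shape
        kv = self.sr(x.transpose(1, 2).reshape(b, d, h, w))
        kv = self.norm(kv.flatten(2).transpose(1, 2))   # fewer key/value tokens
        out, _ = self.attn(x, kv, kv, need_weights=False)
        return out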

Model Diagram

Paper Link:

Citation:

@InProceedings{Khawar_BMVC22_FPVT,
author = {Khawar Islam and Muhammad Zaigham Zaheer and Arif Mahmood},
title = {Face Pyramid Vision Transformer},
booktitle = {Proceedings of the British Machine Vision Conference},
year = {2022}
}

Hyperspectral Face Recognition using 3D-DCT and Partial Least Squares (BMVC, 2013)

Abstract:

Hyperspectral imaging offers new opportunities for inter-person facial discrimination. However, compact and discriminative feature extraction from high dimensional hyperspectral image cubes is a challenging task. We propose a spatio-spectral feature extraction method based on the 3D Discrete Cosine Transform (3D-DCT). The 3D-DCT optimally compacts information in the low frequency coefficients. Therefore, we represent each hyperspectral facial cube by a small number of low frequency DCT coefficients and formulate Partial Least Square (PLS) regression for accurate classification. The proposed algorithm is evaluated on three standard hyperspectral face databases. Experimental results show that the proposed algorithm outperforms five current state of the art hyperspectral face recognition algorithms by a significant margin.

Model Diagram

Paper Link:

Citation:

@inproceedings{uzair2013hyperspectral,
title={Hyperspectral Face Recognition using 3D-DCT and Partial Least Squares},
author={Muhammad Uzair and Arif Mahmood and Ajmal Mian},
year={2013},
booktitle={Proceedings of the British Machine Vision Conference},
publisher={BMVA Press}}

Hierarchical Sparse Spectral Clustering for Image Set Classification (BMVC, 2012)

Abstract:

We present a structural matching technique for robust classification based on image sets. In set based classification, a probe set is matched with a number of gallery sets and assigned the label of the most similar set. We represent each image set by a sparse dictionary and compute a similarity matrix by matching all the dictionary atoms of the gallery and probe sets. The similarity matrix comprises the sparse coding coefficients and forms a fully connected directed graph. The nodes of the graph are the dictionary atoms and the edges are the sparse coefficients. The graph is converted to an undirected graph with positive edge weights and spectral clustering is used to cut the graph into two balanced partitions using the normalized cut algorithm. This process is repeated until the graph reduces to critical and non-critical partitions. A critical partition contains atoms with the same gallery label along with one or more probe atoms whereas a noncritical partition either consists of only probe atoms or atoms with multiple gallery labels with no probe atom. Using the critical partitions, we define a novel set based similarity measure and assign the probe set the label of the gallery set with maximum similarity. The proposed algorithm is applied to image set based face recognition using two standard databases.

Model Diagram

Paper Link:

Citation:

@inproceedings{mahmood2012hierarchical,
title={Hierarchical Sparse Spectral Clustering for Image Set Classification},
author={Mahmood, Arif and Mian, Ajmal S},
booktitle={Proceedings of the British Machine Vision Conference},
pages={1--11},
year={2012}}

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Single-branch Network for Multimodal Training (ICASSP, 2023)

Abstract:

With the rapid growth of social media platforms, users are sharing billions of multimedia posts containing audio, images, and text. Researchers have focused on building autonomous systems capable of processing such multimedia data to solve challenging multimodal tasks including cross-modal retrieval, matching, and verification. Existing works use separate networks to extract embeddings of each modality to bridge the gap between them. The modular structure of their branched networks is fundamental in creating numerous multimodal applications and has become a de facto standard to handle multiple modalities. In contrast, we propose a novel single-branch network capable of learning discriminative representations for unimodal as well as multimodal tasks without changing the network. An important feature of our single-branch network is that it can be trained either using single or multiple modalities without sacrificing performance. We evaluated our proposed single-branch network on the challenging multimodal problem (face-voice association) for cross-modal verification and matching tasks with various loss formulations. Experimental results demonstrate the superiority of our proposed single-branch network over the existing methods in a wide range of experiments. Code: https://github.com/msaadsaeed/SBNet
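The single-branch idea, sketched with placeholder dimensions: light modality-specific projections feed one shared trunk, so the same network trains on either modality alone or on both. This is an assumed minimal layout, not the released SBNet architecture.

import torch
import torch.nn as nn

class SingleBranch(nn.Module):
    def __init__(self, face_dim=512, voice_dim=192, dim=256):
        super().__init__()
        self.proj = nn.ModuleDict({'face': nn.Linear(face_dim, dim),
                                   'voice': nn.Linear(voice_dim, dim)})
        # One shared branch replaces the usual two-branch design.
        self.trunk = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                   nn.Linear(dim, dim))

    def forward(self, x, modality):          # modality in {'face', 'voice'}
        return self.trunk(self.proj[modality](x))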

Model Diagram

Paper Link:

Citation:

@INPROCEEDINGS{10097207,
author={Saeed, Muhammad Saad and Nawaz, Shah and Khan, Muhammad Haris and Zaigham Zaheer, Muhammad and Nandakumar, Karthik and Yousaf, Muhammad Haroon and Mahmood, Arif},
booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Single-branch Network for Multimodal Training},
year={2023},
pages={1-5},
doi={10.1109/ICASSP49357.2023.10097207}}

Higher-Order Sparse Convolutions In Graph Neural Networks (ICASSP, 2023)

Abstract:

Graph Neural Networks (GNNs) have been applied to many problems in computer science. Capturing higher-order relationships between nodes is crucial to increase the expressive power of GNNs. However, existing methods to capture these relationships could be infeasible for large-scale graphs. In this work, we introduce a new higher-order sparse convolution based on the Sobolev norm of graph signals. Our Sparse Sobolev GNN (S-SobGNN) computes a cascade of filters on each layer with increasing Hadamard powers to get a more diverse set of functions, and then a linear combination layer weights the embeddings of each filter. We evaluate S-SobGNN in several applications of semi-supervised learning. S-SobGNN shows competitive performance in all applications as compared to several state-of-the-art methods.
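The cascade of Hadamard powers is what keeps the filters sparse, and hence feasible at scale; a rough scipy sketch follows, with the learned linear-combination layer replaced by a plain mean and the shift parameter chosen arbitrarily.

import numpy as np
import scipy.sparse as sp

def sparse_sobolev_cascade(A, X, k=3, eps=1.0):
    # A: sparse (n, n) adjacency; X: dense (n, f) node features.
    S = (A + eps * sp.eye(A.shape[0])).tocsr()   # shifted operator
    P, outs = S.copy(), []
    for _ in range(k):
        outs.append(P @ X)         # filter output at this Hadamard power
        P = P.multiply(S).tocsr()  # element-wise power: sparsity is preserved
    return np.mean(outs, axis=0)   # stand-in for the learned combination layer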

Model Diagram

Paper Link:

Citation:

@INPROCEEDINGS{10096494,
author={Giraldo, Jhony H. and Javed, Sajid and Mahmood, Arif and Malliaros, Fragkiskos D. and Bouwmans, Thierry},
booktitle={ICASSP 2023 – 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Higher-Order Sparse Convolutions in Graph Neural Networks},
year={2023},
pages={1-5},
doi={10.1109/ICASSP49357.2023.10096494}}

IEEE International Conference on Image Processing (ICIP)

Dynamic Background Subtraction using Least Squares Adversarial Learning (IEEE ICIP, 2020)

Abstract:

Dynamic Background Subtraction (BS) is a fundamental problem in many vision-based applications. BS in real complex environments faces several challenging conditions such as illumination variations, shadows, camera jitter, and bad weather. In this study, we aim to address the challenges of BS in complex scenes by exploiting conditional least squares adversarial networks. During training, a scene-specific conditional least squares adversarial network with two additional regularizations, L1-Loss and Perceptual-Loss, is employed to learn the dynamic background variations. The model takes as input video frames conditioned on the corresponding ground truth to learn the dynamic changes in complex scenes. Afterwards, testing is performed on unseen video frames so that the generator performs dynamic background subtraction. The proposed method, consisting of three loss terms (least squares adversarial loss, L1-Loss, and Perceptual-Loss), is evaluated on two benchmark datasets, CDnet2014 and BMC. The results of our proposed method show improved performance on both datasets compared with 10 existing state-of-the-art methods.
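The three loss terms combine roughly as below (a sketch: D is the conditional discriminator, `vgg` any fixed feature extractor for the perceptual term, and the weights are placeholders rather than the paper's settings).

import torch
import torch.nn.functional as F

def generator_loss(D, vgg, frame, fake_bg, real_bg, l1_w=100.0, perc_w=10.0):
    pred = D(frame, fake_bg)
    adv = F.mse_loss(pred, torch.ones_like(pred))   # least-squares GAN term
    l1 = F.l1_loss(fake_bg, real_bg)                # L1-Loss
    perc = F.l1_loss(vgg(fake_bg), vgg(real_bg))    # Perceptual-Loss
    return adv + l1_w * l1 + perc_w * perc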

Model Diagram

Paper Link:

Citation:

@INPROCEEDINGS{9191235,
author={Sultana, Maryam and Mahmood, Arif and Bouwmans, Thierry and Jung, Soon Ki},
booktitle={2020 IEEE International Conference on Image Processing (ICIP)},
title={Dynamic Background Subtraction Using Least Square Adversarial Learning},
year={2020},
pages={3204-3208},
doi={10.1109/ICIP40778.2020.9191235}}

CS-RPCA: Clustered Sparse RPCA for Moving Object Detection (ICIP, 2020)

Abstract:

Moving object detection (MOD) is an important step for many computer vision applications. Over the last decade, RPCA has proven to be a potential solution for MOD, achieving promising performance under various challenging background scenes. However, because of the lack of different types of features, RPCA still shows degraded performance in many complicated background scenes such as dynamic backgrounds, cluttered foreground objects, and camouflage. To address these problems, this paper presents a Clustered Sparse RPCA (CS-RPCA) for MOD under challenging environments. The proposed algorithm extracts multiple features from video sequences and then employs RPCA to get the low-rank and sparse components from each representation. The sparse subspaces are then merged into a common sparse component using the Grassmann manifold. We propose a novel objective function that computes the composite sparse component from multiple representations and solve it using a non-negative matrix factorization method. The proposed algorithm is evaluated on two challenging datasets for MOD. Results demonstrate excellent performance of the proposed algorithm as compared to existing state-of-the-art methods.

Paper Link:

Citation:

@INPROCEEDINGS{9190734,
author={Javed, Sajid and Mahmood, Arif and Dias, Jorge and Werghi, Naoufel},
booktitle={2020 IEEE International Conference on Image Processing (ICIP)},
title={CS-RPCA: Clustered Sparse RPCA for Moving Object Detection},
year={2020},
pages={3209-3213},
doi={10.1109/ICIP40778.2020.9190734}}

Localizing Firearm Carriers by Identifying Human-Object Pairs (ICIP, 2020)

Abstract:

Visual identification of gunmen in a crowd is a challenging problem that requires resolving the association of a person with an object (firearm). We present a novel approach to address this problem by defining human-object interaction (and non-interaction) bounding boxes. In a given image, humans and firearms are detected separately. Each detected human is paired with each detected firearm, allowing us to create a paired bounding box that contains both the object and the human. A network is trained to classify these paired bounding boxes according to whether the human is carrying the identified firearm or not. Extensive experiments were performed to evaluate the effectiveness of the algorithm, including exploiting the full pose of the human, hand keypoints, and their association with the firearm. The knowledge of spatially localized features is key to the success of our method by using multi-size proposals with adaptive average pooling. We have also extended a previously existing firearm detection dataset by adding more images and tagging the human-firearm pairs (including bounding boxes for firearms and gunmen) in the extended dataset. The experimental results (78.5 AP-hold) demonstrate the effectiveness of the proposed method.
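The pairing step itself is simple; below is a sketch of the enclosing-box construction that feeds the interaction classifier, with boxes assumed to be (x1, y1, x2, y2) tuples.

import numpy as np

def paired_boxes(humans, firearms):
    # Every detected human is paired with every detected firearm; the union
    # box containing both is what the network later classifies as
    # interaction (carrier) or non-interaction.
    pairs = []
    for hx1, hy1, hx2, hy2 in humans:
        for fx1, fy1, fx2, fy2 in firearms:
            pairs.append((min(hx1, fx1), min(hy1, fy1),
                          max(hx2, fx2), max(hy2, fy2)))
    return np.array(pairs)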

Model Diagram

Paper Link:

Citation:

@INPROCEEDINGS{9190886,
author={Basit, Abdul and Munir, Muhammad Akhtar and Ali, Mohsen and Werghi, Naoufel and Mahmood, Arif},
booktitle={2020 IEEE International Conference on Image Processing (ICIP)},
title={Localizing Firearm Carriers By Identifying Human-Object Pairs},
year={2020},
pages={2031-2035}}

IEEE International Geoscience and Remote Sensing Symposium (IGARSS)

Ocean Color Net (OCN) for the Barents Sea (IEEE IGARSS, 2020)

Abstract:

Over recent years, rapid environmental changes in the Arctic and subarctic regions have caused significant alterations in the ecosystem structure and seasonality, including the primary productivity of the Barents Sea. This work aims at improving methodology for studying these features, by estimating chlorophyll-a (chl-a) concentrations in the transitional Barents Sea by remotely sensing its optical properties, in order to better understand the large-scale algal bloom dynamics in the region. The in-situ measurements of chl-a are collected from the year 2016 to 2018 over a wide area of the Barents Sea to cover the spatial and temporal variations in chl-a concentration. Optical images of the Barents Sea are captured by the Multi-Spectral Imager Instrument on Sentinel-2. Using these remotely sensed optical images and the in-situ measurements, we propose a match-up dataset creation method based on the distribution of the remotely sensed reflectance spectra. The Ocean Color Net (OCN) regression model proposed in this study has outperformed other ML-based techniques.

Model Diagram

Paper Link:

Citation:

@INPROCEEDINGS{9323687,
author={Asim, Muhammad and Brekke, Camilla and Mahmood, Arif and Eltoft, Torbjørn and Reigstad, Marit},
booktitle={IGARSS 2020 – 2020 IEEE International Geoscience and Remote Sensing Symposium},
title={Ocean Color Net (OCN) for the Barents Sea},
year={2020},
pages={5881-5884},
doi={10.1109/IGARSS39084.2020.9323687}}

IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)

Structural Low-Rank Tracking (AVSS, 2019)

Abstract:

Visual object tracking is an important step for many computer vision applications. The task becomes very challenging when the target undergoes heavy occlusion, background clutter, and sudden illumination variations. Methods that incorporate sparse representation and low-rank assumptions on the target particles have achieved promising results. However, because of the lack of structural constraints, these methods show performance degradation when an object faces the aforementioned challenges. To alleviate these limitations, we propose a new structural low-rank modeling algorithm for robust object tracking. In the proposed algorithm, we enforce local spatial, global spatial, and temporal appearance consistency among the particles in the low-rank subspace by constructing three graphs. The Laplacian matrices of these graphs are incorporated into the novel low-rank objective function, which is solved using a linearized alternating direction method with an adaptive penalty.

Model Diagram

Paper Link:

Citation:

@INPROCEEDINGS{8909852,
author={Javed, Sajid and Mahmood, Arif and Dias, Jorge and Werghi, Naoufel},
booktitle={2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)},
title={Structural Low-Rank Tracking},
year={2019},
pages={1-8},
doi={10.1109/AVSS.2019.8909852}}

IEEE Transactions on Image Processing (TIP)

Learning Structure Aware Deep Spectral Embedding (TIP, 2023)

Abstract:

Spectral Embedding (SE) has often been used to map data points from non-linear manifolds to linear subspaces for the purpose of classification and clustering. Despite significant advantages, the subspace structure of data in the original space is not preserved in the embedding space. To address this issue, subspace clustering has been proposed, replacing the SE graph affinity with a self-expression matrix. It works well if the data lies in a union of linear subspaces; however, the performance may degrade in real-world applications where data often spans non-linear manifolds. To address this problem, we propose a novel structure-aware deep spectral embedding by combining a spectral embedding loss and a structure preservation loss. To this end, a deep neural network architecture is proposed that simultaneously encodes both types of information and aims to generate structure-aware spectral embedding. The subspace structure of the input data is encoded by using attention-based self-expression learning. The proposed algorithm is evaluated on six publicly available real-world datasets. The results demonstrate the excellent clustering performance of the proposed algorithm compared to the existing state-of-the-art methods. The proposed algorithm has also exhibited better generalization to unseen data points and it is scalable to larger datasets without requiring significant computational resources.
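The attention-based self-expression component can be caricatured in a few lines: each embedded point is reconstructed from the others through a softmax attention matrix, which stands in for the self-expression coefficients. A sketch only; the full objective also carries the spectral embedding loss, and the temperature is an assumed knob.

import torch
import torch.nn.functional as F

def self_expression_loss(Z, temperature=0.1):
    # Z: (n, d) embeddings. Mask the diagonal so no point explains itself.
    n = Z.size(0)
    sim = Z @ Z.t() / temperature
    sim = sim.masked_fill(torch.eye(n, dtype=torch.bool, device=Z.device),
                          float('-inf'))
    C = torch.softmax(sim, dim=1)       # attention = self-expression matrix
    return F.mse_loss(C @ Z, Z)         # each point as a mix of the others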

Model Diagram

Paper Link:

Citation:

@ARTICLE{10179276,
author={Yaseen, Hira and Mahmood, Arif},
journal={IEEE Transactions on Image Processing},
title={Learning Structure Aware Deep Spectral Embedding},
year={2023},
volume={32},
pages={3939-3948},
doi={10.1109/TIP.2023.3282074}}

Multiplex Cellular Communities in Multi-Gigapixel Colorectal Cancer Histology Images for Tissue Phenotyping (TIP, 2020)

Abstract:

In computational pathology, automated tissue phenotyping in cancer histology images is a fundamental tool for profiling tumor microenvironments. Current tissue phenotyping methods use features derived from image patches which may not carry biological significance. In this work, we propose a novel multiplex cellular community-based algorithm for tissue phenotyping integrating cell-level features within a graph-based hierarchical framework. We demonstrate that such integration offers better performance compared to prior deep learning and texture-based methods as well as to cellular community based methods using uniplex networks. To this end, we construct cell-level graphs using texture, alpha diversity and multi-resolution deep features. Using these graphs, we compute cellular connectivity features which are then employed for the construction of a patch-level multiplex network. Over this network, we compute multiplex cellular communities using a novel objective function. The proposed objective function computes a low-dimensional subspace from each cellular network and subsequently seeks a common low-dimensional subspace using the Grassmann manifold. We evaluate our proposed algorithm on three publicly available datasets for tissue phenotyping, demonstrating a significant improvement over existing state-of-the-art methods.

Model Diagram

Paper Link:

Citation:

@article{javed2020multiplex,
title={Multiplex cellular communities in multi-gigapixel colorectal cancer histology images for tissue phenotyping},
author={Javed, Sajid and Mahmood, Arif and Werghi, Naoufel and Benes, Ksenija and Rajpoot, Nasir},
journal={IEEE Transactions on Image Processing},
volume={29},
pages={9204--9219},
year={2020},
publisher={IEEE}
}

Robust Structural Low-Rank Tracking (TIP, 2020)

Abstract:

Visual object tracking is an essential task for many computer vision applications. It becomes very challenging when the target appearance changes especially in the presence of occlusion, background clutter, and sudden illumination variations. Methods, that incorporate sparse representation and low-rank assumptions on the target particles have achieved promising results. However, because of the lack of structural constraints, these methods show performance degradation when facing the aforementioned challenges. To alleviate these limitations, we propose a new structural low-rank modeling algorithm for robust object tracking in complex scenarios. In the proposed algorithm, we consider spatial and temporal appearance consistency constraints, among the particles in the low-rank subspace, embedded in four different graphs. The resulting objective function encoding these constraints is novel and it is solved using linearized alternating direction method with adaptive penalty both in batch fashion as well as in online fashion. Our proposed objective function jointly learns the spatial and temporal structure of the target particles in consecutive frames and makes the proposed tracker consistent against many complex tracking scenarios. Results on four challenging datasets demonstrate excellent performance of the proposed algorithm as compared to current state-of-the-art methods.

Model Diagram

Paper Link:

Citation:

@article{javed2020robust,
title={Robust structural low-rank tracking},
author={Javed, Sajid and Mahmood, Arif and Dias, Jorge and Werghi, Naoufel},
journal={IEEE Transactions on Image Processing},
volume={29},
pages={4390--4405},
year={2020},
publisher={IEEE}}

Moving Object Detection in Complex Scene Using Spatiotemporal Structured-Sparse RPCA (TIP, 2019)

Abstract:

Moving object detection is a fundamental step in various computer vision applications. Robust principal component analysis (RPCA)-based methods have often been employed for this task. However, the performance of these methods deteriorates in the presence of dynamic background scenes, camera jitter, camouflaged moving objects, and/or variations in illumination. This is because of an underlying assumption that the elements in the sparse component are mutually independent, and thus the spatiotemporal structure of the moving objects is lost. To address this issue, we propose a spatiotemporal structured sparse RPCA algorithm for moving object detection, where we impose spatial and temporal regularization on the sparse component in the form of graph Laplacians. Each Laplacian corresponds to a multi-feature graph constructed over superpixels in the input matrix. We enforce the sparse component to act as eigenvectors of the spatial and temporal graph Laplacians while minimizing the RPCA objective function. These constraints incorporate a spatiotemporal subspace structure within the sparse component. Thus, we obtain a novel objective function for separating moving objects in the presence of complex backgrounds. The proposed objective function is solved using a linearized alternating direction method of multipliers based batch optimization. Moreover, we also propose an online optimization algorithm for real-time applications. We evaluated both the batch and online solutions using six publicly available data sets that included most of the aforementioned challenges. Our experiments demonstrated the superior performance of the proposed algorithms compared with the current state-of-the-art methods.
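For reference, plain RPCA by principal component pursuit looks as below (a numpy sketch): the paper augments exactly this decomposition with spatial and temporal graph-Laplacian terms on the sparse component, which are omitted here for brevity.

import numpy as np

def shrink(X, tau):                      # soft-thresholding operator
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca_pcp(D, iters=100):
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n))
    mu = (m * n) / (4.0 * np.abs(D).sum())
    L, S, Y = (np.zeros_like(D) for _ in range(3))
    for _ in range(iters):
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * shrink(sig, 1.0 / mu)) @ Vt   # low-rank background
        S = shrink(D - L + Y / mu, lam / mu)   # sparse moving objects
        Y += mu * (D - L - S)                  # dual ascent on the residual
    return L, S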

Model Diagram

Paper Link:

Citation:

@article{javed2018moving,
title={Moving object detection in complex scene using spatiotemporal structured-sparse RPCA},
author={Javed, Sajid and Mahmood, Arif and Al-Maadeed, Somaya and Bouwmans, Thierry and Jung, Soon Ki},
journal={IEEE Transactions on Image Processing},
volume={28},
number={2},
pages={1007--1022},
year={2018},
publisher={IEEE}}

Background–Foreground Modeling Based on Spatiotemporal Sparse Subspace Clustering (TIP, 2017)

Abstract:

Background estimation and foreground segmentation are important steps in many high-level vision tasks. Many existing methods estimate background as a low-rank component and foreground as a sparse matrix without incorporating the structural information. Therefore, these algorithms exhibit degraded performance in the presence of dynamic backgrounds, photometric variations, jitter, shadows, and large occlusions. We observe that these backgrounds often span multiple manifolds. Therefore, constraints that ensure continuity on those manifolds will result in better background estimation. Hence, we propose to incorporate the spatial and temporal sparse subspace clustering into the robust principal component analysis (RPCA) framework. To that end, we compute a spatial and temporal graph for a given sequence using motion-aware correlation coefficient. The information captured by both graphs is utilized by estimating the proximity matrices using both the normalized Euclidean and geodesic distances. The low-rank component must be able to efficiently partition the spatiotemporal graphs using these Laplacian matrices. Embedded with the RPCA objective function, these Laplacian matrices constrain the background model to be spatially and temporally consistent, both on linear and nonlinear manifolds. The solution of the proposed objective function is computed by using the linearized alternating direction method with adaptive penalty optimization scheme. Experiments are performed on challenging sequences from five publicly available datasets and are compared with the 23 existing state-of-the-art methods. The results demonstrate excellent performance of the proposed algorithm for both the background estimation and foreground segmentation.

Model Diagram

Paper Link:

Citation:

@article{javed2017background,
title={Background--Foreground modeling based on spatiotemporal sparse subspace clustering},
author={Javed, Sajid and Mahmood, Arif and Bouwmans, Thierry and Jung, Soon Ki},
journal={IEEE Transactions on Image Processing},
volume={26},
number={12},
pages={5840--5854},
year={2017},
publisher={IEEE}}

Constrained Metric Learning by Permutation Inducing Isometries (TIP, 2016)

Abstract:

The choice of metric critically affects the performance of classification and clustering algorithms. Metric learning algorithms attempt to improve performance, by learning a more appropriate metric. Unfortunately, most of the current algorithms learn a distance function which is not invariant to rigid transformations of images. Therefore, the distances between two images and their rigidly transformed pair may differ, leading to inconsistent classification or clustering results. We propose to constrain the learned metric to be invariant to the geometry preserving transformations of images that induce permutations in the feature space. The constraint that these transformations are isometries of the metric ensures consistent results and improves accuracy. Our second contribution is a dimension reduction technique that is consistent with the isometry constraints. Our third contribution is the formulation of the isometry constrained logistic discriminant metric learning (IC-LDML) algorithm, by incorporating the isometry constraints within the objective function of the LDML algorithm. The proposed algorithm is compared with the existing techniques on the publicly available labeled faces in the wild, viewpoint-invariant pedestrian recognition, and Toy Cars data sets. The IC-LDML algorithm has outperformed existing techniques for the tasks of face recognition, person identification, and object classification by a significant margin.
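A small numpy check of the core constraint: if a permutation P induced by a geometry-preserving image transform satisfies P^T M P = M, then P is an isometry of the learned Mahalanobis metric. Group-averaging an arbitrary PSD matrix over the powers of a cyclic P enforces the constraint (a worked illustration, not the IC-LDML optimization).

import numpy as np

rng = np.random.default_rng(0)
n = 4
P = np.roll(np.eye(n), 1, axis=0)      # cyclic shift: a permutation of order n

M0 = rng.random((n, n)); M0 = M0 @ M0.T          # arbitrary PSD "metric"
# Average over the cyclic group generated by P, so that P.T @ M @ P == M:
M = sum(np.linalg.matrix_power(P, k).T @ M0 @ np.linalg.matrix_power(P, k)
        for k in range(n))

x, y = rng.random(n), rng.random(n)
dist = lambda a, b: (a - b) @ M @ (a - b)
assert np.allclose(P.T @ M @ P, M)                   # constraint holds
assert np.allclose(dist(P @ x, P @ y), dist(x, y))   # so distances are preserved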

Model Diagram

Paper Link:

Citation:

@article{bosveld2015constrained,
title={Constrained metric learning by permutation inducing isometries},
author={Bosveld, Joel and Mahmood, Arif and Huynh, Du Q and Noakes, Lyle},
journal={IEEE Transactions on Image Processing},
volume={25},
number={1},
pages={92--103},
year={2015},
publisher={IEEE}
}

Hyperspectral Face Recognition With Spatiospectral Information Fusion and PLS Regression (TIP, 2015)

Abstract:

Hyperspectral imaging offers new opportunities for face recognition via improved discrimination along the spectral dimension. However, it poses new challenges, including low signal-to-noise ratio, interband misalignment, and high data dimensionality. Due to these challenges, the literature on hyperspectral face recognition is not only sparse but is limited to ad hoc dimensionality reduction techniques and lacks comprehensive evaluation. We propose a hyperspectral face recognition algorithm using a spatiospectral covariance for band fusion and partial least square regression for classification. Moreover, we extend 13 existing face recognition techniques, for the first time, to perform hyperspectral face recognition. We formulate hyperspectral face recognition as an image-set classification problem and evaluate the performance of seven state-of-the-art image-set classification techniques. We also test six state-of-the-art grayscale and RGB (color) face recognition algorithms after applying fusion techniques on hyperspectral images. Comparison with the 13 extended and five existing hyperspectral face recognition techniques on three standard data sets show that the proposed algorithm outperforms all by a significant margin. Finally, we perform band selection experiments to find the most discriminative bands in the visible and near infrared response spectrum.
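Once the band-fusion step has produced one feature vector per hyperspectral cube, the PLS classification stage is a few lines of scikit-learn; the shapes below are arbitrary placeholders and the data is synthetic.

import numpy as np
from sklearn.cross_decomposition import PLSRegression

X = np.random.rand(100, 300)                   # fused spatiospectral features
Y = np.eye(10)[np.random.randint(0, 10, 100)]  # one-hot class indicators
pls = PLSRegression(n_components=20).fit(X, Y)
pred = pls.predict(X).argmax(axis=1)           # class = largest regression response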

Model Diagram

Paper Link:

Citation:

@ARTICLE{7010906,
author={Uzair, Muhammad and Mahmood, Arif and Mian, Ajmal},
journal={IEEE Transactions on Image Processing},
title={Hyperspectral Face Recognition With Spatiospectral Information Fusion and PLS Regression},
year={2015},
volume={24},
number={3},
pages={1127-1137},
doi={10.1109/TIP.2015.2393057}}

Correlation-Coefficient-Based Fast Template Matching Through Partial Elimination (TIP, 2012)

Abstract:

Partial computation elimination techniques are often used for fast template matching. At a particular search location, computations are prematurely terminated as soon as it is found that this location cannot compete with an already known best match location. Due to the nonmonotonic growth pattern of the correlation-based similarity measures, partial computation elimination techniques have been traditionally considered inapplicable to speed up these measures. In this paper, we show that partial elimination techniques may be applied to a correlation coefficient by using a monotonic formulation, and we propose basic-mode and extended-mode partial correlation elimination algorithms for fast template matching. The basic-mode algorithm is more efficient on small template sizes, whereas the extended mode is faster on medium and larger templates. We also propose a strategy to decide which algorithm to use for a given data set. To achieve a high speedup, elimination algorithms require an initial guess of the peak correlation value. We propose two initialization schemes, including a coarse-to-fine scheme for larger templates and a two-stage technique for small- and medium-sized templates. Our proposed algorithms are exact, i.e., they have exhaustive-equivalent accuracy, and are compared with the existing fast techniques using real image data sets on a wide variety of template sizes. While the actual speedups are data dependent, in most cases, our proposed algorithms have been found to be significantly faster than the other algorithms.
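
Code Sketch:

A toy 1-D illustration of the elimination principle, not the paper's basic- or extended-mode algorithms: the correlation numerator is accumulated block-wise, and a Cauchy-Schwarz bound on the unprocessed tail prunes search locations that can no longer beat the best correlation found so far, while remaining exhaustive-equivalent. The block size and the lack of an initial peak guess are simplifications.

```python
import numpy as np

def best_match_1d(signal, template):
    """Exhaustive-equivalent 1-D template matching with partial elimination:
    prune a location as soon as its partial correlation plus an upper bound
    on the remaining terms falls below the best correlation so far."""
    t = template - template.mean()
    m = len(t)
    t_suf = np.concatenate([np.cumsum((t ** 2)[::-1])[::-1], [0.0]])
    norm_t = np.sqrt(t_suf[0])
    best_corr, best_pos = -np.inf, -1
    block = max(1, m // 8)
    for pos in range(len(signal) - m + 1):
        w = signal[pos:pos + m]
        w = w - w.mean()
        w_suf = np.concatenate([np.cumsum((w ** 2)[::-1])[::-1], [0.0]])
        denom = norm_t * np.sqrt(w_suf[0]) + 1e-12
        num, k = 0.0, 0
        while k < m:
            j = min(k + block, m)
            num += float(t[k:j] @ w[k:j])
            k = j
            bound = np.sqrt(t_suf[k] * w_suf[k])  # Cauchy-Schwarz on tail
            if (num + bound) / denom < best_corr:
                break                             # prune this location
        else:
            if num / denom > best_corr:
                best_corr, best_pos = num / denom, pos
    return best_pos

rng = np.random.default_rng(1)
sig = rng.standard_normal(2000)
print(best_match_1d(sig, sig[700:760].copy()))    # -> 700
```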

Paper Link:

Citation:

@ARTICLE{6044713,
author={Mahmood, Arif and Khan, Sohaib},
journal={IEEE Transactions on Image Processing},
title={Correlation-Coefficient-Based Fast Template Matching Through Partial Elimination},
year={2012},
volume={21},
number={4},
pages={2099-2108},
doi={10.1109/TIP.2011.2171696}}

Exploiting Transitivity of Correlation for Fast Template Matching (TIP, 2010)

Abstract:

Elimination algorithms are often used in template matching to provide a significant speed-up by skipping portions of the computation while guaranteeing the same best-match location as exhaustive search. In this work, we develop elimination algorithms for correlation-based match measures by exploiting the transitivity of correlation. We show that transitive bounds can result in a high computational speed-up if strong autocorrelation is present in the dataset. Generally, strong intra-reference local autocorrelation is found in natural images, strong inter-reference autocorrelation is found if objects are to be tracked across consecutive video frames, and strong inter-template autocorrelation is found if consecutive video frames are to be matched with a reference image. For each of these cases, the transitive bounds can be adapted to result in an efficient elimination algorithm. The proposed elimination algorithms are exact, that is, they guarantee to yield the same peak location as exhaustive search over the entire solution space. While the speed-up obtained is data dependent, we show empirical results of up to an order of magnitude faster computation as compared to the currently used efficient algorithms on a variety of datasets.
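
Code Sketch:

A short numpy sketch of the transitive bound itself, under the standard reading that centered, normalized signals lie on a unit sphere where correlation is the cosine of an angle: given corr(a, b) and corr(b, c), corr(a, c) is bracketed by the angle triangle inequality, so a matcher can skip computing corr(a, c) whenever the upper bound is already below the best score. The full elimination algorithms built on this bound are not reproduced.

```python
import numpy as np

def corr(a, b):
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def transitive_bounds(r_ab, r_bc):
    """Bounds on corr(a, c) given corr(a, b) and corr(b, c): on the unit
    sphere, correlation is the cosine of an angle, and angles obey the
    triangle inequality."""
    th_ab = np.arccos(np.clip(r_ab, -1.0, 1.0))
    th_bc = np.arccos(np.clip(r_bc, -1.0, 1.0))
    lower = np.cos(min(th_ab + th_bc, np.pi))
    upper = np.cos(abs(th_ab - th_bc))
    return lower, upper

rng = np.random.default_rng(0)
a, b = rng.standard_normal(500), rng.standard_normal(500)
c = b + 0.1 * rng.standard_normal(500)     # like two consecutive frames
lo, hi = transitive_bounds(corr(a, b), corr(b, c))
print(lo, "<=", corr(a, c), "<=", hi)      # the bound always holds
```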

Paper Link:

Citation:

@ARTICLE{5439796,
author={Mahmood, Arif and Khan, Sohaib},
journal={IEEE Transactions on Image Processing},
title={Exploiting Transitivity of Correlation for Fast Template Matching},
year={2010},
volume={19},
number={8},
pages={2190-2200},
doi={10.1109/TIP.2010.2046809}}

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Histogram of Oriented Principal Components for Cross-View Action Recognition (TPAMI, 2016)

Abstract:

Existing techniques for 3D action recognition are sensitive to viewpoint variations because they extract features from depth images which are viewpoint dependent. In contrast, we directly process point clouds for cross-view action recognition from unknown and unseen views. We propose the histogram of oriented principal components (HOPC) descriptor that is robust to noise, viewpoint, scale and action speed variations. At a 3D point, HOPC is computed by projecting the three scaled eigenvectors of the point cloud within its local spatiotemporal support volume onto the vertices of a regular dodecahedron. HOPC is also used for the detection of spatiotemporal keypoints (STK) in 3D point cloud sequences so that view-invariant STK descriptors (or Local HOPC descriptors) at these key locations only are used for action recognition. We also propose a global descriptor computed from the normalized spatiotemporal distribution of STKs in 4-D, which we refer to as STK-D. We have evaluated the performance of our proposed descriptors against nine existing techniques on two cross-view and three single-view human action recognition datasets. The experimental results show that our techniques provide significant improvement over state-of-the-art methods.

Model Diagram

Paper Link:

Citation:

@article{rahmani2016histogram,
title={Histogram of oriented principal components for cross-view action recognition},
author={Rahmani, Hossein and Mahmood, Arif and Huynh, Du and Mian, Ajmal},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
volume={38},
number={12},
pages={2430–2443},
year={2016},
publisher={IEEE}}

IEEE Transactions on Neural Networks and Learning Systems (TNNLS)

Clustering Aided Weakly Supervised Training to Detect Anomalous Events in Surveillance Videos (TNNLS, 2023)

Abstract:

Formulating learning systems for the detection of real-world anomalous events using only video-level labels is a challenging task mainly due to the presence of noisy labels as well as the rare occurrence of anomalous events in the training data. We propose a weakly supervised anomaly detection system which has multiple contributions including a random batch selection mechanism to reduce inter-batch correlation and a normalcy suppression block which learns to minimize anomaly scores over normal regions of a video by utilizing the overall information available in a training batch. In addition, a clustering loss block is proposed to mitigate the label noise and to improve the representation learning for the anomalous and normal regions. This block encourages the backbone network to produce two distinct feature clusters representing normal and anomalous events. Extensive analysis of the proposed approach is provided using three popular anomaly detection datasets including UCF-Crime, ShanghaiTech, and UCSD Ped2. The experiments demonstrate a superior anomaly detection capability of our approach.
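
Code Sketch:

A hedged PyTorch sketch of one component as we read it from the abstract, the normalcy suppression block: a parallel branch computes softmax weights over the temporal segments and gates the backbone's anomaly scores, pushing scores of the dominant normal regions toward zero. Layer sizes are assumptions; the clustering loss and random batch selection are omitted.

```python
import torch
import torch.nn as nn

class NormalcySuppression(nn.Module):
    """Minimal sketch (not the authors' code): a parallel branch produces
    softmax weights over the temporal segments; multiplying the backbone
    scores by these weights suppresses scores over normal regions, which
    dominate a training batch."""
    def __init__(self, feat_dim):
        super().__init__()
        self.scorer = nn.Linear(feat_dim, 1)   # raw anomaly score branch
        self.gate = nn.Linear(feat_dim, 1)     # suppression branch

    def forward(self, feats):                  # feats: (batch, segments, dim)
        scores = torch.sigmoid(self.scorer(feats))
        w = torch.softmax(self.gate(feats), dim=1)   # over segments
        return (scores * w).squeeze(-1)        # suppressed segment scores

feats = torch.randn(4, 32, 2048)               # e.g. C3D/I3D segment features
print(NormalcySuppression(2048)(feats).shape)  # torch.Size([4, 32])
```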

Model Diagram

Paper Link:

Citation:

@ARTICLE{10136845,
author={Zaheer, Muhammad Zaigham and Mahmood, Arif and Astrid, Marcella and Lee, Seung-Ik},
journal={IEEE Transactions on Neural Networks and Learning Systems},
title={Clustering Aided Weakly Supervised Training to Detect Anomalous Events in Surveillance Videos},
year={2023},
pages={1-14}}

IEEE Transactions on Cybernetics (TC)

Hierarchical Spatiotemporal Graph Regularized Discriminative Correlation Filter for Visual Object Tracking (TC, 2022)

Abstract:

Visual object tracking is a fundamental and challenging task in many high-level vision and robotics applications. It is typically formulated by estimating the target appearance model between consecutive frames. Discriminative correlation filters (DCFs) and their variants have achieved promising speed and accuracy for visual tracking in many challenging scenarios. However, because of the unwanted boundary effects and lack of geometric constraints, these methods suffer from performance degradation. In the current work, we propose hierarchical spatiotemporal graph-regularized correlation filters for robust object tracking. The target sample is decomposed into a large number of deep channels, which are then used to construct a spatial graph such that each graph node corresponds to a particular target location across all channels. Such a graph effectively captures the spatial structure of the target object. In order to capture the temporal structure of the target object, the information in the deep channels obtained from a temporal window is compressed using the principal component analysis, and then, a temporal graph is constructed such that each graph node corresponds to a particular target location in the temporal dimension. Both spatial and temporal graphs span different subspaces such that the target and the background become linearly separable. The learned correlation filter is constrained to act as an eigenvector of the Laplacian of these spatiotemporal graphs. We propose a novel objective function that incorporates these spatiotemporal constraints into the DCFs framework. We solve the objective function using the alternating direction method of multipliers such that each subproblem has a closed-form solution. We evaluate our proposed algorithm on six challenging benchmark datasets and compare it with 33 existing state-of-the-art trackers. Our results demonstrate the excellent performance of the proposed algorithm compared to the existing trackers.

Model Diagram

Paper Link:

Citation:

@ARTICLE{9475879,
author={Javed, Sajid and Mahmood, Arif and Dias, Jorge and Seneviratne, Lakmal and Werghi, Naoufel},
journal={IEEE Transactions on Cybernetics},
title={Hierarchical Spatiotemporal Graph Regularized Discriminative Correlation Filter for Visual Object Tracking},
year={2022},
volume={52},
number={11},
pages={12259-12274},
doi={10.1109/TCYB.2021.3086194}}

IEEE Transactions on Multimedia (TMM)

Quantification of Occlusion Handling Capability of a 3D Human Pose Estimation Framework (TMM, 2022)

Abstract:

3D human pose estimation using monocular images is an important yet challenging task. Existing 3D pose detection methods exhibit excellent performance under normal conditions; however, their performance may degrade due to occlusion. Recently, some occlusion-aware methods have been proposed; however, the occlusion handling capability of these networks has not yet been thoroughly investigated. In the current work, we propose an occlusion-guided 3D human pose estimation framework and quantify its occlusion handling capability by using different protocols. The proposed method estimates more accurate 3D human poses using 2D skeletons with missing joints as input. Missing joints are handled by introducing occlusion guidance that provides extra information about the absence or presence of a joint. Temporal information has also been exploited to better estimate the missing joints.

Model Diagram

Paper Link:

Citation:

@article{ghafoor2022quantification,
title={Quantification of occlusion handling capability of a 3D human pose estimation framework},
author={Ghafoor, Mehwish and Mahmood, Arif},
journal={IEEE Transactions on Multimedia},
year={2022},
publisher={IEEE}}

Unsupervised Moving Object Detection in Complex Scenes Using Adversarial Regularizations (TMM, 2020)

Abstract:

Moving object detection (MOD) is a fundamental step in many high-level vision-based applications, such as human activity analysis, visual object tracking, autonomous vehicles, surveillance, and security. Most of the existing MOD algorithms observe performance degradation in the presence of complex scenes containing camouflage objects, shadows, dynamic backgrounds, and varying illumination conditions, captured by static cameras. To appropriately handle these challenges, we propose a Generative Adversarial Network (GAN) based moving object detection algorithm, called MOD_GAN. In the proposed algorithm, scene-specific GANs are trained in an unsupervised MOD setting, thereby enabling the algorithm to learn to generate background sequences using input from uniformly distributed random noise samples. In addition to the adversarial loss, during training, a norm-based loss in the image space and the discriminator feature space is also minimized between the generated images and the training data. The additional losses enable the generator to learn subtle background details, resulting in a more realistic complex scene generation. During testing, a novel back-propagation-based algorithm is used to generate images with statistics similar to the test images. More appropriate random noise samples are searched by directly minimizing the loss function between the test and generated images both in the image and discriminator feature spaces. The network is not updated in this step; only the input noise samples are iteratively modified to minimize the loss function. Moreover, motion information is used to ensure that this loss is only computed on small-motion pixels. A novel dataset containing outdoor time-lapsed images from dawn to dusk with a full illumination variation cycle is also proposed to better compare the MOD algorithms in outdoor scenes. Accordingly, extensive experiments on five benchmark datasets and comparison with 30 existing methods demonstrate the strength of the proposed algorithm.

Model Diagram

Paper Link:

Citation:

@article{sultana2020unsupervised,
title={Unsupervised moving object detection in complex scenes using adversarial regularizations},
author={Sultana, Maryam and Mahmood, Arif and Jung, Soon Ki},
journal={IEEE Transactions on Multimedia},
volume={23},
pages={2005–2018},
year={2020},
publisher={IEEE}
}

IEEE Transactions on Circuits and Systems for Video Technology (CSVT)

Spatiotemporal Low-Rank Modeling for Complex Scene Background Initialization (CSVT, 2018)

Abstract:

Background modeling constitutes the building block of many computer-vision tasks. Traditional schemes model the background as a low-rank matrix with corrupted entries. These schemes operate in batch mode and do not scale well with the data size. Moreover, without enforcing spatiotemporal information in the low-rank component, and because of occlusions by foreground objects and redundancy in video data, the design of a background initialization method robust against outliers is very challenging. To overcome these limitations, this paper presents a spatiotemporal low-rank modeling method on dynamic video clips for estimating the robust background model. The proposed method encodes spatiotemporal constraints by regularizing spectral graphs. Initially, a motion-compensated binary matrix is generated using optical flow information to remove redundant data and to create a set of dynamic frames from the input video sequence. Then two graphs are constructed, one between frames for temporal consistency and the other between features for spatial consistency, to encode the local structure for continuously promoting the intrinsic behavior of the low-rank model against outliers. These two terms are then incorporated into the iterative Matrix Completion framework for improved segmentation of the background. Rigorous evaluation on severely occluded and dynamic background sequences demonstrates the superior performance of the proposed method over state-of-the-art approaches.

Model Diagram

Paper Link:

Citation:

@article{javed2016spatiotemporal,
title={Spatiotemporal low-rank modeling for complex scene background initialization},
author={Javed, Sajid and Mahmood, Arif and Bouwmans, Thierry and Jung, Soon Ki},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={28},
number={6},
pages={1315–1329},
year={2018},
publisher={IEEE}
}

IEEE Transactions on Signal and Information Processing over Networks (TSIPN)

Reconstruction of Time-Varying Graph Signals via Sobolev Smoothness (TSIPN, 2022)

Abstract:

Graph Signal Processing (GSP) is an emerging research field that extends the concepts of digital signal processing to graphs. GSP has numerous applications in different areas such as sensor networks, machine learning, and image processing. The sampling and reconstruction of static graph signals have played a central role in GSP. However, many real-world graph signals are inherently time-varying and the smoothness of the temporal differences of such graph signals may be used as a prior assumption. In the current work, we assume that the temporal differences of graph signals are smooth, and we introduce a novel algorithm based on the extension of a Sobolev smoothness function for the reconstruction of time-varying graph signals from discrete samples. We explore some theoretical aspects of the convergence rate of our Time-varying Graph signal Reconstruction via Sobolev Smoothness (GraphTRSS) algorithm by studying the condition number of the Hessian associated with our optimization problem. Our algorithm has the advantage of converging faster than other methods that are based on Laplacian operators without requiring expensive eigenvalue decomposition or matrix inversions. The proposed GraphTRSS is evaluated on several datasets including two COVID-19 datasets and it has outperformed many existing state-of-the-art methods for time-varying graph signal reconstruction. GraphTRSS has also shown excellent performance on two environmental datasets for the recovery of particulate matter and sea surface temperature signals.
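
Code Sketch:

A minimal numpy sketch, assuming the objective stated in the abstract: data fidelity on the sampled entries plus a Sobolev penalty tr(D^T (L + eps*I)^beta D) on the temporal differences D, minimized by plain gradient descent. The toy graph, parameter names, and the eigendecomposition used to build the Sobolev matrix are our simplifications (the paper specifically avoids expensive eigendecompositions).

```python
import numpy as np

def graph_trss(Y, mask, L, eps=0.1, beta=1.0, lam=1.0, lr=0.01, iters=500):
    """Sketch of Sobolev-smooth reconstruction of a time-varying graph
    signal: minimize ||mask*(X - Y)||_F^2 + lam*tr(D^T S D), where D holds
    the temporal differences of X and S = (L + eps*I)^beta."""
    n, T = Y.shape
    w, V = np.linalg.eigh(L)                   # L is the graph Laplacian
    S = (V * (w + eps) ** beta) @ V.T          # Sobolev matrix
    X = Y * mask                               # start from the samples
    for _ in range(iters):
        D = X[:, 1:] - X[:, :-1]
        SD = S @ D
        g_smooth = np.zeros_like(X)
        g_smooth[:, :-1] -= SD                 # gradient of the trace term
        g_smooth[:, 1:] += SD
        X -= lr * (2 * mask * (X - Y) + 2 * lam * g_smooth)
    return X

rng = np.random.default_rng(0)
n, T = 20, 50
A = (rng.random((n, n)) < 0.2).astype(float)
A = np.triu(A, 1); A += A.T                    # random undirected graph
L = np.diag(A.sum(1)) - A
true = np.outer(rng.standard_normal(n), np.sin(np.linspace(0, 3, T)))
mask = (rng.random((n, T)) < 0.5).astype(float)  # 50% of entries sampled
X_hat = graph_trss(true * mask, mask, L)
err = (X_hat - true) * (1 - mask)
print("RMSE on missing entries:", np.sqrt((err ** 2).mean()))
```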

Model Diagram

Paper Link:

Citation:

@article{giraldo2022reconstruction,
title={Reconstruction of time-varying graph signals via Sobolev smoothness},
author={Giraldo, Jhony H and Mahmood, Arif and Garcia-Garcia, Belmar and Thanou, Dorina and Bouwmans, Thierry},
journal={IEEE Transactions on Signal and Information Processing over Networks},
volume={8},
pages={201–214},
year={2022},
publisher={IEEE}
}

IEEE Transactions on Knowledge and Data Engineering (TKDE)

Using Geodesic Space Density Gradients for Network Community Detection (TKDE, 2017)

Abstract:

Many real-world complex systems naturally map to network data structures instead of geometric spaces because the only available information is the presence or absence of a link between two entities in the system. To enable data mining techniques to solve problems in the network domain, the nodes need to be mapped to a geometric space. We propose this mapping by representing each network node with its geodesic distances from all other nodes. The space spanned by the geodesic distance vectors is the geodesic space of that network. The positions of different nodes in the geodesic space encode the network structure. In this space, considering a continuous density field induced by each node, density at a specific point is the summation of density fields induced by all nodes. We drift each node in the direction of the positive density gradient using an iterative algorithm until each node reaches a local maximum. Due to the network structure captured by this space, the nodes that drift to the same region of space belong to the same communities in the original network. We use the direction of movement and final position of each node as important clues for community membership assignment. The proposed algorithm is compared with more than ten state-of-the-art community detection techniques on two benchmark networks with known communities using the normalized mutual information criterion. The proposed algorithm outperformed these methods by a significant margin. Moreover, the proposed algorithm has also shown excellent performance on many real-world networks.
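
Code Sketch:

A simplified numpy/scipy stand-in for the procedure described above: nodes are embedded as geodesic distance vectors, drifted uphill with mean-shift under a Gaussian density field, and grouped by the mode they converge to. The bandwidth, tolerance, and toy two-clique network are assumptions.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def geodesic_density_communities(A, bandwidth=2.0, iters=100, tol=0.5):
    """Sketch: embed each node as its vector of geodesic distances to all
    nodes, drift the embedded points toward higher density with mean-shift,
    and group nodes whose final positions (modes) nearly coincide."""
    G = shortest_path(A, method="D", unweighted=True)   # geodesic vectors
    pts = G.copy()
    for _ in range(iters):                              # mean-shift drift
        d2 = ((pts[:, None, :] - G[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        pts = (w @ G) / w.sum(axis=1, keepdims=True)
    labels, modes = [], []
    for p in pts:                                       # merge nearby modes
        for i, m in enumerate(modes):
            if np.linalg.norm(p - m) < tol:
                labels.append(i)
                break
        else:
            modes.append(p)
            labels.append(len(modes) - 1)
    return np.array(labels)

# Two 8-node cliques joined by a single edge:
n = 16
A = np.zeros((n, n))
A[:8, :8] = 1.0
A[8:, 8:] = 1.0
A[0, 8] = A[8, 0] = 1.0
np.fill_diagonal(A, 0.0)
print(geodesic_density_communities(A))    # two blocks of identical labels
```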

Model Diagram

Citation:

@article{mahmood2017using,
title={Using geodesic space density gradients for network community detection},
author={Mahmood, Arif and Small, Michael and Al-Maadeed, Somaya Ali and Rajpoot, Nasir},
journal={IEEE Transactions on Knowledge and Data Engineering},
volume={29},
number={4},
pages={921–935},
year={2017},
publisher={IEEE}}

Subspace Based Network Community Detection Using Sparse Linear Coding (TKDE, 2016)

Abstract:

Information mining from networks by identifying communities is an important problem across a number of research fields including social science, biology, physics, and medicine. Most existing community detection algorithms are graph-theoretic and lack the ability to detect accurate community boundaries if the ratio of intra-community to inter-community links is low. Also, the algorithms based on modularity maximization may fail to resolve communities smaller than a specific size if the community size varies significantly. In this paper, we present a fundamentally different community detection algorithm based on the fact that each network community spans a different subspace in the geodesic space. Therefore, each node can only be efficiently represented as a linear combination of nodes spanning the same subspace. To make the process of community detection more robust, we use sparse linear coding with an l1-norm constraint. In order to find a community label for each node, a sparse spectral clustering algorithm is used. The proposed community detection technique is compared with more than ten state-of-the-art methods on two benchmark networks (with known clusters) using the normalized mutual information criterion. Our proposed algorithm outperformed existing algorithms by a significant margin on both benchmark networks. The proposed algorithm has also shown excellent performance on three real-world networks.

Model Diagram

Citation:

@article{mahmood2016subspace,
title={Subspace Based Network Community Detection Using Sparse Linear Coding},
author={Mahmood, Arif and Small, Michael},
journal={IEEE Transactions on Knowledge and Data Engineering},
volume={28},
number={3},
pages={801–812},
year={2016},
publisher={IEEE}}

IEEE Transactions on Computational Social Systems (TCSS)

Detection and Localization of Firearm Carriers in Complex Scenes for Improved Safety Measures (TCSS, 2023)

Abstract:

Detecting firearms and accurately localizing individuals carrying them in images or videos is of paramount importance in security, surveillance, and content customization. However, this task presents significant challenges in complex environments due to clutter and the diverse shapes of firearms. To address this problem, we propose a novel approach that leverages human–firearm interaction information, which provides valuable clues for localizing firearm carriers. Our approach incorporates an attention mechanism that effectively distinguishes humans and firearms from the background by focusing on relevant areas. Additionally, we introduce a saliency-driven locality-preserving constraint to learn essential features while preserving foreground information in the input image. By combining these components, our approach achieves exceptional results on a newly proposed dataset. To handle inputs of varying sizes, we pass paired human–firearm instances with attention masks as channels through a deep network for feature computation, utilizing an adaptive average pooling (AAP) layer.
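
Code Sketch:

A hedged PyTorch sketch of the input pipeline described in the last sentence: a human crop, a firearm crop, and their attention masks are stacked as channels and passed through a small CNN whose adaptive average pooling (AAP) layer makes the output size independent of the input resolution. The channel layout and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class PairFeatureNet(nn.Module):
    """Sketch: an RGB human crop, an RGB firearm crop, and two attention
    masks are stacked as 3+3+1+1 = 8 channels; AdaptiveAvgPool2d makes the
    feature dimension independent of the input size."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(8, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),         # handles varying input sizes
        )
        self.head = nn.Linear(64, out_dim)

    def forward(self, x):                    # x: (batch, 8, H, W), any H, W
        return self.head(self.body(x).flatten(1))

net = PairFeatureNet()
for hw in [(96, 64), (180, 140)]:            # two different input sizes
    x = torch.randn(2, 8, *hw)
    print(net(x).shape)                      # torch.Size([2, 128]) both times
```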

Model Diagram

Paper Link:

Citation:

@ARTICLE{10258124,
author={Mahmood, Arif and Basit, Abdul and Munir, Muhammad Akhtar and Ali, Mohsen},
journal={IEEE Transactions on Computational Social Systems},
title={Detection and Localization of Firearm Carriers in Complex Scenes for Improved Safety Measures},
year={2023},
pages={1-11}}

IEEE Transactions on Cloud Computing (TCC)

Predictive Auto-scaling of Multi-tier Applications Using Performance Varying Cloud Resources (TCC, 2019)

Abstract:

The performance of the same type of cloud resources, such as virtual machines (VMs), varies over time mainly due to hardware heterogeneity, resource contention among co-located VMs, and virtualization overhead. The performance variation can be significant, introducing challenges to learn workload-specific resource provisioning policies to automatically scale the cloud-hosted applications to maintain the desired response time. Moreover, auto-scaling multi-tier applications using minimal resources is even more challenging because bottlenecks may occur on multiple tiers concurrently. In this paper, we address the problem of using performance-varying VMs for gracefully auto-scaling a multi-tier application using minimal resources to handle dynamically increasing workloads and satisfy the response time requirements. The proposed system uses a supervised learning method to identify the appropriate resource provisioning for multi-tier applications based on the prediction of the application response time and the request arrival rate. The supervised learning method learns a state transition configuration map which encodes resource allocation states invariant to the underlying VMs' performance variations. This configuration map helps to use performance-varying resources in the predictive auto-scaling method. Our experimental evaluation using a real-world multi-tier web application hosted on a public cloud shows improved application performance with minimal resources compared to conventional predictive auto-scaling methods.

Model Diagram

Paper Link:

Citation:

@article{iqbal2019predictive,
title={Predictive auto-scaling of multi-tier applications using performance varying cloud resources},
author={Iqbal, Waheed and Erradi, Abdelkarim and Abdullah, Muhammad and Mahmood, Arif},
journal={IEEE Transactions on Cloud Computing},
volume={10},
number={1},
pages={595–607},
year={2019},
publisher={IEEE}
}

IEEE Transactions on Services Computing (TSC)

Web Application Resource Requirements Estimation Based on the Workload Latent Features (TSC, 2019)

Abstract:

Most cloud computing platforms offer reactive resource auto-scaling mechanisms for dealing with variable traffic patterns to deliver the desired QoS properties while keeping low provisioning costs. However, a range of scenarios have not been fully addressed by the current auto-scaling solutions, particularly dealing with a rapid increase in workload and the risk of thrashing due to frequent workload variations. A reactive system is vulnerable in such conditions. Realizing the full potential of auto-scaling still remains challenging, particularly due to the need to accurately estimate the application resource requirements for time-varying workload patterns. In this work, we propose and evaluate a novel method using only application access logs to more accurately estimate the hardware resource demands and application response time. In particular, we propose novel workload latent features which we compute by applying unsupervised learning on the access logs. We use these latent features to estimate the application hardware resource requirements and response time for various workload patterns. We evaluate the proposed method using multiple benchmark web applications and compare it with the current state-of-the-art. Extensive experimental evaluations show the excellent performance of our proposed workload latent features in estimating response time, CPU, memory, and bandwidth utilization.
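
Code Sketch:

A hedged sklearn sketch of the overall idea: a per-interval URL-count matrix built from access logs is factorized (here with NMF) to obtain latent workload features, which then feed a regressor for a resource-utilization target. The specific feature design, factorization method, and data below are our assumptions, not the paper's pipeline.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Toy stand-in for parsed access logs: requests per URL in each 1-minute
# interval (rows = intervals, columns = distinct URLs).
intervals, urls = 200, 40
counts = rng.poisson(3.0, size=(intervals, urls)).astype(float)
# Synthetic CPU utilization driven by a subset of (expensive) URLs:
cpu = 0.02 * counts[:, :10].sum(1) + rng.normal(0, 0.1, intervals)

latent = NMF(n_components=5, init="nndsvda", max_iter=500,
             random_state=0).fit_transform(counts)  # latent workload features
model = Ridge().fit(latent, cpu)
print("R^2 of CPU-utilization estimate:", model.score(latent, cpu))
```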

Model Diagram

Paper Link:

Citation:

@article{erradi2019web,
title={Web application resource requirements estimation based on the workload latent features},
author={Erradi, Abdelkarim and Iqbal, Waheed and Mahmood, Arif and Bouguettaya, Athman},
journal={IEEE Transactions on Services Computing},
volume={14},
number={6},
pages={1638–1649},
year={2019},
publisher={IEEE}
}

IEEE Journal of Biomedical and Health Informatics (JBHI)

Knowledge Distillation in Histology Landscape by Multi-Layer Features Supervision (JBHI, 2023)

Abstract:

Shallow networks have been end-to-end trained using direct supervision; however, their performance degrades because they fail to capture robust tissue heterogeneity. Knowledge distillation has recently been employed to improve the performance of shallow networks used as student networks by using additional supervision from deep neural networks used as teacher networks. In the current work, we propose a novel knowledge distillation algorithm to improve the performance of shallow networks for tissue phenotyping in histology images. For this purpose, we propose multi-layer feature distillation such that a single layer in the student network gets supervision from multiple teacher layers. In the proposed algorithm, the sizes of the feature maps of two layers are matched by using a learnable multi-layer perceptron. The distance between the feature maps of the two layers is then minimized during the training of the student network. The overall objective function is computed by summing the loss over multiple layer combinations, weighted with a learnable attention-based parameter. The proposed algorithm is named Knowledge Distillation for Tissue Phenotyping (KDTP).
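
Code Sketch:

A hedged PyTorch sketch of the loss described above: a single student feature is projected by learnable MLPs to each teacher layer's size, and the per-layer distances are combined with softmax-normalized learnable attention weights. Dimensions and the MLP design are assumptions, not the authors' KDTP code.

```python
import torch
import torch.nn as nn

class MultiLayerDistillLoss(nn.Module):
    """Sketch of multi-layer feature distillation: one student feature is
    supervised by several teacher layers. Learnable MLPs match feature
    sizes; learnable attention logits weight the per-layer distances."""
    def __init__(self, student_dim, teacher_dims):
        super().__init__()
        self.projs = nn.ModuleList(
            nn.Sequential(nn.Linear(student_dim, d), nn.ReLU(), nn.Linear(d, d))
            for d in teacher_dims)
        self.attn = nn.Parameter(torch.zeros(len(teacher_dims)))

    def forward(self, student_feat, teacher_feats):
        # student_feat: (B, student_dim); teacher_feats: list of (B, d_i)
        w = torch.softmax(self.attn, dim=0)
        dists = [((p(student_feat) - t.detach()) ** 2).mean()
                 for p, t in zip(self.projs, teacher_feats)]
        return sum(wi * di for wi, di in zip(w, dists))

loss_fn = MultiLayerDistillLoss(64, [128, 256])
s = torch.randn(8, 64)                        # student layer features
ts = [torch.randn(8, 128), torch.randn(8, 256)]  # two teacher layers
print(loss_fn(s, ts))                         # scalar distillation loss
```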

Model Diagram

Paper Link:

Citation:

@ARTICLE{10018566,
author={Javed, Sajid and Mahmood, Arif and Qaiser, Talha and Werghi, Naoufel},
journal={IEEE Journal of Biomedical and Health Informatics},
title={Knowledge Distillation in Histology Landscape by Multi-Layer Features Supervision},
year={2023},
volume={27},
number={4},
pages={2037-2046},
doi={10.1109/JBHI.2023.3237749}}

An End-to-End Human Abnormal Behavior Recognition Framework for Crowds with Mentally Disordered Individuals (JBHI, 2022)

Abstract:

Abnormal or violent behavior by people with mental disorders is common. When individuals with mental disorders exhibit abnormal behavior in public places, they may cause physical and mental harm to others as well as to themselves. Thus, it is necessary to monitor their behavior using visual surveillance systems. However, it is challenging to automatically detect human abnormal behavior (especially for individuals with mental disorders) based on motion recognition technologies. To address these issues, in the current work, we propose an end-to-end abnormal behavior detection framework from a new perspective in conjunction with the Graph Convolutional Network (GCN) and a 3D Convolutional Neural Network (3DCNN). Specifically, we first train a one-class classifier to extract features and estimate abnormality scores. To improve the performance of abnormal behavior detection, the GCN is used to model the similarity between video clips for the correction of noisy labels. Then, based on this framework, the GCN recognizes the normal behavior clips in the abnormal video and removes them, while the clips identified as abnormal behavior are retained. Finally, a 3DCNN is used to extract spatiotemporal features to classify different abnormal behaviors. In order to better detect the violent behavior of individuals with mental disorders, the paper focuses on the UCF-Crime dataset with various types of violent behaviors. In experiments on this dataset, the classification accuracy reaches 37.9%, which is significantly better than that of the current state-of-the-art approaches.

Model Diagram

Paper Link:

Citation:

@article{hao2021end,
title={An End-to-End Human Abnormal Behavior Recognition Framework for Crowds With Mentally Disordered Individuals},
author={Hao, Yixue and Tang, Zaiyang and Alzahrani, Bander and Alotaibi, Reem and Alharthi, Reem and Zhao, Miaomiao and Mahmood, Arif},
journal={IEEE Journal of Biomedical and Health Informatics},
volume={26},
number={8},
pages={3618–3625},
year={2022},
publisher={IEEE}
}

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS)

Improving Chlorophyll-a Estimation from Sentinel-2 (MSI) in the Barents Sea Using Machine Learning (JSTARS, 2021)

Abstract:

We aim to improve the monitoring capacity by integrating in situ Chl-a observations and optical remote sensing to locally train machine learning (ML) models. For this purpose, in situ measurements of Chl-a ranging from 0.014 to 10.81 mg/m3, collected for the years 2016–2018, were used to train and validate the models. To accurately estimate Chl-a, we propose to use additional information on pigment content within the productive column by matching the depth-integrated Chl-a concentrations with the satellite data. Using the optical images captured by the multispectral imager (MSI) instrument on Sentinel-2 and the in situ measurements, a new spatial-window-based match-up dataset creation method is proposed to increase the number of match-ups and hence improve the training of the ML models. The match-ups are then filtered to eliminate erroneous samples based on the spectral distribution of the remotely sensed reflectance. In addition, we design and implement a neural network model dubbed the Ocean Color Net (OCN).
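
Code Sketch:

A minimal numpy sketch of the spatial-window match-up idea: every valid pixel inside a window centered on an in situ station is paired with the measured Chl-a value, multiplying the number of match-ups. The window size and the finite-value validity test are assumptions.

```python
import numpy as np

def window_matchups(reflectance, station_rc, chl_value, half=1):
    """Collect match-ups from a (2*half+1)^2 window around an in situ
    station instead of the single centre pixel. reflectance: (bands, H, W);
    station_rc: (row, col). Returns per-pixel spectra, each paired with
    the measured chlorophyll-a value."""
    bands, H, W = reflectance.shape
    r, c = station_rc
    rows = slice(max(r - half, 0), min(r + half + 1, H))
    cols = slice(max(c - half, 0), min(c + half + 1, W))
    patch = reflectance[:, rows, cols].reshape(bands, -1).T  # (pixels, bands)
    valid = np.isfinite(patch).all(axis=1)   # drop cloud/land-masked pixels
    X = patch[valid]
    y = np.full(len(X), chl_value)
    return X, y

rng = np.random.default_rng(0)
img = rng.random((13, 100, 100))             # e.g. 13 Sentinel-2 MSI bands
img[:, 50, 51] = np.nan                      # one masked pixel
X, y = window_matchups(img, (50, 50), chl_value=0.8)
print(X.shape, y.shape)                      # (8, 13) (8,): 9 minus 1 masked
```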

Model Diagram

Paper Link:

Citation:

@article{asim2021improving,
title={Improving chlorophyll-a estimation from Sentinel-2 (MSI) in the Barents Sea using machine learning},
author={Asim, Muhammad and Brekke, Camilla and Mahmood, Arif and Eltoft, Torbj{\o}rn and Reigstad, Marit},
journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
volume={14},
pages={5529–5549},
year={2021},
publisher={IEEE}}

IEEE Systems Journal

Predictive Autoscaling of Microservices Hosted in Fog Microdata Center (Systems, 2020)

Abstract:

Fog computing provides microdata center (MDC) facilities closer to the users and applications, which help to overcome the application latency and response time concerns. However, guaranteeing specific service-level objectives (SLOs) for the applications running on the MDC requires automatic scaling of allocated resources by efficiently utilizing the available infrastructure capacity. In this article, we propose a novel predictive autoscaling method for microservices running on the fog MDC to satisfy the application response time SLO. Initially, our proposed approach uses a reactive rule-based autoscaling method to gather the training dataset for building the predictive autoscaling model. The proposed approach is efficient, as it can learn the predictive autoscaling model using an increasing synthetic workload. The learned predictive autoscaling model is used to manage the application resources serving different realistic workloads effectively. Our experimental evaluation using two synthetic and three realistic workloads for two benchmark microservice applications on a real MDC shows excellent performance compared to the existing state-of-the-art baseline rule-based autoscaling method. The proposed autoscaling method yields a 75.51% reduction in the number of rejected requests and 77.53% fewer SLO violations compared to the baseline autoscaling methods by using only 9.20% additional data center resources at the fog layer.

Model Diagram

Paper Link:

Citation:

@article{abdullah2020predictive,
title={Predictive autoscaling of microservices hosted in fog microdata center},
author={Abdullah, Muhammad and Iqbal, Waheed and Mahmood, Arif and Bukhari, Faisal and Erradi, Abdelkarim},
journal={IEEE Systems Journal},
volume={15},
number={1},
pages={1275–1286},
year={2020},
publisher={IEEE}
}

IEEE Signal Processing Letters (SPL)

A Self-Reasoning Framework for Anomaly Detection Using Video-Level Labels (SPL, 2020)

Abstract:

Anomalous event detection in surveillance videos is a challenging and practical research problem within the image and video processing community. Compared to the frame-level annotations of anomalous events, obtaining video-level annotations is quite fast and cheap, though such high-level labels may contain significant noise. More specifically, an anomalous-labeled video may actually contain the anomaly only for a short duration, while the rest of the video frames may be normal. In the current work, we propose a weakly supervised anomaly detection framework based on deep neural networks which is trained in a self-reasoning fashion using only video-level labels. To carry out the self-reasoning based training, we generate pseudo labels by using binary clustering of spatio-temporal video features, which helps in mitigating the noise present in the labels of anomalous videos. Our proposed formulation encourages both the main network and the clustering to complement each other in achieving the goal of more accurate anomaly detection. The proposed framework has been evaluated on publicly available real-world anomaly detection datasets including UCF-Crime, ShanghaiTech and UCSD Ped2. The experiments demonstrate the superiority of our proposed framework over the current state-of-the-art methods.
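
Code Sketch:

A hedged sklearn sketch of the pseudo-label generation step: clip features of an anomalous-labeled video are split by binary (k=2) clustering, and clips in the cluster lying farther from normal-video statistics are pseudo-labeled anomalous. The cluster-assignment rule and the synthetic features are our assumptions; the joint training with the main network is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

def pseudo_labels(anom_video_feats, normal_centroid):
    """Sketch of self-reasoning pseudo labels: split the clips of an
    anomalous-labeled video into two clusters; the cluster farther from
    the centroid of normal-video features is pseudo-labeled anomalous
    (this decision rule is our simplification)."""
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(anom_video_feats)
    d = np.linalg.norm(km.cluster_centers_ - normal_centroid, axis=1)
    anomalous_cluster = int(d.argmax())
    return (km.labels_ == anomalous_cluster).astype(int)  # 1 = anomalous clip

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 32))            # normal-video clips
video = np.vstack([rng.normal(0.0, 1.0, size=(28, 32)),  # mostly normal...
                   rng.normal(4.0, 1.0, size=(4, 32))])  # ...plus an anomaly
print(pseudo_labels(video, normal.mean(0)))              # last 4 clips -> 1
```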

Model Diagram

Paper Link:

Citation:

@article{zaheer2020self,
title={A self-reasoning framework for anomaly detection using video-level labels},
author={Zaheer, Muhammad Zaigham and Mahmood, Arif and Shin, Hochul and Lee, Seung-Ik},
journal={IEEE Signal Processing Letters},
volume={27},
pages={1705–1709},
year={2020},
publisher={IEEE}
}

IEEE Access

A Novel Algorithm Based on a Common Subspace Fusion for Visual Object Tracking (Access, 2022)

Abstract:

Recent methods for visual tracking exploit a multitude of information obtained from combinations of handcrafted and/or deep features. However, the response maps derived from these feature combinations are often fused using simple strategies such as winner-takes-all or weighted sum approaches. Although some efficient fusion methods have also been proposed, these methods still do not leverage the individual strengths of the different features being fused. In the current work, we propose a novel information fusion strategy comprising a common low-rank subspace for the fusion of different types of features and tracker responses. Firstly, we interpret the response maps as smoothly varying functions which can be efficiently represented using individual low-rank matrices, thus removing high-frequency noise and sparse artifacts. Secondly, we estimate a common low-rank subspace which is constrained to remain close to each individual low-rank subspace, resulting in an efficient fusion strategy. The proposed algorithm achieves good performance by integrating the information contained in heterogeneous feature types. We demonstrate the efficiency of our algorithm using several combinations of features as well as correlation filter and end-to-end deep trackers. The proposed common subspace fusion algorithm is generic and can be used to efficiently fuse the response maps of varying types of feature representations as well as trackers. Extensive experiments on several tracking benchmarks including OTB100, TC128, VOT-ST 2018, VOT-LT 2018, UAV123, GOT-10k, and LaSOT have demonstrated significant performance improvements compared to many SOTA tracking methods.
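
Code Sketch:

A hedged numpy sketch of the two ingredients named in the abstract: response maps are first denoised by rank-r truncation, and a common column subspace estimated from the stacked maps is used for fusion. The paper solves a constrained optimization for the common subspace; the plain SVD projection and averaging below are our stand-in.

```python
import numpy as np

def low_rank(R, r):
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r]

def fuse_response_maps(maps, r=3):
    """Sketch of common-subspace fusion: denoise each tracker/feature
    response map by rank-r truncation, estimate a shared column subspace
    from the stacked maps, and average the projected maps."""
    clean = [low_rank(R, r) for R in maps]
    U, _, _ = np.linalg.svd(np.hstack(clean), full_matrices=False)
    Uc = U[:, :r]                                  # common column subspace
    return np.mean([Uc @ (Uc.T @ R) for R in clean], axis=0)

rng = np.random.default_rng(0)
y, x = np.mgrid[0:40, 0:40]
peak = np.exp(-((x - 25) ** 2 + (y - 18) ** 2) / 20.0)   # true target response
maps = [peak + 0.3 * rng.standard_normal((40, 40)) for _ in range(3)]
fused = fuse_response_maps(maps)
print("estimated target position:",
      np.unravel_index(int(fused.argmax()), fused.shape))
```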

Model Diagram

Paper Link:

Citation:

@article{javed2022novel,
title={A novel algorithm based on a common subspace fusion for visual object tracking},
author={Javed, Sajid and Mahmood, Arif and Ullah, Ihsan and Bouwmans, Thierry and Khonji, Majid and Dias, Jorge Manuel Miranda and Werghi, Naoufel},
journal={IEEE Access},
volume={10},
pages={24690–24703},
year={2022},
publisher={IEEE}
}

Internal Emotion Classification Using EEG Signal with Sparse Discriminative Ensemble (Access, 2019)

Abstract:

Among various physiological signal acquisition methods for the study of the human brain, EEG (electroencephalography) is one of the most effective. EEG provides a convenient, non-intrusive, and accurate way of capturing brain signals in multiple channels at fine temporal resolution. We propose an ensemble learning algorithm for automatically computing the most discriminative subset of EEG channels for internal emotion recognition. Our method describes an EEG channel using kernel-based representations computed from the training EEG recordings. For ensemble learning, we formulate a graph embedding linear discriminant objective function using the kernel representations. The objective function is efficiently solved via sparse non-negative principal component analysis and the final classifier is learned using the sparse projection coefficients. Our algorithm is useful in reducing the amount of data while improving computational efficiency and classification accuracy at the same time. The experiments on a publicly available EEG dataset demonstrate the superiority of the proposed algorithm over the compared methods.

Model Diagram

Paper Link:

Citation:

@article{ullah2019internal,
title={Internal emotion classification using EEG signal with sparse discriminative ensemble},
author={Ullah, Habib and Uzair, Muhammad and Mahmood, Arif and Ullah, Mohib and Khan, Sultan Daud and Cheikh, Faouzi Alaya},
journal={IEEE Access},
volume={7},
pages={40144–40153},
year={2019},
publisher={IEEE}
}

Multi-Order Statistical Descriptors for Real-Time Face Recognition and Object Classification (Access, 2018)

Abstract:

We propose novel multi-order statistical descriptors which can be used for high-speed object classification or face recognition from videos or image sets. We represent each gallery set with a global second-order statistic which captures correlated global variations in all feature directions as well as the common set structure. A lightweight descriptor is then constructed by efficiently compacting the second-order statistic using Cholesky decomposition. We then enrich the descriptor with the first-order statistic of the gallery set to further enhance the representation power. By projecting the descriptor into a low-dimensional discriminant subspace, we obtain further dimensionality reduction, while the discrimination power of the proposed representation is still preserved. Therefore, our method represents a complex image set by a single descriptor having significantly reduced dimensionality. We apply the proposed algorithm on image set and video-based face and periocular biometric identification, object category recognition, and hand gesture recognition. Experiments on six benchmark data sets validate that the proposed method achieves significantly better classification accuracy with lower computational complexity than the existing techniques. The proposed compact representations can be used for real-time object classification and face recognition in videos.
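
Code Sketch:

A minimal numpy sketch of the descriptor construction as described: the set covariance (second order) is compacted via its Cholesky factor and concatenated with the set mean (first order). The regularizer and dimensions are assumptions, and the discriminant-subspace projection step is omitted.

```python
import numpy as np

def set_descriptor(samples, reg=1e-3):
    """Sketch of a multi-order set descriptor: the set's covariance is
    compacted via its Cholesky factor, whose lower triangle is flattened
    and concatenated with the set mean."""
    mu = samples.mean(axis=0)                      # first-order statistic
    cov = np.cov(samples, rowvar=False) + reg * np.eye(samples.shape[1])
    Lc = np.linalg.cholesky(cov)                   # lower-triangular factor
    tri = Lc[np.tril_indices_from(Lc)]             # compact, no redundancy
    return np.concatenate([tri, mu])

rng = np.random.default_rng(0)
gallery_set = rng.standard_normal((120, 16))       # e.g. 120 frames, 16-D feats
d = set_descriptor(gallery_set)
print(d.shape)                                     # (16*17/2 + 16,) = (152,)
```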

Model Diagram

Paper Link:

Citation:

@article{mahmood2018multi,
title={Multi-order statistical descriptors for real-time face recognition and object classification},
author={Mahmood, Arif and Uzair, Muhammad and Al-Maadeed, Somaya},
journal={IEEE Access},
volume={6},
pages={12993–13004},
year={2018},
publisher={IEEE}
}

Palmprint Identification Using an Ensemble of Sparse Representations (Access, 2018)

Abstract:

Among various palmprint identification methods proposed in the literature, sparse representation for classification (SRC) is very attractive, offering high accuracy. Although SRC has good discriminative ability, its performance strongly depends on the quality of the training data. In particular, SRC suffers from two major problems: lack of training samples per class and large intra-class variations. In fact, palmprint images not only contain identity information but they also have other information, such as illumination and geometrical distortions due to the unconstrained conditions and the movement of the hand. In this case, the sparse representation assumption may not hold well in the original space since samples from different classes may be considered as being from the same class. This paper aims to enhance palmprint identification performance through SRC by proposing a simple yet efficient method based on an ensemble of sparse representations obtained through an ensemble of discriminative dictionaries satisfying the SRC assumption. Ensemble learning has the advantage of reducing the sensitivity due to the limited size of the training data and is performed based on random subspace sampling over the 2D-PCA space while keeping the image's inherent structure and information. In order to obtain discriminative dictionaries satisfying the SRC assumption, a new space is learned by minimizing and maximizing the intra-class and inter-class variations using 2D-LDA. Extensive experiments are conducted on two publicly available palmprint data sets: multispectral and PolyU. The obtained results are very promising compared with both state-of-the-art holistic and coding methods. Besides these findings, we provide an empirical analysis of the parameters involved in the proposed technique to guide the neophyte.

Paper Link:

Citation:

@article{rida2018palmprint,
title={Palmprint identification using an ensemble of sparse representations},
author={Rida, Imad and Al-Maadeed, Somaya and Mahmood, Arif and Bouridane, Ahmed and Bakshi, Sambit},
journal={IEEE Access},
volume={6},
pages={3241–3248},
year={2018},
publisher={IEEE}}

Medical Image Analysis (MEDIA)

Nucleus Classification in Histology Images Using Message Passing Network (MEDIA, 2022)

Abstract:

Identification of nuclear components in the histology landscape is an important step towards developing computational pathology tools for the profiling of tumor micro-environment. Most existing methods for the identification of such components are limited in scope due to the heterogeneous nature of the nuclei. Graph-based methods offer a natural way to formulate the nucleus classification problem to incorporate both appearance and geometric locations of the nuclei. The main challenge is to define models that can handle such an unstructured domain. Current approaches focus on learning better features and then employ well-known classifiers for identifying distinct nuclear phenotypes. In contrast, we propose a message passing network that is a fully learnable framework built on a classical network flow formulation. Based on the physical interaction of the nuclei, a nearest neighbor graph is constructed such that the nodes represent the nuclei centroids. For each edge and node, appearance and geometric features are computed, which are then used for the construction of messages utilized for diffusing contextual information to the neighboring nodes. Such an algorithm can infer global information over an entire network and predict biologically meaningful nuclear communities. We show that learning such communities improves the performance of the nucleus classification task in histology images. The proposed algorithm can be used as a component in existing state-of-the-art methods, resulting in improved nucleus classification performance across four different publicly available datasets.

Paper Link:

Citation:

@article{hassan2022nucleus,
title={Nucleus classification in histology images using message passing network},
author={Hassan, Taimur and Javed, Sajid and Mahmood, Arif and Qaiser, Talha and Werghi, Naoufel and Rajpoot, Nasir},
journal={Medical Image Analysis},
volume={79},
pages={102480},
year={2022},
publisher={Elsevier}
}

Spatially Constrained Context-Aware Hierarchical Deep Correlation Filters for Nucleus Detection in Histology Images (MEDIA, 2021)

Abstract:

Nucleus detection in histology images is a fundamental step for cellular-level analysis in computational pathology. In clinical practice, quantitative nuclear morphology can be used for diagnostic decision making, prognostic stratification, and treatment outcome prediction. Nucleus detection is a challenging task because of large variations in the shape of different types of nuclei as well as nuclear clutter, heterogeneous chromatin distribution, and irregular and fuzzy boundaries. To address these challenges, we aim to accurately detect nuclei using spatially constrained context-aware correlation filters with hierarchical deep features extracted from multiple layers of a pre-trained network. During training, we extract contextual patches around each nucleus which are used as negative examples, while the actual nucleus patch is used as a positive example. In order to spatially constrain the correlation filters, we propose to construct a spatial structural graph across different nucleus components encoding pairwise similarities. The correlation filters are constrained to act as eigenvectors of the Laplacian of the spatial graphs, enforcing these to capture the nucleus structure. A novel objective function is proposed by embedding graph-based structural information as well as the contextual information within the discriminative correlation filter framework. The learned filters are constrained to be orthogonal to both the contextual patches and the spatial graph-Laplacian basis to improve the localization and discriminative performance. The proposed objective function trains a hierarchy of correlation filters on different deep feature layers to capture the heterogeneity in nuclear shape and texture. The proposed algorithm is evaluated on three publicly available datasets and compared with 15 current state-of-the-art methods demonstrating competitive performance in terms of accuracy, speed, and generalization.

Paper Link:

Citation:

@article{javed2021spatially,
title={Spatially constrained context-aware hierarchical deep correlation filters for nucleus detection in histology images},
author={Javed, Sajid and Mahmood, Arif and Dias, Jorge and Werghi, Naoufel and Rajpoot, Nasir},
journal={Medical Image Analysis},
volume={72},
pages={102104},
year={2021},
publisher={Elsevier}
}

Cellular Community Detection For Tissue Phenotyping In Colorectal Cancer Histology Images (MEDIA, 2020)

Abstract:

A primary aim of detailed analysis of multi-gigapixel histology images is assisting pathologists in better cancer grading and prognostication. Several methods have been proposed for the analysis of histology images in the literature. However, these methods are often limited to the classification of two classes, i.e., tumor and stroma. Also, most existing methods are based on fully supervised learning and require a large amount of annotations, which are very difficult to obtain. To alleviate these challenges, we propose a novel community detection algorithm for the classification of tissue in Whole-slide Images (WSIs). The proposed algorithm uses a novel graph-based approach to the problem of detecting prevalent communities in a collection of histology images in a semi-supervised manner, resulting in the identification of six distinct tissue phenotypes in the multi-gigapixel image data. We formulate the problem of identifying distinct tissue phenotypes as the problem of finding network communities using the geodesic density gradient in the space of potential interaction between different cellular components. We show that prevalent communities found in this way represent distinct and biologically meaningful tissue phenotypes. Experiments on two independent Colorectal Cancer (CRC) datasets demonstrate that the proposed algorithm outperforms current state-of-the-art methods.

Model Diagram

Paper Link:

Citation:

@article{javed2020cellular,
title={Cellular Community Detection For Tissue Phenotyping In Colorectal Cancer Histology Images},
author={Javed, Sajid and Mahmood, Arif and Fraz, Muhammad Moazam and Koohbanani, Navid Alemi and Benes, Ksenija and Tsang, Yee-Wah and Hewitt, Katherine and Epstein, David and Snead, David and Rajpoot, Nasir},
journal={Medical Image Analysis},
year={2020}}

Information Fusion (IF)

Multi-focus Image Fusion Using Content Adaptive Blurring (IF, 2019)

Abstract:

Multi-focus image fusion has emerged as an important research area in information fusion. It aims at increasing the depth-of-field by extracting focused regions from multiple partially focused images, and merging them together to produce a composite image in which all objects are in focus. In this paper, a novel multi-focus image fusion algorithm is presented in which the task of detecting the focused regions is achieved using a Content Adaptive Blurring (CAB) algorithm. The proposed algorithm induces non-uniform blur in a multi-focus image depending on its underlying content. In particular, it analyzes the local image quality in a neighborhood and determines whether the blur should be induced or not without losing image quality. In CAB, pixels belonging to the blurred regions receive little or no blur at all, whereas the focused regions receive significant blur. The absolute difference of the original image and the CAB-blurred image yields an initial segmentation map, which is further refined using morphological operators and graph-cut techniques to improve the segmentation accuracy. Quantitative and qualitative evaluations and comparisons with current state-of-the-art on two publicly available datasets demonstrate the strength of the proposed algorithm.
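
Code Sketch:

A hedged scipy sketch of the detection principle: blurring alters focused regions far more than already-defocused ones, so a locally aggregated |image − blur(image)| map indicates focus and drives a per-pixel source selection. A uniform Gaussian blur stands in for the paper's content-adaptive blur, and the morphological/graph-cut refinement is omitted.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def focus_map(img, sigma=2.0, win=9):
    """Blurring changes focused regions far more than already-blurred
    ones, so the locally aggregated absolute difference between an image
    and its blurred version indicates focus."""
    diff = np.abs(img - gaussian_filter(img, sigma))
    return uniform_filter(diff, size=win)          # local focus energy

def fuse(img_a, img_b):
    choose_a = focus_map(img_a) >= focus_map(img_b)
    return np.where(choose_a, img_a, img_b)        # per-pixel source pick

rng = np.random.default_rng(0)
sharp = rng.random((64, 64))                       # stand-in all-in-focus image
a = sharp.copy(); a[:, 32:] = gaussian_filter(sharp, 3)[:, 32:]  # right half blurred
b = sharp.copy(); b[:, :32] = gaussian_filter(sharp, 3)[:, :32]  # left half blurred
fused = fuse(a, b)
print("fusion error:", np.abs(fused - sharp).mean())  # near zero
```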

Model Diagram

Paper Link:

Citation:

@article{farid2019multi,
title={Multi-focus image fusion using content adaptive blurring},
author={Farid, Muhammad Shahid and Mahmood, Arif and Al-Maadeed, Somaya Ali},
journal={Information Fusion},
volume={45},
pages={96–112},
year={2019},
publisher={Elsevier}
}

An Information Fusion Framework for Person Localization Via Body Pose in Spectator Crowds (IF, 2019)

Abstract:

Person localization or segmentation in low resolution crowded scenes is important for person tracking and recognition, action detection and anomaly identification. Due to occlusion and lack of interpersonal space, person localization becomes a difficult task. In this work, we propose a novel information fusion framework to integrate a Deep Head Detector and a body pose detector. A more accurate body pose showing limb positions will result in more accurate person localization. We propose a novel Deep Head Detector (DHD) to detect person heads in crowds. The proposed DHD is a fully convolutional neural network and it has shown improved head detection performance in crowds. We modify the Deformable Parts Model (DPM) pose detector to detect multiple upper body poses in crowds. We efficiently fuse the information obtained by the proposed DHD and the modified DPM to obtain a more accurate person pose detector. The proposed framework is named Fusion DPM (FDPM) and it has exhibited improved body pose detection performance on spectator crowds. The detected body poses are then used for more accurate person localization by segmenting each person in the crowd.

Model Diagram

Paper Link:

Citation:

@article{shaban2019information,
title={An information fusion framework for person localization via body pose in spectator crowds},
author={Shaban, Muhammad and Mahmood, Arif and Al-Maadeed, Somaya Ali and Rajpoot, Nasir},
journal={Information Fusion},
volume={51},
pages={178–188},
year={2019},
publisher={Elsevier}
}

Pattern Recognition (PR)

Unsupervised Moving Object Segmentation Using Background Subtraction and Optimal Adversarial Noise Sample Search (PR, 2022)

Abstract:

Moving Objects Segmentation (MOS) is a fundamental task in many computer vision applications such as human activity analysis, visual object tracking, content based video search, traffic monitoring, surveillance, and security. MOS becomes challenging due to abrupt illumination variations, dynamic backgrounds, camouflage, and scenes with bootstrapping. To address these challenges, we propose a MOS algorithm exploiting multiple adversarial regularizations including conventional as well as least squares losses. More specifically, our model is trained on scene background images with the help of cross-entropy loss, least squares adversarial loss, and l1 loss in the image space, working jointly to learn the dynamic background changes. During testing, our proposed method aims to generate test image background scenes by searching optimal noise samples using joint minimization of the l1 loss in the image space, the l1 loss in the feature space, and the discriminator least squares loss. These loss functions force the generator to synthesize dynamic backgrounds similar to the test sequences, which upon subtraction results in moving objects segmentation. Experimental evaluations on five benchmark datasets have shown the excellent performance of the proposed algorithm compared to twenty-one existing state-of-the-art methods.

Model Diagram

Paper Link:

Citation:

@article{sultana2022unsupervised,
title={Unsupervised moving object segmentation using background subtraction and optimal adversarial noise sample search},
author={Sultana, Maryam and Mahmood, Arif and Jung, Soon Ki},
journal={Pattern Recognition},
volume={129},
pages={108719},
year={2022},
publisher={Elsevier}
}

Journal of Visual Communication and Image Representation (JVCIR)

Multi-Scale Attention Guided Network for End-to-End Face Alignment and Recognition (JVCIR, 2022)

Abstract:

Attention modules embedded in deep networks mediate the selection of informative regions for object recognition. In addition, the combination of features learned from different branches of a network can enhance the discriminative power of these features. However, fusing features with inconsistent scales is a less-studied problem. In this paper, we first propose a multi-scale channel attention network with an adaptive feature fusion strategy (MSCAN-AFF) for face recognition (FR), which fuses the relevant feature channels and improves the network’s representational power. In FR, face alignment is performed independently prior to recognition, which requires the efficient localization of facial landmarks; these might be unavailable in uncontrolled scenarios such as low resolution and occlusion. Therefore, we propose utilizing our MSCAN-AFF to guide the Spatial Transformer Network (MSCAN-STN) to align feature maps learned from an unaligned training set in an end-to-end manner. Experiments on benchmark datasets demonstrate the effectiveness of our proposed MSCAN-AFF and MSCAN-STN.

Paper Link:

Citation:

@article{shakeel2022multi,
title={Multi-scale attention guided network for end-to-end face alignment and recognition},
author={Shakeel, M Saad and Zhang, Yuxuan and Wang, Xin and Kang, Wenxiong and Mahmood, Arif},
journal={Journal of Visual Communication and Image Representation},
volume={88},
pages={103628},
year={2022},
publisher={Elsevier}
}

Multi-Level Feature Fusion for Nucleus Detection in Histology Images Using Correlation Filters (CBM, 2022)

Abstract:

Nucleus detection is an important step in the analysis of histology images in the field of computational pathology. Pathologists use quantitative nuclear morphology for better cancer grading and prognostication. Nucleus detection becomes very challenging because of the large morphological variations across different types of nuclei, nuclei clutter, and heterogeneity. To address these challenges, we aim to improve nucleus detection using multi-level feature fusion based on discriminative correlation filters. The proposed algorithm employs multiple feature pools based on varying feature combinations. Early fusion is employed to integrate multi-feature information within a pool, and inter-pool fusion is proposed to fuse information across multiple pools. Inter-pool consistency is proposed to find the pools which are consistent and complement each other to improve performance. For this purpose, the relative standard deviation is used as an inter-pool consistency measure. Pool robustness to noise is also estimated using relative standard deviation as a robustness measure. High-level pool fusion is proposed using inter-pool consistency and pool-robustness scores. The proposed algorithm facilitates a robust and reliable appearance model for nucleus detection. The proposed algorithm is evaluated on three publicly available datasets and compared with several existing state-of-the-art methods. Our proposed algorithm has consistently outperformed existing methods on a wide range of experiments.
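
Code Sketch:

The relative standard deviation (standard deviation divided by mean) used as the inter-pool consistency and robustness measure is simple to compute. The sketch below assumes each pool is summarized by a vector of detection response scores, which is our simplification for illustration.

import numpy as np

def relative_std(scores):
    """Relative standard deviation (std/mean) of a pool's response scores."""
    scores = np.asarray(scores, dtype=float)
    return scores.std() / (scores.mean() + 1e-12)

# Pools with consistent responses (low relative std) get larger fusion weights.
pools = {"pool_a": [0.81, 0.79, 0.84], "pool_b": [0.90, 0.30, 0.75]}
weights = {k: 1.0 / (relative_std(v) + 1e-6) for k, v in pools.items()}
total = sum(weights.values())
weights = {k: w / total for k, w in weights.items()}   # normalized fusion weights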

Paper Link:

Citation:

@article{javed2022multi,
title={Multi-level feature fusion for nucleus detection in histology images using correlation filters},
author={Javed, Sajid and Mahmood, Arif and Dias, Jorge and Werghi, Naoufel},
journal={Computers in Biology and Medicine},
volume={143},
pages={105281},
year={2022},
publisher={Elsevier}
}

Learning to Localize Image Forgery Using End-to-End Attention Network (NC, 2022)

Abstract:

Recent advancements have increased the prevalence of digital image tampering. Anyone can manipulate multimedia content using editing software to alter the semantic meaning of images and deceive viewers. Since manipulations appear realistic, both humans and machines face challenges in detecting forgeries. In this work, we propose a novel algorithm for authenticating visual content by localizing forged regions. Our proposed algorithm employs channel attention convolutional blocks in an end-to-end learning framework. The channel attention infers forged regions in an image by extracting attention-aware multi-resolution features in the spatial domain and features in the frequency domain. Accordingly, the proposed network is divided into two subnetworks for extracting attention-aware multi-resolution features in the spatial and frequency domains. To predict the resulting mask, we concatenate the features of both networks. The proposed channel attention network exclusively focuses on the forged region and increases network generalization capabilities on unseen manipulations. Rigorous experiments demonstrate that the proposed algorithm outperforms state-of-the-art methods on five benchmark datasets for localizing a wide range of manipulations.

Paper Link:

Citation:

@article{ganapathi2022learning,
title={Learning to localize image forgery using end-to-end attention network},
author={Ganapathi, Iyyakutti Iyappan and Javed, Sajid and Ali, Syed Sadaf and Mahmood, Arif and Vu, Ngoc-Son and Werghi, Naoufel},
journal={Neurocomputing},
volume={512},
pages={25–39},
year={2022},
publisher={Elsevier}
}

Moving Objects Segmentation Using Generative Adversarial Modeling (NC, 2022)

Abstract:

Moving Objects Segmentation (MOS) is a crucial step in various computer vision applications, such as visual object tracking, autonomous vehicles, human activity analysis, surveillance, and security. Existing MOS approaches suffer from performance degradation under extremely challenging conditions in real-world complex environments such as varying illumination conditions, camouflaged objects, dynamic backgrounds, shadows, bad weather and camera jitter. To address these problems, we propose a novel generative adversarial framework for moving objects segmentation. Our framework works with one classifier discriminator, one representation learning network and one generator jointly trained to perform MOS in various challenging scenarios. During training, the discriminator network acts as a decision maker between real and fake training samples using a conditional least squares loss, while the representation learning network provides the difference between the deep features of real and fake training samples using a content loss formulation. Another loss term we exploit to train our generator network is the reconstruction loss, which minimizes the difference between the spatial information of real and fake training samples. Moreover, we also propose a novel modified U-net architecture for our generator network, showing improved performance over the vanilla U-net model. Experimental evaluations of our proposed method on four benchmark datasets in comparison with thirty-two existing methods have demonstrated the strength of our proposed model.

Paper Link:

Citation:

@article{sultana2022moving,
title={Moving objects segmentation using generative adversarial modeling},
author={Sultana, Maryam and Mahmood, Arif and Bouwmans, Thierry and Khan, Muhammad Haris and Jung, Soon Ki},
journal={Neurocomputing},
volume={506},
pages={240–251},
year={2022},
publisher={Elsevier}
}

Leveraging orientation for weakly supervised object detection with application to firearm localization (NC, 2021)

Abstract:

Automatic detection of firearms is important for enhancing the security and safety of people; however, it is a challenging task owing to the wide variations in shape, size and appearance of firearms. Also, most generic object detectors process axis-aligned rectangular areas, though a thin and long rifle may actually cover only a small percentage of that area and the rest may contain irrelevant details suppressing the required object signatures. To handle these challenges, we propose a weakly supervised Orientation Aware Object Detection (OAOD) algorithm which learns to detect oriented object bounding boxes (OBB) while using Axis-Aligned Bounding Boxes (AABB) for training. The proposed OAOD differs from existing oriented object detectors, which strictly require OBB annotations during training that may not always be available. The goal of training on AABB and detecting OBB is achieved by employing a multi-stage scheme, with Stage-1 predicting the AABB and Stage-2 predicting the OBB. In between the two stages, the oriented proposal generation module along with the object-aligned RoI pooling is designed to extract features based on the predicted orientation and to make these features orientation invariant. A diverse and challenging dataset consisting of eleven thousand manually annotated images is also proposed for firearm detection.

Model Diagram

Paper Link:

Citation:

@article{iqbal2021leveraging,
title={Leveraging orientation for weakly supervised object detection with application to firearm localization},
author={Iqbal, Javed and Munir, Muhammad Akhtar and Mahmood, Arif and Ali, Afsheen Rafaqat and Ali, Mohsen},
journal={Neurocomputing},
volume={440},
pages={310–320},
year={2021}
}

Dynamic workload patterns prediction for proactive auto-scaling of web applications (JNCA, 2018)

Abstract:

Proactive auto-scaling methods dynamically manage the resources for an application according to the current and future load predictions to preserve the desired performance at a reduced cost. However, auto-scaling web applications remains challenging mainly due to dynamic workload intensity and characteristics which are difficult to predict. Most existing methods mainly predict the request arrival rate, which only partially captures the workload characteristics and the changing system dynamics that influence the resource needs. This may lead to inappropriate resource provisioning decisions. In this paper, we address these challenges by proposing a framework for prediction of dynamic workload patterns as follows. First, we use an unsupervised learning method to analyze the web application access logs to discover URI (Uniform Resource Identifier) space partitions based on the response time and the document size features. Then, for each application URI, we compute its distribution across these partitions based on historical access logs to capture the workload characteristics more accurately than representing the workload by the request arrival rate alone. These URI distributions are then used to compute the Probabilistic Workload Pattern (PWP), which is a probability vector describing the overall distribution of incoming requests across URI partitions. Finally, the identified workload patterns for a specific number of last time intervals are used to predict the workload pattern of the next interval. The latter is used for future resource demand prediction and proactive auto-scaling to dynamically control the provisioning of resources. The framework is implemented and experimentally evaluated using historical access logs of three real web applications, each with increasing, decreasing, periodic, and randomly varying arrival rate behaviors. Results show that the proposed solution yields significantly more accurate predictions of workload patterns and resource demands of web applications compared to existing approaches.
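
Code Sketch:

The Probabilistic Workload Pattern is a probability vector over URI partitions. A toy computation is shown below, assuming requests in one interval have already been mapped to partition ids; the partition count and ids are made up for illustration.

import numpy as np

def workload_pattern(partition_ids, n_partitions):
    """PWP: distribution of an interval's requests over URI partitions."""
    counts = np.bincount(partition_ids, minlength=n_partitions).astype(float)
    return counts / counts.sum()

interval = np.array([0, 0, 1, 2, 1, 0, 3, 1])   # partition id per request
pwp = workload_pattern(interval, n_partitions=4)
print(pwp)                                       # [0.375 0.375 0.125 0.125]

A sequence of such vectors from the last few intervals would then feed the predictor for the next interval's pattern.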

Model Diagram:

Paper Link:

Citation:

@article{iqbal2018dynamic,
title={Dynamic workload patterns prediction for proactive auto-scaling of web applications},
author={Iqbal, Waheed and Erradi, Abdelkarim and Mahmood, Arif},
journal={Journal of Network and Computer Applications},
volume={124},
pages={94–107},
year={2018},
publisher={Elsevier}
}

Masked Linear Regression for Learning Local Receptive Fields for Facial Expression Synthesis (IJCV, 2019)

Abstract:

Compared to facial expression recognition, expression synthesis requires a very high-dimensional mapping. This problem is exacerbated by increasing image sizes and limits existing expression synthesis approaches to relatively small images. We observe that facial expressions often constitute sparsely distributed and locally correlated changes from one expression to another. By exploiting this observation, the number of parameters in an expression synthesis model can be significantly reduced. Therefore, we propose a constrained version of ridge regression that exploits the local and sparse structure of facial expressions. We consider this model as masked regression for learning local receptive fields. In contrast to the existing approaches, our proposed model can be efficiently trained on larger image sizes. Experiments using three publicly available datasets demonstrate that our model is significantly better than l0, l1 and l2-regression, SVD based approaches, and kernelized regression in terms of mean-squared-error, visual quality as well as computational and spatial complexities. The reduction in the number of parameters allows our method to generalize better even after training on smaller datasets. The proposed algorithm is also compared with state-of-the-art GANs including Pix2Pix, CycleGAN, StarGAN and GANimation. These GANs produce photo-realistic results as long as the testing and the training distributions are similar. In contrast, our results demonstrate significant generalization of the proposed algorithm over out-of-dataset human photographs, pencil sketches and even animal faces.
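
Code Sketch:

Masked regression constrains each output pixel's weights to a local receptive field. A small NumPy sketch under simplifying assumptions: gradient descent with projection onto a binary mask M stands in for the paper's constrained ridge solver, and the band-shaped mask and dimensions are illustrative only.

import numpy as np

def masked_ridge(X, Y, M, lam=0.1, lr=1e-3, steps=500):
    """Ridge regression with weights restricted to the support of mask M."""
    n = X.shape[0]
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(steps):
        grad = X.T @ (X @ W - Y) / n + lam * W   # ridge gradient
        W -= lr * grad
        W *= M                                   # project onto the masked support
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))   # flattened input expression images
Y = rng.normal(size=(200, 16))   # flattened target expression images
M = (np.abs(np.subtract.outer(np.arange(16), np.arange(16))) <= 2).astype(float)
W = masked_ridge(X, Y, M)        # sparse, local input-to-output mapping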

Paper Link:

Citation:

@article{khan2020masked,
title={Masked linear regression for learning local receptive fields for facial expression synthesis},
author={Khan, Nazar and Akram, Arbish and Mahmood, Arif and Ashraf, Sania and Murtaza, Kashif},
journal={International Journal of Computer Vision},
volume={128},
number={5},
pages={1433–1454},
year={2020},
publisher={Springer}
}

Fake Visual Content Detection Using Two-Stream Convolutional Neural Networks (NCA, 2022)

Abstract:

Rapid progress in adversarial learning has enabled the generation of realistic-looking fake visual content. To distinguish between fake and real visual content, several detection techniques have been proposed. The performance of most of these techniques, however, drops significantly if the test and the training data are sampled from different distributions. This motivates efforts towards improving the generalization of fake detectors. Since current fake content generation techniques do not accurately model the frequency spectrum of natural images, we observe that the frequency spectrum of fake visual data contains discriminative characteristics that can be used to detect fake content. We also observe that the information captured in the frequency spectrum is different from that of the spatial domain. Using these insights, we propose to complement frequency and spatial domain features using a two-stream convolutional neural network architecture called TwoStreamNet. We demonstrate the improved generalization of the proposed two-stream network to several unseen generation architectures, datasets, and techniques. The proposed detector has demonstrated significant performance improvement compared to the current state-of-the-art fake content detectors.
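
Code Sketch:

The frequency stream consumes the image's frequency spectrum. Below is a NumPy sketch of extracting a log-magnitude spectrum that such a stream could take as input; the exact transform and normalization used by TwoStreamNet are assumptions here.

import numpy as np

def log_spectrum(gray_img):
    """Centered log-magnitude frequency spectrum of a grayscale image."""
    spec = np.fft.fftshift(np.fft.fft2(gray_img))
    return np.log1p(np.abs(spec))     # compress the dynamic range

img = np.random.rand(64, 64)          # stand-in for a (possibly fake) image
freq_input = log_spectrum(img)        # input to the frequency stream
# The spatial stream consumes `img` directly; both streams' features are
# combined for the final real-vs-fake decision.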

Model Diagram

Paper Link:

Citation:

@article{yousaf2022fake,
title={Fake visual content detection using two-stream convolutional neural networks},
author={Yousaf, Bilal and Usama, Muhammad and Sultani, Waqas and Mahmood, Arif and Qadir, Junaid},
journal={Neural Computing and Applications},
volume={34},
number={10},
pages={7991–8004},
year={2022},
publisher={Springer}
}

Statistically correlated multi-task learning for autonomous driving (NCA, 2021)

Abstract:

Autonomous driving research is an emerging area in the machine learning domain. Most existing methods perform single-task learning, while multi-task learning (MTL) is more efficient due to the leverage of shared information between different tasks. However, MTL is challenging because different tasks may have different significance and varying ranges. In this work, we propose an end-to-end deep learning architecture for statistically correlated MTL using a single input image. Statistical correlation of the tasks is handled by including shared layers in the architecture. Later, the network separates into different branches to handle the differences in the behavior of each task. Training a multi-task model on tasks with varying ranges may let the objective function be dominated by the tasks with larger values. To this end, we explore different normalization schemes and empirically observe that the inverse validation-loss weighted scheme performs best. In addition to estimating steering angle, braking, and acceleration, we also estimate the number of lanes on the left and the right side of the vehicle. To the best of our knowledge, we are the first to propose an end-to-end deep learning architecture to estimate this type of lane information. The proposed approach is evaluated on four publicly available datasets including Comma.ai, Udacity, Berkeley Deep Drive, and Sully Chen. We also propose a synthetic dataset, GTA-V, for autonomous driving research. Our experiments demonstrate the superior performance of the proposed approach compared to the current state-of-the-art methods. The GTA-V dataset and the lane annotations on the four existing datasets will be made publicly available via https://cvlab.lums.edu.pk/scmtl/.
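
Code Sketch:

The inverse validation-loss weighting can be stated compactly. The sketch below assumes per-task validation losses from the previous epoch; the task names and numbers are invented for illustration.

def weighted_mtl_loss(task_losses, val_losses, eps=1e-8):
    """Combine per-task losses, each weighted by the inverse of its validation
    loss, so tasks with large value ranges do not dominate the objective."""
    inv = {t: 1.0 / (val_losses[t] + eps) for t in task_losses}
    norm = sum(inv.values())
    return sum((inv[t] / norm) * task_losses[t] for t in task_losses)

train = {"steer": 0.42, "brake": 0.10, "accel": 0.15, "lanes": 0.90}
val = {"steer": 0.50, "brake": 0.12, "accel": 0.20, "lanes": 1.10}
total = weighted_mtl_loss(train, val)   # scalar objective for this step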

Model Diagram

Paper Link:

Citation:

@article{abbas2021statistically,
title={Statistically correlated multi-task learning for autonomous driving},
author={Abbas, Waseem and Khan, M Fakhir and Taj, Murtaza and Mahmood, Arif},
journal={Neural Computing and Applications},
volume={33},
pages={12921–12938},
year={2021}
}

Human face super-resolution on poor quality surveillance video footage (NCA, 2021)

Abstract:

Most super-resolution (SR) methods proposed to date do not use real ground-truth high-resolution (HR) and low-resolution (LR) image pairs; instead, the vast majority of methods use synthetic LR images generated from the HR images. This approach yields excellent performance on synthetic datasets, but on real-world poor quality surveillance video footage, they suffer from performance degradation. A promising alternative is to apply recent advances in style transfer for unpaired datasets, but state-of-the-art work along these lines has used LR images and HR images from completely different datasets, introducing more variation between the HR and LR domains than necessary. In this paper, we propose methods that overcome both of these shortcomings, applying unpaired style transfer learning methods to face SR but using HR and LR datasets that share important properties. The key is to acquire roughly paired training data from a high-quality main stream and a lower-quality sub-stream of the same IP camera. Based on this principle, we have constructed four datasets comprising more than 400 people, with 1–15 weakly aligned real HR–LR pairs for each subject. We adopt a cycle generative adversarial networks (Cycle GANs) approach that produces impressive super-resolved images for low-quality test images never seen during training.

Model Diagram

Paper Link:

Citation:

@article{farooq2021human,
title={Human face super-resolution on poor quality surveillance video footage},
author={Farooq, Muhammad and Dailey, Matthew N and Mahmood, Arif and Moonrinta, Jednipat and Ekpanyapong, Mongkol},
journal={Neural Computing and Applications},
volume={33},
pages={13505–13523},
year={2021},
publisher={Springer}
}

Unsupervised deep context prediction for background estimation and foreground segmentation (MVA, 2019)

Abstract:

Background estimation is a fundamental step in many high-level vision applications, such as tracking and surveillance. Existing background estimation techniques suffer from performance degradation in the presence of challenges such as dynamic backgrounds, photometric variations, camera jitters, and shadows. To handle these challenges for the purpose of accurate background estimation, we propose a unified method based on Generative Adversarial Network (GAN) and image inpainting. The proposed method is based on a context prediction network, which is an unsupervised visual feature learning hybrid GAN model. Context prediction is followed by a semantic inpainting network for texture enhancement. We also propose a solution for arbitrary region inpainting using the center region inpainting method and Poisson blending technique. The proposed algorithm is compared with the existing state-of-the-art methods for background estimation and foreground segmentation and outperforms the compared methods by a significant margin.
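
Code Sketch:

OpenCV ships Poisson blending as seamlessClone, which can play the role of the blending step described above. The arrays and the paste location below are stand-ins; in the actual pipeline the source patch would come from the inpainting network.

import cv2
import numpy as np

background = np.full((256, 256, 3), 120, dtype=np.uint8)   # current estimate
patch = np.full((64, 64, 3), 128, dtype=np.uint8)          # inpainted region

mask = 255 * np.ones(patch.shape[:2], dtype=np.uint8)      # blend whole patch
center = (128, 128)                                        # target (x, y) in background
blended = cv2.seamlessClone(patch, background, mask, center, cv2.NORMAL_CLONE)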

Model Diagram

Paper Link:

Citation:

@article{sultana2019unsupervised,
title={Unsupervised deep context prediction for background estimation and foreground segmentation},
author={Sultana, Maryam and Mahmood, Arif and Javed, Sajid and Jung, Soon Ki},
journal={Machine Vision and Applications},
volume={30},
pages={375–395},
year={2019},
publisher={Springer}
}

Action recognition in poor quality spectator crowd videos using head distribution based person segmentation (MVA, 2019)

Abstract:

Despite a big volume of research on action recognition, little attention has been given to individual action recognition in poor-quality spectator crowd scenes. It is an important scenario because most surveillance systems generate poor-quality videos, to which current state-of-the-art methods may not be effectively applicable. Therefore, recognizing actions performed by individuals in poor-quality spectator crowd scenes is an unsolved problem. In such cases, the main challenge is localizing person proposals for each actor in the crowd. This challenge becomes more difficult when occlusion is severe. In this work, we propose a novel approach to find person proposals in poor-quality spectator crowds using crowd-based constraints. First, we define persons in the crowd by using efficient person head detectors. We exploit person head size to estimate the person bounding box using linear regression. Then, we use the distribution of heads in the crowd image to estimate more accurate person proposals. Motion trajectories are independently computed in the video without considering persons and then assigned to each person based on a novel distance measure computed between the trajectory and the person proposal. The set of trajectories and associated motion and texture-based features in overlapped time windows are used to compute the final feature vector. For each time window, using early information fusion in the bag of visual-words framework, cumulative feature vectors are computed encoding action information. Experiments are performed on a publicly available real-world spectator crowd dataset containing as many as 150 actors performing multiple actions at the same time. Our experiments have demonstrated excellent performance of the proposed technique.
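
Code Sketch:

Estimating a person's bounding box from the detected head via linear regression is straightforward. A scikit-learn sketch is shown; the training pairs below are invented, whereas in the paper they would come from annotated crowd frames.

import numpy as np
from sklearn.linear_model import LinearRegression

# Head boxes (x, y, w, h) paired with full-body boxes from annotated frames.
heads = np.array([[50, 20, 18, 20], [80, 22, 16, 18], [120, 25, 20, 22]])
bodies = np.array([[44, 20, 34, 95], [75, 22, 30, 85], [113, 25, 38, 105]])

reg = LinearRegression().fit(heads, bodies)   # one regressor, four outputs

new_head = np.array([[60, 21, 17, 19]])
print(reg.predict(new_head))                  # estimated person bounding box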

Paper Link:

Citation:

@article{mahmood2019action,
title={Action recognition in poor-quality spectator crowd videos using head distribution-based person segmentation},
author={Mahmood, Arif and Al-Maadeed, Somaya},
journal={Machine Vision and Applications},
volume={30},
number={6},
pages={1083–1096},
year={2019},
publisher={Springer}
}

Canny edge detection and Hough transform for high resolution video streams using Hadoop and Spark (CC, 2019)

Abstract:

Nowadays, video cameras are increasingly used for surveillance, monitoring, and activity recording. These cameras generate high resolution image and video data at large scale. Processing such large scale video streams to extract useful information with time constraints is challenging. Traditional methods do not offer scalability to process large scale data. In this paper, we propose and evaluate cloud services for high resolution video streams in order to perform line detection using Canny edge detection followed by Hough transform. These algorithms are often used as preprocessing steps for various high level tasks including object, anomaly, and activity recognition. We implement and evaluate both Canny edge detector and Hough transform algorithms in Hadoop and Spark. Our experimental evaluation using Spark shows excellent scalability and performance compared to Hadoop and standalone implementations for both Canny edge detection and Hough transform. We obtained speedups of 10.8 and 9.3 for Canny edge detection and Hough transform, respectively, using Spark. These results demonstrate the effectiveness of parallel implementation of computer vision algorithms to achieve good scalability for real-world applications.
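
Code Sketch:

The per-frame processing maps naturally onto Spark. A minimal PySpark sketch follows, assuming one encoded frame per file under a hypothetical frames/ directory; the Canny thresholds and Hough parameters are illustrative.

import cv2
import numpy as np
from pyspark import SparkContext

def detect_lines(frame_bytes):
    """Decode a frame, run Canny, then the probabilistic Hough transform."""
    img = cv2.imdecode(np.frombuffer(frame_bytes, np.uint8), cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(img, 100, 200)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                            minLineLength=30, maxLineGap=10)
    return [] if lines is None else lines.reshape(-1, 4).tolist()

sc = SparkContext(appName="canny-hough")
frames = sc.binaryFiles("hdfs:///frames/*.jpg").values()   # hypothetical path
lines_per_frame = frames.map(detect_lines).collect()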

Model Diagram

Paper Link:

Citation:

@article{iqbal2020canny,
title={Canny edge detection and Hough transform for high resolution video streams using Hadoop and Spark},
author={Iqbal, Bilal and Iqbal, Waheed and Khan, Nazar and Mahmood, Arif and Erradi, Abdelkarim},
journal={Cluster Computing},
volume={23},
number={1},
pages={397–408},
year={2020},
publisher={Springer}
}

A Boosting Framework for Human Posture Recognition Using Spatio-Temporal Features along with Radon Transform (MTA, 2022)

Abstract:

Automatic human posture recognition in surveillance videos has real-world applications in monitoring old-age homes, restoration centers, hospitals, disability care, and child-care centers. It also has applications in other areas such as security and surveillance, sports, and abnormal activity recognition. Human posture recognition is a challenging problem due to occlusion, background clutter, illumination variations, camouflage, and noise in the captured video signal. In the current study, which is an extension of our previous work (Ali et al. Sensors, 18(6):1918, 2018), we propose a novel combination of a number of spatio-temporal features computed over human blobs in a temporal window. These features include aspect ratios, shape descriptors, geometric centroids, ellipse axes ratio, silhouette angles, and silhouette speed. In addition to these features, we also exploit the radon transform for better shape based analysis. In order to obtain improved posture classification accuracy, we use the J48 classifier under a boosting framework by employing the AdaBoost algorithm. The proposed algorithm is compared with eighteen existing state-of-the-art approaches on four publicly available datasets including MCF, UR Fall detection, KARD, and NUCLA. Our results demonstrate the excellent performance of the proposed algorithm compared to these existing methods.
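
Code Sketch:

J48 is WEKA's C4.5 implementation; the sketch below approximates it with scikit-learn's DecisionTreeClassifier inside AdaBoost and takes crude radon-projection statistics as stand-in features, so treat it as an outline of the pipeline rather than the paper's feature set.

import numpy as np
from skimage.transform import radon
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

def radon_features(silhouette, angles=(0, 45, 90, 135)):
    """Summarize a binary silhouette by mean radon projections at a few angles."""
    sino = radon(silhouette.astype(float), theta=list(angles), circle=False)
    return sino.mean(axis=0)

rng = np.random.default_rng(0)
X = np.array([radon_features(rng.random((32, 32)) > 0.5) for _ in range(40)])
y = rng.integers(0, 2, size=40)          # toy posture labels

clf = AdaBoostClassifier(                # `estimator=` requires scikit-learn >= 1.2
    estimator=DecisionTreeClassifier(max_depth=3), n_estimators=50).fit(X, y)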

Model Diagram

Paper Link:

Citation:

@article{aftab2022boosting,
title={A boosting framework for human posture recognition using spatio-temporal features along with radon transform},
author={Aftab, Salma and Ali, Syed Farooq and Mahmood, Arif and Suleman, Umar},
journal={Multimedia Tools and Applications},
volume={81},
number={29},
pages={42325–42351},
year={2022},
publisher={Springer}
}

Improving Object Tracking by Added Noise and Channel Attention (Sensors, 2020)

Abstract:

CNN-based trackers, especially those based on Siamese networks, have recently attracted considerable attention because of their relatively good performance and low computational cost. For many Siamese trackers, learning a generic object model from a large-scale dataset is still a challenging task. In the current study, we introduce input noise as regularization in the training data to improve generalization of the learned model. We propose an Input-Regularized Channel Attentional Siamese (IRCA-Siam) tracker which exhibits improved generalization compared to the current state-of-the-art trackers. In particular, we exploit offline learning by introducing additive noise for input data augmentation to mitigate the overfitting problem. We propose feature fusion from noisy and clean input channels which improves the target localization. Channel attention integrated with our framework helps find more useful target features, resulting in further performance improvement. Our proposed IRCA-Siam enhances the discrimination of the tracker/background and improves fault tolerance and generalization. An extensive experimental evaluation on six benchmark datasets including OTB2013, OTB2015, TC128, UAV123, VOT2016 and VOT2017 demonstrates superior performance of the proposed IRCA-Siam tracker compared to 30 existing state-of-the-art trackers.

Model Diagram

Paper Link:

Citation:

@article{fiaz2020improving,
title={Improving object tracking by added noise and channel attention},
author={Fiaz, Mustansar and Mahmood, Arif and Baek, Ki Yeol and Farooq, Sehar Shahzad and Jung, Soon Ki},
journal={Sensors},
volume={20},
number={13},
pages={3780},
year={2020},
publisher={MDPI}
}

Learning soft mask based feature fusion with channel and spatial attention for robust visual object tracking (Sensors, 2020)

Abstract:

We propose to improve visual object tracking by introducing a soft mask based low-level feature fusion technique. The proposed technique is further strengthened by integrating channel and spatial attention mechanisms. The proposed approach is integrated within a Siamese framework to demonstrate its effectiveness for visual object tracking. The proposed soft mask is used to give more importance to the target regions compared to the other regions, enabling effective target feature representation and increasing discriminative power. The low-level feature fusion improves the tracker's robustness against distractors. The channel attention is used to identify more discriminative channels for better target representation. The spatial attention complements the soft mask based approach to better localize the target objects in challenging tracking scenarios. We evaluated our proposed approach over five publicly available benchmark datasets and performed extensive comparisons with 39 state-of-the-art tracking algorithms. The proposed tracker demonstrates excellent performance compared to the existing state-of-the-art trackers.

Model Diagram

Paper Link:

Citation:

@article{fiaz2020learning,
title={Learning soft mask based feature fusion with channel and spatial attention for robust visual object tracking},
author={Fiaz, Mustansar and Mahmood, Arif and Jung, Soon Ki},
journal={Sensors},
volume={20},
number={14},
pages={4021},
year={2020},
publisher={MDPI}
}

Using temporal covariance of motion and geometric features via boosting for human fall detection (Sensors, 2018)

Abstract:

Fall-induced injuries are serious incidents for elderly as well as young persons. A real-time, automatic and accurate fall detection system can play a vital role in timely medical care, which will ultimately help decrease injuries and complications. In this paper, we propose a fast and more accurate real-time system which can detect people falling in videos captured by surveillance cameras. Novel temporal and spatial variance-based features are proposed which comprise the discriminatory motion, geometric orientation and location of the person. These features are used along with an ensemble learning strategy of boosting with J48 and Adaboost classifiers. Experiments have been conducted on publicly available standard datasets including Multiple Cameras Fall (with 2 classes and 3 classes) and UR Fall Detection, achieving percentage accuracies of 99.2, 99.25 and 99.0, respectively. Comparisons with nine state-of-the-art methods demonstrate the effectiveness of the proposed approach on both datasets.
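
Code Sketch:

Temporal covariance of per-frame features over a sliding window is direct to compute. Here the motion and geometric features are abstracted into a generic (frames x features) matrix; the window length and dimensionality are illustrative.

import numpy as np

def temporal_covariance(frame_feats):
    """Flattened upper triangle of the feature covariance over a window.

    frame_feats: (T, d) array, one d-dimensional feature vector per frame.
    """
    cov = np.cov(frame_feats, rowvar=False)
    iu = np.triu_indices(cov.shape[0])
    return cov[iu]

window = np.random.rand(15, 6)               # 15 frames, 6 features
descriptor = temporal_covariance(window)     # input to the boosted classifier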

Paper Link:

Citation:

@article{ali2018using,
title={Using temporal covariance of motion and geometric features via boosting for human fall detection},
author={Ali, Syed Farooq and Khan, Reamsha and Mahmood, Arif and Hassan, Malik Tahir and Jeon, Moongu},
journal={Sensors},
volume={18},
number={6},
pages={1918},
year={2018},
publisher={MDPI}
}

Handcrafted and Deep Trackers: Recent Visual Object Tracking Approaches and Trends (ACM CS, 2019)

Abstract:

In recent years, visual object tracking has become a very active research area. An increasing number of tracking algorithms are being proposed each year. This is because tracking has wide applications in various real-world problems such as human-computer interaction, autonomous vehicles, robotics, surveillance, and security, to name a few. In the current study, we review the latest trends and advances in the tracking area and evaluate the robustness of different trackers based on the feature extraction methods. The first part of this work includes a comprehensive survey of the recently proposed trackers. We broadly categorize trackers into Correlation Filter based Trackers (CFTs) and Non-CFTs. Each category is further classified into various types based on the architecture and the tracking mechanism. In the second part of this work, we experimentally evaluated 24 recent trackers for robustness and compared handcrafted and deep feature based trackers. We observe that trackers using deep features performed better, though in some cases a fusion of both increased performance significantly. To overcome the drawbacks of the existing benchmarks, a new benchmark, Object Tracking and Temple Color (OTTC), has also been proposed and used in the evaluation of different algorithms. We analyze the performance of trackers over 11 different challenges in OTTC and 3 other benchmarks. Our study concludes that Discriminative Correlation Filter (DCF) based trackers perform better than the others. It also reveals that the inclusion of different types of regularizations over DCF often results in boosted tracking performance. Finally, we sum up our study by pointing out some insights and indicating future trends in the visual object tracking field.

Paper Link:

Citation:

@article{fiaz2019handcrafted,
title={Handcrafted and deep trackers: Recent visual object tracking approaches and trends},
author={Fiaz, Mustansar and Mahmood, Arif and Javed, Sajid and Jung, Soon Ki},
journal={ACM Computing Surveys (CSUR)},
volume={52},
number={2},
pages={1–44},
year={2019},
publisher={ACM New York, NY, USA}
}

Visual Object Tracking with Deep Neural Networks, 2019

Deep Siamese networks towards robust visual tracking

Abstract:

Recently, Siamese neural networks have been widely used in visual object tracking to leverage the template matching mechanism. A Siamese network architecture contains two parallel streams to estimate the similarity between two inputs and has the ability to learn their discriminative features. Various deep Siamese-based tracking frameworks have been proposed to estimate the similarity between the target and the search region. In this chapter, we categorize deep Siamese networks into three categories by the position of the merging layers: late merge, intermediate merge and early merge architectures. In the late merge architecture, inputs are processed as two separate streams and merged at the end of the network, while in the intermediate merge architecture, inputs are initially processed separately and merged at an intermediate point well before the final layer. In the early merge architecture, inputs are combined at the start of the network and a unified data stream is processed by a single convolutional neural network. We evaluate the performance of deep Siamese trackers based on the merge architectures and their output, such as similarity score, response map, and bounding box, in various tracking challenges. This chapter gives an overview of recent developments in deep Siamese trackers and provides insights for new developments in the tracking field.
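
Code Sketch:

The late versus early merge distinction can be made concrete in a few lines. The PyTorch sketch below uses toy backbones standing in for real CNNs; it is a schematic of the taxonomy, not any specific tracker.

import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())

def late_merge(template, search):
    """Two shared-weight streams; merged only at the end via similarity."""
    return torch.cosine_similarity(backbone(template), backbone(search))

early_net = nn.Sequential(nn.Conv2d(6, 8, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))

def early_merge(template, search):
    """Inputs concatenated at the start; one unified stream scores the pair."""
    return early_net(torch.cat([template, search], dim=1))

t, s = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
print(late_merge(t, s), early_merge(t, s))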

Paper Link:

Citation:

@article{fiaz2019deep,
title={Deep siamese networks toward robust visual tracking},
author={Fiaz, Mustansar and Mahmood, Arif and Jung, Soon Ki},
journal={Visual Object Tracking with Deep Neural Networks},
year={2019},
publisher={IntechOpen London, UK}
}

Video Object Segmentation Based on Guided Feature Transfer Learning (IW-FCV, 2022) [Best Paper Award]

Abstract:

Video Object Segmentation (VOS) is a fundamental task with many real-world computer vision applications, and it is challenging due to distractors and background clutter. Many existing online learning approaches have limited practical significance because of the high computational cost required to fine-tune network parameters. Matching-based and propagation-based approaches are computationally efficient but may suffer from degraded performance in cluttered backgrounds and object drift. To handle these issues, we propose an offline end-to-end model that learns guided feature transfer for VOS. We introduce guided feature modulation based on the target mask to capture the video context information, and a generative appearance model is used to provide cues for both the target and the background. The proposed guided feature modulation system learns the target semantic information based on modulation activations. The generative appearance model learns the probability of a pixel belonging to the target or the background. In addition, low-resolution features from deeper networks may not capture the global contextual information and may reduce the performance during feature refinement. Therefore, we also propose a guided pooled decoder to learn the global as well as local context information for better feature refinement. Evaluation over two VOS benchmark datasets, DAVIS2016 and DAVIS2017, has shown excellent performance of the proposed framework compared to more than 20 existing state-of-the-art methods.

Paper Link:

Citation:

@inproceedings{fiaz2022video,
title={Video Object Segmentation Based on Guided Feature Transfer Learning},
author={Fiaz, Mustansar and Mahmood, Arif and Shahzad Farooq, Sehar and Ali, Kamran and Shaheryar, Muhammad and Jung, Soon Ki},
booktitle={International Workshop on Frontiers of Computer Vision},
pages={197–210},
year={2022},
organization={Springer}
}

Lightweight Encoder-Decoder Architecture for Foot Ulcer Segmentation (IW-FCV, 2022)

Abstract:

Continuous monitoring of foot ulcer healing is needed to ensure the efficacy of a given treatment and to avoid any possibility of deterioration. Foot ulcer segmentation is an essential step in wound diagnosis. We developed a model that is similar in spirit to the well-established encoder-decoder and residual convolution neural networks. Our model includes a residual connection along with channel and spatial attention integrated within each convolution block. A simple patch-based approach for model training, test time augmentations, and majority voting on the obtained predictions resulted in superior performance. Our model did not leverage any readily available backbone architecture, pre-training on a similar external dataset, or any of the transfer learning techniques. With a total of around 5 million network parameters, it is a significantly lightweight model compared with the available state-of-the-art models used for the foot ulcer segmentation task. Our experiments present results at both the patch level and the image level. Applied to the publicly available Foot Ulcer Segmentation (FUSeg) Challenge dataset from MICCAI 2021, our model achieved state-of-the-art image-level performance of 88.22% in terms of Dice similarity score and ranked second on the official challenge leaderboard. We also presented an extremely simple solution that compares well against the more advanced architectures.
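
Code Sketch:

Test-time augmentation with majority voting can be sketched as below, assuming a hypothetical model that maps an image tensor to per-pixel foreground probabilities; only self-inverse flip augmentations are used so the inversion is trivial.

import torch

def tta_predict(model, img, thresh=0.5):
    """Majority vote over flip-augmented predictions.

    img: (1, C, H, W) tensor; model returns (1, 1, H, W) probabilities.
    """
    augs = [lambda x: x,
            lambda x: torch.flip(x, dims=[-1]),   # horizontal flip
            lambda x: torch.flip(x, dims=[-2])]   # vertical flip
    # Apply aug, predict, undo with the same aug (flips are self-inverse).
    votes = [(aug(model(aug(img))) > thresh).float() for aug in augs]
    return (torch.stack(votes).mean(dim=0) > 0.5).float()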

Model Diagram

Paper Link:

Citation:

@inproceedings{ali2022lightweight,
title={Lightweight Encoder-Decoder Architecture for Foot Ulcer Segmentation},
author={Ali, Shahzad and Mahmood, Arif and Jung, Soon Ki},
booktitle={International Workshop on Frontiers of Computer Vision},
pages={242–253},
year={2022},
organization={Springer}
}

Robust Tracking via Feature Enrichment and Overlap Maximization (IW-FCV, 2021)

Abstract:

Recently, Convolutional Neural Network (CNN) based approaches have demonstrated impressive gains over conventional approaches, which has resulted in the rapid development of various visual object trackers. However, these advancements are limited in terms of accuracy due to the distractors present in videos. Moreover, most deep trackers operate on low-resolution features, such as in template matching, which are semantically reliable but spatially less accurate. We propose an efficient feature enrichment module within a tracking framework to learn contextually reliable information and spatially accurate feature representations. The proposed feature enrichment combines enriched feature sets by exploiting contextual information from multiple scales as well as preserving the spatial information details. We integrate the proposed feature enrichment module within the baseline ATOM, which solves the tracking problem by target estimation and classification components. The former component estimates the target based on an IoU-predictor, while the latter component is trained online to enforce high discrimination power. An experimental study over three benchmarks including VOT2015, VOT2016, and VOT2017 reveals that the proposed feature enrichment module boosts the tracker accuracy.

Paper Link:

Citation:

@inproceedings{fiaz2021robust,
title={Robust Tracking via Feature Enrichment and Overlap Maximization},
author={Fiaz, Mustansar and Ali, Kamran and Yun, Sang Bin and Baek, Ki Yeol and Lee, Hye Jin and Kim, In Su and Mahmood, Arif and Farooq, Sehar Shahzad and Jung, Soon Ki},
booktitle={Frontiers of Computer Vision: 27th International Workshop, IW-FCV 2021, Daegu, South Korea, February 22–23, 2021, Revised Selected Papers 27},
pages={17–30},
year={2021},
organization={Springer}
}

Adaptive Feature Selection Siamese Networks for Visual Tracking (IW-FCV, 2020) [Best Student Paper Award]

Abstract:

Recently, template based discriminative trackers, especially Siamese network based trackers, have shown great potential in terms of balanced accuracy and tracking speed. However, it is still difficult for Siamese models to adapt to target variations through offline learning. In this paper, we introduce an Adaptive Feature Selection Siamese (AFSSiam) network to learn the most discriminative feature information for better tracking. Features from different layers contain complementary information for discrimination. The proposed adaptive feature selection module selects the most useful feature information from different convolutional layers while suppressing the irrelevant ones. The proposed tracking algorithm not only alleviates the over-fitting problem but also increases the discriminative ability. The proposed tracking framework is trained end-to-end, and extensive experimental results over OTB50, OTB100, TC-128, and VOT2017 demonstrate that our tracking algorithm exhibits favorable performance compared to other state-of-the-art methods.

Model Diagram

Paper Link:

Citation:

@inproceedings{fiaz2020adaptive,
title={Adaptive feature selection Siamese networks for visual tracking},
author={Fiaz, Mustansar and Rahman, Md Maklachur and Mahmood, Arif and Farooq, Sehar Shahzad and Baek, Ki Yeol and Jung, Soon Ki},
booktitle={Frontiers of Computer Vision: 26th International Workshop, IW-FCV 2020, Ibusuki, Kagoshima, Japan, February 20–22, 2020, Revised Selected Papers 26},
pages={167–179},
year={2020},
organization={Springer}
}

Unsupervised Adversarial Learning for Dynamic Background Modelling (IW-FCV, 2020) [Best Paper Award]

Abstract:

Dynamic Background Modeling (DBM) is a crucial task in many computer vision based applications such as human activity analysis, traffic monitoring, surveillance, and security. DBM is extremely challenging in scenarios like illumination changes, camouflage, intermittent object motion or shadows. In this study, we propose an end-to-end framework based on a Generative Adversarial Network, which can generate dynamic background information for the task of DBM in an unsupervised manner. Our proposed model can handle the problem of DBM in the presence of the challenges mentioned above by generating data similar to the desired information. The primary aim of our proposed model during training is to learn all the dynamic changes in scene-specific background information. During testing, inverse mapping of data to the latent space representation in our model generates dynamic backgrounds similar to the test data. Experimental evaluations on the SBM.net and SBI benchmark datasets show that our proposed model outperforms eight existing DBM methods in many challenging scenarios.

Paper Link:

Citation:

@inproceedings{sultana2020unsupervised,
title={Unsupervised adversarial learning for dynamic background modeling},
author={Sultana, Maryam and Mahmood, Arif and Bouwmans, Thierry and Jung, Soon Ki},
booktitle={International Workshop on Frontiers of Computer Vision},
pages={248–261},
year={2020},
organization={Springer}
}

Cross-modal Speaker Verification and Recognition: A Multilingual Perspective (CVPRW, 2021)

Abstract:

Recent years have seen a surge in finding associations between faces and voices within cross-modal biometric applications along with speaker recognition. Inspired by this, we introduce a challenging task of establishing the association between faces and voices across multiple languages spoken by the same set of persons. The aim of this paper is to answer two closely related questions: “Is face-voice association language independent?” and “Can a speaker be recognized irrespective of the spoken language?”. These two questions are important to understand the effectiveness of multilingual biometric systems and to boost their development. To answer them, we collected a Multilingual Audio-Visual dataset, containing human speech clips of 154 identities with 3 language annotations extracted from various videos uploaded online. Extensive experiments on the two splits of the proposed dataset have been performed to investigate and answer these novel research questions that clearly point out the relevance of the multilingual problem.

Model Diagram

Paper Link:

Citation:

@inproceedings{nawaz2021cross,
title={Cross-modal speaker verification and recognition: A multilingual perspective},
author={Nawaz, Shah and Saeed, Muhammad Saad and Morerio, Pietro and Mahmood, Arif and Gallo, Ignazio and Yousaf, Muhammad Haroon and Del Bue, Alessio},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={1682–1691},
year={2021}
}

Cleaning Label Noise with Clusters for Minimally Supervised Anomaly Detection (CVPRW - Learning from Unlabeled Videos (LUV), 2021)

Abstract:

We present an image set classification algorithm based on unsupervised clustering of labeled training and unlabeled test data where labels are only used in the stopping criterion. The probability distribution of each class over the set of clusters is used to define a true set based similarity measure. To this end, we propose an iterative sparse spectral clustering algorithm. In each iteration, a proximity matrix is efficiently recomputed to better represent the local subspace structure. Initial clusters capture the global data structure and finer clusters at the later stages capture the subtle class differences not visible at the global scale. Image sets are compactly represented with multiple Grassmannian manifolds which are subsequently embedded in Euclidean space with the proposed spectral clustering algorithm. We also propose an efficient eigenvector solver which not only reduces the computational cost of spectral clustering by many folds but also improves the clustering quality and final classification results. Experiments on five standard datasets and comparison with seven existing techniques show the efficacy of our algorithm.

Model Diagram

Paper Link:

Citation:

@article{zaheer2021cleaning,
title={Cleaning label noise with clusters for minimally supervised anomaly detection},
author={Zaheer, Muhammad Zaigham and Lee, Jin-ha and Astrid, Marcella and Mahmood, Arif and Lee, Seung-Ik},
journal={arXiv preprint arXiv:2104.14770},
year={2021}
}

Video Object Segmentation using Guided Feature and Directional Deep Appearance Learning (CVPRW, 2020)

Abstract:

In this work, we focus on the semi-supervised Video Object Segmentation (VOS) problem, where an object mask is provided in the initial frame and the VOS algorithm has to segment that object in the rest of the video frames. VOS is a challenging task due to object appearance variations, illumination changes, occlusion, background clutter and various distractions. Many online VOS methods have been proposed; however, most of these methods have limited real-world applicability due to computationally expensive online fine-tuning. On the contrary, many cost-efficient template-based and propagation-based approaches suffer from degraded performance due to object appearance drift. In order to tackle these issues, we propose guided feature learning with directional deep appearance learning for VOS. First, we introduce guided feature modulation to capture the video context information based on the target mask. Second, a directional matching module is utilized to learn pixel-wise semantic embedding. Third, a directional appearance model is integrated to represent the target and the background cues on a spherical embedding space. Finally, we propose a guided pooling decoder to learn the global and the local context information during refinement. The proposed network is trained offline and does not require fine-tuning. Our algorithm achieved an overall J and F score of 64.9 on the DAVIS 2020 test-challenge data and 60.9 on the DAVIS 2020 test-dev dataset.

Model Diagram

Paper Link:

Citation:

@inproceedings{fiaz2020video,
title={Video object segmentation using guided feature and directional deep appearance learning},
author={Fiaz, Mustansar and Mahmood, Arif and Jung, Soon Ki},
booktitle={Proceedings of the 2020 DAVIS Challenge on Video Object Segmentation-CVPR, Workshops, Seattle, WA, USA},
volume={19},
year={2020}
}

An anomaly detection system via moving surveillance robots with human collaboration (ICCVW, 2021)

Abstract:

Autonomous anomaly detection is a fundamental step in visual surveillance systems, and so we have witnessed great progress in the form of various promising algorithms. Nonetheless, the majority of prior algorithms assume static surveillance cameras, which severely restricts the coverage of the system unless the number of cameras is exponentially increased, consequently increasing both the installation and the monitoring costs. In the current work, we propose an anomaly detection system based on mobile surveillance cameras, i.e., moving robots which continuously navigate a target area. We compare the newly acquired test images with a database of normal images using geo-tags. For anomaly detection, a Siamese network is trained which analyses two input images for anomalies while ignoring the viewpoint differences. Further, our system is capable of updating the normal-image database with human collaboration. Finally, we propose a new test dataset that is captured by repeated visits of the robot over a constrained outdoor industrial target area. Our experiments demonstrate the effectiveness of the proposed system for anomaly detection using mobile surveillance robots.

Model Diagram

Paper Link:

Citation:

@inproceedings{zaheer2021anomaly,
title={An anomaly detection system via moving surveillance robots with human collaboration},
author={Zaheer, Muhammad Zaigham and Mahmood, Arif and Khan, M Haris and Astrid, Marcella and Lee, Seung-Ik},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={2595–2601},
year={2021}
}

Background/Foreground Separation: Guided Attention based Adversarial Modeling (GAAM) versus Robust Subspace Learning Methods (ICCVW, 2021)

Abstract:

Background-foreground separation and appearance generation is a fundamental step in many computer vision applications. Existing methods like Robust Subspace Learning (RSL) suffer performance degradation in the presence of challenges like bad weather, illumination variations, occlusion, dynamic backgrounds and intermittent object motion. In the current work, we propose a more accurate deep neural network based model for background-foreground separation and complete appearance generation of the foreground objects. Our proposed model, the Guided Attention based Adversarial Model (GAAM), can efficiently extract pixel-level boundaries of the foreground objects for improved appearance generation. Unlike RSL methods, our model extracts the binary information of foreground objects, labeled as an attention map, which guides our generator network to segment the foreground objects from the complex background information. A wide range of experiments performed on the benchmark CDnet2014 dataset demonstrates the excellent performance of our proposed model.

Model Diagram

Paper Link:

Citation:

@inproceedings{sultana2021background,
title={Background/Foreground Separation: Guided Attention based Adversarial Modeling (GAAM) versus Robust Subspace Learning Methods},
author={Sultana, Maryam and Mahmood, Arif and Bouwmans, Thierry and Khan, Muhammad Haris and Jung, Soon Ki},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={181–188},
year={2021}
}

Deep Multiresolution Cellular Communities for Semantic Segmentation of Multi-Gigapixel Histology Images (ICCVW, 2019)

Abstract:

Tissue phenotyping in cancer histology images is a fundamental step in computational pathology. Automatic tools for tissue phenotyping assist pathologists in digital profiling of the tumor microenvironment. Recently, deep learning and classical machine learning methods have been proposed for tissue phenotyping. However, these methods do not integrate the cellular community interaction features which carry biological significance in the tissue phenotyping context. In this paper, we propose to exploit deep multiresolution cellular communities for tissue phenotyping from multi-level cell graphs and show that such communities offer better performance compared to deep learning and texture-based methods. We propose to use deep features extracted from two distinct layers of a deep neural network at the cell level, in order to construct cellular graphs encoding cellular interactions at multiple scales. From these graphs, we extract cellular interaction-based features, which are then employed to construct patch-level graphs. Multiresolution communities are detected by considering the patch-level graphs as layers of multi-level graphs, and by proposing a novel objective function based on non-negative matrix factorization. We report results of our experiments on two datasets for colon cancer tissue phenotyping and demonstrate excellent performance of the proposed algorithm as compared to current state-of-the-art methods.

Model Diagram

Paper Link:

Citation:

@inproceedings{javed2019deep,
title={Deep multiresolution cellular communities for semantic segmentation of multi-gigapixel histology images},
author={Javed, Sajid and Mahmood, Arif and Werghi, Naoufel and Rajpoot, Nasir},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops},
pages={0–0},
year={2019}
}

Complete Moving Object Detection in the Context of Robust Subspace Learning (ICCVW, 2019)

Abstract:

Complete moving object detection plays a vital role in many applications of computer vision, for instance depth estimation, scene understanding, object interaction, semantic segmentation, and accident detection and avoidance for moving vehicles on a highway. However, it becomes challenging in the presence of dynamic backgrounds, camouflage, bootstrapping, varying illumination conditions, and noise. Over the past decade, robust subspace learning based methods addressed the moving object detection problem with excellent performance. However, the moving objects detected by these methods are incomplete, unable to recover the occluded parts. Indeed, complete or occlusion-free moving object detection is still challenging for these methods. In the current work, we address this challenge by proposing a conditional Generative Adversarial Network (cGAN) conditioned on non-occluded moving object pixels during training. It therefore learns the subspace spanned by the moving objects covering all the dynamic variations and semantic information. During testing, our proposed Complete cGAN (CcGAN) is able to generate complete, occlusion-free moving objects in challenging conditions. The experimental evaluations of our proposed method are performed on the SABS benchmark dataset and compared with 14 state-of-the-art methods, including both robust subspace and deep learning based methods. Our experiments demonstrate the superiority of our proposed model over both types of existing methods.

Model Diagram

Paper Link:

Citation:

@inproceedings{sultana2019complete,
title={Complete moving object detection in the context of robust subspace learning},
author={Sultana, Maryam and Mahmood, Arif and Bouwmans, Thierry and Ki Jung, Soon},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops},
pages={0–0},
year={2019}
}

Do Cross Modal Systems Leverage Semantic Relationships? (ICCVW, 2019)

Abstract:

Current cross modal retrieval systems are evaluated using the R@K measure, which does not leverage semantic relationships but rather strictly follows the manually marked image-text query pairs. Therefore, current systems do not generalize well for unseen data in the wild. To handle this, we propose a new measure, SemanticMap, to evaluate the performance of cross modal systems. Our proposed measure evaluates the semantic similarity between the image and text representations in the latent embedding space. We also propose a novel cross modal retrieval system using a single stream network for bidirectional retrieval. The proposed system is based on a deep neural network trained using extended center loss, minimizing the distance of image and text descriptions in the latent space from the class centers. In our system, the text descriptions are also encoded as images, which enables us to use a single stream network for both text and images. To the best of our knowledge, our work is the first of its kind in terms of employing a single stream network for cross modal retrieval systems. The proposed system is evaluated on two publicly available datasets including MSCOCO and Flickr30K and has shown comparable results to the current state-of-the-art methods.

Model Diagram

Paper Link:

Citation:

@inproceedings{nawaz2019cross,
title={Do cross modal systems leverage semantic relationships?},
author={Nawaz, Shah and Kamran Janjua, Muhammad and Gallo, Ignazio and Mahmood, Arif and Calefati, Alessandro and Shafait, Faisal},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops},
pages={0–0},
year={2019}
}

Bag of Visual Words Approach for Classification of Benign and Malignant Masses in Mammograms Using Voting Based Feature Encoding (IWBI, 2018)

Abstract:

Classification of benign and malignant masses in mammograms is a challenging problem. It has wide applications in the development of Computer Aided Diagnosis (CAD) systems, yet many challenges still need to be addressed. Due to the risk associated with segmenting the mass region, the focus is shifting from selecting features only from the mass area to the whole Region of Interest (RoI) containing that mass. Bag of Visual Words (BoVW) techniques are gaining attention for classification tasks in medical imaging by treating an RoI as a set of local features. In general, BoVW aims to construct a global descriptor from the extracted local features. In this work, we investigate the performance of BoVW for the classification of benign and malignant mammographic masses. Several features are explored as the local features, and different methods are applied for building the codebook. We then propose a voting-based approach to encode the features. The proposed approach is evaluated on a subset of the DDSM dataset. Initial results show a classification accuracy as high as 87% and an Area Under the Curve (AUC) of 0.93, better than the current state-of-the-art approaches applied to the same problem.
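
The voting-based encoding itself is not spelled out in the abstract, so the following is only a baseline BoVW sketch under assumptions (scikit-learn, hard-assignment voting, made-up descriptor dimensions): each local descriptor in an RoI votes for its nearest visual word, and the normalized vote histogram becomes the global RoI descriptor fed to a classifier.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
train_descriptors = rng.normal(size=(1000, 32))   # stand-in local features

k = 50                                            # assumed codebook size
codebook = KMeans(n_clusters=k, n_init=10, random_state=0).fit(train_descriptors)

def encode(roi_descriptors):
    # Hard-assignment voting: histogram of nearest visual words.
    words = codebook.predict(roi_descriptors)
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()                      # L1-normalized global descriptor

roi = rng.normal(size=(120, 32))                  # descriptors from one mammogram RoI
print(encode(roi).shape)                          # -> (50,)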

Model Diagram

Paper Link:

Citation:

@inproceedings{suhail2018bag,
author = {Suhail, Zobia and Denton, Erika and Zwiggelaar, Reyer and Mahmood, Arif},
title = {Bag of visual words based approach for the classification of benign and malignant masses in mammograms using voting-based feature encoding},
booktitle = {International Workshop on Breast Imaging (IWBI)},
year = {2018},
month = {07},
pages = {2},
doi = {10.1117/12.2316307}
}

Unsupervised RGBD Video Object Segmentation Using GANs (ACCVW, 2018)

Abstract:

Video object segmentation is a fundamental step in many advanced vision applications. Most existing algorithms are based on handcrafted features such as HOG, superpixel segmentation, or texture-based techniques, while recently deep features have been found to be more effective. Existing algorithms suffer performance degradation in the presence of challenges such as illumination variations, shadows, and color camouflage. To handle these challenges, we propose a fusion-based moving object segmentation algorithm that exploits color as well as depth information using a GAN to achieve higher accuracy. Our goal is to segment moving objects in the presence of challenging background scenes, in real environments. To address this problem, the GAN is trained in an unsupervised manner on color and depth information independently, using challenging video sequences. During testing, the trained GAN generates backgrounds similar to those in the test sample. The generated background samples are then compared with the test sample to segment the moving objects. The final result is computed by fusing the object boundaries from both modalities, RGB and depth.
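
Leaving the GAN training aside, the comparison-and-fusion step can be sketched as follows. This is a hypothetical NumPy illustration in which the GAN-generated backgrounds are replaced by placeholder arrays and the thresholds are arbitrary: each modality's foreground mask comes from differencing the test frame against its generated background, and the two masks are fused by union.

import numpy as np

def foreground_mask(frame, generated_bg, thresh):
    # Per-pixel absolute difference against the generated background
    diff = np.abs(frame.astype(float) - generated_bg.astype(float))
    if diff.ndim == 3:            # RGB: take the max over color channels
        diff = diff.max(axis=2)
    return diff > thresh

rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, (64, 64, 3))       # test frame (color)
rgb_bg = rng.integers(0, 256, (64, 64, 3))    # placeholder for the GAN background (color)
depth = rng.integers(0, 256, (64, 64))        # test frame (depth)
depth_bg = rng.integers(0, 256, (64, 64))     # placeholder for the GAN background (depth)

mask_rgb = foreground_mask(rgb, rgb_bg, thresh=40)
mask_depth = foreground_mask(depth, depth_bg, thresh=25)
fused = mask_rgb | mask_depth                 # union fusion of the two modalities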

Model Diagram

Paper Link:

Citation:

@article{sultana2018unsupervised,
title={Unsupervised rgbd video object segmentation using gans},
author={Sultana, Maryam and Mahmood, Arif and Javed, Sajid and Jung, Soon Ki},
journal={arXiv preprint arXiv:1811.01526},
year={2018}
}

9. M Ghafoor and Arif Mahmood, “Quantification of Occlusion Handling Capability of 3D Human Pose Estimation Framework.” IEEE Transactions on Multimedia, 2022, (IF 8.182).

10. M Z Zaheer, Arif Mahmood, M H Khan, M Segu, F Yu, S I Lee, “Generative Cooperative Learning for Unsupervised Video Anomaly Detection”, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022.

11. S Javed, Arif Mahmood, J Dias, and N Werghi, “Multi-Level Feature Fusion for Nucleus Detection in Histology Images Using Correlation Filters” Computers in Biology and Medicine, 2022, (IF 6.698).

12. T Hassan, S Javed, Arif Mahmood, T Qaiser, N Werghi, and N Rajpoot, “Nucleus Classification in Histology Images Using Message Passing Network.” Medical Image Analysis, 2022, (IF 13.828).

13. Y Hao, Z Tang, B Alzahrani, R Alotaibi, R Alharthi, M Zhao, and Arif Mahmood, “An End-to-End Human Abnormal Behavior Recognition Framework for Crowds with Mentally Disordered Individuals.” IEEE Journal of Biomedical and Health Informatics, 2022, (IF 7.021).

14. S Aftab, S F Ali, Arif Mahmood, and U Suleman, “A Boosting Framework for Human Posture Recognition Using Spatio-Temporal Features along with Radon Transform.” Multimedia Tools and Applications, 2022 (IF 2.577).

15. S Aldhaheri, R Alotaibi, B Alzahrani, A Hadi, Arif Mahmood, A Alhothali, and A Barnawi, “MACC Net: Multi-Task Attention Crowd Counting Network”, Applied Intelligence, 2022, (IF 5.019).

16. M Sultana, Arif Mahmood, and S K Jung, “Unsupervised Moving Object Segmentation Using Background Subtraction and Optimal Adversarial Noise Sample Search”, Pattern Recognition, 2022, (IF 8.518).

17. M Ghafoor, K Javed, and Arif Mahmood, “Walk Like Me: Video to Video Action Transfer” IEEE TechRxiv, 2022.

18. S M Shakeel, Y Zhang, X Wang, W Kang, and Arif Mahmood, “Multi-Scale Attention Guided Network for End-to-End Face Alignment and Recognition” Journal of Visual Communication and Image Representation, 2022, (IF 2.887).

19. J H Giraldo, Arif Mahmood, B Garcia-Garcia, D Thanou, and T Bouwmans, “Reconstruction of Time-Varying Graph Signals via Sobolev Smoothness” IEEE Transactions on Signal and Information Processing over Networks, 2022, (IF 3.301).

20. B Yousaf, M Usama, W Sultani, Arif Mahmood, and J Qadir, “Fake Visual Content Detection Using Two-Stream Convolutional Neural Networks” Neural Computing and Applications, 2022, (IF 5.102).

21. S Javed, Arif Mahmood, I Ullah, and T Bouwmans, “A Novel Algorithm Based on a Common Subspace Fusion for Visual Object Tracking” IEEE Access, 2022, (IF 3.476).

22. R Wang, R Alotaibi, B Alzahrani, Arif Mahmood, G Wu, H Xia, A Alshehri, and S Aldhaheri, “AAC: Automatic Augmentation for Crowd Counting” Neurocomputing, 2022, (IF 5.779).

23. I Ganapathi, S Javed, S S Ali, Arif Mahmood, N S Vu, and N Werghi, “Learning to Localize Image Forgery Using End-to-End Attention Network.” Neurocomputing, 2022, (IF 5.779).

24. M Sultana, Arif Mahmood, T Bouwmans, M H Khan, and S K Jung, “Moving Objects Segmentation Using Generative Adversarial Modeling” Neurocomputing, 2022, (IF 5.779).

25. S Ali, Arif Mahmood, S K Jung, “Lightweight Encoder-Decoder Architecture for Foot Ulcer Segmentation” in International Workshop on Frontiers of Computer Vision, Japan, 2022.

26. M Fiaz, Arif Mahmood, S S Farooq, K Ali, M Shaheryar, S K Jung, “Video Object Segmentation Based on Guided Feature Transfer Learning” in International Workshop on Frontiers of Computer Vision, Japan, 2022. [Best Paper Award]

27. S Javed, Arif Mahmood, J Dias, L Seneviratne, N Werghi, “Hierarchical Spatiotemporal Graph Regularized Discriminative Correlation Filter for Visual Object Tracking”, in IEEE Transactions on Cybernetics, 2021. (IF 11.079)

28. J Iqbal, MA Munir, Arif Mahmood, AR Ali, M Ali, “Leveraging orientation for weakly supervised object detection with application to firearm localization”, Neurocomputing, 2021. (IF 4.438)

29. S Javed, Arif Mahmood, N Rajpoot, J Dias, N Werghi, “Spatially Constrained Context-Aware Hierarchical Deep Correlation Filters for Nucleus Detection in Histology Images”, Medical Image Analysis, 2021. (IF 11.48)

30. M Asim, C Brekke, Arif Mahmood, T Eltoft, M Reigstad, “Improving Chlorophyll-a Estimation from Sentinel-2 (MSI) in the Barents Sea using Machine Learning”, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2021. (IF 3.827)

31. W Abbas, MF Khan, M Taj, Arif Mahmood, “Statistically correlated multi-task learning for autonomous driving”, Neural Computing and Applications, 2021. (IF 4.774)

32. M Farooq, M N Dailey, Arif Mahmood, J Moonrinta, M Ekpanyapong, “Human face super-resolution on poor quality surveillance video footage”, Neural Computing and Applications, 2021. (IF 4.774)

33. S Nawaz, M S Saeed, P Morerio, Arif Mahmood, I Gallo, M H Yousaf, “Cross-modal Speaker Verification and Recognition: A Multilingual Perspective”, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop, 2021.

34. B Yousaf, M Usama, W Sultani, Arif Mahmood, J Qadir, “Fake Visual Content Detection Using Two-Stream Convolutional Neural Networks”, arXiv preprint arXiv:2101.00676, 2021.

35. M S Saeed, P Morerio, Arif Mahmood, I Gallo, M H Yousaf, “Cross-modal Speaker Verification and Recognition: A Multilingual Perspective”, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop, 2021.

36. M Fiaz, K Ali, S B Yun, K Y Baek, H J Lee, I S Kim, Arif Mahmood, S S Farooq, S K Jung, “Robust Tracking via Feature Enrichment and Overlap Maximization”, International Workshop on Frontiers of Computer Vision (IW-FCV) 2021.

37. MZ Zaheer, Arif Mahmood, MH Khan, M Astrid, SI Lee, “An anomaly detection system via moving surveillance robots with human collaboration” in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop, 2021.

38. M Sultana, Arif Mahmood, T Bouwmans, MH Khan, SK Jung, “Background/Foreground Separation: Guided Attention based Adversarial Modeling (GAAM) versus Robust Subspace Learning Methods”, Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop, 2021.

39. S Javed, Arif Mahmood, K Benes, N Rajpoot, “Multiplex Cellular Communities in Multi-Gigapixel Colorectal Cancer Histology Images for Tissue Phenotyping”, IEEE Transactions on Image Processing (TIP), 2020. (IF 9.340)

40. S Javed, Arif Mahmood, JM Dias, N Werghi, “Robust Structural Low-Rank Tracking” IEEE Transactions on Image Processing (TIP), 2020. (IF 9.340)

41. S Javed, Arif Mahmood, M M Fraz, N A Koohbanani, K Benes, Y W Tsang, K Hewitt, D Epstein, D Snead, N Rajpoot, “Cellular Community Detection For Tissue Phenotyping In Colorectal Cancer Histology Images”, Medical Image Analysis (MEDIA), July 2020. (IF 11.48)

42. M Sultana, Arif Mahmood, SK Jung, “Unsupervised Moving Object Detection in Complex Scenes Using Adversarial Regularizations” IEEE Transactions on Multimedia (TMM), 2020. (IF 6.051)

43. M Abdullah, W Iqbal, Arif Mahmood, F Bukhari, A Erradi, “Predictive Autoscaling of Microservices Hosted in Fog Microdata Center”, IEEE Systems Journal, 2020. (IF 3.987)

44. MZ Zaheer, Arif Mahmood, H Shin, SI Lee, “A Self-Reasoning Framework for Anomaly Detection Using Video-Level Labels”, IEEE Signal Processing Letters (SPL), 2020. (IF 3.105)

45. MZ Zaheer, Arif Mahmood, M Astrid, SI Lee, “CLAWS: Clustering Assisted Weakly Supervised Learning with Normalcy Suppression for Anomalous Event Detection”, European Conference on Computer Vision (ECCV), 2020.

46. M Fiaz, Arif Mahmood, KY Baek, SS Farooq, SK Jung, “Improving Object Tracking by Added Noise and Channel Attention” Sensors, 2020. (IF 3.275)

47. M Fiaz, Arif Mahmood, SK Jung, “Learning soft mask based feature fusion with channel and spatial attention for robust visual object tracking” Sensors 20 (14), 2020. (IF 3.275)

48. MZ Zaheer, J Lee, M Astrid, Arif Mahmood, SI Lee, “Cleaning Label Noise with Clusters for Minimally Supervised Anomaly Detection”, IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2020.

49. M Fiaz, Arif Mahmood, SK Jung, “Video Object Segmentation using Guided Feature and Directional Deep Appearance Learning”, IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (DAVIS Challenge), 2020.

50. M Sultana, Arif Mahmood, T Bouwmans, SK Jung, “Dynamic Background Subtraction using Least Squares Adversarial Learning”, IEEE International Conference on Image Processing (ICIP), 2020.

51. S Javed, Arif Mahmood, J Dias, N Werghi, “CS-RPCA: Clustered Sparse RPCA for Moving Object Detection”, IEEE International Conference on Image Processing (ICIP), 2020.

52. A Basit, MA Munir, M Ali, N Werghi, Arif Mahmood, “Localizing firearm carriers by identifying human-object pairs”, IEEE International Conference on Image Processing (ICIP), 2020.

53. M Asim, C Brekke, Arif Mahmood, T Eltoft, M Reigstad, “Ocean Color Net (OCN) for the Barents Sea”, IEEE International Geoscience and Remote Sensing Symposium (IGARSS), August 2020.

54. M Fiaz, M M Rahman, Arif Mahmood, S S Farooq, K Y Baek, S K Jung, “Adaptive Feature Selection Siamese Networks for Visual Tracking”, International Workshop on Frontiers of Computer Vision (IW-FCV), Ibusuki, Japan, January 2020. (Best Student Paper Award)

55. M Sultana, Arif Mahmood, T Bouwmans, S K Jung, “Unsupervised Adversarial Learning for Dynamic Background Modelling”, International Workshop on Frontiers of Computer Vision (IW-FCV), Ibusuki, Japan, January 2020. (Best Paper Award)

56. N Khan, A Akram, Arif Mahmood, S Ashraf, K Murtaza, “Masked Linear Regression for Learning Local Receptive Fields for Facial Expression Synthesis”, International Journal of Computer Vision (IJCV), September 2019. (IF 6.071)

57. W Iqbal, A Erradi, M Abdullah, Arif Mahmood, “Predictive Auto-scaling of Multi-tier Applications Using Performance Varying Cloud Resources”, in IEEE Transactions on Cloud Computing (TCC), September 2019. (IF 5.967)

58. M S Farid, Arif Mahmood, S Al-Maadeed, “Multi-focus Image Fusion Using Content Adaptive Blurring”, in Information Fusion, January 2019. (IF 10.716)

59. S Javed, Arif Mahmood, S Al-Maadeed, T Bouwmans, S K Jung, “Moving Object Detection in Complex Scene Using Spatiotemporal Structured-Sparse RPCA”, in IEEE Transactions on Image Processing (TIP), February 2019. (IF 6.79)

60. M Shaban, Arif Mahmood, S Al-Maadeed, N Rajpoot, “An Information Fusion Framework for Person Localization Via Body Pose in Spectator Crowds”, in Information Fusion, November 2019. (IF 10.716)

61. H Ullah, M Uzair, Arif Mahmood, H Ullah, S D Khan, F A Sheikh, “Internal Emotion Classification Using EEG Signal with Sparse Discriminative Ensemble”, in IEEE Access, March 2019. (IF 4.098)

62. M Fiaz, Arif Mahmood, S Javed, S K Jung, “Handcrafted and Deep Trackers: Recent Visual Object Tracking Approaches and Trends”, in ACM Computing Surveys, January 2019. (IF 5.55)

63. B Iqbal, W Iqbal, N Khan, Arif Mahmood, A Erradi, “Canny edge detection and Hough transform for high resolution video streams using Hadoop and Spark”, in Cluster Computing, April 2019. (IF 1.851)

64. M Sultana, Arif Mahmood, S Javed, S K Jung, “Unsupervised deep context prediction for background estimation and foreground segmentation”, in Machine Vision and Applications (MVA), April 2019. (IF 1.788)

65. A Erradi, W Iqbal, Arif Mahmood, A Bouguettaya, “Web application resource requirements estimation based on the workload latent features”, in IEEE Transactions on Services Computing (TSC), May 2019. (IF 5.707)

66. Arif Mahmood, S Al-Maadeed, “Action recognition in poor quality spectator crowd videos using head distribution based person segmentation”, in Machine Vision and Applications (MVA), June 2019. (IF 1.788)

67. M Fiaz, Arif Mahmood, S K Jung, “Using Convolutional Neural Network With Structural Input for Visual Object Tracking” in ACM/SIGAPP Symposium on Applied Computing (SAC), Cyprus, April 2019.

68. J Iqbal, M A Munir, Arif Mahmood, A R Ali, M Ali, “Orientation Aware Object Detection with Application to Firearms”, arXiv: 2662045, April 2019

69. M Fiaz, Arif Mahmood, S K Jung, “Deep Siamese networks towards robust visual tracking” in “Visual Object Tracking in the Deep Neural Networks Era”, IntechOpen Publishers, April 2019. (in press)

70. S Javed, Arif Mahmood, N Werghi, J M M Dias, “Structural Low-Rank Tracking”, IEEE International Conference on Advanced Video and Signal-based Surveillance (AVSS), Taipei, Taiwan, September 2019.

71. M Sultana, Arif Mahmood, T Bouwmans, S K Jung, “Complete Moving Object Detection in the Context of Robust Subspace Learning” in IEEE International Conference on Computer Vision (RSLCV Workshop) 2019, Seoul, South Korea.

72. S Javed, Arif Mahmood, N Werghi, N Rajpoot, “Deep Multiresolution Cellular Communities for Semantic Segmentation of Multi-Gigapixel Histology Images”, in IEEE International Conference on Computer Vision (VRMI Workshop) 2019, Seoul, South Korea.

73. S Nawaz, M K Janjua, I Gallo, Arif Mahmood, A Calefati, F Shafait, “Do Cross Modal Systems Leverage Semantic Relationships?” in IEEE International Conference on Computer Vision (CroMol Workshop) 2019, Seoul, South Korea.

74. S Nawaz, M K Janjua, I Gallo, Arif Mahmood, A Calefati, “Deep Latent Space Learning for Cross-modal Mapping of Audio and Visual Signals” in Digital Image Computing: Techniques and Applications (DICTA) 2019, Perth, Australia.

75. Arif Mahmood, M Uzair, S Al-Maadeed, “Multi-order Statistical Descriptors for Real-time Face Recognition and Object Classification”, in IEEE Access, 2018 (IF 4.098).

76. I Rida, S Al-Maadeed, Arif Mahmood, A Bouridane, S Bakshi, “Palmprint Identification Using an Ensemble of Sparse Representations”, in IEEE Access, 2018 (IF 4.098).

77. S Javed, Arif Mahmood, T Bouwmans, S K Jung, “Spatiotemporal Low-rank Modeling for Complex Scene Background Initialization”, in IEEE Transactions on Circuits and Systems for Video Technology, 2018. (IF 4.046)

78. S Ali, R Khan, Arif Mahmood, M Hassan, M Jeon, “Using temporal covariance of motion and geometric features via boosting for human fall detection”, in Sensors, 2018. (IF 3.031)

79. W Iqbal, A Erradi, Arif Mahmood, “Dynamic workload patterns prediction for proactive auto-scaling of web applications”, in Journal of Network and Computer Applications, 2018. (IF 5.273)

80. M Fiaz, Arif Mahmood, S K Jung, “Tracking Noisy Targets: A Review of Recent Object Tracking Approaches”, available online arXiv:1802.03098, 2018.

81. Z Suhail, Arif Mahmood, L Wang, P N Malcolm, and R Zwiggelaar, “A Voting-Based Encoding Technique for the Classification of Gleason Score for Prostate Cancers”, in Medical Image Understanding and Analysis (MIUA), University of Southampton, UK, July 2018.

82. Z Suhail, Arif Mahmood, E R E Denton, R Zwiggelaar, “Bag of Visual Words Approach for Classification of Benign and Malignant Masses in Mammograms Using Voting Based Feature Encoding”, in International Workshop on Breast Imaging (IWBI), Atlanta, Georgia, USA, July 2018.

83. M Sultana, Arif Mahmood, S Javed, S K Jung, “Unsupervised RGBD Video Object Segmentation Using GANs”, in ACCV Workshop on RGBD-Sensing and Understanding via Combined Color and Depth, Perth, Australia, December 2018.

Superpixels based Manifold Structured Sparse RPCA for Moving Object Detection (International Workshop on Activity Monitoring by Multiple Distributed Sensing, 2017)

Abstract:

Moving Object Detection (MOD) is a fundamental step in various computer vision and video surveillance systems. Methods based on Robust Principal Component Analysis (RPCA) have often been used for MOD. However, the accuracy of these methods deteriorates when the low-rank and sparse matrices are relatively coherent, e.g., when the moving objects resemble the background regions, or when the background is more complicated, as with dynamic scenes, camera jitter, and varying lighting conditions. This is because these methods assume that the elements in the sparse component are mutually independent, and thus ignore the spatiotemporal structure of the sparse component. To handle this problem, we propose a spatiotemporal structured-sparse RPCA algorithm for moving object detection. For this purpose, we incorporate two different manifold regularizations on the sparse component, based on the local and global invariance assumptions. Spatial and temporal graph Laplacian regularizations are encoded in the form of spectral graph structure. Both graphs are constructed using multiple features extracted from superpixels computed over the input data matrix. We propose a novel objective function to disentangle moving objects in the presence of complicated backgrounds. We evaluate our algorithm on challenging videos taken from six different datasets, including dynamic backgrounds, lighting conditions, and camera jitter sequences. Our experiments demonstrate excellent results compared to current methods.
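
A representative form of such an objective (symbols and weights assumed here, not taken verbatim from the paper) decomposes the data matrix X into a low-rank background B and a structured-sparse foreground S, with spatial and temporal graph Laplacians L_s and L_t built from the superpixel features:

\min_{B,S} \; \|B\|_* + \lambda \|S\|_1
  + \gamma_1 \operatorname{tr}\!\left(S^{\top} L_s S\right)
  + \gamma_2 \operatorname{tr}\!\left(S L_t S^{\top}\right)
\quad \text{s.t.} \quad X = B + S,

where \|\cdot\|_* is the nuclear norm promoting a low-rank background, \|\cdot\|_1 promotes sparsity of the moving-object component, and the two trace terms encourage spatially and temporally smooth support of S over the respective graphs.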

Paper Link:

Citation:

@inproceedings{javed:hal-01580053,
title = {Superpixels based Manifold Structured Sparse RPCA for Moving Object Detection},
author = {Javed, Sajid and Mahmood, Arif and Bouwmans, Thierry and Jung, Soon Ki},
url = {https://hal.science/hal-01580053},
booktitle = {International Workshop on Activity Monitoring by Multiple Distributed Sensing},
address = {London, United Kingdom},
year = {2017}
}