Research

Recent Publications

1. Arif Mahmood, A Basit, M A Munir, M Ali, “Detection and Localization of Firearm Carriers in Complex Scenes for Improved Safety Measures”, accepted in IEEE Transactions on Computational Social Systems (TCSS), Sep. 2023. (IF 5.0)

2. H Yaseen, Arif Mahmood, “Learning Structure Aware Deep Spectral Embedding”, accepted in IEEE Transactions on Image Processing (TIP), May 2023. (IF 11.042)

3. M Z Zaheer, Arif Mahmood, M Astrid, S I Lee, “Clustering Aided Weakly Supervised Training to Detect Anomalous Events in Surveillance Videos”, accepted in IEEE Transactions on Neural Networks and Learning Systems (TNNLS), May 2023. (IF 14.225)

4. S Javed, Arif Mahmood, T Qaiser, N Werghi, “Knowledge Distillation in Histology Landscape by Multi-Layer Features Supervision”, in IEEE Journal of Biomedical and Health Informatics (JBHI), April 2023. (IF 7.021)

5. S Aldhaheri, R Alotaibi, B Alzahrani, A Hadi, Arif Mahmood, A Alhothali, A Barnawi, “MACC Net: Multi-task Attention Crowd Counting Network”, in Applied Intelligence, 2023. (IF 5.019)

6. M S Saeed, S Nawaz, M H Khan, M Z Zaheer, K Nandakumar, M H Yousaf, Arif Mahmood, “Single-branch Network for Multimodal Training”, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.

7. J H Giraldo, S Javed, Arif Mahmood, F D Malliaros, T Bouwmans, “Higher-Order Sparse Convolutions in Graph Neural Networks”, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.

8. M Awan, M H Khan, S Baliah, M A Waseem, S Khan, F S Khan, Arif Mahmood, “Unsupervised Landmark Discovery Using Consistency-Guided Bottleneck”, in British Machine Vision Conference (BMVC), 2023.

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment (IEEE/CVF CVPR, 2024)

Abstract:

This paper proposes Comprehensive Pathology Language Image Pre-training (CPLIP), a new unsupervised technique designed to enhance the alignment of images and text in histopathology for tasks such as classification and segmentation. This methodology enriches vision-language models by leveraging extensive data without needing ground truth annotations. CPLIP involves constructing a pathology-specific dictionary, generating textual descriptions for images using language models, and retrieving relevant images for each text snippet via a pre-trained model. The model is then fine-tuned using a many-to-many contrastive learning method to align complex interrelated concepts across both modalities. Evaluated across multiple histopathology tasks, CPLIP shows notable improvements in zero-shot learning scenarios, outperforming existing methods in both interpretability and robustness and setting a higher benchmark for the application of vision-language models in the field.
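As context for the many-to-many alignment objective, the sketch below shows the standard one-to-one symmetric contrastive loss (CLIP-style) that CPLIP generalizes. It is illustrative only; the function name, temperature, and shapes are placeholders, not the paper's implementation.

import torch
import torch.nn.functional as F

def symmetric_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # Cosine similarities between every image and every text in the batch.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    # Cross-entropy in both directions; CPLIP instead aligns many texts with
    # many images per concept (many-to-many) rather than this one-to-one form.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))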

Model Diagram

Paper Link:

Citation:

@INPROCEEDINGS{sajid2024cplip,
title={CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment},
author={Sajid Javed and Arif Mahmood and Iyyakutti Iyappan Ganapathi and Fayaz Ali Dharejo and Naoufel Werghi and Mohammed Bennamoun},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}

DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models (IEEE/CVF CVPR, 2024)

Abstract:

Recently, a number of image-mixing-based augmentation techniques have been introduced to improve the generalization of deep neural networks. In these techniques, two or more randomly selected natural images are mixed together to generate an augmented image. Such methods may not only omit important portions of the input images but also introduce label ambiguities by mixing images across labels, resulting in misleading supervisory signals. To address these limitations, we propose DIFFUSEMIX, a novel data augmentation technique that leverages a diffusion model to reshape training images, supervised by our bespoke conditional prompts. First, a concatenation of a partial natural image and its generated counterpart is obtained, which helps in avoiding the generation of unrealistic images or label ambiguities. Then, to enhance resilience against adversarial attacks and improve safety measures, a randomly selected structural pattern from a set of fractal images is blended into the concatenated image to form the final augmented image for training. Our empirical results on seven different datasets reveal that DIFFUSEMIX achieves superior performance compared to existing state-of-the-art methods on tasks including general classification, fine-grained classification, fine-tuning, data scarcity, and adversarial robustness.
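A minimal sketch of the two steps described above, assuming `natural`, `generated`, and `fractal` are PIL images holding a training sample, its diffusion-generated counterpart, and a fractal pattern; the half-split and blend weight are illustrative choices, not the authors' code.

import numpy as np
from PIL import Image

def diffusemix_style_augment(natural, generated, fractal, lam=0.2):
    nat = np.asarray(natural, dtype=np.float32)
    gen = np.asarray(generated, dtype=np.float32)
    h, w = nat.shape[:2]
    frac = np.asarray(fractal.resize((w, h)), dtype=np.float32)
    # Keep one half of the natural image and one half of its generated
    # counterpart, so part of the original (label-defining) content survives.
    mixed = nat.copy()
    mixed[:, w // 2:] = gen[:, w // 2:]
    # Blend a randomly chosen fractal structural pattern into the result.
    out = (1.0 - lam) * mixed + lam * frac
    return Image.fromarray(np.clip(out, 0, 255).astype(np.uint8))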

Model Diagram

Paper Link:

Citation:

@INPROCEEDINGS{diffuseMix2024,
title={DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models},
author={Khawar Islam and Muhammad Zaigham Zaheer and Arif Mahmood and Karthik Nandakumar},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}}

Pose-Guided Self-Training with Two-Stage Clustering for Unsupervised Landmark Discovery (IEEE/CVF CVPR, 2024)

Abstract:

Unsupervised landmark discovery (ULD) for an object category is a challenging computer vision problem. In pursuit of developing a robust ULD framework, we explore the potential of a recent paradigm of self-supervised learning algorithms, known as diffusion models. Some recent works have shown that these models implicitly contain important correspondence cues. Towards harnessing the potential of diffusion models for the ULD task, we make the following core contributions. First, we propose a ZeroShot ULD baseline based on simple clustering of random pixel locations with nearest neighbour matching. It delivers better results than existing ULD methods. Second, motivated by the ZeroShot performance, we develop a ULD algorithm based on diffusion features using self-training and clustering which also outperforms prior methods by notable margins. Third, we introduce a new proxy task based on generating latent pose codes and also propose a two-stage clustering mechanism to facilitate effective pseudo-labeling, resulting in a significant performance improvement. Overall, our approach consistently outperforms state-of-the-art methods on four challenging benchmarks, AFLW, MAFL, CatHeads and LS3D, by significant margins.

Model Diagram

Paper Link:

Citation:

@INPROCEEDINGS{DULD2024,
title={Pose-Guided Self-Training with Two-Stage Clustering for Unsupervised Landmark Discovery},
author={S Tourani and A Alwheibi and Arif Mahmood and M H Khan},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}}

Generative Cooperative Learning for Unsupervised Video Anomaly Detection (IEEE/CVF CVPR, 2022)

Abstract:

Video anomaly detection is well investigated in weakly supervised and one-class classification (OCC) settings. However, unsupervised video anomaly detection methods are quite sparse, likely because anomalies are less frequent in occurrence and usually not well-defined, which when coupled with the absence of ground truth supervision, could adversely affect the performance of the learning algorithms. This problem is challenging yet rewarding as it can completely eradicate the costs of obtaining laborious annotations and enable such systems to be deployed without human intervention. To this end, we propose a novel unsupervised Generative Cooperative Learning (GCL) approach for video anomaly detection that exploits the low frequency of anomalies towards building a cross-supervision between a generator and a discriminator. In essence, both networks get trained in a cooperative fashion, thereby allowing unsupervised learning. We conduct extensive experiments on two large-scale video anomaly detection datasets, UCF-Crime and ShanghaiTech. Consistent improvement over the existing state-of-the-art unsupervised and OCC methods corroborates the effectiveness of our approach.
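The cross-supervision idea can be sketched as below, assuming G is a feature autoencoder and D a scoring network (hypothetical modules); the paper's negative learning and thresholding details are omitted, so treat this only as the shape of one cooperative step.

import torch
import torch.nn.functional as F

def cooperative_step(G, D, feats, g_opt, d_opt, q=0.8):
    # 1) The generator's reconstruction error provides pseudo-labels for D:
    with torch.no_grad():
        err = F.mse_loss(G(feats), feats, reduction='none').mean(dim=1)
        pseudo = (err > err.quantile(q)).float()      # high error -> anomalous
    d_loss = F.binary_cross_entropy(D(feats).squeeze(1), pseudo)
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()
    # 2) The discriminator's scores select pseudo-normal samples to train G:
    with torch.no_grad():
        keep = D(feats).squeeze(1) < 0.5
    if keep.any():
        sub = feats[keep]
        g_loss = F.mse_loss(G(sub), sub)
        g_opt.zero_grad(); g_loss.backward(); g_opt.step()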

Model Diagram

Paper Link:

Citation:

@inproceedings{GCL2022,
title={Generative Cooperative Learning for Unsupervised Video Anomaly Detection},
author={M Z Zaheer and Arif Mahmood and M H Khan and M Segu and F Yu and S Lee},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2022}}

Semi-supervised Spectral Clustering for Image Set Classification (IEEE/CVF CVPR, 2014)

Abstract:

We present an image set classification algorithm based on unsupervised clustering of labeled training and unlabeled test data where labels are only used in the stopping criterion. The probability distribution of each class over the set of clusters is used to define a true set based similarity measure. To this end, we propose an iterative sparse spectral clustering algorithm. In each iteration, a proximity matrix is efficiently recomputed to better represent the local subspace structure. Initial clusters capture the global data structure and finer clusters at the later stages capture the subtle class differences not visible at the global scale. Image sets are compactly represented with multiple Grassmannian manifolds which are subsequently embedded in Euclidean space with the proposed spectral clustering algorithm. We also propose an efficient eigenvector solver which not only reduces the computational cost of spectral clustering by many folds but also improves the clustering quality and final classification results. Experiments on five standard datasets and comparison with seven existing techniques show the efficacy of our algorithm.
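A toy version of the stopping criterion, using scikit-learn in place of the paper's iterative sparse spectral clustering and proximity-matrix updates: clusters are refined until every cluster is pure with respect to the labeled training points, which are otherwise never used.

import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_until_pure(X, y_train, n_train, k=2, k_max=64):
    # X stacks labeled training and unlabeled test points; labels enter
    # only the stopping test, never the clustering itself.
    assign = None
    while k <= k_max:
        assign = SpectralClustering(n_clusters=k,
                                    affinity='nearest_neighbors').fit_predict(X)
        train_assign = assign[:n_train]
        pure = all(len(np.unique(y_train[train_assign == c])) <= 1
                   for c in np.unique(train_assign))
        if pure:
            break
        k *= 2          # finer clusters expose subtler class differences
    return assign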

Model Diagram

Paper Link:

Citation:

@INPROCEEDINGS {6909417,
author = {Arif Mahmood and A. Mian and R. Owens},
booktitle = {2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
title = {Semi-supervised Spectral Clustering for Image Set Classification},
year = {2014},
issn = {1063-6919},
pages = {121-128}}

European Conference on Computer Vision (ECCV)

CLAWS: Clustering Assisted Weakly Supervised Learning with Normalcy Suppression for Anomalous Event Detection (ECCV, 2020)

Abstract:

Learning to detect real-world anomalous events through video level labels is a challenging task due to the rare occurrence of anomalies as well as noise in the labels. In this work, we propose a weakly supervised anomaly detection method which has manifold contributions including 1) a random batch based training procedure to reduce inter-batch correlation, 2) a normalcy suppression mechanism to minimize anomaly scores of the normal regions of a video by taking into account the overall information available in one training batch, and 3) a clustering distance based loss to contribute towards mitigating the label noise and to produce better anomaly representations by encouraging our model to generate distinct normal and anomalous clusters. The proposed method obtains 83.03% and 89.67% frame-level AUC performance on the UCF-Crime and ShanghaiTech datasets respectively, demonstrating its superiority over the existing state-of-the-art algorithms.
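Contribution 2) can be pictured with a tiny module: weights computed over all segments available in a training batch damp the features of the (dominant) normal regions. This is a simplified sketch under assumed shapes, not the released implementation.

import torch
import torch.nn as nn

class NormalcySuppression(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):               # feats: (num_segments, dim), one batch
        # Softmax over every segment in the batch: because normal segments
        # dominate, each receives a small weight, suppressing its response.
        w = torch.softmax(self.score(feats).squeeze(1), dim=0)
        return feats * w.unsqueeze(1)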

Model Diagram

Paper Link:

Citation:

@inproceedings{zaheer2020claws,
title={CLAWS: Clustering Assisted Weakly Supervised Learning with Normalcy Suppression for Anomalous Event Detection},
author={Zaheer, Muhammad Zaigham and Mahmood, Arif and Astrid, Marcella and Lee, Seung-Ik},
booktitle={European Conference on Computer Vision},
pages={358--376},
year={2020},
organization={Springer}}

HOPC: Histogram of oriented principal components of 3D point clouds for action recognition (ECCV, 2014)

Abstract:

Existing techniques for 3D action recognition are sensitive to viewpoint variations because they extract features from depth images which change significantly with viewpoint. In contrast, we directly process the point clouds and propose a new technique for action recognition which is more robust to noise, action speed and viewpoint variations. Our technique consists of a novel descriptor and keypoint detection algorithm. The proposed descriptor is extracted at a point by encoding the Histogram of Oriented Principal Components (HOPC) within an adaptive spatio-temporal support volume around that point. Based on this descriptor, we present a novel method to detect Spatio-Temporal Key-Points (STKPs) in 3D point cloud sequences. Experimental results show that the proposed descriptor and STKP detector outperform state-of-the-art algorithms on three benchmark human activity datasets. We also introduce a new multiview public dataset and show the robustness of our proposed method to viewpoint variations.
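A simplified numpy sketch of the descriptor idea: eigenvectors of the covariance of a local support volume, weighted by their eigenvalues and quantized onto a fixed set of directions (`bins`, e.g. vertices of a regular polyhedron). Sign disambiguation and the adaptive spatio-temporal support of the paper are omitted.

import numpy as np

def hopc_at_point(points, center, radius, bins):
    # points: (n, 3) cloud; bins: (m, 3) unit direction vectors.
    nbrs = points[np.linalg.norm(points - center, axis=1) < radius]
    cov = np.cov((nbrs - nbrs.mean(axis=0)).T)
    evals, evecs = np.linalg.eigh(cov)              # ascending order
    hist = []
    for lam, v in zip(evals[::-1], evecs.T[::-1]):  # principal component first
        votes = np.maximum(bins @ v, 0.0) * lam     # vote into aligned directions
        hist.append(votes)
    h = np.concatenate(hist)
    return h / (np.linalg.norm(h) + 1e-9)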

Model Diagram

Paper Link:

Citation:

@inproceedings{rahmani2014hopc,
title={HOPC: Histogram of oriented principal components of 3D point clouds for action recognition},
author={Rahmani, Hossein and Mahmood, Arif and Huynh, Du Q and Mian, Ajmal},
booktitle={European Conference on Computer Vision},
pages={742--757},
year={2014},
organization={Springer}}

British Machine Vision Conference (BMVC)

Unsupervised Landmark Discovery Using Consistency-Guided Bottleneck (BMVC, 2023)

Abstract:

We study a challenging problem of unsupervised discovery of object landmarks. Many recent methods rely on bottlenecks to generate 2D Gaussian heat maps however, these are limited in generating informed heatmaps while training, presumably due to the lack of effective structural cues. Also, it is assumed that all predicted landmarks are semantically relevant despite having no ground truth supervision. In the current work, we introduce a consistency-guided bottleneck in an image reconstruction-based pipeline that leverages landmark consistency – a measure of compatibility score with the pseudo ground truth – to generate adaptive heatmaps. We propose obtaining pseudo-supervision via forming landmark correspondence across images. The consistency then modulates the uncertainty of the discovered landmarks in the generation of adaptive heatmaps which rank consistent landmarks above their noisy counterparts, providing effective structural information for improved robustness. Evaluations on five diverse datasets including MAFL, AFLW, LS3D, Cats, and Shoes demonstrate excellent performance of the proposed approach compared to the existing state-of-the-art methods. Our code is publicly available at https://github.com/MamonaAwan/CGB_ULD.
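The role of consistency in heatmap generation can be pictured with a small sketch: each landmark's Gaussian is sharpened or broadened by its compatibility score, so consistent landmarks dominate the reconstruction signal. The shapes and the sigma rule here are assumptions for illustration only.

import numpy as np

def adaptive_heatmaps(landmarks, consistency, size=64, base_sigma=2.0):
    # landmarks: (K, 2) array of (x, y); consistency: (K,) scores in (0, 1].
    ys, xs = np.mgrid[0:size, 0:size]
    maps = []
    for (x, y), c in zip(landmarks, consistency):
        sigma = base_sigma / max(c, 1e-3)   # low consistency -> diffuse, uncertain
        maps.append(np.exp(-((xs - x)**2 + (ys - y)**2) / (2 * sigma**2)))
    return np.stack(maps)                   # (K, size, size)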

Model Diagram

Paper Link:

Citation:

@InProceedings{awan2023unsupervised,
title={Unsupervised Landmark Discovery Using Consistency-Guided Bottleneck},
author={Mamona Awan and Muhammad Haris Khan and Sanoojan Baliah and Muhammad Ahmad Waseem and Salman Khan and Fahad Shahbaz Khan and Arif Mahmood},
booktitle={Proceedings of the British Machine Vision Conference},
year={2023},
}

Face Pyramid Vision Transformer (BMVC, 2022)

Abstract:

A novel Face Pyramid Vision Transformer (FPVT) is proposed to learn discriminative multi-scale facial representations for face recognition and verification. In FPVT, Face Spatial Reduction Attention (FSRA) and Dimensionality Reduction (FDR) layers are employed to make the feature maps compact, thus reducing the computations. An Improved Patch Embedding (IPE) algorithm is proposed to exploit the benefits of CNNs in ViTs (e.g., shared weights, local context, and receptive fields) to model lower-level edges to higher-level semantic primitives. Within the FPVT framework, a Convolutional Feed-Forward Network (CFFN) is proposed that extracts locality information to learn low-level facial information. The proposed FPVT is evaluated on seven benchmark datasets and compared with ten existing state-of-the-art methods, including CNNs, pure ViTs, and Convolutional ViTs. Despite fewer parameters, FPVT has demonstrated excellent performance over the compared methods. The project page is available at https://khawar-islam.github.io/fpvt/.
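As a rough illustration of spatial-reduction attention (in the spirit of FSRA, though not the paper's exact layer), the keys and values can be downsampled before attention so the cost drops with the reduction ratio:

import torch
import torch.nn as nn

class SpatialReductionAttention(nn.Module):
    def __init__(self, dim, heads=8, sr_ratio=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Strided conv shrinks the key/value token grid by sr_ratio per side.
        self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, h, w):              # x: (B, h*w, dim)
        b, n, d = x.shape
        kv = self.sr(x.transpose(1, 2).reshape(b, d, h, w))
        kv = self.norm(kv.flatten(2).transpose(1, 2))   # fewer key/value tokens
        out, _ = self.attn(x, kv, kv, need_weights=False)
        return out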

Model Diagram

Paper Link:

Citation:

@InProceedings{Khawar_BMVC22_FPVT,
author = {Khawar Islam and Muhammad Zaigham Zaheer and Arif Mahmood},
title = {Face Pyramid Vision Transformer},
booktitle = {Proceedings of the British Machine Vision Conference},
year = {2022}
}

Hyperspectral Face Recognition using 3D-DCT and Partial Least Squares (BMVC, 2013)

Abstract:

Hyperspectral imaging offers new opportunities for inter-person facial discrimination. However, compact and discriminative feature extraction from high dimensional hyperspectral image cubes is a challenging task. We propose a spatio-spectral feature extraction method based on the 3D Discrete Cosine Transform (3D-DCT). The 3D-DCT optimally compacts information in the low frequency coefficients. Therefore, we represent each hyperspectral facial cube by a small number of low frequency DCT coefficients and formulate Partial Least Square (PLS) regression for accurate classification. The proposed algorithm is evaluated on three standard hyperspectral face databases. Experimental results show that the proposed algorithm outperforms five current state of the art hyperspectral face recognition algorithms by a significant margin.

Model Diagram

Paper Link:

Citation:

@inproceedings{uzair2013hyperspectral,
title={Hyperspectral Face Recognition using 3D-DCT and Partial Least Squares},
author={Muhammad Uzair and Arif Mahmood and Ajmal Mian},
year={2013},
booktitle={Proceedings of the British Machine Vision Conference},
publisher={BMVA Press}}

Hierarchical Sparse Spectral Clustering for Image Set Classification (BMVC, 2012)

Abstract:

We present a structural matching technique for robust classification based on image sets. In set based classification, a probe set is matched with a number of gallery sets and assigned the label of the most similar set. We represent each image set by a sparse dictionary and compute a similarity matrix by matching all the dictionary atoms of the gallery and probe sets. The similarity matrix comprises the sparse coding coefficients and forms a fully connected directed graph. The nodes of the graph are the dictionary atoms and the edges are the sparse coefficients. The graph is converted to an undirected graph with positive edge weights and spectral clustering is used to cut the graph into two balanced partitions using the normalized cut algorithm. This process is repeated until the graph reduces to critical and non-critical partitions. A critical partition contains atoms with the same gallery label along with one or more probe atoms whereas a noncritical partition either consists of only probe atoms or atoms with multiple gallery labels with no probe atom. Using the critical partitions, we define a novel set based similarity measure and assign the probe set the label of the gallery set with maximum similarity. The proposed algorithm is applied to image set based face recognition using two standard databases.

Model Diagram

Paper Link:

Citation:

@inproceedings{mahmood2012hierarchical,
title={Hierarchical Sparse Spectral Clustering for Image Set Classification},
author={Mahmood, Arif and Mian, Ajmal S},
booktitle={Proceedings of the British Machine Vision Conference},
pages={1--11},
year={2012}}

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Single-branch Network for Multimodal Training (ICASSP, 2023)

Abstract:

With the rapid growth of social media platforms, users are sharing billions of multimedia posts containing audio, images, and text. Researchers have focused on building autonomous systems capable of processing such multimedia data to solve challenging multimodal tasks including cross-modal retrieval, matching, and verification. Existing works use separate networks to extract embeddings of each modality to bridge the gap between them. The modular structure of their branched networks is fundamental in creating numerous multimodal applications and has become a de facto standard to handle multiple modalities. In contrast, we propose a novel single-branch network capable of learning discriminative representations for unimodal as well as multimodal tasks without changing the network. An important feature of our single-branch network is that it can be trained either using single or multiple modalities without sacrificing performance. We evaluated our proposed single-branch network on the challenging multimodal problem (face-voice association) for cross-modal verification and matching tasks with various loss formulations. Experimental results demonstrate the superiority of our proposed single-branch network over the existing methods in a wide range of experiments. Code: https://github.com/msaadsaeed/SBNet
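The single-branch idea, sketched with placeholder dimensions: light modality-specific projections feed one shared trunk, so the same network trains on either modality alone or on both. This is an assumed minimal layout, not the released SBNet architecture.

import torch
import torch.nn as nn

class SingleBranch(nn.Module):
    def __init__(self, face_dim=512, voice_dim=192, dim=256):
        super().__init__()
        self.proj = nn.ModuleDict({'face': nn.Linear(face_dim, dim),
                                   'voice': nn.Linear(voice_dim, dim)})
        # One shared branch replaces the usual two-branch design.
        self.trunk = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                   nn.Linear(dim, dim))

    def forward(self, x, modality):          # modality in {'face', 'voice'}
        return self.trunk(self.proj[modality](x))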

Model Diagram

Paper Link:

Citation:

@INPROCEEDINGS{10097207,
author={Saeed, Muhammad Saad and Nawaz, Shah and Khan, Muhammad Haris and Zaigham Zaheer, Muhammad and Nandakumar, Karthik and Yousaf, Muhammad Haroon and Mahmood, Arif},
booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Single-branch Network for Multimodal Training},
year={2023},
pages={1-5},
doi={10.1109/ICASSP49357.2023.10097207}}

Higher-Order Sparse Convolutions In Graph Neural Networks (ICASSP, 2023)

Abstract:

Graph Neural Networks (GNNs) have been applied to many problems in computer science. Capturing higher-order relationships between nodes is crucial to increase the expressive power of GNNs. However, existing methods to capture these relationships could be infeasible for large-scale graphs. In this work, we introduce a new higher-order sparse convolution based on the Sobolev norm of graph signals. Our Sparse Sobolev GNN (S-SobGNN) computes a cascade of filters on each layer with increasing Hadamard powers to get a more diverse set of functions, and then a linear combination layer weights the embeddings of each filter. We evaluate S-SobGNN in several applications of semi-supervised learning. S-SobGNN shows competitive performance in all applications as compared to several state-of-the-art methods.
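The cascade of Hadamard powers is what keeps the filters sparse, and hence feasible at scale; a rough scipy sketch follows, with the learned linear-combination layer replaced by a plain mean and the shift parameter chosen arbitrarily.

import numpy as np
import scipy.sparse as sp

def sparse_sobolev_cascade(A, X, k=3, eps=1.0):
    # A: sparse (n, n) adjacency; X: dense (n, f) node features.
    S = (A + eps * sp.eye(A.shape[0])).tocsr()   # shifted operator
    P, outs = S.copy(), []
    for _ in range(k):
        outs.append(P @ X)         # filter output at this Hadamard power
        P = P.multiply(S).tocsr()  # element-wise power: sparsity is preserved
    return np.mean(outs, axis=0)   # stand-in for the learned combination layer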

Model Diagram

Paper Link:

Citation:

@INPROCEEDINGS{10096494,
author={Giraldo, Jhony H. and Javed, Sajid and Mahmood, Arif and Malliaros, Fragkiskos D. and Bouwmans, Thierry},
booktitle={ICASSP 2023 – 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Higher-Order Sparse Convolutions in Graph Neural Networks},
year={2023},
pages={1-5},
doi={10.1109/ICASSP49357.2023.10096494}}

IEEE International Conference on Image Processing (ICIP)

Dynamic Background Subtraction using Least Squares Adversarial Learning (IEEE ICIP, 2020)

Abstract:

Dynamic Background Subtraction (BS) is a fundamental problem in many vision-based applications. BS in real complex environments faces several challenging conditions such as illumination variations, shadows, camera jitter, and bad weather. In this study, we aim to address the challenges of BS in complex scenes by exploiting conditional least squares adversarial networks. During training, a scene-specific conditional least squares adversarial network with two additional regularizations, L1-Loss and Perceptual-Loss, is employed to learn the dynamic background variations. The model takes as input video frames conditioned on the corresponding ground truth to learn the dynamic changes in complex scenes. Afterwards, testing is performed on unseen video frames so that the generator performs dynamic background subtraction. The proposed method, consisting of three loss terms (least squares adversarial loss, L1-Loss, and Perceptual-Loss), is evaluated on two benchmark datasets, CDnet2014 and BMC. The results of our proposed method show improved performance on both datasets compared with 10 existing state-of-the-art methods.
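The three loss terms combine roughly as below (a sketch: D is the conditional discriminator, `vgg` any fixed feature extractor for the perceptual term, and the weights are placeholders rather than the paper's settings).

import torch
import torch.nn.functional as F

def generator_loss(D, vgg, frame, fake_bg, real_bg, l1_w=100.0, perc_w=10.0):
    pred = D(frame, fake_bg)
    adv = F.mse_loss(pred, torch.ones_like(pred))   # least-squares GAN term
    l1 = F.l1_loss(fake_bg, real_bg)                # L1-Loss
    perc = F.l1_loss(vgg(fake_bg), vgg(real_bg))    # Perceptual-Loss
    return adv + l1_w * l1 + perc_w * perc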

Model Diagram

Paper Link:

Citation:

@INPROCEEDINGS{9191235,
author={Sultana, Maryam and Mahmood, Arif and Bouwmans, Thierry and Jung, Soon Ki},
booktitle={2020 IEEE International Conference on Image Processing (ICIP)},
title={Dynamic Background Subtraction Using Least Square Adversarial Learning},
year={2020},
pages={3204-3208},
doi={10.1109/ICIP40778.2020.9191235}}

CS-RPCA: Clustered Sparse RPCA for Moving Object Detection (ICIP, 2020)

Abstract:

Moving object detection (MOD) is an important step for many computer vision applications. Over the last decade, RPCA has proven to be a potential solution for MOD, achieving promising performance under various challenging background scenes. However, because of the lack of different types of features, RPCA still shows degraded performance in many complicated background scenes such as dynamic backgrounds, cluttered foreground objects, and camouflage. To address these problems, this paper presents a Clustered Sparse RPCA (CS-RPCA) for MOD under challenging environments. The proposed algorithm extracts multiple features from video sequences and then employs RPCA to get the low-rank and sparse components from each representation. The sparse subspaces are then merged into a common sparse component using the Grassmann manifold. We propose a novel objective function that computes the composite sparse component from multiple representations and solve it using a non-negative matrix factorization method. The proposed algorithm is evaluated on two challenging datasets for MOD. Results demonstrate excellent performance of the proposed algorithm as compared to existing state-of-the-art methods.

Paper Link:

Citation:

@INPROCEEDINGS{9190734,
author={Javed, Sajid and Mahmood, Arif and Dias, Jorge and Werghi, Naoufel},
booktitle={2020 IEEE International Conference on Image Processing (ICIP)},
title={CS-RPCA: Clustered Sparse RPCA for Moving Object Detection},
year={2020},
pages={3209-3213},
doi={10.1109/ICIP40778.2020.9190734}}

Localizing Firearm Carriers by Identifying Human-Object Pairs (ICIP, 2020)

Abstract:

Visual identification of gunmen in a crowd is a challenging problem that requires resolving the association of a person with an object (firearm). We present a novel approach to address this problem by defining human-object interaction (and non-interaction) bounding boxes. In a given image, humans and firearms are detected separately. Each detected human is paired with each detected firearm, allowing us to create a paired bounding box that contains both the object and the human. A network is trained to classify these paired bounding boxes according to whether the human is carrying the identified firearm or not. Extensive experiments were performed to evaluate the effectiveness of the algorithm, including exploiting the full pose of the human, hand keypoints, and their association with the firearm. The knowledge of spatially localized features is key to the success of our method by using multi-size proposals with adaptive average pooling. We have also extended a previously existing firearm detection dataset by adding more images and tagging the human-firearm pairs (including bounding boxes for firearms and gunmen) in the extended dataset. The experimental results (78.5 AP-hold) demonstrate the effectiveness of the proposed method.
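The pairing step itself is simple; below is a sketch of the enclosing-box construction that feeds the interaction classifier, with boxes assumed to be (x1, y1, x2, y2) tuples.

import numpy as np

def paired_boxes(humans, firearms):
    # Every detected human is paired with every detected firearm; the union
    # box containing both is what the network later classifies as
    # interaction (carrier) or non-interaction.
    pairs = []
    for hx1, hy1, hx2, hy2 in humans:
        for fx1, fy1, fx2, fy2 in firearms:
            pairs.append((min(hx1, fx1), min(hy1, fy1),
                          max(hx2, fx2), max(hy2, fy2)))
    return np.array(pairs)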

Model Diagram

Paper Link:

Citation:

@INPROCEEDINGS{9190886,
author={Basit, Abdul and Munir, Muhammad Akhtar and Ali, Mohsen and Werghi, Naoufel and Mahmood, Arif},
booktitle={2020 IEEE International Conference on Image Processing (ICIP)},
title={Localizing Firearm Carriers By Identifying Human-Object Pairs},
year={2020},
pages={2031-2035}}

IEEE International Geoscience and Remote Sensing Symposium (IGARSS)

Ocean Color Net (OCN) for the Barents Sea (IEEE IGARSS, 2020)

Abstract:

Over recent years, rapid environmental changes in the Arctic and subarctic regions have caused significant alterations in the ecosystem structure and seasonality, including the primary productivity of the Barents Sea. This work aims at improving methodology for studying these features, by estimating chlorophyll-a (chl-a) concentrations in the transitional Barents Sea by remotely sensing its optical properties, in order to better understand the large-scale algal bloom dynamics in the region. The in-situ measurements of chl-a are collected from the year 2016 to 2018 over a wide area of the Barents Sea to cover the spatial and temporal variations in chl-a concentration. Optical images of the Barents Sea are captured by the Multi-Spectral Imager Instrument on Sentinel-2. Using these remotely sensed optical images and the in-situ measurements, we propose a match-up dataset creation method based on the distribution of the remotely sensed reflectance spectra. The Ocean Color Net (OCN) regression model proposed in this study has outperformed other ML-based techniques.

Model Diagram

Paper Link:

Citation:

@INPROCEEDINGS{9323687,
author={Asim, Muhammad and Brekke, Camilla and Mahmood, Arif and Eltoft, Torbjørn and Reigstad, Marit},
booktitle={IGARSS 2020 – 2020 IEEE International Geoscience and Remote Sensing Symposium},
title={Ocean Color Net (OCN) for the Barents Sea},
year={2020},
pages={5881-5884},
doi={10.1109/IGARSS39084.2020.9323687}}

IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)

Structural Low-Rank Tracking (AVSS, 2019)

Abstract:

Visual object tracking is an important step for many computer vision applications. The task becomes very challenging when the target undergoes heavy occlusion, background clutter, and sudden illumination variations. Methods that incorporate sparse representation and low-rank assumptions on the target particles have achieved promising results. However, because of the lack of structural constraints, these methods show performance degradation when an object faces the aforementioned challenges. To alleviate these limitations, we propose a new structural low-rank modeling algorithm for robust object tracking. In the proposed algorithm, we enforce local spatial, global spatial, and temporal appearance consistency among the particles in the low-rank subspace by constructing three graphs. The Laplacian matrices of these graphs are incorporated into the novel low-rank objective function, which is solved using a linearized alternating direction method with an adaptive penalty.

Model Diagram

Paper Link:

Citation:

@INPROCEEDINGS{8909852,
author={Javed, Sajid and Mahmood, Arif and Dias, Jorge and Werghi, Naoufel},
booktitle={2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)},
title={Structural Low-Rank Tracking},
year={2019},
pages={1-8},
doi={10.1109/AVSS.2019.8909852}}

IEEE Transactions on Image Processing (TIP)

Learning Structure Aware Deep Spectral Embedding (TIP, 2023)

Abstract:

Spectral Embedding (SE) has often been used to map data points from non-linear manifolds to linear subspaces for the purpose of classification and clustering. Despite significant advantages, the subspace structure of data in the original space is not preserved in the embedding space. To address this issue, subspace clustering has been proposed, replacing the SE graph affinity with a self-expression matrix. It works well if the data lies in a union of linear subspaces; however, the performance may degrade in real-world applications where data often spans non-linear manifolds. To address this problem, we propose a novel structure-aware deep spectral embedding by combining a spectral embedding loss and a structure preservation loss. To this end, a deep neural network architecture is proposed that simultaneously encodes both types of information and aims to generate structure-aware spectral embedding. The subspace structure of the input data is encoded by using attention-based self-expression learning. The proposed algorithm is evaluated on six publicly available real-world datasets. The results demonstrate the excellent clustering performance of the proposed algorithm compared to the existing state-of-the-art methods. The proposed algorithm has also exhibited better generalization to unseen data points and it is scalable to larger datasets without requiring significant computational resources.
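The attention-based self-expression component can be caricatured in a few lines: each embedded point is reconstructed from the others through a softmax attention matrix, which stands in for the self-expression coefficients. A sketch only; the full objective also carries the spectral embedding loss, and the temperature is an assumed knob.

import torch
import torch.nn.functional as F

def self_expression_loss(Z, temperature=0.1):
    # Z: (n, d) embeddings. Mask the diagonal so no point explains itself.
    n = Z.size(0)
    sim = Z @ Z.t() / temperature
    sim = sim.masked_fill(torch.eye(n, dtype=torch.bool, device=Z.device),
                          float('-inf'))
    C = torch.softmax(sim, dim=1)       # attention = self-expression matrix
    return F.mse_loss(C @ Z, Z)         # each point as a mix of the others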

Model Diagram

Paper Link:

Citation:

@ARTICLE{10179276,
author={Yaseen, Hira and Mahmood, Arif},
journal={IEEE Transactions on Image Processing},
title={Learning Structure Aware Deep Spectral Embedding},
year={2023},
volume={32},
pages={3939-3948},
doi={10.1109/TIP.2023.3282074}}

Multiplex Cellular Communities in Multi-Gigapixel Colorectal Cancer Histology Images for Tissue Phenotyping (TIP, 2020)

Abstract:

In computational pathology, automated tissue phenotyping in cancer histology images is a fundamental tool for profiling tumor microenvironments. Current tissue phenotyping methods use features derived from image patches which may not carry biological significance. In this work, we propose a novel multiplex cellular community-based algorithm for tissue phenotyping integrating cell-level features within a graph-based hierarchical framework. We demonstrate that such integration offers better performance compared to prior deep learning and texture-based methods as well as to cellular community based methods using uniplex networks. To this end, we construct cell-level graphs using texture, alpha diversity and multi-resolution deep features. Using these graphs, we compute cellular connectivity features which are then employed for the construction of a patch-level multiplex network. Over this network, we compute multiplex cellular communities using a novel objective function. The proposed objective function computes a low-dimensional subspace from each cellular network and subsequently seeks a common low-dimensional subspace using the Grassmann manifold. We evaluate our proposed algorithm on three publicly available datasets for tissue phenotyping, demonstrating a significant improvement over existing state-of-the-art methods.

Model Diagram

Paper Link:

Citation:

@article{javed2020multiplex,
title={Multiplex cellular communities in multi-gigapixel colorectal cancer histology images for tissue phenotyping},
author={Javed, Sajid and Mahmood, Arif and Werghi, Naoufel and Benes, Ksenija and Rajpoot, Nasir},
journal={IEEE Transactions on Image Processing},
volume={29},
pages={9204--9219},
year={2020},
publisher={IEEE}
}

Robust Structural Low-Rank Tracking (TIP, 2020)

Abstract:

Visual object tracking is an essential task for many computer vision applications. It becomes very challenging when the target appearance changes especially in the presence of occlusion, background clutter, and sudden illumination variations. Methods, that incorporate sparse representation and low-rank assumptions on the target particles have achieved promising results. However, because of the lack of structural constraints, these methods show performance degradation when facing the aforementioned challenges. To alleviate these limitations, we propose a new structural low-rank modeling algorithm for robust object tracking in complex scenarios. In the proposed algorithm, we consider spatial and temporal appearance consistency constraints, among the particles in the low-rank subspace, embedded in four different graphs. The resulting objective function encoding these constraints is novel and it is solved using linearized alternating direction method with adaptive penalty both in batch fashion as well as in online fashion. Our proposed objective function jointly learns the spatial and temporal structure of the target particles in consecutive frames and makes the proposed tracker consistent against many complex tracking scenarios. Results on four challenging datasets demonstrate excellent performance of the proposed algorithm as compared to current state-of-the-art methods.

Model Diagram

Paper Link:

Citation:

@article{javed2020robust,
title={Robust structural low-rank tracking},
author={Javed, Sajid and Mahmood, Arif and Dias, Jorge and Werghi, Naoufel},
journal={IEEE Transactions on Image Processing},
volume={29},
pages={4390--4405},
year={2020},
publisher={IEEE}}

Moving Object Detection in Complex Scene Using Spatiotemporal Structured-Sparse RPCA (TIP, 2019)

Abstract:

Moving object detection is a fundamental step in various computer vision applications. Robust principal component analysis (RPCA)-based methods have often been employed for this task. However, the performance of these methods deteriorates in the presence of dynamic background scenes, camera jitter, camouflaged moving objects, and/or variations in illumination. This is because of an underlying assumption that the elements in the sparse component are mutually independent, and thus the spatiotemporal structure of the moving objects is lost. To address this issue, we propose a spatiotemporal structured sparse RPCA algorithm for moving object detection, where we impose spatial and temporal regularization on the sparse component in the form of graph Laplacians. Each Laplacian corresponds to a multi-feature graph constructed over superpixels in the input matrix. We enforce the sparse component to act as eigenvectors of the spatial and temporal graph Laplacians while minimizing the RPCA objective function. These constraints incorporate a spatiotemporal subspace structure within the sparse component. Thus, we obtain a novel objective function for separating moving objects in the presence of complex backgrounds. The proposed objective function is solved using a linearized alternating direction method of multipliers based batch optimization. Moreover, we also propose an online optimization algorithm for real-time applications. We evaluated both the batch and online solutions using six publicly available data sets that included most of the aforementioned challenges. Our experiments demonstrated the superior performance of the proposed algorithms compared with the current state-of-the-art methods.
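For reference, plain RPCA by principal component pursuit looks as below (a numpy sketch): the paper augments exactly this decomposition with spatial and temporal graph-Laplacian terms on the sparse component, which are omitted here for brevity.

import numpy as np

def shrink(X, tau):                      # soft-thresholding operator
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca_pcp(D, iters=100):
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n))
    mu = (m * n) / (4.0 * np.abs(D).sum())
    L, S, Y = (np.zeros_like(D) for _ in range(3))
    for _ in range(iters):
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * shrink(sig, 1.0 / mu)) @ Vt   # low-rank background
        S = shrink(D - L + Y / mu, lam / mu)   # sparse moving objects
        Y += mu * (D - L - S)                  # dual ascent on the residual
    return L, S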

Model Diagram

Paper Link:

Citation:

@article{javed2018moving,
title={Moving object detection in complex scene using spatiotemporal structured-sparse RPCA},
author={Javed, Sajid and Mahmood, Arif and Al-Maadeed, Somaya and Bouwmans, Thierry and Jung, Soon Ki},
journal={IEEE Transactions on Image Processing},
volume={28},
number={2},
pages={1007--1022},
year={2018},
publisher={IEEE}}

Background–Foreground Modeling Based on Spatiotemporal Sparse Subspace Clustering (TIP, 2017)

Abstract:

Background estimation and foreground segmentation are important steps in many high-level vision tasks. Many existing methods estimate background as a low-rank component and foreground as a sparse matrix without incorporating the structural information. Therefore, these algorithms exhibit degraded performance in the presence of dynamic backgrounds, photometric variations, jitter, shadows, and large occlusions. We observe that these backgrounds often span multiple manifolds. Therefore, constraints that ensure continuity on those manifolds will result in better background estimation. Hence, we propose to incorporate the spatial and temporal sparse subspace clustering into the robust principal component analysis (RPCA) framework. To that end, we compute a spatial and temporal graph for a given sequence using motion-aware correlation coefficient. The information captured by both graphs is utilized by estimating the proximity matrices using both the normalized Euclidean and geodesic distances. The low-rank component must be able to efficiently partition the spatiotemporal graphs using these Laplacian matrices. Embedded with the RPCA objective function, these Laplacian matrices constrain the background model to be spatially and temporally consistent, both on linear and nonlinear manifolds. The solution of the proposed objective function is computed by using the linearized alternating direction method with adaptive penalty optimization scheme. Experiments are performed on challenging sequences from five publicly available datasets and are compared with the 23 existing state-of-the-art methods. The results demonstrate excellent performance of the proposed algorithm for both the background estimation and foreground segmentation.

Model Diagram

Paper Link:

Citation:

@article{javed2017background,
title={Background--Foreground modeling based on spatiotemporal sparse subspace clustering},
author={Javed, Sajid and Mahmood, Arif and Bouwmans, Thierry and Jung, Soon Ki},
journal={IEEE Transactions on Image Processing},
volume={26},
number={12},
pages={5840--5854},
year={2017},
publisher={IEEE}}

Constrained Metric Learning by Permutation Inducing Isometries (TIP, 2016)

Abstract:

The choice of metric critically affects the performance of classification and clustering algorithms. Metric learning algorithms attempt to improve performance, by learning a more appropriate metric. Unfortunately, most of the current algorithms learn a distance function which is not invariant to rigid transformations of images. Therefore, the distances between two images and their rigidly transformed pair may differ, leading to inconsistent classification or clustering results. We propose to constrain the learned metric to be invariant to the geometry preserving transformations of images that induce permutations in the feature space. The constraint that these transformations are isometries of the metric ensures consistent results and improves accuracy. Our second contribution is a dimension reduction technique that is consistent with the isometry constraints. Our third contribution is the formulation of the isometry constrained logistic discriminant metric learning (IC-LDML) algorithm, by incorporating the isometry constraints within the objective function of the LDML algorithm. The proposed algorithm is compared with the existing techniques on the publicly available labeled faces in the wild, viewpoint-invariant pedestrian recognition, and Toy Cars data sets. The IC-LDML algorithm has outperformed existing techniques for the tasks of face recognition, person identification, and object classification by a significant margin.
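A small numpy check of the core constraint: if a permutation P induced by a geometry-preserving image transform satisfies P^T M P = M, then P is an isometry of the learned Mahalanobis metric. Group-averaging an arbitrary PSD matrix over the powers of a cyclic P enforces the constraint (a worked illustration, not the IC-LDML optimization).

import numpy as np

rng = np.random.default_rng(0)
n = 4
P = np.roll(np.eye(n), 1, axis=0)      # cyclic shift: a permutation of order n

M0 = rng.random((n, n)); M0 = M0 @ M0.T          # arbitrary PSD "metric"
# Average over the cyclic group generated by P, so that P.T @ M @ P == M:
M = sum(np.linalg.matrix_power(P, k).T @ M0 @ np.linalg.matrix_power(P, k)
        for k in range(n))

x, y = rng.random(n), rng.random(n)
dist = lambda a, b: (a - b) @ M @ (a - b)
assert np.allclose(P.T @ M @ P, M)                   # constraint holds
assert np.allclose(dist(P @ x, P @ y), dist(x, y))   # so distances are preserved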

Model Diagram

Paper Link:

Citation:

@article{bosveld2015constrained,
title={Constrained metric learning by permutation inducing isometries},
author={Bosveld, Joel and Mahmood, Arif and Huynh, Du Q and Noakes, Lyle},
journal={IEEE Transactions on Image Processing},
volume={25},
number={1},
pages={92--103},
year={2015},
publisher={IEEE}
}

Hyperspectral Face Recognition With Spatiospectral Information Fusion and PLS Regression (TIP, 2015)

Abstract:

Hyperspectral imaging offers new opportunities for face recognition via improved discrimination along the spectral dimension. However, it poses new challenges, including low signal-to-noise ratio, interband misalignment, and high data dimensionality. Due to these challenges, the literature on hyperspectral face recognition is not only sparse but is limited to ad hoc dimensionality reduction techniques and lacks comprehensive evaluation. We propose a hyperspectral face recognition algorithm using a spatiospectral covariance for band fusion and partial least square regression for classification. Moreover, we extend 13 existing face recognition techniques, for the first time, to perform hyperspectral face recognition. We formulate hyperspectral face recognition as an image-set classification problem and evaluate the performance of seven state-of-the-art image-set classification techniques. We also test six state-of-the-art grayscale and RGB (color) face recognition algorithms after applying fusion techniques on hyperspectral images. Comparison with the 13 extended and five existing hyperspectral face recognition techniques on three standard data sets show that the proposed algorithm outperforms all by a significant margin. Finally, we perform band selection experiments to find the most discriminative bands in the visible and near infrared response spectrum.
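Once the band-fusion step has produced one feature vector per hyperspectral cube, the PLS classification stage is a few lines of scikit-learn; the shapes below are arbitrary placeholders and the data is synthetic.

import numpy as np
from sklearn.cross_decomposition import PLSRegression

X = np.random.rand(100, 300)                   # fused spatiospectral features
Y = np.eye(10)[np.random.randint(0, 10, 100)]  # one-hot class indicators
pls = PLSRegression(n_components=20).fit(X, Y)
pred = pls.predict(X).argmax(axis=1)           # class = largest regression response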

Model Diagram

Paper Link:

Citation:

@ARTICLE{7010906,
author={Uzair, Muhammad and Mahmood, Arif and Mian, Ajmal},
journal={IEEE Transactions on Image Processing},
title={Hyperspectral Face Recognition With Spatiospectral Information Fusion and PLS Regression},
year={2015},
volume={24},
number={3},
pages={1127-1137},
doi={10.1109/TIP.2015.2393057}}

Correlation-Coefficient-Based Fast Template Matching Through Partial Elimination (TIP, 2012)

Abstract:

Partial computation elimination techniques are often used for fast template matching. At a particular search location, computations are prematurely terminated as soon as it is found that this location cannot compete with an already known best match location. Due to the nonmonotonic growth pattern of the correlation-based similarity measures, partial computation elimination techniques have been traditionally considered inapplicable to speed up these measures. In this paper, we show that partial elimination techniques may be applied to a correlation coefficient by using a monotonic formulation, and we propose basic-mode and extended-mode partial correlation elimination algorithms for fast template matching. The basic-mode algorithm is more efficient on small template sizes, whereas the extended mode is faster on medium and larger templates. We also propose a strategy to decide which algorithm to use for a given data set. To achieve a high speedup, elimination algorithms require an initial guess of the peak correlation value. We propose two initialization schemes, including a coarse-to-fine scheme for larger templates and a two-stage technique for small- and medium-sized templates. Our proposed algorithms are exact, i.e., they have exhaustive-equivalent accuracy, and are compared with the existing fast techniques using real image data sets on a wide variety of template sizes. While the actual speedups are data dependent, in most cases, our proposed algorithms have been found to be significantly faster than the other algorithms.
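
Code Sketch:

A toy 1-D illustration of the elimination principle, not the paper's basic- or extended-mode algorithms: the correlation numerator is accumulated block-wise, and a Cauchy-Schwarz bound on the unprocessed tail prunes search locations that can no longer beat the best correlation found so far, while remaining exhaustive-equivalent. The block size and the lack of an initial peak guess are simplifications.

```python
import numpy as np

def best_match_1d(signal, template):
    """Exhaustive-equivalent 1-D template matching with partial elimination:
    prune a location as soon as its partial correlation plus an upper bound
    on the remaining terms falls below the best correlation so far."""
    t = template - template.mean()
    m = len(t)
    t_suf = np.concatenate([np.cumsum((t ** 2)[::-1])[::-1], [0.0]])
    norm_t = np.sqrt(t_suf[0])
    best_corr, best_pos = -np.inf, -1
    block = max(1, m // 8)
    for pos in range(len(signal) - m + 1):
        w = signal[pos:pos + m]
        w = w - w.mean()
        w_suf = np.concatenate([np.cumsum((w ** 2)[::-1])[::-1], [0.0]])
        denom = norm_t * np.sqrt(w_suf[0]) + 1e-12
        num, k = 0.0, 0
        while k < m:
            j = min(k + block, m)
            num += float(t[k:j] @ w[k:j])
            k = j
            bound = np.sqrt(t_suf[k] * w_suf[k])  # Cauchy-Schwarz on tail
            if (num + bound) / denom < best_corr:
                break                             # prune this location
        else:
            if num / denom > best_corr:
                best_corr, best_pos = num / denom, pos
    return best_pos

rng = np.random.default_rng(1)
sig = rng.standard_normal(2000)
print(best_match_1d(sig, sig[700:760].copy()))    # -> 700
```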

Paper Link:

Citation:

@ARTICLE{6044713,
author={Mahmood, Arif and Khan, Sohaib},
journal={IEEE Transactions on Image Processing},
title={Correlation-Coefficient-Based Fast Template Matching Through Partial Elimination},
year={2012},
volume={21},
number={4},
pages={2099-2108},
doi={10.1109/TIP.2011.2171696}}

Exploiting Transitivity of Correlation for Fast Template Matching (TIP, 2010)

Abstract:

Elimination algorithms are often used in template matching to provide a significant speed-up by skipping portions of the computation while guaranteeing the same best-match location as exhaustive search. In this work, we develop elimination algorithms for correlation-based match measures by exploiting the transitivity of correlation. We show that transitive bounds can result in a high computational speed-up if strong autocorrelation is present in the dataset. Generally, strong intra-reference local autocorrelation is found in natural images, strong inter-reference autocorrelation is found if objects are to be tracked across consecutive video frames, and strong inter-template autocorrelation is found if consecutive video frames are to be matched with a reference image. For each of these cases, the transitive bounds can be adapted to result in an efficient elimination algorithm. The proposed elimination algorithms are exact, that is, they guarantee to yield the same peak location as exhaustive search over the entire solution space. While the speed-up obtained is data dependent, we show empirical results of up to an order of magnitude faster computation as compared to the currently used efficient algorithms on a variety of datasets.
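
Code Sketch:

A short numpy sketch of the transitive bound itself, under the standard reading that centered, normalized signals lie on a unit sphere where correlation is the cosine of an angle: given corr(a, b) and corr(b, c), corr(a, c) is bracketed by the angle triangle inequality, so a matcher can skip computing corr(a, c) whenever the upper bound is already below the best score. The full elimination algorithms built on this bound are not reproduced.

```python
import numpy as np

def corr(a, b):
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def transitive_bounds(r_ab, r_bc):
    """Bounds on corr(a, c) given corr(a, b) and corr(b, c): on the unit
    sphere, correlation is the cosine of an angle, and angles obey the
    triangle inequality."""
    th_ab = np.arccos(np.clip(r_ab, -1.0, 1.0))
    th_bc = np.arccos(np.clip(r_bc, -1.0, 1.0))
    lower = np.cos(min(th_ab + th_bc, np.pi))
    upper = np.cos(abs(th_ab - th_bc))
    return lower, upper

rng = np.random.default_rng(0)
a, b = rng.standard_normal(500), rng.standard_normal(500)
c = b + 0.1 * rng.standard_normal(500)     # like two consecutive frames
lo, hi = transitive_bounds(corr(a, b), corr(b, c))
print(lo, "<=", corr(a, c), "<=", hi)      # the bound always holds
```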

Paper Link:

Citation:

@ARTICLE{5439796,
author={Mahmood, Arif and Khan, Sohaib},
journal={IEEE Transactions on Image Processing},
title={Exploiting Transitivity of Correlation for Fast Template Matching},
year={2010},
volume={19},
number={8},
pages={2190-2200},
doi={10.1109/TIP.2010.2046809}}

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Histogram of Oriented Principal Components for Cross-View Action Recognition (TPAMI, 2016)

Abstract:

Existing techniques for 3D action recognition are sensitive to viewpoint variations because they extract features from depth images which are viewpoint dependent. In contrast, we directly process point clouds for cross-view action recognition from unknown and unseen views. We propose the histogram of oriented principal components (HOPC) descriptor that is robust to noise, viewpoint, scale and action speed variations. At a 3D point, HOPC is computed by projecting the three scaled eigenvectors of the point cloud within its local spatiotemporal support volume onto the vertices of a regular dodecahedron. HOPC is also used for the detection of spatiotemporal keypoints (STK) in 3D point cloud sequences so that view-invariant STK descriptors (or Local HOPC descriptors) at these key locations only are used for action recognition. We also propose a global descriptor computed from the normalized spatiotemporal distribution of STKs in 4-D, which we refer to as STK-D. We have evaluated the performance of our proposed descriptors against nine existing techniques on two cross-view and three single-view human action recognition datasets. The experimental results show that our techniques provide significant improvement over state-of-the-art methods.

Model Diagram

Paper Link:

Citation:

@article{rahmani2016histogram,
title={Histogram of oriented principal components for cross-view action recognition},
author={Rahmani, Hossein and Mahmood, Arif and Huynh, Du and Mian, Ajmal},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
volume={38},
number={12},
pages={2430–2443},
year={2016},
publisher={IEEE}}

IEEE Transactions on Neural Networks and Learning Systems (TNNLS)

Clustering Aided Weakly Supervised Training to Detect Anomalous Events in Surveillance Videos (TNNLS, 2023)

Abstract:

Formulating learning systems for the detection of real-world anomalous events using only video-level labels is a challenging task mainly due to the presence of noisy labels as well as the rare occurrence of anomalous events in the training data. We propose a weakly supervised anomaly detection system which has multiple contributions including a random batch selection mechanism to reduce inter-batch correlation and a normalcy suppression block which learns to minimize anomaly scores over normal regions of a video by utilizing the overall information available in a training batch. In addition, a clustering loss block is proposed to mitigate the label noise and to improve the representation learning for the anomalous and normal regions. This block encourages the backbone network to produce two distinct feature clusters representing normal and anomalous events. Extensive analysis of the proposed approach is provided using three popular anomaly detection datasets including UCF-Crime, ShanghaiTech, and UCSD Ped2. The experiments demonstrate a superior anomaly detection capability of our approach.
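
Code Sketch:

A hedged PyTorch sketch of one component as we read it from the abstract, the normalcy suppression block: a parallel branch computes softmax weights over the temporal segments and gates the backbone's anomaly scores, pushing scores of the dominant normal regions toward zero. Layer sizes are assumptions; the clustering loss and random batch selection are omitted.

```python
import torch
import torch.nn as nn

class NormalcySuppression(nn.Module):
    """Minimal sketch (not the authors' code): a parallel branch produces
    softmax weights over the temporal segments; multiplying the backbone
    scores by these weights suppresses scores over normal regions, which
    dominate a training batch."""
    def __init__(self, feat_dim):
        super().__init__()
        self.scorer = nn.Linear(feat_dim, 1)   # raw anomaly score branch
        self.gate = nn.Linear(feat_dim, 1)     # suppression branch

    def forward(self, feats):                  # feats: (batch, segments, dim)
        scores = torch.sigmoid(self.scorer(feats))
        w = torch.softmax(self.gate(feats), dim=1)   # over segments
        return (scores * w).squeeze(-1)        # suppressed segment scores

feats = torch.randn(4, 32, 2048)               # e.g. C3D/I3D segment features
print(NormalcySuppression(2048)(feats).shape)  # torch.Size([4, 32])
```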

Model Diagram

Paper Link:

Citation:

@ARTICLE{10136845,
author={Zaheer, Muhammad Zaigham and Mahmood, Arif and Astrid, Marcella and Lee, Seung-Ik},
journal={IEEE Transactions on Neural Networks and Learning Systems},
title={Clustering Aided Weakly Supervised Training to Detect Anomalous Events in Surveillance Videos},
year={2023},
pages={1-14}}

IEEE Transactions on Cybernetics (TC)

Hierarchical Spatiotemporal Graph Regularized Discriminative Correlation Filter for Visual Object Tracking (TC, 2022)

Abstract:

Visual object tracking is a fundamental and challenging task in many high-level vision and robotics applications. It is typically formulated by estimating the target appearance model between consecutive frames. Discriminative correlation filters (DCFs) and their variants have achieved promising speed and accuracy for visual tracking in many challenging scenarios. However, because of the unwanted boundary effects and lack of geometric constraints, these methods suffer from performance degradation. In the current work, we propose hierarchical spatiotemporal graph-regularized correlation filters for robust object tracking. The target sample is decomposed into a large number of deep channels, which are then used to construct a spatial graph such that each graph node corresponds to a particular target location across all channels. Such a graph effectively captures the spatial structure of the target object. In order to capture the temporal structure of the target object, the information in the deep channels obtained from a temporal window is compressed using the principal component analysis, and then, a temporal graph is constructed such that each graph node corresponds to a particular target location in the temporal dimension. Both spatial and temporal graphs span different subspaces such that the target and the background become linearly separable. The learned correlation filter is constrained to act as an eigenvector of the Laplacian of these spatiotemporal graphs. We propose a novel objective function that incorporates these spatiotemporal constraints into the DCFs framework. We solve the objective function using the alternating direction method of multipliers such that each subproblem has a closed-form solution. We evaluate our proposed algorithm on six challenging benchmark datasets and compare it with 33 existing state-of-the-art trackers. Our results demonstrate the excellent performance of the proposed algorithm compared to the existing trackers.

Model Diagram

Paper Link:

Citation:

@ARTICLE{9475879,
author={Javed, Sajid and Mahmood, Arif and Dias, Jorge and Seneviratne, Lakmal and Werghi, Naoufel},
journal={IEEE Transactions on Cybernetics},
title={Hierarchical Spatiotemporal Graph Regularized Discriminative Correlation Filter for Visual Object Tracking},
year={2022},
volume={52},
number={11},
pages={12259-12274},
doi={10.1109/TCYB.2021.3086194}}

IEEE Transactions on Multimedia (TMM)

Quantification of Occlusion Handling Capability of a 3D Human Pose Estimation Framework (TMM, 2022)

Abstract:

3D human pose estimation using monocular images is an important yet challenging task. Existing 3D pose detection methods exhibit excellent performance under normal conditions; however, their performance may degrade due to occlusion. Recently, some occlusion-aware methods have been proposed; however, the occlusion handling capability of these networks has not yet been thoroughly investigated. In the current work, we propose an occlusion-guided 3D human pose estimation framework and quantify its occlusion handling capability by using different protocols. The proposed method estimates more accurate 3D human poses using 2D skeletons with missing joints as input. Missing joints are handled by introducing occlusion guidance that provides extra information about the absence or presence of a joint. Temporal information has also been exploited to better estimate the missing joints.

Model Diagram

Paper Link:

Citation:

@article{ghafoor2022quantification,
title={Quantification of occlusion handling capability of a 3D human pose estimation framework},
author={Ghafoor, Mehwish and Mahmood, Arif},
journal={IEEE Transactions on Multimedia},
year={2022},
publisher={IEEE}}

Unsupervised Moving Object Detection in Complex Scenes Using Adversarial Regularizations (TMM, 2020)

Abstract:

Moving object detection (MOD) is a fundamental step in many high-level vision-based applications, such as human activity analysis, visual object tracking, autonomous vehicles, surveillance, and security. Most of the existing MOD algorithms observe performance degradation in the presence of complex scenes containing camouflage objects, shadows, dynamic backgrounds, and varying illumination conditions, captured by static cameras. To appropriately handle these challenges, we propose a Generative Adversarial Network (GAN) based moving object detection algorithm, called MOD_GAN. In the proposed algorithm, scene-specific GANs are trained in an unsupervised MOD setting, thereby enabling the algorithm to learn to generate background sequences using input from uniformly distributed random noise samples. In addition to the adversarial loss, during training, a norm-based loss in the image space and the discriminator feature space is also minimized between the generated images and the training data. The additional losses enable the generator to learn subtle background details, resulting in a more realistic complex scene generation. During testing, a novel back-propagation-based algorithm is used to generate images with statistics similar to the test images. More appropriate random noise samples are searched by directly minimizing the loss function between the test and generated images both in the image and discriminator feature spaces. The network is not updated in this step; only the input noise samples are iteratively modified to minimize the loss function. Moreover, motion information is used to ensure that this loss is only computed on small-motion pixels. A novel dataset containing outdoor time-lapsed images from dawn to dusk with a full illumination variation cycle is also proposed to better compare the MOD algorithms in outdoor scenes. Accordingly, extensive experiments on five benchmark datasets and comparison with 30 existing methods demonstrate the strength of the proposed algorithm.

Model Diagram

Paper Link:

Citation:

@article{sultana2020unsupervised,
title={Unsupervised moving object detection in complex scenes using adversarial regularizations},
author={Sultana, Maryam and Mahmood, Arif and Jung, Soon Ki},
journal={IEEE Transactions on Multimedia},
volume={23},
pages={2005–2018},
year={2020},
publisher={IEEE}
}

IEEE Transactions on Circuits and Systems for Video Technology (CSVT)

Spatiotemporal Low-Rank Modeling for Complex Scene Background Initialization (CSVT, 2018)

Abstract:

Background modeling constitutes the building block of many computer-vision tasks. Traditional schemes model the background as a low-rank matrix with corrupted entries. These schemes operate in batch mode and do not scale well with the data size. Moreover, without enforcing spatiotemporal information in the low-rank component, and because of occlusions by foreground objects and redundancy in video data, the design of a background initialization method robust against outliers is very challenging. To overcome these limitations, this paper presents a spatiotemporal low-rank modeling method on dynamic video clips for estimating the robust background model. The proposed method encodes spatiotemporal constraints by regularizing spectral graphs. Initially, a motion-compensated binary matrix is generated using optical flow information to remove redundant data and to create a set of dynamic frames from the input video sequence. Then two graphs are constructed, one between frames for temporal consistency and the other between features for spatial consistency, to encode the local structure for continuously promoting the intrinsic behavior of the low-rank model against outliers. These two terms are then incorporated into the iterative Matrix Completion framework for improved segmentation of the background. Rigorous evaluation on severely occluded and dynamic background sequences demonstrates the superior performance of the proposed method over state-of-the-art approaches.

Model Diagram

Paper Link:

Citation:

@article{javed2016spatiotemporal,
title={Spatiotemporal low-rank modeling for complex scene background initialization},
author={Javed, Sajid and Mahmood, Arif and Bouwmans, Thierry and Jung, Soon Ki},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={28},
number={6},
pages={1315–1329},
year={2018},
publisher={IEEE}
}

IEEE Transactions on Signal and Information Processing over Networks (TSIPN)

Reconstruction of Time-Varying Graph Signals via Sobolev Smoothness (TSIPN, 2022)

Abstract:

Graph Signal Processing (GSP) is an emerging research field that extends the concepts of digital signal processing to graphs. GSP has numerous applications in different areas such as sensor networks, machine learning, and image processing. The sampling and reconstruction of static graph signals have played a central role in GSP. However, many real-world graph signals are inherently time-varying and the smoothness of the temporal differences of such graph signals may be used as a prior assumption. In the current work, we assume that the temporal differences of graph signals are smooth, and we introduce a novel algorithm based on the extension of a Sobolev smoothness function for the reconstruction of time-varying graph signals from discrete samples. We explore some theoretical aspects of the convergence rate of our Time-varying Graph signal Reconstruction via Sobolev Smoothness (GraphTRSS) algorithm by studying the condition number of the Hessian associated with our optimization problem. Our algorithm has the advantage of converging faster than other methods that are based on Laplacian operators without requiring expensive eigenvalue decomposition or matrix inversions. The proposed GraphTRSS is evaluated on several datasets including two COVID-19 datasets and it has outperformed many existing state-of-the-art methods for time-varying graph signal reconstruction. GraphTRSS has also shown excellent performance on two environmental datasets for the recovery of particulate matter and sea surface temperature signals.
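
Code Sketch:

A minimal numpy sketch, assuming the objective stated in the abstract: data fidelity on the sampled entries plus a Sobolev penalty tr(D^T (L + eps*I)^beta D) on the temporal differences D, minimized by plain gradient descent. The toy graph, parameter names, and the eigendecomposition used to build the Sobolev matrix are our simplifications (the paper specifically avoids expensive eigendecompositions).

```python
import numpy as np

def graph_trss(Y, mask, L, eps=0.1, beta=1.0, lam=1.0, lr=0.01, iters=500):
    """Sketch of Sobolev-smooth reconstruction of a time-varying graph
    signal: minimize ||mask*(X - Y)||_F^2 + lam*tr(D^T S D), where D holds
    the temporal differences of X and S = (L + eps*I)^beta."""
    n, T = Y.shape
    w, V = np.linalg.eigh(L)                   # L is the graph Laplacian
    S = (V * (w + eps) ** beta) @ V.T          # Sobolev matrix
    X = Y * mask                               # start from the samples
    for _ in range(iters):
        D = X[:, 1:] - X[:, :-1]
        SD = S @ D
        g_smooth = np.zeros_like(X)
        g_smooth[:, :-1] -= SD                 # gradient of the trace term
        g_smooth[:, 1:] += SD
        X -= lr * (2 * mask * (X - Y) + 2 * lam * g_smooth)
    return X

rng = np.random.default_rng(0)
n, T = 20, 50
A = (rng.random((n, n)) < 0.2).astype(float)
A = np.triu(A, 1); A += A.T                    # random undirected graph
L = np.diag(A.sum(1)) - A
true = np.outer(rng.standard_normal(n), np.sin(np.linspace(0, 3, T)))
mask = (rng.random((n, T)) < 0.5).astype(float)  # 50% of entries sampled
X_hat = graph_trss(true * mask, mask, L)
err = (X_hat - true) * (1 - mask)
print("RMSE on missing entries:", np.sqrt((err ** 2).mean()))
```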

Model Diagram

Paper Link:

Citation:

@article{giraldo2022reconstruction,
title={Reconstruction of time-varying graph signals via Sobolev smoothness},
author={Giraldo, Jhony H and Mahmood, Arif and Garcia-Garcia, Belmar and Thanou, Dorina and Bouwmans, Thierry},
journal={IEEE Transactions on Signal and Information Processing over Networks},
volume={8},
pages={201–214},
year={2022},
publisher={IEEE}
}

IEEE Transactions on Knowledge and Data Engineering (TKDE)

Using Geodesic Space Density Gradients for Network Community Detection (TKDE, 2017)

Abstract:

Many real-world complex systems naturally map to network data structures instead of geometric spaces because the only available information is the presence or absence of a link between two entities in the system. To enable data mining techniques to solve problems in the network domain, the nodes need to be mapped to a geometric space. We propose this mapping by representing each network node with its geodesic distances from all other nodes. The space spanned by the geodesic distance vectors is the geodesic space of that network. The positions of different nodes in the geodesic space encode the network structure. In this space, considering a continuous density field induced by each node, density at a specific point is the summation of density fields induced by all nodes. We drift each node in the direction of the positive density gradient using an iterative algorithm until each node reaches a local maximum. Due to the network structure captured by this space, the nodes that drift to the same region of space belong to the same communities in the original network. We use the direction of movement and final position of each node as important clues for community membership assignment. The proposed algorithm is compared with more than ten state-of-the-art community detection techniques on two benchmark networks with known communities using the normalized mutual information criterion. The proposed algorithm outperformed these methods by a significant margin. Moreover, the proposed algorithm has also shown excellent performance on many real-world networks.
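
Code Sketch:

A simplified numpy/scipy stand-in for the procedure described above: nodes are embedded as geodesic distance vectors, drifted uphill with mean-shift under a Gaussian density field, and grouped by the mode they converge to. The bandwidth, tolerance, and toy two-clique network are assumptions.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def geodesic_density_communities(A, bandwidth=2.0, iters=100, tol=0.5):
    """Sketch: embed each node as its vector of geodesic distances to all
    nodes, drift the embedded points toward higher density with mean-shift,
    and group nodes whose final positions (modes) nearly coincide."""
    G = shortest_path(A, method="D", unweighted=True)   # geodesic vectors
    pts = G.copy()
    for _ in range(iters):                              # mean-shift drift
        d2 = ((pts[:, None, :] - G[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        pts = (w @ G) / w.sum(axis=1, keepdims=True)
    labels, modes = [], []
    for p in pts:                                       # merge nearby modes
        for i, m in enumerate(modes):
            if np.linalg.norm(p - m) < tol:
                labels.append(i)
                break
        else:
            modes.append(p)
            labels.append(len(modes) - 1)
    return np.array(labels)

# Two 8-node cliques joined by a single edge:
n = 16
A = np.zeros((n, n))
A[:8, :8] = 1.0
A[8:, 8:] = 1.0
A[0, 8] = A[8, 0] = 1.0
np.fill_diagonal(A, 0.0)
print(geodesic_density_communities(A))    # two blocks of identical labels
```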

Model Diagram

Citation:

@article{mahmood2017using,
title={Using geodesic space density gradients for network community detection},
author={Mahmood, Arif and Small, Michael and Al-Maadeed, Somaya Ali and Rajpoot, Nasir},
journal={IEEE Transactions on Knowledge and Data Engineering},
volume={29},
number={4},
pages={921–935},
year={2017},
publisher={IEEE}}

Subspace Based Network Community Detection Using Sparse Linear Coding (TKDE, 2016)

Abstract:

Information mining from networks by identifying communities is an important problem across a number of research fields including social science, biology, physics, and medicine. Most existing community detection algorithms are graph-theoretic and lack the ability to detect accurate community boundaries if the ratio of intra-community to inter-community links is low. Also, the algorithms based on modularity maximization may fail to resolve communities smaller than a specific size if the community size varies significantly. In this paper, we present a fundamentally different community detection algorithm based on the fact that each network community spans a different subspace in the geodesic space. Therefore, each node can only be efficiently represented as a linear combination of nodes spanning the same subspace. To make the process of community detection more robust, we use sparse linear coding with an l1-norm constraint. In order to find a community label for each node, a sparse spectral clustering algorithm is used. The proposed community detection technique is compared with more than ten state-of-the-art methods on two benchmark networks (with known clusters) using the normalized mutual information criterion. Our proposed algorithm outperformed existing algorithms by a significant margin on both benchmark networks. The proposed algorithm has also shown excellent performance on three real-world networks.

Model Diagram

Citation:

@article{mahmood2016subspace,
title={Subspace Based Network Community Detection Using Sparse Linear Coding},
author={Mahmood, Arif and Small, Michael},
journal={IEEE Transactions on Knowledge and Data Engineering},
volume={28},
number={3},
pages={801–812},
year={2016},
publisher={IEEE}}

IEEE Transactions on Computational Social Systems (TCSS)

Detection and Localization of Firearm Carriers in Complex Scenes for Improved Safety Measures (TCSS, 2023)

Abstract:

Detecting firearms and accurately localizing individuals carrying them in images or videos is of paramount importance in security, surveillance, and content customization. However, this task presents significant challenges in complex environments due to clutter and the diverse shapes of firearms. To address this problem, we propose a novel approach that leverages human–firearm interaction information, which provides valuable clues for localizing firearm carriers. Our approach incorporates an attention mechanism that effectively distinguishes humans and firearms from the background by focusing on relevant areas. Additionally, we introduce a saliency-driven locality-preserving constraint to learn essential features while preserving foreground information in the input image. By combining these components, our approach achieves exceptional results on a newly proposed dataset. To handle inputs of varying sizes, we pass paired human–firearm instances with attention masks as channels through a deep network for feature computation, utilizing an adaptive average pooling (AAP) layer.
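
Code Sketch:

A hedged PyTorch sketch of the input pipeline described in the last sentence: a human crop, a firearm crop, and their attention masks are stacked as channels and passed through a small CNN whose adaptive average pooling (AAP) layer makes the output size independent of the input resolution. The channel layout and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class PairFeatureNet(nn.Module):
    """Sketch: an RGB human crop, an RGB firearm crop, and two attention
    masks are stacked as 3+3+1+1 = 8 channels; AdaptiveAvgPool2d makes the
    feature dimension independent of the input size."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(8, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),         # handles varying input sizes
        )
        self.head = nn.Linear(64, out_dim)

    def forward(self, x):                    # x: (batch, 8, H, W), any H, W
        return self.head(self.body(x).flatten(1))

net = PairFeatureNet()
for hw in [(96, 64), (180, 140)]:            # two different input sizes
    x = torch.randn(2, 8, *hw)
    print(net(x).shape)                      # torch.Size([2, 128]) both times
```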

Model Diagram

Paper Link:

Citation:

@ARTICLE{10258124,
author={Mahmood, Arif and Basit, Abdul and Munir, Muhammad Akhtar and Ali, Mohsen},
journal={IEEE Transactions on Computational Social Systems},
title={Detection and Localization of Firearm Carriers in Complex Scenes for Improved Safety Measures},
year={2023},
pages={1-11}}

IEEE Transactions on Cloud Computing (TCC)

Predictive Auto-scaling of Multi-tier Applications Using Performance Varying Cloud Resources (TCC, 2019)

Abstract:

The performance of the same type of cloud resources, such as virtual machines (VMs), varies over time mainly due to hardware heterogeneity, resource contention among co-located VMs, and virtualization overhead. The performance variation can be significant, introducing challenges to learn workload-specific resource provisioning policies to automatically scale the cloud-hosted applications to maintain the desired response time. Moreover, auto-scaling multi-tier applications using minimal resources is even more challenging because bottlenecks may occur on multiple tiers concurrently. In this paper, we address the problem of using performance-varying VMs for gracefully auto-scaling a multi-tier application using minimal resources to handle dynamically increasing workloads and satisfy the response time requirements. The proposed system uses a supervised learning method to identify the appropriate resource provisioning for multi-tier applications based on the prediction of the application response time and the request arrival rate. The supervised learning method learns a state transition configuration map which encodes resource allocation states invariant to the underlying VMs' performance variations. This configuration map helps to use performance-varying resources in the predictive auto-scaling method. Our experimental evaluation using a real-world multi-tier web application hosted on a public cloud shows improved application performance with minimal resources compared to conventional predictive auto-scaling methods.

Model Diagram

Paper Link:

Citation:

@article{iqbal2019predictive,
title={Predictive auto-scaling of multi-tier applications using performance varying cloud resources},
author={Iqbal, Waheed and Erradi, Abdelkarim and Abdullah, Muhammad and Mahmood, Arif},
journal={IEEE Transactions on Cloud Computing},
volume={10},
number={1},
pages={595–607},
year={2019},
publisher={IEEE}
}

IEEE Transactions on Services Computing (TSC)

Web Application Resource Requirements Estimation Based on the Workload Latent Features (TSC, 2019)

Abstract:

Most cloud computing platforms offer reactive resource auto-scaling mechanisms for dealing with variable traffic patterns to deliver the desired QoS properties while keeping low provisioning costs. However, a range of scenarios have not been fully addressed by the current auto-scaling solutions, particularly dealing with a rapid increase in workload and the risk of thrashing due to frequent workload variations. A reactive system is vulnerable in such conditions. Realizing the full potential of auto-scaling still remains challenging, particularly due to the need to accurately estimate the application resource requirements for time-varying workload patterns. In this work, we propose and evaluate a novel method using only application access logs to more accurately estimate the hardware resource demands and application response time. In particular, we propose novel workload latent features which we compute by applying unsupervised learning on the access logs. We use these latent features to estimate the application hardware resource requirements and response time for various workload patterns. We evaluate the proposed method using multiple benchmark web applications and compare it with the current state-of-the-art. Extensive experimental evaluations show the excellent performance of our proposed workload latent features in estimating response time, CPU, memory, and bandwidth utilization.
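
Code Sketch:

A hedged sklearn sketch of the overall idea: a per-interval URL-count matrix built from access logs is factorized (here with NMF) to obtain latent workload features, which then feed a regressor for a resource-utilization target. The specific feature design, factorization method, and data below are our assumptions, not the paper's pipeline.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Toy stand-in for parsed access logs: requests per URL in each 1-minute
# interval (rows = intervals, columns = distinct URLs).
intervals, urls = 200, 40
counts = rng.poisson(3.0, size=(intervals, urls)).astype(float)
# Synthetic CPU utilization driven by a subset of (expensive) URLs:
cpu = 0.02 * counts[:, :10].sum(1) + rng.normal(0, 0.1, intervals)

latent = NMF(n_components=5, init="nndsvda", max_iter=500,
             random_state=0).fit_transform(counts)  # latent workload features
model = Ridge().fit(latent, cpu)
print("R^2 of CPU-utilization estimate:", model.score(latent, cpu))
```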

Model Diagram

Paper Link:

Citation:

@article{erradi2019web,
title={Web application resource requirements estimation based on the workload latent features},
author={Erradi, Abdelkarim and Iqbal, Waheed and Mahmood, Arif and Bouguettaya, Athman},
journal={IEEE Transactions on Services Computing},
volume={14},
number={6},
pages={1638–1649},
year={2019},
publisher={IEEE}
}

IEEE Journal of Biomedical and Health Informatics (JBHI)

Knowledge Distillation in Histology Landscape by Multi-Layer Features Supervision (JBHI, 2023)

Abstract:

Shallow networks have been end-to-end trained using direct supervision; however, their performance degrades because they fail to capture robust tissue heterogeneity. Knowledge distillation has recently been employed to improve the performance of shallow networks used as student networks by using additional supervision from deep neural networks used as teacher networks. In the current work, we propose a novel knowledge distillation algorithm to improve the performance of shallow networks for tissue phenotyping in histology images. For this purpose, we propose multi-layer feature distillation such that a single layer in the student network gets supervision from multiple teacher layers. In the proposed algorithm, the sizes of the feature maps of two layers are matched by using a learnable multi-layer perceptron. The distance between the feature maps of the two layers is then minimized during the training of the student network. The overall objective function is computed by summing the loss over multiple layer combinations, weighted with a learnable attention-based parameter. The proposed algorithm is named Knowledge Distillation for Tissue Phenotyping (KDTP).
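
Code Sketch:

A hedged PyTorch sketch of the loss described above: a single student feature is projected by learnable MLPs to each teacher layer's size, and the per-layer distances are combined with softmax-normalized learnable attention weights. Dimensions and the MLP design are assumptions, not the authors' KDTP code.

```python
import torch
import torch.nn as nn

class MultiLayerDistillLoss(nn.Module):
    """Sketch of multi-layer feature distillation: one student feature is
    supervised by several teacher layers. Learnable MLPs match feature
    sizes; learnable attention logits weight the per-layer distances."""
    def __init__(self, student_dim, teacher_dims):
        super().__init__()
        self.projs = nn.ModuleList(
            nn.Sequential(nn.Linear(student_dim, d), nn.ReLU(), nn.Linear(d, d))
            for d in teacher_dims)
        self.attn = nn.Parameter(torch.zeros(len(teacher_dims)))

    def forward(self, student_feat, teacher_feats):
        # student_feat: (B, student_dim); teacher_feats: list of (B, d_i)
        w = torch.softmax(self.attn, dim=0)
        dists = [((p(student_feat) - t.detach()) ** 2).mean()
                 for p, t in zip(self.projs, teacher_feats)]
        return sum(wi * di for wi, di in zip(w, dists))

loss_fn = MultiLayerDistillLoss(64, [128, 256])
s = torch.randn(8, 64)                        # student layer features
ts = [torch.randn(8, 128), torch.randn(8, 256)]  # two teacher layers
print(loss_fn(s, ts))                         # scalar distillation loss
```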

Model Diagram

Paper Link:

Citation:

@ARTICLE{10018566,
author={Javed, Sajid and Mahmood, Arif and Qaiser, Talha and Werghi, Naoufel},
journal={IEEE Journal of Biomedical and Health Informatics},
title={Knowledge Distillation in Histology Landscape by Multi-Layer Features Supervision},
year={2023},
volume={27},
number={4},
pages={2037-2046},
doi={10.1109/JBHI.2023.3237749}}

An End-to-End Human Abnormal Behavior Recognition Framework for Crowds with Mentally Disordered Individuals (JBHI, 2022)

Abstract:

Abnormal or violent behavior by people with mental disorders is common. When individuals with mental disorders exhibit abnormal behavior in public places, they may cause physical and mental harm to others as well as to themselves. Thus, it is necessary to monitor their behavior using visual surveillance systems. However, it is challenging to automatically detect human abnormal behavior (especially for individuals with mental disorders) based on motion recognition technologies. To address these issues, in the current work, we propose an end-to-end abnormal behavior detection framework from a new perspective in conjunction with the Graph Convolutional Network (GCN) and a 3D Convolutional Neural Network (3DCNN). Specifically, we first train a one-class classifier to extract features and estimate abnormality scores. To improve the performance of abnormal behavior detection, the GCN is used to model the similarity between video clips for the correction of noisy labels. Then, based on this framework, the GCN recognizes the normal behavior clips in the abnormal video and removes them, while the clips identified as abnormal behavior are retained. Finally, a 3DCNN is used to extract spatiotemporal features to classify different abnormal behaviors. In order to better detect the violent behavior of individuals with mental disorders, the paper focuses on the UCF-Crime dataset with various types of violent behaviors. In experiments on this dataset, the classification accuracy reaches 37.9%, which is significantly better than that of the current state-of-the-art approaches.

Model Diagram

Paper Link:

Citation:

@article{hao2021end,
title={An End-to-End Human Abnormal Behavior Recognition Framework for Crowds With Mentally Disordered Individuals},
author={Hao, Yixue and Tang, Zaiyang and Alzahrani, Bander and Alotaibi, Reem and Alharthi, Reem and Zhao, Miaomiao and Mahmood, Arif},
journal={IEEE Journal of Biomedical and Health Informatics},
volume={26},
number={8},
pages={3618–3625},
year={2022},
publisher={IEEE}
}

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS)

Improving Chlorophyll-a Estimation from Sentinel-2 (MSI) in the Barents Sea Using Machine Learning (JSTARS, 2021)

Abstract:

We aim to improve the monitoring capacity by integrating in situ Chl-a observations and optical remote sensing to locally train machine learning (ML) models. For this purpose, in situ measurements of Chl-a ranging from 0.014 to 10.81 mg/m3, collected for the years 2016–2018, were used to train and validate the models. To accurately estimate Chl-a, we propose to use additional information on pigment content within the productive column by matching the depth-integrated Chl-a concentrations with the satellite data. Using the optical images captured by the multispectral imager (MSI) instrument on Sentinel-2 and the in situ measurements, a new spatial-window-based match-up dataset creation method is proposed to increase the number of match-ups and hence improve the training of the ML models. The match-ups are then filtered to eliminate erroneous samples based on the spectral distribution of the remotely sensed reflectance. In addition, we design and implement a neural network model dubbed the Ocean Color Net (OCN).
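
Code Sketch:

A minimal numpy sketch of the spatial-window match-up idea: every valid pixel inside a window centered on an in situ station is paired with the measured Chl-a value, multiplying the number of match-ups. The window size and the finite-value validity test are assumptions.

```python
import numpy as np

def window_matchups(reflectance, station_rc, chl_value, half=1):
    """Collect match-ups from a (2*half+1)^2 window around an in situ
    station instead of the single centre pixel. reflectance: (bands, H, W);
    station_rc: (row, col). Returns per-pixel spectra, each paired with
    the measured chlorophyll-a value."""
    bands, H, W = reflectance.shape
    r, c = station_rc
    rows = slice(max(r - half, 0), min(r + half + 1, H))
    cols = slice(max(c - half, 0), min(c + half + 1, W))
    patch = reflectance[:, rows, cols].reshape(bands, -1).T  # (pixels, bands)
    valid = np.isfinite(patch).all(axis=1)   # drop cloud/land-masked pixels
    X = patch[valid]
    y = np.full(len(X), chl_value)
    return X, y

rng = np.random.default_rng(0)
img = rng.random((13, 100, 100))             # e.g. 13 Sentinel-2 MSI bands
img[:, 50, 51] = np.nan                      # one masked pixel
X, y = window_matchups(img, (50, 50), chl_value=0.8)
print(X.shape, y.shape)                      # (8, 13) (8,): 9 minus 1 masked
```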

Model Diagram

Paper Link:

Citation:

@article{asim2021improving,
title={Improving chlorophyll-a estimation from Sentinel-2 (MSI) in the Barents Sea using machine learning},
author={Asim, Muhammad and Brekke, Camilla and Mahmood, Arif and Eltoft, Torbj{\o}rn and Reigstad, Marit},
journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
volume={14},
pages={5529–5549},
year={2021},
publisher={IEEE}}

IEEE Systems Journal

Predictive Autoscaling of Microservices Hosted in Fog Microdata Center (Systems, 2020)

Abstract:

Fog computing provides microdata center (MDC) facilities closer to the users and applications, which help to overcome the application latency and response time concerns. However, guaranteeing specific service-level objectives (SLOs) for the applications running on the MDC requires automatic scaling of allocated resources by efficiently utilizing the available infrastructure capacity. In this article, we propose a novel predictive autoscaling method for microservices running on the fog MDC to satisfy the application response time SLO. Initially, our proposed approach uses a reactive rule-based autoscaling method to gather the training dataset for building the predictive autoscaling model. The proposed approach is efficient, as it can learn the predictive autoscaling model using an increasing synthetic workload. The learned predictive autoscaling model is used to manage the application resources serving different realistic workloads effectively. Our experimental evaluation using two synthetic and three realistic workloads for two benchmark microservice applications on a real MDC shows excellent performance compared to the existing state-of-the-art baseline rule-based autoscaling method. The proposed autoscaling method yields a 75.51% reduction in the number of rejected requests and 77.53% fewer SLO violations compared to the baseline autoscaling methods by using only 9.20% additional data center resources at the fog layer.

Model Diagram

Paper Link:

Citation:

@article{abdullah2020predictive,
title={Predictive autoscaling of microservices hosted in fog microdata center},
author={Abdullah, Muhammad and Iqbal, Waheed and Mahmood, Arif and Bukhari, Faisal and Erradi, Abdelkarim},
journal={IEEE Systems Journal},
volume={15},
number={1},
pages={1275–1286},
year={2020},
publisher={IEEE}
}

IEEE Signal Processing Letters (SPL)

A Self-Reasoning Framework for Anomaly Detection Using Video-Level Labels (SPL, 2020)

Abstract:

Anomalous event detection in surveillance videos is a challenging and practical research problem within the image and video processing community. Compared to the frame-level annotations of anomalous events, obtaining video-level annotations is quite fast and cheap, though such high-level labels may contain significant noise. More specifically, an anomalous-labeled video may actually contain the anomaly only for a short duration, while the rest of the video frames may be normal. In the current work, we propose a weakly supervised anomaly detection framework based on deep neural networks which is trained in a self-reasoning fashion using only video-level labels. To carry out the self-reasoning based training, we generate pseudo labels by using binary clustering of spatio-temporal video features, which helps in mitigating the noise present in the labels of anomalous videos. Our proposed formulation encourages both the main network and the clustering to complement each other in achieving the goal of more accurate anomaly detection. The proposed framework has been evaluated on publicly available real-world anomaly detection datasets including UCF-Crime, ShanghaiTech and UCSD Ped2. The experiments demonstrate the superiority of our proposed framework over the current state-of-the-art methods.
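
Code Sketch:

A hedged sklearn sketch of the pseudo-label generation step: clip features of an anomalous-labeled video are split by binary (k=2) clustering, and clips in the cluster lying farther from normal-video statistics are pseudo-labeled anomalous. The cluster-assignment rule and the synthetic features are our assumptions; the joint training with the main network is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

def pseudo_labels(anom_video_feats, normal_centroid):
    """Sketch of self-reasoning pseudo labels: split the clips of an
    anomalous-labeled video into two clusters; the cluster farther from
    the centroid of normal-video features is pseudo-labeled anomalous
    (this decision rule is our simplification)."""
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(anom_video_feats)
    d = np.linalg.norm(km.cluster_centers_ - normal_centroid, axis=1)
    anomalous_cluster = int(d.argmax())
    return (km.labels_ == anomalous_cluster).astype(int)  # 1 = anomalous clip

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 32))            # normal-video clips
video = np.vstack([rng.normal(0.0, 1.0, size=(28, 32)),  # mostly normal...
                   rng.normal(4.0, 1.0, size=(4, 32))])  # ...plus an anomaly
print(pseudo_labels(video, normal.mean(0)))              # last 4 clips -> 1
```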

Model Diagram

Paper Link:

Citation:

@article{zaheer2020self,
title={A self-reasoning framework for anomaly detection using video-level labels},
author={Zaheer, Muhammad Zaigham and Mahmood, Arif and Shin, Hochul and Lee, Seung-Ik},
journal={IEEE Signal Processing Letters},
volume={27},
pages={1705–1709},
year={2020},
publisher={IEEE}
}

IEEE Access

A Novel Algorithm Based on a Common Subspace Fusion for Visual Object Tracking (Access, 2022)

Abstract:

Recent methods for visual tracking exploit a multitude of information obtained from combinations of handcrafted and/or deep features. However, the response maps derived from these feature combinations are often fused using simple strategies such as winner-takes-all or weighted sum approaches. Although some efficient fusion methods have also been proposed, these methods still do not leverage the individual strengths of the different features being fused. In the current work, we propose a novel information fusion strategy comprising a common low-rank subspace for the fusion of different types of features and tracker responses. Firstly, we interpret the response maps as smoothly varying functions which can be efficiently represented using individual low-rank matrices, thus removing high-frequency noise and sparse artifacts. Secondly, we estimate a common low-rank subspace which is constrained to remain close to each individual low-rank subspace, resulting in an efficient fusion strategy. The proposed algorithm achieves good performance by integrating the information contained in heterogeneous feature types. We demonstrate the efficiency of our algorithm using several combinations of features as well as correlation filter and end-to-end deep trackers. The proposed common subspace fusion algorithm is generic and can be used to efficiently fuse the response maps of varying types of feature representations as well as trackers. Extensive experiments on several tracking benchmarks including OTB100, TC128, VOT-ST 2018, VOT-LT 2018, UAV123, GOT-10k, and LaSOT have demonstrated significant performance improvements compared to many SOTA tracking methods.
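
Code Sketch:

A hedged numpy sketch of the two ingredients named in the abstract: response maps are first denoised by rank-r truncation, and a common column subspace estimated from the stacked maps is used for fusion. The paper solves a constrained optimization for the common subspace; the plain SVD projection and averaging below are our stand-in.

```python
import numpy as np

def low_rank(R, r):
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r]

def fuse_response_maps(maps, r=3):
    """Sketch of common-subspace fusion: denoise each tracker/feature
    response map by rank-r truncation, estimate a shared column subspace
    from the stacked maps, and average the projected maps."""
    clean = [low_rank(R, r) for R in maps]
    U, _, _ = np.linalg.svd(np.hstack(clean), full_matrices=False)
    Uc = U[:, :r]                                  # common column subspace
    return np.mean([Uc @ (Uc.T @ R) for R in clean], axis=0)

rng = np.random.default_rng(0)
y, x = np.mgrid[0:40, 0:40]
peak = np.exp(-((x - 25) ** 2 + (y - 18) ** 2) / 20.0)   # true target response
maps = [peak + 0.3 * rng.standard_normal((40, 40)) for _ in range(3)]
fused = fuse_response_maps(maps)
print("estimated target position:",
      np.unravel_index(int(fused.argmax()), fused.shape))
```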

Model Diagram

Paper Link:

Citation:

@article{javed2022novel,
title={A novel algorithm based on a common subspace fusion for visual object tracking},
author={Javed, Sajid and Mahmood, Arif and Ullah, Ihsan and Bouwmans, Thierry and Khonji, Majid and Dias, Jorge Manuel Miranda and Werghi, Naoufel},
journal={IEEE Access},
volume={10},
pages={24690–24703},
year={2022},
publisher={IEEE}
}

Internal Emotion Classification Using EEG Signal with Sparse Discriminative Ensemble (Access, 2019)

Abstract:

Among various physiological signal acquisition methods for the study of the human brain, EEG (electroencephalography) is one of the most effective. EEG provides a convenient, non-intrusive, and accurate way of capturing brain signals in multiple channels at fine temporal resolution. We propose an ensemble learning algorithm for automatically computing the most discriminative subset of EEG channels for internal emotion recognition. Our method describes an EEG channel using kernel-based representations computed from the training EEG recordings. For ensemble learning, we formulate a graph embedding linear discriminant objective function using the kernel representations. The objective function is efficiently solved via sparse non-negative principal component analysis and the final classifier is learned using the sparse projection coefficients. Our algorithm is useful in reducing the amount of data while improving computational efficiency and classification accuracy at the same time. The experiments on a publicly available EEG dataset demonstrate the superiority of the proposed algorithm over the compared methods.

Model Diagram

Paper Link:

Citation:

@article{ullah2019internal,
title={Internal emotion classification using EEG signal with sparse discriminative ensemble},
author={Ullah, Habib and Uzair, Muhammad and Mahmood, Arif and Ullah, Mohib and Khan, Sultan Daud and Cheikh, Faouzi Alaya},
journal={IEEE Access},
volume={7},
pages={40144–40153},
year={2019},
publisher={IEEE}
}

Multi-Order Statistical Descriptors for Real-Time Face Recognition and Object Classification (Access, 2018)

Abstract:

We propose novel multi-order statistical descriptors which can be used for high-speed object classification or face recognition from videos or image sets. We represent each gallery set with a global second-order statistic which captures correlated global variations in all feature directions as well as the common set structure. A lightweight descriptor is then constructed by efficiently compacting the second-order statistic using Cholesky decomposition. We then enrich the descriptor with the first-order statistic of the gallery set to further enhance the representation power. By projecting the descriptor into a low-dimensional discriminant subspace, we obtain further dimensionality reduction, while the discrimination power of the proposed representation is still preserved. Therefore, our method represents a complex image set by a single descriptor having significantly reduced dimensionality. We apply the proposed algorithm on image set and video-based face and periocular biometric identification, object category recognition, and hand gesture recognition. Experiments on six benchmark data sets validate that the proposed method achieves significantly better classification accuracy with lower computational complexity than the existing techniques. The proposed compact representations can be used for real-time object classification and face recognition in videos.
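
Code Sketch:

A minimal numpy sketch of the descriptor construction as described: the set covariance (second order) is compacted via its Cholesky factor and concatenated with the set mean (first order). The regularizer and dimensions are assumptions, and the discriminant-subspace projection step is omitted.

```python
import numpy as np

def set_descriptor(samples, reg=1e-3):
    """Sketch of a multi-order set descriptor: the set's covariance is
    compacted via its Cholesky factor, whose lower triangle is flattened
    and concatenated with the set mean."""
    mu = samples.mean(axis=0)                      # first-order statistic
    cov = np.cov(samples, rowvar=False) + reg * np.eye(samples.shape[1])
    Lc = np.linalg.cholesky(cov)                   # lower-triangular factor
    tri = Lc[np.tril_indices_from(Lc)]             # compact, no redundancy
    return np.concatenate([tri, mu])

rng = np.random.default_rng(0)
gallery_set = rng.standard_normal((120, 16))       # e.g. 120 frames, 16-D feats
d = set_descriptor(gallery_set)
print(d.shape)                                     # (16*17/2 + 16,) = (152,)
```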

Model Diagram

Paper Link:

Citation:

@article{mahmood2018multi,
title={Multi-order statistical descriptors for real-time face recognition and object classification},
author={Mahmood, Arif and Uzair, Muhammad and Al-Maadeed, Somaya},
journal={IEEE Access},
volume={6},
pages={12993–13004},
year={2018},
publisher={IEEE}
}

Palmprint Identification Using an Ensemble of Sparse Representations (Access, 2018)

Abstract:

Among various palmprint identification methods proposed in the literature, sparse representation for classification (SRC) is very attractive, offering high accuracy. Although SRC has good discriminative ability, its performance strongly depends on the quality of the training data. In particular, SRC suffers from two major problems: lack of training samples per class and large intra-class variations. In fact, palmprint images not only contain identity information but they also have other information, such as illumination and geometrical distortions due to the unconstrained conditions and the movement of the hand. In this case, the sparse representation assumption may not hold well in the original space since samples from different classes may be considered as being from the same class. This paper aims to enhance palmprint identification performance through SRC by proposing a simple yet efficient method based on an ensemble of sparse representations obtained through an ensemble of discriminative dictionaries satisfying the SRC assumption. Ensemble learning has the advantage of reducing the sensitivity due to the limited size of the training data and is performed based on random subspace sampling over the 2D-PCA space while keeping the image's inherent structure and information. In order to obtain discriminative dictionaries satisfying the SRC assumption, a new space is learned by minimizing and maximizing the intra-class and inter-class variations using 2D-LDA. Extensive experiments are conducted on two publicly available palmprint data sets: multispectral and PolyU. The obtained results are very promising compared with both state-of-the-art holistic and coding methods. Besides these findings, we provide an empirical analysis of the parameters involved in the proposed technique to guide the neophyte.

Paper Link:

Citation:

@article{rida2018palmprint,
title={Palmprint identification using an ensemble of sparse representations},
author={Rida, Imad and Al-Maadeed, Somaya and Mahmood, Arif and Bouridane, Ahmed and Bakshi, Sambit},
journal={IEEE Access},
volume={6},
pages={3241–3248},
year={2018},
publisher={IEEE}}

Medical Image Analysis (MEDIA)

Nucleus Classification in Histology Images Using Message Passing Network (MEDIA, 2022)

Abstract:

Identification of nuclear components in the histology landscape is an important step towards developing computational pathology tools for the profiling of tumor micro-environment. Most existing methods for the identification of such components are limited in scope due to the heterogeneous nature of the nuclei. Graph-based methods offer a natural way to formulate the nucleus classification problem to incorporate both appearance and geometric locations of the nuclei. The main challenge is to define models that can handle such an unstructured domain. Current approaches focus on learning better features and then employ well-known classifiers for identifying distinct nuclear phenotypes. In contrast, we propose a message passing network that is a fully learnable framework built on a classical network flow formulation. Based on the physical interaction of the nuclei, a nearest neighbor graph is constructed such that the nodes represent the nuclei centroids. For each edge and node, appearance and geometric features are computed, which are then used for the construction of messages utilized for diffusing contextual information to the neighboring nodes. Such an algorithm can infer global information over an entire network and predict biologically meaningful nuclear communities. We show that learning such communities improves the performance of the nucleus classification task in histology images. The proposed algorithm can be used as a component in existing state-of-the-art methods, resulting in improved nucleus classification performance across four different publicly available datasets.

Paper Link:

Citation:

@article{hassan2022nucleus,
title={Nucleus classification in histology images using message passing network},
author={Hassan, Taimur and Javed, Sajid and Mahmood, Arif and Qaiser, Talha and Werghi, Naoufel and Rajpoot, Nasir},
journal={Medical Image Analysis},
volume={79},
pages={102480},
year={2022},
publisher={Elsevier}
}

Spatially Constrained Context-Aware Hierarchical Deep Correlation Filters for Nucleus Detection in Histology Images (MEDIA, 2021)

Abstract:

Nucleus detection in histology images is a fundamental step for cellular-level analysis in computational pathology. In clinical practice, quantitative nuclear morphology can be used for diagnostic decision making, prognostic stratification, and treatment outcome prediction. Nucleus detection is a challenging task because of large variations in the shape of different types of nuclei as well as nuclear clutter, heterogeneous chromatin distribution, and irregular and fuzzy boundaries. To address these challenges, we aim to accurately detect nuclei using spatially constrained context-aware correlation filters with hierarchical deep features extracted from multiple layers of a pre-trained network. During training, we extract contextual patches around each nucleus which are used as negative examples, while the actual nucleus patch is used as a positive example. In order to spatially constrain the correlation filters, we propose to construct a spatial structural graph across different nucleus components encoding pairwise similarities. The correlation filters are constrained to act as eigenvectors of the Laplacian of the spatial graphs, enforcing these to capture the nucleus structure. A novel objective function is proposed by embedding graph-based structural information as well as the contextual information within the discriminative correlation filter framework. The learned filters are constrained to be orthogonal to both the contextual patches and the spatial graph-Laplacian basis to improve the localization and discriminative performance. The proposed objective function trains a hierarchy of correlation filters on different deep feature layers to capture the heterogeneity in nuclear shape and texture. The proposed algorithm is evaluated on three publicly available datasets and compared with 15 current state-of-the-art methods demonstrating competitive performance in terms of accuracy, speed, and generalization.

Paper Link:

Citation:

@article{javed2021spatially,
title={Spatially constrained context-aware hierarchical deep correlation filters for nucleus detection in histology images},
author={Javed, Sajid and Mahmood, Arif and Dias, Jorge and Werghi, Naoufel and Rajpoot, Nasir},
journal={Medical Image Analysis},
volume={72},
pages={102104},
year={2021},
publisher={Elsevier}
}

Cellular Community Detection For Tissue Phenotyping In Colorectal Cancer Histology Images (MEDIA, 2020)

Abstract:

A primary aim of detailed analysis of multi-gigapixel histology images is assisting pathologists in better cancer grading and prognostication. Several methods have been proposed for the analysis of histology images in the literature. However, these methods are often limited to the classification of two classes, i.e., tumor and stroma. Also, most existing methods are based on fully supervised learning and require a large amount of annotations, which are very difficult to obtain. To alleviate these challenges, we propose a novel community detection algorithm for the classification of tissue in Whole-slide Images (WSIs). The proposed algorithm uses a novel graph-based approach to the problem of detecting prevalent communities in a collection of histology images in a semi-supervised manner, resulting in the identification of six distinct tissue phenotypes in the multi-gigapixel image data. We formulate the problem of identifying distinct tissue phenotypes as the problem of finding network communities using the geodesic density gradient in the space of potential interaction between different cellular components. We show that prevalent communities found in this way represent distinct and biologically meaningful tissue phenotypes. Experiments on two independent Colorectal Cancer (CRC) datasets demonstrate that the proposed algorithm outperforms current state-of-the-art methods.

Model Diagram

Paper Link:

Citation:

@article{javed2020cellular,
title={Cellular Community Detection For Tissue Phenotyping In Colorectal Cancer Histology Images},
author={Javed, Sajid and Mahmood, Arif and Fraz, Muhammad Moazam and Koohbanani, Navid Alemi and Benes, Ksenija and Tsang, Yee-Wah and Hewitt, Katherine and Epstein, David and Snead, David and Rajpoot, Nasir},
journal={Medical Image Analysis},
year={2020}}

Information Fusion (IF)

Multi-focus Image Fusion Using Content Adaptive Blurring (IF, 2019)

Abstract:

Multi-focus image fusion has emerged as an important research area in information fusion. It aims at increasing the depth-of-field by extracting focused regions from multiple partially focused images, and merging them together to produce a composite image in which all objects are in focus. In this paper, a novel multi-focus image fusion algorithm is presented in which the task of detecting the focused regions is achieved using a Content Adaptive Blurring (CAB) algorithm. The proposed algorithm induces non-uniform blur in a multi-focus image depending on its underlying content. In particular, it analyzes the local image quality in a neighborhood and determines whether the blur should be induced or not without losing image quality. In CAB, pixels belonging to the blurred regions receive little or no blur at all, whereas the focused regions receive significant blur. The absolute difference of the original image and the CAB-blurred image yields an initial segmentation map, which is further refined using morphological operators and graph-cut techniques to improve the segmentation accuracy. Quantitative and qualitative evaluations and comparisons with current state-of-the-art on two publicly available datasets demonstrate the strength of the proposed algorithm.
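
Code Sketch:

A hedged scipy sketch of the detection principle: blurring alters focused regions far more than already-defocused ones, so a locally aggregated |image − blur(image)| map indicates focus and drives a per-pixel source selection. A uniform Gaussian blur stands in for the paper's content-adaptive blur, and the morphological/graph-cut refinement is omitted.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def focus_map(img, sigma=2.0, win=9):
    """Blurring changes focused regions far more than already-blurred
    ones, so the locally aggregated absolute difference between an image
    and its blurred version indicates focus."""
    diff = np.abs(img - gaussian_filter(img, sigma))
    return uniform_filter(diff, size=win)          # local focus energy

def fuse(img_a, img_b):
    choose_a = focus_map(img_a) >= focus_map(img_b)
    return np.where(choose_a, img_a, img_b)        # per-pixel source pick

rng = np.random.default_rng(0)
sharp = rng.random((64, 64))                       # stand-in all-in-focus image
a = sharp.copy(); a[:, 32:] = gaussian_filter(sharp, 3)[:, 32:]  # right half blurred
b = sharp.copy(); b[:, :32] = gaussian_filter(sharp, 3)[:, :32]  # left half blurred
fused = fuse(a, b)
print("fusion error:", np.abs(fused - sharp).mean())  # near zero
```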

Model Diagram

Paper Link:

Citation:

@article{farid2019multi,
title={Multi-focus image fusion using content adaptive blurring},
author={Farid, Muhammad Shahid and Mahmood, Arif and Al-Maadeed, Somaya Ali},
journal={Information Fusion},
volume={45},
pages={96–112},
year={2019},
publisher={Elsevier}
}

An Information Fusion Framework for Person Localization Via Body Pose in Spectator Crowds (IF, 2019)

Abstract:

Person localization or segmentation in low resolution crowded scenes is important for person tracking and recognition, action detection and anomaly identification. Due to occlusion and lack of interpersonal space, person localization becomes a difficult task. In this work, we propose a novel information fusion framework to integrate a Deep Head Detector and a body pose detector. A more accurate body pose showing limb positions will result in more accurate person localization. We propose a novel Deep Head Detector (DHD) to detect person heads in crowds. The proposed DHD is a fully convolutional neural network and it has shown improved head detection performance in crowds. We modify the Deformable Parts Model (DPM) pose detector to detect multiple upper body poses in crowds. We efficiently fuse the information obtained by the proposed DHD and the modified DPM to obtain a more accurate person pose detector. The proposed framework is named Fusion DPM (FDPM) and it has exhibited improved body pose detection performance on spectator crowds. The detected body poses are then used for more accurate person localization by segmenting each person in the crowd.

Model Diagram

Paper Link:

Citation:

@article{shaban2019information,
title={An information fusion framework for person localization via body pose in spectator crowds},
author={Shaban, Muhammad and Mahmood, Arif and Al-Maadeed, Somaya Ali and Rajpoot, Nasir},
journal={Information Fusion},
volume={51},
pages={178–188},
year={2019},
publisher={Elsevier}
}

Pattern Recognition (PR)

Unsupervised Moving Object Segmentation Using Background Subtraction and Optimal Adversarial Noise Sample Search (PR, 2022)

Abstract:

Moving Objects Segmentation (MOS) is a fundamental task in many computer vision applications such as human activity analysis, visual object tracking, content based video search, traffic monitoring, surveillance, and security. MOS becomes challenging due to abrupt illumination variations, dynamic backgrounds, camouflage, and scenes with bootstrapping. To address these challenges, we propose a MOS algorithm exploiting multiple adversarial regularizations including conventional as well as least squares losses. More specifically, our model is trained on scene background images with the help of cross-entropy loss, least squares adversarial loss, and l1 loss in the image space, working jointly to learn the dynamic background changes. During testing, our proposed method aims to generate test image background scenes by searching optimal noise samples using joint minimization of the l1 loss in the image space, the l1 loss in the feature space, and the discriminator least squares loss. These loss functions force the generator to synthesize dynamic backgrounds similar to the test sequences, which upon subtraction results in moving objects segmentation. Experimental evaluations on five benchmark datasets have shown the excellent performance of the proposed algorithm compared to twenty-one existing state-of-the-art methods.

Model Diagram

Paper Link:

Citation:

@article{sultana2022unsupervised,
title={Unsupervised moving object segmentation using background subtraction and optimal adversarial noise sample search},
author={Sultana, Maryam and Mahmood, Arif and Jung, Soon Ki},
journal={Pattern Recognition},
volume={129},
pages={108719},
year={2022},
publisher={Elsevier}
}

Journal of Visual Communication and Image Representation (JVCIR)

Multi-Scale Attention Guided Network for End-to-End Face Alignment and Recognition (JVCIR, 2022)

Abstract:

Attention modules embedded in deep networks mediate the selection of informative regions for object recognition. In addition, the combination of features learned from different branches of a network can enhance the discriminative power of these features. However, fusing features with inconsistent scales is a less-studied problem. In this paper, we first propose a multi-scale channel attention network with an adaptive feature fusion strategy (MSCAN-AFF) for face recognition (FR), which fuses the relevant feature channels and improves the network’s representational power. In FR, face alignment is performed independently prior to recognition, which requires the efficient localization of facial landmarks; these might be unavailable in uncontrolled scenarios such as low resolution and occlusion. Therefore, we propose utilizing our MSCAN-AFF to guide the Spatial Transformer Network (MSCAN-STN) to align feature maps learned from an unaligned training set in an end-to-end manner. Experiments on benchmark datasets demonstrate the effectiveness of our proposed MSCAN-AFF and MSCAN-STN.

Paper Link:

Citation:

@article{shakeel2022multi,
title={Multi-scale attention guided network for end-to-end face alignment and recognition},
author={Shakeel, M Saad and Zhang, Yuxuan and Wang, Xin and Kang, Wenxiong and Mahmood, Arif},
journal={Journal of Visual Communication and Image Representation},
volume={88},
pages={103628},
year={2022},
publisher={Elsevier}
}

Multi-Level Feature Fusion for Nucleus Detection in Histology Images Using Correlation Filters (CBM, 2022)

Abstract:

Nucleus detection is an important step in the analysis of histology images in the field of computational pathology. Pathologists use quantitative nuclear morphology for better cancer grading and prognostication. Nucleus detection becomes very challenging because of the large morphological variations across different types of nuclei, nuclei clutter, and heterogeneity. To address these challenges, we aim to improve nucleus detection using multi-level feature fusion based on discriminative correlation filters. The proposed algorithm employs multiple feature pools based on varying feature combinations. Early fusion is employed to integrate multi-feature information within a pool, and inter-pool fusion is proposed to fuse information across multiple pools. Inter-pool consistency is proposed to find the pools which are consistent and complement each other to improve performance. For this purpose, the relative standard deviation is used as an inter-pool consistency measure. Pool robustness to noise is also estimated using relative standard deviation as a robustness measure. High-level pool fusion is proposed using inter-pool consistency and pool-robustness scores. The proposed algorithm facilitates a robust and reliable appearance model for nucleus detection. The proposed algorithm is evaluated on three publicly available datasets and compared with several existing state-of-the-art methods. Our proposed algorithm has consistently outperformed existing methods on a wide range of experiments.
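
Code Sketch:

The relative standard deviation (standard deviation divided by mean) used as the inter-pool consistency and robustness measure is simple to compute. The sketch below assumes each pool is summarized by a vector of detection response scores, which is our simplification for illustration.

import numpy as np

def relative_std(scores):
    """Relative standard deviation (std/mean) of a pool's response scores."""
    scores = np.asarray(scores, dtype=float)
    return scores.std() / (scores.mean() + 1e-12)

# Pools with consistent responses (low relative std) get larger fusion weights.
pools = {"pool_a": [0.81, 0.79, 0.84], "pool_b": [0.90, 0.30, 0.75]}
weights = {k: 1.0 / (relative_std(v) + 1e-6) for k, v in pools.items()}
total = sum(weights.values())
weights = {k: w / total for k, w in weights.items()}   # normalized fusion weights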

Paper Link:

Citation:

@article{javed2022multi,
title={Multi-level feature fusion for nucleus detection in histology images using correlation filters},
author={Javed, Sajid and Mahmood, Arif and Dias, Jorge and Werghi, Naoufel},
journal={Computers in Biology and Medicine},
volume={143},
pages={105281},
year={2022},
publisher={Elsevier}
}

Learning to Localize Image Forgery Using End-to-End Attention Network (NC, 2022)

Abstract:

Recent advancements have increased the prevalence of digital image tampering. Anyone can manipulate multimedia content using editing software to alter the semantic meaning of images and deceive viewers. Since manipulations appear realistic, both humans and machines face challenges in detecting forgeries. In this work, we propose a novel algorithm for authenticating visual content by localizing forged regions. Our proposed algorithm employs channel attention convolutional blocks in an end-to-end learning framework. The channel attention infers forged regions in an image by extracting attention-aware multi-resolution features in the spatial domain and features in the frequency domain. Accordingly, the proposed network is divided into two subnetworks for extracting attention-aware multi-resolution features in the spatial and frequency domains. To predict the resulting mask, we concatenate the features of both networks. The proposed channel attention network exclusively focuses on the forged region and increases network generalization capabilities on unseen manipulations. Rigorous experiments demonstrate that the proposed algorithm outperforms state-of-the-art methods on five benchmark datasets for localizing a wide range of manipulations.

Paper Link:

Citation:

@article{ganapathi2022learning,
title={Learning to localize image forgery using end-to-end attention network},
author={Ganapathi, Iyyakutti Iyappan and Javed, Sajid and Ali, Syed Sadaf and Mahmood, Arif and Vu, Ngoc-Son and Werghi, Naoufel},
journal={Neurocomputing},
volume={512},
pages={25–39},
year={2022},
publisher={Elsevier}
}

Moving Objects Segmentation Using Generative Adversarial Modeling (NC, 2022)

Abstract:

Moving Objects Segmentation (MOS) is a crucial step in various computer vision applications, such as visual object tracking, autonomous vehicles, human activity analysis, surveillance, and security. Existing MOS approaches suffer from performance degradation under extremely challenging conditions in real-world complex environments such as varying illumination conditions, camouflaged objects, dynamic backgrounds, shadows, bad weather and camera jitter. To address these problems, we propose a novel generative adversarial framework for moving objects segmentation. Our framework works with one classifier discriminator, one representation learning network and one generator jointly trained to perform MOS in various challenging scenarios. During training, the discriminator network acts as a decision maker between real and fake training samples using a conditional least squares loss, while the representation learning network provides the difference between the deep features of real and fake training samples using a content loss formulation. Another loss term we exploit to train our generator network is the reconstruction loss, which minimizes the difference between the spatial information of real and fake training samples. Moreover, we also propose a novel modified U-net architecture for our generator network, showing improved performance over the vanilla U-net model. Experimental evaluations of our proposed method on four benchmark datasets in comparison with thirty-two existing methods have demonstrated the strength of our proposed model.

Paper Link:

Citation:

@article{sultana2022moving,
title={Moving objects segmentation using generative adversarial modeling},
author={Sultana, Maryam and Mahmood, Arif and Bouwmans, Thierry and Khan, Muhammad Haris and Jung, Soon Ki},
journal={Neurocomputing},
volume={506},
pages={240–251},
year={2022},
publisher={Elsevier}
}

Leveraging orientation for weakly supervised object detection with application to firearm localization (NC, 2021)

Abstract:

Automatic detection of firearms is important for enhancing the security and safety of people; however, it is a challenging task owing to the wide variations in shape, size and appearance of firearms. Also, most generic object detectors process axis-aligned rectangular areas, though a thin and long rifle may actually cover only a small percentage of that area and the rest may contain irrelevant details suppressing the required object signatures. To handle these challenges, we propose a weakly supervised Orientation Aware Object Detection (OAOD) algorithm which learns to detect oriented object bounding boxes (OBB) while using Axis-Aligned Bounding Boxes (AABB) for training. The proposed OAOD differs from existing oriented object detectors, which strictly require OBB annotations during training that may not always be available. The goal of training on AABB and detecting OBB is achieved by employing a multi-stage scheme, with Stage-1 predicting the AABB and Stage-2 predicting the OBB. In between the two stages, the oriented proposal generation module along with the object-aligned RoI pooling is designed to extract features based on the predicted orientation and to make these features orientation invariant. A diverse and challenging dataset consisting of eleven thousand manually annotated images is also proposed for firearm detection.

Model Diagram

Paper Link:

Citation:

@article{iqbal2021leveraging,
title={Leveraging orientation for weakly supervised object detection with application to firearm localization},
author={Iqbal, Javed and Munir, Muhammad Akhtar and Mahmood, Arif and Ali, Afsheen Rafaqat and Ali, Mohsen},
journal={Neurocomputing},
volume={440},
pages={310–320},
year={2021}
}

Dynamic workload patterns prediction for proactive auto-scaling of web applications (JNCA, 2018)

Abstract:

Proactive auto-scaling methods dynamically manage the resources for an application according to the current and future load predictions to preserve the desired performance at a reduced cost. However, auto-scaling web applications remains challenging mainly due to dynamic workload intensity and characteristics which are difficult to predict. Most existing methods mainly predict the request arrival rate, which only partially captures the workload characteristics and the changing system dynamics that influence the resource needs. This may lead to inappropriate resource provisioning decisions. In this paper, we address these challenges by proposing a framework for prediction of dynamic workload patterns as follows. First, we use an unsupervised learning method to analyze the web application access logs to discover URI (Uniform Resource Identifier) space partitions based on the response time and the document size features. Then, for each application URI, we compute its distribution across these partitions based on historical access logs to capture the workload characteristics more accurately than representing the workload by the request arrival rate alone. These URI distributions are then used to compute the Probabilistic Workload Pattern (PWP), which is a probability vector describing the overall distribution of incoming requests across URI partitions. Finally, the identified workload patterns for a specific number of last time intervals are used to predict the workload pattern of the next interval. The latter is used for future resource demand prediction and proactive auto-scaling to dynamically control the provisioning of resources. The framework is implemented and experimentally evaluated using historical access logs of three real web applications, each with increasing, decreasing, periodic, and randomly varying arrival rate behaviors. Results show that the proposed solution yields significantly more accurate predictions of workload patterns and resource demands of web applications compared to existing approaches.
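
Code Sketch:

The Probabilistic Workload Pattern is a probability vector over URI partitions. A toy computation is shown below, assuming requests in one interval have already been mapped to partition ids; the partition count and ids are made up for illustration.

import numpy as np

def workload_pattern(partition_ids, n_partitions):
    """PWP: distribution of an interval's requests over URI partitions."""
    counts = np.bincount(partition_ids, minlength=n_partitions).astype(float)
    return counts / counts.sum()

interval = np.array([0, 0, 1, 2, 1, 0, 3, 1])   # partition id per request
pwp = workload_pattern(interval, n_partitions=4)
print(pwp)                                       # [0.375 0.375 0.125 0.125]

A sequence of such vectors from the last few intervals would then feed the predictor for the next interval's pattern.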

Model Diagram:

Paper Link:

Citation:

@article{iqbal2018dynamic,
title={Dynamic workload patterns prediction for proactive auto-scaling of web applications},
author={Iqbal, Waheed and Erradi, Abdelkarim and Mahmood, Arif},
journal={Journal of Network and Computer Applications},
volume={124},
pages={94–107},
year={2018},
publisher={Elsevier}
}

Masked Linear Regression for Learning Local Receptive Fields for Facial Expression Synthesis (IJCV, 2019)

Abstract:

Compared to facial expression recognition, expression synthesis requires a very high-dimensional mapping. This problem is exacerbated by increasing image sizes and limits existing expression synthesis approaches to relatively small images. We observe that facial expressions often constitute sparsely distributed and locally correlated changes from one expression to another. By exploiting this observation, the number of parameters in an expression synthesis model can be significantly reduced. Therefore, we propose a constrained version of ridge regression that exploits the local and sparse structure of facial expressions. We consider this model as masked regression for learning local receptive fields. In contrast to the existing approaches, our proposed model can be efficiently trained on larger image sizes. Experiments using three publicly available datasets demonstrate that our model is significantly better than l0, l1 and l2-regression, SVD based approaches, and kernelized regression in terms of mean-squared-error, visual quality as well as computational and spatial complexities. The reduction in the number of parameters allows our method to generalize better even after training on smaller datasets. The proposed algorithm is also compared with state-of-the-art GANs including Pix2Pix, CycleGAN, StarGAN and GANimation. These GANs produce photo-realistic results as long as the testing and the training distributions are similar. In contrast, our results demonstrate significant generalization of the proposed algorithm over out-of-dataset human photographs, pencil sketches and even animal faces.
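
Code Sketch:

Masked regression constrains each output pixel's weights to a local receptive field. A small NumPy sketch under simplifying assumptions: gradient descent with projection onto a binary mask M stands in for the paper's constrained ridge solver, and the band-shaped mask and dimensions are illustrative only.

import numpy as np

def masked_ridge(X, Y, M, lam=0.1, lr=1e-3, steps=500):
    """Ridge regression with weights restricted to the support of mask M."""
    n = X.shape[0]
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(steps):
        grad = X.T @ (X @ W - Y) / n + lam * W   # ridge gradient
        W -= lr * grad
        W *= M                                   # project onto the masked support
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))   # flattened input expression images
Y = rng.normal(size=(200, 16))   # flattened target expression images
M = (np.abs(np.subtract.outer(np.arange(16), np.arange(16))) <= 2).astype(float)
W = masked_ridge(X, Y, M)        # sparse, local input-to-output mapping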

Paper Link:

Citation:

@article{khan2020masked,
title={Masked linear regression for learning local receptive fields for facial expression synthesis},
author={Khan, Nazar and Akram, Arbish and Mahmood, Arif and Ashraf, Sania and Murtaza, Kashif},
journal={International Journal of Computer Vision},
volume={128},
number={5},
pages={1433–1454},
year={2020},
publisher={Springer}
}

Fake Visual Content Detection Using Two-Stream Convolutional Neural Networks (NCA, 2022)

Abstract:

Rapid progress in adversarial learning has enabled the generation of realistic-looking fake visual content. To distinguish between fake and real visual content, several detection techniques have been proposed. The performance of most of these techniques, however, drops significantly if the test and the training data are sampled from different distributions. This motivates efforts towards improving the generalization of fake detectors. Since current fake content generation techniques do not accurately model the frequency spectrum of natural images, we observe that the frequency spectrum of fake visual data contains discriminative characteristics that can be used to detect fake content. We also observe that the information captured in the frequency spectrum is different from that of the spatial domain. Using these insights, we propose to complement frequency and spatial domain features using a two-stream convolutional neural network architecture called TwoStreamNet. We demonstrate the improved generalization of the proposed two-stream network to several unseen generation architectures, datasets, and techniques. The proposed detector has demonstrated significant performance improvement compared to the current state-of-the-art fake content detectors.
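
Code Sketch:

The frequency stream consumes the image's frequency spectrum. Below is a NumPy sketch of extracting a log-magnitude spectrum that such a stream could take as input; the exact transform and normalization used by TwoStreamNet are assumptions here.

import numpy as np

def log_spectrum(gray_img):
    """Centered log-magnitude frequency spectrum of a grayscale image."""
    spec = np.fft.fftshift(np.fft.fft2(gray_img))
    return np.log1p(np.abs(spec))     # compress the dynamic range

img = np.random.rand(64, 64)          # stand-in for a (possibly fake) image
freq_input = log_spectrum(img)        # input to the frequency stream
# The spatial stream consumes `img` directly; both streams' features are
# combined for the final real-vs-fake decision.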

Model Diagram

Paper Link:

Citation:

@article{yousaf2022fake,
title={Fake visual content detection using two-stream convolutional neural networks},
author={Yousaf, Bilal and Usama, Muhammad and Sultani, Waqas and Mahmood, Arif and Qadir, Junaid},
journal={Neural Computing and Applications},
volume={34},
number={10},
pages={7991–8004},
year={2022},
publisher={Springer}
}

Statistically correlated multi-task learning for autonomous driving (NCA, 2021)

Abstract:

Autonomous driving research is an emerging area in the machine learning domain. Most existing methods perform single-task learning, while multi-task learning (MTL) is more efficient due to the leverage of shared information between different tasks. However, MTL is challenging because different tasks may have different significance and varying ranges. In this work, we propose an end-to-end deep learning architecture for statistically correlated MTL using a single input image. Statistical correlation of the tasks is handled by including shared layers in the architecture. Later, the network separates into different branches to handle the differences in the behavior of each task. Training a multi-task model on tasks with varying ranges may let the objective function be dominated by the tasks with larger values. To this end, we explore different normalization schemes and empirically observe that the inverse validation-loss weighted scheme performs best. In addition to estimating steering angle, braking, and acceleration, we also estimate the number of lanes on the left and the right side of the vehicle. To the best of our knowledge, we are the first to propose an end-to-end deep learning architecture to estimate this type of lane information. The proposed approach is evaluated on four publicly available datasets including Comma.ai, Udacity, Berkeley Deep Drive, and Sully Chen. We also propose a synthetic dataset, GTA-V, for autonomous driving research. Our experiments demonstrate the superior performance of the proposed approach compared to the current state-of-the-art methods. The GTA-V dataset and the lane annotations on the four existing datasets will be made publicly available via https://cvlab.lums.edu.pk/scmtl/.
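
Code Sketch:

The inverse validation-loss weighting can be stated compactly. The sketch below assumes per-task validation losses from the previous epoch; the task names and numbers are invented for illustration.

def weighted_mtl_loss(task_losses, val_losses, eps=1e-8):
    """Combine per-task losses, each weighted by the inverse of its validation
    loss, so tasks with large value ranges do not dominate the objective."""
    inv = {t: 1.0 / (val_losses[t] + eps) for t in task_losses}
    norm = sum(inv.values())
    return sum((inv[t] / norm) * task_losses[t] for t in task_losses)

train = {"steer": 0.42, "brake": 0.10, "accel": 0.15, "lanes": 0.90}
val = {"steer": 0.50, "brake": 0.12, "accel": 0.20, "lanes": 1.10}
total = weighted_mtl_loss(train, val)   # scalar objective for this step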

Model Diagram

Paper Link:

Citation:

@article{abbas2021statistically,
title={Statistically correlated multi-task learning for autonomous driving},
author={Abbas, Waseem and Khan, M Fakhir and Taj, Murtaza and Mahmood, Arif},
journal={Neural Computing and Applications},
volume={33},
pages={12921–12938},
year={2021}
}

Human face super-resolution on poor quality surveillance video footage (NCA, 2021)

Abstract:

Most super-resolution (SR) methods proposed to date do not use real ground-truth high-resolution (HR) and low-resolution (LR) image pairs; instead, the vast majority of methods use synthetic LR images generated from the HR images. This approach yields excellent performance on synthetic datasets, but on real-world poor quality surveillance video footage, they suffer from performance degradation. A promising alternative is to apply recent advances in style transfer for unpaired datasets, but state-of-the-art work along these lines has used LR images and HR images from completely different datasets, introducing more variation between the HR and LR domains than necessary. In this paper, we propose methods that overcome both of these shortcomings, applying unpaired style transfer learning methods to face SR but using HR and LR datasets that share important properties. The key is to acquire roughly paired training data from a high-quality main stream and a lower-quality sub-stream of the same IP camera. Based on this principle, we have constructed four datasets comprising more than 400 people, with 1–15 weakly aligned real HR–LR pairs for each subject. We adopt a cycle generative adversarial networks (Cycle GANs) approach that produces impressive super-resolved images for low-quality test images never seen during training.

Model Diagram

Paper Link:

Citation:

@article{farooq2021human,
title={Human face super-resolution on poor quality surveillance video footage},
author={Farooq, Muhammad and Dailey, Matthew N and Mahmood, Arif and Moonrinta, Jednipat and Ekpanyapong, Mongkol},
journal={Neural Computing and Applications},
volume={33},
pages={13505–13523},
year={2021},
publisher={Springer}
}

Unsupervised deep context prediction for background estimation and foreground segmentation (MVA, 2019)

Abstract:

Background estimation is a fundamental step in many high-level vision applications, such as tracking and surveillance. Existing background estimation techniques suffer from performance degradation in the presence of challenges such as dynamic backgrounds, photometric variations, camera jitters, and shadows. To handle these challenges for the purpose of accurate background estimation, we propose a unified method based on Generative Adversarial Network (GAN) and image inpainting. The proposed method is based on a context prediction network, which is an unsupervised visual feature learning hybrid GAN model. Context prediction is followed by a semantic inpainting network for texture enhancement. We also propose a solution for arbitrary region inpainting using the center region inpainting method and Poisson blending technique. The proposed algorithm is compared with the existing state-of-the-art methods for background estimation and foreground segmentation and outperforms the compared methods by a significant margin.
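
Code Sketch:

OpenCV ships Poisson blending as seamlessClone, which can play the role of the blending step described above. The arrays and the paste location below are stand-ins; in the actual pipeline the source patch would come from the inpainting network.

import cv2
import numpy as np

background = np.full((256, 256, 3), 120, dtype=np.uint8)   # current estimate
patch = np.full((64, 64, 3), 128, dtype=np.uint8)          # inpainted region

mask = 255 * np.ones(patch.shape[:2], dtype=np.uint8)      # blend whole patch
center = (128, 128)                                        # target (x, y) in background
blended = cv2.seamlessClone(patch, background, mask, center, cv2.NORMAL_CLONE)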

Model Diagram

Paper Link:

Citation:

@article{sultana2019unsupervised,
title={Unsupervised deep context prediction for background estimation and foreground segmentation},
author={Sultana, Maryam and Mahmood, Arif and Javed, Sajid and Jung, Soon Ki},
journal={Machine Vision and Applications},
volume={30},
pages={375–395},
year={2019},
publisher={Springer}
}

Action recognition in poor quality spectator crowd videos using head distribution based person segmentation (MVA, 2019)

Abstract:

Despite a big volume of research on action recognition, little attention has been given to individual action recognition in poor-quality spectator crowd scenes. It is an important scenario because most surveillance systems generate poor-quality videos, to which current state-of-the-art methods may not be effectively applicable. Therefore, recognizing actions performed by individuals in poor-quality spectator crowd scenes is an unsolved problem. In such cases, the main challenge is localizing person proposals for each actor in the crowd. This challenge becomes more difficult when occlusion is severe. In this work, we propose a novel approach to find person proposals in poor-quality spectator crowds using crowd-based constraints. First, we define persons in the crowd by using efficient person head detectors. We exploit person head size to estimate the person bounding box using linear regression. Then, we use the distribution of heads in the crowd image to estimate more accurate person proposals. Motion trajectories are independently computed in the video without considering persons and then assigned to each person based on a novel distance measure computed between the trajectory and the person proposal. The set of trajectories and associated motion and texture-based features in overlapped time windows are used to compute the final feature vector. For each time window, using early information fusion in the bag of visual-words framework, cumulative feature vectors are computed encoding action information. Experiments are performed on a publicly available real-world spectator crowd dataset containing as many as 150 actors performing multiple actions at the same time. Our experiments have demonstrated excellent performance of the proposed technique.
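
Code Sketch:

Estimating a person's bounding box from the detected head via linear regression is straightforward. A scikit-learn sketch is shown; the training pairs below are invented, whereas in the paper they would come from annotated crowd frames.

import numpy as np
from sklearn.linear_model import LinearRegression

# Head boxes (x, y, w, h) paired with full-body boxes from annotated frames.
heads = np.array([[50, 20, 18, 20], [80, 22, 16, 18], [120, 25, 20, 22]])
bodies = np.array([[44, 20, 34, 95], [75, 22, 30, 85], [113, 25, 38, 105]])

reg = LinearRegression().fit(heads, bodies)   # one regressor, four outputs

new_head = np.array([[60, 21, 17, 19]])
print(reg.predict(new_head))                  # estimated person bounding box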

Paper Link:

Citation:

@article{mahmood2019action,
title={Action recognition in poor-quality spectator crowd videos using head distribution-based person segmentation},
author={Mahmood, Arif and Al-Maadeed, Somaya},
journal={Machine Vision and Applications},
volume={30},
number={6},
pages={1083–1096},
year={2019},
publisher={Springer}
}

Canny edge detection and Hough transform for high resolution video streams using Hadoop and Spark (CC, 2019)

Abstract:

Nowadays, video cameras are increasingly used for surveillance, monitoring, and activity recording. These cameras generate high resolution image and video data at large scale. Processing such large scale video streams to extract useful information with time constraints is challenging. Traditional methods do not offer scalability to process large scale data. In this paper, we propose and evaluate cloud services for high resolution video streams in order to perform line detection using Canny edge detection followed by Hough transform. These algorithms are often used as preprocessing steps for various high level tasks including object, anomaly, and activity recognition. We implement and evaluate both Canny edge detector and Hough transform algorithms in Hadoop and Spark. Our experimental evaluation using Spark shows excellent scalability and performance compared to Hadoop and standalone implementations for both Canny edge detection and Hough transform. We obtained speedups of 10.8 and 9.3 for Canny edge detection and Hough transform, respectively, using Spark. These results demonstrate the effectiveness of parallel implementation of computer vision algorithms to achieve good scalability for real-world applications.
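
Code Sketch:

The per-frame processing maps naturally onto Spark. A minimal PySpark sketch follows, assuming one encoded frame per file under a hypothetical frames/ directory; the Canny thresholds and Hough parameters are illustrative.

import cv2
import numpy as np
from pyspark import SparkContext

def detect_lines(frame_bytes):
    """Decode a frame, run Canny, then the probabilistic Hough transform."""
    img = cv2.imdecode(np.frombuffer(frame_bytes, np.uint8), cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(img, 100, 200)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                            minLineLength=30, maxLineGap=10)
    return [] if lines is None else lines.reshape(-1, 4).tolist()

sc = SparkContext(appName="canny-hough")
frames = sc.binaryFiles("hdfs:///frames/*.jpg").values()   # hypothetical path
lines_per_frame = frames.map(detect_lines).collect()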

Model Diagram

Paper Link:

Citation:

@article{iqbal2020canny,
title={Canny edge detection and Hough transform for high resolution video streams using Hadoop and Spark},
author={Iqbal, Bilal and Iqbal, Waheed and Khan, Nazar and Mahmood, Arif and Erradi, Abdelkarim},
journal={Cluster Computing},
volume={23},
number={1},
pages={397–408},
year={2020},
publisher={Springer}
}

A Boosting Framework for Human Posture Recognition Using Spatio-Temporal Features along with Radon Transform (MTA, 2022)

Abstract:

Automatic human posture recognition in surveillance videos has real-world applications in monitoring old-age homes, restoration centers, hospitals, disability care, and child-care centers. It also has applications in other areas such as security and surveillance, sports, and abnormal activity recognition. Human posture recognition is a challenging problem due to occlusion, background clutter, illumination variations, camouflage, and noise in the captured video signal. In the current study, which is an extension of our previous work (Ali et al. Sensors, 18(6):1918, 2018), we propose a novel combination of a number of spatio-temporal features computed over human blobs in a temporal window. These features include aspect ratios, shape descriptors, geometric centroids, ellipse axes ratio, silhouette angles, and silhouette speed. In addition to these features, we also exploit the radon transform for better shape based analysis. In order to obtain improved posture classification accuracy, we use the J48 classifier under a boosting framework by employing the AdaBoost algorithm. The proposed algorithm is compared with eighteen existing state-of-the-art approaches on four publicly available datasets including MCF, UR Fall detection, KARD, and NUCLA. Our results demonstrate the excellent performance of the proposed algorithm compared to these existing methods.
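
Code Sketch:

J48 is WEKA's C4.5 implementation; the sketch below approximates it with scikit-learn's DecisionTreeClassifier inside AdaBoost and takes crude radon-projection statistics as stand-in features, so treat it as an outline of the pipeline rather than the paper's feature set.

import numpy as np
from skimage.transform import radon
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

def radon_features(silhouette, angles=(0, 45, 90, 135)):
    """Summarize a binary silhouette by mean radon projections at a few angles."""
    sino = radon(silhouette.astype(float), theta=list(angles), circle=False)
    return sino.mean(axis=0)

rng = np.random.default_rng(0)
X = np.array([radon_features(rng.random((32, 32)) > 0.5) for _ in range(40)])
y = rng.integers(0, 2, size=40)          # toy posture labels

clf = AdaBoostClassifier(                # `estimator=` requires scikit-learn >= 1.2
    estimator=DecisionTreeClassifier(max_depth=3), n_estimators=50).fit(X, y)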

Model Diagram

Paper Link:

Citation:

@article{aftab2022boosting,
title={A boosting framework for human posture recognition using spatio-temporal features along with radon transform},
author={Aftab, Salma and Ali, Syed Farooq and Mahmood, Arif and Suleman, Umar},
journal={Multimedia Tools and Applications},
volume={81},
number={29},
pages={42325–42351},
year={2022},
publisher={Springer}
}

Improving Object Tracking by Added Noise and Channel Attention (Sensors, 2020)

Abstract:

CNN-based trackers, especially those based on Siamese networks, have recently attracted considerable attention because of their relatively good performance and low computational cost. For many Siamese trackers, learning a generic object model from a large-scale dataset is still a challenging task. In the current study, we introduce input noise as regularization in the training data to improve generalization of the learned model. We propose an Input-Regularized Channel Attentional Siamese (IRCA-Siam) tracker which exhibits improved generalization compared to the current state-of-the-art trackers. In particular, we exploit offline learning by introducing additive noise for input data augmentation to mitigate the overfitting problem. We propose feature fusion from noisy and clean input channels which improves the target localization. Channel attention integrated with our framework helps find more useful target features, resulting in further performance improvement. Our proposed IRCA-Siam enhances the discrimination of the tracker/background and improves fault tolerance and generalization. An extensive experimental evaluation on six benchmark datasets including OTB2013, OTB2015, TC128, UAV123, VOT2016 and VOT2017 demonstrates superior performance of the proposed IRCA-Siam tracker compared to 30 existing state-of-the-art trackers.

Model Diagram

Paper Link:

Citation:

@article{fiaz2020improving,
title={Improving object tracking by added noise and channel attention},
author={Fiaz, Mustansar and Mahmood, Arif and Baek, Ki Yeol and Farooq, Sehar Shahzad and Jung, Soon Ki},
journal={Sensors},
volume={20},
number={13},
pages={3780},
year={2020},
publisher={MDPI}
}

Learning soft mask based feature fusion with channel and spatial attention for robust visual object tracking (Sensors, 2020)

Abstract:

We propose to improve visual object tracking by introducing a soft mask based low-level feature fusion technique. The proposed technique is further strengthened by integrating channel and spatial attention mechanisms. The proposed approach is integrated within a Siamese framework to demonstrate its effectiveness for visual object tracking. The proposed soft mask is used to give more importance to the target regions compared to the other regions, enabling effective target feature representation and increasing discriminative power. The low-level feature fusion improves the tracker's robustness against distractors. The channel attention is used to identify more discriminative channels for better target representation. The spatial attention complements the soft mask based approach to better localize the target objects in challenging tracking scenarios. We evaluated our proposed approach over five publicly available benchmark datasets and performed extensive comparisons with 39 state-of-the-art tracking algorithms. The proposed tracker demonstrates excellent performance compared to the existing state-of-the-art trackers.

Model Diagram

Paper Link:

Citation:

@article{fiaz2020learning,
title={Learning soft mask based feature fusion with channel and spatial attention for robust visual object tracking},
author={Fiaz, Mustansar and Mahmood, Arif and Jung, Soon Ki},
journal={Sensors},
volume={20},
number={14},
pages={4021},
year={2020},
publisher={MDPI}
}

Using temporal covariance of motion and geometric features via boosting for human fall detection (Sensors, 2018)

Abstract:

Fall-induced injuries are serious incidents for elderly as well as young persons. A real-time, automatic and accurate fall detection system can play a vital role in timely medical care, which will ultimately help decrease injuries and complications. In this paper, we propose a fast and more accurate real-time system which can detect people falling in videos captured by surveillance cameras. Novel temporal and spatial variance-based features are proposed which comprise the discriminatory motion, geometric orientation and location of the person. These features are used along with an ensemble learning strategy of boosting with J48 and Adaboost classifiers. Experiments have been conducted on publicly available standard datasets including Multiple Cameras Fall (with 2 classes and 3 classes) and UR Fall Detection, achieving percentage accuracies of 99.2, 99.25 and 99.0, respectively. Comparisons with nine state-of-the-art methods demonstrate the effectiveness of the proposed approach on both datasets.
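
Code Sketch:

Temporal covariance of per-frame features over a sliding window is direct to compute. Here the motion and geometric features are abstracted into a generic (frames x features) matrix; the window length and dimensionality are illustrative.

import numpy as np

def temporal_covariance(frame_feats):
    """Flattened upper triangle of the feature covariance over a window.

    frame_feats: (T, d) array, one d-dimensional feature vector per frame.
    """
    cov = np.cov(frame_feats, rowvar=False)
    iu = np.triu_indices(cov.shape[0])
    return cov[iu]

window = np.random.rand(15, 6)               # 15 frames, 6 features
descriptor = temporal_covariance(window)     # input to the boosted classifier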

Paper Link:

Citation:

@article{ali2018using,
title={Using temporal covariance of motion and geometric features via boosting for human fall detection},
author={Ali, Syed Farooq and Khan, Reamsha and Mahmood, Arif and Hassan, Malik Tahir and Jeon, Moongu},
journal={Sensors},
volume={18},
number={6},
pages={1918},
year={2018},
publisher={MDPI}
}

Handcrafted and Deep Trackers: Recent Visual Object Tracking Approaches and Trends (ACM CS, 2019)

Abstract:

In recent years, visual object tracking has become a very active research area. An increasing number of tracking algorithms are being proposed each year. This is because tracking has wide applications in various real-world problems such as human-computer interaction, autonomous vehicles, robotics, surveillance, and security, to name a few. In the current study, we review the latest trends and advances in the tracking area and evaluate the robustness of different trackers based on the feature extraction methods. The first part of this work includes a comprehensive survey of the recently proposed trackers. We broadly categorize trackers into Correlation Filter based Trackers (CFTs) and Non-CFTs. Each category is further classified into various types based on the architecture and the tracking mechanism. In the second part of this work, we experimentally evaluated 24 recent trackers for robustness and compared handcrafted and deep feature based trackers. We observe that trackers using deep features performed better, though in some cases a fusion of both increased performance significantly. To overcome the drawbacks of the existing benchmarks, a new benchmark, Object Tracking and Temple Color (OTTC), has also been proposed and used in the evaluation of different algorithms. We analyze the performance of trackers over 11 different challenges in OTTC and 3 other benchmarks. Our study concludes that Discriminative Correlation Filter (DCF) based trackers perform better than the others. It also reveals that the inclusion of different types of regularizations over DCF often results in boosted tracking performance. Finally, we sum up our study by pointing out some insights and indicating future trends in the visual object tracking field.

Paper Link:

Citation:

@article{fiaz2019handcrafted,
title={Handcrafted and deep trackers: Recent visual object tracking approaches and trends},
author={Fiaz, Mustansar and Mahmood, Arif and Javed, Sajid and Jung, Soon Ki},
journal={ACM Computing Surveys (CSUR)},
volume={52},
number={2},
pages={1–44},
year={2019},
publisher={ACM New York, NY, USA}
}

Visual Object Tracking with Deep Neural Networks, 2019

Deep Siamese networks towards robust visual tracking

Abstract:

Recently, Siamese neural networks have been widely used in visual object tracking to leverage the template matching mechanism. A Siamese network architecture contains two parallel streams to estimate the similarity between two inputs and has the ability to learn their discriminative features. Various deep Siamese-based tracking frameworks have been proposed to estimate the similarity between the target and the search region. In this chapter, we categorize deep Siamese networks into three categories by the position of the merging layers: late merge, intermediate merge and early merge architectures. In the late merge architecture, inputs are processed as two separate streams and merged at the end of the network, while in the intermediate merge architecture, inputs are initially processed separately and merged at an intermediate point well before the final layer. In the early merge architecture, inputs are combined at the start of the network and a unified data stream is processed by a single convolutional neural network. We evaluate the performance of deep Siamese trackers based on the merge architectures and their output, such as similarity score, response map, and bounding box, in various tracking challenges. This chapter gives an overview of recent developments in deep Siamese trackers and provides insights for new developments in the tracking field.
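
Code Sketch:

The late versus early merge distinction can be made concrete in a few lines. The PyTorch sketch below uses toy backbones standing in for real CNNs; it is a schematic of the taxonomy, not any specific tracker.

import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())

def late_merge(template, search):
    """Two shared-weight streams; merged only at the end via similarity."""
    return torch.cosine_similarity(backbone(template), backbone(search))

early_net = nn.Sequential(nn.Conv2d(6, 8, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))

def early_merge(template, search):
    """Inputs concatenated at the start; one unified stream scores the pair."""
    return early_net(torch.cat([template, search], dim=1))

t, s = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
print(late_merge(t, s), early_merge(t, s))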

Paper Link:

Citation:

@article{fiaz2019deep,
title={Deep siamese networks toward robust visual tracking},
author={Fiaz, Mustansar and Mahmood, Arif and Jung, Soon Ki},
journal={Visual Object Tracking with Deep Neural Networks},
year={2019},
publisher={IntechOpen London, UK}
}

Video Object Segmentation Based on Guided Feature Transfer Learning (IW-FCV, 2022) [Best Paper Award]

Abstract:

Video Object Segmentation (VOS) is a fundamental task with many real-world computer vision applications, and it is challenging due to distractors and background clutter. Many existing online learning approaches have limited practical significance because of the high computational cost required to fine-tune network parameters. Matching-based and propagation-based approaches are computationally efficient but may suffer from degraded performance in cluttered backgrounds and object drift. To handle these issues, we propose an offline end-to-end model that learns guided feature transfer for VOS. We introduce guided feature modulation based on the target mask to capture the video context information, and a generative appearance model is used to provide cues for both the target and the background. The proposed guided feature modulation system learns the target semantic information based on modulation activations. The generative appearance model learns the probability of a pixel belonging to the target or the background. In addition, low-resolution features from deeper networks may not capture the global contextual information and may reduce the performance during feature refinement. Therefore, we also propose a guided pooled decoder to learn the global as well as local context information for better feature refinement. Evaluation over two VOS benchmark datasets, DAVIS2016 and DAVIS2017, has shown excellent performance of the proposed framework compared to more than 20 existing state-of-the-art methods.

Paper Link:

Citation:

@inproceedings{fiaz2022video,
title={Video Object Segmentation Based on Guided Feature Transfer Learning},
author={Fiaz, Mustansar and Mahmood, Arif and Shahzad Farooq, Sehar and Ali, Kamran and Shaheryar, Muhammad and Jung, Soon Ki},
booktitle={International Workshop on Frontiers of Computer Vision},
pages={197–210},
year={2022},
organization={Springer}
}

Lightweight Encoder-Decoder Architecture for Foot Ulcer Segmentation (IW-FCV, 2022)

Abstract:

Continuous monitoring of foot ulcer healing is needed to ensure the efficacy of a given treatment and to avoid any possibility of deterioration. Foot ulcer segmentation is an essential step in wound diagnosis. We developed a model that is similar in spirit to the well-established encoder-decoder and residual convolution neural networks. Our model includes a residual connection along with channel and spatial attention integrated within each convolution block. A simple patch-based approach for model training, test time augmentations, and majority voting on the obtained predictions resulted in superior performance. Our model did not leverage any readily available backbone architecture, pre-training on a similar external dataset, or any of the transfer learning techniques. With a total of around 5 million network parameters, it is a significantly lightweight model compared with the available state-of-the-art models used for the foot ulcer segmentation task. Our experiments present results at both the patch level and the image level. Applied to the publicly available Foot Ulcer Segmentation (FUSeg) Challenge dataset from MICCAI 2021, our model achieved state-of-the-art image-level performance of 88.22% in terms of Dice similarity score and ranked second on the official challenge leaderboard. We also presented an extremely simple solution that compares well against the more advanced architectures.
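
Code Sketch:

Test-time augmentation with majority voting can be sketched as below, assuming a hypothetical model that maps an image tensor to per-pixel foreground probabilities; only self-inverse flip augmentations are used so the inversion is trivial.

import torch

def tta_predict(model, img, thresh=0.5):
    """Majority vote over flip-augmented predictions.

    img: (1, C, H, W) tensor; model returns (1, 1, H, W) probabilities.
    """
    augs = [lambda x: x,
            lambda x: torch.flip(x, dims=[-1]),   # horizontal flip
            lambda x: torch.flip(x, dims=[-2])]   # vertical flip
    # Apply aug, predict, undo with the same aug (flips are self-inverse).
    votes = [(aug(model(aug(img))) > thresh).float() for aug in augs]
    return (torch.stack(votes).mean(dim=0) > 0.5).float()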

Model Diagram

Paper Link:

Citation:

@inproceedings{ali2022lightweight,
title={Lightweight Encoder-Decoder Architecture for Foot Ulcer Segmentation},
author={Ali, Shahzad and Mahmood, Arif and Jung, Soon Ki},
booktitle={International Workshop on Frontiers of Computer Vision},
pages={242–253},
year={2022},
organization={Springer}
}

Robust Tracking via Feature Enrichment and Overlap Maximization (IW-FCV, 2021)

Abstract:

Recently, Convolutional Neural Network (CNN) based approaches have demonstrated impressive gains over conventional approaches, which has resulted in the rapid development of various visual object trackers. However, these advancements are limited in terms of accuracy due to the distractors present in videos. Moreover, most deep trackers operate on low-resolution features, such as in template matching, which are semantically reliable but spatially less accurate. We propose an efficient feature enrichment module within a tracking framework to learn contextually reliable information and spatially accurate feature representations. The proposed feature enrichment combines enriched feature sets by exploiting contextual information from multiple scales as well as preserving the spatial information details. We integrate the proposed feature enrichment module within the baseline ATOM, which solves the tracking problem by target estimation and classification components. The former component estimates the target based on an IoU-predictor, while the latter component is trained online to enforce high discrimination power. An experimental study over three benchmarks including VOT2015, VOT2016, and VOT2017 reveals that the proposed feature enrichment module boosts the tracker accuracy.

Paper Link:

Citation:

@inproceedings{fiaz2021robust,
title={Robust Tracking via Feature Enrichment and Overlap Maximization},
author={Fiaz, Mustansar and Ali, Kamran and Yun, Sang Bin and Baek, Ki Yeol and Lee, Hye Jin and Kim, In Su and Mahmood, Arif and Farooq, Sehar Shahzad and Jung, Soon Ki},
booktitle={Frontiers of Computer Vision: 27th International Workshop, IW-FCV 2021, Daegu, South Korea, February 22–23, 2021, Revised Selected Papers 27},
pages={17–30},
year={2021},
organization={Springer}
}

Adaptive Feature Selection Siamese Networks for Visual Tracking (IW-FCV, 2020) [Best Student Paper Award]

Abstract:

Recently, template based discriminative trackers, especially Siamese network based trackers, have shown great potential in terms of balanced accuracy and tracking speed. However, it is still difficult for Siamese models to adapt to target variations through offline learning. In this paper, we introduce an Adaptive Feature Selection Siamese (AFSSiam) network to learn the most discriminative feature information for better tracking. Features from different layers contain complementary information for discrimination. The proposed adaptive feature selection module selects the most useful feature information from different convolutional layers while suppressing the irrelevant ones. The proposed tracking algorithm not only alleviates the over-fitting problem but also increases the discriminative ability. The proposed tracking framework is trained end-to-end, and extensive experimental results over OTB50, OTB100, TC-128, and VOT2017 demonstrate that our tracking algorithm exhibits favorable performance compared to other state-of-the-art methods.

Model Diagram

Paper Link:

Citation:

@inproceedings{fiaz2020adaptive,
title={Adaptive feature selection Siamese networks for visual tracking},
author={Fiaz, Mustansar and Rahman, Md Maklachur and Mahmood, Arif and Farooq, Sehar Shahzad and Baek, Ki Yeol and Jung, Soon Ki},
booktitle={Frontiers of Computer Vision: 26th International Workshop, IW-FCV 2020, Ibusuki, Kagoshima, Japan, February 20–22, 2020, Revised Selected Papers 26},
pages={167–179},
year={2020},
organization={Springer}
}

Unsupervised Adversarial Learning for Dynamic Background Modelling (IW-FCV, 2020) [Best Paper Award]

Abstract:

Dynamic Background Modeling (DBM) is a crucial task in many computer vision based applications such as human activity analysis, traffic monitoring, surveillance, and security. DBM is extremely challenging in scenarios like illumination changes, camouflage, intermittent object motion or shadows. In this study, we propose an end-to-end framework based on a Generative Adversarial Network, which can generate dynamic background information for the task of DBM in an unsupervised manner. Our proposed model can handle the problem of DBM in the presence of the challenges mentioned above by generating data similar to the desired information. The primary aim of our proposed model during training is to learn all the dynamic changes in scene-specific background information. During testing, inverse mapping of data to the latent space representation in our model generates dynamic backgrounds similar to the test data. Experimental evaluations on the SBM.net and SBI benchmark datasets show that our proposed model outperforms eight existing DBM methods in many challenging scenarios.

Paper Link:

Citation:

@inproceedings{sultana2020unsupervised,
title={Unsupervised adversarial learning for dynamic background modeling},
author={Sultana, Maryam and Mahmood, Arif and Bouwmans, Thierry and Jung, Soon Ki},
booktitle={International Workshop on Frontiers of Computer Vision},
pages={248–261},
year={2020},
organization={Springer}
}

Cross-modal Speaker Verification and Recognition: A Multilingual Perspective (CVPRW, 2021)

Abstract:

Recent years have seen a surge in finding associations between faces and voices within cross-modal biometric applications along with speaker recognition. Inspired by this, we introduce a challenging task of establishing the association between faces and voices across multiple languages spoken by the same set of persons. The aim of this paper is to answer two closely related questions: “Is face-voice association language independent?” and “Can a speaker be recognized irrespective of the spoken language?”. These two questions are important to understand the effectiveness of multilingual biometric systems and to boost their development. To answer them, we collected a Multilingual Audio-Visual dataset, containing human speech clips of 154 identities with 3 language annotations extracted from various videos uploaded online. Extensive experiments on the two splits of the proposed dataset have been performed to investigate and answer these novel research questions that clearly point out the relevance of the multilingual problem.

Model Diagram

Paper Link:

Citation:

@inproceedings{nawaz2021cross,
title={Cross-modal speaker verification and recognition: A multilingual perspective},
author={Nawaz, Shah and Saeed, Muhammad Saad and Morerio, Pietro and Mahmood, Arif and Gallo, Ignazio and Yousaf, Muhammad Haroon and Del Bue, Alessio},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={1682–1691},
year={2021}
}

Cleaning Label Noise with Clusters for Minimally Supervised Anomaly Detection (CVPRW - Learning from Unlabeled Videos (LUV), 2021)

Abstract:

We present an image set classification algorithm based on unsupervised clustering of labeled training and unlabeled test data where labels are only used in the stopping criterion. The probability distribution of each class over the set of clusters is used to define a true set based similarity measure. To this end, we propose an iterative sparse spectral clustering algorithm. In each iteration, a proximity matrix is efficiently recomputed to better represent the local subspace structure. Initial clusters capture the global data structure and finer clusters at the later stages capture the subtle class differences not visible at the global scale. Image sets are compactly represented with multiple Grassmannian manifolds which are subsequently embedded in Euclidean space with the proposed spectral clustering algorithm. We also propose an efficient eigenvector solver which not only reduces the computational cost of spectral clustering by many folds but also improves the clustering quality and final classification results. Experiments on five standard datasets and comparison with seven existing techniques show the efficacy of our algorithm.

Model Diagram

Paper Link:

Citation:

@article{zaheer2021cleaning,
title={Cleaning label noise with clusters for minimally supervised anomaly detection},
author={Zaheer, Muhammad Zaigham and Lee, Jin-ha and Astrid, Marcella and Mahmood, Arif and Lee, Seung-Ik},
journal={arXiv preprint arXiv:2104.14770},
year={2021}
}

Video Object Segmentation using Guided Feature and Directional Deep Appearance Learning (CVPRW, 2020)

Abstract:

In this work, we focus on the semi-supervised Video Object Segmentation (VOS) problem, where an object mask is provided in the initial frame and the VOS algorithm has to segment that object in the rest of the video frames. VOS is a challenging task due to object appearance variations, illumination changes, occlusion, background clutter and various distractions. Many online VOS methods have been proposed; however, most of these methods have limited real-world applicability due to computationally expensive online fine-tuning. On the contrary, many cost-efficient template-based and propagation-based approaches suffer from degraded performance due to object appearance drift. In order to tackle these issues, we propose guided feature learning with directional deep appearance learning for VOS. First, we introduce guided feature modulation to capture the video context information based on the target mask. Second, a directional matching module is utilized to learn pixel-wise semantic embedding. Third, a directional appearance model is integrated to represent the target and the background cues on a spherical embedding space. Finally, we propose a guided pooling decoder to learn the global and the local context information during refinement. The proposed network is trained offline and does not require fine-tuning. Our algorithm achieved an overall J and F score of 64.9 on the DAVIS 2020 test-challenge data and 60.9 on the DAVIS 2020 test-dev dataset.

Model Diagram

Paper Link:

Citation:

@inproceedings{fiaz2020video,
title={Video object segmentation using guided feature and directional deep appearance learning},
author={Fiaz, Mustansar and Mahmood, Arif and Jung, Soon Ki},
booktitle={Proceedings of the 2020 DAVIS Challenge on Video Object Segmentation-CVPR, Workshops, Seattle, WA, USA},
volume={19},
year={2020}
}

An anomaly detection system via moving surveillance robots with human collaboration (ICCVW, 2021)

Abstract:

Autonomous anomaly detection is a fundamental step in visual surveillance systems, and so we have witnessed great progress in the form of various promising algorithms. Nonetheless, the majority of prior algorithms assume static surveillance cameras, which severely restricts the coverage of the system unless the number of cameras is exponentially increased, consequently increasing both the installation and the monitoring costs. In the current work, we propose an anomaly detection system based on mobile surveillance cameras, i.e., moving robots which continuously navigate a target area. We compare the newly acquired test images with a database of normal images using geo-tags. For anomaly detection, a Siamese network is trained which analyses two input images for anomalies while ignoring the viewpoint differences. Further, our system is capable of updating the normal-image database with human collaboration. Finally, we propose a new test dataset that is captured by repeated visits of the robot over a constrained outdoor industrial target area. Our experiments demonstrate the effectiveness of the proposed system for anomaly detection using mobile surveillance robots.

Model Diagram

Paper Link:

Citation:

@inproceedings{zaheer2021anomaly,
title={An anomaly detection system via moving surveillance robots with human collaboration},
author={Zaheer, Muhammad Zaigham and Mahmood, Arif and Khan, M Haris and Astrid, Marcella and Lee, Seung-Ik},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={2595–2601},
year={2021}
}

Background/Foreground Separation: Guided Attention based Adversarial Modeling (GAAM) versus Robust Subspace Learning Methods (ICCVW, 2021)

Abstract:

Background-foreground separation and appearance generation is a fundamental step in many computer vision applications. Existing methods like Robust Subspace Learning (RSL) suffer performance degradation in the presence of challenges like bad weather, illumination variations, occlusion, dynamic backgrounds and intermittent object motion. In the current work, we propose a more accurate deep neural network based model for background-foreground separation and complete appearance generation of the foreground objects. Our proposed model, the Guided Attention based Adversarial Model (GAAM), can efficiently extract pixel-level boundaries of the foreground objects for improved appearance generation. Unlike RSL methods, our model extracts the binary information of foreground objects, labeled as an attention map, which guides our generator network to segment the foreground objects from the complex background information. A wide range of experiments performed on the benchmark CDnet2014 dataset demonstrates the excellent performance of our proposed model.

Model Diagram

Paper Link:

Citation:

@inproceedings{sultana2021background,
title={Background/Foreground Separation: Guided Attention based Adversarial Modeling (GAAM) versus Robust Subspace Learning Methods},
author={Sultana, Maryam and Mahmood, Arif and Bouwmans, Thierry and Khan, Muhammad Haris and Jung, Soon Ki},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={181–188},
year={2021}
}

Deep Multiresolution Cellular Communities for Semantic Segmentation of Multi-Gigapixel Histology Images (ICCVW, 2019)

Abstract:

Tissue phenotyping in cancer histology images is a fundamental step in computational pathology. Automatic tools for tissue phenotyping assist pathologists in digital profiling of the tumor microenvironment. Recently, deep learning and classical machine learning methods have been proposed for tissue phenotyping. However, these methods do not integrate the cellular community interaction features which carry biological significance in the tissue phenotyping context. In this paper, we propose to exploit deep multiresolution cellular communities for tissue phenotyping from multi-level cell graphs and show that such communities offer better performance compared to deep learning and texture-based methods. We propose to use deep features extracted from two distinct layers of a deep neural network at the cell level, in order to construct cellular graphs encoding cellular interactions at multiple scales. From these graphs, we extract cellular interaction-based features, which are then employed to construct patch-level graphs. Multiresolution communities are detected by considering the patch-level graphs as layers of multi-level graphs, and by proposing a novel objective function based on non-negative matrix factorization. We report results of our experiments on two datasets for colon cancer tissue phenotyping and demonstrate excellent performance of the proposed algorithm as compared to current state-of-the-art methods.

Model Diagram

Paper Link:

Citation:

@inproceedings{javed2019deep,
title={Deep multiresolution cellular communities for semantic segmentation of multi-gigapixel histology images},
author={Javed, Sajid and Mahmood, Arif and Werghi, Naoufel and Rajpoot, Nasir},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops},
pages={0–0},
year={2019}
}

Complete Moving Object Detection in the Context of Robust Subspace Learning (ICCVW, 2019)

Abstract:

Complete moving object detection plays a vital role in many applications of computer vision, for instance depth estimation, scene understanding, object interaction, semantic segmentation, and accident detection and avoidance for moving vehicles on a highway. However, it becomes challenging in the presence of dynamic backgrounds, camouflage, bootstrapping, varying illumination conditions, and noise. Over the past decade, robust subspace learning based methods addressed the moving object detection problem with excellent performance. However, the moving objects detected by these methods are incomplete, unable to recover the occluded parts. Indeed, complete or occlusion-free moving object detection is still challenging for these methods. In the current work, we address this challenge by proposing a conditional Generative Adversarial Network (cGAN) conditioned on non-occluded moving object pixels during training. It therefore learns the subspace spanned by the moving objects covering all the dynamic variations and semantic information. During testing, our proposed Complete cGAN (CcGAN) is able to generate complete, occlusion-free moving objects in challenging conditions. The experimental evaluations of our proposed method are performed on the SABS benchmark dataset and compared with 14 state-of-the-art methods, including both robust subspace and deep learning based methods. Our experiments demonstrate the superiority of our proposed model over both types of existing methods.

Model Diagram

Paper Link:

Citation:

@inproceedings{sultana2019complete,
title={Complete moving object detection in the context of robust subspace learning},
author={Sultana, Maryam and Mahmood, Arif and Bouwmans, Thierry and Ki Jung, Soon},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops},
pages={0–0},
year={2019}
}

Do Cross Modal Systems Leverage Semantic Relationships? (ICCVW, 2019)

Abstract:

Current cross modal retrieval systems are evaluated using the R@K measure, which does not leverage semantic relationships but rather strictly follows the manually marked image-text query pairs. Therefore, current systems do not generalize well for unseen data in the wild. To handle this, we propose a new measure, SemanticMap, to evaluate the performance of cross modal systems. Our proposed measure evaluates the semantic similarity between the image and text representations in the latent embedding space. We also propose a novel cross modal retrieval system using a single stream network for bidirectional retrieval. The proposed system is based on a deep neural network trained using extended center loss, minimizing the distance of image and text descriptions in the latent space from the class centers. In our system, the text descriptions are also encoded as images, which enables us to use a single stream network for both text and images. To the best of our knowledge, our work is the first of its kind in terms of employing a single stream network for cross modal retrieval systems. The proposed system is evaluated on two publicly available datasets including MSCOCO and Flickr30K and has shown comparable results to the current state-of-the-art methods.

Model Diagram

Paper Link:

Citation:

@inproceedings{nawaz2019cross,
title={Do cross modal systems leverage semantic relationships?},
author={Nawaz, Shah and Kamran Janjua, Muhammad and Gallo, Ignazio and Mahmood, Arif and Calefati, Alessandro and Shafait, Faisal},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops},
pages={0–0},
year={2019}
}

Bag of Visual Words Approach for Classification of Benign and Malignant Masses in Mammograms Using Voting Based Feature Encoding (IWBI, 2018)

Abstract:

Classification of benign and malignant masses in mammograms is a challenging problem. It has wide applications in the development of Computer Aided Diagnosis (CAD) systems, yet many challenges still need to be addressed. Due to the risk associated with segmenting the mass region, the focus is shifting from selecting features only from the mass area to the whole Region of Interest (RoI) containing that mass. Bag of Visual Words (BoVW) techniques are gaining attention for classification tasks in medical imaging by treating an RoI as a set of local features. In general, BoVW aims to construct a global descriptor from the extracted local features. In this work, we investigate the performance of BoVW for the classification of benign and malignant mammographic masses. Several features are explored as the local features, and different methods are applied for building the codebook. We then propose a voting-based approach to encode the features. The proposed approach is evaluated on a subset of the DDSM dataset. Initial results show a classification accuracy as high as 87% and an Area Under the Curve (AUC) of 0.93, better than the current state-of-the-art approaches applied to the same problem.
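
The voting-based encoding itself is not spelled out in the abstract, so the following is only a baseline BoVW sketch under assumptions (scikit-learn, hard-assignment voting, made-up descriptor dimensions): each local descriptor in an RoI votes for its nearest visual word, and the normalized vote histogram becomes the global RoI descriptor fed to a classifier.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
train_descriptors = rng.normal(size=(1000, 32))   # stand-in local features

k = 50                                            # assumed codebook size
codebook = KMeans(n_clusters=k, n_init=10, random_state=0).fit(train_descriptors)

def encode(roi_descriptors):
    # Hard-assignment voting: histogram of nearest visual words.
    words = codebook.predict(roi_descriptors)
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()                      # L1-normalized global descriptor

roi = rng.normal(size=(120, 32))                  # descriptors from one mammogram RoI
print(encode(roi).shape)                          # -> (50,)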

Model Diagram

Paper Link:

Citation:

@inproceedings{suhail2018bag,
author = {Suhail, Zobia and Denton, Erika and Zwiggelaar, Reyer and Mahmood, Arif},
title = {Bag of visual words based approach for the classification of benign and malignant masses in mammograms using voting-based feature encoding},
booktitle = {International Workshop on Breast Imaging (IWBI)},
year = {2018},
month = {07},
pages = {2},
doi = {10.1117/12.2316307}
}

Unsupervised RGBD Video Object Segmentation Using GANs (ACCVW, 2018)

Abstract:

Video object segmentation is a fundamental step in many advanced vision applications. Most existing algorithms are based on handcrafted features such as HOG, superpixel segmentation, or texture-based techniques, while recently deep features have been found to be more effective. Existing algorithms suffer performance degradation in the presence of challenges such as illumination variations, shadows, and color camouflage. To handle these challenges, we propose a fusion-based moving object segmentation algorithm that exploits color as well as depth information using a GAN to achieve higher accuracy. Our goal is to segment moving objects in the presence of challenging background scenes, in real environments. To address this problem, the GAN is trained in an unsupervised manner on color and depth information independently, using challenging video sequences. During testing, the trained GAN generates backgrounds similar to those in the test sample. The generated background samples are then compared with the test sample to segment the moving objects. The final result is computed by fusing the object boundaries from both modalities, RGB and depth.
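
Leaving the GAN training aside, the comparison-and-fusion step can be sketched as follows. This is a hypothetical NumPy illustration in which the GAN-generated backgrounds are replaced by placeholder arrays and the thresholds are arbitrary: each modality's foreground mask comes from differencing the test frame against its generated background, and the two masks are fused by union.

import numpy as np

def foreground_mask(frame, generated_bg, thresh):
    # Per-pixel absolute difference against the generated background
    diff = np.abs(frame.astype(float) - generated_bg.astype(float))
    if diff.ndim == 3:            # RGB: take the max over color channels
        diff = diff.max(axis=2)
    return diff > thresh

rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, (64, 64, 3))       # test frame (color)
rgb_bg = rng.integers(0, 256, (64, 64, 3))    # placeholder for the GAN background (color)
depth = rng.integers(0, 256, (64, 64))        # test frame (depth)
depth_bg = rng.integers(0, 256, (64, 64))     # placeholder for the GAN background (depth)

mask_rgb = foreground_mask(rgb, rgb_bg, thresh=40)
mask_depth = foreground_mask(depth, depth_bg, thresh=25)
fused = mask_rgb | mask_depth                 # union fusion of the two modalities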

Model Diagram

Paper Link:

Citation:

@article{sultana2018unsupervised,
title={Unsupervised rgbd video object segmentation using gans},
author={Sultana, Maryam and Mahmood, Arif and Javed, Sajid and Jung, Soon Ki},
journal={arXiv preprint arXiv:1811.01526},
year={2018}
}

9. M Ghafoor and Arif Mahmood, “Quantification of Occlusion Handling Capability of 3D Human Pose Estimation Framework.” IEEE Transactions on Multimedia, 2022, (IF 8.182).

10. M Z Zaheer, Arif Mahmood, M H Khan, M Segu, F Yu, S I Lee, “Generative Cooperative Learning for Unsupervised Video Anomaly Detection”, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022.

11. S Javed, Arif Mahmood, J Dias, and N Werghi, “Multi-Level Feature Fusion for Nucleus Detection in Histology Images Using Correlation Filters” Computers in Biology and Medicine, 2022, (IF 6.698).

12. T Hassan, S Javed, Arif Mahmood, T Qaiser, N Werghi, and N Rajpoot, “Nucleus Classification in Histology Images Using Message Passing Network.” Medical Image Analysis, 2022, (IF 13.828).

13. Y Hao, Z Tang, B Alzahrani, R Alotaibi, R Alharthi, M Zhao, and Arif Mahmood, “An End-to-End Human Abnormal Behavior Recognition Framework for Crowds with Mentally Disordered Individuals.” IEEE Journal of Biomedical and Health Informatics, 2022, (IF 7.021).

14. S Aftab, S F Ali, Arif Mahmood, and U Suleman, “A Boosting Framework for Human Posture Recognition Using Spatio-Temporal Features along with Radon Transform.” Multimedia Tools and Applications, 2022 (IF 2.577).

15. S Aldhaheri, R Alotaibi, B Alzahrani, A Hadi, Arif Mahmood, A Alhothali, and A Barnawi, “MACC Net: Multi-Task Attention Crowd Counting Network”, Applied Intelligence, 2022, (IF 5.019).

16. M Sultana, Arif Mahmood, and S K Jung, “Unsupervised Moving Object Segmentation Using Background Subtraction and Optimal Adversarial Noise Sample Search”, Pattern Recognition, 2022, (IF 8.518).

17. M Ghafoor, K Javed, and Arif Mahmood, “Walk Like Me: Video to Video Action Transfer” IEEE TechRxiv, 2022.

18. S M Shakeel, Y Zhang, X Wang, W Kang, and Arif Mahmood, “Multi-Scale Attention Guided Network for End-to-End Face Alignment and Recognition” Journal of Visual Communication and Image Representation, 2022, (IF 2.887).

19. J H Giraldo, Arif Mahmood, B Garcia-Garcia, D Thanou, and T Bouwmans, “Reconstruction of Time-Varying Graph Signals via Sobolev Smoothness” IEEE Transactions on Signal and Information Processing over Networks, 2022, (IF 3.301).

20. B Yousaf, M Usama, W Sultani, Arif Mahmood, and J Qadir, “Fake Visual Content Detection Using Two-Stream Convolutional Neural Networks” Neural Computing and Applications, 2022, (IF 5.102).

21. S Javed, Arif Mahmood, I Ullah, and T Bouwmans, “A Novel Algorithm Based on a Common Subspace Fusion for Visual Object Tracking” IEEE Access, 2022, (IF 3.476).

22. R Wang, R Alotaibi, B Alzahrani, Arif Mahmood, G Wu, H Xia, A Alshehri, and S Aldhaheri, “AAC: Automatic Augmentation for Crowd Counting” Neurocomputing, 2022, (IF 5.779).

23. I Ganapathi, S Javed, S S Ali, Arif Mahmood, N S Vu, and N Werghi, “Learning to Localize Image Forgery Using End-to-End Attention Network.” Neurocomputing, 2022, (IF 5.779).

24. M Sultana, Arif Mahmood, T Bouwmans, M H Khan, and S K Jung, “Moving Objects Segmentation Using Generative Adversarial Modeling” Neurocomputing, 2022, (IF 5.779).

25. S Ali, Arif Mahmood, S K Jung, “Lightweight Encoder-Decoder Architecture for Foot Ulcer Segmentation” in International Workshop on Frontiers of Computer Vision, Japan, 2022.

26. M Fiaz, Arif Mahmood, S S Farooq, K Ali, M Shaheryar, S K Jung, “Video Object Segmentation Based on Guided Feature Transfer Learning” in International Workshop on Frontiers of Computer Vision, Japan, 2022. [Best Paper Award]

27. S Javed, Arif Mahmood, J Dias, L Seneviratne, N Werghi, “Hierarchical Spatiotemporal Graph Regularized Discriminative Correlation Filter for Visual Object Tracking”, in IEEE Transactions on Cybernetics, 2021. (IF 11.079)

28. J Iqbal, MA Munir, Arif Mahmood, AR Ali, M Ali, “Leveraging orientation for weakly supervised object detection with application to firearm localization”, Neurocomputing, 2021. (IF 4.438)

29. S Javed, Arif Mahmood, N Rajpoot, J Dias, N Werghi, “Spatially Constrained Context-Aware Hierarchical Deep Correlation Filters for Nucleus Detection in Histology Images”, Medical Image Analysis, 2021. (IF 11.48)

30. M Asim, C Brekke, Arif Mahmood, T Eltoft, M Reigstad, “Improving Chlorophyll-a Estimation from Sentinel-2 (MSI) in the Barents Sea using Machine Learning”, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2021. (IF 3.827)

31. W Abbas, MF Khan, M Taj, Arif Mahmood, “Statistically correlated multi-task learning for autonomous driving”, Neural Computing and Applications, 2021. (IF 4.774)

32. M Farooq, M N Dailey, Arif Mahmood, J Moonrinta, M Ekpanyapong, “Human face super-resolution on poor quality surveillance video footage”, Neural Computing and Applications, 2021. (IF 4.774)

33. S Nawaz, M S Saeed, P Morerio, Arif Mahmood, I Gallo, M H Yousaf, “Cross-modal Speaker Verification and Recognition: A Multilingual Perspective”, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop, 2021.

34. B Yousaf, M Usama, W Sultani, Arif Mahmood, J Qadir, “Fake Visual Content Detection Using Two-Stream Convolutional Neural Networks”, arXiv preprint arXiv:2101.00676, 2021.

35. M S Saeed, P Morerio, Arif Mahmood, I Gallo, M H Yousaf, “Cross-modal Speaker Verification and Recognition: A Multilingual Perspective”, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop, 2021.

36. M Fiaz, K Ali, S B Yun, K Y Baek, H J Lee, I S Kim, Arif Mahmood, S S Farooq, S K Jung, “Robust Tracking via Feature Enrichment and Overlap Maximization”, International Workshop on Frontiers of Computer Vision (IW-FCV) 2021.

37. MZ Zaheer, Arif Mahmood, MH Khan, M Astrid, SI Lee, “An anomaly detection system via moving surveillance robots with human collaboration” in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop, 2021.

38. M Sultana, Arif Mahmood, T Bouwmans, MH Khan, SK Jung, “Background/Foreground Separation: Guided Attention based Adversarial Modeling (GAAM) versus Robust Subspace Learning Methods”, Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop, 2021.

39. S Javed, Arif Mahmood, K Benes, N Rajpoot, “Multiplex Cellular Communities in Multi-Gigapixel Colorectal Cancer Histology Images for Tissue Phenotyping”, IEEE Transactions on Image Processing (TIP), 2020. (IF 9.340)

40. S Javed, Arif Mahmood, JM Dias, N Werghi, “Robust Structural Low-Rank Tracking” IEEE Transactions on Image Processing (TIP), 2020. (IF 9.340)

41. S Javed, Arif Mahmood, M M Fraz, N A Koohbanani, K Benes, Y W Tsang, K Hewitt, D Epstein, D Snead, N Rajpoot, “Cellular Community Detection For Tissue Phenotyping In Colorectal Cancer Histology Images”, Medical Image Analysis (MEDIA), July 2020. (IF 11.48)

42. M Sultana, Arif Mahmood, SK Jung, “Unsupervised Moving Object Detection in Complex Scenes Using Adversarial Regularizations” IEEE Transactions on Multimedia (TMM), 2020. (IF 6.051)

43. M Abdullah, W Iqbal, Arif Mahmood, F Bukhari, A Erradi, “Predictive Autoscaling of Microservices Hosted in Fog Microdata Center”, IEEE Systems Journal, 2020. (IF 3.987)

44. MZ Zaheer, Arif Mahmood, H Shin, SI Lee, “A Self-Reasoning Framework for Anomaly Detection Using Video-Level Labels”, IEEE Signal Processing Letters (SPL), 2020. (IF 3.105)

45. MZ Zaheer, Arif Mahmood, M Astrid, SI Lee, “CLAWS: Clustering Assisted Weakly Supervised Learning with Normalcy Suppression for Anomalous Event Detection”, European Conference on Computer Vision (ECCV), 2020.

46. M Fiaz, Arif Mahmood, KY Baek, SS Farooq, SK Jung, “Improving Object Tracking by Added Noise and Channel Attention” Sensors, 2020. (IF 3.275)

47. M Fiaz, Arif Mahmood, SK Jung, “Learning soft mask based feature fusion with channel and spatial attention for robust visual object tracking” Sensors 20 (14), 2020. (IF 3.275)

48. MZ Zaheer, J Lee, M Astrid, Arif Mahmood, SI Lee, “Cleaning Label Noise with Clusters for Minimally Supervised Anomaly Detection”, IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2020.

49. M Fiaz, Arif Mahmood, SK Jung, “Video Object Segmentation using Guided Feature and Directional Deep Appearance Learning”, IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (DAVIS Challenge), 2020.

50. M Sultana, Arif Mahmood, T Bouwmans, SK Jung, “Dynamic Background Subtraction using Least Squares Adversarial Learning”, IEEE International Conference on Image Processing (ICIP), 2020.

51. S Javed, Arif Mahmood, J Dias, N Werghi, “CS-RPCA: Clustered Sparse RPCA for Moving Object Detection”, IEEE International Conference on Image Processing (ICIP), 2020.

52. A Basit, MA Munir, M Ali, N Werghi, Arif Mahmood, “Localizing firearm carriers by identifying human-object pairs”, IEEE International Conference on Image Processing (ICIP), 2020.

53. M Asim, C Brekke, Arif Mahmood, T Eltoft, M Reigstad, “Ocean Color Net (OCN) for the Barents Sea”, IEEE International Geoscience and Remote Sensing Symposium (IGARSS), August 2020.

54. M Fiaz, M M Rahman, Arif Mahmood, S S Farooq, K Y Baek, S K Jung, “Adaptive Feature Selection Siamese Networks for Visual Tracking”, International Workshop on Frontiers of Computer Vision (IW-FCV), Ibusuki, Japan, January 2020. (Best Student Paper Award)

55. M Sultana, Arif Mahmood, T Bouwmans, S K Jung, “Unsupervised Adversarial Learning for Dynamic Background Modelling”, International Workshop on Frontiers of Computer Vision (IW-FCV), Ibusuki, Japan, January 2020. (Best Paper Award)

56. N Khan, A Akram, Arif Mahmood, S Ashraf, K Murtaza, “Masked Linear Regression for Learning Local Receptive Fields for Facial Expression Synthesis”, International Journal of Computer Vision (IJCV), September 2019. (IF 6.071)

57. W Iqbal, A Erradi, M Abdullah, Arif Mahmood, “Predictive Auto-scaling of Multi-tier Applications Using Performance Varying Cloud Resources”, in IEEE Transactions on Cloud Computing (TCC), September 2019. (IF 5.967)

58. M S Farid, Arif Mahmood, S Al-Maadeed, “Multi-focus Image Fusion Using Content Adaptive Blurring”, in Information Fusion, January 2019. (IF 10.716)

59. S Javed, Arif Mahmood, S Al-Maadeed, T Bouwmans, S K Jung, “Moving Object Detection in Complex Scene Using Spatiotemporal Structured-Sparse RPCA”, in IEEE Transactions on Image Processing (TIP), February 2019. (IF 6.79)

60. M Shaban, Arif Mahmood, S Al-Maadeed, N Rajpoot, “An Information Fusion Framework for Person Localization Via Body Pose in Spectator Crowds”, in Information Fusion, November 2019. (IF 10.716)

61. H Ullah, M Uzair, Arif Mahmood, H Ullah, S D Khan, F A Sheikh, “Internal Emotion Classification Using EEG Signal with Sparse Discriminative Ensemble”, in IEEE Access, March 2019. (IF 4.098)

62. M Fiaz, Arif Mahmood, S Javed, S K Jung, “Handcrafted and Deep Trackers: Recent Visual Object Tracking Approaches and Trends”, in ACM Computing Surveys, January 2019. (IF 5.55)

63. B Iqbal, W Iqbal, N Khan, Arif Mahmood, A Erradi, “Canny edge detection and Hough transform for high resolution video streams using Hadoop and Spark”, in Cluster Computing, April 2019. (IF 1.851)

64. M Sultana, Arif Mahmood, S Javed, S K Jung, “Unsupervised deep context prediction for background estimation and foreground segmentation”, in Machine Vision and Applications (MVA), April 2019. (IF 1.788)

65. A Erradi, W Iqbal, Arif Mahmood, A Bouguettaya, “Web application resource requirements estimation based on the workload latent features”, in IEEE Transactions on Services Computing (TSC), May 2019. (IF 5.707)

66. Arif Mahmood, S Al-Maadeed, “Action recognition in poor quality spectator crowd videos using head distribution based person segmentation”, in Machine Vision and Applications (MVA), June 2019. (IF 1.788)

67. M Fiaz, Arif Mahmood, S K Jung, “Using Convolutional Neural Network With Structural Input for Visual Object Tracking” in ACM/SIGAPP Symposium on Applied Computing (SAC), Cyprus, April 2019.

68. J Iqbal, M A Munir, Arif Mahmood, A R Ali, M Ali, “Orientation Aware Object Detection with Application to Firearms”, arXiv: 2662045, April 2019

69. M Fiaz, Arif Mahmood, S K Jung, “Deep Siamese networks towards robust visual tracking” in “Visual Object Tracking in the Deep Neural Networks Era”, IntechOpen Publishers, April 2019. (in press)

70. S Javed, Arif Mahmood, N Werghi, J M M Dias, “Structural Low-Rank Tracking”, IEEE International Conference on Advanced Video and Signal-based Surveillance (AVSS), Taipei, Taiwan, September 2019.

71. M Sultana, Arif Mahmood, T Bouwmans, S K Jung, “Complete Moving Object Detection in the Context of Robust Subspace Learning” in IEEE International Conference on Computer Vision (RSLCV Workshop) 2019, Seoul, South Korea.

72. S Javed, Arif Mahmood, N Werghi, N Rajpoot, “Deep Multiresolution Cellular Communities for Semantic Segmentation of Multi-Gigapixel Histology Images”, in IEEE International Conference on Computer Vision (VRMI Workshop) 2019, Seoul, South Korea.

73. S Nawaz, M K Janjua, I Gallo, Arif Mahmood, A Calefati, F Shafait, “Do Cross Modal Systems Leverage Semantic Relationships?” in IEEE International Conference on Computer Vision (CroMol Workshop) 2019, Seoul, South Korea.

74. S Nawaz, M K Janjua, I Gallo, Arif Mahmood, A Calefati, “Deep Latent Space Learning for Cross-modal Mapping of Audio and Visual Signals” in Digital Image Computing: Techniques and Applications (DICTA) 2019, Perth, Australia.

75. Arif Mahmood, M Uzair, S Al-Maadeed, “Multi-order Statistical Descriptors for Real-time Face Recognition and Object Classification”, in IEEE Access, 2018 (IF 4.098).

76. I Rida, S Al-Maadeed, Arif Mahmood, A Bouridane, S Bakshi, “Palmprint Identification Using an Ensemble of Sparse Representations”, in IEEE Access, 2018 (IF 4.098).

77. S Javed, Arif Mahmood, T Bouwmans, S K Jung, “Spatiotemporal Low-rank Modeling for Complex Scene Background Initialization”, in IEEE Transactions on Circuits and Systems for Video Technology, 2018. (IF 4.046)

78. S Ali, R Khan, Arif Mahmood, M Hassan, M Jeon, “Using temporal covariance of motion and geometric features via boosting for human fall detection”, in Sensors, 2018. (IF 3.031)

79. W Iqbal, A Erradi, Arif Mahmood, “Dynamic workload patterns prediction for proactive auto-scaling of web applications”, in Journal of Network and Computer Applications, 2018. (IF 5.273)

80. M Fiaz, Arif Mahmood, S K Jung, “Tracking Noisy Targets: A Review of Recent Object Tracking Approaches”, available online arXiv:1802.03098, 2018.

81. Z Suhail, Arif Mahmood, L Wang, P N Malcolm, and R Zwiggelaar, “A Voting-Based Encoding Technique for the Classification of Gleason Score for Prostate Cancers”, in Medical Image Understanding and Analysis (MIUA), University of Southampton, UK, July 2018.

82. Z Suhail, Arif Mahmood, E R E Denton, R Zwiggelaar, “Bag of Visual Words Approach for Classification of Benign and Malignant Masses in Mammograms Using Voting Based Feature Encoding”, in International Workshop on Breast Imaging (IWBI), Atlanta, Georgia, USA, July 2018.

83. M Sultana, Arif Mahmood, S Javed, S K Jung, “Unsupervised RGBD Video Object Segmentation Using GANs”, in ACCV Workshop on RGBD-Sensing and Understanding via Combined Color and Depth, Perth, Australia, December 2018.

Superpixels based Manifold Structured Sparse RPCA for Moving Object Detection (International Workshop on Activity Monitoring by Multiple Distributed Sensing, 2017)

Abstract:

Moving Object Detection (MOD) is a fundamental step in various computer vision and video surveillance systems. Methods based on Robust Principal Component Analysis (RPCA) have often been used for MOD. However, the accuracy of these methods deteriorates when the low-rank and sparse matrices are relatively coherent, e.g., when the moving objects resemble the background regions, or when the background is more complicated, as with dynamic scenes, camera jitter, and varying lighting conditions. This is because these methods assume that the elements in the sparse component are mutually independent, and thus ignore the spatiotemporal structure of the sparse component. To handle this problem, we propose a spatiotemporal structured-sparse RPCA algorithm for moving object detection. For this purpose, we incorporate two different manifold regularizations on the sparse component, based on the local and global invariance assumptions. Spatial and temporal graph Laplacian regularizations are encoded in the form of spectral graph structure. Both graphs are constructed using multiple features extracted from superpixels computed over the input data matrix. We propose a novel objective function to disentangle moving objects in the presence of complicated backgrounds. We evaluate our algorithm on challenging videos taken from six different datasets, including dynamic backgrounds, lighting conditions, and camera jitter sequences. Our experiments demonstrate excellent results compared to current methods.
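
A representative form of such an objective (symbols and weights assumed here, not taken verbatim from the paper) decomposes the data matrix X into a low-rank background B and a structured-sparse foreground S, with spatial and temporal graph Laplacians L_s and L_t built from the superpixel features:

\min_{B,S} \; \|B\|_* + \lambda \|S\|_1
  + \gamma_1 \operatorname{tr}\!\left(S^{\top} L_s S\right)
  + \gamma_2 \operatorname{tr}\!\left(S L_t S^{\top}\right)
\quad \text{s.t.} \quad X = B + S,

where \|\cdot\|_* is the nuclear norm promoting a low-rank background, \|\cdot\|_1 promotes sparsity of the moving-object component, and the two trace terms encourage spatially and temporally smooth support of S over the respective graphs.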

Paper Link:

Citation:

@inproceedings{javed:hal-01580053,
title = {Superpixels based Manifold Structured Sparse RPCA for Moving Object Detection},
author = {Javed, Sajid and Mahmood, Arif and Bouwmans, Thierry and Jung, Soon Ki},
url = {https://hal.science/hal-01580053},
booktitle = {International Workshop on Activity Monitoring by Multiple Distributed Sensing},
address = {London, United Kingdom},
year = {2017}
}