Publications

"*" indicates corresponding author and "#" indicates co-first author.

Journal Publications (Google Scholar Profile)

  1. M2Trans: Multi-Modal Regularized Coarse-to-Fine Transformer for Ultrasound Image Super-Resolution
    Zhangkai Ni, Runyu Xiao, Wenhan Yang, Hanli Wang, Zhihua Wang, Lihua Xiang, and Liping Sun.
    IEEE Journal of Biomedical and Health Informatics (J-BHI), Early Access, September 2024, DOI: 10.1109/JBHI.2024.3454068.
    Abstract | Paper | Code | BibTex Abstract: Ultrasound image super-resolution (SR) aims to transform low-resolution images into high-resolution ones, thereby restoring intricate details crucial for improved diagnostic accuracy. However, prevailing methods relying solely on image modality guidance and pixel-wise loss functions struggle to capture the distinct characteristics of medical images, such as unique texture patterns and specific colors harboring critical diagnostic information. To overcome these challenges, this paper introduces the Multi-Modal Regularized Coarse-to-fine Transformer (M2Trans) for Ultrasound Image SR. By integrating the text modality, we establish joint image-text guidance during training, leveraging the medical CLIP model to incorporate richer priors from text descriptions into the SR optimization process, enhancing detail, structure, and semantic recovery. Furthermore, we propose a novel coarse-to-fine transformer comprising multiple branches infused with self-attention and frequency transforms to efficiently capture signal dependencies across different scales. Extensive experimental results demonstrate significant improvements over state-of-the-art methods on benchmark datasets, including CCA-US, US-CASE, and our newly created dataset MMUS1K, with a minimum improvement of 0.17dB, 0.30dB, and 0.28dB in terms of PSNR. Our code and dataset will be available at: https://github.com/eezkni/M2Trans
    @article{ni2024m2trans, 
    	title={M2Trans: Multi-Modal Regularized Coarse-to-Fine Transformer for Ultrasound Image Super-Resolution}, 
    	author={Ni, Zhangkai and Xiao, Runyu and Yang, Wenhan and Wang, Hanli and Wang, Zhihua and Xiang, Lihua and Sun, Liping}, 
    	journal={IEEE Journal of Biomedical and Health Informatics}, 
    	year={2024}, 
    	publisher={IEEE} }
    
  2. Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics
    Zhangkai Ni, Yue Liu, Keyan Ding, Wenhan Yang, Hanli Wang, and Shiqi Wang.
    IEEE Transactions on Multimedia (T-MM), Early Access, May 2024, DOI: 10.1109/TMM.2024.3405729.
    Abstract | Paper | Code | BibTex Abstract: Deep learning-based methods have significantly influenced the blind image quality assessment (BIQA) field, however, these methods often require training using large amounts of human rating data. In contrast, traditional knowledge-based methods are cost-effective for training but face challenges in effectively extracting features aligned with human visual perception. To bridge these gaps, we propose integrating deep features from pre-trained visual models with a statistical analysis model into a Multi-scale Deep Feature Statistics (MDFS) model for achieving opinion-unaware BIQA (OU-BIQA), thereby eliminating the reliance on human rating data and significantly improving training efficiency. Specifically, we extract patch-wise multi-scale features from pre-trained vision models, which are subsequently fitted into a multivariate Gaussian (MVG) model. The final quality score is determined by quantifying the distance between the MVG model derived from the test image and the benchmark MVG model derived from the high-quality image set. A comprehensive series of experiments conducted on various datasets show that our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models. Furthermore, it shows improved generalizability across diverse target-specific BIQA tasks. Our code is available at: https://github.com/eezkni/MDFS
    @article{ni2024opinion,
    	title={Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics},
    	author={Ni, Zhangkai and Liu, Yue and Ding, Keyan and Yang, Wenhan and Wang, Hanli and Wang, Shiqi},
    	journal={IEEE Transactions on Multimedia},
    	year={2024},
    	publisher={IEEE}
    }
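    The quality-scoring step described in the MDFS abstract above reduces to fitting a multivariate Gaussian (MVG) to patch-wise features and measuring how far the test-image MVG sits from the benchmark MVG. Below is a minimal NumPy sketch of that idea; it assumes the multi-scale deep features have already been extracted into an (N_patches, D) array, and the NIQE-style Gaussian distance used here is an illustrative choice rather than the paper's exact metric.
    import numpy as np

    def fit_mvg(features):
        """Fit a multivariate Gaussian to an (N_patches, D) feature matrix."""
        mu = features.mean(axis=0)
        sigma = np.cov(features, rowvar=False)
        return mu, sigma

    def mvg_distance(mu_test, sigma_test, mu_ref, sigma_ref):
        """NIQE-style distance between two Gaussians; a larger value predicts lower quality."""
        diff = mu_test - mu_ref
        pooled = (sigma_test + sigma_ref) / 2.0
        return float(np.sqrt(diff @ np.linalg.pinv(pooled) @ diff))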
  3. A Dynamic Evolution Model for Decentralized Autonomous Car Clusters in a Highway Scene
    Jiujun Cheng, Huiyu Sun, Zhangkai Ni*, Aiguo Zhou, and Dongjie Ye.
    IEEE Transactions on Computational Social Systems (T-CSS), vol. 11, no. 3, pp. 3792-3802, June 2024.
    Abstract | Paper | Code | BibTex Abstract: Cluster evolution is a challenging problem for vehicular ad hoc network (VANET) in a highway scene with fast moving autonomous vehicles and frequent cluster topology changes. Most of the existing studies analyze the cluster evolution behavior of cluster heads (CHs), and these approaches lead to frequent changes in vehicle structure when CHs change, which easily makes the cluster unstable. In this work, we propose a decentralized autonomous car cluster dynamic evolution model. First, we define a decentralized cluster structure. Then, we analyze the cluster evolution behavior and propose a maintenance method. Next, we define eight vehicle states and their transitions. Finally, we introduce the cluster dynamic evolution model and the collaboration model. The results of extensive simulation experiments show that our method can effectively maintain the consistency of cluster consensus and improve the stability of the cluster structure compared with the centralized cluster maintenance method.
    @article{cheng2023dynamic,
    	title={A Dynamic Evolution Model for Decentralized Autonomous Car Clusters in a Highway Scene},
    	author={Cheng, Jiujun and Sun, Huiyu and Ni, Zhangkai and Zhou, Aiguo and Ye, Dongjie},
    	journal={IEEE Transactions on Computational Social Systems},
    	volume={11},
    	number={3},
    	pages={3792--3802},
    	year={2024},
    	publisher={IEEE}
    }
  4. Glow in the Dark: Low-Light Image Enhancement with External Memory
    Dongjie Ye, Zhangkai Ni, Wenhan Yang, Hanli Wang, Shiqi Wang, and Sam Kwong.
    IEEE Transactions on Multimedia (T-MM), vol. 26, pp. 2148-2163, July 2023.
    Abstract | Paper | Code | BibTex Abstract: Deep learning-based methods have achieved remarkable success with powerful modeling capabilities. However, the weights of these models are learned over the entire training dataset, which inevitably leads to the ignorance of sample specific properties in the learned enhancement mapping. This situation causes ineffective enhancement in the testing phase for the samples that differ significantly from the training distribution. In this paper, we introduce external memory to form an external memory-augmented network (EMNet) for low-light image enhancement. The external memory aims to capture the sample specific properties of the training dataset to guide the enhancement in the testing phase. Benefiting from the learned memory, more complex distributions of reference images in the entire dataset can be “remembered” to facilitate the adjustment of the testing samples more adaptively. To further augment the capacity of the model, we take the transformer as our baseline network, which specializes in capturing long-range spatial redundancy. Experimental results demonstrate that our proposed method has a promising performance and outperforms state-of-the-art methods. It is noted that the proposed external memory is a plug-and-play mechanism that can be integrated with any existing method to further improve the enhancement quality. More practices of integrating external memory with other image enhancement methods are qualitatively and quantitatively analyzed. The results further confirm the effectiveness of our proposed memory mechanism when combined with existing enhancement methods. Our code is available at: https://github.com/Lineves7/EMNet
    @article{ye2023glow,
    	title={Glow in the Dark: Low-Light Image Enhancement with External Memory},
    	author={Ye, Dongjie and Ni, Zhangkai and Yang, Wenhan and Wang, Hanli and Wang, Shiqi and Kwong, Sam},
    	journal={IEEE Transactions on Multimedia},
    	volume={26},
    	pages={2148--2163},
    	year={2023},
    	publisher={IEEE}
    }
  5. Neural Network Based Rate Control for Versatile Video Coding
    Yunhao Mao, Meng Wang, Zhangkai Ni, Shiqi Wang, and Sam Kwong.
    IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT), vol. 33, no. 10, pp. 6072-6085, October 2023.
    Abstract | Paper | Code | BibTex Abstract: In this work, we propose a neural network based rate control algorithm for Versatile Video Coding (VVC). The proposed method relies on the modeling of the Rate-Quantization (R-Q) and Distortion-Quantization (D-Q) relationships in a data driven manner based upon the characteristics of prediction residuals. In particular, a pre-analysis framework is adopted, in an effort to obtain the prediction residuals which govern the Rate-Distortion (R-D) behaviors. By inferring from the prediction residuals with deep neural networks, the Coding Tree Unit (CTU) level R-Q and D-Q model parameters are derived, which could efficiently guide the optimal bit allocation. Subsequently, the coding parameters, including Quantization Parameter (QP) and λ, at both frame and CTU levels, are obtained according to allocated bit-rates. We implement the proposed rate control algorithm on VVC Test Model (VTM-13.0). Experimental results exhibit that the proposed rate control algorithm achieves 0.77% BD-Rate savings under Low Delay B (LDB) configurations when compared to the default rate control algorithm used in VTM-13.0. For Random Access (RA) configurations, 1.77% BD-Rate savings can be observed. Furthermore, with better bit-rate estimation, more stable buffer status can be observed, further demonstrating the advantages of the proposed rate control method.
    @article{mao2023neural,
    	title={Neural Network Based Rate Control for Versatile Video Coding},
    	author={Mao, Yunhao and Wang, Meng and Ni, Zhangkai and Wang, Shiqi and Kwong, Sam},
    	journal={IEEE Transactions on Circuits and Systems for Video Technology},
    	volume={33},
    	number={10},
    	pages={6072--6085},
    	year={2023},
    	publisher={IEEE}
    }
  6. A CTU-level Screen Content Rate Control for Low-delay Versatile Video Coding
    Yi Chen, Meng Wang, Shiqi Wang, Zhangkai Ni, and Sam Kwong.
    IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT), vol. 33, no. 9, pp. 5227-5241, September 2023.
    Abstract | Paper | Code | BibTex Abstract: In this paper, a rate control scheme for screen content video coding is proposed for the Versatile Video Coding (VVC) standard. In view of the critical challenges arising from the spatial and temporal unnaturalness of screen content sequences, the proposed method relies on the specifically designed pre-analysis such that the content information regarding the scene complexity can be obtained. As such, the estimated residual complexity is then incorporated into the proposed complexity-aware rate models and distortion models, leading to the optimal bit allocations for each frame and coding tree unit (CTU). In particular, the optimization problem can be analytically solved with the proposed models, and the coding parameters such as Lagrangian multiplier λ and quantization parameter of each frame and CTU could be delicately derived according to the allocated bits through the proposed analytical models. Extensive experiments have been conducted to evaluate the effectiveness of the proposed method. Compared to the default hierarchical λ-domain rate control and other screen content rate control algorithms, the proposed method could achieve obvious RD performance gain, and the bit-rate accuracy could be improved.
    @article{chen2023ctu,
    	title={A CTU-level Screen Content Rate Control for Low-delay Versatile Video Coding},
    	author={Chen, Yi and Wang, Meng and Wang, Shiqi and Ni, Zhangkai and Kwong, Sam},
    	journal={IEEE Transactions on Circuits and Systems for Video Technology},
    	volume={33},
    	number={9},
    	pages={5227--5241},
    	year={2023},
    	publisher={IEEE}
    }
  7. High Dynamic Range Image Quality Assessment Based on Frequency Disparity
    Yue Liu, Zhangkai Ni, Shiqi Wang, Hanli Wang, and Sam Kwong.
    IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT), vol. 33, no. 8, pp. 4435-4440, August 2023.
    Abstract | Paper | Code | BibTex Abstract: In this paper, a novel and effective image quality assessment (IQA) algorithm based on frequency disparity for high dynamic range (HDR) images is proposed, termed as local-global frequency feature-based model (LGFM). Motivated by the assumption that the human visual system (HVS) is highly adapted for extracting structural information and partial frequencies when perceiving the visual scene, the Gabor and the Butterworth filters are applied to the luminance component of the HDR image to extract the local and global frequency features, respectively. The similarity measurement and feature pooling strategy are sequentially performed on the frequency features to obtain the predicted single quality score. The experiments evaluated on four widely used benchmarks demonstrate that the proposed LGFM can provide a higher consistency with the subjective perception compared with the state-of-the-art HDR IQA methods. Our code is available at: https://github.com/eezkni/LGFM
    @article{liu2023high,
    	title={High Dynamic Range Image Quality Assessment Based on Frequency Disparity},
    	author={Liu, Yue and Ni, Zhangkai and Wang, Shiqi and Wang, Hanli and Kwong, Sam},
    	journal={IEEE Transactions on Circuits and Systems for Video Technology},
    	volume={33},
    	number={8},
    	pages={4435--4440},
    	year={2023},
    	publisher={IEEE}
    }
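    As a rough illustration of the global-frequency branch in the LGFM abstract above, the sketch below filters the luminance channel with a Butterworth low-pass filter in the Fourier domain and compares reference/distorted feature maps with an SSIM-style similarity. The cutoff, filter order, constant c, and mean pooling are assumptions for illustration only; the Gabor (local-frequency) branch and the paper's actual pooling strategy are omitted.
    import numpy as np

    def butterworth_lowpass(shape, cutoff=0.1, order=2):
        """Frequency-domain Butterworth low-pass transfer function H = 1 / (1 + (D/D0)^(2n))."""
        rows, cols = shape
        u = np.fft.fftfreq(rows)[:, None]
        v = np.fft.fftfreq(cols)[None, :]
        d = np.sqrt(u ** 2 + v ** 2)
        return 1.0 / (1.0 + (d / cutoff) ** (2 * order))

    def global_frequency_feature(luma, cutoff=0.1):
        """Apply the Butterworth filter to the luminance channel in the Fourier domain."""
        spectrum = np.fft.fft2(luma)
        return np.real(np.fft.ifft2(spectrum * butterworth_lowpass(luma.shape, cutoff)))

    def feature_similarity(f_ref, f_dist, c=1e-3):
        """Pointwise SSIM-style similarity between two feature maps, averaged into one score."""
        sim = (2 * f_ref * f_dist + c) / (f_ref ** 2 + f_dist ** 2 + c)
        return float(sim.mean())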
  8. CSformer: Bridging Convolution and Transformer for Compressive Sensing
    Dongjie Ye, Zhangkai Ni, Hanli Wang, Jian Zhang, Shiqi Wang, and Sam Kwong.
    IEEE Transactions on Image Processing (T-IP), vol. 32, pp. 2827-2842, May 2023.
    Abstract | Paper | Code | BibTex Abstract: Convolution neural networks (CNNs) have succeeded in compressive image sensing. However, due to the inductive bias of locality and weight sharing, the convolution operations demonstrate the intrinsic limitations in modeling the long-range dependency. Transformer, designed initially as a sequence-to-sequence model, excels at capturing global contexts due to the self-attention-based architectures even though it may be equipped with limited localization abilities. This paper proposes CSformer, a hybrid framework that integrates the advantages of leveraging both detailed spatial information from CNN and the global context provided by transformer for enhanced representation learning. The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery. In the sampling module, images are measured block-by-block by the learned sampling matrix. In the reconstruction stage, the measurement is projected into dual stems. One is the CNN stem for modeling the neighborhood relationships by convolution, and the other is the transformer stem for adopting global self-attention mechanism. The dual branches structure is concurrent, and the local features and global representations are fused under different resolutions to maximize the complementary of features. Furthermore, we explore a progressive strategy and window-based transformer block to reduce the parameter and computational complexity. The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing, which achieves superior performance compared to state-of-the-art methods on different datasets.
    @article{ye2023csformer,
    	title={CSformer: Bridging convolution and transformer for compressive sensing},
    	author={Ye, Dongjie and Ni, Zhangkai and Wang, Hanli and Zhang, Jian and Wang, Shiqi and Kwong, Sam},
    	journal={IEEE Transactions on Image Processing},
    	volume={32},
    	pages={2827--2842},
    	year={2023},
    	publisher={IEEE}
    }
  9. Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network
    Zhangkai Ni, Wenhan Yang, Shiqi Wang, Lin Ma, and Sam Kwong.
    IEEE Transactions on Image Processing (T-IP), vol. 29, pp. 9140-9151, September 2020.
    Abstract | Paper | Code | BibTex Abstract: Improving the aesthetic quality of images is challenging and eager for the public. To address this problem, most existing algorithms are based on supervised learning methods to learn an automatic photo enhancer for paired data, which consists of low-quality photos and corresponding expert-retouched versions. However, the style and characteristics of photos retouched by experts may not meet the needs or preferences of general users. In this paper, we present an unsupervised image enhancement generative adversarial network (UEGAN), which learns the corresponding image-to-image mapping from a set of images with desired characteristics in an unsupervised manner, rather than learning on a large number of paired images. The proposed model is based on single deep GAN which embeds the modulation and attention mechanisms to capture richer global and local features. Based on the proposed model, we introduce two losses to deal with the unsupervised image enhancement: (1) fidelity loss, which is defined as a ℓ2 regularization in the feature domain of a pre-trained VGG network to ensure the content between the enhanced image and the input image is the same, and (2) quality loss that is formulated as a relativistic hinge adversarial loss to endow the input image the desired characteristics. Both quantitative and qualitative results show that the proposed model effectively improves the aesthetic quality of images. Our code is available at: https://github.com/eezkni/UEGAN
    @article{ni2020towards,
    	title={Towards unsupervised deep image enhancement with generative adversarial network},
    	author={Ni, Zhangkai and Yang, Wenhan and Wang, Shiqi and Ma, Lin and Kwong, Sam},
    	journal={IEEE Transactions on Image Processing},
    	volume={29},
    	pages={9140--9151},
    	year={2020},
    	publisher={IEEE}
    }
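    The two losses named in the UEGAN abstract above are easy to state in code. The PyTorch sketch below expresses the fidelity loss as an l2 distance in the feature space of a pre-trained VGG network and the quality loss as a relativistic hinge adversarial loss for the generator; the VGG layer cut (features[:16]) and the averaging of the two hinge terms are illustrative assumptions, not necessarily the paper's exact configuration.
    import torch
    import torch.nn.functional as F
    from torchvision.models import vgg19

    # Frozen VGG feature extractor used by the fidelity loss (the layer choice is an assumption).
    vgg_features = vgg19(weights="IMAGENET1K_V1").features[:16].eval()
    for p in vgg_features.parameters():
        p.requires_grad_(False)

    def fidelity_loss(enhanced, source):
        """l2 regularization in the VGG feature domain to keep image content unchanged."""
        return F.mse_loss(vgg_features(enhanced), vgg_features(source))

    def relativistic_hinge_g_loss(d_real, d_fake):
        """Relativistic hinge adversarial loss for the generator (the quality loss)."""
        loss_real = torch.mean(F.relu(1.0 + (d_real - d_fake.mean())))
        loss_fake = torch.mean(F.relu(1.0 - (d_fake - d_real.mean())))
        return (loss_real + loss_fake) / 2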
  10. Color Image Demosaicing Using Progressive Collaborative Representation
    Zhangkai Ni, Kai-Kuang Ma, Huanqiang Zeng, and Baojiang Zhong.
    IEEE Transactions on Image Processing (T-IP), vol. 29, pp. 4952-4964, March 2020.
    Abstract | Paper | Code | BibTex Abstract: In this paper, a progressive collaborative representation (PCR) framework is proposed that is able to incorporate any existing color image demosaicing method for further boosting its demosaicing performance. Our PCR consists of two phases: (i) offline training and (ii) online refinement. In phase (i), multiple training-and-refining stages will be performed. In each stage, a new dictionary will be established through the learning of a large number of feature-patch pairs, extracted from the demosaicked images of the current stage and their corresponding original full-color images. After training, a projection matrix will be generated and exploited to refine the current demosaicked image. The updated image with improved image quality will be used as the input for the next training-and-refining stage and performed the same processing likewise. At the end of phase (i), all the projection matrices generated as above-mentioned will be exploited in phase (ii) to conduct online demosaicked image refinement of the test image. Extensive simulations conducted on two commonly-used test datasets (i.e., the IMAX and Kodak) for evaluating the demosaicing algorithms have clearly demonstrated that our proposed PCR framework is able to constantly boost the performance of any image demosaicing method we experimented, in terms of the objective and subjective performance evaluations.
    @article{ni2020color,
    	title={Color Image Demosaicing Using Progressive Collaborative Representation},
    	author={Ni, Zhangkai and Ma, Kai-Kuang and Zeng, Huanqiang and Zhong, Baojiang},
    	journal={IEEE Transactions on Image Processing},
    	volume={29},
    	number={1},
    	pages={4952--4964},
    	year={2020},
    	publisher={IEEE}
    }
  11. Just Noticeable Distortion Profile Inference: A Patch-level Structural Visibility Learning Approach
    Xuelin Shen, Zhangkai Ni, Wenhan Yang, Shiqi Wang, Xinfeng Zhang, and Sam Kwong.
    IEEE Transactions on Image Processing (T-IP), vol. 30, pp. 26-38, November 2020.
    Abstract | Paper | Code | Dataset | Project | BibTex Abstract: In this paper, we propose an effective approach to infer the just noticeable distortion (JND) profile based on patch-level structural visibility learning. Instead of pixel-level JND profile estimation, the image patch, which is regarded as the basic processing unit to better correlate with the human perception, can be further decomposed into three conceptually independent components for visibility estimation. In particular, to incorporate the structural degradation into the patch-level JND model, a deep learning-based structural degradation estimation model is trained to approximate the masking of structural visibility. In order to facilitate the learning process, a JND dataset is further established, including 202 pristine images and 7878 distorted images generated by advanced compression algorithms based on the upcoming Versatile Video Coding (VVC) standard. Extensive experimental results further show the superiority of the proposed approach over the state-of-the-art. Our dataset is available at: https://shenxuelin-cityu.github.io/jnd.html.
    @article{shen2020just,
    	title={Just Noticeable Distortion Profile Inference: A Patch-level Structural Visibility Learning Approach},
    	author={Shen, Xuelin and Ni, Zhangkai and Yang, Wenhan and Zhang, Xinfeng and Wang, Shiqi and Kwong, Sam},
    	journal={IEEE Transactions on Image Processing},
    	volume={30},
    	pages={26--38},
    	year={2020},
    	publisher={IEEE}
    }
  12. Unimodal Model-Based Inter Mode Decision for High Efficiency Video Coding
    Huanqiang Zeng, Wenjie Xiang, Jing Chen, Canhui Cai, Zhangkai Ni, and Kai-Kuang Ma.
    IEEE Access, vol. 7, pp. 27936-27947, February 2019.
    Abstract | Paper | BibTex Abstract: In this paper, a fast inter mode decision algorithm, called the unimodal model-based inter mode decision (UMIMD), is proposed for the latest video coding standard, the high-efficiency video coding. Through extensive simulations, it has been observed that a unimodal model (i.e., with only one global minimum value) can be established among the size of different prediction unit (PU) modes and their resulted rate-distortion (RD) costs for each quad-tree partitioned coding tree unit (CTU). To guarantee the unimodality and further search the optimal operating point over this function for each CTU, all the PU modes need to be first classified into 11 mode classes according to their sizes. These classes are then properly ordered and sequentially checked according to the class index, from small to large so that the optimal mode can be early identified by checking when the RD cost starts to arise. In addition, an effective instant SKIP mode termination scheme is developed by simply checking the SKIP mode against a pre-determined threshold to further reduce the computational complexity. The extensive simulation results have shown that the proposed UMIMD algorithm is able to individually achieve a significant reduction on computational complexity at the encoder by 61.9% and 64.2% on average while incurring only 1.7% and 2.1% increment on the total Bjontegaard delta bit rate (BDBR) for the low delay and random access test conditions, compared with the exhaustive mode decision in the HEVC. Moreover, the experimental results have further demonstrated that the proposed UMIMD algorithm outperforms multiple state-of-the-art methods.
    @article{zeng2019unimodal,
    	title={Unimodal Model-Based Inter Mode Decision for High Efficiency Video Coding},
    	author={Zeng, Huanqiang and Xiang, Wenjie and Chen, Jing and Cai, Canhui and Ni, Zhangkai and Ma, Kai-Kuang},
    	journal={IEEE Access},
    	volume={7},
    	pages={27936--27947},
    	year={2019},
    	publisher={IEEE}
    }
  13. A Gabor Feature-Based Quality Assessment Model for the Screen Content Images
    Zhangkai Ni, Huanqiang Zeng, Lin Ma, Junhui Hou, Jing Chen, and Kai-Kuang Ma.
    IEEE Transactions on Image Processing (T-IP), vol. 27, no. 9, pp. 4516-4528, September 2018.
    Abstract | Paper | Code | Project | BibTex Abstract: In this paper, an accurate and efficient full-reference image quality assessment (IQA) model using the extracted Gabor features, called Gabor feature-based model (GFM), is proposed for conducting objective evaluation of screen content images (SCIs). It is well-known that the Gabor filters are highly consistent with the response of the human visual system (HVS), and the HVS is highly sensitive to the edge information. Based on these facts, the imaginary part of the Gabor filter that has odd symmetry and yields edge detection is exploited to the luminance of the reference and distorted SCI for extracting their Gabor features, respectively. The local similarities of the extracted Gabor features and two chrominance components, recorded in the LMN color space, are then measured independently. Finally, the Gabor-feature pooling strategy is employed to combine these measurements and generate the final evaluation score. Experimental simulation results obtained from two large SCI databases have shown that the proposed GFM model not only yields a higher consistency with the human perception on the assessment of SCIs but also requires a lower computational complexity, compared with that of classical and state-of-the-art IQA models.
    @article{ni2018gabor,
    	title={A Gabor feature-based quality assessment model for the screen content images},
    	author={Ni, Zhangkai and Zeng, Huanqiang and Ma, Lin and Hou, Junhui and Chen, Jing and Ma, Kai-Kuang},
    	journal={IEEE Transactions on Image Processing},
    	volume={27},
    	number={9},
    	pages={4516--4528},
    	year={2018},
    	publisher={IEEE}
    }
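    To make the Gabor-feature idea above concrete, the sketch below builds an edge map from the imaginary (odd-symmetric) part of Gabor kernels, measures a pointwise similarity between the reference and distorted maps, and pools it with an edge-strength weight. The filter frequency, orientations, constant c, and max-based weighting are illustrative choices; the chrominance terms and the paper's Gabor-feature pooling strategy are simplified away.
    import numpy as np
    from scipy.signal import convolve2d
    from skimage.filters import gabor_kernel

    def gabor_edge_map(luma, frequency=0.1, orientations=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
        """Edge response from the odd-symmetric (imaginary) Gabor parts, max-pooled over orientations."""
        responses = [np.abs(convolve2d(luma, np.imag(gabor_kernel(frequency, theta=t)),
                                       mode="same", boundary="symm")) for t in orientations]
        return np.max(responses, axis=0)

    def gfm_like_score(luma_ref, luma_dist, c=200.0):
        g_ref, g_dist = gabor_edge_map(luma_ref), gabor_edge_map(luma_dist)
        sim = (2 * g_ref * g_dist + c) / (g_ref ** 2 + g_dist ** 2 + c)
        weight = np.maximum(g_ref, g_dist)  # emphasize strong edges during pooling
        return float((sim * weight).sum() / (weight.sum() + 1e-12))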
  14. Screen Content Image Quality Assessment Using Multi-Scale Difference of Gaussian
    Ying Fu, Huanqiang Zeng, Lin Ma, Zhangkai Ni, Jianqing Zhu, and Kai-Kuang Ma.
    IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT), vol. 28, no. 9, pp. 2428-2432, September 2018.
    Abstract | Paper | Code | BibTex Abstract: In this paper, a novel image quality assessment (IQA) model for the screen content images (SCIs) is proposed by using multi-scale difference of Gaussian (MDOG). Motivated by the observation that the human visual system (HVS) is sensitive to the edges while the image details can be better explored in different scales, the proposed model exploits MDOG to effectively characterize the edge information of the reference and distorted SCIs at two different scales, respectively. Then, the degree of edge similarity is measured in terms of the smaller-scale edge map. Finally, the edge strength computed based on the larger-scale edge map is used as the weighting factor to generate the final SCI quality score. Experimental results have shown that the proposed IQA model for the SCIs produces high consistency with human perception of the SCI quality and outperforms the state-of-the-art quality models.
    @article{fu2018screen,
    	title={Screen content image quality assessment using multi-scale difference of gaussian},
    	author={Fu, Ying and Zeng, Huanqiang and Ma, Lin and Ni, Zhangkai and Zhu, Jianqing and Ma, Kai-Kuang},
    	journal={IEEE Transactions on Circuits and Systems for Video Technology},
    	volume={28},
    	number={9},
    	pages={2428--2432},
    	year={2018},
    	publisher={IEEE}
    }
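    A minimal sketch of the two-scale scheme described in the MDOG abstract above: difference-of-Gaussian edge maps are computed at a small and a large scale, the similarity is measured on the small-scale maps, and the large-scale edge strength serves as the pooling weight. The sigmas, the scale ratio k, the constant c, and the max-based weighting are assumptions for illustration.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def dog_edge_map(luma, sigma, k=1.6):
        """Difference-of-Gaussian edge response at one scale."""
        return np.abs(gaussian_filter(luma, sigma) - gaussian_filter(luma, k * sigma))

    def mdog_like_score(luma_ref, luma_dist, sigma_small=1.0, sigma_large=4.0, c=1e-3):
        # Similarity is measured on the smaller-scale edge maps ...
        e_ref, e_dist = dog_edge_map(luma_ref, sigma_small), dog_edge_map(luma_dist, sigma_small)
        sim = (2 * e_ref * e_dist + c) / (e_ref ** 2 + e_dist ** 2 + c)
        # ... and weighted by the edge strength of the larger-scale maps.
        weight = np.maximum(dog_edge_map(luma_ref, sigma_large), dog_edge_map(luma_dist, sigma_large))
        return float((sim * weight).sum() / (weight.sum() + 1e-12))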
  15. ESIM: Edge Similarity for Screen Content Image Quality Assessment
    Zhangkai Ni, Lin Ma, Huanqiang Zeng, Jing Chen, Canhui Cai, and Kai-Kuang Ma.
    IEEE Transactions on Image Processing (T-IP), vol. 26, no. 10, pp. 4818-4831, October 2017.
    Abstract | Paper | Code | Dataset | Project | BibTex Abstract: In this paper, an accurate full-reference image quality assessment (IQA) model developed for assessing screen content images (SCIs), called the edge similarity (ESIM), is proposed. It is inspired by the fact that the human visual system (HVS) is highly sensitive to edges that are often encountered in SCIs; therefore, essential edge features are extracted and exploited for conducting IQA for the SCIs. The key novelty of the proposed ESIM lies in the extraction and use of three salient edge features-i.e., edge contrast, edge width, and edge direction. The first two attributes are simultaneously generated from the input SCI based on a parametric edge model, while the last one is derived directly from the input SCI. The extraction of these three features will be performed for the reference SCI and the distorted SCI, individually. The degree of similarity measured for each above-mentioned edge attribute is then computed independently, followed by combining them together using our proposed edge-width pooling strategy to generate the final ESIM score. To conduct the performance evaluation of our proposed ESIM model, a new and the largest SCI database (denoted as SCID) is established in our work and made to the public for download. Our database contains 1800 distorted SCIs that are generated from 40 reference SCIs. For each SCI, nine distortion types are investigated, and five degradation levels are produced for each distortion type. Extensive simulation results have clearly shown that the proposed ESIM model is more consistent with the perception of the HVS on the evaluation of distorted SCIs than the multiple state-of-the-art IQA methods.
    @article{ni2017esim,
    	title={ESIM: Edge similarity for screen content image quality assessment},
    	author={Ni, Zhangkai and Ma, Lin and Zeng, Huanqiang and Chen, Jing and Cai, Canhui and Ma, Kai-Kuang},
    	journal={IEEE Transactions on Image Processing},
    	volume={26},
    	number={10},
    	pages={4818--4831},
    	year={2017},
    	publisher={IEEE}
    }
  16. Gradient Direction for Screen Content Image Quality Assessment
    Zhangkai Ni, Lin Ma, Huanqiang Zeng, Canhui Cai, and Kai-Kuang Ma.
    IEEE Signal Processing Letters (SPL), vol. 23, no. 10, pp. 1394–1398, August 2016.
    Abstract | Paper | Code | Project | BibTex Abstract: In this letter, we make the first attempt to explore the usage of the gradient direction to conduct the perceptual quality assessment of the screen content images (SCIs). Specifically, the proposed approach first extracts the gradient direction based on the local information of the image gradient magnitude, which not only preserves gradient direction consistency in local regions, but also demonstrates sensitivities to the distortions introduced to the SCI. A deviation-based pooling strategy is subsequently utilized to generate the corresponding image quality index. Moreover, we investigate and demonstrate the complementary behaviors of the gradient direction and magnitude for SCI quality assessment. By jointly considering them together, our proposed SCI quality metric outperforms the state-of-the-art quality metrics in terms of correlation with human visual system perception.
    @article{ni2016gradient,
    	title={Gradient direction for screen content image quality assessment},
    	author={Ni, Zhangkai and Ma, Lin and Zeng, Huanqiang and Cai, Canhui and Ma, Kai-Kuang},
    	journal={IEEE Signal Processing Letters},
    	volume={23},
    	number={10},
    	pages={1394--1398},
    	year={2016},
    	publisher={IEEE}
    }
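    As a toy illustration of the direction-plus-deviation-pooling idea above, the sketch below derives per-pixel gradient directions with Sobel filters, compares them through the cosine of their difference, and pools with the standard deviation of the similarity map (a larger deviation indicating stronger perceived distortion). The Sobel-based direction and the cosine comparison are simplifications of the paper's local-magnitude-based direction extraction.
    import numpy as np
    from scipy.ndimage import sobel

    def gradient_direction(luma):
        """Per-pixel gradient direction (radians) from horizontal/vertical Sobel responses."""
        gx, gy = sobel(luma, axis=1), sobel(luma, axis=0)
        return np.arctan2(gy, gx)

    def direction_deviation_score(luma_ref, luma_dist):
        # Cosine of the direction difference: 1 = identical direction, -1 = opposite.
        sim = np.cos(gradient_direction(luma_ref) - gradient_direction(luma_dist))
        # Deviation-based pooling: the spread of the similarity map is the quality index.
        return float(sim.std())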

Conference Publications

  1. DDR: Exploiting Deep Degradation Response as Flexible Image Descriptor
    Juncheng Wu, Zhangkai Ni, Hanli Wang, Wenhan Yang, Yuyin Zhou, and Shiqi Wang
    In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), December 2024.
    Abstract | Paper | Code | BibTex Abstract: Image deep features extracted by pre-trained networks are known to contain rich and informative representations. In this paper, we present Deep Degradation Response (DDR), a method to quantify changes in image deep features under varying degradation conditions. Specifically, our approach facilitates flexible and adaptive degradation, enabling the controlled synthesis of image degradation through text-driven prompts. Extensive evaluations demonstrate the versatility of DDR as an image descriptor, with strong correlations observed with key image attributes such as complexity, colorfulness, sharpness, and overall quality. Moreover, we demonstrate the efficacy of DDR across a spectrum of applications. It excels as a blind image quality assessment metric, outperforming existing methodologies across multiple datasets. Additionally, DDR serves as an effective unsupervised learning objective in image restoration tasks, yielding notable advancements in image deblurring and single-image super-resolution. Our code is available at: https://github.com/eezkni/DDR
    @article{wu2024ddr,
    	title={DDR: Exploiting Deep Degradation Response as Flexible Image Descriptor},
    	author={Wu, Juncheng and Ni, Zhangkai and Wang, Hanli and Yang, Wenhan and Zhou, Yuyin and Wang, Shiqi},
    	journal={arXiv preprint arXiv:2406.08377},
    	year={2024}
    }
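    The core quantity in DDR, the change of deep features under a controlled degradation, can be prototyped in a few lines. The PyTorch sketch below uses a truncated torchvision ResNet-50 as a stand-in feature extractor, an L1 distance as a stand-in response measure, and a simple blur in place of the paper's text-driven degradation synthesis; all three choices are assumptions for illustration only.
    import torch
    import torch.nn.functional as F
    from torchvision.models import resnet50

    # Stand-in backbone: ResNet-50 up to its last convolutional stage (drops avgpool and fc).
    backbone = torch.nn.Sequential(*list(resnet50(weights="IMAGENET1K_V2").children())[:-2]).eval()

    @torch.no_grad()
    def degradation_response(image, degrade):
        """Distance between deep features of an (N, 3, H, W) image batch and its degraded version."""
        return F.l1_loss(backbone(image), backbone(degrade(image))).item()

    # Example degradation: a 5x5 box blur standing in for text-driven degradation synthesis.
    blur = lambda x: F.avg_pool2d(x, kernel_size=5, stride=1, padding=2)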
  2. Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement
    Lingyu Zhu, Wenhan Yang, Baoliang Chen, Hanwei Zhu, Zhangkai Ni, Qi Mao, and Shiqi Wang
    In Proceedings of the European Conference on Computer Vision (ECCV), September 2024.
    Abstract | Paper | Code | BibTex Abstract: Obtaining pairs of low/normal-light videos, with motions, is more challenging than still images, which raises technical issues and poses the technical route of unpaired learning as a critical role. This paper makes endeavors in the direction of learning for low-light video enhancement without using paired ground truth. Compared to low-light image enhancement, enhancing low-light videos is more difficult due to the intertwined effects of noise, exposure, and contrast in the spatial domain, jointly with the need for temporal coherence. To address the above challenge, we propose the Unrolled Decomposed Unpaired Network (UDU-Net) for enhancing low-light videos by unrolling the optimization functions into a deep network to decompose the signal into spatial and temporal-related factors, which are updated iteratively. Firstly, we formulate low-light video enhancement as a Maximum A Posteriori estimation (MAP) problem with carefully designed spatial and temporal visual regularization. Then, via unrolling the problem, the optimization of the spatial and temporal constraints can be decomposed into different steps and updated in a stage-wise manner. From the spatial perspective, the designed Intra subnet leverages unpair prior information from expert photography retouched skills to adjust the statistical distribution. Additionally, we introduce a novel mechanism that integrates human perception feedback to guide network optimization, suppressing over/under-exposure conditions. Meanwhile, to address the issue from the temporal perspective, the designed Inter subnet fully exploits temporal cues in progressive optimization, which helps achieve improved temporal consistency in enhancement results. Consequently, the proposed method achieves superior performance to state-of-the-art methods in video illumination, noise suppression, and temporal consistency across outdoor and indoor scenes.
    @article{zhu2024unrolled,
    	title={Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement},
    	author={Zhu, Lingyu and Yang, Wenhan and Chen, Baoliang and Zhu, Hanwei and Ni, Zhangkai and Mao, Qi and Wang, Shiqi},
    	journal={arXiv preprint arXiv:2408.12316},
    	year={2024}
    }
  3. Misalignment-Robust Frequency Distribution Loss for Image Transformation
    Zhangkai Ni, Juncheng Wu, Zian Wang, Wenhan Yang, Hanli Wang, Lin Ma.
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024.
    Abstract | Paper | Code | BibTex Abstract: This paper aims to address a common challenge in deep learning-based image transformation methods, such as image enhancement and super-resolution, which heavily rely on precisely aligned paired datasets with pixel-level alignments. However, creating precisely aligned paired images presents significant challenges and hinders the advancement of methods trained on such data. To overcome this challenge, this paper introduces a novel and simple Frequency Distribution Loss (FDL) for computing distribution distance within the frequency domain. Specifically, we transform image features into the frequency domain using Discrete Fourier Transformation (DFT). Subsequently, frequency components (amplitude and phase) are processed separately to form the FDL loss function. Our method is empirically proven effective as a training constraint due to the thoughtful utilization of global information in the frequency domain. Extensive experimental evaluations, focusing on image enhancement and super-resolution tasks, demonstrate that FDL outperforms existing misalignment-robust loss functions. Furthermore, we explore the potential of our FDL for image style transfer that relies solely on completely misaligned data. Our code is available at: https://github.com/eezkni/FDL
    @inproceedings{ni2024misalignment,
    	title={Misalignment-robust frequency distribution loss for image transformation},
    	author={Ni, Zhangkai and Wu, Juncheng and Wang, Zian and Yang, Wenhan and Wang, Hanli and Ma, Lin},
    	booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    	pages={2910--2919},
    	year={2024}
    }
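    A stripped-down version of the loss described in the FDL abstract above: feature maps are taken to the frequency domain with a 2D DFT, and their amplitude and phase components are compared separately. The plain L1 comparisons used here are a simplification (the paper computes a distribution distance over these components), and the weights w_amp and w_phase are hypothetical knobs.
    import torch

    def frequency_distribution_loss_sketch(feat_pred, feat_target, w_amp=1.0, w_phase=1.0):
        """Compare two feature maps in the frequency domain via separate amplitude and phase terms."""
        spec_pred = torch.fft.fft2(feat_pred)
        spec_target = torch.fft.fft2(feat_target)
        amp_loss = torch.mean(torch.abs(spec_pred.abs() - spec_target.abs()))
        phase_loss = torch.mean(torch.abs(spec_pred.angle() - spec_target.angle()))
        return w_amp * amp_loss + w_phase * phase_loss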
  4. ColNeRF: Collaboration for Generalizable Sparse Input Neural Radiance Field
    Zhangkai Ni, Peiqi Yang, Wenhan Yang, Hanli Wang, Lin Ma, and Sam Kwong.
    In Proceedings of the 38th Annual AAAI Conference on Artificial Intelligence (AAAI), vol. 38, no. 5, pp. 4325-4333, 2024.
    Abstract | Paper | Code | BibTex Abstract: Neural Radiance Fields (NeRF) have demonstrated impressive potential in synthesizing novel views from dense input, however, their effectiveness is challenged when dealing with sparse input. Existing approaches that incorporate additional depth or semantic supervision can alleviate this issue to an extent. However, the process of supervision collection is not only costly but also potentially inaccurate, leading to poor performance and generalization ability in diverse scenarios. In our work, we introduce a novel model: the Collaborative Neural Radiance Fields (ColNeRF) designed to work with sparse input. The collaboration in ColNeRF includes both the cooperation between sparse input images and the cooperation between the output of the neural radiation field. Through this, we construct a novel collaborative module that aligns information from various views and meanwhile imposes self-supervised constraints to ensure multi-view consistency in both geometry and appearance. A Collaborative Cross-View Volume Integration module (CCVI) is proposed to capture complex occlusions and implicitly infer the spatial location of objects. Moreover, we introduce self-supervision of target rays projected in multiple directions to ensure geometric and color consistency in adjacent regions. Benefiting from the collaboration at the input and output ends, ColNeRF is capable of capturing richer and more generalized scene representation, thereby facilitating higher-quality results of the novel view synthesis. Extensive experiments demonstrate that ColNeRF outperforms state-of-the-art sparse input generalizable NeRF methods. Furthermore, our approach exhibits superiority in fine-tuning towards adapting to new scenes, achieving competitive performance compared to per-scene optimized NeRF-based methods while significantly reducing computational costs. Our code is available at: https://github.com/eezkni/ColNeRF
    @inproceedings{ni2024colnerf,
    	title={ColNeRF: Collaboration for Generalizable Sparse Input Neural Radiance Field},
    	author={Ni, Zhangkai and Yang, Peiqi and Yang, Wenhan and Wang, Hanli and Ma, Lin and Kwong, Sam},
    	booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
    	volume={38},
    	number={5},
    	pages={4325--4333},
    	year={2024}
    }
  5. Cycle-Interactive Generative Adversarial Network for Robust Unsupervised Low-Light Enhancement
    Zhangkai Ni, Wenhan Yang, Hanli Wang, Shiqi Wang, Lin Ma, and Sam Kwong.
    In Proceedings of the 30th ACM International Conference on Multimedia (ACM Multimedia), October 2022.
    Abstract | Paper | Code | BibTex Abstract: Getting rid of the fundamental limitations in fitting to the paired training data, recent unsupervised low-light enhancement methods excel in adjusting illumination and contrast of images. However, for unsupervised low light enhancement, the remaining noise suppression issue due to the lacking of supervision of detailed signal largely impedes the wide deployment of these methods in real-world applications. Herein, we propose a novel Cycle-Interactive Generative Adversarial Network (CIGAN) for unsupervised low-light image enhancement, which is capable of not only better transferring illumination distributions between low/normal-light images but also manipulating detailed signals between two domains, e.g., suppressing/synthesizing realistic noise in the cyclic enhancement/degradation process. In particular, the proposed low-light guided transformation feed-forwards the features of low-light images from the generator of enhancement GAN (eGAN) into the generator of degradation GAN (dGAN). With the learned information of real low-light images, dGAN can synthesize more realistic diverse illumination and contrast in low-light images. Moreover, the feature randomized perturbation module in dGAN learns to increase the feature randomness to produce diverse feature distributions, persuading the synthesized low-light images to contain realistic noise. Extensive experiments demonstrate both the superiority of the proposed method and the effectiveness of each module in CIGAN.
    @inproceedings{ni2022cycle,
    	title={Cycle-Interactive Generative Adversarial Network for Robust Unsupervised Low-Light Enhancement},
    	author={Ni, Zhangkai and Yang, Wenhan and Wang, Hanli and Wang, Shiqi and Ma, Lin and Kwong, Sam},
    	booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
    	year={2022}
    }
  6. Unpaired Image Enhancement with Quality-Attention Generative Adversarial Network
    Zhangkai Ni, Wenhan Yang, Shiqi Wang, Lin Ma, and Sam Kwong.
    In Proceedings of the 28th ACM International Conference on Multimedia (ACM Multimedia), pp. 1697-1705, October 2020.
    Abstract | Paper | Code | BibTex Abstract: In this work, we aim to learn an unpaired image enhancement model, which can enrich low-quality images with the characteristics of high-quality images provided by users. We propose a quality attention generative adversarial network (QAGAN) trained on unpaired data based on the bidirectional Generative Adversarial Network (GAN) embedded with a quality attention module (QAM). The key novelty of the proposed QAGAN lies in the injected QAM for the generator such that it learns domain-relevant quality attention directly from the two domains. More specifically, the proposed QAM allows the generator to effectively select semantic-related characteristics from the spatial-wise and adaptively incorporate style-related attributes from the channel-wise, respectively. Therefore, in our proposed QAGAN, not only discriminators but also the generator can directly access both domains which significantly facilitate the generator to learn the mapping function. Extensive experimental results show that, compared with the state-of-the-art methods based on unpaired learning, our proposed method achieves better performance in both objective and subjective evaluations.
    @inproceedings{ni2020unpaired,
    	title={Unpaired image enhancement with quality-attention generative adversarial network},
    	author={Ni, Zhangkai and Yang, Wenhan and Wang, Shiqi and Ma, Lin and Kwong, Sam},
    	booktitle={Proceedings of the 28th ACM International Conference on Multimedia},
    	pages={1697--1705},
    	year={2020}
    }
  7. A JND Dataset Based on VVC Compressed Images
    Xuelin Shen, Zhangkai Ni, Wenhan Yang, Xinfeng Zhang, Shiqi Wang, and Sam Kwong.
    2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), June 2020.
    Abstract | Paper | Dataset | BibTex Abstract: In this paper, we establish a just noticeable distortion (JND) dataset based on the next generation video coding standard Versatile Video Coding (VVC). The dataset consists of 202 images which cover a wide range of content with resolution 1920×1080. Each image is encoded by VTM 5.0 intra coding with the quantization parameter (QP) ranging from 13 to 51. The details regarding dataset construction, subjective testing and data post-processing are described in this paper. Finally, the significance of the dataset towards future video coding research is envisioned. All source images as well as the testing data have been made available to the public.
    @inproceedings{shen2020jnd,
    	title={A JND Dataset Based on VVC Compressed Images},
    	author={Shen, Xuelin and Ni, Zhangkai and Yang, Wenhan and Zhang, Xinfeng and Wang, Shiqi and Kwong, Sam},
    	booktitle={2020 IEEE International Conference on Multimedia \& Expo Workshops (ICMEW)},
    	pages={1--6},
    	year={2020},
    	organization={IEEE}
    }
  8. SCID: A Database for Screen Content Images Quality Assessment
    Zhangkai Ni, Lin Ma, Huanqiang Zeng, Ying Fu, Lu Xing, and Kai-Kuang Ma.
    International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), pp. 774-779, November 2017.
    Abstract | Paper | Dataset | Project | BibTex Abstract: Perceptual quality assessment of screen content images (SCIs) has become a new challenging topic in the recent research of image quality assessment (IQA). In this work, we construct a new SCI database (called SCID) for subjective quality evaluation of SCIs and investigate whether existing IQA models can effectively assess the perceptual quality of distorted SCIs. The proposed SCID, which is currently the largest one, contains 1,800 distorted SCIs generated from 40 reference SCIs with 9 types of distortions and 5 degradation levels for each distortion type. The double-stimulus impairment scale (DSIS) method is then employed to rate the perceptual quality, in which each image is evaluated by at least 40 assessors. After processing, each distorted SCI is accompanied with one mean opinion score (MOS) value to indicate its perceptual quality as ground truth. Based on the constructed SCID, we evaluate the performances of 14 state-of-the-art IQA metrics. Experimental results show that the existing IQA metrics are not able to evaluate the perceptual quality of SCIs well, and an IQA metric specifically for SCIs is thus desirable. The proposed SCID will be made publicly available to the research community for further investigation on the perceptual processing of SCIs.
    @inproceedings{ni2017scid,
    	title={SCID: A database for screen content images quality assessment},
    	author={Ni, Zhangkai and Ma, Lin and Zeng, Huanqiang and Fu, Ying and Xing, Lu and Ma, Kai-Kuang},
    	booktitle={2017 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)},
    	pages={774--779},
    	year={2017},
    	organization={IEEE}
    }
  9. Screen Content Image Quality Assessment Using Euclidean Distance
    Ying Fu, Huanqiang Zeng, Zhangkai Ni, Jing Chen, Canhui Cai, and Kai-Kuang Ma.
    International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), pp. 44-49, November 2017.
    Abstract | Paper | BibTex Abstract: Considering that the human visual system (HVS) is greatly sensitive to edges, in this study, we design a new full-reference objective quality assessment method for screen content images (SCIs). The key novelty lies in the extraction of the edge information by computing the Euclidean distance of luminance in the SCIs. Since the HVS is greatly suitable for extracting structural information, the structure information is incorporated into our proposed model. The extracted information is then used to compute the similarity maps of the reference SCI and its distorted SCI. Finally, we combine the obtained maps by using our designed pooling strategy. Experimental results have shown that the designed method achieves higher correlation with the subjective quality scores than state-of-the-art quality assessment models.
    @inproceedings{fu2017screen,
    	title={Screen content image quality assessment using Euclidean distance},
    	author={Fu, Ying and Zeng, Huanqiang and Ni, Zhangkai and Chen, Jing and Cai, Canhui and Ma, Kai-Kuang},
    	booktitle={2017 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)},
    	pages={44--49},
    	year={2017},
    	organization={IEEE}
    }
  10. Screen Content Image Quality Assessment Using Edge Model
    Zhangkai Ni, Lin Ma, Huanqiang Zeng, Canhui Cai, and Kai-Kuang Ma.
    IEEE International conference on Image Processing (ICIP), pp. 81–85, August 2016.
    Abstract | Paper | Code | BibTex Abstract: Since the human visual system (HVS) is highly sensitive to edges, a novel image quality assessment (IQA) metric for assessing screen content images (SCIs) is proposed in this paper. The turnkey novelty lies in the use of an existing parametric edge model to extract two types of salient attributes - namely, edge contrast and edge width, for the distorted SCI under assessment and its original SCI, respectively. The extracted information is subject to conduct similarity measurements on each attribute, independently. The obtained similarity scores are then combined using our proposed edge-width pooling strategy to generate the final IQA score. Hopefully, this score is consistent with the judgment made by the HVS. Experimental results have shown that the proposed IQA metric produces higher consistency with that of the HVS on the evaluation of the image quality of the distorted SCI than that of other state-of-the-art IQA metrics.
    @inproceedings{ni2016screen,
    	title={Screen content image quality assessment using edge model},
    	author={Ni, Zhangkai and Ma, Lin and Zeng, Huanqiang and Cai, Canhui and Ma, Kai-Kuang},
    	booktitle={2016 IEEE International Conference on Image Processing (ICIP)},
    	pages={81--85},
    	year={2016},
    	organization={IEEE}
    }

Patents

  1. A Rain Removal Image Post-processing Method Based on Progressive Collaborative Representation
    Huanqiang Zeng, Xiangwei Lin, Zhangkai Ni, Jiuwen Cao, Jianqing Zhu, and Kai-Kuang Ma
    Application No. 10201906356T, July 2019. (Chinese Patent)
  2. Colour Image Demosaicing Using Progressive Collaborative Representation
    Kai-Kuang Ma, and Zhangkai Ni
    Application No. 10201906356T, July 2019. (Singapore Patent)
  3. A Multi-Exposure Fused Image Quality Assessment Method Based on Contrast and Saturation
    Huanqiang Zeng, Lu Xing, Zhangkai Ni, Jiuwen Cao, Canhui Cai, and Kai-Kuang Ma
    Application No. 2016111584053, December 2016. (Chinese Patent)
  4. A Screen Content Image Quality Assessment Method Based on Phase Congruency
    Huanqiang Zeng, Zhangkai Ni, Lin Ma, Jiuwen Cao, Canhui Cai, and Kai-Kuang Ma
    Application No. 2016108863395, October 2016. (Chinese Patent)