In recent years, Implicit Neural Representation (INR) has become a popular research direction because it alleviates the slow decoding speed that plagues learned video compression.
However, existing INR methods still cannot match state-of-the-art learned video compression methods in compression performance. We therefore aim to improve INR models so as to close this gap in compression performance.
In terms of model architecture, we design an enhanced image feature module that dynamically adjusts the Group of Pictures (GOP) interval according to the degree of feature variation in the video. We also introduce lightweight temporal embeddings so that features within the same GOP carry additional time information, and we select relatively close key features as image features instead of relying on the fixed initial frame of each GOP as the feature source. In addition, we adopt the attention module CBAM (Convolutional Block Attention Module) to strengthen the focus on key features.
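To make the CBAM mechanism concrete, the following is a minimal NumPy sketch of its two stages (channel attention from pooled descriptors through a shared MLP, then spatial attention from channel-wise statistics). The weight shapes, reduction ratio, and the element-wise sum standing in for the paper's 7x7 convolution are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam(feat, w1, w2):
    """Minimal CBAM-style attention on a (C, H, W) feature map.

    Channel attention: a shared two-layer MLP (w1, w2) is applied to the
    global average- and max-pooled descriptors, the results are summed,
    and a sigmoid produces per-channel gates.
    Spatial attention: a sigmoid over the channel-wise mean + max maps
    (the original paper uses a 7x7 conv over their concatenation; an
    element-wise sum stands in here for brevity).
    """
    avg = feat.mean(axis=(1, 2))                    # (C,) average-pooled
    mx = feat.max(axis=(1, 2))                      # (C,) max-pooled
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)    # shared MLP, ReLU inside
    ch_att = sigmoid(mlp(avg) + mlp(mx))            # (C,) channel gates
    feat = feat * ch_att[:, None, None]
    sp_att = sigmoid(feat.mean(axis=0) + feat.max(axis=0))  # (H, W) gates
    return feat * sp_att[None, :, :]

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8)) * 0.1   # reduction ratio 4: 8 -> 2 channels
w2 = rng.standard_normal((8, 2)) * 0.1
y = cbam(x, w1, w2)
```

Both attention maps are bounded in (0, 1) by the sigmoid, so the module rescales rather than replaces features, which is what lets it be dropped into an existing backbone.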
Moreover, in the decoder we employ weighted skip connections to improve gradient flow and balance the contributions of auxiliary and primary features.
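A weighted skip connection can be sketched in a few lines; the scalar weight here is fixed for illustration, whereas in practice it would be a learnable parameter, and the value 0.1 is an assumption rather than the thesis's setting.

```python
import numpy as np

def weighted_skip(main, aux, alpha=0.1):
    """Weighted residual connection: out = main + alpha * aux.

    A small alpha keeps the auxiliary branch from overwhelming the
    primary features while still providing a short gradient path back
    through the skip, which is what improves gradient flow.
    """
    return main + alpha * aux

out = weighted_skip(np.ones((2, 2)), np.full((2, 2), 4.0), alpha=0.5)
```

With a learnable alpha, the network itself decides how much of the auxiliary signal each decoder stage admits, which is the balancing behavior described above.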
In terms of model compression, we propose a pruning scheme driven by how dynamic the video is, applying different pruning strategies to static and dynamic videos while preserving the layers that are critical to decoding. In the loss function design, we introduce a stage-wise frequency-domain loss that accounts for both local and global feature representations.
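The idea of a stage-wise frequency-domain loss can be illustrated as follows: a pixel-wise term captures local errors, an FFT-magnitude term captures global spectral errors, and a stage-dependent weight blends them. The linear schedule and the specific blend are assumptions for illustration, not the thesis's actual loss.

```python
import numpy as np

def frequency_loss(pred, target):
    """Mean L1 distance between the 2-D FFT spectra of two frames.

    Errors that are spread thinly across many pixels (e.g. a global
    blur) concentrate into a few frequency bins, so this term penalizes
    global structure that a pixel-wise loss under-weights.
    """
    return np.abs(np.fft.fft2(pred) - np.fft.fft2(target)).mean()

def stage_wise_loss(pred, target, stage, num_stages):
    """Blend pixel-domain and frequency-domain terms by training stage.

    Assumed linear schedule: early stages emphasize local pixel
    fidelity, later stages shift weight toward the global frequency
    term.
    """
    lam = stage / num_stages
    pixel = np.abs(pred - target).mean()
    return (1.0 - lam) * pixel + lam * frequency_loss(pred, target)

rng = np.random.default_rng(1)
frame = rng.standard_normal((8, 8))
perfect = stage_wise_loss(frame, frame, stage=2, num_stages=3)
```

Both terms vanish exactly when prediction and target agree, so the schedule only redistributes gradient emphasis rather than changing the optimum.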
For the video representation task, compared with the Deng model on which our method builds, the proposed method reduces the model size by 2% while improving PSNR by 0.28 dB. For video compression, our method outperforms the baseline, the traditional H.265 codec, and the learned video compression method DCVC in PSNR.
Notably, unlike previous INR-based video compression methods that use a fixed training scheme, this study analyzes the characteristics of each video to adjust the training strategy dynamically. We further validate the effectiveness of our approach through ablation experiments.
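One simple way to drive such per-video decisions is to score how dynamic a clip is and branch on a threshold. The mean-absolute-frame-difference score, the threshold, and the two strategy labels below are hypothetical illustrations; the thesis analyzes richer video features.

```python
import numpy as np

def motion_score(frames):
    """Mean absolute temporal difference over a (T, H, W) clip.

    A crude proxy for how dynamic the content is: static clips score
    near zero, rapidly changing clips score high.
    """
    return np.abs(np.diff(frames.astype(np.float64), axis=0)).mean()

def choose_pruning(frames, threshold=5.0):
    """Hypothetical selector: static clips tolerate heavier pruning,
    dynamic clips keep more decoder capacity."""
    return "heavy" if motion_score(frames) < threshold else "light"

static_clip = np.full((4, 8, 8), 128.0)                     # no motion
dynamic_clip = np.stack([np.zeros((8, 8)), np.full((8, 8), 255.0),
                         np.zeros((8, 8)), np.full((8, 8), 255.0)])
```

The same scalar score could equally gate other per-video choices mentioned above, such as the GOP interval or the training schedule.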
Abstract

2.1 Explicit Video Representation and Implicit Neural Representation
2.2 Embedding Types for Implicit Neural Representation
2.3 Decoder Architectures for Implicit Neural Representation
2.4 Model Compression Pipeline for Implicit Neural Representation
2.5 Training Strategies for Implicit Neural Representation
2.6 Summary of Surveyed Methods
3 Chapter 3: Proposed Method
3.1 Proposed Model Architecture
3.1.1 Enhanced Image Feature Module
3.1.2 Multi-Resolution Temporal Grid
3.1.3 Feature Fusion Module
3.1.4 Attention Module (CBAM)
3.1.5 Optical-Flow-Guided Frame Aggregation
3.1.6 Decoder
3.2 Video Compression Pipeline
3.2.1 Quantization-Aware Training
3.2.2 Feature Quantization
3.2.3 Pruning
3.2.4 Weight Encoding
3.3 Loss Functions
3.3.1 Enhancement Loss
3.3.2 Stage-Wise Loss
4 Chapter 4: Experimental Results
4.1 Experimental Setup
4.1.1 Experimental Environment
4.1.2 Datasets
4.1.3 Hyperparameters
4.1.4 Evaluation Metrics
4.2 Video Representation Results
4.3 Video Compression Results
4.4 Ablation Studies
4.4.1 Model Architecture
4.4.2 Loss Function
4.4.3 Model Compression
5 Chapter 5: Conclusion
References


[1]T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, 2003.
[2]G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the high efficiency video coding (HEVC) standard," IEEE Transactions on circuits and systems for video technology, vol. 22, no. 12, pp. 1649-1668, 2012.
[3]B. Bross et al., "Overview of the versatile video coding (VVC) standard and its applications," IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 10, pp. 3736-3764, 2021.
[4]G. Lu, W. Ouyang, D. Xu, X. Zhang, C. Cai, and Z. Gao, "Dvc: An end-to-end deep video compression framework," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 11006-11015.
[5]J. Li, B. Li, and Y. Lu, "Deep contextual video compression," Advances in Neural Information Processing Systems, vol. 34, pp. 18114-18125, 2021.
[6]Z. Hu, G. Lu, and D. Xu, "FVC: A new framework towards deep video compression in feature space," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 1502-1511.
[7]X. Sheng, J. Li, B. Li, L. Li, D. Liu, and Y. Lu, "Temporal context mining for learned video compression," IEEE Transactions on Multimedia, vol. 25, pp. 7311-7322, 2022.
[8]H. Chen, B. He, H. Wang, Y. Ren, S. N. Lim, and A. Shrivastava, "Nerv: Neural representations for videos," Advances in Neural Information Processing Systems, vol. 34, pp. 21557-21568, 2021.

[10]S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "Cbam: Convolutional block attention module," in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 3-19.
[11]I. E. Richardson, The H.264 Advanced Video Compression Standard. John Wiley & Sons, 2011.
[12]A. Habibian, T. v. Rozendaal, J. M. Tomczak, and T. S. Cohen, "Video compression with rate-distortion autoencoders," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 7033-7042.
[13]J. Pessoa, H. Aidos, P. Tomás, and M. A. Figueiredo, "End-to-end learning of video compression using spatio-temporal autoencoders," in 2020 IEEE Workshop on Signal Processing Systems (SiPS), 2020: IEEE, pp. 1-6.
[14]Z. Li, M. Wang, H. Pi, K. Xu, J. Mei, and Y. Liu, "E-nerv: Expedite neural video representation with disentangled spatial-temporal context," in European Conference on Computer Vision, 2022: Springer, pp. 267-284.
[15]J. C. Lee, D. Rho, J. H. Ko, and E. Park, "Ffnerv: Flow-guided frame-wise neural representations for videos," in Proceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 7859-7870.
[16]B. He, C. Zhu, G. Lu, Z. Zhang, Y. Chen, and L. Song, "GNeRV: A Global Embedding Neural Representation For Videos."
[17]X. Huang and S. Belongie, "Arbitrary style transfer in real-time with adaptive instance normalization," in Proceedings of the IEEE international conference on computer vision, 2017, pp. 1501-1510.
[18]H. Chen, M. Gwilliam, S.-N. Lim, and A. Shrivastava, "Hnerv: A hybrid neural representation for videos," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 10270-10279.
[19]Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, "A convnet for the 2020s," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 11976-11986.
[20]H. Chen, M. Gwilliam, B. He, S.-N. Lim, and A. Shrivastava, "Cnerv: Content-adaptive neural representation for visual data," arXiv preprint arXiv:2211.10421, 2022.
[21]J. Kim, J. Lee, and J.-W. Kang, "SNeRV: Spectra-Preserving Neural Representation for Video," in European Conference on Computer Vision, 2025: Springer, pp. 332-348.
[22]J. E. Saethre, R. Azevedo, and C. Schroers, "Combining Frame and GOP Embeddings for Neural Video Representation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 9253-9263.
[23]W. Shi et al., "Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 1874-1883.
[24]D. Hendrycks and K. Gimpel, "Gaussian error linear units (gelus)," arXiv preprint arXiv:1606.08415, 2016.
[25]X. Zhang et al., "Boosting Neural Representations for Videos with a Conditional Decoder," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 2556-2566.
[26]V. Sitzmann, J. Martel, A. Bergman, D. Lindell, and G. Wetzstein, "Implicit neural representations with periodic activation functions," Advances in neural information processing systems, vol. 33, pp. 7462-7473, 2020.
[27]Z. Liu et al., "FINER: Flexible spectral-bias tuning in Implicit NEural Representation by Variable-periodic Activation Functions," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 2713-2722.
[28]H. Zhu et al., "FINER++: Building a Family of Variable-periodic Functions for Activating Implicit Neural Representation," arXiv preprint arXiv:2407.19434, 2024.
[29]Y. Bai, C. Dong, C. Wang, and C. Yuan, "Ps-nerv: Patch-wise stylized neural representations for videos," in 2023 IEEE International Conference on Image Processing (ICIP), 2023: IEEE, pp. 41-45.
[30]C. Gomes, R. Azevedo, and C. Schroers, "Video compression with entropy-constrained neural representations," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 18497-18506.
[31]G. Gao, H. M. Kwan, F. Zhang, and D. Bull, "PNVC: Towards practical INR-based video compression," arXiv preprint arXiv:2409.00953, 2024.
[32]H. M. Kwan, G. Gao, F. Zhang, A. Gower, and D. Bull, "HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation," Advances in Neural Information Processing Systems, vol. 36, 2024.
[33]Q. Chang, H. Yu, S. Fu, Z. Zeng, and C. Chen, "MNeRV: A Multilayer Neural Representation for Videos," arXiv preprint arXiv:2407.07347, 2024.
[34]M. Tarchouli, T. Guionnet, M. Riviere, W. Hamidouche, M. Outtas, and O. Deforges, "Res-NeRV: Residual Blocks For A Practical Implicit Neural Video Decoder," in 2024 IEEE International Conference on Image Processing (ICIP), 2024: IEEE, pp. 3751-3757.
[35]Q. Cao, D. Zhang, and X. Zhang, "Saliency-Based Neural Representation for Videos," in International Conference on Pattern Recognition, 2025: Springer, pp. 389-403.
[36]J. Chen et al., "Run, Don't walk: Chasing higher FLOPS for faster neural networks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 12021-12031.
[37]D. Oktay, J. Ballé, S. Singh, and A. Shrivastava, "Scalable model compression by entropy penalized reparameterization," arXiv preprint arXiv:1906.06624, 2019.
[38]H. Yan, Z. Ke, X. Zhou, T. Qiu, X. Shi, and D. Jiang, "DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 23019-23029.
[39]L. Tang, J. Zhu, X. Zhang, L. Zhang, S. Ma, and Q. Huang, "CANeRV: Content Adaptive Neural Representation for Video Compression," arXiv preprint arXiv:2502.06181, 2025.
[40]A. Radford et al., "Learning transferable visual models from natural language supervision," in International conference on machine learning, 2021: PMLR, pp. 8748-8763.
[41]P. Wang et al., "Ofa: Unifying architectures, tasks, and modalities through a simple sequence-to-sequence learning framework," in International Conference on Machine Learning, 2022: PMLR, pp. 23318-23340.
[42]B. He et al., "Towards scalable neural representation for diverse videos," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 6132-6142.
[43]B. Jacob et al., "Quantization and training of neural networks for efficient integer-arithmetic-only inference," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 2704-2713.
[44]Y. Bengio, N. Léonard, and A. Courville, "Estimating or propagating gradients through stochastic neurons for conditional computation," arXiv preprint arXiv:1308.3432, 2013.
[45]J. Shi, "Good features to track," in 1994 Proceedings of IEEE conference on computer vision and pattern recognition, 1994: IEEE, pp. 593-600.
[46]B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in IJCAI'81: 7th international joint conference on Artificial intelligence, 1981, vol. 2, pp. 674-679.
[47]D. A. Huffman, "A method for the construction of minimum-redundancy codes," Proceedings of the IRE, vol. 40, no. 9, pp. 1098-1101, 1952.
[48]A. Mercat, M. Viitanen, and J. Vanne, "UVG dataset: 50/120fps 4K sequences for video codec analysis and development," in Proceedings of the 11th ACM Multimedia Systems Conference, 2020, pp. 297-302.
[49]H. Wang et al., "MCL-JCV: a JND-based H.264/AVC video quality assessment dataset," in 2016 IEEE International Conference on Image Processing (ICIP), 2016: IEEE, pp. 1509-1513.
[50]P. Goyal et al., "Accurate, large minibatch SGD: training ImageNet in 1 hour," arXiv preprint arXiv:1706.02677, 2017.
[51]I. Loshchilov and F. Hutter, "SGDR: Stochastic gradient descent with warm restarts," in Proceedings of the 5th International Conference on Learning Representations, 2017, pp. 1-16.
[52]D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[53]G. Bjontegaard, "Calculation of average PSNR differences between RD-curves," ITU SG16 Doc. VCEG-M33, 2001.

Electronic full text (Internet release date: 2030-05-22)


Source: https://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi/login?o=dnclcdr&s=id=%22113TIT00441025%22.&searchmod
