Conclusion and References
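In this paper, we presented a practical and proactive security testing approach based on accurate online prediction, at presubmit time, of code changes that are likely to be vulnerable. We presented three new types of feature data that are effective for vulnerability prediction, and we evaluated recall and precision through N-fold validation using data from a large and important Android open source project.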
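N-fold validation, as used for the recall and precision figures above, trains on N-1 folds and tests on the held-out fold, rotating through all N folds. The Python sketch below shows only the index partitioning, assuming contiguous, unshuffled folds. The function name `n_fold_splits` is hypothetical, and the paper's actual protocol (for example, any shuffling, stratification, or time ordering) may differ.

```python
from typing import List, Tuple


def n_fold_splits(n_samples: int, n_folds: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into n_folds (train, test) pairs.

    Assumption (hypothetical helper): folds are contiguous and unshuffled;
    this is not necessarily the paper's validation protocol.
    Each index appears in exactly one test fold; fold sizes differ by at most one.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_folds)
    splits: List[Tuple[List[int], List[int]]] = []
    start = 0
    for k in range(n_folds):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if k < extra else 0)
        test = list(range(start, start + size))
        # Training set = every index outside the current test fold.
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, test))
        start += size
    return splits


if __name__ == "__main__":
    for train, test in n_fold_splits(10, 3):
        print(len(train), test)
    # prints:
    # 6 [0, 1, 2, 3]
    # 7 [4, 5, 6]
    # 7 [7, 8, 9]
```

Metrics such as recall and precision are then computed on each held-out fold (or on the pooled held-out predictions), so every sample is scored by a model that never saw it during training.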
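We also evaluated an online deployment mode and identified a subset of the feature data types that are not specific to the target project from which the training data is collected; this subset can therefore also be used for other projects (e.g., in a multi-project setting). The evaluation results show that our VP framework identifies about 80% of the evaluated vulnerability-inducing changes at presubmit time, with 98% precision and a false positive rate below 1.7%.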
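The three figures reported above follow the standard confusion-matrix definitions, where a "positive" is a vulnerability-inducing change: precision = TP/(TP+FP), recall = TP/(TP+FN), and false positive rate = FP/(FP+TN). The Python sketch below computes them. The class name `ConfusionCounts` and the counts in the demo are hypothetical, chosen only to make the arithmetic easy to check; they are not data from the paper.

```python
from dataclasses import dataclass


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return numerator / denominator if denominator else 0.0


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container; 'positive' means a vulnerability-inducing change."""

    tp: int  # vulnerability-inducing changes correctly flagged
    fp: int  # benign changes wrongly flagged
    fn: int  # vulnerability-inducing changes missed
    tn: int  # benign changes correctly passed

    def precision(self) -> float:
        # Share of flagged changes that are truly vulnerability-inducing.
        return _ratio(self.tp, self.tp + self.fp)

    def recall(self) -> float:
        # Share of vulnerability-inducing changes that get flagged.
        return _ratio(self.tp, self.tp + self.fn)

    def false_positive_rate(self) -> float:
        # Share of benign changes that get flagged by mistake.
        return _ratio(self.fp, self.fp + self.tn)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the paper.
    c = ConfusionCounts(tp=196, fp=4, fn=49, tn=396)
    print(f"precision={c.precision():.2f} recall={c.recall():.2f} "
          f"fpr={c.false_positive_rate():.3f}")
    # prints: precision=0.98 recall=0.80 fpr=0.010
```

Precision and false positive rate use different denominators (all flagged changes versus all benign changes), so they answer different questions and are reported side by side.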
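These positive results motivate future research (e.g., using advanced ML or GenAI techniques) on applying the VP approach or framework to upstream open source projects maintained by their communities. Each of these projects is critical to numerous software and computer products that billions of users rely on every day.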
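The urgency of this work stems from its potential societal benefit. Widespread adoption of ML-based approaches such as the VP framework could greatly improve the ability to share trustworthiness data about open source contributors and projects. Such shared data would empower the open source community to fight threats such as fake accounts (as seen in the Linux XZ Utils backdoor attack¹⁶).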
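Furthermore, this ML-based approach can enable a rapid response across open source projects when a long-planned attack surfaces. Sharing information among similar or downstream projects improves their preparedness and shortens the time needed to respond to similar attacks.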
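We therefore call for an open source community initiative to establish the practice of sharing databases of developer and project trustworthiness data, in order to strengthen the open source software supply chain on which many computer and software products depend.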
[1] T. Menzies, J. Greenwald, and A. Frank, "Data Mining Static Code Attributes to Learn Defect Predictors," IEEE Transactions on Software Engineering, 33(1):2-13, 2007.
[2] M. Halstead, Elements of Software Science, Elsevier, 1977.
[3] T. McCabe, "A Complexity Measure," IEEE Transactions on Software Engineering, 2(4):308-320, 1976.
[4] R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[5] T. Zimmermann, N. Nagappan, H. Gall, E. Giger, and B. Murphy, "Cross-project defect prediction: a large scale experiment on data vs. domain vs. process," in Proceedings of the Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pp. 91-100, 2009.
[6] A. Chou, J. Yang, B. Chelf, S. Hallem, and D. Engler, "An Empirical Study of Operating Systems Errors," in Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), pp. 73-88, 2001.
[7] S. Kim, T. Zimmermann, E. J. Whitehead Jr., and A. Zeller, "Predicting Faults from Cached History," in Proceedings of the ACM International Conference on Software Engineering (ICSE), pp. 489-498, 2007.
[8] F. Rahman, D. Posnett, A. Hindle, E. Barr, and P. Devanbu, "BugCache for inspections," in Proceedings of the ACM SIGSOFT Symposium and the European Conference on Foundations of Software Engineering (SIGSOFT/FSE), p. 322, 2011.
[9] C. Lewis, Z. Lin, C. Sadowski, X. Zhu, R. Ou, and E. J. Whitehead Jr., "Does bug prediction support human developers? Findings from a Google case study," in Proceedings of the International Conference on Software Engineering (ICSE), pp. 372-381, 2013.
[10] J. Walden, J. Stuckman, and R. Scandariato, "Predicting Vulnerable Components: Software Metrics vs Text Mining," in Proceedings of the IEEE International Symposium on Software Reliability Engineering, pp. 23-33, 2014.
[11] A. Chou, J. Yang, B. Chelf, S. Hallem, and D. Engler, "An empirical study of operating systems errors," in Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), pp. 73-88, 2001.
[12] S. R. Chidamber and C. F. Kemerer, "A Metrics Suite for Object Oriented Design," IEEE Transactions on Software Engineering, 20(6):476-493, 1994.
[13] R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, San Mateo, CA, 1993.
[14] A. Chou, J. Yang, B. Chelf, S. Hallem, and D. Engler, "An empirical study of operating systems errors," in Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), pp. 73-88, 2001.
[15] R. Chillarege, I. S. Bhandari, J. K. Chaar, M. J. Halliday, D. S. Moebus, B. K. Ray, and M-Y. Wong, "Orthogonal defect classification - a concept for in-process measurements," IEEE Transactions on Software Engineering, 18(11):943-956, 1992.
[16] R. Natella, D. Cotroneo, and H. Madeira, "Assessing Dependability with Software Fault Injection: A Survey," ACM Computing Surveys, 48(3), 2016.
[17] K. S. Yim, "Norming to Performing: Failure Analysis and Deployment Automation of Big Data Software Developed by Highly Iterative Models," in Proceedings of the IEEE International Symposium on Software Reliability Engineering (ISSRE), pp. 144-155, 2014.
[18] S. R. Chidamber and C. F. Kemerer, "A Metrics Suite for Object Oriented Design," IEEE Transactions on Software Engineering, 20(6):476-493, 1994.
[19] K. S. Yim, "Assessment of Security Defense of Native Programs Against Software Faults," System Dependability and Analytics, Springer Series in Reliability Engineering, Springer, Cham, 2023.
[20] M. Fourné, D. Wermke, S. Fahl and Y. Acar, "A Viewpoint on Human Factors in Software Supply Chain Security: A Research Agenda," IEEE Security & Privacy, vol. 21, no. 6, pp. 59-63, Nov.-Dec. 2023.
[21] P. Ladisa, H. Plate, M. Martinez and O. Barais, "SoK: Taxonomy of Attacks on Open-Source Software Supply Chains," in Proceedings of the IEEE Symposium on Security and Privacy (SP), pp. 1509-1526, 2023.
[22] D. Wermke et al., "'Always Contribute Back': A Qualitative Study on Security Challenges of the Open Source Supply Chain," in Proceedings of the IEEE Symposium on Security and Privacy (SP), pp. 1545-1560, 2023.
[23] A. Dann, H. Plate, B. Hermann, S. E. Ponta and E. Bodden, "Identifying Challenges for OSS Vulnerability Scanners - A Study & Test Suite," IEEE Transactions on Software Engineering, vol. 48, no. 9, pp. 3613-3625, 1 Sept. 2022.
[24] S. Torres-Arias, A. K. Ammula, R. Curtmola, and J. Cappos, "On omitting commits and committing omissions: Preventing git metadata tampering that (re)introduces software vulnerabilities," in Proceedings of the 25th USENIX Security Symposium, pp. 379-395, 2016.
[25] R. Goyal, G. Ferreira, C. Kästner, and J. Herbsleb, "Identifying unusual commits on GitHub," Journal of Software: Evolution and Process, vol. 30, no. 1, p. e1893, 2018.
[26] C. Soto-Valero, N. Harrand, M. Monperrus, and B. Baudry, "A comprehensive study of bloated dependencies in the Maven ecosystem," Empirical Software Engineering, vol. 26, Mar 2021.
[27] R. Duan, O. Alrawi, R. P. Kasturi, R. Elder, B. Saltaformaggio, and W. Lee, "Towards measuring supply chain attacks on package managers for interpreted languages," arXiv preprint arXiv:2002.01139, 2020.
[28] Enduring Security Framework, "Securing the software supply chain: Recommended practices guide for developers," Cybersecurity and Infrastructure Security Agency, Washington, DC, USA, August 2022.
[29] Z. Durumeric et al., "The matter of Heartbleed," in Proceedings of the ACM Internet Measurement Conference, pp. 475-488, 2014.
[30] D. Everson, L. Cheng, and Z. Zhang, "Log4Shell: Redefining the web attack surface," in Proceedings of the Workshop on Measurements, Attacks, and Defenses for the Web (MADWeb), pp. 1-8, 2022.
[31] "Highly evasive attacker leverages SolarWinds supply chain to compromise multiple global victims with SUNBURST backdoor," Mandiant, available at https://www.mandiant.com/resources/blog/evasive-attackerleverages-solarwinds-supply-chain-compromises-with-sunburstbackdoor
[32] W. Enck and L. Williams, "Top five challenges in software supply chain security: Observations from 30 industry and government organizations," IEEE Security & Privacy, vol. 20, no. 2, pp. 96-100, 2022.
[33] "CircleCI incident report for January 4, 2023 security incident," CircleCI, available at https://circleci.com/blog/jan-4-2023-incidentreport/
[34] K. Toubba, "Security incident update and recommended actions," LastPass, available at https://blog.lastpass.com/2023/03/securityincident-update-recommended-actions/
[35] D. Wermke, N. Wöhler, J. H. Klemmer, M. Fourné, Y. Acar, and S. Fahl, "Committed to trust: A qualitative study on security & trust in open source software projects," in Proceedings of the IEEE Symposium on Security and Privacy (S&P), pp. 1880-1896, 2022.
[36] D. A. Wheeler, "Countering trusting trust through diverse double-compiling," in Proceedings of the IEEE Annual Computer Security Applications Conference (ACSAC), pp. 13-48, 2005.
[37] K. S. Yim, I. Malchev, A. Hsieh, and D. Burke, "Treble: Fast Software Updates by Creating an Equilibrium in an Active Software Ecosystem of Globally Distributed Stakeholders," ACM Transactions on Embedded Computing Systems, 18(5s):104, 2019.
[38] G. Holmes, A. Donkin, and I. H. Witten, "WEKA: a machine learning workbench," in Proceedings of the Australian New Zealand Intelligent Information Systems Conference (ANZIIS), pp. 357-361, 1994.
[39] D. Yeke, M. Ibrahim, G. S. Tuncay, H. Farrukh, A.


