Pseudorandom Sequences Classification Algorithm

Andrey Spirin, Alexander Kozachok


Abstract—The number of information leaks caused by insiders has increased. One possible leak channel is the transmission of data in encrypted or compressed form, since modern data leakage prevention (DLP) systems cannot detect signatures or other markers of confidential information in such data. This article presents an algorithm for classifying sequences produced by encryption and compression algorithms. The feature space is an array of the frequencies of occurrence of binary subsequences of length N bits; file headers and other contextual information are not used to construct it. The presented algorithm achieved a classification accuracy of 0.98 on the sequences considered in the work and can be implemented in DLP systems to prevent the transmission of information in encrypted or compressed form.


Keywords: statistical data analysis, machine learning, classification of binary sequences, DLP systems, information leakage protection
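
As an illustration of the feature space described in the abstract, the following minimal Python sketch computes the relative frequencies of all 2^N possible N-bit subsequences of a byte string. It is not the authors' implementation. The function name `nbit_frequencies`, the `step` parameter, the default N = 8, and the choice between overlapping and non-overlapping windows are all assumptions made for the example, since the abstract does not specify them.

```python
def nbit_frequencies(data: bytes, n: int = 8, step: int = 0) -> list:
    """Return relative frequencies of all 2**n possible n-bit subsequences.

    `step` is the distance in bits between window starts. step=0 means
    non-overlapping windows (step = n); step=1 gives a sliding window.
    Both conventions are assumptions, not taken from the article.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    step = step or n

    # Represent the input as a string of '0'/'1' characters, MSB first.
    bits = "".join(f"{byte:08b}" for byte in data)

    counts = [0] * (1 << n)
    total = 0
    for start in range(0, len(bits) - n + 1, step):
        # Interpret the n-bit window as an integer index into the array.
        counts[int(bits[start:start + n], 2)] += 1
        total += 1

    if total == 0:
        return [0.0] * (1 << n)
    return [c / total for c in counts]


# One byte 10110000 split into 2-bit windows: 10, 11, 00, 00.
sample = bytes([0b10110000])
print(nbit_frequencies(sample, n=2))  # [0.5, 0.0, 0.25, 0.25]
```

The resulting vector depends only on the bit content of the sequence, so it could be passed to any standard classifier without reference to file headers or other context.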

