Pakistan Science Abstracts
A Comprehensive Auto ML Solution for Automated Data Preprocessing and Model Deployment
Author(s):
1. Palanivel N: Department of CSE (ICB), Manakula Vinayagar Institute of Technology, Puducherry, India
2. Nithyasree KC: Department of Computer Science and Engineering, Manakula Vinayagar Institute of Technology, Puducherry, India
3. Vigneshwaraan B: Manakula Vinayagar Institute of Technology, Puducherry, India
4. Sobanraj B: Manakula Vinayagar Institute of Technology, Puducherry, India
5. Ragavan P: Manakula Vinayagar Institute of Technology, Puducherry, India
Abstract:
The convergence of data preparation and automated machine learning (AutoML) marks an important turning point in the field of machine learning. Because it can automate the coordination of different machine learning processes, AutoML has become a reliable way to tackle major problems in data preprocessing. This study covers a wide range of data preparation topics, including feature selection, time-series preprocessing, manual encoding mistakes, class imbalance, and inefficient hyperparameters. One of AutoML's main contributions to data preprocessing is the way it simplifies crucial data preparation procedures. Data preparation has historically been a labour- and time-intensive procedure that calls for specialised knowledge and manual involvement at different points in the process. Many of these tasks can now be completed automatically thanks to the development of automated algorithms, which has significantly increased productivity and efficiency. Furthermore, by making data preprocessing more approachable for both specialists and non-experts, AutoML has democratised the field. By automating intricate processes such as feature selection and hyperparameter tuning, AutoML technologies enable users to concentrate on more advanced parts of model creation, such as formulating problems and interpreting outcomes. In addition to quickening the pace of innovation, this democratisation of data preprocessing encourages greater cooperation and knowledge exchange within the machine learning community.
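As a rough illustration of the kind of preprocessing steps the abstract says can be automated, the sketch below chains missing-value imputation, categorical encoding, and class-imbalance weighting using only the Python standard library. It is not the authors' implementation, which this record does not describe; the function names, the sample rows, and the choice of median imputation and one-hot encoding are all assumptions made for the example. The class weights follow the common "balanced" heuristic, n_samples / (n_classes * class_count).

```python
from collections import Counter
from statistics import median


def impute_numeric(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def one_hot(column):
    """One-hot encode a categorical column; categories are sorted for a stable order."""
    categories = sorted(set(column))
    encoded = [[1 if v == c else 0 for c in categories] for v in column]
    return categories, encoded


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def preprocess(rows, numeric_keys, categorical_keys, label_key):
    """Hypothetical helper: build a feature matrix, labels, and class weights."""
    features = [[] for _ in rows]
    for key in numeric_keys:
        for row_features, value in zip(features, impute_numeric([r[key] for r in rows])):
            row_features.append(value)
    for key in categorical_keys:
        _, encoded = one_hot([r[key] for r in rows])
        for row_features, bits in zip(features, encoded):
            row_features.extend(bits)
    labels = [r[label_key] for r in rows]
    return features, labels, balanced_class_weights(labels)


# Illustrative data only (not taken from the paper).
rows = [
    {"age": 30, "city": "A", "label": 0},
    {"age": None, "city": "B", "label": 0},
    {"age": 50, "city": "A", "label": 0},
    {"age": 40, "city": "B", "label": 1},
]
features, labels, weights = preprocess(rows, ["age"], ["city"], "label")
```

Here the missing age is filled with the median of the observed ages, the city column becomes two indicator columns, and the minority class receives the larger weight. A full AutoML system would additionally search over such choices rather than fixing them in advance.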
Page(s): 988-995
DOI: DOI not available
Published: Journal: International Journal of Communication Networks and Information Security, Volume: 16, Issue: S1, Year: 2024
Keywords:
machine learning, Data, data visualization, Preprocessing, Feature selection, AutoML, Hyperparameters, Report Generation