Minimum Viable Model Estimates for Machine Learning Projects
John Hawkins Transitional AI Research Group, Sydney, Australia
Prioritization of machine learning projects requires estimates of both the potential ROI of the business case and the technical difficulty of building a model with the required characteristics. In this work we present a technique for estimating the minimum required performance characteristics of a predictive model given a set of information about how it will be used. This technique will result in robust, objective comparisons between potential projects. The resulting estimates will allow data scientists and managers to evaluate whether a proposed machine learning project is likely to succeed before any modelling needs to be done. The technique has been implemented in the open source application MinViME (Minimum Viable Model Estimator), which can be installed via the PyPI Python package management system or downloaded directly from the GitHub repository. Available at https://github.com/john-hawkins/MinViME
Machine Learning, ROI Estimation, Machine Learning Metrics, Cost Sensitive Learning.
Full Paper
https://aircconline.com/csit/papers/vol10/csit101803.pdf
Volume Link :
http://airccse.org/csit/V10N18.html
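As a rough illustration of the kind of estimate the abstract above describes, the sketch below computes the minimum precision a binary classifier needs before acting on its positive predictions breaks even. This is not MinViME's method or API: the function names and the payoff figures are hypothetical, and the only inputs assumed are a benefit per true positive and a cost per false positive.

```python
def break_even_precision(benefit_tp: float, cost_fp: float) -> float:
    """Smallest precision p satisfying p*benefit_tp - (1 - p)*cost_fp >= 0.

    Hypothetical helper for illustration; not part of MinViME.
    """
    if benefit_tp <= 0 or cost_fp < 0:
        raise ValueError("benefit_tp must be positive and cost_fp non-negative")
    return cost_fp / (benefit_tp + cost_fp)


def expected_value_per_alert(precision: float, benefit_tp: float, cost_fp: float) -> float:
    """Expected payoff of acting on one positive prediction."""
    return precision * benefit_tp - (1.0 - precision) * cost_fp


# Assumed payoffs: each true positive is worth 100, each false positive costs 25.
threshold = break_even_precision(benefit_tp=100.0, cost_fp=25.0)
print(f"minimum viable precision: {threshold:.2f}")
```

A model whose achievable precision falls below this threshold would lose money under these assumed payoffs, which is the sort of screening question that can be asked before any modelling is done.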
AN INTELLECTUAL APPROACH TO DESIGN PERSONAL STUDY PLAN VIA MACHINE LEARNING
Shiyuan Zhang1, Evan Gunnell2, Marisabel Chang2, Yu Sun2 1Arnold O. Beckman High School, USA, 2California State Polytechnic University, USA
As more students are required to have standardized test scores to enter higher education, developing vocabulary becomes essential for achieving ideal scores. Each individual has his or her own study style that maximizes efficiency, and there are various approaches to memorization. However, it is difficult to find the specific learning method that best fits a person. This paper designs a tool to customize personal study plans based on clients’ different habits, including difficulty distribution, difficulty order of learning words, and the types of vocabulary. We applied our application to educational software and conducted a quantitative evaluation of the approach via three types of machine learning models. By calculating cross-validation scores, we evaluated the accuracy of each model and identified the model that returns the most accurate predictions. The results reveal that linear regression has the highest cross-validation score, and it can provide the most efficient personal study plans.
Machine learning, study plan, vocabulary
For More Details :
https://aircconline.com/csit/papers/vol10/csit101602.pdf
Volume Link :
http://airccse.org/csit/V10N18.html
MACHINE LEARNING ALGORITHM FOR NLOS MILLIMETER WAVE IN 5G V2X COMMUNICATION
Deepika Mohan1, G. G. Md. Nawaz Ali2 and Peter Han Joo Chong1 1Auckland University of Technology, New Zealand, 2University of Charleston, USA
The 5G vehicle-to-everything (V2X) communication for autonomous and semi-autonomous driving utilizes wireless technology for communication, and the Millimeter Wave bands are widely implemented in this kind of vehicular network application. The main purpose of this paper is to broadcast messages from the mmWave Base Station (mmBS) to vehicles at LOS (Line-of-sight) and NLOS (Non-LOS). A Relay using Machine Learning (RML) algorithm is formulated to train the mmBS to identify blockages within its coverage area and to broadcast messages to vehicles at NLOS using LOS nodes as relays. The transmission of information is faster, with higher throughput, and it covers a wider bandwidth which is reused; therefore, when machine learning is performed within the coverage area of the mmBS, most of the vehicles at NLOS can benefit. A unique relay mechanism combined with machine learning is proposed to communicate with mobile nodes at NLOS.
5G, Millimeter Wave, Machine Learning, Relay, V2X communication
For More Details :
https://aircconline.com/csit/papers/vol10/csit101706.pdf
Volume Link :
http://airccse.org/csit/V10N17.html
Parallel Data Extraction Using Word Embeddings
Pintu Lohar and Andy Way ADAPT Centre, Dublin City University, Ireland
Building a robust MT system requires a sufficiently large parallel corpus to be available as training data. In this paper, we propose to automatically extract parallel sentences from comparable corpora without using any MT system or even any parallel corpus at all. Instead, we use cross-lingual information retrieval (CLIR), average word embeddings, text similarity and a bilingual dictionary, thus saving a significant amount of time and effort as no MT system is involved in this process. We conduct experiments on two different kinds of data: (i) formal texts from the news domain, and (ii) user-generated content (UGC) from hotel reviews. The automatically extracted sentence pairs are then added to the already available parallel training data, and the extended translation models are built from the concatenated data sets. Finally, we compare the performance of our new extended models against the baseline models built from the available data. The experimental evaluation reveals that our proposed approach is capable of improving the translation outputs for both the formal texts and UGC.
Machine Translation, parallel data, user-generated content, word embeddings, text similarity, comparable corpora.
For More Details :
https://aircconline.com/csit/papers/vol10/csit101521.pdf
Volume Link :
http://airccse.org/csit/V10N15.html
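The abstract above mentions averaged word embeddings and text similarity as ingredients for pairing sentences. A minimal sketch of that one idea follows, assuming the words of both languages already live in a shared embedding space. The toy vectors, the vocabulary, and the function names are all hypothetical, and the CLIR and bilingual dictionary steps of the paper are not modelled here.

```python
import math

# Toy 3-dimensional embeddings in an assumed shared space (made-up values).
EMBEDDINGS = {
    "room": [0.90, 0.10, 0.00],
    "chambre": [0.88, 0.12, 0.02],
    "clean": [0.10, 0.90, 0.10],
    "propre": [0.12, 0.85, 0.10],
    "train": [0.00, 0.10, 0.90],
}


def average_embedding(tokens):
    """Mean of the embeddings of all known tokens, or None if none are known."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(source_tokens, candidates):
    """Return (score, candidate) for the most similar candidate sentence."""
    src = average_embedding(source_tokens)
    scored = []
    for cand in candidates:
        vec = average_embedding(cand)
        if src is not None and vec is not None:
            scored.append((cosine_similarity(src, vec), cand))
    return max(scored, key=lambda pair: pair[0]) if scored else None


best = best_match(["clean", "room"], [["chambre", "propre"], ["train"]])
print(best)
```

In practice a similarity threshold would decide whether the best candidate is accepted as a parallel pair at all; the appropriate threshold is not something this sketch can suggest.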
DATA PREDICTION OF DEFLECTION BASIN EVOLUTION OF ASPHALT PAVEMENT STRUCTURE BASED ON MULTI-LEVEL NEURAL NETWORK
Shaosheng Xu1, Jinde Cao2 and Xiangnan Liu2 1Southeast University, Nanjing, China, 2Southeast University, China
Aiming at reducing the high cost of test data collection of deflection basins in the structural design of asphalt pavement and shortening the long test time of new structures, this paper innovatively designs a structure coding network based on traditional neural networks to map the pavement structure to an abstract space. Therefore, the generalization ability of the neural network structure is improved, and a new multi-level neural network model is formed to predict the evolution data of the deflection basin of the untested structure. By testing the experimental data of RIOHTRACK, the network structure predicts the deflection basin data of untested pavement structure, of which the average prediction error is less than 5%.
multi-level neural network, Encoding converter, structural of asphalt pavement, deflection basins, RIOHTRACK.
For More Details :
https://aircconline.com/csit/papers/vol10/csit101304.pdf
Volume Link :
http://airccse.org/csit/V10N15.html
FOREX DATA ANALYSIS USING WEKA
Luciana Abednego and Cecilia Esti Nugraheni Parahyangan Catholic University, Indonesia
This paper conducts some experiments with forex trading data. The data used is from kaggle.com, a website that provides datasets for machine learning and data science. The goal of the experiments is to learn how to set the many parameters of a forex trading robot. The questions investigated are: How far from the open position should the robot set the stop loss or target profit level? When is the best time to apply a forex robot that works only in a trending market? Which is better: a forex trading robot that waits for a trending market, or a robot that works during a sideways market? To answer these questions, data visualizations are plotted in many types of graphs. The data representations are built using Weka, an open-source machine learning software. The data visualization helps the trader design a strategy for trading the forex market.
forex trading data, forex data experiments, forex data analysis, forex data visualization, weka
For More Details :
https://aircconline.com/csit/papers/vol10/csit101215.pdf
Volume Link :
http://airccse.org/csit/V10N12.html
DATA CONFIDENTIALITY IN P2P COMMUNICATION AND SMART CONTRACTS OF BLOCKCHAIN IN INDUSTRY 4.0
Jan Stodt and Christoph Reich University of Applied Sciences Furtwangen, Furtwangen, Baden-Württemberg, Germany
Increased collaborative production and dynamic selection of production partners within industry 4.0 manufacturing lead to ever-increasing automatic data exchange between companies. Automatic and unsupervised data exchange creates new attack vectors, which could be used by a malicious insider to leak secrets, without anyone noticing, via a channel otherwise considered secure. In this paper we reflect upon approaches to prevent the exposure of secret data via blockchain technology, while also providing auditable proof of data exchange. We show that previous blockchain-based privacy protection approaches offer protection, but give control of the data to (potentially untrustworthy) third parties, which can also be considered a privacy violation. The approach taken in this paper does not utilize centralized data storage. It realizes data confidentiality of P2P communication and data processing in smart contracts of blockchains.
blockchain, privacy protection, P2P communication, smart contracts, industry 4.0.
For More Details :
https://aircconline.com/csit/papers/vol10/csit101001.pdf
Volume Link :
http://airccse.org/csit/V10N10.html
PREPERFORMANCE TESTING OF A WEBSITE
Sushma Suryadevara and Shahid Ali AGI Institute, Auckland, New Zealand
This study examines the importance of performance testing of web applications and of analyzing application bottlenecks. The paper highlights performance testing based on load tests. Everyone wants an application to be very fast, but its reliability also plays an important role, so user satisfaction is the driver for performance testing a given application. Performance testing determines a few aspects of system performance under a pre-defined workload. In this study, the JMeter performance testing tool was used to implement and execute the test cases. The first load test was run with 200 users, which was then increased to 500 users, and the throughput, median, average response time and deviation were calculated.
Performance testing, load balancing, threads, throughput, JMeter, load test.
For More Details :
https://aircconline.com/csit/papers/vol10/csit100703.pdf
Volume Link :
http://airccse.org/csit/V10N07.html
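The metrics named in the abstract above (throughput, median, average response time and deviation) can be computed from raw response times along the following lines. This is a minimal sketch, not JMeter's implementation: the function name and sample values are hypothetical, and treating "deviation" as the population standard deviation is an assumption.

```python
import statistics


def summarize_load_test(response_times_ms, test_duration_s):
    """Summarize one load test run from per-request response times."""
    if not response_times_ms or test_duration_s <= 0:
        raise ValueError("need at least one sample and a positive duration")
    return {
        # Completed requests per second over the whole run.
        "throughput_rps": len(response_times_ms) / test_duration_s,
        "average_ms": statistics.mean(response_times_ms),
        "median_ms": statistics.median(response_times_ms),
        # Assumption: population standard deviation of the response times.
        "deviation_ms": statistics.pstdev(response_times_ms),
    }


# Made-up samples: five requests completed during a two-second window.
sample = [120, 130, 110, 400, 140]
summary = summarize_load_test(sample, test_duration_s=2.0)
print(summary)
```

The single slow request pulls the average well above the median here, which is why load test reports usually show both.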
PREDICTION OF CANCER MICROARRAY AND DNA METHYLATION DATA USING NON-NEGATIVE MATRIX FACTORIZATION
Parth Patel1, Kalpdrum Passi1 and Chakresh Kumar Jain2 1Laurentian University, Canada, 2Jaypee Institute of Information Technology, India
Over the past few years, there has been a considerable spread of microarray technology in many biological patterns, particularly in those pertaining to cancer diseases like leukemia, prostate, colon cancer, etc. The primary bottleneck that one experiences in the proper understanding of such datasets lies in their dimensionality, and thus for an efficient and effective means of studying the same, a reduction in their dimension to a large extent is deemed necessary. This study is an attempt to suggest different algorithms and approaches for the reduction of dimensionality of such microarray datasets. This study exploits the matrix-like structure of such microarray data and uses a popular technique called Non-Negative Matrix Factorization (NMF) to reduce the dimensionality, primarily in the field of biological data. Classification accuracies are then compared for these algorithms. This technique gives an accuracy of 98%.
Microarray datasets, Feature Extraction, Feature Selection, Principal Component Analysis, Non-negative Matrix Factorization, Machine learning.
For More Details :
https://aircconline.com/csit/papers/vol10/csit100906.pdf
Volume Link :
http://airccse.org/csit/V10N09.html
OBJECT DETECTION IN TRAFFIC SCENARIOS - A COMPARISON OF TRADITIONAL AND DEEP LEARNING APPROACHES
Gopi Krishna Erabati, Nuno Gonçalves and Hélder Araújo Institute of Systems and Robotics, University of Coimbra, Portugal
In the area of computer vision, research on object detection algorithms has grown rapidly as it is the fundamental step for automation, specifically for self-driving vehicles. This work presents a comparison of traditional and deep learning approaches for the task of object detection in traffic scenarios. A handcrafted feature descriptor, Histogram of Oriented Gradients (HOG), with a linear Support Vector Machine (SVM) classifier is compared with deep learning approaches like Single Shot Detector (SSD) and You Only Look Once (YOLO), in terms of mean Average Precision (mAP) and processing speed. The SSD algorithm is implemented with different backbone architectures like VGG16, MobileNetV2 and ResNeXt50, and similarly the YOLO algorithm with MobileNetV1 and ResNet50, to compare the performance of the approaches. Training and inference are performed on PASCAL VOC 2007 and 2012 training data and PASCAL VOC 2007 test data, respectively. We consider five classes relevant for traffic scenarios, namely bicycle, bus, car, motorbike and person, for the calculation of mAP. Both qualitative and quantitative results are presented for comparison. For the task of object detection, the deep learning approaches outperform the traditional approach in both accuracy and speed. This is achieved at the cost of requiring a large amount of data, high computation power and time to train a deep learning approach.
Object Detection, Deep Learning, SVM, SSD & YOLO.
For More Details :
https://aircconline.com/csit/papers/vol10/csit100918.pdf
Volume Link :
http://airccse.org/csit/V10N09.html