Top Data Science Research articles in 2020

Minimum Viable Model Estimates for Machine Learning Projects

    John Hawkins Transitional AI Research Group, Sydney, Australia

    ABSTRACT

    Prioritization of machine learning projects requires estimates of both the potential ROI of the business case and the technical difficulty of building a model with the required characteristics. In this work we present a technique for estimating the minimum required performance characteristics of a predictive model given a set of information about how it will be used. This technique will result in robust, objective comparisons between potential projects. The resulting estimates will allow data scientists and managers to evaluate whether a proposed machine learning project is likely to succeed before any modelling needs to be done. The technique has been implemented in the open source application MinViME (Minimum Viable Model Estimator), which can be installed via the PyPI Python package management system or downloaded directly from the GitHub repository at https://github.com/john-hawkins/MinViME
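The idea of a minimum viable model can be illustrated with a simple break-even calculation. The sketch below is not the MinViME method or its API; it assumes a hypothetical use case in which every positive prediction is acted on, each true positive earns a fixed benefit, and each false positive incurs a fixed cost. The function name and parameters are illustrative assumptions.

```python
def break_even_precision(benefit_per_tp: float, cost_per_fp: float) -> float:
    """Return the lowest precision at which acting on predictions breaks even.

    Assumption (not taken from the paper): expected value per positive
    prediction is p * benefit - (1 - p) * cost, where p is the precision.
    Setting this to zero gives p = cost / (benefit + cost).
    """
    if benefit_per_tp <= 0 or cost_per_fp < 0:
        raise ValueError("benefit must be positive and cost non-negative")
    return cost_per_fp / (benefit_per_tp + cost_per_fp)


if __name__ == "__main__":
    # Hypothetical figures: a correct alert is worth 90, a false alarm costs 10.
    print(break_even_precision(90.0, 10.0))
```

Under these assumptions, a project whose achievable precision is expected to fall below the returned value would not pay for itself, which is the kind of comparison the abstract describes.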

    KEYWORDS

    Machine Learning, ROI Estimation, Machine Learning Metrics, Cost Sensitive Learning.


    Full Paper
    https://aircconline.com/csit/papers/vol10/csit101803.pdf


    Volume Link :
    http://airccse.org/csit/V10N18.html



AN INTELLECTUAL APPROACH TO DESIGN PERSONAL STUDY PLAN VIA MACHINE LEARNING

    Shiyuan Zhang1, Evan Gunnell2, Marisabel Chang2, Yu Sun2 1Arnold O. Beckman High School, USA, 2California State Polytechnic University, USA

    ABSTRACT

    As more students are required to have standardized test scores to enter higher education, developing vocabulary becomes essential for achieving ideal scores. Each individual has his or her own study style that maximizes efficiency, and there are various approaches to memorization. However, it is difficult to find the specific learning method that best fits a person. This paper designs a tool to customize personal study plans based on clients’ different habits, including difficulty distribution, difficulty order of learning words, and the types of vocabulary. We applied our application to educational software and conducted a quantitative evaluation of the approach via three types of machine learning models. By calculating cross-validation scores, we evaluated the accuracy of each model and identified the model that returns the most accurate predictions. The results reveal that linear regression has the highest cross-validation score, and it can provide the most efficient personal study plans.
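Model selection by cross-validation score can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the data are synthetic, the two candidate models (a one-variable least-squares line and a constant-mean baseline) are stand-ins, and mean squared error is assumed as the score.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_mean(xs, ys):
    """Baseline model that always predicts the training mean."""
    my = mean(ys)
    return lambda x: my


def cross_val_mse(fit, xs, ys, k=5):
    """Average held-out mean squared error over k interleaved folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th sample is held out
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        model = fit(train_x, train_y)
        fold_errors.append(mean([(model(xs[i]) - ys[i]) ** 2 for i in test_idx]))
    return mean(fold_errors)


if __name__ == "__main__":
    # Synthetic data (assumption): a perfectly linear relationship.
    xs = [float(i) for i in range(20)]
    ys = [2.0 * x + 1.0 for x in xs]
    for name, fit in (("linear", fit_line), ("mean baseline", fit_mean)):
        print(name, cross_val_mse(fit, xs, ys))
```

The model with the lowest held-out error is the one that would be selected; with a score where higher is better, such as R², the comparison is reversed.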

    KEYWORDS

    Machine learning, study plan, vocabulary


    For More Details :
    https://aircconline.com/csit/papers/vol10/csit101602.pdf


    Volume Link :
    http://airccse.org/csit/V10N18.html


MACHINE LEARNING ALGORITHM FOR NLOS MILLIMETER WAVE IN 5G V2X COMMUNICATION

    Deepika Mohan1, G. G. Md. Nawaz Ali2 and Peter Han Joo Chong1 1Auckland University of Technology, New Zealand, 2University of Charleston, USA

    ABSTRACT

    The 5G vehicle-to-everything (V2X) communication for autonomous and semi-autonomous driving utilizes wireless technology, and Millimeter Wave (mmWave) bands are widely implemented in this kind of vehicular network application. The main purpose of this paper is to broadcast messages from the mmWave Base Station (mmBS) to vehicles at LOS (Line-of-sight) and NLOS (Non-LOS). A Relay using Machine Learning (RML) algorithm is formulated to train the mmBS to identify the blockages within its coverage area and to broadcast messages to vehicles at NLOS using LOS nodes as relays. The transmission of information is faster with higher throughput and covers a wider bandwidth, which is reused; therefore, when machine learning is performed within the coverage area of the mmBS, most of the vehicles in NLOS can benefit. A unique relay mechanism combined with machine learning is proposed to communicate with mobile nodes at NLOS.

    KEYWORDS

    5G, Millimeter Wave, Machine Learning, Relay, V2X communication


    For More Details :
    https://aircconline.com/csit/papers/vol10/csit101706.pdf


    Volume Link :
    http://airccse.org/csit/V10N17.html


Parallel Data Extraction Using Word Embeddings

    Pintu Lohar and Andy Way ADAPT Centre, Dublin City University, Ireland

    ABSTRACT

    Building a robust MT system requires a sufficiently large parallel corpus to be available as training data. In this paper, we propose to automatically extract parallel sentences from comparable corpora without using any MT system or even any parallel corpus at all. Instead, we use cross-lingual information retrieval (CLIR), average word embeddings, text similarity and a bilingual dictionary, thus saving a significant amount of time and effort as no MT system is involved in this process. We conduct experiments on two different kinds of data: (i) formal texts from the news domain, and (ii) user-generated content (UGC) from hotel reviews. The automatically extracted sentence pairs are then added to the already available parallel training data, and the extended translation models are built from the concatenated data sets. Finally, we compare the performance of our new extended models against the baseline models built from the available data. The experimental evaluation reveals that our proposed approach is capable of improving the translation outputs for both the formal texts and UGC.
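Scoring candidate sentence pairs with averaged word embeddings can be sketched as below. This is an illustration only, not the authors' system: the toy three-dimensional vectors are invented, and the sketch assumes both languages already share one embedding space (for example via a bilingual dictionary mapping), which the abstract does not specify in detail.

```python
import math


def average_embedding(tokens, embeddings):
    """Average the vectors of all tokens found in the embedding table."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None  # no known words, so no sentence vector
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy shared embedding space (hypothetical values for illustration).
EMBEDDINGS = {
    "hotel": [1.0, 0.0, 0.0],
    "hôtel": [0.9, 0.1, 0.0],
    "clean": [0.0, 1.0, 0.0],
    "propre": [0.1, 0.9, 0.0],
    "train": [0.0, 0.0, 1.0],
}

if __name__ == "__main__":
    source = average_embedding(["hotel", "clean"], EMBEDDINGS)
    for candidate in (["hôtel", "propre"], ["train"]):
        target = average_embedding(candidate, EMBEDDINGS)
        print(candidate, cosine(source, target))
```

Candidate pairs whose similarity exceeds a chosen threshold would be kept as parallel sentences; the threshold itself is a tuning choice not given in the abstract.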

    KEYWORDS

    Machine Translation, parallel data, user-generated content, word embeddings, text similarity, comparable corpora.


    For More Details :
    https://aircconline.com/csit/papers/vol10/csit101521.pdf


    Volume Link :
    http://airccse.org/csit/V10N15.html


DATA PREDICTION OF DEFLECTION BASIN EVOLUTION OF ASPHALT PAVEMENT STRUCTURE BASED ON MULTI-LEVEL NEURAL NETWORK

    Shaosheng Xu1, Jinde Cao2 and Xiangnan Liu2 1Southeast University, Nanjing, China, 2Southeast University, China

    ABSTRACT

    Aiming at reducing the high cost of test data collection of deflection basins in the structural design of asphalt pavement and shortening the long test time of new structures, this paper innovatively designs a structure coding network based on traditional neural networks to map the pavement structure to an abstract space. Therefore, the generalization ability of the neural network structure is improved, and a new multi-level neural network model is formed to predict the evolution data of the deflection basin of the untested structure. By testing the experimental data of RIOHTRACK, the network structure predicts the deflection basin data of untested pavement structure, of which the average prediction error is less than 5%.

    KEYWORDS

    multi-level neural network, Encoding converter, structural of asphalt pavement, deflection basins, RIOHTRACK.


    For More Details :
    https://aircconline.com/csit/papers/vol10/csit101304.pdf


    Volume Link :
    http://airccse.org/csit/V10N15.html


FOREX DATA ANALYSIS USING WEKA

    Luciana Abednego and Cecilia Esti Nugraheni Parahyangan Catholic University, Indonesia

    ABSTRACT

    This paper conducts experiments with forex trading data. The data used is from kaggle.com, a website that provides datasets for machine learning practitioners and data scientists. The goal of the experiments is to learn how to set the many parameters of a forex trading robot. The questions investigated are: How far from the open position must the robot set the stop loss or target profit level? When is the best time to apply a forex robot that works only in a trending market? Which is better: a forex trading robot that waits for a trending market or a robot that works during a sideways market? To answer these questions, data visualizations are plotted in many types of graphs. The data representations are built using Weka, an open-source machine learning software package. The data visualization helps the trader design a strategy for trading the forex market.

    KEYWORDS

    forex trading data, forex data experiments, forex data analysis, forex data visualization, weka


    For More Details :
    https://aircconline.com/csit/papers/vol10/csit101215.pdf


    Volume Link :
    http://airccse.org/csit/V10N12.html


DATA CONFIDENTIALITY IN P2P COMMUNICATION AND SMART CONTRACTS OF BLOCKCHAIN IN INDUSTRY 4.0

    Jan Stodt and Christoph Reich University of Applied Sciences Furtwangen, Furtwangen, Baden-Württemberg, Germany

    ABSTRACT

    Increased collaborative production and dynamic selection of production partners within industry 4.0 manufacturing lead to ever-increasing automatic data exchange between companies. Automatic and unsupervised data exchange creates new attack vectors, which could be used by a malicious insider to leak secrets via a channel otherwise considered secure without anyone noticing. In this paper we reflect upon approaches to prevent the exposure of secret data via blockchain technology, while also providing auditable proof of data exchange. We show that previous blockchain-based privacy protection approaches offer protection, but give control of the data to (potentially untrustworthy) third parties, which can also be considered a privacy violation. The approach taken in this paper does not utilize centralized data storage. It realizes data confidentiality of P2P communication and data processing in smart contracts of blockchains.

    KEYWORDS

    blockchain, privacy protection, P2P communication, smart contracts, industry 4.0.


    For More Details :
    https://aircconline.com/csit/papers/vol10/csit101001.pdf


    Volume Link :
    http://airccse.org/csit/V10N10.html


PREPERFORMANCE TESTING OF A WEBSITE

    Sushma Suryadevara and Shahid Ali AGI Institute, Auckland, New Zealand

    ABSTRACT

    This study examines the importance of performance testing of web applications and of analyzing application bottlenecks. This paper highlights performance testing based on load tests. Everyone wants an application to be very fast; at the same time, reliability of the application also plays an important role, so user satisfaction is the driver for performance testing of a given application. Performance testing determines a few aspects of system performance under a pre-defined workload. In this study the JMeter performance testing tool was used to implement and execute the test cases. The first load test was run with 200 users, which was then increased to 500 users, and the throughput, median, average response time and deviation were calculated.
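The four reported statistics can be computed from raw response times as in the sketch below. It does not reproduce JMeter's own reporting code; the sample values are invented, throughput is assumed to mean completed requests per second of test duration, and the sample standard deviation is assumed for "deviation".

```python
from statistics import mean, median, stdev


def summarize_load_test(response_times_ms, duration_s):
    """Summarize one load-test run.

    response_times_ms: response time of each completed request, in ms.
    duration_s: wall-clock length of the run, in seconds.
    """
    if duration_s <= 0 or len(response_times_ms) < 2:
        raise ValueError("need a positive duration and at least two samples")
    return {
        "throughput_per_s": len(response_times_ms) / duration_s,
        "median_ms": median(response_times_ms),
        "average_ms": mean(response_times_ms),
        "deviation_ms": stdev(response_times_ms),  # sample standard deviation
    }


if __name__ == "__main__":
    # Hypothetical measurements: three requests completed in 1.5 seconds.
    print(summarize_load_test([100.0, 200.0, 300.0], 1.5))
```

Comparing these summaries between a smaller and a larger simulated user count shows how response time and throughput change as load increases.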

    KEYWORDS

    Performance testing, load balancing, threads, throughput, JMeter, load test.


    For More Details :
    https://aircconline.com/csit/papers/vol10/csit100703.pdf


    Volume Link :
    http://airccse.org/csit/V10N07.html


PREDICTION OF CANCER MICROARRAY AND DNA METHYLATION DATA USING NON-NEGATIVE MATRIX FACTORIZATION

    Parth Patel1, Kalpdrum Passi1 and Chakresh Kumar Jain2 1Laurentian University, Canada, 2Jaypee Institute of Information Technology, India

    ABSTRACT

    Over the past few years, there has been a considerable spread of microarray technology in many biological studies, particularly those pertaining to cancers such as leukemia, prostate cancer and colon cancer. The primary bottleneck in properly understanding such datasets lies in their dimensionality, so a substantial reduction in their dimension is deemed necessary for studying them efficiently and effectively. This study suggests different algorithms and approaches for reducing the dimensionality of such microarray datasets. It exploits the matrix-like structure of microarray data and uses a popular technique called Non-Negative Matrix Factorization (NMF) to reduce the dimensionality, primarily in the field of biological data. Classification accuracies are then compared for these algorithms. This technique gives an accuracy of 98%.
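How NMF reduces dimensionality can be sketched with the classic multiplicative update rules for the squared-error objective. This is a generic illustration in plain Python, not the authors' implementation: the abstract does not say which NMF variant was used, and the small matrix below is invented. A non-negative matrix V (samples by features) is approximated as W times H, and each row of W is the reduced representation of one sample.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


if __name__ == "__main__":
    # Invented 4 x 3 non-negative matrix standing in for expression data.
    v = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0], [3.0, 2.0, 5.0]]
    w, h = nmf(v, rank=2)
    print(frobenius_error(v, w, h))
```

The rows of W, with `rank` columns instead of the original feature count, would then be passed to a classifier in place of the raw data.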

    KEYWORDS

    Microarray datasets, Feature Extraction, Feature Selection, Principal Component Analysis, Non-negative Matrix Factorization, Machine learning.


    For More Details :
    https://aircconline.com/csit/papers/vol10/csit100906.pdf


    Volume Link :
    http://airccse.org/csit/V10N09.html


OBJECT DETECTION IN TRAFFIC SCENARIOS - A COMPARISON OF TRADITIONAL AND DEEP LEARNING APPROACHES

    Gopi Krishna Erabati, Nuno Gonçalves and Hélder Araújo Institute of Systems and Robotics, University of Coimbra, Portugal

    ABSTRACT

    In the area of computer vision, research on object detection algorithms has grown rapidly as it is the fundamental step for automation, specifically for self-driving vehicles. This work presents a comparison of traditional and deep learning approaches for the task of object detection in traffic scenarios. A handcrafted feature descriptor, Histogram of Oriented Gradients (HOG), with a linear Support Vector Machine (SVM) classifier is compared with deep learning approaches such as Single Shot Detector (SSD) and You Only Look Once (YOLO), in terms of mean Average Precision (mAP) and processing speed. The SSD algorithm is implemented with different backbone architectures, namely VGG16, MobileNetV2 and ResNeXt50, and the YOLO algorithm with MobileNetV1 and ResNet50, to compare the performance of the approaches. Training is performed on the PASCAL VOC 2007 and 2012 training data, and inference on the PASCAL VOC 2007 test data. We consider five classes relevant for traffic scenarios, namely bicycle, bus, car, motorbike and person, for the calculation of mAP. Both qualitative and quantitative results are presented for comparison. For the task of object detection, the deep learning approaches outperform the traditional approach in both accuracy and speed. This is achieved at the cost of requiring a large amount of data, high computation power and time to train a deep learning approach.

    KEYWORDS

    Object Detection, Deep Learning, SVM, SSD & YOLO.


    For More Details :
    https://aircconline.com/csit/papers/vol10/csit100918.pdf


    Volume Link :
    http://airccse.org/csit/V10N09.html





