Top Data Science Research articles in 2020

Minimum Viable Model Estimates for Machine Learning Projects

John Hawkins Transitional AI Research Group, Sydney, Australia

ABSTRACT

Prioritization of machine learning projects requires estimates of both the potential ROI of the business case and the technical difficulty of building a model with the required characteristics. In this work we present a technique for estimating the minimum required performance characteristics of a predictive model, given a set of information about how it will be used. This technique will result in robust, objective comparisons between potential projects. The resulting estimates will allow data scientists and managers to evaluate whether a proposed machine learning project is likely to succeed before any modelling needs to be done. The technique has been implemented in the open source application MinViME (Minimum Viable Model Estimator), which can be installed via the PyPI Python package management system or downloaded directly from the GitHub repository, available at https://github.com/john-hawkins/MinViME
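To make the idea of a "minimum viable" performance estimate concrete, here is a small break-even sketch for a binary classifier. It is not MinViME's API and not necessarily the paper's method. The cost model is an assumption: a fixed project cost, a fixed cost for every case the model flags, and a fixed benefit for every correctly flagged case. All function and parameter names are hypothetical.

```python
from typing import Optional


def minimum_precision(
    population: int,
    base_rate: float,
    recall: float,
    benefit_per_tp: float,
    cost_per_flag: float,
    fixed_cost: float,
) -> Optional[float]:
    """Smallest precision at which the project breaks even (hypothetical cost model).

    Net value = TP * benefit_per_tp - flagged * cost_per_flag - fixed_cost,
    where TP = population * base_rate * recall and flagged = TP / precision.
    Setting net value >= 0 gives precision >= TP * cost_per_flag / (TP * benefit_per_tp - fixed_cost).
    Returns None if no precision in (0, 1] can break even.
    """
    true_positives = population * base_rate * recall
    margin = true_positives * benefit_per_tp - fixed_cost
    if true_positives <= 0 or margin <= 0:
        return None  # even a perfectly precise model cannot cover the fixed cost
    required = true_positives * cost_per_flag / margin
    return required if required <= 1.0 else None


# 10,000 cases, 10% positive, and a model that finds half of them.
print(minimum_precision(10_000, 0.1, 0.5, benefit_per_tp=100, cost_per_flag=20, fixed_cost=10_000))  # 0.25
```

Under these made-up figures, a model with 50% recall must be right on at least a quarter of the cases it flags before the project pays for itself; sweeping `recall` traces out the minimum viable precision-recall frontier.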

KEYWORDS

Machine Learning, ROI Estimation, Machine Learning Metrics, Cost Sensitive Learning.

Full Paper
https://aircconline.com/csit/papers/vol10/csit101803.pdf

Volume Link :
http://airccse.org/csit/V10N18.html

AN INTELLECTUAL APPROACH TO DESIGN PERSONAL STUDY PLAN VIA MACHINE LEARNING

Shiyuan Zhang¹, Evan Gunnell², Marisabel Chang², Yu Sun² ¹Arnold O. Beckman High School, USA, ²California State Polytechnic University, USA

ABSTRACT

As more students are required to submit standardized test scores to enter higher education, developing vocabulary becomes essential for achieving ideal scores. Each individual has his or her own study style that maximizes efficiency, and there are various approaches to memorization. However, it is difficult to find the specific learning method that best fits a given person. This paper designs a tool to customize personal study plans based on clients' different habits, including difficulty distribution, the difficulty order in which words are learned, and the types of vocabulary. We applied our application to educational software and conducted a quantitative evaluation of the approach via three types of machine learning models. By calculating cross-validation scores, we evaluated the accuracy of each model and identified the model that returns the most accurate predictions. The results reveal that linear regression has the highest cross-validation score, and it can provide the most efficient personal study plans.
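The abstract ranks models by cross-validation score. Below is a minimal sketch of k-fold cross-validation, comparing a one-feature least-squares linear regression with a predict-the-mean baseline and using R² as the score. The single numeric feature, the contiguous unshuffled folds and the toy data are assumptions for illustration; the paper's actual features and its other models are not reproduced. In practice, scikit-learn's `sklearn.model_selection.cross_val_score` runs the same loop for arbitrary estimators.

```python
from statistics import mean
from typing import Callable, List

Model = Callable[[float], float]
Fitter = Callable[[List[float], List[float]], Model]


def fit_linear(xs: List[float], ys: List[float]) -> Model:
    """Ordinary least squares with one feature: y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_mean(xs: List[float], ys: List[float]) -> Model:
    """Baseline that ignores the feature and always predicts the training mean."""
    m = mean(ys)
    return lambda x: m


def r2_score(ys: List[float], preds: List[float]) -> float:
    my = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def cross_val_score(fit: Fitter, xs: List[float], ys: List[float], k: int = 5) -> float:
    """Mean R^2 over k contiguous folds; each fold is scored by a model trained on the rest."""
    n = len(xs)
    size = n // k
    scores = []
    for i in range(k):
        lo = i * size
        hi = n if i == k - 1 else lo + size
        model = fit(xs[:lo] + xs[hi:], ys[:lo] + ys[hi:])
        scores.append(r2_score(ys[lo:hi], [model(x) for x in xs[lo:hi]]))
    return mean(scores)


# Hypothetical data: one numeric study-habit feature (x) and an outcome score (y).
xs = [float(x) for x in range(20)]
ys = [2.0 * x + 1.0 + ((int(x) * 7) % 5 - 2) * 0.5 for x in xs]
print("linear regression:", cross_val_score(fit_linear, xs, ys))
print("mean baseline:    ", cross_val_score(fit_mean, xs, ys))
```

The model with the higher mean held-out score is preferred, which is the selection criterion the abstract describes; real data would normally be shuffled before folding.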

KEYWORDS

Machine learning, study plan, vocabulary

For More Details :
https://aircconline.com/csit/papers/vol10/csit101602.pdf

Volume Link :
http://airccse.org/csit/V10N18.html

MACHINE LEARNING ALGORITHM FOR NLOS MILLIMETER WAVE IN 5G V2X COMMUNICATION

Deepika Mohan¹, G. G. Md. Nawaz Ali² and Peter Han Joo Chong¹ ¹Auckland University of Technology, New Zealand, ²University of Charleston, USA

ABSTRACT

5G vehicle-to-everything (V2X) communication for autonomous and semi-autonomous driving relies on wireless technology, and Millimeter Wave (mmWave) bands are widely used in this kind of vehicular network application. The main purpose of this paper is to broadcast messages from the mmWave Base Station (mmBS) to vehicles in LOS (Line-of-Sight) and NLOS (Non-LOS) conditions. A Relay using Machine Learning (RML) algorithm is formulated to train the mmBS to identify the blockages within its coverage area and to broadcast messages to vehicles in NLOS, using LOS nodes as relays. Information is transmitted faster, with higher throughput, over a wider bandwidth that is reused; therefore, when machine learning is performed within the coverage area of the mmBS, most vehicles in NLOS can benefit. A unique relay mechanism combined with machine learning is proposed to communicate with mobile nodes in NLOS.
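The relay idea can be illustrated geometrically. The sketch below is not the paper's RML algorithm: it skips the learning step entirely and assumes the base station already knows where the blockages are, modelled here as circles on a 2D plane. A vehicle counts as LOS when the straight segment from the mmBS to it clears every blockage; each NLOS vehicle is then assigned the nearest LOS vehicle that itself has a clear path to it. The coordinates, the circle model and all function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Blocker = Tuple[Point, float]  # (centre, radius) of a circular obstruction


def segment_point_distance(a: Point, b: Point, p: Point) -> float:
    """Shortest distance from point p to the line segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.dist(a, p)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def has_los(a: Point, b: Point, blockers: List[Blocker]) -> bool:
    """True if the segment a-b does not pass through any blocker circle."""
    return all(segment_point_distance(a, b, centre) > radius for centre, radius in blockers)


def assign_relays(bs: Point, vehicles: Dict[str, Point], blockers: List[Blocker]) -> Dict[str, Optional[str]]:
    """Map each vehicle to 'BS' (direct link), a relay vehicle id, or None (unreachable)."""
    los = {v for v, pos in vehicles.items() if has_los(bs, pos, blockers)}
    plan: Dict[str, Optional[str]] = {}
    for v, pos in vehicles.items():
        if v in los:
            plan[v] = "BS"
            continue
        candidates = [
            (math.dist(pos, vehicles[r]), r)
            for r in los
            if has_los(vehicles[r], pos, blockers)
        ]
        plan[v] = min(candidates)[1] if candidates else None
    return plan


blockers = [((5.0, 0.0), 1.0)]  # e.g. a large vehicle between the mmBS and vehicle A
vehicles = {"A": (10.0, 0.0), "B": (10.0, 5.0), "C": (0.0, 10.0)}
print(assign_relays((0.0, 0.0), vehicles, blockers))  # {'A': 'B', 'B': 'BS', 'C': 'BS'}
```

Choosing the nearest eligible relay is only one plausible policy; a real system would also weigh link quality and relay load.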

KEYWORDS

5G, Millimeter Wave, Machine Learning, Relay, V2X communication

For More Details :
https://aircconline.com/csit/papers/vol10/csit101706.pdf

Volume Link :
http://airccse.org/csit/V10N17.html

Parallel Data Extraction Using Word Embeddings

Pintu Lohar and Andy Way ADAPT Centre, Dublin City University, Ireland

ABSTRACT

Building a robust MT system requires a sufficiently large parallel corpus to be available as training data. In this paper, we propose to automatically extract parallel sentences from comparable corpora without using any MT system or even any parallel corpus at all. Instead, we use cross-lingual information retrieval (CLIR), average word embeddings, text similarity and a bilingual dictionary, thus saving a significant amount of time and effort, as no MT system is involved in this process. We conduct experiments on two different kinds of data: (i) formal texts from the news domain, and (ii) user-generated content (UGC) from hotel reviews. The automatically extracted sentence pairs are then added to the already available parallel training data, and the extended translation models are built from the concatenated data sets. Finally, we compare the performance of our new extended models against the baseline models built from the available data. The experimental evaluation reveals that our proposed approach is capable of improving the translation outputs for both the formal texts and UGC.
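The average-word-embedding and text-similarity steps can be sketched as follows: each sentence is represented by the mean of its word vectors, and a source sentence is paired with its most similar target sentence when the cosine similarity clears a threshold. This assumes both languages' vectors already live in one shared cross-lingual space; the toy 2D vectors, the English-French words and the 0.8 threshold are invented for illustration, and the paper's CLIR and bilingual-dictionary steps are not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def average_embedding(tokens: Sequence[str], emb: Dict[str, Vector], dim: int) -> Vector:
    """Mean of the vectors of in-vocabulary tokens; zero vector if none are known."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u: Vector, v: Vector) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def extract_pairs(
    src_sents: Sequence[str],
    tgt_sents: Sequence[str],
    emb: Dict[str, Vector],
    dim: int,
    threshold: float = 0.8,
) -> List[Tuple[str, str, float]]:
    """For each source sentence, keep its most similar target sentence if the score clears the threshold."""
    tgt_vecs = [average_embedding(s.split(), emb, dim) for s in tgt_sents]
    pairs = []
    for s in src_sents:
        sv = average_embedding(s.split(), emb, dim)
        score, j = max((cosine(sv, tv), j) for j, tv in enumerate(tgt_vecs))
        if score >= threshold:
            pairs.append((s, tgt_sents[j], score))
    return pairs


# Hypothetical cross-lingual vectors (English and French words in one space).
emb = {
    "hotel": [1.0, 0.0], "hôtel": [0.98, 0.05],
    "clean": [0.0, 1.0], "propre": [0.05, 0.97],
    "noisy": [-1.0, 0.2], "bruyant": [-0.95, 0.25],
}
for src, tgt, score in extract_pairs(["hotel clean", "noisy hotel"], ["hôtel propre", "hôtel bruyant"], emb, 2):
    print(f"{src!r} <-> {tgt!r} ({score:.3f})")
```

Averaging ignores word order, which is why the paper combines it with other signals; the threshold trades the number of extracted pairs against their precision.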

KEYWORDS

Machine Translation, parallel data, user-generated content, word embeddings, text similarity, comparable corpora.

For More Details :
https://aircconline.com/csit/papers/vol10/csit101521.pdf

Volume Link :
http://airccse.org/csit/V10N15.html

DATA PREDICTION OF DEFLECTION BASIN EVOLUTION OF ASPHALT PAVEMENT STRUCTURE BASED ON MULTI-LEVEL NEURAL NETWORK

Shaosheng Xu¹, Jinde Cao² and Xiangnan Liu² ¹Southeast University, Nanjing, China, ²Southeast University, China

ABSTRACT

Aiming to reduce the high cost of collecting deflection basin test data in the structural design of asphalt pavement, and to shorten the long test time for new structures, this paper designs a structure coding network, based on traditional neural networks, that maps the pavement structure to an abstract space. This improves the generalization ability of the neural network structure, and the result is a new multi-level neural network model that predicts the evolution data of the deflection basin for untested structures. Tested on experimental data from RIOHTRACK, the network predicts the deflection basin data of untested pavement structures with an average prediction error of less than 5%.
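The abstract reports an average prediction error below 5% without defining the metric here. One common reading is the mean absolute percentage error (MAPE); the sketch below computes it, with the caveat that the paper may define its error differently. The readings are made-up numbers.

```python
from typing import Sequence


def mean_absolute_percentage_error(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """MAPE in percent: the mean of |predicted - measured| / |measured|, times 100."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    if any(m == 0 for m in measured):
        raise ValueError("MAPE is undefined when a measured value is zero")
    return 100.0 * sum(abs(p - m) / abs(m) for m, p in zip(measured, predicted)) / len(measured)


# Hypothetical deflection readings at successive sensor offsets, and model predictions.
measured = [100.0, 200.0]
predicted = [95.0, 210.0]
print(mean_absolute_percentage_error(measured, predicted))
```

Because each error is divided by its own measured value, small deflections far from the load weigh as much as large ones near it, which is one reason the exact metric definition matters when comparing reported error figures.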

KEYWORDS

multi-level neural network, Encoding converter, structure of asphalt pavement, deflection basins, RIOHTRACK.

For More Details :
https://aircconline.com/csit/papers/vol10/csit101304.pdf

Volume Link :
http://airccse.org/csit/V10N15.html

FOREX DATA ANALYSIS USING WEKA

Luciana Abednego and Cecilia Esti Nugraheni Parahyangan Catholic University, Indonesia

ABSTRACT

This paper reports experiments with forex trading data. The data come from kaggle.com, a website that provides datasets for machine learning practitioners and data scientists. The goal of the experiments is to learn how to set the many parameters of a forex trading robot. The questions investigated include: How far from the open position should the robot set the stop loss or target profit level? When is the best time to use a forex robot that works only in a trending market? Which is better: a forex trading robot that waits for a trending market, or a robot that works during a sideways market? To answer these questions, the data are visualized with several types of graphs. The visualizations are built using Weka, an open-source machine learning software package. The data visualization helps traders design strategies for trading the forex market.
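The paper answers the stop-loss and target-profit question with Weka visualizations. As a complementary, hypothetical check, the sketch below replays price paths for a long position and reports which level each path touches first, so that different distances can be compared. It assumes long-only trades, prices checked in sequence (for example, successive closes), and no spread or slippage; the function names and prices are made up.

```python
from collections import Counter
from typing import Iterable, Sequence


def first_exit(prices: Iterable[float], entry: float, stop_distance: float, target_distance: float) -> str:
    """Return 'stop', 'target' or 'open' for a long position opened at `entry`."""
    stop_level = entry - stop_distance
    target_level = entry + target_distance
    for price in prices:
        if price <= stop_level:
            return "stop"
        if price >= target_level:
            return "target"
    return "open"  # neither level was reached within the path


def tally(paths: Sequence[Sequence[float]], stop_distance: float, target_distance: float) -> Counter:
    """Outcome counts, using each path's first price as the entry price."""
    return Counter(first_exit(p[1:], p[0], stop_distance, target_distance) for p in paths)


# Hypothetical EUR/USD-style price paths.
paths = [
    [1.1000, 1.0990, 1.1030],
    [1.1000, 1.0970, 1.1040],
    [1.1000, 1.1010, 1.1005],
]
for stop, target in [(0.0020, 0.0025), (0.0040, 0.0025)]:
    print(f"stop={stop:.4f} target={target:.4f}", dict(tally(paths, stop, target)))
```

Widening the stop here turns the second path from a loss into a win, which is the kind of trade-off the paper's visualizations are meant to expose across many historical trades.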

KEYWORDS

forex trading data, forex data experiments, forex data analysis, forex data visualization, weka

For More Details :
https://aircconline.com/csit/papers/vol10/csit101215.pdf

Volume Link :
http://airccse.org/csit/V10N12.html

DATA CONFIDENTIALITY IN P2P COMMUNICATION AND SMART CONTRACTS OF BLOCKCHAIN IN INDUSTRY 4.0

Jan Stodt and Christoph Reich University of Applied Sciences Furtwangen, Furtwangen, Baden-Württemberg, Germany

ABSTRACT

Increased collaborative production and dynamic selection of production partners within Industry 4.0 manufacturing lead to ever-increasing automatic data exchange between companies. Automatic and unsupervised data exchange creates new attack vectors, which a malicious insider could use to leak secrets, without anyone noticing, via a channel otherwise considered secure. In this paper we reflect on approaches that use blockchain technology to prevent the exposure of secret data while also providing auditable proof of data exchange. We show that previous blockchain-based privacy protection approaches offer protection, but give control of the data to (potentially untrustworthy) third parties, which can itself be considered a privacy violation. The approach taken in this paper does not use centralized data storage. It realizes data confidentiality for P2P communication and for data processing in blockchain smart contracts.
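A common building block for auditable exchange that does not hand the data itself to a third party is a salted hash commitment: only a digest is written to the shared ledger, while the payload travels peer to peer. The sketch below shows that idea with Python's standard `hashlib`; it is an illustrative assumption, not the mechanism the paper implements, and the in-memory list stands in for a real blockchain.

```python
import hashlib
import hmac
import secrets
from typing import List, Tuple


def commit(payload: bytes) -> Tuple[str, bytes]:
    """Return (digest, salt). Only the digest is published; the salt stays with the peers."""
    salt = secrets.token_bytes(16)
    digest = hashlib.sha256(salt + payload).hexdigest()
    return digest, salt


def verify(digest: str, salt: bytes, payload: bytes) -> bool:
    """Check, in constant time, that a payload matches a published digest."""
    candidate = hashlib.sha256(salt + payload).hexdigest()
    return hmac.compare_digest(candidate, digest)


ledger: List[str] = []  # stand-in for an append-only on-chain record

# The sender commits to the exchanged document and publishes only the digest.
document = b"batch 42: tolerance spec v3"
digest, salt = commit(document)
ledger.append(digest)

# Later, an auditor given (document, salt) by the parties checks it against the ledger.
print(verify(ledger[0], salt, document))
print(verify(ledger[0], salt, b"batch 42: tolerance spec v4"))
```

The salt matters: without it, anyone could confirm a guessed low-entropy payload (for example, one of a few known part numbers) by hashing it and comparing the result with the public digest.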

KEYWORDS

blockchain, privacy protection, P2P communication, smart contracts, industry 4.0.

For More Details :
https://aircconline.com/csit/papers/vol10/csit101001.pdf

Volume Link :
http://airccse.org/csit/V10N10.html

PREPERFORMANCE TESTING OF A WEBSITE

Sushma Suryadevara and Shahid Ali AGI Institute, Auckland, New Zealand

ABSTRACT

This study examines the importance of performance testing of web applications and of analyzing application bottlenecks. This paper focuses on performance testing based on load tests. Everyone wants an application to be fast, but its reliability also plays an important role, and user satisfaction is the driving force behind performance testing of a given application. Performance testing determines several aspects of system performance under a predefined workload. In this study, the JMeter performance testing tool was used to implement and execute the test cases. The first load test was run with 200 users, which was then increased to 500 users, and the throughput, median and average response time, and deviation were calculated for each.
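The load-test metrics named above can be computed from raw response times. The sketch below assumes a list of per-request response times in milliseconds and a known test duration; it treats throughput as completed requests per second and "deviation" as the population standard deviation of response times. These definitions are assumptions and may differ in detail from how JMeter's listeners report them; the sample numbers are invented.

```python
import statistics
from typing import Dict, Sequence


def summarize_load_test(response_times_ms: Sequence[float], duration_s: float) -> Dict[str, float]:
    """Throughput, median, average and deviation for one load-test run."""
    if not response_times_ms or duration_s <= 0:
        raise ValueError("need at least one sample and a positive duration")
    return {
        "samples": len(response_times_ms),
        "throughput_per_s": len(response_times_ms) / duration_s,
        "median_ms": statistics.median(response_times_ms),
        "average_ms": statistics.mean(response_times_ms),
        "deviation_ms": statistics.pstdev(response_times_ms),
    }


# Hypothetical runs at two load levels.
runs = {
    "200 users": ([120, 135, 150, 160, 410], 2.5),
    "500 users": ([180, 240, 260, 300, 900], 2.5),
}
for label, (times, duration) in runs.items():
    print(label, summarize_load_test(times, duration))
```

Reporting the median alongside the average is useful because a few slow requests (like the 900 ms outlier above) inflate the average and the deviation much more than the median.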

KEYWORDS

Performance testing, load balancing, threads, throughput, JMeter, load test.

For More Details :
https://aircconline.com/csit/papers/vol10/csit100703.pdf

Volume Link :
http://airccse.org/csit/V10N07.html

PREDICTION OF CANCER MICROARRAY AND DNA METHYLATION DATA USING NON-NEGATIVE MATRIX FACTORIZATION

Parth Patel¹, Kalpdrum Passi¹ and Chakresh Kumar Jain² ¹Laurentian University, Canada ²Jaypee Institute of Information Technology, India

ABSTRACT

Over the past few years, microarray technology has spread considerably across many areas of biology, particularly in studies of cancers such as leukemia, prostate cancer and colon cancer. The primary bottleneck in properly understanding such datasets is their dimensionality, so studying them efficiently and effectively requires reducing their dimension substantially. This study suggests different algorithms and approaches for reducing the dimensionality of such microarray datasets. It exploits the matrix-like structure of microarray data and uses a popular technique, Non-Negative Matrix Factorization (NMF), to reduce dimensionality, primarily in the field of biological data. Classification accuracies are then compared across these algorithms. The technique achieves an accuracy of 98%.
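The abstract does not spell out how the NMF is computed. A standard approach is the Lee and Seung multiplicative update rule for the squared Frobenius-norm objective, sketched below in pure Python. It assumes the data matrix has one sample per row and one gene (or probe) per column, so that each row of W is the reduced, rank-k representation of a sample that a classifier would consume. The update rule, random initialization, iteration count and toy matrix are assumptions, not details taken from the paper.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def reconstruction_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def nmf(v: Matrix, rank: int, iterations: int = 200, seed: int = 0, eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factor non-negative V (n x m) into non-negative W (n x rank) and H (rank x m), V ~ WH."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise; eps avoids division by zero.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[x * p / (q + eps) for x, p, q in zip(rh, rn, rd)] for rh, rn, rd in zip(h, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[x * p / (q + eps) for x, p, q in zip(rw, rn, rd)] for rw, rn, rd in zip(w, num, den)]
    return w, h


# Hypothetical expression matrix: 4 samples x 3 genes, reduced to 2 features per sample.
expression = [[1.0, 0.0, 2.0], [2.0, 0.0, 4.0], [0.0, 3.0, 1.0], [0.0, 6.0, 2.0]]
features, basis = nmf(expression, rank=2, iterations=300)
print(round(reconstruction_error(expression, features, basis), 4))
```

Because every factor in each update is non-negative, W and H stay non-negative throughout, which is what makes the learned features interpretable as additive combinations of gene profiles. For real microarray sizes, a vectorized implementation such as scikit-learn's `NMF` would be used instead.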

KEYWORDS

Microarray datasets, Feature Extraction, Feature Selection, Principal Component Analysis, Non-negative Matrix Factorization, Machine learning.

For More Details :
https://aircconline.com/csit/papers/vol10/csit100906.pdf

Volume Link :
http://airccse.org/csit/V10N09.html

OBJECT DETECTION IN TRAFFIC SCENARIOS - A COMPARISON OF TRADITIONAL AND DEEP LEARNING APPROACHES

Gopi Krishna Erabati, Nuno Gonçalves and Hélder Araújo Institute of Systems and Robotics, University of Coimbra, Portugal

ABSTRACT

In the area of computer vision, research on object detection algorithms has grown rapidly, as object detection is a fundamental step toward automation, specifically for self-driving vehicles. This work presents a comparison of traditional and deep learning approaches for the task of object detection in traffic scenarios. A handcrafted feature descriptor, Histogram of Oriented Gradients (HOG), with a linear Support Vector Machine (SVM) classifier is compared with deep learning approaches such as Single Shot Detector (SSD) and You Only Look Once (YOLO), in terms of mean Average Precision (mAP) and processing speed. To compare the performance of the approaches, SSD is implemented with different backbone architectures (VGG16, MobileNetV2 and ResNeXt50), and YOLO with MobileNetV1 and ResNet50. Training is performed on the PASCAL VOC 2007 and 2012 training data, and inference on the PASCAL VOC 2007 test data. We consider five classes relevant to traffic scenarios, namely bicycle, bus, car, motorbike and person, for the calculation of mAP. Both qualitative and quantitative results are presented for comparison. For the task of object detection, the deep learning approaches outperform the traditional approach in both accuracy and speed. This comes at the cost of requiring a large amount of data, high computational power and long training time.
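The mAP comparison averages a per-class average precision (AP) over the five traffic classes. The sketch below computes AP for one class: detections are sorted by confidence, each is matched greedily to an unused ground-truth box with IoU of at least 0.5, and the area under the interpolated precision-recall curve is summed. It uses the all-point interpolation of later PASCAL VOC releases rather than VOC 2007's original 11-point variant, ignores VOC's "difficult" flags, and treats box coordinates as continuous (x1, y1, x2, y2); which variant the paper used is not stated here, and the example data are invented.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: Sequence[Tuple[str, float, Box]],
    ground_truth: Dict[str, List[Box]],
    iou_threshold: float = 0.5,
) -> float:
    """AP for one class; detections are (image_id, confidence, box)."""
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    precisions: List[float] = []
    recalls: List[float] = []
    tp = fp = 0
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_threshold and not matched[img][best_j]:
            matched[img][best_j] = True
            tp += 1
        else:
            fp += 1  # no overlap, too little overlap, or a duplicate detection
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)
    # All-point interpolation: make precision non-increasing in recall, then sum over recall steps.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


# Hypothetical "car" annotations and detections on two images.
ground_truth = {"img1": [(0.0, 0.0, 10.0, 10.0)], "img2": [(0.0, 0.0, 10.0, 10.0)]}
detections = [
    ("img1", 0.9, (0.0, 0.0, 10.0, 10.0)),    # correct
    ("img2", 0.8, (50.0, 50.0, 60.0, 60.0)),  # false positive
    ("img2", 0.7, (1.0, 1.0, 10.0, 10.0)),    # correct, IoU 0.81
]
print(average_precision(detections, ground_truth))
```

mAP is then the mean of these per-class AP values over bicycle, bus, car, motorbike and person.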

KEYWORDS

Object Detection, Deep Learning, SVM, SSD & YOLO.

For More Details :
https://aircconline.com/csit/papers/vol10/csit100918.pdf

Volume Link :
http://airccse.org/csit/V10N09.html
