preprint reviews by Indrasen Poola

A Deep Learning Approach to the Prediction of Short-term Traffic Accident Risk.

Honglei Ren, Jingxin Liu, You Song, Yucheng Hu, Jinzhi Lei

Review posted on 21st November 2017



The paper explains the data and results of the study, and everything looks as expected.



A Deep Learning Model for Traffic Flow State Classification Based on Smart Phone Sensor Data.

Wenwen Tu, Liping Fu, Feng Xiao, Guangyuan Pan

Review posted on 21st November 2017



The detailed data and results are satisfactory for the scope and aim of the research paper. A mention of fuzzy logic and fuzzy-logic search algorithms would greatly help the research.



A Comparison Of Machine Learning And Bayesian Modelling For Molecular Serotyping

Richard Newton, Lorenz Wernisch

Review posted on 12th November 2017

Streptococcus pneumoniae is a human pathogen that is a major cause of infant mortality. Identifying the pneumococcal serotype is an important step in monitoring the impact of vaccines used to protect against disease. Genomic microarrays provide an effective method for molecular serotyping. Previously we developed an empirical Bayesian model for the classification of serotypes from a molecular serotyping array. With only few samples available, a model driven approach was the only option. In the meanwhile, several thousand samples have been made available to us, providing an opportunity to investigate serotype classification by machine learning methods, which could complement the Bayesian model. Results: We compare the performance of the original Bayesian model with two machine learning algorithms: Gradient Boosting Machines and Random Forests. We present our results as an example of a generic strategy whereby a preliminary probabilistic model is complemented or replaced by a machine learning classifier once enough data are available. Despite the availability of thousands of serotyping arrays, a problem encountered when applying machine learning methods is the lack of training data containing mixtures of serotypes; due to the large number of possible combinations. Most of the available training data comprises samples with only a single serotype. To overcome the lack of training data we implemented an iterative analysis, creating artificial training data of serotype mixtures by combining raw data from single serotype arrays. Conclusions: With the enhanced training set the machine learning algorithms out perform the original Bayesian model. However, for serotypes currently lacking sufficient training data the best performing implementation was a combination of the results of the Bayesian Model and the Gradient Boosting Machine. 
As well as being an effective method for classifying biological data, machine learning can also be used as an efficient method for revealing subtle biological insights, which we illustrate with an example.
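The abstract's idea of building artificial mixture training data from single-serotype arrays can be sketched roughly as follows. This is only an illustration under stated assumptions: the element-wise maximum used to combine probe intensities, the function names, and the sample values are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def make_mixture(samples):
    """Combine single-serotype arrays into one artificial mixture sample.

    Each sample is (serotype_label, probe_intensities). The element-wise
    maximum is an assumed combination rule, not the paper's actual method.
    """
    labels = frozenset(label for label, _ in samples)
    intensities = [max(values) for values in zip(*(x for _, x in samples))]
    return labels, intensities


def augment(single_arrays, mixture_size=2):
    """Return the single-serotype data plus all artificial mixtures of the given size."""
    data = [(frozenset([label]), list(x)) for label, x in single_arrays]
    for combo in combinations(single_arrays, mixture_size):
        # Only combine arrays that carry different serotypes.
        if len({label for label, _ in combo}) == mixture_size:
            data.append(make_mixture(combo))
    return data


# Hypothetical raw intensities for three probes on three single-serotype arrays.
singles = [
    ("19F", [0.9, 0.1, 0.2]),
    ("6B", [0.1, 0.8, 0.1]),
    ("23F", [0.2, 0.1, 0.7]),
]
training = augment(singles)
```

Each entry of `training` pairs a set of serotype labels with an intensity vector, so a multi-label classifier could be trained on single and mixed samples alike.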


Comparing machine learning and Bayesian modelling: the two have many similarities in how they solve complex data problems.
ML is usually a combination of statistical and other models, whereas Bayesian modelling is a purely statistical approach used for specific problems.



AccurateML: Information-aggregation-based Approximate Processing for Fast and Accurate Machine Learning on MapReduce.

Rui Han Han and Zhentao Wang Fan Zhang

Review posted on 12th November 2017

The growing demands of processing massive datasets
have promoted irresistible trends of running machine learning
applications on MapReduce. When processing large input data,
it is often of greater values to produce fast and accurate enough
approximate results than slow exact results. Existing techniques
produce approximate results by processing parts of the input
data, thus incurring large accuracy losses when using short job
execution times, because all the skipped input data potentially
contributes to result accuracy. We address this limitation by
proposing AccurateML that aggregates information of input data
in each map task to create small aggregated data points. These
aggregated points enable all map tasks producing initial outputs
quickly to save computation times and decrease the outputs’ size
to reduce communication times. Our approach further identifies
the parts of input data most related to result accuracy, thus first
using these parts to improve the produced outputs to minimize
accuracy losses. We evaluated AccurateML using real machine
learning applications and datasets. The results show: (i) it reduces
execution times by 30 times with small accuracy losses compared
to exact results; (ii) when using the same execution times, it
achieves 2.71 times reductions in accuracy losses compared to
existing approximate processing techniques.
Index Terms—MapReduce; machine learning; approximate
processing; result accuracy; information aggregation
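To make the idea of "small aggregated data points" concrete, here is a minimal sketch of aggregation inside a single map task. It assumes a simple grid bucketing with a weighted mean per bucket; the abstract does not say how AccurateML actually aggregates, so the function name, the `bucket_width` parameter, and the data are illustrative only.

```python
from collections import defaultdict


def aggregate_points(points, bucket_width):
    """Replace the points falling in each grid bucket with one mean point.

    Returns a list of (mean_point, count) pairs, so that later steps can
    weight each aggregated point by how many inputs it stands for.
    """
    buckets = defaultdict(list)
    for point in points:
        key = tuple(int(v // bucket_width) for v in point)
        buckets[key].append(point)

    aggregated = []
    for key in sorted(buckets):
        members = buckets[key]
        mean = tuple(sum(col) / len(members) for col in zip(*members))
        aggregated.append((mean, len(members)))
    return aggregated


# Hypothetical input split handled by one map task.
points = [(0.0, 0.0), (1.0, 1.0), (10.0, 10.0), (11.0, 11.0), (12.0, 12.0)]
summary = aggregate_points(points, bucket_width=5.0)
```

Here five input points shrink to two weighted points, which is the kind of reduction that would cut both computation and the size of the map output.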

A fantastic model to use as an ML tool. Combining an optimized predictive ML model with big data (MapR), DL, and Narrative Science would make problem solving simple and quick.



Using data science as a community advocacy tool to promote equity in urban renewal programs: An analysis of Atlanta's Anti-Displacement Tax Fund.

Jeremy Auerbach, Takeria Blunt, Hayley Barton

Review posted on 12th November 2017



The analysis is really good and will help to subsidize housing programs by county. It would be more helpful if it were extended to all states.



The democratization of data science education

Sean Kross, Roger Peng, Brian Caffo, Ira Gooding and Jeffrey Leek

Review posted on 12th November 2017


More open-source, cloud-based training and knowledge-sharing platforms are needed for the democratization of data science education, along with a single place for training and sample data sets. Once raw data is made open to the public, entrepreneurs will do the next job of analysis and prediction with it.


revisit: a Workflow Tool for Data Science.

Norman Matloff, Laurel Beckett, Reed Davis, Paul Thompson

Review posted on 12th November 2017

In recent years there has been widespread concern in the scientific
community over a reproducibility crisis. Among the major
causes that have been identified is statistical: In many scientific
research the statistical analysis (including data preparation) suffers
from a lack of transparency and methodological problems, major
obstructions to reproducibility. The revisit package aims toward
remedying this problem, by generating a “software paper trail” of
the statistical operations applied to a dataset. This record can be
“replayed” for verification purposes, as well as be modified to enable
alternative analyses. The software also issues warnings of certain
kinds of potential errors in statistical methodology, again related
to the reproducibility issue.


R is used for the predictive analytics. It would be better to use an open-source, cloud-based, non-programming platform to make the whole process simpler and easier.



Promoting Saving for College Through Data Science.

Fernando Diaz and Natnaell Mammo

Review posted on 12th November 2017

The cost of attending college has been steadily rising and in 10
years is estimated to reach $140,000 for a 4-year public
university [1]. Recent surveys estimate just over half of US families
are saving for college [2]. State-operated 529 college savings plans
are an effective way for families to plan and save for future
college costs, but only 3% of families currently use them [3].
The Office of the Illinois State Treasurer (Treasurer) administers
two 529 plans to help its residents save for college. In order to
increase the number of families saving for college, the Treasurer
and Civis Analytics used data science techniques to identify the
people most likely to sign up for a college savings plan. In this
paper, we will discuss the use of person matching to join
accountholder data from the Treasurer to the Civis National File,
as well as the use of lookalike modeling to identify new potential
signups. In order to avoid reinforcing existing demographic
imbalances in who saves for college, the lookalike models used
were ensured to be racially and economically balanced. We will
also discuss how these new signup targets were then individually
served digital ads to encourage opening college savings accounts.

With data science and big data combined with ML, DL, and AI, it is easy to predict opportunities, savings, and risks. Civis is the open source ML platform for using ready-made data science in ready-made packaged solutions for problem solving.



Neural data science: accelerating the experiment-analysis-theory cycle in large-scale neuroscience

Liam Paninski, John Cunningham

Review posted on 12th November 2017

Modern large-scale multineuronal recording methodologies, including multielectrode arrays,
calcium imaging, and optogenetic techniques, produce single-neuron resolution data of a
magnitude and precision that were the realm of science fiction twenty years ago. The major
bottlenecks in systems and circuit neuroscience no longer lie in simply collecting data from large
neural populations, but also in understanding this data: developing novel scientific questions,
with corresponding analysis techniques and experimental designs to fully harness these new
capabilities and meaningfully interrogate these questions. Advances in methods for signal
processing, network analysis, dimensionality reduction, and optimal control – developed in
lockstep with advances in experimental neurotechnology – promise major breakthroughs in
multiple fundamental neuroscience problems. These trends are clear in a broad array of
subfields of modern neuroscience; this review focuses on recent advances in methods for
analyzing neural time-series data with single-neuronal precision.

Big data, ML, AI, and other prediction models are readily available to measure data from neuroscience and other medical fields.



Mapping for accessibility: A case study of ethics in data science for social good.

Anissa Tanweer, Margaret Drouhard, Nicholas Bolten

Review posted on 12th November 2017

Ethics in the emerging world of data science are often discussed
through cautionary tales about the dire consequences of missteps
taken by high profile companies or organizations. We take a
different approach by foregrounding the ways that ethics are
implicated in the day-to-day work of data science, focusing on
instances in which data scientists recognize, grapple with, and
conscientiously respond to ethical challenges. This paper presents
a case study of ethical dilemmas that arose in a “data science for
social good” (DSSG) project focused on improving navigation for
people with limited mobility. We describe how this particular
DSSG team responded to those dilemmas, and how those
responses gave rise to still more dilemmas. While the details of the
case discussed here are unique, the ethical dilemmas they
illuminate can commonly be found across many DSSG projects.
These include: the risk of exacerbating disparities; the thorniness
of algorithmic accountability; the evolving opportunities for
mischief presented by new technologies; the subjective and value-laden
interpretations at the heart of any data-intensive project; the
potential for data to amplify or mute particular voices; the
possibility of privacy violations; and the folly of technological
solutionism. Based on our tracing of the team’s responses to these
dilemmas, we distill lessons for an ethical data science practice
that can be more generally applied across DSSG projects.
Specifically, this case experience highlights the importance of: 1)
Setting the scene early on for ethical thinking 2) Recognizing
ethical decision-making as an emergent phenomenon intertwined
with the quotidian work of data science for social good 3)
Approaching ethical thinking as a thoughtful and intentional
balancing of priorities rather than a binary differentiation between
right and wrong.

Ethics, data security, and privacy are needed, with the utmost accountability and advocacy. It is not only about data science; it is also about robots, AI, and automation.
So advocacy is needed for data.



Greater data science at baccalaureate institutions.

Amelia McNamara, Benjamin S. Baumer, Nicholas J. Horton

Review posted on 12th November 2017

Focused on:
1. Data Gathering, Preparation, and Exploration
2. Data Representation and Transformation
3. Computing with Data
4. Data Modeling
5. Data Visualization and Presentation
6. Science about Data Science

The use cases of data science were not covered, and ML models and other recent data science technologies were not explained in greater detail.



Data Science Issues in Understanding Protein-RNA Interactions

Anob M. Chakrabarti, Nejc Haberman, Arne Praznik, Nicholas M. Luscombe, Jernej Ule

Review posted on 12th November 2017

An interplay of experimental and computational methods is required to achieve a comprehensive understanding of protein-RNA interactions. Crosslinking and immunoprecipitation (CLIP) identifies endogenous interactions by sequencing RNA fragments that co-purify with a selected RBP under stringent conditions. Here we focus on approaches for the analysis of resulting data and appraise the methods for peak calling, visualisation, analysis and computational modelling of protein-RNA binding sites. We advocate a combined assessment of cDNA complexity and specificity for data quality control. Moreover, we demonstrate the value of analysing sequence motif enrichment in peaks assigned from CLIP data, and of visualising RNA maps, which examine the positional distribution of peaks around regulated landmarks in transcripts. We use these to assess how variations in CLIP data quality, and in different peak calling methods, affect the insights into regulatory mechanisms. We conclude by discussing future opportunities for the computational analysis of protein-RNA interaction experiments.


Manual approaches consume a lot of time and are complex.
Many cloud and open-source big data, ML, and AI platforms are already built with all the probabilities, permutations, and combinations of protein and disease set methods to make life simpler. One example is the University of Santa Cruz platform.



A Data Science Approach to Understanding Residential Water Contamination in Flint.

Alex Chojnacki, Arya Farahi, Chengyu Dai, Guangsha Shi, Jared Webb, Daniel T. Zhang, Jacob Abernethy, Eric Schwartz

Review posted on 12th November 2017

Data science and ML models for the prediction of water contamination using the various data capturing methods and advanced techniques.


The paper covers training and testing on the sample data and selecting the best-fit model by the optimum ROC curve. However, it is not clear exactly which model was selected for predicting the results.
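As a minimal sketch of what choosing a model by ROC performance can look like, the snippet below computes the area under the ROC curve (AUC) for each candidate and keeps the highest. The model names, labels, and scores are hypothetical and do not come from the paper, and the paper's own selection criterion may differ.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def select_best(labels, model_scores):
    """Return (name, auc) for the candidate model with the highest AUC."""
    aucs = {name: roc_auc(labels, scores) for name, scores in model_scores.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs[best]


# Hypothetical held-out labels (1 = contaminated) and predicted scores.
y_true = [1, 0, 1, 0, 1]
candidates = {
    "model_a": [0.9, 0.2, 0.8, 0.3, 0.7],
    "model_b": [0.6, 0.5, 0.4, 0.7, 0.8],
}
best_name, best_auc = select_best(y_true, candidates)
```

Reporting the selected model's name alongside its AUC, as `select_best` does, would address the question of which model produced the final predictions.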



A Proof of Orthogonal Double Machine Learning with $Z$-Estimators.

Vasilis Syrgkanis

Review posted on 04th October 2017

Orthogonal Double Machine Learning with $Z$-Estimators.


Basically, ML means working with a lot of estimations and assumptions. With double estimation, it is not clear how it will provide accurate results. ML already covers every possible estimation and chance with the data.



Applying Machine Learning Methods to Enhance the Distribution of Social Services in Mexico.

Kris Sankaran, Mobin Javed, Diego Garcia-Olano, Maria Fernanda Alcala-Durand, Adolfo De Unánue, Paul Van Der Boor, Eric Potash, Roberto Sánchez Avalos, Luis Iñaki Alberro Encinas and Rayid Ghani

Review posted on 04th October 2017

Machine Learning Methods to Enhance the Distribution of Social Services in Mexico.


ML can solve the toughest problems in society. The steps are problem identification, data collection, model selection, training and testing, and supervised or unsupervised learning; the final output is the solution to the problem.



A hybrid supervised/unsupervised machine learning approach to solar flare prediction.

Federico Benvenuto, Cristina Campi, Michele Piana, Anna Maria Massone

Review posted on 04th October 2017

hybrid supervised/unsupervised machine learning approach to solar flare prediction.


Definitely a very good and innovative approach towards the solution.



Agent-Based Model Calibration using Machine Learning Surrogates.

Francesco Lamperti, Amir Sani, Andrea Roventini

Review posted on 04th October 2017



Many of these approaches and methods are already featured in most data science and ML software tools, e.g. IBM SPSS. A surrogate ML model is just a backup approach.



A Brief Introduction to Machine Learning for Engineers.

Osvaldo Simeone

Review posted on 04th October 2017

Introduction to Machine Learning Models

The paper covers most of the concepts and aspects with clear explanations and models. It elaborates the ML model formulas, methods, and calculations, with well-defined explanations of training and testing the models and of supervised and unsupervised ML models. Training the data first or training the data first? It covers most of these.


Documenting and Evaluating Data Science Contributions in Academic Promotion in Departments of Statistics and Biostatistics

Lance A. Waller

Review posted on 01st October 2017

Statistics and Biostatistics usage in Data Science


The paper gives little coverage to statistics and biostatistics, including elaboration of statistical algorithms, predictive models, and model evaluations.



Teaching Data Science.

Robert J. Brunner and Edward J. Kim

Review posted on 01st October 2017

Training of Data Science

Training in data science is needed, as it is a career shift for many professionals. Many cloud training platforms are available; for example, Big Data University is an open-source, free training platform on which all big data and data science technologies are presented.


Data, Science and Society.

Claudio Gutierrez

Review posted on 30th September 2017

The paper focuses mainly on the science and society of data. Specifically, it covers latency and capturing of data (sensors, telescopes, Web, etc.); producing data (computers, games, media, LHC, etc.); storing data (memory, storage media, cloud, etc.); analyzing data (statistical techniques, neural networks, (deep) learning, etc.); the social character of data; research and science of data; and the notion of data. It explains data sizes and human scale well.


The latency, capturing, storing, analyzing, notion, and size of data are defined. Many new tools have emerged recently to help with these using proven techniques. Narrative science is another new area, one that actually gives life to data through compelling narratives, contexts, communications, and sense and motion recognition.



A Case for Data Commons: Towards Data Science as a Service.

Robert L. Grossman, Mark Murphy, Allison Heath, Maria Patterson, Walt Wells

Review posted on 30th September 2017

Open Source Data Science

The OSDC concept is recent, and people have already built similar platforms; for example, Big Data University is built on the same concept. Everything is cloud-based, with all services under the same cloud infrastructure.


Artificial Intelligence Approaches To UCAV Autonomy.

Amir Husain and Bruce Porter (University of Texas at Austin)

Review posted on 29th September 2017

An analysis of current approaches Artificial Intelligence
algorithms and techniques to autonomous control is provided
followed by an exploration of how these techniques can be extended and enriched
with AI techniques including Artificial Neural Networks (ANN), Ensembling and
Reinforcement Learning (RL).

Today's ML, DL, and AI methods are easy to understand and incorporate into various industries.



Artificial Intelligence and Data Science in the Automotive Industry.

Martin Hofmann, Thomas Bäck and Florian Neukart

Review posted on 29th September 2017

AI and ML predictive models are used in the automotive industry for new products and developments. Combining IoT, cloud, and big data will make a real difference for future applications.


Need to know what kind of ML predictive modeling was used.



Accurate genetic profiling of anthropometric traits using a big data approach

Oriol Canela-Xandri, Konrad Rawlik, John A. Woolliams, Albert Tenesa

Review posted on 29th September 2017

Big data analytics predictions and findings are helpful in genetics. This helps in the prediction of genome sequence and environmental risk factors to realise the full potential of genomic medicine and genomic prediction of complex traits.


Phenotype prediction with random sampling. Genome sequence and environmental risk factors are used to realise the full potential of genomic medicine, with genotype quality control and data filtering methods.



Yet Another ADNI Machine Learning Paper? Paving The Way Towards Fully-reproducible Research on Classification of Alzheimer's Disease.

Jorge Samper-González, Sabrina Fontanella, Ninon Burgos, Hugo Bertin, Marie-Odile Habert, Stanley Durrleman, Theodoros Evgeniou, Olivier Colliot

Review posted on 27th September 2017

ADNI machine learning methods in Alzheimer's disease prevention. The ADNI machine learning algorithm is specifically designed for prediction in Alzheimer's disease prevention.


The AI, ML, and DL approach and the new DL algorithm design are intended for image recognition with better prediction and accuracy. It is a packaged algorithm specially designed for medical and health care use and disease prevention.
