This post is meant as a short tutorial on how to set up PySpark to access a MySQL database and run a quick machine learning algorithm with it. Both PySpark and MySQL are locally installed onto a computer running Kubuntu 20.04 in this example, so this can be done without any external resources.
Installing MySQL onto a Linux machine is fairly quick thanks to the
apt package manager with
sudo apt install mysql-server. Once it’s installed, you can run
sudo mysql in a terminal to access MySQL from the command line:
For PySpark, just running
pip install pyspark will install…
When trying to look at examples of LSTMs in Keras, I’ve found a lot that focus on using them to predict stock prices in the future. Most are pretty bare-bones though, consisting of little more than a basic LSTM network and a quick plot of the prediction. Though I think the utility of these models is a little questionable, it brought a question into my head: how accurate are the predictions made by a model trained on one stock if it’s predicting on another stock?
The full code can be found here.
Stocks are correlated with each other to varying…
I see a lot of examples of data science projects and their technical underpinnings. I also see plenty of posts giving broad but ultimately vague advice on how to do such a project. But few, if any, seem to actually walk through a thought process for making a project. …
Early-career data scientist/statistician, recently finished a Master’s in Statistics.