Client: Talent Alpha
The field of Digital Twins is quickly gaining importance, especially given the growing capabilities offered by the Big Data industry. Gathering huge amounts of data has become easier and more efficient than ever, and researchers are racing to find new and better ways to use that data to our advantage.
Once we have analysed historical trends and understood where we are now, we start focusing on what lies ahead. This is where predictive analytics, powered by machine learning, becomes invaluable.
How can we use Smart City Analytics in practice?
When our client asked us for guidance in the field of AI, they had already worked on a Digital Twin for one of the largest cities in the world. Together, we focused on adding a predictive capability to the system. We investigated two major use cases connected to public transportation: predicting the number of bus entries and modeling bus arrival times.
First, we performed a careful data diagnostic and defined the possible scope of prediction: its time frame and granularity. We also looked for additional data sources and investigated how the solution could be connected to the existing backend.
Then, taking all of the above into consideration, we developed a tailored machine learning model for each of the specified use cases, aimed at predicting:
- bus passenger movements using travel card data
- the time of bus arrival at a given bus stop using live bus movement data.
During that process, we tested a variety of modeling techniques, ranging from Time Series Analysis (e.g. using Facebook’s Prophet algorithm) to more complex machine learning models (including XGBoost and a multilayer perceptron, MLP).
The end product of the first PoC was a machine learning model predicting the number of entries at a chosen bus stop in the next 15 minutes. The model performed around 18 percentage points better than the baseline model (historical trend).
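To make the 15-minute granularity and the historical-trend baseline more concrete, here is a minimal sketch using only the Python standard library. It assumes the travel card data can be reduced to (stop, timestamp) pairs, and it assumes the baseline is the mean of past counts for the same stop, weekday and time of day. The case study does not specify either, so the function names and the baseline definition are illustrative assumptions, not the project's actual implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def floor_to_bin(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 15-minute interval."""
    return ts - timedelta(
        minutes=ts.minute % 15, seconds=ts.second, microseconds=ts.microsecond
    )


def count_entries(taps):
    """Count card taps per (stop_id, interval start).

    `taps` is an iterable of (stop_id, datetime) pairs (hypothetical schema).
    Intervals without any taps are simply absent from the result.
    """
    counts = defaultdict(int)
    for stop_id, ts in taps:
        counts[(stop_id, floor_to_bin(ts))] += 1
    return dict(counts)


def historical_baseline(counts, stop_id, target_start):
    """Assumed baseline: mean of past counts at the same stop, weekday and time."""
    past = [
        n
        for (sid, start), n in counts.items()
        if sid == stop_id
        and start < target_start
        and start.weekday() == target_start.weekday()
        and start.time() == target_start.time()
    ]
    return mean(past) if past else None


# Made-up taps on two consecutive Mondays, 08:00-08:15.
taps = [
    ("stop_A", datetime(2024, 1, 1, 8, 2)),
    ("stop_A", datetime(2024, 1, 1, 8, 14)),
    ("stop_A", datetime(2024, 1, 8, 8, 5)),
    ("stop_A", datetime(2024, 1, 8, 8, 7)),
    ("stop_A", datetime(2024, 1, 8, 8, 9)),
    ("stop_A", datetime(2024, 1, 8, 8, 11)),
]
counts = count_entries(taps)
baseline = historical_baseline(counts, "stop_A", datetime(2024, 1, 15, 8, 0))
print(baseline)
```

A trained model would be compared against this kind of baseline on held-out intervals; the metric behind the reported improvement is not stated in the case study.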
To start modeling bus arrivals, we had to build a full data pipeline that scraped live data, cleaned it, and applied a custom-built algorithm to connect raw observations into bus journeys. Even with a limited dataset, the accuracy of the model predicting bus arrivals was close to that of the GPS-based system displayed live at the bus stops. Our model achieved slightly higher accuracy when the distance between the bus and the bus stop was greater.
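The custom algorithm itself is not described in this case study. As a rough illustration of what connecting raw observations into journeys can look like, the sketch below groups observations by vehicle and starts a new journey whenever the route changes or the gap between consecutive observations exceeds a threshold. The `Observation` fields, the `build_journeys` name and the 10-minute threshold are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import NamedTuple


class Observation(NamedTuple):
    """One raw record from the live feed (hypothetical schema)."""
    vehicle_id: str
    route_id: str
    timestamp: datetime
    stop_id: str


def build_journeys(observations, max_gap=timedelta(minutes=10)):
    """Split each vehicle's observations into journeys.

    A new journey starts when the route changes or when the time since
    the previous observation exceeds `max_gap` (an assumed heuristic).
    """
    by_vehicle = defaultdict(list)
    for obs in observations:
        by_vehicle[obs.vehicle_id].append(obs)

    journeys = []
    for rows in by_vehicle.values():
        rows.sort(key=lambda r: r.timestamp)
        current = [rows[0]]
        for prev, row in zip(rows, rows[1:]):
            same_route = row.route_id == prev.route_id
            small_gap = row.timestamp - prev.timestamp <= max_gap
            if same_route and small_gap:
                current.append(row)
            else:
                journeys.append(current)
                current = [row]
        journeys.append(current)
    return journeys


# Made-up observations: one vehicle with a long break between two runs.
observations = [
    Observation("v1", "12", datetime(2024, 1, 1, 8, 0), "s1"),
    Observation("v1", "12", datetime(2024, 1, 1, 8, 3), "s2"),
    Observation("v1", "12", datetime(2024, 1, 1, 8, 6), "s3"),
    Observation("v1", "12", datetime(2024, 1, 1, 9, 0), "s1"),
    Observation("v1", "12", datetime(2024, 1, 1, 9, 4), "s2"),
]
journeys = build_journeys(observations)
print([len(j) for j in journeys])
```

Each journey then yields training examples for arrival-time prediction, for instance the observed travel time between a bus's current position and a downstream stop.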
On top of that, we provided additional analyses connected to the two major use cases. We created a set of Jupyter Notebooks with interactive visualisations to be used as a tool for investigating and troubleshooting bottlenecks in the transportation system. Besides modeling the expected number of passenger entries and bus arrival times, we also focused on anomaly prediction using the Isolation Forest method. We discovered that, using historical values of bus entries (in the last 15, 30, 45 and 60 minutes), we can predict with a high degree of certainty whether the next time interval will be an anomaly.
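The lagged inputs mentioned above (entries in the last 15, 30, 45 and 60 minutes) can be built from a chronological series of 15-minute counts as in the sketch below. It covers feature construction only; how anomaly labels were derived and which classifier consumed the features are not described in the case study, so the `lag_features` helper and its interface are illustrative assumptions.

```python
def lag_features(counts, lags=(1, 2, 3, 4)):
    """Build lagged feature rows from a chronological list of 15-minute counts.

    With the default lags, the row for interval t holds the counts observed
    15, 30, 45 and 60 minutes earlier: [counts[t-1], ..., counts[t-4]].
    Returns (rows, targets), where targets[i] is the interval index that
    rows[i] describes.
    """
    max_lag = max(lags)
    rows, targets = [], []
    for t in range(max_lag, len(counts)):
        rows.append([counts[t - k] for k in lags])
        targets.append(t)
    return rows, targets


# Made-up counts for six consecutive 15-minute intervals at one stop.
counts = [10, 12, 11, 13, 40, 12]
rows, targets = lag_features(counts)
print(rows)     # [[13, 11, 12, 10], [40, 13, 11, 12]]
print(targets)  # [4, 5]
```

Each row could then be paired with an anomaly flag for its target interval (for example, one produced by an Isolation Forest fitted on historical counts, which is an assumption here) and passed to a classifier.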
Overall, for these two use cases we leveraged state-of-the-art artificial intelligence techniques to enhance the Digital Twin model of a modern smart city.
Technologies used: Python (Jupyter Notebook, PyCharm), Git, Prophet, XGBoost, MLP
Methodologies used: Data Analysis, Predictive Modeling, Machine Learning, Time Series Analysis, Data Visualization, Outlier Detection, Linear Regression, Isolation Forest