Modernize IEX_Trading #398

Open: wants to merge 11 commits into main
11 changes: 11 additions & 0 deletions doc/conf.py
@@ -258,19 +258,30 @@ def root_and_gallery_index_redirects():
'walker_lake/walker_Lake': 'gallery/walker_lake/walker_lake',
}

renamed_project_files_links = {
'gallery/iex_trading/IEX_trading': 'gallery/iex_trading/1_IEX_trading',
'gallery/iex_trading/IEX_stocks': 'gallery/iex_trading/2_IEX_stocks',
}

if SINGLE_PROJECT:
project_direct_links = {
k: v
for k, v in project_direct_links.items()
if k.split('/')[0] == SINGLE_PROJECT
}
renamed_project_files_links = {
k: v
for k, v in renamed_project_files_links.items()
if k.split('/')[1] == SINGLE_PROJECT
}

rediraffe_redirects = {
**top_level_redirects,
**project_direct_links,
# Links from e.g. /attractors to /gallery/attractors/index.html
# And from e.g. /gallery/boids/index.html to /gallery/boids/boids.html
**root_and_gallery_index_redirects(),
**renamed_project_files_links,
}

html_context.update({
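The `SINGLE_PROJECT` filtering in the `doc/conf.py` hunk keeps only the redirects that belong to one project. The two dictionaries test different path segments because the renamed-file keys start with `gallery/`. A minimal standalone sketch of that logic follows; the `filter_links` helper and the second `project_direct_links` entry are hypothetical, invented for illustration and not part of this PR:

```python
def filter_links(links, project, segment):
    """Keep only entries whose key has `project` at path position `segment`.

    Hypothetical helper mirroring the two dict comprehensions in doc/conf.py.
    """
    return {k: v for k, v in links.items() if k.split('/')[segment] == project}


# The first entry comes from the diff; the second is made up for the example.
project_direct_links = {
    'walker_lake/walker_Lake': 'gallery/walker_lake/walker_lake',
    'iex_trading/IEX_trading': 'gallery/iex_trading/1_IEX_trading',
}
# Same entries as the renamed_project_files_links added in the diff.
renamed_project_files_links = {
    'gallery/iex_trading/IEX_trading': 'gallery/iex_trading/1_IEX_trading',
    'gallery/iex_trading/IEX_stocks': 'gallery/iex_trading/2_IEX_stocks',
}

SINGLE_PROJECT = 'iex_trading'

# Project-level keys start with the project name (segment 0) ...
direct = filter_links(project_direct_links, SINGLE_PROJECT, 0)
# ... while renamed-file keys start with 'gallery/', so the project is segment 1.
renamed = filter_links(renamed_project_files_links, SINGLE_PROJECT, 1)

print(direct)
print(renamed)
```

Using index 0 on the renamed-file keys would compare against `'gallery'` and silently drop every entry, which is why the second comprehension in the diff uses `k.split('/')[1]`.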
293 changes: 293 additions & 0 deletions iex_trading/1_IEX_trading.ipynb
@@ -0,0 +1,293 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Visualize all trades"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"\n",
"The Investors Exchange, [IEX](https://iextrading.com/), is a transparent stock exchange that discourages high-frequency trading and makes historical trading data [publicly available](https://iextrading.com/trading/market-data/#hist-download). The data is offered as daily [pcap](https://en.wikipedia.org/wiki/Pcap) files in which each packet corresponds to a stock trade.\n",
"\n",
"Even in this specialized pcap format, a single day of records can exceed a gigabyte. In this notebook, we will develop a dashboard that lets us explore every trade that happened in a day, including the associated metadata. To visualize all this data at once, both rapidly and interactively, we will use [Datashader](https://datashader.org/) via [HoloViews' API](http://holoviews.org/user_guide/Large_Data.html).\n",
"\n",
"The [IEX stock data](https://iextrading.com/trading/market-data/#hist-download) is published as pcap files in two formats, [TOPS](https://iextrading.com/docs/IEX%20TOPS%20Specification.pdf) and [DEEP](https://iextrading.com/docs/IEX%20DEEP%20Specification.pdf). These formats are complex enough that parsing the trades with standard packet-loading tools is non-trivial. For this reason, the trades for Monday, October 21, 2019 are supplied as a CSV file generated from the original pcap file using the [IEXTools](https://pypi.org/project/IEXTools/) library.\n",
"\n",
"![image](./thumbnails/iex_trading_thumbnail.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loading the data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import datetime\n",
"import pandas as pd\n",
"\n",
"df = pd.read_csv('./data/IEX_2019-10-21.csv')\n",
"print(f'Dataframe loaded containing {len(df):,} events')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now look at the head of this DataFrame to see its structure:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each row above corresponds to a stock trade: `price` is the share price, `size` is the number of shares traded, and `symbol` identifies the stock. Every trade also has a `timestamp`, given in nanoseconds since the Unix epoch (UTC).\n",
"\n",
"Note that multiple trades can occur on the same timestamp.\n",
"\n",
"## Visualizing trades with `Spikes`\n",
"\n",
"We can now load HoloViews with the Bokeh plotting extension to start visualizing some of this data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import holoviews as hv\n",
"from holoviews.operation.datashader import spikes_aggregate\n",
"\n",
"hv.extension('bokeh')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to visualize events that occur over time is to use the [Spikes](http://holoviews.org/reference/elements/bokeh/Spikes.html#bokeh-gallery-spikes) element. Here we look at the first hundred trades in this DataFrame. Note that we temporarily convert the timestamp from nanoseconds to microseconds to avoid a warning emitted by Bokeh when it deals with very large numbers (*BokehUserWarning: out of range integer may result in loss of precision*)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hv.config.image_rtol = 10e-3 # Fixes datetime issue at high zoom level\n",
"\n",
"hv.Spikes(\n",
" df.head(100).assign(timestamp=lambda d: d.timestamp / 1000),\n",
" ['timestamp'], ['symbol', 'size', 'price']\n",
").opts(xrotation=90, tools=['hover'], spike_length=1, position=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here every line corresponds to a trade, and its position along the x-axis indicates the time at which that trade occurred (the `timestamp` in microseconds). If you hover over the spikes above, you can see the timestamps of the trades under the cursor along with their symbols, sizes and prices.\n",
"\n",
"While many domains use integers for their time axis (e.g., CPU cycles for processor events), in this case we would like to recover the timestamps as dates and times.\n",
"\n",
"We will do this in two steps:\n",
"1. We convert the integers to `datetime64[ns]` values.\n",
"2. We subtract 4 hours to go from UTC to the local time at the exchange (located in New Jersey), which was Eastern Daylight Time (UTC-4) on this date:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df['timestamp'] = df['timestamp'].astype('datetime64[ns]')\n",
"df['timestamp'] -= datetime.timedelta(hours=4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By applying the `spikes_aggregate` operation, which uses Datashader to aggregate the spikes efficiently, we can visualize all 1.2 million available trades:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"spikes = hv.Spikes(df, ['timestamp'], ['symbol', 'size', 'price'])\n",
"rasterized = spikes_aggregate(\n",
" spikes,\n",
" aggregator='count', spike_length=1\n",
").opts(\n",
" width=600, colorbar=True, cmap='blues', yaxis=None, xrotation=90,\n",
" default_tools=['xwheel_zoom', 'xpan', 'xbox_zoom'],\n",
")\n",
"rasterized"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using the `count` aggregator, we can see the density of trades over time, colormapped so that darker blues indicate a higher trade density and the lightest colors indicate few or no trades. In the [next notebook](./2_IEX_stocks.ipynb) we will aggregate over the `'size'` column to visualize a more useful metric, namely the trade volume.\n",
"\n",
"We pass `spike_length=1` to give all the spikes a fixed height regardless of any value dimensions. The `Spikes` element also supports variable heights according to a value dimension, as shown [on the reference page](https://holoviews.org/reference/elements/bokeh/Spikes.html#bokeh-gallery-spikes).\n",
"\n",
"Note that the above plot is **interactive**: when you zoom in, HoloViews and Datashader recompute and update the visualization accordingly. When zoomed out, you will notice that the number of trades rises at the end of the day: these are all the trades made at the last minute before the exchange closes for the day!\n",
"\n",
"## Exploring the IEX trade metadata interactively\n",
"\n",
"Combining HoloViews, Datashader and Bokeh lets us visualize a large number of trades effectively. Datashader renders the trades at every zoom level, which keeps the browser responsive when the view is zoomed out and many trades overlap. Once you zoom in far enough to distinguish individual trades, Bokeh's hover tool becomes useful for inspecting them. This approach keeps data handling efficient while still making detailed information available when needed.\n",
"\n",
"### Using HoloViews to build custom interactivity\n",
"\n",
"Enabling Bokeh hover information at a fixed zoom level is not a good approach: trade volumes vary from day to day, so no single window size is appropriate for all datasets.\n",
"\n",
"Instead, we can decide to show hover information dynamically:\n",
"\n",
"- We do not want to display hover information all the time; that would be overwhelming on anything but a very zoomed-in view.\n",
"- We want to display hover information once the number of visible trades is small enough. This threshold should still be high enough to show useful information, yet low enough for the browser to handle. We picked **600** for this example, but you can try other values.\n",
"\n",
"To achieve this dynamic hover display, we don't need to hook into the HoloViews Datashader operation. Instead, we add a dynamic element to the plot that is updated whenever the x-range changes. In practice, we use a HoloViews stream together with the `apply` method, which takes a callback and returns a `DynamicMap`.\n",
"\n",
"Before demonstrating this approach, we define a hover specification that formats our datetime timestamps nicely. It is a list of [custom hover tooltips](https://holoviews.org/user_guide/Plotting_with_Bokeh.html#hover-tools) that is set as a plot option with the `hover_tooltips` keyword."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hover = [\n",
" ('Symbol', '@symbol'),\n",
" ('Size', '@size'),\n",
" ('Price', '@price'),\n",
" ('Timestamp', '@timestamp{%F %H:%M %Ss %3Nms}')\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we declare a `RangeX` stream, with our `spikes` object as its source, to get the current x-range of the Bokeh plot:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"range_stream = hv.streams.RangeX(source=spikes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using this stream, we can write a callback that uses the supplied x-range to do the following:\n",
"\n",
"1. First, it slices the full set of spikes down to those visible in the current viewport (`spikes[pd.to_datetime(low):pd.to_datetime(high)]`).\n",
"2. Next, it checks if there are fewer than 600 spikes. If so, it returns this sliced set of spikes; otherwise, it returns `ranged.iloc[:0]`, which is a `Spikes` object containing zero spikes.\n",
"3. Finally, it gives these spikes a length of one and makes them invisible with `alpha=0`, since we only want their hover information. The plot title reports how many spikes fall within the current range.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
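"def xrange_filter(spikes, x_range):\n",
"    # Before the plot reports a range, x_range may be None: use the full range\n",
"    low, high = (None, None) if x_range is None else x_range\n",
"    # Slice the spikes to the trades visible in the current viewport\n",
"    ranged = spikes[pd.to_datetime(low):pd.to_datetime(high)]\n",
"    total_displayed = len(ranged)\n",
"    # Too many trades to hover over: keep the columns but drop every row\n",
"    if total_displayed >= 600:\n",
"        ranged = ranged.iloc[:0]\n",
"    # Invisible spikes (alpha=0) that only carry hover information\n",
"    return ranged.opts(spike_length=1, alpha=0, title=f'{total_displayed} spikes displayed')"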
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we can combine our `range_stream` with this callback using the `apply` method on `spikes`. This creates a `DynamicMap` that provides hover information once you have zoomed in far enough that fewer than 600 spikes are visible. The only thing left to do is to overlay it on top of the interactive, zoomable rasterization generated by the Datashader operation:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"filtered = spikes.apply(xrange_filter, streams=[range_stream])\n",
"hover_filtered = filtered.opts(tools=['hover'], hover_tooltips=hover)\n",
"rasterized * hover_filtered"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Try zooming into the last 500 milliseconds before the exchange closes to see the last few stock symbols traded that day.\n",
"\n",
"## Next steps\n",
"\n",
"This notebook illustrates how a large number of events (1.2 million) can be visualized interactively with HoloViews, Bokeh and Datashader, and how we can inspect individual events by zooming in and using the hover tool. Visualizing all the data at once in this way reveals its overall structure and any peculiarities in it. For instance, the increase in trading at the end of the day is immediately obvious, and by zooming in, you can identify a handful of trades that occur after 4 pm, once the bulk of trading has ceased.\n",
"\n",
"What this visualization does not offer is a way to pick out the trading patterns of individual stocks from the full set of trades: the hover tool only activates when zoomed in, and the plot cannot separate trading times by stock. The next notebook extends the approach developed here to analyze the most traded stocks on this day.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
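The notebook converts the UTC nanosecond timestamps to exchange-local time by subtracting a fixed four hours, which is correct for October 21, 2019 (Eastern Daylight Time). A minimal pandas sketch compares that fixed shift with a timezone-aware conversion to `America/New_York`; the timestamps are made up, and the timezone-aware variant is a suggestion rather than what the notebook does:

```python
import datetime

import pandas as pd

# Made-up trade timestamps in nanoseconds since the Unix epoch (UTC):
# 2019-10-21 13:30 UTC, 2019-10-21 20:00 UTC and, for contrast, 2019-12-02 14:30 UTC.
ts_ns = pd.Series([1_571_664_600_000_000_000,
                   1_571_688_000_000_000_000,
                   1_575_297_000_000_000_000])

# Approach used in the notebook: naive datetimes shifted by a fixed 4 hours.
fixed_shift = ts_ns.astype('datetime64[ns]') - datetime.timedelta(hours=4)

# Timezone-aware alternative: interpret the values as UTC, convert to New York
# time (which applies EDT or EST as appropriate for each date), then drop the
# timezone so the result is comparable with the naive values above.
tz_aware = (pd.to_datetime(ts_ns, unit='ns', utc=True)
              .dt.tz_convert('America/New_York')
              .dt.tz_localize(None))

print(pd.DataFrame({'fixed_shift': fixed_shift, 'tz_aware': tz_aware}))
```

The two October rows agree. The December row differs by an hour because New York is on Eastern Standard Time (UTC-5) then, so a fixed offset only holds for dates inside daylight saving time.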
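The thresholding rule in the notebook's `xrange_filter` callback can also be exercised without HoloViews or a browser. The following pandas-only sketch is a hypothetical analogue, not part of the notebook: `visible_for_hover`, `MAX_HOVER_SPIKES` and the synthetic `trades` frame are invented for illustration, and the sketch uses inclusive bounds on both ends, which may differ from how HoloViews slices an element at the edges of the range:

```python
import pandas as pd

MAX_HOVER_SPIKES = 600  # same threshold as the notebook's callback


def visible_for_hover(df, x_range, limit=MAX_HOVER_SPIKES):
    """Return the trades inside x_range, or zero rows if there are too many.

    Hypothetical pandas-only analogue of `xrange_filter`; `df` needs a
    'timestamp' column of datetime64 values.
    """
    low, high = (None, None) if x_range is None else x_range
    mask = pd.Series(True, index=df.index)
    if low is not None:
        mask &= df['timestamp'] >= pd.to_datetime(low)
    if high is not None:
        mask &= df['timestamp'] <= pd.to_datetime(high)
    ranged = df[mask]
    # Too many trades on screen: return an empty frame that keeps the columns.
    return ranged if len(ranged) < limit else ranged.iloc[:0]


# 1,000 synthetic trades, 100 ms apart, just before the close.
trades = pd.DataFrame({
    'timestamp': pd.date_range('2019-10-21 15:59:00', periods=1000, freq='100ms'),
    'symbol': ['AAPL'] * 1000,
})

print(len(visible_for_hover(trades, None)))
print(len(visible_for_hover(trades, ('2019-10-21 15:59:00', '2019-10-21 15:59:09.900'))))
```

With no range reported, all 1,000 synthetic trades are in view, so the helper returns no rows; a window covering only the first 100 trades stays under the threshold and is returned in full.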