Introduction
This article presents a class (PythonRunner) which lets you run Python scripts from a C# client. The scripts can produce textual output as well as images, which are converted to C# Images. This way, the PythonRunner class not only gives C# applications access to the world of Data Science, Machine Learning, and Artificial Intelligence, but also makes Python's extensive libraries for charting, plotting, and data visualization (e.g., matplotlib and seaborn) available to C#.
Background
General Considerations
I have been a C# developer for more than ten years now - and what can I say: over the years, I fell deeply in love with this language. It gives me great architectural flexibility, it has large community support, and there is a wealth of third-party tools (both free and commercial, many of them open-source) supporting almost every conceivable use case. C# is an all-purpose language and the number one choice for business application development.
In recent months, I started to learn Python for Data Science, Machine Learning, Artificial Intelligence, and Data Visualization - mostly because I thought that this skill would boost my career as a freelance software developer. I soon realized that Python is great for the above tasks (far better than C# ever will be), but that it is completely unsuitable for developing and maintaining large-scale business applications. So the question of C# vs. Python (a widely discussed topic on the internet) completely misses the point. C# is for developers who do the job of business-scale application development - Python is for data scientists who do - well - data science, machine learning, and data visualization. There's not much that these two jobs have in common.
The task is not that C# developers additionally become data scientists or ML specialists (or the other way round). It would simply be too much to be an expert in both domains. This is IMHO the main reason why components like Microsoft's ML.NET or the SciSharp STACK won't become widely used. The average C# developer won't become a data scientist, and data scientists won't learn C#. Why would they? They already have a great programming language which is very well suited for their needs, and has a large ecosystem of scientific third-party libraries.
With these considerations in mind, I started searching for an easier and more 'natural' way to bring the Python world and the C# world together. Here is one possible solution ...
The Example App
Before we dive into details, a short preliminary note: I wrote this sample app for the sole purpose of demonstrating C#/Python integration, using only some mildly sophisticated Python code. I didn't care too much about whether the ML code in itself is useful, so please be forgiving in that respect.
That said, let me briefly describe the sample application. Basically, it:
- gives you a list of stocks that you can select from (6-30),
- draws a summary line chart of the (normalized) monthly stock prices,
- performs a so-called 'k-Means Clustering Analysis' based on their price movements and shows the results in a treeview.
Let's briefly step through the three parts of the application one by one...
Stock Selection
The DataGrid on the left side of the application's window presents a list of available stocks that you can select from. At least six items must be selected before further action is possible (the maximum number of selected stocks is 30). You may use the controls at the top to filter the list. The list can also be sorted by clicking the column headers. The Check Random Sample button randomly selects 18 stocks from the list.
Adjust Other Parameters
In addition to stock selection, you may also adjust other parameters of the analysis: the analyzed date range and the number of clusters (a hyperparameter of the k-Means analysis). This number cannot be greater than the number of selected stocks.
Analysis Results
Once you're done with stock selection and parameter adjustment, you can press the Analyze button in the bottom right corner of the window. This will (asynchronously) call the Python scripts that perform the steps described above (draw a chart and perform a k-Means Clustering Analysis). On return, it will process and display the scripts' output.
The middle part of the window is a chart with the prices of the selected stocks, normalized such that the price on start date is set to zero and the stock prices are scaled to percentage change from this starting point. The Image that results from running the script is wrapped in a ZoomBox control to enhance accessibility and user experience.
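The normalization described above can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions, not the code from chart.py: the function name normalize_to_percent_change and the sample prices are made up for the example.

```python
def normalize_to_percent_change(prices):
    """Rescale a price series to percentage change from its first value."""
    if not prices:
        return []
    start = prices[0]
    if start == 0:
        raise ValueError("the first price must be non-zero")
    # The first element becomes 0.0; all others are relative to it.
    return [(p / start - 1.0) * 100.0 for p in prices]


# Hypothetical monthly closing prices for one stock.
monthly_prices = [50.0, 55.0, 45.0, 60.0]
normalized = normalize_to_percent_change(monthly_prices)
```

With this scaling, every line in the chart starts at zero, so stocks with very different absolute prices can be compared on one axis.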
On the very right side of the window, a tree is shown with the processed results of the clustering analysis. It groups (clusters) the stocks based on their relative price movements (in other words: the closer two stocks move together, the more likely it is that they are in the same cluster). This tree is also used as a color legend for the chart.
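To give an idea of what 'clustering by price movement' means, here is a from-scratch sketch of the basic k-Means loop (Lloyd's algorithm) in plain Python. It is not the code from kmeans.py, which is not reproduced in this article. The ticker names and numbers are invented, and the initial centroids are passed in explicitly to keep the example deterministic, whereas real implementations usually pick them randomly.

```python
def k_means(points, centroids, iterations=10):
    """
    Minimal Lloyd's algorithm: assign each point to the nearest centroid,
    then move each centroid to the mean of its points. Returns the labels.
    """
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid (squared distance).
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: mean of the members; an empty cluster keeps its centroid.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical monthly percentage moves of four made-up stocks.
movements = {
    "AAA": [1.0, 2.0, 1.5],
    "BBB": [1.1, 1.9, 1.6],
    "CCC": [-2.0, -1.0, -1.5],
    "DDD": [-2.1, -0.9, -1.4],
}
points = list(movements.values())
labels = k_means(points, centroids=[points[0], points[2]])
clusters = dict(zip(movements, labels))
```

Stocks that move together (AAA and BBB, CCC and DDD) end up with the same label, which is the grouping the tree displays.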
Points of Interest
Principal Structure of the Code
Generally, the project consists of:
- the C# files
- the Python scripts in a scripts subfolder (chart.py, kmeans.py, and common.py)
- an SQLite database that is accessed both by C# code and Python scripts (stockdata.sqlite)
Other things to note:
- On the C# side, the database is accessed using EF6 and the recipe from this Codeproject article.
- Some WPF UI controls come from the Extended WPF Toolkit™.
- Of course, a Python environment with all the required packages must be installed on the target system. The respective path is configured via the app.config file.
- The C# part of the application uses WPF and follows the MVVM pattern. According to the three-fold overall structure of the application's main window, there are three viewmodels (StockListViewModel, ChartViewModel, and TreeViewViewModel) that are orchestrated by a fourth one (the MainViewModel).
The C# Side
The PythonRunner Class
The central component for running Python scripts is the PythonRunner class. It is basically a wrapper around the Process class, specialized for Python. It supports textual output as well as image output, both synchronously and asynchronously. Here is the public interface of this class (method bodies omitted):
public class PythonRunner
{
    // interpreter: full path to the Python executable to use.
    public PythonRunner(string interpreter, int timeout = 10000) { ... }

    // Raised when a script is started and when it has exited.
    public event EventHandler<PyRunnerStartedEventArgs> Started;
    public event EventHandler<PyRunnerExitedEventArgs> Exited;

    public string Interpreter { get; }
    public int Timeout { get; set; }

    // Run a script and return its textual output.
    public string Execute(string script, params object[] arguments) { ... }
    public Task<string> ExecuteAsync(string script, params object[] arguments) { ... }

    // Run a script that prints a base64-encoded image and return it as a Bitmap.
    public Bitmap GetImage(string script, params object[] arguments) { ... }
    public Task<Bitmap> GetImageAsync(string script, params object[] arguments) { ... }
}
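For readers who are more at home in Python, the contract of such a runner can be sketched with the standard subprocess module. This is not a port of the C# class, only an illustration of the behavior described in this article: start an interpreter, wait up to a timeout, return stdout, and fail if anything was written to stderr. The names run_script and ScriptError are hypothetical, and I am assuming that the timeout is given in milliseconds.

```python
import subprocess
import sys


class ScriptError(RuntimeError):
    """Raised when the child process writes anything to stderr."""


def run_script(interpreter, args, timeout_ms=10000):
    """Run the interpreter with the given arguments and return its stdout."""
    completed = subprocess.run(
        [interpreter, *map(str, args)],
        capture_output=True,
        text=True,
        timeout=timeout_ms / 1000.0,  # subprocess expects seconds
    )
    if completed.stderr.strip():
        raise ScriptError(completed.stderr.strip())
    return completed.stdout.strip()


# Usage: run an inline one-liner instead of a script file.
output = run_script(sys.executable, ["-c", "print('hello from the child')"])
```

If the child does not finish in time, subprocess.run raises subprocess.TimeoutExpired, which plays the role of the timeout handling on the C# side.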
Retrieving Stock Data
As already mentioned, the sample app uses a SQLite database as its datastore (which is also accessed by the Python side - see below). To this end, Entity Framework is used, together with the recipe found in this Codeproject article. The stock data are then put into a ListCollectionView, which supports filtering and sorting:
private void LoadStocks()
{
    var ctx = new SQLiteDatabaseContext(_mainVm.DbPath);
    var itemList = ctx.Stocks.ToList().Select(s => new StockItem(s)).ToList();
    _stocks = new ObservableCollection<StockItem>(itemList);
    _collectionView = new ListCollectionView(_stocks);
    ICollectionView view = CollectionViewSource.GetDefaultView(_collectionView);
    view.SortDescriptions.Add(new SortDescription("Name", ListSortDirection.Ascending));
}
Getting Textual Output
Here, PythonRunner is calling a script that produces textual output. The KMeansClusteringScript property points to the script to execute:
private async Task<string> RunKMeans()
{
    TreeViewText = Processing;
    Items.Clear();
    try
    {
        string output = await _mainVm.PythonRunner.ExecuteAsync(
            KMeansClusteringScript,
            _mainVm.DbPath,
            _mainVm.TickerList,
            _mainVm.NumClusters,
            _mainVm.StartDate.ToString("yyyy-MM-dd"),
            _mainVm.EndDate.ToString("yyyy-MM-dd"));
        return output;
    }
    catch (Exception e)
    {
        TreeViewText = e.ToString();
        return string.Empty;
    }
}
And here is some sample output produced by the script:
0 AYR 0,0,255
0 PCCWY 0,100,0
0 HSNGY 128,128,128
0 CRHKY 165,42,42
0 IBN 128,128,0
1 SRNN 199,21,133
...
4 PNBK 139,0,0
5 BOTJ 255,165,0
5 SPPJY 47,79,79
The first column is the cluster number of the k-Means analysis, the second column is the ticker symbol of the respective stock, and the third column indicates the RGB values of the color that was used to draw this stock's line in the chart.
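The three-column format is easy to take apart again. In the sample app this happens on the C# side, but to stay in one language, here is a small Python sketch of such a parser. The function name parse_cluster_output is my own, and the input is an excerpt of the sample output above.

```python
from collections import defaultdict


def parse_cluster_output(text):
    """Parse lines of the form '<cluster> <ticker> <r>,<g>,<b>' into a dict."""
    clusters = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        cluster, ticker, rgb = parts
        r, g, b = (int(v) for v in rgb.split(","))
        clusters[int(cluster)].append((ticker, (r, g, b)))
    return dict(clusters)


sample = """0 AYR 0,0,255
0 PCCWY 0,100,0
1 SRNN 199,21,133
5 BOTJ 255,165,0"""
parsed = parse_cluster_output(sample)
```

The result maps each cluster number to its tickers and colors, which is the shape needed for a tree that doubles as a color legend.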
Getting an Image
This is the method that uses the viewmodel's PythonRunner instance to asynchronously call the required Python script (the path of which is stored in the DrawSummaryLineChartScript property) together with the required script arguments. As soon as the result becomes available, it is converted into a 'WPF-friendly' form:
internal async Task<bool> DrawChart()
{
    SummaryChartText = Processing;
    SummaryChart = null;
    try
    {
        var bitmap = await _mainVm.PythonRunner.GetImageAsync(
            DrawSummaryLineChartScript,
            _mainVm.DbPath,
            _mainVm.TickerList,
            _mainVm.StartDate.ToString("yyyy-MM-dd"),
            _mainVm.EndDate.ToString("yyyy-MM-dd"));
        // GetHbitmap() allocates a GDI handle that the caller must release.
        IntPtr hBitmap = bitmap.GetHbitmap();
        try
        {
            SummaryChart = Imaging.CreateBitmapSourceFromHBitmap(
                hBitmap,
                IntPtr.Zero,
                Int32Rect.Empty,
                BitmapSizeOptions.FromEmptyOptions());
        }
        finally
        {
            // Requires: [DllImport("gdi32.dll")] static extern bool DeleteObject(IntPtr hObject);
            DeleteObject(hBitmap);
        }
        return true;
    }
    catch (Exception e)
    {
        SummaryChartText = e.ToString();
        return false;
    }
}
The Python Side
Suppress Warnings
An important thing to note is that the PythonRunner class throws an exception as soon as the called script writes to stderr. This is the case when the Python code raises an error for one reason or another, and in this case, it is desirable to re-throw the error. But the script may also write to stderr if some component issues a harmless warning, for example when something is about to be deprecated, when something is initialized twice, or for any other minor issue. In such a case, we don't want to break execution, but simply ignore the warning. The statement in the snippet below does exactly this:
import warnings
...
# Suppress all kinds of warnings (this would lead to an exception on the client side).
warnings.simplefilter("ignore")
...
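To see why this matters, the following self-contained sketch captures what a warning writes to stderr with and without the filter. The helper stderr_output_of and the warning text are made up for the demonstration; only the warnings.simplefilter("ignore") call is what the scripts actually use.

```python
import contextlib
import io
import warnings


def stderr_output_of(action):
    """Run action() and return whatever it wrote to stderr."""
    buffer = io.StringIO()
    with contextlib.redirect_stderr(buffer):
        action()
    return buffer.getvalue()


def noisy():
    warnings.warn("this API will be removed soon", DeprecationWarning)


with warnings.catch_warnings():
    warnings.simplefilter("always")   # force the warning to be shown
    shown = stderr_output_of(noisy)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")   # the filter used in the scripts
    suppressed = stderr_output_of(noisy)
```

When run as a plain script, shown should contain the warning text, while suppressed stays empty, so nothing reaches stderr and the C# side has no reason to throw.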
Parsing the Command Line Arguments
As we have seen, the C# (client) side calls a script with a variable number of positional arguments. The arguments are passed to the script via the command line. This requires that the script 'understands' these arguments and parses them accordingly. The command line arguments that are given to a Python script are accessible via sys.argv, a list of strings. The snippet below is from the kmeans.py script and demonstrates how to do this:
import sys
...
# parse command line arguments
db_path = sys.argv[1]
ticker_list = sys.argv[2]
clusters = int(sys.argv[3])
start_date = sys.argv[4]
end_date = sys.argv[5]
...
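Since the arguments are purely positional, a wrong count or a malformed value only shows up later as a confusing error. A slightly more defensive variant might look like the sketch below. It is not part of kmeans.py: the names KMeansArgs and parse_args are hypothetical, and I am assuming that dates always arrive in the yyyy-MM-dd form used by the C# caller.

```python
from collections import namedtuple
from datetime import date

KMeansArgs = namedtuple("KMeansArgs", "db_path tickers clusters start_date end_date")


def parse_args(argv):
    """Validate and convert the positional arguments (argv[0] is the script name)."""
    if len(argv) != 6:
        raise SystemExit("usage: kmeans.py DB_PATH TICKERS CLUSTERS START_DATE END_DATE")
    db_path, tickers, clusters, start, end = argv[1:]
    args = KMeansArgs(
        db_path=db_path,
        tickers=tickers,
        clusters=int(clusters),
        start_date=date.fromisoformat(start),  # expects yyyy-MM-dd
        end_date=date.fromisoformat(end),
    )
    if args.start_date > args.end_date:
        raise SystemExit("START_DATE must not be after END_DATE")
    return args


# Hypothetical command line, as the C# side might build it.
args = parse_args(
    ["kmeans.py", "stockdata.sqlite", "'AYR','IBN'", "2", "2018-01-01", "2018-12-31"]
)
```

Raising SystemExit with a message prints that message to stderr, so a bad call surfaces on the C# side as an exception with a readable text.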
Retrieving Stock Data
The Python scripts use the same SQLite database as the C# code does. To make this possible, the path to the database is stored as an application setting in app.config on the C# side and passed as an argument to the called Python script. Above, we have seen how this is done on the caller side and how the command line arguments are parsed in the Python script. Here is the Python helper function that builds an SQL statement from the arguments and loads the required data into a list of dataframes (using the sqlalchemy Python package):
from sqlalchemy import create_engine
import pandas as pd

def load_stock_data(db, tickers, start_date, end_date):
    """
    Loads the stock data for the specified ticker symbols, and for the specified date range.
    :param db: Full path to database with stock data.
    :param tickers: A comma-separated string of quoted ticker symbols, e.g. "'AYR','IBN'".
    :param start_date: The start date.
    :param end_date: The end date.
    :return: A list of time-indexed dataframes, one for each ticker, ordered by date.
    """
    # The values are formatted directly into the SQL text, so they must come from a trusted caller.
    SQL = "SELECT * FROM Quotes WHERE TICKER IN ({}) AND Date >= '{}' AND Date <= '{}'"\
        .format(tickers, start_date, end_date)
    engine = create_engine('sqlite:///' + db)
    df_all = pd.read_sql(SQL, engine, index_col='Date', parse_dates=['Date'])
    df_all = df_all.round(2)
    result = []
    # Each element still carries its quotes, which makes it a valid string literal for query().
    for ticker in tickers.split(","):
        df_ticker = df_all.query("Ticker == " + ticker)
        result.append(df_ticker)
    return result
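Because the function above formats its arguments straight into the SQL text, it relies on the caller sending well-formed values. If that is a concern, the same filter can be expressed with bound parameters. The sketch below uses only the standard sqlite3 module and an in-memory stand-in for the database. The table name Quotes and the columns Ticker and Date come from the query above, while the Close column, the sample rows, and the function name load_quotes are my assumptions.

```python
import sqlite3


def load_quotes(connection, tickers, start_date, end_date):
    """Return (ticker, date, close) rows using bound parameters."""
    placeholders = ",".join("?" for _ in tickers)
    sql = (
        "SELECT Ticker, Date, Close FROM Quotes "
        f"WHERE Ticker IN ({placeholders}) AND Date >= ? AND Date <= ? "
        "ORDER BY Ticker, Date"
    )
    return connection.execute(sql, [*tickers, start_date, end_date]).fetchall()


# In-memory stand-in for the real database; the Close column is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Quotes (Ticker TEXT, Date TEXT, Close REAL)")
conn.executemany(
    "INSERT INTO Quotes VALUES (?, ?, ?)",
    [
        ("AYR", "2018-01-31", 10.0),
        ("AYR", "2019-01-31", 12.0),
        ("IBN", "2018-02-28", 8.5),
        ("XYZ", "2018-03-31", 1.0),
    ],
)
rows = load_quotes(conn, ["AYR", "IBN"], "2018-01-01", "2018-12-31")
```

Only the number of placeholders is built dynamically. The values themselves never become part of the SQL string.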
Text Output
For a Python script, producing text output that is consumable from the C# side simply means printing to the console as usual. The calling PythonRunner class will take care of everything else. Here is the snippet from kmeans.py which produces the text seen above:
# Create a DataFrame aligning labels and companies.
df = pd.DataFrame({'ticker': tickers}, index=labels)
df.sort_index(inplace=True)
# Make a real python list.
ticker_list = list(ticker_list.replace("'", "").split(','))
# Output the clusters together with the used colors
for cluster, row in df.iterrows():
    ticker = row['ticker']
    index = ticker_list.index(ticker)
    rgb = get_rgb(common.COLOR_MAP[index])
    print(cluster, ticker, rgb)
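The get_rgb helper and common.COLOR_MAP are not shown in this article. Judging from the sample output, the helper turns a color into an 'r,g,b' string. The following is a guess at a minimal stand-in, with a small hard-coded table of named colors instead of the real COLOR_MAP. Everything in it (NAMED_COLORS, format_line, and this version of get_rgb) is hypothetical.

```python
# Hypothetical subset of named colors; the real COLOR_MAP lives in common.py.
NAMED_COLORS = {
    "blue": (0, 0, 255),
    "darkgreen": (0, 100, 0),
    "gray": (128, 128, 128),
    "orange": (255, 165, 0),
}


def get_rgb(color_name):
    """Return the color as an 'r,g,b' string, as seen in the sample output."""
    r, g, b = NAMED_COLORS[color_name]
    return f"{r},{g},{b}"


def format_line(cluster, ticker, color_name):
    """Mirror print(cluster, ticker, rgb): fields separated by single spaces."""
    return " ".join(str(field) for field in (cluster, ticker, get_rgb(color_name)))


line = format_line(0, "AYR", "blue")
```

Keeping the RGB triple free of spaces is what allows the consumer to split each line on whitespace into exactly three fields.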
Image Output
Image output is not very different from text output: First, the script creates the desired figure as usual. Then, instead of calling the show() method to display the image using Python's own backend, it converts the figure to a base64 string and prints this string to the console. You may use this helper function:
import io, sys, base64
def print_figure(fig):
    """
    Converts a figure (as created e.g. with matplotlib or seaborn) to a png image and this
    png subsequently to a base64-string, then prints the resulting string to the console.
    """
    buf = io.BytesIO()
    fig.savefig(buf, format='png')
    print(base64.b64encode(buf.getbuffer()))
In your main script, you then call the helper function like this (the gcf() function simply gets the current figure):
import matplotlib.pyplot as plt
...
# do stuff
...
print_figure(plt.gcf())
On the C# client side, this little helper class, which is used by PythonRunner, then converts this string back to an Image (a Bitmap, to be precise):
internal static class PythonBase64ImageConverter
{
    public static Bitmap FromPythonBase64String(string pythonBase64String)
    {
        // Python prints a bytes object as b'...': strip the leading b' and the trailing '.
        string base64String = pythonBase64String.Substring(2, pythonBase64String.Length - 3);
        byte[] imageBytes = Convert.FromBase64String(base64String);
        // Wrap the decoded bytes in a stream positioned at the start.
        var memoryStream = new MemoryStream(imageBytes);
        // The stream is not disposed here: it must stay open for the lifetime of the image.
        return (Bitmap)Image.FromStream(memoryStream, true);
    }
}
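The Substring(2, Length - 3) call may look odd at first. It is needed because printing the result of base64.b64encode() prints the representation of a bytes object, including the b'...' wrapper. The round trip can be reproduced in Python alone. The byte string below is just a stand-in for real PNG data.

```python
import base64

# Stand-in for PNG bytes; any byte string behaves the same way.
image_bytes = b"\x89PNG\r\n\x1a\n" + bytes(range(16))

# What the script prints: the repr of a bytes object, wrapped in b'...'.
printed = str(base64.b64encode(image_bytes))

# What the client must do: drop the leading b' and the trailing '.
payload = printed[2:-1]
decoded = base64.b64decode(payload)

# Alternative on the script side: decode to text so that no wrapper is printed.
clean = base64.b64encode(image_bytes).decode("ascii")
```

If the script printed the clean variant instead, the client could pass the string to the base64 decoder unchanged. The two sides have to agree on one of the two forms.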
History
- 24th August, 2019: Initial version