Select the key elements for the decision
Here I choose five elements for analyzing stocks: ROE, gross profit margin, liabilities, net profit cash flow, and payout ratio. The reasons for this choice are outside the scope of this post. Collecting all of this data by hand would be a huge, time-consuming job, and there are better ways to speed the process up. Showing them is the purpose of this post.
The stocks we will analyze trade on many different exchanges, such as the Shanghai Stock Exchange, Shenzhen Stock Exchange, Hong Kong Exchange, NASDAQ, New York Stock Exchange, and American Stock Exchange. We need to filter out the stocks that qualify according to the elements chosen above, and there are a couple of ways to do that. This trading model doesn't involve complicated calculations, so most of the work can be done with stock screener sites and tools like Excel.
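As a rough illustration of the filtering step, the five elements can be expressed as a simple predicate. This is only a sketch: the field names, the reading of each element as a ratio, the cut-off values, and the sample tickers are all assumptions made up for the example, not the values used by this post's model.

```python
from dataclasses import dataclass


@dataclass
class Stock:
    code: str
    roe: float              # return on equity, as a fraction
    gross_margin: float     # gross profit margin, as a fraction
    liability_ratio: float  # assumed reading of "liabilities": liabilities / assets
    cash_to_profit: float   # assumed reading of "net profit cash flow": cash flow / net profit
    payout_ratio: float     # dividends / net profit


def qualifies(s: Stock) -> bool:
    """Return True when a stock passes every (hypothetical) cut-off."""
    return (
        s.roe >= 0.15
        and s.gross_margin >= 0.30
        and s.liability_ratio <= 0.60
        and s.cash_to_profit >= 1.0
        and s.payout_ratio >= 0.30
    )


def screen(stocks):
    """Keep only the codes of the stocks that qualify."""
    return [s.code for s in stocks if qualifies(s)]


# Made-up sample data, not real figures.
candidates = [
    Stock("AAA", 0.20, 0.40, 0.50, 1.1, 0.35),
    Stock("BBB", 0.10, 0.40, 0.50, 1.1, 0.35),  # fails the ROE cut-off
]
print(screen(candidates))
```

The same rule is easy to reproduce as a filter in Excel or in a screener site; the code form mainly helps once the data is being collected automatically.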
There are tons of stock screeners on the Internet. The first step is to pick one that is useful to you; which one is a matter of personal preference.
You may need to gather some details from several different sites. For instance, the dividend payout ratio (Apple's, for example) can be found at https://finbox.com/, while some other sites charge for the same data.
Some sites can be crawled by simple spiders, but most will block the requests that spiders send. With a few additional measures we can get around those defenses. To avoid being blocked for high-frequency requests, the easiest fix is to add a time delay between requests. You can also modify the request headers so that the requests look like they come from a real browser. For content that is encrypted with JavaScript, you can read the code and work out your own way around it, which is a little troublesome. Alternatively, tools like Selenium, PhantomJS, and Puppeteer can handle it. Puppeteer is a good choice in most situations, and it has an unofficial Python port named pyppeteer.
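The two simplest measures, a delay between requests and browser-like headers, can be sketched with the standard library alone. The header values, the two-second delay, and the function names below are assumptions for illustration; the `opener` and `sleep` parameters exist only so the function can be tried without touching the network.

```python
import time
import urllib.request

# Hypothetical browser-like headers; copy real ones from your own browser.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url):
    """Attach the browser-like headers to a request for `url`."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def fetch_politely(urls, delay_seconds=2.0,
                   opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch each URL in turn, pausing between consecutive requests."""
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # throttle to stay under rate limits
        with opener(build_request(url), timeout=30) as response:
            pages.append(response.read())
    return pages
```

A fixed delay is the bluntest option; a randomized delay is a common refinement, but whether it helps depends on the site.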
Be careful with pyppeteer
The current version of pyppeteer is not stable. This seems to come from limitations on the Python side: some asynchronous methods cannot be used in pyppeteer. I spent considerable time on the confusing problems that result. One example is this exception thrown by pyppeteer:
    pyppeteer.errors.NetworkError: Protocol Error (Runtime.callFunctionOn): Session closed. Most likely the page has been closed.
The change proposed in https://github.com/miyakogi/pyppeteer/pull/160/files might be useful, but it has not been merged into the latest release, so you need to patch the pyppeteer source code yourself.
Sometimes the browser crashes because it runs out of memory, especially in a container such as Docker or in a CI environment, where the shared memory partition /dev/shm is often small. In that case, launch the browser with an argument like this:
    browser = await launch({'args': ['--disable-dev-shm-usage']})
The toughest problem I encountered was that the browser would get stuck at the page.goto step and eventually be closed automatically. My guess is that the site runs scripts aimed at headless browsers that keep running and never finish. This was a big problem, and I struggled with it for days. In the JavaScript version, Puppeteer provides a couple of ways to stop the page from loading and return the content fetched so far. In the end I found a good way to get the same behavior in pyppeteer: use a try/finally statement.
Take a look at the source code of page.goto: this method raises an exception when it reaches the timeout limit.
    def _createTimeoutPromise(self) -> Awaitable[None]:
Our goal is to handle this exception and stop it from shutting down the program. The fix turns out to be amusingly simple.
    browser = await launch()
The key is to stop the browser from closing. It may not be very convenient, but it is robust enough to guarantee that the necessary content is fetched completely without the program shutting down.
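The idea can be sketched roughly as follows. This is not the exact code from the repository: the helper name is made up, and `StuckPage` is a hypothetical stand-in for a pyppeteer page so the example runs without a browser. It also assumes that pyppeteer's TimeoutError derives from asyncio.TimeoutError, which is worth checking against your installed version.

```python
import asyncio


async def goto_and_keep_content(page, url, timeout_ms=30000):
    """Navigate to `url`; on a navigation timeout, keep whatever has loaded."""
    html = ""
    try:
        await page.goto(url, timeout=timeout_ms)
    except asyncio.TimeoutError:
        # Swallow the timeout so it cannot shut down the program.
        # (Assumes pyppeteer's TimeoutError subclasses asyncio.TimeoutError.)
        pass
    finally:
        # Runs whether or not goto timed out, while the page is still open.
        html = await page.content()
    return html


class StuckPage:
    """Hypothetical stand-in for a page whose navigation never finishes."""

    async def goto(self, url, timeout=30000):
        await asyncio.sleep(timeout / 1000)
        raise asyncio.TimeoutError(
            f"Navigation Timeout Exceeded: {timeout} ms exceeded."
        )

    async def content(self):
        return "<html>partially loaded</html>"


html = asyncio.run(
    goto_and_keep_content(StuckPage(), "https://example.com", timeout_ms=50)
)
print(html)
```

With a real pyppeteer page, the same helper would be called with the object returned by `browser.newPage()`, and the browser would be closed only after the content has been read.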
Output
The source code can be found at https://github.com/recursively/quantitative_trading_pub. If you select the American stock market, the final output looks like this (company names are printed in Chinese; 德州仪器 is Texas Instruments):
    德州仪器 TXN
The program also calculates an appropriate price for each stock. The result below is from HKEX.
    IGG 00799 Stock code: HK.00799 Last price: 5.75 gprice: 9.81
More to mention
Some of the implementation in the source code needs to be modified to improve performance. There are too many I/O operations in the program, and some functions could be replaced by asynchronous versions, such as the extract_bonus function:
    def extract_bonus(self, stock_code):
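One low-effort way to overlap such blocking calls, short of rewriting them as true coroutines, is to run them in asyncio's default thread pool. The sketch below uses a made-up stand-in for the blocking function (its body and return value are hypothetical); only the stock codes come from the output above.

```python
import asyncio
import time


def extract_bonus_blocking(stock_code):
    """Hypothetical stand-in for a blocking lookup such as extract_bonus."""
    time.sleep(0.1)  # pretend to wait on the network
    return f"bonus data for {stock_code}"


async def extract_all(stock_codes):
    """Run the blocking lookups concurrently in the default thread pool."""
    loop = asyncio.get_running_loop()
    tasks = [
        loop.run_in_executor(None, extract_bonus_blocking, code)
        for code in stock_codes
    ]
    # gather returns the results in the same order as the inputs.
    return await asyncio.gather(*tasks)


results = asyncio.run(extract_all(["TXN", "HK.00799"]))
print(results)
```

Threads only hide the waiting time; if the underlying library offers a real asynchronous client, using it directly would be the cleaner change.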
Moreover, the browser requests only a single page per launch, which wastes a lot of time over the whole procedure.
    async def extract_debts(self, stock_code):
This can be improved by requesting every page from a new tab instead of restarting the browser.
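A minimal sketch of the tab-per-page idea, assuming a browser object that has already been launched once and exposes pyppeteer's `newPage`, `goto`, `content`, and `close` methods. The helper name `fetch_in_tabs` is made up for this example and does not exist in the repository.

```python
import asyncio


async def fetch_in_tabs(browser, urls):
    """Load each URL in its own tab of one already-launched browser."""

    async def fetch_one(url):
        page = await browser.newPage()  # new tab, same browser process
        try:
            await page.goto(url)
            return await page.content()
        finally:
            await page.close()  # close only the tab, not the browser

    # All tabs load concurrently; results keep the order of `urls`.
    return await asyncio.gather(*(fetch_one(url) for url in urls))


# Usage with pyppeteer (requires a Chromium download):
#   browser = await launch()
#   pages = await fetch_in_tabs(browser, urls)
#   await browser.close()
```

Opening every tab at once may itself trigger rate limits, so in practice the number of concurrent tabs would probably need a cap.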