Parse TradingView Stock Recommendations in Seconds


In one of my earlier articles, we went over how to parse the top analyst recommendations from Yahoo Finance for any stock. While those ratings offered some validation as to where a stock might move in the future, they were only updated once a month and offered no information about the rationale behind them.

Luckily, since then, I’ve stumbled upon the wonderful site TradingView. If you aren’t familiar with the site, one of the features they offer is real-time recommendations for as short as 1 minute ahead or for as long as 1 month ahead. These recommendations are purely based on Technical Indicators including Moving Averages, Oscillators, and Pivots, and you can see the calculations directly on the page!

So instead of visiting the site each time I wanted a recommendation, I created this simple parser with fewer than 50 lines of code that can do just that.

Introduction to TradingView

Before I get into the coding aspect, I want to quickly touch upon what these recommendations are and where they live on TradingView. If you go over to this page, you will see something similar to the image I included below. The page includes key statistics such as Price to Earnings ratio, Earnings Per Share, Market Cap, Dividend information, and much more. You can even click Overview to get a comprehensive table of ratios, an interactive chart, and recent news. However, this isn’t where the recommendations are located.

Top of Apple TradingView page

If you continue scrolling down on the Technicals page, there will be multiple charts like the one below, outlining the recommendation and the statistics that explain the reasoning behind the signal.

TradingView Recommendation Chart (what we’ll be scraping)

TradingView Technical Indicator Statistics

The recommendations range from strong buy to strong sell and, as you can see in the second image, they are entirely dependent on the technical indicator signals. The algorithm we will build parses the number of buy signals, neutral signals, sell signals, and the overall recommendation. The GitHub gist below contains all the code!

Scraping TradingView Using Python

    # Imports
    import time
    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from webdriver_manager.chrome import ChromeDriverManager

    # Parameters
    ticker = 'TSLA'
    interval = '1M'

    # Set up chromedriver
    options = Options()
    options.add_argument("--headless")
    webdriver = webdriver.Chrome(ChromeDriverManager().install(), options=options)

    # Declare list variable
    analysis = []

    # Error handling
    try:
        # Open tradingview's site (braces in the JSON fragment are doubled
        # so they survive .format, which fills in the interval and symbol)
        url = ("https://s.tradingview.com/embed-widget/technical-analysis/"
               '?locale=en#{{"interval":"{}","width":"100%","isTransparent":false,'
               '"height":"100%","symbol":"{}","showIntervalTabs":true,'
               '"colorTheme":"dark","utm_medium":"widget_new",'
               '"utm_campaign":"technical-analysis"}}').format(interval, ticker)
        webdriver.get(url)
        webdriver.refresh()
        time.sleep(1)

        # Recommendation
        recommendation_element = webdriver.find_element_by_class_name("speedometerSignal-pyzN--tL")
        analysis.append(recommendation_element.get_attribute('innerHTML'))

        # Counters
        counter_elements = webdriver.find_elements_by_class_name("counterNumber-3l14ys0C")

        # Sell, Neutral, and Buy Signal Counts
        analysis.append(int(counter_elements[0].get_attribute('innerHTML')))
        analysis.append(int(counter_elements[1].get_attribute('innerHTML')))
        analysis.append(int(counter_elements[2].get_attribute('innerHTML')))

        # Set up DataFrame
        df = pd.DataFrame.from_records(
            [tuple(analysis)],
            columns=['Overall Recommendation', '# of Sell Signals',
                     '# of Neutral Signals', '# of Buy Signals'])
        df['Ticker'] = [ticker]
        print(df.set_index('Ticker').T)
    except Exception as e:
        print(f'Could not get the recommendation due to {e}')

Setup

In case you do not have Selenium or Pandas installed, you can visit their respective links and download them using pip in your terminal! We will also need a chromedriver (the simulated Chrome browser that Selenium controls); to download it from Python, you can use the webdriver-manager package, also found on PyPI.

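Assuming pip is available in your terminal, one command covers all three dependencies:

    pip install selenium pandas webdriver-manager
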
Additionally, you can use any IDE or Text Editor that supports Python as long as you have the necessary dependencies installed. I personally would recommend downloading either Visual Studio Code or Spyder through Anaconda.

Let’s get into the code

Now that everything should be installed on your machine and you have an idea of what we will be scraping, let’s get into the code!

First, we have to import the dependencies we will need for the rest of the program. In this case, we will need the built-in time module, Pandas, and Selenium.

The time module will allow us to make the program sleep for a number of seconds so the simulated browser can fully load. Pandas will allow us to create a DataFrame with the data we collect. Finally, we will need Selenium so we can create/control a browser window and scrape the JavaScript-rendered information.

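For reference, these are the import lines from the top of the full script:

    import time                                               # pause while the page renders
    import pandas as pd                                       # build the results DataFrame
    from selenium import webdriver                            # drive the browser
    from selenium.webdriver.chrome.options import Options     # configure the browser
    from webdriver_manager.chrome import ChromeDriverManager  # fetch a chromedriver
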
Next, we can create two variables, one for the ticker and the other for the interval we want to scrape. The interval can be any of the ones I included in the code fence below.

    # ==================================================================
    # Intervals:
    # 1m for 1 minute
    # 15m for 15 minutes
    # 1h for 1 hour
    # 4h for 4 hours
    # 1D for 1 day
    # 1W for 1 week
    # 1M for 1 month
    # ==================================================================

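In the full script, these two parameters request the one-month recommendation for Tesla; swap in any symbol and any interval code from the list above:

    ticker = 'TSLA'    # any symbol TradingView recognizes
    interval = '1M'    # one of the interval codes listed above
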
After we include the imports and parameters, we can set up the chromedriver. The Options class will allow us to add arguments such as headless to customize the simulated browser. Adding headless tells the browser not to pop up each time you run the program. We can set the executable path to the path where you downloaded the chromedriver earlier. In this case, I downloaded it directly into my directory, but you do not have to.

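Here are the corresponding lines from the script. Note that reusing the name webdriver for the driver instance shadows the imported module; that works here only because the module is never needed again, and a separate name like driver would be cleaner:

    # Run Chrome headless so no window pops up on each run
    options = Options()
    options.add_argument("--headless")

    # webdriver-manager downloads a matching chromedriver and returns its path
    webdriver = webdriver.Chrome(ChromeDriverManager().install(), options=options)
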
We can add our scraping script inside a try/except block to keep errors from crashing our program. First, we must open up the browser using webdriver.get(URL), refresh to load all aspects of the page properly, and then add time.sleep(1) to pause the program for one second until the browser has completely rendered.

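These three lines sit inside the try block of the full script; url is the widget address built from the interval and ticker parameters:

    webdriver.get(url)   # open the technical-analysis widget
    webdriver.refresh()  # reload so every element renders properly
    time.sleep(1)        # give the JavaScript one second to finish
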
Using the .find_element_by_class_name method in selenium.webdriver, we can pinpoint the exact portions we want to scrape. For example, only the recommendation has the class “speedometerSignal-pyzN--tL”. We can retrieve these class names by inspecting the element in Chrome DevTools. To open up DevTools, you can right-click on the section you’d like to parse and then press “Inspect” to get a similar result to the image below!

Chrome DevTools

We can retrieve the “Buy” using the method .get_attribute('innerHTML'), which returns the text inside the HTML tag.

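Putting those two steps together, the overall recommendation is scraped like this:

    # The speedometer label holds the overall recommendation text
    recommendation_element = webdriver.find_element_by_class_name("speedometerSignal-pyzN--tL")
    analysis.append(recommendation_element.get_attribute('innerHTML'))  # e.g. "Buy"
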
Similarly, we can retrieve the number of buy, neutral, and sell signals by finding a class name shared by all of them and then using the method .find_elements_by_class_name. Since this time we are calling for elements, not an element, this method will return a list of all HTML tags that have the class name we specify.

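In the full script, the three counters share the class counterNumber-3l14ys0C, and they appear on the page in sell, neutral, buy order:

    counter_elements = webdriver.find_elements_by_class_name("counterNumber-3l14ys0C")

    # Sell, Neutral, and Buy signal counts, in page order
    analysis.append(int(counter_elements[0].get_attribute('innerHTML')))  # sell
    analysis.append(int(counter_elements[1].get_attribute('innerHTML')))  # neutral
    analysis.append(int(counter_elements[2].get_attribute('innerHTML')))  # buy
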
Lastly, we can append all of the signals to a list, and using the .from_records method, we can turn a tuple of our data and a list of columns into a DataFrame. Finally, we can clean it up by adding a column for the ticker, setting that column as the index, and transposing (rotating) the DataFrame for our final product.

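The final assembly from the script: .from_records takes the collected values as a single row, the ticker column becomes the index, and .T rotates the frame:

    # One row of results -> indexed by ticker -> transposed for readability
    df = pd.DataFrame.from_records(
        [tuple(analysis)],
        columns=['Overall Recommendation', '# of Sell Signals',
                 '# of Neutral Signals', '# of Buy Signals'])
    df['Ticker'] = [ticker]
    print(df.set_index('Ticker').T)
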
Program Output

Now, within seconds, you should get an output similar to the image above. I hope this algorithm will prove useful to you in the future. Thank you so much for reading!

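For reference, the printed frame is shaped roughly like this; the numbers below are placeholders, not real signal counts, since the values change with the market:

    Ticker                 TSLA
    Overall Recommendation  Buy
    # of Sell Signals         5
    # of Neutral Signals      9
    # of Buy Signals         12
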
Disclaimer: The material in this article is purely educational and should not be taken as professional investment advice. Invest at your own discretion.

Translated from: https://towardsdatascience.com/parse-tradingview-stock-recommendations-in-seconds-1f4501303b21
