    LinkedIn Scraper


    Mentor: Sining Chen


    LinkedIn is a social network designed for professional networking and is heavily employment-oriented. Its website and applications host hundreds of millions of professional user profiles as well as job postings.


    In order to gain insight into the job market as demonstrated by LinkedIn, we used web scraping tools written in Python: Selenium and Beautiful Soup. This story covers how you can install a Selenium webdriver and Beautiful Soup onto your computer with pip in order to access the information we see on LinkedIn. In particular, by using a webpage’s source code, we created automated functions such as inputting text into a search box, clicking buttons, and scraping text. To do this, we had to overcome a few challenges, including differentiating between Selenium and Beautiful Soup, locating the elements we needed to access, and avoiding the errors caused by the messaging pop-up.


    We also visualized the trends shown by our data. Our graphs are displayed in our other story, which is linked below: https://medium.com/@sophie14159/linkedin-job-trends-2dd64f1d4541


    Selenium

    Selenium Python provides an API that allows you to access webdrivers including Firefox, Internet Explorer, and Chrome, which will be demonstrated later on.


    Beautiful Soup

    Beautiful Soup is a Python library that allows us to scrape information from web pages by accessing their source code. It uses an HTML or XML parser.

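    As a minimal sketch of what that looks like (the HTML string below is made up for illustration, not taken from LinkedIn):

    from bs4 import BeautifulSoup

    #Parse a small, made-up HTML snippet with Python's built-in parser
    html = '<html><body><small class="result-count">1,024 results</small></body></html>'
    soup = BeautifulSoup(html, 'html.parser')

    #Scrape the text of the <small> tag, much as we do later on LinkedIn pages
    print(soup.find('small', {'class': 'result-count'}).get_text())  #1,024 results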

    More information can be found, along with other resources, at the end of this story.


    Installations

    In order to scrape webpages like those on LinkedIn, we must install Selenium, Beautiful Soup, and a webdriver.


    1. Installing pip


    I needed to install both pip and Selenium onto my laptop, and I did this using the Command Prompt. I first installed pip.


    Installing pip and redirecting the path using Command Prompt on Windows

    Note: Since I already had pip downloaded, it was uninstalled and reinstalled to get the newest version. Then, to access pip and install Selenium, I needed to use “pip install selenium”. However, as the warning suggests, I also needed to change the path with “cd” to access pip.


    2. Installing Selenium and Beautiful Soup


    With pip successfully installed and accessible, I simply used the commands “pip install selenium” and “pip install bs4”. Thus, Selenium and Beautiful Soup were successfully installed as well. We are now able to use them in our Python scripts.

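    As a quick sanity check of my own (not part of the original walkthrough), both packages should now import cleanly:

    #If either import fails, the corresponding "pip install" did not succeed
    import selenium
    import bs4

    print(selenium.__version__, bs4.__version__)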

    3. Downloading a webdriver


    In my case, I decided to use a chromedriver, although, as mentioned before, Selenium is not limited to Chrome. Depending on the version of Chrome I had, I used the following site to download the chromedriver: https://chromedriver.chromium.org/downloads.


    Note: If you are unsure which version of Chrome you have, you may follow the instructions here.


    With Selenium, Beautiful Soup, and a webdriver set up, we are ready to write our code!


    Python Script

    #import chrome webdriver
    from selenium import webdriver

    browser = webdriver.Chrome('chromedriver_win32/chromedriver.exe')

    In the code above, we imported the chromedriver, which will now allow us to access webpages from the Chrome browser. Next, I logged into my LinkedIn account to access those webpages.


    #Open login page
    browser.get('https://www.linkedin.com/login?fromSignIn=true&trk=guest_homepage-basic_nav-header-signin')

    #Enter login info:
    elementID = browser.find_element_by_id('username')
    elementID.send_keys(username)
    elementID = browser.find_element_by_id('password')
    elementID.send_keys(password)

    #Note: replace the keys "username" and "password" with your LinkedIn login info
    elementID.submit()

    Automated login page

    A browser window should open automatically, and the program will input the username and password you set up in the code.


    Something to keep in mind is that, since you are using your own LinkedIn account, the webpages and information you can access from LinkedIn are limited to what your profile can access. For example, specific users you are not connected with may not be visible to you.


    Once successfully logged in, we can perform many functions; a few that I have used are listed below:


    1. Inputting in a search box

    #Go to webpage
    browser.get('https://www.linkedin.com/jobs/?showJobAlertsModal=false')

    #Find search box
    jobID = browser.find_element_by_class_name('jobs-search-box__text-input')

    #Send input
    jobID.send_keys(job)

    In the instance above, I accessed the page where I am able to search for jobs. The element that I needed the browser to find was the search box. I found the class name by right-clicking on the page and going into “inspect”, which shows the source code of any webpage.


    Source code from the inspect option

    In fact, making use of the source code was the key to performing the functions in my code. By clicking the arrow in the top left corner of the source code window, I could see the code of any element I clicked on the LinkedIn page. I would then insert the class name into my Python code to access it.

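    As a short sketch of that workflow (the HTML line below is illustrative, not copied from LinkedIn's live markup):

    #Suppose the inspect window shows an element like:
    #  <input class="jobs-search-box__text-input" ...>
    #The value of its class attribute becomes the locator in Python:
    search_box = browser.find_element_by_class_name('jobs-search-box__text-input')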

    2. Clicking a button

    Clicking a button can submit the input you entered in the search box above, but it is definitely not limited to that. For my own purposes, I needed to click a button to open the list of filters applied to a job search.


    All LinkedIn job function filters

    This is a simple, two-step process:


    1. Find the element, and
    2. Perform the click method on the element.

    browser.find_element_by_class_name("search-s-facet__button").click()

    I found the element’s class name through the source code, as explained in the previous function, then applied the click method. I wrote this code in one line, but it can also be done in two lines with a variable:


    #These two lines of code are not clicking the same button as the previous instance
    search = browser.find_element_by_class_name('jobs-search-box__submit-button')
    search.click()

    3. Scraping text

    In the code after the image, I tried to obtain the number of job postings returned by a search for a specific type of career. This number was first extracted as a string, so I also converted it into an integer.


    Scraping the number of job postings

    from bs4 import BeautifulSoup

    #Get page source code
    src = browser.page_source
    soup = BeautifulSoup(src, 'lxml')

    #Strip text from source code
    results = soup.find('small', {'class': 'display-flex t-12 t-black--light t-normal'}).get_text().strip().split()[0]
    results = int(results.replace(',', ''))

    Errors and Solutions

    Selenium and Beautiful Soup can work together to perform many functions that are definitely not limited to what I have used above. However, I would like to address a few errors I have run into while writing my own code and also explain my solutions.


    1. Selenium vs. Beautiful Soup

    Since my code made use of both Selenium and Beautiful Soup, I often confused which one I needed to use for a given purpose. I once tried to find and click an element by searching for it in the HTML code, before realizing that I was trying to perform a Selenium function using Beautiful Soup!


    Although the two work together, Beautiful Soup is what allowed me to scrape the data I could access on LinkedIn, while Selenium is the tool I used to automate the process of accessing those webpages and elements on the website.

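    A minimal sketch of that division of labor, assuming browser is the chromedriver created earlier:

    from bs4 import BeautifulSoup

    #Selenium automates the browser: it navigates, clicks, and scrolls
    browser.get('https://www.linkedin.com/jobs/')

    #Beautiful Soup only parses the HTML that Selenium hands it;
    #it has no click() method, so interaction stays on the Selenium side
    soup = BeautifulSoup(browser.page_source, 'lxml')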

    2. Unable to locate element

    This was perhaps the most frustrating error, as there were elements in the source code that my code could not seem to access.


    The following is a block of code that helped me fix this problem:


    import time

    SCROLL_PAUSE_TIME = 2  #seconds; my own value, since the original snippet leaves it undefined

    last_height = browser.execute_script('return document.body.scrollHeight')
    for i in range(3):
        browser.execute_script('window.scrollTo(0, document.body.scrollHeight);')
        time.sleep(SCROLL_PAUSE_TIME)
        #Track the page height after each scroll
        new_height = browser.execute_script('return document.body.scrollHeight')
        last_height = new_height

    The block of code above acts as a page loader that allowed me to reach more of the source code to find the elements I needed. For example, when you first load a page, you cannot see much of the website until you scroll down. The code above allows you to automate that “scrolling down” action.

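    One way to package that pattern is as a helper function; this is a sketch, and the function name and defaults are my own rather than from the original code:

    import time

    def scroll_to_bottom(browser, passes=3, pause=2):
        """Scroll to the bottom of the page several times so lazily
        loaded elements end up in the page source."""
        for _ in range(passes):
            browser.execute_script('window.scrollTo(0, document.body.scrollHeight);')
            time.sleep(pause)

    scroll_to_bottom(browser)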

    The YouTube channel that introduced me to this scrolling code is linked at the end of this story, along with other resources.


    3. Messaging pop-up

    One of the first issues I ran into was the way my messages would pop up and essentially block a chunk of the webpage that I wanted to be able to see.


    https://www.fastcompany.com/40405918/linkedins-new-instant-conversations-are-a-major-messaging-upgrade

    This seemed to be an automatic function in LinkedIn so I needed a way around it. If I were manually using LinkedIn, I would simply click the messaging tab once and the pop-up would minimize itself. I needed to perform this action in my code.


    My solution below checks whether the pop-up has already been minimized, in which case it does not block the other elements on the page. If my code cannot find the minimized messages, it looks for the case where the messages are in fact a pop-up and clicks it away.


    #Import exception check
    from selenium.common.exceptions import NoSuchElementException

    try:
        if browser.find_element_by_class_name('msg-overlay-list-bubble--is-minimized') is not None:
            pass
    except NoSuchElementException:
        try:
            if browser.find_element_by_class_name('msg-overlay-bubble-header') is not None:
                browser.find_element_by_class_name('msg-overlay-bubble-header').click()
        except NoSuchElementException:
            pass

    Conclusion

    Selenium and Beautiful Soup allow us to perform many different functions, including scraping the data I obtained from LinkedIn. Using the functions in this story, I was able to collect the data necessary to generate graphs reflecting the job market as suggested by LinkedIn.


    Translated from: https://medium.com/@sophie14159/linkedin-scrapper-a3e6790099b5
