Most of your data will be transferred via stream or batch process into your Data Warehouse or Data Lake. You will choose stable ETL or ELT processes to bring the majority and most relevant data from your legacy systems, CRM, ERP, IOT, etc.
您的大多数数据将通过流或批处理过程传输到Data Warehouse或Data Lake中。 您将选择稳定的ETL或ELT流程,以从旧系统,CRM,ERP,IOT等中获取大多数和最相关的数据。
However, in the world of Data Science and Self Service BI your Data Analyst/Scientist and BI users will also need data from external systems, open API’s, data from sheets, websites and so on. Therefore, tools like Google Sheets could be a great solution to implement a data flow easily. This article should give you a first idea how you can combine Google’s powerful BigQuery with easy to use Googles Sheets and its’ endless add-ons.
但是,在数据科学和自助服务BI的世界中,您的数据分析师/科学家和BI用户还将需要来自外部系统的数据,开放API,表,网站的数据等等。 因此,诸如Google表格之类的工具可能是轻松实现数据流的绝佳解决方案。 本文应该让您初步了解如何将Google强大的BigQuery与易于使用的Googles表格及其无尽的附加组件结合在一起。
Normally you would connect your Google Sheets to BigQuery and load table data into your sheets to visualize it in bar charts, pie charts, etc. But you can also use sheets as a source for BigQuery and vice versa. You don’t have to worry about connectors etc. There are already build in and ready to use.
通常,您会将Google表格与BigQuery关联,然后将表格数据加载到表格中,以条形图,饼图等形式对其进行可视化。但是您也可以将表格用作BigQuery的来源,反之亦然。 您不必担心连接器等。已经内置并且可以使用。
Integration of Google Sheets and BigQuery Google表格和BigQuery的集成During my working experiences, I came across Google Sheets and its’ great add-ons that provide me with possibilities for automated data processes for BigQuery:
在工作过程中,我遇到了Google表格及其强大的附加功能,这些功能为我提供了BigQuery自动化数据处理的可能性:
Get data from a website or a file
从网站或文件获取数据
With build-in-functions you can fetch data from any website (e.g. HTML tables) or linked files like CSV or TSV.
使用内置功能,您可以从任何网站(例如HTML表格)或CSV或TSV等链接文件中获取数据。
Example of a data import of a CSV via IMPORTDATA-function 通过IMPORTDATA函数导入CSV数据的示例Transform your data
转换数据
For geocoding my address data I once had to merge three columns into one, separated by commas. Therefore, I used the Sheets’ add-on Power-Tools:
为了对地址数据进行地址解析,我曾经不得不将三列合并为一列,并以逗号分隔。 因此,我使用了Sheets的附加电动工具 :
Google Sheets and Power Tools to transform data Google表格和Power Tools可以转换数据With this great add-on, you can easily merge and combine data, find duplicates, randomize data and much more without any coding. This could be a great advantage for users without a deeper knowledge of SQL. So you don’t have to transform the data within BigQuery — which costs per Query — you can do this step beforehand, load the data to BigQuery and visualize it e.g. via Data Studio.
有了这个出色的插件,您可以轻松地合并和合并数据,查找重复项,将数据随机化等等,而无需任何编码。 对于没有更深入SQL知识的用户来说,这可能是一个巨大的优势。 因此,您不必在BigQuery中转换数据(每个查询的成本),您可以预先执行此步骤,将数据加载到BigQuery并通过Data Studio可视化。
Geocode data
地理编码数据
After transforming my data, I wanted to geocode it. Another great add-on came along: Geocode by Awesome Table
转换数据后,我想对其进行地理编码。 另一个很棒的附加组件出现了: Awesome Table提供的地址解析
Geocode by Awesome Table to get Latitude and Longitude from address data 通过Awesome Table进行地址解析以从地址数据中获取经度和纬度So this add-on provided me a service to get the latitude and longitude of an address. This is a great feature because for map visualizations you will often need coordinates.
因此,此附加组件为我提供了获取地址纬度和经度的服务。 这是一个很棒的功能,因为对于地图可视化来说,您通常需要坐标。
Get data from an API
从API获取数据
There are a lot of open API’s you can use to get data you don’t have in your Data Lake yet. I searched for an API to get synonyms for certain words — to combine them with ERP data in BigQuery and to find suspicious bookings. I also used Google Sheets to call the API via HTTP request. You can do it, for example, with App Script on your own or also use and add-on, here I liked API Connector.
您可以使用许多开放的API来获取Data Lake中还没有的数据。 我搜索了一个API,以获取某些单词的同义词-将它们与BigQuery中的ERP数据结合起来并查找可疑的预订。 我还使用Google表格通过HTTP请求来调用API。 例如,您可以自己使用App Script进行操作,也可以使用附加功能,这里我喜欢API Connector。
API Connector add-on to get data via HTTP Request API Connector插件可通过HTTP请求获取数据Statistics and Machine Learning
统计与机器学习
Another use case could be statistics and machine learning. Could Google Sheets be a suitable solution for these areas? Yes, it can be. The following two tools can be very helpful:
另一个用例可能是统计和机器学习。 Google表格可以适合这些领域吗? 是的,可以。 以下两个工具可能非常有用:
Solver: for advanced statistics operations
求解器:用于高级统计操作
BigML: for machine learning via drag and drop
BigML:通过拖放进行机器学习
Results could also be loaded into BigQuery and be enriched with data from others sources.
结果也可以加载到BigQuery中,并用其他来源的数据进行充实。
https://support.google.com/docs/answer/3093335?hl=de
https://support.google.com/docs/answer/3093335?hl=de
https://support.google.com/docs/answer/9702507?hl=en
https://support.google.com/docs/answer/9702507?hl=zh_CN
https://bigml.com/tools/bigml-gas
https://bigml.com/tools/bigml-gas
https://gsuite.google.com/marketplace/app/api_connector/95804724197
https://gsuite.google.com/marketplace/app/api_connector/95804724197
https://gsuite.google.com/marketplace/app/geocode_by_awesome_table/904124517349
https://gsuite.google.com/marketplace/app/geocode_by_awesome_table/904124517349
翻译自: https://medium.com/analytics-vidhya/use-google-sheets-as-a-data-creek-for-your-data-lake-1e750b4acc51
相关资源:jdk-8u281-windows-x64.exe