From 694a4a35b5f64d4f2e7ce6be16e885028ca8e74c Mon Sep 17 00:00:00 2001
From: Kostyantyn Spitsyn
Date: Fri, 18 Nov 2022 11:36:10 +0100
Subject: [PATCH 1/2] chore(NO_ISSUE): update README.md

---
 README.md                   | 123 +++++++++++++++++++++++++-----------
 files/SpreadsheetSuite.xlsx | Bin 0 -> 9576 bytes
 2 files changed, 86 insertions(+), 37 deletions(-)
 create mode 100644 files/SpreadsheetSuite.xlsx

diff --git a/README.md b/README.md
index 7daf069..9176764 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,11 @@
+This library is implemented originally by Katsunori Kanda [potix2/spark-google-spreadsheets](https://github.com/potix2/spark-google-spreadsheets) and all benefits for it should be addressed to him.
+
+The changes which were introduced in this fork:
+
+1. Usage OAuth 2.0 to access Google APIs
+2. Upgrade of Spark version to 3.1.1
+3. Miscellaneous code improvements
+
 # Spark Google Spreadsheets
 
 Google Spreadsheets datasource for [SparkSQL and DataFrames](http://spark.apache.org/docs/latest/sql-programming-guide.html)
@@ -6,11 +14,8 @@ Google Spreadsheets datasource for [SparkSQL and DataFrames](http://spark.apache
 
 ## Notice
 
-The version 0.4.0 breaks compatibility with previous versions. You must
-use a ** spreadsheetId ** to identify which spreadsheet is to be accessed or altered.
-In older versions, spreadsheet name was used.
-
-If you don't know spreadsheetId, please read the [Introduction to the Google Sheets API v4](https://developers.google.com/sheets/guides/concepts).
+Before you start using this library, please read the [Introduction to the Google Sheets API v4](https://developers.google.com/sheets/guides/concepts)
+to understand all basic concepts.
 
 ## Requirements
 
@@ -19,15 +24,15 @@ This library supports different versions of Spark:
 ### Latest compatible versions
 
 | This library | Spark Version |
-| ------------ | ------------- |
-| 0.1.x        | 3.1.1         |
+|--------------| ------------- |
+| 0.1.1        | 3.1.1         |
 
 ## Linking
 
 Using SBT:
 
 ```
-libraryDependencies += "com.github.riskidentdms" %% "spark-google-spreadsheets" % "0.1.0"
+libraryDependencies += "com.github.riskidentdms" %% "spark-google-spreadsheets" % "0.1.1"
 ```
 
 Using Maven:
@@ -36,27 +41,64 @@ Using Maven:
 <dependency>
   <groupId>com.github.riskidentdms</groupId>
   <artifactId>spark-google-spreadsheets_2.12</artifactId>
-  <version>0.1.0</version>
+  <version>0.1.1</version>
 </dependency>
 ```
 
-## SQL API
+## Using Google application credentials
+This library uses OAuth 2.0 to access Google APIs: [Using OAuth 2.0 to Access Google APIs](https://developers.google.com/identity/protocols/oauth2)
+
+Please read this article in order to set up OAuth 2.0 in your Google Service Account: [Setting up OAuth 2.0](https://support.google.com/cloud/answer/6158849)
+
+It's recommended to use JSON Key type.
+JSON file that contains the private key should be downloaded and stored securely because this key can't be recovered if lost.
+
+There are two ways of providing authentication credentials to your application code namely: 
+
+- by providing the path to the JSON file that contains private key described above
+
+```scala
+import com.github.riskidentdms.spark.google.spreadsheets.Credentials
+val credentials = Credentials.credentialsFromFile("path_to_key_json")
+```
+or by adding an input option for the underlying data source
+
+```scala
+.option("credentialsPath", "path_to_key_json")
+```
+
+```sql
+OPTIONS(credentialsPath "path_to_key_json")
+```
+
+- by providing JSON String that contains private key described above
+
+```scala
+import com.github.riskidentdms.spark.google.spreadsheets.Credentials
+Credentials.credentialsFromJsonString("json_string")
+```
+
+```scala
+.option("credentialsJson", "json_key")
+```
+
+```sql
+OPTIONS(credentialsJson 'json_key')
+```
 
-TBD: Should be updated
+## Usage examples
+### SQL API
 
 ```sql
 CREATE TABLE cars
 USING com.github.riskidentdms.spark.google.spreadsheets
 OPTIONS (
   path "/worksheet1",
-  serviceAccountId "xxxxxx@developer.gserviceaccount.com",
-  credentialPath "/path/to/credential.p12"
+  credentialsPath "path_to_key_json"
 )
 ```
 
-## Scala API
-
-TBD: Should be updated
+### Scala API
 
 ```scala
 import org.apache.spark.sql.SparkSession
@@ -68,46 +110,53 @@ val sqlContext = SparkSession.builder()
 
 // Creates a DataFrame from a specified worksheet
 val df = sqlContext.read.
-    format("com.github.riskidentdms.spark.google.spreadsheets").
-    option("serviceAccountId", "xxxxxx@developer.gserviceaccount.com").
-    option("credentialPath", "/path/to/credential.p12").
-    load("/worksheet1")
+    format("com.github.riskidentdms.spark.google.spreadsheets")
+    .option("credentialsPath", "path_to_key_json")
+    .load("/worksheet1")
 
 // Saves a DataFrame to a new worksheet
 df.write.
-    format("com.github.riskidentdms.spark.google.spreadsheets").
-    option("serviceAccountId", "xxxxxx@developer.gserviceaccount.com").
-    option("credentialPath", "/path/to/credential.p12").
-    save("/newWorksheet")
+    format("com.github.riskidentdms.spark.google.spreadsheets")
+    .option("credentialsPath", "path_to_key_json")
+    .save("/newWorksheet")
 ```
 
-### Using Google default application credentials
-TBD: Should be updated
-
-Provide authentication credentials to your application code by setting the environment variable
-`GOOGLE_APPLICATION_CREDENTIALS`. The variable should be set to the path of the service account json file.
-
-
 ```scala
 import org.apache.spark.sql.SparkSession
 
 val sqlContext = SparkSession.builder()
-  .master("local[2]")
+  .master("local[*]")
   .appName("SpreadsheetSuite")
   .getOrCreate().sqlContext
 
 // Creates a DataFrame from a specified worksheet
-val df = sqlContext.read.
-    format("com.github.riskidentdms.spark.google.spreadsheets").
-    load("/worksheet1")
+val df = sqlContext.read
+    .format("com.github.riskidentdms.spark.google.spreadsheets")
+    .option("credentialsPath", "path_to_key_json")
+    .load("/worksheet1")
 ```
 
 More details: https://cloud.google.com/docs/authentication/production
 
-## License
+## Local testing
+
+You have to do some preparations in order to be able to run tests from your machine:
 
-Copyright 2016-2018, Katsunori Kanda
+1. Upload `files/SpreadsheetSuite.xlsx` to the [Google Spreadsheet](https://docs.google.com/spreadsheets).
+
+2. Export the spreadsheet ID of previously uploaded document as `TEST_SPREADSHEET_ID` environment variable.
+The spreadsheet ID you can find in the URL of the opened document. The pattern of the URL looks like that:
+
+`https://docs.google.com/spreadsheets/d/`
+
+3. Provide Google API key.
+As it's described above, you have to set up OAuth 2.0 in your Google Service Account: [Setting up OAuth 2.0](https://support.google.com/cloud/answer/6158849)
+When you create a Service Account key, please keep in mind that you have to use JSON Key type.
+JSON file that contains the private key should be downloaded and stored securely because this key can't be recovered if lost.
+Export the content of this JSON file as `OAUTH_JSON` environment variable. + +## License Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at diff --git a/files/SpreadsheetSuite.xlsx b/files/SpreadsheetSuite.xlsx new file mode 100644 index 0000000000000000000000000000000000000000..5463bb521bd96af3e67cc6fe4a1f232d09d105e6 GIT binary patch literal 9576 zcmeHNWmJ{znx;d#yEokp(jhG=N~gr8yOHjOO-O^JG)PHzNs4rL3W$^<4B$Bhaan zN&gzoM`t2wgx`3`{ILzbL7y2f6&J#mDhViFf_HoQvWG$haV5~z?b1$5iT>b+=efc^ zneSC@?*F9QJCE4|PNdMmP%q}z8&pB1(;R0c;r!XB))bP+XP~!^=X`a(g07FNnugw@ zLT91mLlt?guA(d~9IOG*4;mZ{3>Fd$O!0rKO16Ki^8ZSe?EhBf|CK5^{;kTtt4e2U zJ9B$eBO?cU=DS}%b7h>YRR<}mu)+wR+MeMG*LF{(c45R@W1J^HifM^^Xj;|04L47_ zAvguVR zIU-fnl->yZ!m}XIkSc3#LSkkTQU%1m^CON>-WCo5GkQpdgD_A|?Z<5aRXI~T(oSPU z$S_eGHPSLHQ)h47=nt-*nxXyarf-|tY1@!A7_@qoh@=<1RZMxmibmmxhDNa}f3Vm! zTZxIUJx5cn^E}|xbD}Mbh>QyF%N+gG+3eM*^2MPp!g#f@jOjQ;P8Xql3v7DayI|JG$0)u z1YY8&EFbcFBsrg_*Xc!G@NgF&=`ylZ9L^82MY;JbI~%p!=4li%!Ze?oJe~EId79#9 z$-8GpT{|O-`^IFty^a``ebml^3bTaa62#&jHxUrbu`l6ol}oHAm4{+M3^*E{O^L`} zYG3~_?9rjiiNan*5WvGrpS1?fQrW-Vhm1ZnZ`%ZkjiC`yRRz$`f_vKZR-VJKpW8Oo z<0W2&QhQSDA_CecDHsapbPIVm9K<~SKoOT`ID}Fbp$7Er`QRsseHx5p@7_aAFs5Hs zXfG;+BFk$dO@Cbt7y&=1SEFJ@1OaTWW|}%;O|ZqaL&`!=Fm82FQn359>gEpVZuIa! 
z)~b6txO3&7wJLtl>J1BKK!)A|cm+x~U=0@h6Nf72yy_u_Pccamd!xMTse_Z}}KlMKHS-}UzHrd#^7v%H# zUnS>kY-4%J`bQyh<>s^s=^%txQds~3Ix)%_gmrpS+mg0WmFQ4L+GzB`OW(~~Q5ok` zL;=L3OF@ubYXPgtG0OAeP_)rwr$+Mh7bV?H7`ze{OR*%NAl}{hcaVs^&~ZZ}PA`L( zB?(yp@{ZLbvGGnvPY~XgrZH34%g`v&)iL@4-A_Lur%=@$H3{0NoNTP$?f~V|vjyj?F*?ieuhjBBY2BS%*#CiWxk>t3TUl4Y4 za>pc$uBHo5kTqg+#o-BIGw1n4JGcG$&=McQ;?VL4iE&r$Bd03p2^$eUW3L*6Bm^FO zL>^7nPK14rWs6oDDcl*fj_L4aRJM02fmN&>WG1GtNcqx`yJ;Vw%)TxTqoI8wfjUZP z`<$Zp%4J*V*89YLoauQWz`!_Q|H}LRG6L^>@J|s~gMbE{iGZ7z5onE;BblNi3yE4- zu)suM(i*z_zC%L1Kru65f0qp9r7Znh=~nYgN@aV*&VqpzFCS+ zB&AGabNk^7eV{tSB1LKi9&e^b>NHv?J5%!*cNh^(f2b3BDT9e13PWmq^|Y$mn7di@ zeCY=z(JvGdoCl$b&y2J>O}>9=E&X)(gf?g(68Q z5wj?l42!3Yc;#I5vP0%5SD>vAFg+3(%1c>zQvw0#*o*zjMyX$NBIePumsP`Lv(u$^B&6q}*Dq!G!(qO=eqTPOe z#=L*&la@bqA;SbqS^)!YnP`fNDsW;w3yg+u5T1dCm35D6G9i$Gy68@ywJMCv_g?3 zKK4e-d=aWWav%=Uo@f`7vc1m}!`nThH=93h)uh3^F0b0mxc;yS*&=ncs}Pm#6GSz^ zn(2c<1o@a}|1~p_;|I{1UXOtnBd};$l%`OhoTiX%fl%hHJG!cJhn)1Q&suxC6J4=7 zlLEe!j7w74;W_=iPCR2JR!FBo(1SxOiK!xqxl?mvo9$M-7ps=XD7OK)1yz&Hchdua zA^+_xbQgwy%0kaY#FL#UL~E5%SO6bkYow)C-qdmvbonAAvI`v}c_|6x$ISNywc*3_ z4tvKaJx9rlvjNKQ3Y4Rcni7_f!Lvgb^sCzDp&7o1|FHxi8w`*B4R>B7IMXee#b&oR zC#U*Vdy>Q=e9%rPQOMJTegA8&Os%Bi33QJVd%eNEEt1)q{md=XPnP5uUX7?Ts}xwW zfNzo6P(zCi3U%u@SqR%eslwwf3uV6?0sZYP^m9A_joM}3WuZl%z9xx+nFR31MHABt zj*sW(cR9rSl2EDOeXU!9yw$s!Y))a?GmgDBso??4C75k7zz)^d**1^G!08M7UF#J_ zhllsq;Gw%?*{DV18J^ERw_9Y`V1Lytvltk5O%cFO7)UXmI#=5DqXO;(;7Zbc&}$=J*-5tOm{2hDNT<@7 zC`qGzCF8kV0o4;58`x#P1B6+49@+A<`Zneve0x^p_lB3b<5})517&iF)ta@-BXJYm zbQgXnjG~xp7LsQAlQNeZ^^C${c>}OzUIpDa%z05^$O`c+pC6A5NnlDLFD?;&&T%}; z%LpO9dVIaFj+~x*CS2R=YO^@h)E%f6OR2%>^SKdh4`$KFfDl0$UomQ$U+%Et7@4xD zdF`kY@SaIML-vr=qPu-pZAu3YSNXy;CJGyAbSOcU!=SjQ-I$>vdriN8h4b>n{@NAXCpsx0X(PON z83XOm?06f!?4>9h1c(oVzA-Qa)b)rAjfNc?$5kZh8}>SXhw@X`aM4H-EK7+PLv?o0 zGN^4PqorFYDw~6}fuG0Nzc17uBQ!UbDfS&IDoZMDM36r!?a~I|OG50J7%0&Di&$nn zO8Ih=_7xg`^Q^13(RyK_AI%@4?3;qkpcK1z6!Tc416*mujZ$IHDtXi}7 z*jEch6)*+($o1A`0+T0vE`EzdX7%;c8rrXFyWAlgceR;4YW>bM8imkzElj3~8W&eg 
zI$TV+8}^I+CNur>_Fj3{SS~JFnL9v+*nL|!atRad9xA+iojN0lcNKTs1hKX9cE8Wt zgHFmHGN56DHjh$32aVqfr89}>B)AIH^pTE{l@0QfzR7$+%PvLXn@D9DlrJH%Wumg^ zyiYrt*mgpk%wn|wW@^lnZC!-BtV{KmY5i-_OQMX1yI2NFDm@UG^@>68@^-s9gjLqA zE_A!wbF{DwnhopRPVu7Z>L$mh(Oh5Mun|>G%f6sncFqFDWpA3d$j$61Zhg2nc1dlD|4J;<6_9@2r zA;c})7BHtD^Ni(3VW^NS<+69)>zQ3+JR0-r>7luAbt?%IJp+V0IIqRg;Me?f)81NR z^=(aZ`wmFIdX|kMeGd0|E9SbDPU&in2?IAzlxrGM#meF{S6I{TTftBYX8s@ESx%k*|>b9#2Y2=Oj-gF*5K*A0tota@5(fk zNb`2p6u7(FawOX%2gqIp=%R3B&3HLKz5q?oLUQT@CD7qZ&>lr{x_Y`c&&PJUBKw*{ zR#KU4_cqrTrXKg5)6^xWHXa<%IPOb3&E@7LvXl~r@7Iwq@5j7fkYSpWmRSTQo2X0k zm3jk8@LDHxUIOrhJrTbEA9aOt#H&1APnj@%uC(~<>)E%m^_ZGuZpw}I%;Mu#K3?4s z{)L_w%J>JI^1xx!hI9n|T>IkySg?RK?%Zn8+An3YL!aG?GUFG7>AF~2c5TZc;W&p~ z+mxSg04rqGpNr=UFWTWluCu^42=)YsOR~n$z_RHIXUC;VfgtNOgtLcl|5B+YT)GPA z_t9b|9pDMEBOQQVqms`i>>Y%#BcN77UULGk4~wILwR28eJ3hQ3**f1fdv!w`z`_|< z_D(9^DF&bGTBop}UPFCHPFN(Mp5~~=QdEskI;|Aoh&wfF zg(|8+nU;tB@s-ce6g^j^it}|pyvP^cv{yNLd`ZKjfw1XSV4C`c3e?&o z(wKyjl*WoQjwJoxO`1$*J^^MtN|ZxA(S`FVH*B?%BZf}iEn60S)0FcbC2tp3LbkOKOxtZdW{ zu~H1GayU_Pz}d|Fj&qclAM6z!F$u@8Cgp+;G$-S=1%HnOtB$}Z1(}wB;nM7!DXI0`koQf?K}o8$Y3@o5Kl*xK zLEidTYc$+3&Jo<;+u+_qGM7#@`9ku+96emvRZC1TxKEJ z0wOdkZv;vM1;X|*6&m%98n#LigGH#oz-wO4`o~YPvXnzI9pxHuO{mHjQRvcjQ6|%c zn9iQEbGxuh>ADYGzvjnrUto?`uySV=#Lh|!iTdc^l|vL$M^MWhuv&h!d+M4lZwOmu zki2g%97#Hta3CbTAZxt$-MC41FPcIU`UBHA(-GR$tXK$48SJq8p&cUsL0cGcBR(Xv zTlu0N$q-BPDVYxlUP_0Nt61YBMQDoooI*NP80r~vr(b_NXzkrLYqJyo70j*WvTI|9 zJZ~}s#b0Iy{GXQl$*tbE-(5r4G(IZR!Ga01bd#~$>uZ4xyhaT35r|Gj$KaxFGoWf| zsiE=arqSg+JxpeqvUXmw>kBE;lzo%QO6RXx6%xV;Ijti!np}Cj3W3IK4P79 zAkvhwO*dmMof*OGjqqJ6)RerHSgumdM@E{*ZKF-^!lm(y04(M6b5{WZ!CNeQ^Ese) zW?h1CI_rrET9OXvpgY!>4ia!Umxz6IL>l-EM{iPZaZvNw^4*$_eiy-^5phAz_u+qk-$~OAp zS>>!$7D&XIHDYAK`M{#~{e9&Kgv7Xu<5zjbiLM4HU-e1pGkfjyOtH4x@WCff8wJ1+ zlr$KrD{1t)z#OY-nR)t>tVD(CTG0*syagqk*Z2#FP4L+4Oh4vP2n{4jSl7jwfnY9} zhR^0&2oyb&ibo+l-XQ4h?;Mh(yth(xA$QCe?Iyv}4FSknvpiCxx&jC-1a~uJwHC#C z@vAl6)sh_REYixG8=<-Jn*W}ZzHOs5#jLFyjI11VpSju?*=yg%`(S*ZdZ5a{dRFz#dGCsa`lO#<>`g-O8)_2yPT7s!#1p 
zV}yZWRr0Y=XGsJ<8~YT2w7^?8B&^&wyN1Hg3dtM0;*N4V<>68`bA^FjGCk3tRLSUP zq%b985hnv~b1aK(K`%yFX&33PDA3CqW*u5SdY!VqM7_LQIlns1spyP zEByXU3@AUp&upFuD_Evmv(0@Q#^a&lfM+)b!Mo#X!67igemx>^|HS+D5rN;=fA_$E zqU^tlB75__e?Yn8e*blpgq!m52b4SR=3hryzG*xEfO0pyK18{H!t}dd7;eVo8XFZoHohf7g9D@gIwlqAc`HsDgoE+`L$CLZ6KE_Ub Date: Fri, 18 Nov 2022 13:45:23 +0100 Subject: [PATCH 2/2] address comments --- README.md | 21 ++++++++------------- 1 file changed, 8 insertions(+), 13 deletions(-) diff --git a/README.md b/README.md index 9176764..9f8704d 100644 --- a/README.md +++ b/README.md @@ -2,9 +2,9 @@ This library is implemented originally by Katsunori Kanda [potix2/spark-google-s The changes which were introduced in this fork: -1. Usage OAuth 2.0 to access Google APIs -2. Upgrade of Spark version to 3.1.1 -3. Miscellaneous code improvements +1. Usage OAuth 2.0 to access Google APIs; +2. Upgrade Spark to version 3.1.1; +3. Miscellaneous code improvements. # Spark Google Spreadsheets @@ -19,8 +19,6 @@ to understand all basic concepts. ## Requirements -This library supports different versions of Spark: - ### Latest compatible versions | This library | Spark Version | @@ -50,10 +48,10 @@ This library uses OAuth 2.0 to access Google APIs: [Using OAuth 2.0 to Access Go Please read this article in order to set up OAuth 2.0 in your Google Service Account: [Setting up OAuth 2.0](https://support.google.com/cloud/answer/6158849) -It's recommended to use JSON Key type. -JSON file that contains the private key should be downloaded and stored securely because this key can't be recovered if lost. +Keep in mind that you have to use the JSON key type, when you create a Service Account key. +A JSON file that contains the private key should be downloaded and stored securely because this key can't be recovered if lost. 
 
-There are two ways of providing authentication credentials to your application code namely: 
+There are two ways of providing authentication credentials to your application code, namely:
 
 - by providing the path to the JSON file that contains private key described above
 
@@ -150,11 +148,8 @@ The spreadsheet ID you can find in the URL of the opened document. The pattern o
 
 `https://docs.google.com/spreadsheets/d/`
 
-3. Provide Google API key.
-As it's described above, you have to set up OAuth 2.0 in your Google Service Account: [Setting up OAuth 2.0](https://support.google.com/cloud/answer/6158849)
-When you create a Service Account key, please keep in mind that you have to use JSON Key type.
-JSON file that contains the private key should be downloaded and stored securely because this key can't be recovered if lost.
-Export the content of this JSON file as `OAUTH_JSON` environment variable.
+3. Export the JSON private key as the `OAUTH_JSON` environment variable.
+Please see details [here](#using-google-application-credentials).
 
 ## License