From 58929c759d116a8491ee734ba19c102ed9ee16f1 Mon Sep 17 00:00:00 2001
From: cohensam
Date: Thu, 18 Sep 2014 03:18:49 -0400
Subject: [PATCH 1/8] Adding first attempt files.

---
 Part1/PROJ_WIN/CIS565_PROJ_1.suo              |  Bin 14336 -> 20480 bytes
 .../CIS565_PROJ_1/CIS565_PROJ_1.vcxproj       |    4 +-
 Part1/PROJ_WIN/CIS565_PROJ_1/vc100.pdb        |  Bin 446464 -> 471040 bytes
 Part1/PROJ_WIN/src/kernel.cu.deps             | 1134 ++++++++--------
 Part1/src/kernel.cu                           |   81 +-
 Part2/Samantha Cohen-Project-1-2.sln          |   20 +
 .../Samantha Cohen-Project-1-2.vcxproj        |   73 ++
 ...Samantha Cohen-Project-1-2.vcxproj.filters |   22 +
 .../Samantha Cohen-Project-1-2/matrix_math.cu |  123 ++
 9 files changed, 884 insertions(+), 573 deletions(-)
 create mode 100644 Part2/Samantha Cohen-Project-1-2.sln
 create mode 100644 Part2/Samantha Cohen-Project-1-2/Samantha Cohen-Project-1-2.vcxproj
 create mode 100644 Part2/Samantha Cohen-Project-1-2/Samantha Cohen-Project-1-2.vcxproj.filters
 create mode 100644 Part2/Samantha Cohen-Project-1-2/matrix_math.cu

diff --git a/Part1/PROJ_WIN/CIS565_PROJ_1.suo b/Part1/PROJ_WIN/CIS565_PROJ_1.suo
index ad46c84a9a00137f71e370ba57e24d2f64f37088..956dc6b04cf47f14a3ed9790115c66d1102b5db7 100644
Binary files a/Part1/PROJ_WIN/CIS565_PROJ_1.suo and b/Part1/PROJ_WIN/CIS565_PROJ_1.suo differ
diff --git a/Part1/PROJ_WIN/CIS565_PROJ_1/CIS565_PROJ_1.vcxproj b/Part1/PROJ_WIN/CIS565_PROJ_1/CIS565_PROJ_1.vcxproj
index 4c88226..6b18a53 100644
--- a/Part1/PROJ_WIN/CIS565_PROJ_1/CIS565_PROJ_1.vcxproj
+++ b/Part1/PROJ_WIN/CIS565_PROJ_1/CIS565_PROJ_1.vcxproj
@@ -30,7 +30,7 @@
-
+
@@ -114,6 +114,6 @@
-
+
\ No newline at end of file
diff --git a/Part1/PROJ_WIN/CIS565_PROJ_1/vc100.pdb b/Part1/PROJ_WIN/CIS565_PROJ_1/vc100.pdb
index b662f859b27e4fb8b4753032a64cb6eb2e33341b..500ca4985617140441788e7690b4247b046dd4e1 100644
Binary files a/Part1/PROJ_WIN/CIS565_PROJ_1/vc100.pdb and b/Part1/PROJ_WIN/CIS565_PROJ_1/vc100.pdb differ
diff --git a/Part1/src/kernel.cu b/Part1/src/kernel.cu
+	glm::vec3 r = glm::vec3(their_pos->x - my_pos.x, their_pos->y - my_pos.y, their_pos->z - my_pos.z);
+	float r_ab1 = pow(r.x,2.f) + pow(r.y,2.f) + pow(r.z,2.f);
+	float r_ab2 = r_ab1 * sqrt(r_ab1);
+	glm::vec3 a = glm::vec3(3e8*starMass*r.x/r_ab2, 3e8*starMass*r.y/r_ab2, 3e8*starMass*r.z/r_ab2);
+	//float f = (G * 3e8 * starMass) / pow(r_ab, 2.f);
+	return a;
 }
 
 // TODO : update the acceleration of each body
 __global__ void updateF(int N, float dt, glm::vec4 * pos, glm::vec3 * vel, glm::vec3 * acc)
 {
 	// FILL IN HERE
+
+	/*int width = 800;
+	int height = 800;
+	float s_scale = 2e2;
+
+	int index = threadIdx.x + (blockIdx.x * blockDim.x);
+	int x = index % width;
+	int y = index / width;
+
+	float w2 = width / 2.0;
+	float h2 = height / 2.0;
+
+	float c_scale_w = width / s_scale;
+	float c_scale_h = height / s_scale;
+
+	glm::vec3 a = accelerate(N, glm::vec4((x-w2)/c_scale_w,(y-h2)/c_scale_h,0,1), pos);
+	acc->x = acc->x + a.x;
+	acc->y = acc->y + a.y;
+	acc->z = acc->z + a.z;*/
+
+
+	int width = 800;
+	int height = 800;
+	float s_scale = 2e2;
+	int index = threadIdx.x + (blockIdx.x * blockDim.x);
+
+	for (int i = 0; i < N; i++) {
+		int x = i % width;
+		int y = i / width;
+		float w2 = width / 2.0;
+		float h2 = height / 2.0;
+		float c_scale_w = width / s_scale;
+		float c_scale_h = height / s_scale;
+		glm::vec4 * p = new glm::vec4();
+		p->x = pos[i].x;
+		p->y = pos[i].y;
+		p->z = pos[i].z;
+		glm::vec3 a = accelerate(N, glm::vec4((x-w2)/c_scale_w,(y-h2)/c_scale_h,0,1), p);
+		acc[i].x = acc[i].x + a.x;
+		acc[i].y = acc[i].y + a.y;
+		acc[i].z = acc[i].z + a.z;
+	}
+
 }
 
 // TODO : update velocity and position using a simple Euler integration scheme
-__global__ void updateS(int N, float dt, glm::vec4 *
pos, glm::vec3 * vel, glm::vec3 * acc) +__global__ void updateS(int N, float dt, glm::vec4 * pos, glm::vec3 * vel, glm::vec3 * acc) //DONE { - // FILL IN HERE + /*pos->x = pos->x + vel->x * dt; + pos->y = pos->y + vel->y * dt; + pos->z = pos->z + vel->z * dt; + pos->w = pos->w + dt; + + vel->x = vel->x + acc->x * dt; + vel->y = vel->y + acc->y * dt; + vel->z = vel->z + acc->z * dt;*/ + + for (int i = 0; i < N; i++) { + pos[i].x = pos[i].x + vel[i].x * dt; + pos[i].y = pos[i].y + vel[i].y * dt; + pos[i].z = pos[i].z + vel[i].z * dt; + //pos[i].w += 1;//= pos[i].w + dt; + + vel[i].x = vel[i].x + acc[i].x * dt; + vel[i].y = vel[i].y + acc[i].y * dt; + vel[i].z = vel[i].z + acc[i].z * dt; + } } // Update the vertex buffer object @@ -180,6 +250,9 @@ void initCuda(int N) void cudaNBodyUpdateWrapper(float dt) { // FILL IN HERE + dim3 fullBlocksPerGrid((int)ceil(float(numObjects)/float(blockSize))); + //updateF<<>>(numObjects, dt, dev_pos, dev_vel, dev_acc); + updateS<<>>(numObjects, dt, dev_pos, dev_vel, dev_acc); } void cudaUpdateVBO(float * vbodptr, int width, int height) diff --git a/Part2/Samantha Cohen-Project-1-2.sln b/Part2/Samantha Cohen-Project-1-2.sln new file mode 100644 index 0000000..38db184 --- /dev/null +++ b/Part2/Samantha Cohen-Project-1-2.sln @@ -0,0 +1,20 @@ + +Microsoft Visual Studio Solution File, Format Version 11.00 +# Visual Studio 2010 +Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "Samantha Cohen-Project-1-2", "Samantha Cohen-Project-1-2\Samantha Cohen-Project-1-2.vcxproj", "{4268F3F4-F228-424F-B1EB-0E8E3109FA7E}" +EndProject +Global + GlobalSection(SolutionConfigurationPlatforms) = preSolution + Debug|Win32 = Debug|Win32 + Release|Win32 = Release|Win32 + EndGlobalSection + GlobalSection(ProjectConfigurationPlatforms) = postSolution + {4268F3F4-F228-424F-B1EB-0E8E3109FA7E}.Debug|Win32.ActiveCfg = Debug|Win32 + {4268F3F4-F228-424F-B1EB-0E8E3109FA7E}.Debug|Win32.Build.0 = Debug|Win32 + 
{4268F3F4-F228-424F-B1EB-0E8E3109FA7E}.Release|Win32.ActiveCfg = Release|Win32 + {4268F3F4-F228-424F-B1EB-0E8E3109FA7E}.Release|Win32.Build.0 = Release|Win32 + EndGlobalSection + GlobalSection(SolutionProperties) = preSolution + HideSolutionNode = FALSE + EndGlobalSection +EndGlobal diff --git a/Part2/Samantha Cohen-Project-1-2/Samantha Cohen-Project-1-2.vcxproj b/Part2/Samantha Cohen-Project-1-2/Samantha Cohen-Project-1-2.vcxproj new file mode 100644 index 0000000..f310f12 --- /dev/null +++ b/Part2/Samantha Cohen-Project-1-2/Samantha Cohen-Project-1-2.vcxproj @@ -0,0 +1,73 @@ + + + + + Debug + Win32 + + + Release + Win32 + + + + {4268F3F4-F228-424F-B1EB-0E8E3109FA7E} + SamanthaCohenProject12 + + + + Application + true + MultiByte + + + Application + false + true + MultiByte + + + + + + + + + + + + + + + + Level3 + Disabled + + + true + cudart.lib;%(AdditionalDependencies) + + + + + Level3 + MaxSpeed + true + true + + + true + true + true + + + + + Document + + + + + + + \ No newline at end of file diff --git a/Part2/Samantha Cohen-Project-1-2/Samantha Cohen-Project-1-2.vcxproj.filters b/Part2/Samantha Cohen-Project-1-2/Samantha Cohen-Project-1-2.vcxproj.filters new file mode 100644 index 0000000..f07abd3 --- /dev/null +++ b/Part2/Samantha Cohen-Project-1-2/Samantha Cohen-Project-1-2.vcxproj.filters @@ -0,0 +1,22 @@ + + + + + {4FC737F1-C7A5-4376-A066-2A32D752A2FF} + cpp;c;cc;cxx;def;odl;idl;hpj;bat;asm;asmx + + + {93995380-89BD-4b04-88EB-625FBE52EBFB} + h;hpp;hxx;hm;inl;inc;xsd + + + {67DA6AB6-F800-4c08-8B7A-83BB121AAD01} + rc;ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe;resx;tiff;tif;png;wav;mfcribbon-ms + + + + + Source Files + + + \ No newline at end of file diff --git a/Part2/Samantha Cohen-Project-1-2/matrix_math.cu b/Part2/Samantha Cohen-Project-1-2/matrix_math.cu new file mode 100644 index 0000000..44a506f --- /dev/null +++ b/Part2/Samantha Cohen-Project-1-2/matrix_math.cu @@ -0,0 +1,123 @@ +#include +#include + +int main () { //You need to fix linker 
issues so that your methods will be recognized, but in the meantime you can check out individual functions by putting them in main + float * M;//[5][5]; + float * N; + float * P; + int size = 5*5*sizeof(float); + int numBlocks = 1; + dim3 threadsPerBlock(5,5); + mat_add(M,N,P,5); + printf("SAMMMMMMMMMMMMMMMMMMMMMMMMMMMM"); + getchar(); + return 0; +} + +__global__ void mat_sub_kernel(float* Md, float* Nd, float* Pd, int width) { + int tx = threadIdx.x; + int ty = threadIdx.y; + + float Pvalue = Md[ty*width+tx] - Nd[ty*width+tx]; + + Pd[ty*width+tx] = Pvalue; +} + +void mat_sub(float* M, float* N, float* P, int width) { + int size = width * width * sizeof(float); + float *Md; + float *Nd; + float *Pd; + //Matrix 1 + cudaMalloc((void**)&Md, size); + cudaMemcpy(Md,M,size,cudaMemcpyHostToDevice); + //Matrix 2 + cudaMalloc((void**)&Nd, size); + cudaMemcpy(Nd,N,size,cudaMemcpyHostToDevice); + + cudaMalloc((void**)&Pd, size); + + dim3 dimBlock(width, width); + dim3 dimGrid(1, 1); + + mat_sub_kernel<<>>(Md, Nd, Pd, width); + + cudaMemcpy(P, Pd, size, cudaMemcpyDeviceToHost); + cudaFree(Md); + cudaFree(Nd); + cudaFree(Pd); +} + +__global__ void mat_add_kernel(float* Md, float* Nd, float* Pd, int width) { + int tx = threadIdx.x; + int ty = threadIdx.y; + + float Pvalue = Md[ty*width+tx] + Nd[ty*width+tx]; + + Pd[ty*width+tx] = Pvalue; +} + +void mat_add(float* M, float* N, float* P, int width) { + int size = width * width * sizeof(float); + float *Md; + float *Nd; + float *Pd; + //Matrix 1 + cudaMalloc((void**)&Md, size); + cudaMemcpy(Md,M,size,cudaMemcpyHostToDevice); + //Matrix 2 + cudaMalloc((void**)&Nd, size); + cudaMemcpy(Nd,N,size,cudaMemcpyHostToDevice); + + cudaMalloc((void**)&Pd, size); + + dim3 dimBlock(width, width); + dim3 dimGrid(1, 1); + + mat_add_kernel<<>>(Md, Nd, Pd, width); + + cudaMemcpy(P, Pd, size, cudaMemcpyDeviceToHost); + cudaFree(Md); + cudaFree(Nd); + cudaFree(Pd); +} +__global__ void mat_mult_kernel(float* Md, float* Nd, float* Pd, int width) { + 
int tx = threadIdx.x; + int ty = threadIdx.y; + + float Pvalue = 0; + + for (int k = 0; k < width; ++k) { + float Mdelement = Md[ty * width + k]; + float Ndelement = Nd[k * width + tx]; + Pvalue += Mdelement * Ndelement; + } + + Pd[ty*width+tx] = Pvalue; +} + +void mat_mult(float* M, float* N, float* P, int width) { + int size = width * width * sizeof(float); + float *Md; + float *Nd; + float *Pd; + //Matrix 1 + cudaMalloc((void**)&Md, size); + cudaMemcpy(Md,M,size,cudaMemcpyHostToDevice); + //Matrix 2 + cudaMalloc((void**)&Nd, size); + cudaMemcpy(Nd,N,size,cudaMemcpyHostToDevice); + + cudaMalloc((void**)&Pd, size); + + dim3 dimBlock(width, width); + dim3 dimGrid(1, 1); + + mat_mult_kernel<<>>(Md, Nd, Pd, width); + + cudaMemcpy(P, Pd, size, cudaMemcpyDeviceToHost); + cudaFree(Md); + cudaFree(Nd); + cudaFree(Pd); +} + From 97c7a74c89c97edcc0f3d831a2b38fc3d58540b5 Mon Sep 17 00:00:00 2001 From: cohensam Date: Sat, 20 Sep 2014 01:56:01 -0400 Subject: [PATCH 2/8] Finished GPU Matrix math, but not CPU. 
--- Part1/src/kernel.cu | 4 +- .../Samantha Cohen-Project-1-2/matrix_math.cu | 87 ++++++++++++++++--- 2 files changed, 77 insertions(+), 14 deletions(-) diff --git a/Part1/src/kernel.cu b/Part1/src/kernel.cu index 2871655..f778d12 100644 --- a/Part1/src/kernel.cu +++ b/Part1/src/kernel.cu @@ -131,7 +131,7 @@ __global__ void updateF(int N, float dt, glm::vec4 * pos, glm::vec3 * vel, glm:: float s_scale = 2e2; int index = threadIdx.x + (blockIdx.x * blockDim.x); - for (int i = 0; i < N; i++) { + for (int i = 0; i < N; i++) { //Maybe need nested for loop this way the "other" body becomes all but the current one int x = i % width; int y = i / width; float w2 = width / 2.0; @@ -151,7 +151,7 @@ __global__ void updateF(int N, float dt, glm::vec4 * pos, glm::vec3 * vel, glm:: } // TODO : update velocity and position using a simple Euler integration scheme -__global__ void updateS(int N, float dt, glm::vec4 * pos, glm::vec3 * vel, glm::vec3 * acc) //DONE +__global__ void updateS(int N, float dt, glm::vec4 * pos, glm::vec3 * vel, glm::vec3 * acc) //Check if pos[3] is actually the mass of the body { /*pos->x = pos->x + vel->x * dt; pos->y = pos->y + vel->y * dt; diff --git a/Part2/Samantha Cohen-Project-1-2/matrix_math.cu b/Part2/Samantha Cohen-Project-1-2/matrix_math.cu index 44a506f..6f3a318 100644 --- a/Part2/Samantha Cohen-Project-1-2/matrix_math.cu +++ b/Part2/Samantha Cohen-Project-1-2/matrix_math.cu @@ -1,18 +1,6 @@ #include #include -int main () { //You need to fix linker issues so that your methods will be recognized, but in the meantime you can check out individual functions by putting them in main - float * M;//[5][5]; - float * N; - float * P; - int size = 5*5*sizeof(float); - int numBlocks = 1; - dim3 threadsPerBlock(5,5); - mat_add(M,N,P,5); - printf("SAMMMMMMMMMMMMMMMMMMMMMMMMMMMM"); - getchar(); - return 0; -} __global__ void mat_sub_kernel(float* Md, float* Nd, float* Pd, int width) { int tx = threadIdx.x; @@ -121,3 +109,78 @@ void mat_mult(float* M, float* 
N, float* P, int width) { cudaFree(Pd); } + +int main () { //You need to fix linker issues so that your methods will be recognized, but in the meantime you can check out individual functions by putting them in main + float * M = new float[25]; + M[0] = 2; M[1] = 1; M[2] = 1; M[3] = 1; M[4] = 1; + M[5] = 1; M[6] = 3; M[7] = 1; M[8] = 1; M[9] = 1; + M[10] = 1; M[11] = 1; M[12] = 4; M[13] = 1; M[14] = 1; + M[15] = 1; M[16] = 1; M[17] = 1; M[18] = 5; M[19] = 1; + M[20] = 1; M[21] = 1; M[22] = 1; M[23] = 1; M[24] = 6; + float * N = new float[25]; + N[0] = 2; N[1] = 2; N[2] = 2; N[3] = 2; N[4] = 2; + N[5] = 2; N[6] = 2; N[7] = 2; N[8] = 2; N[9] = 2; + N[10] = 2; N[11] = 2; N[12] = 2; N[13] = 2; N[14] = 2; + N[15] = 2; N[16] = 2; N[17] = 2; N[18] = 2; N[19] = 2; + N[20] = 2; N[21] = 2; N[22] = 2; N[23] = 2; N[24] = 2; + float * P = new float[25]; + P[0] = 0; P[1] = 0; P[2] = 0; P[3] = 0; P[4] = 0; + P[5] = 0; P[6] = 0; P[7] = 0; P[8] = 0; P[9] = 0; + P[10] = 0; P[11] = 0; P[12] = 0; P[13] = 0; P[14] = 0; + P[15] = 0; P[16] = 0; P[17] = 0; P[18] = 0; P[19] = 0; + P[20] = 0; P[21] = 0; P[22] = 0; P[23] = 0; P[24] = 0; + + printf("Matrix M:\n"); + printf("%f ",M[0]); printf("%f ",M[1]); printf("%f ",M[2]); printf("%f ",M[3]); printf("%f ",M[4]); printf("\n"); + printf("%f ",M[5]); printf("%f ",M[6]); printf("%f ",M[7]); printf("%f ",M[8]); printf("%f ",M[9]); printf("\n"); + printf("%f ",M[10]); printf("%f ",M[11]); printf("%f ",M[12]); printf("%f ",M[13]); printf("%f ",M[14]); printf("\n"); + printf("%f ",M[15]); printf("%f ",M[16]); printf("%f ",M[17]); printf("%f ",M[18]); printf("%f ",M[19]); printf("\n"); + printf("%f ",M[20]); printf("%f ",M[21]); printf("%f ",M[22]); printf("%f ",M[23]); printf("%f ",M[24]); printf("\n"); + printf("\n"); + + printf("Matrix N:\n"); + printf("%f ",N[0]); printf("%f ",N[1]); printf("%f ",N[2]); printf("%f ",N[3]); printf("%f ",N[4]); printf("\n"); + printf("%f ",N[5]); printf("%f ",N[6]); printf("%f ",N[7]); printf("%f ",N[8]); 
printf("%f ",N[9]); printf("\n"); + printf("%f ",N[10]); printf("%f ",N[11]); printf("%f ",N[12]); printf("%f ",N[13]); printf("%f ",N[14]); printf("\n"); + printf("%f ",N[15]); printf("%f ",N[16]); printf("%f ",N[17]); printf("%f ",N[18]); printf("%f ",N[19]); printf("\n"); + printf("%f ",N[20]); printf("%f ",N[21]); printf("%f ",N[22]); printf("%f ",N[23]); printf("%f ",N[24]); printf("\n"); + printf("\n"); + + int size = 5*5*sizeof(float); + int numBlocks = 1; + dim3 threadsPerBlock(5,5); + + mat_add(M,N,P,5); + + printf("Matrix Add:\n"); + printf("%f ",P[0]); printf("%f ",P[1]); printf("%f ",P[2]); printf("%f ",P[3]); printf("%f ",P[4]); printf("\n"); + printf("%f ",P[5]); printf("%f ",P[6]); printf("%f ",P[7]); printf("%f ",P[8]); printf("%f ",P[9]); printf("\n"); + printf("%f ",P[10]); printf("%f ",P[11]); printf("%f ",P[12]); printf("%f ",P[13]); printf("%f ",P[14]); printf("\n"); + printf("%f ",P[15]); printf("%f ",P[16]); printf("%f ",P[17]); printf("%f ",P[18]); printf("%f ",P[19]); printf("\n"); + printf("%f ",P[20]); printf("%f ",P[21]); printf("%f ",P[22]); printf("%f ",P[23]); printf("%f ",P[24]); printf("\n"); + printf("\n"); + + mat_sub(M,N,P,5); + + printf("Matrix Subtract:\n"); + printf("%f ",P[0]); printf("%f ",P[1]); printf("%f ",P[2]); printf("%f ",P[3]); printf("%f ",P[4]); printf("\n"); + printf("%f ",P[5]); printf("%f ",P[6]); printf("%f ",P[7]); printf("%f ",P[8]); printf("%f ",P[9]); printf("\n"); + printf("%f ",P[10]); printf("%f ",P[11]); printf("%f ",P[12]); printf("%f ",P[13]); printf("%f ",P[14]); printf("\n"); + printf("%f ",P[15]); printf("%f ",P[16]); printf("%f ",P[17]); printf("%f ",P[18]); printf("%f ",P[19]); printf("\n"); + printf("%f ",P[20]); printf("%f ",P[21]); printf("%f ",P[22]); printf("%f ",P[23]); printf("%f ",P[24]); printf("\n"); + printf("\n"); + + mat_mult(M,N,P,5); + + printf("Matrix Multiply:\n"); + printf("%f ",P[0]); printf("%f ",P[1]); printf("%f ",P[2]); printf("%f ",P[3]); printf("%f ",P[4]); printf("\n"); 
+ printf("%f ",P[5]); printf("%f ",P[6]); printf("%f ",P[7]); printf("%f ",P[8]); printf("%f ",P[9]); printf("\n"); + printf("%f ",P[10]); printf("%f ",P[11]); printf("%f ",P[12]); printf("%f ",P[13]); printf("%f ",P[14]); printf("\n"); + printf("%f ",P[15]); printf("%f ",P[16]); printf("%f ",P[17]); printf("%f ",P[18]); printf("%f ",P[19]); printf("\n"); + printf("%f ",P[20]); printf("%f ",P[21]); printf("%f ",P[22]); printf("%f ",P[23]); printf("%f ",P[24]); printf("\n"); + printf("\n"); + + getchar(); + + return 0; +} \ No newline at end of file From c48934381816cc45ca9bd501764ffade35275b2b Mon Sep 17 00:00:00 2001 From: cohensam Date: Sat, 20 Sep 2014 02:06:53 -0400 Subject: [PATCH 3/8] Added CPU matrix math. --- .../Samantha Cohen-Project-1-2/matrix_math.cu | 60 ++++++++++++++++++- 1 file changed, 57 insertions(+), 3 deletions(-) diff --git a/Part2/Samantha Cohen-Project-1-2/matrix_math.cu b/Part2/Samantha Cohen-Project-1-2/matrix_math.cu index 6f3a318..1774d84 100644 --- a/Part2/Samantha Cohen-Project-1-2/matrix_math.cu +++ b/Part2/Samantha Cohen-Project-1-2/matrix_math.cu @@ -1,6 +1,30 @@ #include #include +void CPU_mat_sub(float* M, float* N, float* P, int width) { + for (int i = 0; i < width; i++) { + for (int j = 0; j < width; j++) { + P[i*width+j] = M[i*width+j] - N[i*width+j]; + } + } +} + +void CPU_mat_add(float* M, float* N, float* P, int width) { + for (int i = 0; i < width; i++) { + for (int j = 0; j < width; j++) { + P[i*width+j] = M[i*width+j] + N[i*width+j]; + } + } +} + +void CPU_mat_mult(float* M, float* N, float* P, int width) { + P = new float[25]; + for (int i = 0; i < width; i++) { + for (int j = 0; j < width; j++) { + P[i*width+j] += M[i*width+j] * N[j*width+i]; + } + } +} __global__ void mat_sub_kernel(float* Md, float* Nd, float* Pd, int width) { int tx = threadIdx.x; @@ -152,7 +176,17 @@ int main () { //You need to fix linker issues so that your methods will be recog mat_add(M,N,P,5); - printf("Matrix Add:\n"); + printf("GPU Matrix 
Add:\n"); + printf("%f ",P[0]); printf("%f ",P[1]); printf("%f ",P[2]); printf("%f ",P[3]); printf("%f ",P[4]); printf("\n"); + printf("%f ",P[5]); printf("%f ",P[6]); printf("%f ",P[7]); printf("%f ",P[8]); printf("%f ",P[9]); printf("\n"); + printf("%f ",P[10]); printf("%f ",P[11]); printf("%f ",P[12]); printf("%f ",P[13]); printf("%f ",P[14]); printf("\n"); + printf("%f ",P[15]); printf("%f ",P[16]); printf("%f ",P[17]); printf("%f ",P[18]); printf("%f ",P[19]); printf("\n"); + printf("%f ",P[20]); printf("%f ",P[21]); printf("%f ",P[22]); printf("%f ",P[23]); printf("%f ",P[24]); printf("\n"); + printf("\n"); + + CPU_mat_add(M,N,P,5); + + printf("CPU Matrix Add:\n"); printf("%f ",P[0]); printf("%f ",P[1]); printf("%f ",P[2]); printf("%f ",P[3]); printf("%f ",P[4]); printf("\n"); printf("%f ",P[5]); printf("%f ",P[6]); printf("%f ",P[7]); printf("%f ",P[8]); printf("%f ",P[9]); printf("\n"); printf("%f ",P[10]); printf("%f ",P[11]); printf("%f ",P[12]); printf("%f ",P[13]); printf("%f ",P[14]); printf("\n"); @@ -162,7 +196,17 @@ int main () { //You need to fix linker issues so that your methods will be recog mat_sub(M,N,P,5); - printf("Matrix Subtract:\n"); + printf("GPU Matrix Subtract:\n"); + printf("%f ",P[0]); printf("%f ",P[1]); printf("%f ",P[2]); printf("%f ",P[3]); printf("%f ",P[4]); printf("\n"); + printf("%f ",P[5]); printf("%f ",P[6]); printf("%f ",P[7]); printf("%f ",P[8]); printf("%f ",P[9]); printf("\n"); + printf("%f ",P[10]); printf("%f ",P[11]); printf("%f ",P[12]); printf("%f ",P[13]); printf("%f ",P[14]); printf("\n"); + printf("%f ",P[15]); printf("%f ",P[16]); printf("%f ",P[17]); printf("%f ",P[18]); printf("%f ",P[19]); printf("\n"); + printf("%f ",P[20]); printf("%f ",P[21]); printf("%f ",P[22]); printf("%f ",P[23]); printf("%f ",P[24]); printf("\n"); + printf("\n"); + + CPU_mat_sub(M,N,P,5); + + printf("CPU Matrix Subtract:\n"); printf("%f ",P[0]); printf("%f ",P[1]); printf("%f ",P[2]); printf("%f ",P[3]); printf("%f ",P[4]); 
printf("\n"); printf("%f ",P[5]); printf("%f ",P[6]); printf("%f ",P[7]); printf("%f ",P[8]); printf("%f ",P[9]); printf("\n"); printf("%f ",P[10]); printf("%f ",P[11]); printf("%f ",P[12]); printf("%f ",P[13]); printf("%f ",P[14]); printf("\n"); @@ -172,7 +216,17 @@ int main () { //You need to fix linker issues so that your methods will be recog mat_mult(M,N,P,5); - printf("Matrix Multiply:\n"); + printf("GPU Matrix Multiply:\n"); + printf("%f ",P[0]); printf("%f ",P[1]); printf("%f ",P[2]); printf("%f ",P[3]); printf("%f ",P[4]); printf("\n"); + printf("%f ",P[5]); printf("%f ",P[6]); printf("%f ",P[7]); printf("%f ",P[8]); printf("%f ",P[9]); printf("\n"); + printf("%f ",P[10]); printf("%f ",P[11]); printf("%f ",P[12]); printf("%f ",P[13]); printf("%f ",P[14]); printf("\n"); + printf("%f ",P[15]); printf("%f ",P[16]); printf("%f ",P[17]); printf("%f ",P[18]); printf("%f ",P[19]); printf("\n"); + printf("%f ",P[20]); printf("%f ",P[21]); printf("%f ",P[22]); printf("%f ",P[23]); printf("%f ",P[24]); printf("\n"); + printf("\n"); + + CPU_mat_mult(M,N,P,5); + + printf("CPU Matrix Multiply:\n"); printf("%f ",P[0]); printf("%f ",P[1]); printf("%f ",P[2]); printf("%f ",P[3]); printf("%f ",P[4]); printf("\n"); printf("%f ",P[5]); printf("%f ",P[6]); printf("%f ",P[7]); printf("%f ",P[8]); printf("%f ",P[9]); printf("\n"); printf("%f ",P[10]); printf("%f ",P[11]); printf("%f ",P[12]); printf("%f ",P[13]); printf("%f ",P[14]); printf("\n"); From bcec163c55ae834ca9978208004b4b5e527d70f1 Mon Sep 17 00:00:00 2001 From: cohensam Date: Sun, 21 Sep 2014 11:13:49 -0400 Subject: [PATCH 4/8] NBody starting to orbit but a few nbodies still escape. 
--- Part1/PROJ_WIN/CIS565_PROJ_1.suo | Bin 20480 -> 20480 bytes Part1/PROJ_WIN/CIS565_PROJ_1/vc100.pdb | Bin 471040 -> 487424 bytes Part1/src/kernel.cu | 103 ++++++++++--------------- Part1/src/main.cpp | 2 +- 4 files changed, 41 insertions(+), 64 deletions(-) diff --git a/Part1/PROJ_WIN/CIS565_PROJ_1.suo b/Part1/PROJ_WIN/CIS565_PROJ_1.suo index 956dc6b04cf47f14a3ed9790115c66d1102b5db7..854afefc5ec47d10396d61af4ad0dc9dcf47367b 100644 GIT binary patch delta 954 zcmY+CUr1AN6vuzRyF0yio73qwO;R_pHBIMMOKmjMO)~#Zp?VN9r9UQ_g!I|T^dmmNum)wdpgGE>DM3UN;AtL`1SJ);0iNtY!Yim0(JnJQQU+EKrKe#|l`w1gu&si1m z-N&u3gXnZfV)~MnQ=`0fyx2GeTLXuajP8{CVlZOEA<0zU`Y#oBzuqN9 z@&*~EVgWbk4>|nw&rAvjVb*^_ZO6tem^(ZXaxJMiBW7{RS4G@7ZwnHef=a^vHw>4J zB>q2IGDBJUYlFpOx5~jV$|GoUJc@p7J03@W8`eGNVbzl+ZFo33)S?R>j2gbroEv_) zxw3xp>8sOS`mBr>JY>TZf40KxwP+ul|@qUp1jPw?KE3M~g&bIn* zB!_TN`AE%ZF{)j_=+>;Z7r>0vy_ z25UK8uv#?+Lw+NKYp)tUw|pJg%njc;Ykb$7aKm1ZipNfY0be>Dgz}n;g>9sRTt+)a z1k#Sa9epoxkW=ImJ`>^ortYNU)27m?3!RT&+|$b4`n|papVRFsE-xx7DRP5YJtcjt kIK#`gR0AG(8hYKibRSsR3@9_X)i;rappmPArTRy|0WY`=m;e9( delta 1069 zcmZ9LOH30%7{_PZZM)mnQlKprjDR7$ZU>aHrgR~U#Qv1GQY7M>zmti}oD^?-QV0FK)<PW2}qGU*;i_sck3c)1Ob@NuqhQsaV3yj^UOA(Y(ld7wGK3i5(we=sheoPQA zaiu6+et|H#0*1j=a1C4sH^5DB3)}{Gz+G?;jDY*#0eA==fyZDJ@HaVz@hNx)o`V&pr+n)#Z}ShN0JIU$QX;jt06 UZ3@OHC8R^7#HXoCQZP3B1-OuIUoWTP4KPwN{sZvy<9}o0es{m6R>>bV1TG;KDfRgjdF3+$VY*i zM>317%-$*{Y7aCYQCVVH*)$eTW7$)iX+~^H`}STNxE}F0f6gD{`T5NdXWwnZ^v^NFZR;0M>-vL-?A2ISvgL7~1;B@S8CpL$a!jilK$k!ImHlkT7# znhwmMIW!sfOd3FWluL6_I#HF%w+7J^JRgkG@zBy?n@9KJc`i!3Q@kq6LFpVVkDxRf zN?A0Th9uH(JRX6UrvW*NIun5m0kww`fh;(zHHfDkE<(AacWek^q+EclOsLEBZ^ZK`c*OmRW84`E~LTB>gA2A z7hN+XY1%!YICX7WIK1vPsG_C<0cK4VV~V_@<_3?dM?7^Iph?tidrgTWDcmE3aL({A zr;8@F#4Iq(fWs?c`Z5agNN61 zZu8?WtSy)PG18!(zjdH=hCtef$tt(_Ay34iAbCe>tca-RA=-#eU7Xsc3G8Dwp+EV4 z^oa9zj|fGfY~gm1N9{x*_!p{s8-0`~V$h1JhVAml;aLy~qiF>NuO3~Zy1MLe(sA@d ziv&dwopAvj!9OEsddzB-D%!qgk&t$6E2vpqGmjX2ap}R|(j2#KJKebdKzG|yDch!s z{kl^p!t|*oMBf4KrmDXfoe>T~!9;K(Pu%1>G;Gb{l4m#n(&GE7VY%YGlSc!xN*Nf> 
zACmasAOcK-JQLNQ1&d?co^6Kn7~G$1b7741Uw3C1|9<*2TFk_ zfGxn2z*91+r-7}&Gr%_BS>QR~dEf=$Mc^giWneq71K0_a0lR=#fZaejum{)+>;ql} zUIShS-T*3q{lEd>P2eDqe+WN^fg`|MKqc@t@DA`UAba(D!27^4pb9t+d;ok1oB%!o zJ_bGkP6D3-e*r!N{tBD|J_o)4s)5tM-+(U#-AsRna0d7a_!>A1d;@$7RHOatuOnqm zBF;J1U25;JrCd~U*(GO4wCG#bDE&W<4l=Sqc)N3;{8A9L1Wv9&C&4`snDt7RpxGj9 z3dCu^bRchCaZcAA2}c9U($go0EcLGNiWOel3U9y)uYH9#aI4pdHhl#~w8%1>tTH-@ zY_r2Mqq)d2e=0P(2E@5NW^tC$*-Xwd(rv6bJB`C&xf^Iqg2K?@CUW`XuxCN~3uoi^YWq<{pk| zGRHJU#uU?#Q>USfTxO4{MjKniYi89_BO$=$n#94UWr7~cj#48wa=L|C>NA*`Ma7)Q z%wA@m$=SHdAyVTsl$EBg-lwQ-t^_lBg3-lP%n1|B;+0^YYhjl9JZ5H5G0$h_15hB%j_@#?VYvN$Pg~B^>Pcb)K@Sui;DPuCN5;+2Q;z8AACCl*{H*6BUv@D>R+`D z3`JLWwS`scYnYWq#rhz#KE$j=^{s66>NJ#Vue=MMXTAiH9)pP$stelbz2{Mw)n!9&>TI zW`_bf$@$Rfl^h#6+`=pMH0EVd@uo9x2J?>44Q=sP`~W&V8_7;@bGgl;tlFu?9nT3K zY2lOlDCT2P@r`D_G0bDS&5B& z%EBx4rU#k-YxpJCo@x|>)XDrRM+nUhZQz!;yz^FdZADy(Fk>=!KTQh$-zSyb#V zG5gERzMX4*wNGc7vVjZ#P7_68*$QmwQr|ZyivNg( zU+Ql$KZ}aLlKJ0e{&$$)Wpya|1KCP_QR7{@o2pU7N^InN7DlPR&x|Z8#$(J_#f--} z-&TLBmnx&Wd>6XvTHA%`Q?4JFJ(pYiBMZCKKW26o75gX5ev;Wg)$A63svF>%Tgfz&_e2v}4Re8CY^a|EY#i(l>3X(*$*2!d~x^;Z!fTWaN@ zmT#=+v3kZ5g4Dm_5Li?pe9a-8c^f7@W%ux4>dmb~${_MsR=o z!9p$dADNm(MSYH`&olK;ni^~VdT+9`8Oo%NH{D*>_LIi@En8U8b^n(o0IC0*17J}F zaDfA;;Q%gj2WY^l(@;hn0>)dk6CvpFmWVZ%?0;&!K~b~KZx(u~Ut)R|75!yJZzH$) z*fax2Pj2(`0^`O2m!HdSyZ0M=g7Zt=L^XF|^F0DP|6I(A-7$OJfZhB}L|HgKiojv# zk_AK~yJ9zwv!}O55gmeDay#Bl!XZd9ww^dPXpQaNmNwY5r(zo$i_JO?$C`Ta0l_Hyeo!Hk|iC5Uy54-zVU`wnV>-Mqs^(x*^bS@qAYVh=YmxK@aW*8AnN-;C~qo zM%IrZN(4_v5RNOdaU8K1QB4B+4#J@>jDk@67`*ss6j3aU4}u^u59c5-D#ceK6y^P= z;2cDLJy7f6*!@n*!8r(m-G@E!*C;O?NaVl^MaXKKg+x)2I1%s=-GMX3pTTn=3@f5r z@LA8FjF(X3lmXavBI+Qdd){L7C=huNbs~r|@N5I_Thr%gH^PJdc&ql`X z#eo%Sa}p_kAGLJg*)1R)m5sngVmd|^B48VWjd{v^qULDEQ%%vV$b9}~MD-+`hTeyx zhrVbtWF#OJ=O4G?cmZi`*$W3Yc-D9#2;lUe!-&Gr6z`xGzl9UEYm9S|u|($(_z7g@ zSAjzoG~L;|@NO1PiBP$@_~@3l$2^I?b`I&^fcIX_L{pAM`|m_bvf*?jUYvu@9McSK zi5fhI4kP=2s#)-}!(PxmIClTNv5ck9L3#k3a#%6y4rZSI|F%Uz&AC;=GN;kK0@(Z 
diff --git a/Part1/src/kernel.cu b/Part1/src/kernel.cu
index f778d12..0d72596 100644
--- a/Part1/src/kernel.cu
+++ b/Part1/src/kernel.cu
@@ -83,6 +83,20 @@ __global__ void generateCircularVelArray(int time, int N, glm::vec3 * arr, glm::
 	}
 }
 
+
+
+//I ADDED THIS FUNCTION AS A HELPER
+//REMEMBER : F = (G * m_a * m_b) / (r_ab ^ 2)
+__device__ glm::vec3 accelerateHelper(glm::vec4 p1, glm::vec4 p2, float m1, float m2)
+{
+	glm::vec3 r = glm::vec3(p2.x - p1.x, p2.y - p1.y, p2.z - p1.z);
+	float r_ab1 = pow(r.x,2.f) + pow(r.y,2.f) + pow(r.z,2.f) + 0.000001;
+	float r_ab2 = 1/sqrt(r_ab1*r_ab1*r_ab1);//r_ab1 * sqrt(r_ab1);
+	float f = p2.w * r_ab2;//m2/pow(r_ab2, 3.0f);
+	return r*f;
+	///*glm::vec3*/ a += glm::vec3(G*3e8*3e8*r.x/pow(r_ab2, 2.f), G*3e8*3e8*r.y/pow(r_ab2, 2.f), G*3e8*3e8*r.z/pow(r_ab2, 2.f));
+}
+
 // TODO: Core force calc kernel global memory
 // HINT : You may want to write a helper function that will help you
 // calculate the acceleration contribution of a single body.
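The helper added in this hunk computes one body's acceleration contribution, reading the second body's mass out of the w component and adding a small softening term so coincident bodies do not divide by zero; the factor G is applied by the caller. The same math as a plain host-side C++ sketch (the struct and function names here are illustrative, not part of the patch):

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

// Acceleration on a body at p1 due to a body of mass m2 at p2:
// a = G * m2 * r / (|r|^2 + eps)^(3/2). This mirrors the kernel's
// f = m2 / (|r|^2 + eps)^(3/2), with G folded in here for clarity.
Vec3 accelContribution(Vec3 p1, Vec3 p2, double m2, double G) {
    const double eps = 1e-6; // softening term, as in the hunk above
    Vec3 r{p2.x - p1.x, p2.y - p1.y, p2.z - p1.z};
    double d2  = r.x * r.x + r.y * r.y + r.z * r.z + eps;
    double inv = 1.0 / std::sqrt(d2 * d2 * d2); // 1 / (|r|^2 + eps)^(3/2)
    double f   = G * m2 * inv;
    return Vec3{r.x * f, r.y * f, r.z * f};
}
```

For two unit masses one unit apart (and G = 1) this returns an acceleration of magnitude very close to 1, as expected from F = (G * m_a * m_b) / (r_ab ^ 2).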
@@ -93,11 +107,18 @@ __device__ glm::vec3 accelerate(int N, glm::vec4 my_pos, glm::vec4 * their_pos)
 	//So this method can call a helper method as was in the hint, and the helper method
 	//will do the below calculations, but essentially you need to calculate the acceleration
 	//due to all other bodies
-	glm::vec3 r = glm::vec3(their_pos->x - my_pos.x, their_pos->y - my_pos.y, their_pos->z - my_pos.z);
-	float r_ab1 = pow(r.x,2.f) + pow(r.y,2.f) + pow(r.z,2.f);
-	float r_ab2 = r_ab1 * sqrt(r_ab1);
-	glm::vec3 a = glm::vec3(3e8*starMass*r.x/r_ab2, 3e8*starMass*r.y/r_ab2, 3e8*starMass*r.z/r_ab2);
-	//float f = (G * 3e8 * starMass) / pow(r_ab, 2.f);
+	glm::vec3 a = glm::vec3(0,0,0);
+	glm::vec3 temp = glm::vec3(0,0,0);
+	for (int i = 0; i < N; i++) {
+		if (my_pos != their_pos[i]) {
+			a += accelerateHelper(my_pos, their_pos[i], 3e8, 3e8);
+		}
+	}
+	glm::vec4 starPos = glm::vec4(0,0,0,1);
+	a += accelerateHelper(my_pos, starPos, 3e8, starMass);
+	a.x = a.x * G;
+	a.y = a.y * G;
+	a.z = a.z * G;
 	return a;
 }
 
@@ -106,71 +127,27 @@ __global__ void updateF(int N, float dt, glm::vec4 * pos, glm::vec3 * vel, glm::
 {
 	// FILL IN HERE
-	/*int width = 800;
-	int height = 800;
-	float s_scale = 2e2;
-
-	int index = threadIdx.x + (blockIdx.x * blockDim.x);
-	int x = index % width;
-	int y = index / width;
-
-	float w2 = width / 2.0;
-	float h2 = height / 2.0;
-
-	float c_scale_w = width / s_scale;
-	float c_scale_h = height / s_scale;
-
-	glm::vec3 a = accelerate(N, glm::vec4((x-w2)/c_scale_w,(y-h2)/c_scale_h,0,1), pos);
-	acc->x = acc->x + a.x;
-	acc->y = acc->y + a.y;
-	acc->z = acc->z + a.z;*/
-
-
-	int width = 800;
-	int height = 800;
-	float s_scale = 2e2;
-	int index = threadIdx.x + (blockIdx.x * blockDim.x);
 	for (int i = 0; i < N; i++) { //Maybe need nested for loop this way the "other" body becomes all but the current one
-		int x = i % width;
-		int y = i / width;
-		float w2 = width / 2.0;
-		float h2 = height / 2.0;
-		float c_scale_w = width / s_scale;
-		float c_scale_h = height / s_scale;
-		glm::vec4 * p = new glm::vec4();
-		p->x = pos[i].x;
-		p->y = pos[i].y;
-		p->z = pos[i].z;
-		glm::vec3 a = accelerate(N, glm::vec4((x-w2)/c_scale_w,(y-h2)/c_scale_h,0,1), p);
-		acc[i].x = acc[i].x + a.x;
-		acc[i].y = acc[i].y + a.y;
-		acc[i].z = acc[i].z + a.z;
+		glm::vec3 a = accelerate(N, pos[i], pos);
+		acc[i].x = a.x;
+		acc[i].y = a.y;
+		acc[i].z = a.z;
 	}
 }
 
 // TODO : update velocity and position using a simple Euler integration scheme
-__global__ void updateS(int N, float dt, glm::vec4 * pos, glm::vec3 * vel, glm::vec3 * acc) //Check if pos[3] is actually the mass of the body
+__global__ void updateS(int N, float dt, glm::vec4 * pos, glm::vec3 * vel, glm::vec3 * acc) //DONE
 {
-	/*pos->x = pos->x + vel->x * dt;
-	pos->y = pos->y + vel->y * dt;
-	pos->z = pos->z + vel->z * dt;
-	pos->w = pos->w + dt;
-
-	vel->x = vel->x + acc->x * dt;
-	vel->y = vel->y + acc->y * dt;
-	vel->z = vel->z + acc->z * dt;*/
-
 	for (int i = 0; i < N; i++) {
-		pos[i].x = pos[i].x + vel[i].x * dt;
-		pos[i].y = pos[i].y + vel[i].y * dt;
-		pos[i].z = pos[i].z + vel[i].z * dt;
-		//pos[i].w += 1;//= pos[i].w + dt;
-
-		vel[i].x = vel[i].x + acc[i].x * dt;
-		vel[i].y = vel[i].y + acc[i].y * dt;
-		vel[i].z = vel[i].z + acc[i].z * dt;
+
+		vel[i].x += acc[i].x * dt;
+		vel[i].y += acc[i].y * dt;
+		vel[i].z += acc[i].z * dt;
+
+		pos[i].x += vel[i].x * dt;
+		pos[i].y += vel[i].y * dt;
+		pos[i].z += vel[i].z * dt;
 	}
 }
 
@@ -251,7 +228,7 @@ void cudaNBodyUpdateWrapper(float dt)
 {
 	// FILL IN HERE
 	dim3 fullBlocksPerGrid((int)ceil(float(numObjects)/float(blockSize)));
-	//updateF<<<fullBlocksPerGrid, blockSize>>>(numObjects, dt, dev_pos, dev_vel, dev_acc);
+	updateF<<<fullBlocksPerGrid, blockSize>>>(numObjects, dt, dev_pos, dev_vel, dev_acc);
 	updateS<<<fullBlocksPerGrid, blockSize>>>(numObjects, dt, dev_pos, dev_vel, dev_acc);
 }
 
diff --git a/Part1/src/main.cpp b/Part1/src/main.cpp
index ea86207..abfa744 100644
--- a/Part1/src/main.cpp
+++ b/Part1/src/main.cpp
@@ -4,7 +4,7 @@
 
 #include "main.h"
 
-#define N_FOR_VIS 5000
+#define N_FOR_VIS 500//0
 #define DT 0.2
 #define VISUALIZE 1
//-------------------------------

From 94441a0695dd0f043db91a3e4a669c045c772438 Mon Sep 17 00:00:00 2001
From: cohensam
Date: Sun, 21 Sep 2014 17:12:24 -0400
Subject: [PATCH 5/8] DONE.

---
 Part1/PROJ_WIN/CIS565_PROJ_1.suo       | Bin 20480 -> 20480 bytes
 Part1/PROJ_WIN/CIS565_PROJ_1/vc100.pdb | Bin 487424 -> 495616 bytes
 Part1/src/kernel.cu                    |  28 ++++++++++--------------
 Part1/src/main.cpp                     |   2 +-
 Part4 Performance Analysis.txt         |  12 +++++++++++
 5 files changed, 24 insertions(+), 18 deletions(-)
 create mode 100644 Part4 Performance Analysis.txt

diff --git a/Part1/PROJ_WIN/CIS565_PROJ_1.suo b/Part1/PROJ_WIN/CIS565_PROJ_1.suo
index 854afefc5ec47d10396d61af4ad0dc9dcf47367b..66028e2ebc190ed0886c715d766e388c5843a6eb 100644
GIT binary patch
(base85-encoded binary delta omitted)
diff --git a/Part1/PROJ_WIN/CIS565_PROJ_1/vc100.pdb b/Part1/PROJ_WIN/CIS565_PROJ_1/vc100.pdb
index d51a7b0a462eabda803a36b60afecc5b4f3cf3d9..8ef447f964c4f3be18beb7d8b1fef3e6b77ffe70 100644
GIT binary patch
(base85-encoded binary delta omitted)
diff --git a/Part1/src/kernel.cu b/Part1/src/kernel.cu
--- a/Part1/src/kernel.cu
+++ b/Part1/src/kernel.cu
(hunk header unrecoverable)
+	t)= 1) {
+		f *= G;
+	}
 	return r*f;
-	///*glm::vec3*/ a += glm::vec3(G*3e8*3e8*r.x/pow(r_ab2, 2.f), G*3e8*3e8*r.y/pow(r_ab2, 2.f), G*3e8*3e8*r.z/pow(r_ab2, 2.f));
 }
 
 // TODO: Core force calc kernel global memory
 // HINT : You may want to write a helper function that will help you
 // calculate the acceleration contribution of a single body.
 // REMEMBER : F = (G * m_a * m_b) / (r_ab ^ 2)
-__device__ glm::vec3 accelerate(int N, glm::vec4 my_pos, glm::vec4 * their_pos)//Maybe ok??? :'(
+__device__ glm::vec3 accelerate(int N, glm::vec4 my_pos, glm::vec4 * their_pos)
 {
-	//You might need to take into account the acceleration of EVERY other body
-	//So this method can call a helper method as was in the hint, and the helper method
-	//will do the below calculations, but essentially you need to calculate the acceleration
-	//due to all other bodies
 	glm::vec3 a = glm::vec3(0,0,0);
 	glm::vec3 temp = glm::vec3(0,0,0);
 	for (int i = 0; i < N; i++) {
 		if (my_pos != their_pos[i]) {
-			a += accelerateHelper(my_pos, their_pos[i], 3e8, 3e8);
+			a += accelerateHelper(my_pos, their_pos[i]);
 		}
 	}
-	glm::vec4 starPos = glm::vec4(0,0,0,1);
-	a += accelerateHelper(my_pos, starPos, 3e8, starMass);
-	a.x = a.x * G;
-	a.y = a.y * G;
-	a.z = a.z * G;
+	glm::vec4 starPos = glm::vec4(0,0,0,starMass);
+	a += accelerateHelper(my_pos, starPos);
 	return a;
 }
 
@@ -127,7 +122,7 @@ __global__ void updateF(int N, float dt, glm::vec4 * pos, glm::vec3 * vel, glm::
 {
 	// FILL IN HERE
-	for (int i = 0; i < N; i++) { //Maybe need nested for loop this way the "other" body becomes all but the current one
+	for (int i = 0; i < N; i++) {
 		glm::vec3 a = accelerate(N, pos[i], pos);
 		acc[i].x = a.x;
 		acc[i].y = a.y;
@@ -226,7 +221,6 @@ void initCuda(int N)
 // TODO : Using the functions you wrote above, write a function that calls the CUDA kernels to update a single sim step
 void cudaNBodyUpdateWrapper(float dt)
 {
-	// FILL IN HERE
 	dim3 fullBlocksPerGrid((int)ceil(float(numObjects)/float(blockSize)));
 	updateF<<<fullBlocksPerGrid, blockSize>>>(numObjects, dt, dev_pos, dev_vel, dev_acc);
 	updateS<<<fullBlocksPerGrid, blockSize>>>(numObjects, dt, dev_pos, dev_vel, dev_acc);
diff --git a/Part1/src/main.cpp b/Part1/src/main.cpp
index abfa744..2afd9b3 100644
--- a/Part1/src/main.cpp
+++ b/Part1/src/main.cpp
@@ -4,7 +4,7 @@
 
 #include "main.h"
 
-#define N_FOR_VIS 500//0
+#define N_FOR_VIS 700//5000
 #define DT 0.2
 #define VISUALIZE 1
 //-------------------------------
diff --git a/Part4 Performance Analysis.txt b/Part4 Performance Analysis.txt
new file mode 100644
index 0000000..c6919f1
--- /dev/null
+++ b/Part4 Performance Analysis.txt
@@ -0,0 +1,12 @@
+Part 4:
+
+NOTE: In Part 1 I decreased the number of nbodies to 7000 because the original number given (5000) crashed my graphics card.
+
+1) How does changing the tile and block sizes change performance? Why?
+Increasing the number of tile/block sizes would improve performance/decrease performance time, because the more blocks there are, the more threads there are available to work on planet acceleration, velocity, and position calculations. Since threads can perform calculations concurrently, the more threads there are the more planet calculations we can perform at the same time. Having more planet calculations completed at the same time means that it will take less time to calculate the new planet positions at each time step, making the simulation run faster. Also, in the case where all threads from one block are in use, thereby stalling that block from completing new calculations, another block can continue calculations until it stalls, and this cycle would continue until you run out of blocks. However if you have enough blocks, then when the final block stalls, the first block will have completed its calculations and be able to undertake more calculations, allowing for quicker time steps.
+
+2) How does changing the number of planets change performance? Why?
+Increasing the number of planets decreases performance/increases performance time because the more planets, the more kernels/threads that are necessary to conduct the calculations for each planet. Since we only have so many blocks of so many threads, we can only perform a finite amount of calculations at once. Thus, when we add more planets and have more planets than threads it will take longer to complete all calculations because we need to wait until more threads become available, and therefore each time step will take longer to complete.
+
+3) Without running experiments, how would you expect the serial and GPU versions of matrix_math to compare? Why?
+Without running experiments, I would expect the GPU version of matrix_math to run far faster than the serial version. Matrix math is calculated based upon the rows of one matrix and the columns of another, and therefore the calculations of row-column pairs can be conducted independently of one another. The GPU version of matrix math can run all N sets of calculations for the row-column pairs at the same time, since it can have multiple threads running calculations concurrently. Alternatively, in the CPU version of matrix math each calculation for each final value in the resulting matrix must be conducted one at a time, since the calculations are being completed on a single thread. Thus, if it takes T time to complete the matrix math calculations with the CPU, the GPU should complete the same calculations in a small fraction of T.
\ No newline at end of file

From 176c2d16a3c5caa34cdd2099f21f5f04e423b4b0 Mon Sep 17 00:00:00 2001
From: cohensam
Date: Sun, 21 Sep 2014 17:17:39 -0400
Subject: [PATCH 6/8] Updated Readme

---
 README.md | 124 ++++--------------------------------------------------
 1 file changed, 8 insertions(+), 116 deletions(-)

diff --git a/README.md b/README.md
index 70ae0d3..67e9464 100644
--- a/README.md
+++ b/README.md
@@ -1,120 +1,12 @@
-Project 1
-=========
+Part 4:
 
-# Project 1 : Introduction to CUDA
+NOTE: In Part 1 I decreased the number of nbodies to 7000 because the original number given (5000) crashed my graphics card.
 
-## NOTE :
-This project (and all other projects in this course) requires a NVIDIA graphics
-card with CUDA capabilityi! Any card with compute capability 2.0 and up will
-work. This means any card from the GeForce 400 and 500 series and afterwards
-will work. If you do not have a machine with these specs, feel free to use
-computers in the SIG Lab. All computers in SIG lab and Moore 100 C have CUDA
-capable cards and should already have the CUDA SDK installed.
+1) How does changing the tile and block sizes change performance? Why?
+Increasing the number of tile/block sizes would improve performance/decrease performance time, because the more blocks there are, the more threads there are available to work on planet acceleration, velocity, and position calculations. Since threads can perform calculations concurrently, the more threads there are the more planet calculations we can perform at the same time. Having more planet calculations completed at the same time means that it will take less time to calculate the new planet positions at each time step, making the simulation run faster. Also, in the case where all threads from one block are in use, thereby stalling that block from completing new calculations, another block can continue calculations until it stalls, and this cycle would continue until you run out of blocks. However if you have enough blocks, then when the final block stalls, the first block will have completed its calculations and be able to undertake more calculations, allowing for quicker time steps.
 
-## PART 1 : INSTALL NSIGHT
-To help with benchmarking and performance analysis, we will be using NVIDIA's
-profiling and debugging tool named NSight. Download and install it from the
-following link for whichever IDE you will be using:
-http://www.nvidia.com/object/nsight.html.
+2) How does changing the number of planets change performance? Why?
+Increasing the number of planets decreases performance/increases performance time because the more planets, the more kernels/threads that are necessary to conduct the calculations for each planet. Since we only have so many blocks of so many threads, we can only perform a finite amount of calculations at once. Thus, when we add more planets and have more planets than threads it will take longer to complete all calculations because we need to wait until more threads become available, and therefore each time step will take longer to complete.
 
-NOTE : If you are using Linux / Mac, most of the screenshots and class usage of
-NSight will be in Visual Studio. You are free to use to the Eclipse version
-NSight during these in class labs, but we will not be able to help you as much.
-
-## PART 2 : NBODY SIMULATION
-To get you used to using CUDA kernels, we will be writing a simple 2D nbody
-simulator. The following source files are included in the project:
-
-* main.cpp : sets up graphics stuff for visualization
-* kernel.cu : this contains the CUDA kernel calls
-
-All the code that you will need to modify is in kernel.cu and is marked clearly
-by TODOs.
-
-## PART 3 : MATRIX MATH
-In this portion we will walk you through setting up a project that writes some
-simple matrix math functions. Please put this portion in a folder marked Part2
-in your repository.
-
-### Step 1 : Create your project.
-Using the instructions on the Google forum, please set up a new Visual Studio project that
-compiles using CUDA. For uniformity, please write your main function and all
-your code in a file named matrix_math.cu.
-
-### Step 2 : Setting up CUDA memory.
-As we discussed in class, there is host memory and device memory. Host memory
-is the memory that exists on the CPU, whereas device memory is memory on the
-GPU.
-
-In order to create/reserve memory on the GPU, we need to explicitly do so
-using cudaMalloc. By calling cudaMalloc, we are calling malloc on the GPU to
-reserve a portion of its memory. Like malloc, cudaMalloc simply mallocs a
-portion of memory and returns a pointer. This memory is only accessible on the
-device unless we explicitly copy memory from the GPU to the CPU. The reverse is
-also true.
-
-We can copy memory to and from the GPU using the function cudaMemcpy. Like the
-POSIX C memcpy, you will need to specify the size of memory you are copying.
-The last argument is used to specify the direction of the copy (from GPU to CPU
-or the other way around).
-
-Please initialize 2 5 x 5 matrices represented as an array of floats on the CPU
-and the GPU where each of the entry is equal to its position (i.e. A_00 = 0,
-A_01 = 1, A_44 = 24).
-
-### Step 3 : Creating CUDA kernels.
-In the previous part, we explicitly divided the CUDA kernels from the rest of
-the file for stylistic purposes. Since there will be far less code in this
-project, we will write the global and device functions in the same file as the
-main function.
-
-Given a matrix A and matrix B (both represented as arrays of floats), please
-write the following functions :
-* mat_add : A + B
-* mat_sub : A - B
-* mat_mult : A * B
-
-You may assume for all matrices that the dimensions of A and B are the same and
-that they are square.
-
-Use the 2 5 x 5 matrices to test your code either by printing directly to the
-console or writing an assert.
-
-THINGS TO REMEMBER :
-* global and device functions only have access to memory that is explicitly on
-  the device, meaning you MUST copy memory from the CPU to the GPU if you would
-  like to use it there
-* The triple triangle braces "<<<" begin and end the global function call. This
-  provides parameters with which CUDA uses to set up tile size, block size and
-  threads for each warp.
-* Do not fret if Intellisense does not understand CUDA keywords (if they have
-  red squiggly lines underneath CUDA keywords). There is a way to integrate
-  CUDA syntax highlighting into Visual Studio, but it is not the default.
-
-### Step 4 : Write a serial version.
-For comparison, write a single-threaded CPU version of mat_add, mat_sub and
-mat_mult. We will not introduce timing elements in this project, but please
-keep them in mind as the upcoming lab will introduce more on this topic.
-
-## PART 4 : PERFORMANCE ANALYSIS
-Since this is the first project, we will guide you with some example
-questions. In future projects, please answer at least these questions, as
-they go through basic performance analysis. Please go above and beyond the
-questions we suggest and explore how different aspects of your code impact
-performance as a whole.
-
-We have provided a frame counter as a metric, but feel free to add cudaTimers,
-etc. to do more fine-grained benchmarking of various aspects.
-
-NOTE : Performance should be measured in comparison to a baseline. Be sure to
-describe the changes you make between experiments and how you are benchmarking.
-
-* How does changing the tile and block sizes change performance? Why?
-* How does changing the number of planets change performance? Why?
-* Without running experiments, how would you expect the serial and GPU verions
-  of matrix_math to compare? Why?
-
-## SUBMISSION
-Please commit your changes to your forked version of the repository and open a
-pull request. Please write your performance analysis in your README.md.
-Remember to email Harmony (harmoli+CIS565@seas.upenn.edu) your grade and why.
+3) Without running experiments, how would you expect the serial and GPU versions of matrix_math to compare? Why?
+Without running experiments, I would expect the GPU version of matrix_math to run far faster than the serial version. Matrix math is calculated based upon the rows of one matrix and the columns of another, and therefore the calculations of row-column pairs can be conducted independently of one another. The GPU version of matrix math can run all N sets of calculations for the row-column pairs at the same time, since it can have multiple threads running calculations concurrently. Alternatively, in the CPU version of matrix math each calculation for each final value in the resulting matrix must be conducted one at a time, since the calculations are being completed on a single thread. Thus, if it takes T time to complete the matrix math calculations with the CPU, the GPU should complete the same calculations in a small fraction of T.

From dff0d8883f663fd6c319fcfe8534eadd1f874ce2 Mon Sep 17 00:00:00 2001
From: cohensam
Date: Sun, 21 Sep 2014 17:20:18 -0400
Subject: [PATCH 7/8] DONE NOW.

---
 Part4 Performance Analysis.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Part4 Performance Analysis.txt b/Part4 Performance Analysis.txt
index c6919f1..792dfbd 100644
--- a/Part4 Performance Analysis.txt
+++ b/Part4 Performance Analysis.txt
@@ -1,6 +1,6 @@
 Part 4:
 
-NOTE: In Part 1 I decreased the number of nbodies to 7000 because the original number given (5000) crashed my graphics card.
+NOTE: In Part 1 I decreased the number of nbodies to 700 because the original number given (5000) crashed my graphics card.
 
 1) How does changing the tile and block sizes change performance? Why?
 Increasing the number of tile/block sizes would improve performance/decrease performance time, because the more blocks there are, the more threads there are available to work on planet acceleration, velocity, and position calculations. Since threads can perform calculations concurrently, the more threads there are the more planet calculations we can perform at the same time. Having more planet calculations completed at the same time means that it will take less time to calculate the new planet positions at each time step, making the simulation run faster. Also, in the case where all threads from one block are in use, thereby stalling that block from completing new calculations, another block can continue calculations until it stalls, and this cycle would continue until you run out of blocks. However if you have enough blocks, then when the final block stalls, the first block will have completed its calculations and be able to undertake more calculations, allowing for quicker time steps.
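The block-count arithmetic behind question 1 is visible in kernel.cu's launch wrapper, which sizes its grid as ceil(numObjects / blockSize) so that every body gets a thread. An integer-only C++ sketch of that computation (the function name is borrowed from the wrapper's variable; the block size of 128 below is an assumption, not a value stated in the patch):

```cpp
// Number of blocks needed so that n bodies each get one thread:
// equivalent to ceil(n / blockSize) without floating point.
int fullBlocksPerGrid(int n, int blockSize) {
    return (n + blockSize - 1) / blockSize;
}
```

With the 700 bodies used here and an assumed block size of 128, this gives 6 blocks, i.e. 768 launched threads of which 68 do no work.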
From 803f936aaa75a15772e26c5a5a5d9e06a0b39df4 Mon Sep 17 00:00:00 2001
From: cohensam
Date: Sun, 21 Sep 2014 17:21:11 -0400
Subject: [PATCH 8/8] Updated Readme.

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 67e9464..289874b 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 Part 4:
 
-NOTE: In Part 1 I decreased the number of nbodies to 7000 because the original number given (5000) crashed my graphics card.
+NOTE: In Part 1 I decreased the number of nbodies to 700 because the original number given (5000) crashed my graphics card.
 
 1) How does changing the tile and block sizes change performance? Why?
 Increasing the number of tile/block sizes would improve performance/decrease performance time, because the more blocks there are, the more threads there are available to work on planet acceleration, velocity, and position calculations. Since threads can perform calculations concurrently, the more threads there are the more planet calculations we can perform at the same time. Having more planet calculations completed at the same time means that it will take less time to calculate the new planet positions at each time step, making the simulation run faster. Also, in the case where all threads from one block are in use, thereby stalling that block from completing new calculations, another block can continue calculations until it stalls, and this cycle would continue until you run out of blocks. However if you have enough blocks, then when the final block stalls, the first block will have completed its calculations and be able to undertake more calculations, allowing for quicker time steps.
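The serial baseline that question 3 reasons about can be sketched in a few lines. Below is a minimal single-threaded mat_mult in the spirit of the old README's Part 3 (row-major flat arrays; the name and signature are assumptions, not the project's actual code):

```cpp
#include <vector>

// C = A * B for square n x n matrices stored row-major in flat arrays.
// Every output entry C[i][j] is an independent dot product of row i of A
// and column j of B; that independence is exactly what the GPU version exploits.
std::vector<float> matMult(const std::vector<float>& A,
                           const std::vector<float>& B, int n) {
    std::vector<float> C(n * n, 0.0f);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
    return C;
}
```

For the 5 x 5 Part 3 matrices whose entries equal their position (A_00 = 0, A_01 = 1, ..., A_44 = 24), the first output entry is 0*0 + 1*5 + 2*10 + 3*15 + 4*20 = 150.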