diff --git a/_pages/WritingPapers.md b/_pages/WritingPapers.md deleted file mode 100644 index f243b78..0000000 --- a/_pages/WritingPapers.md +++ /dev/null @@ -1,180 +0,0 @@ ---- -layout: page -title: -permalink: /writing-papers/ -nav: false -nav_order: 5 ---- - -# Writing Academic Papers - -This page discusses my preferences when it comes to writing papers. It is more focused on paper structuring and collaboration, as opposed to lower-level writing tips. It is not meant to be a gold standard in any way, and I do not claim that my preferences are the *right* way to do anything. This is primarily meant for my students -- specifically, to avoid having to repeat myself every time I start working with a new student (inspired by [Claire Le Goues'](https://clairelegoues.com/2016/08/23/things-i-keep-repeating-about-writing/) post a while back). However, if you find this information useful, or want to use a similar setup, please go ahead! I will update this as I think of more points (or maybe find strong reasons to do things in another way down the road). - -## Setup & Directory Structure - -* For every new paper we work on, please create a private GitHub repository under our GitHub organization with the following name `paper--`. For example, if I were the main author of the paper and planning to submit the work to ICSE '18, I would name it `paper-nadi-icse18`. Suppose we started working on a paper, but still don't have a concrete venue in mind. In that case, use some descriptive phrase. For example, if we were working on extracting configuration constraints, the repo name would be `paper-nadi-config-constr`. *Credits:* I learned this naming strategy during my time at TU Darmstadt and found it quite useful, since typically a group's GitHub organization would have a mix of paper repos, code repos, grant repos etc. Having some fixed prefix for papers made them easier to spot. - -* We will write all papers using [LaTeX](https://www.latex-project.org/). 
If you don't know LaTeX, now is the time to learn. - -* If we are targeting a specific conference, make sure to get the right template for the conference. It would be such a pity to get a desk reject because of using the wrong template! If we are not sure which conference we will target, then just use the standard [ACM](http://www.acm.org/publications/proceedings-template) or [IEEE](https://www.ieee.org/conferences_events/conferences/publishing/templates.html) template for the time being, but make sure to switch to the right one when we decide on a conference. - -* Rename the main LaTeX file as `main.tex` and keep it in the main directory. Create a separate `tex` file for each section in the paper, and organize the directory as follows (this example assumes we are using the ACM template): - -``` -main.tex -ACM-Reference-Format.bst -acmart.cls -sections/ - introduction.tex - related-work.tex - ... -meta/ - packages.tex - macros.tex - any-other-definition-files.tex -images/ - linux-evolution.pdf - any-other-figure-in-the-paper.pdf -references/ - references.bib - any-other-ref-files-as-needed.bib -``` - -The idea is to include the other files in the `main.tex` file as needed. This would be an example of what `main.tex` looks like: - - -``` -\documentclass[sigconf,review,anonymous]{acmart} - -\input{meta/packages} -\input{meta/macros} - -%Conference -\acmConference[ICSE'18]{International Conference on Software Engineering}{May 2018}{Gothenburg, Sweden} -\acmYear{2018} -\copyrightyear{2018} - -\acmPrice{15.00} - - -\begin{document} - -\title{Our Awesome Paper} - -% all the author information - -\input{sections/abstract} - - -\maketitle - -\input{sections/introduction} -\input{sections/motivation} -... 
-\input{sections/related-work.tex} -\input{sections/conclusion} - -\newpage - -\balance - -\bibliographystyle{ACM-Reference-Format} -\bibliography{references/references} - -\end{document} -``` - -* Please do not include any generated files in the git repository (e.g., .blg, .log, .aux etc.). This also includes the `main.pdf` file. - -## Collaboration - -* We will collaborate through the git repository. So **please please please** commit regularly and push the repo. You don't have to wait till you have finished the whole section and revised it 5 times before pushing it to the repo. I will not read it anyway until you tell me it is ready, so commit often: that way we have a good history in case something goes wrong, and we don't lose all your writing if your computer crashes. - -* For early drafts, I would typically mark up a pdf and send it to you or we would sit together and go through a printed section where I'll mark up things as we go. In either case, I expect that you will update things in the repository afterwards. In later stages, once the content is a bit stable, I will typically start editing things in the repository. This will usually be through leaving comments in the text. To make it easier for me to leave comments and for you to respond to them if necessary, always have the following macros defined (obviously, change colors as needed). If a comment has already been addressed and resolved, then remove it from the text. - -{% raw %} -``` -\newcommand{\sn}[1]{{\color{blue} \textbf{Sarah:}~#1}} -\newcommand{\}[1]{{\color{green}\textbf{:}~#1}} -``` -{% endraw %} - -* There may be some parts of the paper still under construction, e.g., numbers you still need to get or a citation you still need to find. 
For these, have a TODO macro defined such that we can easily spot what's left to do: - -{% raw %} -``` -\newcommand{\todo}[1]{{\textcolor{red}{\textbf{TODO:}~#1}}} -``` -{% endraw %} - -* To make sure we can clearly see numbers that need to be double checked, surround all numbers with the following `\checkNum` macro. Make sure to remove the coloring from the macro before submitting. *Credits*: again, this is another trick I learned from TU Darmstadt students. - -{% raw %} -``` -\newcommand{\checkNum}[1]{{\textcolor{orange}{\textit{#1}}}} -``` -{% endraw %} - -* When you are the only one working on the repository, there is no chance of conflicts. However, if we are both editing the paper, we should coordinate such that we are not both editing the same sections (that's why I like each section to go in a separate file). We can do this via Slack or email. - - -* Before you try pulling changes into your local repository, make sure to commit your current changes first and then do `git pull --rebase`. Do the same before pushing changes. This decreases the chances of us getting unnecessary merge conflict messages and provides a cleaner history for us (i.e., no unnecessary merges). - -* While I have not strictly implemented this myself so far, I think it is a good idea to have each sentence in a separate line (Sebastian Proksch at TU Darmstadt, now at U. Zürich used to follow this). This makes it easier to diff versions and resolve conflicts. I will try to implement this myself going forward. - - -## Writing Style - -* I don't claim to be the world's best writer, but I have several pet peeves: - * Make sure you know when to use `the` vs. `a` vs. neither. If you find me constantly adding or removing `the`'s from your text and you don't understand what the problem is, come ask me why. Please do not just keep doing the same thing over and over again; it's frustrating for both of us. 
- * If you have a sentence that's four lines long, it is likely confusing and hard to understand. Break it down. The more concisely you can say something the better. If you need more words, use multiple sentences. You need to take the reader through the flow of your arguments. Don't lose readers by forcing them to go back and read each sentence or paragraph twice. Make their life easier. Reviewers are already picky as it is; don't give them another reason to shoot your paper down. - * To avoid typing a long list (and it's hard to remember all of them now), Claire Le Goues' [post](https://clairelegoues.com/2016/08/23/things-i-keep-repeating-about-writing/) has good tips on writing style. However, note how she prefers having the whole paper in one file and I don't -- hence, each advisor's personal preferences. - -* In general, be prepared to do multiple iterations on the paper. We might end up reorganizing things several times. Be patient and give yourself enough time ahead of the deadline for these iterations. - -## Paper Organization - -* I always remember my PhD advisor, [Ric Holt](https://plg.uwaterloo.ca/~holt/), for the words "big picture". They are now engraved in my brain. So what do they mean? You always want to tell the reader what the big picture is. What's the context of the problem you are dealing with? What exactly are you doing? Why should they care about what you are doing? Who will benefit from the results? How can the results be used? A good paper never leaves the reader wondering about any of these points. Ideally, the introduction should already answer a lot of these "big picture" questions without necessarily overwhelming the reader with tons of low-level details or side "stories". - -* Related to big pictures, I'm a big fan of overview figures that provide a numbered or labelled illustration of all steps of the methodology or the components of a framework, for example. 
These numbers can then be referenced in the text, and make life so much easier for the reader. They also force you to write in a more structured way. Examples: [Figure 1](https://dl.dropboxusercontent.com/s/j4gu0145t4ry0xs/Proksch_ASE16.pdf), [Figure 1](https://dl.dropboxusercontent.com/s/4pus24phiq7gmmt/Proksch_SANER17.pdf), or [Figure 1](https://dl.dropboxusercontent.com/s/f9gy8kzv6dwwwgl/NADI_ICSE14.pdf). Notice how they are all Figure 1 :-) - -## Updating Results - -* Ideally, you want to create your experiments such that it is easy to re-run them and update the results in the paper as needed. - -* For figures, plots etc., the best way to do this is to have a script for reproducing the graph. So we can basically update a label, re-run the script and then re-compile our LaTeX file. If you already do your figures in LaTeX (I personally don't just because I never tried it not because I have anything against it), then you already guarantee they are always up to date. - -* Ideally, also have a script that does everything. This script can call multiple other scripts/programs, but it should go through the whole pipeline of data analysis etc. until reproducing plots and tables. Basically, re-run your experiment from A-Z with a click of a button. Depending on the situation, this may not always be feasible but it's great to have. - -* One way to make the recreation of tables easier is to have each table in a separate .tex file and have a script that generates it. - -* For numbers in the text, you can create scripts that output a .tex file with some of the results you will use in the paper. I will use my ICSE '16 paper as an example. On the [artifact page](http://www.st.informatik.tu-darmstadt.de/artifacts/crypto-api-misuse/), you will find the [R script](http://www.st.informatik.tu-darmstadt.de/artifacts/crypto-api-misuse/Data/Study4/survey_analysis.r) I used for all my data analysis. 
This R script also outputted certain values to a `survey_data.tex` file, which looked like this: - -``` -\pgfkeyssetvalue{total_responded}{43} -\pgfkeyssetvalue{ignored}{6} -\pgfkeyssetvalue{total_analyzed}{37} -\pgfkeyssetvalue{percentage_students}{11} -\pgfkeyssetvalue{percentage_professional}{54} -\pgfkeyssetvalue{percentage_six_years}{73} -\pgfkeyssetvalue{percentage_atleast_knowledgeable}{86} -\pgfkeyssetvalue{percentage_rarely_need_crypto}{57} -\pgfkeyssetvalue{percentage_secure_comm_rank1}{49} -\pgfkeyssetvalue{percentage_user_auth_rank1}{30} -\pgfkeyssetvalue{user_auth_avg_rank}{3.95} -... -``` - -This data file was included in `main.tex` as follows: - -``` -\input{results/survey_data.tex} -``` - -To use any of these values in the text of a particular section in the paper, you would do: - -``` -We base our findings below on the remaining \checkNum{\pgfkeysvalueof{total_analyzed}} participants. -``` - -You do need to have `\usepackage{pgfkeys}` to be able to use this (*Credits:* I learned about this package from the Undertaker group at Erlangen). Obviously, there may be other ways to do something similar. The point is that you could re-run an external script that would regenerate all numbers and you don't have to go and manually update each number in your tex files. diff --git a/_pages/apply.md b/_pages/apply.md deleted file mode 100644 index 6b9c006..0000000 --- a/_pages/apply.md +++ /dev/null @@ -1,40 +0,0 @@ ---- -layout: page -permalink: /apply/ -title: How to apply to my research group -description: -nav: false -nav_order: 5 ---- - -Please note that I am currently NOT accepting students at the University of Alberta. -This information is for prospective PhD students/postdocs/research associates at NYUAD. -Note that NYUAD does not have a Master's program. - -## Postdoctoral Researchers - -- Prospective postdoctoral researchers should have obtained their PhD with a focus on **Software Engineering** in the last 0-3 years. 
-- As a first step, please check [https://nyuad.nyu.edu/en/about/careers/postdoctoral-and-research.html](https://nyuad.nyu.edu/en/about/careers/postdoctoral-and-research.html) for postdoc openings that I may already be advertising. If you find open positions, please directly submit an application. -- If you do not find open positions and want to inquire about potential openings, please email me with a copy of your CV and some brief info about your PhD degree and doctoral research. - -## Research Assistants - -- Research assistants are those who have already obtained a Bachelor's or Master's degree and wish to gain some research experience. -- As a first step, please check [https://nyuad.nyu.edu/en/about/careers/postdoctoral-and-research.html](https://nyuad.nyu.edu/en/about/careers/postdoctoral-and-research.html) for research assistant openings that I may already be advertising. If you find open positions, please directly submit an application. -- If you do not find open positions and want to inquire about potential openings, please email me with a copy of your CV and some brief info about your background (education plus any industrial and/or research experience) and research interests. - -## PhD Students - -* NYUAD offers a [Global PhD Fellowship Program](https://nyuad.nyu.edu/en/admissions/graduate/global-phd-student-fellowships-in-science.html) through agreements with two Computer Science doctoral programs in NYU New York. The programs generally involve one year of classwork in NYU New York followed by three to four years of research in NYU Abu Dhabi. If selected, the doctorate is fully funded under NYU Abu Dhabi's Global PhD Student Fellowship. -* Please submit a standard PhD application to one or both Computer Science doctoral programs in NYU New York listed below. - * The Computer Science and Engineering department via the [NYU Tandon School of Engineering](https://engineering.nyu.edu/admissions/graduate). 
- * The Courant Institute PhD application via the [NYU Graduate School of Arts and Science](https://cs.nyu.edu/home/phd/admission.html). - - The choice of the program will depend on you. When you apply, you will be considered by the faculty in the NYU New York program as well as the NYU Abu Dhabi program. There is no separate application for the Global PhD Student Fellowship; all interested PhD applicants will be considered. Not all PhD application forms specifically ask about candidates' interest in NYUAD. You may indicate your interest in the NYUAD Global PhD by referencing NYUAD, an NYUAD faculty member, or an NYUAD research group in your personal statement (or other application documents). -* Please be advised that the PhD application deadlines are typically early or mid December of the preceding year. If you missed the deadlines, it may still be possible to submit an application, but you should contact the relevant NYU New York department. - -For more information about the Global Ph.D. Student Fellowship, please contact [nyuad.graduateadmissions@nyu.edu](mailto:nyuad.graduateadmissions@nyu.edu). - -## General Application Advice - -* I highly encourage you to watch this talk on contacting potential supervisors before emailing me (or any other potential faculty members you may want to work with for any position): [https://youtu.be/B3oANa67Iq4](https://youtu.be/B3oANa67Iq4) \ No newline at end of file diff --git a/_pages/members.md b/_pages/members.md deleted file mode 100644 index e9c5e8a..0000000 --- a/_pages/members.md +++ /dev/null @@ -1,186 +0,0 @@ ---- -layout: students -permalink: /students/ -title: Students -description: I am fortunate to be working/have worked with the following students and researchers. 
-nav: false -nav_order: 1 - -profiles: - postdoc: - - image: FranciscoRibeiro.jpeg - name: Francisco Ribeiro - affiliation: NYUAD - url: https://scholar.google.pt/citations?user=zTLXgZgAAAAJ&hl=en - - image: MayMahmoud.jpg - name: May Mahmoud - url: https://www.maymahmoud.org - affiliation: NYUAD - doctoral: - - image: MohayeminIslam.png - name: Mohayeminul Islam - affiliation: UofA - url: https://mohayemin.github.io - - name: Akalanka Galappaththi - image: AkalankaGalappaththi.jpg - url: https://boneyag.github.io/ - affiliation: UofA - alumni: - - name: Afiya Sarah Fahmida - image: AfiyaFahmidaSarah.jpeg - position: MSc - startyear: 2022 - endyear: 2024 - affiliation: UofA - - name: Max Ellis - image: MaxEllis.jpg - position: MSc - startyear: 2019 - endyear: 2022 - affiliation: UofA - - image: MansurGulami.jpg - name: Mansur Gulami - position: MSc - startyear: 2021 - endyear: 2022 - affiliation: UofA - - name: Henry Tang - image: HenryTang.jpg - position: MSc - startyear: 2020 - endyear: 2022 - affiliation: UofA - - name: Henry Tang - image: HenryTang.jpg - position: Undergrad RA - startyear: 2019 - endyear: 2020 - affiliation: UofA - - name: Xichen Pan - image: XichenPan.png - position: Undergrad RA - startyear: 2021 - endyear: 2021 - affiliation: UofA - - name: Xiaole Zeng - image: XiaoleZeng.jpeg - position: Undergrad RA - startyear: 2021 - endyear: 2021 - affiliation: UofA - - name: Varsha Ramesh - image: VarshaRamesh.jpeg - position: Mitacs Intern - startyear: 2021 - endyear: 2021 - affiliation: UofA - - name: Katherine Patenio - image: KatherinePatenio.jpg - position: Undergrad RA - startyear: 2020 - endyear: 2021 - affiliation: UofA - - name: Batyr Nuryyev - image: BatyrNuryyev.png - position: MSc - startyear: 2019 - endyear: 2021 - affiliation: UofA - url: https://batyr.dev - - name: Rehab El-Hajj - image: RehabElHajj.png - position: Undergrad RA - startyear: 2019 - endyear: 2020 - affiliation: UofA - - name: Moein Owadi-Kareshk - image: MoeinOwhadi.jpg - 
position: MSc - startyear: 2017 - endyear: 2020 - affiliation: UofA - - name: Jennifer Mah - image: JenniferMah.jpg - position: High School Intern - startyear: 2020 - endyear: 2020 - affiliation: UofA - - name: Ryan Shukla - image: RyanShukla.jpeg - position: Undergrad RA - startyear: 2019 - endyear: 2019 - affiliation: UofA - - name: Lida Ling - image: LidaLing.png - position: Undergrad RA - startyear: 2019 - endyear: 2019 - affiliation: UofA - - name: Samer Al Masri - image: SamerAlMasri.png - position: MSc - startyear: 2016 - endyear: 2018 - affiliation: UofA - - name: Nazim Bhuiyan - image: NazimBhuiyan.png - position: Undergrad RA - startyear: 2017 - endyear: 2018 - affiliation: UofA - - name: Mehran Mahmoudi - image: MehranMahmoudi.jpg - position: MSc - startyear: 2016 - endyear: 2018 - affiliation: UofA - url: https://www.linkedin.com/in/mehrmoudi/ - - name: Linna Qian - image: LinnaQian.png - position: High School Intern - startyear: 2018 - endyear: 2018 - affiliation: UofA - - name: Jacob Reckhard - position: Undergrad RA - image: JacobReckhard.png - startyear: 2018 - endyear: 2018 - affiliation: UofA - - name: Fernando Lopez de la Mora - position: MSc - image: FernandoLopez.png - startyear: 2016 - endyear: 2018 - affiliation: UofA - - name: Benyamin Noori - position: MSc - image: BenyaminNoori.png - startyear: 2016 - endyear: 2018 - affiliation: UofA - url: https://www.linkedin.com/in/benyamin-noori-a58aa953/ - - name: Aida Radu - position: Undergrad RA - image: AidaRadu.png - startyear: 2018 - endyear: 2018 - affiliation: UofA - - name: Imtihan Ahmad - startyear: 2018 - endyear: 2018 - position: Undergrad RA - image: ImtihanAhmed.png - url: https://www.linkedin.com/in/imtihan-ahmed/ - affiliation: UofA - - name: Ajay Kumar Jha - position: Postdoc - startyear: 2020 - endyear: 2022 - affiliation: UofA - image: Ajay.jpg - url: https://hifromajay.github.io - - ---- \ No newline at end of file diff --git a/_pages/projects.md b/_pages/projects.md deleted file 
mode 100644 index 019851f..0000000 --- a/_pages/projects.md +++ /dev/null @@ -1,58 +0,0 @@ ---- -layout: page -title: Research Projects -permalink: /projects/ -description: -nav: false -nav_order: 2 -display_categories: [active, inactive] -horizontal: false ---- - - -
-{%- if site.enable_project_categories and page.display_categories %} - - {%- for category in page.display_categories %} -

{{ category }}

- {%- assign categorized_projects = site.projects | where: "category", category -%} - {%- assign sorted_projects = categorized_projects | sort: "importance" %} - - {% if page.horizontal -%} -
-
- {%- for project in sorted_projects -%} - {% include projects_horizontal.html %} - {%- endfor %} -
-
- {%- else -%} -
- {%- for project in sorted_projects -%} - {% include projects.html %} - {%- endfor %} -
- {%- endif -%} - {% endfor %} - -{%- else -%} - - {%- assign sorted_projects = site.projects | sort: "importance" -%} - - {% if page.horizontal -%} -
-
- {%- for project in sorted_projects -%} - {% include projects_horizontal.html %} - {%- endfor %} -
-
- {%- else -%} -
- {%- for project in sorted_projects -%} - {% include projects.html %} - {%- endfor %} -
- {%- endif -%} -{%- endif -%} -
diff --git a/_pages/publications.md b/_pages/publications.md deleted file mode 100644 index 8bcb804..0000000 --- a/_pages/publications.md +++ /dev/null @@ -1,14 +0,0 @@ ---- -layout: page -permalink: /publications/ -title: Publications -description: publications by categories in reverse chronological order. generated by jekyll-scholar. -nav: false -nav_order: 3 ---- - -
- -{% bibliography -f {{ site.scholar.bibliography }} %} - -
diff --git a/_pages/resources.md b/_pages/resources.md deleted file mode 100644 index 5c06775..0000000 --- a/_pages/resources.md +++ /dev/null @@ -1,13 +0,0 @@ ---- -layout: page -permalink: /resources/ -title: Resources -description: -nav: false -nav_order: 5 ---- - -## Academic Advice and Resources - -* [Writing Papers](/writing-papers/) (discusses logistics and organization) -* [Preparing a Grad School Application & Contacting Potential Supervisors](https://youtu.be/B3oANa67Iq4) diff --git a/_pages/teaching.md b/_pages/teaching.md deleted file mode 100644 index b79b372..0000000 --- a/_pages/teaching.md +++ /dev/null @@ -1,12 +0,0 @@ ---- -layout: page -permalink: /teaching/ -title: Teaching -description: -nav: false -nav_order: 5 ---- - -For now, this page is assumed to be a static description of your courses. You can convert it to a collection similar to `_projects/` so that you can have a dedicated page for each course. - -Organize your courses by years, topics, or universities, however you like! diff --git a/_projects/AI4SE.md b/_projects/AI4SE.md deleted file mode 100644 index ecbc3b2..0000000 --- a/_projects/AI4SE.md +++ /dev/null @@ -1,16 +0,0 @@ ---- -layout: project -title: AI for Software Engineering -description: Leveraging AI for Software Engineering Tasks -importance: 3 -category: active -img: assets/img/projects/artificial-intelligence.png -related-urls: - - title: TestPilot - url: https://githubnext.com/projects/testpilot/ -related_publications: SchaeferTSE2023, NguyenMSR22, GawalLuMSR24 ---- - -Large Language Models (LLMs) have taken the world by storm. They are being used to improve productivity in various domains and software engineering is no different. In this line of work, we investigate how software developers use LLMs in their work, as well as the effectiveness of LLMs in performing various software engineering tasks. 
-Credits: Brain icons created by Freepik - Flaticon \ No newline at end of file diff --git a/_projects/api_misuse.md b/_projects/api_misuse.md deleted file mode 100644 index cece1e1..0000000 --- a/_projects/api_misuse.md +++ /dev/null @@ -1,66 +0,0 @@ ---- -layout: project -title: API Misuse -description: Ensuring that library APIs are correctly used -img: assets/img/projects/mubench.png -importance: 1 -category: active -related_publications: GulamiCASCON22,NuryyevICSME22,AmannMSR19,AmannTSE18,AmannMSR16,KruegerASE17,NadiICSE2016,NADIVamos16,ArztOnward2015,KrugerSecDev23,GalaESEM2024 -related-urls: - - title: MUBench Repository - url: https://github.com/stg-tud/MUBench - - title: MUDetect Repository - url: https://github.com/stg-tud/mudetect - - title: Annotation Usage Rule Generation Pipeline Repository - url: https://github.com/ualberta-smr/generating-annotation-usage-rules - - title: CogniCrypt Project - url: https://projects.eclipse.org/proposals/eclipse-cognicrypt ---- - - -When developers use Application Programming Interfaces (APIs), they often make mistakes that can lead to bugs, system crashes, or security vulnerabilities. We refer to such mistakes as misuses. One example of a misuse is forgetting to call close() after opening a FileInputStream and reading from it. - -We study various types of API misuse. - -### API Misuse of Data-centric Python Libraries - -
-
- {% include figure.html path="assets/img/projects/data-centric-misuse.png" title="Example of a data-centric misuse" class="img-fluid rounded z-depth-1" %} -
-Data-centric Python libraries, such as pandas and matplotlib, often deal with diverse data structures, intricate processing workflows, and a multitude of parameters, which can make them inherently more challenging to use correctly. Detecting problems in the usage of these libraries is challenging, not only because of the dynamic nature of Python but also because some misuses depend on the data that is being processed. In this line of work, we investigate how API misuse manifests in these data-centric libraries and how we can design successful detection strategies to help developers use them correctly. -
- - - -### General Java API Misuse - -
-
- {% include figure.html path="assets/img/projects/mubench.png" title="MuBench" class="img-fluid rounded z-depth-1" %} -
-
- {% include figure.html path="assets/img/projects/mudetect.png" title="MuDetect" class="img-fluid rounded z-depth-1" %} -
-
- -We created MUBench, a benchmark of existing Java API misuses against which we can evaluate several misuse detectors. We systematically compared existing Java API-misuse detectors and identified weaknesses. This allowed us to design a new API misuse detector, [MuDetect](https://github.com/stg-tud/mudetect), that can achieve higher recall and precision. MuDetect allows us to mine API usage rules that involve method calls and preconditions. These usage rules are then used to find misuses in target projects. MuDetect uses a graph representation called an API Usage Graph (AUG) to represent different aspects of a method call, such as the parameters that are required by a method, the types of those parameters, the order in which different method calls are invoked, the exceptions thrown by different method calls, and the objects that are returned by different method calls. - -### Annotation Misuse in Java - -
-
- {% include figure.html path="assets/img/projects/rvt.png" title="Rule Validation Tool" class="img-fluid rounded z-depth-1" %} -
-
- -While MuDetect focuses on method calls, there are other categories of API misuses as well, such as misuses that involve annotations. We built a [human-in-the-loop approach](https://github.com/ualberta-smr/generating-annotation-usage-rules) that focuses on producing accurate Java annotation usage rules. For ease of use, these usage rules are packaged into a Maven plugin that can be used to catch bugs (similar to SpotBugs). Our tool is a complete pipeline that provides an easy way to mine and validate usage rules, and generate a misuse detector from confirmed rules. - -### Java Cryptography Misuse - -By analyzing StackOverflow posts and GitHub repositories, and by conducting two surveys of a total of 48 application developers, we collected the problems developers face with the current cryptography APIs and their suggestions for improvement. Our findings included that developers have problems choosing the correct algorithm to use and also want higher-level abstractions such as tasks. To address these issues, we looked closer at the cryptography domain, and realized that there is a wide variety of cryptographic components and algorithms (e.g., ciphers, digests, signatures, etc.) and that each of these components comes with its own *variability*. For example, a cipher can be symmetric or asymmetric. If it is symmetric, it can operate on blocks or streams. Additionally, there are different modes of operation (e.g., ECB vs CBC) as well as different padding schemes. In order to deal with this huge variability space, we model cryptographic components using concepts from feature modeling. However, such components have many attributes. Additionally, some cryptography solutions may use multiple components at the same time. We therefore need modeling notations beyond those offered by basic feature modeling. - -[CogniCrypt](https://projects.eclipse.org/proposals/eclipse-cognicrypt) was built on the insights derived from these studies. 
- - - diff --git a/_projects/code-recom.md b/_projects/code-recom.md deleted file mode 100644 index 9c808b4..0000000 --- a/_projects/code-recom.md +++ /dev/null @@ -1,23 +0,0 @@ ---- -layout: project -title: Code Recommender Systems -description: Helping Developers use APIs -importance: 5 -category: active -img: assets/img/projects/ComponentsWithQueries.png -related-urls: - - title: Task-oriented Library Documentation - url: https://github.com/ualberta-smr/TaskOrientedDocumentation - - title: Kave Project - url: http://www.kave.cc/ -funding: - - name: Canada Research Chairs Program - link: http://www.chairs-chaires.gc.ca/home-accueil-eng.aspx - img: /resources/images/crc.png -related_publications: ProkschMSR16, CerganiSWAN16, Proksch:2016, TangEMSE2021, GalaMSR22, AbidEMSE21,NadiSANER20,RaduMSR19,ProkschSANER17 ---- - -Do you often spend time searching for how to use a specific library to accomplish your programming task? Do you wish there was a concise code example that you can just integrate into your project? You are not alone. Many developers spend considerable time searching for APIs to use, known issues with a code snippet, or for examples to help them learn a new technology or library. Different types of recommender systems save developers some of this time and pain. In this line of work, we investigate various support tools and recommender systems (code search, code completion, code generation, etc.) to help developers navigate API information more easily and write better code faster. - -To build code recommender systems, we curate and build data sets, build support techniques (e.g., code completion, code search, documentation navigation), and evaluate these techniques through quantitative empirical methods or qualitative methods (e.g., surveys or user studies). This line of work involves static code analysis, data mining, and natural language processing. 
- diff --git a/_projects/draca.md b/_projects/draca.md deleted file mode 100644 index 6c9cd4a..0000000 --- a/_projects/draca.md +++ /dev/null @@ -1,12 +0,0 @@ ---- -layout: project -title: CMDBs -description: Root Cause Analysis & Change Impact Analysis using CMDBs -img: assets/img/projects/draca.png -importance: 10 -category: inactive -related_publications: NadiCASCON2009 - ---- - -Many IT systems use Configuration Management Databases (CMDBs) to keep track of which hardware and software is installed as well as any problems that occur over time. Thus, over time, CMDBs collect large amounts of valuable data that can be used for decision support. This project proposes mining historic data from a CMDB to detect common co-changes that can be used to support change impact analysis. We show that using co-changes helps predict change sets with rates as high as 70% recall and 89% precision. Additionally, we propose using data from other repositories such as scheduling information (e.g., backup processes, build processes, etc.) in conjunction with the data in the CMDB to provide support for root cause analysis. Our work on identifying which data from the different repositories can contribute to a better change impact analysis and root cause analysis framework won the best paper award at the 19th Centre of Advanced Studies Conference (CASCON). 
\ No newline at end of file diff --git a/_projects/lib-sel.md b/_projects/lib-sel.md deleted file mode 100644 index 9d08657..0000000 --- a/_projects/lib-sel.md +++ /dev/null @@ -1,27 +0,0 @@ ---- -layout: project -title: Library Selection -description: Comparing & Selecting Software Libraries -img: assets/img/projects/lib-comparison.png -importance: 3 -category: active -related_publications: LopezdelaMoraICSENIER18,LopezDeLaMoraPROMISE18,ElHajjFSE20,NadiSakrEMSE2022,TangMSR23 -related-urls: - - title: Library Comparison Website - url: http://smr.cs.ualberta.ca/comparelibraries/ - - title: Scripts for Library Metric-based Comparisons - url: https://github.com/ualberta-smr/LibraryMetricScripts - - title: IntelliJ plugin for library comparisons - url: https://github.com/ualberta-smr/LibCompPlugin - - title: Evaluating Software Documentation Quality - url: https://github.com/ualberta-smr/DocumentationQuality -funding: - - name: Canada Research Chairs Program - link: http://www.chairs-chaires.gc.ca/home-accueil-eng.aspx - img: /resources/images/crc.png ---- - -With the abundance of software libraries available, finding the right one to use can be a time-consuming task. -In this research direction, we mine various software repositories to extract information that can be used to compare libraries across various aspects (e.g., their documentation, popularity, etc.). - -Given the popularity of data-driven applications, data scientists have become more involved in contributing to various software components. We also explore what selection factors data scientists consider when choosing a software library for their work. 
\ No newline at end of file diff --git a/_projects/merging.md b/_projects/merging.md deleted file mode 100644 index f5de6d2..0000000 --- a/_projects/merging.md +++ /dev/null @@ -1,33 +0,0 @@ ---- -layout: project -title: Software Integration -description: Helping developers with software evolution & merge conflicts -importance: 3 -category: inactive -img: assets/img/projects/comparison_scenario.png -related_publications: MahmoudiMSR18,MahmoudiSANER19,OwhadiKareshkMSR19,OwhadiKareshkESEM19,EllisTSE2023,BusingEMSE22,BusingeICSEM18,EllisTSE2023 -related-urls: - - title: Android Update Analysis - url: https://github.com/ualberta-smr/Android-Update-Analysis - - title: Refactoring in Merge Commits - url: https://github.com/ualberta-smr/RefactoringsInMergeCommits - - title: Merganser - url: https://github.com/ualberta-smr/merganser -funding: - - name: NSERC - link: http://www.nserc-crsng.gc.ca/index_eng.asp - img: /resources/images/nserc.png - - name: Samsung Global Research Outreach Program - link: http://www.sait.samsung.co.kr/saithome/about/collabo_overview.do - img: /resources/images/samsung.png -collaborators: - - name: Julia Rubin - affiliation: University of British Columbia - url: http://www.ece.ubc.ca/~mjulia/ - - name: Nikolaos Tsantalis - affiliation: Concordia University - url: https://users.encs.concordia.ca/~nikolaos/ - ---- - -Multiple versions of a software system can exist for various reasons, such as developing an SPL or simply forking or branching a repo to work on a given feature. At some point, these versions need to be integrated. Such integration is not an easy task since there may be conflicting changes in the code, textually, syntactically, and semantically. In this work, we look at how we can facilitate such integrations and how we can help developers merge their code more easily with fewer conflicts. 
diff --git a/_projects/migration.md b/_projects/migration.md deleted file mode 100644 index f515468..0000000 --- a/_projects/migration.md +++ /dev/null @@ -1,19 +0,0 @@ ---- -layout: project -title: Library Migration -description: Helping developers switch between libraries -importance: 2 -category: active -img: assets/img/projects/migration.png -related-urls: - - title: PyMigBench - url: https://ualberta-smr.github.io/PyMigBench/ -funding: - - name: Canada Research Chairs Program - link: http://www.chairs-chaires.gc.ca/home-accueil-eng.aspx - img: /resources/images/crc.png -related_publications: IslamMSR23, IslamFSE24 ---- - -Software developers often need to replace third-party libraries with newer or better libraries, a process known as *library migration*. -Library migration requires replacing all API usages of the original library in the client code with corresponding API usages from the new library. In this project, our goal is to understand the library migration process and develop tools that can support developers in automatically migrating from one library to an alternative one. 
\ No newline at end of file diff --git a/_projects/sw-variability.md b/_projects/sw-variability.md deleted file mode 100644 index 4462a54..0000000 --- a/_projects/sw-variability.md +++ /dev/null @@ -1,28 +0,0 @@ ---- -layout: project -title: Software Variability -description: Creating & maintaining variants of the same system -importance: 4 -category: inactive -img: assets/img/projects/var-anomalies.png -related_publications: NadiJSEP2013,NadiDS2013,NadiMSR2013,NadiCSMR2012,NadiWCRE2011, Medeiros2015, NadiTSE2015, NadiICSE2014, ALMasriSPLC18,AlMasriCASCON17, NuryyevICSE21, BusingeICSEM18,BusingEMSE22 -related-urls: - - title: Farce Appendix (for Reverse Engineering Configuration Constraints) - url: http://gsd.uwaterloo.ca/farce - - title: Farce Source Code - url: https://bitbucket.org/snadi/farce - - title: OMR Statistics - url: https://github.com/samasri/omr/tree/master/tools/compiler/OMRStatistics - - title: VarClang - url: https://github.com/ualberta-smr/varclang - - title: BruteClang - url: https://github.com/nbhuiyan/BruteClang - - title: Makex (CSMR 2012 paper) - url: https://github.com/snadi/makex - - title: Linux Variability Anomalies Evolution (MSR 2012 paper) - url: https://github.com/snadi/LinuxVarAnomalyEvolution ---- - -Software reuse is essential to build software faster. Different customers or platforms may need different features of the same software system. Instead of copy-and-paste mechanisms where different copies of the system are maintained, using Software Product Lines (SPLs) or Highly Configurable Software is a way to systematically create and maintain different variants of the same system. - -We have a long line of work in this area, exploring different aspects of creating and maintaining SPLs. A lot of this work is done on the Linux kernel, as an exemplar of an extremely large and popular highly configurable system. We also explored other systems such as [Eclipse OMR](https://github.com/eclipse/omr) and Android App software families.