From 5cad123ad2e21bee54cd14523010fb8049b6b1f9 Mon Sep 17 00:00:00 2001 From: Michael Chen Date: Mon, 9 May 2022 13:56:14 +0200 Subject: [PATCH] Working on introduction 1 finished --- .../SpecialSession01_MichaelChen.tex | 142 ++++++++++++++++++ 1 file changed, 142 insertions(+) create mode 100644 project_task_sheets/phase_research/SpecialSession/SpecialSession01_MichaelChen.tex diff --git a/project_task_sheets/phase_research/SpecialSession/SpecialSession01_MichaelChen.tex b/project_task_sheets/phase_research/SpecialSession/SpecialSession01_MichaelChen.tex new file mode 100644 index 0000000..0a00343 --- /dev/null +++ b/project_task_sheets/phase_research/SpecialSession/SpecialSession01_MichaelChen.tex @@ -0,0 +1,142 @@ +\documentclass[a4paper]{scrreprt} +\usepackage[left=4cm,bottom=3cm,top=3cm,right=4cm,nohead,nofoot]{geometry} +\usepackage{graphicx} +\usepackage{tabularx} +\usepackage{listings} +\usepackage{enumitem} +\usepackage{subcaption} +\usepackage{amsmath} +\usepackage{float} +\usepackage{fancyvrb} % for "\Verb" macro +\usepackage{hyperref} +\usepackage{csquotes} +\usepackage[acronym]{glossaries} + +\usepackage{pgf} +\usepackage{tikz} +\usetikzlibrary{arrows,automata} + +\newacronym{svm}{SVM}{support-vector machine} +\newacronym{nb}{NB}{naive Bayes} +\newacronym{roc}{ROC}{receiver operating characteristic} + +\usepackage{xparse} +\usepackage{multirow} + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\setlength{\textfloatsep}{16pt} + +\renewcommand{\labelenumi}{\alph{enumi})} +\renewcommand{\labelenumii}{\arabic{enumii}) } + +\newcommand{\baseinfo}[5]{ + \begin{center} + \begin{tabular}{p{15cm}r} + \vspace{-4.5pt}{ \Large \bfseries #1} & \multirow{2}{*}{} \\[0.4cm] + #2 & \\[0.5cm] + \end{tabular} + \end{center} + \vspace{-18pt}\hrule\vspace{6pt} + 
\begin{tabular}{ll} + \textbf{Name:} & #4\\ + \textbf{Group:} & #5\\ + \end{tabular} + \vspace{4pt}\hrule\vspace{2pt} + \footnotesize \textbf{Software Testing} \hfil - \hfil Summer 2022 \hfil - \hfil #3 \hfil - \hfil Sibylle Schupp / Sascha Lehmann \hfil \\ +} + +\newcounter{question} +\NewDocumentEnvironment{question}{m o}{% + \addtocounter{question}{1}% + \paragraph{\textcolor{red}{Task~\arabic{question}} - #1\hfill\IfNoValueTF{#2}{}{[#2 P]}} + \leavevmode\\% +}{% + \vskip 1em% +} + +\NewDocumentEnvironment{answer}{}{% + \vspace{6pt} + \leavevmode\\ + \textit{Answer:}\\[-0.25cm] + {\color{red}\rule{\textwidth}{0.4mm}} +}{% + \leavevmode\\ + {\color{red}\rule{\textwidth}{0.4mm}} +} + +\newcommand{\projectinfo}[5]{ + \baseinfo{Special Session #1 - Submission Sheet}{#2}{#3}{#4}{#5} +} +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\def\name{Michael Chen} +\def\group{Group 01 (fastjson)} + +\begin{document} +\projectinfo{1}{Software Testing - Introduction Write-Up\small}{\today}{\name}{\group} + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%% Task 1 %%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\begin{question}{Analysis Task (for the given introduction text)}[0] +\begin{enumerate}[topsep=0pt, leftmargin=*] + \item Read the provided \textit{Introduction} carefully. + + \item Summarize the main contributions of the paper as depicted in / conceived from the \textit{Introduction}. + \begin{answer} + \begin{enumerate} + \item Introduction 1: + Email spam and malware are major problems; therefore, the study compares the performance of three supervised machine-learning-based spam classification algorithms.
The main contribution of the paper is a comparative evaluation of these algorithms using six performance metrics. Based on this evaluation, the paper identifies \gls{svm} models as the most reliable means of email filtering. + \end{enumerate} + \end{answer} + + \item Come up with two title suggestions that appropriately describe the (conceived) topic of the paper. + \begin{answer} + \begin{enumerate} + \item Introduction 1: + \enquote{Comprehensive Study on Supervised Machine Learning Models for Email Spam Filtering} + \end{enumerate} + \end{answer} + + \item When you are done with a)--c), ask the supervisor for the actual title and topic of the paper. + \begin{answer} + \begin{enumerate} + \item Introduction 1: + The actual title is \enquote{Analysis and result of classification algorithm on email classification}. + \end{enumerate} + \end{answer} + + \item What are the main shortcomings of the \textit{Introduction} in its given form? + \begin{answer} + \begin{enumerate} + \item Introduction 1: + Given the task question, the introduction is better than I expected. It clearly outlines the context and use case of email communication and explains why the topic is timely and relevant to research. It also mentions prior research on several \gls{nb} models against which the new results are compared, and it outlines the methodology (the performance metrics that are applied). The main shortcomings I identified are the following: + \begin{enumerate} + \item The introduction should end with a statement that summarizes the main idea of the entire study. + \item The introduction is missing an outline of the paper's structure, so after reading it the reader has no guide for navigating the paper.
+ \item Full URLs should not appear in the text of the introduction (or anywhere in the body of the paper); they belong in the bibliography and should be referenced by a citation. + \item Several acronyms are never expanded, namely those of the different \gls{nb} models and of the \gls{roc} curve. + \item Some phrases are informal, such as \enquote{waste of time} and \enquote{such huge spam}. + \item The word \enquote{email} and some other terms are spelled inconsistently. Even where alternative forms of a word appear, the text should settle on one form and use it throughout. + \item Many consecutive sentences start with the same word, specifically \enquote{it} and \enquote{email}. + \item There are several spacing and formatting issues: \enquote{e-~mail} with incorrect spacing, missing Oxford commas as in \enquote{NB, NBT, BN and DTNB}, malformed citations such as \enquote{(Nilam et al.,2017~)}, \enquote{f~measure} where \enquote{F-score} should be used, and several instances of inconsistent whitespace. + \end{enumerate} + \end{enumerate} + \end{answer} + + \item Make suggestions on how to improve the \textit{Introduction}. Which improvements would you prioritize, and why? + \begin{answer} + \begin{enumerate} + \item Introduction 1: + The highest priority should be fixing \textit{all} formatting and spacing issues. The second highest priority should be fixing the informal language and the repeated sentence openings. Finally, I would add the missing outline of the paper's structure, as it helps readers understand and navigate the paper more quickly and effectively. + \end{enumerate} + \end{answer} + + \end{enumerate} +\end{question} + +\end{document}