
%\documentclass{article}
%\usepackage[letterpaper,margin=2.1cm]{geometry}
%\usepackage{xcolor}
%\usepackage{fancyhdr}
%\usepackage{tgschola} % or any other font package you like

\documentclass[12pt]{article}
\usepackage{extsizes}
\usepackage{graphicx}
\usepackage[hidelinks]{hyperref}
\usepackage{multirow}
\usepackage{tabularx}
\usepackage{color}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsfonts}
\usepackage{amsxtra}
\usepackage{wasysym}
\usepackage{isomath}
\usepackage{mathtools}
\usepackage{txfonts}
\usepackage{upgreek}
\usepackage{enumerate}
\usepackage{enumitem}
\usepackage{tensor}
\usepackage{pifont}
\usepackage[margin=15mm]{geometry}
\definecolor{color-1}{rgb}{0.26,0.26,0.26}
\definecolor{color-2}{rgb}{0.4,0.4,0.4}
\usepackage{tocbibind}
\usepackage{float}
\usepackage{flafter}
\usepackage{xcolor}
\usepackage{sectsty}
\usepackage[font=small, skip=0pt]{caption}
\usepackage{setspace}
\setstretch{1.1}
\usepackage{fancyhdr}

\usepackage{nopageno}

% Select the font
\usepackage{charter}


\usepackage[%
square,        % for square brackets
comma,         % use commas as separators
numbers,       % for numerical citations;
%sort           % orders multiple citations into the sequence in which they appear in the list of references;
sort&compress % as sort but in addition multiple numerical citations
% are compressed if possible (as 3-6, 15);
]{natbib}

\renewcommand{\bibfont}{\normalfont\footnotesize}
\usepackage{hyperref}
\hypersetup{
	colorlinks = true,
	citecolor = {blue}
}


\newcommand{\soptitle}{Reinforcement Learning Improves Edge Computing}
\newcommand{\yourname}{Iman Rahmati}
\newcommand{\youremail}{iman.rahmati@sharif.edu}
\newcommand{\yourweb}{\href{https://imanrht.github.io}{imanrht.github.io}}

\newcommand{\statement}[1]{\par\medskip
	\underline{\textcolor{blue}{\textbf{#1:}}}\space
}

%\usepackage[
%colorlinks,
%breaklinks,
%pdftitle={\yourname - \soptitle},
%pdfauthor={\yourname},
%urlcolor  = blue,
%citecolor = blue,
%anchorcolor = blue,
%unicode
%]{hyperref}


\onehalfspacing

\begin{document}
	

%\pagestyle{fancy}
%\fancyhf{}
%\fancyhead[C]{%
%	\footnotesize\sffamily\vspace{8mm}
%	\textcolor{blue}{\href{mailto:iman.rahmati@sharif.edu}{Research Ideas, V0.1}}  \hfill
%	\textcolor{blue}{\href{https://imanrht.github.io/assets/images/CV_ImanRahmati.pdf}{20 Sep. 2024\vspace{2mm}}}}
%


\begin{center} 
	
	
	\vspace{-17mm}
	
	\large Iman Rahmati  \hfill Multi-Agent DRL for MEC\vspace{1mm} \hrule
	
	\vspace{-1mm}
	
	\textcolor{white}{i} \\ \LARGE Multi-Agent Deep Reinforcement Learning for Cooperative Task Offloading in Partially Observable Mobile Edge Computing Environments \vspace{6mm}\\
	
	
\end{center}
 \small
\vspace{-5mm}

\noindent\textbf{\large Motivation:  }
\noindent
In mobile edge computing (MEC), each network entity may need to make local decisions to improve network performance in a dynamic and uncertain environment. Standard learning algorithms, such as single-agent reinforcement learning (RL) or deep reinforcement learning (DRL) \cite{liao2023online,huang2019deep}, have recently been used to let each entity adaptively learn an optimal decision-making policy through interaction with the unknown environment. However, these algorithms do not model cooperation or competition among network entities. They treat other entities simply as part of the environment, and because those entities are learning at the same time, the environment appears non-stationary to each learner. Multi-agent reinforcement learning (MARL) instead enables each entity to learn its optimal policy by observing both the environment and the policies of other entities while interacting with a shared or separate environment to achieve specific objectives \cite{zhang2021multi}.

%As a result, MARL can significantly improve the learning efficiency of network entities, and it has been recently used to solve various issues in emerging networks.


%In multi-agent DRL, multiple agents interact with a shared or separate environment to achieve specific objectives. Each agent independently learns through trial and error while accounting for the actions and policies of other agents. Multiple agents interact with a shared or separate environment to achieve specific objectives. In mobile edge computing, each device might be an agent trying to optimize its computation offloading strategy while considering the resource usage and strategies of other devices.


\vspace{2mm}

\noindent\textbf{\large Problem Statement: }
\noindent
Task offloading is the process of assigning available resources to task requests so as to deliver high-performance, reliable, and cost-effective services. In MEC, the offloading decision focuses on efficiently distributing tasks among edge servers, where resources are the limited computation, storage, and communication resources of edge and cloud servers. Typically, the offloading process involves two layers of heterogeneous decision-making problems (\textbf{P1}, \textbf{P2}), as follows:
%The task offloading decision-making process focuses on efficiently distributing tasks among edge servers. 
\vspace{-2mm}
\begin{itemize}
	\item\textbf{P1.\hspace{2mm}Device-edge task offloading.} Enables devices to independently decide whether to offload resource-intensive tasks to nearby edge servers, fostering efficient utilization of resources.\vspace{-2mm}
	\item\textbf{P2.\hspace{2mm}Edge-edge task offloading.} Leverages edge-edge collaborations, where tasks initially received by a local edge server can be offloaded to neighboring servers with underutilized resources. %Offloading tasks between edge servers require communication resources and may introduce additional transmission delay, which should be taken into account when designing offloading strategies.
\end{itemize}

\noindent\textbf{\large Problem Model: }
\noindent
The overall problem can be decomposed into sub-problems \textbf{P1} and \textbf{P2} and formulated as a \textbf{Decentralized Partially Observable Markov Decision Process (Dec-POMDP)} \cite{oliehoek2016concise}, in which multiple devices and edge servers interact with one another, each acting on its own local observation of the environment, which captures only part of the global state.
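
\noindent A minimal sketch of this formulation, written in the standard Dec-POMDP notation of \cite{oliehoek2016concise}. The symbols and the discount factor $\gamma$ are illustrative assumptions, not a model fixed by this proposal:
\begin{equation*}
	\mathcal{M} = \big\langle \mathcal{I},\, \mathcal{S},\, \{\mathcal{A}_i\}_{i \in \mathcal{I}},\, P,\, R,\, \{\Omega_i\}_{i \in \mathcal{I}},\, O,\, \gamma \big\rangle ,
\end{equation*}
where $\mathcal{I}$ is the set of agents (devices and edge servers), $\mathcal{S}$ is the set of global states, $\mathcal{A}_i$ and $\Omega_i$ are the action and observation sets of agent $i$, $P(s' \mid s, \mathbf{a})$ is the state transition probability under the joint action $\mathbf{a} = (a_i)_{i \in \mathcal{I}}$, $O(\mathbf{o} \mid s', \mathbf{a})$ is the joint observation probability, and $R(s, \mathbf{a})$ is a reward shared by all agents, as in the cooperative Dec-POMDP setting. Each agent $i$ follows a policy $\pi_i$ that maps its own action-observation history to an action, and the agents jointly seek
\begin{equation*}
	\max_{\pi_1, \dots, \pi_{|\mathcal{I}|}} \; \mathbb{E}\Big[ \sum_{t=0}^{\infty} \gamma^{t} R(s_t, \mathbf{a}_t) \Big] .
\end{equation*}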

\vspace{3mm}

\noindent\textbf{\large Research Methodology: }
\begin{enumerate} 
	\item \textbf{Algorithm Design:} Develop a \textbf{MARL} algorithm based on techniques such as \textbf{Deep Deterministic Policy Gradient (DDPG)} \cite{lillicrap2015continuous} or \textbf{Dueling Double Deep Q-Networks (D3QN)} \cite{van2016deep}, with a focus on communication among agents and on their collaboration, coordination, or competition. \vspace{-1mm}
	\item \textbf{Simulation Environment:} A simulated MEC environment will be developed in Python or on a suitable simulation platform, in which mobile devices can offload tasks to edge servers and edge servers can redistribute their computation workloads under different network conditions.\vspace{-1mm}
	\item \textbf{Key Challenges:} (a) Coordination or competition between agents. (b) Non-stationarity of the environment caused by the actions of other agents. (c) Scalability as the number of agents increases.
\end{enumerate}
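
\noindent For reference, the value-based option in item 1 combines two standard ingredients, sketched here in textbook notation. The symbols, and the use of a local observation $o$ in place of the full state, are assumptions for illustration. The dueling architecture splits the action value into a value term $V$ and an advantage term $A$,
\begin{equation*}
	Q(o, a; \theta) = V(o; \theta) + A(o, a; \theta) - \frac{1}{|\mathcal{A}|} \sum_{a' \in \mathcal{A}} A(o, a'; \theta),
\end{equation*}
and double Q-learning selects the next action with the online parameters $\theta$ but evaluates it with the target parameters $\theta^{-}$,
\begin{equation*}
	y = r + \gamma\, Q\big(o', \operatorname*{arg\,max}_{a'} Q(o', a'; \theta);\, \theta^{-}\big), \qquad
	\mathcal{L}(\theta) = \mathbb{E}\big[ (y - Q(o, a; \theta))^{2} \big].
\end{equation*}
Decoupling action selection from action evaluation is what reduces the overestimation bias of the standard Q-learning target.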


\bibliographystyle{IEEEtranN} % IEEEtranN is the natbib compatible bst file
% argument is your BibTeX string definitions and bibliography database(s)
\bibliography{paper}


\end{document}