Master's theses
https://hdl.handle.net/2077/28887
2023-06-07T19:15:32Z
Credit Card Fraud Detection by Nearest Neighbor Algorithms
https://hdl.handle.net/2077/75997
Maghsood, Ramin
As the use of internet banking and online purchasing has increased dramatically
in today's world, the risk of fraudulent activity and the number of fraud cases
grow day by day. The most frequent type of bank fraud in recent years is credit
card fraud, which leads to huge financial losses on a global level. Credit card fraud
happens when an unauthorized person uses another person's credit card information
to make purchases, and it is a serious and growing problem for
banks and individuals around the world. This thesis applies supervised and
unsupervised nearest neighbor algorithms for fraud detection on a Kaggle data set
consisting of 284,807 credit card transactions out of which 492 are frauds, and which
includes 30 covariates per transaction. The supervised methods are shown to be
quite efficient, but require that the user has access to labelled training data where
one knows which transactions are frauds. Unsupervised detection is harder: for
example, to find 80% of the frauds, the algorithm flags more than 50 times as many valid
transactions as fraud cases. The unsupervised nearest neighbor distance method is
compared to methods using the distance to the center of the data for fraud detection,
and with detection algorithms that combine the two methods. The L2 distance, the L2
distance to zero, and a combination of both distances are analyzed for the unsupervised
method. The performance of the methods is evaluated using Precision-Recall (PR)
curves. The results show that, based on both area under the curve and precision at 80%
recall, the L2 distance to zero performs slightly better than the L2 distance.
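The two unsupervised scores and the precision-at-recall evaluation described above can be sketched as follows (a minimal illustration on synthetic data, not the thesis code; the function names and the toy setup are assumptions):

```python
import numpy as np

def l2_to_zero_scores(X):
    """Anomaly score: Euclidean (L2) distance of each transaction to the origin."""
    return np.linalg.norm(X, axis=1)

def knn_distance_scores(X, k=5):
    """Anomaly score: L2 distance to the k-th nearest neighbour.

    Brute-force pairwise distances; fine for a small illustration."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    d.sort(axis=1)
    return d[:, k]  # column 0 is each point's zero distance to itself

def precision_at_recall(scores, labels, target_recall=0.8):
    """Precision when the threshold is set so that target_recall of the
    frauds (labels == 1) are flagged."""
    order = np.argsort(-scores)           # most anomalous first
    hits = np.asarray(labels)[order]
    tp = np.cumsum(hits)
    recall = tp / hits.sum()
    precision = tp / (np.arange(len(hits)) + 1)
    return precision[np.searchsorted(recall, target_recall)]
```

Sweeping the threshold over all score values and plotting precision against recall yields the PR curve used for the evaluation.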
2023-04-13T00:00:00Z
Point process learning for non-parametric intensity estimation with focus on Voronoi estimation
https://hdl.handle.net/2077/75740
Thorén, Alexander
Point process learning is a new statistical theory that provides a way to estimate
parameters of point processes using cross-validation. By thinning a point
pattern we create training and validation sets, which are then used
to compute prediction errors. These errors measure the discrepancy
between two point processes and quantify how well the training sets
predict the validation sets. We investigate non-parametric intensity estimation
methods with a focus on the resample-smoothing Voronoi estimator. This
estimator works by repeatedly thinning a point pattern, finding the Voronoi
intensity estimate of the thinned point pattern, and then using the mean as the
final intensity estimate. Previously, only a rule of thumb was available for
choosing the parameters of the resample-smoothing Voronoi estimator, but with the
help of point process learning we now have a data-driven method to estimate
these parameters.
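The resample-smoothing recipe (thin, Voronoi-estimate, average) can be sketched in one dimension, where Voronoi cells are simply the intervals between midpoints of neighbouring points (an illustrative sketch under assumed names, not the thesis implementation; each thinned estimate is divided by the retention probability p because a p-thinning of a process with intensity lambda has intensity p * lambda):

```python
import numpy as np

def voronoi_intensity_1d(points, x, window=(0.0, 1.0)):
    """Voronoi intensity estimate on an interval: at each location x the
    estimate is 1 / length of the Voronoi cell (interval) containing x."""
    pts = np.sort(points)
    mids = (pts[:-1] + pts[1:]) / 2                   # boundaries between neighbours
    bounds = np.concatenate(([window[0]], mids, [window[1]]))
    lengths = np.diff(bounds)
    cell = np.clip(np.searchsorted(bounds, x, side="right") - 1,
                   0, len(lengths) - 1)
    return 1.0 / lengths[cell]

def resample_smoothed_voronoi(points, x, p=0.2, m=200,
                              window=(0.0, 1.0), seed=None):
    """Average of m Voronoi estimates computed on independent p-thinnings,
    each rescaled by 1/p to compensate for the thinning."""
    rng = np.random.default_rng(seed)
    total = np.zeros(len(x))
    used = 0
    for _ in range(m):
        keep = rng.random(len(points)) < p            # independent retention
        if keep.sum() == 0:
            continue                                  # empty thinning: skip
        total += voronoi_intensity_1d(points[keep], x, window) / p
        used += 1
    return total / max(used, 1)
```

Here p and m are exactly the parameters for which the thesis develops a data-driven, cross-validation-based choice in place of the earlier rule of thumb.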
2023-03-28T00:00:00Z
The low-lying zeros of L-functions associated to non-Galois cubic fields
https://hdl.handle.net/2077/74926
Ahlquist, Victor
We study the low-lying zeros of Artin L-functions associated to non-Galois cubic number
fields through their one- and two-level densities. In particular, we find new precise estimates
for the two-level density with a power-saving error term. We apply the L-functions Ratios
Conjecture to study these densities for a larger class of test functions than unconditional
computations allow. By reviewing a known Ratios Conjecture prediction, due to Cho,
Fiorilli, Lee, and Södergren, for the one-level density, we isolate a phase transition in the
lower-order terms, which reveals a striking symmetry. Our computations show that the
same symmetry exists in the one-level density of several other families that have previously
been studied in the literature, and this motivates us to formulate a conjecture extending one
part of the Katz–Sarnak prediction for families of symplectic symmetry type. Moreover, we
isolate several phase transitions in the lower-order terms of the two-level density. To the
best of our knowledge, this is the first time such phase transitions have been observed in
any n-level density with n ≥ 2.
2023-02-13T00:00:00Z
Decision Policies for Early Stage Clinical Trials with Multiple Endpoints
https://hdl.handle.net/2077/74122
López Juan, Víctor
Before a drug can be prescribed to patients, it must be shown to be safe and effective
for a certain indication in a controlled clinical trial (known as Phase III).
Such studies are costly to run and expose patients to potential risks. Therefore,
after initial studies in human subjects show the drug’s safety (Phase I),
studies with a small number of patients are run to assess the prospects of
the drug (Phase II). If the number of patients in a Phase II study is not
sufficient to detect differences in the variable of interest (e.g. the number of hospitalizations
due to heart failure), a surrogate variable that is predictive of the
variable of interest is used instead. A decision framework originally proposed
by Lalonde (2007) is used in industry to determine, based on a single surrogate
endpoint, whether to “Go” ahead with a Phase III study, or to “Stop” development
of the drug. In some therapeutic areas, a single endpoint is not sufficient
to predict the Phase III variable of interest; several related endpoints are used
instead. Endpoints which are considered clinically related may be grouped
into domains. How to best combine several disease markers across different
domains to achieve the desired probabilities of correct/incorrect decisions is
an open question.
This report presents an extension to multiple endpoints of the decision
framework proposed by Lalonde. In this extension, decision policies are formulated
in two levels. First, a Go or Stop decision is made for each domain, for
example by individually comparing each of the relevant endpoints to certain
thresholds. Performing multiple comparisons heightens the risk of an incorrect
Go decision. This risk can be controlled effectively by using the Simes
procedure (1986), which is a special case of the Benjamini-Hochberg (1995)
method. Domain-level decisions are then combined into policies fulfilling a
monotonicity property. This property enables the calculation of upper bounds
for the probability of an incorrect decision, and lower bounds for the probability
of a correct decision. These calculations are performed both for purely
synthetic endpoints and for a case study involving endpoints related to heart
failure. The resulting bounds are analogous to the statistical notions of Type I
error and power, respectively. Heuristics are derived to help practitioners decide
which endpoints to include, depending on the statistical power of these
endpoints and on which combinations of true effects are of clinical interest.
Overall, the framework proposed in this report can represent many of the
policies used by practitioners when designing Phase II studies with multiple
endpoints. The outcomes of the simulations presented in this thesis can guide
the selection of endpoints in order to achieve the desired bounds on the probabilities
of correct and incorrect decisions.
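The Simes test used for the domain-level decisions above can be sketched as follows (a minimal illustration; the thesis embeds it in a richer Go/Stop framework, and the function names here are assumptions):

```python
def simes_test(pvalues, alpha=0.05):
    """Simes (1986) test of the global null for m endpoints: reject if
    min over i of (m * p_(i) / i) <= alpha, where p_(1) <= ... <= p_(m)
    are the ordered p-values."""
    p = sorted(pvalues)
    m = len(p)
    return min(m * pi / i for i, pi in enumerate(p, start=1)) <= alpha

def domain_decision(pvalues, alpha=0.05):
    """Toy domain-level rule: 'Go' if the Simes test rejects the global
    null for the domain's endpoints, else 'Stop'."""
    return "Go" if simes_test(pvalues, alpha) else "Stop"
```

For example, with p-values (0.01, 0.04, 0.30) the Simes statistic is min(0.03, 0.06, 0.30) = 0.03, so the domain decision at alpha = 0.05 is Go; naively requiring every raw p-value to fall below alpha would control the incorrect-Go risk less sharply across multiple comparisons.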
2022-11-11T00:00:00Z