Hoàng Anh Just | PhD Student

Hi, I'm Hoàng Anh Just.

A student at Virginia Tech with a passion for data valuation for machine learning.


I am a PhD student in computer engineering in the Responsible Data Science Lab (ReDS Lab) at Virginia Polytechnic Institute and State University (Virginia Tech).
I am fortunate to be advised by Prof. Ruoxi Jia.
I finished my bachelor's degrees in mathematics and computer science at Gettysburg College, where I had the pleasure of working with Prof. Béla Bajnok and Prof. Todd Neller, respectively.
I enjoy working on data-centric AI, especially on measuring the importance of each data point used to train a model.

    Focus Questions:
  • [Data Valuation]
    How much does the data cost?
  • [Data Selection]
    How to choose the best data to meet the model owner's target?
  • [Model Prediction]
    How to predict the model performance given training data?
  • [AI Privacy]
    How to protect your data used to train a model?
  • [Data Leakage]
    How to extract data used to train a model?


ICLR 2024
Feiyang Kang, Hoang Anh Just, Yifan Sun, Himanshu Jahagirdar, Yuanzhi Zhang, Rongxing Du, Anit Sahu, Ruoxi Jia
  • Developed a scalable data selection method to pre-fine-tune a pretrained large language model (LLM) by selecting (unlabeled) data that can shift the source distribution to better align with the target distribution.
NeurIPS 2023
Feiyang Kang*, Hoang Anh Just*, Anit Sahu, Ruoxi Jia
  • Proposed a performance estimator for a model trained on any data composition, given only sample information, along with a scaling law that predicts performance at larger scales. Together, these effectively find the optimal composition of data sources for any target data size.
ACM CCS 2023
Yi Zeng*, Minzhou Pan*, Hoang Anh Just, Lingjuan Lyu, Meikang Qiu and Ruoxi Jia
  • Launched an efficient (poisoning 0.5% of the target class and 0.05% of the entire training dataset) and stealthy (hard to detect) backdoor attack, which requires only knowledge of the target class to successfully deploy the attack.
RAID 2023
Myeongseob Ko, Xinyu Yang, Zhengjie Ji, Hoang Anh Just, Peng Gao, Ruoxi Jia
  • Established an efficient real-time detection system for membership inference attacks, which prevents attackers from inferring sensitive data used for model training.
ICML 2023
Liu Zhihong*, Hoang Anh Just*, Xiangyu Chang, Xi Chen, Ruoxi Jia
  • Proposed a novel, efficient, and theoretically grounded approach to fine-grained data analysis, which assesses the quality of each feature of each data point.
ICLR 2023
Hoang Anh Just*, Feiyang Kang*, Tianhao Wang, Yi Zeng, Myeongseob Ko, Ming Jin and Ruoxi Jia
  • Introduced an efficient data quality valuation method based on a modified class-wise Wasserstein distance, which is robust to noisy, mislabeled, and poisoned data and requires no model training.
IEEE SatML 2023
Yingyan Zeng, Tianhao Wang, Si Chen, Hoang Anh Just, Ran Jin, Ruoxi Jia
  • Developed a set-function-based neural network that can predict model weights from a training dataset of any size. This method enables efficient applications in data valuation, data selection, and data memorization, which otherwise require multiple model retrainings.
CVPR 2022
Mostafa Kahla, Si Chen, Hoang Anh Just, Ruoxi Jia
  • Designed a novel, practical model inversion attack that recovers sensitive data by accessing only the labels output by the model, without additional information.
Involve 2022
Bela Bajnok, Connor Berson and Hoang Anh Just
  • Proved that there is no perfect restricted 2-basis in Z_n for sets of size greater than 3, and showed that a perfect restricted 2-basis in Z_n exists only for sets of size at most 3. The proof is by contradiction and uses the fact that Z_n is closed under both addition and subtraction.
Involve, a Journal of Mathematics, 12/2022
AAAI 2021
Peter Francis*, Hoang Anh Just*, Todd Neller
  • Described various approaches to opponent hand estimation in the card game Gin Rummy. Applied Bayes' rule, as well as both simple and convolutional neural networks, to recognize patterns in simulated game play and predict the opponent's hand. Also presented a new minimal-sized construction for using arrays to pre-populate hand representation images.


The Bradley Department of Electrical and Computer Engineering, Virginia Tech

Graduate Teaching Assistant
Artificial Intelligence and Engineering Applications
Fall 2021 - Fall 2022

Mathematics Department, Gettysburg College

Peer Learning Associate
Abstract Mathematics
Fall 2018 - Spring 2021

Computer Science Department, Gettysburg College

Teaching Assistant and Grader
Computer Science I and II
Fall 2018 - Spring 2021



CVPR 2023

NeurIPS 2023

ICLR 2023


Virginia Polytechnic Institute and State University (Virginia Tech)

Blacksburg, VA, USA, Earth

Degree: PhD Student in Computer Engineering

    Relevant Coursework:

    • Natural Language Processing (NLP)
    • Advanced Machine Learning (ML)
    • Convex Optimization

Gettysburg College

Gettysburg, PA, USA, Earth

Degrees: BA in Mathematics, BS in Computer Science

    Relevant Coursework:

    • Data Structures and Algorithms
    • Combinatorics
    • Abstract Algebra
    • Artificial Intelligence (AI)


2023 @ Hoàng Anh Just | Thank you!