In this paper, we present CAD2Program, a new method for reconstructing 3D parametric models from 2D CAD drawings. Our proposed method is inspired by recent successes in vision-language models (VLMs), and departs from traditional methods which rely on task-specific data representations and/or algorithms. Specifically, on the input side, we simply treat the 2D CAD drawing as a raster image, regardless of its original format, and encode the image with a standard ViT model. We show that such an encoding scheme achieves competitive performance against existing methods that operate on vector-graphics inputs, while imposing substantially fewer restrictions on the 2D drawings. On the output side, our method auto-regressively predicts a general-purpose language describing 3D parametric models in text form. Compared to other sequence modeling methods for CAD which use domain-specific sequence representations with fixed-size slots, our text-based representation is more flexible, and can be easily extended to arbitrary geometric entities and semantic or functional properties. Experimental results on a large-scale dataset of cabinet models demonstrate the effectiveness of our method.
An engineering drawing is a mixture of two types of layers:
In this paper, a 3D cabinet is built by assembling pre-defined primitive models. Each primitive instance is defined by a computer program, which consists of three parts:
We represent 3D parametric models as scripts of a general-purpose language (e.g., Python). The shape program of the above cabinet is shown as follows:
bbox_0 = Bbox(507, 185, 805, 1014, 370, 50, 0)
model_0 = <model_57761062>()
bbox_1 = Bbox(25, 185, 390, 50, 370, 780, 0)
model_1 = <model_57758898>()
bbox_2 = Bbox(532, 195, 390, 964, 350, 780, 0)
model_2 = <model_115813862>(N=1, NKA=928, DBXX=1, BT=18)
bbox_3 = Bbox(532, 185, 390, 928, 330, 18, 0)
model_3 = <model_57253481>()
bbox_4 = Bbox(291, 11, 390, 478, 18, 776, 0)
model_4 = <model_82289390>(openDirection=0, uCover=18, dCover=18, lCover=18, rCover=18)
bbox_5 = Bbox(773, 11, 390, 478, 18, 776, 0)
model_5 = <model_82289390>(openDirection=1, uCover=18, dCover=18, lCover=18, rCover=18)
The above script defines a cabinet with six primitive models. Each pair of lines corresponds to one primitive model: the first (odd-numbered) line defines the bounding box of the primitive, and the second (even-numbered) line defines the model ID and its associated parameters.
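To make the two-lines-per-primitive layout concrete, the following is a minimal sketch of how such a script could be read back into structured records. It is not part of the CAD2Program codebase: the names `Primitive` and `parse_shape_program` and the two regular expressions are hypothetical and were inferred only from the example script above. The seven `Bbox` values are kept as an uninterpreted tuple, and parameter values are assumed to be integers, as they are in the example.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Line shapes inferred from the example script only (an assumption).
BBOX_RE = re.compile(r"^bbox_(\d+) = Bbox\(([^)]*)\)$")
MODEL_RE = re.compile(r"^model_(\d+) = <model_(\d+)>\(([^)]*)\)$")


@dataclass
class Primitive:
    """One primitive instance: a bounding box plus a parameterized model."""
    index: int
    bbox: tuple[int, ...]  # raw Bbox arguments, not interpreted here
    model_id: str
    params: dict[str, int] = field(default_factory=dict)


def parse_shape_program(text: str) -> list[Primitive]:
    """Parse a shape program made of alternating bbox_i / model_i lines."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("expected an even number of non-empty lines")

    primitives = []
    for bbox_line, model_line in zip(lines[0::2], lines[1::2]):
        b = BBOX_RE.match(bbox_line)
        m = MODEL_RE.match(model_line)
        # Both lines must match and carry the same primitive index.
        if b is None or m is None or b.group(1) != m.group(1):
            raise ValueError(f"malformed pair: {bbox_line!r} / {model_line!r}")

        bbox = tuple(int(v) for v in b.group(2).split(","))

        # Keyword arguments such as "N=1, NKA=928"; may be empty.
        params = {}
        for item in filter(None, (s.strip() for s in m.group(3).split(","))):
            key, value = item.split("=")
            params[key.strip()] = int(value)

        primitives.append(Primitive(int(b.group(1)), bbox, m.group(2), params))
    return primitives


if __name__ == "__main__":
    example = """
    bbox_2 = Bbox(532, 195, 390, 964, 350, 780, 0)
    model_2 = <model_115813862>(N=1, NKA=928, DBXX=1, BT=18)
    """
    for prim in parse_shape_program(example):
        print(prim.index, prim.model_id, prim.bbox, prim.params)
```

Because the representation is plain text, a reader like this stays small, and supporting a new primitive type or an extra parameter would not require changing a fixed-size slot layout.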
We adopt an off-the-shelf vision-language model (such as InternVL). CAD2Program takes a 2D engineering drawing as input and outputs a shape program in text form, which describes the 3D parametric model. The pipeline of our method is shown as follows.
The following shows an example conversation, consisting of a prompt and a response, in Python format.
@inproceedings{CAD2Program,
author = {Wang, Xilin and Zheng, Jia and Hu, Yuanchao and Zhu, Hao and Yu, Qian and Zhou, Zihan},
title = {From 2D CAD Drawings to 3D Parametric Models: A Vision-Language Approach},
booktitle = {AAAI},
year = {2025}
}