From 2D CAD Drawings to 3D Parametric Models: A Vision-Language Approach

Xilin Wang1*     Jia Zheng2*     Yuanchao Hu2     Hao Zhu2     Qian Yu1†     Zihan Zhou2†
1Beihang University     2Manycore Tech Inc.
*Equal contribution     †Corresponding authors

CAD2Program is a Vision-Language Model for reconstructing 3D parametric models from 2D CAD drawings.

Abstract

In this paper, we present CAD2Program, a new method for reconstructing 3D parametric models from 2D CAD drawings. Our proposed method is inspired by recent successes in vision-language models (VLMs), and departs from traditional methods which rely on task-specific data representations and/or algorithms. Specifically, on the input side, we simply treat the 2D CAD drawing as a raster image, regardless of its original format, and encode the image with a standard ViT model. We show that such an encoding scheme achieves competitive performance against existing methods that operate on vector-graphics inputs, while imposing substantially fewer restrictions on the 2D drawings. On the output side, our method auto-regressively predicts a general-purpose language describing 3D parametric models in text form. Compared to other sequence modeling methods for CAD which use domain-specific sequence representations with fixed-size slots, our text-based representation is more flexible, and can be easily extended to arbitrary geometric entities and semantic or functional properties. Experimental results on a large-scale dataset of cabinet models demonstrate the effectiveness of our method.
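The abstract says only that the drawing is rendered as a raster image and encoded with a standard ViT. To show what that encoding starts from, the sketch below gives the generic ViT patch-splitting step in pure Python. The function name `patchify` is hypothetical. The 224 x 224 resolution and 16-pixel patches are the original ViT-B/16 settings, used here as assumptions rather than CAD2Program's actual configuration.

```python
from typing import List

Image = List[List[float]]  # grayscale raster, indexed image[row][col]


def patchify(image: Image, patch: int = 16) -> List[List[float]]:
    """Split an H x W raster into non-overlapping patch x patch tiles.

    Tiles are emitted in raster-scan order and each is flattened row-major,
    which is the usual input to a ViT's linear patch embedding.
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of the patch size")
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


# A blank 224 x 224 drawing yields (224 / 16) ** 2 = 196 patches of 16 * 16 = 256 values.
blank = [[1.0] * 224 for _ in range(224)]
tokens = patchify(blank)
print(len(tokens), len(tokens[0]))  # 196 256
```

Because the encoder sees only pixels, the same step applies whether the drawing was originally a vector file or a scanned image. This is the flexibility the abstract contrasts with vector-graphics inputs.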

2D CAD Drawings

An engineering drawing is a mixture of two types of layers:

  • the geometry layer, which depicts the actual object through its orthographic projections,
  • the annotation layer, which contains dimensions and functional symbols (e.g., surface types and manufacturing instructions).

3D Parametric Models

In this paper, a 3D cabinet is built by assembling pre-defined primitive models. Each primitive instance is defined by a computer program, which consists of three parts:

  • model ID, which is a unique identifier of a primitive in the database,
  • common parameters, which indicate the general pose and size of the primitive in the 3D space,
  • model-specific parameters, which describe possible variations of a specific primitive.
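The three-part structure above maps naturally onto a small record type. The sketch below is an illustration, not the authors' code, and the class names `Bbox` and `Primitive` and the method `to_program` are hypothetical. The meaning of the seven `Bbox` numbers (center x, y, z; size along x, y, z; rotation) is also an assumption. It is inferred from the values in the example script rather than stated on this page.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Bbox:
    # ASSUMED field order, inferred from the example script:
    # center (x, y, z), then size (w, d, h), then a rotation value.
    x: int
    y: int
    z: int
    w: int
    d: int
    h: int
    rot: int = 0


@dataclass
class Primitive:
    model_id: int                                        # unique identifier in the primitive database
    bbox: Bbox                                           # common parameters: pose and size
    params: Dict[str, int] = field(default_factory=dict)  # model-specific parameters

    def to_program(self, i: int) -> str:
        """Render this primitive as the two-line text form of the shape program."""
        b = self.bbox
        nums = ", ".join(str(v) for v in (b.x, b.y, b.z, b.w, b.d, b.h, b.rot))
        kwargs = ", ".join(f"{k}={v}" for k, v in self.params.items())
        return (f"bbox_{i} = Bbox({nums})\n"
                f"model_{i} = <model_{self.model_id}>({kwargs})")


drawer = Primitive(115813862, Bbox(532, 195, 390, 964, 350, 780, 0),
                   {"N": 1, "NKA": 928, "DBXX": 1, "BT": 18})
print(drawer.to_program(2))
```

Printing `drawer.to_program(2)` reproduces the `bbox_2` / `model_2` lines of the example script. A primitive without model-specific parameters renders with empty parentheses, as `model_0` does.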

We represent 3D parametric models as scripts in a general-purpose language (e.g., Python). The shape program of an example cabinet is as follows:

bbox_0 = Bbox(507, 185, 805, 1014, 370, 50, 0)
model_0 = <model_57761062>()
bbox_1 = Bbox(25, 185, 390, 50, 370, 780, 0)
model_1 = <model_57758898>()
bbox_2 = Bbox(532, 195, 390, 964, 350, 780, 0)
model_2 = <model_115813862>(N=1, NKA=928, DBXX=1, BT=18)
bbox_3 = Bbox(532, 185, 390, 928, 330, 18, 0)
model_3 = <model_57253481>()
bbox_4 = Bbox(291, 11, 390, 478, 18, 776, 0)
model_4 = <model_82289390>(openDirection=0, uCover=18, dCover=18, lCover=18, rCover=18)
bbox_5 = Bbox(773, 11, 390, 478, 18, 776, 0)
model_5 = <model_82289390>(openDirection=1, uCover=18, dCover=18, lCover=18, rCover=18)

The above script defines a cabinet with six primitive models. Each pair of lines corresponds to one primitive model: the odd-numbered line defines the bounding box of the primitive, and the following even-numbered line defines its model ID and associated parameters.
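Because each primitive occupies exactly one bbox/model line pair, the text form can be read back with a simple line-pair parser. The sketch below uses two hypothetical helpers, `parse_program` and `extent`. `extent` assumes that the first three `Bbox` arguments are the box center, the next three its size, and the last a rotation, here taken to be 0. The example numbers suggest this reading: the side panel `bbox_1` spans x from 0 to 50 and z from 0 to 780, and the top panel `bbox_0` sits on it at z from 780 to 830.

```python
import re
from typing import Dict, List, Tuple

BBOX_RE = re.compile(r"^bbox_(\d+)\s*=\s*Bbox\(([^)]*)\)$")
MODEL_RE = re.compile(r"^model_(\d+)\s*=\s*<model_(\d+)>\(([^)]*)\)$")


def parse_program(text: str) -> List[dict]:
    """Parse a shape program into one dict per primitive.

    Lines are expected in pairs: a bbox_i line followed by the matching model_i line.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) % 2:
        raise ValueError("expected an even number of lines (bbox/model pairs)")
    prims = []
    for bbox_line, model_line in zip(lines[0::2], lines[1::2]):
        b, m = BBOX_RE.match(bbox_line), MODEL_RE.match(model_line)
        if not b or not m or b.group(1) != m.group(1):
            raise ValueError(f"malformed pair: {bbox_line!r} / {model_line!r}")
        bbox = [float(v) for v in b.group(2).split(",")]
        params: Dict[str, float] = {}
        # Empty parentheses yield no key=value items.
        for kv in filter(None, (s.strip() for s in m.group(3).split(","))):
            key, value = kv.split("=")
            params[key.strip()] = float(value)
        prims.append({"index": int(b.group(1)), "model_id": int(m.group(2)),
                      "bbox": bbox, "params": params})
    return prims


def extent(prim: dict) -> Tuple[Tuple[float, float], ...]:
    """(min, max) per axis, ASSUMING Bbox(cx, cy, cz, sx, sy, sz, rot) with rot == 0."""
    c, s = prim["bbox"][:3], prim["bbox"][3:6]
    return tuple((c[i] - s[i] / 2, c[i] + s[i] / 2) for i in range(3))


example = """
bbox_0 = Bbox(507, 185, 805, 1014, 370, 50, 0)
model_0 = <model_57761062>()
bbox_1 = Bbox(25, 185, 390, 50, 370, 780, 0)
model_1 = <model_57758898>()
"""
prims = parse_program(example)
print(extent(prims[1]))  # ((0.0, 50.0), (0.0, 370.0), (0.0, 780.0))
```

Checking each pair's indices against each other catches one kind of malformed output. A generated program with a missing or swapped line fails fast instead of silently misaligning primitives and parameters.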

CAD2Program Model

We adopt an off-the-shelf vision-language model (VLM), such as InternVL. CAD2Program takes a 2D engineering drawing as input and outputs a shape program in text form that describes the 3D parametric model. The pipeline of our method is shown as follows.
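This page does not describe how the response is decoded, so the following is only a generic illustration of auto-regressive text generation using greedy decoding. The callable `next_token_logits` is a hypothetical stand-in for the VLM (e.g., InternVL). It scores the next token given the image tokens and the tokens generated so far. Real systems may use beam search or sampling instead, and the toy model here simply replays a fixed token sequence.

```python
from typing import Callable, List, Sequence

# Hypothetical model interface: (image tokens, generated prefix) -> logits over the vocabulary.
LogitsFn = Callable[[Sequence[int], List[int]], List[float]]


def greedy_decode(next_token_logits: LogitsFn, image_tokens: Sequence[int],
                  bos: int, eos: int, max_len: int = 512) -> List[int]:
    """Greedy auto-regressive decoding.

    At each step, condition on the image and all tokens generated so far,
    append the highest-scoring token, and stop at EOS or after max_len steps.
    """
    out = [bos]
    for _ in range(max_len):
        logits = next_token_logits(image_tokens, out)
        tok = max(range(len(logits)), key=logits.__getitem__)
        out.append(tok)
        if tok == eos:
            break
    return out


# Toy "model" over a 10-token vocabulary that always emits 5, 7, 3, then EOS (9).
TARGET = [5, 7, 3, 9]


def toy(image_tokens: Sequence[int], prefix: List[int]) -> List[float]:
    logits = [0.0] * 10
    logits[TARGET[len(prefix) - 1]] = 1.0
    return logits


print(greedy_decode(toy, image_tokens=[], bos=0, eos=9))  # [0, 5, 7, 3, 9]
```

In a real system the emitted token IDs would be detokenized into the text of the shape program, one `bbox_i` / `model_i` line pair per primitive.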

An example conversation, consisting of a prompt and a response in Python format, is shown in the following.

BibTeX

@inproceedings{CAD2Program,
  author    = {Wang, Xilin and Zheng, Jia and Hu, Yuanchao and Zhu, Hao and Yu, Qian and Zhou, Zihan},
  title     = {From 2D CAD Drawings to 3D Parametric Models: A Vision-Language Approach},
  booktitle = {AAAI},
  year      = {2025}
}

Acknowledgements

This work was done during Xilin Wang's internship at Manycore Tech Inc.