48 changes: 48 additions & 0 deletions src/powershell/tests/Test-Assessment.35036.md
@@ -0,0 +1,48 @@
Trainable classifiers are Microsoft Purview's machine learning-based content classification engine: they learn to identify sensitive or business-critical information from examples that the organization provides. Unlike Sensitive Information Types (SITs), which match fixed, predefined patterns (e.g., credit card numbers, phone numbers), trainable classifiers use machine learning to recognize content by its meaning and context. This makes them suited to documents such as strategic plans, financial reports, HR documents, or proprietary research, which lack a consistent structured format but still require protection.

By using trainable classifiers as conditions in auto-labeling policies and Data Loss Prevention (DLP) rules, organizations extend data protection beyond pattern matching to a semantic understanding of content. This enables automatic classification and protection of complex, unstructured data that would be difficult or impossible to capture with pattern-based rules alone. Using trainable classifiers in policies can broaden data discovery and compliance coverage, particularly for sensitive business documents that need context-aware rather than format-based detection.

**Remediation action**

To create and deploy trainable classifiers:

1. Sign in to the [Microsoft Purview portal](https://purview.microsoft.com) as a Global Administrator or Compliance Administrator
2. Navigate to Information Protection > Classifiers > Trainable classifiers
3. Select "+ Create trainable classifier"
4. Enter a name and description for the classifier (e.g., "Strategic Plans", "Financial Reports", "HR Documents")
5. Select the classifier type (Document or Message classifier)
6. Upload example documents:
   - **Positive examples**: 5-500 documents that represent the content you want to classify
   - **Negative examples** (optional): 5-500 documents that represent content you do NOT want to classify
7. Review the examples for accuracy and completeness
8. Submit the classifier for training
9. Wait for training to complete (typically 1-2 weeks)
10. After the classifier is published, use it in policies:
    - **Auto-labeling**: Create an auto-labeling policy with a rule that uses the classifier as a condition
    - **DLP**: Create a DLP policy with a rule that uses the classifier as a condition
11. Monitor classifier accuracy by reviewing DLP and auto-labeling rule matches

Example trainable classifier scenarios:
- **Strategic Plans**: Confidential business strategy documents, competitive analysis
- **Financial Reports**: Earnings reports, budget documents, financial forecasts
- **HR Documents**: Employee records, compensation information, performance reviews
- **Patent Documents**: Intellectual property, patent applications, technical specifications
- **Customer Contracts**: Business agreements, customer-specific terms, confidential pricing

To integrate classifiers into auto-labeling policies:
1. Create or edit an auto-labeling policy
2. Add a rule with the condition "Content contains information detected by trainable classifier"
3. Select the trained classifier from the dropdown
4. Assign the appropriate sensitivity label
5. Run the policy in simulation mode, review the results, and then turn the policy on
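
The classifier condition added in step 2 is stored as JSON in the rule's `AdvancedRule` property, which is the property this assessment parses. The sketch below rebuilds an abbreviated version of that JSON in PowerShell to show which fields mark a trainable classifier. It is an assumption-based illustration, not the exact JSON Purview writes: it keeps only the fields the assessment reads (`ConditionName`, `Value`, `Groups`, `Sensitivetypes`, `Name`, `Classifiertype`), the real rule carries additional properties, and "Financial Reports" is a placeholder classifier name.

```powershell
# Abbreviated, assumed shape of an AdvancedRule that references a trainable classifier.
# Only the fields read by the assessment are included; real rules contain more properties.
$condition = [ordered]@{
    Condition = [ordered]@{
        SubConditions = @(
            [ordered]@{
                ConditionName = 'ContentContainsSensitiveInformation'
                Value         = @(
                    [ordered]@{
                        Groups = @(
                            [ordered]@{
                                Name           = 'Default'
                                Sensitivetypes = @(
                                    # Placeholder classifier; MLModel marks it as a trainable classifier
                                    [ordered]@{ Name = 'Financial Reports'; Classifiertype = 'MLModel' }
                                )
                            }
                        )
                    }
                )
            }
        )
    }
}

# ConvertTo-Json truncates deep objects by default, so raise -Depth for nested rule structures
$advancedRuleJson = ConvertTo-Json -InputObject $condition -Depth 20
$advancedRuleJson
```

The nesting (condition, value, group, sensitive type) is why a parser has to walk several levels before it reaches the `Classifiertype` field.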

To integrate classifiers into DLP policies:
1. Create or edit a DLP policy
2. Add a rule with a condition that detects content using the trained classifier
3. Define protection actions (e.g., restrict access, notify users, block the action)
4. Publish the policy
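
To confirm from Security & Compliance PowerShell (after `Connect-IPPSSession`) that a published rule actually references a trainable classifier, the output of `Get-DlpComplianceRule` or `Get-AutoSensitivityLabelRule` can be piped into a small helper like the one below. This is a minimal sketch, not a Microsoft-provided cmdlet: `Get-TrainableClassifierReference` and `Get-MLModelName` are hypothetical names, and the code assumes, as this assessment does, that classifiers appear in the rule's `AdvancedRule` JSON under a `ContentContainsSensitiveInformation` condition with `Classifiertype` set to `MLModel`. Unlike a single-level lookup, it also searches nested sub-conditions.

```powershell
# Hypothetical helpers (not Microsoft-provided cmdlets). Assumes trainable classifiers appear in
# a rule's AdvancedRule JSON under ContentContainsSensitiveInformation with Classifiertype = 'MLModel'.

# Recursively yield MLModel classifier names from a parsed condition and all of its sub-conditions
function Get-MLModelName {
    param($Condition)

    if ($null -eq $Condition) { return }

    if ($Condition.ConditionName -eq 'ContentContainsSensitiveInformation') {
        foreach ($valueItem in $Condition.Value) {
            foreach ($group in $valueItem.Groups) {
                foreach ($type in $group.Sensitivetypes) {
                    if ($type.Classifiertype -eq 'MLModel') { $type.Name }
                }
            }
        }
    }

    # Nested AND/OR groups carry their own SubConditions, so search every level
    foreach ($sub in $Condition.SubConditions) {
        Get-MLModelName -Condition $sub
    }
}

# Emit one row per (rule, classifier) pair for rule objects received from the pipeline
function Get-TrainableClassifierReference {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object]$Rule
    )

    process {
        # Cheap text check first so rules without any MLModel reference skip JSON parsing
        if (-not $Rule.AdvancedRule -or $Rule.AdvancedRule -notmatch 'MLModel') { return }

        $parsed = $Rule.AdvancedRule | ConvertFrom-Json
        # The same classifier can appear in several groups; report it once per rule
        $names = @(Get-MLModelName -Condition $parsed.Condition | Select-Object -Unique)

        foreach ($name in $names) {
            [PSCustomObject]@{
                RuleName   = $Rule.Name
                PolicyName = $Rule.ParentPolicyName
                Classifier = $name
            }
        }
    }
}
```

For example, `Get-DlpComplianceRule | Get-TrainableClassifierReference` lists one row per rule and classifier; an empty result suggests that no DLP rule references a trainable classifier under the assumed JSON layout.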

- [Learn about trainable classifiers](https://learn.microsoft.com/en-us/purview/classifier-learn-about)
- [Apply a sensitivity label to content automatically](https://learn.microsoft.com/en-us/purview/apply-sensitivity-label-automatically)
- [Learn about data loss prevention](https://learn.microsoft.com/en-us/purview/dlp-learn-about-dlp)
<!--- Results --->
%TestResult%

282 changes: 282 additions & 0 deletions src/powershell/tests/Test-Assessment.35036.ps1
@@ -0,0 +1,282 @@
<#
.SYNOPSIS
    Validates that trainable classifiers are integrated into auto-labeling and/or DLP policies.

.DESCRIPTION
    This test checks if trainable classifiers are being used in policies by:
    1. Retrieving all auto-sensitivity label rules and searching for trainable classifiers in AdvancedRule
    2. Retrieving all DLP compliance rules and searching for trainable classifiers in AdvancedRule
    3. Parsing AdvancedRule JSON to extract classifier details (identified by Classifiertype=MLModel)

.NOTES
    Test ID: 35036
    Category: Advanced Classification
    Required Module: ExchangeOnlineManagement v3.5.1+
    Required Connection: Connect-IPPSSession
#>

function Test-Assessment-35036 {
    [ZtTest(
        Category = 'Advanced Classification',
        ImplementationCost = 'High',
        MinimumLicense = 'Microsoft 365 E5',
        Pillar = 'Data',
        RiskLevel = 'Medium',
        SfiPillar = 'Protect tenants and production systems',
        TenantType = ('Workforce', 'External'),
        TestId = 35036,
        Title = 'Trainable Classifiers Usage in Policies',
        UserImpact = 'Medium'
    )]
    [CmdletBinding()]
    param()

    #region Data Collection
    Write-PSFMessage '🟦 Start' -Tag Test -Level VeryVerbose
    $activity = 'Checking trainable classifier usage in policies'
    Write-ZtProgress -Activity $activity -Status 'Querying auto-labeling and DLP rules'

    $autoLabelCmdletFailed = $false
    $dlpCmdletFailed = $false
    $autoLabelRulesWithClassifiers = @()
    $dlpRulesWithClassifiers = @()
    # Recursively collect trainable classifier names (Classifiertype = MLModel) from a parsed
    # AdvancedRule condition. Sub-conditions can be nested (e.g. "A AND (B OR C)"), so every
    # SubConditions level is searched rather than only the first one.
    function Get-TrainableClassifierName {
        param($Condition)

        if ($null -eq $Condition) { return }

        if ($Condition.ConditionName -eq 'ContentContainsSensitiveInformation') {
            # Value is an array; check all groups regardless of name ("Default", "Trainable Classifiers", etc.)
            foreach ($valueItem in $Condition.Value) {
                foreach ($group in $valueItem.Groups) {
                    foreach ($classifier in $group.Sensitivetypes) {
                        if ($classifier.Classifiertype -eq 'MLModel') {
                            $classifier.Name
                        }
                    }
                }
            }
        }

        foreach ($subCondition in $Condition.SubConditions) {
            Get-TrainableClassifierName -Condition $subCondition
        }
    }

    # Return one summary object per rule whose AdvancedRule references at least one trainable classifier
    function Get-RuleWithTrainableClassifier {
        param(
            [object[]]$Rule,
            [string]$RuleType
        )

        # Cheap text pre-filter before parsing JSON
        $candidateRules = $Rule | Where-Object { $_.AdvancedRule -match 'MLModel' }

        foreach ($candidate in $candidateRules) {
            try {
                $advancedRule = $candidate.AdvancedRule | ConvertFrom-Json
                # The same classifier can appear in several groups; report it once per rule
                $classifiers = @(Get-TrainableClassifierName -Condition $advancedRule.Condition | Select-Object -Unique)

                if ($classifiers.Count -gt 0) {
                    [PSCustomObject]@{
                        RuleName         = $candidate.Name
                        ParentPolicyName = $candidate.ParentPolicyName
                        CreatedDate      = $candidate.WhenCreatedUTC
                        Classifiers      = $classifiers
                    }
                }
            }
            catch {
                Write-PSFMessage "Failed to parse AdvancedRule for $RuleType rule '$($candidate.Name)': $_" -Tag Test -Level Warning
            }
        }
    }

    # Query 1: auto-sensitivity label rules that use trainable classifiers
    try {
        Write-ZtProgress -Activity $activity -Status 'Checking auto-labeling rules'
        $allAutoLabelRules = Get-AutoSensitivityLabelRule -ErrorAction Stop
        $autoLabelRulesWithClassifiers = @(Get-RuleWithTrainableClassifier -Rule $allAutoLabelRules -RuleType 'auto-labeling')
    }
    catch {
        $autoLabelCmdletFailed = $true
        Write-PSFMessage "Failed to retrieve auto-sensitivity label rules: $_" -Tag Test -Level Warning
    }

    # Query 2: DLP compliance rules that use trainable classifiers
    try {
        Write-ZtProgress -Activity $activity -Status 'Checking DLP rules'
        $allDlpRules = Get-DlpComplianceRule -ErrorAction Stop
        $dlpRulesWithClassifiers = @(Get-RuleWithTrainableClassifier -Rule $allDlpRules -RuleType 'DLP')
    }
    catch {
        $dlpCmdletFailed = $true
        Write-PSFMessage "Failed to retrieve DLP compliance rules: $_" -Tag Test -Level Warning
    }
    #endregion Data Collection

    #region Assessment Logic
    $testResultMarkdown = ''
    $passed = $false
    $customStatus = $null

    $totalRulesWithClassifiers = $autoLabelRulesWithClassifiers.Count + $dlpRulesWithClassifiers.Count

    if ($autoLabelCmdletFailed -and $dlpCmdletFailed) {
        # Neither query succeeded, so usage cannot be determined
        $testResultMarkdown = "⚠️ Unable to determine trainable classifier usage due to permission issues or a service connection failure.`n`n%TestResult%"
        $passed = $false
        $customStatus = 'Investigate'
    }
    elseif ($autoLabelCmdletFailed -or $dlpCmdletFailed) {
        # One query failed: pass only if the other query found classifier usage, but flag for investigation
        $failedQuery = if ($autoLabelCmdletFailed) { 'auto-labeling rules' } else { 'DLP rules' }
        $testResultMarkdown = "⚠️ Unable to retrieve $failedQuery due to a query failure, connection issue, or insufficient permissions.`n`n%TestResult%"
        $passed = $totalRulesWithClassifiers -gt 0
        $customStatus = 'Investigate'
    }
    elseif ($totalRulesWithClassifiers -eq 0) {
        # Both queries succeeded, but no rule references a trainable classifier
        $testResultMarkdown = "❌ No trainable classifiers are being used in auto-labeling or DLP policies; relying solely on pattern-based classification.`n`n%TestResult%"
        $passed = $false
    }
    else {
        $testResultMarkdown = "✅ Trainable classifiers are integrated into auto-labeling and/or DLP policies, enabling AI-powered content classification for complex business documents.`n`n%TestResult%"
        $passed = $true
    }
    #endregion Assessment Logic

    #region Report Generation
    $mdInfo = ''

    if ($totalRulesWithClassifiers -gt 0) {
        $formatTemplate = @'

## [{0}]({1})

{2}

**Summary:**
* Total Auto-Labeling Rules Using Classifiers: {3}
* Total DLP Rules Using Classifiers: {4}

'@

        $reportTitle = 'Trainable Classifier Usage in Policies'
        $portalLink = 'https://purview.microsoft.com/informationprotection/dataclassification/trainableclassifiers'

        # Build one table per rule type that has matches
        $sections = @(
            @{ Heading = 'Trainable Classifiers in Auto-Labeling Rules'; Rules = $autoLabelRulesWithClassifiers }
            @{ Heading = 'Trainable Classifiers in DLP Rules'; Rules = $dlpRulesWithClassifiers }
        )

        $details = ''
        foreach ($section in $sections) {
            if ($section.Rules.Count -eq 0) { continue }

            $details += "**$($section.Heading):**`n`n"
            $details += "| Rule name | Parent policy | Created date | Classifiers in rule |`n"
            $details += "| :-------- | :------------ | :----------- | :------------------ |`n"

            foreach ($rule in $section.Rules) {
                $ruleName = if ($rule.RuleName) { Get-SafeMarkdown -Text $rule.RuleName } else { 'N/A' }
                $policyName = if ($rule.ParentPolicyName) { Get-SafeMarkdown -Text $rule.ParentPolicyName } else { 'N/A' }
                $createdDate = if ($rule.CreatedDate) { $rule.CreatedDate.ToString('yyyy-MM-dd') } else { 'N/A' }

                if ($rule.Classifiers) {
                    # Show at most five classifiers per row to keep the table readable
                    $sanitizedClassifiers = @($rule.Classifiers | ForEach-Object { Get-SafeMarkdown -Text $_ })
                    $classifiers = ($sanitizedClassifiers | Select-Object -First 5) -join ', '
                    if ($sanitizedClassifiers.Count -gt 5) { $classifiers += ', ...' }
                }
                else {
                    $classifiers = 'N/A'
                }

                $details += "| $ruleName | $policyName | $createdDate | $classifiers |`n"
            }
            $details += "`n"
        }

        $mdInfo = $formatTemplate -f $reportTitle, $portalLink, $details,
            $autoLabelRulesWithClassifiers.Count,
            $dlpRulesWithClassifiers.Count
    }

    # Replace the placeholder with the detailed information. String.Replace is literal; -replace would
    # treat '$' sequences in rule or policy names as regex substitution tokens and corrupt the output.
    $testResultMarkdown = $testResultMarkdown.Replace('%TestResult%', $mdInfo)
    #endregion Report Generation

    $params = @{
        TestId = '35036'
        Title  = 'Trainable Classifiers Usage in Policies'
        Status = $passed
        Result = $testResultMarkdown
    }

    if ($null -ne $customStatus) {
        $params.CustomStatus = $customStatus
    }

    # Add test result details
    Add-ZtTestResultDetail @params
}