Commit 6190603

Refactor/allow any language (#2061)
* Update allowanylanguage.js: the script was not working when tested in a ServiceNow Personal Developer Instance (PDI). With the help of AI, I wrote a script that serves the same purpose, follows a logical order, is well commented, and works (validated on my PDI).
* Update README.md: expanded the README to describe the refactored code, while keeping the original Elon joke.
1 parent bd62cef commit 6190603

2 files changed, +108 -2 lines changed

Lines changed: 57 additions & 1 deletion
@@ -1 +1,57 @@
-Allow Any Language Character Remove Special Characters, Can be used to verify valid names. Sorry Elon Musk's First born.
# trimNonCharacters

## Description

This JavaScript function removes special characters from strings while preserving valid characters from multiple language groups.

It can help validate names or other user input in internationalized applications, for example by flagging input that changes after cleaning.

> Sorry, Elon Musk's firstborn: `X Æ A-12` might not make it through unscathed.

## Features

- Removes punctuation, emojis, and symbols that fall outside the allowed ranges
- Preserves:
  - Basic Latin letters (A–Z, a–z)
  - Digits (0–9)
  - Whitespace and parentheses
  - Accented characters (e.g., é, ñ, ü)
  - Characters from:
    - Central/Eastern European languages
    - Cyrillic (Russian, Ukrainian)
    - Greek
    - Arabic
    - Hindi/Sanskrit (Devanagari)
    - Chinese, Japanese, Korean (CJK Unified Ideographs only; Japanese kana and Korean Hangul are removed)
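The feature list can be checked with a small Node.js sketch. It assumes the same allow-list as the committed script, compiles the regex once, and uses `console.log` in place of ServiceNow's `gs.info`; the sample strings are illustrative and not taken from the repository.

```javascript
// Sketch: same allow-list as the committed script, runnable in Node.js.
// Assumption: console.log stands in for ServiceNow's gs.info.
var allowedRanges = [
    "a-zA-Z0-9()", "\\s",
    "\\u00C0-\\u00FF", "\\u0100-\\u017F",
    "\\u0400-\\u04FF", "\\u0370-\\u03FF",
    "\\u0600-\\u06FF", "\\u0900-\\u097F",
    "\\u4E00-\\u9FFF"
];
// Negated class: matches runs of characters outside every allowed range.
var disallowed = new RegExp("[^" + allowedRanges.join("") + "]+", "g");

function trimNonCharacters(str) {
    return str.replace(disallowed, "");
}

var samples = [
    "Zoë Saldaña!",     // accented Latin letters survive, "!" is removed
    "Łukasz Piszczek",  // Ł is in Latin Extended-A, so nothing is removed
    "Party 🎉 time",    // the emoji is removed, both spaces stay
    "東京 とうきょう",    // Kanji survive, the Hiragana is removed
    "서울 (Seoul)"       // the Hangul is removed, parentheses stay
];

samples.forEach(function (s) {
    console.log(JSON.stringify(s) + " -> " + JSON.stringify(trimNonCharacters(s)));
});
```

Kanji survive because they fall in U+4E00 to U+9FFF, while Hiragana (U+3040 to U+309F) and Hangul syllables (U+AC00 to U+D7AF) fall outside every listed range.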

## Compatibility

- Runs in **ServiceNow background scripts** (validated on a Personal Developer Instance)
- Avoids regex features that may be unsupported there, such as:
  - Unicode property escapes (`\p{L}`)
  - Multi-line regex literals
  - Inline comments inside regex
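Where property escapes are available (for example, Node.js 10 or later), a quick comparison suggests the original one-liner and the range list are not interchangeable. The range list is stricter for scripts it omits, such as Hangul, but it keeps Devanagari vowel signs and the virama, which are combining marks (Unicode category M) rather than letters (category L); `\p{L}` therefore strips them. This is a comparison sketch for use outside ServiceNow, not part of the commit.

```javascript
// Sketch: compare the original ES2018 one-liner with the range-based rewrite.
// Assumption: run in Node.js 10+, which supports \p{...} with the "u" flag.

// Original approach: keep Unicode letters, ASCII digits, parentheses, whitespace.
function trimWithPropertyEscapes(str) {
    return str.replace(/[^\p{L}\d()\s]+/ug, "");
}

// Range-based approach: keep only the explicitly listed blocks.
var rangePattern = new RegExp(
    "[^a-zA-Z0-9()\\s\\u00C0-\\u00FF\\u0100-\\u017F\\u0400-\\u04FF" +
    "\\u0370-\\u03FF\\u0600-\\u06FF\\u0900-\\u097F\\u4E00-\\u9FFF]+",
    "g"
);
function trimWithRanges(str) {
    return str.replace(rangePattern, "");
}

// Cyrillic and Greek agree; Hangul and Devanagari marks differ.
["Привет, мир!", "Ελλάδα!", "서울!", "नमस्ते"].forEach(function (s) {
    console.log(s, "|", trimWithPropertyEscapes(s), "|", trimWithRanges(s));
});
```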

## Usage

Replace the input string with your own text or variable, call the function, and handle the result:
```
var input = "Hello, мир! Γειά σου κόσμε! مرحبا بالعالم! नमस्ते दुनिया! 你好,世界!";
var cleaned = trimNonCharacters(input);
gs.info("Cleaned: " + cleaned);
```
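For name validation, one option is to accept input only when cleaning leaves it unchanged. The sketch below uses a hypothetical `isCleanName` helper that is not part of the commit, and inlines its own copy of the allow-list so it runs in Node.js without ServiceNow.

```javascript
// Copy of the committed allow-list so the sketch runs on its own.
var allowedRanges = [
    "a-zA-Z0-9()", "\\s",
    "\\u00C0-\\u00FF", "\\u0100-\\u017F",
    "\\u0400-\\u04FF", "\\u0370-\\u03FF",
    "\\u0600-\\u06FF", "\\u0900-\\u097F",
    "\\u4E00-\\u9FFF"
];
var disallowed = new RegExp("[^" + allowedRanges.join("") + "]+", "g");

function trimNonCharacters(str) {
    return str.replace(disallowed, "");
}

// Hypothetical helper: a name is "clean" if it is non-blank and
// trimNonCharacters() would not change it.
function isCleanName(name) {
    var trimmed = name.trim(); // ignore leading/trailing whitespace
    return trimmed.length > 0 && trimNonCharacters(trimmed) === trimmed;
}

console.log(isCleanName("José Álvarez")); // true
console.log(isCleanName("O'Brien"));      // false: the apostrophe is removed
console.log(isCleanName("Anne-Marie"));   // false: the hyphen is removed
console.log(isCleanName("   "));          // false: blank input
```

The `O'Brien` and `Anne-Marie` results suggest that, for real names, the allow-list may need an apostrophe and a hyphen; whether that is appropriate depends on your data.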

## Customization

Language support is modular. The Unicode ranges are defined in an array; comment out, remove, or edit entries as needed:
```
var allowedRanges = [
    "a-zA-Z0-9()",     // Basic Latin
    "\\s",             // Whitespace
    "\\u00C0-\\u00FF", // Western European
    "\\u0100-\\u017F", // Central/Eastern European
    "\\u0400-\\u04FF", // Cyrillic
    "\\u0370-\\u03FF", // Greek
    "\\u0600-\\u06FF", // Arabic
    "\\u0900-\\u097F", // Hindi/Sanskrit
    "\\u4E00-\\u9FFF"  // CJK Unified Ideographs (Han characters only)
];
```
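Extending the allow-list is a matter of appending ranges. This sketch wraps the pattern construction in a hypothetical `buildCleaner` factory (not part of the commit) and adds Hiragana (U+3040 to U+309F), Katakana (U+30A0 to U+30FF), and Hangul Syllables (U+AC00 to U+D7AF) as illustrative extra entries. Compiling the regex once in the factory, instead of on every call, is a choice of this sketch.

```javascript
// Hypothetical factory: turns an allow-list of range strings into a cleaner.
function buildCleaner(allowedRanges) {
    // Compile once; the "g" flag removes every disallowed run.
    var regex = new RegExp("[^" + allowedRanges.join("") + "]+", "g");
    return function (str) {
        return String(str).replace(regex, "");
    };
}

// The ranges from the committed script.
var baseRanges = [
    "a-zA-Z0-9()", "\\s",
    "\\u00C0-\\u00FF", "\\u0100-\\u017F",
    "\\u0400-\\u04FF", "\\u0370-\\u03FF",
    "\\u0600-\\u06FF", "\\u0900-\\u097F",
    "\\u4E00-\\u9FFF"
];

// Assumed additions for Japanese kana and Korean Hangul.
var extendedRanges = baseRanges.concat([
    "\\u3040-\\u309F", // Hiragana
    "\\u30A0-\\u30FF", // Katakana
    "\\uAC00-\\uD7AF"  // Hangul Syllables
]);

var cleanBase = buildCleaner(baseRanges);
var cleanExtended = buildCleaner(extendedRanges);

var text = "こんにちは 안녕하세요!";
console.log(JSON.stringify(cleanBase(text)));     // " " (only the space survives)
console.log(JSON.stringify(cleanExtended(text))); // "こんにちは 안녕하세요"
```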
Lines changed: 51 additions & 1 deletion
@@ -1 +1,51 @@
-module.exports.trimNonCharacters = (str) => str.replace(/[^\p{L}\d()\s]+/ug, '');
function trimNonCharacters(str) {
    // Define Unicode ranges for each language group
    var allowedRanges = [
        "a-zA-Z0-9()", // Basic Latin letters, digits, parentheses
        "\\s",         // Whitespace

        // Western European (Latin-1 Supplement)
        "\\u00C0-\\u00FF", // e.g., é, ñ, ü (note: also keeps the symbols × and ÷)

        // Central/Eastern European (Latin Extended-A)
        "\\u0100-\\u017F", // e.g., Ą, Č, Ő

        // Cyrillic (Russian, Ukrainian, Bulgarian)
        "\\u0400-\\u04FF", // e.g., мир, привет

        // Greek
        "\\u0370-\\u03FF", // e.g., Γειά σου κόσμε

        // Arabic
        "\\u0600-\\u06FF", // e.g., مرحبا بالعالم

        // Devanagari (Hindi, Sanskrit)
        "\\u0900-\\u097F", // e.g., नमस्ते दुनिया

        // CJK Unified Ideographs (Chinese, Japanese Kanji, Korean Hanja)
        "\\u4E00-\\u9FFF"  // e.g., 你好,世界
    ];

    // Build a negated character class that matches runs of disallowed characters
    var pattern = "[^" + allowedRanges.join("") + "]+";

    // Create the RegExp object; "g" removes every run, not just the first
    var regex = new RegExp(pattern, "g");

    // Replace each disallowed run with an empty string
    return str.replace(regex, '');
}

// Example input with a comment for each language
var input =
    "Hello, " +          // English: "Hello, "
    "мир! " +            // Russian: "world!"
    "Γειά σου κόσμε! " + // Greek: "Hello world!"
    "مرحبا بالعالم! " +  // Arabic: "Hello world!"
    "नमस्ते दुनिया! " +  // Hindi: "Hello world!"
    "你好,世界!";        // Chinese: "Hello, world!"

var cleaned = trimNonCharacters(input);

gs.info("Original: " + input);
gs.info("Cleaned: " + cleaned);
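The script builds its pattern from strings, which is why every escape in `allowedRanges` is written with two backslashes. This standalone sketch (not from the commit) shows what goes wrong with a single backslash for `\s`, and why `\u` escapes happen to work either way.

```javascript
// "\\s" in a string literal is two characters, a backslash and "s", which
// RegExp then parses as the whitespace class. "\s" is an unnecessary string
// escape that collapses to the plain letter "s".
var doubled = new RegExp("[^a-z\\s]+", "g"); // source text: [^a-z\s]+
var single = new RegExp("[^a-z\s]+", "g");   // source text: [^a-zs]+

console.log("\\s".length, "\s".length);          // 2 1
console.log("hello world".replace(doubled, "")); // hello world (space kept)
console.log("hello world".replace(single, ""));  // helloworld (space removed)

// Unicode escapes work either way: "\\u00E9" is parsed by RegExp, while
// "\u00E9" is turned into "é" by the string literal before RegExp sees it.
var viaRegExp = new RegExp("[\\u00E9]");
var viaString = new RegExp("[\u00E9]");
console.log(viaRegExp.test("é"), viaString.test("é")); // true true
```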
